Francis Su is a former president of the Mathematical Association of America. He just gave a beautiful address at the AMS-MAA Joint Meeting, entitled “Mathematics for Human Flourishing”.

Basically, when one asks about the goal of mathematics, the answer is often related to its contribution to the progress of mankind through the advancement of science. Francis Su spells out what the deepest goal of mathematics may be: to contribute to the flourishing not only of mankind as a whole, but of each of us as human beings. He starts from Aristotle's view that a well-lived life goes through the exercise of “virtue”, that is, excellence of character leading to excellence of conduct. He then lists five basic desires which mathematics helps fulfill while cultivating such virtues: play, beauty, truth, justice and love.

At the beginning of this New Year, I would just like to conclude this short message by repeating his final wish: “May you and all your students flourish!”

I would like to discuss today a beautiful theorem of Grothendieck concerning differential equations. It was mentioned by Yves André in a wonderful talk at IHÉS in March 2016 and Hélène Esnault kindly explained its proof to me during a nice walk in the Bavarian Alps last April... The statement is as follows:

Theorem (Grothendieck, 1970). — Let $X$ be a smooth projective complex algebraic variety. Assume that $X$ is simply connected. Then every vector bundle with an integrable connection on $X$ is trivial.

Let indeed $(E,\nabla)$ be a vector bundle with an integrable connection on $X$ and let us show that it is trivial, namely, that there exist $n$ global sections $e_1,\dots,e_n$ of $E$ which are horizontal ($\nabla e_i=0$) and form a basis of $E$ at each point.

Considering the associated analytic picture, we get a vector bundle $(E^{\mathrm{an}},\nabla)$ with an integrable connection on the analytic manifold $X(\mathbf C)$. Let $x\in X(\mathbf C)$. By the theory of linear differential equations, this furnishes a representation $\rho$ of the topological fundamental group $\pi_1(X(\mathbf C),x)$ in the fiber $E_x$ of the vector bundle $E$ at the point $x$. Saying that $(E^{\mathrm{an}},\nabla)$ is trivial on $X(\mathbf C)$ means that this representation $\rho$ is trivial, which seems to be a triviality since $X$ is simply connected.

However, in this statement, simple connectedness is meant in the sense of algebraic geometry, namely that $X$ has no non-trivial finite étale covering. And this is why the theorem can be surprising, for this hypothesis does not imply that $\pi_1(X(\mathbf C),x)$ is trivial, only that it has no non-trivial finite quotient. This is Grothendieck's version of Riemann's existence theorem, proved in SGA 1.

However, it is known that $X(\mathbf C)$ is topologically equivalent to a finite cellular space, so that its fundamental group $\pi_1(X(\mathbf C),x)$ is finitely presented.

Proposition (Malčev, 1940). — Let $G$ be a finitely generated subgroup of $\mathrm{GL}(n,\mathbf C)$. Then $G$ is residually finite: for every finite subset $T$ of $G$ not containing $\mathrm I_n$, there exist a finite group $K$ and a morphism $f\colon G\to K$ such that $T\cap \operatorname{Ker}(f)=\varnothing$.

Consequently, the image of $\rho$ is residually finite. If it were non-trivial, there would exist a non-trivial finite quotient $K$ of $\operatorname{im}(\rho)$, hence a non-trivial finite quotient of $\pi_1(X(\mathbf C),x)$, which, as we have seen, is impossible. Consequently, the image of $\rho$ is trivial and $(E^{\mathrm{an}},\nabla)$ is trivial.

In other words, there exists a basis $(e_1,\dots,e_n)$ of horizontal sections of $E^{\mathrm{an}}$. By Serre's GAGA theorem, $e_1,\dots,e_n$ are in fact algebraic, i.e., induced by actual global sections of $E$ on $X$. By construction, they are horizontal and form a basis of $E$ at each point. Q.E.D.

It now remains to explain the proof of the proposition. Let $S$ be a finite symmetric generating subset of $G$ containing $T$, not containing $\mathrm I_n$, and let $R$ be the subring of $\mathbf C$ generated by the entries of the elements of $S$, together with the inverses of the non-zero entries of the matrices $s$ and $s-\mathrm I_n$, for $s\in S$. It is a non-zero finitely generated $\mathbf Z$-algebra; the elements of $S$ are contained in $\mathrm {GL}(n,R)$, hence $G$ is a subgroup of $\mathrm{GL}(n,R)$. Let $\mathfrak m$ be a maximal ideal of $R$ and let $k$ be its residue field; the point of the story is that this field $k$ is finite (I'll explain why in a minute). Then the reduction map $R\to k$ induces a morphism of groups $\mathrm{GL}(n,R)\to \mathrm {GL}(n,k)$, hence a morphism $f\colon G\to \mathrm{GL}(n,k)$ to a finite group. By construction, for every $s\in S$, the matrix $s-\mathrm I_n$ has a non-zero entry, which is invertible in $R$, hence is mapped to a non-zero element of $k$; therefore $f(s)\neq \mathrm I_n$. Consequently, $S$ is disjoint from the kernel of $f$, and so is $T$, as was to be shown.
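To make the reduction step concrete, here is a minimal Python sketch under simplifying assumptions that are not in the text: the ring is $R=\mathbf Z[1/2]$, the maximal ideal is $(3)$, so that $k=\mathbf F_3$, and the matrix `s` is an arbitrary illustrative choice. The helper name `reduce_mod_p` is hypothetical.

```python
from fractions import Fraction

def reduce_mod_p(matrix, p):
    """Reduce a matrix with entries in Z[1/m] (p not dividing m) modulo p."""
    reduced = []
    for row in matrix:
        new_row = []
        for entry in row:
            entry = Fraction(entry)
            # By assumption the denominator is invertible modulo p.
            inv_den = pow(entry.denominator, -1, p)
            new_row.append(entry.numerator * inv_den % p)
        reduced.append(new_row)
    return reduced

# Illustrative element of GL(2, Z[1/2]), different from the identity.
s = [[Fraction(1), Fraction(1, 2)],
     [Fraction(0), Fraction(1)]]

image = reduce_mod_p(s, 3)
identity = [[1, 0], [0, 1]]
print(image, image != identity)
```

The non-zero entry $1/2$ of $s-\mathrm I_2$ is invertible in $R$, so its image in $\mathbf F_3$ cannot vanish, and the image of `s` is not the identity.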

Lemma. — Let $R$ be a finitely generated $\mathbf Z$-algebra and let $\mathfrak m$ be a maximal ideal of $R$. The residue field $R/\mathfrak m$ is finite.

Proof of the lemma. — This could be summarized by saying that $\mathbf Z$ is a Jacobson ring: if $A$ is a Jacobson ring, then every finitely generated $A$-algebra $K$ which is a field is finite over $A$; in particular, $K$ is a finite extension of a quotient field of $A$. In the case $A=\mathbf Z$, the quotient fields of $\mathbf Z$ are the finite fields $\mathbf F_p$, so that $K$ is a finite extension of a finite field, hence is a finite field. Let us however explain the argument. Let $K$ be the field $R/\mathfrak m$; let us replace $\mathbf Z$ by its quotient $A=\mathbf Z/P$, where $P$ is the kernel of the map $\mathbf Z\to R/\mathfrak m$. There are two cases: either $P=(0)$ and $A=\mathbf Z$, or $P=(p)$, for some prime number $p$, and $A$ is the finite field $\mathbf F_p$;
we will eventually see that the first case cannot happen.

Now, $K$ is a field which is a finitely generated algebra over a subalgebra $A$; let $k$ be the fraction field of $A$. The field $K$ is now a finitely generated algebra over its subfield $k$; by Zariski's form of Hilbert's Nullstellensatz, $K$ is a finite algebraic extension of $k$. Let us choose a finite generating subset $S$ of $K$ as a $k$-algebra; each element of $S$ is algebraic over $k$; let us consider the product $f$ of the leading coefficients of their minimal polynomials, chosen to belong to $A[T]$, and let $A'=A[1/f]$. By construction, the elements of $S$ are integral over $A'$, hence $K$ is integral over $A'$. Since $K$ is a field, we deduce that $A'$ is a field. To conclude, we split the discussion into the two cases stated above.

If $P=(p)$, then $A=\mathbf F_p$, hence $k=\mathbf F_p$ as well, and $K$ is a finite extension of $\mathbf F_p$, hence is a finite field.

Let us assume, by contradiction, that $P=(0)$, hence $A=\mathbf Z$ and $k=\mathbf Q$. By what precedes, there exists an element $f\in\mathbf Z$ such that $\mathbf Q=\mathbf Z[1/f]$. But this cannot be true, because $\mathbf Z[1/f]$ is not a field. Indeed, any prime number which does not divide $f$ is not invertible in $\mathbf Z[1/f]$. This concludes the proof of the lemma.

Remarks. — 1) The theorem does not hold if $X$ is not proper. For example, the affine line $\mathbf A^1_{\mathbf C}$ is simply connected, both algebraically and topologically, but the trivial line bundle $E=\mathscr O_X\cdot e$ endowed with the connection defined by $\nabla (e)=e$ is not trivial. It is analytically trivial though, but its horizontal analytic sections are of the form $\lambda \exp(z) e$, for $\lambda\in\mathbf C$, and except for $\lambda=0$, none of them are algebraic.
2) However, the theorem holds if one assumes moreover that the connection has regular singularities at infinity.

3) The analogous result in positive characteristic is a conjecture by Johan De Jong formulated in 2010: If $X$ is a projective smooth simply connected algebraic variety over an algebraically closed field of characteristic $p$, then every isocrystal is trivial. It is still open, despite beautiful progress by Hélène Esnault, together with Vikram Mehta and Atsushi Shiho.

A colleague just sent me Xerox copies of a few pages of an 1899 biography of the général Bourbaki. Its author, François Bournand, was the private secretary of Édouard Drumont, an antisemitic writer and journalist. The book would probably not be worth mentioning here were it not for its dedication:

À l'abbé Félix Klein

de l'Institut catholique

Hommage respectueux de son dévoué en N.-S.

François Bournand

Professeur d'histoire de l'art à l'École professionnelle catholique

Abbé means abbot; in this context, it designates a Catholic priest without a parish. The French initials N.-S. mean Notre Seigneur, Our Lord. It appears that this Félix Klein (note the accent on the e) also has a Wikipedia page.

A few days ago, The Scotsman published an article about Klaus Roth's legacy, explaining how he donated his fortune (1 million pounds) to various charities. This article was shared by some friends on Facebook. Yuri Bilu added that he knew two important theorems of Roth, and since one of them did not immediately come to my mind, I decided to write this post.

The first theorem was a 1935 conjecture of Erdős and Turán concerning arithmetic progressions of length 3, which Roth proved in 1952. That is, one is given a set $A$ of positive integers and one seeks triples $(a,b,c)$ of distinct elements of $A$ such that $a+c=2b$; Roth proved that infinitely many such triples exist as soon as the upper density of $A$ is positive, that is:
\[ \limsup_{x\to+\infty} \frac{\mathop{\rm Card}(A\cap [0;x])}x >0. \]
In 1975, Endre Szemerédi proved that such sets of integers contain (finite) arithmetic progressions of arbitrarily large length. Other proofs have been given by Hillel Furstenberg (using ergodic theory) and Tim Gowers (by Fourier/combinatorial methods); Roth had used Hardy-Littlewood's circle method.
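The objects counted by Roth's theorem are easy to enumerate in a finite set. The following Python sketch lists the triples $(a,b,c)$ with $a<b<c$ and $a+c=2b$; the function name and the two sample sets (squares and powers of $2$) are illustrative choices, and of course a finite computation says nothing about upper density.

```python
def three_term_progressions(A):
    """Return all triples (a, b, c) in A with a < b < c and a + c == 2 * b."""
    elements = sorted(set(A))
    members = set(elements)
    triples = []
    for i, a in enumerate(elements):
        for b in elements[i + 1:]:
            c = 2 * b - a  # the only possible third term
            if c in members:
                triples.append((a, b, c))
    return triples

squares = [n * n for n in range(1, 30)]
powers_of_two = [2 ** n for n in range(20)]
print(three_term_progressions(squares)[:3])
print(three_term_progressions(powers_of_two))
```

The powers of $2$ contain no such triple, since $2^i+2^k$ with $i<k$ is never a power of $2$; this set has density zero, so there is no conflict with the theorem.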

In 1976, Erdős strengthened his initial conjecture with Turán and predicted that arithmetic progressions of arbitrarily large length exist in $A$ as soon as
\[ \sum_{a\in A} \frac 1a =+\infty.\]
Such a result is still a conjecture, even for arithmetic progressions of length $3$, but a remarkable particular case has been proved by Ben Green and Terry Tao in 2004, when $A$ is the set of all prime numbers.

Outstanding as these results are (Tao was awarded the Fields medal in 2006 and Szemerédi the Abel prize in 2012), the second theorem of Roth was proved in 1955 and was certainly the main reason for awarding him the Fields medal in 1958. Indeed, Roth gave a definitive answer to a long standing question in diophantine approximation that originated from the works of Joseph Liouville (1844). Given a real number $\alpha$, one is interested in rational fractions $p/q$ that are close to $\alpha$, and in the quality of the approximation, namely the exponent $n$ such that $\left| \alpha- \frac pq \right|\leq 1/q^n$. Precisely, the approximation exponent $\kappa(\alpha)$ is the least upper bound of all real numbers $n$ such that the previous inequality has infinitely many solutions in fractions $p/q$, and Roth's theorem asserts that one has $\kappa(\alpha)=2$ when $\alpha$ is an irrational algebraic number.

One part of this result goes back to Dirichlet, showing that for any irrational number $\alpha$, there exist many good approximations with exponent $2$. This can be proved using the theory of continued fractions and is also a classical application of Dirichlet's box principle. Take a positive integer $Q$ and consider the $Q+1$ numbers $q\alpha- \lfloor q\alpha\rfloor$ in $[0,1]$, for $0\leq q\leq Q$; two of them must be at most $1/Q$ apart; this furnishes integers $p',p'',q',q''$, with $0\leq q'<q''\leq Q$, such that $\left| (q''\alpha-p'')-(q'\alpha-p')\right|\leq 1/Q$; then set $p=p''-p'$ and $q=q''-q'$; one has $\left| q\alpha -p \right|\leq 1/Q$, hence $\left|\alpha-\frac pq\right|\leq 1/(Qq)\leq 1/q^2$.
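The box-principle argument is effective and can be run as written. Here is a minimal Python sketch in floating-point arithmetic; the function name `dirichlet_approximation` and the choice $\alpha=\sqrt2$, $Q=100$ are illustrative assumptions, and rounding errors would matter for very large $Q$.

```python
import math

def dirichlet_approximation(alpha, Q):
    """Find (p, q) with 1 <= q <= Q and |q*alpha - p| <= 1/Q by the box principle."""
    # Fractional parts of q*alpha for 0 <= q <= Q, remembered with their q.
    parts = sorted((q * alpha - math.floor(q * alpha), q) for q in range(Q + 1))
    # Two of the Q + 1 fractional parts lie in the same box of width 1/Q,
    # so the closest pair of consecutive values is at most 1/Q apart.
    (_, q1), (_, q2) = min(zip(parts, parts[1:]),
                           key=lambda pair: pair[1][0] - pair[0][0])
    q = abs(q2 - q1)
    p = round(q * alpha)
    return p, q

alpha = math.sqrt(2)
p, q = dirichlet_approximation(alpha, 100)
print(p, q, abs(alpha - p / q), 1 / (100 * q))
```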

To prove an inequality in the other direction, Liouville's argument was that if $\alpha$ is an irrational root of a nonzero polynomial $P\in\mathbf Z[T]$ of degree $d$, then $\kappa(\alpha)\leq d$. The proof is now standard: given an approximation $p/q$ of $\alpha$, observe that $q^d P(p/q)$ is a non-zero integer (if, say, $P$ is irreducible), so that $\left| q^d P(p/q)\right|\geq 1$. On the other hand, $P(p/q)\approx (p/q-\alpha) P'(\alpha)$, hence an inequality $\left|\alpha-\frac pq\right|\gg q^{-d}$.
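Liouville's inequality can be observed numerically in the simplest case. The sketch below assumes $\alpha=\sqrt 2$ and $P=T^2-2$ (so $d=2$), an illustrative choice not made in the text, and checks that $q^2\left|\alpha-\frac pq\right|$ stays bounded away from $0$ for the best numerator $p$ attached to each $q\leq 2000$.

```python
import math

alpha = math.sqrt(2)  # root of P = T^2 - 2, so d = 2

def scaled_error(q):
    """Return q^2 * |alpha - p/q| for the integer p closest to q * alpha."""
    p = round(q * alpha)
    # |p^2 - 2 q^2| = q^2 |P(p/q)| is a non-zero integer, hence >= 1.
    assert abs(p * p - 2 * q * q) >= 1
    return q * q * abs(alpha - p / q)

worst = min(scaled_error(q) for q in range(1, 2001))
print(worst)
```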

This result has been generalized, first by Axel Thue in 1909 (who proved an inequality $\kappa(\alpha)\leq \frac12 d+1$), then by Carl Ludwig Siegel in 1921 and Freeman Dyson in 1947 (showing respectively $\kappa(\alpha)\leq 2\sqrt d$ and $\kappa(\alpha)\leq\sqrt{2d}$). While Liouville's result was based on the minimal polynomial of $\alpha$, these generalisations required polynomials in two variables, and the non-vanishing of a quantity such as $q^dP(p/q)$ above was definitely less trivial. Roth's proof made use of polynomials of arbitrarily large degree, and his remarkable achievement was a proof of the required non-vanishing result.

Roth's proof was “elementary”, making use only of polynomials and wronskians. There are today more geometric proofs, such as the one by Hélène Esnault and Eckart Viehweg (1984) or Michael Nakamaye's subsequent proof (1995) which is based on Faltings's product theorem.

What is still missing, however, is the proof of an effective version of Roth's theorem, that would give, given any real number $n>\kappa(\alpha)$, an actual integer $Q$ such that every rational fraction $p/q$ in lowest terms such that $\left|\alpha-\frac pq\right|\leq 1/q^n$ satisfies $q\leq Q$. It seems that this defect lies at the very heart of almost all of the current approaches in diophantine approximations...

I had to mentor an Agrégation leçon entitled Examples of dense subsets. For my own edification (and that of the masses), I want to try to record here as many proofs of the Weierstrass density theorem as I can: Every complex-valued continuous function on the closed interval $[-1;1]$ can be uniformly approximated by polynomials. I'll also include as a bonus the trigonometric variant: Every complex-valued continuous and $2\pi$-periodic function on $\mathbf R$ can be uniformly approximated by trigonometric polynomials.

1. Using the Stone theorem.

This 1937–1948 theorem is probably the final conceptual brick in the edifice of which Weierstrass laid the first stone in 1885. It asserts that a subalgebra of continuous functions on a compact completely regular (e.g., metric) space is dense for the uniform norm if and only if it separates points. In all presentations that I know of, its proof requires establishing that the absolute value function can be uniformly approximated by polynomials on $[-1;1]$:

Stone truncates the power series expansion of the function \[ x\mapsto \sqrt{1-(1-x^2)}=\sum_{n=0}^\infty \binom{1/2}n (x^2-1)^n, \] bounding by hand the error term.

Bourbaki (Topologie générale, X, p. 36, lemme 2) follows a more elementary approach and begins by proving that the function $x\mapsto \sqrt x$ can be uniformly approximated by polynomials on $[0;1]$. (The absolute value function is recovered since $\mathopen|x\mathclose|=\sqrt{x^2}$.) To this aim, he introduces the sequence of polynomials given by $p_0=0$ and $p_{n+1}(x)=p_n(x)+\frac12\left(x-p_n(x)^2\right)$ and proves by induction the inequalities \[ 0\leq \sqrt x-p_n(x) \leq \frac{2\sqrt x}{2+n\sqrt x} \leq \frac 2n\] for $x\in[0;1]$ and $n\geq 0$ (the last bound being meaningful for $n\geq 1$). This implies the desired result.
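Bourbaki's recursion is easy to test numerically. The following Python sketch evaluates $p_n(x)$ pointwise (rather than as a polynomial) and checks the inequalities $0\leq \sqrt x-p_n(x)\leq 2\sqrt x/(2+n\sqrt x)$ on a grid; the function names, the grid size and the tolerance `tol` for rounding errors are assumptions of this sketch.

```python
import math

def bourbaki_sqrt(x, n):
    """Evaluate p_n(x), where p_0 = 0 and p_{k+1} = p_k + (x - p_k^2) / 2."""
    p = 0.0
    for _ in range(n):
        p = p + 0.5 * (x - p * p)
    return p

def check_bounds(n, samples=1000, tol=1e-12):
    """Check 0 <= sqrt(x) - p_n(x) <= 2 sqrt(x) / (2 + n sqrt(x)) on a grid of [0, 1]."""
    for i in range(samples + 1):
        x = i / samples
        gap = math.sqrt(x) - bourbaki_sqrt(x, n)
        bound = 2 * math.sqrt(x) / (2 + n * math.sqrt(x))
        if not (-tol <= gap <= bound + tol):
            return False
    return True

print(all(check_bounds(n) for n in range(0, 51)))
```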

The algebra of polynomials separates points on the compact set $[-1;1]$, hence is dense. To treat the case of trigonometric polynomials, consider Laurent polynomials on the unit circle.

2. Convolution.

Consider an approximation $(\rho_n)$ of the Dirac distribution, i.e., a sequence of continuous, nonnegative and compactly supported functions on $\mathbf R$ such that $\int\rho_n=1$ and such that for every $\delta>0$, $\int_{\mathopen| x\mathclose|>\delta} \rho_n(x)\,dx\to 0$. Given a continuous function $f$ on $\mathbf R$, form the convolutions defined by $f*\rho_n(x)=\int_{\mathbf R} \rho_n(t) f(x-t)\, dt$. It is classical that $f*\rho_n$ converges to $f$ uniformly on every compact set.

Now, given a continuous function $f$ on $[-1;1]$, one can extend it to a continuous function with compact support on $\mathbf R$ (defining $f$ to be affine linear on $[-2;-1]$ and on $[1;2]$, and to be zero outside of $[-2;2]$). We want to choose $\rho_n$ so that $f*\rho_n$ is a polynomial on $[-1;1]$. The basic idea is just to choose a parameter $a>0$, and to take $\rho_n(x)= c_n (1-(x/a)^2)^n$ for $\mathopen|x\mathclose|\leq a$ and $\rho_n(x)=0$ otherwise, with $c_n$ adjusted so that $\int\rho_n=1$. Let us write $f*\rho_n(x)=\int_{-2}^2 \rho_n(x-t) f(t)\, dt$; if $x\in[-1;1]$ and $t\in[-2;2]$, then $x-t\in [-3;3]$ so we just need to be sure that $\rho_n$ is a polynomial on that interval, which we get by taking, say, $a=3$. This shows that the restriction of $f*\rho_n$ to $[-1;1]$ is a polynomial function, and we're done.

This approach is more or less that of D. Jackson (“A Proof of Weierstrass's Theorem,” Amer. Math. Monthly, 1934). The difference is that he considers continuous functions on a closed interval contained in $\mathopen]0;1\mathclose[$ which he extends linearly to $[0;1]$ so that they vanish at $0$ and $1$; he considers the same convolution, taking the parameter $a=1$.

As shown by Jackson, the same approach works easily (in a sense, more easily) for $2\pi$-periodic functions, considering the kernel defined by $\rho_n(x)=c_n(1+\cos(x))^n$, where $c_n$ is chosen so that $\int_{-\pi}^\pi \rho_n=1$.

3. Bernstein polynomials.

Take a continuous function $f$ on $[0;1]$ and, for $n\geq 1$, set \[ B_nf(x) = \sum_{k=0}^n f(k/n) \binom nk x^k (1-x)^{n-k}.\] It is classical that $B_nf$ converges uniformly to $f$ on $[0;1]$.
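The Bernstein polynomials can be evaluated directly from their definition. The Python sketch below does so and estimates the uniform error on a grid; the test function $x\mapsto \mathopen|x-\frac12\mathclose|$, the grid size and the function names are illustrative assumptions.

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate the n-th Bernstein polynomial of f at x in [0, 1] (n >= 1)."""
    return sum(f(k / n) * comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

def sup_error(f, n, samples=200):
    """Approximate the sup norm of B_n f - f on a uniform grid of [0, 1]."""
    return max(abs(bernstein(f, n, i / samples) - f(i / samples))
               for i in range(samples + 1))

f = lambda x: abs(x - 0.5)  # continuous but not differentiable at 1/2
for n in (10, 50, 200):
    print(n, sup_error(f, n))
```

The convergence is slow for this function, which is consistent with the probabilistic picture: the error at $x$ is controlled by the standard deviation $\sqrt{x(1-x)/n}$ of the empirical mean.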

There are two classical proofs of Bernstein's theorem. One is probabilistic and consists in observing that $B_nf(x)$ is the expected value of $f(S_n/n)$, where $S_n$ is the sum of $n$ i.i.d. Bernoulli random variables with parameter $x\in[0;1]$. Another (generalized as the Korovkin theorem, “On convergence of linear positive operators in the space of continuous functions”, Dokl. Akad. Nauk SSSR (N.S.), vol. 90) consists in showing (i) that for $f=1,x,x^2$, $B_nf$ converges uniformly to $f$ (an explicit calculation), (ii) that if $f\geq 0$, then $B_nf\geq 0$ as well, and (iii) that for every $x\in[0;1]$, one can squeeze $f$ in between two quadratic polynomials $f^+$ and $f^-$ such that $f^+(x)-f^-(x)$ is as small as desired.

A trigonometric variant would be given by Fejér's theorem that the Cesàro averages of the Fourier series of a continuous, $2\pi$-periodic function converge uniformly to that function. In turn, Fejér's theorem can be proved in both ways, either by convolution (the Fejér kernel is nonnegative), or by a Korovkin-type argument (replacing $1,x,x^2$ on $[0;1]$ by $1,z,z^2,z^{-1},z^{-2}$ on the unit circle).

4. Using approximation by step functions.

Let us show that for every $\delta\in\mathopen]0,1\mathclose[$ and every $\varepsilon>0$, there exists a polynomial $p$ satisfying the following properties:

$0\leq p(x)\leq \varepsilon$ for $-1\leq x\leq-\delta$;

$0\leq p(x)\leq 1$ for $-\delta\leq x\leq \delta$;

$1-\varepsilon\leq p(x)\leq 1$ for $\delta\leq x\leq 1$.

In other words, these polynomials approximate the (discontinuous) function $f$ on $[-1;1]$ defined by $f(x)=0$ for $x< 0$, $f(x)=1$ for $x> 0$ and $f(0)=1/2$.

A possible formula is $p(x)=(1- ((1-x)/2)^n)^{2^n}$, where $n$ is a large enough integer. First of all, one has $0\leq (1-x)/2\leq 1$ for every $x\in[-1;1]$, so that $0\leq p(x)\leq 1$. Let $x\in[-1;-\delta]$; then one has $(1-x)/2\geq (1+\delta)/2$, hence $p(x)\leq (1-((1+\delta)/2)^n)^{2^n}\leq \exp(-(1+\delta)^n)$, which can be made arbitrarily small when $n\to\infty$. Let finally $x\in[\delta;1]$; then $(1-x)/2\leq (1-\delta)/2$, hence $p(x)\geq (1-((1-\delta)/2)^n)^{2^n}\geq 1- (1-\delta)^n$, which can be made arbitrarily close to $1$ when $n\to\infty$.
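These estimates can be checked numerically. The Python sketch below evaluates $p(x)=(1-((1-x)/2)^n)^{2^n}$ in floating-point arithmetic; the values $\delta=0.2$ and $n=20$, as well as the grid of 101 points on each interval, are illustrative assumptions.

```python
def step_polynomial(x, n):
    """Evaluate p(x) = (1 - ((1 - x) / 2)^n)^(2^n) for x in [-1, 1]."""
    return (1 - ((1 - x) / 2) ** n) ** (2 ** n)

delta, n = 0.2, 20
# Largest value on [-1, -delta] and smallest value on [delta, 1], on a grid.
left = max(step_polynomial(-1 + i * (1 - delta) / 100, n) for i in range(101))
right = min(step_polynomial(delta + i * (1 - delta) / 100, n) for i in range(101))
print(left, right, 1 - (1 - delta) ** n)
```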

By translation and dilations, the discontinuity can be placed at any element of $[0;1]$. Let now $f$ be an arbitrary step function and let us write it as a linear combination $f=\sum a_i f_i$, where $f_i$ is a $\{0,1\}$-valued step function. For every $i$, let $p_i$ be a polynomial that approximates $f_i$ as given above. The linear combination $\sum a_i p_i$ approximates $f$ with maximal error $\sup(\mathopen|a_i\mathclose|)$.

Using uniform continuity of continuous functions on $[-1;1]$, every continuous function can be uniformly approximated by a step function. This concludes the proof.

5. Using approximation by piecewise linear functions.

As in the proof of Stone's theorem, one uses the fact that the function $x\mapsto \mathopen|x\mathclose|$ is uniformly approximated by a sequence of polynomials on $[-1;1]$. Consequently, so are the functions $x\mapsto \max(0,x)=(x+\mathopen|x\mathclose|)/2 $ and $x\mapsto\min(0,x)=(x-\mathopen|x\mathclose|)/2$. By translation and dilation, every continuous piecewise linear function on $[-1;1]$ with only one break point is uniformly approximated by polynomials. By linear combination, every continuous piecewise linear affine function is uniformly approximated by polynomials.
By uniform continuity, every continuous function can be uniformly approximated by continuous piecewise linear affine functions. Weierstrass's theorem follows.

6. Moments.

A linear subspace $A$ of a Banach space is dense if and only if every continuous linear form which vanishes on $A$ is identically $0$. In the present case, the dual of $C^0([-1;1],\mathbf C)$ is the space of complex measures on $[-1;1]$ (Riesz theorem, if one wishes, or the definition of a measure). So let $\mu$ be a complex measure on $[-1;1]$ such that $\int_{-1}^1 t^n \,d\mu(t)=0$ for every integer $n\geq 0$; let us show that $\mu=0$. This is the classical problem of showing that a complex measure on $[-1;1]$ is determined by its moments. In fact, the classical proof of this fact runs the other way round, and there must exist ways to reverse the arguments.

One such solution is given in Rudin's Real and complex analysis, where it is more convenient to consider functions on the interval $[0;1]$. So, let $F(z)=\int_0^1 t^z \,d\mu(t)$. The function $F$ is holomorphic and bounded on the half-plane $\Re(z)> 0$ and vanishes at the positive integers. At this point, Rudin makes a conformal transformation to the unit disk (setting $w=(z-1)/(z+1)$) and gets a bounded holomorphic function on the unit disk with zeroes at $(n-1)/(n+1)=1-2/(n+1)$, for $n\in\mathbf N$; if this function were not identically zero, this would contradict the fact that the series $\sum 1/(n+1)$ diverges.

In Rudin, this method is used to prove the more general Müntz–Szász theorem according to which the family $(t^{\lambda_n})$ generates a dense subset of $C([0;1])$ if and only if $\sum 1/\lambda_n=+\infty$.

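Here is another argument, for a complex measure $\mu$ on $[-1;1]$ all of whose moments vanish. For every complex number $a$ such that $\mathopen|a\mathclose|>1$, one can write $1/(t-a)$ as a converging power series in $t$. By summation, this quickly gives that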
\[ F(a) = \int_{-1}^1 \frac{1}{t-a}\, d\mu(t) \equiv 0. \]
Observe that this formula defines a holomorphic function on $\mathbf C\setminus[-1;1]$; by analytic continuation, one thus has $F(a)=0$ for every $a\not\in[-1;1]$.
Take a $C^2$-function $g$ with compact support on the complex plane. For every $t\in\mathbf C$, one has the following formula
\[ \iint \bar\partial g(z) \frac{1}{t-z} \, dx\,dy = g(t), \]
which implies, by integration and Fubini, that
\[ \int_{-1}^1 g(t)\,d\mu(t) = \iint \int \bar\partial g(z) \frac1{t-z}\,d\mu(t)\,dx\,dy = \iint \bar\partial g(z) F(z)\,dx\, dy= 0. \]
On the other hand, every $C^2$ function on $[-1;1]$ can be extended to such a function $g$, so that the measure $\mu$ vanishes on every $C^2$ function on $[-1;1]$. Approximating a continuous function by a $C^2$ function (first take a piecewise linear approximation, and round the corners), we get that $\mu$ vanishes on every continuous function, as was to be proved.

7. Chebyshev/Markov systems.

This proof is due to P. Borwein and taken from the book Polynomials and polynomial inequalities, by P. Borwein and T. Erdélyi (Graduate Texts in Maths, vol. 161, 1995). Let us say that a sequence $(f_n)$ of continuous functions on an interval $I$ is a Markov system (resp. a weak Markov system) if for every integer $n$, every linear combination of $(f_0,\dots,f_n)$ has at most $n$ zeroes (resp. $n$ sign changes) in $I$.

Given a Markov system $(f_n)$, one defines a sequence $(T_n)$, where $f_n-T_n$ is the element of $\langle f_0,\dots,f_{n-1}\rangle$ which is the closest to $f_n$. The function $T_n$ has $n$ zeroes on the interval $I$; let $M_n$ be the maximum distance between two consecutive zeroes.

Borwein's theorem (Theorem 4.1.1 in the mentioned book) then asserts that if the sequence $(f_n)$ is a Markov system consisting of $C^1$ functions, then its linear span is dense in $C(I)$ if and only if $M_n\to 0$.

The sequence of monomials $(x^n)$ on $I=[-1;1]$ is of course a Markov system. In this case, the polynomial $T_n$ is, up to normalization, the $n$th Chebyshev polynomial, given by $T_n(\cos(x))=\cos(nx)$; its roots are given by $\cos((\pi+2k\pi)/2n)$, for $k=0,\dots,n-1$, and $M_n\leq \pi/n$. This gives yet another proof of Weierstrass's approximation theorem.
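The bound on $M_n$ can be checked numerically. The Python sketch below assumes the normalization $T_n(\cos\theta)=\cos(n\theta)$ on $[-1;1]$, so that the roots are $\cos((2k+1)\pi/2n)$; the function names are hypothetical.

```python
import math

def chebyshev_roots(n):
    """Roots cos((2k + 1) * pi / (2n)), k = 0..n-1, of the n-th Chebyshev polynomial."""
    return sorted(math.cos((2 * k + 1) * math.pi / (2 * n)) for k in range(n))

def max_gap(n):
    """Largest distance between two consecutive roots (n >= 2)."""
    roots = chebyshev_roots(n)
    return max(b - a for a, b in zip(roots, roots[1:]))

for n in (5, 50, 500):
    print(n, max_gap(n), math.pi / n)
```

The inequality follows from $\mathopen|\cos a-\cos b\mathclose|\leq \mathopen|a-b\mathclose|$, applied to two consecutive angles, which differ by $\pi/n$.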

I was absolutely excited at the prospect of returning to this avant-garde jazz hall (it was my third concert there; the first one was in 2010, with Sylvie Courvoisier, Thomas Morgan and Ben Perowski, and the second, last year, with Wadada Leo Smith and Vijay Iyer) to listen to Gerry Hemingway, and the cold rain falling on New York City did not diminish my enthusiasm. (Although I had to take care on the streets, for one could see almost nothing...) I feared I would arrive late, but Gerry Hemingway was still installing his tools, various sticks, small cymbals, woodblocks, as well as a cello bow...

I admit, it took me some time to appreciate the music. Of course, it was free jazz (so what?) and I couldn't really follow the stream of music. Both musicians were acting delicately and skillfully (no discussion) at creating sound, as a painter would spread brush strokes on a canvas—and actually, Hemingway was playing a lot of brushes, those drum sticks made of many (wire or plastic) strings that have a delicate and not very resonating sound... Color after color, something was emerging, sound was being shaped.

There is an eternal discussion about the nature of music (is it rhythm? melody? harmony?) and consequently about the role of each instrument in the shaping of the music. A related question is the way a given instrument should be used to produce sound.

None of the obvious answers was to be heard tonight. Russ Lossing sometimes struck the strings of the grand piano with mallets, something almost classical in avant-garde piano music. I should have been prepared by the concert of Tony Malaby's Tubacello, which I attended with François Loeser at Sons d'hiver a few weeks ago, where John Hollenbeck simultaneously played drums and prepared piano, but the playing of Gerry Hemingway brought me much surprise. He could blow on the heads of the drums, hit them with a woodblock or strange plastic mallets; he could make the cymbals vibrate by pressing the cello bow on them; he could also take the top hi-hat cymbal in his left hand, and then either hit it with a stick, or press it on the snare drum, thereby producing a mixture of snare/cymbal sound; during a long drum roll, he could also vary the pitch of the sound by pressing the drum head with his right foot. Can you imagine the scene?

It is while discussing with him in between the two sets that I gradually understood (some of) his musical conception: how everything is about sound and color. That's why he uses an immense palette of tools, to produce the sounds he feels would best fit the music. He also discussed extended technique, by which he means not the kind of drumistic virtuosity that could allow you (unfortunately, not me...) to play the 26 drum rudiments at 300bpm, but extending the range of sounds he can consistently produce with his “basic Buddy Rich type instrument” (Google a picture of Terry Bozzio's drumkit if you don't see what I mean). He described himself as a colorist, who thinks of his instrument in terms of pitches; he also said how rhythm also exists in negative, when it is not played explicitly. A striking remark, because it exactly depicted how I understand the playing of one of my favorite jazz drummers, Paul Motian, whom I couldn't appreciate until I became able to hear what he did not play.

The second set did not sound as abstract as the first one. Probably the two wind instruments helped give the sound more flesh and more texture. Samuel Blaser, on the trombone, was absolutely exceptional (go listen at once to his Spring Rain album, an alliance of Jimmy Giuffre and contemporary jazz) and Loren Stillman sang very beautiful melodic lines on the alto sax. The four of them could also play in all combinations, and with extremely interesting dynamics, going effortlessly from one to another. And when a wonderful moment of thunder ended abruptly with the first notes of Paul Motian's Etude, music turned into pure emotion.

As was apparently first noticed by Noam Elkies, 2016 is the cardinality of the general linear group over the field with 7 elements, $G=\mathop{\rm GL}(2,\mathbf F_7)$. I was mentoring an agrégation lesson on finite fields this afternoon, and I could not resist having the student check this. Then came the natural question of describing the Sylow subgroups of this finite group. This is what I describe here.

First of all, let's recall the computation of the cardinality of $G$. The first column of a matrix in $G$ must be non-zero, hence there are $7^2-1$ possibilities; for the second column, it only needs to be non-collinear to the first one, and each choice of the first column forbids $7$ second columns, hence $7^2-7$ possibilities. In the end, one has $\mathop{\rm Card}(G)=(7^2-1)(7^2-7)=48\cdot 42=2016$. The same argument shows that the cardinality of the group $\mathop{\rm GL}(n,\mathbf F_q)$ is equal to $(q^n-1)(q^n-q)\cdots (q^n-q^{n-1})=q^{n(n-1)/2}(q-1)(q^2-1)\cdots (q^n-1)$.
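Both the general formula and the count for $q=7$ are easy to confirm by machine. The Python sketch below implements the product formula and, as a sanity check, counts invertible $2\times 2$ matrices over $\mathbf F_p$ by brute force; the function names are hypothetical and the brute-force count is only valid for a prime $p$.

```python
from itertools import product

def gl_order(n, q):
    """Order of GL(n, F_q): (q^n - 1)(q^n - q)...(q^n - q^(n-1))."""
    order = 1
    for i in range(n):
        order *= q**n - q**i
    return order

def brute_force_gl2_order(p):
    """Count invertible 2x2 matrices over F_p (p prime) by checking determinants."""
    return sum(1 for a, b, c, d in product(range(p), repeat=4)
               if (a * d - b * c) % p != 0)

print(gl_order(2, 7), brute_force_gl2_order(7))  # prints: 2016 2016
```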

Let's go back to our example. The factorization of this cardinality comes easily: $2016=(7^2-1)(7^2-7)=(7-1)(7+1)7(7-1)=6\cdot 8\cdot 7\cdot 6= 2^5\cdot 3^2\cdot 7$. Consequently, there are three Sylow subgroups to find, for the prime numbers $2$, $3$ and $7$.

The case $p=7$ is the most classical one. One needs to find a group of order 7, and one such subgroup is given by the group of upper triangular matrices $\begin{pmatrix} 1 & * \\ 0 & 1\end{pmatrix}$. What makes things work is that $p$ is the characteristic of the chosen finite field. In general, if $q$ is a power of $p$, then the subgroup of upper-triangular matrices in $\mathop{\rm GL}(n,\mathbf F_q)$ with $1$s on the diagonal has cardinality $q\cdot q^2\cdots q^{n-1}=q^{n(n-1)/2}$, which is exactly the highest power of $p$ dividing the cardinality of $\mathop{\rm GL}(n,\mathbf F_q)$.

Let's now study $p=3$. We need to find a group $S$ of order $3^2=9$ inside $G$. There are a priori two possibilities, either $S\simeq (\mathbf Z/3\mathbf Z)^2$, or $S\simeq (\mathbf Z/9\mathbf Z)$. We will find a group of the first sort, which will show that the second case doesn't happen, because all $3$-Sylows are pairwise conjugate, hence isomorphic.

Now, the multiplicative group $\mathbf F_7^\times$ is of order $6$, and is cyclic, hence contains a subgroup of order $3$, namely $C=\{1,2,4\}$. Consequently, the group of diagonal matrices with coefficients in $C$ is isomorphic to $(\mathbf Z/3\mathbf Z)^2$ and is our desired $3$-Sylow.

Another reason why $G$ does not contain a subgroup $S$ isomorphic to $\mathbf Z/9\mathbf Z$ is that it does not contain elements of order $9$. Indeed, consider a matrix $A\in G$ such that $A^9=I$; then its minimal polynomial $P$ divides $T^9-1$. Since $7\nmid 9$, this polynomial has simple roots, and the matrix $A$ is diagonalizable over the algebraic closure of $\mathbf F_7$. The eigenvalues of $A$ are $9$th roots of unity, and are at most quadratic over $\mathbf F_7$ since $\deg(P)\leq 2$, so that they belong to $\mathbf F_{49}$. On the other hand, if $\alpha$ is a $9$th root of unity belonging to $\mathbf F_{49}$, one has $\alpha^9=\alpha^{48}=1$, hence $\alpha^3=1$ since $\gcd(9,48)=3$. Consequently, $\alpha$ is a cubic root of unity and $A^3=I$, showing that the order of $A$ divides $3$.
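Since $G$ has only $2016$ elements, the absence of elements of order $9$ can also be confirmed by brute force. The Python sketch below enumerates $G$ and computes the order of every element; storing a matrix as a tuple `(a, b, c, d)` and the function names are conventions of this sketch.

```python
from itertools import product

P = 7
IDENTITY = (1, 0, 0, 1)

def multiply(x, y):
    """Multiply 2x2 matrices over F_7 stored as tuples (a, b, c, d) = [[a, b], [c, d]]."""
    a, b, c, d = x
    e, f, g, h = y
    return ((a * e + b * g) % P, (a * f + b * h) % P,
            (c * e + d * g) % P, (c * f + d * h) % P)

def order(x):
    """Order of an invertible matrix x in GL(2, F_7)."""
    k, power = 1, x
    while power != IDENTITY:
        power = multiply(power, x)
        k += 1
    return k

# All matrices with non-zero determinant modulo 7.
group = [m for m in product(range(P), repeat=4)
         if (m[0] * m[3] - m[1] * m[2]) % P != 0]
orders = sorted({order(m) for m in group})
print(orders)
```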

It remains to treat the case $p=2$, which I find slightly trickier. Let's try to find elements $A$ in $G$ whose order divides $2^5$. As above, such an element is diagonalizable in an algebraic closure, its minimal polynomial divides $T^{32}-1$, and its eigenvalues belong to $\mathbf F_{49}$, hence satisfy $\alpha^{32}=\alpha^{48}=1$, hence $\alpha^{16}=1$ since $\gcd(32,48)=16$. Conversely, $\mathbf F_{49}^\times$ is cyclic of order $48$, hence contains an element $\alpha$ of order $16$; such an element does not belong to $\mathbf F_7$ (because $16\nmid 6$), so it is quadratic over $\mathbf F_7$ and its minimal polynomial $P$ has degree $2$. The corresponding companion matrix $A$ in $G$ is an element of order $16$, generating a subgroup $S_1$ of $G$ isomorphic to $\mathbf Z/16\mathbf Z$. We also observe that $\alpha^8=-1$ (because its square is $1$ while $\alpha^8\neq 1$), and the same holds for the other root of $P$, which is also an element of order $16$. Since $A^8$ is diagonalizable in an algebraic closure with $-1$ as the only eigenvalue, this shows $A^8=-I$.

Now, there exists a $2$-Sylow subgroup $S$ containing $S_1$, and $S_1$ will be a normal subgroup of $S$ (because its index is the smallest prime number dividing the order of $S$, which is $2$). This suggests introducing the normalizer $N$ of $S_1$ in $G$. One then has $S_1\subset S\subset N$. Let $s\in S$ be such that $s\not\in S_1$; then there exists a unique $k\in\{1,\dots,15\}$ such that $s^{-1}As=A^k$, and $s^{-2}As^2=A^{k^2}=A$ (because $s$ has order $2$ modulo $S_1$, so that $s^2\in S_1$ commutes with $A$), hence $k^2\equiv 1\pmod{16}$; in other words, $k\equiv \pm1\pmod 8$.

There exists a natural choice of $s$: the involution ($s^2=I$) which exchanges the two eigenspaces of $A$. To finish the computation, it's useful to take a specific example of a polynomial $P$ of degree $2$ whose roots in $\mathbf F_{49}$ are primitive $16$th roots of unity. In other words, we need to factor the $16$th cyclotomic polynomial $\Phi_{16}=T^8+1$ over $\mathbf F_7$ and find a factor of degree $2$; actually, Galois theory shows that all factors have the same degree, so that there should be $4$ factors of degree $2$. To explain the following computation, a remark is useful. Let $\alpha$ be a primitive $16$th root of unity in $\mathbf F_{49}$; we have $(\alpha^8)^2=1$ but $\alpha^8\neq 1$, hence $\alpha^8=-1$. If $P$ is the minimal polynomial of $\alpha$, the other root is its Frobenius conjugate $\alpha^7$, hence the constant term of $P$ is equal to $\alpha\cdot\alpha^7=\alpha^8=-1$.

We start from $T^8+1=(T^4+1)^2-2T^4$ and observe that $2\equiv 4^2\pmod 7$, so that $T^8+1=(T^4+1)^2-4^2T^4=(T^4+4T^2+1)(T^4-4T^2+1)$. To find the factors of degree $2$, we remember that their constant terms should be equal to $-1$. We thus go on differently, writing $T^4+4T^2+1=(T^2+aT-1)(T^2-aT-1)$ and solving for $a$: this gives $-2-a^2=4$, hence $a^2=-6=1$ and $a=\pm1$. The other factors are found similarly and we get
\[ T^8+1=(T^2-T-1)(T^2+T-1)(T^2-4T-1)(T^2+4T-1). \]
We thus choose the factor $T^2-T-1$ and set $A=\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$.
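As a sanity check, one can expand the four quadratic factors modulo $7$ and verify that the chosen matrix $A$ does have order $16$, with $A^8=-I$. This is only a small sketch; polynomials are encoded as coefficient lists (lowest degree first), and the helper names `poly_mul` and `mat_mul` are mine.

```python
P = 7


def poly_mul(f, g):
    """Multiply two polynomials over F_7 (coefficient lists, lowest degree first)."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out


# T^2 - T - 1, T^2 + T - 1, T^2 - 4T - 1, T^2 + 4T - 1
factors = [[-1, -1, 1], [-1, 1, 1], [-1, -4, 1], [-1, 4, 1]]
expanded = [1]
for f in factors:
    expanded = poly_mul(expanded, f)
print(expanded)  # [1, 0, 0, 0, 0, 0, 0, 0, 1], that is, T^8 + 1


def mat_mul(x, y):
    """Product of two 2x2 matrices with entries reduced mod P."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) % P for j in range(2))
        for i in range(2)
    )


I2 = ((1, 0), (0, 1))
A = ((0, 1), (1, 1))  # companion matrix of T^2 - T - 1

# Successive powers A^1, A^2, ..., up to the first one equal to the identity.
powers = [A]
while powers[-1] != I2:
    powers.append(mat_mul(powers[-1], A))

print(len(powers))  # 16: the order of A
print(powers[7])    # ((6, 0), (0, 6)): A^8 = -I
```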

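Two eigenvectors for $A$ are $v=\begin{pmatrix} 1 \\ \alpha \end{pmatrix}$ and $v'=\begin{pmatrix}1 \\ \alpha'\end{pmatrix}$, where $\alpha'=\alpha^7$ is the other root of $T^2-T-1$. Let $B$ denote the involution $s$ introduced above, normalized so that it exchanges these two eigenvectors. The equations for $B$ are $Bv=v'$ and $Bv'=v$; since $\alpha+\alpha'=1$, this gives $B=\begin{pmatrix} 1 & 0 \\ 1 & - 1\end{pmatrix}$. One has $B^2=I$ and $B^{-1}AB=A^7$, so that $B$ normalizes $S_1$ without belonging to it. The subgroup $S=\langle A,B\rangle$ generated by $A$ and $B$ thus has order $32$ and is a $2$-Sylow subgroup of $G$.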
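The final claim is easy to confirm by machine. The sketch below generates the subgroup $\langle A,B\rangle$ by closing up under multiplication by the generators (enough in a finite group), and determines the exponent $k$ such that $B^{-1}AB=A^k$. The function names are illustrative choices of mine.

```python
P = 7
I2 = ((1, 0), (0, 1))
A = ((0, 1), (1, 1))      # companion matrix of T^2 - T - 1
B = ((1, 0), (1, P - 1))  # the involution exchanging the eigenvectors of A


def mul(x, y):
    """Product of two 2x2 matrices with entries reduced mod P."""
    return tuple(
        tuple(sum(x[i][k] * y[k][j] for k in range(2)) % P for j in range(2))
        for i in range(2)
    )


def power(m, n):
    """n-th power of the matrix m, for n >= 0."""
    result = I2
    for _ in range(n):
        result = mul(result, m)
    return result


def generated_subgroup(gens):
    """Subgroup generated by gens, built by repeated right multiplication."""
    elements = {I2}
    frontier = [I2]
    while frontier:
        new = []
        for g in frontier:
            for s in gens:
                h = mul(g, s)
                if h not in elements:
                    elements.add(h)
                    new.append(h)
        frontier = new
    return elements


S = generated_subgroup([A, B])
conjugate = mul(mul(B, A), B)  # B^{-1} A B, using B^{-1} = B
k = next(k for k in range(1, 16) if power(A, k) == conjugate)
admissible = [j for j in range(1, 16) if (j * j) % 16 == 1]
print(len(S), k, admissible)  # 32 7 [1, 7, 9, 15]
```

The value $k=7$ is indeed congruent to $-1$ modulo $8$, as predicted.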

Generalizing this method involves finding large commutative $p$-subgroups (such as $S_1$) which belong to appropriate (possibly non-split) tori of $\mathop{\rm GL}(n)$ and combining them with adequate parts of their normalizer, which is close to considering Sylow subgroups of the symmetric group. The paper “Sylow $p$-subgroups of the classical groups over finite fields with characteristic prime to $p$” by A. J. Weir gives the general description (as well as for orthogonal and symplectic groups), building on an earlier paper in which he constructed Sylow subgroups of symmetric groups. See also the paper “Some remarks on Sylow subgroups of the general linear groups” by C. R. Leedham-Green and W. Plesken, which says a lot about maximal $p$-subgroups of the general linear group (over not necessarily finite fields). Also, the question was recently the subject of interesting discussions on MathOverflow.

[Edited on Febr. 14 to correct the computation of the 2-Sylow...]