Friday, April 29, 2016

Roth's theorems

A few days ago,  The Scotsman published a paper about Klaus Roth's legacy, explaining how he donated his fortune (1 million pounds) to various charities. This paper was reported by some friends on Facebook. Yuri Bilu added the mention that he knew two important theorems of Roth, and since one of them did not immediately reached my mind, I decided to write this post.

The first theorem was a 1935 conjecture of Erdős and Turán concerning arithmetic progression of length 3 that Roth proved in 1952. That is, one is given a set AA of positive integers and one seeks for triples (a,b,c)(a,b,c) of distinct elements of AA such that a+c=2ba+c=2b; Roth proved that infinitely many such triples exist as soon as the upper density of AA is positive, that is:
lim supx+Card(A[0;x])x>0. \limsup_{x\to+\infty} \frac{\mathop{\rm Card}(A\cap [0;x])}x >0.
In 1975, Endre Szemerédi proved that such sets of integers contain (finite) arithmetic progressions of arbitrarily large length. Other proofs have been given by Hillel Furstenberg (using ergodic theory) and Tim Gowers (by Fourier/combinatorical methods); Roth had used Hardy-Littlewood's circle method.

In 1976, Erdős strengthened his initial conjecture with Turán and predicted that arithmetic progressions of arbitrarily large length exist in AA as soon as
aA1a=+. \sum_{a\in A} \frac 1a =+\infty.
Such a result is still a conjecture, even for arithmetic progressions of length 33, but a remarkable particular case has been proved by Ben Green and Terry Tao in 2004, when AA is the set of all prime numbers.

Outstanding as these results are (Tao has been given the Fields medal in 2006 and Szemerédi the Abel prize in 2012), the second theorem of Roth was proved in 1955 and was certainly the main reason for awarding him the Fields medal in 1958. Indeed, Roth gave a definitive answer to a long standing question in diophantine approximation that originated from the works of Joseph Liouville (1844). Given a real number α\alpha, one is interested to rational fractions p/qp/q that are close to α\alpha, and to the quality of the approximation, namely the exponent nn such that αpq1/qn\left| \alpha- \frac pq \right|\leq 1/q^n. Precisely, the approximation exponent κ(α)\kappa(\alpha) is the largest lower bound of all real numbers nn such that the previous inequality has infinitely many solutions in fractions p/qp/q, and Roth's theorem asserts that one has κ(α)=2\kappa(\alpha)=2 when α\alpha is an irrational algebraic number.

One part of this result goes back to Dirichlet, showing that for any irrational number α\alpha, there exist many good approximations with exponent  22. This can be proved using the theory of continued fractions and is also a classical application of Dirichlet's box principle. Take a positive integer QQ and consider the Q+1Q+1 numbers qαqαq\alpha- \lfloor q\alpha\rfloor in [0,1][0,1], for 0qQ0\leq q\leq Q; two of them must be less that 1/Q1/Q apart; this furnishes integers p,p,q,qp',p'',q',q'', with 0q<qQ0\leq q'<q''\leq Q such that (qαp)(qαp)1/Q\left| (q''\alpha-p'')-(q'\alpha-p')\right|\leq 1/Q; then set p=ppp=p''-p' and q=qqq=q''-q'; one has qαp1/Q\left| q\alpha -p \right|\leq 1/Q, hence αpq1/Qq1/q2\left|\alpha-\frac pq\right|\leq 1/Qq\leq 1/q^2.

To prove an inequality in the other direction, Liouville's argument was that if α\alpha is an irrational root of a nonzero polynomial PZ[T]P\in\mathbf Z[T], then κ(α)deg(P)\kappa(\alpha)\leq\deg(P). The proof is now standard: given an approximation p/qp/q of α\alpha, observe that qdP(p/q)q^d P(p/q) is a non-zero integer (if, say, PP is irreducible), so that qdP(p/q)1\left| q^d P(p/q)\right|\geq 1. On the other hand, P(p/q)(p/qα)P(α)P(p/q)\approx (p/q-\alpha) P'(\alpha), hence an inequality αpqqd\left|\alpha-\frac pq\right|\gg q^{-d}.

This result has been generalized, first by Axel Thue en 1909 (who proved an inequality κ(α)12d+1\kappa(\alpha)\leq \frac12 d+1), then by Carl Ludwig Siegel and Freeman Dyson in 1947 (showing κ(α)2d\kappa(\alpha)\leq 2\sqrt d and κ(α)2d\kappa(\alpha)\leq\sqrt{2d}). While Liouville's result was based in the minimal polynomial of α\alpha, these generalisations required to involve polynomials in two variables, and the non-vanishing of a quantity such that qdP(p/q)q^dP(p/q) above was definitely less trivial. Roth's proof made use of polynomials of arbitrarily large degree, and his remarkable achievement was a proof of the required non-vanishing result.

Roth's proof was “elementary”, making use only of polynomials and wronskians. There are today more geometric proofs, such as the one by Hélène Esnault and Eckart Viehweg (1984) or Michael Nakamaye's subsequent proof (1995) which is based on Faltings's product theorem.

What is still missing, however, is the proof of an effective version of Roth's theorem, that would give, given any real number n>κ(α)n>\kappa(\alpha), an actual integer QQ such that every rational fraction p/qp/q in lowest terms such that αpq1/qn\left|\alpha-\frac pq\right|\leq 1/q^n satisfies qQq\leq Q. It seems that this defect lies at the very heart of almost all of the current approaches in diophantine approximations... 

Wednesday, April 13, 2016

Weierstrass's approximation theorem

I had to mentor an Agrégation leçon entitled Examples of dense subsets. For my own edification (and that of the masses), I want to try to record here as many proofs as of the Weierstrass density theorem as I can : Every complex-valued continuous function on the closed interval [1;1][-1;1] can be uniformly approximated by polynomials. I'll also include as a bonus the trigonometric variant: Every complex-valued continuous and 2π2\pi-periodic function on R\mathbf R can be uniformly approximated by trigonometric polynomials.

1. Using the Stone theorem.

This 1937—1948 theorem is probably the final conceptual brick to the edifice of which Weierstrass laid the first stone in 1885. It asserts that a subalgebra of continuous functions on a compact totally regular (e.g., metric) space is dense for the uniform norm if and only if it separates points. In all presentations that I know of, its proof requires to establish that the absolute value function can be uniformly approximated by polynomials on [1;1][-1;1]:
  • Stone truncates the power series expansion of the function x1(1x2)=n=0(1/2n)(x21)n, x\mapsto \sqrt{1-(1-x^2)}=\sum_{n=0}^\infty \binom{1/2}n (x^2-1)^n, bounding by hand the error term.
  • Bourbaki (Topologie générale, X, p. 36, lemme 2) follows a more elementary approach and begins by proving  that the function xxx\mapsto \sqrt x can be uniformly approximated by polynomials on [0;1][0;1]. (The absolute value function is recovered since xx2\mathopen|x\mathclose|\sqrt{x^2}.) To this aim, he introduces the sequence of polynomials given by p0=0p_0=0 and pn+1(x)=pn(x)+12(xpn(x)2)p_{n+1}(x)=p_n(x)+\frac12\left(x-p_n(x)^2\right) and proves by induction the inequalities 0xpn(x)2x2+nx2n 0\leq \sqrt x-p_n(x) \leq \frac{2\sqrt x}{2+n\sqrt x} \leq \frac 2n for x[0;1]x\in[0;1] and n0n\geq 0. This implies the desired result.
The algebra of polynomials separates points on the compact set [1;1][-1;1], hence is dense. To treat the case of trigonometric polynomials, consider Laurent polynomials on the unit circle.

2. Convolution.

Consider an approximation (ρn)(\rho_n) of the Dirac distribution, i.e., a sequence of continuous, nonnegative and compactly supported functions on R\mathbf R such that ρn=1\int\rho_n=1 and such that for every δ>0\delta>0, x>δρn(x)dx0\int_{\mathopen| x\mathclose|>\delta} \rho_n(x)\,dx\to 0. Given a continuous function ff on R\mathbf R, form the convolutions defined by fρn(x)=Rρn(t)f(xt)dtf*\rho_n(x)=\int_{\mathbf R} \rho_n(t) f(x-t)\, dt. It is classical that fρnf*\rho_n converges uniformly on every compact to ff.

Now, given a continuous function ff on [1;1][-1;1], one can extend it to a continuous function with compact support on R\mathbf R (defining ff to be affine linear on [2;1][-2;-1] and on [1;2][1;2], and to be zero outside of [2;2][-2;2]. We want to choose ρn\rho_n so that fρnf*\rho_n is a polynomial on [1;1][-1;1]. The basic idea is just to choose a parameter a>0a>0, and to take ρn(x)=cn(1(x/a)2)n\rho_n(x)= c_n (1-(x/a)^2)^n for xa\mathopen|x\mathclose|\leq a and ρn(x)=0\rho_n(x)=0 otherwise, with cnc_n adjusted so that ρn=1\int\rho_n=1. Let us write fρn(x)=22ρn(xt)f(t)dtf*\rho_n(x)=\int_{-2}^2 \rho_n(x-t) f(t)\, dt; if x[1;1]x\in[-1;1] and t[2:2]t\in[-2:2], then xt[3;3]x-t\in [-3;3] so we just need to be sure that ρn\rho_n is a polynomial on that interval, which we get by taking, say, a=3a=3. This shows that the restriction of fρnf*\rho_n to [1;1][-1;1] is a polynomial function, and we're done.

This approach is more or less that of D. Jackson (“A Proof of Weierstrass's Theorem,” Amer. Math. Monthly, 1934). The difference is that he considers continuous functions on a closed interval contained in ]0;1[\mathopen]0;1\mathclose[ which he extends linearly to [0;1][0;1] so that they vanish at 00 and 11; he considers the same convolution, taking the parameter a=1a=1.

Weierstrass's own proof (“Über die analytische Darstellbarkeit sogenannter willkurlicher Functionen einer reellen Veranderlichen Sitzungsberichteder,” Königlich Preussischen Akademie der Wissenschaften zu Berlin, 1885) was slightly more sophisticated: he first showed approximation by convolution with the Gaussian kernel  defined by ρn(t)=neπnt2 \rho_n(t) =\sqrt{ n} e^{- \pi n t^2}, and then expanded the kernel as a power series, a suitable truncation of which furnishes the desired polynomials.

As shown by Jacskon, the same approach works easily (in a sense, more easily) for 2π2\pi-periodic functions, considering the kernel defined by ρn(x)=cn(1+cos(x))n\rho_n(x)=c_n(1+\cos(x))^n, where cnc_n is chosen so that \int_{-\pi}^\pi \rho_n=1$.

3. Bernstein polynomials.

Take a continuous function ff on [0;1][0;1] and, for n0n\geq 0, set Bnf(x)=k=0nf(k/n)(nk)tk(1t)nk. B_nf(x) = \sum_{k=0}^n f(k/n) \binom nk t^k (1-t)^{n-k}. It is classical that BnfB_nf converges uniformly to ff on [0;1][0;1].

There are two classical proofs of Bernstein's theorem. One is probabilistic and consists in observing that Bnf(x)B_nf(x) is the expected value of f(Sn)f(S_n), where SnS_n is the sum of nn i.i.d. Bernoulli random variables with parameter x[0;1]x\in[0;1]. Another (generalized as the Korovkin theorem, On convergence of linear positive operators in the space of continuous functions, Dokl. Akad. Nauk SSSR (N.S.), vol. 90,‎ ) consists in showing (i) that for f=1,x,x2f=1,x,x^2, BnfB_nf converges uniformly to ff (an explicit calculation), (ii) that if f0f\geq 0, then Bnf0B_nf\geq 0 as well, (iii) for every x[0;1]x\in[0;1], squeezing ff inbetween two quadratic polynomials f+f^+ and ff_- such that f+(x)f(x)f^+(x)-f^-(x) is as small as desired.

A trigonometric variant would be given by Fejér's theorem that the Cesàro averages of a Fourier series of a continuous, 2π2\pi-periodic function converge uniformly to that function. In turn, Fejér's theorem can be proved in both ways, either by convolution (the Fejér kernel is nonnegative), or by a Korovkine-type argument (replacing 1,x,x21,x,x^2 on [0;1][0;1] by 1,z,z2,z1,z21,z,z^2,z^{-1},z^{-2} on the unit circle).


4. Using approximation by step functions.

This proof originates with a paper of H. Kuhn, “Ein elementarer Beweis des Weierstrasschen Approximationsatzes,” Arch. Math. 15 (1964), p. 316–317.

Let us show that for every δ]0,1[\delta\in\mathopen]0,1\mathclose[ and every ε>0\varepsilon>0, there exists a polynomial pp satisfying the following properties:
  • 0p(x)ε0\leq p(x)\leq \varepsilon for 1xδ-1\leq x\leq-\delta;
  • 0p(x)10\leq p(x)\leq 1 for δxδ-\delta\leq x\leq \delta;
  • 1εp(x)11-\varepsilon\leq p(x)\leq 1 for δx1\delta\leq x\leq 1.
In other words, these polynomials approximate the (discontinuous) function ff on [1;1][-1;1] defined by f(x)=0f(x)=0 for x<0x< 0, f(x)=1f(x)=1 for x>0x> 0 and f(0)=1/2f(0)=1/2.

A possible formula is p(x)=(1((1x)/2))n)2np(x)=(1- ((1-x)/2))^n)^{2^n}, where nn is a large enough integer. First of all, one has 0(1x)/210\leq (1-x)/2\leq 1 for every x[1;1]x\in[-1;1], so that 0p(x)10\leq p(x)\leq 1. Let x[1;δ]x\in[-1;-\delta]; then one has (1x)/2(1+δ)/2(1-x)/2\geq (1+\delta)/2, hence p(x)(1((1+δ)/2)n)2np(x)\leq (1-((1+\delta)/2)^n)^{2^n}, which can be made arbitrarily small when nn\to\infty. Let finally x[δ;1]x\in[\delta;1]; then (1x)/2(1δ)/2(1-x)/2\geq (1-\delta)/2, hence p(x)(1((1δ)/2)n)2n1(1δ)np(x)\geq (1-((1-\delta)/2)^n)^{2^n}\geq 1- (1-\delta)^n, which can be made arbitrarily close to 11 when nn\to\infty.

By translation and dilations, the discontinuity can be placed at any element of [0;1][0;1]. Let now ff be an arbitrary step function and let us write it as a linear combination f=aifif=\sum a_i f_i, where fif_i is a {0,1}\{0,1\}-valued step function. For every ii, let pip_i be a polynomial that approximates fif_i as given above. The linear combination aipi\sum a_i p_i approximates ff with maximal error sup(ai)\sup(\mathopen|a_i\mathclose|).

Using uniform continuity of continuous functions on [1;1][-1;1], every continuous function can be uniformly approximated by a step function. This concludes the proof.

5. Using approximation by piecewise linear functions.

As in the proof of Stone's theorem, one uses the fact that the function xxx\mapsto \mathopen|x\mathclose| is uniformly approximated by a sequence of polynomial on [1;1][-1;1]. Consequently,  so are the functions xmax(0,x)=(x+x)/2x\mapsto \max(0,x)=(x+\mathopen|x\mathclose|)/2 and xmin(0,x)=(xx)/2x\mapsto\min(0,x)=(x-\mathopen|x\mathclose|)/2. By translation and dilation, every continuous piecewise linear function on [1;1][-1;1] with only one break point is uniformly approximated by polynomials. By linear combination, every continuous piecewise linear affine function is uniformly approximated by polynomials.
By uniform continuity, every continuous function can be uniformly approximated by continuous piecewise linear affine functions. Weierstrass's theorem follows.

6. Moments.

A linear subspace AA of a Banach space is dense if and only if every continuous linear form which vanishes on AA is identically 00. In the present case, the dual of C0([1;1],C)C^0([-1;1],\mathbf C) is the space of complex measures on [1;1][-1;1] (Riesz theorem, if one wish, or the definition of a measure). So let μ\mu be a complex measure on [1;1][-1;1] such that 11tndμ(t)=0\int_{-1}^1 t^n \,d\mu(t)=0 for every integer n0n\geq 0; let us show that μ=0\mu=0. This is the classical problem of showing that a complex measure on [1;1][-1;1] is determined by its moments. In fact, the classical proof of this fact runs the other way round, and there must exist ways to reverse the arguments.

One such solution is given in Rudin's Real and complex analysis, where it is more convenient to consider functions on the interval [0;1][0;1]. So, let F(z)=01tzdμ(t)F(z)=\int_0^1 t^z \,d\mu(t). The function FF is holomorphic and bounded on the half-plane (z)>0\Re(z)> 0 and vanishes at the positive integers. At this point, Rudin makes a conform transformation to the unit disk (setting w=(z1)/(z+1)w=(z-1)/(z+1)) and gets a  bounded function on the unit disk with zeroes at (n1)/(n+1)=12/(n+1)(n-1)/(n+1)=1-2/(n+1), for nNn\in\mathbf N, and this contradicts the fact that the series 1/(n+1)\sum 1/(n+1) diverges.

In Rudin, this method is used to prove the more general Müntz–Szász theorem according to which the family (tλn)(t^{\lambda_n}) generates a dense subset of C([0;1])C([0;1]) if and only if 1/λn=+\sum 1/\lambda_n=+\infty.

Here is another solution I learnt in a paper by L. Carleson (“Mergelyan's theorem on uniform polynomial approximation”, Math. Scand., 1964).

For every complex number aa such that a>1\mathopen|a\mathclose|>1, one can write 1/(ta)1/(t-a) as a converging power series. By summation, this quickly gives that
F(a)=111tadμ(t)0. F(a) = \int_{-1}^1 \frac{1}{t-a}\, d\mu(t) \equiv 0.
Observe that this formula defines a holomorphic function on C[1;1]\mathbf C\setminus[-1;1]; by analytic continuous, one thus has F(a)=0F(a)=0 for every a∉[1;1]a\not\in[-1;1].
Take a C2C^2-function gg with compact support on the complex plane. For every tCt\in\mathbf C, one has the following formula
ˉg(z)1tzdxdy=g(t), \iint \bar\partial g(z) \frac{1}{t-z} \, dx\,dy = g(t),
which implies, by integration and Fubini, that
11g(t)dμ(t)=ˉg(z)1tzdμ(t)dxdy=ˉg(z)F(z)dxdy=0. \int_{-1}^1 g(t)\,d\mu(t) = \iint \int \bar\partial g(z) \frac1{t-z}\,d\mu(t)\,dx\,dy = \iint \bar\partial g(z) F(z)\,dx\, dy= 0.
On the other hand, every C2C^2 function on [1;1][-1;1] can be extended to such a function gg, so that the measure μ\mu vanishes on every C2C^2 function on [1;1][-1;1]. Approximating a continuous function by a C2C^2 function (first take a piecewise linear approximation, and round the corners), we get that μ\mu vanishes on every continuous function, as was to be proved.

7. Chebyshev/Markov systems.

This proof is due to P. Borwein and taken from the book Polynomials and polynomial inequalities, by P. Borwein and T. Erdélyi (Graduate Texts in Maths, vol. 161, 1995). Let us say that a sequence (fn)(f_n) of continuous functions on an interval II is a Markov system (resp. a weak Markov system) if for every integer nn, every linear combination of (f0,,fn)(f_0,\dots,f_n) has at most nn zeroes (resp. nn sign changes) in II.

Given a Markov system (fn)(f_n), one defines a sequence (Tn)(T_n), where TnfnT_n-f_n is the element of f0,,fn1\langle f_0,\dots,f_{n-1}\rangle which is the closest to fnf_n. The function TnT_n has nn zeroes on the interval II; let MnM_n be the maximum distance between two consecutive zeroes.

Borwein's theorem  (Theorem 4.1.1 in the mentioned book) then asserts that if the sequence (fn)(f_n) is a Markov system consisting of C1C^1 functions, then its linear span is dense in C(I)C(I) if and only if Mn0M_n\to 0.

The sequence of monomials (xn)(x^n) on I=[1;1]I=[-1;1] is of course a Markov system.  In this case, the polynomial TnT_n is the nnth Chebyshev polynomial, given by Tn(2cos(x))=2cos(nx)T_n(2\cos(x))=2\cos(nx), and its roots are given by 2cos((π+2kπ)/2n)2\cos((\pi+2k\pi)/2n), for k=0,,n1k=0,\dots,n-1, and Mnπ/nM_n\leq \pi/n. This gives yet another proof of Weierstrass's approximation theorem.