A few days ago, The Scotsman published a paper about Klaus Roth's legacy, explaining how he donated his fortune (1 million pounds) to various charities. This paper was reported by some friends on Facebook. Yuri Bilu added the mention that he knew two important theorems of Roth, and since one of them did not immediately reached my mind, I decided to write this post.
The first theorem was a 1935 conjecture of Erdős and Turán concerning arithmetic progression of length 3 that Roth proved in 1952. That is, one is given a set of positive integers and one seeks for triples of distinct elements of such that ; Roth proved that infinitely many such triples exist as soon as the upper density of is positive, that is:
In 1975, Endre Szemerédi proved that such sets of integers contain (finite) arithmetic progressions of arbitrarily large length. Other proofs have been given by Hillel Furstenberg (using ergodic theory) and Tim Gowers (by Fourier/combinatorical methods); Roth had used Hardy-Littlewood's circle method.
In 1976, Erdős strengthened his initial conjecture with Turán and predicted that arithmetic progressions of arbitrarily large length exist in as soon as
Such a result is still a conjecture, even for arithmetic progressions of length , but a remarkable particular case has been proved by Ben Green and Terry Tao in 2004, when is the set of all prime numbers.
Outstanding as these results are (Tao has been given the Fields medal in 2006 and Szemerédi the Abel prize in 2012), the second theorem of Roth was proved in 1955 and was certainly the main reason for awarding him the Fields medal in 1958. Indeed, Roth gave a definitive answer to a long standing question in diophantine approximation that originated from the works of Joseph Liouville (1844). Given a real number , one is interested to rational fractions that are close to , and to the quality of the approximation, namely the exponent such that . Precisely, the approximation exponent is the largest lower bound of all real numbers such that the previous inequality has infinitely many solutions in fractions , and Roth's theorem asserts that one has when is an irrational algebraic number.
One part of this result goes back to Dirichlet, showing that for any irrational number , there exist many good approximations with exponent . This can be proved using the theory of continued fractions and is also a classical application of Dirichlet's box principle. Take a positive integer and consider the numbers in , for ; two of them must be less that apart; this furnishes integers , with such that ; then set and ; one has , hence .
To prove an inequality in the other direction, Liouville's argument was that if is an irrational root of a nonzero polynomial , then . The proof is now standard: given an approximation of , observe that is a non-zero integer (if, say, is irreducible), so that . On the other hand, , hence an inequality .
This result has been generalized, first by Axel Thue en 1909 (who proved an inequality ), then by Carl Ludwig Siegel and Freeman Dyson in 1947 (showing and ). While Liouville's result was based in the minimal polynomial of , these generalisations required to involve polynomials in two variables, and the non-vanishing of a quantity such that above was definitely less trivial. Roth's proof made use of polynomials of arbitrarily large degree, and his remarkable achievement was a proof of the required non-vanishing result.
Roth's proof was “elementary”, making use only of polynomials and wronskians. There are today more geometric proofs, such as the one by Hélène Esnault and Eckart Viehweg (1984) or Michael Nakamaye's subsequent proof (1995) which is based on Faltings's product theorem.
What is still missing, however, is the proof of an effective version of Roth's theorem, that would give, given any real number , an actual integer such that every rational fraction in lowest terms such that satisfies . It seems that this defect lies at the very heart of almost all of the current approaches in diophantine approximations...
Friday, April 29, 2016
Wednesday, April 13, 2016
Weierstrass's approximation theorem
Libellés :
agrégation
,
density
,
topology
I had to mentor an Agrégation leçon entitled Examples of dense subsets. For my own edification (and that of the masses), I want to try to record here as many proofs as of the Weierstrass density theorem as I can : Every complex-valued continuous function on the closed interval [−1;1] can be uniformly approximated by polynomials. I'll also include as a bonus the trigonometric variant: Every complex-valued continuous and 2π-periodic function on R can be uniformly approximated by trigonometric polynomials.
1. Using the Stone theorem.
This 1937—1948 theorem is probably the final conceptual brick to the edifice of which Weierstrass laid the first stone in 1885. It asserts that a subalgebra of continuous functions on a compact totally regular (e.g., metric) space is dense for the uniform norm if and only if it separates points. In all presentations that I know of, its proof requires to establish that the absolute value function can be uniformly approximated by polynomials on [−1;1]:
2. Convolution.
Consider an approximation (ρn) of the Dirac distribution, i.e., a sequence of continuous, nonnegative and compactly supported functions on R such that ∫ρn=1 and such that for every δ>0, ∫∣x∣>δρn(x)dx→0. Given a continuous function f on R, form the convolutions defined by f∗ρn(x)=∫Rρn(t)f(x−t)dt. It is classical that f∗ρn converges uniformly on every compact to f.
Now, given a continuous function f on [−1;1], one can extend it to a continuous function with compact support on R (defining f to be affine linear on [−2;−1] and on [1;2], and to be zero outside of [−2;2]. We want to choose ρn so that f∗ρn is a polynomial on [−1;1]. The basic idea is just to choose a parameter a>0, and to take ρn(x)=cn(1−(x/a)2)n for ∣x∣≤a and ρn(x)=0 otherwise, with cn adjusted so that ∫ρn=1. Let us write f∗ρn(x)=∫−22ρn(x−t)f(t)dt; if x∈[−1;1] and t∈[−2:2], then x−t∈[−3;3] so we just need to be sure that ρn is a polynomial on that interval, which we get by taking, say, a=3. This shows that the restriction of f∗ρn to [−1;1] is a polynomial function, and we're done.
This approach is more or less that of D. Jackson (“A Proof of Weierstrass's Theorem,” Amer. Math. Monthly, 1934). The difference is that he considers continuous functions on a closed interval contained in ]0;1[ which he extends linearly to [0;1] so that they vanish at 0 and 1; he considers the same convolution, taking the parameter a=1.
Weierstrass's own proof (“Über die analytische Darstellbarkeit sogenannter willkurlicher Functionen einer reellen Veranderlichen Sitzungsberichteder,” Königlich Preussischen Akademie der Wissenschaften zu Berlin, 1885) was slightly more sophisticated: he first showed approximation by convolution with the Gaussian kernel defined by ρn(t)=ne−πnt2, and then expanded the kernel as a power series, a suitable truncation of which furnishes the desired polynomials.
As shown by Jacskon, the same approach works easily (in a sense, more easily) for 2π-periodic functions, considering the kernel defined by ρn(x)=cn(1+cos(x))n, where cn is chosen so that \int_{-\pi}^\pi \rho_n=1$.
3. Bernstein polynomials.
Take a continuous function f on [0;1] and, for n≥0, set Bnf(x)=k=0∑nf(k/n)(kn)tk(1−t)n−k. It is classical that Bnf converges uniformly to f on [0;1].
There are two classical proofs of Bernstein's theorem. One is probabilistic and consists in observing that Bnf(x) is the expected value of f(Sn), where Sn is the sum of n i.i.d. Bernoulli random variables with parameter x∈[0;1]. Another (generalized as the Korovkin theorem, “On convergence of linear positive operators in the space of continuous functions”, Dokl. Akad. Nauk SSSR (N.S.), vol. 90, ) consists in showing (i) that for f=1,x,x2, Bnf converges uniformly to f (an explicit calculation), (ii) that if f≥0, then Bnf≥0 as well, (iii) for every x∈[0;1], squeezing f inbetween two quadratic polynomials f+ and f− such that f+(x)−f−(x) is as small as desired.
A trigonometric variant would be given by Fejér's theorem that the Cesàro averages of a Fourier series of a continuous, 2π-periodic function converge uniformly to that function. In turn, Fejér's theorem can be proved in both ways, either by convolution (the Fejér kernel is nonnegative), or by a Korovkine-type argument (replacing 1,x,x2 on [0;1] by 1,z,z2,z−1,z−2 on the unit circle).
4. Using approximation by step functions.
This proof originates with a paper of H. Kuhn, “Ein elementarer Beweis des Weierstrasschen Approximationsatzes,” Arch. Math. 15 (1964), p. 316–317.
Let us show that for every δ∈]0,1[ and every ε>0, there exists a polynomial p satisfying the following properties:
A possible formula is p(x)=(1−((1−x)/2))n)2n, where n is a large enough integer. First of all, one has 0≤(1−x)/2≤1 for every x∈[−1;1], so that 0≤p(x)≤1. Let x∈[−1;−δ]; then one has (1−x)/2≥(1+δ)/2, hence p(x)≤(1−((1+δ)/2)n)2n, which can be made arbitrarily small when n→∞. Let finally x∈[δ;1]; then (1−x)/2≥(1−δ)/2, hence p(x)≥(1−((1−δ)/2)n)2n≥1−(1−δ)n, which can be made arbitrarily close to 1 when n→∞.
By translation and dilations, the discontinuity can be placed at any element of [0;1]. Let now f be an arbitrary step function and let us write it as a linear combination f=∑aifi, where fi is a {0,1}-valued step function. For every i, let pi be a polynomial that approximates fi as given above. The linear combination ∑aipi approximates f with maximal error sup(∣ai∣).
Using uniform continuity of continuous functions on [−1;1], every continuous function can be uniformly approximated by a step function. This concludes the proof.
5. Using approximation by piecewise linear functions.
As in the proof of Stone's theorem, one uses the fact that the function x↦∣x∣ is uniformly approximated by a sequence of polynomial on [−1;1]. Consequently, so are the functions x↦max(0,x)=(x+∣x∣)/2 and x↦min(0,x)=(x−∣x∣)/2. By translation and dilation, every continuous piecewise linear function on [−1;1] with only one break point is uniformly approximated by polynomials. By linear combination, every continuous piecewise linear affine function is uniformly approximated by polynomials.
By uniform continuity, every continuous function can be uniformly approximated by continuous piecewise linear affine functions. Weierstrass's theorem follows.
6. Moments.
A linear subspace A of a Banach space is dense if and only if every continuous linear form which vanishes on A is identically 0. In the present case, the dual of C0([−1;1],C) is the space of complex measures on [−1;1] (Riesz theorem, if one wish, or the definition of a measure). So let μ be a complex measure on [−1;1] such that ∫−11tndμ(t)=0 for every integer n≥0; let us show that μ=0. This is the classical problem of showing that a complex measure on [−1;1] is determined by its moments. In fact, the classical proof of this fact runs the other way round, and there must exist ways to reverse the arguments.
One such solution is given in Rudin's Real and complex analysis, where it is more convenient to consider functions on the interval [0;1]. So, let F(z)=∫01tzdμ(t). The function F is holomorphic and bounded on the half-plane ℜ(z)>0 and vanishes at the positive integers. At this point, Rudin makes a conform transformation to the unit disk (setting w=(z−1)/(z+1)) and gets a bounded function on the unit disk with zeroes at (n−1)/(n+1)=1−2/(n+1), for n∈N, and this contradicts the fact that the series ∑1/(n+1) diverges.
In Rudin, this method is used to prove the more general Müntz–Szász theorem according to which the family (tλn) generates a dense subset of C([0;1]) if and only if ∑1/λn=+∞.
Here is another solution I learnt in a paper by L. Carleson (“Mergelyan's theorem on uniform polynomial approximation”, Math. Scand., 1964).
For every complex number a such that ∣a∣>1, one can write 1/(t−a) as a converging power series. By summation, this quickly gives that
F(a)=∫−11t−a1dμ(t)≡0.
Observe that this formula defines a holomorphic function on C∖[−1;1]; by analytic continuous, one thus has F(a)=0 for every a∈[−1;1].
Take a C2-function g with compact support on the complex plane. For every t∈C, one has the following formula
∬∂ˉg(z)t−z1dxdy=g(t),
which implies, by integration and Fubini, that
∫−11g(t)dμ(t)=∬∫∂ˉg(z)t−z1dμ(t)dxdy=∬∂ˉg(z)F(z)dxdy=0.
On the other hand, every C2 function on [−1;1] can be extended to such a function g, so that the measure μ vanishes on every C2 function on [−1;1]. Approximating a continuous function by a C2 function (first take a piecewise linear approximation, and round the corners), we get that μ vanishes on every continuous function, as was to be proved.
7. Chebyshev/Markov systems.
This proof is due to P. Borwein and taken from the book Polynomials and polynomial inequalities, by P. Borwein and T. Erdélyi (Graduate Texts in Maths, vol. 161, 1995). Let us say that a sequence (fn) of continuous functions on an interval I is a Markov system (resp. a weak Markov system) if for every integer n, every linear combination of (f0,…,fn) has at most n zeroes (resp. n sign changes) in I.
Given a Markov system (fn), one defines a sequence (Tn), where Tn−fn is the element of ⟨f0,…,fn−1⟩ which is the closest to fn. The function Tn has n zeroes on the interval I; let Mn be the maximum distance between two consecutive zeroes.
Borwein's theorem (Theorem 4.1.1 in the mentioned book) then asserts that if the sequence (fn) is a Markov system consisting of C1 functions, then its linear span is dense in C(I) if and only if Mn→0.
The sequence of monomials (xn) on I=[−1;1] is of course a Markov system. In this case, the polynomial Tn is the nth Chebyshev polynomial, given by Tn(2cos(x))=2cos(nx), and its roots are given by 2cos((π+2kπ)/2n), for k=0,…,n−1, and Mn≤π/n. This gives yet another proof of Weierstrass's approximation theorem.
1. Using the Stone theorem.
This 1937—1948 theorem is probably the final conceptual brick to the edifice of which Weierstrass laid the first stone in 1885. It asserts that a subalgebra of continuous functions on a compact totally regular (e.g., metric) space is dense for the uniform norm if and only if it separates points. In all presentations that I know of, its proof requires to establish that the absolute value function can be uniformly approximated by polynomials on [−1;1]:
- Stone truncates the power series expansion of the function x↦1−(1−x2)=n=0∑∞(n1/2)(x2−1)n, bounding by hand the error term.
- Bourbaki (Topologie générale, X, p. 36, lemme 2) follows a more elementary approach and begins by proving that the function x↦x can be uniformly approximated by polynomials on [0;1]. (The absolute value function is recovered since ∣x∣x2.) To this aim, he introduces the sequence of polynomials given by p0=0 and pn+1(x)=pn(x)+21(x−pn(x)2) and proves by induction the inequalities 0≤x−pn(x)≤2+nx2x≤n2 for x∈[0;1] and n≥0. This implies the desired result.
2. Convolution.
Consider an approximation (ρn) of the Dirac distribution, i.e., a sequence of continuous, nonnegative and compactly supported functions on R such that ∫ρn=1 and such that for every δ>0, ∫∣x∣>δρn(x)dx→0. Given a continuous function f on R, form the convolutions defined by f∗ρn(x)=∫Rρn(t)f(x−t)dt. It is classical that f∗ρn converges uniformly on every compact to f.
Now, given a continuous function f on [−1;1], one can extend it to a continuous function with compact support on R (defining f to be affine linear on [−2;−1] and on [1;2], and to be zero outside of [−2;2]. We want to choose ρn so that f∗ρn is a polynomial on [−1;1]. The basic idea is just to choose a parameter a>0, and to take ρn(x)=cn(1−(x/a)2)n for ∣x∣≤a and ρn(x)=0 otherwise, with cn adjusted so that ∫ρn=1. Let us write f∗ρn(x)=∫−22ρn(x−t)f(t)dt; if x∈[−1;1] and t∈[−2:2], then x−t∈[−3;3] so we just need to be sure that ρn is a polynomial on that interval, which we get by taking, say, a=3. This shows that the restriction of f∗ρn to [−1;1] is a polynomial function, and we're done.
This approach is more or less that of D. Jackson (“A Proof of Weierstrass's Theorem,” Amer. Math. Monthly, 1934). The difference is that he considers continuous functions on a closed interval contained in ]0;1[ which he extends linearly to [0;1] so that they vanish at 0 and 1; he considers the same convolution, taking the parameter a=1.
Weierstrass's own proof (“Über die analytische Darstellbarkeit sogenannter willkurlicher Functionen einer reellen Veranderlichen Sitzungsberichteder,” Königlich Preussischen Akademie der Wissenschaften zu Berlin, 1885) was slightly more sophisticated: he first showed approximation by convolution with the Gaussian kernel defined by ρn(t)=ne−πnt2, and then expanded the kernel as a power series, a suitable truncation of which furnishes the desired polynomials.
As shown by Jacskon, the same approach works easily (in a sense, more easily) for 2π-periodic functions, considering the kernel defined by ρn(x)=cn(1+cos(x))n, where cn is chosen so that \int_{-\pi}^\pi \rho_n=1$.
3. Bernstein polynomials.
Take a continuous function f on [0;1] and, for n≥0, set Bnf(x)=k=0∑nf(k/n)(kn)tk(1−t)n−k. It is classical that Bnf converges uniformly to f on [0;1].
There are two classical proofs of Bernstein's theorem. One is probabilistic and consists in observing that Bnf(x) is the expected value of f(Sn), where Sn is the sum of n i.i.d. Bernoulli random variables with parameter x∈[0;1]. Another (generalized as the Korovkin theorem, “On convergence of linear positive operators in the space of continuous functions”, Dokl. Akad. Nauk SSSR (N.S.), vol. 90, ) consists in showing (i) that for f=1,x,x2, Bnf converges uniformly to f (an explicit calculation), (ii) that if f≥0, then Bnf≥0 as well, (iii) for every x∈[0;1], squeezing f inbetween two quadratic polynomials f+ and f− such that f+(x)−f−(x) is as small as desired.
A trigonometric variant would be given by Fejér's theorem that the Cesàro averages of a Fourier series of a continuous, 2π-periodic function converge uniformly to that function. In turn, Fejér's theorem can be proved in both ways, either by convolution (the Fejér kernel is nonnegative), or by a Korovkine-type argument (replacing 1,x,x2 on [0;1] by 1,z,z2,z−1,z−2 on the unit circle).
4. Using approximation by step functions.
This proof originates with a paper of H. Kuhn, “Ein elementarer Beweis des Weierstrasschen Approximationsatzes,” Arch. Math. 15 (1964), p. 316–317.
Let us show that for every δ∈]0,1[ and every ε>0, there exists a polynomial p satisfying the following properties:
- 0≤p(x)≤ε for −1≤x≤−δ;
- 0≤p(x)≤1 for −δ≤x≤δ;
- 1−ε≤p(x)≤1 for δ≤x≤1.
A possible formula is p(x)=(1−((1−x)/2))n)2n, where n is a large enough integer. First of all, one has 0≤(1−x)/2≤1 for every x∈[−1;1], so that 0≤p(x)≤1. Let x∈[−1;−δ]; then one has (1−x)/2≥(1+δ)/2, hence p(x)≤(1−((1+δ)/2)n)2n, which can be made arbitrarily small when n→∞. Let finally x∈[δ;1]; then (1−x)/2≥(1−δ)/2, hence p(x)≥(1−((1−δ)/2)n)2n≥1−(1−δ)n, which can be made arbitrarily close to 1 when n→∞.
By translation and dilations, the discontinuity can be placed at any element of [0;1]. Let now f be an arbitrary step function and let us write it as a linear combination f=∑aifi, where fi is a {0,1}-valued step function. For every i, let pi be a polynomial that approximates fi as given above. The linear combination ∑aipi approximates f with maximal error sup(∣ai∣).
Using uniform continuity of continuous functions on [−1;1], every continuous function can be uniformly approximated by a step function. This concludes the proof.
5. Using approximation by piecewise linear functions.
As in the proof of Stone's theorem, one uses the fact that the function x↦∣x∣ is uniformly approximated by a sequence of polynomial on [−1;1]. Consequently, so are the functions x↦max(0,x)=(x+∣x∣)/2 and x↦min(0,x)=(x−∣x∣)/2. By translation and dilation, every continuous piecewise linear function on [−1;1] with only one break point is uniformly approximated by polynomials. By linear combination, every continuous piecewise linear affine function is uniformly approximated by polynomials.
By uniform continuity, every continuous function can be uniformly approximated by continuous piecewise linear affine functions. Weierstrass's theorem follows.
6. Moments.
A linear subspace A of a Banach space is dense if and only if every continuous linear form which vanishes on A is identically 0. In the present case, the dual of C0([−1;1],C) is the space of complex measures on [−1;1] (Riesz theorem, if one wish, or the definition of a measure). So let μ be a complex measure on [−1;1] such that ∫−11tndμ(t)=0 for every integer n≥0; let us show that μ=0. This is the classical problem of showing that a complex measure on [−1;1] is determined by its moments. In fact, the classical proof of this fact runs the other way round, and there must exist ways to reverse the arguments.
One such solution is given in Rudin's Real and complex analysis, where it is more convenient to consider functions on the interval [0;1]. So, let F(z)=∫01tzdμ(t). The function F is holomorphic and bounded on the half-plane ℜ(z)>0 and vanishes at the positive integers. At this point, Rudin makes a conform transformation to the unit disk (setting w=(z−1)/(z+1)) and gets a bounded function on the unit disk with zeroes at (n−1)/(n+1)=1−2/(n+1), for n∈N, and this contradicts the fact that the series ∑1/(n+1) diverges.
In Rudin, this method is used to prove the more general Müntz–Szász theorem according to which the family (tλn) generates a dense subset of C([0;1]) if and only if ∑1/λn=+∞.
Here is another solution I learnt in a paper by L. Carleson (“Mergelyan's theorem on uniform polynomial approximation”, Math. Scand., 1964).
For every complex number a such that ∣a∣>1, one can write 1/(t−a) as a converging power series. By summation, this quickly gives that
F(a)=∫−11t−a1dμ(t)≡0.
Observe that this formula defines a holomorphic function on C∖[−1;1]; by analytic continuous, one thus has F(a)=0 for every a∈[−1;1].
Take a C2-function g with compact support on the complex plane. For every t∈C, one has the following formula
∬∂ˉg(z)t−z1dxdy=g(t),
which implies, by integration and Fubini, that
∫−11g(t)dμ(t)=∬∫∂ˉg(z)t−z1dμ(t)dxdy=∬∂ˉg(z)F(z)dxdy=0.
On the other hand, every C2 function on [−1;1] can be extended to such a function g, so that the measure μ vanishes on every C2 function on [−1;1]. Approximating a continuous function by a C2 function (first take a piecewise linear approximation, and round the corners), we get that μ vanishes on every continuous function, as was to be proved.
7. Chebyshev/Markov systems.
This proof is due to P. Borwein and taken from the book Polynomials and polynomial inequalities, by P. Borwein and T. Erdélyi (Graduate Texts in Maths, vol. 161, 1995). Let us say that a sequence (fn) of continuous functions on an interval I is a Markov system (resp. a weak Markov system) if for every integer n, every linear combination of (f0,…,fn) has at most n zeroes (resp. n sign changes) in I.
Given a Markov system (fn), one defines a sequence (Tn), where Tn−fn is the element of ⟨f0,…,fn−1⟩ which is the closest to fn. The function Tn has n zeroes on the interval I; let Mn be the maximum distance between two consecutive zeroes.
Borwein's theorem (Theorem 4.1.1 in the mentioned book) then asserts that if the sequence (fn) is a Markov system consisting of C1 functions, then its linear span is dense in C(I) if and only if Mn→0.
The sequence of monomials (xn) on I=[−1;1] is of course a Markov system. In this case, the polynomial Tn is the nth Chebyshev polynomial, given by Tn(2cos(x))=2cos(nx), and its roots are given by 2cos((π+2kπ)/2n), for k=0,…,n−1, and Mn≤π/n. This gives yet another proof of Weierstrass's approximation theorem.
Subscribe to:
Posts
(
Atom
)