Wednesday, April 13, 2016

Weierstrass's approximation theorem

I had to mentor an Agrégation leçon entitled Examples of dense subsets. For my own edification (and that of the masses), I want to try to record here as many proofs as of the Weierstrass density theorem as I can : Every complex-valued continuous function on the closed interval [1;1][-1;1] can be uniformly approximated by polynomials. I'll also include as a bonus the trigonometric variant: Every complex-valued continuous and 2π2\pi-periodic function on R\mathbf R can be uniformly approximated by trigonometric polynomials.

1. Using the Stone theorem.

This 1937—1948 theorem is probably the final conceptual brick to the edifice of which Weierstrass laid the first stone in 1885. It asserts that a subalgebra of continuous functions on a compact totally regular (e.g., metric) space is dense for the uniform norm if and only if it separates points. In all presentations that I know of, its proof requires to establish that the absolute value function can be uniformly approximated by polynomials on [1;1][-1;1]:
  • Stone truncates the power series expansion of the function x1(1x2)=n=0(1/2n)(x21)n, x\mapsto \sqrt{1-(1-x^2)}=\sum_{n=0}^\infty \binom{1/2}n (x^2-1)^n, bounding by hand the error term.
  • Bourbaki (Topologie générale, X, p. 36, lemme 2) follows a more elementary approach and begins by proving  that the function xxx\mapsto \sqrt x can be uniformly approximated by polynomials on [0;1][0;1]. (The absolute value function is recovered since xx2\mathopen|x\mathclose|\sqrt{x^2}.) To this aim, he introduces the sequence of polynomials given by p0=0p_0=0 and pn+1(x)=pn(x)+12(xpn(x)2)p_{n+1}(x)=p_n(x)+\frac12\left(x-p_n(x)^2\right) and proves by induction the inequalities 0xpn(x)2x2+nx2n 0\leq \sqrt x-p_n(x) \leq \frac{2\sqrt x}{2+n\sqrt x} \leq \frac 2n for x[0;1]x\in[0;1] and n0n\geq 0. This implies the desired result.
The algebra of polynomials separates points on the compact set [1;1][-1;1], hence is dense. To treat the case of trigonometric polynomials, consider Laurent polynomials on the unit circle.

2. Convolution.

Consider an approximation (ρn)(\rho_n) of the Dirac distribution, i.e., a sequence of continuous, nonnegative and compactly supported functions on R\mathbf R such that ρn=1\int\rho_n=1 and such that for every δ>0\delta>0, x>δρn(x)dx0\int_{\mathopen| x\mathclose|>\delta} \rho_n(x)\,dx\to 0. Given a continuous function ff on R\mathbf R, form the convolutions defined by fρn(x)=Rρn(t)f(xt)dtf*\rho_n(x)=\int_{\mathbf R} \rho_n(t) f(x-t)\, dt. It is classical that fρnf*\rho_n converges uniformly on every compact to ff.

Now, given a continuous function ff on [1;1][-1;1], one can extend it to a continuous function with compact support on R\mathbf R (defining ff to be affine linear on [2;1][-2;-1] and on [1;2][1;2], and to be zero outside of [2;2][-2;2]. We want to choose ρn\rho_n so that fρnf*\rho_n is a polynomial on [1;1][-1;1]. The basic idea is just to choose a parameter a>0a>0, and to take ρn(x)=cn(1(x/a)2)n\rho_n(x)= c_n (1-(x/a)^2)^n for xa\mathopen|x\mathclose|\leq a and ρn(x)=0\rho_n(x)=0 otherwise, with cnc_n adjusted so that ρn=1\int\rho_n=1. Let us write fρn(x)=22ρn(xt)f(t)dtf*\rho_n(x)=\int_{-2}^2 \rho_n(x-t) f(t)\, dt; if x[1;1]x\in[-1;1] and t[2:2]t\in[-2:2], then xt[3;3]x-t\in [-3;3] so we just need to be sure that ρn\rho_n is a polynomial on that interval, which we get by taking, say, a=3a=3. This shows that the restriction of fρnf*\rho_n to [1;1][-1;1] is a polynomial function, and we're done.

This approach is more or less that of D. Jackson (“A Proof of Weierstrass's Theorem,” Amer. Math. Monthly, 1934). The difference is that he considers continuous functions on a closed interval contained in ]0;1[\mathopen]0;1\mathclose[ which he extends linearly to [0;1][0;1] so that they vanish at 00 and 11; he considers the same convolution, taking the parameter a=1a=1.

Weierstrass's own proof (“Über die analytische Darstellbarkeit sogenannter willkurlicher Functionen einer reellen Veranderlichen Sitzungsberichteder,” Königlich Preussischen Akademie der Wissenschaften zu Berlin, 1885) was slightly more sophisticated: he first showed approximation by convolution with the Gaussian kernel  defined by ρn(t)=neπnt2 \rho_n(t) =\sqrt{ n} e^{- \pi n t^2}, and then expanded the kernel as a power series, a suitable truncation of which furnishes the desired polynomials.

As shown by Jacskon, the same approach works easily (in a sense, more easily) for 2π2\pi-periodic functions, considering the kernel defined by ρn(x)=cn(1+cos(x))n\rho_n(x)=c_n(1+\cos(x))^n, where cnc_n is chosen so that \int_{-\pi}^\pi \rho_n=1$.

3. Bernstein polynomials.

Take a continuous function ff on [0;1][0;1] and, for n0n\geq 0, set Bnf(x)=k=0nf(k/n)(nk)tk(1t)nk. B_nf(x) = \sum_{k=0}^n f(k/n) \binom nk t^k (1-t)^{n-k}. It is classical that BnfB_nf converges uniformly to ff on [0;1][0;1].

There are two classical proofs of Bernstein's theorem. One is probabilistic and consists in observing that Bnf(x)B_nf(x) is the expected value of f(Sn)f(S_n), where SnS_n is the sum of nn i.i.d. Bernoulli random variables with parameter x[0;1]x\in[0;1]. Another (generalized as the Korovkin theorem, On convergence of linear positive operators in the space of continuous functions, Dokl. Akad. Nauk SSSR (N.S.), vol. 90,‎ ) consists in showing (i) that for f=1,x,x2f=1,x,x^2, BnfB_nf converges uniformly to ff (an explicit calculation), (ii) that if f0f\geq 0, then Bnf0B_nf\geq 0 as well, (iii) for every x[0;1]x\in[0;1], squeezing ff inbetween two quadratic polynomials f+f^+ and ff_- such that f+(x)f(x)f^+(x)-f^-(x) is as small as desired.

A trigonometric variant would be given by Fejér's theorem that the Cesàro averages of a Fourier series of a continuous, 2π2\pi-periodic function converge uniformly to that function. In turn, Fejér's theorem can be proved in both ways, either by convolution (the Fejér kernel is nonnegative), or by a Korovkine-type argument (replacing 1,x,x21,x,x^2 on [0;1][0;1] by 1,z,z2,z1,z21,z,z^2,z^{-1},z^{-2} on the unit circle).


4. Using approximation by step functions.

This proof originates with a paper of H. Kuhn, “Ein elementarer Beweis des Weierstrasschen Approximationsatzes,” Arch. Math. 15 (1964), p. 316–317.

Let us show that for every δ]0,1[\delta\in\mathopen]0,1\mathclose[ and every ε>0\varepsilon>0, there exists a polynomial pp satisfying the following properties:
  • 0p(x)ε0\leq p(x)\leq \varepsilon for 1xδ-1\leq x\leq-\delta;
  • 0p(x)10\leq p(x)\leq 1 for δxδ-\delta\leq x\leq \delta;
  • 1εp(x)11-\varepsilon\leq p(x)\leq 1 for δx1\delta\leq x\leq 1.
In other words, these polynomials approximate the (discontinuous) function ff on [1;1][-1;1] defined by f(x)=0f(x)=0 for x<0x< 0, f(x)=1f(x)=1 for x>0x> 0 and f(0)=1/2f(0)=1/2.

A possible formula is p(x)=(1((1x)/2))n)2np(x)=(1- ((1-x)/2))^n)^{2^n}, where nn is a large enough integer. First of all, one has 0(1x)/210\leq (1-x)/2\leq 1 for every x[1;1]x\in[-1;1], so that 0p(x)10\leq p(x)\leq 1. Let x[1;δ]x\in[-1;-\delta]; then one has (1x)/2(1+δ)/2(1-x)/2\geq (1+\delta)/2, hence p(x)(1((1+δ)/2)n)2np(x)\leq (1-((1+\delta)/2)^n)^{2^n}, which can be made arbitrarily small when nn\to\infty. Let finally x[δ;1]x\in[\delta;1]; then (1x)/2(1δ)/2(1-x)/2\geq (1-\delta)/2, hence p(x)(1((1δ)/2)n)2n1(1δ)np(x)\geq (1-((1-\delta)/2)^n)^{2^n}\geq 1- (1-\delta)^n, which can be made arbitrarily close to 11 when nn\to\infty.

By translation and dilations, the discontinuity can be placed at any element of [0;1][0;1]. Let now ff be an arbitrary step function and let us write it as a linear combination f=aifif=\sum a_i f_i, where fif_i is a {0,1}\{0,1\}-valued step function. For every ii, let pip_i be a polynomial that approximates fif_i as given above. The linear combination aipi\sum a_i p_i approximates ff with maximal error sup(ai)\sup(\mathopen|a_i\mathclose|).

Using uniform continuity of continuous functions on [1;1][-1;1], every continuous function can be uniformly approximated by a step function. This concludes the proof.

5. Using approximation by piecewise linear functions.

As in the proof of Stone's theorem, one uses the fact that the function xxx\mapsto \mathopen|x\mathclose| is uniformly approximated by a sequence of polynomial on [1;1][-1;1]. Consequently,  so are the functions xmax(0,x)=(x+x)/2x\mapsto \max(0,x)=(x+\mathopen|x\mathclose|)/2 and xmin(0,x)=(xx)/2x\mapsto\min(0,x)=(x-\mathopen|x\mathclose|)/2. By translation and dilation, every continuous piecewise linear function on [1;1][-1;1] with only one break point is uniformly approximated by polynomials. By linear combination, every continuous piecewise linear affine function is uniformly approximated by polynomials.
By uniform continuity, every continuous function can be uniformly approximated by continuous piecewise linear affine functions. Weierstrass's theorem follows.

6. Moments.

A linear subspace AA of a Banach space is dense if and only if every continuous linear form which vanishes on AA is identically 00. In the present case, the dual of C0([1;1],C)C^0([-1;1],\mathbf C) is the space of complex measures on [1;1][-1;1] (Riesz theorem, if one wish, or the definition of a measure). So let μ\mu be a complex measure on [1;1][-1;1] such that 11tndμ(t)=0\int_{-1}^1 t^n \,d\mu(t)=0 for every integer n0n\geq 0; let us show that μ=0\mu=0. This is the classical problem of showing that a complex measure on [1;1][-1;1] is determined by its moments. In fact, the classical proof of this fact runs the other way round, and there must exist ways to reverse the arguments.

One such solution is given in Rudin's Real and complex analysis, where it is more convenient to consider functions on the interval [0;1][0;1]. So, let F(z)=01tzdμ(t)F(z)=\int_0^1 t^z \,d\mu(t). The function FF is holomorphic and bounded on the half-plane (z)>0\Re(z)> 0 and vanishes at the positive integers. At this point, Rudin makes a conform transformation to the unit disk (setting w=(z1)/(z+1)w=(z-1)/(z+1)) and gets a  bounded function on the unit disk with zeroes at (n1)/(n+1)=12/(n+1)(n-1)/(n+1)=1-2/(n+1), for nNn\in\mathbf N, and this contradicts the fact that the series 1/(n+1)\sum 1/(n+1) diverges.

In Rudin, this method is used to prove the more general Müntz–Szász theorem according to which the family (tλn)(t^{\lambda_n}) generates a dense subset of C([0;1])C([0;1]) if and only if 1/λn=+\sum 1/\lambda_n=+\infty.

Here is another solution I learnt in a paper by L. Carleson (“Mergelyan's theorem on uniform polynomial approximation”, Math. Scand., 1964).

For every complex number aa such that a>1\mathopen|a\mathclose|>1, one can write 1/(ta)1/(t-a) as a converging power series. By summation, this quickly gives that
F(a)=111tadμ(t)0. F(a) = \int_{-1}^1 \frac{1}{t-a}\, d\mu(t) \equiv 0.
Observe that this formula defines a holomorphic function on C[1;1]\mathbf C\setminus[-1;1]; by analytic continuous, one thus has F(a)=0F(a)=0 for every a∉[1;1]a\not\in[-1;1].
Take a C2C^2-function gg with compact support on the complex plane. For every tCt\in\mathbf C, one has the following formula
ˉg(z)1tzdxdy=g(t), \iint \bar\partial g(z) \frac{1}{t-z} \, dx\,dy = g(t),
which implies, by integration and Fubini, that
11g(t)dμ(t)=ˉg(z)1tzdμ(t)dxdy=ˉg(z)F(z)dxdy=0. \int_{-1}^1 g(t)\,d\mu(t) = \iint \int \bar\partial g(z) \frac1{t-z}\,d\mu(t)\,dx\,dy = \iint \bar\partial g(z) F(z)\,dx\, dy= 0.
On the other hand, every C2C^2 function on [1;1][-1;1] can be extended to such a function gg, so that the measure μ\mu vanishes on every C2C^2 function on [1;1][-1;1]. Approximating a continuous function by a C2C^2 function (first take a piecewise linear approximation, and round the corners), we get that μ\mu vanishes on every continuous function, as was to be proved.

7. Chebyshev/Markov systems.

This proof is due to P. Borwein and taken from the book Polynomials and polynomial inequalities, by P. Borwein and T. Erdélyi (Graduate Texts in Maths, vol. 161, 1995). Let us say that a sequence (fn)(f_n) of continuous functions on an interval II is a Markov system (resp. a weak Markov system) if for every integer nn, every linear combination of (f0,,fn)(f_0,\dots,f_n) has at most nn zeroes (resp. nn sign changes) in II.

Given a Markov system (fn)(f_n), one defines a sequence (Tn)(T_n), where TnfnT_n-f_n is the element of f0,,fn1\langle f_0,\dots,f_{n-1}\rangle which is the closest to fnf_n. The function TnT_n has nn zeroes on the interval II; let MnM_n be the maximum distance between two consecutive zeroes.

Borwein's theorem  (Theorem 4.1.1 in the mentioned book) then asserts that if the sequence (fn)(f_n) is a Markov system consisting of C1C^1 functions, then its linear span is dense in C(I)C(I) if and only if Mn0M_n\to 0.

The sequence of monomials (xn)(x^n) on I=[1;1]I=[-1;1] is of course a Markov system.  In this case, the polynomial TnT_n is the nnth Chebyshev polynomial, given by Tn(2cos(x))=2cos(nx)T_n(2\cos(x))=2\cos(nx), and its roots are given by 2cos((π+2kπ)/2n)2\cos((\pi+2k\pi)/2n), for k=0,,n1k=0,\dots,n-1, and Mnπ/nM_n\leq \pi/n. This gives yet another proof of Weierstrass's approximation theorem.

4 comments :

  1. I knew about (1), (2) and (3) only...

    ReplyDelete
  2. Je n'arrive pas à voir le lien entre une fonction holomorphe bornée qui a des zéros en 1-2/(n+1) et la divergence de la série \sum 1/(n+1). Si on enlève la condition bornée, il existe des fonctions holomorphe qui ont des zéros en 1-2/(n+1).

    ReplyDelete
    Replies
    1. Sous entendu, holomorphe sur le disque unité ouvert...

      Delete
    2. Cf. Rudin, théorème 15.23 et son corollaire. C'est une application assez directe de la formule de Jensen.

      Delete