
Friday, April 25, 2025

Yet another proof of the Weierstrass approximation theorem

Browsing through my Zotero database, I fell upon a paper by Harald Kuhn in which he proposes an elementary proof of the Weierstrass approximation theorem. The proof is indeed neat, so here it is.

Theorem. — Let $f\colon[0;1]\to\mathbf R$ be a continuous function and let $\varepsilon$ be a strictly positive real number. There exists a polynomial $P\in\mathbf R[T]$ such that $\left|P(x)-f(x)\right|<\varepsilon$ for every $x \in[0;1]$.

The proof steps outside the world of continuous functions: it considers the Heaviside function $H$ on $\mathbf R$, defined by $H(x)=0$ for $x<0$ and $H(x)=1$ for $x\geq 0$, and constructs a polynomial approximation of $H$.

Lemma. — There exists a sequence $(P_n)$ of polynomials which are increasing on $[-1;1]$ and which, for every $\delta>0$, converge uniformly to the function $H$ on $[-1;1]\setminus [-\delta;\delta]$.

For $n\in\mathbf N$, consider the polynomial $Q_n=(1-T^n)^{2^n}$. On $[0;1]$, it defines a decreasing function, with $Q_n(0)=1$ and $Q_n(1)=0$.

Let $q$ be a real number such that $0\leq q<1/2$. One has $$ Q_n(q) = (1-q^n)^{2^n} \geq 1-2^n q^n $$ in view of the inequality $(1-t)^n \geq 1-nt$, which is valid for $t\in[0;1]$. Since $2q<1$, we see that $Q_n(q)\to 1$. Since $Q_n$ is decreasing, one has $Q_n(q)\leq Q_n(x)\leq Q_n(0)=1$ for every $x\in[0;q]$, which shows that $Q_n$ converges uniformly to the constant function $1$ on the interval $[0;q]$.

Let now $q\in[0;1]$ be such that $1/2<q\leq 1$. Then $$ \frac1{Q_n(q)} = \frac1{(1-q^n)^{2^n}} = \left(1 + \frac{q^n}{1-q^{n}}\right)^{2^n} \geq 1 + \frac{2^nq^n}{1-q^{n}} $$ so that $Q_n(q)\to 0$. Since $Q_n$ is decreasing, one has $0=Q_n(1)\leq Q_n(x)\leq Q_n(q)$ for every $x\in[q;1]$, so that $Q_n$ converges uniformly to the constant function $0$ on the interval $[q;1]$.

Make an affine change of variables and set $P_n = Q_n((1-T)/2)$. The function defined by $P_n$ on $[-1;1]$ is increasing; for any real number $\delta>0$, it converges uniformly to $0$ on $[-1;-\delta]$, and it converges uniformly to $1$ on $[\delta;1]$. This concludes the proof of the lemma.
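For the skeptical reader, here is a quick numerical sketch of the lemma (my own illustration, of course, not part of Kuhn's paper): already for a moderate $n$, the polynomial $P_n$ is very close to $0$ on $[-1;-\delta]$ and very close to $1$ on $[\delta;1]$, say with $\delta=1/2$.

```python
def Q(n, t):
    # Q_n(t) = (1 - t^n)^(2^n), decreasing from 1 to 0 on [0, 1]
    return (1.0 - t**n) ** (2**n)

def P(n, x):
    # P_n(x) = Q_n((1 - x)/2), increasing on [-1, 1]
    return Q(n, (1.0 - x) / 2.0)

# sample P_10 on a grid of [-1, 1]
values = [P(10, -1 + k / 50) for k in range(101)]
```

With $n=10$ one already gets $P_{10}(-1/2)\approx 10^{-26}$ and $P_{10}(1/2)\approx 0.999$, and the sampled values increase along the grid, as the lemma predicts.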

We can now turn to the proof of the Weierstrass approximation theorem. Let $f$ be a continuous function on $[0;1]$. Subtracting the constant $f(0)$, which is itself a polynomial, we may assume that $f(0)=0$.

The first step, which follows from Heine's uniform continuity theorem, consists in noting that there exists a uniform approximation of $f$ by a function of the form $ F(x)=\sum_{m=1}^N a_m H(x-c_m)$, where $a_1,\dots,a_N$ are real numbers and $c_1,\dots,c_N$ are points of $[0;1]$. Namely, Heine's theorem implies that there exists $\delta>0$ such that $|f(x)-f(y)|<\varepsilon/2$ if $|x-y|<\delta$. Choose $N$ such that $N\delta\geq 1$ and set $c_m=m/N$; then define $a_1,\dots,a_N$ so that $a_1=f(c_1)$, $a_1+a_2=f(c_2)$, etc. It is easy to check that $|F(x)-f(x)|\leq \varepsilon$ for every $x\in[0;1]$. Moreover, $a_m=f(c_m)-f(c_{m-1})$, so that $\lvert a_m\rvert\leq\varepsilon/2$ for all $m$.

Now, fix a real number $\delta>0$ small enough so that the intervals $[c_m-\delta;c_m+\delta]$ are pairwise disjoint, and $n\in\mathbf N$ large enough so that $|P_n(x)-H(x)|\leq\varepsilon/2A$ for all $x\in[-1;1]$ such that $\delta\leq|x|$, where $A=\lvert a_1\rvert+\dots+\lvert a_N\rvert$. Finally, set $P(T) = \sum_{m=1}^N a_m P_n(T-c_m)$.

Let $x\in[0;1]$. If $x$ doesn't belong to any interval of the form $[c_m-\delta;c_m+\delta]$, one can write $$\lvert P(x)-F(x)\rvert\leq \sum_{m} \lvert a_m\rvert \,\lvert P_n(x-c_m)- H(x-c_m)\rvert \leq \sum_m \lvert a_m\rvert (\varepsilon/2A)\leq \varepsilon/2. $$ On the other hand, if there exists $m\in\{1,\dots,N\}$ such that $x\in[c_m-\delta;c_m+\delta]$, then there exists a unique such integer $m$. Writing $$\lvert P(x)-F(x)\rvert\leq \sum_{k\neq m} \lvert a_k\rvert \,\lvert P_n(x-c_k)- H(x-c_k)\rvert + \lvert a_m\rvert\, \lvert P_n(x-c_m)-H(x-c_m)\rvert, $$ the term with index $k$ in the first sum is bounded by $\lvert a_k\rvert \varepsilon/2A$, while the last term is bounded by $ \lvert a_m\rvert$, because $0\leq P_n\leq 1$ and $0\leq H\leq 1$. Consequently, $$\lvert P(x)-F(x)\rvert \leq (\varepsilon/2A)\sum_{k\neq m} \lvert a_k\rvert +\lvert a_m\rvert \leq \varepsilon/2+\varepsilon/2=\varepsilon.$$ Finally, $\lvert P(x)-f(x)\rvert\leq \lvert P(x)-F(x)\rvert+\lvert F(x)-f(x)\rvert\leq 2\varepsilon$. This concludes the proof.
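The whole construction is explicit enough to be carried out numerically. The sketch below is my own illustration, with arbitrary parameters ($N=10$ steps, $n=30$ for the Heaviside approximation) and the test function $f(x)=x$; it assembles the polynomial $P$ and measures the approximation error on a grid.

```python
def Q(n, t):
    # Q_n(t) = (1 - t^n)^(2^n)
    return (1.0 - t**n) ** (2**n)

def P_heaviside(n, x):
    # P_n = Q_n((1 - x)/2): polynomial approximation of the Heaviside function
    return Q(n, (1.0 - x) / 2.0)

def weierstrass_approx(f, N, n):
    """Approximate f (with f(0) = 0) by x -> sum_m a_m P_n(x - c_m),
    following the step-function construction of the proof."""
    c = [m / N for m in range(N + 1)]
    a = [f(c[m]) - f(c[m - 1]) for m in range(1, N + 1)]
    def P(x):
        return sum(a[m - 1] * P_heaviside(n, x - c[m]) for m in range(1, N + 1))
    return P

f = lambda x: x                     # any continuous f with f(0) = 0 would do
P = weierstrass_approx(f, N=10, n=30)
error = max(abs(P(k / 200) - f(k / 200)) for k in range(201))
```

With these (unoptimized) parameters the sup-error on the grid is already below $0.1$; increasing $N$ and $n$ drives it down further, at the cost of a polynomial of enormous degree.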

Yet another proof of the inequality between the arithmetic and the geometric means

This is an exposition of the proof of the inequality between the arithmetic and geometric means given by A. Pełczyński (1992), “Yet another proof of the inequality between the means”, Annales Societatis Mathematicae Polonae. Seria II. Wiadomości Matematyczne, 29, p. 223–224. The proof might look bizarre, but I can guess some relation with another paper of the author, in which he proves the uniqueness of the John ellipsoid. And as bizarre as it is, and despite the abundance of proofs of this inequality, I found it nice. (The paper is written in Polish, but the formulas make it possible to follow.)

For $n\geq 1$, let $a_1,\dots,a_n$ be positive real numbers. Their arithmetic mean is $$ A = \dfrac1n \left(a_1+\dots + a_n\right)$$ while their geometric mean is $$G = \left(a_1\dots a_n\right)^{1/n}.$$ The inequality of the title says $G\leq A,$ with equality if and only if all $a_k$ are equal. By homogeneity, it suffices to prove that $A\geq 1$ if $G=1$, with equality if and only if $a_k=1$ for all $k.$ In other words, we have to prove the following theorem.

Theorem. — If $a_1,\dots,a_n$ are positive real numbers such that $a_1\dots a_n=1$ and $a_1+\dots+a_n\leq n,$ then $a_1=\dots=a_n=1.$

The case $n=1$ is obvious and we argue by induction on $n$.

Lemma. — If $a_1\cdots a_n=1$ and $a_1+\dots+a_n\leq n,$ then $a_1^2+\dots+a_n^2\leq n$.

Indeed, we can write $$ a_1^2+\dots+a_n^2 = (a_1+\dots+a_n)^2 - \sum_{i\neq j} a_i a_j \leq n^2 - \sum_{i\neq j} a_i a_j,$$ and we have to give a lower bound for the second term. For given $i\neq j$, the product $a_i a_j$ and the remaining $a_k$, for $k\neq i,j$, are $n-1$ positive real numbers whose product is equal to $1$. By induction, one has $$ n-1 \leq a_i a_j + \sum_{k\neq i,j}a_k.$$ Summing these $n(n-1)$ inequalities, we have $$ n(n-1)^2 \leq \sum_{i\neq j} a_i a_j + \sum_{i\neq j} \sum_{k\neq i,j} a_k.$$ In the second term, every element $a_k$ appears $(n-1)(n-2)$ times, hence $$ n(n-1)^2 \leq \sum_{i\neq j} a_i a_j + (n-1)(n-2) \sum_{k} a_k \leq \sum_{i\neq j} a_i a_j + n(n-1)(n-2), $$ so that $$ \sum_{i\neq j} a_i a_j \geq n(n-1)^2-n(n-1)(n-2)=n(n-1).$$ Finally, we obtain $$a_1^2+\dots+a_n^2 \leq n^2-n(n-1)=n,$$ as claimed.

We can iterate this lemma: if $a_1\cdots a_n=1$ and $a_1+\dots+a_n\leq n$, then $$a_1^{2^m}+\dots+a_n^{2^m}\leq n$$ for every integer $m\geq 0$, because the numbers $a_k^{2^m}$ again have product $1$. When $m\to+\infty$, we obtain that $a_k\leq 1$ for every $k$. Since $a_1\dots a_n=1$, we must have $a_1=\dots=a_n=1$, and this concludes the proof.
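A quick numerical sanity check (mine, not Pełczyński's): normalizing random positive numbers so that their product is $1$, their sum always comes out $\geq n$, with equality only for the constant tuple, exactly as the theorem asserts.

```python
import math
import random

def normalize_product(values):
    # rescale positive numbers so that their product becomes 1
    geometric_mean = math.exp(sum(math.log(v) for v in values) / len(values))
    return [v / geometric_mean for v in values]

random.seed(0)
n = 8
a = normalize_product([random.uniform(0.1, 5.0) for _ in range(n)])
prod_a = math.prod(a)   # should be 1 up to rounding
sum_a = sum(a)          # should exceed n, since the sample is not constant
```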

Thursday, July 6, 2023

Electrostatics and rationality of power series

I would like to tell here a story that runs over some 150 years of mathematics, around the following question: given a power series $\sum a_n T^n$ (in one variable), how can you tell it comes from a rational function?

There are two possible motivations for such a question. One comes from complex function theory: you are given an analytic function and you wish to understand its nature; the simplest analytic functions being the rational ones, it is natural to wonder whether that happens or not (the next step would be to decide whether the function is algebraic, as in the problem of Hermann Amandus Schwarz (1843–1921)). Another motivation starts from the coefficients $(a_n)$, of which the power series is called the generating series; indeed, the generating series is a rational function if and only if the sequence of coefficients satisfies a linear recurrence relation.

At this stage, there are few tools to answer that question, besides a general algebraic criterion which essentially reformulates the property that the $(a_n)$ satisfy a linear recurrence relation. For any integers $m$ and $q$, let $D_m^q$ be the determinant of size $(q+1)$ given by \[ D_m^q = \begin{vmatrix} a_m & a_{m+1} & \dots & a_{m+q} \\ a_{m+1} & a_{m+2} & \dots & a_{m+q+1} \\ \vdots & \vdots & & \vdots \\ a_{m+q} & a_{m+q+1} & \dots & a_{m+2q} \end{vmatrix}. \] These determinants are called the Hankel determinants or (when $m=0$) the Kronecker determinants, from the names of the two 19th-century German mathematicians Hermann Hankel (1839–1873) and Leopold Kronecker (1823–1891). With this notation, the following properties are equivalent:

  1. The power series $\sum a_n T^n$ comes from a rational function;
  2. There is an integer $q$ such that $D_m^q=0$ for all large enough integers $m$;
  3. For all large enough integers $q$, one has $D^q_0=0$.
(The proof of that classic criterion is not too complicated, but the standard proof is quite smart. In his book Algebraic numbers and Fourier analysis, Raphaël Salem gives a proof which is arguably easier.)
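As an illustration (my own, in Python), take the Fibonacci numbers, whose generating series $T/(1-T-T^2)$ is rational: the linear recurrence $a_{n+2}=a_{n+1}+a_n$ makes the third row of every $3\times 3$ Hankel matrix the sum of the first two, so all the determinants $D_m^2$ vanish, while the $2\times2$ determinants $D_m^1$ do not.

```python
def det(M):
    # exact integer determinant, by cofactor expansion along the first row
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def hankel_det(a, m, q):
    # D_m^q: determinant of the (q+1) x (q+1) matrix with entries a_{m+i+j}
    return det([[a[m + i + j] for j in range(q + 1)] for i in range(q + 1)])

# Fibonacci numbers a_0 = 0, a_1 = 1, a_{n+2} = a_{n+1} + a_n
fib = [0, 1]
while len(fib) < 30:
    fib.append(fib[-1] + fib[-2])
```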

This algebraic criterion is very general, but it is almost impossible to prove the vanishing of these determinants without further information, and it is at this stage that Émile Borel enters the story. Émile Borel (1871–1956) was not only a very important mathematician of the first half of the 20th century, through his work on analysis and probability theory; he also was a member of parliament, a minister of the Navy, and a member of the Résistance during World War II. He founded the French research institution CNRS and the Institut Henri Poincaré. He was also the first president of the Confédération des travailleurs intellectuels, an intellectual workers' union.

In his 1893 paper « Sur un théorème de M. Hadamard », Borel proves the following theorem:

Theorem. — If the coefficients \(a_n\) are integers and if the power series \(\sum a_n T^n \) “defines” a function (possibly with poles) on a disk centered at the origin and of radius strictly greater than 1, then that power series is a rational function.

Observe how these two hypotheses belong to two quite unrelated worlds: the first one sets the question within number theory, while the second one belongs to complex function theory. It looks almost like magic that these two hypotheses lead to the nice conclusion that the power series is a rational function.

It is also worth remarking that the second hypothesis is really necessary for the conclusion to hold, because rational functions define functions (with poles) on the whole complex plane. The status of the first hypothesis is more mysterious: while it is not necessary, the conclusion may fail without it. For example, the exponential series \(\sum T^n/n!\) defines a function (without poles) on the whole complex plane, but is not rational (it grows too fast at infinity).

However, the interaction of number-theoretic hypotheses with the question of the nature of power series was not totally unexplored at the time of Borel. For example, an 1852 theorem of the German mathematician Gotthold Eisenstein (Über eine allgemeine Eigenschaft der Reihen-Entwicklungen aller algebraischen Functionen) shows that when the coefficients \(a_n\) of the expansion \(\sum a_nT^n\) of an algebraic function are rational numbers, the denominators are not arbitrary: there is an integer \(D\geq 1\) such that for all \(n\), \(a_n D^{n+1}\) is an integer. As a consequence of that theorem of Eisenstein, the exponential series and the logarithmic series cannot be algebraic.

It's always time somewhere on the Internet for a mathematical proof, so I have no excuse for not telling you *how* Émile Borel proved that result. He uses the above algebraic criterion, hence needs to prove that some of the determinants \(D^q_m\) introduced above vanish (for some \(q\) and for all \(m\) large enough). His idea consists in observing that these determinants are integers, so that if you wish to prove that they vanish, it suffices to prove that they are smaller than one!

If non-mathematicians are still reading me, there's no mistake here: the main argument of the proof is the remark that a nonzero integer has absolute value at least one. While this may sound like a trivial remark, it is something I like to call the main theorem of number theory, because it lies at the heart of almost all proofs in number theory.

So one has to bound determinants from above, and here Borel invokes the « théorème de M. Hadamard »: a determinant, being the volume of the parallelepiped formed by the rows, is smaller than the product of the norms of these rows, considered as vectors of the Euclidean space. In 2-D, the area of a parallelogram is at most the product of the lengths of its edges! (Jacques Hadamard (1865–1963) is known for many extremely important results, notably the first proof of the Prime Number Theorem. It is funny that such an elementary result went into the title of a paper!)
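In formulas: \(\lvert\det M\rvert\leq\prod_i\lVert\text{row}_i(M)\rVert\). A small numerical check of Hadamard's inequality (an illustration of mine), on a random \(3\times3\) matrix and on the identity matrix, where it is an equality:

```python
import math
import random

def det3(M):
    # 3x3 determinant, expanded along the first row
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
          - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
          + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def row_norm_product(M):
    # Hadamard's bound: product of the Euclidean norms of the rows
    return math.prod(math.sqrt(sum(x * x for x in row)) for row in M)

random.seed(1)
M = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```

Equality holds exactly when the rows are pairwise orthogonal, as for the identity matrix.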

But there's no hope that applying Hadamard's inequality to our initial matrix can be of any help: that matrix has integer coefficients, so all of its rows have norm at least one. So Borel makes clever row combinations on the Hankel matrices, which take into account the poles of the function that the given power series defines.

Basically, if \(f=\sum a_nT^n\), there exists a polynomial \(h=\sum c_mT^m\), say of degree \(p\), such that the power series \(g=fh = \sum b_n T^n\) defines a function without poles on some disk \(D(0,R)\) with \(R>1\). Using complex function theory (Cauchy's inequalities), this implies that the coefficients \(b_n\) converge rapidly to 0, roughly as \(R^{-n}\). For the same reason, the coefficients \(a_n\) cannot grow too fast, at most as \(r^{-n}\) for some \(r>0\). The formula \(g=fh\) shows that the coefficients \(b_n\) are combinations of the \(a_n\), so that the determinant \(D_n^q\) is also equal to \[ \begin{vmatrix} a_n & a_{n+1} & \dots & a_{n+q} \\ \vdots & & \vdots \\ a_{n+p-1} & a_{n+p} & \dots & a_{n+p+q-1} \\ b_{n+p} & b_{n+p+1} & \dots & b_{n+p+q} \\ \vdots & & \vdots \\ b_{n+q} & b_{n+q+1} & \dots & b_{n+2q} \end{vmatrix}\] Now, Hadamard's inequality implies that the determinant \(D_n^q\) is (roughly) bounded above by \( (r^{-n} )^p (R^{-n}) ^{q+1-p} \): there are \(p\) rows bounded above by some \(r^{-n}\), and the next \(q+1-p\) rows are bounded above by \(R^{-n}\). This expression rewrites as \( 1/(r^pR^{q+1-p})^n\). Since \(R>1\), we may choose \(q\) large enough so that \(r^p R^{q+1-p}>1\); then, when \(n\) grows to infinity, the determinant becomes smaller than 1. Hence it vanishes!

The next chapter of this story happens in 1928, at the hands of the Hungarian mathematician George Pólya (1887–1985). Pólya had already written several papers exploring the interaction of number theory and complex function theory; one of them will even reappear later in this thread. In his paper “Über gewisse notwendige Determinantenkriterien für die Fortsetzbarkeit einer Potenzreihe”, he studied the analogue of Borel's question when the disk of radius \(R\) is replaced by an arbitrary domain \(U\) of the complex plane containing the origin, proving that if \(U\) is big enough, then the initial power series is a rational function. It is however not so obvious how one should measure the size of \(U\), and it is at this point that electrostatics enters the picture.

In fact, it is convenient to make an inversion: the assumption is that the series \(\sum a_n / T^n\) defines a function (with poles) on the complement of a compact subset \(K\) of the complex plane. Imagine that this compact set is made of metal, put at potential 0, and that a unit electric charge is placed at infinity. According to the 2-D laws of electrostatics, this creates an electric potential \(V_K\) which is identically \(0\) on \(K\) and behaves like \( V_K(z)\approx \log(|z|/C_K)\) at infinity. Here, \(C_K\) is a positive constant, called the capacity of \(K\).

Theorem (Pólya). — Assume that the \(a_n\) are integers and the series \(\sum a_n/T^n\) defines a function (with poles) on the complement of \(K\). If the capacity of \(K\) is \(\lt1\), then \(\sum a_n T^n\) is rational.

To apply this theorem, it is important to know how to compute capacities. This was a classic theme of complex function theory and numerical analysis some 50 years ago. Indeed, what the electric potential does is solve the Laplace equation \(\Delta V_K=0\) outside of \(K\), with Dirichlet condition on the boundary of \(K\).

In fact, the early days of complex analysis made remarkable use of this fact. For example, it was by solving the Laplace equation that Bernhard Riemann proved the existence of meromorphic functions on “Riemann surfaces”, at a time (around 1860) when analysis was not yet sufficiently developed. In a stunningly creative move, Riemann imagines that his surface is partly made of metal and partly of insulating material, and he deduces the construction of the desired function from the electric potential.

More recently, complex analysis and potential theory also had applications to fluid dynamics, for example to compute (at least approximately) the flow of air outside of an airplane wing. (I am not a specialist of this, but I'd guess the development of numerical methods that run on modern computers rendered these beautiful methods obsolete.)

The relation between the theorems of Borel and Pólya is that the capacity of a disk is its radius. This can be seen from the fact that \(V(z)=\log(|z|/R)\) solves the Laplace equation with Dirichlet condition outside of the disk of radius \(R\).

A few other capacities have been computed, not too many, in fact, because it appears to be a surprisingly difficult problem. For example, the capacity of an interval is a fourth of its length.

Pólya's proof is similar to Borel's, but considers the Kronecker determinants in place of Hankel's. However, the linear combinations that allow one to show that these determinants are small are not as explicit as in Borel's proof. They follow from another interpretation of the capacity, introduced by the Hungarian–Israeli mathematician Michael Fekete (1886–1957; born in what was then Austria–Hungary and is now Serbia, he emigrated to Palestine in 1928).

You know that the diameter \(d_2(K)\) of \(K\) is the upper bound of all distances \(|x-y|\) where \(x,y\) are arbitrary points of \(K\). Now, for an integer \(n\geq 2\), consider the upper bound \(d_n(K)\) of all products \(\left(\prod_{i\neq j}\lvert x_j-x_i\rvert\right)^{1/n(n-1)}\), where \(x_1,\dots,x_n\) are arbitrary points of \(K\). It is not so hard to prove that the sequence \(d_n(K)\) decreases with \(n\); its limit \(\delta(K)\) is called the transfinite diameter by Fekete.
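To see these definitions at work, here is a sketch of mine (not in Fekete's papers): on the unit circle, the \(n\)-th roots of unity are known to maximize the product of pairwise distances, and the product of all \(n(n-1)\) distances equals \(n^n\), so \(d_n = n^{1/(n-1)}\), which decreases to \(1\), the capacity (that is, the radius) of the unit disk.

```python
import cmath
import math

def d_n(points):
    # geometric mean of the n(n-1) pairwise distances (ordered pairs)
    n = len(points)
    log_product = sum(math.log(abs(points[i] - points[j]))
                      for i in range(n) for j in range(n) if i != j)
    return math.exp(log_product / (n * (n - 1)))

def roots_of_unity(n):
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]

# d_n for the n-th roots of unity, for increasing n
diameters = [d_n(roots_of_unity(n)) for n in (3, 6, 12, 24)]
```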

Proposition. — \( \delta(K)= C_K\).

This makes a link between capacity theory and another theme of complex function theory, the theory of best approximation, which ends up in Pólya's proof: the adequate linear combination for the \(n\)th row is given by the coefficients of the monic polynomial of degree \(n\) which has the smallest least upper bound on \(K\).

If all this is of some appeal to you, there's a wonderful little book by Thomas Ransford, Potential Theory in the Complex Plane, which I find quite readable (say, from 3rd or 4th year of math studies on).

In the forthcoming episodes, I'll discuss two striking applications of the theorems of Borel and Pólya: the proof by Bernhard Dwork (in 1960) of a conjecture of Weil, and a new proof (in 1987) by Jean-Paul Bézivin and Philippe Robba of the transcendence of the numbers \(e\) and \(\pi\), two results initially proven by Charles Hermite and Ferdinand von Lindemann in 1873 and 1882 respectively.

Friday, April 2, 2021

On the Hadamard-Lévy theorem, or is it Banach-Mazur?

During the preparation of an agrégation lecture on connectedness, I came across the following theorem, attributed to Hadamard–Lévy: 

Theorem. — Let $f\colon \mathbf R^n\to\mathbf R^n$ be a $\mathscr C^1$-map which is proper and a local diffeomorphism. Then $f$ is a global diffeomorphism.

In this context, that $f$ is proper means that $\| f(x)\| \to+\infty$ when $\| x\|\to+\infty$, while, by the inverse function theorem, the condition that $f$ is a local diffeomorphism is equivalent to the property that its differential $f'(x)$ is invertible, for every $x\in\mathbf R^n$. The conclusion is that $f$ is a diffeomorphism from $\mathbf R^n$ to itself; in particular, $f$ is bijective and its inverse is continuous.

This theorem is stated in this form neither by Hadamard (1906) nor by Lévy (1920); it is essentially due to Banach & Mazur (1934), and it is the purpose of this note to clarify the history and explain a few proofs, as well as more recent consequences for partial differential equations.

A proper map is closed: the image $f(A)$ of a closed subset $A$ of $\mathbf R^n$ is closed in $\mathbf R^n$. Indeed, let $(a_m)$ be a sequence in $A$ whose image $(f(a_m))$ converges in $\mathbf R^n$ to an element $b$; let us show that there exists $a\in A$ such that $b=f(a)$. The properness assumption on $f$ implies that $(a_m)$ is bounded. Consequently, it has a limit point $a$, and $a\in A$ because $A$ is closed. Necessarily, $f(a)$ is a limit point of the sequence $(f(a_m))$, hence $b=f(a)$.

In this respect, let us note the following reinforcement of the previous theorem, due to Browder (1954):
Theorem (Browder). — Let $f\colon \mathbf R^n\to\mathbf R^n$ be a local homeomorphism. If $f$ is closed, then $f$ is a global homeomorphism.

A surprising aspect of these results and their descendants is that they are based on two really different ideas. The proofs of Banach & Mazur and of Browder are based on the notion of covering, with ideas from homotopy theory and, ultimately, the fact that $\mathbf R^n$ is simply connected. On the other hand, the motivation of Hadamard was to generalize to dimension $n$ the following elementary discussion in the one-dimensional case: let $f\colon\mathbf R\to\mathbf R$ be a $\mathscr C^1$-function whose derivative is $>0$ everywhere (so that $f$ is strictly increasing); give a condition for $f$ to be surjective. In this case, the condition is easy to find: the indefinite integral $\int f'(x)\,dx$ has to be divergent both at $-\infty$ and $+\infty$. In the $n$-dimensional case, the theorem of Hadamard is the following:

Theorem. — Let $f\colon\mathbf R^n\to\mathbf R^n$ be a $\mathscr C^1$-map. For $r\in\mathbf R_+$, let $\omega(r)$ be the infimum, for $x\in\mathbf R^n$ such that $\|x\|=r$, of $1/\| f'(x)^{-1}\|$; if $\int_0^\infty \omega(r)\,dr=+\infty$, then $f$ is a global diffeomorphism.

In Hadamard's paper, the quantity $\omega(r)$ is described geometrically as the minor axis of the ellipsoid defined by $f'(x)$, and Hadamard insists that using the volume of this ellipsoid only, essentially given by the determinant of $f'(x)$, would not suffice to characterize global diffeomorphisms. (Examples are furnished by maps of the form $f(x_1,x_2)=(f_1(x_1),f_2(x_2))$. The determinant condition considers $f_1'(x_1)f_2'(x_2)$, while one needs individual conditions on $f'_1(x_1)$ and $f'_2(x_2)$.)

In fact, as explained by Plastock (1974), both versions (closedness hypothesis or quantitative assumptions on the differential) imply that the map $f$ is a topological covering of $\mathbf R^n$. Since the target $\mathbf R^n$ is simply connected and the source $\mathbf R^n$ is connected, $f$ has to be a homeomorphism. I will explain this proof below, but I would first like to present an alternative proof, proposed by Zuily & Queffélec (1995), which is quite interesting.

A dynamical system approach

The goal is to prove that $f$ is bijective and, to that aim, we will prove that every preimage set $f^{-1}(b)$ is reduced to one element. Replacing $f$ by $f-b$, it suffices to treat the case $b=0$. In other words, we wish to show that the equation $f(x)=0$ has exactly one solution. For that, it is natural to start from some point $\xi\in\mathbf R^n$ and to force $f$ to decrease. This can be done by following the flow of the vector field given by $v(x)=-f'(x)^{-1}(f(x))$. This is a vector field on $\mathbf R^n$ and we can consider its flow: a map $\Phi$ defined on an open subset of $\mathbf R\times\mathbf R^n$ such that $\partial_t \Phi(t,x)=v(\Phi(t,x))$ for all $(t,x)$ and $\Phi(0,x)=x$ for all $x$. In fact, the Cauchy–Lipschitz theorem guarantees the existence of such a flow only if the vector field $v$ is locally Lipschitz, which happens if, for example, $f$ is assumed to be $\mathscr C^2$. In this case, there is even uniqueness of a maximal flow, and we will make this assumption, for safety. (In fact, the paper of De Marco, Gorni & Zampieri (1994) constructs the flow directly, thanks to the hypothesis that the vector field is pulled back from the Euler vector field on $\mathbf R^n$.)

What are we doing here? Note that in $\mathbf R^n$, the opposite of the Euler vector field, defined by $u(y)=-y$, has a very simple flow: the flow lines are straight lines going to $0$. The formula above just pulls back this vector field $u$ via the local diffeomorphism $f$, so the flow lines of $v$ are the pull-backs by $f$ of these straight lines, which explains the behaviour described below.

In particular, let $a\in\mathbf R^n$ be such that $f(a)=0$ and let $U$ be a neighborhood of $a$ such that $f$ induces a diffeomorphism from $U$ to a ball around $0$. Pulling back the solution of the minus-Euler vector field by $f$, we see that once a flow line enters the open set $U$, it converges to $a$. The goal is now to prove that it will indeed enter such a neighborhood (and, in particular, that such a point $a$ exists).

We consider a flow line starting from a point $x$, that is, $\phi(t)=\Phi(t,x)$ for all times $t$. Let $g(t)= f(\phi(t))$; observe that $g$ satisfies $g'(t)=f'(\phi(t))(\phi'(t))=-g(t)$, hence $g(t)=g(0)e^{-t}$. Assume that the flow line is defined on $[0;t_1\mathopen[$, with $t_1<+\infty$. By what precedes, $g$ is bounded in the neighborhood of $t_1$; since $f$ is assumed to be proper, this implies that $\phi(t)$ is bounded as well. The continuity of the vector field $v$ implies that $\phi$ is uniformly continuous, hence it has a limit at $t_1$. We may then extend the flow line a bit beyond $t_1$. As a consequence, the flow line is defined for all times, and $g(t)\to0$ when $t\to+\infty$. By the same properness argument, $\phi(t)$ is bounded when $t\to+\infty$, hence it has limit points $a$, which satisfy $f(a)=0$. Once $\phi$ enters an appropriate neighborhood of such a point, we have seen that the flow line automatically converges to some point $a\in f^{-1}(0)$.
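Here is a small numerical illustration of this dynamical argument (my own sketch, with the arbitrary proper local diffeomorphism $f(x,y)=(x+x^3,\,y+y^3)$, whose differential is diagonal, hence easy to invert): a crude Euler integration of the flow of $v(x)=-f'(x)^{-1}(f(x))$ drives any starting point to the unique solution of $f=0$.

```python
def f(p):
    x, y = p
    return (x + x**3, y + y**3)

def v(p):
    # v(p) = -f'(p)^{-1} f(p); here f'(p) = diag(1 + 3x^2, 1 + 3y^2)
    x, y = p
    fx, fy = f(p)
    return (-fx / (1 + 3 * x * x), -fy / (1 + 3 * y * y))

# explicit Euler scheme for the flow, up to time t = 20
p = (2.0, -1.5)
dt = 0.01
for _ in range(2000):
    vx, vy = v(p)
    p = (p[0] + dt * vx, p[1] + dt * vy)
```

Along the discrete trajectory, $f(\phi(t))$ shrinks roughly like $e^{-t}$, as the computation $g(t)=g(0)e^{-t}$ predicts, and the endpoint is (numerically) the zero of $f$ at the origin.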

Let us now consider the map $\lambda\colon\mathbf R^n\to f^{-1}(0)$ that associates with a point $\xi$ the limit of the line flow $t\mapsto \Phi(t,\xi)$ starting from the initial condition $\xi$. By continuity of the flow of a vector field depending on the initial condition, the map $\lambda$ is continuous. On the other hand, the hypothesis that $f$ is a local diffeomorphism implies that $f^{-1}(0)$ is a closed discrete subset of $\mathbf R^n$. Since $\mathbf R^n$ is connected, the map $\lambda$ is constant. Since one has $\lambda(\xi)=\xi$ for every $\xi\in f^{-1}(0)$, this establishes that $f^{-1}(0)$ is reduced to one element, as claimed.

Once $f$ is shown to be bijective, the fact that it is proper (closed would suffice) implies that its inverse bijection $f^{-1}$ is continuous. This concludes the proof.

The theorem of Banach and Mazur

The paper of Banach and Mazur is written in a bigger generality. They consider multivalued continuous maps $F\colon X\to Y$ ($k$-deutige stetige Abbildungen) by which they mean that for every $x$, a subset $F(x)$ of $Y$ is given, of cardinality $k$, the continuity being expressed by sequences: if $x_n\to x$, one can order, for every $n$, the elements of $F(x_n)=\{y_{n,1},\dots,y_{n,k}\}$, as well as the elements of $F(x)=\{y_1,\dots,y_k\}$, in such a way that $y_{n,j}\to y_j$ for all $j$. (In their framework, $X$ and $Y$ are metric spaces, but one could transpose their definition to topological spaces if needed.) They say that such a map is decomposed (zerfällt) if there are continuous functions $f_1,\dots,f_k$ from $X$ to $Y$ such that $F(x)=\{f_1(x),\dots,f_k(x)\}$ for all $x\in X$.

In essence, the definition that Banach and Mazur propose contains finite coverings as a particular case. Namely, if $p\colon Y\to X$ is a finite covering of degree $k$, then the map $x\mapsto p^{-1}(x)$ is a continuous $k$-valued map from $X$ to $Y$. Conversely, let us consider the graph $Z$ of $F$, namely the set of all points $(x,y)\in X\times Y$ such that $y\in F(x)$. Then the first projection $p\colon Z\to X$ is $k$-to-one, but it is not clear that it is a covering map, that is, that it has local sections.

It would however not be so surprising to 21st-century mathematicians that if one makes a suitable assumption of simple connectedness on $X$, then every such $F$ should be decomposed. Banach and Mazur assume that $X$ satisfies two properties:

  1. The space $X$ is semilocally arcwise connected: for every point $x\in X$ and every neighborhood $U$ of $x$, there exists an open neighborhood $U'$ contained in $U$ such that for every point $x'\in U'$, there exists a path $c\colon[0;1]\to U$ such that $c(0)=x$ and $c(1)=x'$. (Semilocally means that the path is not necessarily in $U'$ but in $U$.)
  2. The space $X$ is arcwise simply connected: two paths $c_0,c_1\colon[0;1]\to X$ with the same endpoints ($c_0(0)=c_1(0)$ and $c_0(1)=c_1(1)$) are strictly homotopic — there exists a continuous map $h\colon[0;1]\times[0;1]\to X$ such that $h(0,t)=c_0(t)$ and $h(1,t)=c_1(t)$ for all $t$, and $h(s,0)=c_0(0)$ and $h(s,1)=c_0(1)$ for all $s$.

Consider a $k$-valued continuous map $F$ from $X$ to $Y$, where $X$ is connected. Banach and Mazur first prove that for every path $c\colon [0;1]\to X$ and every point $y_0\in F(c(0))$, there exists a continuous function $f\colon[0;1]\to Y$ such that $f(0)=y_0$ and $f(t)\in F(c(t))$ for all $t$. To that aim, they consider disjoint neighborhoods $V_1,\dots,V_k$ of the elements of $F(c(0))$, with $y_0\in V_1$, say, and observe that for $t$ small enough, there is a unique element in $F(c(t))\cap V_1$. This defines a bit of the lift $f$, and one can go on. Now, given two paths $c,c'$ such that $c(0)=c'(0)$ and $c(1)=c'(1)$, and two lifts $f,f'$ as above with $f(0)=f'(0)$, they consider a homotopy $h\colon[0;1]\times[0;1]\to X$ linking $c$ to $c'$. Subdividing this square into small enough subsquares, one sees by induction that $f(1)=f'(1)$. (This is analogous to the proof that a topological covering of the square is trivial.) Fixing a point $x_0\in X$ and a point $y_0\in F(x_0)$, one gets in this way a well-defined map from $X$ to $Y$: its value at $x$ is $f(1)$, for any path $c\colon[0;1]\to X$ such that $c(0)=x_0$ and $c(1)=x$, and any continuous map $f\colon [0;1]\to Y$ such that $f(0)=y_0$ and $f(t)\in F(c(t))$ for all $t$. One then proves that this map is continuous. Considering all such maps, for all points of $F(x_0)$, one obtains the decomposition of the multivalued map $F$.

To prove their version of the Hadamard–Lévy theorem, Banach and Mazur observe that if $f\colon Y\to X$ is a local homeomorphism which is proper, then setting $F(x)=f^{-1}(x)$ gives a multivalued continuous map. It is not obvious that the cardinalities $k(x)$ of the sets $F(x)$ are constant, but this follows (if $X$ is connected) from the fact that $f$ is both a local homeomorphism and proper. Then $F$ is decomposed, so that there exist continuous maps $g_1,\dots,g_k\colon X\to Y$ such that $f^{-1}(x)=\{g_1(x),\dots,g_k(x)\}$ for all $x\in X$. This implies that $Y$ is the disjoint union of the $k$ connected subsets $g_j(X)$. If $Y$ is connected, then $k=1$ and $f$ is a homeomorphism.

The versions of Hadamard and Lévy, after Plastock

Hadamard considered the finite dimensional case, and Lévy extended it to the case of Hilbert spaces.

Plastock considers a Banach-space version of the theorem above: $f\colon E\to F$ is a $\mathscr C^1$-map between Banach spaces, with invertible differentials, such that, setting $\omega(r)=\inf_{\|x\| = r}\|f'(x)^{-1}\|^{-1}$, one has $\int_0^\infty \omega(r)\,dr=+\infty$. Of course, under these hypotheses, the Banach spaces $E$ and $F$ are isomorphic, but it may be useful not to require that they be identical. Note that $f(E)$ is open in $F$, and the proposition that will ensure that $f$ is a global diffeomorphism is the following one, in the spirit of covering theory.

Proposition.(Assuming that $f$ is a local diffeomorphism.) It suffices to prove that the map $f$ satisfies the path lifting property: for every point $x\in E$ and every $\mathscr C^1$ map $c\colon[0;1]\to f(E)$ such that $c(0)=f(x)$, there exists a $\mathscr C^1$ map $d\colon[0;1]\to E$ such that $c(t)=f(d(t))$ for all $t$ and $d(0)=x$.

The goal is now to prove that $f$ satisfies this path lifting property. Using that $f$ is a local homeomorphism, one sees that lifts are unique, and are defined on a maximal subinterval of $[0;1]$ which is either $[0;1]$ itself, or of the form $[0;s\mathclose[$. To prevent the latter case, one needs to impose conditions on the norm $\| f'(x)^{-1}\|$ such as the one phrased in terms of $\omega(r)$ as in the Hadamard–Lévy theorem. In fact, Plastock starts with a simpler case.

Proposition.The path lifting property follows from the following additional hypotheses:

  1. One has $\|f(x)\|\to+\infty$ when $\|x\|\to+\infty$;
  2. There exists a positive continuous function $M\colon\mathbf R_+\to\mathbf R_+$ such that $\|f'(x)^{-1}\|\leq M(\|x\|)$ for all $x$.

Assume indeed that a path $c$ has a maximal lift $d$, defined over the interval $[0;s\mathclose[$. By hypothesis (1), $d(t)$ remains bounded when $t\to s$, because $c(t)=f(d(t))$ tends to $c(s)$. Differentiating the relation $c(t)=f(d(t))$, one gets $c'(t)=f'(d(t))(d'(t))$, hence $d'(t)=f'(d(t))^{-1}(c'(t))$, so that, by hypothesis (2), $\| d'(t)\|\leq M(\|d(t)\|) \|c'(t)\|$. This implies that $\|d'\|$ is bounded, so that $d$ is uniformly continuous, hence has a limit at $s$. The path $d$ can then be extended by setting $d(s)$ to this limit and using the local diffeomorphism property to go beyond $s$.
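Here is a minimal numerical sketch of this lifting argument (my own illustration, not Plastock's): for a hypothetical map on $\mathbf R^2$ whose differential is everywhere invertible with bounded inverse, integrating $d'(t)=f'(d(t))^{-1}(c'(t))$ along a straight path solves $f(x)=y$ globally. The particular map, step count and Newton polishing are arbitrary choices.

```python
import numpy as np

# A sample map R^2 -> R^2 with everywhere-invertible differential:
# det f'(p) = 1 - 0.25 cos(x) cos(y) >= 0.75 > 0, and ||f'(p)^{-1}||
# is bounded, so the hypotheses of the proposition hold.
def f(p):
    x, y = p
    return np.array([x + 0.5 * np.sin(y), y + 0.5 * np.sin(x)])

def df(p):
    x, y = p
    return np.array([[1.0, 0.5 * np.cos(y)],
                     [0.5 * np.cos(x), 1.0]])

def lift(target, x0=np.zeros(2), steps=1000):
    """Solve f(x) = target by lifting the straight path
    c(t) = (1-t) f(x0) + t target, i.e. integrating
    d'(t) = f'(d(t))^{-1} c'(t) by forward Euler, then polishing
    with a few Newton corrections."""
    d = np.asarray(x0, dtype=float)
    cdot = target - f(x0)            # c'(t) is constant for a straight path
    h = 1.0 / steps
    for _ in range(steps):
        d = d + h * np.linalg.solve(df(d), cdot)
    for _ in range(3):               # Newton corrections
        d = d - np.linalg.solve(df(d), f(d) - target)
    return d

target = np.array([3.0, -2.0])
sol = lift(target)
print(sol, f(sol))                   # f(sol) is (very nearly) target
```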

The Hadamard–Lévy theorem is related to the completeness of certain length spaces. We shall thus modify the distance of the Banach space $E$ as follows: if $c\colon[0;1]\to E$ is a path in $E$, then its length is defined by \[ \ell(c) = \int_0^1 \| f'(c(t))^{-1}\|^{-1} \|{c'(t)}\|\, dt. \] Observe that $\|f'(c(t))^{-1}\|^{-1} \geq \omega(\|c(t)\|)$, so that \[ \ell(c) \geq \int_0^1 \omega(\|c(t)\|) \|{c'(t)}\|\, dt. \] The modified distance between two points of $E$ is then defined as the infimum of the lengths of all paths joining them.

Lemma.With respect to the modified distance, the space $E$ is complete.

One proves that $\ell(c) \geq \int_{\|{c(0)}\|}^{\|{c(1)}\|}\omega(r)\,dr$. Since $\int_0^\infty \omega(r)\,dr=+\infty$, this implies that Cauchy sequences for the modified distance are bounded in $E$ for the original norm. On the other hand, on any bounded subset of $E$, the Banach norm and the modified distance are equivalent, so that they have the same Cauchy sequences.
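The inequality $\ell(c) \geq \int_{\lVert c(0)\rVert}^{\lVert c(1)\rVert}\omega(r)\,dr$ can be spelled out as follows: the function $r(t)=\lVert c(t)\rVert$ is Lipschitz, with $\lvert r'(t)\rvert\leq \lVert c'(t)\rVert$ almost everywhere, and $\omega\geq0$, so that \[ \ell(c) \geq \int_0^1 \omega(r(t))\,\lVert c'(t)\rVert\,dt \geq \int_0^1 \omega(r(t))\, r'(t)\,dt = \int_{r(0)}^{r(1)} \omega(r)\,dr. \]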

Other conditions can be derived from Plastock's general theorem. For example, assuming that $E=F=H$ is a Hilbert space, he shows that it suffices to assume the existence of a decreasing function $\lambda\colon\mathbf R_+\to\mathbf R_+$ such that $\langle f'(x)(u),u\rangle \geq \lambda(\|x\|) \| u\|^2$ for all $x$ and $u$, and $\int_0^\infty \lambda(r)\,dr=+\infty$. Indeed, under this assumption, one may take $\omega(r)=\lambda(r)$.

Application to periodic solutions of differential equations

Spectral theory can be seen as the infinite dimensional generalization of classical linear algebra. Linear differential operators and linear partial differential operators furnish prominent examples of such operators. The theorems of Hadamard–Lévy type have been applied to solve nonlinear differential equations.

I present a single example here, to give an idea of how this works, and also because I am too lazy to check all the details.

Following Brown & Lin (1979), we consider the Newtonian equation of motion: \[ u''(t) + \nabla G (u(t)) = p(t) \] where $G$ represents the ambient potential, assumed to be smooth enough, and $p\colon \mathbf R\to\mathbf R^n$ is some external control. The problem studied by Brown and Lin is to prove the existence of periodic solutions when $p$ is itself periodic. The method consists in interpreting the left hand side as a nonlinear map defined on the Sobolev space $E$ of $2\pi$-periodic $\mathscr C^1$-functions with a second derivative in $F=L^2([0;2\pi];\mathbf R^n)$, with values in $F$. Write $L$ for the linear operator $u\mapsto u''$ and $N$ for the (nonlinear) operator $u\mapsto \nabla G(u)$. Then $L$ is linear continuous (hence $L'(u)(v)=L(v)$), and $N$ is continuously differentiable, with differential given by \[ N'(u) (v) = \left( t \mapsto Q (u(t)) (v(t)) \right) \] for $u,v\in E$, where $Q$ is the Hessian of $G$.

In other words, the differential $(L+N)'(u)$ is the linear map $v\mapsto L(v) + Q(u(t)) v$. It is invertible if the eigenvalues of $Q(u(t))$ stay away from the squares of integers. Concretely, Brown and Lin assume that there are two constant symmetric matrices $A$ and $B$ such that $A\leq Q(x) \leq B$ for all $x$, whose eigenvalues $\lambda_1\leq \dots\leq\lambda_n$ and $\mu_1\leq\dots\leq \mu_n$ are such that there are integers $N_1,\dots,N_n$ with $N_k^2<\lambda_k\leq\mu_k<(N_k+1)^2$ for all $k$. Using spectral theory in Hilbert spaces, these conditions imply that the linear operator $L+Q(u)\colon E\to F$ is an isomorphism, and that $\|(L+Q(u))^{-1}\|$ is bounded from above by the constant \[ c= \sup_{1\leq k\leq n} \max\left( (\lambda_k-N_k^2)^{-1},\ ((N_k+1)^2-\mu_k)^{-1} \right).\]
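To make the bound concrete, here is a small arithmetic check with made-up eigenvalue data (the values below are hypothetical, chosen only to satisfy the separation condition; they do not come from Brown and Lin's paper):

```python
# Hypothetical data: eigenvalue bounds lam_k (for A), mu_k (for B) and
# separating integers N_k with N_k^2 < lam_k <= mu_k < (N_k+1)^2.
lam = [1.5, 5.0]
mu = [2.5, 7.5]
N = [1, 2]

assert all(N[k]**2 < lam[k] <= mu[k] < (N[k] + 1)**2 for k in range(len(N)))

c = max(max(1 / (lam[k] - N[k]**2), 1 / ((N[k] + 1)**2 - mu[k]))
        for k in range(len(N)))
print(c)   # uniform bound for ||(L + Q(u))^{-1}||; here 2.0
```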

Thanks to this differential estimate, the theorem of Hadamard–Lévy implies that the nonlinear differential operator $L+N$ is a global diffeomorphism from $E$ to $F$. In particular, there is a unique $2\pi$-periodic solution for every $2\pi$-periodic control function $p$.

I thank Thomas Richard for his comments.

Thursday, January 14, 2021

On Rolle's theorem

This post is inspired by a paper of Azé and Hiriart-Urruty published in a French high school math journal; in fact, it is mostly a paraphrase of that paper, with the hope that it will be of some interest to young university students, or to students preparing the Agrégation. The topic is Rolle's theorem.

1. The one-dimensional theorem, a generalization and two other proofs

Let us first quote the theorem, in a nonstandard form.

Theorem.Let $I=\mathopen]a;b\mathclose[$ be a nonempty but possibly unbounded interval of $\mathbf R$ and let $f\colon I\to\mathbf R$ be a continuous function. Assume that $f$ has limits at $a$ and $b$, equal to some element $\ell\in\mathbf R\cup\{+\infty\}$. Then $f$ is bounded from below.

  1. If $\inf_I(f)<\ell$, then there exists a point $c\in I$ such that $f(c)=\inf_I (f)$. If, moreover, $f$ has a right derivative and a left derivative at $c$, then $f'_l(c)\leq0$ and $f'_r(c)\geq0$.
  2. If $\inf_I(f)\geq\ell$, then $f$ is bounded on $I$ and there exists a point $c\in I$ such that $f(c)=\sup_I(f)$. If, moreover, $f$ has a right derivative and a left derivative at $c$, then $f'_l(c)\geq0$ and $f'_r(c)\leq0$.

Three ingredients make this version slightly nonstandard:

  • The interval $I$ may be taken to be infinite;
  • The function $f$ may tend to $+\infty$ at the endpoints of $I$;
  • Only left and right derivatives are assumed.

Of course, if $f$ has a derivative at each point, then the statement implies that $f'(c)=f'_l(c)=f'_r(c)=0$.

a) Stated in this way, the proof is nevertheless quite standard and proceeds in two steps.

  1. Since $f$ has a limit $\ell\neq-\infty$ at $a$ and at $b$, there exist $a'$ and $b'$ in $I$ with $a<a'<b'<b$ such that $f$ is bounded from below on $\mathopen ]a;a']$ and on $[b';b\mathclose[$. Since $f$ is continuous on the compact interval $[a';b']$, it is then bounded from below on $I$.
    If $\inf_I(f)<\ell$, then we can choose $\ell'\in\mathbf R$ such that $\inf_I(f)<\ell'<\ell$, and $a'$, $b'$ such that $f(x)>\ell'$ outside of $[a';b']$. Let $c\in [a';b']$ be such that $f(c)=\inf_{[a';b']}(f)$; then $f(c)=\inf_I(f)$. 
    If $\sup_I(f)>\ell$, then we have in particular $\ell\neq+\infty$, and we apply the preceding analysis to $-f$.
    In the remaining case, $\inf_I(f)=\sup_I(f)=\ell$ and $f$ is constant.
  2. For $x>c$, one has $f(x)\geq f(c)$, hence $f'_r(c)\geq 0$; for $x<c$, one has $f(x)\geq f(c)$, hence $f'_l(c)\leq0$.

The interest of the given formulation can be understood by looking at the following two examples.

  1. If $f(x)=|x|$, on $\mathbf R$, then $f$ attains its lower bound at $x=0$ only, where one has $f'_r(0)=1$ and $f'_l(0)=-1$.
  2. Take $f(x)=e^{-x^2}$. Then there exists $c\in\mathbf R$ such that $f'(c)=0$. Of course, one has $f'(x)=-2xe^{-x^2}$, so that $c=0$. However, it is readily seen by induction that for any integer $n$, the $n$th derivative of $f$ is of the form $P_n(x)e^{-x^2}$, where $P_n$ has degree $n$. In particular, $f^{(n)}$ tends to $0$ at infinity. And, by induction again, the theorem implies that $P_n$ has $n$ distinct roots in $\mathbf R$: one between any two consecutive roots of $P_{n-1}$, one larger than the largest root of $P_{n-1}$, and one smaller than the smallest root of $P_{n-1}$.
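This example is easy to check numerically. Differentiating $f^{(n)}(x)=P_n(x)e^{-x^2}$ gives the recurrence $P_{n+1}=P_n'-2XP_n$, with $P_0=1$; the following sketch (my own, using numpy) computes the first few $P_n$ and their roots, which come out real and simple as claimed:

```python
import numpy as np

# The n-th derivative of exp(-x^2) is P_n(x) exp(-x^2), where
# P_{n+1} = P_n' - 2x P_n (differentiate and collect the factors).
P = np.polynomial.Polynomial([1.0])        # P_0 = 1
x = np.polynomial.Polynomial([0.0, 1.0])
for n in range(1, 7):
    P = P.deriv() - 2 * x * P
    roots = np.atleast_1d(P.roots())
    real = np.sort(roots[np.abs(np.imag(roots)) < 1e-9].real)
    print(n, real)                          # n simple real roots
```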

b) In a 1959 paper, the Romanian mathematician Pompeiu proposed an alternative proof of Rolle's theorem, valid when the interval $I$ is bounded, which works completely differently. Here is how it goes, following the 1979 paper by Hans Samelson in the American Math. Monthly.

First of all, one uses the particular case $n=2$ of the Levi chord lemma:

Lemma.Let $f\colon [a;b]\to\mathbf R$ be a continuous function such that $f(a)=f(b)$. For every integer $n\geq 2$, there exist $a',b'\in[a;b]$ such that $f(a')=f(b')$ and $b'-a'=(b-a)/n$.

Let $h=(b-a)/n$. From the equality
\[ 0 = f(b)-f(a) = (f(a+h)-f(a))+(f(a+2h)-f(a+h))+\cdots + (f(a+nh)-f(a+(n-1)h)), \]
one sees that the function $x\mapsto f(x+h)-f(x)$ from $[a;b-h]$ to $\mathbf R$ does not have constant sign. By the intermediate value theorem, it vanishes at some point $a'\in [a;b-h]$. If $b'=a'+h$, then $b'\in[a;b]$, $b'-a'=(b-a)/n$ and $f(a')=f(b')$.

Then, it follows by induction that there exists a sequence of nested intervals $([a_n;b_n])$ in $[a;b]$ with $f(a_n)=f(b_n)$ and $b_n-a_n=(b-a)/2^n$ for all $n$. The sequences $(a_n)$ and $(b_n)$ converge to a common limit $c\in [a;b]$. Since $f(b_n)=f(c)+(b_n-c) (f'(c) + \mathrm o(1))$, $f(a_n)=f(c)+(a_n-c)(f'(c)+\mathrm o(1))$, one has
\[ f'(c) = \lim \frac{f(b_n)-f(a_n)}{b_n-a_n} = 0. \]
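This argument is effective: iterating the chord-halving step numerically produces a sequence of nested intervals shrinking to a critical point. A sketch (my own illustration, with an arbitrary test function; the sign-change search uses the identity $g(a)+g(a+h)=f(b)-f(a)=0$ from the lemma):

```python
def chord_bisect(f, a, b, iters=35):
    """Pompeiu-Samelson made numerical: assuming f(a) == f(b),
    repeatedly find a half-length chord [a', a'+h] with
    f(a') == f(a'+h), located by a sign-change search on
    g(x) = f(x+h) - f(x)."""
    for _ in range(iters):
        h = (b - a) / 2
        g = lambda x: f(x + h) - f(x)
        lo, hi = a, a + h          # g(lo) and g(hi) have opposite signs
        if g(lo) != 0:
            for _ in range(60):    # bisection for a zero of g
                mid = (lo + hi) / 2
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
        a = lo                     # new chord [a, a+h] with f(a) ~ f(a+h)
        b = a + h
    return (a + b) / 2

f = lambda x: x**3 - x             # f(0) == f(1) == 0
print(chord_bisect(f, 0.0, 1.0))   # close to 1/sqrt(3), where f' vanishes
```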

What makes this proof genuinely distinct from the classical one is that the obtained point $c$ may not be a local minimum or maximum of $f$, though I don't have an example to offer now.

c) In 1979, Abian furnished yet another proof, which he termed the “ultimate” one. Here it is:

It focuses on functions $f\colon[a;b]\to\mathbf R$ on a bounded interval of $\mathbf R$ which are not monotone and, more precisely, which are up-down, in the sense that $f(a)\leq f(c)$ and $f(c)\geq f(b)$, where $c=(a+b)/2$ is the midpoint of $[a;b]$. If $f(a)=f(b)$, then either $f$ or $-f$ is up-down.

Then divide the interval $[a;b]$ into four equal parts: $[a;p]$, $[p;c]$, $[c;q]$ and $[q;b]$. If $f(p)\geq f(c)$, then $f|_{[a;c]}$ is up-down. Otherwise, one has $f(p)\leq f(c)$. In this case, if $f(c)\geq f(q)$, we see that $f|_{[p;q]}$ is up-down. And otherwise, we observe that $f(q)\leq f(c)$ and $f(c)\geq f(b)$, so that $f|_{[c;b]}$ is up-down. Conclusion: we have isolated within $[a;b]$ a subinterval $[a';b']$ of length $(b-a)/2$ such that $f|_{[a';b']}$ is still up-down.

Iterating the procedure, we construct a sequence $([a_n;b_n])$ of nested intervals, with $(b_n-a_n)=(b-a)/2^n$ such that the restriction of $f$ to each of them is up-down. Set $c_n=(a_n+b_n)/2$.

The sequences $(a_n)$, $(b_n)$ and $(c_n)$ have a common limit $c\in [a;b]$. From the inequalities $f(a_n)\leq f(c_n)$ and $a_n\leq c_n$,  we obtain $f'(c)\geq 0$; from the inequalities $f(c_n)\geq f(b_n)$ and $c_n\leq b_n$, we obtain $f'(c)\leq 0$. In conclusion, $f'(c)=0$.

2. Rolle's theorem in normed vector spaces

Theorem. Let $E$ be a normed vector space, let $U$ be an open subset of $E$ and let $f\colon U\to\mathbf R$ be a differentiable function. Assume that there exists $\ell\in\mathbf R\cup\{+\infty\}$ such that $f(x)\to \ell$ when $x$ tends to the “boundary” of $U$ — for every $\ell'<\ell$, there exists a compact subset $K$ of $U$ such that $f(x)\geq\ell'$ for all $x\in U$ but $x\not\in K$. Then $f$ is bounded below on $U$, there exists $a\in U$ such that $f(a)=\inf_U (f)$ and $Df(a)=0$.

The proof is essentially the same as the one we gave in dimension 1. I skip it here.

If $E$ is finite dimensional, then this theorem applies to a vast class of examples: bounded open subsets $U$ of $E$, and continuous functions $f\colon \overline U\to\mathbf R$ which are constant on the boundary $\partial(U)=\overline U - U$ of $U$ and differentiable on $U$.

However, if $E$ is infinite dimensional, the closure of a bounded open set is no longer compact, and it does not suffice that $f$ extend to a function on $\overline U$ with a constant value on the boundary.

Example. — Let $E$ be an infinite dimensional Hilbert space, let $U$ be the open unit ball and $B$ be the closed unit ball. Let $g(x)=\frac12 \langle Ax,x\rangle+\langle b,x\rangle +c$ be a quadratic function, where $A\in\mathcal L(E)$, $b\in E$ and $c\in\mathbf R$, and let $f(x)=(1-\lVert x\rVert^2) g(x)$. The function $f$ is differentiable on $E$ and one has
\[  \nabla f(x) =  (1-\lVert x\rVert^2) ( Ax + b) - 2 \left(\tfrac12 \langle Ax,x\rangle + \langle b,x\rangle + c\right) x. \]
Assume that there exists $x\in U$ such that $\nabla f(x)=0$. Then $Ax+b = \lambda x$, with
\[ \lambda= \frac2{1-\lVert x\rVert ^2} \left(\frac12 \langle Ax,x\rangle + \langle b,x\rangle + c \right). \]
Azé and Hiriart-Urruty take $E=L^2([0;1])$, for $A$ the operator of multiplication by the function $t$,  $b(t)=t(1-t)$, and $c=4/27$. Then one has $g(x)>0$, hence $\lambda>0$, and $x(t)=\frac1{\lambda-t}b(t)$ for $t\in[0;1]$. This implies that $\lambda\geq 1$, for otherwise the function $x(t)$ would not belong to $E$. One can then compute $\lambda$ explicitly, obtaining $\lambda\leq3/4$, which contradicts the inequality $\lambda\geq 1$. (I refer to the paper of Azé and Hiriart-Urruty for more details.)

3. An approximate version of Rolle's theorem

Theorem. Let $B$ be the closed Euclidean unit ball in $\mathbf R^n$, let $U$ be its interior, and let $f\colon B\to \mathbf R$ be a continuous function. Assume that $\lvert f\rvert \leq \varepsilon$ on the boundary $\partial(U)$ and that $f$ is differentiable on $U$. Then there exists $x\in U$ such that $\lVert Df(x)\rVert\leq\varepsilon$.

In fact, replacing $f$ by $f/\varepsilon$, one sees that it suffices to treat the case $\varepsilon =1$.

Let $g(x)=\lVert x\rVert^2- f(x)^2$. This is a continuous function on $B$; it is differentiable on $U$, with $ \nabla g(x)=2(x-f(x)\nabla f(x))$. Let $\mu=\inf_B(g)$. Since $g(0)=-f(0)^2\leq0$, one has $\mu\leq 0$. We distinguish two cases:

  1. If $\mu=0$, then $\lvert f(x)\rvert \leq \lVert x\rVert$ for all $x\in B$; in particular, $f(0)=0$, and this implies that $\lVert\nabla f(0)\rVert\leq1$.
  2. If $\mu<0$, let $x\in B$ be such that $ g(x)=\mu$; in particular, $f(x)^2\geq \lVert x\rVert^2-\mu>0$, which implies that $f(x)\neq0$. Since $g\geq0$ on $\partial(U)$, we have $x\in U$, hence $\nabla g(x)=0$. Then $x=f(x)\nabla f(x)$, hence $\nabla f(x)=x/f(x)$. Consequently,
    \[ \lVert \nabla f(x)\rVert = \frac{\lVert x\rVert}{\lvert f(x)\rvert}\leq \frac{\lVert x\rVert}{(\lVert x\rVert^2-\mu)^{1/2}}<1.\]

This concludes the proof. 
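The proof is constructive enough to test numerically. Below is a sketch (my own, with an arbitrary function satisfying $\lvert f\rvert\leq1$ on the unit circle, in dimension $2$) that minimizes $g$ on a grid of the ball and checks the gradient bound at the minimizer:

```python
import numpy as np

# A sample function on the closed unit disk with |f| <= 1 on the
# boundary: f(x, y) = 2(1 - x^2 - y^2) + x equals x on the unit circle.
def f(x, y):
    return 2 * (1 - x**2 - y**2) + x

def grad_f(x, y):
    return np.array([-4 * x + 1, -4 * y])

# Follow the proof: minimize g = |x|^2 - f^2 over the ball and check
# that the gradient of f is small at the minimizer.
t = np.linspace(-1, 1, 801)
X, Y = np.meshgrid(t, t)
mask = X**2 + Y**2 <= 1
G = np.where(mask, X**2 + Y**2 - f(X, Y)**2, np.inf)
i, j = np.unravel_index(np.argmin(G), G.shape)
x, y = X[i, j], Y[i, j]
print((x, y), np.linalg.norm(grad_f(x, y)))   # gradient norm <= 1
```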


Thanks to the Twitter users @AntoineTeutsch, @paulbroussous and @apauthie for pointing out some misprints and errors.