This post is inspired by a paper of Azé and Hiriart-Urruty published in a French high school math journal; in fact, it is mostly a paraphrase of that paper, in the hope that it will be of some interest to young university students, or to students preparing for the Agrégation. The topic is Rolle's theorem.
1. The one-dimensional theorem, a generalization and two other proofs
Let us first quote the theorem, in a nonstandard form.
Theorem. — Let $I=\mathopen]a;b\mathclose[$ be a nonempty but possibly unbounded interval of $\mathbf R$ and let $f\colon I\to\mathbf R$ be a continuous function. Assume that $f$ has limits at $a$ and $b$, equal to some element $\ell\in\mathbf R\cup\{+\infty\}$. Then $f$ is bounded from below.
- If $\inf_I(f)<\ell$, then there exists a point $c\in I$ such that $f(c)=\inf_I (f)$. If, moreover, $f$ has a right derivative and a left derivative at $c$, then $f'_l(c)\leq0$ and $f'_r(c)\geq0$.
- If $\inf_I(f)\geq\ell$, then $f$ is bounded on $I$ and there exists a point $c\in I$ such that $f(c)=\sup_I(f)$. If, moreover, $f$ has a right derivative and a left derivative at $c$, then $f'_l(c)\geq0$ and $f'_r(c)\leq0$.
Three ingredients make this version slightly nonstandard:
- The interval $I$ may be taken to be infinite;
- The function $f$ may tend to $+\infty$ at the endpoints of $I$;
- Only left and right derivatives are assumed.
Of course, if $f$ has a derivative at each point, then the statement implies that $f'(c)=f'_l(c)=f'_r(c)=0$.
a) Even stated in this way, the theorem has a quite standard proof, which proceeds in two steps.
- Since $f$ has a limit $\ell\neq-\infty$ at $a$ and at $b$, there exist $a'$ and $b'$ in $I$ with $a<a'<b'<b$ such that $f$ is bounded from below on $\mathopen ]a;a']$ and on $[b';b\mathclose[$. Since $f$ is continuous on the compact interval $[a';b']$, it is bounded from below there as well, hence on $I$.
If $\inf_I(f)<\ell$, then we can choose $\ell'\in\mathbf R$ such that $\inf_I(f)<\ell'<\ell$ and $a'$, $b'$ such that $f(x)>\ell'$ outside of $[a';b']$. Then, let $c\in [a';b']$ be such that $f(c)=\inf_{[a';b']}(f)$; since $f>\ell'>\inf_I(f)$ outside of $[a';b']$, one has $f(c)=\inf_I(f)$.
If $\sup_I(f)>\ell$, then we have in particular $\ell\neq+\infty$, and we apply the preceding analysis to $-f$.
In the remaining case, $\inf_I(f)=\sup_I(f)=\ell$ and $f$ is constant.
- For $x>c$, one has $f(x)\geq f(c)$, hence $f'_r(c)\geq 0$; for $x<c$, one has $f(x)\geq f(c)$, hence $f'_l(c)\leq0$.
The interest of the given formulation can be understood by looking at the following two examples.
- If $f(x)=|x|$, on $\mathbf R$, then $f$ attains its lower bound at $x=0$ only, where one has $f'_r(0)=1$ and $f'_l(0)=-1$.
- Take $f(x)=e^{-x^2}$ on $\mathbf R$. Then there exists $c\in\mathbf R$ such that $f'(c)=0$. Of course, one has $f'(x)=-2xe^{-x^2}$, so that $c=0$. However, it is readily seen by induction that for any integer $n\geq 0$, the $n$th derivative of $f$ is of the form $P_n(x)e^{-x^2}$, where $P_n$ is a polynomial of degree $n$. In particular, $f^{(n)}$ tends to $0$ at infinity. And, by induction again, the theorem implies that $P_n$ has $n$ distinct roots in $\mathbf R$: one between any two consecutive roots of $P_{n-1}$, one larger than the largest root of $P_{n-1}$, and one smaller than the smallest root of $P_{n-1}$.
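The claim about the roots of $P_n$ can be checked numerically for small $n$. The following Python sketch is only an illustration, not part of the original argument: it uses the recurrence $P_{n+1}=P_n'-2xP_n$ (obtained by differentiating $P_n(x)e^{-x^2}$) and counts sign changes of $P_n$ on a grid. The helper names and the grid bounds $[-6;6]$ are assumptions made for this example.

```python
def next_poly(p):
    """Coefficients (lowest degree first) of P_{n+1} = P_n' - 2x P_n,
    given those of P_n, where f^(n)(x) = P_n(x) exp(-x^2)."""
    deriv = [k * p[k] for k in range(1, len(p))]   # P_n'
    shifted = [0] + [2 * a for a in p]             # 2x * P_n
    deriv += [0] * (len(shifted) - len(deriv))     # pad to equal length
    return [d - s for d, s in zip(deriv, shifted)]


def evaluate(p, x):
    """Evaluate the polynomial p at x by Horner's rule."""
    value = 0.0
    for a in reversed(p):
        value = value * x + a
    return value


def sign_changes(p, lo=-6.0, hi=6.0, steps=2400):
    """Count sign changes of p at the midpoints of a uniform grid.
    Each sign change forces a real root, so this is a lower bound
    for the number of distinct real roots in [lo, hi]."""
    width = (hi - lo) / steps
    values = [evaluate(p, lo + (k + 0.5) * width) for k in range(steps)]
    return sum(1 for u, v in zip(values, values[1:]) if u * v < 0)


polys = [[1]]                      # P_0 = 1
for _ in range(6):
    polys.append(next_poly(polys[-1]))

for n, p in enumerate(polys):
    # A polynomial of degree n with n sign changes has n distinct real roots.
    print(n, len(p) - 1, sign_changes(p))
```

Since a polynomial of degree $n$ has at most $n$ roots, finding $n$ sign changes confirms, for these values of $n$, that all roots are real and distinct.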
b) In a 1959 paper, the Romanian mathematician Pompeiu proposed an alternative proof of Rolle's theorem, valid when the interval $I$ is bounded, which works completely differently. Here is how it goes, following the 1979 paper published in the American Math. Monthly by Hans Samelson.
First of all, one uses the particular case $n=2$ of the Levi chord lemma:
Lemma. — Let $f\colon [a;b]\to\mathbf R$ be a continuous function such that $f(a)=f(b)$. For every integer $n\geq 2$, there exists $a',b'\in[a;b]$ such that $f(a')=f(b')$ and $b'-a'=(b-a)/n$.
Let $h=(b-a)/n$. From the equality
\[ 0 = f(b)-f(a) = (f(a+h)-f(a))+(f(a+2h)-f(a+h))+\cdots + (f(a+nh)-f(a+(n-1)h)), \]
one sees that the function $x\mapsto f(x+h)-f(x)$ from $[a;b-h]$ to $\mathbf R$ does not have constant sign. By the intermediate value theorem, it vanishes at some point $a'\in [a;b-h]$. If $b'=a'+h$, then $b'\in[a;b]$, $b'-a'=(b-a)/n$ and $f(a')=f(b')$.
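The proof of the lemma is effective, and can be sketched in Python as follows. This is a minimal illustration under the assumption that $f$ is continuous with $f(a)=f(b)$; the function name `horizontal_chord` and the test function $x\mapsto x-x^3$ are choices made for this example, and the intermediate value theorem is replaced by bisection.

```python
def horizontal_chord(f, a, b, n, iterations=60):
    """Return (a1, b1) with b1 - a1 = (b - a)/n and f(a1) ~ f(b1),
    assuming f is continuous on [a, b] and f(a) == f(b)."""
    h = (b - a) / n
    g = lambda x: f(x + h) - f(x)
    nodes = [a + k * h for k in range(n)]
    # g(nodes[0]) + ... + g(nodes[n-1]) = f(b) - f(a) = 0, so either some
    # term vanishes or two consecutive terms have opposite signs.
    for x in nodes:
        if g(x) == 0:
            return x, x + h
    for lo, hi in zip(nodes, nodes[1:]):
        if g(lo) * g(hi) < 0:
            # Bisection keeps a sign change of g inside [lo, hi].
            for _ in range(iterations):
                mid = (lo + hi) / 2
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            return lo, lo + h
    raise ValueError("no sign change found; is f(a) equal to f(b)?")


f = lambda x: x - x**3             # f(0) = f(1) = 0
a1, b1 = horizontal_chord(f, 0.0, 1.0, 3)
print(a1, b1, f(a1) - f(b1))       # a chord of length 1/3
```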
Then, it follows by induction that there exists a sequence of nested intervals $([a_n;b_n])$ in $[a;b]$ with $f(a_n)=f(b_n)$ and $b_n-a_n=(b-a)/2^n$ for all $n$. The sequences $(a_n)$ and $(b_n)$ converge to the same limit $c\in [a;b]$, and $a_n\leq c\leq b_n$ for all $n$. Since $f(b_n)=f(c)+(b_n-c) (f'(c) + \mathrm o(1))$ and $f(a_n)=f(c)+(a_n-c)(f'(c)+\mathrm o(1))$, one has
\[ f'(c) = \lim \frac{f(b_n)-f(a_n)}{b_n-a_n} = 0. \]
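Here is a small Python sketch of this nested-interval construction, given only as an illustration. It assumes floating-point arithmetic is accurate enough for the first few halvings; the names `halve_chord` and `pompeiu_point`, the number of steps and the test function $x\mapsto x-x^3$ (whose only critical point in $[0;1]$ is $1/\sqrt3$) are assumptions of this example.

```python
def halve_chord(f, a, b, iterations=60):
    """Given f(a) ~ f(b), return (a1, a1 + h) with h = (b - a)/2 and
    f(a1) ~ f(a1 + h): the case n = 2 of the chord lemma, by bisection."""
    h = (b - a) / 2
    g = lambda x: f(x + h) - f(x)
    lo, hi = a, a + h            # g(a) + g(a + h) = f(b) - f(a) = 0
    if g(lo) * g(hi) > 0:
        raise ValueError("expected opposite signs; is f(a) equal to f(b)?")
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return lo, lo + h


def pompeiu_point(f, a, b, steps=15):
    """Midpoint of the interval [a_n, b_n] after `steps` halvings."""
    for _ in range(steps):
        a, b = halve_chord(f, a, b)
    return (a + b) / 2


f = lambda x: x - x**3           # f(0) = f(1) = 0
c = pompeiu_point(f, 0.0, 1.0)
print(c, 1 - 3 * c**2)           # approximate point c and the value f'(c)
```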
What makes this proof genuinely distinct from the classical one is that the point $c$ it produces need not be a local minimum or maximum of $f$, although I don't have an example to offer now.
c) In 1979, Abian furnished yet another proof, which he termed the “ultimate” one. Here it is:
It focuses on functions $f\colon[a;b]\to\mathbf R$ on a bounded interval of $\mathbf R$ which are not monotone and, precisely, which are up-down, in the sense that $f(a)\leq f(c)$ and $f(c)\geq f(b)$, where $c=(a+b)/2$ is the midpoint of $[a;b]$. If $f(a)=f(b)$, then either $f$ or $-f$ is up-down.
Then divide the interval $[a;b]$ into four equal parts: $[a;p]$, $[p;c]$, $[c;q]$ and $[q;b]$. If $f(p)\geq f(c)$, then $f(a)\leq f(c)\leq f(p)$, so that $f|_{[a;c]}$ is up-down. Otherwise, one has $f(p)\leq f(c)$. In this case, if $f(c)\geq f(q)$, we see that $f|_{[p;q]}$ is up-down. And otherwise, we observe that $f(c)\leq f(q)$ and $f(q)\geq f(c)\geq f(b)$, so that $f|_{[c;b]}$ is up-down. Conclusion: we have isolated within the interval $[a;b]$ a subinterval $[a';b']$ of length $(b-a)/2$ such that $f|_{[a';b']}$ is still up-down.
Iterating the procedure, we construct a sequence $([a_n;b_n])$ of nested intervals, with $(b_n-a_n)=(b-a)/2^n$ such that the restriction of $f$ to each of them is up-down. Set $c_n=(a_n+b_n)/2$.
The sequences $(a_n)$, $(b_n)$, $(c_n)$ have a common limit $c\in [a;b]$. From the inequalities $f(a_n)\leq f(c_n)$ and $a_n< c_n$, we obtain $f'(c)\geq 0$; from the inequalities $f(c_n)\geq f(b_n)$ and $c_n< b_n$, we obtain $f'(c)\leq 0$. In conclusion, $f'(c)=0$.
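Abian's construction translates almost literally into code. The Python sketch below is an illustration only: the names `up_down_step` and `abian_point`, the number of steps and the test function $x\mapsto x-x^3$ are assumptions of this example, and the comparisons are made in floating point, so they are only reliable for a moderate number of halvings.

```python
def up_down_step(f, a, b):
    """One step of the construction: given f(a) <= f(c) >= f(b) with c the
    midpoint of [a, b], return a subinterval of half the length on which
    f is still up-down."""
    c = (a + b) / 2
    p, q = (a + c) / 2, (c + b) / 2
    if f(p) >= f(c):
        return a, c      # f(a) <= f(c) <= f(p) and f(p) >= f(c)
    if f(c) >= f(q):
        return p, q      # f(p) < f(c) and f(c) >= f(q)
    return c, b          # f(c) < f(q) and f(q) > f(c) >= f(b)


def abian_point(f, a, b, steps=20):
    """Midpoint of the interval [a_n, b_n] after `steps` halvings."""
    for _ in range(steps):
        a, b = up_down_step(f, a, b)
    return (a + b) / 2


f = lambda x: x - x**3           # up-down on [0, 1]: f(0) = f(1) = 0 <= f(1/2)
c = abian_point(f, 0.0, 1.0)
print(c, 1 - 3 * c**2)           # approximate point c and the value f'(c)
```

Unlike Pompeiu's method, each step costs only three evaluations of $f$ and needs no root finding.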
2. Rolle's theorem in normed vector spaces
Theorem. — Let $E$ be a normed vector space, let $U$ be an open subset of $E$ and let $f\colon U\to\mathbf R$ be a differentiable function. Assume that there exists $\ell\in\mathbf R\cup\{+\infty\}$ such that $f(x)\to \ell$ when $x$ tends to the “boundary” of $U$ — for every $\ell'<\ell$, there exists a compact subset $K$ of $U$ such that $f(x)\geq\ell'$ for all $x\in U$ but $x\not\in K$. Then $f$ is bounded below on $U$, there exists $a\in U$ such that $f(a)=\inf_U (f)$ and $Df(a)=0$.
The proof is essentially the same as the one we gave in dimension 1. I skip it here.
If $E$ is finite dimensional, then this theorem applies in a vast class of examples: for instance, bounded open subsets $U$ of $E$, and continuous functions $f\colon \overline U\to\mathbf R$ which are constant on the boundary $\partial(U)=\overline U - U$ of $U$ and differentiable on $U$.
However, if $E$ is infinite dimensional, the closure of a bounded open set is no longer compact, and it does not suffice that $f$ extends to a function on $\overline U$ with a constant value on the boundary.
Example. — Let $E$ be an infinite dimensional Hilbert space, let $U$ be the open unit ball and $B$ be the closed unit ball. Let $g(x)=\frac12 \langle Ax,x\rangle+\langle b,x\rangle +c$ be a quadratic function, where $A\in\mathcal L(E)$ is self-adjoint, $b\in E$ and $c\in\mathbf R$, and let $f(x)=(1-\lVert x\rVert^2) g(x)$. The function $f$ is differentiable on $E$ and one has
\[ \nabla f(x) = (1-\lVert x\rVert^2) ( Ax + b) - 2 \left(\frac12 \langle Ax,x\rangle + \langle b,x\rangle + c\right) x. \]
Assume that there exists $x\in U$ such that $\nabla f(x)=0$. Then $Ax+b = \lambda x$, with
\[ \lambda= \frac2{1-\lVert x\rVert ^2} \left(\frac12 \langle Ax,x\rangle + \langle b,x\rangle + c \right). \]
Azé and Hiriart-Urruty take $E=L^2([0;1])$, for $A$ the operator of multiplication by the function $t$, $b(t)=t(1-t)$, and $c=4/27$. Then, one has $g(x)>0$, hence $\lambda>0$, and $x(t)=\frac1{\lambda-t}b(t)$ for $t\in[0;1]$. This implies that $\lambda\geq 1$, for, otherwise, the function $x(t)$ would not belong to $E$. This allows one to compute $\lambda$ in terms of $\mu$, obtaining $\lambda\leq3/4$, which contradicts the inequality $\lambda\geq 1$. (I refer to the paper of Azé and Hiriart-Urruty for more details.)
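To spell out the first two steps of this argument (a sketch only; the estimate $\lambda\leq 3/4$ is not reproduced here): since $A$ is the multiplication by $t$, the equation $Ax+b=\lambda x$ reads pointwise
\[ t\,x(t) + t(1-t) = \lambda\, x(t), \qquad\text{hence}\qquad x(t)=\frac{t(1-t)}{\lambda-t}. \]
If $0<\lambda<1$, then $b(\lambda)=\lambda(1-\lambda)\neq 0$, so that $\lvert x(t)\rvert^2$ is equivalent to $\lambda^2(1-\lambda)^2/(t-\lambda)^2$ when $t\to\lambda$, and
\[ \int_0^1 \frac{dt}{(t-\lambda)^2} = +\infty, \]
so that $x\notin L^2([0;1])$. For $\lambda=1$, on the other hand, the function $x(t)=t$ does belong to $E$.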
3. An approximate version of Rolle's theorem
The statement, as it can be read off from the proof below, is the following: let $B$ be the closed Euclidean unit ball of $\mathbf R^n$, let $U$ be its interior, let $f\colon B\to\mathbf R$ be a continuous function which is differentiable on $U$, and let $\epsilon>0$ be such that $\lvert f\rvert\leq\epsilon$ on the boundary $\partial(U)$. Then there exists $x\in U$ such that $\lVert\nabla f(x)\rVert\leq\epsilon$.
In fact, replacing $f$ by $f/\epsilon$, one sees that it suffices to treat the case $\epsilon =1$.
Let $g(x)=\lVert x\rVert^2- f(x)^2$. This is a continuous function on $B$; it is differentiable on $U$, with $ \nabla g(x)=2(x-f(x)\nabla f(x))$. Let $\mu=\inf_B(g)$. Since $g(0)=-f(0)^2\leq0$, one has $\mu\leq 0$. We distinguish two cases:
- If $\mu=0$, then $\lvert f(x)\rvert \leq \lVert x\rVert$ for all $x\in B$. In particular, $f(0)=0$, and letting $x$ tend to $0$ in this inequality shows that $\lVert\nabla f(0)\rVert\leq1$.
- If $\mu<0$, let $x\in B$ be such that $ g(x)=\mu$; in particular, $f(x)^2\geq \lVert x\rVert^2-\mu>0$, which implies that $f(x)\neq0$. Since $g\geq0$ on $\partial(U)$ and $g(x)<0$, we have $x\in U$, hence $\nabla g(x)=0$. Then $x=f(x)\nabla f(x)$, hence $\nabla f(x)=x/f(x)$. Consequently,
\[ \lVert \nabla f(x)\rVert = \frac{\lVert x\rVert}{\lvert f(x)\rvert}\leq \frac{\lVert x\rVert}{(\lVert x\rVert^2-\mu)^{1/2}}<1.\]
This concludes the proof.
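The mechanism of this proof can be observed numerically in dimension 1, where $B=[-1;1]$. The Python sketch below is only an illustration: the function $f(x)=x+\sin(4x)$ is a hypothetical example chosen so that $\lvert f\rvert\leq 1$ at $\pm1$, and the minimizer of $g$ is approximated by a crude grid search rather than obtained by compactness.

```python
import math


def f(x):
    return x + math.sin(4 * x)          # |f(-1)| = |f(1)| ~ 0.243 <= 1


def f_prime(x):
    return 1 + 4 * math.cos(4 * x)


def g(x):
    return x * x - f(x) ** 2


# Approximate a minimizer of g on B = [-1, 1] by a grid search.
grid = [-1 + 2 * k / 200000 for k in range(200001)]
x_min = min(grid, key=g)
mu = g(x_min)

# Here mu < 0, so the second case of the proof applies: the minimizer is
# interior and f'(x_min) should be close to x_min / f(x_min).
print(mu, x_min, f_prime(x_min), x_min / f(x_min))
```

Note that $f'$ itself is far from being bounded by $1$ on $B$ (one has $f'(0)=5$); the proof only locates one point where it is.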
Thanks to the Twitter users @AntoineTeutsch, @paulbroussous and @apauthie for pointing out some misprints and errors.