Thursday, January 14, 2021

On Rolle's theorem

This post is inspired by a paper of Azé and Hiriart-Urruty published in a French high school math journal; in fact, it is mostly a paraphrase of that paper, in the hope that it will be of some interest to young university students, or to students preparing for the Agrégation. The topic is Rolle's theorem.

1. The one-dimensional theorem, a generalization and two other proofs

Let us first quote the theorem, in a nonstandard form.

Theorem. Let $I=\mathopen]a;b\mathclose[$ be a nonempty but possibly unbounded interval of $\mathbf R$ and let $f\colon I\to\mathbf R$ be a continuous function. Assume that $f$ has limits at $a$ and $b$, equal to some element $\ell\in\mathbf R\cup\{+\infty\}$. Then $f$ is bounded from below.

  1. If $\inf_I(f)<\ell$, then there exists a point $c\in I$ such that $f(c)=\inf_I (f)$. If, moreover, $f$ has a right derivative and a left derivative at $c$, then $f'_l(c)\leq0$ and $f'_r(c)\geq0$.
  2. If $\inf_I(f)\geq\ell$, then $f$ is bounded on $I$ and there exists a point $c\in I$ such that $f(c)=\sup_I(f)$. If, moreover, $f$ has a right derivative and a left derivative at $c$, then $f'_l(c)\geq0$ and $f'_r(c)\leq0$.

Three ingredients make this version slightly nonstandard:

  • The interval $I$ may be taken to be infinite;
  • The function $f$ may tend to $+\infty$ at the endpoints of $I$;
  • Only left and right derivatives are assumed.

Of course, if $f$ has a derivative at each point, then the statement implies that $f'(c)=f'_l(c)=f'_r(c)=0$.
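As a quick illustration (this example is mine, not taken from the paper), here is a function on an unbounded interval which tends to $+\infty$ at both endpoints, so that only this form of the theorem applies directly:

```latex
f(x) = x + \frac1x \quad (x>0), \qquad
\lim_{x\to 0^+} f(x) = \lim_{x\to+\infty} f(x) = +\infty = \ell,
\qquad
f'(x) = 1 - \frac1{x^2} = 0 \iff x = 1, \qquad f(1) = 2 = \inf_I (f) < \ell .
```

Here $I=\mathopen]0;+\infty\mathclose[$, the infimum is attained at $c=1$, and $f'(1)=0$, as predicted.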

a) Stated in this way, the theorem nevertheless has a quite standard proof, which proceeds in two steps.

  1. Using that $f$ has a limit $\ell$ which is not $-\infty$ at $a$ and $b$, it follows that there exist $a'$ and $b'$ in $I$ with $a<a'<b'<b$ such that $f$ is bounded from below on $\mathopen ]a;a']$ and on $[b';b\mathclose[$. Since $f$ is continuous on the compact interval $[a';b']$, it is then bounded from below on $I$.
    If $\inf_I(f)<\ell$, then we can choose $\ell'\in\mathbf R$ such that $\inf_I(f)<\ell'<\ell$ and $a'$, $b'$ such that $f(x)>\ell'$ outside of $[a';b']$. Then, let $c\in [a';b']$ be such that $f(c)=\inf_{[a';b']}(f)$; then $f(c)=\inf_I(f)$.
    If $\sup_I(f)>\ell$, then we have in particular $\ell\neq+\infty$, and we apply the preceding analysis to $-f$.
    In the remaining case, $\inf_I(f)=\sup_I(f)=\ell$ and $f$ is constant.
  2. In the case of a minimum, for $x>c$, one has $f(x)\geq f(c)$, hence $f'_r(c)\geq 0$; for $x<c$, one has $f(x)\geq f(c)$, hence $f'_l(c)\leq0$. The case of a maximum follows by considering $-f$.

The interest of the given formulation can be understood by looking at the following two examples.

  1. If $f(x)=|x|$, on $\mathbf R$, then $f$ attains its lower bound at $x=0$ only, where one has $f'_r(0)=1$ and $f'_l(0)=-1$.
  2. Take $f(x)=e^{-x^2}$. Then there exists $c\in\mathbf R$ such that $f'(c)=0$. Of course, one has $f'(x)=-2xe^{-x^2}$, so that $c=0$. However, it is readily seen by induction that for any integer $n$, the $n$th derivative of $f$ is of the form $P_n(x)e^{-x^2}$, where $P_n$ has degree $n$. In particular, $f^{(n)}$ tends to $0$ at infinity. And, by induction again, the theorem implies that $P_n$ has $n$ distinct roots in $\mathbf R$, one between any two consecutive roots of $P_{n-1}$, one larger than the largest root of $P_{n-1}$, and one smaller than the smallest root of $P_{n-1}$.
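The interlacing of the roots in the second example can be checked numerically. The following is only a sketch: it assumes the recurrence $P_n = P_{n-1}' - 2xP_{n-1}$ (obtained by differentiating $P_{n-1}(x)e^{-x^2}$), and the function names `next_poly`, `bisect_root` and `interlaced_roots` are mine. It locates each root of $P_n$ by bisection inside the brackets delimited by the roots of $P_{n-1}$ and a crude bound on the size of the roots.

```python
def next_poly(p):
    """Coefficients (lowest degree first) of P' - 2x P, given those of P."""
    deriv = [k * p[k] for k in range(1, len(p))] or [0]
    shifted = [0] + [-2 * a for a in p]          # coefficients of -2x P
    deriv = deriv + [0] * (len(shifted) - len(deriv))
    return [u + v for u, v in zip(deriv, shifted)]


def evaluate(p, x):
    """Horner evaluation of the polynomial p at x."""
    acc = 0.0
    for a in reversed(p):
        acc = acc * x + a
    return acc


def bisect_root(p, lo, hi, steps=200):
    """Bisection on [lo, hi], assuming p changes sign between lo and hi."""
    f_lo = evaluate(p, lo)
    for _ in range(steps):
        mid = (lo + hi) / 2
        f_mid = evaluate(p, mid)
        if f_mid == 0:
            return mid
        if (f_mid < 0) == (f_lo < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def interlaced_roots(n_max):
    """Return {n: sorted roots of P_n} for 0 <= n <= n_max."""
    p, roots, result = [1], [], {0: []}
    for n in range(1, n_max + 1):
        p = next_poly(p)
        # Cauchy bound: every real root of p lies in ]-bound; bound[.
        bound = 1 + max(abs(a) for a in p[:-1]) / abs(p[-1])
        brackets = [-bound] + roots + [bound]
        new_roots = []
        for lo, hi in zip(brackets, brackets[1:]):
            # The sign change on each bracket is what Rolle's theorem predicts.
            assert evaluate(p, lo) * evaluate(p, hi) < 0
            new_roots.append(bisect_root(p, lo, hi))
        roots = new_roots
        result[n] = roots
    return result
```

Calling `interlaced_roots(4)` returns, for each $n\leq 4$, a list of $n$ increasing roots; floating-point evaluation limits this check to small values of $n$.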

b) In a 1959 paper, the Romanian mathematician Pompeiu proposed an alternative proof of Rolle's theorem, valid when the interval $I$ is bounded, which works completely differently. Here is how it goes, following the 1979 paper by Hans Samelson published in the American Math. Monthly.

First of all, one uses the particular case $n=2$ of the Levi chord lemma:

Lemma. Let $f\colon [a;b]\to\mathbf R$ be a continuous function such that $f(a)=f(b)$. For every integer $n\geq 2$, there exist $a',b'\in[a;b]$ such that $f(a')=f(b')$ and $b'-a'=(b-a)/n$.

Let $h=(b-a)/n$. From the equality
$$ 0 = f(b)-f(a) = (f(a+h)-f(a))+(f(a+2h)-f(a+h))+\cdots + (f(a+nh)-f(a+(n-1)h)), $$
one sees that the function $x\mapsto f(x+h)-f(x)$ from $[a;b-h]$ to $\mathbf R$ takes both nonnegative and nonpositive values. By the intermediate value theorem, it vanishes at some point $a'\in [a;b-h]$. If $b'=a'+h$, then $b'\in[a;b]$, $b'-a'=(b-a)/n$ and $f(a')=f(b')$.
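The argument of the lemma is constructive enough to be turned into a small numerical sketch. The code below is an illustration only (the name `horizontal_chord` and the test function are mine): it looks for a sign change of $x\mapsto f(x+h)-f(x)$ between two consecutive points $a+kh$, then applies bisection there.

```python
import math


def horizontal_chord(f, a, b, n, steps=100):
    """Return (a1, b1) with b1 - a1 = (b - a)/n and f(a1) close to f(b1).

    Assumes that f is continuous on [a, b] and that f(a) = f(b).
    """
    h = (b - a) / n

    def d(x):
        return f(x + h) - f(x)

    grid = [a + k * h for k in range(n)]
    values = [d(x) for x in grid]
    # The values sum to f(b) - f(a) = 0: either one of them vanishes...
    for x, v in zip(grid, values):
        if v == 0:
            return x, x + h
    # ...or two consecutive ones have opposite signs, and we bisect between them.
    for k in range(n - 1):
        if values[k] * values[k + 1] < 0:
            lo, hi = grid[k], grid[k + 1]
            for _ in range(steps):
                mid = (lo + hi) / 2
                if (d(mid) < 0) == (values[k] < 0):
                    lo = mid
                else:
                    hi = mid
            mid = (lo + hi) / 2
            return mid, mid + h
    raise ValueError("no sign change found: is f(a) equal to f(b)?")


# Usage: a chord of length 1/3 for sin(2*pi*x) on [0, 1].
a1, b1 = horizontal_chord(lambda x: math.sin(2 * math.pi * x), 0.0, 1.0, 3)
```

For this test function, the equation $f(x+1/3)=f(x)$ can be solved by hand on $[0;1/3]$ and gives $a'=1/12$, which is the point the bisection approaches.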

Then, it follows by induction that there exists a sequence of nested intervals $([a_n;b_n])$ in $[a;b]$ with $f(a_n)=f(b_n)$ and $b_n-a_n=(b-a)/2^n$ for all $n$. The sequences $(a_n)$ and $(b_n)$ converge to the same limit $c\in [a;b]$. Since $f(b_n)=f(c)+(b_n-c) (f'(c) + \mathrm o(1))$ and $f(a_n)=f(c)+(a_n-c)(f'(c)+\mathrm o(1))$, one has
$$ f'(c) = \lim \frac{f(b_n)-f(a_n)}{b_n-a_n} = 0. $$

What makes this proof genuinely distinct from the classical one is that the obtained point $c$ may not be a local minimum or maximum of $f$, although I don't have an example to offer now.
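The nested intervals of this proof can be followed numerically. Here is a rough sketch (the name `pompeiu_point` is mine, and floating-point arithmetic only allows a moderate number of halvings): each round applies the case $n=2$ of the lemma, by bisection on $x\mapsto f(x+h)-f(x)$ over the left half of the current interval.

```python
import math


def pompeiu_point(f, a, b, rounds=20, steps=80):
    """Approximate a point c with f'(c) = 0 by repeated chord halving.

    Assumes that f is continuous on [a, b], differentiable at the limit
    point, and that f(a) = f(b).
    """
    for _ in range(rounds):
        h = (b - a) / 2
        lo, hi = a, a + h
        d_lo = f(lo + h) - f(lo)
        if d_lo != 0:
            # d(a) = f(a+h) - f(a) and d(a+h) = f(b) - f(a+h) have opposite
            # signs because f(a) = f(b), so bisection finds a zero of d.
            for _ in range(steps):
                mid = (lo + hi) / 2
                if ((f(mid + h) - f(mid)) < 0) == (d_lo < 0):
                    lo = mid
                else:
                    hi = mid
            lo = (lo + hi) / 2
        # New interval [lo, lo + h], of half the length, with equal end values.
        a, b = lo, lo + h
    return (a + b) / 2


# Usage: for sin(2*pi*x) on [0, 1], the intervals close in on x = 1/4.
c = pompeiu_point(lambda x: math.sin(2 * math.pi * x), 0.0, 1.0)
```

In this particular run the limit point happens to be a maximum of the function; the sketch says nothing about whether that is the case in general.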

c) In 1979, Abian furnished yet another proof, which he termed the “ultimate” one. Here it is:

It focuses on functions $f\colon[a;b]\to\mathbf R$ on a bounded interval of $\mathbf R$ which are not monotone and, more precisely, which are up-down, in the sense that $f(a)\leq f(c)$ and $f(c)\geq f(b)$, where $c=(a+b)/2$ is the midpoint of $[a;b]$. If $f(a)=f(b)$, then either $f$ or $-f$ is up-down.

Then divide the interval $[a;b]$ in four equal parts: $[a;p]$, $[p;c]$, $[c;q]$ and $[q;b]$. If $f(p)\geq f(c)$, then $f(a)\leq f(c)\leq f(p)$ and $f|_{[a;c]}$ is up-down. Otherwise, one has $f(p)\leq f(c)$. In this case, if $f(c)\geq f(q)$, we see that $f|_{[p;q]}$ is up-down. And otherwise, we observe that $f(c)\leq f(q)$ and $f(q)\geq f(c)\geq f(b)$, so that $f|_{[c;b]}$ is up-down. Conclusion: we have isolated within the interval $[a;b]$ a subinterval $[a';b']$ of length $(b-a)/2$ such that $f|_{[a';b']}$ is still up-down.

Iterating the procedure, we construct a sequence $([a_n;b_n])$ of nested intervals, with $b_n-a_n=(b-a)/2^n$, such that the restriction of $f$ to each of them is up-down. Set $c_n=(a_n+b_n)/2$.

The sequences $(a_n)$, $(b_n)$, $(c_n)$ have a common limit $c\in [a;b]$. From the inequalities $f(a_n)\leq f(c_n)$ and $a_n\leq c_n$, we obtain $f'(c)\geq 0$; from the inequalities $f(c_n)\geq f(b_n)$ and $c_n\leq b_n$, we obtain $f'(c)\leq 0$. In conclusion, $f'(c)=0$.
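Abian's construction is an algorithm in disguise, and a minimal sketch of it fits in a few lines. The name `abian_point` is mine, and the three branches follow the case distinction on $f(p)$, $f(c)$ and $f(q)$ used above to halve the interval.

```python
import math


def abian_point(f, a, b, steps=40):
    """Shrink [a, b] by halves, keeping f up-down on the current interval.

    Assumes f(a) <= f(c) and f(c) >= f(b) at the midpoint c of [a, b].
    """
    c = (a + b) / 2
    if not (f(a) <= f(c) and f(c) >= f(b)):
        raise ValueError("f is not up-down on [a, b]")
    for _ in range(steps):
        c = (a + b) / 2
        p, q = (a + c) / 2, (c + b) / 2
        if f(p) >= f(c):
            b = c            # f is up-down on [a; c]
        elif f(c) >= f(q):
            a, b = p, q      # f is up-down on [p; q]
        else:
            a = c            # f is up-down on [c; b]
    return (a + b) / 2


# Usage: sin is up-down on [0, 3], and the intervals close in on pi/2.
c = abian_point(math.sin, 0.0, 3.0)
```

Once the interval becomes very short, the comparisons between values of $f$ are dominated by rounding errors, so the result should only be trusted up to a modest precision.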

2. Rolle's theorem in normed vector spaces

Theorem. Let $E$ be a normed vector space, let $U$ be an open subset of $E$ and let $f\colon U\to\mathbf R$ be a differentiable function. Assume that there exists $\ell\in\mathbf R\cup\{+\infty\}$ such that $f(x)\to \ell$ when $x$ tends to the “boundary” of $U$, in the following sense: for every $\ell'<\ell$, there exists a compact subset $K$ of $U$ such that $f(x)\geq\ell'$ for all $x\in U$ with $x\not\in K$. Then $f$ is bounded below on $U$, there exists $a\in U$ such that $f(a)=\inf_U (f)$ and $Df(a)=0$.

The proof is essentially the same as the one we gave in dimension 1. I skip it here.

If $E$ is finite dimensional, then this theorem applies to a vast class of examples, such as bounded open subsets $U$ of $E$ and continuous functions $f\colon \overline U\to\mathbf R$ which are constant on the boundary $\partial(U)=\overline U - U$ of $U$ and differentiable on $U$.

However, if $E$ is infinite dimensional, the closure of a bounded open set is no longer compact, and it does not suffice that $f$ extends to a function on $\overline U$ with a constant value on the boundary.

Example. Let $E$ be an infinite dimensional Hilbert space, let $U$ be the open unit ball and $B$ be the closed unit ball. Let $g(x)=\frac12 \langle Ax,x\rangle+\langle b,x\rangle +c$ be a quadratic function, where $A\in\mathcal L(E)$ (assumed self-adjoint, so that the gradient formula below holds), $b\in E$ and $c\in\mathbf R$, and let $f(x)=(1-\lVert x\rVert^2) g(x)$. The function $f$ is differentiable on $E$ and one has
$$ \nabla f(x) = (1-\lVert x\rVert^2) ( Ax + b) - 2 \left(\frac12 \langle Ax,x\rangle + \langle b,x\rangle + c\right) x. $$
Assume that there exists $x\in U$ such that $\nabla f(x)=0$. Then $Ax+b = \lambda x$, with
$$ \lambda= \frac2{1-\lVert x\rVert ^2} \left(\frac12 \langle Ax,x\rangle + \langle b,x\rangle + c \right). $$
Azé and Hiriart-Urruty take $E=L^2([0;1])$, for $A$ the operator of multiplication by the function $t$, $b(t)=t(1-t)$, and $c=4/27$. Then, one has $g(x)>0$, hence $\lambda>0$, and $x(t)=\frac1{\lambda-t}b(t)$ for $t\in[0;1]$. This implies that $\lambda\geq 1$, for, otherwise, the function $x(t)$ would not belong to $E$. This allows one to compute $\lambda$ in terms of $\mu$, obtaining $\lambda\leq3/4$, which contradicts the inequality $\lambda\geq 1$. (I refer to the paper of Azé and Hiriart-Urruty for more details.)

3. An approximate version of Rolle's theorem

Theorem. Let $B$ be the closed euclidean unit ball in $\mathbf R^n$, let $U$ be its interior, let $f\colon B\to \mathbf R$ be a continuous function on $B$. Assume that $\lvert f\rvert \leq \epsilon$ on the boundary $\partial(U)$ and that $f$ is differentiable on $U$. Then there exists $x\in U$ such that $\lVert Df(x)\rVert\leq\epsilon$.

In fact, replacing $f$ by $f/\epsilon$, one sees that it suffices to treat the case $\epsilon =1$.

Let $g(x)=\lVert x\rVert^2- f(x)^2$. This is a continuous function on $B$; it is differentiable on $U$, with $\nabla g(x)=2(x-f(x)\nabla f(x))$. Let $\mu=\inf_B(g)$. Since $g(0)=-f(0)^2\leq0$, one has $\mu\leq 0$. We distinguish two cases:

  1. If $\mu=0$, then $\lvert f(x)\rvert \leq \lVert x\rVert$ for all $x\in B$. In particular, $f(0)=0$, and this implies that $\lVert\nabla f(0)\rVert\leq1$.
  2. If $\mu<0$, let $x\in B$ be such that $g(x)=\mu$; in particular, $f(x)^2= \lVert x\rVert^2-\mu>0$, which implies that $f(x)\neq0$. Since $g\geq0$ on $\partial(U)$, we have $x\in U$, hence $\nabla g(x)=0$. Then $x=f(x)\nabla f(x)$, hence $\nabla f(x)=x/f(x)$. Consequently,
    $$ \lVert \nabla f(x)\rVert = \frac{\lVert x\rVert}{\lvert f(x)\rvert}= \frac{\lVert x\rVert}{(\lVert x\rVert^2-\mu)^{1/2}}<1. $$

This concludes the proof. 
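As a sanity check (this remark is mine, not taken from the paper), a linear form shows that the bound $\epsilon$ cannot be improved in general. Fix a unit vector $u\in\mathbf R^n$ and consider:

```latex
f(x) = \epsilon\,\langle u, x\rangle \quad (\lVert u\rVert = 1), \qquad
\lvert f(x)\rvert \leq \epsilon\,\lVert x\rVert = \epsilon \ \text{ for } x\in\partial(U), \qquad
\nabla f(x) = \epsilon\, u, \quad \lVert \nabla f(x)\rVert = \epsilon \ \text{ for all } x\in U .
```

The hypotheses of the theorem hold, and the norm of the gradient is equal to $\epsilon$ at every point of $U$.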


Thanks to the Twitter users @AntoineTeutsch, @paulbroussous and @apauthie for having pointed out some misprints and errors to me.