Thursday, January 14, 2021

On Rolle's theorem

This post is inspired by a paper of Azé and Hiriart-Urruty published in a French high school math journal; in fact, it is mostly a paraphrase of that paper, in the hope that it might be of some interest to young university students, or to students preparing for the Agrégation. The topic is Rolle's theorem.

1. The one-dimensional theorem, a generalization and two other proofs

Let us first quote the theorem, in a nonstandard form.

Theorem. Let $I=\mathopen]a;b\mathclose[$ be a nonempty but possibly unbounded interval of $\mathbf R$ and let $f\colon I\to\mathbf R$ be a continuous function. Assume that $f$ has limits at $a$ and $b$, equal to some element $\ell\in\mathbf R\cup\{+\infty\}$. Then $f$ is bounded from below.

  1. If $\inf_I(f)<\ell$, then there exists a point $c\in I$ such that $f(c)=\inf_I (f)$. If, moreover, $f$ has a right derivative and a left derivative at $c$, then $f'_l(c)\leq0$ and $f'_r(c)\geq0$.
  2. If $\inf_I(f)\geq\ell$, then $f$ is bounded on $I$ and there exists a point $c\in I$ such that $f(c)=\sup_I(f)$. If, moreover, $f$ has a right derivative and a left derivative at $c$, then $f'_l(c)\geq0$ and $f'_r(c)\leq0$.

Three ingredients make this version slightly nonstandard:

  • The interval $I$ may be taken to be infinite;
  • The function $f$ may tend to $+\infty$ at the endpoints of $I$;
  • Only left and right derivatives are assumed.

Of course, if $f$ has a derivative at each point, then the statement implies that $f'(c)=f'_l(c)=f'_r(c)=0$.

a) Stated this way, the theorem nevertheless has a quite standard proof, which proceeds in two steps.

  1. Since $f$ has a limit $\ell\neq-\infty$ at $a$ and at $b$, there exist $a'$ and $b'$ in $I$ with $a<a'<b'<b$ such that $f$ is bounded from below on $\mathopen ]a;a']$ and on $[b';b\mathclose[$. Since $f$ is continuous on the compact interval $[a';b']$, it is bounded from below there as well, hence on $I$.
    If $\inf_I(f)<\ell$, then we can choose $\ell'\in\mathbf R$ such that $\inf_I(f)<\ell'<\ell$, and $a'$, $b'$ such that $f(x)>\ell'$ for all $x\in I$ outside of $[a';b']$. By compactness, there exists $c\in [a';b']$ such that $f(c)=\inf_{[a';b']}(f)$; since $f>\ell'>\inf_I(f)$ outside of $[a';b']$, one has $f(c)=\inf_I(f)$.
    If $\inf_I(f)\geq\ell$ and $\sup_I(f)>\ell$, then in particular $\ell\neq+\infty$, and we apply the preceding analysis to $-f$, which has limit $-\ell$ at $a$ and $b$ and satisfies $\inf_I(-f)=-\sup_I(f)<-\ell$.
    In the remaining case, $\inf_I(f)=\sup_I(f)=\ell$, so $f$ is constant and any point $c\in I$ works.
  2. Assume that $f(c)=\inf_I(f)$. For $x>c$, one has $f(x)\geq f(c)$, so $(f(x)-f(c))/(x-c)\geq0$, hence $f'_r(c)\geq 0$; for $x<c$, one has $f(x)\geq f(c)$, so $(f(x)-f(c))/(x-c)\leq0$, hence $f'_l(c)\leq0$. The case of a maximum is symmetric.

The interest of the given formulation can be understood by looking at the following two examples.

  1. If $f(x)=|x|$ on $\mathbf R$ (so that $\ell=+\infty$), then $f$ attains its lower bound at $x=0$ only, where one has $f'_r(0)=1$ and $f'_l(0)=-1$.
  2. Take $f(x)=e^{-x^2}$, which tends to $0$ at $\pm\infty$. Then there exists $c\in\mathbf R$ such that $f'(c)=0$. Of course, one has $f'(x)=-2xe^{-x^2}$, so that $c=0$. However, it is readily seen by induction that for any integer $n\geq0$, the $n$th derivative of $f$ is of the form $P_n(x)e^{-x^2}$, where $P_n$ is a polynomial of degree $n$. In particular, $f^{(n)}$ tends to $0$ at $\pm\infty$. And, by induction again, applying the theorem to $f^{(n-1)}$ on each interval delimited by consecutive roots of $P_{n-1}$ or by $\pm\infty$ shows that $P_n$ has $n$ distinct roots in $\mathbf R$: one between any two consecutive roots of $P_{n-1}$, one larger than the largest root of $P_{n-1}$, and one smaller than the smallest root of $P_{n-1}$.
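The induction in the second example can be checked numerically for small $n$. The sketch below generates the polynomials $P_n$ from the relation $P_{n+1}=P_n'-2xP_n$, which comes from differentiating $P_n(x)e^{-x^2}$, locates their real roots by a sign-change scan followed by bisection, and checks that the roots of $P_{n-1}$ interlace with those of $P_n$. The helper names, the search window $[-(n+1);n+1]$ and the grid size are illustrative assumptions; this is evidence for small degrees, not a proof.

```python
from typing import List

Poly = List[int]  # integer coefficients, constant term first

def next_poly(p: Poly) -> Poly:
    """Return P_{n+1} = P_n' - 2x P_n, since (P e^{-x^2})' = (P' - 2xP) e^{-x^2}."""
    result = [0] * (len(p) + 1)
    for k in range(1, len(p)):
        result[k - 1] += k * p[k]      # derivative P_n'
    for k, c in enumerate(p):
        result[k + 1] -= 2 * c         # the term -2x P_n
    return result

def evaluate(p: Poly, x: float) -> float:
    acc = 0.0
    for c in reversed(p):              # Horner's scheme
        acc = acc * x + c
    return acc

def real_roots(p: Poly, bound: float, steps: int = 9973) -> List[float]:
    """Sign changes of p on a grid of [-bound, bound], each refined by bisection.

    The number of steps is odd, so that 0 is never a grid point."""
    roots = []
    h = 2 * bound / steps
    for k in range(steps):
        lo, hi = -bound + k * h, -bound + (k + 1) * h
        flo, fhi = evaluate(p, lo), evaluate(p, hi)
        if flo * fhi < 0:
            for _ in range(60):
                mid = (lo + hi) / 2
                fmid = evaluate(p, mid)
                if flo * fmid <= 0:
                    hi = mid
                else:
                    lo, flo = mid, fmid
            roots.append((lo + hi) / 2)
    return roots

def roots_up_to(max_n: int) -> List[List[float]]:
    p: Poly = [1]                      # P_0 = 1
    previous: List[float] = []         # roots of P_{n-1}
    result = []
    for n in range(1, max_n + 1):
        p = next_poly(p)
        # generous window, assumed to contain all roots (the count check catches a miss)
        roots = real_roots(p, bound=n + 1.0)
        assert len(roots) == n, "P_n should have n distinct real roots"
        # each root of P_{n-1} lies strictly between two consecutive roots of P_n
        assert all(roots[i] < previous[i] < roots[i + 1] for i in range(n - 1))
        previous = roots
        result.append(roots)
    return result

roots_by_degree = roots_up_to(8)
print([len(r) for r in roots_by_degree])  # [1, 2, 3, 4, 5, 6, 7, 8]
```

Up to sign, these $P_n$ are the (physicists') Hermite polynomials, which is why their roots stay well inside the search window for small $n$.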

b) In a 1959 paper, the Romanian mathematician Pompeiu proposed an alternative proof of Rolle's theorem, valid when the interval $I$ is bounded, which works completely differently. Here is how it goes, following the 1979 paper published in the American Math. Monthly by Hans Samelson.

First of all, one uses the particular case $n=2$ of the Lévy chord lemma:

Lemma. Let $f\colon [a;b]\to\mathbf R$ be a continuous function such that $f(a)=f(b)$. For every integer $n\geq 2$, there exist $a',b'\in[a;b]$ such that $f(a')=f(b')$ and $b'-a'=(b-a)/n$.

Let $h=(b-a)/n$. From the equality
\[ 0 = f(b)-f(a) = \big(f(a+h)-f(a)\big)+\big(f(a+2h)-f(a+h)\big)+\cdots + \big(f(a+nh)-f(a+(n-1)h)\big), \]
one sees that the function $x\mapsto f(x+h)-f(x)$ from $[a;b-h]$ to $\mathbf R$ cannot be positive at all the points $a,a+h,\dots,a+(n-1)h$, nor negative at all of them. By the intermediate value theorem, it vanishes at some point $a'\in [a;b-h]$. Setting $b'=a'+h$, one has $b'\in[a;b]$, $b'-a'=(b-a)/n$ and $f(a')=f(b')$.
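The proof of the lemma is effective, and can be followed step by step. In the sketch below, the helper name `chord` is hypothetical; it evaluates $x\mapsto f(x+h)-f(x)$ at the $n$ points $a,a+h,\dots,a+(n-1)h$, finds two consecutive values of opposite signs, and then applies bisection, which is the constructive form of the intermediate value theorem. The test case $f=\sin$ on $[0;\pi]$ with $n=3$ is an arbitrary choice; there, $\sin(x+\pi/3)-\sin x=\cos(x+\pi/6)$ vanishes at $x=\pi/3$.

```python
import math
from typing import Callable

def chord(f: Callable[[float], float], a: float, b: float, n: int,
          iters: int = 100) -> float:
    """Return a' in [a, b - h], with h = (b - a) / n, such that f(a') ~ f(a' + h).

    Assumes f continuous, f(a) = f(b) (up to rounding) and n >= 2."""
    h = (b - a) / n

    def g(x: float) -> float:
        return f(x + h) - f(x)

    # g(a) + g(a + h) + ... + g(a + (n-1)h) telescopes to f(b) - f(a) = 0
    values = [g(a + k * h) for k in range(n)]
    for k, v in enumerate(values):
        if v == 0:
            return a + k * h
    # the values are not all of one sign: two consecutive ones have opposite signs
    k = next((k for k in range(n - 1) if values[k] * values[k + 1] < 0), None)
    if k is None:
        raise ValueError("no sign change found; is f(a) = f(b)?")
    lo, hi, glo = a + k * h, a + (k + 1) * h, values[k]
    for _ in range(iters):  # bisection: intermediate value theorem on [lo, hi]
        mid = (lo + hi) / 2
        gmid = g(mid)
        if glo * gmid <= 0:
            hi = mid
        else:
            lo, glo = mid, gmid
    return (lo + hi) / 2

a_prime = chord(math.sin, 0.0, math.pi, 3)
print(a_prime, math.pi / 3)
```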

Then, it follows by induction that there exists a sequence of nested intervals $([a_n;b_n])$ in $[a;b]$ with $f(a_n)=f(b_n)$ and $b_n-a_n=(b-a)/2^n$ for all $n$. The sequences $(a_n)$ and $(b_n)$ converge to a common limit $c\in [a;b]$, and $a_n\leq c\leq b_n$ for all $n$. Since $f(b_n)=f(c)+(b_n-c) (f'(c) + \mathrm o(1))$ and $f(a_n)=f(c)+(a_n-c)(f'(c)+\mathrm o(1))$, and since $\lvert b_n-c\rvert$ and $\lvert a_n-c\rvert$ are both at most $b_n-a_n$, one has
\[ f'(c) = \lim \frac{f(b_n)-f(a_n)}{b_n-a_n} = 0. \]
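Pompeiu's construction can also be run numerically. In the sketch below (hypothetical helpers `half_chord` and `pompeiu_point`), each step applies the case $n=2$ of the lemma, locating $a'$ by bisection, and keeps the chord $[a';a'+h]$ of half the length. On the test function $f(x)=x(1-x)^2$ on $[0;1]$, an arbitrary choice, the nested intervals shrink towards $1/3$, the zero of $f'(x)=(1-x)(1-3x)$ in $\mathopen]0;1\mathclose[$. The number of steps is kept small on purpose: in floating point, $f(a_n)=f(b_n)$ only holds up to rounding, and the sign change used in the lemma eventually drowns in that noise.

```python
from typing import Callable

def half_chord(f: Callable[[float], float], a: float, b: float,
               iters: int = 100) -> float:
    """Case n = 2 of the chord lemma: a' in [a, a + h], h = (b - a) / 2, f(a') ~ f(a' + h)."""
    h = (b - a) / 2

    def g(x: float) -> float:
        return f(x + h) - f(x)

    lo, hi = a, a + h
    glo, ghi = g(lo), g(hi)          # g(a) + g(a + h) = f(b) - f(a) ~ 0
    if glo == 0:
        return lo
    if glo * ghi > 0:
        raise ArithmeticError("no sign change: rounding has destroyed f(a) = f(b)")
    for _ in range(iters):
        mid = (lo + hi) / 2
        gmid = g(mid)
        if glo * gmid <= 0:
            hi = mid
        else:
            lo, glo = mid, gmid
    return (lo + hi) / 2

def pompeiu_point(f: Callable[[float], float], a: float, b: float,
                  steps: int = 20) -> float:
    """Nested intervals [a_n, b_n] with f(a_n) = f(b_n), b_n - a_n = (b - a) / 2^n."""
    for _ in range(steps):
        new_a = half_chord(f, a, b)
        a, b = new_a, new_a + (b - a) / 2   # [a', a' + h] is contained in [a, b]
    return (a + b) / 2

c = pompeiu_point(lambda x: x * (1 - x) ** 2, 0.0, 1.0)
print(c)
```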

What makes this proof genuinely distinct from the classical one is that the point $c$ it produces need not be a local minimum or maximum of $f$, although I do not have an example to offer at the moment.

c) In 1979, Abian furnished yet another proof, which he called the “ultimate” one. Here it is:

It focuses on functions $f\colon[a;b]\to\mathbf R$ on a bounded interval of $\mathbf R$ which, roughly speaking, go up and then down: precisely, they are up-down, in the sense that $f(a)\leq f(c)$ and $f(c)\geq f(b)$, where $c=(a+b)/2$ is the midpoint of $[a;b]$. If $f(a)=f(b)$, then either $f$ or $-f$ is up-down.

Then divide the interval $[a;b]$ into four equal parts: $[a;p]$, $[p;c]$, $[c;q]$ and $[q;b]$. If $f(p)\geq f(c)$, then $f|_{[a;c]}$ is up-down, since $f(a)\leq f(c)\leq f(p)$. Otherwise, one has $f(p)<f(c)$. In this case, if $f(c)\geq f(q)$, we see that $f|_{[p;q]}$ is up-down. And otherwise, we observe that $f(c)<f(q)$ and $f(q)>f(c)\geq f(b)$, so that $f|_{[c;b]}$ is up-down. Conclusion: we have isolated within the interval $[a;b]$ a subinterval $[a';b']$ of length $(b-a)/2$ such that $f|_{[a';b']}$ is still up-down.

Iterating the procedure, we construct a sequence $([a_n;b_n])$ of nested intervals, with $b_n-a_n=(b-a)/2^n$, such that the restriction of $f$ to each of them is up-down. Set $c_n=(a_n+b_n)/2$.

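The sequences $(a_n)$, $(b_n)$, $(c_n)$ have a common limit $c\in [a;b]$, and $a_n\leq c\leq b_n$ for all $n$. The inequalities $f(a_n)\leq f(c_n)$ and $a_n<c_n$ show that the difference quotients $(f(c_n)-f(a_n))/(c_n-a_n)$ are nonnegative; as in Pompeiu's proof, they converge to $f'(c)$, hence $f'(c)\geq 0$. Similarly, the inequalities $f(c_n)\geq f(b_n)$ and $c_n<b_n$ give $f'(c)\leq 0$. In conclusion, $f'(c)=0$.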
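Abian's procedure only compares values of $f$, so it translates directly into a few lines of code. The sketch below (hypothetical helper name `abian_critical_point`) implements the three-case choice described above. On the test function $f(x)=x(1-x)^2$, an arbitrary choice which is up-down on $[0;1]$ because $f(0)=f(1)=0<f(1/2)$, it converges to the critical point $1/3$. Once the interval is tiny, the comparisons are decided by rounding errors, so one should not expect more than about eight correct digits here.

```python
from typing import Callable

def abian_critical_point(f: Callable[[float], float], a: float, b: float,
                         steps: int = 50) -> float:
    """Approximate a zero of f' by Abian's bisection, assuming f is up-down on [a, b]."""
    if not f(a) <= f((a + b) / 2) >= f(b):
        raise ValueError("f is not up-down on [a, b]")
    for _ in range(steps):
        c = (a + b) / 2
        p, q = (a + c) / 2, (c + b) / 2
        if f(p) >= f(c):
            b = c          # f(a) <= f(c) <= f(p) >= f(c): up-down on [a, c]
        elif f(c) >= f(q):
            a, b = p, q    # f(p) < f(c) >= f(q): up-down on [p, q]
        else:
            a = c          # f(c) < f(q) and f(q) > f(c) >= f(b): up-down on [c, b]
    return (a + b) / 2

critical = abian_critical_point(lambda x: x * (1 - x) ** 2, 0.0, 1.0)
print(critical)
```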

2. Rolle's theorem in normed vector spaces

Theorem. Let $E$ be a normed vector space, let $U$ be an open subset of $E$ and let $f\colon U\to\mathbf R$ be a differentiable function. Assume that there exists $\ell\in\mathbf R\cup\{+\infty\}$ such that $f(x)\to \ell$ when $x$ tends to the “boundary” of $U$, in the following sense: for every $\ell'<\ell$, there exists a compact subset $K$ of $U$ such that $f(x)\geq\ell'$ for all $x\in U\setminus K$. Then $f$ is bounded below on $U$, and there exists $a\in U$ such that $f(a)=\inf_U (f)$ and $Df(a)=0$.

The proof is essentially the same as the one we gave in dimension 1. I skip it here.

If $E$ is finite dimensional, then this theorem applies to a vast class of examples: for example, bounded open subsets $U$ of $E$ (whose closure is compact), and continuous functions $f\colon \overline U\to\mathbf R$ which are constant on the boundary $\partial(U)=\overline U\setminus U$ of $U$ and differentiable on $U$.

However, if $E$ is infinite dimensional, the closure of a bounded open set is no longer compact, and it does not suffice that $f$ extend to a function on $\overline U$ with a constant value on the boundary.

Example. — Let $E$ be an infinite dimensional real Hilbert space, let $U$ be the open unit ball and $B$ be the closed unit ball. Let $g(x)=\frac12 \langle Ax,x\rangle+\langle b,x\rangle +c$ be a quadratic function, where $A\in\mathcal L(E)$ is self-adjoint, $b\in E$ and $c\in\mathbf R$, and let $f(x)=(1-\lVert x\rVert^2) g(x)$. The function $f$ is differentiable on $E$ and one has
\[  \nabla f(x) =  (1-\Vert x\rVert^2) ( Ax + b) - 2 (\frac12 \langle Ax,x\rangle + \langle b,x\rangle + c) x. \]
Assume that there exists $x\in U$ such that $\nabla f(x)=0$. Then $Ax+b = \lambda x$, with
\[ \lambda= \frac2{1-\lVert x\rVert ^2} \left(\frac12 \langle Ax,x\rangle + \langle b,x\rangle + c \right). \]
Azé and Hiriart-Urruty take $E=L^2([0;1])$, for $A$ the operator of multiplication by the function $t$, $b(t)=t(1-t)$, and $c=4/27$. Then one has $g(x)>0$ for every $x\in E$, hence $\lambda>0$, and $x(t)=\frac1{\lambda-t}b(t)$ for $t\in[0;1]$. This implies that $\lambda\geq 1$: otherwise, since $b(\lambda)\neq0$, the function $x$ would behave like $b(\lambda)/(\lambda-t)$ near $t=\lambda$, so it would not be square integrable and would not belong to $E$. This allows one to compute $\lambda$ explicitly, obtaining $\lambda\leq3/4$, which contradicts the inequality $\lambda\geq 1$. (I refer to the paper of Azé and Hiriart-Urruty for more details.)
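The positivity of $g$ can be checked by hand, and the arithmetic is easy to double-check by machine. The justification below is my own short argument, not necessarily the one of Azé and Hiriart-Urruty: for each $t\in[0;1]$, the quadratic $y\mapsto\frac12 ty^2+t(1-t)y$ is bounded below by $-\frac12t(1-t)^2$, with equality at $y=-(1-t)$, so that $g(x)\geq\frac4{27}-\frac12\int_0^1t(1-t)^2\,dt=\frac{23}{216}>0$. The sketch computes this bound exactly with `fractions.Fraction`, and compares it with a midpoint-rule evaluation of $g$ at the minimizer $x(t)=-(1-t)$; the helper names are illustrative.

```python
from fractions import Fraction

def integral_01(coeffs):
    """Exact value of the integral over [0, 1] of sum(coeffs[k] * t**k)."""
    return sum(Fraction(c, k + 1) for k, c in enumerate(coeffs))

# Pointwise, (1/2) t y^2 + t(1-t) y >= -(1/2) t (1-t)^2, with equality at y = -(1-t).
# Coefficients of t(1-t)^2 = t - 2t^2 + t^3:
lower_bound = Fraction(4, 27) - Fraction(1, 2) * integral_01([0, 1, -2, 1])

def g_numeric(x, m=10_000):
    """Midpoint rule for g(x) = int_0^1 (t x(t)^2 / 2 + t(1-t) x(t)) dt + 4/27."""
    total = 0.0
    for i in range(m):
        t = (i + 0.5) / m
        total += 0.5 * t * x(t) ** 2 + t * (1 - t) * x(t)
    return total / m + 4 / 27

g_at_minimizer = g_numeric(lambda t: -(1 - t))
print(lower_bound, g_at_minimizer)
```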

3. An approximate version of Rolle's theorem

Theorem. Let $B$ be the closed Euclidean unit ball in $\mathbf R^n$, let $U$ be its interior, and let $f\colon B\to \mathbf R$ be a continuous function. Assume that $\lvert f\rvert \leq \epsilon $ on the boundary $\partial(U)$ and that $f$ is differentiable on $U$. Then there exists $x\in U$ such that $\lVert Df(x)\rVert\leq\epsilon$.

If $\epsilon>0$, replacing $f$ by $f/\epsilon$, one sees that it suffices to treat the case $\epsilon =1$; the case $\epsilon=0$ is the classical Rolle theorem.

Let $g(x)=\lVert x\rVert^2- f(x)^2$. This is a continuous function on $B$; it is differentiable on $U$, with $ \nabla g(x)=2(x-f(x)\nabla f(x))$. Let $\mu=\inf_B(g)$. Since $g(0)=-f(0)^2\leq0$, one has $\mu\leq 0$. We distinguish two cases:

  1. If $\mu=0$, then $\lvert f(x)\rvert \leq \lVert x\rVert$ for all $x\in B$. In particular, $f(0)=0$ and $\lvert f(x)-f(0)\rvert\leq\lVert x\rVert$ for all $x\in B$, which implies that $\lVert\nabla f(0)\rVert\leq1$.
  2. If $\mu<0$, let $x\in B$ be such that $ g(x)=\mu$ (it exists since $B$ is compact); in particular, $f(x)^2= \lVert x\rVert^2-\mu>0$, which implies that $f(x)\neq0$. Since $g\geq0$ on $\partial(U)$, we have $x\in U$, hence $\nabla g(x)=0$. Then $x=f(x)\nabla f(x)$, hence $\nabla f(x)=x/f(x)$. Consequently,
    \[ \lVert \nabla f(x)\rVert = \frac{\lVert x\rVert}{\lvert f(x)\rvert}= \frac{\lVert x\rVert}{(\lVert x\rVert^2-\mu)^{1/2}}<1.\]

This concludes the proof. 
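The proof is constructive enough to be imitated numerically. In the sketch below, the test function $f(x,y)=3x-2x^3-2xy^2$ is a hypothetical example, chosen because it coincides with $x$ on the unit circle, so that $\lvert f\rvert\leq1$ there (the case $\epsilon=1$), while $\lVert\nabla f(0)\rVert=3$. Minimizing the auxiliary function $g=\lVert x\rVert^2-f^2$ over a grid of the disc, a crude stand-in for the exact minimizer used in the proof, lands in the case $\mu<0$, and the gradient of $f$ at the grid minimizer has norm at most $1$.

```python
import math

def f(x: float, y: float) -> float:
    # hypothetical test function: equals x on the unit circle, so |f| <= 1 there
    return 3 * x - 2 * x ** 3 - 2 * x * y ** 2

def grad_f(x: float, y: float) -> tuple:
    return (3 - 6 * x ** 2 - 2 * y ** 2, -4 * x * y)

def grid_minimizer_of_g(n: int = 200):
    """Minimize g(x, y) = x^2 + y^2 - f(x, y)^2 over grid points of the closed unit disc."""
    best = None
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x, y = i / n, j / n
            if x * x + y * y <= 1:
                g = x * x + y * y - f(x, y) ** 2
                if best is None or g < best[0]:
                    best = (g, x, y)
    return best

mu, x0, y0 = grid_minimizer_of_g()
grad_norm = math.hypot(*grad_f(x0, y0))
print(mu, (x0, y0), grad_norm)
```

On a grid the minimizer is only approximate, so the identity $\nabla f(x)=x/f(x)$ holds only approximately here; the inequality $\lVert\nabla f\rVert\leq1$ nevertheless has plenty of room.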

Thanks to the Twitter users @AntoineTeutsch, @paulbroussous and @apauthie for pointing out some misprints and inaccuracies.

Tuesday, December 22, 2020

Celebrating Ramanujan's birthday — From powers of divisors to coefficients of modular forms

In a Twitter post, Anton Hilado reminded us that today (December 22nd) was the birthday of Srinivasa Ramanujan, and suggested that somebody explain the “Ramanujan conjectures”. The following blog post is an attempt at an informal account. Or, as @tjf frames it, my Christmas present to math Twitter.

The story begins in 1916, in a paper Ramanujan published in the Transactions of the Cambridge Philosophical Society, under the not so explicit title On certain arithmetical functions. His starting point was the sum $\sigma_s(n)$ of the $s$th powers of all divisors of an integer $n$, and approximate functional equations of the form
\[ \sigma_r(0)\sigma_s(n)+\sigma_r(1)\sigma_s(n-1)+\dots+\sigma_r(n)\sigma_s(0)
\approx \frac{\Gamma(r+1)\Gamma(s+1)}{\Gamma(r+s+2)} \frac{\zeta(r+1)\zeta(s+1)}{\zeta(r+s+2)}\sigma_{r+s+1}(n) + \frac{\zeta(1-r)+\zeta(1-s)}{r+s} n \sigma_{r+s-1}(n), \]
where $\sigma_s(0)=\dfrac12 \zeta(-s)$, and $\zeta$ is Riemann's zeta function. In what follows, $r,s$ will be positive odd integers, so that $\sigma_s(0)$ is half the value of Riemann's zeta function at a negative odd integer; it is known to be a rational number, namely $(-1)^sB_{s+1}/\big(2(s+1)\big)$, where $B_{s+1}$ is the $(s+1)$th Bernoulli number.

This investigation, in which Ramanujan engages without giving any motivation, quickly leads him to introduce the infinite series
\[ S_r = \frac12 \zeta(-r) + \frac{1^rx}{1-x}+\frac{2^r x^2}{1-x^2}+\frac{3^rx^3}{1-x^3}+\dots. \]
Nowadays, the parameter $x$ would be written $q$, and $S_r=\frac12 \zeta(-r) E_{r+1}$, at least if $r$ is an odd integer, $E_k$ being the Fourier expansion of the Eisenstein series of weight $k$, normalized to have constant term $1$. The particular cases $r=1,3,5$, normalized in the same way, are given special names, namely $P,Q,R$, and Ramanujan proves that, for odd $s\geq3$, $S_s$ is a linear combination of $Q^mR^n$, for nonnegative integers $m,n$ such that $4m+6n=s+1$. Nowadays, we understand this as the fact that $Q$ and $R$ generate the algebra of modular forms for the full modular group $\mathrm{SL}(2,\mathbf Z)$.
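These normalizations are easy to check with exact rational arithmetic. The sketch below (helper names are mine) computes $\sigma_r(0)=\frac12\zeta(-r)=-B_{r+1}/\big(2(r+1)\big)$ for odd $r$ from the Bernoulli numbers, forms the truncated series $S_r/\sigma_r(0)$, using that $\sum_n n^rx^n/(1-x^n)=\sum_n\sigma_r(n)x^n$, and checks two instances of the statement that $S_s$ is a combination of $Q^mR^n$ with $4m+6n=s+1$: for $s=7$ the only monomial is $Q^2$, and for $s=9$ it is $QR$. Agreement of finitely many coefficients is only a sanity check.

```python
from fractions import Fraction
from math import comb
from typing import List

N = 12  # number of coefficients kept (an arbitrary truncation order)

def bernoulli(m: int) -> List[Fraction]:
    """B_0, ..., B_m from sum_{k=0}^{j} C(j+1, k) B_k = 0 for j >= 1 (so B_1 = -1/2)."""
    B = [Fraction(1)]
    for j in range(1, m + 1):
        B.append(-sum(comb(j + 1, k) * B[k] for k in range(j)) / (j + 1))
    return B

def sigma(s: int, n: int) -> int:
    return sum(d ** s for d in range(1, n + 1) if n % d == 0)

def normalized_S(r: int) -> List[Fraction]:
    """Coefficients of S_r / sigma_r(0), where S_r = sigma_r(0) + sum_n sigma_r(n) x^n, r odd."""
    sigma_r_0 = -bernoulli(r + 1)[r + 1] / (2 * (r + 1))   # = zeta(-r) / 2
    return [Fraction(1)] + [sigma(r, n) / sigma_r_0 for n in range(1, N)]

def mul(a, b):
    """Product of two power series, truncated to N coefficients."""
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(N)]

P, Q, R = normalized_S(1), normalized_S(3), normalized_S(5)
print(P[:3], Q[:3], R[:3])
print(normalized_S(7) == mul(Q, Q), normalized_S(9) == mul(Q, R))  # True True
```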

In the same paper, Ramanujan spells out the system of algebraic differential equations satisfied by $P,Q,R$:
\[ x \frac {dP}{dx} = \frac{1}{12}(P^2-Q), \qquad x\frac{dQ}{dx}=\frac13(PQ-R), \qquad x\frac{dR}{dx}=\frac12(PR-Q^2). \]

The generating series, in the variable $x$, of the difference between the two sides of the initial equation is a linear combination of $Q^mR^n$, where $4m+6n=r+s+2$. By the functional equation of Riemann's zeta function, relating $\zeta(s)$ and $\zeta(1-s)$, this series vanishes at $x=0$, hence it has a factor $Q^3-R^2$.

Ramanujan then notes that $x\frac{d}{dx} \log(Q^3-R^2)=P$, and that

\[  P =x\frac{d}{dx} \log \left( x\big((1-x)(1-x^2)(1-x^3)\cdots\big)^{24} \right). \]

Since $Q^3-R^2$ and $x\big((1-x)(1-x^2)\cdots\big)^{24}$ have the same logarithmic derivative $P/x$, they are proportional; comparing the coefficients of $x$ gives

\[ Q^3-R^2 = 1728\, x  \big((1-x)(1-x^2)(1-x^3)\cdots\big)^{24}= 1728\sum_{n\geq1} \tau(n) x^n, \]
where the series $\sum\tau(n)x^n$ is now known as Ramanujan's $\Delta$-function. In fact, Ramanujan also makes the connection with elliptic functions, in particular with Weierstrass's $\wp$-function: $\Delta$ then corresponds to the discriminant of the degree 3 polynomial $f$ such that $\wp'(u)^2=f(\wp(u))$.
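The identity $Q^3-R^2=1728\,x\prod_{k\geq1}(1-x^k)^{24}$ can be checked coefficient by coefficient. The sketch below expands the product by multiplying $24$ times by each factor $1-x^k$, truncating at an arbitrary fixed order, compares it with $Q^3-R^2$ computed from the divisor sums, and reads off the first values of $\tau(n)$.

```python
from typing import List

N = 20  # truncation order (arbitrary)

def sigma(s: int, n: int) -> int:
    return sum(d ** s for d in range(1, n + 1) if n % d == 0)

def mul(a: List[int], b: List[int]) -> List[int]:
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(N)]

Q = [1] + [240 * sigma(3, n) for n in range(1, N)]
R = [1] + [-504 * sigma(5, n) for n in range(1, N)]
lhs = [u - v for u, v in zip(mul(mul(Q, Q), Q), mul(R, R))]

# prod_{k>=1} (1 - x^k)^24, truncated to degree N - 2
prod = [1] + [0] * (N - 2)
for k in range(1, N - 1):
    for _ in range(24):
        for i in range(N - 2, k - 1, -1):   # descending, so prod[i - k] is still the old value
            prod[i] -= prod[i - k]
delta = [0] + prod                          # multiply by x: N coefficients

tau = delta[1:]                             # tau[n - 1] = tau(n)
print(lhs == [1728 * d for d in delta])     # True
print(tau[:6])                              # [1, -24, 252, -1472, 4830, -6048]
```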

In any case, after factoring out $Q^3-R^2$, the difference of the two sides is written as a linear combination of $Q^mR^n$, where $4m+6n=r+s-10$. When $r$ and $s$ are positive odd integers such that $r+s\in\{2,4,6,8,12\}$, there are no such pairs $(m,n)$ of nonnegative integers, hence the difference vanishes, and Ramanujan obtains an exact equality in these cases. (When $r+s=10$, the pair $(m,n)=(0,0)$ is allowed, and the difference is a constant multiple of $\tau(n)$.)

Ramanujan is interested in the quality of the initial approximation. He finds an upper bound for the error of the form $\mathrm O(n^{\frac23(r+s+1)})$. Using Hardy–Littlewood's method, he shows that it cannot be smaller than $n^{\frac12(r+s)}$. That prompts his interest in the size of the coefficients of arithmetical functions, and $Q^3-R^2$ is the simplest case. He computes the coefficients $\tau(n)$ for $n\leq30$ and gives them in a table.

Recalling that $\tau(n)$ is $\mathrm O(n^7)$, and not $\mathrm O(n^5)$, Ramanujan states that there is reason to believe that $\tau(n)=\mathrm O(n^{\frac{11}2+\epsilon})$ but not $\mathrm O(n^{\frac{11}2})$. That this holds is Ramanujan's conjecture.

Ramanujan was led to believe this by observing, on his numerical data, that the Dirichlet series $ \sum \frac{\tau(n)}{n^s} $ seems to factor as an infinite product (an “Euler product”, as we would say), indexed by the prime numbers:

\[ \sum_{n=1}^\infty \frac{\tau(n)}{n^s} = \prod_p \frac{1}{1-\tau(p)p^{-s}+p^{11-2s}}. \]

This would imply that $\tau$ is a multiplicative function: $\tau(mn )=\tau(m)\tau(n)$ if $m$ and $n$ are coprime, as well as the more complicated relation $\tau(p^{k+2})=\tau(p)\tau(p^{k+1})-p^{11}\tau(p^k)$ between the $\tau(p^k)$. These relations were proved by Louis Mordell in 1917. He introduced operators (now called Hecke operators) $T_p$, indexed by prime numbers $p$, on the spaces of modular forms and proved that Ramanujan's $\Delta$-function is an eigenfunction. (This is not much of a feat: $\Delta$ is, up to a scalar, the only cusp form of weight $12$, so that $T_p \Delta$ is a multiple of $\Delta$; comparing the coefficients of $x$ shows that necessarily $T_p\Delta=\tau(p)\Delta$.)
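Mordell's relations are easy to observe on the first coefficients. The sketch below recomputes $\tau(n)$ from the product formula (truncated at $n\leq64$, an arbitrary choice) and checks multiplicativity on all coprime pairs in range, as well as the recursion $\tau(p^{k+2})=\tau(p)\tau(p^{k+1})-p^{11}\tau(p^k)$ for the prime powers in range. The helper names are illustrative.

```python
from math import gcd
from typing import List

def tau_list(N: int) -> List[int]:
    """[tau(1), ..., tau(N)] from x * prod_{k>=1} (1 - x^k)^24."""
    c = [1] + [0] * (N - 1)          # the product, truncated to degree N - 1
    for k in range(1, N):
        for _ in range(24):
            for i in range(N - 1, k - 1, -1):
                c[i] -= c[i - k]
    return c                          # c[n - 1] = tau(n)

N = 64
_values = tau_list(N)

def tau(n: int) -> int:
    return _values[n - 1]

multiplicative = all(
    tau(m * n) == tau(m) * tau(n)
    for m in range(2, N + 1) for n in range(2, N // m + 1) if gcd(m, n) == 1
)
hecke_recursion = all(
    tau(p ** (k + 2)) == tau(p) * tau(p ** (k + 1)) - p ** 11 * tau(p ** k)
    for p in (2, 3, 5, 7) for k in range(8) if p ** (k + 2) <= N
)
print(multiplicative, hecke_recursion)  # True True
```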

The bound $\lvert\tau(p)\rvert\leq 2p^{11/2}$ means that the polynomial $1-\tau(p) X+p^{11}X^2$ has two complex conjugate roots. This part of the conjecture was proved only in 1973, by Pierre Deligne, and required many additional ideas. One was the Weil conjectures about the number of points of algebraic varieties over finite fields, proved by Deligne in 1973, building on Grothendieck's étale cohomology. Another was the insight (due to Michio Kuga, Mikio Sato and Goro Shimura) that Ramanujan's conjecture could be reframed as an instance of the Weil conjectures, applied to the 10th symmetric product of the universal elliptic curve, a reduction that Deligne actually carried out in 1968.
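Deligne's bound can be illustrated, though of course not proved, for small primes. The sketch below checks that the discriminant $\tau(p)^2-4p^{11}$ of $1-\tau(p)X+p^{11}X^2$ is negative for the primes $p\leq100$, and that the two complex conjugate roots then have absolute value $p^{-11/2}$, since their product is $1/p^{11}$. The range of primes is an arbitrary choice.

```python
import cmath
from typing import List

def tau_list(N: int) -> List[int]:
    """[tau(1), ..., tau(N)] from x * prod_{k>=1} (1 - x^k)^24."""
    c = [1] + [0] * (N - 1)
    for k in range(1, N):
        for _ in range(24):
            for i in range(N - 1, k - 1, -1):   # multiply by (1 - x^k)
                c[i] -= c[i - k]
    return c                                     # c[n - 1] = tau(n)

N = 100
values = tau_list(N)
primes = [p for p in range(2, N + 1) if all(p % d for d in range(2, int(p ** 0.5) + 1))]

report = []
for p in primes:
    t = values[p - 1]
    disc = t * t - 4 * p ** 11                      # exact integer arithmetic
    root = (t + cmath.sqrt(disc)) / (2 * p ** 11)   # a root of p^11 X^2 - tau(p) X + 1
    report.append((p, disc < 0, abs(root) * p ** 5.5))

print(all(negative for _, negative, _ in report))  # True
```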