Wednesday, February 12, 2020

Dynamics of Hecke correspondences

This morning, Sebastián Herrero, Ricardo Menares and Juan Rivera-Letelier released a preprint on the arXiv entitled “p-Adic distribution of CM points and Hecke orbits. I. Convergence towards the Gauss point.” In fact, there are two other papers on the arXiv with similar titles, by two other groups of authors. The first one, “p-adic Dynamics of Hecke Operators on Modular Curves” is by
Eyal Z. Goren and Payman L Kassaei, and the second one is by Daniele Disegni, “p-adic equidistribution of CM points”.

In fact, as @_nilradical pointed out initially, the words “Hecke correspondences” are generally used in modular forms books to describe some (pairwise commuting) endomorphisms of the spaces of modular forms. So what are these authors talking about?

As he said: “literally what the hell is going on in math?”. My answer, at a time where I should have been doing something else, such as sleeping, was direct:

This is a very nice question! Take an elliptic curve over the $p$-adics, say, and draw its images under Hecke correspondences. You get a cloud of dots on the $p$-adic modular curve. Normalize by its size. What are the limit measures ? The picture is richer than over the complex.

what do u mean by "correspondences"? i know only about hecke operators to the extent of what is in chapter 3 of shimura's intro book and the end of serre's "course in arithmetic"

You remember that paragraph of Serre's Book where he says modular forms of wt k are functions on pairs (E,ω) — E, ell curve ; ω, nonzero inv diff form on E — with some homogeneity and holomorphy condition ? (Idea: τ -> ell curve C/(Z+τZ) and ω = dz) ...

yea, im ok with weight k modular forms being sections of \Omega^2k

In this point of view, the Hecke operator T_n sends a modular form f to the function that maps (E,ω) to the sum of all f(E',ω') where E' is a quotient E/D (D, cyclic subgroup of order n) and ω' I don't tell here. Now, forget about modular forms (and differential forms ω, as well).

And I went to explain those things in a few Twitter posts, but it's time to say it at a quieter pace.

1. Elliptic curves, modular curves, and modular correspondences.

The reason for the adjective “modular” in the expression “modular forms” or in the expression “modular curves” is that they refer to the “moduli” of some extremly classical geometric objects called elliptic curves. Elliptic curves have various descriptions, all equally beautiful and important: either, complex analytically, as one-dimensional tori, quotients of the complex plane $\mathbf C$ by a lattice $\Lambda\simeq\mathbf Z^2$, or as cubic plane curves given by a (Weierstrass) equation of the form $y^2=x^3+ax+b$ in the affine plane with coordinates $(x,y)$ — forgetting a point at infinity.

In fact, the “set” of all elliptic curve can themselved be arranged in a curve provided one identifies “isomorphic” elliptic curves, so that some simplifications arise. Not any lattice must be considered, one first may assume $\Lambda=\mathbf Z+\mathbf Z\tau$, for some complex number $\tau$ with strictly positive imaginary part, but then the complex number $\tau$ and any complex number of the form $(a\tau+b)/(c\tau+d)$ gives the same elliptic curve, for any matrix $\begin{pmatrix}a & b\\c & d\end{pmatrix}$ with integer entries and determinant $1$. So, in some sense, the set of elliptic curves coincides with the quotient of the (Poincaré) upper half-space by the action of the group $\mathrm{SL}(2,\mathbf Z)$. Alternatively, in the Weierstrass model, there are two parameters $(a,b)$, but all pairs of the form $(u^4a,u^6b)$ give the same curve (via the change of variables $x'=u^2x$, $y'=u^3y$). One can guess some subtleties here, because some the matrix $-I_2$ acts trivially, and also because some specific $\tau$ have a nontrivial stabilizer (even taking $-I_2$ into acount); on the other side, $u=-1$ does not act, and some pairs $(a,b)$ are fixed by some more roots of unity. I will ignore these here; technically, they are solved in two ways, one is to add a “level structure”, the other is to consider this modular curve $M$ as an orbifold — and algebraic geometers say stack.

“Correspondence” is a long-forgotten topic in set theory, since the time where multivalued functions were considered. We forgot them because it's hard to talk consistently about them, but there are instances in which they arise naturally. In set theory, one way to define a correspondence $T$ from a set $X$ to a set $Y$ is to consider its graph $G_T$, a subset of $X\times Y$. If the graph contains a pair $(x,y)$, one says that the correspondence maps $x$ to $y$; but the graph could also contain a pair $(x,y')$, in which case it also maps $x$ to $y'$. And it could very well contain no pair with first element $x$, and then $x$ has no image under the correspondence. Correspondences can be composed: if the correspondence $S$ maps $x$ to $y$ and a correspondence $T$ maps $y$ to $z$,
then $T\circ S$ maps $x$ to $z$. They can also be inverted: just flip the graph.

Modular curves admit natural modular correspondences, indexed by strictly positive integers.
Specifically, the correspondence $T_n$ maps an elliptic curve $E$ to all elliptic curves $E'$ of
the form $E'=E/D$, where $D$ is a cyclic subgroup of rank $n$ of $E$.
The graph of this correspondence, when drawn on the surface $M\times M$, has a natural structure of an algebraic curve.

2. Modular dynamics

One virtue of correspondences in algebraic geometry is that they act naturally on objects that are attached “linearly” to the curve. For example, considering a function $f$ (or a differential form on $M$), rather than looking at all the images $E'$ of $E$, one could add the corresponding values $f(E')$, and consider thus sum as a function of $E$. This is exactly what is done to define the Hecke correspondence on modular forms. Geometrically this corresponds to pulling back $f$ from $M$ to $M\times M$ (using the first projection), then restricting on the graph of the correspondence, then “pushing-out” (a kind of trace) to the second factor — this is where the sum happens.

Another way to linearize the correspondence is to formally add all images $E'$, for example considering that the correspondence maps a Dirac measure $\delta_E$ at a point $E$
to the sum of the Dirac measures $\delta_{E'}$ at its images; maybe dividing by the number of images so that one keeps a probability measure. In this framework, the correspondence $T_n$ is a map from the space of probability measures on the modular curve to itself.

Which is cool because you can now iterate the process and wonder about the possible limit measures.

This is analogous to what is done in complex dynamics, for example when you study the dynamics
of the map $z\mapsto z^2+c$ (here, $c$ is a parameter): a construction of the Julia set consists in taking the 2 preimages of a point $z$ (any point, with a few exceptions), their 4 preimages, etc. As long as the construction goes on, the cloud of points that one gets becomes closer and closer to the Julia set.

In the case of the modular curve and the dynamics of Hecke correspondences (on can compose a given Hecke correspondence, or consider $T_n$ for large $n$, it does not really matter), what happens is described by theorems of William Duke / Laurent Clozel and Emmanuel Ullmo / Rodolphe Richard.
Whatever point one starts with, whatever probability measure one starts with, the probability measures on $M$ that are constructed by the dynamics converge to the Poincaré measure on the modular curve — that is, to the measure that is given by the hyperbolic measure $dx\, dy/ y^2$ on the Poincaré upper half-plane, restricted to a fundamental domain of the action.

In fact, Duke and Clozel/Ullmo have different goals. What they consider is a sequence of probability measures formed by considering elliptic curves with complex multiplication and all their conjugates. The limit theorem that they prove is then quite subtle and relies on deep properties of Maass forms of half integral weight.

3. Modular dynamics: $p$-adic fields

Over the $p$-adics, the dynamics is even more complicated, although its behaviour does not seem to be governed by analytic properties of analytic objects such as Maass forms, but by the arithmetic of the elliptic curve one starts with.

A first difficulty comes from the framework required to define $p$-adic dynamics. One wants to start from the field $\mathbf Q_p$, but it is not algebraically closed, and so the construction of the images of a curve by the correspondence require to take its algebraic closure $\overline{\mathbf Q_p}$. But then $p$-adic analysis is not so cool, because although that field has a natural $p$-adic absolute value, it is not complete anymore — so let's take its completion, $\mathbf C_p$. And this field is algebraically closed.

However, and this is a reflection that the Galois theory of $\mathbf Q_p$ is complicated, the field $\mathbf C_p$ is not locally compact, so that measure theory on such a field is not very well behaved. For example, a basic tool in measure theory over compact (metrizable, say) spaces is that the set of probability measures is itself compact for the vague topology on measures, so that any sequence of probability measures has a converging subsequence, etc.

In our case, this won't hold anymore and one needs to consider a suitable “compactification” of the $p$-adic modular curve — in fact, its analytification $M_p$ in the sense of Berkovich. One then gets a locally compact topological space on which the Hecke correspondences still act naturally, and
questions of dynamics now can be formulated properly.

The only thing I'll write about the dynamics is that it is subtle: there are domains which are stable by the correspondences. For example, if the reduction mod $p$ of an elliptic curve $E$ is supersingular, then all of its images $E' = E/D$ also have supersingular reduction. In other words, the supersingular locus in the Berkovich modular curve $M_p$ is totally invariant by the Hecke correspondence — this locus is a finite union of open disks. This shows that there are many possible limit measure and the papers that I have quoted above study the various limit phenomaena.

They also consider the analogue of Duke's result in that setting. Disegni also considers the analogous problem for Shimura curves (instead of elliptic curves, they parameterize abelian surfaces with real multiplication).

This will be all for this blog spot, now go and read these papers!

Friday, December 27, 2019

Behaviour of conjugacy by reduction modulo integers

Let $A$ and $B\in\mathrm{M}_n(\mathbf Z)$ be two square matrices with integer coefficients. Assume that they are conjugate by $\mathrm{GL}_n(\mathbf Z)$, namely, that there exists a matrix $P\in\mathrm{GL}_n(\mathbf Z)$ such that $B=P^{-1}AP$. Then we can reduce this relation modulo every integer $d\geq 2$ and obtain a similar relation between the images of $A$ and $B$ in $\mathrm M_n(\mathbf Z/d\mathbf Z)$.

Almost the same holds if $A$ and $B$ are only conjugate by $\mathrm{GL}_n(\mathbf Q)$, except for a few exceptions: we just need to take care to reduce the relation modulo integers $d$ that are coprime to the denominators of the coefficients of $P$ or of $P^{-1}$.

I was quite surprised at first to learn that the converse assertion is false. There are matrices $A$ and $B$
in $\mathrm M_2(\mathbf Z)$ whose images modulo every integer $d\geq 2$ are conjugate, but which are not conjugate by a matrix in $\mathrm{GL}_2(\mathbf Z)$.

An example is given by Peter Stebe in his paper “Conjugacy separability of groups of integer matrices”, Proc. of the AMS, 32 (1), mars 1972, p. 1—7.
Namely, set
\[ A = \begin{pmatrix} 188 & 275 \\ 121 & 177 \end{pmatrix} = \begin{pmatrix} 11\cdot 17+1 & 25\cdot 11 \\ 11^2 & 11\cdot 16+1 \end{pmatrix} \]
and
\[ B = \begin{pmatrix} 188 & 11 \\ 3025 & 177 \end{pmatrix} =
\begin{pmatrix} 11\cdot 17+1 & 11 \\ 11^2\cdot 25 & 11\cdot 16+1 \end{pmatrix}. \]
These matrices $A$ and $B$ have integer coefficients, their determinant is $1$, hence they belong to $\mathrm{SL}_2(\mathbf Z)$. They also have the same trace, hence the same characteristic polynomial, which is $T^2-365T+1$. The discriminant of this polynomial is $3\cdot 11^2\cdot 367$. This implies that their complex eigenvalues are distinct, hence these matrices are diagonalizable over $\mathbf C$, and are conjugate over $\mathbf C$.

In the same way, we see that they remain conjugate modulo every prime number $p$
that does not divide the discriminant. Modulo $3$ and $11$, we check that both matrices become
conjugate to $\begin{pmatrix}1 & 1 \\ 0 & 1\end{pmatrix}$, while they become conjugate to $\begin{pmatrix}-1 & 1 \\ 0 & -1 \end{pmatrix}$ modulo $367$.

It is a bit more delicate to prove that if we reduce modulo any integer $d\geq 2$, then $A$ and $B$ become conjugate under $\mathrm{SL}_2(\mathbf Z/d\mathbf Z)$. Stebe's argument runs in two steps.
He first computes the set of matrices $V$ that conjugate $A$ to $B$, namely he solves the equation $VA=BV$. The answer is given by
\[ V=V(x,y) = \begin{pmatrix} x & y \\ 11 y & 25 x-y \end{pmatrix}. \]
Moreover, one has
\[ \det(V(x,y))=25x^2 - xy -11y^2. \]
Consequently, to prove that $A$ and $B$ are conjugate in $\mathrm{SL}_2(\mathbf Z/d\mathbf Z)$, it suffices to find $x,y\in\mathbf Z/d\mathbf Z$ such that $\det(V(x,y))=1 \pmod d$.
To prove that they are conjugate by $\mathrm{SL}_2(\mathbf Z)$, we need to find $x,y\in\mathbf Z$ such that $\det(V(x,y))=1$, and if we agree to be content with a conjugacy by $\mathrm{GL}_2(\mathbf Z)$, then solutions of $\det(V(x,y))=-1$ are also admissible.

Let us first start with the equations modulo $d$. By the Chinese remainder theorem, we may assume that $d=p^m$ is a power of a prime number $p$. Now, if $p\neq 5$, we can take $y=0$ and $x$ such that $5x=1\pmod {p^m}$. If $p=5$, we take $x=0$ and we solve $y$ for $-11y^2=1\pmod {5^m}$, which is possible since $-11\equiv 4\pmod 5$ is a square, and it is easy, by induction (anyway, this is an instance of Hensel's lemma), to produce $y$ modulo $5^m$ such that $y\equiv 3\pmod 5$ and $-11y^2=1\pmod{5^m}$.

On the other hand, the equation $25x^2-xy-11y^2=\pm 1$ has no solutions in integers.
The case of $-1$ is easy by reduction modulo $3$: it becomes $x^2+2xy+y^2=2$, which has no solution since $x^2+2xy+y^2=(x+y)^2$ and $2$ is not a square modulo $3$.
The case of $+1$ is rather more difficult. Stebe treats it by reducing to the Pell equation $u^2=1101y^2+1$ and shows by analysing the minimal solution to this Pell equation that $y$ is divisible by $5$, which is incompatible with the initial equation.


From a more elaborate point of view, $V$ is a smooth scheme over $\mathbf Z$ that violates the integral Hasse principle. In fact, $V$ is a torsor under the centralizer of $A$, which is a torus, and the obstruction has been studied by Colliot-Thélène and Xu, precisely in this context. However, I did not make the calculations that could use their work to reprove Stebe's theorem.

Tuesday, July 2, 2019

Irreducibility of cyclotomic polynomials

For every integer $n\geq 1$, the $n$th cyclotomic polynomial $\Phi_n$ is the monic polynomial whose complex roots are the primitive $n$th roots of unity. A priori, this is a polynomial with complex coefficients, but since every $n$th root of unity is a primitive $d$th root of unity, for a unique divisor $d$ of $n$, one has the relation
\[ T^n-1 = \prod_{d\mid n} \Phi_d(T), \]
which implies, by induction and euclidean divisions, that $\Phi_n \in \mathbf Z[T]$ for every $n$.
The degree of the polynomial $\Phi_n$ is $\phi(n)$, the Euler indicator, number of units in $\mathbf Z/n\mathbf Z$, or number of integers in $\{0,1,\dots,n-1\}$ which are prime to $n$.

The goal of this note is to explain a few proofs that these polynomials are irreducible in $\mathbf Q[T]$ — or equivalently, in view of Gauss's lemma, in $\mathbf Z[T]$. This also amounts to saying that $\deg(\Phi_n)=\phi(n)$ or that the cyclotomic extension has degree $\phi(n)$, or that the canonical group homomorphism from the Galois group of $\mathbf Q(\zeta_n)$ to $(\mathbf Z/n\mathbf Z)^\times$ is an isomorphism.

1. The case where $n=p$ is a prime number.

One has $T^p-1=(T-1)(T^{p+1}+\dots+1)$, hence $\Phi_p=T^{p-1}+\dots+1$. If one reduces it modulo $p$, one finds $\Phi_p(T)\equiv (T-1)^{p-1}$, because $(T-1)\Phi_p(T)=T^p-1\equiv (T-1)^p$. Moreover, $\Phi_p(1)=p$ is not a multiple of $p^2$. By the Eisenstein criterion (after a change of variables $T=1+U$, if one prefers), the polynomial $\Phi_p$ is irreducible.

This argument also works when $n=p^e$ is a power of a prime number. Indeed, since a complex number $\alpha$ is a primitive $p^e$th root of unity if and only if $\alpha^{p^{e-1}}$ is a primitive $p$th root of unity, one has $\Phi_{p^e}= \Phi_p(T^{p^{e-1}})$. Then the Eisenstein criterion gives the result.

Comment.From the point of view of algebraic number theory, this proof makes use of the fact that the cyclotomic extension $\mathbf Q(\zeta_p)$ is totally ramified at $p$, of ramification index $p-1$.
Consequently, it must have degree $p-1$. More generally, it will prove that $\Phi_p$ is irreducible over the field $\mathbf Q_p$ of $p$-adic numbers, or even over any unramified extension of it, or even over any algebraic extension of $\mathbf Q_p$ for which the ramification index is prime to $p-1$.


2. The classical proof

Let us explain a proof that works for all integer $n$. Let $\alpha$ be a primitive $n$th root of unity, and let $P\in\mathbf Z[T]$ be its minimal polynomial — one has $P\mid \Phi_n$ in $\mathbf Z[T]$. Let (A priori, the divisibility is in $\mathbf Q[T]$, but Gauss's lemma implies that it holds in $\mathbf Z[T]$ as well.) Fix a polynomial $Q\in\mathbf Z[T]$ such that $\Phi_n=PQ$.

By euclidean division, one sees that the set $\mathbf Z[\alpha]$ of complex numbers of the form $S(\alpha)$, for $S\in\mathbf Z[T]$, is a free abelian group of rank $\deg(P)$, with basis $1,\alpha,\dots,\alpha^{\deg(P)-1}$.

Let $p$ be a prime number which does not divide $n$. By Fermat's little theorem, one has $P(T)^p \equiv P(T^p) \pmod p$, so that there exists $P_1\in\mathbf Z[T]$ such that $P(X)^p-P(X^p)=pP_1(T)$. This implies that $P(\alpha^p)=p P_1(\alpha)\in p\mathbf Z[\alpha]$.

Since $p$ is prime to $n$, $\alpha^p$ is a primitive $n$th root of unity, hence $\Phi_n(\alpha^p)=0$. Assume that $P(\alpha^p)\neq 0$. Then one has $Q(\alpha^p)=0$. Differentiating the equality $\Phi_n=PQ$, one gets $nT^{n}=T\Phi'_n(T)=TP'Q+TPQ'$; let us evaluate this at $\alpha_p$, we obtain $n=\alpha^p P(\alpha_p) Q'(\alpha^p)=p \alpha^p P^1(\alpha^p)Q'(\alpha^p)$. In other words, $n\in p\mathbf Z[\alpha]$, which is absurd because $n$ does not divide $p$. Consequently, $P(\alpha^p)=0$, and $P$ is also the minimal polynomial of $\alpha^p$.

By induction, one has $P(\alpha^m)=0$ for every integer $m$ which is prime to $n$. All primitive $n$th roots of unity are roots of $P$ and $\deg(P)=\phi(n)=\deg(\Phi_n)$. This shows that $P=\Phi_n$.

Comment.Since this proof considers prime numbers $p$ which do not divide $n$, it makes implicit use of the fact that the cyclotomic extension is unramified away from primes dividing $n$. The differentiation that appears in the proof is a way of proving this non-ramification: if $P(\alpha^p)$ is zero modulo $p$, it must be zero.

3. Landau's proof

A 1929 paper by Landau gives a variant of this classical proof which I just learnt from Milne's notes on Galois theory and which I find significantly easier.

We start as previously, $\alpha$ being a primitive $n$th root of unity and $P\in\mathbf Z[T]$ being its minimal polynomial.

Let us consider, when $k$ varies, the elements $P(\alpha^k)$ of $\mathbf Z[\alpha]$. There are finitely many of them, since this sequence is $n$-periodic, so that they can be written as finitely polynomials of degree $<\deg(P)$ in $\alpha$. Let $A$ be an upper-bound for their coefficients. If $p$ is a prime number, we have $P(\alpha^p) \in p\mathbf Z[\alpha]$ (by an already given argument). This implies $P(\alpha^p)=0$ if $p>A$.

By induction, one has $P(\alpha^m)=0$ for any integer $m$ whose prime factors are all $>A$.

One the other hand, if $m$ is an integer prime to $n$ and $P$ is the product of all prime number $p$ such that $p\leq A$ and $p$ does not divide $m$, then $m'=m+nP $ is another integer all of which prime factors are $>A$. (Indeed, if $p\leq A$, then either $p\mid m$ in
which case $p\nmid nP$ so that then $p\nmid m'$, or $p\nmid m$ in which case $p\mid nP$ so that $p\nmid m'$.) Since $m'\equiv m \pmod n$, one has $P(\alpha^{m})=P(\alpha^{m'})=0$.

This shows that all primitive $n$th roots of unity are roots of $P$, hence $P=\Phi_n$.

Comment. —This proof is quite of a mysterious nature to me.

4. Using Galois theory to pass from local information to global information

The cyclotomic extension $K_n$ contains, as subextension, the cyclotomic extensions $K_{p^e}$, where $n=\prod p_i^{e_i}$ is the decomposition of $n$ has a product of powers of prime numbers. By the first case, $K_{p^e}$ has degree $\phi(p^e)=p^{e-1}(p-1)$ over $\mathbf Q$. To prove that $\Phi_n$ is irreducible, it suffices to prove that these extensions are linearly disjoint, which is the object of the following lemma.

Lemma. — Let $m$ and $n$ be integers and let $d$ be their gcd. Then $K_m\cap K_n=K_d$.

This is an application of Galois theory (and the result holds for every ground field as soon as its characteristic does not divide $m$ and $n$). Let $M$ be the least common multiple of $m$ and $n$. One has $K_N=K_m\cdot K_n$, and the cyclotomic character furnishes a group morphism $\operatorname{Gal}(K_N/\mathbf Q)\to (\mathbf Z/N\mathbf Z)^\times$. The Galois groups $\operatorname{Gal}(K_N/K_m)$ and $\operatorname{Gal}(K_N/K_n)$ corresponding to the subfields $K_m$ and $K_n$ are the kernels of the composition of the cyclotomic character with the projections to $(\mathbf Z/m\mathbf Z)^\times$ and $(\mathbf Z/n\mathbf Z)^\times$, and their intersection to the subgroup generated by these two kernels, which is none but the kernel of the composition of the cyclotomic character with the projection to $(\mathbf Z/d\mathbf Z)^\times$.

Sunday, May 19, 2019

Designs, Skolem sequences, and partitions of integers

Recently, my father offered me the first volume of a graphic novel by Jean-François Kierzkowski and Marek called La suite de Skolem — Skolem's sequences. I knew about the norwegian mathematician Thoralf Skolem for two different reasons (the Löwenheim-Skolem theorem in model theory, and some diophantine equations that Laurent Moret-Bailly put in a more geometric setting — see his series of papers on Problèmes de Skolem), but I had never heared about Skolem sequences.

They appear in his 1957 paper, On certain distributions of integers in pairs with given differences (Math Scand., 5, 57-68).
The question is to put the integers $1,2,\dots,2n$ in $n$ pairs $(a_1,b_1),\dots,(a_n,b_n)$ such that the differences are all different, namely $b_i-a_i=i$ for $i\in\{1,\dots,r\}$. One can put it differently: write a sequence of $2n$ integers, where each of the integers from $1$ to $n$ appear exactly twice, the two $1$s being at distance $1$, the two $2$s at distance $2$, etc.
For example, $4,2,3,2,4,3,1,1$ is a Skolem sequence of length $n$, corresponding to the pairs $(7,8), (2,4), (3,6),(1,5)$.

Le jeu des cavaliers (Jessica Stockholder) — photo V. Pantaloni
Le jeu des cavaliers (photo V. Pantaloni)
The possibility of such sequences has been materialized under the form of a giant sculpture Le jeu des cavaliers by Jessica Stockholder at the Institut des Hautes Études Scientifiques (IHÉS) at Bures sur Yvette.

There is a basic necessary and sufficient condition for such a sequence to exist, namely $n$ has to be congruent to $0$ or $1$ modulo $4$. The proof of necessity is easy (attributed by Skolem to Bang): one has $\sum_{i=1}^n(b_i-a_i)=n(n+1)/2$, and $\sum_{i=1}^n (b_i+a_i)=2n(2n+1)/2$, so that $\sum_{i=1}^n b_i=n(n+1)/4+2n(2n+1)/2=n(5n+3)/4$. If $n$ is even, this forces $n\equiv 0 \pmod 4$, while if $n$ is odd, this forces $5n+3\equiv 0\pmod 4$, hence $n\equiv 1\pmod 4$. The proof of  existence consists in an explicit example of such a sequence, which is written down in Skolem's paper.

Skolem's motivation is only alluded to in that paper, but he explains it a bit further next year. In Some Remarks on the Triple Systems of Steiner, he gives the recipe that furnishes such a system from a Skolem sequence. Steiner triple systems on a set $S$ is the datum of triplets of elements of $S$ such that each pair of two elements of $S$ appears exactly once. In other words, they are a $(3,2,1)$-design on $S$ — a $(m,p,q)$-design on a set $S$ being a family of $m$-subsets of $S$ such that each $q$-subset appears in exactly $p$ of those subsets. Some relatively obvious divisibility conditions can be written down that give a necessary condition for the existence of designs with given parameters, but actual existence is much more difficult. In fact, it has been shown only recently by Peter Keevash that these necessary conditions are sufficient, provided the cardinality of the set $S$ is large enough, see Gil Kalai's talk Designs exist! at the Bourbaki Seminar.

In the case of Steiner triple systems, the condition is that the number $s$ of elements of $S$ be congruent to $1$ or $3$ modulo $6$. Indeed, there are $s(s-1)/2$ pairs of elements of $S$, and each 3-subset of the triple system features 3 such pairs, so that there are $N=s(s-1)/6$ triples. On the other hand, each element of $S$ appears exactly $3N/s$ times, so that $(s-1)/2$ is an integer. So $s$ has to be odd, and either $3$ divides $s$ (in which case $s\equiv 3\pmod 6$) or $s\equiv 1\pmod 6$.

And Skolem's observation is that a family of $n$ pairs $(a_i,b_i)$ as above furnishes a triple system on the set $S=\mathbf Z/(6n+1)\mathbf Z$, namely the triples $(i,i+j,i+b_j+n)$ where $1\leq i,j\leq n$, thus constructing Steiner triple systems on a set whose cardinality $6n+1$, when $n\equiv 0,1\pmod 4$.

My surprise came at the reading of the rest of Skolem's 1957 paper, because I knew the result he then described but had no idea it was due to him. (In fact, it was one of the first homework my math teacher Johan Yebbou gave to us when I was in classes préparatoires.) And since this result is very nice, let me tell you about it.

Theorem.Let $\alpha>1$ and $\beta>1$ be irrational real numbers such that $\alpha^{-1}+\beta^{-1}=1$. Then each strictly positive integer can be written either as $\lfloor \alpha n\rfloor$, or $\lfloor \beta n\rfloor$ for some integer $n\geq 1$, but not of both forms.

First of all, assume $N=\lfloor \alpha n\rfloor=\lfloor \beta m\rfloor$. Using that $\alpha,\beta$ are irrational, we thus write $N< \alpha n<N+1$ and $N<\beta m<N+1$. Dividing these inequalities by $\alpha$ and $\beta$ and adding them, we get $N<n+m<N+1$, since $\alpha^{-1}+\beta^{-1}=1$. This proves that any given integer can be written only of one of those two forms.

Since $\alpha^{-1}+\beta^{-1}=1$, one of $\alpha,\beta$ has to be $<2$. Assume that $1<\alpha<2$. The integers of the form $\lfloor \alpha n\rfloor$ form a strictly increasing sequence, and we want to show that any integer it avoids can be written $\lfloor \beta m\rfloor$.

Set $\gamma=\alpha-1$, so that $\beta=\alpha/(\alpha-1)=1+1/\gamma$. 

For every integer, we have $\lfloor \alpha(n+1)\rfloor = \lfloor \alpha n\rfloor + 1$ or $\lfloor \alpha(n+1)\rfloor=\lfloor \alpha n\rfloor+2$, so that if $\lfloor \alpha n\rfloor + 1$ is avoided, one has $\lfloor \alpha (n+1)\rfloor=\lfloor\alpha n\rfloor +2$.

Then, $\lfloor \alpha n\rfloor = n+\lfloor \gamma n\rfloor=n+k-1$, where $k=1+\lfloor \gamma n\rfloor$. The inequalities $k-1<\gamma n <k$ imply $k/\gamma - 1/\gamma< n<k/\gamma$. Moreover, $\lfloor \alpha(n+1)\rfloor=n+1+\lfloor \gamma(n+1)\rfloor=n+k+1$, so that $k+1<\gamma(n+1)<k+2$, hence $n>k/\gamma+1/\gamma-1>k/\gamma-1$. This proves that $n=\lfloor k/\gamma\rfloor$. Then, $\lfloor k\beta\rfloor=k+\lfloor k/\gamma\rfloor=k+n=\lfloor \alpha n\rfloor +1$.






Saturday, May 12, 2018

A theorem of Lee—Yang on root of polynomials

A recent MathOverflow post asked for a proof that the roots of certain polynomials were located on the unit circle. A comment by Richard Stanley pointed to a beautiful theorem of T. D. Lee and C. N. Yang. By the way, these two authors were physicists, got the Nobel prize in physics in 1957, and were the first Chinese scientists to be honored by the Nobel prize.

This theorem appears in an appendix to their paper, Statistical Theory of Equations of State and Phase Transitions.IL Lattice Gas and Ising Model, published in Phys. Review in 1952, and devoted to properties of the partition function of some lattice gases. Here is a discussion of this theorem, following both the initial paper and notes by Shayan Oveis Gharan.

Let $A=(a_{i,j})_{1\leq i,j\leq n}$ be a Hermitian matrix ($a_{i,j}=\overline{a_{j,i}}$ for all $i,j$). Define a polynomial
\[ F(T) = \sum_{S\subset\{1,\dots,n\}} \prod_{\substack{i\in S \\ j\not\in S}} a_{i,j} T^{\# S}. \]

Theorem 1 (Lee-Yang). — If $\lvert a_{i,j}\rvert\leq 1$ for all $i,j$, then all roots of $F$ have absolute value $1$.

This theorem follows from a multivariate result. Let us define
\[ P(T_1,\dots,T_n) = \sum_{S\subset\{1,\dots,n\}} \prod_{\substack{i\in S \\ j\not\in S}} a_{i,j} \prod_{i\in S} T_i .\]
Say that a polynomial $F\in\mathbf C[T_1,\dots,T_n]$ is good if it has no root $(z_1,\dots,z_n)\in\mathbf C^n$ such that $\lvert z_i\rvert <1$ for all $i$.

Proposition 2 (Lee-Yang). — If $\lvert a_{i,j}\rvert\leq 1$ for all $i,j$, then $P$ is good.

For every pair $(i,j)$, set $a^S_{i,j}=a_{i,j}$ if $i$ belongs to $S$, but not $j$, and set $a^S_{i,j}=1$ otherwise. Consequently,
\[ P(T_1,\dots,T_n) = \sum_{S\subset\{1,\dots,n\}}\prod_{i,j} a^S_{i,j} \prod_{k\in S} T_k. \]
In other words, if we define polynomials
\[ P_{i,j} (T_1,\dots,T_n) = \sum_{S\subset\{1,\dots,n\}} a^S_{i,j} \prod_{k\in S} T_k, \]
then $P$ is the “coefficientwise product” of the polynomials $P_{i,j}$.
We also note that these polynomials have degree at most one with respect to every variable. These observations may motivate the following lemmas concerning good polynomials.

Lemma 1. — If $P,Q$ are good, then so is their product.

Lemma 2. — If $P(T_1,\dots,T_n)$ is good, then $P(a,T_2,\dots,T_n)$ is good for every $a\in\mathbf C$ such that $\lvert a\rvert \leq 1$.

This is obvious if $\lvert a\rvert <1$; in the general case, this follows from the Rouché theorem — the set polynomials (of bounded degree) whose roots belong to some closed subset is closed.

Lemma 3. — If $\lvert a\rvert \leq 1$, then $1+aT$ is good.

This is obvious.

Lemma 4. — If $\lvert a\rvert\leq 1$, then $P=1+aT_1+\bar a T_2+ T_1T_2$ is good.

If $\lvert a\rvert =1$, then $P=(1+aT_1)(1+\bar aT_2)$ is good, as the product of two good polynomials.
Now assume that $\lvert a\rvert <1$. Let $(z_1,z_2)$ be a root of $P$ such that $\lvert z_1\rvert<1$. One has
\[ z_2 = - \frac{1+az_1}{z_1+\bar a}. \]
Since the Möbius transformation $z\mapsto (z+\bar a)/(1+a z)$ defines a bijection from the unit open disk to itself, one has $\lvert z_2\rvert >1$.

Lemma 5. — If $P=a+bT_1+cT_2+dT_1T_2$ is good, then $Q=a+dT$ is good.

Assume otherwise, so that $\lvert a\rvert <\lvert d\rvert$. By symmetry, we assume $\lvert b\rvert\geq \lvert c\rvert$. We write $P (T_1,T_2) = (a+cT_2) + (b+dT_2) T_1$.
Choose $z_2\in\mathbf C$ such that $dz_2$ and $b$ have the same argument; if, moreover $z_2$ is close enough to $1$ and satisfies $\lvert z_2\rvert <1$, then
\[ \lvert b+dz_2\rvert=\lvert b\rvert +\lvert dz_2\rvert > \lvert a\rvert+\lvert c\rvert>\lvert a+cz_2\rvert. \]
Consequently, the polynomial $P(T_1,z_2)$ is not good; a contradiction.

Lemma 6. — If $P,Q$ are good polynomials of degree at most one in each variable, then so is their coefficientwise product.

We first treat the case of one variable: then $P=a+bT$ and $Q=a'+b'T$, so that their coefficientwise product is given by $R=aa'+bb'T$. By assumption $\lvert a\rvert \geq \lvert b\rvert$ and $a\neq 0$;
similarly, $\lvert a'\rvert \geq \lvert b'\rvert $ and $a'\neq 0$. Consequently, $aa'\neq0$ and $\lvert aa'\rvert \geq \lvert bb'\rvert$, which shows that $R$ is good.
We prove the result by induction on $n$. For every subset $S$ of $\{1,\dots,n-1\}$, let $a_S$ and $b_S$ be the coefficients of $\prod_{i\in S}T_i$ and of $\prod_{i\in S} T_i \cdot T_n$ in $P$; define similarly $c_S$ and $d_S$with $Q$. The coefficientwise product of $P$ and $Q$ is equal to
\[ R= \sum_S (a_S c_S +b_S d_S T_n ) \prod_{i\in S} T_i . \]
Let $z\in\mathbf C$ be such that $\lvert z\rvert \leq 1$, so that
\[ P(T_1,\dots,T_{n-1},z)= \sum _S (a_S+b_S z) \prod_{i\in S} T_i \] is good, by lemma 2. Similarly, for $w\in\mathbf C$ such that $\lvert w\rvert\leq 1$, $Q(T_1,\dots,T_{n-1},w)
=\sum _S (c_S+d_S w) \prod_{i\in S} T_i$ is good. By induction, their coefficientwise product, given by
\[ R_{z,w} = \sum_S (a_S+b_S z)(c_S+d_S w) \prod_{i\in S} T_i \]
is good as well.
We now fix complex numbers $z_1,\dots,z_{n-1}$ of absolute value $<1$. By what precedes, the polynomial
\[ S(T,U) = (\sum_S a_S c_S z_S) + (\sum_S b_Sc_S z_S) T + (\sum_S a_S d_S z_S) U
+ (\sum_S b_S d_S z_S) TU \]
is good, where $z_S=\prod_{i\in S}z_i$. According to lemma 4, the polynomial
\[ R(z_1,\dots,z_{n-1},T) = (\sum_S a_S c_S z_S) + (\sum_S b_S d_S z_S) T \]
is good. This proves that $R$ is good.

Proof of theorem 2. — We have already observed that the polynomial $P$ is the coefficientwise product of polynomials $P_{i,j}$, each of them has degree at most one in each variable. On the other hand, one has
\[ P_{i,j} = (1+a_{i,j} T_i + a_{j,i} T_j + T_i T_j) \prod_{k\neq i,j} (1+T_k), \]
a product of good polynomials, so that $P_{i,j}$ is good. This proves that $P$ is good.

In fact, more is true. Indeed, one has
\[
\begin{align*} T_1\dots T_n P(1/T_1,\dots,1/T_n)
& = \sum_ S \prod_{\substack{i\in S \\ j\notin S}} a_{i,j} \prod_{i\notin S} T_i \\
& =P^*(T_1,\dots,T_n)
\end{align*}
\]
where $P^*$ is defined using the transpose matrix of $A$. Consequently, $P$ has no root $(z_1,\dots,z_n)$ with $\lvert z_i\rvert >1$ for every $i$.

Proof of theorem 1. — Let $z$ be a root of $P$. Since the polynomial $P$ is good, so is the one-variable polynomial $F(T)=P(T,\dots,T)$. In particular, $F(z)=0$ implies $\lvert z\rvert \geq 1$. But the polynomial has a symmetry property, inherited by that of $P$, namely $ T^n F(1/T)=F^*(T)$, where $F^*$ is defined using the transpose matrix of $A$. Consequently, $F^*(1/z)=0$ and $\lvert 1/z\rvert \geq 1$. We thus have shown that $\lvert z\rvert=1$.




Wednesday, May 2, 2018

Combinatorics and probability : Greene, Nijenhuis and Wilf's proof of the hook length formula

I've never been very good at remembering representation theory, past the general facts that hold for all finite groups, especially the representation theory of symmetric groups.  So here are personal notes to help me understand the 1979 paper by Greene, Nijenhuis and Wilf, where they give a probabilistic proof of the hook length formula. If you already know what it is about, then you'd be quicker by browsing at the Wikipedia page that I just linked to!

Heroes of this story are partitions, Ferrers diagrams, and Young tableaux. Let us first give definitions.

A partition of an integer $n$ is a decreasing (US: non-increasing) sequence of integers $\lambda =(\lambda_1\geq \lambda_2\geq \dots\geq \lambda_m)$ of strictly positive integers such that $|\lambda|=\lambda_1+\dots+\lambda_m=n$.

The Ferrrers diagram $F(\lambda)$ of this partition $\lambda$ is the stairs-like graphic representation consisting of the first quadrant cells indexed by integers $(i,j)$, where $1\leq i\leq m$ and $1\leq j\leq \lambda_i$. Visualizing a partition by means of its Ferrers diagram makes clear that there is an involution on partitions, $\lambda\mapsto \lambda^*$, which, geometrically consists in applying the symmetry with respect to the diagonal that cuts the first quadrant. In formulas, $j\leq \lambda_i$ if and only if $i\leq \lambda_j^*$.

A Young tableau of shape $\lambda$ is an enumeration of the $n$ cells of this Ferrers diagram such that the enumeration strictly increases in rows and columns.

There is a graphic way of representing Ferrers diagrams and Young tableaux — the tradition says it's the Soviet way — consisting in rotating the picture by 45° ($\pi/4$) to the left and viewing the first quadrant as a kind of bowl in which $n$ balls with diameter $1$ are thrown from the top.

The hook length formula is a formula for the number $f_\lambda$ of Young tableaux of given shape $\lambda$.

For every $(i,j)$ such that $1\leq i\leq m$ and $1\leq j\leq \lambda_i$, define its hook $H_{ij}$ to be the set of cells in the diagram that are either above it, or on its the right — in formula, the set of all pairs $(a,b)$ such that $a=i$ and $j\leq b\leq \lambda_i$, or $i\leq a\leq m$ and $b=j\leq \lambda_a$. Let $h_{ij}$ be the cardinality of the hook $H_{ij}$.

Lemma. — One has $h_{a,b} = (\lambda_a-b)+(\lambda_b^*-a) +1$.

Indeed, $\lambda_a-b$ is the number of cells above $(a,b)$ in the Ferrers diagram of $\lambda$, excluding $(a,b)$, while $(\lambda_b^*-a)$ is the number of cells on the right of $(a,b)$.

Theorem (Frame, Thrall, Robinson; 1954). — Let $n$ be an integer and let $\lambda$ be a partition of $n$. The number of Young tableaux of shape $\lambda$ is given by 
\[ f_\lambda = \frac{n!}{\prod_{(i,j)\in F(\lambda)} h_{ij}}. \]

From the point of view of representation theory, partitions of $n$ are in bijection with conjugacy classes of elements in the symmetric group $\mathfrak S_n$ (the lengths of the orbits of a permutation of $\{1,\dots,n\}$ can be sorted into a partition of $n$, and this partition characterizes the conjugacy class of the given permutation). Then, to each partition of $n$ corresponds an irreducible representation of $\mathfrak S_n$, and $f_\lambda$ appears to be its dimension. (In a future post, I plan to explain this part of the story.)

As already said, the rest of this post is devoted to explaining the probabilistic proof due to Greene, Nijenhuis, and Wilf. (Aside: Nijenhuis is a Dutch name that should pronounced roughly like Nay-en uys.)

A corner of a Ferrers diagram is a cell $(i,j)$ which is both on top of its column, and on the right of its row; in other words, it is a cell whose associated hook is made of itself only. A bit of thought convinces that a corner can be removed, and furnishes a Ferrers diagram with one cell less. Conversely, starting from a Ferrers diagram with $n-1$ cells, one may add a cell on the boundary so as to get a Ferrers diagram with $n$ cells. In the partition point of view, either one part gets one more item, or there is one more part, with only one item. In a Young tableau with $n$ cells, the cell numbered $n$ is at a corner, and removing it furnishes a Young tableau with $n-1$ cells; conversely, starting from a Young tableau with $n-1$ cells, one can add a cell so that it becomes a corner of the new tableau, and label it with $n$.

Let $P(\lambda_1,\dots,\lambda_m)$ be the number on the right hand side of the Frame-Thrall-Robinson formula. By convention, it is set to be $0$ if $(\lambda_1,\dots,\lambda_m)$ does not satisfy $\lambda_1\geq\dots\geq\lambda_m\geq 1$. By induction, one wants to prove
\[ P(\lambda_1,\dots,\lambda_m) = \sum_{i=1}^m P(\lambda_1,\dots,\lambda_{i-1},\lambda_i-1,\lambda_{i+1},\dots,\lambda_m). \]
For every partition $\lambda$, set
\[ p_i(\lambda_1,\dots,\lambda_m) = \frac{P(\lambda_1,\dots,\lambda_{i-1},\lambda_i-1,\lambda_{i+1},\dots,\lambda_m)}{P(\lambda_1,\dots,\lambda_m)}. \]
We thus need to prove
\[ \sum_{i=1}^m p_i(\lambda)=1; \]
which we will do by interpreting the $p_i(\lambda)$ as the probabilities of disjoint events.

Given the Ferrers diagram $F(\lambda)$, let us pick, at random, one cell $(i,j)$, each of them given equal probability $1/n$; then we pick a new cell, at random, in the hook of $(i,j)$, each of them given equal probability $1/(h_{ij}-1)$, etc., until we reach a corner of the given diagram. Such a trial defines a path in the Ferrers diagram, ending at a corner $(a,b)$; its projections  are denoted by $A=\{a_1<a_2<\dots\}$ and $B=\{b_1<b_2<\dots\}$. Let $p(a,b)$ be the probability that we reach the corner $(a,b)$; let $q(A,B)$ be the probability that its projections be $A$ and $B$ conditioned to the hypothesis that it start at $(\inf(A),\inf(B))$.

Lemma. — Let $A,B$ be sets of integers, let $a=\sup(A)$ and let $b=\sup(B)$;  assume that $(a,b)$ is a corner of $\lambda$. One has \[ q(A,B)= \prod_{\substack{i\in A\\ i\neq a}} \frac1{h_{i,b}-1} \prod_{\substack{j\in B\\ j\neq b}} \frac1{h_{a,j}-1}.\]

We argue by induction on the cardinalities of $A$ and $B$. If $A=\{a\}$ and $B=\{b\}$, then $q(A,B)=1$, since both products are empty; this proves the formula in this case. As above, let $a_1<a_2<\dots$ be the enumeration of the elements of $A$ and $b_1<b_2<\dots$ be that for $B$; let also $A'=A\setminus\{a_1\}$ and $B'=B\setminus\{b_1\}$. By construction of the process, after having chosen the initial cell $(a_1,b_1)$,  it either goes on above the initially chosen cell $(a_1,b_1)$, hence at $(a_1,b_2)$, or on its right, that is, at $(a_2,b_1)$. One thus has
\[ \begin{align*}
q(A,B) =   \mathbf P(A,B\mid a_1,b_1)
& =  \mathbf P(a_1,b_1,b_2\mid a_1,b_1) \mathbf P(A,B\mid a_1,b_1,b_2)
+ \mathbf P(a_1,a_2,b_1\mid a_1,b_1) \mathbf P(A,B\mid a_1,a_2,b_1)\\
&= \frac1{f_{a_1,b_1}-1} \mathbf P(A,B'\mid a_1,b_2)
+ \frac1{f_{a_1,b_1}-1} \mathbf P(A',B\mid a_2,b_1) \\
&= \frac1{h_{a_1,b_1}-1}\left( q(A',B) + q(A,B') \right). \end{align*}
\]
By induction, we may assume that the given formula holds for $(A',B)$ and $(A,B')$. Then, one has
\[ \begin{align*}
q(A,B) & = \frac1{h_{a_1,b_1}-1} \left(
 \prod_{\substack{i\in A'\\ i\neq a}} \frac1{h_{i,b}-1} \prod_{\substack{j\in B \\ j\neq b}} \frac1{h_{a,j}-1}
+
 \prod_{\substack{i\in A\\ i\neq a}} \frac1{h_{i,b}-1} \prod_{\substack{j\in B' \\ j\neq b}} \frac1{h_{a,j}-1}\right) \\
& =\frac{  (h_{a_1,b}-1)+(h_{a,b_1}-1)}{h_{a_1,b_1}-1}
 \prod_{\substack{i\in A\\ i\neq a}} \frac1{h_{i,b}-1} \prod_{\substack{j\in B \\ j\neq b}} \frac1{h_{a,j}-1},
\end{align*}
\]
which implies the desired formula once one remembers that
\[ h_{a,b_1} + h_{a_1,b}
= h_{a_1,b_1} + h_{a,b}=  h_{a_1,b_1}+1\]
since $(a,b)$ is a corner of $F(\lambda)$.

Proposition. — Let $(a,b)$ be a corner of the diagram $F(\lambda)$; one has $p(a,b)=p_a(\lambda)$. (Note that $b=\lambda_a$.)

Write $F_a(\lambda)$ for the Ferrers diagram with corner $(a,b)$ removed. Its $(i,j)$-hook is the same as that of $F(\lambda)$ if $i\neq a$ and $j\neq b$; otherwise, it has one element less. Consequently, writing $h'_{i,j}$ for the cardinalities of its hooks, one has
\[\begin{align*}
p_a(\lambda) &= \frac1n \frac{\prod_{(i,j)\in F(\lambda)}h_{i,j}}{\prod_{(i,j)\in F_a(\lambda)} h'_{i,j}} \\ &= n \prod_{i<a} \frac{h_{i,b}}{h_{i,b}-1} \prod_{j<b}\frac{h_{a,j}}{h_{a,j}-1}\\
&=\frac1n \prod_{i<a}\left(1+ \frac1{h_{i,b}-1}\right) \prod_{j<b} \left(1+\frac1{h_{a,j}-1}\right). \end{align*}
\]
Let us now expand the products. We get
\[
p_a(\lambda) = \frac1n \sum_{\sup(A)<a} \sum_{\sup(B)<b} \prod_{i\in A} \frac1{h_{i,b}-1}\prod_{j\in B}\frac1{h_{a,j}-1},
\]
where $A$ and $B$ range over the (possibly empty) subsets of $\{1,\dots,n\}$ satisfying
the given conditions $\sup(A)<a$ and $\sup(B)<b$. (Recall that, by convention, or by definition, one has $\sup(\emptyset)=-\infty$.) Consequently, one has
\[ \begin{align*}
p_a(\lambda) & =\frac1n \sum_{\sup(A)=a} \sup_{\sup(B)=b} \prod_{\substack{i\in A\\ i\neq a}}
 \frac1{h_{i,b}-1} \prod_{\substack{j\in B \\ j\neq b}} \frac1{h_{a,j}-1} \\
& = \frac1n \sum_{\sup(A)=a} \sum_{\sup(B)=b} q(A,B) \\
& = \sum_{\sup(A)=a} \sum_{\sup(B)=b} \mathbf P (A,B ) \\
& = p(a,b),
\end{align*}
\]
as claimed.

Now, every trial has to end at some corner $(a,b)$, so that
\[ \sum_{\text{$(a,b)$ is a corner}} p(a,b) = 1.\]
On the other hand, if $(a,b)$ is a corner, then $b=\lambda_a$, while if $(a,\lambda_a)$ is not a corner, then $P_a(\lambda)=0$. We thus get $\sum_a P_a(\lambda)=P(\lambda)$, as was to be shown.

Wednesday, February 7, 2018

Contemporary homological algebra — Ignoramus et ignorabimus (?)

The title of this post is a quotation of Emil Dubois-Reymond (1818-1896), a 19th century German physiologist, and the elder brother of the mathematician Paul Dubois-Reymond. Meaning we are ignorant, and we will remain ignorant, it adopts a pessimistic point of view on science, which would have intrinsic limitations. As such, this slogan has been quite opposed by David Hilbert who declared, in 1900, at the International congress of mathematicians, that there is no ignorabimus in mathematics. (In fact, there is some ignorabimus, because of Gödel's incompleteness theorem, but that is not the subject of this post.)

I would like to discuss here, in a particularly informal way, some frustration of myself relative to homological algebra, in particular to its most recent developments. I am certainly ill-informed on those matters, and one of my goals is to clarify my own ideas, my expectations, my hopes,...

This mere existence of this post is due to the kind invitation of a colleague of the computer science department working in (higher) category theory, namely François Metayer, who was interested to understand my motivation for willing to understand this topic.


Let me begin with a brief historical summary of the development of homological algebra, partly borrowed from Charles Weibel's History of homological algebra.
  • B. Riemann (1857), E. Betti (1871), H. Poincaré (1895) define homology numbers. 
  • E. Noether (1925) introduces abelian groups, whose elementary divisors, recover the previously defined homology numbers.
  • J. Leray (1946) introduces sheaves, their cohomology, the spectral sequence... 
  • During the years 1940–1955, under the hands of Cartan, Serre, Borel, etc., the theory develops itself in various directions (cohomology of groups, new spectral sequences, etc.).
  • In their foundational book, Homological algebra, H. Cartan and S. Eilenberg (1956) introduce derived functors, projective/injective resolutions,...
  • Around 1950, A. Dold, D. Kan, J. Moore, D. Puppe introduce simplicial methods. D. Kan introduces adjoint functors.
  • A. Grothendieck, in Sur quelques points d'algèbre homologique (1957), introduces general abelian categories, as well as convenient axioms that guarantee the existence of enough injective objects, thus giving birth to a generalized homological algebra.
  • P. Gabriel and M. Zisman (1967) developed the abstract calculus of fractions in categories, and proved that the homotopy category of topological spaces coincides with that of simplicial sets.
  • J.-L. Verdier (1963) defines derived categories. This acknowledges that objects give rise to, say, injective resolutions which are canonical up to homotopy, and that the corresponding complex is an object in its own right, that has to be seen as equivalent to the initial object.  The framework is that of triangulated categories. Progressively, derived categories came to play an important rôle in algebraic geometry (Grothendieck duality, Verdier duality, deformation theory, intersection cohomology and perverse sheaves, the Riemann–Hilbert correspondence, mirror symmetry,...) and representation theory.
  • D. Quillen (1967) introduces model categories, who allow a parallel treatment of homological algebra in linear contexts (modules, sheaves of modules...) and non-linear ones (algebraic topology)... This is completed by A. Grothendieck's (1991) notion of derivators.
  • At some point, the theory of dg-categories appears, but I can't locate it precisely, nor do I understand precisely its relation with other approaches.
  • A. Joyal (2002) begins the study of quasi-categories (which were previously defined by J. M. Boardman and R. M. Vogt, 1973). Under the name of $(\infty,1)$-categories or $\infty$-categories, these quasi-categories are used extensively in Lurie's work (his books Higher topos theory, 2006; Higher algebra, 2017; the 10+ papers on derived algebraic geometry,...).
My main object of interest (up to now) is “classical” algebraic geometry, with homological algebra as an important tool via the cohomology of sheaves, and while I have barely used anything more abstract that cohomology sheaves (almost never complexes), I do agree that there are three main options to homological algebra: derived categories, model categories, and $\infty$-categories.

While I am not absolutely ignorant of the first one (I even lectured on them), the two other approaches still look esoteric to me and I can't say I master them (yet?). Moreover, their learning curve seem to be quite steep (Lurie's books totalize more than 2000 pages, plus the innumerable papers on derived algebraic geometry, etc.) and I do not really see how an average geometer should/could embark in this journey.

However, I believe that this is now a necessary journey, and I would like to mention some recent theorems that support this idea.

First of all, and despite its usefulness, the theory of triangulated/derived categories has many defects. Here are some of them:
  • There is no (and there cannot be any) functorial construction of a cone; 
  • When a triangulated category is endowed with a truncation structure, there is no natural functor from the derived category of its heart to the initial triangulated category; 
  • Derived categories are not well suited for non-abelian categories (filtered derived categories seem to require additional, non-trivial, work, for example);
  • Unbounded derived functors are often hard to define: we now dispose of homotopically injective resolutions (Spaltenstein, Serpé, Alonso-Tarrió et al.), but unbounded Verdier duality still requires some unnatural hypotheses on the morphism, for example.
Three results, now.

The first theorem I want to mention is due to M. Greenberg (1966). Given a scheme $X$ of finite type over a  complete discrete valuation ring $R$ with uniformizer $\pi$, there exists an integer $a\geq 1$, such that for any integer $n\geq1$, a point $x\in X(R/\pi^n)$ lifts to $X(R)$ if and only if it lifts to $X(R/\pi^{an})$.

It may be worth stating it in more concrete terms. Two particular cases of such a ring $R$ are the ring $R[[t]]$ of power series over some field $k$, then $\pi=t$, and the ring $\mathbf Z_p$ of $p$-adic numbers (for some fixed prime number $p$), in which case one has $\pi=p$. It is then important to consider the case of affine scheme. Then $X=V(f_1,\dots,f_m)$ is defined by the vanishing of a finite family $f_1,\dots,f_m$ of polynomials in $R[T_1,\dots,T_n]$ in $n$ variables, so that, for any ring $A$, $X(A)$ is the set of solutions in $A^n$ of the system $f(T_1,\dots,T_n)=\dots=f_m(T_1,\dots,T_n)=0$.  By reduction modulo $\pi^r$, a solution in $R^r$ gives rise to a solution in $R/\pi^r$, and Greenberg's result is about the converse: given a solution $x$ in $R/\pi^r$, how do decide whether it is a reduction of a solution in $R$. A necessary condition is that $x$ lifts to a solution in $R/\pi^s$, for every $s\geq r$. Greenberg's theorem asserts that it is sufficient that $x$ lift to a solution in $R/\pi^{ar}$, for some integer $a\geq 1$ which does not depend on $X$.

The proof of this theorem is non-trivial, but relatively elementary. After some preparation, it boils down to Hensel's lemma or, equivalently, Newton's method for solving equations.
However, it seems to me that there should be an extremely conceptual way to prove this theorem, based on general deformation theory such as the one developed by Illusie (1971). Namely, obstructions to lifting $x$ are encoded by various cohomology classes, and knowing that it lifts enough should be enough to see — on the nose — that these obstructions vanish.

The second one is about cohomology of Artin stacks. Y. Laszlo and M. Olsson (2006) established the 6-operations package for $\ell$-adic sheaves on Artin stacks, but their statements have some hypotheses which look a bit unnatural. For example, the base scheme $S$ needs to be such that all schemes of finite type have finite $\ell$-cohomological dimension — this forbids $S=\operatorname{Spec}(\mathbf R)$. More recently, Y. Liu and W. Zheng developed a more general theory, apparently devoid of restrictive hypotheses, and their work builds on $\infty$-categories, more precisely, a stable $\infty$-category enhancing the unbounded derived category. On page 7 of their paper, they carefully explain why derived categories are unsufficient to take care of the necessary descent datas, but I can't say I understand their explanation yet...

The last one is about the general formalism of 6-operations. While it is clear what these 6 operations should reflect (direct and inverse images; proper direct images and extraordinary inverse images; tensor product, internal hom), the list of the properties they should satisfy is not clear at all (to me). In the case of coherent sheaves, there is such a formulaire, written by A. Grothendieck itself on the occasion of a talk in 1983, but it is quite informal, and not at all a general formalism. Recently, F. Hörmann proposed such a formalism  (2015–2017), based on Grothendieck's theory of derivators.

Now, how should the average mathematician embark in learning these theories?

Who will write the analogue of Godement's book for the homological algebra of the 21st century? Can we hope that it be shorter than 3000 pages?

I hope to find, some day, some answer to these questions, and that they will allow to hear with satisfaction the words of Hilbert: Wir müssen wissen, wir werden wissen.