Freedom Math Dance

The determinant of transvections (an update)

2025-11-30T15:44:00.009+01:00

In the previous post, I explained how to prove a general version of the classic fact that transvections have determinant 1. Recall that transvections in an $R$-module $M$ are linear maps of the form $x\mapsto x+f(x)v$, where $f$ is a linear form on $M$ and $v\in M$ is a vector such that $f(v)=0$. To be able to talk of determinants, I assumed that $M$ had a finite basis, so that linear maps correspond to matrices and the determinant of the matrix serves as a definition for the determinant of the corresponding linear map. (It does not depend on the choice of a basis.)

That proof had three steps:

When $R$ is a field $F$ (and $\dim(M)\geq 2$), one can use linear algebra to get a basis $(e_1,\dots,e_n)$ of $M$ such that $v=e_1$ and $f=e_2^*$. Then the matrix of the transvection is triangular with ones on the diagonal, so its determinant is $1$.
When $R$ is a domain, one can consider its field of fractions $F$ and the base change to $F$ of the given transvection. By the case of fields, its determinant is $1$, and since $R$ is a subring of $F$, the determinant of the initial transvection is $1$ as well.
In general, one observes that our transvection is deduced, by base change, from the “universal” transvection, which lives on the ring $R=\mathbf Z[f_1,\dots,f_n,v_1,\dots,v_n]/\langle\sum f_i v_i\rangle$, and one proves that this ring is an integral domain when $n\geq 2$.

But one can do more with less effort!

Indeed, the linear maps given by an expression $x\mapsto x+f(x)v$ are also interesting when $f(v)=-2$: they give symmetries in the direction given by $v$ with respect to the hyperplane defined by $f$, and, at least over fields, their determinant is $-1$. This suggests the following, more general, result.

Theorem. — Let $M$ be an $R$-module with a finite basis, let $f\in M^*$ be a linear form and let $v\in M$. The linear map $u$ given by $x\mapsto x + f(x)v$ has determinant $1+f(v)$.

To prove this result, we will first prove the case where $R$ is a field $F$. Then, we have two subcases, according to $f(v)=0$ or $f(v)\neq 0$.

If $f(v)=0$, then $u$ is a transvection and its determinant is $1=1+f(v)$, as shown in the previous post.
Otherwise, $f(v)\neq 0$ and there is a basis $(e_1,\dots,e_n)$ of $M$ such that $(e_1,\dots,e_{n-1})$ is a basis of $\ker(f)$ and $e_n=v$. Then the matrix of $u$ is diagonal, with entries $(1,\dots,1,1+f(v))$, and its determinant is $1+f(v)$, as claimed.

The case where $R$ is a domain is proved as before, by embedding $R$ into its field of fractions $F$.

Finally, we reduce the general case to the universal one, for which the ring is simply $R=\mathbf Z[f_1,\dots,f_n,v_1,\dots,v_n]$, without any need to quotient by the ideal $\langle \sum f_i v_i\rangle$. That ring is an integral domain, hence the theorem. The gain is that we no longer need to prove that the polynomial $\sum f_i v_i$ is irreducible.
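
As a sanity check of the formula $\det(u)=1+f(v)$ over $R=\mathbf Z$, here is a minimal Python sketch. The helper names (`det`, `rank_one_update`, `pairing`) are mine and do not come from any library, and the determinant is computed naively by the Leibniz formula, so this is only meant for small matrices.

```python
from itertools import permutations, product

def det(m):
    """Determinant of a small square integer matrix, by the Leibniz formula."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        # Signature of the permutation, from its number of inversions.
        inversions = sum(
            1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j]
        )
        term = -1 if inversions % 2 else 1
        for i in range(n):
            term *= m[i][perm[i]]
        total += term
    return total

def rank_one_update(f, v):
    """Matrix of x -> x + f(x) v in the standard basis of Z^n."""
    n = len(f)
    return [[(1 if i == j else 0) + v[i] * f[j] for j in range(n)] for i in range(n)]

def pairing(f, v):
    """The scalar f(v)."""
    return sum(a * b for a, b in zip(f, v))

# Exhaustive check for n = 3 with all coordinates in {-1, 0, 1}.
for f in product((-1, 0, 1), repeat=3):
    for v in product((-1, 0, 1), repeat=3):
        assert det(rank_one_update(f, v)) == 1 + pairing(f, v)
```

Of course, such a check proves nothing for a general ring; it only guards against a sign or index mistake in the statement.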

The determinant of transvections

2025-11-06T23:22:00.007+01:00

A transvection in a $K$-vector space $V$ is a linear map $T(f,v)$ of the form $x\mapsto x + f(x) v$, where $f\in V^*$ is a linear form and $v\in V$ is a vector such that $f(v)=0$. It is known that such a linear map is invertible, with inverse given by $f$ and $-v$. More precisely, one has $T(f,0)=\mathrm{id}$ and $T(f,v+w)=T(f,v)\circ T(f,w)$. In finite dimension, these maps have determinant $1$ and it is known that they generate the special linear group $\mathrm{SL}(V)$, the group of linear automorphisms of determinant $1$.

When I started formalizing in Lean the theory of the special linear group, the question arose of the appropriate generality for such results. In particular, what happens when one replaces the field $K$ with a ring $R$ and the $K$-vector space $V$ with an $R$-module?

The definition of the transvections still makes sense. If $f\colon M\to R$ is $R$-linear and $v\in M$, then the map $T(f,v)$ defined by $x\mapsto x+f(x) v$ is linear. When $f=0$ or $v=0$, this map is the identity. Moreover, if $v,w\in M$ are such that $f(v)=f(w)=0$, then $$ T(f,v) \circ T(f,w) (x)=T(f,v)(x+f(x)w)=x+f(x)w + f(x+f(x)w)v = x + f(x)(v+w),$$ so that $T(f,v)\circ T(f,w) = T(f, v+w)$. This implies that the map $v\mapsto T(f,v)$ is a morphism from the additive group of $\ker(f)$ to the multiplicative monoid $\mathrm{End}(M)$ of $R$-endomorphisms of $M$. In particular, it lands in the group $\mathrm{GL}(M)$ of linear automorphisms of $M$.
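
The relation $T(f,v)\circ T(f,w)=T(f,v+w)$ can be checked on matrices. Here is a small Python sketch over $\mathbf Z^3$; the linear form and the two vectors are an arbitrary example of mine, chosen so that $f(v)=f(w)=0$.

```python
def transvection(f, v):
    """Matrix of T(f, v): x -> x + f(x) v in the standard basis."""
    n = len(f)
    return [[(1 if i == j else 0) + v[i] * f[j] for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Product of two square matrices of the same size."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

f = [1, -2, 1]   # the linear form f(x) = x1 - 2*x2 + x3
v = [2, 1, 0]    # f(v) = 2 - 2 + 0 = 0
w = [1, 1, 1]    # f(w) = 1 - 2 + 1 = 0
v_plus_w = [a + b for a, b in zip(v, w)]
minus_v = [-a for a in v]
identity = transvection(f, [0, 0, 0])

# Composition of transvections with the same linear form adds the vectors.
composite = matmul(transvection(f, v), transvection(f, w))
assert composite == transvection(f, v_plus_w)
# In particular, T(f, -v) is the inverse of T(f, v).
assert matmul(transvection(f, v), transvection(f, minus_v)) == identity
```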

Now assume that $M$ is a free $R$-module of finite rank, that is, has a finite basis $(b_1,\dots,b_n)$. This allows one to define the determinant of an element of $\mathrm{End}(M)$, as the determinant of the associated matrix in the given basis, and one can ask about the determinant of these transvections. The following is not so surprising, but I couldn't find it in the literature.

Theorem. — For every $f\in M^*$ and $v\in M$ such that $f(v)=0$, one has $\det(T(f,v))=1$.

How can one prove such a result? The special case where $R$ is an integral domain is simpler, and is actually a step in the proof.

Proposition. — The theorem holds when $R$ is an integral domain.

Proof. — I can give two proofs of that result. The simplest one consists in considering the field of fractions $K$ of the integral domain $R$ and remarking that the matrix of the transvection $T(f,v)$ is also the matrix of a transvection in the $K$-vector space $V=K\otimes_R M$. In fact, if one identifies $M$ with $R^n$ by way of the basis $b$, the linear form $f$ is a row vector in $R^n$, the vector $v$ is a column vector in $R^n$, satisfying $\sum f_j v_j=0$, the vector space $V$ is simply $K^n$, so that the matrix of $T(f,v)$ is also the matrix of a transvection in $V$. Its determinant is $1$, because of the case of fields, so that the determinant of $T(f,v)$ is $1$.

To be complete, let us explain the case of fields. If $f=0$ or $v=0$, the map $T(f,v)$ is the identity. Otherwise, one can choose an appropriate basis $(b_1,\dots,b_n)$ of $V$ such that $b_1=v$, $(b_1,\dots,b_{n-1})$ is a basis of $\ker(f)$ and $b_n$ is chosen such that $f(b_n)=1$. In that basis, the vector $v$ has coordinates $(1,0,\dots,0)$; the linear form $f$ is given by $f(x_1,\dots,x_n)=x_n$ and the linear map $T(f,v)$ is given by $(x_1,\dots,x_n)\mapsto (x_1+x_n,x_2,\dots,x_n)$. Its matrix is upper triangular, with only ones on the diagonal, hence its determinant is $1$.

Here is a second proof, more elaborate, but quite amusing. Since the determinant is multiplicative, the map $v\mapsto \det(T(f,v))$ is a group morphism from the submodule $\ker(f)$ to $R^\times$. To upgrade this result, one may consider an indeterminate $X$ and consider the transvection $T(f, Xv)$ of the $R[X]$-module $R[X]\otimes M$ deduced from $M$ by base change. This replaces the ring $R$ with the ring $R[X]$ and the determinant $D(X)=\det(T(f,Xv))$ is an invertible element of $R[X]$. A classical theorem describes such polynomials: their constant coefficient is invertible, and all other coefficients have to be nilpotent. Since $R$ is an integral domain, it has no nonzero nilpotent element, so that $D(X)$ is constant, invertible. One has $D(0)=\det(T(f,0))=\det(\mathrm{id})=1$, hence $D(X)=1$. Setting $X=1$, one gets $\det(T(f,v))=1$.
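
To see why the absence of nilpotent elements matters in this argument, one can look at $(\mathbf Z/4\mathbf Z)[X]$, where $2$ is nilpotent and the non-constant polynomial $1+2X$ is invertible. The following Python sketch checks this; the function `poly_mul_mod` is a helper of mine, with polynomials encoded as lists of coefficients, constant term first.

```python
def poly_mul_mod(p, q, m):
    """Product of two polynomials with coefficients in Z/mZ.

    Polynomials are lists of coefficients, constant term first.
    """
    result = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            result[i + j] = (result[i + j] + a * b) % m
    # Drop trailing zero coefficients, keeping at least the constant term.
    while len(result) > 1 and result[-1] == 0:
        result.pop()
    return result

# In (Z/4Z)[X], one has (1 + 2X)^2 = 1 + 4X + 4X^2 = 1,
# so that 1 + 2X is its own inverse.
unit = [1, 2]
square = poly_mul_mod(unit, unit, 4)
assert square == [1]
```

Over an integral domain, the same computation would leave a term of degree $2$, and no such non-constant unit exists.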

However, this second proof does not seem to generalize to the general case. On the other hand, a standard trick in algebra allows us to reduce to this case, namely, by reducing to the generic case. To explain this more easily, let us consider that $M=R^n$, where the linear form $f$ has coordinates $(f_1,\dots,f_n)$ and the vector $v$ has coordinates $(v_1,\dots,v_n)$. The relation $f(v)=0$ means that $\sum_{i=1}^n f_i v_i=0$, leading to the consideration of the ring $S=\mathbf Z[X_1,\dots,X_n,Y_1,\dots,Y_n]/\langle\sum X_i Y_i\rangle$ of polynomials with integer coefficients in $2n$ indeterminates quotiented out by the relation $\sum X_i Y_i=0$.

Proposition. — For $n\geq 2$, the ring $S=\mathbf Z[X_1,\dots,X_n,Y_1,\dots,Y_n]/\langle\sum X_i Y_i\rangle$ is an integral domain.

Proof. — Since $\mathbf Z$ is a unique factorization domain (UFD), a theorem of Gauss asserts that the polynomial ring $\mathbf Z[X_1,\dots,X_n,Y_1,\dots,Y_n]$ is again a UFD. Consequently, we have to prove that the polynomial $\sum_{i=1}^n X_i Y_i$ is irreducible if $n\geq 2$. Let us consider this ring as a ring of polynomials in the variable $Y_n$ with coefficients in the ring $S'=\mathbf Z[X_1,\dots,X_n,Y_1,\dots,Y_{n-1}]$ in $2n-1$ indeterminates. The proof of Gauss's theorem also establishes that the irreducible elements of $S'[Y_n]$ are either the irreducible elements of $S'$, or the elements of $S'[Y_n]$ which are irreducible as polynomials in $K'[Y_n]$ ($K'$ being the field of fractions of $S'$) and such that the gcd of their coefficients is $1$. Our polynomial of interest is $X_n\cdot Y_n + \sum_{i=1}^{n-1} X_i Y_i$. It has degree $1$ in $Y_n$, hence is irreducible as a polynomial in $K'[Y_n]$. The gcd of its coefficients is $\gcd(X_n,\sum_{i=1}^{n-1}X_i Y_i)$. It divides $X_n$, so is either $\pm1$ or $\pm X_n$, and since $X_n$ doesn't divide $\sum_{i=1}^{n-1}X_i Y_i $, the gcd is equal to $1$, as claimed.

We can now conclude the proof of the theorem when $n\geq 2$. Indeed, the case of integral domains implies that the determinant of the generic transvection $T(X,Y)$ in $S^n$ is equal to $1$. Moreover, there is a (unique) ring morphism $\phi\colon \mathbf Z[X_1,\dots,X_n,Y_1,\dots,Y_n]/\langle \sum X_iY_i\rangle\to R$ that maps $X_i$ to $f_i$ and $Y_i$ to $v_i$ for every $i$. Applying $\phi$ to the coefficients of the matrix $T(X,Y)$, one gets the matrix $T(f,v)$, so that $\det(T(f,v))=\phi(\det(T(X,Y)))=\phi(1)=1$.

When $n\leq 1$, the module $M$ is generated by one element $e$, and the condition $f(v)=0$ implies that $f(x)v=0$ for every $x\in M$, so that $T(f,v)=\mathrm{id}$. Consequently, $\det(T(f,v))=1$.
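
The point of the theorem is that it holds over rings with zero divisors, where the field of fractions is not available. As an illustration, the following Python sketch enumerates all transvections of $(\mathbf Z/6\mathbf Z)^2$ and checks that their determinant is $1$ modulo $6$. The function names are mine, and the brute-force enumeration is only reasonable for small moduli and small $n$.

```python
from itertools import product

def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total

def check_transvections(modulus, n):
    """Check det T(f, v) = 1 for all f, v in (Z/modulus)^n with f(v) = 0.

    Returns the number of pairs (f, v) that were checked.
    """
    residues = range(modulus)
    count = 0
    for f in product(residues, repeat=n):
        for v in product(residues, repeat=n):
            if sum(a * b for a, b in zip(f, v)) % modulus != 0:
                continue  # f(v) is nonzero: not a transvection
            matrix = [
                [(1 if i == j else 0) + v[i] * f[j] for j in range(n)]
                for i in range(n)
            ]
            assert det(matrix) % modulus == 1
            count += 1
    return count

transvection_count = check_transvections(6, 2)
```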

The two adjunctions of the preimage

2025-09-06T16:40:00.002+02:00

Sometimes in mathematics, you are told about very elementary things of which you hadn't even thought.

I was well aware of some “duality” between image and preimage, but I just learned from Anatole Dedecker (who learned it from Patrick Massot) about another “duality” between preimage and some other notion. Moreover, it appears that this new notion can be used for making slightly more natural a proof in general topology!

Here, “duality” is taken in an informal meaning, the correct word is “adjunction”, in the sense of category theory, and I will try to explain that.

1. Image and preimage

So consider a map $f\colon X \to Y$ between two sets. It induces two other maps relating the sets $\mathcal P(X)$ and $\mathcal P(Y)$ of subsets of $X$ and $Y$. Note that the inclusion relation between subsets allows us to view these two sets $\mathcal P(X)$ and $\mathcal P(Y)$ as ordered sets.

First, we have the direct image operation $f_{*}$, that maps a subset $A\subseteq X$ to the subset $f_{*}(A)$ of $Y$, the set of all images $f(a)\in Y$, for $a\in A$. The classical notation would be $f(A)$, but it is ambiguous in the case where a subset $A$ of $X$ is also an element of $X$, and introducing a specific notation will help to clarify some statements later on. This map $f_{*}\colon \mathcal P(X) \to \mathcal P(Y)$ is increasing: for $A$ and $A'\in\mathcal P(X)$ such that $A\subseteq A'$, one has $f_{*}(A) \subseteq f_{*}(A')$.

Then we have the preimage operation $f^{*}$, that maps a subset $B\subseteq Y$ to the subset $f^{*}(B)$ of $X$ consisting of all preimages of elements of $B$, namely all $a\in X$ such that $f(a) \in B$. The classical notation is rather $f^{-1}(B)$, but it has the same ambiguity as the direct image. Bizarrely, Bourbaki found the need to invent yet another notation for that one, and they put the symbol “$-1$” on top of the letter $f$. The notation $f ^{*}$ is chosen by symmetry with the direct image $f_{*}$. Again, the map $f^{*}\colon \mathcal P(Y) \to\mathcal P(X)$ is increasing: for $B$ and $B'\in\mathcal P(Y)$ such that $B\subseteq B'$, one has $f^{*}(B) \subseteq f^{*}(B')$.

Finally, there is a compatibility between these two operations $f_{*}$ and $f ^{*}$: for $A\in\mathcal P(X)$ and $B\in\mathcal P(Y)$, one has $f_{*}(A) \subseteq B$ if and only if $A \subseteq f^{*}(B)$. Indeed, both of these expressions mean that $f(a) \in B$ for all $a\in A$. We summarize this property by saying that the operation $f_{*}$ is left adjoint to the operation $f ^{*}$, or that the operation $f^{*}$ is right adjoint to the operation $f_{*}$.
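
For finite sets, this equivalence can be checked exhaustively. Here is a small Python sketch; the map `f` is an arbitrary example of mine, encoded as a dictionary, and the helper names `subsets`, `image` and `preimage` are not from any library.

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [
        frozenset(c)
        for c in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1)
        )
    ]

def image(f, a):
    """Direct image f_*(A); the map f is given as a dict from X to Y."""
    return frozenset(f[x] for x in a)

def preimage(f, b):
    """Preimage f^*(B)."""
    return frozenset(x for x in f if f[x] in b)

X = {0, 1, 2, 3}
Y = {"p", "q", "r"}
f = {0: "p", 1: "p", 2: "q", 3: "q"}   # neither injective nor surjective

# f_*(A) ⊆ B  if and only if  A ⊆ f^*(B), for all A and B.
adjunction_holds = all(
    (image(f, a) <= b) == (a <= preimage(f, b))
    for a in subsets(X)
    for b in subsets(Y)
)
assert adjunction_holds
```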

This terminology comes from category theory, in which adjunctions of functors play an important role since the paper of Daniel Kan (1958), Adjoint functors.

In our case, the categories are just the ordered sets $\mathcal P(X)$ and $\mathcal P(Y)$, with the corresponding sets as sets of objects, and where the set of arrows from $A$ to $A'\in\mathcal P(X)$ is a singleton when $A\subseteq A'$, and is empty otherwise. The book of Emily Riehl (2016), Category Theory in Context, is a nice introduction to this topic, with illuminating elementary examples. The property that the operations $f_{*}$ and $f^{*}$ are increasing means that they are *functors* between these categories, and the equivalence $f_{*}(A) \subseteq B \Leftrightarrow A \subseteq f^{*}(B)$ is exactly the category-theoretical adjunction.

In this case, an adjunction pair is also called a Galois connection. There, the terminology comes from Galois theory: the two ordered sets are the set of subextensions of a Galois extension $K\to L$ and the set of subgroups of the Galois group $\operatorname{Gal}(L/K)$; the maps are decreasing and correspond to mapping a subextension $E$ of $L$ to the subgroup $\operatorname{Gal}(L/E)$ of $\operatorname{Gal}(L/K)$, and a subgroup $H\subseteq \operatorname{Gal}(L/K)$ to the fixed field $L^H$. In Galois theory, these two maps are even bijective.

2. The adjoint functor theorem

While, as MacLane wrote, “adjoint functors arise everywhere”, not every functor can be part of an adjunction. Indeed, if a functor $F$ is left adjoint to a functor $G$, then $F$ preserves colimits and $G$ preserves limits.

Category theory considers limits and colimits of arbitrary diagrams, but in the restricted setting of ordered sets, where there can be at most one arrow from one object to another, diagrams boil down to subsets of objects, limits correspond to infimums (greatest lower bound) and colimits to supremums (least upper bound), which may exist, or not, in particular ordered sets. In our even more restricted case of the set $\mathcal P(X)$ of subsets of a given set $X$, infimum corresponds to intersection, supremum to union, and we have $f_{*}(\bigcup A_i) = \bigcup f_{*}(A_i)$ for every family $(A_i)$ of subsets of $X$, and $f^{*}(\bigcap B_i) = \bigcap f^{*}(B_i)$ for every family $(B_i)$ of subsets of $Y$.

There is an abstract theorem in category theory, the “general adjoint functor theorem”, that says that these properties are essentially sufficient for a functor $F$ to be a left adjoint to some functor $G$, or for a functor $G$ to be a right adjoint to some functor $F$. One has to be more careful for the actual statement, but this is the idea.

For an increasing map $G\colon T \to S$ between ordered sets $S$ and $T$, the existence of a left adjoint $F$ can be understood as follows: for $s\in S$ and $t\in T$, one should have $F(s)\leq t$ if and only if $s\leq G(t)$; consequently, it suffices to take for $F(s)$ the infimum, assuming it exists, of all $t$ such that $s\leq G(t)$. Dually, the right adjoint $G$ to a functor $F$ would map $t$ to the supremum, assuming it exists, of all $s$ such that $F(s)\leq t$.

In the case of the image $f_{*}\colon \mathcal P(X)\to \mathcal P(Y)$, this rule defines the right adjoint as mapping $B \in\mathcal P(Y)$ to the union of all subsets $A\in\mathcal P(X)$ such that $f _{*}(A) \subseteq B$. This is exactly the preimage of $B$!

Conversely, in the case of the preimage $f^{*}\colon \mathcal P(Y)\to \mathcal P(X)$, this procedure defines the left adjoint as mapping $A \in\mathcal P(X)$ to the intersection of all subsets $B$ such that $A \subseteq f^{*}(B)$. Again, this is just the image $f _{*}(A)$ of $A$, but I find it slightly more difficult to prove without using that we already know this image and the already known adjunction between $f _{*}$ and $f ^{*}$.
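
The two formulas of this section can be run as they stand on finite sets. The sketch below, in Python, computes the union and the intersection literally, by enumerating all subsets, and compares the results with the preimage and the image. The map `f` and the helper names are assumptions of mine, and the enumeration is exponential, so this is only an illustration.

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [
        frozenset(c)
        for c in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1)
        )
    ]

def image(f, a):
    """Direct image f_*(A); the map f is given as a dict from X to Y."""
    return frozenset(f[x] for x in a)

def preimage(f, b):
    """Preimage f^*(B)."""
    return frozenset(x for x in f if f[x] in b)

def union_formula(f, X, b):
    """Right adjoint of f_* at B: union of all A ⊆ X with f_*(A) ⊆ B."""
    result = frozenset()
    for a in subsets(X):
        if image(f, a) <= b:
            result |= a
    return result

def intersection_formula(f, Y, a):
    """Left adjoint of f^* at A: intersection of all B ⊆ Y with A ⊆ f^*(B)."""
    result = frozenset(Y)
    for b in subsets(Y):
        if a <= preimage(f, b):
            result &= b
    return result

X = {0, 1, 2, 3}
Y = {"p", "q", "r"}
f = {0: "p", 1: "p", 2: "q", 3: "q"}

formulas_agree = (
    all(union_formula(f, X, b) == preimage(f, b) for b in subsets(Y))
    and all(intersection_formula(f, Y, a) == image(f, a) for a in subsets(X))
)
assert formulas_agree
```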

3. The other adjunction

We have seen that preimages respect intersections. As a matter of fact, they also respect unions: $f ^{*}(\bigcup B_i)= \bigcup f ^{*}(B_i)$. Given the adjoint functor theorem, this implies that there is an increasing map $f_! \colon \mathcal P(X) \to \mathcal P(Y)$ which is a right adjoint to $f ^{*}$. What is this operation?

The adjoint functor theorem gives a way to compute it: for $A\in\mathcal P(X)$, the set $f_!(A)\in\mathcal P(Y)$ is the union of all subsets $B\in\mathcal P(Y)$ such that $f^{*}(B) \subseteq A$. It suffices to consider such sets $B$ which are singletons $\{b\}$ and we get that a point $b\in Y$ belongs to $f_!(A)$ if and only if all preimages of $b$ belong to $A$.

Here are two more ways to get a grip on this new adjunction.

Note that a point $b\in Y$ belongs to $f_{*}(A)$ if and only if there exists $a\in A$ such that $b = f (a)$, which means that there exists $a\in A$ in the preimage $f^{*}(\{b\})$, relating $f_{*}$ with the existential quantifier. Similarly, a point $b\in Y$ belongs to $f_! (A)$ if and only if for every $a\in f^{*}(\{b\})$, one has $a\in A$, relating $f_!$ with the universal quantifier.

The other way comes by taking complements: a point $b$ does not belong to $f_!(A)$ if and only if it has a preimage that does not belong to $A$. In other words, $f_!(A) = \complement f_{*}(\complement A)$. This leads to considering the complement map from $\mathcal P(X)$ to itself as an order-reversing involution, and similarly on $\mathcal P(Y)$, and observing that they commute with preimage, in the sense that $f^{*}(\complement B) = \complement f^{*}(B)$ for all $B\subseteq Y$. Consequently, this operation transfers the left adjoint $f _{*}$ of $f ^{*}$ to a right adjoint, and conversely, which is exactly what we had observed.
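
Here is a Python sketch of the operation $f_!$ on finite sets, using the description with the universal quantifier, together with a check of the adjunction $f^{*}(B)\subseteq A \Leftrightarrow B\subseteq f_!(A)$ and of the formula with complements. The name `forall_image` and the example map are mine.

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [
        frozenset(c)
        for c in chain.from_iterable(
            combinations(items, r) for r in range(len(items) + 1)
        )
    ]

def image(f, a):
    """Direct image f_*(A); the map f is given as a dict from X to Y."""
    return frozenset(f[x] for x in a)

def preimage(f, b):
    """Preimage f^*(B)."""
    return frozenset(x for x in f if f[x] in b)

def forall_image(f, Y, a):
    """f_!(A): the points of Y all of whose preimages belong to A."""
    return frozenset(y for y in Y if all(x in a for x in f if f[x] == y))

X = {0, 1, 2, 3}
Y = {"p", "q", "r"}
f = {0: "p", 1: "p", 2: "q", 3: "q"}   # the point "r" has no preimage

# f^*(B) ⊆ A  if and only if  B ⊆ f_!(A).
right_adjunction_holds = all(
    (preimage(f, b) <= a) == (b <= forall_image(f, Y, a))
    for a in subsets(X)
    for b in subsets(Y)
)
# f_!(A) is the complement of the image of the complement of A.
complement_formula_holds = all(
    forall_image(f, Y, a) == frozenset(Y) - image(f, frozenset(X) - a)
    for a in subsets(X)
)
assert right_adjunction_holds and complement_formula_holds
```

Note that a point without any preimage, such as `"r"` here, belongs to $f_!(A)$ for every $A$, even for the empty set.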

4. An application in general topology

As an application, this adjunction can be used in topology to characterize open or closed maps. By definition, a map $f \colon X\to Y$ between topological spaces is open if it maps an open subset to an open subset, and it is closed if it maps a closed subset to a closed subset.

The definition of $f_!$ using complement, and the fact that a set is closed if and only if its complement is open implies the following lemma:

Lemma. — A map $f\colon X \to Y$ is closed (resp. open) if and only if for every open (resp. closed) subset $U\subseteq X$, the set $f_! (U)$ is open (resp. closed).

It also allows to give a natural proof of the classical characterization of closed maps:

Proposition. — Let $f\colon X \to Y$ be a map between topological spaces. The following properties are equivalent:

(1) The map $f$ is closed;
(2) For any subset $B$ of $Y$, the filter of neighborhoods of $f^{*}(B)$ is coarser than the preimage of the filter of neighborhoods of $B$;
(3) For any subset $B$ of $Y$ and any neighborhood $U$ of $f^{*}(B)$, there exists a neighborhood $V$ of $B$ such that $f^{*}(V)\subseteq U$;
(4) For any point $b\in Y$, the filter of neighborhoods of $f^{*}(\{b\})$ is coarser than the preimage of the filter of neighborhoods of $b$;
(5) For any point $b\in Y$ and any neighborhood $U$ of $f^{*}(\{b\})$, there exists a neighborhood $V$ of $b$ such that $f^{*}(V) \subseteq U$.

Given the definitions of the preimage of a filter and the comparison relation on filters,
the assertions (2) and (3) are equivalent, as well as the assertions (4) and (5).

Obviously, (3) implies (5).

Let us assume (1), that $f$ is closed, and let us prove (3). Let $B$ be a subset of $Y$ and let $U$ be a neighborhood of $f^{*}B$ in $X$. By definition, there exists an open subset $U'$ of $X$ such that $f^{*}B \subseteq U' \subseteq U$. Taking adjunction, we get $B\subseteq f_! U' \subseteq f_! U$. Since $f$ is closed, the set $f_! U'$ is open, so that $f_! U$ is a neighborhood of $B$. It remains to prove that $f^{*}f_! U\subseteq U$. To prove this inclusion, we apply the adjunction $(f^{*}, f_!)$ once more, and see that it is equivalent to the obvious inclusion $f_! U \subseteq f_! U$.

Finally, let us assume (5) and let us prove that $f$ is closed. Let $U$ be an open subset of $X$ and let us prove that $f_! U$ is open in $Y$. It suffices to prove that for every $b\in f_! U$, the set $f_! U$ is a neighborhood of $b$. By the construction of $f_!$, the set $f^{*}(\{b\}) $ is contained in $U$ so that $U$ is a neighborhood of $f^{*}(\{b\})$. Applying (5), we get a neighborhood $V$ of $b$ in $Y$ such that $f^{*}V \subseteq U$. Applying the adjunction $(f^{*}, f_!)$, we get the inclusion $V \subseteq f_! U$. In particular, $f_! U$ is a neighborhood of $b$, as was to be shown.
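
The criterion used in this proof, namely that $f$ is closed if and only if $f_!$ maps open subsets of $X$ to open subsets of $Y$, can be tested on finite topological spaces. The Python sketch below does so for all maps between two small spaces; the two topologies are an arbitrary choice of mine, and the function names are not from any library.

```python
from itertools import product

X = (0, 1, 2)
Y = (0, 1)
# Two finite topologies, given by their lists of open subsets (my own choice).
OPEN_X = [frozenset(), frozenset({0}), frozenset({0, 1}), frozenset(X)]
OPEN_Y = [frozenset(), frozenset({0}), frozenset(Y)]

def image(f, a):
    """Direct image f_*(A); the map f is the tuple (f(0), f(1), f(2))."""
    return frozenset(f[x] for x in a)

def forall_image(f, a):
    """f_!(A): the points of Y all of whose preimages belong to A."""
    return frozenset(y for y in Y if all(x in a for x in X if f[x] == y))

def is_closed_map(f):
    """True if f maps every closed subset of X to a closed subset of Y."""
    closed_y = [frozenset(Y) - u for u in OPEN_Y]
    return all(image(f, frozenset(X) - u) in closed_y for u in OPEN_X)

def shriek_preserves_opens(f):
    """True if f_! maps every open subset of X to an open subset of Y."""
    return all(forall_image(f, u) in OPEN_Y for u in OPEN_X)

all_maps = list(product(Y, repeat=len(X)))
criterion_holds = all(
    is_closed_map(f) == shriek_preserves_opens(f) for f in all_maps
)
closed_maps = [f for f in all_maps if is_closed_map(f)]
assert criterion_holds
```

In this example, the closed subsets of $X$ are $\emptyset$, $\{2\}$, $\{1,2\}$ and $X$, the only closed point of $Y$ is $1$, and the closed maps are exactly those that map $2$ to $1$.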

The Krull dimension of the semiring of natural numbers is equal to 2

2025-07-15T01:16:00.003+02:00

Let $R$ be a ring. Its Krull dimension is the supremum of the lengths $n$ of chains $P_0\subsetneq P_1 \subsetneq\dots\subsetneq P_n$ of prime ideals of $R$. When $R$ is a field, the null ideal is the only prime ideal, and it is a maximal ideal so that its Krull dimension is zero. When $R$ is a principal ideal domain which is not a field, there are two kinds of prime ideals: the null ideal is prime (as in any domain), and the other prime ideals are the maximal ideals of $R$, generated by a prime element $p$. In particular, the Krull dimension of the ring of integers is equal to $1$.

It is classic that these concepts can be defined for semirings as well.

A semiring $R$ is a set endowed with a commutative and associative addition with a neutral element $0$, an associative multiplication with a neutral element $1$, such that multiplication distributes over addition: $(a+b)c=ac+bc$ and $c(a+b)=ca+cb$. When its multiplication is commutative, the semiring is said to be commutative.

Semirings $R$ have ideals: these are nonempty subsets $I$ which are stable under addition ($a+b\in I$ for $a,b\in I$), and stable under multiplication by any element of $R$: for general semirings, one has to distinguish between left, right, and two-sided ideals; for commutative semirings, the notions coincide.

An ideal $P$ of a semiring $R$ is said to be prime if $R\setminus P$ is a multiplicative subset; explicitly, $P\neq R$, and if $ab\in P$, then $a\in P$ or $b\in P$.

An ideal $P$ of a semiring $R$ is said to be maximal if $P\neq R$ and if there is no ideal $I$ such that $P\subsetneq I\subsetneq R$. A semiring $R$ is said to be local if it admits exactly one maximal ideal; this means that the set of non-invertible elements of $R$ is an ideal.

The amusing part comes from the classification of prime and maximal ideals of the semiring $\mathbf N$ of natural numbers, which I learned of via a Lean formalization project led by Junyan Xu.

Theorem.

The semiring $\mathbf N$ is local; its maximal ideal is the set $\mathbf N\setminus\{1\}$.
The null ideal is a prime ideal.
The other prime ideals are the sets $p\mathbf N$, for all prime numbers $p$.

In particular, we have chains $\langle 0\rangle \subsetneq \langle p\rangle \subsetneq \mathbf N\setminus\{1\}$ of prime ideals, hence the result stated in the title of this post:

Corollary. — The Krull dimension of the semiring of natural numbers is equal to $2$.

Proof.

The element $1$ is the only unit of $\mathbf N$, and $\mathbf N\setminus\{1\}$ is obviously an ideal, necessarily its unique maximal ideal.
The null ideal is a prime ideal, as in any semiring which is a domain.
Let now $P$ be a nonzero prime ideal of $\mathbf N$ and let $p$ be the smallest nonzero element of $P$. Then $p\neq 1$ (otherwise, $P=\mathbf N$, which is not prime). The hypothesis that $P$ is prime implies that one of the prime factors of $p$ belongs to $P$; by the choice of $p$, this must be $p$ itself, so that $p$ is a prime number. Then $P$ contains the set $p\mathbf N $ of multiples of $p$, which is a prime ideal of $\mathbf N$. Let us assume that $p\mathbf N\subsetneq P$ and let $n\in P\setminus p\mathbf N$. By assumption, $p$ does not divide $n$, so that these two integers are coprime. By the following proposition, $P$ contains every integer at least equal to $(p-1)(n-1)$; in particular, it contains both a power of $2$ and a power of $3$; since $P$ is prime, it contains $2$ and $3$, and it contains any integer at least $(2-1)(3-1)=2$, hence $P=\mathbf N\setminus\{1\}$. This concludes the proof.

Proposition. — Let $a$ and $b$ be nonzero coprime natural numbers. For any integer $n\geq (a-1)(b-1)$, there are natural numbers $u$ and $v$ such that $n=au+bv$.

Proof. — Since $a$ and $b$ are coprime, there are integers $u$ and $v$ such that $n=au+bv$. Replacing $u$ by $u+b$ and $v$ by $v-a$, we may assume that $0\leq u$. Replacing $u$ by $u-b$ and $v$ by $v+a$, we may assume that $u < b$. Then \[ bv = n - au \geq (a-1)(b-1)-a(b-1)= -(b-1). \] This implies $v\geq0$, so that $u$ and $v$ are natural numbers.

On the other hand, not all natural numbers $< (a-1)(b-1)$ can be written as such a sum. For example, $ab-a-b$ can't. Indeed, if $ab-a-b=au+bv$, hence $ab=a(u+1)+b(v+1)$, then $b$ divides $a(u+1)$, hence $b$ divides $u+1$, and similarly $a$ divides $v+1$. Then $ab$ is the sum of two nonzero multiples of $ab$, a contradiction. The precise distribution of the natural numbers $< (a-1)(b-1)$ which can be written as $au+bv$, for some natural numbers $u$ and $v$, is complicated, but at least one result is known: such an integer $n$ can be written in this form if and only if $ab-a-b-n$ cannot! One direction is clear: if $n$ and $ab-a-b-n$ can both be written in this form, then so can their sum, which is $ab-a-b$, a contradiction. On the other hand, let $n$ be an integer that cannot be written in this form, and write it as $n=au+bv$, for some integers $u$ and $v$, with $0\leq u< b$. By assumption, $v<0$, hence $v\leq -1$. Then \[ ab-a-b-n=ab-a-b-au-bv=a(b-1-u)+b(-v-1).\] We see that $b-1-u\geq 0$ and $-v-1\geq 0$, which shows that $ab-a-b-n$ can be written in the desired form.
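
These three facts (every integer $\geq (a-1)(b-1)$ is of the form $au+bv$, the integer $ab-a-b$ is not, and the symmetry $n\leftrightarrow ab-a-b-n$) are easy to test numerically. Here is a brute-force Python sketch for small coprime pairs; the function names and the range of pairs are choices of mine.

```python
from math import gcd

def representable(n, a, b):
    """True if n = a*u + b*v for some natural numbers u and v (0 allowed)."""
    return n >= 0 and any((n - a * u) % b == 0 for u in range(n // a + 1))

def check_pair(a, b):
    """Check the three statements above for a coprime pair (a, b)."""
    frobenius = a * b - a - b   # the integer shown above not to be representable
    bound = (a - 1) * (b - 1)   # equal to frobenius + 1
    # Checking a window of length a*b above the bound is an arbitrary cutoff.
    large_ok = all(representable(n, a, b) for n in range(bound, bound + a * b))
    symmetric = all(
        representable(n, a, b) != representable(frobenius - n, a, b)
        for n in range(frobenius + 1)
    )
    return large_ok and not representable(frobenius, a, b) and symmetric

pairs = [(a, b) for a in range(2, 13) for b in range(2, 13) if gcd(a, b) == 1]
all_pairs_ok = all(check_pair(a, b) for a, b in pairs)
assert all_pairs_ok
```

For instance, with $a=3$ and $b=5$, the integer $ab-a-b=7$ is not of the form $3u+5v$, while $8=3+5$ and every larger integer is.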

Another question in the style of the proposition, still open, was raised by Frobenius: consider mutually coprime integers $a_1,\dots,a_r$; then any large enough integer $n$ can be written as $n=a_1u_1+\dots+a_ru_r$, for some natural numbers $u_1,\dots,u_r$, but when $r\geq 3$, there is no known formula for the largest natural number that cannot be written in this form. The case $r=2$ that we discussed here was due to Sylvester (1884).

Autoformalization of mathematical theorems? No shit!

2025-07-02T16:50:00.002+02:00

I've been formalizing mathematical theorems in Lean for some years now, and one of the major blocks is the difficulty of formalizing elementary results that mathematicians do not take the time to even state. For example, a mathematician's integer can implicitly be a natural number on one line and a real number on the next, while proof assistants require that some “coercion” maps be introduced. Having three `1` in the same formula, of three different natures, leads to unexpected nightmares. It is therefore very important for the development of proof-assisted mathematics that some automation be provided by the type checking system, with as many obvious things as possible being taken care of by a combination of a fine-tuned API and efficient search algorithms.

Another reason for the difficulty of formalizing proofs is having to navigate in a gigantic mathematical library (Mathlib, for example, has 1.5 million lines of code) and the lack of a definitive search engine and of a reliable and complete documentation. This is not to blame my colleagues — all of this is difficult to write, time consuming, and our personal interests are driven in different directions.

This may explain that part of the field of formalization is driven by the “AI hype”, and — although this hype is so finely rebutted by Emily Bender and Alex Hanna in their last book, The AI Con, which I recommend heartily — that several colleagues use LLMs to create proofs, either in natural language, or in Lean's syntax. They say it is successful, but from the outside, it is really hard to tell whether it is really the case. My main impression is that these programs obliterate the time where our mind tries to form ideas, leading — for me — to another kind of stress. There are also serious arguments that the systematic cognitive friction is a necessity of well-formed thinking, and cannot be replaced by stochastic optimization. Moreover, these colleagues usually do not address the environmental cost of using such kinds of generative AI, and whether its output is worth that cost.

A few days ago, after I complained — one more time — that stochastic algorithms do not think, I was asked my opinion about the “trinity autoformalization system”. I have to confess I carefully avoid the AI news and hadn't heard about it. A first search led me to a 2022 paper Autoformalization with Large Language Models. Here is the definition of “autoformalization” from their paper:

Autoformalization refers to the task of automatically translating from natural language mathematics to a formal language.

In this case, the authors took as a benchmark statements of mathematical competition problems, and were happy to have their LLMs translate correctly roughly 25% of those problems. This might be a technical feat, but let me stay unimpressed for the moment: what mathematicians need is more than that, and is not elementary statements of math olympiad problems, but the most advanced mathematical concepts that the human mind is capable of grasping despite the fact that they involve extremely intricate and sometimes combinatorially involved constructions. I am not sure that we fully know what it means to understand these concepts, but I am certain that it doesn't reduce to being able to formally state a mathematical statement. (And what about those very fine colleagues who demonstrate everyday their mathematical depth while failing at stating precise statements?)

It appears my colleague from the Lean community meant another result. Trinity is a project from Morph Labs, which their web page describes as follows:

Morph Labs is building a cloud for superintelligence.
Infinibranch enables AI agents on Morph Cloud to snapshot, replicate, autoscale, test, debug, deploy, and reason about software at light speed.

This starts pretty badly. First of all, superintelligence is even less defined than intelligence is, and claims of developing superintelligence can only be suspicious, especially when no scientific claim justifies that AI software features any intelligence at all, and most of the AI experiments show a poor rate of success. The next sentence may be worse, since the speed of reasoning is not measured in m/s (nor in ft/hr), and the speed of electric waves in cables is smaller than the speed of light (although, I just learnt, it can be up to 99% of it!).

Their blog page on Trinity starts as follows:

tl;dr
We're excited to announce Trinity, an autoformalization system that represents a critical step toward verified superintelligence.

which combines the colloquial tl;dr (too long; didn't read) with the brutal claim that their autoformalization could be a step towards superintelligence. Later on the webpage, they boast about a nearly infinite supply of verified training environments which, in our finite world plagued by a brutal climate change, is quite a thing.

What these people claim to have done is the autoformalization of a 1962 theorem by N. G. de Bruijn (who, incidentally, is the father of Automath, one of the first proof assistants to be built), regarding the behaviour of the number $N(n)$ of integers $\leq n$ all of whose prime factors divide $n$, when $n$ grows to infinity. Answering a question of Erdős, de Bruijn proved that, on average, $N(n)$ is at most $n^{1+\varepsilon}$. On the other hand, the Trinity team puts the accent on a consequence of that result to the abc conjecture of Masser and Oesterlé. That conjecture asserts that for any $\varepsilon\gt0$, there are only finitely many triples $(a,b,c)$ of coprime natural numbers such that $a+b =c $ and such that the product $\operatorname{rad}(abc)$ of all prime numbers dividing $abc$ is at most $c^{1-\varepsilon}$. Implications of that mysterious elementary looking conjecture to number theoretical questions are manifold, from an asymptotic version of Fermat's Last Theorem to an effective version of the Mordell conjecture to Siegel zeroes of L-functions, etc. In effect, the theorem of de Bruijn implies that there are at most $\mathrm O(N^{2/3})$ triples $(a,b,c)$ of coprime natural numbers at most $N$ such that $a+b=c$ and $\operatorname{rad}(abc)\lt c^{1-\varepsilon}$. When this bound is compared to the $\approx \frac6{\pi^2}N^2$ triples of coprime integers $(a,b,c)$ such that $a+b=c\leq N$, this indicates that the (hoped to be finite) set of the abc conjecture is kind of sparse.

To be fair, I am not capable of judging the difficulty of what they achieved, and I presume that adjusting the myriad parameters of the neural networks involved is a difficult piece of software engineering. But what I can tell is whether they did what they say they did, and whether (this is more of an opinion) this has any relevance for the future of mathematical formalization. The answers, you can guess, will be no, definitely no.

I would first like to make a mathematical comment on the sentence

We obtain the first formalized theorem towards the abc conjecture, showing it is true almost always.

that can be read on the GitHub page for the project, which compares to the title of the (nice) expository paper by Jared Lichtman: “The abc conjecture is true almost always”, but adds the idea that this theorem would be a step towards the abc conjecture. I claim that it is nothing like that. Indeed, it is an observation of Oesterlé (see Szpiro's paper, page 12) that the $\varepsilon=0$-variant of the abc conjecture is false. On the other hand, the exact same argument shows that the counterexamples to this false conjecture are just as sparse, maybe with an upper bound $N^{2/3+\varepsilon}$ for the number of counterexamples. Consequently, if de Bruijn's result were a step towards the abc conjecture, it would simultaneously be a step towards a false conjecture. The most reasonable conclusion is then to admit that this theorem has no proof-theoretic relevance to the abc conjecture.

Then, while Morph's blog says:

Trinity systematically processes entire papers, intelligently corrects its own formalization errors by analyzing failed attempts, and automatically refactors lengthy proofs to extract useful lemmas and abstractions,

the GitHub page for the actual project mentions a human-supplied blueprint. This is a file in (La)TeX format that serves as a convenient host for the Lean project, providing information to the Lean prover. From the expression “human-supplied”, one has to understand that human beings have written that file, that is to say, have split the proof of de Bruijn's theorem into a series of 67 lemmas which divide the initial paper into tiny results, while giving references to other lemmas as indications of how they could be proved. As an example, here is “lemma24”:

\begin{lemma} \label{lem24} \lean{lemma24}
Let $p_1<\cdots<p_k$ be distinct primes, and denote the product $r = p_1 \cdots p_k$. If an integer $n\ge1$ satisfies $\rad(n)=r$, then $n = p_1^{a_1}\cdots p_k^{a_k}$ for some integers $a_1,\dots,a_k\ge 1$.
\end{lemma}
\begin{proof}\leanok
\uses{thm3, lem20}
Uses theorem \ref{thm3} with $n$, then uses lemma \ref{lem20}.
\end{proof}

which means that if an integer $r$ is the product of distinct primes $p_1,\dots,p_k$ and $\operatorname{rad}(n)=r$, then $n$ can be written $p_1^{a_1}\cdots p_k^{a_k}$, for some integers $a_1,\dots,a_k\geq 1$. The indication is to use “thm3” (the fundamental theorem of arithmetic) and “lem20” (the formula for the radical of an integer in terms of its decomposition into prime factors).

To summarize, humans have converted the 2-page proof of this result into 67 tiny chunks, supplying plenty of information that would probably have been unnecessary to most professional mathematicians, and fed that to their LLM. I am reluctant to draw a comparison with the AI farms in Kenya where workers were (and probably still are) exploited at tagging violent images for a ridiculous pay, but this is yet another instance where “mechanized artificial intelligence” relies crucially on human work (beyond the invention and deployment of neural networks, etc., which are human work as well, of course). And as in these other cases, the technostructure makes every effort to make this human work invisible: the blueprint has no author, the blog post has no author. Everything is made so that you believe that no human was involved in this project.

Let's go back to the content of the blueprint, taking for granted that such work has to be done anyway. However, we, humans, do not want to have to write endlessly tiny lemmas, each of them apparently irrelevant to the big mathematical picture. We do not want to keep them in a mathematical library, for almost all of them have no interest outside of their specific context. Moreover, once these lemmas are written, there would be no additional effort for a human to formalize the proof. What we do want is a way to translate mathematical ideas with as little effort as possible, staying close to their natural expression.

Moreover, even if we had to provide these tiny lemmas, we wouldn't be able to keep them in a human-accessible mathematical library, for that library would be cluttered by millions of uninteresting, rapidly redundant lemmas. Building a mathematical library is a human task, that reflects the activity of past human minds, to be used by future human minds.

As a matter of fact, this project considers a result in elementary mathematics, at an undergraduate level, and can completely ignore the difficulty of building a large-scale mathematical library, whereas it is precisely there that we need better automation. Something which is missing in Lean (but exists in other proof assistant systems, such as Rocq) is a coherent management system of all algebraic/analytic structures, which would allow one to optimize their construction without having to modify the rest of the code. There are some cases of such an automation in Lean though, for example the automatic conversion of mathematical lemmas for groups written in multiplicative form to groups written in additive form, giving one proof for two. Building such a tool requires a good proficiency in parsers and a clear view of what mathematicians do “in their heads” when they perform that apparently innocuous conversion. (For this reason, another automatic conversion for changing the direction of ordering relations appears more difficult to write, because what we would want is less clear. For the moment, mathlib has resolved to systematically privilege the $\leq$ inequality; the other one can be obtained by passing to the dual order, but automating this raises the very same problem.) Another kind of crucial automation is the implementation of solvers for linear inequalities, or automatic proofs of algebraic identities (in numbers, in groups, whatever) so that the sentence “a computation shows that…” could almost be translated as such in formal code. This requires a good knowledge of programming, and the use of subtle but classical algorithms (simplex method, Gröbner bases, SAT solvers…), each of them finely tailored for its applications. This is a truly beautiful combination of mathematics and computer science, but nothing close to a so-called “general intelligence” tool.

There is something positive, though, that I could say about the search for automatic formalization. The AI companies do not acknowledge it, most of the public opinion is delusional, but LLMs have no reasoning faculty at all, and they just can't. What they do is put words together that would fit together with a good probability given the existing corpus that the machine has been given. Patterns exist in language, they exist in mathematics as well, and the (impressive, I concede) faculty of this software allows it to simulate text, be it literary or mathematical, that looks plausible. But that doesn't make it true. On the other hand, this software could be combined with proof assistants that work in a formal, proof-amenable, language, thereby offering a way of assessing the veracity of the output of the vernacular text. (Still, one would need to be sure that the text couldn't say $1+1=3$ when the Lean code has certified that $1+1=2$.)

As a final paragraph, I would like to comment on the metaphors invoked by Morph Labs. Everyone who is even loosely knowledgeable in Sci-Fi movies will have recognized the Wachowskis' movie The Matrix. In that dystopian movie, humans are trapped in a machine-made “reality” after they lost a war against the artificial intelligences they created. Of course, Morpheus and Trinity are humans who fight against this power, but I wonder what the creators of Morph Labs had in mind when they decided to go by this name, and to go on with the metaphor by using the name Trinity. (And who's Neo?) Their obvious reference is a world where artificial intelligence led to humanity's doom, while their rhetoric is one of AI hype. We rejoin here the discussion of Bender and Hanna in their abovementioned book, where they explain that AI doomerism and boosterism are two faces of the same coin, which takes every development of genAI as unavoidable, whatever its social, ethical, and ecological costs.

Yet another proof of the Weierstrass approximation theorem

2025-04-25T12:48:00.000+02:00

Browsing through my Zotero database, I stumbled upon a paper by Harald Kuhn where he proposes an elementary proof of the Weierstrass approximation theorem. The proof is indeed neat, so here it is.

Theorem. — Let $f\colon[0;1]\to\mathbf R$ be a continuous function and let $\varepsilon$ be a strictly positive real number. There exists a polynomial $P\in\mathbf R[T]$ such that $\left|P(x)-f(x)\right|<\varepsilon$ for every $x \in[0;1]$.

The proof goes out of the world of continuous functions and considers the Heaviside function $H$ on $\mathbf R$ defined by $H(x)=0$ for $x<0$ and $H(x)=1$ for $x\geq 0$; its first step is to construct a polynomial approximation of $H.$

Lemma. — There exists a sequence $(P_n)$ of polynomials which are increasing on $[-1;1]$ and which, for every $\delta>0$, converge uniformly to the function $H$ on $[-1;1]\setminus [-\delta;\delta]$.

For $n\in\mathbf N$, consider the polynomial $Q_n=(1-T^n)^{2^n}$. On $[0;1]$, it defines a decreasing function, with $Q_n(0)=1$ and $Q_n(1)=0$.

Let $q$ be a real number such that $0\leq q<1/2$. One has $$ Q_n(q) = (1-q^n)^{2^n} \geq 1-2^n q^n $$ in view of the inequality $(1-t)^k \geq 1-kt$ which is valid for $t\in[0;1]$ and every integer $k\geq 1$. Since $2q<1$, we see that $Q_n(q)\to 1$. Since $Q_n$ is decreasing, one has $Q_n(q)\leq Q_n(x)\leq Q_n(0)=1$ for every $x\in[0;q]$, which shows that $Q_n$ converges uniformly to the constant function $1$ on the interval $[0;q]$.

Let now $q$ be a real number such that $1/2<q<1$. Then $$ \frac1{Q_n(q)} = \frac1{(1-q^n)^{2^n}} = \left(1 + \frac{q^n}{1-q^{n}}\right)^{2^n} \geq 1 + \frac{2^nq^n}{1-q^{n}} $$ and, since $2q>1$, the right-hand side tends to infinity, so that $Q_n(q)\to 0$. Since $Q_n$ is decreasing, one has $0=Q_n(1)\leq Q_n(x)\leq Q_n(q)$ for every $x\in[q;1]$, so that $Q_n$ converges uniformly to the constant function $0$ on the interval $[q;1]$.

Make an affine change of variables and set $P_n = Q_n((1-T)/2)$. The function defined by $P_n$ on $[-1;1]$ is increasing; for any real number $\delta$ such that $\delta>0,$ it converges uniformly to $0$ on $[-1;-\delta]$, and it converges uniformly to $1$ on $[\delta;1]$. This concludes the proof of the lemma.
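The lemma is easy to observe numerically. The following Python sketch (my own illustration, not taken from Kuhn's paper) evaluates $P_n$ through logarithms, because computing $(1-t^n)^{2^n}$ naively loses all precision once $t^n$ drops below machine epsilon; the names `Q`, `P` and `sup_error` are hypothetical.

```python
import math

def Q(n: int, t: float) -> float:
    """Q_n(t) = (1 - t^n)^(2^n) for t in [0, 1], computed as exp(2^n * log(1 - t^n))."""
    tn = t ** n
    if tn >= 1.0:
        return 0.0
    return math.exp(2.0 ** n * math.log1p(-tn))

def P(n: int, x: float) -> float:
    """P_n(x) = Q_n((1 - x) / 2) for x in [-1, 1]."""
    return Q(n, (1.0 - x) / 2.0)

def sup_error(n: int, delta: float, samples: int = 200) -> float:
    """Largest |P_n - H| on a grid of [-1, -delta] and [delta, 1]."""
    worst = 0.0
    for i in range(samples + 1):
        x = delta + (1.0 - delta) * i / samples
        worst = max(worst, abs(P(n, x) - 1.0), abs(P(n, -x)))
    return worst

for n in (10, 20, 40):
    print(n, sup_error(n, 0.2))
```

Note that $P_n(0)=(1-2^{-n})^{2^n}$ tends to $1/e$, so the exclusion of a neighbourhood of $0$ in the lemma cannot be avoided.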

We can now turn to the proof of the Weierstrass approximation theorem. Let $f$ be a continuous function on $[0;1]$. We may assume that $f(0)=0.$

The first step, that follows from Heine's uniform continuity theorem, consists in noting that there exists a uniform approximation of $f$ by a function of the form $ F(x)=\sum_{m=1}^N a_m H(x-c_m)$, where $a_1,\dots,a_N$ are real numbers and $c_1,\dots,c_N$ are real numbers in $[0;1].$ Namely, Heine's theorem implies that there exists $\delta>0$ such that $|f(x)-f(y)|<\varepsilon$ if $|x-y|<\delta$. Choose $N$ such that $N\delta> 1$ and set $c_m=m/N$; then define $a_1,\dots,a_N$ so that $a_1=f(c_1)$, $a_1+a_2=f(c_2)$, etc. It is easy to check that $|F(x)-f(x)|\leq \varepsilon$ for every $x\in[0;1]$. Moreover, since $a_m=f(c_m)-f(c_{m-1})$ (with $c_0=0$), one has $\lvert a_m\rvert\leq\varepsilon$ for all $m$.

Now, fix a real number $\delta>0$ small enough so that the intervals $[c_m-\delta;c_m+\delta]$ are pairwise disjoint, and $n\in\mathbf N$ large enough so that $|P_n(x)-H(x)|\leq\varepsilon/(2A)$ for all $x\in[-1;1]$ such that $\delta\leq|x|$, where $A=\lvert a_1\rvert+\dots+\lvert a_N\rvert$. Finally, set $P(T) = \sum_{m=1}^N a_m P_n(T-c_m)$.

Let $x\in[0;1]$. If $x$ doesn't belong to any interval of the form $[c_m-\delta;c_m+\delta]$, one can write $$\lvert P(x)-F(x)\rvert\leq \sum_{m} \lvert a_m\rvert \,\lvert P_n(x-c_m)- H(x-c_m)\rvert \leq \sum_m \lvert a_m\rvert (\varepsilon/2A)\leq \varepsilon. $$ On the other hand, if there exists $m\in\{1,\dots,N\}$ such that $x\in[c_m-\delta;c_m+\delta]$, then there exists a unique such integer $m$. Writing $$\lvert P(x)-F(x)\rvert\leq \sum_{k\neq m} \lvert a_k\rvert \,\lvert P_n(x-c_k)- H(x-c_k)\rvert + \lvert a_m\rvert\, \lvert P_n(x-c_m)-H(x-c_m)\rvert, $$ the term with index $k$ in the first sum is bounded by $\lvert a_k\rvert \varepsilon/(2A)$, while the last term is bounded by $ \lvert a_m\rvert$, because both $P_n$ and $H$ take their values in $[0;1]$ on $[-1;1]$. Consequently, $$\lvert P(x)-F(x)\rvert \leq (\varepsilon/2A)\sum_{k\neq m} \lvert a_k\rvert +\lvert a_m\rvert \leq 2\varepsilon.$$ Finally, $\lvert P(x)-f(x)\rvert\leq \lvert P(x)-F(x)\rvert+\lvert F(x)-f(x)\rvert\leq 3\varepsilon$. This concludes the proof.
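For readers who like to see the construction at work, here is a small Python sketch of the polynomial $P(T)=\sum_m a_m P_n(T-c_m)$, under assumptions of my own: the test function $f(x)=x^2$, the values $N=11$ and $n=120$, and the helper names are all hypothetical choices. The polynomial is only evaluated, never expanded (its degree is $n2^n$).

```python
import math

def P_step(n: int, x: float) -> float:
    """P_n(x) = Q_n((1 - x)/2) with Q_n(t) = (1 - t^n)^(2^n), for x in [-1, 1]."""
    tn = ((1.0 - x) / 2.0) ** n
    if tn >= 1.0:
        return 0.0
    # exp/log1p keeps precision when t^n is tiny and 2^n is huge.
    return math.exp(2.0 ** n * math.log1p(-tn))

def weierstrass_approximant(f, N: int, n: int):
    """Return x -> sum_m a_m P_n(x - c_m), with c_m = m/N and a_m = f(c_m) - f(c_{m-1})."""
    c = [m / N for m in range(N + 1)]
    a = [f(c[m]) - f(c[m - 1]) for m in range(1, N + 1)]

    def P(x: float) -> float:
        return sum(a_m * P_step(n, x - c_m) for a_m, c_m in zip(a, c[1:]))

    return P

# f(x) = x^2 satisfies f(0) = 0 and |f(x) - f(y)| <= 2|x - y| on [0, 1],
# so eps = 0.2 goes with delta = 0.1, and N = 11 satisfies N * delta > 1.
f = lambda x: x * x
P = weierstrass_approximant(f, N=11, n=120)
error = max(abs(P(i / 500) - f(i / 500)) for i in range(501))
print(error)
```

On this example the observed error should stay below the bound $3\varepsilon=0.6$ given by the proof; the proof makes no attempt at being efficient, and neither does this sketch.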

Yet another proof of the inequality between the arithmetic and the geometric means

2025-04-25T00:42:00.004+02:00

This is an exposition of the proof of the inequality between arithmetic and geometric means given by A. Pełczyński (1992), “Yet another proof of the inequality between the means”, Annales Societatis Mathematicae Polonae. Seria II. Wiadomości Matematyczne, 29, p. 223–224. The proof might look bizarre, but I can guess some relation with another paper of the author where he proves uniqueness of the John ellipsoid. And as bizarre as it is, and despite the abundance of proofs of this inequality, I found it nice. (The paper is written in Polish, but the formulas make it possible to follow it.)

For $n\geq 1$, let $a_1,\dots,a_n$ be positive real numbers. Their arithmetic mean is $$ A = \dfrac1n \left(a_1+\dots + a_n\right)$$ while their geometric mean is $$G = \left(a_1\dots a_n\right)^{1/n}.$$ The inequality of the title says $G\leq A,$ with equality if and only if all $a_k$ are equal. By homogeneity, it suffices to prove that $A\geq 1$ if $G=1$, with equality if and only if $a_k=1$ for all $k.$ In other words, we have to prove the following theorem.

Theorem. — If $a_1,\dots,a_n$ are positive real numbers such that $a_1\dots a_n=1$ and $a_1+\dots+a_n\leq n,$ then $a_1=\dots=a_n=1.$

The case $n=1$ is obvious and we argue by induction on $n$.

Lemma. — If $a_1\cdots a_n=1$ and $a_1+\dots+a_n\leq n,$ then $a_1^2+\dots+a_n^2\leq n.$

Indeed, we can write $$ a_1^2+\dots+a_n^2 = (a_1+\dots+a_n)^2 - \sum_{i\neq j} a_i a_j \leq n^2 - \sum_{i\neq j} a_i a_j,$$ and we have to give a lower bound for the second term. For given $i\neq j$, the product $a_i a_j$ and the remaining $a_k$, for $k\neq i,j$, are $n-1$ positive real numbers whose product is equal to $1$. By induction, one has $$ n-1 \leq a_i a_j + \sum_{k\neq i,j}a_k.$$ Summing these $n(n-1)$ inequalities, we have $$ n(n-1)^2 \leq \sum_{i\neq j} a_i a_j + \sum_{i\neq j} \sum_{k\neq i,j} a_k.$$ In the second term, every element $a_k$ appears $(n-1)(n-2)$ times, hence $$ n(n-1)^2 \leq \sum_{i\neq j} a_i a_j + (n-1)(n-2) \sum_{k} a_k \leq \sum_{i\neq j} a_i a_j + n(n-1)(n-2), $$ so that $$ \sum_{i\neq j} a_i a_j \geq n(n-1)^2-n(n-1)(n-2)=n(n-1).$$ Finally, we obtain $$a_1^2+\dots+a_n^2 \leq n^2-n(n-1)=n,$$ as claimed.
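The counting step of this proof, namely the inequality $n(n-1)^2 \leq \sum_{i\neq j} a_i a_j + (n-1)(n-2)\sum_k a_k$ for positive numbers with product $1$, can be tested numerically. The Python sketch below is only a sanity check under my own conventions (random samples rescaled to have product $1$, hypothetical function names); it proves nothing.

```python
import math
import random

def normalized_sample(n: int, rng: random.Random) -> list:
    """n positive reals, rescaled by their geometric mean so that their product is 1."""
    xs = [rng.uniform(0.1, 10.0) for _ in range(n)]
    g = math.exp(sum(math.log(x) for x in xs) / n)
    return [x / g for x in xs]

def counting_gap(a: list) -> float:
    """sum_{i != j} a_i a_j + (n-1)(n-2) sum_k a_k - n (n-1)^2, expected to be >= 0."""
    n = len(a)
    cross = sum(a[i] * a[j] for i in range(n) for j in range(n) if i != j)
    return cross + (n - 1) * (n - 2) * sum(a) - n * (n - 1) ** 2

rng = random.Random(0)
worst = min(counting_gap(normalized_sample(n, rng))
            for n in range(2, 7) for _ in range(200))
print(worst)  # nonnegative up to rounding errors
```

For $n=2$ the gap is exactly $2a_1a_2-2=0$, which is why the check has to allow for rounding errors.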

We can iterate this lemma, because the squares $a_1^2,\dots,a_n^2$ also have product $1$: if $a_1+\dots+a_n\leq n$, then $$a_1^{2^m}+\dots+a_n^{2^m}\leq n$$ for every integer $m\geq 0$. When $m\to+\infty$, we obtain that $a_k\leq 1$ for every $k$, since otherwise $a_k^{2^m}$ would tend to infinity. Since $a_1\dots a_n=1$, we must have $a_1=\dots=a_n=1$, and this concludes the proof.

A generalization of the Eisenstein criterion

2025-04-07T00:22:00.004+02:00

Recently, in the Zulip server for Lean users, somebody came up with something that looked like homework, but managed to sting me a little bit.

It was about irreducibility of polynomials with integer coefficients. Specifically, the guy wanted a proof that the polynomial $T^4-10 T^2+1$ is irreducible, claiming that the Eisenstein criterion was not good at it.

What was to be proven (this is what irreducible means) is that it is impossible to write that polynomial as the product of two polynomials with integer coefficients, except by writing $T^4-10 T^2+1$ as $(1)·(T^4-10 T^2+1)$ or as $ (-1)·(-T^4+10 T^2-1)$.

This is both a complicated and a trivial question.

Trivial because there are general bounds (initially due to Mignotte) for the integers that appear in any such factorization, and it could just be sufficient to try any possibility in the given range and conclude. Brute force, not very intelligent, but with a certain outcome.

Complicated because those computations would often be long, and there are many criteria in number theory to assert irreducibility.

One of the easiest criteria is the aforementioned Eisenstein criterion.

This criterion requires an auxiliary prime number, and I'll state it as an example, using another polynomial, say $T ^ 4 - 10 T ^ 2 + 2$. Here, one takes the prime number 2, and one observes that modulo 2, the polynomial is equal to $T ^ 4$, while the constant term, 2, is not divisible by $2^2=4$. In that case, the criterion immediately asserts that the polynomial $T^4-10T^2+2$ is irreducible.
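In its usual general form (divisibility by $p$ of all coefficients but the leading one, and non-divisibility of the constant term by $p^2$), the criterion is a three-line test. The Python sketch below is my own illustration; the function name and the convention of listing coefficients from the constant term up are assumptions, and the caller is responsible for passing a prime number.

```python
def eisenstein_applies(coeffs: list, p: int) -> bool:
    """Eisenstein's criterion at the prime p.

    coeffs lists the integer coefficients from the constant term up to the
    leading one; the polynomial must have degree at least 1.
    """
    *lower, lead = coeffs
    return (lead % p != 0                          # p does not divide the leading coefficient
            and all(c % p == 0 for c in lower)     # p divides all the other coefficients
            and lower[0] % (p * p) != 0)           # p^2 does not divide the constant term

print(eisenstein_applies([2, 0, -10, 0, 1], 2))  # T^4 - 10 T^2 + 2 with p = 2
print(eisenstein_applies([1, 0, -10, 0, 1], 2))  # T^4 - 10 T^2 + 1 with p = 2
```

The first call succeeds and the second fails, since the constant coefficient $1$ is not divisible by any prime.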

With all due respect to Eisenstein, I don't like that criterion too much, though, because it only applies in kind of exceptional situations. Still, it is often useful.

There is a classic case of application, by the way, of the Eisenstein criterion, namely the irreducibility of cyclotomic polynomials of prime index, for example, $T^4+T^3+T^2+T+1$ (for the prime number $p=5$). But it is not visible that the criterion applies, and one needs to perform the change of variable $T = X+1$.

I had noticed that in that case, it may be interesting to generalize the Eisenstein criterion to avoid this change of variables. Indeed, for a monic polynomial $f$ in $\mathbf Z[T]$, a variant of the criterion states: take an integer $a$ and a prime number $p$, and assume that:

$f(T) \equiv (T-a)^d \pmod p$
$f(a)$ (the value of $f$ at $a$) is not divisible by $p^2$.

Then $f$ is irreducible.

For the cyclotomic polynomial of index $5$, $f(T) = T^4+T^3+T^2+T+1$, still taking $p=5$, one has $f(T) \equiv (T-1)^4 \pmod5$ and $f(1)=5$ is not divisible by $5^2=25$. Consequently, it is irreducible. And the same argument works for all cyclotomic polynomials of prime index.

The reason, that avoids any strange computation, is that $f(T)=(T^5-1)/(T-1)$, which modulo 5 is $(T-1)^4$, by the divisibility of the binomial coefficients.

To go back to the initial example, $T^4-10T^2+1$, there are indeed no prime numbers with which the Eisenstein criterion can be applied. This is obvious in the standard form, because the constant coefficient is 1. But the variant doesn't help either. The only prime with which it could work seems to be 2, but the value of the polynomial at 1 is equal to $-8$, and is divisible by 4.

This is where a new variant of the criterion can be applied, this time with the prime number 3.

Theorem. — Let $q\in\mathbf Z[T]$ be a monic polynomial, let $p$ be a prime number such that $q$ is irreducible in $\mathbf F_p[T]$. Let $f\in\mathbf Z[T]$ be a monic polynomial. Assume that $f\equiv q^d$ modulo $p$, but that $f\not\equiv 0$ modulo $\langle q, p^2\rangle$. Then $f$ is irreducible in $\mathbf Z[T]$.

To see how this criterion applies, observe that modulo 3, one has $f(T)\equiv T^4+2T^2+1=(T^2+1)^2\pmod 3$. So we are almost as in the initial criterion, but with the polynomial $T^2+1$ in place of $T$. The first thing that allows this criterion to apply is that $T^2+1$ is irreducible modulo 3. In this case, this is because $-1$ is not a square mod 3.

The criterion also requires a variant of the second condition of the Eisenstein criterion; it holds because the polynomial is not zero modulo $\langle T^2+1, 9\rangle$. Here, one has \[ T^4-10T^2+1=(T^2+1)^2-12T^2 = (T^2+1)^2-12(T^2+1)+12\] hence it is equal to 3 modulo $\langle T^2+1, 9\rangle$.

And so we have an Eisenstein-type proof that the polynomial $T^4-10T^2+1$ is irreducible over the integers. CQFD!
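The three hypotheses of the theorem can be checked mechanically for $f=T^4-10T^2+1$, $q=T^2+1$ and $p=3$. The following Python sketch is an independent illustration (it is not the Lean formalization mentioned below); polynomials are coefficient lists starting with the constant term, and the helper names are hypothetical.

```python
def poly_mul(f: list, g: list) -> list:
    """Product of two integer polynomials given as coefficient lists."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def poly_rem_monic(f: list, q: list) -> list:
    """Remainder of the division of f by the monic polynomial q, over the integers."""
    r, dq = list(f), len(q) - 1
    for i in range(len(f) - 1, dq - 1, -1):
        c = r[i]
        for j, qj in enumerate(q):
            r[i - dq + j] -= c * qj
    return r[:dq]

p = 3
f = [1, 0, -10, 0, 1]   # T^4 - 10 T^2 + 1
q = [1, 0, 1]           # T^2 + 1

# q has degree 2, so it is irreducible mod p as soon as it has no root mod p.
q_irreducible = all((x * x + 1) % p != 0 for x in range(p))
# f is congruent to q^2 modulo p.
f_is_power = all((a - b) % p == 0 for a, b in zip(f, poly_mul(q, q)))
# f is nonzero modulo <q, p^2>: its remainder by q does not vanish mod p^2.
remainder = poly_rem_monic(f, q)
f_nonzero = any(c % (p * p) != 0 for c in remainder)

print(q_irreducible, f_is_power, remainder, f_nonzero)
```

The remainder found is the constant $12$, which is $3$ modulo $9$, in agreement with the computation above.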

I made the fun last a bit longer by formalizing the proof in Lean, first of the generalized criterion, and then of the particular example. It is not absolutely convincing yet, because Lean/Mathlib still lacks a bit of tools for handling explicit computations. And probably many parts can be streamlined. Still, it was a fun exercise to do.

The proof works in a more general context and gives the following theorem:

Theorem. — Let $R$ be an integral domain, let $P$ be a prime ideal of $R$ and let $K$ be the field of fractions of $R/P$. Let $q\in R[T]$ be a monic polynomial such that $q$ is irreducible in $K[T]$. Let $f\in R[T]$ be a monic polynomial. Assume that $f\equiv q^d$ modulo $P$, but that $f\not\equiv 0$ modulo $\langle q\rangle + P^2$. Then $f$ is irreducible in $R[T]$.

A simple proof of a theorem of Kronecker

2025-03-29T12:01:00.026+01:00

Kronecker's theorem of the title is the following.

Theorem. — Let $\alpha\in\mathbf C$ be an algebraic integer all of whose conjugates have absolute value at most $1$. Then either $\alpha=0$, or $\alpha$ is a root of unity.

This theorem has several elementary proofs. In this post, I explain the simple proof proposed by Gebhart Greiter in his American Mathematical Monthly note, adding details so that it would (hopefully) be more accessible to students.

The possibility $\alpha=0$ is rather unentertaining, hence let us assume that $\alpha\neq0$.

Let us first analyse the hypothesis. The assumption that $\alpha$ is an algebraic integer means that there exists a monic polynomial $f\in\mathbf Z[T]$ such that $f(\alpha)=0$. I claim that $f$ can be assumed to be irreducible in $\mathbf Q[T]$; it is then the minimal polynomial of $\alpha$. This follows from Gauss's theorem, and let me give a proof for that. Let $g\in \mathbf Q[T]$ be a monic irreducible factor of $f$ such that $g(\alpha)=0$ and let $h\in\mathbf Q[T]$ be such that $f=gh$. Clearing denominators, there are polynomials $g_1,h_1\in\mathbf Z[T]$ whose coefficients are mutually coprime, and positive integers $u,v$ such that $g_1=ug$ and $h_1=vh$. Then $(uv)f=g_1 h_1$. Let us prove that $uv=1$; otherwise, it has a prime factor $p$. Considering the relation $(uv)f=g_1h_1$ modulo $p$ gives $0=g_1 h_1 \pmod p$. Since their coefficients are mutually coprime, the polynomials $g_1$ and $h_1$ are nonzero modulo $p$, hence their product is nonzero, because $\mathbf F_p[T]$ is an integral domain. This is a contradiction. Consequently, $u=v=1$, so that $g=g_1$ belongs to $\mathbf Z[T]$, and we can replace $f$ by $g$.

So we have a monic polynomial $f\in\mathbf Z[T]$, irreducible in $\mathbf Q[T]$, such that $f(\alpha)=0$. That is to say, $f$ is the minimal polynomial of $\alpha$, so that the conjugates of $\alpha$ are the roots of $f$. Note that the roots of $f$ are pairwise distinct; otherwise, $\gcd(f,f')$ would be a nontrivial factor of $f$. Moreover, $0$ is not a root of $f$, for otherwise one could factor $f=T\cdot f_1$, and the irreducibility of $f$ would force $f=T$, that is, $\alpha=0$.

Let us now consider the companion matrix to $f$: writing $f=T^n+c_1T^{n-1}+\dots+c_n$, so that $n=\deg(f)$, this is the matrix \[ C_f = \begin{pmatrix} 0 & \dots & 0 & -c_n \\ 1 & \ddots & \vdots & -c_{n-1} \\ 0 & \ddots & & -c_{n-2} \\ 0 & \cdots & 1 & -c_1 \end{pmatrix}.\] If $e_1,\ldots,e_n$ are the canonical column vectors $e_1=(1,0,\dots,0)$, etc., then $C_f\cdot e_1=e_2$, $\ldots$, $C_f \cdot e_{n-1}=e_{n}$, and $C_f\cdot e_n = -c_{n} e_1-\dots -c_1 e_n$. Consequently, \[ f(C_f)\cdot e_1 = C_f^n\cdot e_1 +c_1 C_f^{n-1} \cdot e_1+ \dots +c_n e_1 = 0.\] Moreover, for $2\leq k\leq n$, one has $f(C_f)\cdot e_k=f(C_f)\cdot C_f^{k-1}\cdot e_1=C_f^{k-1}\cdot f(C_f)\cdot e_1=0$. Consequently, $f(C_f)=0$ and the complex eigenvalues of $C_f$ are roots of $f$. Since $f$ has simple roots, $C_f$ is diagonalizable and there exist a matrix $P\in\mathrm{GL}(n,\mathbf C)$ and a diagonal matrix $D$ such that $C_f=P\cdot D\cdot P^{-1}$; the diagonal entries of $D$ are roots of $f$. Since $0$ is not a root of $f$, the matrix $D$ is invertible, and $C_f$ is invertible as well. More precisely, one can infer from the definition of $C_f$ that $g(C_f)\cdot e_1\neq 0$ for any nonzero polynomial $g$ of degree $<n$, so that $f$ is the minimal polynomial of $C_f$. Consequently, all of its roots are actually eigenvalues of $C_f$, and appear in $D$; in particular, $\alpha$ is an eigenvalue of $C_f$.

For every $k\geq 1$, one has $C_f^k=P\cdot (D^k)\cdot P^{-1}$. Since all entries of $D$ are conjugates of $\alpha$, they have absolute value at most $1,$ and the set of all $D^k$, for $k\geq 1$, is bounded in $\mathrm{M}_n(\mathbf C)$. Consequently, the set $\{C_f^k\,;\, k\geq 1\}$ is bounded in $\mathrm M_n(\mathbf C)$. On the other hand, this set consists of matrices in $\mathrm M_n(\mathbf Z)$. It follows that this set is finite.

Consequently, there are integers $k$ and $\ell$ such that $1\leq k< \ell$ and $C_f^k=C_f^\ell$. Since $C_f$ is invertible, one has $C_f^{\ell-k}=1$. Since $\alpha$ is an eigenvalue of $C_f,$ this implies $\alpha^{\ell-k}=1$. We thus have proved that $\alpha$ is a root of unity.
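The argument is effective enough to be run on examples. The Python sketch below (an illustration of mine, with hypothetical function names, not taken from Greiter's note) builds the companion matrix with exact integer entries and looks for the first power equal to the identity, giving up after a cutoff chosen arbitrarily.

```python
def companion(c: list) -> list:
    """Companion matrix of T^n + c[0] T^(n-1) + ... + c[n-1], with the
    convention of the text: ones below the diagonal, last column (-c_n, ..., -c_1)."""
    n = len(c)
    C = [[0] * n for _ in range(n)]
    for i in range(1, n):
        C[i][i - 1] = 1
    for i in range(n):
        C[i][n - 1] = -c[n - 1 - i]
    return C

def mat_mul(A: list, B: list) -> list:
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def multiplicative_order(C: list, max_power: int = 1000):
    """Smallest m >= 1 such that C^m is the identity, or None if none is found."""
    n = len(C)
    identity = [[int(i == j) for j in range(n)] for i in range(n)]
    power = C
    for m in range(1, max_power + 1):
        if power == identity:
            return m
        power = mat_mul(power, C)
    return None

print(multiplicative_order(companion([1, 1, 1, 1])))   # T^4 + T^3 + T^2 + T + 1
print(multiplicative_order(companion([-1, -1]), 50))   # T^2 - T - 1
```

The first polynomial is the cyclotomic polynomial of index $5$, and its companion matrix has order $5$. The second one has the golden ratio as a root, of absolute value $>1$: the powers of its companion matrix are unbounded (their entries are Fibonacci numbers) and no power is the identity.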

On numbers and unicorns

2025-01-11T19:56:00.001+01:00

Reading a book on philosophy of mathematics, even if it's written lightly, such as that one, Why is there philosophy of mathematics at all? by Ian Hacking, may have unexpected effects. The most visible one has been a poll that I submitted on Mastodon on December 4th. As you can read, there were four options:

Numbers exist
Unicorns exist
Numbers have more existence than unicorns
Neither numbers nor unicorns exist

As you can see, three main options each share a rough third of the votes, the existence of unicorns being favored by a small 4% of the 142 participants.

Post by @antoinechambertloir@mathstodon.xyz

In this post, I would like to discuss these four options, from the point of view of a mathematician with a limited expertise on philosophy. Funnily, each of them, including the second one, has some substance. Moreover, it will become apparent at the end that the defense of any of those options relies on the meaning we give to the word exist.

Numbers exist

We have traces of numbers as old as we have traces of writing. The Babylonian clay tablet known as “Plimpton 322” dates from 1800 BC and mentions Pythagorean triples (triples of integers $(a,b,c)$ such that $a^2=b^2+c^2$) written in the sexagesimal (base 60) writing of Babylonians. More than 1000 years later, Pythagoras and his school wished to explain everything in the world using numbers, usually depicted geometrically, drawing pebbles arranged in triangles, squares, rectangles and even pentagons. Musical harmony was based on specific numerical ratios. Plato's writings are full of references to mathematics, and a few specific numbers, such as 5040, are even given a political relevance. Even earlier, the Chinese I Ching based predictions on random numbers.

Euclid's Elements give us an account of numbers that still sounds quite modern. A great part of elementary arithmetic can be found there: divisibility, the “Euclidean algorithm”, prime numbers and the fact that there are infinitely many of them or, more precisely, that there are more prime numbers than any given magnitude. On the other hand, it is worth saying that Euclid's concept of number doesn't exactly fit with our modern concept: since “A number is a multitude composed of units”, numbers are whole numbers, and it is implicit that zero is not a number, and neither is one. Moreover, proposition 21 of book 10, which we read as proving the irrationality of the square root of 2, is not a statement about a hypothetical number called “square root of 2”, but the fact that the diagonal of a square of unit length is irrational, i.e., not commensurable with the unit length.

History went on and on and numbers gradually became prevalent in modern societies. Deciding the exact date of Easter in an ill-connected Christian world led to the necessity of computing it, using an algorithm known as computus, and the name happened to be used for every calculation. The development of trade led to the development of computing devices, abacus, counting boards (the counter at your preferred shop is the place where the counting board was laid), and Simon Stevin's De Thiende explicitly presents trade as a motivation for his book on decimal fractions.

Needless to say, our digital age is literally infused with numbers.

Unicorns exist

Traces of unicorns can be found by the Indus valley. The animal seems to have disappeared but the Greeks thought they lived in India. In Western Europe, medieval tapestries give numerous wonderful representations of unicorns, such as the Paris Dame à la licorne. This extract from Umberto Eco's The name of the rose gives additional proof of their existence:

“But is the unicorn a falsehood? It’s the sweetest of animals and a noble symbol. It stands for Christ, and for chastity; it can be captured only by setting a virgin in the forest, so that the animal, catching her most chaste odor, will go and lay its head in her lap, offering itself as prey to the hunters’ snares.”
“So it is said, Adso. But many tend to believe that it’s a fable, an invention of the pagans.”
“What a disappointment,” I said. “I would have liked to encounter one, crossing a wood. Otherwise what’s the pleasure of crossing a wood?”

The Internet age gave even more evidence to the existence of unicorns. For example, we now know that though it is not rainbowish, contrary to a well-spread myth, unicorns have a colorful poop.

Numbers have more existence than unicorns

Whatever numbers and unicorns could be, we are literally surrounded by the former, and have scarce traces of the latter. I'd also like to add that numbers have a big impact in all modern human activities: the most basic trading activities use numbers, as does the most sophisticated science. But we don't need to wish to send a man to the moon to require numbers: political life is built around votes and polls, all timed schedules are expressed in numbers, and schools, hospitals, movies are regularly ranked or assigned ratings. An example given by Hacking is the existence of world records for, say, 50 m freestyle swimming. That requires good swimmers, of course, but also elaborate timers, very precise measurement tools to have the swimming pool acknowledged as a legitimate place, and so on. On the other hand, however nice the cultural or artistic representations of unicorns can be, they don't have that prevalence in the modern world, and it is rare that something is possible because of something involving unicorns.

Neither numbers nor unicorns exist

This is probably my favorite take, and that's maybe slightly bizarre from a mathematician, especially one versed in number theory. But take a lucid look at the discourse that we have about numbers. On one side, we have the famous quote of Leopold Kronecker, “Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Menschenwerk.” (God made whole numbers, all the rest is the work of man.) Does this mean that at least some numbers, the natural integers, exist in the world in the same way that we can find water, iron, gold or grass? Are there number mines somewhere? It doesn't seem so, and I'm inclined to say that if numbers exist, they also are the creation of mankind. The ZFC axioms of set theory postulate the existence of some set $\mathbf N$ satisfying an induction principle, and that induction principle is used to endow that set with an addition and a multiplication which make it a mathematical model of Kronecker's “whole numbers”, but does this mean that numbers exist? After all, what guarantees that the ZFC axioms are not self-contradictory? And if they were, would that mean that whole numbers cease to exist? Certainly not. In any case, our daily use of whole numbers would not be the least disturbed by a potential contradiction in those axioms. But that indicates that being able to speak of integers within the ZFC set theory doesn't confer on those integers some existence, in the same sense that grass, water, iron exist. Another indication of their elusive character is the way mathematicians or computer scientists pretend numbers are specific objects, such as zero being the empty set, 1 being the set $\{\emptyset\}$, and, more generally, the integer $n+1$ being $n \cup \{n\}$. This is at most a functional fiction, as is well explained by Benacerraf (1965) in his paper, “What numbers could not be” (The Philosophical Review, Vol. 74, No. 1, pp. 47-73).

But all in all, everything is fine, because whether numbers (resp. unicorns) exist or don't, we, humans, have developed a language to talk about them, one that makes us think and elaborate new works of art or of science, and after all, this is all that counts, isn't it?

The combinatorial Nullstellensatz

2024-09-13T00:38:00.000+02:00

The “combinatorial Nullstellensatz” is a relatively elementary statement due to Noga Alon (1999) whose name, while possibly frightening, really says what it is and what it is good for. (A freely available version is there.) Nullstellensatz is the classic name for a theorem of David Hilbert that relates loci in $F^n$ defined by polynomial equations and the ideals of the polynomial ring $F[T_1,\dots,T_n]$ generated by these equations, when $F$ is an algebraically closed field. The word itself is German and means “theorem about the location of zeroes”. The precise correspondence is slightly involved, and stating it here precisely would derail us from the goal of this post, so let us stay with these informal words for the moment. The adjective combinatorial in the title refers to the fact that Alon deduced many beautiful consequences of this theorem in the field of combinatorics, and for some of them, furnishes quite simple proofs. We'll give one of them below, the Cauchy—Davenport inequality, but my main motivation in writing this text was to discuss the proof of the combinatorial Nullstellensatz, in particular untold aspects of the proofs which took me some time to unravel.$\gdef\Card{\operatorname{Card}}$

The context is the following: $F$ is a field, $n$ is a positive integer, and one considers finite sets $S_1,\dots,S_n$ in $F$, and the product set $S=S_1\times \dots \times S_n$ in $F^n$. One considers polynomials $f\in F[T_1,\dots,T_n]$ in $n$ indeterminates and their values at the points $(s_1,\dots,s_n)$ of $S$. For every $i\in\{1,\dots,n\}$, set $g_i = \prod_{a\in S_i} (T_i-a)$; this is a polynomial of degree $\Card(S_i)$ in the indeterminate $T_i$.

Theorem 1. — If a polynomial $f\in F[T_1,\dots,T_n]$ vanishes at all points $s\in S$, there exist polynomials $h_1,\dots,h_n\in F[T_1,\dots,T_n]$ such that $f=g_1 h_1+\dots+g_n h_n$, and for each $i$, one has $\deg(g_i h_i)\leq \deg (f)$.

This theorem is really close to the classic Nullstellensatz, but the specific nature of the set $S$ allows for a weaker hypothesis (the field $F$ is not assumed to be algebraically closed) and a stronger conclusion (the Nullstellensatz would assert that there exists some power $f^k$ of $f$ that is of this form, saying nothing of the degrees).

Its proof relies on a kind of Euclidean division algorithm. Assume that $f$ has some monomial $c_mT^m=c_mT_1^{m_1}\dots T_n^{m_n}$ where $m_i\geq \Card(S_i)$; then one can consider a Euclidean division (in the variable $T_i$), $T_i^{m_i}=p_i g_i + r_i$, where $\deg(r_i)<\Card(S_i)$. One can then replace this monomial $c_mT^m$ in $f$ by $(c_m T^m/T_i^{m_i})r_i$, and, at the same time, add $(c_m T^m/T_i^{m_i}) p_i$ to $h_i$. Since this amounts to subtracting some multiple of $g_i$, the vanishing assumption still holds. Applying this method repeatedly, one reduces to the case where the degree of $f$ in the variable $T_i$ is $<\Card(S_i)$ for all $i$.
Then a variant of the theorem that says that a polynomial in one indeterminate has no more roots than its degree implies that $f\equiv 0$.
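
To make the reduction step concrete, here is a small Python sketch over a prime field $\mathbf F_p$. It is not the monomial-by-monomial procedure described above, but a variant that replaces each power $T_i^{m_i}$ by its remainder modulo $g_i$ in one go, so that termination is obvious. The representation of polynomials as dictionaries (exponent tuple to coefficient) and the names `poly_g`, `power_remainder` and `reduce_poly` are choices made for this illustration only.

```python
from itertools import product

def poly_g(S, p):
    """Coefficients (low degree first) of prod_{a in S} (T - a) over F_p."""
    g = [1]
    for a in S:
        # Multiply g by (T - a): new[k] = g[k-1] - a * g[k].
        g = [((g[k - 1] if k > 0 else 0) - a * (g[k] if k < len(g) else 0)) % p
             for k in range(len(g) + 1)]
    return g

def power_remainder(e, g, p):
    """Remainder of T^e in the Euclidean division by the monic polynomial g."""
    d = len(g) - 1
    r = [0] * e + [1]
    for k in range(e, d - 1, -1):      # cancel the coefficient of T^k
        c = r[k]
        if c:
            for j in range(d + 1):
                r[k - d + j] = (r[k - d + j] - c * g[j]) % p
    return (r + [0] * d)[:d]           # exactly d coefficients

def reduce_poly(f, sets, p):
    """Reduce f modulo g_1, ..., g_n; the result has degree < Card(S_i) in T_i."""
    gs = [poly_g(S, p) for S in sets]
    out = {}
    for exps, c in f.items():
        rems = [power_remainder(e, g, p) for e, g in zip(exps, gs)]
        # Expand the product of the univariate remainders.
        for idx in product(*[range(len(r)) for r in rems]):
            coeff = c
            for r, k in zip(rems, idx):
                coeff = coeff * r[k] % p
            if coeff:
                out[idx] = (out.get(idx, 0) + coeff) % p
    return {m: c for m, c in out.items() if c}

# f = (T1-1)(T1-2)*T2 + (T2-3)*T1^2 over F_7 vanishes on S = {1,2} x {3}.
f = {(2, 1): 2, (1, 1): 4, (0, 1): 2, (2, 0): 4}
print(reduce_poly(f, [[1, 2], [3]], 7))
```

In this example the reduced polynomial is zero, as the argument predicts for a polynomial vanishing on $S$; a polynomial such as $T_1T_2$, which does not vanish on $S$, reduces to a nonzero polynomial.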

A weak point in the written exposition of this argument is the reason why the iterative construction would terminate. Intuitively, something has to decrease, but one needs a minute (or an hour, or a week) of reflection to understand what precisely decreases. The problem is that if one works one monomial at a time, the degrees might stay the same. The simplest reason why this procedure indeed works belongs to the theory of Gröbner bases and to the proof that the division algorithm actually works: instead of the degrees with respect to all variables, or of the total degree, one should consider a degree with respect to some lexicographic ordering of the variables — the point is that it is a total ordering of the monomials, so that if one considers the leading term, given by the largest monomial, the degree will actually decrease.

The second theorem is in fact the one which is needed for the applications to combinatorics. Its statement is rather written in a contrapositive way — it will show that $f$ does not vanish at some point of $S$.

Theorem 2. — Let $f\in F[T_1,\dots,T_n]$ and assume that $f$ has a nonzero monomial $c_mT^m$, where $|m|$ is the total degree of $f$. If, moreover, $m_i<\Card(S_i)$ for all $i$, then there exists a point $s=(s_1,\dots,s_n)\in S$ such that $f(s)\neq 0$.

It is that theorem whose proof gave me a hard time. I finally managed to formalize it in Lean, hence I was pretty sure I had understood it. In fact, writing this blog post helped me simplify it further, removing a couple of useless arguments by contradiction! Anyway, I feel it is worth being told with a bit more detail than in the original paper.

We argue by contradiction, assuming that $f(s)=0$ for all $s\in S_1\times\dots \times S_n$. Applying theorem 1, we get an expression $f=\sum_{i=1}^n g_i h_i$ where $\deg(g_ih_i)\leq \deg(f)$ for all $i$. The coefficient of $T^m$ in $f$ is nonzero, by assumption; it is also the sum of the coefficients of $T^m$ in $g_i h_i$, for $1\leq i\leq n$, and we are going to prove that all of them vanish — hence getting the desired contradiction. $\gdef\coeff{\operatorname{coeff}}$

Fix $i\in\{1,\dots, n\}$. If $h_i=0$, then $\coeff_m(g_ih_i)=\coeff_m(0)=0$, hence we may assume that $h_i\neq0$. This implies that $\deg(g_i h_i)=\Card(S_i)+\deg(h_i) \leq \deg(f)=|m|$.
One then has
\[ \coeff_m(g_i h_i)=\sum_{p+q=m} \coeff_p (g_i) \coeff_q(h_i), \]
and it suffices to prove that all terms of this sum vanish. Fix $p,q$ such that $p+q=m$, assume that $\coeff_p(g_i)\neq 0$, and let us prove that $\coeff_q(h_i)=0$.

By the very definition of $g_i$ as a product of $\Card(S_i)$ factors of the form $T_i-a$, this implies that $p_j=0$ for all $j\neq i$. Moreover, $p_i\leq p_i+q_i=m_i < \Card(S_i)$, by the assumption of the theorem, hence $p_i<\Card(S_i)$. This implies \[\Card(S_i)+\deg(h_i)\leq |m|= |p+q|=|p|+|q|=p_i+|q|<\Card(S_i)+|q|.\]
Subtracting $\Card(S_i)$ on both sides, one gets $\deg(h_i)<|q|$, hence $\coeff_q(h_i)=0$, as was to be shown.

To conclude, let us add the combinatorial application to the Cauchy—Davenport theorem.
$\gdef\F{\mathbf F}$

Theorem 3. — Let $p$ be a prime number and let $A$ and $B$ be nonempty subsets of $\F_p$, the field with $p$ elements. Denote by $A+B$ the set of all $a+b$, for $a\in A$ and $b\in B$. Then
\[ \Card(A+B) \geq \min (p, \Card(A)+\Card(B)-1).\]

First consider the case where $\Card(A)+\Card(B)>p$, so that $\min(p,\Card(A)+\Card(B)-1)=p$. In this case, for every $c\in\F_p$, one has
\[\Card(A \cap (c-B))+ \Card(A\cup (c-B))=\Card(A)+\Card(c-B)=\Card(A)+\Card(B)>p.\]
Since $\Card(A\cup(c-B))\leq p$, the sets $A$ and $c-B$ intersect. In particular, there exist $a\in A$ and $b\in B$ such that $a+b=c$. Since $c$ is arbitrary, $A+B=\F_p$, and the desired inequality holds.

Let us now treat the case where $\Card(A)+\Card(B)\leq p$, so that $\min(p,\Card(A)+\Card(B)-1)=\Card(A)+\Card(B)-1$. We will assume that the conclusion of the theorem does not hold, that is, $\Card(A+B)\leq \Card(A)+\Card(B)-2$, and derive a contradiction. Let us consider the polynomial $f\in \F_p[X,Y]$ defined by $f=\prod _{c\in A+B} (X+Y-c)$. The degree of $f$ is equal to $\Card(A+B)\leq \Card(A)+\Card(B)-2$. Choose natural integers $u$ and $v$ such that $u+v=\Card(A+B)$, $u\leq\Card(A)-1$ and $v\leq \Card(B)-1$. The coefficient of $X^u Y^v$ is equal to the binomial coefficient $\binom{u+v}u$. Since $u+v\leq \Card(A)+\Card(B)-2\leq p-2$, this coefficient is nonzero in $\F_p$. By theorem 2, there exists $(a,b)\in A\times B$ such that $f(a,b)\neq 0$. However, one has $a+b\in A+B$, hence $f(a,b)=0$ by the definition of $f$. This contradiction concludes the proof that $\Card(A+B)\geq \Card(A)+\Card(B)-1$. It also concludes this post.
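
As a sanity check, the inequality can be verified by brute force for small moduli. The following Python sketch (the function names are mine, chosen for this illustration) enumerates all pairs of nonempty subsets of $\mathbf Z/p\mathbf Z$; it is of course exponential in $p$ and only meant for very small values.

```python
from itertools import combinations

def sumset(A, B, p):
    """The set of all a + b modulo p, for a in A and b in B."""
    return {(a + b) % p for a in A for b in B}

def check_cauchy_davenport(p):
    """Test Card(A+B) >= min(p, Card(A)+Card(B)-1) for all nonempty A, B in Z/pZ."""
    subsets = [set(c) for r in range(1, p + 1)
               for c in combinations(range(p), r)]
    return all(len(sumset(A, B, p)) >= min(p, len(A) + len(B) - 1)
               for A in subsets for B in subsets)

print(check_cauchy_davenport(5), check_cauchy_davenport(7))
# The modulus 4 is not prime, and A = B = {0, 2} violates the inequality.
print(check_cauchy_davenport(4))
```

The last line illustrates that primality matters: in $\mathbf Z/4\mathbf Z$, the subgroup $\{0,2\}$ satisfies $A+A=A$, which has only 2 elements while the bound would require 3.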

Combinatorics of partitions

2024-08-12T13:17:00.004+02:00

Divided powers are an algebraic gadget that emulate, in an arbitrary ring, the functions $x\mapsto x^n/n!$ for all integers $n$, together with the natural functional equations that they satisfy.

One of them is a nice binomial theorem without binomial coefficients: denoting $x^n/n!$ by $x^{[n]}$, one has $$ (x+y)^{[n]}=\sum_{k=0}^n x^{[n-k]} y^{[k]}. $$

Another formula looks at what happens when one iterates the construction: if everything happens nicely, one has $$ (x^{[n]})^{[m]}=\frac1{m!}\left(x^n/n!\right)^m = \frac1{(n!)^m m!} x^{nm} = \frac{(mn)!}{m! n!^m} x^{[mn]}. $$

In the development of the theory, the remark that comes then is the necessary observation that the coefficient $(mn)!/{m! n!^m}$ is an integer, which is not obvious since it is written as a fraction. Two arguments are given to justify this fact.

The formula $$ \frac{(mn)!}{m! n!^m} = \prod_{k=1}^{m} \binom{k n-1}{n-1} $$ which the authors claim can be proved by induction
The observation that $(mn)!/m!n!^m$ is the number of (unordered) partitions of a set with $mn$ elements into $m$ subsets with $n$ elements.

The latter fact is a particular case of a more general counting question: if $n=(n_1,\dots,n_r)$ are integers and $N=n_1+\dots+n_r$, then the number of (unordered) partitions of a set with $N$ elements into subsets of cardinalities $n_1,\dots,n_r$ is equal to $$\frac{N!}{\prod n_i! \prod_{k>0} m_k(n)!},$$ where $m_k(n)$ is the number of occurrences of $k$ in the sequence $(n_1,\dots,n_r)$.

The goal of this blog post is to present a combinatorial argument for the first equality, and an alternative expression of the second number as a product of integers. We also give a formal, group theoretical proof, that this quotient of factorials solves the given counting problem.

$\gdef\abs#1{\lvert#1\rvert}$ $\gdef\Card{\operatorname{Card}}$

Counting partitions of given type

Let $S$ be a set with $N$ elements. A partition of $S$ is a subset $\{s_1,\dots,s_r\}$ of $\mathfrak P(S)$ consisting of nonempty, pairwise disjoint, subsets whose union is $S$. Its *type* is the “multiset” of integers given by their cardinalities. Because no $s_i$ is empty, the type is a multiset of nonzero integers. It is slightly easier to authorize zero in this type; then a partition has to be considered as a multiset of subsets of $S$, still assumed pairwise disjoint, so that only the empty subset can be repeated.

The group $G$ of permutations of $S$ acts on the set $\Pi_S$ of partitions of $S$. This action preserves the type of a partition, so that we get an action on the set $\Pi_{S,n}$ of partitions of type $n$, for any multiset of integers $n$. This set is nonempty if and only if $\abs n=N$.

This action is transitive: Given two partitions $(s_1,\dots,s_r)$ and $(t_1,\dots t_r)$ with the same type $(n_1,\dots,n_r)$, numbered so that $\abs{s_i}=\abs{t_i}={n_i}$ for all $i$, we just define a permutation of $S$ that maps the elements of $s_i$ to $t_i$.

By the orbit-stabilizer theorem, the number of partitions of type $n$ is equal to $\Card(G)/\Card(G_s)$, where $G_s$ is the stabilizer of any partition $s$ of type $n$. Since $\Card(S)=N$, one has $\Card(G)=N!=\abs n!$.

On the other hand, the stabilizer $G_s$ of a partition $(s_1,\dots,s_r)$ can be described as follows. By an element $g$ of this stabilizer, the elements of $s_i$ are mapped to some set $s_j$ which has the same cardinality as $s_i$. In other words, there is a morphism of groups from $G_s$ to the subgroup of the symmetric group $\mathfrak S_r$ consisting of permutations that preserve the cardinality. This subgroup is a product of symmetric groups $\mathfrak S_{m_k}$, for all $k> 0$, where $m_k$ is the number of occurrences of $k$ in $(n_1,\dots,n_r)$. Here, we omit the factor corresponding to $k=0$ because our morphism doesn't see it.

This morphism is surjective, because it has a section. Given a permutation $\sigma$ of $\{1,\dots,r\}$ that preserves the cardinalities, we have essentially shown above how to construct a permutation of $S$ that induces $\sigma$, and this permutation belongs to $G_s$. More precisely, if we number the elements of $s_i$ as $(x_{i,1},\dots,x_{i,n_i})$, we lift $\sigma$ to the permutation that maps $x_{i,j}$ to $x_{\sigma(i),j}$ for all $i,j$. This makes sense since $n_{i}=n_{\sigma(i)}$ for all $i$.

The kernel of this morphism $G_s\to \prod_{k>0} \mathfrak S_{m_k}$ consists of the permutations that map each $s_i$ to itself. It is clearly equal to the product of the symmetric groups $\prod \mathfrak S_{n_i}$.

One thus has $\Card(G_s)=\prod n_i! \prod_{k>0} m_k!$, as was to be shown.
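
For small sets, the counting formula can be compared with a direct enumeration. Here is a Python sketch under the assumption that all $n_i$ are nonzero; the helper names `count_by_formula`, `set_partitions` and `count_by_enumeration` are mine.

```python
from math import factorial
from collections import Counter

def count_by_formula(n):
    """N! / (prod n_i! * prod_{k>0} m_k!) for a type n = (n_1, ..., n_r)."""
    denom = 1
    for ni in n:
        denom *= factorial(ni)
    for k, mk in Counter(n).items():
        if k > 0:
            denom *= factorial(mk)
    return factorial(sum(n)) // denom

def set_partitions(elements):
    """Generate all partitions of a list into nonempty blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for part in set_partitions(rest):
        # Put `first` into one of the existing blocks...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or into a new block of its own.
        yield [[first]] + part

def count_by_enumeration(n):
    """Count the partitions of {0, ..., N-1} whose block sizes are given by n."""
    target = sorted(n)
    return sum(1 for part in set_partitions(list(range(sum(n))))
               if sorted(len(block) for block in part) == target)

for n in [(2, 2), (1, 2, 3), (2, 2, 2)]:
    print(n, count_by_formula(n), count_by_enumeration(n))
```

For the three types above, both counts agree: 3, 60 and 15 respectively.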

A combinatorial interpretation of the first relation

As we said above, that relation $$ \frac{(mn)!}{m! n!^m} = \prod_{k=1}^{m} \binom{kn-1}{n-1} $$ can be proved by induction, just playing with factorials, but we want a combinatorial interpretation for it. (Here we assume that $n\geq 1$.)
By the preceding paragraph, the left hand side is the number of partitions of a set with $mn$ elements of type $(n,\dots,n)$. We may assume that this set is numbered $\{1,\dots,mn\}$.

Such a partition can be defined as follows. First, we choose the subset that contains $1$: this amounts to choosing $n-1$ elements among $\{2,\dots,mn\}$, and there are $\binom{mn-1}{n-1}$ possibilities. Now, we have to choose the subset that contains the smallest element which is not in the first chosen subset. There are $n-1$ elements to choose among the remaining $mn-n-1=(m-1)n-1$ ones, hence $\binom{(m-1)n-1}{n-1}$ possibilities. And we go on in the same way, until we have chosen the $m$ required subsets. (For the last one, there is of course only one such choice, and notice that the last factor is $\binom{n-1}{n-1}=1$.)
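
The identity between the quotient of factorials and the product of the binomial coefficients $\binom{kn-1}{n-1}$, for $k$ from $1$ to $m$, is easy to test numerically. A minimal Python check, with function names chosen for this illustration:

```python
from math import comb, factorial

def quotient_of_factorials(m, n):
    """(mn)! / (m! * n!^m), computed with exact integer arithmetic."""
    return factorial(m * n) // (factorial(m) * factorial(n) ** m)

def product_of_binomials(m, n):
    """Product of binom(k*n - 1, n - 1) for k = 1, ..., m."""
    result = 1
    for k in range(1, m + 1):
        result *= comb(k * n - 1, n - 1)
    return result

# Compare both sides for small values of m and n (with n >= 1).
print(all(quotient_of_factorials(m, n) == product_of_binomials(m, n)
          for m in range(1, 8) for n in range(1, 8)))
```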

An integral formula for the number of partitions of given type

Finally, we wish to give an alternative formula for the number of partitions of given type of a set $S$ that makes it clear that it is an integer. Again, we can assume that the set $S$ is equal to $\{1,\dots,N\}$ and that $n=(n_1,\dots,n_r)$ is a multiset of integers such that $\abs n=N$.

Let us write $n$ in another way, $n=(0^{m_0},1^{m_1},\dots)$, meaning that $0$ is repeated $m_0$ times, $1$ is repeated $m_1$ times, etc. One has $N=\sum k m_k$. The idea is that we can first partition $S$ into subsets of cardinalities $m_1$, $2m_2$,… and then subpartition these subsets: the first one into $m_1$ subsets with one element, the second into $m_2$ subsets with two elements, etc.

The number of ordered partitions with $m_1$, $2m_2$… elements is the multinomial number $$ \binom{N}{m_1,2m_2,\dots}.$$ And the other choice is the product of integers as given in the preceding section. This gives $$ \frac{\abs n!}{\prod_i n_i! \prod_{k>0}m_k(n)!} = \binom{N}{m_1,2m_2,\dots}\prod_{k>0} \frac{(km_k)!}{k!^{m_k} m_k(n)!}.$$ We can also write the fractions that appear in the product as integers: $$ \frac{\abs n!}{\prod_i n_i! \prod_{k>0}m_k(n)!} =\binom{N}{m_1,2m_2,\dots} \prod_{k>0} \prod_{m=1}^{m_k} \binom {km-1}{k-1}.$$
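
Here is a short Python sketch comparing the quotient of factorials with the product of integers on the right-hand side. The multinomial coefficient is computed as a product of binomial coefficients, and the function names are mine.

```python
from math import comb, factorial
from collections import Counter

def quotient_formula(n):
    """|n|! / (prod n_i! * prod_{k>0} m_k!)."""
    denom = 1
    for ni in n:
        denom *= factorial(ni)
    for k, mk in Counter(n).items():
        if k > 0:
            denom *= factorial(mk)
    return factorial(sum(n)) // denom

def integral_formula(n):
    """Multinomial coefficient times the products of binom(k*m - 1, k - 1)."""
    multiplicities = Counter(k for k in n if k > 0)
    remaining = sum(n)
    result = 1
    for k, mk in multiplicities.items():
        # One factor of the multinomial binom(N; m_1, 2 m_2, ...).
        result *= comb(remaining, k * mk)
        remaining -= k * mk
        # Number of partitions of k*m_k elements into m_k subsets of size k.
        for m in range(1, mk + 1):
            result *= comb(k * m - 1, k - 1)
    return result

n = (1, 2, 2, 3)
print(quotient_formula(n), integral_formula(n))
```

For the type $(1,2,2,3)$, both expressions give $8\cdot 35\cdot 3=840$.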

Number theory and finite automata

2024-07-20T18:31:00.003+02:00

$\gdef\abs#1{\lvert#1\rvert}$

There are many parts of number theory I don't know of, and today's story is about one I learnt very recently, at the intersection of number theory and computer science. It is about the natural numbers that we can recognize using a finite automaton.

Finite automata and automatic functions

Finite automata

Finite automata are very elementary computers. These machines can receive instructions of finitely many sorts, and can be in finitely many states; moreover, their behaviour is prescribed: when, in a given state, they receive one instruction, they just move to another state. Mathematically, one can say that there is a finite set $A$ of instructions, a finite set $S$ of states, and a mapping $\phi\colon A \times S\to S$. One state $\varepsilon$ is labeled as the initial state; when reading a list of instructions $[a_1,\dots,a_m]$, the machine goes through the states $s_0=\varepsilon$, $s_1=\phi(a_1,s_0)$, $s_2=\phi(a_2,s_1)$, etc. until it reaches and outputs its final state $s_m$. All in all, the automaton defines a function $\Phi$ from the set $A^*$ of lists in $A$ to the set $S$. Such functions are called automatic.

In practice, some states could be labeled as admissible, in which case the machine unlocks the door, and the set of lists $\alpha\in A^*$ such that $\Phi(\alpha)$ is admissible is then called the language of the automaton.
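
In code, such a machine is little more than a lookup table. Here is a minimal Python sketch, where the transition map $\phi$ is stored as a dictionary indexed by pairs (instruction, state); the toy automaton, which tracks the parity of the number of instructions `"1"` received, is an example of mine.

```python
def run_automaton(phi, initial, word):
    """Return the final state reached from `initial` after reading `word`."""
    state = initial
    for a in word:
        state = phi[(a, state)]
    return state

# Toy example: instructions "0" and "1", states "even" and "odd".
phi = {
    ("0", "even"): "even", ("0", "odd"): "odd",
    ("1", "even"): "odd",  ("1", "odd"): "even",
}

print(run_automaton(phi, "even", "1101"))   # three "1"s have been read
```

If one declares the state `"odd"` admissible, the language of this automaton consists of the words containing an odd number of `"1"`s.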

The basic question of automata theory is to describe the (so-called “regular”) languages recognizable by some automaton, more generally the automatic functions. It appears that they are of a very restricted nature: due to the elementary nature of finite automata, they are much more restricted than the kind of languages that would be recognizable by a program in the programming language of your choice. The reason is that they have a fixed memory while “theoretical” computers, as modelled by Turing machines for example, have a potentially infinite memory, that is, as large as the computation may require.

Numbers

Number theory enters when you decide that the set $A$ of instructions are digits 0, 1, 2,…, 9 and the lists in $A^*$ represent a number. You then wish to understand what numbers can be recognizable, or what functions $\Phi\colon\mathbf N\to S$ are automatic, for a finite set $S$. This is what happens, for example, in the very elementary finite automata that stand at almost all Paris doors as electronic locks. They recognize exactly one 4-digit number (and any other 4-digit number puts them back in the initial state).

When it comes to digits, they can be the usual ones, 0, 1, 2,… , 9, writing numbers in base 10, but any other basis can be considered, and any set of digits that allows one to write all numbers can be used. For example, one can consider writing numbers in base 3, with digits 0, 1, 2, 3, 4.

Let us thus fix some basis $b$. The first theorem is that the set of automatic functions does not depend on the set of digits which is considered. The reason is that one can build another kind of machine (called a transducer) that works like a finite automaton except that it outputs something at each step. Here, we can have a transducer that outputs the digits in some representation given the digits in another one. There could be carries to take care of, but when the basis is the same, their size is bounded, so that finitely many states suffice to keep track of them. It is important for this argument that the basis stays the same, and we will see that in a more striking way later on. For this reason, we will not specify the set of digits in the sequel.

Examples

Here are some examples of recognizable sets of numbers.

The empty set works. It suffices to have no terminal state.
Singletons work. For example, to recognize “314”, one needs one initial state “$\varepsilon$”, three states labeled “3”, “31” and “314”, and a junk state “J”. The machine moves from “$\varepsilon$” to “3” if it receives “3”, and to “J” otherwise; from “3” to “31” if it receives “1”, etc.
The union and the intersection of two recognizable sets are recognizable. Just make an automaton whose states are pairs of states of the two automata that recognize the given sets, and simulate the moves of each of them. Similarly, the complement of a recognizable set is recognizable.
By the previous examples, finite sets and complements of finite sets are recognizable.
Here is a number theoretically more interesting example: arithmetic progressions are recognizable. To produce an automaton that, say, recognizes all integers congruent to 1 modulo 4, it suffices to have a machine with 4 states that computes the Euclidean remainder step by step.
Another number theoretically interesting example: powers of $b$. Indeed their writing in base $b$ consists of one digit “1” followed by a series of “0”.
A number theoretically annoying property of recognizable sets: since a finite automaton has finitely many states, the “pumping lemma” shows that for any large enough recognizable number, one can replicate at will some inner part of the writing and get another number. I suppose one can deduce from this property that the set of prime numbers cannot be recognizable. (This is exercise 5.8.12 in the book Automatic sequences, by J-P. Allouche and J. Shallit.)
Numbers recognizable in base $b$ are also recognizable in base $b^2$ (or some other power) and conversely: it suffices to emulate the base $b^2$-writing of a number using base $b$-digits, or conversely.
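
The arithmetic progression example can be spelled out in a few lines of Python. This is a sketch under the assumption that digits are read from the most significant one; the states are the possible remainders, and reading a digit $d$ in state $r$ leads to the state $(br+d) \bmod 4$. The function names are mine.

```python
def make_remainder_automaton(modulus, base=10):
    """Transition table: (digit, state) -> new state, states being remainders."""
    return {(d, r): (base * r + d) % modulus
            for d in range(base) for r in range(modulus)}

def digits_of(n, base=10):
    """Digits of n in the given base, most significant first."""
    digits = []
    while True:
        n, d = divmod(n, base)
        digits.append(d)
        if n == 0:
            return digits[::-1]

def recognizes(n, residue=1, modulus=4, base=10):
    """Run the automaton on the digits of n; accept if the final state is `residue`."""
    phi = make_remainder_automaton(modulus, base)
    state = 0                       # initial state: remainder 0
    for d in digits_of(n, base):
        state = phi[(d, state)]
    return state == residue

print([n for n in range(30) if recognizes(n)])
```

The machine has exactly 4 states when the modulus is 4, whatever the base, and the same construction works for any arithmetic progression.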

Building from characteristic functions of admissible languages, the preceding results can be turned to examples of automatic functions. In particular, we obtain that functions which are ultimately periodic are automatic.

Cobham's theorem

Statement

Number theory can also enter a topic from two sides at a time, and here is the point of this blog post. Can one compare the sets of recognizable numbers from different bases? By the following theorem, they are genuinely distinct.

Theorem (Cobham, 1969). — Let $S$ be a finite set and let $f\colon \mathbf N\to S$ be a function which is automatic in two bases $b$ and $c$. If $b$ and $c$ are not powers of the same integer, then $f$ is ultimately periodic.
Since ultimately periodic functions are automatic in any basis, this theorem is definitely optimal.

The original proof of Cobham's theorem is said to be difficult, and this theorem has obtained various other simpler proofs, often partially flawed. I wish to describe here the recent proof by Thijmen Krebs (publisher version, arXiv:1801.06704) which calls itself “more reasonable”.

The assumption that the two bases (denoted by $a$ and $b$ in the proof below) are not powers of the same integer will appear in a different, equivalent, form in the proof: they do not have common powers.

Local periods

To prove that $f$ is ultimately periodic, Krebs proves that $f$ is periodic on large overlapping intervals.

Lemma 1. — Let $f: \mathbf N\to S$ be a function and let $I$ and $J$ be intervals of integers. We assume that $f$ has period $p$ on $I$ and period $q$ on $J$. If $\operatorname{Card}(I\cap J)\geq p+q$, then $f$ has period $p$ on the interval $I\cup J$.

The proof is elementary. Consider $x\in I\cup J$ such that $x+p\in I\cup J$. If both $x$ and $x+p$ belong to $I$, then $f(x)=f(x+p)$. Let us assume otherwise. If $x$ belongs to $I$, but not $x+p$, then $x+p\in J$; since $I\cap J$ has at least $p+q$ elements, $x$ must belong to $J$. The other cases are similar and show that both $x, x+p$ belong to $J$. Using that $I\cap J$ is large, we pick elements $y,y+p$ in $I\cap J$ such that $x\equiv y \pmod q$. Then $f(x)=f(y)=f(y+p)=f(x+p)$, the first and last equality using the $q$-periodicity on $J$, and the middle one the $p$-periodicity on $I$.

A diophantine approximation lemma

Lemma 2. — Let $a$ and $b$ be two real numbers such that $a,b>1$. For every $\varepsilon>0$, there exist nonzero natural numbers $m$ and $n$ such that $\abs{a^m-b^n}<\varepsilon b^n$.

The result is obvious if $a$ and $b$ have a common power, so we may assume that this is not the case. Then various proofs are possible. For example, one may consider the additive subgroup of $\mathbf R$ generated by $\log(a)$ and $\log(b)$; by assumption, it is dense, hence there exists a small linear combination $m\log(a)-n\log(b)$; taking exponentials, $a^m/b^n$ is close to $1$, as claimed.

Krebs gives a “pigeonhole-like” proof which is nice. For every integer $m\geq1$, consider the unique integer $n_m\geq 1$ such that $1\leq a^m/b^{n_m}<b$. There must be two integers $m$ and $p$ such that $m<p $ and such that $a^m/b^{n_m}$ and $a^p/b^{n_p}$ differ by at most $\varepsilon$, and this implies the result.
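
For integers $a$ and $b$, exponents as in lemma 2 can be found by a plain search. The following Python sketch is not the pigeonhole argument, just an exhaustive exploration with exact arithmetic; the name `close_powers` and the bound `max_m` are assumptions of this illustration.

```python
from fractions import Fraction

def close_powers(a, b, eps, max_m=500):
    """Search for m, n >= 1 with |a^m - b^n| < eps * b^n; None if not found."""
    for m in range(1, max_m + 1):
        am = a ** m
        n, bn = 1, b
        # For small eps, larger powers of b are too far from a^m to qualify.
        while bn <= am * b:
            if abs(am - bn) < eps * bn:
                return m, n
            n, bn = n + 1, bn * b
    return None

m, n = close_powers(2, 3, Fraction(1, 10))
print(m, n, Fraction(2 ** m, 3 ** n))
```

Smaller values of $\varepsilon$ require larger exponents, in line with the fact that the approximations come from good rational approximations of $\log(b)/\log(a)$.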

Some functional equations

Let $f\colon \mathbf N\to S$ be a function which is automatic in some basis $c$, computed by an automaton with set of states $S$. For every $s\in S$, let $L(s)$ be the set of integers $n$ such that $f(n)=s$. Note that these sets form a partition of $\mathbf N$.

The following lemma is a variant of the pumping lemma in the theory of automata; it is by it that automatic functions acquire their specific properties.

Lemma 3. — For all integers $x,y\in L(s)$, every integer $n\geq 0$ and any integer $z$ such that $0\leq z < c^n$, one has $f(x c ^ n+z)=f(yc ^n +z)$.

Indeed, when the automaton reads the integer $xc^n+z$, it first reads $x$, and goes to state $s$; similarly, when it reads $yc^n+z$, it first reads $y$ and reaches the state $s$ too. In both cases it then reads $z$ and ends up in the same final state.

Construction of local periods

We now consider a function $f$ which is automatic in two bases $a$ and $b$ which do not have common powers. We thus have two finite automata $A$ and $B$, with sets of states $S$ and $T$. For each state $s\in S$, we have a set $L(s)$ of integers as above; these sets form a partition of $\mathbf N$. Let $S^\infty$ be the set of states $s\in S$ such that $L(s)$ is infinite. For $t\in T$, we define $M(t)$ similarly, as well as $T^\infty$.

Let $s\in S^\infty$. Since the $M(t)$, for $t\in T$, form a finite partition of $\mathbf N$ and $L(s)$ is infinite, there exists a state $t(s)$ and two distinct integers $x(s)$ and $y(s)\in L(s)\cap M(t(s))$. Let $K$ be an integer strictly greater than all $x(s), y(s)$, for $s\in S^\infty$.

Applying lemma 2 for $\varepsilon = 1/6K$, we obtain integers $m,n$ such that $\abs{a^m-b^n}<a^m/6K$. For $s\in S^\infty$, we set $p(s)=\pm (x(s)-y(s))(a^m-b^n)$; since $a$ and $b$ have no common power, this is a nonzero integer, and we choose the sign so that it is strictly positive.

For each integer $x$, we define an interval $I_x$ of integers by $I_x=[(x+\frac13)a^m,(x+\frac53)a^m]\cap\mathbf N$.

Lemma 4. — For every state $s\in S^\infty$ and any integer $x\in L(s)$, the function $f$ has period $p(s)$ on $I_x$.

To prove this, we consider $z\in [\frac13a^m,\frac53a^m]$ and prove that $f(xa^m+z)=f(xa^m+z+p(s))$. First of all, we have the inequality $$ \abs{z-x(s)(a^m-b^n)-b^n} \leq \abs{z-a^m}+(x(s)+1) \abs{a^m-b^n} \leq \frac23 a^m + K\abs{a^m-b^n} \leq \frac56a^m\leq b^n $$ because $\abs{a^m-b^n}\leq \frac16 a^m$. Consequently, $z$ is written with at most $n$ digits in base $b$. Applying lemma 3, we have $ f(x a^m+z) = f(x(s)a^m+z)$, because $x$ and $x(s)\in L(s)$. Applying lemma 3 again, this is equal to $f(x(s)a^m+z-x(s)(a^m-b^n))$, hence to $f(y(s)a^m+z-x(s)(a^m-b^n)) = f(y(s)a^m+z+p(s))$. Applying lemma 3 a third time, this is equal to $f(xa^m+z+p(s))$. This concludes the proof.

Conclusion

Since the sets $L(s)$, for $s\notin S^\infty$, are finite, the sets $L(s)$, for $s\in S^\infty$, cover all integers larger than some integer, say $x$.

For $y\geq x$, there exists $s\in S^\infty$ such that $y\in L(s)$ and we have shown that $f$ is periodic with period $p(s)\leq \frac16a^m$ on the interval $I(y)=[(y+\frac13)a^m,(y+\frac53)a^m]$. By abuse of notation, write $p(y)=p(s)$.

Let $J(z)$ be the union of the intervals $I(y)$, for $x\leq y\leq z$. One has $J(x)=I(x)$; for $z>x$, the intersection $I(z)\cap J(z-1)$ is $[(z+\frac13)a^m,(z+\frac23)a^m]$ hence has at least $\frac13a^m$ points, thus at least $p(x)+p(z)$. Using lemma 1, one proves by induction on $z$ that $f$ has period $p(x)$ on $J(z)$, for all $z$.

This concludes the proof that $f$ is ultimately periodic.

Evaluating the operator norms of matrices

2024-04-14T17:19:00.000+02:00

Let $E$ and $F$ be normed vector spaces, over the real or complex numbers, and let $u\colon E\to F$ be a linear map. The continuity of $u$ is equivalent to the existence of a real number $c$ such that $\|u(x)\|\leq c \|x\|$ for every $x\in E$, and the least such real number is called the operator norm of $u$; we denote it by $\|u\|$. It defines a norm on the linear space $\mathscr L(E;F)$ of continuous linear maps from $E$ to $F$ and as such is quite important. When $E=F$, it is also related to the spectrum of $u$ and is implicitly at the heart of the Gershgorin criterion for localization of eigenvalues.

$\gdef\R{\mathbf R}\gdef\norm#1{\lVert#1\rVert}\gdef\Abs#1{\left|#1\right|}\gdef\abs#1{\lvert#1\rvert}$

However, even in the simplest cases of matrices, its explicit computation is not trivial at all, and as we'll see even less trivial than what is told in algebra classes, as I learned by browsing Wikipedia when I wanted to prepare a class on the topic.

Since I'm more of an abstract kind of guy, I will use both languages, of normed spaces and matrices, for the first one allows one to explain a few things at a more fundamental level. I'll make the translation, though. Also, to be specific, I'll work with real vector spaces.

So $E=\R^m$, $F=\R^n$, and linear maps in $\mathscr L(E;F)$ are represented by $n\times m$ matrices. There are plenty of interesting norms on $E$, but I will concentrate the discussion on the $\ell^p$-norm given by $\norm{(x_1,\dots,x_m)} = (|x_1|^p+\dots+|x_m|^p)^{1/p}$. Similarly, I will consider the $\ell^q$-norm on $F$ given by $\norm{(y_1,\dots,y_n)} = (|y_1|^q+\dots+|y_n|^q)^{1/q}$. Here, $p$ and $q$ are elements of $[1;+\infty\mathclose[$. It is also interesting to allow $p=\infty$ or $q=\infty$; in this case, the expression defining the norm is just replaced by $\sup(|x_1|,\dots,|x_m|)$ and $\sup(|y_1|,\dots,|y_n|)$ respectively.

Duality

Whatever norm is given on $E$, the dual space $E^*=\mathscr L(E;\mathbf R)$ is endowed with the dual norm, which is just the operator norm of that space: for $\phi\in E^*$, $\norm\phi$ is the least real number such that $|\phi(x)|\leq \norm\phi \norm x$ for all $x\in E$. And similarly for $F$. To emphasize duality, we will write $\langle x,\phi\rangle$ instead of $\phi(x)$.

Example. — The dual norm of the $\ell^p$ norm can be computed explicitly, thanks to the Hölder inequality \[ \Abs{ x_1 y_1 + \dots + x_n y_n } \leq (|x_1|^p+\dots + |x_n|^p)^{1/p} (|y_1|^q+\dots+|y_n|^q)^{1/q}\] if $p,q$ are related by the relation $\frac1p+\frac1q=1$. (When $p=1$, this gives $q=\infty$, and symmetrically $p=\infty$ if $q=1$.) Moreover, this inequality is optimal in the sense that for any $(x_1,\dots,x_n)$, one may find a nonzero $(y_1,\dots,y_n)$ for which the inequality is an equality. What this inequality says about norms/dual norms is that if one identifies $\R^n$ with its dual, via the duality bracket $\langle x,y\rangle=x_1y_1+\dots+x_n y_n$, the dual of the $\ell^p$-norm is the $\ell^q$-norm, for that relation $1/p+1/q=1$.

If $u\colon E\to F$ is a continuous linear map, it has an adjoint (or transpose) $u^*\colon F^*\to E^*$, which is defined by $u^*(\phi)= \phi\circ u$, for $\phi\in F^*$. In terms of the duality bracket, this rewrites as \[ \langle \phi, u(x)\rangle = \langle u^*(\phi),x\rangle\] for $x\in E$ and $\phi\in F^*$.

Proposition. — One has $\norm{u^*}=\norm u$.

For $\phi\in F^*$, $\norm{u^*(\phi)}$ is the least real number such that $|u^*(\phi)(x)|\leq \norm{u^*(\phi)} \norm x$ for all $x\in E$. Since one has \[ |u^*(\phi)(x)|= |\langle u^*(\phi),x\rangle|=|\langle\phi, u(x)\rangle|\leq \norm\phi \norm{u(x)} \leq \norm\phi \norm u\norm x, \] we see that $\norm {u^*(\phi)}\leq \norm\phi\norm u$ for all $\phi$. As a consequence, $\norm{u^*}\leq \norm u$.
To get the other inequality, we wish to find a nonzero $\phi$ such that $\norm{u^*(\phi)}=\norm{u}\norm\phi$. This $\phi$ should thus be such that there exists $x$ such that $|\langle u^*(\phi),x\rangle|=\norm u\norm\phi\norm x$ which, by the preceding computation, means that $|\langle\phi, u(x)\rangle|=\norm\phi\norm u\norm x$. Such $\phi$ and $x$ need not exist in general, but we can find reasonable approximations. Start with a nonzero $x\in E$ such that $\norm{u(x)}$ is close to $\norm u\norm x$; then using the Hahn-Banach theorem, find a nonzero $\phi\in F^*$ such that $\norm\phi=1$ and $|\phi(u(x))|=\norm {u(x)}$. We see that $\langle\phi, u(x)\rangle$ is close to $\norm u\norm\phi\norm x$, and this concludes the proof.
In some cases, in particular in the finite dimension case, we can use biduality to get the other inequality. Indeed $E^{**}$ identifies with $E$, with its initial norm, and $u^{**}$ identifies with $u$. By the first case, we thus have $\norm{u^{**}}\leq \norm {u^*}$, hence $\norm u\leq\norm{u^*}$.

The case $p=1$

We compute $\norm{u}$ when $E=\mathbf R^m$ is endowed with the $\ell^1$-norm, and $F$ is arbitrary. The linear map $u\colon E\to F$ thus corresponds to $m$ vectors $u_1,\dots,u_m$ of $F$, and one has \[ u((x_1,\dots,x_m))=x_1 u_1+\dots+x_m u_m. \] By the triangle inequality, we have \[ \norm{u((x_1,\dots,x_m))} \leq |x_1| \norm{u_1}+\dots+\abs{x_m}\norm{u_m} \] hence \[ \norm{u((x_1,\dots,x_m))} \leq (\abs{x_1} +\dots+\abs{x_m}) \sup(\norm{u_1},\dots,\norm{u_m}). \] Consequently, \[ \norm{u} \leq \sup(\norm{u_1},\dots,\norm{u_m}). \] On the other hand, taking $x=(x_1,\dots,x_m)$ of the form $(0,\dots,1,0,\dots)$, where the $1$ is at place $k$ such that $\norm{u_k}$ is largest, we have $\norm{x}=1$ and $\norm{u(x)}=\norm{u_k}$. The preceding inequality is thus an equality.

In the matrix case, this shows that the $(\ell^1,\ell^q)$-norm of an $n\times m$ matrix $A$ is the supremum of the $\ell^q$-norms of the columns of $A$.
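The column formula is easy to test numerically. The following Python sketch (with hypothetical helper names, and a matrix stored as a list of rows) computes the largest $\ell^q$-norm of a column, so that one can compare it with $\norm{Ax}_q/\norm{x}_1$ for a few vectors $x$.

```python
import math

def lq_norm(x, q):
    """l^q norm of a list of real numbers; q may be math.inf."""
    if q == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** q for t in x) ** (1.0 / q)

def apply(A, x):
    """Matrix-vector product, A being given as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def l1_to_lq_norm(A, q):
    """Operator norm of A from l^1 to l^q: the largest l^q norm of a column."""
    return max(lq_norm(column, q) for column in zip(*A))
```

With `A = [[1.0, -2.0], [3.0, 4.0]]` and `q = 2`, the two columns have norms $\sqrt{10}$ and $\sqrt{20}$, and the value $\sqrt{20}$ is attained at the second basis vector, as in the argument above.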

The case $q=\infty$

We compute $\norm{u}$ when $F=\mathbf R^n$ is endowed with the $\ell^\infty$-norm, and $E$ is arbitrary. A direct computation is possible in the matrix case, but it is not really illuminating, and I find it better to argue geometrically, using a duality argument.

Namely, we can use $u^*\colon F^*\to E^*$ to compute $\norm{u}$, since $\norm u=\norm{u^*}$. We have seen above that $F^*$ is $\mathbf R^n$, endowed with the $\ell^1$-norm, so that we have computed $\norm{u^*}$ in the preceding section. The basis $(e_1,\dots,e_n)$ of $F$ gives a dual basis $(\phi_1,\dots,\phi_n)$, and one has \[ \norm{u}=\norm{u^*} = \sup (\norm{u^*(\phi_1)},\dots,\norm{u^*(\phi_n)}). \]

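In the matrix case, this shows that the $(\ell^p,\ell^\infty)$-norm of an $n\times m$ matrix $A$ is the supremum of the $\ell^{p'}$-norms of the rows of $A$, where $p'$ is the conjugate exponent of $p$, given by $1/p+1/p'=1$: the rows of $A$ are the linear forms $u^*(\phi_i)$ on $E$, and they are measured with the dual norm. For example, the $(\ell^\infty,\ell^\infty)$-norm of $A$ is the supremum of the $\ell^1$-norms of its rows.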

Relation with the Gershgorin circle theorem

I mentioned the Gershgorin circle theorem as being in the same spirit as the computation of an operator norm, because its proof relies on the same kind of estimations. In fact, no additional computation is necessary!

Theorem (Gershgorin “circles theorem”). — Let $A=(a_{ij})$ be an $n\times n$ matrix and let $\lambda$ be an eigenvalue of $A$. There exists an integer $i$ such that \[ \abs{\lambda-a_{ii}}\leq \sum_{j\neq i} \abs{a_{ij}}. \]

For the proof, one writes $A=D+N$ where $D$ is diagonal and $N$ has zeroes on its diagonal, and, for an eigenvector $x$ associated with $\lambda$, one writes $\lambda x=Ax=Dx+Nx$, hence $(\lambda I-D)x=Nx$. Endow $\R^n$ with the $\ell^\infty$-norm. We can assume that $\norm x=1$. Then the norm of the right hand side is bounded above by $\norm N$, while the norm of the left hand side is $\sup(\abs{\lambda-a_{ii}} |x_i|)\geq |\lambda-a_{ii}|$ if $i$ is chosen so that $|x_i|=\norm x=1$. Given the above formula for $\norm N$, this implies the theorem.
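To see the statement at work, here is a small Python sketch restricted to $2\times 2$ matrices, where the eigenvalues are given by the quadratic formula. The function names are hypothetical; the theorem itself is of course not limited to that size.

```python
import cmath

def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix, as roots of T^2 - tr(A) T + det(A)."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return [(trace + root) / 2, (trace - root) / 2]

def gershgorin_discs(A):
    """List of pairs (center a_ii, radius sum over j != i of |a_ij|)."""
    n = len(A)
    return [(A[i][i], sum(abs(A[i][j]) for j in range(n) if j != i))
            for i in range(n)]

def in_some_disc(value, discs, tolerance=1e-12):
    """True if value lies in at least one of the discs (up to rounding)."""
    return any(abs(value - center) <= radius + tolerance
               for center, radius in discs)
```

For `A = [[4, 1], [2, -1]]`, the discs are centered at $4$ and $-1$, with radii $1$ and $2$, and the two eigenvalues $(3\pm\sqrt{33})/2$ each lie in one of them.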

The case $p=q=2$

Since Euclidean spaces are very useful in applications, this may be a very important case to consider, and we will see that the answer is not at all straightforward from the coefficients of the matrix.

We have to bound from above $\norm{u(x)}$. Using the scalar product, we write \[ \norm{u(x)}^2 = \langle u(x),u(x)\rangle = \langle u^*u(x),x\rangle, \] where $u^*\colon F\to E$ now denotes the adjoint of $u$, which identifies with the transpose of $u$ if one identifies $E$ with $E^*$ and $F$ with $F^*$ by means of their scalar products. Using the Cauchy-Schwarz inequality, we get that $\norm{u(x)}^2\leq \norm{u^*u(x)}\norm x\leq \norm{u^*u} \norm x^2$, hence $\norm{u} \leq \norm{u^*u}^{1/2}$. This inequality is remarkable because on the other hand, we have $\norm{u^*u}\leq \norm{u^*}\norm{u}=\norm{u}^2$. Consequently, $\norm{u}=\norm{u^*u}^{1/2}$.

This formula might not appear to be so useful, since it reduces the computation of the operator norm of $u$ to that of $u^*u$. However, the linear map $u^*u$ is a positive self-adjoint endomorphism of $E$, hence (assuming that $E$ is finite dimensional here) it can be diagonalized in an orthonormal basis. We then see that $\norm{u^*u}$ is the largest eigenvalue of $u^*u$, which is also its spectral radius. The square roots of the eigenvalues of $u^*u$ are also called the singular values of $u$, hence $\norm u$ is the largest singular value of $u$.

One can play with duality as well, and we have $\norm{u}=\norm{uu^*}^{1/2}$.
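The relation $\norm u=\norm{u^*u}^{1/2}$ suggests a simple way to approximate $\norm u$ numerically. The sketch below uses power iteration on the matrix of $u^*u$; this is a technique I am swapping in for illustration, and it assumes that the starting vector $(1,\dots,1)$ is not orthogonal to the eigenspace of the largest eigenvalue. The function names are hypothetical.

```python
import math

def gram_matrix(A):
    """Matrix of u^*u, that is A^T A, for a real matrix A given as a list of rows."""
    columns = list(zip(*A))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
            for ci in columns]

def spectral_norm(A, iterations=500):
    """Approximate the (l^2, l^2) norm of A by power iteration on A^T A.

    Assumption: the start vector (1, ..., 1) has a nonzero component along
    an eigenvector for the largest eigenvalue of A^T A.
    """
    G = gram_matrix(A)
    x = [1.0] * len(G)
    for _ in range(iterations):
        y = [sum(g * t for g, t in zip(row, x)) for row in G]
        size = math.sqrt(sum(t * t for t in y))
        if size == 0.0:
            # The start vector is in the kernel of A^T A; under the
            # assumption above this only happens when A = 0.
            return 0.0
        x = [t / size for t in y]
    y = [sum(g * t for g, t in zip(row, x)) for row in G]
    rayleigh = sum(a * b for a, b in zip(x, y))  # close to the top eigenvalue
    return math.sqrt(max(rayleigh, 0.0))
```

For the matrix `[[1.0, 1.0], [0.0, 1.0]]`, one has $u^*u=\begin{pmatrix}1&1\\1&2\end{pmatrix}$, whose largest eigenvalue is $(3+\sqrt5)/2$, so that the norm is the golden ratio $(1+\sqrt5)/2$: not something one reads off the coefficients.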

Other cases?

There are general inequalities relating the various $\ell^p$-norms of a vector $x\in\R^m$, and these can be used to deduce inequalities for $\norm u$, when $E=\R^m$ has an $\ell^p$-norm and $F=\R^n$ has an $\ell^q$-norm. However, given the explicit value of $\norm u$ for $(p,q)=(2,2)$ and the fact that no closed form expression exists for the spectral radius, it is unlikely that there is a closed form expression in the remaining cases.

Worse: the exact computation of $\norm u$ in the cases $(\infty,1)$, $(\infty,2)$ or $(2,1)$ is known to be NP-hard, and I try to explain this result below, following J. Rohn (2000) (“Computing the Norm ∥ A ∥∞,1 Is NP-Hard”, Linear and Multilinear Algebra 47 (3), p. 195‑204). I concentrate on the $(\infty, 1)$ case; the $(\infty,2)$ case is supposed to be analogous (see Joel Tropp's thesis, top of page 48, quoted by Wikipedia, but no arguments are given), and the case $(2,1)$ would follow by symmetry.

A matrix from a graph

Let us consider a finite (undirected, simple, without loops) graph $G$ on the set $V=\{1,\dots,n\}$ of $n$ vertices, with set of edges $E$, and let us introduce the following $n\times n$ matrix $A=(a_{ij})$, a variant of the adjacency matrix of the graph $G$ (actually $nI-B$, where $I$ is the identity matrix and $B$ is the adjacency matrix of $G$):

One has $a_{ii}=n$ for all $i$;
If $i\neq j$ and vertices $i$ and $j$ are connected by an edge, then $a_{ij}=-1$;
Otherwise, $a_{ij}=0$.

For any subset $S$ of $V$, the cut $c(S)$ of $S$ is the number of edges which have one endpoint in $S$ and the other outside of $S$.

Proposition. — The $(\ell^\infty,\ell^1)$-norm of $A$ is given by \[ 4 \sup_{S\subseteq V} c(S) - 2 \operatorname{Card}(E) + n^2. \]

The proof starts with the following observation, valid for more general matrices.

Lemma. — The $(\ell^\infty,\ell^1)$-norm of a symmetric positive $n\times n$ matrix $A$ is given by $\norm A = \sup_z \langle z, Az \rangle$ where $z$ runs among the set $Z$ of vectors in $\R^n$ with coordinates $\pm1$.

The vectors of $Z$ are the vertices of the polytope $[-1;1]^n$, which is the unit ball of $\R^n$ for the $\ell^\infty$-norm. Consequently, every vector of $[-1;1]^n$ is a convex combination of vectors of $Z$. Writing $x=\sum_{z\in Z} c_z z$, with coefficients $c_z\geq 0$ such that $\sum c_z=1$, we have \[\norm {Ax} = \norm{\sum c_z Az} \leq \sum c_z \norm {Az}\leq \sup_{z\in Z} \norm{Az}. \] The other inequality being obvious, we already see that $\norm A=\sup_{z\in Z}\norm{Az}$. Note that this formula holds for any norm on the codomain.
If, for $z\in Z$, one writes $Az=(y_1,\dots,y_n)$, one has $\norm{Az}=|y_1|+\dots+|y_n|$, because the codomain is endowed with the $\ell^1$-norm, so that $\langle z, Az\rangle = \sum z_i y_i\leq \norm{Az}$. We thus get the inequality $\sup_{z\in Z} \langle z,Az\rangle \leq \norm A$.
Let us now use the fact that $A$ is symmetric and positive. Fix $z\in Z$, set $Az=(y_1,\dots,y_n)$ as above, and define $x\in Z$ by $x_i=1$ if $y_i\geq0$ and $x_i=-1$ otherwise. One thus has $\langle x, Az\rangle=\sum |y_i|=\norm{Az}$. Since $A$ is symmetric and positive, one has $\langle x-z, A(x-z)\rangle\geq0$, and this implies \[2\norm{Az}= 2\langle x, Az\rangle \leq \langle x, Ax\rangle+\langle z, Az\rangle, \] so that $\norm{Az}\leq \sup_{x\in Z} \langle x, Ax\rangle$. This concludes the proof.

To prove the proposition, we will apply the preceding lemma. We observe that $A$ is symmetric, by construction. It is also positive, since for every $x\in\R^n$, one has \[\langle x,Ax\rangle=\sum a_{ij}x_ix_j \geq n \sum x_i^2 -\sum_{i\neq j} |x_i| |x_j| = (n+1)\sum x_i^2- (\sum |x_i|)^2\geq \sum x_i^2 \] using the Cauchy-Schwarz inequality $(\sum |x_i|)^2\leq n\sum x_i^2$. By the preceding lemma, we thus have \[ \norm A = \sup_{z\in\{\pm1\}^n} \langle z, Az\rangle. \] The $2^n$ vectors $z\in Z$ are in bijection with the $2^n$ subsets of $V=\{1,\dots,n\}$, by associating with $z\in Z$ the subset $S$ of $V$ consisting of vertices $i$ such that $z_i=1$. Then, one can compute \[ \langle z, Az\rangle = \sum_{i,j} a_{ij} z_iz_j = 4c(S) - 2\operatorname{Card}(E) + n^2: \] the diagonal terms contribute $n^2$, and each edge contributes $-2z_iz_j$, which is $2$ if the edge is cut by $S$ and $-2$ otherwise. It follows that $\norm A $ is equal to the indicated expression.
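On small graphs, the formula of the proposition can be checked by brute force. The Python sketch below (hypothetical function names, vertices numbered from $0$ rather than $1$) enumerates the $2^n$ sign vectors, which is exactly the exponential search that the NP-hardness result says one cannot expect to avoid in general.

```python
from itertools import product

def graph_matrix(n, edges):
    """The matrix A: n on the diagonal, -1 at (i, j) when {i, j} is an edge."""
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = n
    for i, j in edges:
        A[i][j] = -1
        A[j][i] = -1
    return A

def norm_inf_1(A):
    """(l^infty, l^1) norm of A, as the maximum of ||Az||_1 over sign vectors z."""
    n = len(A)
    best = 0
    for z in product((-1, 1), repeat=n):
        Az = [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]
        best = max(best, sum(abs(y) for y in Az))
    return best

def max_cut(n, edges):
    """Largest number of edges with exactly one endpoint in S, over all S."""
    best = 0
    for z in product((-1, 1), repeat=n):
        best = max(best, sum(1 for i, j in edges if z[i] != z[j]))
    return best
```

For the triangle (`n = 3`, `edges = [(0, 1), (1, 2), (0, 2)]`), the maximal cut is $2$ and both sides are equal to $4\cdot 2-2\cdot 3+9=11$.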

The last step of the proof is an application of the “simple max-cut” NP-hardness theorem of Garey, Johnson and Stockmeyer (1976), itself a strengthening of Karp (1973)'s seminal result that “max-cut” is NP-complete. I won't explain the proofs of these results here, but let me explain what they mean and how they relate to the present discussion. First of all, computer scientists categorize problems according to the time that is required to solve them, in terms of the size of the entries. This notion depends on the actual computer that is used, but the theory of Turing machines allows one to single out two classes, P and EXP, consisting of problems which can be solved in polynomial, respectively exponential, time in terms of the size of the entries. A second notion, introduced by Karp, is that of NP problems, problems which can be solved in polynomial time by a “nondeterministic Turing machine” — “nondeterministic” means the computer can parallelize itself at will when it needs to consider various possibilities. This class is contained in EXP (because one can simulate in exponential time a polynomial time nondeterministic algorithm) and also corresponds to the class of problems whose solution can be checked in polynomial time.

Our problem is to find a subset $S$ of $\{1,\dots,n\}$ that maximizes $c(S)$. This is a restriction of the “general max-cut” problem where, given an integer valued function $w$ on the set of edges, one wishes to find a subset $S$ that maximizes $c(S;w)$, the sum of the weights of the edges which have one endpoint in $S$ and the other outside of $S$. Karp (1973) observed that the existence of $S$ such that $c(S;w)\geq m$ is an NP problem (if one is provided $S$, it takes polynomial time to compute $c(S;w)$ and to decide that it is at least $m$), and the naïve search algorithm is in EXP, since there are $2^n$ such subsets. Moreover, Karp proves that any NP problem can be reduced to it in polynomial time. This is what is meant by the assertion that it is NP-complete. Consequently, determining $\sup_S c(S;w)$ is NP-hard: if you can solve that problem in polynomial time, then you can solve the “max-cut” problem in polynomial time, hence any other NP problem. A subsequent theorem by Garey, Johnson and Stockmeyer (1976) established that restricting the max-cut problems to $\pm1$ weights is still NP-hard, and this completes the proof of Rohn's theorem.

(Aside, to insist that signs matter: by a theorem of Edmonds and Karp (1972), one can solve the “min-cut” problem in polynomial time, which consists in deciding, for some given integer $m$, whether there exists $S$ such that $c(S;w)\leq m$.)

The topology on the ring of polynomials and the continuity of the evaluation map

2024-04-13T16:01:00.002+02:00

Polynomials are an algebraic gadget, and one is rarely led to think about the topology a ring of polynomials should carry. That happened to me, though, more or less by accident, when María Inés de Frutos Fernández and I worked on implementing in Lean the evaluation of power series. So let's start with them. To simplify the discussion, I only consider the case of one indeterminate. When there are finitely many of them, the situation is the same; in the case of infinitely many indeterminates, there might be some additional subtleties, but I have not thought about it.

$\gdef\lbra{[\![}\gdef\rbra{]\!]} \gdef\lpar{(\!(}\gdef\rpar{)\!)} \gdef\bN{\mathbf N} \gdef\coeff{\operatorname{coeff}} \gdef\eval{\operatorname{eval}} \gdef\colim{\operatorname{colim}}$

Power series

A power series over a ring $R$ is just an expression $\sum a_nT^n$, where $(a_0,a_1, \dots)$ is a family of elements of $R$ indexed by the nonnegative integers. After all, this is just what is meant by “formal series”: coefficients and nothing else.

Defining a topology on the ring $R\lbra T\rbra$ should allow us to say what it means for a sequence $(f_m)$ of power series to converge to a power series $f$, and the most natural thing to require is that for every $n$, the coefficient $a_{m,n}$ of $T^n$ in $f_m$ converges to the corresponding coefficient $a_n$ of $T^n$ in $f$. In other words, we endow $R\lbra T\rbra $ with the product topology when it is identified with the product set $R^{\bN}$. The explicit definition may look complicated, but the important point for us is the following characterization of this topology: Let $X$ be a topological space and let $f\colon X \to R\lbra T\rbra$ be a map; for $f$ to be continuous, it is necessary and sufficient that all maps $f_n\colon X \to R$ are continuous, where, for any $x\in X$, $f_n(x)$ is the $n$th coefficient of $f(x)$. In particular, the coefficient maps $R\lbra T\rbra\to R$ are continuous.

What can we do with that topology, then? The first thing, maybe, is to observe its adequacy with respect to the ring structure on $R\lbra T\rbra$.

Proposition. — If addition and multiplication on $R$ are continuous, then addition and multiplication on $R\lbra T\rbra$ are continuous.

Let's start with addition. We need to prove that $s\colon R\lbra T\rbra \times R\lbra T\rbra\to R\lbra T\rbra$ is continuous. By the characterization, it is enough to prove that all coordinate functions $s_n\colon R\lbra T\rbra \times R\lbra T\rbra\to R$, $ (f,g)\mapsto \coeff_n(f+g) $, are continuous. But these functions factor through the $n$th coefficient maps: $\coeff_n(f+g) = \coeff_n(f)+\coeff_n(g)$, which is continuous, since addition, coefficients and projections are continuous. This is similar, but slightly more complicated for multiplication: if the multiplication map is denoted by $m$, we have to prove that the maps $m_n$ defined by $m_n(f,g)=\coeff_n(f\cdot g)$ are continuous. However, they can be written as \[ m_n(f,g)=\coeff_n(f\cdot g) = \sum_{p=0}^n \coeff_p(f)\coeff_{n-p}(g). \] Since the projections and the coefficient maps are continuous, it is sufficient to prove that the maps from $R^{n+1} \times R^{n+1}$ to $R$ given by \[((a_0,\dots,a_n),(b_0,\dots,b_n))\mapsto \sum_{p=0}^n a_p b_{n-p} \] are continuous, and this follows from the continuity of addition and multiplication on $R$, because it is a polynomial expression.
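The point of the formula for $m_n$ is that the $n$th coefficient of a product only involves the coefficients of index at most $n$ of the factors. A minimal Python sketch, with power series truncated to lists of coefficients (the function names are hypothetical, and both lists are assumed to be long enough):

```python
def product_coefficient(f, g, n):
    """Coefficient of T^n in f*g; only f[0..n] and g[0..n] are used."""
    return sum(f[p] * g[n - p] for p in range(n + 1))

def truncated_product(f, g, order):
    """Coefficients of T^0, ..., T^(order-1) in the product f*g."""
    return [product_coefficient(f, g, n) for n in range(order)]
```

For instance, truncating $1/(1-T)=1+T+T^2+\cdots$ to `[1, 1, 1, 1, 1, 1]` and multiplying by $1-T$, that is `[1, -1, 0, 0, 0, 0]`, gives `[1, 0, 0, 0, 0, 0]` up to order $6$.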

Polynomials

At this point, let's go back to our initial question of endowing polynomials with a natural topology.

An obvious candidate is the induced topology. This looks correct; in any case, it is such that addition and multiplication on $R[T]$ are continuous. However, it lacks an interesting property with respect to evaluation.

Recall that for every $a\in R$, there is an evaluation map $\eval_a\colon R[T]\to R$, defined by $f\mapsto f(a)$, and even, if one wishes, the two-variable evaluation map $R[T]\times R\to R$.
The first claim is that this map is not continuous.

An example will serve as proof. I take $R$ to be the real numbers, $f_n=T^n$ and $a=1$. Then $f_n$ converges to zero, because for each integer $m$, the real numbers $\coeff_m(f_n)$ are zero for $n>m$. On the other hand, $f_n(a)=f_n(1)=1$ for all $n$, and this does not converge to zero!
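This counterexample can be replayed in a few lines of Python, with polynomials stored as lists of coefficients (a sketch with hypothetical helper names):

```python
def monomial(n):
    """Coefficient list of T^n."""
    return [0] * n + [1]

def coeff(f, m):
    """Coefficient of T^m in the polynomial f."""
    return f[m] if m < len(f) else 0

def evaluate(f, a):
    """Value f(a), computed by the Horner scheme."""
    value = 0
    for c in reversed(f):
        value = value * a + c
    return value
```

For a fixed `m`, the values `coeff(monomial(n), m)` vanish as soon as `n > m`, while `evaluate(monomial(n), 1)` is equal to `1` for every `n`.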

So we have to change the topology on polynomials if we want this map to be continuous, and we now give the correct definition. The ring of polynomials is the increasing union of subsets $R[T]_n$, indexed by integers $n$, consisting of all polynomials of degree at most $n$. Each of these subsets is given the product topology, as above, but we endow their union with the “inductive limit” topology. Explicitly, if $Y$ is a topological space and $u\colon R[T]\to Y$ is a map, then $u$ is continuous if and only if, for each integer $n$, its restriction to $R[T]_n$ is continuous.

The inclusion map $R[T]\to R\lbra T\rbra$ is continuous, hence the topology on polynomials is finer than the topology induced by the topology on power series. As the following property indicates, it is usually strictly finer.

We can also observe that addition and multiplication on $R[T]$ are still continuous. The same proof as above works, once we observe that the coefficient maps are continuous. (On the other hand, one may be tempted to compare the product topology of the inductive topologies, with the inductive topology of the product topologies, a thing which is not obvious in the direction that we need.)

Proposition. — Assume that addition and multiplication on $R$ are continuous. Then the evaluation maps $\eval_a \colon R[T]\to R$ are continuous.

We have to prove that for every integer $n$, the evaluation map $\eval_a$ induces a continuous map from $R[T]_n$ to $R$. Now, this map factors as the coefficient map $R[T]_n\to R^{n+1}$ composed with a polynomial map $(c_0,\dots,c_n)\mapsto c_0+c_1a+\dots+c_n a^n$. It is therefore continuous.

Laurent series

We can upgrade the preceding discussion and define a natural topology on the ring $R\lpar T\rpar$ of Laurent series, which are the power series with possibly negative exponents. For this, for all integers $d$, we set $R\lpar T\rpar_d$ to be the set of Laurent series of the form $ f=\sum_{n=-d}^\infty c_n T^n$, we endow that set with the product topology, and take the corresponding inductive limit topology. We leave it to the reader to check that this is a ring topology, but that the naïve product topology on $R\lpar T\rpar$ wouldn't be in general.

Back to the continuity of evaluation

The continuity of the evaluation maps $f\mapsto f(a)$ was an important guide to the topology of the ring of polynomials. This suggests a more general question, for which I don't have a full answer, whether the two-variable evaluation map, $(f,a)\mapsto f(a)$, is continuous. On each subspace $R[T]_d\times R$, the evaluation map is given by a polynomial map ($(c_0,\dots,c_d,a)\mapsto c_0 +c_1a+\dots+c_d a^d$), hence is continuous, but that does not imply the desired continuity, because that only tells us about $R[T]\times R$ with the topology $\colim_d (R[T]_d\times R)$, while we are interested in the topology $(\colim_d R[T]_d)\times R$. To compare these topologies, note that the natural bijection $\colim_d (R[T]_d\times R) \to (\colim_d R[T]_d)\times R$ is continuous (because it is continuous at each level $d$), but the continuity of its inverse is not so clear.

I find it amusing, then, to observe that sequential continuity holds in the important case where $R$ is a field. This relies on the following proposition.

Proposition. — Assume that $R$ is a field. Then, for every converging sequence $(f_n)$ in $R[T]$, the degrees $\deg(f_n)$ are bounded.

Otherwise, we can assume that $(f_n)$ converges to $0$ and that $\deg(f_{n+1})>\deg(f_n)$ for all $n$. We construct a continuous linear form $\phi$ on $R[T]$ such that $\phi(f_n)$ does not converge to $0$. This linear form is given by a formal power series $\phi(f)=\sum a_d c_d$ for $f=\sum c_dT^d$, and we choose the coefficients $(a_n)$ by induction so that $\phi(f_n)=1$ for all $n$. Indeed, if the coefficients are chosen up to $\deg(f_n)$, then we fix $a_d=0$ for $\deg(f_n)<d<\deg(f_{n+1})$ and choose $a_{\deg(f_{n+1})}$ so that $\phi(f_{n+1})=1$. This linear form is continuous because its restriction to any $R[T]_d$ is given by a polynomial, hence is continuous.

Corollary. — If $R$ is a topological ring which is a field, then the evaluation map $R[T]\times R\to R$ is sequentially continuous.

Consider sequences $(f_n)$ in $R[T]$ and $(a_n)$ in $R$ that converge to $f$ and $a$ respectively. By the proposition, there is an integer $d$ such that $\deg(f_n)\leq d$ for all $n$, and $\deg(f)\leq d$. Since evaluation is continuous on $R[T]_d\times R$, one has $f_n(a_n)\to f(a)$, as claimed.

Remark. — The previous proposition does not hold on rings. In fact, if $R=\mathbf Z_p$ is the ring of $p$-adic integers, then $\phi(p^nT^n)=p^n \phi(T^n)$ converges to $0$ for every continuous linear form $\phi$ on $R[T]$. More is true since in that case, evaluation is continuous! The point is that in $\mathbf Z_p$, the ideals $(p^n)$ form a basis of neighborhoods of the origin.

Proposition. — If the topology of $R$ is linear, namely the origin of $R$ has a basis of neighborhoods consisting of ideals, then the evaluation map $R[T]\times R\to R$ is continuous.

By translation, one reduces to showing continuity at $(0,0)$. Let $V$ be a neighborhood of $0$ in $R$ and let $I$ be an ideal of $R$ which is a neighborhood of $0$ and such that $I\subset V$. Since it is a subgroup of the additive group of $R$, the ideal $I$ is open. Then the set $I\cdot R[T]$ is open because for every $d$, its trace on $R[T]_d$ is equal to $I\cdot R[T]_d$, hence is open. Then, for $f\in I\cdot R[T]$ and $a\in R$, one has $f(a)\in I$, hence $f(a)\in V$.

Here is one case where I can prove that evaluation is continuous.

Proposition. — If the topology of $R$ is given by a family of absolute values, then the evaluation map $(f,a)\mapsto f(a)$ is continuous.

I just treat the case where the topology of $R$ is given by one absolute value. By translation and linearity, it suffices to prove continuity at $(0,0)$. Consider the norm $\|\cdot\|_1$ on $R[T]$ defined by $\|f\|_1=\sum |c_n|$ if $f=\sum c_nT^n$. By the triangle inequality, one has $|f(a)|\leq \|f\|_1 $ for any $a\in R$ such that $|a|\leq 1$. For every $r>0$, the set $V_r$ of polynomials $f\in R[T]$ such that $\|f\|_1<r$ is an open neighborhood of the origin since, for every integer $d$, its intersection with $R[T]_d$ is an open neighborhood of the origin in $R[T]_d$. Let also $W$ be the set of $a\in R$ such that $|a|\leq 1$. Then $V_r\times W$ is a neighborhood of $(0,0)$ in $R[T]\times R$ such that $|f(a)|<r$ for every $(f,a)\in V_r\times W$. This implies the desired continuity.

Flatness and projectivity: when is the localization of a ring a projective module?

2024-04-10T18:35:00.004+02:00

Projective modules and flat modules are two important concepts in algebra, because they characterize those modules for which a general functorial construction (Hom module and tensor product, respectively) behaves better than what is the case for general modules.

This blog post came out of reading a confusion on a student's exam: projective modules are flat, but not all flat modules are projective. Since localization gives flat modules, it is easy to obtain an example of a flat module which is not projective (see below, $\mathbf Q$ works, as a $\mathbf Z$-module), but my question was to understand when the localization of a commutative ring is a projective module.

$\gdef\Hom{\operatorname{Hom}}\gdef\Spec{\operatorname{Spec}}\gdef\id{\mathrm{id}}$

Let me first recall the definitions. Let $R$ be a ring and let $M$ be a (right) $R$-module.

The $\Hom_R(M,\bullet)$-functor associates with a right $R$-module $X$ the abelian group $\Hom_R(M,X)$. By composition, any linear map $f\colon X\to Y$ induces an additive map $\Hom_R(M,f)\colon \Hom_R(M,X)\to \Hom_R(M,Y)$: it maps $u\colon M\to X$ to $f\circ u$. When $R$ is commutative, these are even $R$-modules and morphisms of $R$-modules. If $f$ is injective, $\Hom_R(M,f)$ is injective as well, but if $f$ is surjective, it is not always the case that $\Hom_R(M,f)$ is surjective, and one says that the $R$-module $M$ is projective if $\Hom_R(M,f)$ is surjective for all surjective linear maps $f$.

The $\otimes_R$-functor associates with a left $R$-module $X$ the abelian group $M\otimes_R X$, and with any linear map $f\colon X\to Y$, the additive map $M\otimes_R X\to M\otimes_R Y$ that maps a split tensor $m\otimes x$ to $m\otimes f(x)$. When $R$ is commutative, these are even $R$-modules and morphisms of $R$-modules. If $f$ is surjective, then $M\otimes_R f$ is surjective, but if $f$ is injective, it is not always the case that $M\otimes_R f$ is injective. One says that $M$ is flat if $M\otimes_R f$ is injective for all injective linear maps $f$.

These notions are quite abstract, and the development of homological algebra made them prevalent in modern algebra.

Example. — Free modules are projective and flat.

Proposition. — An $R$-module $M$ is projective if and only if there exists an $R$-module $N$ such that $M\oplus N$ is free.
Indeed, taking a generating family of $M$, we construct a free module $L$ and a surjective linear map $u\colon L\to M$. Since $M$ is projective, the map $\Hom_R(M,u)$ is surjective and there exists $v\colon M\to L$ such that $u\circ v=\id_M$. Then $v$ is an isomorphism from $M$ to $v(M)$, and one can check that $L=v(M)\oplus \ker(u)$. Conversely, one checks that a direct summand of a free module is projective.

Corollary. — Projective modules are flat.

Theorem (Kaplansky). — If $R$ is a local ring, then a projective $R$-module is free.

The theorem has a reasonably easy proof for a finitely generated $R$-module $M$ over a commutative local ring. Let $J$ be the maximal ideal of $R$ and let $k=R/J$ be the residue field. Then $M/JM$ is a finite dimensional $k$-vector space; let us consider a family $(e_1,\dots,e_n)$ in $M$ whose images form a basis of $M/JM$. Now, one has $\langle e_1,\dots,e_n\rangle + J M = M$, hence Nakayama's lemma implies that $M=\langle e_1,\dots,e_n\rangle$. Let then $u\colon R^n\to M$ be the morphism given by $u(a_1,\dots,a_n)=\sum a_i e_i$; by what precedes, it is surjective, and we let $N$ be its kernel. Since $M$ is projective, the morphism $\Hom_R(M,u)$ is surjective, and there exists $v\colon M\to R^n$ such that $u\circ v=\id_M$. We then have an isomorphism $M\oplus N\simeq R^n$, where $N=\ker(u)$. Modding out by $J$, we get $M/JM \oplus N/JN \simeq k^n$. Since $M/JM$ already has dimension $n$, necessarily $N/JN=0$, hence $N=JN$; since $N$ is a direct summand of $R^n$, it is finitely generated, and Nakayama's lemma implies that $N=0$.

Example. — Let $R$ be a commutative ring and let $S$ be a multiplicative subset of $R$. Then the fraction ring $S^{-1}R$ is a flat $R$-module.
Let $u\colon X\to Y$ be an injective morphism of $R$-modules. First of all, one identifies the morphism $S^{-1}R\otimes_R u\colon S^{-1}R\otimes_R X\to S^{-1}R\otimes_R Y$ to the morphism $S^{-1}u\colon S^{-1}X\to S^{-1}Y$ induced by $u$ on fraction modules. Then, it is easy to see that $S^{-1}u$ is injective. Let indeed $x/s\in S^{-1}X$ be an element that maps to $0$; one then has $u(x)/s=0$, hence there exists $t\in S$ such that $tu(x)=0$. Consequently, $u(tx)=0$, hence $tx=0$ because $u$ is injective. This implies $x/s=0$.

Theorem. — Let $R$ be a commutative ring. If $M$ is a finitely presented flat $R$-module, then $M$ is locally free: there exists a finite family $(f_1,\dots,f_n)$ in $R$ such that $R=\langle f_1,\dots,f_n\rangle$ and such that for every $i$, $M_{f_i}$ is a free $R_{f_i}$-module.
The proof is a variant of the case of local rings. Starting from a point $p\in\Spec(R)$, we know that $M_p$ is a finitely presented flat $R_p$-module. As above, we get a surjective morphism $u\colon R^n\to M$ which induces an isomorphism $\kappa(p)^n\to \kappa(p)\otimes M$, and we let $N$ be its kernel. By flatness of $M$ (and an argument involving the snake lemma), the exact sequence $0\to N\to R^n\to M\to 0$ induces an exact sequence $0\to \kappa(p)\otimes N\to \kappa(p)^n\to \kappa(p)\otimes M\to 0$. And since the last map is an isomorphism, we have $\kappa(p)\otimes N=0$. Since $M$ is finitely presented, the module $N$ is finitely generated, and Nakayama's lemma implies that $N_p=0$; moreover, there exists $f\not\in p$ such that $N_f=0$, so that $u_f\colon R_f^n\to M_f$ is an isomorphism. One concludes by using the quasicompactness of $\Spec(R)$.

However, not all flat modules are projective. The most basic example is the following one.

Example. — The $\mathbf Z$-module $\mathbf Q$ is flat, but is not projective.
It is flat because it is the total fraction ring of $\mathbf Z$. To show that it is not projective, we consider the free module $L={\mathbf Z}^{(\mathbf N)}$ with basis $(e_n)$ and the morphism $u\colon L\to\mathbf Q$ that maps $e_n$ to $1/n$ (if $n>0$, say). This morphism is surjective. If $\mathbf Q$ were projective, there would exist a morphism $v\colon \mathbf Q\to L$ such that $u\circ v=\id_{\mathbf Q}$. Consider a fraction $a/b\in\mathbf Q$; one has $b\cdot 1/b=1$, hence $b v(1/b)=v(1)$. We thus see that all coefficients of $v(1)$ are divisible by $b$, for any nonzero integer $b$; they must be zero, hence $v(1)=0$ and $1=u(v(1))=0$, a contradiction.
The proof generalizes. For example, if $R$ is a domain and $S$ does not consist of units, and does not contain $0$, then $S^{-1}R$ is not projective. (With analogous notation, take a nonzero coefficient $a$ of $v(1)$ and set $b=as$, where $s\in S$ is not $0$; then $as$ divides $a$, hence $s$ divides $1$ and $s$ is a unit.)

These recollections are meant to motivate the forthcoming question: When is it the case that a localization $S^{-1}R$ is a projective $R$-module?

Example. — Let $e$ be an idempotent of $R$, so that the ring $R$ decomposes as a product of two rings $R\simeq eR \times (1-e)R$, and both factors are projective submodules of $R$ since their direct sum is the free $R$-module $R$. Now, one can observe that $R_e= eR$. Consequently, $R_e$ is projective. Geometrically, $\Spec(R)$ decomposes as a disjoint union of two closed subsets $\mathrm V(e)$ and $\mathrm V(1-e)$; the first one can be viewed as the open subset $\Spec(R_{1-e})$ and the second one as the open subset $\Spec(R_e)$.
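To make the decomposition concrete, here is a small Python sketch in the ring $R=\mathbf Z/n\mathbf Z$, a choice of mine for illustration (the function names are hypothetical). It lists the idempotents and computes the map $x\mapsto (ex,(1-e)x)$.

```python
def idempotents(n):
    """Idempotents of the ring Z/nZ, as representatives in range(n)."""
    return [e for e in range(n) if (e * e) % n == e]

def split(x, e, n):
    """Image of x under R -> eR x (1-e)R, for R = Z/nZ and an idempotent e."""
    return ((e * x) % n, ((1 - e) * x) % n)
```

In $\mathbf Z/6\mathbf Z$ with $e=3$, one finds $eR=\{0,3\}$ and $(1-e)R=\{0,2,4\}$, and the six pairs `split(x, 3, 6)` are distinct, in accordance with $\mathbf Z/6\mathbf Z\simeq \mathbf Z/2\mathbf Z\times\mathbf Z/3\mathbf Z$.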

The question was to decide whether this geometric condition furnishes the basic conditions for a localization $S^{-1}R$ to be projective. With the above notation, we recall that $\Spec(S^{-1}R)$ is homeomorphic to the subset of $\Spec(R)$ consisting of prime ideals $p$ such that $p\cap S=\emptyset$. The preceding example corresponds to the case where $\Spec(S^{-1}R)$ is open and closed in $\Spec(R)$. In this case, if we view $S^{-1}R$ as a quasicoherent sheaf on $\Spec(R)$, it is free of rank one on the open subset $\Spec(S^{-1}R)$, and zero on the complementary open subset. It is therefore locally free, hence the $R$-module $S^{-1}R$ is projective.

Observation. — The set $\Spec(S^{-1}R)$ is stable under generization. If $S^{-1}R$ is a projective $R$-module, then it is open.
The first part is obvious: if $p$ and $q$ are prime ideals of $R$ such that $p\subseteq q$ and $q\cap S=\emptyset$, then $p\cap S=\emptyset$. The second part follows from the observation that the support of $S^{-1}R$ is exactly $\Spec(S^{-1}R)$, combined with the following proposition.

Proposition. — The support of a projective module is open.
I learnt this result in the paper by Vasconcelos (1969), “On Projective Modules of Finite Rank” (Proceedings of the American Mathematical Society 22 (2): 430‑33). The proof relies on the trace ideal $\tau_R(M)$ of a module: this is the image of the canonical morphism $t\colon M^\vee \otimes_R M\to R$. (It is called the trace ideal, because when $M$ is free, $M^\vee\otimes_R M$ can also be identified with the module of endomorphisms of finite rank of $M$, a split tensor $\phi\otimes m$ corresponds with the endomorphism $x\mapsto \phi(x)m$, and then $t(\phi \otimes m)=\phi(m)$ is its trace.) Now, if $p$ belongs to the support of $M$, then $\tau_R(M)_p=R_p$, while if $p$ does not belong to the support of $M$, one has $M_p=0$, hence $\tau_R(M)_p=0$. In other words, the support of $M$ is the complement of the closed locus $\mathrm V(\tau_R(M))$ of $\Spec(R)$.

On the other hand, one should remember the following basic property of the support of a module.

Proposition. — The support of a module is stable under specialization. The support of a finitely generated module is closed.
Indeed, for every $m\in M$ and $p\in \Spec(R)$, saying that $m=0$ in $M_p$ means that there exists $s\in R$ such that $s\notin p$ with $sm=0$, that is, $\mathrm{ann}_R(m)\not\subseteq p$. In other words, the set of prime ideals $p$ such that $m\neq 0$ in $M_p$ is $\mathrm V(\mathrm{ann}_R(m))$. This shows that the support of $M$ is the union of the closed subsets $\mathrm V(\mathrm{ann}_R(m))$; it is in particular stable under specialization. If $M$ is finitely generated, this also shows that its support is $\mathrm V(\mathrm{ann}_R(M))$, hence is closed.

At this point, one can either follow Vasconcelos (1969), who shows that a projective module $M$ of the form $S^{-1}R$ is finitely generated if and only if its trace ideal is. In particular, if $R$ is noetherian and $S^{-1}R$ is a projective $R$-module, then $\Spec(S^{-1}R)$ is closed. It is thus open and closed, and we are in the situation of the basic example above.

One can also use a topological argument explained to me by Daniel Ferrand: if an irreducible component of $\Spec(R)$ meets $\Spec(S^{-1}R)$, then the corresponding minimal prime ideal of $R$ is disjoint from $S$, hence belongs to $\Spec(S^{-1}R)$. Consequently, being stable under specialization, $\Spec(S^{-1}R)$ is the union of the irreducible components of $\Spec(R)$ that it meets. If this set of irreducible components is finite (or locally finite), for example if $\Spec(R)$ is noetherian, for example if $R$ is a noetherian ring, then $\Spec(S^{-1}R)$ is closed.

I did not find the time to think more about this question, and it would be nice to have an example of a projective localization which does not come from this situation.

Combinatorics of the nilpotent cone

2024-03-16T18:25:00.001+01:00

$\global\def\Card{\operatorname{Card}}\global\def\GL{\mathrm{GL}}\global\def\im{\operatorname{im}}\gdef\KVar{\mathrm{KVar}}$

Let $n$ be an integer and $F$ be a field. Nilpotent matrices $N\in \mathrm M_n(F)$ are those matrices for which there exists an integer $p$ with $N^p=0$. Their characteristic polynomial is $\chi_N(T)=T^n$, and they satisfy $N^n=0$, which shows that the set $\mathscr N_n$ of nilpotent matrices is an algebraic variety. The equation $N^n=0$ is homogeneous of degree $n$, so that $\mathscr N_n$ is a cone.

The classification of nilpotent matrices is an intermediate step in the theory of Jordan decomposition: in an adequate basis, a nilpotent matrix can be written as a diagonal block matrix of “basic” nilpotent matrices, $p \times p$ matrices of the form \[ \begin{pmatrix} 0 & 0 & \dots & 0 & 0 \\ 1 & 0 & & & \vdots \\ 0 & 1 & \ddots & & 0 \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & & 0 & 1 & 0\end{pmatrix} \] whose minimal polynomial is $T^p$. The sum of the sizes of these blocks is $n$ and in this way, one associates with any $n\times n$ nilpotent matrix a partition $\pi$ of~$n$. It is known that two nilpotent matrices are conjugate if and only if they are associated with the same partition. For any partition $\pi$ of~$n$, let us denote by $J_\pi$ the corresponding matrix whose blocks are arranged by increasing size, and by $\mathscr N_\pi$ the set of nilpotent matrices that are associated with the partition $\pi$.

The theorem of Fine and Herstein (1958)

Having to teach “agrégation” classes made me learn about a classic combinatorial result: counting the number of nilpotent matrices when $F$ is a finite field.

Theorem (Fine, Herstein, 1958). — Let $F$ be a finite field with $q$ elements. The cardinality of $\mathscr N_n(F)$ is $q^{n^2-n}$. Equivalently, the probability that an $n\times n$ matrix with coefficients in $F$ be nilpotent is $q^{-n}$.

The initial proof of this result relies on the action of $\GL_n(F)$ on $\mathscr N_n(F)$: we recalled that the orbits correspond with the partitions of $n$, hence a decomposition \[ \Card(\mathscr N_n(F)) = \sum_{\pi} \Card(\mathscr N_\pi(F)). \] We know that $\mathscr N_\pi(F)$ is the orbit of the matrix $J_\pi$ under the action of $\GL_n(F)$. By the classic orbit-stabilizer formula, one thus has \[ \Card(\mathscr N_\pi(F)) = \frac{\Card(\GL_n(F))}{\Card(C_\pi(F))}, \] where $C_\pi(F)$ is the set of matrices $A\in\GL_n(F)$ such that $AJ_\pi=J_\pi A$. The precise description of $C_\pi(F)$ is delicate, but their arguments go as follows.

They first replace the group $C_\pi(F)$ by the algebra $A_\pi(F)$ of all matrices $A\in\mathrm M_n(F)$ such that $AJ_\pi=J_\pi A$. For any integer $i$, let $m_i$ be the multiplicity of $i$ in the partition $\pi$, so that $n=\sum i m_i$. The block decomposition of $J_\pi$ corresponds with a decomposition of $F^n$ as a direct sum of invariant subspaces $V_i$, where $V_i$ has dimension $i m_i$. In fact, $V_1=\ker(J_\pi)$, $V_1\oplus V_2=\ker(J_\pi^2)$, etc. This shows that $A_\pi(F)$ is an algebra of block-triangular matrices. Moreover, the possible diagonal blocks can be shown to be isomorphic to $\mathrm M_{m_i}(F)$. In other words, we have a surjective morphism of algebras \[ A_\pi(F) \to \prod_i \mathrm M_{m_i}(F), \] whose kernel consists of nilpotent matrices. In particular, the proportion of invertible elements in $A_\pi(F)$ is equal to the proportion of invertible elements in the product $\prod_i \mathrm M_{m_i}(F)$.

Ultimately, Fine and Herstein obtain an explicit sum over the set of partitions of $n$ which they prove equals $q^{n^2-n}$, after an additional combinatorial argument.

Soon after, the theorem of Fine and Herstein was given easier proofs, from Gerstenhaber (1961) to Kaplansky (1990) and Leinster (2021).

A proof

The following proof is borrowed from Caldero and Peronnier (2022), Carnet de voyage en Algébrie. It can be seen as a simplification of the proofs of Gerstenhaber (1961) and Leinster (2021).

Let us start with the Fitting decomposition of an endomorphism $u\in \mathrm M_n(F)$: the least integer $p$ such that $\ker(u^p)=\ker(u^{p+1})$ coincides with the least integer $p$ such that $\im(u^p)=\im(u^{p+1})$, and one has $F^n=\ker(u^p)\oplus \im(u^p)$. The subspaces $N(u)=\ker(u^p)$ and $I(u)=\im(u^p)$ are invariant under $u$, and $u$ acts nilpotently on $\ker(u^p)$ and bijectively on $\im(u^p)$. In other words, we have associated with $u$ complementary subspaces $N(u)$ and $I(u)$, a nilpotent operator on $N(u)$ and an invertible operator on $I(u)$. This map is bijective.

For any integer $d$, let $\nu_d$ be the cardinality of the set of nilpotent matrices in $\mathrm M_d(F)$, and $\gamma_d$ be the cardinality of the set of invertible matrices in $\mathrm M_d(F)$. Let also $\mathscr D_d$ be the set of all pairs $(K,I)$, where $K$ and $I$ are complementary subspaces of dimensions $d$, $n-d$ of $F^n$. Counting the endomorphisms of $F^n$ according to their Fitting decomposition, we thus obtain \[ q^{n^2} = \sum_{d=0}^n \sum_{(K,I)\in\mathscr D_d} \nu_d \cdot \gamma_{n-d}. \] We need to compute the cardinality of $\mathscr D_d$. In fact, given one pair $(K,I)\in\mathscr D_d$, all others are of the form $(g\cdot K,g\cdot I)$, for some $g\in\GL_n(F)$: the group $\GL_n(F)$ acts transitively on $\mathscr D_d$. The stabilizer of $(K,I)$ can be identified with $\GL_d(F)\times \GL_{n-d}(F)$. Consequently, \[ \Card(\mathscr D_d) = \frac{\Card(\GL_n(F))}{\Card(\GL_d(F))\Card(\GL_{n-d}(F))} = \frac{\gamma_n}{\gamma_d \gamma_{n-d}}. \] We thus obtain \[ q^{n^2} = \sum_{d=0}^n \frac{\gamma_n}{\gamma_d \gamma_{n-d}} \nu_d \gamma_{n-d} = \gamma_n \sum_{d=0}^n \frac{\nu_d}{\gamma_d}. \] Subtracting the same relation for $n-1$, we get \[ \frac{\nu_n}{\gamma_n} = \frac {q^{n^2}}{\gamma_n} - \frac{q^{(n-1)^2}}{\gamma_{n-1}},\] or \[ \nu_n = q^{n^2} - q^{(n-1)^2} \frac{\gamma_n}{\gamma_{n-1}}. \] It remains to compute $\gamma_n$: since an invertible matrix consists of a nonzero vector, a vector which does not belong to the line generated by the first one, etc., we have \[ \gamma_n = (q^n-1) (q^n-q)\dots (q^n-q^{n-1}). \] Then, \[ \gamma_n = (q^n-1) q^{n-1} (q^{n-1}-1)\dots (q^{n-1}-q^{n-2}) = (q^n-1)q^{n-1} \gamma_{n-1}. \] We thus obtain \[ \nu_n = q^{n^2} - q^{(n-1)^2} (q^n-1) q^{n-1} = q^{n^2} - q^{(n-1)^2} q^{2n-1} + q^{(n-1)^2} q^{n-1} = q^{n^2-n}, \] as claimed.
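For small cases, the count can be checked by brute force. The following Python sketch is only a sanity check, not part of the proof: it assumes that $q$ is prime, so that arithmetic modulo $q$ models the field $F$, and the function names are mine. It enumerates all matrices, counts the nilpotent ones, and compares with the recursion $\nu_n = q^{n^2} - q^{(n-1)^2}\gamma_n/\gamma_{n-1}$ obtained above.

```python
from itertools import product


def mat_mul(a, b, n, q):
    """Multiply two n x n matrices (tuples of row tuples) modulo q."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) % q for j in range(n))
        for i in range(n)
    )


def is_nilpotent(m, n, q):
    """An n x n matrix is nilpotent if and only if its n-th power vanishes."""
    power = m
    for _ in range(n - 1):
        power = mat_mul(power, m, n, q)
    return all(x == 0 for row in power for x in row)


def count_nilpotent(n, q):
    """Brute-force count of nilpotent n x n matrices over Z/qZ (q prime)."""
    count = 0
    for entries in product(range(q), repeat=n * n):
        m = tuple(tuple(entries[i * n:(i + 1) * n]) for i in range(n))
        if is_nilpotent(m, n, q):
            count += 1
    return count


def count_invertible(n, q):
    """gamma_n = (q^n - 1)(q^n - q)...(q^n - q^(n-1)); gamma_0 = 1."""
    result = 1
    for i in range(n):
        result *= q**n - q**i
    return result


def nu_from_recursion(n, q):
    """nu_n = q^(n^2) - q^((n-1)^2) * gamma_n / gamma_(n-1), for n >= 1."""
    gamma_n = count_invertible(n, q)
    gamma_prev = count_invertible(n - 1, q)
    return q ** (n * n) - q ** ((n - 1) ** 2) * gamma_n // gamma_prev


if __name__ == "__main__":
    for n, q in [(1, 2), (2, 2), (3, 2), (2, 3)]:
        print(n, q, count_nilpotent(n, q), nu_from_recursion(n, q), q ** (n * n - n))
```

The enumeration visits $q^{n^2}$ matrices, so it is only practical for very small $n$ and $q$.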

The proof of Leinster (2021)

Leinster defines a bijection from $\mathscr N_n(F)\times F^n$ to $\mathrm M_n(F)$. The definition is however not very canonical, because he assumes given, for any subspace $V$ of $F^n$, a basis of $V$.

Take a pair $(u,x)$, where $u\in\mathscr N_n(F)$ and $x\in F^n$ and consider the subspace $V_x=\langle x,u(x),\dots\rangle$, the smallest $u$-invariant subspace of $F^n$ which contains $x$. To describe $u$, we observe that we know its restriction to $V_x$, and we need to describe it on the chosen complementary subspace $V'_x$.

To that aim, we have to give ourselves an endomorphism $u'_x$ of $V'_x$ and a linear map $\phi_x\colon V'_x\to V_x$. Since we want $u$ to be nilpotent, it is necessary and sufficient to take $u'_x$ nilpotent.

Instead of considering $\phi_x\colon V'_x\to V_x$, we can consider the map $y\mapsto y+\phi_x(y)$. Its image is a complement $W_x$ of $V_x$ in $F^n$, and any complement can be obtained in this way. The nilpotent endomorphism $u'_x$ of $V'_x$ transfers to a nilpotent endomorphism $w_x$ of $W_x$.

All in all, what the given pair $(u,x)$ furnishes is a subspace $V_x$ with a basis $(x_1=x,x_2=u(x),\dots)$, a complement $W_x$, and a nilpotent endomorphism $w_x$ of $W_x$. This is more or less what the Fitting decomposition of an endomorphism gives us! Recall that $V_x$ was assumed to have been given a basis $(e_1,\dots,e_p)$. There exists a unique automorphism of $V_x$ which maps $e_i$ to $u^{i-1}(x)$ for all $i$. In other words, we have a pair of complementary subspaces $(V_x,W_x)$, a linear automorphism of $V_x$, and a nilpotent endomorphism of $W_x$. By the Fitting decomposition, these data furnish in a bijective way an endomorphism of $F^n$, and that concludes the proof.

A remark about motivic integration

The framework of motivic integration suggests upgrading these combinatorial results into equalities valid for all fields $F$, which hold in the Grothendieck ring of varieties $\KVar_F$. As an abelian group, it is generated by symbols $[X]$, for all algebraic varieties $X$ over $F$, with relations $[X]=[Y]+[X\setminus Y]$, whenever $Y$ is a closed subvariety of $X$. The ring structure is defined so that the formula $[X]\cdot[Y]=[X\times Y]$ holds for all algebraic varieties $X$ and $Y$ over $F$.

By construction of this ring, equalities $[X]=[Y]$ in $\KVar_F$ imply that many invariants of $X$ and $Y$ coincide. In particular, when $F$ is a finite field, they will have the same number of points.

The question is thus to compute the class $[\mathscr N_n]$ of the variety $\mathscr N_n$, for any field $F$. The proofs that I described above can be more or less transferred to this context and imply the following theorem. We denote by $\mathbf L\in \KVar_F$ the class of the affine line $\mathbf A^1$.

Theorem. — One has an equality $[\mathscr N_n] \mathbf L^n = \mathbf L^{n^2}$ in the localization of the Grothendieck ring $\KVar_F$ by the element $(\mathbf L-1)\dots(\mathbf L^{n-1}-1)$.

The following question is then natural. (I have not thought about it at all.)

Question. — Does one have $[\mathscr N_n]=\mathbf L^{n^2-n}$ in $\KVar_F$?

A combinatorial proof of the Newton identities

2023-10-13T18:03:00.010+02:00

Let $T_1,\dots,T_n$ be indeterminates. For all $m\geq0$, denote by $S_m$ the $m$th elementary symmetric polynomial, given by \[ S_m = \sum_{i_1 \lt \dots \lt i_m} T_{i_1}\dots T_{i_m}. \] These are polynomials in $T_1,\dots, T_n$, and as their name suggests, these polynomials are symmetric, meaning that \[ S_m (T_{\sigma(1)},\dots,T_{\sigma(n)}) = S_m (T_1,\dots,T_n) \] for any permutation $\sigma$ of $\{1,\dots,n\}$. By definition, one has $S_0=1$ (there is exactly one family of integers, the empty one, and the empty product is equal to~$1$) and $S_m=0$ for $m>n$ (there are no families of integers $1\leq i_1\lt\dots\lt i_m\leq n$). A well-known theorem asserts that any symmetric polynomial in $T_1,\dots,T_n$ (with coefficients in a commutative ring $R$) can be written uniquely as a polynomial in $S_1,\dots,S_n$ (with coefficients in $R$). It is in particular the case for the Newton polynomials, defined for $p\geq 0$ by \[ N_p = T_1^p + \dots+T_n^p . \] In particular, $N_0=n$ and $N_1=S_1$. The next Newton polynomial can be computed by “completing the square”: \[ N_2 = T_1^2+\dots+T_n^2 = (T_1+\dots+T_n)^2 - 2 S_2 = S_1^2 - 2S_2. \] These are the first two (or three) of a family of relations, called the Newton identities, that relate the polynomials $N_p$ and the polynomials $S_m$: \[ \sum_{m=0}^{p-1} (-1)^m N_{p-m}\cdot S_m + (-1)^p p \cdot S_p = 0. \] Using that the coefficient of $N_p$ is given by $S_0=1$, they allow one to express inductively the polynomials $N_p$ in terms of $S_1,\dots,S_p$. If $p>n$, all terms with $m>n$ vanish. Using that the coefficient of $S_p$ is $\pm p$, these formulas also show, conversely, that if $p!$ is invertible in the commutative ring $R$, then $S_1,\dots,S_p$ can be expressed in terms of $N_1,\dots,N_p$. There are various proofs of these identities. Most of the ones I have seen are algebraic in nature, in that they rely on the relations between the roots of a polynomial and its coefficients, and/or on expansion in power series. 
The following proof, due to Doron Zeilberger (1984) (Discrete Math. 49, p. 319, doi:10.1016/0012-365X(84)90171-7), is purely combinatorial. I have been made aware of it because it is the one that is formalized in the proof assistant system Lean. (See there for the source code.) We consider the set $S=\{1,\dots,n\}$ and define a set $X$ whose elements are the pairs $(A,k)$ satisfying the following properties:

$A$ is a subset of $S$;
$k$ is an element of $S$;
the cardinality of $A$ is less than or equal to $p$;
if ${\mathop {\rm Card}}(A)=p$, then $k\in A$.

Define a map $f\colon X\to X$ by the following rule: for $(A,k)$ in $X$, set

$f(A,k) = (A \setminus\{k\}, k)$ if $k\in A$;
$f(A,k) = (A\cup\{k\}, k)$ if $k\notin A$.

Write $f(A,k)=(A',k)$; observe that this is again an element of $X$. The first two conditions are obvious. In the first case, when $k\in A$, then the cardinality of $A'$ is one less than the cardinality of $A$, hence is at most $p-1$, and the last condition is vacuous. In the second case, when $k\notin A$, the cardinality of $A$ is strictly smaller than $p$, because $(A,k)\in X$, so that the cardinality of $A'$ is at most $p$; moreover, $k\in A'$ hence the last condition holds. Observe that $f\colon X\to X$ is an involution: $f\circ f={\mathrm{id}}_X$. Indeed, with the same notation, one has $f(A,k)=(A',k)$. In the first case, $k\in A$, hence $k\notin A'$, so that $f(A',k)=(A'\cup\{k\},k)=(A,k)$; in the second case, one has $k\in A'$, hence $f(A',k)=(A'\setminus\{k\}, k)=(A,k)$ since $A'=A\cup\{k\}$ and $k\notin A$. Moreover, $f$ has no fixed point because it switches pairs $(A,k)$ such that $k\in A$ and pairs such that $k\notin A$. Zeilberger's proof of the Newton identities builds on a function $w\colon X\to R[T_1,\dots,T_n]$ such that $w(f(A,k))=-w(A,k)$ for all $(A,k)\in X$. Since $f$ is an involution without fixed point, the expression \[ \sum_{(A,k)\in X} w(A,k) \] vanishes, since one can cancel the term $w(A,k)$ with the term $w(f(A,k))$. Zeilberger's definition of the function $w$ is \[ w(A,k) = (-1)^{\mathrm{Card}(A)} T_k^{p-\mathrm{Card}(A)}\cdot \prod_{a\in A} T_a. \] It satisfies the desired relation: for $(A,k)\in X$ such that $k\notin A$, one has $f(A,k)=(A\cup\{k\}, k)$, so that \[ w(f(A,k)) = (-1)^{\mathrm{Card}(A)+1} T_k^{p-\mathrm{Card}(A)-1} \prod_{a\in A\cup\{k\}} T_a = - (-1)^{\mathrm{Card}(A)} T_k^{p-\mathrm{Card}(A)} \prod_{a\in A} T_a = -w(A,k). \] When $k\in A$, one has $f(A,k)=(A',k)$ with $k\notin A'$, and the formula applied to $(A',k)$ implies the one for $(A,k)$, because $f$ is an involution. 
Now, in the relation $ \sum_{(A,k)} w(A,k) =0$, we first sum on $A$ and then on $k$: \[ \sum_{A\subset S} \sum_{k | (A,k) \in X} (-1)^{\mathrm{Card}(A)} T_k^{p-{\mathrm{Card}(A)}} \prod_{a\in A} T_a = 0\] and put together all those sets $A$ which have the same cardinality, say, $m$. This gives \[ \sum_{m=0}^n (-1)^m \sum_{\mathrm{Card}(A)=m} \prod_{a\in A} T_a \cdot \sum_{k | (A,k)\in X} T_k^{p-m} = 0. \] Let us compute separately the contribution to the sum of each $m$. When $m\gt p$, no subset $A$ is allowed, so that the corresponding sum is zero. On the other hand, when $m\lt p$, for any subset $A$ such that $\mathrm{Card}(A)=m$, all pairs $(A,k)$ belong to $X$, so that the inner sum is $N_{p-m}$, and the whole term is $(-1)^mS_m N_{p-m}$. Finally, when $m=p$, the only $k$ that appear in the inner sum are the elements of $A$, and the inner sum is $\sum_{k\in A} T_k^0=\mathrm {Card}(A)=p$; this term contributes $(-1)^p p S_p$.
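The whole argument can be replayed on small cases. The Python sketch below is my own illustration (it is not the Lean formalization mentioned above, and the function names are mine): it builds the set $X$, the involution $f$ and the weight $w$, with the indeterminates replaced by integer values stored in a dictionary `T`, and compares the sum of the weights with the left-hand side of the Newton identity.

```python
from itertools import combinations
from math import prod


def build_X(n, p):
    """Pairs (A, k): A a subset of {1..n}, Card(A) <= p, k in {1..n},
    and k in A whenever Card(A) = p."""
    S = range(1, n + 1)
    X = []
    for size in range(min(n, p) + 1):
        for A in combinations(S, size):
            for k in S:
                if size < p or k in A:
                    X.append((frozenset(A), k))
    return X


def f(A, k):
    """Toggle the membership of k in A."""
    return (A - {k}, k) if k in A else (A | {k}, k)


def w(A, k, T, p):
    """(-1)^Card(A) * T_k^(p - Card(A)) * prod_{a in A} T_a at the values T."""
    return (-1) ** len(A) * T[k] ** (p - len(A)) * prod(T[a] for a in A)


def newton_lhs(T, n, p):
    """sum_{m<p} (-1)^m N_{p-m} S_m + (-1)^p p S_p, evaluated at the values T."""
    def S(m):
        return sum(prod(T[i] for i in c)
                   for c in combinations(range(1, n + 1), m))

    def N(j):
        return sum(T[i] ** j for i in range(1, n + 1))

    return sum((-1) ** m * N(p - m) * S(m) for m in range(p)) + (-1) ** p * p * S(p)


if __name__ == "__main__":
    n, T = 4, {1: 2, 2: -3, 3: 5, 4: 7}  # arbitrary integer values
    for p in range(1, 7):
        X = build_X(n, p)
        total = sum(w(A, k, T, p) for A, k in X)
        print(p, len(X), total, newton_lhs(T, n, p))
```

Checking the identity at integer values is of course weaker than proving it for polynomials; the point is only to see the cancellation at work.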

On soccer balls

2023-08-04T21:41:00.003+02:00

This is a swift adaptation of a thread on Mastodon written after having read the beginning of Mara Goyet's column in yesterday's edition of Le Monde (please don't spoil the game by jumping to it now). So look at this picture below.

Does something strike you in it? The directions? The ball? The character? Nothing? 68 people voted, roughly one third found nothing, and two thirds saw that there's a problem with the ball. One nice remark from Sarah was that the ball was larger than the character! So, what's the problem with the ball? As was observed by Matt Parker in 2017, no soccer ball can be made following what the picture suggests: what we see is a combination of hexagons, some black, some white, and it is actually impossible to stitch hexagons together to make a ball. Or the other way round, it is impossible to take a ball and draw a hexagonal pattern on it. On the other hand, it is quite easy to draw a hexagonal pattern on a plane, and many kitchens and bathrooms feature such a tiling. Here is a photograph (stolen from somebody on Pinterest) of a 4th cent. BCE pavement (Porta Rosa, in the city of Elea-Velia, Italy)

But on a ball, nope. So how is a soccer ball made if not with hexagons? The point is that it is not made of hexagons only: it has 20 hexagons (so that the signpost above is slightly reminiscent of a soccer ball) but it also has 12 pentagons.

The whole structure forms what is called a truncated icosahedron. A regular icosahedron, as shown on the picture below (the Spinoza monument in Amsterdam) would have 20 triangular faces, such that five of them meet at each vertex. If one makes a cut at these vertices, each of them is replaced by a pentagon, and one gets the truncated icosahedron.

I should add the mathematical explanation for the impossibility of stitching hexagons into a ball. This comes from a formula due to Euler: if one stitches polygons into a ball in any way, some relation must hold: the number of vertices minus the number of edges plus the number of polygons is always equal to 2. Imagine you have some number of hexagons, say $n$. That makes $6n$ edges, but each time you make a stitch, you glue two edges together, hence $3n$ edges in the end. What about the vertices, namely the points where at least 3 hexagons are joined? If the picture is correct, then 3 hexagons meet at each vertex, so the number of vertices is $6n/3=2n$. Then Euler's formula says $2n - 3n + n = 2$, which is absurd. … Now, is it really necessary that exactly 3 hexagons meet at each vertex? Couldn't we have more complicated configurations? As a matter of fact, I'm not sure about the answer, and I can try to imagine strange configurations where the number of hexagons that meet varies from one vertex to another. But algebra seems to contradict this in the end. Let me try. Assume we have $s_3$ vertices where 3 hexagons meet, $s_4$ vertices where 4 hexagons meet, etc. The total number of vertices is then $s = s_3 + s_4 + \cdots$. The $n$ hexagons contribute $6n$ vertices, but those of type $s_3$ are counted 3 times, those of type $s_4$ are counted 4 times, etc., so that $6n = 3s_3 + 4s_4 +\cdots = 3s + d$, where $d$ is defined by $d = s_4 + 2s_5+ \cdots$. In particular, we have $6n \geq 3s$, which means $2n \geq s$. Euler's formula still says $2 = s - 3n + n = s - 2n \leq 0$, still giving a contradiction. And if you followed that proof, you may object that there is actually a way to stitch hexagons into a ball: it consists in stitching exactly 2 hexagons together. You get something completely flat, but if you blow enough air or fill it with wool, so that it gets round enough, you might feel that you got a ball. 
The picture made by the hexagons on the ball is then that of a hexagon drawn on the equator, each of the two hexagons covering one hemisphere. Here are additional references:

A post on the BBC site, where the problem is explained roughly, up to the fact that the UK authorities refused to propose a correct design for the post after Matt Parker raised the issue
A video by Matt Parker on his YouTube channel, @standupmaths, where he discusses how soccer balls are made, especially in the 2014 WWC or the UK Premier League
A text by Étienne Ghys on the large audience website Image des mathématiques about the 2014 WWC soccer ball (in French)
Euler's formula has a rich history which is evoked in the Wikipedia page quoted above, but is also the matter of the great book Proofs and Refutations by Imre Lakatos.
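The counting argument with Euler's formula given earlier in this post is easy to replay numerically. Here is a small Python sketch (the function name and its interface are mine); it assumes, as in the first computation, that every edge is shared by exactly two faces and every vertex by the same number of faces.

```python
def euler_characteristic(faces_by_sides, faces_per_vertex=3):
    """Vertices - edges + faces for polygons stitched into a closed surface,
    assuming each edge belongs to exactly two faces and each vertex to
    `faces_per_vertex` faces.

    `faces_by_sides` maps a number of sides to the number of such faces.
    """
    corners = sum(sides * count for sides, count in faces_by_sides.items())
    faces = sum(faces_by_sides.values())
    edges = corners // 2                    # each stitch glues two sides
    vertices = corners // faces_per_vertex  # each vertex is shared by the faces around it
    return vertices - edges + faces


if __name__ == "__main__":
    # Truncated icosahedron: 12 pentagons and 20 hexagons.
    print(euler_characteristic({5: 12, 6: 20}))
    # Hexagons only, three at each vertex, for a few values of n.
    print([euler_characteristic({6: n}) for n in (2, 10, 20, 32)])
```

With 12 pentagons and 20 hexagons one gets 60 vertices, 90 edges and 32 faces, in agreement with Euler's formula, while hexagons alone always give $2n-3n+n=0$.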

Electrostatics and rationality of power series

2023-07-06T00:33:00.001+02:00

I would like to tell here a story that runs over some 150 years of mathematics, around the following question: given a power series $\sum a_n T^n$ (in one variable), how can you tell it comes from a rational function?

There are two possible motivations for such a question. One comes from complex function theory: you are given an analytic function and you wish to understand its nature; the simplest of them being the rational functions, it is natural to wonder if that happens or not (the next step would be to decide whether that function is algebraic, as in the problem of Hermann Amandus Schwarz (1843–1921)). Another motivation starts from the coefficients $(a_n)$, of which the power series is called the generating series; indeed, the generating series is a rational function if and only if the sequence of coefficients satisfies a linear recurrence relation.

At this stage, there are few tools to answer that question, besides a general algebraic criterion which essentially reformulates the property that the $(a_n)$ satisfy a linear recurrence relation. For any integers $m$ and $q$, let $D_m^q$ be the determinant of size $(q+1)$ given by \[ D_m^q = \begin{vmatrix} a_m & a_{m+1} & \dots & a_{m+q} \\ a_{m+1} & a_{m+2} & \dots & a_{m+q+1} \\ \vdots & \vdots & & \vdots \\ a_{m+q} & a_{m+q+1} & \dots & a_{m+2q} \end{vmatrix}. \] These determinants are called the Hankel determinants or (when $m=0$) the Kronecker determinants, from the names of the two 19th century German mathematicians Hermann Hankel (1839–1873) and Leopold von Kronecker (1823–1891). With this notation, the following properties are equivalent:

The power series $\sum a_n T^n$ comes from a rational function;

There is an integer $q$ such that $D_m^q=0$ for all large enough integers $m$;

For all large enough integers $q$, one has $D^q_0=0$.

(The proof of that classic criterion is not too complicated, but the standard proof is quite smart. In his book Algebraic numbers and Fourier analysis, Raphaël Salem gives a proof which is arguably easier.)
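To get a feeling for this criterion, one can compute a few Hankel determinants exactly. The Python sketch below is only an illustration: the two test sequences are my choice, not taken from the papers discussed here. The Fibonacci numbers have the rational generating series $T/(1-T-T^2)$, and the Catalan numbers have an algebraic generating series which is not rational.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap changes the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def hankel_det(a, m, q):
    """D_m^q: determinant of the (q+1) x (q+1) matrix with entries a[m+i+j]."""
    return det([[a[m + i + j] for j in range(q + 1)] for i in range(q + 1)])


def fibonacci(count):
    seq = [0, 1]
    while len(seq) < count:
        seq.append(seq[-1] + seq[-2])
    return seq[:count]


def catalan(count):
    seq = [1]
    for k in range(count - 1):
        seq.append(seq[-1] * 2 * (2 * k + 1) // (k + 2))
    return seq


if __name__ == "__main__":
    fib, cat = fibonacci(40), catalan(40)
    print([hankel_det(fib, m, 1) for m in range(6)])  # nonzero: q = 1 is too small
    print([hankel_det(fib, m, 2) for m in range(6)])  # vanish: recurrence of order 2
    print([hankel_det(cat, 0, q) for q in range(6)])  # never vanish
```

For the Fibonacci sequence, the determinants $D_m^q$ vanish as soon as $q\geq 2$, which reflects the linear recurrence of order 2; for the Catalan numbers, the determinants $D_0^q$ stay nonzero in the range computed.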

Since this algebraic criterion is very general, it is however almost impossible to prove the vanishing of these determinants without further information, and it is at this stage that Émile Borel enters the story. Émile Borel (1871–1956) was not only a very important mathematician of the first half of the 20th century, through his works on analysis and probability theory; he was also a member of parliament, a minister of the Navy, and a member of the Résistance during WW2. He founded the French research institution CNRS and the Institut Henri Poincaré. He was also the first president of the Confédération des travailleurs intellectuels, an intellectual workers' union.

In his 1893 paper « Sur un théorème de M. Hadamard », Borel proves the following theorem:

Theorem. — If the coefficients $a_n$ are integers and if the power series $\sum a_n T^n $ “defines” a function (possibly with poles) on a disk centered at the origin and of radius strictly greater than 1, then that power series is a rational function.

Observe how these two hypotheses belong to two quite unrelated worlds: the first one sets the question within number theory while the second one pertains to complex function theory. It looks almost like magic that these two hypotheses lead to the nice conclusion that the power series is a rational function.

It is also worth remarking that the second hypothesis is really necessary for the conclusion to hold, because rational functions define functions (with poles) on the whole complex plane. The status of the first hypothesis is more mysterious. While it is not necessary, the conclusion may not hold without it. For example, the exponential series $\sum T^n/n!$ does define a function (without poles) on the whole complex plane, but is not rational (it grows too fast at infinity).

However, the interaction of number theoretical hypotheses with the question of the nature of power series was not totally unexplored at the time of Borel. For example, an 1852 theorem of the German mathematician Gotthold Eisenstein (Über eine allgemeine Eigenschaft der Reihen-Entwicklungen aller algebraischen Functionen) shows that when the coefficients $a_n$ of the expansion $\sum a_nT^n$ of an algebraic function are rational numbers, the denominators are not arbitrary: there is an integer $D\geq 1$ such that for all $n$, $a_n D^{n+1}$ is an integer. As a consequence of that theorem of Eisenstein, the exponential series or the logarithmic series cannot be algebraic.

It's always time somewhere on the Internet for a mathematical proof, so I have no excuse not to tell you *how* Émile Borel proved that result. He uses the above algebraic criterion, hence needs to prove that some determinants $D^q_m$ introduced above do vanish (for some $q$ and for all $m$ large enough). Then his idea consists in observing that these determinants are integers, so that if you wish to prove that they vanish, it suffices to prove that they are smaller than one in absolute value!

If non-mathematicians are still reading me, there's no mistake here: the main argument for the proof is the remark that a nonzero integer is at least one in absolute value. While this may sound like a trivial remark, this is something I like to call the main theorem of number theory, because it lies at the heart of almost all proofs in number theory.

So one has to bound determinants from above, and here Borel invokes the « théorème de M. Hadamard » that a determinant, being the volume of the parallelepiped formed by the rows, is at most the product of the norms of these rows, considered as vectors of the Euclidean space: in 2-D, the area of a parallelogram is at most the product of the lengths of its edges! (Jacques Hadamard (1865–1963) is known for many extremely important results, notably the first proof of the Prime number theorem. It is funny that this elementary result went into the title of a paper!)

But there's no hope that applying Hadamard's inequality to our initial matrix can be of some help, since that matrix has integer coefficients, so that all nonzero rows have norm at least one. So Borel starts making clever row combinations on the Hankel matrices that take into account the poles of the function that the given power series defines.

Basically, if $f=\sum a_nT^n$, there exists a polynomial $h=\sum c_mT^m$, say of degree $p$, such that the power series $g=fh = \sum b_n T^n$ defines a function without poles on some disk $D(0,R)$ where $R>1$. Using complex function theory (Cauchy's inequalities), this implies that the coefficients $b_n$ converge rapidly to 0, roughly as $R^{-n}$. For the same reason, the coefficients $a_n$ cannot grow too fast, at most as $r^{-n}$ for some $r>0$. The formula $g=fh$ shows that the coefficients $b_n$ are combinations of the $a_n$, so that the determinant $D_n^q$ is also equal (up to a nonzero constant factor coming from these row combinations) to \[ \begin{vmatrix} a_n & a_{n+1} & \dots & a_{n+q} \\ \vdots & & & \vdots \\ a_{n+p-1} & a_{n+p} & \dots & a_{n+p+q-1} \\ b_{n+p} & b_{n+p+1} & \dots & b_{n+p+q} \\ \vdots & & & \vdots \\ b_{n+q} & b_{n+q+1} & \dots & b_{n+2q} \end{vmatrix}\] Now, Hadamard's inequality implies that the determinant $D_n^q$ is (roughly) bounded above by $ (r^{-n} )^p (R^{-n}) ^{q+1-p} $: there are $p$ lines bounded above by some $r^{-n}$ and the next $q+1-p$ are bounded above by $R^{-n}$. This expression rewrites as $ 1/(r^pR^{q+1-p})^n$. Since $R>1$, we may choose $q$ large enough so that $r^p R^{q+1-p}>1$, and then, when $n$ grows to infinity, the determinant is smaller than 1. Hence it vanishes!

The next chapter of this story happens in 1928, in the hands of the Hungarian mathematician George Pólya (1887–1985). Pólya had already written several papers which explore the interaction of number theory and complex function theory; one of them will even reappear later on in this thread. In his paper “Über gewisse notwendige Determinantenkriterien für die Fortsetzbarkeit einer Potenzreihe”, he studied the analogue of Borel's question when the disk of radius $R$ is replaced by an arbitrary domain $U$ of the complex plane containing the origin, proving that if $U$ is big enough, then the initial power series is a rational function. It is however not so obvious how one should measure the size of $U$, and it is at this point that electrostatics enters the picture.

In fact, it is convenient to make an inversion: the assumption is that the series $\sum a_n / T^n$ defines a function (with poles) on the complement of a compact subset $K$ of the complex plane. Imagine that this compact set is made of metal, put at potential 0, and put a unit electric charge at infinity. According to the 2-D laws of electrostatics, this creates an electric potential $V_K$ which is identically $0$ on $K$ and behaves as $ V_K(z)\approx \log(|z|/C_K)$ at infinity. Here, $C_K$ is a positive constant which is the capacity of $K$.

Theorem (Pólya). — Assume that the $a_n$ are integers and the series $\sum a_n/T^n$ defines a function (with poles) on the complement of $K$. If the capacity of $K$ is $\lt1$, then $\sum a_n T^n$ is rational.

To apply this theorem, it is important to know of computations of capacities. This was a classic theme of complex function theory and numerical analysis some 50 years ago. Indeed, what the electric potential does is solving the Laplace equation $\Delta V_K=0$ outside of $K$ with Dirichlet condition on the boundary of $K$.

In fact, the early times of complex analysis made a remarkable use of this fact. For example, it was by solving the Laplace equation that Bernhard Riemann proved the existence of meromorphic functions on “Riemann surfaces”, but analysis was not developed enough at that time (around 1860). In a stunningly creative move, Riemann imagined that his surface was partly made of metal and partly of insulating material, and he deduced the construction of the desired function from the electric potential.

More recently, complex analysis and potential theory also had applications to fluid dynamics, for example to compute (at least approximately) the flow of air outside of an airplane wing. (I am not a specialist of this, but I'd guess the development of numerical methods that run on modern computers rendered these beautiful methods obsolete.)

The relation between the theorems of Borel and Pólya is that the capacity of a disk is its radius. This can be seen by the fact that $V(z)=\log(|z|/R)$ solves the Laplace equation with Dirichlet condition outside of the disk of radius $R$.

A few other capacities have been computed, not too many, in fact, because it appears to be a surprisingly difficult problem. For example, the capacity of an interval is a fourth of its length.

Pólya's proof is similar to Borel's, but considers the Kronecker determinant in place of Hankel's. However, the linear combinations that allow one to show that this determinant is small are not as explicit as in Borel's proof. They follow from another interpretation of the capacity introduced by the Hungarian-Israeli mathematician Michael Fekete (1886–1957; born in then Austria–Hungary, now Serbia, he emigrated to Palestine in 1928.)

You know that the diameter $d_2(K)$ of $K$ is the upper bound of all distances $|x-y|$ where $x,y$ are arbitrary points of $K$. Now for an integer $n\geq 2$, consider the upper bound $d_n(K)$ of all products of distances $\prod_{i\neq j}|x_j-x_i|^{1/n(n-1)}$ where $x_1,\dots,x_n$ are arbitrary points of $K$. It is not so hard to prove that the sequence $d_n(K)$ decreases with $n$, and the limit $\delta(K)$ of that sequence is called the transfinite diameter by Fekete.
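To get a feeling for these quantities, here is a minimal Python sketch (the function names are ad hoc, chosen for illustration). It evaluates the product of distances at the $n$th roots of unity, which are points of the unit disk, so the value is at least a lower bound for $d_n$ of that disk; a direct computation gives $n^{1/(n-1)}$ for this configuration, which decreases to $1$, in accordance with the fact that the capacity of a disk is its radius.

```python
import cmath
import math

def distance_product(points):
    """Return prod_{i != j} |x_j - x_i| raised to the power 1/(n(n-1))."""
    n = len(points)
    log_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j:
                # Work with logarithms to avoid overflow for large n.
                log_sum += math.log(abs(points[j] - points[i]))
    return math.exp(log_sum / (n * (n - 1)))

def roots_of_unity(n):
    """The n-th roots of unity, as complex numbers on the unit circle."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]

# Values for a few n; they decrease slowly towards 1.
values = {n: distance_product(roots_of_unity(n)) for n in (2, 5, 20, 100)}
for n, value in values.items():
    print(n, value)
```

The sketch does not attempt to compute the upper bound over all configurations, which is a genuine optimization problem.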

Proposition. — $ \delta(K)= C_K$.

This allows one to make a link between capacity theory and another theme of complex function theory, namely the theory of best approximation, which ends up in Pólya's proof: the adequate linear combination for the $n$th row is given by the coefficients of the monic polynomial of degree $n$ which has the smallest least upper bound on $K$.

If all this is of some appeal to you, there's a wonderful little book by Thomas Ransford, Potential Theory in the Complex Plane, which I find quite readable (say, from 3rd or 4th year of math studies on).

In the forthcoming episodes, I'll discuss two striking applications of the theorems of Borel and Pólya: the proof by Bernard Dwork of a conjecture of Weil (in 1960), and a new proof (in 1987) by Jean-Paul Bézivin and Philippe Robba of the transcendence of the numbers $e$ and $\pi$, two results initially proven by Charles Hermite and Ferdinand von Lindemann in 1873 and 1882.

Associated prime ideals

2023-02-17T12:38:00.000+01:00

$\gdef\ann{\mathop{\mathrm{ann}}} \gdef\Ass{\mathop{\mathrm{Ass}}}\gdef\Spec{\mathop{\mathrm{Spec}}}$

I would like to go back to a quite delicate question of commutative algebra, that of associated prime ideals of modules. In most textbooks (Bourbaki, Matsumura…), this concept is considered for modules over a noetherian ring, while it is also necessary to consider it in a greater generality for some applications in algebraic geometry. For my book, (Mostly) commutative algebra (Springer Nature, 2021), I preferred to introduce the general concept (§6.5), because I observed that the initial proofs are in fact easier. In yesterday's class (Cohomology of coherent sheaves, 2nd year of Master course at Université Paris Cité), some remarks of a student, Elias Caeiro, helped me simplify two steps of the treatment I proposed in my book.

Definition. — Let $A$ be a ring and let $M$ be an $A$-module. Say that a prime ideal $P$ of $A$ is associated to $M$ if there exists an element $m\in M$ such that $P$ is minimal among all prime ideals containing $\ann_A(m)$.
We write $\Ass_A(M)$ (sometimes spelt out as “assassin”) for the set of all associated prime ideals of $M$.
(Here, $\ann_A(m)$ is the annihilator of $m$, the ideal of all $a\in A$ such that $am=0$.)

There is a geometric way to interpret this definition: it means that in the spectrum $\Spec(A)$, the irreducible closed set $V(P)$ (of which $P$ is the generic point) is an irreducible component of $V(\ann_A(m))$. Thanks to this remark, associated prime ideals are compatible with localisation: \[ \Ass_{S^{-1}A}(S^{-1}M) = \Ass_A(M) \cap \Spec(S^{-1}A), \] where $\Spec(S^{-1}A)$ is identified with the subset of $\Spec(A)$ consisting of prime ideals $P$ which are disjoint from $S$. In particular, $P$ is associated to $M$ if and only if the maximal ideal $PA_P$ of the local ring $A_P$ is associated to the module $M_P$.

Here is what the associated prime ideals mean, from the point of view of module theory.
Proposition. — Let $a\in A$.
a) The multiplication by $a$ is injective in $M$ if and only if $a$ does not belong to any associated prime ideal of $M$.
b) The localized module $M_a$ is zero if and only if $a$ belongs to all associated prime ideals of $M$.
c) In particular, $M=0$ if and only if $\Ass_A(M)=\emptyset$.

Proof. — a) If $a$ belongs to the associated prime ideal $P$, then $a$ belongs to the associated prime ideal $PA_P$ of $M_P$, which means that there exists $m\in M$ such that $PA_P$ is the only prime ideal containing $\ann_{A_P}(m)$. Consequently, $a$ is nilpotent modulo $\ann_{A_P}(m)$ and there exists $n\geq 0$ and $b\in A\setminus P$ such that $a^nb\in\ann_A(m)$. Take a minimal such $n$. Since $b\notin P$, one has $n\geq 1$; then $a^{n-1}b m\neq0$, while $a\cdot a^{n-1}bm=0$ and the homothety $(a)_M$ is not injective. Conversely, if $(a)_M$ is not injective, take $m\neq0$ in $M$ such that $am=0$; the annihilator $\ann_A(m)$ is not equal to $A$, hence $V(\ann_A(m))\neq \emptyset$; take an irreducible component of this closed subset — equivalently a minimal prime ideal $P$ among those containing $\ann_A(m)$; one has $a\in \ann_A(m)$, hence $a\in P$.
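b) follows from c), applied to the $A_a$-module $M_a$: by compatibility with localisation, $\Ass_{A_a}(M_a)$ is the set of associated prime ideals of $M$ that do not contain $a$. (For $a=1$, assertion b) reduces to c).)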
c) The module $M$ is zero if and only if the multiplication by $0$ is injective on $M$. By a), this is equivalent to the fact that $\Ass_A(M)$ is empty.
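The proposition can be tested on the simplest examples, the $\mathbf Z$-modules $\mathbf Z/n\mathbf Z$. The following Python sketch (with ad hoc function names) computes the associated prime ideals directly from the definition: the annihilator of the class of $m$ is generated by $n/\gcd(n,m)$, and the minimal prime ideals containing $(k)$, for $k\geq 1$, are the $(p)$ for $p$ a prime divisor of $k$. Prime ideals are represented by their positive generators.

```python
from math import gcd

def prime_divisors(k):
    """Set of prime divisors of the positive integer k."""
    primes, d = set(), 2
    while d * d <= k:
        while k % d == 0:
            primes.add(d)
            k //= d
        d += 1
    if k > 1:
        primes.add(k)
    return primes

def associated_primes(n):
    """Associated primes of the Z-module Z/nZ, computed from the definition."""
    ass = set()
    for m in range(n):
        # ann(m) is generated by n // gcd(n, m); for m = 0 this is 1.
        ass |= prime_divisors(n // gcd(n, m))
    return ass

def multiplication_is_injective(a, n):
    """Whether x -> a*x is injective on Z/nZ."""
    return len({(a * m) % n for m in range(n)}) == n

print(associated_primes(12))
```

For $n=12$, the associated primes are $2$ and $3$, and multiplication by $a$ is injective exactly when $a$ is divisible by neither; for $n=1$ the module is zero and the set is empty.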

Corollary. — A prime ideal $P$ is in the support of $M$ if and only if it contains some associated prime ideal.
The prime ideal $P$ belongs to the support of $M$ if and only if $M_P\neq0$, if and only if $\Ass_{A_P}(M_P)$ is not empty, if and only if there exists an associated prime ideal of $M$ which belongs to $\Spec(A_P)$, that is, is contained in $P$.

For noetherian rings, one has the following characterization of associated prime ideals, which is usually taken as their definition.

Theorem. — Let $A$ be a noetherian ring and $M$ be an $A$-module. A prime ideal $P$ of $A$ is associated to $M$ if and only if there exists $m\in M$ such that $P=\ann_A(m)$.
If $P=\ann_A(m)$, then $P$ is associated to $M$. Conversely, let $m\in M$ and let $P$ be a minimal prime ideal of $A$ among those containing $\ann_A(m)$. We first assume that $A$ is local with maximal ideal $P$; then $P$ is the only prime ideal of $A$ that contains $\ann_A(m)$, which implies that any element of $P$ is nilpotent modulo $\ann_A(m)$. Since $P$ is finitely generated (because $A$ is noetherian), there exists an integer $n$ such that $P^n\subseteq \ann_A(m)$. Take a minimal such $n$. Since $\ann_A(m)\subseteq P$, one has $n\geq 1$; then $P^{n-1}\not\subseteq\ann_A(m)$ so that there exists $b\in P^{n-1}$ such that $bm\neq0$. Then $ab\in P^n$ for every $a\in P$, so that $P\subseteq \ann_A(bm)$, and $\ann_A(bm)\subseteq P$ because $bm\neq0$. Consequently, $P=\ann_A(bm)$. In the general case, we use the case of a local ring to obtain $m\in M$ such that $\ann_{A_P}(m/1)=PA_P$. Consequently, $\ann_A(m)\subseteq P$, and for every $a\in P$, there exists $b\notin P$ such that $abm=0$. Using that $P$ is finitely generated, one finds $b\notin P$ such that $abm=0$ for every $a\in P$; then $\ann_A(bm)=P$, as was to be shown.

From that point on, both presentations converge. One deduces from the preceding theorem that if $A$ is noetherian and $M$ is finitely generated, there exists a composition series $0=M_0\subseteq M_1 \subseteq \dots \subseteq M_n=M$, with successive quotients $M_k/M_{k-1}$ of the form $A/P_k$, for some prime ideals $P_k$ of $A$, and then $\Ass_A(M)$ is contained in $\{P_1,\dots,P_n\}$, in view of the following lemma. In particular, $\Ass_A(M)$ is finite.

Lemma. — Let $M$ be an $A$-module and let $N$ be a submodule of $M$; then $ \Ass_A(N)\subseteq \Ass_A(M)\subseteq \Ass_A(N)\cup \Ass_A(M/N)$.
The first inclusion $\Ass_A(N)\subseteq \Ass_A(M)$ follows from the definition. Let us prove the second one. Let $P\in\Ass_A(M)$ and let $m\in M$ be such that $P$ is a minimal prime ideal of $A$ among those containing $\ann_A(m)$. Let $m'$ be the image of $m$ in $M/N$. If $P$ contains $\ann_A(m')$, then $P$ is also minimal among such prime ideals, hence $P\in\Ass_A(M/N)$. Otherwise, there exists $b\in \ann_A(m')$ such that $b\notin P$. Let us prove that $P$ is minimal among the prime ideals containing $\ann_A(bm)$. First of all, let $a\in\ann_A(bm)$; then $abm=0$, hence $ab\in P$, hence $a\in P$ since $b\notin P$. Since $\ann_A(m)\subseteq\ann_A(bm)$, it also follows that $P$ is minimal among the prime ideals containing $\ann_A(bm)$. Since $b\in\ann_A(m')$, one has $bm'=0$, hence $bm\in N$ and $P\in\Ass_A(N)$.

The Klein group, the centralizer of a permutation, and its relation with the alternating group

2023-01-01T21:48:00.001+01:00

The following reflexion came out of my irrepressible need to understand why the 3 double transpositions in $\mathfrak S_4$, together with the identity, formed a group $V$. Of course, one might just say: “they are stable under multiplication, as one sees by computing the 4·3/2 = 6 different products”, but who makes this computation anyway? And since I wanted not only to understand this, but to explain it to Lean, I needed an argument that could actually be done, for real. So here is an argument that requires no computation, besides the one that says that there are 3 double transpositions.

Prop. — The subgroup of $\mathfrak S_4$ generated by the 3 double transpositions is the unique 2-sylow subgroup of $\mathfrak A_4$. In particular, it has order 4 and consists of these 3 double transpositions and of the identity.
Proof. — Let $V$ be the subset of $\mathfrak S_4$ consisting of these 3 double transpositions and of the identity. Let $S$ be a 2-sylow subgroup in $\mathfrak A_4$.
We first prove $S \subseteq V$. The subgroup $S$ has order 4. Let $g\in S$. The order of $g$ divides 4, so its cycles have lengths 1, 2 or 4. If there were one cycle of length 4, then $g$ would be that cycle, hence of odd sign. Consequently, either $g=1$, or $g$ has a cycle of length 2, and then there must be a second because $g$ is even. Consequently, $S\subseteq V$, as claimed.
Since $4=\operatorname{\rm Card}(S)=\operatorname{\rm Card}(V)$, this shows that $S=V$, hence $S=\langle V\rangle$.
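For the skeptical reader, the computation that nobody makes can of course be delegated to a machine. Here is a minimal Python sketch (function names are ad hoc; permutations of $\{0,1,2,3\}$ are encoded as tuples) that lists the double transpositions and checks that, together with the identity, they are stable under multiplication.

```python
from itertools import permutations

def compose(g, h):
    """Composition g∘h of permutations encoded as tuples (i -> g[i])."""
    return tuple(g[h[i]] for i in range(len(h)))

def orbit_type(g):
    """Sorted tuple of the orbit lengths of the permutation g."""
    seen, lengths = set(), []
    for start in range(len(g)):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = g[x]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths))

identity = (0, 1, 2, 3)
double_transpositions = [g for g in permutations(range(4)) if orbit_type(g) == (2, 2)]
V = set(double_transpositions) | {identity}

# Stability under products: every product of two elements of V lies in V.
closed = all(compose(g, h) in V for g in V for h in V)
print(len(double_transpositions), closed)
```

This is only a check by enumeration; it does not replace the argument above, which explains why the result holds.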

At this point, we still need to understand why there are 3 double transpositions. More generally, I wanted to compute the number of permutations in $\mathfrak S_n$ of given orbit type. The orbit type of a permutation $g$ is a multiset of strictly positive integers with sum $n$ given by the cardinalities of the orbits of $g$ on $\{1,\dots,n\}$. We write it as $1^{n_1} 2^{n_2}\dots r^{n_r}$, meaning that $g$ has $n_1$ orbits of length $1$ (fixed points), $n_2$ orbits of cardinality $2$, etc., so that $n= \sum n_i i$. Let $\mathscr O_g$ be the set of orbits of $g$. The action of $g$ on a given orbit $c$ coincides with a circular permutation with order the length $\ell(c)$ of this orbit; when it is nontrivial, such a permutation will be called a cycle of $g$. The supports of these cycles are pairwise disjoint, so that these cycles commute, and their product is exactly $g$. In fact, this is the only way of writing $g$ as a product of cycles with pairwise disjoint supports. (By convention, the identity is not a cycle.)

Theorem. — There are exactly \[ N(1^{n_1}\dots r^{n_r}) = \frac{n!}{1^{n_1}\dots r^{n_r} n_1!\dots n_r!} \] permutations with orbit type $1^{n_1} 2^{n_2}\dots r^{n_r}$.

A standard proof of this result goes as follows. Write the decomposition of such a permutation $g$ into cycles with disjoint supports as $g=(\cdot)\dots (\cdot)(\cdot,\cdot)\dots(\cdot,\cdot,\dots)$, leaving blank spaces for the values of the cycles (and, contradicting our convention, allowing for cycles of length 1…). There are $n!$ ways to fill these spaces with the $n$ distinct integers between $1$ and $n$, but some of them will give rise to the same permutation. Indeed, the entries in a cycle of length $s$ only count up to a circular permutation, so that we need to divide the result by $1^{n_1}\dots r^{n_r}$. Moreover, we can switch the order of the cycles of given length, hence we also need to divide that result by $n_s!$ (number of ways of switching the various cycles of length $s$), for all possible lengths $s$.
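The formula is easy to confront with a brute-force enumeration for small $n$. The following Python sketch (ad hoc names; permutations of $\{0,\dots,n-1\}$ encoded as tuples) counts the permutations of each orbit type in $\mathfrak S_5$ and compares with $n!/(1^{n_1}\dots r^{n_r}\,n_1!\dots n_r!)$.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def orbit_type(g):
    """Sorted tuple of the orbit lengths of the permutation g."""
    seen, lengths = set(), []
    for start in range(len(g)):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = g[x]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths))

def predicted_count(n, lengths):
    """n! / prod_i (i^{n_i} n_i!) for the orbit type given by its orbit lengths."""
    denominator = 1
    for length, multiplicity in Counter(lengths).items():
        denominator *= length ** multiplicity * factorial(multiplicity)
    return factorial(n) // denominator

n = 5
observed = Counter(orbit_type(g) for g in permutations(range(n)))
for lengths, count in sorted(observed.items()):
    print(lengths, count, predicted_count(n, lengths))
```

For instance, the orbit type $1^1 2^2$ in $\mathfrak S_5$ gives $120/(1\cdot 4\cdot 2)=15$, and the orbit type $2^2$ in $\mathfrak S_4$ gives the 3 double transpositions.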

This is reasonably convincing but one could wish for something more precise, both in the proof, and in the statement. In fact, in the preceding formula, the numerator $n!$ is the order of $\mathfrak S_n$. Since all permutations with a given orbit type are conjugate by $\mathfrak S_n$, the left hand side appears as the cardinality of the orbit of a permutation $g$ of that orbit type, so that the denominator has to be equal to the cardinality of the stabilizer of this permutation under the action by conjugation. Therefore, a more precise proof of this formula could run by elucidating the structure of this centralizer. This may also be interesting once one wishes to relativize the result to the alternating group $\mathfrak A_n$ in order to obtain a formula for the cardinality of the various conjugacy classes in $\mathfrak A_n$.

Let us fix a permutation $g\in\mathfrak S_n$ with orbit type $1^{n_1}\dots r^{n_r}$. The stabilizer of $g$ under the action by conjugation is its centralizer $Z_g$, the subgroup of all $k\in\mathfrak S_n$ which commute with $g$.

We first define a morphism of groups \[ \tau \colon Z_g \to \mathfrak S_{n_1}\times \mathfrak S_{n_2}\times\dots \times\mathfrak S_{n_r}. \] Let $\mathscr O_g$ be the set of orbits of $g$; this is a set with cardinality $n_1+n_2+\dots+n_r$. Restricted to one orbit, the action of $g$ coincides with that of a circular permutation on that orbit (which fixes the complementary subset); these circular permutations have disjoint supports, hence they commute pairwise and their product is equal to $g$. For $c\in\mathscr O_g$, we write $\ell(c)$ for its cardinality; this is also the order of the cycle of $g$ defined by this orbit. If $k\in Z_g$, then $kgk^{-1}=g$. Consequently, the action of $k$ permutes the orbits of $g$, respecting their cardinalities. This defines the desired group morphism $\tau$ from $Z_g$ to a product of permutation groups $\mathfrak S_{n_1}\times \dots \times\mathfrak S_{n_r}$.

This morphism $\tau$ is surjective.
Indeed, given permutations $\sigma_1$ of the set of fixed points of $g$, $\sigma_2$ of the set of orbits of length 2, etc., we construct $k_\sigma\in Z_g$ such that $\tau(k_\sigma)=(\sigma_1,\dots,\sigma_r)$. We fix a point $a_c$ in each orbit $c$ and decide that $k_\sigma(a_c)=a_{\sigma_i(c)}$ if $c$ has length $i$. The formula $k_\sigma g=g k_\sigma$ imposes $k_\sigma (g^n a_c)=g^n a_{\sigma_i(c)}$ for all $n\in\mathbf Z$, and it remains to check that this formula gives a well defined element in $Z_g$. In fact, this formula defines a group theoretic section of $\tau$.

What is the kernel of this morphism $\tau$?
If $\tau(k)=1$, then $k$ fixes every orbit $c\in\mathscr O_g$. Since $kg=gk$, we obtain that on each orbit $c$, $k$ coincides with some power of the corresponding cycle, which has order $\ell(c)$. We thus obtain an isomorphism \[ \ker(\tau) \simeq \prod_{c\in\mathscr O_g} (\mathbf Z/\ell(c)\mathbf Z). \]

To compute the cardinality of $Z_g$, it is now sufficient to compute those of $\operatorname{\rm im}(\tau)$ and $\ker(\tau)$, and this gives the formula \[ \operatorname{\rm Card} (Z_g) = \operatorname{\rm Card} (\ker(\tau)) \operatorname{\rm Card} (\operatorname{\rm im}(\tau)) = 1^{n_1}\dots r^{n_r} n_1! \dots n_r!, \] as was to be shown.
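As a sanity check, one can compute a centralizer by brute force and compare its order with $1^{n_1}\dots r^{n_r}\,n_1!\dots n_r!$. The Python sketch below (ad hoc names; the permutation $g$ is an arbitrary example of orbit type $1^2 2^2$ in $\mathfrak S_6$, encoded on $\{0,\dots,5\}$) does this.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def compose(g, h):
    """Composition g∘h of permutations encoded as tuples (i -> g[i])."""
    return tuple(g[h[i]] for i in range(len(h)))

def orbit_lengths(g):
    """List of the orbit lengths of the permutation g."""
    seen, lengths = set(), []
    for start in range(len(g)):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = g[x]
                length += 1
            lengths.append(length)
    return lengths

def predicted_centralizer_order(g):
    """prod_i i^{n_i} n_i!, where n_i is the number of orbits of length i."""
    order = 1
    for length, multiplicity in Counter(orbit_lengths(g)).items():
        order *= length ** multiplicity * factorial(multiplicity)
    return order

# g fixes 0 and 1, and exchanges 2 <-> 3 and 4 <-> 5.
g = (0, 1, 3, 2, 5, 4)
centralizer = [k for k in permutations(range(len(g)))
               if compose(k, g) == compose(g, k)]
print(len(centralizer), predicted_centralizer_order(g))
```

Here the kernel of $\tau$ has order $1^2\cdot 2^2=4$ and its image has order $2!\,2!=4$, and the conjugacy class of $g$ has $720/16=45$ elements.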

One interest of this argument is that it can be pushed forward to understand the structure of the conjugacy classes in the alternating group $\mathfrak A_n$. The case $n\leq 1$ is uninteresting, hence we assume $n\geq 2$. Then $\mathfrak A_n$ has index 2 in $\mathfrak S_n$, and the formulas \[ \operatorname {\rm Card}((g)_{\mathfrak A_n}) = \frac{{\rm Card}({\mathfrak A_n})}{{\rm Card}(Z_g \cap {\mathfrak A_n})} \quad\text{\rm and}\quad \operatorname {\rm Card}((g)_{\mathfrak S_n}) = \frac{{\rm Card}({\mathfrak S_n})}{{\rm Card}(Z_g)} \] for the cardinalities of the conjugacy classes $(g)_{\mathfrak A_n}$ and $(g)_{\mathfrak S_n}$ imply that both are equal if and only if $Z_g$ is not contained in $\mathfrak A_n$; otherwise, the conjugacy class $(g)_{\mathfrak S_n}$ is the disjoint union of $(g)_{\mathfrak A_n}$ and of a conjugacy class $(g')_{\mathfrak A_n}$ of a permutation $g'$ which is conjugate to $g$ in $\mathfrak S_n$ but not in $\mathfrak A_n$, and both have the same cardinality.

Examples of this phenomenon are classical. For example, the 5-cycles in $\mathfrak S_5$ are conjugate, but they constitute two distinct conjugacy classes under $\mathfrak A_5$. Even more elementary, the 3-cycles $(1\,2\,3)$ and $(1\,3\,2)$ are conjugate in $\mathfrak S_3$, but they are not conjugate in $\mathfrak A_3$ since that group is commutative!

So let us use our description of $Z_g$ to give a full description of this phenomenon.

As a first step, when is $\ker(\tau)$ contained in $\mathfrak A_n$? We have seen that $\ker(\tau)$ is generated by the cycles of $g$. Consequently, $\ker(\tau)$ is contained in $\mathfrak A_n$ if and only if all of them are contained in $\mathfrak A_n$, which means that their lengths are odd.

We assume that this condition holds, so that $\ker(\tau)\subseteq \mathfrak A_n$, and now work on the image of $\tau$. Its surjectivity was proved by the means of an explicit section $\sigma\mapsto k_\sigma$. Given the preceding condition that $\ker(\tau)\subseteq \mathfrak A_n$, a necessary and sufficient condition for the inclusion $Z_g\subseteq \mathfrak A_n$ will be that the image of this section consists of even permutations. This section is a morphism of groups, so it suffices to understand the sign of $k_\sigma$ when $\sigma$ consists of a cycle $(c_1,\dots,c_s)$ in $\mathfrak S_{n_i}$ and is trivial on the other factors. Then $\ell(c_1)=\dots=\ell(c_s)$, by definition of $\sigma$. The formula $k_\sigma(g^n a_c)=g^n a_{\sigma(c)}$ shows that the non trivial cycles of $k_\sigma$ are of the form $(g^n a_{c_1},\dots, g^n a_{c_s})$; they all have the same length, $s$, and there are $\ell(c_1)$ of them. Consequently, the sign of $k_\sigma$ is equal to $(-1)^{(s-1)\ell(c_1)}=(-1)^{s-1}$ since $\ell(c_1)$ is odd. This proves that the sign of $k_\sigma$ is equal to the sign of $\sigma$. In addition to the condition that the orbits of $g$ have odd cardinalities, a necessary and sufficient condition for the image of $\sigma\mapsto k_\sigma$ to be contained in $\mathfrak A_n$ is thus that all symmetric groups $\mathfrak S_{n_i}$ coincide with their alternating groups, that is, $n_i\leq 1$ for all $i$. We can now conclude:

Theorem. — Let $1^{n_1}\dots r^{n_r}$ be a partition of $n$.

If $n_i=0$ for even $i$, and $n_i\leq 1$ for all $i$, then there exist two permutations in $\mathfrak S_n$ with orbit type $1^{n_1}\dots r^{n_r}$ which are not conjugate in $\mathfrak A_n$.

Otherwise, any two permutations in $\mathfrak S_n$ with that orbit type are conjugate in $\mathfrak A_n$.

We can check the initial example of two 5-cycles in $\mathfrak S_5$ which are not conjugate in $\mathfrak A_5$. Their orbit type is $5^1$: the only length that appears is 5, hence odd, and it has multiplicity $1$. In fact, this is the only orbit type in $\mathfrak S_5$ where this phenomenon appears!
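The theorem can also be checked by enumeration for small $n$. The Python sketch below (ad hoc names; permutations encoded as tuples) computes, for each orbit type of even permutations, the number of conjugacy classes under $\mathfrak A_n$ into which it decomposes, and compares with the criterion of the theorem: the orbit lengths are odd and pairwise distinct.

```python
from itertools import permutations

def compose(g, h):
    """Composition g∘h of permutations encoded as tuples (i -> g[i])."""
    return tuple(g[h[i]] for i in range(len(h)))

def inverse(g):
    """Inverse of the permutation g."""
    inv = [0] * len(g)
    for i, gi in enumerate(g):
        inv[gi] = i
    return tuple(inv)

def orbit_type(g):
    """Sorted tuple of the orbit lengths of the permutation g."""
    seen, lengths = set(), []
    for start in range(len(g)):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = g[x]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths))

def is_even(g):
    """A permutation of n points is even iff n minus its number of orbits is even."""
    return (len(g) - len(orbit_type(g))) % 2 == 0

def alternating_classes_by_type(n):
    """Map each orbit type of even permutations to its number of A_n-classes."""
    alternating = [g for g in permutations(range(n)) if is_even(g)]
    remaining, result = set(alternating), {}
    while remaining:
        g = next(iter(remaining))
        conjugacy_class = {compose(compose(k, g), inverse(k)) for k in alternating}
        remaining -= conjugacy_class
        result[orbit_type(g)] = result.get(orbit_type(g), 0) + 1
    return result

def predicted_to_split(lengths):
    """Criterion of the theorem: odd and pairwise distinct orbit lengths."""
    return all(l % 2 == 1 for l in lengths) and len(set(lengths)) == len(lengths)

for n in (3, 4, 5):
    print(n, alternating_classes_by_type(n))
```

For $n=4$, the enumeration also shows the 3-cycles (orbit type $1^1 3^1$) splitting into two classes in $\mathfrak A_4$.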

Multiplicative square roots

2022-12-12T02:41:00.004+01:00

I will just discuss briefly the first section of a paper by William Waterhouse (2012), “Square Root as a Homomorphism” (American Mathematical Monthly 119 (3), 235-239), which addresses the following question: given a field $F$, when is it possible to define square roots for all squares compatibly with products, i.e., so that $\sqrt {ab}=\sqrt a\,\sqrt b$ if $a,b\in F$ are squares?

Real numbers. — Such a square root operation exists when $F$ is the field of real numbers: we are familiar with the process of taking the positive square root of a positive real number.

Finite fields. — It also exists in some finite fields. So let $F$ be a finite field, let $q$ be its number of elements; then $q$ is a power of a prime number $p$, but if you wish, you may already assume that $q=p$ is prime. For simplicity, we assume that $q$ is odd. By Fermat's little theorem, every nonzero element $a\in F$ satisfies $a^{q-1}=1$. Since $q-1$ is even, we can write $a^{q-1}=(a^{(q-1)/2})^2=1$, so that $a^{(q-1)/2}=\pm1$, and Euler's criterion asserts that $a$ is a square if and only if $a^{(q-1)/2}=1$. (That this condition is necessary is obvious: write $a=b^2$, one gets $a^{(q-1)/2}=b^{q-1}=1$ by Fermat's little theorem. Then, a counting argument shows that it is sufficient: the map $b\mapsto b^2$ is $2$ to $1$ on nonzero elements, hence its image consists of $(q-1)/2$ elements, all of which are squares; since the polynomial equation $T^{(q-1)/2}=1$ has at most $(q-1)/2$ solutions in $F$, we obtained all of them in this way.)

For example, $-1$ is a square if and only if $(-1)^{(q-1)/2}=1$, which happens if and only if $(q-1)/2$ is even, that is, $q\equiv 1\pmod 4$. In this case, do we have a formula for a square root of $-1$? When $q=p$, yes, but it is not an easy one: Wilson's theorem states that $(p-1)!\equiv -1\pmod p$, just because you may pair each integer $a$ such that $1\lt a\lt p-1$ with its multiplicative inverse modulo $p$; then only two factors remain in the product and $(p-1)!\equiv 1\cdot (p-1)\equiv -1\pmod p$. Now, we pair each integer $a$ such that $1\leq a\leq p-1$ with its additive inverse $p-a$; we get $(p-1)!\equiv (((p-1)/2)!)^2 (-1)^{(p-1)/2}\pmod p$, hence $(((p-1)/2)!)^2\equiv -1\pmod p$. This is not an easy formula, because computing the factorial takes a long time for large $p$.
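For small primes the formula is easy to test. Here is a minimal Python sketch (the function name is ad hoc); it assumes that $p$ is a prime number with $p\equiv 1\pmod 4$ and does not check this hypothesis.

```python
from math import factorial

def sqrt_minus_one_wilson(p):
    """For a prime p ≡ 1 (mod 4), return ((p-1)/2)! mod p, a square root of -1 mod p."""
    return factorial((p - 1) // 2) % p

for p in (5, 13, 17, 29, 101):
    b = sqrt_minus_one_wilson(p)
    # b*b % p should equal p - 1, which represents -1 modulo p.
    print(p, b, b * b % p)
```

The cost is visible in the code: the factorial involves about $p/2$ multiplications.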

It is possible to do this much more quickly, but you need to have a die at your disposal. Indeed, choose an element $a$ such that $1\leq a\leq p-1$ and compute $b=a^{(p-1)/4}$. Since $b^2=a^{(p-1)/2}=\pm1$, two possibilities arise: when $a$ is a square, we get $1$, but if $a$ is not a square, then we get $-1$. And if we choose $a$ randomly, we have one chance in two of not having chosen a square, hence one chance in two to get an element $b$ such that $b^2=-1$.

At this point you may ask why it isn't as long to compute the power $a^{(p-1)/4}$ as the factorial $((p-1)/2)!$, and you would be right. The reason is that there is a fast recursive way to compute a power $a^n$, by writing $a^n=(a^2)^{n/2}$ if $n$ is even, and $a^n=a\cdot (a^2)^{(n-1)/2}$ if $n$ is odd. This leads to basically $\log_2(n)$ multiplications and squarings, and not $n$ multiplications ($n-1$, actually) as the naïve expression $a\cdot a\dots a$ might have let you think.
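The two preceding ideas, the random choice of $a$ and the fast recursive computation of powers, can be put together in a few lines. The following Python sketch is only an illustration: the function names are ad hoc, $p$ is assumed to be a prime number with $p\equiv 1\pmod 4$ (this is not checked), and the standard library's pseudo-random generator plays the role of the die.

```python
import random

def power_mod(a, n, p):
    """Compute a^n mod p using a^n = (a^2)^(n/2) for even n, a·(a^2)^((n-1)/2) for odd n."""
    if n == 0:
        return 1 % p
    if n % 2 == 0:
        return power_mod(a * a % p, n // 2, p)
    return a * power_mod(a * a % p, (n - 1) // 2, p) % p

def sqrt_minus_one_random(p, rng):
    """For a prime p ≡ 1 (mod 4), draw a at random until b = a^((p-1)/4) satisfies b^2 = -1."""
    while True:
        a = rng.randrange(1, p)
        b = power_mod(a, (p - 1) // 4, p)
        if b * b % p == p - 1:
            return b

rng = random.Random(0)
p = 998244353  # a prime number congruent to 1 modulo 4
b = sqrt_minus_one_random(p, rng)
print(b, b * b % p == p - 1)
```

Each draw succeeds with probability one half, so a handful of draws suffices in practice. In Python, the built-in `pow(a, n, p)` computes the same modular power.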

But let us go back to the question of computing square roots. As the last three paragraphs indicate, it could be difficult to do so when $q\equiv 1\pmod 4$. However, it is extremely easy in the other case $q\equiv 3\pmod 4$. Take a nonzero element $a$ which is a square, and write $a^{(q-1)/2}=1$. Since $q\equiv 3\pmod 4$, we write $q=-1+4m$ so that $a^{2m-1}=1$, hence $a=a^{2m}=(a^m)^2$. We have our square root, it is simply given by $b=a^m=a^{(q+1)/4}$. The resulting map, $a\mapsto a^m$, gives us our desired multiplicative square roots on squares.
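Here is a minimal Python sketch of this multiplicative square root in the case where $q=p$ is a prime number with $p\equiv 3\pmod 4$ (the function name and the choice $p=23$ are only for illustration). It checks on all nonzero squares that $a\mapsto a^{(p+1)/4}$ is a square root and is compatible with products.

```python
def sqrt_mod(a, p):
    """For a prime p ≡ 3 (mod 4) and a square a mod p, return a^((p+1)/4) mod p."""
    return pow(a, (p + 1) // 4, p)

p = 23
squares = sorted({b * b % p for b in range(1, p)})

is_square_root = all(sqrt_mod(a, p) ** 2 % p == a for a in squares)
is_multiplicative = all(
    sqrt_mod(a * b % p, p) == sqrt_mod(a, p) * sqrt_mod(b, p) % p
    for a in squares for b in squares
)
print(squares, is_square_root, is_multiplicative)
```

Multiplicativity is automatic here, since the map is a power map; the point of the check is that it does land on a square root.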

Complex numbers. — Now for a negative result, there is no multiplicative square root on the complex numbers, basically for the reason we have been taught that it leads to fallacies. All complex numbers are squares, so let us assume that we have a multiplicative square root $r\colon \mathbf C\to\mathbf C$. Letting $i=r(-1)$, the contradiction comes from the relation $$-i = r(-i)^2=r((-i)^2)=r(-1)=i.$$

We can now state and prove Waterhouse's theorem:

Theorem. — Let $F$ be a field (of characteristic $\neq 2$) and let $S\subseteq F$ be the multiplicative monoid of squares. There exists a multiplicative square root, that is, a multiplicative homomorphism $r\colon S\to F$ such that $r(a)^2=a$ for every $a\in S$, if and only if $-1\notin S$.

Proof. — The same negative argument as in the complex numbers works whenever $-1$ is a square in $F$. So let us assume that $-1$ is not a square and let us explain why a multiplicative square root exists. The proof, however, is not explicit but relies on some maximality principle. Moreover, we won't define the square root map directly, but its image.
Let us first analyse the situation. Assume that $r\colon S\to F$ is a multiplicative square root. It is simpler to remove $0$ from the discussion so we consider its restriction $S^\times \to F^\times$ and still denote it by $r$. By assumption, it is a morphism of groups, so that its image $R^\times$ is a subgroup of $F^\times$. Observe that it does not contain $-1$, for if $r(a)=-1$, then $a=r(a)^2=(-1)^2=1$ but $r(1)=1$. Moreover, for every element $a\in F^\times$, we have $r(a^2)^2=a^2$, hence $r(a^2)=\pm a$, so that either $a$ or $-a$ belongs to $R^\times$, but not both since $-1\not\in R^\times$. As a consequence, $R^\times$ is a maximal subgroup of $F^\times$ among those which do not contain $-1$: adding to $R^\times$ any element $a\in F^\times$ such that $a\notin R^\times$ would lead to a subgroup $\langle R^\times,a\rangle$ which contains $-1$.

Let us consider a maximal subgroup $R^\times$ of $F^\times$ containing the squares which does not contain $-1$. Starting from $S^\times$, which does not contain $-1$, this can be done using Zorn's lemma, or by transfinite induction: well ordering the elements of $F^\times$, and constructing $R^\times$ by induction, considering each element $a$ in turn. Since $R^\times$ contains the squares, the union $R^\times \cup a R^\times$ is a subgroup of $F^\times$; if it does not contain $-1$, then we replace $R^\times$ by it, otherwise we discard $a$ and keep $R^\times$.

Let $a\in F^\times$. If $a\notin R^\times$, the construction means that $-1\in aR^\times$, hence $-a\in R^\times$. But we can't have both $a$ and $-a$ in $R^\times$, for that would imply that $-1\in R^\times$.

If $a\in F^\times$ is a nonzero square, it has two square roots, of the form $\pm b$, and we define $r(a)$ to be its square root which belongs to $R^\times$. One has $r(1)=1$, because $1\in S^\times\subset R^\times$. For nonzero squares $a,b$, the product $r(a)r(b)$ is a square root of $ab$, and it belongs to $R^\times$, hence it equals $r(ab)$. This proves that the map $r$ is multiplicative. This concludes the proof.

Remark. — If you've studied some abstract algebra, you may have recognized something in the middle of the proof. Indeed, the quotient group $V=F^\times/S^\times$ has exponent 2: for every $\alpha$ in this group, $\alpha^2=1$. Consequently, even if it is written multiplicatively, this abelian group is a vector space over the field with 2 elements. Since $-1$ is not a square in $F^\times$, its class $[-1]$ is nonzero in $F^\times/S^\times$, and the quotient group $W=R^\times/S^\times$ is just a maximal vector subspace that does not contain $[-1]$. It is a hyperplane and is defined by a linear form $\phi$ on $V$. Since $V$ is written multiplicatively, this linear form corresponds to a group homomorphism $f\colon F^\times \to\{\pm1\}$ which maps $S^\times$ to $1$ and such that $f(-1)=-1$. For every square $a=b^2$, we then have $r(a)=b f(b)$.

In his paper, Waterhouse goes on by viewing “fields $F$ with a multiplicative square root $r$” as a basic algebraic object, and considering such structures $(F,r)$ which can't be extended by adding algebraic elements. The final theorem of the paper shows that the Galois group $\mathop{\rm Gal}(\overline F/F)$ is either cyclic of order 2, or is the additive group of the 2-adic integers.

#Mathober2022

2022-11-01T18:06:00.013+01:00

Sophia Wood (@fractalkitty) had the good idea to set up a #Mathober project: for each day of October, she invites you to react to one word of mathematics. I did something on the Mastodon server, that was also crossposted on Twitter. I will copy it here, but meanwhile you can enjoy it there.

You can also enjoy Sophia's work there:

Link to Mastodon : https://mathstodon.xyz/web/@antoinechambertloir/109131452332129714

Link to Twitter: https://twitter.com/achambertloir/status/1578647553205276672

Link to Sophia Wood's sketches: https://fractalkitty.com/2022/10/01/mathober2022-sketches

Yet another post on simplicity

2022-09-13T11:14:00.002+02:00

I see that I finally arrive at the end of my journey in formalizing in Lean the simplicity of the alternating group on 5 letters or more, so it may be a good time to summarize what I did, from the mathematical side.

In a first blog post, “Not simple proofs of simplicity”, I had described my initial plan, but it was not clear at that time that I would arrive at a final proof, nor that I would be able to formalize it in Lean. In fact, a few weeks after I had started this experiment, I doubted I would make it and went on formalizing the traditional proof that the alternating group is simple. I added a few simplifications (which, I was later told, were already explained in Jacobson's Basic Algebra; well, that's life…), leading to “The very simple proof that the alternating groups of five letters (or more) is simple”. I managed to formalize that proof at the end of 2021, and spent a lot of energy over the next 8 months formalizing the proof that I initially had in mind.

As I had already explained, the goal/constraint is to apply the Iwasawa criterion to the alternating group. This criterion says that if a group $G$ acts primitively on a set $X$, and if we attach to each point $x\in X$ a commutative subgroup $Tx$ of $G$, in such a way that $T(g\cdot x)=g\cdot Tx\cdot g^{-1}$ for every $g\in G$ and every $x\in X$, and if the subgroups $Tx$ generate $G$, then every normal subgroup of $G$ that acts nontrivially on $X$ contains the commutator subgroup. We take $G=\mathfrak A_n$. For $n\geq 5$, its commutator subgroup is $\mathfrak A_n$ itself (for example because any two 3-cycles are conjugate; in particular, a 3-cycle is conjugate to its square, which implies that it maps to $1$ in the abelianization of $\mathfrak A_n$). So we need to get primitive actions of $\mathfrak A_n$ and commutative subgroups.

One of the equivalent criteria for primitivity of a transitive action is that the stabilizers of points are maximal subgroups. As I had explained at the end of the first post, the maximal subgroups of $\mathfrak S_n$ and $\mathfrak A_n$ are known by the O'Nan–Scott theorem, combined with its converse which is a theorem of Liebeck, Praeger and Saxl. These theorems give a precise list of the maximal subgroups of $\mathfrak S_n$ and $\mathfrak A_n$, of which the first entry is precisely $\mathfrak S_p\times \mathfrak S_{n-p}$ (where the first factor acts on $\{1;\dots;p\}$ and the second acts on $\{p+1;\dots;n\}$) and its intersection with $\mathfrak A_n$, if $0<p<n$ and $n\neq 2p$.

We need to understand the limitation $n\neq 2p$, the point being that if $n=2p$, the subgroup $\mathfrak S_p\times\mathfrak S_p$ is not maximal in $\mathfrak S_{2p}$, it is a subgroup of index 2 of a “wreath product” obtained by adding one permutation that exchanges the two blocks $\{1,\dots,p\}$ and $\{p+1,\dots,2p\}$, for example $(1\,p+1)(2\,p+2)\dots (p\,2p)$. This group is the second entry in the O'Nan–Scott theorem.

These two entries are labelled as intransitive and imprimitive respectively, because $\mathfrak S_p\times \mathfrak S_{n-p}$ has two orbits on $\{1;\dots;n\}$, while the wreath product is transitive but it preserves the partition consisting of the two blocks $\{1,\dots,p\}$ and $\{p+1,\dots,2p\}$.

These two entries seem to be obvious to group theorists. They are given without proof in the paper of Liebeck, Praeger and Saxl.

The case of $\mathfrak S_n$ is easy, and occupies a subsection of Wilson's book on Finite Simple Groups. It is even fun to prove by hand, and not so hard to formalize in Lean. Take a subgroup $K$ of $\mathfrak S_n$ such that $\mathfrak S_p\times \mathfrak S_{n-p} \subsetneq K$ and let us prove that $K=\mathfrak S_n$. To that end, it suffices to show that $K$ contains any transposition $(a\,b)$. This is obvious if both $a$ and $b$ belong to $\{1;\dots;p\}$ or if they both belong to $\{p+1;\dots;n\}$, so assume that $a\in\{1;\dots;p\}$ and $b\in\{p+1;\dots;n\}$. Since $K$ does not stabilize $\{1;\dots;p\}$, there is $x\in\{1;\dots;p\}$ and $k\in K$ such that $y=k\cdot x \in\{p+1;\dots;n\}$. If $n>2p$, there exists $z\in\{p+1;\dots;n\}$ such that $z\neq y$ and $t=k^{-1}\cdot z\in\{p+1;\dots;n\}$; from the relation $k^{-1} \cdot (y\,z) \cdot k=(x\,t)$ and the fact that $(y\,z)\in \mathfrak S_p\times\mathfrak S_{n-p}$, we deduce that $(x\,t)$ belongs to $K$. This gives us one transposition of the desired form; finally, the relation $(a\,b)=h (x\,t) h^{-1}$ with $h=(x\,a)(t\,b)\in\mathfrak S_p\times\mathfrak S_{n-p}$ shows that $(a\,b)\in K$. The other case, $n<2p$, is symmetric.
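For very small values of $n$, the maximality statement, and the role of the restriction $n\neq 2p$, can be observed by brute force. The Python sketch below is only an experiment, not a formalization (function names are ad hoc; permutations of $\{0,\dots,n-1\}$ are encoded as tuples): for every permutation $g$ outside of $\mathfrak S_p\times\mathfrak S_{n-p}$, it computes the subgroup generated by $\mathfrak S_p\times\mathfrak S_{n-p}$ and $g$ and tests whether it is all of $\mathfrak S_n$.

```python
from itertools import permutations

def compose(g, h):
    """Composition g∘h of permutations encoded as tuples (i -> g[i])."""
    return tuple(g[h[i]] for i in range(len(h)))

def generated_subgroup(generators, n):
    """Subgroup of S_n generated by the given permutations.

    In a finite group, the closure of the identity under left multiplication
    by the generators is already a subgroup.
    """
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def product_subgroup(p, n):
    """Permutations stabilizing {0,...,p-1}, hence also its complement."""
    return [g for g in permutations(range(n)) if all(g[i] < p for i in range(p))]

def is_maximal(p, n):
    """Whether every subgroup strictly containing S_p × S_{n-p} is S_n."""
    H = product_subgroup(p, n)
    H_set = set(H)
    everything = list(permutations(range(n)))
    return all(len(generated_subgroup(H + [g], n)) == len(everything)
               for g in everything if g not in H_set)

print(is_maximal(2, 5), is_maximal(1, 4), is_maximal(2, 4))
```

For $(p,n)=(2,4)$ the test fails, as expected: adding a permutation that exchanges the two blocks only produces a subgroup of order 8.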

Bizarrely, the analogous result for the alternating group looked more difficult to me, although one colleague assured me that it could be done, another one that I could certainly do it, and a last one did it for $n>7$. Since Liebeck, Praeger and Saxl gave no reference at all, I asked Liebeck about it and he explained to me a short proof that uses totally different ideas.

Let $G=\mathfrak A_n$ or $\mathfrak S_n$ and consider a subgroup $K$ such that $(\mathfrak S_p\times\mathfrak S_{n-p})\cap G \subsetneq K\subseteq G$; we wish to prove that $K=G$. Arguments as given above already show that $K$ acts transitively on $\{1;\dots;n\}$. But we can do more: it acts primitively. Now, one just needs to invoke an 1870 theorem of Jordan: a primitive subgroup of $\mathfrak S_n$ that contains a transposition is $\mathfrak S_n$, and a primitive subgroup of $\mathfrak S_n$ that contains a 3-cycle contains $\mathfrak A_n$!

To prove that $K$ acts primitively, it is convenient to use the standard definition of a primitive action. If a group $G$ acts on a set $X$, call a block of the action a nonempty subset $B$ of $X$ which, for every $g\in G$, is either fixed or moved to a disjoint subset by $g$; it follows from the definition that, if the action is transitive, the translates of a block by the action form a partition of $X$. Singletons are blocks, the full subset is a block, and one definition of a primitive action is that the only blocks are these trivial ones (and $X$ is nonempty). Orbits are blocks, so that a primitive action is transitive. Conversely, one can prove that if the action is transitive, then it is primitive if and only if stabilizers of points in $X$ are maximal subgroups. A more general result, still for a transitive action, is that for every point $a\in X$, associating with a set $B$ its stabilizer $G_B$ gives a bijection from the set of blocks that contain $a$ to the set of subgroups of $G$ that contain $G_a$, with inverse bijection associating with a subgroup $K$ containing $G_a$ the orbit $K\cdot a$, and these bijections preserve inclusion.

Proof. — Let $B,B'$ be blocks such that $B\subseteq B'$ and let $g\in G_B$; then $g\cdot B'$ contains $g\cdot B=B$, hence $g\cdot B'$ is not disjoint from $B'$, so that $g\cdot B'=B'$ by definition of a block. This proves that $G_B$ is a subgroup of $G_{B'}$.

Let $B$ be a block that contains $a$; then $G_B \cdot a=B$. Indeed, the inclusion $G_B\cdot a\subseteq B$ follows from the definition of $G_B$. To prove the other inclusion, let $b\in B$. Since the action is transitive, there exists $g\in G$ such that $g\cdot a=b$; then $g\cdot B$ and $B$ both contain $b$, hence $g\cdot B=B$, so that $g\in G_B$ and $b\in G_B\cdot a$.

Finally, let $K$ be a subgroup of $G$ containing $G_a$ and let $B=K\cdot a$. Let us prove that $B$ is a block such that $K=G_B$. Let $g\in G$ be such that $g\cdot B$ and $B$ are not disjoint; let $b,c\in B$ be such that $b=g\cdot c$; write $b=k\cdot a$ and $c=h\cdot a$ for $k,h\in K$. Then $k\cdot a = gh\cdot a$ so that $k^{-1}gh\in G_a$, hence $k^{-1}gh\in K$; we conclude that $g\in K$, hence $g\cdot B=gK\cdot a = K\cdot a=B$. So $B$ is a block. This also shows that $G_B\subseteq K$, and the converse inclusion is obvious.
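To see the correspondence between blocks and subgroups on a concrete case, here is a sketch for the dihedral group of order 8 acting on the four vertices $0,1,2,3$ of a square (my choice of example, not one from the references). Permutations are tuples of images, all helper names are hypothetical, and subgroups are found naively by testing every subset that contains the stabilizer $G_a$.

```python
from itertools import combinations

def compose(s, t):
    """Return the permutation 's after t' (apply t first, then s)."""
    return tuple(s[i] for i in t)

def generated(gens, n):
    """Subgroup of S_n generated by gens, as a set."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def image(g, B):
    return frozenset(g[i] for i in B)

def is_block(B, G):
    """Every translate of B is either B itself or disjoint from B."""
    return all(image(g, B) == B or image(g, B).isdisjoint(B) for g in G)

def stabilizer(B, G):
    return frozenset(g for g in G if image(g, B) == B)

def orbit(K, a):
    return frozenset(k[a] for k in K)

def is_subgroup(K):
    # a nonempty finite subset closed under composition is a subgroup
    return all(compose(s, t) in K for s in K for t in K)

n, a = 4, 0
rotation, reflection = (1, 2, 3, 0), (0, 3, 2, 1)   # i -> i+1 and i -> -i modulo 4
G = frozenset(generated([rotation, reflection], n))
G_a = stabilizer(frozenset([a]), G)

others = [i for i in range(n) if i != a]
blocks = [B for r in range(n) for rest in combinations(others, r)
          for B in [frozenset((a,) + rest)] if is_block(B, G)]

outside = sorted(G - G_a)
subgroups = [K for r in range(len(outside) + 1) for extra in combinations(outside, r)
             for K in [G_a | frozenset(extra)] if is_subgroup(K)]

# B -> G_B maps blocks containing a onto subgroups containing G_a, with inverse K -> K.a
bijection_ok = ({stabilizer(B, G) for B in blocks} == set(subgroups)
                and all(orbit(stabilizer(B, G), a) == B for B in blocks)
                and all(stabilizer(orbit(K, a), G) == K for K in subgroups))
print(sorted(sorted(B) for B in blocks), bijection_ok)
```

In this example the blocks containing $0$ are $\{0\}$, the diagonal $\{0,2\}$ and the full set, matching the three subgroups of orders $2$, $4$ and $8$ that contain the stabilizer of $0$; in particular the action is transitive but not primitive.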

Going back to our initial problem, it remains to show that the action of $K$ on $\{1;\dots;n\}$ only has trivial blocks. The proof uses two remarks.

The trace of a block on $\{1;\dots;p\}$, respectively $\{p+1;\dots;n\}$, is either a singleton, or all of it. Indeed, this trace is a block for the induced action of $(\mathfrak S_p\times\mathfrak S_{n-p})\cap G$ on $\{1;\dots;p\}$ (respectively $\{p+1;\dots;n\}$), and this action contains that of $\mathfrak A_p$ (respectively…) and even that of $\mathfrak S_p$ if $p\neq n-1$. On the other hand, the symmetric group acts 2-transitively, hence primitively. (The cases $p=1$ or $p=n-1$ need minor adjustments.)
If $2p<n$, then no nontrivial block can contain $\{p+1;\dots;n\}$. Indeed, there is not enough space in the complementary subset so that disjoint translates of this block make a partition of $\{1;\dots;n\}$.

Let us now conclude the proof. (I still find the following argument a bit convoluted but have nothing really better to propose yet.) Consider a block $B\subset\{1;\dots;n\}$ for the action of $K$, and assume that $B$ is not a singleton, nor the full set. If $B$ meets $\{p+1;\dots;n\}$ in at least two elements, then it contains $\{p+1;\dots;n\}$, hence is the full set, a contradiction. If $B$ meets $\{1;\dots;p\}$ in at least two elements, then it contains $\{1;\dots;p\}$, and some disjoint translate of it is contained in $\{p+1;\dots;n\}$; this translate is a block that contains $\{p+1;\dots;n\}$, hence is the full set, so that the initial block is the full set as well. By similar arguments, $B$ meets both $\{1;\dots;p\}$ and $\{p+1;\dots;n\}$ in exactly one element, and the same holds for any translate $k\cdot B$ of $B$. However, using the hypothesis that $p\neq n-p$ and that $K$ strictly contains $(\mathfrak S_p\times\mathfrak S_{n-p})\cap G$, we find $k\in K$ such that $k\cdot B$ meets $\{p+1;\dots;n\}$ in at least two elements, and we can conclude as earlier that $B$ is the full set.
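The maximality statement can also be confirmed by brute force for very small $n$, which is a useful sanity check on the role of the hypothesis $n\neq 2p$. The sketch below assumes points numbered $0,\dots,n-1$ and permutations stored as tuples of images; `is_maximal` and `check` are hypothetical helper names, and the test simply adjoins each outside element in turn.

```python
from itertools import permutations

def compose(s, t):
    """Return the permutation 's after t' (apply t first, then s)."""
    return tuple(s[i] for i in t)

def generated(gens, n):
    """Subgroup of S_n generated by gens, as a set."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def is_even(s):
    inversions = sum(1 for i in range(len(s)) for j in range(i + 1, len(s)) if s[i] > s[j])
    return inversions % 2 == 0

def is_maximal(H, G, n):
    """H is maximal in G iff H != G and H together with any g outside H generates G."""
    gens = list(H)
    return H != G and all(generated(gens + [g], n) == G for g in G - H)

def check(n, p, alternating):
    """Is (S_p x S_{n-p}) ∩ G maximal in G, for G = A_n or S_n?"""
    S_n = set(permutations(range(n)))
    G = {s for s in S_n if is_even(s)} if alternating else S_n
    # elements of G that stabilize the first block {0, ..., p-1}
    H = {s for s in G if all(s[i] < p for i in range(p))}
    return is_maximal(H, G, n)

results = {(n, p, alt): check(n, p, alt)
           for n in range(3, 6) for p in range(1, n) for alt in (False, True)}
for (n, p, alt), maximal in sorted(results.items()):
    print("A" if alt else "S", n, p, maximal)
```

For $3\leq n\leq 5$ the only failures are at $(n,p)=(4,2)$, for both $\mathfrak S_4$ and $\mathfrak A_4$. The range starts at $n=3$ on purpose: for $n=2$ and $p=1$ the group $\mathfrak S_1\times\mathfrak S_1$ is trivial and the wreath product is all of $\mathfrak S_2$, so that degenerate case behaves differently.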

To conclude this blog post, I need to say something about Jordan's theorem. Jordan was concerned with the concept of multiple transitivity: a group $G$ acting on a set $X$ is $m$-transitive if whenever systems of distinct elements $a_1,\dots,a_m$ on the one side, $b_1,\dots,b_m$ on the other side, are given, there exists $g\in G$ such that $g\cdot a_1=b_1,\dots, g \cdot a_m=b_m$ (one assumes here that $m\leq {\mathrm{Card}(X)}$). Many theorems from this time (Mathieu, Bertrand, Serret, Jordan…), partly in relation with Galois theory of equations, aim at limiting the multiple transitivity of subgroups of the symmetric group. The symmetric group itself is $n$-transitive, if $n={\mathrm {Card}(X)}$; the alternating group is $(n-2)$-transitive, and other subgroups have to be much less transitive.
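The definition of $m$-transitivity translates directly into a finite check: since $G$ is a group, it is $m$-transitive exactly when the orbit of one $m$-tuple of distinct points is the set of all such tuples. Here is a sketch under the same conventions as before (points $0,\dots,n-1$, permutations as tuples of images); `is_m_transitive` and `transitivity_degree` are hypothetical names.

```python
from itertools import permutations
from math import factorial

def is_even(s):
    inversions = sum(1 for i in range(len(s)) for j in range(i + 1, len(s)) if s[i] > s[j])
    return inversions % 2 == 0

def is_m_transitive(G, n, m):
    """G (a group of permutations of 0..n-1) is m-transitive iff the orbit of
    (0, ..., m-1) contains all n!/(n-m)! tuples of m distinct points."""
    orbit = {tuple(g[i] for i in range(m)) for g in G}
    return len(orbit) == factorial(n) // factorial(n - m)

def transitivity_degree(G, n):
    """Largest m such that G is m-transitive (0 if G is not even transitive)."""
    return max((m for m in range(1, n + 1) if is_m_transitive(G, n, m)), default=0)

degrees = {}
for n in (3, 4, 5):
    S_n = set(permutations(range(n)))
    A_n = {s for s in S_n if is_even(s)}
    degrees[n] = (transitivity_degree(S_n, n), transitivity_degree(A_n, n))
print(degrees)
```

For $n=3,4,5$ this returns the pairs $(n, n-2)$, in agreement with the statement about the symmetric and alternating groups.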

The general result of Jordan, proved in the Note C (page 664) to §398 of his Traité des substitutions et des équations algébriques (1870, Gauthier-Villars), is that a primitive subgroup of $\mathfrak S_n$ containing a cycle of prime order $p$ is $(n-p+1)$-transitive. For $p=2$, we get that this subgroup is $(n-1)$-transitive, hence is $\mathfrak S_n$; for $p=3$, we get that it is $(n-2)$-transitive, and that implies that it contains the alternating group $\mathfrak A_n$. I formalized these results in Lean, following the presentation of Wielandt's book on Finite permutation groups (theorem 13.3 of that reference). A later theorem of Jordan (1873; see theorem 13.9 in Wielandt's book) asserts that such a subgroup always contains the alternating group provided $n-p\geq 3$; I have not (not yet?) formalized it in Lean.
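The two consequences for $p=2$ and $p=3$ can be tested experimentally on small cases. The sketch below is only a partial check, and certainly not Jordan's argument: it looks at the subgroups generated by a fixed transposition (or a fixed 3-cycle) and one further permutation $g$, tests primitivity by listing blocks naively, and verifies the conclusion whenever the subgroup is primitive. Points are $0,\dots,n-1$, and all function names are hypothetical.

```python
from itertools import combinations, permutations

def compose(s, t):
    """Return the permutation 's after t' (apply t first, then s)."""
    return tuple(s[i] for i in t)

def generated(gens, n):
    """Subgroup of S_n generated by gens, as a set."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group

def is_even(s):
    inversions = sum(1 for i in range(len(s)) for j in range(i + 1, len(s)) if s[i] > s[j])
    return inversions % 2 == 0

def is_block(B, K):
    images = (frozenset(g[i] for i in B) for g in K)
    return all(img == B or img.isdisjoint(B) for img in images)

def is_primitive(K, n):
    """Transitive, with no block B such that 1 < |B| < n."""
    transitive = {g[0] for g in K} == set(range(n))
    return transitive and not any(is_block(frozenset(B), K)
                                  for size in range(2, n)
                                  for B in combinations(range(n), size))

def jordan_check(n):
    """Check both statements on all subgroups <c, g>, c a fixed transposition or 3-cycle."""
    S_n = set(permutations(range(n)))
    A_n = {s for s in S_n if is_even(s)}
    transposition = (1, 0) + tuple(range(2, n))      # the transposition (0 1)
    three_cycle = (1, 2, 0) + tuple(range(3, n))     # the 3-cycle (0 1 2)
    for g in S_n:
        K = generated([transposition, g], n)
        if is_primitive(K, n) and K != S_n:
            return False
        K = generated([three_cycle, g], n)
        if is_primitive(K, n) and not A_n <= K:
            return False
    return True

checks = {n: jordan_check(n) for n in (4, 5)}
print(checks)
```

The primitivity hypothesis matters already for $n=4$: the transposition $(0\,1)$ and the 4-cycle $(0\,2\,1\,3)$ generate a transitive group of order 8 that preserves the partition $\{0,1\},\{2,3\}$, so it is not primitive and is not $\mathfrak S_4$.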

All in all, this gives a fairly sophisticated proof that the alternating group is simple. One of its merits is to follow a general line that applies to many other groups. In particular, Iwasawa's criterion is also used by Wilson in his book Finite simple groups to prove the simplicity of the Mathieu groups $M_{11}, M_{12}$, and of many other finite groups.

I just opened Jordan's book to write this blog post. Let me add that it contains (§85) another proof of simplicity of the alternating group, and I will try to explain it in a later post.