
Tuesday, July 15, 2025

The Krull dimension of the semiring of natural numbers is equal to 2

Let $R$ be a ring. Its Krull dimension is the supremum of the lengths $n$ of chains $P_0\subsetneq P_1 \subsetneq\dots\subsetneq P_n$ of prime ideals of $R$. When $R$ is a field, the null ideal is the only prime ideal, and it is a maximal ideal so that its Krull dimension is zero. When $R$ is a principal ideal domain which is not a field, there are two kinds of prime ideals: the null ideal is prime (as in any domain), and the other prime ideals are the maximal ideals of $R$, generated by a prime element $p$. In particular, the Krull dimension of the ring of integers is equal to $1$.

It is classic that these concepts can be defined for semirings as well.

A semiring $R$ is a set endowed with a commutative and associative addition with a neutral element $0$, an associative multiplication with a neutral element $1$, such that multiplication distributes over addition: $(a+b)c=ac+bc$ and $c(a+b)=ca+cb$. When its multiplication is commutative, the semiring is said to be commutative.

Semirings $R$ have ideals: these are nonempty subsets $I$ which are stable under addition ($a+b\in I$ for $a,b\in I$), and stable under multiplication by any element of $R$: for general semirings, one has to distinguish between left, right, and two-sided ideals; for commutative semirings, the notions coincide.

An ideal $P$ of a semiring $R$ is said to be prime if $R\setminus P$ is a multiplicative subset; explicitly, $P\neq R$, and if $ab\in P$, then $a\in P$ or $b\in P$.

An ideal $P$ of a semiring $R$ is said to be maximal if $P\neq R$ and if there is no ideal $I$ such that $P\subsetneq I\subsetneq R$. A semiring $R$ is said to be local if it admits exactly one maximal ideal; this means that the set of non-invertible elements of $R$ is an ideal.

The amusing part comes from the classification of prime and maximal ideals of the semiring $\mathbf N$ of natural numbers, which I learned of via a Lean formalization project led by Junyan Xu.

Theorem.

  1. The semiring $\mathbf N$ is local; its maximal ideal is the set $\mathbf N\setminus\{1\}$.
  2. The null ideal is a prime ideal.
  3. The other prime ideals are the maximal ideal $\mathbf N\setminus\{1\}$ and the sets $p\mathbf N$, for all prime numbers $p$.

In particular, we have chains $\langle 0\rangle \subsetneq \langle p\rangle \subsetneq \mathbf N\setminus\{1\}$ of prime ideals, hence the result stated in the title of this post:

Corollary. — The Krull dimension of the semiring of natural numbers is equal to $2$.

Proof.

  1. The element $1$ is the only unit of $\mathbf N$, and $\mathbf N\setminus\{1\}$ is obviously an ideal, necessarily its unique maximal ideal.
  2. The null ideal is a prime ideal, as in any semiring which is a domain.
  3. Let now $P$ be a nonzero prime ideal of $\mathbf N$ and let $p$ be the smallest nonzero element of $P$. Then $p\neq 1$ (otherwise, $P=\mathbf N$, which is not prime). The hypothesis that $P$ is prime implies that one of the prime factors of $p$ belongs to $P$; by the choice of $p$, this must be $p$ itself, so that $p$ is a prime number. Then $P$ contains the set $p\mathbf N $ of multiples of $p$, which is a prime ideal of $\mathbf N$. Let us assume that $p\mathbf N\subsetneq P$ and let $n\in P\setminus p\mathbf N$. By assumption, $p$ does not divide $n$, so that these two integers are coprime. By the following proposition, $P$ contains every integer at least equal to $(p-1)(n-1)$; in particular, it contains both a power of $2$ and a power of $3$; since $P$ is prime, it contains $2$ and $3$, hence it contains any integer at least equal to $(2-1)(3-1)=2$, so that $P=\mathbf N\setminus\{1\}$. This concludes the proof.

Proposition. — Let $a$ and $b$ be nonzero coprime natural numbers. For any integer $n\geq (a-1)(b-1)$, there are natural numbers $u$ and $v$ such that $n=au+bv$.

Proof. — Since $a$ and $b$ are coprime, there are integers $u$ and $v$ such that $n=au+bv$. Replacing $u$ by $u+b$ and $v$ by $v-a$, we may assume that $0\leq u$. Replacing $u$ by $u-b$ and $v$ by $v+a$, we may assume that $u < b$. Then \[ bv = n - au \geq (a-1)(b-1)-a(b-1)= -(b-1). \] This implies $v\geq0$, so that $u$ and $v$ are natural numbers.

On the other hand, not all natural numbers $< (a-1)(b-1)$ can be written as such a sum. For example, $ab-a-b$ can't. Indeed, if $ab-a-b=au+bv$, hence $ab=a(u+1)+b(v+1)$, then $b$ divides $a(u+1)$, hence $b$ divides $u+1$, and similarly $a$ divides $v+1$. Then $ab$ is the sum of two nonzero multiples of $ab$, a contradiction. The precise distribution of the natural numbers $< (a-1)(b-1)$ which can be written as $au+bv$, for some natural numbers $u$ and $v$, is complicated, but at least one result is known: such an integer $n$ can be written in this form if and only if $ab-a-b-n$ cannot! One direction is clear: if $n$ and $ab-a-b-n$ can both be written in this form, then so can their sum, which is $ab-a-b$, a contradiction. For the other direction, let $n$ be an integer that cannot be written in this form, and write it as $n=au+bv$, for some integers $u$ and $v$, with $0\leq u< b$. By assumption, $v<0$, hence $v\leq -1$. Then \[ ab-a-b-n=ab-a-b-au-bv=a(b-1-u)+b(-v-1).\] We see that $b-1-u\geq 0$ and $-v-1\geq 0$, which shows that $ab-a-b-n$ can be written in the desired form.

A related question in the style of the proposition was raised by Frobenius: consider integers $a_1,\dots,a_r$ whose greatest common divisor is $1$; then any large enough integer $n$ can be written as $n=a_1u_1+\dots+a_ru_r$, for some natural numbers $u_1,\dots,u_r$, but when $r\geq 3$, there is no known formula for the largest natural number that cannot be written in this form. The case $r=2$ that we discussed here is due to Sylvester (1884).
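Sylvester's bound and the symmetry between representable and non-representable numbers are easy to check by brute force. Here is a small Python sketch (the helper `representable` and the test pairs are my own, not from the post):

```python
from math import gcd

def representable(n, a, b):
    """True if n = a*u + b*v for some natural numbers u, v >= 0."""
    return any((n - a * u) % b == 0 for u in range(n // a + 1))

def check(a, b):
    assert gcd(a, b) == 1
    frob = a * b - a - b  # largest non-representable integer (Sylvester, 1884)
    assert not representable(frob, a, b)
    # every n >= (a-1)(b-1) is representable
    assert all(representable(n, a, b)
               for n in range((a - 1) * (b - 1), frob + a * b))
    # symmetry: n is representable iff a*b - a - b - n is not
    for n in range(frob + 1):
        assert representable(n, a, b) != representable(frob - n, a, b)

for a, b in [(2, 3), (3, 5), (4, 7), (5, 9)]:
    check(a, b)
print("all checks passed")
```

Running `check(3, 5)`, for instance, confirms that $7=3\cdot5-3-5$ is not representable while every $n\geq 8$ is.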

Friday, September 13, 2024

The combinatorial Nullstellensatz

The “combinatorial Nullstellensatz” is a relatively elementary statement due to Noga Alon (1999) whose name, while possibly frightening, really says what it is and what it is good for. (A freely available version is there.) Nullstellensatz is the classic name for a theorem of David Hilbert that relates loci in $F^n$ defined by polynomial equations and the ideals of the polynomial ring $F[T_1,\dots,T_n]$ generated by these equations, when $F$ is an algebraically closed field. The word itself is German and means “theorem about the location of zeroes”. The precise correspondence is slightly involved, and stating it here precisely would derail us from the goal of this post, so let us stay with these informal words for the moment. The adjective combinatorial in the title refers to the fact that Alon deduced many beautiful consequences of this theorem in the field of combinatorics, and for some of them, furnished quite simple proofs. We'll give one of them below, the Cauchy—Davenport inequality, but my main motivation in writing this text was to discuss the proof of the combinatorial Nullstellensatz, in particular untold aspects of the proofs which took me some time to unravel.$\gdef\Card{\operatorname{Card}}$

The context is the following: $F$ is a field, $n$ is a positive integer, and one considers finite sets $S_1,\dots,S_n$ in $F$, and the product set $S=S_1\times \dots \times S_n$ in $F^n$. One considers polynomials $f\in F[T_1,\dots,T_n]$ in $n$ indeterminates and their values at the points $(s_1,\dots,s_n)$ of $S$. For every $i\in\{1,\dots,n\}$, set $g_i = \prod_{a\in S_i} (T_i-a)$; this is a polynomial of degree $\Card(S_i)$ in the indeterminate $T_i$.

Theorem 1. — If a polynomial $f\in F[T_1,\dots,T_n]$ vanishes at all points $s\in S$, there exist polynomials $h_1,\dots,h_n\in F[T_1,\dots,T_n]$ such that $f=g_1 h_1+\dots+g_n h_n$, and for each $i$, one has $\deg(g_i h_i)\leq \deg (f)$.

This theorem is really close to the classic Nullstellensatz, but the specific nature of the set $S$ allows for a weaker hypothesis (the field $F$ is not assumed to be algebraically closed) and a stronger conclusion (the Nullstellensatz would assert that there exists some power $f^k$ of $f$ that is of this form, saying nothing of the degrees).

Its proof relies on a kind of Euclidean division algorithm. Assume that $f$ has some monomial $c_mT^m=c_mT_1^{m_1}\dots T_n^{m_n}$ with $m_i\geq \Card(S_i)$; then one can consider a Euclidean division (in the variable $T_i$), $T_i^{m_i}=p_i g_i + r_i$, where $\deg(r_i)<\Card(S_i)$. One can then replace this monomial $c_mT^m$ in $f$ by $c_m (T^m/T_i^{m_i})\, r_i$, and, at the same time, add $c_m (T^m/T_i^{m_i})\, p_i$ to $h_i$. Since this amounts to subtracting a multiple of $g_i$, the vanishing assumption still holds. Applying this method repeatedly, one reduces to the case where the degree of $f$ in the variable $T_i$ is $<\Card(S_i)$ for all $i$.
Then a variant of the theorem that says that a polynomial in one indeterminate has no more roots than its degree implies that $f\equiv 0$.

A weak point in the written exposition of this argument is the reason why the iterative construction would terminate. Intuitively, something has to decrease, but one needs a minute (or an hour, or a week) of reflection to understand what precisely decreases. The problem is that if one works one monomial at a time, the degrees might stay the same. The simplest reason why this procedure indeed works belongs to the theory of Gröbner bases and to the proof that the division algorithm actually works: instead of the degrees with respect to all variables, or of the total degree, one should consider a degree with respect to some lexicographic ordering of the variables — the point is that it is a total ordering of the monomials, so that if one considers the leading term, given by the largest monomial, the degree will actually decrease.
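To see this division at work, one can let a computer algebra system perform the reduction with respect to a lexicographic order. A small SymPy sketch, on a made-up example (the grids `S1`, `S2` and the polynomial `f` below are my own choices):

```python
from sympy import symbols, expand, reduced

x, y = symbols('x y')
S1, S2 = [0, 1], [0, 1, 2]      # the finite grids S_1 and S_2
g1 = x * (x - 1)                # g_1 = prod_{a in S_1} (x - a)
g2 = y * (y - 1) * (y - 2)      # g_2 = prod_{a in S_2} (y - a)

# a polynomial that vanishes at every point of S_1 x S_2
f = expand(g1 * (y + 3) + g2 * (x**2 - 1))
assert all(f.subs({x: a, y: b}) == 0 for a in S1 for b in S2)

# multivariate division with respect to the lex order terminates;
# here the remainder is 0 because the leading monomials x^2 and y^3
# of g_1 and g_2 are coprime, so that {g_1, g_2} is a Groebner basis
(h1, h2), r = reduced(f, [g1, g2], x, y, order='lex')
assert r == 0
assert expand(g1 * h1 + g2 * h2 - f) == 0
print("remainder:", r)
```

This is only an illustration of the termination mechanism, of course: `reduced` does not enforce the degree bounds $\deg(g_ih_i)\leq\deg(f)$ of Theorem 1, which come from the bookkeeping in the proof.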

The second theorem is in fact the one which is needed for the applications to combinatorics. Its statement is written in contrapositive form — it will show that $f$ does not vanish at some point of $S$.

Theorem 2. — Let $f\in F[T_1,\dots,T_n]$ and assume that $f$ has a nonzero monomial $c_mT^m$, where $|m|=m_1+\dots+m_n$ is the total degree of $f$. If, moreover, $m_i<\Card(S_i)$ for all $i$, then there exists a point $s=(s_1,\dots,s_n)\in S$ such that $f(s)\neq 0$.

It is that theorem whose proof took me a long time to understand. I finally managed to formalize it in Lean, hence I was pretty sure I had understood it. In fact, writing this blog post helped me simplify it further, removing a couple of useless arguments by contradiction! Anyway, I feel it is worth being told with a bit more detail than in the original paper.

We argue by contradiction, assuming that $f(s)=0$ for all $s\in S_1\times\dots \times S_n$. Applying theorem 1, we get an expression $f=\sum_{i=1}^n g_i h_i$ where $\deg(g_ih_i)\leq \deg(f)$ for all $i$. The coefficient of $T^m$ in $f$ is nonzero, by assumption; it is also the sum of the coefficients of $T^m$ in the $g_i h_i$, for $1\leq i\leq n$, and we are going to prove that all of them vanish — hence getting the desired contradiction. $\gdef\coeff{\operatorname{coeff}}$

Fix $i\in\{1,\dots, n\}$.  If $h_i=0$, then $\coeff_m(g_ih_i)=\coeff_m(0)=0$, hence we may assume that $h_i\neq0$. This implies that $\deg(g_i h_i)=\Card(S_i)+\deg(h_i) \leq \deg(f)=|m|$.
One then has
\[ \coeff_m(g_i h_i)=\sum_{p+q=m} \coeff_p (g_i) \coeff_q(h_i), \]
and it suffices to prove that all terms of this sum vanish. Fix $p,q$ such that $p+q=m$, assume that  $\coeff_p(g_i)\neq 0$, and let us prove that $\coeff_q(h_i)=0$.

By the very definition of $g_i$ as a product of $\Card(S_i)$ factors of the form $T_i-a$, the condition $\coeff_p(g_i)\neq 0$ implies that $p_j=0$ for all $j\neq i$, so that $|p|=p_i$. Moreover, $p_i\leq p_i+q_i=m_i < \Card(S_i)$, by the assumption of the theorem. This implies \[\Card(S_i)+\deg(h_i)\leq |m|= |p+q|=|p|+|q|=p_i+|q|<\Card(S_i)+|q|.\]
Subtracting $\Card(S_i)$ on both sides, one gets $\deg(h_i)<|q|$, hence $\coeff_q(h_i)=0$, as was to be shown.

To conclude, let us add the combinatorial application to the Cauchy—Davenport theorem.
$\gdef\F{\mathbf F}$

Theorem 3. — Let $p$ be a prime number and let $A$ and $B$ be nonempty subsets of $\F_p$, the field with $p$ elements. Denote by $A+B$ the set of all $a+b$, for $a\in A$ and $b\in B$. Then
\[ \Card(A+B) \geq \min (p, \Card(A)+\Card(B)-1).\]


Proof. — First consider the case where $\Card(A)+\Card(B)>p$, so that $\min(p,\Card(A)+\Card(B)-1)=p$. In this case, for every $c\in\F_p$, one has
\[\Card(A \cap (c-B))+ \Card(A\cup (c-B))=\Card(A)+\Card(c-B)=\Card(A)+\Card(B)>p.\]
Since $\Card(A\cup(c-B))\leq p$, this forces $\Card(A\cap(c-B))>0$, so that the sets $A$ and $c-B$ intersect. In particular, there exist $a\in A$ and $b\in B$ such that $a+b=c$; hence $A+B=\F_p$, and the desired inequality holds.

Let us now treat the case where $\Card(A)+\Card(B)\leq p$, so that $\min(p,\Card(A)+\Card(B)-1)=\Card(A)+\Card(B)-1$. We will assume that the conclusion of the theorem does not hold, that is, $\Card(A+B)\leq \Card(A)+\Card(B)-2$, and derive a contradiction. Let us consider the polynomial $f\in \F_p[X,Y]$ defined by $f=\prod _{c\in A+B} (X+Y-c)$. The degree of $f$ is equal to $\Card(A+B)\leq \Card(A)+\Card(B)-2$. Choose natural numbers $u$ and $v$ such that $u+v=\Card(A+B)$, $u\leq\Card(A)-1$ and $v\leq \Card(B)-1$. The coefficient of $X^u Y^v$ in $f$ is equal to the binomial coefficient $\binom{u+v}u$. Since $u+v\leq \Card(A)+\Card(B)-2\leq p-2$, this coefficient is nonzero in $\F_p$. By theorem 2, there exists $(a,b)\in A\times B$ such that $f(a,b)\neq 0$. However, one has $a+b\in A+B$, hence $f(a,b)=0$ by the definition of $f$. This contradiction concludes the proof that $\Card(A+B)\geq \Card(A)+\Card(B)-1$. It also concludes this post.

Monday, August 12, 2024

Combinatorics of partitions

Divided powers are an algebraic gadget that emulate, in an arbitrary ring, the functions $x\mapsto x^n/n!$ for all integers $n$, together with the natural functional equations that they satisfy. 

One of them is a nice binomial theorem without binomial coefficients: denoting $x^n/n!$ by $x^{[n]}$, one has $$ (x+y)^{[n]}=\sum_{k=0}^n x^{[n-k]} y^{[k]}. $$ 

Another formula looks at what happens when one iterates the construction: if everything happens nicely, one has $$ (x^{[n]})^{[m]}=\frac1{m!}\left(x^n/n!\right)^m = \frac1{(n!)^m m!} x^{nm} = \frac{(mn)!}{m! n!^m} x^{[mn]}. $$ 

In the development of the theory, one then needs the observation that the coefficient $(mn)!/(m!\,n!^m)$ is an integer, which is not obvious since it is written as a fraction. Two arguments are given to justify this fact. 

  1. The formula $$ \frac{(mn)!}{m!\, n!^m} = \prod_{k=1}^{m} \binom{k n-1}{n-1}, $$ which the authors claim can be proved by induction.
  2. The observation that $(mn)!/m!n!^m$ is the number of (unordered) partitions of a set with $mn$ elements into $m$ subsets with $n$ elements.

The latter fact is a particular case of a more general counting question: if $n=(n_1,\dots,n_r)$ are integers and $N=n_1+\dots+n_r$, then the number of (unordered) partitions of a set with $N$ elements into subsets of cardinalities $n_1,\dots,n_r$ is equal to $$\frac{N!}{\prod n_i! \prod_{k>0} m_k(n)!},$$ where $m_k(n)$ is the number of occurrences of $k$ in the sequence $(n_1,\dots,n_r)$. 
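This counting formula is easy to confront with a brute-force enumeration of set partitions. A sketch using SymPy's `multiset_partitions` (the helper names are my own):

```python
from math import factorial
from collections import Counter
from sympy.utilities.iterables import multiset_partitions

def predicted(ns):
    """N! / (prod n_i! * prod_k m_k!), the formula from the post."""
    N = sum(ns)
    den = 1
    for k in ns:
        den *= factorial(k)            # prod n_i!
    for mk in Counter(ns).values():
        den *= factorial(mk)           # prod_k m_k(n)!
    return factorial(N) // den

def brute_force(ns):
    """Count set partitions of {0, ..., N-1} whose block sizes are ns."""
    N, target = sum(ns), sorted(ns)
    return sum(1 for P in multiset_partitions(list(range(N)))
               if sorted(map(len, P)) == target)

for ns in [(2, 2), (1, 1, 2), (3, 3), (1, 2, 3), (2, 2, 2)]:
    assert predicted(ns) == brute_force(ns)
print("partition counts agree")
```

For example, `predicted((2, 2))` returns $3$, the three ways of splitting a $4$-element set into two pairs.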

The goal of this blog post is to present a combinatorial argument for the first equality, and an alternative expression of the second number as a product of integers. We also give a formal, group theoretical proof, that this quotient of factorials solves the given counting problem. 

 $\gdef\abs#1{\lvert#1\rvert}$ $\gdef\Card{\operatorname{Card}}$  

Counting partitions of given type 

Let $S$ be a set with $N$ elements. A partition of $S$ is a subset $\{s_1,\dots,s_r\}$ of $\mathfrak P(S)$ consisting of nonempty, pairwise disjoint, subsets whose union is $S$. Its *type* is the “multiset” of integers given by their cardinalities. Because no $s_i$ is empty, the type is a multiset of nonzero integers. It is slightly easier to allow zero in this type; then a partition has to be considered as a multiset of subsets of $S$, still assumed pairwise disjoint, so that only the empty subset can be repeated.

The group $G$ of permutations of $S$ acts on the set $\Pi_S$ of partitions of $S$. This action preserves the type of a partition, so that we get an action on the set $\Pi_{S,n}$ of partitions of type $n$, for any multiset of integers $n$. It is nonempty if and only if $\abs n=N$.

This action is transitive: Given two partitions $(s_1,\dots,s_r)$ and $(t_1,\dots t_r)$ with the same type $(n_1,\dots,n_r)$, numbered so that $\abs{s_i}=\abs{t_i}={n_i}$ for all $i$, we just define a permutation of $S$ that maps the elements of $s_i$ to $t_i$.

By the orbit-stabilizer theorem, the number of partitions of type $n$ is equal to $\Card(G)/\Card(G_s)$, where $G_s$ is the stabilizer of any partition $s$ of type $n$. Since $\Card(S)=N$, one has $\Card(G)=N!=\abs n!$.

On the other hand, the stabilizer $G_s$ of a partition $(s_1,\dots,s_r)$ can be described as follows. An element $g$ of this stabilizer maps the elements of $s_i$ to some set $s_j$ which has the same cardinality as $s_i$. In other words, there is a morphism of groups from $G_s$ to the subgroup of the symmetric group $\mathfrak S_r$ consisting of permutations that preserve the cardinality. This subgroup is a product of symmetric groups $\mathfrak S_{m_k}$, for all $k> 0$, where $m_k$ is the number of occurrences of $k$ in $(n_1,\dots,n_r)$. Here, we omit the factor corresponding to $k=0$ because our morphism doesn't see it.

This morphism is surjective, because it has a section. Given a permutation $\sigma$ of $\{1,\dots,r\}$ that preserves the cardinalities, we have essentially shown above how to construct a permutation of $S$ that induces $\sigma$, and this permutation belongs to $G_s$. More precisely, if we number the elements of $s_i$ as $(x_{i,1},\dots,x_{i,n_i})$, we lift $\sigma$ to the permutation that maps $x_{i,j}$ to $x_{\sigma(i),j}$ for all $i,j$. This makes sense since $n_{i}=n_{\sigma(i)}$ for all $i$.

The kernel of this morphism $G_s\to \prod_{k>0} \mathfrak S_{m_k}$ consists of permutations that fix each $s_i$. It is clearly equal to the product of the symmetric groups $\prod \mathfrak S_{n_i}$.

One thus has $\Card(G_s)=\prod n_i! \prod_{k>0} m_k!$, as was to be shown.

A combinatorial interpretation of the first relation


As we said above, the relation $$ \frac{(mn)!}{m!\, n!^m} = \prod_{k=1}^{m} \binom{kn-1}{n-1} $$ can be proved by induction, just playing with factorials, but we want a combinatorial interpretation for it. (Here we assume that $n\geq 1$.)
By the preceding paragraph, the left hand side is the number of partitions of a set with $mn$ elements of type $(n,\dots,n)$. We may assume that this set is numbered $\{1,\dots,mn\}$.

Such a partition can be defined as follows. First, we choose the subset that contains $1$: this amounts to choosing $n-1$ elements among $\{2,\dots,mn\}$, and there are $\binom{mn-1}{n-1}$ possibilities. Now, we have to choose the subset that contains the smallest element which is not in the first chosen subset. There are $n-1$ elements to choose among the remaining $mn-n-1=(m-1)n-1$ ones, hence $\binom{(m-1)n-1}{n-1}$ possibilities. And we go on in the same way, until we have chosen the $m$ required subsets. (For the last one, there is of course only one such choice, and notice that the last factor is $\binom{n-1}{n-1}=1$.)
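The identity underlying this counting argument can be checked numerically. A short Python sketch (the helper names `lhs` and `rhs` are my own):

```python
from math import comb, factorial, prod

def lhs(m, n):
    """(mn)! / (m! * n!^m), the number of partitions of type (n,...,n)."""
    return factorial(m * n) // (factorial(m) * factorial(n) ** m)

def rhs(m, n):
    """The product of binomial coefficients from the counting argument."""
    return prod(comb(k * n - 1, n - 1) for k in range(1, m + 1))

for m in range(1, 7):
    for n in range(1, 7):
        assert lhs(m, n) == rhs(m, n)
print("identity verified for m, n <= 6")
```

For instance, for $m=n=2$ both sides equal $3$: a $4$-element set splits into two pairs in $\binom{1}{1}\binom{3}{1}=3$ ways.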

An integral formula for the number of partitions of given type


Finally, we wish to give an alternative formula for the number of partitions of given type of a set $S$ that makes it clear that it is an integer. Again, we can assume that the set $S$ is equal to $\{1,\dots,N\}$ and that $n=(n_1,\dots,n_r)$ is a multiset of integers such that $\abs n=N$.

Let us write $n$ in another way, $n=(0^{m_0},1^{m_1},\dots)$, meaning that $0$ is repeated $m_0$ times, $1$ is repeated $m_1$ times, etc. One has $N=\sum k m_k$. The idea is that we can first partition $S$ into subsets of cardinalities $m_1$, $2m_2$,… and then subpartition these subsets: the first one into $m_1$ subsets with one element, the second into $m_2$ subsets with two elements, etc.

The number of ordered partitions with $m_1$, $2m_2$… elements is the multinomial number $$ \binom{N}{m_1,2m_2,\dots}.$$ And the other choice is the product of integers as given in the preceding section. This gives $$ \frac{\abs n!}{\prod_i n_i! \prod_{k>0}m_k(n)!} = \binom{N}{m_1,2m_2,\dots}\prod_{k>0} \frac{(km_k)!}{k!^{m_k} m_k(n)!}.$$ We can also write the fractions that appear in the product as integers: $$ \frac{\abs n!}{\prod_i n_i! \prod_{k>0}m_k(n)!} =\binom{N}{m_1,2m_2,\dots} \prod_{k>0} \prod_{m=1}^{m_k} \binom {km-1}{k-1}.$$
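The displayed identity can likewise be checked numerically. A Python sketch (the helpers are my own, with `via_decomposition` computing the right-hand side):

```python
from math import factorial
from collections import Counter

def partitions_of_type(ns):
    """The left-hand side: N! / (prod n_i! * prod_k m_k!)."""
    N = sum(ns)
    den = 1
    for k in ns:
        den *= factorial(k)
    for mk in Counter(ns).values():
        den *= factorial(mk)
    return factorial(N) // den

def via_decomposition(ns):
    """The right-hand side: multinomial coefficient times, for each k,
    the number of partitions of a (k*m_k)-set into m_k subsets of size k."""
    N = sum(ns)
    m = Counter(k for k in ns if k > 0)
    result = factorial(N)
    for k, mk in m.items():
        result //= factorial(k * mk)      # multinomial N! / prod (k m_k)!
    for k, mk in m.items():
        result *= factorial(k * mk) // (factorial(k) ** mk * factorial(mk))
    return result

for ns in [(2, 2), (1, 1, 2), (3, 3), (1, 2, 2, 3), (1, 1, 1, 2, 2)]:
    assert partitions_of_type(ns) == via_decomposition(ns)
print("decomposition formula verified")
```

Each factor accumulated into `result` is an integer, which is the point of the decomposition.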

Saturday, March 16, 2024

Combinatorics of the nilpotent cone

$\global\def\Card{\operatorname{Card}}\global\def\GL{\mathrm{GL}}\global\def\im{\operatorname{im}}\gdef\KVar{\mathrm{KVar}}$

Let $n$ be an integer and $F$ be a field. Nilpotent matrices $N\in \mathrm M_n(F)$ are those matrices for which there exists an integer $p$ with $N^p=0$. Their characteristic polynomial is $\chi_N(T)=T^n$, and they satisfy $N^n=0$, which shows that the set $\mathscr N_n$ of nilpotent matrices is an algebraic variety. The equation $N^n=0$ is homogeneous of degree $n$, so that $\mathscr N_n$ is a cone.

The classification of nilpotent matrices is an intermediate step in the theory of Jordan decomposition: In an adequate basis, a nilpotent matrix can be written as a diagonal block matrix of “basic” nilpotent matrices, $p \times p$ matrices of the form \[ \begin{pmatrix} 0 & 0 & \dots & 0 & 0 \\ 1 & 0 & & & \vdots \\ 0 & 1 & \ddots & & 0 \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & & 0 & 1 & 0\end{pmatrix} \] whose minimal polynomial is $T^p$. The sum of the sizes of these blocks is $n$, and in this way, any $n\times n$ nilpotent matrix is associated with a partition $\pi$ of~$n$. It is known that two nilpotent matrices are conjugate if and only if they are associated with the same partition. For any partition $\pi$ of~$n$, let us denote by $J_\pi$ the corresponding matrix whose sizes of blocks are arranged in increasing order, and $\mathscr N_\pi$ the set of nilpotent matrices that are associated with the partition $\pi$.

The theorem of Fine and Herstein (1958)

Having to teach “agrégation” classes made me learn about a classic combinatorial result: counting the number of nilpotent matrices when $F$ is a finite field.

Theorem (Fine, Herstein, 1958). — Let $F$ be a finite field with $q$ elements. The cardinality of $\mathscr N_n(F)$ is $q^{n^2-n}$. Equivalently, the probability that an $n\times n$ matrix with coefficients in $F$ is nilpotent is $q^{-n}$.

The initial proof of this result relies on the action of $\GL_n(F)$ on $\mathscr N_n(F)$: we recalled that the orbits correspond with the partitions of $n$, hence a decomposition \[ \Card(\mathscr N_n(F)) = \sum_{\pi} \Card(\mathscr N_\pi(F)). \] We know that $\mathscr N_\pi(F)$ is the orbit of the matrix $J_\pi$ under the action of $\GL_n(F)$. By the classic orbit-stabilizer formula, one thus has \[ \Card(\mathscr N_\pi(F)) = \frac{\Card(\GL_n(F))}{\Card(C_\pi(F))}, \] where $C_\pi(F)$ is the set of matrices $A\in\GL_n(F)$ such that $AJ_\pi=J_\pi A$. The precise description of $C_\pi(F)$ is delicate, but their arguments go as follows.

They first replace the group $C_\pi(F)$ by the algebra $A_\pi(F)$ of all matrices $A\in\mathrm M_n(F)$ such that $AJ_\pi=J_\pi A$. For any integer $i$, let $m_i$ be the multiplicity of $i$ in the partition $\pi$, so that $n=\sum i m_i$. The block decomposition of $J_\pi$ corresponds with a decomposition of $F^n$ as a direct sum of invariant subspaces $V_i$, where $V_i$ has dimension $i m_i$. In fact, $V_1=\ker(J_\pi)$, $V_1\oplus V_2=\ker(J_\pi^2)$, etc. This shows that $A_\pi(F)$ is an algebra of block-triangular matrices. Moreover, the possible diagonal blocks can be shown to be isomorphic to $\mathrm M_{m_i}(F)$. In other words, we have a surjective morphism of algebras \[ A_\pi(F) \to \prod_i \mathrm M_{m_i}(F), \] whose kernel consists of nilpotent matrices. In particular, the proportion of invertible elements in $A_\pi(F)$ is equal to the proportion of invertible elements in the product $\prod_i \mathrm M_{m_i}(F)$.

Ultimately, Fine and Herstein obtain an explicit sum over the set of partitions of $n$ which they prove equals $q^{n^2-n}$, after an additional combinatorial argument.

Soon after, the theorem of Fine and Herstein was given easier proofs, starting from Gerstenhaber (1961) to Kaplansky (1990) and Leinster (2021).

A proof

The following proof is borrowed from Caldero and Peronnier (2022), Carnet de voyage en Algébrie. It can be seen as a simplification of the proofs of Gerstenhaber (1961) and Leinster (2021).

Let us start with the Fitting decomposition of an endomorphism $u\in \mathrm M_n(F)$: the least integer $p$ such that $\ker(u^p)=\ker(u^{p+1})$ coincides with the least integer $p$ such that $\im(u^p)=\im(u^{p+1})$, and one has $F^n=\ker(u^p)\oplus \im(u^p)$. The subspaces $N(u)=\ker(u^p)$ and $I(u)=\im(u^p)$ are invariant under $u$, and $u$ acts nilpotently on $N(u)$ and bijectively on $I(u)$. In other words, we have associated with $u$ complementary subspaces $N(u)$ and $I(u)$, a nilpotent operator on $N(u)$ and an invertible operator on $I(u)$. The map that associates these data with $u$ is bijective.

For any integer $d$, let $\nu_d$ be the number of nilpotent matrices in $\mathrm M_d(F)$, and $\gamma_d$ be the number of invertible matrices in $\mathrm M_d(F)$. Let also $\mathscr D_d$ be the set of all pairs $(K,I)$, where $K$ and $I$ are complementary subspaces of dimensions $d$ and $n-d$ of $F^n$. Counting endomorphisms of $F^n$ via their Fitting decomposition, we thus obtain \[ q^{n^2} = \sum_{d=0}^n \sum_{(K,I)\in\mathscr D_d} \nu_d \cdot \gamma_{n-d}. \] We need to compute the cardinality of $\mathscr D_d$. In fact, given one pair $(K,I)\in\mathscr D_d$, all others are of the form $(g\cdot K,g\cdot I)$, for some $g\in\GL_n(F)$: the group $\GL_n(F)$ acts transitively on $\mathscr D_d$. The stabilizer of $(K,I)$ can be identified with $\GL_d(F)\times \GL_{n-d}(F)$. Consequently, \[ \Card(\mathscr D_d) = \frac{\Card(\GL_n(F))}{\Card(\GL_d(F))\,\Card(\GL_{n-d}(F))} = \frac{\gamma_n}{\gamma_d \gamma_{n-d}}. \] We thus obtain \[ q^{n^2} = \sum_{d=0}^n \frac{\gamma_n}{\gamma_d \gamma_{n-d}} \nu_d \gamma_{n-d} = \gamma_n \sum_{d=0}^n \frac{\nu_d}{\gamma_d}. \] By subtraction, we get \[ \frac{\nu_n}{\gamma_n} = \frac {q^{n^2}}{\gamma_n} - \frac{q^{(n-1)^2}}{\gamma_{n-1}},\] or \[ \nu_n = q^{n^2} - q^{(n-1)^2} \frac{\gamma_n}{\gamma_{n-1}}. \] It remains to compute $\gamma_n$: since an invertible matrix consists of a nonzero vector, a vector which does not belong to the line generated by the first one, etc., we have \[ \gamma_n = (q^n-1) (q^n-q)\dots (q^n-q^{n-1}). \] Then, \[ \gamma_n = (q^n-1) q^{n-1} (q^{n-1}-1)\dots (q^{n-1}-q^{n-2}) = (q^n-1)q^{n-1} \gamma_{n-1}. \] We thus obtain \[ \nu_n = q^{n^2} - q^{(n-1)^2} (q^n-1) q^{n-1} = q^{n^2} - q^{(n-1)^2} q^{2n-1} + q^{(n-1)^2} q^{n-1} = q^{n^2-n}, \] as claimed.
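For very small $n$ and $q$, the theorem of Fine and Herstein can be checked by enumerating all matrices over $\mathbf F_q$. A brute-force Python sketch (the helper names are my own):

```python
from itertools import product

def is_nilpotent(M, n, q):
    """M is nilpotent iff M^n = 0 over F_q (entries reduced mod q)."""
    P = M
    for _ in range(n - 1):
        P = tuple(tuple(sum(P[i][k] * M[k][j] for k in range(n)) % q
                        for j in range(n)) for i in range(n))
    return all(P[i][j] == 0 for i in range(n) for j in range(n))

def count_nilpotent(n, q):
    count = 0
    for entries in product(range(q), repeat=n * n):
        M = tuple(tuple(entries[i * n + j] for j in range(n))
                  for i in range(n))
        if is_nilpotent(M, n, q):
            count += 1
    return count

for n, q in [(1, 2), (1, 3), (2, 2), (2, 3), (2, 5), (3, 2)]:
    assert count_nilpotent(n, q) == q ** (n * n - n)
print("Fine-Herstein count verified")
```

For $n=2$, $q=2$, the four nilpotent matrices are the zero matrix, the two elementary nilpotents, and the all-ones matrix, matching $2^{4-2}=4$.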

The proof of Leinster (2021)

Leinster defines a bijection from $\mathscr N_n(F)\times F^n$ to $\mathrm M_n(F)$. The definition is however not very canonical, because he assumes given, for any subspace $V$ of $F^n$, a basis of $V$.

Take a pair $(u,x)$, where $u\in\mathscr N_n(F)$ and $x\in F^n$ and consider the subspace $V_x=\langle x,u(x),\dots\rangle$, the smallest $u$-invariant subspace of $F^n$ which contains $x$. To describe $u$, we observe that we know its restriction to $V_x$, and we need to describe it on the chosen complementary subspace $V'_x$.

To that aim, we have to give ourselves an endomorphism $u'_x$ of $V'_x$ and a linear map $\phi_x\colon V'_x\to V_x$. Since we want $u$ to be nilpotent, it is necessary and sufficient to take $u'_x$ nilpotent.

Instead of considering $\phi_x\colon V'_x\to V_x$, we can consider the map $y\mapsto y+\phi_x(y)$. Its image is a complement $W_x$ of $V_x$ in $F^n$, and any complement can be obtained in this way. The nilpotent endomorphism $u'_x$ of $V'_x$ transfers to a nilpotent endomorphism $w_x$ of $W_x$.

All in all, what the given pair $(u,x)$ furnishes is a subspace $V_x$ with a basis $(x_1=x,x_2=u(x),\dots)$, a complement $W_x$, and a nilpotent endomorphism $w_x$ of $W_x$. This is more or less what the Fitting decomposition of an endomorphism gives us! Recall that $V_x$ was assumed to have been given a basis $(e_1,\dots,e_p)$. There exists a unique automorphism of $V_x$ which maps $e_i$ to $u^{i-1}(x)$ for all $i$. In other words, we have a pair of complementary subspaces $(V_x,W_x)$, a linear automorphism of $V_x$, and a nilpotent endomorphism of $W_x$. By the Fitting decomposition, these data furnish in a bijective way an endomorphism of $F^n$, and that concludes the proof.

A remark about motivic integration

The framework of motivic integration suggests to upgrade these combinatorial results into equalities valid for every field $F$, which hold in the Grothendieck ring of varieties $\KVar_F$. As an abelian group, it is generated by symbols $[X]$, for all algebraic varieties $X$ over $F$, with relations $[X]=[Y]+[X\setminus Y]$, whenever $Y$ is a closed subvariety of $X$. The ring structure is defined so that the formula $[X]\cdot[Y]=[X\times Y]$ holds for all algebraic varieties $X$ and $Y$ over $F$.

By construction of this ring, equalities $[X]=[Y]$ in $\KVar_F$ imply that many invariants of $X$ and $Y$ coincide. In particular, when $F$ is a finite field, they will have the same number of points.

The question is thus to compute the class $[\mathscr N_n]$ of the variety $\mathscr N_n$, for any field $F$. The proofs that I described above can be more or less transferred to this context and imply the following theorem. We denote by $\mathbf L\in \KVar_F$ the class of the affine line $\mathbf A^1$.

Theorem. — One has an equality $[\mathscr N_n] \mathbf L^n = \mathbf L^{n^2}$ in the localization of the Grothendieck ring $\KVar_F$ by the element $(\mathbf L-1)\dots(\mathbf L^{n-1}-1)$.

The following question is then natural. (I have not thought about it at all.)

Question. — Does one have $[\mathscr N_n]=\mathbf L^{n^2-n}$ in $\KVar_F$?

Friday, October 13, 2023

A combinatorial proof of the Newton identities

Let $T_1,\dots,T_n$ be indeterminates. For all $m\geq0$, denote by $S_m$ the $m$th elementary symmetric polynomial, given by \[ S_m = \sum_{i_1 \lt \dots \lt i_m} T_{i_1}\dots T_{i_m}. \] These are polynomials in $T_1,\dots, T_n$, and as their names suggest, these polynomials are symmetric, meaning that \[ S_m (T_{\sigma(1)},\dots,T_{\sigma(n)}) = S_m (T_1,\dots,T_n) \] for any permutation $\sigma$ of $\{1,\dots,n\}$. By definition, one has $S_0=1$ (there is exactly one family of integers, the empty one, and the empty product is equal to~$1$) and $S_m=0$ for $m>n$ (there are no families of integers $1\leq i_1\lt\dots\lt i_m\leq n$). A well-known theorem asserts that any symmetric polynomial in $T_1,\dots,T_n$ (with coefficients in a commutative ring $R$) can be written uniquely as a polynomial in $S_1,\dots,S_n$ (with coefficients in $R$). It is in particular the case for the Newton polynomials, defined for $p\geq 0$ by \[ N_p = T_1^p + \dots+T_n^p . \] In particular, $N_0=n$ and $N_1=S_1$. The next Newton polynomial can be computed by “completing the square”: \[ N_2 = T_1^2+\dots+T_n^2 = (T_1+\dots+T_n)^2 - 2 S_2 = S_1^2 - 2S_2. \] These are the first two (or three) of a family of relations, called the Newton identities, that relate the polynomials $N_p$ and the polynomials $S_m$: \[  \sum_{m=0}^{p-1} (-1)^m N_{p-m}\cdot S_m + (-1)^p p \cdot S_p = 0. \] Using that the coefficient of $N_p$ is given by $S_0=1$, they allow one to express inductively the polynomials $N_p$ in terms of $S_1,\dots,S_p$. If $p>n$, all terms with $m>n$ vanish. Using that the coefficient of $S_p$ is $(-1)^p p$, these formulas also show, conversely, that if $p!$ is invertible in the commutative ring $R$, then $S_1,\dots,S_p$ can be expressed in terms of $N_1,\dots,N_p$. There are various proofs of these identities. Most of the ones I have seen are algebraic in nature, in that they rely on the relations between the roots of a polynomial and its coefficients, and/or on expansion in power series. 
The following proof, due to Doron Zeilberger (1984) (Discrete Math. 49, p. 319, doi:10.1016/0012-365X(84)90171-7), is purely combinatorial. I became aware of it because it is the one that is formalized in the proof assistant Lean. (See there for the source code.) We consider the set $S=\{1,\dots,n\}$ and define a set $X$ whose elements are the pairs $(A,k)$ satisfying the following properties:
  • $A$ is a subset of $S$;
  • $k$ is an element of $S$;
  • the cardinality of $A$ is less than or equal to $p$;
  • if ${\mathop {\rm Card}}(A)=p$, then $k\in A$.
Define a map $f\colon X\to X$ by the following rule: for $(A,k)$ in $X$, set
  • $f(A,k) = (A \setminus\{k\}, k)$ if $k\in A$;
  • $f(A,k) = (A\cup\{k\}, k)$ if $k\notin A$.
Write $f(A,k)=(A',k)$; observe that this is again an element of $X$. The first two conditions are obvious. In the first case, when $k\in A$, the cardinality of $A'$ is one less than the cardinality of $A$, hence is at most $p-1$, and the last condition is vacuous. In the second case, when $k\notin A$, the cardinality of $A$ is strictly smaller than $p$, because $(A,k)\in X$, so that the cardinality of $A'$ is at most $p$; moreover, $k\in A'$, hence the last condition holds. Observe that $f\colon X\to X$ is an involution: $f\circ f={\mathrm{id}}_X$. Indeed, with the same notation, one has $f(A,k)=(A',k)$. In the first case, $k\in A$, hence $k\notin A'$, so that $f(A',k)=(A'\cup\{k\},k)=(A,k)$; in the second case, one has $k\in A'$, hence $f(A',k)=(A'\setminus\{k\}, k)=(A,k)$ since $A'=A\cup\{k\}$ and $k\notin A$. Moreover, $f$ has no fixed point, because it switches the pairs $(A,k)$ such that $k\in A$ and the pairs such that $k\notin A$.

Zeilberger's proof of the Newton identities builds on a function $w\colon X\to R[T_1,\dots,T_n]$ such that $w(f(A,k))=-w(A,k)$ for all $(A,k)\in X$. Since $f$ has no fixed point, the sum \[ \sum_{(A,k)\in X} w(A,k) \] vanishes, since one can cancel each term $w(A,k)$ against the term $w(f(A,k))$. Zeilberger's definition of the function $w$ is \[ w(A,k) = (-1)^{\mathrm{Card}(A)} T_k^{p-\mathrm{Card}(A)}\cdot \prod_{a\in A} T_a. \] It satisfies the desired relation: for $(A,k)\in X$ such that $k\notin A$, one has $f(A,k)=(A\cup\{k\}, k)$, so that \[ w(f(A,k)) = (-1)^{\mathrm{Card}(A)+1} T_k^{p-\mathrm{Card}(A)-1} \prod_{a\in A\cup\{k\}} T_a = - (-1)^{\mathrm{Card}(A)} T_k^{p-\mathrm{Card}(A)} \prod_{a\in A} T_a = -w(A,k). \] When $k\in A$, one has $f(A,k)=(A',k)$ with $k\notin A'$, and the formula applied to $(A',k)$ implies the one for $(A,k)$, because $f$ is an involution.
Now, in the relation $ \sum_{(A,k)} w(A,k) =0$, we sum first over $A$ and then over $k$: \[ \sum_{A\subset S} \sum_{k \mid (A,k) \in X} (-1)^{\mathrm{Card}(A)} T_k^{p-{\mathrm{Card}(A)}} \prod_{a\in A} T_a = 0\] and put together all those sets $A$ which have the same cardinality, say $m$. This gives \[ \sum_{m=0}^n (-1)^m \sum_{\mathrm{Card}(A)=m} \prod_{a\in A} T_a \cdot \sum_{k \mid (A,k)\in X} T_k^{p-m} = 0. \] Let us compute separately the contribution of each $m$. When $m\gt p$, no subset $A$ is allowed, so that the corresponding sum is zero. When $m\lt p$, for any subset $A$ such that $\mathrm{Card}(A)=m$, all pairs $(A,k)$ belong to $X$, so that the inner sum is $N_{p-m}$, and the whole term is $(-1)^mS_m N_{p-m}$. Finally, when $m=p$, the only $k$ that appear in the inner sum are the elements of $A$, and the corresponding term is $\sum_{k\in A} T_k^0=\mathrm {Card}(A)=p$; this term contributes $(-1)^p p\, S_p$. Summing up, we recover exactly the Newton identity stated above.
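None of what follows is part of Zeilberger's argument, but the identity itself is easy to test numerically; here is a quick Python sanity check (the names `S`, `N` and the test values are mine):

```python
from itertools import combinations
from math import prod

def S(vals, m):
    """m-th elementary symmetric polynomial evaluated at the given values."""
    return sum(prod(c) for c in combinations(vals, m))

def N(vals, p):
    """p-th power sum (Newton polynomial) evaluated at the given values."""
    return sum(v ** p for v in vals)

vals = [2, 3, 5, 7]  # arbitrary integer test values, n = 4
for p in range(1, 9):  # includes p > n, where S(vals, m) = 0 for m > 4
    newton = sum((-1) ** m * N(vals, p - m) * S(vals, m) for m in range(p))
    newton += (-1) ** p * p * S(vals, p)
    assert newton == 0  # the Newton identity for this p
```

Since the identity is a polynomial identity, checking it on enough numerical values (or on symbolic indeterminates) proves nothing by itself, but it is a cheap way to catch sign errors.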

Sunday, January 1, 2023

The Klein group, the centralizer of a permutation, and its relation with the alternating group

The following reflection came out of my irrepressible need to understand why the 3 double transpositions in $\mathfrak S_4$, together with the identity, form a group $V$. Of course, one might just say: “they are stable under multiplication, as one sees by computing the 4·3/2 = 6 different products”, but who makes this computation anyway? And since I wanted not only to understand this, but to explain it to Lean, I needed an argument that could actually be done, for real. So here is an argument that requires no computation, besides the one that says that there are 3 double transpositions.

Prop. — The subgroup of $\mathfrak S_4$ generated by the 3 double transpositions is the unique 2-Sylow subgroup of $\mathfrak A_4$. In particular, it has order 4 and consists of these 3 double transpositions and of the identity.
Proof. — Let $V$ be the subset of $\mathfrak S_4$ consisting of these 3 double transpositions and of the identity. Let $S$ be a 2-Sylow subgroup of $\mathfrak A_4$.
We first prove $S \subseteq V$. The subgroup $S$ has order 4. Let $g\in S$. The order of $g$ divides 4, so its cycles have lengths 1, 2 or 4. If there were a cycle of length 4, then $g$ would be that cycle, hence of odd sign, contradicting $g\in\mathfrak A_4$. Therefore, either $g=1$, or $g$ has a cycle of length 2, and then there must be a second one because $g$ is even. Consequently, $S\subseteq V$, as claimed.
Since $4=\operatorname{\rm Card}(S)=\operatorname{\rm Card}(V)$, this shows that $S=V$, hence $S=\langle V\rangle$.
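For the skeptical reader, the products can of course be delegated to a computer; here is a quick Python check (encoding a permutation of $\{1,2,3,4\}$ as a tuple acting on $\{0,1,2,3\}$ — this is an illustration, not the Lean formalization mentioned above):

```python
from itertools import product

def compose(p, q):
    """Composition p∘q of permutations encoded as tuples: i ↦ p[q[i]]."""
    return tuple(p[q[i]] for i in range(len(q)))

identity = (0, 1, 2, 3)
double_transpositions = {(1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)}
V = {identity} | double_transpositions

# closure under multiplication: the products the text refuses to do by hand
assert all(compose(p, q) in V for p, q in product(V, repeat=2))
# every element is its own inverse, so V is a subgroup, isomorphic to (Z/2)^2
assert all(compose(p, p) == identity for p in V)
```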

At this point, we still need to understand why there are 3 double transpositions. More generally, I wanted to prove a formula for the number of permutations in $\mathfrak S_n$ of a given orbit type. The orbit type of a permutation $g$ is a multiset of strictly positive integers with sum $n$, given by the cardinalities of the orbits of $g$ on $\{1,\dots,n\}$. We write it as $1^{n_1} 2^{n_2}\dots r^{n_r}$, meaning that $g$ has $n_1$ orbits of cardinality $1$ (fixed points), $n_2$ orbits of cardinality $2$, etc., so that $n= \sum_i n_i\, i$. Let $\mathscr O_g$ be the set of orbits of $g$. The action of $g$ on a given orbit $c$ coincides with a circular permutation whose order is the length $\ell(c)$ of this orbit; when it is nontrivial, such a permutation will be called a cycle of $g$. The supports of these cycles are pairwise disjoint, so that these cycles commute, and their product is exactly $g$. In fact, this is the only way of writing $g$ as a product of cycles with pairwise disjoint supports. (By convention, the identity is not a cycle.)

Theorem. — There are exactly \[ N(1^{n_1}\dots r^{n_r}) = \frac{n!}{1^{n_1}\dots r^{n_r}\, n_1!\dots n_r!} \] permutations with orbit type $1^{n_1} 2^{n_2}\dots r^{n_r}$.

A standard proof of this result goes as follows. Write the decomposition of such a permutation $g$ into cycles with disjoint supports as $g=(\cdot)\dots (\cdot)(\cdot,\cdot)\dots(\cdot,\cdot,\dots)$, leaving blank spaces for the entries of the cycles (and, contradicting our convention, allowing for cycles of length 1…). There are $n!$ ways to fill these spaces with the $n$ distinct integers between $1$ and $n$, but some of them give rise to the same permutation. Indeed, the entries in a cycle of length $s$ only count up to a circular permutation, so that we need to divide the result by $1^{n_1}\dots r^{n_r}$. Moreover, we can switch the order of the cycles of a given length, hence we also need to divide that result by $n_s!$ (the number of ways of permuting the various cycles of length $s$), for each possible length $s$.
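The count is easy to confirm by brute force for small $n$; the following Python sketch (helper names are mine) compares the formula with an exhaustive enumeration of $\mathfrak S_5$:

```python
from itertools import permutations
from math import factorial, prod
from collections import Counter

def orbit_type(p):
    """Sorted tuple of cycle lengths of a permutation given as a tuple."""
    seen, lengths = set(), []
    for i in range(len(p)):
        if i not in seen:
            length, j = 0, i
            while j not in seen:
                seen.add(j)
                j = p[j]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths))

n = 5
counts = Counter(orbit_type(p) for p in permutations(range(n)))
for typ, count in counts.items():
    mult = Counter(typ)  # mult[i] = n_i, the number of orbits of length i
    predicted = factorial(n) // prod(i ** m * factorial(m) for i, m in mult.items())
    assert count == predicted  # the formula of the theorem
```

For instance, the orbit type $1^1 2^2$ (the double transpositions of $\mathfrak S_5$) gives $120/(1\cdot 4\cdot 1\cdot 2)=15$ permutations.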

This is reasonably convincing, but one could wish for something more precise, both in the proof and in the statement. In fact, in the preceding formula, the numerator $n!$ is the order of $\mathfrak S_n$. Since all permutations with a given orbit type are conjugate under $\mathfrak S_n$, the left hand side appears as the cardinality of the orbit of a permutation $g$ of that orbit type, so that the denominator has to equal the cardinality of the stabilizer of this permutation under the action by conjugation. Therefore, a more precise proof of this formula could run by elucidating the structure of this centralizer. This may also be interesting once one wishes to relativize the result to the alternating group $\mathfrak A_n$, in order to obtain a formula for the cardinalities of the various conjugacy classes in $\mathfrak A_n$.

Let us fix a permutation $g\in\mathfrak S_n$ with orbit type $1^{n_1}\dots r^{n_r}$. The stabilizer of $g$ under the action by conjugation is its centralizer $Z_g$, the subgroup of all $k\in\mathfrak S_n$ which commute with $g$.

We first define a morphism of groups \[ \tau \colon Z_g \to \mathfrak S_{n_1}\times \mathfrak S_{n_2}\times\dots\times \mathfrak S_{n_r}. \] Let $\mathscr O_g$ be the set of orbits of $g$; this is a set with cardinality $n_1+n_2+\dots+n_r$. Restricted to one orbit, the action of $g$ coincides with that of a circular permutation of this orbit (which fixes the complementary subset); these circular permutations have disjoint supports, hence they commute pairwise and their product is equal to $g$. For $c\in\mathscr O_g$, we write $\ell(c)$ for its cardinality; this is also the order of the cycle of $g$ defined by this orbit. If $k\in Z_g$, then $kgk^{-1}=g$. Consequently, the action of $k$ permutes the orbits of $g$, respecting their cardinalities. This defines the desired group morphism $\tau$ from $Z_g$ to the product of permutation groups $\mathfrak S_{n_1}\times \dots\times \mathfrak S_{n_r}$.

This morphism $\tau$ is surjective.
Indeed, given permutations $\sigma_1$ of the set of fixed points of $g$, $\sigma_2$ of the set of orbits of length 2, etc., we construct $k_\sigma\in Z_g$ such that $\tau(k_\sigma)=(\sigma_1,\dots,\sigma_r)$. We fix a point $a_c$ in each orbit $c$ and decide that $k_\sigma(a_c)=a_{\sigma_i(c)}$ if $c$ has length $i$. The required relation $k_\sigma g=g k_\sigma$ imposes $k_\sigma (g^n a_c)=g^n a_{\sigma_i(c)}$ for all $n\in\mathbf Z$, and it remains to check that this formula gives a well defined element of $Z_g$. In fact, this formula defines a group-theoretic section of $\tau$.

What is the kernel of this morphism $\tau$?
If $\tau(k)=1$, then $k$ fixes every orbit $c\in\mathscr O_g$. Since $kg=gk$, we obtain that on each orbit $c$, $k$ coincides with some power of the corresponding cycle, which has order $\ell(c)$. We thus obtain an isomorphism \[ \ker(\tau) \simeq \prod_{c\in\mathscr O_g} (\mathbf Z/\ell(c)\mathbf Z). \]

To compute the cardinality of $Z_g$, it is now sufficient to compute those of $\operatorname{\rm im}(\tau)$ and $\ker(\tau)$, and this gives the formula \[ \operatorname{\rm Card} (Z_g) = \operatorname{\rm Card} (\ker(\tau)) \operatorname{\rm Card} (\operatorname{\rm im}(\tau)) = 1^{n_1}\dots r^{n_r} n_1! \dots n_r!, \] as was to be shown.
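As a concrete check of this formula, one can compute a centralizer by brute force; the Python sketch below (the example permutation is mine) treats a permutation of orbit type $1^1 2^2$ in $\mathfrak S_5$:

```python
from itertools import permutations
from math import factorial

def compose(p, q):
    """Composition p∘q of permutations encoded as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

# g = (0 1)(2 3), as a tuple on {0,...,4}: orbit type 1^1 2^2
g = (1, 0, 3, 2, 4)
Z = [k for k in permutations(range(5)) if compose(k, g) == compose(g, k)]

# predicted order: 1^{n_1} 2^{n_2} n_1! n_2! with n_1 = 1, n_2 = 2
assert len(Z) == 1 ** 1 * 2 ** 2 * factorial(1) * factorial(2)  # = 8
```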

One of the interests of this argument is that it can be pushed further to understand the structure of the conjugacy classes in the alternating group $\mathfrak A_n$. The case $n\leq 1$ is uninteresting, hence we assume $n\geq 2$. Then $\mathfrak A_n$ has index 2 in $\mathfrak S_n$, and the formulas \[ \operatorname  {\rm Card}((g)_{\mathfrak A_n}) = \frac{{\rm Card}({\mathfrak A_n})}{{\rm Card}(Z_g \cap {\mathfrak A_n})} \quad\text{\rm and}\quad \operatorname  {\rm Card}((g)_{\mathfrak S_n}) = \frac{{\rm Card}({\mathfrak S_n})}{{\rm Card}(Z_g)} \] for the cardinalities of the conjugacy classes $(g)_{\mathfrak A_n}$ and $(g)_{\mathfrak S_n}$ imply that both are equal if and only if $Z_g$ is not contained in $\mathfrak A_n$; otherwise, the conjugacy class $(g)_{\mathfrak S_n}$ is the disjoint union of $(g)_{\mathfrak A_n}$ and of the conjugacy class $(g')_{\mathfrak A_n}$ of a permutation $g'$ which is conjugate to $g$ in $\mathfrak S_n$ but not in $\mathfrak A_n$, and both classes have the same cardinality.

Examples of this phenomenon are classical. For example, the 5-cycles in $\mathfrak S_5$ are conjugate, but they constitute two distinct conjugacy classes under $\mathfrak A_5$. Even more elementary, the 3-cycles $(1\,2\,3)$ and $(1\,3\,2)$ are conjugate in $\mathfrak S_3$, but they are not conjugate in $\mathfrak A_3$ since that group is commutative!

So let us use our description of $Z_g$ to give a full description of this phenomenon.

As a first step, when is $\ker(\tau)$ contained in $\mathfrak A_n$? We have seen that $\ker(\tau)$ is generated by the cycles of $g$. Consequently, $\ker(\tau)$ is contained in $\mathfrak A_n$ if and only if all of these cycles belong to $\mathfrak A_n$, which means that their lengths are odd (a cycle of even length is an odd permutation).

We assume that this condition holds, so that $\ker(\tau)\subseteq \mathfrak A_n$, and now work on the image of $\tau$. Its surjectivity was proved by means of an explicit section $\sigma\mapsto k_\sigma$. Given the preceding condition that $\ker(\tau)\subseteq \mathfrak A_n$, a necessary and sufficient condition for the inclusion $Z_g\subseteq \mathfrak A_n$ is that the image of this section consists of even permutations. This section is a morphism of groups, so it suffices to understand the sign of $k_\sigma$ when $\sigma$ consists of a cycle $(c_1,\dots,c_s)$ in $\mathfrak S_{n_i}$ and is trivial on the other factors. Then $\ell(c_1)=\dots=\ell(c_s)$, by definition of $\sigma$. The formula $k_\sigma(g^n a_c)=g^n a_{\sigma(c)}$ shows that the nontrivial cycles of $k_\sigma$ are of the form $(g^n a_{c_1},\dots, g^n a_{c_s})$; they all have the same length, $s$, and there are $\ell(c_1)$ of them. Consequently, the sign of $k_\sigma$ is equal to $(-1)^{(s-1)\ell(c_1)}=(-1)^{s-1}$ since $\ell(c_1)$ is odd. This proves that the sign of $k_\sigma$ is equal to the sign of $\sigma$. In addition to the condition that the orbits of $g$ have odd cardinalities, a necessary and sufficient condition for the image of $\sigma\mapsto k_\sigma$ to be contained in $\mathfrak A_n$ is thus that all the symmetric groups $\mathfrak S_{n_i}$ coincide with their alternating subgroups, that is, $n_i\leq 1$ for all $i$. We can now conclude:

Theorem. — Let $1^{n_1}\dots r^{n_r}$ be a partition of $n$.

  1. If $n_i=0$ for even $i$, and $n_i\leq 1$ for all $i$, then there exist two permutations in $\mathfrak S_n$ with orbit type $1^{n_1}\dots r^{n_r}$ which are not conjugate in $\mathfrak A_n$.
  2. Otherwise, any two permutations in $\mathfrak S_n$ with that orbit type are conjugate in $\mathfrak A_n$.

We can check the initial example of two 5-cycles in $\mathfrak S_5$ which are not conjugate in $\mathfrak A_5$. Their orbit type is $5^1$: the only length that appears is 5, hence odd, and it has multiplicity $1$. In fact, this is the only orbit type in $\mathfrak S_5$ where this phenomenon appears!
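This splitting can also be observed by brute force; the following Python sketch (helper names are mine) conjugates a 5-cycle by all of $\mathfrak A_5$ and watches its class split in two:

```python
from itertools import permutations

def compose(p, q):
    """Composition p∘q of permutations encoded as tuples."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

def cycle_lengths(p):
    """Sorted list of cycle lengths of p."""
    seen, out = set(), []
    for i in range(len(p)):
        if i not in seen:
            length, j = 0, i
            while j not in seen:
                seen.add(j)
                j = p[j]
                length += 1
            out.append(length)
    return sorted(out)

# A_5: even permutations, i.e. 5 minus the number of cycles is even
A5 = [p for p in permutations(range(5)) if (5 - len(cycle_lengths(p))) % 2 == 0]
g = (1, 2, 3, 4, 0)  # the 5-cycle 0 -> 1 -> 2 -> 3 -> 4 -> 0
orbit = {compose(compose(k, g), inverse(k)) for k in A5}  # class of g in A_5
five_cycles = [p for p in A5 if cycle_lengths(p) == [5]]

# 24 five-cycles in all, but the A_5-class of g contains only half of them
assert len(five_cycles) == 2 * len(orbit) == 24
```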

Friday, April 29, 2016

Roth's theorems

A few days ago, The Scotsman published an article about Klaus Roth's legacy, explaining how he donated his fortune (1 million pounds) to various charities. This article was shared by some friends on Facebook. Yuri Bilu added that he knew two important theorems of Roth, and since one of them did not immediately come to my mind, I decided to write this post.

The first theorem was a 1935 conjecture of Erdős and Turán concerning arithmetic progressions of length 3 that Roth proved in 1952. That is, one is given a set $A$ of positive integers and one looks for triples $(a,b,c)$ of distinct elements of $A$ such that $a+c=2b$; Roth proved that infinitely many such triples exist as soon as the upper density of $A$ is positive, that is:
\[ \limsup_{x\to+\infty} \frac{\mathop{\rm Card}(A\cap [0;x])}x >0. \]
In 1975, Endre Szemerédi proved that such sets of integers contain (finite) arithmetic progressions of arbitrarily large length. Other proofs have been given by Hillel Furstenberg (using ergodic theory) and Tim Gowers (by Fourier/combinatorial methods); Roth had used the Hardy–Littlewood circle method.

In 1976, Erdős strengthened his initial conjecture with Turán and predicted that arithmetic progressions of arbitrarily large length exist in $A$ as soon as
\[ \sum_{a\in A} \frac 1a =+\infty.\]
Such a result is still a conjecture, even for arithmetic progressions of length $3$, but a remarkable particular case has been proved by Ben Green and Terry Tao in 2004, when $A$ is the set of all prime numbers.

Outstanding as these results are (Tao was awarded the Fields medal in 2006 and Szemerédi the Abel prize in 2012), the second theorem of Roth was proved in 1955 and was certainly the main reason for awarding him the Fields medal in 1958. Indeed, Roth gave a definitive answer to a long standing question in diophantine approximation that originated from the works of Joseph Liouville (1844). Given a real number $\alpha$, one is interested in rational fractions $p/q$ that are close to $\alpha$, and in the quality of the approximation, namely the exponent $n$ such that $\left| \alpha- \frac pq \right|\leq 1/q^n$. Precisely, the approximation exponent $\kappa(\alpha)$ is the least upper bound of all real numbers $n$ such that the previous inequality has infinitely many solutions in fractions $p/q$, and Roth's theorem asserts that one has $\kappa(\alpha)=2$ when $\alpha$ is an irrational algebraic number.

One part of this result goes back to Dirichlet, showing that for any irrational number $\alpha$, there exist infinitely many good approximations with exponent $2$. This can be proved using the theory of continued fractions, and is also a classical application of Dirichlet's box principle. Take a positive integer $Q$ and consider the $Q+1$ numbers $q\alpha- \lfloor q\alpha\rfloor$ in $[0,1]$, for $0\leq q\leq Q$; two of them must be less than $1/Q$ apart; this furnishes integers $p',p'',q',q''$, with $0\leq q'<q''\leq Q$, such that $\left| (q''\alpha-p'')-(q'\alpha-p')\right|\leq 1/Q$; then set $p=p''-p'$ and $q=q''-q'$; one has $\left| q\alpha -p \right|\leq 1/Q$, hence $\left|\alpha-\frac pq\right|\leq 1/Qq\leq 1/q^2$.
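The box-principle argument can be run numerically; here is a Python sketch for $\alpha=\sqrt2$ (the choice of $Q$ is arbitrary, and floating-point arithmetic stands in for exact reals):

```python
from math import floor, sqrt

alpha, Q = sqrt(2), 1000

# the Q+1 fractional parts of q*alpha, each kept with its index q
parts = sorted((q * alpha - floor(q * alpha), q) for q in range(Q + 1))

# pigeonhole: among Q adjacent gaps with total length < 1, one is < 1/Q
gap, q1, q2 = min((parts[i + 1][0] - parts[i][0], parts[i][1], parts[i + 1][1])
                  for i in range(Q))
q = abs(q2 - q1)
p = round(q * alpha)

# the resulting fraction approximates alpha with exponent 2
assert 1 <= q <= Q and abs(alpha - p / q) <= 1 / q ** 2
```

For $\sqrt 2$ this recovers a continued-fraction convergent, as the theory predicts.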

To prove an inequality in the other direction, Liouville's argument was that if $\alpha$ is an irrational root of a nonzero polynomial $P\in\mathbf Z[T]$, then $\kappa(\alpha)\leq\deg(P)$. The proof is now standard: given an approximation $p/q$ of $\alpha$, observe that $q^d P(p/q)$ is a non-zero integer (if, say, $P$ is irreducible), so that $\left| q^d P(p/q)\right|\geq 1$. On the other hand, $P(p/q)\approx (p/q-\alpha) P'(\alpha)$, hence an inequality $\left|\alpha-\frac pq\right|\gg q^{-d}$.

This result has been generalized, first by Axel Thue in 1909 (who proved an inequality $\kappa(\alpha)\leq \frac12 d+1$), then by Carl Ludwig Siegel in 1921 and Freeman Dyson in 1947 (showing $\kappa(\alpha)\leq 2\sqrt d$ and $\kappa(\alpha)\leq\sqrt{2d}$ respectively). While Liouville's result was based on the minimal polynomial of $\alpha$, these generalisations required polynomials in two variables, and the non-vanishing of a quantity such as $q^dP(p/q)$ above was definitely less trivial. Roth's proof made use of polynomials in arbitrarily many variables, and his remarkable achievement was a proof of the required non-vanishing result.

Roth's proof was “elementary”, making use only of polynomials and Wronskians. There are today more geometric proofs, such as the one by Hélène Esnault and Eckart Viehweg (1984), or Michael Nakamaye's subsequent proof (1995), which is based on Faltings's product theorem.

What is still missing, however, is the proof of an effective version of Roth's theorem, that would give, given any real number $n>\kappa(\alpha)$, an actual integer $Q$ such that every rational fraction $p/q$ in lowest terms such that $\left|\alpha-\frac pq\right|\leq 1/q^n$ satisfies $q\leq Q$. It seems that this defect lies at the very heart of almost all of the current approaches in diophantine approximations... 

Tuesday, November 27, 2012

Finite choices

The axiom of choice says that an arbitrary product $\prod_{i\in I} A_i$ of non-empty sets $A_i$ indexed by a set $I$ is non-empty. It is well known that this axiom does not follow from the other axioms of Zermelo–Fraenkel set theory. Even finite choice, that is, this statement restricted to the case where all the sets $A_i$ are finite, is not a consequence. Nor even 2-choice, where one assumes that each $A_i$ has two elements!

For each integer $n$, call  ${\rm AC}(n)$ the axiom of choice restricted to families $(A_i)$ where $A_i$ has $n$ elements. 

Tarski proved the following funny fact: ${\rm AC}(2) \Rightarrow {\rm AC}(4)$ — if you know how to choose between 2 elements, you can choose between 4.

The proof is in fact quite easy. Consider a family $(A_i)$ of sets with 4 elements. I will use choice functions furnished by ${\rm AC}(2)$ to pick one preferred element of each $A_i$. For simplicity, label the elements of $A_i$ as $\{a,b,c,d\}$ and drop the index $i$. Then, consider the set $\{\{a,b\},\{a,c\},\{a,d\},\{b,c\},\{b,d\},\{c,d\}\}$ of all pairs of elements of $A_i$. The hypothesis ${\rm AC}(2)$ allows us to choose, for each of these pairs, one preferred element. Call $n_a,n_b,n_c,n_d$ the numbers of times $a,b,c,d$ have been chosen; one thus has $n_a+n_b+n_c+n_d=6$. Consider those elements which have been chosen most often, those for which $n_?$ is maximal.
  • If there is only one, let's choose it. (This happens in repartitions like $(3,1,1,1)$, etc.)
  • If there are three such elements (the repartition must be $(2,2,2,0)$), let's choose the unique one which has never been chosen.
  • There can't be four such elements, because 4 does not divide 6.
  • If there are two (repartition $(2,2,1,1)$), then use your 2-choice function on this pair!
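Tarski's procedure is completely effective, so it can even be written as a program; here is a Python sketch (function names are mine, and `min` merely stands in for an arbitrary 2-choice function — real uses of the axiom have no canonical order to fall back on):

```python
from itertools import combinations
from collections import Counter

def choose4(A, choose2):
    """Pick an element of the 4-element set A, using only a choice
    function choose2 defined on 2-element sets (Tarski's argument)."""
    n = Counter(choose2(frozenset(pair)) for pair in combinations(A, 2))
    top = max(n.values())
    winners = [a for a in A if n[a] == top]
    if len(winners) == 1:                  # repartitions like (3, 1, 1, 1)
        return winners[0]
    if len(winners) == 3:                  # repartition (2, 2, 2, 0)
        return next(a for a in A if n[a] == 0)  # the never-chosen element
    return choose2(frozenset(winners))     # repartition (2, 2, 1, 1)

# any 2-choice function will do as input
c2 = min
assert choose4({3, 1, 4, 5}, c2) in {3, 1, 4, 5}
```

(With `min` as the 2-choice function, the smallest element wins every pair it belongs to, so the first branch fires; the other branches are exercised by less orderly choice functions.)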

The other funny, but more difficult, thing is that ${\rm AC}(2)$ does not imply ${\rm AC}(3)$! Why? Because the group $\{\pm1\}$ can act without fixed points on a 2-element set, but cannot do so on a 3-element set. I hope to be able to say more on this another day.

Wednesday, October 17, 2012

Ultrafilters

Ultrafilters won't make American coffee better, but they have nice applications in mathematics. In fact, during the last three (working) days, I happened to hear about them from three different colleagues, in three different directions.

Let me recall that an ultrafilter $\cal U$ on a set $X$ is a subset of $\mathfrak P(X)$ satisfying the following axioms:
  1. The empty set does not belong to $\cal U$;
  2. If $A\in\cal U$ and $B\supset A$, then $B\in\cal U$;
  3. If $A,B\in \cal U$, then $A\cap B\in\cal U$;
  4. For any subset $A\subset X$, either $A$ or $\complement A$ belongs to $\cal U$ (but not both).
In informal terms, an element of $\cal U$ may be thought of as a “large” subset of $X$; the axioms say 1. that the empty set is not large, 2. that a superset of a large set is again large, 3. that the intersection of two large sets is still large, and finally, 4. that if a set is not large, then its complement is large. In fact, this last axiom is the one which makes a filter “ultra”.

Here is an example of an ultrafilter. Fix a point $x\in X$ and decree that a set $A\subset X$ is large iff it contains $x$. Such ultrafilters are called principal, and they do not quite fit the intuition that the elements of an ultrafilter are large sets. Unfortunately, there is no explicit construction of an ultrafilter which is not principal. Fortunately, choice-like axioms imply the existence of nonprincipal ultrafilters.
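On a finite set every ultrafilter is principal, so the following Python sketch is purely illustrative; it checks the four axioms above for the principal ultrafilter at a point of a 5-element set:

```python
from itertools import combinations

X = frozenset(range(5))
P = [frozenset(c) for r in range(len(X) + 1) for c in combinations(X, r)]  # P(X)

x = 2
U = {A for A in P if x in A}  # the principal ultrafilter at x

assert frozenset() not in U                          # axiom 1: empty set not large
assert all(B in U for A in U for B in P if A <= B)   # axiom 2: supersets are large
assert all(A & B in U for A in U for B in U)         # axiom 3: closed by intersection
assert all((A in U) != ((X - A) in U) for A in P)    # axiom 4: A or its complement
```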

Ultrafilters are used to construct ultraproducts

Take a family $(A_x)$ of non-empty sets indexed by the set $X$ and fix a nonprincipal ultrafilter on $X$. (The definition below is not the correct one when some sets $A_x$ may be empty; the correct one is slightly more elaborate. I learnt it from Laurent Moret-Bailly; see Section 2 of his beautiful paper An extension of Greenberg's theorem to general valuation rings, arXiv:1106.0984v3.) Define an equivalence relation on the product set $A=\prod A_x$ by saying that two families $(a_x)$ and $(b_x)$ are equivalent iff the set of $x$ such that $a_x=b_x$ is large. Let $A^*$ be the quotient of $A$ by this equivalence relation; it is called the ultraproduct of the $A_x$. When the $A_x$ have some extra structure (relations, operations, etc.), one can transport it to the ultraproduct $A^*$. For example, if, for each $x$, $R_x$ is a binary relation on $A_x$, and $a,b\in A^*$ are the classes of $(a_x)$ and $(b_x)$, say that $R(a,b)$ iff the set of $x$ such that $R_x(a_x,b_x)$ is large. We shall see some specific examples below.

There are many nice bits of mathematics where ultrafilters are involved. Here are three of them which I would like not to forget.

1) Cardinality of ultraproducts of countable sets.

If there is a large subset of $X$ on which $\mathrm{Card}(A_x)$ is bounded, then $A^*$ is finite. Otherwise, $A^*$ is infinite, and has in fact at least the power of the continuum.

Here is a classical proof in the case $X=\mathbf N$, with some twists that helped me understand it.

We want to construct a map from $2^{\mathbf N}$ to $A^*$. Here, $2^{\mathbf N}$ is the set of subsets of $\mathbf N$, but it can almost be viewed as the set of $2$-adic integers, a subset of $\mathbf N$ corresponding to the $2$-adic integer whose nonzero digits are exactly those belonging to the given subset. In formulas, $S\subset\mathbf N$ corresponds to the $2$-adic integer $f(S)=\sum_{n=0}^\infty \chi_S(n) 2^n$. If we sum only the first $n$ terms, we get an approximation $f_n(S)$ of $f(S)$ modulo $2^n$, which is an element of $\{0,\dots,2^n-1\}$. In particular, $f_n(S)\rightarrow f(S)$. This shows that if $S\neq T$ are two distinct subsets of $\mathbf N$, then $f_n(S)\neq f_n(T)$ for all but finitely many integers~$n$.

Enumerate a large subset $X_1=\{x_0,\dots\} $ of $X$ such that $A_{x_n}$ has at least $2^n$ elements, and enumerate those elements. For any $S\in 2^{\mathbf N}$, consider the family $a(S)=(a_x)$ of $\prod A_x$ such that, for every $n$, $a_{x_n}$ is the $f_n(S)$th element of $A_{x_n}$, and $a_x$ is any fixed element of $A_x$ otherwise. Let $a^*(S)$ be the class of this family in $A^*$. If $S\neq T$, then $f_n(S)\neq f_n(T)$ for all but finitely many $n$, so that $a(S)$ and $a(T)$ differ on a large set (the complement of a finite set in a large set), hence $a^*(S)\neq a^*(T)$. We have thus constructed an injective map from $2^{\mathbf N}$ to $A^*$, hence $\mathrm{Card}(A^*)\geq \mathrm{Card}(2^{\mathbf N})$.

2) An “ultrafilter proof” of the infinite Ramsey theorem

Any infinite graph contains an infinite subset which is either a complete subgraph (any two vertices are linked by an edge) or a discrete subgraph (no two vertices are linked by an edge). In this statement, a graph is assumed to have no loop.

Let $V$ be the set of vertices of your graph $G$. Choose a non principal ultrafilter on $V$. Say a vertex $v\in V$ is social if the set of its neighbors (those linked to $v$ by an edge of $G$) is large; otherwise, say it is lonely. Now, assuming that the set of social vertices is large, we will construct an infinite complete subgraph of $G$. Otherwise, the set of lonely vertices is large and the same construction (or the consideration of the opposite graph) gives an infinite discrete subgraph.

Fix a social vertex $v_0$. The set of social neighbors of $v_0$ is large, because it is the intersection of the large set of social vertices with the large set of neighbors of $v_0$; so $v_0$ has a social neighbor $v_1$. Assume we have constructed social vertices $v_0,\dots,v_{n-1}$ which are pairwise linked by an edge. The set of social vertices of $V$ which are linked to all of the $v_i$ is large, again because it is the intersection of the large set of social vertices with the finitely many large sets of neighbors of each $v_i$. So there is a social vertex $v_n$ which is a neighbor of $v_0,\dots,v_{n-1}$. Go on.

This infinite Ramsey theorem implies the finite one: for any integer $n$, there is an integer $r(n)$ such that each graph with at least $r(n)$ vertices possesses a subgraph of cardinality $n$ which is either complete or discrete. The proof uses ultrafilters again, but in a different way. Assume by contradiction that the finite Ramsey theorem is false. So there is an integer $n$ and a family $(G_i)$ of finite graphs with an increasing number of vertices, none of which contains a complete or discrete subgraph with $n$ vertices. Take the ultraproduct $G$ of the graphs $G_i$, namely the product graph $\prod G_i$ modulo the equivalence relation $(g_i)\sim (g'_i)$ iff the set of $i$ such that $g_i=g'_i$ is large, for some given nonprincipal ultrafilter on the set $I$ of indices. Write $[g_i]$ for the class of a family $(g_i)$; say that $[g_i]$ is linked to $[g'_i]$ if the set of indices $i$ for which $g_i$ is linked to $g'_i$ in $G_i$ is large. Then $G$ is a graph, and an infinite one (by the result on cardinalities of ultraproducts). By the infinite Ramsey theorem, $G$ contains $n$ vertices which are pairwise linked, or pairwise unlinked. Intersecting the finitely many large sets of indices witnessing these links (or non-links), together with the large sets witnessing that these vertices are pairwise distinct, we find an index $i$ such that $G_i$ contains a complete (or discrete) subgraph with $n$ vertices — a contradiction.
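The case $n=3$ of the finite statement can even be checked exhaustively: identifying a graph on 6 vertices with a 2-coloring of the edges of $K_6$ (edge/non-edge), the following Python sketch verifies that $r(3)\leq 6$ by brute force:

```python
from itertools import combinations, product

vertices = range(6)
edges = list(combinations(vertices, 2))      # the 15 edges of K_6
triangles = list(combinations(vertices, 3))  # the 20 triangles of K_6

def has_mono_triangle(coloring):
    """Does this 2-coloring of the edges contain a monochromatic triangle?"""
    col = dict(zip(edges, coloring))
    return any(col[(a, b)] == col[(a, c)] == col[(b, c)]
               for a, b, c in triangles)

# every one of the 2^15 colorings has a monochromatic triangle:
# every graph on 6 vertices has a complete or discrete subgraph on 3 vertices
assert all(has_mono_triangle(c) for c in product((0, 1), repeat=len(edges)))
```

(The bound is sharp: the pentagon-plus-pentagram coloring of $K_5$ has no monochromatic triangle, so $r(3)=6$.)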

While writing this proof, I realize that it naturally prompts the question of the Ramsey theorem for cardinals: assuming $V$ has some (infinite) cardinality, what is the smallest cardinality for which there may not exist an either complete or discrete subgraph of that cardinality? For the previous proof to work, one needs transfinite induction, so one needs the ultrafilter to be stable under possibly infinite intersections.

3) Ax's proof of Hilbert's Nullstellensatz

Let $k$ be an algebraically closed field, let $f_1,\dots,f_m$ be polynomials in $d$ variables with coefficients in $k$. Assume that the system of polynomial equations
\[ f_1(x_1,\dots,x_d)=\dots=f_m(x_1,\dots,x_d)=0 \]
has a solution in some extension $K$ of $k$. Then, it already has a solution in $k$.

The following proof is given by Ax in the surprising paper A Metamathematical Approach to some Problems in Number Theory, published in the proceedings of a conference (Proc. Symp. Pure Math., 20, Amer. Math. Soc.). I say surprising, because the main part of this 30-page paper is about 6 pages; the rest is an appendix providing a beautiful introduction to the theory of valuations!

I basically copy Ax. One may assume that both fields $k$ and $K$ are countable and algebraically closed (first replace $k$ by the algebraic closure of the subfield generated by the coefficients
of the polynomials $f_1,\dots,f_m$; then replace $K$ by the algebraic closure of the subfield generated over $k$ by the coordinates $(x_1,\dots,x_d)$ of some solution). 

Consider a nonprincipal ultrafilter on the set $\mathbf N$ of integers and look at the ultraproducts $k^*$ and $K^*$ of the families $(k_n)$ and $(K_n)$, where $k_n=k$ and $K_n=K$ for each $n$. These are algebraically closed fields; moreover, $k^*$ contains $k$ (the set of classes of the constant families $(a_n)$ with $a_n= a$ for some $a\in k$). From a solution $(x_1,\dots,x_d)$ in $K^d$, considering the classes of the constant families, one gets a solution $(x^*_1,\dots,x^*_d)$ in $(K^*)^d$ of the system of equations $f_1(x^*)=\dots=f_m(x^*)=0$.

By set theory, $\mathrm{Card}(k^*)\leq \mathrm{Card}(k^{\mathbf N})=2^{\aleph_0}$, while we have shown that $\mathrm{Card}(k^*)\geq 2^{\aleph_0}$, so that $k^*$ has the cardinality of the continuum. Likewise, $K^*$ has the cardinality of the continuum. Consequently (by set theory again), $k^*$ and $K^*$ both have a transcendence basis over $k$ of cardinality the continuum; since they are algebraically closed, they are isomorphic as extensions of $k$. The image of the solution $x^*$ by a $k$-isomorphism from $K^*$ to $k^*$ is a solution $y^*$ of our system in $k^*$. By definition, $y^*=(y^*_1,\dots,y^*_d)$, where for each $i$, $y^*_i$ is the class of a sequence $(y_{i,n})$ of elements of $k$. Since $f_j(y^*)=0$ for each $j$, the set of integers $n$ such that $f_j(y_{1,n},\dots,y_{d,n})=0$ is large. Consequently, since a finite intersection of large sets is large, there is a large set of integers $n$ such that $(y_{1,n},\dots,y_{d,n})$ is a solution of our system. Any such integer $n$ witnesses that this system already has a solution in $k$. Q.E.D.

I do not know whether this proof is really easier than the ones usually given in an algebra course, for all the set theory involved is rather delicate, but I find it rather appealing.