Friday, October 13, 2023

A combinatorial proof of the Newton identities

Let $T_1,\dots,T_n$ be indeterminates. For all $m\geq0$, denote by $S_m$ the $m$th elementary symmetric polynomial, given by \[ S_m = \sum_{i_1 \lt \dots \lt i_m} T_{i_1}\dots T_{i_m}. \] These are polynomials in $T_1,\dots, T_n$, and as their name suggests, these polynomials are symmetric, meaning that \[ S_m (T_{\sigma(1)},\dots,T_{\sigma(n)}) = S_m (T_1,\dots,T_n) \] for any permutation $\sigma$ of $\{1,\dots,n\}$. By definition, one has $S_0=1$ (there is exactly one family of integers, the empty one, and the empty product is equal to~$1$) and $S_m=0$ for $m>n$ (there are no families of integers $1\leq i_1\lt\dots\lt i_m\leq n$). A well-known theorem asserts that any symmetric polynomial in $T_1,\dots,T_n$ (with coefficients in a commutative ring $R$) can be written uniquely as a polynomial in $S_1,\dots,S_n$ (with coefficients in $R$). It is in particular the case for the Newton polynomials, defined for $p\geq 0$ by \[ N_p = T_1^p + \dots+T_n^p . \] In particular, $N_0=n$ and $N_1=S_1$. The next Newton polynomial can be computed by “expanding the square”: \[ N_2 = T_1^2+\dots+T_n^2 = (T_1+\dots+T_n)^2 - 2 S_2 = S_1^2 - 2S_2. \] These are the first two (or three) of a family of relations, called the Newton identities, that relate the polynomials $N_p$ and the polynomials $S_m$: \[  \sum_{m=0}^{p-1} (-1)^m N_{p-m}\cdot S_m + (-1)^p p \cdot S_p = 0. \] Using that the coefficient of $N_p$ is given by $S_0=1$, they allow one to express inductively the polynomials $N_p$ in terms of $S_1,\dots,S_p$. If $p>n$, all terms with $m>n$ vanish. Using that the coefficient of $S_p$ is $(-1)^p p$, these formulas also show, conversely, that if $p!$ is invertible in the commutative ring $R$, then $S_1,\dots,S_p$ can be expressed in terms of $N_1,\dots,N_p$. There are various proofs of these identities. Most of the ones I have seen are algebraic in nature, in that they rely on the relations between the roots of a polynomial and its coefficients, and/or on power series expansions.
The following proof, due to Doron Zeilberger (1984) (Discrete Math. 49, p. 319, doi:10.1016/0012-365X(84)90171-7), is purely combinatorial. I have been made aware of it because it is the one that is formalized in the proof assistant system Lean. (See there for the source code.) We consider the set $S=\{1,\dots,n\}$ and define a set $X$ whose elements are the pairs $(A,k)$ satisfying the following properties:
  • $A$ is a subset of $S$;
  • $k$ is an element of $S$;
  • the cardinality of $A$ is less than or equal to $p$;
  • if ${\mathop {\rm Card}}(A)=p$, then $k\in A$.
Define a map $f\colon X\to X$ by the following rule: for $(A,k)$ in $X$, set
  • $f(A,k) = (A \setminus\{k\}, k)$ if $k\in A$;
  • $f(A,k) = (A\cup\{k\}, k)$ if $k\notin A$.
Write $f(A,k)=(A',k)$; observe that this is again an element of $X$. The first two conditions are obvious. In the first case, when $k\in A$, the cardinality of $A'$ is one less than the cardinality of $A$, hence is at most $p-1$, and the last condition is vacuous. In the second case, when $k\notin A$, the cardinality of $A$ is strictly smaller than $p$, because $(A,k)\in X$, so that the cardinality of $A'$ is at most $p$; moreover, $k\in A'$, hence the last condition holds. Observe that $f\colon X\to X$ is an involution: $f\circ f={\mathrm{id}}_X$. Indeed, with the same notation, one has $f(A,k)=(A',k)$. In the first case, $k\in A$, hence $k\notin A'$, so that $f(A',k)=(A'\cup\{k\},k)=(A,k)$; in the second case, one has $k\in A'$, hence $f(A',k)=(A'\setminus\{k\}, k)=(A,k)$ since $A'=A\cup\{k\}$ and $k\notin A$. Moreover, $f$ has no fixed point, because it switches pairs $(A,k)$ such that $k\in A$ and pairs such that $k\notin A$. Zeilberger's proof of the Newton identities builds on a function $w\colon X\to R[T_1,\dots,T_n]$ such that $w(f(A,k))=-w(A,k)$ for all $(A,k)\in X$. Since $f$ is an involution without fixed points, the sum \[ \sum_{(A,k)\in X} w(A,k) \] vanishes: one can cancel the term for $(A,k)$ with the term for $f(A,k)$. Zeilberger's definition of the function $w$ is \[ w(A,k) = (-1)^{\mathrm{Card}(A)} T_k^{p-\mathrm{Card}(A)}\cdot \prod_{a\in A} T_a. \] It satisfies the desired relation: for $(A,k)\in X$ such that $k\notin A$, one has $f(A,k)=(A\cup\{k\}, k)$, so that \[ w(f(A,k)) = (-1)^{\mathrm{Card}(A)+1} T_k^{p-\mathrm{Card}(A)-1} \prod_{a\in A\cup\{k\}} T_a = - (-1)^{\mathrm{Card}(A)} T_k^{p-\mathrm{Card}(A)} \prod_{a\in A} T_a = -w(A,k). \] When $k\in A$, one has $f(A,k)=(A',k)$ with $k\notin A'$, and the formula applied to $(A',k)$ implies the one for $(A,k)$, because $f$ is an involution.
Now, in the relation $ \sum_{(A,k)} w(A,k) =0$, we sum over $A$ first and over $k$ inside: \[ \sum_{A\subset S} \sum_{k \mid (A,k) \in X} (-1)^{\mathrm{Card}(A)} T_k^{p-{\mathrm{Card}(A)}} \prod_{a\in A} T_a = 0\] and put together all those sets $A$ which have the same cardinality, say $m$. This gives \[ \sum_{m=0}^n (-1)^m \sum_{\mathrm{Card}(A)=m} \prod_{a\in A} T_a \cdot \sum_{k \mid (A,k)\in X} T_k^{p-m} = 0. \] Let us compute separately the contribution of each $m$ to the sum. When $m\gt p$, no subset $A$ is allowed, so that the corresponding sum is zero. At the other extreme, when $m\lt p$, for any subset $A$ such that $\mathrm{Card}(A)=m$, all pairs $(A,k)$ belong to $X$, so that the inner sum is $N_{p-m}$, and the whole term is $(-1)^mS_m N_{p-m}$. Finally, when $m=p$, the only $k$ that appear in the inner sum are the elements of $A$, and the corresponding term is $\sum_{k\in A} T_k^0=\mathrm {Card}(A)=p$; this case contributes $(-1)^p p S_p$. Summing up, we obtain exactly the Newton identity stated above.
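None of this is part of Zeilberger's argument, but the identity is easy to check by brute force for small $n$ and $p$. Here is a quick Python sketch of mine, representing a polynomial in $T_1,\dots,T_n$ as a dictionary mapping exponent tuples to integer coefficients, with $S$ and $N$ as in the text.

```python
from itertools import combinations

def add(f, g):
    # sum of two polynomials (dicts: exponent tuple -> coefficient)
    h = dict(f)
    for k, c in g.items():
        h[k] = h.get(k, 0) + c
        if h[k] == 0:
            del h[k]
    return h

def mul(f, g):
    # product of two polynomials
    h = {}
    for kf, cf in f.items():
        for kg, cg in g.items():
            k = tuple(x + y for x, y in zip(kf, kg))
            h[k] = h.get(k, 0) + cf * cg
    return {k: c for k, c in h.items() if c != 0}

def scale(f, c):
    return {k: c * v for k, v in f.items() if c * v != 0}

def S(n, m):
    # m-th elementary symmetric polynomial in T_1..T_n
    return {tuple(1 if i in idx else 0 for i in range(n)): 1
            for idx in combinations(range(n), m)}

def N(n, p):
    # p-th power sum T_1^p + ... + T_n^p (for p >= 1)
    h = {}
    for i in range(n):
        h[tuple(p if j == i else 0 for j in range(n))] = 1
    return h

def newton_lhs(n, p):
    # sum_{m=0}^{p-1} (-1)^m N_{p-m} S_m + (-1)^p p S_p
    total = {}
    for m in range(p):
        total = add(total, scale(mul(N(n, p - m), S(n, m)), (-1) ** m))
    return add(total, scale(S(n, p), (-1) ** p * p))

# the Newton identities hold: the left hand side is the zero polynomial
assert all(newton_lhs(n, p) == {} for n in (2, 3, 4) for p in (1, 2, 3, 4, 5))
```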

Friday, August 4, 2023

On soccer balls

This is a swift adaptation of a thread on Mastodon written after having read the beginning of Mara Goyet's column in yesterday's edition of Le Monde (please don't spoil the game by jumping to it now). So look at this picture below.
Does something strike you in it? The directions? The ball? The character? Nothing? 68 people voted, roughly one third found nothing, and two thirds saw that there's a problem with the ball. One nice remark from Sarah was that the ball was larger than the character! So, what's the problem with the ball? As was observed by Matt Parker in 2017, no soccer ball can be made following what the picture suggests: what we see is a combination of hexagons, some black, some white, and it is actually impossible to stitch hexagons together to make a ball. Or, the other way round, it is impossible to take a ball and draw a hexagonal pattern on it. On the other hand, it is quite easy to draw a hexagonal pattern on a plane, and many kitchens/bathrooms feature such a tiling. Here is a photograph (stolen from somebody on Pinterest) of a 4th cent. BCE pavement (Porta Rosa, in the city of Elea-Velia, Italy).
But on a ball, nope. So how is a soccer ball made, if not with hexagons? — The point is that it is not made of hexagons only: it has 20 hexagons (so that the sign post above is slightly reminiscent of a soccer ball), but it also has 12 pentagons.
The standard “Telstar” soccer ball with its white hexagons and black pentagons. (Wikimedia, shine2010)
The whole structure forms what is called a truncated icosahedron. A regular icosahedron, as shown on the picture below (the Spinoza monument in Amsterdam), has 20 triangular faces, such that five of them meet at each vertex. If one makes a cut at these vertices, each of them is replaced by a pentagon, and one gets the truncated icosahedron. I should add the mathematical explanation for the impossibility of stitching hexagons into a ball. This comes from a formula due to Euler: if one stitches polygons into a ball in any way, some relation must hold: the number of vertices, minus the number of edges, plus the number of polygons, is always equal to 2. Imagine you have some number of hexagons, say \(n\). That makes \(6n\) edges, but each time you make a stitch, you glue two edges together, hence \(3n\) edges in the end. What about the vertices, namely the points where at least 3 hexagons are joined? If the picture is correct, then exactly 3 hexagons meet at each vertex, so the number of vertices is \(6n/3=2n\) — Then Euler's formula says \(2n - 3n + n = 2\), that is, \(0=2\), which is absurd. … Now, is it really a necessity that exactly 3 hexagons meet at each vertex? Couldn't we have more complicated configurations? As a matter of fact, I was not sure about the answer at first, and one can try to imagine strange configurations where hexagons meet at the vertices in varying numbers. But algebra contradicts this in the end. Let me try. Assume we have \(s_3\) vertices where 3 hexagons meet, \(s_4\) vertices where 4 hexagons meet, etc. The total number of vertices is then \(s = s_3 + s_4 + \cdots\). The \(n\) hexagons contribute \(6n\) vertices, but those of type \(s_3\) count for 3 of them, those of type \(s_4\) for 4 of them, etc., so that \(6n = 3s_3 + 4s_4 +\cdots = 3s + d\), where \(d\) is defined by \(d = s_4 + 2s_5+ \cdots\). In particular, we have \(6n \geq 3s\), which means \(2n \geq s\).
Euler's formula still says \(2 = s - 3n + n = s - 2n \leq 0\), again a contradiction. And if you followed that proof, you may object that there is actually one way to stitch hexagons into a ball: it consists in stitching exactly 2 hexagons together. You get something completely flat, but if you blow enough air into it, or fill it with wool, so that it gets round enough, you might feel that you got a ball. (This does not contradict the argument above, which assumed that at least 3 hexagons meet at each vertex.) The picture made by the hexagons on such a ball is that of a hexagon drawn along the equator, each of the two hexagons covering one hemisphere. Here are some additional references:
  • A post on the BBC site, where the problem is explained roughly, as well as the fact that the UK authorities refused to propose a correct design for the sign post after Matt Parker raised the issue
  • A video by Matt Parker on his YouTube channel, @standupmaths, where he discusses how soccer balls are made, especially for the 2014 World Cup or the English Premier League
  • A text by Étienne Ghys on the general-audience website Images des mathématiques about the 2014 World Cup soccer ball (in French)
  • Euler's formula has a rich history which is evoked in the Wikipedia page quoted above, but is also the matter of the great book Proofs and Refutations by Imre Lakatos.
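To close with a computation rather than a picture, Euler's counts above are easy to verify in a few lines of Python (a sanity check of mine, not part of the original thread):

```python
# Faces of a real soccer ball (the truncated icosahedron)
hexagons, pentagons = 20, 12
faces = hexagons + pentagons                     # 32 faces
edges = (6 * hexagons + 5 * pentagons) // 2      # each edge shared by 2 faces: 90
vertices = (6 * hexagons + 5 * pentagons) // 3   # 3 faces meet at each vertex: 60
assert vertices - edges + faces == 2             # Euler's formula holds

# Hexagons alone, three at each vertex, would give n faces, 3n edges
# and 2n vertices, and Euler's formula would read 2n - 3n + n == 2,
# i.e. 0 == 2: impossible, whatever n is.
for n in range(1, 1000):
    assert 2 * n - 3 * n + n != 2
```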

Thursday, July 6, 2023

Electrostatics and rationality of power series

I would like to tell here a story that runs over some 150 years of mathematics, around the following question: given a power series $\sum a_n T^n$ (in one variable), how can you tell it comes from a rational function?

There are two possible motivations for such a question. One comes from complex function theory: you are given an analytic function and you wish to understand its nature — the simplest such functions being the rational ones, it is natural to wonder whether that happens or not (the next step would be to decide whether the function is algebraic, as in the problem of Hermann Amandus Schwarz (1843–1921)). Another motivation starts from the coefficients $(a_n)$, of which the power series is the generating series; indeed, the generating series is a rational function if and only if the sequence of coefficients satisfies a linear recurrence relation.

At this stage, there are few tools to answer that question, besides a general algebraic criterion which essentially reformulates the property that the $(a_n)$ satisfy a linear recurrence relation. For any integers $m$ and $q$, let $D_m^q$ be the determinant of size $(q+1)$ given by \[ D_m^q = \begin{vmatrix} a_m & a_{m+1} & \dots & a_{m+q} \\ a_{m+1} & a_{m+2} & \dots & a_{m+q+1} \\ \vdots & \vdots & & \vdots \\ a_{m+q} & a_{m+q+1} & \dots & a_{m+2q} \end{vmatrix}. \] These determinants are called the Hankel determinants or (when $m=0$) the Kronecker determinants, from the names of the two 19th century German mathematicians Hermann Hankel (1839—1873) and Leopold von Kronecker (1823–1891). With this notation, the following properties are equivalent:

  1. The power series $\sum a_n T^n$ comes from a rational function;
  2. There is an integer $q$ such that $D_m^q=0$ for all large enough integers $m$;
  3. For all large enough integers $q$, one has $D^q_0=0$.
(The proof of that classic criterion is not too complicated, but the standard proof is quite smart. In his book Algebraic numbers and Fourier analysis, Raphaël Salem gives a proof which is arguably easier.)
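As an illustration of the criterion (mine, not part of the classical discussion), take the Fibonacci numbers: their generating series is $T/(1-T-T^2)$, a rational function, so every Hankel determinant of size 3 vanishes, the recurrence making the rows linearly dependent. For the factorials, whose generating series is not rational, the Kronecker determinants $D_0^q$ do not die out.

```python
from math import factorial

def det(M):
    # Laplace expansion along the first row (fine for small matrices)
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def hankel_det(a, m, q):
    # D_m^q: determinant of size q+1 with entries a_{m+i+j}
    return det([[a[m + i + j] for j in range(q + 1)]
                for i in range(q + 1)])

# Fibonacci: a_{n+2} = a_{n+1} + a_n, generating series T/(1-T-T^2)
fib = [0, 1]
while len(fib) < 40:
    fib.append(fib[-1] + fib[-2])
assert all(hankel_det(fib, m, 2) == 0 for m in range(10))

# By contrast, the factorials do not satisfy any linear recurrence:
fact = [factorial(n) for n in range(40)]
assert all(hankel_det(fact, 0, q) != 0 for q in range(1, 6))
```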

This algebraic criterion is very general, but for that very reason it is almost impossible to prove the vanishing of these determinants without further information, and it is at this stage that Émile Borel enters the story. Émile Borel (1871–1956) was not only a very important mathematician of the first half of the 20th century, through his work on analysis and probability theory; he also was a member of parliament, a minister of the Navy, and a member of the Résistance during WW2. He founded the French research institution CNRS and the Institut Henri Poincaré. He was also the first president of the Confédération des travailleurs intellectuels, an intellectual workers' union.

In his 1893 paper « Sur un théorème de M. Hadamard », Borel proves the following theorem:

Theorem. — If the coefficients \(a_n\) are integers and if the power series \(\sum a_n T^n \) “defines” a function (possibly with poles) on a disk centered at the origin and of radius strictly greater than 1, then that power series is a rational function.

Observe how these two hypotheses belong to two quite unrelated worlds: the first one sets the question within number theory while the second one comes from complex function theory. It looks almost like magic that these two hypotheses lead to the nice conclusion that the power series is a rational function.

It is also worth remarking that the second hypothesis is really necessary for the conclusion to hold, because rational functions define functions (with poles) on the whole complex plane. The status of the first hypothesis is more mysterious: while it is not necessary, the conclusion may fail without it. For example, the exponential series \(\sum T^n/n!\) does define a function (without poles) on the whole complex plane, but is not rational (it grows too fast at infinity).

However, the interaction of number theoretical hypotheses with the question of the nature of power series was not totally unexplored at the time of Borel. For example, an 1852 theorem of the German mathematician Gotthold Eisenstein (Über eine allgemeine Eigenschaft der Reihen-Entwicklungen aller algebraischen Functionen) shows that when the coefficients \(a_n\) of the expansion \(\sum a_nT^n\) of an algebraic function are rational numbers, the denominators are not arbitrary: there is an integer \(D\geq 1\) such that for all \(n\), \(a_n D^{n+1}\) is an integer. As a consequence of that theorem of Eisenstein, the exponential series and the logarithmic series cannot be algebraic.

It's always time somewhere on the Internet for a mathematical proof, so I have no excuse for not telling you *how* Émile Borel proved that result. He uses the above algebraic criterion, hence needs to prove that some of the determinants \(D^q_m\) introduced above vanish (for some \(q\) and for all \(m\) large enough). His idea consists in observing that these determinants are integers, so that if you wish to prove that they vanish, it suffices to prove that they are smaller than one!

If non-mathematicians are still reading me, there's no mistake here: the main argument of the proof is the remark that a nonzero integer is at least one. While this may sound like a trivial remark, it is something I like to call the main theorem of number theory, because it lies at the heart of almost all proofs in number theory.

So one has to bound determinants from above, and here Borel invokes the « théorème de M. Hadamard »: a determinant, being the volume of the parallelepiped formed by the rows, is smaller than the product of the norms of these rows, considered as vectors of Euclidean space. In 2-D, the area of a parallelogram is at most the product of the lengths of its edges! (Jacques Hadamard (1865—1963) is known for many extremely important results, notably the first proof of the Prime number theorem. It is funny that this elementary result went into the title of a paper!)
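Hadamard's inequality is easy to test numerically; here is a quick Python sketch of mine, on small random integer matrices:

```python
import random

def det(M):
    # Laplace expansion along the first row (fine for small matrices)
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j]
               * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

random.seed(0)
for _ in range(200):
    M = [[random.randint(-9, 9) for _ in range(4)] for _ in range(4)]
    bound = 1.0
    for row in M:
        bound *= sum(x * x for x in row) ** 0.5  # Euclidean norm of the row
    # Hadamard: |det M| <= product of the row norms
    assert abs(det(M)) <= bound + 1e-9
```

Note that for a nonzero integer matrix this bound is useless as it stands, each row having norm at least one, which is exactly the difficulty Borel has to circumvent below.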

But there's no hope that using Hadamard's inequality on our initial matrix can be of any help, since that matrix has integer coefficients, so that all of its rows have norm at least one. So Borel starts making clever row combinations on the Hankel matrices that take into account the poles of the function that the given power series defines.

Basically, if \(f=\sum a_nT^n\), there exists a polynomial \(h=\sum c_mT^m\) such that the power series \(g=fh = \sum b_n T^n\) defines a function without poles on some disk \(D(0,R)\) with \(R>1\). Using complex function theory (Cauchy's inequalities), this implies that the coefficients \(b_n\) converge rapidly to 0, roughly as \(R^{-n}\). For the same reason, the coefficients \(a_n\) cannot grow too fast, at most as \(r^{-n}\) for some \(r>0\). The formula \(g=fh\) shows that the coefficients \(b_n\) are combinations of the \(a_n\), so that the determinant \(D_n^q\) is also equal to \[ \begin{vmatrix} a_n & a_{n+1} & \dots & a_{n+q} \\ \vdots & & \vdots \\ a_{n+p-1} & a_{n+p} & \dots & a_{n+p+q-1} \\ b_{n+p} & b_{n+p+1} & \dots & b_{n+p+q} \\ \vdots & & \vdots \\ b_{n+q} & b_{n+q+1} & \dots & b_{n+2q} \end{vmatrix}\] Now, Hadamard's inequality implies that the determinant \(D_n^q\) is (roughly) bounded above by \( (r^{-n} )^p (R^{-n}) ^{q+1-p} \): there are \(p\) rows bounded above by some \(r^{-n}\) and the next \(q+1-p\) rows are bounded above by \(R^{-n}\). This expression rewrites as \( 1/(r^pR^{q+1-p})^n\). Since \(R>1\), we may choose \(q\) large enough so that \(r^p R^{q+1-p}>1\), and then, when \(n\) grows to infinity, the determinant becomes smaller than 1. Hence it vanishes!

The next chapter of this story happens in 1928, under the hands of the Hungarian mathematician George Pólya (1887–1985). Pólya had already written several papers exploring the interaction of number theory and complex function theory; one of them will even reappear later on in this thread. In his paper “Über gewisse notwendige Determinantenkriterien für die Fortsetzbarkeit einer Potenzreihe”, he studied the analogue of Borel's question when the disk of radius \(R\) is replaced by an arbitrary domain \(U\) of the complex plane containing the origin, proving that if \(U\) is big enough, then the initial power series is a rational function. It is however not so obvious how one should measure the size of \(U\), and it is at this point that electrostatics enters the picture.

In fact, it is convenient to make an inversion: the assumption is that the series \(\sum a_n / T^n\) defines a function (with poles) on the complement of a compact subset \(K\) of the complex plane. Imagine that this compact set is made of metal, put at potential 0, and put a unit electric charge at infinity. According to the 2-D laws of electrostatics, this creates an electric potential \(V_K\) which is identically \(0\) on \(K\) and behaves as \( V_K(z)\approx \log(|z|/C_K)\) at infinity. Here, \(C_K\) is a positive constant, called the capacity of \(K\).

Theorem (Pólya). — Assume that the \(a_n\) are integers and the series \(\sum a_n/T^n\) defines a function (with poles) on the complement of \(K\). If the capacity of \(K\) is \(\lt1\), then \(\sum a_n T^n\) is rational.

To apply this theorem, it is important to have computations of capacities at hand. This was a classic theme of complex function theory and numerical analysis some 50 years ago. Indeed, what the electric potential does is solve the Laplace equation \(\Delta V_K=0\) outside of \(K\), with Dirichlet condition on the boundary of \(K\).

In fact, the early times of complex analysis made a remarkable use of this fact. For example, it was by solving the Laplace equation that Bernhard Riemann proved the existence of meromorphic functions on “Riemann surfaces”, but analysis was not developed enough at that time (around 1860). In a stunningly creative move, Riemann imagines that his surface is partly made of metal, and partly of insulating material, and he deduces the construction of the desired function from the electric potential.

More recently, complex analysis and potential theory also had applications to fluid dynamics, for example to compute (at least approximately) the flow of air outside of an airplane wing. (I am not a specialist of this, but I'd guess the development of numerical methods that run on modern computers rendered these beautiful methods obsolete.)

The relation between the theorems of Borel and Pólya is that the capacity of a disk is its radius. This can be seen from the fact that \(V(z)=\log(|z|/R)\) solves the Laplace equation with Dirichlet condition outside of the disk of radius \(R\).

A few other capacities have been computed, not too many, in fact, because it appears to be a surprisingly difficult problem. For example, the capacity of an interval is a fourth of its length.

Pólya's proof is similar to Borel's, but considers the Kronecker determinants in place of the Hankel determinants. However, the linear combinations that allow one to show that these determinants are small are not as explicit as in Borel's proof. They follow from another interpretation of the capacity, introduced by the Hungarian-Israeli mathematician Michael Fekete (1886–1957; born in then Austria–Hungary, now Serbia, he emigrated to Palestine in 1928).

You know that the diameter \(d_2(K)\) of \(K\) is the upper bound of all distances \(|x-y|\) where \(x,y\) are arbitrary points of \(K\). Now, for an integer \(n\geq 2\), consider the upper bound \(d_n(K)\) of all products \( \left(\prod_{i\neq j}|x_j-x_i|\right)^{1/n(n-1)}\) where \(x_1,\dots,x_n\) are arbitrary points of \(K\). It is not so hard to prove that the sequence \(d_n(K)\) decreases with \(n\), and the limit \(\delta(K)\) of that sequence is called the transfinite diameter by Fekete.

Proposition. — \( \delta(K)= C_K\).

This makes a link between capacity theory and another theme of complex function theory, namely the theory of best approximation, which ends up in Pólya's proof: the adequate linear combination for the \(n\)th row is given by the coefficients of the monic polynomial of degree \(n\) which has the smallest least upper bound on \(K\).
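For the unit circle, the proposition can be illustrated numerically (a sketch of mine, not Fekete's argument): the \(n\)-th roots of unity maximize the product of pairwise distances, that product equals \(n^n\) over ordered pairs, so \(d_n = n^{1/(n-1)}\), which decreases to 1, the capacity of the unit disk.

```python
import cmath
import math
from itertools import combinations

def transfinite_step(points):
    # d_n for this configuration:
    # (prod_{i != j} |x_i - x_j|)^{1/(n(n-1))}, computed via logarithms
    n = len(points)
    s = sum(math.log(abs(x - y)) for x, y in combinations(points, 2))
    return math.exp(2 * s / (n * (n - 1)))

for n in (10, 50, 200):
    roots = [cmath.exp(2j * math.pi * k / n) for k in range(n)]
    # pairwise distances of the n-th roots of unity multiply to n^n
    # (over ordered pairs), hence d_n = n^{1/(n-1)}, which tends to 1
    assert abs(transfinite_step(roots) - n ** (1 / (n - 1))) < 1e-6
```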

If all this is of some appeal to you, there's a wonderful little book by Thomas Ransford, Potential Theory in the Complex Plane, which I find quite readable (say, from 3rd or 4th year of math studies on).

In the forthcoming episodes, I'll discuss two striking applications of the theorems of Borel and Pólya: the proof by Bernhard Dwork (in 1960) of a conjecture of Weil, and a new proof (in 1987) by Jean-Paul Bézivin and Philippe Robba of the transcendence of the numbers \(e\) and \(\pi\), two results initially proved by Charles Hermite and Ferdinand von Lindemann in 1873 and 1882.

Friday, February 17, 2023

Associated prime ideals

$\gdef\ann{\mathop{\mathrm{ann}}} \gdef\Ass{\mathop{\mathrm{Ass}}}\gdef\Spec{\mathop{\mathrm{Spec}}}$

I would like to go back to a quite delicate question of commutative algebra, that of associated prime ideals of modules. In most textbooks (Bourbaki, Matsumura…), this concept is considered for modules over a noetherian ring, while it is also necessary to consider it in a greater generality for some applications in algebraic geometry. For my book, (Mostly) commutative algebra (Springer Nature, 2021), I preferred to introduce the general concept (§6.5), because I observed that the initial proofs are in fact easier. In yesterday's class (Cohomology of coherent sheaves, 2nd year of Master course at Université Paris Cité), some remarks of a student, Elias Caeiro, helped me simplify two steps of the treatment I proposed in my book.

Definition. — Let $A$ be a ring and let $M$ be an $A$-module. Say that a prime ideal $P$ of $A$ is associated to $M$ if there exists an element $m\in M$ such that $P$ is minimal among all prime ideals containing $\ann_A(m)$.
We write $\Ass_A(M)$ (sometimes spelt out as “assassin”) for the set of all associated prime ideals of $M$.

(Here, $\ann_A(m)$ is the annihilator of $m$, the ideal of all $a\in A$ such that $am=0$.)

There is a geometric way to interpret this definition: it means that in the spectrum $\Spec(A)$, the irreducible closed set $V(P)$ (of which $P$ is the generic point) is an irreducible component of $V(\ann_A(m))$. Thanks to this remark, associated prime ideals are compatible with localisation: \[ \Ass_{S^{-1}A}(S^{-1}M) = \Ass_A(M) \cap \Spec(S^{-1}A), \] where $\Spec(S^{-1}A)$ is identified with the subset of $\Spec(A)$ consisting of prime ideals $P$ which are disjoint from $S$. In particular, $P$ is associated to $M$ if and only if the maximal ideal $PA_P$ of the local ring $A_P$ is associated to the module $M_P$.
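A toy example of mine, to fix ideas: take $A=\mathbf Z$ and $M=\mathbf Z/6\mathbf Z$. The annihilators of the nonzero elements are the ideals $(6)$, $(3)$ and $(2)$, and the minimal prime ideals over them are $(2)$ and $(3)$:

```latex
\[ \ann_{\mathbf Z}(\bar 1)=(6), \qquad \ann_{\mathbf Z}(\bar 2)=(3), \qquad
   \ann_{\mathbf Z}(\bar 3)=(2), \qquad
   \Ass_{\mathbf Z}(\mathbf Z/6\mathbf Z)=\{(2),(3)\}. \]
```

Localising at $(2)$ kills the part of order 3, in accordance with the compatibility with localisation just discussed.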

Here is what the associated prime ideals mean, from the point of view of module theory.
Proposition. — Let $a\in A$.
a) The multiplication by $a$ is injective in $M$ if and only if $a$ does not belong to any associated prime ideal of $M$.
b) The localized module $M_a$ is zero if and only if $a$ belongs to all associated prime ideals of $M$.
c) In particular, $M=0$ if and only if $\Ass_A(M)=\emptyset$.

Proof. — a) If $a$ belongs to the associated prime ideal $P$, then $a$ belongs to the associated prime ideal $PA_P$ of $M_P$, which means that there exists $m\in M$ such that $PA_P$ is the only prime ideal containing $\ann_{A_P}(m)$. Consequently, $a$ is nilpotent modulo $\ann_{A_P}(m)$, and there exist $n\geq 0$ and $b\in A\setminus P$ such that $a^nb\in\ann_A(m)$. Take a minimal such $n$. Since $b\notin P$, one has $n\geq 1$; then $a^{n-1}b m\neq0$, while $a\cdot a^{n-1}bm=0$, so that the multiplication by $a$ is not injective. Conversely, if the multiplication by $a$ is not injective, take $m\neq0$ in $M$ such that $am=0$; the annihilator $\ann_A(m)$ is not equal to $A$, hence $V(\ann_A(m))\neq \emptyset$; take an irreducible component of this closed subset — equivalently, a minimal prime ideal $P$ among those containing $\ann_A(m)$; one has $a\in \ann_A(m)$, hence $a\in P$.
b) follows from c), applied to the localized module $M_a$: by the compatibility with localisation, $\Ass_{A_a}(M_a)$ is empty if and only if every associated prime ideal of $M$ contains $a$.
c) The module $M$ is zero if and only if the multiplication by $0$ is injective on $M$. By a), this is equivalent to the fact that $\Ass_A(M)$ is empty.

Corollary. — A prime ideal $P$ is in the support of $M$ if and only if it contains some associated prime ideal.
Proof. — The prime ideal $P$ belongs to the support of $M$ if and only if $M_P\neq0$, if and only if $\Ass_{A_P}(M_P)$ is not empty, if and only if there exists an associated prime ideal of $M$ which belongs to $\Spec(A_P)$, that is, which is contained in $P$.

For noetherian rings, one has the following characterization of associated prime ideals, which is usually taken as their definition.

Theorem. — Let $A$ be a noetherian ring and $M$ be an $A$-module. A prime ideal $P$ of $A$ is associated to $M$ if and only if there exists $m\in M$ such that $P=\ann_A(m)$.
Proof. — If $P=\ann_A(m)$, then $P$ is associated to $M$. Conversely, let $m\in M$ and let $P$ be a minimal prime ideal of $A$ among those containing $\ann_A(m)$. We first assume that $A$ is local with maximal ideal $P$; then $P$ is the only prime ideal of $A$ that contains $\ann_A(m)$, which implies that any element of $P$ is nilpotent modulo $\ann_A(m)$. Since $P$ is finitely generated (because $A$ is noetherian), there exists an integer $n$ such that $P^n\subseteq \ann_A(m)$. Take a minimal such $n$. Since $\ann_A(m)\subseteq P$, one has $n\geq 1$; then $P^{n-1}\not\subseteq\ann_A(m)$, so that there exists $b\in P^{n-1}$ such that $bm\neq0$. Then $ab\in P^n$ for every $a\in P$, so that $P\subseteq \ann_A(bm)$, and $\ann_A(bm)\subseteq P$ because $bm\neq0$. Consequently, $P=\ann_A(bm)$. In the general case, we use the case of a local ring to obtain $m\in M$ such that $\ann_{A_P}(m/1)=PA_P$. Consequently, $\ann_A(m)\subseteq P$, and for every $a\in P$, there exists $b\notin P$ such that $abm=0$. Using that $P$ is finitely generated, one finds $b\notin P$ such that $abm=0$ for every $a\in P$; then $\ann_A(bm)=P$, as was to be shown.

From that point on, both presentations converge. One deduces from the preceding theorem that if $A$ is noetherian and $M$ is finitely generated, there exists a composition series $0=M_0\subseteq M_1 \subseteq \dots \subseteq M_n=M$, with successive quotients $M_k/M_{k-1}$ of the form $A/P_k$, for some prime ideals $P_k$ of $A$, and then $\Ass_A(M)$ is contained in $\{P_1,\dots,P_n\}$, in view of the following lemma. In particular, $\Ass_A(M)$ is finite.

Lemma. — Let $M$ be an $A$-module and let $N$ be a submodule of $M$; then $ \Ass_A(N)\subseteq \Ass_A(M)\subseteq \Ass_A(N)\cup \Ass_A(M/N)$.
Proof. — The first inclusion $\Ass_A(N)\subseteq \Ass_A(M)$ follows from the definition. Let us prove the second one. Let $P\in\Ass_A(M)$ and let $m\in M$ be such that $P$ is a minimal prime ideal of $A$ among those containing $\ann_A(m)$. Let $m'$ be the image of $m$ in $M/N$. If $P$ contains $\ann_A(m')$, then $P$ is also minimal among the prime ideals containing $\ann_A(m')$, because $\ann_A(m)\subseteq\ann_A(m')$, hence $P\in\Ass_A(M/N)$. Otherwise, there exists $b\in \ann_A(m')$ such that $b\notin P$. Let us prove that $P$ is minimal among the prime ideals containing $\ann_A(bm)$. First of all, let $a\in\ann_A(bm)$; then $abm=0$, hence $ab\in \ann_A(m)\subseteq P$, hence $a\in P$ since $b\notin P$; this proves $\ann_A(bm)\subseteq P$. Since $\ann_A(m)\subseteq\ann_A(bm)$, the minimality of $P$ among the prime ideals containing $\ann_A(m)$ implies that $P$ is minimal among the prime ideals containing $\ann_A(bm)$. Since $b\in\ann_A(m')$, one has $bm'=0$, hence $bm\in N$, and $P\in\Ass_A(N)$.

Sunday, January 1, 2023

The Klein group, the centralizer of a permutation, and its relation with the alternating group

The following reflexion came out of my irrepressible need to understand why the 3 double transpositions in $\mathfrak S_4$, together with the identity, formed a group $V$. Of course, one might just say: “they are stable under multiplication, as one sees by computing the 4·3/2 = 6 different products”, but who makes this computation anyway? And since I wanted not only to understand this, but to explain it to Lean, I needed an argument that could actually be done, for real. So here is an argument that requires no computation, besides the one that says that there are 3 double transpositions.

Prop. — The subgroup of $\mathfrak S_4$ generated by the 3 double transpositions is the unique 2-Sylow subgroup of $\mathfrak A_4$. In particular, it has order 4 and consists of these 3 double transpositions and of the identity.
Proof. — Let $V$ be the subset of $\mathfrak S_4$ consisting of these 3 double transpositions and of the identity. Let $S$ be a 2-Sylow subgroup of $\mathfrak A_4$.
We first prove $S \subseteq V$. The subgroup $S$ has order 4. Let $g\in S$. The order of $g$ divides 4, so its cycles have lengths 1, 2 or 4. If there were a cycle of length 4, then $g$ would be that cycle, hence of odd sign, contradicting $g\in\mathfrak A_4$. Consequently, either $g=1$, or $g$ has a cycle of length 2, and then there must be a second one because $g$ is even. In both cases $g\in V$, so that $S\subseteq V$, as claimed.
Since $4=\operatorname{\rm Card}(S)=\operatorname{\rm Card}(V)$, this shows that $S=V$, hence $S=\langle V\rangle$.
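And for the skeptics who do want the multiplication carried out, here is a brute-force check in Python (a sketch of mine; the serious formalization is the Lean one mentioned above) that the three double transpositions together with the identity are stable under composition:

```python
from itertools import permutations

def compose(p, q):
    # (p o q)(i) = p(q(i)); permutations of {0,1,2,3} as tuples
    return tuple(p[q[i]] for i in range(len(q)))

def orbit_type(p):
    # sorted tuple of the cycle lengths of p
    n = len(p); seen = [False] * n; lengths = []
    for i in range(n):
        if not seen[i]:
            j, l = i, 0
            while not seen[j]:
                seen[j], j, l = True, p[j], l + 1
            lengths.append(l)
    return tuple(sorted(lengths))

identity = (0, 1, 2, 3)
double_transpositions = [p for p in permutations(range(4))
                         if orbit_type(p) == (2, 2)]
assert len(double_transpositions) == 3  # there are exactly 3 of them

V = set(double_transpositions) | {identity}
# V is stable under composition (and each element is its own inverse)
assert all(compose(p, q) in V for p in V for q in V)
```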

At this point, we still need to understand why there are 3 double transpositions. More generally, I wanted to compute the number of permutations in $\mathfrak S_n$ of a given orbit type. The orbit type of a permutation $g$ is a multiset of strictly positive integers with sum $n$, given by the cardinalities of the orbits of $g$ on $\{1,\dots,n\}$. We write it as $1^{n_1} 2^{n_2}\dots r^{n_r}$, meaning that $g$ has $n_1$ orbits of length $1$ (fixed points), $n_2$ orbits of cardinality $2$, etc., so that $n= \sum n_i i$. Let $\mathscr O_g$ be the set of orbits of $g$. The action of $g$ on a given orbit $c$ coincides with a circular permutation of order the length $\ell(c)$ of this orbit; when it is nontrivial, such a permutation will be called a cycle of $g$. The supports of these cycles are pairwise disjoint, so that these cycles commute, and their product is exactly $g$. In fact, this is the only way of writing $g$ as a product of cycles with pairwise disjoint supports. (By convention, the identity is not a cycle.)

Theorem. — There are exactly \[ N(1^{n_1}\dots r^{n_r}) = \frac{n!}{1^{n_1}\dots r^{n_r}\, n_1!\dots n_r!} \] permutations with orbit type $1^{n_1} 2^{n_2}\dots r^{n_r}$.

A standard proof of this result goes as follows. Write the decomposition of such a permutation $g$ into cycles with disjoint supports as $g=(\cdot)\dots (\cdot)(\cdot,\cdot)\dots(\cdot,\cdot,\dots)$, leaving blank spaces for the entries of the cycles (and, contradicting our convention, allowing for cycles of length 1…). There are $n!$ ways to fill these spaces with the $n$ distinct integers between $1$ and $n$, but some of them give rise to the same permutation. Indeed, the entries in a cycle of length $s$ only count up to a circular permutation, so that we need to divide the result by $1^{n_1}\dots r^{n_r}$. Moreover, we can switch the cycles of a given length, hence we also need to divide that result by $n_s!$ (the number of ways of permuting the various cycles of length $s$), for every possible length $s$.

This is reasonably convincing, but one could wish for something more precise, both in the proof and in the statement. In fact, in the preceding formula, the numerator $n!$ is the order of $\mathfrak S_n$. Since all permutations with a given orbit type are conjugate under $\mathfrak S_n$, the left hand side is the cardinality of the conjugation orbit of a permutation $g$ of that orbit type, so that the denominator has to equal the cardinality of the stabilizer of this permutation under the action by conjugation. Therefore, a more precise proof of this formula could run by elucidating the structure of this stabilizer. This is also interesting when one wishes to relativize the result to the alternating group $\mathfrak A_n$ in order to obtain a formula for the cardinality of the various conjugacy classes in $\mathfrak A_n$.

Let us fix a permutation $g\in\mathfrak S_n$ with orbit type $1^{n_1}\dots r^{n_r}$. The stabilizer of $g$ under the action by conjugation is its centralizer $Z_g$, the subgroup of all $k\in\mathfrak S_n$ which commute with $g$.

We first define a morphism of groups \[ \tau \colon Z_g \to \mathfrak S_{n_1}\times \mathfrak S_{n_2}\times\dots\times \mathfrak S_{n_r}. \] Recall that $\mathscr O_g$ is the set of orbits of $g$; this is a set with cardinality $n_1+n_2+\dots+n_r$. Restricted to one orbit, the action of $g$ coincides with that of a circular permutation of this orbit (which fixes the complementary subset); these circular permutations have disjoint supports, hence they commute pairwise and their product is equal to $g$. For $c\in\mathscr O_g$, we write $\ell(c)$ for its cardinality; this is also the order of the cycle of $g$ defined by this orbit. If $k\in Z_g$, then $kgk^{-1}=g$. Consequently, the action of $k$ permutes the orbits of $g$, respecting their cardinalities. This defines the desired group morphism $\tau$ from $Z_g$ to the product of permutation groups $\mathfrak S_{n_1}\times \dots\times \mathfrak S_{n_r}$.

This morphism $\tau$ is surjective.
Indeed, given permutations $\sigma_1$ of the set of fixed points of $g$, $\sigma_2$ of the set of orbits of length 2, etc., we construct $k_\sigma\in Z_g$ such that $\tau(k_\sigma)=(\sigma_1,\dots,\sigma_r)$. We fix a point $a_c$ in each orbit $c$ and decide that $k_\sigma(a_c)=a_{\sigma_i(c)}$ if $c$ has length $i$. The relation $k_\sigma g=g k_\sigma$ imposes $k_\sigma (g^n a_c)=g^n a_{\sigma_i(c)}$ for all $n\in\mathbf Z$, and it remains to check that this formula gives a well defined element of $Z_g$. In fact, this formula defines a group-theoretic section of $\tau$.

What is the kernel of this morphism $\tau$?
If $\tau(k)=1$, then $k$ fixes every orbit $c\in\mathscr O_g$. Since $kg=gk$, we obtain that on each orbit $c$, $k$ coincides with some power of the corresponding cycle, which has order $\ell(c)$. We thus obtain an isomorphism \[ \ker(\tau) \simeq \prod_{c\in\mathscr O_g} (\mathbf Z/\ell(c)\mathbf Z). \]

To compute the cardinality of $Z_g$, it is now sufficient to compute those of $\operatorname{\rm im}(\tau)$ and $\ker(\tau)$, and this gives the formula \[ \operatorname{\rm Card} (Z_g) = \operatorname{\rm Card} (\ker(\tau)) \operatorname{\rm Card} (\operatorname{\rm im}(\tau)) = 1^{n_1}\dots r^{n_r} n_1! \dots n_r!, \] as was to be shown.
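As a sanity check of this cardinality, one can compute a centralizer by brute force. The sketch below (Python, with my own tuple encoding of permutations) takes $g=(0\,1)(2\,3\,4)$ in $\mathfrak S_5$, of orbit type $2^1 3^1$, whose centralizer should have order $2\cdot 3\cdot 1!\cdot 1!=6$:

```python
from itertools import permutations

def compose(s, t):
    """The permutation s∘t."""
    return tuple(s[t[i]] for i in range(len(s)))

def inverse(s):
    """The inverse permutation."""
    inv = [0] * len(s)
    for i, j in enumerate(s):
        inv[j] = i
    return tuple(inv)

# g = (0 1)(2 3 4) in S_5: orbit type 2^1 3^1.
g = (1, 0, 3, 4, 2)
# Centralizer: all k in S_5 with k g k^{-1} = g.
Z = [k for k in permutations(range(5))
     if compose(compose(k, g), inverse(k)) == g]
assert len(Z) == 6  # = 2^1 * 3^1 * 1! * 1!
```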

One interest of this argument is that it can be pushed further to understand the structure of the conjugacy classes in the alternating group $\mathfrak A_n$. The case $n\leq 1$ is uninteresting, hence we assume $n\geq 2$. Then $\mathfrak A_n$ has index 2 in $\mathfrak S_n$, and the formulas \[ \operatorname  {\rm Card}((g)_{\mathfrak A_n}) = \frac{{\rm Card}({\mathfrak A_n})}{{\rm Card}(Z_g \cap {\mathfrak A_n})} \quad\text{\rm and}\quad \operatorname  {\rm Card}((g)_{\mathfrak S_n}) = \frac{{\rm Card}({\mathfrak S_n})}{{\rm Card}(Z_g)} \] for the cardinalities of the conjugacy classes $(g)_{\mathfrak A_n}$ and $(g)_{\mathfrak S_n}$ imply that both are equal if and only if $Z_g$ is not contained in $\mathfrak A_n$ (in that case, $Z_g\cap\mathfrak A_n$ has index 2 in $Z_g$, and the two factors of 2 cancel); otherwise, the conjugacy class $(g)_{\mathfrak S_n}$ is the disjoint union of $(g)_{\mathfrak A_n}$ and of the conjugacy class $(g')_{\mathfrak A_n}$ of a permutation $g'$ which is conjugate to $g$ in $\mathfrak S_n$ but not in $\mathfrak A_n$, and both have the same cardinality.

Examples of this phenomenon are classical. For example, the 5-cycles are all conjugate in $\mathfrak S_5$, but they constitute two distinct conjugacy classes under $\mathfrak A_5$. Even more elementary: the 3-cycles $(1\,2\,3)$ and $(1\,3\,2)$ are conjugate in $\mathfrak S_3$, but they are not conjugate in $\mathfrak A_3$ since that group is commutative!
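The splitting of the 5-cycles can itself be checked by enumeration. Here is a small Python sketch (again with my tuple encoding): the 24 five-cycles of $\mathfrak S_5$ form one $\mathfrak S_5$-class, but the $\mathfrak A_5$-class of a given 5-cycle only has 12 elements.

```python
from itertools import permutations

def compose(s, t):
    return tuple(s[t[i]] for i in range(len(s)))

def inverse(s):
    inv = [0] * len(s)
    for i, j in enumerate(s):
        inv[j] = i
    return tuple(inv)

def cycle_lengths(p):
    """Sorted orbit lengths of p."""
    seen, lengths = set(), []
    for i in range(len(p)):
        if i not in seen:
            l, j = 0, i
            while j not in seen:
                seen.add(j); j = p[j]; l += 1
            lengths.append(l)
    return sorted(lengths)

def sign(p):
    # sign = (-1)^(n - number of orbits)
    return (-1) ** (len(p) - len(cycle_lengths(p)))

A5 = [p for p in permutations(range(5)) if sign(p) == 1]
c = (1, 2, 3, 4, 0)  # the 5-cycle (0 1 2 3 4)
conj_class = {compose(compose(k, c), inverse(k)) for k in A5}
all_5_cycles = {p for p in permutations(range(5)) if cycle_lengths(p) == [5]}

# The 24 five-cycles split into two A_5-classes of 12 elements each.
assert len(all_5_cycles) == 24 and len(conj_class) == 12
```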

So let us use our description of $Z_g$ to give a full description of this phenomenon.

As a first step, when is $\ker(\tau)$ contained in $\mathfrak A_n$? We have seen that $\ker(\tau)$ is generated by the cycles of $g$. Consequently, $\ker(\tau)$ is contained in $\mathfrak A_n$ if and only if all of these cycles belong to $\mathfrak A_n$; since a cycle of length $\ell$ has sign $(-1)^{\ell-1}$, this means that their lengths are odd.

We assume that this condition holds, so that $\ker(\tau)\subseteq \mathfrak A_n$, and now work on the image of $\tau$. Its surjectivity was proved by means of an explicit section $\sigma\mapsto k_\sigma$. Given the preceding condition that $\ker(\tau)\subseteq \mathfrak A_n$, a necessary and sufficient condition for the inclusion $Z_g\subseteq \mathfrak A_n$ is that the image of this section consist of even permutations. This section is a morphism of groups, so it suffices to understand the sign of $k_\sigma$ when $\sigma$ consists of a cycle $(c_1,\dots,c_s)$ in $\mathfrak S_{n_i}$ and is trivial on the other factors. Then $\ell(c_1)=\dots=\ell(c_s)$, by definition of $\sigma$. The formula $k_\sigma(g^n a_c)=g^n a_{\sigma(c)}$ shows that the nontrivial cycles of $k_\sigma$ are of the form $(g^n a_{c_1},\dots, g^n a_{c_s})$; they all have the same length, $s$, and there are $\ell(c_1)$ of them. Consequently, the sign of $k_\sigma$ is equal to $(-1)^{(s-1)\ell(c_1)}=(-1)^{s-1}$ since $\ell(c_1)$ is odd. This proves that the sign of $k_\sigma$ is equal to the sign of $\sigma$. In addition to the condition that the orbits of $g$ have odd cardinalities, a necessary and sufficient condition for the image of $\sigma\mapsto k_\sigma$ to be contained in $\mathfrak A_n$ is thus that all the symmetric groups $\mathfrak S_{n_i}$ coincide with their alternating subgroups, that is, $n_i\leq 1$ for all $i$. We can now conclude:

Theorem. — Let $1^{n_1}\dots r^{n_r}$ be a partition of $n$.

  1. If $n_i=0$ for even $i$, and $n_i\leq 1$ for all $i$, then there exist two permutations in $\mathfrak S_n$ with orbit type $1^{n_1}\dots r^{n_r}$ which are not conjugate in $\mathfrak A_n$.
  2. Otherwise, any two permutations in $\mathfrak S_n$ with that orbit type are conjugate in $\mathfrak A_n$.

We can check the initial example of two 5-cycles in $\mathfrak S_5$ which are not conjugate in $\mathfrak A_5$. Their orbit type is $5^1$: the only length that appears is 5, hence odd, and it has multiplicity $1$. In fact, this is the only orbit type in $\mathfrak S_5$ where this phenomenon appears!
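To see that $5^1$ is indeed the only splitting type for $n=5$, one can enumerate the partitions of 5 and keep those with all parts odd and pairwise distinct. A short Python sketch (the helper names are mine):

```python
def partitions(n, max_part=None):
    """All partitions of n, as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest

def splits(parts):
    """True when the S_n-class of this orbit type splits in A_n:
    all parts odd (n_i = 0 for even i) and pairwise distinct (n_i <= 1)."""
    return all(p % 2 == 1 for p in parts) and len(set(parts)) == len(parts)

# Only the orbit type 5^1 splits in A_5.
assert [p for p in partitions(5) if splits(p)] == [(5,)]
```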

Monday, December 12, 2022

Multiplicative square roots

I will just discuss briefly the first section of a paper by William Waterhouse (2012), “Square Root as a Homomorphism” (American Mathematical Monthly 119 (3), 235–239), which addresses the following question: given a field $F$, when is it possible to define square roots for all squares compatibly with products, i.e., so that $\sqrt {ab}=\sqrt a\,\sqrt b$ whenever $a,b\in F$ are squares?

Real numbers. — Such a square root operation exists when $F$ is the field of real numbers: we are familiar with the process of taking the positive square root of a positive real number.

Finite fields. — It also exists in some finite fields. So let $F$ be a finite field, and let $q$ be its number of elements; then $q$ is a power of a prime number $p$, but if you wish, you may already assume that $q=p$ is prime. For simplicity, we assume that $q$ is odd. By Fermat's little theorem, every nonzero element $a\in F$ satisfies $a^{q-1}=1$. Since $q-1$ is even, we can write $a^{q-1}=(a^{(q-1)/2})^2=1$, so that $a^{(q-1)/2}=\pm1$, and Euler's criterion asserts that $a$ is a square if and only if $a^{(q-1)/2}=1$. (That this condition is necessary is obvious: writing $a=b^2$, one gets $a^{(q-1)/2}=b^{q-1}=1$ by Fermat's little theorem. Then, a counting argument shows that it is sufficient: the map $b\mapsto b^2$ is $2$-to-$1$ on nonzero elements, hence its image consists of $(q-1)/2$ elements, all of which are squares; since the polynomial equation $T^{(q-1)/2}=1$ has at most $(q-1)/2$ solutions in $F$, we have obtained all of them in this way.)
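Euler's criterion is immediate to verify in a small prime field; here is a Python sketch for $p=13$ (an arbitrary choice):

```python
p = 13
# The nonzero squares mod p, by direct enumeration.
squares = {a * a % p for a in range(1, p)}

# Euler's criterion: a is a square iff a^((p-1)/2) = 1 (mod p).
for a in range(1, p):
    assert (a in squares) == (pow(a, (p - 1) // 2, p) == 1)

# Exactly half of the nonzero elements are squares.
assert len(squares) == (p - 1) // 2
```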

For example, $-1$ is a square if and only if $(-1)^{(q-1)/2}=1$, which happens if and only if $(q-1)/2$ is even, that is, $q\equiv 1\pmod 4$. In this case, do we have a formula for a square root of $-1$? When $q=p$, yes, but it is not an easy one. Wilson's theorem states that $(p-1)!\equiv -1\pmod p$, just because you may pair each integer $a$ such that $1\lt a\lt p-1$ with its multiplicative inverse modulo $p$; then only two factors remain in the product and $(p-1)!\equiv 1\cdot (p-1)\equiv -1\pmod p$. Now, we instead pair each integer $a$ such that $1\leq a\leq p-1$ with its additive inverse $p-a$; we get $(p-1)!\equiv \bigl(((p-1)/2)!\bigr)^2 (-1)^{(p-1)/2}\pmod p$, hence $\bigl(((p-1)/2)!\bigr)^2\equiv -1\pmod p$ since $(p-1)/2$ is even. This is not an easy formula, because computing the factorial takes a long time for large $p$.
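The factorial formula can be checked directly for a small prime $p\equiv 1\pmod 4$, say $p=13$:

```python
from math import factorial

p = 13  # a prime with p ≡ 1 (mod 4)
b = factorial((p - 1) // 2) % p  # b = ((p-1)/2)! mod p
# By Wilson's theorem and the pairing argument, b^2 ≡ -1 (mod p).
assert b * b % p == p - 1
```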

It is possible to go much faster, but you need to have a die at your disposal. Indeed, choose a random element $a$ such that $1\leq a\leq p-1$ and compute $b=a^{(p-1)/4}$. Since $b^2=a^{(p-1)/2}=\pm1$, two possibilities arise: when $a$ is a square, we get $b^2=1$, but if $a$ is not a square, then we get $b^2=-1$. And since we chose $a$ randomly, we have one chance out of two of not having chosen a square, hence one chance out of two of getting an element $b$ such that $b^2=-1$.
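This randomized procedure is a few lines of Python (the function name is mine; each trial succeeds with probability $1/2$, so the expected number of trials is 2):

```python
import random

def sqrt_of_minus_one(p):
    """Find b with b^2 ≡ -1 (mod p), for a prime p ≡ 1 (mod 4).
    Each random trial succeeds when a happens to be a non-square."""
    while True:
        a = random.randrange(1, p)
        b = pow(a, (p - 1) // 4, p)
        if b * b % p == p - 1:  # a was not a square: success
            return b

p = 65537  # a prime with p ≡ 1 (mod 4)
b = sqrt_of_minus_one(p)
assert b * b % p == p - 1
```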

At this point you may ask why it isn't as long to compute the power $a^{(p-1)/4}$ as the factorial $((p-1)/2)!$, and you would be right to do so. The reason is that there is a fast recursive way to compute a power $a^n$: write $a^n=(a^2)^{n/2}$ if $n$ is even, and $a^n=a\cdot (a^2)^{(n-1)/2}$ if $n$ is odd. This leads to roughly $\log_2(n)$ multiplications and squarings, and not $n$ multiplications ($n-1$, actually) as the naïve expression $a\cdot a\dots a$ might have led you to think.
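Here is an iterative variant of that recursion in Python, reducing modulo $p$ at each step (Python's built-in `pow(a, n, p)` already does exactly this, so the hand-written version is only for illustration):

```python
def fast_pow(a, n, p):
    """Compute a^n mod p with about log2(n) squarings and multiplications."""
    result = 1
    while n > 0:
        if n % 2 == 1:        # n odd: peel off one factor of a
            result = result * a % p
        a = a * a % p          # square the base
        n //= 2
    return result

# Agrees with the built-in modular exponentiation.
assert fast_pow(3, 100, 101) == pow(3, 100, 101)
```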

But let us go back to the question of computing square roots. As the last three paragraphs indicate, it could be difficult to do so when $q\equiv 1\pmod 4$. However, it is extremely easy in the other case $q\equiv 3\pmod 4$. Take a nonzero element $a$ which is a square, and write $a^{(q-1)/2}=1$. Since $q\equiv 3\pmod 4$, we can write $q=4m-1$, so that $a^{2m-1}=1$, hence $a=a^{2m}=(a^m)^2$. We have our square root: it is simply given by $b=a^m=a^{(q+1)/4}$. The resulting map, $a\mapsto a^m$, gives us the desired multiplicative square root on squares.
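A Python sketch of this square root, for an arbitrarily chosen prime $p\equiv 3\pmod 4$; the multiplicativity on squares is automatic, the map being a power map:

```python
p = 10007  # a prime with p ≡ 3 (mod 4)
m = (p + 1) // 4

def sqrt_mod(a):
    """Square root of a square a modulo p, via a ↦ a^((p+1)/4)."""
    return pow(a, m, p)

for x in (2, 3, 5, 1234):
    a = x * x % p            # a is a square
    r = sqrt_mod(a)
    assert r * r % p == a    # r is indeed a square root of a

# Multiplicativity: (ab)^m = a^m b^m.
a, b = 4, 9
assert sqrt_mod(a * b % p) == sqrt_mod(a) * sqrt_mod(b) % p
```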

Complex numbers. — Now for a negative result: there is no multiplicative square root on the complex numbers, basically for the reason we have all been taught, namely that it leads to fallacies. All complex numbers are squares, so let us assume that we have a multiplicative square root $r\colon \mathbf C\to\mathbf C$. Letting $i=r(-1)$, the contradiction comes from the relation $$-i = r(-i)^2=r((-i)^2)=r(-1)=i.$$

We can now state and prove Waterhouse's theorem:

Theorem. — Let $F$ be a field (of characteristic $\neq 2$) and let $S\subseteq F$ be the multiplicative monoid of squares. There exists a multiplicative square root $r\colon S\to F$ (that is, a multiplicative map such that $r(a)^2=a$ for all $a\in S$) if and only if $-1\notin S$.

Proof. — The same negative argument as for the complex numbers works whenever $-1$ is a square in $F$. So let us assume that $-1$ is not a square and let us explain why a multiplicative square root exists. The proof, however, is not explicit but relies on a maximality principle. Moreover, we won't define the square root map directly, but rather its image.
Let us first analyse the situation. Assume that $r\colon S\to F$ is a multiplicative square root. It is simpler to remove $0$ from the discussion, so we consider its restriction $S^\times \to F^\times$ and still denote it by $r$. By assumption, it is a morphism of groups, so that its image $R^\times$ is a subgroup of $F^\times$. Observe that it does not contain $-1$, for if $r(a)=-1$, then $a=r(a)^2=(-1)^2=1$, but $r(1)=1$. Moreover, for every element $a\in F^\times$, we have $r(a^2)^2=a^2$, hence $r(a^2)=\pm a$, so that either $a$ or $-a$ belongs to $R^\times$, but not both since $-1\not\in R^\times$. As a consequence, $R^\times$ is a maximal subgroup of $F^\times$ among those which do not contain $-1$: adding to $R^\times$ any element $a\in F^\times$ such that $a\notin R^\times$ would lead to a subgroup $\langle R^\times,a\rangle$ which contains $-1=(-a)\cdot a^{-1}$, since $-a\in R^\times$.

Let us now consider a maximal subgroup $R^\times$ of $F^\times$ which contains the squares and does not contain $-1$. Starting from $S^\times$, which does not contain $-1$, this can be done using Zorn's lemma, or by transfinite induction: well order the elements of $F^\times$, and construct $R^\times$ by induction. At each step, since $R^\times$ contains the squares (in particular $a^2$), the union $R^\times \cup a R^\times$ is a subgroup of $F^\times$; if it does not contain $-1$, then we replace $R^\times$ by it, otherwise we discard $a$ and keep $R^\times$.

Let $a\in F^\times$. If $a\notin R^\times$, the maximality in the construction means that $-1\in aR^\times$, hence $-a\in R^\times$. But we can't have both $a$ and $-a$ in $R^\times$, for that would imply $-1\in R^\times$.

If $a\in F^\times$ is a nonzero square, it has two square roots, of the form $\pm b$, and we define $r(a)$ to be its square root which belongs to $R^\times$. One has $r(1)=1$, because $1\in S^\times\subset R^\times$. For nonzero squares $a,b$, the product $r(a)r(b)$ is a square root of $ab$, and it belongs to $R^\times$, hence it equals $r(ab)$. This proves that the map $r$ is multiplicative. This concludes the proof.

Remark. — If you've studied some abstract algebra, you may have recognized something in the middle of the proof. Indeed, the quotient group $V=F^\times/S^\times$ has exponent 2: for every $\alpha$ in this group, $\alpha^2=1$. Consequently, even though it is written multiplicatively, this abelian group is a vector space over the field with two elements. Since $-1$ is not a square in $F^\times$, its class $[-1]$ is nonzero in $F^\times/S^\times$, and the quotient group $W=R^\times/S^\times$ is just a maximal linear subspace that does not contain $[-1]$. It is a hyperplane, and it is defined by a linear form $\phi$ on $V$. Since $V$ is written multiplicatively, this linear form corresponds to a group homomorphism $f\colon F^\times \to\{\pm1\}$ which maps $S^\times$ to $1$ and such that $f(-1)=-1$. For every square $a=b^2$, we then have $r(a)=b f(b)$.
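In a prime field $\mathbf F_p$ with $p\equiv 3\pmod 4$, this remark becomes completely concrete: $-1$ is not a square, the homomorphism $f$ can be taken to be the Legendre symbol, and $r(a)=b\,f(b)$ for either square root $b$ of $a$. A Python sketch for $p=23$ (my own small choice):

```python
p = 23  # a prime with p ≡ 3 (mod 4), so -1 is not a square mod p

def f(b):
    """The Legendre symbol: a homomorphism F_p^* -> {±1}, trivial on squares."""
    return 1 if pow(b, (p - 1) // 2, p) == 1 else -1

assert f(p - 1) == -1  # f(-1) = -1

def r(a):
    """Multiplicative square root of the square a: r(a) = b*f(b)."""
    b = next(x for x in range(1, p) if x * x % p == a)  # some square root of a
    return b * f(b) % p  # flip the sign when b is a non-square

squares = {x * x % p for x in range(1, p)}
for a in squares:
    assert r(a) * r(a) % p == a                      # r(a) is a square root of a
    for a2 in squares:
        assert r(a * a2 % p) == r(a) * r(a2) % p     # r is multiplicative
```

Note that $r(a)$ is always the square root of $a$ that is itself a square: this recovers the map $a\mapsto a^{(p+1)/4}$ of the finite-field paragraph above.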

In his paper, Waterhouse goes on by viewing “fields $F$ with a multiplicative square root $r$” as a basic algebraic object, and considering such structures $(F,r)$ which can't be extended by adding algebraic elements. The final theorem of the paper shows that the Galois group $\mathop{\rm Gal}(\overline F/F)$ is either cyclic of order 2, or is the additive group of the 2-adic integers.

Tuesday, November 1, 2022


Sophia Wood (@fractalkitty) had the good idea to set up a #Mathober project: for each day of October, she proposed to react to one word of mathematics. I did something on the Mastodon server, which was also crossposted to Twitter. I will copy it here, but meanwhile you can enjoy it there.

You can also enjoy Sophia's work there:

Link to Mastodon:

Link to Twitter:

Link to Sophia Wood's sketches: