Friday, February 17, 2023

Associated prime ideals

$\gdef\ann{\mathop{\mathrm{ann}}} \gdef\Ass{\mathop{\mathrm{Ass}}}\gdef\Spec{\mathop{\mathrm{Spec}}}$

I would like to go back to a quite delicate question of commutative algebra, that of associated prime ideals of modules. In most textbooks (Bourbaki, Matsumura…), this concept is considered for modules over a noetherian ring, while it is also necessary to consider it in a greater generality for some applications in algebraic geometry. For my book, (Mostly) commutative algebra (Springer Nature, 2021), I preferred to introduce the general concept (§6.5), because I observed that the initial proofs are in fact easier. In yesterday's class (Cohomology of coherent sheaves, 2nd year of Master course at Université Paris Cité), some remarks of a student, Elias Caeiro, helped me simplify two steps of the treatment I proposed in my book.

Definition. — Let $A$ be a ring and let $M$ be an $A$-module. Say that a prime ideal $P$ of $A$ is associated to $M$ if there exists an element $m\in M$ such that $P$ is minimal among all prime ideals containing $\ann_A(m)$.
We write $\Ass_A(M)$ (sometimes spelt out as “assassin”) for the set of all associated prime ideals of $M$.

(Here, $\ann_A(m)$ is the annihilator of $m$, the ideal of all $a\in A$ such that $am=0$.)
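For instance, take $A=\mathbf Z$ and $M=\mathbf Z/6\mathbf Z$: the annihilator of the class of $1$ is $(6)$, whose minimal prime ideals are $(2)$ and $(3)$, so that $\Ass_{\mathbf Z}(\mathbf Z/6\mathbf Z)=\{(2),(3)\}$; these two ideals are also the annihilators of the classes of $3$ and of $2$, in line with the characterization for noetherian rings given below.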

There is a geometric way to interpret this definition: it means that in the spectrum $\Spec(A)$, the irreducible closed set $V(P)$ (of which $P$ is the generic point) is an irreducible component of $V(\ann_A(m))$. Thanks to this remark, associated prime ideals are compatible with localisation: \[ \Ass_{S^{-1}A}(S^{-1}M) = \Ass_A(M) \cap \Spec(S^{-1}A), \] where $\Spec(S^{-1}A)$ is identified with the subset of $\Spec(A)$ consisting of prime ideals $P$ which are disjoint from $S$. In particular, $P$ is associated to $M$ if and only if the maximal ideal $PA_P$ of the local ring $A_P$ is associated to the module $M_P$.

Here is what associated prime ideals mean, from the point of view of module theory.
Proposition. — Let $a\in A$.
a) The multiplication by $a$ is injective in $M$ if and only if $a$ does not belong to any associated prime ideal of $M$.
b) The localized module $M_a$ is zero if and only if $a$ belongs to all associated prime ideals of $M$.
c) In particular, $M=0$ if and only if $\Ass_A(M)=\emptyset$.

Proof. — a) If $a$ belongs to the associated prime ideal $P$, then $a$ belongs to the associated prime ideal $PA_P$ of $M_P$, which means that there exists $m\in M$ such that $PA_P$ is the only prime ideal of $A_P$ containing $\ann_{A_P}(m/1)$. Consequently, $a/1$ is nilpotent modulo $\ann_{A_P}(m/1)$, so that there exist $n\geq 0$ and $b\in A\setminus P$ such that $a^nb\in\ann_A(m)$. Take a minimal such $n$. Since $\ann_A(m)\subseteq P$ and $b\notin P$, one has $n\geq 1$; then $a^{n-1}b m\neq0$, while $a\cdot a^{n-1}bm=0$, so that multiplication by $a$ is not injective on $M$. Conversely, if multiplication by $a$ is not injective on $M$, take $m\neq0$ in $M$ such that $am=0$; the annihilator $\ann_A(m)$ is not equal to $A$, hence $V(\ann_A(m))\neq \emptyset$; take an irreducible component of this closed subset, equivalently a minimal prime ideal $P$ among those containing $\ann_A(m)$; then $P$ is associated to $M$, and since $a\in \ann_A(m)$, one has $a\in P$.
b) follows from c), applied to the localized module $M_a$: by compatibility with localisation, $\Ass_{A_a}(M_a)$ consists of the associated prime ideals of $M$ which do not contain $a$, so that $M_a=0$ if and only if every associated prime ideal of $M$ contains $a$.
c) The module $M$ is zero if and only if the multiplication by $0$ is injective on $M$. By a), this is equivalent to the fact that $\Ass_A(M)$ is empty.

Corollary. — A prime ideal $P$ is in the support of $M$ if and only if it contains some associated prime ideal.
Proof. — The prime ideal $P$ belongs to the support of $M$ if and only if $M_P\neq0$, if and only if $\Ass_{A_P}(M_P)$ is not empty, if and only if there exists an associated prime ideal of $M$ which belongs to $\Spec(A_P)$, that is, is contained in $P$.

For noetherian rings, one has the following characterization of associated prime ideals, which is usually taken as their definition.

Theorem. — Let $A$ be a noetherian ring and let $M$ be an $A$-module. A prime ideal $P$ of $A$ is associated to $M$ if and only if there exists $m\in M$ such that $P=\ann_A(m)$.
Proof. — If $P=\ann_A(m)$, then $P$ is associated to $M$. Conversely, let $m\in M$ and let $P$ be a minimal prime ideal of $A$ among those containing $\ann_A(m)$. We first assume that $A$ is local with maximal ideal $P$; then $P$ is the only prime ideal of $A$ that contains $\ann_A(m)$, which implies that any element of $P$ is nilpotent modulo $\ann_A(m)$. Since $P$ is finitely generated (because $A$ is noetherian), there exists an integer $n$ such that $P^n\subseteq \ann_A(m)$. Take a minimal such $n$. Since $\ann_A(m)\subseteq P$, one has $n\geq 1$; then $P^{n-1}\not\subseteq\ann_A(m)$, so that there exists $b\in P^{n-1}$ such that $bm\neq0$. Then $ab\in P^n$ for every $a\in P$, so that $P\subseteq \ann_A(bm)$, and $\ann_A(bm)\subseteq P$ because $bm\neq0$. Consequently, $P=\ann_A(bm)$. In the general case, we use the case of a local ring to obtain $m\in M$ such that $\ann_{A_P}(m/1)=PA_P$. Consequently, $\ann_A(m)\subseteq P$, and for every $a\in P$, there exists $b\notin P$ such that $abm=0$. Using that $P$ is finitely generated, one finds $b\notin P$ such that $abm=0$ for every $a\in P$, so that $P\subseteq\ann_A(bm)$; conversely, if $xbm=0$, then $xb\in\ann_A(m)\subseteq P$, hence $x\in P$ since $b\notin P$. Consequently, $\ann_A(bm)=P$, as was to be shown.

From that point on, both presentations converge. One deduces from the preceding theorem that if $A$ is noetherian and $M$ is finitely generated, there exists a finite filtration $0=M_0\subseteq M_1 \subseteq \dots \subseteq M_n=M$ whose successive quotients $M_k/M_{k-1}$ are of the form $A/P_k$, for some prime ideals $P_k$ of $A$; in view of the following lemma, $\Ass_A(M)$ is then contained in $\{P_1,\dots,P_n\}$. In particular, $\Ass_A(M)$ is finite.
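For instance, over $A=\mathbf Z$, the module $M=\mathbf Z/4\mathbf Z$ admits the filtration $0\subset 2\mathbf Z/4\mathbf Z\subset \mathbf Z/4\mathbf Z$, whose two successive quotients are isomorphic to $\mathbf Z/2\mathbf Z=A/(2)$, in accordance with $\Ass_{\mathbf Z}(\mathbf Z/4\mathbf Z)=\{(2)\}$.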

Lemma. — Let $M$ be an $A$-module and let $N$ be a submodule of $M$; then $ \Ass_A(N)\subseteq \Ass_A(M)\subseteq \Ass_A(N)\cup \Ass_A(M/N)$.
Proof. — The first inclusion $\Ass_A(N)\subseteq \Ass_A(M)$ follows from the definition. Let us prove the second one. Let $P\in\Ass_A(M)$ and let $m\in M$ be such that $P$ is a minimal prime ideal of $A$ among those containing $\ann_A(m)$. Let $m'$ be the image of $m$ in $M/N$. If $P$ contains $\ann_A(m')$, then, since $\ann_A(m)\subseteq\ann_A(m')$, $P$ is also minimal among the prime ideals containing $\ann_A(m')$, hence $P\in\Ass_A(M/N)$. Otherwise, there exists $b\in \ann_A(m')$ such that $b\notin P$. Let us prove that $P$ is minimal among the prime ideals containing $\ann_A(bm)$. First of all, let $a\in\ann_A(bm)$; then $abm=0$, hence $ab\in\ann_A(m)\subseteq P$, hence $a\in P$ since $b\notin P$; this shows that $\ann_A(bm)\subseteq P$. Since $\ann_A(m)\subseteq\ann_A(bm)$, it also follows that $P$ is minimal among the prime ideals containing $\ann_A(bm)$. Since $b\in\ann_A(m')$, one has $bm'=0$, hence $bm\in N$, and $P\in\Ass_A(N)$.

Sunday, January 1, 2023

The Klein group, the centralizer of a permutation, and its relation with the alternating group

The following reflection came out of my irrepressible need to understand why the 3 double transpositions in $\mathfrak S_4$, together with the identity, form a group $V$. Of course, one might just say: “they are stable under multiplication, as one sees by computing the 4·3/2 = 6 different products”, but who makes this computation anyway? And since I wanted not only to understand this, but to explain it to Lean, I needed an argument that could actually be done, for real. So here is an argument that requires no computation, besides the one that says that there are 3 double transpositions.

Proposition. — The subgroup of $\mathfrak S_4$ generated by the 3 double transpositions is the unique 2-Sylow subgroup of $\mathfrak A_4$. In particular, it has order 4 and consists of these 3 double transpositions and of the identity.
Proof. — Let $V$ be the subset of $\mathfrak S_4$ consisting of these 3 double transpositions and of the identity. Let $S$ be a 2-Sylow subgroup of $\mathfrak A_4$.
We first prove $S \subseteq V$. The subgroup $S$ has order 4. Let $g\in S$. The order of $g$ divides 4, so its cycles have lengths 1, 2 or 4. If there were a cycle of length 4, then $g$ would be that cycle, hence an odd permutation, which contradicts $g\in\mathfrak A_4$. Consequently, either $g=1$, or $g$ has a cycle of length 2, and then there must be a second one because $g$ is even. Consequently, $S\subseteq V$, as claimed.
Since $4=\operatorname{\rm Card}(S)=\operatorname{\rm Card}(V)$, this shows that $S=V$. In particular, $V$ is a subgroup of $\mathfrak A_4$, it is the subgroup generated by the 3 double transpositions, and it is the unique 2-Sylow subgroup of $\mathfrak A_4$.

At this point, we still need to understand why there are 3 double transpositions. More generally, I wanted to prove a formula for the number of permutations in $\mathfrak S_n$ of a given orbit type. The orbit type of a permutation $g$ is the multiset of strictly positive integers, with sum $n$, given by the cardinalities of the orbits of $g$ on $\{1,\dots,n\}$. We write it as $1^{n_1} 2^{n_2}\dots r^{n_r}$, meaning that $g$ has $n_1$ orbits of length $1$ (fixed points), $n_2$ orbits of cardinality $2$, etc., so that $n= \sum_i n_i\, i$. Let $\mathscr O_g$ be the set of orbits of $g$. The action of $g$ on a given orbit $c$ coincides with that of a circular permutation of order the length $\ell(c)$ of this orbit; when it is nontrivial, such a permutation will be called a cycle of $g$. The supports of these cycles are pairwise disjoint, so that these cycles commute, and their product is exactly $g$. In fact, this is the only way of writing $g$ as a product of cycles with pairwise disjoint supports. (By convention, the identity is not a cycle.)

Theorem. — There are exactly \[ N(1^{n_1}\dots r^{n_r}) = \frac{n!}{1^{n_1}\dots r^{n_r} n_1!\dots n_r!} \] permutations with orbit type $1^{n_1} 2^{n_2}\dots r^{n_r}$.
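For the skeptical reader, here is a quick brute-force check of the formula in Python, for $n=4$ (a throwaway script; the helper names are mine):

```python
from itertools import permutations
from math import factorial
from collections import Counter

def orbit_type(p):
    """Multiset of orbit lengths of a permutation of {0,...,n-1}, as a sorted tuple."""
    seen, lengths = [False] * len(p), []
    for i in range(len(p)):
        if not seen[i]:
            length, j = 0, i
            while not seen[j]:
                seen[j] = True
                j = p[j]
                length += 1
            lengths.append(length)
    return tuple(sorted(lengths))

def predicted(lengths):
    """n! / (prod_i i^{n_i} n_i!) where n_i is the number of orbits of length i."""
    denom = 1
    for i, ni in Counter(lengths).items():
        denom *= i ** ni * factorial(ni)
    return factorial(sum(lengths)) // denom

counts = Counter(orbit_type(p) for p in permutations(range(4)))
for t, c in sorted(counts.items()):
    print(t, c, predicted(t))  # the last two numbers agree for each orbit type
```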

A standard proof of this result goes as follows. Write the decomposition of such a permutation $g$ into cycles with disjoint supports as $g=(\cdot)\dots (\cdot)(\cdot,\cdot)\dots(\cdot,\cdot,\dots)$, leaving blank spaces for the entries of the cycles (and, contradicting our convention, allowing for cycles of length 1…). There are $n!$ ways to fill these spaces with the $n$ distinct integers between $1$ and $n$, but some of them give rise to the same permutation. Indeed, the entries in a cycle of length $s$ only count up to a circular permutation, so that we need to divide the result by $1^{n_1}\dots r^{n_r}$. Moreover, we can switch the order of the cycles of a given length, hence we also need to divide that result by $n_s!$ (the number of ways of permuting the various cycles of length $s$), for each possible length $s$.

This is reasonably convincing, but one could wish for something more precise, both in the proof and in the statement. In fact, in the preceding formula, the numerator $n!$ is the order of $\mathfrak S_n$. Since all permutations with a given orbit type are conjugate in $\mathfrak S_n$, the left hand side appears as the cardinality of the conjugacy class of a permutation $g$ of that orbit type, so that the denominator has to be equal to the cardinality of the stabilizer of this permutation under the action by conjugation. Therefore, a more precise proof of this formula could run by elucidating the structure of this centralizer. This is also interesting when one wishes to relativize the result to the alternating group $\mathfrak A_n$, in order to obtain a formula for the cardinality of the various conjugacy classes in $\mathfrak A_n$.

Let us fix a permutation $g\in\mathfrak S_n$ with orbit type $1^{n_1}\dots r^{n_r}$. The stabilizer of $g$ under the action by conjugation is its centralizer $Z_g$, the subgroup of all $k\in\mathfrak S_n$ which commute with $g$.

We first define a morphism of groups \[ \tau \colon Z_g \to \mathfrak S_{n_1}\times \mathfrak S_{n_2}\times\dots\times \mathfrak S_{n_r}. \] Let $\mathscr O_g$ be the set of orbits of $g$; this is a set with cardinality $n_1+n_2+\dots+n_r$. Restricted to one orbit, the action of $g$ coincides with that of a circular permutation of this orbit (which fixes the complementary subset); these circular permutations have disjoint supports, hence they commute pairwise, and their product is equal to $g$. For $c\in\mathscr O_g$, we write $\ell(c)$ for its cardinality; this is also the order of the cycle of $g$ defined by this orbit. If $k\in Z_g$, then $kgk^{-1}=g$. Consequently, the action of $k$ permutes the orbits of $g$, respecting their cardinalities: for each $i$, it permutes the $n_i$ orbits of cardinality $i$. This defines the desired group morphism $\tau$ from $Z_g$ to the product of permutation groups $\mathfrak S_{n_1}\times \dots\times \mathfrak S_{n_r}$.

This morphism $\tau$ is surjective.
Indeed, given permutations $\sigma_1$ of the set of fixed points of $g$, $\sigma_2$ of the set of orbits of length 2, etc., we construct $k_\sigma\in Z_g$ such that $\tau(k_\sigma)=(\sigma_1,\dots,\sigma_r)$. We fix a point $a_c$ in each orbit $c$ and decide that $k_\sigma(a_c)=a_{\sigma_i(c)}$ if $c$ has length $i$. The relation $k_\sigma g=g k_\sigma$ imposes $k_\sigma (g^n a_c)=g^n a_{\sigma_i(c)}$ for all $n\in\mathbf Z$, and it remains to check that this formula gives a well defined element of $Z_g$. In fact, the map $\sigma\mapsto k_\sigma$ defines a group-theoretic section of $\tau$.

What is the kernel of this morphism $\tau$?
If $\tau(k)=1$, then $k$ fixes (globally) every orbit $c\in\mathscr O_g$. Since $kg=gk$, we obtain that on each orbit $c$, $k$ coincides with some power of the corresponding cycle, which has order $\ell(c)$. We thus obtain an isomorphism \[ \ker(\tau) \simeq \prod_{c\in\mathscr O_g} (\mathbf Z/\ell(c)\mathbf Z). \]

To compute the cardinality of $Z_g$, it is now sufficient to compute those of $\operatorname{\rm im}(\tau)$ and $\ker(\tau)$, and this gives the formula \[ \operatorname{\rm Card} (Z_g) = \operatorname{\rm Card} (\ker(\tau)) \operatorname{\rm Card} (\operatorname{\rm im}(\tau)) = 1^{n_1}\dots r^{n_r} n_1! \dots n_r!, \] as was to be shown.
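As a sanity check, one can compute a centralizer by brute force in Python (the encoding of permutations as tuples is mine). For $g=(0\,1)(2\,3)$ in $\mathfrak S_5$, of orbit type $1^1\,2^2$, the formula predicts $\operatorname{\rm Card}(Z_g)=1^1\cdot 2^2\cdot 1!\cdot 2!=8$:

```python
from itertools import permutations

def compose(p, q):
    """(p ∘ q)(x) = p(q(x)); permutations of {0,...,n-1} encoded as tuples."""
    return tuple(p[q[x]] for x in range(len(p)))

# g = (0 1)(2 3) in S_5, with orbit type 1^1 2^2.
g = (1, 0, 3, 2, 4)
Z = [k for k in permutations(range(5)) if compose(k, g) == compose(g, k)]
print(len(Z))  # 8
```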

One interest of this argument is that it can be pushed further to understand the structure of the conjugacy classes in the alternating group $\mathfrak A_n$. The case $n\leq 1$ is uninteresting, hence we assume $n\geq 2$. Then $\mathfrak A_n$ has index 2 in $\mathfrak S_n$, and the formulas \[ \operatorname{\rm Card}((g)_{\mathfrak A_n}) = \frac{{\rm Card}({\mathfrak A_n})}{{\rm Card}(Z_g \cap {\mathfrak A_n})} \quad\text{\rm and}\quad \operatorname{\rm Card}((g)_{\mathfrak S_n}) = \frac{{\rm Card}({\mathfrak S_n})}{{\rm Card}(Z_g)} \] for the cardinalities of the conjugacy classes $(g)_{\mathfrak A_n}$ and $(g)_{\mathfrak S_n}$ imply that both are equal if and only if $Z_g$ is not contained in $\mathfrak A_n$; otherwise, the conjugacy class $(g)_{\mathfrak S_n}$ is the disjoint union of $(g)_{\mathfrak A_n}$ and of a conjugacy class $(g')_{\mathfrak A_n}$ of a permutation $g'$ which is conjugate to $g$ in $\mathfrak S_n$ but not in $\mathfrak A_n$, and both classes have the same cardinality.

Examples of this phenomenon are classical. For example, the 5-cycles in $\mathfrak S_5$ are conjugate, but they constitute two distinct conjugacy classes under $\mathfrak A_5$. Even more elementary, the 3-cycles $(1\,2\,3)$ and $(1\,3\,2)$ are conjugate in $\mathfrak S_3$, but they are not conjugate in $\mathfrak A_3$ since that group is commutative!

So let us use our description of $Z_g$ to give a full description of this phenomenon.

As a first step, when is $\ker(\tau)$ contained in $\mathfrak A_n$? We have seen that $\ker(\tau)$ is generated by the cycles of $g$. Consequently, $\ker(\tau)$ is contained in $\mathfrak A_n$ if and only if all of these cycles are even permutations, which means that their lengths are odd.

We assume that this condition holds, so that $\ker(\tau)\subseteq \mathfrak A_n$, and now work on the image of $\tau$. Its surjectivity was proved by means of an explicit section $\sigma\mapsto k_\sigma$. Given the preceding condition that $\ker(\tau)\subseteq \mathfrak A_n$, a necessary and sufficient condition for the inclusion $Z_g\subseteq \mathfrak A_n$ is that the image of this section consists of even permutations. This section is a morphism of groups, so it suffices to understand the sign of $k_\sigma$ when $\sigma$ consists of a cycle $(c_1,\dots,c_s)$ in $\mathfrak S_{n_i}$ and is trivial on the other factors. Then $\ell(c_1)=\dots=\ell(c_s)=i$, by definition of $\sigma$. The formula $k_\sigma(g^n a_c)=g^n a_{\sigma(c)}$ shows that the nontrivial cycles of $k_\sigma$ are of the form $(g^n a_{c_1},\dots, g^n a_{c_s})$; they all have the same length, $s$, and there are $\ell(c_1)$ of them. Consequently, the sign of $k_\sigma$ is equal to $(-1)^{(s-1)\ell(c_1)}=(-1)^{s-1}$, since $\ell(c_1)$ is odd. This proves that the sign of $k_\sigma$ is equal to the sign of $\sigma$. In addition to the condition that the orbits of $g$ have odd cardinalities, a necessary and sufficient condition for the image of $\sigma\mapsto k_\sigma$ to be contained in $\mathfrak A_n$ is thus that all symmetric groups $\mathfrak S_{n_i}$ coincide with their alternating groups, that is, $n_i\leq 1$ for all $i$. We can now conclude:

Theorem. — Let $1^{n_1}\dots r^{n_r}$ be a partition of $n$.

  1. If $n_i=0$ for even $i$, and $n_i\leq 1$ for all $i$, then there exist two permutations in $\mathfrak S_n$ with orbit type $1^{n_1}\dots r^{n_r}$ which are not conjugate in $\mathfrak A_n$.
  2. Otherwise, any two permutations in $\mathfrak S_n$ with that orbit type are conjugate in $\mathfrak A_n$.

We can check the initial example of two 5-cycles in $\mathfrak S_5$ which are not conjugate in $\mathfrak A_5$. Their orbit type is $5^1$: the only length that appears is 5, hence odd, and it has multiplicity $1$. In fact, this is the only orbit type in $\mathfrak S_5$ where this phenomenon appears!
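This splitting is also easy to observe by brute force. The following Python sketch (the helpers are ad hoc) computes the $\mathfrak A_5$-conjugacy class of one $5$-cycle and finds $12$ elements, while $\mathfrak S_5$ contains $24$ five-cycles in all:

```python
from itertools import permutations

def compose(p, q):
    """(p ∘ q)(x) = p(q(x)); permutations of {0,...,n-1} encoded as tuples."""
    return tuple(p[q[x]] for x in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, x in enumerate(p):
        inv[x] = i
    return tuple(inv)

def cycle_count(p):
    """Number of orbits of p (fixed points included)."""
    seen, count = [False] * len(p), 0
    for i in range(len(p)):
        if not seen[i]:
            count += 1
            j = i
            while not seen[j]:
                seen[j] = True
                j = p[j]
    return count

def is_even(p):
    return (len(p) - cycle_count(p)) % 2 == 0

A5 = [p for p in permutations(range(5)) if is_even(p)]
five_cycles = [p for p in A5 if cycle_count(p) == 1]
g = five_cycles[0]
orbit = {compose(compose(k, g), inverse(k)) for k in A5}
print(len(five_cycles), len(orbit))  # 24 and 12: two A_5-classes of 5-cycles
```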

Monday, December 12, 2022

Multiplicative square roots

I will just discuss briefly the first section of a paper by William Waterhouse (2012), “Square Root as a Homomorphism” (American Mathematical Monthly 119 (3), 235-239), which addresses the following question: given a field $F$, when is it possible to define square roots for all squares compatibly with products, i.e., so that $\sqrt {ab}=\sqrt a\,\sqrt b$ whenever $a,b\in F$ are squares?

Real numbers. — Such a square root operation exists when $F$ is the field of real numbers: we are familiar with the process of taking the positive square root of a positive real number.

Finite fields. — It also exists in some finite fields. So let $F$ be a finite field and let $q$ be its number of elements; then $q$ is a power of a prime number $p$, but if you wish, you may already assume that $q=p$ is prime. For simplicity, we assume that $q$ is odd. By Fermat's little theorem, every nonzero element $a\in F$ satisfies $a^{q-1}=1$. Since $q-1$ is even, we can write $a^{q-1}=(a^{(q-1)/2})^2=1$, so that $a^{(q-1)/2}=\pm1$, and Euler's criterion asserts that $a$ is a square if and only if $a^{(q-1)/2}=1$. (That this condition is necessary is obvious: writing $a=b^2$, one gets $a^{(q-1)/2}=b^{q-1}=1$ by Fermat's little theorem. A counting argument shows that it is sufficient: the map $b\mapsto b^2$ is $2$ to $1$ on nonzero elements, hence its image consists of $(q-1)/2$ elements, all of which are squares; since the polynomial equation $T^{(q-1)/2}=1$ has at most $(q-1)/2$ solutions in $F$, we obtained all of them in this way.)

For example, $-1$ is a square if and only if $(-1)^{(q-1)/2}=1$, which happens if and only if $(q-1)/2$ is even, that is, $q\equiv 1\pmod 4$. In this case, do we have a formula for a square root of $-1$? When $q=p$, yes, but it is not an easy one: Wilson's theorem states that $(p-1)!\equiv -1\pmod p$, just because you may pair each integer $a$ such that $1\lt a\lt p-1$ with its multiplicative inverse modulo $p$; then only two factors remain in the product and $(p-1)!\equiv 1\cdot (p-1)\equiv -1\pmod p$. Now, pair instead each integer $a$ such that $1\leq a\leq (p-1)/2$ with its additive inverse $p-a$; this gives $(p-1)!\equiv \bigl(((p-1)/2)!\bigr)^2\,(-1)^{(p-1)/2}\pmod p$, hence $\bigl(((p-1)/2)!\bigr)^2\equiv -1\pmod p$ when $p\equiv 1\pmod 4$. This is not an easy formula, because computing the factorial takes a long time for large $p$.
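For instance, for $p=13$ one has $((p-1)/2)!=6!=720\equiv 5\pmod{13}$, and indeed $5^2=25\equiv-1\pmod{13}$.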

It is possible to do much better, but you need to have a die at your disposal. Indeed, choose an element $a$ such that $1\leq a\leq p-1$ and compute $b=a^{(p-1)/4}$. Since $b^2=a^{(p-1)/2}=\pm1$, two possibilities arise: when $a$ is a square, we get $b^2=1$, but if $a$ is not a square, then we get $b^2=-1$. And if we choose $a$ randomly, we have one chance out of two of not having chosen a square, hence one chance out of two of getting an element $b$ such that $b^2=-1$.

At this point you may ask why computing the power $a^{(p-1)/4}$ should be any faster than computing the factorial $((p-1)/2)!$, and that is a fair question. The reason is that there is a fast recursive way to compute a power $a^n$: write $a^n=(a^2)^{n/2}$ if $n$ is even, and $a^n=a\cdot (a^2)^{(n-1)/2}$ if $n$ is odd. This leads to basically $\log_2(n)$ multiplications and squarings, and not the $n-1$ multiplications that the naïve expression $a\cdot a\dots a$ might have led you to expect.
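Here is a minimal Python sketch of the last two paragraphs (the function names are mine; Python's built-in pow(a, n, p) already performs fast modular exponentiation):

```python
import random

def power_mod(a, n, p):
    """Fast exponentiation: compute a^n modulo p in about log2(n) multiplications."""
    if n == 0:
        return 1 % p
    if n % 2 == 0:
        return power_mod(a * a % p, n // 2, p)
    return a * power_mod(a * a % p, (n - 1) // 2, p) % p

def square_root_of_minus_one(p):
    """For a prime p ≡ 1 (mod 4): pick a at random; with probability 1/2 it is
    not a square, and then b = a^((p-1)/4) satisfies b^2 ≡ -1 (mod p)."""
    assert p % 4 == 1
    while True:
        a = random.randrange(1, p)
        b = power_mod(a, (p - 1) // 4, p)
        if b * b % p == p - 1:
            return b

b = square_root_of_minus_one(65537)
print(b, b * b % 65537)  # the second number is 65536, that is, -1 modulo 65537
```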

But let us go back to the question of computing square roots. As the last three paragraphs indicate, it could be difficult to do so when $q\equiv 1\pmod 4$. However, it is extremely easy in the other case, $q\equiv 3\pmod 4$. Take a nonzero element $a$ which is a square, and write $a^{(q-1)/2}=1$. Since $q\equiv 3\pmod 4$, we can write $q=-1+4m$, so that $a^{2m-1}=1$, hence $a=a^{2m}=(a^m)^2$. We have our square root: it is simply given by $b=a^m=a^{(q+1)/4}$. The resulting map, $a\mapsto a^m$, gives us our desired multiplicative square root on squares.
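In Python, for a prime $q\equiv 3\pmod 4$, this reads as follows (a small sketch, with an ad hoc name):

```python
def multiplicative_sqrt(a, q):
    """For a prime q ≡ 3 (mod 4): if a is a square modulo q, then a^((q+1)/4)
    is a square root of a, and the map a -> a^((q+1)/4) is multiplicative."""
    assert q % 4 == 3
    b = pow(a, (q + 1) // 4, q)
    assert b * b % q == a % q, "a is not a square modulo q"
    return b

# In F_7 the nonzero squares are 1, 2 and 4:
print(multiplicative_sqrt(2, 7), multiplicative_sqrt(4, 7))  # 4 and 2; note 4*2 = 8 ≡ 1, the square root of 2*4 ≡ 1
```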

Complex numbers. — Now for a negative result, there is no multiplicative square root on the complex numbers, basically for the reason we have been taught that it leads to fallacies. All complex numbers are squares, so let us assume that we have a multiplicative square root $r\colon \mathbf C\to\mathbf C$. Letting $i=r(-1)$, the contradiction comes from the relation $$-i = r(-i)^2=r((-i)^2)=r(-1)=i.$$

We can now state and prove Waterhouse's theorem:

Theorem. — Let $F$ be a field (of characteristic $\neq 2$) and let $S\subseteq F$ be the multiplicative monoid of squares. There exists a multiplicative square root $r\colon S\to F$ (a map such that $r(a)^2=a$ for all $a\in S$ and $r(ab)=r(a)\,r(b)$ for all $a,b\in S$) if and only if $-1\notin S$.

Proof. — The same negative argument as for the complex numbers works whenever $-1$ is a square in $F$. So let us assume that $-1$ is not a square and let us explain why a multiplicative square root exists. The proof, however, is not explicit: it relies on a maximality principle. Moreover, we won't define the square root map directly, but rather its image.
Let us first analyse the situation. Assume that $r\colon S\to F$ is a multiplicative square root. It is simpler to remove $0$ from the discussion, so we consider its restriction $S^\times \to F^\times$ and still denote it by $r$. By assumption, it is a morphism of groups, so that its image $R^\times$ is a subgroup of $F^\times$. Observe that it does not contain $-1$: if $r(a)=-1$, then $a=r(a)^2=(-1)^2=1$, but $r(1)=1$. Moreover, for every element $a\in F^\times$, we have $r(a^2)^2=a^2$, hence $r(a^2)=\pm a$, so that either $a$ or $-a$ belongs to $R^\times$, but not both, since $-1\not\in R^\times$. As a consequence, $R^\times$ is maximal among the subgroups of $F^\times$ which do not contain $-1$: adding to $R^\times$ any element $a\in F^\times$ such that $a\notin R^\times$ would lead to a subgroup $\langle R^\times,a\rangle$ which contains $-1$.

Let us thus consider a subgroup $R^\times$ of $F^\times$ which contains the squares, does not contain $-1$, and is maximal for these properties. Starting from $S^\times$, which does not contain $-1$, such a subgroup can be obtained using Zorn's lemma, or by transfinite induction: well-order the elements of $F^\times$ and construct $R^\times$ step by step. Since $R^\times$ contains the squares, the union $R^\times \cup a R^\times$ is a subgroup of $F^\times$ (because $a^2\in R^\times$); if it does not contain $-1$, then we replace $R^\times$ by it, otherwise we discard $a$ and keep $R^\times$.

Let $a\in F^\times$. If $a\notin R^\times$, the construction means that $-1\in aR^\times$, that is, $-a^{-1}\in R^\times$; since $a^2\in R^\times$, this gives $-a=a^2\cdot(-a^{-1})\in R^\times$. But we can't have both $a$ and $-a$ in $R^\times$, for that would imply that $-1\in R^\times$.

If $a\in F^\times$ is a nonzero square, it has two square roots, of the form $\pm b$, and we define $r(a)$ to be its square root which belongs to $R^\times$. One has $r(1)=1$, because $1\in S^\times\subset R^\times$. For nonzero squares $a,b$, the product $r(a)r(b)$ is a square root of $ab$, and it belongs to $R^\times$, hence it equals $r(ab)$. This proves that the map $r$ is multiplicative. This concludes the proof.

Remark. — If you've studied some abstract algebra, you may have recognized something in the middle of the proof. Indeed, the quotient group $V=F^\times/S^\times$ has exponent 2: for every $\alpha$ in this group, $\alpha^2=1$. Consequently, even if it is written multiplicatively, this abelian group is a vector space over the field with 2 elements. Since $-1$ is not a square in $F^\times$, its class $[-1]$ is nonzero in $F^\times/S^\times$, and the quotient group $W=R^\times/S^\times$ is just a maximal vector subspace that does not contain $[-1]$. It is a hyperplane, and it is defined by a linear form $\phi$ on $V$. Since $V$ is written multiplicatively, this linear form corresponds to a group homomorphism $f\colon F^\times \to\{\pm1\}$ which maps $S^\times$ to $1$ and such that $f(-1)=-1$. For every square $a=b^2$, we then have $r(a)=b\, f(b)$.
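For instance, for $F=\mathbf R$, the subgroup $R^\times$ is the group of positive real numbers, the homomorphism $f$ is the sign, and for $a=b^2$ one recovers $r(a)=b\,f(b)=|b|$, the usual positive square root.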

In his paper, Waterhouse goes on by viewing “fields $F$ with a multiplicative square root $r$” as a basic algebraic object, and considering such structures $(F,r)$ which can't be extended by adding algebraic elements. The final theorem of the paper shows that the Galois group $\mathop{\rm Gal}(\overline F/F)$ is either cyclic of order 2, or is the additive group of the 2-adic integers.

Tuesday, November 1, 2022

#Mathober2022

Sophia Wood (@fractalkitty) had the good idea to set up a #Mathober project: for each day of October, she proposes a word of mathematics to react to. I did something on the Mastodon server, which was also crossposted to Twitter. I will copy it here, but meanwhile you can enjoy it there.

You can also enjoy Sophia's work there:

Link to Mastodon : https://mathstodon.xyz/web/@antoinechambertloir/109131452332129714

Link to Twitter: https://twitter.com/achambertloir/status/1578647553205276672

Link to Sophia Wood's sketches: https://fractalkitty.com/2022/10/01/mathober2022-sketches

Tuesday, September 13, 2022

Yet another post on simplicity

I see that I have finally arrived at the end of my journey of formalizing in Lean the simplicity of the alternating group on 5 letters or more, so it may be a good time to summarize what I did, from the mathematical side.

In a first blog post, “Not simple proofs of simplicity”, I had described my initial plan, but it was not clear at that time whether I would arrive at a complete proof, nor whether I would be able to formalize it in Lean. In fact, a few weeks after I had started this experiment, I doubted I would make it and went on formalizing the traditional proof that the alternating group is simple. I added a few simplifications (which, I was later told, were already explained in Jacobson's Basic Algebra; such is life…), leading to “The very simple proof that the alternating group on 5 letters (or more) is simple”. I managed to formalize that proof at the end of 2021, and spent a lot of energy during the next 8 months formalizing the proof that I initially had in mind.

As I had already explained, the goal/constraint is to apply the Iwasawa criterion to the alternating group. This criterion says that if a group $G$ acts primitively on a set $X$, and if we attach to each point $x\in X$ a commutative subgroup $T(x)$ of $G$, in such a way that $T(g\cdot x)=g\, T(x)\, g^{-1}$ for every $g\in G$ and every $x\in X$, and if the subgroups $T(x)$ generate $G$, then every normal subgroup of $G$ that acts nontrivially on $X$ contains the commutator subgroup. We take $G=\mathfrak A_n$. For $n\geq 5$, its commutator subgroup is $\mathfrak A_n$ itself (for example because any two 3-cycles are conjugate; in particular, a 3-cycle is conjugate to its square, which implies that it maps to $1$ in the abelianization of $\mathfrak A_n$). So we need to get primitive actions of $\mathfrak A_n$ and commutative subgroups.

One of the equivalent criteria for primitivity of a transitive action is that the stabilizers of points are maximal subgroups. As I had explained at the end of the first post, the maximal subgroups of $\mathfrak S_n$ and $\mathfrak A_n$ are known by the O'Nan–Scott theorem, combined with its converse, which is a theorem of Liebeck, Praeger and Saxl. These theorems give a precise list of the maximal subgroups of $\mathfrak S_n$ and $\mathfrak A_n$, of which the first entry is precisely $\mathfrak S_p\times \mathfrak S_{n-p}$ (where the first factor acts on $\{1;\dots;p\}$ and the second acts on $\{p+1;\dots;n\}$) and its intersection with $\mathfrak A_n$, if $0<p<n$ and $n\neq 2p$.

We need to understand the limitation $n\neq 2p$, the point being that if $n=2p$, the subgroup $\mathfrak S_p\times\mathfrak S_p$ is not maximal in $\mathfrak S_{2p}$, it is a subgroup of index 2 of a “wreath product” obtained by adding one permutation that exchanges the two blocks $\{1,\dots,p\}$ and $\{p+1,\dots,2p\}$, for example $(1\,p+1)(2\,p+2)\dots (p\,2p)$. This group is the second entry in the O'Nan–Scott theorem.

These two entries are labelled as intransitive and imprimitive respectively, because $\mathfrak S_p\times \mathfrak S_{n-p}$ has two orbits on $\{1;\dots;n\}$, while the wreath product is transitive but it preserves the partition consisting of the two blocks $\{1,\dots,p\}$ and $\{p+1,\dots,2p\}$.

These two entries seem to be obvious to group theorists; they are given without proof in the paper of Liebeck, Praeger and Saxl.

The case of $\mathfrak S_n$ is easy, and occupies a subsection of Wilson's book on Finite Simple Groups. It is even funny to prove by hand, and not so hard to formalize in Lean. Take a subgroup $K$ of $\mathfrak S_n$ such that $\mathfrak S_p\times \mathfrak S_{n-p} \subsetneq K$ and let us prove that $K=\mathfrak S_n$. To that end, it suffices to show that $K$ contains every transposition $(a\,b)$. This is obvious if both $a$ and $b$ belong to $\{1;\dots;p\}$ or if they both belong to $\{p+1;\dots;n\}$, so assume that $a\in\{1;\dots;p\}$ and $b\in\{p+1;\dots;n\}$. Since $K$ strictly contains the stabilizer $\mathfrak S_p\times\mathfrak S_{n-p}$ of the subset $\{1;\dots;p\}$, it does not stabilize $\{1;\dots;p\}$, so there are $x\in\{1;\dots;p\}$ and $k\in K$ such that $y=k\cdot x \in\{p+1;\dots;n\}$. If $n>2p$, there exists $z\in\{p+1;\dots;n\}$ such that $z\neq y$ and $t=k^{-1}\cdot z\in\{p+1;\dots;n\}$; from the relation $k^{-1} \cdot (y\,z) \cdot k=(x\,t)$ and the fact that $(y\,z)\in \mathfrak S_p\times\mathfrak S_{n-p}$, we deduce that $(x\,t)$ belongs to $K$. This gives us one transposition of the desired form; finally, the relation $(a\,b)=h (x\,t) h^{-1}$ with $h=(x\,a)(t\,b)\in\mathfrak S_p\times\mathfrak S_{n-p}$ shows that $(a\,b)\in K$. The other case, $n<2p$, is symmetric.
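For what it is worth, this maximality can be checked by brute force for small $n$; here is a Python sketch for $n=5$ and $p=2$ (the encoding and helper names are mine):

```python
from itertools import permutations

def compose(p, q):
    """(p ∘ q)(x) = p(q(x)); permutations of {0,...,n-1} encoded as tuples."""
    return tuple(p[q[x]] for x in range(len(p)))

def generated_subgroup(gens, n):
    """Closure of a set of permutations under composition (a subgroup, since S_n is finite)."""
    group = {tuple(range(n))}
    frontier = set(group)
    while frontier:
        new = {compose(g, h) for g in gens for h in frontier} - group
        group |= new
        frontier = new
    return group

n, p = 5, 2
# S_p x S_{n-p}: the permutations preserving the subset {0,...,p-1}.
H = {g for g in permutations(range(n)) if set(g[:p]) == set(range(p))}
assert all(len(generated_subgroup(H | {g}, n)) == 120
           for g in permutations(range(n)) if g not in H)
print("any subgroup strictly containing S_2 x S_3 is the whole of S_5")
```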

Bizarrely, the analogous result for the alternating group looked more difficult to me, although one colleague assured me that it could be done, another one that I could certainly do it, and a third one did it for $n>7$. Since Liebeck, Praeger and Saxl gave no reference at all, I asked Liebeck about it, and he explained to me a short proof that uses totally different ideas.

Let $G=\mathfrak A_n$ or $\mathfrak S_n$ and consider a subgroup $K$ such that $(\mathfrak S_p\times\mathfrak S_{n-p})\cap G \subsetneq K\subseteq G$; we wish to prove that $K=G$. Arguments as given above already show that $K$ acts transitively on $\{1;\dots;n\}$. But we can do more: it acts primitively. Now, one just needs to invoke an 1870 theorem of Jordan: a primitive subgroup of $\mathfrak S_n$ that contains a transposition is $\mathfrak S_n$, and a primitive subgroup of $\mathfrak S_n$ that contains a 3-cycle contains $\mathfrak A_n$!

To prove that $K$ acts primitively, it is convenient to use the standard definition of a primitive action. If a group $G$ acts on a set $X$, call block of the action a nonempty subset $B$ of $X$ which, for every $g\in G$, is either fixed or moved to a disjoint subset by $g$; it follows from the definition that the translates of a block under a transitive action form a partition of $X$. Singletons are blocks, the full set is a block, and one definition of a primitive action is that the only blocks are these trivial ones (and $X$ is nonempty). Orbits are blocks, so that a primitive action is transitive. Conversely, one can prove that if the action is transitive, then it is primitive if and only if the stabilizers of points in $X$ are maximal subgroups. A more general result (still for a transitive action) is that for every point $a\in X$, associating with a block $B$ its stabilizer $G_B$ gives a bijection from the set of blocks that contain $a$ to the set of subgroups of $G$ that contain $G_a$, with inverse bijection associating with a subgroup $K$ containing $G_a$ the orbit $K\cdot a$, and these bijections preserve inclusion.

Proof. — Let $B,B'$ be blocks such that $B\subseteq B'$ and let $g\in G_B$; then $g\cdot B'$ contains $g\cdot B=B$, hence $g\cdot B'$ is not disjoint from $B'$, so that $g\cdot B'=B'$ by definition of a block. This proves that $G_B$ is a subgroup of $ G_{B'}$.

Let $B$ be a block that contains $a$; then $G_B \cdot a=B$. Indeed, the inclusion $G_B\cdot a\subseteq B$ follows from the definition of $G_B$. To prove the other inclusion, let $b\in B$. Since the action is transitive, there exists $g\in G$ such that $g\cdot a=b$; then $g\cdot B$ and $B$ both contain $b$, hence $g\cdot B=B$, so that $g\in G_B$ and $b\in G_B\cdot a$.

Finally, let $K$ be a subgroup of $G$ containing $G_a$ and let $B=K\cdot a$. Let us prove that $B$ is a block such that $K=G_B$. Let $g\in G$ be such that $g\cdot B$ and $B$ are not disjoint; let $b,c\in B$ be such that $b=g\cdot c$; write $b=k\cdot a$ and $c=h\cdot a$ for some $k,h\in K$. Then $k\cdot a = gh\cdot a$, so that $k^{-1}gh\in G_a$, hence $k^{-1}gh\in K$; we conclude that $g\in K$, hence $g\cdot B=gK\cdot a = K\cdot a=B$. So $B$ is a block. This also shows that $G_B\subseteq K$, and the converse inclusion is obvious.

Going back to our initial problem, it remains to show that the action of $K$ on $\{1;\dots;n\}$ only has trivial blocks. The proof uses two remarks.

  1. The trace of a block on $\{1;\dots;p\}$, respectively on $\{p+1;\dots;n\}$, is either empty, or a singleton, or all of it. Indeed, this trace is a block for the induced action of $(\mathfrak S_p\times\mathfrak S_{n-p})\cap G$ on $\{1;\dots;p\}$ (respectively $\{p+1;\dots;n\}$), and this action contains that of $\mathfrak A_p$ (respectively…), and even that of $\mathfrak S_p$ if $p\neq n-1$. On the other hand, the symmetric group acts 2-transitively, hence primitively.  (The cases $p=1$ or $p=n-1$ need minor adjustments.)
  2. If $2p<n$, then no nontrivial block can contain $\{p+1;\dots;n\}$. Indeed, there is not enough space in the complementary subset so that disjoint translates of this block make a partition of $\{1;\dots;n\}$.

Let us now conclude the proof; we assume, as we may, that $2p<n$. (I still find the following argument a bit convoluted, but have nothing really better to propose yet.) Consider a block $B\subset\{1;\dots;n\}$ for the action of $K$, and assume that $B$ is neither a singleton nor the full set. If $B$ meets $\{p+1;\dots;n\}$ in at least two elements, then it contains $\{p+1;\dots;n\}$, hence is the full set, a contradiction. If $B$ meets $\{1;\dots;p\}$ in at least two elements, then it contains $\{1;\dots;p\}$, and some disjoint translate of it is contained in $\{p+1;\dots;n\}$; this translate is a block that meets $\{p+1;\dots;n\}$ in at least two elements, hence contains $\{p+1;\dots;n\}$, hence is the full set, so that the initial block is the full set as well. By similar arguments, $B$ meets both $\{1;\dots;p\}$ and $\{p+1;\dots;n\}$ in exactly one element, and the same holds for any translate $k\cdot B$ of $B$. However, using the hypothesis that $p\neq n-p$ and that $K$ strictly contains $(\mathfrak S_p\times\mathfrak S_{n-p})\cap G$, we find $k\in K$ such that $k\cdot B$ meets $\{p+1;\dots;n\}$ in at least two elements, and we conclude as earlier that $B$ is the full set, a contradiction.

To end this blog post, I need to say something about Jordan's theorem. Jordan was concerned with the concept of multiple transitivity: a group $G$ acting on a set $X$ is $m$-transitive if, whenever systems of distinct elements $a_1,\dots,a_m$ on the one side, and $b_1,\dots,b_m$ on the other side, are given, there exists $g\in G$ such that $g\cdot a_1=b_1,\dots, g \cdot a_m=b_m$ (one assumes here that $m\leq {\mathrm{Card}(X)}$). Many theorems from this time (Mathieu, Bertrand, Serret, Jordan…), partly in relation with the Galois theory of equations, aim at limiting the multiple transitivity of subgroups of the symmetric group. The symmetric group itself is $n$-transitive, if $n={\mathrm {Card}(X)}$, the alternating group is $(n-2)$-transitive, and other subgroups have to be much less transitive.

The general result of Jordan, proved in Note C (page 664) to §398 of his Traité des substitutions et des équations algébriques (1870, Gauthier-Villars), is that a primitive subgroup of $\mathfrak S_n$ containing a cycle of prime order $p$ is $(n-p+1)$-transitive. For $p=2$, we get that this subgroup is $(n-1)$-transitive, hence is $\mathfrak S_n$; for $p=3$, we get that it is $(n-2)$-transitive, and that implies that it contains the alternating group $\mathfrak A_n$. I formalized these results in Lean, following the presentation of Wielandt's book on Finite permutation groups (Theorem 13.3 of that reference). A later theorem of Jordan (1873; see Theorem 13.9 in Wielandt's book) asserts that such a subgroup always contains the alternating group provided $n-p\geq 3$; I have not (not yet?) formalized it in Lean.

All in all, this gives a fairly sophisticated proof that the alternating group is simple. One of its merits is to follow a general line that applies to many other groups. In particular, Iwasawa's criterion is also used by Wilson in his book Finite simple groups to prove the simplicity of the Mathieu groups $M_{11}$, $M_{12}$, and of many other finite groups.

I just opened Jordan's book to write this blog post. Let me add that it contains (§85) another proof of simplicity of the alternating group, and I will try to explain it in a later post.

Tuesday, December 21, 2021

The very simple proof that the alternating group on 5 letters (or more) is simple

\( \def\supp {\operatorname{supp}} \)

As explained in the previous post, I wanted to formalize the proof that the alternating group on 5 letters or more is simple, using the Iwasawa criterion. This formalization is stuck in the middle of nowhere, because it requires a proof that the natural action of the alternating group $A_n$ on $k$-element subsets of $\{1,\dots,n\}$ is primitive, provided $n\neq 2k$, $k\neq 0,n$ and $n\geq 5$.

A simple proof

There is a simple, slightly computational proof that the alternating group on 5 letters (or more) is simple; it is explained in many textbooks, for example in James Milne's Group theory notes. However, its formalization is not so easy, because even Milne leaves a lot of steps implicit.

The situation, as in Lemma 4.36 of Milne's booklet, is the following.

One is given a nontrivial normal subgroup $N$ of $A_n$ and one wishes to prove that $N=A_n$. We know that the 3-cycles generate $A_n$. We also know that 3-cycles are pairwise conjugate in the symmetric group, and also, using $n\geq 5$, in the alternating group $A_n$. Consequently, it suffices to prove that $N$ contains a 3-cycle. To that aim, one argues by induction on the cardinality of the support $\supp(g)$ of a nontrivial element $g$ of $N$. 

Here is an attempt at presenting the induction argument in a way which is as straightforward as possible.

Since $g\neq 1$, $\supp(g)$ has cardinality $\geq 2$.

If $\supp(g)$ has cardinality $2$, then $g$ is a transposition, contradicting the hypothesis that $g$ belongs to the alternating group. So the support of $g$ has cardinality $\geq 3$.

If $\supp(g)$ has cardinality $3$, then $g$ is a 3-cycle, and we are done. So we may assume that the support of $g$ has cardinality $\geq 4$.

Let us choose an element $a\in\{1,\dots,n\}$ which belongs to the largest cycle of $g$ and set $b=g(a)$; by assumption, one has $b\neq a$. The proof consists in considering an element $c\in\{1,\dots,n\}$ such that $c\neq a$ and $c\neq b$, the 3-cycle $h=(a\,b\,c)$ and the conjugate $g'=h g h^{-1}$. 

Since $h$ is a 3-cycle, it belongs to the alternating group; since $N$ is normal, one has $g'\in N$.

We wish to apply the induction hypothesis to the element $g' g^{-1}$ of $N$. So we need to prove

  1. $g'\neq g$, and
  2. The support of $g' g^{-1}$ has cardinality strictly smaller than the one of $g$.

To guarantee (1), that $g'\neq g$, it suffices to choose $c$ such that $g'(b)\neq g(b)$. But \[ g'(b) = hgh^{-1}(b) = hg(a) = h(b) = c, \] so the new assumption we make is that $c\neq g(b)$.

The rest of the argument is devoted to finding appropriate conditions on $c$ that guarantee (2). First, observe the inclusion $\supp(g'g^{-1})\subset g(\supp(h))\cup \supp(h)$, which is proved by contraposition. Indeed, if $x$ does not belong to the right hand side, then $g^{-1}(x) \notin \supp(h)$, hence $h^{-1}g^{-1}(x)=g^{-1}(x)$ (for example, using that $\supp(h)=\supp(h^{-1})$), and then $g' g^{-1}(x)=hgh^{-1}(g^{-1}(x))=hg(g^{-1}(x))=h(x)=x$, since $x\not\in \supp(h)$. This proves that the cardinality of the support of $g'g^{-1}$ is at most 6.

However, since $g(a)=b$ belongs both to $g(\supp(h))$ and to $\supp(h)$, the cardinality is at most 5. Explicitly, $\supp(g'g^{-1})\subset \{a,b,c,g(b), g(c)\}$. In particular, a clever choice for $c$ is only needed when $\supp(g)$ has cardinality 4 or 5!

To conclude in the remaining cases, we remark that there are only two possibilities for the cycle-type of $g$: it can only be $(5)$ or $(2,2)$, since $g$ is an even permutation, and we split the discussion according to these two cases:

  • If the cycle-type of $g$ is $(5)$, then we choose for $c$ the “last” element of the cycle of $a$, namely $c=g^{-1}(a)$. Then, $g(c)=a$, so that $\supp(g'g^{-1})\subset\{a,b,c,g(b)\}$ which has at most 4 elements.
  • If the cycle-type of $g$ is $(2,2)$, then we have $g(b)=a$ and we choose for $c$ any fixed point of $g$. Then $\supp(g'g^{-1})\subset\{a,b,c\}$ has at most 3 elements.

About the formalization

One annoying part of formalizing this argument is the elimination of cycle-types. One would like the proof assistant to be able to list all possible cycle-types of a given size. Presumably it can, but I cannot (yet), so I need to do the argument by hand, for that specific value.
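For instance, in Python one could list the admissible cycle-types as follows (a throwaway script, not what the Lean formalization does):

```python
def partitions_with_parts_at_least_2(n, smallest=2):
    """All partitions of n into parts >= 2, as nondecreasing tuples."""
    if n == 0:
        yield ()
        return
    for part in range(smallest, n + 1):
        for rest in partitions_with_parts_at_least_2(n - part, part):
            yield (part,) + rest

for support in (4, 5):
    even_types = [t for t in partitions_with_parts_at_least_2(support)
                  if (support - len(t)) % 2 == 0]  # keep even permutations only
    print(support, even_types)  # prints 4 [(2, 2)] and 5 [(5,)]
```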

In principle, that argument needs to be spelt out in class too. We use two formulas:

  1. The sum of the lengths of the cycles is the cardinality of the support, hence $4$ or $5$ in this case.
  2. A permutation is even if and only if the number of its cycles and the cardinality of its support have the same parity.

One way to write it down consists in taking the length $m$ of the smallest cycle of $g$. One has $m\geq 2$ by assumption.

  1. If there are no other cycles, then the cycle-type of $g$ is $(m)$. Then $m=4$ or $5$, and only $(5)$ respects the parity constraint.
  2. Otherwise, there is exactly one other cycle, for the sum of the lengths of three cycles would be at least $3\cdot 2= 6$. If $m'$ is the length of that other cycle, one has $2\leq m\leq m'$. Then $2m\leq m+m'\leq 5$, hence $m\leq 2$, so that $m=2$. This gives $m'\leq 3$, giving two cycle-types, $(2,3)$ and $(2,2)$, of which only the second one satisfies the parity constraint.

Monday, December 13, 2021

Not simple proofs of simplicity

Over the last few weeks, I started my self-education in proof formalization (in Lean) with a test case: the simplicity of the alternating group. In this blog post, I want to discuss some aspects of the underlying mathematics which I think I learnt during the formalization process.

Simple groups. Some examples

This is about groups, ultimately finite groups. Let us recall that a subgroup $N$ of a group $G$ is normal if for every $g \in G$ and $n\in N$, one has $gng^{-1}\in N$. Concretely, this means that the two equivalence relations modulo $N$ (left or right) coincide, and that the quotient set $G/N$ inherits a group structure that makes the natural projection $G \to G/N$ a morphism of groups.

In this setting, the group $G$ can be viewed as an extension of the two groups $G/N$ and $N$, which are possibly simpler, and this extension may help in understanding properties of the group $G$. At the opposite extreme, one says that $G$ is simple if it has no normal subgroup besides $\{e\}$ and $G$ itself (and if $G\neq\{e\}$).

Trivial examples of normal subgroups of a group $G$ are $\{e\}$ and $G$ itself. Less trivial examples are the center $Z(G)$ of $G$ (those elements $g$ such that $gh=hg $ for all $h\in G$), and the derived subgroup $D(G)$, the subgroup generated by all commutators $ghg^{-1}h^{-1}$. This commutator subgroup is interesting: any subgroup $N$ of $G$ containing the commutator subgroup $D(G)$ is normal, and the quotient is abelian; and conversely.

The kernel of a group morphism is a normal subgroup, and the construction of a quotient shows that all normal subgroups appear in this way. In particular, the alternating group $A_n$ is a normal subgroup of the symmetric group $S_n$.

A simple group has either $Z(G)=\{e\}$ or $Z(G)=G$; in the latter case, this means that $G$ is commutative, and it quickly appears that $G$ has to be finite of prime order. Consequently, our discussion will now concern groups with trivial center.

The concept of simplicity of groups is classically presented in connection with Galois theory, and the question of solvability of equations by radicals. Namely, a “general” polynomial equation of degree $n$ has $S_n$, the full symmetric group on $n$ elements, for its Galois group, and, if $n\geq 5$, the only possible dévissage of this group consists in introducing the alternating group $A_n$ of even permutations, the subgroup $A_n$ being normal and simple. On the other hand, the solvability of a polynomial equation by radicals is equivalent to such a dévissage where all successive quotients are cyclic groups (equivalently, abelian groups). Since $A_n$ is not abelian, this implies that a “general” polynomial equation of degree $n$ is not solvable by radicals. However, using the simplicity of the alternating group is much stronger than what we need: the relevant property is the solvability of the symmetric group, and proving that it fails for $n\geq 5$ is much simpler. Consequently, for the problem of resolution by radicals, it suffices to prove that the derived subgroup $D(A_n)$ of the alternating group is equal to $A_n$.

Theorem. — For $n\geq 5$, one has $D(A_n)=A_n$. In particular, the alternating group and the symmetric group on $n$ letters are not solvable.
I give two (equivalent) proofs, the second one being a computational interpretation of the first one. Both use that the 3-cycles generate $A_n$ and are conjugate in $A_n$. The computational proof is shorter, arguably simpler. As a matter of fact, I never understood it, nor could remember it, until I translated the conceptual proof into the proof assistant.
Consider the morphism $p\colon A_n\to A_n/D(A_n)$. Since all 3-cycles are conjugate in $A_n$ and $A_n/D(A_n)$ is commutative, all 3-cycles have the same image. Since the square of a 3-cycle is again a 3-cycle, both have the same image. This implies that for every 3-cycle $g\in A_n$, one has $p(g)=p(g^2)$, hence $p(g)=e$. Since the 3-cycles generate $A_n$, the morphism $p$ is trivial; since it is surjective, one has $A_n/D(A_n)=\{e\}$ and $D(A_n)=A_n$.
Computationally, consider a 3-cycle $g$ and its square $g^2$. Since they are conjugate, there exists $h\in A_n$ such that $g^2=hgh^{-1}$. Then $g=hgh^{-1}g^{-1}$, so that $g$ is a commutator; in particular, $D(A_n)$ contains all 3-cycles, and since these generate $A_n$, one has $D(A_n)=A_n$.

The remaining cases, for $n\leq 4$, are interesting, but classically left as exercises in textbooks:

  1. One has $A_1=S_1=\{e\}$;
  2. The group $S_2$ is the cyclic group of order 2, hence is simple and solvable, and $A_2$ is trivial;
  3. The group $S_3$ is a noncommutative group of order 6, and $A_3$ is a cyclic group of order 3.
  4. The groups $S_4$ and $A_4$ are noncommutative and solvable, of orders 24 and 12. The derived subgroup $D(S_4)$ is $A_4$, while $D(A_4)$ is the Klein subgroup $V_4$ of $S_4$, consisting of the permutations of the form $(ab)(cd)$ for $a,b,c,d$ any enumeration of $1,2,3,4$ (“double transpositions”) and of the identity. The group $V_4$ is commutative, isomorphic to $(\mathbf Z/2\mathbf Z)^2$, and the quotient $A_4/V_4$ is cyclic of order $3$.

Another classical series of simple groups appears in linear algebra. Let $F$ be a field and let $n$ be an integer such that $n\geq 2$. The group $\mathrm{GL}(n,F)$ of $n\times n$ invertible matrices is not simple in general, for it is noncommutative while its center, which consists of the homotheties, is a nontrivial normal subgroup; we can try to mod out by the center, getting the group $\mathrm{PGL}(n,F)=\mathrm{GL}(n,F)/F^\times$, but that one may not be simple either. Indeed, another reason for $\mathrm{GL}(n,F)$ not to be simple is that it admits the special linear group $\mathrm{SL}(n,F)$, the kernel of the determinant, as a normal subgroup. The group $\mathrm{SL}(n,F)$ has a nontrivial center in general; it consists of the homotheties of ratio $a\in F^\times$ such that $a^n=1$ — let us denote it by $\mu_n$. But the quotient $\mathrm{PSL}(n,F)=\mathrm{SL}(n,F)/\mu_n$ is simple in general — in general meaning that it is always the case but for two exceptions:

  1. $n=2$ and $F=\mathbf F_2$. Then $\mathrm{PSL}(2,\mathbf F_2)\simeq S_3$ (by the action on $\mathbf P_1(\mathbf F_2)$, see below), hence is not simple.
  2. $n=2$ and $F=\mathbf F_3$. Then $\mathrm{PSL}(2,\mathbf F_3)\simeq A_4$ (again by the action on $\mathbf P_1(\mathbf F_3)$), and is not simple.

Bilinear algebra gives rise to new groups, orthogonal, unitary and symplectic, which also furnish simple groups after similar reductions (passing to a suitable subgroup and quotienting by the center). By the famous “classification of finite simple groups”, these constructions, together with the cyclic groups of prime order and the alternating groups, furnish all finite simple groups, up to 26 (or 27) exceptions called sporadic groups. This remarkable theorem has a fantastic proof, encompassing thousands of pages across the years 1960-2010.

But the question here is: How can one prove that a given group is simple?

Alternating groups

There is a large supply of proofs that the alternating group $A_n$ is simple for $n\geq 5$. Here is a sketch of one.

Let $N$ be a normal subgroup of $A_n$ ($n\geq 5$) and assume that $N\neq\{e\}$. An element of $A_n$ can be written as a product of an even number of transpositions. If two successive transpositions in the product are equal, we can cancel them; if they share exactly one common member, as in $(a\,b)(a\,c)$, their product is a 3-cycle $(a\,c\,b)$; if they have no common member, their product is a double transposition. On the other hand, if $n\geq 5$, we can either insert $(b\,c)(b\,c)$ in the product $(a\,b)(c\,d)$, writing a double transposition as a product of two 3-cycles, or insert $(d\,e)(d\,e)$ in the product $(a\,b)(a\,c)$, writing a 3-cycle as a product of two double transpositions. Consequently, $A_n$ is generated either by the 3-cycles, or by the double transpositions. Moreover, since $n\geq 5$, we can check that 3-cycles are pairwise conjugate in $A_n$, and similarly for double transpositions; consequently, if the normal subgroup $N$ of $A_n$ contains either a 3-cycle or a double transposition, it will contain all of them, hence be equal to $A_n$.

When $n=5$, the only case that remains to consider is when $N$ contains a 5-cycle, say $g=(1\,2\,3\,4\,5)$. Conjugating $g$ by the 3-cycle $h=(4\,5\,1)$, we get $hgh^{-1}=(h1\,h2\,h3\,h4\,h5)=(4\,2\,3\,5\,1)\in N$. By construction, this element takes the same value as $g$ at $5$, but differs from $g$. Consequently, $g^{-1}\cdot hgh^{-1}$ is a nontrivial element of $N$ that fixes $5$; its support has at most 4 elements, so it is a 3-cycle or a double transposition. By the first two cases, one has $N=A_5$.

A similar argument works in general, arguing by induction on the cardinality of the support of an element $g\neq e$ of $N$. One considers an element $h$ of $A_n$ and the conjugate $hgh^{-1}$; if $g=(a_1\,a_2\,\dots)(b_1\,b_2\,\dots)\dots$, then $hgh^{-1}=(ha_1\,ha_2\,\dots)(hb_1\,hb_2\,\dots)\dots$ is an element of $N$ that behaves as $g$ on many elements, but not on all of them. Consequently, for a suitable choice of $h$, the element $hgh^{-1}g^{-1}$ is a nontrivial element of $N$ that fixes more elements than $g$, and we conclude by induction. (Details can be found in James Milne's lectures, 4.34.)

The Iwasawa criterion

In 1941, Kenkiti Iwasawa published a proof of the simplicity of the projective special linear group. From this proof, a general criterion for simplicity has been abstracted:

Theorem (Iwasawa). — Let $G$ be a group with a primitive action on a set $X$. Assume that one is given, for every $x\in X$, a subgroup $T(x)$ of $G$ satisfying the following properties:

  • For every $x\in X$, the subgroup $T(x)$ is commutative;
  • For every $g\in G$ and $x\in X$, $T(g\cdot x)=g T(x)g^{-1}$;
  • The union of the groups $T(x)$, for $x\in X$, generates $G$.
Then any normal subgroup $N$ of $G$ that acts nontrivially on $X$ contains the commutator subgroup of $G$. In particular, if the action is moreover faithful and $G=D(G)$, then $G$ is simple.

There are two classical ways to state that the action is primitive. The simplest states that it is transitive and that the stabilizers of points are maximal subgroups of $G$. Another is that there is no nontrivial imprimitivity block system: a partition $(X_i)$ of $X$, other than the two trivial ones, such that for every $g\in G$ and every $i$, there exists $j$ such that $g\cdot X_i=X_j$. One can prove that a 2-transitive action (= transitive on ordered pairs of distinct elements) is primitive.

Iwasawa applies his criterion to $G=\mathrm{SL}(n,F)$ acting on the projective space $\mathbf P_{n-1}(F)$ of lines in $F^n$. Except when $n=2$ and $F$ has 2 or 3 elements, this action is 2-transitive, hence primitive. For a nonzero $x\in F^n$, one considers the group $T(x)$ of linear transformations of the form $y\mapsto y + \phi(y) x$ (transvections), for all linear forms $\phi$ on $F^n$ such that $\phi(x)=0$. They constitute a commutative subgroup of $\mathrm{SL}(n,F)$ (isomorphic, as a group, to $F^{n-1}$). The map $T$ gives rise to the data required in Iwasawa's theorem. Consequently, every normal subgroup $N$ of $\mathrm{SL}(n,F)$ that acts nontrivially on $\mathbf P_{n-1}(F)$ contains the commutator subgroup of $\mathrm{SL}(n,F)$, which is $\mathrm{SL}(n,F)$ itself. Explicitly, either $N$ consists of homotheties, or $N$ is equal to $\mathrm{SL}(n,F)$. This implies that $\mathrm{PSL}(n,F)$ is simple.
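For instance, for $n=2$ and $x=e_1$ the first basis vector of $F^2$, the linear forms vanishing at $x$ are the multiples of the second coordinate, and $T(x)$ is the group of matrices \[ \begin{pmatrix} 1 & t\\ 0 & 1\end{pmatrix},\qquad t\in F, \] which is indeed commutative and isomorphic to the additive group of $F$.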

Applying the Iwasawa criterion for symmetric groups

One may wish to apply the Iwasawa criterion to the symmetric group. However, the conclusion is not as nice as what I had hoped initially.

Let $S_n$ act on the set of 2-element subsets of $X=\{1,\dots,n\}$. If $n\geq 5$, the action is primitive. This is not completely obvious, because this action is not 2-transitive (you cannot map $\{1,2\}$ and $\{1,3\}$ to $\{1,2\}$ and $\{3,4\}$)! Its primitivity means that stabilizers are maximal subgroups, and one can prove that the stabilizer of a 2-element subset is indeed maximal unless $n=4$. The action is also faithful (no nontrivial element acts trivially). To a pair $\{a,b\}$, associate the subgroup of order 2 generated by the transposition $(a\,b)$. It satisfies the criterion, and this shows that the nontrivial normal subgroups of $S_n$ contain $D(S_n)=A_n$. Since $A_n$ has index 2, this shows $N=A_n$ or $N=S_n$.

It is interesting to guess how the criterion breaks for $n=4$. In fact, the action of $S_4$ on pairs is transitive, but not primitive: the stabilizer of $\{1,2\}$ is a subgroup of $S_4$ of order $4$, consisting of the identity, the two transpositions $(1\,2)$ and $(3\,4)$, and their product. Since $\mathrm{Card}(S_4)=24=2^3\cdot 3$, this subgroup is contained in a 2-Sylow subgroup, of order 8, so it is not maximal.

However, and I find it unfortunate, this proof does not imply that the alternating group $A_n$ is simple for $n\geq 5$. To prove the simplicity of $A_n$ along these lines, we need to consider the following actions.

  • Let $A_n$ act on the set $X_3$ of 3-element subsets of $X$. For $x=\{a,b,c\}\in X_3$, consider the subgroup $T(x)$ of order 3 of $A_n$ consisting of the identity and the 3-cycles $(a\,b\,c)$ and $(a\,c\,b)$. (It is the alternating group on $x$, viewed as a subgroup of $A_n$.) The Iwasawa criterion applies provided the action is primitive, which presumably holds for $n>6$. Consequently, $A_n$ is simple for $n\geq 7$. However, if $n=6$, the stabilizer of $\{1,2,3\}$ in $A_6$ is not maximal, for a reason analogous to the one explained for $S_4$.
  • Let $A_n$ act on the set $X_4$ of 4-element subsets of $X$. For $x\in X_4$, we can consider the Klein subgroup $T(x)$ of the alternating group on $x$, viewed as a subgroup of $A_n$. I hope that the action is primitive, except for $n=8$, and this would prove that $A_n$ is simple for $n\geq 9$. This uses that double transpositions generate $A_n$.
  • One can improve the two preceding arguments a little bit. If $n=5$, we can have $A_n$ act on $X_2$, associating with $x$ the alternating group on the three remaining letters. Therefore, $A_5$ is simple (the action is primitive because the stabilizer of a pair is a subgroup of order 6 of $A_5$, and one can check that such a subgroup is maximal in $A_5$). Similarly, if $n=6$, we have $A_6$ act on $X_2$, associating with $x\in X_2$ the Klein subgroup corresponding to the four remaining letters. Provided one can prove that the stabilizer of a pair in $A_6$ is a maximal subgroup, this gives a proof that $A_6$ is simple!

The primitivity assertions seem to hold. In fact, maximal subgroups of $A_n$ and $S_n$ are classified by the O'Nan–Scott theorem. Those we're concerned with have type (a) in the notation of (Liebeck, Praeger, Saxl, 1987. “A Classification of the Maximal Subgroups of the Finite Alternating and Symmetric Groups.” Journal of Algebra 111 (2): 365–83. DOI:10.1016/0021-8693(87)90223-7), and according to Theorem 1 of that paper, the relevant subgroups are indeed maximal.

Formalization: where am I now?

I have already formalized the Iwasawa criterion in Lean (there's an ongoing pull request for this), as well as the result that the normal subgroups of $S_n$ are $\{e\}$, $A_n$ and $S_n$ for $n\geq 5$.

It remains to formalize the rest of the proof; the main difficulty will be in proving primitivity, plus finding some nice way of defining the maps $T$ so that their properties are visible.

I also wish to formalize the simplicity of the special linear group along these lines, and it should be ready for an application to orthogonal, unitary or symplectic groups as well.