Friday, October 13, 2023

A combinatorial proof of the Newton identities

Let $T_1,\dots,T_n$ be indeterminates. For all $m\geq 0$, denote by $S_m$ the $m$th elementary symmetric polynomial, given by
$$ S_m = \sum_{i_1 < \dots < i_m} T_{i_1}\dots T_{i_m}. $$
These are polynomials in $T_1,\dots,T_n$, and as their name suggests, these polynomials are symmetric, meaning that
$$ S_m (T_{\sigma(1)},\dots,T_{\sigma(n)}) = S_m (T_1,\dots,T_n) $$
for any permutation $\sigma$ of $\{1,\dots,n\}$. By definition, one has $S_0=1$ (there is exactly one family of integers, the empty one, and the empty product is equal to $1$) and $S_m=0$ for $m>n$ (there are no families of integers $1\leq i_1<\dots<i_m\leq n$).

A well-known theorem asserts that any symmetric polynomial in $T_1,\dots,T_n$ (with coefficients in a commutative ring $R$) can be written uniquely as a polynomial in $S_1,\dots,S_n$ (with coefficients in $R$). This is in particular the case for the Newton polynomials, defined for $p\geq 0$ by
$$ N_p = T_1^p + \dots + T_n^p. $$
In particular, $N_0=n$ and $N_1=S_1$. The next Newton polynomial can be computed by expanding the square:
$$ N_2 = T_1^2+\dots+T_n^2 = (T_1+\dots+T_n)^2 - 2 S_2 = S_1^2 - 2S_2. $$
These are the first two (or three) of a family of relations, called the Newton identities, that relate the polynomials $N_p$ and the polynomials $S_m$:
$$ \sum_{m=0}^{p-1} (-1)^m N_{p-m}\cdot S_m + (-1)^p p \cdot S_p = 0. $$
Using that the coefficient of $N_p$ is $S_0=1$, these identities make it possible to express inductively the polynomials $N_p$ in terms of $S_1,\dots,S_p$. If $p>n$, all terms with $m>n$ vanish. Using that the coefficient of $S_p$ is $(-1)^p p$, these formulas also show, conversely, that if $p!$ is invertible in the commutative ring $R$, then $S_1,\dots,S_p$ can be expressed in terms of $N_1,\dots,N_p$.

There are various proofs of these identities. Most of the ones I have seen are algebraic in nature, in that they rely on the relations between the roots of a polynomial and its coefficients, and/or on expansions in power series. The following proof, due to Doron Zeilberger (1984) (Discrete Math. 49, p. 319, doi:10.1016/0012-365X(84)90171-7), is purely combinatorial. I became aware of it because it is the one that is formalized in the proof assistant system Lean. (See there for the source code.)

We consider the set $S=\{1,\dots,n\}$ and define a set $X$ whose elements are the pairs $(A,k)$ satisfying the following properties:
  • $A$ is a subset of $S$;
  • $k$ is an element of $S$;
  • the cardinality of $A$ is less than or equal to $p$;
  • if $\mathrm{Card}(A)=p$, then $k\in A$.
Define a map $f\colon X\to X$ by the following rule: for $(A,k)$ in $X$, set
  • $f(A,k) = (A \setminus\{k\}, k)$ if $k\in A$;
  • $f(A,k) = (A\cup\{k\}, k)$ if $k\notin A$.
Write $f(A,k)=(A',k)$; observe that this is again an element of $X$. The first two conditions are obvious. In the first case, when $k\in A$, the cardinality of $A'$ is one less than the cardinality of $A$, hence is at most $p-1$, and the last condition is vacuous. In the second case, when $k\notin A$, the cardinality of $A$ is strictly smaller than $p$, because $(A,k)\in X$, so that the cardinality of $A'$ is at most $p$; moreover, $k\in A'$, hence the last condition holds.

Observe that $f\colon X\to X$ is an involution: $f\circ f=\mathrm{id}_X$. Indeed, with the same notation, one has $f(A,k)=(A',k)$. In the first case, $k\in A$, hence $k\notin A'$, so that $f(A',k)=(A'\cup\{k\},k)=(A,k)$; in the second case, one has $k\in A'$, hence $f(A',k)=(A'\setminus\{k\}, k)=(A,k)$ since $A'=A\cup\{k\}$ and $k\notin A$. Moreover, $f$ has no fixed point, because it switches pairs $(A,k)$ such that $k\in A$ and pairs such that $k\notin A$.

Zeilberger's proof of the Newton identities builds on a function $w\colon X\to R[T_1,\dots,T_n]$ such that $w(f(A,k))=-w(A,k)$ for all $(A,k)\in X$. Since $f$ has no fixed point, the sum
$$ \sum_{(A,k)\in X} w(A,k) $$
vanishes, since one can cancel each term $(A,k)$ against its image $f(A,k)$. Zeilberger's definition of the function $w$ is
$$ w(A,k) = (-1)^{\mathrm{Card}(A)} T_k^{p-\mathrm{Card}(A)}\cdot \prod_{a\in A} T_a. $$
It satisfies the desired relation: for $(A,k)\in X$ such that $k\notin A$, one has $f(A,k)=(A\cup\{k\}, k)$, so that
$$ w(f(A,k)) = (-1)^{\mathrm{Card}(A)+1} T_k^{p-\mathrm{Card}(A)-1} \prod_{a\in A\cup\{k\}} T_a = - (-1)^{\mathrm{Card}(A)} T_k^{p-\mathrm{Card}(A)} \prod_{a\in A} T_a = -w(A,k). $$
When $k\in A$, one has $f(A,k)=(A',k)$ with $k\notin A'$, and the formula applied to $(A',k)$ implies the one for $(A,k)$, because $f$ is an involution.

Now, in the relation $\sum_{(A,k)} w(A,k) =0$, we first sum over $A$ and then over $k$:
$$ \sum_{A\subset S} \sum_{k \mid (A,k) \in X} (-1)^{\mathrm{Card}(A)} T_k^{p-\mathrm{Card}(A)} \prod_{a\in A} T_a = 0 $$
and put together all those sets $A$ which have the same cardinality, say $m$. This gives
$$ \sum_{m=0}^n (-1)^m \sum_{\mathrm{Card}(A)=m} \prod_{a\in A} T_a \cdot \sum_{k \mid (A,k)\in X} T_k^{p-m} = 0. $$
Let us compute separately the contribution of each $m$. When $m>p$, no subset $A$ is allowed, so that the corresponding sum is zero. On the other hand, when $m<p$, for any subset $A$ such that $\mathrm{Card}(A)=m$, all pairs $(A,k)$ belong to $X$, so that the inner sum is $N_{p-m}$, and the whole term is $(-1)^m S_m N_{p-m}$. Finally, when $m=p$, the only $k$ that appear in the inner sum are the elements of $A$, and the corresponding term is $\sum_{k\in A} T_k^0=\mathrm{Card}(A)=p$; these terms contribute $(-1)^p p S_p$. Putting the contributions together gives exactly the Newton identity.
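
If you want to play with this argument without firing up Lean, here is a small Python sketch (mine, not the formalization alluded to above) which builds the set $X$ and Zeilberger's weight $w$ for concrete integer values of the $T_i$, and checks both that the total weight vanishes and that the Newton identity holds.

```python
from itertools import combinations

# A numerical sanity check of Zeilberger's argument, for small n and p.
# We pick arbitrary integer values for T_1, ..., T_n.
T = {1: 2, 2: -3, 3: 5, 4: 7}          # values of T_1, ..., T_4
n, p = len(T), 3

def prodset(A):
    """Product of the T_a over a in A."""
    r = 1
    for a in A:
        r *= T[a]
    return r

def S(m):
    """Elementary symmetric polynomial S_m evaluated at the chosen values."""
    return sum(prodset(A) for A in combinations(T, m))

def N(q):
    """Power sum N_q = T_1^q + ... + T_n^q."""
    return sum(t ** q for t in T.values())

# The set X of pairs (A, k) with |A| <= p, and k in A whenever |A| = p.
X = [(frozenset(A), k)
     for m in range(p + 1)
     for A in combinations(T, m)
     for k in T
     if m < p or k in A]

def w(A, k):
    """Zeilberger's weight (-1)^|A| * T_k^(p-|A|) * prod_{a in A} T_a."""
    return (-1) ** len(A) * T[k] ** (p - len(A)) * prodset(A)

# The total weight vanishes, because the involution f pairs off the terms.
assert sum(w(A, k) for (A, k) in X) == 0

# Grouping the terms by |A| gives exactly the Newton identity.
newton = sum((-1) ** m * N(p - m) * S(m) for m in range(p)) + (-1) ** p * p * S(p)
assert newton == 0
print("Newton identity verified for n =", n, "and p =", p)
```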

Friday, August 4, 2023

On soccer balls

This is a swift adaptation of a thread on Mastodon written after having read the beginning of Mara Goyet's column in yesterday's edition of Le Monde (please don't spoil the game by jumping to it now). So look at this picture below.
Does something strike you in it? The directions? The ball? The character? Nothing? 68 people voted: roughly one third found nothing, and two thirds saw that there's a problem with the ball. One nice remark from Sarah was that the ball is larger than the character! So, what's the problem with the ball? As Matt Parker observed in 2017, no soccer ball can be made following what the picture suggests: what we see is a combination of hexagons, some black, some white, and it is actually impossible to stitch hexagons together to make a ball. Or, the other way round, it is impossible to take a ball and draw a hexagonal pattern on it. On the other hand, it is quite easy to draw a hexagonal pattern on a plane, and many kitchens and bathrooms feature such a tiling. Here is a photograph (borrowed from somebody on Pinterest) of a 4th century BCE pavement (Porta Rosa, in the city of Elea-Velia, Italy).
But on a ball, nope. So how is a soccer ball made, if not with hexagons? — The point is that it is not made of hexagons only: it has 20 hexagons (so that the signpost above is slightly reminiscent of a soccer ball), but it also has 12 pentagons.
The standard “Telstar” soccer ball with its white hexagons and black pentagons. (Wikimedia, shine2010)
The whole structure forms what is called a truncated icosahedron. A regular icosahedron, as shown on the picture below (the Spinoza monument in Amsterdam), would have 20 triangular faces, such that five of them meet at each vertex. If one makes a cut at these vertices, each of them is replaced by a pentagon, and one gets the truncated icosahedron.

I should add the mathematical explanation for the impossibility of stitching hexagons into a ball. It comes from a formula due to Euler: if one stitches polygons into a ball in any way, some relation must hold: the number of vertices minus the number of edges plus the number of polygons is always equal to 2. Imagine you have some number of hexagons, say $n$. That makes $6n$ edges, but each time you make a stitch, you glue two edges together, hence $3n$ edges in the end. What about the vertices, namely the points where at least 3 hexagons are joined? If the picture is correct, then 3 hexagons meet at each vertex, so the number of vertices is $6n/3=2n$. Then Euler's formula says $2n - 3n + n = 2$, which is absurd. (A small numerical check of this counting argument is given after the references below.)

Now, is it really a necessity that exactly 3 hexagons meet at each vertex? Couldn't we have more complicated configurations? As a matter of fact, I'm not sure about the answer, and I can try to imagine strange configurations where hexagons meet in varying numbers at the vertices. But algebra seems to contradict this in the end. Let me try. Assume we have $s_3$ vertices where 3 hexagons meet, $s_4$ vertices where 4 hexagons meet, etc. The total number of vertices is then $s=s_3+s_4+\cdots$. The $n$ hexagons contribute $6n$ vertices, but those of type $s_3$ count for 3 of them, those of type $s_4$ for 4 of them, etc., so that $6n = 3s_3 + 4s_4 +\cdots = 3s + d$, where $d$ is defined by $d = s_4 + 2s_5+ \cdots$. In particular, we have $6n \geq 3s$, which means $2n \geq s$. Euler's formula still says $2 = s - 3n + n = s - 2n \leq 0$, still giving a contradiction.

And if you followed that proof, you may object that there is actually a way to stitch hexagons into a ball: it consists in stitching exactly 2 hexagons together. You get something completely flat, but if you blow enough air into it or fill it with wool, so that it gets round enough, you might feel that you got a ball. The picture made by the hexagons on such a ball is that of a hexagon drawn along the equator, each of the two hexagons covering one hemisphere.

Here are additional references:
  • A post on the BBC site, where the problem is explained roughly, up to the fact that the UK authorities refused to propose a correct design for the signpost after Matt Parker raised the issue.
  • A video by Matt Parker on his YouTube channel, @standupmaths, where he discusses how soccer balls are made, especially in the 2014 World Cup or the UK Premier League.
  • A text by Étienne Ghys on the general-audience website Images des mathématiques about the 2014 World Cup soccer ball (in French).
  • Euler's formula has a rich history, which is recounted in the Wikipedia page quoted above, but it is also the subject of the great book Proofs and Refutations by Imre Lakatos.
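
As a toy illustration of the counting argument above (my own check, not taken from these references), here is a short Python computation: a closed surface stitched from hexagons alone, three per vertex, would have Euler characteristic 0 and never 2, whereas the truncated icosahedron passes the test.

```python
from fractions import Fraction

def euler_characteristic(faces):
    """Euler characteristic V - E + F for a polyhedron in which every vertex
    is shared by exactly 3 faces (as on a soccer ball) and every edge by 2.
    `faces` maps a number of sides (5, 6, ...) to the number of such faces."""
    F = sum(faces.values())
    sides = sum(k * m for k, m in faces.items())
    E = Fraction(sides, 2)   # each edge is shared by 2 faces
    V = Fraction(sides, 3)   # each vertex is shared by 3 faces
    return V - E + F

# Hexagons only: V - E + F = 2n - 3n + n = 0, never 2, whatever n is.
for n in range(1, 10):
    assert euler_characteristic({6: n}) == 0

# The truncated icosahedron: 12 pentagons and 20 hexagons.
print(euler_characteristic({5: 12, 6: 20}))   # prints 2, as Euler requires
```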

Thursday, July 6, 2023

Electrostatics and rationality of power series

I would like to tell here a story that runs over some 150 years of mathematics, around the following question: given a power series $\sum a_n T^n$ (in one variable), how can you tell whether it comes from a rational function?

There are two possible motivations for such a question. One comes from complex function theory: you are given an analytic function and you wish to understand its nature — the simplest such functions being the rational ones, it is natural to wonder whether that happens or not (the next step would be to decide whether the function is algebraic, as in the problem of Hermann Amandus Schwarz (1843–1921)). Another motivation starts from the coefficients $(a_n)$, of which the power series is called the generating series; indeed, the generating series is a rational function if and only if the sequence of coefficients satisfies a linear recurrence relation.

At this stage, there are few tools to answer that question, besides a general algebraic criterion which essentially reformulates the property that the $(a_n)$ satisfy a linear recurrence relation. For any integers $m$ and $q$, let $D_m^q$ be the determinant of size $(q+1)$ given by
$$ D_m^q = \begin{vmatrix} a_m & a_{m+1} & \dots & a_{m+q} \\ a_{m+1} & a_{m+2} & \dots & a_{m+q+1} \\ \vdots & \vdots & & \vdots \\ a_{m+q} & a_{m+q+1} & \dots & a_{m+2q} \end{vmatrix}. $$
These determinants are called the Hankel determinants or (when $m=0$) the Kronecker determinants, after the two 19th century German mathematicians Hermann Hankel (1839–1873) and Leopold Kronecker (1823–1891). With this notation, the following properties are equivalent:

  1. The power series $\sum a_n T^n$ comes from a rational function;
  2. There is an integer $q$ such that $D_m^q=0$ for all large enough integers $m$;
  3. For all large enough integers $q$, one has $D^q_0=0$.
(The proof of this classic criterion is not too complicated, but the standard proof is quite clever. In his book Algebraic numbers and Fourier analysis, Raphaël Salem gives a proof which is arguably easier.)
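
To see the criterion at work numerically, here is a small sketch (the examples are my own; SymPy is assumed for exact determinants). For the Fibonacci numbers, whose generating series $T/(1-T-T^2)$ is rational, the Hankel determinants of size 3 all vanish; for the non-rational series $\sum T^{n^2}$, the first Kronecker determinants are visibly not all zero.

```python
from sympy import Matrix

def hankel_det(a, m, q):
    """Determinant D_m^q of the (q+1)x(q+1) Hankel matrix built on a_m, a_{m+1}, ..."""
    return Matrix(q + 1, q + 1, lambda i, j: a[m + i + j]).det()

# Fibonacci numbers: generating series T/(1 - T - T^2), a rational function.
fib = [0, 1]
while len(fib) < 40:
    fib.append(fib[-1] + fib[-2])

# A series that is not rational: the coefficients of sum T^(n^2).
squares = [1 if round(k ** 0.5) ** 2 == k else 0 for k in range(40)]

print([hankel_det(fib, m, 2) for m in range(10)])        # all zero: rationality detected
print([hankel_det(squares, 0, q) for q in range(1, 8)])  # not all zero
```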

This algebraic criterion is very general, but it is almost impossible to prove the vanishing of these determinants without further information, and it is at this stage that Émile Borel enters the story. Émile Borel (1871–1956) was not only a very important mathematician of the first half of the 20th century, through his work on analysis and probability theory; he was also a member of parliament, a minister of the Navy, and a member of the Résistance during WW2. He founded the French research institution CNRS and the Institut Henri Poincaré. He was also the first president of the Confédération des travailleurs intellectuels, an intellectual workers' union.

In his 1893 paper « Sur un théorème de M. Hadamard », Borel proves the following theorem:

Theorem. — If the coefficients $a_n$ are integers and if the power series $\sum a_n T^n$ “defines” a function (possibly with poles) on a disk centered at the origin and of radius strictly greater than 1, then that power series is a rational function.

Observe how these two hypotheses belong to two quite unrelated worlds: the first one sets the question within number theory, while the second one belongs to complex function theory. It looks almost like magic that these two hypotheses lead to the nice conclusion that the power series is a rational function.

It is also worth remarking that the second hypothesis is really necessary for the conclusion to hold, because rational functions define functions (with poles) on the whole complex plane. The status of the first hypothesis is more mysterious: while it is not necessary for the conclusion, it cannot simply be dropped. For example, the exponential series $\sum T^n/n!$ does define a function (without poles) on the whole complex plane, but it is not rational (it grows too fast at infinity).

However, the interaction of number-theoretical hypotheses with the question of the nature of power series was not totally unexplored at the time of Borel. For example, an 1852 theorem of the German mathematician Gotthold Eisenstein (Über eine allgemeine Eigenschaft der Reihen-Entwicklungen aller algebraischen Functionen) shows that when the coefficients $a_n$ of the expansion $\sum a_nT^n$ of an algebraic function are rational numbers, the denominators are not arbitrary: there is an integer $D\geq 1$ such that for all $n$, $a_n D^{n+1}$ is an integer. As a consequence of that theorem of Eisenstein, the exponential series and the logarithmic series cannot be algebraic.

It's always time somewhere on the Internet for a mathematical proof, so I have no excuse for not telling you *how* Émile Borel proved that result. He uses the above algebraic criterion, hence needs to prove that some of the determinants $D^q_m$ introduced above vanish (for some $q$ and for all $m$ large enough). His idea consists in observing that these determinants are integers, so that if you wish to prove that they vanish, it suffices to prove that they are smaller than one!

If non-mathematicians are still reading me, there's no mistake here: the main argument of the proof is the remark that a nonzero integer is at least one in absolute value. While this may sound like a trivial remark, it is something I like to call the main theorem of number theory, because it lies at the heart of almost all proofs in number theory.

So one has to bound determinants from above, and here Borel invokes the « théorème de M. Hadamard »: a determinant, being the volume of the parallelepiped formed by the rows, is smaller than the product of the norms of these rows, considered as vectors of Euclidean space. In 2-D, the area of a parallelogram is at most the product of the lengths of its edges! (Jacques Hadamard (1865–1963) is known for many extremely important results, notably the first proof of the Prime Number Theorem. It is funny that this elementary result went into the title of a paper!)
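
For the record, here is a tiny numerical illustration of Hadamard's inequality (NumPy assumed; the random integer matrix is my own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(6, 6)).astype(float)

lhs = abs(np.linalg.det(A))
rhs = np.prod(np.linalg.norm(A, axis=1))   # product of the Euclidean norms of the rows
print(lhs <= rhs + 1e-9, lhs, rhs)         # Hadamard: |det A| <= product of row norms
```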

But there's no hope that applying Hadamard's inequality to our initial matrix can be of any help, since that matrix has integer coefficients, so that all rows have norm at least one. So Borel starts making clever row combinations on the Hankel matrices that take into account the poles of the function that the given power series defines.

Basically, if $f=\sum a_nT^n$, there exists a polynomial $h=\sum c_mT^m$ such that the power series $g=fh = \sum b_n T^n$ defines a function without poles on some disk $D(0,R)$ with $R>1$. Using complex function theory (Cauchy's inequalities), this implies that the coefficients $b_n$ converge rapidly to 0, roughly as $R^{-n}$. For the same reason, the coefficients $a_n$ cannot grow too fast, at most as $r^{-n}$ for some $r>0$. The formula $g=fh$ shows that the coefficients $b_n$ are combinations of the $a_n$, so that the determinant $D_n^q$ is also equal to
$$ \begin{vmatrix} a_n & a_{n+1} & \dots & a_{n+q} \\ \vdots & & & \vdots \\ a_{n+p-1} & a_{n+p} & \dots & a_{n+p+q-1} \\ b_{n+p} & b_{n+p+1} & \dots & b_{n+p+q} \\ \vdots & & & \vdots \\ b_{n+q} & b_{n+q+1} & \dots & b_{n+2q} \end{vmatrix}. $$
Now, Hadamard's inequality implies that the determinant $D_n^q$ is (roughly) bounded above by $(r^{-n})^p (R^{-n})^{q+1-p}$: there are $p$ rows bounded above by some $r^{-n}$, and the next $q+1-p$ rows are bounded above by $R^{-n}$. This expression rewrites as $1/(r^pR^{q+1-p})^n$. Since $R>1$, we may choose $q$ large enough so that $r^p R^{q+1-p}>1$, and then, when $n$ grows to infinity, the determinant becomes smaller than 1 in absolute value. Being an integer, it vanishes!

The next chapter of this story happens in 1928, under the hands of the Hungarian mathematician George Pólya (1887–1985). Pólya had already written several papers exploring the interaction of number theory and complex function theory; one of them will even reappear later in this thread. In his paper “Über gewisse notwendige Determinantenkriterien für die Fortsetzbarkeit einer Potenzreihe”, he studied the analogue of Borel's question when the disk of radius $R$ is replaced by an arbitrary domain $U$ of the complex plane containing the origin, proving that if $U$ is big enough, then the initial power series is a rational function. It is however not so obvious how one should measure the size of $U$, and it is at this point that electrostatics enters the picture.

In fact, it is convenient to make an inversion: the assumption is that the series $\sum a_n / T^n$ defines a function (with poles) on the complement of a compact subset $K$ of the complex plane. Imagine that this compact set is made of metal, put at potential 0, and put a unit electric charge at infinity. According to the 2-D laws of electrostatics, this creates an electric potential $V_K$ which is identically $0$ on $K$ and behaves as $V_K(z)\approx \log(|z|/C_K)$ at infinity. Here, $C_K$ is a positive constant, the capacity of $K$.

Theorem (Pólya). — Assume that the $a_n$ are integers and that the series $\sum a_n/T^n$ defines a function (with poles) on the complement of $K$. If the capacity of $K$ is $<1$, then $\sum a_n T^n$ is rational.

To apply this theorem, it is important to know how to compute capacities. This was a classic theme of complex function theory and numerical analysis some 50 years ago. Indeed, what the electric potential does is solve the Laplace equation $\Delta V_K=0$ outside of $K$, with Dirichlet condition on the boundary of $K$.

In fact, the early times of complex analysis made a remarkable use of this fact. For example, it was by solving the Laplace equation that Bernhard Riemann proved the existence of meromorphic functions on “Riemann surfaces”, but analysis was not developed enough at that time (around 1860). In a stunningly creative move, Riemann imagines that his surface is partly made of metal, and partly of insulating material, and he deduces the construction of the desired function from the electric potential.

More recently, complex analysis and potential theory also had applications to fluid dynamics, for example to compute (at least approximately) the flow of air outside of an airplane wing. (I am not a specialist of this, but I'd guess the development of numerical methods that run on modern computers rendered these beautiful methods obsolete.)

The relation between the theorems of Borel and Pólya is that the capacity of a disk is its radius. This can be seen from the fact that $V(z)=\log(|z|/R)$ solves the Laplace equation with Dirichlet condition outside of the disk of radius $R$.

A few other capacities have been computed, not too many, in fact, because it appears to be a surprisingly difficult problem. For example, the capacity of an interval is a fourth of its length.

Pólya's proof is similar to Borel's, but considers the Kronecker determinants in place of Hankel's. However, the linear combinations that allow one to show that this determinant is small are not as explicit as in Borel's proof. They follow from another interpretation of the capacity, introduced by the Hungarian–Israeli mathematician Michael Fekete (1886–1957; born in what was then Austria–Hungary, now Serbia, he emigrated to Palestine in 1928).

You know that the diameter $d_2(K)$ of $K$ is the upper bound of all distances $|x-y|$ where $x,y$ are arbitrary points of $K$. Now, for an integer $n\geq 2$, consider the upper bound $d_n(K)$ of all products of distances $\prod_{i\neq j}|x_j-x_i|^{1/n(n-1)}$ where $x_1,\dots,x_n$ are arbitrary points of $K$. It is not so hard to prove that the sequence $d_n(K)$ decreases with $n$, and the limit $\delta(K)$ of that sequence is called the transfinite diameter by Fekete.

Proposition.δ(K)=CK \delta(K)= C_K.

This makes a link between capacity theory and another theme of complex function theory, namely the theory of best approximation, which ends up in Pólya's proof: the adequate linear combination for the $n$th row is given by the coefficients of the monic polynomial of degree $n$ which has the smallest least upper bound on $K$.

If all this is of some appeal to you, there's a wonderful little book by Thomas Ransford, Potential Theory in the Complex Plane, which I find quite readable (say, from 3rd or 4th year of math studies on).

In the forthcoming episodes, I'll discuss two striking applications of the theorems of Borel and Pólya: the proof by Bernhard Dwork (in 1960) of a conjecture of Weil, and a new proof (in 1987) by Jean-Paul Bézivin and Philippe Robba of the transcendence of the numbers $e$ and $\pi$, two results initially proven by Charles Hermite and Ferdinand von Lindemann in 1873 and 1882.

Friday, February 17, 2023

Associated prime ideals

\gdef\ann{\mathop{\mathrm{ann}}} \gdef\Ass{\mathop{\mathrm{Ass}}}\gdef\Spec{\mathop{\mathrm{Spec}}}

I would like to go back to a quite delicate question of commutative algebra, that of associated prime ideals of modules. In most textbooks (Bourbaki, Matsumura…), this concept is considered for modules over a noetherian ring, while it is also necessary to consider it in a greater generality for some applications in algebraic geometry. For my book, (Mostly) commutative algebra (Springer Nature, 2021), I preferred to introduce the general concept (§6.5), because I observed that the initial proofs are in fact easier. In yesterday's class (Cohomology of coherent sheaves, 2nd year of Master course at Université Paris Cité), some remarks of a student, Elias Caeiro, helped me simplify two steps of the treatment I proposed in my book.

Definition.Let AA be a ring and let MM be an AA-module. Say that a prime ideal PP of AA is associated to MM if there exists an element mMm\in M such that PP is minimal among all prime ideals containing annA(m)\ann_A(m).
We write AssA(M)\Ass_A(M) (sometimes spelt out as “assassin”) for the set of all associated prime ideals of MM.

(Here, $\ann_A(m)$ is the annihilator of $m$, the ideal of all $a\in A$ such that $am=0$.)
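
For a concrete toy case (my own example, not from the text above), take $A=\mathbf Z$ and $M=\mathbf Z/6\mathbf Z$. The following Python lines compute, for each nonzero $m\in M$, the minimal primes over $\ann(m)$, and recover $\Ass_{\mathbf Z}(\mathbf Z/6\mathbf Z)=\{(2),(3)\}$ (SymPy is assumed, only for factoring).

```python
from math import gcd
from sympy import primefactors

N = 6   # M = Z/NZ as a module over A = Z

def annihilator(m):
    """Generator of ann_Z(m) for the class of m in Z/NZ: the ideal (N / gcd(N, m))."""
    return N // gcd(N, m)

# The minimal primes over a nonzero ideal (d) of Z are exactly the prime factors of d.
associated = set()
for m in range(1, N):
    associated.update(primefactors(annihilator(m)))

print(sorted(associated))   # [2, 3]: Ass_Z(Z/6Z) = {(2), (3)}
```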

There is a geometric way to interpret this definition: it means that in the spectrum $\Spec(A)$, the irreducible closed set $V(P)$ (of which $P$ is the generic point) is an irreducible component of $V(\ann_A(m))$. Thanks to this remark, associated prime ideals are compatible with localisation:
$$ \Ass_{S^{-1}A}(S^{-1}M) = \Ass_A(M) \cap \Spec(S^{-1}A), $$
where $\Spec(S^{-1}A)$ is identified with the subset of $\Spec(A)$ consisting of prime ideals $P$ which are disjoint from $S$. In particular, $P$ is associated to $M$ if and only if the maximal ideal $PA_P$ of the local ring $A_P$ is associated to the module $M_P$.

Here is what the associated prime ideals mean, from the point of view of module theory.
Proposition. — Let $a\in A$.
a) The multiplication by $a$ is injective in $M$ if and only if $a$ does not belong to any associated prime ideal of $M$.
b) The localized module $M_a$ is zero if and only if $a$ belongs to all associated prime ideals of $M$.
c) In particular, $M=0$ if and only if $\Ass_A(M)=\emptyset$.

Proof. — a) If $a$ belongs to the associated prime ideal $P$, then $a$ belongs to the associated prime ideal $PA_P$ of $M_P$, which means that there exists $m\in M$ such that $PA_P$ is the only prime ideal containing $\ann_{A_P}(m)$. Consequently, $a$ is nilpotent modulo $\ann_{A_P}(m)$, and there exist $n\geq 0$ and $b\in A\setminus P$ such that $a^nb\in\ann_A(m)$. Take a minimal such $n$. Since $b\notin P$, one has $n\geq 1$; then $a^{n-1}b m\neq0$, while $a\cdot a^{n-1}bm=0$, and the homothety $(a)_M$ is not injective. Conversely, if $(a)_M$ is not injective, take $m\neq0$ in $M$ such that $am=0$; the annihilator $\ann_A(m)$ is not equal to $A$, hence $V(\ann_A(m))\neq \emptyset$; take an irreducible component of this closed subset — equivalently, a minimal prime ideal $P$ among those containing $\ann_A(m)$; one has $a\in \ann_A(m)$, hence $a\in P$.
b) follows from c), applied to the localized module $M_a$ and using the compatibility of associated prime ideals with localisation.
c) The module $M$ is zero if and only if the multiplication by $0$ is injective on $M$. By a), this is equivalent to the fact that $\Ass_A(M)$ is empty.

Corollary.A prime ideal PP is in the support of MM if and only if it contains some associated prime ideal.
The prime ideal PP belongs to the support of MM if and only if MP0M_P\neq0, if and only if AssAP(MP)\Ass_{A_P}(M_P) is not empty, if and only if there exists an associated prime ideal of MM which belongs to Spec(AP)\Spec(A_P), that is, is contained in PP.

For noetherian rings, one has the following characterization of associated prime ideals, which is usually taken as their definition.

Theorem.Let AA be a noetherian ring and MM be an AA-module. A prime ideal PP of AA is associated to MM if and only if there exists mMm\in M such that P=annA(m)P=\ann_A(m).
If $P=\ann_A(m)$, then $P$ is associated to $M$. Conversely, let $m\in M$ and let $P$ be a minimal prime ideal of $A$ among those containing $\ann_A(m)$. We first assume that $A$ is local with maximal ideal $P$; then $P$ is the only prime ideal of $A$ that contains $\ann_A(m)$, which implies that any element of $P$ is nilpotent modulo $\ann_A(m)$. Since $P$ is finitely generated (because $A$ is noetherian), there exists an integer $n$ such that $P^n\subseteq \ann_A(m)$. Take a minimal such $n$. Since $\ann_A(m)\subseteq P$, one has $n\geq 1$; then $P^{n-1}\not\subseteq\ann_A(m)$, so that there exists $b\in P^{n-1}$ such that $bm\neq0$. Then $ab\in P^n$ for every $a\in P$, so that $P\subseteq \ann_A(bm)$, and $\ann_A(bm)\subseteq P$ because $bm\neq0$. Consequently, $P=\ann_A(bm)$. In the general case, we use the case of a local ring to obtain $m\in M$ such that $\ann_{A_P}(m/1)=PA_P$. Consequently, $\ann_A(m)\subseteq P$, and for every $a\in P$, there exists $b\notin P$ such that $abm=0$. Using that $P$ is finitely generated, one finds $b\notin P$ such that $abm=0$ for every $a\in P$; then $\ann_A(bm)=P$, as was to be shown.
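
Continuing the toy example over the noetherian ring $\mathbf Z$ (again my own illustration), one can check for $M=\mathbf Z/12\mathbf Z$ that the associated primes $(2)$ and $(3)$ are indeed exact annihilators of suitable elements, as the theorem asserts:

```python
from math import gcd

N = 12   # M = Z/12Z as a module over the noetherian ring Z

def ann(m):
    """Generator of ann_Z(m) for the class of m in Z/NZ."""
    return N // gcd(N, m)

# ann(1) = (12); its minimal primes are (2) and (3), so both are associated to M.
# The theorem says they must even be exact annihilators of suitable elements:
print([m for m in range(1, N) if ann(m) == 2])   # [6]:     ann(6) = (2)
print([m for m in range(1, N) if ann(m) == 3])   # [4, 8]:  ann(4) = (3)
```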

From that point on, both presentations converge. One deduces from the preceding theorem that if $A$ is noetherian and $M$ is finitely generated, there exists a composition series $0=M_0\subseteq M_1 \subseteq \dots \subseteq M_n=M$ with successive quotients $M_k/M_{k-1}$ of the form $A/P_k$, for some prime ideals $P_k$ of $A$, and then $\Ass_A(M)$ is contained in $\{P_1,\dots,P_n\}$, in view of the following lemma. In particular, $\Ass_A(M)$ is finite.

Lemma.Let MM be an AA-module and let NN be a submodule of MM; then AssA(N)AssA(M)AssA(N)AssA(M/N) \Ass_A(N)\subseteq \Ass_A(M)\subseteq \Ass_A(N)\cup \Ass_A(M/N).
The first inclusion $\Ass_A(N)\subseteq \Ass_A(M)$ follows from the definition. Let us prove the second one. Let $P\in\Ass_A(M)$ and let $m\in M$ be such that $P$ is a minimal prime ideal of $A$ among those containing $\ann_A(m)$. Let $m'$ be the image of $m$ in $M/N$. If $P$ contains $\ann_A(m')$, then $P$ is also minimal among the prime ideals containing $\ann_A(m')$, hence $P\in\Ass_A(M/N)$. Otherwise, there exists $b\in \ann_A(m')$ such that $b\notin P$. Let us prove that $P$ is minimal among the prime ideals containing $\ann_A(bm)$. First of all, let $a\in\ann_A(bm)$; then $abm=0$, hence $ab\in\ann_A(m)\subseteq P$, hence $a\in P$ since $b\notin P$; this shows $\ann_A(bm)\subseteq P$. Since $\ann_A(m)\subseteq\ann_A(bm)$, it also follows that $P$ is minimal among the prime ideals containing $\ann_A(bm)$. Since $b\in\ann_A(m')$, one has $bm'=0$, hence $bm\in N$, and $P\in\Ass_A(N)$.

Sunday, January 1, 2023

The Klein group, the centralizer of a permutation, and its relation with the alternating group

The following reflection came out of my irrepressible need to understand why the 3 double transpositions in $\mathfrak S_4$, together with the identity, form a group $V$. Of course, one might just say: “they are stable under multiplication, as one sees by computing the 4·3/2 = 6 different products”, but who makes this computation anyway? And since I wanted not only to understand this, but to explain it to Lean, I needed an argument that could actually be carried out, for real. So here is an argument that requires no computation, besides the one that says that there are 3 double transpositions.

Prop.The subgroup of S4\mathfrak S_4 generated by the 3 double transpositions is the unique 2-sylow subgroup of A4\mathfrak A_4. In particular, it has order 4 and consists of these 3 double transpositions and of the identity.
Proof. — Let VV be the subset of S4\mathfrak S_4 consisting of these 3 double transpositions and of the identity. Let SS be a 2-sylow subgroup in A4\mathfrak A_4.
We first prove SVS \subseteq V. The subgroup SS has order 4. Let gSg\in S. The order of gg divides 4, so its cycles have lengths 1, 2 or 4. If there were one cycle of length 4, then gg would be that cycle, hence of odd sign. Consequently, either g=1g=1, or gg has a cycle of length 2, and then there must be a second because gg is even. Consequently, SVS\subseteq V, as claimed.
Since 4=Card(S)=Card(V)4=\operatorname{\rm Card}(S)=\operatorname{\rm Card}(V), this shows that S=VS=V, hence S=VS=\langle V\rangle.
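
For the skeptical reader, here is a brute-force confirmation in Python (permutations written as tuples, composition coded from scratch; this is of course not the Lean proof): there are exactly 3 double transpositions in $\mathfrak S_4$, and together with the identity they are closed under composition.

```python
from itertools import permutations

def compose(p, q):
    """Composition p ∘ q of two permutations of {0, 1, 2, 3}, written as tuples."""
    return tuple(p[q[i]] for i in range(4))

def cycle_type(p):
    """Sorted tuple of the orbit lengths of p."""
    seen, lengths = set(), []
    for i in range(4):
        if i not in seen:
            j, length = i, 0
            while j not in seen:
                seen.add(j)
                j, length = p[j], length + 1
            lengths.append(length)
    return tuple(sorted(lengths))

S4 = list(permutations(range(4)))
identity = tuple(range(4))
double_transpositions = [p for p in S4 if cycle_type(p) == (2, 2)]
V = set(double_transpositions) | {identity}

assert len(double_transpositions) == 3
assert all(compose(p, q) in V for p in V for q in V)   # V is closed, hence a subgroup
print("V is the Klein four-group inside A4")
```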

At this point, we still need to understand why there are 3 double transpositions. More generally, I wanted to prove a formula for the number of permutations in $\mathfrak S_n$ of a given orbit type. The orbit type of a permutation $g$ is a multiset of strictly positive integers with sum $n$, given by the cardinalities of the orbits of $g$ on $\{1,\dots,n\}$. We write it as $1^{n_1} 2^{n_2}\dots r^{n_r}$, meaning that $g$ has $n_1$ orbits of length $1$ (fixed points), $n_2$ orbits of cardinality $2$, etc., so that $n= \sum n_i i$. Let $\mathscr O_g$ be the set of orbits of $g$. The action of $g$ on a given orbit $c$ coincides with that of a circular permutation of order the length $\ell(c)$ of this orbit; when it is nontrivial, such a permutation will be called a cycle of $g$. The supports of these cycles are pairwise disjoint, so that these cycles commute, and their product is exactly $g$. In fact, this is the only way of writing $g$ as a product of cycles with pairwise disjoint supports. (By convention, the identity is not a cycle.)

Theorem.There are exactly N(1n1rnr)=n!1n1rnr n1!nr! N(1^{n_1}\dots r^{n_r}) = \frac{n!}{1^{n_1}\dots r^{n_r} n_1!\dots n_r!} permutations with orbit type 1n12n2rnr1^{n_1} 2^{n_2}\dots r^{n_r}.

A standard proof of this result goes as follows. Write the decomposition of such a permutation $g$ into cycles with disjoint supports as $g=(\cdot)\dots (\cdot)(\cdot,\cdot)\dots(\cdot,\cdot,\dots)$, leaving blank spaces for the entries of the cycles (and, contradicting our convention, allowing for cycles of length 1…). There are $n!$ ways to fill these spaces with the $n$ distinct integers between $1$ and $n$, but some of them will give rise to the same permutation. Indeed, the entries in a cycle of length $s$ only count up to a circular permutation, so that we need to divide the result by $1^{n_1}\dots r^{n_r}$. Moreover, we can switch the order of the cycles of a given length, hence we also need to divide that result by $n_s!$ (the number of ways of switching the various cycles of length $s$), for all possible lengths $s$.

This is reasonably convincing, but one could wish for something more precise, both in the proof and in the statement. In fact, in the preceding formula, the numerator $n!$ is the order of $\mathfrak S_n$. Since all permutations with a given orbit type are conjugate in $\mathfrak S_n$, the left hand side appears as the cardinality of the orbit of a permutation $g$ of that orbit type, so that the denominator has to be equal to the cardinality of the stabilizer of this permutation under the action by conjugation. Therefore, a more precise proof of this formula could run by elucidating the structure of this centralizer. This is also interesting once one wishes to relativize the result to the alternating group $\mathfrak A_n$ in order to obtain a formula for the cardinality of the various conjugacy classes in $\mathfrak A_n$.
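
Before elucidating the centralizer, here is a quick empirical check of the counting formula (pure Python, small $n$, my own script): enumerate $\mathfrak S_n$, group the permutations by orbit type, and compare with $n!/(\prod_i i^{n_i} n_i!)$.

```python
from itertools import permutations
from math import factorial
from collections import Counter

def orbit_type(p):
    """Sorted tuple of the orbit (cycle) lengths of a permutation of {0, ..., n-1}."""
    n, seen, lengths = len(p), set(), []
    for i in range(n):
        if i not in seen:
            j, length = i, 0
            while j not in seen:
                seen.add(j)
                j, length = p[j], length + 1
            lengths.append(length)
    return tuple(sorted(lengths))

def predicted(n, lengths):
    """n! / (prod_i i^(n_i) * n_i!) for the orbit type given by the list of lengths."""
    mult = Counter(lengths)
    denom = 1
    for i, ni in mult.items():
        denom *= i ** ni * factorial(ni)
    return factorial(n) // denom

n = 6
counts = Counter(orbit_type(p) for p in permutations(range(n)))
assert all(counts[t] == predicted(n, t) for t in counts)
print("formula checked for all orbit types in S_%d" % n)
```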

Let us fix a permutation $g\in\mathfrak S_n$ with orbit type $1^{n_1}\dots r^{n_r}$. The stabilizer of $g$ under the action by conjugation is its centralizer $Z_g$, the subgroup of all $k\in\mathfrak S_n$ which commute with $g$.

We first define a morphism of groups
$$ \tau \colon Z_g \to \mathfrak S_{n_1}\times \mathfrak S_{n_2}\times\dots\times \mathfrak S_{n_r}. $$
Let $\mathscr O_g$ be the set of orbits of $g$; this is a set with cardinality $n_1+n_2+\dots+n_r$. Restricted to one orbit, the action of $g$ coincides with that of a circular permutation of this orbit (which fixes the complementary subset); these circular permutations have disjoint supports, hence they commute pairwise and their product is equal to $g$. For $c\in\mathscr O_g$, we write $\ell(c)$ for its cardinality; this is also the order of the cycle of $g$ defined by this orbit. If $k\in Z_g$, then $kgk^{-1}=g$. Consequently, the action of $k$ permutes the orbits of $g$, respecting their cardinalities. This defines the desired group morphism $\tau$ from $Z_g$ to a product of permutation groups $\mathfrak S_{n_1}\times \dots\times \mathfrak S_{n_r}$.

This morphism τ\tau is surjective.
Indeed, given permutations $\sigma_1$ of the set of fixed points of $g$, $\sigma_2$ of the set of orbits of length 2, etc., we construct $k_\sigma\in Z_g$ such that $\tau(k_\sigma)=(\sigma_1,\dots,\sigma_r)$. We fix a point $a_c$ in each orbit $c$ and decide that $k_\sigma(a_c)=a_{\sigma_i(c)}$ if $c$ has length $i$. The formula $k_\sigma g=g k_\sigma$ imposes $k_\sigma (g^n a_c)=g^n a_{\sigma_i(c)}$ for all $n\in\mathbf Z$, and it remains to check that this formula gives a well defined element of $Z_g$. In fact, this formula defines a group-theoretic section of $\tau$.

What is the kernel of this morphism τ\tau?
If $\tau(k)=1$, then $k$ fixes every orbit $c\in\mathscr O_g$. Since $kg=gk$, we obtain that on each orbit $c$, $k$ coincides with some power of the corresponding cycle, which has order $\ell(c)$. We thus obtain an isomorphism
$$ \ker(\tau) \simeq \prod_{c\in\mathscr O_g} (\mathbf Z/\ell(c)\mathbf Z). $$

To compute the cardinality of $Z_g$, it is now sufficient to compute those of $\operatorname{im}(\tau)$ and $\ker(\tau)$, and this gives the formula
$$ \operatorname{Card} (Z_g) = \operatorname{Card} (\ker(\tau)) \cdot \operatorname{Card} (\operatorname{im}(\tau)) = 1^{n_1}\dots r^{n_r}\, n_1! \dots n_r!, $$
as was to be shown.
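
This cardinality formula can also be confirmed by brute force for small $n$; here is a sketch (my own check) for a permutation of orbit type $2^1 3^1$ in $\mathfrak S_5$, whose centralizer should have $2\cdot 3=6$ elements.

```python
from itertools import permutations

def compose(p, q):
    """Composition p ∘ q of permutations of {0, ..., n-1}, written as tuples."""
    return tuple(p[q[i]] for i in range(len(p)))

n = 5
g = (1, 0, 3, 4, 2)          # orbit type 2^1 3^1: the product (0 1)(2 3 4)
centralizer = [k for k in permutations(range(n)) if compose(k, g) == compose(g, k)]

# The formula predicts Card(Z_g) = 2^1 * 1! * 3^1 * 1! = 6.
print(len(centralizer))      # 6
```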

One interest of this argument is that it can be pushed further to understand the structure of the conjugacy classes in the alternating group $\mathfrak A_n$. The case $n\leq 1$ is uninteresting, hence we assume $n\geq 2$. Then $\mathfrak A_n$ has index 2 in $\mathfrak S_n$, and the formulas
$$ \operatorname{Card}((g)_{\mathfrak A_n}) = \frac{\operatorname{Card}(\mathfrak A_n)}{\operatorname{Card}(Z_g \cap \mathfrak A_n)} \quad\text{and}\quad \operatorname{Card}((g)_{\mathfrak S_n}) = \frac{\operatorname{Card}(\mathfrak S_n)}{\operatorname{Card}(Z_g)} $$
for the cardinalities of the conjugacy classes $(g)_{\mathfrak A_n}$ and $(g)_{\mathfrak S_n}$ imply that both are equal if and only if $Z_g$ is not contained in $\mathfrak A_n$; otherwise, the conjugacy class $(g)_{\mathfrak S_n}$ is the disjoint union of $(g)_{\mathfrak A_n}$ and of the conjugacy class $(g')_{\mathfrak A_n}$ of a permutation $g'$ which is conjugate to $g$ in $\mathfrak S_n$ but not in $\mathfrak A_n$, and both have the same cardinality.

Examples of this phenomenon are classical. For example, the 5-cycles in $\mathfrak S_5$ are conjugate, but they constitute two distinct conjugacy classes under $\mathfrak A_5$. Even more elementary, the 3-cycles $(1\,2\,3)$ and $(1\,3\,2)$ are conjugate in $\mathfrak S_3$, but they are not conjugate in $\mathfrak A_3$ since that group is commutative!

So let us use our description of ZgZ_g to give a full description of this phenomenon.

As a first step, when is $\ker(\tau)$ contained in $\mathfrak A_n$? We have seen that $\ker(\tau)$ is generated by the cycles of $g$. Consequently, $\ker(\tau)$ is contained in $\mathfrak A_n$ if and only if all of these cycles belong to $\mathfrak A_n$, which means that their lengths are odd.

We assume that this condition holds, so that $\ker(\tau)\subseteq \mathfrak A_n$, and we now work on the image of $\tau$. Its surjectivity was proved by means of an explicit section $\sigma\mapsto k_\sigma$. Given the preceding condition that $\ker(\tau)\subseteq \mathfrak A_n$, a necessary and sufficient condition for the inclusion $Z_g\subseteq \mathfrak A_n$ is that the image of this section consist of even permutations. This section is a morphism of groups, so it suffices to understand the sign of $k_\sigma$ when $\sigma$ consists of a cycle $(c_1,\dots,c_s)$ in $\mathfrak S_{n_i}$ and is trivial on the other factors. Then $\ell(c_1)=\dots=\ell(c_s)$, by definition of $\sigma$. The formula $k_\sigma(g^n a_c)=g^n a_{\sigma(c)}$ shows that the nontrivial cycles of $k_\sigma$ are of the form $(g^n a_{c_1},\dots, g^n a_{c_s})$; they all have the same length, $s$, and there are $\ell(c_1)$ of them. Consequently, the sign of $k_\sigma$ is equal to $(-1)^{(s-1)\ell(c_1)}=(-1)^{s-1}$ since $\ell(c_1)$ is odd. This proves that the sign of $k_\sigma$ is equal to the sign of $\sigma$. In addition to the condition that the orbits of $g$ have odd cardinalities, a necessary and sufficient condition for the image of $\sigma\mapsto k_\sigma$ to be contained in $\mathfrak A_n$ is thus that all the symmetric groups $\mathfrak S_{n_i}$ coincide with their alternating subgroups, that is, $n_i\leq 1$ for all $i$. We can now conclude:

Theorem. — Let $1^{n_1}\dots r^{n_r}$ be a partition of $n$.

  1. If $n_i=0$ for every even $i$, and $n_i\leq 1$ for all $i$, then there exist two permutations in $\mathfrak S_n$ with orbit type $1^{n_1}\dots r^{n_r}$ which are not conjugate in $\mathfrak A_n$.
  2. Otherwise, any two permutations in $\mathfrak S_n$ with that orbit type are conjugate in $\mathfrak A_n$.

We can check the initial example of two 5-cycles in $\mathfrak S_5$ which are not conjugate in $\mathfrak A_5$. Their orbit type is $5^1$: the only length that appears is 5, hence odd, and it has multiplicity $1$. In fact, this is the only orbit type in $\mathfrak S_5$ where this phenomenon appears!
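
Finally, here is a brute-force verification of the theorem for $n=6$ (my own script): for each orbit type of an even permutation, the $\mathfrak S_n$-conjugacy class splits in $\mathfrak A_n$ exactly when all the lengths are odd and pairwise distinct.

```python
from itertools import permutations

def compose(p, q):
    """Composition p ∘ q, acting on {0, ..., n-1}."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    inv = [0] * len(p)
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def orbit_type(p):
    n, seen, lengths = len(p), set(), []
    for i in range(n):
        if i not in seen:
            j, length = i, 0
            while j not in seen:
                seen.add(j)
                j, length = p[j], length + 1
            lengths.append(length)
    return tuple(sorted(lengths))

def is_even(p):
    # sign of p is (-1)^(n - number of orbits)
    return (len(p) - len(orbit_type(p))) % 2 == 0

n = 6
Sn = list(permutations(range(n)))
An = [p for p in Sn if is_even(p)]

def conjugacy_class(g, group):
    return {compose(compose(k, g), inverse(k)) for k in group}

# For each orbit type of an even permutation, compare the splitting behaviour
# with the criterion of the theorem: all lengths odd and pairwise distinct.
for t in sorted({orbit_type(g) for g in An}):
    g = next(g for g in An if orbit_type(g) == t)
    splits = len(conjugacy_class(g, An)) < len(conjugacy_class(g, Sn))
    criterion = all(l % 2 == 1 for l in t) and len(set(t)) == len(t)
    assert splits == criterion
    print(t, "splits in A_6" if splits else "does not split")
```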