Friday, April 25, 2025

Yet another proof of the Weierstrass approximation theorem

Browsing through my Zotero database, I fell upon a paper by Harald Kuhn in which he proposes an elementary proof of the Weierstrass approximation theorem. The proof is indeed neat, so here it is.

Theorem.Let $f\colon[0;1]\to\mathbf R$ be a continuous function and let $\varepsilon$ be a strictly positive real number. There exists a polynomial $P\in\mathbf R[T]$ such that $\left|P(x)-f(x)\right|<\varepsilon$ for every $x \in[0;1]$.

The proof steps out of the world of continuous functions: it considers the Heaviside function $H$ on $\mathbf R$, defined by $H(x)=0$ for $x<0$ and $H(x)=1$ for $x\geq 0$, and constructs a polynomial approximation of $H$.

Lemma.There exists a sequence $(P_n)$ of polynomials which are increasing on $[-1;1]$ and which, for every $\delta>0$, converge uniformly to the function $H$ on $[-1;1]\setminus [-\delta;\delta]$.

For $n\in\mathbf N$, consider the polynomial $Q_n=(1-T^n)^{2^n}$. On $[0;1]$, it defines a decreasing function, with $Q_n(0)=1$ and $Q_n(1)=0$.

Let $q$ be a real number such that $0\leq q<1/2$. One has $$ Q_n(q) = (1-q^n)^{2^n} \geq 1-2^n q^n $$ in view of the inequality $(1-t)^m \geq 1-mt$, valid for $t\in[0;1]$ and $m\in\mathbf N$, applied with $m=2^n$ and $t=q^n$. Since $2q<1$, we see that $Q_n(q)\to 1$. Since $Q_n$ is decreasing, one has $Q_n(q)\leq Q_n(x)\leq Q_n(0)=1$ for every $x\in[0;q]$, which shows that $Q_n$ converges uniformly to the constant function $1$ on the interval $[0;q]$.

Let now $q$ be such that $1/2<q<1$. Then $$ \frac1{Q_n(q)} = \frac1{(1-q^n)^{2^n}} = \left(1 + \frac{q^n}{1-q^{n}}\right)^{2^n} \geq 1 + \frac{2^nq^n}{1-q^{n}}. $$ Since $2q>1$, the right-hand side tends to $+\infty$, so that $Q_n(q)\to 0$. Since $Q_n$ is decreasing, one has $0=Q_n(1)\leq Q_n(x)\leq Q_n(q)$ for every $x\in[q;1]$, so that $Q_n$ converges uniformly to the constant function $0$ on the interval $[q;1]$.

Make an affine change of variables and set $P_n = Q_n((1-T)/2)$. The function defined by $P_n$ on $[-1;1]$ is increasing; for any real number $\delta$ such that $\delta>0$, it converges uniformly to $0$ on $[-1;-\delta]$, and it converges uniformly to $1$ on $[\delta;1]$. This concludes the proof of the lemma.
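Here is a quick numerical illustration of the lemma in Python (this is of course not part of the proof; the choice $n=16$ and the sample points are arbitrary):

```python
# Sanity check of the lemma: Q_n = (1 - T^n)^(2^n) is close to 1 on [0, 1/2)
# and close to 0 on (1/2, 1], and P_n = Q_n((1 - T)/2) approximates the
# Heaviside function H on [-1, 1] away from 0.

def Q(n, x):
    """Q_n(x) = (1 - x^n)^(2^n), decreasing from 1 to 0 on [0, 1]."""
    return (1 - x**n) ** (2**n)

def P(n, x):
    """P_n(x) = Q_n((1 - x)/2), increasing on [-1, 1]."""
    return Q(n, (1 - x) / 2)

n = 16
print(Q(n, 0.25))   # close to 1
print(Q(n, 0.75))   # close to 0
print(P(n, 0.5))    # close to 1: P_n approximates H = 1 on [delta, 1]
print(P(n, -0.5))   # close to 0: P_n approximates H = 0 on [-1, -delta]
```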

We can now turn to the proof of the Weierstrass approximation theorem. Let $f$ be a continuous function on $[0;1]$. Replacing $f$ by $f-f(0)$, we may assume that $f(0)=0.$

The first step, which follows from Heine's uniform continuity theorem, consists in noting that there exists a uniform approximation of $f$ by a function of the form $ F(x)=\sum_{m=1}^N a_m H(x-c_m)$, where $a_1,\dots,a_N$ are real numbers and $c_1,\dots,c_N$ are points of $[0;1].$ Namely, Heine's theorem implies that there exists $\delta>0$ such that $|f(x)-f(y)|<\varepsilon$ if $|x-y|<\delta$. Choose $N$ such that $1/N<\delta$ and set $c_m=m/N$; then define $a_1,\dots,a_N$ so that $a_1=f(c_1)$, $a_1+a_2=f(c_2)$, etc. It is easy to check that $|F(x)-f(x)|\leq \varepsilon$ for every $x\in[0;1]$. Moreover, since $a_m=f(c_m)-f(c_{m-1})$ (with $c_0=0$ and $f(c_0)=0$), one has $\lvert a_m\rvert\leq\varepsilon$ for all $m$.
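To illustrate this first step, here is a small Python sketch (the function $f$, the value of $\varepsilon$ and the choice $N=32$ are made up for the example; the Lipschitz bound $2\pi$ of the sine plays the role of Heine's theorem):

```python
import math

# Approximate a continuous f with f(0) = 0 by a step function F made of
# Heaviside steps a_m * H(x - c_m), as in the text.
f = lambda x: math.sin(2 * math.pi * x)   # |f(x) - f(y)| <= 2*pi*|x - y|

eps = 0.25
N = 32                                    # 1/N < delta = eps / (2*pi)
c = [m / N for m in range(N + 1)]
# a_m = f(c_m) - f(c_{m-1}), so that a_1 + ... + a_m = f(c_m)
a = [f(c[m]) - f(c[m - 1]) for m in range(1, N + 1)]

def F(x):
    # sum of a_m * H(x - c_m): only the steps with c_m <= x contribute
    return sum(a[m - 1] for m in range(1, N + 1) if x >= c[m])

err = max(abs(F(x) - f(x)) for x in [k / 1000 for k in range(1001)])
print(err)   # stays below eps
```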

Now, fix a real number $\delta>0$ small enough so that the intervals $[c_m-\delta;c_m+\delta]$ are pairwise disjoint, and $n\in\mathbf N$ large enough so that $|P_n(x)-H(x)|\leq\varepsilon/2A$ for all $x\in[-1;1]$ such that $\delta\leq|x|$, where $A=\lvert a_1\rvert+\dots+\lvert a_N\rvert$. Finally, set $P(T) = \sum_{m=1}^N a_m P_n(T-c_m)$.

Let $x\in[0;1]$. If $x$ doesn't belong to any interval of the form $[c_m-\delta;c_m+\delta]$, one can write $$\lvert P(x)-F(x)\rvert\leq \sum_{m} \lvert a_m\rvert \,\lvert P_n(x-c_m)- H(x-c_m)\rvert \leq \sum_m \lvert a_m\rvert (\varepsilon/2A)\leq \varepsilon. $$ On the other hand, if there exists $m\in\{1,\dots,N\}$ such that $x\in[c_m-\delta;c_m+\delta]$, then there exists a unique such integer $m$, because these intervals are pairwise disjoint. Writing $$\lvert P(x)-F(x)\rvert\leq \sum_{k\neq m} \lvert a_k\rvert \,\lvert P_n(x-c_k)- H(x-c_k)\rvert + \lvert a_m\rvert\, \lvert P_n(x-c_m)-H(x-c_m)\rvert, $$ the term with index $k$ in the first sum is bounded by $\lvert a_k\rvert \varepsilon/2A$, while the last term is bounded by $ \lvert a_m\rvert$, because $P_n$ and $H$ both take their values in $[0;1]$. Consequently, $$\lvert P(x)-F(x)\rvert \leq (\varepsilon/2A)\sum_{k\neq m} \lvert a_k\rvert +\lvert a_m\rvert \leq 2\varepsilon.$$ Finally, $\lvert P(x)-f(x)\rvert\leq \lvert P(x)-F(x)\rvert+\lvert F(x)-f(x)\rvert\leq 3\varepsilon$. This concludes the proof.

Yet another proof of the inequality between the arithmetic and the geometric means

This is an exposition of the proof of the inequality between arithmetic and geometric means given by A. Pełczyński (1992), “Yet another proof of the inequality between the means”, Annales Societatis Mathematicae Polonae. Seria II. Wiadomości Matematyczne, 29, p. 223–224. The proof might look bizarre, but I can guess some relation with another paper of the author where he proves the uniqueness of the John ellipsoid. And as bizarre as it is, and despite the abundance of proofs of this inequality, I found it nice. (The paper is written in Polish, but the formulas allow one to follow it.)

For $n\geq 1$, let $a_1,\dots,a_n$ be positive real numbers. Their arithmetic mean is $$ A = \dfrac1n \left(a_1+\dots + a_n\right)$$ while their geometric mean is $$G = \left(a_1\dots a_n\right)^{1/n}.$$ The inequality of the title says $G\leq A,$ with equality if and only if all $a_k$ are equal. By homogeneity, it suffices to prove that $A\geq 1$ if $G=1$, with equality if and only if $a_k=1$ for all $k.$ In other words, we have to prove the following theorem.

Theorem.If $a_1,\dots,a_n$ are positive real numbers such that $a_1\dots a_n=1$ and $a_1+\dots+a_n\leq n,$ then $a_1=\dots=a_n=1.$

The case $n=1$ is obvious and we argue by induction on $n$.

Lemma.If $a_1\cdots a_n=1$ and $a_1+\dots+a_n\leq n,$ then $a_1^2+\dots+a_n^2\leq n$.

Indeed, we can write $$ a_1^2+\dots+a_n^2 = (a_1+\dots+a_n)^2 - \sum_{i\neq j} a_i a_j \leq n^2 - \sum_{i\neq j} a_i a_j,$$ and we have to give a lower bound for the second term. For given $i\neq j$, the product $a_i a_j$ and the remaining $a_k$, for $k\neq i,j$, are $n-1$ positive real numbers whose product is equal to $1$. By induction, one has $$ n-1 \leq a_i a_j + \sum_{k\neq i,j}a_k.$$ Summing these $n(n-1)$ inequalities, we have $$ n(n-1)^2 \leq \sum_{i\neq j} a_i a_j + \sum_{i\neq j} \sum_{k\neq i,j} a_k.$$ In the second term, every element $a_k$ appears $(n-1)(n-2)$ times, hence $$ n(n-1)^2 \leq \sum_{i\neq j} a_i a_j + (n-1)(n-2) \sum_{k} a_k \leq \sum_{i\neq j} a_i a_j + n(n-1)(n-2), $$ so that $$ \sum_{i\neq j} a_i a_j \geq n(n-1)^2-n(n-1)(n-2)=n(n-1).$$ Finally, we obtain $$a_1^2+\dots+a_n^2 \leq n^2-n(n-1)=n,$$ as claimed.

We can iterate this lemma: if $a_1\cdots a_n=1$ and $a_1+\dots+a_n\leq n$, then $$a_1^{2^m}+\dots+a_n^{2^m}\leq n$$ for every integer $m\geq 0$, since the hypotheses are preserved when each $a_k$ is replaced by $a_k^2$. When $m\to+\infty$, we obtain that $a_k\leq 1$ for every $k$ (if some $a_k$ were $>1$, then $a_k^{2^m}$ would tend to infinity). Since $a_1\dots a_n=1$, we must have $a_1=\dots=a_n=1$, and this concludes the proof.
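Since the theorem is equivalent to the arithmetic–geometric mean inequality, here is a quick numerical illustration in Python (a sanity check with random data, not a proof; the helper name is mine):

```python
import math, random

# For positive reals rescaled so that their product is 1, the sum is at
# least n, with equality only at (1, ..., 1): this is G <= A.
def normalized_sum(a):
    """Rescale a so that the product is 1, and return the sum."""
    g = math.prod(a) ** (1 / len(a))
    return sum(x / g for x in a)

random.seed(0)
samples = [[random.uniform(0.1, 10) for _ in range(random.randint(2, 6))]
           for _ in range(1000)]
print(min(normalized_sum(a) - len(a) for a in samples))  # nonnegative
```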

Monday, April 7, 2025

A generalization of the Eisenstein criterion

Recently, on the Zulip server for Lean users, somebody came up with something that looked like homework, but it managed to sting me a little bit.

It was about irreducibility of polynomials with integer coefficients. Specifically, the guy wanted a proof that the polynomial $T^4-10 T^2+1$ is irreducible, claiming that the Eisenstein criterion was of no help there.

What was to be proven (this is what irreducible means) is that it is impossible to write that polynomial as the product of two polynomials with integer coefficients, except by writing $T^4-10 T^2+1$ as $(1)·(T^4-10 T^2+1)$ or as $ (-1)·(-T^4+10 T^2-1)$.

This is both a complicated and a trivial question.

Trivial because there are general bounds (initially due to Mignotte) for the integers that appear in any such factorization, and it could just be sufficient to try every possibility in the given range and conclude. Brute force, not very intelligent, but with a guaranteed outcome.

Complicated because those computations would often be long, and there are many criteria in number theory to assert irreducibility.

One of the easiest criteria is the aforementioned Eisenstein criterion.

This criterion requires an auxiliary prime number, and I'll state it on an example, using another polynomial, say $T^4 - 10T^2 + 2$. Here, one takes the prime number 2, and one observes that modulo 2, the polynomial is equal to $T^4$, while the constant term, 2, is not divisible by $2^2=4$. In that case, the criterion immediately asserts that the polynomial $T^4-10T^2+2$ is irreducible.
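These divisibility conditions are easy to check mechanically; here is a small Python sketch (the helper name is made up for this post):

```python
# Check the Eisenstein conditions for a polynomial given by its list of
# coefficients, from the constant term up to the leading one.
def eisenstein_applies(coeffs, p):
    """True when the Eisenstein criterion applies at the prime p."""
    return (coeffs[-1] % p != 0                       # p does not divide the leading coefficient
            and all(c % p == 0 for c in coeffs[:-1])  # p divides all the other coefficients
            and coeffs[0] % p**2 != 0)                # p^2 does not divide the constant term

f = [2, 0, -10, 0, 1]             # T^4 - 10*T^2 + 2
print(eisenstein_applies(f, 2))   # True
g = [1, 0, -10, 0, 1]             # T^4 - 10*T^2 + 1
print(eisenstein_applies(g, 2))   # False: the constant term 1 is odd
```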

With all due respect to Eisenstein, I don't like that criterion too much, though, because it only applies in kind of exceptional situations. Still, it is often useful.

There is, by the way, a classic application of the Eisenstein criterion, namely the irreducibility of cyclotomic polynomials of prime index, for example $T^4+T^3+T^2+T+1$ (for the prime number $p=5$). But it is not visible that the criterion applies, and one needs to perform the change of variable $T = X+1$.

I had noticed that in that case, it may be interesting to generalize the Eisenstein criterion so as to avoid this change of variables. Indeed, for a monic polynomial $f$ in $\mathbf Z[T]$, a variant of the criterion states: take an integer $a$ and a prime number $p$, and assume that:

  • $f(T) \equiv (T-a)^d \mod p$, where $d=\deg(f)$;
  • $f(a)$ (the value at $a$) is not divisible by $p^2$.

Then $f$ is irreducible.

For the cyclotomic polynomial of index $5$, $f(T) = T^4+T^3+T^2+T+1$, still taking $p=5$, one has $f(T) \equiv (T-1)^4 \pmod5$, and $f(1)=5$ is not divisible by $5^2=25$. Consequently, it is irreducible. And the same argument works for all cyclotomic polynomials of prime index.

The reason, which avoids any strange computation, is that $f(T)=(T^5-1)/(T-1)$, which modulo 5 is $(T-1)^5/(T-1)=(T-1)^4$ — by the divisibility by $5$ of the intermediate binomial coefficients $\binom5k$, $0<k<5$.
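Both hypotheses are a one-liner to verify numerically; a small Python check (coefficients are listed from the constant term up, and the criterion is stated via the value $f(1)$):

```python
# Check the two hypotheses for f = T^4 + T^3 + T^2 + T + 1 and p = 5.
p = 5
f = [1, 1, 1, 1, 1]                    # T^4 + T^3 + T^2 + T + 1

# (T - 1)^4 = T^4 - 4T^3 + 6T^2 - 4T + 1
t_minus_1_pow4 = [1, -4, 6, -4, 1]
# f ≡ (T - 1)^4 mod 5: all coefficient differences are divisible by 5
assert all((a - b) % p == 0 for a, b in zip(f, t_minus_1_pow4))

f_at_1 = sum(f)                        # f(1) = 5
print(f_at_1 % p**2)                   # 5, nonzero: p^2 does not divide f(1)
```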

To go back to the initial example, $T^4-10T^2+1$, there is indeed no prime number with which the Eisenstein criterion can be applied. This is obvious in the standard form, because the constant coefficient is 1. But the variant doesn't help either. The only prime that could work seems to be 2 (modulo 2, the polynomial is $(T+1)^4$), but the value at 1 is equal to $-8$, and is divisible by 4.

This is where a new variant of the criterion can be applied, this time with the prime number 3.

Theorem.Let $q\in\mathbf Z[T]$ be a monic polynomial and let $p$ be a prime number such that the reduction of $q$ modulo $p$ is irreducible in $\mathbf F_p[T]$. Let $f\in\mathbf Z[T]$ be a monic polynomial. Assume that $f\equiv q^d$ modulo $p$, for some integer $d\geq1$, but that $f\not\equiv 0$ modulo $\langle q, p^2\rangle$. Then $f$ is irreducible in $\mathbf Z[T]$.

To see how this criterion applies, observe that modulo 3, one has $f(T)\equiv T^4+2T^2+1=(T^2+1)^2\pmod 3$. So we are almost in the situation of the initial criterion, except that the role of the linear polynomial $T-a$ is now played by $T^2+1$. The first thing that allows this criterion to apply is that $T^2+1$ is irreducible modulo 3; in this case, this is because $-1$ is not a square mod 3.

The criterion also requires a variant of the second condition — it holds because the polynomial is not zero modulo $\langle T^2+1, 9\rangle$. Here, one has \[ T^4-10T^2+1=(T^2+1)^2-12T^2 = (T^2+1)^2-12(T^2+1)+12,\] hence it is equal to 3 modulo $\langle T^2+1, 9\rangle$.
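This computation is easy to check in Python (reducing modulo $T^2+1$ amounts to substituting $T^2=-1$):

```python
# Check the two conditions for f = T^4 - 10 T^2 + 1, q = T^2 + 1, p = 3.

# f ≡ (T^2 + 1)^2 mod 3: their difference is -12 T^2, and 3 divides 12
assert (-12) % 3 == 0

# f modulo q: substituting T^2 = -1 gives (-1)^2 - 10*(-1) + 1 = 12
f_mod_q = (-1)**2 - 10 * (-1) + 1
print(f_mod_q % 9)     # 3, nonzero: f is not 0 modulo <q, 9>
```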

And so we have an Eisenstein-type proof that the polynomial $T^4-10T^2+1$ is irreducible over the integers. CQFD!

I made the fun last a bit longer by formalizing the proof in Lean, first of the generalized criterion, and then of the particular example. It is not absolutely convincing yet, because Lean/Mathlib still lacks some tools for handling explicit computations. And probably many parts can be streamlined. Still, it was a fun exercise to do.

The proof works in a more general context and gives the following theorem:

Theorem. Let $R$ be an integral domain, let $P$ be a prime ideal of $R$ and let $K$ be the field of fractions of $R/P$. Let $q\in R[T]$ be a monic polynomial such that $q$ is irreducible in $K[T]$. Let $f\in R[T]$ be a monic polynomial. Assume that $f\equiv q^d$ modulo $P$, but that $f\not\equiv 0$ modulo $\langle q\rangle + P^2$. Then $f$ is irreducible in $R[T]$.

Saturday, March 29, 2025

A simple proof of a theorem of Kronecker

Kronecker's theorem of the title is the following.

Theorem.Let $\alpha\in\mathbf C$ be an algebraic integer all of whose conjugates have absolute value at most $1$. Then either $\alpha=0$, or $\alpha$ is a root of unity.

This theorem has several elementary proofs. In this post, I explain the simple proof proposed by Gebhart Greiter in his American Mathematical Monthly note, adding details so that it would (hopefully) be more accessible to students.

The possibility $\alpha=0$ is rather unentertaining, hence let us assume that $\alpha\neq0$.

Let us first analyse the hypothesis. The assumption that $\alpha$ is an algebraic integer means that there exists a monic polynomial $f\in\mathbf Z[T]$ such that $f(\alpha)=0$. I claim that $f$ can be assumed to be irreducible in $\mathbf Q[T]$; it is then the minimal polynomial of $\alpha$. This follows from Gauss's theorem, and let me give a proof for that. Let $g\in \mathbf Q[T]$ be a monic irreducible factor of $f$ such that $g(\alpha)=0$, and let $h\in\mathbf Q[T]$ be such that $f=gh$. Chasing denominators and dividing out by their gcds, there are polynomials $g_1,h_1\in\mathbf Z[T]$, each with mutually coprime coefficients, and positive rational numbers $u,v$ such that $g=ug_1$ and $h=vh_1$. Then $(uv)f=g_1 h_1$. Since $f$ is monic and $g_1h_1$ has integer coefficients, comparing leading coefficients shows that $uv$ is an integer. Let us prove that $uv=1$; otherwise, it has a prime factor $p$. Considering the relation $(uv)f=g_1h_1$ modulo $p$ gives $0=g_1 h_1 \pmod p$. Since their coefficients are mutually coprime, the polynomials $g_1$ and $h_1$ are nonzero modulo $p$, hence their product is nonzero. This is a contradiction.

So we have a monic polynomial $f\in\mathbf Z[T]$, irreducible in $\mathbf Q[T]$, such that $f(\alpha)=0$. That is to say, $f$ is the minimal polynomial of $\alpha$, so that the conjugates of $\alpha$ are the roots of $f$. Note that the roots of $f$ are pairwise distinct — otherwise, $\gcd(f,f')$ would be a nontrivial factor of $f$. Moreover, $0$ is not a root of $f$, for otherwise one could factor $f=T\cdot f_1$, and the irreducibility of $f$ would imply $f=T$, contradicting $\alpha\neq0$.

Let us now consider the companion matrix to $f$: writing $f=T^n+c_1T^{n-1}+\dots+c_n$, so that $n=\deg(f)$, this is the matrix \[ C_f = \begin{pmatrix} 0 & \dots & 0 & -c_n \\ 1 & \ddots & \vdots & -c_{n-1} \\ 0 & \ddots & & -c_{n-2} \\ 0 & \cdots & 1 & -c_1 \end{pmatrix}.\] If $e_1,\ldots,e_n$ are the canonical column vectors $e_1=(1,0,\dots,0)$, etc., then $C_f\cdot e_1=e_2$, \ldots, $C_f \cdot e_{n-1}=e_{n}$, and $C_f\cdot e_n = -c_{n} e_1-\dots -c_1 e_n$. Consequently, \[ f(C_f)\cdot e_1 = C_f^n\cdot e_1 +c_1 C_f^{n-1} \cdot e_1+ \dots +c_n e_1 = 0.\] Moreover, for $2\leq k\leq n$, one has $f(C_f)\cdot e_k=f(C_f)\cdot C_f^{k-1}\cdot e_1=C_f^{k-1}\cdot f(C_f)\cdot e_1=0$. Consequently, $f(C_f)=0$ and the complex eigenvalues of $C_f$ are roots of $f$. Since $f$ has simple roots, $C_f$ is diagonalizable: there exist a matrix $P\in\mathrm{GL}(n,\mathbf C)$ and a diagonal matrix $D$ such that $C_f=P\cdot D\cdot P^{-1}$, and the diagonal entries of $D$ are roots of $f$. Since $0$ is not a root of $f$, the matrix $D$ is invertible, and $C_f$ is invertible as well. More precisely, one can infer from the definition of $C_f$ that $g(C_f)\cdot e_1\neq 0$ for any nonzero polynomial $g$ of degree $<n$, so that $f$ is the minimal polynomial of $C_f$. Consequently, all of its roots are actually eigenvalues of $C_f$, and appear in $D$; in particular, $\alpha$ is an eigenvalue of $C_f$.

For every $k\geq 1$, one has $C_f^k=P\cdot (D^k)\cdot P^{-1}$. Since all diagonal entries of $D$ have absolute value at most $1,$ the set of all $D^k$, for $k\geq1$, is bounded in $\mathrm{M}_n(\mathbf C)$. Consequently, the set $\{C_f^k\,;\, k\geq1\}$ is bounded in $\mathrm M_n(\mathbf C)$. On the other hand, this set consists of matrices in $\mathrm M_n(\mathbf Z)$. It follows that this set is finite.

There are integers $k$ and $\ell$ such that $k< \ell$ and $C_f^k=C_f^\ell$. Since $C_f$ is invertible, one has $C_f^{\ell-k}=1$. Since $\alpha$ is an eigenvalue of $C_f,$ this implies $\alpha^{\ell-k}=1$. We thus have proved that $\alpha$ is a root of unity.
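As an illustration of the argument, here is a tiny Python computation (plain lists, no library) on the made-up example $f=T^2+T+1$, whose roots, the primitive cube roots of unity, are algebraic integers of absolute value 1:

```python
# The powers of the companion matrix of f = T^2 + T + 1 form a finite set;
# here C^3 is already the identity, so every root alpha of f satisfies
# alpha^3 = 1, as Kronecker's theorem predicts.

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Companion matrix of T^2 + c1*T + c2 with c1 = c2 = 1
C = [[0, -1],
     [1, -1]]

C2 = mat_mul(C, C)
C3 = mat_mul(C2, C)
print(C3)    # [[1, 0], [0, 1]]
```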

Saturday, January 11, 2025

On numbers and unicorns

Reading a book on philosophy of mathematics, even one written lightly, such as Why is there philosophy of mathematics at all? by Ian Hacking, may have unexpected effects. The most visible one has been a poll that I submitted on Mastodon on December 4th. As you can read, there were four options:

  • Numbers exist
  • Unicorns exist
  • Numbers have more existence than unicorns
  • Neither numbers nor unicorns exist

As you can see, the three main options each got roughly a third of the votes, the existence of unicorns being favored by a small 4% of the 142 participants.


In this post, I would like to discuss these four options, from the point of view of a mathematician with a limited expertise on philosophy. Funnily, each of them, including the second one, has some substance. Moreover, it will become apparent at the end that the defense of any of those options relies on the meaning we give to the word exist.

Numbers exist

We have traces of numbers as old as we have traces of writing. The Babylonian clay tablet known as “Plimpton 322” dates from 1800 BC and mentions Pythagorean triples (triples of integers $(a,b,c)$ such that $a^2=b^2+c^2$) written in the sexagesimal (base 60) writing of Babylonians. More than 1000 years later, Pythagoras and his school wished to explain everything in the world using numbers, usually depicted geometrically, drawing pebbles arranged in triangles, squares, rectangles and even pentagons. Musical harmony was based on specific numerical ratios. Plato's writings are full of references to mathematics, and a few specific numbers, such as 5040, are even given a political relevance. Even earlier, the Chinese I Ching based predictions on random numbers.

Euclid's Elements give us an account of numbers that still sounds quite modern. A great part of elementary arithmetic can be found there: divisibility, the “Euclidean algorithm”, prime numbers and the fact that there are infinitely many of them or, more precisely, that there are more prime numbers than any given magnitude. On the other hand, it is worth saying that Euclid's concept of number doesn't exactly fit with our modern concept: since “A number is a multitude composed of units”, numbers are whole numbers, and it is implicit that zero is not a number, and neither is one. Moreover, proposition 21 of book 10, which we read as proving the irrationality of the square root of 2, is not a statement about a hypothetical number called “square root of 2”, but the fact that the diagonal of a square of unit length is irrational, i.e., not commensurable with the unit length.

History went on and on and numbers gradually became prevalent in modern societies. Deciding the exact date of Easter in an ill-connected Christian world led to the necessity of computing it, using an algorithm known as computus, and the name happened to be used for every calculation. The development of trade led to the development of computing devices, abacus, counting boards (the counter at your preferred shop is the place where the counting board was laid), and Simon Stevin's De Thiende explicitly mentions trade as a motivation for his book on decimal fractions.

Needless to say, our digital age is literally infused with numbers.

Unicorns exist

Traces of unicorns can be found in the Indus valley. The animal seems to have disappeared, but the Greeks thought they lived in India. In Western Europe, tapestries from the Middle Ages give numerous wonderful representations of unicorns, such as the Dame à la licorne in Paris. This extract from Umberto Eco's The Name of the Rose gives additional proof of their existence:

“But is the unicorn a falsehood? It’s the sweetest of animals and a noble symbol. It stands for Christ, and for chastity; it can be captured only by setting a virgin in the forest, so that the animal, catching her most chaste odor, will go and lay its head in her lap, offering itself as prey to the hunters’ snares.”
“So it is said, Adso. But many tend to believe that it’s a fable, an invention of the pagans.”
“What a disappointment,” I said. “I would have liked to encounter one, crossing a wood. Otherwise what’s the pleasure of crossing a wood?”

The Internet age gave even more evidence of the existence of unicorns. For example, we now know that unicorns have colorful poop, though, contrary to a well-spread myth, it is not rainbow-like.

Numbers have more existence than unicorns

Whatever numbers and unicorns could be, we are literally surrounded by the former, and have scarce traces of the latter. I'd also like to add that numbers have a big impact on all modern human activities: the most basic trading activities use numbers, as does the most sophisticated science. But we don't need to wish to send a man to the moon to require numbers: political life is built around votes and polls, all schedules are expressed in numbers, and schools, hospitals, movies are regularly ranked or rated. An example given by Hacking is the existence of world records for, say, 50 m freestyle swimming. That requires good swimmers, of course, but also elaborate timers, and very precise measurement tools for the swimming pool to be acknowledged as a legitimate place, and so on. On the other hand, however nice the cultural or artistic representations of unicorns can be, they don't have that prevalence in the modern world, and it is rare that something becomes possible because of something involving unicorns.

Neither numbers nor unicorns exist

This is probably my favorite take, and that's maybe slightly bizarre for a mathematician, especially one versed in number theory. But let us take a lucid look at the discourse that we hold about numbers. On one side, we have the famous quote of Leopold Kronecker, “Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Menschenwerk.” — God made the whole numbers, all the rest is the work of man. Does this mean that at least some numbers, the natural integers, exist in the world in the same way that we can find water, iron, gold or grass? Are there number mines somewhere? It doesn't seem so, and I'm inclined to say that if numbers exist, they also are a creation of mankind. The ZFC axioms of set theory postulate the existence of some set $\mathbf N$ satisfying an induction principle, and that induction principle is used to endow that set with an addition and a multiplication which make it a mathematical model of Kronecker's “whole numbers”, but does this mean that numbers exist? After all, what guarantees that the ZFC axioms are not self-contradictory? And if they were, would that mean that whole numbers cease to exist? Certainly not. In any case, our daily use of whole numbers would not be the least disturbed by a potential contradiction in those axioms. But this indicates that being able to speak of integers within ZFC set theory doesn't confer those integers some existence, in the same sense that grass, water, iron exist. Another indication of their elusive character is the way mathematicians or computer scientists pretend numbers are specific objects, such as zero being the empty set, 1 being the set $\{\emptyset\}$, and, more generally, the integer $n+1$ being $n \cup \{n\}$. This is at most a functional fiction, as is well explained by Benacerraf (1965) in his paper, “What numbers could not be” (The Philosophical Review, Vol. 74, No. 1, pp. 47–73).

But all in all, everything is fine, because whether numbers (resp. unicorns) exist or don't, we, humans, have developed a language to talk about them, make us think, make us elaborate new works of art or of science, and after all, this is all that counts, isn't it?

Friday, September 13, 2024

The combinatorial Nullstellensatz

The “combinatorial Nullstellensatz” is a relatively elementary statement due to Noga Alon (1999) whose name, while possibly frightening, really says what it is and what it is good for. (A freely available version is there.) Nullstellensatz is the classic name for a theorem of David Hilbert that relates loci in $F^n$ defined by polynomial equations and the ideals of the polynomial ring $F[T_1,\dots,T_n]$ generated by these equations, when $F$ is an algebraically closed field. The word itself is German and means “theorem about the location of zeroes”. The precise correspondence is slightly involved, and stating it here would derail us from the goal of this post, so let us stay with these informal words for the moment. The adjective combinatorial in the title refers to the fact that Alon deduced many beautiful consequences of this theorem in the field of combinatorics, and for some of them, furnishes quite simple proofs. We'll give one of them below, the Cauchy–Davenport inequality, but my main motivation in writing this text was to discuss the proof of the combinatorial Nullstellensatz, in particular untold aspects of the proofs which took me some time to unravel.$\gdef\Card{\operatorname{Card}}$

The context is the following: $F$ is a field, $n$ is a positive integer, and one considers finite sets $S_1,\dots,S_n$ in $F$, and the product set $S=S_1\times \dots \times S_n$ in $F^n$. One considers polynomials $f\in F[T_1,\dots,T_n]$ in $n$ indeterminates and their values at the points $(s_1,\dots,s_n)$ of $S$. For every $i\in\{1,\dots,n\}$, set $g_i = \prod_{a\in S_i} (T_i-a)$; this is a polynomial of degree $\Card(S_i)$ in the indeterminate $T_i$.

Theorem 1. — If a polynomial $f\in F[T_1,\dots,T_n]$ vanishes at all points $s\in S$, there exist polynomials $h_1,\dots,h_n\in F[T_1,\dots,T_n]$ such that $f=g_1 h_1+\dots+g_n h_n$, and for each $i$, one has $\deg(g_i h_i)\leq \deg (f)$.

This theorem is really close to the classic Nullstellensatz, but the specific nature of the set $S$ allows to have a weaker hypothesis (the field $F$ is not assumed to be algebraically closed) and a stronger conclusion (the Nullstellensatz would assert that there exists some power $f^k$ of $f$ that is of this form, saying nothing of the degrees).

Its proof relies on a kind of Euclidean division algorithm. Assume that $f$ has some monomial $c_mT^m=c_mT_1^{m_1}\dots T_n^{m_n}$ with $m_i\geq \Card(S_i)$ for some $i$; then one can consider a Euclidean division (in the variable $T_i$), $T_i^{m_i}=p_i g_i + r_i$, where $\deg_{T_i}(r_i)<\Card(S_i)$. One can then replace this monomial $c_mT^m$ in $f$ by $c_m (T^m/T_i^{m_i})\,r_i$ and, at the same time, add $c_m (T^m/T_i^{m_i})\,p_i$ to $h_i$. Since this amounts to subtracting some multiple of $g_i$, the vanishing assumption still holds. Applying this method repeatedly, one reduces to the case where the degree of $f$ in the variable $T_i$ is $<\Card(S_i)$ for all $i$.
Then a variant of the theorem that says that a polynomial in one indeterminate has no more roots than its degree implies that $f\equiv 0$.

A weak point in the written exposition of this argument is the reason why the iterative construction would terminate. Intuitively, something has to decrease, but one needs a minute (or an hour, or a week) of reflection to understand what precisely decreases. The problem is that if one works one monomial at a time, the degrees might stay the same. The simplest reason why this procedure indeed works belongs to the theory of Gröbner bases and to the proof that the division algorithm actually works: instead of the degrees with respect to all variables, or of the total degree, one should consider a degree with respect to some lexicographic ordering of the variables — the point is that it is a total ordering of the monomials, so that if one considers the leading term, given by the largest monomial, the degree will actually decrease.

The second theorem is in fact the one which is needed for the applications to combinatorics. Its statement is rather written in a contrapositive way — it will show that $f$ does not vanish at some point of $S$.

Theorem 2.Let $f\in F[T_1,\dots,T_n]$ and assume that $f$ has a nonzero monomial $c_mT^m$, where $\lvert m\rvert=m_1+\dots+m_n$ is the total degree of $f$. If, moreover, $m_i<\Card(S_i)$ for all $i$, then there exists a point $s=(s_1,\dots,s_n)\in S$ such that $f(s)\neq 0$.

It is that theorem whose proof took me a hard time to understand. I finally managed to formalize it in Lean, hence I was pretty sure I had understood it. In fact, writing this blog post helped me simplify it further, removing a couple of useless arguments by contradiction! Anyway, I feel it is worth being told with a bit more detail than in the original paper.

We argue by contradiction, assuming that $f(s)=0$ for all $s\in S_1\times\dots \times S_n$. Applying theorem 1, we get an expression $f=\sum_{i=1}^n g_i h_i$ where $\deg(g_ih_i)\leq \deg(f)$ for all $i$. The coefficient of $T^m$ in $f$ is nonzero, by assumption; it is also the sum of the coefficients of $T^m$ in the $g_i h_i$, for $1\leq i\leq n$, and we are going to prove that all of them vanish — hence getting the desired contradiction. $\gdef\coeff{\operatorname{coeff}}$

Fix $i\in\{1,\dots, n\}$.  If $h_i=0$, then $\coeff_m(g_ih_i)=\coeff_m(0)=0$, hence we may assume that $h_i\neq0$. This implies that $\deg(g_i h_i)=\Card(S_i)+\deg(h_i) \leq \deg(f)=|m|$.
One then has
\[ \coeff_m(g_i h_i)=\sum_{p+q=m} \coeff_p (g_i) \coeff_q(h_i), \]
and it suffices to prove that all terms of this sum vanish. Fix $p,q$ such that $p+q=m$, assume that  $\coeff_p(g_i)\neq 0$, and let us prove that $\coeff_q(h_i)=0$.

By the very definition of $g_i$ as a product of $\Card(S_i)$ factors of the form $T_i-a$, it only involves the indeterminate $T_i$, so that $\coeff_p(g_i)\neq0$ implies $p_j=0$ for all $j\neq i$. Moreover, $p_i\leq p_i+q_i=m_i < \Card(S_i)$, by the assumption of the theorem. This implies \[\Card(S_i)+\deg(h_i)\leq |m|= |p+q|=|p|+|q|=p_i+|q|<\Card(S_i)+|q|.\]
Subtracting $\Card(S_i)$ on both sides, one gets $\deg(h_i)<|q|$, hence $\coeff_q(h_i)=0$, as was to be shown.

To conclude, let us give the combinatorial application, the Cauchy–Davenport theorem.
$\gdef\F{\mathbf F}$

Theorem 3.Let $p$ be a prime number and let $A$ and $B$ be nonempty subsets of $\F_p$, the field with $p$ elements. Denote by $A+B$ the set of all $a+b$, for $a\in A$ and $b\in B$. Then
\[ \Card(A+B) \geq \min (p, \Card(A)+\Card(B)-1).\]


First consider the case where $\Card(A)+\Card(B)>p$, so that $\min(p,\Card(A)+\Card(B)-1)=p$.  In this case, for every $c\in\F_p$, one has
\[\Card(A \cap (c-B)) + \Card(A\cup (c-B))=\Card(A)+\Card(c-B)=\Card(A)+\Card(B)>p\geq \Card(A\cup (c-B)),\]
so that the sets $A$ and $c-B$ intersect. In particular, there exist $a\in A$ and $b\in B$ such that $a+b=c$. Hence $A+B=\F_p$, and the desired inequality holds.

Let us now treat the case where $\Card(A)+\Card(B)\leq p$, so that $\min(p,\Card(A)+\Card(B)-1)=\Card(A)+\Card(B)-1$. We will assume that the conclusion of the theorem does not hold, that is, $\Card(A+B)\leq \Card(A)+\Card(B)-2$, and derive a contradiction. Let us consider the polynomial $f\in \F_p[X,Y]$ defined by $f=\prod _{c\in A+B} (X+Y-c)$. The degree of $f$ is equal to $\Card(A+B)\leq \Card(A)+\Card(B)-2$. Choose natural integers $u$ and $v$ such that  $u+v=\Card(A+B)$, $u\leq\Card(A)-1$ and $v\leq \Card(B)-1$. The coefficient of $X^u Y^v$ is equal to the binomial coefficient $\binom{u+v}u$. Since $u+v\leq \Card(A)+\Card(B)-2\leq p-2$, this coefficient is nonzero in $\F_p$.  By theorem 2, there exists $(a,b)\in A\times B$ such that $f(a,b)\neq 0$. However, one has $a+b\in A+B$, hence $f(a,b)=0$ by the definition of $f$. This contradiction concludes the proof that $\Card(A+B)\geq \Card(A)+\Card(B)-1$. It also concludes this post.
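The statement is also easy to test by brute force for small primes; here is a quick Python check (an illustration with random subsets, of course not a proof):

```python
import random

# Brute-force check of the Cauchy-Davenport inequality
# Card(A + B) >= min(p, Card(A) + Card(B) - 1) in F_p.
random.seed(1)
for p in (3, 5, 7, 11):
    for _ in range(200):
        A = set(random.sample(range(p), random.randint(1, p)))
        B = set(random.sample(range(p), random.randint(1, p)))
        sumset = {(a + b) % p for a in A for b in B}
        assert len(sumset) >= min(p, len(A) + len(B) - 1)
print("ok")
```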

Monday, August 12, 2024

Combinatorics of partitions

Divided powers are an algebraic gadget emulating, in an arbitrary ring, the functions $x\mapsto x^n/n!$ for all integers $n$, together with the natural functional equations that they satisfy.

One of them is a nice binomial theorem without binomial coefficients : denoting $x^n/n!$ by $x^{[n]}$, one has $$ (x+y)^{[n]}=\sum_{k=0}^n x^{[n-k]} y^{[k]}. $$ 
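This identity is easy to confirm with exact rational arithmetic; in the following Python sketch, the helper `dp` (our name) computes $x^{[n]}=x^n/n!$:

```python
from fractions import Fraction
from math import factorial

def dp(x, n):
    """Divided power x^[n] = x^n / n!, computed exactly over the rationals."""
    return Fraction(x) ** n / factorial(n)

# Check (x+y)^[n] == sum_{k=0}^n x^[n-k] * y^[k] for sample rational values
x, y = Fraction(3), Fraction(-5, 2)
for n in range(8):
    lhs = dp(x + y, n)
    rhs = sum(dp(x, n - k) * dp(y, k) for k in range(n + 1))
    assert lhs == rhs
print("ok")
```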

Another formula looks at what happens when one iterates the construction: if everything happens nicely, one has $$ (x^{[n]})^{[m]}=\frac1{m!}\left(x^n/n!\right)^m = \frac1{(n!)^m m!} x^{nm} = \frac{(mn)!}{m! n!^m} x^{[mn]}. $$ 

 In the development of the theory, the next step is the observation that the coefficient $(mn)!/(m!\,n!^m)$ is an integer, which is not obvious since it is written as a fraction. Two arguments are given to justify this fact. 

  1. The formula $$ \frac{(mn)!}{m! n!^m} = \prod_{k=1}^{m} \binom{k n-1}{n-1} $$ which the authors claim can be proved by induction.
  2. The observation that $(mn)!/m!n!^m$ is the number of (unordered) partitions of a set with $mn$ elements into $m$ subsets with $n$ elements.
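The first formula is easy to confirm numerically for small values of $m$ and $n$; a quick Python check (purely illustrative):

```python
from math import comb, factorial, prod

# Verify (mn)! / (m! * n!^m) == prod_{k=1}^m C(kn-1, n-1) for small m, n
for m in range(1, 7):
    for n in range(1, 7):
        lhs = factorial(m * n) // (factorial(m) * factorial(n) ** m)
        rhs = prod(comb(k * n - 1, n - 1) for k in range(1, m + 1))
        assert lhs == rhs
print("ok")
```

Note that the integer division `//` is exact here precisely because of the fact being discussed: the quotient is an integer.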

The latter fact is a particular case of a more general counting question: if $n=(n_1,\dots,n_r)$ are integers and $N=n_1+\dots+n_r$, then the number of (unordered) partitions of a set with $N$ elements into subsets of cardinalities $n_1,\dots,n_r$ is equal to $$\frac{N!}{\prod n_i! \prod_{k>0} m_k(n)!},$$ where $m_k(n)$ is the number of occurrences of $k$ in the sequence $(n_1,\dots,n_r)$. 

The goal of this blog post is to present a combinatorial argument for the first equality, and an alternative expression of the second number as a product of integers. We also give a formal, group-theoretical proof that this quotient of factorials solves the given counting problem. 

 $\gdef\abs#1{\lvert#1\rvert}$ $\gdef\Card{\operatorname{Card}}$  

Counting partitions of given type 

 Let $S$ be a set with $N$ elements. A partition of $S$ is a subset $\{s_1,\dots,s_r\}$ of $\mathfrak P(S)$ consisting of nonempty, pairwise disjoint, subsets whose union is $S$. Its *type* is the “multiset” of integers given by their cardinalities. Because no $s_i$ is empty, the type is a multiset of nonzero integers. It is slightly easier to allow zero in the type; a partition then has to be considered as a multiset of subsets of $S$, still assumed pairwise disjoint, so that only the empty subset can be repeated.

The group $G$ of permutations of $S$ acts on the set $\Pi_S$ of partitions of $S$. This action preserves the type of a partition, so that we get an action on the set $\Pi_{S,n}$ of partitions of type $n$, for any multiset of integers $n$. This set is nonempty if and only if $\abs n=N$.

This action is transitive: given two partitions $(s_1,\dots,s_r)$ and $(t_1,\dots,t_r)$ with the same type $(n_1,\dots,n_r)$, numbered so that $\abs{s_i}=\abs{t_i}={n_i}$ for all $i$, we just define a permutation of $S$ that maps the elements of $s_i$ to $t_i$.

By the orbit-stabilizer theorem, the number of partitions of type $n$ is equal to $\Card(G)/\Card(G_s)$, where $G_s$ is the stabilizer of any partition $s$ of type $n$. Since $\Card(S)=N$, one has $\Card(G)=N!=\abs n!$.

On the other hand, the stabilizer $G_s$ of a partition $(s_1,\dots,s_r)$ can be described as follows. An element $g$ of this stabilizer maps each set $s_i$ onto some set $s_j$ which has the same cardinality as $s_i$. In other words, there is a morphism of groups from $G_s$ to the subgroup of the symmetric group $\mathfrak S_r$ consisting of permutations that preserve the cardinality. This subgroup is a product of symmetric groups $\mathfrak S_{m_k}$, for all $k> 0$, where $m_k$ is the number of occurrences of $k$ in $(n_1,\dots,n_r)$. Here, we omit the factor corresponding to $k=0$ because our morphism doesn't see it.

This morphism is surjective, because it has a section. Given a permutation $\sigma$ of $\{1,\dots,r\}$ that preserves the cardinalities, we have essentially shown above how to construct a permutation of $S$ that induces $\sigma$, and this permutation belongs to $G_s$. More precisely, if we number the elements of $s_i$ as $(x_{i,1},\dots,x_{i,n_i})$, we lift $\sigma$ to the permutation that maps $x_{i,j}$ to $x_{\sigma(i),j}$ for all $i,j$. This makes sense since $n_{i}=n_{\sigma(i)}$ for all $i$.

The kernel of this morphism $G_s\to \prod_{k>0} \mathfrak S_{m_k}$ consists of permutations that fix each $s_i$. It is clearly equal to the product of the symmetric groups $\prod \mathfrak S_{n_i}$.

One thus has $\Card(G_s)=\prod n_i! \prod_{k>0} m_k!$, as was to be shown.
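For small sets, this counting formula can be confirmed by brute-force enumeration. In the following Python sketch, `set_partitions`, `count_of_type` and `formula` are our own (hypothetical) helper names:

```python
from math import factorial
from collections import Counter

def set_partitions(s):
    """Yield every partition of the list s, as a list of blocks."""
    if not s:
        yield []
        return
    first, rest = s[0], s[1:]
    for part in set_partitions(rest):
        # put `first` into one of the existing blocks...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ...or into a new singleton block
        yield part + [[first]]

def count_of_type(N, ns):
    """Count partitions of {0,...,N-1} whose block sizes form the multiset ns."""
    target = Counter(ns)
    return sum(1 for p in set_partitions(list(range(N)))
               if Counter(map(len, p)) == target)

def formula(ns):
    """N! / (prod n_i! * prod_{k>0} m_k!) for the multiset ns of block sizes."""
    N = sum(ns)
    denom = 1
    for k in ns:
        denom *= factorial(k)
    for mk in Counter(ns).values():
        denom *= factorial(mk)
    return factorial(N) // denom

ns = (2, 2, 3)
print(count_of_type(sum(ns), ns), formula(ns))  # → 105 105
```

The enumeration visits all Bell-number-many partitions, so it is only practical for $N$ up to a dozen or so, but it matches the orbit–stabilizer count on every type we tried.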

A combinatorial interpretation of the first relation


As we said above, the relation $$ \frac{(mn)!}{m!\, n!^m} = \prod_{k=1}^{m} \binom{kn-1}{n-1} $$ can be proved by induction, just playing with factorials, but we want a combinatorial interpretation for it. (Here we assume that $n\geq 1$.)
By the preceding paragraph, the left hand side is the number of partitions of a set with $mn$ elements of type $(n,\dots,n)$. We may assume that this set is numbered $\{1,\dots,mn\}$.

Such a partition can be defined as follows. First, we choose the subset that contains $1$: this amounts to choosing $n-1$ elements among $\{2,\dots,mn\}$, and there are $\binom{mn-1}{n-1}$ possibilities. Next, we choose the subset that contains the smallest element not belonging to the first chosen subset. There are $n-1$ elements to choose among the remaining $mn-n-1=(m-1)n-1$ ones, hence $\binom{(m-1)n-1}{n-1}$ possibilities. And we go on in the same way, until we have chosen the $m$ required subsets. (For the last one, there is of course only one such choice, and notice that the last factor is $\binom{n-1}{n-1}=1$.)

An integral formula for the number of partitions of given type


Finally, we wish to give an alternative formula for the number of partitions of given type of a set $S$ that makes it clear that it is an integer. Again, we can assume that the set $S$ is equal to $\{1,\dots,N\}$ and that $n=(n_1,\dots,n_r)$ is a multiset of integers such that $\abs n=N$.

Let us write $n$ in another way, $n=(0^{m_0},1^{m_1},\dots)$, meaning that $0$ is repeated $m_0$ times, $1$ is repeated $m_1$ times, etc. One has $N=\sum k m_k$. The idea is that we can first partition $S$ into subsets of cardinalities $m_1$, $2m_2$,… and then subpartition these subsets: the first one into $m_1$ subsets with one element, the second into $m_2$ subsets with two elements, etc.

The number of ordered partitions of $S$ into subsets with $m_1$, $2m_2$,… elements is the multinomial number $$ \binom{N}{m_1,2m_2,\dots}.$$ And the number of subpartitions is the product of integers as given in the preceding section. This gives $$ \frac{\abs n!}{\prod_i n_i! \prod_{k>0}m_k(n)!} = \binom{N}{m_1,2m_2,\dots}\prod_{k>0} \frac{(km_k)!}{k!^{m_k} m_k(n)!}.$$ We can also write the fractions that appear in the product as integers: $$ \frac{\abs n!}{\prod_i n_i! \prod_{k>0}m_k(n)!} =\binom{N}{m_1,2m_2,\dots} \prod_{k>0} \prod_{m=1}^{m_k} \binom {km-1}{k-1}.$$
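As a sanity check, both sides of this last identity can be compared numerically; here is a Python sketch (function names are ours) evaluating each side for a few multisets of block sizes:

```python
from math import comb, factorial, prod
from collections import Counter

def partitions_count(ns):
    """Left-hand side: N! / (prod n_i! * prod_{k>0} m_k!)."""
    N = sum(ns)
    m = Counter(ns)  # m[k] = number of occurrences of k in ns
    denom = prod(factorial(k) for k in ns) * prod(factorial(mk) for mk in m.values())
    return factorial(N) // denom

def integral_formula(ns):
    """Right-hand side: multinomial(N; 1*m_1, 2*m_2, ...)
    times prod_{k>0} prod_{j=1}^{m_k} C(kj-1, k-1)."""
    N = sum(ns)
    m = Counter(ns)
    multinomial = factorial(N) // prod(factorial(k * mk) for k, mk in m.items())
    blocks = prod(comb(k * j - 1, k - 1)
                  for k, mk in m.items() for j in range(1, mk + 1))
    return multinomial * blocks

for ns in [(2, 2, 3), (1, 1, 2, 3, 3), (4,), (1, 1, 1)]:
    assert partitions_count(ns) == integral_formula(ns)
print("ok")
```

The right-hand side is manifestly a product of binomial and multinomial coefficients, so its integrality is visible at a glance.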