
Tuesday, July 15, 2025

The Krull dimension of the semiring of natural numbers is equal to 2

Let $R$ be a ring. Its Krull dimension is the supremum of the lengths $n$ of chains $P_0\subsetneq P_1 \subsetneq\dots\subsetneq P_n$ of prime ideals of $R$. When $R$ is a field, the null ideal is the only prime ideal, and it is a maximal ideal so that its Krull dimension is zero. When $R$ is a principal ideal domain which is not a field, there are two kinds of prime ideals: the null ideal is prime (as in any domain), and the other prime ideals are the maximal ideals of $R$, generated by a prime element $p$. In particular, the Krull dimension of the ring of integers is equal to $1$.

It is classic that these concepts can be defined for semirings as well.

A semiring $R$ is a set endowed with a commutative and associative addition with a neutral element $0$, an associative multiplication with a neutral element $1$, such that multiplication distributes over addition: $(a+b)c=ac+bc$ and $c(a+b)=ca+cb$, and such that $0$ is absorbing: $0\cdot a=a\cdot 0=0$. When its multiplication is commutative, the semiring is said to be commutative.

Semirings $R$ have ideals: these are nonempty subsets $I$ which are stable under addition ($a+b\in I$ for $a,b\in I$), and stable under multiplication by any element of $R$: for general semirings, one has to distinguish between left, right, and two-sided ideals; for commutative semirings, the notions coincide.

An ideal $P$ of a semiring $R$ is said to be prime if $R\setminus P$ is a multiplicative subset; explicitly, $P\neq R$, and if $ab\in P$, then $a\in P$ or $b\in P$.

An ideal $P$ of a semiring $R$ is said to be maximal if $P\neq R$ and if there is no ideal $I$ such that $P\subsetneq I\subsetneq R$. A semiring $R$ is said to be local if it admits exactly one maximal ideal; this means that the set of non-invertible elements of $R$ is an ideal.

The amusing part comes from the classification of prime and maximal ideals of the semiring $\mathbf N$ of natural numbers, which I learned of via a Lean formalization project led by Junyan Xu.

Theorem.

  1. The semiring $\mathbf N$ is local; its maximal ideal is the set $\mathbf N\setminus\{1\}$.
  2. The null ideal is a prime ideal.
  3. The other prime ideals are the maximal ideal $\mathbf N\setminus\{1\}$ and the sets $p\mathbf N$, for all prime numbers $p$.

In particular, we have chains $\langle 0\rangle \subsetneq \langle p\rangle \subsetneq \mathbf N\setminus\{1\}$ of prime ideals, hence the result stated in the title of this post:

Corollary. The Krull dimension of the semiring of natural numbers is equal to $2$.

Proof.

  1. The element $1$ is the only unit of $\mathbf N$, and $\mathbf N\setminus\{1\}$ is obviously an ideal, necessarily its unique maximal ideal.
  2. The null ideal is a prime ideal, as in any semiring which is a domain.
  3. Let now $P$ be a nonzero prime ideal of $\mathbf N$ and let $p$ be the smallest nonzero element of $P$. Then $p\neq 1$ (otherwise, $P=\mathbf N$, which is not prime). The hypothesis that $P$ is prime implies that one of the prime factors of $p$ belongs to $P$; by the choice of $p$, this must be $p$ itself, so that $p$ is a prime number. Then $P$ contains the set $p\mathbf N$ of multiples of $p$, which is a prime ideal of $\mathbf N$. Let us assume that $p\mathbf N\subsetneq P$ and let $n\in P\setminus p\mathbf N$. By assumption, $p$ does not divide $n$, so that these two integers are coprime. By the following proposition, $P$ contains every integer at least equal to $(p-1)(n-1)$; in particular, it contains both a power of $2$ and a power of $3$; since $P$ is prime, it contains $2$ and $3$, and it then contains every integer at least equal to $(2-1)(3-1)=2$, hence $P=\mathbf N\setminus\{1\}$, the maximal ideal. This concludes the proof.

Proposition. Let $a$ and $b$ be nonzero coprime natural numbers. For any integer $n\geq (a-1)(b-1)$, there are natural numbers $u$ and $v$ such that $n=au+bv$.

Proof. — Since $a$ and $b$ are coprime, there are integers $u$ and $v$ such that $n=au+bv$. Replacing $u$ by $u+b$ and $v$ by $v-a$, we may assume that $0\leq u$. Replacing $u$ by $u-b$ and $v$ by $v+a$, we may assume that $u < b$. Then \[ bv = n - au \geq (a-1)(b-1)-a(b-1)= -(b-1). \] Since $bv>-b$, this implies $v\geq0$, so that $u$ and $v$ are natural numbers.

On the other hand, not all natural numbers $< (a-1)(b-1)$ can be written as such a sum. For example, $ab-a-b$ can't. Indeed, if $ab-a-b=au+bv$, hence $ab=a(u+1)+b(v+1)$, then $b$ divides $a(u+1)$, hence $b$ divides $u+1$, and similarly $a$ divides $v+1$. Then $ab$ is the sum of two nonzero multiples of $ab$, a contradiction. The precise distribution of the natural numbers $< (a-1)(b-1)$ which can be written as $au+bv$, for some natural numbers $u$ and $v$, is complicated, but at least one result is known: such an integer $n$ can be written in this form if and only if $ab-a-b-n$ cannot! One direction is clear: if $n$ and $ab-a-b-n$ could both be written in this form, then so could their sum, which is $ab-a-b$, a contradiction. On the other hand, let $n$ be an integer that cannot be written in this form, and write it as $n=au+bv$, for some integers $u$ and $v$, with $0\leq u< b$. By assumption, $v<0$, hence $v\leq -1$. Then \[ ab-a-b-n=ab-a-b-au-bv=a(b-1-u)+b(-v-1).\] We see that $b-1-u\geq 0$ and $-v-1\geq 0$, which shows that $ab-a-b-n$ can be written in the desired form.
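
For readers who enjoy checking such statements by machine, here is a small brute-force verification of the proposition and of Sylvester's symmetry for a few coprime pairs (an illustrative Python sketch, not part of the proof; the function names are mine).

from math import gcd

def representable(n, a, b):
    # is n = a*u + b*v for some natural numbers u, v?
    return any((n - a * u) % b == 0 for u in range(n // a + 1))

for a, b in [(2, 3), (5, 7), (4, 9)]:
    assert gcd(a, b) == 1
    frob = a * b - a - b                  # the largest non-representable integer
    bound = (a - 1) * (b - 1)             # = frob + 1
    assert all(representable(n, a, b) for n in range(bound, bound + 200))
    assert not representable(frob, a, b)
    # Sylvester's symmetry: for 0 <= n <= ab-a-b, exactly one of n, ab-a-b-n is representable
    assert all(representable(n, a, b) != representable(frob - n, a, b) for n in range(bound))
    print(a, b, "checked")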

A question in the style of the proposition, but still open, was raised by Frobenius: consider natural numbers $a_1,\dots,a_r$ whose greatest common divisor is $1$; then every large enough integer $n$ can be written as $n=a_1u_1+\dots+a_ru_r$, for some natural numbers $u_1,\dots,u_r$, but when $r\geq 3$, there is no known formula for the largest natural number that cannot be written in this form. The case $r=2$ that we discussed here is due to Sylvester (1884).

Wednesday, July 2, 2025

Autoformalization of mathematical theorems? No shit!

I've been formalizing mathematical theorems in Lean for some years now, and one of the major obstacles is the difficulty of formalizing elementary results that mathematicians do not take the time to even state. For example, a mathematician's integer can implicitly be a natural number on one line and a real number on the next one, while proof assistants require that some “coercion” maps be introduced. Having three `1`s in the same formula, of three different natures, leads to unexpected nightmares. It is therefore very important for the development of proof-assisted mathematics that some automation be provided by the type-checking system, with as many obvious things as possible being taken care of by a combination of a fine-tuned API and efficient search algorithms.

Another reason for the difficulty of formalizing proofs is having to navigate a gigantic mathematical library (Mathlib, for example, has 1.5 million lines of code) and the lack of a definitive search engine and of reliable, complete documentation. This is not to blame my colleagues — all of this is difficult to write, time consuming, and our personal interests pull in different directions.

This may explain why part of the field of formalization is driven by the “AI hype”, and — although this hype is so finely rebutted by Emily Bender and Alex Hanna in their latest book, The AI Con, which I recommend heartily — why several colleagues use LLMs to create proofs, either in natural language or in Lean's syntax. They say it is successful, but from the outside it is really hard to tell whether that is the case. My main impression is that these tools obliterate the time in which our minds try to form ideas, leading — for me — to another kind of stress. There are also serious arguments that this systematic cognitive friction is a necessity of well-formed thinking, and cannot be replaced by stochastic optimization. Moreover, these colleagues usually do not address the environmental cost of using such kinds of generative AI, nor whether its output is worth that cost.

A few days ago, after I complained — one more time — that stochastic algorithms do not think, I was asked my opinion about the “trinity autoformalization system”. I have to confess I carefully avoid the AI news and hadn't heard about it. A first search led me to a 2022 paper Autoformalization with Large Language Models. Here is the definition of “autoformalization” from their paper:

Autoformalization refers to the task of automatically translating from natural language mathematics to a formal language.

In this case, the authors took as a benchmark statements of mathematical competition problems, and were happy to have their LLMs correctly translate roughly 25% of those problems. This might be a technical exploit, but let me stay unimpressed for the moment: what mathematicians need is more than that, and it is not elementary statements of math olympiad problems, but the most advanced mathematical concepts that the human mind is capable of grasping, despite the fact that they involve extremely intricate and sometimes combinatorially involved constructions. I am not sure that we fully know what it means to understand these concepts, but I am certain that it doesn't reduce to being able to formally state a mathematical statement. (And what about those very fine colleagues who demonstrate every day their mathematical depth while failing at stating precise statements?)

It appears my colleague from the Lean community meant another result. Trinity is a project from Morph Labs, which their web page describes as follows:

Morph Labs is building a cloud for superintelligence.
Infinibranch enables AI agents on Morph Cloud to snapshot, replicate, autoscale, test, debug, deploy, and reason about software at light speed.

This starts pretty badly. First of all, superintelligence is even less well defined than intelligence, and claims of developing superintelligence can only be suspicious, especially when no scientific argument justifies that AI software features any intelligence at all, and most AI experiments show a poor rate of success. The next sentence may be worse, since the speed of reasoning is not measured in m/s (nor in ft/hr), and the speed of electric waves in cables is smaller than the speed of light (although, I just learnt, it can be up to 99% of it!).

Their blog page on Trinity starts as follows:

tl;dr
We're excited to announce Trinity, an autoformalization system that represents a critical step toward verified superintelligence.

which combines the colloquial tl;dr (too long; didn't read) with the brutal claim that their autoformalization could be a step towards superintelligence. Later on the webpage, they boast about a nearly infinite supply of verified training environments which, in our finite world plagued by brutal climate change, is quite a thing.

What these people claim to have done is the autoformalization of a 1962 theorem by N. G. de Bruijn (who, incidentally, is the father of Automath, one of the first proof assistants to be built), regarding the behaviour of the number $N(n)$ of integers $\leq n$ all of whose prime factors divide $n$, when $n$ grows to infinity. Answering a question of Erdös, de Bruijn proved that, on average, $N(n)$ is at most $n^{1+\varepsilon}$. On the other hand, the Trinity team puts the emphasis on a consequence of that result for the abc conjecture of Masser and Oesterlé. That conjecture asserts that for any $\varepsilon\gt0$, there are only finitely many triples $(a,b,c)$ of coprime natural numbers such that $a+b =c $ and such that the product $\operatorname{rad}(abc)$ of all prime numbers dividing $abc$ is at most $c^{1-\varepsilon}$. Implications of that mysterious, elementary-looking conjecture for number theoretical questions are manifold, from an asymptotic version of Fermat's Last Theorem to an effective version of the Mordell conjecture to Siegel zeroes of L-functions, etc. Indeed, the theorem of de Bruijn implies that there are at most $\mathrm O(N^{2/3})$ triples $(a,b,c)$ of coprime natural numbers at most $N$ such that $\operatorname{rad}(abc)\lt c^{1-\varepsilon}$. When this bound is compared to the $\approx \frac6{\pi^2}N^2$ triples of coprime integers $(a,b,c)$ such that $a+b=c\leq N$, it indicates that the (hopefully finite) exceptional set of the abc conjecture is rather sparse.
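
Just to make the quantity in de Bruijn's theorem concrete, here is a naive Python sketch computing $N(n)$ for a few values of $n$ (the code and its conventions are mine; in particular $m=1$ is counted).

def prime_factors(n):
    # set of prime factors of n (n >= 1)
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors

def N(n):
    # number of integers m <= n all of whose prime factors divide n
    pf = prime_factors(n)
    return sum(1 for m in range(1, n + 1) if prime_factors(m) <= pf)

print([(n, N(n)) for n in (12, 30, 210)])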

To be fair, I am not capable of judging the difficulty of what they achieved, and I presume it is a difficult piece of software engineering to adjust the myriad parameters of the neural networks involved. But what I can tell is whether they did what they say they did, and whether (this is more of an opinion) it has any relevance for the future of mathematical formalization. The answers, you can guess, will be no, definitely no.

I would first like to make a mathematical comment on the sentence

We obtain the first formalized theorem towards the abc conjecture, showing it is true almost always. 

that can be read on the github page for the project. It echoes the title of the (nice) expository paper by Jared Lichtman, “The abc conjecture is true almost always”, but adds the idea that this theorem would be a step towards the abc conjecture. I claim that it is nothing like that. Indeed, it is an observation of Oesterlé (see Szpiro's paper, page 12) that the $\varepsilon=0$ variant of the abc conjecture is false. On the other hand, the exact same argument shows that the counterexamples to this false conjecture are just as sparse, perhaps with an upper bound $N^{2/3+\varepsilon}$ for the number of counterexamples. Consequently, if de Bruijn's result were a step towards the abc conjecture, it would simultaneously be a step towards a false conjecture. The most reasonable conclusion is then to admit that this theorem has no proof-theoretic relevance to the abc conjecture.

Then, while Morph's blog says: 

Trinity systematically processes entire papers, intelligently corrects its own formalization errors by analyzing failed attempts, and automatically refactors lengthy proofs to extract useful lemmas and abstractions,

the github page for the actual project mentions a human-supplied blueprint. This is a file in (La)TeX format that accompanies the Lean project and describes, lemma after lemma, the route that the formalization has to follow. From the expression “human-supplied”, one has to understand that human beings have written that file, that is to say, have split the proof of de Bruijn's theorem into a series of 67 lemmas which divide the initial paper into tiny results, while giving references to other lemmas as indications of how they could be proved. As an example, here is “lemma24”:

\begin{lemma} \label{lem24} \lean{lemma24}
Let $p_1<\cdots<p_k$ be distinct primes, and denote the product $r = p_1 \cdots p_k$. If an integer $n\ge1$ satisfies $\rad(n)=r$, then $n = p_1^{a_1}\cdots p_k^{a_k}$ for some integers $a_1,\dots,a_k\ge 1$.
\end{lemma}
\begin{proof}\leanok
\uses{thm3, lem20}
Uses theorem \ref{thm3} with $n$, then uses lemma \ref{lem20}.
\end{proof}

which means that if an integer $r$ is the product of distinct primes $p_1,\dots,p_k$ and $\operatorname{rad}(n)=r$, then $n$ can be written $p_1^{a_1}\cdots p_k^{a_k}$, for some integers $a_1,\dots,a_k\geq 1$. The indication is to use “thm3” (the fundamental theorem of arithmetic) and “lem20” (the formula for the radical of an integer in terms of its decomposition into prime factors). 

To summarize, humans have converted the 2-page proof of this result into 67 tiny chunks, supplying plenty of information that would probably have been unnecessary to most professional mathematicians, and fed that to their LLM. I am reluctant to draw a comparison with the AI farms in Kenya where workers were (and probably still are) exploited at tagging violent images for ridiculous pay, but this is yet another instance where “mechanized artificial intelligence” relies crucially on human work (beyond the invention and deployment of neural networks, etc., which are human work as well, of course). And as in these other cases, the technostructure makes every effort to render this human work invisible: the blueprint has no author, the blog post has no author. Everything is made so that you believe that no human was involved in this project.

Let's go back to the content of the blueprint, taking for granted that such work has to be done anyway. We, humans, do not want to have to write endless tiny lemmas, each of them apparently irrelevant to the big mathematical picture. We do not want to keep them in a mathematical library, for almost all of them have no interest outside of their specific context. Moreover, once these lemmas are written, formalizing the proof requires hardly any additional effort from a human. What we do want is a way to translate mathematical ideas with as little effort as possible, staying as close as possible to their natural expression.

Moreover, even if we had to provide these tiny lemmas, we wouldn't be able to keep them in a human-accessible mathematical library, for that library would be cluttered with millions of uninteresting, quickly redundant lemmas. Building a mathematical library is a human task, one that reflects the activity of past human minds and is meant to be used by future human minds.

As a matter of fact, this project considers a result in elementary mathematics, at an undergraduate level, and can completely ignore the difficulty of building a large-scale mathematical library, while it is precisely there that we need better automation. Something which is missing in Lean (but exists in other proof assistant systems, such as Rocq) is a coherent management system for all algebraic/analytic structures, which would allow one to optimize their construction without having to modify the rest of the code. There are some cases of such automation in Lean, though, for example the automatic conversion of lemmas about groups written in multiplicative form to groups written in additive form, giving one proof for two. Building such a tool requires good proficiency with parsers and a clear view of what mathematicians do “in their heads” when they perform that apparently innocuous conversion. (For this reason, another automatic conversion, for changing the direction of ordering relations, appears more difficult to write, because what we would want is less clear. For the moment, Mathlib has resolved to systematically privilege the $\leq$ inequality; the other one can be handled through the dual order, but automating this raises the very same problem.) Another kind of crucial automation is the implementation of solvers for linear inequalities, or automatic proofs of algebraic identities (in numbers, in groups, whatever), so that the sentence “a computation shows that…” could almost be translated as such into formal code. This requires good knowledge of programming, and the use of subtle but classical algorithms (simplex method, Gröbner bases, SAT solvers…), each of them finely tailored to its applications. This is a truly beautiful combination of mathematics and computer science, but nothing close to a so-called “general intelligence” tool.

There is something positive, though, that I could say about the search for automatic formalization. The AI companies do not acknowledge it, and much of public opinion is deluded about it, but LLMs have no reasoning faculty at all, and they just can't. What they do is put together words that would fit together with a good probability given the corpus that the machine has been fed. Patterns exist in language, and they exist in mathematics as well, and the (impressive, I concede) faculty of these tools allows them to simulate text, be it literary or mathematical, that looks plausible. But that doesn't make it true. On the other hand, these tools could be combined with proof assistants that work in a formal, proof-amenable language, thereby offering a way of assessing the veracity of the vernacular output. (On the other hand, one would need to be sure that the text couldn't say $1+1=3$ when the Lean code has certified that $1+1=2$.)

As a final paragraph, I would like to comment on the metaphors invoked by Morph Labs. Everyone who is even loosely knowledgeable about sci-fi movies will have recognized the Wachowskis' movie The Matrix. In that dystopian movie, humans are trapped in a machine-made “reality” after they lost a war against the artificial intelligences they created. Of course, Morpheus and Trinity are humans who fight against this power, but I wonder what the creators of Morph Labs had in mind when they decided to give themselves this name, and to go on with the metaphor by using the name Trinity. (And who's Neo?) Their obvious reference is a world where artificial intelligence led to humanity's doom, while their rhetoric is one of AI hype. We rejoin here the discussion of Bender and Hanna in their abovementioned book, where they explain that AI doomerism and boosterism are two faces of the same coin, one that takes every development of genAI as unavoidable, whatever its social, ethical, and ecological costs.

Monday, April 7, 2025

A generalization of the Eisenstein criterion

Recently, on the Zulip server for Lean users, somebody came up with something that looked like homework, but managed to sting me a little bit.

It was about the irreducibility of polynomials with integer coefficients. Specifically, the guy wanted a proof that the polynomial $T^4-10 T^2+1$ is irreducible, claiming that the Eisenstein criterion was of no use for it.

What was to be proven (this is what irreducible means) is that it is impossible to write that polynomial as the product of two polynomials with integer coefficients, except by writing $T^4-10 T^2+1$ as $(1)·(T^4-10 T^2+1)$ or as $ (-1)·(-T^4+10 T^2-1)$.

This is both a complicated and a trivial question.

Trivial because there are general bounds (initially due to Mignotte) for the integers that appear in any such factorization, and it would suffice to try every possibility in the given range and conclude. Brute force, not very intelligent, but with a certain outcome.

Complicated because those computations would often be long; this is why number theory has many criteria for asserting irreducibility.

One of the easiest criteria is the aforementioned Eisenstein criterion.

This criterion requires an auxiliary prime number, and I'll illustrate it on an example, using another polynomial, say $T^4 - 10 T^2 + 2$. Here, one takes the prime number 2, and one observes that modulo 2, the polynomial is equal to $T^4$, while the constant term, 2, is not divisible by $2^2=4$. In that case, the criterion immediately asserts that the polynomial $T^4-10T^2+2$ is irreducible.

With all due respect to Eisenstein, I don't like that criterion too much, though, because it only applies in rather exceptional situations. Still, it is often useful.

There is, by the way, a classic application of the Eisenstein criterion, namely the irreducibility of cyclotomic polynomials of prime index, for example $T^4+T^3+T^2+T+1$ (for the prime number $p=5$). But it is not immediately visible that the criterion applies, and one needs to perform the change of variable $T = X+1$.

I had noticed that in that case, it may be interesting to generalize the Eisenstein criterion so as to avoid this change of variables. Indeed, for a monic polynomial $f$ in $\mathbf Z[T]$ of degree $d$, a variant of the criterion states: take an integer $a$ and a prime number $p$, and assume that:

  • $f(T) \equiv (T-a)^d \mod p$
  • $f(a)$ is not divisible by $p^2$.

Then $f$ is irreducible. (This is just the original criterion applied to the shifted polynomial $f(X+a)$, whose constant term is $f(a)$.)

For the cyclotomic polynomial of index $5$, $f(T) = T^4+T^3+T^2+T+1$, still taking $p=5$, one has $f(T) \equiv (T-1)^4 \pmod5$, and $f(1)=5$ is not divisible by $5^2=25$. Consequently, it is irreducible. And the same argument works for all cyclotomic polynomials of prime index.
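
For the skeptical reader, here is a quick machine check of these two facts (a sketch using sympy as a calculator; nothing here is specific to the criterion itself).

from sympy import symbols, Poly

T = symbols('T')
f = T**4 + T**3 + T**2 + T + 1
p = 5
# every coefficient of f - (T-1)^4 is divisible by p, i.e. f is (T-1)^4 modulo p
print(all(c % p == 0 for c in Poly(f - (T - 1)**4, T).all_coeffs()))   # True
# f(1) = 5 is not divisible by p^2 = 25
print(f.subs(T, 1), f.subs(T, 1) % p**2)                               # 5 5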

The reason, which avoids any strange computation, is that $f(T)=(T^5-1)/(T-1)$, which modulo 5 is $(T-1)^5/(T-1)=(T-1)^4$ — by the divisibility of the binomial coefficients $\binom5k$ by $5$ for $0<k<5$.

To go back to the initial example, $T^4-10T^2+1$, there is indeed no prime number with which the Eisenstein criterion can be applied. This is obvious in the standard form, because the constant coefficient is 1. But the variant doesn't help either. The only prime that could work seems to be 2, since $f\equiv(T-1)^4\pmod 2$, but $f(1)=-8$ is divisible by $4$.

This is where a new variant of the criterion can be applied, this time with the prime number 3.

Theorem. Let $q\in\mathbf Z[T]$ be a monic polynomial, let $p$ be a prime number such that $q$ is irreducible in $\mathbf F_p[T]$. Let $f\in\mathbf Z[T]$ be a monic polynomial. Assume that $f\equiv q^d$ modulo $p$, for some integer $d\geq1$, but that $f\not\equiv 0$ modulo $\langle q, p^2\rangle$. Then $f$ is irreducible in $\mathbf Z[T]$.

To see how this criterion applies, observe that modulo 3, one has $f(T)\equiv T^4+2T^2+1=(T^2+1)^2\pmod 3$. So we are almost as in the initial criterion, but the polynomial $T-a$ is replaced by $T^2+1$. The first thing that allows this criterion to apply is that $T^2+1$ is irreducible modulo 3; in this case, this is because $-1$ is not a square mod 3.

The criterion also requires an analogue of the condition on $f(a)$ — it holds because the polynomial is not zero modulo $\langle T^2+1, 9\rangle$. Here, one has \[ T^4-10T^2+1=(T^2+1)^2-12T^2 = (T^2+1)^2-12(T^2+1)+12\] hence it is equal to 3 modulo $\langle T^2+1, 9\rangle$.
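
This last computation can also be delegated to a computer algebra system; here is a sketch with sympy (since the divisor $T^2+1$ is monic, the quotient and remainder have integer coefficients).

from sympy import symbols, div

T = symbols('T')
f = T**4 - 10*T**2 + 1
q, r = div(f, T**2 + 1, T)   # f = (T^2 + 1) q + r
print(q, r)                  # T**2 - 11, 12
print(r % 9)                 # 3: f is congruent to 3 modulo <T^2 + 1, 9>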

And so we have an Eisenstein-type proof that the polynomial $T^4-10T^2+1$ is irreducible over the integers. CQFD!

I made the fun last a bit longer by formalizing the proof in Lean, first the generalized criterion, and then the particular example. It is not absolutely convincing yet, because Lean/Mathlib still lacks some tools for handling explicit computations. And probably many parts can be streamlined. Still, it was a fun exercise to do.

The proof works in a more general context and gives the following theorem:

Theorem. Let $R$ be an integral domain, let $P$ be a prime ideal of $R$ and let $K$ be the field of fractions of $R/P$. Let $q\in R[T]$ be a monic polynomial such that $q$ is irreducible in $K[T]$. Let $f\in R[T]$ be a monic polynomial. Assume that $f\equiv q^d$ modulo $P$, but that $f\not\equiv 0$ modulo $\langle q\rangle + P^2$. Then $f$ is irreducible in $R[T]$.

Saturday, March 29, 2025

A simple proof of a theorem of Kronecker

Kronecker's theorem of the title is the following.

Theorem. Let $\alpha\in\mathbf C$ be an algebraic integer all of whose conjugates have absolute value at most $1$. Then either $\alpha=0$, or $\alpha$ is a root of unity.

This theorem has several elementary proofs. In this post, I explain the simple proof proposed by Gebhart Greiter in his American Mathematical Monthly note, adding details so that it would (hopefully) be more accessible to students.

The possibility $\alpha=0$ is rather unentertaining, hence let us assume that $\alpha\neq0$.

Let us first analyse the hypothesis. The assumption that $\alpha$ is an algebraic integer means that there exists a monic polynomial $f\in\mathbf Z[T]$ such that $f(\alpha)=0$. I claim that $f$ can be assumed to be irreducible in $\mathbf Q[T]$; it is then the minimal polynomial of $\alpha$. This follows from Gauss's theorem, and let me give a proof for that. Let $g\in \mathbf Q[T]$ be a monic irreducible factor of $f$ such that $g(\alpha)=0$ and let $h\in\mathbf Q[T]$ be such that $f=gh$. Chasing denominators and dividing out by the gcd of the coefficients, there are polynomials $g_1,h_1\in\mathbf Z[T]$, each with mutually coprime coefficients, and positive rational numbers $u,v$ such that $g_1=ug$ and $h_1=vh$. Then $(uv)f=g_1 h_1$. Since $f$ is monic, $uv$ is the leading coefficient of $g_1h_1$, hence an integer. Let us prove that $uv=1$; otherwise, it has a prime factor $p$. Considering the relation $(uv)f=g_1h_1$ modulo $p$ gives $0=g_1 h_1 \pmod p$. Since their coefficients are mutually coprime, the polynomials $g_1$ and $h_1$ are nonzero modulo $p$, hence their product is nonzero, $\mathbf F_p[T]$ being a domain. This is a contradiction. Consequently $uv=1$; since $u$ and $v$ are the leading coefficients of $g_1$ and $h_1$, hence integers, one has $u=v=1$, so that $g=g_1$ is a monic polynomial with integer coefficients such that $g(\alpha)=0$, as claimed.

So we have a monic polynomial $f\in\mathbf Z[T]$, irreducible in $\mathbf Q[T]$, such that $f(\alpha)=0$. That is to say, $f$ is the minimal polynomial of $\alpha$, so that the conjugates of $\alpha$ are the roots of $f$. Note that the roots of $f$ are pairwise distinct — otherwise, $\gcd(f,f')$ would be a nontrivial factor of $f$. Moreover, $0$ is not a root of $f$, for otherwise one could factor $f=T\cdot f_1$.

Let us now consider the companion matrix to $f$: writing $f=T^n+c_1T^{n-1}+\dots+c_n$, so that $n=\deg(f)$, this is the matrix \[ C_f = \begin{pmatrix} 0 & \dots & 0 & -c_n \\ 1 & \ddots & \vdots & -c_{n-1} \\ 0 & \ddots & & -c_{n-2} \\ 0 & \cdots & 1 & -c_1 \end{pmatrix}.\] If $e_1,\ldots,e_n$ are the canonical column vectors $e_1=(1,0,\dots,0)$, etc., then $C_f\cdot e_1=e_2$, \ldots, $C_f \cdot e_{n-1}=e_{n}$, and $C_f\cdot e_n = -c_{n} e_1-\dots -c_1 e_n$. Consequently, \[ f(C_f)\cdot e_1 = C_f^n\cdot e_1 +c_1 C_f^{n-1} \cdot e_1+ \dots +c_n e_1 = 0.\] Moreover, for $2\leq k\leq n$, one has $f(C_f)\cdot e_k=f(C_f)\cdot C_f^{k-1}\cdot e_1=C_f^{k-1}\cdot f(C_f)\cdot e_1=0$. Consequently, $f(C_f)=0$ and the complex eigenvalues of $C_f$ are roots of $f$. Since $f$ has simple roots, $C_f$ is diagonalizable and there exist a matrix $P\in\mathrm{GL}(n,\mathbf C)$ and a diagonal matrix $D$ such that $C_f=P\cdot D\cdot P^{-1}$; the diagonal entries of $D$ are roots of $f$. Since $0$ is not a root of $f$, the matrix $D$ is invertible, and $C_f$ is invertible as well. More precisely, one can infer from the definition of $C_f$ that $g(C_f)\cdot e_1\neq 0$ for any nonzero polynomial $g$ of degree $<n$, so that $f$ is the minimal polynomial of $C_f$. Consequently, all of its roots are actually eigenvalues of $C_f$, and appear in $D$; in particular, $\alpha$ is an eigenvalue of $C_f$.

For every $k\geq 1$, one has $C_f^k=P\cdot (D^k)\cdot P^{-1}$. Since all entries of $D$ have absolute value at most $1$, the set of all $D^k$, for $k\geq1$, is bounded in $\mathrm{M}_n(\mathbf C)$. Consequently, the set $\{C_f^k\,;\, k\geq 1\}$ is bounded in $\mathrm M_n(\mathbf C)$. On the other hand, this set consists of matrices in $\mathrm M_n(\mathbf Z)$. It follows that this set is finite.

There are integers $k$ and $\ell$ such that $k< \ell$ and $C_f^k=C_f^\ell$. Since $C_f$ is invertible, one has $C_f^{\ell-k}=1$. Since $\alpha$ is an eigenvalue of $C_f,$ this implies $\alpha^{\ell-k}=1$. We thus have proved that $\alpha$ is a root of unity.
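
To see the finiteness argument in action on a concrete example, here is a tiny sketch (sympy used as a calculator; the example is mine): for $f=T^4+T^3+T^2+T+1$, whose roots are the primitive 5th roots of unity, the companion matrix satisfies $C_f^5=I$, so its powers form a finite set.

from sympy import Matrix, eye

# companion matrix of f = T^4 + T^3 + T^2 + T + 1, with the convention above (c_1 = ... = c_4 = 1)
C = Matrix([[0, 0, 0, -1],
            [1, 0, 0, -1],
            [0, 1, 0, -1],
            [0, 0, 1, -1]])
print(C**5 == eye(4))   # True: the powers of C cycle with period 5, and alpha^5 = 1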

Friday, September 13, 2024

The combinatorial Nullstellensatz

The “combinatorial Nullstellensatz” is a relatively elementary statement due to Noga Alon (1999) whose name, while possibly frightening, really says what it is and what it is good for. (A freely available version is there.) Nullstellensatz is the classic name for a theorem of David Hilbert that relates loci in $F^n$ defined by polynomial equations and the ideals of the polynomial ring $F[T_1,\dots,T_n]$ generated by these equations, when $F$ is an algebraically closed field. The word itself is German and means “theorem about the location of zeroes”. The precise correspondence is slightly involved, and stating it here precisely would derail us from the goal of this post, so let us stay with these informal words for the moment. The adjective combinatorial in the title refers to the fact that Alon deduced many beautiful consequences of this theorem in the field of combinatorics, and for some of them, furnishes quite simple proofs. We'll give one of them below, the Cauchy—Davenport inequality, but my main motivation in writing this text was to discuss the proof of the combinatorial Nullstellensatz, in particular untold aspects of the proofs which took me some time to unravel.$\gdef\Card{\operatorname{Card}}$

The context is the following: $F$ is a field, $n$ is a positive integer, and one considers finite sets $S_1,\dots,S_n$ in $F$, and the product set $S=S_1\times \dots \times S_n$ in $F^n$. One considers polynomials $f\in F[T_1,\dots,T_n]$ in $n$ indeterminates and their values at the points $(s_1,\dots,s_n)$ of $S$. For every $i\in\{1,\dots,n\}$, set $g_i = \prod_{a\in S_i} (T_i-a)$; this is a polynomial of degree $\Card(S_i)$ in the indeterminate $T_i$.

Theorem 1. — If a polynomial $f\in F[T_1,\dots,T_n]$ vanishes at all points $s\in S$, there exist polynomials $h_1,\dots,h_n\in F[T_1,\dots,T_n]$ such that $f=g_1 h_1+\dots+g_n h_n$, and for each $i$, one has $\deg(g_i h_i)\leq \deg (f)$.

This theorem is really close to the classic Nullstellensatz, but the specific nature of the set $S$ allows to have a weaker hypothesis (the field $F$ is not assumed to be algebraically closed) and a stronger conclusion (the Nullstellensatz would assert that there exists some power $f^k$ of $f$ that is of this form, saying nothing of the degrees).

Its proof relies on a kind of Euclidean division algorithm. Assume that $f$ has some monomial $c_mT^m=c_mT_1^{m_1}\dots T_n^{m_n}$ where $m_i\geq \Card(S_i)$; then one can consider a Euclidean division (in the variable $T_i$), $T_i^{m_i}=p_i g_i + r_i$, where $\deg(r_i)<\Card(S_i)$. One can then replace this monomial $c_mT^m$ in $f$ by $c_m (T^m/T_i^{m_i})\,r_i$, and, at the same time, add $c_m (T^m/T_i^{m_i})\, p_i$ to $h_i$. Since this amounts to subtracting a multiple of $g_i$, the vanishing assumption still holds. Applying this method repeatedly, one reduces to the case where the degree of $f$ in the variable $T_i$ is $<\Card(S_i)$ for all $i$.
Then a variant of the theorem that says that a polynomial in one indeterminate has no more roots than its degree implies that $f\equiv 0$.

A weak point in the written exposition of this argument is the reason why the iterative construction terminates. Intuitively, something has to decrease, but one needs a minute (or an hour, or a week) of reflection to understand what precisely decreases. The problem is that if one works one monomial at a time, the degrees might stay the same. The simplest reason why this procedure indeed works belongs to the theory of Gröbner bases and to the proof that the division algorithm actually works: instead of the degrees with respect to all variables, or of the total degree, one should consider a degree with respect to some lexicographic ordering of the variables — the point is that it is a total ordering of the monomials, so that if one considers the leading term, given by the largest monomial, that monomial actually decreases at each step.
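
This division procedure is implemented in most computer algebra systems. Here is a minimal sketch with sympy's multivariate reduction, for $S_1=S_2=\{0,1\}$ and a polynomial of my own choosing that vanishes on $S_1\times S_2$ (an illustration, not part of the proof).

from sympy import symbols, reduced, expand

x, y = symbols('x y')
g1 = x * (x - 1)               # vanishes on S_1 = {0, 1}
g2 = y * (y - 1)               # vanishes on S_2 = {0, 1}
f = expand(x**2 * y - x * y)   # vanishes at every point of S_1 x S_2

(h1, h2), r = reduced(f, [g1, g2], x, y, order='lex')
print(h1, h2, r)                         # y, 0, 0: the remainder vanishes
print(expand(f - (g1 * h1 + g2 * h2)))   # 0, as in theorem 1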

The second theorem is in fact the one which is needed for the applications to combinatorics. Its statement is rather written in a contrapositive way — it will show that $f$ does not vanish at some point of $S$.

Theorem 2. Let $f\in F[T_1,\dots,T_n]$ and assume that $f$ has a nonzero monomial $c_mT^m$, where $|m|$ is the total degree of $f$. If, moreover, $m_i<\Card(S_i)$ for all $i$, then there exists a point $s=(s_1,\dots,s_n)\in S$ such that $f(s)\neq 0$.

It is that theorem whose proof gave me a hard time. I finally managed to formalize it in Lean, hence I was pretty sure I had understood it. In fact, writing this blog post helped me simplify it further, removing a couple of useless arguments by contradiction! Anyway, I feel it is worth telling with a bit more detail than in the original paper.

We argue by contradiction, assuming that $f(s)=0$ for all $s\in S_1\times\dots \times S_n$. Applying theorem 1, we get an expression $f=\sum_{i=1}^n g_i h_i$ where $\deg(g_ih_i)\leq \deg(f)$ for all $i$. The coefficient of $T^m$ in $f$ is nonzero, by assumption; it is also the sum of the coefficients of $T^m$ in the $g_i h_i$, for $1\leq i\leq n$, and we are going to prove that all of them vanish — hence getting the desired contradiction. $\gdef\coeff{\operatorname{coeff}}$

Fix $i\in\{1,\dots, n\}$.  If $h_i=0$, then $\coeff_m(g_ih_i)=\coeff_m(0)=0$, hence we may assume that $h_i\neq0$. This implies that $\deg(g_i h_i)=\Card(S_i)+\deg(h_i) \leq \deg(f)=|m|$.
One then has
\[ \coeff_m(g_i h_i)=\sum_{p+q=m} \coeff_p (g_i) \coeff_q(h_i), \]
and it suffices to prove that all terms of this sum vanish. Fix $p,q$ such that $p+q=m$, assume that  $\coeff_p(g_i)\neq 0$, and let us prove that $\coeff_q(h_i)=0$.

By the very definition of $g_i$ as a product of $\Card(S_i)$ factors of the form $T_i-a$, this implies that $p_j=0$ for all $j\neq i$. Moreover, $p_i\leq p_i+q_i=m_i < \Card(S_i)$, by the assumption of the theorem, hence $p_i<\Card(S_i)$. This implies \[\Card(S_i)+\deg(h_i)\leq |m|= |p+q|=|p|+|q|=p_i+|q|<\Card(S_i)+|q|.\]
Subtracting $\Card(S_i)$ on both sides, one gets $\deg(h_i)<|q|$, hence $\coeff_q(h_i)=0$, as was to be shown.

To conclude, let us give the promised combinatorial application, the Cauchy—Davenport theorem.
$\gdef\F{\mathbf F}$

Theorem 3. Let $p$ be a prime number and let $A$ and $B$ be nonempty subsets of $\F_p$, the field with $p$ elements. Denote by $A+B$ the set of all $a+b$, for $a\in A$ and $b\in B$. Then
\[ \Card(A+B) \geq \min (p, \Card(A)+\Card(B)-1).\]
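
Before going through the proof, here is a brute-force check of the inequality for a small prime (an illustrative Python sketch, exhaustive over all nonempty subsets, so keep $p$ small).

from itertools import combinations

p = 7
subsets = [set(c) for k in range(1, p + 1) for c in combinations(range(p), k)]

for A in subsets:
    for B in subsets:
        AB = {(a + b) % p for a in A for b in B}
        assert len(AB) >= min(p, len(A) + len(B) - 1)
print("Cauchy-Davenport checked for p =", p)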


First consider the case where $\Card(A)+\Card(B)>p$, so that $\min(p,\Card(A)+\Card(B)-1)=p$. In this case, for every $c\in\F_p$, one has
\[\Card(A \cap (c-B)) + \Card(A\cup (c-B))=\Card(A)+\Card(c-B)=\Card(A)+\Card(B)>p;\]
since $\Card(A\cup (c-B))\leq p$, this forces $\Card(A\cap (c-B))>0$, so that the sets $A$ and $c-B$ intersect. In particular, there exist $a\in A$ and $b\in B$ such that $a+b=c$. Hence $A+B=\F_p$, and the desired inequality holds.

Let us now treat the case where $\Card(A)+\Card(B)\leq p$, so that $\min(p,\Card(A)+\Card(B)-1)=\Card(A)+\Card(B)-1$. We will assume that the conclusion of the theorem does not hold, that is, $\Card(A+B)\leq \Card(A)+\Card(B)-2$, and derive a contradiction. Let us consider the polynomial $f\in \F_p[X,Y]$ defined by $f=\prod _{c\in A+B} (X+Y-c)$. The degree of $f$ is equal to $\Card(A+B)\leq \Card(A)+\Card(B)-2$. Choose natural numbers $u$ and $v$ such that $u+v=\Card(A+B)$, $u\leq\Card(A)-1$ and $v\leq \Card(B)-1$. The coefficient of $X^u Y^v$ in $f$ is equal to the binomial coefficient $\binom{u+v}u$. Since $u+v\leq \Card(A)+\Card(B)-2\leq p-2$, this coefficient is nonzero in $\F_p$. By theorem 2, applied with $S_1=A$ and $S_2=B$, there exists $(a,b)\in A\times B$ such that $f(a,b)\neq 0$. However, one has $a+b\in A+B$, hence $f(a,b)=0$ by the definition of $f$. This contradiction concludes the proof that $\Card(A+B)\geq \Card(A)+\Card(B)-1$. It also concludes this post.

Saturday, July 20, 2024

Number theory and finite automata

$\gdef\abs#1{\lvert#1\rvert}$

There are many parts of number theory I don't know of, and today's story is about one I learnt very recently, at the intersection of number theory and computer science. It is about the natural numbers that we can recognize using a finite automaton. 

Finite automata and automatic functions

Finite automata

Finite automata are very elementary computers. These machines can receive instructions of finitely many sorts, and can be in finitely many states; moreover, their behaviour is prescribed: when, in a given state, they receive one instruction, they just move to another state. Mathematically, one can say that there is a finite set $A$ of instructions, a finite set $S$ of states, and a mapping $\phi\colon A \times S\to S$. One state $\varepsilon$ is labeled as the initial state; when reading a list of instructions $[a_1,\dots,a_m]$, the machine goes through the states $s_0=\varepsilon$, $s_1=\phi(a_1,s_0)$, $s_2=\phi(a_2,s_1)$, etc., until it reaches and outputs the final state $s_m$. All in all, the automaton defines a function $\Phi$ from the set $A^*$ of lists in $A$ to the set $S$. Such functions are called automatic.

In practice, some states could be labeled as admissible, in which case the machine unlocks the door, and the set of lists $\alpha\in A^*$ such that $\Phi(\alpha)$ is admissible is then called the language of the automaton.

The basic question of automata theory is to describe the (so-called “regular”) languages recognizable by some automaton, more generally the automatic functions. It appears that they are of a very restricted nature: due to the elementary nature of finite automata, they are much more restricted than the kind of languages that would be recognizable by a program in the programming language of your choice. The reason is that they have a fixed memory while “theoretical” computers, as modelled by Turing machines for example, have a potentially infinite memory, that is, as large as the computation may require.

Numbers

Number theory enters when you decide that the set $A$ of instructions consists of the digits 0, 1, 2,…, 9 and that the lists in $A^*$ represent numbers. You then wish to understand which numbers can be recognized, or which functions $\Phi\colon\mathbf N\to S$ are automatic, for a finite set $S$. This is what happens, for example, in the very elementary finite automata that stand at almost all Paris doors as electronic locks: they recognize exactly one 4-digit number, and any other 4-digit number puts them back in the initial state.

When it comes to digits, they can be the usual ones, 0, 1, 2,…, 9, writing numbers in base 10, but any other base can be considered, and any set of digits that allows one to write all numbers can be used. For example, one can consider writing numbers in base 3, with digits 0, 1, 2, 3, 4.

Let us thus fix some base $b$. The first theorem is that the set of automatic functions does not depend on the set of digits which is considered. The reason is that one can build another kind of machine (called a transducer) that works like a finite automaton except that it outputs something at each step. Here, we can have a transducer that outputs the digits in some representation given the digits in another one. There could be carries to take care of, but when the base is the same, their impact is controlled, and only finitely many of them can occur. It is important for this argument that the base stays the same, and we will see that in a more striking way later on. For this reason, we will not specify the set of digits in the sequel.

Examples

Here are some examples of recognizable sets of numbers.

  • The empty set works. It suffices to have no admissible state.
  • Singletons work. For example, to recognize “314”, one needs one initial state “$\varepsilon$”, three states labeled “3”, “31” and “314”, and a junk state “J”. The machine moves from “$\varepsilon$” to “3” if it receives “3”, and to “J” otherwise; from “3” to “31” if it receives “1”, etc.
  • The union and the intersection of two recognizable sets are recognizable. Just make an automaton whose states are the pairs of states of the two automata that recognize the given sets, and simulate the moves of each of them. Similarly, the complement of a recognizable set is recognizable.
  • By the previous examples, finite sets and complements of finite sets are recognizable.
  • Here is a number theoretically more interesting example: arithmetic progressions are recognizable. To produce an automaton that, say, recognizes all integers congruent to 1 modulo 4, it suffices to have a machine with 4 states that computes the Euclidean remainder step by step (see the sketch after this list).
  • Another number theoretically interesting example: powers of $b$. Indeed their writing in base $b$ consists of one digit “1” followed by a string of “0”s.
  • A number theoretically annoying property of recognizable sets: since a finite automaton has finitely many states, the “pumping lemma” shows that for any large enough recognizable number, one can replicate at will some inner part of its writing and get another recognizable number. I suppose one can deduce from this property that the set of prime numbers cannot be recognizable. (This is exercise 5.8.12 in the book Automatic sequences, by J-P. Allouche and J. Shallit.)
  • Numbers recognizable in base $b$ are also recognizable in base $b^2$ (or any other power) and conversely: it suffices to emulate the base $b^2$ writing of a number using base $b$ digits, or conversely.
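
Here is a minimal Python sketch of the automaton promised in the example on arithmetic progressions: the instructions are decimal digits, the states are the residues modulo 4, and the automatic function $\Phi$ computes the remainder modulo 4 of a number from its digits, read from left to right (the code and the names are mine).

A = range(10)        # the instruction set: decimal digits
S = range(4)         # the state set: residues modulo 4
initial = 0          # the initial state

def phi(a, s):
    # transition map: reading digit a while in state s
    return (10 * s + a) % 4

def Phi(digits):
    # the automatic function: run the automaton on a list of digits
    s = initial
    for a in digits:
        s = phi(a, s)
    return s

print(Phi([3, 1, 4]))   # 2, since 314 = 4 * 78 + 2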

Building on characteristic functions of recognizable languages, the preceding results can be turned into examples of automatic functions. In particular, we obtain that ultimately periodic functions are automatic.

Cobham's theorem

Statement

Number theory can also enter a topic from two sides at a time, and here is the point of this blog post. Can one compare the sets of recognizable numbers from different bases? By the following theorem, they are genuinely distinct.

Theorem (Cobham, 1969). — Let $S$ be a finite set and let $f\colon \mathbf N\to S$ be a function which is automatic in two bases $b$ and $c$. If $b$ and $c$ are not powers of the same integer, then $f$ is ultimately periodic.
Since ultimately periodic functions are automatic in any base, this theorem is definitely optimal.

The original proof of Cobham's theorem is said to be difficult, and the theorem has since received various other, simpler proofs, often partially flawed. I wish to describe here the recent proof by Thijmen Krebs (publisher version, arXiv:1801.06704), which calls itself “more reasonable”.

The assumption that $a$ and $b$ are not powers of the same integer will appear in a different, equivalent form in the proof: they have no common power.

Local periods

To prove that $f$ is ultimately periodic, Krebs proves that $f$ is periodic on large overlapping intervals.

Lemma 1. Let $f: \mathbf N\to S$ be a function and let $I$ and $J$ be intervals of integers. We assume that $f$ has period $p$ on $I$ and period $q$ on $J$. If $\operatorname{Card}(I\cap J)\geq p+q$, then $f$ has period $p$ on the interval $I\cup J$.

The proof is elementary. Consider $x\in I\cup J$ such that $x+p\in I\cup J$. If both $x$ and $x+p$ belong to $I$, then $f(x)=f(x+p)$. Let us assume otherwise. If $x$ belongs to $I$, but not $x+p$, then $x+p\in J$; since $I\cap J$ has at least $p+q$ elements, $x$ must belong to $J$. The other cases are similar and show that both $x$ and $x+p$ belong to $J$. Using that $I\cap J$ is large, we pick elements $y,y+p$ in $I\cap J$ such that $x\equiv y \pmod q$. Then $f(x)=f(y)=f(y+p)=f(x+p)$, the first and last equalities using the $q$-periodicity on $J$, and the middle one the $p$-periodicity on $I$.

A diophantine approximation lemma

Lemma 2. Let $a$ and $b$ be two real numbers such that $a,b>1$. For every $\varepsilon>0$, there exist nonzero natural numbers $m$ and $n$ such that $\abs{a^m-b^n}<\varepsilon b^n$.

The result is obvious if $a$ and $b$ have a common power, so we may assume that this is not the case. Then various proofs are possible. For example, one may consider the additive subgroup of $\mathbf R$ generated by $\log(a)$ and $\log(b)$; by assumption, it is dense, hence there exists a small linear combination $m\log(a)-n\log(b)$; taking exponentials, $a^m/b^n$ is close to $1$, as claimed.

Krebs gives a nice “pigeonhole-like” proof. For every large enough integer $m$, consider the unique integer $n_m\geq 1$ such that $1\leq a^m/b^{n_m}<b$. There must be two integers $m$ and $p$ such that $m<p$ and such that $a^m/b^{n_m}$ and $a^p/b^{n_p}$ differ by at most $\varepsilon$, and this implies the result.
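
Here is what such a search looks like in practice, for $a=2$, $b=3$ and $\varepsilon=1/100$ (a small Python sketch with exact integer arithmetic, purely for illustration).

a, b = 2, 3
eps_num, eps_den = 1, 100      # epsilon = 1/100

def find_powers():
    m = 1
    while True:
        am = a**m
        n, bn = 0, 1
        while bn * b <= am:          # after this loop, b^n <= a^m < b^(n+1)
            n, bn = n + 1, bn * b
        for nn, pw in ((n, bn), (n + 1, bn * b)):   # compare a^m with b^n and b^(n+1)
            if nn >= 1 and eps_den * abs(am - pw) < eps_num * pw:
                return m, nn
        m += 1

print(find_powers())   # (84, 53): |2^84 - 3^53| < 3^53 / 100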

Some functional equations

Let $f\colon \mathbf N\to S$ be a function which is automatic in some base $c$, computed by an automaton with set of states $S$. For every $s\in S$, let $L(s)$ be the set of integers $n$ such that $f(n)=s$. Note that these sets form a partition of $\mathbf N$.

The following lemma is a variant of the pumping lemma in the theory of automata; it is through it that automatic functions acquire their specific properties.

Lemma 3. For all integers $x,y\in L(s)$, every integer $n\geq 0$ and any integer $z$ such that $0\leq z < c^n$, one has $f(xc^n+z)=f(yc^n +z)$.

Indeed, when the automaton reads the integer $xc^n+z$, it first reads $x$, and goes to state $s$; similarly, when it reads $yc^n+z$, it first reads $y$ and reaches the state $s$ too. In both cases it then reads $z$, written with exactly $n$ digits (adding leading zeroes if necessary), and ends up in the same final state.

Construction of local periods

We now consider a function $f$ which is automatic in two bases $a$ and $b$ which do not have common powers. We thus have two finite automata $A$ and $B$, with sets of states $S$ and $T$. For each state $s\in S$, we have a set $L(s)$ of integers as above; these sets form a partition of $\mathbf N$. Let $S^\infty$ be the set of states $s\in S$ such that $L(s)$ is infinite. For $t\in T$, we define $M(t)$ similarly, as well as $T^\infty$.

Let $s\in S^\infty$. Since the $M(t)$, for $t\in T$, form a finite partition of $\mathbf N$ and $L(s)$ is infinite, there exists a state $t(s)$ and two distinct integers $x(s)$ and $y(s)\in L(s)\cap M(t(s))$. Let $K$ be an integer strictly greater than all $x(s), y(s)$, for $s\in S^\infty$.

Applying lemma 2 for $\varepsilon = 1/6K$, we obtain integers $m,n\geq1$ such that $\abs{a^m-b^n}<a^m/6K$. For $s\in S^\infty$, we set $p(s)=\pm (x(s)-y(s))(a^m-b^n)$; since $a$ and $b$ have no common power, this is a nonzero integer, and we choose the sign so that it is strictly positive.

For each integer $x$, we define an interval $I_x$ of integers by $I_x=[(x+\frac13)a^m,(x+\frac53)a^m]\cap\mathbf N$.

Lemma 4. For every state $s\in S^\infty$ and any integer $x\in L(s)$, the function $f$ has period $p(s)$ on $I_x$.

To that aim, we consider $z\in [\frac13a^m,\frac53a^m]$ and prove that $f(xa^m+z)=f(xa^m+z+p(s))$. First of all, we have the inequality $$ \abs{z-x(s)(a^m-b^n)-b^n} \leq \abs{z-a^m}+(x(s)+1) \abs{a^m-b^n} \leq \frac23 a^m + K\abs{a^m-b^n} \leq \frac56a^m\leq b^n $$ because $\abs{a^m-b^n}\leq \frac16 a^m$. Consequently, $z$ is written with at most $n$ digits in base $b$. Applying lemma 3, we have $ f(x a^m+z) = f(x(s)a^m+z)$, because $x$ and $x(s)\in L(s)$. Applying lemma 3 again, this is equal to $f(x(s)a^m+z-x(s)(a^m-b^n))$, hence to $f(y(s)a^m+z-x(s)(a^m-b^n)) = f(y(s)a^m+z+p(s))$. Applying lemma 3 a third time, this is equal to $f(xa^m+z+p(s))$. This concludes the proof.

Conclusion

Since the sets $L(s)$, for $s\notin S^\infty$, are finite, the sets $L(s)$, for $s\in S^\infty$, cover all integers larger than some integer, say $x$.

For every $y\geq x$, there exists $s\in S^\infty$ such that $y\in L(s)$, and we have shown that $f$ is periodic with period $p(s)\leq \frac16a^m$ on the interval $I_y=[(y+\frac13)a^m,(y+\frac53)a^m]$. By abuse of notation, write $p(y)=p(s)$.

Let $J(z)$ be the union of the intervals $I_y$, for $x\leq y\leq z$. One has $J(x)=I_x$; for $z>x$, the intersection $I_z\cap J(z-1)$ contains $I_z\cap I_{z-1}=[(z+\frac13)a^m,(z+\frac23)a^m]$, hence has at least $\frac13a^m$ points, thus at least $p(x)+p(z)$. Using lemma 1 and induction on $z$, $f$ has period $p(x)$ on $J(z)$, for all $z$.

This concludes the proof that $f$ is ultimately periodic.

Thursday, July 6, 2023

Electrostatics and rationality of power series

I would like to tell here a story that runs over some 150 years of mathematics, around the following question: given a power series $\sum a_n T^n$ (in one variable), how can you tell it comes from a rational function?

There are two possible motivations for such a question. One comes from complex function theory: you are given an analytic function and you wish to understand its nature — the simplest such functions being the rational ones, it is natural to wonder whether that happens or not (the next step would be to decide whether that function is algebraic, as in the problem of Hermann Amandus Schwarz (1843–1921)). Another motivation starts from the coefficients $(a_n)$, of which the power series is called the generating series; indeed, the generating series is a rational function if and only if the sequence of coefficients satisfies a linear recurrence relation.

At this stage, there are few tools to answer that question, besides a general algebraic criterion which essentially reformulates the property that the $(a_n)$ satisfy a linear recurrence relation. For any integers $m$ and $q$, let $D_m^q$ be the determinant of size $(q+1)$ given by \[ D_m^q = \begin{vmatrix} a_m & a_{m+1} & \dots & a_{m+q} \\ a_{m+1} & a_{m+2} & \dots & a_{m+q+1} \\ \vdots & \vdots & & \vdots \\ a_{m+q} & a_{m+q+1} & \dots & a_{m+2q} \end{vmatrix}. \] These determinants are called the Hankel determinants or (when $m=0$) the Kronecker determinants, after the two 19th century German mathematicians Hermann Hankel (1839–1873) and Leopold Kronecker (1823–1891). With this notation, the following properties are equivalent:

  1. The power series $\sum a_n T^n$ comes from a rational function;
  2. There is an integer $q$ such that $D_m^q=0$ for all large enough integers $m$;
  3. For all large enough integers $q$, one has $D^q_0=0$.
(The proof of that classic criterion is not too complicated, but the standard proof is quite smart. In his book Algebraic numbers and Fourier analysis, Raphaël Salem gives a proof which is arguably easier.)
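
Here is a small numerical illustration of property 3 above (a sketch with sympy; the sequences are chosen by me): the Fibonacci numbers satisfy a linear recurrence of order 2, so their Kronecker determinants vanish as soon as $q\geq 2$, whereas for the factorials they do not.

from math import factorial
from sympy import Matrix

def kronecker_det(a, q):
    # determinant of the (q+1) x (q+1) matrix (a[i+j])
    return Matrix(q + 1, q + 1, lambda i, j: a[i + j]).det()

fib = [0, 1]
while len(fib) < 20:
    fib.append(fib[-1] + fib[-2])
fact = [factorial(n) for n in range(20)]

print([kronecker_det(fib, q) for q in range(6)])    # [0, -1, 0, 0, 0, 0]
print([kronecker_det(fact, q) for q in range(6)])   # nonzero values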

Since this algebraic criterion is very general, it is however almost impossible to prove the vanishing of these determinants without further information, and it is at this stage that Émile Borel enters the story. Émile Borel (1871–1956) was not only a very important mathematician of the first half of the 20th century, through his work on analysis and probability theory; he also was a member of parliament, a minister of the Navy, and a member of the Résistance during WW2. He founded the French research institution CNRS and the Institut Henri Poincaré. He was also the first president of the Confédération des travailleurs intellectuels, an intellectual workers' union.

In his 1893 paper « Sur un théorème de M. Hadamard », Borel proves the following theorem:

Theorem. — If the coefficients \(a_n\) are integers and if the power series \(\sum a_n T^n \) “defines” a function (possibly with poles) on a disk centered at the origin and of radius strictly greater than 1, then that power series is a rational function.

Observe how these two hypotheses belong to two quite unrelated worlds: the first one sets the question within number theory, while the second one pertains to complex function theory. It looks almost like magic that these two hypotheses lead to the nice conclusion that the power series is a rational function.

It is also worth remarking that the second hypothesis is really necessary for the conclusion to hold, because rational functions define functions (with poles) on the whole complex plane. The status of the first hypothesis is more mysterious. While it is not necessary, the conclusion may not hold without it. For example, the exponential series \(\sum T^n/n!\) does define a function (without poles) on the whole complex plane, but is not rational (it grows too fast at infinity).

However, the interaction of number theoretical hypotheses with the question of the nature of power series was not totally unexplored at the time of Borel. For example, an 1852 theorem of the German mathematician Gotthold Eisenstein (Über eine allgemeine Eigenschaft der Reihen-Entwicklungen aller algebraischen Functionen) shows that when the coefficients \(a_n\) of the expansion \(\sum a_nT^n\) of an algebraic function are rational numbers, the denominators are not arbitrary: there is an integer \(D\geq 1\) such that for all \(n\), \(a_n D^{n+1}\) is an integer. As a consequence of that theorem of Eisenstein, the exponential series or the logarithmic series cannot be algebraic.

It's always time somewhere on the Internet for a mathematical proof, so I have no excuse for not telling you *how* Émile Borel proved that result. He uses the above algebraic criterion, hence needs to prove that the determinants \(D^q_m\) introduced above do vanish (for some \(q\) and for all \(m\) large enough). His idea consists in observing that these determinants are integers, so that if you wish to prove that they vanish, it suffices to prove that their absolute value is smaller than one!

If non-mathematicians are still reading me, there's no mistake here: the main argument of the proof is the remark that a nonzero integer is at least one in absolute value. While this may sound like a trivial remark, it is something I like to call the main theorem of number theory, because it lies at the heart of almost all proofs in number theory.

So one has to bound determinants from above, and here Borel invokes the « théorème de M. Hadamard »: a determinant, being the volume of the parallelepiped formed by its rows, is smaller than the product of the norms of these rows, considered as vectors of Euclidean space. In 2-D, the area of a parallelogram is at most the product of the lengths of its edges! (Jacques Hadamard (1865—1963) is known for many extremely important results, notably the first proof of the Prime Number Theorem. It is funny that this elementary result went into the title of a paper!)

But there's no hope that applying Hadamard's inequality to our initial matrix can be of any help, since that matrix has integer coefficients, so that all rows have norm at least one. So Borel starts making clever row combinations on the Hankel matrices that take into account the poles of the function that the given power series defines.

Basically, if \(f=\sum a_nT^n\), there exists a polynomial \(h=\sum c_mT^m\) such that the power series \(g=fh = \sum b_n T^n\) defines a function without poles on some disk \(D(0,R)\) where \(R>1\). Using complex function theory (Cauchy's inequalities), this implies that the coefficients \(b_n\) converge rapidly to 0, roughly as \(R^{-n}\). For the same reason, the coefficients \(a_n\) cannot grow too fast, at most as \(r^{-n}\) for some \(r>0\). The formula \(g=fh\) shows that the coefficients \(b_n\) are combinations of the \(a_n\), so that the determinant \(D_n^q\) is also equal to \[ \begin{vmatrix} a_n & a_{n+1} & \dots & a_{n+q} \\ \vdots & & & \vdots \\ a_{n+p-1} & a_{n+p} & \dots & a_{n+p+q-1} \\ b_{n+p} & b_{n+p+1} & \dots & b_{n+p+q} \\ \vdots & & & \vdots \\ b_{n+q} & b_{n+q+1} & \dots & b_{n+2q} \end{vmatrix}\] Now, Hadamard's inequality implies that the determinant \(D_n^q\) is (roughly) bounded above by \( (r^{-n} )^p (R^{-n}) ^{q+1-p} \): there are \(p\) rows bounded above by \(r^{-n}\), and the next \(q+1-p\) rows are bounded above by \(R^{-n}\). This expression rewrites as \( 1/(r^pR^{q+1-p})^n\). Since \(R>1\), we may choose \(q\) large enough so that \(r^p R^{q+1-p}>1\), and then, when \(n\) grows to infinity, the determinant becomes smaller than 1. Hence it vanishes!
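
To see the rationality criterion at work, here is a small sketch on the rational series \(1/(1-T-T^2)\), whose coefficients are the Fibonacci numbers. The Hankel matrices below start at index \(n\) and have size \(q+1\), a convention of mine which may differ slightly from the indexing used above:

```python
from sympy import Matrix

# Coefficients of the rational series 1/(1 - T - T^2): the Fibonacci numbers.
a = [1, 1]
while len(a) < 40:
    a.append(a[-1] + a[-2])

def hankel_det(n, q):
    """Exact determinant of the (q+1)x(q+1) Hankel matrix (a_{n+i+j})."""
    return Matrix(q + 1, q + 1, lambda i, j: a[n + i + j]).det()

for q in (1, 2, 3):
    print(q, [hankel_det(n, q) for n in range(8)])
# q = 1: the 2x2 determinants are (-1)^n (Cassini's identity), never zero;
# q >= 2: every determinant vanishes, as rationality predicts.
```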

The next chapter of this story happens in 1928, at the hands of the Hungarian mathematician George Pólya (1887–1985). Pólya had already written several papers exploring the interaction of number theory and complex function theory; one of them will even reappear later in this thread. In his paper “Über gewisse notwendige Determinantenkriterien für die Fortsetzbarkeit einer Potenzreihe”, he studied the analogue of Borel's question when the disk of radius \(R\) is replaced by an arbitrary domain \(U\) of the complex plane containing the origin, proving that if \(U\) is big enough, then the initial power series is a rational function. It is however not so obvious how one should measure the size of \(U\), and it is at this point that electrostatics enters the picture.

In fact, it is convenient to make an inversion: the assumption is that the series \(\sum a_n / T^n\) defines a function (with poles) on the complement of a compact subset \(K\) of the complex plane. Imagine that this compact set is made of metal, put at potential 0, and put a unit electric charge at infinity. According to the 2-D laws of electrostatics, this creates an electric potential \(V_K\) which is identically \(0\) on \(K\) and behaves as \( V_K(z)\approx \log(|z|/C_K)\) at infinity. Here, \(C_K\) is a positive constant, called the capacity of \(K\).

Theorem (Pólya). — Assume that the \(a_n\) are integers and the series \(\sum a_n/T^n\) defines a function (with poles) on the complement of \(K\). If the capacity of \(K\) is \(\lt1\), then \(\sum a_n T^n\) is rational.

To apply this theorem, it is important to know how to compute capacities. This was a classic theme of complex function theory and numerical analysis some 50 years ago. Indeed, what the electric potential does is solve the Laplace equation \(\Delta V_K=0\) outside of \(K\) with a Dirichlet condition on the boundary of \(K\).

In fact, the early days of complex analysis made a remarkable use of this fact. For example, it was by solving the Laplace equation that Bernhard Riemann proved the existence of meromorphic functions on “Riemann surfaces”, but analysis was not developed enough at that time (around 1860). In a stunningly creative move, Riemann imagines that his surface is partly made of metal and partly of insulating material, and he deduces the construction of the desired function from the electric potential.

More recently, complex analysis and potential theory also had applications to fluid dynamics, for example to compute (at least approximately) the flow of air outside of an airplane wing. (I am not a specialist of this, but I'd guess the development of numerical methods that run on modern computers rendered these beautiful methods obsolete.)

The relation between the theorems of Borel and Pólya is that the capacity of a disk is its radius. This can be seen from the fact that \(V(z)=\log(|z|/R)\) solves the Laplace equation with Dirichlet condition outside of the disk of radius \(R\).

A few other capacities have been computed, not too many, in fact, because it appears to be a surprisingly difficult problem. For example, the capacity of an interval is a fourth of its length.

Pólya's proof is similar to Borel's, but considers the Kronecker determinant in place of Hankel's. However, the linear combinations that allow one to show that this determinant is small are not as explicit as in Borel's proof. They follow from another interpretation of the capacity, introduced by the Hungarian–Israeli mathematician Michael Fekete (1886–1957; born in what was then Austria–Hungary, now Serbia, he emigrated to Palestine in 1928).

You know that the diameter \(d_2(K)\) of \(K\) is the upper bound of all distances \(|x-y|\) where \(x,y\) are arbitrary points of \(K\). Now, for an integer \(n\geq 2\), consider the upper bound \(d_n(K)\) of all products \( \prod_{i\neq j}|x_j-x_i|^{1/n(n-1)}\), where \(x_1,\dots,x_n\) are arbitrary points of \(K\). It is not so hard to prove that the sequence \(d_n(K)\) decreases with \(n\), and the limit \(\delta(K)\) of that sequence is what Fekete called the transfinite diameter of \(K\).

Proposition. — \( \delta(K)= C_K\).
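
For the unit disk, the proposition can be tested numerically. Taking the \(n\)th roots of unity as test points (they are in fact extremal on the circle), the quantities \(d_n\) decrease towards \(1\), the capacity of the unit disk. A minimal sketch:

```python
import numpy as np

# d_n evaluated on the n-th roots of unity; for the unit disk the capacity is 1.
for n in (2, 3, 5, 10, 20, 50, 100):
    z = np.exp(2j * np.pi * np.arange(n) / n)
    dist = np.abs(z[:, None] - z[None, :])
    logprod = np.sum(np.log(dist[dist > 0]))   # sum over ordered pairs i != j
    d_n = np.exp(logprod / (n * (n - 1)))
    print(n, d_n)                              # decreases towards 1
```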

This makes a link between capacity theory and another theme of complex function theory, namely the theory of best approximation, which enters Pólya's proof: the adequate linear combination for the \(n\)th row is given by the coefficients of the monic polynomial of degree \(n\) which has the smallest least upper bound on \(K\).
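
On the segment \([-1,1]\), whose capacity is \(1/2\), these extremal monic polynomials are the rescaled Chebyshev polynomials (a classical fact); a quick numerical sketch showing that the \(n\)th root of their least upper bound tends to the capacity:

```python
import numpy as np

# The monic polynomial of degree n with smallest sup norm on [-1, 1] is 2^(1-n) T_n;
# its sup norm is 2^(1-n), and the n-th root of that tends to 1/2, the capacity
# of the segment (a fourth of its length).
x = np.linspace(-1.0, 1.0, 20001)
for n in (1, 2, 5, 10, 20, 40):
    T_n = np.cos(n * np.arccos(x))      # Chebyshev polynomial on [-1, 1]
    monic = T_n / 2 ** (n - 1)          # leading coefficient of T_n is 2^(n-1)
    sup = np.max(np.abs(monic))
    print(n, sup, sup ** (1 / n))       # n-th root approaches 0.5
```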

If all this is of some appeal to you, there's a wonderful little book by Thomas Ransford, Potential Theory in the Complex Plane, which I find quite readable (say, from 3rd or 4th year of math studies on).

In the forthcoming episodes, I'll discuss two striking applications of the theorems of Borel and Pólya: the proof by Bernard Dwork (in 1960) of a conjecture of Weil, and a new proof (in 1987) by Jean-Paul Bézivin and Philippe Robba of the transcendence of the numbers \(e\) and \(\pi\), two results initially proven by Charles Hermite and Ferdinand von Lindemann in 1873 and 1882.

Saturday, March 27, 2021

The Fermat–Lévy constant

Undergraduate analysis is full of computations of strange trigonometric series, and I posted on Twitter, as a challenge, to compute the following constant: $$ c = \sum_{p=3}^\infty \frac1{p^2} \int_0^{2\pi} \left( \sum_{n=1}^\infty \frac1{n^2} \cos(n^p x)\right)^3\, dx. $$ As some followers quickly remarked, there could be some irony with this exercise, because, quite unexpectedly, they could solve it using the solution to Fermat's Last Theorem. So I'll start with the origin of this problem.

(Light) philosophy of mathematics

This constant came as a screen capture from a 1950 paper by Paul Lévy, Axiome de Zermelo et nombres transfinis (Zermelo axiom and transfinite numbers) where he discusses various reasons to accept, or refuse, Zermelo's principle according to which every set can be endowed with a well-ordering.

Zermelo had proved this principle in 1904, and we know today that it is equivalent to the “axiom of choice”. In fact, given a set $S$, Zermelo chooses once and for all, for every nonempty subset $A$ of $S$, some element $f(A)$ of $A$, and it seems that it is precisely this simultaneous infinity of choices that led his contemporaries to realize the necessity of the axiom of choice as an axiom, at first for philosophical reasons: they objected to the very possibility of making so many choices at once.

We know today that Zermelo's theorem is equivalent to the axiom of choice, that this axiom is essentially innocuous (Gödel, 1938, showing that if set theory without the axiom of choice is consistent, then so is set theory with it), but that it is also unprovable from the other axioms of set theory, say those (ZF) proposed by Zermelo and Fraenkel (Cohen, 1963, showing that if set theory without the axiom of choice is consistent, then so is set theory with its negation).

We also know how this axiom, whose usefulness is recognized in the proofs of many theorems of algebra (existence and uniqueness of an algebraic closure, existence of prime or maximal ideals, of bases of vector spaces, etc.), also leads to paradoxical constructions (such as nonmeasurable sets, or the Banach–Tarski paradoxical decomposition of the unit sphere into finitely many parts that can be recombined into two unit spheres). For many modern mathematicians, the question is now that of a risk–benefit balance: using this axiom, say, but on safe grounds. However, for mathematicians such as Lebesgue, Borel or Brouwer, who were inspired by a constructive or intuitionistic philosophy of mathematics, this axiom had to be rejected, because they objected to the idea of using objects that cannot be named.

In his 1950 paper, Paul Lévy discusses these objections and explains why he rejects them. As he recognizes, the nature of language only allows for a countable number of mathematical formulas, hence we can only name a countable number of mathematical objects, and we know – this is an 1878 theorem of Cantor – that the continuum, for example, is not countable. There must exist, says Paul Lévy, numbers that cannot be named, « des nombres innomables ». I should add that Loren Graham and Jean-Michel Kantor wrote a book on that topic, Naming Infinity: A True Story of Religious Mysticism and Mathematical Creativity, which contains a whole chapter on what they call “the French trio: Borel, Lebesgue, Baire”.

At this point, Lévy introduces the constant whose definition starts this note. Let us now read what Lévy tells us (my translation):

Do mathematical objects exist in themselves, or only as products of the human mind and of our human logic? The point of view of Mr Brouwer regarding the truth of the statements of theorems forces us to ask the question.

Let us consider a precise statement, to fix ideas, that of Fermat's theorem. For us, it is true or false, and the sum $$ c = \sum_{p=3}^\infty \int_0^{2\pi} \left( \sum_1^\infty \frac{\cos n^p x}{n^2}\right)^3\, dx $$ is zero in the first case and positive in the second. If it is false, one can check that it is false (although we do not know in advance how much time is needed). If it is true, an infinite number of checks would be necessary to assure it, and nothing guarantees a priori that this infinite number can be replaced by a finite reasoning. So there are three possible hypotheses (for some theorems, there would be four): the theorem is false; it is true and provable; it is true and unprovable.

In this latter case, Mr Brouwer refuses to consider this theorem as true, and the constant $c$ is neither zero nor positive (nor negative, of course).

Let us detail his argument a little bit. Lévy says that his strangely defined constant $c$ is zero in case FLT is true, and is strictly positive otherwise. But what is its value? For some mathematicians, such as Brouwer, this constant can only have a value if we can prove that we are in one case or the other. At the time when Lévy wrote his article, in 1950, one did not know yet whether Fermat's last theorem was true or false; it would only be proved in 1995! There is one way to prove that FLT is false: it consists in enumerating all quadruples $(x,y,z,p)$, where $x,y,z$ are strictly positive integers and $p$ is a prime number $\geq 3$, until one reaches some quadruple where $x^p+y^p=z^p$ – this guarantees that $c>0$, Lévy says. But if FLT is true, then this enumeration will go on indefinitely and we will never reach the conclusion. Lévy has an objection to this disappointment, which I find quite amusing, based on the paradox of Zeno of Elea: imagine that we are quicker and quicker at each verification, say twice as fast each time. We could need 1 second for the first quadruple, half a second for the second one, a quarter second for the third one, and so on, and then the whole verification would take 2 seconds. This is obviously a thought experiment, and I do not know if physics forbids such a phenomenon, which anyway Lévy qualifies as supernatural. Computer scientists could object as well, because the complexity of the required computations grows at each step. But let us stop this philosophical discussion and go back to Lévy's constant.

Computing the Lévy constant

The indicated sum is a series, indexed by the prime numbers $p\geq 3$ (we could sum over all integers as well), whose general term is an integral from $0$ to $2\pi$ of some trigonometric expression, divided by $p^2$; I will denote this integral by $c_p$.

The point is that $c_p$ is always positive or zero, and that $c_p$ is strictly positive if and only if Fermat's last theorem fails for the exponent $p$. Lévy does not say so explicitly, but the purpose of the division by $p^2$ is to make sure that the infinite series converges, whatever its sum.

Since the cosines are at most $1$ in absolute value, the series of all $\cos(n^p x)/n^2$ converges normally, and its sum is a continuous function of $x$ which I will denote by $f(x)$. One has $c_p=\int_0^{2\pi} f(x)^3\,dx$. (Since $f$ is bounded independently of $p$, for example by $\pi^2/6$, each integral is uniformly bounded, and the division by $p^2$ makes the whole series convergent.) To understand this integral, let us expand the cube.

What we get is a sum, indexed by triples of integers $(k,m,n)$ of terms of the form $\cos(k^px) \cos(m^px)\cos(n^px) / k^2m^2n^2$, and we need to integrate each of them from $0$ to $2\pi$.

Several methods are possible, such as knowing one's formulas for products of cosines, but I prefer replacing the cosines by their complex exponential expressions (Euler's formula), so that each term gets replaced by a sum of 8 terms of the form $e^{\pm i k^p x} e^{\pm i m^p x} e^{\pm i n^p x} / 8 k^2m^2n^2$. Let us factor the exponentials: we get $e^{ i (\pm k^p\pm m^p\pm n^p) x} / 8k^2m^2 n^2$, and its integral from $0$ to $2\pi$ is equal to $\pi/4k^2m^2n^2$ if the coefficient of $x$ in the exponent vanishes, and is zero otherwise.

In other words, the triple $(k,m,n)$ contributes a strictly positive quantity to the sum as soon as $(\pm k,\pm m,\pm n)$ furnishes a counterexample to Fermat's theorem for the exponent $p$. Amusing, isn't it? Yes, but you will admit that it is essentially tautological… However…
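
For readers who like to see this checked by a machine, here is a minimal sketch (my own averaging trick, chosen so that the computation is exact up to rounding): the integral of $\cos(ax)\cos(bx)\cos(cx)$ over $[0,2\pi]$ equals $\pi/2$ exactly when some choice of signs gives $\pm a\pm b\pm c=0$, and $0$ otherwise.

```python
import numpy as np

def triple_cos_integral(a, b, c):
    """Integral over [0, 2*pi] of cos(ax)cos(bx)cos(cx): since the integrand is a
    trigonometric polynomial of degree a+b+c, averaging it over more than a+b+c
    equally spaced points gives the exact mean value."""
    M = a + b + c + 1
    x = 2 * np.pi * np.arange(M) / M
    return 2 * np.pi * np.mean(np.cos(a * x) * np.cos(b * x) * np.cos(c * x))

print(triple_cos_integral(1, 2, 3))           # 1 + 2 = 3        ->  pi/2 ~ 1.5708
print(triple_cos_integral(1, 2, 4))           # no sign relation ->  0
print(triple_cos_integral(3**2, 4**2, 5**2))  # 9 + 16 = 25 (the case p = 2!) -> pi/2
```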

The circle method

I now reach deeper, or in any case mathematically more efficient, considerations, and wish to say a few words about the circle method. I'll take as an example the problem that the English mathematician Waring posed in 1770.

In his Arithmetica, Diophantus had asked whether every integer is a sum of 4 perfect squares of integers, and Lagrange proved in 1770 that this is indeed the case. Waring generalized the question to higher powers: is it true that every integer is the sum of at most 9 perfect cubes of integers, or 19 fourth powers, etc.? These two numbers, 9 and 19, correspond to the fact that we do need 9 cubes to write 23 ($23 = 2^3 + 2^3 + 1+1+1+1+1+1+1$) and 19 fourth powers to write $79$. So Waring was asking whether this necessary bound is, in fact, sufficient.
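
The two values 9 and 19 are easy to confirm by machine for these particular integers; a minimal sketch (a tiny dynamic program, nothing fancy):

```python
def min_powers(N, k):
    """Least number of positive k-th powers summing to N, by a small dynamic program."""
    INF = float("inf")
    best = [0] + [INF] * N
    for n in range(1, N + 1):
        m = 1
        while m ** k <= n:
            best[n] = min(best[n], best[n - m ** k] + 1)
            m += 1
    return best[N]

print(min_powers(23, 3))   # 9: one indeed needs 9 cubes for 23
print(min_powers(79, 4))   # 19: one indeed needs 19 fourth powers for 79
```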

We know today that this is the case for cubes (Wieferich, 1909; Kempner, 1912) as well as for fourth powers (Balasubramanian, Dress, Deshouillers, 1986). We also know that there is, for every integer $k\geq3$, some smallest integer $g(k)$ such that every integer is the sum of at most $g(k)$ perfect $k$th powers, as well as an upper bound for $g(k)$ which conjecturally is an equality, namely $g(k)=2^k+\lfloor (3/2)^k\rfloor - 2$.

However, this question seems to me rather less interesting, when $k$ grows, than its asymptotic analogue: every large enough integer is the sum of at most $G(k)$ perfect $k$th powers, because for this problem a much better bound is known to hold: $G(k)\leq 2k \log(k)+\cdots$ (Vinogradov, Karatsuba, Vaughan, Wooley…).

And to establish these results, trigonometric series strike back, at the heart of the circle method of Hardy and Littlewood.

To write $N$ as a sum of $k$th powers, the idea is to set $f(x)$ to be the sum, over all integers $n$ smaller than $P=N^{1/k}$, of $\exp(2i\pi n^k x)$, and to observe that the number of solutions of Waring's equation $n_1^k+\cdots+n_r^k=N$ is given by the integral from $0$ to $1$ of the quantity $f(x)^r \exp(-2i\pi Nx)$.
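
This identity is a pure bookkeeping statement, but it is already checkable by machine. A toy sketch (my own implementation, here with $N=23$, $k=3$, $r=9$): since $f(x)^r e^{-2i\pi Nx}$ is a trigonometric polynomial, the integral over $[0,1]$ equals a finite average, and it agrees with the brute-force count of representations.

```python
import numpy as np
from itertools import product

def count_direct(N, k, r):
    """Brute-force count of ordered r-tuples of positive integers with
    n_1^k + ... + n_r^k = N (only sensible for tiny N)."""
    P = 1
    while (P + 1) ** k <= N:
        P += 1
    return sum(1 for t in product(range(1, P + 1), repeat=r)
               if sum(n ** k for n in t) == N)

def count_circle(N, k, r):
    """Same count, written as the integral over [0,1] of f(x)^r e^{-2 i pi N x},
    with f(x) = sum over n^k <= N of e^{2 i pi n^k x}.  Because this is a
    trigonometric polynomial, the integral is an exact finite Fourier average."""
    P = 1
    while (P + 1) ** k <= N:
        P += 1
    M = r * P ** k + 1                       # more sample points than frequencies
    x = np.arange(M) / M
    f = sum(np.exp(2j * np.pi * (n ** k) * x) for n in range(1, P + 1))
    return int(round(np.mean(f ** r * np.exp(-2j * np.pi * N * x)).real))

print(count_direct(23, 3, 9), count_circle(23, 3, 9))   # both give 36
```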

At this point, we have not been less tautological than Paul Lévy was, and it required the genius of Hardy, Littlewood and their followers to observe that this integral can actually be evaluated. To that aim, the idea is to split the interval $[0,1]$ into two regions. First, intervals (the major arcs) centered around rational numbers of the form $a/q$ with $q$ “small”, and then the complementary set (the minor arcs). Roughly, the contribution of the major arcs can be reduced to trigonometric sums in finite fields, using the Chinese remainder theorem and other tricks, and it admits a simple asymptotic expansion – this is number theory. When the Gods favor us, the contribution of the minor arcs may be bounded efficiently, using more analytic methods, such as the Cauchy–Schwarz inequality or Weyl differencing. This requires the integer $r$ to be large enough, and then one gets an asymptotic expansion for the number of ways of writing $N$ as a sum of $r$ perfect $k$th powers, and it is strictly positive for $N$ large enough!

What one needs to remember is that this works only for $r$ quite large. In other words, we can study Diophantine equations whose number of variables is very large compared to the degree. An article by Birch (which I have not studied well enough…) is called Forms in many variables. Alas, this method is completely ineffective for questions such as Fermat's last theorem, which has 3 variables and quite a large degree. On the other hand, the circle method is very robust: it allows the equation to be modified – Birch's paper is very flexible – and we know that small modifications of an equation can give rise to dramatic differences in its behaviour.

So, as computer scientists would say, this may be a feature rather than a bug.

Tuesday, December 22, 2020

Celebrating Ramanujan's birthday — From powers of divisors to coefficients of modular forms

In a Twitter post, Anton Hilado reminded us that today (December 22nd) is the birthday of Srinivasa Ramanujan, and suggested that somebody explain the “Ramanujan conjectures”. The following blog post is an attempt at an informal account. Or, as @tjf frames it, my Christmas present to math twitter.

The story begins in 1916, with a paper Ramanujan published in the Transactions of the Cambridge Philosophical Society, under the not so explicit title: On certain arithmetical functions. His starting point is the investigation of the sum $\sigma_s(n)$ of the $s$th powers of the divisors of an integer $n$, and of approximate functional equations of the form
\[ \sigma_r(0)\sigma_s(n)+\sigma_r(1)\sigma_s(n-1)+\dots+\sigma_r(n)\sigma_s(0)
\approx \frac{\Gamma(r+1)\Gamma(s+1)}{\Gamma(r+s+2)} \frac{\zeta(r+1)\zeta(s+1)}{\zeta(r+s+2)}\sigma_{r+s+1}(n) + \frac{\zeta(1-r)+\zeta(1-s)}{r+s} n \sigma_{r+s-1}(n), \]
where $\sigma_s(0)=\dfrac12 \zeta(-s)$, and $\zeta$ is Riemann's zeta function. In what follows, $r,s$ will be positive odd integers, so that $\sigma_s(0)$ is half the value of Riemann's zeta function at a negative odd integer; it is known to be a rational number, namely $(-1)^sB_{s+1}/(2(s+1))$, where $B_{s+1}$ is the $(s+1)$th Bernoulli number.
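
For readers who like to see such identities checked by a machine: in the case $r=s=3$ the approximation is an exact identity (this is one of the cases discussed below), and unfolding the constants (using $\zeta(-2)=0$, the unfolding is mine) it takes the classical form $\sum_{m=1}^{n-1}\sigma_3(m)\sigma_3(n-m) = \big(\sigma_7(n)-\sigma_3(n)\big)/120$. A small sketch:

```python
def sigma(s, n):
    """Sum of the s-th powers of the divisors of n."""
    return sum(d ** s for d in range(1, n + 1) if n % d == 0)

# Case r = s = 3 of the formula above, in its classical exact form:
#     sum_{m=1}^{n-1} sigma_3(m) sigma_3(n-m) = (sigma_7(n) - sigma_3(n)) / 120.
for n in range(2, 20):
    lhs = sum(sigma(3, m) * sigma(3, n - m) for m in range(1, n))
    rhs = (sigma(7, n) - sigma(3, n)) // 120
    assert lhs == rhs
print("checked the exact identity for r = s = 3 up to n = 19")
```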

This investigation, in which Ramanujan engages without giving any motivation, quickly leads him to the introduction of infinite series,
\[ S_r = \frac12 \zeta(-r) + \frac{1^rx}{1-x}+\frac{2^r x^2}{1-x^2}+\frac{3^rx^3}{1-x^3}+\dots. \]
Nowadays, the parameter $x$ would be written $q$, and $S_r=\frac12 \zeta(-r) E_{r+1}$, at least if $r$ is an odd integer, $E_r$ being the Fourier expansion of the Eisenstein series of weight $r$. The particular cases $r=1,3,5$ are given special names, namely $P,Q,R$, and Ramanujan proves that $S_s$ is a linear combination of the $Q^mR^n$, for non-negative integers $m,n$ such that $4m+6n=s+1$. Nowadays, we understand this as the fact that $Q$ and $R$ generate the algebra of modular forms for the full modular group $\mathrm{SL}(2,\mathbf Z)$.

In the same paper, Ramanujan spells out the system of algebraic differential equations satisfied by $P,Q,R$:
\[ x \frac {dP}{dx} = \frac{1}{12}(P^2-Q),\quad x\frac{dQ}{dx}=\frac13(PQ-R),\quad x\frac{dR}{dx}=\frac12(PR-Q^2). \]
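
These identities are easy to check on truncated power series. A minimal sketch, using the standard expansions $P=1-24\sum\sigma_1(n)x^n$, $Q=1+240\sum\sigma_3(n)x^n$, $R=1-504\sum\sigma_5(n)x^n$ (Ramanujan's normalizations, proportional to the series $S_1,S_3,S_5$ above):

```python
N = 20   # verify the identities on the coefficients of x^0, ..., x^{N-1}

def sigma(s, n):
    """Sum of the s-th powers of the divisors of n."""
    return sum(d ** s for d in range(1, n + 1) if n % d == 0)

P = [1] + [-24 * sigma(1, n) for n in range(1, N)]
Q = [1] + [240 * sigma(3, n) for n in range(1, N)]
R = [1] + [-504 * sigma(5, n) for n in range(1, N)]

def mul(a, b):
    """Product of two power series truncated at order N."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j in range(N - i):
            c[i + j] += ai * b[j]
    return c

def theta(a):
    """The operator x d/dx on a truncated power series."""
    return [n * an for n, an in enumerate(a)]

assert [12 * t for t in theta(P)] == [u - v for u, v in zip(mul(P, P), Q)]
assert [3 * t for t in theta(Q)] == [u - v for u, v in zip(mul(P, Q), R)]
assert [2 * t for t in theta(R)] == [u - v for u, v in zip(mul(P, R), mul(Q, Q))]
print("Ramanujan's differential system holds up to order", N - 1)
```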

The difference of the two sides of the initial equation has an expansion as a linear combination of the $Q^mR^n$ with $4m+6n=r+s+2$. By the functional equation of Riemann's zeta function, relating $\zeta(s)$ and $\zeta(1-s)$, this expression vanishes for $x=0$; a modular form vanishing at $x=0$ (a cusp form) is divisible by $Q^3-R^2$ in this algebra, hence there is a factor $Q^3-R^2$.

Ramanujan then notes that

$ x\frac{d}{dx} \log(Q^3-R^2)=P$, so that

\[  P =x\frac{d}{dx} \log \left( x\big((1-x)(1-x^2)(1-x^3)\dots\big)^{24} \right) \]

and 

\[ Q^3-R^2 = 1728\, x  \big((1-x)(1-x^2)(1-x^3)\dots\big)^{24}= 1728\sum \tau(n) x^n, \]
where the series $\sum \tau(n)x^n = x\big((1-x)(1-x^2)(1-x^3)\dots\big)^{24}$ is now known as Ramanujan's $\Delta$-function. In fact, Ramanujan also makes the connection with elliptic functions, in particular with Weierstrass's $\wp$-function: $\Delta$ then corresponds to the discriminant of the degree 3 polynomial $f$ such that $\wp'(u)^2=f(\wp(u))$.
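
Here is a small sketch computing the $\tau(n)$ from the infinite product and checking the identity $Q^3-R^2=1728\sum\tau(n)x^n$ on truncated series (my own toy code, nothing deep):

```python
N = 31   # we want tau(n) for n up to 30

def sigma(s, n):
    return sum(d ** s for d in range(1, n + 1) if n % d == 0)

def mul(a, b):
    """Product of two power series truncated at order N."""
    c = [0] * N
    for i, ai in enumerate(a):
        for j in range(N - i):
            c[i + j] += ai * b[j]
    return c

# Delta = x * prod_{k >= 1} (1 - x^k)^24, truncated at order N.
delta = [0] * N
delta[1] = 1
for k in range(1, N):
    factor = [0] * N
    factor[0] = 1
    factor[k] = -1
    for _ in range(24):
        delta = mul(delta, factor)

Q = [1] + [240 * sigma(3, n) for n in range(1, N)]
R = [1] + [-504 * sigma(5, n) for n in range(1, N)]
lhs = [u - v for u, v in zip(mul(mul(Q, Q), Q), mul(R, R))]   # Q^3 - R^2
assert lhs == [1728 * t for t in delta]

print([delta[n] for n in range(1, N)])   # tau(1), ..., tau(30): 1, -24, 252, -1472, ...
```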

In any case, factoring $Q^3-R^2$ out of the difference of the two sides, the quotient is a linear combination of the $Q^mR^n$ with $4m+6n=r+s-10$. When $r$ and $s$ are positive odd integers such that $r+s\leq 8$, or such that $r+s=12$, there are no such pairs $(m,n)$, hence the difference vanishes, and Ramanujan obtains an exact identity in these cases.

Ramanujan is interested in the quality of the initial approximation. He finds an upper bound of the form $\mathrm O(n^{\frac23(r+s+1)})$ for the error term; using Hardy–Littlewood's method, he shows that it cannot be smaller than $n^{\frac12(r+s)}$. That prompts his interest in the size of the coefficients of arithmetical functions, and $Q^3-R^2$ is the simplest one. He computes the coefficients $\tau(n)$ for $n\leq30$ and gives them in a table.

Recalling that $\tau(n)$ is $\mathrm O(n^7)$, and not $\mathrm O(n^5)$, Ramanujan states that there is reason to believe that $\tau(n)=\mathrm O(n^{\frac{11}2+\epsilon})$ but not $\mathrm O(n^{\frac{11}2})$. That this holds is Ramanujan's conjecture.

Ramanujan was led to believe this by observing that the Dirichlet series $ \sum \frac{\tau(n)}{n^s} $ factors as an infinite product (an “Euler product”, as we would say now), indexed by the prime numbers:

\[ \sum_{n=1}^\infty \frac{\tau(n)}{n^s} = \prod_p \frac{1}{1-\tau(p)p^{-s}+p^{11-2s}}. \]

This would imply that $\tau$ is a multiplicative function: $\tau(mn)=\tau(m)\tau(n)$ if $m$ and $n$ are coprime, as well as the more complicated relation $\tau(p^{k+2})=\tau(p)\tau(p^{k+1})-p^{11}\tau(p^k)$ between the $\tau(p^k)$. These relations were proved by Louis Mordell in 1917. He introduced operators (now called Hecke operators) $T_p$ (indexed by prime numbers $p$) on the algebra of modular forms and proved that Ramanujan's $\Delta$-function is an eigenfunction. (There is little merit for $\Delta$ in that, because it is alone in its weight, so that $T_p \Delta$ is necessarily a multiple of $\Delta$, namely $T_p\Delta=\tau(p)\Delta$.)
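
Mordell's relations are easy to test numerically on the $\tau(n)$ computed from the product expansion; a minimal sketch (small cases only, of course, not a proof):

```python
# Compute tau(n) from Delta = x * prod (1 - x^k)^24, then test multiplicativity
# and the recursion tau(p^{k+2}) = tau(p) tau(p^{k+1}) - p^11 tau(p^k).
N = 33
delta = [0] * N
delta[1] = 1                       # start from the series "x"
for k in range(1, N):              # multiply by (1 - x^k)^24, truncating at order N
    for _ in range(24):
        new = [0] * N
        for i, coeff in enumerate(delta):
            if coeff:
                new[i] += coeff
                if i + k < N:
                    new[i + k] -= coeff
        delta = new
tau = delta                        # tau[n] is Ramanujan's tau(n), for 1 <= n < N

# multiplicativity on coprime arguments
assert tau[6] == tau[2] * tau[3]
assert tau[10] == tau[2] * tau[5]
assert tau[15] == tau[3] * tau[5]
# the recursion at prime powers
assert tau[4] == tau[2] * tau[2] - 2 ** 11 * tau[1]
assert tau[8] == tau[2] * tau[4] - 2 ** 11 * tau[2]
assert tau[9] == tau[3] * tau[3] - 3 ** 11 * tau[1]
assert tau[27] == tau[3] * tau[9] - 3 ** 11 * tau[3]
print("Mordell's relations hold in these small cases")
```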

The bound $\lvert{\tau(p)}\rvert\leq 2p^{11/2}$ means that the polynomial $1-\tau(p) X+p^{11}X^2$ has two complex conjugate roots. This part of the conjecture would be proved only in 1973, by Pierre Deligne, and required many additional ideas. One was the conjectures of Weil about the number of points of algebraic varieties over finite fields, proved by Deligne in 1973, building on Grothendieck's étale cohomology. Another was the insight (due to Michio Kuga, Mikio Sato and Goro Shimura) that Ramanujan's conjecture could be reframed as an instance of the Weil conjectures; this reduction was carried out by Deligne in 1968, using the 10th symmetric product of the universal elliptic curve.