Saturday, July 20, 2024

Number theory and finite automata

$\gdef\abs#1{\lvert#1\rvert}$

There are many parts of number theory that I don't know, and today's story is about one I learnt only recently, at the intersection of number theory and computer science. It is about the sets of natural numbers that can be recognized by a finite automaton.

Finite automata and automatic functions

Finite automata

Finite automata are very elementary computers. These machines can receive instructions of finitely many sorts, and can be in finitely many states; moreover, their behaviour is prescribed: when, in a given state, they receive an instruction, they just move to another state. Mathematically, one can say that there is a finite set $A$ of instructions, a finite set $S$ of states, and a mapping $\phi\colon A \times S\to S$. One state $\varepsilon$ is labeled as the initial state; when reading a list of instructions $[a_1,\dots,a_m]$, the machine goes through the states $s_0=\varepsilon$, $s_1=\phi(a_1,s_0)$, $s_2=\phi(a_2,s_1)$, etc. until it reaches and outputs the final state $s_m$. All in all, the automaton defines a function $\Phi$ from the set $A^*$ of lists in $A$ to the set $S$. Such functions are called automatic.
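
To make this concrete, here is a minimal sketch of such a machine in Python; the instruction set, the states and the transition map are illustrative choices, not part of any general theory.

```python
# A minimal sketch of a finite automaton, with illustrative choices:
# instructions A = {"a", "b"}, states S = {"even", "odd"}, initial state "even".
# The transition map phi sends (instruction, state) to the next state; this
# particular machine merely tracks the parity of the number of "a" instructions read.
phi = {
    ("a", "even"): "odd",
    ("a", "odd"): "even",
    ("b", "even"): "even",
    ("b", "odd"): "odd",
}

def Phi(instructions, initial_state="even"):
    """Run the automaton on a list of instructions and return the final state."""
    state = initial_state
    for a in instructions:
        state = phi[(a, state)]
    return state

print(Phi(["a", "b", "a", "a"]))  # "odd": three "a" instructions were read
```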

In practice, some states may be labeled as admissible (say, those in which the machine unlocks a door), and the set of lists $\alpha\in A^*$ such that $\Phi(\alpha)$ is admissible is then called the language of the automaton.

The basic question of automata theory is to describe the (so-called “regular”) languages that are recognizable by some automaton, and more generally the automatic functions. They turn out to be of a very restricted nature: because finite automata are so elementary, these languages are much more limited than those that could be recognized by a program in the programming language of your choice. The reason is that a finite automaton has a fixed, finite memory, while “theoretical” computers, as modelled by Turing machines for example, have a potentially infinite memory, as large as the computation may require.

Numbers

Number theory enters when you decide that the set $A$ of instructions consists of the digits 0, 1, 2,…, 9 and that the lists in $A^*$ represent numbers. You then wish to understand which sets of numbers are recognizable, or which functions $\Phi\colon\mathbf N\to S$ are automatic, for a finite set $S$. This is what happens, for example, in the very elementary finite automata that stand at almost all Paris doors as electronic locks. They recognize exactly one 4-digit number (and any 4-digit number puts them back in the initial state).

When it comes to digits, they can be the usual ones, 0, 1, 2,…, 9, writing numbers in base 10, but any other base can be considered, and any set of digits that allows one to write all numbers can be used. For example, one can consider writing numbers in base 3, with digits 0, 1, 2, 3, 4.

Let us thus fix some base $b$. The first theorem is that the set of automatic functions does not depend on the set of digits which is considered. The reason is that one can build another kind of machine (called a transducer) that works like a finite automaton except that it outputs something at each step. Here, we can have a transducer that outputs the digits in one representation given the digits in another one. There may be carries to take care of, but when the base is the same, only finitely many carry values can occur, so the transducer can keep track of them within its finitely many states. It is important for this argument that the base stays the same, and we will see this in a more striking way later on. For this reason, we will not specify the set of digits in the sequel.
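
To give an idea of the bookkeeping involved, here is a small sketch in Python of such a digit conversion (the function name and the particular choices, base 3 with the redundant digits 0,…,4, are mine): the digits are processed from the least significant one, and the only extra memory is a carry which stays bounded, exactly the kind of finite information a transducer can keep in its states.

```python
# A minimal sketch of the digit-conversion idea, assuming base 3 with the
# redundant digits {0,...,4} as input and the standard digits {0,1,2} as output.
# Digits are given from the least significant one; the carry never exceeds 1.

def to_standard_digits(digits, base=3):
    """Convert a little-endian digit list over {0,...,4} to standard base-3 digits."""
    carry = 0
    output = []
    for d in digits:
        total = d + carry
        output.append(total % base)
        carry = total // base
    while carry > 0:
        output.append(carry % base)
        carry //= base
    return output

# 14 = 4*3 + 2, i.e. redundant digits [2, 4]; the standard base-3 digits of 14 are [2, 1, 1].
print(to_standard_digits([2, 4]))  # [2, 1, 1]
```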

Examples

Here are some examples of recognizable sets of numbers.

  • The empty set works. It suffices to have no admissible state.
  • Singletons work. For example, to recognize “314”, one needs one initial state “$\varepsilon$”, three states labeled “3”, “31” and “314”, and a junk state “J”. The machine moves from “$\varepsilon$” to “3” if it receives “3”, and to “J” otherwise; from “3” to “31” if it receives “1”, etc.
  • The union and the intersection of two recognizable sets are recognizable. Just make an automaton whose states are pairs of states of the two automata that recognize the given sets, and simulate the moves of each of them. Similarly, the complement of a recognizable set is recognizable.
  • By the previous examples, finite sets and complements of finite sets are recognizable.
  • Here is a number theoretically more interesting example: arithmetic progressions are recognizable. To produce an automaton that, say, recognizes all integers congruent to 1 modulo 4, it suffices to have a machine with 4 states that computes the Euclidean remainder digit by digit (see the sketch after this list).
  • Another number theoretically interesting example: powers of $b$. Indeed their writing in base $b$ consists of one digit “1” followed by a series of “0”.
  • A number theoretically annoying property of recognizable sets: since a finite automaton has finitely many states, the “pumping lemma” shows that, for any large enough number in a recognizable set, one can repeat at will some inner block of its digits and obtain other numbers of the set. I suppose one can deduce from this property that the set of prime numbers is not recognizable. (This is exercise 5.8.12 in the book Automatic sequences, by J.-P. Allouche and J. Shallit.)
  • Numbers recognizable in base $b$ are also recognizable in base $b^2$ (or any other power) and conversely: it suffices to emulate the base-$b^2$ writing of a number using base-$b$ digits (each base-$b^2$ digit corresponds to a pair of base-$b$ digits), or conversely.
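
Here is the sketch announced in the arithmetic-progression example above: a machine with 4 states (the possible remainders modulo 4) that reads decimal digits from the most significant one. The code and its names are illustrative.

```python
# A minimal sketch of the arithmetic-progression automaton: its 4 states are the
# remainders modulo 4, and reading a decimal digit d in state r moves to (10*r + d) % 4.

def remainder_automaton(decimal_string, modulus=4):
    """Return the remainder modulo `modulus` of the number written in `decimal_string`."""
    state = 0  # initial state: the empty string represents 0
    for ch in decimal_string:
        state = (10 * state + int(ch)) % modulus
    return state

def recognizes(decimal_string):
    """Admissible states are those equal to 1: integers congruent to 1 modulo 4."""
    return remainder_automaton(decimal_string) == 1

print(recognizes("317"))   # True:  317 = 4*79 + 1
print(recognizes("2024"))  # False: 2024 is divisible by 4
```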

Building on the characteristic functions of recognizable sets, the preceding examples can be turned into examples of automatic functions. In particular, we obtain that ultimately periodic functions are automatic.

Cobham's theorem

Statement

Number theory can also enter a topic from two sides at a time, and here is the point of this blog post. Can one compare the sets of numbers that are recognizable in different bases? The following theorem shows that they are genuinely distinct.

Theorem (Cobham, 1969). — Let $S$ be a finite set and let $f\colon \mathbf N\to S$ be a function which is automatic in two bases $a$ and $b$. If $a$ and $b$ are not powers of the same integer, then $f$ is ultimately periodic.
Since ultimately periodic functions are automatic in any base, this theorem is definitely optimal.

The original proof of Cobham's theorem is said to be difficult, and the theorem has since received various simpler proofs, often partially flawed. I wish to describe here the recent proof by Thijmen Krebs (publisher version, arXiv:1801.06704), which calls itself “more reasonable”.

The assumption that $a$ and $b$ are not powers of the same integer will appear in the proof in a different, equivalent form: they have no common power, that is, $a^m\neq b^n$ for all integers $m,n\geq 1$.

Local periods

To prove that $f$ is ultimately periodic, Krebs proves that $f$ is periodic on large overlapping intervals.

Lemma 1. — Let $f\colon \mathbf N\to S$ be a function and let $I$ and $J$ be intervals of integers. We assume that $f$ has period $p$ on $I$ and period $q$ on $J$. If $\operatorname{Card}(I\cap J)\geq p+q$, then $f$ has period $p$ on the interval $I\cup J$.

The proof is elementary. Consider $x\in I\cup J$ such that $x+p\in I\cup J$. If both $x$ and $x+p$ belong to $I$, then $f(x)=f(x+p)$. Let us assume otherwise. If $x$ belongs to $I$ but $x+p$ does not, then $x+p\in J$; since $I\cap J$ has at least $p+q$ elements, $x$ must belong to $J$ as well. The other cases are similar and show that both $x$ and $x+p$ belong to $J$. Using that $I\cap J$ is large, we pick elements $y$ and $y+p$ in $I\cap J$ such that $x\equiv y \pmod q$. Then $f(x)=f(y)=f(y+p)=f(x+p)$, the first and last equalities using the $q$-periodicity on $J$, and the middle one the $p$-periodicity on $I$.

A diophantine approximation lemma

Lemma 2. — Let $a$ and $b$ be two real numbers such that $a,b>1$. For every $\varepsilon>0$, there exist nonzero natural numbers $m$ and $n$ such that $\abs{a^m-b^n}<\varepsilon b^n$.

The result is obvious if $a$ and $b$ have a common power, so we may assume that this is not the case. Then various proofs are possible. For example, one may consider the additive subgroup of $\mathbf R$ generated by $\log(a)$ and $\log(b)$; by assumption, it is dense, hence it contains a small nonzero element $m\log(a)-n\log(b)$; taking exponentials, $a^m/b^n$ is close to $1$, as claimed.

Krebs gives a nice “pigeonhole-like” proof. For every integer $m\geq1$, consider the unique integer $n_m$ such that $1\leq a^m/b^{n_m}<b$. Cutting the interval $[1,b)$ into finitely many subintervals of length less than $\varepsilon$, there must be two integers $m<p$ such that $a^m/b^{n_m}$ and $a^p/b^{n_p}$ differ by at most $\varepsilon$; dividing one by the other, $a^{p-m}/b^{n_p-n_m}$ is within $\varepsilon$ of $1$, and this implies the result.
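
For a concrete feeling of lemma 2, here is a small numerical search in Python; it is a brute-force illustration of the statement rather than the pigeonhole argument itself, with the illustrative choices $a=2$, $b=3$ and $\varepsilon=0.01$.

```python
import math

# A numerical illustration of Lemma 2 (illustrative choices a = 2, b = 3, eps = 0.01):
# we look for exponents m, n >= 1 with |a^m - b^n| < eps * b^n, i.e. a^m / b^n close to 1.

def close_powers(a, b, eps):
    """Return the first pair (m, n) found with |a**m - b**n| < eps * b**n."""
    m = 1
    while True:
        n = round(m * math.log(a) / math.log(b))  # best candidate exponent for b
        if n >= 1 and abs(a**m - b**n) < eps * b**n:
            return m, n
        m += 1

print(close_powers(2, 3, 0.01))  # (84, 53): 2^84 / 3^53 = 0.998..., within 1% of 1
```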

Some functional equations

Let $f\colon \mathbf N\to S$ be a function which is automatic in some base $c$, computed by an automaton with set of states $S$. For every $s\in S$, let $L(s)$ be the set of integers $n$ such that $f(n)=s$. Note that these sets form a partition of $\mathbf N$.

The following lemma is a variant of the pumping lemma of automata theory; it is through it that automatic functions acquire their specific properties.

Lemma 3. — For all integers $x,y\in L(s)$, every integer $n\geq 0$ and every integer $z$ such that $0\leq z < c^n$, one has $f(x c ^ n+z)=f(yc ^n +z)$.

Indeed, when the automaton reads the integer $xc^n+z$, it first reads the digits of $x$ and goes to the state $s$; similarly, when it reads $yc^n+z$, it first reads $y$ and reaches the state $s$ too. In both cases, it then reads the $n$ digits of $z$ (padded with leading zeros) and ends up in the same final state.
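
One can check lemma 3 on a toy example: take for the automaton a machine that computes the remainder modulo 4 of a decimal number digit by digit (so $c=10$), and for $x$ and $y$ two integers that end in the same state; the numbers below are illustrative choices.

```python
# A small check of Lemma 3 for the "remainder modulo 4" automaton (c = 10):
# if x and y end in the same state s, then x*10^n + z and y*10^n + z do too,
# for every z < 10^n written with n digits (padded with leading zeros).

def run(digits_string, modulus=4):
    state = 0
    for ch in digits_string:
        state = (10 * state + int(ch)) % modulus
    return state

x, y, n = 317, 1405, 3           # 317 and 1405 both end in state 1 (both are 1 mod 4)
assert run(str(x)) == run(str(y))
for z in range(10**n):
    padded = str(z).zfill(n)     # z written with exactly n digits
    assert run(str(x) + padded) == run(str(y) + padded)
print("Lemma 3 holds on this example")
```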

Construction of local periods

We now consider a function $f$ which is automatic in two bases $a$ and $b$ which do not have common powers. We thus have two finite automata $A$ and $B$, with sets of states $S$ and $T$. For each state $s\in S$, we have a set $L(s)$ of integers as above; these sets form a partition of $\mathbf N$. Let $S^\infty$ be the set of states $s\in S$ such that $L(s)$ is infinite. For $t\in T$, we define $M(t)$ similarly, as well as $T^\infty$.

Let $s\in S^\infty$. Since the sets $M(t)$, for $t\in T$, form a finite partition of $\mathbf N$ and $L(s)$ is infinite, there exist a state $t(s)$ and two distinct integers $x(s)$ and $y(s)$ in $L(s)\cap M(t(s))$. Let $K$ be an integer strictly greater than all the $x(s)$ and $y(s)$, for $s\in S^\infty$.

Applying lemma 2 with $\varepsilon$ small enough (of the order of $1/6K$), we obtain integers $m$ and $n$ such that $\abs{a^m-b^n}<a^m/6K$. For $s\in S^\infty$, we set $p(s)=\pm (x(s)-y(s))(a^m-b^n)$; since $a$ and $b$ have no common power, this is a nonzero integer, and we choose the sign so that it is strictly positive.

For each integer $x$, we define an interval $I_x$ of integers by $I_x=[(x+\frac13)a^m,(x+\frac53)a^m]\cap\mathbf N$.

Lemma 4. — For every state $s\in S^\infty$ and every integer $x\in L(s)$, the function $f$ has period $p(s)$ on $I_x$.

To this end, we consider $z\in [\frac13a^m,\frac53a^m]$ and prove that $f(xa^m+z)=f(xa^m+z+p(s))$. Up to exchanging $x(s)$ and $y(s)$, we assume that $p(s)=(x(s)-y(s))(a^m-b^n)$. Set $w=z+x(s)(a^m-b^n)$, so that $x(s)a^m+z=x(s)b^n+w$. First of all, we have the inequality $$ \abs{w-b^n} = \abs{(z-a^m)+(x(s)+1)(a^m-b^n)} \leq \abs{z-a^m}+(x(s)+1) \abs{a^m-b^n} \leq \frac23 a^m + K\abs{a^m-b^n} \leq \frac56a^m\leq b^n $$ because $K\abs{a^m-b^n}\leq \frac16 a^m$. Consequently, $w$ is a nonnegative integer, close to $b^n$. Applying lemma 3 in base $a$, we have $f(x a^m+z) = f(x(s)a^m+z)=f(x(s)b^n+w)$, because $x$ and $x(s)$ belong to $L(s)$. Applying lemma 3 again, now in base $b$, this is equal to $f(y(s)b^n+w)$, because $x(s)$ and $y(s)$ belong to $M(t(s))$. Finally, $y(s)b^n+w=y(s)a^m+z+(x(s)-y(s))(a^m-b^n)=y(s)a^m+z+p(s)$, so that, applying lemma 3 a third time in base $a$ (with $y(s)$ and $x$ in $L(s)$), this is equal to $f(xa^m+z+p(s))$. This concludes the proof.

Conclusion

Since the sets $L(s)$, for $s\notin S^\infty$, are finite, the sets $L(s)$, for $s\in S^\infty$, cover all integers larger than some integer, say $x$.

For $y\geq x$, there exists $s\in S^\infty$ such that $y\in L(s)$, and we have shown that $f$ is periodic with period $p(s)\leq \frac16a^m$ on the interval $I_y=[(y+\frac13)a^m,(y+\frac53)a^m]$. By abuse of notation, write $p(y)=p(s)$.

Let $J_z$ be the union of the intervals $I_y$, for $x\leq y\leq z$. One has $J_x=I_x$; for $z>x$, the intersection $I_z\cap J_{z-1}$ contains $I_z\cap I_{z-1}=[(z+\frac13)a^m,(z+\frac23)a^m]$, hence has at least $\frac13a^m$ points, thus at least $p(x)+p(z)$. Using lemma 1 and induction on $z$, $f$ has period $p(x)$ on $J_z$, for all $z\geq x$.

This concludes the proof that $f$ is ultimately periodic.