Saturday, July 20, 2024

Number theory and finite automata

\gdef\abs#1{\lvert#1\rvert}

There are many parts of number theory I don't know of, and today's story is about one I learnt very recently, at the intersection of number theory and computer science. It is about the natural numbers that we can recognize using a finite automaton. 

Finite automata and automatic functions

Finite automata

Finite automata are very elementary computers. These machines can receive instructions of finitely many sorts, and can be in finitely many states; moreover, their behaviour is prescribed: when, in a given state, they receive one instruction, they just move to another state. Mathematically, one can say that there is a finite set AA of instructions, a finite set SS of states, and a mapping ϕ ⁣:A×SS\phi\colon A \times S\to S. One state ε\varepsilon is labeled as the initial state; wen reading a list of instructions [a1,,am][a_1,\dots,a_m], the machine goes through the states s0=εs_0=\varepsilon, s1=μ(a1,s0)s_1=\mu(a_1,s_0), s2=μ(a2,s1)s_2=\mu(a_2,s_1), etc. until it reaches and outputs its the final state sns_n. All in all, the automaton defines a function Φ\Phi from the set AA^* of lists in AA to the set SS. Such functions are called automatic.

In practice, some states could be labeled as admissible, in which case the machine unlocks the door, and the set of lists αA\alpha\in A^* such that Φ(α)\Phi(\alpha) is admissible is then called the language of the automaton.

The basic question of automata theory is to describe the (so-called “regular”) languages recognizable by some automaton, more generally the automatic functions. It appears that they are of a very restricted nature: due to the elementary nature of finite automata, they are much more restricted than the kind of languages that would be recognizable by a program in the programming language of your choice. The reason is that they have a fixed memory while “theoretical” computers, as modelled by Turing machines for example, have a potentially infinite memory, that is, as large as the computation may require.

Numbers

Number theory enters when you decide that the set AA of instructions are digits 0, 1, 2,…, 9 and the lists in AA^* represent a number. You then wish to understand what numbers can be recognizable, or what functions Φ ⁣:NS\Phi\colon\mathbf N\to S are automatic, for a finite set SS. This is what happens, for example, in the very elementary finite automata that stand at almost all Paris doors as electronic locks. They recognize exactly one 4-digit number (and any 4-digit number) puts them back in the initial state.

When it comes about digits, they can be the usual ones, 0, 1, 2,… , 9, writing number in base 10, but any other basis can be considered, and any set of digits that allows to write all numbers can be used. For example, one can consider writing numbers in base 3, with digits 0, 1, 2, 3, 4.

Let us thus fix some basis bb. The first theorem is that the set of automatic functions does not depend on the set of digits which is considered. The reason is that one can build another kind of machine (called a transducer) that works like a finite automaton except that it output something at each step. Here, we can have a transducer that outputs the digits in some representation given the digits in another one. There could be carries to take care of, but when the basis is the same, their impact is controlled, and it can be controlled by finitely many of them. It is important for this argument that the basis stays the same, and we will see that in a more striking way later on. For this reason, we will not specify the set of digits in the sequel.

Examples

Here are some examples of recognizable sets of numbers.

  • The empty set works. It suffices to have no terminal state.
  • Singletons work. For example, to recognize “314”, one needs one initial state “ε\varepsilon”, three states labeled “3”, “31” and “314”, and a junk state “J”. The machine moves from “ε\varepsilon” to “3” if it receives “3”, and to “J” otherwise; from “3” to “31” if it receives “1”, etc.
  • The union and the intersection of two recognizable sets are recognizable. Just make an automaton whose sets are pair of states of the two automata that recognize the given states, and simulate the moves of each of them. Similarly, the complement of a recognizable set is recognizable.
  • By the previous examples, finite sets and complements of finite sets are recognizable.
  • Here is a number theoretically more interesting example : arithmetic progressions are recognizable. To produce an automaton that, say, recognize all integers congruent to 1 modulo 4, it suffices to have a machine with 4 states that computes the euclidean reminder step by step.
  • Another number theoretically interesting example: powers of bb. Indeed their writing in base bb consists of one digit “1” followed by a series of “0”.
  • A number theoretically annoying property of recognizable sets: since a finite automaton has finitely many states, the “pumping lemma” shows that for any large enough recognizable number, one can replicate at will some inner part of the writing and get another number. I suppose one can deduce from this property that the set of prime numbers cannot be recognizable. (This is exercise 5.8.12 in the book Automatic sequences, by J-P. Allouche and J. Shallit.)
  • Numbers recognizable in base bb are also recognizable in base b2b^2 (or some other power) and conversely: it suffices to emulate the base b2b^2-writing of a number using base bb-digits, or conversely.

Building from characteristic functions of admissible languages, the preceding results can be turned to examples of automatic functions. In particular, we obtain that functions which are ultimately periodic are automatic.

Cobham's theorem

Statement

Number theory can also enter a topic from two sides at a time, and here is the point of this blog post. Can one compare the sets of recognizable numbers from different bases? By the following theorem, they are genuinely distinct.

Theorem (Cobham, 1969). — Let SS be a finite set and let f ⁣:NSf\colon \mathbf N\to S be a function which is automatic in two bases bb and cc. If bb and cc are not powers of the same integer, then ff is ultimately periodic.
Since ultimately periodic functions are automatic in any basis, this theorem is definitely optimal.

The original proof of Cobham's theorem is said to be difficult, and this theorem has obtained various other simpler proofs, often partially flawed. I wish to describe here the recent proof by Thijmen Krebs (publisher version, arXiv:1801.06704) which calls itself “more reasonable”.

The assumption that aa and bb are not powers of the same integer will appear in different, equivalent, form in the proof: they do not have common powers.

Local periods

To prove that ff is ultimately periodic, Krebs proves that ff is periodic on large overlapping intervals.

Lemma 1.Let f:NSf: \mathbf N\to S be a function and let II and JJ be intervals of integers. We assume that ff has period pp on II and period qq on JJ. If Card(IJ)p+q\operatorname{Card}(I\cap J)\geq p+q, then ff has period pp on the interval IJI\cup J.

The proof is elementary. Consider xIJx\in I\cup J such that x+pIJx+p\in I\cup J. If both xx and x+px+p belong to II, then f(x)=f(x+p)f(x)=f(x+p). Let us assume otherwise. If xx belongs to II, but not x+px+p, then x+pJx+p\in J; since IJI\cap J has at least p+qp+q elements, xx must belongs to JJ. The other cases are similar and show that both x,x+px, x+p belong to JJ. Using that IJI\cap J is large, we pick up elements y,y+py,y+p in IJI\cap J such that xy(modq)x\equiv y \pmod q. Then f(x)=f(y)=f(y+p)=f(x+p)f(x)=f(y)=f(y+p)=f(x+p), the first and last equality using the qq-periodicity on JJ, and the middle one the pp-periodicity on II.

A diophantine approximation lemma

Lemma 2.Let aa and bb be two real numbers such that a,b>1a,b>1. For every ε>0\varepsilon>0, there exist nonzero natural numbers mm and nn such that ambn<εbn\abs{a^m-b^n}<\varepsilon b^n.

The result is obvious if aa and bb have a common power, so we may assume that this is not the case. Then various proofs are possible. For example, one may consider the additive subgroup of R\mathbf R generated by log(a)\log(a) and log(b)\log(b); by assumption, it is dense hence there exists a small linear combination mlog(a)nlog(b)m\log(a)-n\log(b); taking exponentials, am/bna^m/b^n is close to~11, as claimed.

Krebs gives a “pigeonhole-like” proof which is nice. For every integer m1m\geq1, consider the unique integer nm1n_m\geq 1 such that 1am/bnm<b1\leq a^m/b^{n_m}<b. There must be two integers mm and pp such that m<pm<p and such that am/bnma^m/b^{n_m} and ap/bnpa^p/b^{n_p} differ at most from ε\varepsilon, and this implies the result.

Some functional equations

Let f ⁣:NSf\colon \mathbf N\to S be a function which is automatic in some basis cc, computed by an automaton with set of states SS. For every sSs\in S, let L(s)L(s) be the set of integers nn such that f(n)=sf(n)=s. Note that these sets form a partition of N\mathbf N.

The following lemma is a variant of the pumping lemma in the theory of automata; it is by it that automatic functions acquire their specific properties.

Lemma 3.For every integers x,yL(s)x,y\in L(s) and any integer zz such that 0z<cn0\leq z < c^n, one has f(xcn+z)=f(ycn+z)f(x c ^ n+z)=f(yc ^n +z).

Indeed, when the automaton reads the integer xcn+zxc^n+z, it first reads xx, and goes to state ss; similarly, when it reads ycn+zyc^n+z, it first reads yy and reaches the state ss too. In both cases it then reads zz and ends up in the same final state.

Construction of local periods

We now consider a function ff which is automatic in two bases aa and bb which do not have common powers. We thus have two finite automata AA and BB, with sets of states SS and TT. For each state sSs\in S, we have a set L(s)L(s) of integers as above; these sets form a partition of N\mathbf N. Let SS^\infty be the set of states sSs\in S such that L(s)L(s) is infinite. For tTt\in T, we define M(t)M(t) similarly, as well as TT^\infty.

Let sSs\in S^\infty. Since the M(t)M(t), for tTt\in T, form a finite partition of N\mathbf N and L(s)L(s) is infinite, there exists a state t(s)t(s) and two distinct integers x(s)x(s) and y(s)L(s)M(t(s))y(s)\in L(s)\cap M(t(s)). Let KK be an integer strictly greater than all x(s),y(s)x(s), y(s), for sSs\in S^\infty.

Applying lemma 3 for ε=1/6K\varepsilon = 1/6K, we obtain integer m,nm,n such that ambn<am/6K\abs{a^m-b^n}<a^m/6K. For sSs\in S^\infty, we p(s)=±(x(s)y(s))(ambn)p(s)=\pm (x(s)-y(s))(a^m-b^n); since aa and bb have no common power, this is a nonzero integer, and we choose the sign so that it is strictly positive.

For each integer xx, we define an interval IxI_x of integers by Ix=[(x+13)am,(x+53)am]NI_x=[(x+\frac13)a^m,(x+\frac53)a^m]\cap\mathbf N.

Lemma 4.For every state sSs\in S_\infty and any integer xL(s)x\in L(s), the function ff has period p(s)p(s) on IxI_x.

To that aim, we consider z[13am,53am]z\in [\frac13a^m,\frac53a^m] and prove that f(xam+z)=f(xam+z+p)f(xa^m+z)=f(xa^m+z+p). First of all, we have the inequality zx(s)(ambn)bnzam+(x(s)+1)ambn23am+Kambn56ambn \abs{z-x(s)(a^m-b^n)-b^n} \leq \abs{z-a^m}+(x(s)+1) \abs{a^m-b^n} \leq \frac23 a^m + K\abs{a^m-b^n} \leq \frac56a^m\leq b^n because ambn16am\abs{a^m-b^n}\leq \frac16 a^m. Consequently, zz is written with at most nn digits in base bb. Applying lemma 3, we have f(xam+z)=f(x(s)am+z f(x a^m+z) = f(x(s)a^m+z, because xx and x(s)L(s)x(s)\in L(s). Applying lemma 3 again, this is equal to f(x(s)am+zx(s)(ambn))f(x(s)a^m+z-x(s)(a^m-b^n)), hence to f(y(s)am+zx(s)(ambn))=f(y(s)am+z+p(s))f(y(s)a^m+z-x(s)(a^m-b^n)) = f(y(s)a^m+z+p(s)). Applying lemma 3 a third time, this is equal to f(xam+z+p(s)f(xa^m+z+p(s). This concludes the proof.

Conclusion

Since the sets L(s)L(s), for sSs\notin S_\infty are finite, the set L(s)L(s) for sSs\in S_\infty cover all integers larger than some integer, say xx.

For yxy\geq x, there exists sSs\in S_\infty such that yL(s)y\in L(s) and we have shown that ff is periodic with period p(s)16amp(s)\leq \frac16a^m on an interval I(x)=[(y+13)am,(y+53)am]I(x)=[(y+\frac13)a^m,(y+\frac53)a^m]. By abuse of notation, write p(x)=p(s)p(x)=p(s).

Let J(z)J(z) be the union of the intervals I(y)I(y), for xyzx\leq y\leq z. One has J(x)=I(x)J(x)=I(x); for z>xz>x, the intersection J(z)J(z1)J(z)\cap J(z-1) is [(z+13)am;(z+23)am][(z+\frac13)a^m;(z+\frac23)a^m] hence has at least 13am\frac13a^m points, thus at least p(x)+p(z)p(x)+p(z). Using lemma 1, ff has period p(x)p(x) on J(z)J(z), for all zz.

This concludes the proof that ff is ultimately periodic