Saturday, March 29, 2025

A simple proof of a theorem of Kronecker

Kronecker's theorem of the title is the following.

Theorem. — Let $\alpha\in\mathbf C$ be an algebraic integer all of whose conjugates have absolute value at most $1$. Then either $\alpha=0$, or $\alpha$ is a root of unity.

This theorem has several elementary proofs. In this post, I explain the simple proof proposed by Gebhard Greiter in his American Mathematical Monthly note, adding details so that it is (hopefully) more accessible to students.

The possibility $\alpha=0$ is rather unentertaining, hence let us assume that $\alpha\neq0$.

Let us first analyse the hypothesis. The assumption that $\alpha$ is an algebraic integer means that there exists a monic polynomial $f\in\mathbf Z[T]$ such that $f(\alpha)=0$. I claim that $f$ can be assumed to be irreducible in $\mathbf Q[T]$; it is then the minimal polynomial of $\alpha$. This follows from Gauss's theorem, and let me give a proof of it. Let $g\in\mathbf Q[T]$ be the minimal polynomial of $\alpha$; it is a monic irreducible factor of $f$, so let $h\in\mathbf Q[T]$ be such that $f=gh$. Chasing denominators and dividing out by their gcd, there are polynomials $g_1,h_1\in\mathbf Z[T]$ whose coefficients are mutually coprime, and positive rational numbers $u,v$ such that $g=ug_1$ and $h=vh_1$. Then $(uv)f=g_1h_1$. Since $f$ is monic and $g_1h_1$ has integer coefficients, $uv$ is an integer. Let us prove that $uv=1$; otherwise, it has a prime factor $p$. Considering the relation $(uv)f=g_1h_1$ modulo $p$ gives $0=g_1h_1\pmod p$. Since their coefficients are mutually coprime, the polynomials $g_1$ and $h_1$ are nonzero modulo $p$, hence their product is nonzero. This is a contradiction. Consequently, $uv=1$, the leading coefficient of $g_1$ is $\pm1$, and $g=\pm g_1$ is a monic polynomial with integer coefficients annihilating $\alpha$; we may thus replace $f$ by $g$.

So we have a monic polynomial $f\in\mathbf Z[T]$, irreducible in $\mathbf Q[T]$, such that $f(\alpha)=0$. That is to say, $f$ is the minimal polynomial of $\alpha$, so that the conjugates of $\alpha$ are the roots of $f$. Note that the roots of $f$ are pairwise distinct — otherwise, $\gcd(f,f')$ would be a nontrivial factor of $f$. Moreover, $0$ is not a root of $f$, for otherwise one could factor $f=T\cdot f_1$, and irreducibility would force $f=T$, contradicting $\alpha\neq0$.

Let us now consider the companion matrix of $f$: writing $f=T^n+c_1T^{n-1}+\dots+c_n$, so that $n=\deg(f)$, this is the matrix $$C_f = \begin{pmatrix} 0 & \dots & 0 & -c_n \\ 1 & \ddots & \vdots & -c_{n-1} \\ 0 & \ddots & & -c_{n-2} \\ 0 & \cdots & 1 & -c_1 \end{pmatrix}.$$ If $e_1,\dots,e_n$ are the canonical column vectors $e_1=(1,0,\dots,0)$, etc., then $C_f\cdot e_1=e_2$, …, $C_f\cdot e_{n-1}=e_n$, and $C_f\cdot e_n=-c_ne_1-\dots-c_1e_n$. Consequently, $$f(C_f)\cdot e_1=C_f^n\cdot e_1+c_1C_f^{n-1}\cdot e_1+\dots+c_ne_1=0.$$ Moreover, for $2\leq k\leq n$, one has $f(C_f)\cdot e_k=f(C_f)\cdot C_f^{k-1}\cdot e_1=C_f^{k-1}\cdot f(C_f)\cdot e_1=0$. Consequently, $f(C_f)=0$ and the complex eigenvalues of $C_f$ are roots of $f$. Since $f$ has simple roots, $C_f$ is diagonalizable: there exist a matrix $P\in\mathrm{GL}(n,\mathbf C)$ and a diagonal matrix $D$ such that $C_f=P\cdot D\cdot P^{-1}$, and the diagonal entries of $D$ are roots of $f$. Since $0$ is not a root of $f$, the matrix $D$ is invertible, and $C_f$ is invertible as well. More precisely, one can infer from the definition of $C_f$ that $g(C_f)\cdot e_1\neq0$ for any nonzero polynomial $g$ of degree $<n$, so that $f$ is the minimal polynomial of $C_f$. Consequently, all of its roots are actually eigenvalues of $C_f$ and appear in $D$; in particular, $\alpha$ is an eigenvalue of $C_f$.

For every $k\geq1$, one has $C_f^k=P\cdot D^k\cdot P^{-1}$. Since all diagonal entries of $D$ have absolute value at most $1$, the set of all $D^k$ is bounded in $\mathrm M_n(\mathbf C)$. Consequently, the set $\{C_f^k\,;\,k\geq1\}$ is bounded in $\mathrm M_n(\mathbf C)$. On the other hand, this set consists of matrices in $\mathrm M_n(\mathbf Z)$, and a bounded set of integer matrices is finite.

There are integers $k$ and $\ell$ such that $k<\ell$ and $C_f^k=C_f^\ell$. Since $C_f$ is invertible, one has $C_f^{\ell-k}=1$. Since $\alpha$ is an eigenvalue of $C_f$, this implies $\alpha^{\ell-k}=1$. We thus have proved that $\alpha$ is a root of unity.
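Although the proof is complete, it is pleasant to watch the mechanism at work numerically. Here is a minimal sketch in Python (my own illustration, assuming NumPy is available; it is not part of Greiter's note), for the polynomial $f=T^2-T+1$, whose roots are the primitive sixth roots of unity:

```python
import numpy as np

# Companion matrix of f = T^2 - T + 1, whose roots are the primitive
# 6th roots of unity (algebraic integers with all conjugates of modulus 1).
# With the convention of the text, f = T^n + c_1 T^{n-1} + ... + c_n,
# here n = 2, c_1 = -1, c_2 = 1, and the last column is (-c_2, -c_1).
C = np.array([[0, -1],
              [1,  1]])

# The powers of C lie in M_2(Z) and form a bounded, hence finite, set;
# the first repetition C^k = C^l forces C^(l-k) = I, hence alpha^(l-k) = 1.
seen = {}
M = np.eye(2, dtype=int)
for k in range(1, 20):
    M = M @ C
    key = tuple(M.flatten())
    if key in seen:
        print(f"C^{seen[key]} = C^{k}, hence C^{k - seen[key]} = I")
        break
    seen[key] = k
```

Running it reports that $C^7=C^1$, hence $C^6=I$, as expected since the roots of $f$ are sixth roots of unity.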

Saturday, January 11, 2025

On numbers and unicorns

Reading a book on philosophy of mathematics, even one written lightly, such as Why is there philosophy of mathematics at all? by Ian Hacking, may have unexpected effects. The most visible one has been a poll that I submitted on Mastodon on December 4th. As you can read, there were four options:

  • Numbers exist
  • Unicorns exist
  • Numbers have more existence than unicorns
  • Neither numbers nor unicorns exist

As you can see, the three main options each received roughly a third of the votes, the existence of unicorns being favored by a mere 4% of the 142 participants.


In this post, I would like to discuss these four options, from the point of view of a mathematician with a limited expertise in philosophy. Funnily, each of them, including the second one, has some substance. Moreover, it will become apparent at the end that the defense of any of these options relies on the meaning we give to the word exist.

Numbers exist

We have traces of numbers as old as we have traces of writing. The Babylonian clay tablet known as “Plimpton 322” dates from 1800 BC and mentions Pythagorean triples (triples of integers $(a,b,c)$ such that $a^2=b^2+c^2$) written in the sexagesimal (base 60) system of the Babylonians. More than 1000 years later, Pythagoras and his school wished to explain everything in the world using numbers, usually depicted geometrically, drawing pebbles arranged in triangles, squares, rectangles and even pentagons. Musical harmony was based on specific numerical ratios. Plato's writings are full of references to mathematics, and a few specific numbers, such as 5040, are even given a political relevance. Even earlier, the Chinese I Ching based predictions on random numbers.

Euclid's Elements give us an account of numbers that still sounds quite modern. A great part of elementary arithmetic can be found there: divisibility, the “Euclidean algorithm”, prime numbers and the fact that there are infinitely many of them or, more precisely, that there are more prime numbers than any given magnitude. On the other hand, it is worth saying that Euclid's concept of number doesn't exactly fit with our modern concept: since “a number is a multitude composed of units”, numbers are whole numbers, and it is implicit that zero is not a number, and neither is one. Moreover, proposition 21 of book 10, which we read as proving the irrationality of the square root of 2, is not a statement about a hypothetical number called “square root of 2”, but the fact that the diagonal of a square of unit length is irrational, i.e., not commensurable with the unit length.

History went on and on, and numbers gradually became prevalent in modern societies. Deciding the exact date of Easter in an ill-connected Christian world led to the necessity of computing it, using an algorithm known as computus, a name which happened to be used for every calculation. The development of trade led to the development of computing devices, abacuses, counting boards (the counter at your preferred shop is the place where the counting board was laid), and Simon Stevin's De Thiende explicitly presents trade as an application of his book on decimal fractions.

Needless to say, our digital age is literally infused with numbers.

Unicorns exist

Traces of unicorns can be found in the Indus valley. The animal seems to have disappeared, but the Greeks thought they lived in India. In Western Europe, tapestries from the Middle Ages give numerous wonderful representations of unicorns, such as the Paris Dame à la licorne. This extract from Umberto Eco's The Name of the Rose gives additional proof of their existence:

“But is the unicorn a falsehood? It’s the sweetest of animals and a noble symbol. It stands for Christ, and for chastity; it can be captured only by setting a virgin in the forest, so that the animal, catching her most chaste odor, will go and lay its head in her lap, offering itself as prey to the hunters’ snares.”
“So it is said, Adso. But many tend to believe that it’s a fable, an invention of the pagans.”
“What a disappointment,” I said. “I would have liked to encounter one, crossing a wood. Otherwise what’s the pleasure of crossing a wood?”

The Internet age gave even more evidence for the existence of unicorns. For example, we now know that, though it is not rainbow-colored, contrary to a widespread myth, unicorn poop is colorful.

Numbers have more existence than unicorns

Whatever numbers and unicorns could be, we are literally surrounded by the former, and have scarce traces of the latter. I'd also like to add that numbers have a big impact on all modern human activities: the most basic trading activities use numbers, as does the most sophisticated science. But one doesn't need to wish to send a man to the moon to require numbers: political life is built around votes and polls, all time schedules are expressed in numbers, and schools, hospitals, movies are regularly ranked or assigned ratings. An example given by Hacking is the existence of world records for, say, 50 m freestyle swimming. That requires good swimmers, of course, but also elaborate timers, very precise measurement tools to have the swimming pool acknowledged as a legitimate place, and so on. On the other hand, however nice the cultural or artistic representations of unicorns can be, they don't have that prevalence in the modern world, and it is rare that something is made possible by something involving unicorns.

Neither numbers nor unicorns exist

This is probably my favorite take, and that's maybe slightly bizarre for a mathematician, especially one versed in number theory. But let us take a lucid look at the discourse that we have about numbers. On one side, we have the famous quote of Leopold Kronecker, “Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Menschenwerk.” — God made whole numbers, all the rest is the work of man. Does this mean that at least some numbers, the natural integers, exist in the world in the same way that we can find water, iron, gold or grass? Are there number mines somewhere? It doesn't seem so, and I'm inclined to say that if numbers exist, they also are a creation of mankind. The ZFC axioms of set theory postulate the existence of some set $\mathbf N$ satisfying an induction principle, and that induction principle is used to endow that set with an addition and a multiplication which make it a mathematical model of Kronecker's “whole numbers”, but does this mean that numbers exist? After all, what guarantees that the ZFC axioms are not self-contradictory? And if they were, would that mean that whole numbers cease to exist? Certainly not. In any case, our daily use of whole numbers would not be the least disturbed by a potential contradiction in those axioms. But this indicates that being able to speak of integers within ZFC set theory doesn't confer on those integers some existence, in the same sense that grass, water, iron exist. Another indication of their elusive character is the way mathematicians or computer scientists pretend numbers are specific objects, such as zero being the empty set, $1$ being the set $\{\emptyset\}$, and, more generally, the integer $n+1$ being $n\cup\{n\}$. This is at most a functional fiction, as is well explained by Benacerraf (1965) in his paper “What numbers could not be” (The Philosophical Review, Vol. 74, No. 1, pp. 47–73).

But all in all, everything is fine, because whether numbers (resp. unicorns) exist or not, we humans have developed a language to talk about them; they make us think, they make us elaborate new works of art or of science, and after all, this is all that counts, isn't it?

Friday, September 13, 2024

The combinatorial Nullstellensatz

The “combinatorial Nullstellensatz” is a relatively elementary statement due to Noga Alon (1999) whose name, while possibly frightening, really says what it is and what it is good for. (A freely available version is there.) Nullstellensatz is the classic name for a theorem of David Hilbert that relates loci in $F^n$ defined by polynomial equations and the ideals of the polynomial ring $F[T_1,\dots,T_n]$ generated by these equations, when $F$ is an algebraically closed field. The word itself is German and means “theorem about the location of zeroes”. The precise correspondence is slightly involved, and stating it precisely here would derail us from the goal of this post, so let us stay with these informal words for the moment. The adjective combinatorial in the title refers to the fact that Alon deduced many beautiful consequences of this theorem in the field of combinatorics, and for some of them furnished quite simple proofs. We'll give one of them below, the Cauchy—Davenport inequality, but my main motivation in writing this text was to discuss the proof of the combinatorial Nullstellensatz, in particular untold aspects of the proofs which took me some time to unravel.\gdef\Card{\operatorname{Card}}

The context is the following: $F$ is a field, $n$ is a positive integer, and one considers finite sets $S_1,\dots,S_n$ in $F$, and the product set $S=S_1\times\dots\times S_n$ in $F^n$. One considers polynomials $f\in F[T_1,\dots,T_n]$ in $n$ indeterminates and their values at the points $(s_1,\dots,s_n)$ of $S$. For every $i\in\{1,\dots,n\}$, set $g_i=\prod_{a\in S_i}(T_i-a)$; this is a polynomial of degree $\Card(S_i)$ in the indeterminate $T_i$.

Theorem 1. — If a polynomial $f\in F[T_1,\dots,T_n]$ vanishes at all points $s\in S$, there exist polynomials $h_1,\dots,h_n\in F[T_1,\dots,T_n]$ such that $f=g_1h_1+\dots+g_nh_n$ and, for each $i$, one has $\deg(g_ih_i)\leq\deg(f)$.

This theorem is really close to the classic Nullstellensatz, but the specific nature of the set $S$ allows for a weaker hypothesis (the field $F$ is not assumed to be algebraically closed) and a stronger conclusion (the Nullstellensatz would only assert that some power $f^k$ of $f$ is of this form, saying nothing of the degrees).

Its proof relies on a kind of Euclidean division algorithm. Assume that $f$ has some monomial $c_mT^m=c_mT_1^{m_1}\dots T_n^{m_n}$ where $m_i\geq\Card(S_i)$ for some $i$; then one can consider a Euclidean division (in the variable $T_i$), $T_i^{m_i}=p_ig_i+r_i$, where $\deg(r_i)<\Card(S_i)$. One can then replace the monomial $c_mT^m$ in $f$ by $c_m(T^m/T_i^{m_i})r_i$ and, at the same time, add $c_m(T^m/T_i^{m_i})p_i$ to $h_i$. Since this amounts to subtracting a multiple of $g_i$, the vanishing assumption still holds. Applying this method repeatedly, one reduces to the case where the degree of $f$ in the variable $T_i$ is $<\Card(S_i)$ for all $i$.
Then, a variant for several indeterminates of the theorem that a nonzero polynomial in one indeterminate has no more roots than its degree implies that $f\equiv0$.

A weak point in the written exposition of this argument is the reason why the iterative construction terminates. Intuitively, something has to decrease, but one needs a minute (or an hour, or a week) of reflection to understand what precisely decreases. The problem is that if one works one monomial at a time, the degrees might stay the same. The simplest reason why this procedure indeed works belongs to the theory of Gröbner bases and to the proof that the division algorithm actually works: instead of the degrees with respect to all variables, or of the total degree, one should consider a degree associated with some lexicographic ordering of the variables — the point is that it is a total ordering of the monomials, so that the leading term, given by the largest monomial for that ordering, actually decreases at each step.
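To see this division algorithm at work, here is a minimal sketch in Python (an illustration of mine; SymPy and the specific polynomials are assumed choices, not taken from Alon's paper). SymPy's reduced function performs exactly this kind of multivariate division with respect to a monomial ordering:

```python
from sympy import symbols, expand, reduced

x, y = symbols('x y')

# Take S1 = S2 = {0, 1}, so that g1 = x(x-1) and g2 = y(y-1).
g1 = expand(x*(x - 1))
g2 = expand(y*(y - 1))

# f vanishes at all points of S1 x S2 (it is a multiple of g1).
f = expand(x*(x - 1)*y)

# Divide f by (g1, g2) for the lexicographic ordering; reduced()
# returns quotients (h1, h2) and a remainder r with f = h1*g1 + h2*g2 + r.
(h1, h2), r = reduced(f, [g1, g2], x, y, order='lex')
print(h1, h2, r)  # expect h1 = y, h2 = 0, r = 0
assert expand(h1*g1 + h2*g2 + r - f) == 0
```

Here the remainder vanishes, in accordance with theorem 1, and the quotients satisfy the degree bound $\deg(g_ih_i)\leq\deg(f)$.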

The second theorem is in fact the one which is needed for the applications to combinatorics. Its statement is rather written in a contrapositive way — it will show that $f$ does not vanish at some point of $S$.

Theorem 2.Let fF[T1,,Tm]f\in F[T_1,\dots,T_m] and assume that ff has a nonzero monomial cmTmc_mT^m, where m|m| is the total degree of ff. If, moreover, mi<Card(Si)m_i<\Card(S_i) for all ii, then there exists a point s=(s1,,sn)Ss=(s_1,\dots,s_n)\in S such that f(s)0f(s)\neq 0.

It is that theorem whose proof gave me a hard time. I finally managed to formalize it in Lean, hence I was pretty sure I had understood it. In fact, writing this blog post helped me simplify it further, removing a couple of useless arguments by contradiction! Anyway, I feel it is worth being told with a bit more detail than in the original paper.

We argue by contradiction, assuming that $f(s)=0$ for all $s\in S_1\times\dots\times S_n$. Applying theorem 1, we get an expression $f=\sum_{i=1}^n g_ih_i$ where $\deg(g_ih_i)\leq\deg(f)$ for all $i$. The coefficient of $T^m$ in $f$ is nonzero, by assumption; it is also the sum of the coefficients of $T^m$ in the $g_ih_i$, for $1\leq i\leq n$, and we are going to prove that all of them vanish — hence getting the desired contradiction. \gdef\coeff{\operatorname{coeff}}

Fix $i\in\{1,\dots,n\}$. If $h_i=0$, then $\coeff_m(g_ih_i)=\coeff_m(0)=0$, hence we may assume that $h_i\neq0$. This implies that $\deg(g_ih_i)=\Card(S_i)+\deg(h_i)\leq\deg(f)=|m|$.
One then has
$$\coeff_m(g_ih_i)=\sum_{p+q=m}\coeff_p(g_i)\coeff_q(h_i),$$
and it suffices to prove that all terms of this sum vanish. Fix $p,q$ such that $p+q=m$, assume that $\coeff_p(g_i)\neq0$, and let us prove that $\coeff_q(h_i)=0$.

By the very definition of $g_i$ as a product of $\Card(S_i)$ factors of the form $T_i-a$, the relation $\coeff_p(g_i)\neq0$ implies that $p_j=0$ for all $j\neq i$. Moreover, $p_i\leq p_i+q_i=m_i<\Card(S_i)$, by the assumption of the theorem, hence $p_i<\Card(S_i)$. This implies $$\Card(S_i)+\deg(h_i)\leq|m|=|p+q|=|p|+|q|=p_i+|q|<\Card(S_i)+|q|.$$
Subtracting $\Card(S_i)$ on both sides, one gets $\deg(h_i)<|q|$, hence $\coeff_q(h_i)=0$, as was to be shown.

To conclude, let us give the promised combinatorial application, namely the Cauchy—Davenport theorem.
\gdef\F{\mathbf F}

Theorem 3.Let pp be a prime number and let AA and BB be nonempty subsets of Fp\F_p, the field with pp elements. Denote by A+BA+B the set of all a+ba+b, for aAa\in A and bBb\in B. Then
Card(A+B)min(p,Card(A)+Card(B)1). \Card(A+B) \geq \min (p, \Card(A)+\Card(B)-1).


First consider the case where $\Card(A)+\Card(B)>p$, so that $\min(p,\Card(A)+\Card(B)-1)=p$. In this case, for every $c\in\F_p$, one has
$$\Card(A\cap(c-B))+\Card(A\cup(c-B))=\Card(A)+\Card(c-B)=\Card(A)+\Card(B)>p,$$
and since $\Card(A\cup(c-B))\leq p$, the sets $A$ and $c-B$ intersect. In particular, there exist $a\in A$ and $b\in B$ such that $a+b=c$. Hence $A+B=\F_p$, and the desired inequality holds.

Let us now treat the case where $\Card(A)+\Card(B)\leq p$, so that $\min(p,\Card(A)+\Card(B)-1)=\Card(A)+\Card(B)-1$. We will assume that the conclusion of the theorem does not hold, that is, $\Card(A+B)\leq\Card(A)+\Card(B)-2$, and derive a contradiction. Let us consider the polynomial $f\in\F_p[X,Y]$ defined by $f=\prod_{c\in A+B}(X+Y-c)$. The degree of $f$ is equal to $\Card(A+B)\leq\Card(A)+\Card(B)-2$. Choose nonnegative integers $u$ and $v$ such that $u+v=\Card(A+B)$, $u\leq\Card(A)-1$ and $v\leq\Card(B)-1$. The coefficient of $X^uY^v$ in $f$ is equal to the binomial coefficient $\binom{u+v}u$. Since $u+v\leq\Card(A)+\Card(B)-2\leq p-2$, this coefficient is nonzero in $\F_p$. By theorem 2, there exists $(a,b)\in A\times B$ such that $f(a,b)\neq0$. However, one has $a+b\in A+B$, hence $f(a,b)=0$ by the definition of $f$. This contradiction concludes the proof that $\Card(A+B)\geq\Card(A)+\Card(B)-1$. It also concludes this post.
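Since everything in theorem 3 is finite, it can be tested exhaustively for a small prime. Here is a quick brute-force check in Python (my own illustration, not part of Alon's paper):

```python
from itertools import combinations

# Exhaustive check of Cauchy-Davenport for a small prime p:
# Card(A+B) >= min(p, Card(A) + Card(B) - 1) for all nonempty A, B.
p = 7
subsets = [set(c) for k in range(1, p + 1)
           for c in combinations(range(p), k)]
for A in subsets:
    for B in subsets:
        sumset = {(a + b) % p for a in A for b in B}
        assert len(sumset) >= min(p, len(A) + len(B) - 1)
print(f"Cauchy-Davenport checked for p = {p}")
```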

Monday, August 12, 2024

Combinatorics of partitions

Divided powers are an algebraic gadget that emulates, in an arbitrary ring, the functions $x\mapsto x^n/n!$ for all integers $n$, together with the natural functional equations that they satisfy.

One of them is a nice binomial theorem without binomial coefficients: denoting $x^n/n!$ by $x^{[n]}$, one has $$(x+y)^{[n]}=\sum_{k=0}^n x^{[n-k]}y^{[k]}.$$

Another formula looks at what happens when one iterates the construction: if everything happens nicely, one has $$(x^{[n]})^{[m]}=\frac1{m!}\left(x^n/n!\right)^m=\frac1{(n!)^mm!}x^{nm}=\frac{(mn)!}{m!\,n!^m}x^{[mn]}.$$

In the development of the theory, the next necessary observation is that the coefficient $(mn)!/m!\,n!^m$ is an integer, which is not obvious since it is written as a fraction. Two arguments are given to justify this fact:

  1. The formula $$\frac{(mn)!}{m!\,n!^m}=\prod_{k=1}^m\binom{kn-1}{n-1},$$ which the authors claim can be proved by induction;
  2. The observation that $(mn)!/m!\,n!^m$ is the number of (unordered) partitions of a set with $mn$ elements into $m$ subsets with $n$ elements.

The latter fact is a particular case of a more general counting question: if $n=(n_1,\dots,n_r)$ is a tuple of integers and $N=n_1+\dots+n_r$, then the number of (unordered) partitions of a set with $N$ elements into subsets of cardinalities $n_1,\dots,n_r$ is equal to $$\frac{N!}{\prod n_i!\prod_{k>0}m_k(n)!},$$ where $m_k(n)$ is the number of occurrences of $k$ in the sequence $(n_1,\dots,n_r)$.

The goal of this blog post is to present a combinatorial argument for the first equality, and an alternative expression of the second number as a product of integers. We also give a formal, group-theoretical proof that this quotient of factorials solves the given counting problem.

 \gdef\abs#1{\lvert#1\rvert} \gdef\Card{\operatorname{Card}}  

Counting partitions of given type 

Let $S$ be a set with $N$ elements. A partition of $S$ is a subset $\{s_1,\dots,s_r\}$ of $\mathfrak P(S)$ consisting of nonempty, pairwise disjoint subsets whose union is $S$. Its “type” is the multiset of integers given by their cardinalities. Because no $s_i$ is empty, the type is a multiset of nonzero integers. It is slightly easier to allow zero in the type; then a partition has to be considered as a multiset of subsets of $S$, still assumed pairwise disjoint, so that only the empty subset can be repeated.

The group $G$ of permutations of $S$ acts on the set $\Pi_S$ of partitions of $S$. This action preserves the type of a partition, so that we get an action on the set $\Pi_{S,n}$ of partitions of type $n$, for any multiset of integers $n$. It is nonempty if and only if $\abs n=N$.

This action is transitive: given two partitions $(s_1,\dots,s_r)$ and $(t_1,\dots,t_r)$ with the same type $(n_1,\dots,n_r)$, numbered so that $\abs{s_i}=\abs{t_i}=n_i$ for all $i$, we just define a permutation of $S$ that maps the elements of $s_i$ to $t_i$.

By the orbit-stabilizer theorem, the number of partitions of type $n$ is equal to $\Card(G)/\Card(G_s)$, where $G_s$ is the stabilizer of any partition $s$ of type $n$. Since $\Card(S)=N$, one has $\Card(G)=N!=\abs n!$.

On the other hand, the stabilizer $G_s$ of a partition $(s_1,\dots,s_r)$ can be described as follows. An element $g$ of this stabilizer maps each $s_i$ to some $s_j$ which has the same cardinality as $s_i$. In other words, there is a morphism of groups from $G_s$ to the subgroup of the symmetric group $\mathfrak S_r$ consisting of permutations that preserve the cardinality. This subgroup is a product of symmetric groups $\mathfrak S_{m_k}$, for all $k>0$, where $m_k$ is the number of occurrences of $k$ in $(n_1,\dots,n_r)$. Here, we omit the factor corresponding to $k=0$ because our morphism doesn't see it.

This morphism is surjective, because it has a section. Given a permutation $\sigma$ of $\{1,\dots,r\}$ that preserves the cardinalities, we have essentially shown above how to construct a permutation of $S$ that induces $\sigma$, and this permutation belongs to $G_s$. More precisely, if we number the elements of $s_i$ as $(x_{i,1},\dots,x_{i,n_i})$, we lift $\sigma$ to the permutation that maps $x_{i,j}$ to $x_{\sigma(i),j}$ for all $i,j$. This makes sense since $n_i=n_{\sigma(i)}$ for all $i$.

The kernel of this morphism $G_s\to\prod_{k>0}\mathfrak S_{m_k}$ consists of the permutations that fix each $s_i$. It is clearly equal to the product $\prod\mathfrak S_{n_i}$ of symmetric groups.

One thus has $\Card(G_s)=\prod n_i!\prod_{k>0}m_k!$, as was to be shown.
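Since the argument is purely formal, a brute-force check on small examples is reassuring. Here is a sketch in Python (my own illustration; the helper names are arbitrary) comparing the quotient of factorials with a direct enumeration of the partitions of a given type:

```python
from itertools import permutations
from math import factorial
from collections import Counter

def count_partitions(type_):
    """Count partitions of {0,...,N-1} of the given type by brute force:
    cut each ordering of the set into consecutive blocks and deduplicate."""
    N = sum(type_)
    seen = set()
    for perm in permutations(range(N)):
        blocks, i = [], 0
        for size in type_:
            blocks.append(frozenset(perm[i:i + size]))
            i += size
        seen.add(frozenset(blocks))
    return len(seen)

def formula(type_):
    """N! divided by the product of the n_i! and of the m_k!."""
    denom = 1
    for n_i in type_:
        denom *= factorial(n_i)
    for m_k in Counter(type_).values():
        denom *= factorial(m_k)
    return factorial(sum(type_)) // denom

for type_ in [(2, 2), (1, 2, 2), (3, 3), (1, 1, 2)]:
    assert count_partitions(type_) == formula(type_)
print("formula checked on small examples")
```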

A combinatorial interpretation of the first relation


As we said above, the relation $$\frac{(mn)!}{m!\,n!^m}=\prod_{k=1}^m\binom{kn-1}{n-1}$$ can be proved by induction, just playing with factorials, but we want a combinatorial interpretation of it. (Here we assume that $n\geq1$.)
By the preceding paragraph, the left hand side is the number of partitions of a set with $mn$ elements of type $(n,\dots,n)$. We may assume that this set is numbered $\{1,\dots,mn\}$.

Such a partition can be defined as follows. First, we choose the subset that contains $1$: this amounts to choosing $n-1$ elements among $\{2,\dots,mn\}$, and there are $\binom{mn-1}{n-1}$ possibilities. Now, we have to choose the subset that contains the smallest element which is not in the first chosen subset. There are $n-1$ elements to choose among the remaining $mn-n-1=(m-1)n-1$ ones, hence $\binom{(m-1)n-1}{n-1}$ possibilities. And we go on in the same way, until we have chosen the $m$ required subsets. (For the last one, there is of course only one such choice, and notice that the last factor is $\binom{n-1}{n-1}=1$.)

An integral formula for the number of partitions of given type


Finally, we wish to give an alternative formula for the number of partitions of given type of a set $S$ that makes it clear that it is an integer. Again, we can assume that the set $S$ is equal to $\{1,\dots,N\}$ and that $n=(n_1,\dots,n_r)$ is a multiset of integers such that $\abs n=N$.

Let us write $n$ in another way, $n=(0^{m_0},1^{m_1},\dots)$, meaning that $0$ is repeated $m_0$ times, $1$ is repeated $m_1$ times, etc. One has $N=\sum km_k$. The idea is that we can first partition $S$ into subsets of cardinalities $m_1$, $2m_2$,…, and then subpartition these subsets: the first one into $m_1$ subsets with one element, the second one into $m_2$ subsets with two elements, etc.

The number of ordered partitions with $m_1$, $2m_2$,… elements is the multinomial coefficient $$\binom{N}{m_1,2m_2,\dots}.$$ And the other choice is the product of integers as given in the preceding section. This gives $$\frac{\abs n!}{\prod_i n_i!\prod_{k>0}m_k(n)!}=\binom{N}{m_1,2m_2,\dots}\prod_{k>0}\frac{(km_k)!}{k!^{m_k}\,m_k(n)!}.$$ We can also write the fractions that appear in the product as products of integers: $$\frac{\abs n!}{\prod_i n_i!\prod_{k>0}m_k(n)!}=\binom{N}{m_1,2m_2,\dots}\prod_{k>0}\prod_{m=1}^{m_k}\binom{km-1}{k-1}.$$
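Here is a quick numerical check of the last identity in Python (my own illustration; lhs and rhs are arbitrary names for the two sides of the formula):

```python
from math import comb, factorial
from collections import Counter

def lhs(type_):
    # N! / (prod of the n_i! times prod of the m_k!)
    denom = 1
    for n_i in type_:
        denom *= factorial(n_i)
    for m_k in Counter(type_).values():
        denom *= factorial(m_k)
    return factorial(sum(type_)) // denom

def rhs(type_):
    # multinomial(N; m_1, 2 m_2, ...) times the product of binomials
    m = Counter(type_)
    result = factorial(sum(type_))
    for k, mk in m.items():
        result //= factorial(k * mk)
    for k, mk in m.items():
        for j in range(1, mk + 1):
            result *= comb(k * j - 1, k - 1)
    return result

for type_ in [(2, 2), (1, 2, 2), (3, 3, 3), (1, 1, 2, 4)]:
    assert lhs(type_) == rhs(type_)
print("integral formula checked")
```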

Saturday, July 20, 2024

Number theory and finite automata

\gdef\abs#1{\lvert#1\rvert}

There are many parts of number theory I don't know of, and today's story is about one I learnt very recently, at the intersection of number theory and computer science. It is about the natural numbers that we can recognize using a finite automaton. 

Finite automata and automatic functions

Finite automata

Finite automata are very elementary computers. These machines can receive instructions of finitely many sorts, and can be in finitely many states; moreover, their behaviour is prescribed: when, in a given state, they receive one instruction, they just move to another state. Mathematically, one can say that there is a finite set $A$ of instructions, a finite set $S$ of states, and a mapping $\phi\colon A\times S\to S$. One state $\varepsilon$ is labeled as the initial state; when reading a list of instructions $[a_1,\dots,a_m]$, the machine goes through the states $s_0=\varepsilon$, $s_1=\phi(a_1,s_0)$, $s_2=\phi(a_2,s_1)$, etc., until it reaches the final state $s_m$, which it outputs. All in all, the automaton defines a function $\Phi$ from the set $A^*$ of lists in $A$ to the set $S$. Such functions are called automatic.

In practice, some states could be labeled as admissible, in which case the machine unlocks the door, and the set of lists $\alpha\in A^*$ such that $\Phi(\alpha)$ is admissible is then called the language of the automaton.

The basic question of automata theory is to describe the (so-called “regular”) languages recognizable by some automaton, more generally the automatic functions. It appears that they are of a very restricted nature: due to the elementary nature of finite automata, they are much more restricted than the kind of languages that would be recognizable by a program in the programming language of your choice. The reason is that they have a fixed memory while “theoretical” computers, as modelled by Turing machines for example, have a potentially infinite memory, that is, as large as the computation may require.

Numbers

Number theory enters when you decide that the set $A$ of instructions consists of the digits 0, 1, 2,…, 9 and that the lists in $A^*$ represent numbers. You then wish to understand which numbers can be recognized, or which functions $\Phi\colon\mathbf N\to S$ are automatic, for a finite set $S$. This is what happens, for example, in the very elementary finite automata that stand at almost all Paris doors as electronic locks. They recognize exactly one 4-digit number, and any other 4-digit number puts them back in the initial state.

When it comes to digits, they can be the usual ones, 0, 1, 2,…, 9, writing numbers in base 10, but any other base can be considered, and any set of digits that allows one to write all numbers can be used. For example, one can consider writing numbers in base 3, with digits 0, 1, 2, 3, 4.

Let us thus fix some base $b$. The first theorem is that the set of automatic functions does not depend on the set of digits which is considered. The reason is that one can build another kind of machine (called a transducer) that works like a finite automaton except that it outputs something at each step. Here, we can have a transducer that outputs the digits in one representation given the digits in another one. There could be carries to take care of, but since the base is the same, their impact is controlled, and finitely many states suffice to handle them. It is important for this argument that the base stays the same, and we will see that in a more striking way later on. For this reason, we will not specify the set of digits in the sequel.

Examples

Here are some examples of recognizable sets of numbers.

  • The empty set works. It suffices to have no admissible state.
  • Singletons work. For example, to recognize “314”, one needs an initial state “$\varepsilon$”, three states labeled “3”, “31” and “314”, and a junk state “J”. The machine moves from “$\varepsilon$” to “3” if it receives “3”, and to “J” otherwise; from “3” to “31” if it receives “1”, etc.
  • The union and the intersection of two recognizable sets are recognizable. Just make an automaton whose states are pairs of states of the two automata that recognize the given sets, and simulate the moves of each of them. Similarly, the complement of a recognizable set is recognizable.
  • By the previous examples, finite sets and complements of finite sets are recognizable.
  • Here is a number theoretically more interesting example: arithmetic progressions are recognizable. To produce an automaton that, say, recognizes all integers congruent to 1 modulo 4, it suffices to have a machine with 4 states that computes the Euclidean remainder step by step (see the sketch after this list).
  • Another number theoretically interesting example: powers of $b$. Indeed their writing in base $b$ consists of one digit “1” followed by a series of “0”s.
  • A number theoretically annoying property of recognizable sets: since a finite automaton has finitely many states, the “pumping lemma” shows that for any large enough recognizable number, one can replicate at will some inner part of its writing and get another number of the set. I suppose one can deduce from this property that the set of prime numbers cannot be recognizable. (This is exercise 5.8.12 in the book Automatic Sequences, by J.-P. Allouche and J. Shallit.)
  • Numbers recognizable in base $b$ are also recognizable in base $b^2$ (or some other power) and conversely: it suffices to emulate the base-$b^2$ writing of a number using base-$b$ digits, or conversely.
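To make the arithmetic progression example concrete, here is a minimal sketch in Python of the 4-state machine mentioned in the list above (my own illustration; the function names are arbitrary). It reads decimal digits from left to right and tracks the Euclidean remainder modulo 4:

```python
def remainder_mod4(digits: str) -> int:
    state = 0  # initial state: the empty prefix represents 0
    for d in digits:
        # transition: reading digit d sends state r to (10*r + d) mod 4
        state = (10 * state + int(d)) % 4
    return state

def recognizes(digits: str) -> bool:
    # admissible states: {1}, i.e. the integers congruent to 1 modulo 4
    return remainder_mod4(digits) == 1

assert recognizes("13") and not recognizes("14")
assert all(recognizes(str(n)) == (n % 4 == 1) for n in range(1000))
```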

Building on characteristic functions of recognizable sets, the preceding results can be turned into examples of automatic functions. In particular, we obtain that functions which are ultimately periodic are automatic.

Cobham's theorem

Statement

Number theory can also enter a topic from two sides at a time, and here is the point of this blog post. Can one compare the sets of recognizable numbers from different bases? By the following theorem, they are genuinely distinct.

Theorem (Cobham, 1969). — Let $S$ be a finite set and let $f\colon\mathbf N\to S$ be a function which is automatic in two bases $a$ and $b$. If $a$ and $b$ are not powers of the same integer, then $f$ is ultimately periodic.
Since ultimately periodic functions are automatic in any base, this theorem is definitely optimal.

The original proof of Cobham's theorem is said to be difficult, and this theorem has received various other, simpler proofs, often partially flawed. I wish to describe here the recent proof by Thijmen Krebs (publisher version, arXiv:1801.06704), which calls itself “more reasonable”.

The assumption that $a$ and $b$ are not powers of the same integer will appear in the proof in a different, equivalent form: they have no common power.

Local periods

To prove that $f$ is ultimately periodic, Krebs proves that $f$ is periodic on large overlapping intervals.

Lemma 1.Let f:NSf: \mathbf N\to S be a function and let II and JJ be intervals of integers. We assume that ff has period pp on II and period qq on JJ. If Card(IJ)p+q\operatorname{Card}(I\cap J)\geq p+q, then ff has period pp on the interval IJI\cup J.

The proof is elementary. Consider $x\in I\cup J$ such that $x+p\in I\cup J$. If both $x$ and $x+p$ belong to $I$, then $f(x)=f(x+p)$. Let us assume otherwise. If $x$ belongs to $I$ but $x+p$ does not, then $x+p\in J$; since $I\cap J$ has at least $p+q$ elements, $x$ must belong to $J$ as well. The other cases are similar and show that both $x$ and $x+p$ belong to $J$. Using that $I\cap J$ is large, we pick elements $y$ and $y+p$ in $I\cap J$ such that $x\equiv y\pmod q$. Then $f(x)=f(y)=f(y+p)=f(x+p)$, the first and last equalities using the $q$-periodicity on $J$, and the middle one the $p$-periodicity on $I$.

A diophantine approximation lemma

Lemma 2.Let aa and bb be two real numbers such that a,b>1a,b>1. For every ε>0\varepsilon>0, there exist nonzero natural numbers mm and nn such that ambn<εbn\abs{a^m-b^n}<\varepsilon b^n.

The result is obvious if $a$ and $b$ have a common power, so we may assume that this is not the case. Then various proofs are possible. For example, one may consider the additive subgroup of $\mathbf R$ generated by $\log(a)$ and $\log(b)$; by assumption, it is dense, hence there exists a small linear combination $m\log(a)-n\log(b)$; taking exponentials, $a^m/b^n$ is close to $1$, as claimed.

Krebs gives a nice “pigeonhole-like” proof. For every integer $m\geq1$, consider the unique integer $n_m\geq1$ such that $1\leq a^m/b^{n_m}<b$. There must be two integers $m$ and $p$ such that $m<p$ and such that $a^m/b^{n_m}$ and $a^p/b^{n_p}$ differ by at most $\varepsilon$, and this implies the result.
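Here is a tiny brute-force search in Python illustrating the statement of lemma 2 (my own illustration of the statement, not of Krebs's pigeonhole argument), for $a=2$, $b=3$ and $\varepsilon=1/100$:

```python
a, b, eps = 2, 3, 0.01
m = 0
while True:
    m += 1
    # the unique n with 1 <= a^m / b^n < b
    n = 0
    while b ** (n + 1) <= a ** m:
        n += 1
    if n >= 1 and abs(a ** m - b ** n) < eps * b ** n:
        print(f"|2^{m} - 3^{n}| < {eps} * 3^{n}")
        break
```

(The search stops at $m=84$, $n=53$: the powers $2^{84}$ and $3^{53}$ agree to better than one percent.)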

Some functional equations

Let $f\colon\mathbf N\to S$ be a function which is automatic in some base $c$, computed by an automaton with set of states $S$. For every $s\in S$, let $L(s)$ be the set of integers $n$ such that $f(n)=s$. Note that these sets form a partition of $\mathbf N$.

The following lemma is a variant of the pumping lemma in the theory of automata; it is by it that automatic functions acquire their specific properties.

Lemma 3.For every integers x,yL(s)x,y\in L(s) and any integer zz such that 0z<cn0\leq z < c^n, one has f(xcn+z)=f(ycn+z)f(x c ^ n+z)=f(yc ^n +z).

Indeed, when the automaton reads the integer $xc^n+z$, it first reads $x$ and reaches the state $s$; similarly, when it reads $yc^n+z$, it first reads $y$ and reaches the state $s$ too. In both cases, it then reads the $n$ digits of $z$ (padded with leading zeroes if necessary) and ends up in the same final state.

Construction of local periods

We now consider a function $f$ which is automatic in two bases $a$ and $b$ which have no common power. We thus have two finite automata $A$ and $B$, with sets of states $S$ and $T$. For each state $s\in S$, we have a set $L(s)$ of integers as above; these sets form a partition of $\mathbf N$. Let $S^\infty$ be the set of states $s\in S$ such that $L(s)$ is infinite. For $t\in T$, we define $M(t)$ similarly, as well as $T^\infty$.

Let $s\in S^\infty$. Since the $M(t)$, for $t\in T$, form a finite partition of $\mathbf N$ and $L(s)$ is infinite, there exist a state $t(s)$ and two distinct integers $x(s)$ and $y(s)$ in $L(s)\cap M(t(s))$. Let $K$ be an integer strictly greater than all the $x(s)$ and $y(s)$, for $s\in S^\infty$.

Applying lemma 2 with $\varepsilon=1/6K$, we obtain integers $m,n$ such that $\abs{a^m-b^n}<a^m/6K$. For $s\in S^\infty$, we set $p(s)=\pm(x(s)-y(s))(a^m-b^n)$; since $a$ and $b$ have no common power, this is a nonzero integer, and we choose the sign so that it is strictly positive.

For each integer $x$, we define an interval $I_x$ of integers by $I_x=[(x+\frac13)a^m,(x+\frac53)a^m]\cap\mathbf N$.

Lemma 4.For every state sSs\in S_\infty and any integer xL(s)x\in L(s), the function ff has period p(s)p(s) on IxI_x.

To that aim, we consider $z\in[\frac13a^m,\frac53a^m]$ and prove that $f(xa^m+z)=f(xa^m+z+p(s))$. Up to exchanging $x(s)$ and $y(s)$, we may assume that $p(s)=(x(s)-y(s))(a^m-b^n)$. First of all, we have the inequality $$\abs{z+x(s)(a^m-b^n)-b^n}\leq\abs{z-a^m}+(x(s)+1)\abs{a^m-b^n}\leq\frac23a^m+K\abs{a^m-b^n}\leq\frac56a^m\leq b^n$$ because $\abs{a^m-b^n}\leq\frac16a^m$. Consequently, $z+x(s)(a^m-b^n)$ is a natural integer smaller than $2b^n$. Applying lemma 3 (in base $a$), we have $f(xa^m+z)=f(x(s)a^m+z)$, because $x$ and $x(s)$ belong to $L(s)$. Writing $x(s)a^m+z=x(s)b^n+(z+x(s)(a^m-b^n))$ and applying lemma 3 again (in base $b$, since $x(s)$ and $y(s)$ belong to $M(t(s))$), this is equal to $f(y(s)b^n+z+x(s)(a^m-b^n))=f(y(s)a^m+z+(x(s)-y(s))(a^m-b^n))=f(y(s)a^m+z+p(s))$. Applying lemma 3 a third time (in base $a$ again, since $y(s)$ and $x$ belong to $L(s)$), this is equal to $f(xa^m+z+p(s))$. This concludes the proof.

Conclusion

Since the sets $L(s)$, for $s\notin S^\infty$, are finite, the sets $L(s)$, for $s\in S^\infty$, cover all integers larger than some integer, say $x$.

For $y\geq x$, there exists $s\in S^\infty$ such that $y\in L(s)$, and we have shown that $f$ is periodic with period $p(s)\leq\frac16a^m$ on the interval $I_y=[(y+\frac13)a^m,(y+\frac53)a^m]$. By abuse of notation, write $p(y)=p(s)$.

Let $J_z$ be the union of the intervals $I_y$, for $x\leq y\leq z$. One has $J_x=I_x$; for $z>x$, the intersection $I_z\cap J_{z-1}$ is $[(z+\frac13)a^m,(z+\frac23)a^m]\cap\mathbf N$, hence has at least $\frac13a^m$ points, thus at least $p(x)+p(z)$ of them. Using lemma 1 and induction on $z$, $f$ has period $p(x)$ on $J_z$, for all $z$.

This concludes the proof that $f$ is ultimately periodic.

Sunday, April 14, 2024

Evaluating the operator norms of matrices

Let $E$ and $F$ be normed vector spaces, over the real or complex numbers, and let $u\colon E\to F$ be a linear map. The continuity of $u$ is equivalent to the existence of a real number $c$ such that $\|u(x)\|\leq c\|x\|$ for every $x\in E$, and the least such real number is called the operator norm of $u$; we denote it by $\|u\|$. It defines a norm on the linear space $\mathscr L(E;F)$ of continuous linear maps from $E$ to $F$ and as such is quite important. When $E=F$, it is also related to the spectrum of $u$, and is implicitly at the heart of the Gershgorin criterion for the localization of eigenvalues.

\gdef\R{\mathbf R}\gdef\norm#1{\lVert#1\rVert}\gdef\Abs#1{\left|#1\right|}\gdef\abs#1{\lvert#1\rvert}

However, even in the simplest cases of matrices, its explicit computation is not trivial at all and, as we'll see, even less trivial than what is told in algebra classes, as I learned by browsing Wikipedia when I wanted to prepare a class on the topic.

Since I'm more of an abstract guy, I will use both languages, that of normed spaces and that of matrices, for the first one allows one to explain a few things at a more fundamental level. I'll make the translation, though. Also, to be specific, I'll work with real vector spaces.

So $E=\R^m$, $F=\R^n$, and linear maps in $\mathscr L(E;F)$ are represented by $n\times m$ matrices. There are plenty of interesting norms on $E$, but I will concentrate the discussion on the $\ell^p$-norm given by $\norm{(x_1,\dots,x_m)}=(|x_1|^p+\dots+|x_m|^p)^{1/p}$. Similarly, I will consider the $\ell^q$-norm on $F$ given by $\norm{(y_1,\dots,y_n)}=(|y_1|^q+\dots+|y_n|^q)^{1/q}$. Here, $p$ and $q$ are elements of $[1;+\infty\mathclose[$. It is also interesting to allow $p=\infty$ or $q=\infty$; in this case, the expression defining the norm is just replaced by $\sup(|x_1|,\dots,|x_m|)$ and $\sup(|y_1|,\dots,|y_n|)$ respectively.

Duality

Whatever norm is given on $E$, the dual space $E^*=\mathscr L(E;\mathbf R)$ is endowed with the dual norm, which is just the operator norm of that space: for $\phi\in E^*$, $\norm\phi$ is the least real number such that $|\phi(x)|\leq\norm\phi\norm x$ for all $x\in E$. And similarly for $F$. To emphasize duality, we will write $\langle x,\phi\rangle$ instead of $\phi(x)$.

Example. — The dual norm of the $\ell^p$-norm can be computed explicitly, thanks to the Hölder inequality $$\Abs{x_1y_1+\dots+x_ny_n}\leq(|x_1|^p+\dots+|x_n|^p)^{1/p}(|y_1|^q+\dots+|y_n|^q)^{1/q}$$ if $p,q$ are related by the relation $\frac1p+\frac1q=1$. (When $p=1$, this gives $q=\infty$, and symmetrically $p=\infty$ if $q=1$.) Moreover, this inequality is optimal, in the sense that for any $(x_1,\dots,x_n)$, one may find a nonzero $(y_1,\dots,y_n)$ for which the inequality is an equality. What this inequality says about norms and dual norms is that if one identifies $\R^n$ with its dual, via the duality bracket $\langle x,y\rangle=x_1y_1+\dots+x_ny_n$, the dual of the $\ell^p$-norm is the $\ell^q$-norm, for that relation $1/p+1/q=1$.

If $u\colon E\to F$ is a continuous linear map, it has an adjoint (or transpose) $u^*\colon F^*\to E^*$, which is defined by $u^*(\phi)=\phi\circ u$ for $\phi\in F^*$. In terms of the duality bracket, this rewrites as $$\langle\phi,u(x)\rangle=\langle u^*(\phi),x\rangle$$ for $x\in E$ and $\phi\in F^*$.

Proposition. — One has $\norm{u^*}=\norm u$.

For $\phi\in F^*$, $\norm{u^*(\phi)}$ is the least real number such that $|u^*(\phi)(x)|\leq\norm{u^*(\phi)}\norm x$ for all $x\in E$. Since one has $$|u^*(\phi)(x)|=|\langle u^*(\phi),x\rangle|=|\langle\phi,u(x)\rangle|\leq\norm\phi\norm{u(x)}\leq\norm\phi\norm u\norm x,$$ we see that $\norm{u^*(\phi)}\leq\norm\phi\norm u$ for all $\phi$. As a consequence, $\norm{u^*}\leq\norm u$.
To get the other inequality, we wish to find a nonzero $\phi$ such that $\norm{u^*(\phi)}=\norm u\norm\phi$. This $\phi$ should thus be such that there exists $x$ with $|\langle u^*(\phi),x\rangle|=\norm u\norm\phi\norm x$, which, by the preceding computation, means that $|\langle\phi,u(x)\rangle|=\norm\phi\norm u\norm x$. Such $\phi$ and $x$ need not exist in general, but we can find reasonable approximations. Start with a nonzero $x\in E$ such that $\norm{u(x)}$ is close to $\norm u\norm x$; then, using the Hahn-Banach theorem, find a nonzero $\phi\in F^*$ such that $\norm\phi=1$ and $|\phi(u(x))|=\norm{u(x)}$. We see that $|\langle\phi,u(x)\rangle|$ is close to $\norm u\norm\phi\norm x$, and this concludes the proof.
In some cases, in particular in the finite dimensional case, we can use biduality to get the other inequality. Indeed, $E^{**}$ identifies with $E$, with its initial norm, and $u^{**}$ identifies with $u$. By the first case, we thus have $\norm{u^{**}}\leq\norm{u^*}$, hence $\norm u\leq\norm{u^*}$.

The case $p=1$

We compute $\norm u$ when $E=\mathbf R^m$ is endowed with the $\ell^1$-norm, and $F$ is arbitrary. The linear map $u\colon E\to F$ then corresponds to $m$ vectors $u_1,\dots,u_m$ of $F$, and one has $$u((x_1,\dots,x_m))=x_1u_1+\dots+x_mu_m.$$ By the triangle inequality, we have $$\norm{u((x_1,\dots,x_m))}\leq|x_1|\norm{u_1}+\dots+\abs{x_m}\norm{u_m},$$ hence $$\norm{u((x_1,\dots,x_m))}\leq(\abs{x_1}+\dots+\abs{x_m})\sup(\norm{u_1},\dots,\norm{u_m}).$$ Consequently, $$\norm u\leq\sup(\norm{u_1},\dots,\norm{u_m}).$$ On the other hand, taking $x=(x_1,\dots,x_m)$ of the form $(0,\dots,1,0,\dots)$, where the $1$ is at the place $k$ such that $\norm{u_k}$ is largest, we have $\norm x=1$ and $\norm{u(x)}=\norm{u_k}$. The preceding inequality is thus an equality.

In the matrix case, this shows that the $(\ell^1,\ell^q)$-norm of an $n\times m$ matrix $A$ is the supremum of the $\ell^q$-norms of the columns of $A$.
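As a sanity check, here is a small numerical experiment in Python (my own illustration, with NumPy assumed; the matrix and the exponent $q$ are arbitrary choices): the column formula is attained at a basis vector, and no random vector beats it.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
q = 3.0

# supremum of the l^q-norms of the columns of A
column_formula = max(np.linalg.norm(A[:, j], ord=q) for j in range(A.shape[1]))

# the ratio ||Ax||_q / ||x||_1 never exceeds the column formula
for _ in range(10_000):
    x = rng.standard_normal(4)
    ratio = np.linalg.norm(A @ x, ord=q) / np.linalg.norm(x, ord=1)
    assert ratio <= column_formula + 1e-12

print("column formula:", column_formula)
```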

The case $q=\infty$

We compute $\norm u$ when $F=\mathbf R^n$ is endowed with the $\ell^\infty$-norm, and $E$ is arbitrary. A direct computation is possible in the matrix case, but it is not really illuminating, and I find it better to argue geometrically, using a duality argument.

Namely, we can use $u^*\colon F^*\to E^*$ to compute $\norm u$, since $\norm u=\norm{u^*}$. We have seen above that $F^*$ is $\mathbf R^n$, endowed with the $\ell^1$-norm, so that we have computed $\norm{u^*}$ in the preceding section. The basis $(e_1,\dots,e_n)$ of $F$ gives a dual basis $(\phi_1,\dots,\phi_n)$ of $F^*$, and one has $$\norm u=\norm{u^*}=\sup(\norm{u^*(\phi_1)},\dots,\norm{u^*(\phi_n)}).$$

In the matrix case, this shows that the $(\ell^p,\ell^\infty)$-norm of an $n\times m$ matrix $A$ is the supremum of the $\ell^{p'}$-norms of the rows of $A$, where $p'$ is the exponent such that $1/p+1/p'=1$: indeed, $u^*(\phi_i)$ is the $i$th row of $A$, viewed as a linear form on $E$, and the dual of the $\ell^p$-norm is the $\ell^{p'}$-norm.

Relation with the Gershgorin circle theorem

I mentioned the Gershgorin circle theorem as being in the same spirit as the computation of an operator norm, because its proof relies on the same kind of estimations. In fact, no additional computation is necessary!

Theorem (Gershgorin “circles theorem”). — Let $A=(a_{ij})$ be an $n\times n$ matrix and let $\lambda$ be an eigenvalue of $A$. There exists an integer $i$ such that $$\abs{\lambda-a_{ii}}\leq\sum_{j\neq i}\abs{a_{ij}}.$$

For the proof, one writes $A=D+N$, where $D$ is diagonal and $N$ has zeroes on its diagonal, and, choosing an eigenvector $x$ for $\lambda$, one writes $\lambda x=Ax=Dx+Nx$, hence $(\lambda I-D)x=Nx$. Endow $\R^n$ with the $\ell^\infty$-norm; we can assume that $\norm x=1$. Then the norm of the right hand side is bounded above by $\norm N$, while the norm of the left hand side is $\sup(\abs{\lambda-a_{ii}}|x_i|)\geq|\lambda-a_{ii}|$ if $i$ is chosen so that $|x_i|=\norm x=1$. Given the above formula for $\norm N$ (the $(\ell^\infty,\ell^\infty)$-norm is the largest $\ell^1$-norm of a row), this implies the theorem.

The case $p=q=2$

Since Euclidean spaces are very useful in applications, this is a very important case to consider, and we will see that the answer is not at all straightforward in terms of the coefficients of the matrix.

We have to bound $\norm{u(x)}$ from above. Using the scalar product, we write $$\norm{u(x)}^2=\langle u(x),u(x)\rangle=\langle u^*u(x),x\rangle,$$ where $u^*\colon F\to E$ now denotes the adjoint of $u$, which identifies with the transpose of $u$ if one identifies $E$ with $E^*$ and $F$ with $F^*$ by means of their scalar products. Using the Cauchy-Schwarz inequality, we get that $\norm{u(x)}^2\leq\norm{u^*u(x)}\norm x\leq\norm{u^*u}\norm x^2$, hence $\norm u\leq\norm{u^*u}^{1/2}$. This inequality is remarkable because, on the other hand, we have $\norm{u^*u}\leq\norm{u^*}\norm u=\norm u^2$. Consequently, $\norm u=\norm{u^*u}^{1/2}$.

This formula might not appear to be so useful, since it reduces the computation of the operator norm of $u$ to that of $u^*u$. However, the linear map $u^*u$ is a positive self-adjoint endomorphism of $E$, hence (assuming that $E$ is finite dimensional here) it can be diagonalized in an orthonormal basis. We then see that $\norm{u^*u}$ is the largest eigenvalue of $u^*u$, which is also its spectral radius. The square roots of the eigenvalues of $u^*u$ are also called the singular values of $u$, hence $\norm u$ is the largest singular value of $u$.

One can play with duality as well, and we also have $\norm u=\norm{uu^*}^{1/2}$.
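For matrices, this is easy to check numerically in Python (my own illustration; NumPy's np.linalg.norm with ord=2 computes precisely the largest singular value):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))

operator_norm = np.linalg.norm(A, ord=2)  # largest singular value
# square root of the largest eigenvalue of the symmetric matrix A^T A
spectral = np.sqrt(max(np.linalg.eigvalsh(A.T @ A)))

assert np.isclose(operator_norm, spectral)
print(operator_norm)
```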

Other cases?

There are general inequalities relating the various $\ell^p$-norms of a vector $x\in\R^m$, and these can be used to deduce inequalities for $\norm u$, when $E=\R^m$ has an $\ell^p$-norm and $F=\R^n$ has an $\ell^q$-norm. However, given the explicit value of $\norm u$ for $(p,q)=(2,2)$ and the fact that no closed form expression exists for the spectral radius, it is unlikely that there is a closed form expression in the remaining cases.

Worse: the exact computation of $\norm u$ in the cases $(\infty,1)$, $(\infty,2)$ or $(2,1)$ is known to be NP-hard, and I try to explain this result below, following J. Rohn (2000), “Computing the norm $\|A\|_{\infty,1}$ is NP-hard”, Linear and Multilinear Algebra 47 (3), p. 195–204. I concentrate on the $(\infty,1)$ case; the $(\infty,2)$ case is supposed to be analogous (see Joel Tropp's thesis, top of page 48, quoted by Wikipedia, but no arguments are given there), and the case $(2,1)$ would follow by symmetry.

A matrix from a graph

Let us consider a finite (undirected, simple, without loops) graph $G$ on the set $V=\{1,\dots,n\}$ of $n$ vertices, with set of edges $E$, and let us introduce the following $n\times n$ matrix $A=(a_{ij})$, a variant of the adjacency matrix of the graph $G$ (actually $nI_n-A_G$, where $I_n$ is the identity matrix and $A_G$ is the adjacency matrix of $G$):

  • One has $a_{ii}=n$ for all $i$;
  • If $i\neq j$ and the vertices $i$ and $j$ are connected by an edge, then $a_{ij}=-1$;
  • Otherwise, $a_{ij}=0$.
For any subset $S$ of $V$, the cut $c(S)$ of $S$ is the number of edges which have one endpoint in $S$ and the other outside of $S$.

Proposition. — The $(\ell^\infty,\ell^1)$-norm of $A$ is given by $$\norm A=4\sup_{S\subseteq V}c(S)-2\operatorname{Card}(E)+n^2.$$

The proof starts with the following observation, valid for more general matrices.

Lemma. — The $(\ell^\infty,\ell^1)$-norm of a symmetric positive $n\times n$ matrix $A$ is given by $\norm A=\sup_z\langle z,Az\rangle$, where $z$ runs among the set $Z$ of vectors in $\R^n$ with coordinates $\pm1$.

The vectors of $Z$ are the vertices of the polytope $[-1;1]^n$, which is the unit ball of $\R^n$ for the $\ell^\infty$-norm. Consequently, every vector of $[-1;1]^n$ is a convex combination of vectors of $Z$. Writing $x=\sum_{z\in Z}c_zz$, we have $$\norm{Ax}=\norm{\sum c_zAz}\leq\sum c_z\norm{Az}\leq\sup_{z\in Z}\norm{Az}.$$ The other inequality being obvious, we already see that $\norm A=\sup_{z\in Z}\norm{Az}$. Note that this formula holds for any norm on the codomain.
If, for $z\in Z$, one writes $Az=(y_1,\dots,y_n)$, one has $\norm{Az}=|y_1|+\dots+|y_n|$, because the codomain is endowed with the $\ell^1$-norm, so that $\langle z,Az\rangle=\sum z_iy_i\leq\norm{Az}$. We thus get the inequality $\sup_{z\in Z}\langle z,Az\rangle\leq\norm A$.
Let us now use the fact that $A$ is symmetric and positive. Fix $z\in Z$, set $Az=(y_1,\dots,y_n)$ as above, and define $x\in Z$ by $x_i=1$ if $y_i\geq0$ and $x_i=-1$ otherwise. One thus has $\langle x,Az\rangle=\sum|y_i|=\norm{Az}$. Since $A$ is symmetric and positive, one has $\langle x-z,A(x-z)\rangle\geq0$, and this implies $$2\norm{Az}=2\langle x,Az\rangle\leq\langle x,Ax\rangle+\langle z,Az\rangle,$$ so that $\norm{Az}\leq\sup_{x\in Z}\langle x,Ax\rangle$. This concludes the proof.

To prove the proposition, we apply the preceding lemma. We observe that $A$ is symmetric, by construction. It is also positive, since for every $x\in\R^n$, one has $$\langle x,Ax\rangle=\sum a_{ij}x_ix_j\geq n\sum x_i^2-\sum_{i\neq j}x_ix_j=(n+1)\sum x_i^2-\Big(\sum x_i\Big)^2\geq\sum x_i^2,$$ using the Cauchy-Schwarz inequality $(\sum x_i)^2\leq n\sum x_i^2$. By the preceding lemma, we thus have $$\norm A=\sup_{z\in\{\pm1\}^n}\langle z,Az\rangle.$$ The $2^n$ vectors $z\in Z$ are in bijection with the $2^n$ subsets of $V=\{1,\dots,n\}$, by associating with $z\in Z$ the subset $S$ of $V$ consisting of the vertices $i$ such that $z_i=1$. Then, one can compute $$\langle z,Az\rangle=\sum_{i,j}a_{ij}z_iz_j=4c(S)-2\operatorname{Card}(E)+n^2.$$ It follows that $\norm A$ is equal to the indicated expression.
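The proposition is easy to test by brute force on small graphs. Here is a sketch in Python (my own illustration, with NumPy assumed; for $n=6$, the $2^6$ sign vectors and all subsets are enumerated exhaustively):

```python
import numpy as np
from itertools import combinations, product

rng = np.random.default_rng(2)
n = 6
edges = [e for e in combinations(range(n), 2) if rng.random() < 0.5]

# the matrix A = n*I - (adjacency matrix)
A = n * np.eye(n, dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = -1

# (l^infty, l^1)-norm of A, via the lemma: max of <z, Az> over z in {±1}^n
norm_A = max(z @ A @ z for z in
             (np.array(v) for v in product([-1, 1], repeat=n)))

# max-cut of the graph, by enumerating all subsets S of the vertices
maxcut = max(sum((i in S) != (j in S) for i, j in edges)
             for k in range(n + 1)
             for S in map(set, combinations(range(n), k)))

assert norm_A == 4 * maxcut - 2 * len(edges) + n ** 2
print(norm_A)
```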

The last step of the proof is an application of the “simple max-cut” NP-hardness theorem of Garey, Johnson and Stockmeyer (1976), itself a strengthening of Karp's (1972) seminal result that “max-cut” is NP-complete. I won't explain the proofs of these results here, but let me explain what they mean and how they relate to the present discussion. First of all, computer scientists categorize problems according to the time that is required to solve them, in terms of the size of the entries. This notion depends on the actual computer that is used, but the theory of Turing machines allows one to single out two classes, P and EXP, consisting of problems which can be solved in polynomial, respectively exponential, time in terms of the size of the entries. A second notion, introduced by Karp, is that of NP problems, problems which can be solved in polynomial time by a “nondeterministic Turing machine” — “nondeterministic” means the computer can parallelize itself at will when it needs to consider various possibilities. This class is contained in EXP (because one can simulate in exponential time a polynomial time nondeterministic algorithm) and also corresponds to the class of problems whose solutions can be checked in polynomial time.

Our problem is to find a subset $S$ of $\{1,\dots,n\}$ that maximizes $c(S)$. This is a restriction of the “general max-cut” problem where, given an integer valued function $w$ on the set of edges, one wishes to find a subset that maximizes $c(S;w)$, the sum of the weights of the edges which have one endpoint in $S$ and the other outside of $S$. Karp (1972) observed that the existence of $S$ such that $c(S;w)\geq m$ is an NP problem (if one is provided with $S$, it takes polynomial time to compute $c(S;w)$ and to decide whether it is at least $m$), and the naïve search algorithm is in EXP, since there are $2^n$ such subsets. Moreover, Karp proved that any NP problem can be reduced to it in polynomial time; this is what is meant by the assertion that it is NP-complete. Consequently, determining $\sup_S c(S;w)$ is NP-hard: if you can solve that problem, then you can solve the “max-cut” problem in polynomial time, hence any other NP problem. A subsequent theorem by Garey, Johnson and Stockmeyer (1976) established that restricting the max-cut problem to $\pm1$ weights is still NP-hard, and this completes the proof of Rohn's theorem.

(An aside, to insist that signs matter: by a theorem of Edmonds and Karp (1972), one can solve the “min-cut” problem in polynomial time; it consists in deciding, for some given integer mm, whether there exists SS such that c(S;w)mc(S;w)\leq m.)

Saturday, April 13, 2024

The topology on the ring of polynomials and the continuity of the evaluation map

Polynomials are an algebraic gadget, and one is rarely led to think about the topology a ring of polynomials should carry. That happened to me, though, more or less by accident, when María Inés de Frutos Fernández and I worked on implementing in Lean the evaluation of power series. So let's start with them. To simplify the discussion, I only consider the case of one indeterminate. When there are finitely many of them, the situation is the same; in the case of infinitely many indeterminates, there might be some additional subtleties, but I have not thought about it.

\gdef\lbra{[\![}\gdef\rbra{]\!]} \gdef\lpar{(\!(}\gdef\rpar{)\!)} \gdef\bN{\mathbf N} \gdef\coeff{\operatorname{coeff}} \gdef\eval{\operatorname{eval}} \gdef\colim{\operatorname{colim}}

Power series

A power series over a ring RR is just an expression anTn\sum a_nT^n, where (a0,a1,)(a_0,a_1, \dots) is a family of elements of RR indexed by the natural integers. After all, this is just what is meant by “formal series”: coefficients and nothing else.

Defining a topology on the ring R[[T]]R\lbra T\rbra should allow us to say what it means for a sequence (fm)(f_m) of power series to converge to a power series ff, and the most natural thing to require is that for every nn, the coefficient am,na_{m,n} of TnT^n in fmf_m converges to the corresponding coefficient ana_n of TnT^n in ff. In other words, we endow R[[T]]R\lbra T\rbra with the product topology when it is identified with the product set RNR^{\bN}. The explicit definition may look complicated, but the important point for us is the following characterization of this topology: Let XX be a topological space and let f ⁣:XR[[T]]f\colon X \to R\lbra T\rbra be a map; for ff to be continuous, it is necessary and sufficient that all maps fn ⁣:XRf_n\colon X \to R are continuous, where, for any xXx\in X, fn(x)f_n(x) is the nnth coefficient of f(x)f(x). In particular, the coefficient maps R[[T]]RR\lbra T\rbra\to R are continuous.

What can we do with that topology, then? The first thing, maybe, is to observe that it is compatible with the ring structure on R[[T]]R\lbra T\rbra.

Proposition.If addition and multiplication on RR are continuous, then addition and multiplication on R[ ⁣[T] ⁣]R\lbra T\rbra are continuous.

Let's start with addition. We need to prove that s ⁣:R[[T]]×R[[T]]R[[T]]s\colon R\lbra T\rbra \times R\lbra T\rbra\to R\lbra T\rbra is continuous. By the characterization, it is enough to prove that all coordinate functions sn ⁣:R[[T]]×R[[T]]Rs_n\colon R\lbra T\rbra \times R\lbra T\rbra\to R, (f,g)coeffn(f+g) (f,g)\mapsto \coeff_n(f+g) , are continuous. But these functions factor through the nnth coefficient maps: coeffn(f+g)=coeffn(f)+coeffn(g)\coeff_n(f+g) = \coeff_n(f)+\coeff_n(g), which is continuous, since addition, coefficients and projections are continuous. This is similar, but slightly more complicated, for multiplication: if the multiplication map is denoted by mm, we have to prove that the maps mnm_n defined by mn(f,g)=coeffn(fg)m_n(f,g)=\coeff_n(f\cdot g) are continuous. However, they can be written as  mn(f,g)=coeffn(fg)=p=0ncoeffp(f)coeffnp(g). m_n(f,g)=\coeff_n(f\cdot g) = \sum_{p=0}^n \coeff_p(f)\coeff_{n-p}(g). Since the projections and the coefficient maps are continuous, it is sufficient to prove that the maps from Rn+1×Rn+1R^{n+1} \times R^{n+1} to RR given by ((a0,,an),(b0,,bn))p=0napbnp((a_0,\dots,a_n),(b_0,\dots,b_n))\mapsto \sum_{p=0}^n a_p b_{n-p} are continuous, and this follows from the continuity of addition and multiplication on RR, each such map being a polynomial expression in the coordinates.
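In computational terms, the point is that each coefficient of a product consults only finitely many coefficients of the two factors. Here is a toy sketch in Python, with a power series modelled as a function sending an index to the corresponding coefficient; it is an illustration of the formula above, not an implementation of R[[T]]R\lbra T\rbra.

```python
def mul_coeff(n, f, g):
    """n-th coefficient of the product of two power series, each given as
    a function index -> coefficient. Only coefficients of index <= n are
    consulted: this finiteness is what makes each m_n continuous for the
    product topology."""
    return sum(f(p) * g(n - p) for p in range(n + 1))

f = lambda p: 1                            # f = 1 + T + T^2 + ...
g = lambda p: (1, -1)[p] if p < 2 else 0   # g = 1 - T
print([mul_coeff(n, f, g) for n in range(5)])  # [1, 0, 0, 0, 0]: f * g = 1
```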

Polynomials

At this point, let's go back to our initial question of endowing polynomials with a natural topology.

An obvious candidate is the induced topology. This looks correct; in any case, it is such that addition and multiplication on R[T]R[T] are continuous. However, it lacks an interesting property with respect to evaluation.

Recall that for every aRa\in R, there is an evaluation map evala ⁣:R[T]R\eval_a\colon R[T]\to R, defined by ff(a)f\mapsto f(a), and even, if one wishes, the two-variable evaluation map R[T]×RRR[T]\times R\to R.
The first claim is that, for this induced topology, the evaluation map is not continuous.

An example will serve as proof. I take RR to be the real numbers, fn=Tnf_n=T^n and a=1a=1. Then fnf_n converges to zero, because for each integer mm, the real numbers coeffm(fn)\coeff_m(f_n) are zero for n>mn>m. On the other hand, fn(a)=fn(1)=1f_n(a)=f_n(1)=1 for all nn, and this does not converge to zero!
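For the sceptics, here is the same computation carried out in a few lines of Python, the coefficients of the monomials being modelled by an indicator function.

```python
coeff = lambda m, n: 1 if m == n else 0  # m-th coefficient of f_n = T^n

# Each fixed coefficient is eventually zero: coefficientwise, f_n -> 0.
print([coeff(2, n) for n in range(6)])   # [0, 0, 1, 0, 0, 0]

# Yet the values at a = 1 are constantly 1, and do not converge to 0.
print([sum(coeff(m, n) * 1 ** m for m in range(n + 1)) for n in range(6)])
```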

So we have to change the topology on polynomials if we want this map to be continuous, and we now give the correct definition. The ring of polynomials is the increasing union of subsets R[T]nR[T]_n, indexed by integers nn, consisting of all polynomials of degree at most nn. Each of these subsets is given the product topology, as above, but we endow their union with the “inductive limit” topology. Explicitly, if YY is a topological space and u ⁣:R[T]Yu\colon R[T]\to Y is a map, then uu is continuous if and only if, for each integer nn, its restriction to R[T]nR[T]_n is continuous.

The inclusion map R[T]R[ ⁣[T] ⁣]R[T]\to R\lbra T\rbra is continuous, hence the topology on polynomials is finer than the topology induced by the topology on power series. As the following property indicates, it is usually strictly finer.

We can also observe that addition and multiplication on R[T]R[T] are still continuous. The same proof as above works, once we observe that the coefficient maps are continuous. (On the other hand, one may be tempted to compare the product topology of the inductive topologies with the inductive topology of the product topologies, a comparison which is not obvious in the direction that we need.)

Proposition.Assume that addition and multiplication on RR are continuous. Then the evaluation maps evala ⁣:R[T]R\eval_a \colon R[T]\to R are continuous.

We have to prove that for every integer nn, the evaluation map evala\eval_a induces a continuous map from R[T]nR[T]_n to RR. Now, this map factors as the natural identification of R[T]nR[T]_n with Rn+1R^{n+1} composed with the polynomial map (c0,,cn)c0+c1a++cnan(c_0,\dots,c_n)\mapsto c_0+c_1a+\dots+c_n a^n. It is therefore continuous.

Laurent series

We can upgrade the preceding discussion and define a natural topology on the ring R((T))R\lpar T\rpar of Laurent series, which are the series with only finitely many terms of negative exponent. For this, for every integer dd, we set R((T))dR\lpar T\rpar_d to be the set of series of the form f=n=dcnTn f=\sum_{n=-d}^\infty c_n T^n, we endow that set with the product topology, and take the corresponding inductive limit topology. We leave it to the reader to check that this defines a ring topology, and that the naïve product topology on R((T))R\lpar T\rpar would not be one in general.

Back to the continuity of evaluation

The continuity of the evaluation maps ff(a)f\mapsto f(a) was an important guide to the topology of the ring of polynomials. This suggests a more general question, for which I don't have a full answer: is the two-variable evaluation map, (f,a)f(a)(f,a)\mapsto f(a), continuous? On each subspace R[T]d×RR[T]_d\times R, the evaluation map is given by a polynomial map ((c0,,cd,a)c0+c1a++cdad(c_0,\dots,c_d,a)\mapsto c_0 +c_1a+\dots+c_d a^d), hence is continuous, but that does not imply the desired continuity, because that only tells us about R[T]×RR[T]\times R with the topology colimd(R[T]d×R)\colim_d (R[T]_d\times R), while we are interested in the topology (colimdR[T]d)×R(\colim_d R[T]_d)\times R. To compare these topologies, note that the natural bijection colimd(R[T]d×R)(colimdR[T]d)×R\colim_d (R[T]_d\times R) \to (\colim_d R[T]_d)\times R is continuous (because it is continuous at each level dd), but the continuity of its inverse is not so clear.

I find it amusing, then, to observe that sequential continuity holds in the important case where RR is a field. This relies on the following proposition.

Proposition.Assume that RR is a field. Then, for every converging sequence (fn)(f_n) in R[T]R[T], the degrees deg(fn)\deg(f_n) are bounded.

Otherwise, subtracting the limit and passing to a subsequence, we can assume that (fn)(f_n) converges to 00 and that deg(fn+1)>deg(fn)\deg(f_{n+1})>\deg(f_n) for all nn. We construct a continuous linear form ϕ\phi on R[T]R[T] such that ϕ(fn)\phi(f_n) does not converge to 00. This linear form is given by a formal power series ϕ(f)=adcd\phi(f)=\sum a_d c_d for f=cdTdf=\sum c_dT^d, and we choose the coefficients (an)(a_n) by induction so that ϕ(fn)=1\phi(f_n)=1 for all nn. Indeed, if the coefficients are chosen up to deg(fn)\deg(f_n), then we fix ad=0a_d=0 for deg(fn)<d<deg(fn+1)\deg(f_n)<d<\deg(f_{n+1}) and choose adeg(fn+1)a_{\deg(f_{n+1})} so that ϕ(fn+1)=1\phi(f_{n+1})=1; this is possible because the leading coefficient of fn+1f_{n+1} is invertible, RR being a field. This linear form is continuous because its restriction to any R[T]dR[T]_d is given by a polynomial, hence is continuous. Since fn0f_n\to 0 while ϕ(fn)=1\phi(f_n)=1 for all nn, this contradicts the continuity of ϕ\phi.
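The induction is concrete enough to be carried out by a machine. Here is a sketch over the rationals, with a hypothetical sequence of polynomials of strictly increasing degrees, represented by their lists of coefficients.

```python
from fractions import Fraction

def build_linear_form(polys):
    """Choose coefficients (a_d) so that phi(f) = sum a_d c_d equals 1 on
    each given polynomial, as in the proof above. The polynomials must have
    strictly increasing degrees; we divide by the leading coefficient,
    which is where the field hypothesis enters."""
    a = []
    for f in polys:
        deg = len(f) - 1
        a.extend([Fraction(0)] * (deg + 1 - len(a)))  # a_d = 0 in between
        partial = sum(a[d] * f[d] for d in range(deg))
        a[deg] = (1 - partial) / Fraction(f[deg])     # force phi(f) = 1
    return a

polys = [[0, 2], [0, 1, 0, 1], [0, 0, 0, 0, 0, 3]]   # 2T, T^3 + T, 3T^5
a = build_linear_form(polys)
for f in polys:
    assert sum(a[d] * f[d] for d in range(len(f))) == 1  # phi = 1 on each
```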

Corollary. — If RR is a topological ring which is a field, then the evaluation map R[T]×RRR[T]\times R\to R is sequentially continuous.

Consider sequences (fn)(f_n) in R[T]R[T] and (an)(a_n) in RR that converge to ff and aa respectively. By the proposition, there is an integer dd such that deg(fn)d\deg(f_n)\leq d for all nn, and deg(f)d\deg(f)\leq d. Since evaluation is continuous on R[T]d×RR[T]_d\times R, one has fn(an)f(a)f_n(a_n)\to f(a), as claimed.

Remark. — The previous proposition does not hold over arbitrary rings. In fact, if R=ZpR=\mathbf Z_p is the ring of pp-adic integers, then ϕ(pnTn)=pnϕ(Tn)\phi(p^nT^n)=p^n \phi(T^n) converges to 00 for every continuous linear form ϕ\phi on R[T]R[T], because its pp-adic valuation is at least nn. More is true: in that case, evaluation is continuous! The point is that in Zp\mathbf Z_p, the ideals (pn)(p^n) form a basis of neighborhoods of the origin.

Proposition. — If the topology of RR is linear, namely if the origin of RR has a basis of neighborhoods consisting of ideals, then the evaluation map R[T]×RRR[T]\times R\to R is continuous.

By translation, one reduces to showing continuity at (0,0)(0,0). Let VV be a neighborhood of 00 in RR and let II be an ideal of RR which is a neighborhood of 00 such that IVI\subset V. Being a subgroup of the additive group of RR and a neighborhood of 00, the ideal II is open. Then the set IR[T]I\cdot R[T] is open, because for every dd, its trace on R[T]dR[T]_d is equal to IR[T]dI\cdot R[T]_d, hence is open. Finally, for fIR[T]f\in I\cdot R[T] and aRa\in R, one has f(a)If(a)\in I, hence f(a)Vf(a)\in V.
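One can illustrate this with R=ZpR=\mathbf Z_p, modelling pp-adic integers by ordinary integers and the neighborhood (pk)(p^k) by a divisibility condition; everything in the sketch below is a toy illustration of the last assertion.

```python
p, k = 5, 3  # the ideal I = (p^k), modelled inside the ordinary integers

def v_p(x):
    """p-adic valuation of a nonzero integer."""
    v = 0
    while x % p == 0:
        x //= p
        v += 1
    return v

f = [p**k * c for c in (2, 7, 1, 4)]  # a polynomial lying in I.R[T]
for a in (1, 12, 2024):               # arbitrary evaluation points
    value = sum(c * a**n for n, c in enumerate(f))
    assert v_p(value) >= k            # f(a) stays in the ideal I
```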

Here is one case where I can prove that evaluation is continuous.

Proposition.If the topology of RR is given by a family of absolute values, then the evaluation map (f,a)f(a)(f,a)\mapsto f(a) is continuous.

I will only treat the case where the topology of RR is given by one absolute value. By translation and linearity, it suffices to prove continuity at (0,0)(0,0). Consider the norm 1\|\cdot\|_1 on R[T]R[T] defined by f1=cn\|f\|_1=\sum |c_n| if f=cnTnf=\sum c_nT^n. By the triangle inequality, one has f(a)f1|f(a)|\leq \|f\|_1 for any aRa\in R such that a1|a|\leq 1. For every r>0r>0, the set VrV_r of polynomials fR[T]f\in R[T] such that f1<r\|f\|_1<r is an open neighborhood of the origin since, for every integer dd, its intersection with R[T]dR[T]_d is an open neighborhood of the origin in R[T]dR[T]_d. Let also WW be the set of aRa\in R such that a1|a|\leq 1. Then Vr×WV_r\times W is a neighborhood of (0,0)(0,0) in R[T]×RR[T]\times R such that f(a)<r|f(a)|<r for every (f,a)Vr×W(f,a)\in V_r\times W. This implies the desired continuity.
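As a numerical sanity check of the inequality f(a)f1|f(a)|\leq \|f\|_1 over the real numbers:

```python
import random

norm1 = lambda f: sum(abs(c) for c in f)                     # ||f||_1
evalp = lambda f, a: sum(c * a**n for n, c in enumerate(f))  # f(a)

random.seed(0)
for _ in range(1000):
    f = [random.uniform(-5, 5) for _ in range(6)]  # a random polynomial
    a = random.uniform(-1, 1)                       # a point with |a| <= 1
    assert abs(evalp(f, a)) <= norm1(f) + 1e-12     # |f(a)| <= ||f||_1
```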