Showing posts with label set theory. Show all posts
Showing posts with label set theory. Show all posts

Saturday, September 6, 2025

The two adjunctions of the preimage

Sometimes in mathematics, you are told about very elementary things of which you hadn't even thought.

I was well aware of some “duality” between image and preimage, but I just learned from Anatole Dedecker (who learned it from Patrick Massot) about another “duality” between preimage and some other notion. Moreover, it appears that this new notion can be used for making slightly more natural a proof in general topology!

Here, “duality” is taken in an informal meaning, the correct word is “adjunction”, in the sense of category theory, and I will try to explain that.

1. Image and preimage

So consider a map $f\colon X \to Y$ between two sets. It induces two other maps relating the sets $\mathcal P(X)$ and $\mathcal P(Y)$ of subsets of $X$ and $Y$. Note that the inclusion relation between subsets these two sets $\mathcal P(X)$ and $\mathcal P(Y)$ allows to view them as ordered sets.

First, we have the direct image operation $f_{*}$, that maps a subset $A\subseteq X$ to the subset $f_{*}(A)$ of $Y$, the set of all images $f(a)\in X$, for $a\in A$. The classical notation would be $f(A)$, but it is ambiguous in the case where a subset $A$ of $X$ is also an element of $X$, and introducing a specific notation will help to clarify some statements later on. This map $f_{*}\colon \mathcal P(X) \to \mathcal P(Y)$ is increasing: for $A$ and $A'\in\mathcal P(X)$ such that $A\subseteq A'$, one has $f_{*}(A) \subseteq f_{*}(A')$.

Then we have the preimage operation $f^{*}$, that maps a subset $B\subseteq X$ to the subset $f^{*}(B)$ of $X$ consisting of all preimages of elements of $B$, namely all $a\in A$ such that $f(a) \in B$. The classical notation is rather $f^{-1}(B)$, but it has the same ambiguity as the direct image. Bizarrely, Bourbaki found the need to invent a another notation for that one, and they put the symbol “$-1$” on top of the letter $f$. The notation $f ^{*}$ is chosen by symmetry with the direct image $f_{*}$. Again, the map $f^{*}\colon \mathcal P(Y) \to\mathcal P(X)$ is increasing: for $B$ and $B'\in\mathcal P(Y)$ such that $B\subseteq B'$, one has $f^{*}(B) \subseteq f^{*}(B')$.

Finally, there is a compatibility between these two operations $f_{*}$ and $f ^{*}$: for $A\in\mathcal P(X)$ and $B\in\mathcal P(Y)$, one has $f_{*}(A) \subseteq B$ if and only if $A \subseteq f^{*}(B)$. Indeed, both of these expressions mean that if $f(a) \in B$ for all $a\in A$. We summarize this property by  saying that the operation $f_{*}$ is left adjoint to the operation $f ^{*}$, or that the operation $f^{*}$ is right adjoint to the operation $f_{*}$.

This terminology comes from category theory, in which adjunctions of functors play an important role since the paper of Daniel Kan (1958), Adjoint functors.

In our case, the categories are just the ordered sets $\mathcal P(X)$ and $\mathcal P(Y)$, with the corresponding sets as sets of objects, and where the set of arrows $A$ to $A'\in\mathcal P(X)$ is a singleton when $A\subseteq A'$, and is empty otherwise. The book of Emily Riehl (2016), Category Theory in Context, is a nice introduction to this topic, with illuminating elementary examples. The property that the operations $f_{*}$ and $f^{*}$ are increasing means that they are *functors* between these categories, and the equivalence $f_{*}(A) \subseteq B \Leftrightarrow A \subseteq f^{*}(B)$ induces the category-theoretical adjunction.

In this case, an adjunction pair is also called a Galois connection. There, the terminology comes from  Galois theory, the two ordered sets are the set of subextensions of a Galois extension $K\to L$ and the set of subgroups of the Galois group $\operatorname{Gal}(L/K)$, the maps are decreasing and correspond to mapping a subextension $E$ of $L$ to the subgroup of $\operatorname{Gal}(L/E)$ of $\operatorname{Gal}(L/K)$, and a subgroup $H\subseteq \operatorname{Gal}(L/K)$ to the fixed-field $L^H$. In Galois theory, these two maps are even bijective.

2. The adjoint functor theorem

While, as MacLane wrote, “adjoint functors arise everywhere”, not every functor can be part of an adjunction. Indeed, if a functor $F$ is left adjoint to a functor $G$, then $F$ preserves colimits and $G$ preserves limits.

Category theory considers limits and colimits of arbitrary diagrams, but in the restricted setting of ordered sets, where there can be at most one arrow from one object to another, diagrams boil down to subsets of objects, limits correspond to infimums (greatest lower bound) and colimits to supremums (least upper bound), which may exist, or not, in particular ordered sets.In our even more restricted case of the set $\mathcal P(X)$ of subsets of a given set $X$, infimum corresponds to intersection, supremum to union, and we have $f_{*}(\bigcup A_i) = \bigcup f_{*}(A_i)$ for every family $(A_i)$ of subsets of $X$, and $f^{*}(\bigcap B_i) = \bigcap f^{*}(B_i)$ for every family $(B_i)$ of subsets of $Y$.

There is an abstract theorem in category theory, the “general adjoint functor theorem”, that says that these property are essentially sufficient for a functor $F$ to be a left adjoint to some functor $G$, or for a functor $G$ to be a right adjoint to some functor $G$. One has to be more careful for the actual statement, but this is the idea.

For an increasing map $G\colon T \to S$ between ordered sets $S$ and $T$, the existence of a left adjoint $F$ can be understood from: for $s\in S$ and $t\in T$, one should have $F(s)\leq t$ if and only if $s\leq G(t)$: consequently, it suffices to take for $F$ the infimum, assuming it exists, of all $t$ such that $s\leq G(t)$. Dually,  the right adjoint $G$ to a functor $F$ would map $t$ to the supremum, assuming it exists,  of all $s$ such that $t\leq F(s)$.

In the case of the image $f_{*}\colon \mathcal P(Y)\to \mathcal P(X)$, this rule defines the right adjoint as mapping $B \in\mathcal P(Y)$ to the union of all subsets $A\in\mathcal P(X)$ such that $f _{*}(A) \subseteq B$. This is exactly the preimage of $B$!

Conversely, in the case of the preimage $f^{*}\colon \mathcal P(Y)\to \mathcal P(X)$, this procedure defines the left adjoint as mapping $A \in\mathcal P(X)$ to the intersection of all subsets $B$ such that $A \subseteq f^{*}(B)$. Again, this is just the image $f _{*}(A)$ of $A$, but I find it slightly more difficult to prove without using that we already know this image and the already known adjunction between $f _{*}$ and $f ^{*}$.

3. The other adjunction

We have seen that preimages respect intersections. As a matter of fact, they also respect unions: $f ^{*}(\bigcup B_i)= \bigcup f ^{*}(B_i)$. Given the adjoint functor theorem, this implies that there is an increasing map $f_! \colon \mathcal P(X) \to \mathcal P(Y)$ which is a right adjoint to $f ^{*}$. What is this operation?

The adjoint functor theorem gives a way to compute it: for $A\in\mathcal P(X)$, the set $f_!(A)\in\mathcal P(Y)$ is the union of all subsets $B\in\mathcal P(Y)$ such that $f^{*}(B) \subseteq A$. It suffices to consider such sets $B$ which are singletons $\{b\}$ and we get that a point $b\in Y$ belongs to $f_!(A)$ if and only if all preimages of $b$ belong to $A$.

Here are two more ways to get a grip on this new adjunction.

Note that a point $b\in Y$ belongs to $f_{*}(A)$ if and only if there exists $a\in A$ such that $b = f (a)$, which means that there exists $a\in A$ in the preimage $f^{*}(\{b\})$, relating $f_{*}$ with the existential quantifier. Similarly, a point $b\in Y$ belongs to $f_! (A)$ if and only if for every $a\in f^{*}(\{b\})$, one has $a\in A$, relating $f_!$ with the universal quantifier.

The other way comes by taking complements: a point $b$ does not belong to $f_!(A)$ if it has a preimage that does not belong to $a$. In other words, $f_!(A) = \complement f_{*}(\complement A)$. This leads to considering the complement map from $\mathcal P(X)$ to itself as an order-reversing involution, and similarly on $\mathcal P(Y)$, and observing that they commute with preimage, in the sense that $f^{*}(\complement B) = \complement f^{*}(B)$ for all $B\subseteq Y$. Consequently, this operation transfers the left adjoint $f _{*}$ of $f ^{*}$ to a right adjoint, and conversely, which is exactly what we had observed.

4. An application in general topology

As an application, this adjunction can be used in topology to characterize open or closed maps. By definition, a map $f \colon X\to Y$ between topological spaces is open if it maps an open subset to an open subset, and it is closed if it maps a closed subset to a closed subset.

The definition of $f_!$ using complement, and the fact that a set is closed if and only if its complement is open implies the following lemma:

Lemma.A map $f\colon X \to Y$ is closed (resp. open) if and only if for every open (resp. closed) subset $U\subseteq X$, the set $f_! (U)$ is closed (resp. open).

It also allows to give a natural proof of the classical characterization of closed maps:

Proposition.Let  $f\colon X \to Y$ be a map between topological spaces. The following properties are equivalent:

  1. The map $f$ is closed;
  2. For any subset $B$ of $Y$, the filter of neighborhoods of $f^{*}(B)$ is coarser than the preimage of the filter of neighborhoods of $B$;
  3. For any subset $B$ of $Y$ and any neighborhood $U$ of $f^{*}(B)$, there exists a neighborhood $V$ of $B$ such that $f^{*}(V)\subseteq U$;
  4. For any point $b\in Y$, the filter of neighborhoods of $f^{*}(\{b\})$ is coarser than the preimage of the filter of neighborhoods of $b$;
  5. For any point $b\in Y$ and any neighborhood $U$ of $f^{*}(\{b\})$, there exists a neighborhood $V$ of $b$ such that $f^{*}(V) \subseteq U$.


Given the definitions of the preimage of a filter and the comparison relation on filters,
the assertions (2) and (3) are equivalent, as well as the assertions (4) and (5).

Obviously, (3) implies (5).

Let us assume (1), that $f$ is closed, and let us prove (3). Let $B$ be a subset of $Y$ and let $U$ be a neighborhood of $f^{*}B$ in $X$. By definition, there exists an open subset $U'$ of $X$ such that $f^{*}B \subseteq U' \subseteq U$. Taking adjunction, we get $B\subseteq f_! U' \subseteq f_! U$. Since $f$ is closed, the set $f_! U'$ is open, so that $f_! U$ is a neighborhood of $B$. It remains to prove that $f^{*}f_! U\subseteq U$.  To prove this inclusion, we apply the adjunction $(f^{*}, f_!)$ once more, and see that it is equivalent to the obvious inclusion  $f_! U \subseteq f_! U$.

Finally, let us assume (5) and let us prove that $f$ is closed. Let $U$ be an open subset of $X$ and let us prove that $f_! U$ is open in $Y$. It suffices to prove that for every $b\in f_! U$, the set $f_! U$ is a neighborhood of $b$. By the construction of $f_!$, the set $f^{*}(\{b\}) $ is contained in $U$ so that $U$ is a neighborhood of $f^{*}(\{b\})$. Applying (5), we get a neighborhood $V$ of $b$ in $Y$ such that $f^{*}V \subseteq U$. Applying the adjunction $(f^{*}, f_!)$, we get the inclusion $V \subseteq f_! U$. In particular, $f_! U$ is a neighborhood of $b$, as was to be shown.
 

Saturday, March 27, 2021

The Fermat–Lévy constant

Undergraduate analysis is full of computions of strange trigonometric series and I posted on Twitter, as a challenge, to compute the following constant: $$ c = \sum_{p=3}^\infty \frac1{p^2} \int_0^{2\pi} \left( \sum_{n=1}^\infty \frac1{n^2} \cos(nx)\right)^3\, dx. $$ As some followers quickly remarked, there could be some irony with this exercise, because quite unexpectedly, they could solve it using the solution to Fermat's Last Theorem. So I'll start with the origin of this problem.

(Light) philosophy of mathematics

This constant came as a screen capture from a 1950 paper by Paul Lévy, Axiome de Zermelo et nombres transfinis (Zermelo axiom and transfinite numbers) where he discusses various reasons to accept, or refuse, Zermelo's principle according to which every set can be endowed with a well-ordering.

Zermelo had proved this principle in 1905 and we know today that this principle is equivalent to the “axiom of choice”. In fact, given a set $S$, Zermelo chooses once and for all, for every nonempty subset $A$ of $S$ some element $f(A)$ of $A$ and it seems that this is precisely this simultaneous quantity of choices that led his contemporaries to realize the necessity of the axiom of choice, as an axiom, first for philosophical reasons — because they objected to the very possibility of doing so.

We know today how Zermelo's theorem is equivalent to the axiom of choice, that it is essentially inoccuous (Gödel, 1938, showing that if there is a set theory without assuming choice, there is some set theory where the axiom of choice is valid), but that it is also unprovable from the other axioms of set theory, say, those (ZF) proposed by Zermelo and Fraenkel (Cohen, 1965, showing that if there is a set theory without assuming the axiom of choice, there is some set theory where the axiom of choice does not hold).

We also know how this axiom, whose usefulness is recognized in the proofs of many theorems of algebra (existence and uniqueness of an algebraic closure, of prime or maximal ideals, of basis for vector spaces, etc.), also leads to paradoxical constructions (such as nonmeasurable sets, or the Banach–Tarski paradoxical decomposition of the unit sphere into finitely many parts that can be recombined into two unit spheres). For many modern mathematicians, the question is now that of a risk-benefice balance — using this axiom on safe grounds, say. However, for mathematicians such as Lebesgue, Borel or Brouwer, who were inspired by a constructive or intuitionistic philosophy of mathematics, this axiom had to be rejected, because they objected to the idea of using objects that cannot be named.

In his 1950 paper, Paul Lévy discusses these objections and explains why he rejects them. As he recognizes, the nature of language only allows for a countable amount of mathematical formulas, hence we can only name a countable amount of mathematical objects, and we know – this is a 1878 theorem of Cantor – that the continuum, for example, is not countable. There must exist, says Paul Lévy, numbers that cannot be named, « des nombres innomables ». I should add that Loren Graham and Jean-Michel Kantor wrote a book on that topic, Naming Infinity: A True Story of Religious Mysticism and Mathematical Creativity, which contains a whole chapter on what they call “the French trio: Borel, Lebesgue, Baire”.

At this point, Lévy introduces the constant whose definition starts this note. Let us now read what Lévy tells us (my translation):

Do mathematical objects exist in themselves, or only as products of the human mind and of our human logic? The point of view of Mr Brouwer regarding the truth of the statements of theorems forces us to ask the question.

Let us consider a precise statement, to fix ideas, that of Fermat's theorem. For us, it is true or false, and the sum $$ c = \sum_{p=3}^\infty \int_0^{2\pi} \left( \sum_1^\infty \frac{\cos n^p x}{n^2}\right)^3\, dx $$ is zero in the first case and positive in the second. If it is false, one can check that it is false (although we do not know in advance how much time is needed). If it is true, an infinite number of checks would be necessary to assure it, and nothing guarantees a priori that this infinite number can be replaced by a finite reasoning. So there are three possible hypotheses (for some theorems, there would be four): the theorem is false; it is true and provable; it is true and unprovable.

In this latter case, Mr Brouwer refuses to consider this theorem as true, and the constant $c$ is neither zero nor positive (nor negative, of course).

Let us detail his argument a little bit. Lévy says that his strangely defined constant $c$ is zero in case FLT is true, and is strictly positive otherwise. But what is its value? For some mathematicians such as Brouwer, this constant can only have a value if we can prove that we are in one case or the other. At the time when Lévy writes his article, in 1950, one did not know yet whether Fermat's last theorem was true or false, it would only be proved in 1995! There is one way to prove that FLT is false, it consists in enumerating all quadruples $(x,y,z,p)$, where $x,y,z$ are strictly positive integers and $p$ is a prime number $\geq 3$, until one reaches some quadruple where $x^p+y^p=z^p$ – this guarantees that $c>0$, Lévy says. But if FLT is true, then this enumeration will go on indefinitely and we will never reach the conclusion. Lévy has an objection to this disappointment, which I find quite amusing, based on the paradox of Zeno of Elea: imagine that we are quicker and quicker at each verification, for example, say twice as fast each time. We could need 1 second for the first quadruple, half a second for the second one, a quarter second for the third one, and so on, and then the whole verification would take 2 seconds. This is obviously a thought experiment and I do not know if physics forbids such a phaenomenon, which anyway Lévy qualifies as supernatural. Computer scientists could object as well, because the complexity of the required computations grows at each step. But let us stop that this philosophical discussion and go back to Lévy's constant.

Computing the Lévy constant

The indicated sum is a series, indexed by prime number $p\geq 3$ (we could consider integers as well) of some integral from $0$ to $2\pi$ of some trigonometric expression (divided by $p^2$) which I will denote by $c_p$.

The point is that $c_p$ is always positive or zero, and that $c_p$ is strictly positive if and only if Fermat's last theorem fails for exponent $p$. It is not precised by Lévy, but the goal of division by $p^2$ is to make sure that the infinite series converges, whatever its sum.

Since cosines are smaller than $1$, the series of all $\cos(n^p x)/n^2$ converges normally and its sum is a continuous function of $x$ which I will denote by $f(x)$. And one has $c_p=\int_0^{2\pi} f(x)^3\,dx$. (Since $f$ is bounded from above independently of $p$, for example by $\pi^2/6$, each integral is uniformly bounded, and the division by $p^2$ will make the whole series convergent.) To understand this integral, let us expand the cube.

What we get is a sum, indexed by triples of integers $(k,m,n)$ of terms of the form $\cos(k^px) \cos(m^px)\cos(n^px) / k^2m^2n^2$, and we need to integrate each of them from $0$ to $2\pi$.

Several methods are possible, such as knowing one's formulas for products of cosines, but I prefer replacing the cosines by their complex exponential expression (Euler's formula), so that each term gets replaced by a sum of 8 terms of the form $e^{\pm i k^p x} e^{\pm i m^p x} e^{\pm i n^p x} / 8 k^2m^2n^2$. Let us factor the exponentials: we get $e^{ i (\pm k^p\pm m^p\pm n^p) x} / 8k^2m^2 n^2$, and its integral from $0$ to $2\pi$ is equal to $\pi/4k^2m^2n^2$ if the argument of $x$ vanishes, and is zero otherwise.

In other words, the triple $(k,m,n)$ contributes as a strictly positive quantity to the sum as soon as $(\pm k,\pm m,\pm n)$ is a counterexample to Fermat's theorem for exponent $p$. Amusing, isn't it? Yes, but you will admit that it is essentially tautological… However…

The circle method

I now reach deeper, in any case mathematically more efficient, considerations and wish to say some words of the circle method. I'll take as an example the problem that the English mathematician Waring had posed in 1770.

In his Arithmetics, Diophantus had asked whether every integer is a sum of 4 perfect squares of integers, and Lagrange proved in 1770 that it is indeed the case. Waring generalized the question to higher powers: is it true that every integer is the sum of at most 9 perfect cubes of integers, or 19 fourth powers, etc.? These two numbers, 9 and 19, correspond to the fact that we need indeed 9 cubes to write 23 ($23 = 2^3 + 2^3 + 2^3 + 1 +1 + 1 + 1 + 1$) and 19 fourth powers to write $79$. So Waring was asking whether this necessary bound was, in fact, sufficient.

We know today that this is the case for cubes (Wieferich, 1909; Kempner, 1912) ou les puissances quatrièmes (Balasubramanian, Dress, Deshouillers, 1986). We also know that there is, for every integer $k\geq3$, some smallest integer $g(k)$ such that every integer is the sum of at most $g(k)$, as well as an upper bound which conjecturally is an equality, namely $g(k)=2^k+\lfloor (3/2)^k\rfloor - 2$.

However, the question seems to me as rather less interesting, when $k$ grows, than the asymptotic analogue: every large enough integer is the sum of at most $G(k)$ perfect $k$th powers, because for this problem a much better bound is known to hold: $G(k)\leq 2k \log(k)+\cdots$ (Vinogradov, Karatsuba, Vaughan, Wooley…).

And to establish these results, trigonometric series strike back, as the heart of the circle method of Hardy and Littlewood.

To write $N$ as a sum of $k$th powers, the idea is to set $f(x)$ as the sum, for all integers $n$ smaller than $P=N^{1/k}$, of $\exp(2i\pi n^k x)$, and to observe that the number of solution to Waring's equality $n_1^k+\cdots+n_r^k=N$ is given by the integral from $0$ to $1$ of the quantity $f(x)^r \exp(-2i\pi Nx)$.

At this point, we have not been less tautological than Paul Lévy was, and it required the genius of Hardy, Littlewood and their followers to observe that this integral could be actually evaluated. To that aim, the idea is to split the interval $[0;1]$ in two regions. First of all, intervals (major arcs) centered around rational numbers of the form $a/q$, where $q$ is “small”, and the complementary set (minor arcs). Roughly, the contribution of major arcs can be reduced to trigonometric sums in finite fields, using the Chinese remainder theorem and other trics, and it appears to have a simple asymptotic expansion – this is number theory. When the Gods favor us, the contribution of the minor arcs may be bounded efficiently, using more analytic methods, such as Cauchy-Schwarz or Weyl differencing. This requires the integer $r$ to be large enough, and then one gets an asymptotic expansion for the number of $r$ perfect $k$th powers that add up to $N$, and it is strictly positive for $N$ large enough!

What one needs to remember is that this works for $r$ quite large. In other words, we can study Diophantine equations whose number of variables is very large compared to the degree. An article by Birch (and which I have not studied well enough…) is called Forms in many variables. Alas, it is completely inoperant for questions such as Fermat's last theorem which has 3 variables and quite a large degree. On the other hand, the circle method is very robust, it allows the equation to be modified – Birch's paper is very flexible – and we know that small modifications of an equation can give rise to dramatic differences regarding its behaviour.

So, as the computer scientists would say, this may be rather a feature than a bug.

Sunday, December 8, 2013

Homotopy type theory on Images des mathématiques

This post will be a short advertisement to a longer general audience text about homotopy type theory that I published on the website Images des mathématiques.

In this text, I try to convey my excitement at the reading of the book published by the participants of last year's IAS program, under direction of Steve Awoodey, Thierry Coquand and Vladimir Voevodsky.  As I write there (this is the title of this article), this remarkable work is at the crossroads of foundations of mathematics, topology and computer science. Indeed, the new foundational setup for mathematics provided by type theory may not only replace set theory; it is also at the heart of the systems for computer proof checking, and gave birth to a new kind of ``synthetic homotopy theory'' which is totally freed of the general topology framework.

Also remarkable is the way this book was produced: written collaboratively, using technology well known in open source software's development, then published under a Creative commons's license, and printed on demand.

This is not the only general audience paper on this subject, probably not the last one neither. Here are links to those I know of:
Once more, here is the link towards my article on Images des mathématiques and that towards the HoTT Book!

Friday, March 8, 2013

A presheaf that has no associated sheaf

In his paper Basically bounded functors and flat sheaves (Pacific Math. J, vol. 57, no. 2, 1975, p. 597-610), William C. Waterhouse gives a nice example of a presheaf that has no associated sheaf. This is Theorem 5.5 (page 605).  I thank François Loeser for having indicated this paper to me, and for his suggestion of explaining it here!


Of course, such a beast is reputed not to exist, since it is well known that any presheaf has an associated sheaf, see for example Godement's book Topologie algébrique et théorie des faisceaux, pages 110-111.
That is, for any presheaf $F$ on a topological space, there is a sheaf $G$ with a morphism of presheaves $\alpha\colon F\to G$ which satisfies a universal property: any morphism from $F$ to a sheaf factors uniquely through $\alpha$.


Waterhouse's presheaf is a more sophisticated example of a presheaf, since it is a presheaf on the category of affine schemes for the flat topology. Thus, a presheaf $F$ on the category of affine schemes is the datum, 

  • of a set $F(A)$ for every ring $A$, 
  • and of a map $\phi_*\colon F(A)\to F(B)$ for every morphism of rings $\phi\colon A\to B$,

subject to the following conditions:

  • if $\phi\colon A\to B$ and $\psi\colon B\to C$ are morphism of rings, then $(\psi\circ\phi)_*=\psi_*\circ\phi_*$;
  • one has ${\rm id}_A)_*={\rm id}_{F(A)}$ for every ring $A$.

Any morphism of rings $\phi\colon A\to B$ gives rise to two morphisms $\psi_1,\psi_2\colon B\to B\otimes B$ respectively defined by $\psi_1(b)=b\otimes 1$ and $\psi_2(b)=1\otimes b$, and the two compositions $A\to B\to B\otimes_A B$ are equal. Consequently, for any presheaf $F$, the two associated maps $F(A) \to F(B) \to F(B\otimes_A B)$ are equal.

By definition, a presheaf $F$ is a sheaf for the flat topology if for any faithfully flat morphism of rings, the map ${\phi_*} \colon F(A)\to F(B)$ is injective and its image is the set of elements $g\in F(B)$ at which the two natural maps $(\psi_1)_*$ and $(\psi_2)_*$ from $F(B)$ to $F(B\otimes_A B)$ coincide.

Here is Waterhouse's example.

For every ring $A$, let $F(A)$ be the set of all locally constant functions $f$ from $\mathop{\rm Spec}(A)$ to some von Neumann cardinal such that $f(\mathfrak p)<\mathop{\rm Card}(\kappa(\mathfrak p))$ for every $\mathfrak p\in\mathop{\rm Spec}(A)$.

This is a presheaf. Indeed, let $\phi\colon A\to B$ is a ring morphism, let $\phi^a\colon\mathop{\rm Spec}(B)\to \mathop{\rm Spec}(A)$ be the associated continuous map on spectra. For $f\in F(A)$, then $f\circ\phi^a$ is a locally constant map from ${\rm Spec}(B)$ to some von Neumann cardinal. Moreover, for every prime ideal $\mathfrak q$ in $B$, with inverse image $\mathfrak p=\phi^{-1}(\mathfrak q)=\phi^a(\mathfrak q)$, the morphism $\phi$ induces an injection from the residue field $\kappa(\mathfrak q)$ into $\kappa(\mathfrak p)$, so that $f\circ\phi^a$ satisfies the additional condition on $F$, hence $f\circ\phi^a\in F(B)$.

However, this presheaf has no associated sheaf for the flat topology. The proof is by contradiction. So assume that $G$ is a sheaf and $\alpha\colon F\to G$ satisfies the universal property.

First of all, we prove that the morphism $\alpha$ is injective: for any ring $A$, the map $\alpha_A\colon F(A)\to G(A)$ is injective. For any cardinal $c$ and any ring $A$, let $L_c(A)$ be the set of locally constant maps  from ${\rm Spec}(A)$ to $c$. Then $L_c$ is a presheaf, and in fact a sheaf. There is a natural morphism of presheaves $\beta_c\colon F\to L_c$, given by $\beta_c(f)(\mathfrak p)=f(\mathfrak p)$ if $f(\mathfrak p)\in c$, that is, $f(\mathfrak p)<c$, and $\beta_c(f)(\mathfrak p)=0$ otherwise. Consequently, there is a unique morphism of sheaves $\gamma_c\colon G\to L_c$ such that $\beta_c=\gamma_c\circ\alpha$. For any ring $A$, and any large enough cardinal $c$, the  map $\beta_c(A)\colon F(A)\to L_c(A)$ is injective. In particular, the map $\alpha(A)$ must be injective.

Let $B$ be a ring and $\phi\colon A\to B$ be a faithfully flat morphism. Let $\psi_1,\psi_2\colon B\to B\otimes_A B$ be the two natural morphisms of rings defined above. Then, the equalizer $E(A,B)$ of the two maps $(\psi_1)_*$ and $(\psi_2)_*$ from $F(B)$ to $F(B\otimes_A B)$ must inject into the equalizer of the two corresponding maps from $G(B)$ to $G(B\otimes_A B)$. Consequently, one has an injection from $E(A,B)$ to $G(A)$.

The contradiction will become apparent once one can find rings $B$ for which $E(A,B)$ has a cardinality as large as desired. If ${\rm Spec}(B)$ is a point $\mathfrak p$, then $F(B)$ is just the set of functions $f$ from the point $\mathfrak p$ to some von Neumann cardinal $c$ such that $f(\mathfrak p)<{\rm Card}(\kappa(\mathfrak p))$. That is, $F(B)$ is the cardinal ${\rm Card}(\kappa(\mathfrak p))$ itself. And since ${\rm Spec}(B)$ is a point, the coincidence condition is necessarily satisfied, so that $E(A,B)= {\rm Card}(\kappa(\mathfrak p))\leq G(A)$.

To conclude, it suffices to take a faithfully flat morphism $A\to B$  such that $B$ is field of cardinality strictly greater than $G(A)$. For example, one can take $A$ to be a field and $B$ the field of rational functions in many indeterminates (strictly more than the cardinality of $G(A)$).

What does this example show? Why isn't there a contradiction in mathematics (yet)?

Because the definition of sheaves and presheaves for the flat topology that I gave above was definitely defective: it neglects in a too dramatic way the set theoretical issues that one must tackle to define sheaves on categories. In the standard setting of set theory provided by ZFC, everything is a set. In particular, categories, presheaves, etc. are sets or maps between sets (themselves represented by sets).  But the presheaf $F$ that Waterhouse defines does not exist as a set, since there does not exist a set $\mathbf{Ring}$ of all rings, nor a set $\mathbf{card}$ of all von Neumann cardinals.

The usual way (as explained in SGA 4) to introduce sheaves for the flat topology consists in adding the axiom of universes — there exists a set $\mathscr U$ which is a model of set theory. Then, one does not consider the (inexistent) set of all rings, or cardinals, but only those belonging to the universe $\mathscr U$—one talks of $\mathscr U$-categories, $\mathscr U$-(pre)sheaves, etc.. In that framework, the $\mathscr U$-presheaf $F$ defined by Waterhouse (where one restricts oneself to algebras and von Neumann cardinals in $\mathscr U$) has an associated sheaf $G_{\mathscr U}$. But this sheaf depends on the chosen universe: if $\mathscr V$ is an universe containing $\mathscr U$, the restriction of $G_{\mathscr V}$ to algebras in $\mathscr U$ will no longer be a $\mathscr U$-presheaf.

Tuesday, November 27, 2012

Finite choices

The axiom of choice says that an arbitrary product $\prod_{i\in I} A_i$ of non-empty sets $A_i$ indexed by a set $I$ is non-empty. It is well known that this axiom does not follow from the other axioms of Zermelo-Fraenkel theory. Even finite choices, that is, this statement restricted to the case where all sets are finite, is not a consequence. Even 2-choices, when one assumes that $A_i$ has two elements!

For each integer $n$, call  ${\rm AC}(n)$ the axiom of choice restricted to families $(A_i)$ where $A_i$ has $n$ elements. 

Tarski proved the funny following fact: ${\rm AC}(2) \Rightarrow {\rm AC}(4)$—if you know how to choose between 2 elements, you can choose between 4.

The proof is in fact quite easy. Consider a family $(A_i)$ of sets with 4 elements. I will use choice functions furnished by ${\rm AC}(2)$ to pick-up one preferred element from $A_i$. For simplicity, label the elements of $A_i$ as $\{a,b,c,d\}$ and remove the index $i$. Then, consider the set  $\{\{a,b\},\{a,c\},\{a,d\},\{b,c\},\{b,d\},\{c,d\}\}$ of all pairs of elements of $A_i$. The hypothesis ${\rm AC}(2)$ allows to choose, for each of those pairs, one preferred element. Call $n_a,n_b,n_c,n_d$ the number of times $a,b,c,d$ has been chosen; one thus has $n_a+n_b+n_c+n_d=6$ and consider those elements which have been chosen the most often, those for which $n_?$ is maximal.
  • If there is only one, let's choose it. (This happens in repartitions like $(3,1,1,1)$, etc.)
  • If there are three such elements (the repartition must be $(2,2,2,0)$), let's choose the unique one which has never been chosen.
  • There can't be four such elements because 4 does not divides 6.
  • If there are two (repartition $(2,2,1,1)$), then use your 2-choice function on this pair!

The other funny, but more difficult, thing, is that ${\rm AC}(2)$ does not imply ${\rm AC}(3)$! Why? because the group $\{\pm1\}$ can act without fixed points on a 2-elements set but cannot on a 3-elements set.  I hope to be able to say more on this another day.

Wednesday, November 21, 2012

The category of sets and its opposite

In the book Categories and sheaves by Kashiwara and Shapira, I found a nice argument for the fact that the category of sets is not equivalent to its opposite: they write « every morphism to the initial object is an isomorphism ». Of course!

In the category Sets, the initial object is the empty set, which means that for every set $A$, there is a unique map from $\emptyset$ to $A$. Now, if we reverse the process, namely, if we consider a set $A$ and a map $f\colon A\to \emptyset$, we see that $A$ must be empty and $f$ is a bijection, hence an isomorphism in the category of sets.

In the opposite category, all arrows are reversed, the initial object becomes the terminal object, etc. Isomorphisms are (reversed) maps which have an inverse, so isomorphisms are still given by bijections. A terminal object of Sets is one-element set, $\{x\}$ (you could take the set $1=\{\emptyset\}$ if, like von Neumann, you believe that numbers are sets). Indeed, there is a unique map from any set to a one-element set. Reverse the process again and consider a set $A$ and a map $f\colon \{x\} \to A$. This amounts to choosing an element of $A$, but such maps are not bijections in general, unless $A$ has itself only one element.

Thursday, October 18, 2012

Cardinality of ultraproducts (an update)

I thought a little more about the proof that an ultraproduct of finite sets is either finite (if the family of cardinals is bounded with respect to the chosen ultrafilter) or has the power of the continuum. I believe that I can now state things in a neater, albeit longer, way.

Let me recall the notation:  $(A_x)$ be a family of sets indexed by a set $X$, $A=\prod A_x$ is the product of the sets $A_x$, $\cal U$ be a nonprincipal ultrafilter on a set $X$, and $A^*$ is the ultraproduct of the $A_x$ with respect to $\cal U$.

I then recall the notion of convergence with respect to the ultrafilter $\cal U$. Let $V$ be a topological space and let  $(v_x)$ be a family of points of $V$.

  1. Say $(v_x)$ $\cal U$-converges to $v$ if for any neighborhood $W$ of $v$, the set of $x\in X$ such that $v_x\in W$ belongs to $\cal U$. 
  2. If $V$ is Hausdorff, then $(v_x)$ has at most one limit. In particular, if $(w_x)$ is another family of points of $V$ indexed by $X$ which converges to a point $w\neq v$, the set of $x\in X$ such that $v_x\neq w_x$ belongs to $\cal U$.
  3. Moreover, if $V$ is compact, then any family $(v_x)$ has a $\cal U$-limit in $V$. This is essentially the definition of compactness in terms of ultrafilters, but it does no harm recalling the proof. If, by contradiction, $(v_x)$ has no $\cal U$-limit, any point $v\in V$ has a neighborhood $W_v$ such that the set $X_v$ $x\in X$ with $x\not\in W_v$ belongs to $\cal U$. Since $V$ is compact, there exists a finite set $T$ in $V$ such that the $W_v$, for $v\in T$, cover $V$. This implies that the intersection of the $X_v$, for $v\in T$, is empty. However, this is a finite intersection of elements of $\cal U$, hence belongs to $\cal U$, hence is non-empty.

Last reminding. The set of nonnegative integers $\mathbf N$ is dense in the Hausdorff and compact set $\mathbf Z_2$ of $2$-adic integers endowed with the $2$-adic distance.

For every $x\in X$, choose a surjection $f_x$ from $A_x$ to the interval of integers $[0,\mathrm{Card}(A_x))$, viewed as a subset of $\mathbf Z_2$. For any family $a=(a_x)$ where $a_x\in A_x$ for every $x\in X$, the family $(f_x(a_x))$ has a unique $\cal U$-limit in $\mathbf Z_2$, denoted $f(a)$. This defines a map $f\colon A\to \mathbf Z_2$.

Let $a=(a_x)$ and $b=(b_x)$ be two such families and assume that $f(a)\neq f(b)$. It follows from the fact that $\mathbf Z_2$ is Hausdorff that the set of $x\in X$ such that $a_x\neq b_x$ belongs to $\cal U$. Said the other way round, if $a$ and $b$ are $\cal U$-equivalent, then $f(a)=f(b)$, so that $f$ defines a map $f^*\colon A^*\to\mathbf Z_2$.

We now show that if the cardinalities $\mathrm{Card}(A_x)$ are not bounded (with respect to $\cal U$), this map $f^*$ is surjective. Let $v\in\mathbf Z_2$; for any $x\in X$, let $v_x$ be the integer in $[0,\mathrm{Card}(A_x))$ which is the closest to $v$ with respect to the $2$-adic distance and let $a_x\in A_x$ be such that $f_x(a_x)=v_x$. I claim that $d(v,v_x)$ $\cal U$-converges to $0$; indeed, for any $k$, the set $X_k$  of $x\in X$ such that $\mathrm{Card}(A_x)\geq 2^k$ belongs to $\cal U$ (this is what it means to be unbounded with respect to $\cal U$) and for  $x\in X_k$, $d(v,v_x)\leq 2^{-k}$.  By construction, $(f_x(a_x))=(v_x)$, hence $f((a_x))=v$; if $a^*$ is the class in $A^*$ of the family $(a_x)$, we thus have $f^*(a^*)=v$.

Finally, $\mathrm{Card}(A^*) \geq \mathrm{Card}(\mathbf Z_2)=2^{\mathbf N}$, as was to be shown.

(This proof is longer, but only because I needed to recall—I mean, to recall to myself—the topological apparatus of convergence for ultrafilters.)

Wednesday, October 17, 2012

Ultrafilters

Ultrafilters won't make american coffee better, but they have nice applications in mathematics. In fact, the last three (working) days, I happened to hear about them with three different colleagues, in three different directions.

Let me recall that an ultrafilter $\cal U$ on a set $X$ is a subset of $\mathfrak P(X)$ satisfying the following axioms:
  1. The empty set does not belong to $\cal U$;
  2. If $A\in\cal U$ and $B\supset A$, then $B\in\cal U$;
  3. If $A,B\in \cal U$, then $A\cap B\in\cal U$;
  4. For any subset $A\subset X$, either $A$ or $\complement A$ belongs to $\cal U$ (but not both).
In informal terms, an element of $\cal U$ may be thought of as a “large” subset of $X$; the axioms say 1. that the empty set is not large, 2. that a superset of a large set is again large, 3. that large sets are large enough so that intersections of two large sets is large, and finally, 4. that if a set is not large, then its complement is large. In fact, this last axiom is the one which makes a filter “ultra”.

Here is an example of an ultrafilter. Fix a point $x\in X$ and decree that a set $A\subset X$ is large iff it contains $x$. Such ultrafilters are called principal and they do not quite respect the intuition of elements of an ultrafilter being a large set. Unfortunately, there is no explicit construction of an ultrafilter which is not principal. Fortunately, choice-like axioms imply the existence of non principal ultrafilters.

Ultrafilters are used to construct ultraproducts

Take a family $(A_x)$ of  non-empty sets indexed by the set $X$ and fix a nonprincipal ultrafilter on $X$. (The definition below is not the correct one when some sets $A_x$ may be empty; the correct one is slightly more elaborate. I learnt it from Laurent Moret-Bailly, see Section 2 of his beautiful paper An extension of Greenberg's theorem to general valuation ringsarXiv:1106.0984v3.) Define an equivalence relation on the product set $A=\prod A_x$ by saying that two families $(a_x)$ and $(b_x)$ are equivalent iff the set of $x$ such that $a_x=b_x$ is large. Let $A^*$ be the quotient of $A$ by this equivalence relation; it is called the ultraproduct of the $A_x$. When the $A_x$ have some extra-structure (relations, operations, etc.) on can transport them to the ultraproduct $A^*$. For example, if, for each $x$, $R_x$ is a binary relation on $A_x$, $a,b\in A^*$ are classes of $(a_x)$ and $(b_x)$, say that $R(a,b)$ iff the set of $x$ such that $R_x(a_x,b_x)$ is large. We shall see some specific examples below.

There are many nice bits of mathematics where ultrafilters are involved. Here are three of them which I would like not to forget.

1) Cardinality of ultraproducts of countable sets.

If there is a large subset of $X$ on which $\mathrm{Card}(A_x)$ is bounded, then $A^*$ is finite. Otherwise, $A^*$ is infinite, and has in fact at least the power of the continuum.

Here is a classical proof in the case $X=\mathbf N$, with some twists that helped me understand it.

We want to construct a map from $2^{\mathbf N}$ to $A^*$. Here, $2^{\mathbf N}$ is the set of subsets of $\mathbf N$,  but can be almost viewed as the set of $2$-adic integers, a subset of $\mathbf N$ corresponding to a $2$-adic integer whose nonzero digits are exactly those belonging to the given subset. In formula, $S\subset\mathbf N$ corresponds to the $2$-adic integer $f(S)=\sum_{n=0}^\infty \chi_S(n) 2^n$. If we sum only $n$ terms, we get an approximation $f_n(S)$ modulo $(2^n)$ which is an element of $\{0,\dots,2^n-1\}$. In particular, $f_n(S)\rightarrow f(S)$. This shows that if $S\neq T$ are two distincts subsets of $B$, then $f_n(S)\neq f_n(T)$ for all but finitely many integers~$n$.

Enumerate a large subset $X_1=\{x_0,\dots\} $ of $X$ such that $A_{x_n}$ has at least $2^n$ elements and enumerate those elements. For any $S\in 2^{\mathbf N}$, consider the family $a(S)=(a_x)$ of $\prod A_x$ such that for every $n$, $a_{x_n}$ is the $f_n(S)$th element of $A_{x_n}$, and $a_x$ is any fixed element of $A_x$ otherwise. Let $a^*(S)$ be the class of this family in $A^*$. If $S\neq T$, then $f_S(n)\neq f_T(n)$ for all but finitely many $n$, so that $a(S)$ and $a(T)$ differ in a large set (the complement to a finite set in a large set), hence $a^*(S)\neq a^*(T)$. We have thus constructed an injective map from $2^{\mathbf N}$ to $A^*$, hence $\mathrm{Card}(A^*)\geq 2^{\mathbf N}$.

2) An “ultrafilter proof” of the infinite Ramsey theorem

Any infinite graph contains an infinite subset which is either a complete subgraph (any two vertices are linked by an edge) or a discrete subgraph (no two vertices are linked by an edge). In this statement, a graph is assumed to have no loop.

Let $V$ be the set of vertices of your graph $G$. Choose a non principal ultrafilter on $V$. Say a vertex $v\in V$ is social if the set of its neighbors (those linked to $v$ by an edge of $G$) is large; otherwise, say it is lonely. Now, assuming that the set of social vertices is large, we will construct an infinite complete subgraph of $G$. Otherwise, the set of lonely vertices is large and the same construction (or the consideration of the opposite graph) gives an infinite discrete subgraph.

Fix a social vertex $v_0$. The set of social neighbors of $v_0$ is large, because it is the intersection of the large set of social vertices with the large set of neighbors of $v_0$; so $v_0$ has a social neighbor $v_1$. Assume we have constructed social vertices $v_0,\dots,v_{n-1}$ which are pairwise linked by an edge, the set of social vertices of $V$ which are linked to all of the $v_i$ is large, again because it is the intersection of the large set of social vertices with finitely many large sets of neighbors of each $v_i$. So there is a social vertex $v_n$ which is a neighbor of $v_0,\dots,v_{n-1}$. Go on.

This infinite Ramsey theorem implies the finite one: For any integer $n$, there is an integer $r(n)$ such that  each graph with at least $r(n)$ vertices possesses a subgraph of cardinality $n$ which is either complete or discrete. The proof uses ultrafilters again, but in a different way. Assume by contradiction that the finite Ramsey theorem is false. So there is an integer $n$ and a family $(G_i)$ of finite graphs with an increasing number of vertices, none of which contains a complete or discrete subgraph with $n$ vertices. Take the ultraproduct $G$ of the graphs $G_i$, namely the product graph $\prod G_i$ modulo the equivalence relation $(g_i)\sim (g'_i)$ iff the set of $i$ such that $g_i=g'_i$
is large, for some given non principal ultraproduct on the set $I$ of indices. Write $[g_i]$ for the class of a family $(g_i)$; say that $[g_i]$ is linked to $[g'_i]$ if the set of indices $i$ for which $g_i$ is linked to $g'_i$ in $G_i$ is large. Then $G$ is a graph, an infinite graph (by the result on cardinalities of ultraproducts).

While I write this proof, I realize that this proof prompts naturally the question of the Ramsey theorem for cardinals: Assuming $V$ has some (infinite) cardinality, what is the smallest cardinality for which
there may not exist an either complete or discret subgraph of that cardinality? For the previous proof to work, one needs transfinite induction, so one needs the ultrafilter to be stable by possibly infinite intersections.

3) Ax's proof of Hilbert's Nullstellensatz

Let $k$ be an algebraically closed field, let $f_1,\dots,f_m$ be polynomials in $d$ variables with coefficients in $k$. Assume that the system of polynomial equations
\[ f_1(x_1,\dots,x_d)=\dots=f_m(x_1,\dots,x_d)=0 \]
has a solution in some extension $K$ of $k$. Then, it already has a solution in $k$.

The following proof is given by Ax in the surprising paper A Metamathematical Approach to some Problems in Number Theory, published in the proceedings of a conference  (Proc. Symp. Pure Math., 20, Amer. Math. Soc). I say surprising, because the main part of this 30-pages paper is about 6 pages, the rest is an appendix providing a beautiful introduction the theory of valuations!

I basically copy Ax. One may assume that both fields $k$ and $K$ are countable and algebraically closed (first replace $k$ by the algebraic closure of the subfield generated by the coefficients
of the polynomials $f_1,\dots,f_m$; then replace $K$ by the algebraic closure of the subfield generated over $k$ by the coordinates $(x_1,\dots,x_d)$ of some solution). 

Consider a nonprincipal ultrafilter on the set $\mathbf N$ of integers and look at the ultraproducts $k^*$ and $K^*$ of the families $(k_n)$ and $(K_n)$, where $k_n=k$ and $K_n=K$ for each $K$. These are algebraically closed fields; moreover,  $k^*$ contains $k$ (the set of classes of all families $(a_n)$ where $a_n\equiv a$ for  some $a\in k$). From a solution $(x_1,\dots,x_d)$ in $K$ and considering the class of the constant family, one gets a solution $(x^*_1,\dots,x^*_d)$ in $(K^*)^d$ of the system of equations $f_1(x^*)=\dots=f_m(x^*)=0$.

By set theory, $\mathrm{Card}(k^*)\leq {\mathbf N}^{\mathbf N}=2^{\mathbf N}$, while we have shown that $\mathrm{Card}(k^*)\geq 2^{\mathbf N}$, so that $k^*$ has the cardinality of the continuum. Likewise, $K^*$ has the cardinality of the continuum. Consequently (by set theory again),  $k^*$ and $K^*$ both have a transcendence basis over $k$ of cardinality the continuum; since they are algebraically closed, they are isomorphic as extensions of $k$. The image of the solution $x^*$ by a $k$-isomorphism from $K^*$ to $k^*$ is a solution $y^*$ of our system in $k^*$. By definition, $y^*=(y^*_1,\dots,y^*_d)$ where for each $i$, $y^*_i$ is the class of a sequence $(y_{i,n})$ of elements of $k$. Since $f_j(y^*)=0$, the set of integers $n$ such that $f_j(y_{1,n},\dots,y_{d,n})=0$ is large. Consequently, there is a large set of integers $n$ such that $(y_{1,n},\dots,y_{d,n})$ is a solution of our system. Any such integer $n$ is a proof that this system already has a solution in $k$. Q.E.D.

I do no know whether this proof is really easier than the ones usually given in an algebra course, for all the set theory involved is rather delicate, but I found it rather appealing.