Showing posts with label topology. Show all posts

Saturday, September 6, 2025

The two adjunctions of the preimage

Sometimes in mathematics, you are told about very elementary things of which you hadn't even thought.

I was well aware of some “duality” between image and preimage, but I just learned from Anatole Dedecker (who learned it from Patrick Massot) about another “duality” between preimage and some other notion. Moreover, it appears that this new notion can be used for making slightly more natural a proof in general topology!

Here, “duality” is taken in an informal sense; the correct word is “adjunction”, in the sense of category theory, and I will try to explain that.

1. Image and preimage

So consider a map $f\colon X \to Y$ between two sets. It induces two other maps relating the sets $\mathcal P(X)$ and $\mathcal P(Y)$ of subsets of $X$ and $Y$. Note that the inclusion relation between subsets allows us to view these two sets $\mathcal P(X)$ and $\mathcal P(Y)$ as ordered sets.

First, we have the direct image operation $f_{*}$, that maps a subset $A\subseteq X$ to the subset $f_{*}(A)$ of $Y$, the set of all images $f(a)\in Y$, for $a\in A$. The classical notation would be $f(A)$, but it is ambiguous in the case where a subset $A$ of $X$ is also an element of $X$, and introducing a specific notation will help to clarify some statements later on. This map $f_{*}\colon \mathcal P(X) \to \mathcal P(Y)$ is increasing: for $A$ and $A'\in\mathcal P(X)$ such that $A\subseteq A'$, one has $f_{*}(A) \subseteq f_{*}(A')$.

Then we have the preimage operation $f^{*}$, that maps a subset $B\subseteq Y$ to the subset $f^{*}(B)$ of $X$ consisting of all preimages of elements of $B$, namely all $a\in X$ such that $f(a) \in B$. The classical notation is rather $f^{-1}(B)$, but it has the same ambiguity as the direct image. Bizarrely, Bourbaki found the need to invent another notation for that one, and they put the symbol “$-1$” on top of the letter $f$. The notation $f ^{*}$ is chosen by symmetry with the direct image $f_{*}$. Again, the map $f^{*}\colon \mathcal P(Y) \to\mathcal P(X)$ is increasing: for $B$ and $B'\in\mathcal P(Y)$ such that $B\subseteq B'$, one has $f^{*}(B) \subseteq f^{*}(B')$.

Finally, there is a compatibility between these two operations $f_{*}$ and $f ^{*}$: for $A\in\mathcal P(X)$ and $B\in\mathcal P(Y)$, one has $f_{*}(A) \subseteq B$ if and only if $A \subseteq f^{*}(B)$. Indeed, both of these expressions mean that $f(a) \in B$ for all $a\in A$. We summarize this property by saying that the operation $f_{*}$ is left adjoint to the operation $f ^{*}$, or that the operation $f^{*}$ is right adjoint to the operation $f_{*}$.
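The equivalence can be checked by brute force on a small example. The following Python sketch is only an illustration: the sets `X`, `Y` and the map `f` (stored as a dictionary) are made up for the occasion, and the names `direct_image` and `preimage` simply stand for $f_{*}$ and $f^{*}$.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of the finite set s, as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def direct_image(f, A):
    """f_*(A): the set of all f(a), for a in A."""
    return frozenset(f[a] for a in A)

def preimage(f, B):
    """f^*(B): the set of all x in the domain of f such that f(x) is in B."""
    return frozenset(x for x in f if f[x] in B)

# Hypothetical example: a map f from X to Y, with "w" outside its image.
X = frozenset({1, 2, 3, 4})
Y = frozenset({"u", "v", "w"})
f = {1: "u", 2: "u", 3: "v", 4: "v"}

# f_*(A) ⊆ B if and only if A ⊆ f^*(B), for all subsets A of X and B of Y.
adjunction_holds = all(
    (direct_image(f, A) <= B) == (A <= preimage(f, B))
    for A in powerset(X) for B in powerset(Y)
)
print(adjunction_holds)
```

Of course, a finite check proves nothing in general; it only makes the statement concrete.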

This terminology comes from category theory, in which adjunctions of functors play an important role since the paper of Daniel Kan (1958), Adjoint functors.

In our case, the categories are just the ordered sets $\mathcal P(X)$ and $\mathcal P(Y)$, with the corresponding sets as sets of objects, and where the set of arrows from $A$ to $A'\in\mathcal P(X)$ is a singleton when $A\subseteq A'$, and is empty otherwise. The book of Emily Riehl (2016), Category Theory in Context, is a nice introduction to this topic, with illuminating elementary examples. The property that the operations $f_{*}$ and $f^{*}$ are increasing means that they are *functors* between these categories, and the equivalence $f_{*}(A) \subseteq B \Leftrightarrow A \subseteq f^{*}(B)$ is exactly the category-theoretical adjunction.

In this case, an adjoint pair is also called a Galois connection. The terminology comes from Galois theory: there, the two ordered sets are the set of subextensions of a Galois extension $K\to L$ and the set of subgroups of the Galois group $\operatorname{Gal}(L/K)$, the maps are decreasing and correspond to mapping a subextension $E$ of $L$ to the subgroup $\operatorname{Gal}(L/E)$ of $\operatorname{Gal}(L/K)$, and a subgroup $H\subseteq \operatorname{Gal}(L/K)$ to the fixed field $L^H$. In Galois theory, these two maps are even bijective.

2. The adjoint functor theorem

While, as MacLane wrote, “adjoint functors arise everywhere”, not every functor can be part of an adjunction. Indeed, if a functor $F$ is left adjoint to a functor $G$, then $F$ preserves colimits and $G$ preserves limits.

Category theory considers limits and colimits of arbitrary diagrams, but in the restricted setting of ordered sets, where there can be at most one arrow from one object to another, diagrams boil down to subsets of objects, limits correspond to infimums (greatest lower bound) and colimits to supremums (least upper bound), which may exist, or not, in particular ordered sets. In our even more restricted case of the set $\mathcal P(X)$ of subsets of a given set $X$, infimum corresponds to intersection, supremum to union, and we have $f_{*}(\bigcup A_i) = \bigcup f_{*}(A_i)$ for every family $(A_i)$ of subsets of $X$, and $f^{*}(\bigcap B_i) = \bigcap f^{*}(B_i)$ for every family $(B_i)$ of subsets of $Y$.
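A tiny Python sketch, with a made-up map `f` and made-up subsets, illustrates these two formulas, and also shows that the direct image need not preserve intersections (which is why $f_{*}$ can only be expected to be a left adjoint).

```python
def direct_image(f, A):
    """f_*(A): the set of all f(a), for a in A."""
    return frozenset(f[a] for a in A)

def preimage(f, B):
    """f^*(B): the set of all x in the domain of f such that f(x) is in B."""
    return frozenset(x for x in f if f[x] in B)

# Hypothetical example: f sends 1 and 2 to "u", and 3 to "v".
f = {1: "u", 2: "u", 3: "v"}
A1, A2 = frozenset({1, 3}), frozenset({2, 3})
B1, B2 = frozenset({"u", "w"}), frozenset({"u", "v"})

# The direct image preserves unions, the preimage preserves intersections.
image_preserves_union = (
    direct_image(f, A1 | A2) == direct_image(f, A1) | direct_image(f, A2))
preimage_preserves_intersection = (
    preimage(f, B1 & B2) == preimage(f, B1) & preimage(f, B2))

# But the direct image of an intersection can be strictly smaller
# than the intersection of the direct images.
image_of_intersection = direct_image(f, A1 & A2)
intersection_of_images = direct_image(f, A1) & direct_image(f, A2)
print(image_preserves_union, preimage_preserves_intersection)
print(sorted(image_of_intersection), sorted(intersection_of_images))
```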

There is an abstract theorem in category theory, the “general adjoint functor theorem”, that says that these properties are essentially sufficient for a functor $F$ to be a left adjoint to some functor $G$, or for a functor $G$ to be a right adjoint to some functor $F$. One has to be more careful for the actual statement, but this is the idea.

For an increasing map $G\colon T \to S$ between ordered sets $S$ and $T$, the existence of a left adjoint $F$ can be understood as follows: for $s\in S$ and $t\in T$, one should have $F(s)\leq t$ if and only if $s\leq G(t)$; consequently, it suffices to take for $F(s)$ the infimum, assuming it exists, of all $t$ such that $s\leq G(t)$. Dually, the right adjoint $G$ to a functor $F$ would map $t$ to the supremum, assuming it exists, of all $s$ such that $F(s)\leq t$.

In the case of the image $f_{*}\colon \mathcal P(X)\to \mathcal P(Y)$, this rule defines the right adjoint as mapping $B \in\mathcal P(Y)$ to the union of all subsets $A\in\mathcal P(X)$ such that $f _{*}(A) \subseteq B$. This is exactly the preimage of $B$!

Conversely, in the case of the preimage $f^{*}\colon \mathcal P(Y)\to \mathcal P(X)$, this procedure defines the left adjoint as mapping $A \in\mathcal P(X)$ to the intersection of all subsets $B$ such that $A \subseteq f^{*}(B)$. Again, this is just the image $f _{*}(A)$ of $A$, but I find it slightly more difficult to prove without using that we already know this image and the already known adjunction between $f _{*}$ and $f ^{*}$.
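The two recipes can be tried out on a finite example. In the Python sketch below, the map `f` and the sets `X`, `Y` are made up, and the function names are only illustrative; the right adjoint of the image is computed as a union, the left adjoint of the preimage as an intersection, and both are compared with the usual preimage and image.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of the finite set s, as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def direct_image(f, A):
    return frozenset(f[a] for a in A)

def preimage(f, B):
    return frozenset(x for x in f if f[x] in B)

def right_adjoint_of_image(f, X, B):
    """Union of all subsets A of X such that f_*(A) ⊆ B."""
    result = frozenset()
    for A in powerset(X):
        if direct_image(f, A) <= B:
            result |= A
    return result

def left_adjoint_of_preimage(f, Y, A):
    """Intersection of all subsets B of Y such that A ⊆ f^*(B)."""
    result = frozenset(Y)
    for B in powerset(Y):
        if A <= preimage(f, B):
            result &= B
    return result

# Hypothetical example map.
X = frozenset({1, 2, 3, 4})
Y = frozenset({"u", "v", "w"})
f = {1: "u", 2: "u", 3: "v", 4: "v"}

union_recipe_gives_preimage = all(
    right_adjoint_of_image(f, X, B) == preimage(f, B) for B in powerset(Y))
intersection_recipe_gives_image = all(
    left_adjoint_of_preimage(f, Y, A) == direct_image(f, A) for A in powerset(X))
print(union_recipe_gives_preimage, intersection_recipe_gives_image)
```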

3. The other adjunction

We have seen that preimages respect intersections. As a matter of fact, they also respect unions: $f ^{*}(\bigcup B_i)= \bigcup f ^{*}(B_i)$. Given the adjoint functor theorem, this implies that there is an increasing map $f_! \colon \mathcal P(X) \to \mathcal P(Y)$ which is a right adjoint to $f ^{*}$. What is this operation?

The adjoint functor theorem gives a way to compute it: for $A\in\mathcal P(X)$, the set $f_!(A)\in\mathcal P(Y)$ is the union of all subsets $B\in\mathcal P(Y)$ such that $f^{*}(B) \subseteq A$. It suffices to consider such sets $B$ which are singletons $\{b\}$ and we get that a point $b\in Y$ belongs to $f_!(A)$ if and only if all preimages of $b$ belong to $A$.

Here are two more ways to get a grip on this new adjunction.

Note that a point $b\in Y$ belongs to $f_{*}(A)$ if and only if there exists $a\in A$ such that $b = f (a)$, which means that there exists $a\in A$ in the preimage $f^{*}(\{b\})$, relating $f_{*}$ with the existential quantifier. Similarly, a point $b\in Y$ belongs to $f_! (A)$ if and only if for every $a\in f^{*}(\{b\})$, one has $a\in A$, relating $f_!$ with the universal quantifier.

The other way comes by taking complements: a point $b$ does not belong to $f_!(A)$ if and only if it has a preimage that does not belong to $A$. In other words, $f_!(A) = \complement f_{*}(\complement A)$. This leads to considering the complement map from $\mathcal P(X)$ to itself as an order-reversing involution, and similarly on $\mathcal P(Y)$, and observing that they commute with preimage, in the sense that $f^{*}(\complement B) = \complement f^{*}(B)$ for all $B\subseteq Y$. Consequently, this operation transfers the left adjoint $f _{*}$ of $f ^{*}$ to a right adjoint, and conversely, which is exactly what we had observed.
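Here is a small Python sketch of the operation $f_!$ on a made-up finite example (the name `forall_image` is only illustrative). It checks the description by complements and the adjunction between $f^{*}$ and $f_!$, and it shows that a point of $Y$ without any preimage belongs to $f_!(A)$ for every $A$, even the empty set.

```python
from itertools import chain, combinations

def powerset(s):
    """All subsets of the finite set s, as frozensets."""
    items = list(s)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def direct_image(f, A):
    return frozenset(f[a] for a in A)

def preimage(f, B):
    return frozenset(x for x in f if f[x] in B)

def forall_image(f, Y, A):
    """f_!(A): the points b of Y all of whose preimages belong to A."""
    return frozenset(b for b in Y if all(x in A for x in f if f[x] == b))

# Hypothetical example map; "w" has no preimage.
X = frozenset({1, 2, 3, 4})
Y = frozenset({"u", "v", "w"})
f = {1: "u", 2: "u", 3: "v", 4: "v"}

# f_!(A) is the complement of the image of the complement of A.
complement_formula_holds = all(
    forall_image(f, Y, A) == Y - direct_image(f, X - A) for A in powerset(X))

# f^*(B) ⊆ A if and only if B ⊆ f_!(A).
second_adjunction_holds = all(
    (preimage(f, B) <= A) == (B <= forall_image(f, Y, A))
    for A in powerset(X) for B in powerset(Y))

print(complement_formula_holds, second_adjunction_holds)
print(sorted(forall_image(f, Y, frozenset())))
```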

4. An application in general topology

As an application, this adjunction can be used in topology to characterize open or closed maps. By definition, a map $f \colon X\to Y$ between topological spaces is open if it maps an open subset to an open subset, and it is closed if it maps a closed subset to a closed subset.

The definition of $f_!$ using complement, and the fact that a set is closed if and only if its complement is open, imply the following lemma:

Lemma. A map $f\colon X \to Y$ is closed (resp. open) if and only if for every open (resp. closed) subset $U\subseteq X$, the set $f_! (U)$ is open (resp. closed).

It also allows us to give a natural proof of the classical characterization of closed maps:

Proposition. Let $f\colon X \to Y$ be a map between topological spaces. The following properties are equivalent:

  1. The map $f$ is closed;
  2. For any subset $B$ of $Y$, the filter of neighborhoods of $f^{*}(B)$ is coarser than the preimage of the filter of neighborhoods of $B$;
  3. For any subset $B$ of $Y$ and any neighborhood $U$ of $f^{*}(B)$, there exists a neighborhood $V$ of $B$ such that $f^{*}(V)\subseteq U$;
  4. For any point $b\in Y$, the filter of neighborhoods of $f^{*}(\{b\})$ is coarser than the preimage of the filter of neighborhoods of $b$;
  5. For any point $b\in Y$ and any neighborhood $U$ of $f^{*}(\{b\})$, there exists a neighborhood $V$ of $b$ such that $f^{*}(V) \subseteq U$.


Given the definitions of the preimage of a filter and the comparison relation on filters,
the assertions (2) and (3) are equivalent, as well as the assertions (4) and (5).

Obviously, (3) implies (5).

Let us assume (1), that $f$ is closed, and let us prove (3). Let $B$ be a subset of $Y$ and let $U$ be a neighborhood of $f^{*}B$ in $X$. By definition, there exists an open subset $U'$ of $X$ such that $f^{*}B \subseteq U' \subseteq U$. Taking adjunction, we get $B\subseteq f_! U' \subseteq f_! U$. Since $f$ is closed, the set $f_! U'$ is open, so that $f_! U$ is a neighborhood of $B$. It remains to prove that $f^{*}f_! U\subseteq U$.  To prove this inclusion, we apply the adjunction $(f^{*}, f_!)$ once more, and see that it is equivalent to the obvious inclusion  $f_! U \subseteq f_! U$.

Finally, let us assume (5) and let us prove that $f$ is closed. Let $U$ be an open subset of $X$ and let us prove that $f_! U$ is open in $Y$. It suffices to prove that for every $b\in f_! U$, the set $f_! U$ is a neighborhood of $b$. By the construction of $f_!$, the set $f^{*}(\{b\}) $ is contained in $U$ so that $U$ is a neighborhood of $f^{*}(\{b\})$. Applying (5), we get a neighborhood $V$ of $b$ in $Y$ such that $f^{*}V \subseteq U$. Applying the adjunction $(f^{*}, f_!)$, we get the inclusion $V \subseteq f_! U$. In particular, $f_! U$ is a neighborhood of $b$, as was to be shown.
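The fact used in both directions of the proof, namely that $f$ is closed exactly when $f_!$ maps open sets to open sets, can be tested on finite topological spaces. The Python sketch below is only an illustration under made-up data: a three-point space `X`, the two-point Sierpiński space `Y`, and two hypothetical maps `f` and `g`, the first closed and the second not.

```python
def direct_image(f, A):
    return frozenset(f[a] for a in A)

def forall_image(f, Y, A):
    """f_!(A): the points b of Y all of whose preimages belong to A."""
    return frozenset(b for b in Y if all(x in A for x in f if f[x] == b))

def is_closed_map(f, X, opens_X, Y, opens_Y):
    """Does f map every closed subset of X to a closed subset of Y?"""
    closed_Y = {Y - V for V in opens_Y}
    return all(direct_image(f, X - U) in closed_Y for U in opens_X)

def forall_image_preserves_opens(f, opens_X, Y, opens_Y):
    """Is f_!(U) open in Y for every open subset U of X?"""
    return all(forall_image(f, Y, U) in opens_Y for U in opens_X)

# Hypothetical finite topological spaces, given by their open sets.
X = frozenset({0, 1, 2})
opens_X = {frozenset(), frozenset({0}), frozenset({0, 1}), X}
Y = frozenset({"a", "b"})
opens_Y = {frozenset(), frozenset({"a"}), Y}

f = {0: "a", 1: "a", 2: "b"}   # a closed map
g = {0: "b", 1: "a", 2: "a"}   # not closed: {2} is closed, {"a"} is not

for name, h in (("f", f), ("g", g)):
    print(name,
          is_closed_map(h, X, opens_X, Y, opens_Y),
          forall_image_preserves_opens(h, opens_X, Y, opens_Y))
```

For each of the two maps, the two tests give the same answer, as they should.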
 

Saturday, April 13, 2024

The topology on the ring of polynomials and the continuity of the evaluation map

Polynomials are an algebraic gadget, and one is rarely led to think about the topology a ring of polynomials should carry. That happened to me, though, more or less by accident, when María Inés de Frutos Fernández and I worked on implementing in Lean the evaluation of power series. So let's start with them. To simplify the discussion, I only consider the case of one indeterminate. When there are finitely many of them, the situation is the same; in the case of infinitely many indeterminates, there might be some additional subtleties, but I have not thought about it.

$\gdef\lbra{[\![}\gdef\rbra{]\!]} \gdef\lpar{(\!(}\gdef\rpar{)\!)} \gdef\bN{\mathbf N} \gdef\coeff{\operatorname{coeff}} \gdef\eval{\operatorname{eval}} \gdef\colim{\operatorname{colim}}$

Power series

A power series over a ring $R$ is just an expression $\sum a_nT^n$, where $(a_0,a_1, \dots)$ is a family of elements of $R$ indexed by the nonnegative integers. After all, this is just what is meant by “formal series”: coefficients and nothing else.

Defining a topology on the ring \(R\lbra T\rbra\) should allow us to say what it means for a sequence $(f_m)$ of power series to converge to a power series $f$, and the most natural thing to require is that for every $n$, the coefficient $a_{m,n}$ of \(T^n\) in $f_m$ converges, when $m$ tends to infinity, to the corresponding coefficient $a_n$ of $T^n$ in \(f\). In other words, we endow \(R\lbra T\rbra \) with the product topology when it is identified with the product set \(R^{\bN}\). The explicit definition may look complicated, but the important point for us is the following characterization of this topology: Let \(X\) be a topological space and let \(f\colon X \to R\lbra T\rbra\) be a map; for \(f\) to be continuous, it is necessary and sufficient that all maps \(f_n\colon X \to R\) are continuous, where, for any \(x\in X\), \(f_n(x)\) is the \(n\)th coefficient of \(f(x)\). In particular, the coefficient maps \(R\lbra T\rbra\to R\) are continuous.

What can we do with that topology, then? The first thing, maybe, is to observe its adequacy with respect to the ring structure on \(R\lbra T\rbra\).

Proposition. If addition and multiplication on \(R\) are continuous, then addition and multiplication on \(R\lbra T\rbra\) are continuous.

Let's start with addition. We need to prove that \(s\colon R\lbra T\rbra \times R\lbra T\rbra\to R\lbra T\rbra\) is continuous. By the characterization, it is enough to prove that all coordinate functions \(s_n\colon R\lbra T\rbra \times R\lbra T\rbra\to R\), \( (f,g)\mapsto \coeff_n(f+g) \), are continuous. But these functions factor through the \(n\)th coefficient maps: \(\coeff_n(f+g) = \coeff_n(f)+\coeff_n(g)\), which is continuous, since addition, coefficients and projections are continuous. This is similar, but slightly more complicated for multiplication: if the multiplication map is denoted by \(m\), we have to prove that the maps \(m_n\) defined by $m_n(f,g)=\coeff_n(f\cdot g)$ are continuous. However, they can be written as \[ m_n(f,g)=\coeff_n(f\cdot g) = \sum_{p=0}^n \coeff_p(f)\coeff_{n-p}(g). \] Since the projections and the coefficient maps are continuous, it is sufficient to prove that the maps from \(R^{n+1} \times R^{n+1}\) to \(R\) given by \[((a_0,\dots,a_n),(b_0,\dots,b_n))\mapsto \sum_{p=0}^n a_p b_{n-p} \] are continuous, and this follows from the continuity of addition and multiplication on \(R\), because it is a polynomial expression.
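The key point of this proof is that the $n$th coefficient of a product only involves the coefficients of index at most $n$ of the two factors. A short Python sketch, where power series are represented by made-up truncated lists of coefficients, makes this explicit.

```python
def coeff_product(f, g, n):
    """n-th coefficient of f*g, computed from the coefficients of index
    0..n of f and g only (f and g are lists of length at least n + 1)."""
    return sum(f[p] * g[n - p] for p in range(n + 1))

# Hypothetical example: the first coefficients of 1 + T + T^2 + ...
# and of 1 - T, whose product is the constant power series 1.
geometric = [1, 1, 1, 1, 1]
one_minus_T = [1, -1, 0, 0, 0]

product_coeffs = [coeff_product(geometric, one_minus_T, n) for n in range(5)]
print(product_coeffs)
```

Changing the coefficients of index greater than $n$ in either list does not change `coeff_product(f, g, n)`, which is the finite-dimensional factorization used above.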

Polynomials

At this point, let's go back to our initial question of endowing polynomials with a natural topology.

An obvious candidate is the induced topology. This looks correct; in any case, it is such that addition and multiplication on \(R[T]\) are continuous. However, it lacks an interesting property with respect to evaluation.

Recall that for every \(a\in R\), there is an evaluation map \(\eval_a\colon R[T]\to R\), defined by \(f\mapsto f(a)\), and even, if one wishes, the two-variable evaluation map \(R[T]\times R\to R\).
The first claim is that, for the induced topology, this map is not continuous.

An example will serve as proof. I take \(R\) to be the real numbers, \(f_n=T^n\) and \(a=1\). Then \(f_n\) converges to zero, because for each integer \(m\), the real numbers \(\coeff_m(f_n)\) are zero for \(n>m\). On the other hand, \(f_n(a)=f_n(1)=1\) for all \(n\), and this does not converge to zero!
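In Python, with polynomials represented by lists of coefficients (an illustrative convention, as are the helper names below), the example reads as follows: each coefficient of $T^n$ is eventually zero, but the value at $1$ is always $1$.

```python
def monomial(n):
    """Coefficient list of T^n."""
    return [0] * n + [1]

def coeff(f, m):
    """m-th coefficient of the polynomial with coefficient list f."""
    return f[m] if m < len(f) else 0

def evaluate(f, a):
    """Value f(a), computed by Horner's rule."""
    result = 0
    for c in reversed(f):
        result = result * a + c
    return result

# For fixed m, the m-th coefficient of T^n vanishes as soon as n > m ...
m = 3
coefficients_vanish = all(coeff(monomial(n), m) == 0 for n in range(m + 1, 50))

# ... but the value of T^n at a = 1 is 1 for every n.
values_at_one = {evaluate(monomial(n), 1) for n in range(50)}
print(coefficients_vanish, values_at_one)
```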

So we have to change the topology on polynomials if we want this map to be continuous, and we now give the correct definition. The ring of polynomials is the increasing union of subsets \(R[T]_n\), indexed by integers \(n\), consisting of all polynomials of degree at most \(n\). Each of these subsets is given the product topology, as above, but we endow their union with the “inductive limit” topology. Explicitly, if \(Y\) is a topological space and \(u\colon R[T]\to Y\) is a map, then \(u\) is continuous if and only if, for each integer \(n\), its restriction to \(R[T]_n\) is continuous.

The inclusion map \(R[T]\to R\lbra T\rbra\) is continuous, hence the topology on polynomials is finer than the topology induced by the topology on power series. As the following property indicates, it is usually strictly finer.

We can also observe that addition and multiplication on \(R[T]\) are still continuous. The same proof as above works, once we observe that the coefficient maps are continuous. (On the other hand, one may be tempted to compare the product topology of the inductive topologies, with the inductive topology of the product topologies, a thing which is not obvious in the direction that we need.)

Proposition. Assume that addition and multiplication on \(R\) are continuous. Then the evaluation maps \(\eval_a \colon R[T]\to R\) are continuous.

We have to prove that for every integer \(n\), the evaluation map \(\eval_a\) induces a continuous map from \(R[T]_n\) to \(R\). Now, this map factors as the coefficient map \(R[T]_n\to R^{n+1}\) composed with a polynomial map \((c_0,\dots,c_n)\mapsto c_0+c_1a+\dots+c_n a^n\). It is therefore continuous.

Laurent series

We can upgrade the preceding discussion and define a natural topology on the ring \(R\lpar T\rpar\) of Laurent series, which are the power series with possibly negative exponents. For this, for all integers \(d\), we set \(R\lpar T\rpar_d\) to be the set of Laurent series of the form \( f=\sum_{n=-d}^\infty c_n T^n\), we endow that set with the product topology, and take the corresponding inductive limit topology. We leave it to the reader to check that this is a ring topology, but that the naïve product topology on \(R\lpar T\rpar\) wouldn't be in general.

Back to the continuity of evaluation

The continuity of the evaluation maps $f\mapsto f(a)$ was an important guide to the topology of the ring of polynomials. This suggests a more general question, for which I don't have a full answer, whether the two-variable evaluation map, \((f,a)\mapsto f(a)\), is continuous. On each subspace $R[T]_d\times R$, the evaluation map is given by a polynomial map ($(c_0,\dots,c_d,a)\mapsto c_0 +c_1a+\dots+c_d a^d$), hence is continuous, but that does not imply the desired continuity, because that only tells us about $R[T]\times R$ with the topology $\colim_d (R[T]_d\times R)$, while we are interested in the topology $(\colim_d R[T]_d)\times R$. To compare these topologies, note that the natural bijection $\colim_d (R[T]_d\times R) \to (\colim_d R[T]_d)\times R$ is continuous (because it is continuous at each level $d$), but the continuity of its inverse is not so clear.

I find it amusing, then, to observe that sequential continuity holds in the important case where $R$ is a field. This relies on the following proposition.

Proposition. Assume that $R$ is a field. Then, for every converging sequence $(f_n)$ in $R[T]$, the degrees $\deg(f_n)$ are bounded.

Otherwise, we can assume that $(f_n)$ converges to $0$ and that $\deg(f_{n+1})>\deg(f_n)$ for all $n$. We construct a continuous linear form $\phi$ on $R[T]$ such that $\phi(f_n)$ does not converge to $0$. This linear form is given by a formal power series $\phi(f)=\sum a_d c_d$ for $f=\sum c_dT^d$, and we choose the coefficients $(a_n)$ by induction so that $\phi(f_n)=1$ for all $n$. Indeed, if the coefficients are chosen up to $\deg(f_n)$, then we fix $a_d=0$ for $\deg(f_n)<d<\deg(f_{n+1})$ and choose $a_{\deg(f_{n+1})}$ so that $\phi(f_{n+1})=1$. This linear form is continuous because its restriction to any $R[T]_d$ is given by a polynomial, hence is continuous.
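The inductive choice of the coefficients $(a_d)$ can be written down explicitly. The following Python sketch works over the field of rational numbers and uses a made-up sequence of polynomials with strictly increasing degrees; the function names are only illustrative. The division by the leading coefficient is where the hypothesis that $R$ is a field is used.

```python
from fractions import Fraction

def degree(f):
    """Degree of the nonzero polynomial with coefficient list f."""
    return max(d for d, c in enumerate(f) if c != 0)

def build_linear_form(polys):
    """Coefficients a[0], a[1], ... of a linear form phi(f) = sum a[d] * c[d]
    such that phi(f) = 1 for each f in polys, assuming that the degrees of
    the polynomials in polys are strictly increasing."""
    a = []
    for f in polys:
        d = degree(f)
        assert d >= len(a), "degrees must be strictly increasing"
        a.extend([Fraction(0)] * (d - len(a)))      # a = 0 in skipped degrees
        partial = sum(a[k] * f[k] for k in range(d))
        a.append((1 - partial) / Fraction(f[d]))    # leading coefficient is invertible
    return a

def apply_form(a, f):
    """Value phi(f) of the linear form with coefficients a."""
    return sum(a[k] * c for k, c in enumerate(f) if k < len(a))

# Hypothetical sequence: f_n = (1 + T + ... + T^(n+1)) / (n + 1).
polys = [[Fraction(1, n + 1)] * (n + 2) for n in range(4)]
a = build_linear_form(polys)
values = [apply_form(a, f) for f in polys]
print(a, values)
```

Although every coefficient of these polynomials tends to $0$, the linear form takes the value $1$ on each of them.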

Corollary. — If $R$ is a topological ring which is a field, then the evaluation map $R[T]\times R\to R$ is sequentially continuous.

Consider sequences $(f_n)$ in $R[T]$ and $(a_n)$ in $R$ that converge to $f$ and $a$ respectively. By the proposition, there is an integer $d$ such that $\deg(f_n)\leq d$ for all $n$, and $\deg(f)\leq d$. Since evaluation is continuous on $R[T]_d\times R$, one has $f_n(a_n)\to f(a)$, as claimed.

Remark. — The previous proposition does not hold on rings. In fact, if $R=\mathbf Z_p$ is the ring of $p$-adic integers, then $\phi(p^nT^n)=p^n \phi(T^n)$ converges to $0$ for every continuous linear form $\phi$ on $R[T]$. More is true since in that case, evaluation is continuous! The point is that in $\mathbf Z_p$, the ideals $(p^n)$ form a basis of neighborhoods of the origin.

Proposition. — If the topology of $R$ is linear, namely the origin of $R$ has a basis of neighborhoods consisting of ideals, then the evaluation map $R[T]\times R\to R$ is continuous.

By translation, one reduces to showing continuity at $(0,0)$. Let $V$ be a neighborhood of $0$ in $R$ and let $I$ be an ideal of $R$ which is a neighborhood of $0$ and such that $I\subset V$. Since it is a subgroup of the additive group of $R$ which is a neighborhood of the origin, the ideal $I$ is open. Then the set $I\cdot R[T]$ is open because for every $d$, its trace on $R[T]_d$ is equal to $I\cdot R[T]_d$, hence is open. Then, for $f\in I\cdot R[T]$ and $a\in R$, one has $f(a)\in I$, hence $f(a)\in V$.

Here is one case where I can prove that evaluation is continuous.

Proposition. If the topology of $R$ is given by a family of absolute values, then the evaluation map $(f,a)\mapsto f(a)$ is continuous.

I just treat the case where the topology of $R$ is given by one absolute value. By translation and linearity, it suffices to prove continuity at $(0,0)$. Consider the norm $\|\cdot\|_1$ on $R[T]$ defined by $\|f\|_1=\sum |c_n|$ if $f=\sum c_nT^n$. By the triangle inequality, one has $|f(a)|\leq \|f\|_1 $ for any $a\in R$ such that $|a|\leq 1$. For every $r>0$, the set $V_r$ of polynomials $f\in R[T]$ such that $\|f\|_1<r$ is an open neighborhood of the origin since, for every integer $d$, its intersection with $R[T]_d$ is an open neighborhood of the origin in $R[T]_d$. Let also $W$ be the set of $a\in R$ such that $|a|\leq 1$. Then $V_r\times W$ is a neighborhood of $(0,0)$ in $R[T]\times R$ such that $|f(a)|<r$ for every $(f,a)\in V_r\times W$. This implies the desired continuity.
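The inequality $|f(a)|\leq \|f\|_1$ for $|a|\leq 1$ is easy to test with exact rational arithmetic. The polynomial and the sample points in the Python sketch below are made up; the last line shows that the restriction $|a|\leq 1$ cannot simply be dropped.

```python
from fractions import Fraction

def evaluate(f, a):
    """Value f(a), computed by Horner's rule."""
    result = Fraction(0)
    for c in reversed(f):
        result = result * a + c
    return result

def norm1(f):
    """The norm ||f||_1: the sum of the absolute values of the coefficients."""
    return sum(abs(c) for c in f)

# Hypothetical polynomial f = 1/2 - T/3 + T^2/4 and sample points with |a| <= 1.
f = [Fraction(1, 2), Fraction(-1, 3), Fraction(1, 4)]
points = [Fraction(-1), Fraction(-1, 2), Fraction(0), Fraction(1, 3), Fraction(1)]

bound_holds = all(abs(evaluate(f, a)) <= norm1(f) for a in points)
bound_fails_outside = abs(evaluate(f, Fraction(-2))) > norm1(f)
print(bound_holds, bound_fails_outside)
```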

Friday, April 2, 2021

On the Hadamard-Lévy theorem, or is it Banach-Mazur?

During the preparation of an agrégation lecture on connectedness, I came across the following theorem, attributed to Hadamard–Lévy: 

Theorem. — Let $f\colon \mathbf R^n\to\mathbf R^n$ be a $\mathscr C^1$-map which is proper and a local diffeomorphism. Then $f$ is a global diffeomorphism.

In this context, that $f$ is proper means that $\| f(x)\| \to+\infty$ when $\| x\|\to+\infty$, while, by the inverse function theorem, the condition that $f$ is a local diffeomorphism is equivalent to the property that its differential $f'(x)$ is invertible, for every $x\in\mathbf R^n$. The conclusion is that $f$ is a diffeomorphism from $\mathbf R^n$ to itself; in particular, $f$ is bijective and its inverse is continuous.

This theorem is stated in this form neither by Hadamard (1906), nor by Lévy (1920), but is essentially due to Banach & Mazur (1934) and it is the purpose of this note to clarify the history, explain a few proofs, as well as more recent consequences for partial differential equations.

A proper map is closed: the image $f(A)$ of a closed subset $A$ of $\mathbf R^n$ is closed in $\mathbf R^n$. Indeed, let $(a_m)$ be a sequence in $A$ whose image $(f(a_m))$ converges in $\mathbf R^n$ to an element $b$; let us show that there exists $a\in A$ such that $b=f(a)$. The properness assumption on $f$ implies that $(a_m)$ is bounded. Consequently, it has a limit point $a$, and $a\in A$ because $A$ is closed. Necessarily, $f(a)$ is a limit point of the sequence $(f(a_m))$, hence $b=f(a)$.

In this respect, let us note the following reinforcement of the previous theorem, due to Browder (1954):
Theorem (Browder). — Let $f\colon \mathbf R^n\to\mathbf R^n$ be a local homeomorphism. If $f$ is closed, then $f$ is a global homeomorphism.

A surprising aspect of these results and their descendants is that they are based on two really different ideas. Banach & Mazur and Browder are based on the notion of covering, with ideas of homotopy theory and, ultimately, the fact that $\mathbf R^n$ is simply connected. On the other hand, the motivation of Hadamard was to generalize to dimension $n$ the following elementary discussion in the one-dimensional case: Let $f\colon\mathbf R\to\mathbf R$ be a $\mathscr C^1$-function whose derivative is $>0$ everywhere (so that $f$ is strictly increasing); give a condition for $f$ to be surjective. In this case, the condition is easy to find: the indefinite integral $\int f'(x)\,dx$ has to be divergent both at $-\infty$ and $+\infty$. In the $n$-dimensional case, the theorem of Hadamard is the following:

Theorem. Let $f\colon\mathbf R^n\to\mathbf R^n$ be a $\mathscr C^1$-map whose differential $f'(x)$ is invertible at every point $x$. For $r\in\mathbf R_+$, let $\omega(r)$ be the infimum, for $x\in\mathbf R^n$ such that $\|x\|=r$, of the inverse $\|f'(x)^{-1}\|^{-1}$ of the norm of the linear map $f'(x)^{-1}$; if $\int_0^\infty \omega(r)\,dr=+\infty$, then $f$ is a global diffeomorphism.

In Hadamard's paper, the quantity $\omega(r)$ is described geometrically as the minor axis of the ellipsoid defined by $f'(x)$, and Hadamard insists that using the volume of this ellipsoid only, essentially given by the determinant of $f'(x)$, would not suffice to characterize global diffeomorphisms. (Examples are furnished by maps of the form $f(x_1,x_2)=(f_1(x_1),f_2(x_2))$. The determinant condition considers $f_1'(x_1)f_2'(x_2)$, while one needs individual conditions on $f'_1(x_1)$ and $f'_2(x_2)$.)

In fact, as explained in Plastock (1974), both versions (closedness hypothesis or quantitative assumptions on the differential) imply that the map $f$ is a topological covering of $\mathbf R^n$. Since the target $\mathbf R^n$ is simply connected and the source $\mathbf R^n$ is connected, $f$ has to be a homeomorphism. I will explain this proof below, but I would first like to explain another one, due to Zuily & Queffelec (1995), which is quite interesting.

A dynamical system approach

The goal is to prove that $f$ is bijective and, to that aim, we will prove that every preimage set $f^{-1}(b)$ is reduced to one element. Replacing $f$ by $f-b$, it suffices to treat the case of $b=0$. In other words, we wish to show that the equation $f(x)=0$ has exactly one solution. For that, it is natural to try to start from some point $\xi\in\mathbf R^n$ and to force $f$ to decrease. This can be done by following the flow of the vector field given by $v(x)=-f'(x)^{-1}(f(x))$. This is a vector field on $\mathbf R^n$ and we can consider its flow: a map $\Phi$ defined on an open subset of $\mathbf R\times\mathbf R^n$ such that $\partial_t \Phi(t,x)=v(\Phi(t,x))$ for all $(t,x)$ and $\Phi(0,x)=x$ for all $x$. In fact, the Cauchy–Lipschitz theorem guarantees the existence of such a flow only if the vector field $v$ is locally Lipschitz, which happens if, for example, $f$ is assumed to be $\mathscr C^2$. In this case, there is even uniqueness of a maximal flow, and we will make this assumption, for safety. (In fact, the paper of De Marco, Gorni & Zampieri (1994) constructs the flow directly thanks to the hypothesis that the vector field is pulled back from the Euler vector field on $\mathbf R^n$.)

What are we doing here? Note that in $\mathbf R^n$, the opposite of the Euler vector field, defined by $u(y)=-y$, has a very simple solution: the flow lines are straight lines going to $0$. The formula above just pulls back this vector field $u$ via the local diffeomorphism $f$, and the flow lines of the vector field $v$ will just be the ones given by pull back by $f$, which will explain the behaviour described below.

In particular, let $a\in\mathbf R^n$ be such that $f(a)=0$ and let $U$ be a neighborhood of $a$ such that $f$ induces a diffeomorphism from $U$ to a ball around $0$. Pulling back the solution of the minus-Euler vector field by $f$, we see that once a flow line enters the open set $U$, it converges to $a$. The goal is now to prove that it will indeed enter such a neighborhood (and, in particular, that such a point $a$ exists).

We consider a flow line starting from a point $x$, that is, $\phi(t)=\Phi(t,x)$ for all times $t$. Let $g(t)= f(\phi(t))$; observe that $g$ satisfies $g'(t)=f'(\phi(t))(\phi'(t))=-g(t)$, hence $g(t)=g(0)e^{-t}$. Assume that the flow line is defined on $[0;t_1\mathopen[$, with $t_1<+\infty$. By what precedes, $g$ is bounded in the neighborhood of $t_1$; since $f$ is assumed to be proper, this implies that $\phi(t)$ is bounded as well. The continuity of the vector field $v$ implies that $\phi$ is uniformly continuous, hence it has a limit at $t_1$. We may then extend the flow line a bit right of $t_1$. As a consequence, the flow line is defined for all times, and $g(t)\to0$ when $t\to+\infty$. By the same properness argument, this implies that $\phi(t)$ is bounded when $t\to+\infty$, hence it has limit points $a$ which satisfy $f(a)=0$. Once $\phi$ enters an appropriate neighborhood of such a point, we have seen that the flow line automatically converges to some point $a\in f^{-1}(0)$.
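The relation $g(t)=g(0)e^{-t}$ can be observed numerically. The Python sketch below is only an illustration in dimension one: the function $f(x)=x^3+x-2$ (a proper local diffeomorphism of $\mathbf R$, with unique zero $a=1$), the starting point and the classical Runge–Kutta integrator are all choices made for the example, not part of the argument above.

```python
import math

def f(x):
    """Hypothetical example: a proper local diffeomorphism of R, zero at x = 1."""
    return x**3 + x - 2.0

def df(x):
    """Derivative of f; it is at least 1 everywhere."""
    return 3.0 * x**2 + 1.0

def v(x):
    """The vector field v(x) = -f'(x)^{-1} f(x)."""
    return -f(x) / df(x)

def flow(x0, t_end, steps):
    """Approximate Phi(t_end, x0) by the classical fourth-order Runge-Kutta method."""
    h = t_end / steps
    x = x0
    for _ in range(steps):
        k1 = v(x)
        k2 = v(x + 0.5 * h * k1)
        k3 = v(x + 0.5 * h * k2)
        k4 = v(x + h * k3)
        x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return x

x0 = -3.0
# Along the flow line, f should decay exactly like exp(-t).
observed = f(flow(x0, 5.0, 1000))
predicted = f(x0) * math.exp(-5.0)
# For large times, the flow line approaches the unique zero of f.
limit = flow(x0, 30.0, 6000)
print(observed, predicted, limit)
```

Up to discretization error, `observed` and `predicted` agree, and `limit` is close to $1$.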

Let us now consider the map $\lambda\colon\mathbf R^n\to f^{-1}(0)$ that associates with a point $\xi$ the limit of the flow line $t\mapsto \Phi(t,\xi)$ starting from the initial condition $\xi$. By continuity of the flow of a vector field depending on the initial condition, the map $\lambda$ is continuous. On the other hand, the hypothesis that $f$ is a local diffeomorphism implies that $f^{-1}(0)$ is a closed discrete subset of $\mathbf R^n$. Since $\mathbf R^n$ is connected, the map $\lambda$ is constant. Since one has $\lambda(\xi)=\xi$ for every $\xi\in f^{-1}(0)$, this establishes that $f^{-1}(0)$ is reduced to one element, as claimed.

Once $f$ is shown to be bijective, the fact that it is proper (closed would suffice) implies that its inverse bijection $f^{-1}$ is continuous. This concludes the proof.

The theorem of Banach and Mazur

The paper of Banach and Mazur is written in greater generality. They consider multivalued continuous maps $F\colon X\to Y$ ($k$-deutige stetige Abbildungen) by which they mean that for every $x$, a subset $F(x)$ of $Y$ is given, of cardinality $k$, the continuity being expressed by sequences: if $x_n\to x$, one can order, for every $n$, the elements of $F(x_n)=\{y_{n,1},\dots,y_{n,k}\}$, as well as the elements of $F(x)=\{y_1,\dots,y_k\}$, in such a way that $y_{n,j}\to y_j$ for all $j$. (In their framework, $X$ and $Y$ are metric spaces, but one could transpose their definition to topological spaces if needed.) They say that such a map is decomposed (zerfällt) if there are continuous functions $f_1,\dots,f_k$ from $X$ to $Y$ such that $F(x)=\{f_1(x),\dots,f_k(x)\}$ for all $x\in X$.

In essence, the definition that Banach and Mazur are proposing contains as a particular case the finite coverings. Namely, if $p\colon Y\to X$ is a finite covering of degree $k$, then the map $x\mapsto p^{-1}(x)$ is a continuous $k$-valued map from $X$ to $Y$. Conversely, let us consider the graph $Z$ of $F$, namely the set of all points $(x,y)\in X\times Y$ such that $y\in F(x)$. Then the first projection $p\colon Z\to X$ is a covering map of degree $k$, but it is not clear that it has local sections.

It would however not be so surprising to 21st-century mathematicians that if one makes a suitable assumption of simple connectedness on $X$, then every such $F$ should be decomposed. Banach and Mazur assume that $X$ satisfies two properties:

  1. The space $X$ is semilocally arcwise connected: for every point $x\in X$ and every neighborhood $U$ of $x$, there exists an open neighborhood $U'$ contained in $U$ such that for every point $x'\in U'$, there exists a path $c\colon[0;1]\to U$ such that $c(0)=x$ and $c(1)=x'$. (Semilocally means that the path is not necessarily in $U'$ but in $U$.)
  2. The space $X$ is arcwise simply connected: two paths $c_0,c_1\colon[0;1]\to X$ with the same endpoints ($c_0(0)=c_1(0)$ and $c_0(1)=c_1(1)$) are strictly homotopic, that is, there exists a continuous map $h\colon[0;1]\times[0;1]\to X$ such that $h(0,t)=c_0(t)$ and $h(1,t)=c_1(t)$ for all $t$, and $h(s,0)=c_0(0)$ and $h(s,1)=c_0(1)$ for all $s$.

Consider a $k$-valued continuous map $F$ from $X$ to $Y$, where $X$ is connected. Banach and Mazur first prove that for every path $c\colon [0;1]\to X$ and every point $y_0\in F(c(0))$, there exists a continuous function $f\colon[0;1]\to Y$ such that $f(0)=y_0$ and $f(t)\in F(c(t))$ for all $t$. To that aim, they consider disjoint neighborhoods $V_1,\dots,V_k$ of the elements of $F(c(0))$, with $y_0\in V_1$, say, and observe that for $t$ small enough, there is a unique element in $F(c(t))\cap V_1$. This defines a bit of the path $f$, and one can go on. Now, given two paths $c,c'$ such that $c(0)=c'(0)$ and $c(1)=c'(1)$, and two maps $f,f'$ as above, they consider a homotopy $h\colon[0;1]\times[0;1]\to X$ linking $c$ to $c'$. Subdividing this square in small enough subsquares, one sees by induction that $f(1)=f'(1)$. (This is analogous to the proof that a topological covering of the square is trivial.) Fixing a point $x_0\in X$ and a point $y_0\in F(x_0)$, one gets in this way a map $g$ from $X$ to $Y$ such that $g(x)$ is equal to $f(1)$, for every path $c\colon[0;1]\to X$ such that $c(0)=x_0$ and $c(1)=x$, and every continuous map $f\colon [0;1]\to Y$ such that $f(t)\in F(c(t))$ for all $t$ and $f(0)=y_0$. One then proves that this map $g$ is continuous. If one considers all such maps, for all points in $F(x_0)$, one obtains the decomposition of the multivalued map $F$.

To prove their version of the Hadamard–Lévy theorem, Banach and Mazur observe that if $f\colon Y\to X$ is a local homeomorphism which is proper, then setting $F(x)=f^{-1}(x)$ gives a multivalued continuous map. It is not obvious that the cardinalities $k(x)$ of the sets $F(x)$ are constant, but this follows (if $X$ is connected) from the fact that $f$ is both a local homeomorphism and proper. Then $F$ is decomposed, so that there exist continuous maps $g_1,\dots,g_k\colon X\to Y$ such that $f^{-1}(x)=\{g_1(x),\dots,g_k(x)\}$ for all $x\in X$. This implies that $Y$ is the disjoint union of the $k$ connected subsets $g_j(X)$. If $Y$ is connected, then $k=1$ and $f$ is a homeomorphism.

The versions of Hadamard and Lévy, after Plastock

Hadamard considered the finite dimensional case, and Lévy extended it to the case of Hilbert spaces.

Plastock considers a Banach-space version of the theorem above: $f\colon E\to F$ is a $\mathscr C^1$-map between Banach spaces with invertible differentials and such that, setting $\omega(r)=\inf_{\|x\| = r}\|f'(x)^{-1}\|^{-1}$, one has $\int_0^\infty \omega(r)\,dr=+\infty$. Of course, under these hypotheses, the Banach spaces $E$ and $F$ are isomorphic, but it may be useful that they are not identical. Note that $f(E)$ is open in $F$, and the proposition that will ensure that $f$ is a global diffeomorphism is the following one, in the spirit of covering theory.

Proposition. (Assuming that $f$ is a local diffeomorphism.) It suffices to prove that the map $f$ satisfies the path lifting property: for every point $x\in E$ and every $\mathscr C^1$ map $c\colon[0;1]\to f(E)$ such that $c(0)=f(x)$, there exists a $\mathscr C^1$ map $d\colon[0;1]\to E$ such that $c(t)=f(d(t))$ for all $t$ and $d(0)=x$.

The goal is now to prove that $f$ satisfies this path lifting property. Using that $f$ is a local homeomorphism, one sees that lifts are unique, and are defined on a maximal subinterval of $[0;1]$ which is either $[0;1]$ itself, or of the form $[0;s\mathclose[$. To prevent the latter case, one needs to impose conditions on the norm $\| f'(x)^{-1}\|$ such as the one phrased in terms of $\omega(r)$ as in the Hadamard–Lévy theorem. In fact, Plastock starts with a simpler case.

Proposition. The path lifting property follows from the following additional hypotheses:

  1. One has $\|f(x)\|\to+\infty$ when $\|x\|\to+\infty$;
  2. There exists a positive continuous function $M\colon\mathbf R_+\to\mathbf R_+$ such that $\|f'(x)^{-1}\|\leq M(\|x\|)$ for all $x$.

Assume indeed that a path $c$ has a maximal lift $d$, defined over the interval $[0;s\mathclose[$. By hypothesis 1, $d(t)$ remains bounded when $t\to s$, because $c(t)=f(d(t))$ tends to $c(s)$. Differentiating the relation $c(t)=f(d(t))$, one gets $c'(t)=f'(d(t))(d'(t))$, hence $d'(t)=f'(d(t))^{-1}(c'(t))$, so that $\| d'(t)\|\leq M(\|d(t)\|) \|c'(t)\|$ by hypothesis 2. Since $M$ is continuous and $d$ is bounded, this implies that $\|d'\|$ is bounded, so that $d$ is uniformly continuous, hence it has a limit at $s$. Then the path $d$ can be extended by setting $d(s)$ to this limit and using the local diffeomorphism property to go beyond $s$.
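The continuation argument can be imitated numerically. Here is a minimal sketch, assuming a one-dimensional toy example $f(x)=x+\frac12\sin(x)$ (my choice, not Plastock's), for which $|f'(x)^{-1}|\leq 2$ and $|f(x)|\to+\infty$; the names `f`, `df` and `lift_path` are hypothetical. Each step uses an Euler prediction for $d'(t)=f'(d(t))^{-1}(c'(t))$, with $c'$ replaced by a finite difference, followed by a few Newton corrections.

```python
import math


def f(x):
    # Toy local diffeomorphism of the real line (an assumption for illustration).
    return x + 0.5 * math.sin(x)


def df(x):
    # Derivative of f; it stays in [0.5, 1.5], so its inverse is bounded by 2.
    return 1.0 + 0.5 * math.cos(x)


def lift_path(c, x0, steps=1000):
    """Return samples d(i/steps) of a lift of the path c through f, with d(0) = x0.

    The caller is expected to provide c with c(0) == f(x0).
    """
    d = x0
    lifted = [d]
    for i in range(1, steps + 1):
        previous_target = c((i - 1) / steps)
        target = c(i / steps)
        # Euler predictor for d'(t) = f'(d(t))^{-1} c'(t), using a finite difference for c'.
        d += (target - previous_target) / df(d)
        # Newton corrector: solve f(d) = target near the predicted value.
        for _ in range(3):
            d -= (f(d) - target) / df(d)
        lifted.append(d)
    return lifted


if __name__ == "__main__":
    path = lambda t: 5.0 * t          # a path in the target, starting at f(0) = 0
    lift = lift_path(path, 0.0)
    print("endpoint of the lift:", lift[-1])
    print("residual:", abs(f(lift[-1]) - path(1.0)))
```

In this toy case the lift never stops, in accordance with the proposition; the sketch says nothing about the infinite-dimensional situation.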

The Hadamard–Lévy theorem is related to completeness of some length-spaces. So we shall modify the distance of the Banach space $E$ as follows: if $c\colon[0;1]\to E$ is a path in $E$, then its length is defined by \[ \ell(c) = \int_0^1 \| f'(c(t))^{-1}\|^{-1} \|{c'(t)}\|\, dt. \] Observe that $\|f'(c(t))^{-1}\|^{-1} \geq \omega(\|c(t)\|)$, so that \[ \ell(c) \geq \int_0^1 \omega(\|c(t)\|) \|{c'(t)}\|\, dt. \] The modified distance of two points in $E$ is then defined as the infimum of the lengths of all paths joining them.

Lemma. With respect to the modified distance, the space $E$ is complete.

One proves that $\ell(c) \geq \int_{\|{c(0)}\|}^{\|{c(1)}\|}\omega(r)\,dr$. Since $\int_0^\infty \omega(r)\,dr=+\infty$, this implies that Cauchy sequences for the modified distance are bounded in $E$ for the original norm. On the other hand, on any bounded subset of $E$, the Banach norm and the modified distance are equivalent, so that they have the same Cauchy sequences.
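One possible way to fill in the first inequality, assuming that $\omega$ is continuous and that $c$ is of class $\mathscr C^1$ (both are my assumptions): set $r(t)=\|c(t)\|$. The function $r$ is Lipschitz, and $|r'(t)|\leq\|c'(t)\|$ at every point where $r$ is differentiable. Since $\omega\geq 0$, this gives
\[ \ell(c) \geq \int_0^1 \omega(r(t))\,\|c'(t)\|\,dt \geq \int_0^1 \omega(r(t))\,|r'(t)|\,dt \geq \left|\int_0^1 \omega(r(t))\,r'(t)\,dt\right| = \left|\int_{\|c(0)\|}^{\|c(1)\|}\omega(r)\,dr\right|, \]
the last equality being the change of variables $r=r(t)$.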

Other conditions can be derived from Plastock's general theorem. For example, assuming that $E$ and $F$ are both equal to a Hilbert space $H$, he shows that it suffices to assume the existence of a decreasing function $\lambda\colon\mathbf R_+\to\mathbf R_+$ such that $\langle f'(x)(u),u\rangle \geq \lambda(\|x\|) \| u\|^2$ for all $x,u$ and $\int_0^\infty \lambda(r)\,dr=+\infty$. Indeed, under this assumption, one may set $\omega(r)=\lambda(r)$.

Application to periodic solutions of differential equations

Spectral theory can be seen as the infinite dimensional generalization of classical linear algebra. Linear differential operators and linear partial differential operators furnish prominent examples of such operators. The theorems of Hadamard–Lévy type have been applied to solve nonlinear differential equations.

I just give an example here, to give an idea of how this works, and also because I am too lazy to check the details.

Following Brown & Lin (1979), we consider the Newtonian equation of motion: \[ u''(t) + \nabla G (u(t)) = p(t) \] where $G$ represents the ambient potential, assumed to be smooth enough, and $p\colon \mathbf R\to\mathbf R^n$ is some external control. The problem studied by Brown and Lin is to prove the existence of periodic solutions when $p$ is itself periodic. The method consists in interpreting the left hand side as a nonlinear map defined on the Sobolev space $E$ of $2\pi$-periodic $\mathscr C^1$-functions with a second derivative in $F=L^2([0;2\pi];\mathbf R^n)$, with values in $F$. Write $L$ for the linear operator $u\mapsto u''$ and $N$ for the (nonlinear) operator $u\mapsto \nabla G(u)$. Then $L$ is linear continuous (hence $L'(u)(v)=L(v)$), and $N$ is continuously differentiable, with differential given by \[ N'(u) (v) = \left( t \mapsto Q (u(t)) (v(t)) \right) \] for $u,v\in E$, where $Q$ is the Hessian of $G$.

In other words, the differential $(L+N)'(u)$ is the linear map $v\mapsto L(v) + Q(u(t)) v$. It is invertible if the eigenvalues of $Q(u(t))$ are away from integers. Concretely, Brown and Lin assume that there are two constant symmetric matrices $A$ and $B$ such that $A\leq Q(x) \leq B$ for all $x$, and whose eigenvalues $\lambda_1\leq \dots\leq\lambda_n$ and $\mu_1\leq\dots\leq \mu_n$ are such that there are integers $N_1,\dots,N_n$ with $N_k^2<\lambda_k\leq\mu_k<(N_k+1)^2$ for all $k$. Using spectral theory in Hilbert spaces, these conditions imply that the linear operator $L+Q(u)\colon E\to F$ is an isomorphism, and that $\|(L+Q(u))^{-1}\|$ is bounded from above by the constant expression \[ c= \sup_{1\leq k\leq n} \sup\bigl( (\lambda_k-N_k^2)^{-1},((N_k+1)^2-\mu_k)^{-1} \bigr).\]
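To make the non-resonance condition concrete, here is a small sketch that, given hypothetical lists of eigenvalues $\lambda_k$ and $\mu_k$, looks for the integers $N_k$ and evaluates the constant $c$, read as the supremum over $k$ of the two quantities $(\lambda_k-N_k^2)^{-1}$ and $((N_k+1)^2-\mu_k)^{-1}$. The function name `resonance_constant` and the sample values are mine, not Brown and Lin's.

```python
import math


def resonance_constant(lams, mus):
    """Return c = sup_k max(1/(lam_k - N_k^2), 1/((N_k+1)^2 - mu_k)).

    Raises ValueError if some pair (lam_k, mu_k) does not fit strictly between
    two consecutive squares N_k^2 < lam_k <= mu_k < (N_k+1)^2.
    """
    c = 0.0
    for lam, mu in zip(lams, mus):
        # The only possible candidate is N = floor(sqrt(lam)); sqrt rejects lam < 0.
        n_k = math.floor(math.sqrt(lam))
        if not (n_k ** 2 < lam <= mu < (n_k + 1) ** 2):
            raise ValueError(f"no admissible integer for the pair ({lam}, {mu})")
        c = max(c, 1.0 / (lam - n_k ** 2), 1.0 / ((n_k + 1) ** 2 - mu))
    return c


if __name__ == "__main__":
    # Illustrative values: 1 < 1.5 <= 3 < 4 and 4 < 4.2 <= 8 < 9.
    print(resonance_constant([1.5, 4.2], [3.0, 8.0]))
```

The constant blows up when an eigenvalue approaches a square, which is the resonance phenomenon that the hypothesis excludes.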

Thanks to this differential estimate, the theorem of Hadamard–Lévy implies that the nonlinear differential operator $L+N$ is a global diffeomorphism from $E$ to $F$. In particular, there is a unique $2\pi$-periodic solution for every $2\pi$-periodic control function $p$.

I thank Thomas Richard for his comments.

Wednesday, April 13, 2016

Weierstrass's approximation theorem

I had to mentor an Agrégation leçon entitled Examples of dense subsets. For my own edification (and that of the masses), I want to try to record here as many proofs of the Weierstrass density theorem as I can: Every complex-valued continuous function on the closed interval $[-1;1]$ can be uniformly approximated by polynomials. I'll also include as a bonus the trigonometric variant: Every complex-valued continuous and $2\pi$-periodic function on $\mathbf R$ can be uniformly approximated by trigonometric polynomials.

1. Using the Stone theorem.

This 1937–1948 theorem is probably the final conceptual brick to the edifice of which Weierstrass laid the first stone in 1885. It asserts that a subalgebra of continuous functions on a compact completely regular (e.g., metric) space is dense for the uniform norm if and only if it separates points. In all presentations that I know of, its proof requires establishing that the absolute value function can be uniformly approximated by polynomials on $[-1;1]$:
  • Stone truncates the power series expansion of the function \[ x\mapsto \sqrt{1-(1-x^2)}=\sum_{n=0}^\infty \binom{1/2}n (x^2-1)^n, \] bounding by hand the error term.
  • Bourbaki (Topologie générale, X, p. 36, lemme 2) follows a more elementary approach and begins by proving that the function $x\mapsto \sqrt x$ can be uniformly approximated by polynomials on $[0;1]$. (The absolute value function is recovered since $\mathopen|x\mathclose|=\sqrt{x^2}$.) To this aim, he introduces the sequence of polynomials given by $p_0=0$ and $p_{n+1}(x)=p_n(x)+\frac12\left(x-p_n(x)^2\right)$ and proves by induction the inequalities \[ 0\leq \sqrt x-p_n(x) \leq \frac{2\sqrt x}{2+n\sqrt x} \leq \frac 2n\] for $x\in[0;1]$ and $n\geq 0$. This implies the desired result.
The algebra of polynomials separates points on the compact set $[-1;1]$, hence is dense. To treat the case of trigonometric polynomials, consider Laurent polynomials on the unit circle.
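Bourbaki's recursion is easy to experiment with. The following sketch evaluates $p_n(x)$ by iterating $p_{n+1}=p_n+\frac12(x-p_n^2)$ from $p_0=0$ and compares $\sqrt x-p_n(x)$ with the bound $2\sqrt x/(2+n\sqrt x)$ on a grid; the names `sqrt_poly` and `max_error` are hypothetical, and floating-point arithmetic is of course only a sanity check, not a proof.

```python
import math


def sqrt_poly(n, x):
    """Value at x of the polynomial p_n defined by p_0 = 0, p_{k+1} = p_k + (x - p_k^2)/2."""
    value = 0.0
    for _ in range(n):
        value += 0.5 * (x - value * value)
    return value


def max_error(n, samples=1001):
    """Largest value of sqrt(x) - p_n(x) over a uniform grid of [0, 1]."""
    grid = (k / (samples - 1) for k in range(samples))
    return max(math.sqrt(x) - sqrt_poly(n, x) for x in grid)


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, max_error(n), 2 / n)
```

Composing with $x\mapsto x^2$ then gives polynomial approximations of $\mathopen|x\mathclose|$ on $[-1;1]$, namely $x\mapsto p_n(x^2)$.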

2. Convolution.

Consider an approximation $(\rho_n)$ of the Dirac distribution, i.e., a sequence of continuous, nonnegative and compactly supported functions on $\mathbf R$ such that $\int\rho_n=1$ and such that for every $\delta>0$, $\int_{\mathopen| x\mathclose|>\delta} \rho_n(x)\,dx\to 0$. Given a continuous function $f$ on $\mathbf R$, form the convolutions defined by $f*\rho_n(x)=\int_{\mathbf R} \rho_n(t) f(x-t)\, dt$. It is classical that $f*\rho_n$ converges to $f$ uniformly on every compact set.

Now, given a continuous function $f$ on $[-1;1]$, one can extend it to a continuous function with compact support on $\mathbf R$ (defining $f$ to be affine linear on $[-2;-1]$ and on $[1;2]$, and to be zero outside of $[-2;2]$). We want to choose $\rho_n$ so that $f*\rho_n$ is a polynomial on $[-1;1]$. The basic idea is just to choose a parameter $a>0$, and to take $\rho_n(x)= c_n (1-(x/a)^2)^n$ for $\mathopen|x\mathclose|\leq a$ and $\rho_n(x)=0$ otherwise, with $c_n$ adjusted so that $\int\rho_n=1$. Let us write $f*\rho_n(x)=\int_{-2}^2 \rho_n(x-t) f(t)\, dt$; if $x\in[-1;1]$ and $t\in[-2;2]$, then $x-t\in [-3;3]$ so we just need to be sure that $\rho_n$ is a polynomial on that interval, which we get by taking, say, $a=3$. This shows that the restriction of $f*\rho_n$ to $[-1;1]$ is a polynomial function, and we're done.
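The construction above can be tried out numerically. The sketch below extends $f$ as described, builds the kernel $\rho_n(x)=c_n(1-(x/a)^2)^n$ with $a=3$, and approximates the convolution integral by a midpoint rule; the helper names (`extend`, `landau_kernel`, `convolve`, `convolution_sup_error`) and the step counts are my own choices, and the quadrature only approximates the polynomial $f*\rho_n$.

```python
def extend(f):
    """Extend f from [-1, 1] to R: affine on [-2, -1] and on [1, 2], zero outside [-2, 2]."""
    def extended(t):
        if -1.0 <= t <= 1.0:
            return f(t)
        if 1.0 < t <= 2.0:
            return f(1.0) * (2.0 - t)
        if -2.0 <= t < -1.0:
            return f(-1.0) * (t + 2.0)
        return 0.0
    return extended


def landau_kernel(n, a=3.0, steps=6000):
    """Return rho_n(x) = c_n (1 - (x/a)^2)^n on [-a, a], with c_n computed by a midpoint rule."""
    h = 2.0 * a / steps
    mass = h * sum((1.0 - ((-a + (i + 0.5) * h) / a) ** 2) ** n for i in range(steps))

    def rho(x):
        return (1.0 - (x / a) ** 2) ** n / mass if abs(x) <= a else 0.0
    return rho


def convolve(extended, rho, x, steps=4000):
    """Midpoint-rule approximation of the integral of rho(x - t) * extended(t) for t in [-2, 2]."""
    h = 4.0 / steps
    nodes = (-2.0 + (i + 0.5) * h for i in range(steps))
    return h * sum(rho(x - t) * extended(t) for t in nodes)


def convolution_sup_error(f, n, points=11):
    """Largest gap between f and the approximate convolution on a grid of [-1, 1]."""
    extended, rho = extend(f), landau_kernel(n)
    xs = [-1.0 + 2.0 * j / (points - 1) for j in range(points)]
    return max(abs(convolve(extended, rho, x) - f(x)) for x in xs)


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(n, convolution_sup_error(abs, n))
```

For $f(x)=\mathopen|x\mathclose|$ the error decreases slowly with $n$, as one expects from a kernel whose width is of order $a/\sqrt n$.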

This approach is more or less that of D. Jackson (“A Proof of Weierstrass's Theorem,” Amer. Math. Monthly, 1934). The difference is that he considers continuous functions on a closed interval contained in $\mathopen]0;1\mathclose[$ which he extends linearly to $[0;1]$ so that they vanish at $0$ and $1$; he considers the same convolution, taking the parameter $a=1$.

Weierstrass's own proof (“Über die analytische Darstellbarkeit sogenannter willkürlicher Functionen einer reellen Veränderlichen,” Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, 1885) was slightly more sophisticated: he first showed approximation by convolution with the Gaussian kernel defined by $ \rho_n(t) =\sqrt{ n} e^{- \pi n t^2}$, and then expanded the kernel as a power series, a suitable truncation of which furnishes the desired polynomials.

As shown by Jackson, the same approach works easily (in a sense, more easily) for $2\pi$-periodic functions, considering the kernel defined by $\rho_n(x)=c_n(1+\cos(x))^n$, where $c_n$ is chosen so that $\int_{-\pi}^\pi \rho_n=1$.

3. Bernstein polynomials.

Take a continuous function $f$ on $[0;1]$ and, for $n\geq 1$, set \[ B_nf(x) = \sum_{k=0}^n f(k/n) \binom nk x^k (1-x)^{n-k}.\] It is classical that $B_nf$ converges uniformly to $f$ on $[0;1]$.
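As an illustration, here is a short sketch that evaluates $B_nf(x)$ directly from the defining sum and estimates the uniform error on a grid; the names `bernstein` and `bernstein_sup_error` are hypothetical.

```python
from math import comb


def bernstein(f, n, x):
    """Value at x of the n-th Bernstein polynomial of f (n >= 1, x in [0, 1])."""
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k) for k in range(n + 1))


def bernstein_sup_error(f, n, samples=201):
    """Largest value of |B_n f(x) - f(x)| over a uniform grid of [0, 1]."""
    grid = (j / (samples - 1) for j in range(samples))
    return max(abs(bernstein(f, n, x) - f(x)) for x in grid)


if __name__ == "__main__":
    g = lambda t: abs(t - 0.5)   # continuous, not differentiable at 1/2
    for n in (10, 50, 200):
        print(n, bernstein_sup_error(g, n))
```

One can also check on this code the explicit formulas $B_n(1)=1$, $B_n(x)=x$ and $B_n(x^2)=x^2+x(1-x)/n$, which are the starting point of the Korovkin-type proof.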

There are two classical proofs of Bernstein's theorem. One is probabilistic and consists in observing that $B_nf(x)$ is the expected value of $f(S_n/n)$, where $S_n$ is the sum of $n$ i.i.d. Bernoulli random variables with parameter $x\in[0;1]$. Another (generalized as the Korovkin theorem, On convergence of linear positive operators in the space of continuous functions, Dokl. Akad. Nauk SSSR (N.S.), vol. 90) consists in showing (i) that for $f=1,x,x^2$, $B_nf$ converges uniformly to $f$ (an explicit calculation), (ii) that if $f\geq 0$, then $B_nf\geq 0$ as well, (iii) for every $x\in[0;1]$, squeezing $f$ between two quadratic polynomials $f^+$ and $f^-$ such that $f^+(x)-f^-(x)$ is as small as desired.

A trigonometric variant would be given by Fejér's theorem that the Cesàro averages of a Fourier series of a continuous, $2\pi$-periodic function converge uniformly to that function. In turn, Fejér's theorem can be proved in both ways, either by convolution (the Fejér kernel is nonnegative), or by a Korovkin-type argument (replacing $1,x,x^2$ on $[0;1]$ by $1,z,z^2,z^{-1},z^{-2}$ on the unit circle).


4. Using approximation by step functions.

This proof originates with a paper of H. Kuhn, “Ein elementarer Beweis des Weierstrasschen Approximationssatzes,” Arch. Math. 15 (1964), p. 316–317.

Let us show that for every $\delta\in\mathopen]0,1\mathclose[$ and every $\varepsilon>0$, there exists a polynomial $p$ satisfying the following properties:
  • $0\leq p(x)\leq \varepsilon$ for $-1\leq x\leq-\delta$;
  • $0\leq p(x)\leq 1$ for $-\delta\leq x\leq \delta$;
  • $1-\varepsilon\leq p(x)\leq 1$ for $\delta\leq x\leq 1$.
In other words, these polynomials approximate the (discontinuous) function $f$ on $[-1;1]$ defined by $f(x)=0$ for $x< 0$, $f(x)=1$ for $x> 0$ and $f(0)=1/2$.

A possible formula is $p(x)=(1- ((1-x)/2)^n)^{2^n}$, where $n$ is a large enough integer. First of all, one has $0\leq (1-x)/2\leq 1$ for every $x\in[-1;1]$, so that $0\leq p(x)\leq 1$. Let $x\in[-1;-\delta]$; then one has $(1-x)/2\geq (1+\delta)/2$, hence $p(x)\leq (1-((1+\delta)/2)^n)^{2^n}$, which can be made arbitrarily small when $n\to\infty$. Let finally $x\in[\delta;1]$; then $(1-x)/2\leq (1-\delta)/2$, hence $p(x)\geq (1-((1-\delta)/2)^n)^{2^n}\geq 1- (1-\delta)^n$, which can be made arbitrarily close to $1$ when $n\to\infty$.
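A quick numerical check of these three estimates, reading the formula as $p(x)=(1-((1-x)/2)^n)^{2^n}$; the function name `kuhn` and the sample values $\delta=0.2$, $\varepsilon=0.02$, $n=20$ are mine.

```python
def kuhn(n, x):
    """Value at x in [-1, 1] of the polynomial (1 - ((1 - x)/2)^n)^(2^n)."""
    return (1.0 - ((1.0 - x) / 2.0) ** n) ** (2 ** n)


if __name__ == "__main__":
    n, delta = 20, 0.2
    # Left of -delta the values are tiny, right of delta they are close to 1.
    print("p(-delta) =", kuhn(n, -delta))
    print("p(0)      =", kuhn(n, 0.0))
    print("p(delta)  =", kuhn(n, delta))
    # Lower bound 1 - (1 - delta)^n from the text, for comparison.
    print("bound     =", 1.0 - (1.0 - delta) ** n)
```

The degree of $p$ is $n2^n$, so these polynomials are only of theoretical interest; the point of the proof is its elementary nature.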

By translation and dilation, the discontinuity can be placed at any element of $[-1;1]$. Let now $f$ be an arbitrary step function and let us write it as a linear combination $f=\sum a_i f_i$, where $f_i$ is a $\{0,1\}$-valued step function. For every $i$, let $p_i$ be a polynomial that approximates $f_i$ as given above. The linear combination $\sum a_i p_i$ approximates $f$ with maximal error $\sup(\mathopen|a_i\mathclose|)$.

Using uniform continuity of continuous functions on $[-1;1]$, every continuous function can be uniformly approximated by a step function. This concludes the proof.

5. Using approximation by piecewise linear functions.

As in the proof of Stone's theorem, one uses the fact that the function $x\mapsto \mathopen|x\mathclose|$ is uniformly approximated by a sequence of polynomials on $[-1;1]$. Consequently, so are the functions $x\mapsto \max(0,x)=(x+\mathopen|x\mathclose|)/2 $ and $x\mapsto\min(0,x)=(x-\mathopen|x\mathclose|)/2$. By translation and dilation, every continuous piecewise linear function on $[-1;1]$ with only one break point is uniformly approximated by polynomials. By linear combination, every continuous piecewise linear affine function is uniformly approximated by polynomials.
By uniform continuity, every continuous function can be uniformly approximated by continuous piecewise linear affine functions. Weierstrass's theorem follows.

6. Moments.

A linear subspace $A$ of a Banach space is dense if and only if every continuous linear form which vanishes on $A$ is identically $0$. In the present case, the dual of $C^0([-1;1],\mathbf C)$ is the space of complex measures on $[-1;1]$ (Riesz theorem, if one wishes, or the definition of a measure). So let $\mu$ be a complex measure on $[-1;1]$ such that $\int_{-1}^1 t^n \,d\mu(t)=0$ for every integer $n\geq 0$; let us show that $\mu=0$. This is the classical problem of showing that a complex measure on $[-1;1]$ is determined by its moments. In fact, the classical proof of this fact runs the other way round, and there must exist ways to reverse the arguments.

One such solution is given in Rudin's Real and complex analysis, where it is more convenient to consider functions on the interval $[0;1]$. So, let $F(z)=\int_0^1 t^z \,d\mu(t)$. The function $F$ is holomorphic and bounded on the half-plane $\Re(z)> 0$ and vanishes at the positive integers. At this point, Rudin makes a conformal transformation to the unit disk (setting $w=(z-1)/(z+1)$) and gets a bounded function on the unit disk with zeroes at $(n-1)/(n+1)=1-2/(n+1)$, for $n\in\mathbf N$; unless this function vanishes identically, this contradicts the fact that the series $\sum 1/(n+1)$ diverges.

In Rudin, this method is used to prove the more general Müntz–Szász theorem according to which the family $(t^{\lambda_n})$ generates a dense subset of $C([0;1])$ if and only if $\sum 1/\lambda_n=+\infty$.

Here is another solution I learnt in a paper by L. Carleson (“Mergelyan's theorem on uniform polynomial approximation”, Math. Scand., 1964).

For every complex number $a$ such that $\mathopen|a\mathclose|>1$, one can write $1/(t-a)$ as a converging power series. By summation, this quickly gives that
\[ F(a) = \int_{-1}^1 \frac{1}{t-a}\, d\mu(t) \equiv 0. \]
Observe that this formula defines a holomorphic function on $\mathbf C\setminus[-1;1]$; by analytic continuation, one thus has $F(a)=0$ for every $a\not\in[-1;1]$.
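For the record, the expansion used for $\mathopen|a\mathclose|>1$ can be written out as follows (a routine computation, spelled out here as a reading aid). For $t\in[-1;1]$, one has $\mathopen|t/a\mathclose|\leq 1/\mathopen|a\mathclose|<1$, hence
\[ \frac1{t-a} = -\frac1a\cdot\frac1{1-t/a} = -\sum_{n=0}^\infty \frac{t^n}{a^{n+1}}, \]
with uniform convergence in $t$. Integrating term by term against $\mu$ then gives
\[ \int_{-1}^1 \frac1{t-a}\,d\mu(t) = -\sum_{n=0}^\infty \frac1{a^{n+1}} \int_{-1}^1 t^n\,d\mu(t) = 0, \]
since all moments of $\mu$ vanish.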
Take a $C^2$-function $g$ with compact support on the complex plane. For every $t\in\mathbf C$, one has the following formula
\[ \frac1\pi \iint \bar\partial g(z) \frac{1}{t-z} \, dx\,dy = g(t), \]
which implies, by integration and Fubini, that
\[ \int_{-1}^1 g(t)\,d\mu(t) = \frac1\pi \iint \int \bar\partial g(z) \frac1{t-z}\,d\mu(t)\,dx\,dy = \frac1\pi \iint \bar\partial g(z) F(z)\,dx\, dy= 0. \]
On the other hand, every $C^2$ function on $[-1;1]$ can be extended to such a function $g$, so that the measure $\mu$ vanishes on every $C^2$ function on $[-1;1]$. Approximating a continuous function by a $C^2$ function (first take a piecewise linear approximation, and round the corners), we get that $\mu$ vanishes on every continuous function, as was to be proved.

7. Chebyshev/Markov systems.

This proof is due to P. Borwein and taken from the book Polynomials and polynomial inequalities, by P. Borwein and T. Erdélyi (Graduate Texts in Maths, vol. 161, 1995). Let us say that a sequence $(f_n)$ of continuous functions on an interval $I$ is a Markov system (resp. a weak Markov system) if for every integer $n$, every linear combination of $(f_0,\dots,f_n)$ has at most $n$ zeroes (resp. $n$ sign changes) in $I$.

Given a Markov system $(f_n)$, one defines a sequence $(T_n)$, where $f_n-T_n$ is the element of $\langle f_0,\dots,f_{n-1}\rangle$ which is the closest to $f_n$. The function $T_n$ has $n$ zeroes on the interval $I$; let $M_n$ be the maximum distance between two consecutive zeroes.

Borwein's theorem (Theorem 4.1.1 in the mentioned book) then asserts that if the sequence $(f_n)$ is a Markov system consisting of $C^1$ functions, then its linear span is dense in $C(I)$ if and only if $M_n\to 0$.

The sequence of monomials $(x^n)$ on $I=[-1;1]$ is of course a Markov system. In this case, the polynomial $T_n$ is, up to a constant factor, the $n$th Chebyshev polynomial, given by $T_n(\cos(x))=\cos(nx)$; its roots are given by $\cos((\pi+2k\pi)/2n)$, for $k=0,\dots,n-1$, and $M_n\leq \pi/n$. This gives yet another proof of Weierstrass's approximation theorem.
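The bound $M_n\leq\pi/n$ is easy to check numerically. The sketch below assumes the usual normalization on $[-1;1]$, namely $T_n(\cos(x))=\cos(nx)$, so that the roots are $\cos((\pi+2k\pi)/2n)$; it evaluates $T_n$ by the three-term recurrence (so that the check is not circular) and computes the largest gap between consecutive roots. The function names are hypothetical.

```python
import math


def chebyshev(n, x):
    """Evaluate T_n(x) by the recurrence T_0 = 1, T_1 = x, T_{k+1} = 2x T_k - T_{k-1}."""
    previous, current = 1.0, x
    if n == 0:
        return previous
    for _ in range(n - 1):
        previous, current = current, 2.0 * x * current - previous
    return current


def chebyshev_roots(n):
    """Sorted list of the n roots cos((pi + 2k pi) / (2n)), k = 0, ..., n-1."""
    return sorted(math.cos((math.pi + 2 * k * math.pi) / (2 * n)) for k in range(n))


def max_gap(n):
    """Largest distance between two consecutive roots of T_n (requires n >= 2)."""
    roots = chebyshev_roots(n)
    return max(b - a for a, b in zip(roots, roots[1:]))


if __name__ == "__main__":
    for n in (2, 5, 20, 100):
        print(n, max_gap(n), math.pi / n)
```

The inequality itself follows from $\mathopen|\cos(a)-\cos(b)\mathclose|\leq\mathopen|a-b\mathclose|$, consecutive roots corresponding to angles that differ by $\pi/n$.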

Wednesday, November 11, 2015

When Baire meets Krasner


Here is a well-but-ought-to-be-better known theorem.

Theorem. — Let $\ell$ be a prime number and let $G$ be a compact subgroup of $\mathop{\rm GL}_d(\overline{\mathbf Q_\ell})$. Then there exists a finite extension $E$ of $\mathbf Q_\ell$ such that $G$ is contained in $\mathop{\rm GL}_d(E)$.

Before explaining its proof, let us recall why such a theorem can be of any interest at all. The keyword here is Galois representations.

It is now a well-established fact that linear representations are an extremely useful tool to study groups. This is standard for finite groups, for which complex linear representations appear at one point or another of graduate studies, and its topological version is even more classical for the abelian groups $\mathbf R/\mathbf Z$ (Fourier series) and $\mathbf R$ (Fourier integrals). On the other hand, some groups are extremely difficult to grasp while their representations are ubiquitous, namely the absolute Galois groups $G_K=\operatorname{Gal}(\overline K/K)$ of fields $K$.

With the notable exception of real closed fields, these groups are infinite and have a natural (profinite) topology with open subgroups the groups $\operatorname{Gal}(\overline K/L)$, where $L$ is a finite extension of $K$ lying in $\overline K$. It is therefore important to study their continuous linear representations. Complex representations are important but since $G_K$ is totally disconnected, their image is always finite. Therefore, $\ell$-adic representations, namely continuous morphisms from $G_K$ to $\mathop{\rm GL}_d(\mathbf Q_\ell)$, are more important. Here $\mathbf Q_\ell$ is the field of $\ell$-adic numbers.

Their use goes back to Weil's proof of the Riemann hypothesis for curves over finite fields, via the action on $\ell^\infty$-division points of their Jacobian varieties. Here $\ell$ is a prime different from the characteristic of the ground field. More generally, every Abelian variety $A$ over a field $K$ of characteristic $\neq\ell$ gives rise to a Tate module $T_\ell(A)$ which is a free $\mathbf Z_\ell$-module of rank $d=2\dim(A)$, endowed with a continuous action $\rho_{A,\ell}$ of $G_K$. Taking a basis of $T_\ell(A)$, one thus has a continuous morphism $G_K\to \mathop{\rm GL}_d(\mathbf Z_\ell)$, and, embedding $\mathbf Z_\ell$ in the field of $\ell$-adic numbers, a continuous morphism $G_K\to\mathop{\rm GL}_d(\mathbf Q_\ell)$. Even more generally, one can consider the $\ell$-adic étale cohomology of algebraic varieties over $K$.

For various reasons, such as the need to diagonalize additional group actions, one can be led to consider similar representations where $\mathbf Q_\ell$ is replaced by a finite extension of $\mathbf Q_\ell$, or even by the algebraic closure $\overline{\mathbf Q_\ell}$. Since $G_K$ is a compact topological group, its image by a continuous representation $\rho\colon G_K\to\mathop{\rm GL}_d(\overline{\mathbf Q_\ell})$ is a compact subgroup of $\mathop{\rm GL}_d(\overline{\mathbf Q_\ell})$ to which the above theorem applies.

This being said for the motivation, one proof (attributed to Warren Sinnott) is given by Keith Conrad in his short note, Compact subgroups of ${\rm GL}_n(\overline{\mathbf Q}_p)$. In fact, while browsing through his large set of excellent expository notes, I came across that one and felt urged to write this blog post.

The following proof had been explained to me by Jean-Benoît Bost almost exactly 20 years ago. I believe that it ought to be much more widely known.

It relies on the Baire category theorem and on Krasner's lemma.

Lemma 1 (essentially Baire). — Let $G$ be a compact topological group and let $(G_n)$ be an increasing sequence of closed subgroups of $G$ such that $\bigcup G_n=G$. There exists an integer $n$ such that $G_n=G$.

Proof. Since $G$ is compact Hausdorff, it satisfies the Baire category theorem and there exists an integer $m$ such that $G_m$ contains a non-empty open subset $V$. For every $g\in V$, the set $V\cdot g^{-1}$ is an open neighborhood of identity contained in $G_m$. This shows that $G_m$ is open in $G$. Since $G$ is compact, it has finitely many cosets $g_iG_m$ modulo $G_m$; there exists an integer $n\geq m$ such that $g_i\in G_n$ for every $i$, hence $G=G_n$. QED.

Lemma 2 (essentially Krasner). — For every integer $d$, the set of all extensions of $\mathbf Q_\ell$ of degree $d$, contained in $\overline{\mathbf Q_\ell}$, is finite.

Proof. Every finite extension of $\mathbf Q_\ell$ has a primitive element whose minimal polynomial can be taken monic and with coefficients in $\mathbf Z_\ell$; its degree is the degree of the polynomial. On the other hand, Krasner's lemma asserts that for every such irreducible polynomial $P$, there exists a real number $c_P>0$ such that every monic polynomial $Q$ of the same degree for which the coefficients of $Q-P$ have absolute values $<c_P$ has a root in the field $E_P=\mathbf Q_\ell[T]/(P)$. By compactness of $\mathbf Z_\ell$, the set of all finite subextensions of given degree of $\overline{\mathbf Q_\ell}$ is finite. QED.

Let us now give the proof of the theorem. Let $(E_n)$ be an increasing sequence of finite subextensions of $\overline{\mathbf Q_\ell}$ such that $\overline{\mathbf Q_\ell}=\bigcup_n E_n$ (lemma 2; take for $E_n$ the subfield generated by $E_{n-1}$ and all the subextensions of degree $n$ of $\overline{\mathbf Q_\ell}$). Then $G_n=G\cap \mathop{\rm GL}_d(E_n)$ is a closed subgroup of $G$, and $G$ is the increasing union of all $G_n$. By lemma 1, there exists an integer $n$ such that $G_n=G$. QED.
 

Monday, May 11, 2015

Model theory and algebraic geometry, 3 — Real closed fields and o-minimality

In this third post devoted to some interactions between model theory and algebraic geometry, we describe the concept of o-minimality and the o-minimal complex analysis of Peterzil and Starchenko.

1. Real closed fields and the theorem of Tarski-Seidenberg

To begin with, we work in the language $L_{\mathrm{or}}$ of ordered rings which is the language of rings $L_{\mathrm r}=\{+,-,\cdot,0,1\}$ enlarged with an order relation $\leq$.

Let us recall the definition of a real closed field: this is a field $K$ endowed with an ordering which is compatible with the field laws (the sum of positive elements is positive and the product of positive elements is positive) which satisfies the intermediate value theorem for polynomials: for every polynomial $P\in K[T]$ and any pair $(a,b)$ of elements of $K$ such that $a<b$, $P(a)<0$ and $P(b)>0$, there exists $c\in K$ such that $P(c)=0$ and $a<c<b$. Observe that this property can be expressed by a sequence of first-order formulas, one for each degree.

The field $\mathbf R$ of real numbers is real closed, but there are many others. For example, the field of formal Puiseux series with real coefficients is also real closed.

A theorem of Artin-Schreier asserts that a field $K$ is real closed if and only if $\sqrt{-1}\not\in K$ and $K(\sqrt{-1})$ is an algebraic closure of $K$. This is also equivalent to the fact that “the” algebraic closure of $K$ is a finite non-trivial extension of $K$. While the algebraic notion adapted to the language of rings is that of an algebraically closed field, the notion of a real closed field is the one which is adapted to the language of ordered rings. In model theoretic terms, the theory of real closed fields is the model companion of the theory of ordered fields.

The analogue of the theorem of Chevalley is the classical theorem of Tarski-Seidenberg:

Theorem (Tarski-Seidenberg). — The theory of real closed fields eliminates quantifiers in the language of ordered rings.

There is a very classical example of this theorem, namely, the resolution of polynomial equations of degree 2. Indeed, in a real closed field, every positive element has a square root (if $a>0$, then the polynomial $T^2-a$ is negative at $0$ and positive at $a+1$, so that it admits a positive root). The usual algebraic computation thus shows that the formula $\exists x, x^2+ax+b=0$ is equivalent to the formula $a^2-4b\geq 0$.
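The “usual algebraic computation” is the completion of the square, which makes sense in any real closed field since $2$ is invertible there:
\[ x^2+ax+b = \left(x+\frac a2\right)^2 - \frac{a^2-4b}4. \]
Hence $x^2+ax+b=0$ has a solution if and only if $(a^2-4b)/4$ is a square, and in a real closed field the squares are exactly the elements $\geq 0$. The quantifier $\exists x$ has thus been traded for the quantifier-free condition $a^2-4b\geq 0$.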

Corollary 1. — If $M$ is a real closed field and $A$ is a subset of $M$, then $\mathop{\rm Def}(M^n,A)$ is the set of all semi-algebraic subsets of $M^n$ defined by polynomials with coefficients in $A$.

Corollary 2. — If $M$ is a real closed field, the definable subsets of $M$ are the finite unions of intervals (open, closed or half-open, $\mathopen]a;b\mathclose[$, $\mathopen]a;b]$, $[a;b\mathclose[$, $[a;b]$, possibly unbounded, possibly reduced to singletons).

2. O-minimality

The seemingly innocuous property stated in corollary 2 leads to a definition which is surprisingly important and powerful.

Definition. — Let $T$ be the theory of a real closed field $M$ in an expansion $L$ of the language of ordered rings. One says that $T$ is o-minimal if the definable subsets of $M$ are the finite unions of intervals.

It is a non-trivial result that the o-minimality is indeed a property of the theory $T$, and not a property of the model $M$: if it holds, then for every elementary extension $N$ of $M$, the definable subsets of $N$ still are finite unions of intervals.

By the theorem of Tarski-Seidenberg, the theory of real closed fields is o-minimal. The discovery of more complicated o-minimal theories is a remarkable fact from the 80s.

Example. — Let $L_{\mathrm{an},\mathrm{exp}}$ be the language obtained by adjoining to the language $L_{\mathrm{or}}$ of ordered rings symbols of functions $\exp$ and $f$, for every real analytic function $f\colon [0;1]^n\to\mathbf R$. The field of real numbers is viewed as a structure for this language by interpreting $\exp$ as the exponential function from $\mathbf R$ to $\mathbf R$, and every function symbol $f$ as the function from $\mathbf R^n$ to $\mathbf R$ that maps $x$ to $f(x)$ if $x\in [0;1]^n$, and to $0$ otherwise. The theory (denoted $\mathbf R_{\mathrm{an},\mathrm{exp}}$) of $\mathbf R$ in this language is o-minimal.

This is a theorem of van den Dries and Miller; the case of $L_{\mathrm{an}}$ (without the exponential function) had been established by Denef and van den Dries, while the case of $L_{\mathrm{exp}}$ is due to Wilkie.

To give a non-example, let us consider the language obtained by adjoining a symbol $\sin$ and view $\mathbf R$ as a structure for this language, the symbol $\sin$ being interpreted as the sine function from $\mathbf R$ to $\mathbf R$. Then the theory of $\mathbf R$ in this language is not o-minimal. Indeed, the set $\pi\mathbf Z$ is definable by the formula $\sin(x)=0$, but $\pi\mathbf Z$ has infinitely many connected components, so is not a finite union of intervals.

One motivation for o-minimality is that it realizes (part of) Grothendieck's quest towards tame topology as described in his Esquisse d'un programme. Indeed, sets which are definable in an o-minimal structure have many tameness properties:
  • The interior, the closure, the boundary of a definable set is definable.
  • Every definable set is homeomorphic to (the topological realization) of a simplicial complex
  • Every definable set has a celllular decomposition. Precisely, let us call a cell of $\mathbf R^{n+1}$ any subset $C$ of the following form: one is given a definable subset $A$ of $\mathbf R^n$ and definable functions $f,g\colon A\to\mathbf R$ such that $f(x)<g(x)$ for every $x\in A$, and the set $C$ is defined by the condition $x\in A$, and by one of the conditions $t<f(x)$, or $t=f(x)$, or $f(x)<t<g(x)$, or $t>f(x)$.  Then for every finite family $(B_i)$ of definable subsets of $\mathbf R^{n+1}$, there is a finite partition of $\mathbf R^{n+1}$ into cells such that every $B_i$ is a union of cells.
  • Every definable function is piecewise smooth.
  • Definable continuous functions are definably piecewise trivial (theorem of Hardt): for every function $f\colon X\to Y$ between definable sets which is definable and continuous, there is a finite partition $(Y_i)$ of $Y$ into definable subsets such that the map $f_i\colon f^{-1}(Y_i)\to Y_i$ deduced from $f$ by restriction is isomorphic to a projection $Y_i\times S_i\to Y_i$.
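To make the notion of cell decomposition from the list above concrete, here is a minimal Python sketch for one hand-picked example: a partition of $\mathbf R^2$ into 17 cells adapted to the closed unit disc. The base cells of $\mathbf R$ are $x<-1$, $x=-1$, $-1<x<1$, $x=1$, $x>1$; above $-1<x<1$ the functions are $f(x)=-\sqrt{1-x^2}$ and $g(x)=\sqrt{1-x^2}$, and above the other base cells the constant function $0$ is used (an arbitrary choice made for this illustration). The function name `cell_of` and the labels are hypothetical, and exact rational arithmetic is used so that the equalities $t=f(x)$ can be tested without rounding.

```python
from fractions import Fraction

def cell_of(x, t):
    """Label of the cell containing the point (x, t) of R^2."""
    if x < -1 or x == -1 or x == 1 or x > 1:
        if x < -1:
            base = "x<-1"
        elif x == -1:
            base = "x=-1"
        elif x == 1:
            base = "x=1"
        else:
            base = "x>1"
        # Above these base cells, split by the constant function 0.
        pos = "t<0" if t < 0 else ("t=0" if t == 0 else "t>0")
        return (base, pos)
    # Here -1 < x < 1, so r > 0 and f(x) = -sqrt(r) < g(x) = sqrt(r).
    r = 1 - x * x
    if t * t < r:
        return ("-1<x<1", "f<t<g")
    if t * t == r:
        return ("-1<x<1", "t=f" if t < 0 else "t=g")
    return ("-1<x<1", "t<f" if t < 0 else "t>g")

# Check on a rational grid that the closed unit disc is a union of cells:
# membership in the disc must be constant on each cell.
grid = [Fraction(k, 5) for k in range(-10, 11)]
membership = {}
for x in grid:
    for t in grid:
        in_disc = x * x + t * t <= 1
        membership.setdefault(cell_of(x, t), set()).add(in_disc)

adapted = all(len(values) == 1 for values in membership.values())
print(len(membership), adapted)  # 17 True
```

The grid check is only a sanity test on finitely many points, not a proof; the point is that each of the 17 labels is a cell in the sense above, the cells "t=g" and "t>g" being obtained by letting $g$ play the role of $f$.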

Recently, o-minimality has had spectacular applications via the approach of Pila-Zannier to the conjecture of Pink, leading to new proofs of the Manin-Mumford conjecture (Pila-Zannier), and to proofs of the André-Oort conjecture (Pila, Pila-Tsimerman, Klingler-Ullmo-Yafaev), and, more recently, to partial results towards the conjecture of Pink (Gao, Habegger-Pila,...). However, this is not the goal of this post, so let me refer the interested reader to Tom Scanlon's Bourbaki talk on that topic.

3. O-minimal complex analysis

The standard identification of the field $\mathbf C$ of complex numbers with $\mathbf R^2$ (associating with a complex number its real and imaginary parts) allows one to talk of complex valued functions (on a subset of $\mathbf C^n$) which are definable in a given language. In a remarkable series of papers, Peterzil and Starchenko have shown that holomorphic functions which are definable in an o-minimal structure possess very rigid properties. Let us quote some of their theorems.

So we fix an expansion of the language $L_{\mathrm{or}}$ of which the field $\mathbf R$ is a structure whose theory is o-minimal. By “definable”, we mean definable in that language. The typical language considered in the applications here is the language $L_{\mathrm{an},\mathrm{exp}}$.

Theorem. — Let $A$ be a finite subset of $\mathbf C$ and let $f\colon \mathbf C\setminus A\to \mathbf C$ be a holomorphic function. If $f$ is definable, then it is a rational function.

Theorem. — Let $V\subset\mathbf C^n$ be a closed analytic subset. If $V$ is definable, then $V$ is algebraic.

Corollary (Theorem of Chow). — Let $V\subset\mathbf P^n(\mathbf C)$ be a closed analytic subset. Then $V$ is algebraic.

Indeed, working on the standard charts of $\mathbf P^n(\mathbf C)$, we see that $V$ is locally definable by analytic functions. By compactness of $\mathbf P^n(\mathbf C)$, it is thus definable in the language $L_{\mathrm{an}}$. Since the theory of $\mathbf R$ in this language is o-minimal, the corollary is a consequence of the previous theorem.

Let us finally give an important example. Let $X$ be a bounded symmetric domain. This means that $X$ is a bounded open subset of $\mathbf C^n$ such that for every point $p\in X$, there exists a biholomorphic involution $f\colon X\to X$ such that $p$ is an isolated fixed point of $f$. This implies that $X$ is a homogeneous space $G/K$ under a semisimple Lie group $G$ which acts by holomorphisms, and $K$ is a maximal compact subgroup of $G$. Moreover, $X$ has a canonical Kähler metric which is invariant under $G$.

The most classical example is given by the Poincaré upper half-plane on which $\mathrm{PSL}(2,\mathbf R)$ acts by homographies; of course, the upper half-plane is not bounded, but is biholomorphic to the open unit disk.

A more sophisticated example is given by the Siegel upper half-plane or, rather, its bounded version. That is, $X$ is the set of $n\times n$ symmetric complex matrices $Z$ such that $\mathrm I_n-Z^* Z$ is positive definite. It is a homogeneous space for the symplectic group $\mathrm{Sp}(2n,\mathbf R)$; the stabilizer of $Z=0$ is the unitary group $U(n)$.

Let now $\Gamma$ be an arithmetic subgroup of $\mathrm{Sp}(2n,\mathbf R)$; for example, let us take $\Gamma$ to be a subgroup of finite index of $\mathrm{Sp}(2n,\mathbf Z)$. Then the quotient $S=X/\Gamma$ admits a structure of an analytic set and the projection $p\colon X\to S$ is an analytic map. If $\Gamma$ is “small enough” (torsion free, say), then $S$ is even a complex manifold, and $p$ is a covering. An important and difficult theorem of Baily-Borel asserts that $S$ is an algebraic variety.

In fact, it is classical in this context that there exist Siegel sets, which are explicit subsets $F$ of $X$ such that $\Gamma\cdot F=X$ and such that the set of $\gamma\in\Gamma$ such that $\gamma\cdot F\cap F\neq\emptyset$ is finite. So Siegel sets are almost fundamental domains. An important remark is that they are semi-algebraic, that is, definable in the language of ordered rings. For example in the upper half-plane, one may take $F$ to be the set of all $z\in\mathbf C$ such that $-\frac12\leq \Re(z)\leq \frac12$ and $\Im(z)\geq \sqrt 3/2$. One may even take “fundamental sets” (which are fundamental domains up to something of empty interior) such as the one defined by the inequalities $-\frac12\leq \Re(z)\leq\frac12$ and $\lvert z\rvert \geq1$.
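For the upper half-plane example, the fact that the translates of the fundamental set cover everything can be seen by the classical reduction procedure. The following Python sketch assumes that $\Gamma$ is the modular group, acting through the translations $z\mapsto z+n$ and the inversion $z\mapsto -1/z$ (standard generators, but not named in the text above); the function name `reduce_to_fundamental_set` is hypothetical, and the computation is done in floating point, so the result is only approximate near the boundary.

```python
def reduce_to_fundamental_set(z, max_steps=10_000):
    """Move z (with Im z > 0) into the set |Re z| <= 1/2, |z| >= 1.

    Only translations z -> z + n (n an integer) and the inversion
    z -> -1/z are used, so the result lies in the orbit of z.
    """
    if z.imag <= 0:
        raise ValueError("z must lie in the upper half-plane")
    for _ in range(max_steps):
        # Translate by an integer so that |Re z| <= 1/2.
        z = complex(z.real - round(z.real), z.imag)
        if abs(z) >= 1:
            return z
        # When |z| < 1, the inversion multiplies Im z by 1/|z|^2 > 1.
        z = -1 / z
    raise RuntimeError("no convergence within max_steps")

w = reduce_to_fundamental_set(complex(5.3, 0.01))
print(abs(w.real) <= 0.5, abs(w) >= 1)  # True True
```

The loop terminates because each inversion strictly increases the imaginary part; this is the usual argument showing that every orbit meets the set defined by $-\frac12\leq \Re(z)\leq\frac12$ and $\lvert z\rvert \geq1$.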

Peterzil and Starchenko have proved that the restriction to $F$ of the projection $p$ is definable in the language $L_{\mathrm{an},\mathrm{exp}}$. An immediate consequence is that $S$ is definable in this language, hence is algebraic.

These results have been generalized by Klingler, Ullmo and Yafaev to any bounded symmetric domain. This is an important technical part of their proof of the hyperbolic Ax-Lindemann conjecture.

Link to Part 4 — Elimination of imaginaries

Saturday, May 2, 2015

Model theory and algebraic geometry, 2 — Definable sets, types; quantifier elimination

This is the second post in a series of 4 devoted to the exposition of interactions between model theory and algebraic geometry. In the first one, I explained the notions of language, structures and theories, with examples taken from algebra. Here, I shall discuss the notion of definable set, of types, as well as basic results from dimension theory ($\omega$-stability).

So we fix a theory $T$ in a language $L$. A definable set is defined, in a given model $M$ of $T$, by a formula. More precisely, we consider definable sets in cartesian powers $M^n$ of the model $M$, which can be defined by a formula in $n$ free variables with parameters in some subset $A$ of $M$. By definition, such a formula is a formula of the form $\phi(x;a)$, where $\phi(x;y)$ is a formula in $n+m$ free variables, split into two groups $x=(x_1,\dots,x_n)$ and $y=(y_1,\dots,y_m)$ and $a=(a_1,\dots,a_m)\in A^m$ is an $m$-tuple of parameters; the formula $\phi(x;y)$ can have quantifiers and bound variables too. Given such a formula, we define a subset $[\phi(x;a)]$ of $M^n$ by $\{ x\in M^n\mid \phi(x;a)\}$. We write $\mathrm{Def}(M^n;A)$ for the set of all subsets of $M^n$ which are definable with parameters in $A$.

Let us give examples, where $L$ is the language of rings and $T$ is the theory $\mathrm{ACF}$ of algebraically closed fields:
  • $V_1=\{x\mid x\neq 0 \}\subset M $, given by the formula “$x\neq 0$” with 1 variable and $0$ parameter;
  • $V_2=\{x\mid \exists y, 2xy=1\} \subset M $, given by the formula “$\exists y, 2xy=1$” with 1 free variable $x$, and one bound variable $y$;
  • $V_3=\{(x,y)\mid x^2+\sqrt 2 y^2=\pi \}\subset \mathbf C^2$, where the model $\mathbf C$ is the field of complex numbers, $\phi((x,y),(a,b))$ is the formula $x^2+ay^2=b$ in 4 free variables, and the parameters are given by $(a,b)=(\sqrt 2,\pi)$.

Theorem (Chevalley). — Let $L$ be the language of rings, $T=\mathrm{ACF}$ and $M$ be an algebraically closed field; let $A$ be a subset of $M$. The set $\mathrm{Def}(M^n;A)$ is the smallest boolean algebra of subsets of $M^n$ which contains all subsets of $M^n$ of the form $[P(x;a)=0]$ where $P$ is a polynomial in $n+m$ variables with coefficients in $\mathbf Z$ and $a=(a_1,\dots,a_m)$ is an $m$-tuple of elements of $A$. In other words, a subset of $M^n$ is definable with parameters in $A$ if and only if it is constructible with parameters in $A$.

The reason behind this theorem is the following set-theoretic interpretation of quantifiers and logical connectors. Precisely, if $\phi$ is a formula in $n+m+p$ variables, and $a\in A^p$, the definable subset $[\exists y \phi(x,y;a)]$ of $M^n$ coincides with the image of the definable subset $[\phi(x,y;a)]$ of $M^{n+m}$ under the projection $p_x \colon M^{n+m}\to M^n$. Similarly, if $\phi(x)$ and $\psi(x)$ are two formulas in $n$ free variables, then the definable subset $[\phi(x)\wedge\psi(x)]$ is the intersection of the definable subsets $[\phi(x)]$ and $[\psi(x)]$. And if $\phi(x)$ is a formula in $n$ variables, then the definable subset $[\neg\phi(x)]$ is the complement in $M^n$ of the definable subset $[\phi(x)]$.

For example, the subset $V_2=[\exists y, 2xy=1]$ defined above can also be defined by $M\setminus [2x=0]$.
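The set-theoretic reading of quantifiers and connectors can be checked mechanically in a small structure. The Python sketch below uses the finite field $\mathbf F_7$ as a stand-in for the model $M$; this is an assumption made only so that everything is finite, since $\mathbf F_7$ is a field but is not algebraically closed. It computes $[\exists y, 2xy=1]$ as a projection and compares it with the complement of $[2x=0]$, then checks that a conjunction defines an intersection.

```python
# Toy structure: the finite field F_7 (illustrative choice; a field,
# but NOT an algebraically closed one).
P = 7
M = range(P)

# [2xy = 1], a subset of M^2.
graph = {(x, y) for x in M for y in M if (2 * x * y) % P == 1}

# [exists y, 2xy = 1] is the image of [2xy = 1] under (x, y) -> x.
exists_y = {x for (x, y) in graph}

# [not (2x = 0)] is the complement in M of [2x = 0].
two_x_zero = {x for x in M if (2 * x) % P == 0}
complement = set(M) - two_x_zero

print(sorted(exists_y))        # [1, 2, 3, 4, 5, 6]
print(exists_y == complement)  # True

# A conjunction defines the intersection of the two definable sets.
phi = {x for x in M if (x * x) % P == 1}                  # [x^2 = 1]
psi = {x for x in M if (x * x * x) % P == 1}              # [x^3 = 1]
both = {x for x in M if (x * x) % P == 1 and (x ** 3) % P == 1}
print(both == phi & psi)       # True
```

In this toy case the quantifier can be removed by hand; the content of Chevalley's theorem is that over an algebraically closed field this is always possible.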

One says that the theory ACF admits elimination of quantifiers: modulo the axioms of algebraically closed fields, every formula of the language $L$ is equivalent to a formula without quantifiers.

An important consequence of this property is that for every extension $M\hookrightarrow M'$ of models of ACF, the theory of $M'$ is equal to the theory of $M$—one says that every extension of models is elementary.

Let $p$ be either $0$ or a prime number. Observe that every algebraically closed field of characteristic $p$ is an extension of $\overline{\mathbf Q}$ if $p=0$, or of $\overline{\mathbf F_p}$ if $p$ is a prime number. As a consequence, for every characteristic $p\geq0$, the theory $\mathrm{ACF}_p$ of algebraically closed fields of characteristic $p$ (defined by the axioms of $\mathrm{ACF}$, and  the axiom $1+1+\dots+1=0$ that the characteristic is $p$ if $p$ is a prime number, or the infinite list of axioms that assert that the characteristic is $\neq \ell$, if $p=0$) is complete: this list of axioms determines everything that can be said about algebraically closed fields of characteristic $p$.

Definition. — Let $a\in M^n$ and let $A$ be a subset of $M$. The type of $a$ (with parameters in $A$) is the set $\mathrm{tp}(a/A)$ of all formulas $\phi(x;b)$ in $n$ free variables with parameters in $A$ such that $\phi(a;b)$ holds in the model $M$.

Definition. — Let $A$ be a subset of $M$. For every integer $n\geq 0$, the set $S_n(A)$ of types (with parameters in $A$) is the set of all types $\mathrm{tp}(a/A)$, where $N$ is an extension of $M$ which is a model of $T$ and $a\in N^n$. One then says that this type is realized in $N$.

Gödel's completeness theorem allows us to give an alternative description of $S_n(A)$. Namely, let $p$ be a set of formulas in $n$ free variables and parameters in $A$ which contains the diagram of $A$ (that is, all formulas which involve only elements of $A$ and are true in $M$). Assume that $p$ is consistent (there exists a model $N$ which is an extension of $M$ and an element $a\in N^n$ such that $\phi(a)$ holds in $N$ for every $\phi\in p$) and maximal (for every formula $\phi\not\in p$, for every model $N$ and every $a\in N^n$ such that $p\subset \mathrm{tp}(a/A)$, the formula $\phi(a)$ does not hold). Then $p\in S_n(A)$.

For every formula $\phi\in L(A)$ in $n$ free variables and parameters in $A$, let $V_\phi$ be the set of types $p\in S_n(A)$ such that $\phi\in p$. Then the subsets $V_\phi$ of $S_n(A)$ constitute a basis of open sets for a natural topology on $S_n(A)$.

Theorem. — The topological space $S_n(A)$ is compact and totally disconnected.

Let us detail the case of the theory ACF in the language of rings. I claim that if $K$ is a field, then $S_n(K)$ is homeomorphic to the spectrum $\mathop{\rm Spec}(K[T_1,\dots,T_n])$ endowed with its constructible topology. Concretely, for every algebraically closed extension $M$ of $K$ and every $a\in M^n$, the homeomorphism $j$ maps $\mathrm{tp}(a/K)$ to the prime ideal $\mathfrak p_a$ consisting of all polynomials $P\in K[T_1,\dots,T_n]$ such that $P(a)=0$.

A type $p=\mathrm{tp}(a/K)$ is isolated if and only if the prime ideal $\mathfrak p_a$ is maximal. Consequently, if $n=1$, there is exactly one non-isolated type in $S_1(K)$, corresponding to the generic point of the spectrum $\mathop{\rm Spec}(K[T])$.

As for any compact topological space, a space of types can be studied via its Cantor-Bendixson analysis, which is a decreasing sequence of subspaces, indexed by ordinals, defined by transfinite induction. First of all, for every topological space $X$, one denotes by $D(X)$ the set of all non-isolated points of $X$. One then defines $X_0=X$, $X_{\alpha}=D(X_\beta)$ if $\alpha=\beta+1$ is a successor-ordinal, and $X_\alpha=\bigcap_{\beta<\alpha} X_\beta$ if $\alpha$ is a limit-ordinal. For $x\in X$, the Cantor-Bendixson rank of $x$ is defined by $r_{CB}(x)=\alpha$ if $x\in X_\alpha$ and $x\not\in X_\beta$ for $\beta>\alpha$, and $r_{CB}(x)=\infty$ if $x\in X_\alpha$ for every ordinal $\alpha$. The set of points of infinite rank is the largest perfect subset of $X$.

Let us return to the example of the theory ACF. If a type $p\in S_n(K)$ corresponds to a prime ideal $\mathfrak p=j(p)$ of $\mathop{\rm Spec}(K[T_1,\dots,T_n])$, its Cantor-Bendixson rank is the Zariski dimension of $V(\mathfrak p)$. More generally, if $F$ is a constructible subset of $\mathop{\rm Spec}(K[T_1,\dots,T_n])$, then $r_{CB}(F)$ is the Zariski dimension of the Zariski closure of $F$. Moreover, the points of maximal Cantor-Bendixson rank correspond to the generic points of the irreducible components of maximal dimension; in particular, there are only finitely many of them.

Definition. — One says that a theory $T$ is $\omega$-stable if for every finite or countable set of parameters $A$, the space of 1-types $S_1(A)$ is finite or countable.

The theory ACF is $\omega$-stable. Indeed, if $K$ is the field generated by $A$, then $K[T]$ being a countable noetherian ring, it has only countably many prime ideals.

Since any non-empty perfect set is uncountable, one has the following lemma.

Lemma. — Let $T$ be an $\omega$-stable theory and let $M$ be a model of $T$. Then the Cantor-Bendixson rank of every type $x\in S_n(M)$ is finite.

Let us assume that $T$ is $\omega$-stable and let $F$ be a closed subset of $S_n(M)$. Then $r_{CB}(F)=\sup \{ r_{CB}(x)\,;\, x\in F\}$ is finite, and the set of points $x\in F$ such that $r_{CB}(x)=r_{CB}(F)$ is finite and non-empty.

This example gives a strong indication that the model theory approach may be extremely fruitful for the study of algebraic theories whose geometry is not as well developed as algebraic geometry.

Link to Part 3 — Real closed fields and o-minimality

Sunday, December 8, 2013

Homotopy type theory on Images des mathématiques

This post will be a short advertisement to a longer general audience text about homotopy type theory that I published on the website Images des mathématiques.

In this text, I try to convey my excitement at the reading of the book published by the participants of last year's IAS program, under the direction of Steve Awodey, Thierry Coquand and Vladimir Voevodsky. As I write there (this is the title of this article), this remarkable work is at the crossroads of foundations of mathematics, topology and computer science. Indeed, the new foundational setup for mathematics provided by type theory may not only replace set theory; it is also at the heart of the systems for computer proof checking, and gave birth to a new kind of “synthetic homotopy theory” which is totally freed of the general topology framework.

Also remarkable is the way this book was produced: written collaboratively, using technology well known in open source software development, then published under a Creative Commons license, and printed on demand.

This is not the only general audience paper on this subject, and probably not the last one either. Here are links to those I know of:
Once more, here is the link towards my article on Images des mathématiques and that towards the HoTT Book!

Friday, March 8, 2013

A presheaf that has no associated sheaf

In his paper Basically bounded functors and flat sheaves (Pacific J. Math., vol. 57, no. 2, 1975, p. 597-610), William C. Waterhouse gives a nice example of a presheaf that has no associated sheaf. This is Theorem 5.5 (page 605). I thank François Loeser for having indicated this paper to me, and for his suggestion of explaining it here!


Of course, such a beast is reputed not to exist, since it is well known that any presheaf has an associated sheaf, see for example Godement's book Topologie algébrique et théorie des faisceaux, pages 110-111.
That is, for any presheaf $F$ on a topological space, there is a sheaf $G$ with a morphism of presheaves $\alpha\colon F\to G$ which satisfies a universal property: any morphism from $F$ to a sheaf factors uniquely through $\alpha$.


Waterhouse's presheaf is a more sophisticated example of a presheaf, since it is a presheaf on the category of affine schemes for the flat topology. Thus, a presheaf $F$ on the category of affine schemes is the datum, 

  • of a set $F(A)$ for every ring $A$, 
  • and of a map $\phi_*\colon F(A)\to F(B)$ for every morphism of rings $\phi\colon A\to B$,

subject to the following conditions:

  • if $\phi\colon A\to B$ and $\psi\colon B\to C$ are morphisms of rings, then $(\psi\circ\phi)_*=\psi_*\circ\phi_*$;
  • one has $({\rm id}_A)_*={\rm id}_{F(A)}$ for every ring $A$.

Any morphism of rings $\phi\colon A\to B$ gives rise to two morphisms $\psi_1,\psi_2\colon B\to B\otimes_A B$ respectively defined by $\psi_1(b)=b\otimes 1$ and $\psi_2(b)=1\otimes b$, and the two compositions $A\to B\to B\otimes_A B$ are equal. Consequently, for any presheaf $F$, the two associated maps $F(A) \to F(B) \to F(B\otimes_A B)$ are equal.

By definition, a presheaf $F$ is a sheaf for the flat topology if for any faithfully flat morphism of rings $\phi\colon A\to B$, the map ${\phi_*} \colon F(A)\to F(B)$ is injective and its image is the set of elements $g\in F(B)$ at which the two natural maps $(\psi_1)_*$ and $(\psi_2)_*$ from $F(B)$ to $F(B\otimes_A B)$ coincide.

Here is Waterhouse's example.

For every ring $A$, let $F(A)$ be the set of all locally constant functions $f$ from $\mathop{\rm Spec}(A)$ to some von Neumann cardinal such that $f(\mathfrak p)<\mathop{\rm Card}(\kappa(\mathfrak p))$ for every $\mathfrak p\in\mathop{\rm Spec}(A)$.

This is a presheaf. Indeed, let $\phi\colon A\to B$ be a ring morphism, and let $\phi^a\colon\mathop{\rm Spec}(B)\to \mathop{\rm Spec}(A)$ be the associated continuous map on spectra. For $f\in F(A)$, the composition $f\circ\phi^a$ is a locally constant map from ${\rm Spec}(B)$ to some von Neumann cardinal. Moreover, for every prime ideal $\mathfrak q$ in $B$, with inverse image $\mathfrak p=\phi^{-1}(\mathfrak q)=\phi^a(\mathfrak q)$, the morphism $\phi$ induces an injection from the residue field $\kappa(\mathfrak p)$ into $\kappa(\mathfrak q)$, so that $f(\phi^a(\mathfrak q))<\mathop{\rm Card}(\kappa(\mathfrak p))\leq\mathop{\rm Card}(\kappa(\mathfrak q))$; thus $f\circ\phi^a$ satisfies the additional condition on $F$, hence $f\circ\phi^a\in F(B)$.

However, this presheaf has no associated sheaf for the flat topology. The proof is by contradiction. So assume that $G$ is a sheaf and $\alpha\colon F\to G$ satisfies the universal property.

First of all, we prove that the morphism $\alpha$ is injective: for any ring $A$, the map $\alpha_A\colon F(A)\to G(A)$ is injective. For any cardinal $c$ and any ring $A$, let $L_c(A)$ be the set of locally constant maps  from ${\rm Spec}(A)$ to $c$. Then $L_c$ is a presheaf, and in fact a sheaf. There is a natural morphism of presheaves $\beta_c\colon F\to L_c$, given by $\beta_c(f)(\mathfrak p)=f(\mathfrak p)$ if $f(\mathfrak p)\in c$, that is, $f(\mathfrak p)<c$, and $\beta_c(f)(\mathfrak p)=0$ otherwise. Consequently, there is a unique morphism of sheaves $\gamma_c\colon G\to L_c$ such that $\beta_c=\gamma_c\circ\alpha$. For any ring $A$, and any large enough cardinal $c$, the  map $\beta_c(A)\colon F(A)\to L_c(A)$ is injective. In particular, the map $\alpha(A)$ must be injective.

Let $B$ be a ring and $\phi\colon A\to B$ be a faithfully flat morphism. Let $\psi_1,\psi_2\colon B\to B\otimes_A B$ be the two natural morphisms of rings defined above. Then, the equalizer $E(A,B)$ of the two maps $(\psi_1)_*$ and $(\psi_2)_*$ from $F(B)$ to $F(B\otimes_A B)$ must inject into the equalizer of the two corresponding maps from $G(B)$ to $G(B\otimes_A B)$. Consequently, one has an injection from $E(A,B)$ to $G(A)$.

The contradiction will become apparent once one can find rings $B$ for which $E(A,B)$ has a cardinality as large as desired. If ${\rm Spec}(B)$ is a point $\mathfrak p$, then $F(B)$ is just the set of functions $f$ from the point $\mathfrak p$ to some von Neumann cardinal $c$ such that $f(\mathfrak p)<{\rm Card}(\kappa(\mathfrak p))$. That is, $F(B)$ is the cardinal ${\rm Card}(\kappa(\mathfrak p))$ itself. And since ${\rm Spec}(B)$ is a point, the coincidence condition is necessarily satisfied, so that $E(A,B)= {\rm Card}(\kappa(\mathfrak p))$, and the injection from $E(A,B)$ to $G(A)$ gives ${\rm Card}(\kappa(\mathfrak p))\leq {\rm Card}(G(A))$.

To conclude, it suffices to take a faithfully flat morphism $A\to B$ such that $B$ is a field of cardinality strictly greater than that of $G(A)$. For example, one can take $A$ to be a field and $B$ the field of rational functions in many indeterminates (strictly more than the cardinality of $G(A)$).

What does this example show? Why isn't there a contradiction in mathematics (yet)?

Because the definition of sheaves and presheaves for the flat topology that I gave above was definitely defective: it neglects in a too dramatic way the set theoretical issues that one must tackle to define sheaves on categories. In the standard setting of set theory provided by ZFC, everything is a set. In particular, categories, presheaves, etc. are sets or maps between sets (themselves represented by sets).  But the presheaf $F$ that Waterhouse defines does not exist as a set, since there does not exist a set $\mathbf{Ring}$ of all rings, nor a set $\mathbf{card}$ of all von Neumann cardinals.

The usual way (as explained in SGA 4) to introduce sheaves for the flat topology consists in adding the axiom of universes: there exists a set $\mathscr U$ which is a model of set theory. Then, one does not consider the (nonexistent) set of all rings, or cardinals, but only those belonging to the universe $\mathscr U$; one talks of $\mathscr U$-categories, $\mathscr U$-(pre)sheaves, etc. In that framework, the $\mathscr U$-presheaf $F$ defined by Waterhouse (where one restricts oneself to algebras and von Neumann cardinals in $\mathscr U$) has an associated sheaf $G_{\mathscr U}$. But this sheaf depends on the chosen universe: if $\mathscr V$ is a universe containing $\mathscr U$, the restriction of $G_{\mathscr V}$ to algebras in $\mathscr U$ will no longer be a $\mathscr U$-presheaf.

Wednesday, December 5, 2012

Product of two quotient spaces - a frequent and (in)famous mistake

The first edition of Bourbaki's General Topology (chapter I, §9, p. 56) contains the following theorem.

Proposition 3. Let $E$, $F$ be two topological spaces, $R$ an equivalence relation on $E$, $S$ an equivalence relation on $F$. The canonical map from the product space $(E/R) \times (F/S)$ onto the quotient space $(E\times F)/(R\times S)$ is a homeomorphism.

It is followed by a very convincing proof. However, the theorem is wrong. The subsequent editions give an example where the spaces are not homeomorphic, even when one of the equivalence relations is equality.

I finally understood where the mistake is. It is in the very statement! Indeed, there is a canonical map, say $h$, between those two spaces, but it goes the other way round, namely from $(E\times F)/(R\times S)$ to $(E/R)\times (F/S)$. This map is continuous, as it should be. But Bourbaki, assuming that the natural canonical map goes the other way round, claimed that $h^{-1}$ is continuous, and embarked on proving that its reciprocal bijection, $h$, is also continuous, which it is...

There are cases where one would like this theorem to hold, for example when one discusses topologies on the fundamental group. Indeed, the fundamental group of a pointed space $(X,x)$ is a quotient of the space of loops based at $x$ on $X$ for the pointed-homotopy relation, hence can be endowed with the quotient of the topology of compact convergence (roughly, uniform convergence on compact sets). Multiplication of loops is continuous. However, the resulting group law on $\pi_1(X,x)$ need not be.

The mistake appears in the recent literature, see for example this paper, or that one (which has even been featured as «best AMM paper of the year» in 2000...). MathSciNet is not aware of the flaws in those papers... Fortunately, MathOverflow is!

Sunday, November 18, 2012

The van Kampen Theorem

Let me recall the statement of this theorem.

Theorem. Let $X$ be a topological space, let $U,V$ be connected open subsets of $X$ such that $X=U\cup V$ and $W=U\cap V$ is connected, and let $x$ be a point of $W$. Then, the fundamental group $\pi_1(X,x)$ is the amalgamated product $\pi_1(U,x) *_{\pi_1(W,x)} \pi_1(V,x)$, that is, the quotient of the free product of the groups $\pi_1(U,x)$ and $\pi_1(V,x)$ by the normal subgroup generated by the elements of the form $i_U(c)i_V(c)^{-1}$, for $c\in\pi_1(W,x)$ (viewed in $\pi_1(U,x)$ and in $\pi_1(V,x)$ via the morphisms induced by the inclusions of $W$), where $i_U$ and $i_V$ are the natural injections from the groups $\pi_1(U,x)$ and $\pi_1(V,x)$ respectively in their free product.

The classical proof of this result in topology books decomposes a loop at $x$ as a product of loops at $x$ which are either contained in $U$, or in $V$.

(In fact, van Kampen proves a theorem which is quite different at first sight.)

It has long been recognized that a completely different approach is possible, from which all loops are totally absent. For this proof we make a supplementary assumption, namely that our spaces are « semi-locally simply connected »: any point $a$ of $X$ has a neighborhood $A$ such that the morphism $\pi_1(A,a)\to \pi_1(X,a)$ is trivial.

When $X$ is a connected semi-locally simply connected space together with a point $x$, the theory of the fundamental group is related to the theory of coverings, in the form of an equivalence of categories between coverings of $X$ and sets with an action of $\pi_1(X,x)$. The equivalence of categories is explicit; it maps a covering $p\colon Y\to X$ to the fiber $p^{-1}(x)$ on which $\pi_1(X,x)$ acts naturally via the path-lifting property of coverings (given $y\in p^{-1}(x)$, any loop $c$ at $x$ lifts uniquely to a path with origin $y$, the endpoint of which is $c\cdot y$).

Given this equivalence, one can prove the van Kampen Theorem very easily in two steps. First of all, one observes that giving a covering of $X$ is equivalent to giving a covering of $U$ and a covering of $V$ together with an identification of these coverings above $W$. A covering of $U$ corresponds to a set $A$ with an action of $\pi_1(U,x)$; a covering of $V$ corresponds to a set $B$ with an action of $\pi_1(V,x)$; an identification of these coverings above $W$ corresponds to a bijection from $A$ to $B$ which is compatible with the two actions of $\pi_1(W,x)$ acting on $A$ via the morphism $\pi_1(W,x)\to \pi_1(U,x)$ and on $B$ via the morphism $\pi_1(W,x)\to \pi_1(V,x)$. It is harmless to assume that $A=B$ and that the bijection from $A$ to $B$ is the identity. Now, a covering of $X$ corresponds to a set $A$ together with two actions of the groups $\pi_1(U,x)$ and $\pi_1(V,x)$ such that the two actions of $\pi_1(W,x)$ are equal. This is precisely the same as a set $A$ together with an action of the amalgamated product $\pi_1(U,x)*_{\pi_1(W,x)}\pi_1(V,x)$. QED.

The same proof applies and allows one to describe the fundamental group of a union of spaces in more general contexts. For example, let us use the same method to understand the fundamental group of the circle $\mathbf S_1$. It is clear that a circle is nothing but an interval $[0,1]$ of which the two endpoints are glued, and a covering of the circle corresponds to a covering $p\colon X\to [0,1]$ of the interval $[0,1]$ together with an identification of the fibers at $0$ and $1$. Now, a covering of the interval can be written as a product $A\times [0,1]$ (where $A$ is the fiber at $0$, say). Consequently, identifying the fibers at $0$ and $1$ means giving yourself a bijection of $A$ to $A$. In other words, a covering of the circle « is » a set $A$ together with a permutation of $A$, that is, a set $A$ with an action of the additive group $\mathbf Z$. Moreover, the obvious loop is the image by the glueing map $[0,1]\to\mathbf S_1$ of the obvious path joining $0$ to $1$ so that this loop is the generator of $\pi_1(\mathbf S_1,p(0))$.
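The dictionary between coverings of the circle and sets with a permutation is easy to experiment with when the fiber is finite. In the following Python sketch, the fiber $A=\{0,\dots,4\}$ and the permutation `sigma` are hypothetical data chosen for illustration, and the function names `cycles` and `act` are mine. The action of $n\in\mathbf Z$ is the $n$-th power of the permutation, and the orbits of this action are the cycles; under the dictionary, each cycle of length $d$ corresponds to a connected component of the covering, with $d$ sheets.

```python
def cycles(perm):
    """Cycle decomposition of a permutation given as a dict a -> perm[a]."""
    seen, result = set(), []
    for start in perm:
        if start in seen:
            continue
        cycle, a = [], start
        while a not in seen:
            seen.add(a)
            cycle.append(a)
            a = perm[a]
        result.append(cycle)
    return result

def act(perm, n, a):
    """Action of the integer n on a: apply perm n times (its inverse if n < 0)."""
    if n < 0:
        perm = {image: source for source, image in perm.items()}
        n = -n
    for _ in range(n):
        a = perm[a]
    return a

# Hypothetical fiber A = {0, ..., 4} with gluing permutation sigma.
sigma = {0: 1, 1: 2, 2: 0, 3: 4, 4: 3}
components = cycles(sigma)
print(components)                            # [[0, 1, 2], [3, 4]]
print(act(sigma, 3, 0), act(sigma, -1, 0))   # 0 2
```

Here the covering has two connected components, with 3 and 2 sheets; going three times around the circle brings the point 0 of the fiber back to itself.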

Observe that the latter example is not an instance of the van Kampen Theorem. One could get it via a groupoid-version of van Kampen.

All of this is more or less explained in the following texts:

  • Adrien and Régine Douady, Algèbre et théories galoisiennes, Cassini 2005.
  • Ronald Brown, Topology and Groupoids, Booksurge Publishing, 2006.
  • I remember having read an old Bourbaki Tribu from the 50s by Cartan, Eilenberg and/or Weil explaining this, but I cannot find it anymore on the archive. :-(