Sunday, April 14, 2024

Evaluating the operator norms of matrices

Let $E$ and $F$ be normed vector spaces, over the real or complex numbers, and let $u\colon E\to F$ be a linear map. The continuity of $u$ is equivalent to the existence of a real number $c$ such that $\|u(x)\|\leq c \|x\|$ for every $x\in E$, and the least such real number is called the operator norm of $u$; we denote it by $\|u\|$. It defines a norm on the linear space $\mathscr L(E;F)$ of continuous linear maps from $E$ to $F$ and as such is quite important. When $E=F$, it is also related to the spectrum of $u$ and is implicitly at the heart of the Gershgorin criterion for the localization of eigenvalues.

$\gdef\R{\mathbf R}\gdef\norm#1{\lVert#1\rVert}\gdef\Abs#1{\left|#1\right|}\gdef\abs#1{\lvert#1\rvert}$

However, even in the simplest cases of matrices, its explicit computation is not trivial at all and, as we'll see, even less trivial than what is taught in algebra classes, as I learned by browsing Wikipedia when I wanted to prepare a class on the topic.

Since I'm more of an abstract kind of guy, I will use both languages, that of normed spaces and that of matrices, for the first one allows one to explain a few things at a more fundamental level. I'll make the translation, though. Also, to be specific, I'll work with real vector spaces.

So $E=\R^m$, $F=\R^n$, and linear maps in $\mathscr L(E;F)$ are represented by $n\times m$ matrices. There are plenty of interesting norms on $E$, but I will concentrate the discussion on the $\ell^p$-norm given by $\norm{(x_1,\dots,x_m)} = (|x_1|^p+\dots+|x_m|^p)^{1/p}$. Similarly, I will consider the $\ell^q$-norm on $F$ given by $\norm{(y_1,\dots,y_n)} = (|y_1|^q+\dots+|y_n|^q)^{1/q}$. Here, $p$ and $q$ are elements of $[1;+\infty\mathclose[$. It is also interesting to allow $p=\infty$ or $q=\infty$; in this case, the expression defining the norm is just replaced by $\sup(|x_1|,\dots,|x_m|)$ and $\sup(|y_1|,\dots,|y_n|)$ respectively.
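Throughout, it will be handy to have a numerical sanity check at hand. The following is a minimal sketch, assuming NumPy; `operator_norm_lower_bound` is an ad hoc helper (not a standard API), and random sampling can only ever produce a lower bound on the operator norm.

```python
import numpy as np

def operator_norm_lower_bound(A, p, q, samples=100_000, seed=0):
    """Estimate the (l^p, l^q) operator norm of A from below by random sampling."""
    rng = np.random.default_rng(seed)
    m = A.shape[1]
    best = 0.0
    for _ in range(samples):
        x = rng.standard_normal(m)
        # np.linalg.norm(v, p) is the l^p norm of a vector (p may be np.inf)
        best = max(best, np.linalg.norm(A @ x, q) / np.linalg.norm(x, p))
    return best
```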

Duality

Whatever norm is given on $E$, the dual space $E^*=\mathscr L(E;\mathbf R)$ is endowed with the dual norm, which is just the operator norm on that space: for $\phi\in E^*$, $\norm\phi$ is the least real number such that $|\phi(x)|\leq \norm\phi \norm x$ for all $x\in E$. And similarly for $F$. To emphasize duality, we will write $\langle x,\phi\rangle$ instead of $\phi(x)$.

Example. — The dual norm of the $\ell^p$-norm can be computed explicitly, thanks to the Hölder inequality \[ \Abs{ x_1 y_1 + \dots + x_n y_n } \leq (|x_1|^p+\dots + |x_n|^p)^{1/p} (|y_1|^q+\dots+|y_n|^q)^{1/q}\] if $p,q$ are related by the relation $\frac1p+\frac1q=1$. (When $p=1$, this gives $q=\infty$, and symmetrically $p=\infty$ if $q=1$.) Moreover, this inequality is optimal in the sense that for any $(x_1,\dots,x_n)$, one may find a nonzero $(y_1,\dots,y_n)$ for which the inequality is an equality. What this inequality says about norms/dual norms is that if one identifies $\R^n$ with its dual, via the duality bracket $\langle x,y\rangle=x_1y_1+\dots+x_n y_n$, the dual of the $\ell^p$-norm is the $\ell^q$-norm, for that relation $1/p+1/q=1$.
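For $1<p<\infty$, one standard extremal choice is $y_i=\operatorname{sign}(x_i)\,\abs{x_i}^{p-1}$, which turns the inequality into an equality. Here is a small numerical check of that extremal case, assuming NumPy (a sketch, not part of the argument):

```python
import numpy as np

p = 3.0
q = p / (p - 1)                          # conjugate exponent, 1/p + 1/q = 1
x = np.array([1.0, -2.0, 0.5])
y = np.sign(x) * np.abs(x) ** (p - 1)    # extremal vector for this x
lhs = abs(np.dot(x, y))
rhs = np.linalg.norm(x, p) * np.linalg.norm(y, q)
assert np.isclose(lhs, rhs)              # Hölder's inequality is an equality here
```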

If $u\colon E\to F$ is a continuous linear map, it has an adjoint (or transpose) $u^*\colon F^*\to E^*$, which is defined by $u^*(\phi)= \phi\circ u$, for $\phi\in F^*$. In terms of the duality bracket, this rewrites as \[ \langle \phi, u(x)\rangle = \langle u^*(\phi),x\rangle\] for $x\in E$ and $\phi\in F^*$.

Proposition. — One has $\norm{u^*}=\norm u$.

For $\phi\in F^*$, $\norm{u^*(\phi)}$ is the least real number such that $|u^*(\phi)(x)|\leq \norm{u^*(\phi)} \norm x$ for all $x\in E$. Since one has \[ |u^*(\phi)(x)|= |\langle u^*(\phi),x\rangle|=|\langle\phi, u(x)\rangle|\leq \norm\phi \norm{u(x)} \leq \norm\phi \norm u\norm x, \] we see that $\norm {u^*(\phi)}\leq \norm\phi\norm u$ for all $\phi$. As a consequence, $\norm{u^*}\leq \norm u$.
To get the other inequality, we wish to find a nonzero $\phi$ such that $\norm{u^*(\phi)}=\norm{u}\norm\phi$. This $\phi$ should thus be such that there exists $x$ with $|\langle u^*(\phi),x\rangle|=\norm u\norm\phi\norm x$, which, by the preceding computation, means that $|\langle\phi, u(x)\rangle|=\norm\phi\norm u\norm x$. Such $\phi$ and $x$ need not exist in general, but we can find reasonable approximations. Start with a nonzero $x\in E$ such that $\norm{u(x)}$ is close to $\norm u\norm x$; then, using the Hahn-Banach theorem, find a $\phi\in F^*$ such that $\norm\phi=1$ and $|\phi(u(x))|=\norm {u(x)}$. Then $|\langle\phi, u(x)\rangle|$ is close to $\norm u\norm\phi\norm x$, and this concludes the proof.
In some cases, in particular in finite dimension, we can use biduality to get the other inequality. Indeed, $E^{**}$ identifies with $E$, with its initial norm, and $u^{**}$ identifies with $u$. By the first part of the proof, we thus have $\norm{u^{**}}\leq \norm {u^*}$, hence $\norm u\leq\norm{u^*}$.
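For a rough numerical illustration (assuming NumPy and the sampling helper from the sketch above), one can compare estimates of $\norm u$ and $\norm{u^*}$ on a small matrix: the $(\ell^p,\ell^q)$-norm of $A$ should match the $(\ell^{q'},\ell^{p'})$-norm of its transpose, where $p',q'$ are the conjugate exponents. Since sampling only gives lower bounds, the two estimates agree only approximately.

```python
import numpy as np

A = np.array([[1.0, -2.0], [3.0, 0.5], [0.0, 1.0]])
p, q = 1.5, 3.0
p_dual, q_dual = p / (p - 1), q / (q - 1)   # conjugate exponents
print(operator_norm_lower_bound(A, p, q))            # estimate of ||u||
print(operator_norm_lower_bound(A.T, q_dual, p_dual))  # estimate of ||u*||
```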

The case $p=1$

We compute $\norm{u}$ when $E=\mathbf R^m$ is endowed with the $\ell^1$-norm, and $F$ is arbitrary. The linear map $u\colon E\to F$ thus corresponds with $m$ vectors $u_1,\dots,u_m$ of $F$, and one has \[ u((x_1,\dots,x_m))=x_1 u_1+\dots+x_m u_m. \] By the triangle inequality, we have \[ \norm{u((x_1,\dots,x_m))} \leq |x_1| \norm{u_1}+\dots+\abs{x_m}\norm{u_m}, \] hence \[ \norm{u((x_1,\dots,x_m))} \leq (\abs{x_1} +\dots+\abs{x_m}) \sup(\norm{u_1},\dots,\norm{u_m}). \] Consequently, \[ \norm{u} \leq \sup(\norm{u_1},\dots,\norm{u_m}). \] On the other hand, taking $x=(x_1,\dots,x_m)$ of the form $(0,\dots,1,0,\dots)$, where the $1$ is at place $k$ such that $\norm{u_k}$ is largest, we have $\norm{x}=1$ and $\norm{u(x)}=\norm{u_k}$. The preceding inequality is thus an equality.

In the matrix case, this shows that the $(\ell^1,\ell^q)$-norm of an $n\times m$ matrix $A$ is the supremum of the $\ell^q$-norms of the columns of $A$.
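In code, this exact formula is a one-liner; a minimal sketch, assuming NumPy (`norm_1_to_q` is an ad hoc name):

```python
import numpy as np

def norm_1_to_q(A, q):
    """The (l^1, l^q) operator norm: largest l^q norm of a column of A."""
    return max(np.linalg.norm(A[:, j], q) for j in range(A.shape[1]))

A = np.array([[1.0, -2.0], [3.0, 0.5]])
print(norm_1_to_q(A, 2))   # exact (l^1, l^2) operator norm
```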

The case $q=\infty$

We compute $\norm{u}$ when $F=\mathbf R^n$ is endowed with the $\ell^\infty$-norm, and $E$ is arbitrary. A direct computation is possible in the matrix case, but it is not really illuminating, and I find it better to argue geometrically, using a duality argument.

Namely, we can use $u^*\colon F^*\to E^*$ to compute $\norm{u}$, since $\norm u=\norm{u^*}$. We have seen above that $F^*$ is $\mathbf R^n$, endowed with the $\ell^1$-norm, so that we have computed $\norm{u^*}$ in the preceding section. The basis $(e_1,\dots,e_n)$ of $F$ gives a dual basis $(\phi_1,\dots,\phi_n)$, and one has \[ \norm{u}=\norm{u^*} = \sup (\norm{u^*(\phi_1)},\dots,\norm{u^*(\phi_n)}). \]

In the matrix case, this shows that the $(\ell^p,\ell^\infty)$-norm of an $n\times m$ matrix $A$ is the supremum of the $\ell^{p'}$-norms of the rows of $A$, where $p'$ is the conjugate exponent of $p$, given by $1/p+1/p'=1$: indeed, $u^*(\phi_i)$ is the $i$th row of $A$, viewed as a linear form on $E$, and the dual norm on $E^*$ of the $\ell^p$-norm is the $\ell^{p'}$-norm.
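Again the formula is exact and cheap to compute; a minimal sketch, assuming NumPy (`conjugate` and `norm_p_to_inf` are ad hoc names):

```python
import numpy as np

def conjugate(p):
    """Conjugate exponent p' with 1/p + 1/p' = 1."""
    if p == 1:
        return np.inf
    if p == np.inf:
        return 1
    return p / (p - 1)

def norm_p_to_inf(A, p):
    """The (l^p, l^infty) operator norm: largest l^{p'} norm of a row of A."""
    pd = conjugate(p)
    return max(np.linalg.norm(A[i, :], pd) for i in range(A.shape[0]))
```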

Relation with the Gershgorin circle theorem

I mentioned the Gershgorin circle theorem as being in the same spirit as the computation of an operator norm, because its proof relies on the same kind of estimates. In fact, no additional computation is necessary!

Theorem (Gershgorin “circles theorem”). — Let $A=(a_{ij})$ be an $n\times n$ matrix and let $\lambda$ be an eigenvalue of $A$. There exists an integer $i$ such that \[ \abs{\lambda-a_{ii}}\leq \sum_{j\neq i} \abs{a_{ij}}. \]

For the proof, one writes $A=D+N$, where $D$ is diagonal and $N$ has zeroes on its diagonal, and writes $\lambda x=Ax=Dx+Nx$ for an eigenvector $x$, hence $(\lambda I-D)x=Nx$. Endow $\R^n$ with the $\ell^\infty$-norm and assume that $\norm x=1$; choose $i$ such that $|x_i|=\norm x=1$. The $i$th coordinate of the left hand side is $(\lambda-a_{ii})x_i$, of absolute value $|\lambda-a_{ii}|$, while the $i$th coordinate of the right hand side is $\sum_{j\neq i}a_{ij}x_j$, which is bounded in absolute value by $\sum_{j\neq i}\abs{a_{ij}}$, the $\ell^1$-norm of the $i$th row of $N$: this is exactly the quantity that appeared in the computation of the $(\ell^\infty,\ell^\infty)$-norm above, and this implies the theorem.
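Here is a quick numerical check of the statement, assuming NumPy (the matrix is an arbitrary example); every eigenvalue should land in at least one of the discs:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, -1.0],
              [0.5, 0.0, -2.0]])
# R_i = sum_{j != i} |a_ij|: the Gershgorin radius of row i
R = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
for lam in np.linalg.eigvals(A):
    assert any(abs(lam - A[i, i]) <= R[i] + 1e-12 for i in range(A.shape[0]))
```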

The case $p=q=2$

Since Euclidean spaces are ubiquitous in applications, this is a very important case to consider, and we will see that the answer cannot be read off directly from the coefficients of the matrix.

We have to bound $\norm{u(x)}$ from above. Using the scalar product, we write \[ \norm{u(x)}^2 = \langle u(x),u(x)\rangle = \langle u^*u(x),x\rangle, \] where $u^*\colon F\to E$ now denotes the adjoint of $u$, which identifies with the transpose of $u$ if one identifies $E$ with $E^*$ and $F$ with $F^*$ by means of their scalar products. Using the Cauchy-Schwarz inequality, we get that $\norm{u(x)}^2\leq \norm{u^*u(x)}\norm x\leq \norm{u^*u} \norm x^2$, hence $\norm{u} \leq \norm{u^*u}^{1/2}$. This inequality is remarkable because, on the other hand, we have $\norm{u^*u}\leq \norm{u^*}\norm{u}=\norm{u}^2$. Consequently, $\norm{u}=\norm{u^*u}^{1/2}$.

This formula might not appear to be so useful, since it reduces the computation of the operator norm of $u$ to that of $u^*u$. However, the linear map $u^*u$ is a positive self-adjoint endomorphism of $E$, hence (assuming that $E$ is finite dimensional here) it can be diagonalized in an orthonormal basis. We then see that $\norm{u^*u}$ is the largest eigenvalue of $u^*u$, which is also its spectral radius. The square roots of the eigenvalues of $u^*u$ are called the singular values of $u$, hence $\norm u$ is the largest singular value of $u$.

One can play with duality as well, and we have $\norm{u}=\norm{uu^*}^{1/2}$.
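Numerically, this is how the spectral norm is actually computed. A minimal check, assuming NumPy, that the largest singular value, the square root of the top eigenvalue of $u^*u$, and NumPy's built-in spectral norm all agree:

```python
import numpy as np

A = np.array([[1.0, -2.0], [3.0, 0.5]])
sigma_max = np.linalg.svd(A, compute_uv=False)[0]        # largest singular value
lam_max = np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A)))   # sqrt of top eigenvalue of u*u
assert np.isclose(sigma_max, lam_max)
assert np.isclose(sigma_max, np.linalg.norm(A, 2))       # NumPy's spectral norm
```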

Other cases?

There are general inequalities relating the various $\ell^p$-norms of a vector $x\in\R^m$, and these can be used to deduce inequalities for $\norm u$, when $E=\R^m$ has an $\ell^p$-norm and $F=\R^n$ has an $\ell^q$-norm. However, given the explicit value of $\norm u$ for $(p,q)=(2,2)$ and the fact that no closed form expression exists for the spectral radius, it is unlikely that there is a closed form expression in the remaining cases.

Worse: the exact computation of $\norm u$ in the cases $(\infty,1)$, $(\infty,2)$ or $(2,1)$ is known to be NP-hard, and I try to explain this result below, following J. Rohn (2000) (“Computing the Norm $\|A\|_{\infty,1}$ Is NP-Hard”, Linear and Multilinear Algebra 47 (3), p. 195-204). I concentrate on the $(\infty,1)$ case; the $(\infty,2)$ case is supposed to be analogous (see Joel Tropp's thesis, top of page 48, quoted by Wikipedia, but no arguments are given), and the case $(2,1)$ would follow by duality.

A matrix from a graph

Let us consider a finite (undirected, simple, without loops) graph $G$ on the set $V=\{1,\dots,n\}$ of $n$ vertices, with set of edges $E$, and let us introduce the following $n\times n$ matrix $A=(a_{ij})$, a variant of the adjacency matrix of the graph $G$ (actually $nI_n-M$, where $I_n$ is the identity matrix and $M$ is the adjacency matrix of $G$):

  • One has $a_{ii}=n$ for all $i$;
  • If $i\neq j$ and vertices $i$ and $j$ are connected by an edge, then $a_{ij}=-1$;
  • Otherwise, $a_{ij}=0$.
For any subset $S$ of $V$, the cut $c(S)$ of $S$ is the number of edges which have one endpoint in $S$ and the other outside of $S$.

Proposition. — The $(\ell^\infty,\ell^1)$-norm of $A$ is given by \[ \norm A = 4 \sup_{S\subseteq V} c(S) - 2 \operatorname{Card}(E) + n^2. \]

The proof starts with the following observation, valid for more general matrices.

Lemma. — The $(\ell^\infty,\ell^1)$-norm of a symmetric positive (i.e., positive semidefinite) $n\times n$ matrix $A$ is given by $\norm A = \sup_z \langle z, Az \rangle$, where $z$ runs among the set $Z$ of vectors in $\R^n$ with coordinates $\pm1$.

The vectors of $Z$ are the vertices of the polytope $[-1;1]^n$, which is the unit ball of $\R^n$ for the $\ell^\infty$-norm. Consequently, every vector of $[-1;1]^n$ is a convex combination of vectors of $Z$. Writing $x=\sum_{z\in Z} c_z z$, with $c_z\geq0$ and $\sum c_z=1$, we have \[\norm {Ax} = \norm{\sum c_z Az} \leq \sum c_z \norm {Az}\leq \sup_{z\in Z} \norm{Az}. \] The other inequality being obvious, we already see that $\norm A=\sup_{z\in Z}\norm{Az}$. Note that this formula holds for any norm on the codomain.
If, for $z\in Z$, one writes $Az=(y_1,\dots,y_n)$, one has $\norm{Az}=|y_1|+\dots+|y_n|$, because the codomain is endowed with the $\ell^1$-norm, so that $\langle z, Az\rangle = \sum z_i y_i\leq \norm{Az}$. We thus get the inequality $\sup_{z\in Z} \langle z,Az\rangle \leq \norm A$.
Let us now use the fact that $A$ is symmetric and positive. Fix $z\in Z$, set $Az=(y_1,\dots,y_n)$ as above, and define $x\in Z$ by $x_i=1$ if $y_i\geq0$ and $x_i=-1$ otherwise. One thus has $\langle x, Az\rangle=\sum |y_i|=\norm{Az}$. Since $A$ is symmetric and positive, one has $\langle x-z, A(x-z)\rangle\geq0$, and this implies \[2\norm{Az}= 2\langle x, Az\rangle \leq \langle x, Ax\rangle+\langle z, Az\rangle, \] so that $\norm{Az}\leq \sup_{x\in Z} \langle x, Ax\rangle$. This concludes the proof.

To prove the proposition, we will apply the preceding lemma. We observe that $A$ is symmetric, by construction. It is also positive, since for every $x\in\R^n$, one has \[\langle x,Ax\rangle=\sum a_{ij}x_ix_j \geq n \sum x_i^2 -\sum_{i\neq j} |x_i|\, |x_j| = (n+1)\sum x_i^2- \Big(\sum |x_i|\Big)^2\geq \sum x_i^2, \] using the Cauchy-Schwarz inequality $(\sum |x_i|)^2\leq n\sum x_i^2$. By the preceding lemma, we thus have \[ \norm A = \sup_{z\in\{\pm1\}^n} \langle z, Az\rangle. \] The $2^n$ vectors $z\in Z$ are in bijection with the $2^n$ subsets of $V=\{1,\dots,n\}$, by associating with $z\in Z$ the subset $S$ of $V$ consisting of vertices $i$ such that $z_i=1$. Then one can compute \[ \langle z, Az\rangle = \sum_{i,j} a_{ij} z_iz_j = n^2 - 2\sum_{\{i,j\}\in E} z_iz_j = 4c(S) - 2\operatorname{Card}(E) + n^2, \] since each edge contributes $-2z_iz_j$, which equals $2$ if the edge is cut by $S$ and $-2$ otherwise. It follows that $\norm A$ is equal to the indicated expression.
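On a small graph, the proposition can be checked by brute force, assuming NumPy (the graph below is an arbitrary example); by the lemma, the $(\ell^\infty,\ell^1)$-norm of $A$ is the supremum of $\langle z,Az\rangle$ over sign vectors $z$:

```python
import numpy as np
from itertools import product

n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
A = n * np.eye(n)
for i, j in edges:
    A[i, j] = A[j, i] = -1.0   # the matrix nI - M of the construction

signs = [np.array(z) for z in product([-1, 1], repeat=n)]
norm_A = max(z @ A @ z for z in signs)                        # sup_z <z, Az>
max_cut = max(sum(z[i] != z[j] for i, j in edges) for z in signs)  # max_S c(S)
assert norm_A == 4 * max_cut - 2 * len(edges) + n ** 2
```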

The last step of the proof is an application of the “simple max-cut” NP-hardness theorem of Garey, Johnson and Stockmeyer (1976), itself a strengthening of Karp (1972)'s seminal result that “max-cut” is NP-complete. I won't explain the proofs of these results here, but let me explain what they mean and how they relate to the present discussion. First of all, computer scientists categorize problems according to the time that is required to solve them, in terms of the size of the entries. This notion depends on the actual computer that is used, but the theory of Turing machines allows one to single out two classes, P and EXP, consisting of problems which can be solved in polynomial, respectively exponential, time in terms of the size of the entries. A second notion, introduced by Karp, is that of NP problems, problems which can be solved in polynomial time by a “nondeterministic Turing machine” — “nondeterministic” means the computer can parallelize itself at will when it needs to consider various possibilities. This class is contained in EXP (because one can simulate in exponential time a polynomial time nondeterministic algorithm) and also corresponds to the class of problems whose solution can be checked in polynomial time.

Our problem is to find a subset $S$ of $\{1,\dots,n\}$ that maximizes $c(S)$. This is a restriction of the “general max-cut” problem where, given an integer valued function $w$ on the set of edges, one wishes to find a subset that maximizes $c(S;w)$, the sum of the weights of the edges which have one endpoint in $S$ and the other outside of $S$. Karp (1972) observed that the existence of $S$ such that $c(S;w)\geq m$ is an NP problem (if one is provided $S$, it takes polynomial time to compute $c(S;w)$ and to decide that it is at least $m$), and the naïve search algorithm is in EXP, since there are $2^n$ such subsets. Moreover, Karp proved that any NP problem can be reduced to it in polynomial time. This is what is meant by the assertion that it is NP-complete. Consequently, determining $\sup_S c(S;w)$ is NP-hard: if you can solve that problem, then you can solve the “max-cut” problem in polynomial time, hence any other NP problem. A subsequent theorem of Garey, Johnson and Stockmeyer (1976) established that restricting the max-cut problem to weights all equal to $1$ (the “simple max-cut” problem, the one we need here) is still NP-hard, and this completes the proof of Rohn's theorem.

(An aside, to insist that signs matter: by a theorem of Edmonds and Karp (1972), one can solve the “min-cut” problem in polynomial time, which consists in deciding, for some given integer $m$, whether there exists $S$ such that $c(S;w)\leq m$.)
