Sunday, April 14, 2024

Evaluating the operator norms of matrices

Let $E$ and $F$ be normed vector spaces, over the real or complex numbers, and let $u\colon E\to F$ be a linear map. The continuity of $u$ is equivalent to the existence of a real number $c$ such that $\norm{u(x)}\leq c\norm{x}$ for every $x\in E$, and the least such real number is called the operator norm of $u$; we denote it by $\norm{u}$. It defines a norm on the linear space $\mathscr L(E;F)$ of continuous linear maps from $E$ to $F$ and as such is quite important. When $E=F$, it is also related to the spectrum of $u$ and is implicitly at the heart of the Gershgorin criterion for the localization of eigenvalues.

\gdef\R{\mathbf R}\gdef\norm#1{\lVert#1\rVert}\gdef\Abs#1{\left|#1\right|}\gdef\abs#1{\lvert#1\rvert}

However, even in the simplest cases of matrices, its explicit computation is not trivial at all and, as we'll see, even less trivial than what is taught in algebra classes, as I learned by browsing Wikipedia when I wanted to prepare a class on the topic.

Since I'm more of an abstract kind of guy, I will use both languages, that of normed spaces and that of matrices, for the first one allows one to explain a few things at a more fundamental level. I'll make the translation, though. Also, to be specific, I'll work with real vector spaces.

So $E=\R^m$, $F=\R^n$, and linear maps in $\mathscr L(E;F)$ are represented by $n\times m$ matrices. There are plenty of interesting norms on $E$, but I will concentrate the discussion on the $\ell^p$-norm given by $\norm{(x_1,\dots,x_m)} = (\abs{x_1}^p+\dots+\abs{x_m}^p)^{1/p}$. Similarly, I will consider the $\ell^q$-norm on $F$ given by $\norm{(y_1,\dots,y_n)} = (\abs{y_1}^q+\dots+\abs{y_n}^q)^{1/q}$. Here, $p$ and $q$ are elements of $[1;+\infty\mathclose[$. It is also interesting to allow $p=\infty$ or $q=\infty$; in this case, the expression defining the norm is just replaced by $\sup(\abs{x_1},\dots,\abs{x_m})$ and $\sup(\abs{y_1},\dots,\abs{y_n})$ respectively.
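Throughout, it will be convenient to have a numerical sanity check at hand. Here is a minimal numpy sketch (the helper names `lp_norm` and `op_norm_lower_bound` are mine, not standard): it evaluates the $\ell^p$-norm, and produces a Monte-Carlo lower bound for the operator norm, which the exact formulas obtained below must dominate.

```python
# A minimal sketch (helper names are mine): the l^p norm on R^m, and a crude
# Monte-Carlo lower bound on the operator norm ||A|| = sup ||Ax||_q / ||x||_p.
import numpy as np

def lp_norm(x, p):
    """l^p norm of a vector; p may be any real >= 1 or np.inf."""
    if p == np.inf:
        return np.max(np.abs(x))
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def op_norm_lower_bound(A, p, q, trials=10_000, seed=0):
    """Best ratio ||Ax||_q / ||x||_p over random x: a lower bound, not the norm."""
    rng = np.random.default_rng(seed)
    return max(lp_norm(A @ x, q) / lp_norm(x, p)
               for x in rng.standard_normal((trials, A.shape[1])))
```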

Duality

Whatever norm is given on $E$, the dual space $E^*=\mathscr L(E;\R)$ is endowed with the dual norm, which is just the operator norm of that space: for $\phi\in E^*$, $\norm{\phi}$ is the least real number such that $\abs{\phi(x)}\leq \norm{\phi}\norm{x}$ for all $x\in E$. And similarly for $F$. To emphasize duality, we will write $\langle x,\phi\rangle$ instead of $\phi(x)$.

Example. — The dual norm of the $\ell^p$-norm can be computed explicitly, thanks to the Hölder inequality \[ \Abs{x_1y_1+\dots+x_ny_n} \leq (\abs{x_1}^p+\dots+\abs{x_n}^p)^{1/p}\,(\abs{y_1}^q+\dots+\abs{y_n}^q)^{1/q} \] if $p,q$ are related by the relation $\frac1p+\frac1q=1$. (When $p=1$, this gives $q=\infty$, and symmetrically $p=\infty$ if $q=1$.) Moreover, this inequality is optimal in the sense that for any $(x_1,\dots,x_n)$, one may find a nonzero $(y_1,\dots,y_n)$ for which the inequality is an equality. What this inequality says about norms/dual norms is that if one identifies $\R^n$ with its dual, via the duality bracket $\langle x,y\rangle=x_1y_1+\dots+x_ny_n$, the dual of the $\ell^p$-norm is the $\ell^q$-norm, for that relation $1/p+1/q=1$.
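To see the optimality concretely, here is a quick numerical illustration (my own, not from the text): given $y$, the vector with coordinates $x_i=\operatorname{sign}(y_i)\abs{y_i}^{q-1}$ realizes equality in the Hölder inequality, so the ratio $\langle x,y\rangle/\norm{x}_p$ equals the $\ell^q$-norm of $y$.

```python
# Sketch: equality case of Hölder's inequality, with conjugate exponents p, q.
import numpy as np

p = 3.0
q = p / (p - 1.0)                          # 1/p + 1/q = 1
y = np.random.default_rng(1).standard_normal(6)

x = np.sign(y) * np.abs(y) ** (q - 1.0)    # extremal vector for this y
ratio = (x @ y) / np.sum(np.abs(x) ** p) ** (1 / p)
print(ratio, np.sum(np.abs(y) ** q) ** (1 / q))   # both equal ||y||_q
```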

If $u\colon E\to F$ is a continuous linear map, it has an adjoint (or transpose) $u^*\colon F^*\to E^*$, which is defined by $u^*(\phi)=\phi\circ u$, for $\phi\in F^*$. In terms of the duality bracket, this rewrites as \[ \langle \phi, u(x)\rangle = \langle u^*(\phi), x\rangle \] for $x\in E$ and $\phi\in F^*$.

Proposition. — One has $\norm{u^*}=\norm{u}$.

For $\phi\in F^*$, $\norm{u^*(\phi)}$ is the least real number such that $\abs{u^*(\phi)(x)}\leq \norm{u^*(\phi)}\norm{x}$ for all $x\in E$. Since one has \[ \abs{u^*(\phi)(x)} = \abs{\langle u^*(\phi),x\rangle} = \abs{\langle \phi,u(x)\rangle} \leq \norm{\phi}\norm{u(x)} \leq \norm{\phi}\norm{u}\norm{x}, \] we see that $\norm{u^*(\phi)}\leq\norm{\phi}\norm{u}$ for all $\phi$. As a consequence, $\norm{u^*}\leq\norm{u}$.
To get the other inequality, we wish to find a nonzero $\phi$ such that $\norm{u^*(\phi)}=\norm{u}\norm{\phi}$. This $\phi$ should thus be such that there exists $x$ with $\abs{\langle u^*(\phi),x\rangle}=\norm{u}\norm{\phi}\norm{x}$, which, by the preceding computation, means that $\abs{\langle\phi,u(x)\rangle}=\norm{\phi}\norm{u}\norm{x}$. Such $\phi$ and $x$ need not exist in general, but we can find reasonable approximations. Start with a nonzero $x\in E$ such that $\norm{u(x)}$ is close to $\norm{u}\norm{x}$; then, using the Hahn-Banach theorem, find a nonzero $\phi\in F^*$ such that $\norm{\phi}=1$ and $\abs{\phi(u(x))}=\norm{u(x)}$. We see that $\abs{\langle\phi,u(x)\rangle}$ is close to $\norm{u}\norm{\phi}\norm{x}$, and this concludes the proof.
In some cases, in particular in the finite dimensional case, we can use biduality to get the other inequality. Indeed, $E^{**}$ identifies with $E$, with its initial norm, and $u^{**}$ identifies with $u$. By the first case, we thus have $\norm{u^{**}}\leq\norm{u^*}$, hence $\norm{u}\leq\norm{u^*}$.
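In coordinates, the proposition says that the $(\ell^p,\ell^q)$-norm of a matrix equals the $(\ell^{q'},\ell^{p'})$-norm of its transpose, where primes denote conjugate exponents. Two instances can be checked directly with numpy (a sketch of mine, using only built-in matrix norms):

```python
# ||A||_{2,2} = ||A^T||_{2,2} (the dual of l^2 is l^2), and
# ||A||_{1,1} = ||A^T||_{inf,inf}: max column sum of A = max row sum of A^T.
import numpy as np

A = np.random.default_rng(2).standard_normal((4, 3))
print(np.linalg.norm(A, 2), np.linalg.norm(A.T, 2))        # equal
print(np.linalg.norm(A, 1), np.linalg.norm(A.T, np.inf))   # equal
```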

The case $p=1$

We compute $\norm{u}$ when $E=\R^m$ is endowed with the $\ell^1$-norm, and $F$ is arbitrary. The linear map $u\colon E\to F$ thus corresponds with $m$ vectors $u_1,\dots,u_m$ of $F$, and one has \[ u((x_1,\dots,x_m))=x_1u_1+\dots+x_mu_m. \] By the triangle inequality, we have \[ \norm{u((x_1,\dots,x_m))} \leq \abs{x_1}\norm{u_1}+\dots+\abs{x_m}\norm{u_m}, \] hence \[ \norm{u((x_1,\dots,x_m))} \leq (\abs{x_1}+\dots+\abs{x_m})\,\sup(\norm{u_1},\dots,\norm{u_m}). \] Consequently, \[ \norm{u}\leq\sup(\norm{u_1},\dots,\norm{u_m}). \] On the other hand, taking $x=(x_1,\dots,x_m)$ of the form $(0,\dots,1,0,\dots)$, where the $1$ is at the place $k$ such that $\norm{u_k}$ is largest, we have $\norm{x}=1$ and $\norm{u(x)}=\norm{u_k}$. The preceding inequality is thus an equality.

In the matrix case, this shows that the $(\ell^1,\ell^q)$-norm of an $n\times m$ matrix $A$ is the supremum of the $\ell^q$-norms of the columns of $A$.
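In code, this case is a one-liner; the sketch below (my naming) compares the exact column formula with random trial vectors, which can only ever give lower bounds.

```python
# (l^1, l^q)-norm of A = largest l^q norm of a column of A.
import numpy as np

def norm_1_to_q(A, q):
    return max(np.linalg.norm(A[:, j], q) for j in range(A.shape[1]))

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 5))
exact = norm_1_to_q(A, 2)
# random vectors never exceed the exact value:
assert all(np.linalg.norm(A @ x, 2) <= exact * np.linalg.norm(x, 1) + 1e-12
           for x in rng.standard_normal((1000, 5)))
print(exact)
```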

The case $q=\infty$

We compute $\norm{u}$ when $F=\R^n$ is endowed with the $\ell^\infty$-norm, and $E$ is arbitrary. A direct computation is possible in the matrix case, but it is not really illuminating, and I find it better to argue geometrically, using a duality argument.

Namely, we can use $u^*\colon F^*\to E^*$ to compute $\norm{u}$, since $\norm{u}=\norm{u^*}$. We have seen above that $F^*$ is $\R^n$, endowed with the $\ell^1$-norm, so that we have computed $\norm{u^*}$ in the preceding section. The basis $(e_1,\dots,e_n)$ of $F$ gives a dual basis $(\phi_1,\dots,\phi_n)$, and one has \[ \norm{u}=\norm{u^*}=\sup(\norm{u^*(\phi_1)},\dots,\norm{u^*(\phi_n)}), \] the norms on the right hand side being dual norms in $E^*$.

In the matrix case, this shows that the $(\ell^p,\ell^\infty)$-norm of an $n\times m$ matrix $A$ is the supremum of the $\ell^{p'}$-norms of the rows of $A$, where $p'$ is the conjugate exponent of $p$, given by $1/p+1/p'=1$. (For $p=\infty$, this is the classical formula: the $(\ell^\infty,\ell^\infty)$-norm is the largest $\ell^1$-norm of a row.)
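Concretely, with $p'$ the conjugate exponent of $p$, this reads as follows (again a sketch of mine):

```python
# (l^p, l^inf)-norm of A = largest l^{p'} norm of a row, 1/p + 1/p' = 1.
import numpy as np

def norm_p_to_inf(A, p):
    p_dual = np.inf if p == 1 else p / (p - 1.0)
    return max(np.linalg.norm(row, p_dual) for row in A)

rng = np.random.default_rng(5)
A = rng.standard_normal((4, 5))
exact = norm_p_to_inf(A, 2.0)   # for p = 2, the largest l^2 norm of a row
assert all(np.max(np.abs(A @ x)) <= exact * np.linalg.norm(x, 2) + 1e-12
           for x in rng.standard_normal((1000, 5)))
print(exact)
```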

Relation with the Gershgorin circle theorem

I mentioned the Gershgorin circle theorem as being in the same spirit as the computation of an operator norm, because its proof relies on the same kind of estimates. In fact, no additional computation is necessary!

Theorem (Gershgorin “circle theorem”). — Let $A=(a_{ij})$ be an $n\times n$ matrix and let $\lambda$ be an eigenvalue of $A$. There exists an integer $i$ such that \[ \abs{\lambda-a_{ii}}\leq \sum_{j\neq i}\abs{a_{ij}}. \]

For the proof, one writes $A=D+N$ where $D$ is diagonal and $N$ has zeroes on its diagonal, and writes $\lambda x=Ax=Dx+Nx$, hence $(\lambda I-D)x=Nx$. Endow $\R^n$ with the $\ell^\infty$-norm. We can assume that $\norm{x}=1$; choose $i$ such that $\abs{x_i}=\norm{x}=1$. The $i$th coordinate of the left hand side is $(\lambda-a_{ii})x_i$, which has absolute value $\abs{\lambda-a_{ii}}$, while the $i$th coordinate of the right hand side, $(Nx)_i$, is bounded in absolute value by $\sum_{j\neq i}\abs{a_{ij}}$, the $\ell^1$-norm of the $i$th row of $N$, which is exactly the quantity that appeared in the formula for $\norm{N}$ above. This implies the theorem.
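The statement is easy to test numerically; the sketch below checks that every eigenvalue of a random matrix lies in one of its Gershgorin discs.

```python
# Gershgorin check: each eigenvalue lies in some disc centered at a_ii
# with radius equal to the off-diagonal absolute row sum.
import numpy as np

A = np.random.default_rng(7).standard_normal((5, 5))
radii = np.sum(np.abs(A), axis=1) - np.abs(np.diag(A))
for lam in np.linalg.eigvals(A):
    assert any(abs(lam - A[i, i]) <= radii[i] + 1e-9 for i in range(5))
```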

The case $p=q=2$

Since Euclidean spaces are ubiquitous in applications, this is arguably the most important case to consider, and we will see that the answer cannot be read off the coefficients of the matrix in any straightforward way.

We have to bound $\norm{u(x)}$ from above. Using the scalar product, we write \[ \norm{u(x)}^2 = \langle u(x),u(x)\rangle = \langle u^*u(x),x\rangle, \] where $u^*\colon F\to E$ now denotes the adjoint of $u$, which identifies with the transpose of $u$ if one identifies $E$ with $E^*$ and $F$ with $F^*$ by means of their scalar products. Using the Cauchy-Schwarz inequality, we get $\norm{u(x)}^2\leq\norm{u^*u(x)}\norm{x}\leq\norm{u^*u}\norm{x}^2$, hence $\norm{u}\leq\norm{u^*u}^{1/2}$. This inequality is remarkable because, on the other hand, we have $\norm{u^*u}\leq\norm{u^*}\norm{u}=\norm{u}^2$. Consequently, $\norm{u}=\norm{u^*u}^{1/2}$.

This formula might not appear to be so useful, since it reduces the computation of the operator norm of $u$ to that of $u^*u$. However, the linear map $u^*u$ is a positive self-adjoint endomorphism of $E$, hence (assuming that $E$ is finite dimensional here) it can be diagonalized in an orthonormal basis. We then see that $\norm{u^*u}$ is the largest eigenvalue of $u^*u$, which is also its spectral radius. The square roots of the eigenvalues of $u^*u$ are also called the singular values of $u$, hence $\norm{u}$ is the largest singular value of $u$.
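Numerically, this is exactly what `numpy.linalg.norm(A, 2)` computes; the sketch below exhibits the three equal quantities.

```python
# Spectral norm three ways: largest singular value, square root of the top
# eigenvalue of A^T A, and numpy's built-in matrix 2-norm.
import numpy as np

A = np.random.default_rng(8).standard_normal((4, 3))
print(np.linalg.svd(A, compute_uv=False)[0])           # largest singular value
print(np.sqrt(np.max(np.linalg.eigvalsh(A.T @ A))))    # sqrt of top eigenvalue
print(np.linalg.norm(A, 2))                            # all three coincide
```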

One can play with duality as well, and we have $\norm{u}=\norm{uu^*}^{1/2}$.

Other cases?

There are general inequalities relating the various $\ell^p$-norms of a vector $x\in\R^m$, and these can be used to deduce inequalities for $\norm{u}$, when $E=\R^m$ has an $\ell^p$-norm and $F=\R^n$ has an $\ell^q$-norm. However, given the explicit value of $\norm{u}$ for $(p,q)=(2,2)$ and the fact that no closed form expression exists for the spectral radius, it is unlikely that there is a closed form expression in the remaining cases.
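For the record, the standard comparison is $\norm{x}_q\leq\norm{x}_p\leq m^{1/p-1/q}\norm{x}_q$ for $p\leq q$ on $\R^m$; a quick check (my sketch, stating a classical inequality not spelled out in the text):

```python
# Standard comparison of l^p norms on R^m: for p <= q,
#   ||x||_q <= ||x||_p <= m^(1/p - 1/q) ||x||_q.
import numpy as np

m, p, q = 6, 1.5, 3.0
x = np.random.default_rng(9).standard_normal(m)
norm_p, norm_q = np.linalg.norm(x, p), np.linalg.norm(x, q)
assert norm_q <= norm_p <= m ** (1/p - 1/q) * norm_q + 1e-12
```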

Worse: the exact computation of $\norm{u}$ in the cases $(\infty,1)$, $(\infty,2)$ or $(2,1)$ is known to be NP-hard, and I try to explain this result below, following J. Rohn (2000), “Computing the Norm $\norm{A}_{\infty,1}$ Is NP-Hard”, Linear and Multilinear Algebra 47 (3), p. 195–204. I concentrate on the $(\infty,1)$ case; the $(\infty,2)$ case is supposed to be analogous (see Joel Tropp's thesis, top of page 48, quoted by Wikipedia, but no arguments are given), and the case $(2,1)$ would follow by symmetry.

A matrix from a graph

Let us consider a finite (undirected, simple, without loops) graph $G$ on the set $V=\{1,\dots,n\}$ of $n$ vertices, with set of edges $E$, and let us introduce the following $n\times n$ matrix $A=(a_{ij})$, a variant of the adjacency matrix of the graph $G$ (actually $nI-M$, where $I$ is the identity matrix and $M$ is the adjacency matrix of $G$):

  • One has $a_{ii}=n$ for all $i$;
  • If $i\neq j$ and vertices $i$ and $j$ are connected by an edge, then $a_{ij}=-1$;
  • Otherwise, $a_{ij}=0$.
For any subset $S$ of $V$, the cut $c(S)$ of $S$ is the number of edges which have one endpoint in $S$ and the other outside of $S$.

Proposition. — The $(\ell^\infty,\ell^1)$-norm of $A$ is given by \[ \norm{A} = 4\sup_{S\subseteq V} c(S) - 2\operatorname{Card}(E) + n^2. \]

The proof starts with the following observation, valid for more general matrices.

Lemma. — The $(\ell^\infty,\ell^1)$-norm of a symmetric positive $n\times n$ matrix $A$ is given by $\norm{A}=\sup_z\langle z,Az\rangle$, where $z$ runs among the set $Z$ of vectors in $\R^n$ with coordinates $\pm1$.

The vectors of $Z$ are the vertices of the polytope $[-1;1]^n$, which is the unit ball of $\R^n$ for the $\ell^\infty$-norm. Consequently, every vector of $[-1;1]^n$ is a convex combination of vectors of $Z$. Writing $x=\sum_{z\in Z}c_z z$, with $c_z\geq0$ and $\sum c_z=1$, we have \[ \norm{Ax} = \norm{\sum c_z\,Az} \leq \sum c_z\norm{Az} \leq \sup_{z\in Z}\norm{Az}. \] The other inequality being obvious, we already see that $\norm{A}=\sup_{z\in Z}\norm{Az}$. Note that this formula holds for any norm on the codomain.
If, for $z\in Z$, one writes $Az=(y_1,\dots,y_n)$, one has $\norm{Az}=\abs{y_1}+\dots+\abs{y_n}$, because the codomain is endowed with the $\ell^1$-norm, so that $\langle z,Az\rangle=\sum z_iy_i\leq\norm{Az}$. We thus get the inequality $\sup_{z\in Z}\langle z,Az\rangle\leq\norm{A}$.
Let us now use the fact that $A$ is symmetric and positive. Fix $z\in Z$, set $Az=(y_1,\dots,y_n)$ as above, and define $x\in Z$ by $x_i=1$ if $y_i\geq0$ and $x_i=-1$ otherwise. One thus has $\langle x,Az\rangle=\sum\abs{y_i}=\norm{Az}$. Since $A$ is symmetric and positive, one has $\langle x-z,A(x-z)\rangle\geq0$, and this implies \[ 2\norm{Az} = 2\langle x,Az\rangle \leq \langle x,Ax\rangle+\langle z,Az\rangle, \] so that $\norm{Az}\leq\sup_{x\in Z}\langle x,Ax\rangle$. This concludes the proof.
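For small $n$, the lemma can be verified by exhaustive enumeration; the following sketch does so for a random symmetric positive matrix.

```python
# Brute-force check of the lemma: for symmetric positive semidefinite A,
# max over sign vectors of ||Az||_1 equals max over sign vectors of <z, Az>.
import numpy as np
from itertools import product

n = 6
B = np.random.default_rng(10).standard_normal((n, n))
A = B.T @ B                                   # symmetric, positive semidefinite
Z = [np.array(z) for z in product([-1, 1], repeat=n)]
print(max(np.sum(np.abs(A @ z)) for z in Z))  # the (l^inf, l^1)-norm
print(max(z @ A @ z for z in Z))              # equal, by the lemma
```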

To prove the proposition, we will apply the preceding lemma. We observe that $A$ is symmetric, by construction. It is also positive, since for every $x\in\R^n$, one has \[ \langle x,Ax\rangle=\sum a_{ij}x_ix_j \geq n\sum x_i^2-\sum_{i\neq j}\abs{x_i}\abs{x_j} = (n+1)\sum x_i^2-\Big(\sum\abs{x_i}\Big)^2 \geq \sum x_i^2, \] using the Cauchy-Schwarz inequality $(\sum\abs{x_i})^2\leq n\sum x_i^2$. By the preceding lemma, we thus have \[ \norm{A} = \sup_{z\in\{\pm1\}^n}\langle z,Az\rangle. \] The $2^n$ vectors $z\in Z$ are in bijection with the $2^n$ subsets of $V=\{1,\dots,n\}$, by associating with $z\in Z$ the subset $S$ of $V$ consisting of vertices $i$ such that $z_i=1$. Then, one can compute \[ \langle z,Az\rangle=\sum_{i,j}a_{ij}z_iz_j = 4c(S)-2\operatorname{Card}(E)+n^2. \] It follows that $\norm{A}$ is equal to the indicated expression.
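The whole proposition can likewise be tested by brute force on a small random graph (a sketch, exponential in $n$ by nature, which is of course the whole point of the NP-hardness discussion below):

```python
# Check: ||A||_{inf,1} = 4 max_S c(S) - 2 Card(E) + n^2 for A = n*I - M,
# with M the adjacency matrix; both sides by exhaustive enumeration.
import numpy as np
from itertools import product

rng = np.random.default_rng(11)
n = 6
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < 0.5]
A = n * np.eye(n)
for i, j in edges:
    A[i, j] = A[j, i] = -1.0

norm = max(np.sum(np.abs(A @ np.array(z))) for z in product([-1, 1], repeat=n))
max_cut = max(sum(((mask >> i) & 1) != ((mask >> j) & 1) for i, j in edges)
              for mask in range(2 ** n))
print(norm, 4 * max_cut - 2 * len(edges) + n * n)   # equal, by the proposition
```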

The last step of the proof is an application of the “simple max-cut” NP-hardness theorem of Garey, Johnson and Stockmeyer (1976), itself a strengthening of Karp (1972)'s seminal result that “max-cut” is NP-complete. I won't explain the proofs of these results here, but let me explain what they mean and how they relate to the present discussion. First of all, computer scientists categorize problems according to the time that is required to solve them, in terms of the size of the entries. This notion depends on the actual computer that is used, but the theory of Turing machines allows one to single out two classes, P and EXP, consisting of problems which can be solved in polynomial, respectively exponential, time in terms of the size of the entries. A second class, NP, consists of the problems which can be solved in polynomial time by a “nondeterministic Turing machine” — “nondeterministic” means the computer can parallelize itself at will when it needs to consider various possibilities. This class is contained in EXP (because one can simulate in exponential time a polynomial time nondeterministic algorithm) and also corresponds to the class of problems whose solution can be checked in polynomial time.

Our problem is to find a subset $S$ of $\{1,\dots,n\}$ that maximizes $c(S)$. This is a restriction of the “general max-cut” problem where, given an integer valued weight function $w$ on the set of edges, one wishes to find a subset $S$ that maximizes $c(S;w)$, the sum of the weights of the edges which have one endpoint in $S$ and the other outside of $S$. Karp (1972) observed that the existence of $S$ such that $c(S;w)\geq m$ is an NP problem (if one is provided with $S$, it takes polynomial time to compute $c(S;w)$ and to decide whether it is at least $m$), while the naïve search algorithm is in EXP, since there are $2^n$ such subsets. Moreover, Karp proved that any NP problem can be reduced to it in polynomial time; this is what is meant by the assertion that it is NP-complete. Consequently, determining $\sup_S c(S;w)$ is NP-hard: if you could solve that problem, then you could solve the “max-cut” problem in polynomial time, hence any other NP problem. A subsequent theorem of Garey, Johnson and Stockmeyer (1976) established that the restriction of the max-cut problem where all weights are equal to $1$ (the “simple max-cut” problem) is still NP-hard, and this completes the proof of Rohn's theorem.

(An aside, to insist that signs matter: by a theorem of Edmonds and Karp (1972), one can solve the “min-cut” problem in polynomial time; it consists in deciding, for some given integer $m$, whether there exists $S$ such that $c(S;w)\leq m$.)
