Let $E$ and $F$ be normed vector spaces, over the real or complex numbers, and let $u\colon E\to F$ be a linear map. The continuity of $u$ is equivalent to the existence of a real number $c$ such that $\|u(x)\|\leq c\|x\|$ for every $x\in E$, and the least such real number is called the operator norm of $u$; we denote it by $\|u\|$. It defines a norm on the linear space $\mathcal L(E,F)$ of continuous linear maps from $E$ to $F$ and as such is quite important. When $E=F$, it is also related to the spectrum of $u$ and is implicitly at the heart of criteria such as the Gershgorin criterion for localization of eigenvalues.
However, even in the simplest cases of matrices, its explicit computation is not trivial at all, and, as we'll see, even less trivial than what is taught in algebra classes, as I learned by browsing Wikipedia when I wanted to prepare a class on the topic.
Since I'm more of an abstract guy, I will use both languages, that of normed spaces and that of matrices, for the first one allows one to explain a few things at a more fundamental level. I'll make the translation, though. Also, to be specific, I'll work with real vector spaces.
So $E=\mathbf R^m$, $F=\mathbf R^n$, and linear maps $u\in\mathcal L(E,F)$ are represented by $n\times m$ matrices $A$. There are plentiful interesting norms on $E$, but I will concentrate the discussion on the $\ell^p$-norm given by $\|x\|_p=(|x_1|^p+\dots+|x_m|^p)^{1/p}$. Similarly, I will consider the $\ell^q$-norm on $F$ given by $\|y\|_q=(|y_1|^q+\dots+|y_n|^q)^{1/q}$. Here, $p$ and $q$ are elements of $[1,+\infty[$. It is also interesting to allow $p=\infty$ or $q=\infty$; in this case, the expression defining the norm is just replaced by $\sup(|x_1|,\dots,|x_m|)$ and $\sup(|y_1|,\dots,|y_n|)$ respectively.
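Throughout, it can help to experiment numerically. Here is a minimal numpy sketch (the helper names `lp_norm` and `operator_norm_lower_bound` are mine, not library functions): it computes the $\ell^p$-norms above and estimates $\|A\|_{p,q}$ from below by random sampling.

```python
import numpy as np

def lp_norm(x, p):
    """The l^p norm of a vector; p = np.inf gives the sup norm."""
    if p == np.inf:
        return np.abs(x).max()
    return (np.abs(x) ** p).sum() ** (1 / p)

def operator_norm_lower_bound(A, p, q, trials=10_000, seed=0):
    """Monte-Carlo lower bound for ||A||_{p,q}: maximize ||Ax||_q / ||x||_p
    over random vectors x."""
    rng = np.random.default_rng(seed)
    best = 0.0
    for _ in range(trials):
        x = rng.standard_normal(A.shape[1])
        best = max(best, lp_norm(A @ x, q) / lp_norm(x, p))
    return best
```

Sampling only yields a lower bound on the supremum, but it is enough for the sanity checks that follow; for vectors, `np.linalg.norm(x, p)` computes the same norms, including `p = np.inf`.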
Duality
Whatever norm is given on $E$, the dual space $E^*$ is endowed with the dual norm, which is just the operator norm of that space: for $\phi\in E^*$, $\|\phi\|$ is the least real number such that $|\phi(x)|\leq\|\phi\|\,\|x\|$ for all $x\in E$. And similarly for $F$. To emphasize duality, we will write $\langle x,\phi\rangle$ instead of $\phi(x)$.
Example. — The dual norm of the $\ell^p$-norm can be computed explicitly, thanks to the Young inequality $|x_1y_1+\dots+x_my_m|\leq\|x\|_p\,\|y\|_{p^*}$ if $p,p^*\in[1,+\infty]$ are related by the relation $\frac1p+\frac1{p^*}=1$. (When $p=1$, this gives $p^*=\infty$, and symmetrically $p^*=1$ if $p=\infty$.) Moreover, this inequality is optimal in the sense that for any $x$, one may find a nonzero $y$ for which the inequality is an equality. What this inequality says about norms/dual norms is that if one identifies $\mathbf R^m$ with its dual, via the duality bracket $\langle x,y\rangle=x_1y_1+\dots+x_my_m$, the dual of the $\ell^p$-norm is the $\ell^{p^*}$-norm, for $p^*$ given by that relation.
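To make the optimality assertion concrete, here is a sketch of the classical construction of a vector $y$ achieving equality (the function name `holder_equality_partner` is mine):

```python
import numpy as np

def holder_equality_partner(x, p):
    """A vector y with |<x, y>| = ||x||_p * ||y||_{p*} (classical construction)."""
    if p == 1:
        return np.sign(x)                     # p* = inf
    if p == np.inf:                           # p* = 1: concentrate on a largest coordinate
        y = np.zeros_like(x)
        j = np.abs(x).argmax()
        y[j] = np.sign(x[j])
        return y
    return np.sign(x) * np.abs(x) ** (p - 1)  # 1 < p < inf

x = np.array([1.0, -2.0, 3.0])
for p, pstar in [(1, np.inf), (2, 2.0), (3, 1.5), (np.inf, 1)]:
    y = holder_equality_partner(x, p)
    assert np.isclose(abs(x @ y), np.linalg.norm(x, p) * np.linalg.norm(y, pstar))
```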
If $u\colon E\to F$ is a continuous linear map, it has an adjoint (or transpose) $u^*\colon F^*\to E^*$, which is defined by $u^*(\phi)=\phi\circ u$ for $\phi\in F^*$. In terms of the duality bracket, this rewrites as $\langle x,u^*(\phi)\rangle=\langle u(x),\phi\rangle$ for $x\in E$ and $\phi\in F^*$.
Proposition. — One has $\|u^*\|=\|u\|$.
For $\phi\in F^*$, $\|u^*(\phi)\|$ is the least real number such that $|\langle x,u^*(\phi)\rangle|\leq\|u^*(\phi)\|\,\|x\|$ for all $x\in E$. Since one has
$$|\langle x,u^*(\phi)\rangle|=|\langle u(x),\phi\rangle|\leq\|\phi\|\,\|u(x)\|\leq\|\phi\|\,\|u\|\,\|x\|,$$
we see that $\|u^*(\phi)\|\leq\|u\|\,\|\phi\|$ for all $\phi$. As a consequence, $\|u^*\|\leq\|u\|$.
To get the other inequality, we wish to find a nonzero $\phi$ such that $\|u^*(\phi)\|=\|u\|\,\|\phi\|$. This $\phi$ should thus be such that there exists a nonzero $x\in E$ with $|\langle x,u^*(\phi)\rangle|=\|u\|\,\|\phi\|\,\|x\|$, which, by the preceding computation, means that $|\langle u(x),\phi\rangle|=\|u\|\,\|x\|\,\|\phi\|$. Such $\phi$ and $x$ need not exist in general, but we can find reasonable approximations.
- Start with a nonzero $x\in E$ such that $\|u(x)\|$ is close to $\|u\|\,\|x\|$;
- then, using the Hahn-Banach theorem, find a nonzero $\phi\in F^*$ such that $\|\phi\|=1$ and $\langle u(x),\phi\rangle=\|u(x)\|$. We see that $\|u^*(\phi)\|\geq|\langle x,u^*(\phi)\rangle|/\|x\|=\|u(x)\|/\|x\|$ is close to $\|u\|=\|u\|\,\|\phi\|$, and this concludes the proof.
In some cases, in particular in the finite dimensional case, we can use biduality to get the other inequality. Indeed, $E^{**}$ identifies with $E$, with its initial norm, and $u^{**}$ identifies with $u$. By the first case, we thus have $\|u\|=\|u^{**}\|\leq\|u^*\|$, hence the desired equality.
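Here is a quick numerical illustration of the proposition; both quantities below are Monte-Carlo lower bounds for norms that the proposition asserts to be equal, so they should come out close (the helper `lower_bound` is mine, and the matrix is an arbitrary example):

```python
import numpy as np

def lower_bound(A, p, q, trials=20_000, seed=0):
    """Monte-Carlo lower bound for ||A||_{p,q}."""
    rng = np.random.default_rng(seed)
    xs = rng.standard_normal((trials, A.shape[1]))
    return max(np.linalg.norm(A @ x, q) / np.linalg.norm(x, p) for x in xs)

A = np.array([[1.0, -2.0, 0.5], [3.0, 0.0, -1.0]])
p, q = 3.0, 4.0
pstar, qstar = 1.5, 4/3  # conjugate exponents of p and q
print(lower_bound(A, p, q), lower_bound(A.T, qstar, pstar))  # approximately equal
```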
The case $p=1$
We compute $\|u\|$ when $E=\mathbf R^m$ is endowed with the $\ell^1$-norm, and $F$ is arbitrary. The linear map $u$ thus corresponds with $m$ vectors $u_1,\dots,u_m$ of $F$ (the images of the canonical basis vectors), and one has $u(x)=x_1u_1+\dots+x_mu_m$. By the triangle inequality, we have $$\|u(x)\|\leq|x_1|\,\|u_1\|+\dots+|x_m|\,\|u_m\|,$$ hence $$\|u(x)\|\leq\sup(\|u_1\|,\dots,\|u_m\|)\,\|x\|_1.$$ Consequently, $\|u\|\leq\sup(\|u_1\|,\dots,\|u_m\|)$. On the other hand, taking $x$ of the form $(0,\dots,1,\dots,0)$, where the $1$ is at a place $j$ such that $\|u_j\|$ is largest, we have $\|x\|_1=1$ and $\|u(x)\|=\|u_j\|$. The preceding inequality is thus an equality.
In the matrix case, this shows that the $(1,q)$-norm of a matrix $A$ is the supremum of the $\ell^q$-norms of the columns of $A$.
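In code, this gives an exact formula (the name `norm_1_q` is mine); for $q=1$ it coincides with numpy's built-in matrix $1$-norm, which provides an independent check:

```python
import numpy as np

def norm_1_q(A, q):
    """||A||_{1,q}: the largest l^q norm of a column of A."""
    return max(np.linalg.norm(A[:, j], q) for j in range(A.shape[1]))

A = np.array([[1.0, -2.0], [3.0, 0.5]])
assert np.isclose(norm_1_q(A, 1), np.linalg.norm(A, 1))  # numpy's ord=1: max column sum
```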
The case $q=\infty$
We compute $\|u\|$ when $F=\mathbf R^n$ is endowed with the $\ell^\infty$-norm, and $E$ is arbitrary. A direct computation is possible in the matrix case, but it is not really illuminating, and I find it better to argue geometrically, using a duality argument.
Namely, we can use $u^*$ to compute $\|u\|$, since $\|u\|=\|u^*\|$. We have seen above that $F^*$ is $\mathbf R^n$, endowed with the $\ell^1$-norm, so that we have computed $\|u^*\|$ in the preceding section. The canonical basis of $\mathbf R^n$ gives a dual basis $(\phi_1,\dots,\phi_n)$ of $F^*$, and one has $$\|u\|=\|u^*\|=\sup\big(\|u^*(\phi_1)\|,\dots,\|u^*(\phi_n)\|\big).$$
In the matrix case, this shows that the $(p,\infty)$-norm of a matrix $A$ is the supremum of the $\ell^{p^*}$-norms of the rows of $A$.
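The corresponding sketch (again, the name `norm_p_inf` is mine); for $p=\infty$, one has $p^*=1$ and the formula coincides with numpy's built-in matrix $\infty$-norm, the maximal absolute row sum:

```python
import numpy as np

def norm_p_inf(A, pstar):
    """||A||_{p,inf}: the largest l^{p*} norm of a row of A."""
    return max(np.linalg.norm(A[i, :], pstar) for i in range(A.shape[0]))

A = np.array([[1.0, -2.0], [3.0, 0.5]])
assert np.isclose(norm_p_inf(A, 1), np.linalg.norm(A, np.inf))  # max row sum
```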
Relation with the Gershgorin circle theorem
I mentioned the Gershgorin circle theorem as being in the same spirit as the computation of an operator norm, because its proof relies on the same kind of estimates. In fact, no additional computation is necessary!
Theorem (Gershgorin “circle theorem”). — Let $A=(a_{ij})$ be an $n\times n$ matrix and let $\lambda$ be an eigenvalue of $A$. There exists an integer $i\in\{1,\dots,n\}$ such that $$|\lambda-a_{ii}|\leq\sum_{j\neq i}|a_{ij}|.$$
For the proof, one writes $A=D+N$, where $D$ is diagonal and $N$ has zeroes on its diagonal, and one writes $Ax=\lambda x$, hence $(\lambda I_n-D)x=Nx$. Endow $\mathbf C^n$ with the $\ell^\infty$-norm. We may assume that $\|x\|_\infty=1$; choose $i$ such that $|x_i|=1$. Looking at the $i$th coordinate, the right hand side is bounded above by $\sum_{j\neq i}|a_{ij}|\,\|x\|_\infty=\sum_{j\neq i}|a_{ij}|$, which is exactly the row estimate underlying the above formula for $\|N\|$, while the left hand side has absolute value $|\lambda-a_{ii}|\,|x_i|=|\lambda-a_{ii}|$. This implies the theorem.
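A small numerical check of the theorem, using only standard numpy calls:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6))
radii = np.abs(A).sum(axis=1) - np.abs(np.diag(A))  # row sums without the diagonal
for lam in np.linalg.eigvals(A):
    # each eigenvalue lies in at least one Gershgorin disc
    assert (np.abs(lam - np.diag(A)) <= radii + 1e-12).any()
```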
The case $p=q=2$
Since Euclidean spaces are very useful in applications, this is a very important case to consider, and we will see that the answer is not at all straightforward from the coefficients of the matrix.
We have to bound from above $\|u(x)\|^2=\langle u(x),u(x)\rangle$. Using the scalar product, we write $\langle u(x),u(x)\rangle=\langle u^*(u(x)),x\rangle$, where now $u^*$ denotes the adjoint of $u$, which identifies with the transpose of $u$ if one identifies $E$ with $E^*$ and $F$ with $F^*$ by means of their scalar products. Using the Cauchy-Schwarz inequality, we get that $\|u(x)\|^2\leq\|u^*(u(x))\|\,\|x\|\leq\|u^*\circ u\|\,\|x\|^2$, hence $\|u\|^2\leq\|u^*\circ u\|$. This inequality is remarkable because, on the other hand, we have $\|u^*\circ u\|\leq\|u^*\|\,\|u\|=\|u\|^2$. Consequently, $\|u\|^2=\|u^*\circ u\|$.
This formula might not appear to be so useful, since it reduces the computation of the operator norm of $u$ to that of $u^*\circ u$. However, the linear map $u^*\circ u$ is a positive self-adjoint endomorphism of $E$, hence (assuming that $E$ is finite dimensional here) it can be diagonalized in an orthonormal basis. We then see that $\|u^*\circ u\|$ is the largest eigenvalue of $u^*\circ u$, which is also its spectral radius $\rho(u^*\circ u)$. The square roots of the eigenvalues of $u^*\circ u$ are also called the singular values of $u$, hence $\|u\|$ is the largest singular value of $u$.
One can play with duality as well, and we have $\|u\|=\|u^*\|=\rho(u\circ u^*)^{1/2}=\rho(u^*\circ u)^{1/2}$.
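Numerically, all these descriptions of $\|u\|$ agree; a quick check with standard numpy calls:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 5))
spectral = np.linalg.norm(A, 2)                      # operator norm for p = q = 2
largest_sv = np.linalg.svd(A, compute_uv=False)[0]   # largest singular value
rho = np.linalg.eigvalsh(A.T @ A).max()              # spectral radius of A* A
assert np.isclose(spectral, largest_sv)
assert np.isclose(spectral, np.sqrt(rho))
```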
Other cases?
There are general inequalities relating the various $\ell^p$-norms of a vector $x\in\mathbf R^m$, and these can be used to deduce inequalities for $\|u\|_{p,q}$, when $E$ has an $\ell^p$-norm and $F$ has an $\ell^q$-norm. However, given the explicit value of $\|u\|$ for $p=q=2$ and the fact that no closed form expression exists for the spectral radius, it is unlikely that there is a closed form expression in the remaining cases.
Worse: the exact computation of $\|u\|_{p,q}$ in the cases $(p,q)=(\infty,1)$, $(\infty,2)$ or $(2,1)$ is known to be NP-hard, and I try to explain this result below, following J. Rohn (2000) (“Computing the Norm ∥A∥∞,1 Is NP-Hard”, Linear and Multilinear Algebra 47 (3), p. 195–204). I concentrate on the case $(\infty,1)$; the case $(\infty,2)$ is supposed to be analogous (see Joel Tropp's thesis, top of page 48, quoted by Wikipedia, but no arguments are given), and the case $(2,1)$ would follow by symmetry, using the duality $\|u\|=\|u^*\|$.
A matrix from a graph
Let us consider a finite (undirected, simple, without loops) graph $G$ on the set $V=\{1,\dots,n\}$ of vertices, with set of edges $E$, and let us introduce the following $n\times n$ matrix $A_G=(a_{ij})$, a variant of the incidence matrix of the graph (actually $A_G=nI_n-B_G$, where $I_n$ is the identity matrix and $B_G$ is the incidence matrix of $G$; a concrete construction is sketched in code after this list):
- One has $a_{ii}=n$ for all $i$;
- If $i\neq j$ and vertices $i$ and $j$ are connected by an edge, then $a_{ij}=-1$;
- Otherwise, $a_{ij}=0$.
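Concretely, assuming vertices are numbered $0,\dots,n-1$ (the helper name `graph_matrix` is mine):

```python
import numpy as np

def graph_matrix(n, edges):
    """The matrix A_G = n*I_n - B_G for a simple graph on vertices 0..n-1."""
    A = n * np.eye(n)
    for i, j in edges:
        A[i, j] = A[j, i] = -1.0
    return A

# example: the 4-cycle
A = graph_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```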
Proposition. — The $(\infty,1)$-norm of $A_G$ is given by $$\|A_G\|_{\infty,1}=n^2-2\,\mathrm{Card}(E)+4\,\mathrm{maxcut}(G),$$ where $\mathrm{maxcut}(G)$ denotes the largest number of edges of $G$ joining a subset of $V$ to its complement.
The proof starts with the following observation, valid for more general matrices.
Lemma. — The $(\infty,1)$-norm of a symmetric positive matrix $A$ is given by $$\|A\|_{\infty,1}=\sup_{x\in S}\,\langle Ax,x\rangle,$$ where $x$ runs among the set $S$ of vectors in $\mathbf R^n$ with coordinates $\pm1$.
The vectors of $S$ are the vertices of the polytope $[-1,1]^n$, which is the unit ball of $\mathbf R^n$ for the $\ell^\infty$-norm. Consequently, every vector of $[-1,1]^n$ is a convex combination of vectors of $S$.
Writing $x=\sum_{s\in S}c_ss$ with $c_s\geq0$ and $\sum_sc_s=1$, we have $$\|Ax\|_1=\Big\|\sum_{s\in S}c_sAs\Big\|_1\leq\sum_{s\in S}c_s\|As\|_1\leq\sup_{s\in S}\|As\|_1.$$ The other inequality being obvious (the elements of $S$ belong to the unit ball), we already see that $\|A\|_{\infty,1}=\sup_{s\in S}\|As\|_1$.
Note that this formula holds for any norm on the codomain.
If, for $s\in S$, one writes $As=(y_1,\dots,y_n)$, one has $\|As\|_1=|y_1|+\dots+|y_n|$, because the codomain is endowed with the $\ell^1$-norm, so that $\|As\|_1=\sup_{t\in S}\langle As,t\rangle$. We thus get the equality $$\|A\|_{\infty,1}=\sup_{s,t\in S}\,\langle As,t\rangle.$$
Let us now use the fact that $A$ is symmetric and positive.
Fix $s\in S$, set $y=As$ as above, and define $t\in S$ by $t_i=1$ if $y_i\geq0$ and $t_i=-1$ otherwise. One thus has $\|As\|_1=\langle As,t\rangle$.
Since $A$ is symmetric and positive, one has $\langle A(s-t),s-t\rangle\geq0$, and this implies $$\langle As,t\rangle\leq\tfrac12\big(\langle As,s\rangle+\langle At,t\rangle\big)\leq\sup_{x\in S}\,\langle Ax,x\rangle,$$ so that $\|A\|_{\infty,1}\leq\sup_{x\in S}\langle Ax,x\rangle$. The reverse inequality is contained in the preceding equality (take $s=t=x$ in the supremum). This concludes the proof.
To prove the theorem, we will apply the preceding lemma. We observe that $A_G$ is symmetric, by construction. It is also positive, since for every $x\in\mathbf R^n$, one has $$\langle A_Gx,x\rangle=n\|x\|^2-2\sum_{\{i,j\}\in E}x_ix_j\geq0,$$ using the Cauchy-Schwarz inequality $2|x_ix_j|\leq x_i^2+x_j^2$ and the fact that each vertex belongs to at most $n-1$ edges. By the preceding lemma, we thus have $\|A_G\|_{\infty,1}=\sup_{x\in S}\langle A_Gx,x\rangle$. The vectors $x\in S$ are in bijection with the subsets of $V$, by associating with $x$ the subset $S_x$ of $V$ consisting of vertices $i$ such that $x_i=1$. Then, one can compute $$\langle A_Gx,x\rangle=n^2-2\sum_{\{i,j\}\in E}x_ix_j=n^2-2\,\mathrm{Card}(E)+4\,c(S_x),$$ where $c(S_x)$ is the number of edges joining $S_x$ to its complement (each such edge contributes $x_ix_j=-1$, while the other edges contribute $+1$). It follows that $\|A_G\|_{\infty,1}$ is equal to the indicated expression.
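The proposition can be checked by brute force on small graphs. Here is a sketch (the names `norm_inf_1` and `maxcut` are mine); both enumerations take exponential time, which is of course the whole point of what follows:

```python
import numpy as np
from itertools import product

def norm_inf_1(A):
    """||A||_{inf,1}, by enumerating the 2^n sign vectors (exponential time)."""
    n = A.shape[1]
    return max(np.abs(A @ np.array(s)).sum() for s in product((-1, 1), repeat=n))

def maxcut(n, edges):
    """Brute-force max-cut: the largest number of edges leaving a subset."""
    return max(sum(1 for i, j in edges if s[i] != s[j])
               for s in product((0, 1), repeat=n))

n, edges = 4, [(0, 1), (1, 2), (2, 3), (3, 0)]   # the 4-cycle again
A = n * np.eye(n)
for i, j in edges:
    A[i, j] = A[j, i] = -1.0
assert norm_inf_1(A) == n**2 - 2*len(edges) + 4*maxcut(n, edges)  # 24 = 16 - 8 + 16
```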
The last step of the proof is an application of the “simple max-cut” NP-hardness theorem of Garey, Johnson and Stockmeyer (1976), itself a strengthening of Karp's seminal result (1972) that “max-cut” is NP-complete. I won't explain the proofs of these results here, but let me explain what they mean and how they relate to the present discussion. First of all, computer scientists categorize problems according to the time that is required to solve them, in terms of the size of the entries. This notion depends on the actual computer that is used, but the theory of Turing machines allows one to single out two classes, P and EXP, consisting of problems which can be solved in polynomial, respectively exponential, time in terms of the size of the entries. A second notion, introduced by Karp, is that of NP problems, problems which can be solved in polynomial time by a “nondeterministic Turing machine” — “nondeterministic” means the computer can parallelize itself at will when it needs to consider various possibilities. This class is contained in EXP (because one can simulate in exponential time a polynomial time nondeterministic algorithm) and also corresponds to the class of problems whose solution can be checked in polynomial time.
Our problem is to find a subset $S$ of $V$ that maximizes $c(S)$. This is a restriction of the “general max-cut” problem where, given an integer valued function $w$ on the set of edges, one wishes to find a subset $S$ that maximizes $w(S)$, the sum of the weights of the edges which have one endpoint in $S$ and the other outside of $S$. Karp (1972) observed that the existence of $S$ such that $w(S)\geq k$ is an NP problem (if one is provided $S$, it takes polynomial time to compute $w(S)$ and to decide that it is at least $k$), and the naïve search algorithm is in EXP, since there are $2^n$ such subsets. Moreover, Karp proved that any NP problem can be reduced to it in polynomial time. This is what is meant by the assertion that it is NP-complete. Consequently, determining the maximal value of $w(S)$ is NP-hard: if you can solve that problem, then you can solve the “max-cut” decision problem in polynomial time, hence any other NP problem. A subsequent theorem by Garey, Johnson and Stockmeyer (1976) established that restricting the max-cut problem to weights all equal to $1$ is still NP-hard, and this completes the proof of Rohn's theorem.
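To illustrate the membership in NP, here is a tiny verifier for the cut decision problem: given the certificate $S$, it checks $w(S)\geq k$ in polynomial time (the function name is mine):

```python
def check_cut_certificate(edges, weight, S, k):
    """Given a certificate S, verify in polynomial time that w(S) >= k."""
    w = sum(weight[e] for e in edges if (e[0] in S) != (e[1] in S))
    return w >= k

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
weight = {e: 1 for e in edges}                          # "simple" max-cut: all weights 1
print(check_cut_certificate(edges, weight, {0, 2}, 4))  # True: this cut has weight 4
```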
(Aside, to insist that signs matter: by a theorem of Edmonds and Karp (1972), one can solve the “min-cut” problem in polynomial time, which consists in deciding, for some given integer $k$, whether there exists $S$ such that $w(S)\leq k$.)