Matrix variety: basic geometry

Categories: low rank, variety

Author: Bin Gao

Published: December 20, 2024

Abstract

The set of bounded-rank matrices, \(\mathbb{R}^{m\times n}_{\leq r}=\{X\in\mathbb{R}^{m\times n}: \mathrm{rank}(X)\leq r\}\), is a non-smooth algebraic variety called a matrix variety. More specifically, it consists of the matrices whose \((r+1)\times(r+1)\) minors all vanish, so it is also called a determinantal variety. This post presents the basic geometry of \(\mathbb{R}^{m\times n}_{\leq r}\), which is foundational for optimization involving low-rank structures.

1. Algebraic variety

Definition

Algebraic varieties are the central objects of study in algebraic geometry.1 In the classical definition, an algebraic variety \(V\) is the set of solutions of a system of polynomial equations. \[V=\{x\in\mathbb{R}^{s}: p_i(x)=0~\text{for}~i=1,\dots,d\} = \{x\in\mathbb{R}^{s}: {P}(x)=(p_1(x),\dots,p_d(x))=0\},\]

where each \(p_i(x)\) is a polynomial; we consider only real coefficients.

Singularity (local property)

A point \(x\in V\) is called singular if the tangent space at \(x\) is not regularly defined, i.e., it either does not exist or requires a special definition. In other words, \(V\) is locally non-flat at \(x\).
A point \(x\in V\) is called regular if \(x\) is not singular.
A singular point \(x\in V\) is called a node if the Hessian matrix of the defining polynomial at \(x\) is non-singular.

Local structure: For an algebraic variety \(V\), almost all points in \(V\) are regular. Moreover, the set of regular points is both open and dense in \(V\), and \(V\) is a manifold near every regular point.

Jacobian criterion:2 A point \(x\in V\) is singular if the Jacobian matrix \(J(x)=[\nabla p_1(x)~\nabla p_2(x)~\cdots~\nabla p_d(x)]\) of the polynomials \(P(x)\) has lower rank at \(x\) than at other points of \(V\), i.e., \(\mathrm{rank}(J(x)) < \max_{y\in V}\mathrm{rank}(J(y))\). The rationale behind this is that singular points are those at which the first-order partial derivatives degenerate simultaneously, implying that the tangent space is not well defined.

Smoothness

An algebraic variety \(V\) is said to be non-singular or smooth if it has no singular points. The Jacobian criterion provides a computational way to test smoothness.

Example: a cubic curve
A cubic curve, \[y^2-x^2(x+1)=0,\] defines a non-smooth algebraic variety. The origin \((0,0)\) is a singular point because the Jacobian \(J(x,y) = (-3x^2-2x, 2y)\) vanishes at the origin while having rank \(1\) at every other point of the curve. Moreover, the Hessian of the cubic curve is \(H(x,y)=\left[\begin{smallmatrix} -6x-2 & 0\\ 0 & 2\end{smallmatrix}\right]\), which has both positive and negative eigenvalues at the singular point \((0,0)\), implying that \((0,0)\) is also a node.
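This can be checked symbolically. Below is a minimal SymPy sketch (an illustration added here, assuming SymPy is available) evaluating the Jacobian and Hessian of the defining polynomial:

```python
import sympy as sp

x, y = sp.symbols('x y')
p = y**2 - x**2 * (x + 1)  # defining polynomial of the cubic curve

# Gradient (Jacobian of the single polynomial) and Hessian
J = sp.Matrix([p]).jacobian([x, y])  # [-3*x**2 - 2*x, 2*y]
H = sp.hessian(p, (x, y))            # [[-6*x - 2, 0], [0, 2]]

print(J.subs({x: 0, y: 0}))   # Matrix([[0, 0]]): gradient vanishes -> singular
print(J.subs({x: -1, y: 0}))  # Matrix([[-1, 0]]): rank 1 at another curve point
print(H.subs({x: 0, y: 0}).eigenvals())  # {-2: 1, 2: 1}: mixed signs -> a node
```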

2. Matrix variety

Definition

The set of bounded-rank matrices with \(r\leq \mathrm{min}\{m,n\}\), \[\mathbb{R}^{m\times n}_{\leq r}:=\{X\in\mathbb{R}^{m\times n}: \mathrm{rank}(X)\leq r\},\]

is an algebraic variety (also called a matrix variety) since it consists of the matrices in which all \((r+1)\times(r+1)\) minors vanish. Let \(I_{r+1}=\{\mathbf{i}_{r+1}=(i_1,\dots,i_{r+1}): 1\leq i_1<\cdots<i_{r+1}\leq m\}\) and \(J_{r+1}=\{\mathbf{j}_{r+1}=(j_1,\dots,j_{r+1}): 1\leq j_1<\cdots<j_{r+1}\leq n\}\) be the index sets of \((r+1)\times(r+1)\) submatrices. Then it holds that \[\mathbb{R}^{m\times n}_{\leq r}=\{X\in\mathbb{R}^{m\times n}: \mathrm{det}(X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}})=0~\text{for all}~\mathbf{i}_{r+1}\in I_{r+1}, \mathbf{j}_{r+1}\in J_{r+1}\},\] where the number of equations is \(d=\binom{m}{r+1}\binom{n}{r+1}\). Thus, it is also called a determinantal variety.
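As a quick numerical illustration (a NumPy sketch added here, not from the original references), one can verify that all \((r+1)\times(r+1)\) minors of a random rank-\(r\) matrix vanish:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 5, 4, 2

# A random matrix of rank r, built from rank-r factors
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# All (r+1)x(r+1) minors, indexed by I_{r+1} x J_{r+1}
minors = [np.linalg.det(X[np.ix_(rows, cols)])
          for rows in itertools.combinations(range(m), r + 1)
          for cols in itertools.combinations(range(n), r + 1)]

print(len(minors))                  # d = C(5,3) * C(4,3) = 40 defining equations
print(max(abs(v) for v in minors))  # ~1e-15: every minor vanishes
```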

Singular points

The singular points of a determinantal variety can be derived via tools from algebraic geometry; e.g., [Theorem 10.3.3]3 or [Proposition 1.1]4. Here, we provide an elementary way to determine singularity by using the Jacobian criterion.

Given a square matrix \(A\), Jacobi's formula yields \(\nabla_A\mathrm{det}(A)={(A^*)}^{\top}\), where \(A^*\) is the adjugate (classical adjoint) matrix of \(A\). Thus, the partial gradient on the index \(\{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}\}\) of each defining polynomial is \[\nabla_{X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}}}\mathrm{det}(X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}}) = (X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}}^*)^{\top},\]

which leads to the full gradient \(\nabla_X\mathrm{det}(X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}})\), in which the \(\{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}\}\) entries are those of \((X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}}^*)^{\top}\) and the others are \(0\). According to the representation of the adjugate matrix via the cofactor matrix, the Jacobian matrix of the \(d\) defining polynomials reduces to a large matrix \(J(X)\in\mathbb{R}^{mn\times d}\) in which each entry is either an \(r\times r\) minor \(\mathrm{det}(X_{\mathbf{i}_{r}, \mathbf{j}_{r}})\) (up to sign) or \(0\). In fact, the Jacobian consists of all \(r\times r\) minors. It directly implies that the Jacobian vanishes at any point \[X\in \mathbb{R}^{m\times n}_{< r}:=\{X\in\mathbb{R}^{m\times n}: \mathrm{rank}(X)< r\}\]

since all \(r\times r\) minors vanish at \(X\in \mathbb{R}^{m\times n}_{< r}\), i.e., \(\mathrm{det}(X_{\mathbf{i}_{r}, \mathbf{j}_{r}})=0\). Therefore, the set of singular points of the matrix variety is \(\mathbb{R}^{m\times n}_{< r}\). Note that it is also a matrix variety since \(\mathbb{R}^{m\times n}_{< r}=\mathbb{R}^{m\times n}_{\leq (r-1)}\). Moreover, the set of regular points is \[\mathbb{R}^{m\times n}_{r}:=\{X\in\mathbb{R}^{m\times n}: \mathrm{rank}(X)= r\},\]

which is actually a smooth manifold.

In summary, \(\mathbb{R}^{m\times n}_{\leq r}\) is a non-smooth algebraic variety and its singular points are \(\mathbb{R}^{m\times n}_{< r}\).
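As a sanity check on Jacobi's formula used above, the following NumPy sketch (an illustration added here; it assumes \(A\) is invertible so that \(A^*=\mathrm{det}(A)A^{-1}\)) compares a finite-difference gradient of the determinant with the transposed adjugate:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

# Adjugate of an invertible matrix: A^* = det(A) * A^{-1}
adj = np.linalg.det(A) * np.linalg.inv(A)

# Central finite-difference gradient of det at A
eps = 1e-6
G = np.zeros_like(A)
for i in range(4):
    for j in range(4):
        E = np.zeros_like(A)
        E[i, j] = eps
        G[i, j] = (np.linalg.det(A + E) - np.linalg.det(A - E)) / (2 * eps)

print(np.max(np.abs(G - adj.T)))  # ~1e-9: grad det(A) = (A^*)^T
```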

Topology

The matrix variety \(\mathbb{R}^{m\times n}_{\leq r}\) is closed. Moreover, it is straightforward to verify that the set of regular points \(\mathbb{R}^{m\times n}_{r}\) is open and dense in \(\mathbb{R}^{m\times n}_{\leq r}\), which validates the local structure of an algebraic variety. Specifically, a sequence in \(\mathbb{R}^{m\times n}_{r}\) can converge to any singular point, and the closure of \(\mathbb{R}^{m\times n}_{r}\) is the matrix variety itself.
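For instance (a small NumPy demonstration added here), a sequence of rank-\(2\) matrices can converge to a rank-\(1\) singular point of \(\mathbb{R}^{4\times 4}_{\leq 2}\):

```python
import numpy as np

rng = np.random.default_rng(2)
m = n = 4

X = np.outer(rng.standard_normal(m), rng.standard_normal(n))  # rank 1: singular in R_{<=2}
D = np.outer(rng.standard_normal(m), rng.standard_normal(n))  # rank-1 perturbation

for k in [1, 10, 100, 1000]:
    Xk = X + D / k                           # generically rank 2, i.e., a regular point
    s = np.linalg.svd(Xk, compute_uv=False)
    print(k, np.linalg.matrix_rank(Xk), s[1])  # rank stays 2 while sigma_2 -> 0
```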

3. Fixed-rank manifold

The set of all fixed-rank matrices \(\mathbb{R}^{m\times n}_{\underline{r}}\) is a smooth manifold called the fixed-rank manifold; it is an embedded submanifold of \(\mathbb{R}^{m\times n}\) with dimension \(\underline{r}(m+n-\underline{r})\). To see this, one can find a local defining mapping in [Example 5.30] of the classic textbook5 or [Section 7.5] of a nice book6 on optimization on manifolds.
The matrix variety can also be interpreted as a “stratified” or “layered” space in the sense that \(\mathbb{R}^{m\times n}_{\leq r}= \bigcup_{\underline{r}=0}^{r} \mathbb{R}^{m\times n}_{\underline{r}}\).

Note that \(\mathbb{R}^{m\times n}_{\underline{r}}\) is not closed.

Tangent space

Given \(X\in\mathbb{R}_{\underline{r}}^{m\times n}\) with \(\underline{r}\leq r\), the tangent space at \(X\) can be parameterized by the derivative of a low-rank decomposition of \(X\). For instance, one can follow \(X=AB^{\top}\) in [Lemma 1]7 or the thin singular value decomposition (SVD) \(X=U\Sigma V^{\top}\) in [Proposition 2.1]8. Here, we follow the latter for the sake of computational convenience.

Given the thin SVD \(X=U\Sigma V^{\top}\), where \(U\in\mathrm{St}(\underline{r},m)\), \(V\in\mathrm{St}(\underline{r},n)\), \(\mathrm{St}(p,n):=\{X\in\mathbb{R}^{n\times p}: X^{\top}X=I\}\) is the Stiefel manifold, and \(\Sigma=\mathrm{Diag}(\sigma_1,\sigma_2,\dots,\sigma_{\underline{r}})\) with \(\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_{\underline{r}}>0\). The tangent space can be parameterized as follows,
\[\begin{aligned} \mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}&=\left\{\begin{bmatrix} U & U^\perp \end{bmatrix} \begin{bmatrix} \mathbb{R}^{\underline{r}\times \underline{r}} & \mathbb{R}^{\underline{r}\times (n-\underline{r})}\\ \mathbb{R}^{(m-\underline{r})\times \underline{r}} & 0 \end{bmatrix} \begin{bmatrix} V & V^\perp \end{bmatrix}^{\top}\right\}, \end{aligned}\]

where “\(^\perp\)” denotes the orthogonal complement. A tangent vector can be illustrated9 as

[Figure: \([U\quad U^\perp]\) times a block matrix times \([V\quad V^\perp]^{\top}\), where a shaded square represents an arbitrary matrix and a blank represents a zero matrix.]
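To get a feel for this parametrization, the following NumPy sketch (an illustrative check added here) builds a tangent vector \(\varXi\) from the block structure and verifies that moving along it leaves the manifold only to second order, i.e., \(\sigma_{\underline{r}+1}(X+t\varXi)=O(t^2)\):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, r = 6, 5, 2

X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank-r point
U, _, Vt = np.linalg.svd(X, full_matrices=True)
U, Up = U[:, :r], U[:, r:]            # U and its orthogonal complement
V, Vp = Vt.T[:, :r], Vt.T[:, r:]      # V and its orthogonal complement

# A tangent vector: arbitrary blocks except the (2,2) corner, which is zero
Xi = (U @ rng.standard_normal((r, r)) @ V.T
      + U @ rng.standard_normal((r, n - r)) @ Vp.T
      + Up @ rng.standard_normal((m - r, r)) @ V.T)

# The (r+1)-th singular value of X + t*Xi decays like t^2
for t in [1e-1, 1e-2, 1e-3]:
    print(t, np.linalg.svd(X + t * Xi, compute_uv=False)[r])
```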

Normal space

Consider the standard Euclidean metric. The normal space at \(X\) is \[\begin{aligned} \mathrm{N}_X \mathbb{R}^{m\times n}_{\underline{r}}&=\left\{\begin{bmatrix} U & U^\perp \end{bmatrix} \begin{bmatrix} 0 & 0\\ 0 & \mathbb{R}^{(m-\underline{r})\times (n-\underline{r})} \end{bmatrix} \begin{bmatrix} V & V^\perp \end{bmatrix}^{\top}\right\} \end{aligned}\] and a normal vector can be illustrated as

[Figure: \([U\quad U^\perp]\) times a block matrix times \([V\quad V^\perp]^{\top}\), where only the \((m-\underline{r})\times(n-\underline{r})\) corner block is shaded (arbitrary) and the rest are zero.]

With the help of the illustrations, it is straightforward to see that the direct sum of the tangent and normal spaces forms the Euclidean space, i.e., \(\mathbb{R}^{m\times n}=\mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}\oplus\mathrm{N}_X \mathbb{R}^{m\times n}_{\underline{r}}\).

Projection

Given a matrix \(A\in\mathbb{R}^{m\times n}\), the orthogonal projections onto the tangent and normal spaces are given by \[\begin{equation} \begin{aligned} \mathrm{P}_{\mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}} A&=\mathrm{P}_U A \mathrm{P}_V+\mathrm{P}_U^\perp A \mathrm{P}^{}_V+\mathrm{P}_U^{} A \mathrm{P}_V^\perp,\\ \mathrm{P}_{\mathrm{N}_X \mathbb{R}^{m\times n}_{\underline{r}}} A&=\mathrm{P}_U^{\perp} A \mathrm{P}_V^\perp, \end{aligned} \end{equation}\] where \(\mathrm{P}_U:=UU^{\top}\), \(\mathrm{P}_U^\perp:=I_{m}-UU^{\top}\), \(\mathrm{P}_V:=VV^{\top}\), and \(\mathrm{P}_V^\perp:=I_{n}-VV^{\top}\) are projection matrices.
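These projections are cheap to implement. Below is a minimal NumPy sketch (the helper names `proj_tangent` and `proj_normal` are our own, for illustration) that also verifies the direct-sum decomposition numerically:

```python
import numpy as np

def proj_tangent(A, U, V):
    """P_U A P_V + P_U^perp A P_V + P_U A P_V^perp = P_U A + A P_V - P_U A P_V."""
    return U @ (U.T @ A) + (A @ V) @ V.T - U @ (U.T @ A @ V) @ V.T

def proj_normal(A, U, V):
    """P_U^perp A P_V^perp, computed as the residual of the tangent projection."""
    return A - proj_tangent(A, U, V)

rng = np.random.default_rng(4)
m, n, r = 6, 5, 2
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
U, _, Vt = np.linalg.svd(X, full_matrices=False)
U, V = U[:, :r], Vt[:r, :].T

A = rng.standard_normal((m, n))
T, N = proj_tangent(A, U, V), proj_normal(A, U, V)

print(np.max(np.abs(T + N - A)))  # ~1e-16: the two parts decompose A
print(abs(np.sum(T * N)))         # ~1e-15: tangent and normal parts are orthogonal
```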

4. Geometry of \(\mathbb{R}^{m\times n}_{\leq r}\)

Although \(\mathbb{R}^{m\times n}_{\leq r}\) is a non-smooth algebraic variety, its geometry is closely related to that of fixed-rank manifolds. If \(X\in\mathbb{R}^{m\times n}_{\leq r}\) is regular, i.e., \(\mathrm{rank}(X)=r\), the geometry at \(X\) locally behaves like a manifold. If instead \(X\) is singular, one can investigate the local geometry via tangent and normal cones.

Tangent cone

Given \(X\in\mathbb{R}_{\leq r}^{m\times n}\) with rank \(\underline{r}\leq r\), the (Bouligand) tangent cone of \(\mathbb{R}^{m\times n}_{\leq r}\) at \(X\) is defined by \(\mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}:=\{\varXi\in\mathbb{R}^{m\times n}: \exists t^{(i)}\to 0, X^{(i)}\to X\text{ in }\mathbb{R}^{m\times n}_{\leq r}, \mathrm{s.~t.}~\frac{X^{(i)}-X}{t^{(i)}}\to\varXi\}\). The parametrization of the tangent cone can be obtained in different ways; e.g., [Theorem 6.1]10 with a diagonal normalization, or [Theorem 3.2]11 with a space decomposition. Specifically, the tangent cone is \[\begin{aligned} \mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}&=\left\{\begin{bmatrix} U & U^\perp \end{bmatrix} \begin{bmatrix} \mathbb{R}^{\underline{r}\times \underline{r}} & \mathbb{R}^{\underline{r}\times (n-\underline{r})}\\ \mathbb{R}^{(m-\underline{r})\times \underline{r}} & \mathbb{R}^{(m-\underline{r})\times (n-\underline{r})}_{\leq (r-\underline{r})} \end{bmatrix} \begin{bmatrix} V & V^\perp \end{bmatrix}^{\top}\right\}. \end{aligned}\]

A new parametrization: Here, we give a new reduced parametrization for the tangent cone by further decomposing an element \(F=\tilde{U}S\tilde{V}^{\top}\) of \(\mathbb{R}^{(m-\underline{r})\times (n-\underline{r})}_{\leq (r-\underline{r})}\). By some matrix computations12, we obtain the following parametrization, \[\begin{aligned} \mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}&=\left\{\begin{bmatrix} U & U_1 & U_2 \end{bmatrix} \begin{bmatrix} \mathbb{R}^{\underline{r}\times \underline{r}} & \mathbb{R}^{\underline{r}\times (r-\underline{r})} & \mathbb{R}^{\underline{r}\times (n-r)}\\ \mathbb{R}^{(r-\underline{r})\times \underline{r}} & \mathbb{R}^{(r-\underline{r})\times (r-\underline{r})} & 0\\ \mathbb{R}^{(m-r)\times \underline{r}} & 0 & 0 \end{bmatrix} \begin{bmatrix} V & V_1 & V_2 \end{bmatrix}^{\top}\right\}, \end{aligned}\]

where \(U_1\in \mathrm{St}(r-\underline{r},m)\), \(V_1\in \mathrm{St}(r-\underline{r},n)\), \(U_2\in \mathrm{St}(m-r,m)\), and \(V_2\in \mathrm{St}(n-r,n)\) satisfy \([ U\ U_1\ U_2]\in\mathcal{O}(m)\) and \([ V\ V_1\ V_2]\in\mathcal{O}(n)\). Furthermore, it can be illustrated as

[Figure: \([U\quad U_1\quad U_2]\) times a block matrix times \([V\quad V_1\quad V_2]^{\top}\), where the blocks in the first block row and first block column, together with the \((r-\underline{r})\times(r-\underline{r})\) middle block, are shaded (arbitrary) and the remaining blocks are zero.]

Space decomposition at singular points: Note that if \(X\in\mathbb{R}^{m\times n}_{\leq r}\) is regular, then the tangent cone boils down to the tangent space. Again from the illustration, it is easy to see the decomposition \[\mathrm{T}_X\mathbb{R}^{m\times n}_{\leq r}=\mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}\oplus\mathrm{N}_{\leq (r-\underline{r})}(X),\] where \(\mathrm{N}_{\leq (r-\underline{r})}(X):=\left\{N\in\mathrm{N}_X \mathbb{R}^{m\times n}_{\underline{r}}: \mathrm{rank}(N)\leq(r-\underline{r})\right\}\) denotes a low-rank cone in the normal space. It indicates that the local geometry can be identified via the local (fixed-rank) manifold \(\mathbb{R}_{\underline{r}}^{m\times n}\). As a result, a tangent vector can be viewed as a tangent vector of the fixed-rank manifold plus a low-rank matrix from the normal space, which gives a feeling for how a tangent vector looks.

Rank increase along normal part: Another interesting fact is that given a matrix \(X\in\mathbb{R}_{\leq r}^{m\times n}\) with rank \(\underline{r} < r\) and a vector in the cone \(N\in\mathrm{N}_{\leq (r-\underline{r})}(X)\setminus\{0\}\), it holds that \[\mathrm{rank}(X+tN)\in(\underline{r},r]\quad \text{for}~t>0.\] As suggested in [Section 3.3]13, this principle can be applied to increase the rank of \(X\), which facilitates the development of rank-adaptive methods.
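The following NumPy sketch (an illustration added here, not the rank-adaptive method of the cited work) demonstrates the rank jump: adding any nonzero \(N\) from the low-rank cone in the normal space increases the rank, and the result never exceeds the bound \(r\).

```python
import numpy as np

rng = np.random.default_rng(5)
m, n, r, r_low = 6, 5, 3, 1   # rank bound r, actual rank r_low < r

X = rng.standard_normal((m, r_low)) @ rng.standard_normal((r_low, n))
U, _, Vt = np.linalg.svd(X, full_matrices=True)
Up, Vp = U[:, r_low:], Vt.T[:, r_low:]  # orthogonal complements of U and V

# N in the low-rank cone: a normal vector with rank(N) <= r - r_low
k = r - r_low
N = Up @ rng.standard_normal((m - r_low, k)) @ rng.standard_normal((k, n - r_low)) @ Vp.T

for t in [0.1, 1.0, 10.0]:
    print(t, np.linalg.matrix_rank(X + t * N))  # jumps to r_low + k = 3, within (r_low, r]
```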

Normal cone

Given \(X\in\mathbb{R}_{\leq r}^{m\times n}\) with rank \(\underline{r}\leq r\), the set \(\mathrm{N}_X \mathbb{R}^{m\times n}_{\leq r}=(\mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r})^\circ:=\{N\in\mathbb{R}^{m\times n}:\langle N, \varXi\rangle\leq 0\ \text{for all}\ \varXi\in\mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}\}\) is called the normal cone of \(\mathbb{R}^{m\times n}_{\leq r}\). It is easy to verify that \[\begin{aligned} \mathrm{N}_X \mathbb{R}^{m\times n}_{\leq r}&= \left\{ \begin{array}{ll} \mathrm{N}_X \mathbb{R}^{m\times n}_{r},&\ \text{if}\ \underline{r}=r;\\ \{0\},&\ \text{if}\ \underline{r}<r. \end{array} \right. \end{aligned}\]

Projection

Given a matrix \(A\in\mathbb{R}^{m\times n}\), the orthogonal projection onto the tangent cone is \[\mathrm{P}_{\mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}} A=\mathrm{P}_{\mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}} A+\mathrm{P}_{\leq (r-\underline{r})} \left(\mathrm{P}_U^\perp A \mathrm{P}_V^\perp\right),\]

where \(\mathrm{P}_{\leq (r-\underline{r})}\) is the projection onto \(\mathbb{R}^{m\times n}_{\leq (r-\underline{r})}\), which can be achieved by the truncated SVD.
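Putting the pieces together, here is a minimal NumPy sketch of this projection (the helper names `truncate` and `proj_tangent_cone` are our own, for illustration; it assumes \(X=U\Sigma V^{\top}\) is given by its thin SVD factors):

```python
import numpy as np

def truncate(A, k):
    """Best rank-k approximation of A via the truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

def proj_tangent_cone(A, U, V, r):
    """Project A onto the tangent cone of R^{m x n}_{<= r} at X = U S V^T."""
    r_low = U.shape[1]                               # rank of X
    tangent = U @ (U.T @ A) + (A @ V) @ V.T - U @ (U.T @ A @ V) @ V.T
    normal = A - tangent                             # P_U^perp A P_V^perp
    return tangent + truncate(normal, r - r_low)

rng = np.random.default_rng(6)
m, n, r, r_low = 6, 5, 3, 1
X = rng.standard_normal((m, r_low)) @ rng.standard_normal((r_low, n))
U, _, Vt = np.linalg.svd(X, full_matrices=False)
P = proj_tangent_cone(rng.standard_normal((m, n)), U[:, :r_low], Vt[:r_low, :].T, r)
```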

Footnotes

  1. Robin Hartshorne. “Algebraic geometry”. Vol. 52. Springer Science & Business Media, 2013.↩︎

  2. The Jacobian criterion for testing smoothness can be found in a textbook on computational algebraic geometry, e.g., [Corollary 5.6.14] of the book: Gert-Martin Greuel, Gerhard Pfister. “A Singular introduction to commutative algebra”. Vol. 348. Berlin: Springer, 2008.↩︎

  3. V. Lakshmibai, and Justin Brown. “The Grassmannian variety.” Developments in Mathematics. Vol. 42. Springer New York, 2015.↩︎

  4. Winfried Bruns, and Udo Vetter. “Determinantal rings”. Vol. 1327. Springer, 2006.↩︎

  5. John M. Lee. “Introduction to Smooth Manifolds”. Version 3.0. December 31, 2000.↩︎

  6. Nicolas Boumal. “An introduction to optimization on smooth manifolds”. Cambridge University Press, 2023.↩︎

  7. Uri Shalit, Daphna Weinshall, and Gal Chechik. “Online learning in the manifold of low-rank matrices.” Advances in neural information processing systems 23 (2010).↩︎

  8. Bart Vandereycken. “Low-rank matrix completion by Riemannian optimization”. In: SIAM Journal on Optimization 23.2 (2013), pp. 1214–1236.↩︎

  9. Bin Gao, Renfeng Peng, Ya-xiang Yuan. “Low-rank optimization on Tucker tensor varieties.” arXiv:2311.18324 (2023).↩︎

  10. T. P. Cason, P.-A. Absil, and P. Van Dooren, Iterative methods for low rank approximation of graph similarity matrices, Linear Algebra Appl., 438 (2013), pp. 1863–1882.↩︎

  11. Reinhold Schneider and André Uschmajew. “Convergence results for projected line-search methods on varieties of low-rank matrices Via Lojasiewicz Inequality”. In: SIAM Journal on Optimization 25.1 (2015), pp. 622–646.↩︎

  12. See [Section 2.1] of the work by Bin Gao, Renfeng Peng, Ya-xiang Yuan. “Low-rank optimization on Tucker tensor varieties.” arXiv:2311.18324 (2023).↩︎

  13. Bin Gao and P.-A. Absil. “A Riemannian rank-adaptive method for low-rank matrix completion”. In: Computational Optimization and Applications 81 (2022), pp. 67–90.↩︎