1. Algebraic variety
Definition
Algebraic varieties are the central objects of study in algebraic geometry.1 In the classical definition, an algebraic variety \(V\) is the set of solutions of a system of polynomial equations. \[V=\{x\in\mathbb{R}^{s}: p_i(x)=0~\text{for}~i=1,\dots,d\} = \{x\in\mathbb{R}^{s}: {P}(x)=(p_1(x),\dots,p_d(x))=0\},\]
where each \(p_i(x)\) is a polynomial, and we only consider real coefficients.
Singularity (local property)
A point \(x\in V\) is called singular if the tangent space at \(x\) is not well defined, i.e., either it does not exist or a special definition is required. In other words, \(V\) fails to be locally flat at \(x\).
A point \(x\in V\) is called regular if \(x\) is not singular.
A singular point \(x\in V\) is called a node if the Hessian matrix at \(x\) is non-singular.
Local structure: For an algebraic variety \(V\), almost all points in \(V\) are regular. Moreover, the set of regular points is both open and dense in \(V\), and \(V\) is a manifold near every regular point.
Jacobian criterion:2 A point \(x\in V\) is singular if the Jacobian matrix \(J(x)=[\nabla p_1(x)~\nabla p_2(x)~\cdots~\nabla p_d(x)]\) of the polynomials \(P(x)\) has lower rank at \(x\) than at the regular points of \(V\), i.e., \(\mathrm{rank}(J(x)) < \max_{y\in V}\mathrm{rank}(J(y))\). The rationale is that the rank of the Jacobian drops exactly where the defining equations fail to determine a well-defined tangent space; in the hypersurface case \(d=1\), these are the points at which all first-order partial derivatives vanish simultaneously.
Smoothness
An algebraic variety \(V\) is said to be non-singular or smooth if it has no singular points. The Jacobian criterion provides a computational way to test smoothness.
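For concreteness, here is a minimal symbolic sketch of the criterion (assuming SymPy) on the nodal cubic \(y^2-x^2(x+1)=0\), a standard example with a single node at the origin; the variable and polynomial names below are chosen only for illustration.

```python
import sympy as sp

# A minimal check of the Jacobian criterion on the nodal cubic
#   p(x, y) = y^2 - x^2*(x + 1) = 0,
# which has a singular point (a node) at the origin.
x, y = sp.symbols("x y", real=True)
p = y**2 - x**2 * (x + 1)

# Jacobian of the single defining polynomial: the row [dp/dx, dp/dy].
J = sp.Matrix([p]).jacobian([x, y])

# On a hypersurface, a point is singular iff all partial derivatives vanish
# there (the Jacobian drops from rank 1 to rank 0).
singular_candidates = sp.solve(list(J) + [p], [x, y], dict=True)
print(singular_candidates)          # expected: [{x: 0, y: 0}], the node at the origin

# The Hessian at the origin is nonsingular, so the origin is indeed a node.
H = sp.hessian(p, (x, y))
print(H.subs({x: 0, y: 0}).det())   # -4 != 0
```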
2. Matrix variety
Definition
The set of bounded-rank matrices with \(r\leq \mathrm{min}\{m,n\}\), \[\mathbb{R}^{m\times n}_{\leq r}:=\{X\in\mathbb{R}^{m\times n}: \mathrm{rank}(X)\leq r\},\]
is an algebraic variety (also called a matrix variety) since it consists of the matrices in which all \((r+1)\times(r+1)\) minors vanish. Let \(I_{r+1}=\{\mathbf{i}_{r+1}=(i_1,\dots,i_{r+1}): 1\leq i_1<\cdots<i_{r+1}\leq m\}\) and \(J_{r+1}=\{\mathbf{j}_{r+1}=(j_1,\dots,j_{r+1}): 1\leq j_1<\cdots<j_{r+1}\leq n\}\) be the index sets of \((r+1)\times(r+1)\) submatrices. Then it holds that \[\mathbb{R}^{m\times n}_{\leq r}=\{X\in\mathbb{R}^{m\times n}: \mathrm{det}(X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}})=0~\text{for all}~\mathbf{i}_{r+1}\in I_{r+1}, \mathbf{j}_{r+1}\in J_{r+1}\},\] where the number of defining equations is \(d=\binom{m}{r+1}\binom{n}{r+1}\). Thus, it is also called a determinantal variety.
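As a quick numerical illustration (assuming NumPy; the sizes, rank, and tolerance below are arbitrary choices), one can check that every \((r+1)\times(r+1)\) minor of a matrix of rank at most \(r\) indeed vanishes:

```python
import itertools
import numpy as np

# For a matrix of rank at most r, every (r+1) x (r+1) minor vanishes.
m, n, r = 5, 4, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank <= r

minors = [
    np.linalg.det(X[np.ix_(rows, cols)])
    for rows in itertools.combinations(range(m), r + 1)
    for cols in itertools.combinations(range(n), r + 1)
]
print(len(minors))                     # number of defining equations d
print(np.max(np.abs(minors)) < 1e-10)  # True: all (r+1)-minors vanish
```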
Singular points
The singular points of the determinantal variety can be derived via tools from algebraic geometry; e.g., [Theorem 10.3.3]3 or [Proposition 1.1]4. Here, we provide an elementary way to determine singularity by using the Jacobian criterion.
Given a square matrix \(A\), Jacobi's formula yields \(\nabla_A\mathrm{det}(A)={(A^*)}^{\top}\), where \(A^*\) is the adjugate (classical adjoint) matrix of \(A\). Thus, the partial gradient of each defining polynomial with respect to the entries indexed by \(\{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}\}\) is \[\nabla_{X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}}}\mathrm{det}(X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}}) = (X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}}^*)^{\top},\]
which leads to the full gradient \(\nabla_X\mathrm{det}(X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}})\), whose entries indexed by \(\{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}\}\) are those of \((X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}}^*)^{\top}\) and the others are \(0\). Since the adjugate matrix is the transpose of the cofactor matrix, the Jacobian matrix of the \(d\) defining polynomials can be arranged as a large matrix \(J(X)\in\mathbb{R}^{m\times nd}\) in which each entry is either an \(r\times r\) minor \(\mathrm{det}(X_{\mathbf{i}_{r}, \mathbf{j}_{r}})\) (up to sign) or \(0\). In fact, the Jacobian consists of all \(r\times r\) minors. It directly implies that the Jacobian vanishes at any point \[X\in \mathbb{R}^{m\times n}_{< r}:=\{X\in\mathbb{R}^{m\times n}: \mathrm{rank}(X)< r\}\]
since all \(r\times r\) minors vanish at \(X\in \mathbb{R}^{m\times n}_{< r}\), i.e., \(\mathrm{det}(X_{\mathbf{i}_{r}, \mathbf{j}_{r}})=0\). Therefore, the set of singular points of the matrix variety is \(\mathbb{R}^{m\times n}_{< r}\). Note that it is itself a matrix variety since \(\mathbb{R}^{m\times n}_{< r}=\mathbb{R}^{m\times n}_{\leq (r-1)}\). Moreover, the set of regular points is \[\mathbb{R}^{m\times n}_{r}:=\{X\in\mathbb{R}^{m\times n}: \mathrm{rank}(X)= r\},\]
which is actually a smooth manifold.
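As a sanity check of Jacobi's formula used above, the following sketch (assuming NumPy; the matrix size and step size are arbitrary choices) verifies \(\nabla_A\mathrm{det}(A)=(A^*)^{\top}\) by central finite differences, using \(A^*=\mathrm{det}(A)\,A^{-1}\) for an invertible \(A\):

```python
import numpy as np

# Finite-difference check of Jacobi's formula grad det(A) = adj(A)^T,
# with adj(A) = det(A) * inv(A) for an invertible A.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
adj_A = np.linalg.det(A) * np.linalg.inv(A)

eps = 1e-6
grad_fd = np.zeros_like(A)
for i in range(4):
    for j in range(4):
        E = np.zeros_like(A)
        E[i, j] = eps
        grad_fd[i, j] = (np.linalg.det(A + E) - np.linalg.det(A - E)) / (2 * eps)

print(np.allclose(grad_fd, adj_A.T, atol=1e-6))  # True
```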
Topology
The matrix variety \(\mathbb{R}^{m\times n}_{\leq r}\) is closed. Moreover, it is straightforward to verify that the set of regular points \(\mathbb{R}^{m\times n}_{r}\) is open and dense in \(\mathbb{R}^{m\times n}_{\leq r}\), which matches the local structure of an algebraic variety described above. Specifically, a sequence in \(\mathbb{R}^{m\times n}_{r}\) can converge to any singular point, and the closure of \(\mathbb{R}^{m\times n}_{r}\) is the matrix variety itself.
3. Fixed-rank manifold
The set of all fixed-rank matrices \(\mathbb{R}^{m\times n}_{\underline{r}}:=\{X\in\mathbb{R}^{m\times n}: \mathrm{rank}(X)= \underline{r}\}\) is a smooth manifold, called the fixed-rank manifold, which is an embedded submanifold of \(\mathbb{R}^{m\times n}\) of dimension \(\underline{r}(m+n-\underline{r})\). To see this, one can find a local defining mapping in [Example 5.30] of the classic textbook5 or [Section 7.5] of a nice book6 on optimization on manifolds.
The matrix variety can also be interpreted as a “stratified” or “layered” space in the sense of \(\mathbb{R}^{m\times n}_{\leq r}= \bigcup_{\underline{r}=0}^{r} \mathbb{R}^{m\times n}_{\underline{r}}\).
Tangent space
Given \(X\in\mathbb{R}_{\underline{r}}^{m\times n}\) with \(\underline{r}\leq r\), the tangent space at \(X\) can be parameterized by differentiating a low-rank decomposition of \(X\). For instance, one can follow \(X=AB^{\top}\) in [Lemma 1]7 or the thin singular value decomposition (SVD) \(X=U\Sigma V^{\top}\) in [Proposition 2.1]8. Here, we follow the latter for the sake of computational convenience.
Consider the thin SVD \(X=U\Sigma V^{\top}\), where \(U\in\mathrm{St}(\underline{r},m)\), \(V\in\mathrm{St}(\underline{r},n)\), \(\mathrm{St}(p,n):=\{X\in\mathbb{R}^{n\times p}: X^{\top}X=I\}\) is the Stiefel manifold, and \(\Sigma=\mathrm{Diag}(\sigma_1,\sigma_2,\dots,\sigma_{\underline{r}})\) with \(\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_{\underline{r}}>0\). The tangent space can be parameterized as follows: \[\begin{aligned} \mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}&=\left\{\begin{bmatrix} U & U^\perp \end{bmatrix} \begin{bmatrix} \mathbb{R}^{\underline{r}\times \underline{r}} & \mathbb{R}^{\underline{r}\times (n-\underline{r})}\\ \mathbb{R}^{(m-\underline{r})\times \underline{r}} & 0 \end{bmatrix} \begin{bmatrix} V & V^\perp \end{bmatrix}^{\top}\right\}, \end{aligned}\]
where “\(^\perp\)” denotes the orthogonal complement. A tangent vector can be illustrated9 by the block pattern \[\begin{bmatrix} U & U^\perp \end{bmatrix} \begin{bmatrix} \blacksquare & \blacksquare\\ \blacksquare & 0 \end{bmatrix} \begin{bmatrix} V & V^\perp \end{bmatrix}^{\top},\] where a shaded square \(\blacksquare\) represents an arbitrary matrix block and \(0\) represents a zero block.
Normal space
Consider the standard Euclidean metric. The normal space at \(X\) is \[\begin{aligned} \mathrm{N}_X \mathbb{R}^{m\times n}_{\underline{r}}&=\left\{\begin{bmatrix} U & U^\perp \end{bmatrix} \begin{bmatrix} 0 & 0\\ 0 & \mathbb{R}^{(m-\underline{r})\times (n-\underline{r})} \end{bmatrix} \begin{bmatrix} V & V^\perp \end{bmatrix}^{\top}\right\} \end{aligned}\] and a normal vector has the block pattern \[\begin{bmatrix} U & U^\perp \end{bmatrix} \begin{bmatrix} 0 & 0\\ 0 & \blacksquare \end{bmatrix} \begin{bmatrix} V & V^\perp \end{bmatrix}^{\top}.\]
With the help of these block patterns, it is straightforward to see that the direct sum of the tangent and normal spaces recovers the ambient Euclidean space, i.e., \(\mathbb{R}^{m\times n}=\mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}\oplus\mathrm{N}_X \mathbb{R}^{m\times n}_{\underline{r}}\).
Projection
Given a matrix \(A\in\mathbb{R}^{m\times n}\), the orthogonal projections onto the tangent and normal spaces are given by \[\begin{aligned} \mathrm{P}_{\mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}} A&=\mathrm{P}_U A \mathrm{P}_V+\mathrm{P}_U^\perp A \mathrm{P}^{}_V+\mathrm{P}_U^{} A \mathrm{P}_V^\perp,\\ \mathrm{P}_{\mathrm{N}_X \mathbb{R}^{m\times n}_{\underline{r}}} A&=\mathrm{P}_U^{\perp} A \mathrm{P}_V^\perp, \end{aligned}\] where \(\mathrm{P}_U:=UU^{\top}\), \(\mathrm{P}_U^\perp:=I_{m}-UU^{\top}\), \(\mathrm{P}_V:=VV^{\top}\), and \(\mathrm{P}_V^\perp:=I_{n}-VV^{\top}\) are projection matrices.
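The following sketch (assuming NumPy; the sizes and rank are arbitrary choices) assembles these projections from a thin SVD and checks that the two parts are orthogonal and sum back to \(A\):

```python
import numpy as np

# Tangent/normal projections at a rank-rr point X, following the formulas above.
m, n, rr = 6, 5, 2
rng = np.random.default_rng(2)
X = rng.standard_normal((m, rr)) @ rng.standard_normal((rr, n))  # rank rr

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, V = U[:, :rr], Vt[:rr, :].T                  # thin SVD factors
PU, PV = U @ U.T, V @ V.T
PUp, PVp = np.eye(m) - PU, np.eye(n) - PV

A = rng.standard_normal((m, n))
PT_A = PU @ A @ PV + PUp @ A @ PV + PU @ A @ PVp  # projection onto T_X
PN_A = PUp @ A @ PVp                              # projection onto N_X

print(np.allclose(PT_A + PN_A, A))                # direct sum recovers A
print(abs(np.sum(PT_A * PN_A)) < 1e-10)           # the two parts are orthogonal
```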
4. Geometry of \(\mathbb{R}^{m\times n}_{\leq r}\)
Although \(\mathbb{R}^{m\times n}_{\leq r}\) is a non-smooth algebraic variety, its geometry is closely related to fixed-rank manifolds. If \(X\in\mathbb{R}^{m\times n}_{\leq r}\) is regular, i.e., \(\mathrm{rank}(X)=r\), then \(\mathbb{R}^{m\times n}_{\leq r}\) locally behaves like a manifold around \(X\). If instead \(X\) is singular, one can investigate the local geometry via tangent and normal cones.
Tangent cone
Given \(X\in\mathbb{R}_{\leq r}^{m\times n}\) with rank \(\underline{r}\leq r\), the (Bouligand) tangent cone of \(\mathbb{R}^{m\times n}_{\leq r}\) at \(X\) is defined by \(\mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}:=\{\varXi\in\mathbb{R}^{m\times n}: \exists t^{(i)}\to 0, X^{(i)}\to X\text{ in }\mathbb{R}^{m\times n}_{\leq r}, \mathrm{s.~t.}~\frac{X^{(i)}-X}{t^{(i)}}\to\varXi\}\). A parametrization of the tangent cone can be obtained in different ways; e.g., [Theorem 6.1]10 with a diagonal normalization, or [Theorem 3.2]11 with a space decomposition. Specifically, the tangent cone is \[\begin{aligned} \mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}&=\left\{\begin{bmatrix} U & U^\perp \end{bmatrix} \begin{bmatrix} \mathbb{R}^{\underline{r}\times \underline{r}} & \mathbb{R}^{\underline{r}\times (n-\underline{r})}\\ \mathbb{R}^{(m-\underline{r})\times \underline{r}} & \mathbb{R}^{(m-\underline{r})\times (n-\underline{r})}_{\leq (r-\underline{r})} \end{bmatrix} \begin{bmatrix} V & V^\perp \end{bmatrix}^{\top}\right\}. \end{aligned}\] A new parametrization: Here, we give a new reduced parametrization of the tangent cone by further decomposing an element \(F=\tilde{U}S\tilde{V}^{\top}\) of \(\mathbb{R}^{(m-\underline{r})\times (n-\underline{r})}_{\leq (r-\underline{r})}\). By some matrix computations12, we obtain the following parametrization: \[\begin{aligned} \mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}&=\left\{\begin{bmatrix} U & U_1 & U_2 \end{bmatrix} \begin{bmatrix} \mathbb{R}^{\underline{r}\times \underline{r}} & \mathbb{R}^{\underline{r}\times (r-\underline{r})} & \mathbb{R}^{\underline{r}\times (n-r)}\\ \mathbb{R}^{(r-\underline{r})\times \underline{r}} & \mathbb{R}^{(r-\underline{r})\times (r-\underline{r})} & 0\\ \mathbb{R}^{(m-r)\times \underline{r}} & 0 & 0 \end{bmatrix} \begin{bmatrix} V & V_1 & V_2 \end{bmatrix}^{\top}\right\}, \end{aligned}\]where \(U_1\in \mathrm{St}(r-\underline{r},m), V_1\in \mathrm{St}(r-\underline{r},n), U_2\in \mathrm{St}(m-r,m), V_2\in \mathrm{St}(n-r,n)\) satisfying \([ U\ U_1\ U_2]\in\mathcal{O}(m)\) and \([ V\ V_1\ V_2]\in\mathcal{O}(n)\). Furthermore, it can be illustrated by the block pattern
\[\begin{bmatrix} U & U_1 & U_2 \end{bmatrix} \begin{bmatrix} \blacksquare & \blacksquare & \blacksquare\\ \blacksquare & \blacksquare & 0\\ \blacksquare & 0 & 0 \end{bmatrix} \begin{bmatrix} V & V_1 & V_2 \end{bmatrix}^{\top}.\]
Space decomposition at singular points: Note that if \(X\in\mathbb{R}^{m\times n}_{\leq r}\) is regular, then the tangent cone boils down to the tangent space. Again from the block pattern, it is easy to see the decomposition \[\mathrm{T}_X\mathbb{R}^{m\times n}_{\leq r}=\mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}\oplus\mathrm{N}_{\leq (r-\underline{r})}(X),\] where \(\mathrm{N}_{\leq (r-\underline{r})}(X):=\left\{N\in\mathrm{N}_X \mathbb{R}^{m\times n}_{\underline{r}}: \mathrm{rank}(N)\leq(r-\underline{r})\right\}\) denotes a low-rank cone in the normal space. It indicates that the local geometry can be identified via the local (fixed-rank) manifold \(\mathbb{R}_{\underline{r}}^{m\times n}\). As a result, a tangent vector can be viewed as the sum of a tangent vector of the fixed-rank manifold and a matrix of rank at most \(r-\underline{r}\) in its normal space.
Rank increase along normal part: Another interesting fact is that, given a matrix \(X\in\mathbb{R}_{\leq r}^{m\times n}\) with rank \(\underline{r} < r\) and a vector \(N\in\mathrm{N}_{\leq (r-\underline{r})}(X)\setminus\{0\}\) in the cone, it holds that \[\mathrm{rank}(X+tN)\in(\underline{r},r]\quad \text{for all}~t>0.\] As suggested in [Section 3.3]13, this principle can be applied to increase the rank of \(X\), which facilitates the development of rank-adaptive methods.
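A minimal numerical illustration of this rank-increase property, assuming NumPy (the sizes and ranks below are arbitrary choices):

```python
import numpy as np

# Adding a nonzero low-rank normal direction N to X increases the rank by rank(N).
m, n, r, rr = 6, 5, 3, 1
rng = np.random.default_rng(3)
X = rng.standard_normal((m, rr)) @ rng.standard_normal((rr, n))  # rank rr < r

U, s, Vt = np.linalg.svd(X, full_matrices=True)
Up, Vp = U[:, rr:], Vt[rr:, :].T        # orthogonal complements U^perp, V^perp

# Build N in the normal space with rank r - rr, i.e. N in N_{<= r-rr}(X) \ {0}.
k = r - rr
N = Up @ (rng.standard_normal((m - rr, k)) @ rng.standard_normal((k, n - rr))) @ Vp.T

for t in [0.1, 1.0, 10.0]:
    print(np.linalg.matrix_rank(X + t * N))  # rr + k = r, i.e. within (rr, r]
```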
Normal cone
Given \(X\in\mathbb{R}_{\leq r}^{m\times n}\) with rank \(\underline{r}\leq r\), the set \(\mathrm{N}_X \mathbb{R}^{m\times n}_{\leq r}=(\mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r})^\circ:=\{N\in\mathbb{R}^{m\times n}:\langle N, \varXi\rangle\leq 0\ \text{for all}\ \varXi\in\mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}\}\) is called the normal cone of \(\mathbb{R}^{m\times n}_{\leq r}\) at \(X\). It is easy to verify that \[\begin{aligned} \mathrm{N}_X \mathbb{R}^{m\times n}_{\leq r}&= \left\{ \begin{array}{ll} \mathrm{N}_X \mathbb{R}^{m\times n}_{r},&\ \text{if}\ \underline{r}=r;\\ \{0\},&\ \text{if}\ \underline{r}<r. \end{array} \right. \end{aligned}\]
Projection
Given a matrix \(A\in\mathbb{R}^{m\times n}\), the orthogonal projection onto the tangent cone is \[\mathrm{P}_{\mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}} A=\mathrm{P}_{\mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}} A+\mathrm{P}_{\leq (r-\underline{r})} \left(\mathrm{P}_U^\perp A \mathrm{P}_V^\perp\right),\]
where \(\mathrm{P}_{\leq (r-\underline{r})}\) is the projection onto \(\mathbb{R}^{m\times n}_{\leq (r-\underline{r})}\), which can be achieved by the truncated SVD.
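Putting the pieces together, the following sketch (assuming NumPy; the sizes and ranks are arbitrary choices) computes the tangent-cone projection by combining the fixed-rank tangent projection with a truncated SVD of the normal component, and checks that the residual is orthogonal to the projection:

```python
import numpy as np

# Projection onto the tangent cone at a rank-rr point of R^{m x n}_{<= r}.
m, n, r, rr = 6, 5, 3, 1
rng = np.random.default_rng(4)
X = rng.standard_normal((m, rr)) @ rng.standard_normal((rr, n))  # rank rr < r
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, V = U[:, :rr], Vt[:rr, :].T
PU, PV = U @ U.T, V @ V.T
PUp, PVp = np.eye(m) - PU, np.eye(n) - PV

tangent_part = PU @ A @ PV + PUp @ A @ PV + PU @ A @ PVp   # P_{T_X R_rr}(A)
normal_part = PUp @ A @ PVp                                # P_U^perp A P_V^perp

# Truncated SVD of rank r - rr realizes P_{<= r - rr} on the normal component.
Un, sn, Vnt = np.linalg.svd(normal_part, full_matrices=False)
k = r - rr
P_cone_A = tangent_part + Un[:, :k] @ np.diag(sn[:k]) @ Vnt[:k, :]

# The residual is orthogonal to the projection, as expected for a metric
# projection onto a cone.
print(abs(np.sum((A - P_cone_A) * P_cone_A)) < 1e-10)      # True
```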
Footnotes
Robin Hartshorne. “Algebraic geometry”. Vol. 52. Springer Science & Business Media, 2013.↩︎
The Jacobian criterion for testing smoothness can be found in a textbook on computational algebraic geometry, e.g., [Corollary 5.6.14] of the book: Gert-Martin Greuel, Gerhard Pfister. “A Singular introduction to commutative algebra”. Vol. 348. Berlin: Springer, 2008.↩︎
V. Lakshmibai, and Justin Brown. “The Grassmannian variety.” Developments in Mathematics. Vol. 42. Springer New York, 2015.↩︎
Winfried Bruns, and Udo Vetter. “Determinantal rings”. Vol. 1327. Springer, 2006.↩︎
John M. Lee, “Introduction to Smooth Manifolds”. Version 3.0. December 31, 2000↩︎
Nicolas Boumal. “An introduction to optimization on smooth manifolds”. Cambridge University Press, 2023.↩︎
Uri Shalit, Daphna Weinshall, and Gal Chechik. “Online learning in the manifold of low-rank matrices.” Advances in neural information processing systems 23 (2010).↩︎
Bart Vandereycken. “Low-rank matrix completion by Riemannian optimization”. In: SIAM Journal on Optimization 23.2 (2013), pp. 1214–1236.↩︎
Bin Gao, Renfeng Peng, Ya-xiang Yuan. “Low-rank optimization on Tucker tensor varieties.” arXiv:2311.18324 (2023).↩︎
T. P. Cason, P.-A. Absil, and P. Van Dooren, Iterative methods for low rank approximation of graph similarity matrices, Linear Algebra Appl., 438 (2013), pp. 1863–1882.↩︎
Reinhold Schneider and André Uschmajew. “Convergence results for projected line-search methods on varieties of low-rank matrices Via Lojasiewicz Inequality”. In: SIAM Journal on Optimization 25.1 (2015), pp. 622–646.↩︎
See [Section 2.1] of the work by Bin Gao, Renfeng Peng, Ya-xiang Yuan. “Low-rank optimization on Tucker tensor varieties.” arXiv:2311.18324 (2023).↩︎
Bin Gao and P.-A. Absil. “A Riemannian rank-adaptive method for low-rank matrix completion”. In: Computational Optimization and Applications 81 (2022), pp. 67–90.↩︎