1. Algebraic variety
Definition
Algebraic varieties are the central objects of study in algebraic geometry.1 In the classical definition, an algebraic variety \(V\) is the set of solutions of a system of polynomial equations. \[V=\{x\in\mathbb{R}^{s}: p_i(x)=0~\text{for}~i=1,\dots,d\} = \{x\in\mathbb{R}^{s}: {P}(x)=(p_1(x),\dots,p_d(x))=0\},\]
where each \(p_i(x)\) is a polynomial, and we only consider real coefficients.
Singularity (local property)
A point \(x\in V\) is called singular if the tangent space at \(x\) is not well defined, i.e., either it does not exist or a special definition is required. In other words, \(V\) fails to be locally flat at \(x\).
A point \(x\in V\) is called regular if \(x\) is not singular.
A singular point \(x\in V\) is called a node if the Hessian matrix at \(x\) is non-singular.
Local structure: For an algebraic variety \(V\), almost all points in \(V\) are regular. Moreover, the set of regular points is both open and dense in \(V\), and \(V\) is a manifold near every regular point.
Jacobian criterion:2 A point \(x\in V\) is singular if the Jacobian matrix \(J(x)=[\nabla p_1(x)~\nabla p_2(x)~\cdots~\nabla p_d(x)]\) of the polynomials \(P(x)\) has lower rank at \(x\) than at the regular points of \(V\), i.e., \(\mathrm{rank}(J(x)) < \max_{y\in V}\mathrm{rank}(J(y))\). The rationale is that the rank of the Jacobian drops exactly where the defining equations fail to determine a well-defined tangent space; in the hypersurface case \(d=1\), these are the points at which all first-order partial derivatives vanish simultaneously.
Smoothness
An algebraic variety \(V\) is said to be non-singular or smooth if it has no singular points. The Jacobian criterion provides a computational way to test smoothness.
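For concreteness, here is a minimal symbolic sketch of the criterion (assuming SymPy) on the nodal cubic \(y^2-x^2(x+1)=0\), a standard example with a single node at the origin; the variable and polynomial names below are chosen only for illustration.

```python
import sympy as sp

# A minimal check of the Jacobian criterion on the nodal cubic
#   p(x, y) = y^2 - x^2*(x + 1) = 0,
# which has a singular point (a node) at the origin.
x, y = sp.symbols("x y", real=True)
p = y**2 - x**2 * (x + 1)

# Jacobian of the single defining polynomial: the row [dp/dx, dp/dy].
J = sp.Matrix([p]).jacobian([x, y])

# On a hypersurface, a point is singular iff all partial derivatives vanish
# there (the Jacobian drops from rank 1 to rank 0).
singular_candidates = sp.solve(list(J) + [p], [x, y], dict=True)
print(singular_candidates)          # expected: [{x: 0, y: 0}], the node at the origin

# The Hessian at the origin is nonsingular, so the origin is indeed a node.
H = sp.hessian(p, (x, y))
print(H.subs({x: 0, y: 0}).det())   # -4 != 0
```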
2. Matrix variety
Definition
The set of bounded-rank matrices with \(r\leq \mathrm{min}\{m,n\}\), \[\mathbb{R}^{m\times n}_{\leq r}:=\{X\in\mathbb{R}^{m\times n}: \mathrm{rank}(X)\leq r\},\]
is an algebraic variety (also called a matrix variety) since it consists of the matrices in which all \((r+1)\times(r+1)\) minors vanish. Let \(I_{r+1}=\{\mathbf{i}_{r+1}=(i_1,\dots,i_{r+1}): 1\leq i_1<\cdots<i_{r+1}\leq m\}\) and \(J_{r+1}=\{\mathbf{j}_{r+1}=(j_1,\dots,j_{r+1}): 1\leq j_1<\cdots<j_{r+1}\leq n\}\) be the index sets of \((r+1)\times(r+1)\) submatrices. Then it holds that \[\mathbb{R}^{m\times n}_{\leq r}=\{X\in\mathbb{R}^{m\times n}: \mathrm{det}(X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}})=0~\text{for all}~\mathbf{i}_{r+1}\in I_{r+1}, \mathbf{j}_{r+1}\in J_{r+1}\},\] where the number of defining equations is \(d=\binom{m}{r+1}\binom{n}{r+1}\). Thus, it is also called a determinantal variety.
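As a quick numerical illustration (assuming NumPy; the sizes, rank, and tolerance below are arbitrary choices), one can check that every \((r+1)\times(r+1)\) minor of a matrix of rank at most \(r\) indeed vanishes:

```python
import itertools
import numpy as np

# For a matrix of rank at most r, every (r+1) x (r+1) minor vanishes.
m, n, r = 5, 4, 2
rng = np.random.default_rng(0)
X = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank <= r

minors = [
    np.linalg.det(X[np.ix_(rows, cols)])
    for rows in itertools.combinations(range(m), r + 1)
    for cols in itertools.combinations(range(n), r + 1)
]
print(len(minors))                     # number of defining equations d
print(np.max(np.abs(minors)) < 1e-10)  # True: all (r+1)-minors vanish
```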
Singular points
The singular points of the determinantal variety can be derived via tools from algebraic geometry; e.g., [Theorem 10.3.3]3 or [Proposition 1.1]4. Here, we provide an elementary way to determine singularity by using the Jacobian criterion.
Given a square matrix \(A\), Jacobi's formula yields \(\nabla_A\mathrm{det}(A)={(A^*)}^{\top}\), where \(A^*\) is the adjugate (classical adjoint) matrix of \(A\). Thus, the partial gradient of each defining polynomial with respect to the entries indexed by \(\{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}\}\) is \[\nabla_{X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}}}\mathrm{det}(X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}}) = (X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}}^*)^{\top},\]
which leads to the full gradient \(\nabla_X\mathrm{det}(X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}})\), whose entries indexed by \(\{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}\}\) are those of \((X_{\mathbf{i}_{r+1}, \mathbf{j}_{r+1}}^*)^{\top}\) and the others are \(0\). Since the adjugate matrix is the transpose of the cofactor matrix, the Jacobian matrix of the \(d\) defining polynomials can be arranged as a large matrix \(J(X)\in\mathbb{R}^{m\times nd}\) in which each entry is either an \(r\times r\) minor \(\mathrm{det}(X_{\mathbf{i}_{r}, \mathbf{j}_{r}})\) (up to sign) or \(0\). In fact, the Jacobian consists of all \(r\times r\) minors. It directly implies that the Jacobian vanishes at any point \[X\in \mathbb{R}^{m\times n}_{< r}:=\{X\in\mathbb{R}^{m\times n}: \mathrm{rank}(X)< r\}\]
since all \(r\times r\) minors vanish at \(X\in \mathbb{R}^{m\times n}_{< r}\), i.e., \(\mathrm{det}(X_{\mathbf{i}_{r}, \mathbf{j}_{r}})=0\). Therefore, the set of singular points of the matrix variety is \(\mathbb{R}^{m\times n}_{< r}\). Note that it is itself a matrix variety since \(\mathbb{R}^{m\times n}_{< r}=\mathbb{R}^{m\times n}_{\leq (r-1)}\). Moreover, the set of regular points is \[\mathbb{R}^{m\times n}_{r}:=\{X\in\mathbb{R}^{m\times n}: \mathrm{rank}(X)= r\},\]
which is actually a smooth manifold.
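As a sanity check of Jacobi's formula used above, the following sketch (assuming NumPy; the matrix size and step size are arbitrary choices) verifies \(\nabla_A\mathrm{det}(A)=(A^*)^{\top}\) by central finite differences, using \(A^*=\mathrm{det}(A)\,A^{-1}\) for an invertible \(A\):

```python
import numpy as np

# Finite-difference check of Jacobi's formula grad det(A) = adj(A)^T,
# with adj(A) = det(A) * inv(A) for an invertible A.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
adj_A = np.linalg.det(A) * np.linalg.inv(A)

eps = 1e-6
grad_fd = np.zeros_like(A)
for i in range(4):
    for j in range(4):
        E = np.zeros_like(A)
        E[i, j] = eps
        grad_fd[i, j] = (np.linalg.det(A + E) - np.linalg.det(A - E)) / (2 * eps)

print(np.allclose(grad_fd, adj_A.T, atol=1e-6))  # True
```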
Topology
The matrix variety \(\mathbb{R}^{m\times n}_{\leq r}\) is closed. Moreover, it is straightforward to verify that the set of regular points \(\mathbb{R}^{m\times n}_{r}\) is open and dense in \(\mathbb{R}^{m\times n}_{\leq r}\), which matches the local structure of an algebraic variety described above. Specifically, a sequence in \(\mathbb{R}^{m\times n}_{r}\) can converge to any singular point, and the closure of \(\mathbb{R}^{m\times n}_{r}\) is the matrix variety itself.
3. Fixed-rank manifold
The set of all fixed-rank matrices \(\mathbb{R}^{m\times n}_{\underline{r}}:=\{X\in\mathbb{R}^{m\times n}: \mathrm{rank}(X)= \underline{r}\}\) is a smooth manifold, called the fixed-rank manifold, which is an embedded submanifold of \(\mathbb{R}^{m\times n}\) of dimension \(\underline{r}(m+n-\underline{r})\). To see this, one can find a local defining mapping in [Example 5.30] of the classic textbook5 or [Section 7.5] of a nice book6 on optimization on manifolds.
The matrix variety can also be interpreted as a “stratified” or “layered” space in the sense of \(\mathbb{R}^{m\times n}_{\leq r}= \bigcup_{\underline{r}=0}^{r} \mathbb{R}^{m\times n}_{\underline{r}}\).
Tangent space
Given \(X\in\mathbb{R}_{\underline{r}}^{m\times n}\) with \(\underline{r}\leq r\), the tangent space at \(X\) can be parameterized by differentiating a low-rank decomposition of \(X\). For instance, one can follow \(X=AB^{\top}\) in [Lemma 1]7 or the thin singular value decomposition (SVD) \(X=U\Sigma V^{\top}\) in [Proposition 2.1]8. Here, we follow the latter for the sake of computational convenience.
Consider the thin SVD \(X=U\Sigma V^{\top}\), where \(U\in\mathrm{St}(\underline{r},m)\), \(V\in\mathrm{St}(\underline{r},n)\), \(\mathrm{St}(p,n):=\{X\in\mathbb{R}^{n\times p}: X^{\top}X=I\}\) is the Stiefel manifold, and \(\Sigma=\mathrm{Diag}(\sigma_1,\sigma_2,\dots,\sigma_{\underline{r}})\) with \(\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_{\underline{r}}>0\). The tangent space can be parameterized as follows: \[\begin{aligned} \mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}&=\left\{\begin{bmatrix} U & U^\perp \end{bmatrix} \begin{bmatrix} \mathbb{R}^{\underline{r}\times \underline{r}} & \mathbb{R}^{\underline{r}\times (n-\underline{r})}\\ \mathbb{R}^{(m-\underline{r})\times \underline{r}} & 0 \end{bmatrix} \begin{bmatrix} V & V^\perp \end{bmatrix}^{\top}\right\}, \end{aligned}\]
where “\(^\perp\)” denotes the orthogonal complement. A tangent vector can be illustrated9 by the block pattern \[\begin{bmatrix} U & U^\perp \end{bmatrix} \begin{bmatrix} \blacksquare & \blacksquare\\ \blacksquare & 0 \end{bmatrix} \begin{bmatrix} V & V^\perp \end{bmatrix}^{\top},\] where a shaded square \(\blacksquare\) represents an arbitrary matrix block and \(0\) represents a zero block.
Normal space
Consider the standard Euclidean metric. The normal space at \(X\) is \[\begin{aligned} \mathrm{N}_X \mathbb{R}^{m\times n}_{\underline{r}}&=\left\{\begin{bmatrix} U & U^\perp \end{bmatrix} \begin{bmatrix} 0 & 0\\ 0 & \mathbb{R}^{(m-\underline{r})\times (n-\underline{r})} \end{bmatrix} \begin{bmatrix} V & V^\perp \end{bmatrix}^{\top}\right\} \end{aligned}\] and a normal vector has the block pattern \[\begin{bmatrix} U & U^\perp \end{bmatrix} \begin{bmatrix} 0 & 0\\ 0 & \blacksquare \end{bmatrix} \begin{bmatrix} V & V^\perp \end{bmatrix}^{\top}.\]
With the help of these block patterns, it is straightforward to see that the direct sum of the tangent and normal spaces recovers the ambient Euclidean space, i.e., \(\mathbb{R}^{m\times n}=\mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}\oplus\mathrm{N}_X \mathbb{R}^{m\times n}_{\underline{r}}\).
Projection
Given a matrix \(A\in\mathbb{R}^{m\times n}\), the orthogonal projections onto the tangent and normal spaces are given by \[\begin{aligned} \mathrm{P}_{\mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}} A&=\mathrm{P}_U A \mathrm{P}_V+\mathrm{P}_U^\perp A \mathrm{P}^{}_V+\mathrm{P}_U^{} A \mathrm{P}_V^\perp,\\ \mathrm{P}_{\mathrm{N}_X \mathbb{R}^{m\times n}_{\underline{r}}} A&=\mathrm{P}_U^{\perp} A \mathrm{P}_V^\perp, \end{aligned}\] where \(\mathrm{P}_U:=UU^{\top}\), \(\mathrm{P}_U^\perp:=I_{m}-UU^{\top}\), \(\mathrm{P}_V:=VV^{\top}\), and \(\mathrm{P}_V^\perp:=I_{n}-VV^{\top}\) are projection matrices.
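The following sketch (assuming NumPy; the sizes and rank are arbitrary choices) assembles these projections from a thin SVD and checks that the two parts are orthogonal and sum back to \(A\):

```python
import numpy as np

# Tangent/normal projections at a rank-rr point X, following the formulas above.
m, n, rr = 6, 5, 2
rng = np.random.default_rng(2)
X = rng.standard_normal((m, rr)) @ rng.standard_normal((rr, n))  # rank rr

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, V = U[:, :rr], Vt[:rr, :].T                  # thin SVD factors
PU, PV = U @ U.T, V @ V.T
PUp, PVp = np.eye(m) - PU, np.eye(n) - PV

A = rng.standard_normal((m, n))
PT_A = PU @ A @ PV + PUp @ A @ PV + PU @ A @ PVp  # projection onto T_X
PN_A = PUp @ A @ PVp                              # projection onto N_X

print(np.allclose(PT_A + PN_A, A))                # direct sum recovers A
print(abs(np.sum(PT_A * PN_A)) < 1e-10)           # the two parts are orthogonal
```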
4. Geometry of \(\mathbb{R}^{m\times n}_{\leq r}\)
Although \(\mathbb{R}^{m\times n}_{\leq r}\) is a non-smooth algebraic variety, its geometry is closely related to fixed-rank manifolds. If \(X\in\mathbb{R}^{m\times n}_{\leq r}\) is regular, i.e., \(\mathrm{rank}(X)=r\), then \(\mathbb{R}^{m\times n}_{\leq r}\) locally behaves like a manifold around \(X\). If instead \(X\) is singular, one can investigate the local geometry via tangent and normal cones.
Tangent cone
Given \(X\in\mathbb{R}_{\leq r}^{m\times n}\) with rank \(\underline{r}\leq r\), the (Bouligand) tangent cone of \(\mathbb{R}^{m\times n}_{\leq r}\) at \(X\) is defined by \(\mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}:=\{\varXi\in\mathbb{R}^{m\times n}: \exists t^{(i)}\to 0, X^{(i)}\to X\text{ in }\mathbb{R}^{m\times n}_{\leq r}, \mathrm{s.~t.}~\frac{X^{(i)}-X}{t^{(i)}}\to\varXi\}\). A parametrization of the tangent cone can be obtained in different ways; e.g., [Theorem 6.1]10 with a diagonal normalization, or [Theorem 3.2]11 with a space decomposition. Specifically, the tangent cone is \[\begin{aligned} \mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}&=\left\{\begin{bmatrix} U & U^\perp \end{bmatrix} \begin{bmatrix} \mathbb{R}^{\underline{r}\times \underline{r}} & \mathbb{R}^{\underline{r}\times (n-\underline{r})}\\ \mathbb{R}^{(m-\underline{r})\times \underline{r}} & \mathbb{R}^{(m-\underline{r})\times (n-\underline{r})}_{\leq (r-\underline{r})} \end{bmatrix} \begin{bmatrix} V & V^\perp \end{bmatrix}^{\top}\right\}. \end{aligned}\] A new parametrization: Here, we give a new reduced parametrization of the tangent cone by further decomposing an element \(F=\tilde{U}S\tilde{V}^{\top}\) of \(\mathbb{R}^{(m-\underline{r})\times (n-\underline{r})}_{\leq (r-\underline{r})}\). By some matrix computations12, we obtain the following parametrization: \[\begin{aligned} \mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}&=\left\{\begin{bmatrix} U & U_1 & U_2 \end{bmatrix} \begin{bmatrix} \mathbb{R}^{\underline{r}\times \underline{r}} & \mathbb{R}^{\underline{r}\times (r-\underline{r})} & \mathbb{R}^{\underline{r}\times (n-r)}\\ \mathbb{R}^{(r-\underline{r})\times \underline{r}} & \mathbb{R}^{(r-\underline{r})\times (r-\underline{r})} & 0\\ \mathbb{R}^{(m-r)\times \underline{r}} & 0 & 0 \end{bmatrix} \begin{bmatrix} V & V_1 & V_2 \end{bmatrix}^{\top}\right\}, \end{aligned}\]where \(U_1\in \mathrm{St}(r-\underline{r},m), V_1\in \mathrm{St}(r-\underline{r},n), U_2\in \mathrm{St}(m-r,m), V_2\in \mathrm{St}(n-r,n)\) satisfying \([ U\ U_1\ U_2]\in\mathcal{O}(m)\) and \([ V\ V_1\ V_2]\in\mathcal{O}(n)\). Furthermore, it can be illustrated by the block pattern
\[\begin{bmatrix} U & U_1 & U_2 \end{bmatrix} \begin{bmatrix} \blacksquare & \blacksquare & \blacksquare\\ \blacksquare & \blacksquare & 0\\ \blacksquare & 0 & 0 \end{bmatrix} \begin{bmatrix} V & V_1 & V_2 \end{bmatrix}^{\top}.\]
Space decomposition at singular points: Note that if \(X\in\mathbb{R}^{m\times n}_{\leq r}\) is regular, then the tangent cone boils down to the tangent space. Again from the block pattern, it is easy to see the decomposition \[\mathrm{T}_X\mathbb{R}^{m\times n}_{\leq r}=\mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}\oplus\mathrm{N}_{\leq (r-\underline{r})}(X),\] where \(\mathrm{N}_{\leq (r-\underline{r})}(X):=\left\{N\in\mathrm{N}_X \mathbb{R}^{m\times n}_{\underline{r}}: \mathrm{rank}(N)\leq(r-\underline{r})\right\}\) denotes a low-rank cone in the normal space. It indicates that the local geometry can be identified via the local (fixed-rank) manifold \(\mathbb{R}_{\underline{r}}^{m\times n}\). As a result, a tangent vector can be viewed as the sum of a tangent vector of the fixed-rank manifold and a matrix of rank at most \(r-\underline{r}\) in its normal space.
Rank increase along normal part: Another interesting fact is that, given a matrix \(X\in\mathbb{R}_{\leq r}^{m\times n}\) with rank \(\underline{r} < r\) and a vector \(N\in\mathrm{N}_{\leq (r-\underline{r})}(X)\setminus\{0\}\) in the cone, it holds that \[\mathrm{rank}(X+tN)\in(\underline{r},r]\quad \text{for all}~t>0.\] As suggested in [Section 3.3]13, this principle can be applied to increase the rank of \(X\), which facilitates the development of rank-adaptive methods.
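A minimal numerical illustration of this rank-increase property, assuming NumPy (the sizes and ranks below are arbitrary choices):

```python
import numpy as np

# Adding a nonzero low-rank normal direction N to X increases the rank by rank(N).
m, n, r, rr = 6, 5, 3, 1
rng = np.random.default_rng(3)
X = rng.standard_normal((m, rr)) @ rng.standard_normal((rr, n))  # rank rr < r

U, s, Vt = np.linalg.svd(X, full_matrices=True)
Up, Vp = U[:, rr:], Vt[rr:, :].T        # orthogonal complements U^perp, V^perp

# Build N in the normal space with rank r - rr, i.e. N in N_{<= r-rr}(X) \ {0}.
k = r - rr
N = Up @ (rng.standard_normal((m - rr, k)) @ rng.standard_normal((k, n - rr))) @ Vp.T

for t in [0.1, 1.0, 10.0]:
    print(np.linalg.matrix_rank(X + t * N))  # rr + k = r, i.e. within (rr, r]
```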
Normal cone
Given \(X\in\mathbb{R}_{\leq r}^{m\times n}\) with rank \(\underline{r}\leq r\), the set \(\mathrm{N}_X \mathbb{R}^{m\times n}_{\leq r}=(\mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r})^\circ:=\{N\in\mathbb{R}^{m\times n}:\langle N, \varXi\rangle\leq 0\ \text{for all}\ \varXi\in\mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}\}\) is called the normal cone of \(\mathbb{R}^{m\times n}_{\leq r}\) at \(X\). It is easy to verify that \[\begin{aligned} \mathrm{N}_X \mathbb{R}^{m\times n}_{\leq r}&= \left\{ \begin{array}{ll} \mathrm{N}_X \mathbb{R}^{m\times n}_{r},&\ \text{if}\ \underline{r}=r;\\ \{0\},&\ \text{if}\ \underline{r}<r. \end{array} \right. \end{aligned}\]
Projection
Given a matrix \(A\in\mathbb{R}^{m\times n}\), the orthogonal projection onto the tangent cone is \[\mathrm{P}_{\mathrm{T}_X \mathbb{R}^{m\times n}_{\leq r}} A=\mathrm{P}_{\mathrm{T}_X \mathbb{R}^{m\times n}_{\underline{r}}} A+\mathrm{P}_{\leq (r-\underline{r})} \left(\mathrm{P}_U^\perp A \mathrm{P}_V^\perp\right),\]
where \(\mathrm{P}_{\leq (r-\underline{r})}\) is the projection onto \(\mathbb{R}^{m\times n}_{\leq (r-\underline{r})}\), which can be achieved by the truncated SVD.
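Putting the pieces together, the following sketch (assuming NumPy; the sizes and ranks are arbitrary choices) computes the tangent-cone projection by combining the fixed-rank tangent projection with a truncated SVD of the normal component, and checks that the residual is orthogonal to the projection:

```python
import numpy as np

# Projection onto the tangent cone at a rank-rr point of R^{m x n}_{<= r}.
m, n, r, rr = 6, 5, 3, 1
rng = np.random.default_rng(4)
X = rng.standard_normal((m, rr)) @ rng.standard_normal((rr, n))  # rank rr < r
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, V = U[:, :rr], Vt[:rr, :].T
PU, PV = U @ U.T, V @ V.T
PUp, PVp = np.eye(m) - PU, np.eye(n) - PV

tangent_part = PU @ A @ PV + PUp @ A @ PV + PU @ A @ PVp   # P_{T_X R_rr}(A)
normal_part = PUp @ A @ PVp                                # P_U^perp A P_V^perp

# Truncated SVD of rank r - rr realizes P_{<= r - rr} on the normal component.
Un, sn, Vnt = np.linalg.svd(normal_part, full_matrices=False)
k = r - rr
P_cone_A = tangent_part + Un[:, :k] @ np.diag(sn[:k]) @ Vnt[:k, :]

# The residual is orthogonal to the projection, as expected for a metric
# projection onto a cone.
print(abs(np.sum((A - P_cone_A) * P_cone_A)) < 1e-10)      # True
```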
Footnotes
Robin Hartshorne. “Algebraic geometry”. Vol. 52. Springer Science & Business Media, 2013.↩︎
The Jacobian criterion for testing smoothness can be found in a textbook on computational algebraic geometry, e.g., [Corollary 5.6.14] of the book: Gert-Martin Greuel, Gerhard Pfister. “A Singular introduction to commutative algebra”. Vol. 348. Berlin: Springer, 2008.↩︎
V. Lakshmibai, and Justin Brown. “The Grassmannian variety.” Developments in Mathematics. Vol. 42. Springer New York, 2015.↩︎
Winfried Bruns, and Udo Vetter. “Determinantal rings”. Vol. 1327. Springer, 2006.↩︎
John M. Lee, “Introduction to Smooth Manifolds”. Version 3.0. December 31, 2000↩︎
Nicolas Boumal. “An introduction to optimization on smooth manifolds”. Cambridge University Press, 2023.↩︎
Uri Shalit, Daphna Weinshall, and Gal Chechik. “Online learning in the manifold of low-rank matrices.” Advances in neural information processing systems 23 (2010).↩︎
Bart Vandereycken. “Low-rank matrix completion by Riemannian optimization”. In: SIAM Journal on Optimization 23.2 (2013), pp. 1214–1236.↩︎
Bin Gao, Renfeng Peng, Ya-xiang Yuan. “Low-rank optimization on Tucker tensor varieties.” arXiv:2311.18324 (2023).↩︎
T. P. Cason, P.-A. Absil, and P. Van Dooren, Iterative methods for low rank approximation of graph similarity matrices, Linear Algebra Appl., 438 (2013), pp. 1863–1882.↩︎
Reinhold Schneider and André Uschmajew. “Convergence results for projected line-search methods on varieties of low-rank matrices Via Lojasiewicz Inequality”. In: SIAM Journal on Optimization 25.1 (2015), pp. 622–646.↩︎
See [Section 2.1] of the work by Bin Gao, Renfeng Peng, Ya-xiang Yuan. “Low-rank optimization on Tucker tensor varieties.” arXiv:2311.18324 (2023).↩︎
Bin Gao and P.-A. Absil. “A Riemannian rank-adaptive method for low-rank matrix completion”. In: Computational Optimization and Applications 81 (2022), pp. 67–90.↩︎