The new ssvd() function estimates an l1-penalized singular value or principal components decomposition (SVD or PCA) that introduces sparsity in the right singular vectors. It is based on the fast and memory-efficient sPCA-rSVD algorithm of Haipeng Shen and Jianhua Huang.
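A minimal call looks like this (x here stands for any numeric data matrix; complete worked examples follow below):

library(irlba)
# Keep 25 nonzero entries in the rank-1 right singular vector:
s <- ssvd(x, k=1, n=25)
s$d    # regularized singular value
s$v    # sparse right singular vector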
The arguments are:

- x: A numeric real- or complex-valued matrix or real-valued sparse matrix.
- k: Matrix rank of the computed decomposition (see the Details section below).
- n: Number of nonzero components in the right singular vectors. If k > 1, a single value of n specifies the number of nonzero components in each regularized right singular vector; alternatively, a vector of length k indicates the number of desired nonzero components in each returned vector. See the examples.
- maxit: Maximum number of soft-thresholding iterations.
- tol: Convergence is determined when \(\|U_j - U_{j-1}\|_F < tol\), where \(U_j\) is the matrix of estimated left regularized singular vectors at iteration \(j\).
- center: A logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a centering vector of length equal to the number of columns of x can be supplied. Use center=TRUE to perform a regularized sparse PCA.
- scale.: A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. Alternatively, a vector of length equal to the number of columns of x can be supplied, following the convention of the scale function (see ?scale for more details).
- alpha: Optional scalar regularization parameter between zero and one (see Details below).
- tsvd: Optional initial rank-k truncated SVD or PCA (skips that computation if supplied).
- ...: Additional arguments passed verbatim to irlba().

The ssvd() function implements a version of an algorithm by Shen and Huang that computes a penalized SVD or PCA that introduces sparsity in the right singular vectors by solving a penalized least squares problem. In the rank-1 case, the algorithm finds vectors \(u\) and \(w\) that minimize
\[\|x - u w^T\|_F^2 + \lambda \|w\|_1\]
such that \(\|u\| = 1\), and then sets \(v = w / \|w\|\) and \(d = u^T x v\); see the paper [1] for details. The penalty \(\lambda\) is implicitly determined from the specified desired number of nonzero values n. Higher-rank output is determined similarly, using a sequence of \(\lambda\) values chosen to maintain the desired number of nonzero elements in each column of v specified by n. Unlike standard SVD or PCA, the columns of the returned v when k > 1 may not be orthogonal.
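To make this concrete, here is a minimal sketch of the rank-1 alternating iteration for a fixed, user-supplied penalty \(\lambda\) (an illustrative assumption: ssvd() itself determines \(\lambda\) implicitly from n, and its actual implementation differs):

soft <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)  # soft-thresholding operator
sparse_rank1 <- function(x, lambda, maxit=500, tol=1e-3) {
  u <- svd(x, nu=1, nv=0)$u                  # start from the leading left singular vector
  for (iter in seq_len(maxit)) {
    w <- soft(crossprod(x, u), lambda / 2)   # penalized least squares solution for fixed u
    stopifnot(any(w != 0))                   # lambda must not zero out w entirely
    z <- x %*% w
    u_new <- z / drop(sqrt(crossprod(z)))    # unit-norm update of u given w
    if (drop(sqrt(crossprod(u_new - u))) < tol) { u <- u_new; break }
    u <- u_new
  }
  v <- w / drop(sqrt(crossprod(w)))          # v = w / ||w||
  list(u=u, v=v, d=drop(crossprod(u, x %*% v)))  # d = u' x v
}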
The function returns a list containing the following components:
- u: regularized left singular vectors with orthonormal columns
- d: regularized upper-triangular projection matrix so that x %*% v == u %*% d
- v: regularized, sparse right singular vectors with columns of unit norm
- center, scale: the centering and scaling used, if any
- lambda: the per-column regularization parameter found to obtain the desired sparsity
- iter: number of soft-thresholding iterations
- n: value of the input parameter n
- alpha: value of the input parameter alpha
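The relationship among these components can be verified directly on a fitted object; a quick sketch (s and x as in the examples below, with equality holding up to the convergence tolerance):

# The projection identity x %*% v == u %*% d should hold up to small
# numerical error; inspect the largest absolute discrepancy:
max(abs(x %*% s$v - s$u %*% s$d))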
Our implementation of the Shen-Huang method makes the following choices:
- Given a desired number of nonzero elements in v, value(s) for the \(\lambda\) penalty are determined to achieve the sparsity goal subject to the parameter alpha.
- An experimental block implementation is used in the higher-rank case (k > 1) instead of the deflation method described in the reference.
- The choice of a penalty \(\lambda\) associated with a given number of desired nonzero components n is not unique. The alpha parameter, a scalar between zero and one, selects among the possible values of \(\lambda\) that produce the desired number of nonzero entries. The default alpha = 0 selects a penalized solution with the largest corresponding value of d in the 1-d case. Think of alpha as fine-tuning of the penalty.
- We return an upper-triangular matrix d when k > 1 so that x %*% v == u %*% d. Non-zero elements above the diagonal result from non-orthogonality of the v matrix, providing a simple interpretation of cumulative information, or explained variance in the PCA case, via the singular value decomposition of d %*% t(v).
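One way to see the fine-tuning role of alpha is to compare fits at different values; a small sketch (using a matrix x such as the one in the rank-1 example below):

# Same sparsity target n, different lambda selected via alpha:
s0 <- ssvd(x, n=50, alpha=0)    # default: largest d among feasible penalties (1-d case)
s5 <- ssvd(x, n=50, alpha=0.5)
c(lambda_0=s0$lambda, lambda_0.5=s5$lambda, d_0=s0$d, d_0.5=s5$d)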
What if you have no idea what values to use for the desired sparsity argument n? The papers [1, 2] referenced below describe cross-validation and ad hoc approaches, neither of which is in the package yet. Both may be prohibitively computationally expensive for matrices with a huge number of columns. A future version of this package will include a revised approach to automatically selecting a reasonable sparsity constraint.
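In the meantime, a crude heuristic (purely illustrative, not from the package or the papers) is to fit over a small grid of n values and watch how the regularized singular value d responds:

# Illustrative only: scan a grid of sparsity targets and compare d.
ns   <- c(10, 25, 50, 100, 200)
fits <- lapply(ns, function(n) ssvd(x, n=n))
data.frame(n=ns, d=vapply(fits, function(f) f$d, numeric(1)))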
Compare with the similar but much more general functions SPC() and PMD() in the PMA package by Daniela M. Witten, Robert Tibshirani, Sam Gross, and Balasubramanian Narasimhan. The PMD() function can compute low-rank regularized matrix decompositions with sparsity penalties on both the u and v vectors. The ssvd() function is similar to invoking PMD() with the PMD(*, L1) method, or alternatively to the SPC() function. Although less general than PMD(), the ssvd() function can be faster and more memory efficient for the basic sparse PCA problem. See the examples below for some simple comparisons.
(Note that the s4vd package by Martin Sill and Sebastian Kaiser, https://cran.r-project.org/package=s4vd, includes a fast optimized version of a closely related algorithm by Shen, Huang, and Marron that penalizes both u and v.)
suppressMessages(library(irlba))
set.seed(1)
u <- matrix(rnorm(200), ncol=1)
v <- matrix(c(runif(50, min=0.1), rep(0,250)), ncol=1)
u <- u / drop(sqrt(crossprod(u)))
v <- v / drop(sqrt(crossprod(v)))
x <- u %*% t(v) + 0.001 * matrix(rnorm(200*300), ncol=300)
s <- ssvd(x, n=50)
table(actual=v[, 1] != 0, estimated=s$v[, 1] != 0)
## estimated
## actual FALSE TRUE
## FALSE 250 0
## TRUE 0 50
oldpar <- par(mfrow=c(2, 1))
plot(u, main="True u (black circles), Estimated u (blue discs)")
points(s$u, pch=19, col=4, cex=0.5)
plot(v, main="True v (black circles), Estimated v (blue discs)")
points(s$v, pch=19, col=4, cex=0.5)
Compare with the SPC() function in the PMA package, regularizing only the v vector and choosing the regularization constraint sum(abs(s$v)) computed above by ssvd (both methods find about the same solution in this “sparse SVD” case):
suppressMessages(library(PMA))
p <- SPC(x, sumabsv=sum(abs(s$v)), center=FALSE, trace=FALSE)
table(actual=v[, 1] != 0, estimated=p$v[, 1] != 0)
## estimated
## actual FALSE TRUE
## FALSE 250 0
## TRUE 0 50
Compare regularized singular values:
print(c(ssvd=s$d, SPC=p$d))
## ssvd SPC
## 1.000728 1.000727
Same example, but computing a “sparse PCA” instead of “sparse SVD”. Both methods again compute about the same results:
sp <- ssvd(x, n=50, center=TRUE)
pp <- SPC(x, sumabsv=sum(abs(sp$v)), center=TRUE, trace=FALSE)
print(c(ssvd=sp$d, SPC=pp$d))
## ssvd SPC
## 0.9999901 1.0005540
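As a further sanity check (a sketch of my own, not from the package documentation): setting n = ncol(x) imposes no sparsity, so the centered fit should approximately recover the leading principal component computed by prcomp():

full <- ssvd(x, n=ncol(x), center=TRUE)         # no sparsity constraint in effect
pc   <- prcomp(x, center=TRUE, scale.=FALSE)
drop(abs(crossprod(full$v, pc$rotation[, 1])))  # near 1: vectors agree up to sign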
Let’s consider a simple rank-2 example (k=2) with noise. As in the last example, we know the exact number of nonzero elements in each solution vector of the noise-free matrix. Note the application of a different sparsity constraint to each column of the estimated v.
set.seed(1)
u <- qr.Q(qr(matrix(rnorm(400), ncol=2)))
v <- matrix(0, ncol=2, nrow=300)
v[sample(300, 15), 1] <- runif(15, min=0.1)
v[sample(300, 50), 2] <- runif(50, min=0.1)
v <- qr.Q(qr(v))
x <- u %*% (c(2, 1) * t(v)) + .001 * matrix(rnorm(200 * 300), 200)
s <- ssvd(x, k=2, n=colSums(v != 0))
# Compare actual and estimated vectors (compare up to sign):
table(actual=v[, 1] != 0, estimated=s$v[, 1] != 0)
## estimated
## actual FALSE TRUE
## FALSE 285 0
## TRUE 0 15
table(actual=v[, 2] != 0, estimated=s$v[, 2] != 0)
## estimated
## actual FALSE TRUE
## FALSE 235 2
## TRUE 2 61
plot(v[, 1], main="True v1 (black circles), Estimated v1 (blue discs)")
points(-s$v[, 1], pch=19, col=4, cex=0.5)
plot(v[, 2], main="True v2 (black circles), Estimated v2 (blue discs)")
points(-s$v[, 2], pch=19, col=4, cex=0.5)
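Because the two estimated columns of v need not be orthogonal, the diagonal of s$d alone does not partition the explained variance; per the Details section, the singular value decomposition of d %*% t(v) provides the cumulative view. A short sketch using the rank-2 fit s:

# Singular values of d %*% t(v) summarize cumulative information
# (explained variance in the PCA case) despite a non-orthogonal v:
svd(s$d %*% t(s$v), nu=0, nv=0)$d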
The ssvd() function can outperform SPC() for large dense and sparse problems, as shown in the following simple example with a rank-1 5,000 × 10,000 dense real matrix. The example was run on my home PC with an AMD A10-7850K quad-core 3700 MHz CPU and 16 GB of DDR3 synchronous, unbuffered 1333 MHz RAM, running Ubuntu 16.04 and R version 3.4.2 linked to OpenBLAS (using OpenMP) r0.3.0 (the OMP_NUM_CORES variable unset, so all four cores are used when possible).
The methods obtain about the same solution…
library(irlba)
set.seed(1)
u <- matrix(rnorm(5000), ncol=1)
v <- matrix(c(runif(50, min=0.1), rep(0,9950)), ncol=1)
u <- u / drop(sqrt(crossprod(u)))
v <- v / drop(sqrt(crossprod(v)))
x <- u %*% t(v) + 0.001 * matrix(rnorm(5000 * 10000), ncol=10000)
t1 <- system.time(s <- ssvd(x, n=50))
suppressMessages(library(PMA))
t2 <- system.time(p <- SPC(x, sumabsv=sum(abs(s$v)), center=FALSE, trace=FALSE))
print(drop(sqrt(crossprod(s$v - p$v))))
## [1] 2.99928e-07
but with substantially different performance…
print(rbind(ssvd=t1, SPC=t2)[, 1:3])
##      user.self  sys.self  elapsed
## ssvd     3.884     0.004    1.223
## SPC   1164.848    19.464  311.410
Again, SPC() and especially the PMD() function are more flexible and general than ssvd(), but ssvd() might be better for large-scale problems.