The augmented implicitly restarted Lanczos bidiagonalization algorithm (IRLBA) finds a few approximate largest singular values and corresponding singular vectors of a sparse or dense matrix using a method of Baglama and Reichel.

J. Baglama and L. Reichel, SIAM J. Sci. Comput. (2005). (

It is a fast and memory-efficient way to compute a partial SVD, principal components, and some specialized partial eigenvalue decompositions. I introduced Baglama and Reichel’s irlba algorithm to the R world in a talk at the useR! conference in Dortmund way back in 2008.


For your reading pleasure…

And, some interesting related recent work by Musco and colleagues:

What’s new in version 2.2.1 (May, 2017)

A <- matrix(rnorm(100), 10)
# Define a custom matrix multiplication function that scales the columns of A
# to have unit norm (cf the scale option).

# ------------------ the new way ------------------------------------------
# Simply using an S4 class
setClass("scaled_matrix", contains="matrix", slots=c(scale="numeric"))
setMethod("%*%", signature(x="scaled_matrix", y="numeric"), function(x ,y) x@.Data %*% (y / x@scale))
setMethod("%*%", signature(x="numeric", y="scaled_matrix"), function(x ,y) (x %*% y@.Data) / y@scale)
a <- new("scaled_matrix", A, scale=col_scale)

irlba(a, 3)$d
## [1] 1.820227 1.622988 1.067185

# ------------------ the old way ------------------------------------------
col_scale <- sqrt(apply(A, 2, crossprod))
mult <- function(x, y)
          # check if x is a  vector
          if (is.vector(x))
            return((x %*% y) / col_scale)
          # else x is the matrix
          x %*% (y / col_scale)
irlba(A, 3, mult=mult)$d
## [1] 1.820227 1.622988 1.067185

# ------------------ compare ----------------------------------------------
irlba(A, 3, scale=col_scale)$d
## [1] 1.820227 1.622988 1.067185

# Or,
svd(sweep(A, 2, col_scale, FUN=`/`))$d[1:3]
## [1] 1.820227 1.622988 1.067185


The package vignette (PDF):

This example uses a special one-sided basis option and a custom matrix product to compute the top three principal components of the entire 1000 Genomes Project variant data set (whole genome variants for 2,504 people): The example optionally works in parallel and finishes in only 8 minutes on a 4 node Linux cluster. compares the irlba package with the RSpectra package, a high-quality eigenvalue solver for R.

Other applications

The newe tcor algorithm ( and R package ( – not yet on CRAN) use irlba to compute thresholded correlation matrices very quickly. This example computes the most highly correlated gene expressions from the Cancer Genome Atlas RNASeq gene expression data for breast cancer:

Many more applications will appear here soon, check back!