SUBSPACE LEARNING AND IMPUTATION FOR STREAMING
BIG DATA MATRICES AND TENSORS
ABSTRACT
Extracting latent low-dimensional structure from high-dimensional data is of paramount
importance in timely inference tasks encountered with “Big Data” analytics. However,
increasingly noisy, heterogeneous, and incomplete datasets, as well as the need for real-time
processing of streaming data, pose major challenges to this end. In this context, the present
paper extends the benefits of rank minimization to scalable imputation of missing data, via
tracking low-dimensional subspaces and unraveling latent (possibly multi-way) structure from
incomplete streaming data. For low-rank matrix data, a subspace estimator is proposed based on
an exponentially weighted least-squares criterion regularized with the nuclear norm. After
recasting the nonseparable nuclear norm into a form amenable to online optimization, real-time
algorithms with complementary strengths are developed, and their convergence is established
under simplifying technical assumptions. In a stationary setting, the asymptotic estimates
obtained offer the well-documented performance guarantees of the batch nuclear-norm
regularized estimator. Under the same unifying framework, a novel online (adaptive) algorithm
is developed to obtain multi-way decompositions of low-rank tensors with missing entries and
perform imputation as a byproduct. Tests with both synthetic and real Internet
and cardiac magnetic resonance imagery (MRI) data confirm the efficacy of the proposed
algorithms, and their superior performance relative to state-of-the-art alternatives.
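
As an illustration of the recasting step mentioned above, one standard bilinear characterization of the nuclear norm is sketched below. The abstract does not say which form the paper adopts, so this particular identity and all the symbols in it are assumptions made for illustration: X is a P x t data matrix, L (P x rho) and Q (t x rho) are its factors, and q_tau denotes the tau-th row of Q written as a column vector.

```latex
% Assumed notation (not taken from the abstract): X is P x t, L is P x rho, Q is t x rho.
% Equality holds whenever rho >= rank(X).
\|\mathbf{X}\|_* \;=\; \min_{\{\mathbf{L},\mathbf{Q}\}:\ \mathbf{X}=\mathbf{L}\mathbf{Q}^{\top}}
  \tfrac{1}{2}\left(\|\mathbf{L}\|_F^2 + \|\mathbf{Q}\|_F^2\right),
\qquad
\|\mathbf{Q}\|_F^2 \;=\; \sum_{\tau=1}^{t} \|\mathbf{q}_\tau\|_2^2 .
```

Under this form the penalty on Q splits into one term per time index tau, which is what would make the regularizer compatible with processing one datum at a time; the nuclear norm itself couples all columns of X and does not split this way.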
