This document summarizes a talk on scaling machine learning algorithms to big data settings using a divide-and-conquer approach. It discusses three converging trends of big data, distributed computing, and machine learning. The goal is to extend machine learning to big data, but traditional ML algorithms do not scale well. The proposed approach divides data into subsets, applies existing ML algorithms to each subset in parallel, and then combines the results. Matrix factorization is provided as an example application, where the Divide-Factor-Combine framework allows preserving theoretical guarantees while enabling scalability.
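The divide-factor-combine idea can be sketched on a single machine. The following is a minimal numpy illustration, not the talk's actual framework: it splits a low-rank matrix into column blocks, factors each block independently (the step that would run in parallel), and combines the results by projecting every block onto the column space recovered from the first block. The low-rank test matrix and the projection-based combine step are simplifying assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
# Build an exactly low-rank matrix: m x n with rank r.
m, n, r = 60, 40, 3
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))

# Divide: split the columns into t subsets.
t = 4
blocks = np.array_split(np.arange(n), t)

# Factor: truncated SVD of each column block (these could run in parallel).
def truncated_svd(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k]

factored = [truncated_svd(M[:, idx], r) for idx in blocks]

# Combine: project every block onto the column space spanned by the first
# block's left factors, then stitch the blocks back together in order.
U0 = factored[0][0]
M_hat = np.hstack([U0 @ (U0.T @ M[:, idx]) for idx in blocks])

err = np.linalg.norm(M - M_hat) / np.linalg.norm(M)
```

Because the test matrix is exactly rank r, the first block's column space already spans the whole matrix and the combined reconstruction is essentially exact; with noisy data the combine step is where the framework's theoretical guarantees do the work.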
Building Large-scale Real-world Recommender Systems - Recsys2012 tutorial - Xavier Amatriain
There is more to recommendation algorithms than rating prediction, and there is more to recommender systems than algorithms. In this tutorial, given at the 2012 ACM Recommender Systems Conference in Dublin, I review topics such as different interaction and user feedback mechanisms, offline experimentation and A/B testing, and software architectures for recommender systems.
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM - lucenerevolution
In this session we will show how to build a text classifier using Apache Lucene/Solr together with the libSVM library. We classify our corpus of job offers into a number of predefined categories; each indexed document (a job offer) then belongs to zero, one or more categories. Known machine learning techniques for text classification include the naive Bayes model, logistic regression, neural networks, support vector machines (SVM), etc. We use Lucene/Solr to construct the feature vector. Then we use the libSVM library, known as the reference implementation of the SVM model, to classify the documents. We construct as many one-vs-all SVM classifiers as there are classes in our setting, then use the Hadoop MapReduce framework to reconcile the results of our classifiers. The end result is a scalable multi-class classifier. Finally, we outline how the classifier is used to enrich basic Solr keyword search.
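The one-vs-all scheme itself is easy to demonstrate without the Lucene/libSVM stack. The toy sketch below substitutes a standard-library perceptron for the SVM and a bag-of-words counter for the Solr feature vector; the job-offer examples and category names are invented for illustration.

```python
from collections import Counter

def featurize(text):
    # Toy bag-of-words feature vector (Lucene/Solr builds real ones).
    return Counter(text.lower().split())

def train_perceptron(examples, epochs=20):
    # examples: list of (features, +1/-1); returns a sparse weight dict.
    w = {}
    for _ in range(epochs):
        for feats, y in examples:
            score = sum(w.get(f, 0.0) * v for f, v in feats.items())
            if y * score <= 0:            # misclassified -> update weights
                for f, v in feats.items():
                    w[f] = w.get(f, 0.0) + y * v
    return w

def train_one_vs_all(labeled):
    # One binary classifier per class, as in the talk's one-vs-all SVMs.
    classes = {y for _, y in labeled}
    return {c: train_perceptron([(x, 1 if y == c else -1) for x, y in labeled])
            for c in classes}

def predict(models, feats):
    # Reconcile: pick the class whose classifier scores highest.
    return max(models, key=lambda c: sum(models[c].get(f, 0.0) * v
                                         for f, v in feats.items()))

docs = [("python developer wanted", "engineering"),
        ("senior java engineer", "engineering"),
        ("sales account manager", "sales"),
        ("regional sales representative", "sales")]
labeled = [(featurize(t), y) for t, y in docs]
models = train_one_vs_all(labeled)
```

In the talk's setting the per-class training jobs are what MapReduce parallelizes; the `predict` step plays the role of the reconciliation phase.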
Recommender Systems from A to Z – Model Evaluation - Crossing Minds
The third meetup will cover evaluating different models for our recommender system. We will review strategies for checking whether a model is underfitting or overfitting. After that, we will present and analyze the losses typically used to train recommendation models. We will compare regression, classification, and rank-based losses, and discuss when each is convenient to use. Finally, we will cover the metrics typically used to evaluate the performance of recommender systems and how to test that models are giving good results in production.
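The regression-versus-ranking contrast can be made concrete in a few lines of plain Python: a prediction vector can look poor under a regression loss yet perfect under a rank loss, and vice versa. The numbers below are made up for illustration.

```python
def mse(preds, targets):
    # Regression loss: penalizes the absolute rating values.
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

def pairwise_rank_error(preds, targets):
    # Rank loss: fraction of item pairs ordered differently than the targets.
    pairs = [(i, j) for i in range(len(preds)) for j in range(len(preds))
             if targets[i] > targets[j]]
    wrong = sum(1 for i, j in pairs if preds[i] <= preds[j])
    return wrong / len(pairs)

targets = [5.0, 4.0, 2.0, 1.0]          # true ratings
shifted = [3.0, 2.0, 0.0, -1.0]         # biased but correctly ordered
scrambled = [4.0, 5.0, 1.0, 2.0]        # accurate values, two pairs swapped

# The shifted model looks bad to MSE but perfect to the rank loss;
# the scrambled model is the other way around.
```

This is why top-N recommendation tasks usually favor rank-based losses and metrics over plain rating-error ones.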
Artificial Intelligence Course: Linear models - ananth
In this presentation we introduce the linear models: regression and classification, illustrated with several examples. Concepts such as underfitting (bias) and overfitting (variance) are presented. Linear models can be used as stand-alone classifiers for simple cases, and they are essential building blocks within larger deep learning networks.
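A classic way to see underfitting and overfitting is to fit polynomials of increasing degree to noisy data and watch the training error. The sketch below uses numpy; the data, noise level, and degrees are chosen arbitrarily for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 12)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

def train_error(degree):
    coeffs = np.polyfit(x, y, degree)       # least-squares polynomial fit
    resid = y - np.polyval(coeffs, x)
    return float(np.mean(resid ** 2))

# Degree 1 underfits (high bias); degree 9 starts chasing the noise
# (high variance) and drives the *training* error toward zero.
errs = {d: train_error(d) for d in (1, 3, 9)}
```

Training error alone always improves with capacity; the overfitting story only becomes visible when the same models are scored on held-out data.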
This is the first lecture on Applied Machine Learning. The course focuses on emerging and modern aspects of the subject such as Deep Learning, Recurrent and Recursive Neural Networks (RNN), Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Hidden Markov Models (HMM). It deals with several application areas such as Natural Language Processing and Image Understanding. This presentation provides the landscape.
Summary: Graphs are structures commonly used in computer science to model interactions among entities. I will start by introducing the basic formulations of graph-based machine learning, which has been a popular research topic over the past decade and has led to a powerful set of techniques. In particular, I will show examples of how it acts as a generic data mining and predictive analytics tool. In the second part, I will discuss applications of such learning techniques in media analytics: (1) image analysis, where visually coherent objects are isolated from images; and (2) social analysis of videos, where actors' social properties are predicted from videos. Materials in this part are based on our recent publications in highly selective venues (papers at https://sites.google.com/site/leiding2010/ ).
Bio: Lei Ding is a researcher who makes sense of large amounts of data in all media types. He currently works at Intent Media as a scientist, focusing on data analytics and applied machine learning in online advertising. Previously, he worked at several research institutions, including Columbia University, UIUC and IBM Research, on digital and social media analysis and understanding. He received a Ph.D. in Computer Science and Engineering from The Ohio State University, where he was a Distinguished University Fellow.
Deep Learning For Practitioners, lecture 2: Selecting the right applications... - ananth
In this presentation we articulate when deep learning techniques yield the best results from a practitioner's viewpoint. Should we apply deep learning techniques to every machine learning problem? What characteristics make an application suitable for deep learning? Does more data automatically imply better results regardless of the algorithm or model? Does "automated feature learning" obviate the need for data preprocessing and feature design?
Understanding how high-powered ML models arrive at their predictions is an important aspect of machine learning, and SHAP is a powerful tool that enables practitioners to understand how different features combine to help a model arrive at a prediction.
This slide deck is from a presentation given at PyData Global on the theoretical foundations of SHAP as well as how to use its library. The presentation can be found here: https://pydata.org/global2021/schedule/presentation/3/behind-the-black-box-how-to-understand-any-ml-model-using-shap/
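SHAP approximates Shapley values efficiently; for a model with only a couple of features they can be computed exactly by brute force, which makes the definition concrete. The toy "model" below, an additive function of two features plus an interaction term, is hypothetical and uses only the standard library.

```python
from itertools import permutations

def shapley_values(features, value_fn):
    # Exact Shapley values: average each feature's marginal contribution
    # over all orderings in which features can be revealed.
    phi = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value_fn(present)
            present.add(f)
            phi[f] += value_fn(present) - before
    return {f: v / len(orderings) for f, v in phi.items()}

# Toy "model": prediction as a function of which features are known.
base, effects = 10.0, {"age": 3.0, "income": 5.0}

def value_fn(present):
    # Additive effects, plus an interaction when both features are present.
    v = base + sum(effects[f] for f in present)
    if {"age", "income"} <= present:
        v += 2.0
    return v

phi = shapley_values(["age", "income"], value_fn)
# The contributions sum to value(all) - value(none): 3 + 5 + 2 = 10,
# with the interaction bonus split evenly between the two features.
```

The factorial blow-up in orderings is exactly why SHAP relies on model-specific shortcuts (e.g. TreeSHAP) and sampling approximations for real models.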
Robust and declarative machine learning pipelines for predictive buying at Ba...Gianmario Spacagna
Proof of concept of how to use Scala, Spark and the recent library Sparkz to build production-quality machine learning pipelines for predicting buyers of financial products.
The pipelines are implemented through custom declarative APIs that give us greater control, transparency and testability over the whole process.
The example followed the validation and evaluation principles defined in The Data Science Manifesto, available in beta at www.datasciencemanifesto.org
Valencian Summer School in Machine Learning 2017 - Day 2
Lecture 6: Time Series and Deepnets. By Charles Parker (BigML).
https://bigml.com/events/valencian-summer-school-in-machine-learning-2017
VSSML16 LR1. Summary Day 1
Valencian Summer School in Machine Learning 2016
Day 1
Summary Day 1
Mercè Martin (BigML)
https://bigml.com/events/valencian-summer-school-in-machine-learning-2016
Valencian Summer School 2015
Day 1
Lecture 5
Data Transformation and Feature Engineering
Charles Parker (Alston Trading)
https://bigml.com/events/valencian-summer-school-in-machine-learning-2015
Spotify uses a range of machine learning models to power its music recommendation features, including the Discover page and Radio. Due to the iterative nature of training, these models suffer from the I/O overhead of Hadoop and are a natural fit for the Spark programming paradigm. In this talk I will present both the right way and the wrong way to implement collaborative filtering models with Spark. Additionally, I will take a deep dive into how matrix factorization is implemented in the MLlib library.
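MLlib distributes the alternating least squares (ALS) updates across a cluster; the single-machine numpy toy below shows the alternating ridge-regression updates at the heart of the algorithm, under the simplifying assumption of a fully observed ratings matrix (real collaborative filtering sums only over observed entries).

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k, lam = 30, 20, 4, 0.1

# Synthetic ratings generated from hidden rank-k factors.
R = rng.standard_normal((n_users, k)) @ rng.standard_normal((k, n_items))

U = rng.standard_normal((n_users, k))   # user factors
V = rng.standard_normal((n_items, k))   # item factors
eye = lam * np.eye(k)

def rmse():
    return float(np.sqrt(np.mean((R - U @ V.T) ** 2)))

before = rmse()
for _ in range(10):
    # Fix V and solve a ridge regression for every user's factors,
    # then do the symmetric update for the items.
    U = np.linalg.solve(V.T @ V + eye, V.T @ R.T).T
    V = np.linalg.solve(U.T @ U + eye, U.T @ R).T
after = rmse()
```

Each half-update is embarrassingly parallel across users (or items), which is precisely what MLlib exploits; getting the data partitioning wrong is the "wrong way" the talk warns about.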
Introduction to Machine Learning: Machine learning (ML) is a branch of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values.
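As a minimal example of "historical data in, prediction out", here is a one-variable least-squares fit written with only the standard library; the spend/sales numbers are invented for illustration.

```python
# Historical data: advertising spend -> observed sales (made-up numbers).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 8.1, 9.9]

# Ordinary least squares for y = a * x + b, computed by hand.
n = len(spend)
mx = sum(spend) / n
my = sum(sales) / n
a = sum((x - mx) * (y - my) for x, y in zip(spend, sales)) / \
    sum((x - mx) ** 2 for x in spend)
b = my - a * mx

# Predict the output for a new, unseen spend level.
predicted = a * 6.0 + b
```

The model "learns" the slope and intercept from past observations rather than having them hard-coded, which is the whole point of the definition above.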
A unique sorting algorithm with linear time & space complexity - eSAT Journals
Abstract: Sorting a list means selecting the particular permutation of its members in which they appear in increasing or decreasing order. A sorted list is a prerequisite for several optimized operations, such as searching for an element, inserting or removing an element, and merging two sorted lists in a database. As the volume of information around us grows day by day, and these data must be managed in real-life situations, efficient and cost-effective sorting algorithms are required. There are many fundamental and problem-oriented sorting algorithms, yet sorting still attracts a great deal of research, perhaps due to the difficulty of solving it efficiently despite its simple and familiar statement. Algorithms that do the same work using different mechanisms differ in the time and space they require, so an algorithm is chosen according to one's needs with respect to time and space complexity. Nowadays memory is comparatively cheap, so time complexity is the major concern. The presented approach sorts a list in linear time and space using the divide-and-conquer rule, partitioning the problem into n (input size) subproblems that are solved recursively. The time and space required by the algorithm are optimized by reducing the height of the recursion tree, and the reduced height is very small compared with the problem size. The asymptotic efficiency of this algorithm is therefore very high with respect to both time and space. Keywords: sorting, searching, permutation, divide and conquer algorithm, asymptotic efficiency, space complexity, time complexity, recursion.
Divide and Conquer Algorithms - D&C is a distinct algorithm design technique in computer science, wherein a problem is solved by recursively invoking the algorithm on smaller instances of the same problem. Binary search, merge sort and Euclid's algorithm can all be formulated as divide and conquer algorithms; Strassen's algorithm and the nearest-neighbor algorithm are two further examples.
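Merge sort makes the pattern concrete: divide the list in half, recurse on each half, and combine the two sorted halves.

```python
def merge_sort(xs):
    # Divide: split the list in half until single elements remain.
    if len(xs) <= 1:
        return xs
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    # Combine: merge the two sorted halves into one sorted list.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]
```

The recursion tree has depth log n and does O(n) merging work per level, giving the familiar O(n log n) total.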
MLSEV. Logistic Regression, Deepnets, and Time Series - BigML, Inc
Supervised Learning (Part II): Logistic Regression, Deepnets, and Time Series, by BigML.
MLSEV 2019: 1st edition of the Machine Learning School in Seville, Spain.
BOOSTING ADVERSARIAL ATTACKS WITH MOMENTUM - Tianyu Pang and Chao Du, THU - D... - GeekPwn Keen
Youtube: https://www.youtube.com/watch?v=Pu2WQnU0GzA
Deep neural networks are vulnerable to adversarial examples, which raises security concerns about these algorithms due to the potentially severe consequences. Adversarial attacks serve as an important surrogate for evaluating the robustness of deep learning models before they are deployed. However, most existing adversarial attacks can only fool a black-box model with a low success rate. To address this issue, we propose a broad class of momentum-based iterative algorithms to boost adversarial attacks. By integrating a momentum term into the iterative attack process, our methods can stabilize update directions and escape from poor local maxima during the iterations, resulting in more transferable adversarial examples. To further improve the success rates of black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that adversarially trained models with a strong defense ability are also vulnerable to our black-box attacks. We hope that the proposed methods will serve as a benchmark for evaluating the robustness of various deep models and defense methods. With this method, we won first place in both the NIPS 2017 Non-targeted Adversarial Attack and Targeted Adversarial Attack competitions.
Tianyu Pang is a first-year Ph.D. student in the TSAIL Group in the Department of Computer Science and Technology, Tsinghua University, advised by Prof. Jun Zhu. His research interests include machine learning, deep learning and their applications in computer vision, especially the robustness of deep learning.
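The momentum update at the core of the method can be sketched independently of any network. In the numpy toy below, a simple quadratic loss with an analytic gradient stands in for the model, and `eps`, `steps` and `mu` are illustrative values, not the paper's settings; the loop follows the MI-FGSM recipe of accumulating an L1-normalized gradient into a momentum term, stepping along its sign, and clipping back into the epsilon-ball.

```python
import numpy as np

def momentum_attack(x0, grad_fn, eps=0.3, steps=10, mu=1.0):
    # MI-FGSM-style loop: accumulate a normalized-gradient momentum term,
    # step along its sign, and project back into the eps-ball around x0.
    alpha = eps / steps
    x, g = x0.copy(), np.zeros_like(x0)
    for _ in range(steps):
        grad = grad_fn(x)
        g = mu * g + grad / (np.sum(np.abs(grad)) + 1e-12)  # L1-normalized
        x = np.clip(x + alpha * np.sign(g), x0 - eps, x0 + eps)
    return x

# Toy stand-in for a model loss: L(x) = ||x - c||^2, gradient 2(x - c).
# An attack performs gradient *ascent* on the loss.
c = np.array([1.0, -1.0, 0.5])
loss = lambda x: float(np.sum((x - c) ** 2))
grad_fn = lambda x: 2 * (x - c)

x0 = np.zeros(3)
x_adv = momentum_attack(x0, grad_fn, eps=0.3)
```

The L1 normalization is what keeps the momentum accumulator comparable across steps; without it, early large gradients would dominate the update direction.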
Correlation, causation and incrementality: recommendation problems at Netflix ... - Roelof van Zwol
Within Netflix, personalization is a key differentiator, helping members quickly discover new content that matches their taste. Done well, it creates an immersive user experience; however, when a recommendation is out of tune, it is immediately noticed by our members. During this presentation I will cover some of the personalization and recommendation tasks that jointly define the Netflix user experience, which entertains more than 130M members worldwide. In particular, I will focus on several algorithmic challenges related to the launch of new Netflix originals in the service, and go over concepts such as causality, incrementality and explore-exploit strategies.
The research presented in this talk represents the collaborative efforts of a team of research scientists and engineers at Netflix on our journey to create best-in-class user experiences.
Context-aware recommender systems (CARS) help improve the effectiveness of recommendations by adapting to users' preferences in different contextual situations. One approach to CARS that has been shown to be particularly effective is Context-Aware Matrix Factorization (CAMF). CAMF incorporates contextual dependencies into the standard matrix factorization (MF) process, where users and items are represented as collections of weights over various latent factors. In this paper, we introduce another CARS approach based on an extension of matrix factorization, namely, the Sparse Linear Method (SLIM). We develop a family of deviation-based contextual SLIM (CSLIM) recommendation algorithms by learning rating deviations in different contextual conditions. Our CSLIM approach is better at explaining the underlying reasons behind contextual recommendations, and our experimental evaluations over five context-aware data sets demonstrate that these CSLIM algorithms outperform the state-of-the-art CARS algorithms in the top-N recommendation task. We also discuss the criteria for selecting the appropriate CSLIM algorithm in advance based on the underlying characteristics of the data.
In this Spark session Ravi Saraogi talks about why estimating default risk in fund structures can be a challenging task. He presents on how this process has evolved over the years and the current methodologies for assessing such risks.
By popular demand, here is a case study of my first Kaggle competition from about a year ago. Hope you find it useful. Thank you again to my fantastic team.
The Comprehensive Product Platform Planning (CP3) framework presents a flexible mathematical model of the platform planning process, which allows (i) the formation of sub-families of products, and (ii) the simultaneous identification and quantification of platform/scaling design variables. The CP3 model is founded on a generalized commonality matrix that represents the product platform plan, and yields a mixed binary-integer nonlinear programming problem. In this paper, we develop a methodology to reduce the high-dimensional binary integer problem to a more tractable integer problem, where the commonality matrix is represented by a set of integer variables. Subsequently, we determine the feasible set of values for the integer variables in the case of families with 3-7 kinds of products. The cardinality of the feasible set is found to be orders of magnitude smaller than the total number of unique combinations of the commonality variables. In addition, we also present the development of a generalized approach to Mixed-Discrete Non-Linear Optimization (MDNLO) that can be implemented through standard non-gradient-based optimization algorithms. This MDNLO technique is expected to provide a robust and computationally inexpensive optimization framework for the reduced CP3 model. The generalized approach to MDNLO uses continuous optimization as the primary search strategy; however, it evaluates the system model only at the feasible locations in the discrete variable space.
Should all A-rated banks have the same default risk as Lehman? - Zhongmin Luo
1. Financial institutions need to construct proxy CDS rates for counterparties lacking liquid CDS quotes; these are required for CVA pricing, CVA risk-charge calculation, etc.;
2. Existing CDS proxy methods do not meet regulatory requirements and are vulnerable to arbitrage;
3. After investigating the 8 most popular machine learning algorithms, we show that machine learning techniques can be used to construct reliable CDS proxies that meet regulatory requirements while remaining free of the above problems;
4. Feature variable selection can be critical to the performance of CDS-proxy construction methods;
5. The effects of feature variable correlations on classification performance have to be investigated in the case of financial data.
VOLT: A Provenance-Producing, Transparent SPARQL Proxy for the On-Demand Computation of Linked Data & its Applications to Spatiotemporally Dependent Data
Applying Linear Optimization Using GLPK - Jeremy Chen
A brief introduction to linear optimization with a focus on applying it with the high-quality open-source solver GLPK.
Originally prepared for an intra-department sharing session.
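GLPK handles LPs of any size; for a toy two-variable LP you can sanity-check the answer with nothing but the standard library, using the fact that a bounded LP attains its optimum at a vertex of the feasible region, i.e. at the intersection of two constraint boundaries. The LP below is a made-up example, not one from the slides.

```python
from itertools import combinations

# Toy LP: maximize 3x + 4y subject to
#   x + 2y <= 14,   3x - y >= 0,   x - y <= 2,   x >= 0,   y >= 0.
# Every constraint written in "a*x + b*y <= c" form:
cons = [(1, 2, 14), (-3, 1, 0), (1, -1, 2), (-1, 0, 0), (0, -1, 0)]
objective = lambda x, y: 3 * x + 4 * y

def feasible(x, y, tol=1e-9):
    return all(a * x + b * y <= c + tol for a, b, c in cons)

# Enumerate all intersections of two constraint boundaries (Cramer's rule)
# and keep the feasible point with the best objective value.
best = None
for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        continue                      # parallel boundaries, no vertex
    x = (c1 * b2 - c2 * b1) / det
    y = (a1 * c2 - a2 * c1) / det
    if feasible(x, y) and (best is None or objective(x, y) > objective(*best)):
        best = (x, y)
```

This brute-force vertex enumeration is exponential in the number of constraints, which is exactly why one reaches for a real solver like GLPK beyond toy sizes.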
Why Deep Learning Works: Self Regularization in Deep Neural Networks - Charles Martin
Talk (to be given) June 8, 2018 at UC Berkeley / NERSC
In Collaboration with Michael Mahoney, UC Berkeley
Empirical results, using the machinery of Random Matrix Theory (RMT), are presented that are aimed at clarifying and resolving some of the puzzling and seemingly contradictory aspects of deep neural networks (DNNs). We apply RMT to several well-known pre-trained models: LeNet5, AlexNet and Inception V3, as well as 2 small toy models. We show that the DNN training process itself implicitly implements a form of self-regularization associated with the entropy collapse / information bottleneck. We find that the self-regularization in small models like LeNet5 resembles the familiar Tikhonov regularization, whereas large, modern deep networks display a new kind of heavy-tailed self-regularization. We characterize self-regularization using RMT by identifying a taxonomy of the 5+1 phases of training. Then, with our toy models, we show that even in the absence of any explicit regularization mechanism, the DNN training process itself leads to more and more capacity-controlled models. Importantly, this phenomenon is strongly affected by the many knobs that are used to optimize DNN training. In particular, we can induce heavy-tailed self-regularization by adjusting the batch size in training, thereby exploiting the generalization-gap phenomenon unique to DNNs. We argue that this heavy-tailed self-regularization has practical implications for designing better DNNs and deep theoretical implications for understanding the complex DNN energy landscape / optimization problem.
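The kind of spectral diagnostic described can be sketched with numpy. The snippet below computes the empirical eigenvalue spectrum of the correlation matrix X = WᵀW/n for a random (untrained) weight matrix and compares it with the Marchenko-Pastur bulk edge from RMT; the heavy-tailed spectra reported for trained networks show eigenvalue mass well beyond this edge. This is an illustration of the diagnostic, not the authors' code, and the matrix dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 1000, 300                       # "layer" dimensions, n >= m
W = rng.standard_normal((n, m))        # untrained Gaussian weights

# Empirical spectral density of the correlation matrix X = W^T W / n.
X = W.T @ W / n
eigs = np.linalg.eigvalsh(X)

# Marchenko-Pastur bulk edge for aspect ratio q = m / n, unit variance.
q = m / n
lam_plus = (1 + np.sqrt(q)) ** 2

# For pure noise, essentially all eigenvalues fall below the bulk edge;
# heavy-tailed spectra of trained DNN layers spill far beyond it.
frac_below = float(np.mean(eigs <= lam_plus * 1.05))
```

Running the same computation on the weight matrices of a trained large network, rather than Gaussian noise, is where the paper's heavy-tail signature shows up.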
Tensors Are All You Need: Faster Inference with Hummingbird - Databricks
The ever-increasing interest around deep learning and neural networks has led to a vast increase in processing frameworks like TensorFlow and PyTorch. These libraries are built around the idea of a computational graph that models the dataflow of individual units. Because tensors are their basic computational unit, these frameworks can run efficiently on hardware accelerators (e.g., GPUs). Traditional machine learning (ML) models such as linear regressions and decision trees in scikit-learn cannot currently run on GPUs, missing out on the potential accelerations that deep learning and neural networks enjoy.
In this talk, we’ll show how you can use Hummingbird to achieve 1000x speedup in inferencing on GPUs by converting your traditional ML models to tensor-based models (PyTorch and TVM). https://github.com/microsoft/hummingbird
This talk is for intermediate audiences that use traditional machine learning and want to speedup the time it takes to perform inference with these models. After watching the talk, the audience should be able to use ~5 lines of code to convert their traditional models to tensor-based models to be able to try them out on GPUs.
Outline:
Introduction of what ML inference is (and why it’s different than training)
Motivation: Tensor-based DNN frameworks allow inference on GPU, but “traditional” ML frameworks do not
Why “traditional” ML methods are important
Introduction of what Hummingbird does and main benefits
Deep dive on how traditional ML models are built
Brief intro on how the Hummingbird converter works
Example of how Hummingbird can convert a tree model into a tensor-based model
Other models
Demo
Status
Q&A
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments... - MLconf
Understanding Human Impact: Social and Equity Assessments for AI Technologies
Social and equity impact assessments have broad applications. They can be a useful tool for exploring and mitigating machine learning fairness issues, can be applied to product-specific questions as a way to generate insights and learnings about users, and can surface impacts on society at large resulting from the deployment of new and emerging technologies.
In this presentation, my goal is to advocate for and highlight the need for community and external stakeholder engagement, to develop a new knowledge base and understanding of the human and social consequences of algorithmic decision making, and to introduce principles, methods and processes for these types of impact assessments.
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding - MLconf
The Brain’s Guide to Dealing with Context in Language Understanding
Like the visual cortex, the regions of the brain involved in understanding language represent information hierarchically. But whereas the visual cortex organizes things into a spatial hierarchy, the language regions encode information into a hierarchy of timescale. This organization is key to our uniquely human ability to integrate semantic information across narratives. More and more, deep learning-based approaches to natural language understanding embrace models that incorporate contextual information at varying timescales. This has not only led to state-of-the art performance on many difficult natural language tasks, but also to breakthroughs in our understanding of brain activity.
In this talk, we will discuss the important connection between language understanding and context at different timescales. We will explore how different deep learning architectures capture timescales in language and how closely their encodings mimic the brain. Along the way, we will uncover some surprising discoveries about what depth does and doesn’t buy you in deep recurrent neural networks. And we’ll describe a new, more flexible way to think about these architectures and ease design space exploration. Finally, we’ll discuss some of the exciting applications made possible by these breakthroughs.
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
Applying Computer Vision to Reduce Contamination in the Recycling Stream
With China’s recent refusal of most foreign recyclables, North American waste haulers are scrambling to figure out how to make on-shore recycling cost-effective in order to continue providing recycling services. Recyclables that were once being shipped to China for manual sorting are now primarily being redirected to landfills or incinerators. Without a solution, a nearly $5 billion annual recycling market could come to a halt.
Purity in the recycling stream is key to this effort as contaminants in the stream can increase the cost of operations, damage equipment and reduce the ability to create pure commodities suitable for creating recycled goods. This market disruption as a result of China’s new regulations, however, provides us the chance to re-examine and improve our current disposal & collection habits with modern monitoring & artificial intelligence technology.
Using images from our in-dumpster cameras, Compology has developed an ML-based process that helps identify, measure and alert for contaminants in recycling containers before they are picked-up, helping keep the recycling stream clean.
Our convolutional neural network flags potential instances of contamination inside a dumpster, enabling garbage haulers to know which containers have the wrong type of material inside. This allows them to provide targeted, timely education, and when appropriate, assess fines, to improve recycling compliance at the businesses and residences they serve, helping keep recycling services financially viable.
In this presentation, we will walk through our ML-based contamination measurement and scoring process, showing how Waste Management, a national waste hauler, achieved a 57% contamination reduction across nearly 2,000 containers over six months. This progress marks significant strides towards financially viable recycling services.
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
Quantum Computing: a Treasure Hunt, not a Gold Rush
Quantum computers promise a significant step up in computational power over conventional computers, but also suffer a number of counterintuitive limitations --- both in their computational model and in leading lab implementations. In this talk, we review how quantum computers compete with conventional computers and how conventional computers try to hold their ground. Then we outline what stands in the way of successful quantum ML applications.
Josh Wills - Data Labeling as Religious ExperienceMLconf
Data Labeling as Religious Experience
One of the most common places to deploy a production machine learning system is as a replacement for a legacy rules-based system that is having a hard time keeping up with new edge cases and requirements. I'll walk through the process and tooling we used to design, train, and deploy a model to replace a set of static rules we had for handling invite spam at Slack, talk about what we learned, and discuss some problems to solve in order to make these migrations easier for everyone.
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
Project GaitNet: Ushering in the ImageNet moment for human Gait kinematics
The emergence of the upright human bipedal gait can be traced back roughly 4 to 2.8 million years, to the now-extinct hominin Australopithecus afarensis. Fine-grained analysis of gait using the modern MEMS sensors found on all smartphones not only reveals a lot about a person’s orthopedic and neuromuscular health status, but also carries enough idiosyncratic clues to be harnessed as a passive biometric. While the machine learning community has made many siloed attempts to model bipedal gait sensor data, these were done with small datasets, often collected in restricted academic environs. In this talk, we will introduce the ImageNet moment for human gait analysis by presenting 'Project GaitNet', the largest planet-scale motion-sensor-based human bipedal gait dataset ever curated. We’ll also present the associated state-of-the-art results in classifying humans using novel deep neural architectures, and the related success stories we have enjoyed in transfer learning into disparate domains of human kinematics analysis.
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
Machine Learning Methods in Detecting Alzheimer’s Disease from Speech and Language
Alzheimer's disease affects millions of people worldwide, and it is important to predict the disease as early and as accurately as possible. In this talk, I will discuss the development of novel ML models that help distinguish healthy people from those who develop Alzheimer's, using short samples of human speech. As input to the model, features of different modalities are extracted from speech audio samples and transcriptions: (1) syntactic measures, such as production rules extracted from syntactic parse trees, (2) lexical measures, such as features of lexical richness and complexity and lexical norms, and (3) acoustic measures, such as standard Mel-frequency cepstral coefficients. I will present an ML model that detects cognitive impairment by reaching agreement among modalities. The resulting model is able to achieve state-of-the-art performance in both supervised and semi-supervised settings, using manual transcripts of human speech. Additionally, I will discuss potential limitations of any fully automated speech-based Alzheimer's disease detection model, focusing mostly on the analysis of the impact of not-so-accurate automatic speech recognition (ASR) on classification performance. To illustrate this, I will present experiments with controlled amounts of artificially generated ASR errors and explain how deletion errors affect Alzheimer's detection performance the most, due to their impact on features of syntactic and lexical complexity.
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
Optimized Image Classification on the Cheap
In this talk, we anchor on building an image classifier trained on the Stanford Cars dataset to evaluate two approaches to transfer learning -fine tuning and feature extraction- and the impact of hyperparameter optimization on these techniques. Once we define the most performant transfer learning technique for Stanford Cars, we will double the size of the dataset through image augmentation to boost the classifier’s performance. We will use Bayesian optimization to learn the hyperparameters associated with image transformations using the downstream image classifier’s performance as the guide. In conjunction with model performance, we will also focus on the features of these augmented images and the downstream implications for our image classifier.
To both maximize model performance on a budget and explore the impact of optimization on these methods, we apply a particularly efficient implementation of Bayesian optimization to each of these architectures in this comparison. Our goal is to draw on a rigorous set of experimental results that can help us answer the question: how can resource-constrained teams make trade-offs between efficiency and effectiveness using pre-trained models?
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
The Importance of Modeling Data Collection
Data sets used in machine learning are often collected in a systematically biased way - certain data points are more likely to be collected than others. We call this "observation bias". For example, in health care, we are more likely to see lab tests when the patient is feeling unwell than otherwise. Failing to account for observation bias can, of course, result in poor predictions on new data. By contrast, properly accounting for this bias allows us to make better use of the data we do have.
In this presentation, we discuss practical and theoretical approaches to dealing with observation bias. When the nature of the bias is known, there are simple adjustments we can make to nonparametric function estimation techniques, such as Gaussian Process models. We also discuss the scenario where the data collection model is unknown. In this case, there are steps we can take to estimate it from observed data. Finally, we demonstrate that having a small subset of data points that are known to be collected at random - that is, in an unbiased way - can vastly improve our ability to account for observation bias in the rest of the data set.
My hope is that attendees of this presentation will be aware of the perils of observation bias in their own work, and be equipped with tools to address it.
The Uncanny Valley of ML
Every so often, the conundrum of the Uncanny Valley re-emerges as advanced technologies evolve from clearly experimental products to refined accepted technologies. We have seen its effects in robotics, computer graphics, and page load times. The debate of how to handle the new technology detracts from its benefits. When machine learning is added to human decision systems a similar effect can be measured in increased response time and decreased accuracy. These systems include radiology, judicial assignments, bus schedules, housing prices, power grids and a growing variety of applications. Unfortunately, the Uncanny Valley of ML can be hard to detect in these systems and can lead to degraded system performance when ML is introduced, at great expense. Here, we'll introduce key design principles for introducing ML into human decision systems to navigate around the Uncanny Valley and avoid its pitfalls.
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
Deep Learning Architectures for Semantic Relation Detection Tasks
Recognizing and distinguishing specific semantic relations from other types of semantic relations is an essential part of language understanding systems. Identifying expressions with similar and contrasting meanings is valuable for NLP systems which go beyond recognizing semantic relatedness and need to identify specific semantic relations. In this talk, I will first present novel techniques for creating the labelled datasets required for training deep learning models to classify semantic relations between phrases. I will then present neural network architectures that integrate morphological features into combined path-based and distributional relation detection algorithms, and demonstrate that this model outperforms state-of-the-art models in distinguishing semantic relations and is capable of efficiently handling multi-word expressions.
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
Building an Incrementally Trained, Local Taste Aware, Global Deep Learned Recommender System Model
At Netflix, our main goal is to maximize our members’ enjoyment of the selected show by minimizing the amount of time it takes for them to find it. We try to achieve this goal by personalizing almost all the aspects of our product -- from what shows to recommend, to how to present these shows and construct their home-pages to what images to select per show, among many other things. Everything is recommendations for us and as an applied Machine Learning group, we spend our time building models for personalization that will eventually increase the joy and satisfaction of our members. In this talk we will primarily focus our attention on a) making a global deep learned recommender model that is regional tastes and popularity aware and b) adapting this model to changing taste preferences as well as dynamic catalog availability.
We will first go through some standard recommender system models that use Matrix Factorization and Topic Models and then compare and contrast them with more powerful and higher-capacity deep learning based models, such as sequence models that use recurrent neural networks. We will show what it entails to build a global model that is aware of regional taste preferences and catalog availability. We will show how models built on the simple Maximum Likelihood principle fail to do that. We will then describe one solution that we have employed to enable the global deep learned models to focus their attention on capturing regional taste preferences and the changing catalog. In the latter half of the talk, we will discuss how we do incremental learning of deep learned recommender system models. Why do we need to do that? Everything changes with time. Users’ tastes change with time. What’s available on Netflix and what’s popular also change over time. Therefore, updating or improving recommendation systems over time is necessary to bring more joy to users. In addition to how we apply incremental learning, we will discuss some of the challenges we face involving large-scale data preparation, infrastructure setup for incremental model training, as well as pipeline scheduling. The incremental training enables us to serve fresher models trained on fresher and larger amounts of data. This helps our recommender system to quickly adapt to catalog and users’ taste changes, and improves overall performance.
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
The Voice: New Challenges in a Zero UI World
The adoption of voice-enabled devices has seen an explosive growth in the last few years and music consumption is among the most popular use cases. Music personalization and recommendation plays a major role at Pandora in providing a daily delightful listening experience for millions of users. In turn, providing the same perfectly tailored listening experience through these novel voice interfaces brings new interesting challenges and exciting opportunities. In this talk we will describe how we apply personalization and recommendation techniques in three common voice scenarios which can be defined in terms of request types: known-item, thematic, and broad open-ended. We will describe how we use deep learning slot filling techniques and query classification to interpret the user intent and identify the main concepts in the query.
We will also present the differences and challenges regarding evaluation of voice powered recommendation systems. Since pure voice interfaces do not contain visual UI elements, relevance labels need to be inferred through implicit actions such as play time, query reformulations or other types of session level information. Another difference is that while the typical recommendation task corresponds to recommending a ranked list of items, a voice play request translates into a single item play action. Thus, some considerations about closed feedback loops need to be made. In summary, improving the quality of voice interactions in music services is a relatively new challenge and many exciting opportunities for breakthroughs still remain. There are many new aspects of recommendation system interfaces to address to bring a delightful and effortless experience for voice users. We will share a few open challenges to solve for the future.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
GraphRAG is All You need? LLM & Knowledge GraphGuy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Welcome to ViralQR, your best QR code generator.ViralQR
Welcome to ViralQR, your best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through QR technology. Whether you run a small business or a huge enterprise, our easy-to-use platform provides multiple options that can be tailored to your company's branding and marketing strategies.
Our Vision
We are here to make the process of creating QR codes easy and smooth, enhancing customer interaction and making business operations more fluid. We strongly believe in the ability of QR codes to change how businesses interact with their customers, and we are set on making that technology accessible and usable far and wide.
Our Achievements
Since our inception, we have successfully served many clients, offering QR codes for marketing, service delivery, and feedback collection across various industries. Our platform has been recognized for its ease of use and rich features, which help businesses make QR codes.
Our Services
At ViralQR, we offer a comprehensive suite of services that caters to your needs:
Static QR Codes: Create free static QR codes. These can store information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR Codes: These offer advanced features and are subscription-based. They can link directly to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, ViralQR offers a 14-day free trial, an excellent opportunity for new users to get a feel for the platform. From there, you can easily subscribe and experience the full range of dynamic QR codes. The subscription plans are priced flexibly so that businesses of every size can afford to benefit from our service.
Why choose us?
ViralQR provides services for marketing, advertising, catering, retail, and more. QR codes can be placed on fliers, packaging, merchandise, and banners, or substitute for cash and cards in a restaurant or coffee shop. By integrating QR codes into your business, you can improve customer engagement and streamline operations.
Comprehensive Analytics
ViralQR subscribers receive detailed analytics and tracking tools that give a clear view of QR code performance. Our analytics dashboard shows aggregate views and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
Thank you for choosing ViralQR; we offer nothing but the best in QR code services to meet diverse business needs!
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Le nuove frontiere dell'AI nell'RPA con UiPath Autopilot™UiPathCommunity
In this free online event, organized by the Italian UiPath Community, you can explore the new features of Autopilot, the tool that integrates Artificial Intelligence into the development and use of Automations.
📕 Together we will look at some examples of using Autopilot across several tools in the UiPath Suite:
Autopilot per Studio Web
Autopilot per Studio
Autopilot per Apps
Clipboard AI
GenAI applied to Document Understanding
👨🏫👨💻 Speakers:
Stefano Negro, UiPath MVPx3, RPA Tech Lead @ BSP Consultant
Flavio Martinelli, UiPath MVP 2023, Technical Account Manager @UiPath
Andrei Tasca, RPA Solutions Team Lead @NTT Data
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
6.–10. Goal: Extend ML to the Big Data Setting
Challenge: ML not developed with scalability in mind
✦ Does not naturally scale / leverage distributed computing
Our approach: Divide-and-conquer
✦ Apply existing base algorithms to subsets of data and combine
✓ Build upon existing suites of ML algorithms
✓ Preserve favorable algorithm properties
✓ Naturally leverage distributed computing
E.g.,
✦ Matrix factorization (DFC) [MTJ, NIPS11; TMMFJ, ICCV13]
✦ Assessing estimator quality (BLB) [KTSJ, ICML12; KTSJ, JRSS13; KTASJ, KDD13]
✦ Genomic variant calling [BTTJPYS13, submitted; CTZFJP13, submitted]
(Diagram: the intersection of Machine Learning, Big Data, and Distributed Computing)
16. Matrix Completion
Goal: Recover a matrix from a subset of its entries
Can we do this at scale?
✦ Netflix: 30M users, 100K+ videos
✦ Facebook: 1B users
✦ Pandora: 70M active users, 1M songs
✦ Amazon: millions of users and products
✦ ...
18–20. Reducing Degrees of Freedom
✦ Problem: Impossible without additional information
✦ mn degrees of freedom
✦ Solution: Assume a small # of factors determines preference
✦ O(m + n) degrees of freedom
✦ Linear storage costs
[Figure: the m×n matrix factors as an (m×r)(r×n) product, 'low-rank']
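The degrees-of-freedom argument above can be made concrete with a small sketch (the sizes and rank here are illustrative, not from the slides):

```python
import numpy as np

# A rank-r m x n matrix is determined by an m x r factor and an
# r x n factor, so storage drops from m*n numbers to (m + n)*r.
m, n, r = 1_000, 500, 10

full_entries = m * n              # dense storage: 500,000 numbers
factored_entries = (m + n) * r    # factored storage: 15,000 numbers

# Build a random rank-r matrix from its factors and sanity-check
# that a submatrix never exceeds rank r.
A = np.random.randn(m, r)
B = np.random.randn(r, n)
L = A @ B
assert np.linalg.matrix_rank(L[:50, :50]) <= r

print(full_entries, factored_entries)  # 500000 15000
```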
25. Bad Information Spread
✦ Problem: Other ratings don't inform us about the missing rating (bad spread of information)
✦ Solution: Assume incoherence with the standard basis [Candes and Recht, 2009]
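A minimal sketch of why the incoherence assumption is needed (a hypothetical worst case, not an example from the talk): a rank-1 matrix whose mass sits in a single entry spreads no information to the rest of the matrix, so a sample that misses that entry is indistinguishable from a sample of the zero matrix.

```python
import numpy as np

n = 100
L0 = np.zeros((n, n))
L0[0, 0] = 1.0                      # rank-1, but maximally coherent

rng = np.random.default_rng(0)
mask = rng.random((n, n)) < 0.5     # observe roughly half the entries
mask[0, 0] = False                  # the one informative entry is missed

observed = L0[mask]
# Every observed value is 0, so the all-zeros matrix fits the data
# perfectly and exact recovery of L0 is hopeless.
assert np.all(observed == 0)
```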
34. Divide-Factor-Combine (DFC) [MTJ, NIPS11]
✦ D step: Divide input matrix into submatrices
✦ F step: Factor in parallel using a base MC algorithm
✦ C step: Combine submatrix estimates
Advantages:
✦ Submatrix factorization is much cheaper and easily parallelized
✦ Minimal communication between parallel jobs
✦ Retains comparable recovery guarantees (with proper choice of division / combination strategies)
36–44. DFC-Proj
✦ D step: Randomly partition observed entries into t submatrices
✦ F step: Complete the submatrices in parallel
✦ Reduced cost: expect a t-fold speedup per iteration
✦ Parallel computation: pay the cost of one cheaper MC
✦ C step: Project onto a single low-dimensional column space
✦ Roughly, share information across sub-solutions
✦ Minimal cost: linear in n, quadratic in the rank of the sub-solutions
✦ Ensemble: Project onto the column space of each sub-solution and average
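The D/F/C steps above can be sketched as follows. This is a toy illustration only: the base MC algorithm is stubbed out with a rank-r truncated SVD on zero-filled submatrices, and the F step runs sequentially; a real system would plug in a proper matrix-completion solver and parallelize it.

```python
import numpy as np

def base_complete(M_obs, mask, r):
    """Stub base algorithm: zero-fill missing entries, truncate to rank r."""
    U, s, Vt = np.linalg.svd(np.where(mask, M_obs, 0.0), full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def dfc_proj(M_obs, mask, t, r):
    m, n = M_obs.shape
    # D step: randomly split the columns into t submatrices.
    cols = np.array_split(np.random.permutation(n), t)
    # F step: complete each submatrix (in parallel, in a real system).
    subs = [(c, base_complete(M_obs[:, c], mask[:, c], r)) for c in cols]
    # C step: project every sub-solution onto the column space of the
    # first sub-solution, sharing information across sub-solutions.
    Q, _ = np.linalg.qr(subs[0][1])
    L_hat = np.empty((m, n))
    for c, S in subs:
        L_hat[:, c] = Q @ (Q.T @ S)
    return L_hat
```

The ensemble variant would project onto each sub-solution's column space in turn and average the results.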
45–47. Does It Work? Yes, with high probability.
Theorem: Assume:
✦ L₀ is low-rank and incoherent,
✦ Ω̃(r(n + m)) entries sampled uniformly at random,
✦ the nuclear norm heuristic is the base algorithm.
Then L̂ = L₀ with (slightly less) high probability.
✦ Noisy setting: (2 + ε) approximation of the original bound
✦ Can divide into an increasing number of subproblems (t → ∞) when the number of observed entries is Ω̃(r²(n + m))
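The guarantee above can be written out in full; this is a paraphrase of the slide's statement, not an exact quote of the paper's theorem:

```latex
% DFC-Proj recovery guarantee (paraphrased from the slides).
% Noiseless case: if $L_0$ is rank-$r$ and incoherent, and
% $\tilde{\Omega}(r(n+m))$ entries are sampled uniformly at random,
% then with the nuclear norm heuristic as base algorithm,
\hat{L} = L_0
% holds with (slightly less) high probability.
% Noisy case: DFC attains a $(2 + \epsilon)$-approximation of the
% base algorithm's error bound, and the number of subproblems may
% grow ($t \to \infty$) once the observed-entry count reaches
% $\tilde{\Omega}\!\left(r^2 (n + m)\right)$.
```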
58–60. Video Surveillance
✦ Goal: separate foreground from background
✦ Store video as a matrix
✦ Low-rank = background
✦ Outliers = movement
[Frames shown: Original Frame; Nuclear Norm (342.5s); DFC-5% (24.2s); DFC-0.5% (5.2s)]
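The "low-rank = background, outliers = movement" decomposition can be sketched on synthetic data. Note the hedge: the talk uses a robust nuclear-norm formulation; this sketch substitutes a plain rank-1 SVD, which suffices to expose a bright moving object against a static scene.

```python
import numpy as np

rng = np.random.default_rng(0)

h, w, T = 20, 30, 50
background = rng.random(h * w)                 # one static scene, vectorized
frames = np.tile(background[:, None], (1, T))  # video matrix: pixels x frames
frames[200:220, 25] += 5.0                     # a bright "object" in frame 25

# Rank-1 approximation recovers the repeated background...
U, s, Vt = np.linalg.svd(frames, full_matrices=False)
low_rank = s[0] * np.outer(U[:, 0], Vt[0])

# ...and the residual isolates the movement.
foreground = frames - low_rank
```

The large entries of `foreground` concentrate at the object pixels of frame 25, which is exactly the separation the surveillance slides illustrate.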
74–77. Motivation: Face Images (Subspace Segmentation)
[Figure: face images = low-rank + 'noise']
✦ Model images of five people via five low-dimensional subspaces
✦ Recover the subspaces to cluster the images
✦ The nuclear norm heuristic provably recovers the subspaces
✦ Guarantees are preserved with DFC [TMMFJ, ICCV13]
✦ Toy experiment: identify images corresponding to the same person (10 people, 640 images)
✦ DFC results: linear speedup, state-of-the-art accuracy
80–83. Video Event Detection
✦ Input: videos, some of which are associated with events
✦ Goal: predict events for unlabeled videos
✦ Idea:
✦ Featurize each video
✦ Learn video clusters via the nuclear norm heuristic
✦ Given labeled nodes and the cluster structure, make predictions
Can do this at scale with DFC!
84. DFC Summary
✦ DFC: a distributed framework for matrix factorization
✦ Similar recovery guarantees
✦ Significant speedups
✦ DFC applied to 3 classes of problems:
✦ Matrix completion
✦ Robust matrix factorization
✦ Subspace recovery
✦ Extend DFC to other MF methods, e.g., ALS, SGD?
85–89. Big Data and Distributed Computing are valuable resources, but ...
✦ Challenge 1: ML not developed with scalability in mind → Divide-and-Conquer (e.g., DFC)
✦ Challenge 2: ML not developed with ease-of-use in mind → MLbase (www.mlbase.org)