In this paper, we solve a semi-supervised regression problem. Due to the lack of knowledge about the data structure and the presence of random noise, the considered data model is uncertain. We propose a method which combines graph Laplacian regularization and cluster ensemble methodologies. The co-association matrix of the ensemble is calculated on both labeled and unlabeled data; this matrix is used as a similarity matrix in the regularization framework to derive the predicted outputs. We use a low-rank decomposition of the co-association matrix to significantly speed up calculations and reduce memory usage. Two clustering problem examples are presented.
Full version: https://arxiv.org/abs/1901.03919
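The abstract describes the method only at a high level; the following is a minimal illustrative sketch of the core idea, assuming a k-means cluster ensemble and dense matrices (the paper's low-rank speedup is not reproduced here). All function names and parameter choices are hypothetical, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans

def coassociation_matrix(X, n_runs=20, k_range=(2, 8), seed=0):
    """Fraction of ensemble runs in which points i and j share a cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    C = np.zeros((n, n))
    for _ in range(n_runs):
        k = int(rng.integers(k_range[0], k_range[1] + 1))
        labels = KMeans(n_clusters=k, n_init=5,
                        random_state=int(rng.integers(1 << 31))).fit_predict(X)
        C += labels[:, None] == labels[None, :]   # 1 if same cluster, else 0
    return C / n_runs

def laplacian_regression(C, y_labeled, labeled_idx, gam=1.0):
    """Minimize sum over labeled i of (f_i - y_i)^2 + gam * f^T L f, L = D - C.
    labeled_idx is an integer index array of the labeled rows."""
    n = C.shape[0]
    L = np.diag(C.sum(axis=1)) - C          # graph Laplacian of the similarity
    M = gam * L
    M[labeled_idx, labeled_idx] += 1.0      # fidelity term on labeled points
    rhs = np.zeros(n)
    rhs[labeled_idx] = y_labeled
    return np.linalg.solve(M, rhs)          # predicted outputs on all points
```

The labeled outputs enter only through the fidelity term, while the co-association matrix plays the role of the similarity matrix in the graph Laplacian penalty, as in the abstract.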
Since the advent of the horseshoe priors for regularization, global-local shrinkage methods have proved to be a fertile ground for the development of Bayesian theory and methodology in machine learning. They have achieved remarkable success in computation, and enjoy strong theoretical support. Much of the existing literature has focused on the linear Gaussian case. The purpose of the current talk is to demonstrate that the horseshoe priors are useful more broadly, by reviewing both methodological and computational developments in complex models that are more relevant to machine learning applications. Specifically, we focus on methodological challenges in horseshoe regularization in nonlinear and non-Gaussian models; multivariate models; and deep neural networks. We also outline the recent computational developments in horseshoe shrinkage for complex models along with a list of available software implementations that allows one to venture out beyond the comfort zone of the canonical linear regression problems.
Medical pathology images are visually evaluated by experts for disease diagnosis, but the connection between image features and the state of the cells in an image is typically unknown. To understand this relationship, we describe a multimodal modeling and inference framework that estimates the shared latent structure of joint gene expression levels and medical image features. The method is built around probabilistic canonical correlation analysis (PCCA), which is jointly fit to image embeddings that are learned using convolutional neural networks and linear embeddings of paired gene expression data. We finally discuss a set of theoretical and empirical challenges in domain adaptation settings arising from genomics data. (Based on work in collaboration with Gregory Gundersen and Barbara E. Engelhardt.)
A Hough Transform Based on a Map-Reduce Algorithm
This paper proposes composing the Map-Reduce algorithm with the Hough Transform to search for particular shape features in Big Data collections of images. We introduce the first formal translation of the Hough Transform into the Map-Reduce pattern; the Hough Transform is applied to one image or to several images in parallel. The method targets Big Data settings, which require Map-Reduce to improve processing time, together with the need to detect objects in noisy pictures with the Hough Transform.
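The paper's formal translation is not reproduced in this abstract; below is a minimal sketch of the general pattern, assuming a line-detecting Hough transform whose map step votes into an accumulator and whose reduce step merges accumulators. All names and parameters are hypothetical. Note that merging per-image accumulators is only one possible reduce semantics; keeping them separate gives per-image detection instead.

```python
import numpy as np
from functools import reduce

THETAS = np.deg2rad(np.arange(0, 180))          # angle grid, 1-degree steps
N_RHO = 200                                     # number of rho bins

def hough_map(edge_img):
    """Map step: one binary edge image -> one (N_RHO x n_theta) vote accumulator."""
    h, w = edge_img.shape
    rho_max = np.hypot(h, w)
    acc = np.zeros((N_RHO, len(THETAS)), dtype=np.int64)
    ys, xs = np.nonzero(edge_img)               # each edge pixel votes
    for x, y in zip(xs, ys):
        rhos = x * np.cos(THETAS) + y * np.sin(THETAS)
        bins = ((rhos + rho_max) / (2 * rho_max) * (N_RHO - 1)).astype(int)
        acc[bins, np.arange(len(THETAS))] += 1
    return acc

def hough_reduce(acc_a, acc_b):
    """Reduce step: merge two accumulators by summing their votes."""
    return acc_a + acc_b

# images: an iterable of equal-size binary edge maps
# total_votes = reduce(hough_reduce, map(hough_map, images))
```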
The (fast) component-by-component construction of lattice point sets and polynomial lattice point sets is a powerful method for obtaining quadrature rules that approximate integrals over the $d$-dimensional unit cube. In this talk, we present modifications of the component-by-component algorithm and of the more recent successive coordinate search algorithm which yield savings in the construction cost for lattice rules and polynomial lattice rules in weighted function spaces. The idea is to reduce the size of the search space for coordinates that are associated with small weights and are therefore of less importance to the overall error than coordinates associated with large weights. We analyze tractability conditions of the resulting quasi-Monte Carlo rules and show some numerical results.
We provide a review of the recent literature on statistical risk bounds for deep neural networks. We also discuss some theoretical results that compare the performance of deep ReLU networks to other methods such as wavelets and spline-type methods. The talk will moreover highlight some open problems and sketch possible new directions.
Università di Verona, Department of Legal Sciences (Dipartimento di Scienze Giuridiche)
Seminar of 25 November 2015 on "Nature and/or Naturalness of Law: Philosophical-Legal Reflections" ("Natura e/o naturalità del diritto. Riflessioni filosofico-giuridiche")
(1) Introduction
(2) The figure and thought of Giovanni Ambrosetti.
(3) The critique of legal naturalism (pars destruens).
(4) The proposal of natural law: a brief critical exposition (pars construens).
(5) Conclusions
GROUP 5
Motivation: organizational applications
Organizational reward systems
Extrinsic and intrinsic rewards
Reward systems in high-performance organizations
Kohn's criticisms of performance-based companies
Georg Rehm. "Globale Standards im Web of Things" ("Global Standards in the Web of Things"). Bitkom Akademie workshop "Die Dinge im Internet-der-Dinge kommen", Cologne, Germany, December 9, 2015.
Default gateway (puerta de enlace predeterminada)
A default gateway is the device or computer that serves as a link between two computer networks; that is, it is the device that connects and directs data traffic between two or more networks.
Presentation of my NSERC-USRA funded summer research project given at the Canadian Undergraduate Mathematics Conference (CUMC) 2014.
Please refer to the project site: http://jessebett.com/Radial-Basis-Function-USRA/
Gaussian processes (GPs) are a ubiquitous ingredient in ML problems such as robot gait optimization, gesture recognition, optimal control, hyperparameter optimization, and optimal data-sampling strategies for drug and new-material development, yet they are not easy to understand. This talk introduces the basic theory of GPs together with Matlab code.
This is the deck for a Hulu internal machine learning workshop; it introduces the background, theory, and applications of the expectation propagation method.
Efficient end-to-end learning for quantizable representations
Speaker: Yeonwoo Jeong (PhD student, Seoul National University)
Date: July 2018
For similar-image search, a neural network is used to learn image embeddings. Prior work speeds up retrieval using the Hamming distance between binary codes, but the entire dataset still has to be scanned and accuracy suffers. This paper instead learns sparse binary codes, producing hash tables that speed up retrieval without losing accuracy. It also shows that the optimal sparse binary codes within a mini-batch can be found by solving a minimum-cost flow problem. The method achieves state-of-the-art retrieval accuracy (precision@k and NMI) on Cifar-100 and ImageNet, with search speedups of 98× and 478×, respectively.
Surrogate models emulate expensive computer simulations. The objective is to approximate a function, $f$, of $d$ variables to a given tolerance, $\varepsilon$, using as few function values as possible, preferably $O(d)$. We explain how tractability theory provides lower bounds on the number of function values required by any possible method. We also propose a method for sampling $f$ and approximating $f$ that achieves this objective, and we describe the kind of underlying structure that $f$ must have for success.
Mathematics (from Greek μάθημα máthēma, “knowledge, study, learning”) is the study of topics such as quantity (numbers), structure, space, and change. There is a range of views among mathematicians and philosophers as to the exact scope and definition of mathematics.
Welcome to WIPAC Monthly, the magazine brought to you by the LinkedIn group Water Industry Process Automation & Control.
In this month's edition, along with the industry news, and to celebrate the 13 years since the group was created, we have articles including:
- A case study of the use of Advanced Process Control at the wastewater treatment works at Lleida in Spain.
- A look back at an article on smart wastewater networks, to see how the industry has measured up in the interim on the adoption of Digital Transformation in the water industry.
6th International Conference on Machine Learning & Applications (CMLA 2024)
CMLA 2024 will provide an excellent international forum for sharing knowledge and results in the theory, methodology, and applications of Machine Learning.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Cosmetic shop management system project report
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's tough to interpret those ingredient lists unless you have a background in chemistry. Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. The system includes various function programs to carry out the tasks mentioned above, and data file handling has been used effectively throughout the program.
The automated cosmetic shop management system deals with the automation of the general workflow and administration processes of the shop. The main processes of the system focus on customer requests: the system is able to search for the most appropriate products and deliver them to the customers. It helps employees quickly identify the cosmetic products that have reached their minimum quantity, keeps track of the expiry date of each cosmetic product, and helps employees find the rack number in which a product is placed. It is also a faster and more efficient way of working.
The Internet of Things (IoT) is a revolutionary concept that connects everyday objects and devices to the internet, enabling them to communicate, collect, and exchange data. Imagine a world where your refrigerator notifies you when you’re running low on groceries, or streetlights adjust their brightness based on traffic patterns – that’s the power of IoT. In essence, IoT transforms ordinary objects into smart, interconnected devices, creating a network of endless possibilities.
Here is a blog on the role of electrical and electronics engineers in IoT. Let's dig in!
For more such content visit: https://nttftrg.com/
We have compiled the most important slides from each speaker's presentation. This year’s compilation, available for free, captures the key insights and contributions shared during the DfMAy 2024 conference.
Hierarchical Digital Twin of a Naval Power System
A hierarchical digital twin of a Naval DC power system has been developed and experimentally verified. Similar to other state-of-the-art digital twins, this technology creates a digital replica of the physical system executed in real-time or faster, which can modify hardware controls. However, its advantage stems from distributing computational efforts by utilizing a hierarchical structure composed of lower-level digital twin blocks and a higher-level system digital twin. Each digital twin block is associated with a physical subsystem of the hardware and communicates with a singular system digital twin, which creates a system-level response. By extracting information from each level of the hierarchy, power system controls of the hardware were reconfigured autonomously. This hierarchical digital twin development offers several advantages over other digital twins, particularly in the field of naval power systems. The hierarchical structure allows for greater computational efficiency and scalability while the ability to autonomously reconfigure hardware controls offers increased flexibility and responsiveness. The hierarchical decomposition and models utilized were well aligned with the physical twin, as indicated by the maximum deviations between the developed digital twin hierarchy and the hardware.
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
1. Iterated geometric harmonics for missing data recovery
Jonathan A. Lindgren, Erin P. J. Pearse, and Zach Zhang
{jlindgre, epearse, zazhang}@calpoly.edu
California Polytechnic State University, San Luis Obispo, CA
Nov. 14, 2015
2–3. Motivation: the missing data problem
Introduction and background
The missing data problem

Missing data is often a problem. Data can be lost
- while recording measurements,
- during storage or transmission,
- due to equipment failure,
- ...

Existing techniques:
- require some records (rows) to be complete, or
- require some characteristics (columns) to be complete, or
- are based on linear regression.
(But data often has highly nonlinear internal structure!)
4–5. Motivation: the missing data problem
Introduction and background
A dataset is a collection of vectors, stored as a matrix

The data is an n × p matrix. Each row is a vector of length p; one row is a record and each column is a parameter or coordinate.
[Schematic: an n × p matrix with n records (rows) and p characteristics (columns); one record per row.]

EXAMPLES
- 36 photos, each of size 112 pixels × 92 pixels:
  {v_k}_{k=1}^{36} ⊆ R^{10,304}. (Each photo is stored as a vector.)
- Results from a psychology experiment, a 50-question exam given to 200 people:
  {v_k}_{k=1}^{200} ⊆ R^{50}.
- 3000 student records (SAT, ACT, GPA, class rank, etc.):
  {v_k}_{k=1}^{3000} ⊆ R^{20}.
6. Motivation: the missing data problem
Introduction and background
Special case of the missing data problem

Suppose all missing data are in one column.
[Schematic: records v_1, ..., v_n with a final column holding entries f_2, ..., f_n; some entries are missing.]
Consider the last column as a function f : {1, 2, ..., n} → R.
7–9. Motivation: the missing data problem
Introduction and background
Out-of-sample extension of an empirical function

Idea: A function f is defined on a subset Γ of the dataset:
f : Γ → Y, where Γ ⊆ R^p is the set where the value of f is known.
Want to extend f to F : X → Y so that F|_Γ(x) = f(x) for x ∈ Γ.
[Figure: f defined on Γ ⊆ X, extended to F on all of X.]

Application: The data is a sample {(x, f(x))}_{x∈Γ}.
Example: X may be a collection of images or documents, with Y = R.
Want to generalize to as-yet-unseen instances in X.
"function extension" ←→ "automated sorting" ⟹ machine learning / manifold learning
10–11. A solution: Geometric harmonics
The network model associated to a dataset
Similarities within data are modeled via nonlinearity

Introduce a nonlinear kernel function k to model the similarity between two vectors:
k(v, u) ≈ 0 when v and u are very different; k(v, u) ≈ 1 when v and u are very similar.

Two possible choices of such a kernel function:
    k(v, u) = exp(−‖v − u‖² / ε)    or    k(v, u) = |Corr(v, u)|^m.
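A minimal sketch of the two kernels shown above, assuming the vectors are NumPy arrays; ε and m are the slide's tunable parameters, and the defaults here are arbitrary.

```python
import numpy as np

def gaussian_kernel(v, u, eps=1.0):
    """k(v, u) = exp(-||v - u||^2 / eps): ~1 for similar vectors, ~0 otherwise."""
    return np.exp(-np.sum((v - u) ** 2) / eps)

def correlation_kernel(v, u, m=2):
    """k(v, u) = |Corr(v, u)|^m, using the Pearson correlation of the two vectors."""
    return np.abs(np.corrcoef(v, u)[0, 1]) ** m
```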
12–15. A solution: Geometric harmonics
The network model associated to a dataset
Convert the dataset into a network

Goal: replace the original dataset in R^{n×p} with a similarity network.
- Network = connected weighted undirected graph.
- Similarity network = the weights represent similarities.
- Vector v_i ⟶ vertex v_i in the network.

[Figure: four vectors v_1, ..., v_4 mapped by k to a weighted graph with edge weights
k(v_1, v_2) = 4, k(v_1, v_3) = 2, k(v_2, v_3) = 3, k(v_3, v_4) = 1.]

The corresponding similarity matrix is

    K = [ 0 4 2 0
          4 0 3 0
          2 3 0 1
          0 0 1 0 ],    K_{i,j} := k(v_i, v_j).

Efficiency gain: the n × p data matrix becomes an n × n adjacency matrix.
Advantageous for high-dimensional datasets: p ≫ n.
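A sketch of the dataset-to-network conversion, assuming the Gaussian kernel above; zeroing the diagonal (no self-loops) matches the four-vertex example. This is an illustration, not the authors' implementation.

```python
import numpy as np

def similarity_matrix(V, eps=1.0):
    """n x p data matrix -> n x n similarity matrix K with K[i, j] = k(v_i, v_j)."""
    sq_dists = np.sum((V[:, None, :] - V[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)
    np.fill_diagonal(K, 0.0)   # no self-loops, as in the 4-vertex example
    return K
```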
16. A solution: Geometric harmonics
The network model associated to a dataset
Geometric harmonics

Coifman and Lafon introduced the machine learning tool "geometric harmonics" in 2005.
Idea: the eigenfunctions of a diffusion operator can be used to perform global analysis of the dataset and of functions on the dataset.
17–19. A solution: Geometric harmonics
The network model associated to a dataset
Geometric harmonics: construction and definition

For the matrix K with K_{u,v} = k(u, v), consider the integral operator f ↦ Kf given by
    (Kf)(u) := Σ_{v∈Γ} K_{u,v} f(v),    u ∈ X.
("Restricted matrix multiplication.")

Diagonalize the restricted matrix [K]_{u,v∈Γ} via
    Σ_{v∈Γ} K_{u,v} ψ_j(v) = λ_j ψ_j(u),    u ∈ Γ.
NOTE: k symmetric ⟹ K symmetric ⟹ {ψ_j} form an ONB.

[Nyström] Reverse this equation to define values off Γ:
    Ψ_j(u) := (1/λ_j) Σ_{v∈Γ} K_{u,v} ψ_j(v),    u ∈ X.
{Ψ_j}_{j=1}^{n} are the geometric harmonics, where n = |Γ|.
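A sketch of this construction in NumPy, assuming the kernel has already been evaluated on Γ × Γ and on X × Γ. The eigenvectors of the symmetric restricted matrix play the role of the ψ_j; eigenvalues near zero are dropped here to keep the division stable, which is a practical simplification rather than part of the definition.

```python
import numpy as np

def geometric_harmonics(K_gamma, K_x_gamma, tol=1e-10):
    """K_gamma: |Γ| x |Γ| kernel matrix on Γ.
    K_x_gamma: |X| x |Γ| kernel matrix pairing every point of X with Γ.
    Returns eigenvalues λ_j, eigenvectors ψ_j on Γ, and extensions Ψ_j on X."""
    lam, psi = np.linalg.eigh(K_gamma)   # ψ_j = psi[:, j]; real ONB since K is symmetric
    keep = np.abs(lam) > tol             # drop near-zero eigenvalues before dividing
    lam, psi = lam[keep], psi[:, keep]
    Psi = K_x_gamma @ psi / lam          # Ψ_j(u) = (1/λ_j) Σ_{v∈Γ} K_{u,v} ψ_j(v)
    return lam, psi, Psi
```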
20–21. A solution: Geometric harmonics
The network model associated to a dataset
Geometric harmonics: the extension algorithm

For f : Γ → Y and n = |Γ|, define
    F(x) = Σ_{j=1}^{n} ⟨f, ψ_j⟩_Γ Ψ_j(x),    x ∈ X.
For x ∈ Γ, Ψ_j(x) = ψ_j(x), so
    F(x) = Σ_{j=1}^{n} ⟨f, ψ_j⟩_Γ Ψ_j(x) = Σ_{j=1}^{n} ⟨f, ψ_j⟩_Γ ψ_j(x) = f(x),
since this is just the decomposition of f in the ONB {ψ_j}_{j=1}^{n}.
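The extension algorithm then follows directly; a self-contained sketch, assuming the Euclidean inner product on Γ plays the role of ⟨·, ·⟩_Γ.

```python
import numpy as np

def extend_function(K_gamma, K_x_gamma, f_gamma, tol=1e-10):
    """Extend f, known on Γ, to F on X via F = Σ_j <f, ψ_j>_Γ Ψ_j."""
    lam, psi = np.linalg.eigh(K_gamma)   # diagonalize the restricted matrix
    keep = np.abs(lam) > tol
    lam, psi = lam[keep], psi[:, keep]
    coeffs = psi.T @ f_gamma             # <f, ψ_j>_Γ: coefficients in the ONB on Γ
    Psi = K_x_gamma @ psi / lam          # Nyström extensions Ψ_j
    return Psi @ coeffs                  # F(x) for every x in X
```

If the rows of K_x_gamma indexed by Γ coincide with K_gamma and no eigenvalues are dropped, F agrees with f on Γ, matching the calculation above.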
22. A solution: Geometric harmonics
The network model associated to a dataset
Geometric harmonics: limitations

Geometric harmonics does not apply to missing data.
Consider f : Γ → R as an extra column with holes:
[Schematic: records v_1, ..., v_n with an appended f-column containing missing entries.]
Geometric harmonics requires the first p columns to be complete.
23–24. A solution: Geometric harmonics
Iterated geometric harmonics
Iterated geometric harmonics: basic idea

Underlying assumption of geometric harmonics:
    Data are samples from a submanifold.
Restated as a continuity assumption:
    If p − 1 entries of u and v are very close, then so is the p-th.

Idea: Consider the j-th column to be a function of the others.
[Schematic: the records v_1, ..., v_n written out as the n × p matrix (a_{ij}), with the j-th column (a_{1j}, ..., a_{nj}) singled out.]
Geometric harmonics can be applied to the j-th column.
25–27. A solution: Geometric harmonics
Iterated geometric harmonics
Iterated geometric harmonics: the iteration scheme

1. Record the locations of the missing values in the dataset.
2. Stochastically impute the missing values,
   drawn from N(µ, σ²) computed columnwise.
3. Iterate through the columns:
   (a) Choose (at random) a column to update.
   (b) "Unlock" the entries of the column to be imputed.
   (c) Use geometric harmonics to update those entries.
       The current column is treated as a function of the others.
       New values are initially computed in terms of poor guesses;
       successive passes improve the guesses.
   (d) Continue until all columns are updated.
4. Repeat the iteration until updates cause negligible change.
   The process typically stabilizes after about 4 cycles.
A sketch of this scheme follows below.
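A minimal sketch of the whole scheme, reusing extend_function from the sketch above together with a Gaussian kernel. The masking convention, the imputation distribution, and the fixed cycle count are simplifications of the method described on the slides, not the authors' code.

```python
import numpy as np

def igh_impute(X, mask, eps=1.0, n_cycles=4, seed=0):
    """Iterated geometric harmonics, sketched. X: float n x p array with NaNs
    at missing entries; mask: boolean n x p, True where the value is missing."""
    rng = np.random.default_rng(seed)
    X = X.copy()
    for j in range(X.shape[1]):                  # step 2: stochastic imputation
        col, m = X[:, j], mask[:, j]
        mu, sd = np.nanmean(col), np.nanstd(col)
        col[m] = rng.normal(mu, sd if sd > 0 else 1.0, m.sum())
    for _ in range(n_cycles):                    # step 4: repeat until stable
        for j in rng.permutation(X.shape[1]):    # step 3(a): random column order
            m = mask[:, j]
            if not m.any():
                continue
            others = np.delete(X, j, axis=1)     # column j as function of the rest
            gamma = ~m                           # rows where column j is "known"
            d2 = np.sum((others[:, None] - others[gamma][None]) ** 2, axis=-1)
            K_x_gamma = np.exp(-d2 / eps)        # kernel from all rows to Γ
            K_gamma = K_x_gamma[gamma]           # restricted kernel on Γ
            F = extend_function(K_gamma, K_x_gamma, X[gamma, j])
            X[m, j] = F[m]                       # step 3(c): update unlocked entries
    return X
```

Note that the "known" values of the other columns may themselves be earlier guesses; successive cycles refine them, which is exactly the point of the iteration.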
28–30. A solution: Geometric harmonics
Iterated geometric harmonics

[Image slides: a damaged image (70% data loss), the IGH restoration, and the original, shown side by side.]
31–33. A solution: Geometric harmonics
Iterated geometric harmonics
Iterated geometric harmonics: applications

Iterated geometric harmonics requires the continuity assumption:
- Probably not well suited to social network analysis, etc.
Iterated geometric harmonics requires multiple similar datapoints/records:
- Video footage is a natural application:
  10–24 images per second, usually very similar.
  Applications for security, military, law enforcement.
Iterated geometric harmonics excels when p ≫ n:
- However, it has demonstrated good performance on low-dimensional time series.
  Example: San Diego weather data (next slide).
34. A solution: Geometric harmonics
Iterated geometric harmonics
San Diego Airport weather data (n = 2000, p = 25)

[Plots: L² error (up to 2500) and standard deviation (8 to 22) versus number of GH iterations (0–5), with one curve per legend value 0.05–0.4, likely the fraction of missing data.]
35–36. A solution: Geometric harmonics
Iterated geometric harmonics
Summary

Iterated Geometric Harmonics (IGH):
- Robust data reconstruction, even at high rates of data loss.
- Well suited to high-dimensional problems, p ≫ n.
- Relies on continuity assumptions on the underlying data.
- Applications to image reconstruction, video footage, etc.
- Patent pending (U.S. Patent Application No. 14/920,556).
Future work: noisy data.
38–39. Future work
From missing data to noisy data
Future work: noisy data

The problem of "noisy data" is more difficult:
before improving the data, bad values need to be located.
Current work: using Markov random fields to detect noise.
Markov random fields: another graph-based tool for data analysis.
40–42. Future work
From missing data to noisy data
Future work: Markov random fields

[Figure: original (noisy) data b_1, ..., b_6 and improved data a_1, ..., a_6 on a 2 × 3 grid; neighboring a_i, a_j are joined by edges with weights w_{ij}, and each a_i is tied to its observation b_i with weight u_i.]

Minimize the energy functional
    E = Σ w_{ij}(a_i − a_j)² + Σ u_i(a_i − b_i)²,
where {b_i} are given, the w_{ij} are tuned by the user (and usually all equal), and the u_i are tuned by the user (and usually all equal).

Setting w_{ij} = u_i = 1 and letting a single parameter λ be tuned by the user gives the simpler form
    E = Σ (a_i − a_j)² + λ Σ (a_i − b_i)².
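The slides minimize E by simulated annealing (next slide), which also handles non-quadratic energies. For this particular E, which is quadratic in a, the minimizer can instead be found by a direct linear solve: setting ∇E = 0 gives (L + λI)a = λb, where L is the graph Laplacian of the edge set. A sketch under that observation, with a hypothetical grid indexing:

```python
import numpy as np

def mrf_denoise(b, edges, lam=1.0):
    """Minimize E = Σ_{(i,j) in edges} (a_i - a_j)^2 + λ Σ_i (a_i - b_i)^2.
    E is quadratic in a, so solve (L + λ I) a = λ b with L the graph Laplacian."""
    n = len(b)
    L = np.zeros((n, n))
    for i, j in edges:             # each undirected edge counted once
        L[i, i] += 1; L[j, j] += 1
        L[i, j] -= 1; L[j, i] -= 1
    return np.linalg.solve(L + lam * np.eye(n), lam * np.asarray(b, dtype=float))

# Example: a 2 x 3 grid of nodes 0..5 (hypothetical indexing of the figure)
# edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
# a = mrf_denoise(b, edges, lam=0.5)
```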
43. Future work
From missing data to noisy data
Future work: Markov random fields

Markov random fields (MRFs) use simulated annealing to solve
    minimize E given {b_i}.
Output: improved data {a_i}.

Our approach:
1. Apply the MRF to find improved data {a_i}.
2. Compare {a_i} to the original data {b_i}.
3. Label nodes with large values of |a_i − b_i| as missing data.
4. Apply IGH to obtain better improved data.
45–47. Theoretical underpinnings
Reproducing kernel Hilbert spaces
Under the hood: reproducing kernel Hilbert spaces

Suppose X ⊆ R^n and k : X × X → R is
- nonnegative: k(x, y) ≥ 0,
- symmetric: k(x, y) = k(y, x),
- positive semidefinite: for any choice of {x_i}_{i=1}^{m}, K_{i,j} = k(x_i, x_j) defines a positive semidefinite matrix.

[Aronszajn] There is a Hilbert space H of functions on X with
- k_x := k(x, ·) ∈ H, for x ∈ X,
- ⟨k_x, f⟩ = f(x) (the reproducing property).

In the discrete case, H is the closure of {f = Σ_x a_x k_x : a_x scalars}.
48–49. Theoretical underpinnings
Reproducing kernel Hilbert spaces
Under the hood: reproducing kernel Hilbert spaces

For Γ ⊆ X, the operator K : L²(Γ, µ) → H given by
    (Kf)(x) = ∫_Γ k(x, y) f(y) dµ(y),    x ∈ X,
turns out to have adjoint operator K* : H → L²(Γ, µ) given by domain restriction:
    (K*g)(y) = g(y),    y ∈ Γ, g ∈ H.

K*K is self-adjoint, positive, and compact, so its eigenvalues are discrete and non-negative.
Since K* is restriction, the eigenfunctions can be found by diagonalizing k on Γ.
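As a quick numerical sanity check of the positive-semidefiniteness condition, the Gram matrix of a Gaussian kernel should have a non-negative spectrum up to round-off; a minimal sketch, with arbitrary sample points:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))                 # 40 sample points in R^5

# Gram matrix of the Gaussian kernel k(x, y) = exp(-||x - y||^2 / eps)
d2 = np.sum((X[:, None] - X[None, :]) ** 2, axis=-1)
K = np.exp(-d2 / 2.0)

eigvals = np.linalg.eigvalsh(K)              # symmetric => real spectrum
print("min eigenvalue:", eigvals.min())      # ≥ 0 up to round-off: k is PSD
```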