Iterated geometric harmonics for missing data recovery
Jonathan A. Lindgren, Erin P. J. Pearse, and Zach Zhang
{jlindgre, epearse, zazhang}@calpoly.edu
California Polytechnic State University, San Luis Obispo, CA
Nov. 14, 2015
Motivation: the missing data problem
Introduction and background
The missing data problem
Missing data is often a problem. Data can be lost
while recording measurements,
during storage or transmission,
due to equipment failure,
...
Existing techniques:
require some records (rows) to be complete, or
require some characteristics (columns) to be complete, or
are based on linear regression.
(But data often has highly nonlinear internal structure!)
A dataset is a collection of vectors, stored as a matrix
The data is an n × p matrix. Each row is a vector of length p; one
row is a record and each column is a parameter or coordinate.
[Figure: the n × p data matrix, with the n records as rows and the p characteristics as columns; one row is labeled as one record.]
EXAMPLES
36 photos, each of size 112 pixels × 92 pixels.
$\{v_k\}_{k=1}^{36} \subseteq \mathbb{R}^{10{,}304}$ (each photo stored as a vector).
Results from a psychology experiment: a 50-question exam
given to 200 people.
$\{v_k\}_{k=1}^{200} \subseteq \mathbb{R}^{50}$.
3000 student records (SAT, ACT, GPA, class rank, etc.)
$\{v_k\}_{k=1}^{3000} \subseteq \mathbb{R}^{20}$.
Special case of the missing data problem
Suppose all missing data are in one column (blank entries are missing):
$$\begin{bmatrix} v_1 & \\ v_2 & f_2 \\ v_3 & \\ \vdots & \vdots \\ v_n & f_n \end{bmatrix}$$
Consider the last column as a function $f : \{1, 2, \dots, n\} \to \mathbb{R}$.
Out-of-sample extension of an empirical function
Idea: A function f is defined on a subset Γ of the dataset.
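$f : \Gamma \to Y$, where $\Gamma \subseteq \mathbb{R}^p$ is the set where the value of $f$ is known.
We want to extend $f$ to $F : X \to Y$ so that $F|_\Gamma = f$, i.e., $F(x) = f(x)$ for $x \in \Gamma$.
[Figure: f is defined on the subset Γ of X; its extension F is defined on all of X.]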
Application: the data is a sample $\{(x, f(x))\}_{x \in \Gamma}$.
Example: X may be a collection of images or documents, and $Y = \mathbb{R}$.
We want to generalize to as-yet-unseen instances in X.
“function extension” $\longleftrightarrow$ “automated sorting”
$\Longrightarrow$ machine learning / manifold learning
A solution: Geometric harmonics
The network model associated to a dataset
Similarities within data are modeled via nonlinearity
Introduce a nonlinear kernel function k to model the similarity
between two vectors.
$$k(v, u) \approx \begin{cases} 0, & v \text{ and } u \text{ very different}, \\ 1, & v \text{ and } u \text{ very similar}. \end{cases}$$
Two possible choices of such a kernel function:
$$k(v, u) = e^{-\|v - u\|_2^2/\varepsilon} \qquad \text{or} \qquad k(v, u) = |\operatorname{Corr}(v, u)|^m.$$
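A minimal sketch of the two kernels above in Python with NumPy. The function names are hypothetical, and reading Corr(v, u) as the Pearson correlation of the entries of v and u is our assumption; ε and m are left to the user.

```python
import numpy as np


def gaussian_kernel(v, u, eps):
    """k(v, u) = exp(-||v - u||_2^2 / eps): 1 when v == u, near 0 when v and u are far apart."""
    diff = np.asarray(v, dtype=float) - np.asarray(u, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / eps))


def correlation_kernel(v, u, m):
    """k(v, u) = |Corr(v, u)|^m, taking Corr to be the Pearson correlation (assumption)."""
    r = np.corrcoef(np.asarray(v, dtype=float), np.asarray(u, dtype=float))[0, 1]
    return float(abs(r) ** m)


print(gaussian_kernel([0.0, 0.0], [3.0, 4.0], eps=25.0))          # exp(-1), about 0.368
print(correlation_kernel([1.0, 2.0, 3.0], [3.0, 2.0, 1.0], m=2))  # anti-correlated, about 1.0
```

Larger ε makes the Gaussian kernel treat more distant vectors as similar; larger m pushes the correlation kernel toward 0 for weakly correlated vectors.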
Convert the dataset into a network
Goal: replace the original dataset in $\mathbb{R}^{n \times p}$ with a similarity network.
Network = connected, weighted, undirected graph.
Similarity network = the edge weights represent similarities.
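Vector $v_i$ $\longrightarrow$ vertex $v_i$ in the network.
[Figure: the kernel k maps the vectors v1, v2, v3, v4 to a weighted graph with edges v1–v2 (weight 4), v1–v3 (weight 2), v2–v3 (weight 3), and v3–v4 (weight 1).]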
$$K = \begin{array}{c|cccc} & v_1 & v_2 & v_3 & v_4 \\ \hline v_1 & 0 & 4 & 2 & 0 \\ v_2 & 4 & 0 & 3 & 0 \\ v_3 & 2 & 3 & 0 & 1 \\ v_4 & 0 & 0 & 1 & 0 \end{array}$$
Efficiency gain: n × p data matrix $\longrightarrow$ n × n adjacency matrix $K$, with $K_{i,j} := k(v_i, v_j)$.
Advantageous for high-dimensional datasets: $p \gg n$.
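A sketch of building the n × n matrix K from an n × p data array, assuming the Gaussian kernel with a user-chosen ε (the function name `kernel_matrix` is hypothetical). Pairwise squared distances come from ‖v_i − v_j‖² = ‖v_i‖² + ‖v_j‖² − 2⟨v_i, v_j⟩, so p enters only through one matrix product; everything downstream works with the n × n matrix alone.

```python
import numpy as np


def kernel_matrix(data, eps):
    """n x n matrix with K[i, j] = exp(-||v_i - v_j||_2^2 / eps) for the rows v_i of an n x p array."""
    X = np.asarray(data, dtype=float)
    sq_norms = np.sum(X * X, axis=1)
    # ||v_i - v_j||^2 = ||v_i||^2 + ||v_j||^2 - 2 <v_i, v_j>, for all pairs at once
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative values caused by round-off
    return np.exp(-sq_dists / eps)


# p >> n: 5 records with 10,000 coordinates each still give a 5 x 5 matrix.
data = np.random.default_rng(0).normal(size=(5, 10_000))
K = kernel_matrix(data, eps=20_000.0)
print(K.shape)  # (5, 5)
```

Unlike the toy network above, a Gaussian kernel gives every vector similarity 1 with itself, so the diagonal of K is 1 rather than 0.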
Geometric harmonics
Coifman and Lafon introduced the machine learning tool
“geometric harmonics” in 2005.
Idea: the eigenfunctions of a diffusion operator can be used to
perform global analysis of the dataset and of functions on a
dataset.
Geometric harmonics: construction and definition
For the matrix $K$ with $K_{u,v} = k(u, v)$, consider the integral operator $f \mapsto Kf$ given by
$$(Kf)(u) := \sum_{v \in \Gamma} K_{u,v} f(v), \qquad u \in X.$$
“Restricted matrix multiplication”
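Diagonalize the restricted matrix $[K]_{u,v \in \Gamma}$ via
$$\sum_{v \in \Gamma} K_{u,v} \psi_j(v) = \lambda_j \psi_j(u), \qquad u \in \Gamma.$$
NOTE:
$k$ symmetric $\Longrightarrow$ $K$ symmetric $\Longrightarrow$ the eigenvectors $\{\psi_j\}$ can be chosen to form an orthonormal basis (ONB) of functions on $\Gamma$.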
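[Nyström] Reverse this equation to define values off $\Gamma$ (for $\lambda_j \neq 0$):
$$\Psi_j(u) := \frac{1}{\lambda_j} \sum_{v \in \Gamma} K_{u,v} \psi_j(v), \qquad u \in X.$$
$\{\Psi_j\}_{j=1}^n$ are the geometric harmonics, where $n = |\Gamma|$.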
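A sketch of the construction in Python with NumPy: diagonalize the kernel matrix restricted to Γ, then apply the Nyström formula to every point of X. The function name and the cutoff `rel_tol` are our assumptions; the cutoff is needed only because Ψ_j divides by λ_j, and it simply drops (numerically) zero eigenvalues.

```python
import numpy as np


def geometric_harmonics(K_full, gamma_idx, rel_tol=1e-10):
    """Return (lam, psi, Psi) for a kernel matrix K_full on X and the indices gamma_idx of Gamma.

    lam[j]    : eigenvalue lambda_j of the restricted matrix [K]_{u,v in Gamma}
    psi[:, j] : eigenvector psi_j on Gamma (orthonormal columns)
    Psi[:, j] : Nystrom extension Psi_j, evaluated at every point of X
    """
    K_full = np.asarray(K_full, dtype=float)
    gamma_idx = np.asarray(gamma_idx)
    K_gamma = K_full[np.ix_(gamma_idx, gamma_idx)]  # restricted matrix
    lam, psi = np.linalg.eigh(K_gamma)              # symmetric: real eigenvalues, orthonormal eigenvectors
    # Assumption: drop eigenvalues that are numerically zero, since Psi_j divides by lambda_j.
    keep = np.abs(lam) > rel_tol * np.max(np.abs(lam))
    lam, psi = lam[keep], psi[:, keep]
    # Psi_j(u) = (1 / lambda_j) * sum_{v in Gamma} K[u, v] * psi_j(v), for all u in X
    Psi = K_full[:, gamma_idx] @ psi / lam
    return lam, psi, Psi


# Toy example: 8 points in R^3, Gamma = 5 of them, Gaussian kernel with eps = 1.
rng = np.random.default_rng(1)
pts = rng.normal(size=(8, 3))
K = np.exp(-((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1) / 1.0)
gamma = np.array([0, 2, 3, 5, 7])
lam, psi, Psi = geometric_harmonics(K, gamma)
print(np.allclose(Psi[gamma], psi))  # on Gamma, Psi_j coincides with psi_j
```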
Geometric harmonics: the extension algorithm
For $f : \Gamma \to Y$ and $n = |\Gamma|$, define
$$F(x) = \sum_{j=1}^n \langle f, \psi_j \rangle_\Gamma \, \Psi_j(x), \qquad x \in X.$$
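For $x \in \Gamma$, $\Psi_j(x) = \psi_j(x)$, so
$$F(x) = \sum_{j=1}^n \langle f, \psi_j \rangle_\Gamma \, \Psi_j(x) = \sum_{j=1}^n \langle f, \psi_j \rangle_\Gamma \, \psi_j(x) = f(x),$$
since this is just the decomposition of $f$ in the ONB $\{\psi_j\}_{j=1}^n$.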
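The extension formula can be sketched in Python with NumPy as follows (the function name is hypothetical). It keeps every eigenpair, which assumes all λ_j ≠ 0; the toy points and the Gaussian kernel with ε = 1 are illustrative assumptions. The final check mirrors the argument above: on Γ, F reproduces f.

```python
import numpy as np


def gh_extend(K_full, gamma_idx, f_gamma):
    """Extend f from Gamma to all of X: F(x) = sum_j <f, psi_j>_Gamma * Psi_j(x).

    Assumes every eigenvalue of the restricted matrix is nonzero.
    """
    K_full = np.asarray(K_full, dtype=float)
    gamma_idx = np.asarray(gamma_idx)
    K_gamma = K_full[np.ix_(gamma_idx, gamma_idx)]
    lam, psi = np.linalg.eigh(K_gamma)                 # ONB {psi_j} of functions on Gamma
    coeffs = psi.T @ np.asarray(f_gamma, dtype=float)  # <f, psi_j>_Gamma for each j
    Psi = K_full[:, gamma_idx] @ psi / lam             # geometric harmonics Psi_j on X
    return Psi @ coeffs                                # F evaluated at every point of X


# Toy example: 10 points in the plane, f known on the first 6.
rng = np.random.default_rng(2)
pts = rng.normal(size=(10, 2))
K = np.exp(-((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1) / 1.0)
gamma = np.arange(6)
f = np.sin(pts[gamma, 0])
F = gh_extend(K, gamma, f)
print(np.allclose(F[gamma], f))  # F agrees with f on Gamma
```

Algebraically, keeping every eigenpair gives $F = K_{X,\Gamma} K_{\Gamma,\Gamma}^{-1} f$, i.e., ordinary kernel interpolation; truncating the sum to the larger eigenvalues trades exact agreement on Γ for numerical stability.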
Geometric harmonics: limitations
Geometric harmonics does not apply to missing data.
Consider $f : \Gamma \to \mathbb{R}$ as an extra column with holes:
$$\left[\begin{array}{c|c} v_1 & \\ v_2 & \\ v_3 & f \\ \vdots & \\ v_n & \end{array}\right]$$
Geometric harmonics requires the first $p$ columns to be complete.
Iterated geometric harmonics
Iterated geometric harmonics: basic idea
Underlying assumption of geometric harmonics:
Data are samples from a submanifold.
Restated as a continuity assumption:
If p − 1 entries of u and v are very close, then so is the pth.
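Idea: consider the jth column to be a function of the others:
$$\begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} \longrightarrow \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nj} & \cdots & a_{np} \end{bmatrix}$$
Geometric harmonics can then be applied to the jth column.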
Iterated geometric harmonics: the iteration scheme
1 Record locations of missing values in the dataset.
2 Stochastically impute missing values.
Drawn from $N(\mu, \sigma^2)$, with $\mu$ and $\sigma^2$ computed columnwise.
3 Iterate through the columns.
(a) Choose (at random) a column to update.
(b) “Unlock” entries of column to be imputed.
(c) Use geometric harmonics to update those entries.
Current column is treated as a function of the others.
New values are initially computed in terms of poor guesses.
Successive passes improve guesses.
(d) Continue until all columns are updated.
4 Repeat iteration until updates cause negligible change.
Process typically stabilizes after about 4 cycles.
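A compact sketch of the iteration scheme above in Python with NumPy. Several details are our own assumptions rather than the authors' specification: a Gaussian kernel with a single user-chosen ε for every column, the cutoff `rel_tol` on small eigenvalues, the stopping rule (largest change in any imputed entry below `tol`, with at most `max_cycles` passes), and the requirement that every column has at least one observed value. All names are hypothetical.

```python
import numpy as np


def _gh_column(A, j, known, eps, rel_tol):
    """Geometric harmonics for column j, treated as a function of the other columns."""
    X = np.delete(A, j, axis=1)                         # other columns serve as coordinates
    sq = np.sum(X * X, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-d2 / eps)                               # Gaussian kernel (assumed choice)
    lam, psi = np.linalg.eigh(K[np.ix_(known, known)])  # diagonalize K on Gamma = known rows
    keep = lam > rel_tol * lam.max()                    # avoid dividing by ~0 eigenvalues
    lam, psi = lam[keep], psi[:, keep]
    Psi = K[:, known] @ psi / lam                       # Nystrom extension to all rows
    return Psi @ (psi.T @ A[known, j])                  # F = sum_j <f, psi_j> Psi_j


def iterated_gh(data, eps, max_cycles=10, tol=1e-6, rel_tol=1e-8, seed=0):
    """Impute the NaN entries of an n x p array (sketch of iterated geometric harmonics)."""
    rng = np.random.default_rng(seed)
    A = np.array(data, dtype=float)
    missing = np.isnan(A)                               # 1: record locations of missing values
    for j in range(A.shape[1]):                         # 2: stochastic columnwise initial guesses
        obs = A[~missing[:, j], j]
        A[missing[:, j], j] = rng.normal(obs.mean(), obs.std(), missing[:, j].sum())
    for _ in range(max_cycles):                         # 4: repeat until changes are negligible
        before = A[missing].copy()
        for j in rng.permutation(A.shape[1]):           # 3(a): columns in random order
            rows = missing[:, j]
            if rows.any():
                known = np.flatnonzero(~rows)           # 3(b): only missing entries are unlocked
                A[rows, j] = _gh_column(A, j, known, eps, rel_tol)[rows]  # 3(c)
        if np.max(np.abs(A[missing] - before), initial=0.0) < tol:
            break
    return A


# Toy usage: a smooth curve in R^6 with about 10% of entries removed.
t = np.linspace(0.0, 1.0, 60)
clean = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t, t**2,
                         np.sin(np.pi * t), np.cos(np.pi * t)])
mask = np.random.default_rng(3).random(clean.shape) < 0.1
damaged = np.where(mask, np.nan, clean)
restored = iterated_gh(damaged, eps=0.05)
print(np.sqrt(np.mean((restored[mask] - clean[mask]) ** 2)))  # RMS error on imputed entries
```

Observed entries are never modified; only the recorded missing positions are updated on each pass, and later passes recompute them using the improved guesses in the other columns.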
[Figure: damaged (70% data loss), restored, and original versions side by side.]
Iterated geometric harmonics: applications
Iterated geometric harmonics requires the continuity assumption.
Probably not well-suited to social network analysis, etc.
Iterated geometric harmonics requires multiple similar
datapoints/records
Video footage is a natural application.
10–24 images per second, usually very similar.
Applications for security, military, law enforcement.
Iterated geometric harmonics excels when $p \gg n$.
However, it has also demonstrated good performance on
low-dimensional time series.
Example: San Diego weather data (next slide)
San Diego Airport weather data
n = 2000, p = 25
[Figure: L2 error (left) and standard deviation (right) versus the number of GH iterations.]
Summary
Iterated Geometric Harmonics (IGH):
Robust data reconstruction, even at high rates of data loss.
Well suited to high-dimensional problems, $p \gg n$.
Relies on continuity assumptions on underlying data.
Application to image reconstruction, video footage, etc.
Patent pending (U.S. Patent Application No.: 14/920,556)
Future work: noisy data.
Future work
From missing data to noisy data
Future work: noisy data
The problem of “noisy data” is more difficult:
Before improving the data, bad values need to be located.
Current work: using Markov random fields to detect noise.
Markov random fields: another graph-based tool for data
analysis.
Future work: Markov random fields
[Figure: original (noisy) data and improved data.]
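[Figure: a graph of the improved data a1, …, a6, in which neighboring nodes a_i, a_j are joined by edges with weights w_ij, and each a_i is joined to the original (noisy) value b_i by an edge with weight u_i.]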
Minimize the energy functional:
$$E = \sum_{\text{edges } \{i,j\}} w_{ij}(a_i - a_j)^2 + \sum_i u_i (a_i - b_i)^2,$$
where $\{b_i\}$ are given,
$w_{ij}$ are tuned by the user (and usually all equal), and
$u_i$ are tuned by the user (and usually all equal).
Minimize the energy functional:
$$E = \sum_{\text{edges } \{i,j\}} (a_i - a_j)^2 + \lambda \sum_i (a_i - b_i)^2,$$
where $\{b_i\}$ are given,
$w_{ij} = u_i = 1$, and $\lambda$ is tuned by the user.
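The slides minimize E by simulated annealing. As a point of comparison only (a different technique from the one the slides use): with $w_{ij} = u_i = 1$ and real-valued $a_i$, E is a convex quadratic, and setting $\partial E / \partial a_i = 0$ gives the linear system $(L + \lambda I)a = \lambda b$, where $L$ is the graph Laplacian (degree matrix minus adjacency matrix). The sketch below evaluates E and solves that system; the function names, the edge-list input format, and the 2 × 3 grid are hypothetical.

```python
import numpy as np


def mrf_energy(a, b, edges, lam):
    """E = sum over edges (i, j) of (a_i - a_j)^2 + lam * sum_i (a_i - b_i)^2."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    i, j = np.asarray(edges).T
    return float(np.sum((a[i] - a[j]) ** 2) + lam * np.sum((a - b) ** 2))


def mrf_minimize(b, edges, lam):
    """Exact minimizer of E: solve (L + lam * I) a = lam * b, with L the graph Laplacian."""
    b = np.asarray(b, dtype=float)
    n = b.size
    L = np.zeros((n, n))
    for i, j in edges:  # each undirected edge contributes to D - W
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return np.linalg.solve(L + lam * np.eye(n), lam * b)


# 2 x 3 grid of nodes (0 1 2 / 3 4 5) with one spike in the observed data b.
edges = [(0, 1), (1, 2), (3, 4), (4, 5), (0, 3), (1, 4), (2, 5)]
b = np.array([1.0, 1.0, 1.0, 1.0, 5.0, 1.0])
a = mrf_minimize(b, edges, lam=0.5)
print(a)
print(mrf_energy(a, b, edges, 0.5) <= mrf_energy(b, b, edges, 0.5))  # True: a minimizes E
```

Smaller λ trusts the neighbors more and smooths the spike more aggressively; larger λ keeps $a$ closer to $b$.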
Future work: Markov random fields
Markov random fields (MRF) use simulated annealing to solve:
minimize $E$ given $\{b_i\}$.
Output: improved data $\{a_i\}$.
Our approach:
1 Apply MRF to find improved data $\{a_i\}$.
2 Compare $\{a_i\}$ to the original data $\{b_i\}$.
3 Label nodes with large values of $|a_i - b_i|$ as missing data.
4 Apply IGH to obtain further improved data.
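Steps 2 and 3 amount to a thresholding rule. A minimal sketch, assuming a single user-chosen threshold on $|a_i - b_i|$ (the slides say only "large values", so `threshold` and the function name are hypothetical); flagged entries are set to NaN so they can be handed to an IGH routine as missing data in step 4.

```python
import numpy as np


def flag_noisy_entries(b, a, threshold):
    """Mark entries where the MRF output a differs from the original data b by more than threshold."""
    b = np.asarray(b, dtype=float)
    a = np.asarray(a, dtype=float)
    flagged = np.abs(a - b) > threshold  # step 3: large |a_i - b_i|
    damaged = b.copy()
    damaged[flagged] = np.nan            # treated as missing data in step 4
    return damaged, flagged


damaged, flagged = flag_noisy_entries([1.0, 1.0, 5.0, 1.0], [1.1, 1.0, 2.0, 0.9], threshold=1.0)
print(flagged)  # only the third entry is flagged
print(damaged)  # the third entry is now NaN
```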
Theoretical underpinnings
Reproducing kernel Hilbert spaces
Under the hood: reproducing kernel Hilbert spaces
Suppose $X \subseteq \mathbb{R}^n$ and $k : X \times X \to \mathbb{R}$ is
nonnegative: $k(x, y) \geq 0$
symmetric: $k(x, y) = k(y, x)$
positive semidefinite: for any choice of $\{x_i\}_{i=1}^m \subseteq X$, the matrix $K_{i,j} = k(x_i, x_j)$ is positive semidefinite.
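The three conditions can be spot-checked numerically on a finite sample. A sketch with a hypothetical helper name; note that passing the check for one choice of points $\{x_i\}$ does not prove that $k$ is positive semidefinite, which must hold for every finite choice.

```python
import numpy as np


def kernel_matrix_properties(K, tol=1e-10):
    """Check nonnegativity, symmetry, and positive semidefiniteness of one kernel matrix."""
    K = np.asarray(K, dtype=float)
    return {
        "nonnegative": bool(np.all(K >= -tol)),
        "symmetric": bool(np.allclose(K, K.T)),
        # eigvalsh assumes a symmetric input, so symmetrize before computing eigenvalues
        "psd": bool(np.linalg.eigvalsh((K + K.T) / 2.0).min() >= -tol),
    }


# Gaussian kernel matrix on random points in R^3.
pts = np.random.default_rng(4).normal(size=(12, 3))
G = np.exp(-((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1) / 2.0)
print(kernel_matrix_properties(G))

# The 4 x 4 toy adjacency matrix from the network example: zero trace, so not PSD.
toy = np.array([[0, 4, 2, 0], [4, 0, 3, 0], [2, 3, 0, 1], [0, 0, 1, 0]], dtype=float)
print(kernel_matrix_properties(toy))  # 'psd' is False
```

A nonzero symmetric matrix with zero trace always has a negative eigenvalue, so the toy network matrix illustrates the graph picture rather than a PSD kernel matrix.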
[Aronszajn] There is a Hilbert space H of functions on X with
$k_x := k(x, \cdot) \in H$, for $x \in X$
$\langle k_x, f \rangle = f(x)$ for $f \in H$ (reproducing property)
In the discrete case, $H$ is the closure of the set of functions $f = \sum_x a_x k_x$, with scalars $a_x$.
For $\Gamma \subseteq X$, the operator $K : L^2(\Gamma, \mu) \to H$ given by
$$(Kf)(x) = \int_\Gamma k(x, y) f(y) \, d\mu(y), \qquad x \in X,$$
turns out to have adjoint operator $K^* : H \to L^2(\Gamma, \mu)$ given by domain restriction:
$$K^* g(y) = g(y), \qquad y \in \Gamma, \ g \in H.$$
$K^* K$ is self-adjoint, positive, and compact.
Its eigenvalues are discrete and non-negative.
Since $K^*$ is restriction, the eigenvalues can be found by diagonalizing $k$ on $\Gamma$.
Machine Learning and Statistical AnalysisMachine Learning and Statistical Analysis
Machine Learning and Statistical Analysis
 
Hands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in PythonHands-on Tutorial of Machine Learning in Python
Hands-on Tutorial of Machine Learning in Python
 
Matrix Computations in Machine Learning
Matrix Computations in Machine LearningMatrix Computations in Machine Learning
Matrix Computations in Machine Learning
 

Recently uploaded

Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...Call girls in Ahmedabad High profile
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 

Recently uploaded (20)

Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
High Profile Call Girls Dahisar Arpita 9907093804 Independent Escort Service ...
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 

Igh maa-2015 nov

  • 1. Iterated geometric harmonics for missing data recovery
    Jonathan A. Lindgren, Erin P. J. Pearse, and Zach Zhang ({jlindgre, epearse, zazhang}@calpoly.edu)
    California Polytechnic State University, San Luis Obispo, CA
    Nov. 14, 2015
Motivation: the missing data problem / Introduction and background
  • 2–3. The missing data problem
    Missing data is often a problem. Data can be lost
    - while recording measurements,
    - during storage or transmission,
    - due to equipment failure, ...
    Existing techniques
    - require some records (rows) to be complete, or
    - require some characteristics (columns) to be complete, or
    - are based on linear regression. (But data often has highly nonlinear internal structure!)
  • 4–5. A dataset is a collection of vectors, stored as a matrix
    The data is an n × p matrix. Each row is a vector of length p: one row is a record, and each column is a parameter or coordinate (n records, p characteristics).
    Examples:
    - 36 photos, each of size 112 pixels × 92 pixels, with each photo stored as a vector: $\{v_k\}_{k=1}^{36} \subseteq \mathbb{R}^{10{,}304}$.
    - Results from a psychology experiment, a 50-question exam given to 200 people: $\{v_k\}_{k=1}^{200} \subseteq \mathbb{R}^{50}$.
    - 3000 student records (SAT, ACT, GPA, class rank, etc.): $\{v_k\}_{k=1}^{3000} \subseteq \mathbb{R}^{20}$.
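To illustrate the first example, the sketch below flattens a stack of 36 images of 112 × 92 pixels into a 36 × 10,304 data matrix with one record per row. It uses NumPy, which is an assumption because the slides name no software. The pixel values are random stand-ins, not real photos.

```python
import numpy as np

rng = np.random.default_rng(0)
photos = rng.random((36, 112, 92))        # stand-in for 36 grayscale photos of 112 x 92 pixels
data = photos.reshape(36, -1)              # one photo per row: n = 36 records, p = 112 * 92
print(data.shape)                          # (36, 10304)

# Any row can be reshaped back into its image.
first_photo = data[0].reshape(112, 92)
```

Flattening discards the 2-D pixel layout. Similarity between records is then measured only through the vector entries.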
  • 6. Special case of the missing data problem
    Suppose all missing data are in one column (? marks a missing value):
    $$\begin{pmatrix} v_1 & ? \\ v_2 & f_2 \\ v_3 & ? \\ \vdots & \vdots \\ v_n & f_n \end{pmatrix}$$
    Consider the last column as a function $f : \{1, 2, \dots, n\} \to \mathbb{R}$.
  • 7–9. Out-of-sample extension of an empirical function
    Idea: a function f is defined on a subset Γ of the dataset X, that is, $f : \Gamma \to Y$, where $\Gamma \subseteq X \subseteq \mathbb{R}^p$ is the set where the value of f is known. We want to extend f to $F : X \to Y$ so that $F|_\Gamma = f$, i.e. $F(x) = f(x)$ for $x \in \Gamma$.
    [Figure: f defined on Γ, extended to F on all of X.]
    Application: the data is a sample $\{(x, f(x))\}_{x \in \Gamma}$. For example, X may be a collection of images or documents, with $Y = \mathbb{R}$. We want to generalize to as-yet-unseen instances in X.
    "Function extension" ←→ "automated sorting" ⟹ machine learning/manifold learning.
A solution: Geometric harmonics / The network model associated to a dataset
  • 10–11. Similarities within data are modeled via nonlinearity
    Introduce a nonlinear kernel function k to model the similarity between two vectors:
    $$k(v, u) \approx \begin{cases} 0, & v \text{ and } u \text{ very different}, \\ 1, & v \text{ and } u \text{ very similar}. \end{cases}$$
    Two possible choices of such a kernel function:
    $$k(v, u) = e^{-\|v - u\|_2^2/\varepsilon} \qquad \text{or} \qquad k(v, u) = |\operatorname{Corr}(v, u)|^m.$$
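The two kernels above can be written as plain functions. In this sketch, ε (`eps`) and m are user-chosen parameters. Taking Corr to be the Pearson correlation of the two records' entries is an assumption, because the slide does not say which correlation is meant.

```python
import numpy as np

def gaussian_kernel(v, u, eps):
    """k(v, u) = exp(-||v - u||_2^2 / eps): equals 1 when v == u and decays toward 0 as they separate."""
    diff = np.asarray(v, dtype=float) - np.asarray(u, dtype=float)
    return float(np.exp(-np.dot(diff, diff) / eps))

def correlation_kernel(v, u, m):
    """k(v, u) = |Corr(v, u)|^m, taking Corr to be the Pearson correlation of the records' entries."""
    r = np.corrcoef(np.asarray(v, dtype=float), np.asarray(u, dtype=float))[0, 1]
    return float(abs(r) ** m)

v = [1.0, 2.0, 3.0]
u = [1.1, 2.1, 2.9]    # a record similar to v
w = [9.0, -4.0, 0.5]   # a record very different from v
print(gaussian_kernel(v, u, eps=1.0))   # about 0.97, i.e. exp(-0.03)
print(gaussian_kernel(v, w, eps=1.0))   # essentially 0, i.e. exp(-106.25)
print(correlation_kernel(v, u, m=2))
```

The correlation kernel ignores scale and offset. For example, `[1, 2, 3]` and `[3, 2, 1]` are perfectly anti-correlated, so their similarity is 1. The Gaussian kernel, by contrast, depends on actual distances.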
  • 12–15. Convert the dataset into a network
    Goal: replace the original dataset in $\mathbb{R}^{n \times p}$ with a similarity network.
    - Network = connected, weighted, undirected graph.
    - Similarity network = the edge weights represent similarities.
    Each vector $v_i$ becomes a vertex $v_i$ in the network. In the four-vertex example, the edges are $v_1 v_2$ (weight 4), $v_1 v_3$ (weight 2), $v_2 v_3$ (weight 3), and $v_3 v_4$ (weight 1). This gives the adjacency matrix
    $$K = \begin{pmatrix} 0 & 4 & 2 & 0 \\ 4 & 0 & 3 & 0 \\ 2 & 3 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad K_{i,j} := k(v_i, v_j).$$
    Efficiency gain: the n × p data matrix is replaced by an n × n adjacency matrix. This is advantageous for high-dimensional datasets, where $p \gg n$.
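The following sketch shows the n × p to n × n reduction for the Gaussian kernel. The bandwidth `eps` is a hypothetical choice. Unlike the toy adjacency matrix above, which has a zero diagonal, this kernel gives k(v, v) = 1 on the diagonal.

```python
import numpy as np

def kernel_matrix(data, eps):
    """n x n Gaussian similarity matrix K[i, j] = exp(-||v_i - v_j||^2 / eps) for the rows v_i of data."""
    data = np.asarray(data, dtype=float)
    sq_norms = np.sum(data ** 2, axis=1)
    # ||v_i - v_j||^2 = ||v_i||^2 + ||v_j||^2 - 2 <v_i, v_j>, for all pairs at once
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (data @ data.T)
    sq_dists = np.maximum(sq_dists, 0.0)   # clip tiny negatives caused by round-off
    return np.exp(-sq_dists / eps)

rng = np.random.default_rng(0)
data = rng.normal(size=(4, 10_304))        # n = 4 records with p = 10,304 coordinates (p >> n)
K = kernel_matrix(data, eps=2.0 * 10_304)  # eps is a hypothetical choice, not from the slides
print(data.shape, "->", K.shape)           # (4, 10304) -> (4, 4)
```

All later steps work with the 4 × 4 matrix `K`, whatever the value of p.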
  • 16. Geometric harmonics
    Coifman and Lafon introduced the machine learning tool "geometric harmonics" in 2005.
    Idea: the eigenfunctions of a diffusion operator can be used to perform global analysis of the dataset and of functions on a dataset.
  • 17–19. Geometric harmonics: construction and definition
    For the matrix K with $K_{u,v} = k(u, v)$, consider the integral operator $f \mapsto Kf$ given by
    $$(Kf)(u) := \sum_{v \in \Gamma} K_{u,v}\, f(v), \qquad u \in X$$
    ("restricted matrix multiplication"). Diagonalize the restricted matrix $[K]_{u,v \in \Gamma}$ via
    $$\sum_{v \in \Gamma} K_{u,v}\, \psi_j(v) = \lambda_j\, \psi_j(u), \qquad u \in \Gamma.$$
    Note: k symmetric ⟹ K symmetric ⟹ the eigenvectors $\{\psi_j\}$ can be chosen to form an orthonormal basis (ONB) of functions on Γ.
    [Nyström] Reverse this equation to define values off Γ (for $\lambda_j \neq 0$):
    $$\Psi_j(u) := \frac{1}{\lambda_j} \sum_{v \in \Gamma} K_{u,v}\, \psi_j(v), \qquad u \in X.$$
    The functions $\{\Psi_j\}_{j=1}^{n}$, where $n = |\Gamma|$, are the geometric harmonics.
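The construction can be sketched with a symmetric eigensolver followed by the Nyström formula. The cutoff `tol`, which drops (numerically) zero eigenvalues before dividing by λ_j, is an assumption not stated on the slide. The Gaussian kernel with ε = 0.02 on evenly spaced points is a hypothetical example.

```python
import numpy as np

def geometric_harmonics(K, gamma_idx, tol=1e-10):
    """Eigenpairs of K restricted to Gamma, plus the Nystrom extensions Psi_j to all of X.

    K         : (N, N) symmetric kernel matrix on the whole dataset X
    gamma_idx : indices of the points of Gamma
    Eigenvalues with |lambda_j| <= tol are discarded so that dividing by lambda_j is safe.
    """
    K_gamma = K[np.ix_(gamma_idx, gamma_idx)]
    lam, psi = np.linalg.eigh(K_gamma)       # symmetric: real eigenvalues, orthonormal eigenvectors
    keep = np.abs(lam) > tol
    lam, psi = lam[keep], psi[:, keep]
    # Nystrom: Psi_j(u) = (1 / lambda_j) * sum_{v in Gamma} K[u, v] * psi_j(v), for every u in X
    Psi = (K[:, gamma_idx] @ psi) / lam
    return lam, psi, Psi

x = np.linspace(0.0, 1.0, 11)                        # X: 11 points on a line, spacing 0.1
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.02)   # Gaussian kernel with eps = 0.02
gamma_idx = np.arange(0, 11, 2)                       # Gamma: every other point
lam, psi, Psi = geometric_harmonics(K, gamma_idx)
print(np.allclose(Psi[gamma_idx], psi))               # on Gamma, Psi_j coincides with psi_j
```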
  • 20–21. Geometric harmonics: the extension algorithm
    For $f : \Gamma \to Y$ and $n = |\Gamma|$, define
    $$F(x) = \sum_{j=1}^{n} \langle f, \psi_j \rangle_\Gamma\, \Psi_j(x), \qquad x \in X.$$
    For $x \in \Gamma$, $\Psi_j(x) = \psi_j(x)$, so
    $$F(x) = \sum_{j=1}^{n} \langle f, \psi_j \rangle_\Gamma\, \Psi_j(x) = \sum_{j=1}^{n} \langle f, \psi_j \rangle_\Gamma\, \psi_j(x) = f(x),$$
    since this is just the decomposition of f in the ONB $\{\psi_j\}_{j=1}^{n}$.
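A self-contained sketch of the extension formula follows. It uses a hypothetical test function and Gaussian kernel, and the same assumed eigenvalue cutoff `tol` as in the construction step. It checks that F agrees with f on Γ.

```python
import numpy as np

def gh_extend(K, gamma_idx, f_gamma, tol=1e-10):
    """F(x) = sum_j <f, psi_j>_Gamma * Psi_j(x) for every x in X (rows of K)."""
    K_gamma = K[np.ix_(gamma_idx, gamma_idx)]
    lam, psi = np.linalg.eigh(K_gamma)
    keep = np.abs(lam) > tol                 # drop (numerically) zero eigenvalues before dividing
    lam, psi = lam[keep], psi[:, keep]
    Psi = (K[:, gamma_idx] @ psi) / lam      # Nystrom extensions, shape (N, number kept)
    coeffs = psi.T @ f_gamma                 # <f, psi_j>_Gamma for each kept j
    return Psi @ coeffs

x = np.linspace(0.0, 3.0, 31)                        # X: 31 points, spacing 0.1
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.02)   # Gaussian kernel, eps = 0.02
gamma_idx = np.arange(0, 31, 2)                       # f is known only on every other point
f_gamma = np.sin(x[gamma_idx])                        # hypothetical known values of f
F = gh_extend(K, gamma_idx, f_gamma)
print(np.allclose(F[gamma_idx], f_gamma))             # F restricted to Gamma equals f
```

When every eigenvalue is kept, as here, F(x) reduces to $K_{x,\Gamma} K_\Gamma^{-1} f$. Discarding small eigenvalues gives up exact agreement on Γ in exchange for numerical stability.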
  • 22. Geometric harmonics: limitations
    Geometric harmonics does not apply to missing data. Consider $f : \Gamma \to \mathbb{R}$ as an extra column with holes:
    $$\begin{pmatrix} v_1 & \\ v_2 & \\ v_3 & f \\ \vdots & \\ v_n & \end{pmatrix}$$
    Geometric harmonics requires the first p columns to be complete.
A solution: Geometric harmonics / Iterated geometric harmonics
  • 23–24. Iterated geometric harmonics: basic idea
    Underlying assumption of geometric harmonics: the data are samples from a submanifold.
    Restated as a continuity assumption: if p − 1 entries of u and v are very close, then so is the pth.
    Idea: consider the jth column to be a function of the others:
    $$\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix} \longrightarrow \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1j} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2j} & \cdots & a_{2p} \\ \vdots & \vdots & & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nj} & \cdots & a_{np} \end{pmatrix}$$
    Geometric harmonics can then be applied to the jth column.
  • 25–27. Iterated geometric harmonics: the iteration scheme
    1. Record the locations of missing values in the dataset.
    2. Stochastically impute missing values, drawn from $N(\mu, \sigma^2)$ computed columnwise.
    3. Iterate through the columns:
       (a) Choose (at random) a column to update.
       (b) "Unlock" the entries of that column that are to be imputed.
       (c) Use geometric harmonics to update those entries. The current column is treated as a function of the others. New values are initially computed in terms of poor guesses; successive passes improve the guesses.
       (d) Continue until all columns are updated.
    4. Repeat the iteration until updates cause negligible change. The process typically stabilizes after about 4 cycles.
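The scheme above can be sketched in NumPy as follows. This is a hypothetical illustration, not the authors' (patent-pending) implementation. The following details are assumptions that do not appear on the slides:
- NaN marks a missing entry.
- The kernel is Gaussian, with its bandwidth set to the median squared distance.
- Eigenvalues below `delta` times the largest are discarded.
- `tol` defines a "negligible change".

```python
import numpy as np

def _gh_extend(Z, known, values, delta):
    """Extend `values` (known on rows `known` of Z) to every row of Z with geometric harmonics."""
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1)
    eps = np.median(sq[sq > 0])                       # assumed bandwidth: median squared distance
    K = np.exp(-sq / eps)
    lam, psi = np.linalg.eigh(K[np.ix_(known, known)])
    keep = lam > delta * lam.max()                    # assumed cutoff for small eigenvalues
    lam, psi = lam[keep], psi[:, keep]
    Psi = (K[:, known] @ psi) / lam                   # Nystrom extension of each psi_j to all rows
    return Psi @ (psi.T @ values)                     # F = sum_j <f, psi_j> Psi_j

def igh_impute(data, max_cycles=4, delta=1e-2, tol=1e-6, seed=0):
    """Hypothetical sketch of iterated geometric harmonics; NaN marks a missing entry."""
    rng = np.random.default_rng(seed)
    A = np.array(data, dtype=float)
    missing = np.isnan(A)                                         # 1. record missing locations
    mu, sigma = np.nanmean(A, axis=0), np.nanstd(A, axis=0)
    for j in range(A.shape[1]):                                   # 2. stochastic initial guesses
        A[missing[:, j], j] = rng.normal(mu[j], sigma[j], size=int(missing[:, j].sum()))
    for _ in range(max_cycles):                                   # 4. repeat the sweep ...
        before = A[missing].copy()
        for j in rng.permutation(A.shape[1]):                     # 3(a). random column order
            holes_j = missing[:, j]
            if not holes_j.any():
                continue
            known = np.flatnonzero(~holes_j)                      # 3(b). unlock only the holes
            others = np.delete(A, j, axis=1)                      # column j as a function of the rest
            F = _gh_extend(others, known, A[known, j], delta)     # 3(c). geometric harmonics update
            A[holes_j, j] = F[holes_j]
        if np.max(np.abs(A[missing] - before), initial=0.0) < tol:
            break                                                 # ... until changes are negligible
    return A

# Usage on synthetic data: 80 records on a closed curve in R^8, with about 10% of entries removed.
t = np.linspace(0.0, 1.0, 80, endpoint=False)
truth = np.sin(2 * np.pi * t[:, None] + np.arange(8)[None, :] * np.pi / 8)
holes = np.random.default_rng(42).random(truth.shape) < 0.1
damaged = np.where(holes, np.nan, truth)
restored = igh_impute(damaged)
mean_fill = np.where(holes, np.nanmean(damaged, axis=0), truth)   # column-mean baseline

def rmse(estimate):
    return float(np.sqrt(np.mean((estimate[holes] - truth[holes]) ** 2)))

print("IGH RMSE:", rmse(restored), "column-mean RMSE:", rmse(mean_fill))
```

Only the entries flagged in step 1 are ever overwritten, so observed values pass through unchanged. The synthetic data are chosen to satisfy the continuity assumption: each column is a smooth function of the others.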
  • 28–30. [Figures: damaged, restored, and original data (70% data loss).]
  • 31–33. Iterated geometric harmonics: applications
    - Iterated geometric harmonics requires the continuity assumption. It is probably not well suited to social network analysis, etc.
    - Iterated geometric harmonics requires multiple similar datapoints/records. Video footage is a natural application: 10–24 images per second, usually very similar. Applications for security, military, law enforcement.
    - Iterated geometric harmonics excels when $p \gg n$. However, it has demonstrated good performance on low-dimensional time series, for example San Diego weather data (next slide).
  • 34. San Diego Airport weather data (n = 2000, p = 25)
    [Figure: two panels plotting L² error and standard deviation against the number of GH iterations (0 to 5), with a color scale from 0.05 to 0.4.]
  • 35–36. Summary
    Iterated Geometric Harmonics (IGH):
    - Robust data reconstruction, even at high rates of data loss.
    - Well suited to high-dimensional problems, $p \gg n$.
    - Relies on continuity assumptions on the underlying data.
    - Application to image reconstruction, video footage, etc.
    - Patent pending (U.S. Patent Application No. 14/920,556).
    Future work: noisy data.
Future work / From missing data to noisy data
  • 38–39. Future work: noisy data
    The problem of "noisy data" is more difficult: before the data can be improved, the bad values need to be located.
    Current work: using Markov random fields to detect noise. Markov random fields are another graph-based tool for data analysis.
  • 40–42. Future work: Markov random fields
    [Figure: original (noisy) data and improved data. The graph has hidden nodes $a_1, \dots, a_6$ joined by edges with weights $w_{12}, w_{13}, w_{23}, w_{24}, w_{35}, w_{45}, w_{56}$. Each $a_i$ is linked to its observed value $b_i$ by an edge of weight $u_i$.]
    Minimize the energy functional
    $$E = \sum_{i \sim j} w_{ij}\, (a_i - a_j)^2 + \sum_i u_i\, (a_i - b_i)^2,$$
    where $i \sim j$ ranges over the edges of the graph and the $\{b_i\}$ are given. The $w_{ij}$ and $u_i$ are tuned by the user and are usually all equal. Taking $w_{ij} = u_i = 1$ and a user-tuned parameter λ gives
    $$E = \sum_{i \sim j} (a_i - a_j)^2 + \lambda \sum_i (a_i - b_i)^2.$$
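For real-valued $a_i$, the second energy is quadratic. Setting its gradient to zero gives the linear system $(L + \lambda I)\,a = \lambda b$, where L is the graph Laplacian. The sketch below therefore uses a direct linear solve, a swapped-in technique, instead of the simulated annealing named on the next slide. The edge list is read off the figure and is an assumption. The observations `b` are hypothetical.

```python
import numpy as np

# Edges of the six-node graph sketched on the slide (0-based: a1 -> 0, ..., a6 -> 5); assumed reading.
EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]

def energy(a, b, lam, edges=EDGES):
    """E(a) = sum over edges of (a_i - a_j)^2 + lam * sum_i (a_i - b_i)^2 (all w_ij = u_i = 1)."""
    smooth = sum((a[i] - a[j]) ** 2 for i, j in edges)
    return smooth + lam * np.sum((a - b) ** 2)

def minimize_energy(b, lam, edges=EDGES):
    """Exact minimizer: dE/da = 2 L a + 2 lam (a - b) = 0, so (L + lam * I) a = lam * b."""
    n = len(b)
    L = np.zeros((n, n))                 # graph Laplacian: degrees on the diagonal, -1 per edge
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return np.linalg.solve(L + lam * np.eye(n), lam * np.asarray(b, dtype=float))

b = np.array([1.0, 1.1, 0.9, 1.0, 5.0, 1.05])   # hypothetical observations; b5 looks like noise
a = minimize_energy(b, lam=0.5)
print(a, energy(a, b, 0.5), energy(b, b, 0.5))
```

Smaller λ trusts the graph more and smooths harder. Larger λ keeps $a$ close to the observations $b$.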
  • 43. Future work: Markov random fields
    Markov random fields (MRF) use simulated annealing to minimize E given $\{b_i\}$. Output: improved data $\{a_i\}$.
    Our approach:
    1. Apply MRF to find improved data $\{a_i\}$.
    2. Compare $\{a_i\}$ to the original data $\{b_i\}$.
    3. Label nodes with large values of $|a_i - b_i|$ as missing data.
    4. Apply IGH and obtain better improved data.
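Steps 2 and 3 amount to thresholding the residual $|a_i - b_i|$ and turning the flagged entries into holes that a missing-data method such as IGH can fill. In this sketch, the threshold value and the "improved" values `a` are hypothetical, because the slides leave both to the user.

```python
import numpy as np

def flag_suspect_entries(a, b, threshold):
    """Boolean mask of entries where the smoothed value a differs from the observed b by more than threshold."""
    return np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) > threshold

def to_missing(b, mask):
    """Copy of b with flagged entries replaced by NaN, ready for a missing-data method."""
    out = np.array(b, dtype=float)
    out[mask] = np.nan
    return out

b = np.array([1.0, 1.1, 0.9, 1.0, 5.0, 1.05])    # observed (noisy) data
a = np.array([1.4, 1.5, 1.5, 1.6, 2.1, 1.7])     # hypothetical MRF-improved values
mask = flag_suspect_entries(a, b, threshold=1.0)
print(to_missing(b, mask))
```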
Theoretical underpinnings / Reproducing kernel Hilbert spaces
  • 45–47. Under the hood: reproducing kernel Hilbert spaces
    Suppose $X \subseteq \mathbb{R}^n$ and $k : X \times X \to \mathbb{R}$ is
    - nonnegative: $k(x, y) \geq 0$;
    - symmetric: $k(x, y) = k(y, x)$;
    - positive semidefinite: for any choice of $\{x_i\}_{i=1}^{m}$, $K_{i,j} = k(x_i, x_j)$ defines a positive semidefinite matrix.
    [Aronszajn] There is a Hilbert space H of functions on X with
    - $k_x := k(x, \cdot) \in H$ for $x \in X$, and
    - $\langle k_x, f \rangle = f(x)$ (reproducing property).
    In the discrete case, H is the closure of $\{f = \sum_x a_x k_x : a_x \text{ scalars}\}$.
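Positive semidefiniteness is the condition that separates valid kernels from arbitrary similarity scores. The sketch below checks it numerically on random points using the smallest eigenvalue. The Gaussian kernel passes. The distance k(x, y) = ‖x − y‖ is nonnegative and symmetric but fails, since its Gram matrix has zero trace and hence a negative eigenvalue. The tolerance `tol` is an assumption for round-off.

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """True if the symmetric matrix M has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(M).min() >= -tol)

rng = np.random.default_rng(3)
pts = rng.normal(size=(12, 3))
sq = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
K_gauss = np.exp(-sq / 2.0)      # Gaussian kernel Gram matrix
K_dist = np.sqrt(sq)             # k(x, y) = ||x - y||: nonnegative and symmetric, but not PSD
print(is_psd(K_gauss), is_psd(K_dist))   # True False
```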
  • 48–49. Under the hood: reproducing kernel Hilbert spaces
    For $\Gamma \subseteq X$, the operator $K : L^2(\Gamma, \mu) \to H$ given by
    $$(Kf)(x) = \int_\Gamma k(x, y)\, f(y)\, d\mu(y), \qquad x \in X,$$
    turns out to have adjoint operator $K^* : H \to L^2(\Gamma, \mu)$ given by domain restriction:
    $$K^* g(y) = g(y), \qquad y \in \Gamma,\ g \in H.$$
    $K^* K$ is self-adjoint, positive, and compact, so its eigenvalues are discrete and nonnegative. Since $K^*$ is restriction, the eigenvalues can be found by diagonalizing k on Γ.
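In the discrete case, the integral becomes a sum weighted by μ, and both claims can be checked numerically. The first is the adjoint identity $\langle Kf, g\rangle_H = \langle f, K^* g\rangle_{L^2(\Gamma,\mu)}$. The second is that $K^*K$ has nonnegative eigenvalues. The sketch below represents H elements as coefficient vectors on $\{k_x\}$, with inner product $a^\top K c$. The points, the subset Γ, and the weights μ are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
pts = rng.normal(size=(10, 2))                         # the whole space X (10 points)
sq = np.sum((pts[:, None, :] - pts[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq)                                        # Gaussian kernel k on X x X
gamma = np.array([1, 4, 5, 8])                         # Gamma, a subset of X
mu = np.array([0.5, 1.0, 2.0, 0.25])                   # hypothetical weights mu on Gamma

f = rng.normal(size=gamma.size)                        # f in L^2(Gamma, mu)
a = np.zeros(len(pts))
a[gamma] = mu * f                                      # Kf = sum_{y in Gamma} f(y) mu(y) k_y, as coefficients
c = rng.normal(size=len(pts))                          # g = sum_x c_x k_x, an element of H

inner_H = a @ K @ c                                    # <Kf, g>_H, using <k_x, k_y> = k(x, y)
g_on_gamma = (K @ c)[gamma]                            # K* g = g restricted to Gamma
inner_L2 = np.sum(f * g_on_gamma * mu)                 # <f, K* g> in L^2(Gamma, mu)

# K*K acts on L^2(Gamma, mu) as K_Gamma @ diag(mu), which is similar to the symmetric
# matrix diag(sqrt(mu)) K_Gamma diag(sqrt(mu)); its eigenvalues are real and nonnegative.
K_gamma = K[np.ix_(gamma, gamma)]
s = np.sqrt(mu)
eigs = np.linalg.eigvalsh(s[:, None] * K_gamma * s[None, :])
print(np.isclose(inner_H, inner_L2), eigs.min() >= -1e-12)
```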