SlideShare a Scribd company logo
1 of 30
Download to read offline
Generalized Low Rank Models
Madeleine Udell
Center for the Mathematics of Information
Caltech
Based on joint work with Stephen Boyd, Anqi Fu, Corinne Horn, and
Reza Zadeh
H2O World 11/11/2014
1 / 29
Data table
age gender state income education · · ·
29 F CT $53,000 college · · ·
57 ? NY $19,000 high school · · ·
? M CA $102,000 masters · · ·
41 F NV $23,000 ? · · ·
...
...
...
...
...
detect demographic groups?
find typical responses?
identify similar states?
impute missing entries?
2 / 29
Data table
m examples (patients, respondents, households, assets)
n features (tests, questions, sensors, times)

 A

 =



A11 · · · A1n
...
...
...
Am1 · · · Amn



ith row of A is feature vector for ith example
jth column of A gives values for jth feature across all
examples
3 / 29
Low rank model
given: A ∈ Rm×n
, k m, n
find: X ∈ Rm×k
, Y ∈ Rk×n
for which

X

 Y ≈

 A


i.e., xi yj ≈ Aij , where

X

 =



—x1—
...
—xm—


 Y =


| |
y1 · · · yn
| |


interpretation:
X and Y are (compressed) representation of A
xT
i ∈ Rk
is a point associated with example i
yj ∈ Rk
is a point associated with feature j
inner product xi yj approximates Aij
4 / 29
Why use a low rank model?
reduce storage; speed transmission
understand (visualize, cluster)
remove noise
infer missing data
simplify data processing
5 / 29
Principal components analysis
PCA:
minimize A − XY 2
F = m
i=1
n
j=1(Aij − xi yj )2
with variables X ∈ Rm×k
, Y ∈ Rk×n
old roots [Pearson 1901, Hotelling 1933]
least squares low rank fitting
(analytical) solution via SVD of A = UΣV T :
X = UkΣ
1/2
k Y = Σ
1/2
k V T
k
(Not unique: (XT, T−1Y ) also a solution for T invertible.)
(numerical) solution via alternating minimization
6 / 29
Low rank models for gait analysis 1
time forehead (x) forehead (y) · · · right toe (y) right toe (z)
t1 1.4 2.7 · · · -0.5 -0.1
t2 2.7 3.5 · · · 1.3 0.9
t3 3.3 -.9 · · · 4.2 1.8
...
...
...
...
...
...
rows of Y are principal stances
rows of X decompose stance into combination of principal
stances
1
gait analysis demo: https://github.com/h2oai/h2o-3/blob/
master/h2o-r/demos/rdemo.glrm.walking.gait.R
7 / 29
Interpreting principal components
columns of A (features) (y coordinates over time)
����
� �� ��� ��� ���
���������������
��������������
���������������
���������������
��������������
���������������
�����������
���������������
�����������
������������
���������
���������
�������������
�������������
�����������
�����������
����������
����������
��������
�����
�
���
����
����
����
8 / 29
Interpreting principal components
columns of A (features) (z coordinates over time)
����
� �� ��� ���
���������������
��������������
���������������
���������������
��������������
���������������
�����������
���������������
�����������
������������
���������
���������
�������������
�������������
�����������
�����������
����������
����������
��������
�����
�
���
����
����
8 / 29
Interpreting principal components
row of Y
(archetypical example)
(principal stance)
�
������ ��������������
�������������
�������������
������������
�������������
�������������
������������
�������������
�������������
������������
�������������
�������������
������
�������������
���������
������ �������������
�������������
�������
����������
����������
�������
�������
�������
�������
�����������
����������������������
���������
�������� ��������
��������
�
���
����
����
����
9 / 29
Interpreting principal components
columns of X (archetypical features) (principal timeseries)
����
� �� ��� ���
��
��
��
��
��
��������
�����
��
��
�
�
�
10 / 29
Interpreting principal components
column of XY (red) (predicted feature)
column of A (blue) (observed feature)
����
� �� ��� ��� ���
�
��
��
�
�
�
������������
11 / 29
Generalized low rank model
minimize (i,j)∈Ω Lj (xi yj , Aij ) + m
i=1 ri (xi ) + n
j=1 ˜rj (yj )
loss functions Lj for each column
e.g., different losses for reals, booleans, categoricals,
ordinals, . . .
regularizers r : R1×k
→ R, ˜r : Rk
→ R
observe only (i, j) ∈ Ω (other entries are missing)
12 / 29
Matrix completion
observe Aij only for (i, j) ∈ Ω ⊂ {1, . . . , m} × {1, . . . , n}
minimize (i,j)∈Ω(Aij − xi yj )2 + m
i=1 xi
2
2 + n
j=1 yj
2
2
two regimes:
some entries missing: don’t waste data; “borrow
strength” from entries that are not missing
most entries missing: matrix completion still works!
13 / 29
Regularizers
minimize (i,j)∈Ω Lj (xi yj , Aij ) + m
i=1 ri (xi ) + n
j=1 ˜rj (yj )
choose regularizers r, ˜r to impose structure:
structure r(x) ˜r(y)
small x 2
2 y 2
2
sparse x 1 y 1
nonnegative 1(x ≥ 0) 1(y ≥ 0)
clustered 1(card(x) = 1) 0
14 / 29
Losses
minimize (i,j)∈Ω Lj (xi yj , Aij ) + m
i=1 ri (xi ) + n
j=1 ˜rj (yj )
choose loss L(u, a) adapted to data type:
data type loss L(u, a)
real quadratic (u − a)2
real absolute value |u − a|
real huber huber(u − a)
boolean hinge (1 − ua)+
boolean logistic log(1 + exp(−au))
integer poisson exp(u) − au + a log a − a
ordinal ordinal hinge a−1
a =1(1 − u + a )++
d
a =a+1(1 + u − a )+
categorical one-vs-all (1 − ua)+ + a =a(1 + ua )+
15 / 29
Examples
variations on GLRMs recover many known models:
Model Lj(u, a) r(x) ˜r(y) reference
PCA (u − a)2
0 0 [Pearson 1901]
matrix completion (u − a)2
x 2
2 y 2
2 [Keshavan 2010]
NNMF (u − a)2
1(x ≥ 0) 1(y ≥ 0) [Lee 1999]
sparse PCA (u − a)2
x 1 y 1 [D’Aspremont 20
sparse coding (u − a)2
x 1 y 2
2 [Olshausen 1997]
k-means (u − a)2
1(card(x) = 1) 0 [Tropp 2004]
robust PCA |u − a| x 2
2 y 2
2 [Candes 2011]
logistic PCA log(1 + exp(−au)) x 2
2 y 2
2 [Collins 2001]
boolean PCA (1 − au)+ x 2
2 y 2
2 [Srebro 2004]
16 / 29
Impute heterogeneous data
PCA:
mixed data types
−12
−9
−6
−3
0
3
6
9
12
remove entries
−12
−9
−6
−3
0
3
6
9
12
pca rank 10 recovery
−12
−9
−6
−3
0
3
6
9
12
error
−3.0
−2.4
−1.8
−1.2
−0.6
0.0
0.6
1.2
1.8
2.4
3.0
GLRM:
mixed data types
−12
−9
−6
−3
0
3
6
9
12
remove entries
−12
−9
−6
−3
0
3
6
9
12
glrm rank 10 recovery
−12
−9
−6
−3
0
3
6
9
12
error
−3.0
−2.4
−1.8
−1.2
−0.6
0.0
0.6
1.2
1.8
2.4
3.0
17 / 29
Validate model
minimize (i,j)∈Ω Lij (Aij , xi yj ) + m
i=1 γri (xi ) + n
j=1 γ˜rj (yj )
How to choose model parameters (k, γ)?
Leave out 10% of entries, and use model to predict them
0 1 2 3 4 5
0.2
0.4
0.6
0.8
1
γ
normalizedtesterror
k=1
k=2
k=3
k=4
k=5
18 / 29
American community survey
2013 ACS:
3M respondents, 87 economic/demographic survey
questions
income
cost of utilities (water, gas, electric)
weeks worked per year
hours worked per week
home ownership
looking for work
use foodstamps
education level
state of residence
. . .
1/3 of responses missing
19 / 29
Using a GLRM for exploratory data analysis


| |
y1 · · · yn
| |


age gender state · · ·
29 F CT · · ·
57 ? NY · · ·
? M CA · · ·
41 F NV · · ·
...
...
...
≈



—x1—
...
—xm—



cluster respondents? cluster rows of X
demographic profiles? rows of Y
which features are similar? cluster columns of Y
impute missing entries? argmina Lj (xi yj , a)
20 / 29
Fitting a GLRM to the ACS
construct a rank 10 GLRM with loss functions respecting
data types
huber for real values
hinge loss for booleans
ordinal hinge loss for ordinals
one-vs-all hinge loss for categoricals
scale losses and regularizers
fit the GLRM
21 / 29
American community survey
most similar features (in demography space):
Alaska: Montana, North Dakota
California: Illinois, cost of water
Colorado: Oregon, Idaho
Ohio: Indiana, Michigan
Pennsylvania: Massachusetts, New Jersey
Virginia: Maryland, Connecticut
Hours worked: weeks worked, education
22 / 29
Low rank models for dimensionality reduction 2
U.S. Wage & Hour Division (WHD) compliance actions:
company # employees zip violations · · ·
h2o.ai 58 95050 0 · · ·
stanford 8300 94305 0 · · ·
caltech 741 91107 0 · · ·
...
...
...
...
208,806 rows (cases) × 252 columns (violation info)
32,989 zip codes. . .
2
labor law violation demo: https://github.com/h2oai/h2o-3/blob/
master/h2o-r/demos/rdemo.census.labor.violations.large.R
23 / 29
Low rank models for dimensionality reduction
ACS demographic data:
zip unemployment mean income · · ·
94305 12% $47,000 · · ·
06511 19% $32,000 · · ·
60647 23% $23,000 · · ·
94121 4% $178,000 · · ·
...
...
...
32,989 rows (zip codes) × 150 columns (demographic info)
GLRM embeds zip codes into (low dimensional)
demography space
24 / 29
Low rank models for dimensionality reduction
Zip code features:
25 / 29
Low rank models for dimensionality reduction
build 3 sets of features to predict violations:
categorical: expand zip code to categorical variable
concatenate: join tables on zip
GLRM: replace zip code by low dimensional zip code
features
fit a supervised (deep learning) model:
method train error test error runtime
categorical 0.2091690 0.2173612 23.7600000
concatenate 0.2258872 0.2515906 4.4700000
GLRM 0.1790884 0.1933637 4.3600000
26 / 29
Fitting GLRMs with alternating minimization
minimize (i,j)∈Ω Lj (xi yj , Aij ) + m
i=1 ri (xi ) + n
j=1 ˜rj (yj )
repeat:
1. minimize objective over xi (in parallel)
2. minimize objective over yj (in parallel)
properties:
subproblems easy to solve
objective decreases at every step, so converges if losses and
regularizers are bounded below
(not guaranteed to find global solution, but) usually finds
good model in practice
naturally parallel, so scales to huge problems
27 / 29
A simple, fast update rule
proximal gradient method: let
g =
j:(i,j)∈Ω
Lj (xi yj , Aij )yj
and update
xt+1
i = proxαt r (xt
i − αtg)
(where proxf (z) = argminx (f (x) + 1
2
x − z 2
2))
simple: only requires ability to evaluate L and proxr
stochastic variant: use noisy estimate for g
time per iteration: O((n+m+|Ω|)k
p ) on p processors
Implementations available in Python (serial), Julia (shared
memory parallel), Spark (parallel distributed), and H2O (parallel
distributed).
28 / 29
Conclusion
generalized low rank models
find structure in data automatically
can handle huge, heterogeneous data coherently
transform big messy data into small clean data
paper:
http://arxiv.org/abs/1410.0342
H2O:
https://github.com/h2oai/h2o-world-2015-training/
blob/master/tutorials/glrm/glrm-tutorial.md
julia:
https://github.com/madeleineudell/LowRankModels.jl
29 / 29

More Related Content

What's hot

A direct method for estimating linear non-Gaussian acyclic models
A direct method for estimating linear non-Gaussian acyclic modelsA direct method for estimating linear non-Gaussian acyclic models
A direct method for estimating linear non-Gaussian acyclic modelsShiga University, RIKEN
 
Deep generative model.pdf
Deep generative model.pdfDeep generative model.pdf
Deep generative model.pdfHyungjoo Cho
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEHigh Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEKai-Wen Zhao
 
Clustering training
Clustering trainingClustering training
Clustering trainingGabor Veress
 

What's hot (7)

pet ct.pptx
pet ct.pptxpet ct.pptx
pet ct.pptx
 
Seher başaran
Seher başaranSeher başaran
Seher başaran
 
A direct method for estimating linear non-Gaussian acyclic models
A direct method for estimating linear non-Gaussian acyclic modelsA direct method for estimating linear non-Gaussian acyclic models
A direct method for estimating linear non-Gaussian acyclic models
 
Deep generative model.pdf
Deep generative model.pdfDeep generative model.pdf
Deep generative model.pdf
 
High Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNEHigh Dimensional Data Visualization using t-SNE
High Dimensional Data Visualization using t-SNE
 
Clustering training
Clustering trainingClustering training
Clustering training
 
Curse of Dimensionality and Big Data
Curse of Dimensionality and Big DataCurse of Dimensionality and Big Data
Curse of Dimensionality and Big Data
 

Viewers also liked

H2O World - GLRM - Anqi Fu
H2O World - GLRM - Anqi FuH2O World - GLRM - Anqi Fu
H2O World - GLRM - Anqi FuSri Ambati
 
H2O World - Cancer Detection via the Lasso - Rob Tibshirani
H2O World - Cancer Detection via the Lasso - Rob TibshiraniH2O World - Cancer Detection via the Lasso - Rob Tibshirani
H2O World - Cancer Detection via the Lasso - Rob TibshiraniSri Ambati
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandrySri Ambati
 
Sparkling Water Meetup: Deep Learning for Public Safety
Sparkling Water Meetup: Deep Learning for Public SafetySparkling Water Meetup: Deep Learning for Public Safety
Sparkling Water Meetup: Deep Learning for Public SafetySri Ambati
 
Distributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupDistributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupSri Ambati
 
Python and H2O with Cliff Click at PyData Dallas 2015
Python and H2O with Cliff Click at PyData Dallas 2015Python and H2O with Cliff Click at PyData Dallas 2015
Python and H2O with Cliff Click at PyData Dallas 2015Sri Ambati
 
Data Science Competition
Data Science CompetitionData Science Competition
Data Science CompetitionJeong-Yoon Lee
 
H2O World - Welcome to H2O World with Arno Candel
H2O World - Welcome to H2O World with Arno CandelH2O World - Welcome to H2O World with Arno Candel
H2O World - Welcome to H2O World with Arno CandelSri Ambati
 
Machine Learning for the Sensored Internet of Things
Machine Learning for the Sensored Internet of ThingsMachine Learning for the Sensored Internet of Things
Machine Learning for the Sensored Internet of ThingsSri Ambati
 
H2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SF
H2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SFH2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SF
H2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SFSri Ambati
 
H2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
H2O World - Migrating from Proprietary Analytics Software - Fonda IngramH2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
H2O World - Migrating from Proprietary Analytics Software - Fonda IngramSri Ambati
 
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum ShachamH2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum ShachamSri Ambati
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsSri Ambati
 
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...Sri Ambati
 
Intro to Machine Learning with H2O and Python - Denver
Intro to Machine Learning with H2O and Python - DenverIntro to Machine Learning with H2O and Python - Denver
Intro to Machine Learning with H2O and Python - DenverSri Ambati
 
H2O World - GLM - Tomas Nykodym
H2O World - GLM - Tomas NykodymH2O World - GLM - Tomas Nykodym
H2O World - GLM - Tomas NykodymSri Ambati
 
H2O World - NCS Continuous Media Optimization w/H2O - Satya Satyamoorthy
H2O World - NCS Continuous Media Optimization w/H2O - Satya SatyamoorthyH2O World - NCS Continuous Media Optimization w/H2O - Satya Satyamoorthy
H2O World - NCS Continuous Media Optimization w/H2O - Satya SatyamoorthySri Ambati
 
H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...
H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...
H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...Sri Ambati
 
2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_on2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_onSri Ambati
 
H2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno CandelH2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno CandelSri Ambati
 

Viewers also liked (20)

H2O World - GLRM - Anqi Fu
H2O World - GLRM - Anqi FuH2O World - GLRM - Anqi Fu
H2O World - GLRM - Anqi Fu
 
H2O World - Cancer Detection via the Lasso - Rob Tibshirani
H2O World - Cancer Detection via the Lasso - Rob TibshiraniH2O World - Cancer Detection via the Lasso - Rob Tibshirani
H2O World - Cancer Detection via the Lasso - Rob Tibshirani
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark Landry
 
Sparkling Water Meetup: Deep Learning for Public Safety
Sparkling Water Meetup: Deep Learning for Public SafetySparkling Water Meetup: Deep Learning for Public Safety
Sparkling Water Meetup: Deep Learning for Public Safety
 
Distributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupDistributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta Meetup
 
Python and H2O with Cliff Click at PyData Dallas 2015
Python and H2O with Cliff Click at PyData Dallas 2015Python and H2O with Cliff Click at PyData Dallas 2015
Python and H2O with Cliff Click at PyData Dallas 2015
 
Data Science Competition
Data Science CompetitionData Science Competition
Data Science Competition
 
H2O World - Welcome to H2O World with Arno Candel
H2O World - Welcome to H2O World with Arno CandelH2O World - Welcome to H2O World with Arno Candel
H2O World - Welcome to H2O World with Arno Candel
 
Machine Learning for the Sensored Internet of Things
Machine Learning for the Sensored Internet of ThingsMachine Learning for the Sensored Internet of Things
Machine Learning for the Sensored Internet of Things
 
H2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SF
H2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SFH2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SF
H2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SF
 
H2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
H2O World - Migrating from Proprietary Analytics Software - Fonda IngramH2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
H2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
 
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum ShachamH2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
H2O World - Quora: Machine Learning Algorithms to Grow the World's Knowledge ...
 
Intro to Machine Learning with H2O and Python - Denver
Intro to Machine Learning with H2O and Python - DenverIntro to Machine Learning with H2O and Python - Denver
Intro to Machine Learning with H2O and Python - Denver
 
H2O World - GLM - Tomas Nykodym
H2O World - GLM - Tomas NykodymH2O World - GLM - Tomas Nykodym
H2O World - GLM - Tomas Nykodym
 
H2O World - NCS Continuous Media Optimization w/H2O - Satya Satyamoorthy
H2O World - NCS Continuous Media Optimization w/H2O - Satya SatyamoorthyH2O World - NCS Continuous Media Optimization w/H2O - Satya Satyamoorthy
H2O World - NCS Continuous Media Optimization w/H2O - Satya Satyamoorthy
 
H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...
H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...
H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...
 
2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_on2014 09 30_sparkling_water_hands_on
2014 09 30_sparkling_water_hands_on
 
H2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno CandelH2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno Candel
 

Similar to H2O World - Generalized Low Rank Models - Madeleine Udell

Inria Tech Talk - La classification de données complexes avec MASSICCC
Inria Tech Talk - La classification de données complexes avec MASSICCCInria Tech Talk - La classification de données complexes avec MASSICCC
Inria Tech Talk - La classification de données complexes avec MASSICCCStéphanie Roger
 
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydH2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydSri Ambati
 
Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Alexander Litvinenko
 
Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Frank Nielsen
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodFrank Nielsen
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classificationSung Yub Kim
 
Maneuvering target track prediction model
Maneuvering target track prediction modelManeuvering target track prediction model
Maneuvering target track prediction modelIJCI JOURNAL
 
Calibrating the Lee-Carter and the Poisson Lee-Carter models via Neural Netw...
Calibrating the Lee-Carter and the Poisson Lee-Carter models  via Neural Netw...Calibrating the Lee-Carter and the Poisson Lee-Carter models  via Neural Netw...
Calibrating the Lee-Carter and the Poisson Lee-Carter models via Neural Netw...Salvatore Scognamiglio
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsGilles Louppe
 
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...Michael Lie
 
Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Fabian Pedregosa
 
Generalized Low Rank Models
Generalized Low Rank ModelsGeneralized Low Rank Models
Generalized Low Rank ModelsSri Ambati
 
ISI MSQE Entrance Question Paper (2008)
ISI MSQE Entrance Question Paper (2008)ISI MSQE Entrance Question Paper (2008)
ISI MSQE Entrance Question Paper (2008)CrackDSE
 
Conditional Random Fields
Conditional Random FieldsConditional Random Fields
Conditional Random Fieldslswing
 

Similar to H2O World - Generalized Low Rank Models - Madeleine Udell (20)

Inria Tech Talk - La classification de données complexes avec MASSICCC
Inria Tech Talk - La classification de données complexes avec MASSICCCInria Tech Talk - La classification de données complexes avec MASSICCC
Inria Tech Talk - La classification de données complexes avec MASSICCC
 
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen BoydH2O World - Consensus Optimization and Machine Learning - Stephen Boyd
H2O World - Consensus Optimization and Machine Learning - Stephen Boyd
 
Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics Tucker tensor analysis of Matern functions in spatial statistics
Tucker tensor analysis of Matern functions in spatial statistics
 
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
MUMS Opening Workshop - An Overview of Reduced-Order Models and Emulators (ED...
 
Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...
 
On learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihoodOn learning statistical mixtures maximizing the complete likelihood
On learning statistical mixtures maximizing the complete likelihood
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
 
MCQMC 2016 Tutorial
MCQMC 2016 TutorialMCQMC 2016 Tutorial
MCQMC 2016 Tutorial
 
Maneuvering target track prediction model
Maneuvering target track prediction modelManeuvering target track prediction model
Maneuvering target track prediction model
 
Calibrating the Lee-Carter and the Poisson Lee-Carter models via Neural Netw...
Calibrating the Lee-Carter and the Poisson Lee-Carter models  via Neural Netw...Calibrating the Lee-Carter and the Poisson Lee-Carter models  via Neural Netw...
Calibrating the Lee-Carter and the Poisson Lee-Carter models via Neural Netw...
 
Seminar psu 20.10.2013
Seminar psu 20.10.2013Seminar psu 20.10.2013
Seminar psu 20.10.2013
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptions
 
ML unit-1.pptx
ML unit-1.pptxML unit-1.pptx
ML unit-1.pptx
 
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
Time-Series Analysis on Multiperiodic Conditional Correlation by Sparse Covar...
 
Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4
 
Generalized Low Rank Models
Generalized Low Rank ModelsGeneralized Low Rank Models
Generalized Low Rank Models
 
AINL 2016: Strijov
AINL 2016: StrijovAINL 2016: Strijov
AINL 2016: Strijov
 
ISI MSQE Entrance Question Paper (2008)
ISI MSQE Entrance Question Paper (2008)ISI MSQE Entrance Question Paper (2008)
ISI MSQE Entrance Question Paper (2008)
 
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
CLIM Fall 2017 Course: Statistics for Climate Research, Spatial Data: Models ...
 
Conditional Random Fields
Conditional Random FieldsConditional Random Fields
Conditional Random Fields
 

More from Sri Ambati

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thSri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMsSri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the WaySri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OSri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersSri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email AgainSri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Recently uploaded

Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfIdiosysTechnologies1
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Velvetech LLC
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 

Recently uploaded (20)

Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdf
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...Software Project Health Check: Best Practices and Techniques for Your Product...
Software Project Health Check: Best Practices and Techniques for Your Product...
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 

H2O World - Generalized Low Rank Models - Madeleine Udell

  • 1. Generalized Low Rank Models Madeleine Udell Center for the Mathematics of Information Caltech Based on joint work with Stephen Boyd, Anqi Fu, Corinne Horn, and Reza Zadeh H2O World 11/11/2014 1 / 29
  • 2. Data table age gender state income education · · · 29 F CT $53,000 college · · · 57 ? NY $19,000 high school · · · ? M CA $102,000 masters · · · 41 F NV $23,000 ? · · · ... ... ... ... ... detect demographic groups? find typical responses? identify similar states? impute missing entries? 2 / 29
  • 3. Data table m examples (patients, respondents, households, assets) n features (tests, questions, sensors, times)   A   =    A11 · · · A1n ... ... ... Am1 · · · Amn    ith row of A is feature vector for ith example jth column of A gives values for jth feature across all examples 3 / 29
  • 4. Low rank model given: A ∈ Rm×n , k m, n find: X ∈ Rm×k , Y ∈ Rk×n for which  X   Y ≈   A   i.e., xi yj ≈ Aij , where  X   =    —x1— ... —xm—    Y =   | | y1 · · · yn | |   interpretation: X and Y are (compressed) representation of A xT i ∈ Rk is a point associated with example i yj ∈ Rk is a point associated with feature j inner product xi yj approximates Aij 4 / 29
  • 5. Why use a low rank model? reduce storage; speed transmission understand (visualize, cluster) remove noise infer missing data simplify data processing 5 / 29
  • 6. Principal components analysis PCA: minimize A − XY 2 F = m i=1 n j=1(Aij − xi yj )2 with variables X ∈ Rm×k , Y ∈ Rk×n old roots [Pearson 1901, Hotelling 1933] least squares low rank fitting (analytical) solution via SVD of A = UΣV T : X = UkΣ 1/2 k Y = Σ 1/2 k V T k (Not unique: (XT, T−1Y ) also a solution for T invertible.) (numerical) solution via alternating minimization 6 / 29
  • 7. Low rank models for gait analysis 1 time forehead (x) forehead (y) · · · right toe (y) right toe (z) t1 1.4 2.7 · · · -0.5 -0.1 t2 2.7 3.5 · · · 1.3 0.9 t3 3.3 -.9 · · · 4.2 1.8 ... ... ... ... ... ... rows of Y are principal stances rows of X decompose stance into combination of principal stances 1 gait analysis demo: https://github.com/h2oai/h2o-3/blob/ master/h2o-r/demos/rdemo.glrm.walking.gait.R 7 / 29
  • 8. Interpreting principal components columns of A (features) (y coordinates over time) ���� � �� ��� ��� ��� ��������������� �������������� ��������������� ��������������� �������������� ��������������� ����������� ��������������� ����������� ������������ ��������� ��������� ������������� ������������� ����������� ����������� ���������� ���������� �������� ����� � ��� ���� ���� ���� 8 / 29
  • 9. Interpreting principal components columns of A (features) (z coordinates over time) ���� � �� ��� ��� ��������������� �������������� ��������������� ��������������� �������������� ��������������� ����������� ��������������� ����������� ������������ ��������� ��������� ������������� ������������� ����������� ����������� ���������� ���������� �������� ����� � ��� ���� ���� 8 / 29
  • 10. Interpreting principal components row of Y (archetypical example) (principal stance) � ������ �������������� ������������� ������������� ������������ ������������� ������������� ������������ ������������� ������������� ������������ ������������� ������������� ������ ������������� ��������� ������ ������������� ������������� ������� ���������� ���������� ������� ������� ������� ������� ����������� ���������������������� ��������� �������� �������� �������� � ��� ���� ���� ���� 9 / 29
  • 11. Interpreting principal components columns of X (archetypical features) (principal timeseries) ���� � �� ��� ��� �� �� �� �� �� �������� ����� �� �� � � � 10 / 29
  • 12. Interpreting principal components column of XY (red) (predicted feature) column of A (blue) (observed feature) ���� � �� ��� ��� ��� � �� �� � � � ������������ 11 / 29
  • 13. Generalized low rank model minimize (i,j)∈Ω Lj (xi yj , Aij ) + m i=1 ri (xi ) + n j=1 ˜rj (yj ) loss functions Lj for each column e.g., different losses for reals, booleans, categoricals, ordinals, . . . regularizers r : R1×k → R, ˜r : Rk → R observe only (i, j) ∈ Ω (other entries are missing) 12 / 29
  • 14. Matrix completion observe Aij only for (i, j) ∈ Ω ⊂ {1, . . . , m} × {1, . . . , n} minimize (i,j)∈Ω(Aij − xi yj )2 + m i=1 xi 2 2 + n j=1 yj 2 2 two regimes: some entries missing: don’t waste data; “borrow strength” from entries that are not missing most entries missing: matrix completion still works! 13 / 29
  • 15. Regularizers minimize (i,j)∈Ω Lj (xi yj , Aij ) + m i=1 ri (xi ) + n j=1 ˜rj (yj ) choose regularizers r, ˜r to impose structure: structure r(x) ˜r(y) small x 2 2 y 2 2 sparse x 1 y 1 nonnegative 1(x ≥ 0) 1(y ≥ 0) clustered 1(card(x) = 1) 0 14 / 29
  • 16. Losses minimize (i,j)∈Ω Lj (xi yj , Aij ) + m i=1 ri (xi ) + n j=1 ˜rj (yj ) choose loss L(u, a) adapted to data type: data type loss L(u, a) real quadratic (u − a)2 real absolute value |u − a| real huber huber(u − a) boolean hinge (1 − ua)+ boolean logistic log(1 + exp(−au)) integer poisson exp(u) − au + a log a − a ordinal ordinal hinge a−1 a =1(1 − u + a )++ d a =a+1(1 + u − a )+ categorical one-vs-all (1 − ua)+ + a =a(1 + ua )+ 15 / 29
  • 17. Examples variations on GLRMs recover many known models: Model Lj(u, a) r(x) ˜r(y) reference PCA (u − a)2 0 0 [Pearson 1901] matrix completion (u − a)2 x 2 2 y 2 2 [Keshavan 2010] NNMF (u − a)2 1(x ≥ 0) 1(y ≥ 0) [Lee 1999] sparse PCA (u − a)2 x 1 y 1 [D’Aspremont 20 sparse coding (u − a)2 x 1 y 2 2 [Olshausen 1997] k-means (u − a)2 1(card(x) = 1) 0 [Tropp 2004] robust PCA |u − a| x 2 2 y 2 2 [Candes 2011] logistic PCA log(1 + exp(−au)) x 2 2 y 2 2 [Collins 2001] boolean PCA (1 − au)+ x 2 2 y 2 2 [Srebro 2004] 16 / 29
  • 18. Impute heterogeneous data PCA: mixed data types −12 −9 −6 −3 0 3 6 9 12 remove entries −12 −9 −6 −3 0 3 6 9 12 pca rank 10 recovery −12 −9 −6 −3 0 3 6 9 12 error −3.0 −2.4 −1.8 −1.2 −0.6 0.0 0.6 1.2 1.8 2.4 3.0 GLRM: mixed data types −12 −9 −6 −3 0 3 6 9 12 remove entries −12 −9 −6 −3 0 3 6 9 12 glrm rank 10 recovery −12 −9 −6 −3 0 3 6 9 12 error −3.0 −2.4 −1.8 −1.2 −0.6 0.0 0.6 1.2 1.8 2.4 3.0 17 / 29
  • 19. Validate model minimize (i,j)∈Ω Lij (Aij , xi yj ) + m i=1 γri (xi ) + n j=1 γ˜rj (yj ) How to choose model parameters (k, γ)? Leave out 10% of entries, and use model to predict them 0 1 2 3 4 5 0.2 0.4 0.6 0.8 1 γ normalizedtesterror k=1 k=2 k=3 k=4 k=5 18 / 29
  • 20. American community survey 2013 ACS: 3M respondents, 87 economic/demographic survey questions income cost of utilities (water, gas, electric) weeks worked per year hours worked per week home ownership looking for work use foodstamps education level state of residence . . . 1/3 of responses missing 19 / 29
  • 21. Using a GLRM for exploratory data analysis   | | y1 · · · yn | |   age gender state · · · 29 F CT · · · 57 ? NY · · · ? M CA · · · 41 F NV · · · ... ... ... ≈    —x1— ... —xm—    cluster respondents? cluster rows of X demographic profiles? rows of Y which features are similar? cluster columns of Y impute missing entries? argmina Lj (xi yj , a) 20 / 29
  • 22. Fitting a GLRM to the ACS construct a rank 10 GLRM with loss functions respecting data types huber for real values hinge loss for booleans ordinal hinge loss for ordinals one-vs-all hinge loss for categoricals scale losses and regularizers fit the GLRM 21 / 29
  • 23. American community survey most similar features (in demography space): Alaska: Montana, North Dakota California: Illinois, cost of water Colorado: Oregon, Idaho Ohio: Indiana, Michigan Pennsylvania: Massachusetts, New Jersey Virginia: Maryland, Connecticut Hours worked: weeks worked, education 22 / 29
  • 24. Low rank models for dimensionality reduction 2 U.S. Wage & Hour Division (WHD) compliance actions: company # employees zip violations · · · h2o.ai 58 95050 0 · · · stanford 8300 94305 0 · · · caltech 741 91107 0 · · · ... ... ... ... 208,806 rows (cases) × 252 columns (violation info) 32,989 zip codes. . . 2 labor law violation demo: https://github.com/h2oai/h2o-3/blob/ master/h2o-r/demos/rdemo.census.labor.violations.large.R 23 / 29
  • 25. Low rank models for dimensionality reduction ACS demographic data: zip unemployment mean income · · · 94305 12% $47,000 · · · 06511 19% $32,000 · · · 60647 23% $23,000 · · · 94121 4% $178,000 · · · ... ... ... 32,989 rows (zip codes) × 150 columns (demographic info) GLRM embeds zip codes into (low dimensional) demography space 24 / 29
  • 26. Low rank models for dimensionality reduction Zip code features: 25 / 29
  • 27. Low rank models for dimensionality reduction build 3 sets of features to predict violations: categorical: expand zip code to categorical variable concatenate: join tables on zip GLRM: replace zip code by low dimensional zip code features fit a supervised (deep learning) model: method train error test error runtime categorical 0.2091690 0.2173612 23.7600000 concatenate 0.2258872 0.2515906 4.4700000 GLRM 0.1790884 0.1933637 4.3600000 26 / 29
  • 28. Fitting GLRMs with alternating minimization minimize (i,j)∈Ω Lj (xi yj , Aij ) + m i=1 ri (xi ) + n j=1 ˜rj (yj ) repeat: 1. minimize objective over xi (in parallel) 2. minimize objective over yj (in parallel) properties: subproblems easy to solve objective decreases at every step, so converges if losses and regularizers are bounded below (not guaranteed to find global solution, but) usually finds good model in practice naturally parallel, so scales to huge problems 27 / 29
  • 29. A simple, fast update rule proximal gradient method: let g = j:(i,j)∈Ω Lj (xi yj , Aij )yj and update xt+1 i = proxαt r (xt i − αtg) (where proxf (z) = argminx (f (x) + 1 2 x − z 2 2)) simple: only requires ability to evaluate L and proxr stochastic variant: use noisy estimate for g time per iteration: O((n+m+|Ω|)k p ) on p processors Implementations available in Python (serial), Julia (shared memory parallel), Spark (parallel distributed), and H2O (parallel distributed). 28 / 29
  • 30. Conclusion generalized low rank models find structure in data automatically can handle huge, heterogeneous data coherently transform big messy data into small clean data paper: http://arxiv.org/abs/1410.0342 H2O: https://github.com/h2oai/h2o-world-2015-training/ blob/master/tutorials/glrm/glrm-tutorial.md julia: https://github.com/madeleineudell/LowRankModels.jl 29 / 29