Using Feature Grouping as a
Stochastic Regularizer for
High Dimensional Noisy Data
Sergül Aydöre
Assistant Professor
Electrical and Computer Engineering
Stevens Institute of Technology
Landscape of Machine Learning Applications
https://research.hubspot.com/charts/simplified-ai-landscape
• Data are high dimensional and noisy, and the sample size is small, as in neuroimaging
But what if…
[Images: PET acquisition process (Wikipedia); implantation of intracranial electrodes (Cleveland Epilepsy Clinic); an elastic EEG cap with 60 electrodes [Bai 2012]; a typical MEG system [BML 2001]; MRI scanner and rs-fMRI time-series acquisition (NVIDIA)]
Other High-Dimensional, Noisy Data and Small-Sample-Size Situations
[Images: Genomics (Integrative Genomics Viewer, 2012); Seismology (https://www.mapnagroup.com); Astronomy (Astronomy Magazine, 2015)]
Challenges
1. High dimensionality of the data due to rich temporal and spatial structure
2. Noise in the data due to mechanical or physical artifacts
3. Difficulty and cost of data collection
Overfitting
• ML models with a large number of parameters require a large amount of data; otherwise, overfitting can occur!
http://scott.fortmann-roe.com/docs/MeasuringError.html
Regularization Methods to Overcome Overfitting
• Early Stopping [Yao 2007]
• Ridge Regression (ℓ2 regularization) [Hoerl & Kennard 1970]
• Least Absolute Shrinkage and Selection Operator (LASSO, ℓ1 regularization) [Tibshirani 1996] → SPARSITY
• Dropout [Srivastava 2014] → STOCHASTICITY
• Group Lasso [Yuan & Lin 2006] → STRUCTURE & SPARSITY
• PROPOSED: STRUCTURE & STOCHASTICITY
Problem Setting: Supervised Learning
• Training samples: $\{(x_i, y_i)\}_{i=1}^{n}$ drawn i.i.d. from a distribution $\mathcal{D}$
• Parameters of the model are estimated by minimizing the empirical risk:
$$\hat{\theta} = \arg\min_{\theta} \frac{1}{n} \sum_{i=1}^{n} \ell\big(f_{\theta}(x_i), y_i\big)$$
where $\ell$ is the loss per sample.
Multinomial Logistic Regression
• The class-label probability of a given input $x \in \mathbb{R}^p$ is:
$$p(y = c \mid x) = \frac{\exp(w_c^\top x + b_c)}{\sum_{j=1}^{C} \exp(w_j^\top x + b_j)}$$
• Hence, the parameter space is $\theta = (W, b)$ with $W \in \mathbb{R}^{p \times C}$ and $b \in \mathbb{R}^{C}$.
• The loss per sample is the negative log-likelihood (cross-entropy): $\ell(x_i, y_i; \theta) = -\log p(y_i \mid x_i)$.
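As a concrete illustration, here is a minimal NumPy sketch of these quantities (the names and shapes are our own, not from the slides):

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll_loss(X, y, W, b):
    """Mean cross-entropy (negative log-likelihood) of multinomial logistic
    regression. X: (n, p) inputs, y: (n,) integer labels, W: (p, C), b: (C,)."""
    P = softmax(X @ W + b)
    return -np.log(P[np.arange(len(y)), y]).mean()
```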
Dropout
• Randomly removes units in the network during training.
• Idea: prevents units from co-adapting too much.
• Attractive property: can be used inside stochastic gradient descent without additional computation cost.
[Srivastava 2014]
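A minimal sketch of the feature-level dropout mask (we assume the usual inverted-dropout scaling, so no rescaling is needed at test time):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p_drop=0.5, training=True):
    """Zero each entry with probability p_drop and rescale the survivors by
    1 / (1 - p_drop)."""
    if not training:
        return x
    mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
    return x * mask
```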
Dropout for Multinomial Logistic Regression
[Diagram: a set of feature-dropout matrices; at each training step one mask matrix is picked at random and applied to the input during forward propagation, and the same mask is used during back propagation. The output classes are the identities PERSON A, PERSON B, …, PERSON X, PERSON Y, PERSON Z.]
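Putting the pieces together, here is a sketch of one SGD step of dropout-regularized multinomial logistic regression (our own minimal implementation, reusing `softmax()` from the earlier sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_step_dropout(W, b, X, y, lr=0.1, p_drop=0.5):
    """One SGD step of multinomial logistic regression with feature dropout.
    Uses softmax() from the earlier sketch."""
    n, p = X.shape
    mask = (rng.random(p) >= p_drop) / (1.0 - p_drop)  # one feature mask per step
    Xm = X * mask                      # forward propagation on masked features
    P = softmax(Xm @ W + b)
    P[np.arange(n), y] -= 1.0          # dL/dlogits = P - onehot(y)
    W -= lr * (Xm.T @ P) / n           # back propagation flows through the same mask
    b -= lr * P.mean(axis=0)           # the bias gradient is unmasked
    return W, b
```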
Replace Masking with Structured Matrices
[Diagram: the dropout masks are replaced by a set of structured projection matrices; one is picked at random per training step and used in both forward and back propagation, with the same PERSON A … PERSON Z output classes.]
Replace Masking with Structured Matrices
• Each projection matrix $\Phi$ is generated from random samples (size $r$) drawn with replacement from the training data set (size $n$); a sketch of this sampling appears after the ReNA slide below.
Replace Masking with Structured Matrices
• We project the training samples onto a lower-dimensional space as $\Phi x$, so that $\Phi^\top \Phi x$ approximates $x$. Hence, the weight matrix becomes $W_\Phi = \Phi W$.
Replace Masking with Structured Matrices
• To update $W$, we project the gradients back to the original space: $\nabla_W \ell = \Phi^\top \nabla_{W_\Phi} \ell$.
Replace Masking with Structured Matrices
• No projection is necessary for the bias term.
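A minimal end-to-end sketch of this training step (our own code; `grouping_matrix` builds an orthonormal feature-grouping projection from cluster labels, an assumption consistent with the piecewise-constant $\Phi$ described in the editor's notes, and `softmax()` is reused from the earlier sketch):

```python
import numpy as np

def grouping_matrix(labels, k):
    """Feature-grouping matrix Phi (k x p): row q pools the features of cluster
    q, scaled so the rows are orthonormal (Phi @ Phi.T = I_k).
    labels: (p,) integer cluster assignment of each feature."""
    p = labels.shape[0]
    Phi = np.zeros((k, p))
    Phi[labels, np.arange(p)] = 1.0
    return Phi / np.sqrt(Phi.sum(axis=1, keepdims=True))

def sgd_step_feature_grouping(W, b, X, y, Phi, lr=0.1):
    """One SGD step with the dropout mask replaced by a structured projection."""
    n = X.shape[0]
    Xr = X @ Phi.T                     # samples projected to the reduced space (n, k)
    Wr = Phi @ W                       # reduced weight matrix (k, C)
    P = softmax(Xr @ Wr + b)           # forward pass in the reduced space
    P[np.arange(n), y] -= 1.0
    W -= lr * Phi.T @ (Xr.T @ P) / n   # gradient projected back to the original space
    b -= lr * P.mean(axis=0)           # no projection needed for the bias
    return W, b
```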
Dimensionality Reduction Method by Feature Grouping
Hoyos-Idrobo 2016
Recursive Nearest Agglomeration Clustering (ReNA)
Hoyos-Idrobo 2016
• Agglomerative clustering schemes start by placing every data element in its own cluster.
• They proceed by repeatedly merging the closest pair of connected clusters until the desired number of clusters is reached.
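For experimentation, scikit-learn's `FeatureAgglomeration` can serve as a stand-in for ReNA (ReNA itself is not in scikit-learn); this sketch also builds the ensemble of random groupings referenced earlier, under our own hypothetical shapes:

```python
import numpy as np
from sklearn.cluster import FeatureAgglomeration

def random_grouping_labels(X, k, r, rng):
    """Cluster the p features into k groups using a random subsample of r rows
    drawn with replacement; FeatureAgglomeration stands in for ReNA here."""
    idx = rng.integers(0, X.shape[0], size=r)
    return FeatureAgglomeration(n_clusters=k).fit(X[idx]).labels_

# Ensemble of random groupings; during training, one is drawn at random per
# SGD step and turned into Phi via grouping_matrix() from the earlier sketch.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(200, 64))        # hypothetical toy data
labelings = [random_grouping_labels(X_toy, k=16, r=50, rng=rng)
             for _ in range(10)]
```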
Insights: Random Reductions While Fitting
• Let $x = s + \varepsilon$, where $s$ is the deterministic term and $\varepsilon$ is the zero-mean noise term.
• The expected objective then splits into a loss on the smoothed input plus a regularization cost; the latter combines the variance of the model given the smoothed input features and the variance of the estimated target due to the randomization.
Insights: Random Reductions While Fitting
• Regularization cost: for dropout, $\Phi$ is a random diagonal masking matrix, so the relevant second moment $\mathbb{E}[\Phi^\top \Phi]$ is diagonal (and the cost is constant for linear regression).
• This is equivalent to ridge regression after “orthogonalizing” the features.
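For concreteness, here is the classical second-order computation behind this equivalence for linear regression with inverted feature dropout (a well-known result in the style of Wager et al. 2013, not taken verbatim from the slides). With i.i.d. mask entries $M_{ij} \sim \mathrm{Bernoulli}(1-\delta)/(1-\delta)$, so $\mathbb{E}[M_{ij}] = 1$ and $\mathrm{Var}(M_{ij}) = \delta/(1-\delta)$:

```latex
\mathbb{E}_{M}\big\lVert y - (M \odot X)\,w \big\rVert^2
  = \lVert y - Xw \rVert^2
  + \frac{\delta}{1-\delta}\sum_{j=1}^{p} \lVert x_j \rVert^2\, w_j^2
```

The penalty term is a ridge penalty on features rescaled by their column norms, i.e., ridge regression after orthogonalizing the features.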
Computational Complexity
[Table omitted from the transcript; the annotation reads “total number of epochs”.]
Experimental Results: Olivetti Faces
• High-dimensional data with a small sample size
• Consists of grayscale 64 × 64 face images from 40 subjects
• For each subject, there are 10 different images with varying lighting.
• Goal: identify the individual whose picture was taken
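The data set ships with scikit-learn, so a minimal loading snippet looks like this (each 64 × 64 image is flattened to 4,096 features):

```python
from sklearn.datasets import fetch_olivetti_faces

faces = fetch_olivetti_faces(shuffle=True, random_state=0)
X, y = faces.data, faces.target   # X: (400, 4096) floats in [0, 1]; y: subject ids 0..39
```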
Experimental Results: Olivetti Faces
• Visualization of the weights learned by logistic regression for a single Olivetti face with high noise, using different regularizers.
Experimental Results: Olivetti Faces
• Loss as a function of computation time for a single-layer MLP using feature grouping, versus the best parameters for the other regularizers, on Olivetti face data with high noise.
Experimental Results: Neuroimaging Data Set
• Openly accessible fMRI data set from the Human Connectome Project
• 500 subjects, 8 cognitive tasks to classify
• Feature dimension: 33,854; training set: 3,052 samples; test set: 791 samples
Summary – Stochastic Regularizer
• We introduced a stochastic regularizer based on feature averaging that captures the structure of the data.
• Our approach leads to higher accuracy in high-noise settings without additional computation time.
• The learned weights have more structure in high-noise settings.
Collaborators and References
• S. Aydore, B. Thirion, O. Grisel, G. Varoquaux. “Using Feature Grouping as a Stochastic Regularizer for High-Dimensional Noisy Data.” Women in Machine Learning Workshop, NeurIPS 2018, Montreal, Canada. arXiv:1807.11718.
• S. Aydore, L. Dicker, D. Foster. “A Local Regret in Nonconvex Online Learning.” Continual Learning Workshop, NeurIPS 2018, Montreal, Canada. arXiv:1811.05095.
Bertrand Thirion
(INRIA, France)
Olivier Grisel
(INRIA, France)
Gaël Varoquaux
(INRIA, France)
Dean Foster
(Amazon & University of Pennsylvania)
Lee Dicker
(Amazon & Rutgers University)
Thank You
More on my website…
http://www.sergulaydore.com
Editor's Notes

1. In the landscape graphic, the x-axis reflects the level of technical sophistication of the AI tool and the y-axis represents its mass appeal. It is exciting to see such progress in AI, but all these applications require massive amounts of data to train machine learning models.
2. Some fields, such as brain imaging, often do not have such massive numbers of samples, while the dimension of the features is large due to the rich spatial and temporal information.
3. This problem is not limited to brain imaging; other fields also suffer from small-sample data situations.
4. The performance of machine learning models is often evaluated by their prediction ability on unseen data. While each iteration of model training decreases the training risk, fitting the training data too well can lead to failure to generalize on future predictions. This phenomenon is called “overfitting” in machine learning. The risk of overfitting is more severe in high-dimensional, data-scarce situations, which are common when data collection is expensive, as in neuroscience, biology, or geology.
5. Feature grouping defines a matrix $\Phi$ that extracts piecewise-constant approximations of the data. Let $\Phi_{FG}$ be a matrix composed of constant-amplitude groups (clusters). Formally, the set of $k$ clusters is given by $P = \{C_1, C_2, \ldots, C_k\}$, where each cluster $C_q \subset [p]$ contains a set of indexes that does not overlap the other clusters: $C_q \cap C_l = \emptyset$ for all $q \neq l$. Thus $(\Phi_{FG}\, x)_q = \alpha_q \sum_{j \in C_q} x_j$ yields a reduction of a data sample $x$ on the $q$-th cluster, where $\alpha_q$ is a constant for each cluster. With an appropriate permutation of the indexes of the data $x$, the matrix $\Phi_{FG}$ can be written in block form. We call $\Phi_{FG}\, x \in \mathbb{R}^k$ the reduced version of $x$ and $\Phi_{FG}^\top \Phi_{FG}\, x \in \mathbb{R}^p$ the approximation of $x$.
  7. Feature grouping defines a matrix Φ that extracts piece- wise constant approximations of the data Let ΦFG be a matrix composed with constant amplitude groups (clusters). Formally, the set of k clusters is given by P = {C1, C2, . . . , Ck}, where each cluster Cq ⊂ [p] contains a set of indexesthatdoesnotoverlapotherclusters,C ∩C =∅,for 􏰚ql all q ̸= l. Thus, (ΦFG x)q = αq j∈Cq xj yields a reduction of a data sample x on the q-th cluster, where αq is a constant for each cluster. With an appropriate permutation of the indexes of the data x, the matrix ΦFG can be written as We call ΦFG x ∈ Rk the reduced version of x and ΦTFGΦFG x ∈ Rp the approximation of x.