Large Scale Kernel Learning using Block
Coordinate Descent
Shaleen Kumar Gupta, Research Assistant3
Authors:
Stephen Tu1 Rebecca Roelofs1 Shivaram Venkataraman1
Benjamin Recht1,2
1Department of Electrical Engineering and Computer Science
UC Berkeley, Berkeley, CA
2Department of Statistics
UC Berkeley, Berkeley, CA
3Nanyang Technological University, 2016
Outline
1 Overview
Introduction
Background
2 Datasets
TIMIT
Yelp Reviews
CIFAR-10
3 Experimental Results
4 Performance and Scalability
5 Conclusion
Overview
Kernel methods are a powerful tool in machine learning,
allowing one to discover non-linear structure by mapping data
into a higher dimensional, possibly infinite, feature space.
Problem: they do not scale well, since exact kernel methods work
with an N × N kernel matrix over N training points.
This paper exploits distributed computation in block coordinate
descent (Block CD) and presents results.
Moreover, the paper studies the performance of
Random Features and Nystrom approximations on three large
datasets from speech (TIMIT), text (Yelp) and image
classification (CIFAR-10) domains.
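The slides do not spell out the optimization algorithm. As a minimal single-machine sketch, assuming the exact-kernel problem is kernel ridge regression, i.e. solving (K + λI)α = y, block coordinate descent cycles over blocks B of b coordinates and solves a small b × b system for each one. The helper names (rbf_kernel, block_cd_kernel_ridge) and parameters (gamma, lam, block_size, epochs) are hypothetical; the authors' distributed implementation is not reproduced here.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||x - y||^2)."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def block_cd_kernel_ridge(K, y, lam, block_size, epochs):
    """Cyclic block coordinate descent for min_a 0.5 a^T (K + lam I) a - y^T a.

    Each step exactly minimizes over one block B of coordinates, which is
    block Gauss-Seidel on the linear system (K + lam I) a = y.
    """
    n = K.shape[0]
    alpha = np.zeros(n)
    blocks = [np.arange(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    for _ in range(epochs):
        for B in blocks:
            K_BB = K[np.ix_(B, B)]
            # Rows B of K times alpha, minus the block's own contribution,
            # leaves the coupling to all coordinates outside B.
            coupling = K[B, :] @ alpha - K_BB @ alpha[B]
            # Solve the small b x b system for the new block values.
            alpha[B] = np.linalg.solve(K_BB + lam * np.eye(len(B)), y[B] - coupling)
    return alpha
```

For example, block_cd_kernel_ridge(rbf_kernel(X, X, 10.0), y, lam=5.0, block_size=10, epochs=500) should closely match np.linalg.solve on the same system. Each step only reads the rows K[B, :] (the transpose of the columns K[:, B], since K is symmetric), which is what makes it possible to generate one kernel block at a time instead of storing the whole matrix.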
Kernel Methods
https://www.reddit.com/r/MachineLearning/comments/15zrpp/please_explain_support_vector_machines_svm_like_i/c7rkwce
If our data can’t be separated by a straight line we might need
to use a curvy line.
A straight line in a higher dimensional space can be a curvy
line when projected onto a lower dimensional space.
So what we are really doing is using the kernel to put our data
into a high dimensional space, then finding a hyperplane to
separate the data in that high dimensional space.
This straight line looks like a curvy line when we bring it down
to the lower dimensional space.
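To make this intuition concrete, here is a hand-picked illustration (not from the paper): points labeled by whether they lie inside a circle cannot be split by a line in 2-D, but after adding a third coordinate x1² + x2² a flat plane separates them exactly. The explicit map lift and the helper names are hypothetical; kernel methods avoid computing such maps directly.

```python
import numpy as np


def lift(X):
    """Map 2-D points (x1, x2) to 3-D points (x1, x2, x1^2 + x2^2)."""
    return np.column_stack([X[:, 0], X[:, 1], X[:, 0] ** 2 + X[:, 1] ** 2])


def circle_labels(X, radius=1.0):
    """Label +1 for points strictly inside the circle, -1 otherwise."""
    return np.where(np.sum(X**2, axis=1) < radius**2, 1, -1)


def lifted_plane_classifier(X, radius=1.0):
    """A linear rule in the lifted space: sign(radius^2 - z3), with z3 = x1^2 + x2^2.

    Projected back to 2-D, the same rule is the curved boundary
    x1^2 + x2^2 = radius^2, i.e. a circle.
    """
    Z = lift(X)
    return np.where(radius**2 - Z[:, 2] > 0, 1, -1)
```

The plane z3 = 1 in the lifted space is the "straight line" of the slide; its shadow in the original plane is the unit circle.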
Kernel Approximation Techniques (1/2)
Kernel Trick: if an algorithm can be written so that it uses the
data only through inner products, then we never need to compute
the feature mapping explicitly, as long as we can evaluate the
inner product in the feature space with a kernel function.
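A small check of the kernel trick using a textbook example (chosen for illustration, not taken from the paper): for 2-D inputs, the polynomial kernel k(x, y) = (x · y)² equals the inner product of the explicit features φ(x) = (x1², √2·x1·x2, x2²), so the kernel gives the feature-space inner product without ever forming φ. The function names are hypothetical.

```python
import numpy as np


def phi(x):
    """Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2.0) * x1 * x2, x2**2])


def poly2_kernel(x, y):
    """Homogeneous polynomial kernel k(x, y) = (x . y)^2, computed without phi."""
    return float(np.dot(x, y)) ** 2
```

For x = (1, 2) and y = (3, -1), both np.dot(phi(x), phi(y)) and poly2_kernel(x, y) give 1.0, but only the first one builds the 3-D features.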
Many kernels can be used with the kernel trick; a prominent one
is the RBF (Gaussian) kernel.
We will also analyze two kernel approximation techniques, namely
the Nystrom method and the random features technique, as studied
in this paper.
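The RBF kernel is k(x, y) = exp(-γ‖x − y‖²). A vectorized NumPy sketch of the full kernel matrix follows; the bandwidth gamma is a free parameter here, since the slides do not state the value the authors used.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2).

    Uses ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y to avoid explicit loops;
    tiny negative values from rounding are clipped to zero.
    """
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))
```

For instance, with gamma = 0.5 the points (0, 0) and (1, 1) are at squared distance 2, so their kernel value is exp(-1).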
Kernel Approximation Techniques (2/2)
If we used all data points, we would map to an ℝ^N-dimensional
space and run into the scaling problems.
Also, we would need to store all N² kernel values.
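A back-of-the-envelope check of the storage problem, with illustrative sizes rather than the paper's datasets: a dense N × N kernel matrix in double precision takes 8N² bytes, so N = 100,000 already needs 80 GB and N = 1,000,000 needs 8 TB. The helper name is hypothetical.

```python
def dense_kernel_bytes(n, bytes_per_entry=8):
    """Bytes needed to store a dense n x n kernel matrix (float64 by default)."""
    return n * n * bytes_per_entry
```

Halving the precision (bytes_per_entry=4) only halves the cost; the quadratic growth in N is what makes exact methods impractical at scale.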
The Nystrom method says that we do not need to go to the full space
spanned by all N training points; we can use just a subset.
This yields only an approximate embedding, but if we keep the
number of sampled points fixed, the resulting embedding is
independent of the dataset size, and we can choose its
complexity to suit our problem.
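A minimal sketch of the Nystrom embedding, assuming uniform sampling of m landmark points and an RBF kernel (the slides do not say how the authors sample). With K_nm the kernel between all points and the landmarks and K_mm the kernel among landmarks, the features F = K_nm K_mm^(-1/2) satisfy F Fᵀ = K_nm K_mm⁺ K_mn ≈ K. Function names and the eps cutoff are hypothetical.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """RBF kernel matrix with entries exp(-gamma * ||x - y||^2)."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def nystrom_features(X, m, gamma, rng, eps=1e-10):
    """Nystrom embedding: an n x (at most m) matrix F with F @ F.T approximating K.

    Landmarks are m rows of X sampled uniformly without replacement.
    """
    idx = rng.choice(X.shape[0], size=m, replace=False)
    landmarks = X[idx]
    K_nm = rbf_kernel(X, landmarks, gamma)
    K_mm = rbf_kernel(landmarks, landmarks, gamma)
    # Inverse square root of K_mm via its eigendecomposition;
    # near-zero eigenvalues are dropped for numerical stability.
    w, V = np.linalg.eigh(K_mm)
    keep = w > eps
    inv_sqrt = V[:, keep] / np.sqrt(w[keep])
    return K_nm @ inv_sqrt
```

With m = N the approximation recovers K up to the dropped near-zero eigenvalues; with m much smaller than N, the feature matrix has N × m entries instead of N², and its width does not grow with the dataset.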
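Random feature based methods use an element-wise approximation of
the kernel: each entry k(x, y) is approximated by an inner product
z(x) · z(y) of randomized, finite-dimensional feature maps.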
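A sketch of random cosine (random Fourier) features for the RBF kernel, following the standard Rahimi-Recht construction: draw w ~ N(0, 2γI) and b ~ Uniform[0, 2π), then set z(x) = √(2/D)·cos(Wx + b). This is the textbook form; the exact sampling details in the authors' code may differ, and the function name is hypothetical.

```python
import numpy as np


def random_fourier_features(X, D, gamma, rng):
    """Random cosine features z(x) with z(x) . z(y) ~= exp(-gamma * ||x - y||^2).

    X has shape (n, d); the result has shape (n, D).
    """
    d = X.shape[1]
    # Frequencies from the RBF kernel's spectral density N(0, 2 * gamma * I).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)
```

Each kernel entry is estimated by an average over D random terms, so the error shrinks roughly like 1/√D; unlike Nystrom, the features do not depend on the training data at all.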
TIMIT
A phone classification task was performed on the TIMIT
dataset, which consists of spoken audio from 462 speakers.
The authors applied a Gaussian (RBF) kernel for the Nystrom
and exact methods and used random cosines for the random
feature method.
Yelp Reviews
The goal was to predict a rating from one to five stars from
the text of a review.
A standard 80:20 training/test split was applied.
nltk was used for tokenization and stemming, and n-gram
modeling was done with n = 3.
For the exact and Nystrom experiments, they apply a linear
kernel.
For random features, they apply a hash kernel using
MurmurHash3 as their hash function.
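A rough sketch of this featurization, with stated substitutions: the slides use nltk for tokenization and stemming and MurmurHash3 for hashing, while this sketch swaps in a simple lowercase regex tokenizer (no stemming) and zlib.crc32 from the Python standard library so that it runs without extra packages. It also assumes that "n = 3" means all n-grams up to length 3, and the bucket count 2^18 is arbitrary. The hash kernel between two reviews is the inner product of their hashed count vectors. All function names are hypothetical.

```python
import re
import zlib

import numpy as np


def ngrams(tokens, n_max=3):
    """All contiguous n-grams of length 1..n_max, joined with spaces."""
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i : i + n]))
    return grams


def hashed_features(text, num_buckets=2**18, n_max=3):
    """Count n-grams into a fixed-length vector by hashing (the hashing trick).

    Stand-ins: a regex tokenizer replaces nltk, zlib.crc32 replaces MurmurHash3.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    vec = np.zeros(num_buckets)
    for gram in ngrams(tokens, n_max):
        vec[zlib.crc32(gram.encode("utf-8")) % num_buckets] += 1.0
    return vec


def hash_kernel(text_a, text_b, **kwargs):
    """Hash kernel: inner product of the two hashed feature vectors."""
    return float(hashed_features(text_a, **kwargs) @ hashed_features(text_b, **kwargs))
```

The vector length is fixed by num_buckets regardless of vocabulary size, which is what keeps this feature map cheap; swapping in a MurmurHash3 binding would only change the hash call.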
Since they were predicting ratings for a review, they measured
accuracy by using the root mean square error (RMSE) of the
predicted rating as compared to the actual rating.
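The metric is RMSE = √((1/n) Σᵢ (ŷᵢ − yᵢ)²), where ŷᵢ is the predicted rating and yᵢ the actual one. A direct NumPy version, with a hypothetical function name:

```python
import numpy as np


def rmse(predicted, actual):
    """Root mean square error between predicted and actual star ratings."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))
```

For predictions [4.2, 2.9, 5.0] against actual ratings [4, 3, 5], the squared errors are 0.04, 0.01 and 0, so the RMSE is √(0.05 / 3) ≈ 0.129 stars.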
CIFAR-10
The task was image classification on the CIFAR-10
dataset.
The dataset contained 500,000 training images and 4096
features per image.
The authors started with these 4096 features in the dataset as
input and used the RBF kernel for the exact and Nystrom
method and random cosines for the random features method.
Experimental Results (1/3)
Figure: Classification error against time using different methods on the
TIMIT, Yelp and CIFAR-10 datasets. The small black stars denote the
end of an epoch.
Experimental Results (2/3)
Figure: Classification Error against number of features for Nystrom and
Random Features on the TIMIT, Yelp and CIFAR-10 datasets
Experimental Results (3/3)
Performance
Figure: Breakdown of time to compute a single block of coordinate
descent in the first epoch on the TIMIT, Yelp and CIFAR-10 datasets
From the figure, we see that the choice of the kernel
approximation can significantly impact performance since
different kernels take different amounts of time to generate.
For example, the hash random feature used for the Yelp
dataset is much cheaper to compute than the string kernel.
However, computing a block of the RBF kernel is similar in
cost to computing a block of random cosine features.
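One way to see why the RBF and random cosine blocks cost about the same: both are dominated by an (n × d) by (d × b) matrix product followed by an elementwise exp or cos. The sketch below is an interpretation, not the authors' code; the function names and the num_features parameter are hypothetical. A hashed feature, by contrast, needs no dense product at all, which is consistent with it being cheaper.

```python
import numpy as np


def rbf_kernel_block(X, X_block, gamma):
    """n x b block K[:, B] of the RBF kernel, where X_block = X[B] has shape (b, d).

    Dominant cost: the (n x d) @ (d x b) product, O(n * b * d), then n*b exps.
    """
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(X_block**2, axis=1)[None, :]
        - 2.0 * X @ X_block.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def random_cosine_block(X, W_block, b_block, num_features):
    """n x b block of random cosine features for b sampled directions.

    W_block has shape (d, b) and b_block has shape (b,); num_features is the
    total feature count D used in the sqrt(2 / D) scaling.
    Dominant cost: the (n x d) @ (d x b) product, O(n * b * d), then n*b cosines.
    """
    return np.sqrt(2.0 / num_features) * np.cos(X @ W_block + b_block)
```

Both functions return an n × b block, so a solver that consumes one block at a time can use either interchangeably.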
Scalability of RBF Kernel Generation
Figure: Time taken to compute one block of the RBF kernel as they scale the
number of examples and the number of machines used
Here, ideal scaling implies that the time to generate a block of the kernel
matrix remains constant as they increase both the data and the number
of machines.
However, computing a block of the RBF kernel involves broadcasting a
b × d matrix to all the machines in the cluster, which causes a slight
decrease in performance as they go from 8 to 128 machines. Still,
they expect the kernel block generation methods to continue to
scale well for larger datasets, since broadcast routines scale as
O(log M) in the number of machines M.
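Under the common assumption of a tree-based broadcast (the slides do not say which broadcast routine is used), each doubling of the machine count adds one communication round, so going from 8 to 128 machines raises the round count from 3 to 7 while the payload stays a single b × d block. A small sketch of that arithmetic; the function names and any b, d values are illustrative.

```python
def broadcast_rounds(num_machines):
    """Rounds for a binomial-tree broadcast to M machines: ceil(log2 M).

    (M - 1).bit_length() equals ceil(log2 M) for M >= 1, using exact integers.
    """
    return (num_machines - 1).bit_length()


def broadcast_payload_bytes(b, d, bytes_per_entry=8):
    """Size of the b x d block that is broadcast (float64 by default)."""
    return b * d * bytes_per_entry
```

A 16× increase in machines thus costs only about 2.3× as many broadcast rounds in this model, which matches the slight slowdown rather than a collapse in performance.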
Conclusion
This paper shows that scalable kernel machines are feasible
with distributed computation.
Results suggest that the Nystrom method generally achieves
better statistical accuracy than random features.
However, it can require significantly more iterations of
optimization.
On the theoretical side, a limitation of this analysis is that
one cannot hope to achieve rates better than gradient descent.
References and Further Reading I
Stephen Tu, Rebecca Roelofs, Shivaram Venkataraman,
Benjamin Recht
Large Scale Kernel Learning using Block Coordinate Descent
February 18, 2016
Tianbao Yang, Yu-feng Li, Mehrdad Mahdavi, Rong Jin,
Zhi-Hua Zhou
Nystrom Method vs Random Fourier Features: A Theoretical
and Empirical Comparison
Advances in Neural Information Processing Systems 25 (NIPS
2012)

More Related Content

What's hot

PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
Jinwon Lee
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
Alpine Data
 
Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Sangamesh Ragate
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
Pooyan Jamshidi
 
Spine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationSpine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localization
Devansh16
 
A Novel Framework and Policies for On-line Block of Cores Allotment for Multi...
A Novel Framework and Policies for On-line Block of Cores Allotment for Multi...A Novel Framework and Policies for On-line Block of Cores Allotment for Multi...
A Novel Framework and Policies for On-line Block of Cores Allotment for Multi...
ijcsa
 
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup  - Alex PerrierLarge data with Scikit-learn - Boston Data Mining Meetup  - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
Alexis Perrier
 
virtualization
virtualizationvirtualization
virtualization
Avi Nash
 
Energy Curtailing with Huddling Practices with Fuzzy in Wireless Sensor Network
Energy Curtailing with Huddling Practices with Fuzzy in Wireless Sensor NetworkEnergy Curtailing with Huddling Practices with Fuzzy in Wireless Sensor Network
Energy Curtailing with Huddling Practices with Fuzzy in Wireless Sensor Network
ijsrd.com
 
An Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchAn Introduction to Neural Architecture Search
An Introduction to Neural Architecture Search
Bill Liu
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problemsRichard Ashworth
 
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
taeseon ryu
 
Chap3 slides
Chap3 slidesChap3 slides
Chap3 slides
BaliThorat1
 
Image Processing IEEE 2015 Projects
Image Processing IEEE 2015 ProjectsImage Processing IEEE 2015 Projects
Image Processing IEEE 2015 Projects
Vijay Karan
 
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
Sunghoon Joo
 
Beyond data and model parallelism for deep neural networks
Beyond data and model parallelism for deep neural networksBeyond data and model parallelism for deep neural networks
Beyond data and model parallelism for deep neural networks
JunKudo2
 
Optimal buffer allocation in
Optimal buffer allocation inOptimal buffer allocation in
Optimal buffer allocation in
csandit
 
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
taeseon ryu
 

What's hot (20)

PR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional KernelsPR-183: MixNet: Mixed Depthwise Convolutional Kernels
PR-183: MixNet: Mixed Depthwise Convolutional Kernels
 
Enterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using SparkEnterprise Scale Topological Data Analysis Using Spark
Enterprise Scale Topological Data Analysis Using Spark
 
Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)
 
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
An Uncertainty-Aware Approach to Optimal Configuration of Stream Processing S...
 
Spine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localizationSpine net learning scale permuted backbone for recognition and localization
Spine net learning scale permuted backbone for recognition and localization
 
A Novel Framework and Policies for On-line Block of Cores Allotment for Multi...
A Novel Framework and Policies for On-line Block of Cores Allotment for Multi...A Novel Framework and Policies for On-line Block of Cores Allotment for Multi...
A Novel Framework and Policies for On-line Block of Cores Allotment for Multi...
 
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup  - Alex PerrierLarge data with Scikit-learn - Boston Data Mining Meetup  - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
 
virtualization
virtualizationvirtualization
virtualization
 
Energy Curtailing with Huddling Practices with Fuzzy in Wireless Sensor Network
Energy Curtailing with Huddling Practices with Fuzzy in Wireless Sensor NetworkEnergy Curtailing with Huddling Practices with Fuzzy in Wireless Sensor Network
Energy Curtailing with Huddling Practices with Fuzzy in Wireless Sensor Network
 
An Introduction to Neural Architecture Search
An Introduction to Neural Architecture SearchAn Introduction to Neural Architecture Search
An Introduction to Neural Architecture Search
 
Producer consumer-problems
Producer consumer-problemsProducer consumer-problems
Producer consumer-problems
 
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
[딥논읽] Meta-Transfer Learning for Zero-Shot Super-Resolution paper review
 
post119s1-file3
post119s1-file3post119s1-file3
post119s1-file3
 
Chap3 slides
Chap3 slidesChap3 slides
Chap3 slides
 
Image Processing IEEE 2015 Projects
Image Processing IEEE 2015 ProjectsImage Processing IEEE 2015 Projects
Image Processing IEEE 2015 Projects
 
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
 
[ppt]
[ppt][ppt]
[ppt]
 
Beyond data and model parallelism for deep neural networks
Beyond data and model parallelism for deep neural networksBeyond data and model parallelism for deep neural networks
Beyond data and model parallelism for deep neural networks
 
Optimal buffer allocation in
Optimal buffer allocation inOptimal buffer allocation in
Optimal buffer allocation in
 
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Eff...
 

Similar to Large Scale Kernel Learning using Block Coordinate Descent

Predicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensemblesPredicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensembles
Varad Meru
 
Super Resolution with OCR Optimization
Super Resolution with OCR OptimizationSuper Resolution with OCR Optimization
Super Resolution with OCR Optimization
niveditJain
 
IRJET- Automatic Object Sorting using Deep Learning
IRJET- Automatic Object Sorting using Deep LearningIRJET- Automatic Object Sorting using Deep Learning
IRJET- Automatic Object Sorting using Deep Learning
IRJET Journal
 
Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)
Fatimakhan325
 
Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio
Armando Vieira
 
SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.
bhavinecindus
 
Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Pedro Lopes
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N...
Classification of Iris Data using Kernel Radial Basis Probabilistic  Neural N...Classification of Iris Data using Kernel Radial Basis Probabilistic  Neural N...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N...
Scientific Review SR
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Scientific Review
 
Hyper-parameter optimization of convolutional neural network based on particl...
Hyper-parameter optimization of convolutional neural network based on particl...Hyper-parameter optimization of convolutional neural network based on particl...
Hyper-parameter optimization of convolutional neural network based on particl...
journalBEEI
 
Hand Written Digit Classification
Hand Written Digit ClassificationHand Written Digit Classification
Hand Written Digit Classification
ijtsrd
 
Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback  Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback
dannyijwest
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET Journal
 
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
IAEME Publication
 
A Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmA Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means Algorithm
IRJET Journal
 
PSO-based Training, Pruning, and Ensembling of Extreme Learning Machine RBF N...
PSO-based Training, Pruning, and Ensembling of Extreme Learning Machine RBF N...PSO-based Training, Pruning, and Ensembling of Extreme Learning Machine RBF N...
PSO-based Training, Pruning, and Ensembling of Extreme Learning Machine RBF N...
ijceronline
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
cscpconf
 

Similar to Large Scale Kernel Learning using Block Coordinate Descent (20)

Predicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensemblesPredicting rainfall using ensemble of ensembles
Predicting rainfall using ensemble of ensembles
 
Super Resolution with OCR Optimization
Super Resolution with OCR OptimizationSuper Resolution with OCR Optimization
Super Resolution with OCR Optimization
 
IRJET- Automatic Object Sorting using Deep Learning
IRJET- Automatic Object Sorting using Deep LearningIRJET- Automatic Object Sorting using Deep Learning
IRJET- Automatic Object Sorting using Deep Learning
 
Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)Types of Machine Learnig Algorithms(CART, ID3)
Types of Machine Learnig Algorithms(CART, ID3)
 
Recursive
RecursiveRecursive
Recursive
 
Model checking
Model checkingModel checking
Model checking
 
Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio
 
SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.SYNOPSIS on Parse representation and Linear SVM.
SYNOPSIS on Parse representation and Linear SVM.
 
Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N...
Classification of Iris Data using Kernel Radial Basis Probabilistic  Neural N...Classification of Iris Data using Kernel Radial Basis Probabilistic  Neural N...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural N...
 
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
Classification of Iris Data using Kernel Radial Basis Probabilistic Neural Ne...
 
Hyper-parameter optimization of convolutional neural network based on particl...
Hyper-parameter optimization of convolutional neural network based on particl...Hyper-parameter optimization of convolutional neural network based on particl...
Hyper-parameter optimization of convolutional neural network based on particl...
 
Hand Written Digit Classification
Hand Written Digit ClassificationHand Written Digit Classification
Hand Written Digit Classification
 
Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback  Semantic Image Retrieval Using Relevance Feedback
Semantic Image Retrieval Using Relevance Feedback
 
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering AlgorithmIRJET- Review of Existing Methods in K-Means Clustering Algorithm
IRJET- Review of Existing Methods in K-Means Clustering Algorithm
 
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
AN IMPROVED METHOD FOR IDENTIFYING WELL-TEST INTERPRETATION MODEL BASED ON AG...
 
A Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means AlgorithmA Study of Efficiency Improvements Technique for K-Means Algorithm
A Study of Efficiency Improvements Technique for K-Means Algorithm
 
Fulltext
FulltextFulltext
Fulltext
 
PSO-based Training, Pruning, and Ensembling of Extreme Learning Machine RBF N...
PSO-based Training, Pruning, and Ensembling of Extreme Learning Machine RBF N...PSO-based Training, Pruning, and Ensembling of Extreme Learning Machine RBF N...
PSO-based Training, Pruning, and Ensembling of Extreme Learning Machine RBF N...
 
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MININGA HYBRID CLUSTERING ALGORITHM FOR DATA MINING
A HYBRID CLUSTERING ALGORITHM FOR DATA MINING
 

Recently uploaded

FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
MaleehaSheikh2
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
ewymefz
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
ewymefz
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Boston Institute of Analytics
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
ukgaet
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
NABLAS株式会社
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
nscud
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 

Recently uploaded (20)

FP Growth Algorithm and its Applications
FP Growth Algorithm and its ApplicationsFP Growth Algorithm and its Applications
FP Growth Algorithm and its Applications
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单一比一原版(BU毕业证)波士顿大学毕业证成绩单
一比一原版(BU毕业证)波士顿大学毕业证成绩单
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单一比一原版(NYU毕业证)纽约大学毕业证成绩单
一比一原版(NYU毕业证)纽约大学毕业证成绩单
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
一比一原版(UVic毕业证)维多利亚大学毕业证成绩单
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .社内勉強会資料_LLM Agents                              .
社内勉強会資料_LLM Agents                              .
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 

Large Scale Kernel Learning using Block Coordinate Descent

  • 1. Large Scale Kernel Learning using Block Coordinate Descent Shaleen Kumar Gupta, Research Assistant3 Authors: Stephen Tu1 Rebecca Reolofs1 Shivaram Venkatraman1 Benjamin Recht1,2 1Department of Electrical Engineering and Computer Science UC Berkeley, Berkeley, CA 2Department of Statistics UC Berkeley, Berkeley, CA 3Nanyang Technological University, 2016
  • 2. Outline 1 Overview Introduction Background 2 Datasets TIMIT Yelp Reviews CIFAR-10 3 Experimental Results 4 Performance and Scalability 5 Conclusion
  • 4. Overview Kernel methods are a powerful tool in machine learning: they discover non-linear structure by mapping data into a higher-dimensional, possibly infinite-dimensional, feature space. Problem: they do not scale well, because the n × n kernel matrix grows quadratically with the number of training examples n. This paper exploits distributed computation in block coordinate descent (block CD) and presents the results. The paper also studies the performance of random features and Nystrom approximations on three large datasets from the speech (TIMIT), text (Yelp) and image classification (CIFAR-10) domains.
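The slides never write out a block coordinate descent step. A minimal single-machine sketch, assuming the kernel ridge regression system (K + λI)α = y as the objective (an assumption; the helper names `rbf_kernel` and `block_cd_krr` and all parameter values are hypothetical, and the distributed execution the paper relies on is omitted): each step re-solves the equations for one block B of coordinates while the other coordinates stay fixed.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel block exp(-gamma * ||a - b||^2) for all rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def block_cd_krr(X, y, lam, gamma, block_size, epochs, seed=0):
    """Block coordinate descent for (K + lam*I) alpha = y (kernel ridge regression).

    Only a block_size x n slice of K is generated per step, so the full
    n x n kernel matrix is never stored.
    """
    n = X.shape[0]
    alpha = np.zeros(n)
    rng = np.random.default_rng(seed)
    for _ in range(epochs):
        perm = rng.permutation(n)  # visit blocks in a fresh random order each epoch
        for start in range(0, n, block_size):
            B = perm[start:start + block_size]
            K_B = rbf_kernel(X[B], X, gamma)  # b x n row block of K
            K_BB = K_B[:, B]                  # b x b diagonal sub-block
            # Right-hand side of the block equations with the other blocks held fixed
            rhs = y[B] - K_B @ alpha + K_BB @ alpha[B]
            alpha[B] = np.linalg.solve(K_BB + lam * np.eye(len(B)), rhs)
    return alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 5))
y = rng.standard_normal(120)
alpha = block_cd_krr(X, y, lam=1.0, gamma=1.0, block_size=40, epochs=100)
exact = np.linalg.solve(rbf_kernel(X, X, 1.0) + np.eye(120), y)
print(np.max(np.abs(alpha - exact)))  # distance from the direct solution
```

Each step touches only a b × n slice of K, which is what allows kernel blocks to be generated on the fly instead of materializing the whole matrix.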
  • 7. Kernel Methods (source: https://www.reddit.com/r/MachineLearning/comments/15zrpp/please_explain_support_vector_machines_svm_like_i/c7rkwce) If our data can't be separated by a straight line, we might need to use a curvy line. A straight line in a higher-dimensional space can be a curvy line when projected onto a lower-dimensional space. So what we are really doing is using the kernel to put our data into a high-dimensional space, then finding a hyperplane to separate the data in that space. This straight line looks like a curvy line when we bring it back down to the lower-dimensional space.
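To make the "curvy line" picture concrete, here is a small NumPy sketch. The toy data (two concentric circles) and the lifting map φ(x₁, x₂) = (x₁, x₂, x₁² + x₂²) are illustrative assumptions, not taken from the paper: no straight line separates the circles in 2-D, but after lifting, the flat plane z = 5 does, and that plane corresponds to the circle x₁² + x₂² = 5 back in the original space.

```python
import numpy as np

angles = np.linspace(0, 2 * np.pi, 50, endpoint=False)
inner = np.c_[np.cos(angles), np.sin(angles)]      # radius 1, label -1
outer = 3 * np.c_[np.cos(angles), np.sin(angles)]  # radius 3, label +1
X = np.vstack([inner, outer])
y = np.r_[-np.ones(50), np.ones(50)]

def lift(X):
    """Map (x1, x2) -> (x1, x2, x1^2 + x2^2)."""
    return np.c_[X, (X ** 2).sum(axis=1)]

# A linear classifier (hyperplane) in the lifted 3-D space: sign(z - 5)
w = np.array([0.0, 0.0, 1.0])
b = -5.0
pred = np.sign(lift(X) @ w + b)
print((pred == y).all())  # True
```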
  • 8. Kernel Approximation Techniques (1/2) Kernel trick: if an algorithm can be written so that it uses the data only through inner products, then you never need to compute the feature mapping explicitly, as long as you can compute the inner product in the feature space (the kernel function). A prominent kernel is the RBF (Gaussian) kernel, k(x, y) = exp(-γ‖x - y‖²), whose feature space is infinite-dimensional. Because working with the exact kernel is expensive at scale, the paper also analyzes two kernel approximation techniques: the Nystrom method and random features.
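A minimal check of the kernel trick, using the degree-2 polynomial kernel k(x, y) = (xᵀy)² as an illustrative example (chosen only because its feature map is small enough to write out; the function names are hypothetical): the kernel value equals the inner product of the explicit features φ(x) = (x₁², √2·x₁x₂, x₂²), yet computing it never forms φ. The RBF kernel has no finite-dimensional φ, so it can only be used through kernel evaluations like `rbf_kernel` below.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D input."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def poly2_kernel(x, y):
    """(x . y)^2: the same inner product as phi(x) . phi(y), computed without phi."""
    return float(np.dot(x, y)) ** 2

def rbf_kernel(x, y, gamma=0.5):
    """exp(-gamma * ||x - y||^2); its feature space is infinite-dimensional."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

rng = np.random.default_rng(0)
for _ in range(3):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    # The two numbers agree up to floating-point round-off
    print(poly2_kernel(x, y), float(phi(x) @ phi(y)))
```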
  • 9. Kernel Approximation Techniques (2/2) If we used all N data points, we would map into an R^N-dimensional space and run into scaling problems; we would also need to store all N² kernel values. The Nystrom method says that we don't need to go to the full space spanned by all N training points: we can use just a subset. This yields only an approximate embedding, but if we keep the number of sampled points fixed, the resulting embedding is independent of the dataset size, and we can choose its complexity to suit our problem. Random feature methods instead approximate the kernel element-wise: each entry k(x, y) is approximated by the inner product z(x)ᵀz(y) of explicit randomized features.
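A hedged NumPy sketch of the Nystrom idea, assuming uniform sampling of m landmark points and an RBF kernel with γ = 0.1 (both assumptions; the slides do not say how the subset is chosen). With C = K(X, landmarks) and W = K(landmarks, landmarks), the features Φ = C V Λ^(-1/2) from the eigendecomposition W = V Λ Vᵀ satisfy Φ Φᵀ = C W⁺ Cᵀ ≈ K, and Φ has at most m columns regardless of n. Function names are hypothetical.

```python
import numpy as np

def rbf(A, B, gamma=0.1):
    """RBF kernel exp(-gamma * ||a - b||^2) between rows of A and B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def nystrom_features(X, m, gamma=0.1, seed=0, eps=1e-10):
    """n x k Nystrom embedding (k <= m) from m uniformly sampled landmarks."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    L = X[idx]
    C = rbf(X, L, gamma)  # n x m
    W = rbf(L, L, gamma)  # m x m
    evals, evecs = np.linalg.eigh(W)
    keep = evals > eps * evals.max()  # drop numerically zero directions
    # Phi = C V Lambda^{-1/2}, so Phi @ Phi.T = C W^+ C^T
    return C @ evecs[:, keep] / np.sqrt(evals[keep])

X = np.random.default_rng(1).standard_normal((300, 5))
K = rbf(X, X)
for m in (10, 50, 300):
    Phi = nystrom_features(X, m)
    err = np.linalg.norm(K - Phi @ Phi.T) / np.linalg.norm(K)
    print(m, Phi.shape, round(float(err), 4))
```

Increasing m trades more computation for a closer approximation; with m = n the approximation recovers K up to round-off.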
  • 11. TIMIT A phone classification task was performed on the TIMIT dataset, which consists of spoken audio from 462 speakers. The authors applied a Gaussian (RBF) kernel for the Nystrom and exact methods and used random cosines for the random features method.
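A sketch of random cosine features for the Gaussian kernel, written in the standard random Fourier features form and assuming the parameterization k(x, y) = exp(-γ‖x - y‖²) (the slides do not give the bandwidth or feature count; function names and values are hypothetical): z(x) = √(2/D)·cos(Wᵀx + b) with W ~ N(0, 2γI) and b ~ U[0, 2π], so that z(x)ᵀz(y) is an unbiased estimate of k(x, y).

```python
import numpy as np

def random_cosine_features(X, D, gamma, seed=0):
    """Random cosine features for k(x, y) = exp(-gamma * ||x - y||^2).

    z(x) = sqrt(2/D) * cos(W^T x + b), W ~ N(0, 2*gamma*I), b ~ U[0, 2*pi],
    so that E[z(x) . z(y)] = k(x, y).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def gaussian_kernel(A, B, gamma):
    """Exact kernel matrix, for comparison only."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

X = np.random.default_rng(3).standard_normal((20, 5))
K = gaussian_kernel(X, X, 0.1)
for D in (100, 1000, 20000):
    Z = random_cosine_features(X, D, gamma=0.1)
    print(D, np.max(np.abs(Z @ Z.T - K)))  # error shrinks roughly like 1/sqrt(D)
```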
  • 13. Yelp Reviews The goal was to predict a rating from one to five stars from the text of a review. A standard 80:20 training:test split was applied. nltk was used for tokenization and stemming, and n-grams were modeled with n = 3. For the exact and Nystrom experiments, the authors apply a linear kernel. For random features, they apply a hash kernel with MurmurHash3 as the hash function. Since they were predicting ratings, they measured accuracy by the root mean square error (RMSE) between the predicted and actual ratings.
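A standard-library sketch of a hash kernel over n-gram counts plus the RMSE metric, under several loudly labeled substitutions: `zlib.crc32` stands in for MurmurHash3 (which is not in the Python standard library), a lowercase regex tokenizer stands in for nltk tokenization and stemming, n-grams of every length from 1 to 3 are used (the slide only says n = 3), and the 2**18 bucket count is hypothetical. Features are stored sparsely as {bucket: count}, and the hash kernel is the inner product of two such vectors.

```python
import math
import re
import zlib

def tokenize(text):
    """Simplified stand-in for nltk tokenization/stemming: lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def ngrams(tokens, n=3):
    """All n-grams of length 1..n (assumption: the slide only states n = 3)."""
    grams = []
    for k in range(1, n + 1):
        for i in range(len(tokens) - k + 1):
            grams.append(" ".join(tokens[i:i + k]))
    return grams

def hashed_features(text, dim=2 ** 18, n=3):
    """Sparse hashed n-gram counts {bucket: count}; crc32 substitutes for MurmurHash3."""
    feats = {}
    for g in ngrams(tokenize(text), n):
        idx = zlib.crc32(g.encode("utf-8")) % dim
        feats[idx] = feats.get(idx, 0) + 1
    return feats

def hash_kernel(a, b):
    """Inner product of two sparse hashed feature vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(k, 0) for k, v in a.items())

def rmse(pred, actual):
    """Root mean square error between predicted and actual ratings."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, actual)) / len(actual))

a = hashed_features("The food was great, great service")
b = hashed_features("great food and great service")
print(hash_kernel(a, b))
print(rmse([4.1, 2.8, 5.0], [4, 3, 5]))
```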
  • 15. CIFAR-10 The task was image classification on the CIFAR-10 dataset. The dataset used contained 500,000 training images with 4096 features per image. The authors took these 4096 features as input and used the RBF kernel for the exact and Nystrom methods and random cosines for the random features method.
  • 16. Experimental Results (1/3) Figure: Classification error against time using different methods on the TIMIT, Yelp and CIFAR-10 datasets. Black stars mark the end of each epoch.
  • 17. Experimental Results (2/3) Figure: Classification error against the number of features for Nystrom and random features on the TIMIT, Yelp and CIFAR-10 datasets.
  • 19. Performance Figure: Breakdown of the time to compute a single block of coordinate descent in the first epoch on the TIMIT, Yelp and CIFAR-10 datasets. The figure shows that the choice of kernel approximation can significantly affect performance, since different kernels take different amounts of time to generate. For example, the hash random features used for the Yelp dataset are much cheaper to compute than the string kernel. In contrast, computing a block of the RBF kernel costs about the same as computing a block of random cosine features.
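One plausible reason the RBF and random cosine blocks cost about the same: in a typical implementation, both are dominated by a single dense matrix product followed by an element-wise function. A hedged NumPy sketch (function names and sizes are hypothetical, and timings will vary by machine):

```python
import time
import numpy as np

def rbf_block(X_B, X, gamma):
    """b x n block of the RBF kernel exp(-gamma * ||x_i - x_j||^2).

    Uses ||x_i - x_j||^2 = ||x_i||^2 + ||x_j||^2 - 2 x_i . x_j, so the dominant
    cost is a single (b x d) @ (d x n) matrix product.
    """
    sq = (X_B ** 2).sum(1)[:, None] + (X ** 2).sum(1)[None, :] - 2.0 * X_B @ X.T
    return np.exp(-gamma * np.maximum(sq, 0.0))  # clip tiny negative round-off

def cosine_block(X_B, W, phase):
    """b x D block of random cosine features: one (b x d) @ (d x D) product."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X_B @ W + phase)

# Hypothetical sizes, chosen only to make the timings visible
rng = np.random.default_rng(0)
n, d, b, D = 8000, 128, 500, 8000
X = rng.standard_normal((n, d))
W = rng.normal(0.0, 1.0, size=(d, D))
phase = rng.uniform(0.0, 2.0 * np.pi, size=D)
for name, fn in [("RBF block", lambda: rbf_block(X[:b], X, 1.0 / d)),
                 ("cosine block", lambda: cosine_block(X[:b], W, phase))]:
    t0 = time.perf_counter()
    fn()
    print(name, f"{time.perf_counter() - t0:.4f} s")
```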
  • 20. Scalability of RBF Kernel Generation Figure: Time taken to compute one block of the RBF kernel as the number of examples and the number of machines are scaled together. Here, ideal scaling means that the time to generate a block of the kernel matrix stays constant as both the data and the number of machines increase. However, computing a block of the RBF kernel involves broadcasting a b × d matrix (b examples in the block, d features) to all machines in the cluster, which causes a slight decrease in performance going from 8 to 128 machines. The authors nevertheless expect kernel block generation to keep scaling well for larger datasets, since broadcast routines scale as O(log M) in the number of machines M.
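A back-of-the-envelope model of the O(log M) claim, assuming a binomial-tree broadcast (an assumption; real cluster frameworks may use other schemes, and `broadcast_rounds` is a hypothetical helper): in each round, every machine that already holds the b × d block forwards it to one machine that does not, so going from 8 to 128 machines (16× more) adds only 4 communication rounds.

```python
def broadcast_rounds(num_machines):
    """Rounds needed for a binomial-tree broadcast to reach num_machines."""
    have = 1    # the driver/root starts with the block
    rounds = 0
    while have < num_machines:
        have = min(2 * have, num_machines)  # every holder sends to one new machine
        rounds += 1
    return rounds

for m in (8, 16, 32, 64, 128):
    print(m, broadcast_rounds(m))  # 3, 4, 5, 6, 7 rounds respectively
```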
  • 21. Conclusion This paper shows that scalable kernel machines are feasible with distributed computation. The results suggest that the Nystrom method generally achieves better statistical accuracy than random features; however, it can require significantly more iterations of optimization. On the theoretical side, a limitation of the analysis is that it cannot guarantee convergence rates better than those of gradient descent.
  • 22. References and Further Reading [1] Stephen Tu, Rebecca Roelofs, Shivaram Venkataraman, Benjamin Recht. Large Scale Kernel Learning using Block Coordinate Descent. February 18, 2016. [2] Tianbao Yang, Yu-feng Li, Mehrdad Mahdavi, Rong Jin, Zhi-Hua Zhou. Nystrom Method vs Random Fourier Features: A Theoretical and Empirical Comparison. Advances in Neural Information Processing Systems 25 (NIPS 2012).