Fast Distributed Online Classification
Ram Sriharsha (Product Manager, Apache Spark, Databricks)
Prasad Chalasani (SVP Data Science, MediaMath)
13 April, 2016
Summary
We leveraged recent machine-learning research to develop a
- fast, practical,
- scalable (up to 100s of millions of sparse features),
- online,
- distributed (built on Apache Spark),
- single-pass
ML classifier that has significant advantages over most similar ML packages.
Key Conceptual Take-aways
- Supervised Machine Learning
- Online vs Batch Learning, and the importance of Online
- Challenges in online learning
- Distributed implementation in Spark
Supervised Machine Learning: Overview
Given:
- labeled training data
Goal:
- fit a model to predict labels on (unseen) test data.
Supervised Machine Learning: Overview
Given:
- training data D: n labeled examples {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} where
  - x_i is a k-dimensional feature-vector
  - y_i is the label (0 or 1) that we want to predict
- an error (or loss) metric L(p, y) from predicting p when the true label is y.
Fix a family of functions f_w(x) ∈ F that are parametrised by a weight-vector w.
Goal: find w that minimizes the average loss over D:

    L(w) = (1/n) ∑_{i=1}^n L_i(w) = (1/n) ∑_{i=1}^n L(f_w(x_i), y_i).
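To make the objective concrete, here is a minimal Scala sketch (all names are ours, not from the talk) that evaluates the average loss over D for a fixed weight-vector w, given any model family and loss metric:

```scala
// Average loss over training data D = {(x_i, y_i)} for a fixed weight vector w.
// `predict` plays the role of f_w(x); `loss` plays the role of L(p, y).
def averageLoss(
    data: Seq[(Array[Double], Double)],                // (x_i, y_i) pairs
    w: Array[Double],
    predict: (Array[Double], Array[Double]) => Double, // (w, x) => p
    loss: (Double, Double) => Double                   // (p, y) => L(p, y)
): Double =
  data.map { case (x, y) => loss(predict(w, x), y) }.sum / data.size
```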
Logistic Regression
Logistic model: f_w(x) = 1 / (1 + e^{-w·x})
Probability interpretation.
Loss function: L_i(w) = -y_i ln(f_w(x_i)) - (1 - y_i) ln(1 - f_w(x_i))
Overall loss: L(w) = ∑_{i=1}^n L_i(w)
L(w) is convex:
- no local minima
- differentiate and follow gradients
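A minimal Scala sketch of the model and loss above (helper names are ours):

```scala
// f_w(x) = 1 / (1 + exp(-w . x)): the logistic model defined above.
def dot(w: Array[Double], x: Array[Double]): Double =
  w.zip(x).map { case (wi, xi) => wi * xi }.sum

def predict(w: Array[Double], x: Array[Double]): Double =
  1.0 / (1.0 + math.exp(-dot(w, x)))

// L_i(w) = -y ln(p) - (1 - y) ln(1 - p), where p = f_w(x_i).
def logLoss(p: Double, y: Double): Double =
  -y * math.log(p) - (1.0 - y) * math.log(1.0 - p)
```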
Logistic Regression: gradient descent
Gradient Descent
Basic idea:
- start with an initial guess of weight-vector w
- at iteration t, update w to a new weight-vector w':

      w' = w - η g_t

  where
  - g_t is the (vector) gradient of L(w) w.r.t. w at time t,
  - η is the learning rate.
Gradient Descent
Gradient g_t ⇒ step direction
Learning rate η ⇒ step size
Gradient Descent

    g_t = ∂L(w)/∂w = ∑_i ∂L_i(w)/∂w = ∑_i g_ti

This is Batch Gradient Descent (BGD):
- To make one weight-update, we need to compute the gradient over the entire training data-set.
- Repeat this until convergence.
BGD is not scalable to large data-sets.
Online (Stochastic) Gradient Descent (SGD)
A drastic simplification: shuffle the data-set (if not naturally shuffled). Then, instead of computing the gradient over the entire training data-set,

    g_t = ∑_i ∂L_i(w)/∂w,

and doing an update w' = w - η g_t, compute the gradient based on a single example,

    g_ti = ∂L_i(w)/∂w,

and do an update w' = w - η g_ti.
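A minimal sketch of one stochastic step, again reusing `predict` (names ours):

```scala
// One SGD step: update w using the gradient of a single example (x, y).
def sgdStep(
    w: Array[Double],
    x: Array[Double],
    y: Double,
    eta: Double
): Array[Double] = {
  val err = predict(w, x) - y   // gradient direction g_ti for log-loss
  w.indices.map(j => w(j) - eta * err * x(j)).toArray
}
```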
Batch vs Online Gradient Descent
Batch:
- to make one step, compute the gradient w.r.t. the entire data-set
- extremely slow updates
- correct gradient
Online:
- to make one step, compute the gradient w.r.t. one example
- extremely fast updates
- not necessarily correct gradient
Visualize Batch vs Stochastic Gradient Descent
Batch vs Online Learning
Batch Learning:
- process a large training data-set, generate a model
- use the model to predict labels of the test data-set
Drawbacks:
- infeasible/impractically slow for large data-sets
- need to repeat the batch process to update the model with new data
Batch vs Online Learning
Online Learning:
- for each "training" example:
  - generate a prediction (score),
  - compare with the true label,
  - update the model (weights w)
- for each "test" example:
  - predict with the latest learned model (weights w)
Batch vs Online Learning
Online Learning benefits:
- does not pre-process the entire training data-set
- does not explicitly retain previously-seen examples
- extremely light-weight: space- and time-efficient
- no distinct "training" and "testing" phases:
  - incremental, continual learning
  - adapts to changing patterns
  - easily updates an existing model with new data
- better generalization to unseen observations
The Online Learning Paradigm
As each labeled example (x_i, y_i) is seen,
- make a prediction given only the current weight-vector w
- update the weight-vector w
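A minimal sketch of this predict-then-update loop over a stream of examples, built on the `sgdStep` sketch above:

```scala
// Online learning: for each example, predict with the current weights,
// then update the weights with that same example.
def onlineLearn(
    stream: Iterator[(Array[Double], Double)],
    w0: Array[Double],
    eta: Double
): Array[Double] =
  stream.foldLeft(w0) { case (w, (x, y)) =>
    val p = predict(w, x)   // prediction uses only the current weight-vector;
                            // (p, y) could be logged for progressive validation
    sgdStep(w, x, y, eta)   // then the model is updated with this example
  }
```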
Online Learning: Use Scenarios
- extremely large data-sets where
  - batch learning is computationally infeasible/impractical, and
  - only a single pass over the data is possible
- data arrives in real-time, and
  - decisions/predictions must be made quickly
  - the learned model needs to adapt quickly to recent observations
Online Learning Examples
Online Learning Example: Advertising (MediaMath)
Listen to 100 billion ad-opportunities daily from Ad Exchanges.
For each opportunity, we need to predict whether the exposed user will buy, as a function of several features:
- hour_of_day, browser_type, geo_region, age, ...
Online learning benefits:
- fast update of the learned model to reflect the latest observations
- light-weight models that are extremely quick to compute
Online Learning: IoT
Vast amounts of data; need to adapt and respond quickly.
Nest Thermostats: behavior data ⇒ predict preferred room temperature
Self-driving Cars: (sensor data, other cars) ⇒ predict collision
Clinical: sensors (activity, vitals, ...) ⇒ predict cardiac event
Smart cities: traffic sensors ⇒ predict congestion
Online Learning: Challenge #1
Feature scale differences
Online Learning: Feature Scaling
Example from the wearable-devices domain:
- feature 1 = heart-rate, range 40 to 200
- feature 2 = step-count, range 0 to 500,000
Extreme scale differences ⇒ convergence problems.
Convergence is much faster when features are on the same scale:
- normalize each feature by dividing by its maximum possible value.
Online Learning: Feature Scaling
But often:
- the ranges of the features are not known in advance, and
- we cannot make a separate pass over the data to find the ranges.
⇒ We need single-pass algorithms that adaptively normalize features with each new observation.
[Ross, Mineiro, Langford 2013] proposed such an algorithm, which we implemented in our online ML system.
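A simplified sketch of the idea only (not the full algorithm from the paper, which also corrects the weights when a new maximum is seen): track the largest magnitude observed so far for each feature and scale incoming values by it on the fly.

```scala
import scala.collection.mutable

// Simplified online feature scaling: maintain a running max |x_j| per
// feature index and divide each incoming value by it. The weight
// corrections from [Ross, Mineiro, Langford 2013] are omitted here.
class OnlineScaler {
  private val maxAbs = mutable.Map.empty[Int, Double].withDefaultValue(0.0)

  def normalize(x: Map[Int, Double]): Map[Int, Double] =
    x.map { case (j, v) =>
      maxAbs(j) = math.max(maxAbs(j), math.abs(v))
      j -> (if (maxAbs(j) > 0) v / maxAbs(j) else 0.0)
    }
}
```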
Online Learning Challenge #2:
(Sparse) Feature frequency differences
Online Learning: Feature Frequency Differences
Some sparse features occur much more frequently than others, e.g.:
- a categorical feature country with 200 values,
  - encoded as a vector of length 200 with exactly one entry = 1, and the rest 0
  - country=USA may occur much more often than country=Belgium
- the indicator feature visited_site = 1 much more often than purchased = 1.
Online Learning: Feature Frequency Differences
Often, rare features are much more predictive than frequent features.
The same learning rate for all features ⇒ slow convergence.
⇒ Rare features should have larger learning rates:
- bigger steps whenever a rare feature is seen
- much faster convergence
Effectively, the algorithm pays more attention to rare features, enabling it to find rare but predictive features.
ADAGRAD is an algorithm for this [Duchi, Hazan, Singer 2010], and we implemented it in our learning system.
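A minimal sketch of an AdaGrad-style update on sparse gradients (class and parameter names are ours): each coordinate accumulates its own sum of squared gradients G_j, so rarely-seen features keep a large effective step size η / √G_j.

```scala
import scala.collection.mutable

// AdaGrad-style per-feature learning rates over sparse weights/gradients.
class AdaGradUpdater(eta: Double, eps: Double = 1e-8) {
  private val sumSq = mutable.Map.empty[Int, Double].withDefaultValue(0.0)

  def update(w: mutable.Map[Int, Double], grad: Map[Int, Double]): Unit =
    grad.foreach { case (j, g) =>
      sumSq(j) += g * g                          // accumulate G_j
      val step = eta / math.sqrt(sumSq(j) + eps) // per-feature rate
      w(j) = w.getOrElse(j, 0.0) - step * g
    }
}
```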
Online Learning Challenge #3:
Encoding sparse features
Online Learning: Sparse Features
E.g. site_domain has a large (unknown) set of possible values:
- google.com, yahoo.com, cnn.com, ...
- need to encode (conceptually) as 1-hot vectors, e.g.
  - google.com = (1, 0, 0, 0, ...)
  - yahoo.com = (0, 1, 0, 0, ...)
  - cnn.com = (0, 0, 1, 0, ...)
  - ...
- all possible values are not known in advance
- cannot pre-process the data to find all possible values
- don't want to encode explicit (long) vectors
Online Learning: Sparse Features, Hashing Trick
E.g. observation:
- country = "china" (categorical)
- age = 32 (numerical)
- domain = "google.com" (categorical)

Online Learning: Sparse Features, Hashing Trick
Hash the feature-names:
- hash("country_china") = 24378
- hash("age") = 32905
- hash("domain_google.com") = 84395
Online Learning: Sparse Features, Hashing Trick
Represent the observation as a (special) Map:
    {24378 → 1.0, 32905 → 32.0, 84395 → 1.0}
Sparse representation (no explicit vectors).
No need for a separate pass over the data (unlike Spark MLlib).
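A minimal sketch of this encoding (bucket count and helper names are ours; the hash values on the slide are illustrative, so actual indices will differ):

```scala
// Hash feature names into a fixed number of buckets; an observation
// becomes a sparse Map[Int, Double], with no dictionary built up front.
val numBuckets = 1 << 18                       // collisions are tolerated

def bucket(name: String): Int =
  ((name.hashCode % numBuckets) + numBuckets) % numBuckets

def encode(categorical: Map[String, String],
           numerical: Map[String, Double]): Map[Int, Double] =
  categorical.map { case (k, v) => bucket(s"${k}_$v") -> 1.0 } ++
    numerical.map { case (k, v) => bucket(k) -> v }

// e.g. encode(Map("country" -> "china", "domain" -> "google.com"),
//             Map("age" -> 32.0))
```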
Online Learning Challenge #4:
Distributed Implementation of Online Learning
Distributed Online Logistic Regression
Stochastic Gradient Descent (SGD) is inherently sequential:
- how to parallelize?
Our (Scala) implementation in Apache Spark:
- randomly re-partition the training data into shards
- use SGD to learn a model for each shard
- average the models using TreeReduce (~ "AllReduce")
- leverages Spark/Hadoop fault-tolerance.
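A simplified sketch of this scheme, reusing the `sgdStep` sketch from earlier (dense arrays for brevity; real code would also fold in the scaling and adaptive-rate machinery above, and mind closure serialization):

```scala
import org.apache.spark.rdd.RDD

// Per-shard SGD followed by a weighted model average via treeReduce.
def trainDistributed(
    data: RDD[(Array[Double], Double)],   // (features, label), pre-shuffled
    dim: Int,
    eta: Double
): Array[Double] = {
  val shardModels: RDD[(Array[Double], Long)] =
    data.mapPartitions { examples =>      // one sequential SGD model per shard
      var w = Array.fill(dim)(0.0)
      var n = 0L
      examples.foreach { case (x, y) => w = sgdStep(w, x, y, eta); n += 1 }
      Iterator((w, n))
    }
  val (sum, total) = shardModels
    .map { case (w, n) => (w.map(_ * n), n) }   // weight by shard size
    .treeReduce { case ((w1, n1), (w2, n2)) =>  // TreeReduce ~ "AllReduce"
      (w1.zip(w2).map { case (a, b) => a + b }, n1 + n2)
    }
  sum.map(_ / total)
}
```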
Slider: Fast Distributed Online Learning System

Slider
Fast, distributed, online, single-pass learning system.
- Written in Scala on top of Spark
- Works directly with Spark DataFrames
- Usable as a library within other JVM systems
- Leverages Spark/Hadoop fault-tolerance
- Stochastic Gradient Descent
- Online feature-scaling/normalization
- Adaptive (per-feature) learning-rates
- Single-pass
- Hashing-trick to encode sparse features
Slider, Vowpal Wabbit (VW), Spark ML (SML)
Fast, distributed, online, single-pass learning system. (Tags mark which of VW and SML also have each feature.)
- Written in Scala on top of Spark (SML)
- Works directly with Spark DataFrames (SML)
- Usable as a library within other JVM systems (SML)
- Leverages Spark/Hadoop fault-tolerance (SML)
- Stochastic Gradient Descent (SGD) (VW, SML)
- Online feature-scaling/normalization (VW)
- Adaptive (per-feature) learning-rates (VW)
- Single-pass (VW, SML)
- Hashing-trick to encode sparse features (VW)
Slider example
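The example slides themselves are images, so here is a purely hypothetical sketch of what single-pass, DataFrame-based usage could look like (every Slider identifier below is invented; only `spark.read.parquet` is real Spark API, assuming a SparkSession named `spark`):

```scala
// HYPOTHETICAL sketch only: SliderClassifier and its setters are invented
// names, not Slider's real API. It illustrates the advertised workflow:
// read a DataFrame, declare feature columns, train online in one pass.
val impressions = spark.read.parquet("impressions.parquet")

val model = new SliderClassifier()                  // invented class name
  .setLabelCol("converted")
  .setCategoricalCols(Seq("country", "site_domain"))
  .setNumericalCols(Seq("age", "hour_of_day"))
  .fit(impressions)                                 // single pass, online

val scored = model.transform(testImpressions)       // appends a score column
```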
Slider vs Spark ML
Task: predict conversion probability from ad-impression features
- 14M impressions from 1 ad campaign
- 17 categorical features, 2 numerical features
- train on the first 80%, test on the remaining 20%
Spark ML (using Pipelines):
- makes 17 passes over the data: one for each categorical feature
- trains and scores in 40 minutes
- need to specify iterations, etc.
- AUC = 0.52 on test data
Slider:
- makes just one pass over the data
- trains and scores in 5 minutes
- no tuning
- AUC = 0.68 on test data
Other Work
- Online version of k-means clustering
- FTRL algorithm (regularized alternative to SGD)
Ongoing/Future:
- Online learning with Spark Streaming
- Benchmarking vs other ML systems
Thank you
More Related Content

Similar to Fast Distributed Online Classification

Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...Edureka!
Ā 
How can we train with few data
How can we train with few dataHow can we train with few data
How can we train with few dataDong Heon Cho
Ā 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Simplilearn
Ā 
Scalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data MiningScalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data MiningGerard de Melo
Ā 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithmsArunangsu Sahu
Ā 
Reinforcement Learning in Practice: Contextual Bandits
Reinforcement Learning in Practice: Contextual BanditsReinforcement Learning in Practice: Contextual Bandits
Reinforcement Learning in Practice: Contextual BanditsMax Pagels
Ā 
Competition 1 (blog 1)
Competition 1 (blog 1)Competition 1 (blog 1)
Competition 1 (blog 1)TarunPaparaju
Ā 
Machine Learning at Geeky Base 2
Machine Learning at Geeky Base 2Machine Learning at Geeky Base 2
Machine Learning at Geeky Base 2Kan Ouivirach, Ph.D.
Ā 
Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...
Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...
Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...Edureka!
Ā 
Overfitting and-tbl
Overfitting and-tblOverfitting and-tbl
Overfitting and-tblDigvijay Singh
Ā 
Big Data Science - hype?
Big Data Science - hype?Big Data Science - hype?
Big Data Science - hype?BalaBit
Ā 
WEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been LearnedWEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been Learnedweka Content
Ā 
WEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedWEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedDataminingTools Inc
Ā 
Search Engines
Search EnginesSearch Engines
Search Enginesbutest
Ā 
Recommender Systems from A to Z ā€“ Model Training
Recommender Systems from A to Z ā€“ Model TrainingRecommender Systems from A to Z ā€“ Model Training
Recommender Systems from A to Z ā€“ Model TrainingCrossing Minds
Ā 
Feature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performanceFeature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performanceVenkat Projects
Ā 
Artificial Intelligence.pptx
Artificial Intelligence.pptxArtificial Intelligence.pptx
Artificial Intelligence.pptxKaviya452563
Ā 
Think-Aloud Protocols
Think-Aloud ProtocolsThink-Aloud Protocols
Think-Aloud Protocolsbutest
Ā 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionMargaret Wang
Ā 

Similar to Fast Distributed Online Classification (20)

Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Ā 
How can we train with few data
How can we train with few dataHow can we train with few data
How can we train with few data
Ā 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Ā 
Scalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data MiningScalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data Mining
Ā 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
Ā 
Reinforcement Learning in Practice: Contextual Bandits
Reinforcement Learning in Practice: Contextual BanditsReinforcement Learning in Practice: Contextual Bandits
Reinforcement Learning in Practice: Contextual Bandits
Ā 
Competition 1 (blog 1)
Competition 1 (blog 1)Competition 1 (blog 1)
Competition 1 (blog 1)
Ā 
Machine Learning at Geeky Base 2
Machine Learning at Geeky Base 2Machine Learning at Geeky Base 2
Machine Learning at Geeky Base 2
Ā 
Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...
Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...
Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...
Ā 
Overfitting and-tbl
Overfitting and-tblOverfitting and-tbl
Overfitting and-tbl
Ā 
Big Data Science - hype?
Big Data Science - hype?Big Data Science - hype?
Big Data Science - hype?
Ā 
WEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been LearnedWEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been Learned
Ā 
WEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedWEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been Learned
Ā 
Search Engines
Search EnginesSearch Engines
Search Engines
Ā 
Recommender Systems from A to Z ā€“ Model Training
Recommender Systems from A to Z ā€“ Model TrainingRecommender Systems from A to Z ā€“ Model Training
Recommender Systems from A to Z ā€“ Model Training
Ā 
Feature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performanceFeature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performance
Ā 
Kaggle KDD Cup Report
Kaggle KDD Cup ReportKaggle KDD Cup Report
Kaggle KDD Cup Report
Ā 
Artificial Intelligence.pptx
Artificial Intelligence.pptxArtificial Intelligence.pptx
Artificial Intelligence.pptx
Ā 
Think-Aloud Protocols
Think-Aloud ProtocolsThink-Aloud Protocols
Think-Aloud Protocols
Ā 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and prediction
Ā 

More from DataWorks Summit/Hadoop Summit

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionDataWorks Summit/Hadoop Summit
Ā 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinDataWorks Summit/Hadoop Summit
Ā 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
Ā 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
Ā 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinDataWorks Summit/Hadoop Summit
Ā 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
Ā 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
Ā 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
Ā 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
Ā 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient DataWorks Summit/Hadoop Summit
Ā 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
Ā 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopDataWorks Summit/Hadoop Summit
Ā 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
Ā 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
Ā 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
Ā 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
Ā 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
Ā 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
Ā 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
Ā 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
Ā 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
Ā 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
Ā 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
Ā 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
Ā 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Ā 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Ā 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
Ā 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
Ā 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
Ā 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
Ā 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Ā 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
Ā 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
Ā 

Recently uploaded

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
Ā 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
Ā 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
Ā 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
Ā 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
Ā 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
Ā 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
Ā 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
Ā 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
Ā 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
Ā 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
Ā 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
Ā 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
Ā 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
Ā 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
Ā 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
Ā 
Scaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organizationScaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organizationRadu Cotescu
Ā 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
Ā 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
Ā 

Recently uploaded (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
Ā 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Ā 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
Ā 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
Ā 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
Ā 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Ā 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
Ā 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
Ā 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
Ā 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Ā 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
Ā 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Ā 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
Ā 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
Ā 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Ā 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
Ā 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
Ā 
Scaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organizationScaling API-first ā€“ The story of a global engineering organization
Scaling API-first ā€“ The story of a global engineering organization
Ā 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
Ā 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
Ā 

Fast Distributed Online Classification

  • 1. Fast Distributed Online Classiļ¬cation Ram Sriharsha (Product Manager, Apache Spark, Databricks) Prasad Chalasani (SVP Data Science, MediaMath) 13 April, 2016
  • 2. Summary We leveraged recent machine-learning research to develop a I fast, practical, I scalable (up to 100s of Millions of sparse features) I online, I distributed (built on Apache Spark), I single-pass, ML classiļ¬er that has signiļ¬cant advantages over most similar ML packages.
  • 3. Key Conceptual Take-aways I Supervised Machine Learning I Online vs Batch Learning, and importance of Online I Challenges in online-learning I Distributed implementation in Spark.
  • 4. Supervised Machine Learning: Overview Given: I labeled training data, Goal: I ļ¬t a model to predict labels on (unseen) test data.
  • 5. Supervised Machine Learning: Overview Given: I training data D: n labeled examples {(x1, y1), (x2, y2), . . . , (xn, yn)} where I xi is a k-dimensional feature-vector I yi is the label (0 or 1) that we want to predict. I an error (or loss) metric L(p, y) from predicting p when true label is y.
  • 6. Supervised Machine Learning: Overview Given: I training data D: n labeled examples {(x1, y1), (x2, y2), . . . , (xn, yn)} where I xi is a k-dimensional feature-vector I yi is the label (0 or 1) that we want to predict. I an error (or loss) metric L(p, y) from predicting p when true label is y. Fix a family of functions fw (x) œ F that are parametrised by a weight-vector w.
  • 7. Supervised Machine Learning: Overview Given: I training data D: n labeled examples {(x1, y1), (x2, y2), . . . , (xn, yn)} where I xi is a k-dimensional feature-vector I yi is the label (0 or 1) that we want to predict. I an error (or loss) metric L(p, y) from predicting p when true label is y. Fix a family of functions fw (x) œ F that are parametrised by a weight-vector w. Goal: ļ¬nd w that minimizes average loss over D: L(w) = 1 n nĆæ i=1 Li (w) = 1 n nĆæ i=1 L(fw (xi ), yi ).
  • 8. Logistic Regression Logistic model fw (x) = 1 1+eā‰ wĀ·x Probability interpretation
  • 9. Logistic Regression Logistic model fw (x) = 1 1+eā‰ wĀ·x Probability interpretation Loss Function Li (w) = ā‰ yi ln(fw (xi )) ā‰  (1 ā‰  yi ) ln(1 ā‰  fw (xi ))
  • 10. Logistic Regression Logistic model fw (x) = 1 1+eā‰ wĀ·x Probability interpretation Loss Function Li (w) = ā‰ yi ln(fw (xi )) ā‰  (1 ā‰  yi ) ln(1 ā‰  fw (xi )) Overall Loss L(w) = qn i=1 Li (w)
  • 11. Logistic Regression Logistic model fw (x) = 1 1+eā‰ wĀ·x Probability interpretation Loss Function Li (w) = ā‰ yi ln(fw (xi )) ā‰  (1 ā‰  yi ) ln(1 ā‰  fw (xi )) Overall Loss L(w) = qn i=1 Li (w) L(w) is convex: I no local minima I di erentiate and follow gradients
  • 13. Gradient Descent Basic idea: I start with an initial guess of weight-vector w I at iteration t, update w to a new weight-vector wƕ: wƕ = w ā‰  ā„gt where I gt is the (vector) gradient of L(w) w.r.t. w at time t, I ā„ is the learning rate.
  • 14. Gradient Descent Gradient gt =āˆ† step direction Learning rate ā„ =āˆ† step size
  • 16. Gradient Descent gt = Ė†L(w) Ė†w = Ćæ i Ė†Li (w) Ė†w = Ćæ i gti This is Batch Gradient Descent (BGD) : I To make one weight-update, need to compute gradient over entire training data-set. I repeat this until convergence.
  • 17. Gradient Descent gt = Ė†L(w) Ė†w = Ćæ i Ė†Li (w) Ė†w = Ćæ i gti This is Batch Gradient Descent (BGD) : I To make one weight-update, need to compute gradient over entire training data-set. I repeat this until convergence. BGD is not scalable to large data-sets.
  • 18. Online (Stochastic) Gradient Descent (SGD) A drastic simpliļ¬cation: Instead of computing gradient based on entire training data-set gt = Ćæ i Ė†Li (w) Ė†w , and doing an update wƕ = w ā‰  ā„gt.
  • 19. Online (Stochastic) Gradient Descent (SGD) A drastic simpliļ¬cation: Shu e data-set (if not naturally shu ed), Compute gradient based on a one example gti = Ė†Li (w) Ė†w , and do an update wƕ = w ā‰  ā„gti .
  • 20. Batch vs Online Gradient Descent Batch: I to make one step, compute gradient w.r.t. entire data-set I extremely slow updates I correct gradient
  • 21. Batch vs Online Gradient Descent Batch: I to make one step, compute gradient w.r.t. entire data-set I extremely slow updates I correct gradient Online: I to make one step, compute gradient w.r.t. one example: I extremely fast updates I not necessarily correct gradient
  • 22. Visualize Batch vs Stochastic Gradient Descent
  • 23. Batch vs Online Learning Batch Learning: I process a large training data-set, generate a model I use model to predict labels of test data-set
  • 24. Batch vs Online Learning Batch Learning: I process a large training data-set, generate a model I use model to predict labels of test data-set Drawbacks: I infeasible/impractically slow for large data-sets. I need to repeat batch process to update model with new data
  • 25. Batch vs Online Learning Online Learning: I for each ā€œtrainingā€ example:
  • 26. Batch vs Online Learning Online Learning: I for each ā€œtrainingā€ example: I generate prediction (score),
  • 27. Batch vs Online Learning Online Learning: I for each ā€œtrainingā€ example: I generate prediction (score), I compare with true label,
  • 28. Batch vs Online Learning Online Learning: I for each ā€œtrainingā€ example: I generate prediction (score), I compare with true label, I update model (weights w)
  • 29. Batch vs Online Learning Online Learning: I for each ā€œtrainingā€ example: I generate prediction (score), I compare with true label, I update model (weights w) I for each ā€œtestā€ example:
  • 30. Batch vs Online Learning Online Learning: I for each ā€œtrainingā€ example: I generate prediction (score), I compare with true label, I update model (weights w) I for each ā€œtestā€ example: I predict with latest learned model (weights w)
  • 31. Batch vs Online Learning Online Learning beneļ¬ts: I does not pre-process enire training data-set
  • 32. Batch vs Online Learning Online Learning beneļ¬ts: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples
  • 33. Batch vs Online Learning Online Learning beneļ¬ts: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples I extremely light-weight: space and time-e cient
  • 34. Batch vs Online Learning Online Learning beneļ¬ts: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples I extremely light-weight: space and time-e cient I no distinct ā€œtrainingā€ and ā€œtestingā€ phases:
  • 35. Batch vs Online Learning Online Learning beneļ¬ts: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples I extremely light-weight: space and time-e cient I no distinct ā€œtrainingā€ and ā€œtestingā€ phases: I incremental, continual learning
  • 36. Batch vs Online Learning Online Learning beneļ¬ts: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples I extremely light-weight: space and time-e cient I no distinct ā€œtrainingā€ and ā€œtestingā€ phases: I incremental, continual learning I adapts to changing patterns
  • 37. Batch vs Online Learning Online Learning beneļ¬ts: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples I extremely light-weight: space and time-e cient I no distinct ā€œtrainingā€ and ā€œtestingā€ phases: I incremental, continual learning I adapts to changing patterns I easily update existing model with new data
  • 38. Batch vs Online Learning Online Learning beneļ¬ts: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples I extremely light-weight: space and time-e cient I no distinct ā€œtrainingā€ and ā€œtestingā€ phases: I incremental, continual learning I adapts to changing patterns I easily update existing model with new data I better generalization to unseen observations.
  • 39. The Online Learning Paradigm As each labeled example (xi , yi ) is seen, I make prediction given only current weight-vector w I update weight-vector w
  • 40. Online Learning: Use Scenarios I extremely large data-sets where
  • 41. Online Learning: Use Scenarios I extremely large data-sets where I batch learning is computationally infeasible/impractical, and
  • 42. Online Learning: Use Scenarios I extremely large data-sets where I batch learning is computationally infeasible/impractical, and I itā€™s possible to only do a single pass over the data.
  • 43. Online Learning: Use Scenarios I extremely large data-sets where I batch learning is computationally infeasible/impractical, and I itā€™s possible to only do a single pass over the data. I data arrives in real-time, and
  • 44. Online Learning: Use Scenarios I extremely large data-sets where I batch learning is computationally infeasible/impractical, and I itā€™s possible to only do a single pass over the data. I data arrives in real-time, and I decisions/predictions must be made quickly
  • 45. Online Learning: Use Scenarios I extremely large data-sets where I batch learning is computationally infeasible/impractical, and I itā€™s possible to only do a single pass over the data. I data arrives in real-time, and I decisions/predictions must be made quickly I learned model needs to adapt quickly to recent observations.
  • 47. Online Learning Example: Advertising (MediaMath) Listen to 100 billion ad-opportunities daily from Ad Exchanges.
  • 48. Online Learning Example: Advertising (MediaMath) Listen to 100 billion ad-opportunities daily from Ad Exchanges. For each opportunity, need to predict whether exposed user will buy, as function of several features: I hour_of_day, browser_type, geo_region, age, . . .
  • 49. Online Learning Example: Advertising (MediaMath) Listen to 100 billion ad-opportunities daily from Ad Exchanges. For each opportunity, need to predict whether exposed user will buy, as function of several features: I hour_of_day, browser_type, geo_region, age, . . . Online learning beneļ¬ts: I fast update of learned model to reļ¬‚ect latest observations
  • 50. Online Learning Example: Advertising (MediaMath) Listen to 100 billion ad-opportunities daily from Ad Exchanges. For each opportunity, need to predict whether exposed user will buy, as function of several features: I hour_of_day, browser_type, geo_region, age, . . . Online learning beneļ¬ts: I fast update of learned model to reļ¬‚ect latest observations I light-weight models extremely quick to compute
  • 55. Online Learning: IoT
Vast amounts of data; need to adapt, respond quickly.
Nest Thermostats: behavior data ⇒ predict preferred room temp.
Self-driving Cars: (sensor data, other cars) ⇒ predict collision
Clinical: sensors (activity, vitals, . . . ) ⇒ predict cardiac event
Smart cities: traffic sensors ⇒ predict congestion
  • 56. Online Learning: Challenge #1
Feature scale differences
  • 58. Online Learning: Feature Scaling
Example from the wearable-devices domain:
I feature 1 = heart-rate, range 40 to 200
I feature 2 = step-count, range 0 to 500,000
Extreme scale differences ⇒ convergence problems.
Convergence is much faster when features are on the same scale:
I normalize each feature by dividing by its max possible value.
  • 60. Online Learning: Feature Scaling
But often:
I the range of features is not known in advance, and
I we cannot make a separate pass over the data to find ranges.
⇒ Need single-pass algorithms that adaptively normalize features with each new observation.
[Ross, Mineiro, Langford 2013] proposed such an algorithm, which we implemented in our online ML system.
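As a rough illustration, a single-pass scaler can keep a running maximum magnitude per feature and divide each incoming value by it. This is a heavy simplification of the cited paper: the actual algorithm of [Ross, Mineiro, Langford 2013] also rescales the already-learned weights when a feature's observed range grows, a step this sketch omits. The class name is ours.

```scala
// Simplified single-pass adaptive scaling: track the largest magnitude seen
// so far per feature and normalize each new value by it on the fly.
import scala.collection.mutable

class RunningScaler {
  private val maxAbs = mutable.Map.empty[Int, Double]

  def normalize(x: Map[Int, Double]): Map[Int, Double] =
    x.map { case (i, v) =>
      val m = math.max(maxAbs.getOrElse(i, 0.0), math.abs(v))
      maxAbs(i) = m                        // remember the new running max
      i -> (if (m > 0.0) v / m else 0.0)   // scale into roughly [-1, 1]
    }
}
```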
  • 61. Online Learning Challenge #2:
(Sparse) Feature frequency differences
  • 65. Online Learning: Feature Frequency Differences
Some sparse features occur much more frequently than others, e.g.:
I categorical feature country with 200 values,
I encoded as a vector of length 200 with exactly one entry = 1, and the rest 0
I country=USA may occur much more often than country=Belgium
I indicator feature visited_site = 1 much more often than purchased = 1.
  • 69. Online Learning: Feature Frequency Differences
Often, rare features are much more predictive than frequent features.
The same learning rate for all features ⇒ slow convergence.
⇒ rare features should have larger learning rates:
I bigger steps whenever a rare feature is seen
I much faster convergence
Effectively, the algorithm pays more attention to rare features, enabling it to find rare but predictive features.
ADAGRAD is an algorithm for this [Duchi, Hazan, Singer 2010], and we implemented it in our learning system.
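Here is a sketch of the per-feature learning-rate idea in the AdaGrad style: each coordinate accumulates its own squared gradients, so its effective step size eta / sqrt(accumulated squared gradients) shrinks quickly for frequent features and stays large for rare ones. Class and parameter names are illustrative, not the package's API.

```scala
// AdaGrad-style per-feature learning rates: frequently-seen coordinates take
// ever-smaller steps; rarely-seen coordinates keep a large step size.
import scala.collection.mutable

class AdaGradUpdater(eta: Double = 0.1, eps: Double = 1e-8) {
  private val sumSq = mutable.Map.empty[Int, Double].withDefaultValue(0.0)

  // grad holds the per-example gradient, restricted to the active features
  def update(w: mutable.Map[Int, Double], grad: Map[Int, Double]): Unit =
    grad.foreach { case (i, g) =>
      sumSq(i) += g * g
      w(i) = w.getOrElse(i, 0.0) - eta * g / (math.sqrt(sumSq(i)) + eps)
    }
}
```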
  • 70. Online Learning Challenge #3: Encoding sparse features
  • 79. Online Learning: Sparse Features
E.g. site_domain has a large (unknown) set of possible values:
I google.com, yahoo.com, cnn.com, . . .
I need to encode (conceptually) as 1-hot vectors, e.g.
I google.com = (1, 0, 0, 0, ... )
I yahoo.com = (0, 1, 0, 0, ... )
I cnn.com = (0, 0, 1, 0, ... )
I . . .
I all possible values are not known in advance
I cannot pre-process data to find all possible values
I don't want to encode explicit (long) vectors
  • 80. Online Learning: Sparse Features, Hashing Trick
E.g. observation:
I country = "china" (categorical)
I age = 32 (numerical)
I domain = "google.com" (categorical)
  • 81. Online Learning: Sparse Features, Hashing Trick
Hash the feature-names:
I hash("country_china") = 24378
I hash("age") = 32905
I hash("domain_google.com") = 84395
  • 84. Online Learning: Sparse Features, Hashing Trick
Represent the observation as a (special) Map:
{24378 → 1.0, 32905 → 32.0, 84395 → 1.0}
I Sparse representation (no explicit vectors)
I No need for a separate pass over the data (unlike Spark MLlib)
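A minimal Scala sketch of the hashing trick: feature names hash directly to indices in a fixed number of buckets, so no dictionary of possible values is ever built or stored. The object name, bucket count, and use of hashCode are our illustrative choices.

```scala
// Hashing trick: map feature names straight to indices; hash collisions
// are simply tolerated, and unseen values need no special handling.
object FeatureHasher {
  val numBuckets = 1 << 18 // 262,144 buckets; a tunable size assumption

  private def idx(name: String): Int =
    (name.hashCode & Int.MaxValue) % numBuckets // mask keeps the index non-negative

  // categorical features hash "name_value" with weight 1.0;
  // numerical features hash the name and keep their value
  def encode(categorical: Map[String, String],
             numerical: Map[String, Double]): Map[Int, Double] =
    categorical.map { case (k, v) => idx(s"${k}_$v") -> 1.0 } ++
      numerical.map { case (k, v) => idx(k) -> v }
}
```

For example, FeatureHasher.encode(Map("country" -> "china", "domain" -> "google.com"), Map("age" -> 32.0)) yields a sparse index → value map analogous to the one above (the slide's specific indices are illustrative).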
  • 85. Online Learning Challenge #4: Distributed Implementation of Online Learning
  • 87. Distributed Online Logistic Regression
Stochastic Gradient Descent (SGD) is inherently sequential:
I how to parallelize?
Our (Scala) implementation in Apache Spark:
I randomly re-partition training data into shards
I use SGD to learn a model for each shard
I average models using TreeReduce (~ "AllReduce")
I leverages Spark/Hadoop fault-tolerance.
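A sketch of this shard-and-average scheme over a Spark RDD of labeled examples (sparse Map[Int, Double] features, 0/1 labels). The plain weight-averaging and all names here are our simplification of the AllReduce-style step, not the system's actual code.

```scala
// Shard the data, run sequential SGD independently on each shard, then
// combine the per-shard models with a fault-tolerant tree aggregation.
import org.apache.spark.rdd.RDD

object ShardedSGD extends Serializable {
  // sequential SGD over one partition, as in the single-machine sketch earlier
  def trainShard(eta: Double)(it: Iterator[(Map[Int, Double], Double)]): Map[Int, Double] =
    it.foldLeft(Map.empty[Int, Double]) { case (w, (x, y)) =>
      val p = 1.0 / (1.0 + math.exp(-x.map { case (i, v) => w.getOrElse(i, 0.0) * v }.sum))
      x.foldLeft(w) { case (acc, (i, v)) =>
        acc.updated(i, acc.getOrElse(i, 0.0) - eta * (p - y) * v)
      }
    }

  def fit(data: RDD[(Map[Int, Double], Double)],
          numShards: Int, eta: Double = 0.1): Map[Int, Double] = {
    val summed = data
      .repartition(numShards)                              // random shards
      .mapPartitions(it => Iterator(trainShard(eta)(it)))  // one local model per shard
      .treeReduce { (a, b) =>                              // tree aggregation (~ AllReduce)
        (a.keySet ++ b.keySet).map(i => i -> (a.getOrElse(i, 0.0) + b.getOrElse(i, 0.0))).toMap
      }
    summed.map { case (i, v) => i -> v / numShards }       // model averaging
  }
}
```

Because each shard is an independent sequential pass and the combine step is an ordinary Spark reduce, failed tasks are simply re-run, inheriting Spark's fault-tolerance.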
  • 89. Slider
Fast, distributed, online, single-pass learning system.
I Written in Scala on top of Spark
I Works directly with Spark DataFrames
I Usable as a library within other JVM systems
I Leverages Spark/Hadoop fault-tolerance
I Stochastic Gradient Descent
I Online feature-scaling/normalization
I Adaptive (per-feature) learning-rates
I Single-pass
I Hashing trick to encode sparse features
  • 90. Slider vs Vowpal Wabbit (VW) and Spark-ML (SML)
Slider's features, with the systems that share each one in parentheses:
I Written in Scala on top of Spark (SML)
I Works directly with Spark DataFrames (SML)
I Usable as a library within other JVM systems (SML)
I Leverages Spark/Hadoop fault-tolerance (SML)
I Stochastic Gradient Descent (VW, SML)
I Online feature-scaling/normalization (VW)
I Adaptive (per-feature) learning-rates (VW)
I Single-pass (VW, SML)
I Hashing trick to encode sparse features (VW)
  • 95. Slider vs Spark ML
Task: predict conversion probability from ad-impression features
I 14M impressions from 1 ad campaign
I 17 categorical features, 2 numerical features
I Train on first 80%, test on remaining 20%
  • 96. Slider vs Spark ML
Spark ML (using Pipelines):
I makes 17 passes over the data: one for each categorical feature
I trains and scores in 40 minutes
I need to specify iterations, etc.
I AUC = 0.52 on test data
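For reference, an illustrative Spark ML pipeline for this kind of task looks roughly as follows. Column names are hypothetical; the point is that each StringIndexer is an Estimator that must scan the data once to build its dictionary of values, which is where the one-pass-per-categorical-feature cost comes from.

```scala
// Sketch of a Spark ML (Pipelines) setup for categorical + numerical features.
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}

val catCols = Seq("country", "browser_type", "site_domain") // ...17 in all
// one indexing pass over the data per categorical column
val indexers = catCols.map(c => new StringIndexer().setInputCol(c).setOutputCol(c + "_idx"))
val encoders = catCols.map(c => new OneHotEncoder().setInputCol(c + "_idx").setOutputCol(c + "_vec"))
val assembler = new VectorAssembler()
  .setInputCols((catCols.map(_ + "_vec") ++ Seq("age", "hour_of_day")).toArray)
  .setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(100) // iteration count must be tuned by hand

val stages: Array[PipelineStage] = (indexers ++ encoders ++ Seq(assembler, lr)).toArray
val pipeline = new Pipeline().setStages(stages)
// val model = pipeline.fit(trainDF) // trainDF: a DataFrame with the columns above
```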
  • 97. Slider vs Spark ML
Slider:
I makes just one pass over the data
I trains and scores in 5 minutes
I no tuning
I AUC = 0.68 on test data
  • 98. Other Work
I Online version of k-means clustering
I FTRL algorithm (regularized alternative to SGD)
Ongoing/Future:
I Online learning with Spark Streaming
I Benchmarking vs other ML systems