Fast Distributed Online Classification
Ram Sriharsha (Product Manager, Apache Spark, Databricks)
Prasad Chalasani (SVP Data Science, MediaMath)
13 April, 2016
Summary
We leveraged recent machine-learning research to develop a
- fast, practical,
- scalable (up to 100s of millions of sparse features),
- online,
- distributed (built on Apache Spark),
- single-pass
ML classifier that has significant advantages over most similar ML packages.
Key Conceptual Take-aways
- Supervised Machine Learning
- Online vs Batch Learning, and the importance of Online
- Challenges in online learning
- Distributed implementation in Spark
Supervised Machine Learning: Overview
Given:
- labeled training data.
Goal:
- fit a model to predict labels on (unseen) test data.
Supervised Machine Learning: Overview
Given:
- training data D: n labeled examples {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where
  - x_i is a k-dimensional feature-vector
  - y_i is the label (0 or 1) that we want to predict
- an error (or loss) metric L(p, y) from predicting p when the true label is y.
Fix a family of functions f_w(x) ∈ F that are parametrised by a weight-vector w.
Goal: find w that minimizes the average loss over D:
L(w) = (1/n) Σ_{i=1}^n L_i(w) = (1/n) Σ_{i=1}^n L(f_w(x_i), y_i).
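To make the objective concrete, here is a minimal Scala sketch of the average-loss computation (Scala because the system described later in these slides is written in it). All names and types here are illustrative, not from any particular library:

// Average loss of a model over a labeled dataset D.
// `predict` plays the role of f_w and `loss` the role of L(p, y).
object Objective {
  type Features = Array[Double]

  def averageLoss(
      data: Seq[(Features, Double)],        // labeled examples (x_i, y_i)
      predict: Features => Double,          // f_w(x)
      loss: (Double, Double) => Double      // L(p, y)
  ): Double =
    data.map { case (x, y) => loss(predict(x), y) }.sum / data.size
}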
Logistic Regression
Logistic model: f_w(x) = 1 / (1 + e^(−w·x))
Probability interpretation: f_w(x) estimates P(y = 1 | x).
Loss function: L_i(w) = −y_i ln(f_w(x_i)) − (1 − y_i) ln(1 − f_w(x_i))
Overall loss: L(w) = Σ_{i=1}^n L_i(w)
L(w) is convex:
- no local minima
- differentiate and follow gradients
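Differentiating L_i gives the well-known per-example gradient ∂L_i/∂w = (f_w(x_i) − y_i) x_i. A direct Scala transcription of these formulas (illustrative names only):

// Logistic model, log loss, and per-example gradient.
object Logistic {
  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  def predict(w: Array[Double], x: Array[Double]): Double =   // f_w(x) = sigmoid(w · x)
    sigmoid(w.zip(x).map { case (wi, xi) => wi * xi }.sum)

  def logLoss(p: Double, y: Double): Double =                 // L(p, y)
    -y * math.log(p) - (1.0 - y) * math.log(1.0 - p)

  def gradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
    val err = predict(w, x) - y                               // (f_w(x_i) − y_i)
    x.map(_ * err)                                            // ... times x_i
  }
}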
Logistic Regression: gradient descent
Gradient Descent
Basic idea:
- start with an initial guess of the weight-vector w
- at iteration t, update w to a new weight-vector w′:
  w′ = w − λ g_t
where
- g_t is the (vector) gradient of L(w) w.r.t. w at time t,
- λ is the learning rate.
Gradient Descent
Gradient g_t ⇒ step direction
Learning rate λ ⇒ step size
Gradient Descent
g_t = ∂L(w)/∂w = Σ_i ∂L_i(w)/∂w = Σ_i g_{t,i}
This is Batch Gradient Descent (BGD):
- to make one weight-update, need to compute the gradient over the entire training data-set
- repeat this until convergence.
BGD is not scalable to large data-sets.
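A minimal sketch of one BGD update, reusing Logistic.gradient from the sketch above (illustrative, single-machine code):

// One batch update: sum per-example gradients over the whole dataset,
// then take a single step w′ = w − λ g_t.
def bgdStep(
    w: Array[Double],
    data: Seq[(Array[Double], Double)],   // all (x_i, y_i) pairs
    lambda: Double
): Array[Double] = {
  val g = data
    .map { case (x, y) => Logistic.gradient(w, x, y) }            // g_{t,i}
    .reduce((a, b) => a.zip(b).map { case (ai, bi) => ai + bi })  // g_t = Σ_i g_{t,i}
  w.zip(g).map { case (wi, gi) => wi - lambda * gi }
}

Note that every single update touches the entire dataset; this is exactly the cost that makes BGD impractical at scale.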
Online (Stochastic) Gradient Descent (SGD)
A drastic simplification: instead of computing the gradient over the entire training data-set,
g_t = Σ_i ∂L_i(w)/∂w,
and doing an update w′ = w − λ g_t, shuffle the data-set (if not naturally shuffled), compute the gradient from a single example,
g_{t,i} = ∂L_i(w)/∂w,
and do an update w′ = w − λ g_{t,i}.
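The corresponding SGD step touches one example (illustrative sketch, again reusing Logistic.gradient):

// One stochastic step: same update rule, but with the gradient of one example.
def sgdStep(
    w: Array[Double],
    x: Array[Double],
    y: Double,
    lambda: Double
): Array[Double] = {
  val g = Logistic.gradient(w, x, y)                   // g_{t,i} from one example
  w.zip(g).map { case (wi, gi) => wi - lambda * gi }   // w′ = w − λ g_{t,i}
}

A single pass over the data is then just a fold:
data.foldLeft(w0) { case (w, (x, y)) => sgdStep(w, x, y, 0.1) }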
Batch vs Online Gradient Descent
Batch:
- to make one step, compute the gradient w.r.t. the entire data-set
- extremely slow updates
- correct gradient
Online:
- to make one step, compute the gradient w.r.t. one example
- extremely fast updates
- not necessarily correct gradient
Visualize Batch vs Stochastic Gradient Descent
Batch vs Online Learning
Batch Learning:
- process a large training data-set, generate a model
- use the model to predict labels of the test data-set
Drawbacks:
- infeasible/impractically slow for large data-sets
- need to repeat the batch process to update the model with new data
Batch vs Online Learning
Online Learning:
- for each "training" example:
  - generate a prediction (score),
  - compare it with the true label,
  - update the model (weights w)
- for each "test" example:
  - predict with the latest learned model (weights w)
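A self-contained sketch of this predict-compare-update loop (illustrative Scala). Because each example is scored before its label is used, the running losses double as an honest estimate of out-of-sample error:

// Online pass: predict with current w, observe the label, then update.
def onlinePass(
    w0: Array[Double],
    stream: Iterator[(Array[Double], Double)],
    lambda: Double
): Array[Double] =
  stream.foldLeft(w0) { case (w, (x, y)) =>
    val p = 1.0 / (1.0 + math.exp(-w.zip(x).map { case (a, b) => a * b }.sum)) // predict first
    val err = p - y                                                            // then compare
    w.zip(x).map { case (wi, xi) => wi - lambda * err * xi }                   // then update
  }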
Batch vs Online Learning
Online Learning benefits:
- does not pre-process the entire training data-set
- does not explicitly retain previously-seen examples
- extremely light-weight: space- and time-efficient
- no distinct "training" and "testing" phases:
  - incremental, continual learning
  - adapts to changing patterns
  - easily updates an existing model with new data
- better generalization to unseen observations.
The Online Learning Paradigm
As each labeled example (x_i, y_i) is seen:
- make a prediction using only the current weight-vector w
- update the weight-vector w
Online Learning: Use Scenarios
- extremely large data-sets, where
  - batch learning is computationally infeasible/impractical, and
  - only a single pass over the data is possible
- data arrives in real-time, and
  - decisions/predictions must be made quickly
  - the learned model needs to adapt quickly to recent observations.
Online Learning Examples
Online Learning Example: Advertising (MediaMath)
Listen to 100 billion ad-opportunities daily from Ad Exchanges.
For each opportunity, predict whether the exposed user will buy, as a function of several features:
- hour_of_day, browser_type, geo_region, age, ...
Online learning benefits:
- fast updates of the learned model to reflect the latest observations
- light-weight models that are extremely quick to compute
Online Learning: IoT
Vast amounts of data; need to adapt and respond quickly.
Nest Thermostats: behavior data ⇒ predict preferred room temperature
Self-driving cars: (sensor data, other cars) ⇒ predict collisions
Clinical: sensors (activity, vitals, ...) ⇒ predict cardiac events
Smart cities: traffic sensors ⇒ predict congestion
Online Learning: Challenge #1
Feature scale differences
Online Learning: Feature Scaling
Example from the wearable-devices domain:
- feature 1 = heart-rate, range 40 to 200
- feature 2 = step-count, range 0 to 500,000
Extreme scale differences ⇒ convergence problems.
Convergence is much faster when features are on the same scale:
- normalize each feature by dividing by its max possible value.
Online Learning: Feature Scaling
But often:
- the ranges of features are not known in advance, and
- we cannot make a separate pass over the data to find the ranges.
⇒ Need single-pass algorithms that adaptively normalize features with each new observation.
[Ross, Mineiro, Langford 2013] proposed such an algorithm, which we implemented in our online ML system.
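The details of the [Ross, Mineiro, Langford 2013] update are beyond these slides; purely to convey the flavor, here is a much-simplified sketch that tracks a running per-feature maximum and rescales each incoming value on the fly. This is not the paper's algorithm (which, among other things, also corrects the weights already learned when a feature's observed range grows):

import scala.collection.mutable

// Simplified single-pass normalizer: running max |value| per feature.
// NOT the Ross-Mineiro-Langford algorithm; a sketch of the idea only.
final class RunningNormalizer {
  private val maxAbs = mutable.Map.empty[Int, Double].withDefaultValue(0.0)

  def normalize(features: Map[Int, Double]): Map[Int, Double] =
    features.map { case (idx, v) =>
      val m = math.max(maxAbs(idx), math.abs(v))
      maxAbs(idx) = m
      idx -> (if (m == 0.0) 0.0 else v / m)   // value now lies in [-1, 1]
    }
}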
Online Learning Challenge #2:
(Sparse) Feature frequency differences
Online Learning: Feature Frequency Differences
Some sparse features occur much more frequently than others, e.g.:
- a categorical feature country with 200 values,
  - encoded as a vector of length 200 with exactly one entry = 1 and the rest 0
- country=USA may occur much more often than country=Belgium
- the indicator feature visited_site = 1 occurs much more often than purchased = 1.
Online Learning: Feature Frequency Differences
Often, rare features are much more predictive than frequent features.
The same learning rate for all features ⇒ slow convergence.
⇒ rare features should have larger learning rates:
- bigger steps whenever a rare feature is seen
- much faster convergence
Effectively, the algorithm pays more attention to rare features, which enables it to find rare but predictive ones.
ADAGRAD [Duchi, Hazan, Singer 2010] is an algorithm for this, and we implemented it in our learning system.
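AdaGrad divides the global rate λ by the square root of the accumulated squared gradients of each feature, so rarely-updated features keep large step sizes. A minimal sparse sketch (illustrative code, not the production implementation):

import scala.collection.mutable

// Sparse AdaGrad: per-feature effective learning rate λ / sqrt(G_i).
final class AdaGrad(lambda: Double, eps: Double = 1e-8) {
  val w = mutable.Map.empty[Int, Double].withDefaultValue(0.0)
  private val sumSq = mutable.Map.empty[Int, Double].withDefaultValue(0.0)

  // g: sparse gradient of one example, featureIndex -> gradient value
  def update(g: Map[Int, Double]): Unit =
    g.foreach { case (idx, gi) =>
      sumSq(idx) += gi * gi                                   // G_i += g_i^2
      w(idx) -= lambda / (math.sqrt(sumSq(idx)) + eps) * gi   // small G_i => big step
    }
}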
Online Learning Challenge #3:
Encoding sparse features
Online Learning: Sparse Features
E.g. site_domain has a large (unknown) set of possible values:
- google.com, yahoo.com, cnn.com, ...
- need to encode (conceptually) as 1-hot vectors, e.g.
  - google.com = (1, 0, 0, 0, ...)
  - yahoo.com = (0, 1, 0, 0, ...)
  - cnn.com = (0, 0, 1, 0, ...)
  - ...
- all possible values are not known in advance
- cannot pre-process the data to find all possible values
- don't want to encode explicit (long) vectors
Online Learning: Sparse Features, Hashing Trick
e.g. observation:
- country = "china" (categorical)
- age = 32 (numerical)
- domain = "google.com" (categorical)
Online Learning: Sparse Features, Hashing Trick
Hash the feature-names:
- hash("country_china") = 24378
- hash("age") = 32905
- hash("domain_google.com") = 84395
Online Learning: Sparse Features, Hashing Trick
Represent the observation as a (special) Map:
{24378 → 1.0, 32905 → 32.0, 84395 → 1.0}
Sparse representation (no explicit vectors).
No need for a separate pass over the data (unlike Spark MLlib).
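A minimal sketch of this encoding (illustrative; production systems typically use a seeded hash such as MurmurHash and a power-of-two number of buckets, accepting occasional collisions):

import scala.util.hashing.MurmurHash3

// Hashing trick: feature names (with the value appended for categoricals)
// are hashed straight to indices; the observation becomes a small sparse Map.
object HashingTrick {
  val NumBuckets = 1 << 20   // 2^20 buckets; power of two makes masking cheap

  private def bucket(name: String): Int =
    MurmurHash3.stringHash(name) & (NumBuckets - 1)

  def encode(categoricals: Map[String, String],
             numericals: Map[String, Double]): Map[Int, Double] =
    categoricals.map { case (k, v) => bucket(s"${k}_$v") -> 1.0 } ++
      numericals.map { case (k, v) => bucket(k) -> v }
}

// HashingTrick.encode(Map("country" -> "china", "domain" -> "google.com"),
//                     Map("age" -> 32.0))
// produces a sparse map like the one on the slide above.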
Online Learning Challenge #4:
Distributed Implementation of Online Learning
Distributed Online Logistic Regression
Stochastic Gradient Descent (SGD) is inherently sequential:
- how to parallelize?
Our (Scala) implementation in Apache Spark:
- randomly re-partition the training data into shards
- use SGD to learn a model for each shard
- average the models using treeReduce (~ "AllReduce")
- leverages Spark/Hadoop fault-tolerance.
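A condensed sketch of this shard-and-average scheme on Spark's RDD API (illustrative: the real system averages richer model state, e.g. the adaptive-learning-rate accumulators, not just a bare weight vector):

import org.apache.spark.rdd.RDD

// Shard-and-average: a sequential SGD pass inside each partition,
// then a treeReduce average of the per-shard weight vectors.
def trainDistributed(
    data: RDD[(Array[Double], Double)],   // (features, label)
    dim: Int,
    lambda: Double,
    numShards: Int
): Array[Double] = {
  val models: RDD[Array[Double]] =
    data.repartition(numShards).mapPartitions { shard =>
      val w = new Array[Double](dim)
      shard.foreach { case (x, y) =>      // single-pass SGD, logistic loss
        val p = 1.0 / (1.0 + math.exp(-w.zip(x).map { case (a, b) => a * b }.sum))
        val err = p - y
        var i = 0
        while (i < dim) { w(i) -= lambda * err * x(i); i += 1 }
      }
      Iterator.single(w)                  // one model per shard
    }
  val sum = models.treeReduce((a, b) => a.zip(b).map { case (ai, bi) => ai + bi })
  sum.map(_ / numShards)                  // average of the shard models
}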
Slider:
Fast Distributed Online Learning System
Slider
Fast, distributed, online, single-pass learning system.
- Written in Scala on top of Spark
- Works directly with Spark DataFrames
- Usable as a library within other JVM systems
- Leverages Spark/Hadoop fault-tolerance
- Stochastic Gradient Descent
- Online feature-scaling/normalization
- Adaptive (per-feature) learning-rates
- Single-pass
- Hashing-trick to encode sparse features
Slider vs Vowpal Wabbit (VW) and Spark ML (SML)
Fast, distributed, online, single-pass learning system. (Parentheses mark which of VW and SML share each feature.)
- Written in Scala on top of Spark (SML)
- Works directly with Spark DataFrames (SML)
- Usable as a library within other JVM systems (SML)
- Leverages Spark/Hadoop fault-tolerance (SML)
- Stochastic Gradient Descent (SGD) (VW, SML)
- Online feature-scaling/normalization (VW)
- Adaptive (per-feature) learning-rates (VW)
- Single-pass (VW, SML)
- Hashing-trick to encode sparse features (VW)
Slider example
(Code-walkthrough slides; the screenshots are not preserved in this transcript.)
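As a stand-in for the lost example slides, here is a small end-to-end sketch that wires the earlier pieces together on a Spark DataFrame: hash each row's raw columns with HashingTrick, then make one online pass with AdaGrad (both from the sketches above). It illustrates the shape of a Slider-style run, not Slider's actual API; the column names and input path are hypothetical:

import org.apache.spark.sql.{Row, SparkSession}

// End-to-end single-pass sketch (NOT the real Slider API).
val spark = SparkSession.builder.appName("online-classifier").getOrCreate()
val df = spark.read.parquet("impressions.parquet")   // hypothetical input

val examples = df.select("browser_type", "geo_region", "age", "converted")
  .rdd.map { case Row(browser: String, geo: String, age: Int, label: Int) =>
    val x = HashingTrick.encode(
      Map("browser_type" -> browser, "geo_region" -> geo),
      Map("age" -> age.toDouble))
    (x, label.toDouble)
  }

// Stream once through the model (driver-side here for brevity; the
// distributed version shards and averages, as sketched earlier).
val model = new AdaGrad(lambda = 0.1)
examples.toLocalIterator.foreach { case (x, y) =>
  val p = 1.0 / (1.0 + math.exp(-x.map { case (i, v) => model.w(i) * v }.sum))
  model.update(x.map { case (i, v) => i -> (p - y) * v })   // sparse logistic gradient
}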
Slider vs Spark ML
Task: predict conversion probability from ad-impression features
- 14M impressions from 1 ad campaign
- 17 categorical features, 2 numerical features
- train on the first 80%, test on the remaining 20%
Spark ML (using Pipelines):
- makes 17 passes over the data: one for each categorical feature
- trains and scores in 40 minutes
- need to specify iterations, etc.
- AUC = 0.52 on test data
Slider:
- makes just one pass over the data
- trains and scores in 5 minutes
- no tuning
- AUC = 0.68 on test data
Other Work
- online version of k-means clustering
- FTRL algorithm (a regularized alternative to SGD)
Ongoing/Future:
- online learning with Spark Streaming
- benchmarking vs other ML systems
Thank you
More Related Content

Similar to Fast Distributed Online Classification

Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Edureka!
 
How can we train with few data
How can we train with few dataHow can we train with few data
How can we train with few data
Dong Heon Cho
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Simplilearn
 
Scalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data MiningScalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data Mining
Gerard de Melo
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
Arunangsu Sahu
 
Reinforcement Learning in Practice: Contextual Bandits
Reinforcement Learning in Practice: Contextual BanditsReinforcement Learning in Practice: Contextual Bandits
Reinforcement Learning in Practice: Contextual Bandits
Max Pagels
 
Competition 1 (blog 1)
Competition 1 (blog 1)Competition 1 (blog 1)
Competition 1 (blog 1)
TarunPaparaju
 
Machine Learning at Geeky Base 2
Machine Learning at Geeky Base 2Machine Learning at Geeky Base 2
Machine Learning at Geeky Base 2
Kan Ouivirach, Ph.D.
 
Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...
Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...
Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...
Edureka!
 
Overfitting and-tbl
Overfitting and-tblOverfitting and-tbl
Overfitting and-tbl
Digvijay Singh
 
Big Data Science - hype?
Big Data Science - hype?Big Data Science - hype?
Big Data Science - hype?
BalaBit
 
WEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been LearnedWEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been Learned
weka Content
 
WEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedWEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been Learned
DataminingTools Inc
 
Search Engines
Search EnginesSearch Engines
Search Engines
butest
 
Recommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model TrainingRecommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model Training
Crossing Minds
 
Feature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performanceFeature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performance
Venkat Projects
 
Kaggle KDD Cup Report
Kaggle KDD Cup ReportKaggle KDD Cup Report
Kaggle KDD Cup Report
Chamila Wijayarathna
 
Artificial Intelligence.pptx
Artificial Intelligence.pptxArtificial Intelligence.pptx
Artificial Intelligence.pptx
Kaviya452563
 
Think-Aloud Protocols
Think-Aloud ProtocolsThink-Aloud Protocols
Think-Aloud Protocols
butest
 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and prediction
Margaret Wang
 

Similar to Fast Distributed Online Classification (20)

Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
Linear Regression Algorithm | Linear Regression in R | Data Science Training ...
 
How can we train with few data
How can we train with few dataHow can we train with few data
How can we train with few data
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
 
Scalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data MiningScalable Learning Technologies for Big Data Mining
Scalable Learning Technologies for Big Data Mining
 
Essentials of machine learning algorithms
Essentials of machine learning algorithmsEssentials of machine learning algorithms
Essentials of machine learning algorithms
 
Reinforcement Learning in Practice: Contextual Bandits
Reinforcement Learning in Practice: Contextual BanditsReinforcement Learning in Practice: Contextual Bandits
Reinforcement Learning in Practice: Contextual Bandits
 
Competition 1 (blog 1)
Competition 1 (blog 1)Competition 1 (blog 1)
Competition 1 (blog 1)
 
Machine Learning at Geeky Base 2
Machine Learning at Geeky Base 2Machine Learning at Geeky Base 2
Machine Learning at Geeky Base 2
 
Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...
Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...
Logistic Regression in R | Machine Learning Algorithms | Data Science Trainin...
 
Overfitting and-tbl
Overfitting and-tblOverfitting and-tbl
Overfitting and-tbl
 
Big Data Science - hype?
Big Data Science - hype?Big Data Science - hype?
Big Data Science - hype?
 
WEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been LearnedWEKA:Credibility Evaluating Whats Been Learned
WEKA:Credibility Evaluating Whats Been Learned
 
WEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been LearnedWEKA: Credibility Evaluating Whats Been Learned
WEKA: Credibility Evaluating Whats Been Learned
 
Search Engines
Search EnginesSearch Engines
Search Engines
 
Recommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model TrainingRecommender Systems from A to Z – Model Training
Recommender Systems from A to Z – Model Training
 
Feature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performanceFeature extraction for classifying students based on theirac ademic performance
Feature extraction for classifying students based on theirac ademic performance
 
Kaggle KDD Cup Report
Kaggle KDD Cup ReportKaggle KDD Cup Report
Kaggle KDD Cup Report
 
Artificial Intelligence.pptx
Artificial Intelligence.pptxArtificial Intelligence.pptx
Artificial Intelligence.pptx
 
Think-Aloud Protocols
Think-Aloud ProtocolsThink-Aloud Protocols
Think-Aloud Protocols
 
Data.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and predictionData.Mining.C.6(II).classification and prediction
Data.Mining.C.6(II).classification and prediction
 

More from DataWorks Summit/Hadoop Summit

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit/Hadoop Summit
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
DataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Recently uploaded

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Precisely
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 

Recently uploaded (20)

HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their MainframeDigital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
Digital Banking in the Cloud: How Citizens Bank Unlocked Their Mainframe
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 

Fast Distributed Online Classification

  • 1. Fast Distributed Online Classification Ram Sriharsha (Product Manager, Apache Spark, Databricks) Prasad Chalasani (SVP Data Science, MediaMath) 13 April, 2016
  • 2. Summary We leveraged recent machine-learning research to develop a I fast, practical, I scalable (up to 100s of Millions of sparse features) I online, I distributed (built on Apache Spark), I single-pass, ML classifier that has significant advantages over most similar ML packages.
  • 3. Key Conceptual Take-aways I Supervised Machine Learning I Online vs Batch Learning, and importance of Online I Challenges in online-learning I Distributed implementation in Spark.
  • 4. Supervised Machine Learning: Overview Given: I labeled training data, Goal: I fit a model to predict labels on (unseen) test data.
  • 5. Supervised Machine Learning: Overview Given: I training data D: n labeled examples {(x1, y1), (x2, y2), . . . , (xn, yn)} where I xi is a k-dimensional feature-vector I yi is the label (0 or 1) that we want to predict. I an error (or loss) metric L(p, y) from predicting p when true label is y.
  • 6. Supervised Machine Learning: Overview Given: I training data D: n labeled examples {(x1, y1), (x2, y2), . . . , (xn, yn)} where I xi is a k-dimensional feature-vector I yi is the label (0 or 1) that we want to predict. I an error (or loss) metric L(p, y) from predicting p when true label is y. Fix a family of functions fw (x) œ F that are parametrised by a weight-vector w.
  • 7. Supervised Machine Learning: Overview Given: I training data D: n labeled examples {(x1, y1), (x2, y2), . . . , (xn, yn)} where I xi is a k-dimensional feature-vector I yi is the label (0 or 1) that we want to predict. I an error (or loss) metric L(p, y) from predicting p when true label is y. Fix a family of functions fw (x) œ F that are parametrised by a weight-vector w. Goal: find w that minimizes average loss over D: L(w) = 1 n nÿ i=1 Li (w) = 1 n nÿ i=1 L(fw (xi ), yi ).
  • 8. Logistic Regression Logistic model fw (x) = 1 1+e≠w·x Probability interpretation
  • 9. Logistic Regression Logistic model fw (x) = 1 1+e≠w·x Probability interpretation Loss Function Li (w) = ≠yi ln(fw (xi )) ≠ (1 ≠ yi ) ln(1 ≠ fw (xi ))
  • 10. Logistic Regression Logistic model fw (x) = 1 1+e≠w·x Probability interpretation Loss Function Li (w) = ≠yi ln(fw (xi )) ≠ (1 ≠ yi ) ln(1 ≠ fw (xi )) Overall Loss L(w) = qn i=1 Li (w)
  • 11. Logistic Regression Logistic model fw (x) = 1 1+e≠w·x Probability interpretation Loss Function Li (w) = ≠yi ln(fw (xi )) ≠ (1 ≠ yi ) ln(1 ≠ fw (xi )) Overall Loss L(w) = qn i=1 Li (w) L(w) is convex: I no local minima I di erentiate and follow gradients
  • 13. Gradient Descent Basic idea: I start with an initial guess of weight-vector w I at iteration t, update w to a new weight-vector wÕ: wÕ = w ≠ ⁄gt where I gt is the (vector) gradient of L(w) w.r.t. w at time t, I ⁄ is the learning rate.
  • 14. Gradient Descent Gradient gt =∆ step direction Learning rate ⁄ =∆ step size
  • 16. Gradient Descent gt = ˆL(w) ˆw = ÿ i ˆLi (w) ˆw = ÿ i gti This is Batch Gradient Descent (BGD) : I To make one weight-update, need to compute gradient over entire training data-set. I repeat this until convergence.
  • 17. Gradient Descent gt = ˆL(w) ˆw = ÿ i ˆLi (w) ˆw = ÿ i gti This is Batch Gradient Descent (BGD) : I To make one weight-update, need to compute gradient over entire training data-set. I repeat this until convergence. BGD is not scalable to large data-sets.
  • 18. Online (Stochastic) Gradient Descent (SGD) A drastic simplification: Instead of computing gradient based on entire training data-set gt = ÿ i ˆLi (w) ˆw , and doing an update wÕ = w ≠ ⁄gt.
  • 19. Online (Stochastic) Gradient Descent (SGD) A drastic simplification: Shu e data-set (if not naturally shu ed), Compute gradient based on a one example gti = ˆLi (w) ˆw , and do an update wÕ = w ≠ ⁄gti .
  • 20. Batch vs Online Gradient Descent Batch: I to make one step, compute gradient w.r.t. entire data-set I extremely slow updates I correct gradient
  • 21. Batch vs Online Gradient Descent Batch: I to make one step, compute gradient w.r.t. entire data-set I extremely slow updates I correct gradient Online: I to make one step, compute gradient w.r.t. one example: I extremely fast updates I not necessarily correct gradient
  • 22. Visualize Batch vs Stochastic Gradient Descent
  • 23. Batch vs Online Learning Batch Learning: I process a large training data-set, generate a model I use model to predict labels of test data-set
  • 24. Batch vs Online Learning Batch Learning: I process a large training data-set, generate a model I use model to predict labels of test data-set Drawbacks: I infeasible/impractically slow for large data-sets. I need to repeat batch process to update model with new data
  • 25. Batch vs Online Learning Online Learning: I for each “training” example:
  • 26. Batch vs Online Learning Online Learning: I for each “training” example: I generate prediction (score),
  • 27. Batch vs Online Learning Online Learning: I for each “training” example: I generate prediction (score), I compare with true label,
  • 28. Batch vs Online Learning Online Learning: I for each “training” example: I generate prediction (score), I compare with true label, I update model (weights w)
  • 29. Batch vs Online Learning Online Learning: I for each “training” example: I generate prediction (score), I compare with true label, I update model (weights w) I for each “test” example:
  • 30. Batch vs Online Learning Online Learning: I for each “training” example: I generate prediction (score), I compare with true label, I update model (weights w) I for each “test” example: I predict with latest learned model (weights w)
  • 31. Batch vs Online Learning Online Learning benefits: I does not pre-process enire training data-set
  • 32. Batch vs Online Learning Online Learning benefits: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples
  • 33. Batch vs Online Learning Online Learning benefits: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples I extremely light-weight: space and time-e cient
  • 34. Batch vs Online Learning Online Learning benefits: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples I extremely light-weight: space and time-e cient I no distinct “training” and “testing” phases:
  • 35. Batch vs Online Learning Online Learning benefits: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples I extremely light-weight: space and time-e cient I no distinct “training” and “testing” phases: I incremental, continual learning
  • 36. Batch vs Online Learning Online Learning benefits: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples I extremely light-weight: space and time-e cient I no distinct “training” and “testing” phases: I incremental, continual learning I adapts to changing patterns
  • 37. Batch vs Online Learning Online Learning benefits: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples I extremely light-weight: space and time-e cient I no distinct “training” and “testing” phases: I incremental, continual learning I adapts to changing patterns I easily update existing model with new data
  • 38. Batch vs Online Learning Online Learning benefits: I does not pre-process enire training data-set I does not explicitly retain previously-seen examples I extremely light-weight: space and time-e cient I no distinct “training” and “testing” phases: I incremental, continual learning I adapts to changing patterns I easily update existing model with new data I better generalization to unseen observations.
  • 39. The Online Learning Paradigm As each labeled example (xi , yi ) is seen, I make prediction given only current weight-vector w I update weight-vector w
  • 40. Online Learning: Use Scenarios I extremely large data-sets where
  • 41. Online Learning: Use Scenarios I extremely large data-sets where I batch learning is computationally infeasible/impractical, and
  • 42. Online Learning: Use Scenarios I extremely large data-sets where I batch learning is computationally infeasible/impractical, and I it’s possible to only do a single pass over the data.
  • 43. Online Learning: Use Scenarios I extremely large data-sets where I batch learning is computationally infeasible/impractical, and I it’s possible to only do a single pass over the data. I data arrives in real-time, and
  • 44. Online Learning: Use Scenarios I extremely large data-sets where I batch learning is computationally infeasible/impractical, and I it’s possible to only do a single pass over the data. I data arrives in real-time, and I decisions/predictions must be made quickly
  • 45. Online Learning: Use Scenarios I extremely large data-sets where I batch learning is computationally infeasible/impractical, and I it’s possible to only do a single pass over the data. I data arrives in real-time, and I decisions/predictions must be made quickly I learned model needs to adapt quickly to recent observations.
  • 47-50. Online Learning Example: Advertising (MediaMath)
Listen to 100 billion ad-opportunities daily from Ad Exchanges.
For each opportunity, need to predict whether the exposed user will buy, as a function of several features:
I hour_of_day, browser_type, geo_region, age, . . .
Online learning benefits:
I fast update of the learned model to reflect the latest observations
I light-weight models that are extremely quick to compute
  • 51-55. Online Learning: IoT
Vast amounts of data; need to adapt and respond quickly.
I Nest Thermostats: behavior data =⇒ predict preferred room temp.
I Self-driving Cars: (sensor data, other cars) =⇒ predict collision
I Clinical: sensors (activity, vitals, . . . ) =⇒ predict cardiac event
I Smart Cities: traffic sensors =⇒ predict congestion
  • 56. Online Learning: Challenge #1
Feature scale differences
  • 57-58. Online Learning: Feature Scaling
Example from the wearable-devices domain:
I feature 1 = heart-rate, range 40 to 200
I feature 2 = step-count, range 0 to 500,000
Extreme scale differences =⇒ convergence problems.
Convergence is much faster when features are on the same scale:
I normalize each feature by dividing by its max possible value.
  • 59-60. Online Learning: Feature Scaling
But often:
I the ranges of the features are not known in advance, and
I we cannot make a separate pass over the data to find the ranges.
=⇒ Need single-pass algorithms that adaptively normalize features with each new observation.
[Ross, Mineiro, Langford 2013] proposed such an algorithm, which we implemented in our online ML system.
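To show the flavor of single-pass normalization, here is a simplified sketch that tracks the largest absolute value seen so far for each feature and scales by it. Note this is not the full algorithm of [Ross, Mineiro, Langford 2013], which additionally rescales already-learned weights when a feature exceeds its previous maximum; that correction is omitted here, and all names are illustrative.

```scala
import scala.collection.mutable

class OnlineScaler {
  // running maximum absolute value seen for each feature index
  private val maxAbs = mutable.Map.empty[Int, Double]

  /** Scale a sparse example by the per-feature maxima seen so far. */
  def normalize(x: Map[Int, Double]): Map[Int, Double] =
    x.map { case (i, v) =>
      val m = math.max(maxAbs.getOrElse(i, 0.0), math.abs(v))
      maxAbs(i) = m // update the running maximum
      i -> (if (m == 0.0) 0.0 else v / m) // now in [-1, 1]
    }
}
```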
  • 61. Online Learning Challenge #2:
(Sparse) Feature frequency differences
  • 62-65. Online Learning: Feature Frequency Differences
Some sparse features occur much more frequently than others, e.g.:
I categorical feature country with 200 values,
  I encoded as a vector of length 200 with exactly one entry = 1, and the rest 0
I country=USA may occur much more often than country=Belgium
I indicator feature visited_site = 1 much more often than purchased = 1.
  • 66-69. Online Learning: Feature Frequency Differences
Often, rare features are much more predictive than frequent features.
The same learning rate for all features =⇒ slow convergence.
=⇒ rare features should have larger learning rates:
I bigger steps whenever a rare feature is seen
I much faster convergence
Effectively, the algorithm pays more attention to rare features, enabling it to find rare but predictive ones.
ADAGRAD is an algorithm for this [Duchi, Hazan, Singer 2010], and we implemented it in our learning system.
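Here is a minimal sketch of the per-feature learning-rate idea behind ADAGRAD, assuming the gradient for each feature is computed elsewhere (e.g. (p - y) * x_i for the logistic loss, as in the earlier sketch); lambda and eps are illustrative constants, not the paper's notation.

```scala
import scala.collection.mutable

class AdaGrad(lambda: Double = 0.1, eps: Double = 1e-6) {
  // accumulated squared gradient for each feature index
  private val sumSqGrad = mutable.Map.empty[Int, Double]

  /** One update: each feature's step size shrinks with how often it occurs. */
  def step(w: mutable.Map[Int, Double], grads: Map[Int, Double]): Unit =
    grads.foreach { case (i, g) =>
      val s = sumSqGrad.getOrElse(i, 0.0) + g * g
      sumSqGrad(i) = s
      // frequent feature => large s => small effective learning rate;
      // rare feature => small s => big steps when it finally appears
      w(i) = w.getOrElse(i, 0.0) - (lambda / (math.sqrt(s) + eps)) * g
    }
}
```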
  • 70. Online Learning Challenge #3: Encoding sparse features
  • 71-79. Online Learning: Sparse Features
E.g. site_domain has a large (unknown) set of possible values:
I google.com, yahoo.com, cnn.com, . . .
I need to encode (conceptually) as 1-hot vectors, e.g.
  I google.com = (1, 0, 0, 0, ... )
  I yahoo.com = (0, 1, 0, 0, ... )
  I cnn.com = (0, 0, 1, 0, ... )
  I . . .
I all possible values not known in advance
I cannot pre-process data to find all possible values
I don’t want to encode explicit (long) vectors
  • 80. Online Learning: Sparse Features, Hashing Trick
e.g. observation:
I country = "china" (categorical)
I age = 32 (numerical)
I domain = "google.com" (categorical)
  • 81. Online Learning: Sparse Features, Hashing Trick
Hash the feature-names:
I hash("country_china") = 24378
I hash("age") = 32905
I hash("domain_google.com") = 84395
  • 82-84. Online Learning: Sparse Features, Hashing Trick
Represent the observation as a (special) Map:
{24378 → 1.0, 32905 → 32.0, 84395 → 1.0}
I sparse representation (no explicit vectors)
I no need for a separate pass over the data (unlike Spark MLlib)
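A sketch of this encoding in Scala, using MurmurHash3 from the standard library and an illustrative hash-space size; the actual hash function and sizing used in Slider may differ.

```scala
import scala.util.hashing.MurmurHash3

object FeatureHasher {
  val NumBuckets: Int = 1 << 18 // size of the hash space (illustrative)

  /** Hash a feature name into a fixed range of indices. */
  def index(name: String): Int =
    Math.floorMod(MurmurHash3.stringHash(name), NumBuckets)

  /** Encode one observation as a sparse map, as on the slide:
   *  categoricals become name_value -> 1.0, numericals become name -> value. */
  def encode(categoricals: Map[String, String],
             numericals: Map[String, Double]): Map[Int, Double] =
    categoricals.map { case (k, v) => index(s"${k}_$v") -> 1.0 } ++
      numericals.map { case (k, v) => index(k) -> v }
}

// e.g. FeatureHasher.encode(
//   Map("country" -> "china", "domain" -> "google.com"),
//   Map("age" -> 32.0))
```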
  • 85. Online Learning Challenge #4: Distributed Implementation of Online Learning
  • 86-87. Distributed Online Logistic Regression
Stochastic Gradient Descent (SGD) is inherently sequential: how to parallelize?
Our (Scala) implementation in Apache Spark:
I randomly re-partition the training data into shards
I use SGD to learn a model for each shard
I average the models using TreeReduce (~ “AllReduce”)
I leverages Spark/Hadoop fault-tolerance.
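A sketch of this shard-and-average scheme on Spark RDDs, reusing the illustrative OnlineLogistic.run from the earlier sketch; the real Slider implementation is more elaborate (online normalization, adaptive learning-rates, etc.).

```scala
import org.apache.spark.rdd.RDD

object DistributedSGD {
  type SparseVec = Map[Int, Double]

  /** Shard the data, run sequential SGD inside each shard, average models. */
  def train(data: RDD[(SparseVec, Double)], numShards: Int): Map[Int, Double] = {
    // one independent sequential SGD pass per shard
    val perShard: RDD[Map[Int, Double]] =
      data.repartition(numShards) // random re-sharding
          .mapPartitions(examples => Iterator(OnlineLogistic.run(examples)))

    // sum the sparse weight-maps pairwise with treeReduce, then average
    val summed = perShard.treeReduce { (a, b) =>
      (a.keySet ++ b.keySet)
        .map(i => i -> (a.getOrElse(i, 0.0) + b.getOrElse(i, 0.0)))
        .toMap
    }
    summed.map { case (i, v) => i -> v / numShards }
  }
}
```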
  • 89. Slider
Fast, distributed, online, single-pass learning system.
I Written in Scala on top of Spark
I Works directly with Spark DataFrames
I Usable as a library within other JVM systems
I Leverages Spark/Hadoop fault-tolerance
I Stochastic Gradient Descent
I Online feature-scaling/normalization
I Adaptive (per-feature) learning-rates
I Single-pass
I Hashing-trick to encode sparse features
  • 90. Slider vs Vowpal Wabbit (VW) vs Spark ML (SML)
Fast, distributed, online, single-pass learning system; each Slider feature is tagged with the other systems that share it:
I Written in Scala on top of Spark (SML)
I Works directly with Spark DataFrames (SML)
I Usable as a library within other JVM systems (SML)
I Leverages Spark/Hadoop fault-tolerance (SML)
I Stochastic Gradient Descent (SGD) (VW, SML)
I Online feature-scaling/normalization (VW)
I Adaptive (per-feature) learning-rates (VW)
I Single-pass (VW, SML)
I Hashing-trick to encode sparse features (VW)
  • 95-97. Slider vs Spark ML
Task: predict conversion probability from ad-impression features
I 14M impressions from 1 ad campaign
I 17 categorical features, 2 numerical features
I train on first 80%, test on remaining 20%
Spark ML (using Pipelines):
I makes 17 passes over the data: one for each categorical feature
I trains and scores in 40 minutes
I need to specify iterations, etc.
I AUC = 0.52 on test data
Slider:
I makes just one pass over the data
I trains and scores in 5 minutes
I no tuning
I AUC = 0.68 on test data
  • 98. Other Work
I Online version of k-means clustering
I FTRL algorithm (regularized alternative to SGD)
Ongoing/Future:
I Online learning with Spark Streaming
I Benchmarking vs other ML systems