Many Shades of Scale:
Big Learning
Beyond Big Data
Misha Bilenko
Principal Researcher
Microsoft Azure Machine Learning
ML ♥ More Data
What we see in production
[Banko and Brill, 2001]
What we [used to] learn in school
[Mooney, 1996]
Is training on
more examples
all there is to it?
Big Learning ≠ Learning(BigData)
• Big data: size → distributing storage and processing
• Big learning: scale bottlenecks in training and prediction
• Classic bottlenecks: bytes and cycles
• Large datasets → distribute training on larger hardware (FPGAs, GPUs, cores, clusters)
• Other scaling dimensions: features, components/people
Learning from Counts
Distributed Robust Algorithm for Count-based Learning
joint work with Chris Meek (MSR)
Wenhan Wang, Pete Luferenko (Azure ML)
Scaling to many Features
Learning with relational data
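$p(\text{click} \mid \text{ad}, \text{context}, \text{user})$
adid = 1010054353
adText = K2 ski sale!
Userid = 0xb49129827048dd9b
IP =
Query = powder skis
QCategories = {skiing, outdoor gear}
$\#\text{users} \sim 10^{9}$, $\#\text{queries} \sim 10^{9}+$, $\#\text{ads} \sim 10^{7}$, $\#(\text{ad} \times \text{query}) \sim 10^{10}+$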
• Information retrieval
• Advertising, recommending, search: item, page/query, user
• Transaction classification
• Payment fraud: transaction, product, user
• Email spam: message, sender, recipient
• Intrusion detection: session, system, user
• IoT: device, location
Learning with relational data
adid: 1010054353
adText: Fall ski sale!
userid 0xb49129827048dd9b
query powder skis
qCategories {skiing, outdoor gear}
• Problem: representing high-cardinality attributes as features
• Scalable: to billions of attribute values
• Efficient: ~$10^{5}$+
• Flexible: for a variety of downstream learners
• Adaptive: to distribution change
• Standard approaches: binary features, hashing
• What everyone should use in industry: learning with counts
• Formalization and generalization
Standard approach 1: binary (one-hot, indicator)
Attributes are mapped to indices based on lookup tables
- Not scalable: cannot support high-cardinality attributes
- Not efficient: large value-index dictionary must be retained
- Not flexible: only linear learners are practical
- Not adaptive: doesn’t support drift in attribute values
[Figure: concatenated indicator vectors 0010000..00 | 0..01000000 | 00000..001 | 0..00001000, with sizes #userIPs, #ads, #queries, #queries × #ads and set bits at $idx_u$, $idx_q(\text{powder skis})$, $idx_a(\text{k2.com})$, $idx(\text{powder skis}, \text{k2.com})$]
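A minimal sketch of the lookup-table encoding described above, to make the scaling problem concrete. The helper names `build_index` and `one_hot` and the toy query list are illustrative assumptions, not part of the talk.

```python
from typing import Dict, List


def build_index(values: List[str]) -> Dict[str, int]:
    """Assign each distinct attribute value the next free index (the lookup table)."""
    index: Dict[str, int] = {}
    for v in values:
        if v not in index:
            index[v] = len(index)
    return index


def one_hot(value: str, index: Dict[str, int]) -> List[int]:
    """Return the indicator vector; a value missing from the table yields all zeros."""
    vec = [0] * len(index)
    if value in index:
        vec[index[value]] = 1
    return vec


# Toy data (made up for illustration).
queries = ["powder skis", "dozen roses", "facebook", "powder skis"]
query_index = build_index(queries)

seen = one_hot("powder skis", query_index)
unseen = one_hot("snow boots", query_index)
print(seen, unseen)
```

The vector length equals the number of distinct values, and the dictionary has to be kept around at prediction time, which is where the "not scalable" and "not efficient" points come from. A value that appears after the table is built has no index at all.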
Standard approach 1+: feature hashing
Attributes are mapped to indices via hashing: $h(x_i) = \mathrm{hash}(x_i) \bmod m$
• Collisions are rare; dot products unbiased
+ Scalable: no mapping tables
+ Efficient: low cost, preserves sparsity
- Not flexible: only linear learners are practical
± Adaptive: new values ok, no temporal effects
[Figure: $h(\text{powder skis} + \text{k2.com})$, $h(\text{powder skis})$ and $h(\text{k2.com})$ index into a single vector of size $m \sim 10^{7}$]
[Moody ‘89, Tarjan-Skadron ‘05, Weinberger+ ’08]
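A rough sketch of feature hashing under stated assumptions: the `name=value` token format, the use of MD5 as a stable hash, and the helper names `bucket` and `hash_features` are all choices made for this illustration. It omits the signed-hash variant that the unbiasedness result relies on.

```python
import hashlib
from typing import Dict, List, Tuple


def bucket(token: str, m: int) -> int:
    """h(x) = hash(x) mod m, with a stable hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % m


def hash_features(attrs: Dict[str, str], crosses: List[Tuple[str, str]], m: int) -> Dict[int, float]:
    """Sparse vector {bucket index: value} for single attributes and attribute pairs."""
    vec: Dict[int, float] = {}
    for name, value in attrs.items():
        idx = bucket(f"{name}={value}", m)
        vec[idx] = vec.get(idx, 0.0) + 1.0
    for a, b in crosses:
        idx = bucket(f"{a}={attrs[a]}&{b}={attrs[b]}", m)
        vec[idx] = vec.get(idx, 0.0) + 1.0
    return vec


example = {"query": "powder skis", "ad": "k2.com"}
features = hash_features(example, [("query", "ad")], m=10**7)
print(features)
```

No dictionary is stored, and a value never seen before still lands in some bucket, which is why the approach scales and tolerates new values.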
Learning with counts
• Features are per-label counts [+odds] [+backoff]
$\phi = [N^+,\; N^-,\; \log N^+ - \log N^-,\; \text{IsRest}]$
• $\log N^+ - \log N^- = \log\frac{N^+}{N^-}$: log-odds/Naïve Bayes estimate
• N+, N-: indicators of confidence of the naïve estimate
• IsRest: indicator of back-off vs. “real count”
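[Figure: each attribute value is looked up in its own count table: Counts(IP), Counts(ad), Counts(powder skis), Counts(powder skis, ad)]
IP 𝑵+ 𝑵−
[IP address] 46964 993424
[IP address] 31 843
[IP address] 12 430
… … …
REST 745623 13964931
Feature vector: 𝝓(Counts(IP)), 𝝓(Counts(ad)), 𝝓(Counts(query)), 𝝓(Counts(query, ad))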
+ Scalable: “head” in memory + tail in backoff; or: count-min sketch
+ Efficient: low cost, low dimensionality
+ Flexible: low dimensionality works well with non-linear learners
+ Adaptive: new values easily added, back-off for infrequent values, temporal counts
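A minimal sketch of count featurization with a REST back-off bin, assuming binary labels. The `min_total` threshold for keeping a value in the "head", the additive `smoothing` inside the logs (to avoid log 0), and the toy data are assumptions of this example rather than anything stated on the slides.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Counts = Tuple[int, int]  # (N+, N-)


def build_count_table(pairs: Iterable[Tuple[str, bool]], min_total: int) -> Tuple[Dict[str, Counts], Counts]:
    """Return (head table, REST counts); values with fewer than min_total examples fold into REST."""
    raw = defaultdict(lambda: [0, 0])
    for value, label in pairs:
        raw[value][0 if label else 1] += 1
    head: Dict[str, Counts] = {}
    rest_pos, rest_neg = 0, 0
    for value, (pos, neg) in raw.items():
        if pos + neg >= min_total:
            head[value] = (pos, neg)
        else:
            rest_pos += pos
            rest_neg += neg
    return head, (rest_pos, rest_neg)


def phi(value: str, head: Dict[str, Counts], rest: Counts, smoothing: float = 1.0) -> List[float]:
    """[N+, N-, log(N+) - log(N-), IsRest], backing off to REST for values not in the head."""
    is_rest = value not in head
    pos, neg = rest if is_rest else head[value]
    log_odds = math.log(pos + smoothing) - math.log(neg + smoothing)
    return [float(pos), float(neg), log_odds, 1.0 if is_rest else 0.0]


# Toy click log for the "query" attribute (made up for illustration).
pairs = [("facebook", True)] * 2 + [("facebook", False)] * 6
pairs += [("dozen roses", True), ("rare query", False)]
head, rest = build_count_table(pairs, min_total=5)

print(phi("facebook", head, rest))
print(phi("never seen", head, rest))
```

Each attribute contributes four numbers regardless of its cardinality, so the feature vector for IP, ad, query and query × ad stays small enough for non-linear learners.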
Backoff is a pain. Count-Min Sketches to the Rescue!
[Cormode-Muthukrishnan ‘04]
Intuition: correct for collisions by using multiple hashes
• Sketch: $M$ is a $d \times w$ matrix of counters, one row per hash function $h_j$
• Count: for each hash function, $M[j][h_j(i)]$ += 1. Update time: O(d)
• Featurize: $\min_j M[j][h_j(i)]$. Estimation time: O(d)
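The update and estimate steps can be sketched as follows. Deriving the d hash functions by prefixing the row number to the key and hashing with BLAKE2b is an assumption of this example; the class name and the toy keys are also illustrative.

```python
import hashlib
from typing import List


class CountMinSketch:
    """d x w counter matrix; each row uses its own hash function."""

    def __init__(self, depth: int, width: int) -> None:
        self.depth = depth
        self.width = width
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # Row-specific hash: prefix the key with the row number before hashing.
        digest = hashlib.blake2b(f"{row}:{key}".encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        """Count: M[j][h_j(key)] += count for every row j. O(d)."""
        for j in range(self.depth):
            self.table[j][self._index(j, key)] += count

    def estimate(self, key: str) -> int:
        """Featurize: min over rows of M[j][h_j(key)]. O(d). Never below the true count."""
        return min(self.table[j][self._index(j, key)] for j in range(self.depth))


# One sketch per label would hold N+ and N-; only N+ is shown here.
pos = CountMinSketch(depth=4, width=1024)
for q in ["facebook"] * 5 + ["dozen roses"] * 3:
    pos.add(q)

print(pos.estimate("facebook"), pos.estimate("dozen roses"))
```

Collisions can only inflate a counter, so taking the minimum across rows picks the least contaminated one. Memory is fixed at d × w counters no matter how many distinct values arrive.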
Learning from counts: aggregation
Aggregate $\mathrm{Count}(y, \mathrm{bin}(x))$ for different $\mathrm{bin}(x)$
• Standard MapReduce
• Bin function: any projection
• Backoff options: “tail bin”, hashing, hierarchical (shrinkage)
IP 𝑵+ 𝑵−
[IP address] 46964 993424
[IP address] 31 843
[IP address] 12 430
… … …
REST 745623 13964931
query 𝑵+ 𝑵−
facebook 281912 7957321
dozen roses 32791 640964
… … …
REST 6321789 43477252
Query × AdId 𝑵+ 𝑵−
facebook, ad1 54546 978964
facebook, ad2 232343 8431467
dozen roses, ad3 12973 430982
… … …
REST 4419312 52754683
IP[2] 𝑵+ 𝑵−
173.194.*.* 46964 993424
87.250.*.* 6341 91356
131.253.*.* 75126 430826
… … …
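A small map/reduce-style sketch of the aggregation, run in-process rather than on a cluster. The bin names, the record layout, and the log entries (which use documentation-range IP addresses) are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def ip_prefix(ip: str, octets: int = 2) -> str:
    """Projection used as a bin function: keep the first `octets` octets, wildcard the rest."""
    parts = ip.split(".")
    return ".".join(parts[:octets] + ["*"] * (4 - octets))


# Each bin function is a projection of the example onto a count-table key.
BINS = {
    "query": lambda x: x["query"],
    "query_x_ad": lambda x: (x["query"], x["ad"]),
    "ip2": lambda x: ip_prefix(x["ip"]),
}


def map_example(x: Dict[str, str], y: int) -> Iterator[Tuple[tuple, int]]:
    """Map step: emit ((table, bin(x), label), 1) for every bin function."""
    for name, bin_fn in BINS.items():
        yield (name, bin_fn(x), y), 1


def reduce_counts(mapped: Iterable[Tuple[tuple, int]]) -> Counter:
    """Reduce step: sum the emitted ones per key."""
    counts: Counter = Counter()
    for key, n in mapped:
        counts[key] += n
    return counts


log = [
    ({"query": "facebook", "ad": "ad1", "ip": "192.0.2.10"}, 1),
    ({"query": "facebook", "ad": "ad2", "ip": "192.0.2.77"}, 0),
    ({"query": "dozen roses", "ad": "ad3", "ip": "198.51.100.5"}, 0),
]
counts = reduce_counts(kv for x, y in log for kv in map_example(x, y))
print(counts[("ip2", "192.0.*.*", 1)], counts[("ip2", "192.0.*.*", 0)])
```

Because the reduce step is a plain sum, it is associative and commutative, so partial counts from different machines or time windows can be merged in any order.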
Learning from counts: combiner training
[Figure: count tables (IP, query, Query × AdId) → count features such as $\ln N^+ - \ln N^-$, plus the original numeric features → Train predictor]
Train non-linear model on count-based features
• Counts, transforms, lookup properties
• Additional features can be injected
Prediction with counts
URL × Country 𝑵+ 𝑵−
url1, US 54546 978964
url2, CA 232343 8431467
url3, FR 12973 430982
… … …
REST 4419312 52754683
[Figure: Counting → count features such as $\ln N^+ - \ln N^-$, combined with the original numeric features]
• Counts are updated continuously
• Combiner re-training is infrequent
Where did it come from?
Li et al. 2010
Pavlov et al. 2009
Lee et al. 1998
Yeh and Patt, 1991
Hillard et al. 2011
• De-facto standard in online advertising industry
• Rediscovered by everyone who really cares about accuracy
Do we need to separate counting and training?
• Can we use the same data for both counting and featurization?
• Bad idea: leakage = count features contain labels → overfitting
• Combiner dedicates capacity to decoding example’s label from features
• Can we hold out each example’s label during train-set featurization?
• Bad idea: leakage and bias
• Illustration: two examples, same feature values, different labels (click and non-click)
• Different representations are inconsistent and allow decoding the label
[Figure: Counting → Train predictor]
Example ID   Label   N+[a]   N-[a]
1   +   $N_a^+ - 1$   $N_a^-$
2   -   $N_a^+$   $N_a^- - 1$
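The two-example illustration can be worked through in a few lines. The function names and the counts (12, 430) are illustrative assumptions; the point is only that holding out each example's own label gives the two examples different features.

```python
from typing import Tuple


def held_out_counts(n_pos: int, n_neg: int, label: str) -> Tuple[int, int]:
    """Counts for attribute value a with this example's own label removed."""
    return (n_pos - 1, n_neg) if label == "+" else (n_pos, n_neg - 1)


def decode_label(features: Tuple[int, int], n_pos: int) -> str:
    """What a combiner can learn: a smaller N+ than the other examples means the label was '+'."""
    return "+" if features[0] == n_pos - 1 else "-"


N_POS, N_NEG = 12, 430  # full counts for attribute value a (toy numbers)

ex1 = held_out_counts(N_POS, N_NEG, "+")  # example 1: click
ex2 = held_out_counts(N_POS, N_NEG, "-")  # example 2: non-click

print(ex1, ex2)
print(decode_label(ex1, N_POS), decode_label(ex2, N_POS))
```

Both examples share the same attribute value, yet their count features differ exactly by their labels, so the label can be read back from the features.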
Solution via Differential privacy
• What is leakage? Revealing information about any individual label
• Formally: count table $c_T$ is $\epsilon$-leakage-proof if it yields the same features for $\forall x, T, T' = T \setminus (x_i, y_i)$
• Theorem: adding noise sampled from Laplace(k/𝜖) makes counts 𝜖-leakage-proof
• Typically 1 ≤ 𝑘 ≤ 100
• Concretely: N+ = N+ + LaplaceRand(0,10k); N- = N- + LaplaceRand(0,10k)
• In practice: LaplaceRand(0,1) sufficient
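A minimal sketch of adding Laplace noise to a pair of counts, assuming the second argument of LaplaceRand is the scale parameter of the distribution. The sampler uses the fact that the difference of two independent exponential variables with mean `scale` is Laplace(0, `scale`); the function names and the example counts are illustrative.

```python
import random
from typing import Tuple


def laplace_rand(loc: float, scale: float, rng: random.Random) -> float:
    """Sample Laplace(loc, scale) as loc plus the difference of two Exp(mean=scale) draws."""
    return loc + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_counts(n_pos: int, n_neg: int, scale: float, rng: random.Random) -> Tuple[float, float]:
    """N+ and N- with independent Laplace noise added, as in the 'Concretely' bullet."""
    return (n_pos + laplace_rand(0.0, scale, rng),
            n_neg + laplace_rand(0.0, scale, rng))


rng = random.Random(0)
noisy = noisy_counts(12, 430, scale=1.0, rng=rng)
print(noisy)
```

Noisy counts can dip below zero for rare values, so anything that takes a logarithm of them afterwards needs a floor or smoothing term.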
Learning from counts: why it works
• State-of-the-art accuracy
• Easy to implement on standard clusters
• Monitorable and debuggable
• Temporal changes easy to monitor
• Easy emergency recovery (bot attacks, etc.)
• Error debugging (which feature to blame)
• Modular (vs. monolithic)
• Components: learners and count features
• People: multiple feature/learner authors
Big Learning: Pipelines and Teams
Ravi: text features in R
Jim: matrix projections
Vera: sweeping boosted trees
Steph: count features on Hadoop
How to scale up Machine Learning to Parallel and Distributed Data Scientists?
• Cloud-hosted, graphical environment for creating, training, evaluating, sharing, and deploying machine learning models
• Supports versioning and collaboration
• Dozens of ML algorithms, extensible via R and Python
Learning with Counts in Azure ML
Criteo 1TB dataset
an hour on HDInsight Hadoop cluster
minutes in AzureML Studio
one click to RRS service
Maximizing Utilization: Keeping it Asynchronous
• Macro-level: concurrently executing pipelines
• Micro-level: asynchronous optimization (with overwriting updates)
• Hogwild SGD [Recht-Re], Downpour SGD [Google Brain]
• Parameter Server [Smola et al.]
• GraphLab [Guestrin et al.]
• SA-SDCA [Tran, Hosseini, Xiao, Finley, B.]
Semi-Asynchronous SDCA: state-of-the-art linear learning
• SDCA: Stochastic Dual Coordinate Ascent [Shalev-Shwartz & Zhang]
• Plot: SGD marries SVM and they have a beautiful baby
• Algorithm: for each example: update example’s 𝛼𝑖, then re-estimate weights
• Let’s make it asynchronous, Hogwild-style!
• Problem: primal and dual diverge
• Solution: separate thread for primal-dual synchronization
• Taking it out-of-memory: block pseudo-random data loading
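SGD update:
$w^{t+1} \leftarrow w^{t} - \gamma_t\left(\lambda w^{t} - y_i\,\phi_i(w^{t} \cdot x_i)\,x_i\right)$
SDCA update:
$\alpha_i \leftarrow \alpha_i + \Delta\alpha_i$
$w^{t} \leftarrow w^{t-1} + \frac{\Delta\alpha_i}{\lambda n}\,x_i$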
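A sketch of plain, single-threaded SDCA, not the semi-asynchronous variant from the talk. It assumes the hinge loss with L2 regularization and uses the closed-form step for that loss from the SDCA paper; the function name, hyperparameters and the four-point dataset are illustrative.

```python
import random
from typing import List, Tuple


def sdca_hinge(xs: List[List[float]], ys: List[int], lam: float = 0.1,
               epochs: int = 20, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Sequential SDCA for an L2-regularized linear SVM; returns (w, alpha)."""
    n, dim = len(xs), len(xs[0])
    alpha = [0.0] * n          # one dual variable per example
    w = [0.0] * dim            # primal weights, kept equal to sum_i alpha_i x_i / (lam * n)
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            x, y = xs[i], ys[i]
            sq_norm = sum(v * v for v in x)
            if sq_norm == 0.0:
                continue
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Closed-form dual step for the hinge loss, clipped so that alpha_i * y_i stays in [0, 1].
            projected = (1.0 - margin) / (sq_norm / (lam * n)) + alpha[i] * y
            delta = y * max(0.0, min(1.0, projected)) - alpha[i]
            alpha[i] += delta                      # update the example's alpha_i
            for j in range(dim):                   # then re-estimate the weights
                w[j] += delta * x[j] / (lam * n)
    return w, alpha


# Tiny linearly separable dataset (made up for illustration).
xs = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
ys = [1, 1, -1, -1]
w, alpha = sdca_hinge(xs, ys)
print(w)
```

Each step touches one dual variable and then patches the weights, so primal and dual stay consistent by construction. That consistency is what breaks when several threads update without locking, which is the problem the separate synchronization thread addresses.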
Keeping it asynchronous: it pays off
In closing: Big Learning = Streetfighting
• Big features are resource-hungry: learning with counts, projections…
• Make them distributed and easy to compute/monitor
• Big learners are resource-hungry
• Parallelize them (preferably asynchronously)
• Big pipelines are resource-hungry: authored by many humans
• Run them in a collaborative cloud environment

Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15

Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes

Recently uploaded

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16

Recently uploaded (20)

DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece

Misha Bilenko, Principal Researcher, Microsoft at MLconf SEA - 5/01/15

  • 1. Many Shades of Scale: Big Learning Beyond Big Data Misha Bilenko Principal Researcher Microsoft Azure Machine Learning
  • 2. ML ♥ More Data What we see in production [Banko and Brill, 2001] What we [used to] learn in school [Mooney, 1996]
  • 3. ML ♥ More Data What we see in production [Banko and Brill, 2001] Is training on more examples all there is to it?
  • 4. Big Learning ≠ Learning(BigData)
      • Big data: size → distributing storage and processing
      • Big learning: scale bottlenecks in training and prediction
      • Classic bottlenecks: bytes and cycles
          Large datasets → distribute training on larger hardware (FPGAs, GPUs, cores, clusters)
      • Other scaling dimensions: Features, Components/People
  • 5. Learning from Counts with DRACuLa (Distributed Robust Algorithm for Count-based Learning)
      Joint work with Chris Meek (MSR), Wenhan Wang, Pete Luferenko (Azure ML)
      Scaling to many Features
  • 6. Learning with relational data: $p(\mathit{click} \mid \mathit{ad}, \mathit{context}, \mathit{user})$
      adid = 1010054353
      adText = K2 ski sale!
      adURL =
      Userid = 0xb49129827048dd9b
      IP =
      Query = powder skis
      QCategories = {skiing, outdoor gear}
      #users ~ $10^9$, #queries ~ $10^{9+}$, #ads ~ $10^7$, #(ad × query) ~ $10^{10+}$
      • Information retrieval
      • Advertising, recommending, search: item, page/query, user
      • Transaction classification
      • Payment fraud: transaction, product, user
      • Email spam: message, sender, recipient
      • Intrusion detection: session, system, user
      • IoT: device, location
  • 7. Learning with relational data: $p(\mathit{click} \mid \mathit{user}, \mathit{context}, \mathit{ad})$
      adid: 1010054353
      adText: Fall ski sale!
      adURL:
      userid: 0xb49129827048dd9b
      IP:
      query: powder skis
      qCategories: {skiing, outdoor gear}
      • Problem: representing high-cardinality attributes as features
      • Scalable: to billions of attribute values
      • Efficient: ~$10^{5+}$ predictions/sec/node
      • Flexible: for a variety of downstream learners
      • Adaptive: to distribution change
      • Standard approaches: binary features, hashing
      • What everyone should use in industry: learning with counts
      • Formalization and generalization
  • 8. Standard approach 1: binary (one-hot, indicator)
      Attributes are mapped to indices based on lookup tables
      - Not scalable: cannot support high-cardinality attributes
      - Not efficient: large value-index dictionary must be retained
      - Not flexible: only linear learners are practical
      - Not adaptive: doesn't support drift in attribute values
      [Figure: concatenated one-hot blocks of widths #userIPs, #ads, #queries, #queries × #ads, with a single bit set in each block at $idx_u$, $idx_a(\text{k2.com})$, $idx_q(\text{powder skis})$, $idx(\text{powder skis}, \text{k2.com})$]
  • 9. Standard approach 1+: feature hashing
      Attributes are mapped to indices via hashing: $h(x_i) = \mathrm{hash}(x_i) \bmod m$
      • Collisions are rare; dot products unbiased
      + Scalable: no mapping tables
      + Efficient: low cost, preserves sparsity
      - Not flexible: only linear learners are practical
      ± Adaptive: new values ok, no temporal effects
      [Figure: sparse hashed vector $\phi(x)$ of dimension $m \sim 10^7$ with bits set at $h(\text{powder skis} + \text{k2.com})$, $h(\text{powder skis})$, $h(\text{k2.com})$, $h(\ldots)$]
      [Moody '89, Tarjan-Skadron '05, Weinberger+ '08]
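The hashing step on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names `stable_hash` and `hash_features`, the choice of BLAKE2b, and the ±1 sign hash (the signed variant of the trick, which is what makes dot products unbiased) are not taken from the deck.

```python
import hashlib

def stable_hash(text: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def hash_features(attributes: dict, m: int = 10**7) -> dict:
    """Map attribute=value tokens to a sparse vector {index: value} of dimension m."""
    phi = {}
    for name, value in attributes.items():
        token = f"{name}={value}"
        index = stable_hash(token) % m  # h(x_i) = hash(x_i) mod m
        # Assumption: a second hash picks a +/-1 sign for the entry.
        sign = 1 if stable_hash("sign:" + token) % 2 == 0 else -1
        phi[index] = phi.get(index, 0) + sign
    return phi

# Single attributes plus one conjunction (query x ad), as in the slide's figure.
example = {
    "query": "powder skis",
    "ad": "k2.com",
    "query_x_ad": "powder skis+k2.com",
}
phi = hash_features(example)
```

No lookup table is kept, so unseen attribute values need no special handling; the price is that colliding values share an index.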
  • 10. Learning with counts
      • Features are per-label counts [+odds] [+backoff]: $\phi = [N^+,\ N^-,\ \log N^+ - \log N^-,\ \mathrm{IsRest}]$
      • $\log N^+ - \log N^- = \log \frac{p(+)}{p(-)}$: log-odds/Naïve Bayes estimate
      • $N^+$, $N^-$: indicators of confidence of the naïve estimate
      • IsRest: indicator of back-off vs. "real count"
      [Figure: each attribute is looked up in its count table and replaced by $\phi$ of the matching row: $\phi(\mathrm{Counts}(IP))$, $\phi(\mathrm{Counts}(ad))$, $\phi(\mathrm{Counts}(query))$, $\phi(\mathrm{Counts}(query, ad))$]

      IP (values not shown)    N+        N-
                               46964     993424
                               31        843
                               12        430
      …                        …         …
      REST                     745623    13964931
  • 11. Learning with counts
      • Features are per-label counts [+odds] [+backoff]: $\phi = [N^+,\ N^-,\ \log N^+ - \log N^-,\ \mathrm{IsRest}]$
      + Scalable: "head" in memory + tail in backoff; or: count-min sketch
      + Efficient: low cost, low dimensionality
      + Flexible: low dimensionality works well with non-linear learners
      + Adaptive: new values easily added, back-off for infrequent values, temporal counts
      [Figure: feature vector built from $\phi(\mathrm{Counts}(user))$, $\phi(\mathrm{Counts}(ad))$, $\phi(\mathrm{Counts}(query))$, $\phi(\mathrm{Counts}(query \times ad))$]

      IP (values not shown)    N+        N-
                               46964     993424
                               31        843
                               12        430
      …                        …         …
      REST                     745623    13964931
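A minimal sketch of the count featurization, assuming a per-attribute table that maps a value to its (N+, N-) pair plus a REST row for the tail. The function name `count_features` and the dictionary layout are illustrative; the numbers are the query counts shown on the aggregation slide of this deck. It also assumes both counts are positive, since the slides do not say how zero counts are smoothed.

```python
import math

# Per-query click / non-click counts, copied from the deck's query table.
QUERY_COUNTS = {
    "facebook": (281912, 7957321),
    "dozen roses": (32791, 640964),
}
QUERY_REST = (6321789, 43477252)  # aggregated tail ("REST") row

def count_features(table: dict, rest: tuple, key: str) -> list:
    """Return phi = [N+, N-, log N+ - log N-, IsRest] for one attribute value."""
    is_rest = key not in table              # back off to REST for unseen/rare values
    n_pos, n_neg = rest if is_rest else table[key]
    log_odds = math.log(n_pos) - math.log(n_neg)  # assumes n_pos, n_neg > 0
    return [n_pos, n_neg, log_odds, 1 if is_rest else 0]

head = count_features(QUERY_COUNTS, QUERY_REST, "facebook")
tail = count_features(QUERY_COUNTS, QUERY_REST, "powder skis")  # not in table
```

Each attribute contributes only four numbers regardless of its cardinality, which is why the resulting vector suits non-linear learners.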
  • 12. Backoff is a pain. Count-Min Sketches to the Rescue! [Cormode-Muthukrishnan '04]
      Intuition: correct for collisions by using multiple hashes
      • Sketch: count matrix $M$ of size $d \times w$, one row per hash function $h_j$
      • Count: for each hash function $h_j$, $M[j][h_j(i)]$++ (update time: $O(d)$)
      • Featurize: $\min_j M[j][h_j(i)]$ (estimation time: $O(d)$)
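The update and estimate rules above fit in a small class. This is a sketch under assumptions: the class name, the default depth and width, and the use of salted BLAKE2b as the row hashes $h_j$ are illustrative choices, not details from the deck.

```python
import hashlib

class CountMinSketch:
    """d x w count matrix M with one hash function per row."""

    def __init__(self, depth: int = 4, width: int = 1024):
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, j: int, key: str) -> int:
        # Assumption: h_j is BLAKE2b salted with the row number j.
        digest = hashlib.blake2b(
            key.encode("utf-8"), digest_size=8, salt=j.to_bytes(8, "big")
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        """Count: M[j][h_j(key)] += count for every row j, O(d)."""
        for j in range(self.depth):
            self.table[j][self._index(j, key)] += count

    def estimate(self, key: str) -> int:
        """Featurize: min_j M[j][h_j(key)], O(d); never below the true count."""
        return min(self.table[j][self._index(j, key)] for j in range(self.depth))

# One sketch per label gives approximate N+ and N- without a backoff bin.
clicks, non_clicks = CountMinSketch(), CountMinSketch()
clicks.add("powder skis", 3)
non_clicks.add("powder skis", 40)
```

Collisions can only inflate a cell, so taking the minimum over rows gives the least-contaminated estimate.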
  • 13. Learning from counts: aggregation
      Aggregate $\mathrm{Count}(y, \mathrm{bin}(x))$ for different $\mathrm{bin}(x)$
      • Standard MapReduce
      • Bin function: any projection
      • Backoff options: "tail bin", hashing, hierarchical (shrinkage)
      [Figure: timeline ending at $T_{now}$; the whole span is used for Counting]

      IP (values not shown)    N+        N-
                               46964     993424
                               31        843
                               12        430
      …                        …         …
      REST                     745623    13964931

      query                    N+        N-
      facebook                 281912    7957321
      dozen roses              32791     640964
      …                        …         …
      REST                     6321789   43477252

      Query × AdId             N+        N-
      facebook, ad1            54546     978964
      facebook, ad2            232343    8431467
      dozen roses, ad3         12973     430982
      …                        …         …
      REST                     4419312   52754683

      IP[2]                    N+        N-
      173.194.*.*              46964     993424
      87.250.*.*               6341      91356
      131.253.*.*              75126     430826
      …                        …         …
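The aggregation step can be illustrated in memory; on a cluster the same logic would be a map that emits (bin, label) and a reduce that sums. Everything named here is an assumption made for the sketch: the helper names, the two-octet IP projection as the bin function, the `min_total` threshold for the tail bin, and the toy log.

```python
from collections import defaultdict

def ip_prefix(ip: str) -> str:
    """Example bin function: project an IPv4 address onto its first two octets."""
    a, b, _, _ = ip.split(".")
    return f"{a}.{b}.*.*"

def aggregate_counts(examples, bin_fn) -> dict:
    """Return {bin: [N+, N-]} for (x, y) pairs, where y is True for a click."""
    counts = defaultdict(lambda: [0, 0])
    for x, y in examples:
        counts[bin_fn(x)][0 if y else 1] += 1
    return dict(counts)

def with_tail_bin(counts: dict, min_total: int) -> dict:
    """Backoff: merge bins with fewer than min_total examples into a REST row."""
    head, rest = {}, [0, 0]
    for key, (n_pos, n_neg) in counts.items():
        if n_pos + n_neg >= min_total:
            head[key] = [n_pos, n_neg]
        else:
            rest[0] += n_pos
            rest[1] += n_neg
    head["REST"] = rest
    return head

# Made-up toy log of (IP, clicked) pairs.
log = [
    ("173.194.1.7", True),
    ("173.194.9.2", False),
    ("173.194.3.3", False),
    ("87.250.4.4", False),
]
table = with_tail_bin(aggregate_counts(log, ip_prefix), min_total=2)
```

Swapping `ip_prefix` for another projection (identity, a query × ad pair, a hash) yields the other tables without changing the counting code.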
  • 14. Learning from counts: combiner training
      Train non-linear model on count-based features
      • Counts, transforms, lookup properties
      • Additional features can be injected
      [Figure: timeline ending at $T_{now}$, split into a Counting period followed by a Train predictor period; each training example is represented by its original numeric features plus aggregated features ($N^+$, $N^-$, $\ln N^+ - \ln N^-$, IsBackoff, …)]

      IP (values not shown)    N+        N-
                               46964     993424
                               31        843
                               12        430
      …                        …         …
      REST                     745623    13964931

      query                    N+        N-
      facebook                 281912    7957321
      dozen roses              32791     640964
      …                        …         …
      REST                     6321789   43477252

      Query × AdId             N+        N-
      facebook, ad1            54546     978964
      facebook, ad2            232343    8431467
      dozen roses, ad3         12973     430982
      …                        …         …
      REST                     4419312   52754683
  • 15. Prediction with counts
      • Counts are updated continuously
      • Combiner re-training infrequent
      [Figure: timeline marked $T_{train}$ and $T_{now}$; Counting continues up to $T_{now}$; each example is represented by its original numeric features plus aggregated features ($N^+$, $N^-$, $\ln N^+ - \ln N^-$, IsBackoff, …)]

      IP (values not shown)    N+        N-
                               46964     993424
                               31        843
                               12        430
      …                        …         …
      REST                     745623    13964931

      query                    N+        N-
      facebook                 281912    7957321
      dozen roses              32791     640964
      …                        …         …
      REST                     6321789   43477252

      URL × Country            N+        N-
      url1, US                 54546     978964
      url2, CA                 232343    8431467
      url3, FR                 12973     430982
      …                        …         …
      REST                     4419312   52754683
  • 16. Where did it come from?
      Yeh and Patt, 1991; Lee et al. 1998; Pavlov et al. 2009; Li et al. 2010; Hillard et al. 2011
      • De-facto standard in online advertising industry
      • Rediscovered by everyone who really cares about accuracy
  • 17. Do we need to separate counting and training?
      • Can we use the same data for both counting and featurization?
          • Bad idea: leakage = count features contain labels → overfitting
          • Combiner dedicates capacity to decoding example's label from features
      • Can we hold out each example's label during train-set featurization?
          • Bad idea: leakage and bias
          • Illustration: two examples, same feature values, different labels (click and non-click)
          • Different representations are inconsistent and allow decoding the label

      Example ID    Label    N+[a]          N-[a]
      1             +        $N_a^+ - 1$    $N_a^-$
      2             -        $N_a^+$        $N_a^- - 1$

      [Figure: timeline with separate Counting and Train predictor periods]
  • 18. Solution via Differential privacy
      • What is leakage? Revealing information about any individual label
      • Formally: count table $c_T$ is $\epsilon$-leakage-proof if it yields the same features for $\forall x, T, T' = T \setminus (x_i, y_i)$
      • Theorem: adding noise sampled from $\mathrm{Laplace}(k/\epsilon)$ makes counts $\epsilon$-leakage-proof
          • Typically $1 \le k \le 100$
      • Concretely: N+ = N+ + LaplaceRand(0, 10k), N- = N- + LaplaceRand(0, 10k)
      • In practice: LaplaceRand(0, 1) sufficient
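The noising step can be sketched with the standard library alone. The deck's `LaplaceRand` is not specified beyond its two arguments, so the sampler below is an assumption: it draws zero-mean Laplace noise as the difference of two exponentials, and the helper names and example counts (taken from the deck's query table) are illustrative.

```python
import random

def laplace_rand(scale: float, rng: random.Random) -> float:
    """Zero-mean Laplace sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_counts(n_pos: int, n_neg: int, scale: float, rng: random.Random):
    """Perturb both per-label counts before they are turned into features."""
    return n_pos + laplace_rand(scale, rng), n_neg + laplace_rand(scale, rng)

rng = random.Random(0)  # seeded only to make the sketch reproducible
n_pos, n_neg = noisy_counts(32791, 640964, scale=1.0, rng=rng)
```

For very small counts the noisy value can drop to zero or below, so a downstream log-odds feature would need clamping or smoothing; the slides do not say how that is handled.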
  • 19. Learning from counts: why it works
      • State-of-the-art accuracy
      • Easy to implement on standard clusters
      • Monitorable and debuggable
          • Temporal changes easy to monitor
          • Easy emergency recovery (bot attacks, etc.)
          • Error debugging (which feature to blame)
      • Modular (vs. monolithic)
          • Components: learners and count features
          • People: multiple feature/learner authors
  • 20. Big Learning: Pipelines and Teams
      • Ravi: text features in R
      • Jim: matrix projections
      • Vera: sweeping boosted trees
      • Steph: count features on Hadoop
      How to scale up Machine Learning to Parallel and Distributed Data Scientists?
  • 21. AzureML • Cloud-hosted, graphical environment for creating, training, evaluating, sharing, and deploying machine learning models • Supports versioning and collaboration • Dozens of ML algorithms, extensible via R and Python
  • 23. Learning with Counts in Azure ML
      • Criteo 1TB dataset
      • Counting: an hour on HDInsight Hadoop cluster
      • Training: minutes in AzureML Studio
      • Deployment: one click to RRS service
  • 24. Maximizing Utilization: Keeping it Asynchronous
      • Macro-level: concurrently executing pipelines
      • Micro-level: asynchronous optimization (with overwriting updates)
          • Hogwild SGD [Recht-Re], Downpour SGD [Google Brain]
          • Parameter Server [Smola et al.]
          • GraphLab [Guestrin et al.]
          • SA-SDCA [Tran, Hosseini, Xiao, Finley, B.]
  • 25. Semi-Asynchronous SDCA: state-of-the-art linear learning
      • SDCA: Stochastic Dual Coordinate Ascent [Shalev-Shwartz & Zhang]
          • Plot: SGD marries SVM and they have a beautiful baby
          • Algorithm: for each example: update example's $\alpha_i$, then re-estimate weights
      • Let's make it asynchronous, Hogwild-style!
          • Problem: primal and dual diverge
          • Solution: separate thread for primal-dual synchronization
      • Taking it out-of-memory: block pseudo-random data loading
      SGD update: $w^{t+1} \leftarrow w^t - \gamma_t \left( \lambda w^t - y_i \phi_i'(w^t \cdot x_i)\, x_i \right)$
      SDCA update: $\alpha_i^t \leftarrow \alpha_i^{t-1} + \Delta\alpha_i$, $\quad w^t \leftarrow w^{t-1} + \frac{\Delta\alpha_i}{\lambda n} x_i$
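To make the SDCA update concrete, here is a sequential (single-threaded) sketch; it does not implement the semi-asynchronous variant. It assumes squared loss $\phi_i(a) = (a - y_i)^2$, for which the dual step $\Delta\alpha_i$ has a closed form, and an L2 regularizer $\frac{\lambda}{2}\|w\|^2$. The function name and toy data are illustrative.

```python
import random

def sdca_squared_loss(xs, ys, lam, epochs=200, seed=0):
    """Sequential SDCA for (1/n) * sum_i (w.x_i - y_i)^2 + (lam/2) * ||w||^2."""
    n, d = len(xs), len(xs[0])
    alpha = [0.0] * n   # one dual variable per example
    w = [0.0] * d       # primal weights, kept equal to sum_i alpha_i x_i / (lam * n)
    rng = random.Random(seed)
    order = list(range(n))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = xs[i], ys[i]
            pred = sum(wj * xj for wj, xj in zip(w, x))
            sq_norm = sum(xj * xj for xj in x)
            # Closed-form maximizer of the dual along coordinate i (squared loss only).
            delta = (y - pred - 0.5 * alpha[i]) / (0.5 + sq_norm / (lam * n))
            alpha[i] += delta                      # alpha_i <- alpha_i + delta
            for j in range(d):
                w[j] += delta / (lam * n) * x[j]   # w <- w + delta / (lam * n) * x_i
    return w, alpha

# Toy 1-D regression problem (made-up data).
xs = [[1.0], [2.0], [3.0]]
ys = [2.0, 4.0, 6.0]
w, alpha = sdca_squared_loss(xs, ys, lam=1.0)
```

For this toy problem the primal optimum has the closed form $w^* = \sum_i x_i y_i / (\sum_i x_i^2 + \lambda n / 2) = 28 / 15.5$, which gives a direct check on the result. Unlike the SGD update, no step size $\gamma_t$ has to be tuned.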
  • 27. In closing: Big Learning = Streetfighting
      • Big features are resource-hungry: learning with counts, projections…
          • Make them distributed and easy to compute/monitor
      • Big learners are resource-hungry
          • Parallelize them (preferably asynchronously)
      • Big pipelines are resource-hungry: authored by many humans
          • Run them in a collaborative cloud environment