Learn Like a Human – Taking Machine Learning from Batch to Real-Time
Elad Rosenheim
Who am I
• Architect at Dynamic Yield, “Predictors” Team Lead
• Previously:
  • AlphaCSP
  • SAP
• Performance & Scale, DevOps
• Measure All the Things!
• East-Asia & Japan
Who’s Dynamic Yield?
We’ve been optimizing & personalizing websites since 2011
• Start-up in Tel-Aviv, headed by Liad Agmon
• I joined as the 5th employee; we’re 50 now and growing fast
On the Agenda
Our clients’ problem
Old School Solutions
Meet the ML Bandits
Our clients’ problem
Publishers, retailers, SaaS: all share a common problem
They know their domain, but not how to optimize for each user
Screen real-estate is limited, yet everyone sees the same thing
What top videos to show on NBC News’ site?
What user segments should see this element at this location?
What’s the best layout for this element?
Both the layout of this page and each element in it deserve testing
What’s the best layout?
What types of products to show to whom?
What articles to show on ynet’s homepage?
What titles and images?
In what order?
What is the best default sort order for products on Adika?
Does it significantly differ between user segments?
The Beginning
• First, there was the educated guess
• Then, there was the A/B test
  • “Data Beats Opinion”
  • Freedom to experiment (with nice tools)
  • Hopefully: less fear of change, less politics
• How does it work?
  • Split traffic between baseline and alternative variations
  • In theory: sit & wait for significant results
  • In practice: peek at the numbers till the nice “95% confidence” shows up
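The "sit & wait for significant results" step can be sketched as a two-proportion z-test. This is a minimal illustration, not the speaker's tooling: the function name and the visitor counts are hypothetical, and a two-sided test at the 5% level is assumed.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variations convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: baseline A vs. alternative B, 10,000 visitors each.
z, p_value = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=250, n_b=10_000)
significant = p_value < 0.05
```

The test is only valid if the sample size is fixed in advance. Re-running it every time someone peeks at the dashboard, and stopping at the first "significant" reading, inflates the false-positive rate above the nominal 5%.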
A/B Tests: Already Old School?
While you wait, you're bleeding clicks
clicks == money
What about the really dynamic stuff?
Campaigns, Current Headlines, Products on Sale
Enter the Multi-Arm Bandits
• A Single-Arm Bandit
• Suppose I have multiple arms in front of me, each with its unknown mean reward…
• How do I optimize income from multiple machines?
  • Caution or Haste?
  • Explore vs. Exploit
• In our context: How do I optimize multiple variations?
Bandits - A Classic Problem
• (Very) Simple Solutions
  • ε-greedy, ε-decreasing
  • First 100% random explore, then ~90% exploit?
  • Magic numbers, built-in revenue loss
• Bayesian-based approaches
  • Smoother curve from explore to exploit
  • “Winner” is now a less relevant term
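The two families above can be compared in a few lines. The sketch below assumes Bernoulli rewards (convert or not), a fixed ε of 0.1, and Thompson sampling with a uniform Beta(1, 1) prior as one representative Bayesian approach; the slides do not say which Bayesian method was used, and the true rates are made up and exaggerated so a short run separates the arms.

```python
import random

def epsilon_greedy(shows, wins, rng, epsilon=0.1):
    """Explore a random arm with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return rng.randrange(len(shows))
    rates = [w / s if s else 0.0 for w, s in zip(wins, shows)]
    return max(range(len(rates)), key=rates.__getitem__)

def thompson_sampling(shows, wins, rng):
    """Draw from each arm's Beta posterior and play the highest draw."""
    draws = [rng.betavariate(1 + w, 1 + s - w) for w, s in zip(wins, shows)]
    return max(range(len(draws)), key=draws.__getitem__)

def simulate(policy, true_rates, rounds=5000, seed=0):
    """Run one policy against simulated arms; return impressions per arm."""
    rng = random.Random(seed)
    shows = [0] * len(true_rates)
    wins = [0] * len(true_rates)
    for _ in range(rounds):
        arm = policy(shows, wins, rng)
        shows[arm] += 1
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
    return shows

true_rates = [0.5, 0.3, 0.1]  # hypothetical conversion rates per variation
greedy_shows = simulate(epsilon_greedy, true_rates)
thompson_shows = simulate(thompson_sampling, true_rates)
```

ε-greedy keeps spending a fixed share of traffic on random arms forever, which is the "built-in revenue loss". Thompson sampling explores less as the posteriors narrow, which gives the smoother explore-to-exploit curve without a hand-tuned ε.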
Bandits work well when…
• We want to find the variation “best on average”
…but we’re not improving the conversion rate of any single variation
(Chart: conversion rates of three variations: 2.4%, 1.7%, 0.4%)
Enter Personalization
• Each of us is a beautiful and unique feature vector!
• By showing the right variation to the right people, we can improve conversions per variation and beat the best variation
• ML Challenge Accepted
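A small worked example shows how a per-user choice can beat the best single variation. All numbers here are invented for illustration: two user segments, two variations, and made-up conversion rates and traffic shares.

```python
# Hypothetical conversion rates per (segment, variation); illustration only.
rates = {
    "new_visitors":       {"A": 0.030, "B": 0.010},
    "returning_visitors": {"A": 0.010, "B": 0.040},
}
traffic_share = {"new_visitors": 0.7, "returning_visitors": 0.3}
variations = ["A", "B"]

def overall_rate(policy):
    """Expected conversion rate when `policy` maps segment -> variation."""
    return sum(share * rates[seg][policy[seg]]
               for seg, share in traffic_share.items())

# What a plain bandit converges to: one variation for everyone.
best_single = max(variations,
                  key=lambda v: overall_rate({seg: v for seg in rates}))
best_single_rate = overall_rate({seg: best_single for seg in rates})

# Personalization: the best variation within each segment.
personalized = {seg: max(variations, key=lambda v: rates[seg][v])
                for seg in rates}
personalized_rate = overall_rate(personalized)
```

With these numbers the best single variation is A at 2.4% overall, while serving A to new visitors and B to returning visitors yields 3.3%.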
The Usual Suspects
Collaborative Filtering?
• Very big, very sparse matrix
• Cold Start
• Batch
• Not suitable in this case
Classifiers? Logistic Regression, Random Forest et al.
• Periodically learn over all converters so far
• More data == more time, bigger model
• Not the classic question
What We Need
• Like a bandit, we need to learn as we go (not in batch), but this time with “context” - the user’s data
• Incremental Learning over the stream of impressions & rewards (“Partial Fit”)
• We’re looking to…
  • Start learning from the first impression
  • Handle the explore-exploit curve
  • Run fast (enough)
  • In the worst case: converge on the best variation, like a bandit
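Incremental learning over a stream can be sketched as a model whose update method takes one mini-batch at a time and never revisits old data. The class below is a stdlib-only illustration under that assumption, not the speaker's code. The method name `partial_fit` echoes the scikit-learn convention that the slide's “Partial Fit” appears to refer to; the class name and learning rate are hypothetical.

```python
import math

class OnlineLogisticRegression:
    """Logistic regression updated one mini-batch at a time (illustrative)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, batch):
        """One averaged gradient step on (features, reward) pairs, reward in {0, 1}."""
        if not batch:
            return
        grad_w = [0.0] * len(self.weights)
        grad_b = 0.0
        for x, y in batch:
            error = self.predict_proba(x) - y  # d(log loss)/dz for this example
            grad_b += error
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
        step = self.learning_rate / len(batch)
        self.bias -= step * grad_b
        for i, g in enumerate(grad_w):
            self.weights[i] -= step * g

# Toy stream: the single feature fully determines whether the user converts.
model = OnlineLogisticRegression(n_features=1)
for _ in range(500):
    model.partial_fit([([1.0], 1), ([0.0], 0)])
```

Memory use and update cost depend on the number of features, not on how many impressions have been seen, which is the property a batch classifier retrained "over all converters so far" lacks.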
Meet the Contextual Bandits
• They “eat” the data stream
• They demand fast access to user data
  • Historical or immediate
• Their model is always ready for action
• In the Papers
  • Linear Bayes, LinUCB
• What we do: Per-Variation Logistic Regression
  • A variant supporting updates in “mini-batches”
  • Exploration-on-top
  • Worst case: “Garbage In → Multi Arm Bandit Out”
  • Light on memory, compact output
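The "per-variation logistic regression with exploration on top" idea can be sketched as one small online model per variation, a buffer that triggers mini-batch updates, and an exploration layer in front. Everything specific here is an assumption: ε-greedy stands in for the unspecified exploration scheme, and the class names, batch size, learning rate and simulated conversion rates are hypothetical.

```python
import math
import random

class VariationModel:
    """Tiny online logistic regression for one variation (illustrative)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.w = [0.0] * (n_features + 1)  # last weight acts as the bias
        self.lr = learning_rate

    def predict(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, list(x) + [1.0]))
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))

    def update(self, batch):
        """One averaged gradient step over a mini-batch of (x, reward) pairs."""
        grad = [0.0] * len(self.w)
        for x, reward in batch:
            err = self.predict(x) - reward
            for i, xi in enumerate(list(x) + [1.0]):
                grad[i] += err * xi
        for i, g in enumerate(grad):
            self.w[i] -= self.lr * g / len(batch)

class ContextualBandit:
    def __init__(self, variations, n_features, epsilon=0.1, batch_size=10, seed=0):
        self.models = {v: VariationModel(n_features) for v in variations}
        self.pending = {v: [] for v in variations}
        self.epsilon = epsilon
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def choose(self, x):
        """Exploration on top: random variation with probability epsilon."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.models))
        return max(self.models, key=lambda v: self.models[v].predict(x))

    def record(self, variation, x, reward):
        """Buffer the outcome; update that variation's model per mini-batch."""
        buf = self.pending[variation]
        buf.append((x, reward))
        if len(buf) >= self.batch_size:
            self.models[variation].update(buf)
            self.pending[variation] = []

def true_rate(x, variation):
    """Hypothetical ground truth: A suits segment 0, B suits segment 1."""
    good = (x[0] == 1.0) == (variation == "A")
    return 0.8 if good else 0.1

bandit = ContextualBandit(["A", "B"], n_features=2)
env = random.Random(1)
for _ in range(2000):
    x = env.choice([[1.0, 0.0], [0.0, 1.0]])  # one-hot user segment
    v = bandit.choose(x)
    bandit.record(v, x, int(env.random() < true_rate(x, v)))
```

If the features carry no signal, each model can only learn its bias term, so the scheme degrades to comparing average rates per variation. That is the "Garbage In → Multi Arm Bandit Out" worst case.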
How We Do It: Online & Offline
• Online should be fast & scale
• Offline: a testbed for iteratively testing new ideas
  • New algorithms
  • Tweaked parameters
  • Feature transformations
The Online Flow
(Diagram. Labels: DY Web Servers (a. get our script, b. log impressions, conversions); Queue per test; Learn Workers; User DB; Persist Model; Load to Predict Server; Predictions for variations A, B, C)
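One plausible reading of the diagram labels is: web servers log events into a queue per test, learn workers drain the queue and enrich events from the user DB, the updated model is persisted, and the predict server loads it. The sketch below follows that reading with in-process stand-ins. The function names, the JSON model format and the count-based "model" are all hypothetical placeholders, not the real system.

```python
import json
from collections import defaultdict, deque

queues = defaultdict(deque)  # one event queue per test
user_db = {"u1": {"segment": "new"}, "u2": {"segment": "returning"}}
model_store = {}             # stands in for persisted model storage

def log_event(test_id, user_id, variation, converted):
    """Web-server side: enqueue an impression and whether it converted."""
    queues[test_id].append((user_id, variation, int(converted)))

def learn_worker(test_id, batch_size=100):
    """Drain up to batch_size events, update the model, persist it."""
    model = json.loads(model_store.get(test_id, "{}"))
    for _ in range(min(batch_size, len(queues[test_id]))):
        user_id, variation, converted = queues[test_id].popleft()
        segment = user_db.get(user_id, {}).get("segment", "unknown")
        stats = model.setdefault(segment, {}).setdefault(variation, [0, 0])
        stats[0] += 1          # impressions
        stats[1] += converted  # conversions
    model_store[test_id] = json.dumps(model)

class PredictServer:
    def __init__(self):
        self.models = {}

    def load(self, test_id):
        """Pick up the latest persisted model for a test."""
        self.models[test_id] = json.loads(model_store[test_id])

    def predict(self, test_id, user_id):
        """Estimated conversion rate per variation for this user's segment."""
        segment = user_db.get(user_id, {}).get("segment", "unknown")
        seg_model = self.models[test_id].get(segment, {})
        return {v: (c / n if n else 0.0) for v, (n, c) in seg_model.items()}

log_event("t1", "u1", "A", True)
log_event("t1", "u1", "B", False)
log_event("t1", "u1", "A", False)
learn_worker("t1")
server = PredictServer()
server.load("t1")
prediction = server.predict("t1", "u1")
```

Keeping learning and prediction in separate processes that share only a persisted model means a slow or restarting learner never blocks page serving.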
The Offline Evaluator
• Test, Improve, Iterate
• Using real-world data
• Using generated data
• From easy to hard
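Evaluation on generated data, "from easy to hard", can be sketched as a synthetic world with known conversion rates and a knob that shrinks the difference between variations. The world layout, the `gap` parameter and the two policies below are assumptions made for illustration.

```python
import random

def make_world(gap, base=0.05):
    """Ground-truth conversion rates; a smaller gap is a harder problem."""
    return {
        "new":       {"A": base + gap, "B": base},
        "returning": {"A": base,       "B": base + gap},
    }

def evaluate(policy, world, rounds=10_000, seed=0):
    """Replay generated visitors through a policy; return its conversion rate."""
    rng = random.Random(seed)
    segments = list(world)
    conversions = 0
    for _ in range(rounds):
        segment = rng.choice(segments)
        variation = policy(segment)
        if rng.random() < world[segment][variation]:
            conversions += 1
    return conversions / rounds

def always_a(segment):
    """Baseline: the same variation for everyone."""
    return "A"

def oracle(segment):
    """Upper bound: knows the truly best variation per segment."""
    return "A" if segment == "new" else "B"

results = []
for gap in (0.20, 0.05, 0.01):  # from easy to hard
    world = make_world(gap)
    results.append((gap, evaluate(always_a, world), evaluate(oracle, world)))
```

Because the ground truth is known, a candidate algorithm can be scored against both the baseline and the oracle before it touches real traffic.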
Going Global
• Learn in the center site, fast predict in each geo. How?
• Push models via local Redis slaves
  • Compressed SSH tunnel
• User data - daily aggregation
  • Storage into LMDB (simple, fast memory-mapped K/V DB)
  • Sync via S3 (LZ4 compressed), read from SSD
• Learn & Predict services
  • Python as ML lingua franca: NumPy, SciPy, scikit-learn
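Shipping a model to every geo is cheap when the model is just a short vector of weights. The sketch below packs weights as 32-bit floats and compresses them before they would be pushed. It is an assumption-laden stand-in: `zlib` from the standard library replaces LZ4, the function names are hypothetical, and the Redis, S3 and LMDB steps are not shown.

```python
import struct
import zlib

def pack_model(weights):
    """Encode weights as little-endian float32 values, then compress."""
    raw = struct.pack(f"<{len(weights)}f", *weights)
    return zlib.compress(raw)

def unpack_model(blob):
    """Inverse of pack_model: decompress and decode the float32 values."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Hypothetical model: mostly zero weights plus a few informative ones.
weights = [0.0] * 1000 + [0.25, -1.5, 3.0]
blob = pack_model(weights)
restored = unpack_model(blob)
```

Float32 halves the size relative to Python's native doubles at the cost of precision, which is usually acceptable for a prediction that only needs to rank a handful of variations.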
Elad & Idan Say Goodbye
• Better data beats better algorithms
• Reduce aggressively
• Keep It Simple, Smart!
• Elad Rosenheim
• Idan Michaeli
• Read our blog
• Hiring? But of course! What’s with the Groundhog?


Taking Machine Learning from Batch to Real-Time (big data eXposed 2015)


Editor's Notes

  1. How have people tried to tackle these problems until now?
  2. Let's take our solution one step further
  3. Let's take our solution one step further
  4. Let's take our solution one step further
  5. Let's take our solution one step further
  6. In short, the world isn't quite as wonderful as it's sold to us
  7. Let's take our solution one step further
  8. This problem has no optimal solution, but there are certainly different approaches at different levels of complexity
  9. Now, bandits actually aren't that bad...
  10. And this limitation brings us to the next stage of the search: personalization
  11. Good, so we've accepted the challenge. Which algorithms do we have in mind?
  12. So really, we're looking for something else: a new family of algorithms
  13. So let's meet our new heroes...
  14. An algorithm on its own is a nice thing, but how do you build the whole production wrapper around it?
  15. Let's get a better understanding of the flow