Real-time Learning

©MapR Technologies - Confidential       1
whoami – Ted Dunning

     Chief Application Architect, MapR Technologies
     Committer, member, Apache Software Foundation
       –   particularly Mahout, Zookeeper and Drill


                       (we’re hiring)

     Contact me at
       tdunning@maprtech.com
       tdunning@apache.com
       ted.dunning@gmail.com
       @ted_dunning



     Slides and such (available late tonight):
       –   http://www.mapr.com/company/events/devoxx-3-29-2013
     Hash tags: #mapr #devoxxfr




Agenda

     What is real-time learning?
     A sample problem
     Philosophy, statistics and the nature of knowledge
     A solution
     System design




What is Real-time Learning?

     Training data arrives one record at a time


     The system improves a mathematical model based on a small
      amount of training data


     We retain at most a fixed amount of state


     Each learning step takes O(1) time and memory
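
A minimal sketch of the four properties above, assuming a logistic model with a fixed number of features (so state and per-record work do not grow with the number of records seen). The class name `OnlineLogistic`, the learning rate and the toy stream are illustrative assumptions, not something taken from the talk.

```python
import math


class OnlineLogistic:
    """Logistic model trained one record at a time with stochastic gradient descent.

    State is one weight per feature, so memory does not grow with the
    number of records seen. Names and learning rate are illustrative.
    """

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.learning_rate = learning_rate

    def predict(self, x):
        score = sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-score))

    def learn(self, x, y):
        """Update the model from a single record (x, y) with y in {0, 1}."""
        error = y - self.predict(x)
        for i, xi in enumerate(x):
            self.weights[i] += self.learning_rate * error * xi


model = OnlineLogistic(n_features=2)
# Records arrive one at a time; each is used once and then discarded.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 200
for x, y in stream:
    model.learn(x, y)
print(model.predict([1.0, 0.0]), model.predict([0.0, 1.0]))
```

Each call to `learn` touches only the weight vector, which is the sense in which a step is O(1) in the amount of data seen so far.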




We have a product to sell … from a web-site


     What picture?
     What tag-line?
       –   Bogus Dog Food is the Best! Now available in handy 1 ton bags!
     What call to action?
       –   Buy 5!




The Challenge

     Design decisions affect probability of success
       –   Cheesy web-sites don’t even sell cheese



     The best designers do better when allowed to fail
       –   Exploration juices creativity




     But failing is expensive
       –   If only because we could have succeeded
       –   But also because offending or disappointing customers is bad


A Quick Diversion

     You see a coin
       –   What is the probability of heads?
       –   Could it be larger or smaller than that?
     I flip the coin and while it is in the air ask again
     I catch the coin and ask again
     I look at the coin (and you don’t) and ask again
     Why does the answer change?
       –   And did it ever have a single value?




A Philosophical Conclusion

     Probability as expressed by humans is subjective and depends on
      information and experience




So now you understand Bayesian probability



Another Quick Diversion

     Let’s play a shell game
     This is a special shell game
     It costs you nothing to play
     The pea has constant probability of being under each shell
           (trust me)
     How do you find the best shell?
     How do you find it while maximizing the number of wins?




Pause for short con-game



Conclusions

     Can you identify winners or losers without trying them out?
       No


     Can you ever completely eliminate a shell with a bad streak?
       No


     Should you keep trying apparent losers?
       Yes, but at a decreasing rate




So now you understand multi-armed bandits



Is there an optimum strategy?



Thompson Sampling

     Select each shell according to the probability that it is the best


     Probability that it is the best can be computed using posterior

          $$P(i \text{ is best}) = \int I\left[ E[r_i \mid \theta] = \max_j E[r_j \mid \theta] \right] P(\theta \mid D) \, d\theta$$
     But I promised a simple answer
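
The integral above can be approximated by sampling rather than evaluated in closed form. The sketch below assumes arms that pay 0 or 1 and a uniform Beta(1, 1) prior on each win rate; the function name `probability_best` and the win/loss counts are hypothetical.

```python
import random


def probability_best(wins, losses, n_samples=10000, rng=None):
    """Estimate P(arm i is best | data) for 0/1 arms by Monte Carlo.

    Assumes a uniform Beta(1, 1) prior on each arm's win rate, so the
    posterior for arm i is Beta(1 + wins[i], 1 + losses[i]).
    """
    rng = rng or random.Random()
    counts = [0] * len(wins)
    for _ in range(n_samples):
        # One draw of theta from the posterior P(theta | D).
        theta = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
        # The indicator picks out the arm with the largest expected reward.
        best = max(range(len(theta)), key=theta.__getitem__)
        counts[best] += 1
    return [c / n_samples for c in counts]


# Hypothetical record so far for three shells.
estimate = probability_best(wins=[3, 10, 1], losses=[7, 10, 9],
                            rng=random.Random(42))
print(estimate)
```

Each loop iteration draws one θ and evaluates the indicator once, so the fraction of draws in which an arm wins is a Monte Carlo estimate of the integral for that arm.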




Thompson Sampling – Take 2

     Sample θ

                   $$\theta \sim P(\theta \mid D)$$
     Pick i to maximize reward

                   $$i = \arg\max_j E[r_j \mid \theta]$$

     Record result from using i




Nearly Forgotten until Recently

     Citations for Thompson sampling




Bayesian Bandit for the Shells

     Compute distributions based on data so far
     Sample p1, p2 and p3 from these distributions
     Pick shell i where $i = \arg\max_i p_i$


     Lemma 1: The probability of picking shell i will match the
      probability it is the best shell


     Lemma 2: This is as good as it gets
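
The three steps above can be sketched as follows, assuming each shell pays 0 or 1 and a Beta(1, 1) prior on each win rate. The class name `BetaBandit` and the simulated win rates are hypothetical; the talk's own demo may use a different model.

```python
import random


class BetaBandit:
    """Bayesian bandit for arms that pay 0 or 1, with Beta(1, 1) priors (an assumption)."""

    def __init__(self, n_arms, rng=None):
        self.wins = [0] * n_arms
        self.losses = [0] * n_arms
        self.rng = rng or random.Random()

    def choose(self):
        # Sample p_1..p_n from the current posteriors, then pick the argmax.
        samples = [self.rng.betavariate(1 + w, 1 + l)
                   for w, l in zip(self.wins, self.losses)]
        return max(range(len(samples)), key=samples.__getitem__)

    def record(self, arm, won):
        if won:
            self.wins[arm] += 1
        else:
            self.losses[arm] += 1


# Simulated shell game; the true win rates are hypothetical.
rng = random.Random(1)
true_rates = [0.2, 0.5, 0.3]
bandit = BetaBandit(len(true_rates), rng)
pulls = [0] * len(true_rates)
for _ in range(2000):
    shell = bandit.choose()
    bandit.record(shell, rng.random() < true_rates[shell])
    pulls[shell] += 1
print(pulls)
```

Because a shell is chosen only when its sampled rate is the largest, weak shells are still tried occasionally, and less often as their posteriors tighten.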




And it works!


     [Figure: regret (0 to 0.12) versus number of trials n (0 to 1100), comparing ε-greedy with ε = 0.05 against the Bayesian Bandit with Gamma-Normal]




Video Demo




The Basic Idea

     We can encode a distribution by sampling
     Sampling allows unification of exploration and exploitation


     Can be extended to more general response models




The Original Problem

     x1: picture
     x2: tag-line
       –   Bogus Dog Food is the Best! Now available in handy 1 ton bags!
     x3: call to action
       –   Buy 5!




Mathematical Statement

     Logistic or probit regression

                  $$P(\text{conversion}) = w\left(\sum x_i \theta_{ij}\right)$$

     Logistic:     $$w(x) = \frac{1}{1 + e^{-x}}$$
     Probit:       $$w(x) = \frac{\operatorname{erf}(x) + 1}{2}$$
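
A small sketch of the two link functions and the conversion model, assuming the design choices are encoded as a flat feature vector with one weight per feature (one reading of the doubly indexed weights on the slide). The function names and the example numbers are hypothetical.

```python
import math


def logistic(x):
    """Logistic link: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def probit_like(x):
    """Second link as written on the slide: (erf(x) + 1) / 2."""
    return (math.erf(x) + 1.0) / 2.0


def p_conversion(x, theta, link=logistic):
    """P(conversion) = link(sum of x_i * theta_i) for feature vector x."""
    return link(sum(xi * ti for xi, ti in zip(x, theta)))


# Hypothetical design features and weights.
x = [1.0, 0.0, 1.0]
theta = [-2.0, 0.5, -1.0]
print(p_conversion(x, theta), p_conversion(x, theta, link=probit_like))
```

Both links are increasing functions that map any real score into (0, 1), so the design with the largest linear score also has the largest modelled conversion probability.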



Same Algorithm

     Sample θ

                   $$\theta \sim P(\theta \mid D)$$
     Pick design x to maximize reward

                   $$x^* = \arg\max_x E[r_x \mid \theta] = \arg\max_x \sum x_i \theta_{ij}$$
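
One way to read the two steps above in code, assuming `theta[i][j]` is the weight for using option j of design element i and that the posterior is an independent Gaussian per weight. Both assumptions, and all names and numbers, are illustrative and not stated in the talk.

```python
import itertools
import random


def sample_theta(mean, std, rng):
    """Draw one theta from an assumed independent Gaussian posterior."""
    return [[rng.gauss(m, s) for m, s in zip(m_row, s_row)]
            for m_row, s_row in zip(mean, std)]


def pick_design(theta):
    """Sequential search over every design; returns the option chosen per element.

    theta[i][j] is the sampled weight for using option j of design element i.
    The link function is increasing, so maximizing the linear score also
    maximizes the expected conversion rate.
    """
    option_ranges = [range(len(row)) for row in theta]
    return max(itertools.product(*option_ranges),
               key=lambda design: sum(theta[i][j] for i, j in enumerate(design)))


# Hypothetical posterior: 3 pictures, 2 tag-lines, 2 calls to action.
mean = [[0.0, 0.4, -0.2], [0.1, 0.3], [-0.5, 0.2]]
std = [[0.5, 0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
rng = random.Random(7)
design = pick_design(sample_theta(mean, std, rng))
print(design)  # one option index per design element
```

With no interaction terms the best option could be chosen per element independently; the exhaustive search is kept here because it still works once interactions are added.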




Context Variables

     x1: picture
     x2: tag-line
       –   Bogus Dog Food is the Best! Now available in handy 1 ton bags!
     x3: call to action
       –   Buy 5!

     y1=user.geo   y2=env.time   y3=env.day_of_week   y4=env.weekend


Two Kinds of Variables

     The web-site design - x1, x2, x3
       –   We can change these
       –   Different values give different web-site designs


     The environment or context – y1, y2, y3, y4
       –   We can’t change these
       –   They can change themselves


     Our model should include interactions between x and y




Same Algorithm, More Greek Letters

     Sample θ, π, φ

           $$(\theta, \pi, \phi) \sim P(\theta, \pi, \phi \mid D)$$
     Pick design x to maximize reward, y’s are constant

           $$x^* = \arg\max_x E[r_x \mid \theta] = \arg\max_x \left( \sum_i x_i \theta_i + \sum_{i,j} x_i y_j \pi_{ij} + \sum_i y_i \phi_i \right)$$

     This looks very fancy, but is actually pretty simple
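
To show how simple the scoring is, here is a sketch that evaluates the three sums for a fixed context and picks the best design. The parameter values stand in for one posterior sample and are hypothetical, as are the two-feature design and context encodings.

```python
import itertools


def score(x, y, theta, pi, phi):
    """Linear score with design terms, design-context interactions and context terms."""
    design_term = sum(xi * ti for xi, ti in zip(x, theta))
    interaction_term = sum(x[i] * y[j] * pi[i][j]
                           for i in range(len(x)) for j in range(len(y)))
    context_term = sum(yj * pj for yj, pj in zip(y, phi))
    return design_term + interaction_term + context_term


def pick_design(candidates, y, theta, pi, phi):
    """Return the candidate design x with the highest score for a fixed context y."""
    return max(candidates, key=lambda x: score(x, y, theta, pi, phi))


# Hypothetical sampled parameters: 2 binary design features, 2 context features.
theta = [0.2, -0.1]
pi = [[-0.5, 0.6], [0.5, -0.4]]
phi = [-0.3, 0.4]
candidates = list(itertools.product([0, 1], repeat=2))

weekday = [1, 0]  # hypothetical one-hot context
weekend = [0, 1]
print(pick_design(candidates, weekday, theta, pi, phi))
print(pick_design(candidates, weekend, theta, pi, phi))
```

The context-only term is the same for every candidate design, so it never changes which design wins; only the design and interaction terms do, which is why the chosen design differs between the two contexts.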


Surprises

     We cannot record a non-conversion until we wait


     We cannot record a conversion until we wait for the same time


     Learning from conversions requires delay


     We don’t have to wait very long
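
A sketch of one way to handle the delay, assuming impressions arrive in time order and that every impression is held for the same fixed window before its outcome is recorded. The class name `DelayedLabeler`, the 60-second window and the identifiers are hypothetical.

```python
from collections import OrderedDict


class DelayedLabeler:
    """Holds each impression for a fixed window before emitting a training label.

    Conversions and non-conversions are both released only once the window
    has elapsed, so both outcomes are recorded with the same delay.
    """

    def __init__(self, wait_seconds):
        self.wait_seconds = wait_seconds
        self.pending = OrderedDict()  # impression id -> [shown_at, design, converted]

    def impression(self, impression_id, design, now):
        self.pending[impression_id] = [now, design, False]

    def conversion(self, impression_id):
        if impression_id in self.pending:
            self.pending[impression_id][2] = True

    def ready(self, now):
        """Pop and return (design, converted) for impressions older than the window."""
        out = []
        while self.pending:
            impression_id, (shown_at, design, converted) = next(iter(self.pending.items()))
            if now - shown_at < self.wait_seconds:
                break
            del self.pending[impression_id]
            out.append((design, converted))
        return out


labeler = DelayedLabeler(wait_seconds=60)
labeler.impression("a", design=(1, 0, 1), now=0)
labeler.impression("b", design=(0, 1, 1), now=10)
labeler.conversion("a")
early = labeler.ready(now=30)   # nothing is old enough yet
first = labeler.ready(now=65)   # "a" is released as a conversion
second = labeler.ready(now=100)  # "b" is released as a non-conversion
print(early, first, second)
```

In this sketch a conversion that arrives after its impression has been released is ignored, which is the price of using a short window.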




Required Steps

     Learn distribution of parameters from data
       –   Logistic regression or probit regression (can be on-line!)
       –   Need Bayesian learning algorithm


     Sample from posterior distribution
       –   Generally included in Bayesian learning algorithm


     Pick design
       –   Simple sequential search


     Record data
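
The first two steps could be sketched as an on-line logistic regression that keeps an independent Gaussian posterior per weight (a diagonal Laplace approximation). This is one possible Bayesian learner, not necessarily the one used in the talk; the N(0, 1) prior, the number of inner steps and the class name are assumptions.

```python
import math
import random


class OnlineBayesLogistic:
    """Online logistic regression with an independent Gaussian posterior per weight.

    A sketch using a diagonal Laplace approximation; the prior N(0, 1) on
    each weight and the number of inner steps are assumptions.
    """

    def __init__(self, n_features, rng=None):
        self.mean = [0.0] * n_features
        self.precision = [1.0] * n_features  # inverse variance per weight
        self.rng = rng or random.Random()

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y, steps=20):
        """Fold one record (x, y in {0, 1}) into the posterior."""
        w = list(self.mean)
        # Step size from an upper bound on the curvature, so each step descends.
        step = 1.0 / (max(self.precision) + 0.25 * sum(xi * xi for xi in x))
        for _ in range(steps):
            p = self._sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i, xi in enumerate(x):
                grad = self.precision[i] * (w[i] - self.mean[i]) + (p - y) * xi
                w[i] -= step * grad
        p = self._sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        self.mean = w
        for i, xi in enumerate(x):
            self.precision[i] += p * (1.0 - p) * xi * xi

    def sample(self):
        """Draw one weight vector from the approximate posterior."""
        return [self.rng.gauss(m, 1.0 / math.sqrt(q))
                for m, q in zip(self.mean, self.precision)]


learner = OnlineBayesLogistic(2, random.Random(3))
for _ in range(50):
    learner.update([1.0, 0.0], 1)  # feature 0 seen with conversions
    learner.update([0.0, 1.0], 0)  # feature 1 seen with non-conversions
theta = learner.sample()
print(learner.mean, learner.precision, theta)
```

The state is two numbers per weight, and `sample` supplies the θ needed for the design-picking step, so learning and sampling come from the same object.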


Required system design



Hadoop is Not Very Real-time

     [Diagram: timeline t ending at "now". The latest full period is fully processed; data after it is unprocessed. The Hadoop job takes this long for this data.]

Real-time and Long-time together

     [Diagram: timeline t ending at "now", covered by a blended view. Hadoop works great back here (older data); Storm works here (most recent data).]
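
The talk does not show code for the blended view. One simple reading, sketched here under that caveat, is a counter that answers queries from batch totals up to a cutoff plus real-time events after it. The class `BlendedView` and its method names are hypothetical.

```python
from collections import Counter


class BlendedView:
    """Combines batch totals up to a cutoff time with real-time counts after it."""

    def __init__(self):
        self.batch_totals = Counter()
        self.batch_cutoff = 0
        self.recent = []  # (timestamp, key) events newer than the batch cutoff

    def record(self, timestamp, key):
        """Real-time path: called for every event as it arrives."""
        self.recent.append((timestamp, key))

    def load_batch(self, totals, cutoff):
        """Batch path: install fully processed totals covering time < cutoff."""
        self.batch_totals = Counter(totals)
        self.batch_cutoff = cutoff
        self.recent = [(t, k) for t, k in self.recent if t >= cutoff]

    def count(self, key):
        return self.batch_totals[key] + sum(1 for _, k in self.recent if k == key)


view = BlendedView()
for t in range(10):
    view.record(t, "design-a")
before = view.count("design-a")
# A batch job finishes and reports everything before t = 6.
view.load_batch({"design-a": 6}, cutoff=6)
after = view.count("design-a")
print(before, after)
```

The answer is the same before and after the batch load; only the share of it that comes from the batch side changes, and the real-time state stays small.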



Traditional Hadoop Design

     Can use Kafka cluster to queue log lines
     Can use Storm cluster to do real-time learning
     Can host web site on NAS
     Can use Flume cluster to import data from Kafka to Hadoop
     Can record long-term history on Hadoop Cluster


     How many clusters?




     [Diagram: components of the traditional design: Users, Web Site, Kafka, Kafka Cluster, Kafka API, Storm, Design Targeting, Web Service, NAS, Flume, Hadoop, HDFS Data]

That is a lot of moving parts!



Alternative Design

     Can host log catcher on MapR via NFS
     Storm can read data directly from queue
     Can host web server directly on cluster


     Only one cluster needed
       –   Total instance count drops by 3x
       –   Admin burden massively decreased




     [Diagram: components of the alternative design: Users, http, Web-server, Catcher, Storm, Topic Queue, Web Data, MapR]




You can do this yourself!



Contact Me!

     We’re hiring at MapR in US and Europe

     MapR software available for research use

     Contact me at tdunning@maprtech.com or @ted_dunning

     Share news with @apachemahout


     Tweet #devoxxfr #mapr #mahout @ted_dunning





FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 

Devoxx Real-time Learning

  • 2. whoami – Ted Dunning
      • Chief Application Architect, MapR Technologies
      • Committer, member, Apache Software Foundation
          – particularly Mahout, Zookeeper and Drill
      • (we’re hiring)
      • Contact me at tdunning@maprtech.com, tdunning@apache.com, ted.dunning@gmail.com, @ted_dunning
  • 3. Slides and such (available late tonight)
          – http://www.mapr.com/company/events/devoxx-3-29-2013
      • Hash tags: #mapr #devoxxfr
  • 4. Agenda
      • What is real-time learning?
      • A sample problem
      • Philosophy, statistics and the nature of the knowledge
      • A solution
      • System design
  • 5. What is Real-time Learning?
      • Training data arrives one record at a time
      • The system improves a mathematical model based on a small amount of training data
      • We retain at most a fixed amount of state
      • Each learning step takes O(1) time and memory
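
As a minimal sketch of the constraints on slide 5 (one record at a time, fixed state, O(1) work per step), the snippet below tracks a conversion rate incrementally. The class name `RunningRate` and the sample outcomes are illustrative assumptions, not something from the talk.

```python
class RunningRate:
    """Tracks a conversion rate using two numbers of state, whatever the data volume."""

    def __init__(self):
        self.n = 0        # records seen so far
        self.mean = 0.0   # current estimate of the conversion rate

    def update(self, converted):
        # Incremental mean: O(1) time and memory per record, no history kept.
        self.n += 1
        self.mean += (float(converted) - self.mean) / self.n
        return self.mean


est = RunningRate()
for outcome in [1, 0, 0, 1]:   # hypothetical stream of conversions (1) and non-conversions (0)
    est.update(outcome)
```

The same pattern (small sufficient statistics, updated in place) is what makes the bandit updates later in the talk fit the real-time definition.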
  • 6. We have a product to sell … from a web-site
  • 7. [Figure: a sample web ad reading “Bogus Dog Food is the Best! Now available in handy 1 ton bags!” with a “Buy 5!” button, annotated with three questions: What picture? What tag-line? What call to action?]
  • 8. The Challenge
      • Design decisions affect probability of success
          – Cheesy web-sites don’t even sell cheese
      • The best designers do better when allowed to fail
          – Exploration juices creativity
      • But failing is expensive
          – If only because we could have succeeded
          – But also because offending or disappointing customers is bad
  • 9. A Quick Diversion
      • You see a coin
          – What is the probability of heads?
          – Could it be larger or smaller than that?
      • I flip the coin and while it is in the air ask again
      • I catch the coin and ask again
      • I look at the coin (and you don’t) and ask again
      • Why does the answer change?
          – And did it ever have a single value?
  • 10. A Philosophical Conclusion
      • Probability as expressed by humans is subjective and depends on information and experience
  • 11. So now you understand Bayesian probability
  • 12. Another Quick Diversion
      • Let’s play a shell game
      • This is a special shell game
      • It costs you nothing to play
      • The pea has constant probability of being under each shell (trust me)
      • How do you find the best shell?
      • How do you find it while maximizing the number of wins?
  • 13. Pause for short con-game
  • 14. Conclusions
      • Can you identify winners or losers without trying them out? No
      • Can you ever completely eliminate a shell with a bad streak? No
      • Should you keep trying apparent losers? Yes, but at a decreasing rate
  • 15. So now you understand multi-armed bandits
  • 16. Is there an optimum strategy?
  • 17. Thompson Sampling
      • Select each shell according to the probability that it is the best
      • Probability that it is the best can be computed using posterior
          $$P(i \text{ is best}) = \int I\left[ E[r_i \mid \theta] = \max_j E[r_j \mid \theta] \right] P(\theta \mid D)\, d\theta$$
      • But I promised a simple answer
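
The integral on slide 17 can be approximated by Monte Carlo: draw parameters from the posterior many times and count how often each shell comes out on top. The sketch below assumes win/lose rewards with an independent Beta posterior per shell under a uniform prior; that model, the function name `prob_best` and the counts are illustrative assumptions rather than anything stated on the slide.

```python
import random


def prob_best(wins, losses, draws=20000, seed=0):
    """Monte Carlo estimate of P(shell i is best) from per-shell win/loss counts."""
    rng = random.Random(seed)
    counts = [0] * len(wins)
    for _ in range(draws):
        # One joint draw of theta from the (assumed) Beta posteriors.
        samples = [rng.betavariate(w + 1, l + 1) for w, l in zip(wins, losses)]
        # The indicator in the integral: which shell is best under this theta?
        counts[samples.index(max(samples))] += 1
    return [c / draws for c in counts]


# Hypothetical history: shell 2 has won most often so far.
p = prob_best(wins=[2, 5, 9], losses=[8, 5, 1])
```

Each pass through the loop evaluates the indicator for one sampled θ, so the returned fractions estimate the integral term by term.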
  • 18. Thompson Sampling – Take 2
      • Sample θ
          $$\theta \sim P(\theta \mid D)$$
      • Pick i to maximize reward
          $$i = \arg\max_j E[r_j \mid \theta]$$
      • Record result from using i
  • 19. Nearly Forgotten until Recently
      • Citations for Thompson sampling
  • 20. Bayesian Bandit for the Shells
      • Compute distributions based on data so far
      • Sample p1, p2 and p3 from these distributions
      • Pick shell i where $i = \arg\max_i p_i$
      • Lemma 1: The probability of picking shell i will match the probability it is the best shell
      • Lemma 2: This is as good as it gets
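
The three steps on slide 20 (compute distributions, sample one value per shell, pick the argmax) fit in a few lines. This is a sketch under assumptions: rewards are win/lose, each shell gets a Beta posterior under a uniform prior, and the true win rates passed in exist only to simulate the game. The function name `thompson_shells` is hypothetical.

```python
import random


def thompson_shells(true_p, rounds=2000, seed=1):
    """Simulate the shell game, choosing shells by Thompson sampling."""
    rng = random.Random(seed)
    wins = [0] * len(true_p)
    losses = [0] * len(true_p)
    for _ in range(rounds):
        # Sample p1, p2, p3 from the posteriors given the data so far.
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1)
                   for i in range(len(true_p))]
        # Pick the shell whose sampled value is largest.
        i = samples.index(max(samples))
        # Play it and record the result.
        if rng.random() < true_p[i]:
            wins[i] += 1
        else:
            losses[i] += 1
    return wins, losses


wins, losses = thompson_shells([0.2, 0.3, 0.6])   # hypothetical win rates
pulls = [w + l for w, l in zip(wins, losses)]
```

Because a weak shell is only picked when its sampled value happens to beat the others, it is still tried occasionally but at a decreasing rate, which matches the conclusion on slide 14.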
  • 21. And it works!
      [Figure: regret (0 to 0.12) versus n (0 to 1100) for ε-greedy with ε = 0.05 and for the Bayesian Bandit with Gamma-Normal]
  • 22. Video Demo
  • 23. The Basic Idea
      • We can encode a distribution by sampling
      • Sampling allows unification of exploration and exploitation
      • Can be extended to more general response models
  • 24. The Original Problem
      [Figure: the “Bogus Dog Food is the Best! Now available in handy 1 ton bags!” ad with its “Buy 5!” button; the three design elements are labeled x1, x2 and x3]
  • 25. Mathematical Statement
      • Logistic or probit regression
          $$P(\text{conversion}) = w\left(\sum x_i \theta_{ij}\right)$$
          $$w(x) = \frac{1}{1 + e^{-x}}$$
          $$w(x) = \frac{\operatorname{erf}(x) + 1}{2}$$
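
To make the two link functions on slide 25 concrete, here is a small sketch. The function names and the example weights are illustrative assumptions; the probit-style link is written exactly as it appears on the slide.

```python
import math


def logistic(x):
    """Logistic link: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def erf_link(x):
    """Probit-style link as written on the slide: (erf(x) + 1) / 2.

    The textbook probit uses the normal CDF, (1 + erf(x / sqrt(2))) / 2,
    which differs from this only by a rescaling of x.
    """
    return (math.erf(x) + 1.0) / 2.0


def p_conversion(x, theta, link=logistic):
    """P(conversion) = w(sum of x_i * theta_i) for feature vector x and weights theta."""
    return link(sum(xi * ti for xi, ti in zip(x, theta)))


# Hypothetical design features and weights whose weighted sum is exactly zero.
p = p_conversion([1, 0, 1], [0.5, 2.0, -0.5])
```

Both links are monotone, so the design with the largest weighted sum also has the largest conversion probability under either choice.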
  • 26. Same Algorithm
      • Sample θ
          $$\theta \sim P(\theta \mid D)$$
      • Pick design x to maximize reward
          $$x^* = \arg\max_x E[r_x \mid \theta] = \arg\max_x \sum x_i \theta_{ij}$$
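
A sketch of slide 26 for a page with a few design slots follows. It assumes one reading of the notation, namely that θ for (slot i, choice j) is the weight of using choice j in slot i, and it assumes an independent Gaussian posterior per weight. Both are assumptions for illustration, as are the function names and numbers.

```python
import itertools
import random


def sample_theta(mean, sd, rng):
    """Draw one theta from independent Gaussian posteriors, keyed by (slot, choice)."""
    return {k: rng.gauss(mean[k], sd[k]) for k in mean}


def pick_design(theta, choices_per_slot):
    """Search every design (one choice per slot) for the highest sampled score."""
    designs = itertools.product(*[range(n) for n in choices_per_slot])
    return max(designs,
               key=lambda d: sum(theta[(slot, c)] for slot, c in enumerate(d)))


# Hypothetical posterior for 3 slots (picture, tag-line, call to action), 2 choices each.
mean = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): -0.5, (2, 0): 0.0, (2, 1): 0.2}
sd = {k: 1.0 for k in mean}

rng = random.Random(42)
design = pick_design(sample_theta(mean, sd, rng), choices_per_slot=[2, 2, 2])
```

Since the link function is monotone, maximizing the linear score is enough; the link never has to be evaluated when picking a design.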
  • 27. Context Variables
      [Figure: the same dog-food ad with design elements labeled x1, x2 and x3, plus the context variables listed below]
      • y1 = user.geo
      • y2 = env.time
      • y3 = env.day_of_week
      • y4 = env.weekend
  • 28. Two Kinds of Variables
      • The web-site design – x1, x2, x3
          – We can change these
          – Different values give different web-site designs
      • The environment or context – y1, y2, y3, y4
          – We can’t change these
          – They can change themselves
      • Our model should include interactions between x and y
  • 29. Same Algorithm, More Greek Letters
      • Sample θ, π, φ
          $$(\theta, \Pi, \Phi) \sim P(\theta, \Pi, \Phi \mid D)$$
      • Pick design x to maximize reward, y’s are constant
          $$x^* = \arg\max_x E[r_x \mid \theta] = \arg\max_x \sum_i x_i \theta_i + \sum_{i,j} x_i y_j \pi_{ij} + \sum_i y_i \varphi_i$$
      • This looks very fancy, but is actually pretty simple
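
The score on slide 29 is three sums: design main effects, design-by-context interactions, and context main effects. The sketch below evaluates it for already-sampled parameters; the function names, the candidate designs and every number are hypothetical.

```python
def score(x, y, theta, pi, phi):
    """Linear score for design x in context y, given sampled theta, pi and phi."""
    main = sum(xi * ti for xi, ti in zip(x, theta))
    interaction = sum(x[i] * y[j] * pi[i][j]
                      for i in range(len(x)) for j in range(len(y)))
    context = sum(yi * pj for yi, pj in zip(y, phi))
    return main + interaction + context


def best_design(designs, y, theta, pi, phi):
    """Pick the design with the highest score; y is fixed by the environment."""
    return max(designs, key=lambda x: score(x, y, theta, pi, phi))


# Two candidate designs (one-hot) and one context variable, e.g. env.weekend.
designs = [(1, 0), (0, 1)]
theta = [0.5, 0.4]       # sampled main effects of each design choice
pi = [[-1.0], [1.0]]     # sampled interaction of each design choice with the context
phi = [0.3]              # sampled main effect of the context

weekday_choice = best_design(designs, [0], theta, pi, phi)
weekend_choice = best_design(designs, [1], theta, pi, phi)
```

The context-only term adds the same amount to every design, so it never changes which design wins; only the interaction term lets the best design differ between weekday and weekend.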
  • 30. Surprises
      • We cannot record a non-conversion until we wait
      • We cannot record a conversion until we wait for the same time
      • Learning from conversions requires delay
      • We don’t have to wait very long
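
One way to honor the waiting rule on slide 30 is to hold every impression for the same fixed period and only then release it to the learner, converted or not. The sketch below is an assumption about how that could be done; the class `DelayedOutcomes`, its methods and the time units are hypothetical.

```python
from collections import deque


class DelayedOutcomes:
    """Holds each impression for `wait` time units before reporting its outcome."""

    def __init__(self, wait):
        self.wait = wait
        self.pending = deque()   # (shown_at, impression_id), oldest first
        self.converted = set()   # ids that converted while still pending

    def show(self, impression_id, now):
        self.pending.append((now, impression_id))

    def convert(self, impression_id):
        self.converted.add(impression_id)

    def release(self, now):
        """Return (impression_id, converted) for impressions that have waited long enough."""
        ready = []
        while self.pending and now - self.pending[0][0] >= self.wait:
            _, imp = self.pending.popleft()
            ready.append((imp, imp in self.converted))
            self.converted.discard(imp)
        return ready


buf = DelayedOutcomes(wait=10)
buf.show("a", now=0)
buf.show("b", now=5)
buf.convert("b")               # "b" converts quickly but is still held back
early = buf.release(now=9)     # nothing has waited the full period yet
first = buf.release(now=10)
second = buf.release(now=15)
```

Holding conversions for the same period as non-conversions keeps recent traffic from looking artificially successful.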
  • 35. Required Steps
      • Learn distribution of parameters from data
          – Logistic regression or probit regression (can be on-line!)
          – Need Bayesian learning algorithm
      • Sample from posterior distribution
          – Generally included in Bayesian learning algorithm
      • Pick design
          – Simple sequential search
      • Record data
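
The talk does not say which Bayesian learning algorithm it uses, so the following is only one possible sketch of the "learn" and "sample" steps: an online logistic regression that keeps an independent Gaussian (mean and precision) per weight and updates both with a diagonal, Laplace-style approximation. The class name, the update rule and the simulated data are all assumptions made for illustration.

```python
import math
import random


class DiagonalBayesLogistic:
    """Online logistic regression with an approximate diagonal Gaussian posterior."""

    def __init__(self, dim, prior_precision=1.0):
        self.mean = [0.0] * dim
        self.precision = [prior_precision] * dim

    def update(self, x, y):
        """Absorb one record (features x, outcome y in {0, 1}) in O(dim) time."""
        z = sum(m * xi for m, xi in zip(self.mean, x))
        z = max(-30.0, min(30.0, z))          # guard against overflow in exp
        p = 1.0 / (1.0 + math.exp(-z))
        for i, xi in enumerate(x):
            # Precision grows with the curvature of the log-likelihood.
            self.precision[i] += p * (1.0 - p) * xi * xi
            # Newton-like step on the mean, scaled by the updated precision.
            self.mean[i] += (y - p) * xi / self.precision[i]

    def sample(self, rng):
        """Draw one weight vector from the approximate posterior."""
        return [rng.gauss(m, 1.0 / math.sqrt(q))
                for m, q in zip(self.mean, self.precision)]


# Simulated traffic: a bias feature plus one design feature that helps conversion.
rng = random.Random(0)
model = DiagonalBayesLogistic(dim=2)
for _ in range(3000):
    x = [1.0, float(rng.random() < 0.5)]
    true_p = 1.0 / (1.0 + math.exp(-(-1.0 + 2.0 * x[1])))
    model.update(x, 1.0 if rng.random() < true_p else 0.0)
theta = model.sample(rng)
```

A sampled `theta` like this is what the "pick design" step would score candidate designs against, and each recorded outcome feeds back through `update`.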
  • 36. Required system design
  • 37. Hadoop is Not Very Real-time
      [Figure: a timeline t ending at “now”, divided into “Fully processed” data, the “Latest full period”, and “Unprocessed Data”; annotation: “Hadoop job takes this long for this data”]
  • 38. Real-time and Long-time together
      [Figure: a timeline t ending at “now”, covered by a “Blended view”; annotations: “Hadoop works great back here” and “Storm works here”]
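
One simple reading of the blended view on slide 38 is that totals computed by the batch layer up to its last full period are combined with counts the real-time layer has accumulated since then. The sketch below assumes the two sets of counts cover disjoint time ranges; the function name and data layout are hypothetical.

```python
def blended_view(batch_counts, realtime_counts):
    """Merge per-design (shown, converted) totals from the batch and real-time layers.

    Assumes realtime_counts only covers events after the batch cutoff,
    so nothing is counted twice.
    """
    merged = dict(batch_counts)
    for design, (shown, converted) in realtime_counts.items():
        old_shown, old_converted = merged.get(design, (0, 0))
        merged[design] = (old_shown + shown, old_converted + converted)
    return merged


# Hypothetical totals: long-term history from Hadoop, recent events from Storm.
view = blended_view({"a": (100, 7)}, {"a": (5, 1), "b": (2, 0)})
```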
  • 39. Traditional Hadoop Design
      • Can use Kafka cluster to queue log lines
      • Can use Storm cluster to do real time learning
      • Can host web site on NAS
      • Can use Flume cluster to import data from Kafka to Hadoop
      • Can record long-term history on Hadoop Cluster
      • How many clusters?
  • 40. [Figure: architecture diagram with components labeled Users, Web Site, Web Service, NAS, Kafka Cluster, Kafka API, Storm, Design Targeting, Flume, HDFS Data and Hadoop]
  • 41. That is a lot of moving parts!
  • 42. Alternative Design
      • Can host log catcher on MapR via NFS
      • Storm can read data directly from queue
      • Can host web server directly on cluster
      • Only one cluster needed
          – Total instances drops by 3x
          – Admin burden massively decreased
  • 43. [Figure: architecture diagram with labels Users, http, Web-server, Catcher, Storm, Topic, Web, Queue and Data, all hosted on MapR]
  • 44. You can do this yourself!
  • 45. Contact Me!
      • We’re hiring at MapR in US and Europe
      • MapR software available for research use
      • Contact me at tdunning@maprtech.com or @ted_dunning
      • Share news with @apachemahout
      • Tweet #devoxxfr #mapr #mahout @ted_dunning