Online Learning
Bayesian bandits and more

©MapR Technologies - Confidential   1
whoami – Ted Dunning


     Ted Dunning
       tdunning@maprtech.com
       tdunning@apache.org
       @ted_dunning


     We’re hiring at MapR


     For slides and other info
        http://www.slideshare.net/tdunning



Online
Scalable
Incremental
Scalability and Learning

     What does scalable mean?



     What are inherent characteristics of scalable learning?



     What are the logical implications?




Scalable ≈ On-line
                                    If you squint just right


unit of work ≈ unit of time



[Diagram: an infinite data stream feeds Learning, which is connected to State]
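A minimal sketch of the stream, learning, and state pattern, assuming the "learning" step is nothing more than tracking a running mean and variance (Welford's method). The class name `RunningStats` and the helper `learn` are illustrative and do not come from the slides. The point is that each item costs a constant amount of work and the state stays a fixed size, however long the stream runs.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Fixed-size state, updated once per item (Welford's method)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; defined as 0.0 until two items have been seen.
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def learn(stream):
    """Consume a (possibly infinite) stream, yielding the state after each item."""
    state = RunningStats()
    for x in stream:
        state.update(x)
        yield state
```

Because `learn` is a generator, it can be driven by an unbounded source and inspected at any time without re-reading earlier data.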



Pick One




Now pick again



A Quick Diversion

     You see a coin
       –   What is the probability of heads?
       –   Could it be larger or smaller than that?
     I flip the coin and while it is in the air ask again
     I catch the coin and ask again
     I look at the coin (and you don’t) and ask again
     Why does the answer change?
       –   And did it ever have a single value?




Which One to Play?

     One may be better than the other
     The better coin pays off at some rate
     Playing the other will pay off at a lesser rate
       –   Playing the lesser coin has “opportunity cost”


     But how do we know which is which?
       –   Explore versus Exploit!
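One common way to balance exploring and exploiting in a Bayesian bandit is Thompson sampling. The slides do not name the method used in the talk, so the following is only a sketch under that assumption. It also assumes a uniform Beta(1, 1) prior on each coin's payoff rate; the function name `thompson_play` and its parameters are hypothetical.

```python
import random


def thompson_play(true_rates, plays, seed=0):
    """Play `plays` rounds against coins with hidden payoff rates.

    Returns per-coin (wins, losses) counts. `true_rates` is only used to
    simulate payoffs; the player never looks at it directly.
    """
    rng = random.Random(seed)
    wins = [0] * len(true_rates)
    losses = [0] * len(true_rates)
    for _ in range(plays):
        # Draw one plausible payoff rate per coin from its Beta posterior
        # (assumed uniform prior), then play the coin with the best draw.
        draws = [rng.betavariate(1 + w, 1 + l) for w, l in zip(wins, losses)]
        coin = max(range(len(draws)), key=draws.__getitem__)
        if rng.random() < true_rates[coin]:
            wins[coin] += 1
        else:
            losses[coin] += 1
    return wins, losses
```

Early on the posteriors are wide, so both coins get played; as evidence accumulates, the lesser coin is sampled less and less often, which limits the opportunity cost.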




A First Conclusion

     Probability as expressed by humans is subjective and depends on
      information and experience




A Second Conclusion

     A single number is a bad way to express uncertain knowledge



     A distribution of values might be better
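As an illustration of a distribution standing in for a single number, the sketch below summarizes belief about a coin with a Beta distribution. It assumes a uniform Beta(1, 1) prior and reads counts such as "5 and 5" or "2 and 10" as heads and tails; both are assumptions rather than something the slides state. `beta_summary` is a hypothetical helper.

```python
import math


def beta_summary(heads, tails, prior=(1.0, 1.0)):
    """Mean and standard deviation of the Beta posterior for P(heads)."""
    a = prior[0] + heads
    b = prior[1] + tails
    mean = a / (a + b)
    std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, std


for heads, tails in [(0, 0), (5, 5), (2, 10)]:
    mean, std = beta_summary(heads, tails)
    print(f"{heads} and {tails}: mean={mean:.3f}, std={std:.3f}")
```

With no observations the spread is at its widest; it narrows as flips accumulate, which a single point estimate cannot express.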




I Dunno




5 and 5




2 and 10




The Cynic Among Us




Demo


An Example




An Example




The Cluster Proximity Features

     Every point can be described by the nearest cluster
       –   4.3 bits per point in this case
       –   Significant error that can be decreased (to a point) by increasing number of
           clusters
     Or by the proximity to the 2 nearest clusters (2 x 4.3 bits + 1 sign
      bit + 2 proximities)
       –   Error is negligible
       –   Unwinds the data into a simple representation
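A rough sketch of the two descriptions above, assuming Euclidean distance and a small list of centroids held in memory. The function `proximity_features` is illustrative only, and it makes no attempt to reproduce the sign bit mentioned on the slide. For scale, 4.3 bits is what indexing about 20 clusters would cost (log2 20 ≈ 4.32), although the slide does not state the cluster count.

```python
import math


def proximity_features(point, centroids):
    """Return (nearest index, second-nearest index, distance 1, distance 2)."""
    ranked = sorted((math.dist(point, c), i) for i, c in enumerate(centroids))
    (d1, i1), (d2, i2) = ranked[0], ranked[1]
    return i1, i2, d1, d2


def index_bits(centroids):
    """Bits needed to name one cluster out of len(centroids)."""
    return math.log2(len(centroids))
```

Keeping only `i1` gives the coarse nearest-cluster description; adding `i2` and the two distances gives the richer representation with much lower reconstruction error.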




Diagonalized Cluster Proximity




Lots of Clusters Are Fine




Surrogate Method

     Start with sloppy clustering into κ = k log n clusters
     Use these clusters as a weighted surrogate for the data
     Cluster surrogate data using ball k-means


     Results are provably high quality for highly clusterable data
     Sloppy clustering can be done on-line
     Surrogate can be kept in memory
     Ball k-means pass can be done at any time
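The overall shape of the surrogate method can be sketched as follows. This is a simplified stand-in rather than the algorithm from the talk: the sloppy pass uses a fixed distance `threshold` instead of adapting to reach κ = k log n clusters, the nearest-centroid search is brute force rather than an O(log κ) search, and plain weighted Lloyd iterations are swapped in for ball k-means. All names here are hypothetical.

```python
import math
import random


def sloppy_cluster(stream, threshold):
    """One on-line pass: fold each point into the nearest centroid if it is
    within `threshold`, otherwise start a new centroid.
    Returns the weighted surrogate as a list of (centroid, weight)."""
    centroids = []  # each entry is [coordinates, weight]
    for p in stream:
        best = min(centroids, key=lambda c: math.dist(p, c[0]), default=None)
        if best is not None and math.dist(p, best[0]) <= threshold:
            w = best[1] + 1
            best[0] = [m + (x - m) / w for m, x in zip(best[0], p)]
            best[1] = w
        else:
            centroids.append([list(p), 1])
    return [(tuple(c), w) for c, w in centroids]


def weighted_kmeans(surrogate, k, iterations=10, seed=0):
    """Weighted Lloyd iterations over the in-memory surrogate
    (a stand-in for the ball k-means pass)."""
    rng = random.Random(seed)
    centers = [c for c, _ in rng.sample(surrogate, k)]
    dims = len(centers[0])
    for _ in range(iterations):
        sums = [[0.0] * dims for _ in range(k)]
        weights = [0.0] * k
        for c, w in surrogate:
            j = min(range(k), key=lambda i: math.dist(c, centers[i]))
            weights[j] += w
            for d, x in enumerate(c):
                sums[j][d] += w * x
        centers = [
            tuple(s / weights[j] for s in sums[j]) if weights[j] > 0 else centers[j]
            for j in range(k)
        ]
    return centers
```

The first function touches each point once and keeps only the surrogate, so it can run on-line; the second can be run at any time against whatever surrogate currently exists.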




Algorithm Costs

     O(k d log n) per point for Lloyd’s algorithm
          … not so good for k = 2000, n = 10^8

     Surrogate methods
          …. O(d log κ) = O(d (log k + log log n)) per point


     This is a big deal:
       –   k d log n = 2000 x 10 x 26 ≈ 500,000
       –   log k + log log n ≈ 11 + 5 = 16
       –   30,000 times faster makes the grade as a bona fide big deal
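The arithmetic above can be checked directly. This sketch assumes base-2 logarithms and takes d = 10 from the product shown on the slide; neither is stated explicitly.

```python
import math

k, d, n = 2000, 10, 10**8

log_n = math.log2(n)                          # log n
lloyd = k * d * log_n                         # k d log n
log_kappa = math.log2(k) + math.log2(log_n)   # log k + log log n

print(f"k d log n         = {lloyd:,.0f}")
print(f"log k + log log n = {log_kappa:.1f}")
print(f"ratio             = {lloyd / log_kappa:,.0f}")
```

Note that this ratio compares k d log n with log κ alone. If the factor d from O(d log κ) is kept on the surrogate side, the ratio becomes k log n / log κ, which is roughly 3,400 for these values.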



30,000 times faster sounds good

                                    but that isn’t the big news

                                      these algorithms do
                                       on-line clustering

Parallel Speedup?

     [Figure: time per point (μs) versus number of threads, with curves labeled Non-threaded, Threaded version, and Perfect Scaling]

What about deployment?




[Diagram: an infinite data stream feeds Learning, which is connected to State]



[Diagram: Data is divided into a Split that feeds a Mapper; the Mapper is connected to State]



[Diagram: Data is divided into a Split that feeds three Mappers, all connected to a single State. Callout: Need shared memory!]
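To make the shared-memory requirement concrete, here is a single-process sketch in which several worker threads stand in for mappers and all update one state object. Real mappers run as separate tasks, so this only illustrates the requirement and is not a deployment recipe. `SharedState` and `run_mappers` are hypothetical names.

```python
import threading


class SharedState:
    """Model state that several workers update; a lock keeps updates atomic."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0
        self.total = 0.0

    def update(self, x):
        with self._lock:
            self.count += 1
            self.total += x

    def mean(self):
        with self._lock:
            return self.total / self.count if self.count else 0.0


def run_mappers(splits):
    """Run one worker thread per split against a single shared state."""
    state = SharedState()

    def mapper(split):
        for x in split:
            state.update(x)

    threads = [threading.Thread(target=mapper, args=(s,)) for s in splits]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state
```

Without a state that every worker can reach, each mapper would learn only from its own split and the results would have to be merged afterwards.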



 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Strata new-york-2012

  • 1. Online Learning Bayesian bandits and more ©MapR Technologies - Confidential 1
  • 2. whoami – Ted Dunning
      • Ted Dunning: tdunning@maprtech.com, tdunning@apache.org, @ted_dunning
      • We’re hiring at MapR
      • For slides and other info: http://www.slideshare.net/tdunning
  • 3. Online Scalable Incremental
  • 4. Scalability and Learning
      • What does scalable mean?
      • What are inherent characteristics of scalable learning?
      • What are the logical implications?
  • 5. Scalable ≈ On-line. If you squint just right
  • 6. unit of work ≈ unit of time
  • 7. [Diagram: Infinite Data Stream, Learning, State]
  • 8. Pick One
  • 9. [no text]
  • 10. [no text]
  • 11. Now pick again
  • 12. A Quick Diversion
      • You see a coin
          – What is the probability of heads?
          – Could it be larger or smaller than that?
      • I flip the coin and, while it is in the air, ask again
      • I catch the coin and ask again
      • I look at the coin (and you don’t) and ask again
      • Why does the answer change?
          – And did it ever have a single value?
  • 13. Which One to Play?
      • One may be better than the other
      • The better coin pays off at some rate
      • Playing the other will pay off at a lesser rate
          – Playing the lesser coin has “opportunity cost”
      • But how do we know which is which?
          – Explore versus Exploit!
  • 14. A First Conclusion
      • Probability as expressed by humans is subjective and depends on information and experience
  • 15. A Second Conclusion
      • A single number is a bad way to express uncertain knowledge
      • A distribution of values might be better
  • 16. I Dunno
  • 17. 5 and 5
  • 18. 2 and 10
  • 19. The Cynic Among Us
  • 20. Demo
  • 21. An Example
  • 22. An Example
  • 23. The Cluster Proximity Features
      • Every point can be described by the nearest cluster
          – 4.3 bits per point in this case
          – Significant error that can be decreased (to a point) by increasing the number of clusters
      • Or by the proximity to the 2 nearest clusters (2 x 4.3 bits + 1 sign bit + 2 proximities)
          – Error is negligible
          – Unwinds the data into a simple representation
  • 24. Diagonalized Cluster Proximity
  • 25. Lots of Clusters Are Fine
  • 26. Surrogate Method
      • Start with sloppy clustering into κ = k log n clusters
      • Use these clusters as a weighted surrogate for the data
      • Cluster surrogate data using ball k-means
      • Results are provably high quality for highly clusterable data
      • Sloppy clustering can be done on-line
      • Surrogate can be kept in memory
      • Ball k-means pass can be done at any time
  • 27. Algorithm Costs
      • O(k d log n) per point for Lloyd’s algorithm … not so good for k = 2000, n = 10^8
      • Surrogate methods … O(d log κ) = O(d (log k + log log n)) per point
      • This is a big deal:
          – k d log n = 2000 x 10 x 26 ≈ 500,000
          – log k + log log n = 11 + 5 = 16
          – 30,000 times faster makes the grade as a bona fide big deal
  • 28. 30,000 times faster sounds good
  • 29. 30,000 times faster sounds good but that isn’t the big news
  • 30. 30,000 times faster sounds good but that isn’t the big news these algorithms do on-line clustering
  • 31. Parallel Speedup? [Plot: time per point (μs) versus number of threads, comparing the non-threaded version, the threaded version, and perfect scaling]
  • 32. What about deployment?
  • 33. [Diagram: Infinite Data Stream, Learning, State]
  • 34. [Diagram: Data Split, Mapper, State]
  • 35. [Diagram: Data Split, three Mappers, State] Need shared memory!
  • 36. whoami – Ted Dunning
      • We’re hiring at MapR
      • Ted Dunning: tdunning@maprtech.com, tdunning@apache.org, @ted_dunning
      • For slides and other info: http://www.slideshare.net/tdunning
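
The transcript poses the explore-versus-exploit question for two coins (slides 12 to 15) and the deck title names Bayesian bandits, but no algorithm is spelled out. The sketch below is one common way to act on "a distribution of values might be better": Thompson sampling with a Beta distribution per coin. It is an illustration under stated assumptions, not the code behind the demo on slide 20. The function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the payoff rates 0.3 and 0.6 are all hypothetical choices made here.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Play n_rounds of a Bernoulli bandit using Thompson sampling.

    true_rates holds the hidden payoff probability of each coin; it is
    used only to simulate flips and is never shown to the learner.
    Returns per-coin (wins, losses) counts, which are the whole state.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    wins = [0] * n_arms
    losses = [0] * n_arms
    for _ in range(n_rounds):
        # Assumed prior: Beta(1, 1), so the posterior for each coin is
        # Beta(wins + 1, losses + 1). Draw one plausible rate per coin.
        samples = [
            rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n_arms)
        ]
        # Play the coin whose sampled rate is highest. Uncertain coins
        # sometimes draw high values, which is where exploration comes from.
        arm = max(range(n_arms), key=lambda i: samples[i])
        # Simulate the flip and update only the counts for that coin.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return wins, losses


# Hypothetical payoff rates: the second coin is the better one.
wins, losses = thompson_sampling([0.3, 0.6], n_rounds=2000)
plays = [w + l for w, l in zip(wins, losses)]
print("plays per coin:", plays)
```

The learner keeps only two counters per coin and updates them one observation at a time, which matches the "infinite data stream, learning, state" picture on slides 7 and 33. As evidence accumulates, the posterior for the weaker coin narrows and it is played less often, so the opportunity cost from slide 13 shrinks without ever ruling that coin out completely.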