© 2014 MapR Technologies
Who I am 
Ted Dunning, Chief Applications Architect, MapR Technologies 
Email tdunning@mapr.com tdunning@apache.org 
Twitter @Ted_Dunning 
Apache Mahout https://mahout.apache.org/ 
Twitter @ApacheMahout
Agenda 
• Background – recommending with puppies and ponies 
• Speed tricks 
• Accuracy tricks 
• Moving to real-time
Puppies and Ponies
Cooccurrence Analysis
How Often Do Items Co-occur?
$A^{\top} A$
Which Co-occurrences are Interesting?
$\mathrm{sparsify}(A^{\top} A)$
Each row of indicators becomes a field in a search engine document
Recommendations
Alice got an apple and a puppy
Bob got an apple
Charles got a bicycle
What else would Bob like? A puppy!
By the way, like me, Bob also wants a pony…
Recommendations
[Figure: users Alice, Bob, Amelia, and Charles]
What if everybody gets a pony?
What else would you recommend for new user Amelia?
If everybody gets a pony, it’s not a very good indicator of what else to predict...
Problems with Raw Co-occurrence
• Very popular items co-occur with everything, which is why it’s not very helpful to know that everybody wants a pony…
– Examples: Welcome document; Elevator music
• Very widespread occurrence is not interesting for generating indicators for recommendation
– Unless you want to offer an item that is constantly desired, such as razor blades (or ponies)
• What we want is anomalous co-occurrence
– This is the source of interesting indicators of preference on which to base recommendation
Overview: Get Useful Indicators from Behaviors
1. Use log files to build history matrix of users x items
– Remember: this history of interactions will be sparse compared to all potential combinations
2. Transform to a co-occurrence matrix of items x items
3. Look for useful indicators by identifying anomalous co-occurrences to make an indicator matrix
– Log Likelihood Ratio (LLR) can be helpful to judge which co-occurrences can with confidence be used as indicators of preference
– ItemSimilarityJob in Apache Mahout uses LLR
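Step 1 above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used in the talk: the `user,item` log format and the `build_history` name are assumptions. Only observed interactions are stored, which is what keeps the history sparse.

```python
from collections import defaultdict

def build_history(log_lines):
    """Parse 'user,item' log lines (an assumed format) into a sparse
    users x items history: one set of items per user."""
    history = defaultdict(set)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, item = line.split(",")
        history[user.strip()].add(item.strip())
    return dict(history)

# Hypothetical log; the repeated event collapses into one entry.
log = ["alice,apple", "alice,puppy", "bob,apple", "charles,bicycle", "alice,apple"]
history = build_history(log)
```

Each user maps to the set of items they interacted with, so absent user-item pairs cost no storage.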
Which one is the anomalous co-occurrence?
          A       not A
B         1       0
not B     0       2

          A       not A
B         13      1000
not B     1000    100,000

          A       not A
B         1       0
not B     0       10,000

          A       not A
B         10      0
not B     0       100,000
Which one is the anomalous co-occurrence?
Score: 1.95
          A       not A
B         1       0
not B     0       2

Score: 0.90
          A       not A
B         13      1000
not B     1000    100,000

Score: 4.52
          A       not A
B         1       0
not B     0       10,000

Score: 14.3
          A       not A
B         10      0
not B     0       100,000

Dunning, Ted. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics vol 19 no. 1 (1993)
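The four values on this slide (0.90, 1.95, 4.52, 14.3) appear consistent with the square root of the log-likelihood ratio statistic for each 2x2 table. The sketch below uses one common formulation based on unnormalized entropies. The function names are illustrative and the code is not taken from the talk or from Mahout.

```python
import math

def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    """Unnormalized entropy: N ln N - sum(k ln k), where N = sum(counts)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp tiny negative round-off

def root_llr(k11, k12, k21, k22):
    """Square root of the LLR, which is on a scale similar to a z-score."""
    return math.sqrt(llr(k11, k12, k21, k22))

# The four tables from the slide, as (A&B, notA&B, A&notB, notA&notB).
tables = [(1, 0, 0, 2), (13, 1000, 1000, 100000), (1, 0, 0, 10000), (10, 0, 0, 100000)]
scores = [root_llr(*t) for t in tables]
```

The 13-versus-1000 table scores lowest even though it has the largest raw co-occurrence count, because 13 is close to what independence would predict (about 10).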
Collection of Documents: Insert Meta-Data
[Diagram: item meta-data is ingested easily via NFS into the search engine]
Document for “puppy”
id: t4
title: puppy
desc: The sweetest little puppy ever.
keywords: puppy, dog, pet
indicators: (t1)
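To show how such a document might be used at query time, here is a toy stand-in for the search engine: the user's recent history is matched against each document's indicators field. A real engine would apply its own relevance scoring, and the plain overlap count here is only a placeholder. The `recommend` function, the documents other than t4, and the assumption that t1 is the apple are all hypothetical.

```python
def recommend(history, documents, top_n=3):
    """Rank unseen items by how many of their indicators appear in the history."""
    seen = set(history)
    scored = []
    for doc in documents:
        if doc["id"] in seen:
            continue  # do not recommend what the user already has
        overlap = len(seen & set(doc["indicators"]))
        if overlap > 0:
            scored.append((overlap, doc["id"]))
    # Highest overlap first; ties broken by id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_n]]

# Hypothetical collection; only the t4 document comes from the slide.
documents = [
    {"id": "t1", "title": "apple", "indicators": []},
    {"id": "t4", "title": "puppy", "indicators": ["t1"]},
    {"id": "t7", "title": "bicycle", "indicators": []},
]
result = recommend(["t1"], documents)
```

With a history containing only t1, the puppy document (t4) is the single match, mirroring the Bob example earlier in the deck.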
Cooccurrence Mechanics
• Cooccurrence is just a self-join
for each user, i
  for each history item j1 in Ai*
    for each history item j2 in Ai*
      count pair (j1, j2)
$$\sum_i a_{ij_1} a_{ij_2} = (A^{\top} A)_{j_1 j_2}$$
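The self-join pseudocode translates almost line for line into Python. This is a small sketch over an in-memory dictionary, assuming binary histories (each user either has an item or not). The `cooccurrence` name and the sample data are illustrative.

```python
from collections import Counter
from itertools import product

def cooccurrence(history):
    """Count ordered item pairs (j1, j2) that share a user.

    With binary histories this equals the entries of A^T A; the diagonal
    (j, j) holds the number of users who have item j.
    """
    counts = Counter()
    for items in history.values():
        for j1, j2 in product(items, repeat=2):
            counts[(j1, j2)] += 1
    return counts

history = {"alice": {"apple", "puppy"}, "bob": {"apple"}, "charles": {"bicycle"}}
counts = cooccurrence(history)
```

Pairs that never share a user are simply absent from the counter, so the result stays as sparse as the data allows.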
Cross-occurrence Mechanics
• Cross-occurrence is just a self-join of adjoined matrices
for each user, i
  for each history item j1 in Ai*
    for each history item j2 in Bi*
      count pair (j1, j2)
$$\sum_i a_{ij_1} b_{ij_2} = (A^{\top} B)_{j_1 j_2}$$
$$[A \mid B]^{\top} [A \mid B] = \begin{bmatrix} A^{\top} A & A^{\top} B \\ B^{\top} A & B^{\top} B \end{bmatrix}$$
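The same loop works across two behaviors. In the sketch below, A is assumed to record one kind of interaction (purchases, say) and B another (views). These roles, the data, and the `cross_occurrence` name are illustrative, since the slide leaves A and B abstract.

```python
from collections import Counter

def cross_occurrence(history_a, history_b):
    """Count pairs (j1 from A, j2 from B) that share a user: entries of A^T B."""
    counts = Counter()
    for user, items_a in history_a.items():
        # Users with no history in B contribute nothing.
        for j1 in items_a:
            for j2 in history_b.get(user, ()):
                counts[(j1, j2)] += 1
    return counts

# Hypothetical data: A = purchases, B = views.
purchases = {"alice": {"puppy"}, "bob": {"puppy", "pony"}}
views = {"alice": {"dog-food"}, "bob": {"dog-food", "saddle"}, "carol": {"saddle"}}
counts = cross_occurrence(purchases, views)
```

Only the off-diagonal block $A^{\top} B$ is computed here; the other blocks of the adjoined product follow from swapping or repeating the arguments.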
A word about scaling
A few pragmatic tricks 
• Downsample all user histories to max length (interaction cut) 
– Can be random or most-recent (no apparent effect on accuracy) 
– Prolific users are often pathological anyway 
– Common limit is 300 items (no apparent effect on accuracy) 
• Downsample all items to limit max viewers (frequency limit) 
– Can be random or earliest (no apparent effect) 
– Ubiquitous items are uninformative 
– Common limit is 500 users (no apparent effect) 
Schelter, et al. Scalable similarity-based neighborhood methods with MapReduce. 
Proceedings of the sixth ACM conference on Recommender systems. 2012
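Both cuts can be sketched with random sampling, one of the options the slide mentions. The function names, the order in which the cuts are applied, and the toy data are assumptions made for illustration. The limits of 300 and 500 quoted above would replace the small values used here.

```python
import random
from collections import defaultdict

def interaction_cut(history, k_max, rng):
    """Keep at most k_max randomly chosen items per user."""
    return {
        user: set(rng.sample(sorted(items), k_max)) if len(items) > k_max else set(items)
        for user, items in history.items()
    }

def frequency_limit(history, f_max, rng):
    """Keep at most f_max randomly chosen users per item."""
    users_by_item = defaultdict(list)
    for user, items in history.items():
        for item in items:
            users_by_item[item].append(user)
    limited = {user: set() for user in history}
    for item, users in users_by_item.items():
        kept = rng.sample(sorted(users), f_max) if len(users) > f_max else users
        for user in kept:
            limited[user].add(item)
    return limited

rng = random.Random(0)  # seeded so the sketch is repeatable
history = {f"u{i}": {"pony"} for i in range(6)}       # everybody has a pony
history["u0"] |= {f"item{j}" for j in range(10)}      # one prolific user
cut = interaction_cut(history, k_max=5, rng=rng)
limited = frequency_limit(cut, f_max=3, rng=rng)
```

After both passes no user has more than 5 items and no item has more than 3 users, whatever the random choices were.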
But note!
• Number of pairs for a user history with $k_i$ distinct items is $\approx k_i^2/2$
• Average size of user history increases with increasing dataset
– Average may grow more slowly than N (or not!)
– Full cooccurrence cost grows strictly faster than N
$$t \propto \sum_i k_i^2 \in \omega(N)$$
– i.e. it just doesn’t scale
• Downsampling interactions places bounds on per user cost
– Cooccurrence with interaction cut is scalable
$$t \propto \sum_i \min(k_{\max}^2, k_i^2) \le N k_{\max}^2 \in O(N)$$
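A small numeric illustration of the two cost expressions, using made-up history sizes: one prolific user dominates the uncut cost, while the cut keeps the total under the $N k_{\max}^2$ bound. The `pair_cost` helper and the sizes are hypothetical.

```python
def pair_cost(history_sizes, k_max=None):
    """Sum of k_i^2 over users, optionally capping each k_i at k_max."""
    total = 0
    for k in history_sizes:
        if k_max is not None:
            k = min(k, k_max)
        total += k * k
    return total

sizes = [3, 10, 50, 2000]          # hypothetical history lengths; one prolific user
full = pair_cost(sizes)            # cost without any cut
capped = pair_cost(sizes, k_max=300)
bound = len(sizes) * 300 ** 2      # N * k_max^2
```

Here the single 2000-item history accounts for nearly all of the uncut cost, which is the pathology the interaction cut removes.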
[Figure: Benefit of down-sampling. Pairs (× 10^9, 0 to 6) versus user limit (0 to 1000), without down-sampling and with track limit = 1000, 500, and 200. Computed on 48,373,586 pair-wise triples from the million song dataset.]
Batch Scaling in Time Implies Scaling in Space
• Note:
– With frequency limit sampling, max cooccurrence count is small (<1000)
– With interaction cut, total number of non-zero pairs is relatively small
– Entire cooccurrence matrix can be stored in memory in ~10-15 GB
• Specifically:
– With interaction cut, cooccurrence scales in size
$$\|A^{\top} A\|_0 \in O(N)$$
– Without interaction cut, cooccurrence does not scale size-wise
$$\|A^{\top} A\|_0 \in \omega(N)$$
Impact of Interaction Cut Downsampling 
• Interaction cut allows batch cooccurrence analysis to be O(N) in 
time and space 
• This is intriguing 
– Amortized cost is low 
– Could this be extended to an on-line form? 
• Incremental matrix factorization is hard 
– Could cooccurrence be a key alternative? 
• Scaling matters most at scale 
– Cooccurrence is very accurate at large scale 
– Factorization shows benefits at smaller scales
Online update
Requirements for Online Algorithms 
• Each unit of input must require O(1) work 
– Theoretical bound 
• The constants have to be small enough on average 
– Pragmatic constraint 
• Total accumulated data must be small (enough) 
– Pragmatic constraint
Real-time recommendations using MapR data platform
[Diagram components: New User History, Web Tier, Search Technology, Recommendations, Log Files (via NFS), Item Meta-Data (via NFS), Pre and Post processing, MapR Cluster]
• Recommendations happen in real-time
• Batch co-occurrence: want this to be real-time
Space Bound Implies Time Bound 
• Because user histories are pruned, only a limited number of 
value updates need be made with each new observation 
• This bound is just twice the interaction cut kmax 
– Which is a constant 
• Bounding the number of updates trivially bounds the time
Implications for Online Update
$$(A + e_{ij})^{\top} (A + e_{ij}) - A^{\top} A = e_{ij}^{\top} A + A^{\top} e_{ij} + \delta_{jj}$$
$$= \begin{bmatrix} 0 \\ A_{i*} \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & (A_{i*})^{\top} & 0 \end{bmatrix} + \delta_{jj}$$
$$\left\| A_{i*} + (A_{i*})^{\top} + \delta_{jj} \right\|_0 \le 2 k_{\max} - 1 \in O(1)$$
With interaction cut at $k_{\max}$
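The update rule can be sketched as an incremental counter: a new (user i, item j) event touches row j, column j, and the diagonal cell (j, j), and nothing else. The class below is an illustration under stated assumptions, not the talk's implementation. In particular, dropping events for users already at the interaction cut is one simple policy chosen here, and `OnlineCooccurrence` is a made-up name.

```python
from collections import Counter, defaultdict

class OnlineCooccurrence:
    """Incrementally maintained cooccurrence counts with an interaction cut."""

    def __init__(self, k_max):
        self.k_max = k_max
        self.history = defaultdict(set)   # user -> items seen so far
        self.counts = Counter()           # (item, item) -> count

    def observe(self, user, item):
        """Apply one (user, item) event; return the number of cells updated."""
        items = self.history[user]
        if item in items or len(items) >= self.k_max:
            return 0  # duplicate event, or user already at the interaction cut
        updates = 0
        for other in items:
            self.counts[(item, other)] += 1   # row j
            self.counts[(other, item)] += 1   # column j
            updates += 2
        self.counts[(item, item)] += 1        # diagonal cell (j, j)
        items.add(item)
        return updates + 1

oc = OnlineCooccurrence(k_max=3)
events = [("alice", "apple"), ("alice", "puppy"), ("bob", "apple"),
          ("alice", "pony"), ("alice", "bicycle")]
updates = [oc.observe(user, item) for user, item in events]
```

The event that brings a history to length k costs 2k - 1 cell updates, so no event here can exceed 2 * k_max - 1 = 5, and the last event is ignored because alice is already at the cut.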
But Wait, It Gets Better 
• The new observation may be pruned 
– For users at the interaction cut, we can ignore updates 
– For items at the frequency cut, we can ignore updates 
– Ignoring updates only affects indicators, not recommendation query 
– At million song dataset size, half of all updates are pruned 
• On average ki is much less than the interaction cut 
– For million song dataset, average appears to grow with log of frequency 
limit, with little dependency on values of interaction cut > 200 
• LLR cutoff avoids almost all updates to counting 
• Average grows slowly with frequency cut
[Figure: k_ave (0 to 35) versus interaction cut k_max (0 to 1000), for frequency cut = 1000, 500, and 200.]
[Figure: k_ave (0 to 35) versus frequency cut (0 to 1000).]
Recap 
• Cooccurrence-based recommendations are simple 
– Deploying with a search engine is even better 
• Interaction cut and frequency cut are key to batch scalability 
• Similar effect occurs in online form of updates 
– Only dozens of updates per transaction needed 
– Data structure required is relatively small 
– Very, very few updates cause search engine updates 
• Fully online recommendation is very feasible, almost easy
More Details Available 
available for free at 
http://www.mapr.com/practical-machine-learning
Who I am 
Ted Dunning, Chief Applications Architect, MapR Technologies 
Email tdunning@mapr.com tdunning@apache.org 
Twitter @Ted_Dunning 
Apache Mahout https://mahout.apache.org/ 
Twitter @ApacheMahout 
Apache Drill http://incubator.apache.org/drill/ 
Twitter @ApacheDrill
Q & A 
Engage with us! 
@mapr maprtech 
tdunning@mapr.com 
MapR 
maprtech 
mapr-technologies

 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 

Recently uploaded (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 

Ted Dunning, Chief Application Architect, MapR at MLconf SF

  • 1. © 2014 MapR Technologies
  • 2. Who I am: Ted Dunning, Chief Applications Architect, MapR Technologies. Email: tdunning@mapr.com, tdunning@apache.org. Twitter: @Ted_Dunning. Apache Mahout: https://mahout.apache.org/ (Twitter: @ApacheMahout)
  • 3. Agenda
      • Background: recommending with puppies and ponies
      • Speed tricks
      • Accuracy tricks
      • Moving to real-time
  • 4. Puppies and Ponies
  • 5. Cooccurrence Analysis
  • 6. How often do items co-occur? $A^\top A$
  • 7. Which co-occurrences are interesting? $\mathrm{sparsify}(A^\top A)$. Each row of indicators becomes a field in a search engine document.
  • 8. Recommendations: Alice got an apple and a puppy.
  • 9. Recommendations: Alice got an apple and a puppy. Charles got a bicycle.
  • 10. Recommendations: Alice got an apple and a puppy. Bob got an apple. Charles got a bicycle.
  • 11. Recommendations: Alice got an apple and a puppy. What else would Bob like? Charles got a bicycle.
  • 12. Recommendations: Alice got an apple and a puppy. Bob: a puppy! Charles got a bicycle.
  • 13. By the way, like me, Bob also wants a pony…
  • 14. Recommendations: What if everybody (Alice, Bob, Amelia, Charles) gets a pony? What else would you recommend for new user Amelia?
  • 15. Recommendations: If everybody gets a pony, it’s not a very good indicator of what else to predict...
  • 16. Problems with Raw Co-occurrence
      • Very popular items co-occur with everything, which is why it’s not very helpful to know that everybody wants a pony…
          – Examples: welcome document; elevator music
      • Very widespread occurrence is not interesting for generating recommendation indicators
          – Unless you want to offer an item that is constantly desired, such as razor blades (or ponies)
      • What we want is anomalous co-occurrence
          – This is the source of interesting indicators of preference on which to base recommendations
  • 17. Overview: Get Useful Indicators from Behaviors
      1. Use log files to build a history matrix of users x items
          – Remember: this history of interactions will be sparse compared to all potential combinations
      2. Transform to a co-occurrence matrix of items x items
      3. Look for useful indicators by identifying anomalous co-occurrences to make an indicator matrix
          – The log-likelihood ratio (LLR) can help judge which co-occurrences can be used with confidence as indicators of preference
          – ItemSimilarityJob in Apache Mahout uses LLR
  • 18. Which one is the anomalous co-occurrence? (Four 2x2 contingency tables; columns are A / not A, rows are B / not B.)

          Table   A and B   B, not A   A, not B   neither
          1             1          0          0         2
          2            13       1000       1000   100,000
          3             1          0          0    10,000
          4            10          0          0   100,000

  • 19. Which one is the anomalous co-occurrence? (The same four tables, now with scores.)

          Table   A and B   B, not A   A, not B   neither   score
          1             1          0          0         2    1.95
          2            13       1000       1000   100,000    0.90
          3             1          0          0    10,000    4.52
          4            10          0          0   100,000    14.3

      Dunning, Ted. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics 19(1), 1993.
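The scores on slide 19 can be reproduced with a few lines of code. The sketch below assumes they are the square root of the LLR (G²) statistic for each 2x2 table; the slide does not say so, and that reading rests only on the arithmetic agreeing. The function names `xlogx`, `llr`, and `root_llr` are made up for illustration and are not taken from the talk.

```python
import math


def xlogx(x):
    """Return x * ln(x), using the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio statistic for a 2x2 contingency table.

    k11 = A and B, k12 = B without A, k21 = A without B, k22 = neither.
    """
    n = k11 + k12 + k21 + k22
    cells = xlogx(k11) + xlogx(k12) + xlogx(k21) + xlogx(k22)
    rows = xlogx(k11 + k12) + xlogx(k21 + k22)
    cols = xlogx(k11 + k21) + xlogx(k12 + k22)
    # Rounding can push a value that should be 0 slightly below 0.
    return max(0.0, 2.0 * (cells - rows - cols + xlogx(n)))


def root_llr(k11, k12, k21, k22):
    """Square root of the LLR statistic (assumed to be the slide's score)."""
    return math.sqrt(llr(k11, k12, k21, k22))


# The four tables from slides 18 and 19, in the order listed there.
tables = [
    (1, 0, 0, 2),
    (13, 1000, 1000, 100_000),
    (1, 0, 0, 10_000),
    (10, 0, 0, 100_000),
]

for table in tables:
    print(table, round(root_llr(*table), 2))
```

Under this reading the second table, despite having the largest raw co-occurrence count (13), gets the lowest score, because 13 is close to what independence would predict for two items that each occur about 1,000 times.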
  • 20. Collection of Documents: Insert Meta-Data. Item meta-data is ingested easily into the search engine via NFS. Document for “puppy”:
          id: t4
          title: puppy
          desc: The sweetest little puppy ever.
          keywords: puppy, dog, pet
          indicators: (t1)
  • 21. Cooccurrence Mechanics
      • Cooccurrence is just a self-join:
          for each user i
              for each history item j1 in A_i*
                  for each history item j2 in A_i*
                      count pair (j1, j2)
        $(A^\top A)_{j_1 j_2} = \sum_i a_{i j_1} a_{i j_2}$
  • 22. Cross-occurrence Mechanics
      • Cross-occurrence is just a self-join of adjoined matrices:
          for each user i
              for each history item j1 in A_i*
                  for each history item j2 in B_i*
                      count pair (j1, j2)
        $(A^\top B)_{j_1 j_2} = \sum_i a_{i j_1} b_{i j_2}$
        $[A \mid B]^\top [A \mid B] = \begin{bmatrix} A^\top A & A^\top B \\ B^\top A & B^\top B \end{bmatrix}$
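The loops on slides 21 and 22 translate almost directly into code. This is a minimal in-memory sketch, assuming binary histories (an item either is or is not in a user's history) stored as plain dictionaries; the function names and the `views` data are made up for illustration, while the purchases follow the Alice, Bob, and Charles example from the earlier slides.

```python
from collections import Counter
from itertools import product


def cross_occurrence(histories_a, histories_b):
    """Count pairs (j1, j2): j1 from a user's A history, j2 from the same user's B history."""
    counts = Counter()
    for user, items_a in histories_a.items():
        items_b = histories_b.get(user, ())
        # set() makes each history binary, so repeats do not inflate counts.
        for j1, j2 in product(set(items_a), set(items_b)):
            counts[(j1, j2)] += 1
    return counts


def cooccurrence(histories):
    """Cooccurrence is the special case where both behaviors are the same."""
    return cross_occurrence(histories, histories)


purchases = {
    "alice": ["apple", "puppy"],
    "bob": ["apple"],
    "charles": ["bicycle"],
}
# Hypothetical second behavior, used only to exercise cross_occurrence.
views = {
    "alice": ["pony"],
    "bob": ["pony", "bicycle"],
}

print(cooccurrence(purchases)[("apple", "puppy")])
print(cross_occurrence(purchases, views)[("apple", "pony")])
```

Each key `(j1, j2)` plays the role of one cell of $A^\top A$ (or $A^\top B$), and a `Counter` keeps the result sparse because only pairs that actually occur are stored.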
  • 23. A word about scaling
  • 24. A few pragmatic tricks
      • Downsample all user histories to a max length (interaction cut)
          – Can be random or most-recent (no apparent effect on accuracy)
          – Prolific users are often pathological anyway
          – Common limit is 300 items (no apparent effect on accuracy)
      • Downsample all items to limit max viewers (frequency limit)
          – Can be random or earliest (no apparent effect)
          – Ubiquitous items are uninformative
          – Common limit is 500 users (no apparent effect)
      Schelter, et al. Scalable similarity-based neighborhood methods with MapReduce. Proceedings of the Sixth ACM Conference on Recommender Systems. 2012.
  • 25. But note!
      • The number of pairs for a user history with $k_i$ distinct items is $\approx k_i^2/2$
      • Average size of user history increases with increasing dataset
          – Average may grow more slowly than N (or not!)
          – Full cooccurrence cost grows strictly faster than N: $t \propto \sum_i k_i^2 \in \omega(N)$
          – i.e. it just doesn’t scale
      • Downsampling interactions places bounds on per-user cost
          – Cooccurrence with interaction cut is scalable: $t \propto \sum_i \min(k_{\max}^2, k_i^2) \le N k_{\max}^2 \in O(N)$
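The effect of the interaction cut on the pair count can be sketched as follows. This is an illustration under stated assumptions, not the Mahout implementation: it uses random sampling (the slides say most-recent works too), the synthetic histories are invented, and the names `interaction_cut` and `pair_count` are hypothetical. The frequency limit on items would be handled analogously and is not shown.

```python
import random


def interaction_cut(histories, k_max, seed=0):
    """Randomly downsample every user history to at most k_max distinct items."""
    rng = random.Random(seed)
    cut = {}
    for user, items in histories.items():
        distinct = sorted(set(items))
        if len(distinct) > k_max:
            distinct = rng.sample(distinct, k_max)
        cut[user] = distinct
    return cut


def pair_count(histories):
    """Pairs visited by the self-join: the sum of k_i squared over users."""
    return sum(len(set(items)) ** 2 for items in histories.values())


# Synthetic data: 50 users whose history lengths are 10, 20, ..., 500.
histories = {f"user{u}": list(range(u * 10)) for u in range(1, 51)}
k_max = 300

cut = interaction_cut(histories, k_max)
print("pairs without cut:", pair_count(histories))
print("pairs with cut:   ", pair_count(cut))
print("upper bound:      ", len(histories) * k_max ** 2)
```

With the cut in place, no user contributes more than `k_max ** 2` pairs, so the total is bounded by a constant times the number of users regardless of how prolific individual users are.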
  • 26. [Figure: Benefit of down-sampling. x-axis: user limit (0 to 1000); y-axis: pairs (x 10^9); curves: without down-sampling, and track limit = 1000, 500, 200. Computed on 48,373,586 pair-wise triples from the million song dataset.]
  • 27. Batch Scaling in Time Implies Scaling in Space
      • Note:
          – With frequency limit sampling, max cooccurrence count is small (<1000)
          – With interaction cut, total number of non-zero pairs is relatively small
          – Entire cooccurrence matrix can be stored in memory in ~10-15 GB
      • Specifically:
          – With interaction cut, cooccurrence scales in size: $\lVert A^\top A \rVert_0 \in O(N)$
          – Without interaction cut, cooccurrence does not scale size-wise: $\lVert A^\top A \rVert_0 \in \omega(N)$
  • 28. Impact of Interaction Cut Downsampling
      • Interaction cut allows batch cooccurrence analysis to be O(N) in time and space
      • This is intriguing
          – Amortized cost is low
          – Could this be extended to an on-line form?
      • Incremental matrix factorization is hard
          – Could cooccurrence be a key alternative?
      • Scaling matters most at scale
          – Cooccurrence is very accurate at large scale
          – Factorization shows benefits at smaller scales
  • 29. Online update
  • 30. Requirements for Online Algorithms
      • Each unit of input must require O(1) work
          – Theoretical bound
      • The constants have to be small enough on average
          – Pragmatic constraint
      • Total accumulated data must be small (enough)
          – Pragmatic constraint
  • 31. Real-time recommendations using MapR data platform [Architecture diagram. Labels: Web Tier; New User History; Recommendations; Search Technology; MapR Cluster; Log Files; Item Meta-Data; via NFS (twice); Pre; Post; Batch co-occurrence. Annotations: “Recommendations happen in real-time”; “Want this to be real-time”.]
  • 32. Space Bound Implies Time Bound
      • Because user histories are pruned, only a limited number of value updates need be made with each new observation
      • This bound is just twice the interaction cut $k_{\max}$
          – Which is a constant
      • Bounding the number of updates trivially bounds the time
  • 33. Implications for Online Update
        $(A + e_{ij})^\top (A + e_{ij}) - A^\top A = e_{ij}^\top A + A^\top e_{ij} + \delta_{jj} = \begin{bmatrix} 0 \\ A_{i*} \\ 0 \end{bmatrix} + \begin{bmatrix} 0 & (A_{i*})^\top & 0 \end{bmatrix} + \delta_{jj}$
  • 34. With interaction cut at $k_{\max}$: $\lVert A_{i*} + (A_{i*})^\top + \delta_{jj} \rVert_0 \le 2k_{\max} - 1 \in O(1)$
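The update identity on slide 33 says that one new observation (user i, item j) touches only row j and column j of the cooccurrence matrix, plus the diagonal cell (j, j). The sketch below checks this against a full batch recomputation. It assumes binary histories held in memory as sets, and it drops observations for users already at the interaction cut, which is one simple policy chosen for illustration; all names are hypothetical.

```python
from collections import Counter


def batch_cooccurrence(histories):
    """Full recomputation of item-item cooccurrence counts (the batch baseline)."""
    counts = Counter()
    for items in histories.values():
        for j1 in items:
            for j2 in items:
                counts[(j1, j2)] += 1
    return counts


def online_update(counts, histories, user, item, k_max):
    """Apply one observation (user, item) in place; return the number of cells touched."""
    history = histories.setdefault(user, set())
    if item in history or len(history) >= k_max:
        return 0  # duplicate, or user already at the interaction cut
    touched = 0
    for other in history:
        counts[(item, other)] += 1  # row `item` gains the user's existing row
        counts[(other, item)] += 1  # column `item` gains its transpose
        touched += 2
    counts[(item, item)] += 1  # the diagonal (delta) term
    history.add(item)
    return touched + 1


histories = {"alice": {"apple", "puppy"}, "bob": {"apple"}}
counts = batch_cooccurrence(histories)

# Bob gets a puppy: his one-item history means 2 * 1 + 1 = 3 cells change.
touched = online_update(counts, histories, "bob", "puppy", k_max=300)
print(touched, counts == batch_cooccurrence(histories))
```

A history of length k before the update yields 2k + 1 touched cells, and because k is at most $k_{\max} - 1$ when the update is accepted, the count never exceeds $2k_{\max} - 1$, matching the bound on slide 34.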
  • 35. But Wait, It Gets Better
      • The new observation may be pruned
          – For users at the interaction cut, we can ignore updates
          – For items at the frequency cut, we can ignore updates
          – Ignoring updates only affects indicators, not recommendation query
          – At million song dataset size, half of all updates are pruned
      • On average $k_i$ is much less than the interaction cut
          – For million song dataset, average appears to grow with log of frequency limit, with little dependency on values of interaction cut > 200
      • LLR cutoff avoids almost all updates to counting
      • Average grows slowly with frequency cut
  • 36. [Figure: average history length k_ave (y-axis, 0 to 35) versus interaction cut k_max (x-axis, 0 to 1000), with curves for frequency cut = 1000, 500, 200.]
  • 37. [Figure: average history length k_ave (y-axis, 0 to 35) versus frequency cut (x-axis, 0 to 1000).]
  • 38. Recap
      • Cooccurrence-based recommendations are simple
          – Deploying with a search engine is even better
      • Interaction cut and frequency cut are key to batch scalability
      • Similar effect occurs in online form of updates
          – Only dozens of updates per transaction needed
          – Data structure required is relatively small
          – Very, very few updates cause search engine updates
      • Fully online recommendation is very feasible, almost easy
  • 39. More Details Available: available for free at http://www.mapr.com/practical-machine-learning
  • 40. Who I am: Ted Dunning, Chief Applications Architect, MapR Technologies. Email: tdunning@mapr.com, tdunning@apache.org. Twitter: @Ted_Dunning. Apache Mahout: https://mahout.apache.org/ (Twitter: @ApacheMahout). Apache Drill: http://incubator.apache.org/drill/ (Twitter: @ApacheDrill)
  • 41. Q & A. Engage with us! @mapr, maprtech, tdunning@mapr.com, MapR, maprtech, mapr-technologies

Editor's Notes

  1. Mention that the Pony book said “RowSimilarityJob”…