BEWARE… FOR IT’S THE...
VOWPAL PLATYPUS
Peter Hurford
(With a little help from some friends)
WE OFTEN WANT TO PREDICT STUFF…
...BUT WE RUN INTO LIMITATIONS.
× ...Data set is too large, it doesn’t fit in RAM.
× ...Data set is so large, it doesn’t fit on disk!
× ...Model training is so slow, you can’t iterate and try things.
“I want to use parallel
learning algorithms to
create fantastic learning
machines!”
- John Langford, 1997
YOU FOOL! THE ONLY
THING PARALLEL
MACHINES ARE USEFUL
FOR IS COMPUTATIONAL
WIND TUNNELS!
TEN YEARS LATER...
VOWPAL WABBIT
...Fast Online Learning
...WHAT’S WITH THE NAME?
WHAT DOES IT DO?
Traditional Approach
1. Load all training data into RAM at once.
2. Fit model to training dataset.
3. Load all prediction data into RAM at once.
4. Use trained model to make predictions.
VW “Online” Approach
1. Train model on single datapoints, one at a time.
2. Do it again multiple times.
3. Use trained model to predict on new datapoints, one at a time.
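The three "online" steps can be sketched in plain Python. This is only an illustration of the idea, not VW's actual implementation: the toy data, the `stream_examples` generator (standing in for reading a file line by line), and the learning rate and pass count are all assumptions made for the example.

```python
import math


def stream_examples():
    """Yield (features, label) pairs one at a time, as if read from disk."""
    data = [([-2.0], 0), ([-1.0], 0), ([1.0], 1), ([2.0], 1)]
    for features, label in data:
        yield features, label


def predict(weights, bias, features):
    """Logistic regression prediction for a single datapoint."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_online(make_stream, n_features, passes=50, learning_rate=0.5):
    """Update the model from one datapoint at a time, over several passes."""
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(passes):                       # step 2: do it again
        for features, label in make_stream():     # step 1: one datapoint
            error = label - predict(weights, bias, features)
            for i, x in enumerate(features):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
    return weights, bias


weights, bias = train_online(stream_examples, n_features=1)
# Step 3: predict on datapoints one at a time.
predictions = [round(predict(weights, bias, x)) for x, _ in stream_examples()]
print(predictions)
```

Only the current datapoint and the weight vector are ever held in memory, which is why the size of the dataset stops mattering.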
WHAT DOES IT DO?
× Online approach eventually converges to the same results as a traditional (batch) approach over enough iterations.
× But you’re no longer dependent on RAM!
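A small check of the convergence claim, under friendly assumptions: a one-feature linear regression on noise-free toy data, with a hand-picked learning rate. The function names are made up for this sketch. On noisy data a constant learning rate typically hovers near the batch answer rather than landing on it exactly, so a decaying rate is the usual remedy.

```python
def fit_batch(xs, ys):
    """Closed-form least squares for y = w*x + b, using all data at once."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    return w, mean_y - w * mean_x


def fit_online(xs, ys, passes=500, learning_rate=0.05):
    """Stochastic gradient descent, touching one datapoint per update."""
    w, b = 0.0, 0.0
    for _ in range(passes):
        for x, y in zip(xs, ys):
            error = y - (w * x + b)
            w += learning_rate * error * x
            b += learning_rate * error
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

batch_w, batch_b = fit_batch(xs, ys)
online_w, online_b = fit_online(xs, ys)
gap = max(abs(batch_w - online_w), abs(batch_b - online_b))
print(batch_w, batch_b, online_w, online_b, gap)
```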
IS IT ANY GOOD?
Kaggle: World Data Science Competitions
× 3rd, 14th, and 29th / 718 on $16K Criteo ad click challenge
× 3rd / 472 on $2K KDD Cup Challenge
× 8th / 128 on $25K Avito.ru illicit content filtering challenge
DID I MENTION IT’S FAST?
× szilard/benchm-ml: widely cited (1,127 stars) independent ML speed benchmarks.
× Logistic Regression on 10M datapoints on a c3.8xlarge instance (32 cores, 60GB RAM).
Engine           Time
Python Sklearn   Crashed
R                90sec
Vowpal Wabbit    15sec
Spark            35sec
Yes, this was Spark 2.0, but it was using MLlib. ML performance is under testing now.
But this benchmark was only single core!
...and none of the benchmarks include data load time! (VP has none.)
...BUT WHAT’S THIS ABOUT A PLATYPUS?
WHAT IS VOWPAL PLATYPUS?
× An open source vehicle for productionizing Vowpal Wabbit in Python.
× Train and predict on Python dictionaries instead of the obscure VW format.
× Easily use VW’s parallel features to go multicore and multi-machine.
VW has been used on “terascale datasets, with trillions of features, billions of training examples and millions of parameters in an hour using a cluster of 1000 machines.”
...so far VP has only been used on a maximum of 3 machines (combined 108 cores), but we’re getting there...
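To give a feel for what "dictionaries instead of the VW format" saves you from, here is a rough sketch of turning a Python dictionary into one line of VW's text input (`label |namespace feature:value ...`). The helpers `to_vw_line` and `clean` are hypothetical names written for this illustration. They are not Vowpal Platypus functions, and the library's real conversion may differ.

```python
def clean(name):
    """VW treats spaces, ':' and '|' as syntax, so swap them out of names."""
    return str(name).replace(" ", "_").replace(":", "_").replace("|", "_")


def to_vw_line(label, namespaces):
    """Build one VW-format example from {namespace: features}.

    `features` is either a dict of name -> numeric value, or a list of
    names (each then gets VW's implicit value of 1).
    """
    parts = [str(label)]
    for namespace, features in namespaces.items():
        if isinstance(features, dict):
            tokens = [f"{clean(k)}:{v}" for k, v in features.items()]
        else:
            tokens = [clean(k) for k in features]
        parts.append(f"|{clean(namespace)} " + " ".join(tokens))
    return " ".join(parts)


example = to_vw_line(
    1,
    {"user": {"age": 25, "visits": 3}, "movie": ["action", "sci fi"]},
)
print(example)  # 1 |user age:25 visits:3 |movie action sci_fi
```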
DEMO #1!
DEMO #2!
27,279 movies & 138,494 users
3,757,977,826 predictions... need to be made.
21m47s total runtime on 3x c4.8xlarge (108 cores total)
342 nanoseconds per prediction (wall clock time)
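The demo arithmetic can be reproduced as a back-of-the-envelope check. It assumes one prediction per (movie, user) pair and that the full 21m47s is spent predicting. Under those assumptions the pair count comes out at 3,777,977,826, slightly above the figure on the slide. Dividing the runtime by the slide's own count gives roughly 348 ns per prediction, close to the quoted 342 ns. Which inputs the slide actually used is not stated.

```python
movies = 27_279
users = 138_494
pairs = movies * users  # one prediction per (movie, user) pair

runtime_seconds = 21 * 60 + 47      # 21m47s wall clock
slide_predictions = 3_757_977_826   # count as printed on the slide
cores = 108                         # 3x c4.8xlarge

# Wall-clock time per prediction across the whole cluster.
ns_per_prediction = runtime_seconds / slide_predictions * 1e9
# Time a single core spends per prediction if work is spread evenly.
core_us_per_prediction = ns_per_prediction * cores / 1000

print(f"{pairs:,} pairs")
print(f"{ns_per_prediction:.0f} ns per prediction (wall clock)")
print(f"{core_us_per_prediction:.1f} microseconds of core time per prediction")
```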
THE END! (...OR IS IT?)

Vowpal Platypus: Very Fast Multi-Core Machine Learning in Python.
