Vowpal Platypus: Very Fast Multi-Core Machine Learning in Python.

BEWARE… FOR IT’S THE...
VOWPAL PLATYPUS
Peter Hurford
(With a little help from some friends)
WE OFTEN WANT TO PREDICT STUFF…
...BUT WE RUN INTO LIMITATIONS.
× ...Data set is too large, it doesn’t fit in RAM.
× ...Data set is so large, it doesn’t fit on disk!
× ...Model train time is so slow, you can’t iterate and try things.
“I want to use parallel learning algorithms to create fantastic learning machines!”
- John Langford, 1997
YOU FOOL! THE ONLY THING PARALLEL MACHINES ARE USEFUL FOR ARE COMPUTATIONAL WIND TUNNELS!
TEN YEARS LATER...
VOWPAL WABBIT
...Fast Online Learning
...WHAT’S WITH THE NAME?
WHAT DOES IT DO?
Traditional Approach
1. Load all training data into RAM at once.
2. Fit model to training dataset.
3. Load all predicting data into RAM at once.
4. Use trained model to make predictions.
VW “Online” Approach
1. Train model on single datapoints, one at a time.
2. Do it again multiple times.
3. Use trained model to predict on new datapoints, one at a time.
× Online approach eventually converges to the same results as a traditional (batch) approach over enough iterations.
× But you’re no longer dependent on RAM!
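The online approach above can be sketched in a few lines of plain Python. This is a toy illustration, not Vowpal Wabbit's actual implementation: it assumes a logistic model trained by stochastic gradient descent, and the names `stream_examples`, `predict`, and `train_online`, the synthetic data, and the learning rate are all made up for the demo.

```python
import math
import random


def stream_examples(n, seed=0):
    """Yield (features, label) pairs one at a time, as if read from a file."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        # Toy rule (an assumption for the demo): the label is 1 when x > 0.
        yield {"bias": 1.0, "x": x}, 1 if x > 0 else 0


def predict(weights, features):
    """Logistic prediction from a sparse feature dictionary."""
    score = sum(weights.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-score))


def train_online(passes=3, n=2000, learning_rate=0.5):
    """Update the weights after every single example; the data set is never held in memory."""
    weights = {}
    for _ in range(passes):  # "Do it again multiple times."
        for features, label in stream_examples(n):
            error = predict(weights, features) - label
            for name, value in features.items():
                weights[name] = weights.get(name, 0.0) - learning_rate * error * value
    return weights


weights = train_online()
print(predict(weights, {"bias": 1.0, "x": 0.8}))   # close to 1
print(predict(weights, {"bias": 1.0, "x": -0.8}))  # close to 0
```

Only the weight dictionary lives in RAM, so memory use depends on the number of features rather than the number of datapoints.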
IS IT ANY GOOD?
Kaggle: World Data Science Competitions
× 3rd, 14th, and 29th / 718 on $16K Criteo ad click challenge
× 3rd / 472 on $2K KDD Cup Challenge
× 8th / 128 on $25K Avito.ru illicit content filtering challenge
DID I MENTION IT’S FAST?
× szilard/benchm-ml: widely cited (1127 stars) independent ML speed benchmarks.
× Logistic Regression on 10M datapoints on a c3.8xlarge instance (32 cores, 60GB RAM).

Engine          Speed
Python Sklearn  Crashed
R               90sec
Vowpal Wabbit   15sec
Spark           35sec

Yes, this was Spark 2.0, but it was using MLlib. ML performance is under testing now.
But this benchmark was only single core!
...and none of the benchmarks include data load time! (VP has none.)
...BUT WHAT’S THIS ABOUT A PLATYPUS?
WHAT IS VOWPAL PLATYPUS?
× An open source vehicle for productionizing Vowpal Wabbit in Python.
× Train and predict on Python dictionaries instead of the obscure VW format.
× Easily use VW’s parallel features to go multicore and multi-machine.
VW has been used on “terascale datasets, with trillions of features, billions of training examples and millions of parameters in an hour using a cluster of 1000 machines.”
...so far VP has only been used on a maximum of 3 machines (combined 108 cores), but we’re getting there...
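To give a feel for what "dictionaries instead of the VW format" saves you, here is a rough sketch of turning a Python dictionary into one line of VW's plain-text input (`label |namespace feature:value ...`). The helper `to_vw_line`, the namespace name `f`, and the example features are hypothetical and only for illustration; this is not Vowpal Platypus's own API.

```python
# Characters with special meaning in VW's text format are replaced with "_".
_UNSAFE = str.maketrans({" ": "_", ":": "_", "|": "_"})


def to_vw_line(features, label=None, namespace="f"):
    """Render one example as a line of VW text input (hypothetical helper)."""
    tokens = []
    for name, value in sorted(features.items()):
        name = str(name).translate(_UNSAFE)
        if isinstance(value, bool):
            if value:  # a true flag becomes a bare feature name
                tokens.append(name)
        elif isinstance(value, (int, float)):
            tokens.append(f"{name}:{value}")  # numeric feature as name:value
        else:
            # Categorical value: fold it into the feature name.
            tokens.append(f"{name}={str(value).translate(_UNSAFE)}")
    prefix = "" if label is None else f"{label} "
    return f"{prefix}|{namespace} " + " ".join(tokens)


example = {"age": 34, "genre": "comedy", "rating_count": 12.0}
line = to_vw_line(example, label=1)
print(line)  # 1 |f age:34 genre=comedy rating_count:12.0
```

Leaving out the label gives a line suitable for prediction, for example `to_vw_line({"x": 1})` yields `|f x:1`.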
DEMO #1!
DEMO #2!
27,279 movies & 138,494 users
3,757,977,826 predictions... need to be made.
21m47s total runtime on 3x c4.8xlarge (108 cores total)
342 nanoseconds per prediction (wall clock time)
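As a back-of-the-envelope check, the throughput can be recomputed from the figures on the slide. The variable names are made up for the sketch. Dividing the stated runtime by the stated prediction count gives roughly 348 ns per prediction, close to the 342 ns quoted; the small gap may come from rounding or from what was counted in the runtime.

```python
predictions = 3_757_977_826      # from the slide
runtime_seconds = 21 * 60 + 47   # 21m47s from the slide
cores = 108                      # 3x c4.8xlarge, from the slide

ns_per_prediction = runtime_seconds / predictions * 1e9
predictions_per_second = predictions / runtime_seconds
per_core_per_second = predictions_per_second / cores

print(f"{ns_per_prediction:.1f} ns per prediction (wall clock)")
print(f"{predictions_per_second:,.0f} predictions per second overall")
print(f"{per_core_per_second:,.0f} predictions per second per core")
```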
THE END! (...OR IS IT?)