Realtime Predictive Analytics
Using scikit-learn and RabbitMQ
Michael Becker
PyData Boston 2013
Who is this guy?
Software Engineer @ AWeber
Founder of the DataPhilly Meetup group
@beckerfuffle
beckerfuffle.com
These slides and more @ github.com/mdbecker
What I'll cover
•Model Distribution
•Data flow
•RabbitMQ
•Demo
•Scalability
•Other considerations
Supervised Learning
38 top Wikipedias
Arabic ‫العربية‬
Bulgarian Български
Catalan Català
Czech Čeština
Danish Dansk
German Deutsch
English English
Spanish Español
Estonian Eesti
Basque Euskara
Persian ‫فارسی‬
Finnish Suomi
French Français
Hebrew ‫עברית‬
Hindi हिन्दी
Croatian Hrvatski
Hungarian Magyar
Indonesian Bahasa Indonesia
Italian Italiano
Japanese 日本語
Kazakh Қазақша
Korean 한국어
Lithuanian Lietuvių
Malay Bahasa Melayu
Dutch Nederlands
Norwegian (Bokmål) Norsk (Bokmål)
Polish Polski
Portuguese Português
Romanian Română
Russian Русский
Slovak Slovenčina
Slovenian Slovenščina
Serbian Српски / Srpski
Swedish Svenska
Turkish Türkçe
Ukrainian Українська
Vietnamese Tiếng Việt
Waray-Waray Winaray
The model
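A minimal sketch of a language classifier of the kind this slide refers to. The slide does not say which features or estimator were used, so the character n-gram TF-IDF features, the `LogisticRegression` estimator, the three-language toy corpus, and the helper name `train_language_model` are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy training data (assumption): the talk used text from 38 Wikipedias.
TRAIN = [
    ("the quick brown fox jumps over the lazy dog", "en"),
    ("this is a sentence written in english", "en"),
    ("der schnelle braune fuchs springt über den faulen hund", "de"),
    ("dies ist ein satz in deutscher sprache", "de"),
    ("el rápido zorro marrón salta sobre el perro perezoso", "es"),
    ("esta es una frase escrita en español", "es"),
]


def train_language_model(samples):
    """Fit a text -> language-code pipeline on (text, label) pairs."""
    texts, labels = zip(*samples)
    model = Pipeline([
        # Character n-grams within word boundaries capture spelling patterns.
        ("vec", TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3))),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(list(texts), list(labels))
    return model


model = train_language_model(TRAIN)
# predict() takes a list of documents and returns one label per document.
print(model.predict(["where is the nearest train station"]))
```

With so little training data the prediction is only illustrative; a usable model needs far more text per language.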
Distributing the model
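One common way to ship a fitted model to other machines is to serialize it with `pickle` and verify the bytes on arrival. The sketch below assumes that approach; the slide does not state how the model was actually distributed, and the helper names `serialize_model` and `deserialize_model` are hypothetical. A plain dictionary stands in for the model here, since a fitted scikit-learn estimator can be pickled the same way.

```python
import hashlib
import pickle


def serialize_model(model):
    """Return (payload, digest): the pickled model and its SHA-256 checksum."""
    payload = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)
    digest = hashlib.sha256(payload).hexdigest()
    return payload, digest


def deserialize_model(payload, expected_digest):
    """Check the checksum, then unpickle. Only unpickle data you trust."""
    if hashlib.sha256(payload).hexdigest() != expected_digest:
        raise ValueError("model payload does not match its checksum")
    return pickle.loads(payload)


# Stand-in for a fitted estimator (assumption, for illustration only).
stand_in_model = {"classes": ["de", "en", "es"], "weights": [0.1, 0.2, 0.7]}
payload, digest = serialize_model(stand_in_model)
restored = deserialize_model(payload, digest)
```

Pickled scikit-learn models are not guaranteed to load under a different scikit-learn version, so the training and serving environments should pin the same version.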
Data input
The client
Message loss
Enter RabbitMQ
Reliability
Flexible Routing
Clustering
HA Queues
Many clients
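The reliability point above can be sketched with the `pika` client: declare a durable queue and mark each message persistent so that queued work survives a broker restart. The queue name `classify`, the JSON message shape, the `localhost` broker, and the helper names are assumptions; the slides do not show the actual client code.

```python
import json

QUEUE = "classify"  # hypothetical queue name


def encode_request(text):
    """Encode one classification request as UTF-8 JSON bytes."""
    return json.dumps({"text": text}).encode("utf-8")


def publish_text(channel, text, properties=None):
    """Publish one request through the default exchange to QUEUE."""
    channel.basic_publish(
        exchange="",
        routing_key=QUEUE,
        body=encode_request(text),
        properties=properties,
    )


def main():
    # Imported here so the helpers above can be exercised without a broker.
    import pika

    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host="localhost")
    )
    channel = connection.channel()
    # durable=True keeps the queue definition across broker restarts.
    channel.queue_declare(queue=QUEUE, durable=True)
    # delivery_mode=2 asks the broker to persist the message to disk.
    persistent = pika.BasicProperties(delivery_mode=2)
    publish_text(channel, "Bonjour tout le monde", persistent)
    connection.close()
```

Calling `main()` requires a RabbitMQ broker listening on `localhost`.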
AMQP
Data processing
The worker
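A worker in this kind of design typically consumes a message, runs the model, and acknowledges the message only after the prediction succeeds, so that unfinished work is redelivered if the worker dies. The sketch below assumes the `pika` client, a queue named `classify`, JSON bodies of the form `{"text": ...}`, and hypothetical helpers `make_handler` and `run_worker`; none of these are taken from the slides.

```python
import json

QUEUE = "classify"  # hypothetical queue name


def make_handler(predict, on_result):
    """Build a pika message callback around a predict(list_of_texts) function."""
    def handle(channel, method, properties, body):
        text = json.loads(body.decode("utf-8"))["text"]
        label = predict([text])[0]
        on_result(text, label)
        # Ack only after the work is done, so a crash leads to redelivery.
        channel.basic_ack(delivery_tag=method.delivery_tag)
    return handle


def run_worker(predict, on_result=print):
    # Imported here so make_handler can be exercised without a broker.
    import pika

    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host="localhost")
    )
    channel = connection.channel()
    channel.queue_declare(queue=QUEUE, durable=True)
    # Hand each worker one unacknowledged message at a time.
    channel.basic_qos(prefetch_count=1)
    channel.basic_consume(
        queue=QUEUE,
        on_message_callback=make_handler(predict, on_result),
    )
    channel.start_consuming()
```

With a fitted scikit-learn model, `run_worker(model.predict)` would start consuming; running several such processes against the same queue spreads messages across them.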
The design
Demo time!
Scaling
Realtime vs batch
Monitoring
Load
Verify
Thank you
API & Worker: Kelly O’Brien (linkedin.com/in/kellyobie)
UI: Matt Parke (ordinaryrobot.com)
Classifier: Michael Becker (github.com/mdbecker)
Images: Wikipedia
My info
Tweet me @beckerfuffle
Find me at beckerfuffle.com
These slides and more @ github.com/mdbecker

GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk"Frontline Battles with DDoS: Best practices and Lessons Learned",  Igor Ivaniuk
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor Ivaniuk
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 

Realtime predictive analytics using RabbitMQ & scikit-learn

Editor's Notes

  1. Good morning everyone. My name is Michael Becker. I work on the Data Analysis and Management team at AWeber, an email marketing company in Chalfont, PA. I'm also the founder of the DataPhilly Meetup group. You can find me online @beckerfuffle on Twitter, at beckerfuffle.com, and as mdbecker on GitHub. I'll be posting the materials for this talk on my GitHub.
  2. This talk will cover a lot of the logistics behind using a trained scikit-learn model in a real-life production environment. First, I’ll cover how to distribute your model.
  3. I’ll discuss how to get new data to your model for prediction.
  4. I’ll introduce RabbitMQ, what it is and why you should care.
  5. I’ll demonstrate how we can put all this together into a finished product
  6. I’ll discuss how to scale your model
  7. Finally, I’ll cover some additional things to consider when using scikit-learn models in a realtime production environment.
  8. To start off, let's recap what the supervised model training process looks like. 1) You have your training data and labels 2) You vectorize your data, you train your machine learning algorithm. 3) ??? 4) Make predictions with new data 5) Profit
  9. In this case I'm going to talk about one of the first models I created. A model that predicts the language of input text. To create this model, I used 38 of the top Wikipedias based on number of articles. I then dumped several of the most popular articles as defined by their number of hits.
  10. I converted the wiki markup to plain text. I trained a LinearSVC (Support Vector Classifier) model using a bigram/trigram (character n-gram) approach that I had read works well for language classification. This approach involves counting all 2-character (bigram) and 3-character (trigram) sequences in your dataset. I tested the model and I was seeing ~99% accuracy. Here I've defined a pipeline combining a text feature extractor with a simple classifier. A pipeline is a utility used to build a composite classifier. To extract features, I'm using a TfidfVectorizer. The vectorizer first counts the number of occurrences of each n-gram in each document to "vectorize the text." It then applies the TF-IDF (term frequency–inverse document frequency) algorithm. TF-IDF reflects how important a term is to a document in a collection of documents. The TF-IDF value increases with the number of times an n-gram appears in the document, but is offset by the frequency of the n-gram in the rest of the documents. So, for example, a common word like "the" would get down-weighted compared to a less common word like "automobile."
  11. So the first thing you might ask yourself after you've trained your awesome model is "now what?" One of the first problems you'll want to solve is how to distribute your model. The easiest way to do this is to pickle (serialize) the model to disk and distribute it as part of your application. You can also store it in a remote store such as GridFS or Amazon S3. In the case of my model, it took up roughly 400MB in memory. This is pretty big, but easily storable on disk (and more importantly in memory).
  12. Next let's discuss how we’re going to get data into our model. Your data could be coming from many types of sources: a web front-end, a DB trigger, etc. In many cases, you can't easily control the rate of incoming data and you don't want to hold up the front-end or the database while you wait for a prediction to be made. In these cases, it's useful to be able to process your data asynchronously.
  13. In the example I'm giving today, we created a simple web front-end (similar to Google Translate) where a user can enter some text to be classified and get a classification back. We don't want to hold up a thread or process in the client waiting on our classifier to do its thing. Rather, the front-end sends the input to a REST API, which records the text input and returns a tracking_id that the client can then use to get the result.
  14. Decoupling the UI from the backend in this way solves one design issue. However, another thing to consider is whether you can afford to lose messages. If all of your data needs to be processed, you have two options: you either need a built-in retry mechanism in the front-end, or you need a persistent and durable queue to hold your messages.
  15. Enter RabbitMQ. One of the many features provided by RabbitMQ is Highly Available Queues. By using RabbitMQ, you can ensure that every message is processed without needing to implement a fancy (and likely error-prone) retry mechanism in your front-end.
  16. RabbitMQ uses AMQP (Advanced Message Queuing Protocol) for all client communication. Using AMQP allows clients running on different platforms, or written in different languages, to easily send messages to each other. From a high level, AMQP enables clients to publish messages and other clients to consume those messages. It does all this without requiring you to roll your own protocol or library.
  17. Once you hook your data input source into RabbitMQ and start publishing data, all you need to do is put your model in a persistent worker and start consuming input.
  18. In the case of my language classification model, we implemented a simple worker that unpickles the classifier and subscribes to an input queue. It then runs an event loop (main) that pulls new messages as they become available and passes them to process_event. process_event calls predict on our model and converts the numerical prediction to a human-readable format. This prediction is then stored in our DB for the front-end to retrieve.
  19. So that’s basically it. Our design looks a little something like this: The input comes from the UI, where the user enters some text they wish to classify. The UI hits a Flask REST API via a GET request. The API stores the request in the DB. The API sends a message to RabbitMQ with the text to classify and the tracking_id for storing the resulting classification. The API returns a JSON response to the UI with the tracking_id. The worker pulls the message off the queue in RabbitMQ. The worker calls predict on the classifier with the text as input. The classifier returns a prediction. The worker updates the database with the result. The UI displays the result.
  20. Alright so let’s see what this all looks like in action!
  21. Alright so let’s see what this all looks like in action!
  22. Alright so let’s see what this all looks like in action!
  23. Besides the basic design concerns I’ve already covered, there are a few more things worth mentioning. The worst thing that can happen when you're processing data asynchronously is for your queue to back up. A backed-up queue means longer processing times, and if it grows without bound, you'll likely crash RabbitMQ. The easiest way to scale your workers is to start another instance. Using this strategy, processing should scale roughly linearly. In my experience, you can easily handle thousands of messages a second this way.
  24. Another way to scale your worker is to convert it to process requests in batches. Many of the algorithms handle batches efficiently: a single call to the predict method with many samples typically costs far less than the same number of single-sample calls. The downside is that you will no longer be able to process results in realtime. However, if you're restricted on resources (memory and CPU), this might be a worthwhile alternative.
  25. Keep an eye on your queue sizes, and alert when they back up. Scale as needed (possibly automatically).
  26. Understand your load requirements. Load test end-to-end to verify you can handle the expected load.
  27. Periodically re-verify your algorithm using new data. Build in a feedback loop so that you can collect new labeled samples to verify the performance. Version control your classifier. Keep detailed changelogs and performance metrics/characteristics.
  28. I’d like to thank Kelly O’Brien and Matt Parke for helping me with the front-end and back-end for the demo. Without them things would be a lot less exciting!
  29. You can find me online @beckerfuffle on Twitter, at beckerfuffle.com, and as mdbecker on GitHub. I'll be posting the materials for this talk on my GitHub.
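
Note 10 describes counting every 2- and 3-character sequence and then applying TF-IDF. As a rough illustration of those two steps only, the sketch below uses just the Python standard library. It is not the pipeline from the talk, which used scikit-learn's TfidfVectorizer and LinearSVC. The function names char_ngrams and tfidf, the toy documents, and the plain log(N / df) weighting are all assumptions made for illustration; TfidfVectorizer applies a smoothed IDF and normalizes each vector by default, so its numbers would differ.

```python
import math
from collections import Counter


def char_ngrams(text, n_values=(2, 3)):
    """Count every run of n consecutive characters, for each n in n_values."""
    counts = Counter()
    for n in n_values:
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def tfidf(docs):
    """Weight each document's n-gram counts by a plain log(N / df) IDF.

    Illustrative simplification, not scikit-learn's exact formula.
    """
    counted = [char_ngrams(doc) for doc in docs]
    # Document frequency: in how many documents does each n-gram occur?
    df = Counter()
    for counts in counted:
        df.update(counts.keys())
    n_docs = len(docs)
    return [
        {gram: tf * math.log(n_docs / df[gram]) for gram, tf in counts.items()}
        for counts in counted
    ]


# Hypothetical toy corpus: two English snippets and one German snippet.
docs = ["the cat", "the dog", "der hund"]
weights = tfidf(docs)
```

In this toy corpus "th" occurs in two of the three documents, so it ends up with a lower weight than "ca", which occurs in only one. That is the down-weighting of common terms that the note describes.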
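
Note 11 mentions pickling the model to disk and shipping it with the application. A minimal sketch of that round trip follows, assuming a plain dictionary as a stand-in for the trained model and a hypothetical file name classifier.pkl in a temporary directory; neither comes from the talk.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model (an assumption for illustration);
# a fitted scikit-learn estimator can be pickled with the same calls.
model = {"labels": ["en", "de"], "weights": [0.25, 0.75]}

# Hypothetical location; a real application would choose its own path.
path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")

# Training side: serialize the model to a file that ships with the application.
with open(path, "wb") as f:
    pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

# Worker side: load it once at startup, then reuse it for every prediction.
with open(path, "rb") as f:
    restored = pickle.load(f)
```

Two cautions apply to any pickle-based distribution: only unpickle files from a source you trust, and load the file with the same library versions that produced it.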
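
Notes 13, 18, and 19 describe the API handing back a tracking_id while a worker's process_event makes the prediction and stores it. The sketch below traces that flow end to end under heavy assumptions: an in-process queue.Queue stands in for RabbitMQ, a dictionary stands in for the database, and StubClassifier, LABELS, and submit are hypothetical names invented for illustration. Only main, process_event, predict, and tracking_id are named in the notes.

```python
import json
import queue
import uuid

LABELS = ["en", "de"]  # hypothetical label order used by the stand-in model


class StubClassifier:
    """Stand-in for the unpickled model; predict() returns label indices."""

    def predict(self, texts):
        return [1 if "der" in text else 0 for text in texts]


db = {}                      # stand-in for the results database
input_queue = queue.Queue()  # stand-in for the RabbitMQ input queue


def submit(text):
    """API side: record the request, publish it, and return a tracking_id."""
    tracking_id = str(uuid.uuid4())
    db[tracking_id] = {"text": text, "language": None}
    input_queue.put(json.dumps({"tracking_id": tracking_id, "text": text}))
    return tracking_id


def process_event(classifier, body):
    """Worker side: predict, map the numeric label to a name, store the result."""
    message = json.loads(body)
    label_index = classifier.predict([message["text"]])[0]
    db[message["tracking_id"]]["language"] = LABELS[label_index]


def main(classifier):
    """Drain the queue; a real worker would block and wait for new messages."""
    while not input_queue.empty():
        process_event(classifier, input_queue.get())


tracking_id = submit("der Hund schläft")
main(StubClassifier())
print(db[tracking_id]["language"])  # prints "de"
```

The caller gets its tracking_id back before any prediction runs, which is the decoupling the notes argue for. Swapping the in-process queue for a real broker changes how messages are published and consumed, but not the shape of process_event.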
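
Note 24 suggests converting the worker to process requests in batches. One way this could look is sketched below, again with queue.Queue standing in for the message broker. CountingClassifier, drain_batch, predict_batch, and the max_size parameter are hypothetical names, and the length-based "prediction" exists only to make the example runnable.

```python
import queue


class CountingClassifier:
    """Stand-in model that records how many times predict() is called."""

    def __init__(self):
        self.calls = 0

    def predict(self, texts):
        self.calls += 1
        return ["long" if len(t) > 10 else "short" for t in texts]


def drain_batch(q, max_size):
    """Pull up to max_size items without blocking; may return fewer."""
    batch = []
    while len(batch) < max_size:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break
    return batch


def predict_batch(classifier, q, max_size=100):
    """Make one predict() call for the whole batch instead of one per message."""
    batch = drain_batch(q, max_size)
    if not batch:
        return []
    return list(zip(batch, classifier.predict(batch)))


q = queue.Queue()
for text in ["hi", "hello there world", "ok", "a much longer sentence", "yo"]:
    q.put(text)

clf = CountingClassifier()
first = predict_batch(clf, q, max_size=3)   # three messages, one predict() call
second = predict_batch(clf, q, max_size=3)  # remaining two, one more call
```

Five messages are handled with two predict calls rather than five. The cost is latency: a message may wait for its batch to be collected, which is the realtime trade-off the note points out.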