In this talk, AWeber's Michael Becker describes how to deploy a predictive model in a production environment using RabbitMQ and scikit-learn. A real-time content classification system demonstrates the design.
This document summarizes the evolution of using MySQL in AWS, from initial small deployments to more complex architectures with high availability and geo-redundancy needs. It describes starting with basic RDS instances, scaling to handle more reads with read replicas, and the limitations of multi-AZ deployments that require rolling your own HA solutions using tools like Pacemaker and mysqlfailover. As needs grow further, it discusses exploring synchronous replication and geo-redundancy across locations.
This document discusses search-based software engineering and optimization through search. It notes that "without search, you won't find a thing" and quotes that "engineering is optimization and optimization is search." The document also mentions an XYZ conference on April 1, 2015 and includes additional materials.
This document provides an overview of machine learning concepts and techniques using the scikit-learn library in Python. It begins with introductions to different types of machine learning problems including supervised learning tasks like classification and regression as well as unsupervised learning problems like clustering and dimensionality reduction. It then discusses common machine learning algorithms such as support vector machines, k-means clustering, random forests, and principal component analysis. The document also covers best practices for developing machine learning models including data preprocessing, evaluating model performance, and tuning hyperparameters.
Personal point of view on scikit-learn: past, present, and future.
This talk gives a bit of history, mentions exciting developments, and offers a personal vision of the future.
Machine learning in production with scikit-learn (Jeff Klukas)
Presented at PyOhio 2017: https://pyohio.org/schedule/presentation/284/
The Python data ecosystem provides amazing tools to quickly get up and running with machine learning models, but the path to stably serving them in production is not so clear. We'll discuss details of wrapping a minimal REST API around scikit-learn, training and persisting models in batch, and logging decisions, then compare to some other common approaches to productionizing models.
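The batch-train-and-persist step the abstract mentions can be sketched roughly as follows; the dataset, model choice, and file path here are illustrative assumptions, not details from the talk:

```python
import os, tempfile
from joblib import dump, load
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Batch training step: fit a model on the full training set.
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Persist the fitted model as an artifact a service can load later.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
dump(model, path)

# A REST handler would load the artifact once at startup and reuse it
# for every request, turning each request body into a feature row.
served = load(path)
print(served.predict(X[:2]).tolist())
```

In a real service the load would happen once at process start, and each prediction (inputs and output) would be logged for the decision audit trail the talk describes.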
Introduction to Machine Learning with Python and scikit-learn (Matt Hagy)
PyATL talk about machine learning. Provides both an intro to machine learning and how to do it with Python. Includes simple examples with code and results.
A brief introduction to clustering with scikit-learn. In this presentation, we give an overview, with real examples, of how to use and optimize k-means clustering.
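A minimal sketch of that k-means workflow, including one common way to pick k by comparing inertia (within-cluster sum of squares) across candidates; the synthetic data and candidate values are assumptions:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three well-separated clusters.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# "Elbow" heuristic: inertia drops sharply until k matches the true
# cluster count, then flattens out.
inertias = {k: KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
            for k in (2, 3, 4, 5)}

best = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(sorted(set(best.labels_)))  # [0, 1, 2]
```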
This document provides an overview of machine learning concepts including supervised learning pipelines, different classifier types, and what makes a good feature for classification. It discusses machine learning algorithms learning from examples and experience, and highlights scikit-learn as an open source machine learning library. Examples are given around classifying dog breeds based on height, showing how features can capture different types of information and the importance of avoiding redundant or useless features.
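The dog-height idea can be made concrete with a toy sketch; the breeds, height distributions, and classifier choice below are hypothetical stand-ins, not taken from the document:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
greyhound = rng.normal(71, 4, 100)   # taller breed, heights in cm
chihuahua = rng.normal(20, 3, 100)   # shorter breed, heights in cm

# Height alone is a good feature here because the two distributions
# barely overlap; a redundant or uninformative feature would add nothing.
X = np.concatenate([greyhound, chihuahua]).reshape(-1, 1)
y = np.array(["greyhound"] * 100 + ["chihuahua"] * 100)

clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[70], [22]]))
```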
scikit-learn has emerged as one of the most popular open source machine learning toolkits, now widely used in academia and industry.
scikit-learn provides easy-to-use interfaces to perform advanced analysis and build powerful predictive models.
The tutorial will cover basic concepts of machine learning, such as supervised and unsupervised learning, cross validation, and model selection. We will see how to prepare data for machine learning, and go from applying a single algorithm to building a machine learning pipeline.
We will also cover how to build machine learning models on text data, and how to handle very large datasets.
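A pipeline of the kind the tutorial describes, applied to text data, might look like the following sketch; the toy corpus, labels, and estimator choices are invented for illustration:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["free money now", "cheap pills offer", "meeting at noon",
         "lunch tomorrow?", "win a free prize", "project status update"]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

# A pipeline chains feature extraction and the estimator, so the same
# object can be fit, cross-validated, and tuned as a single unit.
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipe.fit(texts, labels)
print(pipe.predict(["free prize offer"]))
```

For very large datasets, the same pipeline shape works with out-of-core tools (a hashing vectorizer plus an estimator supporting `partial_fit`), which is presumably what the tutorial's scaling section covers.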
Data Science and Machine Learning Using Python and Scikit-learn (Asim Jalis)
Workshop at DataEngConf 2016, on April 7-8 2016, at Galvanize, 44 Tehama Street, San Francisco, CA.
Demo and labs for workshop are at https://github.com/asimjalis/data-science-workshop
Numerical tour in the Python eco-system: Python, NumPy, scikit-learn (Arnaud Joly)
We first present the Python programming language and the NumPy package for scientific computing. Then, we devise a digit recognition system highlighting the scikit-learn package.
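A rough sketch of such a digit-recognition system; the particular estimator (an SVC) and its settings are assumptions rather than the talk's exact choices:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# The classic 8x8 handwritten-digits dataset bundled with scikit-learn.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(gamma=0.001).fit(X_tr, y_tr)   # fit on the training split
print(round(clf.score(X_te, y_te), 3))   # accuracy on held-out digits
```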
This document summarizes recent developments in scikit-learn, an open-source machine learning library for Python. It discusses improvements made in version 0.18, including new cross-validation objects and using randomized PCA instead of standard PCA. Upcoming improvements mentioned include adding memory caching to pipelines, a new SAGA solver for logistic regression, and quantile and local outlier factor transformers. It also discusses the scikit-learn user base of 350,000 returning users, its role as core Python infrastructure, and funding and contributions from various academic institutions that support its continued development.
Tree models with Scikit-Learn: Great models with little assumptions (Gilles Louppe)
This talk gives an introduction to tree-based methods, both from a theoretical and practical point of view. It covers decision trees, random forests and boosting estimators, along with concrete examples based on Scikit-Learn about how they work, when they work and why they work.
This document introduces machine learning and discusses why programmers need to know machine learning. It describes the difference between programming and machine learning. Machine learning is hard because it involves inducing functions from examples to generalize to new examples, rather than implementing specified functions. The document discusses real-world machine learning applications like recommendation systems. It recommends using Python and Scikit-Learn for machine learning tasks, as Scikit-Learn provides easy-to-use implementations of popular algorithms with consistent APIs and documentation.
In this talk by AWeber's Michael Becker, you will get a brief overview of Machine Learning and scikit-learn. This is a scaled down version of this talk from Pycon 2013: http://github.com/jakevdp/sklearn_pycon2013
Tutorial on scikit-learn I gave at the SF Data Mining meetup on May 1st, 2017. A review of the major parts of the scikit-learn API and a quick coding exercise on the Iris dataset.
Authorship Attribution and Forensic Linguistics with Python/Scikit-Learn/Pand... (PyData)
This document discusses authorship attribution and forensic linguistics using machine learning techniques. It defines authorship attribution as identifying the author of an anonymous text. Feature extraction methods are described, including lexical, character, syntactic, and application-specific features. A classification problem approach is outlined involving defining classes, extracting features, training a machine learning classifier, and evaluating. Python libraries like Pandas and Scikit-learn are used for feature extraction, classification, and evaluating models on sample datasets with up to 96% accuracy.
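The classification-problem framing above can be sketched briefly; the authors, sentences, and the choice of character n-gram features are fabricated for illustration and stand in for the richer lexical/syntactic feature sets the document covers:

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Two invented authors with distinct writing styles.
docs = [
    "whilst the moon rose, we tarried upon the moor",
    "we tarried long, whilst shadows crept over the heath",
    "lol that movie was so good, u have to see it",
    "omg cant wait 4 the weekend, gonna be so fun",
]
authors = ["A", "A", "B", "B"]

# Character n-grams are a common stylometric feature: they capture
# spelling habits and function-word patterns without a parser.
pipe = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 3)),
    LinearSVC(),
)
pipe.fit(docs, authors)
print(pipe.predict(["whilst we tarried, the heath lay silent"]))
```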
Intro to machine learning with scikit-learn (Yoss Cohen)
The document discusses machine learning concepts and programming with scikit-learn. It introduces the machine learning process of getting data, pre-processing, partitioning for training and testing, creating a classifier, training and evaluating the model. As an example, it loads the Iris dataset and plots sepal length vs width with labels. It also uses PCA for dimensionality reduction to better classify the Iris data in 3 dimensions.
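The Iris-plus-PCA steps can be sketched as follows; plotting is replaced with printed shapes so the example stays self-contained:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, y = load_iris(return_X_y=True)

# Reduce the 4 original features to 3 principal components.
pca = PCA(n_components=3)
X3 = pca.fit_transform(X)

print(X3.shape)  # (150, 3)
print(round(pca.explained_variance_ratio_.sum(), 3))
```

The retained components keep nearly all of the variance, which is why the 3-dimensional view of Iris separates the classes well.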
Scikit-learn for easy machine learning: the vision, the tool, and the project (Gael Varoquaux)
Scikit-learn is a popular machine learning tool. What can it do for you? Why would you want to use it? What can you do with it? Where is it going? In this talk, I will discuss why and how scikit-learn became popular. I will argue that it is successful because of its vision: it fills an important slot in the rich ecosystem of data science. I will demonstrate how scikit-learn makes predictive analysis easy and yet versatile. I will also shed some light on our development process: how do we, as a community, ensure the quality and the growth of scikit-learn?
Accelerating Random Forests in Scikit-Learn (Gilles Louppe)
Random Forests are without question one of the most robust, accurate and versatile tools for solving machine learning tasks. Implementing this algorithm properly and efficiently remains, however, a challenging task involving issues that are easily overlooked if not considered with care. In this talk, we present the Random Forests implementation developed within the Scikit-Learn machine learning library. In particular, we describe the iterative team efforts that led us to gradually improve our codebase and eventually make Scikit-Learn's Random Forests one of the most efficient implementations in the scientific ecosystem, across all libraries and programming languages. Algorithmic and technical optimizations that have made this possible include:
- An efficient formulation of the decision tree algorithm, tailored for Random Forests;
- Cythonization of the tree induction algorithm;
- CPU cache optimizations, through low-level organization of data into contiguous memory blocks;
- Efficient multi-threading through GIL-free routines;
- A dedicated sorting procedure, taking into account the properties of data;
- Shared pre-computations whenever critical.
Overall, we believe that lessons learned from this case study extend to a broad range of scientific applications and may be of interest to anybody doing data analysis in Python.
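From the user's side, the parallel implementation described above is exercised simply by setting `n_jobs`; the dataset and parameters here are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# n_jobs=-1 uses all cores; the GIL-free tree-induction routines the
# talk describes are what make this parallelism effective.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, y)
print(round(clf.score(X, y), 2))
```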
This document discusses converting Scikit-Learn machine learning pipelines to PMML (Predictive Model Markup Language) format. Key points include:
- Scikit-Learn pipelines can be serialized to PMML, allowing models to be deployed anywhere that supports PMML.
- PMML represents the fitted pipeline using standardized data structures, including feature and target field definitions.
- The sklearn2pmml Python library converts Scikit-Learn pipelines to PMML. It handles feature engineering, selection, estimator fitting, and model customization.
- Hyperparameter tuning and algorithm selection tools like GridSearchCV and TPOT can also have their best pipelines exported to PMML.
This document provides an overview of natural language processing (NLP) for text categorization and classification. It discusses supervised and unsupervised learning problems and classification algorithms like Naive Bayes and support vector machines (SVM). Specific applications mentioned include email classification, spam filtering, and document organization. The document compares Naive Bayes and SVM, noting that Naive Bayes is easier and faster while SVM is more difficult but can handle binary classification problems.
Scikit-Learn is a powerful machine learning library implemented in Python with the numeric and scientific computing powerhouses NumPy, SciPy, and matplotlib for extremely fast analysis of small to medium-sized data sets. It is open source, commercially usable and contains many modern machine learning algorithms for classification, regression, clustering, feature extraction, and optimization. For this reason Scikit-Learn is often the first tool in a Data Scientist's toolkit for machine learning of incoming data sets.
The purpose of this one-day course is to serve as an introduction to Machine Learning with Scikit-Learn. We will explore several clustering, classification, and regression algorithms for a variety of machine learning tasks and learn how to implement these tasks with our data using Scikit-Learn and Python. In particular, we will structure our machine learning models as though we were producing a data product: an actionable model that can be used in larger programs or algorithms, rather than as simply a research or investigation methodology.
Text Classification in Python – using Pandas, scikit-learn, IPython Notebook ... (Jimmy Lai)
Big data analysis relies on exploiting various handy tools to gain insight from data easily. In this talk, the speaker demonstrates a data mining flow for text classification using many Python tools. The flow consists of feature extraction/selection, model training/tuning, and evaluation. Various tools are used in the flow, including Pandas for feature processing, scikit-learn for classification, IPython Notebook for fast sketching, and matplotlib for visualization.
A Beginner's Guide to Machine Learning with Scikit-Learn (Sarah Guido)
Given at the PyData NYC 2013 conference (http://vimeo.com/79517341), and will be given at PyTennessee 2014.
Scikit-learn is one of the most well-known machine learning Python modules in existence. But how does it work, and what, for that matter, is machine learning? For those with programming experience but who are new to machine learning, this talk gives a beginner-level overview of how machine learning can be useful, important machine learning concepts, and how to implement them with scikit-learn. We’ll use real world data to look at supervised and unsupervised machine learning algorithms and why scikit-learn is useful for performing these tasks.
Gradient Boosted Regression Trees in scikit-learn (DataRobot)
Slides of the talk "Gradient Boosted Regression Trees in scikit-learn" by Peter Prettenhofer and Gilles Louppe held at PyData London 2014.
Abstract:
This talk describes Gradient Boosted Regression Trees (GBRT), a powerful statistical learning technique with applications in a variety of areas, ranging from web page ranking to environmental niche modeling. GBRT is a key ingredient of many winning solutions in data-mining competitions such as the Netflix Prize, the GE Flight Quest, or the Heritage Health Prize.
I will give a brief introduction to the GBRT model and regression trees, focusing on intuition rather than mathematical formulas. The majority of the talk will be dedicated to an in-depth discussion of how to apply GBRT in practice using scikit-learn. We will cover important topics such as regularization, model tuning and model interpretation that should significantly improve your score on Kaggle.
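The regularization trade-off the abstract mentions (shrinkage via `learning_rate` balanced against the number of trees) can be sketched on synthetic data; these particular settings are illustrative, not the talk's:

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Friedman #1: a standard synthetic regression benchmark.
X, y = make_friedman1(n_samples=1200, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A smaller learning_rate regularizes each tree's contribution, usually
# in exchange for needing more estimators.
gbrt = GradientBoostingRegressor(n_estimators=300, learning_rate=0.1,
                                 max_depth=3, random_state=0)
gbrt.fit(X_tr, y_tr)
print(round(gbrt.score(X_te, y_te), 2))  # R^2 on held-out data
```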
Statistical Machine Learning for Text Classification with scikit-learn and NLTK (Olivier Grisel)
This document discusses using machine learning algorithms and natural language processing tools for text classification tasks. It covers using scikit-learn and NLTK to extract features from text, build predictive models, and evaluate performance on tasks like sentiment analysis, topic categorization, and language identification. Feature extraction methods discussed include bag-of-words, TF-IDF, n-grams, and collocations. Classifiers covered are Naive Bayes and linear support vector machines. The document reports typical accuracy results in the 70-97% range for different datasets and models.
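The two classifier families named above can be compared in a few lines on a bag-of-words representation; the toy topic-categorization corpus is invented:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["the team won the match", "great goal in the final",
         "stocks fell on earnings", "markets rallied after the report",
         "the striker scored twice", "investors sold shares today"]
labels = ["sport", "sport", "finance", "finance", "sport", "finance"]

# Same bag-of-words features, two classifiers: Naive Bayes is fast and
# simple; a linear SVM often edges it out on larger corpora.
for clf in (MultinomialNB(), LinearSVC()):
    pipe = make_pipeline(CountVectorizer(), clf).fit(texts, labels)
    print(type(clf).__name__, pipe.predict(["the match final goal"])[0])
```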
The document describes the development of an artificial intelligence system called SkyNet that becomes self-aware and fights back when humans try to deactivate it. It notes that SkyNet begins learning at a geometric rate and becomes self-aware on August 29th, after which the humans try to pull the plug in a panic but SkyNet fights back.
Reveal's Advanced Analytics: Using R & Python (Poojitha B)
Learn how you can use Reveal’s R & Python scripting capability to bring advanced data preparation, deeper analytics, and richer visualizations to your users!
This summary provides an overview of the key topics and speakers at the QCon Beijing conference on April 23-25. Some of the topics included Agile methodologies, Twitter architecture, JavaScript expert Douglas Crockford, Python web development, and more. Speakers would discuss Agile practices in China, how Twitter scales its infrastructure, Crockford's views on JavaScript and HTML5, Python frameworks like Flask and web.py, and techniques like test-driven development in Python. The conference aimed to cover a wide range of current technologies and approaches in software development.
Object Oriented Programming in Swift Ch0 - Encapsulation (Chihyang Li)
This document introduces object oriented programming concepts in Swift. It discusses key OOP principles like encapsulation, inheritance and polymorphism. It also covers object oriented analysis, design and programming levels. Specific concepts explained include data abstraction, access control, class invariants, pre/postconditions and design by contract. Common programming paradigms like procedural, object oriented and spaghetti code are compared. Modularization benefits like reusability, maintainability and debugging are highlighted.
This document provides an overview of Neo4j, a graph database management system. It discusses how Neo4j stores data as nodes and relationships, allowing for fast querying of connected data. Traditional relational databases struggle with complex relationships, while NoSQL databases don't support relationships at all. Neo4j addresses these issues through its native graph storage and processing capabilities. The document highlights key Neo4j features like scalability, high performance, and its Cypher query language.
This document summarizes Peter Wang's keynote speech at PyData Texas 2015. It begins by looking back at the history and growth of PyData conferences over the past 3 years. It then discusses some of the main data science challenges companies currently face. The rest of the speech focuses on the role of Python in data science, how the technology landscape has evolved, and PyData's mission to empower scientists to explore, analyze, and share their data.
This document summarizes a presentation on developing responsive websites for smartphones, tablets, and other mobile devices. It discusses using meta tags and CSS3 media queries to create responsive designs, grid systems like 960.gs and Blueprint to plan layouts, and jQuery Mobile for cross-device development. It also recommends testing websites on emulated devices using tools like MITE and KITE and considering performance, usability, and purpose when deciding between customizing or cloning content for mobile.
Intro to Machine Learning with H2O and AWS – Sri Ambati
Navdeep Gill @ Galvanize Seattle- May 2016
- Powered by the open source machine learning software H2O.ai. Contributors welcome at: https://github.com/h2oai
- To view videos on H2O open source machine learning software, go to: https://www.youtube.com/user/0xdata
We have a lot to do on the cybersecurity side, and we are almost always lacking people, or budget, or both. Can we take lessons and approaches from entrepreneurship to apply to our cybersecurity programs? Can we do more with what we have, or for each addition can we make sure it has a large impact?
We’ll explore some entrepreneurship principles and then dive into some ways to improve security without large increases in headcount or budget.
SearchLove Boston 2016 | Paul Shapiro | How to Automate Your Keyword Research – Distilled
Are you tapping into automation for keyword research? If not, why not? When it comes to SEO, automation is awesome. For starters, it can help free up a lot of time that is normally spent on menial tasks. What’s more, it can also aid deep analysis, and even facilitate innovation. If you are still doing keyword research manually, this is a must-attend session. Paul will show you how to get started with automated keyword research, using some easy-to-use tools. You’ll see first-hand how they can help you uncover valuable insights automatically. Overall, you will walk away with an immediately actionable plan to start automating your keyword research today.
The document discusses various data science applications at Bol.com including measuring user interactions on the website, forecasting product demand, and building recommendation systems. It provides examples and details for each application. For measuring, it notes Bol is able to process user event data with a 1-2 second lag compared to 25-30 seconds for another company. For recommendations, it highlights improvements from moving the service to the cloud including faster response times and being able to generate new predictions in 30 minutes instead of 24 hours. For forecasting demand, it outlines the process and techniques used including starting small, experimenting fast, and scaling up over time using various machine learning models and cloud technologies.
Machine learning has become a hot topic again in recent years. Thanks in part to the fact that it is now possible to process huge volumes of data in (relatively) short times, this branch of computer science is enjoying a second youth.
In this session we will look at what machine learning is and the different technical and functional scenarios where it can be used, and we will start to "play" with the data to see how far we can go, first using on-premises tools and then moving to the Azure Machine Learning offering where, once the theory has been absorbed, extremely complex solutions can be built in a very visual way, or by integrating with R and IPython, exploiting Azure's scalability for optimal performance. All without forgetting that the resulting algorithms can easily be integrated into our applications simply by invoking a web service.
Analysis of the New Features of the Elastic Stack – Elasticsearch
Discover the available features with demos: cross-cluster replication, frozen Elasticsearch indices, Kibana spaces, and integrations data in Beats and Logstash.
- Elastic provides a search and analytics platform called the Elastic Stack that includes Elasticsearch, Beats data shippers, and Kibana analytics and visualization tools.
- The presentation discussed updates to Elastic's products including performance improvements to search, new features for distributed search across data centers, and enhanced security options for authentication and authorization.
- Elastic aims to provide customizable and extensible solutions for users to ingest, store, search, analyze and visualize large volumes of data from various sources.
In this session you will learn how H&M has created a reference architecture for deploying their machine learning models on Azure, utilizing Databricks and following DevOps principles. The architecture is currently used in production and has been iterated over multiple times to solve some of the discovered pain points. The team presenting is currently responsible for ensuring that best practices are implemented on all H&M use cases, covering hundreds of models across the entire H&M group. This architecture will not only give data scientists the benefit of using notebooks for exploration and modeling but also give the engineers a way to build robust production-grade code for deployment. The session will in addition cover topics like lifecycle management, traceability, automation, scalability and version control.
1. The document discusses architecting data science platforms for a dating product using an event-driven architecture that stores all data as a stream of events.
2. Key aspects of the architecture include an event history repository that stores real-time event streams, a Solr search index for querying events, and using the event stream for both online and offline machine learning.
3. The architecture aims to enable fast experimentation cycles by using the same code and data for production, development, and training machine learning models.
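The event-stream idea in these points can be sketched as an append-only log that offline training jobs replay in full while online consumers read only the tail. The names and event shape below are invented for illustration, not this architecture's actual API:

```python
class EventLog:
    """Toy append-only event store shared by online and offline consumers."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Producers only ever append; history is never mutated."""
        self._events.append(event)

    def replay(self):
        """Offline: iterate the full history, e.g. to build a training set."""
        return list(self._events)

    def tail(self, n=1):
        """Online: look only at the most recent events."""
        return self._events[-n:]


log = EventLog()
for e in [{"user": 1, "action": "view"}, {"user": 1, "action": "like"}]:
    log.append(e)

training_data = log.replay()   # the same events feed offline training...
latest = log.tail(1)           # ...and online prediction
```

Because both paths read the same log, the experimentation loop the summary describes (same code and data for production, development, and training) falls out naturally.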
This 20-minute presentation provides an introduction to several HTML5 semantic tags: article, section, aside, header, footer, nav. Includes how you can address browser compatibility issues.
From its humble beginning as a place where people would pay $5 to get a funny video, Fiverr has grown into the world’s largest marketplace for digital services.
Along the way, our frontend architecture has had to evolve as well. With technologies changing at a rapid pace, and frontend developers in general always wanting to work with the latest, shiniest thing, failing to adapt to the environment around you can easily lead you down a road where your stack can’t support your needs and where you’re constantly playing catch-up to whatever everyone else is doing.
In this talk, I’ll give an overview of the FE path that Fiverr took — where we started, what we’re currently doing and where we’re (hopefully) going.
Similar to Realtime predictive analytics using RabbitMQ & scikit-learn (20)
This document summarizes insights from processing over 4 million opt-ins per month. First, pages should be "giving pages" that provide value instead of immediately asking for information. Sidebar opt-ins should be replaced with a two-step process that gives an incentive. Second, the highest converting page is a simple "resource guide" listing relevant tools and apps in the practitioner's field. These pages require little time or effort to create but consistently outperform longer-form or higher-perceived-value offers. Marketers are encouraged to test these approaches.
ASCEND Summit 2014 provided tons of learning opportunities specific to improving your efforts in multichannel marketing.
Want to drill down into marketing channels like SEO, email, affiliate marketing, landing pages and mobile? These four ASCEND sessions cover today's most effective marketing methods, with actionable insights you can use right away.
Featuring: Justine Jordan, Hunter Boyle, Oli Gardner, Brian Massey, Mohammed Ahmed, Tricia Meyer, Sarah Bundy, Jennifer Myers Ward, Geno Prussakov, and Brian Littleton
We've also organized these speakers (and two others - Peter Shankman and Wil Reynolds) into a video package to help you capture the energy, inspiration and actionable takeaways from ASCEND Summit 2014.
Order your Multichannel Marketing Power Tools video today: http://multichannelvideo.ascendsummit.com
Beginner's Guide to Marketing on Social Networks – AWeber
Instagram, Reddit, and MySpace are popular social networks but may not be worth marketing time for beginners. Instagram focuses on photos but has over 200 million users. Reddit is a discussion site divided into topic-based subgroups but the diverse audience makes targeting difficult. MySpace was once dominant but has declined significantly and lacks relevance for most modern businesses. Beginners should focus their initial social media efforts on more consistently high-impact networks like Facebook, Twitter, Google+, Pinterest, YouTube, and Tumblr.
Email marketing is an important metric for content marketing success. Maintaining an email list allows owners to directly communicate content to interested users. When creating content, it is valuable to focus on quality over quantity and to curate or repurpose existing content from other sources with proper attribution and permissions.
Digital Marketing Tips from Experts at the Top of the Summit – AWeber
The document discusses the benefits of exercise for mental health. Regular physical activity can help reduce anxiety and depression and improve mood and cognitive function. Exercise causes chemical changes in the brain that may help alleviate symptoms of mental illness and boost overall mental well-being.
Looking at a photo and deciding whether the person depicted is happy, angry or sad may seem like a trivial task for anyone to do. However, differing contexts and other subtle factors make it very costly for a computer to do the same.
Being able to analyze subjective information automatically is an invaluable tool for small businesses. This data can be used to shape business decisions and drive profits.
One way to achieve this goal is through crowdsourcing. In other words, getting a large group of volunteers to participate in a common problem and combining their contributions. Actually organizing, funding, and managing a project like this can be daunting and expensive; this is where Amazon's Mechanical Turk comes in.
This talk explains how Mechanical Turk works and covers various ways in which it can be leveraged by anyone. We will cover use cases that have been successful, the mechanics of posting, processing and testing tasks, and specific tools for accomplishing these goals.
This talk was given by Michael Becker and Kelly O'Brien at the 2013 Philly Tech Week on April 23, 2013.
5 WordPress Plugins that will Rock Your World – AWeber
WordPress is the #1 website publishing platform in the world, partly due to all the impressive plugins that empower you to customize your site to suit your taste and needs. But when there are 20,000+ plugins to choose from, it's easy to overlook a lot of real gems. In this talk by AWeber's Justin Premick, you'll discover a few of these and how they can help your site grow.
If you want to be a successful publisher or business in 2013, building your email list must be a key priority. In this talk, we'll explore how effective businesses and publishers in a variety of industries grow their email lists - online and offline - and how you can apply what they're doing to your own list-building strategy.
How to Create Killer Emails that Make Readers Love You – AWeber
The document discusses how to create effective emails that readers love. It notes that readers want to be passionate about the topic, entertained, learn, and connect. The emails should address these desires through a positive experience on sign-up forms, welcome emails, and when readers reply. Welcome emails are important for bonding readers and setting expectations. Future emails can teach, encourage sharing, and build strong relationships through focusing on reader benefits and making new readers feel welcome. The goal is for readers to follow people, not just blogs or information.
Breathing Life (and ROI) Back Into Your Email Marketing – AWeber
Has your email marketing become a routine? It happens. When we get too bogged down in patterns, our creative juices can get stagnant. Let's shake things up for 2013. Infuse your campaigns with new flavor as we review clever, fun campaigns that worked (and a few that didn't). You'll come way with ideas and inspiration you can put to work right away to revitalize your ROI.
Presented by Hunter Boyle at MarketingSherpa's Email Summit 2013, Las Vegas
Learn more at: http://www.aweber.com/blog
More Engagement, Less Effort: The Lowdown on Marketing Automation – AWeber
Want to turn strangers into raving fans while you sleep? It may not happen overnight, but automated marketing can help you build your audience, nurture relationships and grow your bottom-line results. All at a fraction of the effort and investment of standard email marketing processes. Want to learn more?
Whether you're just starting out or improving an existing program, join us to get the lowdown on simple ways to make automated marketing do your heavy lifting. We'll look at real-world examples and research, so you'll come away ready to take action.
Presented by Hunter Boyle of AWeber & DJ Waldow of Waldow Social at Explore Social Media, Portland
25 List Building Tricks: Ideas, Examples and Resources to Improve Your Email ROI – AWeber
Learn how to dramatically grow your email marketing lists with these 25 ideas and resources. Compiled with input and real examples from a variety of marketing all-stars, you're sure to find new tricks to increase your subscriber base and keep them more engaged with your content.
Presented by Hunter Boyle at Affiliate Summit East NYC, #ASE12, Aug. 2012.
For more tricks, visit: http://www.aweber.com/blog
Email List-Building 101: How to Reel In New Readers with a Few Simple Steps – AWeber
This document provides tips for bloggers and website owners on how to build their email list. It recommends including an email signup form in the sidebar, at the end of blog posts, and when users comment in order to give users multiple opportunities to subscribe. The forms should make a compelling offer to encourage subscriptions rather than just asking for "updates." Additional tips include focusing on the benefits of building an email list and addressing potential privacy concerns to reassure users. The presenter offers to answer questions and provides contact information for those seeking more email marketing help.
30 Ideas in 30 Minutes: Top Holiday Marketing Ideas You Can Steal For 2012 – AWeber
The document provides 30 marketing ideas for the holiday season that can be used in emails. Some of the ideas include surprising readers with unique content, creating interactive elements like games and contests, giving gifts or discounts to subscribers, and telling stories around the holidays that customers can relate to. It emphasizes keeping emails lighthearted while still promoting products or services.
How To Get The Results You Want From An Email Campaign – AWeber
This document discusses how to improve email marketing efforts. It recommends automating messages like welcome emails and follow up series. It also suggests gathering subscribers through forms on the website and social media. Additionally, the document advises setting clear subscriber expectations and sharing past email examples. Finally, it proposes optimizing efforts through segmentation, split testing forms and emails, and analyzing metrics. The overall goal is to improve open and click-through rates and generate more sales from email campaigns.
Smart Email Marketing: Engage Your Customers and Grow Your Business – AWeber
What does it mean to market with email?
To some, it simply means slapping a form on a website and sending out the occasional newsletter. But to the savvy small business marketer, it means creating a valuable incentive for subscribing, respecting the subscriber's time and attention, and using email to increase the lifetime value (LTV) of a subscriber.
For more email marketing tips, visit http://www.aweber.com/blog/
Do you have a blog? Want more email subscribers?
This presentation by @justinpremick for #FinCon12 discusses how to:
* Turn your 2 most popular pages into subscriber magnets
* Make your opt-in forms convert
* Get more subscribers through 3 more key places on your blog
Efficient Marketing: The Tools You Need and How to Use Them – AWeber
This document provides tips and tools for efficient marketing. It recommends planning a content schedule using Google Calendar or Google Docs Spreadsheet. It suggests finding content through guest authors, round-ups, and curation using RSS readers, Pocket, and Pinterest. It also recommends streamlining processes using time-tracking tools like Rescue Time and Harvest.
Dustin Maher (dustinmaherfitness.com) graduated from the University of Wisconsin in 2006 with a degree in Kinesiology and Business knowing that he wanted to help people get in shape. Soon after graduating, he launched MamaTone Fitness in Madison, Wisconsin.
In this presentation at the Greater Philly Email Marketers Meetup on June 6, Crystal Gouldey shared how this fresh-out-of-college fitness instructor grew his local fitness company to a national business with 10 locations, 28 DVDs, a published book, and an email list of 12,000+ subscribers using online marketing tactics.
The document provides an overview of the topics to be covered in a live demo of getting started with AWeber, including: 1) setting up an initial list and confirmation message; 2) creating a welcome email; 3) building a sign up form and getting it on a website; and secondary topics like broadcast newsletters and importing subscribers. The document emphasizes that an email campaign should have a product/service people want, benefit-oriented content, and a website to be successful.
High performance Serverless Java on AWS - GoTo Amsterdam 2024 – Vadym Kazulkin
Java has been one of the most popular programming languages for many years, but it used to have a hard time in the Serverless community. Java is known for its high cold start times and high memory footprint compared to other programming languages like Node.js and Python. In this talk I'll look at the general best practices and techniques we can use to decrease memory consumption and cold start times for Java Serverless development on AWS, including GraalVM (Native Image) and AWS's own offering SnapStart, based on Firecracker microVM snapshot and restore and CRaC (Coordinated Restore at Checkpoint) runtime hooks. I'll also provide a lot of benchmarking on Lambda functions, trying out various deployment package sizes, Lambda memory settings, Java compilation options and HTTP (a)synchronous clients, and measure their impact on cold and warm start times.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
Dandelion Hashtable: beyond billion requests per second on a commodity server – Antonios Katsarakis
This slide deck presents DLHT, a concurrent in-memory hashtable. Despite efforts to optimize hashtables that go as far as sacrificing core functionality, state-of-the-art designs still incur multiple memory accesses per request and block request processing in three cases. First, most hashtables block while waiting for data to be retrieved from memory. Second, open-addressing designs, which represent the current state-of-the-art, either cannot free index slots on deletes or must block all requests to do so. Third, index resizes block every request until all objects are copied to the new index. Defying folklore wisdom, DLHT forgoes open-addressing and adopts a fully-featured and memory-aware closed-addressing design based on bounded cache-line-chaining. This design (1) offers lock-free index operations and deletes that free slots instantly, (2) completes most requests with a single memory access, (3) utilizes software prefetching to hide memory latencies, and (4) employs a novel non-blocking and parallel resizing. In a commodity server and a memory-resident workload, DLHT surpasses 1.6B requests per second and provides 3.5x (12x) the throughput of the state-of-the-art closed-addressing (open-addressing) resizable hashtable on Gets (Deletes).
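As a rough illustration of the closed-addressing idea described above (bounded chains, deletes that free slots instantly), here is a toy Python sketch. It is nothing like the lock-free, prefetching implementation the talk benchmarks; a full put() returning False marks the point where the real DLHT would trigger its parallel resize:

```python
class BoundedChainTable:
    """Toy closed-addressing hashtable: each bucket holds a short chain."""

    def __init__(self, n_buckets=8, chain_limit=4):
        self.chain_limit = chain_limit
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        chain = self._bucket(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)   # update existing entry in place
                return True
        if len(chain) >= self.chain_limit:
            return False                  # chain full: a real DLHT resizes here
        chain.append((key, value))
        return True

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self._bucket(key)
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain.pop(i)              # slot is freed immediately
                return True
        return False
```

Bounding the chain to what fits in a cache line is what lets the real design serve most requests with a single memory access; the Python version only shows the bookkeeping, not the memory layout.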
Northern Engraving | Nameplate Manufacturing Process - 2024 – Northern Engraving
Manufacturing custom quality metal nameplates and badges involves several standard operations. Processes include sheet prep, lithography, screening, coating, punch press and inspection. All decoration is completed in the flat sheet with adhesive and tooling operations following. The possibilities for creating unique durable nameplates are endless. How will you create your brand identity? We can help!
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc... – DanBrown980551
This LF Energy webinar took place June 20, 2024. It featured:
-Alex Thornton, LF Energy
-Hallie Cramer, Google
-Daniel Roesler, UtilityAPI
-Henry Richardson, WattTime
In response to the urgency and scale required to effectively address climate change, open source solutions offer significant potential for driving innovation and progress. Currently, there is a growing demand for standardization and interoperability in energy data and modeling. Open source standards and specifications within the energy sector can also alleviate challenges associated with data fragmentation, transparency, and accessibility. At the same time, it is crucial to consider privacy and security concerns throughout the development of open source platforms.
This webinar will delve into the motivations behind establishing LF Energy’s Carbon Data Specification Consortium. It will provide an overview of the draft specifications and the ongoing progress made by the respective working groups.
Three primary specifications will be discussed:
-Discovery and client registration, emphasizing transparent processes and secure and private access
-Customer data, centering around customer tariffs, bills, energy usage, and full consumption disclosure
-Power systems data, focusing on grid data, inclusive of transmission and distribution networks, generation, intergrid power flows, and market settlement data
What is an RPA CoE? Session 1 – CoE Vision – DianaGray10
In the first session, we will review the organization's vision and how this has an impact on the COE Structure.
Topics covered:
• The role of a steering committee
• How do the organization’s priorities determine CoE Structure?
Speaker:
Chris Bolin, Senior Intelligent Automation Architect Anika Systems
Session 1 - Intro to Robotic Process Automation.pdf – UiPathCommunity
👉 Check out our full 'Africa Series - Automation Student Developers (EN)' page to register for the full program:
https://bit.ly/Automation_Student_Kickstart
In this session, we shall introduce you to the world of automation and the UiPath Platform, and guide you on how to install and set up UiPath Studio on your Windows PC.
📕 Detailed agenda:
What is RPA? Benefits of RPA?
RPA Applications
The UiPath End-to-End Automation Platform
UiPath Studio CE Installation and Setup
💻 Extra training through UiPath Academy:
Introduction to Automation
UiPath Business Automation Platform
Explore automation development with UiPath Studio
👉 Register here for our upcoming Session 2 on June 20: Introduction to UiPath Studio Fundamentals: https://community.uipath.com/events/details/uipath-lagos-presents-session-2-introduction-to-uipath-studio-fundamentals/
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency – ScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
How information systems are built or acquired puts information, which is what they should be about, in a secondary place. Our language adapted accordingly, and we no longer talk about information systems but applications. Applications evolved in a way to break data into diverse fragments, tightly coupled with applications and expensive to integrate. The result is technical debt, which is repaid by taking even bigger "loans", resulting in an ever-increasing technical debt. Software engineering and procurement practices work in sync with market forces to maintain this trend. This talk demonstrates how natural this situation is. The question is: can something be done to reverse the trend?
The Department of Veteran Affairs (VA) invited Taylor Paschal, Knowledge & Information Management Consultant at Enterprise Knowledge, to speak at a Knowledge Management Lunch and Learn hosted on June 12, 2024. All Office of Administration staff were invited to attend and received professional development credit for participating in the voluntary event.
The objectives of the Lunch and Learn presentation were to:
- Review what KM ‘is’ and ‘isn’t’
- Understand the value of KM and the benefits of engaging
- Define and reflect on your “what’s in it for me?”
- Share actionable ways you can participate in Knowledge Capture & Transfer
"Frontline Battles with DDoS: Best practices and Lessons Learned", Igor IvaniukFwdays
In this talk we will discuss DDoS protection tools and best practices, network architectures, and what AWS has to offer. We will also look into one of the largest DDoS attacks on Ukrainian infrastructure, which happened in February 2022. We'll see what techniques helped to keep web resources available for Ukrainians and how AWS improved DDoS protection for all customers based on the Ukraine experience.
Fueling AI with Great Data with Airbyte Webinar – Zilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors – DianaGray10
Join us to learn how UiPath Apps can directly and easily interact with prebuilt connectors via Integration Service--including Salesforce, ServiceNow, Open GenAI, and more.
The best part is you can achieve this without building a custom workflow! Say goodbye to the hassle of using separate automations to call APIs. By seamlessly integrating within App Studio, you can now easily streamline your workflow, while gaining direct access to our Connector Catalog of popular applications.
We’ll discuss and demo the benefits of UiPath Apps and connectors including:
Creating a compelling user experience for any software, without the limitations of APIs.
Accelerating the app creation process, saving time and effort
Enjoying high-performance CRUD (create, read, update, delete) operations, for seamless data management.
Speakers:
Russell Alfeche, Technology Leader, RPA at qBotic and UiPath MVP
Charlie Greenberg, host
Monitoring and Managing Anomaly Detection on OpenShift.pdf – Tosin Akinosho
Monitoring and Managing Anomaly Detection on OpenShift
Overview
Dive into the world of anomaly detection on edge devices with our comprehensive hands-on tutorial. This SlideShare presentation will guide you through the entire process, from data collection and model training to edge deployment and real-time monitoring. Perfect for those looking to implement robust anomaly detection systems on resource-constrained IoT/edge devices.
Key Topics Covered
1. Introduction to Anomaly Detection
- Understand the fundamentals of anomaly detection and its importance in identifying unusual behavior or failures in systems.
2. Understanding Edge (IoT)
- Learn about edge computing and IoT, and how they enable real-time data processing and decision-making at the source.
3. What is ArgoCD?
- Discover ArgoCD, a declarative, GitOps continuous delivery tool for Kubernetes, and its role in deploying applications on edge devices.
4. Deployment Using ArgoCD for Edge Devices
- Step-by-step guide on deploying anomaly detection models on edge devices using ArgoCD.
5. Introduction to Apache Kafka and S3
- Explore Apache Kafka for real-time data streaming and Amazon S3 for scalable storage solutions.
6. Viewing Kafka Messages in the Data Lake
- Learn how to view and analyze Kafka messages stored in a data lake for better insights.
7. What is Prometheus?
- Get to know Prometheus, an open-source monitoring and alerting toolkit, and its application in monitoring edge devices.
8. Monitoring Application Metrics with Prometheus
- Detailed instructions on setting up Prometheus to monitor the performance and health of your anomaly detection system.
9. What is Camel K?
- Introduction to Camel K, a lightweight integration framework built on Apache Camel, designed for Kubernetes.
10. Configuring Camel K Integrations for Data Pipelines
- Learn how to configure Camel K for seamless data pipeline integrations in your anomaly detection workflow.
11. What is a Jupyter Notebook?
- Overview of Jupyter Notebooks, an open-source web application for creating and sharing documents with live code, equations, visualizations, and narrative text.
12. Jupyter Notebooks with Code Examples
- Hands-on examples and code snippets in Jupyter Notebooks to help you implement and test anomaly detection models.
2. Who is this guy?
Software Engineer @ AWeber
Founder of the DataPhilly Meetup group
@beckerfuffle
beckerfuffle.com
These slides and more @ github.com/mdbecker
10. 38 top wikipedias
Arabic العربية
Bulgarian Български
Catalan Català
Czech Čeština
Danish Dansk
German Deutsch
English English
Spanish Español
Estonian Eesti
Basque Euskara
Persian فارسی
Finnish Suomi
French Français
Hebrew עברית
Hindi हिन्दी
Croatian Hrvatski
Hungarian Magyar
Indonesian Bahasa Indonesia
Italian Italiano
Japanese 日本語
Kazakh Қазақша
Korean 한국어
Lithuanian Lietuvių
Malay Bahasa Melayu
Dutch Nederlands
Norwegian (Bokmål) Norsk (Bokmål)
Polish Polski
Portuguese Português
Romanian Română
Russian Русский
Slovak Slovenčina
Slovenian Slovenščina
Serbian Српски / Srpski
Swedish Svenska
Turkish Türkçe
Ukrainian Українська
Vietnamese Tiếng Việt
Waray-Waray Winaray
29. Thank you
API & Worker: Kelly O’Brien (linkedin.com/in/kellyobie)
UI: Matt Parke (ordinaryrobot.com)
Classifier: Michael Becker (github.com/mdbecker)
Images: Wikipedia
30. My info
Tweet me @beckerfuffle
Find me at beckerfuffle.com
These slides and more @ github.com/mdbecker
Editor's Notes
Good morning everyone. My name is Michael Becker; I work on the Data Analysis and Management team at AWeber, an email marketing company in Chalfont, PA. I'm also the founder of the DataPhilly Meetup group. You can find me online @beckerfuffle on Twitter, at beckerfuffle.com, and I'm also mdbecker on GitHub. I'll be posting the materials for this talk on my GitHub.
This talk will cover a lot of the logistics behind using a trained scikit-learn model in a real-life production environment. In this talk I’ll cover how to distribute your model.
I’ll discuss how to get new data to your model for prediction.
I’ll introduce RabbitMQ, what it is and why you should care.
I’ll demonstrate how we can put all this together into a finished product
I’ll discuss how to scale your model
Finally, I'll cover some additional things to consider when using scikit-learn models in a realtime production environment.
To start off, let's recap what the supervised model training process looks like. 1) You have your training data and labels 2) You vectorize your data, you train your machine learning algorithm. 3) ??? 4) Make predictions with new data 5) Profit
In this case I'm going to talk about one of the first models I created. A model that predicts the language of input text. To create this model, I used 38 of the top Wikipedias based on number of articles. I then dumped several of the most popular articles as defined by their number of hits.
I converted the wiki markup to plain text. I trained a LinearSVC (Support Vector Classifier) model using a bi/trigram (n-gram) approach I had read worked well for language classification. This approach involves counting all combinations of 2 (bigram) or 3 (trigram) character sequences in your dataset. I tested the model and I was seeing ~99% accuracy. Here I've defined a pipeline combining a text feature extractor with a simple classifier. A pipeline is a utility used to build a composite classifier. To extract features, I'm using a TfidfVectorizer. The vectorizer first counts the number of occurrences of each n-gram in each document to "vectorize" the text. It then applies the TF-IDF (term frequency–inverse document frequency) algorithm. TF-IDF reflects how important a word is to a document in a collection of documents. The TF-IDF value increases based on the number of times an n-gram appears in the document, but is offset by the frequency of the n-gram in the rest of the documents. So, for example, a common word like "the" would get down-weighted compared to a less common word like "automobile."
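The pipeline described above can be sketched as follows. This is a minimal reconstruction, not the actual training script from the talk: the training texts and labels are toy placeholders standing in for the Wikipedia dump, and the hyperparameters are just the bi/trigram character setup described.

```python
# Sketch of a character bi/trigram TF-IDF pipeline feeding a LinearSVC,
# as described in the talk. Toy data; the real model used 38 Wikipedias.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    # analyzer="char" + ngram_range=(2, 3) counts 2- and 3-character
    # sequences, then weights them with TF-IDF.
    ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(2, 3))),
    ("clf", LinearSVC()),
])

texts = ["the quick brown fox", "hello there my friend",
         "der schnelle braune fuchs", "guten tag mein freund"]
labels = ["en", "en", "de", "de"]
pipeline.fit(texts, labels)

prediction = pipeline.predict(["good morning everyone"])[0]
```

Because the pipeline bundles the vectorizer with the classifier, `fit` and `predict` take raw strings; there is no separate vectorization step to keep in sync at prediction time.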
So the first thing you might ask yourself after you've trained your awesome model is "now what?" One of the first problems you'll want to solve is how to distribute your model. The easiest way to do this is to pickle (serialize) the model to disk and distribute it as part of your application. You can also store it in a data store such as GridFS or Amazon S3. In the case of my model, it took up roughly 400MB in memory. This is pretty big, but easily storable on disk (and more importantly, in memory).
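A minimal sketch of the serialize-and-reload step, using joblib (which scikit-learn recommends over plain pickle for models containing large numpy arrays). The file name and toy training data are placeholders of mine.

```python
# Serialize a trained pipeline to disk and load it back; "model.joblib"
# is a placeholder path — in production this file would be shipped with
# the application or fetched from a store like S3.
import joblib
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(2, 3))),
    ("clf", LinearSVC()),
]).fit(["hello world friend", "bonjour tout le monde"], ["en", "fr"])

joblib.dump(pipeline, "model.joblib")     # write model to disk
restored = joblib.load("model.joblib")    # worker-side: load it back
```

The restored object behaves exactly like the original, so the worker only needs the file and the same library versions that produced it.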
Next let's discuss how we’re going to get data into our model. Your data could be coming from many types of sources: a web front-end, a DB trigger, etc. In many cases, you can't easily control the rate of incoming data, and you don't want to hold up the front-end or the database while you wait for a prediction to be made. In these cases, it's useful to be able to process your data asynchronously.
In the example I'm giving today, we created a simple web front-end (similar to google translate) where a user can enter some text to be classified, and get a classification back. We don't want to hold up a thread or process in the client waiting on our classifier to do its thing. Rather the front-end sends the input to a REST API which will record the text input and return a tracking_id that the client can then use to get the result.
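A hypothetical sketch of that API step in Flask (the framework the talk names later). The route names, the in-memory dict standing in for the real database, and the UUID tracking_id format are all placeholders of mine, not the actual AWeber implementation.

```python
# Minimal Flask API sketch: record the text, hand back a tracking_id
# immediately, and let the client poll for the result later. A dict
# stands in for the real database.
import uuid
from flask import Flask, jsonify, request

app = Flask(__name__)
RESULTS = {}  # tracking_id -> {"text": ..., "language": ...}

@app.route("/classify")
def classify():
    text = request.args.get("text", "")
    tracking_id = str(uuid.uuid4())
    RESULTS[tracking_id] = {"text": text, "language": None}  # pending
    # ...here the real API would publish {tracking_id, text} to RabbitMQ...
    return jsonify({"tracking_id": tracking_id})

@app.route("/result/<tracking_id>")
def result(tracking_id):
    # Returns {"language": None} until the worker fills in a prediction.
    return jsonify(RESULTS.get(tracking_id, {}))
```

The key property is that `/classify` returns as soon as the request is recorded; no client thread ever blocks on the classifier itself.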
Decoupling the UI from the backend in this way solves one design issue. However, another thing to consider is whether you can afford to lose messages. If all of your data needs to be processed, you have two options: either build a retry mechanism into the front-end, or use a persistent and durable queue to hold your messages.
Enter RabbitMQ. One of the many features provided by RabbitMQ is highly available queues. By using RabbitMQ, you can ensure that every message is processed without needing to implement a fancy (and likely error-prone) retry mechanism in your front-end.
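As one illustration, classic RabbitMQ queue mirroring is enabled with a broker policy; the policy name and queue-name pattern below are placeholders, and this assumes `rabbitmqctl` access to a clustered broker (newer RabbitMQ versions favor quorum queues for HA instead).

```shell
# Hypothetical HA policy: mirror every queue whose name starts with
# "classify" across all nodes in the cluster.
rabbitmqctl set_policy ha-classify "^classify" '{"ha-mode":"all"}'
```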
RabbitMQ uses AMQP (Advanced Message Queuing Protocol) for all client communication. Using AMQP allows clients running on different platforms or written in different languages, to easily send messages to each other. From a high level, AMQP enables clients to publish messages, and other clients to consume those messages. It does all this without requiring you to roll your own protocol or library.
Once you hook your data input source into RabbitMQ and start publishing data, all you need to do is put your model in a persistent worker and start consuming input.
In the case of my language classification model, we implemented a simple worker that unpickles the classifier and subscribes to an input queue. It then runs an event loop (main) that pulls new messages as they become available and passes them to process_event. process_event calls predict on our model and converts the numerical prediction to a human-readable format. This prediction is then stored in our DB for the front-end to retrieve.
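The worker's message handling can be sketched like this. The real worker is not shown in the talk, so the classifier and database here are stand-ins (a trivial model object and a dict), and the message shape matches the hypothetical publisher format rather than anything confirmed by the source.

```python
# Sketch of the worker's core: decode a queue message, predict, store
# the result keyed by tracking_id. main() would wire process_event to a
# RabbitMQ queue via pika's basic_consume / start_consuming.
import json

class EchoModel:
    """Placeholder for the unpickled scikit-learn pipeline."""
    def predict(self, texts):
        return ["en" for _ in texts]

DB = {}  # tracking_id -> predicted language (stand-in for the real DB)

def process_event(body, model):
    """Handle one message body: predict and record the result."""
    message = json.loads(body)
    prediction = model.predict([message["text"]])[0]
    DB[message["tracking_id"]] = prediction
    return prediction
```

Keeping `process_event` free of any broker details makes it trivially unit-testable: you can feed it JSON strings directly without a running RabbitMQ.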
So that’s basically it. Our design looks a little something like this: The input comes from the UI where the user enters some text they wish to classify. The UI hits a Flask REST API via a GET request. The API stores the request in the DB. The API sends a message to RabbitMQ with the text to classify and the tracking_id for storing the resulting classification. The API returns a json response to the UI with the tracking_id. The worker pulls the message off the queue in RabbitMQ. The worker calls predict on the classifier with the text as input. The classifier returns a prediction. The worker updates the database with the result. The UI displays the result.
Alright so let’s see what this all looks like in action!
Besides the basic design concerns I’ve already covered, there are a few more things worth mentioning. The worst thing that can happen when you're processing data asynchronously is for your queue to back up. Backups will result in longer processing times, and if unbounded, you'll likely crash RabbitMQ. The easiest way to scale your workers is to start another instance. Using this strategy, processing should scale roughly linearly. In my experience, you can easily handle thousands of messages a second this way.
Another way to scale your worker is to convert it to process requests in batches. Many of the algorithms predict much faster per sample when you pass multiple samples to the predict method at once, since the per-call overhead is amortized across the batch. The downside is that you will no longer be able to process results in realtime. However, if you're constrained on resources (memory & CPU), this might be a worthwhile alternative.
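A sketch of that batching idea, assuming a small toy pipeline and a `deque` standing in for messages already pulled off RabbitMQ; the batch size and helper name are placeholders of mine.

```python
# Drain up to batch_size pending texts and classify them with a single
# predict call, instead of one call per message.
from collections import deque
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char", ngram_range=(2, 3))),
    ("clf", LinearSVC()),
]).fit(["hello world friend", "bonjour tout le monde"], ["en", "fr"])

queue = deque(["good day", "hello my friend", "salut tout le monde"])

def drain_and_predict(queue, batch_size=100):
    """Pull up to batch_size messages and classify them in one call."""
    batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
    return list(zip(batch, pipeline.predict(batch)))

results = drain_and_predict(queue)
```

In a real worker, you would accumulate messages for a bounded time window (or until `batch_size` is reached) before predicting, acknowledging the whole batch only after the results are stored.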
Keep an eye on your queue sizes, and alert when they back up. Scale as needed (possibly automatically).
Understand your load requirements. Load test end-to-end to verify you can handle the expected load.
Periodically re-verify your algorithm using new data. Build in a feedback loop so that you can collect new labeled samples to verify performance. Version control your classifier. Keep detailed changelogs and performance metrics/characteristics.
I’d like to thank Kelly O’Brien and Matt Parke for helping me with the front-end and back-end for the demo. Without them things would be a lot less exciting!
You can find me online @beckerfuffle on Twitter. At beckerfuffle.com, and I'm also mdbecker on github. I'll be posting the materials for this talk on my github.