Agile Machine Learning for Real-time Recommender Systems


These are slides presented at MLconf in San Francisco, November 14, 2014. I share the approach to real-time machine learning for recommender systems developed at if(we). We achieve rapid iterative cycles by adhering to a strict approach to structuring and accessing our data, as well as to building the online features that make up our models. These developments support teams of data scientists and data engineers, who work together to solve complex recommendation problems. We also introduce the Antelope Realtime Events framework, an open source demonstration application derived from our scalable proprietary software stack.


Agile Machine Learning for Real-time
Recommender Systems
Johann Schleier-Smith
CTO, if(we)
@jssmith johann@ifwe.co github.com/ifwe

1. Gain understanding of machine learning
2. Gain understanding of the product usage
3. See opportunity to make the product better
4. Create training data
5. Train predictive models
6. Put models in production
7. See improvements

1. Gain understanding of machine learning
2. Gain understanding of the product usage
3. See opportunity to make the product better
4. Pull records from database to create interesting
features (usually aggregates)
5. Train predictive models
6. Go implement models for production
7. See improvements

• Profitable startup actively pursuing big
opportunities in social apps
• Millions of users of existing brands
• Thousands of social contacts per second

Tagged dating feature
• >10 million candidates
to select from
• >1000 updates/sec
• Must be responsive to
current activity
• Users expect instant
query results

• Data scientist hands model description to
software engineer
• May need to translate features from SQL to Java
• Aggregate features require batch processing
• May need to adjust features and model to
achieve real-time updates
• Fast scoring requires high-performance in-memory
data structures

! 4. Pull records from database to create interesting
features (usually aggregates)
! 5. Train predictive models
! 6. Go implement models for production

Create interesting features
Train predictive models
Put models in production

History.
  filterTime(start, PLUS_INFINITY).
  foreach { e: Event => model.update(e) }
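The replay loop above can be sketched in self-contained form. Everything named here (the `Event` fields, the `History` class, `CountingModel`, `PlusInfinity`) is hypothetical and made up for illustration; it is not the Antelope API, only one way the idea of rolling a model forward through an event history might look.

```scala
import scala.collection.mutable

// Hypothetical types for illustration only; not the Antelope API.
final case class Event(ts: Long, userId: Long, eventType: String)

// An in-memory event history, kept sorted by timestamp.
final class History(events: Seq[Event]) {
  private val sorted = events.sortBy(_.ts)

  // Events with start <= ts < end, in time order.
  def filterTime(start: Long, end: Long): Seq[Event] =
    sorted.filter(e => e.ts >= start && e.ts < end)
}

// A toy model whose only state is a per-user event count.
final class CountingModel {
  private val counts = mutable.Map.empty[Long, Int].withDefaultValue(0)
  def update(e: Event): Unit = counts(e.userId) += 1
  def count(userId: Long): Int = counts(userId)
}

object ReplayDemo {
  val PlusInfinity: Long = Long.MaxValue

  def run(): CountingModel = {
    val history = new History(Seq(
      Event(30L, 1L, "view"),
      Event(10L, 1L, "register"),
      Event(20L, 2L, "register")))
    val model = new CountingModel
    // Same shape as the slide: roll forward through history, updating as we go.
    history.filterTime(15L, PlusInfinity).foreach { e => model.update(e) }
    model
  }
}
```

Because the model only ever sees `update(e)` calls, the same code path could in principle serve both historical replay and a live event stream, which appears to be the point of the slide.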

class MyModel {
  def update(e: Event) { … }
  def topN(ctx: Context, n: Int) = { … }
}
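As a rough illustration of the `update` / `topN` shape, here is a minimal sketch of a popularity-based model. The `Event` and `Context` fields and the `PopularityModel` class are assumptions introduced for this example; the slides do not show what the real types contain.

```scala
import scala.collection.mutable

// Hypothetical types for illustration only; not the Antelope API.
final case class Event(ts: Long, actorId: Long, targetId: Long)
final case class Context(userId: Long)

// A toy model: rank candidates by how many events have targeted them.
final class PopularityModel {
  private val hits = mutable.Map.empty[Long, Int].withDefaultValue(0)

  // Fold one event into the model state.
  def update(e: Event): Unit = hits(e.targetId) += 1

  // Return the n highest-scoring candidates, excluding the requesting user.
  def topN(ctx: Context, n: Int): Seq[Long] =
    hits.toSeq
      .filter { case (id, _) => id != ctx.userId }
      .sortBy { case (id, count) => (-count, id) } // ties broken by id for determinism
      .take(n)
      .map(_._1)
}

object TopNDemo {
  def run(): Seq[Long] = {
    val model = new PopularityModel
    Seq(
      Event(1L, 1L, 2L),
      Event(2L, 3L, 2L),
      Event(3L, 2L, 3L),
      Event(4L, 1L, 4L),
      Event(5L, 2L, 1L)
    ).foreach(model.update)
    model.topN(Context(userId = 1L), n = 2)
  }
}
```

The full sort keeps the sketch short. With a candidate pool in the millions, a bounded heap or a pre-filtered candidate set would likely be needed instead.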

class MyFeature {
  def update(e: Event) { … }
  def score(ctx: Context, candidateId: Long): Double = { … }
}
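A feature with this interface might look like the following sketch, which scores a candidate by how recently they were active. The recency idea, the half-life parameter, and all field names are assumptions chosen for illustration; the talk does not specify any particular feature.

```scala
import scala.collection.mutable

// Hypothetical types for illustration only; not the Antelope API.
final case class Event(ts: Long, actorId: Long, targetId: Long, eventType: String)
final case class Context(userId: Long, now: Long)

// Hypothetical feature: recency of a candidate's last activity, in [0, 1].
final class RecencyFeature(halfLifeMs: Double) {
  private val lastSeen = mutable.Map.empty[Long, Long]

  // Assumes events arrive in time order, so the latest write is the most recent.
  def update(e: Event): Unit = lastSeen(e.actorId) = e.ts

  // 1.0 for a candidate active right now, halving every halfLifeMs; 0.0 if never seen.
  def score(ctx: Context, candidateId: Long): Double =
    lastSeen.get(candidateId) match {
      case Some(ts) => math.pow(0.5, math.max(0L, ctx.now - ts) / halfLifeMs)
      case None     => 0.0
    }
}
```

The feature keeps only the state it needs (one timestamp per user), so each update is a single map write and each score is a single lookup.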

History.
  filterTime(start, PLUS_INFINITY).
  foreach { e: Event => {
    writeTrainingData(outcome(e), model.features(context(e)))
    model.update(e)
  } }
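The ordering in the training loop matters: features are read before the event is applied, so each training example reflects only what was known at that moment. The sketch below shows this with a single toy feature. `AcceptRateModel`, `Example`, and the `Event` fields are hypothetical names for illustration, and collecting examples into a `Vector` stands in for `writeTrainingData`.

```scala
import scala.collection.mutable

// Hypothetical types for illustration only; not the Antelope API.
final case class Event(ts: Long, userId: Long, candidateId: Long, accepted: Boolean)
final case class Example(label: Double, features: Vector[Double])

// Toy model with one feature: the candidate's acceptance rate so far.
final class AcceptRateModel {
  private val shown = mutable.Map.empty[Long, Int].withDefaultValue(0)
  private val accepted = mutable.Map.empty[Long, Int].withDefaultValue(0)

  def features(candidateId: Long): Vector[Double] = {
    val n = shown(candidateId)
    Vector(if (n == 0) 0.0 else accepted(candidateId).toDouble / n)
  }

  def update(e: Event): Unit = {
    shown(e.candidateId) += 1
    if (e.accepted) accepted(e.candidateId) += 1
  }
}

object TrainingDemo {
  def examples(history: Seq[Event]): Vector[Example] = {
    val model = new AcceptRateModel
    val out = Vector.newBuilder[Example]
    history.sortBy(_.ts).foreach { e =>
      // Features are read BEFORE the update, so each example only sees the past.
      out += Example(if (e.accepted) 1.0 else 0.0, model.features(e.candidateId))
      model.update(e)
    }
    out.result()
  }
}
```

Swapping the two statements inside the loop would leak the outcome into its own features, which is the kind of mistake this event-replay structure is meant to make hard.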

live demo
Kaggle competition
with Best Buy data
https://www.kaggle.com/c/acm-sf-chapter-hackathon-small

product update events
{
  "timestamp" : "2012-05-03 6:43:15",
  "eventType" : "ProductUpdate",
  "eventProperties" : {
    "sku" : "1032361",
    "regularPrice" : "19.99",
    "name" : "Need for Speed: Hot Pursuit",
    "description" : "Fasten your seatbelt and get ready to drive like your life depends on it..."
    ...
  }
}

product view events
{
  "timestamp" : "2011-10-31 09:48:46",
  "eventType" : "ProductView",
  "eventProperties" : {
    "skuSelected" : "2670133",
    "query" : "Modern warfare"
  }
}
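One way to bring the two event shapes above into typed code is sketched below. It assumes the JSON has already been parsed into a flat `RawEvent` (JSON parsing itself is left out), and the `ProductUpdate` / `ProductView` case classes and the `Events.decode` helper are hypothetical names, not taken from the demo code. The timestamp pattern uses a single `H` because the sample data shows both `6:43:15` and `09:48:46`.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical stand-in for an already-parsed JSON event.
final case class RawEvent(
  timestamp: String,
  eventType: String,
  eventProperties: Map[String, String])

// Hypothetical typed events mirroring the two samples above.
sealed trait Event { def ts: LocalDateTime }
final case class ProductUpdate(ts: LocalDateTime, sku: String, name: String) extends Event
final case class ProductView(ts: LocalDateTime, skuSelected: String, query: String) extends Event

object Events {
  // Single "H" accepts both one- and two-digit hours.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd H:mm:ss")

  // Returns None for unknown event types or missing required properties.
  def decode(raw: RawEvent): Option[Event] = {
    val ts = LocalDateTime.parse(raw.timestamp, fmt)
    val p = raw.eventProperties
    raw.eventType match {
      case "ProductUpdate" =>
        for (sku <- p.get("sku"); name <- p.get("name")) yield ProductUpdate(ts, sku, name)
      case "ProductView" =>
        for (s <- p.get("skuSelected"); q <- p.get("query")) yield ProductView(ts, s, q)
      case _ => None
    }
  }
}
```

A sealed trait lets model and feature code pattern-match on event type while still treating everything as one time-ordered stream.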

demo
Try it yourself, code and instructions at:
https://github.com/ifweco/antelope/blob/master/doc/demo.md

• All data in form of events – no exceptions!
• Roll through history to generate training examples
• Sample training data carefully to avoid feedback
• Model is static while features are live and personal
• Use interesting features with boring algorithms
• Expressiveness > performance > scalability
github.com/ifwe/antelope
@jssmith
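The point that the model is static while features are live can be illustrated with a small sketch: weights are fixed after offline training, and only the feature values change from request to request. Logistic regression is used here as one example of a "boring algorithm"; that choice, the weight values, and the `StaticModelScoring` name are assumptions for illustration, not something the slides specify.

```scala
object StaticModelScoring {
  // Weights are learned offline and then frozen; the values are hypothetical.
  val weights: Vector[Double] = Vector(2.0, -1.0)
  val bias: Double = 0.0

  // Logistic regression score over feature values computed at request time.
  def score(liveFeatures: Vector[Double]): Double = {
    require(liveFeatures.length == weights.length, "feature/weight length mismatch")
    val z = bias + weights.zip(liveFeatures).map { case (w, x) => w * x }.sum
    1.0 / (1.0 + math.exp(-z))
  }
}
```

Under this split, personalization and freshness come from the feature state that every event updates, so the weights only need retraining when features are added or changed.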


Agile Machine Learning for Real-time Recommender Systems

1. Agile Machine Learning for Real-time Recommender Systems Johann Schleier-Smith CTO, if(we) @jssmith johann@ifwe.co github.com/ifwe

2. what it should look like

3. 1. Gain understanding of machine learning 2. Gain understanding of the product usage 3. See opportunity to make the product better 4. Create training data 5. Train predictive models 6. Put models in production 7. See improvements

4. what it often looks like

5. 1. Gain understanding of machine learning 2. Gain understanding of the product usage 3. See opportunity to make the product better 4. Pull records from database to create interesting features (usually aggregates) 5. Train predictive models 6. Go implement models for production 7. See improvements

6. 1. Gain understanding of machine learning 2. Gain understanding of the product usage 3. See opportunity to make the product better 4. Pull records from database to create interesting features (usually aggregates) 5. Train predictive models 6. Go implement models for production 7. See improvements 3-6 months

7. 1. Gain understanding of machine learning 2. Gain understanding of the product usage 3. See opportunity to make the product better 4. Pull records from database to create interesting features (usually aggregates) 5. Train predictive models 6. Go implement models for production 7. See improvements Cool! Was it worth it?

8. • Profitable startup actively pursuing big opportunities in social apps • Millions of users of existing brands • Thousands of social contacts per second

9. real-time recommendations challenges

10. Tagged dating feature • >10 million candidates to select from • >1000 updates/sec • Must be responsive to current activity • Users expect instant query results

11. implementation pain points

12. • Data scientist hands model description to software engineer • May need to translate features from SQL to Java • Aggregate features require batch processing • May need to adjust features and model to achieve real-time updates • Fast scoring requires high-performance in-memory data structures

13. time for new thinking

14. one way that works better

15. ! ! ! 4. Pull records from database to create interesting features (usually aggregates) 5. Train predictive models 6. Go implement models for production

16. Create interesting features Train predictive models Put models in production

17. Create interesting features Train predictive models Put models in production

18. one right way to data event history

19. History. filterTime(start, PLUS_INFINITY). foreach { e: Event => model.update(e) }

20. everything is an event

21. Bob registers Alice registers Alice updates profile Bob opens app Bob sees Alice in recommendations Bob swipes yes on Alice Alice receives push notification Alice sees Bob swiped yes Alice swipes yes Alice sends message to Bob

22. writing the model

23. class MyModel { def update(e: Event) { … } def topN(ctx: Context, n: Int) = { … } }

24. models are all about features

25. class MyFeature { def update(e: Event) { … } def score(ctx: Context, candidateId: Long): Double = { … } }

26. model training

27. History. filterTime(start, PLUS_INFINITY). foreach { e: Event => { writeTrainingData(outcome(e), model.features(context(e))); model.update(e) } }

28. live demo Kaggle competition with Best Buy data https://www.kaggle.com/c/acm-sf-chapter-hackathon-small

29. product update events { "timestamp" : "2012-05-03 6:43:15", "eventType" : "ProductUpdate", "eventProperties" : { "sku" : "1032361", "regularPrice" : "19.99", "name" : "Need for Speed: Hot Pursuit", "description" : "Fasten your seatbelt and get ready to drive like your life depends on it..." ... } }

30. product view events { "timestamp" : "2011-10-31 09:48:46", "eventType" : "ProductView", "eventProperties" : { "skuSelected" : "2670133", "query" : "Modern warfare" } }

31. demo Try it yourself, code and instructions at: https://github.com/ifweco/antelope/blob/master/doc/demo.md

32.

33.

34.

35. 1. Gain understanding of machine learning 2. Gain understanding of the product usage 3. See opportunity to make the product better 4. Create training data 5. Train predictive models 6. Put models in production 7. See improvements Fast cycles!!

36.

37. • All data in form of events – no exceptions! • Roll through history to generate training examples • Sample training data carefully to avoid feedback • Model is static while features are live and personal • Use interesting features with boring algorithms • Expressiveness > performance > scalability github.com/ifwe/antelope @jssmith
