Agile Machine Learning for Real-time Recommender Systems

These are slides presented at MLconf in San Francisco, November 14, 2014. I share the approach to real-time machine learning for recommender systems developed at if(we). We achieve rapid iterative cycles by adhering to a strict approach to structuring and accessing our data, as well as to building the online features that make up our models. These developments support teams of data scientists and data engineers, who work together to solve complex recommendation problems. We also introduce the Antelope Realtime Events framework, an open-source demonstration application derived from our scalable proprietary software stack.

Published in: Software

  1. 1. Agile Machine Learning for Real-time Recommender Systems Johann Schleier-Smith CTO, if(we) @jssmith johann@ifwe.co github.com/ifwe
  2. 2. what it should look like
  3. 3. 1. Gain understanding of machine learning 2. Gain understanding of the product usage 3. See opportunity to make the product better 4. Create training data 5. Train predictive models 6. Put models in production 7. See improvements
  4. 4. what it often looks like
  5. 5. 1. Gain understanding of machine learning 2. Gain understanding of the product usage 3. See opportunity to make the product better 4. Pull records from database to create interesting features (usually aggregates) 5. Train predictive models 6. Go implement models for production 7. See improvements
  6. 6. 1. Gain understanding of machine learning 2. Gain understanding of the product usage 3. See opportunity to make the product better 4. Pull records from database to create interesting features (usually aggregates) 5. Train predictive models 6. Go implement models for production 7. See improvements 3-6 months
  7. 7. 1. Gain understanding of machine learning 2. Gain understanding of the product usage 3. See opportunity to make the product better 4. Pull records from database to create interesting features (usually aggregates) 5. Train predictive models 6. Go implement models for production 7. See improvements Cool! Was it worth it?
  8. 8. • Profitable startup actively pursuing big opportunities in social apps • Millions of users of existing brands • Thousands of social contacts per second
  9. 9. real-time recommendations challenges
  10. 10. Tagged dating feature • >10 million candidates to select from • >1000 updates/sec • Must be responsive to current activity • Users expect instant query results
  11. 11. implementation pain points
  12. 12. • Data scientist hands model description to software engineer • May need to translate features from SQL to Java • Aggregate features require batch processing • May need to adjust features and model to achieve real-time updates • Fast scoring requires high-performance in-memory data structures
  13. 13. time for new thinking
  14. 14. one way that works better
  15. 15. ! ! ! 4. Pull records from database to create interesting features (usually aggregates) 5. Train predictive models 6. Go implement models for production
  16. 16. Create interesting features Train predictive models Put models in production
  17. 17. Create interesting features Train predictive models Put models in production
  18. 18. one right way to data: event history
  19. 19. History.filterTime(start, PLUS_INFINITY).foreach { e: Event => model.update(e) }
  20. 20. everything is an event
  21. 21. Bob registers Alice registers Alice updates profile Bob opens app Bob sees Alice in recommendations Bob swipes yes on Alice Alice receives push notification Alice sees Bob swiped yes Alice swipes yes Alice sends message to Bob
  22. 22. writing the model
  23. 23. class MyModel { def update(e: Event): Unit = { … }; def topN(ctx: Context, n: Int) = { … } }
  24. 24. models are all about features
  25. 25. class MyFeature { def update(e: Event): Unit = { … }; def score(ctx: Context, candidateId: Long): Double = { … } }
  26. 26. model training
  27. 27. History.filterTime(start, PLUS_INFINITY).foreach { e: Event => writeTrainingData(outcome(e), model.features(context(e))); model.update(e) }
  28. 28. live demo Kaggle competition with Best Buy data https://www.kaggle.com/c/acm-sf-chapter-hackathon-small
  29. 29. product update events { "timestamp" : "2012-05-03 6:43:15", "eventType" : "ProductUpdate", "eventProperties" : { "sku" : "1032361", "regularPrice" : "19.99", "name" : "Need for Speed: Hot Pursuit", "description" : "Fasten your seatbelt and get ready to drive like your life depends on it..." ... } }
  30. 30. product view events { "timestamp" : "2011-10-31 09:48:46", "eventType" : "ProductView", "eventProperties" : { "skuSelected" : "2670133", "query" : "Modern warfare" } }
  31. 31. demo Try it yourself, code and instructions at: https://github.com/ifweco/antelope/blob/master/doc/demo.md
  32. 32. 1. Gain understanding of machine learning 2. Gain understanding of the product usage 3. See opportunity to make the product better 4. Create training data 5. Train predictive models 6. Put models in production 7. See improvements Fast cycles!!
  33. 33. • All data in form of events – no exceptions! • Roll through history to generate training examples • Sample training data carefully to avoid feedback • Model is static while features are live and personal • Use interesting features with boring algorithms • Expressiveness > performance > scalability github.com/ifwe/antelope @jssmith
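
The "roll through history to generate training examples" idea from slides 19, 25, and 27 can be sketched as follows. This is a minimal illustration, not the Antelope API: the `Event` and `Context` fields, the `YesSwipeCount` feature, the `"SwipeYes"`/`"SwipeNo"` event types, and the labeling rule in `outcome` are all assumptions made for the example. The point it shows is the ordering: features are scored for an event before that event updates them, so a training row only sees information that was available at the time.

```scala
// Hypothetical minimal types; the slides do not show the real definitions.
final case class Event(timestamp: Long, eventType: String, userId: Long, targetId: Long)
final case class Context(userId: Long, timestamp: Long)

// A feature keeps live state via update() and scores a candidate on demand.
trait Feature {
  def update(e: Event): Unit
  def score(ctx: Context, candidateId: Long): Double
}

// Assumed example feature: number of "yes" swipes a candidate has received so far.
final class YesSwipeCount extends Feature {
  private val counts =
    scala.collection.mutable.Map.empty[Long, Int].withDefaultValue(0)

  def update(e: Event): Unit =
    if (e.eventType == "SwipeYes") counts(e.targetId) += 1

  def score(ctx: Context, candidateId: Long): Double =
    counts(candidateId).toDouble
}

object Replay {
  // Assumed labeling rule: swipes are outcomes (yes = 1.0, no = 0.0);
  // every other event type carries no label.
  def outcome(e: Event): Option[Double] = e.eventType match {
    case "SwipeYes" => Some(1.0)
    case "SwipeNo"  => Some(0.0)
    case _          => None
  }

  // Replays events in time order. For each labeled event, the feature values
  // are recorded BEFORE the event is applied to the features.
  def trainingRows(
      history: Seq[Event],
      start: Long,
      features: Seq[Feature]
  ): Vector[(Double, Vector[Double])] = {
    val rows = Vector.newBuilder[(Double, Vector[Double])]
    history.filter(_.timestamp >= start).sortBy(_.timestamp).foreach { e =>
      outcome(e).foreach { label =>
        val ctx = Context(e.userId, e.timestamp)
        rows += ((label, features.map(_.score(ctx, e.targetId)).toVector))
      }
      features.foreach(_.update(e))
    }
    rows.result()
  }
}
```

Because the same `update` path serves both replay and live traffic, a feature written once for training can, under these assumptions, be reused unchanged in production, which is the shortcut the deck argues for.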

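The closing point that the "model is static while features are live and personal", together with the `topN` method on slide 23, might look like the sketch below. It assumes a plain linear model and a brute-force scan over an explicit candidate list. The `candidates` parameter and the `LinearModel` name are additions for illustration; the slide's `topN(ctx, n)` does not show where candidates come from, and the deck does not say which algorithm if(we) uses.

```scala
// Hypothetical types for illustration; not the Antelope API.
final case class Context(userId: Long)

trait Feature {
  def score(ctx: Context, candidateId: Long): Double
}

// Static linear model: the weights are fixed after training, while the
// features behind them may keep changing as new events arrive.
final class LinearModel(weighted: Seq[(Feature, Double)]) {
  def score(ctx: Context, candidateId: Long): Double =
    weighted.map { case (feature, weight) => weight * feature.score(ctx, candidateId) }.sum

  // Scores every candidate and keeps the n highest. A full sort is fine for a
  // sketch; a production system would need an index or a bounded heap.
  def topN(ctx: Context, candidates: Seq[Long], n: Int): Seq[Long] =
    candidates.sortBy(c => -score(ctx, c)).take(n)
}
```

Keeping the weights fixed means a retrain is the only time the model itself changes, while recommendations still react immediately to new events through the feature state.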