An Architecture for Agile Machine Learning in Real-Time Applications

  1. An Architecture for Agile Machine Learning in Real-Time Applications • Johann Schleier-Smith, if(we) Inc. • johann@ifwe.co • @jssmith • github.com/ifwe • August 11, 2015, KDD, Sydney, Australia
  2. • Profitable startup actively pursuing big opportunities in social apps • Millions of users on existing products • Thousands of social contacts per second
  3. Overview • Agile machine learning can be difficult, but brings big benefits • Key challenges in deployment and feature engineering • Solution in single path to data
  4-6. Diagram, built up over three slides. Production: serve personalized recommendations; data collection. Development: study & understand; design new models & features; train & backtest. Model updates connect the two.
  7. Between train & backtest and model updates: write spec; e-mail model to engineers; request engineering; "why did we want this?"; QA; bug fixes; meetings; wait; export to Excel; check parameters; Java development; new database schema
  8. model updates train & backtest • Shared path to data • Shared feature definition code
  10. Recommendation Engine for Dating Product • >10 million candidates • >1000 updates/sec • Must be responsive to current activity • Users expect instant query results
  11. Model
  13. Model • Decompose likelihood of match between vote outcomes and vote occurrence • Logistic regression • Real-time personalization through feature vector evolution • Model parameters trained offline by data scientists • Consider 1000s of features, select 50-100
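
As a rough illustration of the model slide, the sketch below scores candidates with logistic regression over a sparse feature vector. The `MatchScoring` object, the feature names, and the weight values are hypothetical and do not come from the talk; only the general shape (parameters fixed offline, feature values that change online) follows the slide.

```scala
object MatchScoring {
  // Hypothetical parameters; in the talk these are trained offline by data scientists.
  val bias: Double = -2.0
  val weights: Map[String, Double] = Map(
    "candidateYesRate" -> 1.5, // assumed feature: share of yes votes the candidate received
    "viewerYesRate"    -> 0.8, // assumed feature: share of yes votes the viewer cast
    "sameCity"         -> 0.6  // assumed feature: 1.0 if both users share a city
  )

  def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  // Features missing from the weight map contribute nothing to the score.
  def score(features: Map[String, Double]): Double = {
    val z = features.foldLeft(bias) { case (acc, (name, value)) =>
      acc + weights.getOrElse(name, 0.0) * value
    }
    sigmoid(z)
  }

  // Rank candidate ids by predicted probability, highest first.
  def rank(candidates: Map[String, Map[String, Double]]): Seq[String] =
    candidates.toSeq.sortBy { case (_, f) => -score(f) }.map(_._1)
}
```

Because the weights stay fixed between offline training runs, personalization in this sketch comes entirely from the feature values passed to `score`, which is one way to read "feature vector evolution".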
  14. Application APIs & Business Logic RDBMS
  15. Application APIs & Business Logic RDBMS Data Warehouse / Hadoop
  16. Application APIs & Business Logic RDBMS Data Warehouse / Hadoop Streaming Logs
  18. Application APIs & Business Logic RDBMS production development Exploratory Analysis Training & Backtesting Data Warehouse / Hadoop Streaming Logs
  19. Application APIs & Business Logic RDBMS production development Exploratory Analysis Training & Backtesting Batch Predictions Data Warehouse / Hadoop Streaming Logs
  20. Application APIs & Business Logic RDBMS production development Exploratory Analysis Training & Backtesting Batch Predictions Predictive Services / Ranking Data Warehouse / Hadoop Streaming Logs
  22. Events Time
  23. Aggregation first( ) last( ) count( ) sum( ) max( ) count( ) avg( ) min( ) Events Time
  24. Machine learning input • Aggregation first( ) last( ) count( ) sum( ) max( ) count( ) avg( ) min( ) • Events • Time
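
The aggregation slides can be read as turning a time window of events into a fixed set of numeric features. The sketch below is one minimal way to do that; the `Event` shape (a millisecond timestamp plus a numeric value), the half-open window, and all names are assumptions made for illustration.

```scala
object WindowAggregates {
  // Hypothetical event shape: a timestamp in milliseconds and a numeric value.
  final case class Event(timeMs: Long, value: Double)

  final case class Aggregates(
    count: Int, sum: Double, min: Double, max: Double,
    avg: Double, first: Double, last: Double
  )

  // Aggregate events with startMs <= timeMs < endMs; None when the window is empty.
  def aggregate(events: Seq[Event], startMs: Long, endMs: Long): Option[Aggregates] = {
    val window = events
      .filter(e => e.timeMs >= startMs && e.timeMs < endMs)
      .sortBy(_.timeMs)
    if (window.isEmpty) None
    else {
      val values = window.map(_.value)
      Some(Aggregates(
        count = values.size,
        sum   = values.sum,
        min   = values.min,
        max   = values.max,
        avg   = values.sum / values.size,
        first = values.head, // earliest event in the window
        last  = values.last  // latest event in the window
      ))
    }
  }
}
```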
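  25. Event History API
      trait EventHistory {
        // Append a new event to the history.
        def publishEvent(e: Event): Unit
        // Deliver events between startTime and endTime that pass eventFilter to eventHandler.
        def getEvents(
          startTime: Date,
          endTime: Date,
          eventFilter: EventFilter,
          eventHandler: EventHandler
        ): Unit
      }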
  28. Event History API (the EventHistory trait from slide 25), annotated: +∞ for real-time streaming
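
To make the trait concrete, here is a minimal in-memory sketch, not the implementation from the talk. The slides do not define `Event`, `EventFilter`, or `EventHandler`, so the definitions below are hypothetical. The sketch also assumes that the "+∞" annotation applies to `endTime`, and represents it with a `Forever` constant. It is single-threaded and keeps everything in memory.

```scala
import java.util.Date
import scala.collection.mutable.ArrayBuffer

object InMemoryHistoryDemo {
  // Hypothetical minimal types; the slides leave these undefined.
  final case class Event(time: Date, kind: String, userId: String)
  type EventFilter  = Event => Boolean
  type EventHandler = Event => Unit

  trait EventHistory {
    def publishEvent(e: Event): Unit
    def getEvents(startTime: Date, endTime: Date,
                  eventFilter: EventFilter, eventHandler: EventHandler): Unit
  }

  // Assumption: the largest representable Date stands in for "+∞".
  val Forever: Date = new Date(Long.MaxValue)

  class InMemoryEventHistory extends EventHistory {
    private val log = ArrayBuffer.empty[Event]
    private val subscribers = ArrayBuffer.empty[(Date, EventFilter, EventHandler)]

    def publishEvent(e: Event): Unit = {
      log += e
      // Deliver to open-ended subscriptions whose start time and filter accept the event.
      for ((start, accepts, handler) <- subscribers)
        if (!e.time.before(start) && accepts(e)) handler(e)
    }

    def getEvents(startTime: Date, endTime: Date,
                  eventFilter: EventFilter, eventHandler: EventHandler): Unit = {
      // Replay stored history in time order: startTime <= time < endTime.
      log.toSeq
        .filter(e => !e.time.before(startTime) && e.time.before(endTime) && eventFilter(e))
        .sortBy(_.time.getTime)
        .foreach(eventHandler)
      // With an open-ended range, keep the handler registered for future events.
      if (endTime == Forever) subscribers += ((startTime, eventFilter, eventHandler))
    }
  }
}
```

In this sketch a caller that passes `Forever` first receives the stored history and then each newly published event through the same handler, so one code path serves both backtesting and live updates.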
  29. Events over time: Alice updates profile; Bob opens app; Bob sees Alice in recommendations; Bob swipes yes on Alice; Alice receives push notification; Alice sees Bob in recommendations; Alice sends message to Bob
  30. Events (as on slide 29) → Online feature state
  31. Events (as on slide 29) → Online feature state → Machine learning input
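
One way to picture online feature state is as a value updated once per event, so that replaying history is a fold of the same update function used on live traffic. The sketch below illustrates this with two event types loosely modeled on the Alice and Bob timeline; the event classes, the counters in `UserState`, and all names are hypothetical.

```scala
object OnlineFeatureState {
  // Hypothetical event types loosely modeled on the slide's timeline.
  sealed trait Event { def timeMs: Long }
  final case class Shown(timeMs: Long, viewer: String, candidate: String) extends Event
  final case class Vote(timeMs: Long, voter: String, candidate: String, yes: Boolean) extends Event

  // Per-user counters; the choice of counters is an assumption for illustration.
  final case class UserState(timesShown: Int = 0, yesReceived: Int = 0, votesCast: Int = 0)

  type State = Map[String, UserState]

  // Apply one event to the state, returning the updated state.
  def update(state: State, event: Event): State = {
    def bump(s: State, user: String)(f: UserState => UserState): State =
      s.updated(user, f(s.getOrElse(user, UserState())))
    event match {
      case Shown(_, _, candidate) =>
        bump(state, candidate)(u => u.copy(timesShown = u.timesShown + 1))
      case Vote(_, voter, candidate, yes) =>
        val afterVoter = bump(state, voter)(u => u.copy(votesCast = u.votesCast + 1))
        if (yes) bump(afterVoter, candidate)(u => u.copy(yesReceived = u.yesReceived + 1))
        else afterVoter
    }
  }

  // Replaying a time-ordered history is a left fold of the same update function
  // that would process live events one at a time.
  def replay(events: Seq[Event]): State =
    events.sortBy(_.timeMs).foldLeft(Map.empty[String, UserState])(update)
}
```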
  32. RDBMS Application APIs & Business Logic Event History Repository Ranking Real-Time State Updates
  35. RDBMS Application APIs & Business Logic Event History Repository Ranking Real-Time State Updates State Updates Exploratory Analysis Training & Backtesting production development
  41. Monitoring RDBMS Application APIs & Business Logic Event History Repository Ranking Real-Time State Updates State Updates Exploratory Analysis Training & Backtesting production development
  42. Event History API • Single path to data for real-time streaming and history • Shared feature engineering code for development and production • Team shares access to code and data • Fine-grained alignment of feature state and prediction outcomes • Temporally accurate modeling ensured (no looking ahead)
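
The last two bullets of slide 42 can be sketched as follows: when building training examples from history, snapshot the feature state before applying the event that carries the label, so no example sees its own outcome or anything later. This is an illustrative sketch under assumed types, not the talk's code; `Vote`, `Tally`, and `Example` are hypothetical.

```scala
object BacktestExamples {
  // Hypothetical event: one vote by a viewer on a candidate.
  final case class Vote(timeMs: Long, voter: String, candidate: String, yes: Boolean)

  // Hypothetical feature state: yes votes and total votes received per candidate.
  final case class Tally(yes: Int = 0, total: Int = 0) {
    def yesRate: Double = if (total == 0) 0.0 else yes.toDouble / total
  }

  final case class Example(timeMs: Long, candidateYesRate: Double, label: Boolean)

  // Emit one training example per vote, using only state built from earlier events.
  def examples(votes: Seq[Vote]): Seq[Example] = {
    val start = (Map.empty[String, Tally], Vector.empty[Example])
    val (_, out) = votes.sortBy(_.timeMs).foldLeft(start) {
      case ((state, acc), v) =>
        val before = state.getOrElse(v.candidate, Tally())
        // Snapshot features first, so the label never leaks into its own features.
        val example = Example(v.timeMs, before.yesRate, v.yes)
        val after = before.copy(
          yes = before.yes + (if (v.yes) 1 else 0),
          total = before.total + 1
        )
        (state.updated(v.candidate, after), acc :+ example)
    }
    out
  }
}
```

Each example pairs the outcome of a vote with the feature value as it stood immediately before that vote, which is the alignment a live ranking service would also see.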
  43. 15 new models released and tested within 6 months • >30% cumulative improvement in usage shown in A/B testing • [Chart: daily unique users (Matchers, Voters), Apr 2013 to Apr 2014, y-axis 0 to 2,000,000, with markers for "New model released" and "A/B test updated"]
  44. • Open source implementation derived from if(we)’s proprietary platform • Provides Scala DSL for building online features from event history • Examples include dating recommendations and product search with learning to rank • Not yet ready for scale or production • Seeking collaborators
  45. Antelope Open Source Vision • Production Serving: Ranking • Data Science: R, Matlab, Python • Feature Engineering • Event History API • Streaming data: Kafka, Storm • Historical data: S3, NFS, HDFS
  46. Agile Machine Learning with Event History • Solving deployment yields quick product cycles • All data saved and retrieved as time-ordered events • Single path to data for both historical and real-time access • Same feature engineering code used in development and production • Agile success • Team shares access to code and data • Production product iterations measured in days rather than months • github.com/ifwe/antelope • @jssmith