Velox: Models In Action

150,100 views

Published on

by Dan Crankshaw

Published in: Data & Analytics
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
150,100
On SlideShare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
47
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Velox: Models In Action

  1. 1. VELOX: MODELS IN ACTION Dan Crankshaw UC Berkeley AMPLab crankshaw@cs.berkeley.edu AMPCamp 6 2015
  2. 2. Spark Spark Streaming Spark SQL BlinkDB GraphX MLlib KeystoneML HDFS, S3, … Tachyon Mesos BERKELEY DATA 
 ANALYTICS STACK (BDAS)
  3. 3. Catify: Music for Cats
  4. 4. MODELINGTASK Rating Songs
  5. 5. MODELINGTASK Ratings Songs Prediction
  6. 6. Catify: Music for Cats
  7. 7. Catify: Music for Cats CatID Song Score 1 16 2.1 1 14 3.7 3 273 4.2 4 14 1.9
  8. 8. Catify: Music for Cats Pipeline CatID Song Score 1 16 2.1 1 14 3.7 3 273 4.2 4 14 1.9
  9. 9. Catify: Music for Cats Tachyon + HDFS Pipeline CatID Song Score 1 16 2.1 1 14 3.7 3 273 4.2 4 14 1.9
  10. 10. Catify: Music for Cats Tachyon + HDFS Pipeline CatID Song Score 1 16 2.1 1 14 3.7 3 273 4.2 4 14 1.9
  11. 11. Pipeline Tachyon + HDFS Node.js App Server Apache Web Server Catify: Music for Cats
  12. 12. Catify: Music for Cats Songs Users
  13. 13. Songs Users O(users * songs) Catify: Music for Cats
  14. 14. Pipeline Tachyon + HDFS Node.js App Server Apache Web Server Catify: Music for Cats
  15. 15. Pipeline Tachyon + HDFS Node.js App Server Apache Web Server Precomputed Ratings Catify: Music for Cats
  16. 16. Pipeline Tachyon + HDFS Node.js App Server Apache Web Server Precomputed Ratings Catify: Music for Cats Black box
  17. 17. Pipeline Tachyon + HDFS Node.js App Server Apache Web Server Training Data Precomputed Ratings Catify: Music for Cats Black box
  18. 18. Pipeline Tachyon + HDFS Node.js App Server Apache Web Server Training Data Precomputed Ratings Catify: Music for Cats Black box
  19. 19. What’s wrong?
  20. 20. 1. Serving system: low-latency but high staleness What’s wrong?
  21. 21. 1. Serving system: low-latency but high staleness 2. Batch training: slow incremental maintenance, no serving What’s wrong?
  22. 22. 1. Serving system: low-latency but high staleness 2. Batch training: slow incremental maintenance, no serving 3. Ad-hoc model management What’s wrong?
  23. 23. VELOX GOALS
  24. 24. VELOX GOALS 1. Low latency and fresh predictions
  25. 25. VELOX GOALS 1. Low latency and fresh predictions 2. Break the abstraction: model- specific optimizations
  26. 26. VELOX GOALS 1. Low latency and fresh predictions 2. Break the abstraction: model- specific optimizations 3. Unified system eases operation
  27. 27. Spark Spark Streaming Spark SQL BlinkDB GraphX MLlib KeystoneML HDFS, S3, … Tachyon THE MISSING PIECE IN BDAS Mesos
  28. 28. Spark Streaming Spark SQL Graph X ML library BlinkDB Keystone ML Training THE MISSING PIECE IN BDAS Spark HDFS, S3, … Tachyon Mesos
  29. 29. Spark Streaming Spark SQL Graph X ML library BlinkDB Keystone ML Training Management + Serving THE MISSING PIECE IN BDAS Spark HDFS, S3, … Tachyon Mesos
  30. 30. Spark Streaming Spark SQL Graph X ML library BlinkDB Keystone ML Velox Training Management + Serving THE MISSING PIECE IN BDAS Spark HDFS, S3, … Tachyon Mesos
  31. 31. Spark Streaming Spark SQL Graph X BlinkDB Velox Training Management + Serving Spark HDFS, S3, … Tachyon THE MISSING PIECE IN BDAS Mesos ML library Keystone ML
  32. 32. Spark Streaming Spark SQL Graph X BlinkDB Velox Training Management + Serving Spark HDFS, S3, … Tachyon Model Manager THE MISSING PIECE IN BDAS Mesos ML library Keystone ML
  33. 33. Spark Streaming Spark SQL Graph X BlinkDB Velox Training Management + Serving Spark HDFS, S3, … Tachyon Model Manager Prediction Service THE MISSING PIECE IN BDAS Mesos ML library Keystone ML
  34. 34. VELOX ARCHITECTURE
  35. 35. VELOX ARCHITECTURE Standalone Scala Service Shared-Nothing Serving Cluster Personalized Predictions as a Service Automatic Integration with Spark
  36. 36. SYSTEM ARCHITECTURE uuid: 01-10 uuid: 11-20 uuid: 20-30
  37. 37. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30
  38. 38. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  39. 39. SYSTEM ARCHITECTURE Predictions via REST frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  40. 40. SYSTEM ARCHITECTURE Predictions via REST frontend.js Returns score uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  41. 41. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  42. 42. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 Feedback via REST uuid: 4
  43. 43. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 Feedback via REST Model updated in realtime uuid: 4
  44. 44. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30
  45. 45. SYSTEM ARCHITECTURE master workerworker worker frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30
  46. 46. SYSTEM ARCHITECTURE master workerworker worker Batch train RPC frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30
  47. 47. SYSTEM ARCHITECTURE master workerworker worker Batch train RPC frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 Returns batch trained model
  48. 48. Mesos Mesos HDFS, S3, … Tachyon HadoopYarn Spark Straming Shark SQL Graph X ML library BlinkDB MLbase Spark Velox Training Management + Serving PREDICTION SERVICE Model Manager Prediction Service
  49. 49. PREDICTION API GET  /velox/catify/predict?userid=22&song=27632 Simple point queries:
  50. 50. PREDICTION API GET  /velox/catify/predict_top_k?userid=22&k=100 GET  /velox/catify/predict?userid=22&song=27632 Simple point queries: More complex ordering queries:
  51. 51. PREDICTION API GET  /velox/catify/predict_top_k?userid=22&k=100 GET  /velox/catify/predict?userid=22&song=27632 Simple point queries: More complex ordering queries: Low-latency and scalable partitioning Personalized Predictions
  52. 52. PREDICTION API GET  /velox/catify/predict_top_k?userid=22&k=100 GET  /velox/catify/predict?userid=22&song=27632 Simple point queries: More complex ordering queries: Low-latency and scalable partitioning Personalized Predictions Intelligent Caching Sharing and re-use of model partial-state
  53. 53. SYSTEM ARCHITECTURE Predictions via REST frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  54. 54. PREDICTION EXECUTION def  predict(  u:  UUID,  x:  Context  ) uuid model
  55. 55. PREDICTION EXECUTION def  predict(  u:  UUID,  x:  Context  ) uuid model Look up user model Read
  56. 56. PREDICTION EXECUTION def  predict(  u:  UUID,  x:  Context  ) uuid model Look up user model Primary key lookup Read
  57. 57. PREDICTION EXECUTION def  predict(  u:  UUID,  x:  Context  ) uuid model Look up user model Primary key lookup Partition queries by user: always local Read
  58. 58. Compute Features PREDICTION EXECUTION def  predict(  u:  UUID,  x:  Context  ) user independent }f( )
  59. 59. Compute Features PREDICTION EXECUTION def  predict(  u:  UUID,  x:  Context  ) Feature computation 
 could be costly user independent }f( )
  60. 60. Compute Features PREDICTION EXECUTION def  predict(  u:  UUID,  x:  Context  ) Feature computation 
 could be costly user independent } Cache features for reuse across users f( )
  61. 61. Compute Features PREDICTION EXECUTION def  predict(  u:  UUID,  x:  Context  ) user independent }f( )
  62. 62. Compute Features PREDICTION EXECUTION def  predict(  u:  UUID,  x:  Context  ) Feature computation 
 could be costly user independent }f( )
  63. 63. Compute Features PREDICTION EXECUTION def  predict(  u:  UUID,  x:  Context  ) Feature computation 
 could be costly user independent } Predict with missing features f( )
  64. 64. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  65. 65. SYSTEM ARCHITECTURE frontend.js Returns score uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  66. 66. Mesos Mesos HDFS, S3, … Tachyon HadoopYarn Spark Straming Shark SQL Graph X ML library BlinkDB MLbase Spark Velox Training Management + Serving Model Manager Prediction Service MODEL MANAGER
  67. 67. PERSONALIZED SPLIT MODEL Input (Song)
  68. 68. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model
  69. 69. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data
  70. 70. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data Changes Slowly
  71. 71. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data Changes Slowly Train in Batch!
  72. 72. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data Changes Slowly Train in Batch! Personalized User Model
  73. 73. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data Changes Slowly Train in Batch! Small Data Personalized User Model
  74. 74. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data Changes Slowly Train in Batch! Small Data Changes Quickly Personalized User Model
  75. 75. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data Changes Slowly Train in Batch! Small Data Changes Quickly Train Online! Personalized User Model
  76. 76. Input (Song) Personalized User Model Shared Basis Feature Model PERSONALIZED SPLIT MODEL
  77. 77. Input (Song) Personalized User Model Input (Song) Shared Basis Feature Model PERSONALIZED SPLIT MODEL
  78. 78. Input (Song) Personalized User Model Input (Song) Shared Basis Feature Model PERSONALIZED SPLIT MODEL
  79. 79. Input (Song) Personalized User Model Meow Input (Song) Shared Basis Feature Model PERSONALIZED SPLIT MODEL
  80. 80. Personalized User Model Meow Input (Song) Input (Song) Shared Basis Feature Model PERSONALIZED SPLIT MODEL
  81. 81. Personalized User Model Meow Terrible Input (Song) Input (Song) Shared Basis Feature Model PERSONALIZED SPLIT MODEL
  82. 82. MULTIPLE FEATURE FUNCTIONS Input (Song) Rating
  83. 83. MULTIPLE FEATURE FUNCTIONS Input (Song) Rating Split
  84. 84. MULTIPLE FEATURE FUNCTIONS Input (Song) Rating Split f1 f2 f3 f4 f5
  85. 85. MATHEMATICAL FORMULATION Input (Song)
  86. 86. MATHEMATICAL FORMULATION Input (Song) x
  87. 87. Shared Basis Feature Models Changes slowly MATHEMATICAL FORMULATION Input (Song) x
  88. 88. Shared Basis Feature Models Changes slowly MATHEMATICAL FORMULATION Input (Song) f(x; ✓) x
  89. 89. Shared Basis Feature Models Personalized User Model Changes slowly Highly dynamic MATHEMATICAL FORMULATION Input (Song) f(x; ✓) x
  90. 90. Shared Basis Feature Models Personalized User Model Changes slowly Highly dynamic MATHEMATICAL FORMULATION Input (Song) f(x; ✓) · wu x
  91. 91. Shared Basis Feature Models Personalized User Model Changes slowly Highly dynamic = Rating MATHEMATICAL FORMULATION Input (Song) f(x; ✓) · wu x Meow
  92. 92. FEEDBACK API POST  /velox/catify/observe?userid=22&song=27&score=3.7 Simple direct value feedback:
  93. 93. FEEDBACK API POST  /velox/catify/observe?userid=22&song=27&score=3.7 Simple direct value feedback: Continuously update 
 user models inVelox Online Learning
  94. 94. FEEDBACK API POST  /velox/catify/observe?userid=22&song=27&score=3.7 Simple direct value feedback: Continuously update 
 user models inVelox Online Learning Offline Learning Saved for feature learning in Spark
  95. 95. FEEDBACK API POST  /velox/catify/observe?userid=22&song=27&score=3.7 Simple direct value feedback: Continuously update 
 user models inVelox Online Learning Offline Learning Saved for feature learning in Spark Evaluation Continuously assess
 model performance
  96. 96. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 Feedback via REST Model updated in realtime uuid: 4
  97. 97. ONLINE LEARNING velox.jar user model def  observe(u:  UUID,  x:  Context,  y:  Score)
  98. 98. ONLINE LEARNING velox.jar user model def  observe(u:  UUID,  x:  Context,  y:  Score) Update user model with new training data Write
  99. 99. ONLINE LEARNING velox.jar user model def  observe(u:  UUID,  x:  Context,  y:  Score) Stochastic gradient descent Update user model with new training data Write
  100. 100. ONLINE LEARNING velox.jar user model def  observe(u:  UUID,  x:  Context,  y:  Score) Stochastic gradient descent Incremental linear algebra Update user model with new training data Write
  101. 101. SYSTEM ARCHITECTURE master workerworker worker Batch train RPC frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30
  102. 102. OFFLINE OR NEARLINE LEARNING def  retrain(trainingData:  RDD) Spark Based Training Algs.wu · f(x; ✓) Automated retraining policies Efficient batch training using Spark Incremental learning using Spark Streaming
  103. 103. BEYOND RECOMMENDER SYSTEMS
  104. 104. 1. Spam and anomaly detection BEYOND RECOMMENDER SYSTEMS
  105. 105. 1. Spam and anomaly detection 2. Device/location specific modeling BEYOND RECOMMENDER SYSTEMS
  106. 106. 1. Spam and anomaly detection 2. Device/location specific modeling 3. Splitting models across networks BEYOND RECOMMENDER SYSTEMS
  107. 107. Spark Streaming Spark SQL Graph X ML library BlinkDB Keystone ML Velox Model Creation Management + Serving THE MISSING PIECE IN BDAS Spark HDFS, S3, … Tachyon Mesos
  108. 108. SUMMARY
  109. 109. Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems SUMMARY
  110. 110. Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems TheVelox system automatically maintains multiple models while providing low latency, fresh, and personalized predictions SUMMARY
  111. 111. Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems TheVelox system automatically maintains multiple models while providing low latency, fresh, and personalized predictions SUMMARY
  112. 112. Today: model training and serving relies on ad-hoc, manual processes spread across multiple systems TheVelox system automatically maintains multiple models while providing low latency, fresh, and personalized predictions https://amplab.cs.berkeley.edu/projects/velox/ crankshaw@cs.berkeley.edu SUMMARY
  113. 113. QUESTIONS?
  114. 114. FEATURE COMPUTE LATENCY: SIMPLE MODELS
  115. 115. FEATURE COMPUTE LATENCY: COMPLEX MODELS

×