VELOX: MODELS IN ACTION
Dan Crankshaw
UC Berkeley AMPLab
crankshaw@cs.berkeley.edu
AMPCamp 6
2015

Velox: Models In Action

150,337 views


by Dan Crankshaw

Published in: Data & Analytics

Velox: Models In Action

  1. 1. VELOX: MODELS IN ACTION Dan Crankshaw UC Berkeley AMPLab crankshaw@cs.berkeley.edu AMPCamp 6 2015
  2. 2. Spark Spark Streaming Spark SQL BlinkDB GraphX MLlib KeystoneML HDFS, S3, … Tachyon Mesos BERKELEY DATA ANALYTICS STACK (BDAS)
  3. 3. Catify: Music for Cats
  4. 4. MODELING TASK Rating Songs
  5. 5. MODELING TASK Ratings Songs Prediction
  6. 6. Catify: Music for Cats
  7. 7. Catify: Music for Cats (CatID, Song, Score): (1, 16, 2.1), (1, 14, 3.7), (3, 273, 4.2), (4, 14, 1.9)
  8. 8. Catify: Music for Cats Pipeline (CatID, Song, Score): (1, 16, 2.1), (1, 14, 3.7), (3, 273, 4.2), (4, 14, 1.9)
  9. 9. Catify: Music for Cats Tachyon + HDFS Pipeline (CatID, Song, Score): (1, 16, 2.1), (1, 14, 3.7), (3, 273, 4.2), (4, 14, 1.9)
  10. 10. Catify: Music for Cats Tachyon + HDFS Pipeline (CatID, Song, Score): (1, 16, 2.1), (1, 14, 3.7), (3, 273, 4.2), (4, 14, 1.9)
  11. 11. Pipeline Tachyon + HDFS Node.js App Server Apache Web Server Catify: Music for Cats
  12. 12. Catify: Music for Cats Songs Users
  13. 13. Songs Users O(users * songs) Catify: Music for Cats
  14. 14. Pipeline Tachyon + HDFS Node.js App Server Apache Web Server Catify: Music for Cats
  15. 15. Pipeline Tachyon + HDFS Node.js App Server Apache Web Server Precomputed Ratings Catify: Music for Cats
  16. 16. Pipeline Tachyon + HDFS Node.js App Server Apache Web Server Precomputed Ratings Catify: Music for Cats Black box
  17. 17. Pipeline Tachyon + HDFS Node.js App Server Apache Web Server Training Data Precomputed Ratings Catify: Music for Cats Black box
  18. 18. Pipeline Tachyon + HDFS Node.js App Server Apache Web Server Training Data Precomputed Ratings Catify: Music for Cats Black box
  19. 19. What’s wrong?
  20. 20. 1. Serving system: low-latency but high staleness What’s wrong?
  21. 21. 1. Serving system: low-latency but high staleness 2. Batch training: slow incremental maintenance, no serving What’s wrong?
  22. 22. 1. Serving system: low-latency but high staleness 2. Batch training: slow incremental maintenance, no serving 3. Ad-hoc model management What’s wrong?
  23. 23. VELOX GOALS
  24. 24. VELOX GOALS 1. Low latency and fresh predictions
  25. 25. VELOX GOALS 1. Low latency and fresh predictions 2. Break the abstraction: model- specific optimizations
  26. 26. VELOX GOALS 1. Low latency and fresh predictions 2. Break the abstraction: model- specific optimizations 3. Unified system eases operation
  27. 27. Spark Spark Streaming Spark SQL BlinkDB GraphX MLlib KeystoneML HDFS, S3, … Tachyon THE MISSING PIECE IN BDAS Mesos
  28. 28. Spark Streaming Spark SQL GraphX ML library BlinkDB KeystoneML Training THE MISSING PIECE IN BDAS Spark HDFS, S3, … Tachyon Mesos
  29. 29. Spark Streaming Spark SQL GraphX ML library BlinkDB KeystoneML Training Management + Serving THE MISSING PIECE IN BDAS Spark HDFS, S3, … Tachyon Mesos
  30. 30. Spark Streaming Spark SQL GraphX ML library BlinkDB KeystoneML Velox Training Management + Serving THE MISSING PIECE IN BDAS Spark HDFS, S3, … Tachyon Mesos
  31. 31. Spark Streaming Spark SQL GraphX BlinkDB Velox Training Management + Serving Spark HDFS, S3, … Tachyon THE MISSING PIECE IN BDAS Mesos ML library KeystoneML
  32. 32. Spark Streaming Spark SQL GraphX BlinkDB Velox Training Management + Serving Spark HDFS, S3, … Tachyon Model Manager THE MISSING PIECE IN BDAS Mesos ML library KeystoneML
  33. 33. Spark Streaming Spark SQL GraphX BlinkDB Velox Training Management + Serving Spark HDFS, S3, … Tachyon Model Manager Prediction Service THE MISSING PIECE IN BDAS Mesos ML library KeystoneML
  34. 34. VELOX ARCHITECTURE
  35. 35. VELOX ARCHITECTURE Standalone Scala Service Shared-Nothing Serving Cluster Personalized Predictions as a Service Automatic Integration with Spark
  36. 36. SYSTEM ARCHITECTURE uuid: 01-10 uuid: 11-20 uuid: 20-30
  37. 37. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30
  38. 38. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  39. 39. SYSTEM ARCHITECTURE Predictions via REST frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  40. 40. SYSTEM ARCHITECTURE Predictions via REST frontend.js Returns score uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  41. 41. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  42. 42. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 Feedback via REST uuid: 4
  43. 43. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 Feedback via REST Model updated in realtime uuid: 4
  44. 44. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30
  45. 45. SYSTEM ARCHITECTURE master worker worker worker frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30
  46. 46. SYSTEM ARCHITECTURE master worker worker worker Batch train RPC frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30
  47. 47. SYSTEM ARCHITECTURE master worker worker worker Batch train RPC frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 Returns batch trained model
  48. 48. Mesos Mesos HDFS, S3, … Tachyon Hadoop YARN Spark Streaming Shark SQL GraphX ML library BlinkDB MLbase Spark Velox Training Management + Serving PREDICTION SERVICE Model Manager Prediction Service
  49. 49. PREDICTION API Simple point queries: GET /velox/catify/predict?userid=22&song=27632
  50. 50. PREDICTION API Simple point queries: GET /velox/catify/predict?userid=22&song=27632 More complex ordering queries: GET /velox/catify/predict_top_k?userid=22&k=100
  51. 51. PREDICTION API Simple point queries: GET /velox/catify/predict?userid=22&song=27632 More complex ordering queries: GET /velox/catify/predict_top_k?userid=22&k=100 Low-latency and scalable partitioning Personalized Predictions
  52. 52. PREDICTION API Simple point queries: GET /velox/catify/predict?userid=22&song=27632 More complex ordering queries: GET /velox/catify/predict_top_k?userid=22&k=100 Low-latency and scalable partitioning Personalized Predictions Intelligent Caching Sharing and re-use of model partial-state
  53. 53. SYSTEM ARCHITECTURE Predictions via REST frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  54. 54. PREDICTION EXECUTION def predict(u: UUID, x: Context) uuid model
  55. 55. PREDICTION EXECUTION def predict(u: UUID, x: Context) uuid model Look up user model Read
  56. 56. PREDICTION EXECUTION def predict(u: UUID, x: Context) uuid model Look up user model Primary key lookup Read
  57. 57. PREDICTION EXECUTION def predict(u: UUID, x: Context) uuid model Look up user model Primary key lookup Partition queries by user: always local Read
  58. 58. Compute Features PREDICTION EXECUTION def predict(u: UUID, x: Context) user independent } f( )
  59. 59. Compute Features PREDICTION EXECUTION def predict(u: UUID, x: Context) Feature computation could be costly user independent } f( )
  60. 60. Compute Features PREDICTION EXECUTION def predict(u: UUID, x: Context) Feature computation could be costly user independent } Cache features for reuse across users f( )
  61. 61. Compute Features PREDICTION EXECUTION def predict(u: UUID, x: Context) user independent } f( )
  62. 62. Compute Features PREDICTION EXECUTION def predict(u: UUID, x: Context) Feature computation could be costly user independent } f( )
  63. 63. Compute Features PREDICTION EXECUTION def predict(u: UUID, x: Context) Feature computation could be costly user independent } Predict with missing features f( )
  64. 64. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  65. 65. SYSTEM ARCHITECTURE frontend.js Returns score uuid: 01-10 uuid: 11-20 uuid: 20-30 uuid: 4
  66. 66. Mesos Mesos HDFS, S3, … Tachyon Hadoop YARN Spark Streaming Shark SQL GraphX ML library BlinkDB MLbase Spark Velox Training Management + Serving Model Manager Prediction Service MODEL MANAGER
  67. 67. PERSONALIZED SPLIT MODEL Input (Song)
  68. 68. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model
  69. 69. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data
  70. 70. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data Changes Slowly
  71. 71. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data Changes Slowly Train in Batch!
  72. 72. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data Changes Slowly Train in Batch! Personalized User Model
  73. 73. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data Changes Slowly Train in Batch! Small Data Personalized User Model
  74. 74. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data Changes Slowly Train in Batch! Small Data Changes Quickly Personalized User Model
  75. 75. PERSONALIZED SPLIT MODEL Input (Song) Shared Basis Feature Model Big Data Changes Slowly Train in Batch! Small Data Changes Quickly Train Online! Personalized User Model
  76. 76. Input (Song) Personalized User Model Shared Basis Feature Model PERSONALIZED SPLIT MODEL
  77. 77. Input (Song) Personalized User Model Input (Song) Shared Basis Feature Model PERSONALIZED SPLIT MODEL
  78. 78. Input (Song) Personalized User Model Input (Song) Shared Basis Feature Model PERSONALIZED SPLIT MODEL
  79. 79. Input (Song) Personalized User Model Meow Input (Song) Shared Basis Feature Model PERSONALIZED SPLIT MODEL
  80. 80. Personalized User Model Meow Input (Song) Input (Song) Shared Basis Feature Model PERSONALIZED SPLIT MODEL
  81. 81. Personalized User Model Meow Terrible Input (Song) Input (Song) Shared Basis Feature Model PERSONALIZED SPLIT MODEL
  82. 82. MULTIPLE FEATURE FUNCTIONS Input (Song) Rating
  83. 83. MULTIPLE FEATURE FUNCTIONS Input (Song) Rating Split
  84. 84. MULTIPLE FEATURE FUNCTIONS Input (Song) Rating Split f1 f2 f3 f4 f5
  85. 85. MATHEMATICAL FORMULATION Input (Song)
  86. 86. MATHEMATICAL FORMULATION Input (Song) x
  87. 87. Shared Basis Feature Models Changes slowly MATHEMATICAL FORMULATION Input (Song) x
  88. 88. Shared Basis Feature Models Changes slowly MATHEMATICAL FORMULATION Input (Song) $f(x; \theta)$ x
  89. 89. Shared Basis Feature Models Personalized User Model Changes slowly Highly dynamic MATHEMATICAL FORMULATION Input (Song) $f(x; \theta)$ x
  90. 90. Shared Basis Feature Models Personalized User Model Changes slowly Highly dynamic MATHEMATICAL FORMULATION Input (Song) $f(x; \theta) \cdot w_u$ x
  91. 91. Shared Basis Feature Models Personalized User Model Changes slowly Highly dynamic MATHEMATICAL FORMULATION Input (Song) $f(x; \theta) \cdot w_u$ = Rating x Meow
  92. 92. FEEDBACK API Simple direct value feedback: POST /velox/catify/observe?userid=22&song=27&score=3.7
  93. 93. FEEDBACK API Simple direct value feedback: POST /velox/catify/observe?userid=22&song=27&score=3.7 Online Learning: Continuously update user models in Velox
  94. 94. FEEDBACK API Simple direct value feedback: POST /velox/catify/observe?userid=22&song=27&score=3.7 Online Learning: Continuously update user models in Velox Offline Learning: Saved for feature learning in Spark
  95. 95. FEEDBACK API Simple direct value feedback: POST /velox/catify/observe?userid=22&song=27&score=3.7 Online Learning: Continuously update user models in Velox Offline Learning: Saved for feature learning in Spark Evaluation: Continuously assess model performance
  96. 96. SYSTEM ARCHITECTURE frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30 Feedback via REST Model updated in realtime uuid: 4
  97. 97. ONLINE LEARNING velox.jar user model def observe(u: UUID, x: Context, y: Score)
  98. 98. ONLINE LEARNING velox.jar user model def observe(u: UUID, x: Context, y: Score) Update user model with new training data Write
  99. 99. ONLINE LEARNING velox.jar user model def observe(u: UUID, x: Context, y: Score) Stochastic gradient descent Update user model with new training data Write
  100. 100. ONLINE LEARNING velox.jar user model def observe(u: UUID, x: Context, y: Score) Stochastic gradient descent Incremental linear algebra Update user model with new training data Write
  101. 101. SYSTEM ARCHITECTURE master workerworker worker Batch train RPC frontend.js uuid: 01-10 uuid: 11-20 uuid: 20-30
  102. 102. OFFLINE OR NEARLINE LEARNING def retrain(trainingData: RDD) Spark Based Training Algs. $w_u \cdot f(x; \theta)$ Automated retraining policies Efficient batch training using Spark Incremental learning using Spark Streaming
  103. 103. BEYOND RECOMMENDER SYSTEMS
  104. 104. 1. Spam and anomaly detection BEYOND RECOMMENDER SYSTEMS
  105. 105. 1. Spam and anomaly detection 2. Device/location specific modeling BEYOND RECOMMENDER SYSTEMS
  106. 106. 1. Spam and anomaly detection 2. Device/location specific modeling 3. Splitting models across networks BEYOND RECOMMENDER SYSTEMS
  107. 107. Spark Streaming Spark SQL GraphX ML library BlinkDB KeystoneML Velox Model Creation Management + Serving THE MISSING PIECE IN BDAS Spark HDFS, S3, … Tachyon Mesos
  108. 108. SUMMARY
  109. 109. Today: model training and serving rely on ad-hoc, manual processes spread across multiple systems SUMMARY
  110. 110. Today: model training and serving rely on ad-hoc, manual processes spread across multiple systems The Velox system automatically maintains multiple models while providing low-latency, fresh, and personalized predictions SUMMARY
  111. 111. Today: model training and serving rely on ad-hoc, manual processes spread across multiple systems The Velox system automatically maintains multiple models while providing low-latency, fresh, and personalized predictions SUMMARY
  112. 112. Today: model training and serving rely on ad-hoc, manual processes spread across multiple systems The Velox system automatically maintains multiple models while providing low-latency, fresh, and personalized predictions https://amplab.cs.berkeley.edu/projects/velox/ crankshaw@cs.berkeley.edu SUMMARY
  113. 113. QUESTIONS?
  114. 114. FEATURE COMPUTE LATENCY: SIMPLE MODELS
  115. 115. FEATURE COMPUTE LATENCY: COMPLEX MODELS
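
Slides 54 to 57 and 85 to 91 describe a prediction as a primary-key lookup of the user's model followed by a dot product with user-independent features: rating = f(x; θ) · w_u. The following is a minimal Scala sketch of that idea, not the Velox implementation. It assumes a `Context` is just a song id, uses a hypothetical `features` function as a stand-in for the shared basis, and keeps user weights in an in-memory map; all of these names are illustrative.

```scala
import java.util.UUID

object SplitModelSketch {
  type Context = Long   // assumption: the context is reduced to a song id
  type Score = Double

  // Hypothetical stand-in for the shared basis f(x; theta):
  // maps a song id to a small, deterministic feature vector.
  def features(x: Context): Vector[Double] =
    Vector(1.0, (x % 7).toDouble / 7.0, (x % 3).toDouble / 3.0)

  // Per-user weights w_u, keyed by user id (the "uuid -> model" table).
  val userModels: Map[UUID, Vector[Double]] = Map(
    new UUID(0L, 22L) -> Vector(0.5, 2.0, 1.0)
  )

  def dot(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (p, q) => p * q }.sum

  // predict(u, x) = w_u . f(x; theta); None when the user has no model yet.
  def predict(u: UUID, x: Context): Option[Score] =
    userModels.get(u).map(w => dot(w, features(x)))
}
```

Returning `Option` makes the "no model for this user" case explicit; a real service would more likely fall back to a default model.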
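
Slides 58 to 60 note that feature computation is user independent and may be costly, so features can be cached and reused across users. A minimal sketch of that caching idea in Scala, assuming a single-threaded setting and a hypothetical `computeFeatures` function; the call counter exists only to make the reuse visible.

```scala
import scala.collection.mutable

object FeatureCacheSketch {
  type Context = Long   // assumption: the context is reduced to a song id

  // Counts how often the (possibly costly) feature function actually runs.
  var computeCalls: Int = 0

  // Hypothetical stand-in for an expensive, user-independent f(x; theta).
  def computeFeatures(x: Context): Vector[Double] = {
    computeCalls += 1
    Vector(1.0, (x % 7).toDouble / 7.0)
  }

  private val cache = mutable.HashMap.empty[Context, Vector[Double]]

  // Memoize by item: every user asking about the same song
  // reuses the vector computed for the first request.
  def cachedFeatures(x: Context): Vector[Double] =
    cache.getOrElseUpdate(x, computeFeatures(x))
}
```

This sketch is not thread-safe and never evicts entries; a serving system would need a concurrent map and an eviction policy.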
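
Slides 97 to 100 say that `observe(u, x, y)` updates the user model with new training data, naming stochastic gradient descent as one option. The sketch below shows one SGD step on the squared error 0.5 · (w_u · f(x) − y)², whose gradient with respect to w_u is (w_u · f(x) − y) · f(x). The choice of squared loss, the learning rate, the zero initialization, and the `features` function are all assumptions for illustration.

```scala
import java.util.UUID
import scala.collection.mutable

object OnlineLearningSketch {
  // Mutable table of per-user weights w_u (illustrative, single-threaded).
  val userModels = mutable.HashMap.empty[UUID, Vector[Double]]
  val learningRate = 0.1   // assumed value, not taken from the slides

  // Hypothetical stand-in for the shared basis f(x; theta).
  def features(x: Long): Vector[Double] = Vector(1.0, (x % 7).toDouble / 7.0)

  def dot(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (p, q) => p * q }.sum

  def predict(u: UUID, x: Long): Double = {
    val f = features(x)
    dot(userModels.getOrElse(u, Vector.fill(f.length)(0.0)), f)
  }

  // One SGD step: w <- w - learningRate * (w . f - y) * f
  def observe(u: UUID, x: Long, y: Double): Unit = {
    val f = features(x)
    val w = userModels.getOrElse(u, Vector.fill(f.length)(0.0))
    val error = dot(w, f) - y
    userModels(u) = w.zip(f).map { case (wi, fi) => wi - learningRate * error * fi }
  }
}
```

Only the small per-user vector changes on each observation, which is what lets the update run at serving time while the shared features stay fixed until the next batch retrain.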
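
The architecture slides (36 to 57) show user models split across nodes by uuid range, so that a query for a given user is always served locally. A minimal routing sketch, assuming integer user ids and three hypothetical node names. The slides label the ranges 01-10, 11-20, and 20-30; the sketch treats the last one as 21-30 so the ranges do not overlap, which is an assumption.

```scala
object PartitionRouterSketch {
  // A partition owns an inclusive range of user ids.
  final case class Partition(name: String, lo: Int, hi: Int) {
    def owns(userId: Int): Boolean = userId >= lo && userId <= hi
  }

  // Node names are hypothetical; the ranges follow the slides' diagram.
  val partitions: Vector[Partition] = Vector(
    Partition("node-a", 1, 10),
    Partition("node-b", 11, 20),
    Partition("node-c", 21, 30)
  )

  // Route a request to the node that holds this user's model, if any.
  def route(userId: Int): Option[Partition] =
    partitions.find(_.owns(userId))
}
```

For example, `PartitionRouterSketch.route(4)` selects the first partition, matching the "uuid: 4" request drawn on the slides.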
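
Slide 50 introduces an ordering query, `predict_top_k`, alongside the point query. One simple way to answer it, sketched here under assumptions, is to score every candidate song with the user's weights and keep the k highest. The candidate map, the function name `predictTopK`, and the full sort are illustrative choices; the slides do not say how Velox evaluates this query.

```scala
object TopKSketch {
  // Score each candidate as w . f, then keep the k best by descending score.
  def predictTopK(
      w: Vector[Double],
      candidates: Map[Long, Vector[Double]],
      k: Int
  ): Vector[(Long, Double)] =
    candidates.toVector
      .map { case (song, f) =>
        song -> w.zip(f).map { case (a, b) => a * b }.sum
      }
      .sortWith { case ((_, s1), (_, s2)) => s1 > s2 }
      .take(k)
}
```

Sorting all candidates costs O(n log n); for a large catalog a bounded heap or a precomputed index would be the more likely design.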
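
Slides 95 and 102 mention continuous evaluation of model performance and automated retraining policies without giving details. One possible policy, sketched here purely as an assumption, keeps a rolling window of recent squared errors and signals a batch retrain once the window is full and its mean exceeds a threshold. The names `ErrorWindow`, `shouldRetrain`, and `emptyWindow` are hypothetical.

```scala
import scala.collection.immutable.Queue

object RetrainPolicySketch {
  // Immutable rolling window over the most recent squared errors.
  final case class ErrorWindow(errors: Queue[Double], capacity: Int) {
    // Add one (prediction, observation) pair, dropping the oldest if full.
    def record(predicted: Double, observed: Double): ErrorWindow = {
      val sq = (predicted - observed) * (predicted - observed)
      val grown = errors.enqueue(sq)
      copy(errors = if (grown.size > capacity) grown.tail else grown)
    }

    def meanSquaredError: Double =
      if (errors.isEmpty) 0.0 else errors.sum / errors.size

    // Only decide once the window is full, to avoid reacting to a few samples.
    def shouldRetrain(threshold: Double): Boolean =
      errors.size == capacity && meanSquaredError > threshold
  }

  def emptyWindow(capacity: Int): ErrorWindow =
    ErrorWindow(Queue.empty[Double], capacity)
}
```

When `shouldRetrain` returns true, the caller would trigger something like the `retrain(trainingData: RDD)` call shown on slide 102.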
