PredictionIO – A Machine Learning Server in Scala – SF Scala

A Machine Learning Server in Scala.

Building and deploying ML applications in production in a fraction of the time.

Slides from SF Scala meetup January 2015 at StumbleUpon.

Published in: Engineering

  1. A Machine Learning Server in Scala: building and deploying ML applications in production in a fraction of the time.
  2. Available Tools
     • Processing frameworks, e.g. Apache Spark, Apache Hadoop
     • Algorithm libraries, e.g. MLlib, Mahout
     • Data storage, e.g. HBase, Cassandra
  3. What is Missing? Something that integrates everything together nicely and moves you from prototyping to production.
  4. A Classic Recommender Example
     • You have a mobile app and you need a recommendation engine: predict products that a customer will like, and show them.
     • Algorithm: you don't need to write your own. Spark MLlib provides the ALS algorithm.
     • Predictive model: based on users' previous behaviors.
  5. A Classic Recommender Example: prototyping
       def pseudocode() {
         // Read training data
         val trainingData = sc.textFile("trainingData.txt").map(_.split(',') match { … })
         // Build a predictive model with an algorithm
         val model = ALS.train(trainingData, 10, 20, 0.01)
         // Make predictions
         allUsers.foreach { user =>
           model.recommendProducts(user, 5)
         }
       }
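The parsing step in slide 5 is elided on the slide. As a minimal sketch only, assuming each line of the training file has the form `user,product,rating` (the slide does not state the format), the match could look like the following. `RatingRow` and `parseRating` are hypothetical names, and plain Scala collections stand in for the Spark RDD so that the example runs without Spark.

```scala
// Hypothetical record type; the slide's code elides the actual parsing.
final case class RatingRow(user: String, product: String, rating: Double)

// Parse one "user,product,rating" line. Malformed lines yield None
// instead of throwing, so they can be dropped with flatMap.
def parseRating(line: String): Option[RatingRow] =
  line.split(',') match {
    case Array(user, product, rating) =>
      scala.util.Try(rating.trim.toDouble).toOption
        .map(r => RatingRow(user.trim, product.trim, r))
    case _ => None
  }

// Assumed sample input: one valid line, one bad number, one short line.
val lines = Seq("u1,p10,4.0", "u2,p11,not-a-number", "u3,p12")
val ratings = lines.flatMap(parseRating)
```

Returning `Option` keeps bad rows from aborting a whole training run; whether silently dropping them is acceptable depends on the data.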
  6. Beyond Prototyping
     • How do you deploy a scalable service that responds to dynamic prediction queries?
     • How do you persist the predictive model in a distributed environment?
     • How do you make HBase, Spark and the algorithms talk to each other?
     • How should you prepare, or transform, the data for model training?
     • How do you update the model with new data without downtime?
     • Where should you add business logic?
     • How do you make the code configurable, reusable and maintainable?
     • How do you build all of this with separation of concerns (SoC)?
  7. A Classic Recommender Example: in production (diagram). The Mobile App sends data (user actions) to the Event Server (data storage). It queries the Engine via REST with a user ID and receives the predicted result: a list of product IDs.
  8. PredictionIO
     • PredictionIO is a machine learning server for building and deploying predictive engines in production in a fraction of the time.
     • Built on Apache Spark, MLlib and HBase.
  9. Event Server (the architecture diagram from slide 7, with the Event Server highlighted)
  10. Event Server: Collecting Data
      • Start the server: $ pio eventserver
      • Event-based:
          client.create_event(
            event="rate",
            entity_type="user",
            entity_id="user_123",
            target_entity_type="item",
            target_entity_id="item_100",
            properties={ "rating" : 5.0 }
          )
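To make the event shape in slide 10 concrete, here is a small sketch of an event record and an append-only store in plain Scala. The field names mirror the `create_event` call on the slide; `UserEvent` and `ToyEventStore` are hypothetical names for illustration and are not PredictionIO classes, and the in-memory buffer merely stands in for the Event Server's storage.

```scala
// Hypothetical in-memory model of an event; field names mirror the slide.
final case class UserEvent(
  event: String,
  entityType: String,
  entityId: String,
  targetEntityType: Option[String] = None,
  targetEntityId: Option[String] = None,
  properties: Map[String, Double] = Map.empty
)

// Toy append-only store standing in for the Event Server's data storage.
final class ToyEventStore {
  private val buffer = scala.collection.mutable.ArrayBuffer.empty[UserEvent]
  def create(e: UserEvent): Unit = { buffer += e; () }
  def find(eventName: String): Seq[UserEvent] =
    buffer.filter(_.event == eventName).toList
}

val store = new ToyEventStore
store.create(UserEvent("rate", "user", "user_123",
  Some("item"), Some("item_100"), Map("rating" -> 5.0)))
store.create(UserEvent("view", "user", "user_123",
  Some("item"), Some("item_200")))

// Only "rate" events carry the rating a recommender would train on.
val rates = store.find("rate")
```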
  11. Engine (the architecture diagram from slide 7, with the Engine highlighted)
  12. Engine: Building an Engine with Separation of Concerns (SoC)
      • DASE: the "MVC" for Machine Learning
      • Data: Data Source and Data Preparator
      • Algorithm(s)
      • Serving
      • Evaluator
  13. Engine: Functions of an Engine
      A. Train deployable predictive model(s)
      B. Respond to dynamic queries
      C. Evaluation
  14. Engine: A. Train predictive model(s)
        class DataSource(…) extends PDataSource
          def readTraining(sc: SparkContext) ==> trainingData
        class Preparator(…) extends PPreparator
          def prepare(sc: SparkContext, trainingData: TrainingData) ==> preparedData
        class Algorithm1(…) extends PAlgorithm
          def train(preparedData: PreparedData) ==> Model
      Run with: $ pio train
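The chain in slide 14 (read training data, prepare it, train a model) can be sketched without Spark. The traits below are illustrative stand-ins and are not PredictionIO's `PDataSource`, `PPreparator` or `PAlgorithm`; the sample ratings, the range filter and the per-item mean "model" are assumptions chosen to keep the example short, with the mean acting as a placeholder for ALS.

```scala
// Plain-Scala stand-ins for the DASE training stages (not the real API).
final case class ToyTrainingData(ratings: Seq[(String, String, Double)])
final case class ToyPreparedData(ratings: Seq[(String, String, Double)])

trait ToyDataSource { def readTraining(): ToyTrainingData }
trait ToyPreparator { def prepare(td: ToyTrainingData): ToyPreparedData }
trait ToyAlgorithm[M] { def train(pd: ToyPreparedData): M }

// Toy model: mean rating per item, a placeholder for ALS.
final case class MeanModel(itemMeans: Map[String, Double])

object InMemorySource extends ToyDataSource {
  // Assumed sample data as (user, item, rating) triples.
  def readTraining(): ToyTrainingData = ToyTrainingData(Seq(
    ("u1", "i1", 4.0), ("u2", "i1", 2.0), ("u1", "i2", 5.0), ("u3", "i2", -1.0)))
}

// Drop ratings outside the 0..5 range before training.
object RangeFilter extends ToyPreparator {
  def prepare(td: ToyTrainingData): ToyPreparedData =
    ToyPreparedData(td.ratings.filter { case (_, _, r) => r >= 0.0 && r <= 5.0 })
}

object MeanAlgorithm extends ToyAlgorithm[MeanModel] {
  def train(pd: ToyPreparedData): MeanModel =
    MeanModel(pd.ratings.groupBy(_._2).map { case (item, rs) =>
      item -> rs.map(_._3).sum / rs.size
    })
}

// Read, prepare, train: the same order as the slide.
val trained = MeanAlgorithm.train(RangeFilter.prepare(InMemorySource.readTraining()))
```

Because each stage only sees the previous stage's output type, a different preparator or algorithm can be swapped in without touching the others, which is the separation of concerns the slides describe.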
  15. Engine: A. Train predictive model(s)
        class DataSource(…) extends PDataSource
          override def readTraining(sc: SparkContext): TrainingData = {
            val eventsDb = Storage.getPEvents()
            val eventsRDD: RDD[Event] = eventsDb.find(…)(sc)
            val ratingsRDD: RDD[Rating] = eventsRDD.map { event =>
              val rating = try {
                val ratingValue: Double = event.event match { … }
                Rating(event.entityId, event.targetEntityId.get, ratingValue)
              } catch { … }
              rating
            }
            new TrainingData(ratingsRDD)
          }
  16. Engine: A. Train predictive model(s)
        class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm
          def train(preparedData: PreparedData): Model1 = {
            mllibRatings = data…
            val m = ALS.train(mllibRatings, ap.rank, ap.numIterations, ap.lambda)
            new Model1(
              rank = m.rank,
              userFeatures = m.userFeatures,
              productFeatures = m.productFeatures
            )
          }
  17. Engine: A. Train predictive model(s) (diagram). Event Server → Data Source → TrainingData → Data Preparator → PreparedData → Algorithm 1, Algorithm 2, Algorithm 3 → Model 1, Model 2, Model 3.
  18. Engine: B. Respond to dynamic query
      • Query (Input):
          $ curl -H "Content-Type: application/json" -d '{ "user": "1", "num": 4 }' http://localhost:8000/queries.json

          case class Query(
            val user: String,
            val num: Int
          ) extends Serializable
  19. Engine: B. Respond to dynamic query
      • Predicted Result (Output):
          {"itemScores":[{"item":"22","score":4.072304374729956},
                         {"item":"62","score":4.058482414005789},
                         {"item":"75","score":4.046063009943821}]}

          case class PredictedResult(
            val itemScores: Array[ItemScore]
          ) extends Serializable

          case class ItemScore(
            item: String,
            score: Double
          ) extends Serializable
  20. Engine: B. Respond to dynamic query (Query via REST)
        class Algorithm1(…) extends PAlgorithm
          def predict(model: ALSModel, query: Query) ==> predictedResult
        class Serving extends LServing
          def serve(query: Query, predictedResults: Seq[PredictedResult]) ==> predictedResult
  21. Engine: B. Respond to dynamic query
        class Algorithm1(val ap: ALSAlgorithmParams) extends PAlgorithm
          def predict(model: ALSModel, query: Query): PredictedResult = {
            model…{ userInt =>
              val itemScores = model.recommendProducts(…).map(…)
              new PredictedResult(itemScores)
            }.getOrElse { … }
          }
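The body of `predict` in slide 21 is mostly elided. The sketch below shows one plausible shape for it in plain Scala, under stated assumptions: the "model" is a hypothetical precomputed map of per-user item scores rather than an ALS model, and an unknown user yields an empty result (the slide's `getOrElse` branch is elided, so that choice is a guess). The case classes follow slides 18 and 19.

```scala
final case class Query(user: String, num: Int)
final case class ItemScore(item: String, score: Double)
final case class PredictedResult(itemScores: Array[ItemScore])

// Hypothetical stand-in for a trained model: scores per user and item.
type ToyModel = Map[String, Map[String, Double]]

def predict(model: ToyModel, query: Query): PredictedResult =
  model.get(query.user).map { scores =>
    // Rank this user's items by descending score and keep the top `num`.
    val top = scores.toSeq.sortWith(_._2 > _._2).take(query.num)
    PredictedResult(top.map { case (item, s) => ItemScore(item, s) }.toArray)
  }.getOrElse(PredictedResult(Array.empty[ItemScore])) // unknown user: assumed empty

// Assumed sample scores for user "1".
val toyModel: ToyModel = Map(
  "1" -> Map("22" -> 4.07, "62" -> 4.05, "75" -> 4.04, "9" -> 1.2))

val result = predict(toyModel, Query("1", 3))
```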
  22. Engine: B. Respond to dynamic query (diagram). The Mobile App sends a Query (input) to the Engine. Algorithm 1, Algorithm 2 and Algorithm 3 each use their own model (Model 1, Model 2, Model 3) to produce Predicted Results, and Serving returns a single Predicted Result (output).
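Slide 22 shows several algorithms feeding one Serving component, but the slides do not say how the results are combined. As one possible strategy, offered purely as an assumption, the sketch below averages each item's score across the algorithms that returned it and keeps the top `num`. The `serve` signature here is simplified relative to slide 20 and uses `Seq` instead of `Array` for easier comparison.

```scala
final case class ScoredItem(item: String, score: Double)
final case class CombinedResult(itemScores: Seq[ScoredItem])

// Hypothetical serving step: average per-item scores across algorithms,
// rank by descending score (ties broken by item ID), keep the top `num`.
def serve(num: Int, predictedResults: Seq[CombinedResult]): CombinedResult = {
  val averaged = predictedResults
    .flatMap(_.itemScores)
    .groupBy(_.item)
    .map { case (item, scores) =>
      ScoredItem(item, scores.map(_.score).sum / scores.size)
    }
    .toList
    .sortWith { (a, b) =>
      a.score > b.score || (a.score == b.score && a.item < b.item)
    }
  CombinedResult(averaged.take(num))
}

// Assumed outputs from two algorithms for the same query.
val fromAlgo1 = CombinedResult(Seq(ScoredItem("22", 4.0), ScoredItem("62", 3.0)))
val fromAlgo2 = CombinedResult(Seq(ScoredItem("22", 2.0), ScoredItem("75", 3.5)))
val combined = serve(2, Seq(fromAlgo1, fromAlgo2))
```

Serving is also a natural place for business logic such as filtering out unavailable products, since it runs after every algorithm has answered.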
  23. Engine: DASE Factory
        object RecEngine extends IEngineFactory {
          def apply() = {
            new Engine(
              classOf[DataSource],
              classOf[Preparator],
              Map("algo1" -> classOf[Algorithm1]),
              classOf[Serving])
          }
        }
  24. Running on Production
      • Install PredictionIO
          $ bash -c "$(curl -s http://install.prediction.io/install.sh)"
      • Start the Event Server
          $ pio eventserver
      • Deploy an Engine
          $ pio build; pio train; pio deploy
      • Update Engine Model with New Data
          $ pio train; pio deploy
  25. Deploy on Production (diagram). A Website, a Mobile App and an Email Campaign send data to the Event Server (data storage), query Engines 1 to 4 via REST, and receive predicted results.
  26. The Next Step
      • Quickstart with an Engine Template!
      • Follow on GitHub: github.com/predictionio/
      • Learn PredictionIO: prediction.io/
      • Learn Scala! Scala for the Impatient
      • Contribute!
  27. Thanks.
      Simon Chan
      simon@prediction.io
      @PredictionIO
      prediction.io (Newsletters)
      github.com/predictionio
