
[2C2]PredictionIO

DEVIEW 2014 [2C2]PredictionIO

Published in: Technology

  1. 1. An Open Source Machine Learning Server for Developers @PredictionIO #PredictionIO Simon Chan simon@prediction.io
  2. 2. Thank you for having me here today! • Simon Chan - CEO of PredictionIO • A small team of Data Scientists and Engineers • Mainly based in Silicon Valley, also London and Hong Kong
  3. 3. Top GitHub Open Source • Over 5000 developers engaged • Powering over 200 applications
  4. 4. Talk Focus: • Machine Learning - A (Very) Brief Review • Challenges We Face When Building PredictionIO
  5. 5. Machine Learning is Simple?
  6. 6. I am going to give an example that will make you… HUNGRY!
  7. 7. FOOD Club – Menu
  8. 8. Coding time…. # Using PredictionIO # Collect Data cli = predictionio.EventClient("<my_app_id>") cli.record_user_action_on_item("buy", "John", "BulgogiA") # Predict top preferences eng = predictionio.EngineClient("<my_engine_url>") rec = eng.send_query({"uid" : "John", "n" : 5})
  9. 9. The Magic Behind: Engine 1. Data Sourcing and Preparation 2. Algorithm 3. Serving 4. Evaluation
  10. 10. Challenges and Solutions
  11. 11. Architectural Challenge 1 Workflow Co-ordination on a Distributed Cluster
  12. 12. Needs: •Support multiple distributed engines •Support multiple algorithms to execute in parallel How to coordinate the workflow when you have more pending tasks than processing units?
  13. 13. Attempt #1 Use a database system to store tasks, and have a pool of workers pull tasks from it. •Inefficient. Database becomes bottleneck and potentially single point of failure.
  14. 14. Attempt #2 Use an Akka cluster. Akka is a toolkit and runtime for building highly concurrent, distributed, and fault-tolerant event-driven applications on the JVM. •Fundamentally the same problem as in Attempt #1. •Need to build a management suite on top.
  15. 15. Solution Apache Spark: directed acyclic graph (DAG) scheduling. Adapts to many different infrastructures: Apache Spark standalone cluster, Apache Hadoop 2 YARN, Apache Mesos. Source: http://upload.wikimedia.org/wikipedia/commons/3/39/Directed_acyclic_graph_3.svg
  16. 16. Solution Source Code: http://github.com/predictionio
  17. 17. Architectural Challenge 2 Distributed In-memory Model Retrieval
  18. 18. Needs: •Engines produce models that are distributed across a cluster. Requires a way to serve these distributed in-memory models to queries in real-time.
  19. 19. Solution All PredictionIO engine instances are launched inside a “SparkContext”. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster. Source: http://bighadoop.files.wordpress.com/2014/04/spark-architecture.png
  20. 20. •When an engine is local to a single machine, it loads the model to its memory. •When an engine is distributed, SparkContext will automatically load the model on a cluster.
  21. 21. Conceptual Code for the Solution val sc = new SparkContext(conf) ... val model = if (model_is_distributed) { if (model_is_persisted) { sc.objectFile(model_on_HDFS) } else { engine.algo.train() } } else { ... }
  22. 22. PredictionIO 0.8
  23. 23. Built-in Engines: •Item Recommendation •Item Rank •Item Similarity
  24. 24. Create an Engine Instance Project…. $ pio instance io.prediction.engines.itemrec $ cd io.prediction.engines.itemrec $ pio register
  25. 25. Collect Event Data…. cli = predictionio.EventClient("<app_id>") cli.record_user_action_on_item("like", "John", "bulgogi_12") cli.record_user_action_on_item("view", "John", "bimbimbap_13")
  26. 26. Configure the Engine Instance settings in params/datasource.json { "appId": <app_id>, "actions": [ "view", "like", ... ], ... }
  27. 27. Train the Data Model $ pio train Deploy the Engine Instance $ pio deploy
  28. 28. Retrieve Prediction Results from predictionio import EngineClient client = EngineClient(url="http://localhost:8000") prediction = client.send_query({"uid": "John", "n": 3}) print prediction Output {u'items': [{u'272': 9.929327011108398}, {u'313': 9.92607593536377}, {u'347': 9.92170524597168}]}
  29. 29. You can also…. • Change algorithms • Tune algorithm parameters • Compare and evaluate algorithms • Add custom business logic
  30. 30. SDKs for: • Python • Ruby • PHP • Java / Android • Scala • Node.js • iOS • Meteor • more….
  31. 31. Also, build your own Engine!
  32. 32. Applications of Machine Learning • Speech Recognition • Personal Newsfeed • SPAM Filtering • Recommendation • Driverless Car • Churn Prediction • Ad Targeting • Fraud Detection
  33. 33. Thank you! Korean Documentation (Beta)! http://docs.prediction.io/kr - @PredictionIO - prediction.io - Newsletters - github.com/predictionio
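
The workflow question raised in slides 11 to 15 (how to coordinate work when there are more pending tasks than processing units) can be illustrated with a small dependency-ordered scheduler. This is only a minimal sketch of the DAG scheduling idea in plain Python, not Spark's scheduler and not PredictionIO's code. The function `run_dag` and the task names are hypothetical, chosen to mirror the engine stages from slide 9.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_dag(tasks, deps, max_workers=2):
    """Run callables in dependency order on a bounded worker pool.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> iterable of task names it depends on
    Returns the task names in the order they finished.
    (Hypothetical helper for illustration; not a PredictionIO or Spark API.)
    """
    remaining = {name: set(deps.get(name, ())) for name in tasks}
    unknown = {d for ds in remaining.values() for d in ds} - set(tasks)
    if unknown:
        raise ValueError(f"unknown dependencies: {sorted(unknown)}")

    finished = []
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Submit every task whose dependencies have all finished.
            # The pool queues them if all workers are busy.
            ready = sorted(n for n, ds in remaining.items() if not ds)
            for name in ready:
                del remaining[name]
                running[pool.submit(tasks[name])] = name
            if not running:
                raise ValueError("cycle detected: no task is runnable")
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                name = running.pop(future)
                future.result()  # re-raise any exception from the task
                finished.append(name)
                for ds in remaining.values():
                    ds.discard(name)
    return finished


if __name__ == "__main__":
    # Hypothetical stages: one data preparation step feeds two algorithms,
    # and serving waits for both.
    stages = {
        "prepare_data": lambda: None,
        "train_algo_a": lambda: None,
        "train_algo_b": lambda: None,
        "serve": lambda: None,
    }
    edges = {
        "train_algo_a": ["prepare_data"],
        "train_algo_b": ["prepare_data"],
        "serve": ["train_algo_a", "train_algo_b"],
    }
    print(run_dag(stages, edges, max_workers=1))
```

Even with `max_workers=1`, the two training tasks are both submitted as soon as data preparation finishes and simply wait for a free worker, so no central task table has to be polled. That is the property the deck contrasts with the database-backed attempt in slide 13.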
