[2C2]PredictionIO

DEVIEW 2014 [2C2]PredictionIO

Published in: Technology

  1. An Open Source Machine Learning Server for Developers • @PredictionIO #PredictionIO • Simon Chan, simon@prediction.io
  2. Thank you for having me here today! • Simon Chan - CEO of PredictionIO • A small team of data scientists and engineers • Mainly based in Silicon Valley, also London and Hong Kong
  3. Top GitHub Open Source • Over 5000 developers engaged • Powering over 200 applications
  4. Talk Focus: • Machine Learning - A (Very) Brief Review • Challenges We Face When Building PredictionIO
  5. Machine Learning is Simple?
  6. I am going to give an example that will make you… HUNGRY!
  7. FOOD Club – Menu
  8. Coding time…
     # Using PredictionIO
     import predictionio
     # Collect data
     cli = predictionio.EventClient("<my_app_id>")
     cli.record_user_action_on_item("buy", "John", "BulgogiA")
     # Predict top preferences
     eng = predictionio.EngineClient("<my_engine_url>")
     rec = eng.send_query({"uid": "John", "n": 5})
  9. The Magic Behind: Engine • (1) Data Sourcing and Preparation • (2) Algorithm • (3) Serving • (4) Evaluation
  10. Challenges and Solutions
  11. Architectural Challenge 1: Workflow Co-ordination on a Distributed Cluster
  12. Needs: • Support multiple distributed engines • Support multiple algorithms executing in parallel. How do you coordinate the workflow when you have more pending tasks than processing units?
  13. Attempt #1: Use a database system to store tasks, and have a pool of workers pull tasks from it. • Inefficient: the database becomes a bottleneck and a potential single point of failure.
  14. Attempt #2: Use an Akka cluster. Akka is a toolkit and runtime for building highly concurrent, distributed, and fault-tolerant event-driven applications on the JVM. • Fundamentally the same problem as above. • Need to build a management suite on top.
  15. Solution: Apache Spark with directed acyclic graph (DAG) scheduling. Adapts to many different infrastructures: Apache Spark standalone cluster, Apache Hadoop 2 YARN, Apache Mesos. Source: http://upload.wikimedia.org/wikipedia/commons/3/39/Directed_acyclic_graph_3.svg
  16. Solution Source Code: http://github.com/predictionio
  17. Architectural Challenge 2: Distributed In-memory Model Retrieval
  18. Needs: • Engines produce models that are distributed across a cluster. This requires a way to serve these distributed in-memory models to queries in real time.
  19. Solution: All PredictionIO engine instances are launched inside a “SparkContext”. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster. Source: http://bighadoop.files.wordpress.com/2014/04/spark-architecture.png
  20. • When an engine is local to a single machine, it loads the model into its own memory. • When an engine is distributed, SparkContext automatically loads the model onto the cluster.
  21. Conceptual Code for the Solution
      val sc = new SparkContext(conf)
      ...
      val model =
        if (model_is_distributed) {
          if (model_is_persisted) {
            sc.objectFile(model_on_HDFS)  // reload the persisted model from HDFS
          } else {
            engine.algo.train()           // no persisted copy: train the model
          }
        } else {
          ...
        }
  22. PredictionIO 0.8
  23. Built-in Engines: • Item Recommendation • Item Rank • Item Similarity
  24. Create an Engine Instance Project…
      $ pio instance io.prediction.engines.itemrec
      $ cd io.prediction.engines.itemrec
      $ pio register
  25. Collect Event Data…
      cli = predictionio.EventClient("<app_id>")
      cli.record_user_action_on_item("like", "John", "bulgogi_12")
      cli.record_user_action_on_item("view", "John", "bimbimbap_13")
  26. Configure the Engine Instance: settings in params/datasource.json
      {
        "appId": <app_id>,
        "actions": [ "view", "like", ... ],
        ...
      }
  27. Train the Data Model
      $ pio train
      Deploy the Engine Instance
      $ pio deploy
  28. Retrieve Prediction Results
      from predictionio import EngineClient
      client = EngineClient(url="http://localhost:8000")
      prediction = client.send_query({"uid": "John", "n": 3})
      print prediction
      Output
      {u'items': [{u'272': 9.929327011108398}, {u'313': 9.92607593536377}, {u'347': 9.92170524597168}]}
  29. You can also… • Change the algorithm • Tune algorithm parameters • Compare and evaluate algorithms • Add custom business logic
  30. SDKs for: • Python • Ruby • PHP • Java / Android • Scala • Node.js • iOS • Meteor • more…
  31. Also, build your own Engine!
  32. Applications of Machine Learning: • Speech Recognition • Personal Newsfeed • Spam Filtering • Recommendation • Driverless Car • Churn Prediction • Ad Targeting • Fraud Detection
  33. Thank you! • Korean Documentation (Beta)! http://docs.prediction.io/kr • @PredictionIO • prediction.io • Newsletters • github.com/predictionio
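
Slide 15 names DAG scheduling as the answer to the question on slide 12: how to coordinate a workflow with more pending tasks than processing units. The sketch below is only an illustration of that idea in plain Python. It is not Spark's scheduler and not PredictionIO code. The `run_dag` helper and the task names are hypothetical. A task is released to a fixed-size worker pool only once all of its dependencies have finished, and the pool queues whatever exceeds the number of workers.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_dag(deps, work, max_workers=2):
    """Run every task in `deps` on at most `max_workers` threads.

    deps maps a task name to the set of task names it depends on.
    work is called once with each task name.
    Returns the task names in the order they completed.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    finished = []
    running = {}  # future -> task name

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or running:
            # Submit every task whose dependencies are all satisfied.
            ready = [t for t, d in remaining.items() if not d]
            for task in ready:
                del remaining[task]
                running[pool.submit(work, task)] = task

            if not running:
                raise ValueError("dependency cycle or unknown dependency")

            # Block until at least one running task finishes.
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                task = running.pop(future)
                future.result()  # re-raise any exception from the task
                finished.append(task)
                for d in remaining.values():
                    d.discard(task)
    return finished


# Hypothetical engine workflow: two algorithms train in parallel after data prep.
workflow = {
    "prepare_data": set(),
    "train_algo_a": {"prepare_data"},
    "train_algo_b": {"prepare_data"},
    "serve": {"train_algo_a", "train_algo_b"},
}
order = run_dag(workflow, work=lambda name: None)
```

Unlike Attempt #1 on slide 13, the workers here do not poll a shared task table. The dependency graph alone decides what may run next. The relative order of the two training tasks is not fixed.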

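The response printed on slide 28 is a list of single-entry mappings from item ID to score. As a minimal sketch, assuming every response has that shape, a small helper can flatten it into ranked pairs. `ranked_items` is a hypothetical name and is not part of the PredictionIO SDK. The sample data is copied from the slide.

```python
def ranked_items(prediction):
    """Flatten {'items': [{item_id: score}, ...]} into (item_id, score) pairs, best first."""
    pairs = [
        (item_id, score)
        for entry in prediction.get("items", [])
        for item_id, score in entry.items()
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Response copied from slide 28 (Python 2 `u` prefixes dropped).
sample = {
    "items": [
        {"272": 9.929327011108398},
        {"313": 9.92607593536377},
        {"347": 9.92170524597168},
    ]
}
top = ranked_items(sample)
```

Sorting by score makes the ranking explicit and does not rely on the order in which the server returned the items.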