[2C2]PredictionIO

NAVER D2
An Open Source Machine Learning Server 
for Developers 
@PredictionIO #PredictionIO 
Simon Chan 
simon@prediction.io
Thank you for having me here today! 
• Simon Chan - CEO of PredictionIO 
• A small team of Data Scientists and Engineers 
• Mainly based in Silicon Valley, also London and Hong Kong
Top GitHub Open Source Project
• Over 5000 developers engaged 
• Powering over 200 applications
Talk Focus: 
• Machine Learning - A (Very) Brief Review 
• Challenges We Face When Building PredictionIO
Machine Learning is Simple?
I am going to give an 
example that will make 
you… HUNGRY!
FOOD Club – Menu
Coding time…
# Using PredictionIO
import predictionio
# Collect data
cli = predictionio.EventClient("<my_app_id>")
cli.record_user_action_on_item("buy", "John", "BulgogiA")
# Predict top preferences
eng = predictionio.EngineClient("<my_engine_url>")
rec = eng.send_query({"uid": "John", "n": 5})
The Magic Behind: Engine 
1. Data Sourcing and Preparation 
2. Algorithm 
3. Serving 
4. Evaluation
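The four engine components can be pictured as a small pipeline. What follows is a minimal Python sketch, not PredictionIO's actual API: the function names (`prepare`, `train`, `serve`, `evaluate`), the toy popularity "algorithm", and the sample events are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (action, user, item) tuples, mirroring the
# record_user_action_on_item calls shown on the slides.
EVENTS = [
    ("buy", "John", "BulgogiA"),
    ("buy", "Mary", "BulgogiA"),
    ("buy", "Mary", "Kimchi"),
    ("view", "John", "Kimchi"),
    ("buy", "Ann", "Japchae"),
]

def prepare(events, actions=("buy",)):
    """1. Data sourcing and preparation: keep only the chosen actions."""
    seen = defaultdict(set)
    for action, user, item in events:
        if action in actions:
            seen[user].add(item)
    return dict(seen)

def train(seen):
    """2. Algorithm: a toy popularity model (item -> number of users)."""
    return Counter(item for items in seen.values() for item in items)

def serve(model, seen, query):
    """3. Serving: top-n popular items the user has not acted on yet."""
    already = seen.get(query["uid"], set())
    ranked = [item for item, _ in model.most_common() if item not in already]
    return ranked[: query["n"]]

def evaluate(model, seen, held_out, n=1):
    """4. Evaluation: share of held-out (user, item) pairs that get recommended."""
    hits = sum(item in serve(model, seen, {"uid": user, "n": n})
               for user, item in held_out)
    return hits / len(held_out)

seen = prepare(EVENTS)
model = train(seen)
recs = serve(model, seen, {"uid": "John", "n": 5})
```

The query passed to `serve` has the same shape as the `send_query` call on the slide (`uid` and `n`); everything behind it is a stand-in for whatever a real engine does.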
Challenges and Solutions
Architectural Challenge 1 
Workflow Co-ordination on a Distributed Cluster
Needs: 
•Support multiple distributed engines 
•Support multiple algorithms to execute in parallel 
How to coordinate the workflow when you have 
more pending tasks than processing units?
Attempt #1 
Use a database system to store tasks, and 
have a pool of workers pull tasks from it. 
•Inefficient: the database becomes a bottleneck
and potentially a single point of failure.
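To make the bottleneck concrete, here is a minimal Python sketch of this first attempt. It is an illustration under stated assumptions, not PredictionIO's code: the `tasks` table schema, the worker names, and the use of an in-memory SQLite database are all hypothetical. Every claim and every completion goes through the one shared database, so the workers serialize on it.

```python
import sqlite3
import threading

# One shared database holds every pending task (hypothetical schema).
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, worker TEXT)")
db.executemany("INSERT INTO tasks (status) VALUES (?)", [("pending",)] * 20)
db.commit()

db_lock = threading.Lock()  # every database access is serialized through this lock

def claim_task(worker):
    """Atomically mark one pending task as running and return its id."""
    with db_lock:
        row = db.execute(
            "SELECT id FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        db.execute(
            "UPDATE tasks SET status = 'running', worker = ? WHERE id = ?",
            (worker, row[0]),
        )
        db.commit()
        return row[0]

def finish_task(task_id):
    """Mark a claimed task as done."""
    with db_lock:
        db.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
        db.commit()

def worker_loop(name, done):
    """Pull tasks until none are pending."""
    while (task_id := claim_task(name)) is not None:
        # ... real work would happen here ...
        finish_task(task_id)
        done.append(task_id)

done = []
workers = [threading.Thread(target=worker_loop, args=(f"w{i}", done)) for i in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()
```

Adding workers does not help once the lock is saturated, and if the database goes down, no task can be claimed at all.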
Attempt #2 
Use an Akka cluster. 
Akka is a toolkit and runtime for building highly 
concurrent, distributed, and fault tolerant event-driven 
applications on the JVM. 
•Fundamentally the same problem as above.
•Need to build a management suite on top.
Solution 
Apache Spark: directed acyclic graph 
(DAG) scheduling 
Adapts to many different infrastructures:
Apache Spark standalone cluster, Apache 
Hadoop 2 YARN, Apache Mesos. 
Source: http://upload.wikimedia.org/wikipedia/commons/3/39/Directed_acyclic_graph_3.svg
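The idea of DAG scheduling with more pending tasks than processing units can be sketched in a few lines of Python. This is a toy scheduler built on the standard library, not Spark's scheduler: the workflow, the task names, and `run_dag` are hypothetical, and the pool is capped at two workers to mimic a cluster with fewer processing units than tasks.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical workflow: each task maps to the set of tasks it depends on.
WORKFLOW = {
    "read_events": set(),
    "prepare": {"read_events"},
    "train_algo_a": {"prepare"},
    "train_algo_b": {"prepare"},
    "serve": {"train_algo_a", "train_algo_b"},
}

def run_dag(workflow, run_task, max_workers=2):
    """Run tasks in dependency order on a bounded pool; return completion order."""
    sorter = TopologicalSorter(workflow)
    sorter.prepare()  # raises CycleError if the graph is not acyclic
    finished = []
    running = {}  # future -> task name
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies have all completed.
            for task in sorter.get_ready():
                running[pool.submit(run_task, task)] = task
            # Block until at least one running task completes.
            completed, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in completed:
                future.result()  # re-raise any task failure
                task = running.pop(future)
                sorter.done(task)  # unlocks tasks that depended on this one
                finished.append(task)
    return finished

order = run_dag(WORKFLOW, run_task=lambda name: name)
```

The two training tasks become ready together once `prepare` finishes, so independent algorithms can run in parallel while `serve` waits for both.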
Solution Source Code: 
http://github.com/predictionio
Architectural Challenge 2 
Distributed In-memory Model Retrieval
Needs: 
•Engines produce models that are 
distributed across a cluster. 
Requires a way to serve these distributed 
in-memory models to queries in real-time.
Solution 
All PredictionIO engine instances are launched 
inside a “SparkContext”. 
A SparkContext represents the connection to a 
Spark cluster, and can be used to create RDDs, 
accumulators and broadcast variables on that 
cluster. 
Source: http://bighadoop.files.wordpress.com/2014/04/spark-architecture.png
•When an engine is local to a single
machine, it loads the model into its own memory.
•When an engine is distributed,
SparkContext will automatically load the
model onto the cluster.
Conceptual Code for the Solution
val sc = new SparkContext(conf)
...
val model =
  if (model_is_distributed) {
    if (model_is_persisted) {
      // Reload the persisted distributed model from HDFS
      sc.objectFile(model_on_HDFS)
    } else {
      // No persisted copy available: train again
      engine.algo.train()
    }
  } else {
    ...
  }
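For readers who want to run the load-or-train decision locally, here is a single-machine Python analogue. It is a sketch under assumptions: `train_model` stands in for `engine.algo.train()`, a pickle file stands in for the model on HDFS, and persisting the model right after training is an addition of this sketch that the conceptual code does not show.

```python
import os
import pickle
import tempfile

def train_model():
    """Stand-in for engine.algo.train(); returns a toy model (assumption)."""
    return {"BulgogiA": 2, "Kimchi": 1}

def load_or_train(path):
    """Reuse a persisted model when one exists; otherwise train and persist it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f), "loaded"
    model = train_model()
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model, "trained"

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.pkl")
    first = load_or_train(model_path)   # nothing persisted yet, so this trains
    second = load_or_train(model_path)  # the persisted copy is reused
```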
PredictionIO 0.8
Built-in Engines: 
•Item Recommendation 
•Item Rank 
•Item Similarity
Create an Engine Instance Project…. 
$ pio instance io.prediction.engines.itemrec 
$ cd io.prediction.engines.itemrec 
$ pio register
Collect Event Data…. 
cli = predictionio.EventClient("<app_id>") 
cli.record_user_action_on_item("like", "John", "bulgogi_12")
cli.record_user_action_on_item("view", "John", "bimbimbap_13")
Configure the Engine Instance settings
in params/datasource.json 
{ 
"appId": <app_id>, 
"actions": [ 
"view", "like", ... 
], ... 
}
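One way to read these settings, assuming `actions` lists the event types the data source should use and `appId` selects the application, is as a filter over stored events. The Python sketch below is illustrative only: the app id value `1`, the event field names, and `select_events` are hypothetical and not part of PredictionIO.

```python
import json

# Hypothetical, fully spelled-out params; the slide elides the real values.
PARAMS_JSON = '{"appId": 1, "actions": ["view", "like"]}'

# Hypothetical stored events; the field names are illustrative only.
EVENTS = [
    {"appId": 1, "action": "like", "uid": "John", "iid": "bulgogi_12"},
    {"appId": 1, "action": "view", "uid": "John", "iid": "bimbimbap_13"},
    {"appId": 1, "action": "share", "uid": "John", "iid": "bulgogi_12"},
    {"appId": 2, "action": "like", "uid": "Mary", "iid": "bulgogi_12"},
]

def select_events(events, params):
    """Keep events that belong to the app and use one of the configured actions."""
    wanted = set(params["actions"])
    return [e for e in events
            if e["appId"] == params["appId"] and e["action"] in wanted]

params = json.loads(PARAMS_JSON)
selected = select_events(EVENTS, params)
```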
Train the Data Model 
$ pio train 
Deploy the Engine Instance 
$ pio deploy
Retrieve Prediction Results 
from predictionio import EngineClient 
client = EngineClient(url="http://localhost:8000") 
prediction = client.send_query({"uid": "John", "n": 3}) 
print(prediction)
Output
{u'items': [{u'272': 9.929327011108398}, {u'313': 9.92607593536377}, {u'347': 9.92170524597168}]}
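The response is a list of single-entry dictionaries, each mapping an item ID to its score. A small Python helper can flatten it into ranked pairs; the helper name `ranked_items` is hypothetical, and the sample response is copied from the output above.

```python
# Response copied from the slide's output.
prediction = {
    "items": [
        {"272": 9.929327011108398},
        {"313": 9.92607593536377},
        {"347": 9.92170524597168},
    ]
}

def ranked_items(prediction):
    """Flatten [{item_id: score}, ...] into [(item_id, score), ...], best first."""
    pairs = [(iid, score)
             for entry in prediction["items"]
             for iid, score in entry.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

top = ranked_items(prediction)
best_item_id = top[0][0]
```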
You can also…. 
• Change the algorithm 
• Tune algorithm parameters 
• Compare and evaluate algorithms 
• Add custom business logic
SDKs for: 
• Python 
• Ruby 
• PHP 
• Java / Android 
• Scala 
• Node.js 
• iOS 
• Meteor 
• more….
Also, 
build your own Engine!
Applications 
of 
Machine Learning 
Speech Recognition 
Personal Newsfeed 
SPAM Filtering 
Recommendation 
Driverless Car 
Churn Prediction 
Ad Targeting 
Fraud Detection 
Thank you! (감사합니다)
Korean Documentation (Beta)! 
http://docs.prediction.io/kr 
- @PredictionIO 
- prediction.io - Newsletters 
- github.com/predictionio