PredictionIO Architecture and Integration - DataCon.TW 2017

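Apache PredictionIO is an open-source machine learning server framework. It lets developers and data scientists quickly build the prediction engines they need and integrate them with existing systems through REST, achieving the goal of Machine Learning as a Service. We will introduce how to integrate the Hadoop ecosystem with PredictionIO to help users collect and store data, train learning engines, and serve prediction results, helping enterprises uncover problems, improve customer demand forecasting, and more.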

  • 1. Apache PredictionIO Architecture and Integration: Establish an effective machine learning platform efficiently. 李文良 (William Lee), 張峰睿 (Frank Chang), 亦思科技, 2017/09/30
  • 2. About us. 李文良 (William Lee), Project Manager, R&D Division, 亦思科技, williamlee@is-land.com.tw. 張峰睿 (Frank Chang), System Architect, R&D Division, 亦思科技, frank@is-land.com.tw
  • 3. Outline •Background •PredictionIO Overview •Quick Start your first Engine •Customizing an Engine •Implementation on Enterprise Production •Summary
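  • 4. Expectations for machine learning
      According to a report by MIT Technology Review and Google Cloud:
      • 50% of organizations plan to use machine learning to gain a deeper understanding of the data they hold and to extract more information from it.
      • Machine learning is used to gain a competitive advantage or to speed up the analysis of existing data.
      • 31% even believe that machine learning can reduce costs.
      https://s3.amazonaws.com/files.technologyreview.com/whitepapers/MITTR_GoogleforWork_Survey.pdf
  • 5. Machine learning applications in the semiconductor industry
      ● Semiconductor manufacturing typically involves hundreds of process steps, producing millions of data records along the way.
      ● During product development, R&D must define specifications (SPEC) for this data, used for quality checks and equipment tuning.
      ● IT staff are usually needed to extract the datasets and load them into statistical software before any analysis can begin.
      ● A system that combines data collection with machine learning to generate SPEC automatically, for staff to confirm, could reduce the time spent in product development.
      [Diagram labels: product development, data collection, data analysis, parameter tuning, SPEC]
  • 6. Machine learning applications in the financial industry
      ● The financial industry has entered the era of electronic trading. Historical transactions accumulate into large volumes of data, and new data is continuously added through the trading systems.
      ● Information is extracted from the historical databases for analysis according to different needs.
      ● A system that consolidates the historical databases and offers several machine learning algorithms could shorten the time from data analysis to the target output.
      [Diagram labels: historical data; clients (x3); data analysis; product design, risk management, anomaly analysis, market forecasting]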
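  • 7. Starting from Spark MLlib: running the examples seems to go well.
  • 8. When actually building a system, however, there is much more to take care of...
      ● A trained model can be saved for later use. Where is it stored? How is it managed?
      ● How does it integrate with apps or existing systems? How can it be used conveniently and in real time?
      ● How are algorithms run? How are parameters and the environment configured?
      ● How does the data come in, and where is it stored?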
  • 9. Hidden Technical Debt in Machine Learning Systems. “Only a small fraction of real-world ML systems is composed of the ML code. The required surrounding infrastructure is vast and complex.” (Sculley et al., Hidden Technical Debt in Machine Learning Systems, NIPS 2015)
  • 10. Big Data System with Machine Learning Stacks
      Apps: API Service Server
      Algorithms: Spark ML, Caffe, DeepLearning4J, TensorFlow, ...
      Processing: Hadoop, Spark, ...
      DataStore: RDB, Hadoop HDFS, HBase, ES, ...
      Diagram label: PredictionIO
      http://sssslide.com/speakerdeck.com/takahiro/building-a-recommendation-engine-with-spark-and-apache-predictionio
  • 11. Outline •Background •PredictionIO Overview •Quick Start your first Engine •Customizing an Engine •Implementation on Enterprise Production •Summary
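  • 12. What is Apache PredictionIO
      Apache PredictionIO (incubating) is an open-source machine learning server platform. It lets developers and data scientists build prediction engines quickly and effectively, and it integrates with application systems to achieve the goal of Machine Learning as a Service.
      Expected benefits of PredictionIO:
      • A simple solution for data collection and storage that unifies the platforms in the existing ecosystem.
      • Developers can quickly build a machine learning engine from modules and expose it as a service for easy integration with external systems.
      • Custom machine learning engines can be built by modifying the modules.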
  • 13. Latest release on 9/26. According to the PredictionIO JIRA site, version 0.12.0 was released on 26 Sep 2017.
  • 15. Architecture [Diagram labels: SDK / Service Client, Processing, Event Server, Prediction Engine, PredictionIO Platform, Engine Template, Analytics Tools, Storage, Build Engine and Deploy]
  • 17. Outline •Background •PredictionIO Overview •Quick Start your first Engine •Customizing an Engine •Implementation on Enterprise Production •Summary
  • 18. Quick Start your first Engine
      Flow: Install & Start EventServer → Train & Deploy Prediction Engine → Query Result via REST
      Operation Steps:
      1. Install and Run PredictionIO
      2. Create a new Engine from an Engine Template
      3. Generate App ID and Access Key
      4. Collecting Data
      5. Deploy the Engine as a Service
      6. Use the Engine
      Alternatives:
      ● REST APIs ● SDKs ● 54 available templates ● DASE for custom needs ● Source Code ● Docker
  • 19. Installation & Quick Start ● See https://github.com/apache/incubator-predictionio/
  • 20. Installing with Docker
      ● Install Docker first
      ● Start docker-predictionio:
        $ docker run -it -p 8000:8000 steveny/predictionio /bin/bash
      http://predictionio.incubator.apache.org/community/projects/#docker-installation-for-predictionio
  • 21. Installing From Source
      ● Up-to-date version: 0.12.0
      ● Download the source code: https://github.com/apache/incubator-predictionio/
      ● Build dependencies:
        Ecosystem      | Supported versions   | Default
        Scala          | 2.10.x, 2.11.x       | 2.11.8
        Spark          | 1.6.x, 2.0.x, 2.1.x  | 2.1.1
        Elasticsearch  | 1.7.x, 5.x           | 5.5.2
        Hadoop         | 2.4.x to 2.7.x       | 2.7.3 (*)
        HBase          | 0.98.x, 1.2.x        | 1.2.6 (*)
        https://predictionio.incubator.apache.org/install/install-sourcecode/
        $ ./make-distribution.sh -Dscala.version=2.11.8 -Dspark.version=2.1.0 -Delasticsearch.version=5.3.0
      ● Set up and start PredictionIO
  • 22. Command Line
      ● General Commands
        ○ pio status : Displays the install path and running status of the PredictionIO system and its dependencies.
      ● Event Server Commands
        ○ pio eventserver : Launches the Event Server.
        ○ pio app : Manages apps that are used by the Event Server.
      ● Engine Commands
        ○ pio build : Builds the engine in the current directory.
        ○ pio train : Kicks off a training run using an engine.
        ○ pio deploy : Deploys an engine as an engine server. If no instance ID is specified, it deploys the latest instance.
      https://predictionio.incubator.apache.org/cli/#engine-commands
  • 23. REST API and SDKs
      ● REST API:
        ○ Default ports (check them against your own setup):
          ■ Event Server : 7070
          ■ Engine : 8000
        ○ Example (Event Server):
          $ curl -i -X POST "http://localhost:7070/events.json?accessKey=$ACCESS_KEY" -H "Content-Type: application/json" -d '{$JSON-CONTEXT}'
        ○ Example (Engine):
          $ curl -H "Content-Type: application/json" -d '{ $JSON-CONTEXT }' http://localhost:8000/queries.json
      ● SDKs:
        ○ Java & Android
        ○ Python
        ○ PHP
        ○ Ruby
      https://predictionio.incubator.apache.org/cli/#engine-commands
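As a rough sketch of what the Event Server POST on this slide carries, the following Scala snippet assembles the URL and a JSON body without sending anything. The field names (`event`, `entityType`, `entityId`, `targetEntityType`, `targetEntityId`, `properties`) follow the event format described in the PredictionIO documentation. The helper names `quoted`, `eventUrl`, and `rateEventJson`, and all sample values, are assumptions made for illustration.

```scala
// Hypothetical helpers: they only build strings and perform no network I/O.

// Wrap a value in double quotes. No JSON escaping is done, so this assumes
// ids without quotes or backslashes.
def quoted(s: String): String = "\"" + s + "\""

// URL of the Event Server endpoint (7070 is the default port named on the slide).
def eventUrl(host: String, accessKey: String, port: Int = 7070): String =
  "http://" + host + ":" + port + "/events.json?accessKey=" + accessKey

// JSON body for a "rate" event: a user rates an item.
def rateEventJson(userId: String, itemId: String, rating: Double): String = {
  val fields = Seq(
    "event"            -> quoted("rate"),
    "entityType"       -> quoted("user"),
    "entityId"         -> quoted(userId),
    "targetEntityType" -> quoted("item"),
    "targetEntityId"   -> quoted(itemId),
    "properties"       -> ("{" + quoted("rating") + ":" + rating + "}")
  )
  fields.map { case (k, v) => quoted(k) + ":" + v }.mkString("{", ",", "}")
}

val url  = eventUrl("localhost", "MY_ACCESS_KEY") // placeholder access key
val body = rateEventJson("u1", "i42", 4.0)
println(url)
println(body)
```

The printed body can be passed to `curl -d` in place of the `$JSON-CONTEXT` placeholder. A real client would normally use one of the SDKs listed on the slide, or a JSON library, rather than string concatenation.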
  • 24. Outline •Background •PredictionIO Overview •Quick Start your first Engine •Customizing an Engine •Implementation on Enterprise Production •Summary
  • 27. Customizing your Engine with D-A-S-E
      D (Data Source and Preparator): DataSource.scala, Preparator.scala
      A (Algorithm): ALSAlgorithm.scala
      S (Serving): Serving.scala
      E (Evaluation): Evaluation.scala
      Engine definition and parameter settings: Engine.scala, engine.json
      https://predictionio.incubator.apache.org/customize/
  • 28. Engine (Engine.scala) [Diagram: Query via REST → Engine → Predicted Result. Engine.scala holds the Query case class, the Predicted Result case class, and the Engine Factory (object RecommendationEngine). Label: parameter settings.]
  • 30. Data Source and Data Preparator [Diagram: Event Server → DataSource (DataSource.scala): readTrain() reads the events RDD, builds the ratings RDD, and returns Training Data → Preparator (Preparator.scala): prepare() performs the required action (*) and returns Prepared Data → Algorithm.] Note: (*) performs any necessary feature selection, data processing, etc.
  • 31. Algorithm (ALSAlgorithm.scala) [Diagram: Prepared Data → train() → Model; Query and Model → predict() → Predicted Result → Serving. Label: parameter settings.] Note: train() is called when you run "pio train".
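To make the train()/predict() split on this slide concrete, here is a minimal sketch in plain Scala with no Spark or PredictionIO dependency. All types are simplified stand-ins with assumed shapes, and the "model" is just a per-item mean rating, swapped in for the template's ALS model to keep the example small.

```scala
// Simplified stand-ins for the template's types (assumed shapes, not the real classes).
case class Rating(user: String, item: String, rating: Double)
case class PreparedData(ratings: Seq[Rating])
case class Query(item: String)
case class PredictedResult(score: Double)

// The model is the mean rating per item plus a global fallback; this is NOT ALS.
case class MeanModel(itemMeans: Map[String, Double], globalMean: Double)

object MeanAlgorithm {
  // train(): runs once per training job and produces a model from prepared data.
  def train(data: PreparedData): MeanModel = {
    require(data.ratings.nonEmpty, "no ratings to train on")
    val itemMeans = data.ratings.groupBy(_.item).map { case (item, rs) =>
      item -> rs.map(_.rating).sum / rs.size
    }
    val globalMean = data.ratings.map(_.rating).sum / data.ratings.size
    MeanModel(itemMeans, globalMean)
  }

  // predict(): runs per query against the already trained model.
  def predict(model: MeanModel, query: Query): PredictedResult =
    PredictedResult(model.itemMeans.getOrElse(query.item, model.globalMean))
}

val model = MeanAlgorithm.train(PreparedData(Seq(
  Rating("u1", "i1", 4.0), Rating("u2", "i1", 2.0), Rating("u1", "i2", 5.0))))
println(MeanAlgorithm.predict(model, Query("i1")))
```

The point of the split is that training cost is paid once, while predict() only does a lookup against the stored model for each incoming query.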
  • 32. Example of engine.json for Algorithm
      {
        ...
        "algorithms": [
          {
            "name": "als",
            "params": {
              "rank": 10,
              "numIterations": 20,
              "lambda": 0.01,
              "seed": 3
            }
          }
        ]
        ...
      }
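One way to read the `params` object in this engine.json is as the fields of a typed parameter class on the Scala side. The sketch below assumes a class named `ALSAlgorithmParams` whose fields mirror the JSON keys. The `validate` helper is hypothetical and only illustrates checks one might run before training.

```scala
// Assumed parameter class; field names mirror the "params" keys in engine.json.
case class ALSAlgorithmParams(
  rank: Int,
  numIterations: Int,
  lambda: Double,
  seed: Option[Long]
)

// Values copied from the engine.json example on the slide.
val params = ALSAlgorithmParams(rank = 10, numIterations = 20, lambda = 0.01, seed = Some(3L))

// Hypothetical sanity checks; returns a list of problems, empty when all is fine.
def validate(p: ALSAlgorithmParams): List[String] = List(
  if (p.rank <= 0) Some("rank must be positive") else None,
  if (p.numIterations <= 0) Some("numIterations must be positive") else None,
  if (p.lambda < 0.0) Some("lambda must be non-negative") else None
).flatten

println(validate(params))
```

Keeping tunable values in engine.json rather than in code means a new training run with different hyperparameters needs no rebuild of the algorithm itself.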
  • 33. Serving [Diagram: Query and Predicted Results → serve() combines the predicted results (*) → Predicted Result → Predicted Result (JSON).] Note: (*) the serve() method combines multiple predicted results into one if you have more than one predictive model.
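The combining step that serve() performs can be sketched in plain Scala as follows. The types are simplified stand-ins, and the combining rule (average each item's score across the algorithms that returned it, then keep the top `num` items) is an assumption chosen for illustration. The slide does not say which rule the engine uses.

```scala
// Simplified stand-ins for the template's query and result types (assumed shapes).
case class Query(user: String, num: Int)
case class ItemScore(item: String, score: Double)
case class PredictedResult(itemScores: Seq[ItemScore])

// Combine the results of several algorithms into one result.
// Assumed rule: average an item's score over the algorithms that returned it.
def serve(query: Query, predictedResults: Seq[PredictedResult]): PredictedResult = {
  val combined = predictedResults
    .flatMap(_.itemScores)
    .groupBy(_.item)
    .map { case (item, scores) => ItemScore(item, scores.map(_.score).sum / scores.size) }
    .toSeq
    // Highest score first; ties are broken by item id so the order is deterministic.
    .sortWith((a, b) => a.score > b.score || (a.score == b.score && a.item < b.item))
    .take(query.num)
  PredictedResult(combined)
}

val fromAlgo1 = PredictedResult(Seq(ItemScore("i1", 4.0), ItemScore("i2", 2.0)))
val fromAlgo2 = PredictedResult(Seq(ItemScore("i1", 2.0), ItemScore("i3", 5.0)))
val result = serve(Query("u1", 2), Seq(fromAlgo1, fromAlgo2))
println(result)
```

With a single algorithm the same function simply returns that algorithm's top items, so the single-model case needs no special handling.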
  • 34. Quiz (1/4) - Read Custom Events
    Q: How do we change the two event types rate and buy to like and dislike? (events RDD)
    Before :
      val eventsRDD: RDD[Event] = PEventStore.find(
        appName = dsp.appName,
        entityType = Some("user"),
        eventNames = Some(List("rate", "buy")), // read "rate" and "buy" events
        // targetEntityType is an optional field of an event.
        targetEntityType = Some(Some("item")))(sc)
    After :
      val eventsRDD: RDD[Event] = PEventStore.find(
        appName = dsp.appName,
        entityType = Some("customer"), // change user to customer
        eventNames = Some(List("like", "dislike")), // read "like" and "dislike" events
        // targetEntityType is an optional field of an event.
        targetEntityType = Some(Some("product")))(sc) // modified
  • 35. Quiz (2/4) - Map Custom Events
    Q: How do we change the two event types rate and buy to like and dislike? (continued) (ratings RDD)
    Before :
      val ratingValue: Double = event.event match {
        case "rate" => event.properties.get[Double]("rating")
        case "buy" => 4.0 // map a buy event to a rating of 4.0
        case _ => throw new Exception(s"Unexpected event ${event} is read.")
      }
    After :
      val ratingValue: Double = event.event match {
        case "rate" => event.properties.get[Double]("rating")
        case "buy" => 4.0 // map a buy event to a rating of 4.0
        case "like" => 4.0 // map a like event to a rating of 4.0
        case "dislike" => 1.0 // map a dislike event to a rating of 1.0
        case _ => throw new Exception(s"Unexpected event ${event} is read.")
      }
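The event-to-rating mapping from the After version can be mirrored in a few lines of Python, which makes the rule easy to test in isolation. The dict-based event layout is an assumption for illustration; the rating values are the ones on the slide.

```python
# Fixed rating values per event name, as on the slide.
FIXED_RATINGS = {"buy": 4.0, "like": 4.0, "dislike": 1.0}


def rating_value(event):
    """Map one event to a numeric rating, rejecting unknown event names."""
    name = event["event"]
    if name == "rate":
        # "rate" events carry an explicit rating in their properties.
        return float(event["properties"]["rating"])
    if name in FIXED_RATINGS:
        return FIXED_RATINGS[name]
    raise ValueError(f"Unexpected event {event!r} is read.")


like_rating = rating_value({"event": "like"})
rate_rating = rating_value({"event": "rate", "properties": {"rating": 3}})
```

Failing loudly on unknown event names, as the Scala code does, keeps silently mis-scored events out of the training data.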
  • 36. Quiz (3/4) - Customizing Data Preparator
    Q: How do we add a blacklist feature so that the system filters out certain products?
    Before :
      class Preparator extends PPreparator[TrainingData, PreparedData] {
        def prepare(sc: SparkContext, trainingData: TrainingData): PreparedData = {
          new PreparedData(ratings = trainingData.ratings)
        }
      }
    After :
      import scala.io.Source // ADDED
      class Preparator extends PPreparator[TrainingData, PreparedData] {
        def prepare(sc: SparkContext, trainingData: TrainingData): PreparedData = {
          // read the blacklisted item IDs, one per line
          val noTrainItems = Source.fromFile("./data/sample_not_train_data.txt").getLines.toSet
          // exclude noTrainItems from the original trainingData
          val ratings = trainingData.ratings.filter( r => !noTrainItems.contains(r.item) )
          new PreparedData(ratings)
        }
      }
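The blacklist filter can be tried out without Spark using the sketch below. An in-memory `io.StringIO` stands in for the blacklist file, and the `(user, item, rating)` tuple layout and function names are assumptions for illustration.

```python
import io


def load_blacklist(lines):
    """Read one item id per line, ignoring blank lines and stray whitespace."""
    return {line.strip() for line in lines if line.strip()}


def filter_ratings(ratings, blacklist):
    """Keep only the ratings whose item is not blacklisted."""
    return [r for r in ratings if r[1] not in blacklist]


# Stand-in for the blacklist file; real code would open the file instead.
blacklist_file = io.StringIO("i2\n\n i9 \n")
blacklist = load_blacklist(blacklist_file)

ratings = [("u1", "i1", 4.0), ("u1", "i2", 1.0), ("u2", "i9", 5.0), ("u2", "i3", 3.0)]
kept = filter_ratings(ratings, blacklist)
```

Stripping whitespace when loading avoids a common pitfall where a trailing space makes a blacklisted id fail to match.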
  • 37. Quiz (4/4) - Release for your Change
    ● How to release the modified engine(s) ?
      $ pio build
      $ pio train
      $ pio deploy
  • 38. Evaluation (1/4)
    ● Diagram labels : Evaluation, AccuracyEvaluation, Evaluation Metrics, Accuracy, Engine Params List, Algo
      // A case class needs an explicit (possibly empty) parameter list.
      case class Accuracy() extends AverageMetric[EmptyEvaluationInfo, Query, PredictedResult, ActualResult] {
        // Score 1.0 for a correct prediction and 0.0 otherwise; AverageMetric averages the scores.
        def calculate(query: Query, predicted: PredictedResult, actual: ActualResult): Double =
          (if (predicted.label == actual.label) 1.0 else 0.0)
      }
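The accuracy metric amounts to averaging a per-query 0/1 score. A minimal Python sketch of that arithmetic follows; the `(predicted, actual)` pair layout is an assumption, and the labels are made-up sample values.

```python
def calculate(predicted_label, actual_label):
    """Per-query score: 1.0 when the prediction matches, else 0.0."""
    return 1.0 if predicted_label == actual_label else 0.0


def accuracy(pairs):
    """Average the per-query scores over all (predicted, actual) pairs."""
    scores = [calculate(predicted, actual) for predicted, actual in pairs]
    if not scores:
        raise ValueError("accuracy is undefined for an empty evaluation set")
    return sum(scores) / len(scores)


# Made-up labels for illustration: three of four predictions are correct.
sample_pairs = [("a", "a"), ("b", "a"), ("c", "c"), ("d", "d")]
sample_accuracy = accuracy(sample_pairs)
```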
  • 39. Evaluation (2/4)
    ● Query : case class
    ● Predicted Result : case class
    ● Actual Result : case class
    ● class DataSource
    ● Flow : Query via REST → RecommendationEngine (with parameter settings) → Predicted Result
  • 41. Evaluation (4/4)
    ● Build and run the evaluation
    ● Deploy the engine with the best-performing parameters
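Choosing the best engine parameters reduces to scoring each candidate and keeping the highest scorer. The sketch below shows only that selection logic: the `evaluate` callback and the scores in the example are hypothetical placeholders, not measurements from a real evaluation run.

```python
def best_engine_params(candidates, evaluate):
    """Score every candidate and return (best_params, best_score).

    `evaluate` is any function mapping a params dict to a metric value
    where higher is better. Ties go to the earliest candidate.
    """
    if not candidates:
        raise ValueError("no candidate engine parameters given")
    scored = [(evaluate(params), index, params) for index, params in enumerate(candidates)]
    best_score, _, best_params = max(scored, key=lambda entry: (entry[0], -entry[1]))
    return best_params, best_score


# Hypothetical candidates and placeholder scores, for illustration only.
candidates = [{"rank": 5}, {"rank": 10}, {"rank": 20}]
placeholder_scores = {5: 0.71, 10: 0.80, 20: 0.78}
best, score = best_engine_params(candidates, lambda p: placeholder_scores[p["rank"]])
```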
  • 43. Implementation on Enterprise Production
    ● Data sources : Test Log; four Clusters
    ● Ingestion : Batch Data ( pio import ); Real Time Data ( Streaming + PIO SDK )
    ● PredictionIO Platform :
      ○ Event Server Cluster
      ○ Prediction Engine Cluster : P1 Engine, P2 Engine, P3 Engine
      ○ Storage : Meta, Event Data, Model
      ○ Off-line Training ( RDD )
    ● Client : Yield-En. ; Query via REST → Prediction Result
  • 44. Deploy the Event Server onto Prediction Cluster
    ● Setup PredictionIO and run eventserver : on each Event Server node to be deployed, run the following command :
      $ pio eventserver &
    ● Setup HAProxy : HAProxy configuration for Event Server Cluster
      listen pio_engine_7070 :7070
        mode http
        balance roundrobin
        option httpclose
        option forwardfor
        option redispatch
        retries 3
        log global
        log 127.0.0.1 local4 info
        server piovm1 192.168.56.101:7070 check weight 1 maxconn 30
        server piovm2 192.168.56.102:7070 check weight 1 maxconn 30
        server piovm3 192.168.56.103:7070 check weight 1 maxconn 30
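To illustrate what `balance roundrobin` with `check` means for the three equally weighted Event Servers, here is a deliberately simplified sketch: requests rotate through the backends that currently pass their health check. It is not how HAProxy is implemented, and the function name and the `healthy` set are assumptions for illustration.

```python
BACKENDS = ["192.168.56.101:7070", "192.168.56.102:7070", "192.168.56.103:7070"]


def assign_requests(backends, healthy, n_requests):
    """Assign n_requests to healthy backends in rotating order."""
    live = [backend for backend in backends if backend in healthy]
    if not live:
        raise RuntimeError("no healthy Event Server available")
    return [live[i % len(live)] for i in range(n_requests)]


# All servers healthy: requests rotate 101 -> 102 -> 103 -> 101.
all_up = assign_requests(BACKENDS, set(BACKENDS), 4)

# Second server failing its check: traffic alternates between the other two.
one_down = assign_requests(BACKENDS, {BACKENDS[0], BACKENDS[2]}, 4)
```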
  • 45. Deploy the Engine onto Prediction Cluster
    ● Off-Line Engine Training
    ● On each Prediction Server to be deployed, run the following command :
      $ pio deploy
      or, to choose the port and the engine instance explicitly :
      $ pio deploy --port 8001 --engine-instance-id AV6dTEoKBlbECIGzXhaS
  • 46. Summary
    • Apache PredictionIO is an active and popular project.
    • It lets you integrate machine learning functions into your apps effectively and efficiently.
    • It is also convenient to combine multiple PredictionIO nodes with HAProxy and other Hadoop ecosystem components to provide a scalable and stable solution.