BUILD YOUR BI SYSTEM
PRACTICE IN DATA LAKE ECOSYSTEM
Bryan@Vpon Data
• Experience
Vpon Data Engineer
TWM, Keywear, Nielsen
• Bryan’s notes for data analysis
http://bryannotes.blogspot.tw
• Spark.TW
• LinkedIn
https://tw.linkedin.com/pub/bryan-yang/7b/763/a79
ABOUT ME
AGENDA
• User Story
• Data Lake
• Framework of BI
DEAL WITH BIG DATA
SMALL RETAILER
MORE COMPLEX AND
BIG…
http://www.slideteam.net/technology-powerpoint-templates/mobile-phones.html
3 KINDS OF PROBLEMS
https://kavyamuthanna.wordpress.com/category/big-data/
BIG DATA BIG PROBLEM
http://www.mn.uio.no/ifi/studier/masteroppgaver/nd/masteroppgave_cloud_bigdata_hpc.html
BIG DATA BIG COST
• The cost of data storage
What data should be kept?
How long?
• The cost of data management
Is the machine and infra easy to maintain?
Data Flow(ETL)?
• The time cost of data processing
How long can the users wait?
Accessibility of the data
Human costs you cannot see
A REAL CASE
SO MANY ADHOC QUERIES
SALES
MARKETING
FINANCE
BUSINESS
EVEN A SIMPLE QUERY
Q: HI, PLEASE TELL ME HOW MANY
USERS FROM THE BEGINNING?
A: SELECT COUNT(1) FROM LOG
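The query above counts log rows. If the log holds one row per event rather than one row per user, answering "how many users" would need a distinct count instead. A minimal sketch, using Python's built-in sqlite3 as a stand-in for the real engine; the table layout and the `user_id` column are assumptions for illustration, not taken from the talk.

```python
import sqlite3

def count_rows_and_users(events):
    """Return (row_count, distinct_user_count) for (user_id, action) events."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical log layout: one row per event.
    conn.execute("CREATE TABLE log (user_id TEXT, action TEXT)")
    conn.executemany("INSERT INTO log VALUES (?, ?)", events)
    rows = conn.execute("SELECT COUNT(1) FROM log").fetchone()[0]
    users = conn.execute("SELECT COUNT(DISTINCT user_id) FROM log").fetchone()[0]
    conn.close()
    return rows, users

sample_events = [("u1", "open"), ("u1", "click"), ("u2", "open")]
print(count_rows_and_users(sample_events))
```

On a full event log, either query has to scan every row, which is why even a "simple" question can be slow.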
ttps://myreelpov.wordpress.com/2012/12/23/which-story-do-you-prefer-life-of-pi/life-of-pi-2
Your Life
Boss
Family and Lover
Customers
Data Ocean
Overviews
Business intelligence (BI) is the set of techniques and tools for
the transformation of raw data into meaningful and useful
information for business analysis purposes. —Wikipedia
DIFFERENT FEATURES
                Price       Performance  Accessibility
Hadoop          Low         Medium       Low
SQL Server      Low-Medium  Depends on   Medium
Data Warehouse  High        High         Medium
BI System       High        Depends on   High
http://www.datalytyx.com/big-data-data-lakes/
WHY DATA LAKE
tp://thesologuide.com/332/the-seesaw-of-success-when-taking-a-rest-is-bes
HIVE
• Created at Facebook
• Data warehouse in the Hadoop ecosystem
• HiveQL (SQL-like interface)
• Metastore (stores the data schema;
schema on read)
• UDF (user-defined functions)
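Schema on read can be sketched in a few lines: the raw data is stored untouched, and column names and types are applied only when it is read. This is plain Python rather than Hive; the sample rows, column names, and the `read_with_schema` helper are all hypothetical.

```python
import csv
import io

# Raw data is stored as-is; nothing validates it at write time.
RAW_LOG = "u1,2015-01-01,open\nu2,2015-01-01,click\n"

# Hypothetical schema kept separately from the data, as a metastore would keep it.
SCHEMA = [("user_id", str), ("event_date", str), ("action", str)]

def read_with_schema(raw_text, schema):
    """Apply column names and types only at read time."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        rows.append({name: cast(value) for (name, cast), value in zip(schema, fields)})
    return rows

records = read_with_schema(RAW_LOG, SCHEMA)
print(records[0]["user_id"], len(records))
```

Changing the schema here means editing `SCHEMA` only; the stored text never has to be rewritten.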
http://www.stratapps.net/intro-hive.php
http://hortonworks.com/blog/hive-cheat-sheet-for-sql-users/
ONE MORE THING
TERADATA
• Massively Parallel Processing
• Each processor handles a different thread
of the program, and each processor
has its own disk
• Teradata SQL is fully certified at the
SQL-92 level
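The massively parallel idea can be illustrated with a small sketch: rows are hash-distributed so that each worker owns a disjoint slice, each worker aggregates only its own slice, and the partial results are merged. This is not Teradata code; the function names are hypothetical, and the workers run one after another here to keep the example short.

```python
import zlib
from collections import Counter

def partition(rows, n_workers):
    """Hash-distribute rows so that each worker owns a disjoint slice."""
    slices = [[] for _ in range(n_workers)]
    for user_id, amount in rows:
        slices[zlib.crc32(user_id.encode()) % n_workers].append((user_id, amount))
    return slices

def local_sum(rows):
    """Work done by one worker, touching only its own slice."""
    totals = Counter()
    for user_id, amount in rows:
        totals[user_id] += amount
    return totals

def parallel_sum(rows, n_workers=4):
    """Merge the per-worker partial results into the final answer."""
    result = Counter()
    for part in partition(rows, n_workers):
        result.update(local_sum(part))  # sequential stand-in for parallel workers
    return dict(result)

print(parallel_sum([("a", 1), ("b", 2), ("a", 3)]))
```

Because all rows for a given key land on the same worker, no worker needs to read another worker's data to finish its part.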
http://www.slideshare.net/alam7/module-02-teradata-basics
https://www.safaribooksonline.com/library/view/teradata-architecture-for
TABLEAU
• Visualization Tool
• Connects to many kinds of databases
• VizQL
• Tableau Server
http://www.clearpeaks.com/blog/tableau/tableau-8-2-new-features
https://www.youtube.com/watch?v=fYpy04vmG_o
m/services/business-intelligence-services/tableau-consulting/table
JENKINS
• Manages ETL processes
• Free, with many plugins
• Monitors job status and dependencies
• Integrates with Git and SVN
• Email alerts
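What "monitoring job status and dependencies" amounts to can be sketched in plain Python: run each job only after the jobs it depends on, and stop at the first failure. This does not use any Jenkins API; the job names and the `runner` callback are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical ETL jobs: each key maps to the jobs it depends on.
JOBS = {
    "extract_logs": [],
    "load_to_hive": ["extract_logs"],
    "build_daily_stats": ["load_to_hive"],
    "export_to_teradata": ["build_daily_stats"],
}

def run_order(jobs):
    """Return an execution order in which every job follows its dependencies."""
    return list(TopologicalSorter(jobs).static_order())

def run_all(jobs, runner):
    """Run jobs in dependency order; stop at the first failure and report it."""
    for name in run_order(jobs):
        if not runner(name):
            return f"FAILED: {name}"  # the point where an email alert would go out
    return "OK"

print(run_order(JOBS))
print(run_all(JOBS, lambda name: True))
```

In a real setup the `runner` would invoke a build step such as a Python script or a shell command and return whether it succeeded.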
User Interface
Back to home page
Management menu
Builds in progress
Joblist Build information
Build status
Next scheduled build and its time
ip:port
Job Name List
Build Steps
call python script
call the remote shell
call local shell script
Build Graph
[Figure: graph of jobs, each node labeled "Job Name"]
LET’S PUT IT ALL
TOGETHER
[Diagram: Hadoop Cluster 1, Hadoop Cluster 2, Teradata, Tableau Server, User; labeled flows: Data Transfer, Request, ETL, Data Slicing]
Live Query: Too Slow
[Diagram: Hadoop Cluster 1, Hadoop Cluster 2, Teradata, Tableau Server, User; labeled flows: Data Transfer, Request, ETL, Data Slicing]
Extract Data: Insufficient Space
[Diagram: Hadoop Cluster 1, Hadoop Cluster 2, Teradata, Tableau Server, User; labeled flows: Data Transfer, Request, ETL, Data Slicing]
Extract Data Every Day: Table, View, Statistical Tables
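The daily-extract approach can be sketched as a job that rebuilds a small statistical table from the raw log, so that the BI tool queries the summary instead of the full data. The sketch uses sqlite3 as a stand-in for the real stores; the table names, columns, and sample rows are hypothetical.

```python
import sqlite3

def build_daily_stats(conn):
    """Rebuild the small summary table from the raw log (the daily ETL step)."""
    conn.execute("DROP TABLE IF EXISTS daily_stats")
    conn.execute(
        "CREATE TABLE daily_stats AS "
        "SELECT event_date, COUNT(DISTINCT user_id) AS users, COUNT(1) AS events "
        "FROM log GROUP BY event_date"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (event_date TEXT, user_id TEXT)")
conn.executemany(
    "INSERT INTO log VALUES (?, ?)",
    [("2015-01-01", "u1"), ("2015-01-01", "u1"),
     ("2015-01-01", "u2"), ("2015-01-02", "u1")],
)
build_daily_stats(conn)

# Dashboards read the summary table: one row per day instead of one per event.
stats = conn.execute("SELECT * FROM daily_stats ORDER BY event_date").fetchall()
print(stats)
```

The trade-off is freshness: the summary is only as current as the last run of the daily job.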
USER EXPERIENCE
TUNING
[Bar chart comparing HIVE, TERADATA, and BI; axis ticks from 0 to 150]
120X Faster
HOW TO CHOOSE THE
COMPONENT IN YOUR BI
FRAMEWORK ?
• The cost of data storage
• The cost of data management
• The time cost of data processing
CONSIDERATIONS AND
SUGGESTIONS
• Time is money
• Trade HDD space or money for time
• Understand the components and
their relationships
• Balance needs against costs
• A good framework helps the business grow
COST CURVE
[Chart: Cost of Business Growth plotted against Business Growth]
Hardware
*More Nodes
*More Memory
*Graphics Cards
…
Software
*Spark
*Tez
*Tachyon
*Algorithm
…
IN THE FUTURE
Cloud
*EC2
*BigQuery
*Bluemix
*SAP
…
THANK YOU FOR
LISTENING
Special Thanks
Vpon
Hood, Meiyen, Gil and OPS Team
Q & A

More Related Content

What's hot

ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...
ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...
ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...
European Collaboration Summit
 
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Kaxil Naik
 
Icinga Camp San Diego: Apify them all
Icinga Camp San Diego: Apify them allIcinga Camp San Diego: Apify them all
Icinga Camp San Diego: Apify them all
Icinga
 
Lessons from the Trenches - Building Enterprise Applications with RavenDB
Lessons from the Trenches - Building Enterprise Applications with RavenDBLessons from the Trenches - Building Enterprise Applications with RavenDB
Lessons from the Trenches - Building Enterprise Applications with RavenDB
Oren Eini
 
Building a friendly .NET SDK to connect to Space
Building a friendly .NET SDK to connect to SpaceBuilding a friendly .NET SDK to connect to Space
Building a friendly .NET SDK to connect to Space
Maarten Balliauw
 
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSISMicrosoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
Mark Kromer
 
02 integrate highchart
02 integrate highchart02 integrate highchart
02 integrate highchart
Erhwen Kuo
 
5 Amazing Reasons DBAs Need to Love Extended Events
5 Amazing Reasons DBAs Need to Love Extended Events5 Amazing Reasons DBAs Need to Love Extended Events
5 Amazing Reasons DBAs Need to Love Extended Events
Jason Strate
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanDataWorks Summit
 
HyperBatch
HyperBatchHyperBatch
HyperBatch
Daniel Peter
 
Icinga Camp Bangalore - Icinga and Icinga Director
Icinga Camp Bangalore - Icinga and Icinga Director Icinga Camp Bangalore - Icinga and Icinga Director
Icinga Camp Bangalore - Icinga and Icinga Director
Icinga
 
A (XPages) developers guide to Cloudant - MeetIT
A (XPages) developers guide to Cloudant - MeetITA (XPages) developers guide to Cloudant - MeetIT
A (XPages) developers guide to Cloudant - MeetIT
Frank van der Linden
 
Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...
Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...
Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...
HostedbyConfluent
 
Building Codealike: a journey into the developers analytics world
Building Codealike: a journey into the developers analytics worldBuilding Codealike: a journey into the developers analytics world
Building Codealike: a journey into the developers analytics world
Oren Eini
 
SenchaCon 2016: A Data-Driven Application for the Embedded World - Jean-Phili...
SenchaCon 2016: A Data-Driven Application for the Embedded World - Jean-Phili...SenchaCon 2016: A Data-Driven Application for the Embedded World - Jean-Phili...
SenchaCon 2016: A Data-Driven Application for the Embedded World - Jean-Phili...
Sencha
 
GraphQL vs. (the) REST
GraphQL vs. (the) RESTGraphQL vs. (the) REST
GraphQL vs. (the) REST
coliquio GmbH
 
The Ultimate Logging Architecture - You KNOW you want it!
The Ultimate Logging Architecture - You KNOW you want it!The Ultimate Logging Architecture - You KNOW you want it!
The Ultimate Logging Architecture - You KNOW you want it!
Michele Leroux Bustamante
 
Database migrations with Flyway and Liquibase
Database migrations with Flyway and LiquibaseDatabase migrations with Flyway and Liquibase
Database migrations with Flyway and Liquibase
Lars Östling
 
Zapping ever faster: how Zap sped up by two orders of magnitude using RavenDB
Zapping ever faster: how Zap sped up by two orders of magnitude using RavenDBZapping ever faster: how Zap sped up by two orders of magnitude using RavenDB
Zapping ever faster: how Zap sped up by two orders of magnitude using RavenDB
Oren Eini
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
Databricks
 

What's hot (20)

ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...
ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...
ECS19 - Marco Rocca and Fabio Franzini - Need a custom logic in PowerApps? Us...
 
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
Apache Airflow in the Cloud: Programmatically orchestrating workloads with Py...
 
Icinga Camp San Diego: Apify them all
Icinga Camp San Diego: Apify them allIcinga Camp San Diego: Apify them all
Icinga Camp San Diego: Apify them all
 
Lessons from the Trenches - Building Enterprise Applications with RavenDB
Lessons from the Trenches - Building Enterprise Applications with RavenDBLessons from the Trenches - Building Enterprise Applications with RavenDB
Lessons from the Trenches - Building Enterprise Applications with RavenDB
 
Building a friendly .NET SDK to connect to Space
Building a friendly .NET SDK to connect to SpaceBuilding a friendly .NET SDK to connect to Space
Building a friendly .NET SDK to connect to Space
 
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSISMicrosoft Data Integration Pipelines: Azure Data Factory and SSIS
Microsoft Data Integration Pipelines: Azure Data Factory and SSIS
 
02 integrate highchart
02 integrate highchart02 integrate highchart
02 integrate highchart
 
5 Amazing Reasons DBAs Need to Love Extended Events
5 Amazing Reasons DBAs Need to Love Extended Events5 Amazing Reasons DBAs Need to Love Extended Events
5 Amazing Reasons DBAs Need to Love Extended Events
 
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with AzkabanBuilding a Self-Service Hadoop Platform at Linkedin with Azkaban
Building a Self-Service Hadoop Platform at Linkedin with Azkaban
 
HyperBatch
HyperBatchHyperBatch
HyperBatch
 
Icinga Camp Bangalore - Icinga and Icinga Director
Icinga Camp Bangalore - Icinga and Icinga Director Icinga Camp Bangalore - Icinga and Icinga Director
Icinga Camp Bangalore - Icinga and Icinga Director
 
A (XPages) developers guide to Cloudant - MeetIT
A (XPages) developers guide to Cloudant - MeetITA (XPages) developers guide to Cloudant - MeetIT
A (XPages) developers guide to Cloudant - MeetIT
 
Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...
Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...
Continuous Intelligence - Streaming Apps That Are Always In Sync | Simon Cros...
 
Building Codealike: a journey into the developers analytics world
Building Codealike: a journey into the developers analytics worldBuilding Codealike: a journey into the developers analytics world
Building Codealike: a journey into the developers analytics world
 
SenchaCon 2016: A Data-Driven Application for the Embedded World - Jean-Phili...
SenchaCon 2016: A Data-Driven Application for the Embedded World - Jean-Phili...SenchaCon 2016: A Data-Driven Application for the Embedded World - Jean-Phili...
SenchaCon 2016: A Data-Driven Application for the Embedded World - Jean-Phili...
 
GraphQL vs. (the) REST
GraphQL vs. (the) RESTGraphQL vs. (the) REST
GraphQL vs. (the) REST
 
The Ultimate Logging Architecture - You KNOW you want it!
The Ultimate Logging Architecture - You KNOW you want it!The Ultimate Logging Architecture - You KNOW you want it!
The Ultimate Logging Architecture - You KNOW you want it!
 
Database migrations with Flyway and Liquibase
Database migrations with Flyway and LiquibaseDatabase migrations with Flyway and Liquibase
Database migrations with Flyway and Liquibase
 
Zapping ever faster: how Zap sped up by two orders of magnitude using RavenDB
Zapping ever faster: how Zap sped up by two orders of magnitude using RavenDBZapping ever faster: how Zap sped up by two orders of magnitude using RavenDB
Zapping ever faster: how Zap sped up by two orders of magnitude using RavenDB
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
 

Viewers also liked

Build your ETL job using Jenkins - step by step
Build your ETL job using Jenkins - step by stepBuild your ETL job using Jenkins - step by step
Build your ETL job using Jenkins - step by step
Bryan Yang
 
Spark MLlib - Training Material
Spark MLlib - Training Material Spark MLlib - Training Material
Spark MLlib - Training Material
Bryan Yang
 
Artificial Intelligence at Work - Assist Workshop 2016 - Nick Triantos - SRI
Artificial Intelligence at Work - Assist Workshop 2016 - Nick Triantos - SRIArtificial Intelligence at Work - Assist Workshop 2016 - Nick Triantos - SRI
Artificial Intelligence at Work - Assist Workshop 2016 - Nick Triantos - SRI
Assist
 
Introduce to Spark sql 1.3.0
Introduce to Spark sql 1.3.0 Introduce to Spark sql 1.3.0
Introduce to Spark sql 1.3.0
Bryan Yang
 
Blockchain Smartnetworks
Blockchain Smartnetworks Blockchain Smartnetworks
Blockchain Smartnetworks
Melanie Swan
 
DSP 資料科學計畫簡介
DSP 資料科學計畫簡介DSP 資料科學計畫簡介
DSP 資料科學計畫簡介
codefortomorrow
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for Training
Bryan Yang
 
Big data para principiantes
Big data para principiantesBig data para principiantes
Big data para principiantes
Carlos Toxtli
 
Estudio "Big Data: retos y oportunidades para el turismo"
Estudio "Big Data: retos y oportunidades para el turismo"Estudio "Big Data: retos y oportunidades para el turismo"
Estudio "Big Data: retos y oportunidades para el turismo"
Invattur
 
Introducción al Big Data
Introducción al Big DataIntroducción al Big Data
Introducción al Big Data
David Alayón
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
Randy L. Archambault
 
Business Intelligence - Intro
Business Intelligence - IntroBusiness Intelligence - Intro
Business Intelligence - Intro
David Hubbard
 
Building A Bi Strategy
Building A Bi StrategyBuilding A Bi Strategy
Building A Bi Strategy
larryzagata
 
Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)
Bernardo Najlis
 
Business intelligence ppt
Business intelligence pptBusiness intelligence ppt
Business intelligence ppt
sujithkylm007
 

Viewers also liked (16)

Build your ETL job using Jenkins - step by step
Build your ETL job using Jenkins - step by stepBuild your ETL job using Jenkins - step by step
Build your ETL job using Jenkins - step by step
 
Spark MLlib - Training Material
Spark MLlib - Training Material Spark MLlib - Training Material
Spark MLlib - Training Material
 
Artificial Intelligence at Work - Assist Workshop 2016 - Nick Triantos - SRI
Artificial Intelligence at Work - Assist Workshop 2016 - Nick Triantos - SRIArtificial Intelligence at Work - Assist Workshop 2016 - Nick Triantos - SRI
Artificial Intelligence at Work - Assist Workshop 2016 - Nick Triantos - SRI
 
Introduce to Spark sql 1.3.0
Introduce to Spark sql 1.3.0 Introduce to Spark sql 1.3.0
Introduce to Spark sql 1.3.0
 
Blockchain Smartnetworks
Blockchain Smartnetworks Blockchain Smartnetworks
Blockchain Smartnetworks
 
DSP 資料科學計畫簡介
DSP 資料科學計畫簡介DSP 資料科學計畫簡介
DSP 資料科學計畫簡介
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Spark Sql for Training
Spark Sql for TrainingSpark Sql for Training
Spark Sql for Training
 
Big data para principiantes
Big data para principiantesBig data para principiantes
Big data para principiantes
 
Estudio "Big Data: retos y oportunidades para el turismo"
Estudio "Big Data: retos y oportunidades para el turismo"Estudio "Big Data: retos y oportunidades para el turismo"
Estudio "Big Data: retos y oportunidades para el turismo"
 
Introducción al Big Data
Introducción al Big DataIntroducción al Big Data
Introducción al Big Data
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Business Intelligence - Intro
Business Intelligence - IntroBusiness Intelligence - Intro
Business Intelligence - Intro
 
Building A Bi Strategy
Building A Bi StrategyBuilding A Bi Strategy
Building A Bi Strategy
 
Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)Business Intelligence Presentation (1/2)
Business Intelligence Presentation (1/2)
 
Business intelligence ppt
Business intelligence pptBusiness intelligence ppt
Business intelligence ppt
 

Similar to Building your bi system-HadoopCon Taiwan 2015

Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Production
iguazio
 
Measure and increase developer productivity with help of Severless by Kazulki...
Measure and increase developer productivity with help of Severless by Kazulki...Measure and increase developer productivity with help of Severless by Kazulki...
Measure and increase developer productivity with help of Severless by Kazulki...
Vadym Kazulkin
 
Geek Sync | Deployment and Management of Complex Azure Environments
Geek Sync | Deployment and Management of Complex Azure EnvironmentsGeek Sync | Deployment and Management of Complex Azure Environments
Geek Sync | Deployment and Management of Complex Azure Environments
IDERA Software
 
O365con14 - migrating your e-mail to the cloud
O365con14 - migrating your e-mail to the cloudO365con14 - migrating your e-mail to the cloud
O365con14 - migrating your e-mail to the cloud
NCCOMMS
 
CIAOPS Need to Know Office 365 Webinar - December 2017
CIAOPS Need to Know Office 365 Webinar - December 2017CIAOPS Need to Know Office 365 Webinar - December 2017
CIAOPS Need to Know Office 365 Webinar - December 2017
Robert Crane
 
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
TechWell
 
Measure and Increase Developer Productivity with Help of Serverless AWS Commu...
Measure and Increase Developer Productivity with Help of Serverless AWS Commu...Measure and Increase Developer Productivity with Help of Serverless AWS Commu...
Measure and Increase Developer Productivity with Help of Serverless AWS Commu...
Vadym Kazulkin
 
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDelivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
Databricks
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Databricks
 
Transforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOpsTransforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOps
Nicolas (Nick) Barcet
 
Network Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspectiveNetwork Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspective
Walid Shaari
 
Mstr meetup
Mstr meetupMstr meetup
Mstr meetup
Bhavani Akunuri
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
Kent Graziano
 
SharePoint Best Practices Conference 2013
SharePoint Best Practices Conference 2013SharePoint Best Practices Conference 2013
SharePoint Best Practices Conference 2013
Mike Brannon
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
Torsten Steinbach
 
Measure and Increase Developer Productivity with Help of Serverless at Server...
Measure and Increase Developer Productivity with Help of Serverless at Server...Measure and Increase Developer Productivity with Help of Serverless at Server...
Measure and Increase Developer Productivity with Help of Serverless at Server...
Vadym Kazulkin
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
Timothy St. Clair
 
Working Software Over Comprehensive Documentation
Working Software Over Comprehensive DocumentationWorking Software Over Comprehensive Documentation
Working Software Over Comprehensive Documentation
Andrii Dzynia
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdf
Ilham31574
 
Bring-your-ML-Project-into-Production-v2.pdf
Bring-your-ML-Project-into-Production-v2.pdfBring-your-ML-Project-into-Production-v2.pdf
Bring-your-ML-Project-into-Production-v2.pdf
Liang Yan
 

Similar to Building your bi system-HadoopCon Taiwan 2015 (20)

Challenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in ProductionChallenges of Operationalising Data Science in Production
Challenges of Operationalising Data Science in Production
 
Measure and increase developer productivity with help of Severless by Kazulki...
Measure and increase developer productivity with help of Severless by Kazulki...Measure and increase developer productivity with help of Severless by Kazulki...
Measure and increase developer productivity with help of Severless by Kazulki...
 
Geek Sync | Deployment and Management of Complex Azure Environments
Geek Sync | Deployment and Management of Complex Azure EnvironmentsGeek Sync | Deployment and Management of Complex Azure Environments
Geek Sync | Deployment and Management of Complex Azure Environments
 
O365con14 - migrating your e-mail to the cloud
O365con14 - migrating your e-mail to the cloudO365con14 - migrating your e-mail to the cloud
O365con14 - migrating your e-mail to the cloud
 
CIAOPS Need to Know Office 365 Webinar - December 2017
CIAOPS Need to Know Office 365 Webinar - December 2017CIAOPS Need to Know Office 365 Webinar - December 2017
CIAOPS Need to Know Office 365 Webinar - December 2017
 
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
Can Your Mobile Infrastructure Survive 1 Million Concurrent Users?
 
Measure and Increase Developer Productivity with Help of Serverless AWS Commu...
Measure and Increase Developer Productivity with Help of Serverless AWS Commu...Measure and Increase Developer Productivity with Help of Serverless AWS Commu...
Measure and Increase Developer Productivity with Help of Serverless AWS Commu...
 
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ DevicesDelivering Insights from 20M+ Smart Homes with 500M+ Devices
Delivering Insights from 20M+ Smart Homes with 500M+ Devices
 
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at DatabricksLessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
Lessons from Building Large-Scale, Multi-Cloud, SaaS Software at Databricks
 
Transforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOpsTransforming to OpenStack: a sample roadmap to DevOps
Transforming to OpenStack: a sample roadmap to DevOps
 
Network Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspectiveNetwork Automation Journey, A systems engineer NetOps perspective
Network Automation Journey, A systems engineer NetOps perspective
 
Mstr meetup
Mstr meetupMstr meetup
Mstr meetup
 
Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)Demystifying Data Warehouse as a Service (DWaaS)
Demystifying Data Warehouse as a Service (DWaaS)
 
SharePoint Best Practices Conference 2013
SharePoint Best Practices Conference 2013SharePoint Best Practices Conference 2013
SharePoint Best Practices Conference 2013
 
Serverless SQL
Serverless SQLServerless SQL
Serverless SQL
 
Measure and Increase Developer Productivity with Help of Serverless at Server...
Measure and Increase Developer Productivity with Help of Serverless at Server...Measure and Increase Developer Productivity with Help of Serverless at Server...
Measure and Increase Developer Productivity with Help of Serverless at Server...
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 
Working Software Over Comprehensive Documentation
Working Software Over Comprehensive DocumentationWorking Software Over Comprehensive Documentation
Working Software Over Comprehensive Documentation
 
Technical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdfTechnical Deck Delta Live Tables.pdf
Technical Deck Delta Live Tables.pdf
 
Bring-your-ML-Project-into-Production-v2.pdf
Bring-your-ML-Project-into-Production-v2.pdfBring-your-ML-Project-into-Production-v2.pdf
Bring-your-ML-Project-into-Production-v2.pdf
 

More from Bryan Yang

敏捷開發心法
敏捷開發心法敏捷開發心法
敏捷開發心法
Bryan Yang
 
Data pipeline essential
Data pipeline essentialData pipeline essential
Data pipeline essential
Bryan Yang
 
Docker 101
Docker 101Docker 101
Docker 101
Bryan Yang
 
資料分析的快樂就是如此樸實無華且枯燥
資料分析的快樂就是如此樸實無華且枯燥資料分析的快樂就是如此樸實無華且枯燥
資料分析的快樂就是如此樸實無華且枯燥
Bryan Yang
 
Data pipeline 101
Data pipeline 101Data pipeline 101
Data pipeline 101
Bryan Yang
 
Building a data driven business
Building a data driven businessBuilding a data driven business
Building a data driven business
Bryan Yang
 
產業數據力-以傳統零售業為例
產業數據力-以傳統零售業為例產業數據力-以傳統零售業為例
產業數據力-以傳統零售業為例
Bryan Yang
 
Serverless ETL
Serverless ETLServerless ETL
Serverless ETL
Bryan Yang
 
敏捷開發心法
敏捷開發心法敏捷開發心法
敏捷開發心法
Bryan Yang
 
Introduction to docker
Introduction to dockerIntroduction to docker
Introduction to docker
Bryan Yang
 

More from Bryan Yang (10)

敏捷開發心法
敏捷開發心法敏捷開發心法
敏捷開發心法
 
Data pipeline essential
Data pipeline essentialData pipeline essential
Data pipeline essential
 
Docker 101
Docker 101Docker 101
Docker 101
 
資料分析的快樂就是如此樸實無華且枯燥
資料分析的快樂就是如此樸實無華且枯燥資料分析的快樂就是如此樸實無華且枯燥
資料分析的快樂就是如此樸實無華且枯燥
 
Data pipeline 101
Data pipeline 101Data pipeline 101
Data pipeline 101
 
Building a data driven business
Building a data driven businessBuilding a data driven business
Building a data driven business
 
產業數據力-以傳統零售業為例
產業數據力-以傳統零售業為例產業數據力-以傳統零售業為例
產業數據力-以傳統零售業為例
 
Serverless ETL
Serverless ETLServerless ETL
Serverless ETL
 
敏捷開發心法
敏捷開發心法敏捷開發心法
敏捷開發心法
 
Introduction to docker
Introduction to dockerIntroduction to docker
Introduction to docker
 

Recently uploaded

一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
nscud
 
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
【社内勉強会資料_Octo: An Open-Source Generalist Robot Policy】
NABLAS株式会社
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
alex933524
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
AbhimanyuSinha9
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Building Your BI System - HadoopCon Taiwan 2015

Editor's Notes

  1. Big data brings problems in three ways. Variety: many kinds of data types, data sources, and databases. Volume: log data, transaction data, crawler data. Velocity: real time, near real time, batch.
  2. Vpon is a big data advertising company. We receive and produce a large amount of data every day.
  3. Business requirements determine how much processing time users can tolerate waiting.
  4. We receive many ad hoc queries every day. They come from every department: business development, sales, account services, R&D, and so on. For example: how many users per day, how many requests per day, the click rate, etc.
  5. Business requirements determine how much processing time users can tolerate waiting.