Take Control of Your Data: CaliStream.com © CaliStream.com 2015
Data to Drive Decision-Making
2015-03-03
Jerome Boulon
Quick History
1999 2008 2009 2010 2012
Yahoo!: Chukwa, Hadoop Monitoring Solution
Netflix: Honu, Data Collection Pipeline
CaliStream: Founder, Honu: Data As a Service
Monitoring Solution for cable modems/TV network
Ontology Search, Acquired by Microsoft
Agenda
• Agenda
• Decision-Making, the process
• Netflix Recommendation
• Big Data @ Riot Games
• Data Pipeline
• Conclusion
Decision-Making, The Process
• Explicit
• Theory-driven
• Data-driven
• Measurable Outcomes
• Iterative
Prove a hypothesis right (or wrong)
Want result AND explanation
Iterative
Hypothesis
Gather Data
Analyze Data
Offline/Online Testing
[Diagram: Offline Testing, then Online A/B Testing, then Roll-out to Prod; transitions labeled Success and Fail, with Iterations loops]
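As a rough illustration of how an online A/B test can prove a hypothesis right or wrong with a measurable outcome, the sketch below compares the conversion rates of two variants with a two-proportion z-test. The function name and the sample counts are assumptions made for this example; the slides do not say which statistical test was used.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for H0: both variants convert at the same rate."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled rate under the null hypothesis that A and B are identical.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: control (A) vs. new presentation (B).
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
roll_out = p_value < 0.05 and z > 0
```

A small p-value with a positive z suggests variant B is worth rolling out; otherwise the hypothesis goes back for another iteration.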
Netflix Recommendation
Netflix Recommendation
Presentation 1: 1/3, 0/3, 0+/3, 1+/3
Presentation 2: 1/3, 1/3, 0+/3, 2+/3
The Data
[Diagram: Offline Testing, then Online A/B Testing, then Roll-out to Prod; transitions labeled Success and Fail]
Search
Time
Rating
Impression
Demographic
Social
User Behavior
Geo Information
The Gaming Space
Use Case: Honu @
• Error code
• AVG Ping latency
• AVG Queue Wait time
• Client Version
• Operating System
• Hardware Profile
• Etc
The Data Pipeline
Data Pipeline
• Direct correlation to the success of your Big Data project
• Structured and Multi-Structured Events
• Schema evolution
• Collecting massive amounts of data live
• 60% of BI project resources are consumed here!
• Most “underestimated” and “unsexy” but MOST important phase
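To make the "schema evolution" bullet concrete, here is a minimal sketch of one way a pipeline can tolerate events whose fields change over time: build the union of all fields seen so far and fill the gaps with None. The event fields and function names are hypothetical and are not taken from Honu or CaliStream.

```python
def unified_schema(events):
    """Return every field name seen across the events, in first-seen order."""
    fields = []
    for event in events:
        for name in event:
            if name not in fields:
                fields.append(name)
    return fields


def normalize(events):
    """Project each event onto the unified schema; missing fields become None."""
    fields = unified_schema(events)
    return [{name: event.get(name) for name in fields} for event in events]


# Hypothetical events: the second one comes from a newer client that added a field.
events = [
    {"event": "play", "user_id": 1},
    {"event": "play", "user_id": 2, "client_version": "2.1"},
]
rows = normalize(events)
```

Older events stay queryable after the schema grows, because every row ends up with the same columns.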
HONU
[Diagram: Apps, Sensor Data, Click Stream, Location, …; HONU; S3]
• Built for Netflix scale
• 100+ billion events/day
• Automatic discovery, load balancing, fail-over
• Schema-less
• …
Collectors
[Diagram: Apps, Sensor Data, Click Stream, Location, …; Collect (×3); S3]
• Team
• Event format
• Protocol
• Discovery
• Load balancing
• Fail-over
• Kinesis/Kafka/Scribe …
• …
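The load-balancing and fail-over concerns listed above can be sketched as a client that rotates through a list of collectors and skips any that refuse the event. This is an illustrative sketch only: the class, the collector names, and the fake transport are assumptions, not the Honu or CaliStream implementation.

```python
import itertools


class CollectorPool:
    """Round-robin over collectors, failing over to the next one on error."""

    def __init__(self, collectors):
        self._collectors = list(collectors)
        self._cycle = itertools.cycle(range(len(self._collectors)))

    def send(self, event, transport):
        # Try each collector at most once per event, starting from the next in rotation.
        for _ in range(len(self._collectors)):
            collector = self._collectors[next(self._cycle)]
            try:
                transport(collector, event)
                return collector
            except ConnectionError:
                continue  # fail over to the next collector
        raise ConnectionError("no collector accepted the event")


# Hypothetical transport: collector-1 is down, the others accept everything.
DOWN = {"collector-1"}


def fake_transport(collector, event):
    if collector in DOWN:
        raise ConnectionError(f"{collector} is unreachable")


pool = CollectorPool(["collector-1", "collector-2", "collector-3"])
used = [pool.send({"n": i}, fake_transport) for i in range(3)]
```

Discovery is left out of the sketch; in practice the collector list would come from a registry rather than being hard-coded.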
[Diagram: Apps, Sensor Data, Click Stream, Location, …; Collectors; S3]
• Hadoop Team
• Hadoop knowledge
• Hive
• Schema evolution
• Upgrades
• File optimization
• …
[Diagram: Apps, Sensor Data, Click Stream, Location, …; Collectors; S3]
Time to market > 18 months!
CaliStream: Take Control of Your Data
[Diagram: Apps, Sensor Data, Click Stream, Location, …; HONU; S3]
CaliStream: Honu as a Service
Time to market (Hours)
Our Solution: CaliStream
CaliStream provides a SaaS data processing pipeline to easily stream large volumes of events from your applications directly to Hive/Hadoop in a robust, scalable, and cost-effective way, without any prior Hadoop knowledge.
CaliStream, Native Hive Integration
[Diagram: Sensor Data, Social, Click Stream, Location, logs, …; CaliStream; Big Data]
CaliStream
Everything in CaliStream is represented as a Hive table, so you can easily analyze your data:
SELECT […]
FROM […]
WHERE […]
GROUP BY […];
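The query skeleton above can be illustrated with a small runnable sketch. It uses Python's built-in sqlite3 module as a stand-in for Hive, since the SELECT / FROM / WHERE / GROUP BY shape is the same; the table name, columns, and rows are hypothetical and chosen only to echo the client metrics mentioned earlier in the deck.

```python
import sqlite3

# In-memory database standing in for a Hive table of client events (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE client_events (client_version TEXT, error_code INTEGER, ping_ms REAL)"
)
conn.executemany(
    "INSERT INTO client_events VALUES (?, ?, ?)",
    [
        ("1.0", 0, 40.0),
        ("1.0", 500, 120.0),
        ("2.0", 500, 80.0),
        ("2.0", 500, 100.0),
    ],
)

# Count errors and average ping latency per client version.
query = """
    SELECT client_version, COUNT(*) AS errors, AVG(ping_ms) AS avg_ping_ms
    FROM client_events
    WHERE error_code <> 0
    GROUP BY client_version
    ORDER BY client_version
"""
rows = conn.execute(query).fetchall()
```

Against a real Hive table the statement would be submitted through a Hive client instead, but the analysis step reads the same way.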
CaliStream
Hive Table
Java API
Generic REST API + JSON
CaliStream
Hive Table
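As a hedged sketch of what sending an event through a generic REST + JSON interface might look like, the snippet below builds (but does not send) an HTTP POST carrying a JSON event. The host, URL path, table name, and event fields are all hypothetical; the slides do not document the actual CaliStream endpoint.

```python
import json
import urllib.request


def build_event_request(base_url, table, event):
    """Build a POST request whose body is the event serialized as JSON."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/tables/{table}/events",  # hypothetical URL layout
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical collector host and event; nothing is sent over the network here.
request = build_event_request(
    "https://collector.example.invalid",
    "client_events",
    {"error_code": 500, "client_version": "2.0"},
)
```

Passing the request to urllib.request.urlopen would perform the call; each accepted event would then presumably appear as a row in the matching Hive table.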
HTTPRestProxy
Collectors
BI Analytics
Redshift
CaliStream
EchoService
AppsClusters
CaliStream.com: Take Control of Your Data
Jerome Boulon
jboulon@caliStream.com
Backup Slides
BeaconServers
Collectors
BI Analytics
Redshift
AppsClusters
*
Ownership
CaliStream
* Runs on your account
Connectors
Your Company
Your Company
Your S3 Bucket
IoT Use Cases
IoT Use Case 1
Use Case: Global Boat Tracking System
• Boat information
• Telemetry data
• Etc.
IoT Use Case 2
Use Case: Small company in Montreal
• Business model: Sensor data acquisition for cities worldwide (Montreal, Toronto, Paris, …)
[Diagram: Database Montreal, Database City 2, Database City n]
Problems
• Data silos: each city is self-contained and managed with its own deployment
• Strict data access and security rules that limit business opportunities
• Deriving value from larger data sets is a tedious and manual task
• Multiple copies of the same data
• Etc.
Customer Initial Architecture
Database
Sensors
Sensor Listener Service
CaliStream Integration (< 1 Week)
Sensors
Sensor Listener Service
CaliStream
Database
Table-1 Table-2 Table-n
S3 Data Warehouse
Hive
CaliStream Client SDK
CaliStream Events
Unified S3 Data Warehouse
Big Data: The Value

How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 

Data to Drive Decision-Making - CaliStream Meetup

  • 1. Take Control of Your Data: CaliStream.com © CaliStream.com 2015. Data to Drive Decision-Making, 2015-03-03, Jerome Boulon
  • 2. Quick History: 1999, 2008, 2009, 2010, 2012. Yahoo!: Chukwa, Hadoop Monitoring Solution; Netflix: Honu, Data Collection Pipeline; CaliStream: Founder, Honu: Data As a Service; Monitoring Solution for cable modems/TV network; Ontology Search, acquired by Microsoft
  • 3. Agenda • Agenda • Decision-Making, the process • Netflix Recommendation • Big Data @ Riot Games • Data Pipeline • Conclusion
  • 4. Decision-Making, The Process • Explicit • Theory Driven • Data-driven • Measurable Outcomes • Iterative. Prove a hypothesis right (or wrong). Want result AND explanation
  • 5. Iterative: Hypothesis, Gather Data, Analyze Data
  • 6. Offline/Online Testing: Offline Testing, Online A/B Testing, Roll-out To Prod (Fail, Success, Success; Iterations, Iterations)
  • 7. Netflix Recommendation
  • 8. Netflix Recommendation. Presentation 1: 1/3, 0/3, 0+/3, 1+/3. Presentation 2: 1/3, 1/3, 0+/3, 2+/3
  • 9. The Data: Search, Time, Rating, Impression, Demographic, Social, User Behavior, Geo Information. Offline Testing, Online A/B Testing, Roll-out To Prod (Fail, Success, Success)
  • 10. The Gaming Space
  • 11. Use Case: Honu @ • Error code • AVG Ping latency • AVG Queue Wait time • Client Version • Operating System • Hardware Profile • Etc
  • 12. The Data Pipeline
  • 13. Data Pipeline • Direct correlation to the success of your Big Data project • Structured and Multi-Structured Events • Schema evolution • Collecting massive amounts of data live • 60% of BI project resources are consumed here! • Most “underestimated” and “unsexy” but MOST important phase
  • 14. HONU (diagram labels: S3, Apps, Sensor Data, Click Stream, Location, …) • Built for Netflix scale • 100+ billion events/day • Automatic Discovery, Load-balancing, Fail-over • Schema-less • …
  • 15. Collectors (diagram labels: S3, Apps, Sensor Data, Click Stream, Location, …, Collect ×3) • Team • Event format • Protocol • Discovery • Load balancing • Fail-Over • Kinesis/Kafka/Scribe … • …
  • 16. Collectors (diagram labels: S3, Apps, Sensor Data, Click Stream, Location, …, Collect ×3) • Hadoop Team • Hadoop knowledge • Hive • Schema evolution • Upgrades • Files optimization • …
  • 17. Collectors (diagram labels: S3, Apps, Sensor Data, Click Stream, Location, …, Collect ×3). Time to market > 18 Months!
  • 18. CaliStream: Take Control of Your Data
  • 19. CaliStream: Honu as a Service (diagram labels: S3, Apps, Sensor Data, Click Stream, Location, …, HONU). Time to market (Hours)
  • 20. Our Solution: CaliStream. CaliStream provides a SaaS data processing pipeline to easily stream large volumes of events from your applications directly to Hive/Hadoop in a robust, scalable and cost-effective way, without any prior Hadoop knowledge
  • 21. CaliStream, Native Hive integration (diagram labels: Big Data, Sensor Data, Social, Click Stream, Location, logs, …, CaliStream)
  • 22. CaliStream: Everything in CaliStream is represented as a Hive table so you can easily analyze your data: Select […] from […] where […] group by […];
  • 23. CaliStream Hive Table: Java API
  • 24. CaliStream Hive Table: Generic REST API + JSON
  • 25. Diagram labels: HTTPRestProxy, Collectors, BI Analytics, Redshift, CaliStream, EchoService, AppsClusters
  • 26. CaliStream.com: Take Control of Your Data. Jerome Boulon, jboulon@caliStream.com
  • 27. Backup Slides
  • 28. Diagram labels: BeaconServers, Collectors, BI Analytics, Redshift, AppsClusters, Connectors, Your Company, Your Company, Your S3 Bucket. * Ownership CaliStream * Run on your account
  • 29. IoT Use Cases
  • 30. IoT Use Case 1
  • 31. Use Case: Global Boats tracking system • Boats information • Telemetry data • etc
  • 32. IoT Use Case 2
  • 33. Use case: Small company in Montreal • Business model: Sensor data acquisition for cities worldwide (Montreal, Toronto, Paris, …). Database Montreal, Database City 2, Database City n
  • 34. Problems • Data silos: each city is self-contained, managed with its own deployment • Strict data access and security rules that limit business opportunities • Deriving value from a larger data set is a tedious and manual task • Multiple copies of the same data • Etc
  • 35. Customer Initial Architecture: Sensors, Sensor Listener Service, Database
  • 36. CaliStream Integration (< 1 Week): Sensors, Sensor Listener Service, CaliStream Client SDK, CaliStream Events, CaliStream, Database (Table-1, Table-2, Table-n), S3 Data Warehouse, Hive. Unified S3 Data warehouse. Big Data: The Value
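
The slides describe events arriving through a generic REST API as JSON and then being exposed as a Hive table that can be queried with a select / where / group by statement. The deck shows no concrete payload or schema, so the following is only a minimal sketch of that idea under stated assumptions: the table name `client_events`, the field names (loosely modeled on the metrics listed in the Honu use case), and the payload layout are all hypothetical, and Python's built-in `sqlite3` stands in for Hive so the example can run locally. It is not CaliStream's API.

```python
import json
import sqlite3

# Hypothetical events; the field names are illustrative assumptions,
# loosely modeled on the metrics listed in the Honu use-case slide.
events = [
    {"client_version": "1.0", "os": "Windows", "error_code": 0, "ping_ms": 40},
    {"client_version": "1.0", "os": "Mac", "error_code": 503, "ping_ms": 120},
    {"client_version": "1.1", "os": "Windows", "error_code": 0, "ping_ms": 60},
]


def to_json_payload(batch):
    """Serialize a batch of events as a JSON body, as a REST client might.

    The {"table": ..., "events": ...} layout is an assumption for this sketch.
    """
    return json.dumps({"table": "client_events", "events": batch})


def load_into_table(conn, payload):
    """Parse the JSON payload and append its events to a SQL table.

    sqlite3 is used here as a local stand-in for a Hive table.
    """
    doc = json.loads(payload)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS client_events "
        "(client_version TEXT, os TEXT, error_code INTEGER, ping_ms REAL)"
    )
    conn.executemany(
        "INSERT INTO client_events VALUES "
        "(:client_version, :os, :error_code, :ping_ms)",
        doc["events"],
    )


def avg_ping_by_version(conn):
    """Run a SELECT ... FROM ... WHERE ... GROUP BY query over the events."""
    rows = conn.execute(
        "SELECT client_version, AVG(ping_ms) FROM client_events "
        "WHERE error_code = 0 GROUP BY client_version ORDER BY client_version"
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
load_into_table(conn, to_json_payload(events))
print(avg_ping_by_version(conn))
```

The point of the sketch is the shape of the workflow: producers only serialize events to JSON, and analysts only write SQL against a table, with no pipeline-specific knowledge needed on either side.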