SlideShare a Scribd company logo
1 of 74
Sudhir Tonse (@stonse)
Danny Yuan (@g9yuayon)
Big Data Pipeline and Analytics Platform
Using NetflixOSS and Other Open Source Software
Data Is the most important asset
at Netflix
If all the data is easily available to all
teams, it can be leveraged in new and
exciting ways
~1000 Device Types
~500 Apps/Web Services
~100 Billion Events/Day
3.2M messages per
second at peak time
3GB per second at peak
time
Dashboard
Type of Events
• User Interface Events
• Search Event (‘Matrix’ using PS3 …)
• Star Rating Event (HoC : 5 stars, Xbox, US, …)
• Infrastructural Events
• RPC Call (API -> Billing Service, ‘/bill/..’, 200, …)
• Log Errors (NPE, “Movie is null”, …, …)
• Other Events …
Making Sense of Billions of Events
http://netflix.github.io
+
A Humble Beginning
Evolution …Scale!
Application
Application
Application Application
Application
Application
Application
Application
ApplicationApplication
We Want to Process
App Data in Hadoop
Our Hadoop Ecosystem
@NetflixOSS Big Data Tools
Hadoop as a Service
Pig Scripting on Steroids
Pig Married to Clojure
“Map-Reduce for Clojure”
S3MPER
S3mper is a library that provides an additional layer of consistency checking on top of Amazon's S3 index through use of a consistent, secondary index.
S3mper is a library that provides an
additional layer of consistency
checking on top of Amazon's S3 index through
use of a consistent, secondary index.
Efficient ETL with Cassandra
Cassandra
Offline Analysis
Evolution … Speed!
We Want to Aggregate, Index, and
Query Data in Real Time
Interactive Exploration
Let’s walk through some use cases
client activity event
*
/name = “movieStarts”
Pipeline Challenges
• App owners: send and forget
• Data scientists: validation, ETL, batch
processing
• DevOps: stream processing, targeted search
Message Routing
We Want to Consume Data
Selectively in Different Ways
• Message broker
• High-throughput
• Persistent and replicated
There Is More
Intelligent Alerts
Intelligent Alerts
Guided Debugging in the Right Context
Guided Debugging in the Right Context
Guided Debugging in the Right Context
• Ad-hoc query with different dimensions
• Quick aggregations and Top-N queries
• Time series with flexible filters
• Quick access to raw data using boolean
queries
What We Need
Druid
• Rapid exploration of high dimensional data
• Fast ingestion and querying
• Time series
• Real-time indexing of event streams
• Killer feature: boolean search
• Great UI: Kibana
The Old Pipeline
The New Pipeline
There Is More
It’s Not All About Counters and Time Series
RequestId Parent Id Node Id Service Name Status
4965-4a74 0 123 Edge Service 200
4965-4a74 123 456 Gateway 200
4965-4a74 456 789 Service A 200
4965-4a74e 456 abc Service B 200
Status:200
Distributed Tracing
Distributed Tracing
Distributed Tracing
A System that Supports All These
A Data Pipeline To Glue
Them All
Make It Simple
Message Producing
• Simple and Uniform API
• messageBus.publish(event)
Consumption Is Simple Too
consumer.observe().subscribe(new Subscriber<>() {
@Override
public void onNext(Ackable<IncomingMessage> ackable) {
process(ackable.getEntity(MyEventType.class));
ackable.ack();
}
});
consumer.pause();
consumer.resume()
RxJava
• Functional reactive programming model
• Powerful streaming API
• Separation of logic and threading model
Design Decisions
• Top Priority: app stability and throughput
• Asynchronous operations
• Aggressive buffering
• Drops messages if necessary
Anything Can Fail
Cloud Resiliency
Fault Tolerance Features
• Write and forward with auto-reattached EBS
(Amazon’s Elastic Block Storage)
• disk-backed queue: big-queue
• Customized scaling down
There’s More to Do
• Contribute to @NetflixOSS
• Join us :-)
Summary
http://netflix.github.io
+
You can build your own web-scale data
pipeline using open source components
Thank You!
Sudhir Tonse
http://www.linkedin.com/in/sudhirtonse
Twitter: @stonse
Danny Yuan
http://www.linkedin.com/pub/danny-
yuan/4/374/862
Twitter: @g9yuayon

More Related Content

What's hot

Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteGigaom
 
(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014
(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014
(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014Amazon Web Services
 
Presto Talk @ Hadoop Summit'15
Presto Talk @ Hadoop Summit'15Presto Talk @ Hadoop Summit'15
Presto Talk @ Hadoop Summit'15Nezih Yigitbasi
 
QCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uberQCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uberDanny Yuan
 
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012Amazon Web Services
 
AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)
AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)
AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)Amazon Web Services
 
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...Amazon Web Services
 
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017Interactive Analytics on AWS - AWS Summit Tel Aviv 2017
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017Amazon Web Services
 
Scaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique VisitorsScaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique VisitorsYelp Engineering
 
Scalable Deep Learning on AWS Using Apache MXNet - AWS Summit Tel Aviv 2017
Scalable Deep Learning on AWS Using Apache MXNet - AWS Summit Tel Aviv 2017Scalable Deep Learning on AWS Using Apache MXNet - AWS Summit Tel Aviv 2017
Scalable Deep Learning on AWS Using Apache MXNet - AWS Summit Tel Aviv 2017Amazon Web Services
 
Earth Observation in the Cloud using ENVI
Earth Observation in the Cloud using ENVIEarth Observation in the Cloud using ENVI
Earth Observation in the Cloud using ENVIAmazon Web Services
 
Fast Data at Scale - AWS Summit Tel Aviv 2017
Fast Data at Scale - AWS Summit Tel Aviv 2017Fast Data at Scale - AWS Summit Tel Aviv 2017
Fast Data at Scale - AWS Summit Tel Aviv 2017Amazon Web Services
 
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidPulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidTony Ng
 
Next Generation Big Data Platform at Netflix 2014
Next Generation Big Data Platform at Netflix 2014Next Generation Big Data Platform at Netflix 2014
Next Generation Big Data Platform at Netflix 2014Eva Tse
 
The Fermilab HEPCloud Facility
The Fermilab HEPCloud FacilityThe Fermilab HEPCloud Facility
The Fermilab HEPCloud FacilityClaudio Pontili
 
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Amazon Web Services
 
Presto @ Netflix: Interactive Queries at Petabyte Scale
Presto @ Netflix: Interactive Queries at Petabyte ScalePresto @ Netflix: Interactive Queries at Petabyte Scale
Presto @ Netflix: Interactive Queries at Petabyte ScaleDataWorks Summit
 
High Performance Computing Implementation on AWS
High Performance Computing Implementation on AWSHigh Performance Computing Implementation on AWS
High Performance Computing Implementation on AWSAmazon Web Services
 
20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws20181027 deep learningcommunity_aws
20181027 deep learningcommunity_awsHirokuni Uchida
 

What's hot (20)

Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
 
(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014
(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014
(BDT403) Netflix's Next Generation Big Data Platform | AWS re:Invent 2014
 
Presto Talk @ Hadoop Summit'15
Presto Talk @ Hadoop Summit'15Presto Talk @ Hadoop Summit'15
Presto Talk @ Hadoop Summit'15
 
QCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uberQCon SF-2015 Stream Processing in uber
QCon SF-2015 Stream Processing in uber
 
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012
BDT303 Data Science with Elastic MapReduce - AWS re: Invent 2012
 
AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)
AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)
AWS re:Invent 2016: Tableau Rules of Engagement in the Cloud (STG306)
 
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
(ADV402) Beating the Speed of Light with Your Infrastructure in AWS | AWS re:...
 
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017Interactive Analytics on AWS - AWS Summit Tel Aviv 2017
Interactive Analytics on AWS - AWS Summit Tel Aviv 2017
 
Scaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique VisitorsScaling Traffic from 0 to 139 Million Unique Visitors
Scaling Traffic from 0 to 139 Million Unique Visitors
 
Scalable Deep Learning on AWS Using Apache MXNet - AWS Summit Tel Aviv 2017
Scalable Deep Learning on AWS Using Apache MXNet - AWS Summit Tel Aviv 2017Scalable Deep Learning on AWS Using Apache MXNet - AWS Summit Tel Aviv 2017
Scalable Deep Learning on AWS Using Apache MXNet - AWS Summit Tel Aviv 2017
 
Earth Observation in the Cloud using ENVI
Earth Observation in the Cloud using ENVIEarth Observation in the Cloud using ENVI
Earth Observation in the Cloud using ENVI
 
Fast Data at Scale - AWS Summit Tel Aviv 2017
Fast Data at Scale - AWS Summit Tel Aviv 2017Fast Data at Scale - AWS Summit Tel Aviv 2017
Fast Data at Scale - AWS Summit Tel Aviv 2017
 
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and DruidPulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
Pulsar: Real-time Analytics at Scale with Kafka, Kylin and Druid
 
Next Generation Big Data Platform at Netflix 2014
Next Generation Big Data Platform at Netflix 2014Next Generation Big Data Platform at Netflix 2014
Next Generation Big Data Platform at Netflix 2014
 
The Fermilab HEPCloud Facility
The Fermilab HEPCloud FacilityThe Fermilab HEPCloud Facility
The Fermilab HEPCloud Facility
 
BDA304 Data-Driven Post Mortems
BDA304 Data-Driven Post MortemsBDA304 Data-Driven Post Mortems
BDA304 Data-Driven Post Mortems
 
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
Running Fast, Interactive Queries on Petabyte Datasets using Presto - AWS Jul...
 
Presto @ Netflix: Interactive Queries at Petabyte Scale
Presto @ Netflix: Interactive Queries at Petabyte ScalePresto @ Netflix: Interactive Queries at Petabyte Scale
Presto @ Netflix: Interactive Queries at Petabyte Scale
 
High Performance Computing Implementation on AWS
High Performance Computing Implementation on AWSHigh Performance Computing Implementation on AWS
High Performance Computing Implementation on AWS
 
20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws20181027 deep learningcommunity_aws
20181027 deep learningcommunity_aws
 

Viewers also liked

QConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemQConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemDanny Yuan
 
Quantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFSQuantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFSbigdatagurus_meetup
 
Stream Processing Frameworks
Stream Processing FrameworksStream Processing Frameworks
Stream Processing FrameworksSirKetchup
 
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...Sudhir Tonse
 
Netflix Cloud Platform Building Blocks
Netflix Cloud Platform Building BlocksNetflix Cloud Platform Building Blocks
Netflix Cloud Platform Building BlocksSudhir Tonse
 
Netflix Edge Engineering Open House Presentations - June 9, 2016
Netflix Edge Engineering Open House Presentations - June 9, 2016Netflix Edge Engineering Open House Presentations - June 9, 2016
Netflix Edge Engineering Open House Presentations - June 9, 2016Daniel Jacobson
 
Architecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash WorkshopArchitecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash WorkshopSudhir Tonse
 
Machine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleMachine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleXavier Amatriain
 
Past present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectivePast present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectiveXavier Amatriain
 
S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?Hortonworks
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Xavier Amatriain
 
Pros and Cons of a MicroServices Architecture talk at AWS ReInvent
Pros and Cons of a MicroServices Architecture talk at AWS ReInventPros and Cons of a MicroServices Architecture talk at AWS ReInvent
Pros and Cons of a MicroServices Architecture talk at AWS ReInventSudhir Tonse
 
MicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleMicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleSudhir Tonse
 
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...Amazon Web Services
 
Staying Shallow & Lean in a Deep Learning World
Staying Shallow & Lean in a Deep Learning WorldStaying Shallow & Lean in a Deep Learning World
Staying Shallow & Lean in a Deep Learning WorldXavier Amatriain
 
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance DataWorks Summit/Hadoop Summit
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014P. Taylor Goetz
 

Viewers also liked (19)

QConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing systemQConSF 2014 talk on Netflix Mantis, a stream processing system
QConSF 2014 talk on Netflix Mantis, a stream processing system
 
Quantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFSQuantcast File System (QFS) - Alternative to HDFS
Quantcast File System (QFS) - Alternative to HDFS
 
Stream Processing Frameworks
Stream Processing FrameworksStream Processing Frameworks
Stream Processing Frameworks
 
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
Big Data Pipeline and Analytics Platform Using NetflixOSS and Other Open Sour...
 
Netflix Cloud Platform Building Blocks
Netflix Cloud Platform Building BlocksNetflix Cloud Platform Building Blocks
Netflix Cloud Platform Building Blocks
 
Netflix Edge Engineering Open House Presentations - June 9, 2016
Netflix Edge Engineering Open House Presentations - June 9, 2016Netflix Edge Engineering Open House Presentations - June 9, 2016
Netflix Edge Engineering Open House Presentations - June 9, 2016
 
Architecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash WorkshopArchitecting for the Cloud using NetflixOSS - Codemash Workshop
Architecting for the Cloud using NetflixOSS - Codemash Workshop
 
Machine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora ExampleMachine Learning for Q&A Sites: The Quora Example
Machine Learning for Q&A Sites: The Quora Example
 
Past present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry PerspectivePast present and future of Recommender Systems: an Industry Perspective
Past present and future of Recommender Systems: an Industry Perspective
 
HDFS Erasure Coding in Action
HDFS Erasure Coding in Action HDFS Erasure Coding in Action
HDFS Erasure Coding in Action
 
S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?S3Guard: What's in your consistency model?
S3Guard: What's in your consistency model?
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 
Pros and Cons of a MicroServices Architecture talk at AWS ReInvent
Pros and Cons of a MicroServices Architecture talk at AWS ReInventPros and Cons of a MicroServices Architecture talk at AWS ReInvent
Pros and Cons of a MicroServices Architecture talk at AWS ReInvent
 
MicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scaleMicroServices at Netflix - challenges of scale
MicroServices at Netflix - challenges of scale
 
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
AWS re:Invent 2016: Netflix: Using Amazon S3 as the fabric of our big data ec...
 
Staying Shallow & Lean in a Deep Learning World
Staying Shallow & Lean in a Deep Learning WorldStaying Shallow & Lean in a Deep Learning World
Staying Shallow & Lean in a Deep Learning World
 
Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance Debunking the Myths of HDFS Erasure Coding Performance
Debunking the Myths of HDFS Erasure Coding Performance
 
Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014Scaling Apache Storm - Strata + Hadoop World 2014
Scaling Apache Storm - Strata + Hadoop World 2014
 
Culture
CultureCulture
Culture
 

Similar to Big Data Pipeline and Analytics Platform

DevOps in the Amazon Cloud – Learn from the pioneersNetflix suro
DevOps in the Amazon Cloud – Learn from the pioneersNetflix suroDevOps in the Amazon Cloud – Learn from the pioneersNetflix suro
DevOps in the Amazon Cloud – Learn from the pioneersNetflix suroGaurav "GP" Pal
 
What's New in 6.3 + Data On-Boarding
What's New in 6.3 + Data On-BoardingWhat's New in 6.3 + Data On-Boarding
What's New in 6.3 + Data On-BoardingSplunk
 
D3SF17- Improving Our China Clients Performance
D3SF17- Improving Our China Clients PerformanceD3SF17- Improving Our China Clients Performance
D3SF17- Improving Our China Clients PerformanceImperva Incapsula
 
Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...
Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...
Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...HostedbyConfluent
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumarconfluent
 
Distributed dataintelligence
Distributed dataintelligenceDistributed dataintelligence
Distributed dataintelligencewww.ixxo.io
 
Don't think DevOps think Compliant Database DevOps
Don't think DevOps think Compliant Database DevOpsDon't think DevOps think Compliant Database DevOps
Don't think DevOps think Compliant Database DevOpsRed Gate Software
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDogRedis Labs
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriDemi Ben-Ari
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time AnalyticsAmazon Web Services
 
Setting up InfluxData for IoT
Setting up InfluxData for IoTSetting up InfluxData for IoT
Setting up InfluxData for IoTInfluxData
 
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard confluent
 
Building a modern in-house analytics pipeline
Building a modern in-house analytics pipelineBuilding a modern in-house analytics pipeline
Building a modern in-house analytics pipelineSergey Burkov
 
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...Precisely
 
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Maya Lumbroso
 
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Dataconomy Media
 
Data Onboarding Breakout Session
Data Onboarding Breakout SessionData Onboarding Breakout Session
Data Onboarding Breakout SessionSplunk
 
Azure Event Grid: Glue for the Internet
Azure Event Grid: Glue for the InternetAzure Event Grid: Glue for the Internet
Azure Event Grid: Glue for the InternetJeremy Likness
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.Data Con LA
 

Similar to Big Data Pipeline and Analytics Platform (20)

DevOps in the Amazon Cloud – Learn from the pioneersNetflix suro
DevOps in the Amazon Cloud – Learn from the pioneersNetflix suroDevOps in the Amazon Cloud – Learn from the pioneersNetflix suro
DevOps in the Amazon Cloud – Learn from the pioneersNetflix suro
 
What's New in 6.3 + Data On-Boarding
What's New in 6.3 + Data On-BoardingWhat's New in 6.3 + Data On-Boarding
What's New in 6.3 + Data On-Boarding
 
D3SF17- Improving Our China Clients Performance
D3SF17- Improving Our China Clients PerformanceD3SF17- Improving Our China Clients Performance
D3SF17- Improving Our China Clients Performance
 
Mesoscon 2015
Mesoscon 2015Mesoscon 2015
Mesoscon 2015
 
Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...
Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...
Mitigating One Million Security Threats With Kafka and Spark With Arun Janart...
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
 
Distributed dataintelligence
Distributed dataintelligenceDistributed dataintelligence
Distributed dataintelligence
 
Don't think DevOps think Compliant Database DevOps
Don't think DevOps think Compliant Database DevOpsDon't think DevOps think Compliant Database DevOps
Don't think DevOps think Compliant Database DevOps
 
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
Monitoring and Scaling Redis at DataDog - Ilan Rabinovitch, DataDog
 
Big Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-AriBig Data made easy in the era of the Cloud - Demi Ben-Ari
Big Data made easy in the era of the Cloud - Demi Ben-Ari
 
Getting Started with Real-time Analytics
Getting Started with Real-time AnalyticsGetting Started with Real-time Analytics
Getting Started with Real-time Analytics
 
Setting up InfluxData for IoT
Setting up InfluxData for IoTSetting up InfluxData for IoT
Setting up InfluxData for IoT
 
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
You Must Construct Additional Pipelines: Pub-Sub on Kafka at Blizzard
 
Building a modern in-house analytics pipeline
Building a modern in-house analytics pipelineBuilding a modern in-house analytics pipeline
Building a modern in-house analytics pipeline
 
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
How Precisely and Splunk Can Help You Better Manage Your IBM Z and IBM i Envi...
 
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
 
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
Ronan Corkery, kdb+ developer at Kx Systems: “Kdb+: How Wall Street Tech can ...
 
Data Onboarding Breakout Session
Data Onboarding Breakout SessionData Onboarding Breakout Session
Data Onboarding Breakout Session
 
Azure Event Grid: Glue for the Internet
Azure Event Grid: Glue for the InternetAzure Event Grid: Glue for the Internet
Azure Event Grid: Glue for the Internet
 
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
An evening with Jay Kreps; author of Apache Kafka, Samza, Voldemort & Azkaban.
 

Big Data Pipeline and Analytics Platform

Editor's Notes

  1. Something happened. Our traffic turned into a hockey stick, and the number of applications exploded. So, log traffic also exploded. Simple log scraping wouldn’t cut it any more.
  2. For one thing: interactive exploration. Sometimes we want to get data in real time so we can act quickly. Some data is only useful in a small time window after all. Sometimes we want to perform lots of experimental queries just to find the right insights. If we wait too long for a query back, we won’t be able to iterate fast enough. Either way, we need to get query results back in seconds.
  3. Here is one example: we process more than 150 thousand events per second about user activities. What if we’d like to know the geographically how many users started playing videos in the past 5 minutes? So I submit my query, and in a few seconds.... But this is an aggregated view. What if I want to drill down the data immediately along different dimensions? In this particular case, to find out failed attempts on our SilverLight players that run on PCs and Macs?
  4. Note this is different from alerting based on monitoring metrics. Monitoring metrics are great and versatile. But it doesn’t help us catch unexpected errors. When we build an application, we instrument our code diligently, yet it’s very likely we miss some critical instrumentation points. There’s one thing that we always catch, though: logged errors and unhandled exceptions. It’s about The alert provides a precise entrypoint and the right context for people to drill down the right problems
  5. Note this is different from alerting based on monitoring metrics. Monitoring metrics are great and versatile. But it doesn’t help us catch unexpected errors. When we build an application, we instrument our code diligently, yet it’s very likely we miss some critical instrumentation points. There’s one thing that we always catch, though: logged errors and unhandled exceptions. It’s about The alert provides a precise entrypoint and the right context for people to drill down the right problems