© Cloudera, Inc. All rights reserved.
13 June 2016
Ted Malaska | Principal Solutions Architect @ Cloudera
Pat Patterson | Community Champion @ StreamSets
Ingest and Stream Processing -
What will you choose?
Watch the video with slide synchronization on InfoQ.com:
https://www.infoq.com/presentations/ingest-stream-processing
Presented at QCon New York
www.qconnewyork.com

About Ted and Pat
Ted Malaska
• Principal Solutions Architect @ Cloudera
• Apache HBase SparkOnHBase Contributor
• Contact
  • ted.malaska@cloudera.com
  • @TedMalaska
Pat Patterson
• Community Champion @ StreamSets
• Formerly Developer Evangelist at Salesforce
• Contact
  • pat@streamsets.com
  • @metadaddy

Streaming Patterns
• Ingestion
• Low Millisecond Actions
• Near Real Time Complex Actions

Parts Of Streaming
Producer -> Kafka -> Engine -> Destination

Parts Of Streaming
Producer -> Kafka -> Engine -> Destination
At Least Once
Ordered
Partitioned
At Least Once
Depends
Depends

Destinations
• File Systems: example HDFS
  • Batch is good
  • Can only do exactly once if a file is closed in a single ack
  • Good for scans
• Solr
  • Everything is document based, which makes exactly once possible
  • Batch is still good
  • Good for search queries

Destinations
• NoSQL: example HBase
  • Everything has a row key, which makes writes exactly once
  • Increments can be applied twice, so be careful
  • Good for gets and puts
• Kudu
  • Everything has a row key, which makes writes exactly once
  • Good for gets, puts, and scans
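The difference between row-key writes and increments under at-least-once delivery can be illustrated with a small sketch. This is plain Python, not HBase or Kudu client code: `RowStore`, `deliver_at_least_once`, and the sample messages are hypothetical stand-ins chosen only to show why a replayed put is harmless while a replayed increment double counts.

```python
# Hypothetical in-memory stand-in for a row-key store such as HBase or Kudu.
class RowStore:
    def __init__(self):
        self.rows = {}

    def put(self, row_key, value):
        # A put overwrites the row, so replaying the same message is harmless.
        self.rows[row_key] = value

    def increment(self, row_key, amount):
        # An increment depends on the current value, so a replay double counts.
        self.rows[row_key] = self.rows.get(row_key, 0) + amount


def deliver_at_least_once(messages):
    # Simulate a redelivery: the first message arrives twice.
    return [messages[0]] + list(messages)


messages = [("order-1", 10), ("order-2", 5)]
delivered = deliver_at_least_once(messages)

put_store = RowStore()
for key, value in delivered:
    put_store.put(key, value)

inc_store = RowStore()
for _key, value in delivered:
    inc_store.increment("total", value)

print(put_store.rows)  # each order is stored once despite the duplicate
print(inc_store.rows)  # the total includes the duplicated message
```

With the duplicate in place, the put-based store ends up with one row per order, while the incremented total is 25 rather than the intended 15.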

Ingestion Destinations
• File Systems: example HDFS
  • Flume
  • Kafka Connect
• Solr
  • Flume
  • Any Streaming Engine

Ingestion Destinations
• NoSQL: example HBase
  • Flume
  • Any Streaming Engine: Storm and Spark Streaming Tested
• Kudu
  • Flume
  • Kafka Connect
  • Any Streaming Engine: Spark Streaming Tested

Tricks With Producers
• Send Source ID (requires Partitioning In Kafka)
• Seq
• UUID
• UUID plus time
• Partition on SourceID
• Watch out for repartitions and partition failovers
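One way to read the producer tricks above is sketched here in plain Python, with no Kafka client involved. The names `partition_for` and `Deduplicator`, the partition count, and the CRC32 hash are all assumptions for illustration (Kafka's own partitioner uses a different hash). The sketch assumes each source sends a monotonically increasing sequence number and that partitioning on the source ID keeps one source's messages in order, so a consumer can drop replays by remembering the last sequence seen per source.

```python
import zlib

NUM_PARTITIONS = 4  # assumed partition count, for illustration only


def partition_for(source_id, num_partitions=NUM_PARTITIONS):
    # Stable hash so a given source always maps to the same partition.
    return zlib.crc32(source_id.encode("utf-8")) % num_partitions


class Deduplicator:
    """Drops messages whose sequence number was already seen for a source."""

    def __init__(self):
        self.last_seq = {}

    def accept(self, source_id, seq):
        # Relies on per-source ordering, which partitioning on source ID gives.
        if seq <= self.last_seq.get(source_id, -1):
            return False
        self.last_seq[source_id] = seq
        return True


# ("sensor-a", 1) is delivered twice, as at-least-once delivery allows.
events = [("sensor-a", 0), ("sensor-a", 1), ("sensor-a", 1),
          ("sensor-b", 0), ("sensor-a", 2)]

dedup = Deduplicator()
accepted = [event for event in events if dedup.accept(*event)]
print(accepted)
```

This per-source bookkeeping is what the warning about repartitions and failovers refers to: if a source's messages move to a different partition or consumer, the last-seen state has to move with them or duplicates can slip through.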

Streaming Engines
• Consumer
  • Flume, KafkaConnect, Streaming Engine
• Storm
• Spark Streaming
• Flink
• Kafka Streams

Consumer: Flume, KafkaConnect
• Simple and Works
• Low latency
• High throughput
• Interceptors
• Transformations
• Alerting
• Ingestions

Consumer: Streaming Engines
• Not so great at HDFS Ingestion
• But great for record storage systems
  • HBase
  • Cassandra
  • Kudu
  • Solr
  • Elasticsearch

Storm
• Old Gen
• Low latency
• Low throughput
• At least once
• Around forever
• Topology Based

Spark Streaming
• The Juggernaut
• Higher Latency
• High Throughput
• Exactly Once
• SQL
• MLlib
• Highly used
• Easy to Debug/Unit Test
• Easy to transition from Batch
• Flow Language
• 600 commits in a month and about 100 meetups

Spark Streaming
[Diagram: Source -> Receiver -> RDD, one RDD per batch; the sequence of RDDs forms a DStream. Each batch (First Batch, Second Batch) runs Filter -> Count -> Print in a Single Pass.]
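The micro-batch flow in the diagram can be approximated in plain Python. This is a simulation of the idea rather than Spark code: `micro_batches`, `process_batch`, the batch size, and the sample records are all hypothetical. Each batch plays the role of one RDD, and filter and count happen in a single pass over it before the result is printed.

```python
def micro_batches(records, batch_size):
    # Cut the incoming records into fixed-size batches (one "RDD" per batch).
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def process_batch(batch):
    # Filter and count in a single pass over the batch.
    return sum(1 for record in batch if "ERROR" in record)


records = ["INFO a", "ERROR b", "ERROR c", "INFO d", "ERROR e"]

counts = []
for number, batch in enumerate(micro_batches(records, 3), start=1):
    count = process_batch(batch)
    counts.append(count)
    print(f"batch {number}: {count} error records")
```

Real Spark Streaming cuts batches by time interval rather than by record count; a fixed size is used here only to keep the sketch deterministic.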

Spark Streaming
[Diagram: Source -> Receiver -> RDD partitions, one RDD per batch in a DStream, with Filter -> Count in a Single Pass and results sent to Print. Batches are labeled Pre-first Batch, First Batch, and Second Batch; Stateful RDD 1 and Stateful RDD 2 carry state from one batch to the next.]
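A rough way to picture the stateful RDDs in this diagram is a running state that each batch reads and replaces. The sketch below is plain Python under that assumption, not the Spark API; `update_state` and the sample batches are hypothetical. The empty dictionary stands in for the state that exists before the first batch, and each batch produces a new state object instead of mutating the old one, loosely mirroring how one stateful RDD is derived from the previous one.

```python
def update_state(state, batch):
    # Build a new state from the previous state plus this batch's keys.
    new_state = dict(state)
    for key in batch:
        new_state[key] = new_state.get(key, 0) + 1
    return new_state


state = {}  # state before the first batch arrives
batches = [["a", "b", "a"], ["b", "c"]]

history = []
for batch in batches:
    state = update_state(state, batch)
    history.append(state)
    print(state)
```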

Flink
• I'm Better Than Spark, Why Doesn't Anyone Use Me
• Very much like Spark but not as feature rich
• Lower Latency
• Micro Batch -> ABS
  • Asynchronous Barrier Snapshotting
• Flow Language
• ~1/6th the commits and meetups
• But Slim loves it :)

Flink - ABS
[Diagram sequence: an Operator with a Buffer and two inputs, each carrying a barrier (1A and 1B).]
• Barrier 1A hit; Barrier 1B still behind
• Both barriers hit
• Barrier is combined and can move on
• Buffer can be flushed out
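The barrier sequence on these slides can be sketched as a small simulation. This is plain Python and a simplification, not Flink's implementation: `AligningOperator`, the channel names, and the record-count state are hypothetical, and the buffer is assumed to hold only records, never a later barrier. Records from an input whose barrier has already arrived are held in the buffer; once both barriers are in, the operator snapshots its state, forwards one combined barrier, and flushes the buffer.

```python
BARRIER = "BARRIER"


class AligningOperator:
    def __init__(self, channels):
        self.channels = set(channels)
        self.blocked = set()   # inputs whose barrier has already arrived
        self.buffer = []       # records held back from blocked inputs
        self.count = 0         # operator state: number of records processed
        self.snapshots = []    # state captured each time the barriers align
        self.output = []

    def _process(self, record):
        self.count += 1
        self.output.append(record)

    def receive(self, channel, item):
        if item == BARRIER:
            self.blocked.add(channel)
            if self.blocked == self.channels:
                # Both barriers hit: snapshot, emit one barrier, flush buffer.
                self.snapshots.append(self.count)
                self.output.append(BARRIER)
                self.blocked.clear()
                pending, self.buffer = self.buffer, []
                for record in pending:
                    self._process(record)
        elif channel in self.blocked:
            # This input is ahead of the barrier, so hold its records back.
            self.buffer.append(item)
        else:
            self._process(item)


op = AligningOperator(["A", "B"])
op.receive("A", "a1")
op.receive("A", BARRIER)   # barrier 1A hit, 1B still behind
op.receive("A", "a2")      # buffered until 1B arrives
op.receive("B", "b1")
op.receive("B", BARRIER)   # both barriers hit
print(op.output, op.snapshots)
```

The snapshot records only what was processed before the barrier on both inputs, which is why the record that arrived early on input A is held back and processed after the combined barrier moves on.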

Kafka Streams
• The new Kid on the Block
• When you only have Kafka
• Low Latency
• High Throughput
• Not exactly once
• Very Young
• Flow Language
• Very different hardware profile than others
• Not widely supported
• Not widely used
• Worries about separation of concerns

Summary about Engines
• Ingestion
  • Flume and KafkaConnect
• Super Real Time and Special
  • Consumer
• Counting, MLlib, SQL
  • Spark
• Maybe future and cool
  • Flink and KafkaStreams
• Odd man out
  • Storm

Abstractions (layered on top of Streaming Engines)
• Code Abstractions: Beam
• SQL Abstraction: SQL
• UI Abstraction: StreamSets

StreamSets Data Collector
Building a Higher Level, Open Source Tool

StreamSets Company Background
• Traditional and Big Data Founders
• Top tier Investors
• Strategic Partners
• Momentum to Date
  • Founded 2014; exited stealth 9/15
  • ~30 employees
  • Double-digit enterprise customers
  • 10,000 downloads

Thank you!
