1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Harnessing Data-in-Motion
with Hortonworks DataFlow
Apache NiFi, Kafka and Storm
Better Together
Bryan Bende
Sr. Software Engineer
Haimo Liu
Product Manager
Agenda
• Introduction to Hortonworks DataFlow
• Introduction to Apache projects
• Better together
• Best Practices
• Demo
Connected Data Platforms
HDF 2.0 – Data in Motion Platform
[Figure: Flow Management and Stream Processing, supported by Enterprise Services (Registries/Catalogs, Governance (Security/Compliance), Operations), Security, and Visualization; spans the edge, on premises, and the cloud.]
HDF 2.0 – Data in Motion Platform
[Figure: data in motion flowing to data at rest. Flow management: MiNiFi agents at IoT data sources feeding NiFi. Flow management + stream processing: NiFi, Kafka, and Storm. Destinations: AWS, Azure, Google Cloud, Hadoop, and others. Enterprise Services: Ambari, Ranger, and other services.]
Introduction to Apache Projects
What is Apache NiFi?
• Created to address the challenges of global enterprise dataflow
• Key features:
– Visual Command and Control
– Data Lineage (Provenance)
– Data Prioritization
– Data Buffering/Back-Pressure
– Control Latency vs. Throughput
– Secure Control Plane / Data Plane
– Scale Out Clustering
– Extensibility
Apache NiFi
What is Apache NiFi used for?
• Reliable and secure transfer of data between systems
• Delivery of data from sources to analytic platforms
• Enrichment and preparation of data:
– Conversion between formats
– Extraction/Parsing
– Routing decisions
What is Apache NiFi NOT used for?
• Distributed Computation
• Complex Event Processing
• Complex Rolling Window Operations
NiFi Terminology
FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)
Processor
• Performs the work, can access FlowFiles
Connection
• Links between processors
• Queues that can be dynamically prioritized
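To make the terminology concrete, here is a minimal Python sketch, not NiFi's actual Java API: a FlowFile as content plus key/value attributes, a processor as a function over FlowFiles, and a connection as a queue whose order is set by a prioritizer. The class names and the rule used here (a `priority` attribute where lower values go first, ties kept in arrival order) are illustrative assumptions, loosely in the spirit of NiFi's attribute-based prioritizer.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlowFile:
    """Hypothetical FlowFile: content bytes plus key/value attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


class Connection:
    """Hypothetical connection: a queue ordered by a 'priority' attribute (lower first)."""

    def __init__(self) -> None:
        self._heap: List[tuple] = []
        self._arrival = itertools.count()  # tie-breaker keeps FIFO order for equal priority

    def enqueue(self, ff: FlowFile) -> None:
        # Assumption: FlowFiles without the attribute are treated as priority 0.
        priority = int(ff.attributes.get("priority", "0"))
        heapq.heappush(self._heap, (priority, next(self._arrival), ff))

    def poll(self) -> Optional[FlowFile]:
        return heapq.heappop(self._heap)[2] if self._heap else None


def uppercase_processor(ff: FlowFile) -> FlowFile:
    """Hypothetical processor: transforms the content and records an attribute."""
    return FlowFile(ff.content.upper(), {**ff.attributes, "transformed": "true"})


queue = Connection()
for p, text in [("3", b"low"), ("1", b"high"), ("2", b"medium")]:
    queue.enqueue(FlowFile(text, {"priority": p}))

processed = []
while (ff := queue.poll()) is not None:
    processed.append(uppercase_processor(ff))
```

The queue hands FlowFiles to the processor in priority order (`high`, `medium`, `low`), and each output FlowFile keeps its original attributes plus the one the processor added.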
What is Apache Kafka?
• Distributed streaming platform that allows publishing and subscribing to streams of records
• Streams of records are organized into categories called topics
• Topics can be partitioned and/or replicated
• Records consist of a key, value, and timestamp
http://kafka.apache.org/intro
[Figure: multiple producers publish to a Kafka cluster; multiple consumers subscribe to it.]
Kafka: Anatomy of a Topic
[Figure: a topic with Partitions 0, 1, and 2; each partition is an ordered log of records numbered by offset starting at 0, with writes appended at the new end and older records at the other.]
• Partitioning allows topics to scale beyond a single machine/node
• Topics can also be replicated for high availability
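The anatomy above can be modeled in a few lines. This is a toy, single-process Python sketch under stated assumptions, not Kafka's implementation: each partition is an append-only list, a record's position in it is its offset, keyless records are spread round-robin (as the 0.9/0.10-era producers did; newer clients use a different default), and keyed records are hashed so one key always lands in the same partition. Replication is not modeled, and `zlib.crc32` is a stand-in hash (the Java client's default partitioner uses murmur2). The topic name and keys are made up.

```python
import zlib
from typing import List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    key: Optional[bytes]
    value: bytes
    timestamp_ms: int


class Topic:
    """Toy model of a Kafka topic: N append-only partition logs addressed by offset."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Record]] = [[] for _ in range(num_partitions)]
        self._next_rr = 0  # round-robin cursor for records without a key

    def _choose_partition(self, key: Optional[bytes]) -> int:
        n = len(self.partitions)
        if key is None:
            p = self._next_rr % n
            self._next_rr += 1
            return p
        # Same key -> same partition (crc32 is a stand-in for the real hash).
        return zlib.crc32(key) % n

    def append(self, record: Record) -> Tuple[int, int]:
        """Append at the "new" end of one partition; return (partition, offset)."""
        p = self._choose_partition(record.key)
        log = self.partitions[p]
        log.append(record)
        return p, len(log) - 1

    def read(self, partition: int, from_offset: int) -> List[Record]:
        """Reads never remove data; a consumer only tracks the next offset it wants."""
        return self.partitions[partition][from_offset:]


topic = Topic("truck-events", num_partitions=3)
positions = [topic.append(Record(None, b"e%d" % i, 1000 + i)) for i in range(4)]
keyed = [topic.append(Record(b"driver-10", b"speed", 2000 + i)) for i in range(2)]
```

Here `positions` is `[(0, 0), (1, 0), (2, 0), (0, 1)]`: offsets are per partition, so ordering is only guaranteed within a partition. Both keyed records land in the same partition at consecutive offsets.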
NiFi and Kafka Are Complementary
NiFi
Provides a dataflow solution
• Centralized management, from edge to core
• Great traceability: event-level data provenance starting when data is born
• Interactive command and control: real-time operational visibility
• Dataflow management, including prioritization, back pressure, and edge intelligence
• Visual representation of global dataflow
Kafka
Provides a durable stream store
• Low latency
• Distributed data durability
• Decentralized management of producers & consumers
What is Apache Storm?
• Distributed, low-latency, fault-tolerant stream processing platform.
• Provides processing guarantees (e.g., at-least-once processing via tuple acking; exactly-once with Trident).
• Key concepts include:
• Tuples
• Streams
• Spouts
• Bolts
• Topology
Storm - Tuples and Streams
• What is a Tuple?
– Fundamental data structure in Storm
– Named list of values that can be of any data type
• What is a Stream?
– An unbounded sequence of tuples
– The core abstraction in Storm; streams are what you “process” in Storm
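A rough Python analogy, with hypothetical names: a tuple is a list of values addressed by field name (Storm's Java `Tuple` exposes `getValueByField` for this), and a stream is an unbounded sequence, modeled here as a generator that never ends on its own. The field names `truck_id` and `speed` and the generated values are made up for illustration.

```python
import itertools
from typing import Any, Iterator, List


class StormTuple:
    """Hypothetical stand-in for a Storm tuple: an ordered list of values with field names."""

    def __init__(self, fields: List[str], values: List[Any]) -> None:
        if len(fields) != len(values):
            raise ValueError("each value needs exactly one field name")
        self._index = {name: i for i, name in enumerate(fields)}
        self.values = list(values)

    def get_value_by_field(self, name: str) -> Any:
        return self.values[self._index[name]]


def speed_stream() -> Iterator[StormTuple]:
    """An unbounded stream: this generator keeps yielding tuples forever."""
    for n in itertools.count():
        yield StormTuple(["truck_id", "speed"], [n % 3, 60 + n % 10])


# Processing code only ever looks at a finite prefix of the stream at a time.
first_five = list(itertools.islice(speed_stream(), 5))
```

Because the stream never terminates, anything that consumes it must work incrementally (per tuple or per window) rather than waiting for "all" the data.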
Storm - Spouts
• What is a Spout?
– Source of data
– E.g., JMS, Twitter, log, and Kafka spouts
– Can spin up multiple instances of a spout and dynamically adjust as needed
Storm - Bolts
• What is a Bolt?
– Processes any number of input streams and produces output streams
– Common processing in bolts includes functions, aggregations, joins, reads/writes to data stores, and alerting logic
– Can spin up multiple instances of a bolt and dynamically adjust as needed
Storm - Topology
• What is a Topology?
– A network of spouts and bolts wired together into a workflow
Truck-Event-Processor Topology
[Figure: a Kafka Spout connected by streams to an HBase Bolt, a Monitoring Bolt, an HDFS Bolt, and a WebSocket Bolt.]
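The spout/bolt wiring can be sketched as a single-process Python model: a spout's output feeds every bolt subscribed to it, and a bolt's output can feed further bolts. Real Storm runs many instances of each component across worker processes and routes tuples with stream groupings; none of that is modeled here. The component names, the speed threshold, and the exact wiring are hypothetical and only loosely echo the truck topology above.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
BoltFn = Callable[[Record], Iterable[Record]]


class Topology:
    """Hypothetical in-process topology: named bolts subscribed to upstream components."""

    def __init__(self) -> None:
        self._bolts: Dict[str, BoltFn] = {}
        self._subscribers: Dict[str, List[str]] = {}

    def set_bolt(self, name: str, fn: BoltFn, source: str) -> None:
        """Register a bolt that consumes the stream emitted by `source`."""
        self._bolts[name] = fn
        self._subscribers.setdefault(source, []).append(name)

    def run(self, spout_name: str, spout_output: Iterable[Record]) -> Dict[str, List[Record]]:
        """Push every spout tuple through the wired bolts; return what each bolt emitted."""
        emitted: Dict[str, List[Record]] = {}
        pending = deque((spout_name, t) for t in spout_output)
        while pending:
            source, tup = pending.popleft()
            for bolt_name in self._subscribers.get(source, []):
                for out in self._bolts[bolt_name](tup):
                    emitted.setdefault(bolt_name, []).append(out)
                    pending.append((bolt_name, out))
        return emitted


def monitoring_bolt(t: Record) -> Iterable[Record]:
    # Emit an alert tuple only for fast events (80 is an arbitrary example threshold).
    if t["speed"] > 80:
        yield {"driver": t["driver"], "alert": "speeding", "speed": t["speed"]}


def storage_bolt(t: Record) -> Iterable[Record]:
    # Stand-in for an HDFS/HBase writer: pass the tuple through unchanged.
    yield t


def websocket_bolt(t: Record) -> Iterable[Record]:
    # Stand-in for pushing alerts to a dashboard.
    yield {"message": f"{t['driver']} at speed {t['speed']}"}


topology = Topology()
topology.set_bolt("monitoring", monitoring_bolt, source="kafka-spout")
topology.set_bolt("storage", storage_bolt, source="kafka-spout")
topology.set_bolt("websocket", websocket_bolt, source="monitoring")

events = [{"driver": "d1", "speed": s} for s in (70, 85, 90)]
result = topology.run("kafka-spout", events)
```

Every spout tuple reaches the storage bolt, only the two fast ones produce alerts, and the WebSocket-style bolt sees only what the monitoring bolt emitted.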
NiFi and Storm Are Complementary
NiFi
Simple event processing
• Manages the flow of data between producers and consumers across the enterprise
• Data enrichment, splitting, aggregation, format conversion, schema translation…
• Scales out to handle gigabytes per second, or scales down to a Raspberry Pi handling tens of thousands of events per second
Storm
Complex and distributed processing
• Complex processing from multiple streams (JOIN operations)
• Analyzing data across time windows (rolling window aggregation, standard deviation, etc.)
• Scales out to thousands of nodes if needed
Better Together
Key Integration Points
• NiFi - Kafka
– NiFi Kafka Producer
– NiFi Kafka Consumer
• Storm - Kafka
– Storm Kafka Consumer
– Storm Kafka Producer
Key Integration Points – NiFi & Kafka
[Figure: MiNiFi agents feed NiFi, which publishes to Kafka for Consumers 1 through N.]
• Producer Processors
• PutKafka (0.8 Kafka Client)
• PublishKafka (0.9 Kafka Client)
• PublishKafka_0_10 (0.10 Kafka Client)
Key Integration Points – NiFi & Kafka
[Figure: Producers 1 through N publish to Kafka; NiFi consumes from Kafka and delivers to Destinations 1, 2, and 3.]
• Consumer Processors
• GetKafka (0.8 Kafka Client)
• ConsumeKafka (0.9 Kafka Client)
• ConsumeKafka_0_10 (0.10 Kafka Client)
Key Integration Points – Storm & Kafka
• storm-kafka module
– KafkaSpout (Core & Trident) & KafkaBolt
– Compatible with the Kafka 0.8 and 0.9 clients
– Kafka client dependency declared by the topology developer
• storm-kafka-client module
– KafkaSpout & KafkaSpoutTuplesBuilder
– Compatible with the Kafka 0.9 and 0.10 clients
– Kafka client dependency declared by the topology developer
[Figure: in Storm, a KafkaSpout reads from the Incoming Topic in Kafka and a KafkaBolt writes to the Results Topic.]
Better Together
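[Figure: MiNiFi agents feed NiFi; NiFi's PublishKafka writes to the Incoming Topic in Kafka; Storm consumes it and writes to the Results Topic; NiFi's ConsumeKafka reads the results and delivers them to Destinations.]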
• MiNiFi – Collection, filtering, and prioritization at the edge
• NiFi - Central data flow management, routing, enriching, and transformation
• Kafka - Central messaging bus for subscription by downstream consumers
• Storm - Streaming analytics focused on complex event processing
Best Practices
NiFi PublishKafka
[Figure: NiFi Node 1 and NiFi Node 2 each run PublishKafka, with concurrent tasks writing to Topic 1, Partitions 1 and 2.]
• Each NiFi node runs an instance of PublishKafka
• Each instance has one or more concurrent tasks (threads)
• Each concurrent task is an independent producer and sends data round-robin to the partitions of a topic
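The per-task behavior can be simulated with a short Python sketch, under simplifying assumptions: every concurrent task is an independent producer with its own round-robin cursor starting at partition 0, and messages carry no key. Partitions are numbered from 0 here, as Kafka does, while the diagram numbers them from 1. The helper names are hypothetical.

```python
from typing import List


class RoundRobinProducer:
    """Hypothetical model of one concurrent task: an independent producer."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self._cursor = 0

    def next_partition(self) -> int:
        p = self._cursor % self.num_partitions
        self._cursor += 1
        return p


def messages_per_partition(nodes: int, tasks_per_node: int,
                           num_partitions: int, messages_per_task: int) -> List[int]:
    """Count how many messages land on each partition across all nodes and tasks."""
    counts = [0] * num_partitions
    for _node in range(nodes):
        for _task in range(tasks_per_node):
            producer = RoundRobinProducer(num_partitions)  # one producer per task
            for _ in range(messages_per_task):
                counts[producer.next_partition()] += 1
    return counts


# Two NiFi nodes, two concurrent tasks each, a two-partition topic.
print(messages_per_partition(nodes=2, tasks_per_node=2, num_partitions=2, messages_per_task=10))
# -> [20, 20]
```

Since each producer cycles through the partitions on its own, adding tasks adds producers without concentrating load on any one partition.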
NiFi ConsumeKafka – Nodes = Partitions
[Figure: NiFi Node 1 and NiFi Node 2 each run ConsumeKafka in consumer group 1; Topic 1 has Partitions 1 and 2, one per node.]
• Each NiFi node runs an instance of ConsumeKafka
• Each instance has one or more concurrent tasks (threads)
• Each concurrent task is a consumer assigned to a single partition
• The Kafka client ensures that a given partition is assigned to only one consumer (thread) within a consumer group
NiFi ConsumeKafka – Nodes > Partitions
[Figure: NiFi Nodes 1, 2, and 3 each run ConsumeKafka in consumer group 1; Topic 1 has only Partitions 1 and 2.]
• Remember… each partition can only have one consumer from the same group
• When there are more NiFi nodes than partitions, some nodes won’t consume anything
NiFi ConsumeKafka – Nodes < Partitions
[Figure: NiFi Node 1 and NiFi Node 2 each run ConsumeKafka in consumer group 1 with one concurrent task; Topic 1 has Partitions 1 through 4.]
• When there are fewer NiFi nodes/tasks than partitions, multiple partitions will be assigned to each node/task
NiFi ConsumeKafka – Tasks = Partitions
[Figure: NiFi Node 1 and NiFi Node 2 each run ConsumeKafka in consumer group 1 with two concurrent tasks; Topic 1 has Partitions 1 through 4.]
• When there are fewer NiFi nodes than partitions, we can increase the concurrent tasks on each node
• The Kafka client will automatically rebalance the partition assignment
• This improves throughput
NiFi ConsumeKafka – Tasks > Partitions
[Figure: NiFi Node 1 and NiFi Node 2 each run ConsumeKafka in consumer group 1 with multiple concurrent tasks; Topic 1 has only Partitions 1 and 2.]
• Increasing concurrent tasks only makes sense when the number of partitions is greater than the number of nodes
• Otherwise, some tasks end up not consuming anything
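The ConsumeKafka scenarios on these slides can be reproduced with a small Python model of group assignment. It mimics a range-style strategy for a single topic (Kafka's Java consumer includes `RangeAssignor` and `RoundRobinAssignor`), giving each partition to exactly one consumer in the group. It is a sketch, not the client's code; consumer IDs such as `node1-task1` are hypothetical, and partitions are numbered from 0.

```python
from typing import Dict, List


def range_assign(consumers: List[str], num_partitions: int) -> Dict[str, List[int]]:
    """Assign partitions 0..num_partitions-1 of one topic to the consumers in a group.

    Consumers are sorted, each gets a contiguous range, and the first
    (num_partitions % len(consumers)) consumers get one extra partition.
    Every partition goes to exactly one consumer; surplus consumers get nothing.
    """
    if not consumers:
        return {}
    members = sorted(consumers)
    per_member, extra = divmod(num_partitions, len(members))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, member in enumerate(members):
        count = per_member + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


def group_members(nodes: int, tasks_per_node: int) -> List[str]:
    """One consumer per concurrent task on each NiFi node (hypothetical naming)."""
    return [f"node{n}-task{t}" for n in range(1, nodes + 1) for t in range(1, tasks_per_node + 1)]


nodes_eq = range_assign(group_members(2, 1), 2)  # nodes = partitions
nodes_gt = range_assign(group_members(3, 1), 2)  # nodes > partitions: one node idle
nodes_lt = range_assign(group_members(2, 1), 4)  # nodes < partitions: two partitions each
tasks_eq = range_assign(group_members(2, 2), 4)  # tasks = partitions: one partition each
tasks_gt = range_assign(group_members(2, 2), 2)  # tasks > partitions: two tasks idle
```

Which particular consumers sit idle depends on how member IDs sort (real member IDs are generated by the client), but the count of idle consumers is fixed: any consumers beyond the partition count receive nothing.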
Kafka Processors & Batching Messages
• PublishKafka - ‘Message Demarcator’
• If not specified, the flow file content is sent as a single message
• If specified, the flow file content is split into multiple messages on the demarcator
• Ex: when sending 1 million messages to Kafka, performance is significantly better with 1 flow file containing 1 million demarcated messages than with 1 million flow files each containing a single message
• ConsumeKafka - ‘Message Demarcator’
• If not specified, one flow file is produced for each message consumed
• If specified, multiple messages are written to a single flow file, separated by the demarcator
• The maximum number of messages written to a single flow file equals ‘Max Poll Records’
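The batching behavior can be illustrated with a Python sketch of both directions: splitting one flow file's content into messages on a demarcator (PublishKafka side), and packing consumed messages back into flow file contents of at most 'Max Poll Records' messages each (ConsumeKafka side). The newline demarcator, the helper names, and the choice to skip empty segments are assumptions; the sketch does not reproduce NiFi's exact edge-case handling.

```python
from typing import Iterable, List


def split_on_demarcator(content: bytes, demarcator: bytes) -> List[bytes]:
    """PublishKafka-style split: one flow file's content becomes many messages."""
    if not demarcator:
        return [content]  # no demarcator: the whole content is a single message
    return [m for m in content.split(demarcator) if m]  # skipping empty segments is an assumption


def batch_with_demarcator(messages: Iterable[bytes], demarcator: bytes,
                          max_poll_records: int) -> List[bytes]:
    """ConsumeKafka-style merge: up to max_poll_records messages per flow file content."""
    batch: List[bytes] = []
    contents: List[bytes] = []
    for message in messages:
        batch.append(message)
        if len(batch) == max_poll_records:
            contents.append(demarcator.join(batch))
            batch = []
    if batch:
        contents.append(demarcator.join(batch))
    return contents


messages = split_on_demarcator(b"m1\nm2\nm3\nm4\nm5", b"\n")
flow_files = batch_with_demarcator(messages, b"\n", max_poll_records=2)
```

Five messages become three flow files here instead of five. At the scale on the slide, one flow file per million messages means roughly one set of per-flow-file bookkeeping (attributes, queue entries, provenance) instead of a million, which is the intuition behind the performance gap.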
Best Practice Summary
• PublishKafka
• Each concurrent task is an independent producer
• Scale the number of concurrent tasks according to the data flow
• ConsumeKafka
• The Kafka client assigns each partition to at most one thread within a consumer group
• Align the number of partitions with the number of consumer tasks
• Avoid having more tasks than partitions
• Batching
• Message Demarcator property on PublishKafka and ConsumeKafka
• Can achieve significantly better performance
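The alignment guideline can be turned into a rule-of-thumb calculation. These are hypothetical helpers, not something NiFi computes: with N nodes and P partitions (N ≤ P), at most floor(P / N) tasks per node keeps the total number of consumers at or below the partition count, so every task can receive at least one partition.

```python
def max_useful_tasks_per_node(num_partitions: int, num_nodes: int) -> int:
    """Largest per-node task count that keeps total consumers <= partitions.

    Returns at least 1: a node needs one task to take part at all, even though
    with more nodes than partitions some nodes will still sit idle.
    """
    if num_partitions < 1 or num_nodes < 1:
        raise ValueError("need at least one partition and one node")
    return max(1, num_partitions // num_nodes)


def idle_tasks(num_partitions: int, num_nodes: int, tasks_per_node: int) -> int:
    """Consumers in the group that cannot receive any partition."""
    return max(0, num_nodes * tasks_per_node - num_partitions)
```

For example, 4 partitions on 2 nodes suggests 2 tasks per node with no idle tasks, while 2 tasks per node on a 2-partition topic leaves 2 tasks idle, matching the "Tasks > Partitions" slide.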
Demo!
Summary of the Demo Scenario
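[Figure: truck sensors feed MiNiFi, then NiFi; NiFi's PublishKafka sends Speed Events to Kafka; Storm computes a windowed average speed and publishes Average Speed events back to Kafka; NiFi's ConsumeKafka reads them and feeds a Dashboard.]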
• MiNiFi – Collects data from truck sensors
• NiFi – Filter/enrich truck data, deliver to Kafka, consume results
• Kafka – Central messaging bus that Storm consumes from and publishes to
• Storm – Computes average speed over a time window per driver & route
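In the demo this computation runs in Storm, which supports windowed bolts (for example, tumbling and sliding windows). As a framework-free illustration, the Python sketch below computes a tumbling-window average speed per (driver, route). The window type, the window length, and the input tuple shape are assumptions, since the slides do not say which window the demo used; the driver and route labels are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Event = Tuple[int, str, str, float]   # (timestamp_ms, driver, route, speed)
Result = Tuple[int, str, str, float]  # (window_start_ms, driver, route, avg_speed)


def tumbling_average_speed(events: Iterable[Event], window_ms: int) -> List[Result]:
    """Average speed per (driver, route) over fixed, non-overlapping time windows.

    Assumes events arrive in timestamp order; a window is emitted once an event
    from a later window is seen, and the last window is flushed at the end.
    """
    results: List[Result] = []
    totals: Dict[Tuple[str, str], list] = defaultdict(lambda: [0.0, 0])  # [sum, count]
    current: Optional[int] = None

    def flush(window: int) -> None:
        for (driver, route), (total, count) in sorted(totals.items()):
            results.append((window * window_ms, driver, route, total / count))
        totals.clear()

    for ts, driver, route, speed in events:
        window = ts // window_ms
        if current is not None and window != current:
            flush(current)
        current = window
        acc = totals[(driver, route)]
        acc[0] += speed
        acc[1] += 1
    if current is not None:
        flush(current)
    return results


events = [(0, "d1", "r1", 60.0), (400, "d2", "r1", 50.0), (500, "d1", "r1", 70.0), (1500, "d1", "r1", 80.0)]
averages = tumbling_average_speed(events, window_ms=1000)
```

With one-second windows, the first window yields 65.0 for `d1` and 50.0 for `d2`, and the second window yields 80.0 for `d1`. A sliding window would instead emit overlapping averages.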
Demo – Data Generator
• Geo Event
2016-11-07 10:34:52.922|truck_geo_event|73|10|George Vetticaden|1390372503|Saint Louis to Tulsa|Normal|38.14|-91.3|1|
• Speed Event
2016-11-07 10:34:52.922|truck_speed_event|73|10|George Vetticaden|1390372503|Saint Louis to Tulsa|70|
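The generator's pipe-delimited lines can be parsed with a short Python sketch. The field names are assumptions inferred from the sample values (for instance, that 73 is a truck ID, 10 a driver ID, and 1390372503 a route ID), not a documented schema; the meaning of the last geo field is not stated on the slide, so it is left as `unlabeled`.

```python
from typing import Dict, List

# Assumed field layouts; the slide shows sample lines but not a schema.
GEO_FIELDS = ["event_time", "event_type", "truck_id", "driver_id", "driver_name",
              "route_id", "route", "status", "latitude", "longitude", "unlabeled"]
SPEED_FIELDS = ["event_time", "event_type", "truck_id", "driver_id", "driver_name",
                "route_id", "route", "speed"]
LAYOUTS = {"truck_geo_event": GEO_FIELDS, "truck_speed_event": SPEED_FIELDS}


def parse_event(line: str) -> Dict[str, str]:
    """Split a '|'-delimited event (with a trailing '|') into named fields."""
    parts: List[str] = line.rstrip("\n").split("|")
    if parts and parts[-1] == "":
        parts.pop()  # drop the empty field produced by the trailing delimiter
    if len(parts) < 2 or parts[1] not in LAYOUTS:
        raise ValueError(f"unrecognized event: {line!r}")
    fields = LAYOUTS[parts[1]]
    if len(parts) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, got {len(parts)}")
    return dict(zip(fields, parts))


geo = parse_event("2016-11-07 10:34:52.922|truck_geo_event|73|10|George Vetticaden|"
                  "1390372503|Saint Louis to Tulsa|Normal|38.14|-91.3|1|")
speed = parse_event("2016-11-07 10:34:52.922|truck_speed_event|73|10|George Vetticaden|"
                    "1390372503|Saint Louis to Tulsa|70|")
```

The second field selects the layout, so the same parser handles both event types. Values stay strings, and callers convert fields such as `speed` or `latitude` as needed.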
Demo – MiNiFi
Processors:
- name: TailFile
  class: org.apache.nifi.processors.standard.TailFile
  ...
  Properties:
    File Location: Local
    File to Tail: /tmp/truck-sensor-data/truck-1.txt
    ...
Connections:
- name: TailFile/success/2042214b-0158-1000-353d-654ef72c7307
  source name: TailFile
  ...
Remote Processing Groups:
- name: http://localhost:9090/nifi
  url: http://localhost:9090/nifi
  ...
  Input Ports:
  - id: 2042214b-0158-1000-353d-654ef72c7307
    name: Truck Events
    ...
Demo - NiFi
Demo - Storm
Demo - Dashboard
Questions?
Hortonworks Community Connection:
Data Ingestion and Streaming
https://community.hortonworks.com/
HDF Kafka Processor Compatibility

Kerberized interaction with Kafka | GetKafka | PutKafka
Kafka broker 0.8 (HDP 2.3.2) | Supported | Supported
Kafka broker 0.9 (HDP 2.3.4+) | Supported | Supported
Kafka broker 0.8 (Apache) | N/A | N/A
Kafka broker 0.9 (Apache) | Not Supported | Not Supported

Non-Kerberized interaction with Kafka | GetKafka | PutKafka
Kafka broker 0.8 (HDP 2.3.2) | Supported | Supported
Kafka broker 0.9 (HDP 2.3.4+) | Supported | Supported
Kafka broker 0.8 (Apache) | Supported | Supported
Kafka broker 0.9 (Apache) | Supported | Supported

SSL interaction with Kafka | GetKafka | PutKafka
Kafka broker 0.8 (HDP 2.3.2) | N/A | N/A
Kafka broker 0.9 (HDP 2.3.4+) | Not Supported | Not Supported
Kafka broker 0.8 (Apache) | N/A | N/A
Kafka broker 0.9 (Apache) | Not Supported | Not Supported
HDF Kafka Processor Compatibility

Kerberized interaction with Kafka | ConsumeKafka (2 sets) | PublishKafka (2 sets)
Kafka broker 0.8 (HDP 2.3.2) | Not Supported | Not Supported
Kafka broker 0.9/0.10 (HDP 2.3.4+) | Supported | Supported
Kafka broker 0.8 (Apache) | N/A | N/A
Kafka broker 0.9/0.10 (Apache) | Supported | Supported

Non-Kerberized interaction with Kafka | ConsumeKafka (2 sets) | PublishKafka (2 sets)
Kafka broker 0.8 (HDP 2.3.2) | Not Supported | Not Supported
Kafka broker 0.9/0.10 (HDP 2.3.4+) | Supported | Supported
Kafka broker 0.8 (Apache) | Not Supported | Not Supported
Kafka broker 0.9/0.10 (Apache) | Supported | Supported

SSL interaction with Kafka | ConsumeKafka (2 sets) | PublishKafka (2 sets)
Kafka broker 0.8 (HDP 2.3.2) | N/A | N/A
Kafka broker 0.9/0.10 (HDP 2.3.4+) | Supported | Supported
Kafka broker 0.8 (Apache) | N/A | N/A
Kafka broker 0.9/0.10 (Apache) | Supported | Supported
Hortonworks
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
Hortonworks
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Hortonworks
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Hortonworks
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
Hortonworks
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Hortonworks
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
Hortonworks
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
Hortonworks
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Hortonworks
 

More from Hortonworks (20)

Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next LevelHortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
Hortonworks DataFlow (HDF) 3.3 - Taking Stream Processing to the Next Level
 
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT StrategyIoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
IoT Predictions for 2019 and Beyond: Data at the Heart of Your IoT Strategy
 
Getting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with CloudbreakGetting the Most Out of Your Data in the Cloud with Cloudbreak
Getting the Most Out of Your Data in the Cloud with Cloudbreak
 
Johns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log EventsJohns Hopkins - Using Hadoop to Secure Access Log Events
Johns Hopkins - Using Hadoop to Secure Access Log Events
 
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad GuysCatch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
Catch a Hacker in Real-Time: Live Visuals of Bots and Bad Guys
 
HDF 3.2 - What's New
HDF 3.2 - What's NewHDF 3.2 - What's New
HDF 3.2 - What's New
 
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging ManagerCuring Kafka Blindness with Hortonworks Streams Messaging Manager
Curing Kafka Blindness with Hortonworks Streams Messaging Manager
 
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical EnvironmentsInterpretation Tool for Genomic Sequencing Data in Clinical Environments
Interpretation Tool for Genomic Sequencing Data in Clinical Environments
 
IBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data LandscapeIBM+Hortonworks = Transformation of the Big Data Landscape
IBM+Hortonworks = Transformation of the Big Data Landscape
 
Premier Inside-Out: Apache Druid
Premier Inside-Out: Apache DruidPremier Inside-Out: Apache Druid
Premier Inside-Out: Apache Druid
 
Accelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at ScaleAccelerating Data Science and Real Time Analytics at Scale
Accelerating Data Science and Real Time Analytics at Scale
 
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATATIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
TIME SERIES: APPLYING ADVANCED ANALYTICS TO INDUSTRIAL PROCESS DATA
 
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
Blockchain with Machine Learning Powered by Big Data: Trimble Transportation ...
 
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: ClearsenseDelivering Real-Time Streaming Data for Healthcare Customers: Clearsense
Delivering Real-Time Streaming Data for Healthcare Customers: Clearsense
 
Making Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with EaseMaking Enterprise Big Data Small with Ease
Making Enterprise Big Data Small with Ease
 
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World PresentationWebinewbie to Webinerd in 30 Days - Webinar World Presentation
Webinewbie to Webinerd in 30 Days - Webinar World Presentation
 
Driving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data ManagementDriving Digital Transformation Through Global Data Management
Driving Digital Transformation Through Global Data Management
 
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming FeaturesHDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
HDF 3.1 pt. 2: A Technical Deep-Dive on New Streaming Features
 
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
Hortonworks DataFlow (HDF) 3.1 - Redefining Data-In-Motion with Modern Data A...
 
Unlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDCUnlock Value from Big Data with Apache NiFi and Streaming CDC
Unlock Value from Big Data with Apache NiFi and Streaming CDC
 

Recently uploaded

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 

Recently uploaded (20)

"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 

Hortonworks Data in Motion Webinar Series Part 7: Apache Kafka and NiFi, Better Together

  • 1. Harnessing Data-in-Motion with Hortonworks DataFlow: Apache NiFi, Kafka and Storm, Better Together
    Bryan Bende, Sr. Software Engineer
    Haimo Liu, Product Manager
    © Hortonworks Inc. 2011 – 2016. All Rights Reserved
  • 2. Agenda
    • Introduction to Hortonworks DataFlow
    • Introduction to Apache projects
    • Better together
    • Best practices
    • Demo
  • 3. Connected Data Platforms
  • 4. HDF 2.0 – Data in Motion Platform
    [Diagram: Flow Management and Stream Processing, deployed at the edge, on premises and in the cloud, alongside Enterprise Services: Security, Visualization, Registries/Catalogs, Governance (Security/Compliance), Operations]
  • 5. HDF 2.0 – Data in Motion Platform
    [Diagram: data in motion, with MiNiFi agents and NiFi for flow management, and NiFi, Kafka and Storm for flow management + stream processing, moving data from IoT data sources to data at rest in AWS, Azure, Google Cloud, Hadoop and others. Enterprise Services: Ambari, Ranger, other services]
  • 7. What is Apache NiFi?
    • Created to address the challenges of global enterprise dataflow
    • Key features:
      – Visual Command and Control
      – Data Lineage (Provenance)
      – Data Prioritization
      – Data Buffering/Back-Pressure
      – Control Latency vs. Throughput
      – Secure Control Plane / Data Plane
      – Scale Out Clustering
      – Extensibility
  • 8. Apache NiFi
    What is Apache NiFi used for?
    • Reliable and secure transfer of data between systems
    • Delivery of data from sources to analytic platforms
    • Enrichment and preparation of data:
      – Conversion between formats
      – Extraction/Parsing
      – Routing decisions
    What is Apache NiFi NOT used for?
    • Distributed Computation
    • Complex Event Processing
    • Complex Rolling Window Operations
  • 9. NiFi Terminology
    • FlowFile
      – Unit of data moving through the system
      – Content + Attributes (key/value pairs)
    • Processor
      – Performs the work; can access FlowFiles
    • Connection
      – Links between processors
      – Queues that can be dynamically prioritized
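To make the terminology concrete, here is a minimal in-memory model in Python. It assumes nothing about NiFi's actual Java API: `FlowFile`, `Connection`, `priority_attr` and `backpressure_threshold` are hypothetical names. It shows a FlowFile as content plus key/value attributes, and a connection as a queue that releases FlowFiles by a priority attribute and refuses new ones once a back-pressure object threshold is reached.

```python
from __future__ import annotations

import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: raw content plus key/value attributes."""
    content: bytes
    attributes: dict[str, str] = field(default_factory=dict)


class Connection:
    """Hypothetical model of the queue between two processors.

    FlowFiles with a lower value of the priority attribute are released
    first; ties are released in arrival order. Once the queue holds
    `backpressure_threshold` FlowFiles, offer() refuses new ones, so the
    upstream processor has to wait (back-pressure).
    """

    def __init__(self, priority_attr: str = "priority", backpressure_threshold: int = 10_000):
        self.priority_attr = priority_attr
        self.backpressure_threshold = backpressure_threshold
        self._heap: list[tuple[int, int, FlowFile]] = []
        self._arrival = itertools.count()  # tie-breaker keeps FIFO order

    def is_full(self) -> bool:
        return len(self._heap) >= self.backpressure_threshold

    def offer(self, ff: FlowFile) -> bool:
        if self.is_full():
            return False  # back-pressure: upstream must retry later
        prio = int(ff.attributes.get(self.priority_attr, "0"))
        heapq.heappush(self._heap, (prio, next(self._arrival), ff))
        return True

    def poll(self) -> FlowFile | None:
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```

With `Connection(backpressure_threshold=2)`, two offers succeed, a third is refused, and the FlowFile carrying `priority=1` is polled before one carrying `priority=5`.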
  • 10. What is Apache Kafka?
    • Distributed streaming platform that allows publishing and subscribing to streams of records
    • Streams of records are organized into categories called topics
    • Topics can be partitioned and/or replicated
    • Records consist of a key, value, and timestamp
    http://kafka.apache.org/intro
    [Diagram: a Kafka cluster written to by several producers and read from by several consumers]
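A record is a key, a value and a timestamp, and a topic's partitions are what let its records be spread out. The Python sketch below models that routing: records with the same key always land in the same partition. It is an illustration under stated assumptions, not Kafka's client code: Kafka's default partitioner hashes keys with murmur2, whereas this sketch uses `zlib.crc32` as a stand-in, and `Record` and `partition_for` are hypothetical names.

```python
import time
import zlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """A Kafka-style record: key, value and timestamp (ms since epoch)."""
    key: bytes
    value: bytes
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition deterministically.

    Stand-in hash (CRC-32), not Kafka's murmur2: the only property shown
    here is that equal keys always map to the same partition.
    """
    if num_partitions <= 0:
        raise ValueError("a topic needs at least one partition")
    return zlib.crc32(key) % num_partitions
```

Keying by something like a driver ID would keep all of that driver's records in one partition, which is what preserves their relative order.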
  • 11. Kafka: Anatomy of a Topic
    [Diagram: a topic with Partitions 0, 1 and 2, each an ordered sequence of numbered offsets; writes are appended at the new end, older records sit at the old end]
    • Partitioning allows topics to scale beyond a single machine/node
    • Topics can also be replicated for high availability
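The anatomy above can be modelled as one append-only list per partition, where a record's offset is simply its position in that list. This minimal Python sketch (a hypothetical `Topic` class with no replication and no retention) shows writes landing at the new end and a reader resuming from any offset; note that offsets are counted independently in each partition.

```python
from __future__ import annotations


class Topic:
    """Toy topic: one append-only log (a Python list) per partition."""

    def __init__(self, name: str, num_partitions: int):
        if num_partitions <= 0:
            raise ValueError("num_partitions must be positive")
        self.name = name
        self.partitions: list[list[bytes]] = [[] for _ in range(num_partitions)]

    def append(self, partition: int, value: bytes) -> int:
        """Append at the 'new' end of a partition and return the record's offset."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1

    def read(self, partition: int, from_offset: int, max_records: int = 100) -> list[tuple[int, bytes]]:
        """Return (offset, value) pairs starting at from_offset, oldest first."""
        if from_offset < 0:
            raise ValueError("offsets start at 0")
        log = self.partitions[partition]
        end = min(len(log), from_offset + max_records)
        return [(off, log[off]) for off in range(from_offset, end)]
```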
  • 12. NiFi and Kafka Are Complementary
    NiFi: provides a dataflow solution
    • Centralized management, from edge to core
    • Great traceability: event-level data provenance starting when data is born
    • Interactive command and control with real-time operational visibility
    • Dataflow management, including prioritization, back pressure, and edge intelligence
    • Visual representation of global dataflow
    Kafka: provides a durable stream store
    • Low latency
    • Distributed data durability
    • Decentralized management of producers & consumers
  • 13. What is Apache Storm?
    • Distributed, low-latency, fault-tolerant stream processing platform
    • Provides processing guarantees
    • Key concepts include:
      – Tuples
      – Streams
      – Spouts
      – Bolts
      – Topology
  • 14. Storm - Tuples and Streams
    • What is a Tuple?
      – Fundamental data structure in Storm
      – Named list of values that can be of any data type
    • What is a Stream?
      – An unbounded sequence of tuples
      – The core abstraction in Storm; streams are what you "process" in Storm
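Storm topologies are normally written in Java, but the two ideas on this slide, a tuple as a named list of values and a stream as an unbounded sequence of tuples, can be sketched in Python with a `namedtuple` and a generator. This is not Storm's API: `SpeedTuple`, `speed_stream` and the field names are hypothetical, loosely modelled on the truck events used later in the demo, and the second route name is a placeholder.

```python
import itertools
import random
from collections import namedtuple
from typing import Iterator

# A "tuple" in Storm terms: a named list of values of any type.
SpeedTuple = namedtuple("SpeedTuple", ["driver_id", "route", "speed"])


def speed_stream(seed: int = 42) -> Iterator[SpeedTuple]:
    """An unbounded stream: a generator that never finishes on its own."""
    rng = random.Random(seed)
    routes = ["Saint Louis to Tulsa", "hypothetical-route-2"]
    while True:
        yield SpeedTuple(
            driver_id=rng.randint(1, 20),
            route=rng.choice(routes),
            speed=rng.randint(40, 90),
        )


# Any consumer only ever looks at a finite prefix of the stream.
first_five = list(itertools.islice(speed_stream(), 5))
```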
  • 15. Storm - Spouts
    • What is a Spout?
      – Source of data
      – E.g.: JMS, Twitter, Log, Kafka Spout
      – Can spin up multiple instances of a Spout and dynamically adjust as needed
  • 16. Storm - Bolts
    • What is a Bolt?
      – Processes any number of input streams and produces output streams
      – Common processing in bolts includes functions, aggregations, joins, reads/writes to data stores, and alerting logic
      – Can spin up multiple instances of a Bolt and dynamically adjust as needed
  • 17. Storm - Topology
    • What is a Topology?
      – A network of spouts and bolts wired together into a workflow
    [Diagram: Truck-Event-Processor topology connecting a Kafka Spout, HBase Bolt, Monitoring Bolt, HDFS Bolt and WebSocket Bolt via streams]
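The spout, bolt and topology vocabulary from the last three slides maps onto a few Python callables. In this sketch a spout yields tuples, each bolt takes one tuple and emits zero or more tuples, and the topology wires them in order. It is a single-process, linear pipeline, whereas a real Storm topology is a distributed graph with parallel instances; all names and the 80 mph speeding threshold are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Iterable, Iterator

# Each tuple is modelled as a dict of named values (an assumption of this sketch).
StormTuple = dict
Bolt = Callable[[StormTuple], Iterable[StormTuple]]


def list_spout(events: Iterable[StormTuple]) -> Iterator[StormTuple]:
    """Spout: the source of data (here a finite list rather than, e.g., Kafka)."""
    yield from events


def enrich_bolt(t: StormTuple) -> Iterable[StormTuple]:
    """Bolt performing a function: add a derived field."""
    yield {**t, "speeding": t["speed"] > 80}


def alert_bolt(t: StormTuple) -> Iterable[StormTuple]:
    """Bolt performing alerting logic: emit only tuples that need attention."""
    if t["speeding"]:
        yield {"driver_id": t["driver_id"], "alert": "speeding"}


def _apply(bolt: Bolt, stream: Iterable[StormTuple]) -> Iterator[StormTuple]:
    """Feed every tuple of an input stream to one bolt."""
    for t in stream:
        yield from bolt(t)


def run_topology(spout: Iterable[StormTuple], bolts: list[Bolt]) -> Iterator[StormTuple]:
    """Wire the spout's stream through each bolt in order."""
    stream: Iterable[StormTuple] = spout
    for bolt in bolts:
        stream = _apply(bolt, stream)  # a function call binds each bolt eagerly
    return iter(stream)
```

The helper `_apply` is used instead of an inline generator expression so that each stage captures its own bolt rather than the loop variable's final value.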
  • 18. NiFi and Storm Are Complementary
    NiFi: simple event processing
    • Manages the flow of data between producers and consumers across the enterprise
    • Data enrichment, splitting, aggregation, format conversion, schema translation…
    • Scales out to handle gigabytes per second, or scales down to a Raspberry Pi handling tens of thousands of events per second
    Storm: complex and distributed processing
    • Complex processing from multiple streams (JOIN operations)
    • Analyzing data across time windows (rolling window aggregation, standard deviation, etc.)
    • Scales out to thousands of nodes if needed
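The "rolling window aggregation, standard deviation" listed for Storm can be illustrated with a count-based sliding window that keeps running sums, so each update costs O(1). This plain-Python sketch shows the computation only, not Storm's windowing API; `RollingStats` and the window size are hypothetical. Running sums of squares can lose precision for large, nearly equal values, so a production version might prefer Welford-style updates.

```python
from __future__ import annotations

import math
from collections import deque


class RollingStats:
    """Mean and population standard deviation over the last `size` values."""

    def __init__(self, size: int):
        if size <= 0:
            raise ValueError("window size must be positive")
        self.size = size
        self.values: deque[float] = deque()
        self.total = 0.0
        self.total_sq = 0.0

    def add(self, x: float) -> None:
        self.values.append(x)
        self.total += x
        self.total_sq += x * x
        if len(self.values) > self.size:
            old = self.values.popleft()  # slide the window forward
            self.total -= old
            self.total_sq -= old * old

    def mean(self) -> float:
        return self.total / len(self.values)  # raises if no values yet

    def stddev(self) -> float:
        n = len(self.values)
        var = self.total_sq / n - (self.total / n) ** 2
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error
```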
  • 20. Key Integration Points
    • NiFi - Kafka
      – NiFi Kafka Producer
      – NiFi Kafka Consumer
    • Storm - Kafka
      – Storm Kafka Consumer
      – Storm Kafka Producer
  • 21. Key Integration Points – NiFi & Kafka
    [Diagram: MiNiFi agents send data to NiFi, which publishes to Kafka for Consumers 1 through N]
    • Producer Processors
      – PutKafka (0.8 Kafka Client)
      – PublishKafka (0.9 Kafka Client)
      – PublishKafka_0_10 (0.10 Kafka Client)
  • 22. Key Integration Points – NiFi & Kafka
    [Diagram: Producers 1 through N write to Kafka; NiFi consumes from Kafka and delivers to Destinations 1, 2 and 3]
    • Consumer Processors
      – GetKafka (0.8 Kafka Client)
      – ConsumeKafka (0.9 Kafka Client)
      – ConsumeKafka_0_10 (0.10 Kafka Client)
  • 23. Key Integration Points – Storm & Kafka
    • storm-kafka module
      – KafkaSpout (Core & Trident) & KafkaBolt
      – Compatible with Kafka 0.8 and 0.9 clients
      – Kafka client declared by topology developer
    • storm-kafka-client module
      – KafkaSpout & KafkaSpoutTuplesBuilder
      – Compatible with Kafka 0.9 and 0.10 clients
      – Kafka client declared by topology developer
    [Diagram: Storm's KafkaSpout reads from an Incoming Topic in Kafka and its KafkaBolt writes to a Results Topic]
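Because the right NiFi processor or Storm module depends on the Kafka client version, the mapping on the last three slides can be captured as a lookup table. The Python sketch below encodes only what those slides list; the dictionary and function names are hypothetical.

```python
# NiFi processors per Kafka client version, as listed on the slides.
NIFI_KAFKA_PROCESSORS = {
    "0.8": {"producer": "PutKafka", "consumer": "GetKafka"},
    "0.9": {"producer": "PublishKafka", "consumer": "ConsumeKafka"},
    "0.10": {"producer": "PublishKafka_0_10", "consumer": "ConsumeKafka_0_10"},
}

# Storm modules and the Kafka client versions they are compatible with.
STORM_KAFKA_MODULES = {
    "storm-kafka": {"0.8", "0.9"},
    "storm-kafka-client": {"0.9", "0.10"},
}


def integration_for(client_version: str) -> dict:
    """Return the NiFi processors and Storm modules usable with a client version."""
    if client_version not in NIFI_KAFKA_PROCESSORS:
        raise KeyError(f"no mapping for Kafka client {client_version!r}")
    storm = sorted(
        module
        for module, versions in STORM_KAFKA_MODULES.items()
        if client_version in versions
    )
    return {**NIFI_KAFKA_PROCESSORS[client_version], "storm_modules": storm}
```

For example, a 0.9 client is the one version both Storm modules support.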
  • 24. Better Together
    [Diagram: MiNiFi agents feed NiFi; NiFi publishes to a Kafka Incoming Topic with PublishKafka; Storm processes it and writes to a Results Topic; NiFi consumes the results with ConsumeKafka and delivers them to Destinations]
    • MiNiFi – Collection, filtering, and prioritization at the edge
    • NiFi – Central data flow management, routing, enriching, and transformation
    • Kafka – Central messaging bus for subscription by downstream consumers
    • Storm – Streaming analytics focused on complex event processing
  • 26. NiFi PublishKafka
    [Diagram: NiFi Node 1 and Node 2 each run PublishKafka with concurrent tasks, writing to Topic 1 Partitions 1 and 2 in Kafka]
    • Each NiFi node runs an instance of PublishKafka
    • Each instance has one or more concurrent tasks (threads)
    • Each concurrent task is an independent producer and sends data round-robin to the partitions of a topic
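To see what "each concurrent task is an independent producer that sends data round-robin" implies, this Python simulation gives every task on every node its own round-robin cursor over the topic's partitions. It models the distribution pattern only: the node, task and message counts are hypothetical, every cursor starts at partition 0 for simplicity, and real producers can also choose partitions by key.

```python
import itertools
from collections import Counter


def simulate_publish(nodes: int, tasks_per_node: int, partitions: int, messages_per_task: int) -> Counter:
    """Count how many messages land in each partition.

    Each (node, task) pair keeps its own round-robin cursor, because each
    concurrent task is an independent producer.
    """
    counts: Counter = Counter()
    for _node in range(nodes):
        for _task in range(tasks_per_node):
            cursor = itertools.cycle(range(partitions))
            for _ in range(messages_per_task):
                counts[next(cursor)] += 1
    return counts
```

With 2 nodes, 2 tasks each, 2 partitions and 10 messages per task, each partition receives 20 messages; with 3 tasks sending 3 messages each into 2 partitions, the split is 6 and 3, because every task's cursor starts at partition 0 in this sketch.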
  • 27. NiFi ConsumeKafka – Nodes = Partitions
    [Diagram: NiFi Nodes 1 and 2 each run ConsumeKafka in consumer group 1, consuming Topic 1 Partitions 1 and 2]
    • Each NiFi node runs an instance of ConsumeKafka
    • Each instance has one or more concurrent tasks (threads)
    • In this case, each concurrent task is a consumer assigned to a single partition
    • The Kafka client ensures a given partition can only have one consumer/thread in a consumer group
  • 28. NiFi ConsumeKafka – Nodes > Partitions
    [Diagram: NiFi Nodes 1, 2 and 3 each run ConsumeKafka in consumer group 1 against Topic 1 Partitions 1 and 2]
    • Remember… each partition can only have one consumer from the same group
    • When there are more NiFi nodes than partitions, some nodes won't consume anything
  • 29. NiFi ConsumeKafka – Nodes < Partitions
    [Diagram: NiFi Nodes 1 and 2 run ConsumeKafka in consumer group 1 against Topic 1 Partitions 1–4]
    • When there are fewer NiFi nodes/tasks than partitions, multiple partitions will be assigned to each node/task
  • 30. NiFi ConsumeKafka – Tasks = Partitions
    [Diagram: NiFi Nodes 1 and 2 each run ConsumeKafka (consumer group 1) with multiple concurrent tasks against Topic 1 Partitions 1–4]
    • When there are fewer NiFi nodes than partitions, we can increase the concurrent tasks on each node
    • The Kafka client will automatically rebalance partition assignment
    • Improves throughput
  • 31. NiFi ConsumeKafka – Tasks > Partitions
    [Diagram: NiFi Nodes 1 and 2 run ConsumeKafka (consumer group 1) with more concurrent tasks in total than Topic 1's two partitions]
    • Increasing concurrent tasks only makes sense when the number of partitions is greater than the number of nodes
    • Otherwise we end up with some tasks not consuming anything
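The five ConsumeKafka scenarios all follow from one rule: within a consumer group, each partition goes to exactly one consumer. The sketch below assumes a simple range-style assignment (contiguous blocks of partitions, with any remainder going to the first consumers in sorted order), similar in spirit to Kafka's range assignor but not its code. Consumer names such as `node1-task1` are hypothetical, and partitions are numbered from 0 here.

```python
from __future__ import annotations


def assign_partitions(partitions: int, consumers: list[str]) -> dict[str, list[int]]:
    """Range-style assignment of one topic's partitions to a consumer group."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    ordered = sorted(consumers)
    per_consumer, extra = divmod(partitions, len(ordered))
    assignment: dict[str, list[int]] = {}
    start = 0
    for i, consumer in enumerate(ordered):
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


def consumers_for(nodes: int, tasks_per_node: int) -> list[str]:
    """One consumer per concurrent task on each NiFi node."""
    return [
        f"node{n}-task{t}"
        for n in range(1, nodes + 1)
        for t in range(1, tasks_per_node + 1)
    ]
```

Three single-task nodes on two partitions leave `node3-task1` with an empty list (Nodes > Partitions), two single-task nodes on four partitions get two partitions each (Nodes < Partitions), and two nodes with two tasks each on four partitions get one partition per task (Tasks = Partitions).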
  • 32. Kafka Processors & Batching Messages
    • PublishKafka – 'Message Demarcator'
      – If not specified, flow file content is sent as a single message
      – If specified, flow file content is separated into multiple messages based on the demarcator
      – Ex: when sending 1 million messages to Kafka, 1 flow file containing 1 million demarcated messages performs significantly better than 1 million flow files containing a single message each
    • ConsumeKafka – 'Message Demarcator'
      – If not specified, a flow file is produced for each message consumed
      – If specified, multiple messages are written to a single flow file, separated by the demarcator
      – The maximum # of messages written to a single flow file equals 'Max Poll Records'
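The 'Message Demarcator' behaviour can be sketched as two small functions: on the publish side, one flow file's content is split into many messages; on the consume side, up to 'Max Poll Records' messages are joined into one flow file. This illustrates the semantics described on the slide, not NiFi's implementation. The function names are hypothetical, the newline demarcator is only an example, and dropping empty segments (such as one left by a trailing demarcator) is a choice made for this sketch.

```python
from __future__ import annotations

from typing import Iterable


def publish_split(content: bytes, demarcator: bytes | None) -> list[bytes]:
    """PublishKafka side: one flow file -> one or many Kafka messages."""
    if not demarcator:
        return [content]  # no demarcator: the whole content is one message
    return [part for part in content.split(demarcator) if part]


def consume_join(messages: Iterable[bytes], demarcator: bytes | None, max_poll_records: int) -> list[bytes]:
    """ConsumeKafka side: consumed messages -> flow file contents."""
    msgs = list(messages)
    if not demarcator:
        return msgs  # no demarcator: one flow file per message
    return [
        demarcator.join(msgs[i:i + max_poll_records])
        for i in range(0, len(msgs), max_poll_records)
    ]
```

Splitting `b"a\nb\nc\n"` on `b"\n"` yields three messages, and joining them back with `max_poll_records=2` yields two flow files, `b"a\nb"` and `b"c"`.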
  • 33. Best Practice Summary
    • PublishKafka
      – Each concurrent task is an independent producer
      – Scale the number of concurrent tasks according to the data flow
    • ConsumeKafka
      – Within a consumer group, the Kafka client assigns each partition to at most one thread
      – Create optimal alignment between # of partitions and # of consumer tasks
      – Avoid having more tasks than partitions
    • Batching
      – Message Demarcator property on PublishKafka and ConsumeKafka
      – Can achieve significantly better performance
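The ConsumeKafka guidance (align tasks with partitions, avoid more tasks than partitions) comes down to simple arithmetic. Assuming every node gets the same number of concurrent tasks, this hypothetical helper picks the largest per-node count whose total does not exceed the partition count, and reports how many tasks would still sit idle.

```python
def plan_consume_tasks(partitions: int, nodes: int) -> tuple[int, int]:
    """Return (tasks_per_node, idle_tasks) for ConsumeKafka on a NiFi cluster.

    Uses the largest equal per-node task count whose total does not exceed
    the number of partitions. Every node needs at least one task, so a
    cluster with more nodes than partitions still ends up with idle tasks.
    """
    if partitions <= 0 or nodes <= 0:
        raise ValueError("partitions and nodes must be positive")
    tasks_per_node = max(1, partitions // nodes)
    idle = max(0, tasks_per_node * nodes - partitions)
    return tasks_per_node, idle
```

Four partitions on two nodes gives two tasks per node with none idle; two partitions on three nodes gives one task per node with one idle. When partitions do not divide evenly (say five on two nodes), some tasks simply take two partitions.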
  • 34. Demo!
  • 35. Summary of the Demo Scenario [Diagram components: Truck Sensors, MiNiFi, NiFi (PublishKafka, ConsumeKafka), Kafka, Storm, Dashboard; data labels: Speed Events, Average Speed, Windowed Avg. Speed] • MiNiFi – Collects data from truck sensors • NiFi – Filters/enriches truck data, delivers it to Kafka, and consumes the results • Kafka – Central messaging bus that Storm consumes from and publishes to • Storm – Computes average speed over a time window per driver & route
  • 36. Demo – Data Generator • Geo Event: 2016-11-07 10:34:52.922|truck_geo_event|73|10|George Vetticaden|1390372503|Saint Louis to Tulsa|Normal|38.14|-91.3|1| • Speed Event: 2016-11-07 10:34:52.922|truck_speed_event|73|10|George Vetticaden|1390372503|Saint Louis to Tulsa|70|
  • 37. Demo – MiNiFi
    Processors:
    - name: TailFile
      class: org.apache.nifi.processors.standard.TailFile
      ...
      Properties:
        File Location: Local
        File to Tail: /tmp/truck-sensor-data/truck-1.txt
      ...
    Connections:
    - name: TailFile/success/2042214b-0158-1000-353d-654ef72c7307
      source name: TailFile
      ...
    Remote Processing Groups:
    - name: http://localhost:9090/nifi
      url: http://localhost:9090/nifi
      ...
      Input Ports:
      - id: 2042214b-0158-1000-353d-654ef72c7307
        name: Truck Events
        ...
  • 38. Demo - NiFi
  • 39. Demo - Storm
  • 40. Demo - Dashboard
  • 41. Questions? Hortonworks Community Connection: Data Ingestion and Streaming https://community.hortonworks.com/
  • 42. HDF Kafka Processor Compatibility

    Kerberized interaction w/ Kafka
    | Kafka broker      | GetKafka      | PutKafka      |
    |-------------------|---------------|---------------|
    | 0.8 (HDP 2.3.2)   | Supported     | Supported     |
    | 0.9 (HDP 2.3.4+)  | Supported     | Supported     |
    | 0.8 (Apache)      | N/A           | N/A           |
    | 0.9 (Apache)      | Not Supported | Not Supported |

    Non-Kerberized interaction w/ Kafka
    | Kafka broker      | GetKafka      | PutKafka      |
    |-------------------|---------------|---------------|
    | 0.8 (HDP 2.3.2)   | Supported     | Supported     |
    | 0.9 (HDP 2.3.4+)  | Supported     | Supported     |
    | 0.8 (Apache)      | Supported     | Supported     |
    | 0.9 (Apache)      | Supported     | Supported     |

    SSL interaction w/ Kafka
    | Kafka broker      | GetKafka      | PutKafka      |
    |-------------------|---------------|---------------|
    | 0.8 (HDP 2.3.2)   | N/A           | N/A           |
    | 0.9 (HDP 2.3.4+)  | Not Supported | Not Supported |
    | 0.8 (Apache)      | N/A           | N/A           |
    | 0.9 (Apache)      | Not Supported | Not Supported |
  • 43. HDF Kafka Processor Compatibility

    Kerberized interaction w/ Kafka
    | Kafka broker          | ConsumeKafka (2 sets) | PublishKafka (2 sets) |
    |-----------------------|-----------------------|-----------------------|
    | 0.8 (HDP 2.3.2)       | Not Supported         | Not Supported         |
    | 0.9/0.10 (HDP 2.3.4+) | Supported             | Supported             |
    | 0.8 (Apache)          | N/A                   | N/A                   |
    | 0.9/0.10 (Apache)     | Supported             | Supported             |

    Non-Kerberized interaction w/ Kafka
    | Kafka broker          | ConsumeKafka (2 sets) | PublishKafka (2 sets) |
    |-----------------------|-----------------------|-----------------------|
    | 0.8 (HDP 2.3.2)       | Not Supported         | Not Supported         |
    | 0.9/0.10 (HDP 2.3.4+) | Supported             | Supported             |
    | 0.8 (Apache)          | Not Supported         | Not Supported         |
    | 0.9/0.10 (Apache)     | Supported             | Supported             |

    SSL interaction w/ Kafka
    | Kafka broker          | ConsumeKafka (2 sets) | PublishKafka (2 sets) |
    |-----------------------|-----------------------|-----------------------|
    | 0.8 (HDP 2.3.2)       | N/A                   | N/A                   |
    | 0.9/0.10 (HDP 2.3.4+) | Supported             | Supported             |
    | 0.8 (Apache)          | N/A                   | N/A                   |
    | 0.9/0.10 (Apache)     | Supported             | Supported             |
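
The ‘Message Demarcator’ behavior described on slide 32 can be modeled in a few lines. This is a minimal plain-Python sketch, not NiFi code: the function names `publish_split` and `consume_batches` are hypothetical, each simulated poll is assumed to return at most ‘Max Poll Records’ messages, partitions are ignored, and empty segments (e.g. after a trailing demarcator) are simply dropped, which may differ from NiFi's actual edge-case handling.

```python
from typing import List, Optional


def publish_split(content: bytes, demarcator: Optional[bytes]) -> List[bytes]:
    """PublishKafka side (simplified): turn one flow file's content into Kafka messages."""
    if demarcator is None:
        # No demarcator: the whole flow file content is sent as a single message.
        return [content]
    # With a demarcator: one message per segment; empty segments are dropped in this sketch.
    return [segment for segment in content.split(demarcator) if segment]


def consume_batches(messages: List[bytes], demarcator: Optional[bytes],
                    max_poll_records: int) -> List[bytes]:
    """ConsumeKafka side (simplified): turn consumed messages into flow file contents."""
    flow_files: List[bytes] = []
    # Each simulated poll returns at most max_poll_records messages.
    for i in range(0, len(messages), max_poll_records):
        batch = messages[i:i + max_poll_records]
        if demarcator is None:
            # No demarcator: one flow file per consumed message.
            flow_files.extend(batch)
        else:
            # With a demarcator: the whole poll becomes one flow file.
            flow_files.append(demarcator.join(batch))
    return flow_files


# Illustrative content only: one flow file holding three demarcated messages.
msgs = publish_split(b"msg-1\nmsg-2\nmsg-3", b"\n")
print(len(msgs))                          # 3
print(consume_batches(msgs, b"\n", 2))    # [b'msg-1\nmsg-2', b'msg-3']
```

The performance point on the slide follows from the first function: one flow file can fan out into many messages, so the per-flow-file overhead is paid once rather than a million times.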

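The demo pipeline on slides 35 and 36 (speed events in, per-driver and per-route average speed over a time window out) can be approximated in plain Python. This is a sketch under stated assumptions, not the demo's Storm topology: the field names in `SPEED_FIELDS` are hypothetical (the slide shows only values), timestamps are treated as UTC, and one-minute tumbling windows are assumed because the slides do not say which window type or length Storm uses.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical field names for a truck_speed_event line; the slide shows only the values.
SPEED_FIELDS = ["event_time", "event_source", "truck_id", "driver_id",
                "driver_name", "route_id", "route", "speed"]
EPOCH = datetime(1970, 1, 1)  # timestamps are treated as naive UTC in this sketch
WindowKey = Tuple[int, str, str]  # (window start in epoch ms, driver, route)


def parse_speed_event(line: str) -> Dict[str, str]:
    """Split one pipe-delimited speed event into named fields."""
    parts = line.strip().split("|")
    if parts and parts[-1] == "":
        parts.pop()  # generator lines end with a trailing '|'
    if len(parts) != len(SPEED_FIELDS) or parts[1] != "truck_speed_event":
        raise ValueError(f"not a speed event: {line!r}")
    return dict(zip(SPEED_FIELDS, parts))


def windowed_avg_speed(lines: Iterable[str], window_ms: int) -> Dict[WindowKey, float]:
    """Average speed per (tumbling window, driver, route)."""
    totals: Dict[WindowKey, List[int]] = defaultdict(lambda: [0, 0])  # [sum, count]
    for line in lines:
        event = parse_speed_event(line)
        ts = datetime.strptime(event["event_time"], "%Y-%m-%d %H:%M:%S.%f")
        ts_ms = (ts - EPOCH) // timedelta(milliseconds=1)  # exact integer milliseconds
        window_start = ts_ms - ts_ms % window_ms           # start of this event's window
        acc = totals[(window_start, event["driver_name"], event["route"])]
        acc[0] += int(event["speed"])
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


lines = [
    # The first line is the slide's sample; the second is a made-up reading in the same minute.
    "2016-11-07 10:34:52.922|truck_speed_event|73|10|George Vetticaden|1390372503|Saint Louis to Tulsa|70|",
    "2016-11-07 10:34:58.100|truck_speed_event|73|10|George Vetticaden|1390372503|Saint Louis to Tulsa|80|",
]
print(list(windowed_avg_speed(lines, 60_000).values()))  # [75.0]
```

A real deployment would leave windowing to Storm; the sketch only makes the "per driver & route, per window" grouping concrete.
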
Editor's Notes

  1. Hortonworks: Powering the Future of Data
  2. Since each ConsumeKafka is part of the same group, and there are more ConsumeKafka instances than partitions, one of them doesn’t have anything to do.
  3. If we increase the number of concurrent tasks beyond the number of partitions, some tasks have nothing to do.
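
The tasks-versus-partitions behavior on slides 30 and 31 (and in notes 2 and 3) comes from the rule that, within one consumer group, each partition goes to at most one consumer. A minimal sketch, assuming a simplified range-style assignment over a single topic: consumers are sorted by name and given contiguous blocks of partitions. Real Kafka sorts by generated member IDs and supports other assignors, and the task names here are hypothetical.

```python
from typing import Dict, List


def range_assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Simplified range-style assignment of one topic's partitions to a consumer group.

    Consumers are sorted, each gets a contiguous block, and the first
    (len(partitions) % len(consumers)) consumers get one extra partition.
    A partition is never given to two consumers, so surplus consumers get nothing.
    """
    ordered = sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(ordered))
    assignment: Dict[str, List[int]] = {}
    start = 0
    for i, consumer in enumerate(ordered):
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment


# Two NiFi nodes with two concurrent ConsumeKafka tasks each (hypothetical names).
tasks = ["node1-task1", "node1-task2", "node2-task1", "node2-task2"]

# Tasks = Partitions: every task consumes exactly one partition.
print(range_assign([1, 2, 3, 4], tasks))
# {'node1-task1': [1], 'node1-task2': [2], 'node2-task1': [3], 'node2-task2': [4]}

# Tasks > Partitions: two of the four tasks sit idle.
print(range_assign([1, 2], tasks))
# {'node1-task1': [1], 'node1-task2': [2], 'node2-task1': [], 'node2-task2': []}
```

Whatever assignor is used, the idle-task count in the second case cannot go below tasks minus partitions, which is why the slides recommend not running more tasks than partitions.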