SlideShare a Scribd company logo
Apache NiFi in the
Hadoop Ecosystem
Bryan Bende – Member of Technical Staff
Hadoop Summit 2016 Dublin
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Outline
• What is Apache NiFi?
• Hadoop Ecosystem Integrations
• Use-Case/Architecture Discussion
• Future Work
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
About Me
• Member of Technical Staff on Hortonworks DataFlow Team
• Apache NiFi Committer & PMC Member
• Integrations for HBase, Solr, Syslog, Stream Processing
• Twitter: @bbende / Blog: bryanbende.com
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
What is Apache NiFi?
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Simplistic View of Enterprise Data Flow
Data Flow
Process and Analyze
Data
Acquire Data
Store Data
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Different organizations/business units across different geographic locations…
Realistic View of Enterprise Data Flow
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Interacting with different business partners and customers
Realistic View of Enterprise Data Flow
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache NiFi
• Created to address the challenges of global enterprise dataflow
• Key features:
– Visual Command and Control
– Data Lineage (Provenance)
– Data Prioritization
– Data Buffering/Back-Pressure
– Control Latency vs. Throughput
– Secure Control Plane / Data Plane
– Scale Out Clustering
– Extensibility
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Apache NiFi
What is Apache NiFi used for?
• Reliable and secure transfer of data between systems
• Delivery of data from sources to analytic platforms
• Enrichment and preparation of data:
– Conversion between formats
– Extraction/Parsing
– Routing decisions
What is Apache NiFi NOT used for?
• Distributed Computation
• Complex Event Processing
• Joins / Complex Rolling Window Operations
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Hadoop Ecosystem Integrations
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HDFS Ingest
MergeContent
• Merges into appropriately sized files for HDFS
• Based on size, number of messages, and time
UpdateAttribute
• Sets the HDFS directory and filename
• Use expression language to dynamically bin by date:
/data/${now():format('yyyy/MM/dd/HH')}/
PutHDFS
• Writes FlowFile content to HDFS
• Supports Conflict Resolution Strategy and Kerberos
authentication
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HDFS Retrieval
ListHDFS
• Periodically perform listing on HDFS directory
• Produces FlowFile per HDFS file
• Flow only contains HDFS path & filename
FetchHDFS
• Retrieves a file from HDFS
• Use incoming FlowFiles to dynamically fetch:
HDFS Filename: ${path}/${filename}
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HDFS Retrieval in a Cluster
NCM
Node 1 (Primary)
ListHDFS
HDFS
FetchHDFS
RPG
Input Port
Node 2
ListHDFS
FetchHDFS
RPG
Input Port
• Perform “list” operation on
primary node
• Send results to Remote
Process Group pointing back
to same cluster
• Redistributes results to all
nodes to perform “fetch” in
parallel
• Same approach for ListFile +
FetchFile and ListSFTP +
FetchSFTP
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HBase Integration
• ControllerService wrapping HBase Client
• Implementation provided for HBase 1.1.2 Client
• Other implementations could be built as an extension
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HBase Ingest – Single Cell
• Table, Row Id, Col Family, and Col Qualifier
provided in processor, or dynamically from
attributes
• FlowFile content becomes the cell value
• Batch Size to specify maximum number of cells
for a single ‘put’ operation
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HBase Ingest – Full Row
• Table and Column Family provided in
processor, or dynamically from attributes
• Row ID can be a field in JSON, or a FlowFile
attribute
• JSON Field/Values become Column Qualifiers
and Values
• Complex Field Strategy
– Fail
– Warn
– Ignore
– Text
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Kafka Integration
PutKafka
• Provide Broker and Topic Name
• Publishes FlowFile content as one or more messages
• Ability to send large delimited content, split into
messages by NiFi
GetKafka
• Provide ZK Connection String and Topic Name
• Produces a FlowFile for each message consumed
18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Stream Processing Integration
• Stream processing systems
generally pull data, then push
results
• NiFi Site-To-Site pushes and pulls
between NiFi instances
• The Site-To-Site Client can be used
from a stream processing platform
https://github.com/apache/nifi/tree/master/nifi
-commons/nifi-site-to-site-client
Stream Processing
Site-To-Site Client
NiFi
Output Port
Stream Processing
Site-To-Site Client
NiFi
Input Port
Pull
Push
19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Site-to-Site Client Overview
• Push to Input Port, or Pull from Output Port
• Communicate with NiFi clusters, or standalone instances
• Handles load balancing and reliable delivery
• Secure connections using certificates (optional)
SiteToSiteClientConfig clientConfig =
new SiteToSiteClient.Builder()
.url(“http://localhost:8080/nifi”)
.portName(”My Port”)
.buildConfig();
20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Site-To-Site Client Pulling
SiteToSiteClient client = ...
Transaction transaction =
client.createTransaction(TransferDirection.RECEIVE);
DataPacket dataPacket = transaction.receive();
while (dataPacket != null) {
...
}
transaction.confirm();
transaction.complete();
21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Site-To-Site Client Pushing
SiteToSiteClient client = ...
Transaction transaction =
client.createTransaction(TransferDirection.SEND);
NiFiDataPacket data = ...
transaction.send(data.getContent(), data.getAttributes());
transaction.confirm();
transaction.complete();
22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Current Stream Processing Integrations
Spark Streaming - NiFi Spark Receiver
• https://github.com/apache/nifi/tree/master/nifi-external/nifi-spark-receiver
Storm – NiFi Spout
• https://github.com/apache/nifi/tree/master/nifi-external/nifi-storm-spout
Flink – NiFi Source & Sink
• https://github.com/apache/flink/tree/master/flink-streaming-connectors/flink-connector-nifi
Apex - NiFi Input Operators & Output Operators
• https://github.com/apache/incubator-apex-
malhar/tree/master/contrib/src/main/java/com/datatorrent/contrib/nifi
23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Other Relevant Integrations
• GetSolr, PutSolrContentStream
• FetchElasticSearch, PutElasticSearch
• GetMongo, PutMongo
• QueryCassandra, PutCassandraQL
• GetCouchbaseKey, PutCouchbaseKey
• QueryDatabaseTable, ExecuteSQL, PutSQL
• GetSplunk, PutSplunk
• And more! https://nifi.apache.org/docs.html
24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Use-Case/Architecture Discussion
25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Drive Data to Core for Analysis
NiFi
Stream
Processing
NiFi
NiFi
• Drive data from sources to central data center for analysis
• Tiered collection approach at various locations, think regional data centers
Edge
Edge
Core
Batch
Analytics
26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Dynamically Adjusting Data Flows
• Push analytic results back to core NiFi
• Push results back to edge locations/devices to change behavior
NiFi
NiFi
NiFi
Edge
Edge
Core
Batch
Analytics
Stream
Processing
27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
1. Logs filtered by level and sent from Edge -> Core
2. Stream processing produces new filters based on rate & sends back to core
3. Edge polls core for new filter levels & updates filtering
Example: Dynamic Log Collection
Core NiFi
Streaming
Processing
Edge NiFi
Logs Logs
New Filters
Logs Output Log Input Log Output
Result Input Store Result
Service Fetch ResultPoll Service
Filter
New Filters
New
Filters
Poll
Analytic
28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Dynamic Log Collection – Edge NiFi
29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Dynamic Log Collection – Core NiFi
30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Dynamic Log Collection Summary
NiFi
NiFi
NiFi
Edge
Edge
Core
Stream
Processing
Logs
Logs
Logs
New FiltersNew Filters
New Filters
31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Future Work
32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
The Future – Ecosystem Integrations
• Ambari
– Support a fully managed NiFi Cluster through Ambari
– Monitoring, management, upgrades, etc.
• Ranger
– Ability to delegate authorization decisions to Ranger
– Manage authorization controls through Ranger
• Atlas
– Track lineage from the source to destination
– Apply tags to data as its acquired
33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
The Future – Apache NiFi
• HA Control Plane
– Zero Master cluster, Web UI accessible from any node
– Auto-Election of “Cluster Coordinator” and “Primary Node” through ZooKeeper
• HA Data Plane
– Ability to replicate data across nodes in a cluster
• Multi-Tenancy
– Restrict Access to portions of a flow
– Allow people/groups with in an organization to only access their portions of the flow
• Extension Registry
– Create a central repository of NARs and Templates
– Move most NARs out of Apache NiFi distribution, ship with a minimal set
34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
The Future – Apache NiFi
• Variable Registry
– Define environment specific variables through the UI, reference through EL
– Make templates more portable across environments/instances
• Redesign of User Interface
– Modernize look & feel, improve usability, support multi-tenancy
• Continued Development of Integration Points
– New processors added continuously!
• MiNiFi
– Complimentary data collection agent to NiFi’s current approach
– Small, lightweight, centrally managed agent that integrates with NiFi for follow-on dataflow
management
35 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Thanks!
• Questions?
• Contact Info:
– Email: bbende@hortonworks.com
– Twitter: @bbende
36 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Thank you

More Related Content

What's hot

Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiReal-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Timothy Spann
 
Intelligently collecting data at the edge—intro to Apache MiNiFi
Intelligently collecting data at the edge—intro to Apache MiNiFiIntelligently collecting data at the edge—intro to Apache MiNiFi
Intelligently collecting data at the edge—intro to Apache MiNiFi
DataWorks Summit
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
Mahmood Reza Esmaili Zand
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
Timothy Spann
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
DataWorks Summit
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
DataWorks Summit
 
Introduction to snowflake
Introduction to snowflakeIntroduction to snowflake
Introduction to snowflake
Sunil Gurav
 
Cloud computing and OpenStack
Cloud computing and OpenStackCloud computing and OpenStack
Cloud computing and OpenStack
Edgar Magana
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Aldrin Piri
 
Cloudera SDX
Cloudera SDXCloudera SDX
Cloudera SDX
Cloudera, Inc.
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
Timothy Spann
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
Prashant Gupta
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
Edureka!
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
Alluxio, Inc.
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
DataWorks Summit/Hadoop Summit
 
Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and Kafka
DataWorks Summit
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifi
Anshuman Ghosh
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
Databricks
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
Rommel Garcia
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
DataWorks Summit
 

What's hot (20)

Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFiReal-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
Real-time Twitter Sentiment Analysis and Image Recognition with Apache NiFi
 
Intelligently collecting data at the edge—intro to Apache MiNiFi
Intelligently collecting data at the edge—intro to Apache MiNiFiIntelligently collecting data at the edge—intro to Apache MiNiFi
Intelligently collecting data at the edge—intro to Apache MiNiFi
 
Apache hive introduction
Apache hive introductionApache hive introduction
Apache hive introduction
 
Running Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration OptionsRunning Apache NiFi with Apache Spark : Integration Options
Running Apache NiFi with Apache Spark : Integration Options
 
Apache Nifi Crash Course
Apache Nifi Crash CourseApache Nifi Crash Course
Apache Nifi Crash Course
 
Best practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at RenaultBest practices and lessons learnt from Running Apache NiFi at Renault
Best practices and lessons learnt from Running Apache NiFi at Renault
 
Introduction to snowflake
Introduction to snowflakeIntroduction to snowflake
Introduction to snowflake
 
Cloud computing and OpenStack
Cloud computing and OpenStackCloud computing and OpenStack
Cloud computing and OpenStack
 
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San JoseDataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
Dataflow with Apache NiFi - Apache NiFi Meetup - 2016 Hadoop Summit - San Jose
 
Cloudera SDX
Cloudera SDXCloudera SDX
Cloudera SDX
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | EdurekaWhat are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
What are Hadoop Components? Hadoop Ecosystem and Architecture | Edureka
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and Kafka
 
Introduction to data flow management using apache nifi
Introduction to data flow management using apache nifiIntroduction to data flow management using apache nifi
Introduction to data flow management using apache nifi
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 
Apache Ranger
Apache RangerApache Ranger
Apache Ranger
 
Dataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFiDataflow Management From Edge to Core with Apache NiFi
Dataflow Management From Edge to Core with Apache NiFi
 

Viewers also liked

Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup Slides
Isheeta Sanghi
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Mats Johansson
 
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiTaking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Bryan Bende
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
Manish Gupta
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
Hortonworks
 
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Isheeta Sanghi
 
Hbase trabalho final
Hbase trabalho finalHbase trabalho final
Hbase trabalho final
Lanylldo Araujo
 
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionHDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi Introduction
Milind Pandit
 
Talendビッグデータインテグレーション製品ご紹介
Talendビッグデータインテグレーション製品ご紹介Talendビッグデータインテグレーション製品ご紹介
Talendビッグデータインテグレーション製品ご紹介
Talend KK
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
Noam Shaish
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Hortonworks
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability Meetup
Saptak Sen
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep Dive
Bryan Bende
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
datamantra
 
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellApache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in Nutshell
DataWorks Summit/Hadoop Summit
 
Design a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDFDesign a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDF
Hortonworks
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical Workshop
Hortonworks
 
Reference architecture for Internet of Things
Reference architecture for Internet of ThingsReference architecture for Internet of Things
Reference architecture for Internet of Things
Sujee Maniyam
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
Hortonworks
 

Viewers also liked (20)

Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup Slides
 
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big DataHortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
Hortonworks DataFlow & Apache Nifi @Oslo Hadoop Big Data
 
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiTaking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 
Aliran sesat
Aliran sesatAliran sesat
Aliran sesat
 
Apache NiFi Toronto Meetup
Apache NiFi Toronto MeetupApache NiFi Toronto Meetup
Apache NiFi Toronto Meetup
 
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFiBeyond Messaging Enterprise Dataflow powered by Apache NiFi
Beyond Messaging Enterprise Dataflow powered by Apache NiFi
 
Hbase trabalho final
Hbase trabalho finalHbase trabalho final
Hbase trabalho final
 
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionHDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi Introduction
 
Talendビッグデータインテグレーション製品ご紹介
Talendビッグデータインテグレーション製品ご紹介Talendビッグデータインテグレーション製品ご紹介
Talendビッグデータインテグレーション製品ご紹介
 
Spark streaming
Spark streamingSpark streaming
Spark streaming
 
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in HadoopDiscover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
Discover HDP2.1: Apache Storm for Stream Data Processing in Hadoop
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability Meetup
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep Dive
 
Introduction to Spark Streaming
Introduction to Spark StreamingIntroduction to Spark Streaming
Introduction to Spark Streaming
 
Apache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in NutshellApache NiFi 1.0 in Nutshell
Apache NiFi 1.0 in Nutshell
 
Design a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDFDesign a Dataflow in 7 minutes with Apache NiFi/HDF
Design a Dataflow in 7 minutes with Apache NiFi/HDF
 
HDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical WorkshopHDF: Hortonworks DataFlow: Technical Workshop
HDF: Hortonworks DataFlow: Technical Workshop
 
Reference architecture for Internet of Things
Reference architecture for Internet of ThingsReference architecture for Internet of Things
Reference architecture for Internet of Things
 
Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5Webinar Series Part 5 New Features of HDF 5
Webinar Series Part 5 New Features of HDF 5
 

Similar to Apache NiFi in the Hadoop Ecosystem

State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & Community
Accumulo Summit
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
Aldrin Piri
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFi
Hortonworks
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Seetharam Venkatesh
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
DataWorks Summit
 
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Data Con LA
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
Joe Percivall
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
DataWorks Summit/Hadoop Summit
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA
 
Connecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiConnecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFi
DataWorks Summit
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Alex Zeltov
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
Isheeta Sanghi
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and Flink
Bryan Bende
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
Hortonworks
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
Isheeta Sanghi
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
Isheeta Sanghi
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
ahortonworks
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
DataWorks Summit/Hadoop Summit
 
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
huguk
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
Chris Nauroth
 

Similar to Apache NiFi in the Hadoop Ecosystem (20)

State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & Community
 
Future of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep DiveFuture of Data New Jersey - HDF 3.0 Deep Dive
Future of Data New Jersey - HDF 3.0 Deep Dive
 
Mission to NARs with Apache NiFi
Mission to NARs with Apache NiFiMission to NARs with Apache NiFi
Mission to NARs with Apache NiFi
 
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015 Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
Data Governance in Apache Falcon - Hadoop Summit Brussels 2015
 
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache FalconDriving Enterprise Data Governance for Big Data Systems through Apache Falcon
Driving Enterprise Data Governance for Big Data Systems through Apache Falcon
 
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
 
Data Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat AlwellData Con LA 2018 - Streaming and IoT by Pat Alwell
Data Con LA 2018 - Streaming and IoT by Pat Alwell
 
Connecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFiConnecting the Drops with Apache NiFi & Apache MiNiFi
Connecting the Drops with Apache NiFi & Apache MiNiFi
 
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache ZeppelinIntro to Big Data Analytics using Apache Spark and Apache Zeppelin
Intro to Big Data Analytics using Apache Spark and Apache Zeppelin
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Integrating NiFi and Flink
Integrating NiFi and FlinkIntegrating NiFi and Flink
Integrating NiFi and Flink
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Curb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure ClusterCurb your insecurity with HDP - Tips for a Secure Cluster
Curb your insecurity with HDP - Tips for a Secure Cluster
 
Hadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in ProductionHadoop & Cloud Storage: Object Store Integration in Production
Hadoop & Cloud Storage: Object Store Integration in Production
 
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
 
Hadoop & cloud storage object store integration in production (final)
Hadoop & cloud storage  object store integration in production (final)Hadoop & cloud storage  object store integration in production (final)
Hadoop & cloud storage object store integration in production (final)
 

More from Bryan Bende

Apache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC Improvements
Bryan Bende
 
Apache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryApache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi Registry
Bryan Bende
 
Devnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiDevnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFi
Bryan Bende
 
You Can't Search Without Data
You Can't Search Without DataYou Can't Search Without Data
You Can't Search Without Data
Bryan Bende
 
Apache NiFi Record Processing
Apache NiFi Record ProcessingApache NiFi Record Processing
Apache NiFi Record Processing
Bryan Bende
 
Integrating NiFi and Apex
Integrating NiFi and ApexIntegrating NiFi and Apex
Integrating NiFi and Apex
Bryan Bende
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFi
Bryan Bende
 
Document Similarity with Cloud Computing
Document Similarity with Cloud ComputingDocument Similarity with Cloud Computing
Document Similarity with Cloud Computing
Bryan Bende
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014
Bryan Bende
 

More from Bryan Bende (9)

Apache NiFi SDLC Improvements
Apache NiFi SDLC ImprovementsApache NiFi SDLC Improvements
Apache NiFi SDLC Improvements
 
Apache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi RegistryApache NiFi Meetup - Introduction to NiFi Registry
Apache NiFi Meetup - Introduction to NiFi Registry
 
Devnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFiDevnexus 2018 - Let Your Data Flow with Apache NiFi
Devnexus 2018 - Let Your Data Flow with Apache NiFi
 
You Can't Search Without Data
You Can't Search Without DataYou Can't Search Without Data
You Can't Search Without Data
 
Apache NiFi Record Processing
Apache NiFi Record ProcessingApache NiFi Record Processing
Apache NiFi Record Processing
 
Integrating NiFi and Apex
Integrating NiFi and ApexIntegrating NiFi and Apex
Integrating NiFi and Apex
 
Building Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFiBuilding Data Pipelines for Solr with Apache NiFi
Building Data Pipelines for Solr with Apache NiFi
 
Document Similarity with Cloud Computing
Document Similarity with Cloud ComputingDocument Similarity with Cloud Computing
Document Similarity with Cloud Computing
 
Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014Real-Time Inverted Search NYC ASLUG Oct 2014
Real-Time Inverted Search NYC ASLUG Oct 2014
 

Recently uploaded

Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Jeffrey Haguewood
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
GDSC PJATK
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 

Recently uploaded (20)

Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
Letter and Document Automation for Bonterra Impact Management (fka Social Sol...
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!Finale of the Year: Apply for Next One!
Finale of the Year: Apply for Next One!
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 

Apache NiFi in the Hadoop Ecosystem

  • 1. Apache NiFi in the Hadoop Ecosystem Bryan Bende – Member of Technical Staff Hadoop Summit 2016 Dublin
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Outline • What is Apache NiFi? • Hadoop Ecosystem Integrations • Use-Case/Architecture Discussion • Future Work
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved About Me • Member of Technical Staff on Hortonworks DataFlow Team • Apache NiFi Committer & PMC Member • Integrations for HBase, Solr, Syslog, Stream Processing • Twitter: @bbende / Blog: bryanbende.com
  • 4. 4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved What is Apache NiFi?
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Simplistic View of Enterprise Data Flow Data Flow Process and Analyze Data Acquire Data Store Data
  • 6. 6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Different organizations/business units across different geographic locations… Realistic View of Enterprise Data Flow
  • 7. 7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Interacting with different business partners and customers Realistic View of Enterprise Data Flow
  • 8. 8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache NiFi • Created to address the challenges of global enterprise dataflow • Key features: – Visual Command and Control – Data Lineage (Provenance) – Data Prioritization – Data Buffering/Back-Pressure – Control Latency vs. Throughput – Secure Control Plane / Data Plane – Scale Out Clustering – Extensibility
  • 9. 9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Apache NiFi What is Apache NiFi used for? • Reliable and secure transfer of data between systems • Delivery of data from sources to analytic platforms • Enrichment and preparation of data: – Conversion between formats – Extraction/Parsing – Routing decisions What is Apache NiFi NOT used for? • Distributed Computation • Complex Event Processing • Joins / Complex Rolling Window Operations
  • 10. 10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Hadoop Ecosystem Integrations
  • 11. 11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDFS Ingest MergeContent • Merges into appropriately sized files for HDFS • Based on size, number of messages, and time UpdateAttribute • Sets the HDFS directory and filename • Use expression language to dynamically bin by date: /data/${now():format('yyyy/MM/dd/HH')}/ PutHDFS • Writes FlowFile content to HDFS • Supports Conflict Resolution Strategy and Kerberos authentication
  • 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDFS Retrieval ListHDFS • Periodically perform listing on HDFS directory • Produces FlowFile per HDFS file • Flow only contains HDFS path & filename FetchHDFS • Retrieves a file from HDFS • Use incoming FlowFiles to dynamically fetch: HDFS Filename: ${path}/${filename}
  • 13. 13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HDFS Retrieval in a Cluster NCM Node 1 (Primary) ListHDFS HDFS FetchHDFS RPG Input Port Node 2 ListHDFS FetchHDFS RPG Input Port • Perform “list” operation on primary node • Send results to Remote Process Group pointing back to same cluster • Redistributes results to all nodes to perform “fetch” in parallel • Same approach for ListFile + FetchFile and ListSFTP + FetchSFTP
  • 14. 14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HBase Integration • ControllerService wrapping HBase Client • Implementation provided for HBase 1.1.2 Client • Other implementations could be built as an extension
  • 15. 15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HBase Ingest – Single Cell • Table, Row Id, Col Family, and Col Qualifier provided in processor, or dynamically from attributes • FlowFile content becomes the cell value • Batch Size to specify maximum number of cells for a single ‘put’ operation
  • 16. 16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HBase Ingest – Full Row • Table and Column Family provided in processor, or dynamically from attributes • Row ID can be a field in JSON, or a FlowFile attribute • JSON Field/Values become Column Qualifiers and Values • Complex Field Strategy – Fail – Warn – Ignore – Text
  • 17. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Kafka Integration PutKafka • Provide Broker and Topic Name • Publishes FlowFile content as one or more messages • Ability to send large delimited content, split into messages by NiFi GetKafka • Provide ZK Connection String and Topic Name • Produces a FlowFile for each message consumed
  • 18. 18 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Stream Processing Integration • Stream processing systems generally pull data, then push results • NiFi Site-To-Site pushes and pulls between NiFi instances • The Site-To-Site Client can be used from a stream processing platform https://github.com/apache/nifi/tree/master/nifi -commons/nifi-site-to-site-client Stream Processing Site-To-Site Client NiFi Output Port Stream Processing Site-To-Site Client NiFi Input Port Pull Push
  • 19. 19 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Site-to-Site Client Overview • Push to Input Port, or Pull from Output Port • Communicate with NiFi clusters, or standalone instances • Handles load balancing and reliable delivery • Secure connections using certificates (optional) SiteToSiteClientConfig clientConfig = new SiteToSiteClient.Builder() .url(“http://localhost:8080/nifi”) .portName(”My Port”) .buildConfig();
  • 20. 20 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Site-To-Site Client Pulling SiteToSiteClient client = ... Transaction transaction = client.createTransaction(TransferDirection.RECEIVE); DataPacket dataPacket = transaction.receive(); while (dataPacket != null) { ... } transaction.confirm(); transaction.complete();
  • 21. 21 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Site-To-Site Client Pushing SiteToSiteClient client = ... Transaction transaction = client.createTransaction(TransferDirection.SEND); NiFiDataPacket data = ... transaction.send(data.getContent(), data.getAttributes()); transaction.confirm(); transaction.complete();
  • 22. 22 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Current Stream Processing Integrations Spark Streaming - NiFi Spark Receiver • https://github.com/apache/nifi/tree/master/nifi-external/nifi-spark-receiver Storm – NiFi Spout • https://github.com/apache/nifi/tree/master/nifi-external/nifi-storm-spout Flink – NiFi Source & Sink • https://github.com/apache/flink/tree/master/flink-streaming-connectors/flink-connector-nifi Apex - NiFi Input Operators & Output Operators • https://github.com/apache/incubator-apex- malhar/tree/master/contrib/src/main/java/com/datatorrent/contrib/nifi
  • 23. 23 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Other Relevant Integrations • GetSolr, PutSolrContentStream • FetchElasticSearch, PutElasticSearch • GetMongo, PutMongo • QueryCassandra, PutCassandraQL • GetCouchbaseKey, PutCouchbaseKey • QueryDatabaseTable, ExecuteSQL, PutSQL • GetSplunk, PutSplunk • And more! https://nifi.apache.org/docs.html
  • 24. 24 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Use-Case/Architecture Discussion
  • 25. 25 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Drive Data to Core for Analysis NiFi Stream Processing NiFi NiFi • Drive data from sources to central data center for analysis • Tiered collection approach at various locations, think regional data centers Edge Edge Core Batch Analytics
  • 26. 26 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Dynamically Adjusting Data Flows • Push analytic results back to core NiFi • Push results back to edge locations/devices to change behavior NiFi NiFi NiFi Edge Edge Core Batch Analytics Stream Processing
  • 27. 27 © Hortonworks Inc. 2011 – 2016. All Rights Reserved 1. Logs filtered by level and sent from Edge -> Core 2. Stream processing produces new filters based on rate & sends back to core 3. Edge polls core for new filter levels & updates filtering Example: Dynamic Log Collection Core NiFi Streaming Processing Edge NiFi Logs Logs New Filters Logs Output Log Input Log Output Result Input Store Result Service Fetch ResultPoll Service Filter New Filters New Filters Poll Analytic
  • 28. 28 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Dynamic Log Collection – Edge NiFi
  • 29. 29 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Dynamic Log Collection – Core NiFi
  • 30. 30 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Dynamic Log Collection Summary NiFi NiFi NiFi Edge Edge Core Stream Processing Logs Logs Logs New FiltersNew Filters New Filters
  • 31. 31 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Future Work
  • 32. 32 © Hortonworks Inc. 2011 – 2016. All Rights Reserved The Future – Ecosystem Integrations • Ambari – Support a fully managed NiFi Cluster through Ambari – Monitoring, management, upgrades, etc. • Ranger – Ability to delegate authorization decisions to Ranger – Manage authorization controls through Ranger • Atlas – Track lineage from the source to destination – Apply tags to data as its acquired
  • 33. 33 © Hortonworks Inc. 2011 – 2016. All Rights Reserved The Future – Apache NiFi • HA Control Plane – Zero Master cluster, Web UI accessible from any node – Auto-Election of “Cluster Coordinator” and “Primary Node” through ZooKeeper • HA Data Plane – Ability to replicate data across nodes in a cluster • Multi-Tenancy – Restrict Access to portions of a flow – Allow people/groups with in an organization to only access their portions of the flow • Extension Registry – Create a central repository of NARs and Templates – Move most NARs out of Apache NiFi distribution, ship with a minimal set
  • 34. 34 © Hortonworks Inc. 2011 – 2016. All Rights Reserved The Future – Apache NiFi • Variable Registry – Define environment specific variables through the UI, reference through EL – Make templates more portable across environments/instances • Redesign of User Interface – Modernize look & feel, improve usability, support multi-tenancy • Continued Development of Integration Points – New processors added continuously! • MiNiFi – Complimentary data collection agent to NiFi’s current approach – Small, lightweight, centrally managed agent that integrates with NiFi for follow-on dataflow management
  • 35. 35 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Thanks! • Questions? • Contact Info: – Email: bbende@hortonworks.com – Twitter: @bbende
  • 36. 36 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Thank you