Apache NiFi 1.0 in a Nutshell
Koji Kawamura – Software Engineer
Arti Wadhwani – Technical Support Engineer
2016 October 27
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Agenda
What’s NiFi
NiFi 1.0 Enhancements
NiFi on the edge
Common issues
What’s Next?
A Brief History
2006: NiagaraFiles (NiFi) was first conceived at the National Security Agency (NSA).
November 2014: NiFi is donated to the Apache Software Foundation (ASF) through the NSA's Technology Transfer Program and enters the ASF incubator.
July 2015: NiFi reaches ASF top-level project status.
What’s Apache NiFi?
"NiFi is like digging irrigation ditches as the water flows, rather than building out a sprinkler system in advance."
https://mail-archives.apache.org/mod_mbox/nifi-users/201604.mbox/%3C2FCCBD60-0A79-42F1-9F9B-A121591C826E@apache.org%3E
NiFi is a tool for dataflow management.
Simplistic View of Dataflows: Easy, Definitive
[Diagram: a single dataflow linking Acquire Data, Process and Analyze Data, and Store Data]
Realistic View of Dataflows: Complex, Convoluted
[Diagram: many Acquire Data and Store Data stages around Process and Analyze Data, all interconnected through the dataflow]
NiFi 1.0 Has 170+ Processors, a 30% Increase from NiFi 0.7
Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, Fetch, HL7, FTP, UDP, XML, SFTP, HTTP, Syslog, Email, HTML, Image, AMQP, MQTT
All Apache project logos are trademarks of the ASF and the respective projects.
Deeper Ecosystem Integration – New Processors
▪ Publish/ConsumeKafka: Two NARs, built with the Kafka 0.9 and 0.10 client libraries, respectively
▪ JoltTransformJSON: Manipulate JSON data on the fly, with a preview function
▪ GenerateTableFetch: Incremental fetch, plus parallel fetch across partitions of the source table
▪ PutHiveQL: Ingest into Hive tables
▪ SelectHiveQL: Select from Hive tables
▪ PutHiveStreaming: Ingest streaming data into Hive using the Hive Streaming API
▪ ConvertAvroToORC: Format conversion, Avro to ORC
▪ Publish/ConsumeMQTT: Publish and consume via MQTT, a popular protocol in the IoT world
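GenerateTableFetch is described above as incremental fetch plus parallel fetch. A minimal Python sketch of that idea, assuming an integer max-value column and a caller-supplied row count: it emits one paged SELECT per partition, covering only rows newer than the last-seen maximum. The function name and parameters are hypothetical, and the LIMIT/OFFSET paging syntax is an assumption; the real processor's generated SQL depends on the configured database type.

```python
from typing import List, Optional


def generate_table_fetch(table: str, max_value_column: str, last_max: Optional[int],
                         current_max: int, row_count: int, partition_size: int) -> List[str]:
    """Build one paged SELECT per partition covering rows added since last_max."""
    # Incremental fetch: only rows newer than the previously seen maximum.
    since = "" if last_max is None else f" WHERE {max_value_column} > {last_max}"
    joiner = " AND" if since else " WHERE"
    where = f"{since}{joiner} {max_value_column} <= {current_max}"
    queries = []
    # Parallel fetch: each page is an independent query that a separate task can run.
    for offset in range(0, row_count, partition_size):
        queries.append(
            f"SELECT * FROM {table}{where} ORDER BY {max_value_column} "
            f"LIMIT {partition_size} OFFSET {offset}"
        )
    return queries


for query in generate_table_fetch("orders", "id", last_max=100, current_max=350,
                                  row_count=250, partition_size=100):
    print(query)
```

On the next run, `current_max` becomes the new `last_max`, so previously fetched rows are skipped.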
Data Movement Management
[Diagram: Sources → Regional Infrastructure → Core Infrastructure. Toward the sources: constrained, high-latency, localized context. Toward the core: hybrid cloud/on-premises, low-latency, global context.]
Hortonworks DataFlow (HDF)
[Diagram: Sources → Regional Infrastructure → Core Infrastructure, spanning from constrained, high-latency, localized context at the sources to hybrid cloud/on-premises, low-latency, global context at the core]
Detailed Breakdown of Requirements
Flow Management
▪ Req 1: Acquire data from the cloud instances of various wearable devices.
▪ Req 2: Move data from customer cloud instances to an on-premises instance.
▪ Req 3: Perform intelligent routing and filtering of data. The routing and filtering rules will change often at run time.
▪ Req 4: Deliver the data to various downstream systems. New downstream apps will keep appearing, and data should be fed to each one as it comes online.
▪ Req 5: Parse the device data into a standardized format that downstream systems can understand.
▪ Req 6: Enrich the data with contextual information, including patient/customer info (age, gender, etc.).
Stream Processing & Analytics
▪ Req 7: Recognize the pattern when the resting heart rate exceeds a certain threshold (the insight), then create an alert/notification.
▪ Req 8: Run an outlier detection model on the incoming streaming heart rate. If the score is above a certain threshold, alert on the heart rate.
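Req 7 and Req 8 can be illustrated with a small sketch. The slide does not specify the outlier model, so this assumes a simple z-score against a running mean and sample standard deviation (Welford's algorithm); the class name, thresholds, and alert labels are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class HeartRateMonitor:
    resting_threshold: float  # Req 7: alert when resting heart rate exceeds this (bpm)
    z_threshold: float        # Req 8: alert when the outlier score exceeds this
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0           # running sum of squared deviations (Welford's algorithm)

    def score(self, bpm: float) -> float:
        """Z-score of bpm against readings seen so far (0.0 until variance is defined)."""
        if self.count < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.count - 1))
        return 0.0 if std == 0 else abs(bpm - self.mean) / std

    def update(self, bpm: float) -> None:
        self.count += 1
        delta = bpm - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (bpm - self.mean)

    def process(self, bpm: float, resting: bool) -> List[str]:
        alerts = []
        if resting and bpm > self.resting_threshold:
            alerts.append("resting-threshold")
        if self.score(bpm) > self.z_threshold:
            alerts.append("outlier")
        self.update(bpm)  # score first, so each reading is compared to prior history
        return alerts


monitor = HeartRateMonitor(resting_threshold=100, z_threshold=3.0)
for bpm in [60, 62, 61, 63, 60]:
    monitor.process(bpm, resting=True)  # stable baseline: no alerts
print(monitor.process(130, resting=True))  # ['resting-threshold', 'outlier']
```

The running statistics keep per-reading work constant, which suits a streaming setting; a production model would likely be trained offline and scored in the stream processor.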
NiFi 1.0: Modernized UI
Modernized UI – Complete Interface Redesign
Connect Components to Design Your Dataflow
Components, from left to right:
▪ Processor: Purpose-built processing unit, e.g. GetXXX, PutXXX
▪ Input Port: Endpoint for receiving data between Process Groups (local/remote)
▪ Output Port: Endpoint for exposing data between Process Groups (local/remote)
▪ Process Group: A must-have for designing a well-structured dataflow
▪ Remote Process Group: Enables data transfer between NiFi deployments via Site-to-Site
▪ Funnel: Bundles multiple relationships into one
▪ Template: Shares part of a dataflow
▪ Label: Visually groups processors and adds descriptions
Data Provenance
NiFi 1.0: Multitenant Authorization
NiFi 0.x - Authorization Model
▪ Role-based authorization, with these roles:
– Dataflow Manager (DFM)
– Monitor
– Provenance
– Admin
– Proxy
– NiFi
▪ Limitation: an all-or-nothing model
– A DFM can change everything; a Monitor can change nothing
– No way to let a user modify or view only certain components
– Isolating users would require standing up multiple NiFi instances
NiFi 1.0 - Authorization Model
▪ NiFi 1.0 introduces a new delegated authorization model
▪ Each request is authorized based on user identity, action, and resource
– Example: user1 modifying properties on processor1
• User Identity: user1
• Action: WRITE
• Resource: processor1 (UUID)
▪ If the authorizer reports that the resource is not found, the parent is checked; if the parent is not found, the parent's parent is checked, and so on.
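The parent fallback can be sketched in Python. This is not NiFi's authorizer API: the class, the `Result` values, and the `/processors/...`-style resource strings are illustrative assumptions, and denying when no policy exists anywhere up the chain is likewise an assumed default.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Result(Enum):
    APPROVED = "approved"
    DENIED = "denied"
    RESOURCE_NOT_FOUND = "resource_not_found"


class HierarchicalAuthorizer:
    def __init__(self, policies: Dict[Tuple[str, str], Set[str]], parents: Dict[str, str]):
        self.policies = policies  # (resource, action) -> users granted that action
        self.parents = parents    # child resource -> parent resource

    def _check(self, user: str, action: str, resource: str) -> Result:
        users = self.policies.get((resource, action))
        if users is None:
            return Result.RESOURCE_NOT_FOUND  # no policy defined for this resource
        return Result.APPROVED if user in users else Result.DENIED

    def authorize(self, user: str, action: str, resource: str) -> Result:
        current: Optional[str] = resource
        while current is not None:
            result = self._check(user, action, current)
            if result is not Result.RESOURCE_NOT_FOUND:
                return result
            current = self.parents.get(current)  # fall back to parent, grandparent, ...
        return Result.DENIED  # nothing found up to the root: deny by default


auth = HierarchicalAuthorizer(
    policies={("/process-groups/root", "WRITE"): {"user1"}},
    parents={
        "/processors/processor1": "/process-groups/pg1",
        "/process-groups/pg1": "/process-groups/root",
    },
)
print(auth.authorize("user1", "WRITE", "/processors/processor1"))  # Result.APPROVED
```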
NiFi 1.0 – NiFi Managed Authorizer vs. External Authorizer
▪ Managed Authorizer
– File-based persistence
• Could be extended to other persistence mechanisms
– NiFi UI to manage policies
– NiFi controls authorization logic
▪ External Authorizer
– Ranger integration
– Ranger UI to manage policies
– Ranger controls authorization logic
NiFi 1.0 – Managing Users
▪ Clicking the new user icon allows the admin to create users and groups
– Individual users can be grouped
– Groups can be assigned members
▪ Clicking the edit user icon allows the admin to update a specific user/group
NiFi 1.0 – UI Overview
▪ The Users icon in the Global Menu is used to access users/groups
▪ The Lock icon in the Global Menu is used to access global policies
NiFi 1.0 – UI Overview
▪ The Lock icon in the palette is used to access policies for the currently selected component (selection context)
NiFi 1.0 – Overriding Component Policies
▪ Components inherit policies from the closest ancestor Process Group that has policies defined
▪ View and modify policies are handled independently
▪ Click Override to define a new policy, then add users and groups
▪ The new users and groups override the inherited policies (whitelisting)
NiFi 1.0 - Multi-Tenancy Example
▪ Create a group for Team 1 and a group for Team 2
▪ Give Team 1 view & modify for Process Group 1
▪ Give Team 2 view & modify for Process Group 2
▪ What a user from Team 1 would see: they can't see the name of the group and can't right-click to configure it, but can enter the group
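A minimal sketch of this multi-tenancy setup, assuming group-based policies, separate READ (view) and WRITE (modify) actions, and inheritance from the closest ancestor that has a policy for the action. All user, group, and resource names are hypothetical. The last lines show an override: a policy defined on the component itself replaces the inherited one, for that action only.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical users and their group memberships.
MEMBERSHIP: Dict[str, Set[str]] = {"alice": {"team1"}, "bob": {"team2"}}

# Child resource -> parent resource.
PARENTS: Dict[str, str] = {
    "/process-groups/pg1": "/process-groups/root",
    "/process-groups/pg2": "/process-groups/root",
    "/processors/p1": "/process-groups/pg1",
}

# (resource, action) -> users/groups granted that action.
POLICIES: Dict[Tuple[str, str], Set[str]] = {
    ("/process-groups/pg1", "READ"): {"team1"},
    ("/process-groups/pg1", "WRITE"): {"team1"},
    ("/process-groups/pg2", "READ"): {"team2"},
    ("/process-groups/pg2", "WRITE"): {"team2"},
}


def allowed(user: str, action: str, resource: str) -> bool:
    principals = {user} | MEMBERSHIP.get(user, set())
    current: Optional[str] = resource
    while current is not None:
        granted = POLICIES.get((current, action))
        if granted is not None:  # closest ancestor with a policy decides
            return bool(principals & granted)
        current = PARENTS.get(current)
    return False


print(allowed("alice", "WRITE", "/processors/p1"))      # True: inherited from pg1
print(allowed("alice", "READ", "/process-groups/pg2"))  # False: pg2 belongs to team2

# Override: a WRITE policy on p1 itself replaces the inherited WRITE policy,
# while READ on p1 is still inherited from pg1 (view and modify are independent).
POLICIES[("/processors/p1", "WRITE")] = {"bob"}
print(allowed("alice", "WRITE", "/processors/p1"))  # False
print(allowed("alice", "READ", "/processors/p1"))   # True
```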
NiFi 1.0 – Revisions
▪ Revisions are tracked per component
▪ Supports concurrent editing of different components without the need to refresh
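Per-component revisions amount to optimistic concurrency control. A minimal sketch under that assumption: each component carries its own version number, and an update is accepted only if the client's version matches the current one. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


class RevisionConflict(Exception):
    """Raised when a client submits a stale revision for a component."""


@dataclass
class RevisionManager:
    versions: Dict[str, int] = field(default_factory=dict)  # component id -> version

    def current(self, component_id: str) -> int:
        return self.versions.get(component_id, 0)

    def update(self, component_id: str, client_version: int) -> int:
        """Apply an edit if the client saw the latest version; return the new version."""
        if client_version != self.current(component_id):
            raise RevisionConflict(
                f"{component_id}: client has version {client_version}, "
                f"server has {self.current(component_id)}"
            )
        self.versions[component_id] = client_version + 1
        return self.versions[component_id]


revisions = RevisionManager()
# Two users editing different components never conflict with each other.
revisions.update("processor-a", 0)
revisions.update("processor-b", 0)
# A second edit to processor-a based on the stale version 0 is rejected.
try:
    revisions.update("processor-a", 0)
except RevisionConflict as err:
    print("conflict:", err)
```

Because versions are tracked per component rather than for the whole flow, an edit to one component never invalidates another user's view of a different component.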
NiFi 1.0: Zero Master Clustering
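NiFi 0.x: NCM (NiFi Cluster Manager)
▪ A single NCM manages the cluster nodes (Node1, Node2); one node acts as the Primary node
▪ Data from an external source arrives in chunks spread across the nodes; the distribution mechanism depends on the data source
▪ The web UI and other NiFi instances interact with the NCM
▪ Site-to-Site: clients get the topology from the NCM, then transfer data peer-to-peer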
NiFi 1.0: ZMC (Zero Master Clustering)
▪ No cluster manager: Node1, Node2, and Node3 are peers
▪ ZooKeeper elects the Cluster Coordinator and the Primary node
▪ Any node can fail
▪ Data from an external source arrives in chunks spread across the nodes; the distribution mechanism depends on the data source
▪ The web UI and other NiFi instances can interact with any node
▪ Site-to-Site: clients get the topology from one of the nodes, then transfer data peer-to-peer
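The election can be modeled with the classic ZooKeeper recipe, in which each candidate creates an ephemeral sequential znode and the lowest sequence number wins. The sketch below simulates that recipe in memory; it is not NiFi's implementation, and the class, role, and node names are illustrative.

```python
import itertools
from typing import Dict, List, Optional, Tuple


class ElectionSketch:
    """In-memory model of the ZooKeeper 'lowest sequence number wins' election recipe."""

    def __init__(self) -> None:
        self._seq = itertools.count()  # ZooKeeper assigns increasing sequence numbers
        self.candidates: Dict[str, List[Tuple[int, str]]] = {}  # role -> [(seq, node)]

    def join(self, role: str, node: str) -> None:
        # Models creating an ephemeral sequential znode for this role.
        self.candidates.setdefault(role, []).append((next(self._seq), node))

    def fail(self, node: str) -> None:
        # When a node's session ends, ZooKeeper deletes its ephemeral znodes.
        for role in self.candidates:
            self.candidates[role] = [c for c in self.candidates[role] if c[1] != node]

    def leader(self, role: str) -> Optional[str]:
        entries = self.candidates.get(role, [])
        return min(entries)[1] if entries else None


election = ElectionSketch()
for node in ["node1", "node2", "node3"]:
    election.join("Cluster Coordinator", node)
for node in ["node2", "node3", "node1"]:
    election.join("Primary Node", node)
print(election.leader("Cluster Coordinator"), election.leader("Primary Node"))  # node1 node2
election.fail("node1")  # any node can fail; the next candidate takes over
print(election.leader("Cluster Coordinator"))  # node2
```

The two roles are elected independently, so the Cluster Coordinator and the Primary node may or may not be the same node.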
NiFi 1.0: And More!
Foundational Work for SDLC
▪ Deterministic template export
– Deterministic ordering in the template XML file
– Enables version control of templates
– Supports a collaborative SDLC effort
▪ Variable registry
– Phase-one implementation
– In-memory variable registry
– The same key referenced in a template can be mapped to environment-specific values
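A minimal sketch of how one template can reference a key that each environment maps to its own value. It assumes `${key}` reference syntax, as in NiFi Expression Language, but resolves only plain references (no functions); the keys, values, and environment names are hypothetical.

```python
import re
from typing import Dict

# Hypothetical per-environment registries: same keys, environment-specific values.
REGISTRIES: Dict[str, Dict[str, str]] = {
    "dev": {"kafka.brokers": "dev-kafka:9092", "hdfs.dir": "/tmp/dev/ingest"},
    "prod": {"kafka.brokers": "kafka1:9092,kafka2:9092", "hdfs.dir": "/data/ingest"},
}

REFERENCE = re.compile(r"\$\{([^}]+)\}")


def resolve(value: str, registry: Dict[str, str]) -> str:
    """Replace each ${key} with its registry value, leaving unknown keys untouched."""
    return REFERENCE.sub(lambda m: registry.get(m.group(1), m.group(0)), value)


# The exported template stores only the reference, never the concrete value.
template_property = "${kafka.brokers}"
for env in ("dev", "prod"):
    print(env, resolve(template_property, REGISTRIES[env]))
```

Keeping concrete values out of the template is what makes the same exported template deployable unchanged across environments.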
Enter the TLS Toolkit
⬢ Command-line tool to automate
certificate generation and
configuration
⬢ Self-contained certificate authority
(CA) for certificate signing
⬢ Keystore & truststore generation
⬢ Client certificate generation
⬢ Automatically updates nifi.properties
⬢ Underpins Ambari TLS integration
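Site-to-Site Refactoring – S2S HTTP(S) Protocol through Proxy Server
[Diagram: a NiFi JVM hosting the NiFi framework, the Extension API (processors, controller services, reporting tasks), the REST API, and the S2S API; a separate JVM runs the S2S client libraries]
▪ Socket protocol: TCP
▪ HDF 2.0: HTTP(S) protocol, which can go through an HTTP proxy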
Edge Intelligence with Apache MiNiFi
Key Features
▪ Guaranteed delivery
▪ Data buffering
– Backpressure
– Pressure release
▪ Prioritized queuing
▪ Flow-specific QoS
– Latency vs. throughput
– Loss tolerance
▪ Data provenance
▪ Recovery / recording a rolling log of fine-grained history
▪ Designed for extension
Different from Apache NiFi
▪ Design and deploy
▪ Warm re-deploys
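Back pressure and prioritized queuing can be sketched as a connection queue. Assumptions: a single object-count threshold (NiFi connections also support a size threshold), lower numbers meaning higher priority, and FIFO order within a priority; the class and parameter names are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Connection:
    """A queue between two processors with back pressure and prioritized delivery."""
    object_threshold: int  # back pressure applies once this many items are queued
    _heap: List[Tuple[int, int, str]] = field(default_factory=list)
    _order: itertools.count = field(default_factory=itertools.count)

    def is_back_pressured(self) -> bool:
        # While True, the upstream component should stop producing into this queue.
        return len(self._heap) >= self.object_threshold

    def offer(self, flowfile: str, priority: int) -> bool:
        if self.is_back_pressured():
            return False  # upstream must buffer the data and retry later
        # Lower number = higher priority; the counter keeps FIFO order within a priority.
        heapq.heappush(self._heap, (priority, next(self._order), flowfile))
        return True

    def poll(self) -> Optional[str]:
        return heapq.heappop(self._heap)[2] if self._heap else None


conn = Connection(object_threshold=2)
print(conn.offer("routine-1", priority=5))  # True
print(conn.offer("alert-1", priority=1))    # True
print(conn.offer("routine-2", priority=5))  # False: back pressure applied
print(conn.poll())                          # alert-1
```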
NiFi vs. MiNiFi Java Processor: Smaller Footprint (~40 MB)
▪ NiFi: NiFi framework + user interface + components
▪ MiNiFi: NiFi framework + components (no user interface)
Common issues
▪ HBase connection issues – ClassNotFoundException
▪ NiFi SSL issues
▪ ExecuteSQL processor issues
▪ NiFi content repository full
▪ PutKafka/GetKafka issues
▪ Issues after enabling Kerberos
▪ OutOfMemory issues
Interesting Issues/Use Cases
▪ TBD (need to add 2-3 interesting issues/use cases)
Best Practices
▪ Debug logging for processor issues
▪ NiFi Site-to-Site practices
▪ Core properties tuning
▪ JVM tuning
▪ Understanding health via the NiFi UI
What’s Next
NiFi
▪ Framework extension
– Distributed data durability (HA data)
– Configuration management flows (SDLC)
▪ Enhanced user experience
– Template/Extension Registry
– Variable Registry
▪ Deeper ecosystem integration
MiNiFi
▪ Central command and control
▪ Native agent (GA)
Product requirements: https://cwiki.apache.org/confluence/display/NIFI/Product+requirements (or search for "NiFi product requirements")
Thank You

Apache Apex
 
Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
Yifeng Jiang
 
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data AnalysisApache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
DataWorks Summit/Hadoop Summit
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep Dive
Bryan Bende
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability Meetup
Saptak Sen
 
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiTaking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Bryan Bende
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks
 
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionHDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi Introduction
Milind Pandit
 
Apache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitApache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop Summit
Aldrin Piri
 

Similar to Apache NiFi 1.0 in Nutshell (20)

Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFiData at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
Data at Scales and the Values of Starting Small with Apache NiFi & MiNiFi
 
The Avant-garde of Apache NiFi
The Avant-garde of Apache NiFiThe Avant-garde of Apache NiFi
The Avant-garde of Apache NiFi
 
State of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & CommunityState of the Apache NiFi Ecosystem & Community
State of the Apache NiFi Ecosystem & Community
 
Apache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup SlidesApache NiFi- MiNiFi meetup Slides
Apache NiFi- MiNiFi meetup Slides
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Integrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache FlinkIntegrating Apache NiFi and Apache Flink
Integrating Apache NiFi and Apache Flink
 
Integrating NiFi and Apex
Integrating NiFi and ApexIntegrating NiFi and Apex
Integrating NiFi and Apex
 
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFIHarnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
Harnessing Data-in-Motion with HDF 2.0, introduction to Apache NIFI/MINIFI
 
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
Big Data Day LA 2016/ Big Data Track - Building scalable enterprise data flow...
 
Integrating Apache NiFi and Apache Apex
Integrating Apache NiFi and Apache Apex Integrating Apache NiFi and Apache Apex
Integrating Apache NiFi and Apache Apex
 
Nifi workshop
Nifi workshopNifi workshop
Nifi workshop
 
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data AnalysisApache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
Apache Zeppelin + LIvy: Bringing Multi Tenancy to Interactive Data Analysis
 
NJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep DiveNJ Hadoop Meetup - Apache NiFi Deep Dive
NJ Hadoop Meetup - Apache NiFi Deep Dive
 
Introduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability MeetupIntroduction to Apache NiFi - Seattle Scalability Meetup
Introduction to Apache NiFi - Seattle Scalability Meetup
 
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFiTaking DataFlow Management to the Edge with Apache NiFi/MiNiFi
Taking DataFlow Management to the Edge with Apache NiFi/MiNiFi
 
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
Hortonworks Data in Motion Webinar Series Part 7 Apache Kafka Nifi Better Tog...
 
HDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi IntroductionHDF Powered by Apache NiFi Introduction
HDF Powered by Apache NiFi Introduction
 
Apache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop SummitApache NiFi Crash Course - San Jose Hadoop Summit
Apache NiFi Crash Course - San Jose Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
DataWorks Summit/Hadoop Summit
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
DataWorks Summit/Hadoop Summit
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
DataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
DataWorks Summit/Hadoop Summit
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
DataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
DataWorks Summit/Hadoop Summit
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
DataWorks Summit/Hadoop Summit
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
DataWorks Summit/Hadoop Summit
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
DataWorks Summit/Hadoop Summit
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
DataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
DataWorks Summit/Hadoop Summit
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
DataWorks Summit/Hadoop Summit
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
DataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
DataWorks Summit/Hadoop Summit
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
DataWorks Summit/Hadoop Summit
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
DataWorks Summit/Hadoop Summit
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
DataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage SchemesScaling HDFS to Manage Billions of Files with Distributed Storage Schemes
Scaling HDFS to Manage Billions of Files with Distributed Storage Schemes
 

Recently uploaded

UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
Octavian Nadolu
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 

Recently uploaded (20)

UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Artificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopmentArtificial Intelligence for XMLDevelopment
Artificial Intelligence for XMLDevelopment
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 

Apache NiFi 1.0 in Nutshell

  • 1. Apache NiFi 1.0 in Nutshell. Koji Kawamura – Software Engineer; Arti Wadhwani – Technical Support Engineer. October 27, 2016
  • 2. Agenda: What’s NiFi; NiFi 1.0 Enhancements; NiFi on the edge; Common issues; What’s Next?
  • 3. Agenda: What’s NiFi; NiFi 1.0 Enhancements; NiFi on the edge; Common issues; What’s Next?
  • 4. A Brief History. 2006 – NiagaraFiles (NiFi) was first conceived at the National Security Agency (NSA). November 2014 – NiFi is donated to the Apache Software Foundation (ASF) through the NSA’s Technology Transfer Program and enters the ASF incubator. July 2015 – NiFi reaches ASF top-level project status.
  • 5. What’s Apache NiFi? “NiFi is like digging irrigation ditches as the water flows, rather than building out a sprinkler system in advance.” (Japanese version on the slide: “Rather than deploying sprinklers in advance, NiFi is like building irrigation channels to match the water as it flows.”) https://mail-archives.apache.org/mod_mbox/nifi-users/201604.mbox/%3C2FCCBD60-0A79-42F1-9F9B-A121591C826E@apache.org%3E
  • 6. NiFi is a tool for Data Flow Management
  • 7. Simplistic View of Dataflows: Easy, Definitive. Acquire Data → Process and Analyze Data → Store Data (Dataflow)
  • 8. Realistic View of Dataflows: Complex, Convoluted. Many Acquire Data sources and many Store Data destinations, all interconnected with Process and Analyze Data (Dataflow)
  • 9. NiFi 1.0 has 170+ Processors, a 30% Increase from NiFi 0.7. Capabilities include Hash, Extract, Merge, Duplicate, Scan, GeoEnrich, Replace, Convert, Split, Translate, Route Content, Route Context, Route Text, Control Rate, Distribute Load, Generate Table Fetch, Jolt Transform JSON, Prioritized Delivery, Encrypt, Tail, Evaluate, Execute, and Fetch; protocols and formats include HL7, FTP, SFTP, UDP, HTTP, XML, HTML, Syslog, Email, Image, AMQP, and MQTT. All Apache project logos are trademarks of the ASF and the respective projects.
  • 10. Deeper Ecosystem Integration – New Processors
    – Publish/ConsumeKafka: two NARs, built with the Kafka 0.9 and 0.10 client libraries respectively
    – JoltTransformJSON: manipulates JSON data on the fly, with a preview function
    – GenerateTableFetch: incremental fetch plus parallel fetch against source table partitions
    – PutHiveQL: ingests into Hive tables
    – SelectHiveQL: selects from Hive tables
    – PutHiveStreaming: ingests streaming data into Hive, leveraging the Hive Streaming API
    – ConvertAvroToORC: format conversion, Avro to ORC
    – Publish/ConsumeMQTT: MQTT is a popular protocol in the IoT world
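Slide 10 describes GenerateTableFetch as combining an incremental fetch with a parallel fetch. As a rough illustration of that idea only, the Python sketch below filters on a maximum-value column so a later run picks up only newer rows, and splits those rows into page-sized SELECT statements that could run in parallel. The function name `generate_table_fetch`, its parameters, and the exact SQL shape are hypothetical; the real processor's generated SQL depends on its configured database adapter and properties.

```python
from typing import List, Optional


def generate_table_fetch(table: str,
                         max_value_column: str,
                         last_max_value: Optional[int],
                         row_count: int,
                         partition_size: int) -> List[str]:
    """Hypothetical sketch: build one SELECT per page of new rows.

    Rows with max_value_column <= last_max_value were fetched by an earlier
    run, so only newer rows are selected (the incremental part). Each page
    is an independent query, so pages can be executed in parallel.
    Identifiers are assumed trusted; no SQL escaping is done here.
    """
    if partition_size <= 0:
        raise ValueError("partition_size must be positive")
    where = ""
    if last_max_value is not None:
        where = f" WHERE {max_value_column} > {last_max_value}"
    queries = []
    # row_count is the assumed number of new rows; one query per page.
    for offset in range(0, row_count, partition_size):
        queries.append(
            f"SELECT * FROM {table}{where} ORDER BY {max_value_column} "
            f"LIMIT {partition_size} OFFSET {offset}"
        )
    return queries


if __name__ == "__main__":
    for q in generate_table_fetch("orders", "id", 1000, 250, 100):
        print(q)
```

With 250 new rows and a page size of 100, this produces three queries (offsets 0, 100, and 200). Paging by OFFSET keeps the sketch short, but it is only one possible partitioning strategy.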
  • 11. Data Movement Management: Sources → Regional Infrastructure → Core Infrastructure, spanning a range from constrained, high-latency, localized context (toward the sources) to hybrid cloud/on-premises, low-latency, global context (toward the core)
  • 12. Hortonworks DataFlow (HDF): Sources → Regional Infrastructure → Core Infrastructure, spanning a range from constrained, high-latency, localized context to hybrid cloud/on-premises, low-latency, global context
  • 13. Detailed Breakdown of Requirements (Flow Management; Stream Processing & Analytics)
    – Req 1: Acquire data from the cloud instances of various wearable devices.
    – Req 2: Move data from customer cloud instances to an on-premises instance.
    – Req 3: Perform intelligent routing and filtering of data. The routing and filtering rules will change often at run time.
    – Req 4: Deliver the data to various downstream systems. New downstream apps will keep appearing, and the data should be fed to each one when it comes online.
    – Req 5: Parse the device data into a standardized format that downstream systems can understand.
    – Req 6: Enrich the data with contextual information, including patient/customer info (age, gender, etc.).
    – Req 7: Recognize the pattern when the resting heart rate exceeds a certain threshold (the insight), and then create an alert/notification.
    – Req 8: Run an outlier detection model on the incoming streaming heart-rate data. If the score is above a certain threshold, alert on the heart rate.
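Requirements 7 and 8 on slide 13 describe a threshold alert and an outlier score on streaming heart-rate readings. The slide does not say which outlier model is meant, so the Python sketch below assumes a simple rolling z-score as a stand-in. The function name `heart_rate_alerts`, the default threshold of 100, the window size, and the z-score cutoff are all hypothetical illustration values.

```python
from collections import deque
from statistics import mean, pstdev
from typing import Iterable, List, Tuple


def heart_rate_alerts(readings: Iterable[float],
                      resting_threshold: float = 100.0,
                      window: int = 5,
                      z_threshold: float = 3.0) -> List[Tuple[int, str]]:
    """Return (index, reason) alerts for a stream of resting heart rates.

    Req 7 (assumed form): alert when a reading exceeds resting_threshold.
    Req 8 (assumed form): alert when a reading's z-score against the
    previous `window` readings exceeds z_threshold.
    """
    history = deque(maxlen=window)  # sliding window of recent readings
    alerts = []
    for i, hr in enumerate(readings):
        if hr > resting_threshold:
            alerts.append((i, "threshold"))
        # Only score once the window is full, and skip a flat window (sd == 0).
        if len(history) == window:
            sd = pstdev(history)
            if sd > 0 and abs(hr - mean(history)) / sd > z_threshold:
                alerts.append((i, "outlier"))
        history.append(hr)
    return alerts


if __name__ == "__main__":
    print(heart_rate_alerts([60, 62, 61, 63, 62, 95, 61]))
```

In this sample, the jump to 95 is far from the previous five readings, so it is flagged as an outlier even though it stays below the fixed threshold. In a real flow, the scoring would more likely live in a stream processing engine than in NiFi itself, which matches how the slide groups these requirements.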
  • 14. Agenda: What’s NiFi; NiFi 1.0 Enhancements; NiFi on the edge; Common issues; What’s Next?
  • 15. NiFi 1.0: Modernized UI
  • 16. Modernized UI – Complete Interface Redesign
  • 17. Connect Components to design your data flow (toolbar components, from left to right)
    – Processor: purpose-built processing unit, e.g. GetXXX, PutXXX
    – Input Port: endpoint for receiving data between Process Groups (local/remote)
    – Output Port: endpoint for exposing data between Process Groups (local/remote)
    – Process Group: essential for designing a well-structured data flow
    – Remote Process Group: enables data transfer between NiFi deployments via Site-to-Site
    – Funnel: bundles multiple relationships into one
    – Template: shares part of a data flow
    – Label: useful for visually grouping processors and adding descriptions
  • 18. Data Provenance
  • 19. NiFi 1.0: Multitenant Authorization
  • 20. NiFi 0.x – Authorization Model
    – Previously had role-based authorization: Dataflow Manager (DFM), Monitor, Provenance, Admin, Proxy, NiFi
    – Limitation: all-or-nothing model
      – DFM can change everything; Monitor can change nothing
      – Can't give a user the ability to modify/view only certain components
      – Would require standing up multiple NiFi instances
  • 21. NiFi 1.0 – Authorization Model
    – NiFi 1.0 introduces a new delegated authorization model
    – Each request is authorized based on user identity, action, and resource. Example for user1 modifying properties on processor1:
      – User Identity: user1
      – Action: WRITE
      – Resource: processor1 (UUID)
    – If the authorizer says the resource is not found, the parent is checked; if the parent isn't found, the parent's parent is checked, and so on
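The parent-fallback lookup described on this slide can be sketched in Python as follows. The resource paths, policy table, parent map, and function names are all hypothetical. NiFi's authorizer API returns a comparable three-way outcome (approved, denied, resource not found), but this is an illustration of the idea, not NiFi's implementation.

```python
from enum import Enum


class Result(Enum):
    APPROVED = "approved"
    DENIED = "denied"
    RESOURCE_NOT_FOUND = "resource_not_found"


# Hypothetical policy store: (resource, action) -> identities allowed.
POLICIES = {
    ("/process-groups/root", "READ"): {"user1", "user2"},
    ("/process-groups/root", "WRITE"): {"admin"},
    ("/processors/processor1", "WRITE"): {"user1"},
}

# Hypothetical component hierarchy: child resource -> parent resource.
PARENTS = {
    "/processors/processor1": "/process-groups/pg1",
    "/processors/processor2": "/process-groups/pg1",
    "/process-groups/pg1": "/process-groups/root",
}


def check(identity, action, resource):
    """Ask the policy store about exactly one resource."""
    users = POLICIES.get((resource, action))
    if users is None:
        return Result.RESOURCE_NOT_FOUND
    return Result.APPROVED if identity in users else Result.DENIED


def authorize(identity, action, resource):
    """Walk up the hierarchy until some resource has a policy for this action."""
    current = resource
    while current is not None:
        result = check(identity, action, current)
        if result is not Result.RESOURCE_NOT_FOUND:
            return result
        current = PARENTS.get(current)
    return Result.DENIED  # no policy anywhere up the chain: deny by default
```

With this data, `authorize("user1", "WRITE", "/processors/processor1")` is approved by the processor's own policy, while the same request against `processor2` falls through `pg1` to the root policy and is denied.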
  • 22. NiFi 1.0 – NiFi Managed Authorizer vs. External Authorizer
    – Managed Authorizer
      – File-based persistence (could be extended to other persistence mechanisms)
      – NiFi UI to manage policies
      – NiFi controls authorization logic
    – External Authorizer
      – Ranger integration
      – Ranger UI to manage policies
      – Ranger controls authorization logic
  • 23. NiFi 1.0 – Managing Users
    – Clicking the new user icon allows the admin to create Users and Groups
      – Individual Users can be grouped
      – Groups can be assigned members
    – Clicking the edit user icon allows the admin to update a specific User/Group
  • 24. NiFi 1.0 – UI Overview
    – Users icon in the Global Menu: access Users/Groups
    – Lock icon in the Global Menu: access global policies
  • 25. NiFi 1.0 – UI Overview
    – Lock icon in the palette: access policies for the currently selected component (selection context)
  • 26. NiFi 1.0 – Overriding Component Policies
    – Components inherit policies from the closest ancestor Process Group with policies defined
    – View and Modify policies are handled independently
    – Click Override to define a new policy, then add Users and Groups
    – The new Users and Groups override the inherited policies (whitelisting)
  • 27. NiFi 1.0 – Multi-Tenancy Example
    – Create a Group for Team 1 and a Group for Team 2
    – Give Team 1 view & modify for Process Group 1
    – Give Team 2 view & modify for Process Group 2
    – A user from Team 1 looking at Process Group 2 can't see the group's name and can't right-click to configure it, but can enter the group
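Policy inheritance, independent view/modify policies, and overriding can be combined with the two-team scenario in one small Python sketch. The component tree, group names, and helper functions are hypothetical; the sketch only models the "closest ancestor with a policy wins, and a policy defined on the component itself replaces the inherited one" behaviour described on the previous two slides.

```python
# Hypothetical component tree: root contains pg1 and pg2, each holding one processor.
PARENT = {"pg1": "root", "pg2": "root", "proc-a": "pg1", "proc-b": "pg2"}

# Hypothetical user groups.
GROUPS = {"team1": {"alice"}, "team2": {"bob"}}

# (component, action) -> groups granted. View and modify are separate keys,
# so each one is inherited or overridden independently.
policies = {
    ("pg1", "view"): {"team1"},
    ("pg1", "modify"): {"team1"},
    ("pg2", "view"): {"team2"},
    ("pg2", "modify"): {"team2"},
}


def effective_groups(component, action):
    """Groups from the component itself or its closest ancestor defining this action."""
    node = component
    while node is not None:
        if (node, action) in policies:
            return policies[(node, action)]
        node = PARENT.get(node)
    return set()  # nothing defined up the tree: nobody is granted access


def allowed(user, component, action):
    return any(user in GROUPS.get(g, set()) for g in effective_groups(component, action))


def override(component, action, groups):
    """Define a policy on the component itself; it replaces what was inherited."""
    policies[(component, action)] = set(groups)
```

Here Alice (Team 1) can modify `proc-a` through Process Group 1 but cannot view `proc-b`. Calling `override("proc-a", "view", ["team2"])` then removes Alice's inherited view access to `proc-a` while leaving her modify access untouched, because only the view policy was overridden.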
  • 28. NiFi 1.0 – Revisions
    – Revision per component
    – Supports concurrent editing of different components without the need to refresh
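A revision per component is a form of optimistic concurrency control: an update carries the version the client last saw, and it is rejected if someone else changed that same component in the meantime. Edits to different components never conflict, since each has its own counter. The Python sketch below is modeled loosely on that idea; the class names, method names, and version field are hypothetical and are not NiFi's REST API.

```python
from dataclasses import dataclass


class ConflictError(Exception):
    """Raised when an update is based on a stale revision of a component."""


@dataclass
class Component:
    name: str
    properties: dict
    version: int = 0  # one revision counter per component


class FlowStore:
    def __init__(self):
        self.components = {}

    def add(self, cid, name):
        self.components[cid] = Component(name, {})

    def update(self, cid, client_version, changes):
        """Apply changes only if the client saw the latest revision of this component."""
        comp = self.components[cid]
        if client_version != comp.version:
            raise ConflictError(
                f"{cid}: client has version {client_version}, current is {comp.version}"
            )
        comp.properties.update(changes)
        comp.version += 1
        return comp.version  # the client uses this for its next edit
```

Two users editing `p1` and `p2` at the same time both succeed at version 0; a second edit to `p1` that still claims version 0 raises `ConflictError` and must be retried against the fresh revision.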
  • 29. NiFi 1.0: Zero Master Clustering
  • 30. NiFi 0.x: NCM (NiFi Cluster Manager) [diagram: the NCM manages Node1 and Node2, one of which is the Primary node; data from an external data source is split into chunks across the nodes]
    – Distribution mechanism depends on the data source
    – Web UI: interact with the NCM
    – Other NiFi (Site-to-Site): get the topology from the NCM, then transfer data peer-to-peer
  • 31. NiFi 1.0: ZMC (Zero Master Clustering) [diagram: Node1, Node2, and Node3 coordinated through ZooKeeper, with a Primary node and a Cluster Coordinator; data from an external data source is split into chunks across the nodes]
    – Distribution mechanism depends on the data source
    – Web UI: interact with any node
    – Other NiFi (Site-to-Site): get the topology from one of the nodes, then transfer data peer-to-peer
    – ZooKeeper elects the Cluster Coordinator and the Primary node
    – Any node can fail
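Because any node can answer a topology request in a zero-master cluster, a Site-to-Site client only needs some reachable node to learn the peer list, after which it sends data to peers directly. The Python sketch below shows that pattern. `fetch_topology` is a hypothetical callable standing in for the real topology request, and the round-robin distribution is a deliberate simplification (the real Site-to-Site client balances across peers by load).

```python
from itertools import cycle


class NoReachableNode(Exception):
    pass


def fetch_peers(bootstrap_nodes, fetch_topology):
    """Ask each configured node in turn for the cluster's current peer list."""
    for node in bootstrap_nodes:
        try:
            return fetch_topology(node)
        except ConnectionError:
            continue  # no master: any other node can serve the topology
    raise NoReachableNode("no configured node returned a topology")


def distribute(batches, peers):
    """Send batches straight to peers, round-robin (a simplification)."""
    assignment = {peer: [] for peer in peers}
    for batch, peer in zip(batches, cycle(peers)):
        assignment[peer].append(batch)
    return assignment
```

If `node1` is down, the client simply asks `node2`, receives the surviving peers, and spreads its batches across them; in the NCM design of the previous slide, losing the NCM would have cut off the topology source.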
  • 32. NiFi 1.0: And More!
  • 33. Foundational Work for SDLC
    – Deterministic template export
      – Deterministic ordering in the template XML file
      – Enables version control of templates
      – Supports a collaborative SDLC effort
    – Variable registry
      – Phase one implementation
      – In-memory variable registry
      – The same key referenced in a template can be mapped to different environment-specific values
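The variable-registry idea, where one template references a key and each environment supplies its own value, can be illustrated with a small Python resolver. It uses the `${...}` reference style of NiFi Expression Language, but it handles only plain `${key}` references (no Expression Language functions) and raises on an undefined key; both are simplifications chosen for this sketch. The key names and environment values are hypothetical.

```python
import re

# Matches plain ${key} references such as ${hdfs.url}.
_REF = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def resolve(template_value, registry):
    """Replace every ${key} in a template property with this environment's value."""

    def lookup(match):
        key = match.group(1)
        if key not in registry:
            raise KeyError(f"variable '{key}' is not defined in this environment")
        return registry[key]

    return _REF.sub(lookup, template_value)


# Hypothetical per-environment registries for the same template.
DEV = {"hdfs.url": "hdfs://dev-namenode:8020"}
PROD = {"hdfs.url": "hdfs://prod-namenode:8020"}
```

The same template value `"${hdfs.url}/landing"` resolves to the development or production path depending on which registry is loaded, so the exported template itself never changes between environments.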
  • 34. Enter the TLS Toolkit
    – Command-line tool to automate certificate generation and configuration
    – Self-contained certificate authority (CA) for certificate signing
    – Keystore & truststore generation
    – Client certificate generation
    – Automatically updates nifi.properties
    – Underpins Ambari TLS integration
  • 35. Site-to-Site Refactoring – S2S HTTP(S) Protocol through Proxy Server [diagram: the NiFi Framework running in a JVM exposes a REST API, an Extension API (Processors, Controller Services, Reporting Tasks), and an S2S API; S2S client libraries in another JVM connect to it]
    – Socket protocol: TCP
    – HDF 2.0: HTTP(S) protocol, which can pass through an HTTP proxy
  • 36. Agenda: What's NiFi; NiFi 1.0 Enhancements; NiFi on the edge; Common issues; What's Next?
  • 37. Edge Intelligence with Apache MiNiFi
    Key Features
    – Guaranteed delivery
    – Data buffering
      – Backpressure
      – Pressure release
    – Prioritized queuing
    – Flow-specific QoS
      – Latency vs. throughput
      – Loss tolerance
    – Data provenance
    – Recovery / recording a rolling log of fine-grained history
    – Designed for extension
    Different from Apache NiFi
    – Design and deploy
    – Warm re-deploys
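Data buffering with backpressure, pressure release, and prioritized queuing all describe the behaviour of the queue that sits between two processors. The toy Python connection below shows how they interact: it refuses new items once a count threshold is reached (backpressure), drops items older than a configured age (pressure release, read here as age-based expiration), and hands out items lowest priority number first. The class, thresholds, and priority convention are hypothetical and greatly simplified compared with NiFi's connections.

```python
import heapq
import itertools


class Connection:
    """Toy queue between two processors."""

    def __init__(self, backpressure_threshold, expiration_secs=None):
        self.threshold = backpressure_threshold
        self.expiration = expiration_secs
        self._heap = []  # entries: (priority, sequence, enqueued_at, item)
        self._seq = itertools.count()  # keeps FIFO order among equal priorities

    def is_backpressured(self):
        return len(self._heap) >= self.threshold

    def offer(self, item, priority, now):
        """Enqueue unless the queue is full; False tells upstream to wait."""
        if self.is_backpressured():
            return False
        heapq.heappush(self._heap, (priority, next(self._seq), now, item))
        return True

    def expire(self, now):
        """Pressure release: drop items that have waited too long; return the count."""
        if self.expiration is None:
            return 0
        kept = [e for e in self._heap if now - e[2] < self.expiration]
        dropped = len(self._heap) - len(kept)
        heapq.heapify(kept)
        self._heap = kept
        return dropped

    def poll(self):
        """Hand the most urgent item (lowest priority number) to the next processor."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[3]
```

With a threshold of 2, a third `offer` is refused until the downstream side `poll`s; with a 60-second expiration, anything still queued after a minute is released rather than holding up the flow. Which items may be dropped is exactly the loss-tolerance trade-off listed under flow-specific QoS.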
  • 38. NiFi vs. MiNiFi
    – MiNiFi (Java): NiFi Framework + Components; smaller footprint, ~40 MB
    – NiFi: NiFi Framework + User Interface + Components
  • 39. Agenda: What's NiFi; NiFi 1.0 Enhancements; NiFi on the edge; Common issues; What's Next?
  • 40. Common Issues
    – HBase connection issues: ClassNotFoundException
    – NiFi SSL issues
    – ExecuteSQL processor issues
    – NiFi content repository full
    – PutKafka/GetKafka issues
    – Issues after enabling Kerberos
    – OutOfMemory issues
  • 41. Interesting Issues/Use Cases
    – TBD (need to add 2-3 interesting issues/use cases)
  • 42. Best Practices
    – Debug logging in case of processor issues
    – NiFi Site-to-Site practices
    – Core properties tuning
    – JVM tuning
    – Understanding health via the NiFi UI
  • 43. Agenda: What's NiFi; NiFi 1.0 Enhancements; NiFi on the edge; Common issues; What's Next?
  • 44. What's Next
    NiFi
    – Framework extension
      – Distributed data durability (HA data)
      – Configuration management flows (SDLC)
    – Enhanced user experience
      – Template/Extension Registry
      – Variable Registry
    – Deeper ecosystem integration
    MiNiFi
    – Central command and control
    – Native agent (GA)
    Search "NiFi product requirements": https://cwiki.apache.org/confluence/display/NIFI/Product+requirements
  • 45. Thank You

Editor's Notes

  1. Talk Track: In general, dataflows are pictured in our minds as being simple, definitive and relatively linear.
  2. Talk Track: In reality, dataflows move all over. Data is moved and stored in multiple places, sometimes temporarily, sometimes long-term. Data is processed in different places and then moved again. Complicated, convoluted, messy.
  3. Hortonworks: Powering the Future of Data
  4. Talk Track: HDF helps move data from edge data sources through to regional and core data centers, and then back out again. There are many ways to move data into the data center, but ONLY Hortonworks DataFlow creates a data control plane for bi-directional feedback that sends context and commands back to the source to adjust on the fly, along with a provenance capability that lets you fully trace the origin and path of data in real time to verify trust.
  5. TALK TRACK: Hortonworks DataFlow is powered by Apache NiFi, Kafka, and Storm, all key components of any streaming data architecture: MiNiFi/NiFi for dynamic, configurable data pipelines; Kafka to adapt to differing rates of data creation and delivery; and Storm for real-time streaming analytics that create immediate insights at massive scale. Only Hortonworks offers all of this as part of a Connected Data Platform that optimizes for delivery into HDP (HDFS, Hive, Spark, HBase, etc.). There are scenarios where NiFi will provide all that you need, but notice that the orange and blue horizontal triangles show a continuum of capability from edge to core, indicating varying degrees of need for the different products.
  6. Assume two development teams, Team 1 and Team 2. Each team gets a Process Group and shouldn't be able to interfere with the other team.
  7. TALK TRACK: Apache MiNiFi is a subproject of Apache NiFi. It is designed to solve the difficulties of managing and transmitting data feeds to and from the source of origin, enabling edge intelligence to adjust dataflow behavior with bi-directional communication, out to the last mile of digital signal. It has a very small and lightweight footprint* and generates the same level of data provenance as NiFi, which is vital to edge analytics and IoAT (Internet of Any Thing). It's a little different from NiFi in that it is not a real-time command and control interface; in fact, the agent, unlike NiFi, doesn't have a built-in UI at all. MiNiFi is designed for design-and-deploy situations and for "warm re-deploys". HDF 2.0 supports the Java version of the MiNiFi agent, and a C++ version is coming soon as well.
  8. 38