The Avant-garde of Apache NiFi

Joe Percivall - @JPercivall
Hadoop Summit – Melbourne
31 August 2016

© Hortonworks Inc. 2011 – 2016. All Rights Reserved

About Me
• Software Engineer at Hortonworks
• Apache NiFi committer and PMC member
• Github: github.com/JPercivall

Agenda
• Intro to NiFi
• What’s new in NiFi 1.0.0
• Intro to MiNiFi
• MiNiFi Architecture
• NiFi & MiNiFi Demo

Let’s Connect A to B
Producers A.K.A Things
Anything
AND
Everything
Internet!
Consumers
• User
• Storage
• System
• …More Things

Why is moving data effectively hard?
• Standards
• Formats
• “Exactly Once” Delivery
• Protocols
• Veracity of Information
• Validity of Information
• Ensuring Security
• Overcoming Security
• Compliance
• Schemas
• Consumers Change
• Credential Management
• “That [person|team|group]”
• Network
• “Exactly Once” Delivery

• Web-based user interface for creating, monitoring, and controlling data flows
• Directed graphs of data routing and transformation
• Highly configurable - modify data flow at runtime, dynamically prioritize data
• Easily extensible through development of custom components
• Data provenance tracks data through the entire system
[1] https://nifi.apache.org/
Dataflow
Apache NiFi

Apache NiFi
Key Features
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull models
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
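
To make the "Data buffering" feature concrete, here is a minimal Python sketch of a buffer that applies backpressure when a threshold is reached and releases pressure by dropping data older than a configured age. It is a toy model, not NiFi code: the class name `BufferedQueue` and the parameters `backpressure_threshold` and `max_age_seconds` are assumptions made for illustration.

```python
import time
from collections import deque


class BufferedQueue:
    """Toy buffer illustrating backpressure and pressure release (not NiFi code)."""

    def __init__(self, backpressure_threshold, max_age_seconds):
        self.backpressure_threshold = backpressure_threshold
        self.max_age_seconds = max_age_seconds
        self._items = deque()  # (enqueue_time, item) pairs, oldest first

    def is_full(self):
        return len(self._items) >= self.backpressure_threshold

    def offer(self, item, now=None):
        """Accept the item, or return False to signal backpressure to the producer."""
        now = time.monotonic() if now is None else now
        if self.is_full():
            return False
        self._items.append((now, item))
        return True

    def release_pressure(self, now=None):
        """Drop items older than max_age_seconds and return what was dropped."""
        now = time.monotonic() if now is None else now
        dropped = []
        while self._items and now - self._items[0][0] > self.max_age_seconds:
            dropped.append(self._items.popleft()[1])
        return dropped

    def poll(self):
        """Hand the oldest item to the consumer, or None if the buffer is empty."""
        return self._items.popleft()[1] if self._items else None


q = BufferedQueue(backpressure_threshold=2, max_age_seconds=60)
accepted = [q.offer("a", now=0), q.offer("b", now=10), q.offer("c", now=20)]
dropped = q.release_pressure(now=65)   # "a" has aged out, "b" has not
retry = q.offer("c", now=65)           # room again after the release
```

The producer sees a refusal rather than an unbounded queue, and aging out old data trades completeness for freshness, which is where the loss-tolerance setting comes in.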

Simplified Example
Let’s consider the needs of a courier service
[Diagram: Physical Store (Gateway Server, Mobile Devices, Registers); Distribution Center (Server Cluster); Core Data Center at HQ (Server Cluster); On Delivery Routes (Trucks, Deliverers)]
Delivery Truck: Creative Stall, https://thenounproject.com/creativestall/
Deliverer: Rigo Peter, https://thenounproject.com/rigo/
Cash Register: Sergey Patutin, https://thenounproject.com/bdesign.by/
Hand Scanner: Eric Pearson, https://thenounproject.com/epearson001/

Great! I am collecting all this data! Let’s use it!
Finding our needles in the haystack
[Diagram: Physical Store (Gateway Server, Mobile Devices, Registers); Distribution Center (Server Cluster, Kafka, Storm / Spark / Flink / Apex); Core Data Center at HQ (Server Cluster, Kafka, Storm / Spark / Flink / Apex, Others); On Delivery Routes (Trucks, Deliverers)]

Let’s revisit our courier service from the perspective of NiFi
[Diagram: Physical Store (Gateway Server, Mobile Devices, Registers); Distribution Center (Server Cluster, Kafka, Storm / Spark / Flink / Apex); Core Data Center at HQ (Server Cluster, Kafka, Storm / Spark / Flink / Apex, Others); On Delivery Routes (Trucks, Deliverers); NiFi shown at six points across these sites]

Fundamental Terminology
FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)
Processor
• Performs the work, can access FlowFiles
Connection
• Links between processors
• Queues that can be dynamically prioritized
git clone https://github.com/JPercivall/nifi-developer-tutorial.git
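
The three terms above can be sketched in a few lines of Python. This is an illustrative model only, not the NiFi API: the `FlowFile` and `Connection` classes, the `tag_priority` processor, and the `priority` attribute are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Queue linking two processors; the prioritizer can be swapped while data is queued."""

    def __init__(self, prioritizer):
        self.prioritizer = prioritizer  # callable: FlowFile -> sortable key (lower goes first)
        self._queue = []

    def put(self, flowfile):
        self._queue.append(flowfile)

    def poll(self):
        if not self._queue:
            return None
        # min() returns the first minimal index, so ties keep arrival (FIFO) order.
        best = min(range(len(self._queue)), key=lambda i: self.prioritizer(self._queue[i]))
        return self._queue.pop(best)


def tag_priority(flowfile):
    """Hypothetical processor: reads the content and adds a 'priority' attribute."""
    priority = "0" if b"ERROR" in flowfile.content else "9"
    return FlowFile(flowfile.content, {**flowfile.attributes, "priority": priority})


conn = Connection(prioritizer=lambda ff: 0)  # every key equal: plain FIFO
for line in (b"INFO start", b"ERROR disk full", b"INFO done"):
    conn.put(tag_priority(FlowFile(line)))

# Re-prioritize at runtime: urgent FlowFiles now jump the queue.
conn.prioritizer = lambda ff: int(ff.attributes["priority"])
first = conn.poll()
second = conn.poll()
```

Evaluating the prioritizer at poll time, rather than at enqueue time, is what lets the ordering change for FlowFiles that are already waiting.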

Apache NiFi-1.0.0
Zero Master Clustering
UI Refresh
Multi-tenant authorization and internal
authorization/policy management
15+ new components
Over 450 tickets closed!

Zero Master Clustering

UI Refresh & Multi-tenant Authorization

Revisit: Courier service from the perspective of NiFi
[Diagram: Physical Store (Gateway Server, Mobile Devices, Registers); two Server Clusters; On Delivery Routes (Trucks, Deliverers)]

Courier service from the perspective of NiFi & MiNiFi
[Diagram: Physical Store (Gateway Server, Mobile Devices, Registers); two Server Clusters; On Delivery Routes (Trucks, Deliverers); MiNiFi shown at two points and Client Libraries at three points toward the edge]

Apache NiFi MiNiFi
Key Features
• Data buffering
- Backpressure
- Pressure release
- Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Design and Deploy
• Warm re-deploys

Visual Command and Control
vs.
Design and Deploy

Created to more effectively collect
data at the edge

NiFi vs MiNiFi Java Processes
• NiFi: NiFi Framework, User Interface, Components
• MiNiFi: NiFi Framework, Components

NiFi Java Processes
[Diagram: Bootstrap reads bootstrap.conf and starts NiFi (with its UI); NiFi reads nifi.properties, and reads & modifies flow.xml.gz]

MiNiFi Java Processes
[Diagram: Bootstrap, which hosts the Configuration Change Notifier(s), reads bootstrap.conf and config.yml, transforms config.yml into nifi.properties and flow.xml.gz, and starts MiNiFi; MiNiFi reads nifi.properties and flow.xml.gz]

Same Extensible Framework (NARs)
• In minifi-0.0.1, the nifi-0.6.1 standard processors are bundled (~20 MB)
– Tailing a Log
– UpdateAttribute
– Routing by content or attributes
– PutEmail
Allows MiNiFi to use NiFi processors

Simple Config.yml
Tail a rolling file -> Site to Site
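
As a rough illustration of the "tail a rolling file" half of this flow, the Python sketch below reads only the complete lines added since the last remembered offset, and starts over when the file has become shorter (taken here as a sign that it rolled over). It is a simplification under stated assumptions and not how MiNiFi is configured: the function name `read_new_lines` is hypothetical, the real TailFile processor handles rollover more thoroughly, and the Site to Site transfer is not modeled.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return (complete new lines, new offset) for the file at path."""
    if os.path.getsize(path) < offset:
        offset = 0  # file is shorter than our last position: assume it rolled over
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    end = data.rfind(b"\n") + 1  # consume only up to the last complete line
    return data[:end].splitlines(), offset + end


log_path = os.path.join(tempfile.mkdtemp(), "app.log")
with open(log_path, "wb") as f:
    f.write(b"a\nb\npart")             # last line is still being written
lines_1, offset = read_new_lines(log_path, 0)

with open(log_path, "ab") as f:
    f.write(b"ial\n")                  # the partial line is completed
lines_2, offset = read_new_lines(log_path, offset)

with open(log_path, "wb") as f:
    f.write(b"x\n")                    # simulate a rollover: new, shorter file
lines_3, offset = read_new_lines(log_path, offset)
```

Each batch of lines would then be handed to whatever sends data onward; persisting the offset between runs is what lets a restart resume without re-sending.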

Courier service from the perspective of NiFi & MiNiFi
[Diagram: Physical Store (Gateway Server, Mobile Devices, Registers); two Server Clusters; On Delivery Routes (Trucks, Deliverers); MiNiFi shown at two points and Client Libraries at three points toward the edge]

Questions?

Thank you!

Learn more and join us!
Apache NiFi site
http://nifi.apache.org
Subproject MiNiFi site
http://nifi.apache.org/minifi/
Subscribe to and collaborate at
dev@nifi.apache.org
users@nifi.apache.org
Submit Ideas or Issues
https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter
@apachenifi

Back-up

Brief history of the Apache NiFi Community
• 2006: Code developed at NSA (matured at NSA 2006-2014)
• November 2014: Code available open source, ASL v2
• July 2015: Achieved TLP status in just 7 months
• Today
- Contributors from Government and several commercial industries
- Releases on a 6-8 week schedule
- Apache NiFi 1.0.0 release on the horizon
- Zero-Master Clustering

A bit more complex Config.yml
Tail a rolling File -> Secure Site to Site with Provenance

MiNiFi 0.0.1-Java
Release Notes
• Declarative configuration of processing flows through a YAML configuration file
• Exporting of provenance events to another NiFi instance via a Reporting Task over Site to Site
• Flow change configuration watcher implementations that provide reloading a NiFi instance when receiving an updated flow over REST or changes on a file system
• Providing a mechanism to query an instance's status
• <40 MB binary distribution

Change notifier update
1. Initial state
– Both running
[Diagram: MiNiFi; Bootstrap with Configuration Change Notifiers]

2. User sends update through notifier
– HTTP(S) POST request
– Change watched file
[Diagram: user creates new configuration and sends it to the Bootstrap's Configuration Change Notifiers]

3. Bootstrap validation
– Basic validation
– REST notifier will respond accordingly
– Results logged
[Diagram: Bootstrap validates the new configuration]

4. Bootstrap saves and transforms
– Copy old config.yml to a swap file
[Diagram: Bootstrap saves the new config.yml and transforms it into nifi.properties and flow.xml.gz]

5. Bootstrap attempts restart
– MiNiFi reads in the new nifi.properties and flow.xml.gz
[Diagram: Bootstrap saves the new config.yml, transforms it into nifi.properties and flow.xml.gz, and attempts a restart; MiNiFi reads the transformed files]

6. Success or Fail
– Successful restart: continue processing
– Failure: roll back to old config
– Existing data is mapped or orphaned
[Diagram: Bootstrap saves the new config.yml, transforms it into nifi.properties and flow.xml.gz, and attempts a restart; MiNiFi reads the transformed files]
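
The six steps of the change notifier update can be sketched as a small Python model. This is a hedged illustration of the sequence described on the slides, not the MiNiFi bootstrap implementation: the `Bootstrap` class, its validation rule, the shape of the transformed output, and the `start_minifi` callback are all assumptions made for the example.

```python
import copy


class Bootstrap:
    """Toy model of the update sequence; names and checks are illustrative only."""

    def __init__(self, config, start_minifi):
        self.log = []
        self.config = config              # stands in for config.yml
        self.swap = None                  # stands in for the swap file
        self.start_minifi = start_minifi  # callable(transformed) -> True if MiNiFi started
        self.running = start_minifi(self.transform(config))  # step 1: initial state

    @staticmethod
    def validate(config):
        # "Basic validation" is modeled as: the config names at least one processor.
        return isinstance(config, dict) and bool(config.get("processors"))

    @staticmethod
    def transform(config):
        # Stand-in for producing nifi.properties and flow.xml.gz from config.yml.
        return {
            "nifi.properties": {"flow.name": config.get("name", "")},
            "flow.xml.gz": list(config["processors"]),
        }

    def apply_update(self, new_config):
        """Steps 2-6: receive, validate, save and transform, restart, roll back on failure."""
        if not self.validate(new_config):                    # step 3
            self.log.append("rejected: failed validation")
            return False
        self.swap = copy.deepcopy(self.config)               # step 4: old config to swap
        self.config = new_config
        if self.start_minifi(self.transform(new_config)):    # step 5
            self.log.append("restarted with new config")     # step 6: success
            return True
        self.config = self.swap                              # step 6: failure, roll back
        self.start_minifi(self.transform(self.config))
        self.log.append("restart failed: rolled back")
        return False


# Hypothetical start function: a flow containing "Broken" fails to start.
start = lambda transformed: "Broken" not in transformed["flow.xml.gz"]
bootstrap = Bootstrap({"name": "v1", "processors": ["TailFile"]}, start)
results = [
    bootstrap.apply_update({"name": "bad", "processors": []}),
    bootstrap.apply_update({"name": "v2", "processors": ["Broken"]}),
    bootstrap.apply_update({"name": "v3", "processors": ["TailFile", "PutEmail"]}),
]
```

Keeping the old configuration in a swap copy before the restart is what makes the rollback in step 6 possible without any help from the sender of the update.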
