@serrazon
Abstractions
NiFi Term | FBP Term | Description
FlowFile | Information Packet | Unit of data moving from one system to another. Tracked by its key/value pair attributes.
Processor | Black Box | Does the work of data routing, transformation or mediation between systems. Has access to attributes and can work on zero or more FlowFiles. Can commit or roll back its work.
Connection | Bounded Buffer | Links processors. Acts as a queue and allows different processes to work at different rates. Allows dynamic prioritization and can have upper bounds on load, which enables back pressure.
Flow Controller | Scheduler | Maintains the knowledge of how processes connect and manages the working threads. Acts as a broker between processors.
Process Group | Subnet | Set of processes and their connections. Has input and output ports to communicate with other process groups or processors. Allows composition of other components.
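To make the FlowFile and Connection rows concrete, here is a minimal Python sketch of a bounded buffer that signals back pressure. It is not NiFi code: the class names mirror the terms in the table, but `max_queued`, `offer` and `poll` are hypothetical names chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: content plus key/value attributes (illustrative only)."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Bounded queue between two processors (hypothetical, not the NiFi API)."""

    def __init__(self, max_queued: int):
        self.max_queued = max_queued
        self._queue = deque()

    def is_full(self) -> bool:
        return len(self._queue) >= self.max_queued

    def offer(self, flowfile: FlowFile) -> bool:
        """Enqueue unless the bound is reached; False signals back pressure."""
        if self.is_full():
            return False
        self._queue.append(flowfile)
        return True

    def poll(self):
        """Dequeue the oldest FlowFile, or return None when empty."""
        return self._queue.popleft() if self._queue else None


conn = Connection(max_queued=2)
accepted = [conn.offer(FlowFile(b"x", {"n": str(i)})) for i in range(3)]
print(accepted)  # the third offer is refused because the buffer is full
```

Once the consumer polls an item, the upstream side can offer again, which is how a bounded buffer lets a fast producer and a slow consumer run at different rates.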
Data going from Producers to Consumers
● Formats and/or schemas
● Protocols
● Priorities - The most important first
● Batch vs Streams
● Data level security - authorization
● I need just a part of the message
● Before I get the data, please clean it and prepare it first.
Today's Messaging Scenario
Acquire Data → Process / Analyze Data → Store Data (dataflows)
Massive amounts of data, produced by several types of producers, travel over the wire through several types of channels.
Challenge: acquire, process and store that data online, fast and securely.
What does NiFi offer?
● No coding, no deployment - Visual operation and control - On the fly
● No log search - Tracking of everything that happens - Data lineage (provenance)
● Configure and change how the data is distributed - Prioritization
● Regulate the speed of data consumption - Buffering Data - Back Pressure
● Control latency vs throughput
● Secure Control layer / Data layer - Authentication / Authorization
● Multiple instances - Clustering
● Extensibility
It was designed to tackle global enterprise dataflow challenges.
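The prioritization point in the list above can be pictured as a queue that hands out the most important item first. The sketch below uses Python's standard `heapq`. The class `PrioritizedQueue` and the convention that a lower number means higher priority are assumptions for illustration, not how NiFi implements its prioritizers.

```python
import heapq
import itertools


class PrioritizedQueue:
    """Queue that releases the lowest priority number first (illustrative)."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so equal priorities keep insertion order.
        self._counter = itertools.count()

    def put(self, item, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), item))

    def get(self):
        return heapq.heappop(self._heap)[2]


q = PrioritizedQueue()
q.put("debug log", priority=5)
q.put("security alert", priority=0)
q.put("metric", priority=3)
order = [q.get() for _ in range(3)]
print(order)
```

Changing the priority assigned at `put` time changes the delivery order without touching the producers or consumers.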
Apache NiFi
What is NiFi for?
● Simple data transfer between systems - Reliable and secure
● Ingestion of data into analytics layers
● Data "magic" / Preparing data
○ Conversion between formats
○ Extraction / Parsing
○ Routing decisions
And what is NiFi NOT for?
● Distributed computation
● Complex event processing
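The three data-preparation tasks listed above (conversion, extraction, routing) can be sketched in a few lines of plain Python. In NiFi these steps are configured as processors rather than written by hand; the function names, the `level` field and the `alerts` / `archive` route names below are hypothetical.

```python
import csv
import io
import json


def csv_to_json_records(text: str) -> list:
    """Conversion between formats: CSV text to one JSON string per row."""
    return [json.dumps(row) for row in csv.DictReader(io.StringIO(text))]


def extract_attributes(record: str, keys: list) -> dict:
    """Extraction: copy selected fields of a JSON record into attributes."""
    data = json.loads(record)
    return {k: data[k] for k in keys if k in data}


def route(attributes: dict) -> str:
    """Routing decision based on an attribute value."""
    return "alerts" if attributes.get("level") == "ERROR" else "archive"


raw = "level,msg\nINFO,started\nERROR,disk full\n"
records = csv_to_json_records(raw)
routes = [route(extract_attributes(r, ["level"])) for r in records]
print(routes)
```

Each step only looks at one record at a time, which fits the simple, per-item kind of processing the slide describes.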
Types of use cases
● IoT: remote sensor data capture
● Enterprise integration (between systems on an intranet or the internet)
● Big Data ingestion
● Simple event processing (handling discrete points)
More use cases info out there...
So, why NiFi?
Wider coverage than other solutions on the market.
Wider range of dataflow scenarios covered. Allows composition of processes.
On-the-fly changes - wow!
Keeps track of everything (data provenance)
Meets high security and compliance requirements
Apache NiFi - Architecture
● OS / Host
○ JVM: Web Server, Flow Controller (Processor 1, Processor 2), FlowFile Repository, Content Repository, Provenance Repository
○ Local Storage
Demo
● Get log data from system A
● Publish the data to a telemetry queue
● Subscribe to the queue for processing on system B
● Show data provenance
● Show queuing at the relationship level
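The demo steps can be mimicked in memory to show what a provenance trail looks like. This is only a sketch under stated assumptions: a standard-library `queue.Queue` stands in for the telemetry queue, `system-a` and `system-b` are placeholder component names, and the event labels are illustrative rather than taken from the demo itself.

```python
import queue
import uuid

provenance = []  # append-only list of (event, flowfile id, component)


def record(event: str, ff_id: str, component: str) -> None:
    provenance.append((event, ff_id, component))


def publish(lines, telemetry: queue.Queue) -> None:
    """System A: wrap each log line and put it on the queue."""
    for line in lines:
        ff = {"id": str(uuid.uuid4()), "content": line}
        record("CREATE", ff["id"], "system-a")
        telemetry.put(ff)
        record("SEND", ff["id"], "system-a")


def consume(telemetry: queue.Queue) -> list:
    """System B: drain the queue and process each item."""
    processed = []
    while not telemetry.empty():
        ff = telemetry.get()
        record("RECEIVE", ff["id"], "system-b")
        processed.append(ff["content"].upper())
    return processed


telemetry = queue.Queue(maxsize=10)
publish(["login ok", "disk warning"], telemetry)
result = consume(telemetry)
lineage = [event for event, _, _ in provenance]
print(result)
print(lineage)
```

Filtering `provenance` by a single id gives the history of one item from creation to delivery, which is the idea behind the "show data provenance" step.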