NiFi Term | FBP Term | Description
FlowFile | Information Packet | Unit of data moving from one system to another, tracked by its key/value pair attributes.
Processor | Black Box | Does the work of data routing, transformation, or mediation between systems. Processors have access to FlowFile attributes, can work on zero or more FlowFiles at a time, and either commit or roll back that work.
Connection | Bounded Buffer | Link between processors. Connections act as queues, allowing different processors to work at different rates. They can be prioritized dynamically and can have upper bounds on load, which enables back pressure.
Flow Controller | Scheduler | Maintains knowledge of how processors connect and manages the threads they use. Acts as the broker that passes FlowFiles between processors.
Process Group | Subnet | A set of processors and their connections. Input and output ports let it communicate with other process groups or processors, so new components can be built by composing existing ones.
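To make the terms in the table concrete, here is a minimal Python sketch of a FlowFile and of a processor whose unit of work is either committed or rolled back. It is a toy model, not NiFi's API: `FlowFile`, `Session`, and `uppercase_processor` are hypothetical names, and real NiFi processors are written in Java against its `ProcessSession` interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy FlowFile: key/value string attributes plus a content payload."""
    attributes: dict[str, str] = field(default_factory=dict)
    content: bytes = b""


class Session:
    """Hypothetical unit of work: output is published only on commit()."""

    def __init__(self, inbound: list[FlowFile]) -> None:
        self._inbound = inbound            # the upstream connection's queue
        self._taken: list[FlowFile] = []   # untouched originals, kept for rollback
        self._outbound: list[FlowFile] = []

    def get(self) -> FlowFile | None:
        """Take the next FlowFile and hand the processor a working copy."""
        if not self._inbound:
            return None
        original = self._inbound.pop(0)
        self._taken.append(original)
        return FlowFile(dict(original.attributes), original.content)

    def transfer(self, ff: FlowFile) -> None:
        self._outbound.append(ff)

    def commit(self, downstream: list[FlowFile]) -> None:
        """Make all transferred FlowFiles visible downstream at once."""
        downstream.extend(self._outbound)
        self._taken.clear()
        self._outbound.clear()

    def rollback(self) -> None:
        """Discard pending output and put the originals back, in order."""
        self._inbound[:0] = self._taken
        self._taken.clear()
        self._outbound.clear()


def uppercase_processor(session: Session) -> None:
    """Hypothetical processor: upper-cases the content and adds an attribute."""
    ff = session.get()
    if ff is None:
        return  # zero FlowFiles available is still a valid unit of work
    ff.content = ff.content.upper()
    ff.attributes["processed.by"] = "uppercase"
    session.transfer(ff)
```

For example, running `uppercase_processor` and then `rollback()` leaves the inbound queue holding the original `b"hello"`; running it again and calling `commit(downstream)` delivers `b"HELLO"` downstream. Working on copies is what makes the rollback clean in this sketch.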
Data flowing in a message from A (producer) through a channel to B (consumer)
Data going from Producers to Consumers
● Formats and/or schemas
● Priorities - The most important first
● Batch vs Streams
● Data level security - authorization
● I need just a part of the message
● Please clean and prepare the data before I get it.
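The "most important first" requirement can be sketched with a priority queue. This is a minimal Python model under stated assumptions: each FlowFile is reduced to a dict of string attributes, a lower numeric `priority` attribute means more urgent, and FlowFiles without one get a hypothetical default of 100. It is loosely inspired by NiFi's connection prioritizers (such as the PriorityAttributePrioritizer) but is not their implementation.

```python
import heapq
import itertools


class PrioritizedQueue:
    """Toy connection queue: the lowest 'priority' attribute is served first.

    Ties, and FlowFiles with no priority attribute, fall back to arrival
    order (FIFO). The attribute name and the default value are assumptions.
    """

    def __init__(self, default_priority: int = 100) -> None:
        self._heap: list[tuple[int, int, dict]] = []
        self._arrival = itertools.count()  # unique tie-breaker, so dicts are never compared
        self._default = default_priority

    def put(self, flowfile: dict) -> None:
        prio = int(flowfile.get("priority", self._default))
        heapq.heappush(self._heap, (prio, next(self._arrival), flowfile))

    def get(self) -> dict:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Putting `{"id": "a"}`, `{"id": "b", "priority": "1"}`, and `{"id": "c", "priority": "1"}` in that order yields `b`, `c`, then `a`: the arrival counter keeps equal-priority FlowFiles in FIFO order.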
Today's Messaging Scenario
Massive amounts of data, produced by many types of producers, go onto the wire through many types of channels.
Challenge: acquire, process, and store that data online, quickly, and securely.
The Messaging Problem at large scale
What does NiFi offer?
● No coding, No deployment - Visual operation and control - On the fly
● No log searching - Tracks everything that happens - Data lineage (provenance)
● Configure and change how the data is distributed - Prioritization
● Regulate the speed of data consumption - Buffering Data - Back Pressure
● Control latency vs throughput
● Secure Control layer / Data layer - Authentication / Authorization
● Multiple instances - Clustering
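Buffering and back pressure can be illustrated with a bounded connection. In this minimal Python sketch, `Connection`, `simulate`, and both threshold defaults are hypothetical; the one idea taken from NiFi is that the upstream processor is not scheduled while the queue sits at its object-count or data-size threshold, so a fast producer is slowed to the pace of a slow consumer.

```python
from __future__ import annotations

from collections import deque


class Connection:
    """Toy connection with object-count and byte-size back pressure thresholds."""

    def __init__(self, max_objects: int = 5, max_bytes: int = 1_000) -> None:
        self.queue: deque[bytes] = deque()
        self.max_objects = max_objects
        self.max_bytes = max_bytes
        self.queued_bytes = 0

    def back_pressure(self) -> bool:
        """True when either threshold is reached."""
        return len(self.queue) >= self.max_objects or self.queued_bytes >= self.max_bytes

    def offer(self, content: bytes) -> None:
        self.queue.append(content)
        self.queued_bytes += len(content)

    def poll(self) -> bytes | None:
        if not self.queue:
            return None
        content = self.queue.popleft()
        self.queued_bytes -= len(content)
        return content


def simulate(ticks: int, conn: Connection, consume_every: int = 3) -> tuple[int, int, int]:
    """Fast producer (one 10-byte item per tick) vs. slow consumer.

    Returns (produced, consumed, peak queue length).
    """
    produced = consumed = peak = 0
    for tick in range(ticks):
        if not conn.back_pressure():  # the scheduler skips the producer under back pressure
            conn.offer(b"x" * 10)
            produced += 1
        if tick % consume_every == 0 and conn.poll() is not None:
            consumed += 1
        peak = max(peak, len(conn.queue))
    return produced, consumed, peak
```

With `simulate(30, Connection(max_objects=5))` the queue never grows past five items even though the producer could emit one item every tick; lowering `max_bytes` instead makes the byte threshold the one that throttles the producer.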
It was designed to tackle global enterprise dataflow challenges:
● Simple data transfer between systems - Reliable and Secure
● Injection of data into analytics layers
● Data "magic" / data preparation
○ Conversion between formats
○ Extraction / Parsing
○ Routing decisions
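The three preparation steps listed above (conversion, extraction, routing) can be sketched as plain functions. This is an illustrative Python sketch only: `convert_csv_to_json`, `extract_attribute`, `route`, and the routing rules are hypothetical. In NiFi these steps are handled by configurable processors (for example ConvertRecord, EvaluateJsonPath, and RouteOnAttribute) rather than hand-written code.

```python
from __future__ import annotations

import csv
import io
import json


def convert_csv_to_json(content: str) -> str:
    """Conversion: CSV text with a header row -> JSON array of objects."""
    rows = list(csv.DictReader(io.StringIO(content)))
    return json.dumps(rows)


def extract_attribute(json_content: str, name: str) -> dict[str, str]:
    """Extraction: copy one field of the first record into a FlowFile-style attribute."""
    records = json.loads(json_content)
    return {name: records[0][name]} if records else {}


def route(attributes: dict[str, str], rules: dict[str, str], name: str) -> str:
    """Routing: map an attribute value to a named relationship, else 'unmatched'."""
    return rules.get(attributes.get(name, ""), "unmatched")


if __name__ == "__main__":
    js = convert_csv_to_json("level,message\nERROR,disk full\n")
    attrs = extract_attribute(js, "level")                        # {"level": "ERROR"}
    print(route(attrs, {"ERROR": "alerts", "INFO": "archive"}, "level"))  # alerts
```

Keeping each step separate mirrors the idea of composing small components: the routing rule can change without touching the conversion or extraction code.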
What is NiFi for?
And what is NiFi NOT for?
● Distributed Computation
● Complex Event Processing
Use case types
● IoT Remote sensor data capture
● Enterprise integration (among systems on an intranet or the internet)
● Big Data ingestion
● Simple event processing (handling discrete points)
More information on use cases is out there...
So, why NiFi?
Wider coverage than other market solutions.
Wider range of dataflow scenarios covered. Allows composition of processes.
On-the-fly changes - wow!
Meets high security and compliance requirements
Apache NiFi - Architecture
[Figure: architecture diagram showing Processor 1 and Processor 2]
● Get log data from system A
● Publish dataflow to a telemetry queue
● Subscribe to the queue for processing on system B
● Show data provenance
● Show queuing at the relationship level
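The demo steps above can be approximated in plain Python to show what provenance records. This is a sketch under loud assumptions: an in-memory `queue.Queue` stands in for the telemetry queue, `GetLogData`, `PublishTelemetry`, and `SubscribeTelemetry` are hypothetical component labels rather than NiFi processors, and the event names only loosely follow NiFi's provenance event types (RECEIVE, SEND).

```python
from __future__ import annotations

import queue
from dataclasses import dataclass, field
from uuid import uuid4

# (event type, FlowFile id, component): a stand-in for NiFi's provenance repository
PROVENANCE: list[tuple[str, str, str]] = []


@dataclass
class FlowFile:
    content: str
    attributes: dict[str, str] = field(default_factory=dict)
    ff_id: str = field(default_factory=lambda: str(uuid4()))


def get_log_data(lines: list[str]) -> list[FlowFile]:
    """Step 1: turn log lines from system A into FlowFiles."""
    flowfiles = []
    for line in lines:
        ff = FlowFile(content=line, attributes={"source": "system-A"})
        PROVENANCE.append(("RECEIVE", ff.ff_id, "GetLogData"))
        flowfiles.append(ff)
    return flowfiles


def publish(flowfiles: list[FlowFile], telemetry: queue.Queue) -> None:
    """Step 2: publish each FlowFile to the telemetry queue."""
    for ff in flowfiles:
        telemetry.put(ff)
        PROVENANCE.append(("SEND", ff.ff_id, "PublishTelemetry"))


def subscribe(telemetry: queue.Queue) -> list[FlowFile]:
    """Step 3: system B drains whatever is currently queued."""
    received = []
    while True:
        try:
            ff = telemetry.get_nowait()
        except queue.Empty:
            break
        PROVENANCE.append(("RECEIVE", ff.ff_id, "SubscribeTelemetry"))
        received.append(ff)
    return received


def lineage(ff_id: str) -> list[tuple[str, str]]:
    """Step 4: the provenance trail (event, component) of a single FlowFile."""
    return [(event, comp) for event, fid, comp in PROVENANCE if fid == ff_id]
```

Calling `telemetry.qsize()` between `publish` and `subscribe` is the toy equivalent of inspecting a connection's queue in the last demo step, and `lineage(ff.ff_id)` lists each hop a single log line took.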