@serrazon
A system to process and distribute data
Contents
● Where did NiFi come from?
● The NiFi way
● Flows
● Messaging
● Architecture
● Demo
https://nifi.apache.org
Where did NiFi come from?
History
● NSA Technology Transfer Program - Niagara Files
● FBP - Flow-Based Programming
● Hortonworks maintains NiFi on Apache
The NiFi way
Abstractions
| NiFi Term | FBP Term | Description |
|---|---|---|
| FlowFile | Information Packet | Unit of data moving from one system to another. Tracked by its key/value pair attributes. |
| Processor | Black Box | Does the work of data routing, transformation or mediation between systems. Has access to attributes and can work with zero or more FlowFiles. Can commit or roll back its work. |
| Connection | Bounded Buffer | Link between processors. Acts as a queue and allows different processes to work at different rates. Allows dynamic priorities and can have upper bounds on load, which enables back pressure. |
| Flow Controller | Scheduler | Keeps track of how processes connect and manages the working threads. Acts as a broker between processors. |
| Process Group | Subnet | Set of processes and their connections. Has input and output ports to communicate with other process groups or processors. Allows composition of other components. |
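To make the FlowFile, Processor and Connection abstractions concrete, here is a minimal sketch in plain Python. It is not NiFi's API: the class names, `max_queued`, `offer`, `poll` and the two processor functions are hypothetical, and back pressure is simplified to "the upstream side stops when the bounded queue is full".

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: a content payload plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Bounded queue between two processors (hypothetical, not NiFi's API)."""

    def __init__(self, max_queued: int):
        self.max_queued = max_queued
        self._queue = deque()

    def is_full(self) -> bool:
        return len(self._queue) >= self.max_queued

    def offer(self, flowfile: FlowFile) -> bool:
        """Enqueue unless the upper bound is reached; False signals back pressure."""
        if self.is_full():
            return False
        self._queue.append(flowfile)
        return True

    def poll(self):
        """Dequeue the oldest FlowFile, or None when the queue is empty."""
        return self._queue.popleft() if self._queue else None


def produce(lines, out: Connection) -> int:
    """Upstream processor: stops as soon as the connection pushes back."""
    accepted = 0
    for line in lines:
        if not out.offer(FlowFile(line.encode(), {"source": "system-a"})):
            break
        accepted += 1
    return accepted


def uppercase(inp: Connection, out: Connection) -> bool:
    """Downstream processor: transform one FlowFile if there is room downstream."""
    if out.is_full():
        return False
    flowfile = inp.poll()
    if flowfile is None:
        return False
    out.offer(FlowFile(flowfile.content.upper(), dict(flowfile.attributes)))
    return True


a_to_b = Connection(max_queued=2)
results = Connection(max_queued=10)
accepted = produce(["one", "two", "three"], a_to_b)  # the third line is held back
uppercase(a_to_b, results)                           # frees one slot in a_to_b
```

Because the queue sits between the two processors, the producer and the consumer can run at different rates, and the upper bound is what lets a slow consumer slow the producer down.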
Messaging
[Diagram: A (Producer) -> Message -> channel -> B (Consumer)]
Data flowing in a message from A (producer) through a channel to B (consumer)
Data going from Producers to Consumers
● Formats and/or schemas
● Protocols
● Priorities - The most important first
● Batch vs Streams
● Data level security - authorization
● I need just a part of the message
● Before I get the data, please clean it and prepare it first.
Today's Messaging Scenario
[Diagram: dataflows connecting three stages: Acquire Data, Process / Analyze Data, Store Data]
Massive amounts of data are produced by several types of producers and go over the wire through several types of channels.
Challenge: acquire, process and store the data online, fast and securely.
The Messaging Problem at large scale
What does NiFi offer?
● No coding, no deployment - Visual operation and control - On the fly
● No log search - Tracking of everything that happens - Data lineage (provenance)
● Configure and change how the data is distributed - Prioritization
● Regulate the speed of data consumption - Buffering data - Back pressure
● Control latency vs throughput
● Secure control layer / data layer - Authentication / Authorization
● Multiple instances - Clustering
● Extensibility
It was designed to tackle global enterprise dataflow challenges.
Apache NiFi
What is NiFi for?
● Simple data transfer between systems - Reliable and secure
● Data ingestion into analytics layers
● Data magic / Preparing data
○ Conversion between formats
○ Extraction / Parsing
○ Routing decisions
And what is NiFi NOT for?
● Distributed computation
● Complex event processing
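The data preparation items listed above (format conversion, extraction and routing decisions) can be pictured with a small sketch. This is plain Python, not a NiFi processor: the function names, the sample record, and the `alerts` / `archive` destinations are assumptions made up for illustration.

```python
import csv
import io
import json


def csv_to_json(text: str) -> str:
    """Conversion between formats: CSV text with a header row -> JSON array."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(rows)


def route(attributes: dict) -> str:
    """Routing decision based on extracted attributes (hypothetical rule)."""
    if attributes.get("level") == "ERROR":
        return "alerts"
    return "archive"


record = "host,level\nweb01,ERROR\n"
converted = csv_to_json(record)      # conversion
parsed = json.loads(converted)       # extraction / parsing
destination = route(parsed[0])       # routing decision
```

Each step handles one record at a time, which is the kind of simple per-message work the slide describes, as opposed to distributed computation or complex event processing.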
Types of use cases
● IoT remote sensor data capture
● Enterprise integrations (between systems on an intranet or the internet)
● Big Data ingestion
● Simple event processing (handling discrete points)
More use case information is available out there...
So, why NiFi?
Wider coverage than other market solutions.
Wider range of dataflow scenarios covered. Allows composition of processes.
On-the-fly changes - wow!
Keeps track of the data (provenance)
Handles strict security and compliance requirements
Apache NiFi - Architecture
[Diagram: an OS/Host runs a JVM that contains the Web Server and the Flow Controller, which manages Processor 1 and Processor 2. The FlowFile Repository, Content Repository and Provenance Repository sit on Local Storage.]
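As a rough picture of what a provenance repository is for, the sketch below keeps an append-only list of events and answers "what happened to this FlowFile?". It is an assumption-laden toy, not NiFi's storage format or API: `ProvenanceEvent`, `ProvenanceLog`, `record`, `lineage` and the identifiers `ff-1` / `ff-2` are hypothetical, and the event type strings are illustrative labels.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProvenanceEvent:
    """One immutable entry describing something that happened to a FlowFile."""
    flowfile_id: str
    event_type: str   # illustrative labels such as "RECEIVE" or "SEND"
    component: str    # the processor that reported the event


class ProvenanceLog:
    """Append-only event log (hypothetical stand-in for a provenance repository)."""

    def __init__(self) -> None:
        self._events: List[ProvenanceEvent] = []

    def record(self, flowfile_id: str, event_type: str, component: str) -> None:
        self._events.append(ProvenanceEvent(flowfile_id, event_type, component))

    def lineage(self, flowfile_id: str) -> List[ProvenanceEvent]:
        """All events for one FlowFile, in the order they were recorded."""
        return [e for e in self._events if e.flowfile_id == flowfile_id]


log = ProvenanceLog()
log.record("ff-1", "RECEIVE", "Processor 1")
log.record("ff-2", "RECEIVE", "Processor 1")
log.record("ff-1", "SEND", "Processor 2")
history = [(e.event_type, e.component) for e in log.lineage("ff-1")]
```

Querying the log by FlowFile replaces searching through application logs, which is the "no log search" point made earlier in the deck.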
Demo
● Get log data from system A
● Publish dataflow to a telemetry queue
● Subscribe to the queue for processing on system B
● Show data provenance
● Show queuing at relationship level