  1. @serrazon
  2. A system to process and distribute data
  3. Contents
     ● Where did NiFi come from?
     ● The NiFi way
     ● Flows
     ● Messaging
     ● Architecture
     ● Demo
  4. Where did NiFi come from?
  5. History
     ● NSA Technology Transfer Program - Niagara Files
     ● FBP - Flow-Based Programming
     ● Hortonworks maintains NiFi as an Apache project
  6. The NiFi way
  7. Abstractions

     | NiFi Term | FBP Term | Description |
     | --- | --- | --- |
     | FlowFile | Information Packet | Unit of data moving from one system to another. Tracked by its key/value pair attributes. |
     | Processor | Black Box | Does the work of data routing, transformation or mediation between systems. Has access to FlowFile attributes and can work on zero or more FlowFiles. Can commit or roll back its work. |
     | Connection | Bounded Buffer | Link between processors. Acts as a queue and allows different processes to work at different rates. Allows dynamic priorities and can have upper bounds on load, which enables back pressure. |
     | Flow Controller | Scheduler | Maintains the state of how processes connect and manages the worker threads. Acts as a broker between processors. |
     | Process Group | Subnet | Set of processes and their connections. Has input and output ports to communicate with other process groups or processors. Allows composition of other components. |
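
The abstractions in the table can be illustrated with a small model. This is a minimal Python sketch, not NiFi's actual API: the class names `FlowFile` and `Connection`, the `max_items` bound, and the `uppercase_processor` function are all hypothetical names chosen for illustration. It shows a FlowFile as content plus attributes, a Connection as a bounded queue that signals back pressure when full, and a processor that leaves its input untouched when it cannot deliver its output.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Unit of data: a content payload plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


class Connection:
    """Bounded queue between two processors (hypothetical, not NiFi's API)."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._queue = deque()

    def __len__(self):
        return len(self._queue)

    def is_full(self):
        # A full connection is the back-pressure signal for the upstream side.
        return len(self._queue) >= self.max_items

    def offer(self, flowfile):
        """Enqueue a FlowFile; return False when back pressure applies."""
        if self.is_full():
            return False
        self._queue.append(flowfile)
        return True

    def poll(self):
        """Dequeue the oldest FlowFile, or return None when empty."""
        return self._queue.popleft() if self._queue else None


def uppercase_processor(inbound, outbound):
    """Toy processor: move one FlowFile downstream, upper-casing its content.

    Returns True if a FlowFile was processed. When the outbound connection
    is full, nothing is taken from the inbound one, so no work is lost.
    """
    if outbound.is_full():
        return False
    flowfile = inbound.poll()
    if flowfile is None:
        return False
    outbound.offer(FlowFile(flowfile.content.upper(), dict(flowfile.attributes)))
    return True
```

With an outbound connection bounded at one item, a second call to `uppercase_processor` returns `False` until something downstream drains the queue, which is the different-rates behaviour the table describes.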
  8. Messaging
     A (Producer) → Message channel → B (Consumer)
     Data flows in a message from A (producer) through a channel to B (consumer).
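
The producer, channel and consumer roles can be sketched with Python's standard `queue` and `threading` modules. This is an illustrative assumption about how to model the slide, not something taken from NiFi: the function names, the `None` end-of-stream sentinel and the default capacity are all hypothetical.

```python
import queue
import threading


def producer(channel, messages):
    """A: put each message on the channel, then signal the end of the stream."""
    for message in messages:
        channel.put(message)  # blocks while the channel is full
    channel.put(None)  # sentinel: no more messages


def consumer(channel, received):
    """B: take messages off the channel until the sentinel arrives."""
    while True:
        message = channel.get()
        if message is None:
            break
        received.append(message)


def run_demo(messages, capacity=2):
    """Run one producer and one consumer over a bounded channel."""
    channel = queue.Queue(maxsize=capacity)
    received = []
    a = threading.Thread(target=producer, args=(channel, messages))
    b = threading.Thread(target=consumer, args=(channel, received))
    a.start()
    b.start()
    a.join()
    b.join()
    return received
```

Because the channel is bounded, a fast producer is forced to wait for a slow consumer instead of filling memory.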
  9. Data going from Producers to Consumers
     ● Formats and/or schemas
     ● Protocols
     ● Priorities - the most important first
     ● Batch vs. streams
     ● Data-level security - authorization
     ● I need just a part of the message
     ● Before I get the data, please clean it and prepare it first
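
The "most important first" requirement from the list above can be sketched as a priority-ordered channel. This is a minimal Python example built on the standard `heapq` module. The `PriorityChannel` class and the convention that a lower number means higher importance are assumptions made for illustration.

```python
import heapq
import itertools


class PriorityChannel:
    """Channel that delivers the most important message first (hypothetical)."""

    def __init__(self):
        self._heap = []
        # The counter breaks ties so equal priorities keep arrival order.
        self._counter = itertools.count()

    def send(self, message, priority):
        """Queue a message; a lower priority number means more important."""
        heapq.heappush(self._heap, (priority, next(self._counter), message))

    def receive(self):
        """Return the most important pending message, or None when empty."""
        if not self._heap:
            return None
        _, _, message = heapq.heappop(self._heap)
        return message
```

A consumer that calls `receive()` repeatedly sees urgent messages before routine ones, regardless of the order in which they were sent.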
  10. Today's Messaging Scenario
      Dataflows: Acquire Data → Process / Analyze Data → Store Data
      Massive amounts of data, produced by several types of producers, go onto the wire over several types of channels.
      Challenge: acquire, process and store them online, fast and securely.
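
The acquire, process and store stages can be pictured as a chain of Python generators, where each record streams through all three stages without waiting for the whole batch. This is only a sketch: the stage functions, the upper-casing "analysis" step and the in-memory list used as a store are hypothetical stand-ins.

```python
def acquire(lines):
    """Acquire: read raw lines from any iterable source and trim whitespace."""
    for line in lines:
        yield line.strip()


def process(records):
    """Process / analyze: drop empty records and normalise the rest."""
    for record in records:
        if record:
            yield record.upper()


def store(records, sink):
    """Store: append every record to the sink and return how many were stored."""
    count = 0
    for record in records:
        sink.append(record)
        count += 1
    return count
```

Chaining the stages as `store(process(acquire(source)), sink)` pulls records through one at a time, which is the streaming side of the batch vs. streams distinction.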
  11. The Messaging Problem at large scale
  12. What does NiFi offer?
      ● No coding, no deployment - visual operation and control - on the fly
      ● No log search - tracking of everything that is happening - data lineage (provenance)
      ● Configure and change how the data is distributed - prioritization
      ● Regulate the speed of data consumption - buffering data - back pressure
      ● Control latency vs. throughput
      ● Secure control layer / data layer - authentication / authorization
      ● Multiple instances - clustering
      ● Extensibility
      It was designed to tackle global enterprise dataflow challenges.
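
Data lineage (provenance) amounts to recording an event every time a component touches a piece of data, so its history can be queried instead of searched for in logs. This is a minimal Python sketch of that idea. The `ProvenanceLog` class, its method names and the component names in the usage note are hypothetical, and the event type strings are illustrative labels, not a claim about NiFi's internal storage.

```python
import time
import uuid


def new_flowfile_id():
    """Return a unique identifier for a piece of data entering the flow."""
    return str(uuid.uuid4())


class ProvenanceLog:
    """Append-only record of what happened to each FlowFile (hypothetical)."""

    def __init__(self):
        self._events = []

    def record(self, flowfile_id, event_type, component):
        """Store one event: which component did what to which FlowFile."""
        self._events.append({
            "flowfile_id": flowfile_id,
            "event_type": event_type,
            "component": component,
            "timestamp": time.time(),
        })

    def lineage(self, flowfile_id):
        """Return (event_type, component) pairs for one FlowFile, in order."""
        return [
            (event["event_type"], event["component"])
            for event in self._events
            if event["flowfile_id"] == flowfile_id
        ]
```

For example, after recording a `"RECEIVE"` event from a hypothetical `"GetLogs"` component and a `"SEND"` event from `"PublishQueue"`, `lineage(flowfile_id)` returns both steps in the order they happened.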
  13. Apache NiFi
      What is NiFi for?
      ● Simple data transfer between systems - reliable and secure
      ● Injecting data into analytics layers
      ● Data magic / preparing data
        ○ Conversion between formats
        ○ Extraction / parsing
        ○ Routing decisions
      And what is NiFi NOT for?
      ● Distributed computation
      ● Complex event processing
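
Two of the data preparation tasks listed above, conversion between formats and routing decisions, can be sketched in a few lines of Python. In NiFi these are configured visually instead of coded. The functions below are hypothetical illustrations of the concepts only, and the rule format (a name plus a predicate over attributes) is an assumption.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return json.dumps([dict(row) for row in reader])


def route(attributes, rules, default="unmatched"):
    """Return the name of the first rule whose predicate accepts the attributes.

    `rules` is a list of (relationship_name, predicate) pairs, checked in order.
    """
    for name, predicate in rules:
        if predicate(attributes):
            return name
    return default
```

A rule list such as `[("errors", lambda a: a.get("level") == "ERROR")]` sends error records down one path and everything else to the default path.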
  14. Use case types
      ● IoT remote sensor data capture
      ● Enterprise integrations (among systems on an intranet or the internet)
      ● Big Data ingestion
      ● Simple event processing (handling discrete points)
      More use case info out there...
  15. So, why NiFi?
      ● Wider coverage than other market solutions
      ● Wider range of dataflow scenarios covered
      ● Allows composition of processes
      ● On-the-fly changes - wow!
      ● Keeps track of the data
      ● Meets high security and compliance requirements
  16. Apache NiFi - Architecture
      Diagram components:
      ● OS / Host
      ● JVM
      ● Web Server
      ● Flow Controller (Processor 1, Processor 2)
      ● FlowFile Repository, Content Repository, Provenance Repository
      ● Local Storage
  17. Demo
      ● Get log data from system A
      ● Publish the dataflow to a telemetry queue
      ● Subscribe to the queue for processing on system B
      ● Show data provenance
      ● Show queuing at relationship level