Apache NiFi
Agenda
· Overview
· Navigation
· Technical Details
· Q&A
Background
Mission Statement
“Put simply, NiFi was built to automate the flow of data between systems. While the
term dataflow is used in a variety of contexts, we use it here to mean the
automated and managed flow of information between systems. This problem
space has been around ever since enterprises had more than one system, where
some of the systems created data and some of the systems consumed data. The
problems and solution patterns that emerged have been discussed and articulated
extensively.”
Source: Apache NiFi Overview
Overview
· Short for “NiagaraFiles”. Donated to the Apache Software Foundation by the NSA in 2014 through its Technology Transfer Program (TTP).
· Flow-based programming model.
· Version 1.13.2 (the current release at the time of writing) ships with more than 285 “processors”, the building blocks that gather, route, or manipulate data.
· Data format agnostic.
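To make the flow-based programming idea concrete, here is a minimal sketch in plain Python. It does not use any NiFi API: the `Connection` class, `run_processor` helper, and the two toy processors are hypothetical names chosen for illustration. The point is only that processors are small independent steps, joined by queues, and that each item is routed to a "success" or "failure" path.

```python
from collections import deque


class Connection:
    """A FIFO queue linking one processor's output to the next one's input."""

    def __init__(self):
        self.queue = deque()


def run_processor(func, inbound, success, failure):
    """Drain the inbound queue, routing each item to success or failure."""
    while inbound.queue:
        item = inbound.queue.popleft()
        try:
            success.queue.append(func(item))
        except ValueError:
            # The untouched item goes down the failure path.
            failure.queue.append(item)


# Hypothetical processors: plain functions standing in for NiFi processors.
def parse_int(text):
    return int(text)  # raises ValueError on input that is not a number


def double(number):
    return number * 2


source, parsed, doubled, failed = (Connection() for _ in range(4))
source.queue.extend(["1", "2", "oops", "4"])

# Wire the flow: source -> parse_int -> double, with a shared failure queue.
run_processor(parse_int, source, parsed, failed)
run_processor(double, parsed, doubled, failed)

results = list(doubled.queue)
rejects = list(failed.queue)
print(results, rejects)
```

In NiFi itself the wiring is done on the canvas rather than in code, and the queues between processors are persistent and back-pressured; this sketch leaves all of that out.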
Navigation
● Easy-to-navigate UI
● Can run standalone or in a cluster
● Lets users string together “processors” or write their own code
● Moves data around inside a “flowfile”
● Built-in security
○ LDAP, other authentication
● Automatically captures every change that happens to each “flowfile” (data provenance)
● Has a “replay” capability
● Provenance data can be moved out of NiFi and into something like Elasticsearch and Kibana for real-time dashboarding.
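The flowfile, provenance, and replay ideas above can be sketched in a few lines of plain Python. This is an illustration under simplifying assumptions, not NiFi's implementation: `FlowFile`, `ProvenanceEvent`, `ProvenanceLog`, and the two toy processors are hypothetical classes and functions. The event type strings are borrowed from NiFi's provenance vocabulary purely as labels.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Content plus key/value attributes, the two parts of a flowfile."""
    content: bytes
    attributes: dict = field(default_factory=dict)


@dataclass
class ProvenanceEvent:
    """Snapshot of a flowfile as it entered a processor."""
    event_type: str
    processor: str
    content_before: bytes
    attributes_before: dict


class ProvenanceLog:
    def __init__(self):
        self.events = []

    def record(self, event_type, processor, flowfile):
        self.events.append(
            ProvenanceEvent(event_type, processor,
                            flowfile.content, dict(flowfile.attributes))
        )

    def replay(self, index):
        """Rebuild the flowfile exactly as it was at the recorded event."""
        event = self.events[index]
        return FlowFile(event.content_before, dict(event.attributes_before))


# Hypothetical processors that log before they change anything.
def uppercase(flowfile, log):
    log.record("CONTENT_MODIFIED", "uppercase", flowfile)
    return FlowFile(flowfile.content.upper(), dict(flowfile.attributes))


def tag(flowfile, log):
    log.record("ATTRIBUTES_MODIFIED", "tag", flowfile)
    return FlowFile(flowfile.content, {**flowfile.attributes, "tagged": "true"})


log = ProvenanceLog()
original = FlowFile(b"hello", {"filename": "greeting.txt"})
final = tag(uppercase(original, log), log)

# Replay from the first event: the flowfile as it looked before any change.
replayed = log.replay(0)
print(final, replayed)
```

Because every event keeps the state of the flowfile at that point, the list of events doubles as the data that an external dashboard could index.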
Technical Details
· Written in Java; runs on Java 8 and 11.
· Zero-leader cluster paradigm: every node does the same work, each on a different set of data.
· Built for high concurrency and data streaming, but also supports batch-type operations.
· Open source and easily extensible.
· “Owned” and supported by Cloudera.
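The difference between streaming and batch-type operation mentioned above can be shown with a small Python sketch. It is not NiFi code; `stream`, `batch`, and the sample readings are hypothetical. Streaming handles each record as it arrives, while batching collects a fixed number of records and handles them together.

```python
from itertools import islice


def stream(records, handle):
    """Handle each record as soon as it arrives."""
    for record in records:
        yield handle(record)


def batch(records, size, handle_many):
    """Group records into fixed-size batches and handle each batch at once."""
    iterator = iter(records)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield handle_many(chunk)


readings = [3, 5, 8, 13, 21]

# Streaming: one output per input record.
streamed = list(stream(readings, lambda r: r * 10))

# Batch: one output per group of two records (the last group may be short).
batched = list(batch(readings, 2, sum))
print(streamed, batched)
```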
Use Cases
· Data ingestion
  ○ Streaming
  ○ Batch
· Data processing
  ○ Filtering
  ○ Enrichment
  ○ Transformations
· IoT (both NiFi and MiNiFi)
  ○ MiNiFi offers edge computing
· Application data processing
  ○ Offload data processing to platform
· Data and process automation
· Data governance
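As an illustration of the data processing use case (filtering, enrichment, transformation), here is a minimal Python sketch. It uses generators rather than NiFi processors, and the record layout, the `SENSOR_SITES` lookup table, and the function names are all assumptions made up for the example.

```python
# Hypothetical lookup table used for enrichment.
SENSOR_SITES = {"s1": "warehouse", "s2": "loading dock"}


def filter_valid(records):
    """Filtering: drop records that have no temperature reading."""
    for record in records:
        if record.get("temp_c") is not None:
            yield record


def enrich(records):
    """Enrichment: add the site name looked up from the sensor id."""
    for record in records:
        yield {**record, "site": SENSOR_SITES.get(record["sensor"], "unknown")}


def transform(records):
    """Transformation: convert Celsius to Fahrenheit and reshape the record."""
    for record in records:
        yield {
            "sensor": record["sensor"],
            "site": record["site"],
            "temp_f": record["temp_c"] * 9 / 5 + 32,
        }


raw = [
    {"sensor": "s1", "temp_c": 20.0},
    {"sensor": "s2", "temp_c": None},
    {"sensor": "s3", "temp_c": 100.0},
]

# Chain the three steps the way processors would be chained in a flow.
processed = list(transform(enrich(filter_valid(raw))))
print(processed)
```

Each step only knows about its own input and output, which is what makes the steps easy to rearrange or replace.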
Questions
?
Appendix
References
· NiFi
  ○ https://nifi.apache.org/docs/nifi-docs/html/overview.html
· FlowFile
  ○ https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#flowfile
· Provenance
  ○ https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#provenance_events
· Data batching and streaming
  ○ https://thenewstack.io/the-big-data-debate-batch-processing-vs-streaming-processing/
· Custom processor setup
  ○ https://community.cloudera.com/t5/Community-Articles/Building-a-Custom-Processor-Using-IntelliJ/ta-p/244343
· Custom scripting
  ○ https://community.cloudera.com/t5/Community-Articles/ExecuteScript-Cookbook-part-1/ta-p/248922

Automate your data flows with Apache NIFI
