* Overview of NiFi
* Understanding NiFi Layout as a service
* Key Concepts such as FlowFiles, Attributes, etc.
* Understanding how to access the documentation
* Capabilities of NiFi as a Data Ingestion Tool
* NiFi vs. Traditional ETL Tools
* Role of NiFi in Data Engineering at Scale
* Simple pipeline to copy files from Local File System to HDFS
3. Resources
• Code and documentation will be available in the GitHub repository.
• Videos will be available on YouTube as part of this playlist. They
will be streamed for free and will remain free for a few weeks,
after which they will become members-only (except this one).
training@itversity.com
4. Agenda
• Overview of NiFi
• Understanding NiFi as a service
• NiFi Core Concepts
• Accessing NiFi Documentation
• Capabilities of NiFi as a Data Ingestion Tool
• NiFi vs. Traditional ETL Tools
• Role of NiFi in Data Engineering at Scale
• NiFi Demo – Simple Data Pipeline
6. Data Integration (Batch or Real Time)
[Diagram. Labels: Web/App Server (three boxes), Database, Files,
Databases, BI/DW, External Apps]
• For batch, get data by querying the databases.
• Batch tools: Informatica, Ab Initio, etc.
• For real time, get data from web server logs or database logs.
• Real-time tools: GoldenGate to get data from database logs, Kafka to
get data from web server logs.
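The difference between the two styles can be sketched in a few lines of Python. This is only an illustration under assumptions: the orders table, its columns, and both function names are hypothetical, and SQLite stands in for a real source database.

```python
import sqlite3


def batch_extract(conn, last_id):
    """Batch style: query the database for rows added since the last run.

    The `orders` table and its columns are hypothetical.
    """
    cur = conn.execute(
        "SELECT id, status FROM orders WHERE id > ? ORDER BY id", (last_id,)
    )
    return cur.fetchall()


def read_new_log_lines(path, offset):
    """Real-time style: read only what was appended to a log after `offset`.

    Returns the new lines and the offset to resume from on the next call.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    return data.decode("utf-8").splitlines(), offset + len(data)
```

A batch job calls batch_extract on a schedule and remembers the highest id it has seen, while a real-time collector keeps calling read_new_log_lines with the offset returned by the previous call.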
11. Understanding NiFi as a service
• NiFi is a data ingestion tool, typically configured on edge nodes or
client nodes.
• It can be configured on multiple nodes as a cluster for high
availability (HA), fault tolerance, and load balancing.
• It can be integrated with Kerberos for security.
• NiFi is an external service and requires configuration to integrate with
data engineering tools such as Spark, Kafka, and Hadoop.
• NiFi is provided as one of the key services in the
Cloudera/Hortonworks distributions.
13. NiFi Core Concepts
Here are the core concepts of NiFi one should be familiar with. These
concepts are covered in depth as part of the NiFi Workshop Series.
• Processors
• Process Groups
• FlowFiles
• Attributes
• Controller Services
• NiFi Expression Language
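To make FlowFiles, Attributes, and the Expression Language more concrete, here is a toy Python model. It is not NiFi code: the FlowFile class and the resolve function are hypothetical, and resolve handles only plain ${attribute} references, a tiny subset of the real Expression Language. The attribute names filename and path are standard NiFi attributes.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy model: a FlowFile is content plus key/value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def resolve(template, flowfile):
    """Replace each ${name} in the template with the matching attribute value.

    Assumption of this sketch: an unknown attribute becomes an empty string.
    """
    return re.sub(
        r"\$\{([\w.]+)\}",
        lambda m: flowfile.attributes.get(m.group(1), ""),
        template,
    )


ff = FlowFile(b"1,COMPLETE\n", {"filename": "orders.csv", "path": "retail/"})
print(resolve("/data/${path}${filename}", ff))
```

The point of the model is that processors usually act on attributes (metadata) to decide where content goes, without having to parse the content itself.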
15. Accessing NiFi Documentation
• Documentation for any processor is accessible through the Usage
option in its right-click (context) menu.
17. Capabilities of NiFi as a Data Ingestion Tool
• Can ingest data from most sources into the Data Lake.
• Can move data from the Data Lake to downstream systems.
• Can convert file formats while loading data into the Data Lake.
• Can apply almost all the standard row-level transformations
incrementally, using either JOLT or SQL.
• Can also be used to orchestrate and schedule data pipelines.
• However, NiFi might not be the most appropriate tool for heavy
baseline (initial) loads, and it is not well suited to complex
transformations.
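As a rough illustration of file format conversion combined with a row-level transformation, the sketch below converts CSV text to JSON lines and changes one field per row. In NiFi this is configured on processors rather than written as code; the function name, the sample columns, and the uppercase rule are assumptions made for this example.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text, transform=lambda row: row):
    """Convert CSV text to JSON lines, applying `transform` to each row.

    Each row is handled independently, which is what makes the
    transformation row-level: no joins or aggregations across rows.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n".join(json.dumps(transform(row)) for row in reader)


sample = "order_id,status\n1,complete\n2,pending\n"
print(csv_to_json_lines(sample, lambda r: {**r, "status": r["status"].upper()}))
```

Anything that needs to look at many rows at once (joins, large aggregations) falls outside this per-row pattern, which matches the limitation noted in the last bullet.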
19. NiFi vs. Traditional ETL Tools
• NiFi is primarily an ingestion tool.
• It works well for extracting data and loading it into the Data Lake
without complex transformations.
• NiFi is very good at moving data between hops because it deals with
files rather than manipulating the data inside them.
• NiFi can build simple, generic pipelines that move data between hops
without tying the flow to a schema.
• You can build a very simple flow in minutes to get data from
thousands of files belonging to hundreds of tables into the Data Lake.
You will see that as part of the demo later.
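The idea of a generic, schema-free flow can be sketched as follows: the target location is derived from the file's path alone, so the same logic serves any number of tables. The directory layout (one folder per table), the /data/lake root, and the function name are assumptions made for this example.

```python
from pathlib import PurePosixPath


def target_path(source_file, lake_root="/data/lake"):
    """Derive the Data Lake path from the source path, never reading the file.

    Assumes a hypothetical layout of <anything>/<table>/<file>.
    """
    p = PurePosixPath(source_file)
    table = p.parent.name  # the folder name doubles as the table name
    return str(PurePosixPath(lake_root) / table / p.name)


print(target_path("/landing/retail_db/orders/part-00000"))
print(target_path("/landing/retail_db/customers/part-00000"))
```

Because nothing here depends on column names or types, adding a new table means adding a new folder, not changing the flow.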
21. Role of NiFi in Data Engineering at Scale
• Get data from databases into the Data Lake.
• Consume data from Kafka topics into the Data Lake.
• Get data from app server log files into the Data Lake (using MiNiFi).
• Get data from the Data Lake into file servers.
• Get data from an on-prem Data Lake into cloud storage such as S3 or
ADLS.
• Get processed data from the Data Lake into databases or data
warehouses.
24. NiFi Demo – Simple Data Pipeline
• Build a simple pipeline to get files from the local file system into HDFS.
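In NiFi, a flow like this is typically built by connecting a GetFile processor to a PutHDFS processor, with no code involved. The Python sketch below only mimics the pick-up-and-deliver behaviour so the steps are easy to follow; the target directory is a local stand-in for HDFS, and the function name is hypothetical.

```python
import shutil
from pathlib import Path


def copy_files(source_dir, target_dir):
    """Copy every regular file from source_dir into target_dir.

    target_dir is a local folder standing in for an HDFS directory;
    writing to a real cluster needs NiFi's PutHDFS or an HDFS client.
    """
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.iterdir()):
        if path.is_file():
            shutil.copy2(path, target / path.name)
            copied.append(path.name)
    return copied
```

Calling copy_files with a landing folder and a target folder returns the names of the files it delivered, which is a convenient way to check what was picked up.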