Data Ingestion using NiFi
Quick Overview
training@itversity.com
Agenda
• Overview of NiFi
• Understanding NiFi Layout as a service
• Key Concepts such as FlowFiles, Attributes, etc.
• Understanding how to access the documentation
• Capabilities of NiFi as a Data Ingestion Tool
• NiFi vs. Traditional ETL Tools
• Role of NiFi in Data Engineering at Scale
• Demo - Simple pipeline to copy files from Local File System to HDFS
Resources
• Code and documentation will be available in the GitHub repository.
• Videos will be available on YouTube as part of this playlist. They
will stream for free for a few weeks, after which they will become
member-only (except this one).
Agenda
• Overview of NiFi
• Understanding NiFi as a service
• NiFi Core Concepts
• Accessing NiFi Documentation
• Capabilities of NiFi as a Data Ingestion Tool
• NiFi vs. Traditional ETL Tools
• Role of NiFi in Data Engineering at Scale
• NiFi Demo – Simple Data Pipeline
[Diagram labels: Clients, Switches, Firewalls, Web/App Servers, Database]
Data Integration: Batch or Real Time
[Diagram labels: Files, Databases, BI/DW, External Apps]
• For batch, get data from databases by querying them.
• Batch tools: Informatica, Ab Initio, etc.
• For real time, get data from web server logs or database logs.
• Real-time tools: GoldenGate to get data from database logs, Kafka to
get data from web server logs.
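The difference between the two styles can be sketched in a few lines of Python. This is only an illustration, not how Informatica, GoldenGate, or Kafka work internally: the `orders` table, the log lines, and the helper names `batch_extract` and `read_new_lines` are all hypothetical.

```python
import sqlite3


def batch_extract(conn, table):
    """Batch style: query the source table and return every row at once."""
    # The table name is hard-coded by the caller in this sketch, so it is trusted.
    return conn.execute(f"SELECT * FROM {table} ORDER BY id").fetchall()


def read_new_lines(log_lines, offset):
    """Real-time style: return only the lines added after `offset`."""
    new_lines = log_lines[offset:]
    return new_lines, offset + len(new_lines)


# Hypothetical source table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "NEW"), (2, "CLOSED")])
batch_rows = batch_extract(conn, "orders")

# Hypothetical web server log that grows between two polls.
log = ["GET /home 200"]
first_poll, offset = read_new_lines(log, 0)
log.append("GET /cart 200")
second_poll, offset = read_new_lines(log, offset)
```

The batch call re-reads the whole table each time, while the log reader remembers an offset and picks up only what is new.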
Modern Large Scale Data Engineering Architecture
[Diagram labels: sources (Database, Application logs, Mainframes, IoT Device Data); Ingestion; Data Lake (S3, ADLS); Data Processing (EMR, Databricks, Docker); Ingestion; targets (Files, Databases, BI/DW, External Apps)]
NiFi helps with ingestion and basic scheduling.
Understanding NiFi as a service
• NiFi is a data ingestion tool, typically configured on edge nodes or
client nodes.
• It can be configured on multiple nodes as a cluster for high
availability (HA), fault tolerance, and load balancing.
• It can be integrated with Kerberos for security.
• NiFi is an external service and requires configuration to integrate with
data engineering tools such as Spark, Kafka, and Hadoop.
• NiFi is provided as one of the key services in the
Cloudera/Hortonworks distributions.
NiFi Core Concepts
These are the core concepts of NiFi to be familiar with. Each one is
covered in depth as part of the NiFi Workshop Series.
• Processors
• Process Groups
• FlowFiles
• Attributes
• Controller Services
• NiFi Expression Language
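A rough mental model of how FlowFiles, attributes, processors, and the Expression Language fit together can be sketched in Python. This is not NiFi's API: the `FlowFile` class, `evaluate`, and `update_attribute` are hypothetical stand-ins, and only plain `${name}` references are resolved, whereas the real Expression Language also supports functions.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Simplified stand-in for a NiFi FlowFile: content plus attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def evaluate(expression, flowfile):
    """Resolve ${name} references against the FlowFile attributes.

    Unknown attributes resolve to an empty string in this sketch.
    """
    return re.sub(
        r"\$\{(\w+)\}",
        lambda match: flowfile.attributes.get(match.group(1), ""),
        expression,
    )


def update_attribute(flowfile, name, expression):
    """Processor-like step: set one attribute and leave the content untouched."""
    new_attributes = dict(flowfile.attributes)
    new_attributes[name] = evaluate(expression, flowfile)
    return FlowFile(flowfile.content, new_attributes)


# Hypothetical FlowFile carrying two rows of an "orders" extract.
incoming = FlowFile(b"1,NEW\n2,CLOSED\n", {"filename": "part-00000", "table": "orders"})
routed = update_attribute(incoming, "target_dir", "/landing/${table}")
```

The point of the sketch is that a processor can make routing decisions from attributes alone, without reading or changing the content.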
Accessing NiFi Documentation
• NiFi documentation for any processor is accessible through the usage
option in the processor's right-click menu.
Capabilities of NiFi as a Data Ingestion Tool
• NiFi can ingest data from most sources into the Data Lake.
• It can move data from the Data Lake to downstream systems.
• It can also take care of file format conversion while loading data into
the Data Lake.
• NiFi can apply almost all the standard row-level transformations,
using either JOLT or SQL, in an incremental fashion.
• NiFi can also be leveraged for orchestrating as well as scheduling
data pipelines.
• However, NiFi might not be the most appropriate tool for heavy
baseline loads, and it is not well suited to complex transformations.
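To make "file format conversion" and "row-level transformation in an incremental fashion" concrete, here is a minimal Python sketch that converts CSV to JSON Lines one record at a time. It only illustrates the idea and does not use NiFi, JOLT, or SQL; the column names and the functions `transform_row` and `csv_to_json_lines` are assumptions made for the example.

```python
import csv
import io
import json


def transform_row(row):
    """Row-level transformation: cast the id and normalize the status."""
    return {"order_id": int(row["order_id"]), "status": row["status"].strip().upper()}


def csv_to_json_lines(csv_text):
    """Convert CSV text to JSON Lines, yielding one record at a time."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield json.dumps(transform_row(row), sort_keys=True)


# Hypothetical input with untidy status values.
sample = "order_id,status\n1,new\n2, closed \n"
converted = list(csv_to_json_lines(sample))
```

Each record is handled independently of the others, which is why this kind of transformation fits an ingestion flow, while joins or aggregations across large datasets do not.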
NiFi vs. Traditional ETL Tools
• NiFi is primarily an ingestion tool.
• It works well for extracting data and loading it into the Data Lake
without complex transformations.
• NiFi is very good at moving data between hops by dealing with files
rather than manipulating data.
• NiFi is capable of building simple and generic pipelines to move data
between hops without restricting the flow with a schema.
• You can build a very simple flow in minutes to get data from
thousands of files belonging to hundreds of tables into the Data Lake.
You will see that as part of the demo later.
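The idea of one generic, schema-free flow serving many tables can be sketched as follows. The folder layout (`<source_root>/<table>/<file>`), the paths, and the function `target_path` are hypothetical; the sketch only shows that the destination can be derived from the file location without ever opening the file.

```python
from pathlib import PurePosixPath


def target_path(source_file, source_root, lake_root):
    """Derive the data lake path from the file location alone.

    The file is never opened, so no schema is needed. The assumed layout is
    <source_root>/<table>/<file>.
    """
    relative = PurePosixPath(source_file).relative_to(source_root)
    table = relative.parts[0]
    return str(PurePosixPath(lake_root) / table / relative.name)


# Hypothetical export layout: one folder per table.
files = [
    "/data/export/orders/part-00000",
    "/data/export/orders/part-00001",
    "/data/export/customers/part-00000",
]
plan = {f: target_path(f, "/data/export", "/lake/raw") for f in files}
```

Adding a new table means adding a new folder; the routing rule itself does not change.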
Role of NiFi in Data Engineering at Scale
• Get data from databases into the Data Lake.
• Consume data from Kafka topics into the Data Lake.
• Get data from app server log files into the Data Lake (using MiNiFi).
• Get data from the Data Lake into file servers.
• Get data from an on-prem Data Lake into the cloud, such as S3 or ADLS.
• Get processed data from the Data Lake into databases or data
warehouses.
NiFi Demo – Simple Data Pipeline
• Build a simple pipeline to get files from the local file system into HDFS.
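Before building the flow in NiFi, the behavior the demo aims for can be approximated in plain Python. This is a sketch under loud assumptions: a local folder stands in for HDFS, no NiFi processor or Hadoop client is involved, and the function `ingest` and the sample file are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def ingest(source_dir, target_dir):
    """Copy every file under source_dir to target_dir, keeping sub-folders."""
    copied = []
    for source in sorted(Path(source_dir).rglob("*")):
        if source.is_file():
            destination = Path(target_dir) / source.relative_to(source_dir)
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, destination)
            copied.append(destination.relative_to(target_dir).as_posix())
    return copied


with tempfile.TemporaryDirectory() as workspace:
    local_fs = Path(workspace) / "local"
    fake_hdfs = Path(workspace) / "hdfs"  # local folder standing in for HDFS
    (local_fs / "orders").mkdir(parents=True)
    (local_fs / "orders" / "part-00000").write_text("1,NEW\n")
    copied_files = ingest(local_fs, fake_hdfs)
    landed_text = (fake_hdfs / "orders" / "part-00000").read_text()
```

The NiFi flow replaces this loop with configured processors, adding scheduling, retries, and monitoring that the script does not have.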

 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 

Data ingestion using NiFi - Quick Overview

  • 1. Data Ingestion using NiFi Quick Overview training@itversity.com
  • 2. Agenda • Overview of NiFi • Understanding NiFi Layout as a service • Key Concepts such as FlowFiles, Attributes, etc. • Understanding how to access the documentation • Capabilities of NiFi as a Data Ingestion Tool • NiFi vs. Traditional ETL Tools • Role of NiFi in Data Engineering at Scale • Demo - Simple pipeline to copy files from the Local File System to HDFS
  • 3. Resources • Code and documentation will be available in the GitHub repository. • Videos will be available on YouTube as part of this playlist. They will be streamed for free and remain free for a few weeks, after which they will become members-only (except this one).
  • 4. Agenda • Overview of NiFi • Understanding NiFi as a service • NiFi Core Concepts • Accessing NiFi Documentation • Capabilities of NiFi as a Data Ingestion Tool • NiFi vs. Traditional ETL Tools • Role of NiFi in Data Engineering at Scale • NiFi Demo – Simple Data Pipeline
  • 5. [Architecture diagram. Labels: Clients, Switch, Firewall, Web/App Servers, Database]
  • 6. Data Integration: Batch or Real Time [Diagram. Labels: Web/App Servers, Database, Files, Databases, BI/DW, External Apps] • For batch, get data by querying the source database. • Batch tools: Informatica, Ab Initio, etc. • For real time, get data from web server logs or database logs. • Real-time tools: GoldenGate to get data from database logs, Kafka to get data from web server logs.
  • 9. Modern Large Scale Data Engineering Architecture [Diagram. Labels: Database, Application logs, Mainframes, IoT Device Data; Ingestion; Data Lake (S3, ADLS); Data Processing (EMR, Databricks, Docker); Files, Databases, BI/DW, External Apps] • NiFi helps with ingestion and basic scheduling.
  • 11. Understanding NiFi as a service • NiFi is a data ingestion tool, typically configured on edge nodes or client nodes. • It can be configured on multiple nodes as a cluster for high availability (HA), fault tolerance, and load balancing. • It can be integrated with Kerberos for security. • NiFi is an external service and requires configuration to integrate with data engineering tools such as Spark, Kafka, and Hadoop. • NiFi is provided as one of the key services in the Cloudera/Hortonworks distributions.
  • 13. NiFi Core Concepts These are the core concepts of NiFi one should be familiar with. Each is covered in depth as part of the NiFi Workshop Series. • Processors • Process Groups • FlowFiles • Attributes • Controller Services • NiFi Expression Language
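
To make the FlowFile and attribute concepts above more concrete, here is a minimal sketch in Python. It is a toy model, not NiFi's API: the `FlowFile` class, the `resolve` helper, and the `/landing/...` path are all hypothetical names chosen for illustration. It only mimics the idea that a FlowFile carries content plus key/value attributes, and that an expression such as `${filename}` is resolved against those attributes.

```python
import re
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy stand-in for a FlowFile: opaque content plus string attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


# Matches ${name} references, loosely imitating attribute lookups in an expression.
_EXPR = re.compile(r"\$\{([^}]+)\}")


def resolve(template: str, flowfile: FlowFile) -> str:
    """Replace each ${name} with the attribute value (empty string if missing, in this sketch)."""
    return _EXPR.sub(lambda m: flowfile.attributes.get(m.group(1), ""), template)


# Hypothetical example: route a file to a per-table directory using its attributes.
ff = FlowFile(b"id,name\n1,alice\n", {"filename": "users.csv", "table": "users"})
target_path = resolve("/landing/${table}/${filename}", ff)
print(target_path)
```

The point of the sketch is that the flow logic reads only the attributes; the content stays untouched, which is roughly why one generic flow can serve many files.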
  • 15. Accessing NiFi Documentation • Documentation for any processor is accessible from the usage option in its right-click menu.
  • 17. Capabilities of NiFi as a Data Ingestion Tool • Can consume data from most sources into the Data Lake. • Can port data from the Data Lake to downstream systems. • Can convert file formats while loading data into the Data Lake. • Can apply almost all standard row-level transformations incrementally, using either JOLT or SQL. • Can also be used to orchestrate and schedule data pipelines. • However, NiFi might not be the most appropriate tool for large baseline loads, and it is not good at complex transformations.
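
The format conversion and row-level transformation mentioned above can be illustrated outside NiFi with a short Python sketch. This is not how NiFi implements it (the slide names JOLT and SQL for that); it only shows the shape of the work: records are read one at a time, a simple per-row transformation is applied, and each row is emitted in a different format (CSV in, JSON lines out). The function names and the sample data are hypothetical.

```python
import csv
import io
import json
from typing import Callable, Iterable, Iterator


def convert(lines: Iterable[str], transform: Callable[[dict], dict]) -> Iterator[str]:
    """Read CSV rows one at a time, transform each row, and yield it as a JSON line."""
    for row in csv.DictReader(lines):
        yield json.dumps(transform(row), sort_keys=True)


def upper_name(row: dict) -> dict:
    """Hypothetical row-level transformation: upper-case the 'name' column."""
    return {**row, "name": row["name"].upper()}


# Hypothetical sample input standing in for an ingested file.
sample = io.StringIO("id,name\n1,alice\n2,bob\n")
output = list(convert(sample, upper_name))
for line in output:
    print(line)
```

Because `convert` is a generator, rows are processed as they arrive rather than after loading the whole file, which matches the incremental, row-at-a-time style of transformation the slide describes.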
  • 19. NiFi vs. Traditional ETL Tools • NiFi is primarily an ingestion tool. • It works well for extracting data and loading it into the Data Lake without complex transformations. • NiFi is very good at moving data between hops by dealing with files rather than manipulating data. • NiFi can build simple, generic pipelines that move data between hops without restricting the flow to a schema. • You can build a very simple flow in minutes to get data from thousands of files belonging to hundreds of tables into the Data Lake. You will see that as part of the demo later.
  • 21. Role of NiFi in Data Engineering at Scale • Get data from databases into the Data Lake. • Consume data from Kafka topics into the Data Lake. • Get data from app server log files into the Data Lake (using MiNiFi). • Get data from the Data Lake into file servers. • Get data from an on-prem Data Lake into cloud storage such as S3 or ADLS. • Get processed data from the Data Lake into databases or data warehouses.
  • 24. NiFi Demo – Simple Data Pipeline • Build a simple pipeline to get files from the local file system into HDFS.
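
The slides do not include the flow itself, so the following is only a rough Python analogue of what such a pipeline does, not the demo's actual NiFi configuration. It assumes a plain local directory as a stand-in for HDFS, and every name in it (`ingest`, `demo`, `landing`, `hdfs_stand_in`) is hypothetical. The `keep_source` flag is loosely modeled on the idea of either keeping or removing a source file once it has been picked up.

```python
import shutil
import tempfile
from pathlib import Path


def ingest(source_dir: Path, target_dir: Path, keep_source: bool = True) -> list:
    """Copy every regular file in source_dir into target_dir and return the file names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source_dir.iterdir()):
        if not path.is_file():
            continue  # skip sub-directories in this simple sketch
        shutil.copy2(path, target_dir / path.name)
        if not keep_source:
            path.unlink()  # remove the source file after a successful copy
        copied.append(path.name)
    return copied


def demo() -> tuple:
    """Create two sample files, ingest them, and report what landed in the target."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "landing"
        dst = Path(tmp) / "hdfs_stand_in" / "raw"  # assumption: local dir in place of HDFS
        src.mkdir()
        (src / "orders.csv").write_text("id,amount\n1,10\n")
        (src / "customers.csv").write_text("id,name\n1,alice\n")
        copied = ingest(src, dst)
        landed = sorted(p.name for p in dst.iterdir())
        return copied, landed


copied, landed = demo()
print(copied, landed)
```

In a real deployment the target would be an HDFS path reached through a configured Hadoop client, which is the integration step the "NiFi as a service" slide refers to.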