Looking at the New Features of
Apache NiFi
Timothy Spann
Principal Developer Advocate
Sunday October 8, 2023
4:10PM - 4:50 PM
Room 102
Slides, Code, Articles and More…
FLaNK Stack
Tim Spann
@PaasDev // Blog: www.datainmotion.dev
Principal Developer Advocate.
Princeton Future of Data Meetup.
ex-Pivotal, ex-Hortonworks, ex-StreamNative, ex-PwC
https://medium.com/@tspann
https://github.com/tspannhw
Apache NiFi x Apache Kafka x Apache Flink
© 2023 Cloudera, Inc. All rights reserved.
Future of Data - New York + Princeton + Virtual
@PaasDev
https://www.meetup.com/futureofdata-princeton/
https://www.meetup.com/futureofdata-newyork/
From Big Data to AI to Streaming to Containers to
Cloud to Analytics to Cloud Storage to Fast Data to
Machine Learning to Microservices to ...
FLaNK Stack Weekly
This week in Apache NiFi, Apache Flink, Apache
Kafka, Apache Spark, Apache Iceberg, Python, Java,
AI, ML, LLM and Open Source friends.
https://bit.ly/32dAJft
My Talk List
Utilizing Real-Time Transit Data for Travel Optimization
Let’s Monitor the Conditions at the Conference
Agenda
Apache NiFi has gained a lot of new features, processors and best practices
in the last year or so.
I will walk through building flows using the latest tips, techniques and processors.
I will build and change a number of data flows using the latest NiFi version and point out
gotchas and some never-dos. The deck will act as a takeaway with notes, tips and
guides to what we covered.
===> Any NiFi 1.23+ and 2.0 in-progress features people want to see?
Records
New ExcelReader (Excel record reader)
AmazonGlueSchemaRegistry
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12353320
New Processors in 2023
GenerateRecord
GetAsanaObject
PutSalesforceObject
QuerySalesforceObject
PutIoTDBRecord
QueryIoTDBRecord
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12353320
ListGoogleDrive
FetchGoogleDrive
PutGoogleDrive
PutBoxFile
ListBoxFile
FetchBoxFile
PutDropbox
DecryptContent
DecryptContentCompatibility
New Processors in 2023
ExtractRecordSchema
RemoveRecordField
VerifyContentMAC
TriggerHiveMetaStoreEvent
“count” function added to RecordPath
AWS ML Service Processors
https://github.com/tspannhw/FLaNK-AWSML
AWS Translate
Deprecating for Removal
Deprecate Lua and Ruby Script Engines
Deprecate ECMAScript Script Engine
Deprecate the Ambari Reporting Task
Deprecate Kafka 1.x components and 2.0 components
XML Templates
Variables
See:
https://cwiki.apache.org/confluence/display/NIFI/Deprecated+Components+and+Features
Start Using
ExecuteStateless -> run your stateless flows right in a regular NiFi cluster
Parameters
JSON Flow Serialization
Records everywhere
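
Moving from the deprecated Variables to Parameters mostly changes how a property refers to the value: variables are read through Expression Language as `${name}`, while parameters are referenced as `#{name}`. A minimal Python sketch of that rewrite, assuming you already know which variable names you are migrating. The helper `to_parameter_refs` and the sample names are hypothetical, and it only handles simple `${name}` references, not Expression Language function chains:

```python
import re


def to_parameter_refs(value: str, variable_names: set[str]) -> str:
    """Rewrite simple ${name} references to #{name} for known variables only."""

    def swap(match: re.Match) -> str:
        name = match.group(1)
        # Anything that is not a migrated variable (e.g. a FlowFile
        # attribute such as ${filename}) is left untouched.
        return "#{" + name + "}" if name in variable_names else match.group(0)

    # Only matches ${name} with no ':' inside, so function chains
    # like ${x:toUpper()} are skipped and need manual review.
    return re.sub(r"\$\{([A-Za-z0-9._\- ]+)\}", swap, value)


# Hypothetical example: 'kafka.brokers' was a variable, 'filename' is an attribute.
migrated = {"kafka.brokers"}
print(to_parameter_refs("${kafka.brokers}", migrated))
print(to_parameter_refs("/data/${filename}", migrated))
```

Restricting the rewrite to an explicit set of names is deliberate: `${...}` is also how FlowFile attributes are read, and those must stay as they are.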
https://medium.com/cloudera-inc/getting-ready-for-apache-nifi-2-0-5a5e6a67f450
NiFi 2.0 Coming
● Python Integration
● Parameters
● JDK 17, maybe JDK 21+
● JSON Flow Serialization
● Rules Engine for Development Assistance
● Run Process Group as Stateless
● flow.json.gz
https://cwiki.apache.org/confluence/display/NIFI/NiFi+2.0+Release+Goals
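
Since the flow is serialized as gzipped JSON (`flow.json.gz`), it can be inspected with ordinary tooling. A rough Python sketch, assuming the file has a top-level `rootGroup` object and that each group carries `processors` and nested `processGroups` lists. Those key names are an assumption about the layout and worth checking against your own file, and `count_processor_types` and `load_flow` are hypothetical helpers:

```python
import gzip
import json
from collections import Counter


def count_processor_types(group: dict, counts: Counter = None) -> Counter:
    """Walk a process group recursively and tally processors by type."""
    counts = Counter() if counts is None else counts
    for proc in group.get("processors", []):
        counts[proc.get("type", "unknown")] += 1
    for child in group.get("processGroups", []):
        count_processor_types(child, counts)
    return counts


def load_flow(path: str) -> dict:
    """Read a gzipped JSON flow definition from disk."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


# Usage (the path is an example; adjust it to your install):
# flow = load_flow("conf/flow.json.gz")
# print(count_processor_types(flow.get("rootGroup", {})).most_common(5))
```

A tally like this is one quick way to spot components from the deprecation list before an upgrade.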
Thanks to Pierre!
Python as First Class (NIFI-11241)
Graphical UI with custom Python-based extensions
NEW in NiFi 2.0
Apache NiFi in a few numbers
A very active project with a dynamic community & comparison with ACEU 2019
2800+ members on the Slack channel (535+ - 4 years ago)
475+ contributors on GitHub across the repositories (260+ - 4 years ago)
65 committers in the Apache NiFi community (45 - 4 years ago)
Apache NiFi 1.23.2 is the latest release, NiFi 2.0 coming soon (NiFi 1.10 - 4 years ago)
14M+ Docker pulls of the Apache NiFi image (1M+ - 4 years ago)
NiFi Deploy Options from Open Source to Managed
Cloudera Edge Flow Manager (Command & Control of MiNiFi Agents)
MiNiFi C++ (small footprint)
MiNiFi Java (headless version of NiFi)
NiFi Registry
Cloudera NiFi for Kafka Connect
NiFi in Cloudera DataFlow Functions
Cloudera DataFlow
Stateless NiFi
NiFi 2.0 is coming…
https://medium.com/cloudera-inc/getting-ready-for-apache-nifi-2-0-5a5e6a67f450
- First-class citizen Python API
- Rules Engine
- NiFi Stateless at Process Group level
- Java 21 (virtual threads, performance improvements, etc.)
https://medium.com/@george.vetticaden/accelerating-ai-data-pipelines-building-an-evernote-chatbot-with-apache-nifi-2-0-and-generative-ai-9d977466ff4c
Closing the gap between data engineers and data scientists…
- Export documentation (SharePoint, OCR) to build the knowledge base powering your chatbot
- Scrape the internet (Sitemap) to build the knowledge base powering your chatbot
- Real-time streaming ingest of Slack to build the knowledge base powering your chatbot
DEMO
THANK YOU
