SQLBits: Apache NiFi 101 Introduction and Best Practices

Timothy Spann, Developer Advocate
Apache NiFi 101: Introduction and Best Practices
Tim Spann | Developer Advocate
Question Everything!
Session
Regular 50 minute session
Apache NiFi 101: Introduction and Best Practices
Primary Speaker
Fri 12:00
Feedback Link
https://sqlb.it/?7108
Timothy Spann | Developer Advocate
FLiP(N) Stack = Flink, Pulsar and NiFi Stack
Streaming Systems & Data Architecture Expert
Experience:
15+ years of experience with streaming technologies
including Pulsar, Flink, Spark, NiFi, Kafka, Big Data,
Cloud, ML, IoT and more.
Today, he helps to grow the Pulsar community, sharing
rich technical knowledge and experience both at global
conferences and through individual conversations.
streamnative.io
Passionate and dedicated team.
Founded by the original developers of
Apache Pulsar.
StreamNative helps teams to capture,
manage, and leverage data using Pulsar’s
unified messaging and streaming
platform.
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
[Diagram: Streaming FLiPN Apps. Labels: StreamNative Hub; StreamNative Cloud; Unified Batch and Stream Computing (Batch + Stream); Unified Batch and Stream Storage (Queuing + Streaming); Offload; Tiered Storage; Pulsar with KoP, MoP, and WebSocket; Pulsar Sink; Streaming Edge Gateway; Protocols; Events; CDC; Apps.]
Apache NiFi
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull models
• 350+ processors
• Visual command and control
• Over 100 sources
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
• Version Control
Architecture
https://nifi.apache.org/docs/nifi-docs/html/overview.html
Flow File
https://nifi.apache.org/docs/nifi-docs/html/overview.html
A FlowFile is each event, message, or file that has been
introduced into NiFi: its content plus key/value pairs
of attributes.
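As a rough mental model only (NiFi is written in Java and its real FlowFile API differs), a FlowFile can be pictured as content bytes plus a map of string attributes. The class and helper names below are hypothetical and exist just for illustration; `filename`, `uuid`, and `mime.type` are attribute names NiFi itself uses.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class FlowFileSketch:
    """Illustrative stand-in for a FlowFile: content plus attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)

    def with_attribute(self, key: str, value: str) -> "FlowFileSketch":
        # Return a new object rather than mutating the existing one.
        updated = dict(self.attributes)
        updated[key] = value
        return FlowFileSketch(self.content, updated)


def new_flowfile(content: bytes, filename: str) -> FlowFileSketch:
    # Every new FlowFile gets a filename and a unique identifier.
    return FlowFileSketch(content, {"filename": filename, "uuid": str(uuid.uuid4())})


ff = new_flowfile(b'{"temp": 21.5}', "reading.json")
tagged = ff.with_attribute("mime.type", "application/json")
print(tagged.attributes["filename"], len(tagged.content))
```

The point of the sketch is the split: processors usually route on the small attribute map and only read the content when they have to.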
Processor
https://nifi.apache.org/docs/nifi-docs/html/overview.html
A Java component that runs in NiFi to route, process, or
manipulate data. You can build your own if what you need is not
included in standard NiFi or available elsewhere in open source.
Controller
Shared services such as connection pools and connections, and
processes that ingest or work with outside data.
Connection
https://nifi.apache.org/docs/nifi-docs/html/overview.html
These link together NiFi processors.
Process Groups
Groups of processors. These are versionable and reusable
components/modules.
Provenance
https://www.datainmotion.dev/2021/01/automating-starting-services-in-apache.html
Backpressure & Prioritizers
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
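To make the two ideas concrete, here is a toy Python model of a connection queue with an object-count threshold and a priority ordering. It is a sketch under stated assumptions, not NiFi code: the class name, the default priority of 100, and the `offer`/`poll` methods are hypothetical. In NiFi, a full queue causes the upstream component to stop being scheduled; the sketch models that as `offer` returning `False`. Ordering by a `priority` attribute with the lowest value first mirrors NiFi's PriorityAttributePrioritizer.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class ConnectionSketch:
    """Toy model of a connection: a bounded, prioritized queue."""

    def __init__(self, object_threshold: int) -> None:
        self.object_threshold = object_threshold
        self._heap: List[Tuple[int, int, Dict[str, str]]] = []
        self._arrival = itertools.count()  # tie-breaker keeps FIFO order

    def is_full(self) -> bool:
        return len(self._heap) >= self.object_threshold

    def offer(self, attributes: Dict[str, str]) -> bool:
        # Backpressure: refuse new items once the threshold is reached.
        if self.is_full():
            return False
        priority = int(attributes.get("priority", "100"))  # hypothetical default
        heapq.heappush(self._heap, (priority, next(self._arrival), attributes))
        return True

    def poll(self) -> Optional[Dict[str, str]]:
        # Lowest priority value is handed out first.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = ConnectionSketch(object_threshold=2)
accepted = [
    conn.offer({"id": "a", "priority": "5"}),
    conn.offer({"id": "b", "priority": "1"}),
    conn.offer({"id": "c", "priority": "3"}),  # queue is full by now
]
first = conn.poll()
print(accepted, first["id"])
```

Once an item has been polled, the queue drops below its threshold and accepts work again, which is the "pressure release" half of the behavior.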
Record Processors
https://www.datainmotion.dev/2019/03/advanced-xml-processing-with-apache.html
● XML, CSV, JSON, AVRO and more
● Schemas or Inferred Schemas
● Easily convert between them
● Support SQL with Apache Calcite
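The record idea can be sketched outside NiFi in a few lines of Python: a reader parses input into records with an inferred schema, a filter plays the role of a SQL `WHERE` clause, and a writer serializes the result in another format. This is an illustration only; in NiFi the same thing is done by configuring record readers and writers on processors such as ConvertRecord or QueryRecord. The function names and the sample data are hypothetical.

```python
import csv
import io
import json
from typing import Dict, List, Union

Value = Union[int, float, str]


def infer(value: str) -> Value:
    """Very small type inference: try int, then float, else keep the string."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def read_csv_records(text: str) -> List[Dict[str, Value]]:
    # "Reader": turn CSV text into a list of typed records.
    reader = csv.DictReader(io.StringIO(text))
    return [{key: infer(val) for key, val in row.items()} for row in reader]


def write_json_records(records: List[Dict[str, Value]]) -> str:
    # "Writer": serialize the same records as JSON.
    return json.dumps(records)


csv_text = "sensor,temp\na1,21.5\nb2,19\n"
records = read_csv_records(csv_text)
hot = [r for r in records if r["temp"] > 20]  # analogous to WHERE temp > 20
print(write_json_records(hot))
```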
Caching
https://dev.to/tspannhw/flank-using-apache-kudu-as-a-cache-for-fda-updates-4knj
Listen FTP
Let Apache NiFi be your FTP server
Consume MQTT
This could read from Apache Pulsar - MoP (MQTT on Pulsar)
Apache NiFi Pulsar Connector
https://github.com/david-streamlio/pulsar-nifi-bundle
Apache NiFi Pulsar Connector
https://www.datainmotion.dev/2021/11/producing-and-consuming-pulsar-messages.html
Apache NiFi Pulsar Connector
https://github.com/streamnative/pulsar-nifi-bundle
Apache NiFi Pulsar Connector
https://streamnative.io/apache-nifi-connector/
https://github.com/tspannhw/awesome-nifi-pulsar
https://t.co/TbcYhdUPVn
Metrics, Status, Charts
https://www.clouddataops.dev/data-flow-experience
DevOps on Apache NiFi 1.15.3
Toolkit Setup on Apache NiFi 1.15.3
Download the NiFi Toolkit.
Copy the keystore and truststore settings from your NiFi conf/nifi.properties.
Create a nifi.properties file for cli.sh to use, with entries like these:
baseUrl=https://nvidia-desktop:8443
keystore=/home/nvidia/nvme/nifi-1.15.3/conf/keystore.p12
keystoreType=PKCS12
keystorePasswd=5325343412efaab3123c6892d93
keyPasswd=53134eee99da9dbe9349123aa17c6892d93
truststore=/home/nvidia/nvme/nifi-1.15.3/conf/truststore.p12
truststoreType=PKCS12
truststorePasswd=93498Dfdjfhujdhure8d8hfd84j3n43jd
DevOps
https://www.datainmotion.dev/2021/01/automating-starting-services-in-apache.html
https://nipyapi.readthedocs.io/en/latest/
nifi-toolkit/bin/cli.sh nifi list-param-contexts
nifi-toolkit/bin/cli.sh nifi pg-list
nifi-toolkit/bin/cli.sh nifi pg-set-param-context …
Or start the interactive shell and type commands at its prompt:
nifi-toolkit/bin/cli.sh
nifi pg-list
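For scripted use, the same commands can be driven from Python by shelling out to cli.sh. This is a minimal sketch under assumptions: the `-p` option for pointing the CLI at a properties file should be confirmed against the toolkit's built-in help for your version, and the properties file name and the helper function names are hypothetical. The example only builds and prints the command; `run_cli` is there for when a NiFi instance is actually reachable.

```python
import shlex
import subprocess
from typing import List, Optional

CLI = "nifi-toolkit/bin/cli.sh"  # path as used in the slides


def build_cli_command(command: str,
                      properties_file: Optional[str] = None,
                      extra_args: Optional[List[str]] = None) -> List[str]:
    """Build the argument list for a `cli.sh nifi <command>` call."""
    args = [CLI, "nifi", command]
    if properties_file:
        # Assumption: -p loads baseUrl, keystore, and truststore settings from a file.
        args += ["-p", properties_file]
    args += extra_args or []
    return args


def run_cli(args: List[str]) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical properties file name; nothing is executed here.
cmd = build_cli_command("pg-list", properties_file="nifi-cli.properties")
print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when a process group ID or parameter value contains spaces.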
DevOps
https://dev.to/tspannhw/automating-starting-services-in-apache-nifi-and-applying-parameters-5h4n
https://github.com/tspannhw/ApacheConAtHome2020/blob/main/scripts/setupnifi.sh
nifi pg-list
nifi pg-status
nifi pg-get-services
nifi pg-enable-services -u https://nvidia-desktop:8443 --processGroupId root
nifi pg-start -u http://edge2ai-1.dim.local:8080 -pgid LOOKTHISUP
nifi list-param-contexts -u https://nvidia-desktop:8443 -verbose
nifi create-reporting-task -u https://nvidia-desktop:8443 -verbose -i
All Data - Anytime - Anywhere - Multi-Cloud - Multi-Protocol
[Diagram labels: Multi-ingest (two flows), Multi-ingest Merge, Priority]
Apache Pulsar
Apache Pulsar is a Cloud-Native
Messaging and Event-Streaming Platform.
Why Apache Pulsar?
• Unified Messaging Platform
• Guaranteed Message Delivery
• Resiliency
• Infinite Scalability
Connectivity
• Functions - Lightweight Stream Processing (Java, Python, Go)
• Connectors - Sources & Sinks (Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL, NiFi
• Data Offloaders - Tiered Storage - (S3)
hub.streamnative.io
Serverless Event Streaming Framework
• Lightweight computation similar to AWS Lambda.
• Specifically designed to use Apache Pulsar as a message bus.
• Function runtime can be located within Pulsar Broker.
• Java, Go, Python
https://streamnative.io/blog/engineering/2021-11-10-streaming-data-pipelines-with-pulsar-io/
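To give a feel for how lightweight such a function can be, here is a sketch in the plain-function style that Pulsar's Python Functions accept, where a module exposes `process(input)` and the return value is published to the output topic. The payload shape (`temp_c`, `temp_f`) is a hypothetical example, and deployment details are left out; treat this as an illustration rather than a reference implementation.

```python
import json


def process(input):
    """Enrich a JSON sensor reading with a Fahrenheit temperature."""
    reading = json.loads(input)
    # Hypothetical field names, chosen only for this example.
    reading["temp_f"] = reading["temp_c"] * 9 / 5 + 32
    return json.dumps(reading)


# Local check without a Pulsar cluster: call the function directly.
result = process('{"temp_c": 20}')
print(result)
```

Because the function is plain Python with no broker dependency, it can be unit tested locally before it is deployed.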
Pulsar SQL
Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel.
[Diagram: Producer and Consumer; Broker 1 (Topic1-Part1), Broker 2 (Topic1-Part2), Broker 3 (Topic1-Part3); Segments 1, 2, 3, 4, ... X spread across Bookies 1-3; a Query Coordinator, Topic Metadata, and multiple SQL Workers.]
● https://www.datainmotion.dev/2021/11/producing-and-consuming-pulsar-messages.html
● https://www.datainmotion.dev/2020/06/no-more-spaghetti-flows.html
● https://github.com/tspannhw/EverythingApacheNiFi
● https://www.datainmotion.dev/2019/03/apache-nifi-101.html
● https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
● https://pierrevillard.com/best-of-nifi/
● https://blogs.apache.org/nifi/
● https://www.nifi.rocks/documents/nifi-expression-language-cheat-sheet.pdf
● https://dev.to/tspannhw/new-features-of-apache-nifi-1-13-0-45ln
● https://dev.to/tspannhw/tracking-satellites-with-apache-nifi-44o7
● https://www.datainmotion.dev/2021/01/flank-using-apache-kudu-as-cache-for.html
● https://www.datainmotion.dev/2020/12/basic-understanding-of-cloudera-flow.html
● https://bryanbende.com/development/2021/11/10/apache-nifi-stateless
Deeper Content
Now Available: On-Demand Pulsar Training
Academy.StreamNative.io