DevFest UK & Ireland: Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp (2022)

Timothy Spann
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Tim Spann | Developer Advocate
Tim Spann
Developer Advocate
● https://www.datainmotion.dev/
● https://github.com/tspannhw/SpeakerProfile
● https://dev.to/tspannhw
● https://sessionize.com/tspann/
DZone Zone Leader and Big Data MVB
Data DJay
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
streamnative.io
● Founded by the original developers of
Apache Pulsar.
● Passionate and dedicated team.
● StreamNative helps teams to capture,
manage, and leverage data using
Pulsar’s unified messaging and
streaming platform.
The Need For Real-Time Data
● Hybrid and multi-cloud strategies with native geo-replication
● Seamlessly build microservice architectures with support for streaming and messaging workloads
● Built for Kubernetes: cloud-native migrations with tools
● 360 degree customer data: multi-tenancy, infinite retention, and extensive connector ecosystem
[Architecture diagram. Labels: Events <-> Streaming FLiPN Apps; StreamNative Hub; StreamNative Cloud; Unified Batch and Stream Computing (Batch + Stream); Unified Batch and Stream Storage (Queuing + Streaming) with offload to Tiered Storage; Protocols: Pulsar, KoP, MoP, WebSocket; Pulsar Sink; Streaming Edge Gateway; CDC; Apps]
Apache Pulsar
Apache Pulsar
● Serverless computing framework.
● Unbounded storage, multi-tiered
architecture, and tiered-storage.
● Streaming & Pub/Sub messaging
semantics.
● Multi-protocol support
Why Apache Pulsar?
● Unified Messaging Platform
● Guaranteed Message Delivery
● Resiliency
● Infinite Scalability
Messages - the basic unit of Pulsar
Component: Description
Value / data payload: The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
Key: Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
Properties: An optional key/value map of user-defined properties.
Producer name: The name of the producer that produced the message. If you do not specify a producer name, a default name is used. Used for message de-duplication.
Sequence ID: Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Used for message de-duplication.
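As a rough illustration of the fields above, the following sketch models a message as a plain Python dataclass and shows how a producer name plus a sequence ID could drive de-duplication. It is a toy model, not the Pulsar client API: the `Message` and `Deduplicator` names, the sample payload, and the "keep only sequence IDs newer than the last one seen per producer" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Message:
    """Toy stand-in for the message fields listed above (not the Pulsar API)."""
    value: bytes                      # raw payload bytes
    producer_name: str                # name of the producer that sent it
    sequence_id: int                  # position in that producer's sequence
    key: Optional[str] = None         # optional key (partitioning, compaction)
    properties: Dict[str, str] = field(default_factory=dict)


class Deduplicator:
    """Rejects a message whose sequence ID is not newer than the last one
    accepted from the same producer (an assumed, simplified rule)."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def accept(self, msg: Message) -> bool:
        last = self._last_seen.get(msg.producer_name, -1)
        if msg.sequence_id <= last:
            return False              # duplicate or replayed message
        self._last_seen[msg.producer_name] = msg.sequence_id
        return True


dedup = Deduplicator()
first = Message(b'{"temp": 21.5}', "sensor-producer", 0, key="device-1")
retry = Message(b'{"temp": 21.5}', "sensor-producer", 0, key="device-1")
results = [dedup.accept(first), dedup.accept(retry)]
print(results)
```

The retry carries the same producer name and sequence ID as the first send, so the sketch accepts the first message and rejects the second.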
Connectivity
• Functions - Lightweight Stream
Processing (Java, Python, Go)
• Connectors - Sources & Sinks
(Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP
(Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark,
Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage - (S3)
hub.streamnative.io
Pulsar’s Publish-Subscribe model
[Diagram: Producer 1 and Producer 2 publish to a Topic on the Broker; a Subscription delivers messages to Consumer 1, Consumer 2, and Consumer 3.]
● Producers send messages.
● Topics are an ordered, named channel that
producers use to transmit messages to
subscribed consumers.
● Messages belong to a topic and contain an
arbitrary payload.
● Brokers handle connections and route
messages between producers / consumers.
● Subscriptions are named configuration
rules that determine how messages are
delivered to consumers.
● Consumers receive messages.
Topics
[Diagram: topic hierarchy within a Pulsar Instance / Pulsar Cluster. Tenants: Compliance, Data Services, Marketing. Namespaces: Microservices, ETL, Campaigns, Risk Assessment. Topics: Cust Auth, Location Resolution, Demographics, Budgeted Spend, Acct History, Risk Detection.]
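The tenant / namespace / topic hierarchy shows up directly in topic names, which take the form `persistent://tenant/namespace/topic` (or `non-persistent://...`). The sketch below parses such a name with the standard library only. The `parse_topic` helper and the `TopicName` type are hypothetical, and the example name `marketing/campaigns/budgeted-spend` is an assumed combination of labels from the slide, not a mapping the slide confirms.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split a topic name into domain, tenant, namespace, and topic."""
    domain, sep, rest = name.partition("://")
    if not sep:
        # Short names resolve to the default tenant and namespace.
        return TopicName("persistent", "public", "default", name)
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic domain: {domain!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got {rest!r}")
    return TopicName(domain, *parts)


parsed = parse_topic("persistent://marketing/campaigns/budgeted-spend")
print(parsed)
```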
Pulsar subscription modes
Different subscription modes have
different semantics:
Exclusive/Failover - guaranteed
order, single active consumer
Shared - multiple active consumers,
no order
Key_Shared - multiple active
consumers, order for given key
[Diagram: Producer 1 and Producer 2 publish the keyed messages <K1,V10>, <K1,V11>, <K1,V12>, <K2,V20>, <K2,V21>, <K2,V22> to a Pulsar topic with four subscriptions.
Subscription A (Exclusive): Consumer A.
Subscription B (Failover): Consumer B-1, with Consumer B-2 taking over in case of failure in Consumer B-1.
Subscription C (Shared): Consumer C-1 receives <K1,V10>, <K2,V21>, <K1,V12>; Consumer C-2 receives <K2,V20>, <K1,V11>, <K2,V22>.
Subscription D (Key_Shared): one of Consumer D-1 and Consumer D-2 receives all K1 messages, the other all K2 messages.]
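The difference between the subscription modes can be sketched with a small in-memory simulation. This is a simplified model rather than broker behavior: the `dispatch` function is hypothetical, Shared is modeled as plain round-robin, and Key_Shared uses a CRC32 hash as a stand-in for whatever key hashing the broker actually applies.

```python
import zlib
from collections import defaultdict
from itertools import cycle
from typing import Dict, List, Tuple

Message = Tuple[str, int]  # (key, value)

MESSAGES: List[Message] = [
    ("K1", 10), ("K1", 11), ("K1", 12), ("K2", 20), ("K2", 21), ("K2", 22),
]


def dispatch(mode: str, consumers: List[str],
             messages: List[Message]) -> Dict[str, List[Message]]:
    """Assign messages to consumers under a simplified model of each mode."""
    delivered: Dict[str, List[Message]] = defaultdict(list)
    if mode in ("exclusive", "failover"):
        # One active consumer (the first in the list) sees every message in
        # order. Failover to a standby consumer is not simulated here.
        for msg in messages:
            delivered[consumers[0]].append(msg)
    elif mode == "shared":
        # Round-robin across consumers; ordering per key is not preserved.
        for consumer, msg in zip(cycle(consumers), messages):
            delivered[consumer].append(msg)
    elif mode == "key_shared":
        # A stable hash of the key picks the consumer, so all messages for
        # one key reach the same consumer in order.
        for msg in messages:
            index = zlib.crc32(msg[0].encode()) % len(consumers)
            delivered[consumers[index]].append(msg)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return dict(delivered)


for mode, consumers in [("exclusive", ["A"]),
                        ("shared", ["C-1", "C-2"]),
                        ("key_shared", ["D-1", "D-2"])]:
    print(mode, dispatch(mode, consumers, MESSAGES))
```

In the shared case each consumer ends up with a mix of K1 and K2 messages, while in the key-shared case every message for a given key lands on a single consumer.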
What are Pulsar Functions?
• Lambda-style functions that
use Pulsar as the message bus.
• The framework handles producer/consumer
setup.
• It applies user-supplied business
logic to each consumed
message.
Benefits of Pulsar Functions
• Allow you to focus on the business
logic.
• Eliminates boilerplate code.
• Handles message consumption and
publication
• No need for another processing
framework.
• Can be scaled up independently
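To make the "business logic only" point concrete, here is a minimal sketch of a function written in the language-native Python style, where a single `process` function receives the consumed message value and returns the value to publish. The JSON field names (`device`, `temp_c`, `temp_f`) and the sample reading are assumptions for illustration; deployment settings such as input and output topics are configured outside the code and are not shown.

```python
import json


def process(input):
    """Enrich a JSON sensor reading with a Fahrenheit temperature.

    There is no producer or consumer code here: the function only
    transforms the value it is given and returns the result.
    """
    reading = json.loads(input)
    reading["temp_f"] = round(reading["temp_c"] * 9 / 5 + 32, 1)
    return json.dumps(reading)


# Local check without a Pulsar cluster: call the function directly.
output = process('{"device": "rp4", "temp_c": 21.5}')
print(output)
```

Because the logic is an ordinary function, it can be unit tested locally before it is deployed.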
Use cases
● Unified Messaging Platform
● AdTech
● Fraud Detection
● Connected Car
● IoT Analytics
● Microservices Development
Apache NiFi
Why Apache NiFi?
• Guaranteed delivery
• Data buffering
- Backpressure
- Pressure release
• Prioritized queuing
• Flow specific QoS
- Latency vs. throughput
- Loss tolerance
• Data provenance
• Supports push and pull
models
• Hundreds of processors
• Visual command and
control
• Over sixty sources
• Flow templates
• Pluggable/multi-role
security
• Designed for extension
• Clustering
• Version Control
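The buffering, back pressure, and prioritized queuing items above can be pictured with a toy queue between two processors. This sketch is not NiFi code: the `Connection` class, its `object_threshold` parameter, the file names, and the convention that a lower number means higher priority are assumptions chosen for illustration.

```python
import heapq
from typing import List, Optional, Tuple


class Connection:
    """Toy queue between two processors with a back-pressure threshold."""

    def __init__(self, object_threshold: int) -> None:
        self.object_threshold = object_threshold
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = 0  # tie-breaker keeps FIFO order within a priority

    def offer(self, flowfile: str, priority: int) -> bool:
        """Queue a flowfile; return False when back pressure applies."""
        if len(self._heap) >= self.object_threshold:
            return False  # the upstream side should stop sending for now
        heapq.heappush(self._heap, (priority, self._counter, flowfile))
        self._counter += 1
        return True

    def poll(self) -> Optional[str]:
        """Hand the highest-priority (lowest number) flowfile downstream."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(object_threshold=2)
accepted = [
    conn.offer("bulk.csv", priority=5),
    conn.offer("alert.json", priority=1),
    conn.offer("late.csv", priority=5),   # queue is full, so this is refused
]
first_out = conn.poll()
print(accepted, first_out)
```

Once the downstream side drains the queue below the threshold, `offer` succeeds again, which is the "pressure release" half of the picture.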
Architecture
https://nifi.apache.org/docs/nifi-docs/html/overview.html
Provenance
https://www.datainmotion.dev/2021/01/automating-starting-services-in-apache.html
Backpressure & Prioritizers
https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html
Record Processors
https://www.datainmotion.dev/2019/03/advanced-xml-processing-with-apache.html
● XML, CSV, JSON, AVRO and more
● Schemas or Inferred Schemas
● Easily convert between them
● Support SQL with Apache Calcite
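As a loose analogy for what a record reader and writer pair does, the sketch below reads CSV, infers simple field types, filters the records, and writes JSON, using only the Python standard library. It is not how NiFi implements record processing; the `infer` and `csv_to_records` helpers and the sample data are hypothetical, and the SQL in the comment only indicates the kind of query a SQL-based record processor would express.

```python
import csv
import io
import json
from typing import Any, Dict, List


def infer(value: str) -> Any:
    """Infer int, then float, else keep the string (a deliberately naive rule)."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def csv_to_records(text: str) -> List[Dict[str, Any]]:
    """Parse CSV text into a list of records with inferred field types."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: infer(cell) for name, cell in row.items()} for row in reader]


CSV_TEXT = "device,temp_c,status\nrp4,21.5,ok\nnano,48,hot\n"
records = csv_to_records(CSV_TEXT)

# Rough equivalent of: SELECT * FROM FLOWFILE WHERE temp_c > 30
hot = [r for r in records if r["temp_c"] > 30]
print(json.dumps(hot))
```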
Consume MQTT
This could read from Apache Pulsar - MoP (MQTT on Pulsar)
Apache NiFi Pulsar Connector
https://github.com/david-streamlio/pulsar-nifi-bundle
https://www.datainmotion.dev/2021/11/producing-and-consuming-pulsar-messages.html
https://github.com/david-streamlio/pulsar-nifi-bundle/releases/tag/v1.14.0
StreamNative
Cloud
Powered by Apache Pulsar, StreamNative provides a cloud-native,
real-time messaging and streaming platform to support multi-cloud
and hybrid cloud strategies.
Built for Containers
Cloud Native
StreamNative Cloud
Flink SQL
StreamNative
Ambassador Program
2022
Learn More Start Survey
Tell us about your Pulsar experience
and what improvements you would
like to see!
Now Available
On-Demand Pulsar
Training
Academy.StreamNative.io
Live 3-day
Developers Training
Times:
● Europe: 3:00 PM CET - 7:00 PM CET
● Eastern Time: 9:00 AM - 1:00 PM EST
● Pacific Time: 6:00 AM - 10:00 AM PST
Save Your Spot!
Feb 15-17
Let’s Keep
in Touch!
Tim Spann
Developer Advocate
@PaaSDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
MQTT on Pulsar (MoP)
Kafka-on-Pulsar (KoP)
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-ramp 2022

  • 1. Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp Tim Spann | Developer Advocate
  • 2. Tim Spann Developer Advocate ● https://www.datainmotion.dev/ ● https://github.com/tspannhw/SpeakerProfile ● https://dev.to/tspannhw ● https://sessionize.com/tspann/ DZone Zone Leader and Big Data MVB Data DJay
  • 3. FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends. https://bit.ly/32dAJft
  • 4. streamnative.io ● Founded by the original developers of Apache Pulsar. ● Passionate and dedicated team. ● StreamNative helps teams to capture, manage, and leverage data using Pulsar’s unified messaging and streaming platform.
  • 5. The Need For Real-Time Data ● Hybrid and multi-cloud strategies with native geo-replication ● Seamlessly build microservice architectures with support for streaming and messaging workloads ● Built for Kubernetes; cloud-native migrations with tools ● 360-degree customer data; multi-tenancy, infinite retention, and extensive connector ecosystem
  • 6. [Architecture diagram: events, CDC, apps, and edge gateways feed streaming FLiPN apps; StreamNative Hub and StreamNative Cloud; unified batch and stream computing (batch + stream); unified batch and stream storage (queuing + streaming) with offload to tiered storage; protocols: Pulsar, KoP, MoP, WebSocket; Pulsar sink.]
  • 8. Apache Pulsar ● Serverless computing framework. ● Unbounded storage, multi-tiered architecture, and tiered-storage. ● Streaming & Pub/Sub messaging semantics. ● Multi-protocol support
  • 9. Why Apache Pulsar? ● Unified Messaging Platform ● Guaranteed Message Delivery ● Resiliency ● Infinite Scalability
  • 10. Messages - the basic unit of Pulsar. Component: Description ● Value / data payload: The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. ● Key: Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction. ● Properties: An optional key/value map of user-defined properties. ● Producer name: The name of the producer that produces the message. If you do not specify a producer name, the default name is used. Used in message de-duplication. ● Sequence ID: Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Used in message de-duplication.
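The message components on slide 10 can be modelled with a small sketch. This is an illustrative stand-in, not the Pulsar client library's message class: the `Message` and `Deduplicator` names are hypothetical, and the de-duplication rule shown (drop a message whose sequence ID is not higher than the last one accepted from the same producer) is a simplified assumption about how producer name and sequence ID work together.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Message:
    """Illustrative stand-in for a Pulsar message (hypothetical class)."""
    value: bytes                       # raw payload bytes
    producer_name: str                 # name of the producing client
    sequence_id: int                   # position in the producer's ordered sequence
    key: Optional[str] = None          # optional key for partitioning / compaction
    properties: Dict[str, str] = field(default_factory=dict)  # user-defined metadata


class Deduplicator:
    """Accepts a message only if its sequence ID is new for its producer."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def accept(self, msg: Message) -> bool:
        last = self._last_seen.get(msg.producer_name, -1)
        if msg.sequence_id <= last:
            return False  # duplicate, for example a retried send
        self._last_seen[msg.producer_name] = msg.sequence_id
        return True


dedup = Deduplicator()
sent = [
    Message(b"temp=21", "sensor-producer", 0, key="device-1"),
    Message(b"temp=22", "sensor-producer", 1, key="device-1"),
    Message(b"temp=22", "sensor-producer", 1, key="device-1"),  # simulated retry
]
accepted = [m for m in sent if dedup.accept(m)]
```

Here the third message repeats sequence ID 1 from the same producer, so only the first two end up in `accepted`.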
  • 11. Connectivity • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3) hub.streamnative.io
  • 12. Pulsar’s Publish-Subscribe model [Diagram: Producer 1 and Producer 2, Topic, Broker, Subscription, Consumers 1, 2, and 3.] ● Producers send messages. ● A topic is an ordered, named channel that producers use to transmit messages to subscribed consumers. ● Messages belong to a topic and contain an arbitrary payload. ● Brokers handle connections and route messages between producers and consumers. ● Subscriptions are named configuration rules that determine how messages are delivered to consumers. ● Consumers receive messages.
  • 13. Topics [Diagram: a Pulsar instance and Pulsar cluster holding tenants, namespaces, and topics. Tenants: Compliance, Data Services, Marketing. Namespaces: Microservices, ETL, Campaigns, Risk Assessment. Topics: Cust Auth, Location Resolution, Demographics, Budgeted Spend, Acct History, Risk Detection.]
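The tenant / namespace / topic hierarchy on slide 13 shows up directly in fully qualified Pulsar topic names, which take the form `persistent://tenant/namespace/topic` (or `non-persistent://...`). The sketch below builds and parses such names. The `TopicName` and `parse_topic` helpers are hypothetical, and the `marketing/campaigns/budgeted-spend` pairing is only an illustrative guess at how the slide's labels fit together. Short topic names, which Pulsar expands with defaults, are deliberately not handled.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    """Parts of a fully qualified Pulsar topic name (hypothetical helper)."""
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str

    def __str__(self) -> str:
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.topic}"


def parse_topic(name: str) -> TopicName:
    """Split 'domain://tenant/namespace/topic' into its four parts."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"not a fully qualified topic name: {name!r}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in: {name!r}")
    return TopicName(domain, *parts)


# Illustrative name only; the tenant/namespace/topic pairing is assumed.
example = parse_topic("persistent://marketing/campaigns/budgeted-spend")
```

Because the tenant and namespace are part of every topic name, policies such as retention or access control can be scoped at those levels rather than per topic.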
  • 14. Pulsar subscription modes. Different subscription modes have different semantics: ● Exclusive/Failover - guaranteed order, single active consumer ● Shared - multiple active consumers, no order ● Key_Shared - multiple active consumers, order for given key [Diagram: Producer 1 and Producer 2 publish to a Pulsar topic. Subscription A (Exclusive): Consumer A. Subscription B (Failover): Consumer B-1 and Consumer B-2, the latter used in case of failure in Consumer B-1. Subscription C (Shared): Consumer C-1 and Consumer C-2, with <K1,V10>, <K2,V21>, <K1,V12> and <K2,V20>, <K1,V11>, <K2,V22>. Subscription D (Key_Shared): Consumer D-1 and Consumer D-2, with <K1,V10>, <K1,V11>, <K1,V12> and <K2,V20>, <K2,V21>, <K2,V22>.]
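The delivery semantics on slide 14 can be simulated in a few lines. This is a toy model under stated assumptions, not Pulsar's dispatcher: Shared is modelled as plain round-robin, and Key_Shared as a CRC32 hash of the key modulo the number of consumers (Pulsar itself assigns hash ranges to consumers). Failover is omitted; while its active consumer is healthy it behaves like Exclusive. All function names are hypothetical.

```python
import zlib
from itertools import cycle
from typing import Dict, List, Sequence, Tuple

Message = Tuple[str, str]  # (key, value)


def dispatch_exclusive(messages: Sequence[Message], consumer: str) -> Dict[str, List[Message]]:
    """Exclusive: one consumer receives everything, in order."""
    return {consumer: list(messages)}


def dispatch_shared(messages: Sequence[Message], consumers: Sequence[str]) -> Dict[str, List[Message]]:
    """Shared: messages are spread round-robin, so per-key order is not guaranteed."""
    out: Dict[str, List[Message]] = {c: [] for c in consumers}
    for msg, consumer in zip(messages, cycle(consumers)):
        out[consumer].append(msg)
    return out


def dispatch_key_shared(messages: Sequence[Message], consumers: Sequence[str]) -> Dict[str, List[Message]]:
    """Key_Shared: every message with a given key goes to the same consumer."""
    out: Dict[str, List[Message]] = {c: [] for c in consumers}
    for key, value in messages:
        index = zlib.crc32(key.encode("utf-8")) % len(consumers)  # stand-in hash
        out[consumers[index]].append((key, value))
    return out


messages = [("K1", "V10"), ("K1", "V11"), ("K2", "V20"),
            ("K1", "V12"), ("K2", "V21"), ("K2", "V22")]
shared = dispatch_shared(messages, ["C-1", "C-2"])
key_shared = dispatch_key_shared(messages, ["D-1", "D-2"])
```

In `shared`, both consumers see a mix of keys, while in `key_shared` each key is pinned to a single consumer and its values arrive in publish order. With only two keys, both may hash to the same consumer.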
  • 15. What are Pulsar Functions? • Lambda-style functions that use Pulsar as the message bus. • They handle producer/consumer setup. • They apply user-supplied business logic to each consumed message.
  • 16. Benefits of Pulsar Functions • Let you focus on the business logic. • Eliminate boilerplate code. • Handle message consumption and publication. • No need for another processing framework. • Can be scaled up independently.
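The division of labor described on slides 15 and 16 can be illustrated with a minimal sketch. It assumes the "native" Python form of a Pulsar Function, in which the runtime calls a module-level `process(input)` for each consumed message and publishes the return value to the output topic. The JSON field names (`temperature`, `alert`) and the threshold of 30 are hypothetical and not from the talk.

```python
import json


def process(input):
    """Business logic only: flag readings above a temperature threshold.

    No producer, consumer, or topic code appears here; in a deployed
    function the runtime is assumed to supply the input and publish the result.
    """
    reading = json.loads(input)
    reading["alert"] = reading.get("temperature", 0) > 30  # hypothetical rule
    return json.dumps(reading)


# Local check without a Pulsar cluster: call the function directly.
result = process('{"device": "d1", "temperature": 35}')
```

Because the function is plain Python, it can be unit-tested locally before it is deployed and wired to input and output topics.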
  • 17. Use cases ● Unified Messaging Platform ● AdTech ● Fraud Detection ● Connected Car ● IoT Analytics ● Microservices Development
  • 19. Why Apache NiFi? • Guaranteed delivery • Data buffering - Backpressure - Pressure release • Prioritized queuing • Flow specific QoS - Latency vs. throughput - Loss tolerance • Data provenance • Supports push and pull models • Hundreds of processors • Visual command and control • Over sixty sources • Flow templates • Pluggable/multi-role security • Designed for extension • Clustering • Version Control
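Two of the items on slide 19, back pressure and prioritized queuing, can be sketched as a toy queue between two processors. This is a conceptual illustration only, not NiFi's implementation: the `Connection` class, its `object_threshold` parameter, and the numeric priority scheme are hypothetical, and the hard cut-off at the threshold is a simplification.

```python
import heapq
import itertools
from typing import List, Optional, Tuple


class Connection:
    """Toy queue between two processors (hypothetical, not NiFi code)."""

    def __init__(self, object_threshold: int) -> None:
        self.object_threshold = object_threshold
        self._heap: List[Tuple[int, int, str]] = []
        self._arrival = itertools.count()  # tie-breaker: FIFO within a priority

    def back_pressure_engaged(self) -> bool:
        """True once the queue holds as many items as the threshold allows."""
        return len(self._heap) >= self.object_threshold

    def offer(self, flowfile: str, priority: int = 5) -> bool:
        """Enqueue unless back pressure is engaged; lower number = higher priority."""
        if self.back_pressure_engaged():
            return False  # the upstream processor should pause and retry later
        heapq.heappush(self._heap, (priority, next(self._arrival), flowfile))
        return True

    def poll(self) -> Optional[str]:
        """Return the highest-priority item, or None when the queue is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(object_threshold=2)
first = conn.offer("routine.csv", priority=5)
second = conn.offer("alarm.json", priority=1)
third = conn.offer("late.xml")   # rejected: the queue is at its threshold
next_out = conn.poll()           # the priority-1 item leaves first
```

Once `poll` frees a slot, `offer` succeeds again, which is the "pressure release" half of the bullet.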
  • 23. Record Processors https://www.datainmotion.dev/2019/03/advanced-xml-processing-with-apache.html ● XML, CSV, JSON, AVRO and more ● Schemas or Inferred Schemas ● Easily convert between them ● Support SQL with Apache Calcite
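The "inferred schemas" and "easily convert between them" points on slide 23 can be illustrated outside NiFi with a small standard-library sketch. In NiFi this kind of conversion is typically configured with a record reader and a record writer rather than coded by hand. The functions below are hypothetical, and the three-type inference (int, double, string) is a deliberate simplification.

```python
import csv
import io
import json
from typing import Dict, List, Tuple


def infer_type(values: List[str]) -> str:
    """Pick the narrowest of int, double, string that fits every value."""
    for caster, name in ((int, "int"), (float, "double")):
        try:
            for v in values:
                caster(v)
            return name
        except ValueError:
            continue
    return "string"


def csv_to_json(text: str) -> Tuple[Dict[str, str], str]:
    """Infer a per-column schema from CSV text and emit typed JSON records."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return {}, "[]"
    schema = {col: infer_type([r[col] for r in rows]) for col in rows[0]}
    casters = {"int": int, "double": float, "string": str}
    records = [{col: casters[schema[col]](r[col]) for col in r} for r in rows]
    return schema, json.dumps(records)


# Hypothetical sample data, not from the talk.
schema, as_json = csv_to_json("id,temp,city\n1,21.5,London\n2,19.0,Dublin\n")
```

The inferred `schema` maps each column to a type name, and `as_json` holds the same rows with numeric columns converted from strings to numbers.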
  • 25. Consume MQTT This could read from Apache Pulsar - MoP (MQTT on Pulsar)
  • 26. Apache NiFi Pulsar Connector https://github.com/david-streamlio/pulsar-nifi-bundle
  • 27. Apache NiFi Pulsar Connector https://www.datainmotion.dev/2021/11/producing-and-consuming-pulsar-messages.html
  • 28. Apache NiFi Pulsar Connector
  • 29. Apache NiFi Pulsar Connector
  • 30. Apache NiFi Pulsar Connector https://github.com/david-streamlio/pulsar-nifi-bundle/releases/tag/v1.14.0
  • 32. StreamNative Cloud: Powered by Apache Pulsar, StreamNative provides a cloud-native, real-time messaging and streaming platform to support multi-cloud and hybrid cloud strategies. ● Built for Containers ● Cloud Native ● Flink SQL
  • 33. StreamNative Ambassador Program 2022 Learn More Start Survey Tell us about your Pulsar experience and what improvements you would like to see!
  • 34. Now Available: On-Demand Pulsar Training, Academy.StreamNative.io. Live 3-day Developers Training. Times: ● Europe: 3:00 PM - 7:00 PM CET ● Eastern Time: 9:00 AM - 1:00 PM EST ● Pacific Time: 6:00 AM - 10:00 AM PST. Save Your Spot! Feb 15-17
  • 35. Let’s Keep in Touch! Tim Spann Developer Advocate @PaaSDev https://www.linkedin.com/in/timothyspann https://github.com/tspannhw
  • 36. MQTT on Pulsar (MoP)