Princeton Dec 2022 Meetup_ NiFi + Flink + Pulsar

Timothy Spann
Timothy SpannDeveloper Advocate
Streaming Data Platform
for Cloud-Native
Event-driven Applications
Tim Spann
Developer Advocate
Proprietary & Confidential |
Agenda
2
● Welcome
● Introduction to Apache Pulsar
○ Basics of Pulsar
○ Use Cases
● Let’s Build an App!
○ Demo
● Resources
● Q&A
Proprietary & Confidential | 3
Tim Spann
Developer Advocate
at StreamNative
FLiP(N) Stack = Flink, Pulsar and NiFi Stack
Streaming Systems & Data Architecture Expert
Experience:
● 15+ years of experience with streaming technologies
including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet,
IoT, Python and more.
● Today, he helps to grow the Pulsar community sharing rich
technical knowledge and experience at both global
conferences and through individual conversations.
Proprietary & Confidential |
https://bit.ly/32dAJft
4
FLiP Stack Weekly
This week in Apache Flink, Apache
Pulsar, Apache NiFi, Apache Spark and
open source friends.
Proprietary & Confidential |
Agenda
5
• Introduction
• Demo
• What is Apache Pulsar?
• Apache Flink
• Pulsar to NiFi, Pulsar to SSB
• Q&A
Proprietary & Confidential | 6
Founded by the original creators of
Apache Pulsar.
StreamNative employs more than 50% of
the active core committers to Apache
Pulsar.
StreamNative has more experience
designing, deploying, and running
large-scale Apache Pulsar instances
than any team in the world.
Proprietary & Confidential | 7
Building Real-Time
Requires a Team
Proprietary & Confidential |
Proprietary & Confidential |
Proprietary & Confidential |
Proprietary & Confidential |
Proprietary & Confidential |
https://github.com/tspannhw/pulsar-csp-ce
https://github.com/tspannhw/FLiPN-FLaNK-KafkaConnectToMoP
https://github.com/tspannhw/create-nifi-pulsar-flink-apps
Demos
Proprietary & Confidential | 13
Apache Pulsar has a vibrant community
560+
Contributors
10,000+
Commits
7,000+
Slack Members
1,000+
Organizations
Using Pulsar
Proprietary & Confidential |
Guaranteed
Message
Delivery
14
Unified
Messaging
Platform
Resiliency
The basics
Infinite
Scalability
Proprietary & Confidential | 15
Pulsar Cluster
● “Bookies”
● Stores messages and cursors
● Messages are grouped in
segments/ledgers
● A group of bookies form an
“ensemble” to store a ledger
● “Brokers”
● Handles message routing and
connections
● Stateless, but with caches
● Automatic load-balancing
● Topics are composed of
multiple segments
●
● Stores metadata for
both Pulsar and
BookKeeper
● Service discovery
Store
Messages
Metadata &
Service Discovery
Metadata &
Service Discovery
Metadata
Storage
Proprietary & Confidential |
Tenants
(Compliance)
Tenants
(Data Services)
Namespace
(Microservices)
Topic-1
(Cust Auth)
Topic-1
(Location Resolution)
Topic-2
(Demographics)
Topic-1
(Budgeted Spend)
Topic-1
(Acct History)
Topic-1
(Risk Detection)
Namespace
(ETL)
Namespace
(Campaigns)
Namespace
(ETL)
Tenants
(Marketing)
Namespace
(Risk Assessment)
Pulsar Cluster
16
Tenant - Namespaces - Topics
Proprietary & Confidential |
Messages - the basic unit of Pulsar
17
Component Description
Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data
can also conform to data schemas.
Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like
topic compaction.
Properties An optional key/value map of user-defined properties.
Producer name The name of the producer who produces the message. If you do not specify a producer name, the
default name is used.
Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the
message is its order in that sequence.
Proprietary & Confidential | 18
Producer-Consumer
Publisher sends data and
doesn't know about the
subscribers or their status.
Producer Consumer
Topic
All interactions go through
Pulsar and it handles all
communication.
Subscriber receives data
from publisher and never
directly interacts with it.
Proprietary & Confidential |
Streaming
Consumer
Consumer
Consumer
Subscription
Shared
Failover
Consumer
Consumer
Subscription
In case of failure in
Consumer B-0
Consumer
Consumer
Subscription
Exclusive
X
Consumer
Consumer
Key-Shared
Subscription
Pulsar
Topic/Partition
Messaging
19
Proprietary & Confidential |
Pulsarʼs Publish-Subscribe model
20
● Producers send messages.
● Topics are an ordered, named channels that producers use to transmit messages to
subscribed consumers.
● Messages belong to a topic and contain an arbitrary payload.
● Brokers handle connections and routes messages between producers / consumers.
● Subscriptions are named configuration rules that determine how messages are
delivered to consumers.
● Consumers receive messages.
Broker
Subscription
Consumer 1
Consumer 2
Consumer 3
Topic
Producer 1
Producer 2
Proprietary & Confidential |
Pulsar Subscription Modes
21
Different subscription modes
have different semantics:
● Exclusive/Failover -
guaranteed order, single
active consumer
● Shared - multiple active
consumers, no order
● Key_Shared - multiple
active consumers, order
for given key
Producer 1
Producer 2
Pulsar Topic
Subscription D
Consumer D-1
Consumer D-2
Key-Shared
<
K
1,
V
10
>
<
K
1,
V
11
>
<
K
1,
V
12
>
<
K
2
,V
2
0
>
<
K
2
,V
2
1>
<
K
2
,V
2
2
>
Subscription C
Consumer C-1
Consumer C-2
Shared
<
K
1,
V
10
>
<
K
2,
V
21
>
<
K
1,
V
12
>
<
K
2
,V
2
0
>
<
K
1,
V
11
>
<
K
2
,V
2
2
>
Subscription A Consumer A
Exclusive
Subscription B
Consumer B-1
Consumer B-2
In case of failure in
Consumer B-1
Failover
Proprietary & Confidential |
Integrated Schema Registry
22
Schema Registry
schema-1 (value=Avro/Protobuf/JSON) schema-2
(value=Avro/Protobuf/JSON)
schema-3
(value=Avro/Protobuf/JSON)
Schema
Data
ID
Local Cache
for Schemas
+
Schema
Data
ID +
Local Cache
for Schemas
Send schema-1
(value=Avro/Protobuf/JSON) data
serialized per schema ID
Send (register)
schema (if not in
local cache)
Read schema-1
(value=Avro/Protobuf/JSON) data
deserialized per schema ID
Get schema by ID (if
not in local cache)
Producers Consumers
Proprietary & Confidential |
The FliPN kitten crosses the stream
4 ways with Apache Pulsar
23
Proprietary & Confidential |
Kafka on Pulsar (KoP)
24
Proprietary & Confidential |
MQTT on Pulsar (MoP)
25
Proprietary & Confidential |
AMQP on Pulsar (AoP)
26
Proprietary & Confidential |
Kafka to Pulsar
27
Proprietary & Confidential | 28
A serverless event streaming
framework
Pulsar Functions
● Lightweight computation similar to
AWS Lambda.
● Specifically designed to use Apache
Pulsar as a message bus.
● Function runtime can be located
within Pulsar Broker.
● Java Functions
Proprietary & Confidential | 29
Pulsar Functions
● Consume messages from one or
more Pulsar topics.
● Apply user-supplied processing
logic to each message.
● Publish the results of the
computation to another topic.
● Support multiple programming
languages (Java, Python, Go)
● Can leverage 3rd-party libraries to
support the execution of ML
models on the edge.
Proprietary & Confidential |
Data Offloaders
(Tiered Storage)
Client Libraries
StreamNative Pulsar ecosystem
hub.streamnative.io
Connectors
(Sources & Sinks)
Protocol Handlers
Pulsar Functions
(Lightweight Stream
Processing)
Processing Engines
… and more!
… and more!
Proprietary & Confidential |
Apache NiFi - Apache Pulsar Connector
31
https://github.com/streamnative/pulsar-nifi-bundle
Proprietary & Confidential |
Apache NiFi - Apache Pulsar Connector
32
Proprietary & Confidential |
Apache NiFi - Apache Pulsar Connector
33
https://github.com/david-streamlio/pulsar-nifi-bundle/releases/tag/v1.14.0
Proprietary & Confidential | 34
Flink-Pulsar Sink Connector
Proprietary & Confidential |
Cloudera SQL Stream Builder -> Pulsar Connector
35
Proprietary & Confidential |
Cloudera SQL Stream Builder -> Pulsar Connector
36
Proprietary & Confidential | 37
Apache Pulsar
in Action
Please enjoy David’s complete
book which is the ultimate
guide to Pulsar.
Proprietary & Confidential | 38
Proprietary & Confidential | 39
● https://medium.com/@tspann/using-apache-
pulsar-with-cloudera-sql-builder-apache-flin
k-b518aa9eadff
● https://medium.com/@tspann/apache-pulsar
-processor-updated-for-the-new-apache-nif
i-1-18-0-af781d722e63
Proprietary & Confidential |
@PassDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
https://streamnative.io/pulsar-python/
40
Tim Spann
Developer Advocate
at StreamNative
Proprietary & Confidential | 41
Additional Slides
Proprietary & Confidential |
Concentrate data for multiple clusters into a single one where main processing happens.
Data Aggregation
Proprietary & Confidential |
Processing happens identical in both clusters.
Active - Active
Proprietary & Confidential |
Only active cluster processes data - Passive cluster is for failover.
Active - Passive
1 of 44

Recommended

Building Real-Time Travel Alerts by
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
165 views48 slides
JConWorld_ Continuous SQL with Kafka and Flink by
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
156 views36 slides
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines by
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
150 views25 slides
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo by
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
162 views8 slides
CoC23_ Looking at the New Features of Apache NiFi by
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiTimothy Spann
36 views24 slides
CoC23_ Let’s Monitor The Conditions at the Conference by
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceTimothy Spann
17 views17 slides

More Related Content

More from Timothy Spann

The Never Landing Stream with HTAP and Streaming by
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and StreamingTimothy Spann
254 views39 slides
Meetup - Brasil - Data In Motion - 2023 September 19 by
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19Timothy Spann
319 views33 slides
Implement a Universal Data Distribution Architecture to Manage All Streaming ... by
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Timothy Spann
28 views56 slides
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data by
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataTimothy Spann
193 views45 slides
big data fest building modern data streaming apps by
big data fest building modern data streaming appsbig data fest building modern data streaming apps
big data fest building modern data streaming appsTimothy Spann
317 views55 slides
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp by
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-RampTimothy Spann
163 views27 slides

More from Timothy Spann(20)

The Never Landing Stream with HTAP and Streaming by Timothy Spann
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
Timothy Spann254 views
Meetup - Brasil - Data In Motion - 2023 September 19 by Timothy Spann
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
Timothy Spann319 views
Implement a Universal Data Distribution Architecture to Manage All Streaming ... by Timothy Spann
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Timothy Spann28 views
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data by Timothy Spann
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Timothy Spann193 views
big data fest building modern data streaming apps by Timothy Spann
big data fest building modern data streaming appsbig data fest building modern data streaming apps
big data fest building modern data streaming apps
Timothy Spann317 views
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp by Timothy Spann
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Timothy Spann163 views
OSSNA Building Modern Data Streaming Apps by Timothy Spann
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
Timothy Spann155 views
GSJUG: Mastering Data Streaming Pipelines 09May2023 by Timothy Spann
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023
Timothy Spann255 views
BestInFlowCompetitionTutorials03May2023 by Timothy Spann
BestInFlowCompetitionTutorials03May2023BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023
Timothy Spann11 views
Cloudera Sandbox Event Guidelines For Workflow by Timothy Spann
Cloudera Sandbox Event Guidelines For WorkflowCloudera Sandbox Event Guidelines For Workflow
Cloudera Sandbox Event Guidelines For Workflow
Timothy Spann32 views
Meet the Committers Webinar_ Lab Preparation by Timothy Spann
Meet the Committers Webinar_ Lab PreparationMeet the Committers Webinar_ Lab Preparation
Meet the Committers Webinar_ Lab Preparation
Timothy Spann32 views
Best Practices For Workflow by Timothy Spann
Best Practices For WorkflowBest Practices For Workflow
Best Practices For Workflow
Timothy Spann89 views
Meetup: Streaming Data Pipeline Development by Timothy Spann
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
Timothy Spann337 views
DevNexus: Apache Pulsar Development 101 with Java by Timothy Spann
DevNexus:  Apache Pulsar Development 101 with JavaDevNexus:  Apache Pulsar Development 101 with Java
DevNexus: Apache Pulsar Development 101 with Java
Timothy Spann261 views
Conf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices by Timothy Spann
Conf42 Python_ ML Enhanced Event Streaming Apps with Python MicroservicesConf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices
Conf42 Python_ ML Enhanced Event Streaming Apps with Python Microservices
Timothy Spann443 views
ITPC Building Modern Data Streaming Apps by Timothy Spann
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming Apps
Timothy Spann797 views
PythonWebConference_ Cloud Native Apache Pulsar Development 202 with Python by Timothy Spann
PythonWebConference_ Cloud Native Apache Pulsar Development 202 with PythonPythonWebConference_ Cloud Native Apache Pulsar Development 202 with Python
PythonWebConference_ Cloud Native Apache Pulsar Development 202 with Python
Timothy Spann430 views
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java by Timothy Spann
PhillyJug  Getting Started With Real-time Cloud Native Streaming With JavaPhillyJug  Getting Started With Real-time Cloud Native Streaming With Java
PhillyJug Getting Started With Real-time Cloud Native Streaming With Java
Timothy Spann625 views
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud) by Timothy Spann
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Timothy Spann18 views

Recently uploaded

Automated Testing of Microsoft Power BI Reports by
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI ReportsRTTS
11 views20 slides
nintendo_64.pptx by
nintendo_64.pptxnintendo_64.pptx
nintendo_64.pptxpaiga02016
7 views7 slides
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P... by
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...NimaTorabi2
17 views17 slides
predicting-m3-devopsconMunich-2023.pptx by
predicting-m3-devopsconMunich-2023.pptxpredicting-m3-devopsconMunich-2023.pptx
predicting-m3-devopsconMunich-2023.pptxTier1 app
10 views24 slides
Flask-Python by
Flask-PythonFlask-Python
Flask-PythonTriloki Gupta
10 views12 slides
Quality Engineer: A Day in the Life by
Quality Engineer: A Day in the LifeQuality Engineer: A Day in the Life
Quality Engineer: A Day in the LifeJohn Valentino
10 views18 slides

Recently uploaded(20)

Automated Testing of Microsoft Power BI Reports by RTTS
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI Reports
RTTS11 views
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P... by NimaTorabi2
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
NimaTorabi217 views
predicting-m3-devopsconMunich-2023.pptx by Tier1 app
predicting-m3-devopsconMunich-2023.pptxpredicting-m3-devopsconMunich-2023.pptx
predicting-m3-devopsconMunich-2023.pptx
Tier1 app10 views
Quality Engineer: A Day in the Life by John Valentino
Quality Engineer: A Day in the LifeQuality Engineer: A Day in the Life
Quality Engineer: A Day in the Life
John Valentino10 views
Top-5-production-devconMunich-2023.pptx by Tier1 app
Top-5-production-devconMunich-2023.pptxTop-5-production-devconMunich-2023.pptx
Top-5-production-devconMunich-2023.pptx
Tier1 app10 views
Electronic AWB - Electronic Air Waybill by Freightoscope
Electronic AWB - Electronic Air Waybill Electronic AWB - Electronic Air Waybill
Electronic AWB - Electronic Air Waybill
Freightoscope 6 views
Mobile App Development Company by Richestsoft
Mobile App Development CompanyMobile App Development Company
Mobile App Development Company
Richestsoft 5 views
How To Make Your Plans Suck Less — Maarten Dalmijn at the 57th Hands-on Agile... by Stefan Wolpers
How To Make Your Plans Suck Less — Maarten Dalmijn at the 57th Hands-on Agile...How To Make Your Plans Suck Less — Maarten Dalmijn at the 57th Hands-on Agile...
How To Make Your Plans Suck Less — Maarten Dalmijn at the 57th Hands-on Agile...
Stefan Wolpers44 views
Supercharging your Python Development Environment with VS Code and Dev Contai... by Dawn Wages
Supercharging your Python Development Environment with VS Code and Dev Contai...Supercharging your Python Development Environment with VS Code and Dev Contai...
Supercharging your Python Development Environment with VS Code and Dev Contai...
Dawn Wages5 views
Ports-and-Adapters Architecture for Embedded HMI by Burkhard Stubert
Ports-and-Adapters Architecture for Embedded HMIPorts-and-Adapters Architecture for Embedded HMI
Ports-and-Adapters Architecture for Embedded HMI
Burkhard Stubert35 views
Understanding HTML terminology by artembondar5
Understanding HTML terminologyUnderstanding HTML terminology
Understanding HTML terminology
artembondar58 views
predicting-m3-devopsconMunich-2023-v2.pptx by Tier1 app
predicting-m3-devopsconMunich-2023-v2.pptxpredicting-m3-devopsconMunich-2023-v2.pptx
predicting-m3-devopsconMunich-2023-v2.pptx
Tier1 app14 views
Top-5-production-devconMunich-2023-v2.pptx by Tier1 app
Top-5-production-devconMunich-2023-v2.pptxTop-5-production-devconMunich-2023-v2.pptx
Top-5-production-devconMunich-2023-v2.pptx
Tier1 app9 views
Dapr Unleashed: Accelerating Microservice Development by Miroslav Janeski
Dapr Unleashed: Accelerating Microservice DevelopmentDapr Unleashed: Accelerating Microservice Development
Dapr Unleashed: Accelerating Microservice Development
Miroslav Janeski16 views

Princeton Dec 2022 Meetup_ NiFi + Flink + Pulsar

  • 1. Streaming Data Platform for Cloud-Native Event-driven Applications Tim Spann Developer Advocate
  • 2. Proprietary & Confidential | Agenda 2 ● Welcome ● Introduction to Apache Pulsar ○ Basics of Pulsar ○ Use Cases ● Let’s Build an App! ○ Demo ● Resources ● Q&A
  • 3. Proprietary & Confidential | 3 Tim Spann Developer Advocate at StreamNative FLiP(N) Stack = Flink, Pulsar and NiFi Stack Streaming Systems & Data Architecture Expert Experience: ● 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more. ● Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience at both global conferences and through individual conversations.
  • 4. Proprietary & Confidential | https://bit.ly/32dAJft 4 FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends.
  • 5. Proprietary & Confidential | Agenda 5 • Introduction • Demo • What is Apache Pulsar? • Apache Flink • Pulsar to NiFi, Pulsar to SSB • Q&A
  • 6. Proprietary & Confidential | 6 Founded by the original creators of Apache Pulsar. StreamNative employs more than 50% of the active core committers to Apache Pulsar. StreamNative has more experience designing, deploying, and running large-scale Apache Pulsar instances than any team in the world.
  • 7. Proprietary & Confidential | 7 Building Real-Time Requires a Team
  • 12. Proprietary & Confidential | https://github.com/tspannhw/pulsar-csp-ce https://github.com/tspannhw/FLiPN-FLaNK-KafkaConnectToMoP https://github.com/tspannhw/create-nifi-pulsar-flink-apps Demos
  • 13. Proprietary & Confidential | 13 Apache Pulsar has a vibrant community 560+ Contributors 10,000+ Commits 7,000+ Slack Members 1,000+ Organizations Using Pulsar
  • 14. Proprietary & Confidential | Guaranteed Message Delivery 14 Unified Messaging Platform Resiliency The basics Infinite Scalability
  • 15. Proprietary & Confidential | 15 Pulsar Cluster ● “Bookies” ● Stores messages and cursors ● Messages are grouped in segments/ledgers ● A group of bookies form an “ensemble” to store a ledger ● “Brokers” ● Handles message routing and connections ● Stateless, but with caches ● Automatic load-balancing ● Topics are composed of multiple segments ● ● Stores metadata for both Pulsar and BookKeeper ● Service discovery Store Messages Metadata & Service Discovery Metadata & Service Discovery Metadata Storage
  • 16. Proprietary & Confidential | Tenants (Compliance) Tenants (Data Services) Namespace (Microservices) Topic-1 (Cust Auth) Topic-1 (Location Resolution) Topic-2 (Demographics) Topic-1 (Budgeted Spend) Topic-1 (Acct History) Topic-1 (Risk Detection) Namespace (ETL) Namespace (Campaigns) Namespace (ETL) Tenants (Marketing) Namespace (Risk Assessment) Pulsar Cluster 16 Tenant - Namespaces - Topics
  • 17. Proprietary & Confidential | Messages - the basic unit of Pulsar 17 Component Description Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
  • 18. Proprietary & Confidential | 18 Producer-Consumer Publisher sends data and doesn't know about the subscribers or their status. Producer Consumer Topic All interactions go through Pulsar and it handles all communication. Subscriber receives data from publisher and never directly interacts with it.
  • 19. Proprietary & Confidential | Streaming Consumer Consumer Consumer Subscription Shared Failover Consumer Consumer Subscription In case of failure in Consumer B-0 Consumer Consumer Subscription Exclusive X Consumer Consumer Key-Shared Subscription Pulsar Topic/Partition Messaging 19
  • 20. Proprietary & Confidential | Pulsarʼs Publish-Subscribe model 20 ● Producers send messages. ● Topics are an ordered, named channels that producers use to transmit messages to subscribed consumers. ● Messages belong to a topic and contain an arbitrary payload. ● Brokers handle connections and routes messages between producers / consumers. ● Subscriptions are named configuration rules that determine how messages are delivered to consumers. ● Consumers receive messages. Broker Subscription Consumer 1 Consumer 2 Consumer 3 Topic Producer 1 Producer 2
  • 21. Proprietary & Confidential | Pulsar Subscription Modes 21 Different subscription modes have different semantics: ● Exclusive/Failover - guaranteed order, single active consumer ● Shared - multiple active consumers, no order ● Key_Shared - multiple active consumers, order for given key Producer 1 Producer 2 Pulsar Topic Subscription D Consumer D-1 Consumer D-2 Key-Shared < K 1, V 10 > < K 1, V 11 > < K 1, V 12 > < K 2 ,V 2 0 > < K 2 ,V 2 1> < K 2 ,V 2 2 > Subscription C Consumer C-1 Consumer C-2 Shared < K 1, V 10 > < K 2, V 21 > < K 1, V 12 > < K 2 ,V 2 0 > < K 1, V 11 > < K 2 ,V 2 2 > Subscription A Consumer A Exclusive Subscription B Consumer B-1 Consumer B-2 In case of failure in Consumer B-1 Failover
  • 22. Proprietary & Confidential | Integrated Schema Registry 22 Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
  • 23. Proprietary & Confidential | The FliPN kitten crosses the stream 4 ways with Apache Pulsar 23
  • 24. Proprietary & Confidential | Kafka on Pulsar (KoP) 24
  • 25. Proprietary & Confidential | MQTT on Pulsar (MoP) 25
  • 26. Proprietary & Confidential | AMQP on Pulsar (AoP) 26
  • 27. Proprietary & Confidential | Kafka to Pulsar 27
  • 28. Proprietary & Confidential | 28 A serverless event streaming framework Pulsar Functions ● Lightweight computation similar to AWS Lambda. ● Specifically designed to use Apache Pulsar as a message bus. ● Function runtime can be located within Pulsar Broker. ● Java Functions
  • 29. Proprietary & Confidential | 29 Pulsar Functions ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
  • 30. Proprietary & Confidential | Data Offloaders (Tiered Storage) Client Libraries StreamNative Pulsar ecosystem hub.streamnative.io Connectors (Sources & Sinks) Protocol Handlers Pulsar Functions (Lightweight Stream Processing) Processing Engines … and more! … and more!
  • 31. Proprietary & Confidential | Apache NiFi - Apache Pulsar Connector 31 https://github.com/streamnative/pulsar-nifi-bundle
  • 32. Proprietary & Confidential | Apache NiFi - Apache Pulsar Connector 32
  • 33. Proprietary & Confidential | Apache NiFi - Apache Pulsar Connector 33 https://github.com/david-streamlio/pulsar-nifi-bundle/releases/tag/v1.14.0
  • 34. Proprietary & Confidential | 34 Flink-Pulsar Sink Connector
  • 35. Proprietary & Confidential | Cloudera SQL Stream Builder -> Pulsar Connector 35
  • 36. Proprietary & Confidential | Cloudera SQL Stream Builder -> Pulsar Connector 36
  • 37. Proprietary & Confidential | 37 Apache Pulsar in Action Please enjoy David’s complete book which is the ultimate guide to Pulsar.
  • 39. Proprietary & Confidential | 39 ● https://medium.com/@tspann/using-apache- pulsar-with-cloudera-sql-builder-apache-flin k-b518aa9eadff ● https://medium.com/@tspann/apache-pulsar -processor-updated-for-the-new-apache-nif i-1-18-0-af781d722e63
  • 40. Proprietary & Confidential | @PassDev https://www.linkedin.com/in/timothyspann https://github.com/tspannhw https://streamnative.io/pulsar-python/ 40 Tim Spann Developer Advocate at StreamNative
  • 41. Proprietary & Confidential | 41 Additional Slides
  • 42. Proprietary & Confidential | Concentrate data for multiple clusters into a single one where main processing happens. Data Aggregation
  • 43. Proprietary & Confidential | Processing happens identical in both clusters. Active - Active
  • 44. Proprietary & Confidential | Only active cluster processes data - Passive cluster is for failover. Active - Passive