[March sn meetup] apache pulsar + apache nifi for cloud data lake

Timothy Spann, Developer Advocate
Welcome to Apache Pulsar and Apache NiFi for Cloud Data Lakes
In the meantime:
● (1) Use the chat to let us know where you’re calling in from.
● (2) Take part in our poll under the booth “Poll” tab in the right panel of Hopin.
We’ll start in 5 minutes; we’re just waiting for people to sign in.
Agenda
01 Intro to Apache Pulsar - Tim Spann
02 Intro to Apache NiFi - John Kuchmek
03 Demo
04 Key Takeaways + Resources
05 Additional Q&A
Tim Spann
Developer Advocate, StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFi Combined
● Streaming Systems & Data Architecture Expert
● Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Java, Python, Sensors and more.
John Kuchmek
Principal Solutions Engineer, Cloudera
● Integration of OT & IT Data
● Cloudera Streaming SME
● NiFi, Spark, Flink, Kafka, Storm, Druid, Kudu, Python, Sensors, PLCs, Private Cloud and Public Cloud
Apache Pulsar is a cloud-native messaging and event-streaming platform.
Why Apache Pulsar?
● Unified Messaging Platform
● Guaranteed Message Delivery
● Resiliency
● Infinite Scalability
Messages - the basic unit of Pulsar
● Value / data payload: The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
● Key: Messages are optionally tagged with keys, which are used in partitioning and are also useful for features such as topic compaction.
● Properties: An optional key/value map of user-defined properties.
● Producer name: The name of the producer that produced the message. If you do not specify a producer name, a default name is used. Used for message de-duplication.
● Sequence ID: Each Pulsar message belongs to an ordered sequence on its topic, and the sequence ID is the message's position in that sequence. Used for message de-duplication.
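A minimal sketch of these components as a Python data class, plus the de-duplication rule that the producer name and sequence ID make possible: if the broker remembers the highest sequence ID it has persisted for each producer, it can drop a retried send whose sequence ID it has already seen. `Message` and `DedupTopic` are hypothetical stand-ins for illustration, not the Pulsar client API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Message:
    """Hypothetical model of the message components listed above."""
    value: bytes                   # raw payload bytes (may follow a schema)
    producer_name: str             # which producer sent it
    sequence_id: int               # position in that producer's ordered sequence
    key: Optional[str] = None      # optional key for partitioning/compaction
    properties: Dict[str, str] = field(default_factory=dict)  # user-defined map


class DedupTopic:
    """Toy topic that tracks the highest sequence ID stored per producer and
    ignores any message whose sequence ID is not greater than it."""

    def __init__(self) -> None:
        self.log: List[Message] = []
        self._last_seq: Dict[str, int] = {}

    def publish(self, msg: Message) -> bool:
        last = self._last_seq.get(msg.producer_name, -1)
        if msg.sequence_id <= last:
            return False           # duplicate, e.g. a retried send: not stored
        self._last_seq[msg.producer_name] = msg.sequence_id
        self.log.append(msg)
        return True


topic = DedupTopic()
first = Message(b'{"temp": 21.5}', "sensor-producer", 0, key="device-1")
print(topic.publish(first))   # True: stored
print(topic.publish(first))   # False: same producer and sequence ID
```

Keying the check on the producer name is why the name matters for de-duplication: two producers can legitimately use the same sequence numbers.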
Connectivity
• Libraries - Java, Python, Go, NodeJS, WebSockets, C++, C#, Scala, Rust, ...
• Functions - Lightweight Stream Processing (Java, Python, Go)
• Connectors - Sources & Sinks (Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL, NiFi
• Data Offloaders - Tiered Storage (S3)
hub.streamnative.io
Apache NiFi Pulsar Connector
https://github.com/streamnative/pulsar-nifi-bundle
Apache NiFi is a GUI-based data flow tool that runs anywhere.
Why NiFi
• Enable easy ingestion, routing, management and delivery of any data anywhere (edge, cloud, data center) to any downstream system, with built-in end-to-end security and provenance
ACQUIRE - PROCESS - DELIVER
• Over 350 Prebuilt Processors
• Easy to Build Your Own
• Parse, Enrich & Apply Schema
• Filter, Split, Merge & Route
• Throttle & Backpressure
• Guaranteed Delivery
• Full Data Provenance
• Ecosystem Integration
Advanced tooling to industrialize flow development (Flow Development Life Cycle)
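Back pressure can be pictured as a bounded queue between two processors: once a connection holds its threshold number of objects, the upstream processor is not scheduled until the downstream one drains it. The sketch below assumes a single object-count threshold and a fixed downstream rate; `Connection` and `run_flow` are hypothetical names, and real NiFi connections can also apply a data-size threshold.

```python
from collections import deque
from typing import Iterable, List, Tuple


class Connection:
    """Hypothetical NiFi-style connection queue with an object-count
    back-pressure threshold."""

    def __init__(self, object_threshold: int) -> None:
        self.queue: deque = deque()
        self.object_threshold = object_threshold

    def is_full(self) -> bool:
        return len(self.queue) >= self.object_threshold


def run_flow(records: Iterable[int], threshold: int,
             consume_every: int) -> Tuple[List[int], int]:
    """Upstream emits one record per tick unless back pressure is applied;
    downstream takes one record every `consume_every` ticks."""
    conn = Connection(threshold)
    pending = deque(records)
    delivered: List[int] = []
    max_depth = 0
    tick = 0
    while pending or conn.queue:
        tick += 1
        if pending and not conn.is_full():
            conn.queue.append(pending.popleft())   # upstream is scheduled
        # otherwise upstream is skipped this tick: back pressure in effect
        if tick % consume_every == 0 and conn.queue:
            delivered.append(conn.queue.popleft())
        max_depth = max(max_depth, len(conn.queue))
    return delivered, max_depth


delivered, depth = run_flow(range(10), threshold=3, consume_every=2)
print(delivered, depth)
```

Even though the producer is twice as fast as the consumer, nothing is dropped: the queue depth stays capped at the threshold and every record is delivered in order.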
[Figure: NiFi processor icons for sources and destinations (FTP, SFTP, HL7, UDP, XML, HTTP, EMAIL, HTML, IMAGE, SYSLOG) and for operations (HASH, MERGE, EXTRACT, DUPLICATE, SPLIT, ROUTE TEXT, ROUTE CONTENT, ROUTE CONTEXT, CONTROL RATE, DISTRIBUTE LOAD, GEOENRICH, SCAN, REPLACE, TRANSLATE, CONVERT, ENCRYPT, TAIL, EVALUATE, EXECUTE)]
Apache NiFi Capabilities: Data Ingest, Data Transformation, Data Enrichment
[Figure: example capabilities: HTTP, Syslog, HL7, UDP, SFTP, MQTT, WS, Hash, Compress, Merge, Duplicate, Split, Encrypt, REST, MapCache, Enrich IP, GeoIP, XML]
Flow Development Lifecycle
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
Demo
Streaming NFTs
[Diagram labels: NFTs, NFT, Thermal, Aggregates, Status, APIs, Crypto Feeds, Weather, NFT++]
https://opensea.io/collection/tspannhw-collection
Key Takeaways
● Real-time ingest and data manipulation
● Easy-to-use, easy-to-configure processors and controller services
● Multiple ways to connect
Resources
Learn More about NiFi + Pulsar Integration
https://streamnative.io/apache-nifi-connector/
GitHub
https://github.com/tspannhw/awesome-nifi-pulsar
Blog Post on Apache Pulsar + NiFi Integration
https://hubs.ly/Q015PNMd0
Doug Cohen
Head of Sales, StreamNative
● StreamNative: Pulsar-as-a-Service
● AWS Certified Solutions Architect - Associate
● Reach me at doug@streamnative.io
Additional Resources
Let’s Keep in Touch!
Tim Spann
Developer Advocate
@PaaSDev
linkedin.com/in/timothyspann
github.com/tspannhw
John Kuchmek
Principal Solutions Engineer
@K_Physics
linkedin.com/in/jkuchmek
github.com/johnkuch
Pulsar Subscription Modes
Different subscription modes have different semantics:
● Exclusive/Failover - guaranteed order, single active consumer
● Shared - multiple active consumers, no order
● Key_Shared - multiple active consumers, order for a given key
[Diagram: Producers 1 and 2 publish to one Pulsar topic that has four subscriptions.
Subscription A (Exclusive): Consumer A receives all messages.
Subscription B (Failover): Consumer B-1 is active; Consumer B-2 takes over in case of failure in Consumer B-1.
Subscription C (Shared): messages <K1,V10>, <K2,V21>, <K1,V12>, <K2,V20>, <K1,V11>, <K2,V22> are spread across Consumers C-1 and C-2 with no ordering guarantee.
Subscription D (Key_Shared): messages <K1,V10>, <K1,V11>, <K1,V12> and <K2,V20>, <K2,V21>, <K2,V22> are split by key between Consumers D-1 and D-2, so all messages with the same key go to the same consumer, in order.]
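The four modes can be compared side by side with a small simulation over the same six key/value messages from the diagram. This is a sketch under simplifying assumptions: Shared is modeled as plain round-robin, Failover picks the first healthy consumer in the list, and Key_Shared pins keys to consumers with CRC32 modulo the consumer count, which is a stand-in and not Pulsar's actual hash-range assignment.

```python
import zlib
from typing import AbstractSet, Dict, List, Tuple

Msg = Tuple[str, int]  # (key, value), e.g. ("K1", 10) for <K1,V10>


def exclusive(msgs: List[Msg], consumers: List[str]) -> Dict[str, List[Msg]]:
    # Exclusive: only one consumer may attach; it receives everything in order.
    if len(consumers) != 1:
        raise ValueError("an exclusive subscription allows a single consumer")
    return {consumers[0]: list(msgs)}


def failover(msgs: List[Msg], consumers: List[str],
             failed: AbstractSet[str] = frozenset()) -> Dict[str, List[Msg]]:
    # Failover: one active consumer gets everything; a standby takes over when
    # it fails (simplified here to "first healthy consumer in the list").
    active = next(c for c in consumers if c not in failed)
    return {c: (list(msgs) if c == active else []) for c in consumers}


def shared(msgs: List[Msg], consumers: List[str]) -> Dict[str, List[Msg]]:
    # Shared: round-robin across consumers, so no ordering across the subscription.
    out: Dict[str, List[Msg]] = {c: [] for c in consumers}
    for i, m in enumerate(msgs):
        out[consumers[i % len(consumers)]].append(m)
    return out


def key_shared(msgs: List[Msg], consumers: List[str]) -> Dict[str, List[Msg]]:
    # Key_Shared: a stable hash of the key pins each key to one consumer,
    # which preserves per-key order.
    out: Dict[str, List[Msg]] = {c: [] for c in consumers}
    for key, value in msgs:
        slot = zlib.crc32(key.encode()) % len(consumers)
        out[consumers[slot]].append((key, value))
    return out


msgs = [("K1", 10), ("K2", 20), ("K1", 11), ("K2", 21), ("K1", 12), ("K2", 22)]
print(key_shared(msgs, ["D-1", "D-2"]))
print(shared(msgs, ["C-1", "C-2"]))
```

Note that with only two keys both could hash to the same consumer; Key_Shared guarantees per-key ordering, not an even spread.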
Schema Registry
[Diagram: the schema registry stores schema-1, schema-2 and schema-3 (value=Avro/Protobuf/JSON); producers and consumers each keep a local cache for schemas.
Producers: send (register) the schema if it is not in the local cache, then send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID.
Consumers: get the schema by ID if it is not in the local cache, then read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID.]
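The caching flow in the diagram can be sketched as follows: the producer contacts the registry only the first time it sees a schema, and the consumer contacts it only the first time it sees a schema ID. JSON stands in for Avro/Protobuf serialization, and `SchemaRegistry`, `Producer` and `Consumer` are hypothetical classes that follow the slide's flow; they do not reproduce Pulsar's built-in schema registry or its wire format.

```python
import json
from typing import Dict, Tuple


class SchemaRegistry:
    """Toy registry: assigns an integer ID to each distinct schema definition."""

    def __init__(self) -> None:
        self._by_id: Dict[int, str] = {}
        self._by_def: Dict[str, int] = {}
        self.calls = 0  # counts round trips, to show the effect of caching

    def register(self, schema: str) -> int:
        self.calls += 1
        if schema not in self._by_def:
            new_id = len(self._by_id) + 1
            self._by_def[schema] = new_id
            self._by_id[new_id] = schema
        return self._by_def[schema]

    def get(self, schema_id: int) -> str:
        self.calls += 1
        return self._by_id[schema_id]


class Producer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self.registry = registry
        self.cache: Dict[str, int] = {}  # local cache: schema definition -> ID

    def serialize(self, schema: str, record: dict) -> Tuple[int, bytes]:
        if schema not in self.cache:     # register only on a cache miss
            self.cache[schema] = self.registry.register(schema)
        return self.cache[schema], json.dumps(record).encode()


class Consumer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self.registry = registry
        self.cache: Dict[int, str] = {}  # local cache: ID -> schema definition

    def deserialize(self, schema_id: int, payload: bytes) -> Tuple[str, dict]:
        if schema_id not in self.cache:  # fetch by ID only on a cache miss
            self.cache[schema_id] = self.registry.get(schema_id)
        return self.cache[schema_id], json.loads(payload)


reg = SchemaRegistry()
producer, consumer = Producer(reg), Consumer(reg)
schema = '{"type": "record", "name": "Reading", "fields": [{"name": "temp", "type": "double"}]}'
for temp in (20.5, 21.0, 21.5):
    sid, payload = producer.serialize(schema, {"temp": temp})
    print(consumer.deserialize(sid, payload)[1])
print("registry round trips:", reg.calls)
```

Three messages cost only two registry round trips, one registration and one lookup, which is the point of the local caches on both sides.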
● Buffer
● Batch
● Route
● Filter
● Aggregate
● Enrich
● Replicate
● Dedupe
● Decouple
● Distribute
Pulsar: Unified Messaging + Data Streaming
Messaging
Ideal for work queues that do not require tasks to be performed in a particular order, for example, sending one email message to many recipients. RabbitMQ and Amazon SQS are examples of popular queue-based message systems.
... and Streaming
Works best in situations where the order of messages is important, for example, data ingestion. Kafka and Amazon Kinesis are examples of messaging systems that use streaming semantics for consuming messages.
A Unified Messaging Platform
Message Queuing + Data Streaming
[Diagram: a Pulsar instance and cluster organized into tenants, namespaces and topics. Tenants: Compliance, Data Services, Marketing. Namespaces: Microservices, ETL, Campaigns, Risk Assessment. Topics: Cust Auth, Location Resolution, Demographics, Budgeted Spend, Acct History, Risk Detection.]
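Every topic lives under a tenant and a namespace, and its full name spells out that hierarchy as `persistent://tenant/namespace/topic` (or `non-persistent://...`); a bare name such as `my-topic` resolves to the `public` tenant and `default` namespace. The parser below is a hypothetical helper that makes the hierarchy explicit. The example names are borrowed from the diagram labels, but which namespace belongs to which tenant is an assumption, and partitioned-topic suffixes are not handled.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str     # "persistent" or "non-persistent"
    tenant: str     # e.g. a line of business such as "marketing"
    namespace: str  # e.g. an application area such as "campaigns"
    topic: str

    def __str__(self) -> str:
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.topic}"


def parse_topic(name: str) -> TopicName:
    """Hypothetical helper: split a topic name into its hierarchy levels."""
    domain, sep, rest = name.partition("://")
    if not sep:
        domain, rest = "persistent", name       # no scheme: assume persistent
        if "/" not in rest:
            rest = f"public/default/{rest}"     # bare name: default tenant/namespace
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic domain: {domain}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got: {rest}")
    return TopicName(domain, *parts)


print(parse_topic("persistent://marketing/campaigns/budgeted-spend"))
print(parse_topic("nft-feed"))
```

Because tenants and namespaces are part of every name, policies such as quotas, retention and access control can be applied per tenant or per namespace without touching individual topics.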
Pulsar’s Publish-Subscribe model
[Diagram: Producers 1 and 2 publish to a topic on a broker; a subscription delivers the messages to Consumers 1, 2 and 3.]
● Producers send messages.
● Topics are ordered, named channels that producers use to transmit messages to subscribed consumers.
● Messages belong to a topic and contain an arbitrary payload.
● Brokers handle connections and route messages between producers and consumers.
● Subscriptions are named configuration rules that determine how messages are delivered to consumers.
● Consumers receive messages.
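The decoupling described above can be sketched as a toy topic: producers only append to a log, and each named subscription keeps its own read position, so every subscription sees every message independently. `Topic` is a hypothetical class; receiving and acknowledging are merged into one step here, and new subscriptions start at the latest position, matching Pulsar's default initial position.

```python
from typing import Dict, List


class Topic:
    """Toy topic: an append-only log plus one read cursor per subscription."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []
        self.cursors: Dict[str, int] = {}

    def publish(self, payload: str) -> int:
        # Producers only append; they never see who is subscribed.
        self.log.append(payload)
        return len(self.log) - 1   # log position, standing in for a message ID

    def subscribe(self, subscription: str) -> None:
        # A new subscription starts at the end of the log ("latest").
        self.cursors.setdefault(subscription, len(self.log))

    def receive(self, subscription: str, max_msgs: int = 10) -> List[str]:
        start = self.cursors[subscription]
        batch = self.log[start:start + max_msgs]
        self.cursors[subscription] = start + len(batch)   # acknowledge the batch
        return batch


t = Topic("persistent://public/default/demo")
t.subscribe("analytics")
t.subscribe("archiver")
for p in ("a", "b", "c"):
    t.publish(p)
print(t.receive("analytics"))            # both subscriptions get every message
print(t.receive("archiver", max_msgs=2))  # each at its own pace
```

A slow subscription, like `archiver` reading two messages at a time, does not hold back a fast one, because each cursor advances independently.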
Producer-Consumer
● Producer: The publisher sends data and doesn't know about the subscribers or their status. All interactions go through Pulsar, which handles all communication.
● Consumer: The subscriber receives data from the publisher and never interacts with it directly.
[Diagram: producer and consumer connected only through a topic]
Kafka on Pulsar (KoP)
MQTT on Pulsar (MoP)
streamnative.io
Pulsar Functions
A serverless event streaming framework
● Lightweight computation similar to AWS Lambda.
● Specifically designed to use Apache Pulsar as a message bus.
● The function runtime can be located within the Pulsar broker.
● Consume messages from one or more Pulsar topics.
● Apply user-supplied processing logic to each message.
● Publish the results of the computation to another topic.
● Support multiple programming languages (Java, Python, Go).
● Can leverage third-party libraries to support the execution of ML models on the edge.
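Pulsar Functions can also take a plain Python function as the processing logic. The sketch below shows such a function for a hypothetical JSON sensor reading, plus a hypothetical `run_locally` harness that mimics the consume, process, publish loop described above so the logic can be exercised without a cluster. Treating a `None` return as "publish nothing" is how this harness behaves; check the Functions documentation for your Pulsar version before relying on it as a filter.

```python
import json
from typing import Callable, Iterable, List, Optional


def process(msg: str) -> Optional[str]:
    """Function body: one message in, zero or one message out.
    Adds a Fahrenheit reading to a hypothetical JSON record and skips
    records that carry no temperature."""
    reading = json.loads(msg)
    if "temp_c" not in reading:
        return None
    reading["temp_f"] = round(reading["temp_c"] * 9 / 5 + 32, 1)
    return json.dumps(reading)


def run_locally(fn: Callable[[str], Optional[str]],
                inputs: Iterable[str]) -> List[str]:
    """Hypothetical stand-in for the Functions runtime: consume each input,
    apply fn, and 'publish' non-None results to an output list."""
    output_topic: List[str] = []
    for message in inputs:
        result = fn(message)
        if result is not None:
            output_topic.append(result)
    return output_topic


out = run_locally(process, ['{"id": 1, "temp_c": 20.0}', '{"id": 2}',
                            '{"id": 3, "temp_c": -40}'])
print(out)
```

Keeping the logic in a plain function like this makes it easy to unit test before deploying it as a function.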
Pulsar SQL
Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel.
[Diagram: a producer and a consumer use Topic1, whose partitions Topic1-Part1, Topic1-Part2 and Topic1-Part3 are served by Brokers 1, 2 and 3. Each partition's segments (Segment 1 to Segment X) are replicated across Bookies 1, 2 and 3. A Presto/Trino query coordinator reads topic metadata and hands work to SQL workers, which read the segments from the bookies in parallel.]
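The parallel read path can be sketched as a scatter-gather: the coordinator splits a scan by segment, each worker reads and filters one segment, and the coordinator merges the partial results. The segment layout and records below are hypothetical toy data, and a thread pool stands in for the SQL workers; in Pulsar SQL itself, queries address topics by schema and name, for example `pulsar."public/default".my_topic`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical layout: each segment of a topic holds some records and can be
# read from any bookie that stores a replica of it.
SEGMENTS: Dict[str, List[dict]] = {
    "segment-1": [{"sensor": "a", "temp": 20}, {"sensor": "b", "temp": 22}],
    "segment-2": [{"sensor": "a", "temp": 21}],
    "segment-3": [{"sensor": "b", "temp": 25}, {"sensor": "a", "temp": 19}],
}

Predicate = Callable[[dict], bool]


def scan_segment(segment_id: str, predicate: Predicate) -> List[dict]:
    """What one SQL worker does: read a whole segment and filter it locally."""
    return [row for row in SEGMENTS[segment_id] if predicate(row)]


def query(predicate: Predicate, workers: int = 3) -> List[dict]:
    """Coordinator: fan the scan out by segment, then merge worker results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda s: scan_segment(s, predicate),
                              sorted(SEGMENTS)))
    return [row for part in parts for row in part]


rows = query(lambda r: r["sensor"] == "a")
print(rows)
```

Because segments are immutable once closed, workers can read them without coordinating with the brokers that serve live traffic, which is what lets the scan parallelize.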
Use Cases
● Multi-Tenant Data Infrastructure
● AdTech
● Fraud Detection
● Connected Car
● IoT Analytics
● Data Lake Hydration
Apache NiFi
Apache NiFi Pulsar Connector
https://github.com/david-streamlio/pulsar-nifi-bundle
StreamNative Cloud
Passionate and dedicated team.
Founded by the original developers of Apache Pulsar.
StreamNative helps teams capture, manage, and leverage data using Pulsar’s unified messaging and streaming platform.
Founded By The Creators Of Apache Pulsar
Sijie Guo
ASF Member
Pulsar/BookKeeper PMC
Founder and CEO
Jia Zhai
Pulsar/BookKeeper PMC
Co-Founder
Matteo Merli
ASF Member
Pulsar/BookKeeper PMC
CTO
Data veterans with extensive industry experience
REST Feed: Non-Fungible Token
{
  "date": "Thu, 24 Feb 2022 22:26:41 GMT",
  "short_description": "",
  "featured": "false",
  "image_thumbnail_url": "htt",
  "asset_contract_created_date": "2022-02-17T15:48:44.822206",
  "asset_contract_owner": "50299352",
  "image_preview_url": "https://lh3.googleus",
  "asset_contract_symbol": "TD",
  "twitter_username": "",
  "description": "10,000metaverse-readyAvatars",
  "asset_contract_address": "0xc7df86762ba83f2a6197e1ff9bb40ae0f696b9e6",
  "external_url": "https://www.sandbox.game/en/snoopdogg/",
  "token_id": "492",
  "asset_contract_name": "Theoggies",
  "asset_contract_nft_version": "3.0",
  "asset_contract_description": "metaverse.",
  "asset_contract_external_link": "https://www.sandbox.game/en/snoopdogg/",
  "id": "307922619",
  "featured_image_url": "https",
  "slug": "snoop-dogg-doggies",
  "token_metadata": "https://contracts.sandbox.game/unrevealed.json?tokenId=492",
  "asset_contract_schema_name": "ERC721",
  "animation_url": "https",
  "num_sales": "1",
  "image_url": "https://lh",
  "asset_contract_default_to_fiat": "false",
  "external_link": "",
  "image_original_url": "https://contracts.sandbox.game/preview.png",
  "asset_contract_payout_address": "0x4489590a116618b506f0efe885432f6a8ed998e9",
  "animation_original_url": "https://con",
  "background_color": "",
  "asset_contract_asset_contract_type": "non-fungible",
  "name": "The Doggies",
  "asset_contract_image_url": "https",
  "asset_contract_total_supply": "0"
}
https://docs.opensea.io/reference/retrieving-bundles
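Every value in the feed record above arrives as a string, so numbers and booleans need coercing before a flow can route or aggregate on them. A minimal sketch, assuming a hypothetical `flatten_for_routing` step and a subset of the record's fields with their values copied as shown; in NiFi itself this kind of typing is usually handled by record-oriented processors with a schema, and the Python here only illustrates the transformation.

```python
import json

# A subset of the feed record shown above (values copied as they appear).
RAW = json.dumps({
    "date": "Thu, 24 Feb 2022 22:26:41 GMT",
    "name": "The Doggies",
    "token_id": "492",
    "num_sales": "1",
    "asset_contract_schema_name": "ERC721",
    "asset_contract_symbol": "TD",
    "featured": "false",
})


def flatten_for_routing(raw: str) -> dict:
    """Hypothetical pre-processing step: turn the string-typed feed fields
    into typed values that downstream routing rules can compare."""
    rec = json.loads(raw)
    return {
        "name": rec["name"],
        "token_id": int(rec["token_id"]),
        "num_sales": int(rec["num_sales"]),
        "standard": rec["asset_contract_schema_name"],
        "featured": rec["featured"] == "true",   # "false" -> False
    }


print(flatten_for_routing(RAW))
```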
StreamNative Hub
StreamNative Cloud
Apache Pulsar - Apache NiFi <-> Events <-> Cloud Data Stores
[Diagram: a data gateway accepts several protocols (Pulsar, KoP, MoP, WebSocket, HTTP) into StreamNative Cloud clusters that combine queuing and streaming; older data is offloaded to tiered storage (unified batch and stream storage), and Pulsar sinks and microservices deliver the data to the cloud data lake.]
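The tiered-storage offload in this architecture can be pictured as a size-triggered policy: once a topic keeps more bytes in BookKeeper than a configured threshold, its oldest closed segments are moved to cheaper long-term storage (S3 in the earlier slide) until it is back under the limit, while the open segment that is still being written stays put. The `Segment` class, `offload` function and byte sizes are hypothetical and only illustrate that policy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    segment_id: int
    size_bytes: int
    sealed: bool                 # only closed segments are eligible for offload
    tier: str = "bookkeeper"     # becomes "offloaded" once moved


def offload(segments: List[Segment], threshold_bytes: int) -> List[int]:
    """Move the oldest sealed segments to the offloaded tier until the bytes
    kept in BookKeeper are at or below threshold_bytes. Returns moved IDs."""
    hot = sum(s.size_bytes for s in segments if s.tier == "bookkeeper")
    moved: List[int] = []
    for seg in sorted(segments, key=lambda s: s.segment_id):   # oldest first
        if hot <= threshold_bytes:
            break
        if seg.sealed and seg.tier == "bookkeeper":
            seg.tier = "offloaded"   # e.g. an object-store bucket
            hot -= seg.size_bytes
            moved.append(seg.segment_id)
    return moved


segs = [Segment(1, 100, True), Segment(2, 100, True),
        Segment(3, 100, True), Segment(4, 50, False)]
print(offload(segs, threshold_bytes=160))
```

Readers do not need to know where a segment lives; the broker (or Pulsar SQL workers) can read offloaded segments as well, which is what lets one topic serve both streaming and batch access.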
Quality Engineer: A Day in the LifeJohn Valentino
10 views18 slides
JioEngage_Presentation.pptx by
JioEngage_Presentation.pptxJioEngage_Presentation.pptx
JioEngage_Presentation.pptxadmin125455
9 views4 slides
Electronic AWB - Electronic Air Waybill by
Electronic AWB - Electronic Air Waybill Electronic AWB - Electronic Air Waybill
Electronic AWB - Electronic Air Waybill Freightoscope
6 views1 slide
Techstack Ltd at Slush 2023, Ukrainian delegation by
Techstack Ltd at Slush 2023, Ukrainian delegationTechstack Ltd at Slush 2023, Ukrainian delegation
Techstack Ltd at Slush 2023, Ukrainian delegationViktoriiaOpanasenko
7 views4 slides
How to build dyanmic dashboards and ensure they always work by
How to build dyanmic dashboards and ensure they always workHow to build dyanmic dashboards and ensure they always work
How to build dyanmic dashboards and ensure they always workWiiisdom
16 views13 slides
What is API by
What is APIWhat is API
What is APIartembondar5
15 views15 slides

Recently uploaded(20)

Quality Engineer: A Day in the Life by John Valentino
Quality Engineer: A Day in the LifeQuality Engineer: A Day in the Life
Quality Engineer: A Day in the Life
John Valentino10 views
JioEngage_Presentation.pptx by admin125455
JioEngage_Presentation.pptxJioEngage_Presentation.pptx
JioEngage_Presentation.pptx
admin1254559 views
Electronic AWB - Electronic Air Waybill by Freightoscope
Electronic AWB - Electronic Air Waybill Electronic AWB - Electronic Air Waybill
Electronic AWB - Electronic Air Waybill
Freightoscope 6 views
How to build dyanmic dashboards and ensure they always work by Wiiisdom
How to build dyanmic dashboards and ensure they always workHow to build dyanmic dashboards and ensure they always work
How to build dyanmic dashboards and ensure they always work
Wiiisdom16 views
Bootstrapping vs Venture Capital.pptx by Zeljko Svedic
Bootstrapping vs Venture Capital.pptxBootstrapping vs Venture Capital.pptx
Bootstrapping vs Venture Capital.pptx
Zeljko Svedic16 views
Introduction to Git Source Control by John Valentino
Introduction to Git Source ControlIntroduction to Git Source Control
Introduction to Git Source Control
John Valentino8 views
Dapr Unleashed: Accelerating Microservice Development by Miroslav Janeski
Dapr Unleashed: Accelerating Microservice DevelopmentDapr Unleashed: Accelerating Microservice Development
Dapr Unleashed: Accelerating Microservice Development
Miroslav Janeski16 views
Understanding HTML terminology by artembondar5
Understanding HTML terminologyUnderstanding HTML terminology
Understanding HTML terminology
artembondar58 views
ADDO_2022_CICID_Tom_Halpin.pdf by TomHalpin9
ADDO_2022_CICID_Tom_Halpin.pdfADDO_2022_CICID_Tom_Halpin.pdf
ADDO_2022_CICID_Tom_Halpin.pdf
TomHalpin96 views
Ports-and-Adapters Architecture for Embedded HMI by Burkhard Stubert
Ports-and-Adapters Architecture for Embedded HMIPorts-and-Adapters Architecture for Embedded HMI
Ports-and-Adapters Architecture for Embedded HMI
Burkhard Stubert35 views
Streamlining Your Business Operations with Enterprise Application Integration... by Flexsin
Streamlining Your Business Operations with Enterprise Application Integration...Streamlining Your Business Operations with Enterprise Application Integration...
Streamlining Your Business Operations with Enterprise Application Integration...
Flexsin 5 views
tecnologia18.docx by nosi6702
tecnologia18.docxtecnologia18.docx
tecnologia18.docx
nosi67026 views

[March sn meetup] apache pulsar + apache nifi for cloud data lake

  • 1. Welcome to Apache Pulsar and Apache NiFi for Cloud Data Lakes. In the meantime: ● (1) Use the chat to let us know where you’re calling in from. ● (2) Take part in our poll, under the booth “poll” tab in the right panel of Hopin. We’ll start in 5 minutes; we’re just waiting for people to sign in.
  • 3. Agenda: 01 Intro to Apache Pulsar (Tim Spann) ● 02 Intro to Apache NiFi (John Kuchmek) ● 03 Demo ● 04 Key Takeaways + Resources ● 05 Additional Q&A
  • 4. Tim Spann, Developer Advocate, StreamNative ● FLiP(N) Stack = Flink, Pulsar and NiFi Combined ● Streaming Systems & Data Architecture Expert ● Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Java, Python, Sensors and more.
  • 5. John Kuchmek, Principal Solutions Engineer, Cloudera ● Integration of OT & IT Data ● Cloudera Streaming SME ● NiFi, Spark, Flink, Kafka, Storm, Druid, Kudu, Python, Sensors, PLCs, Private Cloud and Public Cloud
  • 6. Apache Pulsar is a Cloud-Native Messaging and Event-Streaming Platform.
  • 7. Why Apache Pulsar? ● Unified Messaging Platform ● Guaranteed Message Delivery ● Resiliency ● Infinite Scalability
  • 8. Messages: the basic unit of Pulsar
    ● Value / data payload: the data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
    ● Key: messages are optionally tagged with keys, which are used in partitioning and are also useful for features such as topic compaction.
    ● Properties: an optional key/value map of user-defined properties.
    ● Producer name: the name of the producer that produced the message. If you do not specify a producer name, a default name is used. Used for message de-duplication.
    ● Sequence ID: each Pulsar message belongs to an ordered sequence on its topic, and its sequence ID is its position in that sequence. Used for message de-duplication.
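To make the producer name and sequence ID rows concrete: when de-duplication is enabled, the broker remembers the highest sequence ID it has persisted for each producer and drops any message whose sequence ID is not higher, so a retried send does not create a second copy. The Python below is a minimal in-memory sketch of that rule under simplifying assumptions (one topic, no persistence); `Message` and `DedupTopic` are hypothetical names, not the Pulsar client API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    """Hypothetical in-memory model of the message components listed above."""
    value: bytes                      # raw payload bytes
    producer_name: str                # identifies the producer for de-duplication
    sequence_id: int                  # position in that producer's ordered sequence
    key: Optional[str] = None         # optional; used for partitioning and compaction
    properties: Dict[str, str] = field(default_factory=dict)  # user-defined metadata


class DedupTopic:
    """Keeps the highest sequence ID seen per producer and drops anything at or below it."""

    def __init__(self) -> None:
        self.log: List[Message] = []
        self.last_sequence: Dict[str, int] = {}

    def publish(self, msg: Message) -> bool:
        last = self.last_sequence.get(msg.producer_name, -1)
        if msg.sequence_id <= last:
            return False  # duplicate (for example a retried send): not appended
        self.last_sequence[msg.producer_name] = msg.sequence_id
        self.log.append(msg)
        return True


topic = DedupTopic()
first = Message(b'{"temp": 21.5}', "sensor-producer", 0, key="sensor-1")
retry = Message(b'{"temp": 21.5}', "sensor-producer", 0, key="sensor-1")
print(topic.publish(first), topic.publish(retry), len(topic.log))  # → True False 1
```

Tracking state per producer name is why the name matters: two producers can both send sequence ID 0 without either being treated as a duplicate.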
  • 9. Connectivity
    • Libraries: Java, Python, Go, NodeJS, WebSockets, C++, C#, Scala, Rust, ...
    • Functions: lightweight stream processing (Java, Python, Go)
    • Connectors: sources & sinks (Cassandra, Kafka, …)
    • Protocol handlers: AoP (AMQP), KoP (Kafka), MoP (MQTT)
    • Processing engines: Flink, Spark, Presto/Trino via Pulsar SQL, NiFi
    • Data offloaders: tiered storage (S3)
    hub.streamnative.io
  • 10. Apache NiFi Pulsar Connector https://github.com/streamnative/pulsar-nifi-bundle
  • 11. Apache NiFi is a GUI-based data flow tool that runs anywhere.
  • 12. Why NiFi ● Enable easy ingestion, routing, management and delivery of any data anywhere (edge, cloud, data center) to any downstream system, with built-in end-to-end security and provenance. ● Over 350 Prebuilt Processors ● Easy to build your own ● Parse, Enrich & Apply Schema ● Filter, Split, Merge & Route ● Throttle & Backpressure ● Guaranteed Delivery ● Full data provenance ● Eco-system integration ● Advanced tooling to industrialize flow development (Flow Development Life Cycle). Diagram (ACQUIRE → PROCESS → DELIVER): sources and destinations such as FTP, SFTP, HL7, UDP, XML, HTTP, EMAIL, HTML, IMAGE and SYSLOG; processing steps such as HASH, MERGE, EXTRACT, DUPLICATE, SPLIT, ROUTE TEXT, ROUTE CONTENT, ROUTE CONTEXT, CONTROL RATE, DISTRIBUTE LOAD, GEOENRICH, SCAN, REPLACE, TRANSLATE, CONVERT, ENCRYPT, TAIL, EVALUATE and EXECUTE.
  • 13. Apache NiFi Capabilities ● Data Ingest: HTTP, Syslog, HL7, UDP, SFTP, MQTT, WS ● Data Transformation: Hash, Compress, Merge, Duplicate, Split, Encrypt ● Data Enrichment: Syslog, REST, MapCache, Enrich IP, GeoIP, XML
  • 15. FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends. https://bit.ly/32dAJft
  • 16. Demo
  • 19. Key Takeaways ● Real-time ingest and data manipulation ● Easy-to-use, easy-to-configure processors and controller services ● Multiple ways to connect
  • 20. Resources ● Learn more about NiFi + Pulsar integration: https://streamnative.io/apache-nifi-connector/ ● GitHub: https://github.com/tspannhw/awesome-nifi-pulsar ● Blog post on Apache Pulsar + NiFi integration: https://hubs.ly/Q015PNMd0
  • 21. Additional Resources: Doug Cohen, Head of Sales, StreamNative ● StreamNative: Pulsar-as-a-Service ● AWS Certified Associate Solutions Architect ● Reach me at doug@streamnative.io
  • 22. Let’s Keep in Touch! ● Tim Spann, Developer Advocate: @PaaSDev, linkedin.com/in/timothyspann, github.com/tspannhw ● John Kuchmek, Principal Solutions Engineer: @K_Physics, linkedin.com/in/jkuchmek, github.com/johnkuch
  • 23. Pulsar Subscription Modes. Different subscription modes have different semantics: ● Exclusive/Failover: guaranteed order, single active consumer ● Shared: multiple active consumers, no ordering guarantee ● Key_Shared: multiple active consumers, order preserved for a given key. Diagram: Producers 1 and 2 publish keyed messages <K1, V10>, <K1, V11>, <K1, V12>, <K2, V20>, <K2, V21>, <K2, V22> to one Pulsar topic with four subscriptions. Subscription A (Exclusive): Consumer A receives every message. Subscription B (Failover): Consumer B-1 is active, and Consumer B-2 takes over in case of failure in B-1. Subscription C (Shared): Consumer C-1 receives <K1, V10>, <K2, V21>, <K1, V12> and Consumer C-2 receives <K2, V20>, <K1, V11>, <K2, V22>. Subscription D (Key_Shared): Consumer D-1 receives all K1 messages and Consumer D-2 receives all K2 messages.
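The four modes differ only in which consumer of a single subscription gets each message, which a few lines of Python can model. This is a simplified sketch, not broker logic: `dispatch` is a hypothetical helper, Failover simply treats the first consumer in the list as the active one (the broker's real selection rules depend on topic type and consumer priority), and Key_Shared uses `crc32(key) % n` as a stand-in for Pulsar's assignment of key hash ranges to consumers.

```python
import zlib
from itertools import cycle
from typing import Dict, List, Tuple

Msg = Tuple[str, int]  # (key, value), e.g. ("K1", 10)


def dispatch(mode: str, consumers: List[str], msgs: List[Msg]) -> Dict[str, List[Msg]]:
    """Simplified model of which consumer in ONE subscription receives each message."""
    out: Dict[str, List[Msg]] = {c: [] for c in consumers}
    if mode == "Exclusive":
        if len(consumers) != 1:
            raise ValueError("Exclusive allows a single consumer")
        out[consumers[0]].extend(msgs)
    elif mode == "Failover":
        # First consumer stands in for the active one; the others are standbys.
        out[consumers[0]].extend(msgs)
    elif mode == "Shared":
        # Round-robin: load is spread, but per-key order across consumers is lost.
        rr = cycle(consumers)
        for m in msgs:
            out[next(rr)].append(m)
    elif mode == "Key_Shared":
        # Same key always maps to the same consumer, so order is kept per key.
        for m in msgs:
            idx = zlib.crc32(m[0].encode()) % len(consumers)
            out[consumers[idx]].append(m)
    else:
        raise ValueError(f"unknown mode {mode}")
    return out


msgs = [("K1", 10), ("K1", 11), ("K1", 12), ("K2", 20), ("K2", 21), ("K2", 22)]
for mode, consumers in [("Exclusive", ["A"]), ("Failover", ["B-1", "B-2"]),
                        ("Shared", ["C-1", "C-2"]), ("Key_Shared", ["D-1", "D-2"])]:
    print(mode, dispatch(mode, consumers, msgs))
```

With this message order the Shared split matches the diagram (C-1 gets V10, V12, V21; C-2 gets V11, V20, V22). Key_Shared keeps each key on one consumer, although with only two keys both could hash to the same consumer. Calling `dispatch("Failover", ["B-2"], msgs)` models B-2 taking over after B-1 fails.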
  • 24. Schema Registry. Diagram: the schema registry holds schema-1, schema-2 and schema-3 (value = Avro/Protobuf/JSON). Producers and consumers each keep a local cache of schemas and schema data IDs. A producer sends (registers) a schema if it is not in its local cache, then sends data serialized per schema ID. A consumer gets the schema by ID if it is not in its local cache, then reads data deserialized per schema ID.
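The cache-then-registry flow in the diagram can be sketched as follows. This models the generic pattern on the slide, not Pulsar's API: Pulsar's schema registry is built into the broker and attaches schemas to topics, whereas `SchemaRegistry`, `Producer` and `Consumer` here are hypothetical classes, and JSON stands in for Avro/Protobuf serialization. The `calls` counter shows that only the first message on each side reaches the registry.

```python
import json
from typing import Any, Dict, Tuple


class SchemaRegistry:
    """Hypothetical registry: assigns an integer ID to each distinct schema definition."""

    def __init__(self) -> None:
        self.by_id: Dict[int, str] = {}
        self.by_def: Dict[str, int] = {}
        self.calls = 0  # counts round trips, to show the effect of client caches

    def register(self, schema_def: str) -> int:
        self.calls += 1
        if schema_def not in self.by_def:
            new_id = len(self.by_id) + 1
            self.by_def[schema_def] = new_id
            self.by_id[new_id] = schema_def
        return self.by_def[schema_def]

    def get(self, schema_id: int) -> str:
        self.calls += 1
        return self.by_id[schema_id]


class Producer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self.registry = registry
        self.cache: Dict[str, int] = {}  # local cache: schema definition -> ID

    def send(self, schema_def: str, record: Dict[str, Any]) -> Tuple[int, bytes]:
        if schema_def not in self.cache:  # register only on a cache miss
            self.cache[schema_def] = self.registry.register(schema_def)
        return self.cache[schema_def], json.dumps(record).encode()


class Consumer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self.registry = registry
        self.cache: Dict[int, str] = {}  # local cache: ID -> schema definition

    def receive(self, schema_id: int, payload: bytes) -> Dict[str, Any]:
        if schema_id not in self.cache:  # fetch by ID only on a cache miss
            self.cache[schema_id] = self.registry.get(schema_id)
        # A real deserializer would apply the cached schema; JSON stands in here.
        return json.loads(payload)


registry = SchemaRegistry()
producer, consumer = Producer(registry), Consumer(registry)
schema_1 = '{"type": "record", "name": "Reading", "fields": [{"name": "temp", "type": "double"}]}'
for t in (20.5, 21.0, 21.5):
    sid, data = producer.send(schema_1, {"temp": t})
    consumer.receive(sid, data)
print(sid, registry.calls)  # → 1 2
```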
  • 25. ● Buffer ● Batch ● Route ● Filter ● Aggregate ● Enrich ● Replicate ● Dedupe ● Decouple ● Distribute
  • 27. Pulsar: Unified Messaging + Data Streaming ● Messaging: ideal for work queues that do not require tasks to be performed in a particular order, for example sending one email message to many recipients. RabbitMQ and Amazon SQS are examples of popular queue-based messaging systems. ● ...and Streaming: works best in situations where the order of messages is important, for example data ingestion. Kafka and Amazon Kinesis are examples of messaging systems that use streaming semantics for consuming messages.
  • 28. Pulsar Instance / Pulsar Cluster
  • 29. A Unified Messaging Platform: Message Queuing + Data Streaming
  • 30. Topics. Diagram: a Pulsar instance contains Pulsar clusters and is organized into tenants (Compliance, Data Services, Marketing), which contain namespaces (Microservices, ETL, Campaigns, Risk Assessment), which in turn contain topics (Cust Auth, Location Resolution, Demographics, Budgeted Spend, Acct History, Risk Detection).
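The tenant, namespace and topic hierarchy above is encoded directly in Pulsar's fully qualified topic names, `persistent://tenant/namespace/topic` (or `non-persistent://...`). The sketch below parses such names. The example values are illustrative lowercase versions of diagram labels, not names from a real deployment. To the best of our knowledge it mirrors Pulsar's defaulting for short names (a bare name goes to `public/default`, and `tenant/namespace/topic` without a domain is treated as persistent), but it deliberately ignores the legacy cluster-qualified format; `TopicName` and `parse_topic` are hypothetical helpers.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Parse a topic name of the form {domain}://tenant/namespace/topic."""
    if "://" not in name:
        parts = name.split("/")
        if len(parts) == 1:
            name = f"persistent://public/default/{name}"  # bare name
        elif len(parts) == 3:
            name = f"persistent://{name}"                  # tenant/namespace/topic
        else:
            raise ValueError("expected 'topic' or 'tenant/namespace/topic'")
    domain, rest = name.split("://", 1)
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown domain: {domain}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected tenant/namespace/topic")
    return TopicName(domain, *parts)


print(parse_topic("persistent://marketing/campaigns/budgeted-spend"))
print(parse_topic("my-topic"))
```

Because the tenant and namespace are part of every name, policies such as quotas, retention and permissions can be set per tenant or per namespace without touching individual topics.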
  • 31. Pulsar’s Publish-Subscribe Model. Diagram: Producers 1 and 2 publish to a topic on a broker, and a subscription delivers the topic's messages to Consumers 1, 2 and 3. ● Producers send messages. ● A topic is an ordered, named channel that producers use to transmit messages to subscribed consumers. ● Messages belong to a topic and contain an arbitrary payload. ● Brokers handle connections and route messages between producers and consumers. ● Subscriptions are named configuration rules that determine how messages are delivered to consumers. ● Consumers receive messages.
  • 32. Producer-Consumer: The publisher sends data to a topic and doesn't know about the subscribers or their status. All interactions go through Pulsar, which handles all communication. The subscriber receives data from the publisher through the topic and never interacts with the publisher directly.
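The decoupling described in slides 31 and 32 can be sketched as an append-only log with one cursor per named subscription: the producer only appends, and each subscription tracks its own position, so two subscriptions each see every message independently. This is a hypothetical in-memory model (class and method names are illustrative, not Pulsar's API). In this simplified sketch `receive` just peeks at the next message, which stays available until it is acknowledged; new subscriptions start at the latest position, which is Pulsar's default initial position.

```python
from typing import Dict, List, Optional


class Topic:
    """Hypothetical in-memory topic: an ordered log plus one cursor per named subscription."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []
        self.cursors: Dict[str, int] = {}  # subscription name -> index of next message

    def publish(self, payload: str) -> int:
        # The producer only appends; it never sees who (if anyone) is subscribed.
        self.log.append(payload)
        return len(self.log) - 1  # stand-in for a message ID

    def subscribe(self, subscription: str) -> None:
        # New subscriptions start at the end of the log ("latest").
        self.cursors.setdefault(subscription, len(self.log))

    def receive(self, subscription: str) -> Optional[str]:
        pos = self.cursors[subscription]
        return self.log[pos] if pos < len(self.log) else None

    def acknowledge(self, subscription: str) -> None:
        self.cursors[subscription] += 1  # move past the message just processed


topic = Topic("persistent://public/default/readings")
topic.subscribe("dashboard")
topic.subscribe("archiver")
topic.publish("reading-1")
print(topic.receive("dashboard"))  # → reading-1
topic.acknowledge("dashboard")
print(topic.receive("dashboard"), topic.receive("archiver"))  # → None reading-1
```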
  • 35. Pulsar Functions: a serverless event streaming framework ● Lightweight computation similar to AWS Lambda. ● Specifically designed to use Apache Pulsar as a message bus. ● The function runtime can be located within the Pulsar broker.
  • 36. Pulsar Functions ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go). ● Can leverage third-party libraries to support the execution of ML models on the edge.
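The consume, process, publish loop above can be sketched in Python. Pulsar's Python runtime accepts a plain module-level `process(input)` function (the "native" interface) as well as SDK-based classes, and the parameter name `input` follows that convention. The function body is hypothetical example logic (enriching a JSON sensor reading), and `run_locally` is a stand-in for the runtime's input and output topic wiring, so the sketch runs without a broker.

```python
import json
from typing import Iterable, List, Optional, Union


def process(input: Union[str, bytes]) -> Optional[str]:
    """Function body in the shape of a native Python Pulsar Function.

    Hypothetical logic: parse a JSON sensor reading, skip malformed records,
    and add a Fahrenheit field.
    """
    if isinstance(input, bytes):
        input = input.decode("utf-8")
    try:
        reading = json.loads(input)
        celsius = float(reading["temp_c"])
    except (ValueError, KeyError, TypeError):
        return None  # in this sketch, None means nothing is published
    reading["temp_f"] = round(celsius * 9 / 5 + 32, 1)
    return json.dumps(reading)


def run_locally(inputs: Iterable[Union[str, bytes]]) -> List[str]:
    """Stand-in for the runtime: feed each input-topic message to process()
    and collect what would be published to the output topic."""
    outputs = []
    for message in inputs:
        result = process(message)
        if result is not None:
            outputs.append(result)
    return outputs


print(run_locally([b'{"sensor": "s1", "temp_c": 21.5}', "not json", '{"sensor": "s2"}']))
# → ['{"sensor": "s1", "temp_c": 21.5, "temp_f": 70.7}']
```

Keeping the logic in a pure function like this makes it easy to unit-test before deploying it to a function worker.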
  • 37. Pulsar SQL: Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel. Diagram: a producer and a consumer connect to Brokers 1, 2 and 3, which serve the partitions Topic1-Part1, Topic1-Part2 and Topic1-Part3. Each partition is stored as segments (Segment 1 through Segment X) replicated across Bookies 1, 2 and 3. A query coordinator reads topic metadata and dispatches SQL workers, which read the segments from the bookies directly.
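The parallelism in this diagram comes from splitting a query by segment: each worker scans its segments independently and the coordinator merges partial results. The Python below sketches that scatter-gather shape for something like a count of all rows plus a count of rows with `temp > 24`. The segment layout and data are made up, and no Presto/Trino or Pulsar API is involved; threads stand in for SQL workers.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# Hypothetical layout: (partition, segment number) -> records stored in that segment.
# In Pulsar these segments would live on bookies or in offloaded storage.
SEGMENTS: Dict[Tuple[str, int], List[dict]] = {
    ("topic1-part1", 1): [{"temp": 20}, {"temp": 22}],
    ("topic1-part1", 2): [{"temp": 25}],
    ("topic1-part2", 1): [{"temp": 19}, {"temp": 30}],
    ("topic1-part3", 1): [{"temp": 27}],
}


def scan_segment(segment_id: Tuple[str, int]) -> Tuple[int, int]:
    """SQL-worker stand-in: read one segment and return a partial aggregate
    (row count, rows with temp > 24) without going through a broker."""
    rows = SEGMENTS[segment_id]
    return len(rows), sum(1 for r in rows if r["temp"] > 24)


def query(max_workers: int = 4) -> Tuple[int, int]:
    """Coordinator stand-in: fan segments out to workers, then merge the partials."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(scan_segment, SEGMENTS))
    return sum(p[0] for p in partials), sum(p[1] for p in partials)


print(query())  # → (6, 3)
```

Aggregates that combine by simple addition, like these counts, merge cleanly; others (averages, distinct counts) need partials that carry more state.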
  • 38. Use Cases ● Multi-Tenant Data Infrastructure ● AdTech ● Fraud Detection ● Connected Car ● IoT Analytics ● Data Lake Hydration
  • 41. Apache NiFi Pulsar Connector https://github.com/david-streamlio/pulsar-nifi-bundle
  • 46. streamnative.io Passionate and dedicated team. Founded by the original developers of Apache Pulsar. StreamNative helps teams to capture, manage, and leverage data using Pulsar’s unified messaging and streaming platform.
  • 47. Founded By The Creators Of Apache Pulsar ● Sijie Guo: Founder and CEO; ASF Member; Pulsar/BookKeeper PMC ● Jia Zhai: Co-Founder; Pulsar/BookKeeper PMC ● Matteo Merli: CTO; ASF Member; Pulsar/BookKeeper PMC. Data veterans with extensive industry experience.
  • 49. REST Feed Non-Fungible Token {"date":"Thu, 24 Feb 2022 22:26:41 GMT","short_description":"","featured":"false","image_thumbnail_url":"htt","asset_contract_created_date":"2022-02-17T15:48:44.822206","asset_contract_owner":"50299352","image_preview_url":"https://lh3.googleus","asset_contract_symbol":"TD","twitter_username":"","description":"10,000metaverse-readyAvatars","asset_contract_address":"0xc7df86762ba83f2a6197e1ff9bb40ae0f696b9e6","external_url":"https://www.sandbox.game/en/snoopdogg/","token_id":"492","asset_contract_name":"Theoggies","asset_contract_nft_version":"3.0","asset_contract_description":"metaverse.","asset_contract_external_link":"https://www.sandbox.game/en/snoopdogg/","id":"307922619","featured_image_url":"https","slug":"snoop-dogg-doggies","token_metadata":"https://contracts.sandbox.game/unrevealed.json?tokenId=492","asset_contract_schema_name":"ERC721","animation_url":"https","num_sales":"1","image_url":"https://lh","asset_contract_default_to_fiat":"false","external_link":"","image_original_url":"https://contracts.sandbox.game/preview.png","asset_contract_payout_address":"0x4489590a116618b506f0efe885432f6a8ed998e9","animation_original_url":"https://con","background_color":"","asset_contract_asset_contract_type":"non-fungible","name":"The Doggies","asset_contract_image_url":"https","asset_contract_total_supply":"0"} https://docs.opensea.io/reference/retrieving-bundles
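Every value in the feed record above arrives as a string, so before landing it in a data lake table it is common to keep a subset of fields and give them real types. In NiFi this would more likely be done with record-oriented processors and a schema; the Python below only makes the transformation concrete. The chosen fields and target types are a hypothetical example, and the sample record copies a few complete fields from the slide.

```python
import json
from typing import Any, Dict

# A few fields copied from the feed record above; every value arrives as a string.
RAW = ('{"token_id":"492","name":"The Doggies","num_sales":"1","featured":"false",'
       '"asset_contract_schema_name":"ERC721",'
       '"asset_contract_address":"0xc7df86762ba83f2a6197e1ff9bb40ae0f696b9e6"}')

# Hypothetical choice of fields and target types for the data lake table.
CONVERTERS = {
    "token_id": int,
    "name": str,
    "num_sales": int,
    "featured": lambda s: s.lower() == "true",
    "asset_contract_schema_name": str,
    "asset_contract_address": str.lower,
}


def normalize(raw: str) -> Dict[str, Any]:
    """Parse one feed record, keep only the chosen fields, and cast them to typed values."""
    record = json.loads(raw)
    return {name: convert(record[name])
            for name, convert in CONVERTERS.items() if name in record}


print(normalize(RAW))
```

Fields missing from a record are simply skipped here; a stricter pipeline might route such records to a separate error topic instead.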
  • 50. Apache Pulsar - Apache NiFi <-> Events <-> Cloud Data Stores. Diagram labels: StreamNative Hub; StreamNative Cloud; Micro Service; Data Gateway Protocols (Pulsar, KoP, MoP, WebSocket, HTTP); Apache Pulsar (Queuing + Streaming); Pulsar Sink (x2); Tiered Storage offload (Unified Batch and Stream Storage); Data to Cloud Data Lake.
  • 53. Tiered Storage (Queuing + Streaming)