Open Source Bristol 30 March 2022

Timothy Spann, Developer Advocate
What can
Apache Pulsar
do for FinTech?
streamnative.io
Tim Spann
Developer Advocate
StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems & Data Architecture Expert
● Experience:
○ 15+ years of experience with streaming technologies including Apache
Pulsar, Apache Flink, Apache Spark, Apache NiFi, Big Data, Cloud,
Trino, Aerospike, IoT and more.
John Kinson
Head of Sales, EMEA
StreamNative
● Startup, Scale-up and Large Enterprise expert
● Building the StreamNative Sales function in EMEA
● Experience:
○ 25+ years of building and selling distributed and embedded systems in
the telecoms, digital media and cloud enterprise software industries
Agenda
01 Welcome
02 Introduction to Messaging + Data Streaming
03 Introduction to Apache Pulsar
04 Why Open Source
05 Resources
06 Q&A
MESSAGING
➔ Asynchronous messages triggered by events
➔ Consuming messages regardless of language, system or sender
➔ Queueing
➔ Routing
➔ Work queues
➔ JPMorgan Chase AMQP
DATA STREAMING
➔ Perform in Real-Time
➔ Process Events as They Happen
➔ Joining Streams with SQL
➔ Find Anomalies Immediately
➔ Ordering and Arrival Semantics
➔ Continuous Streams of Data
When are Messaging and Streaming used?
● Accessing historical as well as real-time data
● Pub/sub model enables event streams to be sent from multiple producers and consumed by multiple consumers
● To process large amounts of data in a highly scalable way
Industry trends
● Banking: transforming from siloed systems to combined data streams
● Insurance: provide faster claim processing, fraud detection and system integration
● IoT: handle huge volumes of data from sensors
Apache Pulsar is a Cloud-Native Messaging
and Event-Streaming Platform.
Messaging
Ideal for work queues that do not
require tasks to be performed in a
particular order—for example,
sending one email message to many
recipients.
RabbitMQ and Amazon SQS are
examples of popular queue-based
message systems.
Pulsar: Unified Messaging + Data Streaming
.. and Streaming
Works best in situations where the
order of messages is important—for
example, data ingestion.
Kafka and Amazon Kinesis are
examples of messaging systems that
use streaming semantics for
consuming messages.
Unified Messaging and Streaming
[Diagram: StreamNative Cloud and StreamNative Hub sit on top of unified batch and stream computing (Batch + Stream) and unified batch and stream storage (Queuing + Streaming), with offload to tiered storage. Protocols: Pulsar, KoP, MoP, WebSocket. Surrounding components: Pulsar sink, streaming edge gateway, CDC, apps.]
Pulsar Benefits
● Building microservices
● Asynchronous communication
● Building real-time applications
● Highly resilient
● Tiered storage
Pulsar Global Adoption
Using Pulsar with Fintech
Low latency
Geo-replication
Data integrity
High availability
Durability
Multi-tenancy
Multiple data consumers:
Transactions, payment
processing, alerts,
analytics, KYC, fraud
detection with ML & AI
Large data volumes,
high scalability
Financial event
messaging
Many topics, producers,
consumers
Why Open
Source Pulsar?
Sijie Guo
ASF Member
Pulsar/BookKeeper PMC
Founder and CEO
Jia Zhai
Pulsar/BookKeeper PMC
Co-Founder
Matteo Merli
ASF Member
Pulsar/BookKeeper PMC
CTO
● We would get many benefits from an
open source model
○ Other companies would help
develop the product
○ Better security, code escrow,
longevity
● We would keep the core features in the
OSS version
● We could build commercial offerings,
services around the core product
OUR BETS AND EARLY DECISIONS
Why Open
Source Pulsar?
C/OSS Model
Benefits: many developers; security, longevity, escrow
Challenges: why pay?; multiple roadmaps
RESOURCES
Here are resources to continue your journey
with Apache Pulsar
Now Available
On-Demand Pulsar
Training
Academy.StreamNative.io
[On-Demand Video]
Introduction to Pulsar
Watch Now!
FREE ebook
Apache Pulsar
in Action
Access Now!
Q&A
Tim Spann, Developer Advocate
@PaaSDev
linkedin.com/in/timothyspann
github.com/tspannhw
John Kinson, Head of Sales, EMEA
john@streamnative.io
linkedin.com/in/johnkinson
+44 207 072 1095
Thank you
streamnative.io
Industry trends
Notable industries and sectors using data streaming:
Banking - transforming from siloed systems to combined data streams
○ Typical applications of event streaming include banking sector processing of
financial transactions, with multiple customer touchpoints, notifications, and
support for mobile devices
○ Banking data (transactions and metadata) can be streamed in parallel for
fraud detection using ML and AI in near real-time
Insurance - building a single view from multiple data sources to provide faster claim
processing, fraud detection and system integration
IoT - handling huge volumes of data from sensors
Pulsar Adoption Use Cases
Adopted Pulsar to replace Kafka in their DSP (Data Streaming Platform).
● 1.5-2x lower in capex cost
● 5-50x improvement in latency
● 2-3x lower in opex due to layered architecture
● Process 10 petabytes/day
Tencent adopted Pulsar to power their billing platform, Midas, which processes hundreds of billions of financial transactions daily. Adoption then expanded to Tencent’s Federated Learning Platform and Tencent Gaming.
Applied Materials is one of the biggest semiconductor hardware and software suppliers in the industry. They adopted Pulsar to build a message bus to tie all of their data together. They previously used Tibco.
Agenda
Welcome
Introduction to Messaging + Data Streaming
● What is messaging and data streaming?
● When is it used?
● What are the industry trends?
Introduction to Apache Pulsar
● What it is
● What it enables
● Who uses it today?
● Using Apache Pulsar in FinTech applications
Why Open Source
● Why open source Apache Pulsar?
● What have been the benefits and challenges?
Resources
Q&A
Pulsar Adoption Spreads
Tencent serves billions of users and over a million merchants.
Use Case #1: Payments
Early 2019, Tencent
adopts Pulsar to power
their billing platform,
Midas, processing
hundreds of billions of
financial transactions
daily.
Use Case #2: ML/AI
Pulsar adoption
spreads to Tencent’s
Federated Learning
Platform where it
supports trillions of
concurrent federated
learnings every day.
Use Case #3: Gaming
Tencent’s Gaming
Department replaces
Kafka with Pulsar for
its logging pipeline.
Founded By The
Creators Of Apache Pulsar
Sijie Guo
ASF Member
Pulsar/BookKeeper PMC
Founder and CEO
Jia Zhai
Pulsar/BookKeeper PMC
Co-Founder
Matteo Merli
ASF Member
Pulsar/BookKeeper PMC
CTO
Data veterans with extensive industry experience
Messages - the basic unit of Pulsar
Component | Description
Value / data payload | The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
Key | Messages are optionally tagged with keys, which are used in partitioning and are also useful for features such as topic compaction.
Properties | An optional key/value map of user-defined properties.
Producer name | The name of the producer that produced the message. If you do not specify a producer name, a default name is used. Used for message de-duplication.
Sequence ID | Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Used for message de-duplication.
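As a rough illustration of how the producer name and sequence ID can support de-duplication, here is a minimal in-memory sketch in Python. It is a toy model, not the Pulsar client API: the `Message` and `DedupTopic` classes and their fields are hypothetical names chosen to mirror the table above. The idea shown is that the topic remembers the highest sequence ID accepted per producer and drops anything at or below it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Message:
    """Toy stand-in for a message; field names mirror the component table."""
    value: bytes
    producer_name: str
    sequence_id: int
    key: Optional[str] = None
    properties: Dict[str, str] = field(default_factory=dict)


class DedupTopic:
    """Hypothetical in-memory topic that rejects re-sent messages."""

    def __init__(self) -> None:
        self.log: List[Message] = []
        # Highest sequence ID accepted so far, tracked per producer name.
        self._last_seq: Dict[str, int] = {}

    def publish(self, msg: Message) -> bool:
        """Append the message unless it repeats an already-seen sequence ID."""
        last = self._last_seq.get(msg.producer_name, -1)
        if msg.sequence_id <= last:
            return False  # duplicate, e.g. a retry after a lost acknowledgement
        self._last_seq[msg.producer_name] = msg.sequence_id
        self.log.append(msg)
        return True


topic = DedupTopic()
results = [
    topic.publish(Message(b"pay-1", "producer-a", 0)),
    topic.publish(Message(b"pay-2", "producer-a", 1)),
    topic.publish(Message(b"pay-2", "producer-a", 1)),  # retried send
    topic.publish(Message(b"pay-1", "producer-b", 0)),  # different producer
]
print(results)  # [True, True, False, True]
```

The retried send is dropped, while the same sequence ID from a different producer is accepted, because the two producers have independent sequences.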
Producer-Consumer
Producer: The publisher sends data and doesn't know about the subscribers or their status. All interactions go through Pulsar, which handles all communication.
Consumer: The subscriber receives data from the publisher and never interacts with it directly.
Pulsar’s Publish-Subscribe model
[Diagram: Producer 1 and Producer 2 publish to a topic on the broker; a subscription delivers the messages to Consumers 1, 2 and 3.]
● Producers send messages.
● Topics are ordered, named channels that producers use to transmit messages to subscribed consumers.
● Messages belong to a topic and contain an arbitrary payload.
● Brokers handle connections and route messages between producers and consumers.
● Subscriptions are named configuration rules that determine how messages are delivered to consumers.
● Consumers receive messages.
Pulsar Subscription Modes
Different subscription modes
have different semantics:
Exclusive/Failover - guaranteed
order, single active consumer
Shared - multiple active
consumers, no order
Key_Shared - multiple active
consumers, order for given key
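The difference between the modes is easier to see with a small simulation. The sketch below is a toy model in plain Python, not the Pulsar client API: `dispatch` is a hypothetical helper, round-robin is assumed for Shared, and `zlib.crc32` is only a stand-in for whatever key hash a real broker uses.

```python
from collections import defaultdict
from itertools import cycle
from zlib import crc32


def dispatch(messages, consumers, mode):
    """Return {consumer: [(key, value), ...]} for one toy subscription."""
    assigned = defaultdict(list)
    if mode in ("exclusive", "failover"):
        # Only one consumer is active; in failover the others are standbys.
        for msg in messages:
            assigned[consumers[0]].append(msg)
    elif mode == "shared":
        # Messages are spread across consumers, so per-key order is lost.
        turn = cycle(consumers)
        for msg in messages:
            assigned[next(turn)].append(msg)
    elif mode == "key_shared":
        # Every message with the same key goes to the same consumer.
        for key, value in messages:
            owner = consumers[crc32(key.encode()) % len(consumers)]
            assigned[owner].append((key, value))
    else:
        raise ValueError(f"unknown subscription mode: {mode}")
    return dict(assigned)


messages = [("K1", 10), ("K1", 11), ("K2", 20),
            ("K1", 12), ("K2", 21), ("K2", 22)]
for mode in ("failover", "shared", "key_shared"):
    print(mode, dispatch(messages, ["C-1", "C-2"], mode))
```

In the Shared run, messages for K1 end up on both consumers, so nothing guarantees they are processed in order. In the Key_Shared run, each key has a single owner and its messages arrive in publish order.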
[Diagram: Producer 1 and Producer 2 publish to a Pulsar topic that is read by four subscriptions.
Subscription A (Exclusive): Consumer A.
Subscription B (Failover): Consumer B-1, with Consumer B-2 taking over in case of failure in Consumer B-1.
Subscription C (Shared): Consumers C-1 and C-2 receive <K1,V10>, <K2,V21>, <K1,V12>, <K2,V20>, <K1,V11>, <K2,V22> without regard to key.
Subscription D (Key-Shared): Consumers D-1 and D-2 receive <K1,V10>, <K1,V11>, <K1,V12> and <K2,V20>, <K2,V21>, <K2,V22>, grouped by key.]
Messaging
Ordering Guarantees
Topic Ordering Guarantees:
● Messages sent to a single topic or
partition DO have an ordering
guarantee.
● Messages sent to different partitions
DO NOT have an ordering guarantee.
Subscription Mode Guarantees:
● A single consumer can receive
messages from the same partition in
order using an exclusive or failover
subscription mode.
● Multiple consumers can receive
messages from the same key in order
using the key_shared subscription
mode.
Unified Messaging Model
[Diagram: Producer 1 and Producer 2 publish messages m0 to m4 to a Pulsar topic/partition.
Streaming: Subscription A (Exclusive) with Consumers A-0 and A-1 (one marked X); Subscription B (Failover) with Consumers B-0 and B-1, where B-1 takes over in case of failure in Consumer B-0.
Messaging: Subscription C (Shared) with Consumers C-1, C-2 and C-3; Subscription D (Key-Shared) with Consumers D-1, D-2 and D-3 receiving <k2,v1>, <k2,v3>, <k3,v2>, <k1,v0>, <k1,v4>.]
Connectivity
• Libraries - (Java, Python, Go, NodeJS,
WebSockets, C++, C#, Scala, Rust,...)
• Functions - Lightweight Stream
Processing (Java, Python, Go)
• Connectors - Sources & Sinks
(Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP), KoP
(Kafka), MoP (MQTT)
• Processing Engines - Flink, Spark,
Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage - (S3)
hub.streamnative.io
Use Cases
Multi-Tenant Data
Infrastructure
AdTech
Fraud Detection
FinTech
IoT Analytics
Microservices Development
Schema Registry
[Diagram: The schema registry holds schema-1, schema-2 and schema-3 (value=Avro/Protobuf/JSON). Producers and consumers each keep a local cache of schemas, and each message carries a schema ID plus data.]
● Producers: send (register) the schema if it is not in the local cache, then send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID.
● Consumers: get the schema by ID if it is not in the local cache, then read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID.
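The register-once, look-up-once flow in the diagram can be sketched with a small in-memory model. Everything here is an assumption for illustration: `SchemaRegistry`, `Producer` and `Consumer` are hypothetical classes, JSON stands in for Avro/Protobuf, and a "schema" is reduced to a name plus a list of field names.

```python
import json


class SchemaRegistry:
    """Hypothetical in-memory registry mapping schema IDs to schemas."""

    def __init__(self):
        self._by_id = {}
        self._ids = {}
        self.lookups = 0  # counts get() calls, to show the effect of caching

    def register(self, schema):
        canon = json.dumps(schema, sort_keys=True)
        if canon not in self._ids:
            schema_id = len(self._ids) + 1
            self._ids[canon] = schema_id
            self._by_id[schema_id] = schema
        return self._ids[canon]

    def get(self, schema_id):
        self.lookups += 1
        return self._by_id[schema_id]


class Producer:
    def __init__(self, registry):
        self.registry = registry
        self._cache = {}  # canonical schema text -> schema ID

    def send(self, schema, record):
        canon = json.dumps(schema, sort_keys=True)
        if canon not in self._cache:  # register only on a cache miss
            self._cache[canon] = self.registry.register(schema)
        missing = set(schema["fields"]) - set(record)
        if missing:
            raise ValueError(f"record is missing fields: {sorted(missing)}")
        return self._cache[canon], json.dumps(record).encode()


class Consumer:
    def __init__(self, registry):
        self.registry = registry
        self._cache = {}  # schema ID -> schema

    def receive(self, schema_id, payload):
        if schema_id not in self._cache:  # fetch only on a cache miss
            self._cache[schema_id] = self.registry.get(schema_id)
        record = json.loads(payload)
        return {name: record[name] for name in self._cache[schema_id]["fields"]}


registry = SchemaRegistry()
producer, consumer = Producer(registry), Consumer(registry)
payment = {"name": "payment", "fields": ["account", "amount"]}
wire = [producer.send(payment, {"account": "GB-001", "amount": 120.5}),
        producer.send(payment, {"account": "GB-002", "amount": 8.0})]
received = [consumer.receive(schema_id, data) for schema_id, data in wire]
print(received, "registry lookups:", registry.lookups)
```

Two messages are exchanged, but the consumer contacts the registry only once; the second message is decoded from the local cache.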
Pulsar Functions
● Lightweight computation
similar to AWS Lambda.
● Specifically designed to use
Apache Pulsar as a message
bus.
● Function runtime can be
located within Pulsar Broker.
A serverless event streaming
framework
● Consume messages from one
or more Pulsar topics.
● Apply user-supplied
processing logic to each
message.
● Publish the results of the
computation to another topic.
● Support multiple
programming languages (Java,
Python, Go)
● Can leverage 3rd-party
libraries to support the
execution of ML models on
the edge.
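As a minimal sketch of the "apply user-supplied processing logic to each message" step: Pulsar Functions document a language-native Python style in which a module exposes a plain `process(input)` function and its return value is published to the output topic. The example below follows that shape. The JSON payload layout, the `amount` field and the 10,000 threshold are assumptions made up for this illustration, and deployment configuration is not shown.

```python
import json

# Assumed threshold above which a payment is flagged for review.
SUSPICIOUS_AMOUNT = 10_000


def process(input):
    """Called once per message; the returned string would go to the output topic."""
    payment = json.loads(input)
    payment["suspicious"] = payment["amount"] > SUSPICIOUS_AMOUNT
    return json.dumps(payment)


# Local check without a broker: feed the function a sample message.
print(process('{"account": "GB-001", "amount": 25000}'))
```

Because the function has no dependency on a client library, it can be unit-tested locally before being submitted to a cluster.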
Pulsar Functions
Moving Data In and Out of Pulsar
IO/Connectors are a simple way to integrate with external systems and move
data in and out of Pulsar. https://pulsar.apache.org/docs/en/io-jdbc-sink/
● Built on top of Pulsar Functions
● Built-in connectors - hub.streamnative.io
Source Sink
Kafka-on-Pulsar (KoP)
Pulsar SQL
Presto/Trino workers can read
segments directly from
bookies (or offloaded storage)
in parallel.
[Diagram: A producer and a consumer connect to Brokers 1, 2 and 3, which serve Topic1-Part1, Topic1-Part2 and Topic1-Part3. Segments 1, 2, 3, 4 ... X of each partition are stored across Bookies 1, 2 and 3. A query goes to the query coordinator, which uses topic metadata and hands work to the SQL workers that read the segments.]
Streaming FLiPS Apps
[Diagram: Events flow in both directions between apps, CDC, a streaming edge gateway, a Pulsar sink and the platform. StreamNative Cloud and StreamNative Hub sit on top of unified batch and stream computing (Batch + Stream) and unified batch and stream storage (Queuing + Streaming), with offload to tiered storage. Protocols: Pulsar, KoP, MoP, WebSocket.]
Review: Key Pulsar Terminology
● Producer is a process that publishes messages to a topic.
● Consumer is a process that establishes a subscription to a topic
and processes messages published to that topic.
● Subscription: A subscription is a named configuration rule that
determines how messages are delivered to consumers. Four
subscription modes are available in Pulsar: exclusive, shared,
failover, and key-shared.
● Brokers handle the connections and route messages.
● Topics are named channels for transmitting messages from
producers to consumers. Partitioned Topics are “virtual” topics
composed of multiple topics.
● Messages belong to a topic and contain an arbitrary payload.
● Instance is a group of clusters that
act together as a single unit.
● Cluster is a set of Pulsar brokers,
ZooKeeper quorum, and an
ensemble of BookKeeper bookies.
● Tenants are the administrative unit
for allocating capacity and enforcing
an authentication/authorization
scheme.
● Namespaces are a grouping
mechanism for related topics.
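Tenants, namespaces and topics come together in the topic name, which in Pulsar takes the form `persistent://tenant/namespace/topic` (or `non-persistent://...`), with a bare topic name resolving to the `public` tenant and `default` namespace. The helper below is a small sketch of that naming scheme; `parse_topic` and `TopicName` are hypothetical names, and the `fintech/payments` tenant and namespace are made up for the example.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split a topic name into domain, tenant, namespace and local topic name."""
    if "://" not in name:
        # Short form: assume the default tenant and namespace.
        return TopicName("persistent", "public", "default", name)
    domain, rest = name.split("://", 1)
    if domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unknown topic domain: {domain!r}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got {rest!r}")
    return TopicName(domain, *parts)


print(parse_topic("persistent://fintech/payments/transactions"))
print(parse_topic("alerts"))
```

Grouping topics this way is what lets capacity and access rules be set per tenant or per namespace rather than topic by topic.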
The Need For Real-Time Data
Hybrid and multi-cloud
strategies with native
geo-replication
Seamlessly build
microservice architectures
with support for streaming
and messaging workloads
Built for Kubernetes
CloudNative
migrations with tools
360 degree customer data
multi-tenancy, infinite
retention, and extensive
connector ecosystem
Background
● Provides a data platform
for the cloud
● Customers include 92 of
the Fortune 100
● Core use cases include
real-time monitoring,
interactive applications,
log processing & analytics,
IoT analytics, streaming
data transformation,
real-time analytics &
event-driven workflows
Why Pulsar
● Scalability
● Durability
● Fault Tolerance
● High Availability
● Sharing & Isolation
● Messaging Models
● Persistence
● Client Languages
● Deployment in k8s
● Operability
● Disaster Recovery
● TCO
● Community & Adoption
Benefits
● 1.5-2x lower in capex
cost
● 5-50x improvement in
latency
● 2-3x lower in opex due to
layered architecture
● Processes billions of
messages/day in
production
Background
● The third-largest payment
provider in China behind
Alipay and WeChat
Payment
● 500 million registered users
and 41.9 million active users
● Need to improve the
efficiency of fraud detection
for mobile payments
● Current lambda architecture
of Kafka + Hive is complex
and difficult to maintain
Benefits
● Reduce complexity by 33%
(clusters reduced from six to
four)
● Improve production
efficiency by 11 times
● Higher stability due to the
unified architecture
Why Pulsar
● Cloud-native architecture
and segment-centric
storage
● Pulsar is able to do both
streaming and batch
processing
● Able to build a unified
data processing stack
with Pulsar and Spark,
streamlining messy
operations problems
StreamNative Customer Spotlight:
Background
● Flipkart is the largest
e-commerce company
in India with $6B+ in
annual revenue
● Company-wide
messaging platform,
supporting different
types of streaming use
cases, including:
payment processing,
order tracking,
warehouse, logistics, etc.
Why StreamNative
● Work with the original
developers of Pulsar and
top Pulsar engineers
● Experience operating
large scale,
geo-replicated
messaging systems
● 24 x 7 support to
support mission-critical
business applications
Benefits
● Able to handle spikes in
traffic without manual
rebalancing or system failure
● Reduced operational
complexity and total cost of
ownership
● Support the move to cloud
StreamNative Customer Spotlight:
Background
● Narvar provides
e-commerce supply chain
management software,
powering 300 retailers and
650 brands
● Core use case:
asynchronous processing
to distribute tasks between
the various systems,
including individual
retailers’ ordering and
warehouse management
applications
Why StreamNative
● Work with the original
developers of Pulsar and
top Pulsar engineers
● “Before we began working
with StreamNative, Sijie
Guo and his team helped us
work out some production
issues. We were very
impressed by how quickly
they solved our problems
and their willingness to
help.” - Ankush Goyal
Benefits
● Accelerate application
development
● Able to handle spikes in
traffic without manual
rebalancing or system failure
● Reduced customer issues
Passionate and dedicated team.
Founded by the original developers of
Apache Pulsar.
StreamNative helps teams to capture,
manage, and leverage data using Pulsar’s
unified messaging and streaming
platform.
Building An App
Code Along With Tim
<<DEMO>>
Geo-Replication
Pulsar has built-in cross
data center replication
that is used in production
already.
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum... by HostedbyConfluent
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...
Let’s Monitor Conditions at the Conference With Timothy Spann & David Kjerrum...
HostedbyConfluent336 views
(Current22) Let's Monitor The Conditions at the Conference by Timothy Spann
(Current22) Let's Monitor The Conditions at the Conference(Current22) Let's Monitor The Conditions at the Conference
(Current22) Let's Monitor The Conditions at the Conference
Timothy Spann150 views
ITPC Building Modern Data Streaming Apps by Timothy Spann
ITPC Building Modern Data Streaming AppsITPC Building Modern Data Streaming Apps
ITPC Building Modern Data Streaming Apps
Timothy Spann797 views
Bni cloud presentation by richszy
Bni cloud presentationBni cloud presentation
Bni cloud presentation
richszy628 views

More from Timothy Spann

Building Real-Time Travel Alerts by
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
165 views48 slides
JConWorld_ Continuous SQL with Kafka and Flink by
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
156 views36 slides
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines by
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
150 views25 slides
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo by
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
162 views8 slides
CoC23_ Looking at the New Features of Apache NiFi by
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiTimothy Spann
36 views24 slides
CoC23_ Let’s Monitor The Conditions at the Conference by
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceTimothy Spann
17 views17 slides

More from Timothy Spann(20)

Building Real-Time Travel Alerts by Timothy Spann
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel Alerts
Timothy Spann165 views
JConWorld_ Continuous SQL with Kafka and Flink by Timothy Spann
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
Timothy Spann156 views
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines by Timothy Spann
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
Timothy Spann150 views
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo by Timothy Spann
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Timothy Spann162 views
CoC23_ Looking at the New Features of Apache NiFi by Timothy Spann
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFi
Timothy Spann36 views
CoC23_ Let’s Monitor The Conditions at the Conference by Timothy Spann
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the Conference
Timothy Spann17 views
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf by Timothy Spann
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
Timothy Spann23 views
CoC23_Utilizing Real-Time Transit Data for Travel Optimization by Timothy Spann
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationCoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
Timothy Spann31 views
The Never Landing Stream with HTAP and Streaming by Timothy Spann
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
Timothy Spann254 views
Meetup - Brasil - Data In Motion - 2023 September 19 by Timothy Spann
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
Timothy Spann319 views
Implement a Universal Data Distribution Architecture to Manage All Streaming ... by Timothy Spann
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Timothy Spann28 views
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data by Timothy Spann
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Timothy Spann193 views
big data fest building modern data streaming apps by Timothy Spann
big data fest building modern data streaming appsbig data fest building modern data streaming apps
big data fest building modern data streaming apps
Timothy Spann317 views
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp by Timothy Spann
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Timothy Spann163 views
OSSNA Building Modern Data Streaming Apps by Timothy Spann
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
Timothy Spann155 views
GSJUG: Mastering Data Streaming Pipelines 09May2023 by Timothy Spann
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023
Timothy Spann255 views
BestInFlowCompetitionTutorials03May2023 by Timothy Spann
BestInFlowCompetitionTutorials03May2023BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023
Timothy Spann11 views
Cloudera Sandbox Event Guidelines For Workflow by Timothy Spann
Cloudera Sandbox Event Guidelines For WorkflowCloudera Sandbox Event Guidelines For Workflow
Cloudera Sandbox Event Guidelines For Workflow
Timothy Spann32 views
Meet the Committers Webinar_ Lab Preparation by Timothy Spann
Meet the Committers Webinar_ Lab PreparationMeet the Committers Webinar_ Lab Preparation
Meet the Committers Webinar_ Lab Preparation
Timothy Spann32 views

Recently uploaded

Chat GPTs by
Chat GPTsChat GPTs
Chat GPTsGene Leybzon
13 views36 slides
nintendo_64.pptx by
nintendo_64.pptxnintendo_64.pptx
nintendo_64.pptxpaiga02016
7 views7 slides
What is API by
What is APIWhat is API
What is APIartembondar5
15 views15 slides
JioEngage_Presentation.pptx by
JioEngage_Presentation.pptxJioEngage_Presentation.pptx
JioEngage_Presentation.pptxadmin125455
9 views4 slides
Ports-and-Adapters Architecture for Embedded HMI by
Ports-and-Adapters Architecture for Embedded HMIPorts-and-Adapters Architecture for Embedded HMI
Ports-and-Adapters Architecture for Embedded HMIBurkhard Stubert
35 views19 slides
Techstack Ltd at Slush 2023, Ukrainian delegation by
Techstack Ltd at Slush 2023, Ukrainian delegationTechstack Ltd at Slush 2023, Ukrainian delegation
Techstack Ltd at Slush 2023, Ukrainian delegationViktoriiaOpanasenko
7 views4 slides

Recently uploaded(20)

JioEngage_Presentation.pptx by admin125455
JioEngage_Presentation.pptxJioEngage_Presentation.pptx
JioEngage_Presentation.pptx
admin1254559 views
Ports-and-Adapters Architecture for Embedded HMI by Burkhard Stubert
Ports-and-Adapters Architecture for Embedded HMIPorts-and-Adapters Architecture for Embedded HMI
Ports-and-Adapters Architecture for Embedded HMI
Burkhard Stubert35 views
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P... by NimaTorabi2
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
NimaTorabi217 views
Top-5-production-devconMunich-2023-v2.pptx by Tier1 app
Top-5-production-devconMunich-2023-v2.pptxTop-5-production-devconMunich-2023-v2.pptx
Top-5-production-devconMunich-2023-v2.pptx
Tier1 app9 views
Automated Testing of Microsoft Power BI Reports by RTTS
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI Reports
RTTS11 views
Mobile App Development Company by Richestsoft
Mobile App Development CompanyMobile App Development Company
Mobile App Development Company
Richestsoft 5 views
Supercharging your Python Development Environment with VS Code and Dev Contai... by Dawn Wages
Supercharging your Python Development Environment with VS Code and Dev Contai...Supercharging your Python Development Environment with VS Code and Dev Contai...
Supercharging your Python Development Environment with VS Code and Dev Contai...
Dawn Wages5 views
Streamlining Your Business Operations with Enterprise Application Integration... by Flexsin
Streamlining Your Business Operations with Enterprise Application Integration...Streamlining Your Business Operations with Enterprise Application Integration...
Streamlining Your Business Operations with Enterprise Application Integration...
Flexsin 5 views
Top-5-production-devconMunich-2023.pptx by Tier1 app
Top-5-production-devconMunich-2023.pptxTop-5-production-devconMunich-2023.pptx
Top-5-production-devconMunich-2023.pptx
Tier1 app10 views
Dapr Unleashed: Accelerating Microservice Development by Miroslav Janeski
Dapr Unleashed: Accelerating Microservice DevelopmentDapr Unleashed: Accelerating Microservice Development
Dapr Unleashed: Accelerating Microservice Development
Miroslav Janeski16 views
Introduction to Git Source Control by John Valentino
Introduction to Git Source ControlIntroduction to Git Source Control
Introduction to Git Source Control
John Valentino8 views
Electronic AWB - Electronic Air Waybill by Freightoscope
Electronic AWB - Electronic Air Waybill Electronic AWB - Electronic Air Waybill
Electronic AWB - Electronic Air Waybill
Freightoscope 6 views

Open Source Bristol 30 March 2022

  • 2. streamnative.io Tim Spann Developer Advocate StreamNative ● FLiP(N) Stack = Flink, Pulsar and NiFi Stack ● Streaming Systems & Data Architecture Expert ● Experience: ○ 15+ years of experience with streaming technologies including Apache Pulsar, Apache Flink, Apache Spark, Apache NiFi, Big Data, Cloud, Trino, Aerospike, IoT and more. John Kinson Head of Sales, EMEA StreamNative ● Startup, Scale-up and Large Enterprise expert ● Building the StreamNative Sales function in EMEA ● Experience: ○ 25+ years of building and selling distributed and embedded systems in the telecoms, digital media and cloud enterprise software industries
  • 3. Agenda 01 Welcome 02 Introduction to Messaging + Data Streaming 03 Introduction to Apache Pulsar 04 Why Open Source 05 Resources 06 Q&A
  • 4. MESSAGING ➔ Asynchronous messages triggered by events ➔ Consuming messages regardless of Language, System, Sender ➔ Queueing ➔ Routing ➔ Work Queues ➔ JPMorgan Chase AMQP
  • 5. DATA STREAMING ➔ Perform in Real-Time ➔ Process Events as They Happen ➔ Joining Streams with SQL ➔ Find Anomalies Immediately ➔ Ordering and Arrival Semantics ➔ Continuous Streams of Data
  • 6. When are Messaging and Streaming used? ● Accessing historical as well as real-time data ● Pub/sub model enables event streams to be sent from multiple producers, and consumed by multiple consumers ● To process large amounts of data in a highly scalable way
  • 7. Industry trends ● Banking: Transforming from siloed systems to combined data streams ● Insurance: Provide faster claim processing, fraud detection and system integration ● IoT: Handle huge volumes of data from sensors
  • 8. Apache Pulsar is a Cloud-Native Messaging and Event-Streaming Platform.
  • 10. Pulsar: Unified Messaging + Data Streaming ● Messaging: Ideal for work queues that do not require tasks to be performed in a particular order, for example, sending one email message to many recipients. RabbitMQ and Amazon SQS are examples of popular queue-based message systems. ● Streaming: Works best in situations where the order of messages is important, for example, data ingestion. Kafka and Amazon Kinesis are examples of messaging systems that use streaming semantics for consuming messages.
  • 11. [Diagram: StreamNative platform. Apps, CDC sources and a streaming edge gateway connect through protocols (Pulsar, KoP, MoP, WebSocket) to unified messaging and streaming (queuing + streaming). Unified batch and stream storage offloads to tiered storage, and unified batch and stream computing (batch + stream) runs on top, with a Pulsar sink. Offered through StreamNative Hub and StreamNative Cloud.]
  • 14. Using Pulsar with Fintech ● Low latency ● Geo-replication ● Data integrity ● High availability ● Durability ● Multi-tenancy ● Multiple data consumers: transactions, payment processing, alerts, analytics, KYC, fraud detection with ML & AI ● Large data volumes, high scalability ● Financial event messaging ● Many topics, producers, consumers
  • 15. Why Open Source Pulsar? Sijie Guo ASF Member Pulsar/BookKeeper PMC Founder and CEO Jia Zhai Pulsar/BookKeeper PMC Co-Founder Matteo Merli ASF Member Pulsar/BookKeeper PMC CTO
  • 16. Why Open Source Pulsar? OUR BETS AND EARLY DECISIONS ● We would get many benefits from an open source model ○ Other companies would help develop the product ○ Better security, code escrow, longevity ● We would keep the core features in the OSS version ● We could build commercial offerings, services around the core product
  • 17. C/OSS Model ● Benefits: Many developers; Security, Longevity, Escrow ● Challenges: Why pay?; Multiple roadmaps
  • 18. RESOURCES Here are resources to continue your journey with Apache Pulsar
  • 20. FREE ebook: Apache Pulsar in Action. Access Now!
  • 21. Q&A ● Tim Spann, Developer Advocate: @PaaSDev, linkedin.com/in/timothyspann, github.com/tspannhw ● John Kinson, Head of Sales EMEA: john@streamnative.io, linkedin.com/in/johnkinson, +44 207 072 1095
  • 23. Industry trends. Notable industries and sectors using data streaming: ● Banking - transforming from siloed systems to combined data streams ○ Typical applications of event streaming include banking sector processing of financial transactions, with multiple customer touchpoints, notifications, and support for mobile devices ○ Banking data (transactions and metadata) can be streamed in parallel for fraud detection using ML and AI in near real-time ● Insurance - building a single view from multiple data sources to provide faster claim processing, fraud detection and system integration ● IoT - handling huge volumes of data from sensors
  • 24. Pulsar Adoption Use Cases ● Adopted Pulsar to replace Kafka in their DSP (Data Streaming Platform): 1.5-2x lower capex cost; 5-50x improvement in latency; 2-3x lower opex due to layered architecture; processes 10 petabytes/day. ● Adopted Pulsar to power their billing platform, Midas, which processes hundreds of billions of financial transactions daily. Adoption then expanded to Tencent’s Federated Learning Platform and Tencent Gaming. ● Applied Materials is one of the biggest semiconductor hardware and software suppliers in the industry. They adopted Pulsar to build a message bus that ties all of their data together. They previously used Tibco.
  • 25. Agenda Welcome Introduction to Messaging + Data Streaming ● What is messaging and data streaming? ● When is it used? ● What are the industry trends? Introduction to Apache Pulsar ● What it is ● What it enables ● Who uses it today? ● Using Apache Pulsar in FinTech applications Why Open Source ● Why open source Apache Pulsar? ● What have been the benefits and challenges? Resources Q&A
  • 27. Pulsar Adoption Spreads Tencent serves billions of users and over a million merchants. Use Case #1: Payments Early 2019, Tencent adopts Pulsar to power their billing platform, Midas, processing hundreds of billions of financial transactions daily. Use Case #2: ML/AI Pulsar adoption spreads to Tencent’s Federated Learning Platform where it supports trillions of concurrent federated learnings every day. Use Case #3: Gaming Tencent’s Gaming Department replaces Kafka with Pulsar for its logging pipeline.
  • 28. Founded By The Creators Of Apache Pulsar Sijie Guo ASF Member Pulsar/BookKeeper PMC Founder and CEO Jia Zhai Pulsar/BookKeeper PMC Co-Founder Matteo Merli ASF Member Pulsar/BookKeeper PMC CTO Data veterans with extensive industry experience
  • 29. Messages - the basic unit of Pulsar. Components: ● Value / data payload: The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. ● Key: Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction. ● Properties: An optional key/value map of user-defined properties. ● Producer name: The name of the producer that produces the message. If you do not specify a producer name, the default name is used. Used for message de-duplication. ● Sequence ID: Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence. Used for message de-duplication.
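The message fields above can be sketched as a small data model. This is a toy illustration in plain Python, not the Pulsar client API: the `Message` and `Deduplicator` classes, the producer name and the payloads are all hypothetical, and the rule "drop a message whose sequence ID is not higher than the last one seen from the same producer" is an assumed simplification of how producer name and sequence ID support de-duplication.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Message:
    """Toy model of the message fields listed above (not the Pulsar client class)."""
    value: bytes                      # raw payload bytes
    producer_name: str                # identifies the sending producer
    sequence_id: int                  # position in that producer's sequence
    key: Optional[str] = None         # optional; used for partitioning and compaction
    properties: Dict[str, str] = field(default_factory=dict)


class Deduplicator:
    """Accepts a message only if its sequence ID is higher than the last one
    accepted from the same producer (assumed simplification)."""

    def __init__(self) -> None:
        self._last_seen: Dict[str, int] = {}

    def accept(self, msg: Message) -> bool:
        last = self._last_seen.get(msg.producer_name, -1)
        if msg.sequence_id <= last:
            return False              # duplicate or replayed send
        self._last_seen[msg.producer_name] = msg.sequence_id
        return True


dedup = Deduplicator()
sent = [
    Message(b"debit:10", "payments-producer", 0, key="acct-1"),
    Message(b"debit:10", "payments-producer", 0, key="acct-1"),  # retry of the same send
    Message(b"credit:5", "payments-producer", 1, key="acct-1"),
]
accepted = [m for m in sent if dedup.accept(m)]
```

Here the retried send carries the same producer name and sequence ID as the first, so only two of the three messages are accepted.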
  • 30. Producer-Consumer ● Producer: The publisher sends data and doesn't know about the subscribers or their status. All interactions go through Pulsar, which handles all communication. ● Consumer: The subscriber receives data from the publisher and never directly interacts with it. [Diagram: producer and consumer, each connected to a topic.]
  • 31. Pulsar’s Publish-Subscribe model [Diagram: Producers 1 and 2 publish to a topic on a broker; a subscription delivers the messages to Consumers 1, 2 and 3.] ● Producers send messages. ● Topics are ordered, named channels that producers use to transmit messages to subscribed consumers. ● Messages belong to a topic and contain an arbitrary payload. ● Brokers handle connections and route messages between producers and consumers. ● Subscriptions are named configuration rules that determine how messages are delivered to consumers. ● Consumers receive messages.
  • 32. Pulsar Subscription Modes. Different subscription modes have different semantics: ● Exclusive/Failover - guaranteed order, single active consumer ● Shared - multiple active consumers, no order ● Key_Shared - multiple active consumers, order for given key [Diagram: Producers 1 and 2 publish to a Pulsar topic with four subscriptions. Subscription A (Exclusive): Consumer A. Subscription B (Failover): Consumer B-1, with Consumer B-2 taking over in case of failure in Consumer B-1. Subscription C (Shared): messages <K1,V10>, <K1,V11>, <K1,V12>, <K2,V20>, <K2,V21>, <K2,V22> are spread across Consumers C-1 and C-2 regardless of key. Subscription D (Key-Shared): the K1 messages go to one of Consumers D-1 and D-2 and the K2 messages to the other.]
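The delivery semantics of these modes can be simulated in a few lines. This is a minimal sketch in plain Python, not Pulsar code: the `dispatch` function and consumer names are hypothetical, round-robin stands in for however a broker spreads shared messages, CRC32 stands in for the broker's key hashing, and failover is modeled only in its normal state where the first consumer is active.

```python
import zlib
from collections import defaultdict
from itertools import cycle
from typing import Dict, List, Tuple

Msg = Tuple[str, str]  # (key, value)


def dispatch(messages: List[Msg], consumers: List[str], mode: str) -> Dict[str, List[Msg]]:
    """Return which consumer receives which messages under a given mode."""
    delivered: Dict[str, List[Msg]] = defaultdict(list)
    if mode in ("exclusive", "failover"):
        # A single active consumer receives every message, in order.
        active = consumers[0]
        for msg in messages:
            delivered[active].append(msg)
    elif mode == "shared":
        # Messages are spread across consumers; per-key order is not kept.
        round_robin = cycle(consumers)
        for msg in messages:
            delivered[next(round_robin)].append(msg)
    elif mode == "key_shared":
        # All messages with the same key go to the same consumer, in order.
        for key, value in messages:
            index = zlib.crc32(key.encode("utf-8")) % len(consumers)
            delivered[consumers[index]].append((key, value))
    else:
        raise ValueError(f"unknown subscription mode: {mode}")
    return dict(delivered)


messages = [("K1", "V10"), ("K1", "V11"), ("K1", "V12"),
            ("K2", "V20"), ("K2", "V21"), ("K2", "V22")]
consumers = ["consumer-1", "consumer-2"]

exclusive = dispatch(messages, consumers, "exclusive")
shared = dispatch(messages, consumers, "shared")
key_shared = dispatch(messages, consumers, "key_shared")
```

In the shared result each consumer sees a mix of K1 and K2 values, while in the key-shared result every value for a given key lands on one consumer in publish order.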
  • 33. Messaging Ordering Guarantees. Topic Ordering Guarantees: ● Messages sent to a single topic or partition DO have an ordering guarantee. ● Messages sent to different partitions DO NOT have an ordering guarantee. Subscription Mode Guarantees: ● A single consumer can receive messages from the same partition in order using an exclusive or failover subscription mode. ● Multiple consumers can receive messages from the same key in order using the key_shared subscription mode.
  • 35. Unified Messaging Model (Streaming and Messaging) [Diagram: Producers 1 and 2 publish messages m0 to m4 to a Pulsar topic/partition with four subscriptions. Subscription A (Exclusive): Consumer A-0 receives m0 to m4; Consumer A-1 is marked X. Subscription B (Failover): Consumer B-0 receives m0 to m4, with Consumer B-1 taking over in case of failure in Consumer B-0. Subscription C (Shared): m0 to m4 are spread across Consumers C-1, C-2 and C-3. Subscription D (Key-Shared): <k1,v0> and <k1,v4>, <k2,v1> and <k2,v3>, and <k3,v2> are grouped by key across Consumers D-1, D-2 and D-3.]
  • 36. Connectivity • Libraries - (Java, Python, Go, NodeJS, WebSockets, C++, C#, Scala, Rust,...) • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3) hub.streamnative.io
  • 37. Use Cases Multi-Tenant Data Infrastructure AdTech Fraud Detection FinTech IoT Analytics Microservices Development
  • 38. Schema Registry. The registry stores schemas (schema-1, schema-2, schema-3; value = Avro/Protobuf/JSON), each identified by a schema ID. Producers and consumers each keep a local cache of schemas. ● Producers: send (register) the schema if it is not in the local cache, then send schema-1 data serialized per schema ID. ● Consumers: get the schema by ID if it is not in the local cache, then read schema-1 data deserialized per schema ID.
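The register-once, look-up-once flow described above can be sketched with an in-memory registry. This is a hedged illustration in plain Python, not Pulsar's schema registry: the `SchemaRegistry`, `Producer` and `Consumer` classes are hypothetical, JSON stands in for Avro/Protobuf, and the "Payment" schema is invented for the example.

```python
import json
from typing import Dict, Tuple


class SchemaRegistry:
    """In-memory stand-in for a schema registry (hypothetical)."""

    def __init__(self) -> None:
        self._by_id: Dict[int, str] = {}
        self._ids: Dict[str, int] = {}
        self.lookups = 0              # counts get-by-ID calls from consumers

    def register(self, schema_def: str) -> int:
        if schema_def not in self._ids:
            schema_id = len(self._ids) + 1
            self._ids[schema_def] = schema_id
            self._by_id[schema_id] = schema_def
        return self._ids[schema_def]

    def get(self, schema_id: int) -> str:
        self.lookups += 1
        return self._by_id[schema_id]


class Producer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self._registry = registry
        self._cache: Dict[str, int] = {}   # local cache: schema -> ID

    def send(self, schema_def: str, record: dict) -> Tuple[int, bytes]:
        if schema_def not in self._cache:  # register only on a cache miss
            self._cache[schema_def] = self._registry.register(schema_def)
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        return self._cache[schema_def], payload


class Consumer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self._registry = registry
        self._cache: Dict[int, str] = {}   # local cache: ID -> schema

    def receive(self, envelope: Tuple[int, bytes]) -> dict:
        schema_id, payload = envelope
        if schema_id not in self._cache:   # fetch by ID only on a cache miss
            self._cache[schema_id] = self._registry.get(schema_id)
        fields = json.loads(self._cache[schema_id])["fields"]
        record = json.loads(payload)
        missing = [name for name in fields if name not in record]
        if missing:
            raise ValueError(f"record is missing fields: {missing}")
        return record


registry = SchemaRegistry()
payment_schema = json.dumps({"name": "Payment", "fields": ["account", "amount"]})
producer, consumer = Producer(registry), Consumer(registry)
envelopes = [producer.send(payment_schema, {"account": "acct-1", "amount": 10}),
             producer.send(payment_schema, {"account": "acct-2", "amount": 25})]
records = [consumer.receive(e) for e in envelopes]
```

Two messages are exchanged but the consumer contacts the registry only once, because the second message finds the schema in its local cache.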
  • 39. Pulsar Functions: A serverless event streaming framework ● Lightweight computation similar to AWS Lambda. ● Specifically designed to use Apache Pulsar as a message bus. ● Function runtime can be located within Pulsar Broker.
  • 40. Pulsar Functions ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
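The consume, process, publish cycle above might look like the following in Python. This sketch assumes Pulsar's language-native Python interface, in which a module exposes a `process(input)` function and the returned value is published to the configured output topic; the JSON payment payload, the `flagged` field and the threshold are hypothetical, and the topic wiring happens at deployment time rather than in this code.

```python
import json

# Hypothetical threshold for this example only.
FRAUD_THRESHOLD = 10_000


def process(input):
    """Called once per message; the return value is the output message."""
    payment = json.loads(input)
    # User-supplied logic: flag large payments for review.
    payment["flagged"] = payment["amount"] >= FRAUD_THRESHOLD
    return json.dumps(payment, sort_keys=True)
```

Because the function is plain Python, it can be exercised locally before deployment, for example `process('{"account": "acct-1", "amount": 25000}')`.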
  • 41. Moving Data In and Out of Pulsar IO/Connectors are a simple way to integrate with external systems and move data in and out of Pulsar. https://pulsar.apache.org/docs/en/io-jdbc-sink/ ● Built on top of Pulsar Functions ● Built-in connectors - hub.streamnative.io Source Sink
  • 43. Pulsar SQL. Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel. [Diagram: a producer and a consumer connect to Brokers 1, 2 and 3, which serve Topic1-Part1, Topic1-Part2 and Topic1-Part3. Segments 1 to X are replicated across Bookies 1, 2 and 3. A query coordinator queries topic metadata, and SQL workers read the segments.]
  • 44. [Diagram: StreamNative platform with events. Streaming FLiPS apps, CDC sources and a streaming edge gateway exchange events through protocols (Pulsar, KoP, MoP, WebSocket) with unified messaging and streaming (queuing + streaming). Unified batch and stream storage offloads to tiered storage, and unified batch and stream computing (batch + stream) runs on top, with a Pulsar sink. Offered through StreamNative Hub and StreamNative Cloud.]
  • 45. Review: Key Pulsar Terminology ● Producer: a process that publishes messages to a topic. ● Consumer: a process that establishes a subscription to a topic and processes messages published to that topic. ● Subscription: a named configuration rule that determines how messages are delivered to consumers. Four subscription modes are available in Pulsar: exclusive, shared, failover, and key-shared. ● Brokers handle the connections and route messages. ● Topics are named channels for transmitting messages from producers to consumers. Partitioned topics are “virtual” topics composed of multiple topics. ● Messages belong to a topic and contain an arbitrary payload. ● Instance: a group of clusters that act together as a single unit. ● Cluster: a set of Pulsar brokers, a ZooKeeper quorum, and an ensemble of BookKeeper bookies. ● Tenants are the administrative unit for allocating capacity and enforcing an authentication/authorization scheme. ● Namespaces are a grouping mechanism for related topics.
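The tenant, namespace and topic terms above show up directly in Pulsar topic names, which take the form `persistent://tenant/namespace/topic` (or `non-persistent://...`). A minimal sketch of splitting such a name into its parts follows; the `TopicName` type, the `parse_topic` helper and the example name are hypothetical and are not part of any Pulsar client library.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str      # administrative unit
    namespace: str   # grouping of related topics
    topic: str       # the named channel itself


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified topic name into its four parts."""
    domain, separator, rest = name.partition("://")
    if not separator or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"not a fully qualified topic name: {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in: {name!r}")
    return TopicName(domain, *parts)


parsed = parse_topic("persistent://bank/payments/card-transactions")
```

In this example `bank` is the tenant, `payments` the namespace and `card-transactions` the topic.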
  • 46. The Need For Real-Time Data ● Hybrid and multi-cloud strategies with native geo-replication ● Seamlessly build microservice architectures with support for streaming and messaging workloads ● Built for Kubernetes ● CloudNative migrations with tools ● 360 degree customer data ● Multi-tenancy, infinite retention, and extensive connector ecosystem
  • 48. Background: ● Provides a data platform for the cloud ● Customers include 92 of the Fortune 100 ● Core use cases include real-time monitoring, interactive applications, log processing & analytics, IoT analytics, streaming data transformation, real-time analytics & event-driven workflows. Why Pulsar: ● Scalability ● Durability ● Fault Tolerance ● High Availability ● Sharing & Isolation ● Messaging Models ● Persistence ● Client Languages ● Deployment in k8s ● Operability ● Disaster Recovery ● TCO ● Community & Adoption. Benefits: ● 1.5-2x lower in capex cost ● 5-50x improvement in latency ● 2-3x lower in opex due to layered architecture ● Processes billions of messages/day in production
  • 49. Background ● The third-largest payment provider in China behind Alipay and WeChat Payment ● 500 million registered users and 41.9 million active users ● Need to improve the efficiency of fraud detection for mobile payments ● Current lambda architecture of Kafka + Hive is complex and difficult to maintain Benefits ● Reduce complexity by 33% (clusters reduced from six to four) ● Improve production efficiency by 11 times ● Higher stability due to the unified architecture Why Pulsar ● Cloud-native architecture and segment-centric storage ● Pulsar is able to do both streaming and batch processing ● Able to build a unified data processing stack with Pulsar and Spark, streamlining messy operations problems
  • 50. StreamNative Customer Spotlight: Background ● Flipkart is the largest e-commerce company in India with $6B+ in annual revenue ● Company-wide messaging platform, supporting different types of streaming use cases, including: payment processing, order tracking, warehouse, logistics, etc. Why StreamNative ● Work with the original developers of Pulsar and top Pulsar engineers ● Experience operating large scale, geo-replicated messaging systems ● 24 x 7 support to support mission-critical business applications Benefits ● Able to handle spikes in traffic without manual rebalancing or system failure ● Reduced operational complexity and total cost of ownership ● Support the move to cloud
  • 51. StreamNative Customer Spotlight: Background ● Narvar provides e-commerce supply chain management software, powering 300 retailers and 650 brands ● Core use case: asynchronous processing to distribute tasks between the various systems, including individual retailers’ ordering and warehouse management applications Why StreamNative ● Work with the original developers of Pulsar and top Pulsar engineers ● “Before we began working with StreamNative, Sijie Guo and his team helped us work out some production issues. We were very impressed by how quickly they solved our problems and their willingness to help.” - Ankush Goyal Benefits ● Accelerate application development ● Able to handle spikes in traffic without manual rebalancing or system failure ● Reduced customer issues
  • 52. streamnative.io Passionate and dedicated team. Founded by the original developers of Apache Pulsar. StreamNative helps teams to capture, manage, and leverage data using Pulsar’s unified messaging and streaming platform.
  • 53. Building An App Code Along With Tim <<DEMO>>
  • 54. Geo-Replication Pulsar has built-in cross data center replication that is used in production already.
  • 55. Why Open Source Pulsar? Sijie Guo ASF Member Pulsar/BookKeeper PMC Founder and CEO Jia Zhai Pulsar/BookKeeper PMC Co-Founder Matteo Merli ASF Member Pulsar/BookKeeper PMC CTO ● Other companies would help develop the product ● We could build commercial offerings, services around the core product ● We would get many benefits from an open source model