Building Real-Time
Requires a Team
Tim Spann
Developer Advocate
Proprietary & Confidential | 2
Tim Spann
Developer Advocate
at StreamNative
FLiP(N) Stack = Flink, Pulsar and NiFi Stack
Streaming Systems & Data Architecture Expert
Experience:
● 15+ years of experience with streaming technologies
including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet,
IoT, Python and more.
● Today, he helps to grow the Pulsar community sharing rich
technical knowledge and experience at both global
conferences and through individual conversations.
Proprietary & Confidential |
https://bit.ly/32dAJft
3
FLiP Stack Weekly
This week in Apache Flink, Apache
Pulsar, Apache NiFi, Apache Spark and
open source friends.
4
Building Real-Time
Requires a Team
Proprietary & Confidential |
Agenda
5
• Introduction
• What is Apache Pulsar?
• Pulsar to Pinot
• Demo
• Q&A
Proprietary & Confidential |
FinTech
● Fraud prevention
● Customer 360
● Personalization
● Threat detection
eCommerce
● Dynamic pricing
● Digital payments
● Omnichannel inventory
optimization
AdTech
● Real-time bidding
● Ad serving/exchange
● Personalized promos
● Identity graph
6
Verticals and use cases
Proprietary & Confidential |
IoT
● Predictive maintenance
● Track and trace
● Connected supply chain
● Geo-location based alerts
Telecommunications
● Network optimization
● Churn prevention
● Real-time in-service
promos & discounting
Data Lake
● Data pipeline acceleration
● Real-time analytics
● Real-time decisioning
Verticals and use cases
7
A streaming data platform
for cloud-native,
event-driven applications.
Proprietary & Confidential | 9
Founded by the original creators of
Apache Pulsar.
StreamNative employs more than 50% of
the active core committers to Apache
Pulsar.
StreamNative has more experience
designing, deploying, and running
large-scale Apache Pulsar instances
than any team in the world.
10
Apache Pulsar has a vibrant community
560+
Contributors
10,000+
Commits
7,000+
Slack Members
1,000+
Organizations
Using Pulsar
Proprietary & Confidential |
Guaranteed
Message
Delivery
11
Unified
Messaging
Platform
Resiliency
The basics
Infinite
Scalability
Proprietary & Confidential | 12
Pulsar Cluster
● “Bookies”
● Stores messages and cursors
● Messages are grouped in
segments/ledgers
● A group of bookies form an
“ensemble” to store a ledger
● “Brokers”
● Handles message routing and
connections
● Stateless, but with caches
● Automatic load-balancing
● Topics are composed of
multiple segments
●
● Stores metadata for
both Pulsar and
BookKeeper
● Service discovery
Store
Messages
Metadata &
Service Discovery
Metadata &
Service Discovery
Metadata
Storage
Tenants
(Compliance)
Tenants
(Data Services)
Namespace
(Microservices)
Topic-1
(Cust Auth)
Topic-1
(Location Resolution)
Topic-2
(Demographics)
Topic-1
(Budgeted Spend)
Topic-1
(Acct History)
Topic-1
(Risk Detection)
Namespace
(ETL)
Namespace
(Campaigns)
Namespace
(ETL)
Tenants
(Marketing)
Namespace
(Risk Assessment)
Pulsar Cluster
13
Tenant - Namespaces - Topics
Proprietary & Confidential |
Messages - the basic unit of Pulsar
14
Component Description
Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data
can also conform to data schemas.
Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like
topic compaction.
Properties An optional key/value map of user-defined properties.
Producer name The name of the producer who produces the message. If you do not specify a producer name, the
default name is used.
Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the
message is its order in that sequence.
Proprietary & Confidential |
Streaming
Consumer
Consumer
Consumer
Subscription
Shared
Failover
Consumer
Consumer
Subscription
In case of failure in
Consumer B-0
Consumer
Consumer
Subscription
Exclusive
X
Consumer
Consumer
Key-Shared
Subscription
Pulsar
Topic/Partition
Messaging
15
Proprietary & Confidential |
Integrated Schema Registry
16
Schema Registry
schema-1 (value=Avro/Protobuf/JSON) schema-2
(value=Avro/Protobuf/JSON)
schema-3
(value=Avro/Protobuf/JSON)
Schema
Data
ID
Local Cache
for Schemas
+
Schema
Data
ID +
Local Cache
for Schemas
Send schema-1
(value=Avro/Protobuf/JSON) data
serialized per schema ID
Send (register)
schema (if not in
local cache)
Read schema-1
(value=Avro/Protobuf/JSON) data
deserialized per schema ID
Get schema by ID (if
not in local cache)
Producers Consumers
Proprietary & Confidential |
Connections
17
● Functions - Lightweight Stream
Processing (Java, Python, Go)
● Connectors - Sources & Sinks
(Cassandra, Kafka, …)
● Protocol Handlers - AoP (AMQP), KoP
(Kafka), MoP (MQTT)
● Processing Engines - Flink, Spark,
Presto/Trino via Pulsar SQL
● Data Offloaders - Tiered Storage - (S3)
Proprietary & Confidential |
The FliPN kitten crosses the stream
4 ways with Apache Pulsar
18
Proprietary & Confidential |
Kafka on Pulsar (KoP)
19
Proprietary & Confidential |
MQTT on Pulsar (MoP)
20
Proprietary & Confidential |
AMQP on Pulsar (AoP)
21
Proprietary & Confidential |
Kafka to Pulsar
22
Proprietary & Confidential | 23
A serverless event streaming
framework
Pulsar Functions
● Lightweight computation similar to
AWS Lambda.
● Specifically designed to use Apache
Pulsar as a message bus.
● Function runtime can be located
within Pulsar Broker.
● Java Functions
Proprietary & Confidential | 24
Pulsar Functions
● Consume messages from one or
more Pulsar topics.
● Apply user-supplied processing
logic to each message.
● Publish the results of the
computation to another topic.
● Support multiple programming
languages (Java, Python, Go)
● Can leverage 3rd-party libraries to
support the execution of ML
models on the edge.
Proprietary & Confidential |
Apache NiFi - Apache Pulsar Connector
25
https://github.com/streamnative/pulsar-nifi-bundle
Proprietary & Confidential |
Apache Flink
26
● Unified computing engine
● Batch processing is a special
case of stream processing
● Stateful processing
● Massive Scalability
● Flink SQL for queries, inserts
against Pulsar Topics
● Streaming Analytics
● Continuous SQL
● Continuous ETL
● Complex Event Processing
● Standard SQL Powered by
Apache Calcite
Proprietary & Confidential | 27
https://dev.startree.ai/docs/pinot/recipes/pulsar
https://docs.pinot.apache.org/basics/data-import/pinot-stream-ingestion/apache-pulsar
Proprietary & Confidential | 28
Proprietary & Confidential | 29
Proprietary & Confidential |
Proprietary & Confidential | 31
https://github.com/tspannhw/pulsar-thermal-pinot
Proprietary & Confidential | 32
Proprietary & Confidential | 33
Apache Pulsar
in Action
Please enjoy David’s complete
book which is the ultimate
guide to Pulsar.
Proprietary & Confidential | 34
Proprietary & Confidential |
@PassDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
https://streamnative.io/pulsar-python/
35
Tim Spann
Developer Advocate
at StreamNative

NYC Dec 2022 Meetup_ Building Real-Time Requires a Team

  • 1.
    Building Real-Time Requires aTeam Tim Spann Developer Advocate
  • 2.
    Proprietary & Confidential| 2 Tim Spann Developer Advocate at StreamNative FLiP(N) Stack = Flink, Pulsar and NiFi Stack Streaming Systems & Data Architecture Expert Experience: ● 15+ years of experience with streaming technologies including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more. ● Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience at both global conferences and through individual conversations.
  • 3.
    Proprietary & Confidential| https://bit.ly/32dAJft 3 FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends.
  • 4.
  • 5.
    Proprietary & Confidential| Agenda 5 • Introduction • What is Apache Pulsar? • Pulsar to Pinot • Demo • Q&A
  • 6.
    Proprietary & Confidential| FinTech ● Fraud prevention ● Customer 360 ● Personalization ● Threat detection eCommerce ● Dynamic pricing ● Digital payments ● Omnichannel inventory optimization AdTech ● Real-time bidding ● Ad serving/exchange ● Personalized promos ● Identity graph 6 Verticals and use cases
  • 7.
    Proprietary & Confidential| IoT ● Predictive maintenance ● Track and trace ● Connected supply chain ● Geo-location based alerts Telecommunications ● Network optimization ● Churn prevention ● Real-time in-service promos & discounting Data Lake ● Data pipeline acceleration ● Real-time analytics ● Real-time decisioning Verticals and use cases 7
  • 8.
    A streaming dataplatform for cloud-native, event-driven applications.
  • 9.
    Proprietary & Confidential| 9 Founded by the original creators of Apache Pulsar. StreamNative employs more than 50% of the active core committers to Apache Pulsar. StreamNative has more experience designing, deploying, and running large-scale Apache Pulsar instances than any team in the world.
  • 10.
    10 Apache Pulsar hasa vibrant community 560+ Contributors 10,000+ Commits 7,000+ Slack Members 1,000+ Organizations Using Pulsar
  • 11.
    Proprietary & Confidential| Guaranteed Message Delivery 11 Unified Messaging Platform Resiliency The basics Infinite Scalability
  • 12.
    Proprietary & Confidential| 12 Pulsar Cluster ● “Bookies” ● Stores messages and cursors ● Messages are grouped in segments/ledgers ● A group of bookies form an “ensemble” to store a ledger ● “Brokers” ● Handles message routing and connections ● Stateless, but with caches ● Automatic load-balancing ● Topics are composed of multiple segments ● ● Stores metadata for both Pulsar and BookKeeper ● Service discovery Store Messages Metadata & Service Discovery Metadata & Service Discovery Metadata Storage
  • 13.
    Tenants (Compliance) Tenants (Data Services) Namespace (Microservices) Topic-1 (Cust Auth) Topic-1 (LocationResolution) Topic-2 (Demographics) Topic-1 (Budgeted Spend) Topic-1 (Acct History) Topic-1 (Risk Detection) Namespace (ETL) Namespace (Campaigns) Namespace (ETL) Tenants (Marketing) Namespace (Risk Assessment) Pulsar Cluster 13 Tenant - Namespaces - Topics
  • 14.
    Proprietary & Confidential| Messages - the basic unit of Pulsar 14 Component Description Value / data payload The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas. Key Messages are optionally tagged with keys, used in partitioning and also is useful for things like topic compaction. Properties An optional key/value map of user-defined properties. Producer name The name of the producer who produces the message. If you do not specify a producer name, the default name is used. Sequence ID Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
  • 15.
    Proprietary & Confidential| Streaming Consumer Consumer Consumer Subscription Shared Failover Consumer Consumer Subscription In case of failure in Consumer B-0 Consumer Consumer Subscription Exclusive X Consumer Consumer Key-Shared Subscription Pulsar Topic/Partition Messaging 15
  • 16.
    Proprietary & Confidential| Integrated Schema Registry 16 Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
  • 17.
    Proprietary & Confidential| Connections 17 ● Functions - Lightweight Stream Processing (Java, Python, Go) ● Connectors - Sources & Sinks (Cassandra, Kafka, …) ● Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) ● Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL ● Data Offloaders - Tiered Storage - (S3)
  • 18.
    Proprietary & Confidential| The FliPN kitten crosses the stream 4 ways with Apache Pulsar 18
  • 19.
    Proprietary & Confidential| Kafka on Pulsar (KoP) 19
  • 20.
    Proprietary & Confidential| MQTT on Pulsar (MoP) 20
  • 21.
    Proprietary & Confidential| AMQP on Pulsar (AoP) 21
  • 22.
    Proprietary & Confidential| Kafka to Pulsar 22
  • 23.
    Proprietary & Confidential| 23 A serverless event streaming framework Pulsar Functions ● Lightweight computation similar to AWS Lambda. ● Specifically designed to use Apache Pulsar as a message bus. ● Function runtime can be located within Pulsar Broker. ● Java Functions
  • 24.
    Proprietary & Confidential| 24 Pulsar Functions ● Consume messages from one or more Pulsar topics. ● Apply user-supplied processing logic to each message. ● Publish the results of the computation to another topic. ● Support multiple programming languages (Java, Python, Go) ● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
  • 25.
    Proprietary & Confidential| Apache NiFi - Apache Pulsar Connector 25 https://github.com/streamnative/pulsar-nifi-bundle
  • 26.
    Proprietary & Confidential| Apache Flink 26 ● Unified computing engine ● Batch processing is a special case of stream processing ● Stateful processing ● Massive Scalability ● Flink SQL for queries, inserts against Pulsar Topics ● Streaming Analytics ● Continuous SQL ● Continuous ETL ● Complex Event Processing ● Standard SQL Powered by Apache Calcite
  • 27.
    Proprietary & Confidential| 27 https://dev.startree.ai/docs/pinot/recipes/pulsar https://docs.pinot.apache.org/basics/data-import/pinot-stream-ingestion/apache-pulsar
  • 28.
  • 29.
  • 30.
  • 31.
    Proprietary & Confidential| 31 https://github.com/tspannhw/pulsar-thermal-pinot
  • 32.
  • 33.
    Proprietary & Confidential| 33 Apache Pulsar in Action Please enjoy David’s complete book which is the ultimate guide to Pulsar.
  • 34.
  • 35.
    Proprietary & Confidential| @PassDev https://www.linkedin.com/in/timothyspann https://github.com/tspannhw https://streamnative.io/pulsar-python/ 35 Tim Spann Developer Advocate at StreamNative