NYC Dec 2022 Meetup: Building Real-Time Requires a Team

Building Real-Time
Requires a Team
Tim Spann
Developer Advocate
Proprietary & Confidential | 2
Tim Spann
Developer Advocate at StreamNative
FLiP(N) Stack = Flink, Pulsar and NiFi Stack
Streaming Systems & Data Architecture Expert
Experience:
● 15+ years of experience with streaming technologies, including Pulsar, Flink, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more.
● Today, he helps grow the Pulsar community by sharing technical knowledge and experience at global conferences and in individual conversations.
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
Agenda
• Introduction
• What is Apache Pulsar?
• Pulsar to Pinot
• Demo
• Q&A
Verticals and use cases
FinTech
● Fraud prevention
● Customer 360
● Personalization
● Threat detection
eCommerce
● Dynamic pricing
● Digital payments
● Omnichannel inventory optimization
AdTech
● Real-time bidding
● Ad serving/exchange
● Personalized promos
● Identity graph
IoT
● Predictive maintenance
● Track and trace
● Connected supply chain
● Geo-location based alerts
Telecommunications
● Network optimization
● Churn prevention
● Real-time in-service promos & discounting
Data Lake
● Data pipeline acceleration
● Real-time analytics
● Real-time decisioning
A streaming data platform for cloud-native, event-driven applications.
Founded by the original creators of Apache Pulsar.
StreamNative employs more than 50% of the active core committers to Apache Pulsar.
StreamNative has more experience designing, deploying, and running large-scale Apache Pulsar instances than any team in the world.
Apache Pulsar has a vibrant community
● 560+ contributors
● 10,000+ commits
● 7,000+ Slack members
● 1,000+ organizations using Pulsar
The basics
● Guaranteed message delivery
● Unified messaging platform
● Resiliency
● Infinite scalability
Pulsar Cluster
● “Bookies” (store messages)
  ● Store messages and cursors
  ● Messages are grouped in segments/ledgers
  ● A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
  ● Handle message routing and connections
  ● Stateless, but with caches
  ● Automatic load-balancing
  ● Topics are composed of multiple segments
● Metadata storage (metadata & service discovery)
  ● Stores metadata for both Pulsar and BookKeeper
  ● Service discovery
Tenant - Namespaces - Topics
[Diagram: a Pulsar cluster divided into tenants (Compliance, Data Services, Marketing). Each tenant contains namespaces (Microservices, ETL, Campaigns, Risk Assessment), and each namespace contains topics (Cust Auth, Location Resolution, Demographics, Budgeted Spend, Acct History, Risk Detection).]
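The tenant, namespace and topic levels show up directly in how a topic is addressed: Pulsar topic names take the form `persistent://tenant/namespace/topic` (or `non-persistent://...`). Below is a minimal sketch of that naming scheme. The helper names `topic_name` and `parse_topic_name` are invented for illustration and are not part of any Pulsar client, and the `data-services` / `microservices` / `cust-auth` grouping is an assumed pairing of labels from the diagram.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def topic_name(tenant: str, namespace: str, topic: str, persistent: bool = True) -> str:
    """Build a fully qualified topic name from its three levels."""
    domain = "persistent" if persistent else "non-persistent"
    return f"{domain}://{tenant}/{namespace}/{topic}"


def parse_topic_name(name: str) -> TopicName:
    """Split a fully qualified topic name back into its parts."""
    domain, sep, rest = name.partition("://")
    parts = rest.split("/")
    valid_domain = domain in ("persistent", "non-persistent")
    if not sep or not valid_domain or len(parts) != 3 or not all(parts):
        raise ValueError(f"not a fully qualified topic name: {name!r}")
    return TopicName(domain, *parts)


# Hypothetical names, loosely based on the labels in the diagram.
full = topic_name("data-services", "microservices", "cust-auth")
print(full)
print(parse_topic_name(full))
```

Keeping the tenant and namespace in the name is what lets policies and isolation be applied per tenant or per namespace rather than topic by topic.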
Messages - the basic unit of Pulsar
Component | Description
Value / data payload | The data carried by the message. All Pulsar messages contain raw bytes, although message data can also conform to data schemas.
Key | Messages are optionally tagged with keys, which are used in partitioning and are also useful for things like topic compaction.
Properties | An optional key/value map of user-defined properties.
Producer name | The name of the producer that produces the message. If you do not specify a producer name, the default name is used.
Sequence ID | Each Pulsar message belongs to an ordered sequence on its topic. The sequence ID of the message is its order in that sequence.
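The message components can be pictured as a small record type. The sketch below is only an illustrative model, not the Pulsar client's message class: `Message`, `SketchProducer`, the placeholder `default-producer` name and the sample payloads are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Message:
    """Illustrative model of the message components (not a client class)."""
    value: bytes                                   # raw payload bytes
    key: Optional[str] = None                      # optional; partitioning and compaction
    properties: Dict[str, str] = field(default_factory=dict)  # user-defined metadata
    producer_name: str = "default-producer"        # placeholder default name
    sequence_id: int = 0                           # order of the message in its sequence


class SketchProducer:
    """Hypothetical producer that stamps each message with the next sequence ID."""

    def __init__(self, name: str = "default-producer") -> None:
        self.name = name
        self._next_sequence_id = 0

    def new_message(self, value: bytes, key: Optional[str] = None,
                    properties: Optional[Dict[str, str]] = None) -> Message:
        msg = Message(value, key, dict(properties or {}), self.name, self._next_sequence_id)
        self._next_sequence_id += 1
        return msg


producer = SketchProducer("thermal-sensor")
first = producer.new_message(b'{"temp": 21.5}', key="device-1", properties={"unit": "C"})
second = producer.new_message(b'{"temp": 21.7}', key="device-1")
print(first)
print(second.sequence_id)
```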
[Diagram: a Pulsar topic/partition delivering messages to consumers through four subscription types, labelled Streaming and Messaging: Exclusive (a second consumer is marked with an X), Failover (another consumer takes over in case of failure in Consumer B-0), Shared, and Key-Shared.]
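The four subscription types (Exclusive, Failover, Shared, Key-Shared) differ mainly in which consumer receives each message. The sketch below is a deliberately simplified model of that difference, assuming round-robin delivery for Shared and a CRC32 hash of the key for Key-Shared. It is not the broker's actual dispatch algorithm, and `dispatch`, the consumer names and the sample messages are hypothetical.

```python
import zlib
from typing import Dict, List, Optional, Tuple


def dispatch(sub_type: str, consumers: List[str],
             messages: List[Tuple[Optional[str], str]]) -> Dict[str, List[str]]:
    """Assign (key, payload) messages to consumers under a simplified model."""
    received: Dict[str, List[str]] = {c: [] for c in consumers}
    for index, (key, payload) in enumerate(messages):
        if sub_type in ("exclusive", "failover"):
            # A single active consumer receives everything. Other consumers
            # are rejected (exclusive) or wait on standby (failover).
            target = consumers[0]
        elif sub_type == "shared":
            # Messages are spread across all connected consumers.
            target = consumers[index % len(consumers)]
        elif sub_type == "key_shared":
            # Messages with the same key always go to the same consumer.
            digest = zlib.crc32((key or "").encode("utf-8"))
            target = consumers[digest % len(consumers)]
        else:
            raise ValueError(f"unknown subscription type: {sub_type}")
        received[target].append(payload)
    return received


msgs = [("a", "m1"), ("b", "m2"), ("a", "m3"), ("b", "m4")]
print(dispatch("failover", ["B-0", "B-1"], msgs))   # B-0 is the active consumer
print(dispatch("failover", ["B-1"], msgs))          # after B-0 fails, B-1 takes over
print(dispatch("shared", ["C-1", "C-2"], msgs))
print(dispatch("key_shared", ["D-1", "D-2"], msgs))
```

Failover is modelled here by calling `dispatch` again without the failed consumer, which is enough to show the standby taking over the whole stream.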
Integrated Schema Registry
[Diagram: the schema registry holds schema-1, schema-2 and schema-3 (value = Avro/Protobuf/JSON). Producers and consumers each keep a local cache of schemas (schema data + ID).]
● Producers send (register) the schema if it is not in the local cache, then send schema-1 data serialized per schema ID.
● Consumers get the schema by ID if it is not in the local cache, then read schema-1 data deserialized per schema ID.
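The register-once, look-up-once flow between producers, consumers and the schema registry can be sketched with an in-memory stand-in. Everything here is a toy model under stated assumptions: `SchemaRegistry`, `SchemaProducer` and `SchemaConsumer` are hypothetical classes, the schema is a plain JSON string listing field names, and JSON stands in for the Avro/Protobuf/JSON encodings named in the diagram.

```python
import json
from typing import Dict, Tuple


class SchemaRegistry:
    """Toy in-memory registry mapping schema IDs to schema definitions."""

    def __init__(self) -> None:
        self._schema_by_id: Dict[int, str] = {}
        self._id_by_schema: Dict[str, int] = {}
        self.lookups = 0  # counts get_by_id calls, to make cache hits visible

    def register(self, schema: str) -> int:
        if schema not in self._id_by_schema:
            schema_id = len(self._schema_by_id) + 1
            self._id_by_schema[schema] = schema_id
            self._schema_by_id[schema_id] = schema
        return self._id_by_schema[schema]

    def get_by_id(self, schema_id: int) -> str:
        self.lookups += 1
        return self._schema_by_id[schema_id]


class SchemaProducer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self.registry = registry
        self.cache: Dict[str, int] = {}  # local cache: schema -> ID

    def send(self, schema: str, record: dict) -> Tuple[int, bytes]:
        # Register the schema only if it is not in the local cache.
        if schema not in self.cache:
            self.cache[schema] = self.registry.register(schema)
        return self.cache[schema], json.dumps(record).encode("utf-8")


class SchemaConsumer:
    def __init__(self, registry: SchemaRegistry) -> None:
        self.registry = registry
        self.cache: Dict[int, str] = {}  # local cache: ID -> schema

    def read(self, envelope: Tuple[int, bytes]) -> dict:
        schema_id, payload = envelope
        # Fetch the schema by ID only if it is not in the local cache.
        if schema_id not in self.cache:
            self.cache[schema_id] = self.registry.get_by_id(schema_id)
        fields = json.loads(self.cache[schema_id])["fields"]
        record = json.loads(payload)
        return {name: record[name] for name in fields}


registry = SchemaRegistry()
producer = SchemaProducer(registry)
consumer = SchemaConsumer(registry)
schema_1 = json.dumps({"name": "schema-1", "fields": ["device", "temp"]})
for temp in (21.5, 21.7):
    print(consumer.read(producer.send(schema_1, {"device": "d1", "temp": temp})))
print("registry lookups:", registry.lookups)
```

Only the schema ID travels with each message in this model, so after the first message the consumer decodes from its local cache without contacting the registry again.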
Connections
● Functions - Lightweight Stream Processing (Java, Python, Go)
● Connectors - Sources & Sinks (Cassandra, Kafka, …)
● Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT)
● Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL
● Data Offloaders - Tiered Storage (S3)
The FLiPN kitten crosses the stream 4 ways with Apache Pulsar
Kafka on Pulsar (KoP)
MQTT on Pulsar (MoP)
AMQP on Pulsar (AoP)
Kafka to Pulsar
Pulsar Functions
A serverless event streaming framework
● Lightweight computation similar to AWS Lambda.
● Specifically designed to use Apache Pulsar as a message bus.
● Function runtime can be located within Pulsar Broker.
● Java Functions
● Consume messages from one or more Pulsar topics.
● Apply user-supplied processing logic to each message.
● Publish the results of the computation to another topic.
● Support multiple programming languages (Java, Python, Go)
● Can leverage 3rd-party libraries to support the execution of ML models on the edge.
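The consume, process, publish cycle maps onto a very small amount of code. As far as I know, Pulsar's Python runtime accepts a plain module-level `process` function whose return value is published to the output topic. The sketch below follows that shape but runs standalone, so the topic wiring done at deploy time is not shown. The field names `temperature_c`, `temperature_f` and `alert`, and the 30 °C threshold, are assumptions invented for this example.

```python
import json


def process(input):
    """Consume one message, apply logic, and return the value to publish.

    The parameter is named `input` to match the convention for Python
    functions in Pulsar, even though it shadows the builtin.
    """
    reading = json.loads(input)
    celsius = float(reading["temperature_c"])
    # User-supplied logic: add a converted value and a simple alert flag.
    reading["temperature_f"] = round(celsius * 9 / 5 + 32, 2)
    reading["alert"] = celsius >= 30.0
    return json.dumps(reading, sort_keys=True)


if __name__ == "__main__":
    # Local smoke test with a hypothetical sensor reading.
    print(process('{"device": "thermal-1", "temperature_c": 25.0}'))
```

Because the function is plain Python with no broker dependency, it can be unit tested locally before being deployed.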
Apache NiFi - Apache Pulsar Connector
https://github.com/streamnative/pulsar-nifi-bundle
Apache Flink
● Unified computing engine
● Batch processing is a special case of stream processing
● Stateful processing
● Massive Scalability
● Flink SQL for queries and inserts against Pulsar Topics
● Streaming Analytics
● Continuous SQL
● Continuous ETL
● Complex Event Processing
● Standard SQL Powered by Apache Calcite
https://dev.startree.ai/docs/pinot/recipes/pulsar
https://docs.pinot.apache.org/basics/data-import/pinot-stream-ingestion/apache-pulsar
https://github.com/tspannhw/pulsar-thermal-pinot
Apache Pulsar in Action
Please enjoy David’s complete book, which is the ultimate guide to Pulsar.
Tim Spann
Developer Advocate at StreamNative
@PassDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
https://streamnative.io/pulsar-python/
