(Current22) Let's Monitor The Conditions at the Conference

Timothy Spann
Tim Spann, Developer Advocate at StreamNative
● FLiP(N) Stack = Flink, Pulsar and NiFi Stack
● Streaming Systems & Data Architecture Expert
● Experience:
○ 15+ years of experience with streaming technologies including Pulsar,
Flink, Kafka, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and
more.
○ Today, he helps to grow the Pulsar community sharing rich technical
knowledge and experience at both global conferences and through
individual conversations.
David Kjerrumgaard
Developer Advocate
● Apache Pulsar Committer | Author of Pulsar
In Action
● Former Principal Software Engineer on
Splunk’s messaging team responsible for
Splunk’s internal Pulsar-as-a-Service
platform
● Former Director of Solution Architecture at
Streamlio
FLiP Stack Weekly
This week in Apache Flink, Apache Pulsar, Apache
NiFi, Apache Spark and open source friends.
https://bit.ly/32dAJft
streamnative.io
Agenda
• Collect the Data
• Stream the Data
• Store the Data
• Share the Data
What?
Temp, Humidity, Air Quality, Energy, …
Is this (I)IoT?
Edge Computing?
● Any computation happening
outside of the cloud, closer
to the edge of the network
● Operates on real-time data
generated by sensors or users
● Improves response times in
applications where real-time
processing of data is required
Edge Computing
Extending the Data Processing Layer
PULSAR
Edge Compute
Edge Computing with Pulsar
● Apache Pulsar’s two-tier architecture separates the compute and
storage layers, which interact with one another over a TCP/IP
connection. This allows us to run the computing layer (Broker) on
either Edge servers or IoT Gateway devices.
● Our example native applications can stream data via MQTT. We
can also write small apps in Java, Python, Golang and other
languages to send messages via WebSockets, HTTP, Pulsar, Kafka
or other protocols from modern Edge computers.
● Pulsar’s serverless computing framework, known as Pulsar
Functions, can run inside the Broker as threads, effectively
“stretching” the data processing layer to the edge.
Benefits of Running Pulsar Broker on the Edge
● Pulsar’s serverless computing framework can run inside the Pulsar Broker
as a thread pool. This framework can be used as the execution environment
for ML models.
● With the MQTT-on-Pulsar (MoP) protocol handler, the Apache Pulsar Broker
can directly receive incoming MQTT data from the sensor hubs and store it
in a topic.
Edge Computing Power - Edge Server
● Containers
● 64-bit processors and operating systems
● 8-64 GB of modern RAM
● Fast WiFi / Bluetooth
● 300+ core GPUs
● Fast eMMC storage
● TBs of SSD
● Example: NVIDIA Jetson Xavier NX
Device 1 - AdaFruit Funhouse
• https://github.com/tspannhw/pulsar-adafruit-funhouse
(MQTT)
Raw JSON:
{"pressure": 1009.08,
"button_sel": "off",
"pir_sensor": "off",
"humidity": 36.0422, "temperature": 80.9526,
"button_down": "off", "captouch6": "off",
"captouch7": "off", "button_up": "off", "captouch8": "off",
"light": 6990}
Processor 240MHz / RAM 2+4MB
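A minimal sketch of how a consumer might decode the Funhouse payload shown above, using only the Python standard library. The `FunhouseReading` class and `parse_reading` function are illustrative names and are not taken from the linked repository; the payload is the raw JSON from the slide, and treating `pir_sensor` as a motion flag is an assumption.

```python
import json
from dataclasses import dataclass

# Raw payload copied from the slide above.
RAW = ('{"pressure": 1009.08, "button_sel": "off", "pir_sensor": "off", '
       '"humidity": 36.0422, "temperature": 80.9526, "button_down": "off", '
       '"captouch6": "off", "captouch7": "off", "button_up": "off", '
       '"captouch8": "off", "light": 6990}')


@dataclass
class FunhouseReading:
    """Hypothetical record type holding the main sensor fields."""
    temperature: float
    humidity: float
    pressure: float
    light: int
    motion: bool  # assumption: derived from the "pir_sensor" on/off flag


def parse_reading(raw: str) -> FunhouseReading:
    """Decode one JSON message into a typed reading."""
    doc = json.loads(raw)
    return FunhouseReading(
        temperature=float(doc["temperature"]),
        humidity=float(doc["humidity"]),
        pressure=float(doc["pressure"]),
        light=int(doc["light"]),
        motion=doc["pir_sensor"] == "on",
    )


reading = parse_reading(RAW)
print(reading)
```

Keeping the decode step in one small function makes it easy to reuse the same logic whether the message arrives over MQTT or the Pulsar protocol.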
Device 2 - Raspberry Pi
• https://github.com/tspannhw/FLiP-Pi-DeltaLake-Thermal
Pulsar Protocol
Raw JSON:
{"uuid": "thrml_zda_20220715182748", "ipaddress": "192.168.1.204",
"cputempf": 108, "runtime": 0, "host": "thermal", "hostname": "thermal",
"macaddress": "e4:5f:01:7c:3f:34", "endtime": "1657909668.7279365",
"te": "0.0007398128509521484", "cpu": 1.8,
"diskusage": "105078.0 MB",
"memory": 9.0, "rowid": "20220715182748_fc4cbbb1-79da-4c1a-8991-78bd23c9f221",
"systemtime": "07/15/2022 14:27:53", "ts": 1657909673,
"starttime": "07/15/2022 14:27:48",
"datetimestamp": "2022-07-15 18:27:52.492469+00:00",
"temperature": 28.238,
"humidity": 29.61, "co2": 992.0}
Processor 1.5 GHz, 64-bit quad-core / RAM 2-8 GB LPDDR4-3200 SDRAM
Device 2 - RPI 4 - 2GB
Device 3 - Mac M1 PowerBook
https://github.com/search?q=user%3Atspannhw+airquality&type=repositories
Pulsar, AMQP, MQTT, Kafka Protocols
Raw JSON: {"dateObserved":"2022-08-03",
"hourObserved":13,"localTimeZone":"CST",
"reportingArea":"El Paso","stateCode":"TX","latitude":31.8493,
"longitude":-106.4375,
"parameterName":"PM10","aqi":23,
"category":{"number":1,"name":"Good","additionalProperties":{}},
"additionalProperties":{}}
Processor Apple M1 Pro 10-core 3.2GHz CPU 16-core GPU / RAM 32 GB
HS100 Meter - Electric
https://github.com/tspannhw/FLiP-Py-Energy
Apache Pulsar is a Cloud-Native
Messaging and Event-Streaming Platform.
Unified Messaging Model
Simplify your data infrastructure and
enable new use cases with queuing and
streaming capabilities in one platform.
Multi-tenancy
Enable multiple user groups to share the
same cluster, either via access control, or
in entirely different namespaces.
Scalability
Decoupled data computing and storage
enable horizontal scaling to handle data
scale and management complexity.
Geo-replication
Support for multi-datacenter replication
with both asynchronous and
synchronous replication for built-in
disaster recovery.
Tiered storage
Enable historical data to be offloaded to
cloud-native storage and store event
streams for indefinite periods of time.
Pulsar Benefits
Key Pulsar Concepts: Architecture
● “Bookies” (store messages)
○ Store messages and cursors
○ Messages are grouped in segments/ledgers
○ A group of bookies forms an “ensemble” to store a ledger
● “Brokers”
○ Handle message routing and connections
○ Stateless, but with caches
○ Automatic load-balancing
○ Topics are composed of multiple segments
● Metadata storage (metadata and service discovery)
○ Stores metadata for both Pulsar and BookKeeper
○ Service discovery
Pulsar Subscription Modes
Different subscription modes
have different semantics:
Exclusive/Failover - guaranteed
order, single active consumer
Shared - multiple active
consumers, no order
Key_Shared - multiple active
consumers, order for given key
[Diagram: Producer 1 and Producer 2 publish to a Pulsar topic that is read by four subscriptions.]
● Subscription A (Exclusive): Consumer A
● Subscription B (Failover): Consumer B-1, with Consumer B-2 taking over in case of failure in Consumer B-1
● Subscription C (Shared): Consumer C-1 and Consumer C-2; <K1,V10>, <K2,V21>, <K1,V12> go to one consumer and <K2,V20>, <K1,V11>, <K2,V22> to the other
● Subscription D (Key-Shared): Consumer D-1 and Consumer D-2; <K1,V10>, <K1,V11>, <K1,V12> go to one consumer and <K2,V20>, <K2,V21>, <K2,V22> to the other
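The difference between the subscription modes can be sketched with a small pure-Python model. This is a simplified illustration, not Pulsar's actual dispatcher: the function names are hypothetical, the publish order of the six messages is assumed, Shared is modeled as plain round-robin, and Key_Shared as a CRC32 hash of the key.

```python
from collections import defaultdict
from itertools import cycle
from zlib import crc32

# (key, value) pairs from the diagram, in an assumed publish order.
MESSAGES = [("K1", 10), ("K1", 11), ("K1", 12),
            ("K2", 20), ("K2", 21), ("K2", 22)]


def dispatch_exclusive(messages, consumers):
    """Exclusive/Failover: only the first (active) consumer receives messages."""
    return {consumers[0]: list(messages)}


def dispatch_shared(messages, consumers):
    """Shared: round-robin across consumers, so per-key order is not kept."""
    out = defaultdict(list)
    for consumer, msg in zip(cycle(consumers), messages):
        out[consumer].append(msg)
    return dict(out)


def dispatch_key_shared(messages, consumers):
    """Key_Shared: a stable hash pins every key to exactly one consumer."""
    out = defaultdict(list)
    for key, value in messages:
        idx = crc32(key.encode()) % len(consumers)
        out[consumers[idx]].append((key, value))
    return dict(out)


exclusive = dispatch_exclusive(MESSAGES, ["B-1", "B-2"])
shared = dispatch_shared(MESSAGES, ["C-1", "C-2"])
key_shared = dispatch_key_shared(MESSAGES, ["D-1", "D-2"])
print(exclusive)
print(shared)
print(key_shared)
```

In the Shared result the values for K1 are split across both consumers, while in the Key_Shared result every value for a given key lands on a single consumer in publish order.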
Messaging
Ordering Guarantees
Topic Ordering Guarantees:
● Messages sent to a single topic or
partition DO have an ordering
guarantee.
● Messages sent to different partitions
DO NOT have an ordering guarantee.
Subscription Mode Guarantees:
● A single consumer can receive
messages from the same partition in
order using an exclusive or failover
subscription mode.
● Multiple consumers can receive
messages from the same key in order
using the key_shared subscription
mode.
Unified Messaging Model
[Diagram: a Pulsar topic/partition serves both streaming and messaging consumers through Exclusive, Failover, Shared and Key-Shared subscriptions.]
Topics
[Diagram: a Pulsar cluster is divided into tenants (Compliance, Data Services, Marketing), tenants into namespaces (Microservices, ETL, Campaigns, Risk Assessment), and namespaces into topics (Cust Auth, Location Resolution, Demographics, Budgeted Spend, Acct History, Risk Detection).]
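The tenant / namespace / topic hierarchy shows up directly in fully qualified topic names, which take the form `persistent://tenant/namespace/topic`. The sketch below splits such a name into its parts; `TopicName` and `parse_topic` are hypothetical helper names, and the sample name is the `pi-sensors` topic used in the Spark example later in this deck.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified Pulsar topic name into its four parts."""
    domain, sep, rest = name.partition("://")
    if not sep:
        raise ValueError(f"missing '://' in topic name: {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got: {rest!r}")
    return TopicName(domain, *parts)


parsed = parse_topic("persistent://public/default/pi-sensors")
print(parsed)
```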
Connectivity
• Functions - Lightweight Stream Processing (Java, Python, Go)
• Connectors - Sources & Sinks (Cassandra, Kafka, …)
• Protocol Handlers - AoP (AMQP on Pulsar), KoP (Kafka on Pulsar), MoP (MQTT on Pulsar)
• Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL
• Data Offloaders - Tiered Storage (S3)
Schema Registry
[Diagram: producers and consumers exchange data through a schema registry holding schema-1, schema-2 and schema-3 (value=Avro/Protobuf/JSON). Each side keeps a local cache of schemas keyed by schema ID.]
● Producers: send (register) the schema if it is not in the local cache, then send schema-1 data serialized per schema ID.
● Consumers: get the schema by ID if it is not in the local cache, then read schema-1 data deserialized per schema ID.
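The register-once, look-up-once flow in the schema registry diagram can be modeled in a few lines. This is an in-memory sketch under stated assumptions, not Pulsar's client or broker code: all class names are hypothetical, the schema is a placeholder string, and JSON stands in for Avro/Protobuf serialization.

```python
import json


class SchemaRegistry:
    """In-memory stand-in for the broker-side schema registry."""

    def __init__(self):
        self._by_id = {}
        self._ids = {}
        self.lookups = 0  # counts get-by-ID calls that reach the registry

    def register(self, definition: str) -> int:
        """Return the ID for a schema, assigning a new one if unseen."""
        if definition not in self._ids:
            schema_id = len(self._ids) + 1
            self._ids[definition] = schema_id
            self._by_id[schema_id] = definition
        return self._ids[definition]

    def get(self, schema_id: int) -> str:
        self.lookups += 1
        return self._by_id[schema_id]


class Producer:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}  # local cache: definition -> schema ID

    def send(self, definition: str, record: dict):
        if definition not in self.cache:      # register only on a cache miss
            self.cache[definition] = self.registry.register(definition)
        return self.cache[definition], json.dumps(record).encode()


class Consumer:
    def __init__(self, registry):
        self.registry = registry
        self.cache = {}  # local cache: schema ID -> definition

    def read(self, message):
        schema_id, payload = message
        if schema_id not in self.cache:       # fetch only on a cache miss
            self.cache[schema_id] = self.registry.get(schema_id)
        return self.cache[schema_id], json.loads(payload)


registry = SchemaRegistry()
producer, consumer = Producer(registry), Consumer(registry)
schema = '{"name": "Reading", "fields": ["temperature"]}'  # placeholder schema
msgs = [producer.send(schema, {"temperature": t}) for t in (28.2, 28.4)]
records = [consumer.read(m)[1] for m in msgs]
print(records, registry.lookups)
```

Two messages are read but the registry is consulted only once, which is the point of the local cache on each side.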
Pulsar SQL
Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel.
[Diagram: a producer and a consumer connect to Brokers 1-3, which serve Topic1-Part1, Topic1-Part2 and Topic1-Part3. Segments 1-4 through Segment X are replicated across Bookies 1-3. A query coordinator uses topic metadata to fan a query out to the SQL workers.]
Apache NiFi Pulsar Connector
https://streamnative.io/apache-nifi-connector/
SQL
select aqi, parameterName, dateObserved, hourObserved, latitude,
longitude, localTimeZone, stateCode, reportingArea from
airquality;
select max(aqi) as MaxAQI, parameterName, reportingArea from
airquality group by parameterName, reportingArea;
select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as
AvgAQI, count(aqi) as RowCount, parameterName, reportingArea
from airquality group by parameterName, reportingArea;
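To try the last aggregate query without a Pulsar SQL cluster, the same statement can be run against an in-memory SQLite table. This is only a local stand-in: the first row uses the AQI value from the El Paso record in this deck, the other rows are invented for illustration, the table holds just three of the columns, and an `order by` is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table airquality (aqi integer, parameterName text, reportingArea text)"
)
rows = [
    (23, "PM10", "El Paso"),  # from the sample record in this deck
    (41, "PM10", "El Paso"),  # invented for illustration
    (35, "O3", "El Paso"),    # invented for illustration
]
conn.executemany("insert into airquality values (?, ?, ?)", rows)

query = """
select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as AvgAQI,
       count(aqi) as RowCount, parameterName, reportingArea
from airquality
group by parameterName, reportingArea
order by parameterName
"""
results = conn.execute(query).fetchall()
for row in results:
    print(row)
```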
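Building Spark SQL View
// Run in spark-shell with the Pulsar Spark connector on the classpath;
// `spark` is the SparkSession that the shell provides.
val dfPulsar = spark.readStream.format("pulsar")
  .option("service.url", "pulsar://pulsar1:6650")   // broker endpoint (binary protocol)
  .option("admin.url", "http://pulsar1:8080")       // admin REST endpoint
  .option("topic", "persistent://public/default/pi-sensors")
  .load()
dfPulsar.printSchema()
// Stream every column to the console without truncating long values.
val pQuery = dfPulsar.selectExpr("*")
  .writeStream.format("console")
  .option("truncate", false)
  .start()
https://github.com/tspannhw/FLiP-Pi-BreakoutGarden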
IoT Data
IoT Ingestion: High-volume streaming sources, sensors, multiple message formats, diverse protocols and multi-vendor devices create data ingestion challenges.
Other Sources: Transit data, news, Twitter, status feeds, REST data, stock data and more.
Demo Time
Q&A
Now Available
On-Demand Pulsar
Training
Academy.StreamNative.io
Resources
● For a first look at Pulsar benchmark report, share your email in the chat
● Join the Pulsar Slack channel - Apache-Pulsar.slack.com
● Follow @streamnativeio and @apache_pulsar on Twitter
● Contact StreamNative Sales - doug@streamnative.io
Too Many Tim Links
● https://dzone.com/articles/five-sensors-real-time-with-pulsar-and-python-on-a
● https://github.com/tspannhw/airquality
● https://github.com/tspannhw/FLiPN-AirQuality-REST
● https://github.com/tspannhw/pulsar-airquality-function
● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
● https://github.com/tspannhw/FLiPN-DEVNEXUS-2022
● https://github.com/tspannhw/FLiP-Pi-Thermal
● https://github.com/tspannhw/FLiP-Pi-Weather
● https://github.com/tspannhw/FLiP-RP400
● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
StreamNative: By the Creators Of Apache Pulsar
✓ Original creators of Apache
Pulsar & BookKeeper
✓ Operated the largest
Pulsar/BookKeeper cluster
✓ Data veterans with extensive
industry experience
● Sijie Guo - Founder and CEO; ASF Member; Pulsar/BookKeeper PMC
● Matteo Merli - CTO; ASF Member; Pulsar/BookKeeper PMC
● Jia Zhai - Co-Founder; Pulsar/BookKeeper PMC
Tim Spann
Developer Advocate
@PaaSDev
https://www.linkedin.com/in/timothyspann
https://github.com/tspannhw
1 of 49

Recommended

Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) by
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar)
Using the FLiPN Stack for Edge AI (Flink, NiFi, Pulsar) Timothy Spann
305 views71 slides
Timothy Spann: Apache Pulsar for ML by
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLEdunomica
37 views65 slides
bigdata 2022_ FLiP Into Pulsar Apps by
bigdata 2022_ FLiP Into Pulsar Appsbigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar AppsTimothy Spann
460 views60 slides
JConf.dev 2022 - Apache Pulsar Development 101 with Java by
JConf.dev 2022 - Apache Pulsar Development 101 with JavaJConf.dev 2022 - Apache Pulsar Development 101 with Java
JConf.dev 2022 - Apache Pulsar Development 101 with JavaTimothy Spann
216 views59 slides
Princeton Dec 2022 Meetup_ StreamNative and Cloudera Streaming by
Princeton Dec 2022 Meetup_ StreamNative and Cloudera StreamingPrinceton Dec 2022 Meetup_ StreamNative and Cloudera Streaming
Princeton Dec 2022 Meetup_ StreamNative and Cloudera StreamingTimothy Spann
214 views139 slides
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and... by
Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...Osacon 2021   hello hydrate! from stream to clickhouse with apache pulsar and...
Osacon 2021 hello hydrate! from stream to clickhouse with apache pulsar and...Timothy Spann
3.1K views46 slides

More Related Content

Similar to (Current22) Let's Monitor The Conditions at the Conference

Music city data Hail Hydrate! from stream to lake by
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lakeTimothy Spann
708 views37 slides
Big mountain data and dev conference apache pulsar with mqtt for edge compu... by
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...Timothy Spann
440 views47 slides
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r... by
Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...Timothy Spann
553 views37 slides
[March sn meetup] apache pulsar + apache nifi for cloud data lake by
[March sn meetup] apache pulsar + apache nifi for cloud data lake[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lakeTimothy Spann
903 views55 slides
Big data conference europe real-time streaming in any and all clouds, hybri... by
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...Timothy Spann
811 views32 slides
apidays New York - Leveraging Event Streaming to Super-Charge your Business, ... by
apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...
apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...apidays
45 views54 slides

Similar to (Current22) Let's Monitor The Conditions at the Conference(20)

Music city data Hail Hydrate! from stream to lake by Timothy Spann
Music city data Hail Hydrate! from stream to lakeMusic city data Hail Hydrate! from stream to lake
Music city data Hail Hydrate! from stream to lake
Timothy Spann708 views
Big mountain data and dev conference apache pulsar with mqtt for edge compu... by Timothy Spann
Big mountain data and dev conference   apache pulsar with mqtt for edge compu...Big mountain data and dev conference   apache pulsar with mqtt for edge compu...
Big mountain data and dev conference apache pulsar with mqtt for edge compu...
Timothy Spann440 views
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r... by Timothy Spann
Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...Devfest uk & ireland  using apache nifi with apache pulsar for fast data on-r...
Devfest uk & ireland using apache nifi with apache pulsar for fast data on-r...
Timothy Spann553 views
[March sn meetup] apache pulsar + apache nifi for cloud data lake by Timothy Spann
[March sn meetup] apache pulsar + apache nifi for cloud data lake[March sn meetup] apache pulsar + apache nifi for cloud data lake
[March sn meetup] apache pulsar + apache nifi for cloud data lake
Timothy Spann903 views
Big data conference europe real-time streaming in any and all clouds, hybri... by Timothy Spann
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...
Timothy Spann811 views
apidays New York - Leveraging Event Streaming to Super-Charge your Business, ... by apidays
apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...
apidays New York - Leveraging Event Streaming to Super-Charge your Business, ...
apidays45 views
Building an Event Streaming Architecture with Apache Pulsar by ScyllaDB
Building an Event Streaming Architecture with Apache PulsarBuilding an Event Streaming Architecture with Apache Pulsar
Building an Event Streaming Architecture with Apache Pulsar
ScyllaDB136 views
Cloud lunch and learn real-time streaming in azure by Timothy Spann
Cloud lunch and learn real-time streaming in azureCloud lunch and learn real-time streaming in azure
Cloud lunch and learn real-time streaming in azure
Timothy Spann663 views
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8 by Timothy Spann
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8[Conf42-KubeNative] Building Real-time Pulsar Apps on K8
[Conf42-KubeNative] Building Real-time Pulsar Apps on K8
Timothy Spann241 views
[AI Dev World 2022] Build ML Enhanced Event Streaming by Timothy Spann
[AI Dev World 2022] Build ML Enhanced Event Streaming[AI Dev World 2022] Build ML Enhanced Event Streaming
[AI Dev World 2022] Build ML Enhanced Event Streaming
Timothy Spann201 views
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud) by Timothy Spann
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Why Spring Belongs In Your Data Stream (From Edge to Multi-Cloud)
Timothy Spann18 views
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi... by Timothy Spann
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Machine Intelligence Guild_ Build ML Enhanced Event Streaming Applications wi...
Timothy Spann197 views
Serverless Event Streaming Applications as Functionson K8 by Timothy Spann
Serverless Event Streaming Applications as Functionson K8Serverless Event Streaming Applications as Functionson K8
Serverless Event Streaming Applications as Functionson K8
Timothy Spann361 views
Using FLiP with influxdb for edgeai iot at scale 2022 by Timothy Spann
Using FLiP with influxdb for edgeai iot at scale 2022Using FLiP with influxdb for edgeai iot at scale 2022
Using FLiP with influxdb for edgeai iot at scale 2022
Timothy Spann465 views
Big Data Streams Architectures. Why? What? How? by Anton Nazaruk
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
Anton Nazaruk871 views
DBCC 2021 - FLiP Stack for Cloud Data Lakes by Timothy Spann
DBCC 2021 - FLiP Stack for Cloud Data LakesDBCC 2021 - FLiP Stack for Cloud Data Lakes
DBCC 2021 - FLiP Stack for Cloud Data Lakes
Timothy Spann717 views
OSSNA Building Modern Data Streaming Apps by Timothy Spann
OSSNA Building Modern Data Streaming AppsOSSNA Building Modern Data Streaming Apps
OSSNA Building Modern Data Streaming Apps
Timothy Spann155 views
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf by Timothy Spann
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdfMLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf
MLconf 2022 NYC Event-Driven Machine Learning at Scale.pdf
Timothy Spann747 views
Apache Spark 101 - Demi Ben-Ari by Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
Demi Ben-Ari520 views
Apache Pulsar Development 101 with Python by Timothy Spann
Apache Pulsar Development 101 with PythonApache Pulsar Development 101 with Python
Apache Pulsar Development 101 with Python
Timothy Spann1.2K views

More from Timothy Spann

Building Real-Time Travel Alerts by
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel AlertsTimothy Spann
165 views48 slides
JConWorld_ Continuous SQL with Kafka and Flink by
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and FlinkTimothy Spann
156 views36 slides
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines by
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data PipelinesTimothy Spann
150 views25 slides
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo by
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoTimothy Spann
162 views8 slides
CoC23_ Looking at the New Features of Apache NiFi by
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiTimothy Spann
36 views24 slides
CoC23_ Let’s Monitor The Conditions at the Conference by
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the ConferenceTimothy Spann
17 views17 slides

More from Timothy Spann(20)

Building Real-Time Travel Alerts by Timothy Spann
Building Real-Time Travel AlertsBuilding Real-Time Travel Alerts
Building Real-Time Travel Alerts
Timothy Spann165 views
JConWorld_ Continuous SQL with Kafka and Flink by Timothy Spann
JConWorld_ Continuous SQL with Kafka and FlinkJConWorld_ Continuous SQL with Kafka and Flink
JConWorld_ Continuous SQL with Kafka and Flink
Timothy Spann156 views
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines by Timothy Spann
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
[EN]DSS23_tspann_Integrating LLM with Streaming Data Pipelines
Timothy Spann150 views
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo by Timothy Spann
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines DemoEvolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Evolve 2023 NYC - Integrating AI Into Realtime Data Pipelines Demo
Timothy Spann162 views
CoC23_ Looking at the New Features of Apache NiFi by Timothy Spann
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFi
Timothy Spann36 views
CoC23_ Let’s Monitor The Conditions at the Conference by Timothy Spann
CoC23_ Let’s Monitor The Conditions at the ConferenceCoC23_ Let’s Monitor The Conditions at the Conference
CoC23_ Let’s Monitor The Conditions at the Conference
Timothy Spann17 views
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf by Timothy Spann
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdfOSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
OSSFinance_UnlockingFinancialDatawithReal-TimePipelines.pdf
Timothy Spann23 views
CoC23_Utilizing Real-Time Transit Data for Travel Optimization by Timothy Spann
CoC23_Utilizing Real-Time Transit Data for Travel OptimizationCoC23_Utilizing Real-Time Transit Data for Travel Optimization
CoC23_Utilizing Real-Time Transit Data for Travel Optimization
Timothy Spann31 views
The Never Landing Stream with HTAP and Streaming by Timothy Spann
The Never Landing Stream with HTAP and StreamingThe Never Landing Stream with HTAP and Streaming
The Never Landing Stream with HTAP and Streaming
Timothy Spann254 views
Meetup - Brasil - Data In Motion - 2023 September 19 by Timothy Spann
Meetup - Brasil - Data In Motion - 2023 September 19Meetup - Brasil - Data In Motion - 2023 September 19
Meetup - Brasil - Data In Motion - 2023 September 19
Timothy Spann319 views
Implement a Universal Data Distribution Architecture to Manage All Streaming ... by Timothy Spann
Implement a Universal Data Distribution Architecture to Manage All Streaming ...Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Implement a Universal Data Distribution Architecture to Manage All Streaming ...
Timothy Spann28 views
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data by Timothy Spann
Building Real-time Pipelines with FLaNK_ A Case Study with Transit DataBuilding Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Building Real-time Pipelines with FLaNK_ A Case Study with Transit Data
Timothy Spann193 views
big data fest building modern data streaming apps by Timothy Spann
big data fest building modern data streaming appsbig data fest building modern data streaming apps
big data fest building modern data streaming apps
Timothy Spann317 views
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp by Timothy Spann
Using Apache NiFi with Apache Pulsar for Fast Data On-RampUsing Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Using Apache NiFi with Apache Pulsar for Fast Data On-Ramp
Timothy Spann163 views
GSJUG: Mastering Data Streaming Pipelines 09May2023 by Timothy Spann
GSJUG: Mastering Data Streaming Pipelines 09May2023GSJUG: Mastering Data Streaming Pipelines 09May2023
GSJUG: Mastering Data Streaming Pipelines 09May2023
Timothy Spann255 views
BestInFlowCompetitionTutorials03May2023 by Timothy Spann
BestInFlowCompetitionTutorials03May2023BestInFlowCompetitionTutorials03May2023
BestInFlowCompetitionTutorials03May2023
Timothy Spann11 views
Cloudera Sandbox Event Guidelines For Workflow by Timothy Spann
Cloudera Sandbox Event Guidelines For WorkflowCloudera Sandbox Event Guidelines For Workflow
Cloudera Sandbox Event Guidelines For Workflow
Timothy Spann32 views
Meet the Committers Webinar_ Lab Preparation by Timothy Spann
Meet the Committers Webinar_ Lab PreparationMeet the Committers Webinar_ Lab Preparation
Meet the Committers Webinar_ Lab Preparation
Timothy Spann32 views
Best Practices For Workflow by Timothy Spann
Best Practices For WorkflowBest Practices For Workflow
Best Practices For Workflow
Timothy Spann89 views

Recently uploaded

tecnologia18.docx by
tecnologia18.docxtecnologia18.docx
tecnologia18.docxnosi6702
6 views5 slides
Benefits in Software Development by
Benefits in Software DevelopmentBenefits in Software Development
Benefits in Software DevelopmentJohn Valentino
6 views15 slides
Flask-Python by
Flask-PythonFlask-Python
Flask-PythonTriloki Gupta
10 views12 slides
What is API by
What is APIWhat is API
What is APIartembondar5
15 views15 slides
Winter Projects GDSC IITK by
Winter Projects GDSC IITKWinter Projects GDSC IITK
Winter Projects GDSC IITKSahilSingh368445
416 views60 slides
Electronic AWB - Electronic Air Waybill by
Electronic AWB - Electronic Air Waybill Electronic AWB - Electronic Air Waybill
Electronic AWB - Electronic Air Waybill Freightoscope
6 views1 slide

Recently uploaded(20)

tecnologia18.docx by nosi6702
tecnologia18.docxtecnologia18.docx
tecnologia18.docx
nosi67026 views
Electronic AWB - Electronic Air Waybill by Freightoscope
Electronic AWB - Electronic Air Waybill Electronic AWB - Electronic Air Waybill
Electronic AWB - Electronic Air Waybill
Freightoscope 6 views
aATP - New Correlation Confirmation Feature.pptx by EsatEsenek1
aATP - New Correlation Confirmation Feature.pptxaATP - New Correlation Confirmation Feature.pptx
aATP - New Correlation Confirmation Feature.pptx
EsatEsenek1222 views
Automated Testing of Microsoft Power BI Reports by RTTS
Automated Testing of Microsoft Power BI ReportsAutomated Testing of Microsoft Power BI Reports
Automated Testing of Microsoft Power BI Reports
RTTS11 views
How To Make Your Plans Suck Less — Maarten Dalmijn at the 57th Hands-on Agile... by Stefan Wolpers
How To Make Your Plans Suck Less — Maarten Dalmijn at the 57th Hands-on Agile...How To Make Your Plans Suck Less — Maarten Dalmijn at the 57th Hands-on Agile...
How To Make Your Plans Suck Less — Maarten Dalmijn at the 57th Hands-on Agile...
Stefan Wolpers44 views
How to build dyanmic dashboards and ensure they always work by Wiiisdom
How to build dyanmic dashboards and ensure they always workHow to build dyanmic dashboards and ensure they always work
How to build dyanmic dashboards and ensure they always work
Wiiisdom16 views
Quality Engineer: A Day in the Life by John Valentino
Quality Engineer: A Day in the LifeQuality Engineer: A Day in the Life
Quality Engineer: A Day in the Life
John Valentino10 views
JioEngage_Presentation.pptx by admin125455
JioEngage_Presentation.pptxJioEngage_Presentation.pptx
JioEngage_Presentation.pptx
admin1254559 views
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P... by NimaTorabi2
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
Unlocking the Power of AI in Product Management - A Comprehensive Guide for P...
NimaTorabi217 views
Advanced API Mocking Techniques Using Wiremock by Dimpy Adhikary
Advanced API Mocking Techniques Using WiremockAdvanced API Mocking Techniques Using Wiremock
Advanced API Mocking Techniques Using Wiremock
Dimpy Adhikary5 views
Ports-and-Adapters Architecture for Embedded HMI by Burkhard Stubert
Ports-and-Adapters Architecture for Embedded HMIPorts-and-Adapters Architecture for Embedded HMI
Ports-and-Adapters Architecture for Embedded HMI
Burkhard Stubert35 views
Top-5-production-devconMunich-2023-v2.pptx by Tier1 app
Top-5-production-devconMunich-2023-v2.pptxTop-5-production-devconMunich-2023-v2.pptx
Top-5-production-devconMunich-2023-v2.pptx
Tier1 app9 views

(Current22) Let's Monitor The Conditions at the Conference

  • 1. Let's Monitor The Conditions at the Conference
  • 2. 2
  • 3. Tim Spann Developer Advocate Tim Spann, Developer Advocate at StreamNative ● FLiP(N) Stack = Flink, Pulsar and NiFI Stack ● Streaming Systems & Data Architecture Expert ● Experience: ○ 15+ years of experience with streaming technologies including Pulsar, Flink, Kafka, Spark, NiFi, Big Data, Cloud, MXNet, IoT, Python and more. ○ Today, he helps to grow the Pulsar community sharing rich technical knowledge and experience at both global conferences and through individual conversations.
  • 4. David Kjerrumgaard Developer Advocate ● Apache Pulsar Committer | Author of Pulsar In Action ● Former Principal Software Engineer on Splunk’s messaging team responsible for Splunk’s internal Pulsar-as-a-Service platform ● Former Director of Solution Architecture at Streamlio 4
  • 5. FLiP Stack Weekly This week in Apache Flink, Apache Pulsar, Apache NiFi, Apache Spark and open source friends. https://bit.ly/32dAJft
  • 6. streamnative.io Agenda • Collect the Data • Stream the Data • Store the Data • Share the Data
  • 8. Temp, Humidity, Air Quality, Energy, …
  • 10. ● Any computation happening outside of the cloud, closer to the edge of the network ● Operates on real-time data generated by sensors or users ● Improves response times in applications where real-time processing of data is required Edge Computing
  • 11. Extending the Data Processing Layer PULSAR Edge Compute
  • 12. Edge Computing with Pulsar ● Apache Pulsar’s two-tier architecture separates the compute and storage layers, which interact with one another over a TCP/IP connection. This allows us to run the computing layer (Broker) on either Edge servers or IoT Gateway devices. ● Our example native applications can stream data via MQTT. We can also write small apps in Java, Python, Golang and other languages to send messages via WebSockets, HTTP, Pulsar, Kafka or other protocols from modern Edge computers. ● Pulsar’s serverless computing framework, known as Pulsar Functions, can run inside the Broker as threads, effectively “stretching” the data processing layer.
  • 13. Benefits of Running Pulsar Broker on the Edge ● Pulsar’s Serverless computing framework can run inside the Pulsar Broker as a thread pool. This framework can be used as the execution environment for ML models. ● The Apache Pulsar Broker supports the MQTT protocol and therefore can directly receive incoming data from the sensor hubs and store it in a topic. PULSAR Edge Compute
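To make the idea of running logic inside the Broker more concrete, here is a minimal sketch of the kind of per-message function that could be deployed as a Pulsar Function. It assumes the language-native Python style, where a module exposes a plain `process(input)` function. The `CO2_ALERT_PPM` threshold and the output field names are hypothetical and not taken from the talk.

```python
import json

# Hypothetical alert threshold for this sketch; not a value from the talk.
CO2_ALERT_PPM = 1000.0


def process(input):
    """Take one JSON sensor message and return a small JSON summary.

    Written as a plain function so it can be exercised locally without a
    running broker.
    """
    reading = json.loads(input)
    co2 = float(reading.get("co2", 0.0))
    return json.dumps({
        "host": reading.get("host"),
        "co2": co2,
        "alert": co2 >= CO2_ALERT_PPM,
    })
```

Because the function only depends on the standard library, it can be unit tested on a laptop before it is deployed to an edge Broker.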
  • 14. Edge Computing Power - Edge Server ● Containers ● 64 bit processors and operating systems ● 8-64 GB Modern RAM ● Fast WiFi / Bluetooth ● 300+ Core GPUs ● eMMC Fast Storage ● TBs of SSD ● Examples: NVIDIA JETSON XAVIER NX
  • 15. Device 1 - AdaFruit Funhouse • https://github.com/tspannhw/pulsar-adafruit-funhouse (MQTT) Processor 240MHz / RAM 2+4MB Raw JSON: {"pressure": 1009.08, "button_sel": "off", "pir_sensor": "off", "humidity": 36.0422, "temperature": 80.9526, "button_down": "off", "captouch6": "off", "captouch7": "off", "button_up": "off", "captouch8": "off", "light": 6990}
  • 16. Device 2 - Raspberry Pi • https://github.com/tspannhw/FLiP-Pi-DeltaLake-Thermal (Pulsar Protocol) Processor 1.5 GHz, 64-bit quad-core / RAM 2-8 GB LPDDR4-3200 SDRAM Raw JSON: {"uuid": "thrml_zda_20220715182748", "ipaddress": "192.168.1.204", "cputempf": 108, "runtime": 0, "host": "thermal", "hostname": "thermal", "macaddress": "e4:5f:01:7c:3f:34", "endtime": "1657909668.7279365", "te": "0.0007398128509521484", "cpu": 1.8, "diskusage": "105078.0 MB", "memory": 9.0, "rowid": "20220715182748_fc4cbbb1-79da-4c1a-8991-78bd23c9f221", "systemtime": "07/15/2022 14:27:53", "ts": 1657909673, "starttime": "07/15/2022 14:27:48", "datetimestamp": "2022-07-15 18:27:52.492469+00:00", "temperature": 28.238, "humidity": 29.61, "co2": 992.0}
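The raw record above mixes numeric values with numbers stored as strings (for example `endtime` and `diskusage`). The following sketch shows one way a consumer might normalize a subset of those fields before storing them. The `normalize` helper and its output field names are hypothetical, and treating `cputempf` as degrees Fahrenheit is an assumption based only on the key name.

```python
import json

# A subset of the Raspberry Pi record shown on the slide.
RAW = (
    '{"host": "thermal", "cputempf": 108, '
    '"endtime": "1657909668.7279365", "diskusage": "105078.0 MB", '
    '"temperature": 28.238, "humidity": 29.61, "co2": 992.0}'
)


def normalize(raw: str) -> dict:
    """Parse one sensor record and coerce string-typed numbers to floats."""
    rec = json.loads(raw)
    return {
        "host": rec["host"],
        # Assumption: "cputempf" is Fahrenheit; convert to Celsius.
        "cpu_temp_c": round((rec["cputempf"] - 32) * 5 / 9, 1),
        "end_time": float(rec["endtime"]),
        # "105078.0 MB" -> 105078.0 (drop the unit suffix).
        "disk_usage_mb": float(rec["diskusage"].split()[0]),
        "temperature": rec["temperature"],
        "humidity": rec["humidity"],
        "co2": rec["co2"],
    }
```

Normalizing types at the edge keeps downstream SQL queries free of string parsing.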
  • 18. Device 3 - Mac M1 PowerBook https://github.com/search?q=user%3Atspannhw+airquality&type=repositories Pulsar, AMQP, MQTT, Kafka Protocols Processor Apple M1 Pro 10-core 3.2GHz CPU 16-core GPU / RAM 32 GB Raw JSON: {"dateObserved":"2022-08-03", "hourObserved":13, "localTimeZone":"CST", "reportingArea":"El Paso", "stateCode":"TX", "latitude":31.8493, "longitude":-106.4375, "parameterName":"PM10", "aqi":23, "category":{"number":1,"name":"Good","additionalProperties":{}}, "additionalProperties":{}}
  • 19. HS100 Meter - Electric https://github.com/tspannhw/FLiP-Py-Energy
  • 23. Apache Pulsar is a Cloud-Native Messaging and Event-Streaming Platform.
  • 25. Unified Messaging Model Simplify your data infrastructure and enable new use cases with queuing and streaming capabilities in one platform. Multi-tenancy Enable multiple user groups to share the same cluster, either via access control, or in entirely different namespaces. Scalability Decoupled data computing and storage enable horizontal scaling to handle data scale and management complexity. Geo-replication Support for multi-datacenter replication with both asynchronous and synchronous replication for built-in disaster recovery. Tiered storage Enable historical data to be offloaded to cloud-native storage and store event streams for indefinite periods of time. Pulsar Benefits
  • 26. Key Pulsar Concepts: Architecture ● “Bookies” (Store Messages) ● Stores messages and cursors ● Messages are grouped in segments/ledgers ● A group of bookies form an “ensemble” to store a ledger ● “Brokers” ● Handles message routing and connections ● Stateless, but with caches ● Automatic load-balancing ● Topics are composed of multiple segments ● Metadata Storage (Metadata & Service Discovery) ● Stores metadata for both Pulsar and BookKeeper ● Service discovery
  • 27. Pulsar Subscription Modes Different subscription modes have different semantics: Exclusive/Failover - guaranteed order, single active consumer; Shared - multiple active consumers, no order; Key_Shared - multiple active consumers, order for given key. Diagram: Producer 1 and Producer 2 write to a Pulsar Topic. Subscription A (Exclusive): Consumer A. Subscription B (Failover): Consumer B-1, Consumer B-2 (in case of failure in Consumer B-1). Subscription C (Shared): Consumer C-1, Consumer C-2 receive <K1, V10> <K2, V21> <K1, V12> and <K2, V20> <K1, V11> <K2, V22>. Subscription D (Key-Shared): Consumer D-1, Consumer D-2 receive <K1, V10> <K1, V11> <K1, V12> and <K2, V20> <K2, V21> <K2, V22>.
  • 28. Messaging Ordering Guarantees Topic Ordering Guarantees: ● Messages sent to a single topic or partition DO have an ordering guarantee. ● Messages sent to different partitions DO NOT have an ordering guarantee. Subscription Mode Guarantees: ● A single consumer can receive messages from the same partition in order using an exclusive or failover subscription mode. ● Multiple consumers can receive messages from the same key in order using the key_shared subscription mode.
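The key_shared guarantee can be illustrated with a small local simulation: if every message with the same key is always routed to the same consumer, per-key order is preserved even with several active consumers. This is only a sketch of the idea. The `route` and `dispatch` helpers are hypothetical, and the CRC32 hash used here is a stand-in chosen for convenience, not the hashing scheme the Pulsar broker actually uses.

```python
from collections import defaultdict
from zlib import crc32


def route(key: str, n_consumers: int) -> int:
    """Pick a consumer index for a key (stand-in hash, not Pulsar's)."""
    return crc32(key.encode("utf-8")) % n_consumers


def dispatch(messages, n_consumers):
    """Deliver (key, value) pairs to per-consumer inboxes in arrival order."""
    inbox = defaultdict(list)
    for key, value in messages:
        inbox[route(key, n_consumers)].append((key, value))
    return dict(inbox)


# Interleaved keys, mirroring the K1/K2 example in the subscription diagram.
MESSAGES = [
    ("K1", "V10"), ("K2", "V20"), ("K1", "V11"),
    ("K2", "V21"), ("K1", "V12"), ("K2", "V22"),
]

if __name__ == "__main__":
    for consumer, msgs in sorted(dispatch(MESSAGES, 2).items()):
        print(consumer, msgs)
```

With a shared subscription, by contrast, messages for one key may be spread across consumers, which is why no ordering is promised there.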
  • 30. Streaming Consumer Consumer Consumer Subscription Shared Failover Consumer Consumer Subscription In case of failure in Consumer B-0 Consumer Consumer Subscription Exclusive X Consumer Consumer Key-Shared Subscription Pulsar Topic/Partition Messaging Unified Messaging Model
  • 31. Topics Tenants (Compliance) Tenants (Data Services) Namespace (Microservices) Topic-1 (Cust Auth) Topic-1 (Location Resolution) Topic-2 (Demographics) Topic-1 (Budgeted Spend) Topic-1 (Acct History) Topic-1 (Risk Detection) Namespace (ETL) Namespace (Campaigns) Namespace (ETL) Tenants (Marketing) Namespace (Risk Assessment) Pulsar Cluster Pulsar Cluster
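The tenant / namespace / topic hierarchy shows up directly in fully qualified topic names such as `persistent://public/default/pi-sensors`, which appears in the Spark example in this deck. A small sketch of building and splitting such names follows. The helper names `topic_name` and `parse_topic` are hypothetical.

```python
def topic_name(tenant: str, namespace: str, topic: str,
               persistent: bool = True) -> str:
    """Build a fully qualified topic name from its three levels."""
    domain = "persistent" if persistent else "non-persistent"
    return f"{domain}://{tenant}/{namespace}/{topic}"


def parse_topic(name: str) -> dict:
    """Split a fully qualified topic name back into its parts."""
    domain, rest = name.split("://", 1)
    tenant, namespace, topic = rest.split("/", 2)
    return {
        "domain": domain,
        "tenant": tenant,
        "namespace": namespace,
        "topic": topic,
    }
```

For example, `topic_name("public", "default", "pi-sensors")` yields the topic string used in the Spark example.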
  • 35. Connectivity • Functions - Lightweight Stream Processing (Java, Python, Go) • Connectors - Sources & Sinks (Cassandra, Kafka, …) • Protocol Handlers - AoP (AMQP), KoP (Kafka), MoP (MQTT) • Processing Engines - Flink, Spark, Presto/Trino via Pulsar SQL • Data Offloaders - Tiered Storage - (S3)
  • 37. Schema Registry Schema Registry schema-1 (value=Avro/Protobuf/JSON) schema-2 (value=Avro/Protobuf/JSON) schema-3 (value=Avro/Protobuf/JSON) Schema Data ID Local Cache for Schemas + Schema Data ID + Local Cache for Schemas Send schema-1 (value=Avro/Protobuf/JSON) data serialized per schema ID Send (register) schema (if not in local cache) Read schema-1 (value=Avro/Protobuf/JSON) data deserialized per schema ID Get schema by ID (if not in local cache) Producers Consumers
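The "if not in local cache" steps in the schema registry diagram can be sketched with a toy in-memory registry. Both classes below are hypothetical stand-ins written for illustration; they are not the Pulsar client API. The sketch only shows that a consumer asks the registry for a schema once per schema ID and serves later reads from its local cache.

```python
class SchemaRegistry:
    """Toy in-memory registry mapping schema IDs to definitions."""

    def __init__(self):
        self._by_id = {}
        self._ids = {}
        self.lookups = 0  # counts remote "get schema by ID" calls

    def register(self, definition: str) -> int:
        """Register a schema if it is new and return its ID."""
        if definition not in self._ids:
            schema_id = len(self._ids) + 1
            self._ids[definition] = schema_id
            self._by_id[schema_id] = definition
        return self._ids[definition]

    def get(self, schema_id: int) -> str:
        self.lookups += 1
        return self._by_id[schema_id]


class CachingConsumer:
    """Consumer side: fetch a schema by ID only when it is not cached."""

    def __init__(self, registry: SchemaRegistry):
        self._registry = registry
        self._cache = {}

    def schema_for(self, schema_id: int) -> str:
        if schema_id not in self._cache:
            self._cache[schema_id] = self._registry.get(schema_id)
        return self._cache[schema_id]
```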
  • 38. Pulsar SQL Presto/Trino workers can read segments directly from bookies (or offloaded storage) in parallel. Diagram labels: Producer, Consumer; Broker 1 (Topic1-Part1), Broker 2 (Topic1-Part2), Broker 3 (Topic1-Part3); Bookie 1, Bookie 2, Bookie 3, each holding Segment 1, Segment 2, Segment 3, Segment 4, ... Segment X; Query Coordinator; SQL Workers; Query Topic Metadata.
  • 39. Apache NiFi Pulsar Connector https://streamnative.io/apache-nifi-connector/
  • 40. SQL select aqi, parameterName, dateObserved, hourObserved, latitude, longitude, localTimeZone, stateCode, reportingArea from airquality; select max(aqi) as MaxAQI, parameterName, reportingArea from airquality group by parameterName, reportingArea; select max(aqi) as MaxAQI, min(aqi) as MinAQI, avg(aqi) as AvgAQI, count(aqi) as RowCount, parameterName, reportingArea from airquality group by parameterName, reportingArea;
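To try the shape of the aggregate query above without a cluster, the sketch below runs it against an in-memory SQLite table. SQLite is only a local stand-in for the SQL engine used in the talk. The first row is taken from the El Paso PM10 record shown earlier; the other two rows are made-up sample values added so the aggregates have something to combine.

```python
import sqlite3

# (reportingArea, parameterName, aqi)
ROWS = [
    ("El Paso", "PM10", 23),  # from the sample record in the deck
    ("El Paso", "PM10", 31),  # hypothetical
    ("El Paso", "O3", 40),    # hypothetical
]


def summarize(rows):
    """Load rows into SQLite and run the min/max/avg/count query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE airquality "
        "(reportingArea TEXT, parameterName TEXT, aqi INTEGER)"
    )
    conn.executemany("INSERT INTO airquality VALUES (?, ?, ?)", rows)
    cur = conn.execute(
        "SELECT max(aqi) AS MaxAQI, min(aqi) AS MinAQI, avg(aqi) AS AvgAQI, "
        "count(aqi) AS RowCount, parameterName, reportingArea "
        "FROM airquality "
        "GROUP BY parameterName, reportingArea "
        "ORDER BY parameterName"
    )
    result = cur.fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in summarize(ROWS):
        print(row)
```

The `ORDER BY` clause is added here only to make the output order deterministic; it is not part of the query on the slide.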
  • 41. Building Spark SQL View val dfPulsar = spark.readStream.format("pulsar") .option("service.url", "pulsar://pulsar1:6650") .option("admin.url", "http://pulsar1:8080") .option("topic", "persistent://public/default/pi-sensors") .load() dfPulsar.printSchema() val pQuery = dfPulsar.selectExpr("*") .writeStream.format("console") .option("truncate", false) .start() https://github.com/tspannhw/FLiP-Pi-BreakoutGarden
  • 42. IoT Data IoT Ingestion: High-volume streaming sources, sensors, multiple message formats, diverse protocols and multi-vendor devices create data ingestion challenges. Other Sources: Transit data, news, Twitter, status feeds, REST data, stock data and more.
  • 44. Q&A
  • 46. Resources ● For a first look at Pulsar benchmark report, share your email in the chat ● Join the Pulsar Slack channel - Apache-Pulsar.slack.com ● Follow @streamnativeio and @apache_pulsar on Twitter ● Contact StreamNative Sales - doug@streamnative.io
  • 47. Too Many Tim Links ● https://dzone.com/articles/five-sensors-real-time-with-pulsar-and-python-on-a ● https://github.com/tspannhw/airquality ● https://github.com/tspannhw/FLiPN-AirQuality-REST ● https://github.com/tspannhw/pulsar-airquality-function ● https://github.com/tspannhw/FLiP-Pi-BreakoutGarden ● https://github.com/tspannhw/FLiPN-DEVNEXUS-2022 ● https://github.com/tspannhw/FLiP-Pi-Thermal ● https://github.com/tspannhw/FLiP-Pi-Weather ● https://github.com/tspannhw/FLiP-RP400 ● https://github.com/tspannhw/FLiP-Py-Pi-GasThermal
  • 48. StreamNative: By the Creators Of Apache Pulsar ✓ Original creators of Apache Pulsar & BookKeeper ✓ Operated the largest Pulsar/BookKeeper cluster ✓ Data veterans with extensive industry experience CONFIDENTIAL. DO NOT SHARE. ASF Member Pulsar/BookKeeper PMC Founder and CEO Sijie Guo ASF Member Pulsar/BookKeeper PMC CTO Matteo Merli Pulsar/BookKeeper PMC Co-Founder Jia Zhai