Building a Real-Time Analytics
Application with
Apache Pulsar and Apache Pinot
Mark Needham
@MarkHNeedham
15th November 2022
Mary Grygleski
@mgrygles
Mary Grygleski
The Passionate Developer Advocate
Mary is a Streaming Developer Advocate at DataStax, a
leading Data Management Company that specializes in
Database-as-a-Service, NoSQL, Big Data, Streaming, and
the Cloud-Native platform. Previously she was with the
Java and WebSphere/Open Source Advocacy team at
IBM.
Based out of Chicago, Mary is a Java Champion and
President and Executive Board Member of the Chicago
Java Users Group (CJUG). She is also a co-organizer of
the Data, Cloud and AI In Chicago, Chicago Cloud, and
IBM Cloud Chicago meetup groups.
She has extensive experience in product and application
design, development, integration, and deployment,
and specializes in Event-driven, Reactive
Java, Open Source, and Cloud-enabled Distributed
systems.
https://www.linkedin.com/in/mary-grygleski/
@mgrygles
https://www.twitch.tv/mgrygles
https://discord.gg/RMU4Juw
Who is Mary?
Mark Needham
Developer Relations Engineer
Mark Needham is an Apache Pinot advocate and
developer relations engineer at StarTree.
As a developer relations engineer, Mark helps users
learn how to use Apache Pinot to build their real-time
user-facing analytics applications. He also works on
developer experience, simplifying the getting-started
experience through product tweaks and
improvements to the documentation.
Mark writes about his experiences working with Pinot at
markhneedham.com.
https://www.linkedin.com/in/markhneedham/
@markhneedham
Who is Mark?
https://www.markhneedham.com/blog/
learndatawithmark.com
What is Real-Time Analytics?
Real-time analytics is the discipline that applies logic and mathematics
to data to provide insights for making better decisions quickly.
Events -> Insight -> Action
The value of data over time
[Chart: Value plotted against Time, with a region labelled Real-Time.]
Who’s interested in this data?
● Analysts
● Management
● Users
Real-Time Analytics Quadrant
[Figure: a 2x2 quadrant with axes Human Facing / Machine Facing and Internal / External. Examples shown: Observability, Real-Time Dashboard, Recommendation Engine, Fraud Detection, Order Tracking Service.]
Total users: 700 million
QPS: 10,000+
Latency SLA: < 100 ms (p99)
Freshness: seconds
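As a small illustration of what a "< 100 ms (p99)" latency SLA means, the sketch below computes a 99th percentile with the nearest-rank method and checks it against the 100 ms bound. The latency samples and the helper name `percentile_nearest_rank` are made up for illustration; real systems typically compute percentiles over far larger windows and may use a different percentile definition.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[max(rank, 1) - 1]


# Hypothetical query latencies in milliseconds (illustrative only).
latencies_ms = [12, 15, 18, 22, 25, 31, 40, 55, 70, 95]

p99 = percentile_nearest_rank(latencies_ms, 99)
meets_sla = p99 < 100  # the SLA on the slide: p99 under 100 ms
```

With only ten samples the p99 is simply the slowest query, which is why the SLA is meaningful only over a large number of requests.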
Examples of Real-Time Analytics
Examples of Real-Time Analytics
● Missed orders
● Inaccurate orders
● Downtime
● Top selling items
● Menu item feedback
Total users: 500,000+
QPS: 100s
Latency SLA: < 100 ms (p99)
Freshness: seconds to minutes
Examples of Real-Time Analytics
Source: Peter Bakkum, Engineering Manager @Stripe Financial
Properties of Real-Time Analytics Systems
Building a User-facing Real-Time Analytics System
● Velocity of ingestion
● Real-time ingestion
● 1000s of QPS
● Millisecond latency
● Seconds freshness
● Highly available
● Scalable
● Cost effective
● High dimensionality
What is Apache Pulsar?
Meet Pulsar
Open source
Created by Yahoo
Contributed to the Apache Software Foundation (ASF) in 2016
Top-level project (2018)
Cloud-native design
Cluster based
Multi-tenant
Simple client APIs (Java, C#, Python, Go, …)
➔ Separate compute and storage!
Guaranteed message delivery
If a message successfully reaches a Pulsar broker, it will be delivered to its intended target.
Light-weight serverless functions framework
Create complex processing logic within a Pulsar cluster (aka: data pipeline)
Tiered storage offloads
Offload data from hot/warm storage to cold/long-term storage when the data is aging out
Streaming versus not streaming
[Diagram: two pipelines compared. Streaming stage labels: Ingest data, Sink data, Select data, Process data. Not streaming stage labels: Ingest data, Persist data, Select data, Process data, Persist data, Select data.]
What is Apache Pinot?
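Apache Pinot Architecture
[Diagram: Pinot Controller, Pinot Broker, Zookeeper, and four Pinot Servers (S1, S2, S3, S4) holding segments 1 to 4.]
Segment assignment, one server per segment:
Seg1 -> S1
Seg2 -> S2
Seg3 -> S3
Seg4 -> S4
Segment assignment, two servers per segment:
Seg1 -> S1, S4
Seg2 -> S2, S3
Seg3 -> S3, S1
Seg4 -> S4, S2
Example query:
select count(*) from X
where country = us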
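The segment-to-server mapping on this slide suggests a scatter-gather query pattern: the broker picks one server for each segment, sends the query to those servers, and merges the partial results. The following is a toy sketch of that idea only, not Pinot's actual routing code. The function names and the per-segment match counts are hypothetical.

```python
import random

# Segment -> replica servers, copied from the two-servers-per-segment layout above.
SEGMENT_REPLICAS = {
    "Seg1": ["S1", "S4"],
    "Seg2": ["S2", "S3"],
    "Seg3": ["S3", "S1"],
    "Seg4": ["S4", "S2"],
}


def plan_scatter(segment_replicas, rng=random):
    """Pick one replica per segment and group the segments by chosen server."""
    plan = {}
    for segment, servers in segment_replicas.items():
        server = rng.choice(servers)
        plan.setdefault(server, []).append(segment)
    return plan


def gather_counts(plan, count_on_server):
    """Sum the per-server partial counts, as a broker might for count(*)."""
    return sum(count_on_server(server, segments) for server, segments in plan.items())


# Hypothetical number of rows matching `country = us` in each segment.
MATCHES = {"Seg1": 10, "Seg2": 7, "Seg3": 0, "Seg4": 5}

plan = plan_scatter(SEGMENT_REPLICAS, random.Random(42))
total = gather_counts(plan, lambda server, segs: sum(MATCHES[s] for s in segs))
```

Whichever replicas are chosen, every segment is read exactly once, so the merged count is the same. That property is what lets replicas share query load.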
Demo Time! 🥳
github.com/mneedham/pinot-wiki/tree/pulsar
Demo Architecture
Our data set: Wikimedia Recent Changes Feed
● A continuous stream of structured event data
describing changes made to Wikimedia properties.
● Published over HTTP using the Server-Sent Events
(SSE) protocol.
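To make the SSE format concrete, here is a minimal sketch of a parser that turns a stream of text lines into events. It covers only the `field: value` lines and blank-line terminators of the SSE wire format and is not the code used in the demo. The sample lines are an abbreviated, hand-written stand-in for a real feed message.

```python
import json


def parse_sse(lines):
    """Yield one dict per Server-Sent Event from an iterable of text lines."""
    event = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if event:
                yield event
                event = {}
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "data" and "data" in event:
            event["data"] += "\n" + value  # multiple data lines are newline-joined
        else:
            event[field] = value
    # An unterminated trailing event is discarded.


# Abbreviated stand-in for one message from the feed (illustrative only).
sample = [
    "event: message\n",
    'id: [{"topic":"eqiad.mediawiki.recentchange","partition":0,"timestamp":1647344554001}]\n',
    'data: {"type":"edit","title":"Bosmansdam High School","bot":false}\n',
    "\n",
]

events = list(parse_sse(sample))
payload = json.loads(events[0]["data"])
```

In practice the lines would come from a long-lived HTTP response rather than a list, but the parsing logic stays the same.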
Wikimedia Recent Changes Feed events
event: message
id:
[{"topic":"eqiad.mediawiki.recentchange","partition":0,"timestamp":1647344554001},{"topic":"codfw.me
diawiki.recentchange","partition":0,"offset":-1}]
data:
{"$schema":"/mediawiki/recentchange/1.0.0","meta":{"uri":"https://en.wikipedia.org/wiki/Bosmansdam_H
igh_School","request_id":"f72015bb-376c-48b9-9863-afc0c75a72c8","id":"99c272ae-d31c-4535-9dac-69b098
3171d6","dt":"2022-03-15T11:42:34Z","domain":"en.wikipedia.org","stream":"mediawiki.recentchange","t
opic":"eqiad.mediawiki.recentchange","partition":0,"offset":3714501013},"id":1485381286,"type":"edit
","namespace":0,"title":"Bosmansdam High School","comment":"v2.04b - Fix errors for [[WP:WCW|CW
project]] (Template value ends with break)","timestamp":1647344554,"user":"ZI
Jony","bot":false,"minor":true,"length":{"old":16089,"new":16085},"revision":{"old":1075262250,"new"
:1077261343},"server_url":"https://en.wikipedia.org","server_name":"en.wikipedia.org","server_script
_path":"/w","wiki":"enwiki","parsedcomment":"v2.04b - Fix errors for <a href="/wiki/Wikipedia:WCW"
class="mw-redirect" title="Wikipedia:WCW">CW project</a> (Template value ends with break)"}
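Events like the one above are nested JSON, so it can help to flatten them into rows before querying. The sketch below picks a few fields from the sample event and converts the epoch-seconds `timestamp` to milliseconds. The choice of fields, the output column names, and the helper `to_row` are assumptions for illustration and may not match what the demo does.

```python
import json


def to_row(event_json):
    """Flatten one recent-change event into a flat dict of selected fields."""
    event = json.loads(event_json)
    meta = event.get("meta", {})
    return {
        "id": event.get("id"),
        "type": event.get("type"),
        "title": event.get("title"),
        "user": event.get("user"),
        "bot": event.get("bot"),
        "domain": meta.get("domain"),
        # `timestamp` is epoch seconds in the sample event; store milliseconds.
        "ts": event["timestamp"] * 1000,
    }


# Abbreviated copy of the sample event shown above.
sample_event = json.dumps({
    "meta": {"domain": "en.wikipedia.org", "dt": "2022-03-15T11:42:34Z"},
    "id": 1485381286,
    "type": "edit",
    "title": "Bosmansdam High School",
    "timestamp": 1647344554,
    "user": "ZI Jony",
    "bot": False,
})

row = to_row(sample_event)
```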
Demo Done! 😌
Powered by Apache Pinot
Community
● 3.9k GitHub stars
● 2400+ Slack users
● 100+ companies
Performance
● 1M+ events/sec
● 200k+ peak QPS
● Millisecond query latency
pinot.apache.org
Who else is using Pulsar?
Takeaways
● Real-time analytics lets us create applications that give users
actionable insights
● Properties of these systems: Fresh data, fast querying, at scale
● Pulsar + Pinot is the perfect combination to achieve this
Thank you! (from Mark) 🙇
dev.startree.ai
@MarkHNeedham
stree.ai/slack
@learndatawithmark
Thank you! (from Mary)
@mgrygles
Apache Pulsar Slack sign-up
https://apache-pulsar.herokuapp.com/
https://pulsar-neighborhood.github.io/
Resources
Astra DB: https://astra.datastax.com
Astra Streaming:
https://www.datastax.com/products/astra-streaming
Luna Streaming:
https://www.datastax.com/products/luna-streaming
CDC for Astra DB:
https://docs.datastax.com/en/astra/docs/astream-cdc.html
https://pulsar.apache.org/
https://bookkeeper.apache.org/
https://zookeeper.apache.org
Check out 5 Minutes About Pulsar on
https://bit.ly/3bgkRxJ
How to start coding?
Check out Awesome-Astra
https://awesome-astra.github.io/docs/
Follow Mary’s Twitch Stream
(Different topics: Java, Open Source, Distributed Messaging, Event-Streaming, Cloud, DevOps, etc)
Wednesdays at 2pm US/CST
https://twitch.tv/mgrygles
Publishing Messages to Kafka
Creating Pinot Table
docker exec -it pinot-controller-wiki bin/pinot-admin.sh \
  AddTable \
  -tableConfigFile /config/table.json \
  -schemaFile /config/schema.json \
  -exec
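The `AddTable` command takes a schema file and a table config file. As a rough idea of what a Pinot schema file can contain, the sketch below builds one as a Python dict and serializes it to JSON. The schema name and every column in it are hypothetical. The actual `/config/schema.json` in the demo repository may look different.

```python
import json

# Hypothetical schema for recent-change events; all names are illustrative.
schema = {
    "schemaName": "wikievents",
    "dimensionFieldSpecs": [
        {"name": "id", "dataType": "LONG"},
        {"name": "type", "dataType": "STRING"},
        {"name": "title", "dataType": "STRING"},
        {"name": "user", "dataType": "STRING"},
        {"name": "bot", "dataType": "BOOLEAN"},
        {"name": "domain", "dataType": "STRING"},
    ],
    "dateTimeFieldSpecs": [
        {
            "name": "ts",
            "dataType": "TIMESTAMP",
            "format": "1:MILLISECONDS:EPOCH",
            "granularity": "1:MILLISECONDS",
        }
    ],
}

# Serialize to the JSON text that would be written to a schema file.
schema_json = json.dumps(schema, indent=2)
```

The table config is a separate file that, among other things, tells Pinot which stream to ingest from. It is not sketched here.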
Pinot
Streamlit Dashboard
Streamlit Dashboard: Bots?
Streamlit Dashboard: Top Users
Streamlit Dashboard: Top Bots/Non Bots
Streamlit Dashboard: What got changed?
Streamlit Dashboard: By who?

Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot

  • 1. Building a Real-Time Analytics Application with Apache Pulsar and Apache Pinot Mark Needham @MarkHNeedham 15th November 2022 Mary Grygleski @mgrygles
  • 2. Mary Grygleski The Passionate Developer Advocate Mary is a Streaming Developer Advocate at DataStax, a leading Data Management Company that specializes in Database-as-a-Service, NoSQL, Big Data, Streaming, and the Cloud-Native platform. Previously she was with the Java and WebSphere/Open Source Advocacy team at IBM. Based out of Chicago, Mary is a Java Champion and President and Executive Board Member of the Chicago Java Users Group (CJUG). She is also a co-organizer of the Data, Cloud and AI In Chicago, Chicago Cloud, and IBM Cloud Chicago meetup groups. She has extensive experience in product and application design, development, integration, and deployment, and specializes in Event-driven, Reactive Java, Open Source, and Cloud-enabled Distributed systems. https://www.linkedin.com/in/mary-grygleski/ @mgrygles https://www.twitch.tv/mgrygles https://discord.gg/RMU4Juw Who is Mary?
  • 3. Mark Needham Developer Relations Engineer Mark Needham is an Apache Pinot advocate and developer relations engineer at StarTree. As a developer relations engineer, Mark helps users learn how to use Apache Pinot to build their real-time user-facing analytics applications. He also works on developer experience, simplifying the getting started experience by making product tweaks and improvements to the documentation. Mark writes about his experiences working with Pinot at markhneedham.com. https://www.linkedin.com/in/markhneedham/ @markhneedham Who is Mark? https://www.markhneedham.com/blog/ learndatawithmark.com
  • 4. What is Real-Time Analytics? Real-time analytics is the discipline that applies logic and mathematics to data to provide insights for making better decisions quickly.
  • 7. Events -> Insight -> Action Events Insight Action
  • 8. The value of data over time Time Value
  • 9. The value of data over time Time Value Real-Time
  • 10. The value of data over time Time Value Real-Time Who’s interested in this data? ● Analysts ● Management ● Users
  • 11. Real-Time Analytics Quadrant. Axes: Human Facing vs. Machine Facing, Internal vs. External. Use cases: Observability, Real-Time Dashboard, Recommendation Engine, Fraud Detection, Order Tracking Service
  • 12. Examples of Real-Time Analytics. Total users: 700 million. QPS: 10,000+. Latency SLA: < 100 ms (p99). Freshness: seconds.
  • 13. Examples of Real-Time Analytics. Missed orders, Inaccurate orders, Downtime, Top selling items, Menu item Feedback. Total users: 500,000+. QPS: 100s. Latency SLA: < 100 ms (p99). Freshness: seconds to minutes.
  • 14. Examples of Real-Time Analytics Source: Peter Bakkum, Engineering Manager @Stripe Financial
  • 15. Properties of Real-Time Analytics Systems
  • 16. Building a User-facing Real-Time Analytics System. Real-Time Ingestion (velocity of ingestion), 1000s of QPS, Milliseconds Latency, Seconds Freshness, Highly Available, Scalable, Cost Effective, High Dimensionality
  • 17. What is Apache Pulsar?
  • 18. Meet Pulsar. Open source: created by Yahoo, contributed to the Apache Software Foundation (ASF) in 2016, top-level project (2018). Cloud-native design: cluster based, multi-tenant, simple client APIs (Java, C#, Python, Go, …), separate compute and storage. Guaranteed message delivery: if a message successfully reaches a Pulsar broker, it will be delivered to its intended target. Light-weight serverless functions framework: create complex processing logic within a Pulsar cluster (aka data pipeline). Tiered storage offloads: offload data from hot/warm storage to cold/long-term storage when the data is aging out.
  • 19. Streaming versus not streaming. Streaming: Ingest data, Sink data, Select data, Process data. Not Streaming: Ingest data, Persist data, Select data, Process data, Persist data, Select data.
  • 20. What is Apache Pinot?
  • 21. Apache Pinot Architecture. Components: Pinot Controller, Zookeeper, Pinot Broker, Pinot Servers (S1, S2, S3, S4) holding segments 1 to 4. Segment assignments shown: Seg1 -> S1, Seg2 -> S2, Seg3 -> S3, Seg4 -> S4; and Seg1 -> S1, S4; Seg2 -> S2, S3; Seg3 -> S3, S1; Seg4 -> S4, S2. Example query: select count(*) from X where country = us
  • 24. Real-Time Analytics Quadrant. Axes: Human Facing vs. Machine Facing, Internal vs. External. Use cases: Observability, Real-Time Dashboard, Recommendation Engine, Fraud Detection, Order Tracking Service
  • 26. Our data set: Wikimedia Recent Changes Feed ● A continuous stream of structured event data describing changes made to Wikimedia properties. ● Published over HTTP using the Server-Sent Events (SSE) protocol.
  • 27. Wikimedia Recent Changes Feed events event: message id: [{"topic":"eqiad.mediawiki.recentchange","partition":0,"timestamp":1647344554001},{"topic":"codfw.mediawiki.recentchange","partition":0,"offset":-1}] data: {"$schema":"/mediawiki/recentchange/1.0.0","meta":{"uri":"https://en.wikipedia.org/wiki/Bosmansdam_High_School","request_id":"f72015bb-376c-48b9-9863-afc0c75a72c8","id":"99c272ae-d31c-4535-9dac-69b0983171d6","dt":"2022-03-15T11:42:34Z","domain":"en.wikipedia.org","stream":"mediawiki.recentchange","topic":"eqiad.mediawiki.recentchange","partition":0,"offset":3714501013},"id":1485381286,"type":"edit","namespace":0,"title":"Bosmansdam High School","comment":"v2.04b - Fix errors for [[WP:WCW|CW project]] (Template value ends with break)","timestamp":1647344554,"user":"ZI Jony","bot":false,"minor":true,"length":{"old":16089,"new":16085},"revision":{"old":1075262250,"new":1077261343},"server_url":"https://en.wikipedia.org","server_name":"en.wikipedia.org","server_script_path":"/w","wiki":"enwiki","parsedcomment":"v2.04b - Fix errors for <a href=\"/wiki/Wikipedia:WCW\" class=\"mw-redirect\" title=\"Wikipedia:WCW\">CW project</a> (Template value ends with break)"}
  • 30. Powered by Apache Pinot. Community: 3.9k GitHub stars, 2400+ Slack users, 100+ companies. Performance: 1M+ events/sec, 200k+ peak QPS, query latency in ms. pinot.apache.org
  • 31. Who else is using Pulsar?
  • 32. Takeaways ● Real-time analytics lets us create applications that give users actionable insights ● Properties of these systems: Fresh data, fast querying, at scale ● Pulsar + Pinot is the perfect combination to achieve this
  • 33. Thank you! (from Mark) 🙇 dev.startree.ai @MarkHNeedham stree.ai/slack @learndatawithmark
  • 34. Thank you! (from Mary) @mgrygles Apache Pulsar Slack sign-up https://apache-pulsar.herokuapp.com/ https://pulsar-neighborhood.github.io/
  • 35. Resources Astra DB: https://astra.datastax.com Astra Streaming: https://www.datastax.com/products/astra-streaming Luna Streaming: https://www.datastax.com/products/luna-streaming CDC for Astra DB: https://docs.datastax.com/en/astra/docs/astream-cdc.html https://pulsar.apache.org/ https://bookkeeper.apache.org/ https://zookeeper.apache.org
  • 36. Check out 5 Minutes About Pulsar on https://bit.ly/3bgkRxJ
  • 37. How to start coding? Check out Awesome-Astra https://awesome-astra.github.io/docs/
  • 38. Follow Mary’s Twitch Stream (Different topics: Java, Open Source, Distributed Messaging, Event-Streaming, Cloud, DevOps, etc) Wednesday at 2pm-US/CST https://twitch.tv/mgrygles
  • 40. Creating Pinot Table docker exec -it pinot-controller-wiki bin/pinot-admin.sh AddTable -tableConfigFile /config/table.json -schemaFile /config/schema.json -exec
  • 42. Pinot
  • 43. Pinot
  • 47. Streamlit Dashboard: Top Bots/Non Bots
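
The Recent Changes events on slide 27 arrive as Server-Sent Events, and the final dashboard splits activity into bots and non-bots. The sketch below is one possible way to parse that text format and tally the `bot` flag in plain Python. It is not the code used in the talk: the function names `parse_sse` and `bot_counts`, the trimmed-down sample events, and the user `ExampleBot` are assumptions made for illustration, and a real pipeline would publish the events to Pulsar and query them from Pinot rather than count them in memory.

```python
import json
from collections import Counter


def parse_sse(stream_text):
    """Yield one dict of fields per SSE event; events are separated by blank lines."""
    for block in stream_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if not line or line.startswith(":"):
                continue  # skip empty lines and SSE comment lines
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # SSE drops a single leading space after the colon
            # Repeated fields (notably "data") are joined with newlines.
            fields[name] = fields[name] + "\n" + value if name in fields else value
        if fields:
            yield fields


def bot_counts(stream_text):
    """Count "message" events by the boolean "bot" field of their JSON payload."""
    counts = Counter()
    for event in parse_sse(stream_text):
        if event.get("event") != "message" or "data" not in event:
            continue
        change = json.loads(event["data"])
        counts["bot" if change.get("bot") else "non-bot"] += 1
    return counts


# Hypothetical, heavily trimmed events in the shape shown on slide 27.
# "ExampleBot" is an invented user name, not taken from the feed.
SAMPLE = (
    'event: message\n'
    'id: 1\n'
    'data: {"type":"edit","user":"ZI Jony","bot":false,"wiki":"enwiki"}\n'
    '\n'
    'event: message\n'
    'id: 2\n'
    'data: {"type":"edit","user":"ExampleBot","bot":true,"wiki":"enwiki"}\n'
)

if __name__ == "__main__":
    print(dict(bot_counts(SAMPLE)))
```

Parsing is kept separate from counting so that the same `parse_sse` generator could feed a Pulsar producer instead of an in-memory counter.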