History of Apache Pinot

@apachepinot | @KishoreBytes
Happy Anniversary
First Meetup

My Journey
2006-2011: Yahoo Ads
2011-2014: Espresso & Apache Helix
2015+: Apache Pinot & ThirdEye
A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.
- Leslie Lamport

Pinot @ LinkedIn
• 70+ products
• User-facing analytics: 120k+ queries/sec, latency from milliseconds to 1s
• Business metrics analytics: 10,000+ metrics, 50,000+ dimensions
• ThirdEye (anomaly detection and root cause analysis): 50+ teams, 100K time series

LinkedIn Analytics Ecosystem (figure): event data (Kafka) and entity data (Espresso), ingested in batch and realtime, feed user-facing analytics, business-facing analytics, and anomaly detection & root cause analysis.

Pinot @ Other Companies

Why did we build it?

2014
Who Viewed My Profile - Before 2014
Sensei + Bobo (Lucene)

Upgrade
2014
LinkedIn Upgrades “Who’s Viewed Your Profile”
New Look, Better Analytics.

2014
Upgrade
A Huge Success!

Nightmares of Success
2014
● 1000+ queries/sec
● Expanded the cluster to 1000 nodes to maintain SLA
● Had to break the cluster into multiple shards based on member ID
● Code Yellow
Most of us just wanted to kill this product and go back to batch mode

Nightmares of Success
2014
However… we were tasked to figure out a solution, as the engagement metrics were out of this world.

What was wrong with the existing stack?
2014
Search system repurposed for analytics | Inverted-index driven | Fixed query plan
Sensei, Elasticsearch, and Druid had a similar architecture.

• It was a search system repurposed for analytics
• High reliance on the inverted index led to page cache thrashing
• Fixed query plan -> weak optimization
• All existing systems (Elasticsearch, Druid) had the same issues

Solution: Reasoning from first principles.
Break down the problem into basic components and reassemble from the ground up.

Pinot Query Execution Stack (figure). Layers: filter and storage. Techniques shown: scan, post-filter, inverted index, sorted index, columnar store, byte encoding, bit/RLE encoding, star-tree pre-aggregation, star-tree index, and per-segment flexible query planning. The legend marks each technique as either a common technique or Pinot-only.
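The stack lists bit encoding among the storage techniques. One common form of it, assumed here because the slide does not detail Pinot's on-disk format, is dictionary encoding plus fixed-width bit packing: each distinct value gets a dense integer id, and every id is stored in just enough bits for the column's cardinality. The helper names are hypothetical, and a Python int stands in for a real byte buffer.

```python
def dictionary_encode(values):
    """Assign each distinct value a dense id in sorted order (hypothetical helper)."""
    dictionary = sorted(set(values))
    id_of = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [id_of[value] for value in values]


def bit_width(cardinality):
    """Smallest number of bits that can hold ids 0..cardinality-1 (at least 1)."""
    return max(1, (cardinality - 1).bit_length())


def pack(ids, width):
    """Concatenate fixed-width ids into one bit buffer (a Python int here)."""
    buffer = 0
    for i, dict_id in enumerate(ids):
        buffer |= dict_id << (i * width)
    return buffer


def unpack_at(buffer, index, width):
    """Random access: extract the id stored at position `index`."""
    return (buffer >> (index * width)) & ((1 << width) - 1)


column = ["US", "IN", "US", "FR", "IN", "US"]
dictionary, ids = dictionary_encode(column)  # ["FR", "IN", "US"], [2, 1, 2, 0, 1, 2]
width = bit_width(len(dictionary))           # 3 distinct values fit in 2 bits each
buffer = pack(ids, width)
print(dictionary[unpack_at(buffer, 3, width)])  # prints "FR"
```

Because every id has the same width, any row can be decoded with a shift and a mask, so compression does not sacrifice random access.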

Filter: Sorted Index
• Better data compression:
○ Run-length encoding
○ Can be accessed as a forward or inverted index
• Spatial locality
(Figure: sorted index compared with inverted index)
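The sorted-index bullets can be illustrated with a small sketch. It assumes the segment is physically sorted on the column, so each distinct value occupies one contiguous run of document ids; the class and method names are hypothetical, not Pinot's API. Storing one (value, first doc id, last doc id) run per value is the run-length compression, and binary search over the runs serves both inverted (value to docs) and forward (doc to value) access.

```python
from bisect import bisect_left, bisect_right


class SortedIndex:
    """Hypothetical run-length-encoded index over a column sorted by value.

    Each run is (value, first_doc_id, last_doc_id), both ends inclusive.
    """

    def __init__(self, sorted_column):
        self.runs = []
        for doc_id, value in enumerate(sorted_column):
            if self.runs and self.runs[-1][0] == value:
                # Same value as the previous doc: extend the current run.
                self.runs[-1] = (value, self.runs[-1][1], doc_id)
            else:
                self.runs.append((value, doc_id, doc_id))
        self.values = [value for value, _, _ in self.runs]  # increasing
        self.starts = [start for _, start, _ in self.runs]  # increasing

    def doc_range(self, value):
        """Inverted-index access: value -> (first, last) doc ids, or None."""
        i = bisect_left(self.values, value)
        if i < len(self.values) and self.values[i] == value:
            _, first, last = self.runs[i]
            return first, last
        return None

    def value_at(self, doc_id):
        """Forward-index access: doc id -> value, by binary search on run starts."""
        i = bisect_right(self.starts, doc_id) - 1
        return self.runs[i][0]


index = SortedIndex(["CA", "CA", "CA", "NY", "NY", "TX"])
print(index.runs)             # [('CA', 0, 2), ('NY', 3, 4), ('TX', 5, 5)]
print(index.doc_range("NY"))  # (3, 4)
print(index.value_at(4))      # NY
```

Because matching documents sit next to each other, a filter on the sorted column becomes one contiguous range scan, which is the spatial-locality benefit the slide mentions.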

Star-tree Index: Smart Materialized View
• Trade-off between latency and space
• Limit the maximum number of records (T) scanned per query via partial pre-aggregation
(Figure: latency vs. space requirement for T = 1, 100, 10,000, 1,000,000, and infinity. A second figure highlights star-tree pre-aggregation and the star-tree index within the aggregation, filter, and storage layers.)
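The T trade-off can be made concrete with a heavily simplified star-tree sketch. Assumptions, none of them taken from Pinot's implementation: equality filters only, a single SUM metric, a fixed dimension order, and records given as (dimension tuple, metric) pairs; `build`, `query_sum`, and `STAR` are hypothetical names. A node holding more than T records splits on the next dimension and also gets a "star" child in which that dimension is aggregated away, so a query that does not filter on a dimension follows the star branch instead of scanning raw rows.

```python
from collections import defaultdict

STAR = "*"  # placeholder meaning "aggregated across this dimension"


def aggregate(records):
    """Merge records sharing a dimension tuple by summing their metric."""
    sums = defaultdict(int)
    for dims, metric in records:
        sums[dims] += metric
    return list(sums.items())


class Node:
    def __init__(self, level):
        self.level = level   # index of the dimension this node splits on
        self.children = {}   # dimension value -> Node
        self.star = None     # Node with this dimension aggregated away
        self.records = None  # populated only on leaves


def build(records, level, num_dims, threshold):
    """Split until a node holds at most `threshold` records (the T on the slide)."""
    node = Node(level)
    if len(records) <= threshold or level == num_dims:
        node.records = records
        return node
    groups = defaultdict(list)
    for dims, metric in records:
        groups[dims[level]].append((dims, metric))
    for value, group in groups.items():
        node.children[value] = build(group, level + 1, num_dims, threshold)
    # Star child: replace this dimension with STAR and pre-aggregate.
    starred = [(dims[:level] + (STAR,) + dims[level + 1:], metric)
               for dims, metric in records]
    node.star = build(aggregate(starred), level + 1, num_dims, threshold)
    return node


def query_sum(root, filters):
    """SUM(metric) under equality filters {dim index: value}.

    Returns (total, records_scanned).
    """
    node = root
    while node.records is None:
        if node.level in filters:
            node = node.children.get(filters[node.level])
            if node is None:
                return 0, 0
        else:
            node = node.star  # unfiltered dimension: use the pre-aggregated branch
    # Leaf: dimensions below node.level are already resolved on the path.
    total = sum(metric for dims, metric in node.records
                if all(dims[d] == v for d, v in filters.items() if d >= node.level))
    return total, len(node.records)


data = [(("US", "mobile", "2020"), 5), (("US", "web", "2020"), 3),
        (("IN", "mobile", "2020"), 7), (("IN", "web", "2019"), 2)]
tree = build(aggregate(data), 0, 3, threshold=1)
flat = build(aggregate(data), 0, 3, threshold=float("inf"))
print(query_sum(tree, {0: "US"}))  # (8, 1): one pre-aggregated record scanned
print(query_sum(flat, {0: "US"}))  # (8, 4): T = infinity scans every record
```

Lowering the threshold grows the tree (more space) and caps the records scanned per query (lower latency); `threshold=float("inf")` degenerates to a plain scan, matching the T = infinity end of the figure.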

Star-tree Index:
Smart Materialized View

• ~5ms average latency
• <100ms 95th percentile
• No Inverted Index!
• No caching!
• 45x improvement in efficiency
Pinot: Before & After
Before (2014): 1000 nodes, 1500 queries/sec, 200M+ members
After (2020): 75 nodes, 5,000 queries/sec, 700M+ members
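As a rough cross-check of the before/after numbers (1000 nodes serving 1500 queries/sec vs. 75 nodes serving 5,000 queries/sec), the short calculation below computes queries/sec per node. It assumes "efficiency" means per-node query throughput, which the slides do not define, and it ignores the growth in members, so treat it only as a sanity check.

```python
def queries_per_node(queries_per_sec, nodes):
    """Query throughput served per node, in queries/sec."""
    return queries_per_sec / nodes


before = queries_per_node(1500, 1000)  # 2014: 1.5 queries/sec per node
after = queries_per_node(5000, 75)     # 2020: ~66.7 queries/sec per node
print(f"{after / before:.1f}x per-node throughput")  # prints "44.4x per-node throughput"
```

Under that reading the ratio lands close to the 45x figure quoted on the slide.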

Performance Comparison
User Facing / Business Facing

Pinot vs Druid: single-threaded, 5000 queries
              Pinot        Druid
Total time    11 minutes   24 minutes
P50           84ms         136ms
P90           206ms        667ms

Multiple Use Cases: One Platform
• User-facing applications: 70+
• Business-facing metrics: 10k
• Anomaly detection: 100k time series
• 120k queries/sec served; 1M+ events/sec ingested from Kafka

What’s next

Latency vs Flexibility (figure)
Data preparation shown: fact table, dimension table, pre-join, pre-aggregation, pre-cube.
Systems plotted on latency (low to high) and flexibility (low to high) axes: Spark SQL, Presto, BigQuery, Pinot, Druid, Elasticsearch, Kylin, KV store.

Speed + Flexibility
Pinot + Presto streaming connector

Pinot Roadmap
Feature | Applicable to
Upsert support | Analytics on mutable data
Map, full-text, JSON | Richer data types
Kinesis connector | Amazon cloud
Range, geospatial | Metric queries (e.g. latency > 3 sec), geo queries
Cluster manager dashboard | Ease of use
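The range item targets metric predicates such as latency > 3 sec. A generic way to answer such predicates without scanning every row is to keep (value, doc id) pairs sorted by value and binary-search the boundary. This is only an illustration of the idea; it is not Pinot's range index format, and the function names are hypothetical.

```python
from bisect import bisect_right


def build_range_index(column):
    """Pair each value with its doc id and sort by value (hypothetical layout)."""
    return sorted((value, doc_id) for doc_id, value in enumerate(column))


def docs_greater_than(index, threshold):
    """Doc ids whose value is strictly greater than `threshold`.

    (threshold, inf) sorts after every entry equal to threshold, so
    bisect_right lands on the first entry with a larger value.
    """
    pos = bisect_right(index, (threshold, float("inf")))
    return sorted(doc_id for _, doc_id in index[pos:])


latency_sec = [1.2, 3.5, 0.8, 3.0, 7.1]  # one value per document
index = build_range_index(latency_sec)
print(docs_greater_than(index, 3))       # [1, 4]
```

The lookup costs one binary search plus the size of the result, instead of one comparison per document.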

Thank you
Christina Luu, Carlos Mendivil, Margarita Mendoza, Noel Navarro, Molly Vorwerck (Uber), and Kenny Bastani.

Questions
Contributors - We love distributed systems!
• Apache Pinot (incubating) 0.3.0 is available at
https://pinot.apache.org/download
• Pinot Twitter Account
https://twitter.com/ApachePinot
• Pinot Meetup Page
https://www.meetup.com/apache-pinot
• Pinot Slack Channel
https://tinyurl.com/pinotSlackChannel
