Two of the most frequently asked questions about Pinot’s history are “Why did LinkedIn build Pinot?” and “How is it different from Druid, ElasticSearch, and Kylin?” In this talk, we will go over the use cases that motivated us to build Pinot and how it has changed the analytics landscape at LinkedIn, Uber, and other companies.
2. @apachepinot | @KishoreBytes
My Journey
2006-2011: Yahoo Ads
2011-2014: Espresso & Apache Helix
2015+: Apache Pinot & ThirdEye
"A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable."
- Leslie Lamport
Nightmares of Success
2014
● 1000+ queries/sec
● Expanded the cluster to 1000 nodes to maintain SLA
● Had to break the cluster into multiple shards based on member ID
● Code Yellow
Most of us just wanted to kill this product and go back to batch mode
What was wrong with
the existing stack?
2014
Search system repurposed for analytics
Inverted index driven
Fixed query plan
Sensei, ElasticSearch, and Druid had a similar architecture
• The existing stack was a search system repurposed for analytics
• Heavy reliance on the inverted index led to thrashing of the system page cache
• Fixed query plan -> weak optimization
• The other existing systems (ElasticSearch, Druid) had the same issues
Solution: Reasoning from first principles
Break down the problem into basic components and reassemble from the ground up
• Better data compression:
○ Run length encoding
○ Can be accessed as forward/inverted index
• Spatial locality
[Figure: filter evaluation with a sorted index vs. an inverted index]
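The run-length encoding idea on this slide can be illustrated with a small Python sketch. It is not Pinot's actual implementation; the function names (`build_sorted_index`, `docs_for_value`, `value_for_doc`) and the sample `member_ids` column are hypothetical. It assumes a column whose values are already sorted, stores one (value, start doc, end doc) run per distinct value, and shows how the same structure can answer both inverted-style lookups (value to doc range) and forward-style lookups (doc id to value).

```python
from bisect import bisect_left, bisect_right


def build_sorted_index(sorted_column):
    """Run-length encode a sorted column into parallel lists of runs."""
    values, starts, ends = [], [], []
    for doc_id, value in enumerate(sorted_column):
        if values and values[-1] == value:
            ends[-1] = doc_id  # extend the current run
        else:
            values.append(value)
            starts.append(doc_id)
            ends.append(doc_id)
    return values, starts, ends


def docs_for_value(index, value):
    """Inverted-index style access: value -> contiguous range of doc ids."""
    values, starts, ends = index
    i = bisect_left(values, value)
    if i < len(values) and values[i] == value:
        return range(starts[i], ends[i] + 1)
    return range(0)  # value not present


def value_for_doc(index, doc_id):
    """Forward-index style access: doc id -> value."""
    values, starts, ends = index
    if not values or doc_id < 0 or doc_id > ends[-1]:
        raise IndexError(doc_id)
    # The run containing doc_id is the last run whose start is <= doc_id.
    return values[bisect_right(starts, doc_id) - 1]


# Hypothetical sorted column (for example, member ids).
member_ids = [101, 101, 101, 205, 205, 309]
index = build_sorted_index(member_ids)
```

Because a filter on the sorted column resolves to one contiguous doc range, the matching rows sit next to each other in storage, which is the spatial locality the slide refers to.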
Star-tree Index: Smart Materialized View
• Trade-off between latency and space
• Limit the maximum number of records (T) to scan per query via partial pre-aggregation
[Figure: latency vs. space requirement for T = 1, 100, 10,000, 1,000,000, and infinity]
[Figure labels: Aggregation, Filter, Storage; Star-Tree Pre-aggregation, Star-Tree Index]
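To make the "limit T records per query" idea concrete, here is a heavily simplified Python sketch of a star-tree-like structure. It is an illustration under stated assumptions, not Pinot's implementation: the names `build`, `query_sum`, `STAR`, and the sample data are hypothetical, only SUM is supported, and filters are equality-only. A node is split on the next dimension whenever it holds more than T pre-aggregated records, and every split also gets a "star" child in which that dimension is collapsed.

```python
from collections import defaultdict

STAR = "*"  # marker for "any value" on a collapsed dimension


def aggregate(records):
    """Sum the metric of records that share identical dimension values."""
    totals = defaultdict(int)
    for dims, metric in records:
        totals[dims] += metric
    return list(totals.items())


def build(records, num_dims, max_leaf_records, depth=0):
    """Split on one dimension per level until a node holds <= T records."""
    records = aggregate(records)
    if len(records) <= max_leaf_records or depth == num_dims:
        return {"leaf": records}
    by_value = defaultdict(list)
    for dims, metric in records:
        by_value[dims[depth]].append((dims, metric))
    # Star child: the same records with this dimension collapsed.
    by_value[STAR] = [
        (dims[:depth] + (STAR,) + dims[depth + 1:], metric)
        for dims, metric in records
    ]
    return {
        "children": {
            value: build(recs, num_dims, max_leaf_records, depth + 1)
            for value, recs in by_value.items()
        }
    }


def query_sum(tree, filters):
    """Return (sum, records_scanned) for equality filters {dim_index: value}."""
    node, depth = tree, 0
    while "leaf" not in node:
        # Follow the filtered value if there is one, else the star child.
        node = node["children"].get(filters.get(depth, STAR))
        if node is None:
            return 0, 0
        depth += 1
    total = scanned = 0
    for dims, metric in node["leaf"]:
        scanned += 1
        if all(dims[d] == v for d, v in filters.items()):
            total += metric
    return total, scanned


# Hypothetical data: dimensions are (country, browser), metric is a count.
rows = [
    (("US", "chrome"), 10), (("US", "firefox"), 5), (("CA", "chrome"), 7),
    (("CA", "safari"), 3), (("US", "chrome"), 2), (("IN", "chrome"), 4),
]
T = 2
tree = build(rows, num_dims=2, max_leaf_records=T)
```

With a small T the tree materializes more star nodes (more space) and each query scans fewer records (lower latency); with T set to infinity nothing is pre-aggregated and every query scans the raw records, which is the trade-off described on the slide.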
• ~5ms average latency
• <100ms 95th percentile
• No Inverted Index!
• No caching!
• 45x improvement in efficiency
Pinot:
Before & After
2014
2016
              Before (2014)   After (2020)
Nodes         1000            75
Queries/sec   1500            5,000
Members       200M+           700M+
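The slides do not say how the 45x efficiency figure was derived. One plausible reading, offered here only as an assumption, is queries per second served per node, before versus after. The short Python check below uses the node and query numbers from the slide; the helper name `queries_per_node` is hypothetical.

```python
def queries_per_node(queries_per_sec, nodes):
    """Throughput carried by each node, in queries/sec."""
    return queries_per_sec / nodes


# Figures taken from the before/after slide.
before = queries_per_node(1500, 1000)  # 2014: 1500 queries/sec on 1000 nodes
after = queries_per_node(5000, 75)     # 2020: 5,000 queries/sec on 75 nodes

# Assumed definition of "efficiency": ratio of per-node throughput.
improvement = after / before
print(f"{improvement:.1f}x")
```

Under this reading the ratio comes out a little above 44, which is consistent with the roughly 45x quoted, while the member base also grew from 200M+ to 700M+.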
User Facing
Business Facing
Pinot vs Druid - Perf comparison (single threaded)
5000 Queries   Pinot        Druid
Total Time     11 minutes   24 minutes
P50            84ms         136ms
P90            206ms        667ms
Multiple Use Cases: One Platform
• User Facing Applications
• Business Facing Metrics
• Anomaly Detection
• Time Series
[Figure: Kafka ingestion and query load; axis labels Queries/sec and Events/sec; values shown: 70+, 10k, 100k, 120k, 1M+]
Pinot Roadmap
Feature                     Applicable to
Upsert support              Analytics on mutable data
Map, Full Text, JSON        Richer data types
Kinesis Connector           Amazon cloud
Range, Geo Spatial          Metric (e.g. latency > 3 sec), Geo queries
Cluster Manager Dashboard   Ease of use
Thank you
Christina Luu, Carlos Mendivil, Margarita Mendoza, Noel Navarro, Molly Vorwerck (Uber), and Kenny Bastani.
Questions
Contributors - We love distributed systems!
• Apache Pinot (incubating) 0.3.0 is available at
https://pinot.apache.org/download
• Pinot Twitter Account
https://twitter.com/ApachePinot
• Pinot Meetup Page
https://www.meetup.com/apache-pinot
• Pinot Slack Channel
https://tinyurl.com/pinotSlackChannel