Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale

Mayank Shrivastava, Jackie Jiang
Senior Software Engineer
Seunghyun Lee
Senior Software Engineer, Staff Software Engineer
Apache Pinot

Agenda
1. Introduction
2. Pinot @ LinkedIn
3. How to use Pinot
4. Pinot Performance

How is data generated and used at LinkedIn
Actor Verb
Member
Job
Post
Company
Object Life Cycle
Create
Generate
Analyze
Product
Data
Insights
• 600+ million members
• Tens of millions of posts, likes, and shares per day
• 3+ million jobs posted per month
• 30 million companies
• Trillions of events per day

Real-time Analytics Applications at LinkedIn

How to build an online analytics application?
• Real-time data ingestion
• Millions of active users, thousands of queries per second
• Very low latency (tens of milliseconds)
• Highly available, always on

Approach 1. Join on the fly
Diagram: Event Stream (Profile View) → Profile View Table; the Application Server joins the Profile View Table with the Member Table to answer "Who viewed my profile"
• Real-time (depending on storage)
• High latency due to join

Approach 2. Pre Join + Pre Aggregate
Diagram: Event Stream (Profile View) and Member Table → Stream Processing Engine (Pre Join + Pre Aggr) → Profile View Table → Application Server
• Near real-time ingestion
• Latency varies with query selectivity

Approach 3. Pre Join + Pre Aggregate + Pre Cube
Diagram: Event Stream (Profile View) and Member Table → Batch Processing Engine (Pre Join + Pre Aggr + Pre Cube) → Profile View Table → Application Server
• Very fast
• Batch ingestion (hourly / daily)
• Storage explosion
• Re-bootstrap on schema change

Latency vs. Flexibility
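Chart: systems plotted by Latency (low to high) and Flexibility (low to high), from querying the raw Profile View Table and Member Table, through Pre-Join and Pre-Aggregation, to Pre-Cube
Systems shown: Spark SQL, Presto, Hive, BigQuery, Druid, Elasticsearch, Pinot, Kylin, KV Store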

Who Viewed My Profile @ LinkedIn
Diagram components: Raw Tracking Data, Data Lake, Stream Processing (Pre Join + Pre Aggr), Pre-joined Data, Espresso
Consumers: WVMP, Dashboard, Ad-hoc Queries

What is Apache Pinot?
• OLAP Datastore
• Columnar, indexed storage
• Low latency analytics
• Distributed – highly available, reliable, scalable
• Lambda architecture
○ Offline data pushes + Real-time stream ingestion
• Open Source

Pinot @ LinkedIn
• 70+ member-facing use cases
• 2000+ dashboards for internal business metrics
• 100K+ queries per second
• 1M+ records ingested per second

Pinot @ LinkedIn: Member Facing Analytics Report
• Provides analytics reports for LinkedIn member-facing applications
• Very high QPS (thousands)
• Requires a strict latency SLA (tens of milliseconds to sub-second)

Pinot @ LinkedIn: Interactive Dashboard
• Visualization tool for multi-dimensional metrics
• Complex, exploratory queries
• 2000+ metrics, used by 1000+ employees

Pinot @ LinkedIn: Anomaly Detection
• Efficiently detect and investigate anomalies in metrics
• ThirdEye: part of Apache Pinot open source

How to use Pinot
Batch Data Ingestion
Real-time Data Ingestion
SQL-like Query Interface (PQL)

Let’s build something cool
Event RSVP Data

How to use Pinot: Workflow
One-time setup: Define Schema → Define Table Configuration → Create Table
Data ingestion:
• Batch (scheduled job): Raw Data (HDFS, S3, ADLS, NFS...) → Generate Pinot Segments → Push Data
• Real-time (one-time setup): Streaming Data (Kafka, Event Hub...) → Set Up Stream Data Source

How to use Pinot: Define Schema
● Schema name: meetupRsvp
● Dimension field specs
○ event_name (string)
○ event_time (long)
○ country (string)
○ city (string)
○ …
● Metrics field specs
○ rsvp_count (int)
● Time field spec
○ timestamp (long)
■ timetype: epoch / datetime
■ granularity: millisecond / second / hour / day
• Dimension: an attribute of your data (used for filter and group by)
• Metric: a number that measures a characteristic of a dimension (used for aggregation)
• Time: the timestamp of an event (used for partitioning and retention management)
Example Query - Top 10 events in US
SELECT event_name, sum(rsvp_count)
FROM meetupRsvp
WHERE country = 'us'
GROUP BY event_name
TOP 10
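To make the roles of dimensions and metrics concrete, here is a minimal Python sketch of what the example query computes. It is not Pinot code: the sample rows and the helper name `top_events` are made up for illustration, and only the column names come from the schema on this slide.

```python
from collections import defaultdict

# Hypothetical sample rows; only the column names come from the slide's schema.
rows = [
    {"event_name": "Kafka Meetup", "country": "us", "rsvp_count": 3},
    {"event_name": "Pinot Meetup", "country": "us", "rsvp_count": 5},
    {"event_name": "Kafka Meetup", "country": "us", "rsvp_count": 1},
    {"event_name": "Pinot Meetup", "country": "ca", "rsvp_count": 7},
]

def top_events(rows, country, limit=10):
    """Filter on a dimension, group by a dimension, sum a metric, keep the top N."""
    totals = defaultdict(int)
    for row in rows:
        if row["country"] == country:                        # WHERE country = ...
            totals[row["event_name"]] += row["rsvp_count"]   # GROUP BY + sum(metric)
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]                                    # TOP N

result = top_events(rows, "us")
print(result)  # [('Pinot Meetup', 5), ('Kafka Meetup', 4)]
```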

How to use Pinot: Configure and Create Table
Pinot Schema + Table Config → Pinot Admin Client
Table Config
● Table name: meetupRsvp
● Table type: batch / realtime / hybrid
● Replication factor: 2
● Index Columns: ...
● Bloom filters: ...
● Retention: 30 days
● ...

How to use Pinot: Batch Ingestion
Raw Data (JSON, CSV, Avro, Parquet, ORC...) + Pinot Schema + Table Config → Segment Generation Job (library) → Pinot Segments
Storage: HDFS, S3, ADLS, NFS...

How to use Pinot: Batch Ingestion
Raw Data (JSON, Avro, Parquet, ORC...) + Pinot Schema + Table Config → Segment Generation Job (library) → Pinot Segments → Segment Push Job (library)
Storage for raw data and for segments: HDFS, S3, ADLS, NFS...

How to use Pinot: Segment Assignment
Segment Push Job → Controller:
1. Table name
2. Segment name
3. Segment URI path
Components: Controller, Helix, ZooKeeper, Segment Store, Server-0, Server-1, Server-2
• Assignment strategies
○ Uniform
○ Replica Group
○ Partition Aware
Segments S0, S1, S2 in the Segment Store are assigned to servers:
● S0: Server-0, Server-1
● S1: Server-1, Server-2
● S2: Server-0, Server-2
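The replica layout on this slide (each segment on two of three servers) can be reproduced with a simple round-robin rule. The sketch below is only an illustration of a uniform assignment with a replication factor of 2; the function name `assign_uniform` is hypothetical and this is not Pinot's actual assignment code.

```python
def assign_uniform(segments, servers, replication=2):
    """Round-robin sketch: segment i goes to servers i, i+1, ... (modulo server count)."""
    if replication > len(servers):
        raise ValueError("replication factor cannot exceed the number of servers")
    assignment = {}
    for i, segment in enumerate(segments):
        replicas = [servers[(i + k) % len(servers)] for k in range(replication)]
        # Sort by server order only to make the output easy to read.
        assignment[segment] = sorted(replicas, key=servers.index)
    return assignment

servers = ["Server-0", "Server-1", "Server-2"]
assignment = assign_uniform(["S0", "S1", "S2"], servers)
for segment, replicas in assignment.items():
    print(segment, replicas)
```

With three segments and three servers, every server ends up hosting two segments, which is the balance a uniform strategy aims for.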

How to use Pinot: Query Routing
Diagram: Queries → Broker → Server-0, Server-1, Server-2 (hosting segments S0, S1, S2)
Other components: Segment Push Job, Controller, Helix, Segment Store
• Routing Strategies
○ Uniform
○ Replica Group
○ Partition Aware

How to use Pinot: Batch + Realtime
Diagram: Segment Push Job → Controller (Helix) → Offline Servers; Streaming Data (Kafka, Event Hub, Kinesis...) → Real-time Servers; Queries → Broker → Offline Servers and Real-time Servers
Table Config
● Table name: meetupRsvp
● Table type: real-time
● Replication factor: 2
● Kafka broker: ...
● Kafka topic name: ...
● Retention: 5 days
● ...
• A single schema for both offline + real-time tables

How to use Pinot: Batch + Realtime
Diagram: Segment Push Job → Controller (Helix) → Offline Servers; Streaming Data (Kafka, Event Hub, Kinesis...) → Real-time Servers; Queries → Broker → Offline Servers and Real-time Servers
• Real-time servers keep consumed data in memory and periodically flush it to the segment store.
• Broker handles offline and real-time federation.

Interactive Dashboard
select sum(pageView) from T
where country = us
and browser = chrome
...
group by time
• Human-driven queries
• Slice and dice over arbitrary dimensions
5000 Queries   Pinot        Druid
Total Time     11 minutes   24 minutes
P50            84ms         136ms
P90            206ms        667ms

Site Facing Analytics
select sum(articleViewCount) from T
where articleId = x
...
and time >= y and time < z
group by viewer[title|geo|industry]
• Pre-defined queries with different filtering values
• Usually have a filter on the primary key (e.g. articleId)
• High QPS (thousands), low latency (< 100ms for 99%) requirements

Anomaly Detection
for d1 in [us, ca, ...]
  for d2 in [chrome, firefox, ...]
    ...
      select sum(pageViews) from T
      where country = d1 and browser = d2 …
      group by time
Filter Aggregation
select … where country = us …        Slow, scans 60-70% of the data
select … where country = ireland …   Scans less than 1%
• Identifying issues requires monitoring all possible combinations
• Data distribution can be skewed
Secret behind Pinot
Example query: select sum(pageView) from T where country = us
Layer         Techniques
Aggregation   Scan, Star-Tree Pre-aggregation
Filter        Scan, Inverted Index
Storage       Columnar Store, Encoding/Compression
Additional indexes: Sorted Index, Star-Tree Index
Legend: ❏ Common Techniques ❏ Pinot & Druid ❏ Pinot Only

Columnar Store
• Read relevant columns only
Raw Data
country   browser   ...
us        chrome    ...
ca        firefox   ...
jp        ie        ...
us        firefox   ...
ca        ie        ...
…         …         ...
Row Based
us chrome ...
ca firefox ...
jp ie ...
Column Based
country: us, ca, jp, us, ca, …
browser: chrome, firefox, ie, firefox, ie, …
Layer: Storage (Columnar); example filter: where country = us
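A minimal Python sketch of the row-based versus column-based layout, using the sample values from this slide. The variable names are illustrative; the point is that a filter on country only has to read the country column.

```python
# Row-based layout: one tuple per record.
rows = [
    ("us", "chrome"),
    ("ca", "firefox"),
    ("jp", "ie"),
    ("us", "firefox"),
    ("ca", "ie"),
]
column_names = ("country", "browser")

# Column-based layout: one list per column, indexed by docId.
columns = {
    name: [row[i] for row in rows]
    for i, name in enumerate(column_names)
}

# "where country = us" touches only the country column.
matching_doc_ids = [
    doc_id for doc_id, value in enumerate(columns["country"]) if value == "us"
]
print(matching_doc_ids)  # [0, 3]
```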

Encoding & Compression
• Storage compression
○ Dictionary encoding
○ Bit compression
Column Based
docId   country   browser
0       us        chrome
1       ca        firefox
2       jp        ie
3       us        firefox
4       ca        ie
…       …         …
Dictionary
dictId   country   browser
0        ca        chrome
1        jp        firefox
2        us        ie
…        …         …
Forward Index
docId   country   browser
0       2         0
1       0         1
2       1         2
3       2         1
4       0         2
...     ...       ...
Layer: Storage (Encoding/Compression); example filter: where country = us
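The dictionary and forward index on this slide can be reproduced with a few lines of Python. This is a sketch of the general technique, not Pinot's storage format; the function name `dictionary_encode` is hypothetical, and the bit width is simply the number of bits needed to represent the largest dictionary id.

```python
def dictionary_encode(values):
    """Return (sorted dictionary, forward index of dictIds, bits needed per value)."""
    dictionary = sorted(set(values))
    dict_id = {value: i for i, value in enumerate(dictionary)}
    forward_index = [dict_id[value] for value in values]
    # Bit compression: each dictId fits in this many bits instead of a full string.
    bits_per_value = max(1, (len(dictionary) - 1).bit_length())
    return dictionary, forward_index, bits_per_value

country = ["us", "ca", "jp", "us", "ca"]
dictionary, forward_index, bits_per_value = dictionary_encode(country)
print(dictionary)      # ['ca', 'jp', 'us']
print(forward_index)   # [2, 0, 1, 2, 0]
print(bits_per_value)  # 2
```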

Inverted Index
• Storing bitmap for each value
• Fast filtering:
○ Constant time value lookup
○ Bit operations for AND/OR clause
Raw Data
docId   country   browser
0       us        chrome
1       ca        firefox
2       jp        ie
3       us        firefox
4       ca        ie
…       …         …
Inverted Index
country   docIds
ca        1, 4...
jp        2...
us        0, 3...
...       ...
browser   docIds
chrome    0 ...
firefox   1, 3...
ie        2, 4...
...       ...
Layer: Filter (Inverted Index); example filter: where country = us
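A small sketch of the inverted index idea using the sample rows from this slide. Plain Python integers stand in for bitmaps here (bit i is set when docId i matches); this is an assumption made for brevity, and a production system would use a compressed bitmap structure instead.

```python
def build_inverted_index(values):
    """Map each distinct value to a bitmap whose bit i is set when docId i has that value."""
    index = {}
    for doc_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << doc_id)
    return index

def doc_ids(bitmap):
    """Decode a bitmap back into a sorted list of docIds."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

country = ["us", "ca", "jp", "us", "ca"]
browser = ["chrome", "firefox", "ie", "firefox", "ie"]
country_index = build_inverted_index(country)
browser_index = build_inverted_index(browser)

# Value lookup is a dictionary access; AND/OR clauses are single bit operations.
us_docs = doc_ids(country_index["us"])
us_and_firefox = doc_ids(country_index["us"] & browser_index["firefox"])
us_or_ie = doc_ids(country_index["us"] | browser_index["ie"])
print(us_docs, us_and_firefox, us_or_ie)  # [0, 3] [3] [0, 2, 3, 4]
```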

Sorted Index
• Better data compression:
○ Run length encoding
○ Can be accessed as forward/inverted index
• Spatial locality
Sorted column
docId   country
0       ca
...     …
100     jp
101     us
…       …
300     us
…       …
Sorted index (also usable as an inverted index)
country   start docId   end docId
ca        0             80
jp        81            100
us        101           300
…         …             …
Layer: Filter (Sorted Index); example filter: where country = us
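The sorted index table above is a run-length encoding of a sorted column. The sketch below builds the same (value, start docId, end docId) runs and answers a filter with a binary search; the function names are hypothetical and the column sizes are chosen to match the numbers on the slide.

```python
from bisect import bisect_left

def build_sorted_index(sorted_values):
    """Run-length encode a sorted column into (value, start_doc_id, end_doc_id) runs."""
    runs = []
    for doc_id, value in enumerate(sorted_values):
        if runs and runs[-1][0] == value:
            runs[-1] = (value, runs[-1][1], doc_id)   # extend the current run
        else:
            runs.append((value, doc_id, doc_id))      # start a new run
    return runs

def lookup(runs, value):
    """Return the contiguous docId range for a value (empty range if absent)."""
    keys = [run[0] for run in runs]
    i = bisect_left(keys, value)
    if i < len(runs) and runs[i][0] == value:
        return range(runs[i][1], runs[i][2] + 1)
    return range(0)

# 81 rows of "ca", 20 of "jp", 200 of "us", already sorted by country.
column = ["ca"] * 81 + ["jp"] * 20 + ["us"] * 200
runs = build_sorted_index(column)
print(runs)  # [('ca', 0, 80), ('jp', 81, 100), ('us', 101, 300)]
us_range = lookup(runs, "us")
```

Because matching docIds form one contiguous range, the filter result needs no bitmap at all and the matching rows sit next to each other on disk.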

Latency vs. Space Trade-off
Chart: latency vs. space requirement for scan, Star-Tree, and pre-cube
Layers: Aggregation (Star-Tree Pre-aggregation), Filter (Star-Tree Index); example filter: where country = us

Star-Tree Index
Chart: latency vs. space requirement for thresholds T=infinity, T=1,000,000, T=10,000, T=100, T=1
• Configurable trade-off between latency and space through partial pre-aggregation
• Can achieve a hard upper bound on query latency
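The role of the threshold T can be illustrated with a heavily simplified star-tree sketch. This is not Pinot's implementation: it assumes a fixed dimension order, sum as the only aggregation, and equality filters only, and all names (`build`, `query_sum`, `STAR`) are hypothetical. A node is split only while it holds more than T rows, and each split adds a "star" child in which that dimension has been aggregated away.

```python
from collections import defaultdict

STAR = "*"  # marks a dimension that has been aggregated away

def aggregate(rows):
    """Merge rows with identical dimension tuples by summing their metric."""
    totals = defaultdict(int)
    for dims, metric in rows:
        totals[dims] += metric
    return list(totals.items())

def build(rows, num_dims, threshold, depth=0):
    """Split a node on dimension `depth` only while it holds more than `threshold` rows."""
    if len(rows) <= threshold or depth == num_dims:
        return {"leaf": rows}
    groups = defaultdict(list)
    for dims, metric in rows:
        groups[dims[depth]].append((dims, metric))
    children = {value: build(group, num_dims, threshold, depth + 1)
                for value, group in groups.items()}
    # Star child: the same rows with this dimension removed and pre-aggregated.
    starred = aggregate([(dims[:depth] + (STAR,) + dims[depth + 1:], metric)
                         for dims, metric in rows])
    children[STAR] = build(starred, num_dims, threshold, depth + 1)
    return {"children": children}

def query_sum(node, filters, depth=0):
    """Sum the metric for equality filters given as {dimension position: value}."""
    if "leaf" in node:
        # Scan the few rows left in the leaf, applying any remaining filters.
        return sum(metric for dims, metric in node["leaf"]
                   if all(dims[i] == value for i, value in filters.items()))
    child = node["children"].get(filters.get(depth, STAR))
    return 0 if child is None else query_sum(child, filters, depth + 1)

# Dimensions are (country, browser); the metric is a page view count.
rows = [(("us", "chrome"), 10), (("us", "firefox"), 5), (("ca", "chrome"), 3),
        (("ca", "ie"), 2), (("us", "chrome"), 1)]
tree = build(aggregate(rows), num_dims=2, threshold=1)
us_total = query_sum(tree, {0: "us"})
chrome_total = query_sum(tree, {1: "chrome"})
grand_total = query_sum(tree, {})
print(us_total, chrome_total, grand_total)  # 16 14 21
```

With a very large threshold the tree collapses to a single leaf and every query becomes a scan; with a threshold of 1 every query lands on a pre-aggregated row. Intermediate values trade extra storage for fewer rows scanned per query.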

Flexible Query Execution Plan
Query Optimization
Query: select max(col) from T
Optimization: Use metadata instead of scanning
Query: select sum(metric) from T where country = us and accountId = x
Optimization: Reorder filters based on the available indexes (apply the accountId predicate before the country predicate)
The segment-level physical query planner chooses how to execute each query based on the segment metadata and the available indexes.
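One way to picture filter reordering is to evaluate the most selective predicate first through its index and check the remaining predicates only on the surviving documents. The sketch below orders predicates by inverted-index posting size; that heuristic, the sample data, and the function names are assumptions for illustration, not a description of Pinot's planner.

```python
def build_inverted_index(values):
    """Map each distinct value to the list of docIds that contain it."""
    index = {}
    for doc_id, value in enumerate(values):
        index.setdefault(value, []).append(doc_id)
    return index

# Hypothetical segment: country is low-cardinality, accountId is highly selective.
columns = {
    "country":   ["us", "us", "us", "ca", "us"],
    "accountId": ["a", "b", "x", "x", "c"],
}
metric = [1, 2, 3, 4, 5]
indexes = {name: build_inverted_index(values) for name, values in columns.items()}

def plan(predicates):
    """Order equality predicates so the one matching the fewest docs runs first."""
    return sorted(predicates, key=lambda p: len(indexes[p[0]].get(p[1], [])))

def sum_metric(predicates):
    """Use the index for the first predicate, then verify the rest per candidate doc."""
    ordered = plan(predicates)
    first_column, first_value = ordered[0]
    candidates = indexes[first_column].get(first_value, [])
    matching = [d for d in candidates
                if all(columns[c][d] == v for c, v in ordered[1:])]
    return sum(metric[d] for d in matching)

predicates = [("country", "us"), ("accountId", "x")]
ordered = plan(predicates)
total = sum_metric(predicates)
print(ordered[0], total)  # ('accountId', 'x') 3
```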

Global Optimizations
Problem: Querying all segments
Solution: Segment pruning to minimize the number of segments to query
Problem: Querying all servers
Solution: Smart segment assignment to reduce the fan-out to servers
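Segment pruning can be sketched with per-segment time ranges: a segment whose range does not overlap the query's time filter never needs to be queried. The `SegmentMetadata` fields and the pruning rule below are assumptions chosen for illustration; the slide does not specify which metadata is used.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SegmentMetadata:
    name: str
    min_time: int  # smallest timestamp stored in the segment
    max_time: int  # largest timestamp stored in the segment

def prune_segments(segments, start, end):
    """Keep only segments whose [min_time, max_time] overlaps the query range [start, end)."""
    return [s for s in segments if s.max_time >= start and s.min_time < end]

segments = [
    SegmentMetadata("S0", 0, 99),
    SegmentMetadata("S1", 100, 199),
    SegmentMetadata("S2", 200, 299),
]

# A filter such as "time >= 150 and time < 250" can skip S0 entirely.
selected = [s.name for s in prune_segments(segments, 150, 250)]
print(selected)  # ['S1', 'S2']
```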

Conclusion
User Activity Data feeds: Member Facing Applications, Interactive Dashboard, Anomaly Detection

Contributing to Pinot
• We are looking for contributions!
• Apache Pinot (incubating) 0.1.0 is available at
https://pinot.apache.org
• Pinot Twitter Account
https://twitter.com/ApachePinot
• Pinot Meetup Page
https://www.meetup.com/apache-pinot
• Pinot Slack Channel
https://tinyurl.com/pinotSlackChannel

Folks behind Pinot
Mayank Shrivastava
Subbu Subramaniam
Jean-Francois Im
Jackie Jiang
Seunghyun Lee
Jennifer Dai
Neha Pawar
Jialiang Li
Sunitha Beeram
Shraddha Sahay
Kishore Gopalakrishna
Xiang Fu
James Shao
Prasanna Ravi
John Gutmann
Dino Occhialini
Walter Huf
Xiaohui Sun
Long Huynh
Akshay Rai
Alexander Pucher
Jihao Zhang
Felix Cheung
Olivier Lamy
Jim Jagielski
Marcel Siegrist
Roman Shaposhnik
Anurag Shendge
