Pulsar Summit
San Francisco
Hotel Nikko
August 18 2022
Keynote
Data Democracy:
Journey to
User-Facing Analytics
Xiang Fu
Co-Founder • StarTree
About Me:
Co-Founder at StarTree, a cloud-native platform for building the next generation of data
analytics applications for millions of users.
Founder and PMC member of Apache Pinot, a real-time, distributed OLAP datastore.
Previously an architect on Uber's data platform team, solving streaming data serving,
processing, and analytics problems at large scale.
Data Democracy - Are We There?
SQL Editors
Dashboard
Internal Facing Analytics
Operators
Analysts
Past — Present
Current technology has done a
great job delivering insights for
INTERNAL USERS
Analytical Data Apps
latency-sensitive
User Facing Analytics
Present — Future
Users Customers
To truly democratize data, we
need to deliver high quality
insights to EXTERNAL USERS
The Gap
Vanishing window of
opportunity for events
Time Value of Data
Value
Time
Event
Insight
Streaming Changed The Game…
Data warehouses & lakes
Hours to Days
Stream
Milliseconds to Seconds
And Started A Cycle
Streaming technologies like PULSAR increased speed and reduced the cost
of storing events, kicking off a cycle…
Collect more events
Improved user experience
Increase user engagement
Streaming
Sources
Messaging
Pub-sub
Log
Aggregation
Streaming
Processing
Real Time
Analytics
Streaming Spawned New Use Cases
How Do We Do Real-time Analytics?
In simple terms, we need to:
- Ingest data as soon as events happen
- Query that data as soon as it’s ingested
- Do the above at scale.
Simple is HARD!
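To make the first two requirements concrete, here is a minimal sketch in Python. It is a toy in-memory stand-in, not Apache Pinot or any real client API: the class `ToyRealtimeStore`, its methods, and the `restaurant` event field are all hypothetical names chosen for illustration. It shows the aggregate being updated at ingest time so an event is queryable immediately; it says nothing about the third requirement, scale, which is the hard part.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ToyRealtimeStore:
    """Hypothetical in-memory stand-in for a real-time OLAP store (not Pinot)."""
    counts: dict = field(default_factory=lambda: defaultdict(int))
    total: int = 0

    def ingest(self, event: dict) -> None:
        # Update the aggregate at ingest time so the event is queryable right away.
        self.counts[event["restaurant"]] += 1
        self.total += 1

    def orders_for(self, restaurant: str) -> int:
        # Query path: a plain lookup, with no batch job between ingest and query.
        return self.counts.get(restaurant, 0)


store = ToyRealtimeStore()
for event in [{"restaurant": "a"}, {"restaurant": "b"}, {"restaurant": "a"}]:
    store.ingest(event)
    # Each event is visible to queries immediately after it is ingested.
    assert store.orders_for(event["restaurant"]) >= 1

print(store.orders_for("a"))  # 2
```

A single process like this cannot distribute storage, index data, or serve many concurrent users, which is why the slide calls the simple version hard.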
Enter Apache Pinot
Ingestion Sources (Real-time, Batch, SQL)
Efficient compute and indexing powerhouse
Compressed and scalable storage (PB scale)
Advanced Query Support
Multi-Tenant and Distributed Architecture
Apache Pinot At A Glance
5,000 queries/sec, ~5ms average latency, <100ms 95th percentile
Before Pinot (2014): 1,500 queries/sec, 200M+ members
After Pinot (2016): 5,000 queries/sec, 700M+ members
1000 nodes → 75 nodes
45X improvement in efficiency
Apache Pinot Impact
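The slide does not say how the 45X figure was derived. One plausible reading, sketched below, is throughput per node. This assumes that "Before Pinot" pairs with 1000 nodes and "After Pinot" with 75 nodes, which the flattened slide text suggests but does not state.

```python
# Assumed reading of the slide: "Before Pinot" ran on 1000 nodes,
# "After Pinot" on 75 nodes. The pairing is an assumption, not stated in the deck.
before_qps, before_nodes = 1500, 1000
after_qps, after_nodes = 5000, 75

before_per_node = before_qps / before_nodes  # queries/sec served per node, before
after_per_node = after_qps / after_nodes     # queries/sec served per node, after

improvement = after_per_node / before_per_node
print(f"{improvement:.1f}x")  # 44.4x
```

Under that assumption the per-node improvement is about 44x, in line with the rounded 45X on the slide.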
2013: Started @ LinkedIn
2015: Open Source
2019: StarTree Founded
2021: Apache Graduation
Apache Pinot Timeline
2020: 40+ Companies, 800 Slack Users, 55k Downloads
2022: 100+ Companies, 2500+ Slack Users, 1M+ Downloads
Apache Pinot Community Growth
Apache Pinot Adoption
Apache Pinot Architecture
Strong Integration with Pulsar
INTERNAL FACING ANALYTICS
USER FACING ANALYTICS
Business Analysts
Platform Operators
Application Users
Business Partners
Food Delivery FinTech
Long Orders Insights
Nearby Orders in App,
Restaurants Manager Dashboard
Merchants Dashboard
Ledger Observability
Real Time Use Cases
events/sec: 1M+
queries/sec: 200K+
query latency: ms
data size: 1PB+
rows: 1T
query latency: < 1s
data size: 200TB+
queries/sec: 30K+
query latency: < 100ms
Apache Pinot At Scale
Democratizing data through
User-Facing Analytics
Who Viewed My Profile
LinkedIn
Publishing Analytics Platform
Restaurant Manager
Uber
Orders Near You
Contact Me:
Thank you!
xiangfu@startree.ai
@xiangfu0

Data Democracy: Journey to User-Facing Analytics - Pulsar Summit SF 2022