Real-time Analytics Using Apache Pinot
How LinkedIn, Uber Eats and Stripe create real-time dashboards for millions of users.
Agenda
Who is Barkha? (why would you want to listen to me?)
The evolution of Analytics
How LinkedIn Solved their Problem
Try some Pinot with me
Overheard @ Big Data Fest 2023
• 5 Year trends in Big data will see
• Streaming APIs
• Will Data Warehouse Survive?
• Integration with LLM/AI/ML
• Thiago de Faria
• 5 Year trends in Big data will see
• Democratization of Data Warehousing
• Commoditized Data Warehousing
• Most companies are barely doing BI let alone AI.
• Joe Reis
About Barkha
• Founder, South Florida Women in Technology
• Developer Advocate @StarTree
• Linkedin.com/in/BarkhaHerman
• Twitter @BarkhaH
Analytics?
Real Time?
Scale?
OH WHY?
Why do we need Real-time Analytics? Or Analytics? Or at Scale?
Historic Analytics
Batch
Shared Data
No Scale Concerns
Modern Analytics
Data Freshness
Daily reports vs. "How late is my food delivery?"
Query Performance
Reports in < 2 minutes vs. dashboards that load in < 10 milliseconds
Scale
All division managers worldwide accessing a report (> 1,000 users) vs. millions of users accessing a dashboard
How LinkedIn solved Analytics @ Scale
By inventing Pinot
LinkedIn: Who Viewed your Profile?
• Capture profile view information and deduplicate it
• Compute view sources (e.g., search, profile page, etc.)
• View relevance (e.g., a senior leader viewed your profile)
• View obfuscations based on the viewing member’s privacy settings
Before Pinot
• Elasticsearch-based solution
• 1000 Nodes
• 1500 queries / sec
• 20+ million users
After Pinot
• 75 Nodes
• 5000 queries / sec
• 70+ million users
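The before and after figures imply a large gain in per-node throughput. A quick back-of-the-envelope check in Python, using only the numbers on the two slides (the variable and function names are illustrative, and the result is a rough average that ignores any hardware differences between the two clusters):

```python
# Figures taken from the "Before Pinot" and "After Pinot" slides.
before = {"nodes": 1000, "qps": 1500}
after = {"nodes": 75, "qps": 5000}

def qps_per_node(cluster):
    """Average queries per second served by each node."""
    return cluster["qps"] / cluster["nodes"]

before_rate = qps_per_node(before)
after_rate = qps_per_node(after)
improvement = after_rate / before_rate

print(f"Before: {before_rate:.1f} queries/sec per node")
print(f"After:  {after_rate:.1f} queries/sec per node")
print(f"Per-node throughput improvement: {improvement:.1f}x")
```

In other words, roughly 1.5 queries/sec per node before versus about 67 after, while serving more than three times as many users.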
Pinot Building Blocks
• Segments are the physical store.
• Tables are conceptual and accept both real-time and batch data.
• Tenants provide functional segregation.
• Clusters allow for scale based on use.
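To make the table and tenant ideas concrete, here is a minimal sketch of what a pair of table configs might look like, built as Python dictionaries. The table name `orders`, the time column `ts`, and the helper `table_config` are hypothetical. The field names follow my understanding of Pinot's table config JSON and should be checked against the official documentation. A real-time table also needs stream ingestion settings, which are omitted here.

```python
import json

def table_config(name, table_type, time_column, tenant="DefaultTenant"):
    """Build a simplified, illustrative Pinot-style table config.

    table_type is "OFFLINE" (batch data) or "REALTIME" (streaming data).
    Stream ingestion settings for REALTIME tables are intentionally left out.
    """
    return {
        "tableName": name,
        "tableType": table_type,
        "segmentsConfig": {"timeColumnName": time_column, "replication": "1"},
        # Tenants decide which brokers and servers handle this table.
        "tenants": {"broker": tenant, "server": tenant},
        "tableIndexConfig": {"loadMode": "MMAP"},
        "metadata": {},
    }

# A batch table and a real-time table that share one logical name:
# queries against "orders" can then see both batch and streaming data.
offline = table_config("orders", "OFFLINE", "ts")
realtime = table_config("orders", "REALTIME", "ts")

print(json.dumps(offline, indent=2))
```

The point of the sketch is the shape: one logical table name, two ingestion paths, and a tenant assignment that isolates workloads from each other.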
Indexes
Pinot supports the following indexing techniques:
• Inverted index - Used for exact lookups.
• Range index - Used for range queries.
• Text index - Used for phrase, term, Boolean, prefix, or regex queries.
• Geospatial index - Based on H3, a hexagon-based hierarchical gridding. Used for finding points that exist within a certain distance from another point.
• JSON index - Used for querying columns in JSON documents.
• Star-Tree index - Pre-aggregates results across multiple columns.
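As an illustration of why an inverted index speeds up exact lookups, here is a toy version in Python. It is a conceptual sketch only, not how Pinot stores its indexes; the sample rows and the function name are made up for the example.

```python
from collections import defaultdict

# Hypothetical sample rows; the position in the list acts as the document ID.
rows = [
    {"city": "Miami", "status": "delivered"},
    {"city": "Austin", "status": "delayed"},
    {"city": "Miami", "status": "delayed"},
    {"city": "Boston", "status": "delivered"},
]

def build_inverted_index(rows, column):
    """Map each distinct value of `column` to the list of row IDs containing it."""
    index = defaultdict(list)
    for doc_id, row in enumerate(rows):
        index[row[column]].append(doc_id)
    return dict(index)

city_index = build_inverted_index(rows, "city")

# An exact lookup becomes a dictionary access instead of a scan over every row.
matches = city_index.get("Miami", [])
print(matches)
```

The trade-off is extra storage and build time per indexed column in exchange for avoiding a full scan at query time.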
Star-Tree Index
Don’t pre-cube everything…
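The "don't pre-cube everything" idea can be sketched with a toy tree in Python. This is a simplified model for intuition, not Pinot's implementation: a node is split on the next dimension only when it holds more than `max_leaf_records` rows, and each split adds a "star" child that holds totals with that dimension aggregated away. The names `build`, `collapse`, `query_sum`, and the sample data are all hypothetical. As far as I know, Pinot's star-tree configuration exposes similar knobs (a dimension split order and a max leaf records threshold), but check the documentation for exact names.

```python
from collections import defaultdict

STAR = "*"  # marks "any value" for a dimension

def collapse(records, dim, all_dims, metric):
    """Aggregate `dim` away: sum the metric over rows equal on all other dimensions."""
    keep = [d for d in all_dims if d != dim]
    totals = defaultdict(int)
    for r in records:
        totals[tuple(r[d] for d in keep)] += r[metric]
    return [{**dict(zip(keep, key)), dim: STAR, metric: total}
            for key, total in totals.items()]

def build(records, split_order, all_dims, metric, max_leaf_records):
    """Split only while a node holds more than max_leaf_records rows."""
    if not split_order or len(records) <= max_leaf_records:
        return {"leaf": records}
    dim, rest = split_order[0], split_order[1:]
    groups = defaultdict(list)
    for r in records:
        groups[r[dim]].append(r)
    children = {value: build(group, rest, all_dims, metric, max_leaf_records)
                for value, group in groups.items()}
    # The star child answers queries that do not filter on `dim`.
    children[STAR] = build(collapse(records, dim, all_dims, metric),
                           rest, all_dims, metric, max_leaf_records)
    return {"dim": dim, "children": children}

def query_sum(node, filters, metric):
    """Follow concrete or star children, then scan the small leaf that remains."""
    while "leaf" not in node:
        node = node["children"].get(filters.get(node["dim"], STAR))
        if node is None:
            return 0
    return sum(r[metric] for r in node["leaf"]
               if all(r[d] == v for d, v in filters.items()))

rows = [
    {"country": "US", "browser": "Chrome", "views": 10},
    {"country": "US", "browser": "Safari", "views": 5},
    {"country": "US", "browser": "Chrome", "views": 7},
    {"country": "BR", "browser": "Chrome", "views": 3},
    {"country": "BR", "browser": "Firefox", "views": 4},
]
dims = ["country", "browser"]
tree = build(rows, dims, dims, "views", max_leaf_records=2)

total_views = query_sum(tree, {}, "views")
us_views = query_sum(tree, {"country": "US"}, "views")
chrome_views = query_sum(tree, {"browser": "Chrome"}, "views")
br_firefox_views = query_sum(tree, {"country": "BR", "browser": "Firefox"}, "views")
print(total_views, us_views, chrome_views, br_firefox_views)
```

Note that the `BR` group has only two rows, so it is never split further and is answered by scanning the leaf. That is the trade-off in miniature: pre-aggregate where the data is large, scan where it is small.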
Apache Pinot Architecture
Demo
Pizza Shop Demo
https://github.com/startreedata/pizza-shop-demo
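Once a Pinot cluster is running, queries can be sent to a broker over HTTP. The sketch below uses only the Python standard library and assumes a broker reachable at `http://localhost:8099` with a SQL endpoint at `/query/sql`, which matches the default quickstart setup as far as I know. The table name `orders` is an assumption for illustration; check the demo repository for the actual table names and ports.

```python
import json
import urllib.request

def build_query_request(sql, broker_url="http://localhost:8099"):
    """Build (but do not send) a POST request carrying a SQL query for a Pinot broker."""
    payload = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{broker_url}/query/sql",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def run_query(sql, broker_url="http://localhost:8099"):
    """Send the query and return the parsed JSON response (requires a running broker)."""
    request = build_query_request(sql, broker_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)

# "orders" is a hypothetical table name used only for this example.
request = build_query_request("SELECT count(*) FROM orders")
print(request.get_method(), request.full_url)

# With the demo cluster running, you could then call:
# result = run_query("SELECT count(*) FROM orders")
```

Building the request separately from sending it makes the example easy to inspect without a live cluster.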
Overheard @ Big Data Fest 2023
• 5 Year trends in Big data will see
• Streaming APIs → Apache Pinot is built to solve streaming-first problems
• Will Data Warehouse Survive? → Apache Pinot builds customer-facing analytics, which is on the rise
• Integration with LLM/AI/ML → Apps built on top of Pinot, such as ThirdEye, use statistics and allow for AI/ML add-ons.
• Thiago de Faria
• 5 Year trends in Big data will see
• Democratization of Data Warehousing → Apache Pinot builds customer-facing analytics, which is on the rise
• Commoditized Data Warehousing → Apache Pinot builds customer-facing analytics, which is on the rise
• Most companies are barely doing BI let alone AI. → Easy analytics + apps built on top of Pinot, such as ThirdEye.
• Joe Reis
Using Real-time Analytics @ Scale
What can you do with it?
Who Uses Apache Pinot?
What’s Next?
Please Connect!!!!! I need brownie points.
Thank you for listening!

Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
