Apache Pinot Case Study: Building Distributed Analytics Systems Using Apache Kafka (Neha Pawar, Stealth Mode Startup) Kafka Summit 2020


We built Apache Pinot, a real-time distributed OLAP datastore, for low-latency analytics at scale. It is heavily used at companies such as LinkedIn, Uber, and Slack, where Kafka serves as the backbone for capturing vast amounts of data. Pinot ingests millions of events per second from Kafka, builds indexes in real time, and serves 100K+ queries per second while meeting latency SLAs ranging from milliseconds to sub-second.

In the first implementation, we used Kafka's Consumer Group feature to manage offsets and checkpoints across multiple Kafka consumers. However, to achieve fault tolerance and scalability, we had to run multiple consumer groups for the same topic; this was our initial strategy for maintaining the SLA under a high query workload. This model posed other challenges: since Kafka maintains offsets per consumer group, achieving data consistency across multiple consumer groups was not possible. Moreover, the failure of a single node in a consumer group made the entire consumer group unavailable for query processing, and restarting the failed node required extensive manual operations to ensure data was consumed exactly once. The result was management overhead and inefficient hardware utilization.
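
For reference, this first approach maps onto the standard Kafka Java consumer API roughly as follows. This is a minimal sketch under our assumptions (the topic name, group name, and the indexRecord helper are illustrative placeholders); it is not Pinot's actual consumer code.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerGroupIngestion {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        // All consumers sharing this group.id split the topic's partitions
        // among themselves. Kafka tracks one committed offset per partition
        // *per group*, which is why two replica groups can never be
        // guaranteed to be at the same point in the stream.
        props.put("group.id", "pinot-replica-1");
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
            "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            // Partition assignment is delegated to the Kafka rebalancer.
            consumer.subscribe(List.of("events"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    indexRecord(record); // hypothetical: index into the in-memory segment
                }
                // Periodic checkpoint: commit consumed offsets back to Kafka.
                consumer.commitSync();
            }
        }
    }

    private static void indexRecord(ConsumerRecord<byte[], byte[]> record) {
        /* build the in-memory index ... */
    }
}
```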

Taking inspiration from the Kafka consumer group implementation, we redesigned real-time consumption in Pinot to maintain consistent offsets across multiple consumer groups, which lets us guarantee consistent data across all replicas. In turn, this makes it possible to copy data from another consumer group when adding a node, recovering from a node failure, or increasing the replication factor.

In this talk, we deep dive into the challenges faced and the considerations that went into this design, and examine what makes Pinot resilient to failures in both Kafka brokers and Pinot components. We introduce the new concept of "lockstep" sequencing, in which multiple consumer groups periodically synchronize checkpoints to maintain consistency, and describe how we achieve this while maintaining strict freshness SLAs and withstanding high ingestion throughput.
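
The talk introduces lockstep sequencing without publishing an implementation, so the following is only a hypothetical sketch of the core idea: a coordinator fixes one end offset per partition, and every replica consumes exactly up to that offset before checkpointing, so all replicas checkpoint identical slices of the stream. All names here are our own invention.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of "lockstep" sequencing, not Pinot's actual code:
// the first replica to reach a commit point proposes its offset, the
// coordinator fixes it, and every replica consumes exactly up to the same
// offset before checkpointing.
public class LockstepCoordinator {
    private final Map<Integer, Long> agreedEndOffsets = new ConcurrentHashMap<>();

    /** First proposal for a partition wins; later callers get the agreed offset back. */
    public long proposeCheckpoint(int partition, long candidateEndOffset) {
        return agreedEndOffsets.merge(partition, candidateEndOffset,
                (existing, ignored) -> existing);
    }
}

class LockstepReplica {
    // Replica-side round (sketch): consume up to the agreed offset, checkpoint,
    // and start the next round from there. Because every replica stops at the
    // same offset, their checkpoints, and hence their segments, match.
    static long consumeRound(LockstepCoordinator coordinator, int partition,
                             long startOffset, long desiredEndOffset) {
        long target = coordinator.proposeCheckpoint(partition, desiredEndOffset);
        for (long offset = startOffset; offset < target; offset++) {
            // poll the row at `offset` from Kafka and index it (elided)
        }
        return target; // the next round starts here on every replica
    }
}
```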


Transcript

  1. Apache Pinot Case Study: Building distributed analytics systems using Apache Kafka (@apachepinot | @KishoreBytes)
  2. @apachepinot | @KishoreBytes
  3. Pinot @ LinkedIn
  4. Pinot @ LinkedIn: User-Facing Analytics. 70+ products, 120k+ queries/sec, ms to 1 s latency
  5. Pinot @ LinkedIn: Business Metrics Analytics. 10k+ metrics, 50k+ dimensions
  6. Pinot @ LinkedIn: ThirdEye, anomaly detection and root-cause analysis. 50+ teams, 100K time series
  7. Apache Pinot @ Other Companies: 20+ companies, 400+ Slack users, 2.7k GitHub stars. The community has tripled in the last two quarters. Join our growing community on the Apache Pinot Slack channel: https://communityinviter.com/apps/apache-pinot/apache-pinot
  8. Multiple Use Cases, One Platform: user-facing applications (70+), business metrics (10k), anomaly detection (100k time series), all fed by Kafka at 1M+ events/sec and serving 120k queries/sec
  9. Challenges of user-facing real-time analytics: high velocity of ingestion, high dimensionality, 1000s of QPS, milliseconds latency, seconds freshness; the system must be highly available, scalable, and cost-effective
  10. Pinot Real-time Ingestion Deep Dive
  11. Pinot Architecture: Servers consume, index, and serve the data; Brokers scatter queries across the servers and gather the results (see the scatter-gather sketch after the transcript)
  12. Pinot Real-time Ingestion Basics: a Kafka consumer runs on the Pinot server, periodically creates a "Pinot segment" and persists it to the deep store; the in-memory data is queryable, and consumption continues (see the ingestion-loop sketch after the transcript)
  13. Approach 1: Kafka Consumer Groups
  14. Kafka consumer group based design: each consumer consumes from one or more partitions (e.g. with 3 partitions, Server 1 starts consuming partitions 0 and 2); checkpointing is periodic (seg1 at offset 350, seg2 at offset 400); the Kafka rebalancer assigns partitions, giving fault-tolerant consumption
  15. Challenges with capacity expansion: when Server 3 is added, the Kafka rebalancer moves partition 2 from Server 1 to Server 3, and Server 3 begins consumption from the last checkpoint, offset 400. Since Server 1 had already consumed past that checkpoint, those rows are consumed twice: duplicate data!
  16. Multiple consumer groups (3 partitions, 2 replicas): fault tolerant, but no control over which partitions are assigned to a consumer, no control over checkpointing, segment disparity between the groups, and storage inefficient
  17. Operational complexity: a node failure or capacity change requires disabling the whole consumer group for queries
  18. Scalability limitation: scalability is limited by the number of partitions; an added Server 4 sits idle, which is cost inefficient
  19. A single node per consumer group, the only deployment model that worked: it eliminates incorrect results and reduces operational complexity, but is limited by the capacity of one node, and still has storage overhead and limited scalability
  20. Issues with the Kafka consumer group based solution: a multi-node consumer group suffers from incorrect results, operational complexity, storage overhead, limited scalability, and expense; a single-node consumer group still suffers from storage overhead, limited scalability, and expense
  21. Problem 1: lack of control with the Kafka rebalancer. Solution: take control of partition assignment
  22. Problem 2: segment disparity due to the checkpointing mechanism. Solution: take control of checkpointing
  23. Approach 2: Partition Level Consumption
  24. Partition level consumption (3 partitions, 2 replicas): a single coordinator, the controller, spans all replicas, and every action is determined by the cluster state, e.g. partition 0 hosted on S1 and S2 (both CONSUMING, start offset 20), partition 1 on S3 and S1 (CONSUMING, 20), partition 2 on S2 and S3 (CONSUMING, 20) (see the cluster-state sketch after the transcript)
  25. Segment commit: when the partition 0 segment is committed at end offset 110, its cluster state entry becomes ONLINE; only one server persists the segment to the deep store, so only one copy is stored
  26. All other replicas download the committed segment from the deep store, giving segment equivalence across replicas
  27. A new CONSUMING segment state is created for partition 0, starting at offset 110, exactly where the previous segment left off
  28. Each partition is independent of the others: partitions 0, 1, and 2 go ONLINE at end offsets 110, 120, and 100, and new CONSUMING segments start at 110, 120, and 100 respectively
  29. Capacity expansion: when S4 is added, the cluster state table is updated; for a consuming segment, S4 restarts consumption from the offset recorded in the cluster state, and for a completed Pinot segment it downloads from the deep store. Changes in replication or partitions are easy to handle, and there are no duplicates!
  30. Node failures: at least one replica is still alive; no complex operations
  31. Scalability: easily add nodes (S5, S6); segment equivalence = smart segment assignment + smart query routing, separating completed servers from consuming servers
  32. Summary: a multi-node consumer group suffers from incorrect results, operational complexity, storage overhead, limited scalability, and expense; a single-node consumer group from storage overhead, limited scalability, and expense; partition level consumers have none of these issues
  33. Q&A: pinot.apache.org | @apachepinot
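
The scatter-gather sketch for the broker on slide 11. Everything here (the PartialResult type, the server list, the count-merge step) is our illustrative assumption, not Pinot's actual broker API; a real broker routes by segment and merges per query type.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical scatter-gather broker (slide 11): fan the query out to every
// server that hosts a relevant segment, then gather and merge the partial
// results. Types and helpers are illustrative only.
public class ScatterGatherBroker {
    record PartialResult(long count) {}

    public PartialResult execute(String query, List<String> servers) {
        // Scatter: send the query to each server asynchronously.
        List<CompletableFuture<PartialResult>> futures = servers.stream()
            .map(server -> CompletableFuture.supplyAsync(() -> queryServer(server, query)))
            .toList();

        // Gather: wait for all partial results and merge them (here, a sum).
        long total = futures.stream()
            .map(CompletableFuture::join)
            .mapToLong(PartialResult::count)
            .sum();
        return new PartialResult(total);
    }

    private PartialResult queryServer(String server, String query) {
        // Placeholder for the network call to one Pinot server.
        return new PartialResult(0);
    }
}
```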
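The ingestion-loop sketch for slide 12: a hedged illustration of consume, index in memory, periodically seal a segment, persist it to the deep store, and keep consuming. The row-count threshold and the helper methods are stand-ins we invented for the sketch, not Pinot's real segment-build logic.

```java
import java.time.Duration;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RealtimeIngestionLoop {
    private static final int SEGMENT_ROW_THRESHOLD = 1_000_000; // illustrative

    public static void run(KafkaConsumer<byte[], byte[]> consumer) {
        long rowsInSegment = 0;
        while (true) {
            ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<byte[], byte[]> record : records) {
                indexIntoInMemorySegment(record); // queryable as soon as it is indexed
                rowsInSegment++;
            }
            if (rowsInSegment >= SEGMENT_ROW_THRESHOLD) {
                sealAndPersistSegment(); // build an immutable segment, push to deep store
                rowsInSegment = 0;       // then keep consuming into a fresh segment
            }
        }
    }

    private static void indexIntoInMemorySegment(ConsumerRecord<byte[], byte[]> r) { /* ... */ }
    private static void sealAndPersistSegment() { /* ... */ }
}
```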
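The cluster-state sketch for slides 24 through 29. It models the controller's cluster-state table and the two transitions the slides walk through: committing a segment (one winner persists to the deep store, the entry flips to ONLINE, and a new CONSUMING entry starts at the committed offset) and a newly added server catching up from cluster state alone. All class and method names are our own illustration, not Pinot's internal API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the controller's cluster-state table (slides 24-29).
public class ClusterStateSketch {
    enum State { CONSUMING, ONLINE }

    // One row of the table: which servers host a partition's current segment,
    // its state, and the offset range it covers.
    record Entry(int partition, List<String> servers, State state,
                 long startOffset, long endOffset) {}

    private final List<Entry> table = new ArrayList<>();

    // Slides 25-27: one winning server commits the segment at `endOffset`;
    // the entry flips to ONLINE and a new CONSUMING entry starts exactly
    // where the previous one left off, keeping all replicas consistent.
    void commitSegment(Entry consuming, long endOffset) {
        table.remove(consuming);
        table.add(new Entry(consuming.partition(), consuming.servers(),
                            State.ONLINE, consuming.startOffset(), endOffset));
        table.add(new Entry(consuming.partition(), consuming.servers(),
                            State.CONSUMING, endOffset, -1 /* open-ended */));
    }

    // Slide 29: a newly added server catches up purely from cluster state:
    // ONLINE segments are downloaded from the deep store, while CONSUMING
    // segments restart consumption from the recorded start offset. No
    // duplicates, and no coordination with the replaced server.
    void catchUp(String newServer) {
        for (Entry e : table) {
            if (!e.servers().contains(newServer)) continue;
            if (e.state() == State.ONLINE) {
                downloadFromDeepStore(e);
            } else {
                resumeConsumption(e.partition(), e.startOffset());
            }
        }
    }

    private void downloadFromDeepStore(Entry e) { /* fetch the immutable segment */ }
    private void resumeConsumption(int partition, long fromOffset) { /* Kafka seek + consume */ }
}
```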

