Introduction to Apache BookKeeper Distributed Storage

A brief technical introduction to Apache BookKeeper, the scalable, fault-tolerant, and low-latency storage service optimized for real-time and streaming workloads.

Published in: Software

  1. Introduction to Apache BookKeeper (April 2018)
  2. Introducing Apache BookKeeper
     • Distributed storage service
     • Developed at Yahoo
     • Designed for low latency and scalability
     • Architected for resiliency and data durability
     • Users include:
  3. BookKeeper design goals
     • Write and read streams of entries with very low latency (< 5 ms)
     • Ensure that stored data is durable, consistent, and resilient
     • Provide immediate access to data: stream or tail data as it is written
     • Efficiently store and access both historical and real-time data
  4. BookKeeper key capabilities
     • Data consistency: simple, repeatable-read consistency model
     • Data durability: built-in replication and resiliency
     • Performance: efficient distribution of load across the cluster
     • Flexibility: tunable write model for optimizing trade-offs
     • Scalability: isolation of writes and reads for consistent performance
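Tailing data as it is written and the repeatable-read model both rest on which entries a reader is allowed to see. Below is a minimal Python sketch, assuming the last-add-confirmed (LAC) rule: readers only see entries up to the highest entry the writer has confirmed as acknowledged, so re-reading a confirmed range always returns the same bytes. `LedgerModel`, `confirm`, and `tail` are hypothetical names for illustration, not the BookKeeper client API; how the LAC reaches readers and ledger recovery are omitted.

```python
class LedgerModel:
    """Hypothetical in-memory model of BookKeeper-style read visibility."""

    def __init__(self) -> None:
        self._entries: list[bytes] = []
        # ID of the last entry known to be durably stored; -1 means none yet.
        self.last_add_confirmed = -1

    def add(self, data: bytes) -> int:
        """Append an entry. It is stored but not yet visible to readers."""
        self._entries.append(data)
        return len(self._entries) - 1

    def confirm(self, entry_id: int) -> None:
        """Advance the LAC once entries up to entry_id are acknowledged."""
        if entry_id >= len(self._entries):
            raise ValueError("cannot confirm an entry that was never added")
        self.last_add_confirmed = max(self.last_add_confirmed, entry_id)

    def read(self, first: int, last: int) -> list[bytes]:
        """Return entries first..last (inclusive); all must be confirmed."""
        if first < 0 or last > self.last_add_confirmed:
            raise IndexError(f"range {first}..{last} exceeds LAC {self.last_add_confirmed}")
        return self._entries[first:last + 1]


def tail(ledger: LedgerModel, next_id: int) -> tuple[list[bytes], int]:
    """Read whatever became visible since next_id; return it and the new cursor."""
    lac = ledger.last_add_confirmed
    if lac < next_id:
        return [], next_id  # nothing new to read yet
    return ledger.read(next_id, lac), lac + 1


ledger = LedgerModel()
ledger.add(b"a")
ledger.add(b"b")
ledger.confirm(0)                       # only entry 0 has been acknowledged
batch, cursor = tail(ledger, 0)         # batch == [b"a"], cursor == 1
ledger.confirm(1)
batch2, cursor = tail(ledger, cursor)   # batch2 == [b"b"], cursor == 2
```

A tailing reader simply calls `tail` in a loop; entry `b"b"` stays invisible until it is confirmed, even though it was already added.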
  5. BookKeeper core concepts
     • Entry (aka record): a sequence of bytes; the smallest unit of data storage and access
     • Log:
       - Ledger: append-only sequence of records
       - Stream: unbounded, infinite sequence of data records
     [Figure: diagram relating entries, ledgers, and a stream]
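The three concepts can be modeled as plain data types. This is a sketch under stated assumptions, not BookKeeper's implementation: a ledger is append-only and, once closed, rejects further writes; a stream is assumed to be an unbounded chain of ledgers (the approach the Distributed Log layer takes with its log segments). The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass(frozen=True)
class Entry:
    ledger_id: int
    entry_id: int   # position within its ledger, starting at 0
    payload: bytes  # opaque sequence of bytes


@dataclass
class Ledger:
    ledger_id: int
    entries: list[Entry] = field(default_factory=list)
    closed: bool = False

    def append(self, payload: bytes) -> Entry:
        """Append-only: entries are never modified, and a closed ledger rejects writes."""
        if self.closed:
            raise RuntimeError(f"ledger {self.ledger_id} is closed")
        entry = Entry(self.ledger_id, len(self.entries), payload)
        self.entries.append(entry)
        return entry


@dataclass
class Stream:
    """Unbounded log built by chaining finite ledgers (an assumption of this sketch)."""
    ledgers: list[Ledger] = field(default_factory=list)

    def append(self, payload: bytes) -> Entry:
        # Start a new ledger if there is none yet or the current one was closed.
        if not self.ledgers or self.ledgers[-1].closed:
            self.ledgers.append(Ledger(ledger_id=len(self.ledgers) + 1))
        return self.ledgers[-1].append(payload)

    def roll(self) -> None:
        """Close the current ledger so the next append opens a fresh one."""
        if self.ledgers:
            self.ledgers[-1].closed = True

    def read_all(self) -> Iterator[Entry]:
        for ledger in self.ledgers:
            yield from ledger.entries


stream = Stream()
stream.append(b"e1")
stream.append(b"e2")
stream.roll()
stream.append(b"e3")
```

Entry IDs restart at 0 in each ledger, while the stream presents all entries as one continuous sequence.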
  6. BookKeeper architecture
     • Bookies
       - Individual BookKeeper storage nodes
       - Bookies manage access to ledgers
       - Ledgers are striped across bookies
     • Interfaces
       - Ledger API: low-level API for direct interaction with ledgers
       - Distributed Log API: higher-level abstraction for storing and reading data
     • Metadata
       - Stored in a ZooKeeper cluster
       - Ledger and ensemble information
     [Figure: client interfaces (Ledger API, Log API), ledgers stored on bookies, and metadata]
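The metadata bullet says ZooKeeper holds ledger and ensemble information. A minimal sketch of that idea, assuming an in-memory dictionary as a stand-in for ZooKeeper and a random choice of distinct bookies as the placement policy: `MetadataStore`, `create_ledger`, and `ensemble` are hypothetical names, and real BookKeeper placement policies can also take racks or regions into account.

```python
import random


class MetadataStore:
    """Hypothetical in-memory stand-in for ledger metadata kept in ZooKeeper."""

    def __init__(self, bookies: list[str]) -> None:
        self.bookies = list(bookies)               # bookies available to hold data
        self.ensembles: dict[int, list[str]] = {}  # ledger ID -> ensemble
        self._next_id = 1

    def create_ledger(self, ensemble_size: int, rng: random.Random) -> int:
        """Pick distinct bookies for a new ledger and record them as its ensemble."""
        if ensemble_size > len(self.bookies):
            raise ValueError("not enough bookies for the requested ensemble size")
        ledger_id = self._next_id
        self._next_id += 1
        # Random choice is a simplification of real placement policies.
        self.ensembles[ledger_id] = rng.sample(self.bookies, ensemble_size)
        return ledger_id

    def ensemble(self, ledger_id: int) -> list[str]:
        """What a client looks up before reading or writing a ledger."""
        return self.ensembles[ledger_id]


store = MetadataStore([f"bookie{i}" for i in range(1, 6)])
ledger_id = store.create_ledger(ensemble_size=3, rng=random.Random(7))
print(ledger_id, store.ensemble(ledger_id))
```

Keeping only this small mapping in the metadata service, while the entries themselves live on bookies, is what lets ledgers be striped across many storage nodes.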
  7. Data storage in BookKeeper
     • Data stored in segments
     • Segments striped across bookies
     [Figure: logical view of segments 1 to n mapped onto physical storage across Bookies 1 to 5]
  8. Data storage in BookKeeper
     • Data stored in segments
     • Storage striped across bookies
     • Segments replicated across cluster
     [Figure: logical view of segments 1 to n and their physical placement on Bookies 1 to 5; each segment appears on three bookies]
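Striping and replication can be combined with one placement rule. A minimal sketch, assuming BookKeeper-style round-robin placement in which each entry is written to `write_quorum` consecutive members of the ledger's ensemble, starting at position `entry_id mod ensemble size` (the slides speak of segments; this sketch works per entry). The function names are hypothetical.

```python
def bookies_for_entry(entry_id: int, ensemble: list[str], write_quorum: int) -> list[str]:
    """Bookies storing a copy of entry_id under round-robin striping.

    Copies go to write_quorum consecutive ensemble members, starting at
    position entry_id mod ensemble size, so consecutive entries rotate
    across the ensemble.
    """
    n = len(ensemble)
    if not 1 <= write_quorum <= n:
        raise ValueError("write quorum must be between 1 and the ensemble size")
    start = entry_id % n
    return [ensemble[(start + k) % n] for k in range(write_quorum)]


def layout(num_entries: int, ensemble: list[str], write_quorum: int) -> dict[str, list[int]]:
    """Which entry IDs end up on each bookie."""
    placement: dict[str, list[int]] = {bookie: [] for bookie in ensemble}
    for entry_id in range(num_entries):
        for bookie in bookies_for_entry(entry_id, ensemble, write_quorum):
            placement[bookie].append(entry_id)
    return placement


ensemble = [f"bookie{i}" for i in range(1, 6)]
for bookie, entries in layout(5, ensemble, write_quorum=3).items():
    print(bookie, entries)
```

With five bookies and three copies per entry, every bookie ends up holding three of the five entries (for example, `bookie1` holds entries 0, 3, and 4), so both storage and write load are spread evenly. Setting `write_quorum=1` reduces the rule to pure striping with no replication.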
  9. Consistent, low-latency performance
     • A single bookie can serve and store thousands of ledgers
     • Separation of write and read paths
     • Bookies use separate I/O paths for writes, tailing reads, and catch-up reads
     • Prevents read activity from affecting write latency
     • Entries sorted to allow for mostly sequential reads
     [Figure: a writer sending entries to Bookies 1 to 4, with the write quorum and ack quorum marked; a reader reading from the bookies]
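The write quorum and ack quorum labels in the figure describe when a write counts as done. A minimal sketch, assuming the writer sends each entry to every bookie in its write set and treats the add as complete once `ack_quorum` of them have acknowledged. The function name and the response-time inputs are hypothetical, and the real client's handling of failed bookies (replacing them through an ensemble change) is omitted.

```python
from typing import Optional


def ack_time_ms(
    write_set: list[str],
    response_ms: dict[str, Optional[float]],
    ack_quorum: int,
) -> Optional[float]:
    """When the writer can treat an entry as durable.

    The entry is sent to every bookie in write_set; the add completes once
    ack_quorum of them have acknowledged. A response time of None models a
    bookie that never answers. Returns None if too few bookies respond.
    """
    if not 1 <= ack_quorum <= len(write_set):
        raise ValueError("ack quorum must be between 1 and the write quorum size")
    acks = sorted(response_ms[b] for b in write_set if response_ms.get(b) is not None)
    if len(acks) < ack_quorum:
        return None
    return acks[ack_quorum - 1]


write_set = ["bookie1", "bookie2", "bookie3"]
response = {"bookie1": 2.0, "bookie2": 40.0, "bookie3": 3.0}  # bookie2 is slow
fast = ack_time_ms(write_set, response, ack_quorum=2)  # 3.0: the slow bookie is not waited for
slow = ack_time_ms(write_set, response, ack_quorum=3)  # 40.0: every copy must be acknowledged
```

Choosing an ack quorum smaller than the write quorum lets a single slow or failed bookie drop out of the latency path while the entry is still written to every bookie in the write set, which is one way the tunable write model supports consistent write latency.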
  10. Part of the Streamlio platform for fast data
     [Figure: Streamlio platform layers: interfaces (APIs, libraries and connectivity), real-time processing, messaging and queuing, stream storage, and management (resource management, metadata, security, monitoring, orchestration); labels also include connectors, client, data source, Storm, and Kafka]
