Introduction to Apache BookKeeper Distributed Storage

Introduction to Apache BookKeeper
April 2018

Introducing Apache BookKeeper
• Distributed storage service
• Developed at Yahoo
• Designed for low latency and scalability
• Architected for resiliency and data durability
• Users include: (logos shown on the slide, not preserved here)

BookKeeper design goals
• Write and read streams of entries with very low latency (< 5 ms)
• Ensure that stored data is durable, consistent, and resilient
• Immediate access to data: stream or tail data as it is written
• Efficiently store and access both historic and real-time data

BookKeeper key capabilities
• Data consistency: simple, repeatable read consistency model
• Data durability: built-in replication and resiliency
• Performance: efficient distribution of load across cluster
• Flexibility: tunable write model to optimize balance
• Scalability: isolation of writes and reads for consistent performance

BookKeeper core concepts
• Entry (aka record): sequence of bytes that is the smallest unit of data storage and access
• Log
  • Ledger: append-only sequence of records
  • Stream: unbounded, infinite sequence of data records
[Diagram: a stream made up of ledgers, each ledger a numbered sequence of entries]
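The relationship between entries, ledgers, and streams can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not BookKeeper's API: the `Ledger` and `Stream` classes, the `max_entries_per_ledger` rollover rule, and the 0-based entry ids are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Ledger:
    """Append-only sequence of entries; no appends are accepted after close()."""
    ledger_id: int
    entries: List[bytes] = field(default_factory=list)
    closed: bool = False

    def add_entry(self, payload: bytes) -> int:
        if self.closed:
            raise ValueError("ledger is closed; entries can no longer be appended")
        self.entries.append(payload)
        return len(self.entries) - 1  # entry id within this ledger (0-based here)

    def close(self) -> None:
        self.closed = True


@dataclass
class Stream:
    """Unbounded log modeled as an ordered sequence of ledgers."""
    ledgers: List[Ledger] = field(default_factory=list)

    def append(self, payload: bytes, max_entries_per_ledger: int = 3) -> None:
        # Hypothetical rollover rule: start a new ledger when the current one is full.
        if not self.ledgers or len(self.ledgers[-1].entries) >= max_entries_per_ledger:
            if self.ledgers:
                self.ledgers[-1].close()
            self.ledgers.append(Ledger(ledger_id=len(self.ledgers)))
        self.ledgers[-1].add_entry(payload)

    def read_all(self) -> Iterator[bytes]:
        # Reading the stream means reading its ledgers in order.
        for ledger in self.ledgers:
            yield from ledger.entries


stream = Stream()
for i in range(7):
    stream.append(f"entry-{i}".encode())
```

Only the newest ledger in the stream accepts appends; earlier ledgers are closed and read-only, which is what lets a stream grow without bound while each ledger stays finite.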

BookKeeper architecture
• Bookies
  • Individual BookKeeper storage node
  • Bookies manage access to ledgers
  • Ledgers striped across bookies
• Interfaces
  • Ledger API: low-level API for direct interaction with ledgers
  • Distributed Log API: higher-level abstraction for storing and reading data
• Metadata
  • Stored in ZooKeeper cluster
  • Ledger and ensemble information
[Diagram: client interfaces (Ledger API, Log API), bookies holding ledgers, and metadata]

Data storage in BookKeeper
Logical view: Segment 1, Segment 2, Segment 3, Segment 4, ... Segment n
Physical storage:
  Bookie 1: Segment 1, Segment 2, Segment n
  Bookie 2: Segment 1, Segment n, Segment 3
  Bookie 3: Segment 4, Segment 2, Segment 3
  Bookie 4: Segment 3, Segment 4, Segment n
  Bookie 5: Segment 1, Segment 2, Segment 4
• Data stored in segments
• Segments striped across bookies
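Striping can be sketched as a placement function that spreads segments over the available bookies. The example below is a minimal illustration that assumes a simple rotating (round-robin) placement with three copies per segment, which matches the copy count visible in the slide's diagram; the function name `place_segments` and the bookie names are hypothetical, and BookKeeper's actual placement policy is not described in the deck.

```python
from collections import Counter
from typing import Dict, List


def place_segments(num_segments: int, bookies: List[str], copies: int) -> Dict[int, List[str]]:
    """Assign each segment to `copies` distinct bookies, rotating the starting bookie."""
    if copies > len(bookies):
        raise ValueError("cannot place more copies than there are bookies")
    placement: Dict[int, List[str]] = {}
    for seg in range(num_segments):
        # Segment `seg` starts at bookie index `seg` and wraps around the list.
        placement[seg] = [bookies[(seg + k) % len(bookies)] for k in range(copies)]
    return placement


bookies = [f"bookie-{i}" for i in range(1, 6)]
placement = place_segments(num_segments=5, bookies=bookies, copies=3)

# Count how many segment copies each bookie ends up storing.
load = Counter(b for hosts in placement.values() for b in hosts)
```

With five segments, five bookies, and three copies each, every bookie stores the same number of segment copies, so no single node carries a whole log.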

Data storage in BookKeeper (continued)
[Diagram: same segment layout as the previous slide]
• Data stored in segments
• Storage striped across bookies
• Segments replicated across cluster
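Replication in the deck is labeled with two terms, write quorum and ACK quorum. A common reading, and the one assumed in this sketch, is that each entry is sent to every bookie in its write quorum and the write is reported complete once an ack quorum of them has responded. The function names and the latency figures below are hypothetical and chosen only to show the trade-off.

```python
from typing import List


def write_latency_ms(replica_latencies_ms: List[float], ack_quorum: int) -> float:
    """Latency seen by the writer: the moment the ack_quorum-th fastest replica responds."""
    if not 1 <= ack_quorum <= len(replica_latencies_ms):
        raise ValueError("ack quorum must be between 1 and the write quorum size")
    return sorted(replica_latencies_ms)[ack_quorum - 1]


def tolerated_failures(ack_quorum: int) -> int:
    """An acknowledged entry is on at least ack_quorum bookies, so it survives one fewer failure."""
    return ack_quorum - 1


# Write quorum of 3, where one bookie is momentarily slow (made-up numbers).
latencies = [1.2, 1.5, 9.0]

fast = write_latency_ms(latencies, ack_quorum=2)   # waits for the two fastest replicas
strict = write_latency_ms(latencies, ack_quorum=3)  # waits for every replica
```

Lowering the ack quorum below the write quorum hides a slow bookie from the writer, at the cost of guaranteeing fewer copies at the moment the write is acknowledged.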

Consistent, low latency performance
• Single bookie can serve and store thousands of ledgers
• Separation of write and read paths
  • Bookies use separate I/O paths for writes, tailing reads, and catch-up reads
  • Avoid read activity impact on write latency
• Entries sorted to allow for mostly sequential reads
[Diagram: Writer, Reader, and Bookies 1-4, with labels for write quorum and ACK quorum]
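The point about sorting entries can be made concrete with a toy model. The sketch assumes that a bookie buffers entries arriving interleaved from many ledgers and orders them by (ledger id, entry id) before writing them out; the deck does not spell out the mechanism, so `flush_sorted` and `count_runs` are hypothetical helpers used only for illustration.

```python
from typing import List, Tuple

Entry = Tuple[int, int, bytes]  # (ledger_id, entry_id, payload)


def flush_sorted(write_buffer: List[Entry]) -> List[Entry]:
    """Order buffered entries by ledger, then by entry id, before writing them out."""
    return sorted(write_buffer, key=lambda e: (e[0], e[1]))


def count_runs(entries_on_disk: List[Entry], ledger_id: int) -> int:
    """Number of separate contiguous runs a reader must visit to read one ledger."""
    runs = 0
    previous_was_match = False
    for lid, _, _ in entries_on_disk:
        is_match = lid == ledger_id
        if is_match and not previous_was_match:
            runs += 1
        previous_was_match = is_match
    return runs


# Entries from two ledgers arrive interleaved.
arrival_order: List[Entry] = [
    (1, 0, b"a"), (2, 0, b"x"), (1, 1, b"b"), (2, 1, b"y"), (1, 2, b"c"),
]
on_disk = flush_sorted(arrival_order)

runs_unsorted = count_runs(arrival_order, ledger_id=1)
runs_sorted = count_runs(on_disk, ledger_id=1)
```

In arrival order, the entries of ledger 1 are scattered across several runs; after sorting they sit in a single contiguous run, which is what makes a catch-up read mostly sequential.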

Part of the Streamlio platform for fast data
• Interfaces: APIs, libraries & connectivity
  (diagram labels: Connectors, Client, Data Source, Storm, Kafka, Functional)
• Real-time processing
• Messaging & queuing
• Stream storage
• Management: resource management, metadata, security, monitoring, orchestration
[Three "Powered by" logos on the slide are not preserved]
