
Apache BookKeeper: A High Performance and Low Latency Storage Service

Apache BookKeeper is a high-performance, low-latency storage service optimized for storing immutable and append-only data (such as logs, streaming events, and objects). Sijie Guo and JV share their experience with Apache BookKeeper. This talk covers the motivation for and an overview of BookKeeper, dives into implementation details, and describes the use cases built upon it.

Published in: Technology

  1. Apache BookKeeper: A High Performance and Low Latency Storage Service @sijieg (Sijie Guo, Twitter) @jvjujjuri (JV, Salesforce)
  2. Hello! I am Sijie Guo - PMC Chair of Apache BookKeeper - Co-creator of Apache DistributedLog - Twitter Messaging/Pub-Sub Team - Yahoo! R&D Beijing
  3. Challenges in Distributed Systems
  4. Expect Failures: up to 10% annual failure rates for disks/servers
  5. Symptoms
  6. Problem 1: Not Available
  7. Problem 1: Not Available
  8. Problem 2: Inconsistencies
  9. CAP
  10. More Issues
  11. Problem 3: Split Brain [diagram labels: Writer A, Write A’, Two Writers]
  12. Problem 4: Failure Detection [diagram labels: nodes A, B, C]
  13. Problem 5: Recovery [diagram labels: nodes A, B, C, Recovery Protocol, Consistency]
  14. Solutions
  15. Overview: Enter Apache BookKeeper
  16. BookKeeper - Durable Storage: A Durable Storage Optimized for Immutable Data. Serves as a building block for reliable systems. Commodity Hardware, Durability, Replication, Consistency, Recovery, Client Library
  17. Immutable Data Abstraction
  18. Ledger ◉ Segment ◉ Block / Object ◉ Append-Only File ◉ ...
  19. Guarantees: If an entry has been acknowledged, it must be readable. If an entry is read once, it must always be readable.
  20. History ◉ Initial Use Case - Hadoop NameNode HA ◉ 2008: Open sourced as a contrib of ZooKeeper ◉ 2011: Sub-Project of ZooKeeper ◉ 2012: Yahoo! Push Notification ◉ 2012~Now: DistributedLog, Pulsar, Majordodo ◉ 2015~Now: Salesforce Distributed Store
  21. Inside of Apache BookKeeper: Details
  22. Architecture [diagram labels: APP, Client, Bookie (x3), Metadata Store, Ledger]
  23. Reliable Writes ◉ Store checksum along with entry ◉ Fsync entries before responding ◉ Ack when accepted by quorum: ○ All Previous Entries ○ This Entry [diagram labels: Bookie (x3)]
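
The write path on slide 23 can be sketched as a toy model. This is a minimal illustration under stated assumptions, not BookKeeper's implementation: `ToyBookie`, `add_entry`, and the quorum size are hypothetical names chosen for the example, and CRC32 merely stands in for whatever checksum the real system stores.

```python
import zlib


class ToyBookie:
    """Hypothetical in-memory stand-in for a storage node (bookie)."""

    def __init__(self, healthy=True):
        self.healthy = healthy
        self.entries = {}  # entry_id -> (checksum, payload)

    def add(self, entry_id, payload):
        if not self.healthy:
            return False
        # Keep the checksum next to the payload so corruption is detectable on read.
        self.entries[entry_id] = (zlib.crc32(payload), payload)
        # A real storage node would fsync the entry here before responding.
        return True

    def read(self, entry_id):
        checksum, payload = self.entries[entry_id]
        if zlib.crc32(payload) != checksum:
            raise IOError("checksum mismatch")
        return payload


def add_entry(bookies, entry_id, payload, ack_quorum):
    """Send the entry to every bookie; succeed once ack_quorum of them accepted it."""
    acks = sum(1 for b in bookies if b.add(entry_id, payload))
    return acks >= ack_quorum


# Three bookies, one of them down; an ack quorum of 2 is an assumption for the demo.
bookies = [ToyBookie(), ToyBookie(), ToyBookie(healthy=False)]
ok = add_entry(bookies, 0, b"hello", ack_quorum=2)
```

With one of three bookies unavailable, the write still succeeds because two bookies accepted it; requiring all three would fail.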
  24. Consistency - LastAddPushed [diagram labels: entries 0 to 12, LastAddPushed, Writer, Add entries]
  25. Consistency - LastAddConfirmed [diagram labels: entries 0 to 12, LastAddConfirmed, Reader, Writer, Ownership Changed, Add entries, Ack Adds, Fencing]
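
The relationship between the two markers on slides 24 and 25 can be sketched as follows. This is an illustrative model only: `ToyWriter` and its method names are hypothetical, and it assumes LastAddConfirmed advances only over a contiguous prefix of quorum-acknowledged entries, in line with the "all previous entries" rule on the Reliable Writes slide.

```python
class ToyWriter:
    """Hypothetical writer tracking LastAddPushed and LastAddConfirmed."""

    def __init__(self):
        self.last_add_pushed = -1     # highest entry id sent to the bookies
        self.last_add_confirmed = -1  # highest id acked together with all predecessors
        self._acked = set()           # ids acked by a quorum but not yet confirmed

    def push(self):
        """Assign the next entry id and mark it as in flight."""
        self.last_add_pushed += 1
        return self.last_add_pushed

    def on_quorum_ack(self, entry_id):
        """Record a quorum ack and advance LAC over any contiguous acked prefix."""
        self._acked.add(entry_id)
        while self.last_add_confirmed + 1 in self._acked:
            self.last_add_confirmed += 1
            self._acked.remove(self.last_add_confirmed)
        return self.last_add_confirmed


w = ToyWriter()
ids = [w.push() for _ in range(4)]  # entries 0..3 are in flight
w.on_quorum_ack(0)
w.on_quorum_ack(2)                  # acked out of order: entry 1 is still pending
lac_with_gap = w.last_add_confirmed
w.on_quorum_ack(1)                  # gap filled, so LAC moves past entries 1 and 2
lac_after_fill = w.last_add_confirmed
```

In this model a reader that only reads up to LastAddConfirmed never observes entry 2 before entry 1 is durable, even though entry 2 was acknowledged by the bookies first.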
  26. Fencing
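
The slide gives only the title, so the following is a sketch of the general fencing idea rather than of the slide's content: when ownership changes, the new owner marks the ledger as fenced on enough bookies that the previous writer can no longer collect a quorum of acknowledgements. All names (`ToyBookie`, `FencedError`, `try_add`) and the quorum size are assumptions made for illustration.

```python
class FencedError(Exception):
    """Raised when a write reaches a bookie that has fenced the ledger."""


class ToyBookie:
    """Hypothetical bookie that rejects adds once the ledger is fenced."""

    def __init__(self):
        self.fenced = False
        self.entries = {}

    def fence(self):
        self.fenced = True

    def add(self, entry_id, payload):
        if self.fenced:
            raise FencedError("ledger is fenced")
        self.entries[entry_id] = payload


def try_add(bookies, entry_id, payload, ack_quorum):
    """Return True only if ack_quorum bookies accept the entry."""
    acks = 0
    for bookie in bookies:
        try:
            bookie.add(entry_id, payload)
            acks += 1
        except FencedError:
            pass
    return acks >= ack_quorum


bookies = [ToyBookie() for _ in range(3)]
before = try_add(bookies, 0, b"from old writer", ack_quorum=2)

# The new owner fences two of three bookies; the old writer can reach at most one.
for bookie in bookies[:2]:
    bookie.fence()
after = try_add(bookies, 1, b"from old writer", ack_quorum=2)
```

Fencing two of the three bookies is enough here because any ack quorum of two must include at least one fenced bookie.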
  27. Read Entry & Read LAC ◉ Read Entry K: Speculative Reads On Timeouts ◉ Read LAC: Quorum Read [diagram labels: Client, B1, B2, B3]
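
A speculative read on timeout, as named on slide 27, might look like the sketch below. It assumes the client asks one bookie first and, each time a timeout passes without an answer, also asks the next replica, taking whichever response arrives first. The function names, delays, and timeout value are hypothetical and chosen only for the demonstration.

```python
import concurrent.futures
import time


def make_bookie(name, delay, store):
    """Hypothetical bookie read function that answers after `delay` seconds."""
    def read(entry_id):
        time.sleep(delay)
        return name, store[entry_id]
    return read


def speculative_read(bookie_reads, entry_id, speculative_timeout):
    """Ask bookies one by one, adding a replica whenever the timeout expires."""
    # Note: leaving the `with` block waits for slow reads to finish; a real
    # client would cancel or ignore them instead.
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(bookie_reads)) as pool:
        pending = set()
        for read in bookie_reads:
            pending.add(pool.submit(read, entry_id))
            done, pending = concurrent.futures.wait(
                pending,
                timeout=speculative_timeout,
                return_when=concurrent.futures.FIRST_COMPLETED,
            )
            if done:
                return next(iter(done)).result()
        # Every bookie has been asked; wait for whichever answers first.
        done, _ = concurrent.futures.wait(
            pending, return_when=concurrent.futures.FIRST_COMPLETED
        )
        return next(iter(done)).result()


store = {7: b"entry-7"}
reads = [
    make_bookie("B1", 0.6, store),   # slow replica
    make_bookie("B2", 0.01, store),
    make_bookie("B3", 0.01, store),
]
winner, payload = speculative_read(reads, 7, speculative_timeout=0.1)
```

The slow first replica no longer determines read latency: the client pays roughly one timeout plus the latency of a healthy replica.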
  28. Long Poll Read: Speculative Long Poll [diagram labels: Client, B1, B2, B3, Long Poll Read]
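
The long poll idea can be illustrated with a small sketch: instead of repeatedly asking for the LastAddConfirmed, the reader issues one request that the bookie holds open until the value advances or a timeout expires. `ToyBookie`, `long_poll_lac`, and the timing values are assumptions for this example, not BookKeeper APIs.

```python
import threading


class ToyBookie:
    """Hypothetical bookie that lets readers wait for LAC to advance."""

    def __init__(self):
        self._cond = threading.Condition()
        self.lac = -1

    def advance_lac(self, new_lac):
        with self._cond:
            self.lac = max(self.lac, new_lac)
            self._cond.notify_all()  # wake every parked long poll

    def long_poll_lac(self, known_lac, timeout):
        """Block until LAC exceeds known_lac or timeout elapses; return current LAC."""
        with self._cond:
            self._cond.wait_for(lambda: self.lac > known_lac, timeout=timeout)
            return self.lac


bookie = ToyBookie()

# A writer advances LAC shortly after the reader starts waiting.
timer = threading.Timer(0.05, bookie.advance_lac, args=(5,))
timer.start()
result = bookie.long_poll_lac(known_lac=-1, timeout=2.0)
timer.join()

# With no further writes, the poll returns the unchanged LAC after the timeout.
timed_out = bookie.long_poll_lac(known_lac=5, timeout=0.05)
```

The reader learns about new entries as soon as they are confirmed, without a tight polling loop.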
  29. Inside a Bookie
  30. Use Cases: Apache BookKeeper as a Building Block
  31. Projects built on BookKeeper ◉ Twitter: Apache DistributedLog ◉ Yahoo: Pulsar - Cloud Messaging Service ◉ Salesforce Distributed Store ◉ Huawei - HDFS NameNode HA ◉ HubSpot - WAL ◉ Majordodo - Distributed Resource Manager
  32. Apache DistributedLog (Twitter)
  33. Apache DistributedLog [diagram: records ordered from oldest to newest across Log Segment X, Log Segment X+1, Log Segment X+2, on top of Apache BookKeeper]
  34. Apache DistributedLog MetadataStore Log Segment Store (BK) Cold Storage (HDFS) Log Streams - Abstraction & Naming - Data Management - Efficient Write & Read - Intra-cluster & Geo Replication - Segments - Raw Streams Write Proxy Read Proxy - Ownership Tracking - Batching, Compression Record Cache - Rate Limiting, Quota - - Serving - Applications - Different Consumer models DBs - e.g., Twitter’s Manhattan Deferred RPC (queuing) Self-serve Pub/Sub Stream Computing Cross DC Replication
  35. DistributedLog at Twitter ◉ Manhattan Key/Value Store - WAL ◉ Durable Deferred RPC - Journal ◉ Real-Time Search Indexing - Change Propagation ◉ Self-serve Pub/Sub - Message Delivery, Ads Pipeline ◉ Stream Computing ○ Source & Sink ○ Stateful Processing in Heron (coming soon) ◉ Reliable Cross Datacenter Replication
  36. Scale of DistributedLog at Twitter ◉ 1.5 trillion records/day, 17.5 petabytes/day ◉ O(10) thousand streams, O(1) million live ledgers ◉ O(10^2) bookies, O(10^3) proxies ◉ Record sizes from 100 bytes to 20 KB or more ◉ Data is kept from hours to days, even up to a year ◉ Replication factor is 3 or 5; 9 or 15 for the global use case.
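
To put the daily totals on slide 36 in per-second terms, here is a quick back-of-the-envelope calculation. It assumes 1 PB = 10^15 bytes and a uniform rate over the day; the slide does not say whether the 17.5 PB figure includes replicated copies, so the average record size is only indicative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

records_per_day = 1.5e12  # 1.5 trillion records/day (from the slide)
bytes_per_day = 17.5e15   # 17.5 petabytes/day, assuming 1 PB = 10**15 bytes

records_per_second = records_per_day / SECONDS_PER_DAY
bytes_per_second = bytes_per_day / SECONDS_PER_DAY
avg_bytes_per_record = bytes_per_day / records_per_day

print(f"{records_per_second / 1e6:.1f} million records/s")         # 17.4 million records/s
print(f"{bytes_per_second / 1e9:.1f} GB/s")                        # 202.5 GB/s
print(f"{avg_bytes_per_record / 1e3:.1f} KB per record on average")  # 11.7 KB per record on average
```

The implied average of roughly 11.7 KB per record falls inside the 100 bytes to 20 KB range quoted on the same slide.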
  37. DistributedLog Resources ◉ Website - https://distributedlog.io ◉ Mailing List - dev@distributedlog.incubator.apache.org ◉ Project Ideas - https://cwiki.apache.org/confluence/display/DL/Project+Ideas ◉ Paper - “DistributedLog: A high performance replicated log service” (ICDE 2017)
  38. Yahoo! Pulsar (Cloud Messaging Service)
  39. Yahoo! Pulsar ◉ Distributed Pub/Sub Messaging Platform ◉ Flexible Messaging Model - Topic and Queue ◉ Durable, Low Latency ◉ Strong Ordering and Consistency Guarantees ◉ Geo Replication ◉ Apache BookKeeper as Durable Message Store
  40. Yahoo! Pulsar
  41. Scale of Pulsar at Yahoo! ◉ 100 billion messages per day ◉ More than 1.4 million topics ◉ Avg publish latency across services of less than 5 ms ◉ 10+ data centers, cross-region replication
  42. Pulsar Performance
  43. Salesforce Distributed Store
  44. Salesforce Application Storage ◉ Store for Persistent WAL, Data and Objects ◉ Low, Constant Write Latencies ◉ Low, Constant Random Read Latencies ◉ Highly Available, Consistent ◉ Distributed and Linearly Scalable ◉ On Commodity Hardware
  45. Heterogeneous Stores
  46. Roadmap, Releases, Future: Community
  47. Community ◉ 7 PMC Members ◉ 10+ Committers ◉ 20+ Active Contributors ◉ 5+ Companies actively using/contributing ○ Twitter ○ Yahoo! ○ Salesforce ○ Huawei ○ EMC
  48. Release 4.5.0 ◉ Netty 4 Upgrade - Performance Improvements ◉ Security (Authentication & Authorization) Support ◉ Explicit LAC ◉ Long Poll Read Support ◉ Auto Re-replication Improvements ◉ ...
  49. Future ◉ Scalable Segment Store ○ Object, Log, File, Stream, … ◉ Long Term Storage ○ Disk Scrubber ○ Better Lifecycle Management ○ … ◉ Beyond the limit ○ 128-bit support ○ Scalable metadata management
  50. Any questions? You can find me at ◉ @sijieg ◉ guosijie@gmail.com Thanks!
