Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Bigtable

3,390 views

Published on

Published in: Technology
  • Be the first to comment

Bigtable

  1. 1. BigtableA Distributed Storage System for Structured Data Authors: Fay Chang et. al. Presenter: Zafar Gilani 1
  2. 2. BigtableOutline• Introduction• Data model• Implementation• Performance evaluation• Conclusions 2
  3. 3. BigtableA distributed storage system ..• .. for managing structured data.• Used for demanding workloads, such as: – Throughput oriented batch processing. – Serving latency-sensitive data to the client.• Dynamic control instead of relational model.• Data locality properties (revisit later briefly). 3
  4. 4. BigtableBigtable has achieved several goals• Wide applicability: used for 60+ Google products, including: – Google Analytics, Google Code, Google Earth, Google Maps and Gmail.• Scalability (explain later under evaluation).• High performance.• High availability. 4
  5. 5. BigtableOutline• Introduction• Data model• Implementation• Performance evaluation• Conclusions 5
  6. 6. Bigtable Data model • Essentially a sparse, distributed, persistent multi-dimensional sorted map. • The map is indexed by a row key, column key and a timestamp. • Atomic reads and writes over a single row. ColumnsRows 6
  7. 7. BigtableRow and column range• Row range dynamically • Column keys grouped partitioned into tablets. into column families.• Data in lexicographic • Each family has the order. same type.• Allows data locality. • Allows access control and disk or memory accounting. 7
  8. 8. BigtableRow and column range• Row range dynamically • Column keys grouped partitioned into tablets. into column families.• Data in lexicographic • Each family has the order. same type.• Allows data locality. • Allows access control and disk or memory accounting. Enables reasoning about data locality 8
  9. 9. ColumnsRows 9
  10. 10. ColumnsRows Anchor is a column family 10
  11. 11. ColumnsTablets “anchor:bbcworld.com “anchor:weather “com.bbc.www” “BBC” “BBC.com” 11
  12. 12. BigtableOutline• Introduction• Data model• Implementation• Performance evaluation• Conclusions 12
  13. 13. BigtableBigtable uses several othertechnologies• Google File System to store log and data files.• SSTable file format to store BigTable data.• Chubby, a distributed lock service.For more details on these technologies, refer to section 4 of the paper. 13
  14. 14. Bigtable ImplementationMaster responsibilities:-Assign tablets to tabletservers-Add/delete tablet servers-Balance tablet server load-GC-Schema changes INTERNET CLIENT Communicate directly to tablet servers MASTER TABLET SERVERS 14
  15. 15. BigtableHow data is stored?A three-level hierarchy, similar to B+ trees. 15
  16. 16. BigtableLocation hierarchy Chubby file contains location of the root tablet. 16
  17. 17. BigtableLocation hierarchy Root tablet contains all tablet locations in Metadata table. 17
  18. 18. BigtableLocation hierarchy Metadata table stores locations of actual tablets. 18
  19. 19. BigtableLocation hierarchy Client moves up the hierarchy (Metadata -> Root -> Chubby), if location of tablet is unknown or incorrect. 19
  20. 20. BigtableHow data is served? 20
  21. 21. BigtableTablet servingPersistent 21
  22. 22. Bigtable Tablet serving CompactionsCompactions occurregularly, advantages:-Shrinks memory usage.-Reduces amount of dataread from log duringrecovery. 22
  23. 23. BigtableOutline• Introduction• Data model• Implementation• Performance evaluation• Conclusions 23
  24. 24. BigtableBenchmarks for perf evaluation• Scan: – Scans over values in a row range.• Random reads from memory.• Random reads/writes: – R keys to be read/written spread over N clients.• Sequential reads/writes: – 0 to R-1 keys to be read/written spread over N clients. 24
  25. 25. BigtablePerformance evaluation Scan uses single RPC call and shows best performance. 25
  26. 26. BigtablePerformance evaluation Sequential reads are better than random reads, since each fetched block is used to serve next requests. 26
  27. 27. BigtablePerformance evaluation Random read shows the worst performance. Fetching 64KB every 1000 bytes is expensive. 27
  28. 28. BigtablePerformance evaluation Not linear, but scales well. 28
  29. 29. BigtableOutline• Introduction• Data model• Implementation• Performance evaluation• Conclusions 29
  30. 30. BigtableConclusions• Bigtable: highly scalable and available, without compromising performance.• Flexibility for Google – designed using their own data model.• Custom design gives Google the ability to remove or minimize bottlenecks.• Related work: – Apache Hbase (open source) – Boxwood (though targeted at a lower/FS level) 30
  31. 31. BigtableA Distributed Storage System for Structured Data Authors: Fay Chang et. al. Presenter: Zafar Gilani 31
  32. 32. BigtableB+ Trees• A tree with sorted data for: – Efficient insertion, retrieval and removal of records.• All records are stored at the leaf level, only keys stored in interior nodes. 32

×