
Bigtable

Transcript of "Bigtable"

  1. Bigtable: A Distributed Storage System for Structured Data
     Authors: Fay Chang et al.
     Presenter: Zafar Gilani
  2. Outline
     • Introduction
     • Data model
     • Implementation
     • Performance evaluation
     • Conclusions
  3. A distributed storage system for managing structured data
     • Used for demanding workloads, such as:
       – Throughput-oriented batch processing.
       – Serving latency-sensitive data to the client.
     • Dynamic control instead of a relational model.
     • Data locality properties (revisited briefly later).
  4. Bigtable has achieved several goals
     • Wide applicability: used for 60+ Google products, including:
       – Google Analytics, Google Code, Google Earth, Google Maps and Gmail.
     • Scalability (explained later under evaluation).
     • High performance.
     • High availability.
  5. Outline
     • Introduction
     • Data model
     • Implementation
     • Performance evaluation
     • Conclusions
  6. Data model
     • Essentially a sparse, distributed, persistent, multi-dimensional sorted map.
     • The map is indexed by a row key, a column key and a timestamp.
     • Reads and writes over a single row are atomic.
     [Figure: rows and columns.]
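The data model on slide 6 can be illustrated with a toy, single-process sketch. This is not Bigtable's implementation: the class `TinyTable` and its methods are hypothetical names chosen for illustration, and distribution and persistence are left out. It only shows a sparse sorted map indexed by (row key, column key, timestamp).

```python
from bisect import bisect_left, insort


class TinyTable:
    """Toy model of a sparse sorted map: (row, column, timestamp) -> value.

    Hypothetical illustration only; not distributed, not persistent.
    """

    def __init__(self):
        self._keys = []   # sorted list of (row, column, -timestamp)
        self._cells = {}  # same key -> value

    def put(self, row, column, timestamp, value):
        # Negate the timestamp so that newer versions sort first.
        key = (row, column, -timestamp)
        if key not in self._cells:
            insort(self._keys, key)
        self._cells[key] = value

    def get(self, row, column):
        """Return the most recent value stored for (row, column), or None."""
        i = bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan(self, start_row, end_row):
        """Yield (row, column, timestamp, value) for start_row <= row < end_row."""
        i = bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column, neg_ts = self._keys[i]
            yield row, column, -neg_ts, self._cells[self._keys[i]]
            i += 1


table = TinyTable()
table.put("com.bbc.www", "anchor:bbcworld.com", 1, "BBC")
table.put("com.bbc.www", "anchor:bbcworld.com", 2, "BBC.com")
latest = table.get("com.bbc.www", "anchor:bbcworld.com")
```

Only cells that were written occupy space, which is what "sparse" means here, and keeping the keys sorted is what makes row-range scans cheap.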
  7. Row and column range
     Rows:
     • Row range is dynamically partitioned into tablets.
     • Data is kept in lexicographic order.
     • Allows data locality.
     Columns:
     • Column keys are grouped into column families.
     • Data within a family is usually of the same type.
     • Allows access control and disk or memory accounting.
  8. Row and column range
     Rows:
     • Row range is dynamically partitioned into tablets.
     • Data is kept in lexicographic order.
     • Allows data locality.
     Columns:
     • Column keys are grouped into column families.
     • Data within a family is usually of the same type.
     • Allows access control and disk or memory accounting.
     Callout: lexicographic row order enables reasoning about data locality.
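A minimal sketch of how lexicographic row order and range partitioning give data locality. The function `tablet_for` and the split keys are hypothetical, chosen for illustration; real tablet boundaries are picked dynamically by the system.

```python
from bisect import bisect_right


def tablet_for(row_key, split_keys):
    """Return the index of the tablet whose row range contains row_key.

    split_keys must be sorted; tablet i covers rows in
    [split_keys[i-1], split_keys[i]), with open ends for the first and last tablet.
    """
    return bisect_right(split_keys, row_key)


# Row keys written as reversed hostnames, so pages of one domain sort together.
rows = sorted(["com.bbc.www", "com.bbc.news", "org.example.www", "com.cnn.www"])

# Hypothetical split points dividing the row space into three tablets.
split_keys = ["com.c", "org."]
assignment = {row: tablet_for(row, split_keys) for row in rows}
```

Both `com.bbc.*` rows land in the same tablet, so a scan over one domain touches a single contiguous range instead of rows scattered across servers.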
  9. [Figure: rows and columns.]
  10. [Figure: rows and columns. "anchor" is a column family.]
  11. [Figure: columns and tablets. Labels: "anchor:bbcworld.com", "anchor:weather", "com.bbc.www", "BBC", "BBC.com".]
  12. Outline
     • Introduction
     • Data model
     • Implementation
     • Performance evaluation
     • Conclusions
  13. Bigtable uses several other technologies
     • Google File System, to store log and data files.
     • SSTable file format, to store Bigtable data.
     • Chubby, a distributed lock service.
     For more details on these technologies, refer to section 4 of the paper.
  14. Implementation
     Master responsibilities:
     • Assign tablets to tablet servers.
     • Add/delete tablet servers.
     • Balance tablet server load.
     • Garbage collection (GC).
     • Schema changes.
     [Figure: Internet, client, master and tablet servers. Clients communicate directly with tablet servers.]
  15. How is data stored?
     A three-level hierarchy, similar to B+ trees.
  16. Location hierarchy
     A Chubby file contains the location of the root tablet.
  17. Location hierarchy
     The root tablet contains the locations of all tablets in the Metadata table.
  18. Location hierarchy
     The Metadata table stores the locations of the actual tablets.
  19. Location hierarchy
     The client moves up the hierarchy (Metadata -> Root -> Chubby) if the location of a tablet is unknown or incorrect.
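The three-level lookup on slides 15 to 19 can be sketched as follows. Everything here is a simplified assumption for illustration: `RangeIndex`, `locate`, the tablet names and the server names are hypothetical, and the cache is keyed by exact row key rather than by row range. It only shows the order in which the levels are consulted.

```python
from bisect import bisect_left


class RangeIndex:
    """Sorted map from the end key of a row range to a target name."""

    def __init__(self, entries):
        self.entries = dict(entries)
        self.end_keys = sorted(self.entries)

    def lookup(self, key):
        # The first end key >= key identifies the range that contains key.
        return self.entries[self.end_keys[bisect_left(self.end_keys, key)]]


LAST = "\uffff"  # sentinel end key for the final range (assumption)

chubby = {"root_location": "root"}  # level 1: file pointing at the root tablet
tablets = {
    "root": RangeIndex({"m": "meta-0", LAST: "meta-1"}),          # level 2
    "meta-0": RangeIndex({"com.c": "server-a", "m": "server-b"}),  # level 3
    "meta-1": RangeIndex({LAST: "server-c"}),                      # level 3
}


def locate(row_key, cache):
    """Return (tablet server, number of levels consulted) for row_key."""
    if row_key in cache:
        return cache[row_key], 0
    root = tablets[chubby["root_location"]]  # Chubby -> root tablet
    meta = tablets[root.lookup(row_key)]     # root tablet -> Metadata tablet
    server = meta.lookup(row_key)            # Metadata tablet -> tablet server
    cache[row_key] = server
    return server, 3


cache = {}
first = locate("com.bbc.www", cache)
second = locate("com.bbc.www", cache)
```

The second call is answered from the client's cache, which is why the hierarchy is only walked when a location is unknown or turns out to be stale.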
  20. How is data served?
  21. Tablet serving
     [Figure: tablet serving. Label: "Persistent".]
  22. Tablet serving: compactions
     Compactions occur regularly. Advantages:
     • Shrinks memory usage.
     • Reduces the amount of data read from the log during recovery.
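A toy sketch of why compactions bring the two advantages listed on slide 22. The class `Tablet`, its method names and the `memtable_limit` threshold are hypothetical; the sketch assumes writes go to a commit log and an in-memory buffer, and that a compaction freezes that buffer into an immutable sorted file.

```python
class Tablet:
    """Toy tablet: commit log + in-memory buffer + immutable sorted tables."""

    def __init__(self, memtable_limit=2):
        self.log = []        # commit log; replayed on recovery
        self.memtable = {}   # recent writes held in memory
        self.sstables = []   # immutable sorted tables, oldest first
        self.memtable_limit = memtable_limit  # hypothetical flush threshold

    def write(self, key, value):
        self.log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.compact()

    def compact(self):
        # Freeze the in-memory buffer into a sorted, immutable table.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}   # advantage 1: memory usage shrinks
        self.log.clear()     # advantage 2: less log to replay on recovery

    def read(self, key):
        # Newest data wins: check memory first, then tables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            found = dict(sstable).get(key)
            if found is not None:
                return found
        return None


tablet = Tablet(memtable_limit=2)
tablet.write("a", 1)
tablet.write("b", 2)   # reaches the limit and triggers a compaction
tablet.write("a", 3)   # newer version of "a" lives only in memory and the log
```

After the second write the buffer and the log are both empty, and reads still see the newest value because memory is consulted before the older tables.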
  23. Outline
     • Introduction
     • Data model
     • Implementation
     • Performance evaluation
     • Conclusions
  24. Benchmarks for performance evaluation
     • Scan:
       – Scans over values in a row range.
     • Random reads from memory.
     • Random reads/writes:
       – R keys to be read/written, spread over N clients.
     • Sequential reads/writes:
       – Keys 0 to R-1 to be read/written, spread over N clients.
  25. Performance evaluation
     Scan uses a single RPC call and shows the best performance.
  26. Performance evaluation
     Sequential reads are better than random reads, since each fetched block is used to serve the next requests.
  27. Performance evaluation
     Random read shows the worst performance. Fetching a 64 KB block for every 1000-byte read is expensive.
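The cost named on slide 27 can be made concrete with a little arithmetic. The block and value sizes come from the slide; the function names are hypothetical, and the sketch assumes 64 KB means 64 × 1024 bytes.

```python
BLOCK_BYTES = 64 * 1024  # one block fetched per random read (from the slide)
VALUE_BYTES = 1000       # bytes actually needed per read (from the slide)


def read_amplification(block_bytes, value_bytes):
    """Bytes transferred per byte the client actually uses."""
    return block_bytes / value_bytes


def values_per_block(block_bytes, value_bytes):
    """Whole values that fit in one block; a sequential reader can reuse these."""
    return block_bytes // value_bytes


amplification = read_amplification(BLOCK_BYTES, VALUE_BYTES)
reusable = values_per_block(BLOCK_BYTES, VALUE_BYTES)
```

Under these assumptions a random read moves about 65 times more data than it uses, while a sequential reader gets roughly 65 values out of the same block, which matches the ordering of the two benchmarks on slides 26 and 27.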
  28. Performance evaluation
     Not linear, but scales well.
  29. Outline
     • Introduction
     • Data model
     • Implementation
     • Performance evaluation
     • Conclusions
  30. Conclusions
     • Bigtable: highly scalable and available, without compromising performance.
     • Flexibility for Google: designed using their own data model.
     • Custom design gives Google the ability to remove or minimize bottlenecks.
     • Related work:
       – Apache HBase (open source).
       – Boxwood (though targeted at a lower, file-system level).
  31. Bigtable: A Distributed Storage System for Structured Data
     Authors: Fay Chang et al.
     Presenter: Zafar Gilani
  32. B+ Trees
     • A tree with sorted data for:
       – Efficient insertion, retrieval and removal of records.
     • All records are stored at the leaf level; only keys are stored in interior nodes.