Bigtable

A Distributed Storage System for
         Structured Data

        Authors: Fay Chang et al.

        Presenter: Zafar Gilani

                                     1
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                               2
Bigtable


A distributed storage system ..
• .. for managing structured data.
• Used for demanding workloads, such as:
   – Throughput-oriented batch processing.
   – Latency-sensitive serving of data to users.
• Dynamic control over data layout and format, instead of a full relational model.
• Data locality properties (revisited briefly later).



                                                        3
Bigtable


Bigtable has achieved several goals
• Wide applicability: used by 60+ Google
  products, including:
  – Google Analytics, Google Code, Google Earth,
    Google Maps and Gmail.
• Scalability (discussed later under evaluation).
• High performance.
• High availability.


                                                     4
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                               5
Bigtable


  Data model
  • Essentially a sparse, distributed, persistent
    multi-dimensional sorted map.
  • The map is indexed by a row key, column key
    and a timestamp.
  • Atomic reads and writes over a single row.
[Figure: the map visualized as a table of rows × columns]
                                                    6
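
As a rough illustration of the data model only (not the actual implementation, which is distributed and persistent), the map can be pictured as nested sorted dictionaries keyed by row key, then column key, then timestamp. The class and method names below are hypothetical:

import collections

# Toy in-memory picture of the Bigtable data model:
# (row key, column key, timestamp) -> uninterpreted bytes.
class ToyBigtable:
    def __init__(self):
        # row key -> column key -> {timestamp: value}
        self.rows = collections.defaultdict(lambda: collections.defaultdict(dict))

    def write(self, row, column, value, timestamp):
        # In real Bigtable every mutation under a single row key is atomic;
        # here a single dict update stands in for that guarantee.
        self.rows[row][column][timestamp] = value

    def read(self, row, column):
        # Return the most recent version (highest timestamp) of a cell.
        versions = self.rows[row][column]
        if not versions:
            return None
        return versions[max(versions)]

t = ToyBigtable()
t.write("com.bbc.www", "anchor:bbcworld.com", b"BBC", timestamp=3)
print(t.read("com.bbc.www", "anchor:bbcworld.com"))  # b'BBC'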
Bigtable


Row and column range
Row range:
• The row range is dynamically partitioned into tablets.
• Data is kept in lexicographic order by row key.
• Allows data locality.

Column range:
• Column keys are grouped into column families.
• Data in a family is usually of the same type.
• Families are the unit of access control and of
  disk and memory accounting.




                                                          7
Bigtable


Row and column range
               Enables reasoning about data locality

                                                                8
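
As a small hedged illustration of why lexicographic row order gives locality: if row keys reverse the host name (as in the Webtable example), pages from the same domain sort next to each other and so tend to land in the same tablet. The helper name is hypothetical:

# Hypothetical sketch: reversed host names as row keys cluster a domain's
# pages together under lexicographic ordering (data locality).
def row_key(url_host):
    return ".".join(reversed(url_host.split(".")))

hosts = ["maps.google.com", "www.bbc.co.uk", "news.bbc.co.uk", "www.google.com"]
for key in sorted(row_key(h) for h in hosts):
    print(key)
# com.google.maps
# com.google.www
# uk.co.bbc.news
# uk.co.bbc.www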
[Figure: example table laid out as rows × columns]
                 9
[Figure: rows × columns of the example table]
       Anchor is a column family




                                   10
[Figure: example row of the table, grouped into tablets]

  Row key          anchor:bbcworld.com    anchor:weather
  "com.bbc.www"    "BBC"                  "BBC.com"




                                                                   11
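
A brief sketch of the slide's example (values taken from the slide): a column key is "family:qualifier", and the "anchor" family holds one column per referring page.

# Hypothetical sketch of the example row shown above.
row = {
    "anchor:bbcworld.com": "BBC",      # value under anchor:bbcworld.com
    "anchor:weather":      "BBC.com",  # value as shown on the slide
}
for column_key, value in row.items():
    family, qualifier = column_key.split(":", 1)
    print(family, qualifier, "->", value)
# anchor bbcworld.com -> BBC
# anchor weather -> BBC.com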
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                              12
Bigtable
Bigtable uses several other
technologies
• Google File System to store log and data files.
• SSTable file format to store Bigtable data (see the sketch after this slide).
• Chubby, a distributed lock service.




For more details on these technologies, refer to section 4 of the paper.


                                                                            13
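
The paper describes an SSTable as a persistent, ordered, immutable map from keys to values, located via an index. A minimal in-memory sketch of that idea (hypothetical class, not Google's on-disk format):

import bisect

class ToySSTable:
    """Immutable, sorted key -> value map with binary-search lookup."""
    def __init__(self, items):
        # items: iterable of (key, value); sorted once, never mutated.
        pairs = sorted(items)
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

sst = ToySSTable([("b", 2), ("a", 1), ("c", 3)])
print(sst.get("b"))  # 2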
Bigtable


  Implementation
Master responsibilities:
- Assign tablets to tablet servers.
- Detect addition and removal of tablet servers.
- Balance tablet-server load.
- Garbage-collect files in GFS.
- Handle schema changes.

[Figure: master, tablet servers, and clients.
 Clients communicate directly with tablet servers
 over the Internet.]
                                                                       14
Bigtable


How is data stored?
Tablet locations are stored in a three-level hierarchy, similar to a B+ tree.




                                                 15
Bigtable


Location hierarchy
 Chubby file contains location
     of the root tablet.




                                  16
Bigtable


Location hierarchy
       The root tablet contains the locations
     of all tablets of the Metadata table.




                                     17
Bigtable


Location hierarchy
                     The Metadata table stores the
                     locations of the user (actual) tablets.




                                                     18
Bigtable


Location hierarchy




  The client moves up the hierarchy (Metadata -> Root -> Chubby)
  if a tablet location is unknown or found to be stale.
                                                                 19
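
A hedged, in-memory sketch of the three-level lookup a client library might perform (all names and keys are hypothetical; a real client also caches and prefetches locations):

# Chubby file -> root tablet -> METADATA tablets -> user tablet location.
chubby = {"/bigtable/root-tablet-location": "root-tablet"}

tablets = {
    # Root tablet: maps a (table, row) key to the METADATA tablet holding it.
    "root-tablet": {"Webtable:com.bbc.www": "metadata-tablet-7"},
    # METADATA tablet: maps the same key to the server holding the user tablet.
    "metadata-tablet-7": {"Webtable:com.bbc.www": "tablet-server-42"},
}

def locate(table, row_key):
    key = table + ":" + row_key
    root = chubby["/bigtable/root-tablet-location"]  # level 1: Chubby file
    meta = tablets[root][key]                        # level 2: root tablet
    return tablets[meta][key]                        # level 3: METADATA tablet

# A real client caches locations and walks back up the hierarchy
# when a cached location turns out to be missing or stale.
print(locate("Webtable", "com.bbc.www"))  # tablet-server-42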
Bigtable


How is data served?




                       20
Bigtable


Tablet serving




[Figure: tablet representation — recent updates live in an in-memory
 memtable; the commit log and SSTables are persistent on GFS.]




                  21
Bigtable


  Tablet serving


[Figure: compactions move data from the memtable
 into SSTables on GFS.]

Compactions occur regularly; advantages:
- Shrink the tablet server's memory usage.
- Reduce the amount of data read from the
  commit log during recovery.
                                         22
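
A minimal sketch of the write path and a minor compaction (hypothetical names; a real tablet server writes the log and SSTables to GFS):

# Writes are appended to a commit log and applied to an in-memory memtable;
# a minor compaction freezes the memtable and writes it out as a new
# (immutable, sorted) SSTable.
class ToyTablet:
    def __init__(self):
        self.commit_log = []   # stands in for the GFS commit log
        self.memtable = {}     # recent writes, kept in memory
        self.sstables = []     # older writes, persisted as sorted runs

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value            # then apply in memory

    def minor_compaction(self):
        # Shrinks memory use and bounds the log replay needed after a crash.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []

tablet = ToyTablet()
tablet.write("row1", "v1")
tablet.minor_compaction()
print(tablet.sstables)  # [[('row1', 'v1')]]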
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                              23
Bigtable


Benchmarks for performance evaluation
• Scan:
  – Scans over values in a row range.
• Random reads from memory.
• Random reads/writes:
  – R random keys read/written, spread over N clients.
• Sequential reads/writes:
  – Keys 0 to R-1 read/written, spread over N
    clients.

                                                        24
Bigtable


Performance evaluation
                     Scans show the best performance:
                a single RPC fetches a large sequence of values.




                                               25
Bigtable


Performance evaluation



                          Sequential reads outperform random
                           reads: each fetched 64 KB block is
                          cached and used to serve subsequent
                              requests.




                                                 26
Bigtable


Performance evaluation




                     Random reads show the worst performance:
                      each 1000-byte read transfers a 64 KB
                      SSTable block over the network.




                                              27
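
A rough back-of-the-envelope sketch of why this is expensive, using the 64 KB block size and 1000-byte values from the slide:

# Rough arithmetic sketch: bytes moved per useful byte read.
value_size = 1000          # bytes per read in the benchmark
block_size = 64 * 1024     # SSTable block transferred from GFS

# Random reads: each 1000-byte value pulls in a full 64 KB block.
random_overhead = block_size / value_size       # ~65.5x

# Sequential reads: one cached block serves many consecutive values.
values_per_block = block_size // value_size     # ~65 reads per fetch
sequential_overhead = block_size / (values_per_block * value_size)  # ~1.01x

print(round(random_overhead, 1), values_per_block, round(sequential_overhead, 2))
# 65.5 65 1.01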
Bigtable


Performance evaluation
    Aggregate throughput does not scale linearly
    with the number of servers, but it scales well.




                                    28
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                              29
Bigtable


Conclusions
• Bigtable: highly scalable and available, without
  compromising performance.
• Flexibility for Google – designed using their
  own data model.
• Custom design gives Google the ability to
  remove or minimize bottlenecks.
• Related work:
  – Apache HBase (open source)
  – Boxwood (though targeted at a lower/FS level)

                                                     30
Bigtable

A Distributed Storage System for
         Structured Data

        Authors: Fay Chang et al.

        Presenter: Zafar Gilani

                                     31
Bigtable


B+ Trees
• A tree with sorted data for:
  – Efficient insertion, retrieval and removal of
    records.
• All records are stored at the leaf level; only
  keys are stored in interior nodes.




                                                     32
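
A minimal sketch of searching a small, prebuilt B+ tree, illustrating that interior nodes hold only keys while records live in the leaves (hypothetical structure, not Bigtable's code):

# Hypothetical static B+ tree: interior nodes hold separator keys and
# children; all records live in the leaves.
leaf1 = {"type": "leaf", "records": {1: "a", 5: "b"}}
leaf2 = {"type": "leaf", "records": {10: "c", 20: "d"}}
root  = {"type": "interior", "keys": [10], "children": [leaf1, leaf2]}

def search(node, key):
    while node["type"] == "interior":
        # Follow the first child whose separator key exceeds the search key.
        i = 0
        while i < len(node["keys"]) and key >= node["keys"][i]:
            i += 1
        node = node["children"][i]
    return node["records"].get(key)

print(search(root, 5), search(root, 10))  # b c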
