Google Bigtable
The magic behind Google’s data management
Overview
● Introduction
● Challenges
● Data model
● Building blocks
● Conclusion
Intro
➔ Bigtable is Google’s cloud-based data storage service.
➔ It runs on a distributed, parallel architecture with clustering.
➔ It is self-managing, highly scalable, fault-tolerant and flexible.
➔ Bigtable provides low-latency, real-time access and handles heavy workloads.
➔ It integrates with other products and services through APIs.
➔ Many Google services use Bigtable to store data, including Gmail, YouTube, web indexing, Google Maps and Google Analytics.
Original Idea
Challenges
Jeffrey Dean and Sanjay Ghemawat decided to build a datastore service that could scale linearly across thousands of commodity servers.
● Using cheap hardware may lead to system failure.
● How to retain performance at high scale
-- Compromise on a few things
>Abandon the traditional relational model (no joins)
>Replicate data
>Use a parallel and distributed architecture
Data Model
A Bigtable is a sparse, distributed, persistent multi-dimensional sorted map. The map is indexed by a
row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
Bigtable treats all data as uninterpreted strings, whether the data is structured or unstructured.
● Rows
➔ The row keys in a table are arbitrary strings.
➔ Data is maintained in lexicographic order by row key
➔ Each row range is called a tablet, which is the unit of distribution and load balancing.
● Columns
➔ Column keys are grouped into sets called column families.
➔ Data stored in a column family is usually of the same type
➔ A column key is named using the syntax family:qualifier.
➔ Column family names must be printable, but qualifiers may be arbitrary strings.
● Timestamps
➔ Each cell in a Bigtable can contain multiple versions of the same data
➔ Versions are indexed by 64-bit integer timestamps
➔ Timestamps can be assigned automatically by Bigtable or explicitly by client applications
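The data model above can be sketched as a small in-memory map. This is only an illustration, not Bigtable's implementation or client API: the class name `TinyTable`, its methods, and the example keys are all hypothetical.

```python
from collections import defaultdict


class TinyTable:
    """Toy stand-in for the (row, column, timestamp) -> bytes map."""

    def __init__(self):
        # row key -> column key ("family:qualifier") -> {timestamp: value}
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, column, timestamp, value):
        """Store one version of a cell; values are uninterpreted bytes."""
        family, sep, _qualifier = column.partition(":")
        if not sep or not family:
            raise ValueError("column key must look like 'family:qualifier'")
        self._rows[row][column][timestamp] = value

    def get(self, row, column, timestamp=None):
        """Return the newest version at or before `timestamp` (latest if None)."""
        versions = self._rows.get(row, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        return versions[max(candidates)] if candidates else None

    def scan(self, start_row, end_row):
        """Yield row keys in lexicographic order within [start_row, end_row)."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                yield row


table = TinyTable()
table.put("com.example.www", "contents:html", 1, b"<html>v1</html>")
table.put("com.example.www", "contents:html", 5, b"<html>v2</html>")
latest = table.get("com.example.www", "contents:html")       # version 5
older = table.get("com.example.www", "contents:html", 3)     # version 1
```

Because rows are kept in lexicographic order, a scan over a key range touches only neighbouring rows, which is why a contiguous row range works as the unit of distribution.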
[Figure: example table with its rows, columns, and timestamps labelled]
Building Blocks
Bigtable is built on several other pieces of Google infrastructure.
● Google File System (GFS)
● SSTable: data structure for storage
● Chubby: distributed lock service
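As a rough illustration of the SSTable idea, the sketch below models an immutable, sorted key-to-value table with binary-search lookups. It assumes everything fits in memory and ignores the on-disk block format and GFS entirely; `SSTableSketch` and its methods are hypothetical names, not a real library.

```python
import bisect


class SSTableSketch:
    """Immutable, sorted key -> value table with binary-search lookups."""

    def __init__(self, items):
        # Sort once at construction time; the table is never mutated afterwards.
        pairs = sorted(dict(items).items())
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key):
        """Return the value stored under `key`, or None if it is absent."""
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def range(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


sstable = SSTableSketch({b"b": b"2", b"a": b"1", b"d": b"4"})
value = sstable.get(b"a")
window = sstable.range(b"a", b"d")
```

Keeping the keys sorted is what makes both point lookups and range scans cheap, and immutability means readers never need to coordinate with writers.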
Three major components
❖ Library linked into every client
❖ Single master server
▪ Assigns tablets to tablet servers
▪ Detects the addition and expiration of tablet servers
▪ Balances tablet-server load
▪ Garbage-collects files in GFS
❖ Many tablet servers
▪ Each manages a set of tablets
▪ Handles read and write requests to the tablets it manages
▪ Splits tablets that have grown too large
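Two of the duties listed above, splitting oversized tablets and spreading tablets across servers, can be sketched as follows. This is a simplified model under stated assumptions: a row count stands in for tablet size, row ranges are half-open, and the names `Tablet`, `split_if_too_large`, and `assign_to_least_loaded` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tablet:
    start_row: str                 # inclusive lower bound of the row range
    end_row: Optional[str]         # exclusive upper bound; None means unbounded
    rows: dict = field(default_factory=dict)


def split_if_too_large(tablet, max_rows=4):
    """Tablet-server side: split at the middle row key once the tablet is too big."""
    if len(tablet.rows) <= max_rows:
        return [tablet]
    keys = sorted(tablet.rows)
    mid = keys[len(keys) // 2]     # first row key of the right-hand tablet
    left = Tablet(tablet.start_row, mid, {k: tablet.rows[k] for k in keys if k < mid})
    right = Tablet(mid, tablet.end_row, {k: tablet.rows[k] for k in keys if k >= mid})
    return [left, right]


def assign_to_least_loaded(tablet_names, servers):
    """Master side: hand each tablet to the server currently holding the fewest."""
    load = {server: [] for server in servers}
    for name in tablet_names:
        # Ties are broken by server name so the result is deterministic.
        target = min(load, key=lambda s: (len(load[s]), s))
        load[target].append(name)
    return load


big = Tablet("", None, {key: b"" for key in "abcdef"})
halves = split_if_too_large(big)
placement = assign_to_least_loaded(["t1", "t2", "t3"], ["server-a", "server-b"])
```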
Three-level hierarchy
Level 1: A Chubby file contains the location of the root tablet
Level 2: The root tablet contains the locations of the METADATA tablets
Level 3: Each METADATA tablet contains the locations of user tablets
▪ The location of a tablet is stored under a row key that encodes the tablet's table identifier and its end row
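The lookup through these three levels can be sketched with sorted indexes keyed by (table identifier, end row). Everything concrete here is an assumption made for illustration: the `LAST` sentinel for "end of table", the tablet and server names, and the helper functions.

```python
import bisect

LAST = "\uffff"  # hypothetical sentinel meaning "end of the table"

# Level 1: a Chubby file names the root tablet.
CHUBBY_FILE = {"root_tablet_location": "root"}

# Levels 2 and 3: each index is a sorted list of ((table_id, end_row), location).
TABLETS = {
    "root": [(("users", LAST), "meta-0"), (("webtable", LAST), "meta-1")],
    "meta-0": [(("users", "m"), "server-a"), (("users", LAST), "server-b")],
    "meta-1": [(("webtable", LAST), "server-c")],
}


def find_entry(index, key):
    """Return the first (end_key, location) entry whose end_key >= key."""
    end_keys = [end_key for end_key, _ in index]
    i = bisect.bisect_left(end_keys, key)
    if i == len(index):
        raise KeyError(key)
    return index[i]


def locate(table_id, row_key):
    """Follow Chubby file -> root tablet -> METADATA tablet -> user tablet."""
    key = (table_id, row_key)
    root_index = TABLETS[CHUBBY_FILE["root_tablet_location"]]
    _, meta_name = find_entry(root_index, key)
    (found_table, _end_row), server = find_entry(TABLETS[meta_name], key)
    if found_table != table_id:
        raise KeyError(key)  # the key fell into another table's range
    return server


server_for_alice = locate("users", "alice")
server_for_zoe = locate("users", "zoe")
```

Keying each entry by its end row means one binary search finds the tablet responsible for a row: it is the first entry whose end row is not smaller than the row key.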
“All models are wrong. Some models are useful.”
- George Box, “one of the great statistical minds of the 20th century”
Distributed and parallel computing has paved the way for new technologies to flourish.
Conclusion
Bigtable provides low-latency, real-time access and handles heavy workloads with high scalability and high throughput. Its robust, fault-tolerant architecture reduces the risk of data loss, and reliable cluster resizing lets users provision or de-provision a cluster with no downtime. Autonomous management frees users from managing tasks and data assignment, because Bigtable handles these automatically. Integration with other products and services through APIs makes it a general-purpose data store, extending its capabilities and giving users a reliable interface to get more out of less. Bigtable uses a parallel and distributed architecture to process data quickly while reducing the cost per computation, and this back-end architecture has proved itself in both performance and user experience.
With demand for large-scale cloud data storage growing, Bigtable has emerged as one of the best available solutions, offering lower cost, high performance, durability and flexibility. Since it already powers most of Google’s services, it has proved its usability. It really is the magic behind Google’s data management and high-performance operation, giving Google an edge over other giants in the field.
Thanks for giving your precious time!
