Bigtable: A Distributed Storage System
Presenter: Ku. Devyani B.Vaidya
Dec 8th, 2011
Bigtable: A Distributed Storage System
1. Introduction
2. What is a Bigtable?
3. Why not a DBMS?
4. Data model: Row, Column, Timestamps
5. APIs
6. Building Blocks
7. Real Applications
8. Conclusion
Introduction
• Bigtable is a distributed storage system
for managing structured data.
• Designed to scale to a very large size
- Petabytes of data across thousands of
servers
• Used for many Google projects
- Web indexing, Personalized Search, Google
Earth, Google Analytics, Google Finance, …
• Flexible, high-performance solution for
all of these Google products
What is a Bigtable?
• “A Bigtable is a sparse, distributed,
persistent multidimensional sorted map. The
map is indexed by a row key, a column key,
and a timestamp; each value in the map is an
uninterpreted array of bytes.”
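The definition above can be pictured as a map from (row key, column key, timestamp) to raw bytes. The following is a minimal in-memory sketch of that idea only, not Bigtable's actual implementation: the class name `ToyBigtable`, its methods, and the sample keys are all assumptions made for illustration.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical stand-in for the map described above:
# (row key, column key, timestamp) -> uninterpreted bytes.
Key = Tuple[str, str, int]


class ToyBigtable:
    def __init__(self) -> None:
        # Sparse: only cells that were written take up space.
        self._cells: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the value at `timestamp`, or the newest version if none is given."""
        if timestamp is not None:
            return self._cells.get((row, column, timestamp))
        versions = [ts for (r, c, ts) in self._cells if r == row and c == column]
        if not versions:
            return None
        return self._cells[(row, column, max(versions))]

    def scan(self) -> Iterator[Tuple[Key, bytes]]:
        """Yield cells sorted by row, then column, then newest timestamp first."""
        for key in sorted(self._cells, key=lambda k: (k[0], k[1], -k[2])):
            yield key, self._cells[key]


# Illustrative data only; the keys and values are made up for this sketch.
table = ToyBigtable()
table.put("com.cnn.www", "contents:", 3, b"<html>v3")
table.put("com.cnn.www", "contents:", 5, b"<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
print(table.get("com.cnn.www", "contents:"))
```

The values are stored as `bytes` and never parsed, which mirrors the "uninterpreted array of bytes" part of the definition.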
Why not a DBMS?
• Few DBMSs support the requisite scale
– Google needed a database with wide scalability, wide
applicability, high performance, and high availability
• Google could not have afforded one even if it existed
– Most DBMSs require very expensive infrastructure
• DBMSs provide more than Google needs
– E.g., full transactions, SQL
• Google has highly optimized lower-level systems
that could be exploited
– GFS, Chubby, MapReduce, job scheduling
Data model: Row
• Row keys are arbitrary strings
• Each row is the unit of transactional consistency: every read or
write under a single row key is atomic
• Data is maintained in lexicographic order by row key
• Rows with consecutive keys (a row range) are grouped together
into “tablets”
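To make the row-range idea concrete, here is a small sketch that sorts row keys lexicographically and cuts them into contiguous ranges. It is a simplification under stated assumptions: the function names are hypothetical, and tablets are cut by a fixed row count purely for illustration, whereas a real system would split on data size.

```python
from bisect import bisect_left
from typing import List


def split_into_tablets(row_keys: List[str], rows_per_tablet: int) -> List[List[str]]:
    """Sort row keys lexicographically and cut them into contiguous row ranges."""
    ordered = sorted(row_keys)
    return [ordered[i:i + rows_per_tablet]
            for i in range(0, len(ordered), rows_per_tablet)]


def find_tablet(tablets: List[List[str]], row_key: str) -> int:
    """Return the index of the tablet whose row range would hold `row_key`."""
    # Each tablet is identified here by its last (largest) row key.
    end_keys = [tablet[-1] for tablet in tablets]
    index = bisect_left(end_keys, row_key)
    # Keys beyond the final end key fall into the last tablet.
    return min(index, len(tablets) - 1)


# Illustrative row keys only.
rows = ["com.google.www", "com.cnn.www", "org.wikipedia.en",
        "com.cnn.money", "com.google.maps"]
tablets = split_into_tablets(rows, rows_per_tablet=2)
print(tablets)
print(find_tablet(tablets, "com.google.earth"))
```

Because the keys are sorted, rows that share a prefix land next to each other, so a lookup or a short range scan touches only one or a few tablets.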
Data model: Column
• Column keys are grouped into sets called “column
families”
• A column key is named using the following syntax:
family:qualifier
• Access control and disk/memory accounting are
performed at the column family level
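The family:qualifier syntax and the family-level access check can be sketched as follows. The helper names `parse_column_key` and `can_access` are hypothetical, and the permission model (a plain set of allowed families) is an assumption made only to show where the check applies.

```python
from typing import NamedTuple, Set


class ColumnKey(NamedTuple):
    family: str
    qualifier: str


def parse_column_key(key: str) -> ColumnKey:
    """Split 'family:qualifier' at the first colon; the qualifier may be empty."""
    family, separator, qualifier = key.partition(":")
    if not separator or not family:
        raise ValueError(f"expected 'family:qualifier', got {key!r}")
    return ColumnKey(family, qualifier)


def can_access(column_key: str, allowed_families: Set[str]) -> bool:
    """Access is decided per column family, never per individual qualifier."""
    return parse_column_key(column_key).family in allowed_families


# Illustrative column keys only.
print(parse_column_key("anchor:cnnsi.com"))
print(can_access("contents:", {"anchor"}))
```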
Data model: Timestamps
• Each cell in Bigtable can contain multiple versions
of data, each indexed by timestamp
• Timestamps are 64-bit integers
• Assigned by either:
– Bigtable
– The client application
• Versions are stored in decreasing timestamp order, so
that the most recent data is read first
• The application specifies how many versions (n) of a data item
are kept in a cell
• Bigtable garbage-collects older cell versions automatically
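The versioning rules above (newest first, keep only the last n) can be sketched for a single cell as below. This is a toy model: the function name `write_version` is hypothetical, and trimming at write time is a simplification chosen for brevity rather than a description of when Bigtable actually collects old versions.

```python
from typing import List, Tuple

Version = Tuple[int, bytes]  # (timestamp, value)


def write_version(cell: List[Version], timestamp: int, value: bytes,
                  max_versions: int) -> List[Version]:
    """Return the cell with the new version added, newest first, trimmed to n."""
    # Replace any existing version that has the same timestamp.
    versions = [(ts, v) for ts, v in cell if ts != timestamp]
    versions.append((timestamp, value))
    # Decreasing timestamp order puts the most recent version at index 0.
    versions.sort(key=lambda version: version[0], reverse=True)
    # "Garbage-collect" everything beyond the newest max_versions entries.
    return versions[:max_versions]


# Writes may arrive out of timestamp order; only the newest 3 versions survive.
cell: List[Version] = []
for ts in (3, 1, 5, 4):
    cell = write_version(cell, ts, f"v{ts}".encode(), max_versions=3)
print(cell)
```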
Data Model
Example: Web Indexing
[Figures not reproduced: a sequence of slides illustrating rows, columns, cells, timestamps, and column families (family:qualifier) on the web-indexing example.]
APIs
• The Bigtable API provides functions for:
– Creating and deleting tables and column families
– Changing cluster, table, and column family metadata
– Performing single-row transactions
– Using cells as integer counters
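As a rough illustration of single-row transactions and counter cells, the sketch below serializes all updates to one row behind a per-row lock, so a read-modify-write such as an increment cannot interleave with another writer of the same row. Everything here (`ToyRowStore`, `mutate_row`, `increment`) is a hypothetical stand-in and not the real Bigtable client API.

```python
import threading
from collections import defaultdict
from typing import Callable, Dict, Tuple


class ToyRowStore:
    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, int]] = defaultdict(dict)
        self._locks: Dict[str, threading.Lock] = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the two tables above

    def _slot(self, row: str) -> Tuple[threading.Lock, Dict[str, int]]:
        with self._guard:
            return self._locks[row], self._rows[row]

    def mutate_row(self, row: str, update: Callable[[Dict[str, int]], int]) -> int:
        """Run `update` on the row's columns while holding that row's lock."""
        lock, columns = self._slot(row)
        with lock:
            return update(columns)

    def increment(self, row: str, column: str, delta: int = 1) -> int:
        """Treat a cell as an integer counter; return its new value."""
        def bump(columns: Dict[str, int]) -> int:
            columns[column] = columns.get(column, 0) + delta
            return columns[column]
        return self.mutate_row(row, bump)


# Illustrative row and column names only.
store = ToyRowStore()
store.increment("com.cnn.www", "stats:hits")
print(store.increment("com.cnn.www", "stats:hits", 5))
```

The lock is per row, so writers to different rows do not block each other; there is deliberately no way to update two rows in one atomic step.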
Building Blocks
• Bigtable uses the distributed Google File
System (GFS) to store log and data files
• The Google SSTable file format is used
internally to store Bigtable data
• An SSTable provides a persistent, ordered,
immutable map from keys to values
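The "ordered, immutable map" property of an SSTable can be sketched with sorted tuples and binary search. This toy version lives in memory only, so it does not model persistence or the on-disk file format; the class name `ToySSTable` and its methods are assumptions for illustration.

```python
from bisect import bisect_left
from typing import Iterator, Mapping, Optional, Tuple


class ToySSTable:
    """Sorted key -> value map that is built once and never modified."""

    def __init__(self, items: Mapping[bytes, bytes]) -> None:
        pairs = sorted(items.items())
        # Tuples cannot be changed after construction.
        self._keys: Tuple[bytes, ...] = tuple(key for key, _ in pairs)
        self._values: Tuple[bytes, ...] = tuple(value for _, value in pairs)

    def get(self, key: bytes) -> Optional[bytes]:
        """Look up a single key by binary search over the sorted keys."""
        index = bisect_left(self._keys, key)
        if index < len(self._keys) and self._keys[index] == key:
            return self._values[index]
        return None

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield (key, value) pairs with start <= key < end, in key order."""
        index = bisect_left(self._keys, start)
        while index < len(self._keys) and self._keys[index] < end:
            yield self._keys[index], self._values[index]
            index += 1


# Illustrative keys and values only.
sstable = ToySSTable({b"b": b"2", b"a": b"1", b"d": b"4"})
print(sstable.get(b"a"))
print(list(sstable.scan(b"a", b"c")))
```

Since the table offers no write method, an update has to be expressed by building a new table, which is what makes immutable files easy to share and cache.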
Real Applications
• Google Analytics
http://analytics.google.com
• Google Earth & Google Maps
http://earth.google.com
• Personalized Search
www.google.com/psearch
• Web Indexing
• Google Finance
• Orkut
• Writely
Conclusion
• Bigtable has achieved its goals of high performance,
data availability, and scalability.
• It has been successfully deployed in real applications
(Personalized Search, Orkut, Google Maps, …)
• Building its own storage system gave Google significant
advantages: flexibility in designing the data model, and control
over the implementation and the other infrastructure on which
Bigtable relies.
Thanks
