Understanding Bigtable
Tarun Kumar Sarkar
Adviser: Prof. Dr. Stefan Böttcher
University of Paderborn
September 30, 2015
Abstract
Bigtable is a distributed storage system designed by Google to manage large volumes of
structured data. Google's applications (Google Analytics, Google Earth, Google Finance,
etc.) place very different demands on a storage system in terms of data size, latency
requirements, and flexibility in managing their data. Google wanted a single system that
could meet these varied demands and could be deployed over a distributed environment.
The result was Bigtable, which provides the high scalability, flexibility, and high
performance those applications need. Bigtable offers a very simple data model that
gives clients dynamic control over data layout and format. In this paper we discuss the
Bigtable data model, its architecture, and the implementation of that architecture.
1. Introduction
1.1. Motivation
One central problem Google faced was storing and managing a large and rapidly
growing volume of information; another requirement was analyzing that information,
which could add significant value to decision making. Handling these issues with
traditional systems may involve complex workloads that push the boundaries of what is
possible with traditional data warehousing and data management techniques and
technologies. Traditional relational databases present a view composed of multiple
tables, each with rows and named columns. Queries, mostly expressed in SQL
(Structured Query Language), allow one to extract specific columns from a row where
certain conditions are met (e.g., a column has a specific value). Moreover, one can
query across multiple tables (this is the "relational" part of a relational database).
For example, a table of students may include a student's name, ID number, and contact
information, while a table of grades may include a student's ID number, course number,
and grade. We can construct a query that extracts a grade by name by looking up the ID
number in the student table and then matching that ID number in the grade table.
With traditional databases we also expect ACID guarantees: transactions are atomic,
consistent, isolated, and durable. As with distributed transactions in general, however,
it is impossible to guarantee consistency while also providing high availability and
network partition tolerance. This makes ACID databases unattractive for highly
distributed environments and led to the emergence of alternative data stores that
target high availability and high performance. Here, we will look at the structure and
capabilities of Bigtable.
1.2. Ground Work
A basic understanding of relational database concepts as well as the fundamentals of
Big Data will help in understanding this paper. Many references are available online;
even the Wikipedia pages on relational databases and Big Data are a good starting
point.
2. Bigtable
Bigtable is a distributed storage system that is structured as a large table (e.g., one
that may be petabytes in size and distributed among tens of thousands of machines). It
is designed for storing items such as billions of URLs with many versions of their
content, or over 100 TB of satellite image data. It can handle hundreds of millions of
users and perform thousands of queries per second. Bigtable was developed at Google
and has been in use since 2005 in hundreds of Google services.
2.1. Characteristics
Bigtable is, at its core, a sparse, distributed, persistent, multi-dimensional sorted map.
Map
A map is an associative array: a data structure that allows one to quickly look up the
value associated with a given key. Bigtable is a collection of (key, value) pairs.
Persistent
The data is stored persistently on disk.
Distributed
Bigtable’s data is distributed among many independent machines. The table is broken
up among rows, with groups of adjacent rows managed by a server. A row itself is
never distributed.
Sparse
The table is sparse, meaning that different rows in a table may use different columns,
with many of the columns empty for a particular row.
Sorted
Bigtable sorts its data by key. This keeps related data close together, usually on the
same machine, provided that one structures keys in such a way that sorting brings
related data together.
Multidimensional
A table is indexed by row key. Each row contains one or more named column families.
Each column family may have multiple columns, and each cell (the intersection of a row
and a column) may contain multiple versions of data, distinguished by timestamp.
Timestamp based
Time is another dimension in Bigtable's data storage. Every cell in Bigtable may keep
multiple versions of its data. If an application does not specify a timestamp, it
retrieves the latest version of the cell. Alternatively, it can specify a timestamp
and get the latest version that is earlier than or equal to that timestamp.
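The timestamp semantics above can be sketched in a few lines of Python. This is an illustrative model only, not Bigtable's actual API; the `Cell` class and its `read`/`write` methods are made-up names.

```python
# Sketch of Bigtable's timestamp semantics: each cell keeps several versions;
# a read without a timestamp returns the newest version, and a read with a
# timestamp returns the newest version at or before that timestamp.

class Cell:
    def __init__(self):
        self.versions = {}  # timestamp (int) -> value (bytes)

    def write(self, timestamp, value):
        self.versions[timestamp] = value

    def read(self, timestamp=None):
        if not self.versions:
            return None
        if timestamp is None:
            return self.versions[max(self.versions)]        # latest version
        eligible = [t for t in self.versions if t <= timestamp]
        return self.versions[max(eligible)] if eligible else None

cell = Cell()
cell.write(1, b"70%")
cell.write(2, b"85%")
cell.write(3, b"60%")
print(cell.read())    # latest version: b'60%'
print(cell.read(2))   # latest version at or before t=2: b'85%'
```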
2.2. Data Model
Bigtable is designed with semi-structured data storage in mind. It is a large map that
is indexed by a row key, column key, and a timestamp. Each value within the map is
an array of bytes that is interpreted by the application.
Let us look at a sample slice of a table (Figure 1), named serverperformance, that
stores performance information for many servers.
Figure 1: A slice of an example table that stores CPU and memory usage for many
servers. The row key is the name of the server (e.g., server1, server2). The cpu column
family contains CPU usage, and the memory column family contains memory usage for each
server. The cell at the intersection of the server1 row and the cpu:core2 column has
three versions of data, at timestamps t1, t2, and t3.
Rows
Bigtable maintains data in lexicographic order by row key. Every read or write of data
in a row is atomic, regardless of how many different columns are read or written
within that row. A table is logically split by rows into multiple sub-tables called
tablets. A tablet is a set of consecutive rows of a table and is the unit of
distribution and load balancing within Bigtable. Because the table is always sorted by
row, reads of short row ranges are efficient: one typically needs to communicate with
only a small number of machines. This is a key idea for ensuring a high degree of
locality in data access. The row keys in a table are arbitrary strings; in our
example, server1 and server2 are row keys.
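The locality argument above can be illustrated with a small sketch: if tablets are defined by sorted row-key boundaries, finding the tablets covering a key range is a binary search. The boundaries and server names below are hypothetical, chosen only for the example.

```python
import bisect

# Illustrative sketch (not Bigtable's real code): rows are kept in sorted
# order and partitioned into tablets by end-row boundaries, so a scan over a
# short key range touches only the few tablets covering that range.

tablet_boundaries = ["server3", "server6", "server9"]   # last row key of each tablet
tablet_servers = ["ts-a", "ts-b", "ts-c"]               # hypothetical server per tablet

def tablets_for_range(start_key, end_key):
    """Return the servers holding rows in [start_key, end_key]."""
    first = bisect.bisect_left(tablet_boundaries, start_key)
    last = bisect.bisect_left(tablet_boundaries, end_key)
    return tablet_servers[first:last + 1]

print(tablets_for_range("server1", "server2"))  # ['ts-a']: one tablet, one machine
print(tablets_for_range("server2", "server7"))  # ['ts-a', 'ts-b', 'ts-c']
```

A short range ("server1" to "server2") resolves to a single machine, which is exactly the locality property the sorted row order buys.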
Column Families
Each row contains one or more named column families; that is, column keys are grouped
into sets called column families. A column family must be defined before data can be
stored under any column key in that family. Within a column family, one may have any
number of named columns, and columns within a family can be created on the fly. All
data within a column family is usually of the same type, and the Bigtable
implementation usually compresses all of a column family's data together. Rows, column
families, and columns thus provide a three-level naming hierarchy for identifying
data. A column key is written as a printable family name followed by a column name,
which may be an arbitrary string; for example, cpu:core1 is a column key. A column
family is also the unit of access control, and both disk and memory accounting are
performed at the column family level. For example, a client might be allowed to read
only the cpu column family.
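The family:qualifier scheme and the rule that families must be declared up front can be sketched as follows. The `Table` class and `put` method are invented for this illustration and do not reflect Bigtable's real client API.

```python
# Hypothetical illustration of the family:qualifier naming scheme and of the
# rule that a column family must exist before data is written under it,
# while individual columns can be created on the fly.

class Table:
    def __init__(self, families):
        self.families = set(families)   # families must be declared up front
        self.cells = {}                 # (row, family, qualifier) -> value

    def put(self, row, column_key, value):
        family, _, qualifier = column_key.partition(":")
        if family not in self.families:
            raise KeyError(f"column family {family!r} not defined")
        self.cells[(row, family, qualifier)] = value  # column created on the fly

t = Table(families=["cpu", "memory"])
t.put("server1", "cpu:core1", b"70%")      # fine: family 'cpu' exists
try:
    t.put("server1", "disk:sda", b"40%")   # fails: family 'disk' was never defined
except KeyError as e:
    print(e)
```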
Figure 1, rendered as text:

serverperformance |          cpu          |      memory
                  | core1  core2   core3  | physical  virtual
server1           |      t1, t2, t3       |
server2           |                       |
server3           |                       |
Timestamps
Each cell can contain multiple versions of the same data. In the example, we may have
several timestamped versions (t1, t2, t3) of CPU performance data in the cpu:core2
column of server1. Each version is identified by a 64-bit timestamp that either
represents real time or is a value assigned by the client. A table is configured with
per-column-family settings for garbage collection of old data: a column family can be
set to keep only the latest n versions, or to keep only the versions written since
some time t (e.g., only values written in the last seven days).
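The two garbage-collection settings just described can be sketched as plain functions. This is an illustration of the policies, not Bigtable's actual configuration interface.

```python
# Sketch of the two per-column-family garbage-collection policies: keep only
# the newest n versions, or keep only versions written at or after a cutoff.

def gc_keep_last_n(versions, n):
    """versions: dict of timestamp -> value; keep the n newest versions."""
    keep = sorted(versions, reverse=True)[:n]
    return {t: versions[t] for t in keep}

def gc_keep_since(versions, cutoff):
    """Keep only versions written at or after the cutoff timestamp."""
    return {t: v for t, v in versions.items() if t >= cutoff}

versions = {1: b"a", 2: b"b", 3: b"c", 4: b"d"}
print(gc_keep_last_n(versions, 2))   # keeps timestamps 4 and 3
print(gc_keep_since(versions, 3))    # keeps timestamps 3 and 4
```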
2.3. Supported API
Bigtable supports functions for creating and deleting tables and column families, and
for changing cluster, table, and column family metadata such as access control rights.
A Bigtable client application can write or delete values in a table, retrieve values
from individual rows, or iterate over a subset of the data in a table.
Bigtable supports many features that allow the user to work on data and manipulate it
in complex ways, but it does not support transactions across row keys. Currently
Bigtable supports only single-row transactions, which means that atomic
read-modify-write sequences can be performed on data stored under a single row key.
A Bigtable client can execute its scripts in the address space of the servers. The
supported scripting language is Sawzall, developed at Google for processing data.
Bigtable also provides a set of wrappers for MapReduce, which allow a Bigtable to be
used both as an input source and as an output target for MapReduce jobs.
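The single-row transaction guarantee can be sketched with a per-row lock: a read-modify-write under one row key is atomic, and no such guarantee spans multiple rows. The class below is an illustration of the semantics only; the lock-based mechanism is an assumption for the sketch, not Bigtable's implementation.

```python
import threading

# Sketch of single-row transaction semantics: a read-modify-write under one
# row key is atomic; nothing here coordinates across different row keys.

class SingleRowStore:
    def __init__(self):
        self.rows = {}      # row key -> dict of column -> value
        self.locks = {}     # row key -> lock

    def _lock(self, row):
        return self.locks.setdefault(row, threading.Lock())

    def read_modify_write(self, row, column, fn):
        with self._lock(row):                 # atomic within one row key
            cols = self.rows.setdefault(row, {})
            cols[column] = fn(cols.get(column, 0))
            return cols[column]

store = SingleRowStore()
store.read_modify_write("server1", "cpu:restarts", lambda v: v + 1)
print(store.read_modify_write("server1", "cpu:restarts", lambda v: v + 1))  # 2
```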
2.4. Bigtable Architecture
Bigtable comprises three main components, as shown in Figure 2: a client library that
is linked into every client, a master server that coordinates activity, and many
tablet servers.
Figure 2: Bigtable Architecture
A Bigtable cluster may store a number of tables; each table consists of a set of
tablets, and each tablet contains a range of rows. Initially, each table consists of
just one tablet. As a table grows, it is automatically split into multiple tablets
(typically 100-200 MB in size). Tablet servers can be added or removed dynamically.
The master assigns tablets to tablet servers and balances tablet server load. It is
also responsible for garbage collection of files in GFS and for managing schema
changes (table and column family creation).
Each tablet server manages a set of tablets (typically 10-1,000 tablets per server).
It handles read/write requests to the tablets it manages and splits tablets that
become too large. As in other distributed systems, client data does not move through
the master; clients communicate directly with tablet servers for reads and writes.
This keeps the master lightly loaded.
2.5. Architecture Implementation
This section describes the fundamentals of the Bigtable architecture implementation.
Building Blocks
Bigtable is not self-contained; it is built on several other pieces of Google
infrastructure.
Bigtable uses the Google File System (GFS) to store data and log files. GFS provides
efficient, reliable access to data using large clusters of commodity hardware.
Bigtable depends on a cluster management system to schedule jobs, manage resources
on shared machines, deal with machine failures, and monitor machine status.
The SSTable file format is used to store Bigtable data. SSTables are designed so that
a data access requires at most a single disk access. An SSTable, once created, is
never changed. If new data is added, a new SSTable is created, and once an old SSTable
is no longer needed, it is set out for garbage collection. SSTable immutability is at
the core of Bigtable's checkpointing and recovery routines.
Chubby is a highly available and persistent distributed lock service that manages
leases for resources and stores configuration information. The service runs with five
active replicas, one of which is elected master to serve requests; a majority of
replicas must be running for the service to work, and the Paxos algorithm keeps the
replicas consistent. Chubby provides a namespace of files and directories, and each
file or directory can be used as a lock. Bigtable uses Chubby to ensure there is only
one active master, to store the bootstrap location of Bigtable data, to discover
tablet servers, to store Bigtable schema information, and to store access control
lists.
Tablet Location
Tablet locations within a Bigtable are managed in a three-level hierarchy (Figure 3).
The first level is a file stored in Chubby that contains the location of the root
tablet. The root (top-level) tablet stores the locations of all the tablets of a
special METADATA table, and each METADATA tablet contains the locations of a set of
user data tablets. The client library caches tablet locations for efficiency. Some
secondary information is also stored in the METADATA table for debugging and
performance analysis.
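The three-level lookup with client-side caching can be sketched as follows. All the names and locations here (the `ts-*` servers, the tablet names) are invented for the illustration.

```python
# Hypothetical walk through the three-level location hierarchy: a Chubby file
# points at the root tablet, the root tablet points at METADATA tablets, and
# METADATA tablets point at user tablets. The client caches what it learns so
# later lookups skip the upper levels.

chubby_file = "root-tablet@ts-a"                       # level 1: root tablet location
root_tablet = {"meta1": "ts-b", "meta2": "ts-c"}       # level 2: METADATA tablet locations
metadata = {"meta1": {"usertable/row0-row4": "ts-d"},  # level 3: user tablet locations
            "meta2": {"usertable/row5-row9": "ts-e"}}

location_cache = {}

def locate(user_tablet):
    if user_tablet in location_cache:                  # cached: no extra lookups
        return location_cache[user_tablet]
    _ = chubby_file                                    # 1st hop: read the Chubby file
    for meta_tablet in root_tablet:                    # 2nd hop: scan the root tablet
        if user_tablet in metadata[meta_tablet]:       # 3rd hop: read a METADATA tablet
            location_cache[user_tablet] = metadata[meta_tablet][user_tablet]
            return location_cache[user_tablet]
    return None

print(locate("usertable/row5-row9"))   # found via meta2
print(locate("usertable/row5-row9"))   # same answer, this time from the cache
```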
Figure 3: Tablet Location Hierarchy
Tablet Assignment
A tablet is assigned to at most one tablet server at any point in time. The master
keeps track of the set of live tablet servers and the current assignment of tablets
to tablet servers, including which tablets are unassigned. When a tablet is
unassigned and a tablet server has room for it, the master assigns the tablet by
sending a tablet load request to that server. Chubby is used to keep track of tablet
servers: when a tablet server starts, it creates, and acquires an exclusive lock on,
a uniquely named file in a specific Chubby directory. The master monitors this
directory (the servers directory) to discover tablet servers.
Whenever a master is started by the Bigtable cluster management system, it executes
the following steps to discover the current tablet assignments: (1) the master grabs
a unique master lock in Chubby, which prevents concurrent master instantiations; (2)
the master scans the servers directory in Chubby to find the live servers; (3) the
master communicates with every live tablet server to discover which tablets are
already assigned to each server; (4) the master scans the METADATA table to learn the
full set of tablets; (5) the master builds the set of unassigned tablets, which
become eligible for assignment.
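The five startup steps above can be sketched as one function. The `FakeChubby` and `FakeServer` stubs, and every method name on them, are stand-ins invented for this sketch, not real APIs.

```python
# Sketch of the master's five startup steps, with fake stand-ins for Chubby
# and the tablet servers.

class FakeServer:
    def __init__(self, tablets): self.tablets = tablets
    def list_tablets(self): return self.tablets

class FakeChubby:
    def acquire(self, name): pass                       # pretend to grab the lock
    def list_dir(self, path): return [FakeServer({"t1", "t2"})]

def master_startup(chubby, metadata_table):
    chubby.acquire("master-lock")                 # (1) grab the unique master lock
    servers = chubby.list_dir("servers")          # (2) scan the servers directory
    assigned = set()
    for s in servers:                             # (3) ask each live server its tablets
        assigned |= s.list_tablets()
    all_tablets = set(metadata_table)             # (4) scan METADATA for all tablets
    return all_tablets - assigned                 # (5) the unassigned set

print(master_startup(FakeChubby(), ["t1", "t2", "t3"]))  # {'t3'} is unassigned
```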
Tablet splits are treated specially, since a tablet server initiates them. The tablet
server commits a split by recording information for the new tablet in the METADATA
table; when the split has committed, it notifies the master.
Tablet Serving
Bigtable stores the persistent state of its data in GFS (Figure 4). Any update to a
Bigtable is first stored in a commit log, which holds redo records; these redo
records are used for recovery in case of failure. The most recently committed updates
are stored in the memtable (an in-memory sorted buffer), while older updates are
stored in a sequence of SSTables.
Figure 3, rendered as text: Chubby file → root tablet (1st METADATA tablet) → other
METADATA tablets → user tables (User Table 1 … User Table N)
Figure 4: Tablet Representation
When a write operation arrives at a tablet server, the server checks that the data is
well-formed and that the sender is authorized to perform the mutation (a mutation is
an abstraction for a series of updates). A valid mutation is first written to the
commit log, after which its contents are inserted into the memtable.
When a read operation arrives at a tablet server, it is similarly checked for
well-formedness and proper authorization. A valid read operation is executed on a
merged view of the sequence of SSTables and the memtable.
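The write and read paths just described can be sketched with dicts standing in for the real on-disk formats; the structures and values below are illustrative only.

```python
# Sketch of the tablet serving paths: a write goes to the commit log first,
# then into the in-memory memtable; a read merges the memtable with the
# (older) SSTables, with the newest value winning.

commit_log = []                               # redo records, for crash recovery
memtable = {}                                 # recent writes, in-memory buffer
sstables = [{"k1": b"old1", "k2": b"old2"}]   # older, immutable data in GFS

def write(key, value):
    commit_log.append((key, value))    # 1) log the mutation for recovery
    memtable[key] = value              # 2) then apply it to the memtable

def read(key):
    merged = {}
    for table in sstables:             # oldest data first...
        merged.update(table)
    merged.update(memtable)            # ...memtable last, so newest wins
    return merged.get(key)

write("k1", b"new1")
print(read("k1"))   # b'new1': the memtable shadows the SSTable value
print(read("k2"))   # b'old2': served from the SSTable
```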
Compactions
Bigtable performs two kinds of compaction: minor compaction and major compaction. In
a minor compaction, once the memtable reaches a threshold size it is frozen and a new
memtable is created; the frozen memtable is converted to an SSTable and written to
GFS. This has two goals: it shrinks the memory usage of the tablet server, and it
reduces the amount of data that has to be read from the commit log during recovery.
In a major compaction, Bigtable reads the contents of several SSTables (created
during minor compactions) and the memtable and writes them out to a single new
SSTable. The SSTables produced by major compactions contain no special deletion
entries, which may be present in SSTables created by minor compactions; major
compactions therefore allow Bigtable to reclaim the resources used by deleted data. A
client application can optionally specify which compression scheme to use for
SSTables.
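Minor compaction can be sketched as follows; the three-entry threshold and the dict-based structures are illustrative stand-ins for the real sizes and formats.

```python
# Minor compaction, sketched: when the memtable reaches a size threshold it
# is frozen, written out as a new immutable SSTable, and a fresh memtable
# takes its place.

MEMTABLE_LIMIT = 3        # illustrative threshold (the real one is in MB)

memtable = {}
sstables = []             # each entry is one immutable SSTable (a frozen dict)

def write(key, value):
    global memtable
    memtable[key] = value
    if len(memtable) >= MEMTABLE_LIMIT:        # memtable hit its threshold:
        sstables.append(dict(memtable))        # freeze it as a new SSTable
        memtable = {}                          # and start a fresh memtable

for i in range(7):
    write(f"k{i}", b"v")

print(len(sstables), len(memtable))   # two SSTables written, one key still in memory
```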
2.6. Open Ends
Where to use?
Bigtable is ideal for applications that need very high throughput and scalability for
semi-structured data. Bigtable can also be used as a storage engine for batch
MapReduce operations, stream processing, analytics, and machine-learning
applications. Bigtable can store and query marketing data (such as purchase histories
and customer preferences), financial data (such as transaction histories, stock
prices, and currency exchange rates), Internet of Things data (such as usage reports
from energy meters and home appliances), and time-series data (such as CPU and memory
usage over time for multiple servers).
Bigtable is not a relational database: it does not support SQL queries or joins, nor
does it support multi-row transactions. It is also not a good solution for small
amounts of data.
What is Next?
Research is ongoing on additional Bigtable features, such as support for secondary
indices and infrastructure for building cross-data-center replicated Bigtables with
multiple master replicas.
3. Conclusions
I have described Bigtable's characteristics, its data model, its architecture and the
implementation of that architecture, and who needs it. It is important to realize
that data comes in many shapes and sizes, and it has many different uses: real-time
fraud detection, web display advertising and competitive analysis, social media and
sentiment analysis, intelligent traffic management, and smart power grids are a few
examples. All of these analytical solutions involve significant volumes of both
semi-structured and structured data, and many of them were previously out of reach
because they were too costly to implement on a standard relational database system.
Bigtable, in combination with these new and evolving analytical processing
technologies, can bring significant benefits to a business. Google originally
developed this distributed system for storing structured data for its internal use.
Bigtable clusters have been in production use since April 2005; today Bigtable is
used in hundreds of Google products, and Google has many customers outside as well.
Bigtable users like the performance and high availability the implementation
provides, and the fact that they can scale the capacity of their clusters simply by
adding more machines as their resource demands change over time.

More Related Content

What's hot

ADVANCE DATABASE MANAGEMENT SYSTEM CONCEPTS & ARCHITECTURE by vikas jagtap
ADVANCE DATABASE MANAGEMENT SYSTEM CONCEPTS & ARCHITECTURE by vikas jagtapADVANCE DATABASE MANAGEMENT SYSTEM CONCEPTS & ARCHITECTURE by vikas jagtap
ADVANCE DATABASE MANAGEMENT SYSTEM CONCEPTS & ARCHITECTURE by vikas jagtap
Vikas Jagtap
 
Introduction to database with ms access.hetvii
Introduction to database with ms access.hetviiIntroduction to database with ms access.hetvii
Introduction to database with ms access.hetvii
07HetviBhagat
 
Data resource management and DSS
Data resource management and DSSData resource management and DSS
Data resource management and DSS
RajThakuri
 
JovianDATA MDX Engine Comad oct 22 2011
JovianDATA MDX Engine Comad oct 22 2011JovianDATA MDX Engine Comad oct 22 2011
JovianDATA MDX Engine Comad oct 22 2011Satya Ramachandran
 
Database Part 1
Database Part 1Database Part 1
Database Part 1
Fizaril Amzari Omar
 
Database Part 2
Database Part 2Database Part 2
Database Part 2
Fizaril Amzari Omar
 
Introduction to databases
Introduction to databasesIntroduction to databases
Introduction to databases
Bryan Corpuz
 
Spot db consistency checking and optimization in spatial database
Spot db  consistency checking and optimization in spatial databaseSpot db  consistency checking and optimization in spatial database
Spot db consistency checking and optimization in spatial database
Pratik Udapure
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
Abishek V S
 
11 Database Concepts
11 Database Concepts11 Database Concepts
11 Database Concepts
Praveen M Jigajinni
 
Database system concepts
Database system conceptsDatabase system concepts
Database system conceptsKumar
 
Chapter 6 Database SC025 2017/2018
Chapter 6 Database SC025 2017/2018Chapter 6 Database SC025 2017/2018
Chapter 6 Database SC025 2017/2018
Fizaril Amzari Omar
 
Database concepts
Database conceptsDatabase concepts
Database concepts
Harry Potter
 
Database management system by Neeraj Bhandari ( Surkhet.Nepal )
Database management system by Neeraj Bhandari ( Surkhet.Nepal )Database management system by Neeraj Bhandari ( Surkhet.Nepal )
Database management system by Neeraj Bhandari ( Surkhet.Nepal )Neeraj Bhandari
 
ความรู้เบื้องต้นฐานข้อมูล 1
ความรู้เบื้องต้นฐานข้อมูล 1ความรู้เบื้องต้นฐานข้อมูล 1
ความรู้เบื้องต้นฐานข้อมูล 1Witoon Thammatuch-aree
 
Database Design
Database DesignDatabase Design
Database Designlearnt
 

What's hot (17)

ADVANCE DATABASE MANAGEMENT SYSTEM CONCEPTS & ARCHITECTURE by vikas jagtap
ADVANCE DATABASE MANAGEMENT SYSTEM CONCEPTS & ARCHITECTURE by vikas jagtapADVANCE DATABASE MANAGEMENT SYSTEM CONCEPTS & ARCHITECTURE by vikas jagtap
ADVANCE DATABASE MANAGEMENT SYSTEM CONCEPTS & ARCHITECTURE by vikas jagtap
 
Introduction to database with ms access.hetvii
Introduction to database with ms access.hetviiIntroduction to database with ms access.hetvii
Introduction to database with ms access.hetvii
 
Data resource management and DSS
Data resource management and DSSData resource management and DSS
Data resource management and DSS
 
JovianDATA MDX Engine Comad oct 22 2011
JovianDATA MDX Engine Comad oct 22 2011JovianDATA MDX Engine Comad oct 22 2011
JovianDATA MDX Engine Comad oct 22 2011
 
Database Part 1
Database Part 1Database Part 1
Database Part 1
 
Database Part 2
Database Part 2Database Part 2
Database Part 2
 
Introduction to databases
Introduction to databasesIntroduction to databases
Introduction to databases
 
Spot db consistency checking and optimization in spatial database
Spot db  consistency checking and optimization in spatial databaseSpot db  consistency checking and optimization in spatial database
Spot db consistency checking and optimization in spatial database
 
Database Management System
Database Management SystemDatabase Management System
Database Management System
 
11 Database Concepts
11 Database Concepts11 Database Concepts
11 Database Concepts
 
Database system concepts
Database system conceptsDatabase system concepts
Database system concepts
 
Chapter 6 Database SC025 2017/2018
Chapter 6 Database SC025 2017/2018Chapter 6 Database SC025 2017/2018
Chapter 6 Database SC025 2017/2018
 
Database concepts
Database conceptsDatabase concepts
Database concepts
 
Database management system by Neeraj Bhandari ( Surkhet.Nepal )
Database management system by Neeraj Bhandari ( Surkhet.Nepal )Database management system by Neeraj Bhandari ( Surkhet.Nepal )
Database management system by Neeraj Bhandari ( Surkhet.Nepal )
 
ความรู้เบื้องต้นฐานข้อมูล 1
ความรู้เบื้องต้นฐานข้อมูล 1ความรู้เบื้องต้นฐานข้อมูล 1
ความรู้เบื้องต้นฐานข้อมูล 1
 
Week 1
Week 1Week 1
Week 1
 
Database Design
Database DesignDatabase Design
Database Design
 

Viewers also liked

CPS_11.10.16(a)
CPS_11.10.16(a)CPS_11.10.16(a)
CPS_11.10.16(a)Jim Eskin
 
Las herramientas web en los estudios
Las herramientas web en los estudiosLas herramientas web en los estudios
Las herramientas web en los estudios
Gus Gus
 
Storyboarding
StoryboardingStoryboarding
Storyboarding
SeyiiO
 
Robert Williams Work Experience
Robert Williams Work ExperienceRobert Williams Work Experience
Robert Williams Work ExperienceRobert Williams
 
Tytöt ja teknologia -tilaisuus 3.3.2016: Miten ja miksi tytöt käyttävät sosia...
Tytöt ja teknologia -tilaisuus 3.3.2016: Miten ja miksi tytöt käyttävät sosia...Tytöt ja teknologia -tilaisuus 3.3.2016: Miten ja miksi tytöt käyttävät sosia...
Tytöt ja teknologia -tilaisuus 3.3.2016: Miten ja miksi tytöt käyttävät sosia...
A-lehdet Oy
 
Proyecto 5
Proyecto 5Proyecto 5
INLIGHT App Note 37
INLIGHT App Note 37INLIGHT App Note 37
INLIGHT App Note 37Amber Cook
 
Smith, Simon- Resume - Master
Smith, Simon- Resume - MasterSmith, Simon- Resume - Master
Smith, Simon- Resume - Mastersimon smith
 
Michał Koniewicz - "SCRUM - jak ugryźć i nie połamać sobie zębów - doświadcza...
Michał Koniewicz - "SCRUM - jak ugryźć i nie połamać sobie zębów - doświadcza...Michał Koniewicz - "SCRUM - jak ugryźć i nie połamać sobie zębów - doświadcza...
Michał Koniewicz - "SCRUM - jak ugryźć i nie połamać sobie zębów - doświadcza...
PMI Szczecin
 
PI_JohnnieGriffin_102615
PI_JohnnieGriffin_102615PI_JohnnieGriffin_102615
PI_JohnnieGriffin_102615Johnnie Griffin
 
Herramientas Ofimáticas
Herramientas OfimáticasHerramientas Ofimáticas
Herramientas Ofimáticas
Jose Andres Cerda Hidalgo
 
Keshu
KeshuKeshu
Keshu
kesu1234
 

Viewers also liked (13)

CPS_11.10.16(a)
CPS_11.10.16(a)CPS_11.10.16(a)
CPS_11.10.16(a)
 
Las herramientas web en los estudios
Las herramientas web en los estudiosLas herramientas web en los estudios
Las herramientas web en los estudios
 
Storyboarding
StoryboardingStoryboarding
Storyboarding
 
Robert Williams Work Experience
Robert Williams Work ExperienceRobert Williams Work Experience
Robert Williams Work Experience
 
Tytöt ja teknologia -tilaisuus 3.3.2016: Miten ja miksi tytöt käyttävät sosia...
Tytöt ja teknologia -tilaisuus 3.3.2016: Miten ja miksi tytöt käyttävät sosia...Tytöt ja teknologia -tilaisuus 3.3.2016: Miten ja miksi tytöt käyttävät sosia...
Tytöt ja teknologia -tilaisuus 3.3.2016: Miten ja miksi tytöt käyttävät sosia...
 
Proyecto 5
Proyecto 5Proyecto 5
Proyecto 5
 
INLIGHT App Note 37
INLIGHT App Note 37INLIGHT App Note 37
INLIGHT App Note 37
 
ACCURATE WEIGHING SCALES Catalog CountryWingGroup
ACCURATE WEIGHING SCALES Catalog CountryWingGroupACCURATE WEIGHING SCALES Catalog CountryWingGroup
ACCURATE WEIGHING SCALES Catalog CountryWingGroup
 
Smith, Simon- Resume - Master
Smith, Simon- Resume - MasterSmith, Simon- Resume - Master
Smith, Simon- Resume - Master
 
Michał Koniewicz - "SCRUM - jak ugryźć i nie połamać sobie zębów - doświadcza...
Michał Koniewicz - "SCRUM - jak ugryźć i nie połamać sobie zębów - doświadcza...Michał Koniewicz - "SCRUM - jak ugryźć i nie połamać sobie zębów - doświadcza...
Michał Koniewicz - "SCRUM - jak ugryźć i nie połamać sobie zębów - doświadcza...
 
PI_JohnnieGriffin_102615
PI_JohnnieGriffin_102615PI_JohnnieGriffin_102615
PI_JohnnieGriffin_102615
 
Herramientas Ofimáticas
Herramientas OfimáticasHerramientas Ofimáticas
Herramientas Ofimáticas
 
Keshu
KeshuKeshu
Keshu
 

Similar to Bigtable_Paper

Bigtable osdi06
Bigtable osdi06Bigtable osdi06
Bigtable osdi06
Shahbaz Sidhu
 
Bigtable
Bigtable Bigtable
Bigtable osdi06
Bigtable osdi06Bigtable osdi06
Bigtable osdi06temp2004it
 
Bigtable osdi06
Bigtable osdi06Bigtable osdi06
Bigtable osdi06
mrlonganh
 
Google BigTable
Google BigTableGoogle BigTable
Google Big Table
Google Big TableGoogle Big Table
Google Big Table
Omar Al-Sabek
 
Google - Bigtable
Google - BigtableGoogle - Bigtable
Google - Bigtable
영원 서
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databases
Fabio Fumarola
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
theijes
 
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQLA STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
ijscai
 
A Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQLA Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQL
IJSCAI Journal
 
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQLA STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
ijscai
 
A Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQLA Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQL
IJSCAI Journal
 
22827361 ab initio-fa-qs
22827361 ab initio-fa-qs22827361 ab initio-fa-qs
22827361 ab initio-fa-qsCapgemini
 

Similar to Bigtable_Paper (20)

Bigtable osdi06
Bigtable osdi06Bigtable osdi06
Bigtable osdi06
 
Bigtable
Bigtable Bigtable
Bigtable
 
Bigtable osdi06
Bigtable osdi06Bigtable osdi06
Bigtable osdi06
 
Bigtable osdi06
Bigtable osdi06Bigtable osdi06
Bigtable osdi06
 
Bigtable osdi06
Bigtable osdi06Bigtable osdi06
Bigtable osdi06
 
Google BigTable
Google BigTableGoogle BigTable
Google BigTable
 
Google Big Table
Google Big TableGoogle Big Table
Google Big Table
 
GOOGLE BIGTABLE
GOOGLE BIGTABLEGOOGLE BIGTABLE
GOOGLE BIGTABLE
 
Cassandra data modelling best practices
Cassandra data modelling best practicesCassandra data modelling best practices
Cassandra data modelling best practices
 
Google - Bigtable
Google - BigtableGoogle - Bigtable
Google - Bigtable
 
Big table
Big tableBig table
Big table
 
Sap abap material
Sap abap materialSap abap material
Sap abap material
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databases
 
Database aggregation using metadata
Database aggregation using metadataDatabase aggregation using metadata
Database aggregation using metadata
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQLA STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
 
A Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQLA Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQL
 
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQLA STUDY ON GRAPH STORAGE DATABASE OF NOSQL
A STUDY ON GRAPH STORAGE DATABASE OF NOSQL
 
A Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQLA Study on Graph Storage Database of NOSQL
A Study on Graph Storage Database of NOSQL
 
22827361 ab initio-fa-qs
22827361 ab initio-fa-qs22827361 ab initio-fa-qs
22827361 ab initio-fa-qs
 

Bigtable_Paper

  • 1. Understanding Bigtable Tarun Kumar Sarkar Adviser: Prof. Dr. Stefan B̈ttcher University of Paderborn September 30, 2015 Abstract Bigtable is a distributed storage system designed by Google to manage large scale of structured data. Various application of Google (Google Analytics, Google Earth, Google Finance etc.) having different kind of demands in terms of dada size, latency requirement, flexibility of managing its data. Google wanted to develop an application, which can solve the varied demands from those applications and can be deployed over a distributed environment. After years of brainstorming they developed Bigtable, which provide high scalability, flexibility and high performance needed by those application. Bigtable provide a very simple data model, which gives the clients dynamic control over its data layout and format. We will discuss the Bigtable data model, its architecture and implementation of the architecture in this paper. 1. Introduction 1.1. Motivation One main problem Google faced was to store and manage the large and rapidly growing volume of information, another requirement was to analyze that information which could add significant value to the decision making process. Dealing these issues using traditional system may involve complex workloads, which push the boundaries of what are possible using traditional data warehousing and data management techniques and technologies. Traditional relational databases present a view that is composed of multiple tables, each with rows and named columns. Queries, mostly performed in SQL (Structured Query Language) allow one to extract specific columns from a row where certain conditions are met (e.g., a column has a specific value). Moreover, one can perform queries across multiple tables (this is the “relational” part of a relational database). For example a table of students may include a student’s name, ID number, and contact information. 
A table of grades may include a student's ID number, course number, and grade. We can construct a query that extracts a grade by name: it searches for the ID number in the student table and then matches that ID number in the grade table. Moreover, with traditional databases we expect ACID guarantees: transactions are atomic, consistent, isolated, and durable. As with distributed transactions in general, it is impossible to guarantee consistency while also providing high availability and tolerance to network partitions. This makes ACID databases unattractive for highly distributed environments and led to the emergence of alternative data stores that target high availability and high performance. Here, we will look at the structure and capabilities of Bigtable.

1.2. Ground Work
A basic understanding of relational database concepts as well as the fundamentals of Big Data will help in understanding this paper. Many references are available online; even the Wikipedia pages on relational databases and Big Data are of great help.
  • 2. 2. Bigtable
Bigtable is a distributed storage system that is structured as a large table, e.g., one that may be petabytes in size and distributed among tens of thousands of machines. It is designed for storing items such as billions of URLs with many versions of their content, or over 100 TB of satellite image data. It can handle hundreds of millions of users and perform thousands of queries per second. Bigtable was developed at Google and has been in use since 2005 in hundreds of Google services.

2.1. Characteristics
Bigtable is basically a sparse, distributed, persistent, multi-dimensional sorted map.
Map: A map is an associative array, a data structure that allows one to quickly look up the value corresponding to a key. Bigtable is a collection of (key, value) pairs.
Persistent: The data is stored persistently on disk.
Distributed: Bigtable's data is distributed among many independent machines. The table is broken up along row boundaries, with groups of adjacent rows managed by a server. A row itself is never distributed.
Sparse: The table is sparse, meaning that different rows in a table may use different columns, with many of the columns empty for a particular row.
Sorted: Bigtable sorts its data by key. This keeps related data close together, usually on the same machine, assuming that one structures keys so that sorting brings related data together.
Multidimensional: A table is indexed by row key. Each row contains one or more named column families. Each column family may have multiple columns, and each cell (the intersection of a row and a column) may contain multiple versions of the data, distinguished by timestamp.
Timestamp based: Time is another dimension in Bigtable's data storage. Every cell may keep multiple versions of its data. If an application does not specify a timestamp, it retrieves the latest version.
Alternatively, it can specify a timestamp and get the latest version that is earlier than or equal to that timestamp.

2.2. Data Model
Bigtable is designed with semi-structured data storage in mind. It is a large map that is indexed by a row key, a column key, and a timestamp. Each value within the map is an array of bytes that is interpreted by the application.
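As a sketch, this data model can be viewed as a map from (row key, column key, timestamp) to an uninterpreted byte string. The class below is a toy illustration of that idea, not Google's implementation; all names in it are invented for the example.

```python
class SimpleBigtable:
    """Toy model of Bigtable's (row, column, timestamp) -> value map.
    Values are opaque byte strings interpreted by the application."""

    def __init__(self):
        # row key -> column key -> list of (timestamp, value), newest first
        self.rows = {}

    def put(self, row, column, timestamp, value):
        cells = self.rows.setdefault(row, {}).setdefault(column, [])
        cells.append((timestamp, value))
        cells.sort(reverse=True)  # keep the newest version first

    def get(self, row, column, timestamp=None):
        """Return the latest value, or the latest at or before `timestamp`."""
        for ts, value in self.rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None

t = SimpleBigtable()
t.put("server1", "cpu:core2", 1, b"35%")
t.put("server1", "cpu:core2", 3, b"80%")
t.put("server1", "cpu:core2", 2, b"60%")
print(t.get("server1", "cpu:core2"))               # -> b'80%' (latest)
print(t.get("server1", "cpu:core2", timestamp=2))  # -> b'60%' (at or before t=2)
```

Note how the lookup with no timestamp returns the newest version, matching the default retrieval behavior described above.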
  • 3. Let us look at a sample slice of a table (Figure 1), named serverperformance, that stores performance information about many servers.

Figure 1: A slice of an example table that stores the CPU and memory usage of many servers.

The row key is the name of the server (e.g., server1, server2). The cpu column family contains the CPU usage, and the memory column family contains the memory usage of each server. The cell at the intersection of the server1 row and the cpu:core2 column holds three versions of its data, at timestamps t1, t2, and t3.

Rows
Bigtable maintains data in lexicographic order by row key. Every read or write of data within a single row is atomic, regardless of how many different columns are read or written within that row. A table is logically split along row boundaries into multiple sub-tables called tablets. A tablet is a set of consecutive rows of a table and is the unit of distribution and load balancing within Bigtable. Because the table is always sorted by row, reads of short ranges of rows are efficient: one typically needs to communicate with only a small number of machines. This is a key idea for ensuring a high degree of locality of data access. The row keys in a table are arbitrary strings; in our example, server1 and server2 are row keys.

Column Families
Each row contains one or more named column families; that is, column keys are grouped into sets called column families. A column family must be defined before data can be stored under any column key in that family. Within a column family, one may have any number of named columns. All data within a column family is usually of the same type, and the Bigtable implementation usually compresses all of a column family's data together. Columns within a column family can be created on the fly. Rows, column families, and columns provide a three-level naming hierarchy for identifying data. A column key is written as a printable family name followed by a column name, which may be an arbitrary string.
For example, cpu:core1 is a column key. The column family is the unit of access control, and both disk and memory accounting are also performed at the column family level. For example, a client may be allowed to read only the data of the cpu column family.
[Figure 1 shows the table serverperformance with rows server1–server3, column families cpu (core1, core2, core3) and memory (physical, virtual), and timestamps t1–t3.]
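The family:qualifier naming convention can be split mechanically on the first colon; a minimal sketch (the function name is invented for illustration):

```python
def parse_column_key(key):
    """Split a column key of the form 'family:qualifier'.
    The family name must be printable; the qualifier may be an
    arbitrary string, so only the first ':' separates the parts."""
    family, _, qualifier = key.partition(":")
    return family, qualifier

print(parse_column_key("cpu:core1"))        # -> ('cpu', 'core1')
print(parse_column_key("memory:physical"))  # -> ('memory', 'physical')
```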
  • 4. Timestamps
Each column family cell can contain multiple versions of the same data. In the example, we may have several timestamped versions (t1, t2, t3) of CPU performance data in the cpu:core2 column of server1. Each version is identified by a 64-bit timestamp that either represents real time or is a value assigned by the client. A table is configured with per-column-family settings for garbage collection of old data. A column family can be configured to keep only the latest n versions, or to keep only the versions written since some time t (e.g., only the values written in the last seven days).

2.3. Supported API
Bigtable provides functions for creating and deleting tables and column families, and for changing cluster, table, and column family metadata, such as access control rights. A Bigtable client application can write or delete values in a table, retrieve values from individual rows, or iterate over a subset of the data in a table. Bigtable supports many features that allow the user to work on data and manipulate it in complex ways, but it does not support transactions across row keys. Currently Bigtable supports only single-row transactions, which means that atomic read-modify-write sequences can be performed on data stored under a single row key. A Bigtable client can execute scripts in the address space of the servers; the supported scripting language is Sawzall, developed at Google for processing data. Bigtable also provides a set of wrappers for MapReduce, which allow a Bigtable to be used both as an input source and as an output target for MapReduce jobs.

2.4. Bigtable Architecture
Bigtable comprises three main components, as shown in Figure 2: a client library (linked into every client), a master server that coordinates activities, and many tablet servers.

Figure 2: Bigtable Architecture
[Figure 2 shows clients communicating with a master server and with several tablet servers, each holding tablets.]
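The single-row transaction guarantee can be illustrated with a toy sketch: because a read-modify-write sequence never spans rows, it suffices to serialize it under a per-row lock. This is an invented illustration of the guarantee, not Bigtable's actual client API.

```python
import threading

class Row:
    """Toy single-row transaction: an atomic read-modify-write
    sequence on one row key; nothing spans multiple rows."""

    def __init__(self):
        self.cells = {}
        self.lock = threading.Lock()

    def read_modify_write(self, column, fn):
        # The whole read-modify-write sequence holds the row lock,
        # so concurrent writers to the same row cannot interleave.
        with self.lock:
            old = self.cells.get(column, 0)
            self.cells[column] = fn(old)
            return self.cells[column]

row = Row()
row.read_modify_write("counter:hits", lambda v: v + 1)
row.read_modify_write("counter:hits", lambda v: v + 1)
print(row.cells["counter:hits"])  # -> 2
```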
  • 5. A Bigtable cluster stores a number of tables; each table consists of a set of tablets, and each tablet contains the data of a row range. Initially, each table consists of just one tablet. As a table grows, it is automatically split into multiple tablets (typically 100–200 MB in size). Tablet servers can be added or removed dynamically. The master assigns tablets to tablet servers and balances tablet server load. It is also responsible for garbage collection of files in GFS and for managing schema changes (table and column family creation). Each tablet server manages a set of tablets (typically 10–1,000 tablets per server). It handles read and write requests to the tablets it manages and splits tablets that have grown too large. As in other distributed systems, client data does not move through the master: clients communicate directly with tablet servers for reads and writes. This keeps the master lightly loaded.

2.5. Architecture Implementation
This section describes the fundamentals of the Bigtable architecture's implementation.

Building Blocks
Bigtable is not independent; it is built on several other pieces of Google infrastructure. Bigtable uses the Google File System (GFS) to store data and log files; GFS provides efficient, reliable access to data using large clusters of commodity hardware. Bigtable also depends on a cluster management system to schedule jobs, manage resources on shared machines, deal with machine failures, and monitor machine status. The SSTable file format is used to store Bigtable data. SSTables are designed so that a data access requires at most a single disk access. An SSTable, once created, is never changed: if new data is added, a new SSTable is created, and once an old SSTable is no longer needed, it is set out for garbage collection. SSTable immutability is at the core of Bigtable's data checkpointing and recovery routines.
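Splitting a table into tablets can be sketched as repeatedly halving any contiguous, sorted row range that exceeds a size threshold. The function below is an invented illustration; real tablets split by data volume (around 100–200 MB), not by row count.

```python
def split_tablet(rows, max_rows):
    """Toy tablet split: a tablet is a sorted, contiguous row range.
    Any tablet larger than the threshold is split at its midpoint
    until every tablet fits; together they cover the original range."""
    tablets = [sorted(rows)]
    result = []
    while tablets:
        t = tablets.pop()
        if len(t) <= max_rows:
            result.append(t)
        else:
            mid = len(t) // 2
            tablets.extend([t[:mid], t[mid:]])
    return sorted(result)  # tablets in row-key order

rows = [f"server{i:03d}" for i in range(10)]
for tablet in split_tablet(rows, max_rows=4):
    print(tablet[0], "...", tablet[-1])
```

Because each tablet remains a contiguous run of sorted row keys, a scan over a short key range still touches only one or two tablets, which is the locality property described above.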
Chubby is a highly available and persistent distributed lock service that manages leases for resources and stores configuration information. The service runs with five active replicas, one of which is elected master to serve requests; a majority of replicas must be running for the service to work. The Paxos algorithm is used to keep the replicas consistent. Chubby provides a namespace of files and directories, and each file or directory can be used as a lock. Bigtable uses Chubby to ensure there is only one active master, to store the bootstrap location of Bigtable data, to discover tablet servers, to store Bigtable schema information, and to store access control lists.

Tablet Location
Locating a tablet within a Bigtable is managed in a three-level hierarchy (Figure 3). The first level is a file stored in Chubby that contains the location of the root tablet. The root (top-level) tablet stores the locations of all tablets of a special METADATA table, and each METADATA tablet contains the locations of a set of user data tablets. The client library caches tablet locations for efficiency. Some secondary information is also stored in the METADATA table for debugging and performance analysis.
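The three-level lookup with client-side caching can be sketched as follows. This is an invented illustration: the class, its fields, and the server names are all hypothetical, and the "3 lookups" counter stands in for the network round trips a cold cache would cost.

```python
class TabletLocator:
    """Toy three-level location lookup: Chubby file -> root tablet ->
    METADATA tablet -> user tablet server, with client-side caching."""

    def __init__(self, chubby, root, metadata):
        self.chubby = chubby      # location of the root tablet
        self.root = root          # table name -> METADATA tablet
        self.metadata = metadata  # (METADATA tablet, row key) -> tablet server
        self.cache = {}
        self.lookups = 0

    def locate(self, table, row_key):
        key = (table, row_key)
        if key in self.cache:      # cache hit: no round trips at all
            return self.cache[key]
        self.lookups += 3          # Chubby + root tablet + METADATA reads
        meta_tablet = self.root[table]
        server = self.metadata[(meta_tablet, row_key)]
        self.cache[key] = server
        return server

loc = TabletLocator(
    chubby="root-tablet-location",
    root={"serverperformance": "meta-tablet-7"},
    metadata={("meta-tablet-7", "server1"): "tabletserver-42"},
)
print(loc.locate("serverperformance", "server1"))  # -> tabletserver-42
print(loc.lookups)   # 3 on a cold cache
loc.locate("serverperformance", "server1")
print(loc.lookups)   # still 3: the second lookup hit the cache
```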
  • 6. Figure 3: Tablet Location Hierarchy

Tablet Assignment
A tablet is assigned to one tablet server at any point in time. The master keeps track of the set of live tablet servers and the current assignment of tablets to tablet servers, including which tablets are unassigned. If a tablet is unassigned and room is available on a tablet server, the master assigns the tablet by sending a tablet load request to that server. Chubby is used to keep track of tablet servers: when a tablet server starts, it creates, and acquires an exclusive lock on, a uniquely named file in a specific Chubby directory. The master monitors this directory (the servers directory) to discover tablet servers. Whenever a master is started by the Bigtable cluster management system, it executes the following steps to discover the current tablet assignments:
(1) The master grabs a unique master lock in Chubby, which prevents concurrent master instantiations.
(2) The master scans the servers directory in Chubby to find the live servers.
(3) The master communicates with every live tablet server to discover which tablets are already assigned to each server.
(4) The master scans the METADATA table to learn the set of tablets.
(5) The master builds the set of unassigned tablets, which become eligible for assignment.
Tablet splits are treated specially, since a tablet server initiates them: the tablet server commits a split by recording information for the new tablet in the METADATA table, and once the split has committed, it notifies the master.

Tablet Serving
Bigtable stores the persistent state of its data in GFS (Figure 4). Any updates to a Bigtable are first recorded in a commit log that stores redo records; these redo records are used for recovery in case of failure. The most recently committed updates are stored in the memtable (an in-memory sorted buffer); older updates are stored in a sequence of SSTables.
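The commit-log/memtable/SSTable flow described above can be sketched in a few lines, including the minor compaction that freezes a full memtable into a new immutable SSTable. This is a toy illustration with invented names, not the actual tablet server.

```python
class TabletServer:
    """Toy write/read path: writes go to the commit log, then the
    memtable; reads see a merged view of the memtable and SSTables.
    A minor compaction freezes the memtable into a new SSTable."""

    def __init__(self, memtable_limit):
        self.commit_log = []    # redo records, kept for crash recovery
        self.memtable = {}      # recent writes, in-memory sorted buffer
        self.sstables = []      # older, immutable snapshots on "disk"
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # log first, then apply
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def read(self, key):
        # Merged view: the memtable shadows SSTables, newest SSTable first.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None

    def minor_compaction(self):
        # Freeze the memtable and write it out as an immutable SSTable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

ts = TabletServer(memtable_limit=2)
ts.write("server1", "cpu=35%")
ts.write("server2", "cpu=50%")  # fills the memtable: minor compaction
ts.write("server1", "cpu=80%")  # newer value lands in the fresh memtable
print(ts.read("server1"))  # -> cpu=80% (memtable shadows the SSTable)
print(ts.read("server2"))  # -> cpu=50% (served from the SSTable)
```

A major compaction would additionally merge the accumulated SSTables back into one, discarding deletion entries along the way.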
  • 7. Figure 4: Tablet Representation

When a write operation arrives at a tablet server, the server checks that the data is well-formed and that the sender is authorized to perform the mutation (a mutation is an abstraction for performing a series of updates). A valid mutation is first written to the commit log, after which its contents are inserted into the memtable. When a read operation arrives at a tablet server, it is similarly checked for well-formedness and proper authorization. A valid read operation is executed on a merged view of the sequence of SSTables and the memtable.

Compactions
Bigtable performs two kinds of compaction: minor compaction and major compaction. In a minor compaction, once the memtable reaches a size threshold, it is frozen and a new memtable is created; the frozen memtable is converted to an SSTable and written to GFS. This has two goals: it shrinks the memory usage of the tablet server, and it reduces the amount of data that has to be read from the commit log during recovery. In a major compaction, Bigtable reads the contents of several SSTables (created during minor compactions) and the memtable and writes them out as a single new SSTable. SSTables produced by major compactions contain no special deletion entries, which may be present in SSTables created during minor compactions; major compactions thus allow Bigtable to reclaim the resources used by deleted data. A client application can optionally specify which compression to use for SSTables.

2.6. Open Ends
Where to use?
Bigtable is ideal for applications that need very high throughput and scalability for non-structured data. Bigtable can also be used as a storage engine for batch MapReduce operations, stream processing, analytics, and machine-learning applications.
Bigtable can be used to store and query marketing data (such as purchase histories and customer preferences), financial data (such as transaction histories, stock prices, and currency exchange rates), Internet of Things data (such as
  • 8. usage reports from energy meters and home appliances), and time-series data (such as CPU and memory usage over time for multiple servers). Bigtable is not a relational database: it does not support SQL queries or joins, nor does it support multi-row transactions. It is also not a good solution for small amounts of data.

What is Next?
Research is ongoing on additional Bigtable features, such as support for secondary indices and infrastructure for building cross-data-center replicated Bigtables with multiple master replicas.

3. Conclusions
I have described the characteristics of Bigtable, its data model, its architecture and implementation, and who needs it. It is important to realize that data comes in many shapes and sizes and has many different uses: real-time fraud detection, web display advertising and competitive analysis, social media and sentiment analysis, intelligent traffic management, and smart power grids are a few examples. All of these analytical solutions involve significant volumes of both semi-structured and structured data. Many of them were not possible previously because they were too costly to implement using standard relational database systems. Bigtable, in combination with these new and evolving analytical processing technologies, can bring significant benefits to a business. Google initially developed this distributed system for storing structured data for its internal use. Bigtable clusters have been in production use since April 2005. Currently Bigtable is used in hundreds of Google products, and Google also has many external customers. Bigtable users like the performance and high availability provided by the implementation, and the fact that they can scale the capacity of their clusters by simply adding more machines as their resource demands change over time.

4. References
[1] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A.
Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber: Bigtable: A Distributed Storage System for Structured Data. OSDI'06: Seventh Symposium on Operating System Design and Implementation, Seattle, WA, November 2006.
[2] https://cloud.google.com/bigtable/
[3] https://en.wikipedia.org/wiki/BigTable
[4] https://en.wikipedia.org/wiki/Big_data
[5] https://en.wikipedia.org/wiki/Relational_database_management_system