Bigtable

A Distributed Storage System for
         Structured Data

        Authors: Fay Chang et al.

        Presenter: Zafar Gilani

                                     1
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                               2
Bigtable


A distributed storage system ..
• .. for managing structured data.
• Used for demanding workloads, such as:
   – Throughput-oriented batch processing.
   – Latency-sensitive serving of data to users.
• Dynamic control over data layout and format, instead of a full relational model.
• Data locality properties (revisited briefly later).



                                                        3
Bigtable


Bigtable has achieved several goals
• Wide applicability: used by 60+ Google
  products, including:
  – Google Analytics, Google Code, Google Earth,
    Google Maps and Gmail.
• Scalability (discussed later under evaluation).
• High performance.
• High availability.


                                                     4
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                               5
Bigtable


  Data model
  • Essentially a sparse, distributed, persistent
    multi-dimensional sorted map.
  • The map is indexed by a row key, column key
    and a timestamp.
  • Atomic reads and writes over a single row.
[Figure: the map visualized as a table of rows × columns]
                                                    6
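
As a rough illustration of the data model only (not the actual implementation, which is distributed and persistent), the map can be pictured as nested sorted dictionaries keyed by row key, then column key, then timestamp. The class and method names below are hypothetical:

import collections

# Toy in-memory picture of the Bigtable data model:
# (row key, column key, timestamp) -> uninterpreted bytes.
class ToyBigtable:
    def __init__(self):
        # row key -> column key -> {timestamp: value}
        self.rows = collections.defaultdict(lambda: collections.defaultdict(dict))

    def write(self, row, column, value, timestamp):
        # In real Bigtable every mutation under a single row key is atomic;
        # here a single dict update stands in for that guarantee.
        self.rows[row][column][timestamp] = value

    def read(self, row, column):
        # Return the most recent version (highest timestamp) of a cell.
        versions = self.rows[row][column]
        if not versions:
            return None
        return versions[max(versions)]

t = ToyBigtable()
t.write("com.bbc.www", "anchor:bbcworld.com", b"BBC", timestamp=3)
print(t.read("com.bbc.www", "anchor:bbcworld.com"))  # b'BBC'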
Bigtable


Row and column range
Row range:
• The row range is dynamically partitioned into tablets.
• Data is kept in lexicographic order by row key.
• Allows data locality.

Column range:
• Column keys are grouped into column families.
• Data in a family is usually of the same type.
• Families are the unit of access control and of
  disk and memory accounting.




                                                          7
Bigtable


Row and column range
               Enables reasoning about data locality

                                                                8
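
As a small hedged illustration of why lexicographic row order gives locality: if row keys reverse the host name (as in the Webtable example), pages from the same domain sort next to each other and so tend to land in the same tablet. The helper name is hypothetical:

# Hypothetical sketch: reversed host names as row keys cluster a domain's
# pages together under lexicographic ordering (data locality).
def row_key(url_host):
    return ".".join(reversed(url_host.split(".")))

hosts = ["maps.google.com", "www.bbc.co.uk", "news.bbc.co.uk", "www.google.com"]
for key in sorted(row_key(h) for h in hosts):
    print(key)
# com.google.maps
# com.google.www
# uk.co.bbc.news
# uk.co.bbc.www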
[Figure: example table laid out as rows × columns]
                 9
[Figure: rows × columns of the example table]
       Anchor is a column family




                                   10
[Figure: example row of the table, grouped into tablets]

  Row key          anchor:bbcworld.com    anchor:weather
  "com.bbc.www"    "BBC"                  "BBC.com"




                                                                   11
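
A brief sketch of the slide's example (values taken from the slide): a column key is "family:qualifier", and the "anchor" family holds one column per referring page.

# Hypothetical sketch of the example row shown above.
row = {
    "anchor:bbcworld.com": "BBC",      # value under anchor:bbcworld.com
    "anchor:weather":      "BBC.com",  # value as shown on the slide
}
for column_key, value in row.items():
    family, qualifier = column_key.split(":", 1)
    print(family, qualifier, "->", value)
# anchor bbcworld.com -> BBC
# anchor weather -> BBC.com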
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                              12
Bigtable
Bigtable uses several other
technologies
• Google File System to store log and data files.
• SSTable file format to store Bigtable data (see the sketch after this slide).
• Chubby, a distributed lock service.




For more details on these technologies, refer to section 4 of the paper.


                                                                            13
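
The paper describes an SSTable as a persistent, ordered, immutable map from keys to values, located via an index. A minimal in-memory sketch of that idea (hypothetical class, not Google's on-disk format):

import bisect

class ToySSTable:
    """Immutable, sorted key -> value map with binary-search lookup."""
    def __init__(self, items):
        # items: iterable of (key, value); sorted once, never mutated.
        pairs = sorted(items)
        self._keys = [k for k, _ in pairs]
        self._values = [v for _, v in pairs]

    def get(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

sst = ToySSTable([("b", 2), ("a", 1), ("c", 3)])
print(sst.get("b"))  # 2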
Bigtable


  Implementation
Master responsibilities:
- Assign tablets to tablet servers.
- Detect addition and removal of tablet servers.
- Balance tablet-server load.
- Garbage-collect files in GFS.
- Handle schema changes.

[Figure: master, tablet servers, and clients.
 Clients communicate directly with tablet servers
 over the Internet.]
                                                                       14
Bigtable


How is data stored?
Tablet locations are stored in a three-level hierarchy, similar to a B+ tree.




                                                 15
Bigtable


Location hierarchy
 Chubby file contains location
     of the root tablet.




                                  16
Bigtable


Location hierarchy
       The root tablet contains the locations
     of all tablets of the Metadata table.




                                     17
Bigtable


Location hierarchy
                     The Metadata table stores the
                     locations of the user (actual) tablets.




                                                     18
Bigtable


Location hierarchy




  The client moves up the hierarchy (Metadata -> Root -> Chubby)
  if a tablet location is unknown or found to be stale.
                                                                 19
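
A hedged, in-memory sketch of the three-level lookup a client library might perform (all names and keys are hypothetical; a real client also caches and prefetches locations):

# Chubby file -> root tablet -> METADATA tablets -> user tablet location.
chubby = {"/bigtable/root-tablet-location": "root-tablet"}

tablets = {
    # Root tablet: maps a (table, row) key to the METADATA tablet holding it.
    "root-tablet": {"Webtable:com.bbc.www": "metadata-tablet-7"},
    # METADATA tablet: maps the same key to the server holding the user tablet.
    "metadata-tablet-7": {"Webtable:com.bbc.www": "tablet-server-42"},
}

def locate(table, row_key):
    key = table + ":" + row_key
    root = chubby["/bigtable/root-tablet-location"]  # level 1: Chubby file
    meta = tablets[root][key]                        # level 2: root tablet
    return tablets[meta][key]                        # level 3: METADATA tablet

# A real client caches locations and walks back up the hierarchy
# when a cached location turns out to be missing or stale.
print(locate("Webtable", "com.bbc.www"))  # tablet-server-42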
Bigtable


How is data served?




                       20
Bigtable


Tablet serving




[Figure: tablet representation — recent updates live in an in-memory
 memtable; the commit log and SSTables are persistent on GFS.]




                  21
Bigtable


  Tablet serving


[Figure: compactions move data from the memtable
 into SSTables on GFS.]

Compactions occur regularly; advantages:
- Shrink the tablet server's memory usage.
- Reduce the amount of data read from the
  commit log during recovery.
                                         22
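
A minimal sketch of the write path and a minor compaction (hypothetical names; a real tablet server writes the log and SSTables to GFS):

# Writes are appended to a commit log and applied to an in-memory memtable;
# a minor compaction freezes the memtable and writes it out as a new
# (immutable, sorted) SSTable.
class ToyTablet:
    def __init__(self):
        self.commit_log = []   # stands in for the GFS commit log
        self.memtable = {}     # recent writes, kept in memory
        self.sstables = []     # older writes, persisted as sorted runs

    def write(self, key, value):
        self.commit_log.append((key, value))  # durability first
        self.memtable[key] = value            # then apply in memory

    def minor_compaction(self):
        # Shrinks memory use and bounds the log replay needed after a crash.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.commit_log = []

tablet = ToyTablet()
tablet.write("row1", "v1")
tablet.minor_compaction()
print(tablet.sstables)  # [[('row1', 'v1')]]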
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                              23
Bigtable


Benchmarks for performance evaluation
• Scan:
  – Scans over values in a row range.
• Random reads from memory.
• Random reads/writes:
  – R random keys read/written, spread over N clients.
• Sequential reads/writes:
  – Keys 0 to R-1 read/written, spread over N
    clients.

                                                        24
Bigtable


Performance evaluation
                     Scans show the best performance:
                a single RPC fetches a large sequence of values.




                                               25
Bigtable


Performance evaluation



                          Sequential reads outperform random
                           reads: each fetched 64 KB block is
                          cached and used to serve subsequent
                              requests.




                                                 26
Bigtable


Performance evaluation




                     Random reads show the worst performance:
                      each 1000-byte read transfers a 64 KB
                      SSTable block over the network.




                                              27
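
A rough back-of-the-envelope sketch of why this is expensive, using the 64 KB block size and 1000-byte values from the slide:

# Rough arithmetic sketch: bytes moved per useful byte read.
value_size = 1000          # bytes per read in the benchmark
block_size = 64 * 1024     # SSTable block transferred from GFS

# Random reads: each 1000-byte value pulls in a full 64 KB block.
random_overhead = block_size / value_size       # ~65.5x

# Sequential reads: one cached block serves many consecutive values.
values_per_block = block_size // value_size     # ~65 reads per fetch
sequential_overhead = block_size / (values_per_block * value_size)  # ~1.01x

print(round(random_overhead, 1), values_per_block, round(sequential_overhead, 2))
# 65.5 65 1.01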
Bigtable


Performance evaluation
    Aggregate throughput does not scale linearly
    with the number of servers, but it scales well.




                                    28
Bigtable


Outline
•   Introduction
•   Data model
•   Implementation
•   Performance evaluation
•   Conclusions




                              29
Bigtable


Conclusions
• Bigtable: highly scalable and available, without
  compromising performance.
• Flexibility for Google – designed using their
  own data model.
• Custom design gives Google the ability to
  remove or minimize bottlenecks.
• Related work:
  – Apache HBase (open source)
  – Boxwood (though targeted at a lower/FS level)

                                                     30
Bigtable

A Distributed Storage System for
         Structured Data

        Authors: Fay Chang et al.

        Presenter: Zafar Gilani

                                     31
Bigtable


B+ Trees
• A tree with sorted data for:
  – Efficient insertion, retrieval and removal of
    records.
• All records are stored at the leaf level; only
  keys are stored in interior nodes.




                                                     32
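
A minimal sketch of searching a small, prebuilt B+ tree, illustrating that interior nodes hold only keys while records live in the leaves (hypothetical structure, not Bigtable's code):

# Hypothetical static B+ tree: interior nodes hold separator keys and
# children; all records live in the leaves.
leaf1 = {"type": "leaf", "records": {1: "a", 5: "b"}}
leaf2 = {"type": "leaf", "records": {10: "c", 20: "d"}}
root  = {"type": "interior", "keys": [10], "children": [leaf1, leaf2]}

def search(node, key):
    while node["type"] == "interior":
        # Follow the first child whose separator key exceeds the search key.
        i = 0
        while i < len(node["keys"]) and key >= node["keys"][i]:
            i += 1
        node = node["children"][i]
    return node["records"].get(key)

print(search(root, 5), search(root, 10))  # b c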
