Big table presentation-final

Bigtable: A Distributed Storage System for Structured Data
Presenters: Yunming Zhang, Conglong Li
Saturday, September 21, 13

References
SoCC 2010 keynote slides, Jeff Dean, Google
Introduction to Distributed Computing, Winter 2008, University of Washington

Motivation
Lots of (semi) structured data at Google
URLs
Contents, crawl metadata, links
Per-user data:
User preference settings, search results
Scale is large
Billions of URLs, hundreds of millions of users
Existing commercial databases do not meet these requirements

Goals
Store and manage all the state reliably and efficiently
Allow asynchronous processes to update different pieces of data continuously
Very high read/write rates
Efficient scans over all or interesting subsets of data
Often want to examine data changes over time

BigTable vs. GFS
GFS provides raw data storage
We need:
More sophisticated storage
Key - value mapping
Flexible enough to be useful
Store semi-structured data
Reliable, scalable, etc.

BigTable
Bigtable is a distributed storage system for managing
large-scale structured data
Wide applicability
Scalability
High performance
High availability

Overview
Data Model
API
Implementation Structures
Optimizations
Performance Evaluation
Applications
Conclusions

Data Model
Sparse
Sorted
Multidimensional

Cell
Contains multiple versions of the data
A cell is located by row key, column key, and timestamp
Data is treated as an uninterpreted array of bytes, so
clients can serialize various forms of structured and
semi-structured data into it
Supports automatic garbage collection per column
family for management of versioned data
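
The cell model above can be pictured as a sorted map from (row key, column key, timestamp) to an uninterpreted byte string. A minimal sketch, assuming a hypothetical in-memory `VersionedTable` class and a keep-last-N garbage-collection setting per column family; the class, its method names, and the sample row key are illustrative and not Bigtable's actual API:

```python
from collections import defaultdict


class VersionedTable:
    """Toy model of the cell map: (row, column, timestamp) -> bytes."""

    def __init__(self, max_versions):
        # max_versions: hypothetical per-family GC setting, e.g. {"contents": 2}
        self.max_versions = max_versions
        # cells[(row, column)] holds (timestamp, value) pairs, newest first
        self.cells = defaultdict(list)

    def put(self, row, column, timestamp, value):
        versions = self.cells[(row, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        family = column.split(":", 1)[0]
        keep = self.max_versions.get(family)
        if keep is not None:
            del versions[keep:]  # garbage-collect the oldest versions

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if None)."""
        for ts, value in self.cells.get((row, column), []):
            if timestamp is None or ts <= timestamp:
                return value
        return None


table = VersionedTable(max_versions={"contents": 2})
table.put("com.cnn.www", "contents:", 1, b"<html>v1</html>")
table.put("com.cnn.www", "contents:", 2, b"<html>v2</html>")
table.put("com.cnn.www", "contents:", 3, b"<html>v3</html>")
latest = table.get("com.cnn.www", "contents:")
as_of_2 = table.get("com.cnn.www", "contents:", timestamp=2)
oldest = table.get("com.cnn.www", "contents:", timestamp=1)
```

With two versions kept for the `contents` family, the write at timestamp 1 has been collected, so a read as of timestamp 1 finds nothing.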

Row
Row key is an arbitrary string
Access to column data in a row is atomic
Row creation is implicit upon storing data
Rows ordered lexicographically
Rows close together lexicographically usually reside
on one or a small number of machines
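
Because rows are ordered lexicographically, the choice of row key decides which rows end up next to each other. A small sketch of one common trick, reversing hostname components so that pages from the same domain sort together; the helper name and the sample hosts are assumptions made for illustration:

```python
def reversed_host_key(host, path=""):
    """Build a row key with the hostname components reversed."""
    return ".".join(reversed(host.split("."))) + path


hosts = ["maps.google.com", "www.cnn.com", "mail.google.com", "edition.cnn.com"]
row_keys = sorted(reversed_host_key(h) for h in hosts)
for key in row_keys:
    print(key)
```

After sorting, the two cnn.com hosts are adjacent and the two google.com hosts are adjacent, so each domain is likely to be served from one or a few machines.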

Columns
Columns are grouped into column families
Column key: family:optional_qualifier
Column family
Has associated type information
Data stored in a family is usually of the same type
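
The family:optional_qualifier naming can be illustrated with a short parser. This is a sketch only; the function name and the sample column keys are hypothetical:

```python
def parse_column_key(column_key):
    """Split 'family:qualifier' into its parts; the qualifier may be empty."""
    family, _, qualifier = column_key.partition(":")
    if not family:
        raise ValueError(f"column key has no family: {column_key!r}")
    return family, qualifier


columns = ["anchor:cnnsi.com", "anchor:my.look.ca", "contents:"]
by_family = {}
for column in columns:
    family, qualifier = parse_column_key(column)
    by_family.setdefault(family, []).append(qualifier)
```

Grouping by family shows why the family, not the individual column, is the natural unit for settings such as garbage collection.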

API
Metadata operations
Create/delete tables, column families, change
metadata, modify access control list
Writes (atomic)
Set(), DeleteCells(), DeleteRow()
Reads
Scanner: read arbitrary cells in a BigTable
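
The write and read calls listed above can be mimicked with a toy in-memory table. This is a sketch under stated assumptions: `RowMutation`, `ToyTable`, and their methods are hypothetical Python stand-ins named after the operations on the slide, and a single lock stands in for per-row atomicity:

```python
import threading


class RowMutation:
    """Collects Set/DeleteCells operations for a single row."""

    def __init__(self, row_key):
        self.row_key = row_key
        self.ops = []

    def set(self, column, value):
        self.ops.append(("set", column, value))

    def delete_cells(self, column):
        self.ops.append(("delete", column, None))


class ToyTable:
    def __init__(self):
        self.rows = {}
        self._lock = threading.Lock()  # simplification: one lock for all rows

    def apply(self, mutation):
        """Apply every operation in the mutation as one atomic step."""
        with self._lock:
            row = dict(self.rows.get(mutation.row_key, {}))
            for op, column, value in mutation.ops:
                if op == "set":
                    row[column] = value
                else:
                    row.pop(column, None)
            self.rows[mutation.row_key] = row  # row created implicitly

    def delete_row(self, row_key):
        with self._lock:
            self.rows.pop(row_key, None)

    def scan(self, start_row="", end_row=None, family=None):
        """Yield (row, column, value) in row-key order, optionally one family."""
        for row_key in sorted(self.rows):
            if row_key < start_row:
                continue
            if end_row is not None and row_key >= end_row:
                continue
            for column in sorted(self.rows[row_key]):
                if family is None or column.split(":", 1)[0] == family:
                    yield row_key, column, self.rows[row_key][column]


t = ToyTable()
m = RowMutation("com.cnn.www")
m.set("anchor:www.c-span.org", b"CNN")
m.set("contents:", b"<html>...</html>")
m.delete_cells("anchor:www.abc.com")
t.apply(m)
anchors = list(t.scan(family="anchor"))
```

A reader never observes a row with only some of the mutation's operations applied, which is the property the "atomic" label on the slide refers to.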

Tablets
Large tables broken into tablets at row boundaries
Tablet holds contiguous range of rows
Clients can often choose row keys for locality
Aim for ~100MB to 200MB of data per tablet
Serving machine responsible for ~100 tablets
Fast recovery:
100 machines each pick up 1 tablet from the failed machine
Fine-grained load balancing:
Migrate tablets away from overloaded machine
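
Splitting a table into tablets at row boundaries, and finding the tablet for a row, can be sketched as follows. The function names, the row sizes, and the 150 MB target are assumptions chosen to sit inside the 100MB to 200MB range above:

```python
import bisect


def split_into_tablets(sorted_rows, target_mb):
    """sorted_rows: (row_key, size_mb) pairs in key order.

    Returns a list of tablets, each a contiguous list of row keys.
    """
    tablets, current, current_size = [], [], 0
    for key, size in sorted_rows:
        if current and current_size + size > target_mb:
            tablets.append(current)  # split only at a row boundary
            current, current_size = [], 0
        current.append(key)
        current_size += size
    if current:
        tablets.append(current)
    return tablets


def find_tablet(end_keys, row_key):
    """Index of the first tablet whose last row key is >= row_key."""
    return bisect.bisect_left(end_keys, row_key)


rows = [("a", 60), ("b", 60), ("c", 60), ("d", 60), ("e", 60)]
tablets = split_into_tablets(rows, target_mb=150)
end_keys = [tablet[-1] for tablet in tablets]
```

In this toy version a key past the last end key falls outside every tablet; a real system would treat the last tablet as open-ended.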

Tablets and Splitting

System Structure
Master
Metadata operations
Load balancing
Keep track of live tablet servers
Master failure
Tablet server
Accepts reads and writes to data

[Figures: System Structure diagrams, annotated with the read/write path and the metadata operations path]

Locating Tablets
3-level hierarchical lookup scheme for tablets
Location is the IP address and port of the tablet server, stored in META tables
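
The hierarchical lookup can be sketched as a chain of sorted indexes, each mapping a tablet's end row key to the next hop. Everything concrete here is an assumption for illustration: the addresses, the tablet names, the `"\uffff"` sentinel for "end of table", and the use of a plain variable for the well-known pointer to the root (a Chubby file in the Bigtable paper):

```python
import bisect


def lookup(index, key):
    """index: sorted (end_key, target) pairs. Return the target of the
    first entry whose end_key is >= key."""
    end_keys = [end for end, _ in index]
    i = bisect.bisect_left(end_keys, key)
    if i == len(index):
        raise KeyError(key)
    return index[i][1]


# Level 1: well-known pointer to the root tablet.
root_pointer = "root"

# Level 2: the root tablet maps row ranges to META tablets.
root_tablets = {"root": [("m", "meta-0"), ("\uffff", "meta-1")]}

# Level 3: META tablets map row ranges to tablet server locations.
meta_tablets = {
    "meta-0": [("g", "10.0.0.1:9000"), ("m", "10.0.0.2:9000")],
    "meta-1": [("t", "10.0.0.3:9000"), ("\uffff", "10.0.0.4:9000")],
}


def locate(row_key):
    meta_name = lookup(root_tablets[root_pointer], row_key)
    return lookup(meta_tablets[meta_name], row_key)


cnn_location = locate("com.cnn.www")
wiki_location = locate("org.wikipedia")
```

A client would normally cache the result so that most requests skip the lookup chain entirely.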

Tablet Representation and Serving
Append only tablet log
SSTable on GFS
A Sorted map of string to string
All the data for a row is stored contiguously, so a row
lookup reads one contiguous range
Memtable write buffer
Reads merge SSTable data with the recent writes still
held in the memtable
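
The write and read paths described above can be sketched with plain Python containers. The `Tablet` class is hypothetical: a list stands in for the append-only log on GFS and dictionaries stand in for SSTable files:

```python
class Tablet:
    def __init__(self):
        self.log = []        # append-only tablet log
        self.memtable = {}   # in-memory write buffer
        self.sstables = []   # immutable maps, oldest first

    def write(self, key, value):
        self.log.append((key, value))  # 1. record the mutation in the log
        self.memtable[key] = value     # 2. then apply it to the memtable

    def read(self, key):
        """Newest source wins: memtable first, then SSTables newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None

    def scan(self):
        """Merged, sorted view over all SSTables and the memtable."""
        merged = {}
        for sstable in self.sstables:
            merged.update(sstable)
        merged.update(self.memtable)
        return sorted(merged.items())


tablet = Tablet()
tablet.sstables.append({"a": b"1", "b": b"2"})
tablet.write("b", b"3")
tablet.write("c", b"4")
merged_view = tablet.scan()
```

The value for `b` comes from the memtable because it is newer than the copy in the SSTable.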

Compaction
Tablet state represented as a set of immutable compacted
SSTable files, plus tail of log
Minor compaction:
When the in-memory buffer fills up, freeze it and write
it out as a new SSTable
Major compaction:
Periodically compact all SSTables for tablet into new base
SSTable on GFS
Storage reclaimed from deletions at this point
Produces new SSTables
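
Minor and major compaction can be sketched on top of a memtable and a list of immutable maps. The class name, the entry-count limit, and the tombstone marker are assumptions made for the example:

```python
TOMBSTONE = object()  # marker written in place of a deleted value


class CompactingTablet:
    def __init__(self, memtable_limit):
        self.memtable_limit = memtable_limit  # hypothetical limit, in entries
        self.memtable = {}
        self.sstables = []  # immutable maps, oldest first

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def delete(self, key):
        self.write(key, TOMBSTONE)

    def minor_compaction(self):
        """Freeze the memtable and emit it as a new sorted SSTable."""
        if self.memtable:
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

    def major_compaction(self):
        """Merge everything into one base SSTable and drop deleted entries."""
        self.minor_compaction()
        merged = {}
        for sstable in self.sstables:  # later SSTables override earlier ones
            merged.update(sstable)
        base = {k: v for k, v in sorted(merged.items()) if v is not TOMBSTONE}
        self.sstables = [base]


t = CompactingTablet(memtable_limit=2)
t.write("a", b"1")
t.write("b", b"2")   # memtable full: first minor compaction
t.delete("a")
t.write("c", b"3")   # memtable full: second minor compaction
sstables_before = len(t.sstables)
t.major_compaction()
```

Only the major compaction removes the deleted key for good; until then the tombstone has to be kept so that it can hide the older value.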

Locality Groups
Clients can group multiple column families together
into a locality group
A separate SSTable is generated for each locality group
Enables more efficient reads
Can be declared to be in-memory
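
The effect of locality groups can be sketched by partitioning cells by column family before they are written out. The group names, the family names, and the function are hypothetical, and the sketch assumes every family belongs to exactly one group:

```python
def partition_by_locality_group(cells, groups):
    """cells: {(row, column): value}; groups: {group_name: [families]}.

    Returns one map per locality group, standing in for one SSTable each.
    """
    family_to_group = {f: g for g, families in groups.items() for f in families}
    out = {g: {} for g in groups}
    for (row, column), value in cells.items():
        family = column.split(":", 1)[0]
        out[family_to_group[family]][(row, column)] = value
    return out


groups = {"metadata": ["language", "checksum"], "content": ["contents"]}
cells = {
    ("com.cnn.www", "language:"): b"EN",
    ("com.cnn.www", "checksum:"): b"\x12\x34",
    ("com.cnn.www", "contents:"): b"<html>...</html>",
}
per_group = partition_by_locality_group(cells, groups)
```

A reader that only needs page metadata touches the small `metadata` map and never reads the page contents.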

Compression
Many opportunities for compression
Similar values in columns and cells
Within each SSTable for a locality group, encode
compressed blocks
Keep blocks small for random access
Exploit the fact that many values are very similar
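
Block-wise compression with random access can be sketched with Python's standard `zlib` module. Note that zlib is a stand-in chosen for the example, not the compression scheme Bigtable uses, and the block size, key format, and tab-separated row encoding are all assumptions:

```python
import bisect
import zlib

ROWS_PER_BLOCK = 4  # hypothetical; kept tiny so the example is easy to follow


def build_blocks(rows):
    """Pack sorted (key, value) rows into independently compressed blocks."""
    blocks = []
    for i in range(0, len(rows), ROWS_PER_BLOCK):
        chunk = rows[i:i + ROWS_PER_BLOCK]
        payload = "\n".join(f"{k}\t{v}" for k, v in chunk).encode("utf-8")
        blocks.append((chunk[0][0], zlib.compress(payload)))
    return blocks


def read_value(blocks, key):
    """Random access: decompress only the one block that may hold `key`."""
    first_keys = [first for first, _ in blocks]
    i = bisect.bisect_right(first_keys, key) - 1
    if i < 0:
        return None
    payload = zlib.decompress(blocks[i][1]).decode("utf-8")
    for line in payload.split("\n"):
        k, v = line.split("\t", 1)
        if k == key:
            return v
    return None


page = "<html><body>shared site template</body></html>"
rows = [(f"com.example/page{n:03d}", page) for n in range(16)]
blocks = build_blocks(rows)
raw_size = sum(len(k) + len(v) for k, v in rows)
compressed_size = sum(len(data) for _, data in blocks)
```

Smaller blocks mean less data to decompress per lookup, at the cost of a somewhat worse compression ratio, which is the trade-off behind keeping blocks small.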

Commit log and recovery
Single commit log file per tablet server
Reduces the number of concurrent file writes to GFS
Tablet recovery
Redo points in the log
Replay the logged operations since the last
persistent state
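
Recovery from a shared commit log can be sketched as a filtered replay. The `LogEntry` fields, the class name, and the sequence-number redo point are assumptions for illustration:

```python
from collections import namedtuple

LogEntry = namedtuple("LogEntry", "seq tablet key value")


class TabletServerLog:
    """One commit log shared by every tablet on a server."""

    def __init__(self):
        self.entries = []
        self.next_seq = 0

    def append(self, tablet, key, value):
        entry = LogEntry(self.next_seq, tablet, key, value)
        self.entries.append(entry)
        self.next_seq += 1
        return entry.seq


def recover_tablet(log, tablet, persisted_state, redo_point):
    """Rebuild a tablet by replaying its log entries from the redo point
    on top of the state already persisted in SSTables."""
    state = dict(persisted_state)
    for entry in log.entries:
        if entry.tablet == tablet and entry.seq >= redo_point:
            state[entry.key] = entry.value
    return state


log = TabletServerLog()
log.append("T1", "a", b"1")   # seq 0, already persisted in an SSTable
log.append("T2", "x", b"9")   # seq 1, belongs to another tablet
log.append("T1", "b", b"2")   # seq 2
log.append("T1", "a", b"3")   # seq 3
recovered = recover_tablet(log, "T1", {"a": b"1"}, redo_point=1)
```

Entries for other tablets are interleaved in the same file, so the recovering server has to skip them; that is the cost of having a single log per server.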

Performance evaluation
Test Environment
Based on a GFS cell with 1786 machines
Two 400 GB IDE hard drives in each machine
Two-level tree-shaped switched network
Performance Tests
Random Read/Write
Sequential Read/Write

Single tablet-server performance
Random reads are the slowest
Each read transfers a 64 KB SSTable block from GFS to return a 1000-byte value
Random and sequential writes perform better
The tablet server appends all writes to a single commit log
Group commit
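
The cost of random reads follows from simple arithmetic on the block and value sizes above. In this sketch the read rate of 1200 per second is an assumed figure used only to show the calculation:

```python
BLOCK_BYTES = 64 * 1024   # SSTable block fetched for every random read
VALUE_BYTES = 1000        # bytes the client actually asked for

# How many bytes cross the network per useful byte returned.
amplification = BLOCK_BYTES / VALUE_BYTES


def network_mib_per_second(reads_per_second):
    """Data pulled from GFS, in MiB/s, at a given random-read rate."""
    return reads_per_second * BLOCK_BYTES / (1024 * 1024)


assumed_rate = 1200  # reads per second, illustrative only
load = network_mib_per_second(assumed_rate)
print(f"amplification: {amplification:.1f}x, network load: {load:.0f} MiB/s")
```

Each random read moves roughly 65 times more data than it returns, which is why a modest read rate is enough to saturate a server's network link.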

Performance Scaling
Performance didn’t scale linearly
Load imbalance in multiple server configurations
Larger data transfer overhead

Google Analytics
A service that analyzes traffic patterns at web sites
Raw Click Table
Row for each end-user session
Row key is (website name, time)
Summary Table
Extracts recent session data using MapReduce jobs
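
A (website name, time) row key keeps each site's sessions contiguous and in time order. A minimal sketch, assuming a hypothetical `#` separator and a zero-padded epoch timestamp so that lexicographic order matches chronological order:

```python
def click_row_key(website, session_start_epoch):
    """Compose a raw-click row key from a site name and a session start time."""
    return f"{website}#{session_start_epoch:010d}"


keys = sorted([
    click_row_key("shop.example", 1700000300),
    click_row_key("blog.example", 1700000100),
    click_row_key("shop.example", 1700000100),
])
shop_sessions = [k for k in keys if k.startswith("shop.example#")]
```

A job that summarizes one site can then scan a single contiguous key range instead of the whole table.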

Google Earth
Use one table for preprocessing and one for serving
Different latency requirements (disk vs memory)
Each row in the imagery table represents a single
geographic segment
Column family to store data source
One column for each raw image
Very sparse

Personalized Search
Row key is a unique userid
A column family for each type of user action
Replicated across Bigtable clusters to increase
availability and reduce latency

Conclusions
Bigtable provides highly scalable, high-performance,
highly available, and flexible storage for structured
data
It provides a low-level read/write interface for
other frameworks to build on
It has enabled Google to deal with large-scale data
efficiently
