Summary of "Google's Big Table" at nosql summer reading in Tokyo
 


These are the summary materials for "Google's Big Table", presented at the nosql summer reading in Tokyo on July 22, 2010, hosted by Gemini.

    Presentation Transcript

    • Bigtable: A Distributed Storage System for Structured Data
      Chang, et al., 2006.
      Gemini Mobile Technologies, Inc.
      NOSQL Tokyo Reading Group
      (http://nosqlsummer.org/city/tokyo)
      July 22, 2010
      2010/7/23
      Gemini Mobile Technologies, Inc.
    • Bigtable: A Distributed Storage System for Structured Data
      Authors: Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber
      Abstract: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
      Appeared in: OSDI'06: Seventh Symposium on Operating System Design and Implementation, Seattle, WA, November, 2006.
      http://labs.google.com/papers/bigtable.html
      Gemini Mobile Technologies, Inc. All rights reserved.
    • 1. Introduction
      “Big Table” is a distributed storage system for managing structured data.
      Scales to “Petabytes of data and thousands of machines”.
      Developed and in use at Google since 2005. Used for more than 60 Google products.
    • 2. Data Model
      (row, column, time) => string
      Row, column, value are arbitrary strings.
      Every read or write of data under a single row key is atomic (regardless of the number of different columns being read or written in the row).
      Columns are dynamically added.
      Timestamps for different versions of data.
      Assigned by Bigtable or by the client application.
      Older versions are garbage-collected.
      Example: Webtable
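The data model above can be sketched as a small in-memory map. This is only an illustration, not Bigtable's implementation: the class name `TinyTable`, the `max_versions` setting, and the sample row and values are assumptions made for this example.

```python
from collections import defaultdict


class TinyTable:
    """Toy sparse map: (row, column, timestamp) -> value (hypothetical class)."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row -> column -> list of (timestamp, value), kept newest first
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        versions = self.rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        # Garbage-collect everything except the newest max_versions entries.
        del versions[self.max_versions:]

    def get(self, row, column, timestamp=None):
        """Return the newest value at or before timestamp (newest overall if None)."""
        for ts, value in self.rows.get(row, {}).get(column, []):
            if timestamp is None or ts <= timestamp:
                return value
        return None


table = TinyTable(max_versions=2)
for ts in (3, 5, 6):
    table.put("com.cnn.www", "contents:", ts, "<html>v%d" % ts)

latest = table.get("com.cnn.www", "contents:")        # newest version
older = table.get("com.cnn.www", "contents:", 5)      # version at timestamp 5
collected = table.get("com.cnn.www", "contents:", 3)  # None: version 3 was collected
```

Columns are created on first write, which mirrors "columns are dynamically added"; keeping only the newest N versions is one possible garbage-collection policy.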
    • 2.1 Tablets
      Rows are sorted lexicographically.
      Consecutive keys are grouped together as “tablets”.
      Allows data locality.
      Example rows: com.google.maps/index.html and com.google.maps/foo.html are likely to be in the same tablet.
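The row keys in the example are URLs with the hostname components reversed, so that pages from one domain sort next to each other. A minimal sketch of that key scheme follows; the helper name `row_key` and the sample URLs (written without a scheme) are assumptions for illustration.

```python
def row_key(url):
    """Reverse the hostname components so pages of one domain sort together."""
    host, _, path = url.partition("/")
    return ".".join(reversed(host.split("."))) + "/" + path


urls = [
    "maps.google.com/index.html",
    "www.cnn.com/index.html",
    "maps.google.com/foo.html",
    "www.google.com/",
]

# Lexicographic order places the two maps.google.com pages side by side.
keys = sorted(row_key(u) for u in urls)
for key in keys:
    print(key)
```

Because tablets hold consecutive keys, adjacent keys like these usually end up on the same tablet server.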
    • 2.2 Column Families
      Column keys are grouped into sets called “column families”.
      Column key is named using syntax: family:qualifier
      Access control and disk/memory accounting are at column family level
      Example: “anchor:cnnsi.com”
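A short sketch of the family:qualifier syntax with access control applied per column family. The `READERS` table, the user names, and both function names are hypothetical; Bigtable's real access control is not described at this level in the slides.

```python
def split_column_key(column_key):
    """Split 'family:qualifier' into its two parts; the qualifier may be empty."""
    family, sep, qualifier = column_key.partition(":")
    if not sep or not family:
        raise ValueError("expected 'family:qualifier', got %r" % column_key)
    return family, qualifier


# Hypothetical access control list, kept per column family (not per column).
READERS = {
    "anchor": {"crawler", "indexer"},
    "contents": {"crawler"},
}


def can_read(user, column_key):
    family, _ = split_column_key(column_key)
    return user in READERS.get(family, set())


print(split_column_key("anchor:cnnsi.com"))
print(can_read("indexer", "anchor:cnnsi.com"))
print(can_read("indexer", "contents:"))
```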
    • 3. API
      Data Design
      Creating/deleting tables and column families
      Changing cluster, table and column family metadata like access control rights
      Client Interactions
      Write/Delete values
      Read values
      Scan row ranges
      Single-row transactions (e.g., read/modify/write sequence for data under a row key)
      Map/Reduce integration.
      Read from Big Table; Write to Big Table.
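The single-row transaction above can be illustrated with a per-row lock. This is a sketch under stated assumptions: `RowStore` and its methods are invented for this example and are not the Bigtable client API (which the paper presents in C++).

```python
import threading
from collections import defaultdict


class RowStore:
    """Toy store with atomic read-modify-write per row key (hypothetical API)."""

    def __init__(self):
        self._rows = defaultdict(dict)
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # protects the two dictionaries above

    def _row(self, row):
        with self._guard:
            return self._locks[row], self._rows[row]

    def read(self, row, column):
        lock, cells = self._row(row)
        with lock:
            return cells.get(column)

    def read_modify_write(self, row, column, update):
        """Apply update(old_value) atomically with respect to this row only."""
        lock, cells = self._row(row)
        with lock:
            new_value = update(cells.get(column))
            cells[column] = new_value
            return new_value

    def scan(self, start_row, end_row):
        """Return rows with start_row <= key < end_row, in key order."""
        with self._guard:
            keys = sorted(k for k in self._rows if start_row <= k < end_row)
        return [(k, dict(self._rows[k])) for k in keys]


store = RowStore()
store.read_modify_write("com.cnn.www", "hits:", lambda old: (old or 0) + 1)
store.read_modify_write("com.cnn.www", "hits:", lambda old: (old or 0) + 1)
print(store.read("com.cnn.www", "hits:"))
```

Atomicity here stops at the row boundary: a scan over several rows is not a transaction, which matches the "single-row" scope on the slide.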
    • 4. Building Blocks
      SSTable file: Data structure for storage
      Maps keys to values
      Ordered. Enables data locality for efficient writes/reads.
      Immutable. On reads, no concurrency control needed. Need to garbage collect deleted data.
      Stored in Google File System (GFS), and optionally can be mapped into memory.
      GFS replicates data for redundancy.
      Chubby: Distributed lock service.
      Stores the location of the root tablet, schema info, and access control lists
      Discovers tablet servers and detects when they die
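To make the SSTable properties above concrete (ordered, immutable, maps keys to values), here is a minimal in-memory sketch. The class and method names are assumptions; the real SSTable is a block-structured file format stored in GFS, which this example does not model.

```python
import bisect


class SSTable:
    """Immutable, sorted key -> value table (in-memory stand-in for a file)."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())
        # Tuples are never modified after construction.
        self._keys = tuple(k for k, _ in pairs)
        self._values = tuple(v for _, v in pairs)

    def get(self, key, default=None):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return default

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[lo:hi], self._values[lo:hi]))


sst = SSTable({"com.google.maps/index.html": "B",
               "com.cnn.www/index.html": "A",
               "org.example/": "C"})
print(sst.get("com.cnn.www/index.html"))
print(sst.scan("com.", "com/"))
```

Since nothing mutates the table after construction, concurrent readers need no locking, and a range scan touches only a contiguous slice of keys.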
    • 5. Implementation
      3 components:
      Client library
      Master Server (exactly 1).
      Assigns tablets to tablet servers.
      Detects the addition and expiration of tablet servers.
      Balances tablet-server load.
      Garbage-collects files in GFS.
      Handles schema changes such as table and column family creation.
      Tablet Servers (multiple, dynamically added/removed)
      Handle read and write requests to the tablets they have loaded.
      Split tablets that have grown too large. Each tablet is 100-200 MB.
    • 5.1 Tablet Location
      How does a client know which node to route a request to?
      3-level hierarchy
      One file in Chubby for location of Root Tablet
      Root tablet contains location of Metadata tablets
      Metadata table contains location of user tablets
      Row: [Tablet’s Table ID] + [End Row]
      Value: [Node ID]
      Client library caches tablet locations.
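Because each metadata row is keyed by table ID plus the tablet's end row, finding the tablet for a row key is a search for the first entry whose end row is at or after that key. A sketch under assumptions: `MetadataIndex`, the node names, and the `"\uffff"` sentinel marking the last tablet of a table are all hypothetical.

```python
import bisect

LAST = "\uffff"  # hypothetical sentinel end row for the last tablet of a table


class MetadataIndex:
    """Maps (table_id, end_row) -> node_id, searched by binary search."""

    def __init__(self, entries):
        # entries: iterable of (table_id, end_row, node_id)
        self._entries = sorted(entries)
        self._keys = [(table_id, end_row) for table_id, end_row, _ in self._entries]

    def locate(self, table_id, row):
        """Return the node serving the tablet whose range contains row."""
        i = bisect.bisect_left(self._keys, (table_id, row))
        if i < len(self._keys) and self._keys[i][0] == table_id:
            return self._entries[i][2]
        raise KeyError("no tablet for table %r" % table_id)


index = MetadataIndex([
    ("webtable", "com.cnn.www", "node-a"),
    ("webtable", "com.google.maps", "node-b"),
    ("webtable", LAST, "node-c"),
    ("users", LAST, "node-d"),
])

print(index.locate("webtable", "com.example"))
print(index.locate("webtable", "org.wikipedia"))
```

A client library would typically keep the result of `locate` in a local cache and only walk the hierarchy again when a cached location turns out to be stale.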
    • 5.2 Tablet Assignment
      Master keeps track of tablet assignment and live servers
      Chubby
      Tablet server creates & locks a unique file.
      Tablet server stops serving if it loses its lock.
      Master periodically checks tablet servers. If a check fails, the master tries to lock the server's file; if it succeeds, it un-assigns that server's tablets.
      Master failure does not change tablet assignments.
      Master restart
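The lock-based liveness check can be sketched with an in-memory stand-in for the lock service. Everything here is an assumption made for illustration: `LockService` is not the Chubby API, and the `servers/<name>` file naming, the `"master"` holder name, and the `reports_lock_held` flag are hypothetical.

```python
class LockService:
    """In-memory stand-in for a lock service: file name -> current holder."""

    def __init__(self):
        self.holders = {}

    def try_lock(self, name, who):
        if self.holders.get(name, who) == who:
            self.holders[name] = who
            return True
        return False

    def release(self, name, who):
        if self.holders.get(name) == who:
            del self.holders[name]


class Master:
    def __init__(self, locks):
        self.locks = locks
        self.assignments = {}    # tablet -> server
        self.unassigned = set()

    def check_server(self, server, reports_lock_held):
        """Return True if the server is still treated as live."""
        if reports_lock_held:
            return True
        lock_file = "servers/" + server
        if not self.locks.try_lock(lock_file, "master"):
            # Assumption: the server still holds its lock, so leave it alone.
            return True
        # The master got the lock, so the server can no longer be serving.
        self.locks.release(lock_file, "master")
        for tablet, owner in list(self.assignments.items()):
            if owner == server:
                del self.assignments[tablet]
                self.unassigned.add(tablet)
        return False


locks = LockService()
locks.try_lock("servers/ts1", "ts1")
locks.try_lock("servers/ts2", "ts2")
master = Master(locks)
master.assignments = {"tablet-1": "ts1", "tablet-2": "ts2"}

locks.release("servers/ts1", "ts1")   # simulate ts1 losing its lock
ts1_live = master.check_server("ts1", reports_lock_held=False)
ts2_live = master.check_server("ts2", reports_lock_held=False)
print(ts1_live, ts2_live, master.unassigned)
```

Note that the assignments live with the tablet servers' locks rather than only in the master's memory, which is why losing the master by itself does not move any tablet.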
    • 5.3 Tablet Serving
      Write
      Check well-formedness of request.
      Check authorization in Chubby file.
      Write to “tablet log” (i.e., a transaction log for “redo” in case of failure).
      Write to memtable (RAM).
      Separately, “compaction” moves memtable data to an SSTable and truncates the tablet log.
      Read
      Check well-formedness of request.
      Check authorization in Chubby file.
      Merge memtable and SSTables to find data.
      Return data.
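The write and read paths above can be sketched in a few lines. This is a simplified illustration, not the tablet server's implementation: the `Tablet` class, the `memtable_limit` threshold, and plain dictionaries standing in for SSTables are assumptions, and the authorization check is omitted.

```python
class Tablet:
    """Toy tablet: commit log + memtable + a list of frozen SSTables."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.commit_log = []   # redo log of writes not yet in an SSTable
        self.memtable = {}     # recent writes, held in RAM
        self.sstables = []     # oldest first; each is treated as read-only

    def write(self, key, value):
        if not isinstance(key, str) or not key:
            raise ValueError("malformed request: key must be a non-empty string")
        self.commit_log.append((key, value))   # 1. log for redo
        self.memtable[key] = value             # 2. apply in memory
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self):
        """Freeze the memtable as a new SSTable and truncate the log."""
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()

    def read(self, key):
        """Merged view: memtable first, then SSTables from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            if key in sstable:
                return sstable[key]
        return None

    def recover(self):
        """After a crash, rebuild the memtable by replaying the commit log."""
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


tablet = Tablet(memtable_limit=2)
tablet.write("a", "1")
tablet.write("b", "2")    # reaches the limit and triggers a minor compaction
tablet.write("a", "3")    # the newer value shadows the one in the SSTable
print(tablet.read("a"), tablet.read("b"), tablet.read("c"))
```

Writes only append to the log and update memory, while reads may have to consult several structures, which is the asymmetry the later performance slides refer to.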
    • 5.4 Compaction
      In order to control size of memtable, tablet log, and SSTable files, “compaction” is used.
      Minor Compaction. Move data from memtable to SSTable. Truncate tablet log.
      Merging Compaction. Merge multiple SSTables and the memtable into a single new SSTable.
      Major Compaction. Rewrite all SSTables into exactly one SSTable and remove deleted data.
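The difference between a merging and a major compaction shows up in how deletion markers are handled. A minimal sketch, assuming dictionaries stand in for SSTables and a `TOMBSTONE` object marks a deleted key (both are illustrative choices, not the on-disk format):

```python
TOMBSTONE = object()  # hypothetical marker meaning "this key was deleted"


def merging_compaction(tables):
    """Merge some tables (oldest first) into one; newer entries win.

    Tombstones are kept, because older tables outside this merge may
    still contain the deleted key.
    """
    merged = {}
    for table in tables:
        merged.update(table)
    return dict(sorted(merged.items()))


def major_compaction(all_tables):
    """Merge every table into one and drop deleted data entirely."""
    merged = merging_compaction(all_tables)
    return {k: v for k, v in merged.items() if v is not TOMBSTONE}


older = {"a": 1, "b": 2}
newer = {"a": TOMBSTONE, "c": 3}

partial = merging_compaction([older, newer])
full = major_compaction([older, newer])
print(sorted(partial), sorted(full))
```

Only the major compaction can discard the marker safely, because it is the only one that has seen every table that might still hold the deleted key.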
    • 6. Refinements
      “Locality group”.
      Client can group multiple column families into a locality group. Enables more efficient reads since each locality group is a separate SSTable.
      Compression.
      Client can choose to compress at locality group level.
      Two-level caching in tablet servers
      Scan cache (K/V pairs)
      Block cache (SSTable blocks read from GFS)
      Bloom filter
      Efficiently checks whether an SSTable might contain data for a row/column pair.
      Commit log implementation
      Each tablet server has a single commit log (not one-per-tablet).
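A Bloom filter answers "might this SSTable contain the key?" with no false negatives, so a negative answer lets a read skip that SSTable entirely. The sketch below is a generic Bloom filter, assuming SHA-256 based hashing and arbitrary sizes; the parameters and hashing scheme Bigtable actually uses are not given in the slides.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(("%d:%s" % (i, key)).encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


bloom = BloomFilter()
bloom.add("com.cnn.www/anchor:cnnsi.com")

# True for every key that was added; usually False for keys that were not.
print(bloom.might_contain("com.cnn.www/anchor:cnnsi.com"))
print(bloom.might_contain("org.example/contents:"))
```

A "maybe" still requires reading the SSTable, so the filter saves work only for lookups of rows or columns that do not exist in that SSTable.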
    • 7. Performance Evaluation
      Random reads are slowest. Each one must fetch an SSTable block from GFS.
      Writes are faster than reads. Commit log is append-only. Reads require merging of SSTables and memtable.
      Scans reduce the number of read operations, since one request returns many values.
    • 7. Performance Evaluation: Scaling
      Scaling is not linear, but throughput holds up reasonably well up to 250 tablet servers.
      Random reads scale worst. Block transfers saturate the network.
    • 8. Conclusions
      Satisfies goals of high-availability, high-performance, massively scalable data storage.
      API. Successfully used by various Google products (>60).
      Additional features in progress:
      Secondary indexes
      Cross data center replication.
      Deploy as a hosted service.
      Advantages of the custom development:
      Significant flexibility due to own data model.
      Can remove bottlenecks and inefficiencies as they arise.
    • Big Table Family Tree
      Non-relational DBs (HBase, Cassandra, MongoDB, etc.)
      Column-oriented data model.
      Multi-level storage (commit log, RAM table, SSTable)
      Tablet management (assignment, splitting, recovery, GC, Bloom filters)
      Google related technologies and open-source equivalents
      GFS => Hadoop Distributed File System (HDFS)
      Chubby => ZooKeeper
      Map/Reduce => Hadoop MapReduce