Outline
                Introduction
                      Design
             Implementation
                     Results
                 Conclusions




Bigtable: A Distributed Storage System for
             Structured Data

                 Alvanos Michalis


                    April 6, 2009




            Alvanos Michalis   Bigtable: A Distributed Storage System for Structured Data
Outline

1   Introduction
       Motivation
2   Design
       Data model
3   Implementation
       Building blocks
       Tablets
       Compactions
       Refinements
4   Results
       Hardware Environment
       Performance Evaluation
5   Conclusions
       Real applications
       Lessons
       End


Google!

     Many different kinds of data
          Crawling system: URLs, contents, links, anchors, PageRank, etc.
          Per-user data: preferences, recent queries/search history
          Geographic data, images, etc.
     Many incoming requests
     No commercial system is big enough
          Scale is too large for commercial databases
          May not run on their commodity hardware
          No dependence on other vendors
          Optimizations
          Better Price/Performance
          Building internally means the system can be applied across
          many projects for low incremental cost



Google goals



     Fault-tolerant, persistent
     Scalable
          1000s of servers
          Millions of reads/writes per second, efficient scans
     Self-managing
     Simple!






Bigtable




  Definition
  A Bigtable is a sparse, distributed, persistent multidimensional
  sorted map.

  The map is indexed by a row key, column key, and a timestamp;
  each value in the map is an uninterpreted array of bytes.

  (row:string, column:string, time:int64) -> string
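
  A minimal sketch of this map in Python, assuming an in-memory dict
  stands in for the distributed, persistent store. The class name
  TinyTable, its methods, and the sample row and column keys are
  hypothetical illustrations, not Bigtable's API.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, int]  # (row, column, timestamp)


class TinyTable:
    """Toy in-memory model of (row, column, time) -> bytes."""

    def __init__(self) -> None:
        self._cells: Dict[Key, bytes] = {}

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        self._cells[(row, column, ts)] = value

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[bytes]:
        """Return the value at `ts`, or the newest version if `ts` is None."""
        if ts is not None:
            return self._cells.get((row, column, ts))
        versions = [k for k in self._cells if k[0] == row and k[1] == column]
        if not versions:
            return None
        return self._cells[max(versions, key=lambda k: k[2])]

    def scan(self) -> List[Key]:
        """Keys sorted by row, then column, then newest timestamp first."""
        return sorted(self._cells, key=lambda k: (k[0], k[1], -k[2]))


table = TinyTable()
table.put("com.cnn.www", "contents:", 3, b"<html>v3")
table.put("com.cnn.www", "contents:", 5, b"<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
```

  The map is sparse because only cells that were written occupy a key;
  a row has no fixed set of columns.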


Rows




       The row keys in a table are arbitrary strings
       Every read or write of data under a single row key is atomic
       Bigtable maintains data in lexicographic order by row key
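
       Because rows are kept in lexicographic order, a scan over a
       key range touches only adjacent rows. The sketch below is an
       illustration under assumptions: the row keys and the helper
       row_range are hypothetical, and a sorted Python list stands in
       for the table.

```python
import bisect

# Hypothetical row keys, kept in lexicographic order.
rows = sorted([
    "com.cnn.www",
    "com.cnn.money",
    "org.apache.hbase",
    "com.google.maps",
])


def row_range(sorted_rows, start, end):
    """Return the rows with start <= key < end using binary search."""
    lo = bisect.bisect_left(sorted_rows, start)
    hi = bisect.bisect_left(sorted_rows, end)
    return sorted_rows[lo:hi]


# "/" sorts immediately after ".", so this range covers every key
# that begins with "com.cnn.".
cnn_rows = row_range(rows, "com.cnn.", "com.cnn/")
```

       Choosing row keys so that related rows share a prefix keeps
       them contiguous and makes such range reads cheap.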




Column Families




     Column keys are grouped into sets called column families
     All data stored in a column family is usually of the same type
     A column family must be created before data can be stored
     under any column key in that family
     A column key is named using the following syntax:
     family:qualifier
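
     A small sketch of the family:qualifier rule, assuming a
     hypothetical Schema class that tracks which families exist. The
     names split_column_key, Schema, create_family, and check are
     illustrative only.

```python
def split_column_key(column_key):
    """Split 'family:qualifier' at the first colon."""
    family, sep, qualifier = column_key.partition(":")
    if not sep or not family:
        raise ValueError(f"expected family:qualifier, got {column_key!r}")
    return family, qualifier


class Schema:
    """Tracks the column families that have been created."""

    def __init__(self):
        self.families = set()

    def create_family(self, name):
        self.families.add(name)

    def check(self, column_key):
        """Raise KeyError if the key's family has not been created yet."""
        family, _ = split_column_key(column_key)
        if family not in self.families:
            raise KeyError(f"column family {family!r} does not exist")


schema = Schema()
schema.create_family("anchor")
schema.check("anchor:cnnsi.com")  # passes: the family was created first
```

     The qualifier may be empty or arbitrary; only the family has to
     exist before data is written.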


Timestamps


     Each cell in a Bigtable can contain multiple versions of the
     same data; these versions are indexed by timestamp (64-bit
     integers).
     Applications that need to avoid collisions must generate
     unique timestamps themselves.
     To make the management of versioned data less onerous,
     Bigtable supports two per-column-family settings that tell it to
     garbage-collect cell versions automatically: keep only the last
     n versions, or keep only versions that are new enough.
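
     The Bigtable paper describes the two garbage-collection settings
     as "keep only the last n versions" and "keep only new-enough
     versions". A minimal sketch of both, assuming the versions of one
     cell are held in a dict from timestamp to value; the function
     name gc_versions and its parameters are hypothetical.

```python
def gc_versions(versions, keep_last=None, min_timestamp=None):
    """Return the versions of one cell that survive garbage collection.

    versions: dict mapping timestamp -> value
    keep_last: keep at most this many of the newest versions
    min_timestamp: drop versions older than this timestamp
    """
    kept = sorted(versions, reverse=True)  # newest first
    if min_timestamp is not None:
        kept = [ts for ts in kept if ts >= min_timestamp]
    if keep_last is not None:
        kept = kept[:keep_last]
    return {ts: versions[ts] for ts in kept}


cell = {1: b"a", 2: b"b", 3: b"c", 4: b"d"}
newest_two = gc_versions(cell, keep_last=2)
recent_only = gc_versions(cell, min_timestamp=3)
```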






Infrastructure


      Google WorkQueue (scheduler)
      GFS: large-scale distributed file system
          Master: responsible for metadata
          Chunk servers: responsible for reading and writing large
          chunks of data
          Chunks replicated on 3 machines; the master is responsible
          for ensuring that replicas exist
      Chubby: lock/file/name service
          Coarse-grained locks; can store a small amount of data in a lock
          5 replicas; a majority must be up for the service to be active






SSTable




     Lives in GFS
     Immutable, sorted file of key-value pairs
     Chunks of data plus an index
     Index is of block ranges, not values
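
     A toy sketch of the idea that the index covers block ranges
     rather than individual values: the index stores only the last
     key of each block, so a lookup binary-searches the index and then
     reads a single block. The class TinySSTable and the block_size
     parameter are assumptions for illustration; real SSTables live
     in GFS and have their own on-disk format.

```python
import bisect


class TinySSTable:
    """Immutable, sorted key-value file modeled as in-memory blocks."""

    def __init__(self, items, block_size=2):
        pairs = sorted(items.items())
        # Each block is a tuple of (key, value) pairs, never modified later.
        self.blocks = tuple(
            tuple(pairs[i:i + block_size])
            for i in range(0, len(pairs), block_size)
        )
        # The index holds one entry per block: the block's last key.
        self.index = [block[-1][0] for block in self.blocks]
        self.block_reads = 0

    def get(self, key):
        i = bisect.bisect_left(self.index, key)
        if i == len(self.index):
            return None  # key is past the last block: no block read needed
        self.block_reads += 1  # stands in for one disk read
        for k, v in self.blocks[i]:
            if k == key:
                return v
        return None


sst = TinySSTable({"a": 1, "b": 2, "c": 3, "d": 4, "e": 5})
```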



Tablet Design




     Large tables broken into tablets at row boundaries
         Tablets hold contiguous rows
         Approx. 100-200 MB of data per tablet
     Approx. 100 tablets per machine
         Fast recovery
         Load-balancing
     Built out of multiple SSTables


Tablet Location




      Like a B+-tree, but fixed at 3 levels
      How can we avoid creating a bottleneck at the root?
          Aggressively cache tablet locations
          Lookup starts from the cached leaf location (bet on it being
          correct); walk back up the hierarchy on a miss
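
      A simplified sketch of the cached lookup, under stated
      assumptions: the Chubby file that points at the root is
      omitted, each level is a sorted map from a tablet's end row key
      to its child, and all names (RangeMap, Client, ts-1, ...) are
      hypothetical.

```python
import bisect

LAST = "\uffff"  # sentinel end key for the last tablet


class RangeMap:
    """Sorted map from each tablet's inclusive end row key to a value."""

    def __init__(self, start, entries):
        self.start = start  # exclusive lower bound of the first range
        self.ends = sorted(entries)
        self.values = [entries[end] for end in self.ends]

    def find(self, row):
        i = bisect.bisect_left(self.ends, row)
        start = self.start if i == 0 else self.ends[i - 1]
        return start, self.ends[i], self.values[i]


# Two METADATA tablets below one root tablet.
meta1 = RangeMap("", {"f": "ts-1", "m": "ts-2"})
meta2 = RangeMap("m", {"t": "ts-3", LAST: "ts-4"})
root = RangeMap("", {"m": meta1, LAST: meta2})


class Client:
    def __init__(self, root):
        self.root = root
        self.cache = []  # (start_exclusive, end_inclusive, server)
        self.metadata_reads = 0

    def locate(self, row):
        # Bet on the cache first.
        for start, end, server in self.cache:
            if start < row <= end:
                return server
        # Miss: consult the root, then the METADATA tablet.
        _, _, metadata_tablet = self.root.find(row)
        start, end, server = metadata_tablet.find(row)
        self.metadata_reads += 2
        self.cache.append((start, end, server))
        return server

    def invalidate(self, row):
        """Drop a stale entry, e.g. after a server rejects the request."""
        self.cache = [e for e in self.cache if not (e[0] < row <= e[1])]


client = Client(root)
```

      Only cache misses reach the upper levels, which is how the root
      avoids becoming a bottleneck.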


Tablet Assignment
     Each tablet is assigned to one tablet server at a time. The
     master keeps track of the set of live tablet servers, and the
     current assignment of tablets to tablet servers.
     Bigtable uses Chubby to keep track of tablet servers. When a
     tablet server starts, it creates, and acquires an exclusive lock
     on, a uniquely-named file in a specific Chubby directory.
     A tablet server stops serving its tablets if it loses its exclusive lock.
     The master is responsible for detecting when a tablet server is
     no longer serving its tablets, and for reassigning those tablets
     as soon as possible.
     When a master is started by the cluster management system,
     it needs to discover the current tablet assignments before it
     can change them.


Serving a Tablet




      Updates are logged
      Each SSTable corresponds to a batch of updates or a
      snapshot of the tablet taken at some earlier time
      Memtable (sorted by key) caches recent updates
      Reads consult both memtable and SSTables
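
      The write and read paths above can be sketched as follows,
      assuming plain Python containers stand in for the commit log,
      the memtable, and the SSTables. The class TinyTablet and its
      method names are hypothetical.

```python
class TinyTablet:
    def __init__(self):
        self.log = []       # append-only commit log
        self.memtable = {}  # recent updates, read in sorted order on flush
        self.sstables = []  # immutable snapshots, newest first

    def write(self, key, value):
        self.log.append((key, value))  # log first, so the update survives a crash
        self.memtable[key] = value

    def flush(self):
        """Freeze the memtable into a new SSTable and start a fresh one."""
        self.sstables.insert(0, dict(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        # The memtable holds the newest data, so consult it first.
        if key in self.memtable:
            return self.memtable[key]
        # Then consult SSTables from newest to oldest.
        for sstable in self.sstables:
            if key in sstable:
                return sstable[key]
        return None


tablet = TinyTablet()
tablet.write("row1", "old")
tablet.flush()
tablet.write("row1", "new")
tablet.write("row2", "x")
```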


Compactions

  As write operations execute, the size of the memtable increases.
      Minor compaction: converts the memtable into an SSTable
           Reduce memory usage
           Reduce log traffic on restart
      Merging compaction
           Periodically executed in the background
           Reduce number of SSTables
           Good place to apply a policy such as "keep only N versions"
      Major compaction
           Merging compaction that results in only one SSTable
           No deletion records, only live data
           Reclaim resources.
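
  A minimal sketch of the three compaction kinds, assuming SSTables
  are modeled as dicts held newest first and deletions are recorded
  with a TOMBSTONE marker. All names here are hypothetical.

```python
TOMBSTONE = object()  # deletion record


def minor_compaction(memtable, sstables):
    """Write the memtable out as a new SSTable; return a fresh memtable."""
    sstables.insert(0, dict(sorted(memtable.items())))
    return {}


def merge(sstables, drop_tombstones):
    """Merge SSTables (newest first) into one; newer values win."""
    merged = {}
    for sstable in reversed(sstables):  # oldest first, so newer overwrite
        merged.update(sstable)
    if drop_tombstones:
        merged = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    return dict(sorted(merged.items()))


def merging_compaction(sstables, count):
    """Merge the `count` newest SSTables; deletion records must be kept
    because older SSTables may still hold the deleted data."""
    return [merge(sstables[:count], drop_tombstones=False)] + sstables[count:]


def major_compaction(sstables):
    """Rewrite everything into one SSTable containing only live data."""
    return [merge(sstables, drop_tombstones=True)]


sstables = []
memtable = {"a": 1, "b": 2}
memtable = minor_compaction(memtable, sstables)
memtable["a"] = TOMBSTONE  # delete "a"
memtable["c"] = 3
memtable = minor_compaction(memtable, sstables)
compacted = major_compaction(sstables)
```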



Refinements (1/2)


     Locality groups: clients can group multiple column families
     together into a locality group, and a separate SSTable is
     generated for each locality group. Segregating column families
     that are not typically accessed together into separate
     locality groups enables more efficient reads.
     Locality groups can be compressed in two passes: Bentley and
     McIlroy's scheme for long common strings, then a fast
     compression algorithm that looks for repetitions in a small
     window.
     Bloom filters, which can be enabled per locality group, let a
     reader ask whether an SSTable might contain any data for a
     specified row/column pair. This drastically reduces the number
     of disk seeks: most lookups for non-existent rows or columns
     do not need to touch disk.
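
     A minimal Bloom filter sketch to show why a negative answer
     avoids a disk read. The sizes, the use of SHA-256 from Python's
     hashlib, and the key format are assumptions for illustration;
     the slides do not say how Bigtable builds its filters.

```python
import hashlib


class BloomFilter:
    """Set membership with false positives but no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bloom = BloomFilter()
bloom.add("com.cnn.www/anchor:cnnsi.com")


def read_from_sstable(key, sstable):
    """Skip the (simulated) disk read when the filter says 'definitely not'."""
    if not bloom.might_contain(key):
        return None
    return sstable.get(key)
```

     A "maybe" answer still requires reading the SSTable, so the
     filter only saves work for keys that are absent.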




Refinements (2/2)


     Caching for read performance (two levels of caching)
         Scan Cache: higher-level cache that caches the key-value pairs
         returned by the SSTable interface to the tablet server code.
         Block Cache: lower-level cache that caches SSTable blocks
         that were read from GFS.
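     Commit-log implementation: a single commit log per tablet
     server, shared by all of its tablets
     Speeding up tablet recovery: the source tablet server does a
     minor compaction before a tablet moves, so the new server
     need not recover log entries
     Exploiting immutability: SSTables are immutable, so reads need
     no synchronization of file system accesses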
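
     The two cache levels can be sketched with a small LRU cache,
     assuming a dict stands in for GFS and each key maps to one block.
     LRUCache, GFS_BLOCKS, block_for, and read are hypothetical names;
     the slides do not state the eviction policy Bigtable uses.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache that loads missing entries on demand."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, load):
        if key in self.data:
            self.data.move_to_end(key)  # mark as most recently used
            return self.data[key]
        value = load(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used
        return value


# Hypothetical stand-in for SSTable blocks stored in GFS.
GFS_BLOCKS = {0: {"a": 1, "b": 2}, 1: {"c": 3, "d": 4}}
gfs_reads = []


def read_block_from_gfs(block_id):
    gfs_reads.append(block_id)
    return GFS_BLOCKS[block_id]


def block_for(key):
    return 0 if key < "c" else 1


scan_cache = LRUCache(capacity=2)   # key-value pairs
block_cache = LRUCache(capacity=1)  # whole SSTable blocks


def read(key):
    def load_value(k):
        block = block_cache.get(block_for(k), read_block_from_gfs)
        return block[k]
    return scan_cache.get(key, load_value)
```

     Repeated reads of the same key hit the scan cache, while reads
     of nearby keys hit the block cache.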





Hardware Environment


     Tablet servers were configured to use 1 GB of memory and to
     write to a GFS cell consisting of 1786 machines with two 400
     GB IDE hard drives each.
     Each machine had two dual-core Opteron 2 GHz chips
     Enough physical memory to hold the working set of all
     running processes
     Each machine had a single gigabit Ethernet link
     Two-level tree-shaped switched network with 100-200 Gbps
     aggregate bandwidth at the root.





Results Per Tablet Server




  Number of 1000-byte values read/written per second.




Results Aggregate Rate




  Number of 1000-byte values read/written per second.



Single tablet-server performance


      Random reads: the tablet server executes about 1200 reads per
      second (about 75 MB/s read from GFS), enough to saturate the
      tablet server CPUs because of overheads in the networking stack
      Random and sequential writes perform better than random
      reads (each tablet server appends all writes to a single commit
      log and uses group commit)
      No significant difference between random writes and
      sequential writes (same commit log)
      Sequential reads perform better than random reads (block
      cache)
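
      The 75 MB/s figure follows from the benchmark transferring one
      64 KB SSTable block for every 1000-byte value read. A short
      worked calculation, assuming 1 MB = 2**20 bytes; the variable
      names are illustrative.

```python
READS_PER_SECOND = 1200
BLOCK_BYTES = 64 * 1024  # one SSTable block fetched per random read
VALUE_BYTES = 1000       # bytes the benchmark actually uses per read
MB = 2 ** 20

# Data pulled from GFS versus data actually returned to the client.
gfs_mb_per_second = READS_PER_SECOND * BLOCK_BYTES / MB
useful_mb_per_second = READS_PER_SECOND * VALUE_BYTES / MB

# How many bytes are transferred for each byte that is used.
read_amplification = BLOCK_BYTES / VALUE_BYTES
```

      Only about 1 MB/s of the 75 MB/s is payload, which is why random
      reads are the slowest operation in the benchmark.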





Scaling


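      Aggregate throughput increases dramatically as tablet servers
      are added: the performance of random reads from memory
      increases by almost a factor of 300 as the number of tablet
      servers increases by a factor of 500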
      However, performance does not increase linearly
      Drop in per-server throughput
          Imbalance in load: re-balancing is throttled to reduce the
          number of tablet movements, and the load generated by the
          benchmarks shifts around as the benchmark progresses
          The random read benchmark transfers one 64 KB block over
          the network for every 1000-byte read, which saturates the
          shared 1 Gigabit links





Real applications




     Google Analytics
     Google Earth
     Personalized Search



Lessons learned
      Large distributed systems are vulnerable to many types of
      failures, not just the standard network partitions and fail-stop
      failures
          Memory and network corruption
          Large clock skew
          Extended and asymmetric network partitions
          Bugs in other systems (Chubby!)
          ...
      Delay adding new features until it is clear how the new
      features will be used
      A practical lesson: the importance of proper system-level
      monitoring
      Keep It Simple!


END!




 QUESTIONS ?





More Related Content

What's hot

Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
StreamNative
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)
Romain Jacotin
 
Dynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and ComparisonDynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and Comparison
Grisha Weintraub
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
Tathastu.ai
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalability
jbellis
 
Summary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in TokyoSummary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in Tokyo
CLOUDIAN KK
 
Bigtable and Dynamo
Bigtable and DynamoBigtable and Dynamo
Bigtable and Dynamo
Iraklis Psaroudakis
 
An Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed DatabaseAn Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed Database
Benjamin Bengfort
 
Log Structured Merge Tree
Log Structured Merge TreeLog Structured Merge Tree
Log Structured Merge Tree
University of California, Santa Cruz
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
Cloudera, Inc.
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
ScyllaDB
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
Lars Hofhansl
 
Google file system GFS
Google file system GFSGoogle file system GFS
Google file system GFS
zihad164
 
Hazelcast Introduction
Hazelcast IntroductionHazelcast Introduction
Hazelcast Introduction
CodeOps Technologies LLP
 
GOOGLE FILE SYSTEM
GOOGLE FILE SYSTEMGOOGLE FILE SYSTEM
GOOGLE FILE SYSTEM
JYoTHiSH o.s
 
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
Varad Meru
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
Mayur Rathod
 
SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?
Venu Anuganti
 
Google - Bigtable
Google - BigtableGoogle - Bigtable
Google - Bigtable
영원 서
 

What's hot (20)

Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
Google file system
Google file systemGoogle file system
Google file system
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)
 
Dynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and ComparisonDynamo and BigTable - Review and Comparison
Dynamo and BigTable - Review and Comparison
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
What Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database ScalabilityWhat Every Developer Should Know About Database Scalability
What Every Developer Should Know About Database Scalability
 
Summary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in TokyoSummary of "Google's Big Table" at nosql summer reading in Tokyo
Summary of "Google's Big Table" at nosql summer reading in Tokyo
 
Bigtable and Dynamo
Bigtable and DynamoBigtable and Dynamo
Bigtable and Dynamo
 
An Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed DatabaseAn Overview of Spanner: Google's Globally Distributed Database
An Overview of Spanner: Google's Globally Distributed Database
 
Log Structured Merge Tree
Log Structured Merge TreeLog Structured Merge Tree
Log Structured Merge Tree
 
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation BuffersHBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
HBase HUG Presentation: Avoiding Full GCs with MemStore-Local Allocation Buffers
 
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
Apache HBase Performance Tuning
Apache HBase Performance TuningApache HBase Performance Tuning
Apache HBase Performance Tuning
 
Google file system GFS
Google file system GFSGoogle file system GFS
Google file system GFS
 
Hazelcast Introduction
Hazelcast IntroductionHazelcast Introduction
Hazelcast Introduction
 
GOOGLE FILE SYSTEM
GOOGLE FILE SYSTEMGOOGLE FILE SYSTEM
GOOGLE FILE SYSTEM
 
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
CS295 Week5: Megastore - Providing Scalable, Highly Available Storage for Int...
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
 
SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?
 
Google - Bigtable
Google - BigtableGoogle - Bigtable
Google - Bigtable
 

Viewers also liked

Bigtable
BigtableBigtable
Bigtablenextlib
 
Google BigTable
Google BigTableGoogle BigTable
Big table
Big tableBig table
Big table
PSIT
 
Google Bigtable paper presentation
Google Bigtable paper presentationGoogle Bigtable paper presentation
Google Bigtable paper presentation
vanjakom
 
Cloud-native Apps - Architektur, Implementierung, Demo
Cloud-native Apps - Architektur, Implementierung, DemoCloud-native Apps - Architektur, Implementierung, Demo
Cloud-native Apps - Architektur, Implementierung, Demo
Andreas Koop
 
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEYROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
Support Driven
 
I've (probably) been using Google App Engine for a week longer than you have
I've (probably) been using Google App Engine for a week longer than you haveI've (probably) been using Google App Engine for a week longer than you have
I've (probably) been using Google App Engine for a week longer than you have
Simon Willison
 
Big table
Big tableBig table
Big table
Manuel Correa
 
Innovation Case Study BitTorrent
Innovation Case Study BitTorrentInnovation Case Study BitTorrent
Innovation Case Study BitTorrentPetter S. Rønning
 
Privacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storagePrivacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storage
parry prabhu
 
Privacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storagePrivacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storage
Nagamalleswararao Tadikonda
 
Privacy preserving public auditing for secure cloud storage
Privacy preserving public auditing for secure cloud storagePrivacy preserving public auditing for secure cloud storage
Privacy preserving public auditing for secure cloud storageMustaq Syed
 
Privacy Preserving Public Auditing for Data Storage Security in Cloud
Privacy Preserving Public Auditing for Data Storage Security in Cloud Privacy Preserving Public Auditing for Data Storage Security in Cloud
Privacy Preserving Public Auditing for Data Storage Security in Cloud
Girish Chandra
 
Privacy Preserving Public Auditing for Data Storage Security in Cloud.ppt
Privacy Preserving Public Auditing for Data Storage Security in Cloud.pptPrivacy Preserving Public Auditing for Data Storage Security in Cloud.ppt
Privacy Preserving Public Auditing for Data Storage Security in Cloud.ppt
Girish Chandra
 

Viewers also liked (20)

google Bigtable
google Bigtablegoogle Bigtable
google Bigtable
 
Bigtable
BigtableBigtable
Bigtable
 
Google BigTable
Google BigTableGoogle BigTable
Google BigTable
 
Google's BigTable
Google's BigTableGoogle's BigTable
Google's BigTable
 
Big table
Big tableBig table
Big table
 
Big table
Big tableBig table
Big table
 
Bigtable
BigtableBigtable
Bigtable
 
Google Bigtable paper presentation
Google Bigtable paper presentationGoogle Bigtable paper presentation
Google Bigtable paper presentation
 
Cloud-native Apps - Architektur, Implementierung, Demo
Cloud-native Apps - Architektur, Implementierung, DemoCloud-native Apps - Architektur, Implementierung, Demo
Cloud-native Apps - Architektur, Implementierung, Demo
 
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEYROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
ROCKING YOUR SEAT AT THE BIG TABLE - ROB BAILEY
 
I've (probably) been using Google App Engine for a week longer than you have
I've (probably) been using Google App Engine for a week longer than you haveI've (probably) been using Google App Engine for a week longer than you have
I've (probably) been using Google App Engine for a week longer than you have
 
Big table
Big tableBig table
Big table
 
Innovation Case Study BitTorrent
Innovation Case Study BitTorrentInnovation Case Study BitTorrent
Innovation Case Study BitTorrent
 
Privacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storagePrivacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storage
 
Big table
Big tableBig table
Big table
 
Privacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storagePrivacy preserving public auditing for regenerating-code-based cloud storage
Privacy preserving public auditing for regenerating-code-based cloud storage
 
Bigtable
BigtableBigtable
Bigtable
 
Privacy preserving public auditing for secure cloud storage
Privacy preserving public auditing for secure cloud storagePrivacy preserving public auditing for secure cloud storage
Privacy preserving public auditing for secure cloud storage
 
Privacy Preserving Public Auditing for Data Storage Security in Cloud
Privacy Preserving Public Auditing for Data Storage Security in Cloud Privacy Preserving Public Auditing for Data Storage Security in Cloud
Privacy Preserving Public Auditing for Data Storage Security in Cloud
 
Privacy Preserving Public Auditing for Data Storage Security in Cloud.ppt
Privacy Preserving Public Auditing for Data Storage Security in Cloud.pptPrivacy Preserving Public Auditing for Data Storage Security in Cloud.ppt
Privacy Preserving Public Auditing for Data Storage Security in Cloud.ppt
 

Bigtable: A Distributed Storage System for Structured Data

  • 1. Outline Introduction Design Implementation Results Conclusions Bigtable: A Distributed Storage System for Structured Data Alvanos Michalis April 6, 2009 Alvanos Michalis Bigtable: A Distributed Storage System for Structured Data
  • 2. Outline
       1 Introduction: Motivation
       2 Design: Data model
       3 Implementation: Building blocks, Tablets, Compactions, Refinements
       4 Results: Hardware Environment, Performance Evaluation
       5 Conclusions: Real applications, Lessons, End
  • 3. Motivation: Google!
       Lots of different kinds of data!
            Crawling system: URLs, contents, links, anchors, page-rank, etc.
            Per-user data: preferences, recent queries/search history
            Geographic data, images, etc.
       Many incoming requests
       No commercial system is big enough
            Scale is too large for commercial databases
            May not run on their commodity hardware
            No dependence on other vendors
            Optimizations
            Better price/performance
            Building internally means the system can be applied across many projects for low incremental cost
  • 4. Motivation: Google goals
       Fault-tolerant, persistent
       Scalable
            1000s of servers
            Millions of reads/writes, efficient scans
       Self-managing
       Simple!
  • 5. Data model: Bigtable
       Definition: A Bigtable is a sparse, distributed, persistent multidimensional sorted map.
       The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
       (row:string, column:string, time:int64) -> string
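The map signature above can be illustrated with a toy, single-process sketch. This is not how Bigtable is implemented; the class name `ToyBigtable` and its methods are hypothetical, and the newest-first ordering of versions within a cell is an assumption taken from the Bigtable paper rather than from this slide.

```python
class ToyBigtable:
    """Toy in-memory stand-in for (row, column, timestamp) -> bytes."""

    def __init__(self):
        # (row, column) -> {timestamp: value}; absent cells cost nothing (sparse).
        self._cells = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row: str, column: str, timestamp=None):
        """Return the value at `timestamp`, or the newest version if omitted."""
        versions = self._cells.get((row, column), {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def scan(self):
        """Yield (row, column, timestamp, value) sorted by row, then column,
        then newest timestamp first (assumed ordering)."""
        for key in sorted(self._cells):
            versions = self._cells[key]
            for ts in sorted(versions, reverse=True):
                yield key[0], key[1], ts, versions[ts]
```

Values are plain `bytes` here because the map treats them as uninterpreted; any structure inside a value is the application's concern.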
  • 6. Data model: Rows
       The row keys in a table are arbitrary strings
       Every read or write of data under a single row key is atomic
       Bigtable maintains data in lexicographic order by row key
  • 7. Data model: Column Families
       Column keys are grouped into sets called column families
       All data stored in a column family is usually of the same type
       A column family must be created before data can be stored under any column key in that family
       A column key is named using the following syntax: family:qualifier
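As a small illustration of the family:qualifier syntax and the create-before-write rule, here is a hedged sketch. `parse_column_key` and `ToyTable` are hypothetical names, and treating everything after the first colon as the qualifier is an assumption of this example, not a rule stated on the slide.

```python
def parse_column_key(column_key: str) -> tuple[str, str]:
    """Split 'family:qualifier' at the first colon (assumed convention)."""
    family, sep, qualifier = column_key.partition(":")
    if not sep or not family:
        raise ValueError(f"expected 'family:qualifier', got {column_key!r}")
    return family, qualifier


class ToyTable:
    """Rejects writes to column families that were not created up front."""

    def __init__(self, families):
        self.families = set(families)
        self.cells = {}

    def write(self, row: str, column_key: str, value: bytes) -> None:
        family, _ = parse_column_key(column_key)
        if family not in self.families:
            raise KeyError(f"column family {family!r} has not been created")
        self.cells[(row, column_key)] = value
```

For instance, a table created with `ToyTable(["anchor"])` accepts a write to `anchor:example.com` but raises for `contents:html` until that family exists.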
  • 8. Data model: Timestamps
       Each cell in a Bigtable can contain multiple versions of the same data; these versions are indexed by timestamp (64-bit integers).
       Applications that need to avoid collisions must generate unique timestamps themselves.
       To make the management of versioned data less onerous, Bigtable supports two per-column-family settings that tell it to garbage-collect cell versions automatically.
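The slide does not name the two garbage-collection settings. The Bigtable paper describes them as keeping only the last n versions of a cell, or keeping only versions that are new enough. The sketch below illustrates both policies on a single cell; the function names are hypothetical.

```python
def keep_last_n(versions: dict[int, bytes], n: int) -> dict[int, bytes]:
    """Keep only the n versions with the largest timestamps."""
    newest = sorted(versions, reverse=True)[:n]
    return {ts: versions[ts] for ts in newest}


def keep_newer_than(versions: dict[int, bytes], cutoff: int) -> dict[int, bytes]:
    """Keep only versions whose timestamp is at or after `cutoff`."""
    return {ts: value for ts, value in versions.items() if ts >= cutoff}
```

Both functions return a new mapping and leave the input untouched, which mirrors the idea that old versions disappear only when data is rewritten.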
  • 9. Building blocks: Infrastructure
       Google WorkQueue (scheduler)
       GFS: large-scale distributed file system
            Master: responsible for metadata
            Chunk servers: responsible for reading and writing large chunks of data
            Chunks replicated on 3 machines; the master is responsible for replication
       Chubby: lock/file/name service
            Coarse-grained locks; can store a small amount of data in a lock
            5 replicas; need a majority vote to be active
  • 10. Building blocks: SSTable
       Lives in GFS
       Immutable, sorted file of key-value pairs
       Chunks of data plus an index
            Index is of block ranges, not values
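The "index of block ranges, not values" idea can be sketched as follows: the index stores one key per block, so a lookup does a binary search over the index and then reads a single block. `ToySSTable`, the tiny `block_size`, and the choice to index each block by its last key are assumptions of this example.

```python
from bisect import bisect_left


class ToySSTable:
    """Immutable sorted key-value pairs split into blocks, plus a block index."""

    def __init__(self, items: dict, block_size: int = 2):
        pairs = sorted(items.items())
        # Each block holds up to `block_size` consecutive sorted pairs.
        self.blocks = [pairs[i:i + block_size]
                       for i in range(0, len(pairs), block_size)]
        # The index records only the last key of every block, not the values.
        self.index = [block[-1][0] for block in self.blocks]

    def get(self, key):
        # First block whose last key is >= key is the only candidate block.
        i = bisect_left(self.index, key)
        if i == len(self.index):
            return None
        for k, v in self.blocks[i]:
            if k == key:
                return v
        return None
```

Because the index is much smaller than the data, it can stay in memory while the blocks themselves stay on disk.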
  • 11. Tablets: Tablet Design
       Large tables broken into tablets at row boundaries
            Tablets hold contiguous rows
            Approx. 100-200 MB of data per tablet
       Approx. 100 tablets per machine
            Fast recovery
            Load-balancing
       Built out of multiple SSTables
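Because tablets hold contiguous row ranges, finding the tablet for a row key is a binary search over the tablet boundaries. The sketch below assumes each tablet is identified by an inclusive end row key and that the last tablet is open-ended; `tablet_for_row` is a hypothetical helper.

```python
from bisect import bisect_left


def tablet_for_row(end_keys: list[str], row: str) -> int:
    """Return the index of the tablet whose row range contains `row`.

    `end_keys` is the sorted list of inclusive end row keys for every tablet
    except the last one, which covers all remaining rows.
    """
    return bisect_left(end_keys, row)
```

With `end_keys = ["g", "p"]` there are three tablets: rows up to and including "g", rows after "g" up to "p", and everything after "p".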
  • 12. Tablets: Tablet Location
       Like a B+-tree, but fixed at 3 levels
       How can we avoid creating a bottleneck at the root?
            Aggressively cache tablet locations
            Lookup starts from leaf (bet on it being correct); reverse on miss
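The "bet on the cached leaf, reverse on miss" strategy can be sketched as a client-side cache. Everything here is hypothetical: `ToyLocator`, the `authoritative_lookup` callback (standing in for a walk through the upper levels of the location hierarchy), and the `is_serving` check (standing in for contacting the cached server).

```python
class ToyLocator:
    """Client-side tablet location cache with fallback on a stale entry."""

    def __init__(self, authoritative_lookup):
        self._lookup = authoritative_lookup  # slow path through upper levels
        self._cache = {}                     # tablet id -> server name
        self.upper_level_lookups = 0

    def locate(self, tablet, is_serving):
        server = self._cache.get(tablet)
        # Optimistic path: trust the cached location if the server confirms it.
        if server is not None and is_serving(server, tablet):
            return server
        # Miss or stale entry: go back up the hierarchy and refresh the cache.
        self.upper_level_lookups += 1
        server = self._lookup(tablet)
        self._cache[tablet] = server
        return server
```

As long as tablets rarely move, almost every lookup is answered from the cache and the root of the hierarchy sees very little traffic.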
  • 13. Tablets: Tablet Assignment
       Each tablet is assigned to one tablet server at a time.
       The master keeps track of the set of live tablet servers, and the current assignment of tablets to tablet servers.
       Bigtable uses Chubby to keep track of tablet servers.
            When a tablet server starts, it creates, and acquires an exclusive lock on, a uniquely-named file in a specific Chubby directory.
            A tablet server stops serving its tablets if it loses its exclusive lock.
       The master is responsible for detecting when a tablet server is no longer serving its tablets, and for reassigning those tablets as soon as possible.
       When a master is started by the cluster management system, it needs to discover the current tablet assignments before it can change them.
  • 14. Tablets: Serving a Tablet
       Updates are logged
       Each SSTable corresponds to a batch of updates or a snapshot of the tablet taken at some earlier time
       Memtable (sorted by key) caches recent updates
       Reads consult both memtable and SSTables
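The write and read paths above can be sketched in a few lines. This is a simplified illustration, not Bigtable's implementation: `ToyTablet` is a hypothetical name, plain dicts stand in for the sorted memtable and the on-disk SSTables, and the SSTable list is assumed to be ordered newest first.

```python
class ToyTablet:
    """Log every update, buffer it in a memtable, read memtable then SSTables."""

    def __init__(self):
        self.log = []        # stand-in for the commit log
        self.memtable = {}   # recent updates (a real memtable is kept sorted)
        self.sstables = []   # older data, newest SSTable first

    def write(self, key, value):
        self.log.append((key, value))  # log first, so the update can be redone
        self.memtable[key] = value

    def read(self, key):
        # The memtable holds the most recent updates, so it wins.
        if key in self.memtable:
            return self.memtable[key]
        # Otherwise the newest SSTable containing the key wins.
        for sstable in self.sstables:
            if key in sstable:
                return sstable[key]
        return None
```

A read therefore sees a merged view of the memtable and all SSTables without any of them being modified.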
  • 15. Compactions
       As write operations execute, the size of the memtable increases.
       Minor compaction: converts the memtable into an SSTable
            Reduces memory usage
            Reduces log traffic on restart
       Merging compaction
            Periodically executed in the background
            Reduces the number of SSTables
            Good place to apply a policy such as "keep only N versions"
       Major compaction
            Merging compaction that results in only one SSTable
            No deletion records, only live data
            Reclaims resources
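The three compaction kinds can be sketched on dict-based stand-ins for the memtable and SSTables. The `TOMBSTONE` sentinel for deletion records and all function names are assumptions of this example, and for brevity the merging compaction here takes only SSTables, not the memtable.

```python
TOMBSTONE = object()  # hypothetical marker for a deletion record


def minor_compaction(memtable: dict):
    """Freeze the memtable into a sorted SSTable; return it with a fresh memtable."""
    return dict(sorted(memtable.items())), {}


def merging_compaction(sstables: list) -> dict:
    """Merge SSTables (given newest first) into one; deletion records survive."""
    merged = {}
    for sstable in reversed(sstables):  # apply oldest first so newer data wins
        merged.update(sstable)
    return dict(sorted(merged.items()))


def major_compaction(sstables: list) -> dict:
    """Merge everything into one SSTable that contains only live data."""
    merged = merging_compaction(sstables)
    return {k: v for k, v in merged.items() if v is not TOMBSTONE}
```

Deletion records must survive an ordinary merge because an older SSTable outside the merge may still hold the deleted key; only a major compaction, which sees all data, can drop them.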
  • 16. Refinements (1/2)
       Locality groups: group column families together into an SSTable.
            Segregating column families that are not typically accessed together into separate locality groups enables more efficient reads.
       Locality groups can be compressed, using Bentley and McIlroy's scheme and a fast compression algorithm that looks for repetitions.
       Bloom filters on locality groups allow a reader to ask whether an SSTable might contain any data for a specified row/column pair.
            This drastically reduces the number of disk seeks required: lookups for non-existent rows or columns do not need to touch disk.
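A Bloom filter answers "might this SSTable contain the key?" with no false negatives, so a negative answer lets a read skip the SSTable entirely. The sketch below is a generic textbook Bloom filter, not Bigtable's implementation; the sizes, the use of SHA-256, and the `row/column` key encoding are assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size bit array probed by several hash functions."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive independent-looking positions by salting the key with i.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))
```

A reader would call `might_contain("row/family:qualifier")` before touching an SSTable and go to disk only when the answer is True.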
  • 17. Refinements (2/2)
       Caching for read performance (two levels of caching)
            Scan Cache: higher-level cache that caches the key-value pairs returned by the SSTable interface to the tablet server code.
            Block Cache: lower-level cache that caches SSTable blocks that were read from GFS.
       Commit-log implementation
       Speeding up tablet recovery (log entries)
       Exploiting immutability
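To make the lower-level cache concrete, here is a minimal block cache sketch. The slide does not state an eviction policy, so least-recently-used eviction is an assumption, as are the `BlockCache` name and the `read_block` callback standing in for a read from GFS.

```python
from collections import OrderedDict


class BlockCache:
    """Caches blocks by id and evicts the least recently used one when full."""

    def __init__(self, capacity: int, read_block):
        self.capacity = capacity
        self._read_block = read_block  # slow path, e.g. a read from the file system
        self._blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, block_id):
        if block_id in self._blocks:
            self._blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self._blocks[block_id]
        self.misses += 1
        block = self._read_block(block_id)
        self._blocks[block_id] = block
        if len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)    # drop the least recently used
        return block
```

A block cache like this helps workloads that read data close to what they read recently, such as sequential reads, because neighbouring keys live in the same block.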
  • 18. Results: Hardware Environment
       Tablet servers were configured to use 1 GB of memory and to write to a GFS cell consisting of 1786 machines with two 400 GB IDE hard drives each.
       Each machine had two dual-core Opteron 2 GHz chips
       Enough physical memory to hold the working set of all running processes
       Single gigabit Ethernet link
       Two-level tree-shaped switched network with 100-200 Gbps aggregate bandwidth at the root.
  • 19. Performance Evaluation: Results Per Tablet Server
       [Figure: number of 1000-byte values read/written per second, per tablet server]
  • 20. Performance Evaluation: Results Aggregate Rate
       [Figure: number of 1000-byte values read/written per second, aggregate rate]
  • 21. Performance Evaluation: Single tablet-server performance
       The tablet server executes about 1200 random reads per second (about 75 MB/s read from GFS), enough to saturate the tablet server CPUs because of overheads in the networking stack
       Random and sequential writes perform better than random reads (each tablet server appends writes to a single commit log and uses group commit)
       No significant difference between random writes and sequential writes (same commit log)
       Sequential reads perform better than random reads (block cache)
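The 75 MB/s figure follows from the read rate and the block size. The short calculation below assumes the 64 KB block per 1000-byte read described for the random read benchmark, and uses binary megabytes (1 MB = 1024 × 1024 bytes); the variable names are illustrative.

```python
READS_PER_SECOND = 1200        # random reads served by one tablet server
BLOCK_BYTES = 64 * 1024        # one 64 KB SSTable block fetched per read
VALUE_BYTES = 1000             # payload the client actually asked for

bytes_from_gfs = READS_PER_SECOND * BLOCK_BYTES
mb_per_second = bytes_from_gfs / (1024 * 1024)

# Fraction of the transferred bytes that is useful to the client.
useful_fraction = (READS_PER_SECOND * VALUE_BYTES) / bytes_from_gfs

print(f"{mb_per_second:.0f} MB/s read from GFS")        # 75 MB/s
print(f"{useful_fraction:.1%} of transferred bytes used")  # 1.5%
```

Only about 1.5% of the bytes moved are the requested values, which is why random reads from disk are the slowest operation in this benchmark.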
  • 22. Performance Evaluation: Scaling
       Aggregate throughput increases dramatically as tablet servers are added
            For example, the performance of random reads from memory increases
       However, performance does not increase linearly
       Drop in per-server throughput
            Imbalance in load: re-balancing is throttled to reduce the number of tablet movements, and the load generated by the benchmarks shifts around as the benchmark progresses
            The random read benchmark transfers one 64 KB block over the network for every 1000-byte read and saturates the shared 1 Gigabit links
  • 23. Conclusions: Real applications
       Google Analytics
       Google Earth
       Personalized Search
  • 24. Conclusions: Lessons learned
       Large distributed systems are vulnerable to many types of failures, not just the standard network partitions and fail-stop failures
            Memory and network corruption
            Large clock skew
            Extended and asymmetric network partitions
            Bugs in other systems (Chubby!)
            ...
       Delay adding new features until it is clear how the new features will be used
       A practical lesson: the importance of proper system-level monitoring
       Keep it simple!
  • 25. END! QUESTIONS?