Google Bigtable
(Bigtable: A Distributed Storage System for Structured Data)

           Komadinovic Vanja, Vast Platform team
Presentation overview

- introduction
- design
- basic implementation
- GFS - HDFS introduction
- MapReduce introduction
- implementation
- HBase - Apache Bigtable solution
- performance and use cases
- some thoughts for discussion
Introduction

- what is Bigtable?
- why is something more than SQL needed?
- scaling - the number of machines keeps growing
- fault tolerance
- row, column, cell
- versioning
- storage on distributed file system
- read, write operations on row or set of rows
- scanner - iterator
- no need for indexing
Design

- column family is the basic unit of access control

- a column family must be created before data is inserted

- the number of column labels is unlimited

- row key is an arbitrary string, max. size 64 KB

- every cell keeps multiple timestamped versions; the number kept is configurable
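The data model described on the last two slides can be sketched as a sorted map from (row, column, timestamp) to value. This is a minimal in-memory toy, not Bigtable's API: the class name `ToyTable`, its methods, and the `max_versions` default are assumptions made for illustration.

```python
import time
from collections import defaultdict

MAX_ROW_KEY_BYTES = 64 * 1024  # row key limit quoted on the slide


class ToyTable:
    """In-memory toy model of a Bigtable-style table (illustration only)."""

    def __init__(self, families, max_versions=3):
        self.families = set(families)      # families must exist before writes
        self.max_versions = max_versions   # versions kept per cell (assumed default)
        # row key -> "family:label" -> list of (timestamp, value), newest first
        self.rows = defaultdict(dict)

    def put(self, row, column, value, ts=None):
        if len(row.encode("utf-8")) > MAX_ROW_KEY_BYTES:
            raise ValueError("row key too large")
        family, _, _label = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        ts = time.time() if ts is None else ts
        versions = self.rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)
        del versions[self.max_versions:]   # drop the oldest versions

    def get(self, row, column, ts=None):
        """Return the newest value at or before ts (or the newest overall)."""
        for version_ts, value in self.rows.get(row, {}).get(column, []):
            if ts is None or version_ts <= ts:
                return value
        return None

    def scan(self, start_row, end_row):
        """Iterate rows in key order over the half-open range [start, end)."""
        for row in sorted(self.rows):
            if start_row <= row < end_row:
                yield row, self.rows[row]
```

Labels inside a family need no declaration, which is why their number is unlimited; only the family itself is checked on write.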
Design

- table is divided into tablets
- tablets are distributed across machines
- each tablet contains a contiguous range of rows
- tablet size 100 - 200 MB
- a tablet server splits a tablet when it grows too large; this is the only
tablet change initiated by the tablet server
- tablets are stored on GFS
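A tablet split can be sketched as cutting a sorted run of rows in two once its total size passes a threshold. The function name, the byte-size bookkeeping, and the "split near half the data" rule are assumptions for illustration; the slides only state that tablets are split when they grow.

```python
SPLIT_THRESHOLD_BYTES = 200 * 1024 * 1024  # upper end of the 100 - 200 MB range


def split_tablet(rows, threshold=SPLIT_THRESHOLD_BYTES):
    """Split a tablet in two if its total size exceeds the threshold.

    rows: list of (row_key, size_in_bytes), sorted by row_key.
    Returns a list with one tablet (no split) or two tablets.
    """
    total = sum(size for _, size in rows)
    if total <= threshold or len(rows) < 2:
        return [rows]
    # Choose the split point where the running size first reaches half the total.
    running = 0
    split_at = len(rows) - 1
    for i, (_, size) in enumerate(rows):
        running += size
        if running >= total / 2:
            split_at = min(i + 1, len(rows) - 1)
            break
    return [rows[:split_at], rows[split_at:]]
```

Because rows stay sorted, each resulting tablet still covers one contiguous row range.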
Design
Basic implementation
GFS - HDFS introduction

- distributed file system
- files are divided into relatively large blocks (64 MB)
- blocks of one file are stored on multiple machines
- every block is replicated in configurable number of copies
- fault tolerance guaranteed
- Master - Slaves architecture
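The block and replica bullets can be illustrated with a small placement sketch. Round-robin placement is an assumption chosen for brevity; GFS and HDFS use their own placement policies, and the function name and default replication of 3 are likewise illustrative.

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, as on the slide


def place_blocks(file_size, machines, replication=3, block_size=BLOCK_SIZE):
    """Return {block_index: [machine, ...]} using simple round-robin placement."""
    if replication > len(machines):
        raise ValueError("need at least as many machines as replicas")
    n_blocks = math.ceil(file_size / block_size)
    placement = {}
    for block in range(n_blocks):
        # Consecutive machines, so the replicas of one block never share a machine.
        placement[block] = [
            machines[(block + r) % len(machines)] for r in range(replication)
        ]
    return placement
```

Losing any single machine leaves at least two copies of every block, which is where the fault tolerance comes from.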
GFS - HDFS introduction: overview
MapReduce introduction

- framework built on top of (GFS) HDFS
- used for distributed data processing
- Master - Slaves architecture
- there are no dependencies between data elements
- everything is executed in parallel
- two parts: Mapper and Reducer
- Mapper reads input data and emits (key, value) pairs to
Reducer
- Data on Reducer is grouped by key
- Reducer produces output data
- Combiner and Partitioner - who are they, and what do they want?
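The roles above can be sketched with a word count that runs in a single process. This is not the Hadoop API: all function names are illustrative, and the "parallel" parts run sequentially here. The Combiner pre-aggregates one Mapper's output before it is sent anywhere, and the Partitioner decides which Reducer receives a given key.

```python
from collections import defaultdict
from zlib import crc32


def mapper(line):
    """Emit a (word, 1) pair for every word in the input line."""
    for word in line.split():
        yield word.lower(), 1


def combiner(pairs):
    """Pre-aggregate one mapper's output locally to cut shuffle traffic."""
    local = defaultdict(int)
    for key, value in pairs:
        local[key] += value
    return local.items()


def partitioner(key, n_reducers):
    """Pick the reducer responsible for a key (stable hash)."""
    return crc32(key.encode("utf-8")) % n_reducers


def reducer(key, values):
    """Sum all counts that arrived for one key."""
    return key, sum(values)


def run_job(lines, n_reducers=2):
    # Shuffle: route each combined pair to a reducer and group by key.
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for line in lines:
        for key, value in combiner(mapper(line)):
            partitions[partitioner(key, n_reducers)][key].append(value)
    output = {}
    for groups in partitions:
        for key in sorted(groups):
            k, total = reducer(key, groups[key])
            output[k] = total
    return output
```

Since every occurrence of a key lands in the same partition, each Reducer can finish its keys without talking to the others.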
MapReduce introduction: overview
Implementation

- Chubby - distributed lock service, synchronization point, starting
point for the master, tablet servers and even clients
- Master server, only one, guaranteed by Chubby
- if the Master dies, tablet servers continue to
serve clients
- Master decides who serves each tablet ( including ROOT and
METADATA )
- tablets are saved on GFS
- all traffic between clients and Bigtable goes directly between
client and tablet server ( Master is not required )
- client caches METADATA tablet locations; on the first fault the cache is refreshed
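The caching behaviour in the last bullet can be sketched as follows. The two callables passed to the client, `lookup_location` and `send_request`, are hypothetical stand-ins for the tablet-location lookup and the tablet-server RPC; the exception type is also invented for the example. Note that the Master does not appear anywhere on the read path.

```python
class StaleLocation(Exception):
    """Raised by a tablet server that no longer serves the requested tablet."""


class Client:
    def __init__(self, lookup_location, send_request):
        self._lookup = lookup_location   # hypothetical: resolves a tablet's server
        self._send = send_request        # hypothetical: RPC to a tablet server
        self._cache = {}                 # tablet id -> server address

    def read(self, tablet_id, row):
        server = self._cache.get(tablet_id)
        if server is None:
            server = self._cache[tablet_id] = self._lookup(tablet_id)
        try:
            return self._send(server, tablet_id, row)
        except StaleLocation:
            # First fault: replace the cached entry, then retry once.
            server = self._cache[tablet_id] = self._lookup(tablet_id)
            return self._send(server, tablet_id, row)
```

A second failure after the refresh propagates to the caller; a real client would need a retry policy, which is left out here.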
Implementation: overview
Implementation: organization
- two special tables: ROOT and METADATA
- ROOT is never split
- location of ROOT is stored in a Chubby file
- with METADATA tablets limited to 128 MB, the scheme can address 2^34 tablets
- METADATA stores the location of every user tablet; the row key is an encoding
of the tablet’s table identifier and its end row
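The 2^34 figure can be reproduced with a short calculation. It relies on two numbers from the Bigtable paper that are not on the slide: each METADATA row takes roughly 1 KB, and METADATA tablets are limited to 128 MB. The variable names are illustrative.

```python
METADATA_TABLET_LIMIT = 128 * 2**20   # 128 MB limit per METADATA tablet
METADATA_ROW_SIZE = 2**10             # about 1 KB per location row (paper's estimate)

# One tablet of location rows can point at this many other tablets.
rows_per_tablet = METADATA_TABLET_LIMIT // METADATA_ROW_SIZE

# ROOT points at METADATA tablets, each of which points at user tablets.
addressable_tablets = rows_per_tablet * rows_per_tablet

print(rows_per_tablet == 2**17, addressable_tablets == 2**34)
```

Because ROOT is never split, the lookup hierarchy stays at a fixed depth however large the table set grows.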
Implementation

- SSTable - immutable, sorted file format that stores tablet data
   - consists of blocks
   - column family labels are stored side by side
   - index of blocks at end of file

- compaction
   - minor
   - major

- locality groups

- caching
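The two compaction kinds can be sketched with sorted lists standing in for SSTables. In the paper's terms, a minor compaction writes the in-memory buffer out as a new SSTable, and a major compaction rewrites all SSTables into one that contains no deleted entries. The `TOMBSTONE` marker and both function names are assumptions made for the example.

```python
import heapq

TOMBSTONE = object()  # marks a deleted key


def minor_compaction(memtable):
    """Freeze the in-memory buffer into an immutable, sorted SSTable-like list."""
    return sorted(memtable.items())


def major_compaction(sstables):
    """Merge SSTables (ordered oldest to newest) into one, dropping deletions."""
    # Tag entries with -age so the newest table sorts first among equal keys.
    tagged = (
        [(key, -age, value) for key, value in table]
        for age, table in enumerate(sstables)
    )
    merged, last_key = [], object()
    for key, _, value in heapq.merge(*tagged, key=lambda e: (e[0], e[1])):
        if key == last_key:
            continue            # an older, shadowed version
        last_key = key
        if value is not TOMBSTONE:
            merged.append((key, value))
    return merged
```

Until a major compaction runs, a read has to consult every SSTable from newest to oldest, which is why periodic merging keeps reads cheap.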
HBase - Apache Bigtable solution 
HBase - Bigtable synonyms

- Bigtable ~ HBase
- tablet ~ region
- Master server ~ HBase master
- Tablet server ~ HBase Region server
- Chubby ~ Zookeeper ???
- GFS ~ HDFS
- MapReduce ~ MapReduce :)
- SSTable file ~ MapFile file
HBase differences

- the MapFile index is stored in a separate file instead of at the end of the file
as in SSTable

- Unlike Bigtable which identifies a row range by the table name
and end-key, HBase identifies a row range by the table name
and start-key

- when the HBase master dies, the cluster will shut down
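The end-key versus start-key difference comes down to which binary search finds the range that holds a row. The sketch below uses four ranges split at "g", "n" and "t"; the sentinel keys and the treatment of boundary rows (end key inclusive in one scheme, start key inclusive in the other) are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

# Four ranges over the key space, split at "g", "n" and "t".
END_KEYS = ["g", "n", "t", "\uffff"]   # Bigtable style: a range is named by its end key
START_KEYS = ["", "g", "n", "t"]       # HBase style: a range is named by its start key


def find_by_end_key(row):
    """Index of the first range whose end key is >= row."""
    return bisect_left(END_KEYS, row)


def find_by_start_key(row):
    """Index of the last range whose start key is <= row."""
    return bisect_right(START_KEYS, row) - 1
```

For rows that do not sit exactly on a split point, both schemes pick the same range; only the search direction differs.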
Performance and use cases
- number of 1000-byte values read/written per second
- table shows the rate per tablet server

           Experiment            1 TS    50 TS   250 TS   500 TS

           random reads          1212     593      479      241
           random reads (mem)   10811    8511     8000     6250
           random writes         8850    3745     3425     2000
           sequential reads      4425    2463     2625     2469
           sequential writes     8547    3623     2451     1905
           scans                15385   10526     9524     7843
Performance and use cases

Scaling:

- aggregate throughput of random reads from memory increases by
almost a factor of 300 as the number of tablet servers increases
by a factor of 500

- significant drop in per-server throughput when going from 1 to
50 tablet servers

- the random read benchmark shows the worst scaling (an
increase in aggregate throughput by only a factor of 100 for a
500-fold increase in number of servers)
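Both scaling factors follow from the per-server rates in the benchmark table: aggregate throughput is the per-server rate multiplied by the number of servers. A quick check, with the function and dictionary names chosen for illustration:

```python
# Per-tablet-server rates (values per second) for 1 and 500 tablet servers.
PER_SERVER_RATE = {
    "random reads":       {1: 1212,  500: 241},
    "random reads (mem)": {1: 10811, 500: 6250},
}


def aggregate_speedup(rates, servers=500):
    """Aggregate throughput with `servers` machines relative to one machine."""
    return rates[servers] * servers / rates[1]


for name, rates in PER_SERVER_RATE.items():
    print(f"{name}: x{aggregate_speedup(rates):.0f}")
```

This gives roughly 99 for random reads and roughly 289 for random reads from memory, matching the "factor of 100" and "almost a factor of 300" quoted above.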
Performance and use cases

Real users:

- Google Analytics:
   - It provides aggregate statistics, such as the number of
unique visitors per day and the page views per URL per day
   - two tables - raw user clicks (~200 TB, compressed to 14% of original size)
and summary (~20 TB, compressed to 29% of original size)

- Google Earth:
   - Google operates a collection of services that provide users
with access to high-resolution satellite imagery of the world’s
surface
   - two tables - imagery (~70 TB, no compression) and cache (~500 GB,
served from hundreds of tablet servers)
Cooked by Vast: RecordStack

Introduction:

- long-term archival storage of vertical data on HDFS
- optimized for later analytical processing
- stores:
   - listings data
   - transactional log data
   - user clicks
   - leads
   - other reporting stuff
Cooked by Vast: RecordStack

Goals:

- Efficient archival of records to HDFS - conserving space
- Checksumming records and tracking changes
- Optimization of the HDFS-stored structure by periodic data
compaction
- Simple access from Hadoop tasks for analytical processing
- Simple access via client application - Remote console
- generation of reports
Some thoughts for discussion

- building Bigtable on top of a number of "classic" databases?

- exposing Chubby to clients - is it required?

- why not open-source Bigtable?

- running tablet servers on data nodes - closer to the data?
Useful links

- Scalability: http://scalability.rs/

- NoSQL Summer: http://nosqlsummer.org/

- Google Bigtable paper: http://tinyurl.com/37rlevv

- HBase Architecture: http://tinyurl.com/3y3fhbk

- HBase related blog: http://www.larsgeorge.com/

- Hadoop: The Definitive Guide: http://tinyurl.com/36ponz3
If you, maybe, want to contact me?



e-mail: vanjakom@gmail.com, vanja@vast.com

skype: komadinovic.vanja

mobile: +381 (64) 296 03 43

twitter: vanjakom
Thanks !!!
