Google Bigtable
The magic behind Google’s
data management
Overview
● Introduction
● Challenges
● Data model
● Building blocks
● Conclusion
Intro
➔ Bigtable is Google’s cloud-based data storage service.
➔ It runs on a distributed, parallel architecture built from clusters of machines.
➔ It is self-managing, highly scalable, fault tolerant and flexible.
➔ Bigtable provides low-latency, real-time access and high-throughput processing of heavy workloads.
➔ It integrates with other products and services through APIs.
➔ Many Google services store data in Bigtable, including Gmail, YouTube, web indexing, Google Maps and Google Analytics.
Original Idea
Challenges
Jeffrey Dean and Sanjay Ghemawat decided to build a datastore service that could scale linearly across thousands of commodity servers.
● Using cheap hardware makes component failures likely.
● How to retain performance at high scale?
-- Compromise on a few things:
>Abandon the traditional relational model (no joins)
>Replicate data
>Use a parallel and distributed architecture
Data Model
A Bigtable is a sparse, distributed, persistent multi-dimensional sorted map. The map is indexed by a
row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.
Bigtable treats data as uninterpreted strings, whether the data is structured or unstructured.
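The map described above can be sketched in a few lines of Python. This is only an in-memory illustration, not how Bigtable is implemented: the class name `TinyTable`, its methods, and the sample row and column keys are all made up for the example.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical stand-in for a Bigtable: (row key, column key, timestamp) -> bytes.
Cell = Tuple[str, str, int]


class TinyTable:
    def __init__(self) -> None:
        # Sparse: only cells that were written take up space.
        self._cells: Dict[Cell, bytes] = {}

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        self._cells[(row, column, timestamp)] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the value at `timestamp`, or the newest version if none is given."""
        versions = {ts: v for (r, c, ts), v in self._cells.items()
                    if r == row and c == column}
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def scan(self) -> Iterator[Tuple[Cell, bytes]]:
        """Yield cells sorted by row key, then column key, newest version first."""
        for key in sorted(self._cells, key=lambda k: (k[0], k[1], -k[2])):
            yield key, self._cells[key]


table = TinyTable()
table.put("com.cnn.www", "contents:", 3, b"<html>v3</html>")
table.put("com.cnn.www", "contents:", 5, b"<html>v5</html>")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, b"CNN")
table.put("com.example", "contents:", 1, b"<html>hi</html>")

latest = table.get("com.cnn.www", "contents:")      # newest version of the cell
older = table.get("com.cnn.www", "contents:", 3)    # a specific version
missing = table.get("com.example", "anchor:cnnsi.com")  # never written, so None
```

Values stay as raw `bytes` throughout, which mirrors the point that the store does not interpret them.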
● Rows
➔ The row keys in a table are arbitrary strings.
➔ Data is maintained in lexicographic order by row key.
➔ The row range of a table is dynamically partitioned; each row range is called a tablet, which is the unit of distribution and load balancing.
● Columns
➔ Column keys are grouped into sets called column families.
➔ Data stored in a column family is usually of the same type.
➔ A column key is named using the syntax family:qualifier.
➔ Column family names must be printable, but qualifiers may be arbitrary strings.
● Timestamps
➔ Each cell in a Bigtable can contain multiple versions of the same data.
➔ Versions are indexed by 64-bit integer timestamps.
➔ Timestamps can be assigned automatically by Bigtable or explicitly by client applications.
[Figure: rows, columns, and timestamps]
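To make the row and column rules concrete, here is a rough Python sketch of splitting a `family:qualifier` column key and of cutting lexicographically sorted row keys into tablets. The fixed `max_rows` split rule, the function names, and the sample keys are assumptions for illustration; the slides do not say how tablet boundaries are chosen.

```python
from bisect import bisect_left
from typing import List, Tuple


def split_column_key(column_key: str) -> Tuple[str, str]:
    """Split 'family:qualifier' at the first colon; the qualifier may be empty."""
    family, _, qualifier = column_key.partition(":")
    return family, qualifier


def partition_into_tablets(row_keys: List[str], max_rows: int) -> List[List[str]]:
    """Cut the sorted row keys into contiguous row ranges (tablets).

    Splitting every `max_rows` rows is an assumption made for this sketch.
    """
    ordered = sorted(row_keys)
    return [ordered[i:i + max_rows] for i in range(0, len(ordered), max_rows)]


def tablet_for_row(end_rows: List[str], row_key: str) -> int:
    """Index of the tablet whose range holds row_key, given each tablet's last row."""
    index = bisect_left(end_rows, row_key)
    # Keys past the last end row are sent to the last tablet in this sketch.
    return min(index, len(end_rows) - 1)


rows = ["com.cnn.www", "com.example", "org.apache", "com.google.maps", "org.wikipedia"]
tablets = partition_into_tablets(rows, max_rows=2)
end_rows = [tablet[-1] for tablet in tablets]

family, qualifier = split_column_key("anchor:cnnsi.com")
home = tablet_for_row(end_rows, "com.google.maps")
```

Because the row keys are kept sorted, rows with a shared prefix (here the reversed domain names) end up next to each other, so a range scan touches few tablets.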
Building Blocks
Bigtable is built on several other pieces of Google infrastructure.
● Google File System (GFS): Distributed file system that stores log and data files.
● SSTable: File format used to store Bigtable data.
● Chubby: Distributed lock service.
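An SSTable is, in essence, an immutable map from keys to values that is kept in key order. The Python sketch below shows that idea in memory only. The class `SSTableSketch` and its methods are hypothetical names, and the real on-disk format (blocks, block index, storage in GFS) is not modelled.

```python
from bisect import bisect_left
from typing import Dict, Iterator, List, Optional, Tuple


class SSTableSketch:
    """Immutable, sorted map from byte-string keys to byte-string values."""

    def __init__(self, entries: Dict[bytes, bytes]) -> None:
        items = sorted(entries.items())
        # Built once and never modified afterwards.
        self._keys: List[bytes] = [k for k, _ in items]
        self._values: List[bytes] = [v for _, v in items]

    def get(self, key: bytes) -> Optional[bytes]:
        """Binary-search the sorted keys for an exact match."""
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        """Yield entries with start <= key < end, in key order."""
        i = bisect_left(self._keys, start)
        while i < len(self._keys) and self._keys[i] < end:
            yield self._keys[i], self._values[i]
            i += 1


sstable = SSTableSketch({b"row3": b"c", b"row1": b"a", b"row2": b"b"})
hit = sstable.get(b"row2")
miss = sstable.get(b"row9")
window = list(sstable.scan(b"row1", b"row3"))
```

Keeping the keys sorted is what makes both point lookups and range scans cheap.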
Three major components
❖ Library linked into every client
❖ Single master server
▪ Assigns tablets to tablet servers
▪ Detects the addition and expiration of tablet servers
▪ Balances tablet-server load
▪ Garbage-collects files in GFS
❖ Many tablet servers
▪ Each manages a set of tablets
▪ Handles read and write requests to its tablets
▪ Splits tablets that have grown too large
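The master's assignment and rebalancing duties can be pictured with a toy Python sketch. The "give each tablet to the least-loaded server" policy, the function names, and the server and tablet names are assumptions chosen for illustration; the slides do not describe the actual policy.

```python
from typing import Dict, List


def assign_tablets(tablets: List[str], servers: List[str]) -> Dict[str, List[str]]:
    """Give each tablet to the server that currently holds the fewest tablets."""
    assignment: Dict[str, List[str]] = {server: [] for server in servers}
    for tablet in tablets:
        target = min(servers, key=lambda s: len(assignment[s]))
        assignment[target].append(tablet)
    return assignment


def reassign_after_expiration(assignment: Dict[str, List[str]],
                              expired: str) -> Dict[str, List[str]]:
    """Move the tablets of an expired server onto the surviving servers.

    Assumes at least one server survives; otherwise min() raises ValueError.
    """
    survivors = [server for server in assignment if server != expired]
    remaining = {server: list(assignment[server]) for server in survivors}
    for tablet in assignment[expired]:
        target = min(survivors, key=lambda s: len(remaining[s]))
        remaining[target].append(tablet)
    return remaining


assignment = assign_tablets(["t1", "t2", "t3", "t4"], ["ts-a", "ts-b", "ts-c"])
after_loss = reassign_after_expiration(assignment, "ts-a")
```

The point of the sketch is only that tablets, not individual rows, are the unit the master moves around.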
Three-level hierarchy
Level 1: Chubby file contains the location of the root tablet
Level 2: Root tablet contains the location of METADATA tablets
Level 3: Each METADATA tablet contains the location of user tablets
▪ The location of a tablet is stored under a row key that encodes the tablet’s table identifier and its end row
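A lookup through the three levels can be sketched as follows. This is a simplified Python model: the `table_id,end_row` key layout follows the slide, but the server names, the `~` sentinel used as the last end row, and the way the root index is keyed are assumptions made for the example.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Each index is a sorted list of (row key, location) pairs; the row key encodes
# "table_id,end_row". All locations below are made-up names.
Index = List[Tuple[str, str]]

CHUBBY_FILE = "root-tablet@ts-1"   # level 1: where the root tablet lives

ROOT_TABLET: Index = [             # level 2: locations of METADATA tablets
    ("users,m", "metadata-0@ts-2"),
    ("users,~", "metadata-1@ts-3"),
]

METADATA_TABLETS: Dict[str, Index] = {   # level 3: locations of user tablets
    "metadata-0@ts-2": [("users,f", "users-0@ts-4"), ("users,m", "users-1@ts-5")],
    "metadata-1@ts-3": [("users,t", "users-2@ts-6"), ("users,~", "users-3@ts-7")],
}


def find_entry(index: Index, key: str) -> str:
    """Return the location stored under the first row key >= key."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i == len(keys):
        raise KeyError(key)
    return index[i][1]


def locate(table_id: str, row_key: str) -> List[str]:
    """Walk Chubby file -> root tablet -> METADATA tablet -> user tablet."""
    key = f"{table_id},{row_key}"
    metadata_location = find_entry(ROOT_TABLET, key)
    user_location = find_entry(METADATA_TABLETS[metadata_location], key)
    return [CHUBBY_FILE, metadata_location, user_location]


path = locate("users", "alice")
```

Storing each tablet under its end row means one ordered search per level is enough to find the tablet that covers a given row.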
“All models are wrong. Some models are useful.”
- George Box, “one of the great statistical minds of the 20th century”
Distributed and parallel computing has paved the way for new technologies to flourish.
Conclusion
Bigtable provides low-latency, real-time access and handles heavy workloads with high scalability and high throughput. Its robust, fault-tolerant architecture reduces the risk of data loss. Reliable cluster resizing lets users provision or de-provision clusters with no downtime. Autonomous management frees users from managing tasks and data assignment, because Bigtable does this automatically. Integration with other products and services through APIs makes it a general-purpose data store, extending its capability and giving users a reliable interface for getting more out of less. Bigtable uses a parallel and distributed architecture to process data quickly while reducing the cost per computation, and its back-end architecture has proved itself in performance and user experience.
With demand for large-scale cloud data storage growing, Bigtable has become one of the best available solutions, offering lower cost, high performance, durability and flexibility. It already powers many of Google’s services, which demonstrates its usability. It is the magic behind Google’s data management and high-performance operation, giving Google an edge over other giants in the field.
Thanks for giving your precious time!

