SlideShare a Scribd company logo
HBASE Backups



Pritam Damania
Software Engineer, Facebook
Aug 1, 2012
Agenda
 1   Introduction to HBASE and HDFS

 2   Introduction to HBASE Backups

 3   Facebook’s Backup solution

 4   Results

 5   Further Work
INTRODUCTION TO HDFS
What is HDFS ?


▪  Distributed   FileSystem
▪  Runs    on top of commodity hardware
▪  Scale   to Petabytes of data
▪  Tolerates   machine failures
HDFS Data Model

▪  Data    is logically organized into files and directories
▪  Files   are divided into uniform-sized blocks
▪  Blocks
        are distributed across the nodes of the cluster and are replicated
 to handle hardware failure
▪  HDFS     keeps checksums of data for corruption detection and recovery
▪  HDFS     exposes block placement so that computation can be migrated to
 data
HDFS Data Model (2)

            MetaInfo(Filename, replicationFactor, block-ids, …)
            /users/user1/data/part-0, repl:2, ids: {1,3}, …
            /users/user1/data/part-1, repl:3, ids: {2,4,5}, …


                                Block Storage

1                           2
                                          1       4         2     5
        2


                                           3                      4
            3         4
    5                                               5



                                                  7
HDFS Architecture
                                               Metadata (Name, #replicas, …):
                          Namenode                 /users/foo/data, 3, …
       Metadata ops
                                             Block ops
  Client               Metadata ops

Read              Datanodes                                 Datanodes
                                       Replication

                                                                     Blocks


                               Write                        Rack 2
            Rack 1
                            Client
                                                     8
INTRODUCTION TO HBASE
HBase in a nutshell

§  distributed, large-scale data store

§  can host very large tables, billions of rows x millions of columns

§  efficient at random reads/writes

§  open source project modeled after Google’s BigTable
HBase Data Model
•  An HBase table is:
 •    a sparse , three-dimensional array of cells, indexed by:
       RowKey, ColumnKey, Timestamp/Version

 •    sharded into regions along an ordered RowKey space

•  Within each region:
 •    Data is grouped into column families
 ▪    Sort order within each column family:

      •  Row Key (asc), Column Key (asc), Timestamp (desc)
HBase System Overview
                            Database Layer
   HBASE
       Master     Backup Master
   Region           Region             Region    ...
   Server           Server             Server

            Storage Layer                              Coordination Service

HDFS                                            Zookeeper Quorum
   Namenode         Secondary Namenode          ZK         ZK         ...
                                                Peer       Peer
Datanode Datanode       Datanode       ...
HBase Overview
HBASE Region Server
             ....
          Region #2
       Region #1
                    ....
                ColumnFamily #2
            ColumnFamily #1         Memstore
                             (in memory data structure)


               HFiles (in HDFS)                       flush



  Write Ahead Log ( in HDFS)
INTRODUCTION TO HBASE
       BACKUPS
Why Backups ?


▪  Data   Corruption
▪  Operational   error
▪  Hardware   failures
▪  Disaster
Hbase Backups – The Problem

▪  Need   a consistent, point in time backup
▪  Issues   :
 ▪    Live cluster, with traffic
 ▪    Data in MemStore
 ▪    Flushes and Compations in the background
 ▪    Regionserver death
 ▪    Regions moving
CURRENT OPTIONS – Export Table
▪  Pros   :
 ▪    Can export part or full table
 ▪    Map-Reduce job downloads data to output path provided
 ▪    Supports start time, end time and versions so could provide a
      consistent backup
 ▪    Can specify which Column Families to export
▪  Cons       :
 ▪    Only one table at a time
 ▪    Full scans and random reads
CURRENT OPTIONS - Copy Table
▪  Tool   to copy existing table to a intra/inter cluster
▪  Pros   :
 ▪    Another parallel replicated setup to switch
 ▪    Supports start time, end time, and versions
 ▪    Cluster being copied to could be in different setup
 ▪    Can specify which Column Families to export
▪  Cons       :
 ▪    Keep another HBASE cluster up and ready
 ▪    Full scans and random reads
Facebook’s Backup Solution
Backups V1
                Log(Put
                A)
  Application             Backup
                          Cluster
                Log(Put
                A)
        Put A             Dedup

    HBase                 Verify
Backups V1 – Pros and Cons
▪  Pros   :
 ▪    Simple solution
 ▪    Consistency in backup
 ▪    Point in time restore
 ▪    Verification of backups
▪  Cons       :
 ▪    Requires replay of large amount of transactions
 ▪    Requires double writes and deduplication
Backups V2

               Flush Region
 RegionServe   Get File List           Mapper
      r


     Flush                     Copy
                               Files


                      HDFS
                                                .regioninfo
Backups V2 – Tuning


▪  Locality   based mappers
▪  Use   in rack replication
▪  Increase   .Trash retention for HDFS
▪  Fault   tolerant
▪  Use   Backups V1 for point in time
Backups V2 – Restore



▪  Rewrite   backed up .regioninfo
▪  Move   backup copy in place
▪  Add   regions to .META using .regioninfo
Backups V2 – Pros and Cons
▪  Pros   :
 ▪    Faster restore
 ▪    Backup entire data in hours
 ▪    Consistency in backup
 ▪    Point in time restore
 ▪    Resilient to RS death, region moves
▪  Cons       :
 ▪    Affects production cluster
 ▪    Not scalable with data growth
Backups V2 – HDFS Improvements


▪  Overhead    of copying large files
▪  Use   locality of data
▪  HDFS    HFiles are immutable
▪  HDFS    blocks are immutable
▪  Hardlinks   at block level!
Fast Copy workflow
           Source                                                       Destination

B1    B2   ………………..                                         B1’   B2’    ……………………
                                                                         ….




FastCopy Client                           Add Block
                                       Create Destination
                                          Get Source              NameNode




                          Copy Block




     B1             B1’                    B1         B1’         B1                  B1’

     B2             B2’                    B2         B2’         B2                  B2’

      Date Node1                             Date Node2                 Date Node3
FastCopy – Pros and Cons

▪  Pros   :
 ▪    Extremely fast
 ▪    Lots of space saving
 ▪    Minimal impact to production cluster
▪  Cons       :
 ▪    NameNode not aware
 ▪    Hardlinks lost on datanode death
 ▪    Balancer not aware.
Operations


▪  Messages    Use Case :
 ▪    3 stage (same cluster, off cluster, off data center)
 ▪    Stage 1 : once/ day
 ▪    Stage 2 : once / 10 day
 ▪    Stage 3 : once / 10 day
 ▪    Retention based on capacity
Results
Backup Numbers

Example :
▪    40 TB table
▪    49 Mappers
▪    Normal Copy – 15 hours
▪    Fast Copy – 1.5 hours
Disk Savings - FastCopy



Disk
usage in
percent
Network Traffic - FastCopy
Further Work
Further Work

▪    Backup HLogs
▪    Point in time backups
▪    Namenode level Hard links


▪    Code and JIRAs :
     ▪    HBASE 4618
     ▪    HDFS code in github (https://github.com/facebook/hadoop-20)
Acknowledgements

▪    Madhuwanti Vaidya

▪    Ryan Thiessen

▪    Karthik Ranganathan

▪    Paul Tuckfield

▪    Kannan Muthukkaruppan

▪    Hairong Kuang

▪    Dhruba Borthakur

▪    Amitanand Aiyer

▪    Mikhail Bautin
Questions ?
(c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0

More Related Content

What's hot

HBase 2.0 cluster topology
HBase 2.0 cluster topologyHBase 2.0 cluster topology
HBase 2.0 cluster topology
Mikhail Antonov
 
A DBA’s guide to using TSA
A DBA’s guide to using TSAA DBA’s guide to using TSA
A DBA’s guide to using TSA
Frederik Engelen
 
Ibm db2 10.5 for linux, unix, and windows getting started with db2 installa...
Ibm db2 10.5 for linux, unix, and windows   getting started with db2 installa...Ibm db2 10.5 for linux, unix, and windows   getting started with db2 installa...
Ibm db2 10.5 for linux, unix, and windows getting started with db2 installa...
bupbechanhgmail
 
A First Look at the DB2 10 DSNZPARM Changes
A First Look at the DB2 10 DSNZPARM ChangesA First Look at the DB2 10 DSNZPARM Changes
A First Look at the DB2 10 DSNZPARM ChangesWillie Favero
 
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksHadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Cloudera, Inc.
 
DB2 V 10 HADR Multiple Standby
DB2 V 10 HADR Multiple StandbyDB2 V 10 HADR Multiple Standby
DB2 V 10 HADR Multiple StandbyDale McInnis
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
Delhi/NCR HUG
 
11 cool features in Defrag.nsf+ 11
11 cool features in Defrag.nsf+ 1111 cool features in Defrag.nsf+ 11
11 cool features in Defrag.nsf+ 11
aosborne
 
Hadoop Distributed File System(HDFS) : Behind the scenes
Hadoop Distributed File System(HDFS) : Behind the scenesHadoop Distributed File System(HDFS) : Behind the scenes
Hadoop Distributed File System(HDFS) : Behind the scenes
Nitin Khattar
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
Mahendran Ponnusamy
 
Storage infrastructure using HBase behind LINE messages
Storage infrastructure using HBase behind LINE messagesStorage infrastructure using HBase behind LINE messages
Storage infrastructure using HBase behind LINE messages
LINE Corporation (Tech Unit)
 
Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment Evolution
Benoit Perroud
 
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and ScalabilityHDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
Hortonworks
 
Less is More: 2X Storage Efficiency with HDFS Erasure Coding
Less is More: 2X Storage Efficiency with HDFS Erasure CodingLess is More: 2X Storage Efficiency with HDFS Erasure Coding
Less is More: 2X Storage Efficiency with HDFS Erasure Coding
Zhe Zhang
 
Hadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHadoop HDFS NameNode HA
Hadoop HDFS NameNode HA
Hanborq Inc.
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
shrey mehrotra
 
JONSMITH10042016
JONSMITH10042016JONSMITH10042016
JONSMITH10042016
Jon Smith
 

What's hot (20)

HBase 2.0 cluster topology
HBase 2.0 cluster topologyHBase 2.0 cluster topology
HBase 2.0 cluster topology
 
A DBA’s guide to using TSA
A DBA’s guide to using TSAA DBA’s guide to using TSA
A DBA’s guide to using TSA
 
Ibm db2 10.5 for linux, unix, and windows getting started with db2 installa...
Ibm db2 10.5 for linux, unix, and windows   getting started with db2 installa...Ibm db2 10.5 for linux, unix, and windows   getting started with db2 installa...
Ibm db2 10.5 for linux, unix, and windows getting started with db2 installa...
 
A First Look at the DB2 10 DSNZPARM Changes
A First Look at the DB2 10 DSNZPARM ChangesA First Look at the DB2 10 DSNZPARM Changes
A First Look at the DB2 10 DSNZPARM Changes
 
Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)Big data- HDFS(2nd presentation)
Big data- HDFS(2nd presentation)
 
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksHadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
 
DB2 V 10 HADR Multiple Standby
DB2 V 10 HADR Multiple StandbyDB2 V 10 HADR Multiple Standby
DB2 V 10 HADR Multiple Standby
 
Hadoop Architecture
Hadoop ArchitectureHadoop Architecture
Hadoop Architecture
 
11 cool features in Defrag.nsf+ 11
11 cool features in Defrag.nsf+ 1111 cool features in Defrag.nsf+ 11
11 cool features in Defrag.nsf+ 11
 
Hadoop Distributed File System(HDFS) : Behind the scenes
Hadoop Distributed File System(HDFS) : Behind the scenesHadoop Distributed File System(HDFS) : Behind the scenes
Hadoop Distributed File System(HDFS) : Behind the scenes
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Dnssec
DnssecDnssec
Dnssec
 
Storage infrastructure using HBase behind LINE messages
Storage infrastructure using HBase behind LINE messagesStorage infrastructure using HBase behind LINE messages
Storage infrastructure using HBase behind LINE messages
 
Hadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment EvolutionHadoop Successes and Failures to Drive Deployment Evolution
Hadoop Successes and Failures to Drive Deployment Evolution
 
INFLOW-2014-NVM-Compression
INFLOW-2014-NVM-CompressionINFLOW-2014-NVM-Compression
INFLOW-2014-NVM-Compression
 
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and ScalabilityHDFS Futures: NameNode Federation for Improved Efficiency and Scalability
HDFS Futures: NameNode Federation for Improved Efficiency and Scalability
 
Less is More: 2X Storage Efficiency with HDFS Erasure Coding
Less is More: 2X Storage Efficiency with HDFS Erasure CodingLess is More: 2X Storage Efficiency with HDFS Erasure Coding
Less is More: 2X Storage Efficiency with HDFS Erasure Coding
 
Hadoop HDFS NameNode HA
Hadoop HDFS NameNode HAHadoop HDFS NameNode HA
Hadoop HDFS NameNode HA
 
Introduction to hadoop and hdfs
Introduction to hadoop and hdfsIntroduction to hadoop and hdfs
Introduction to hadoop and hdfs
 
JONSMITH10042016
JONSMITH10042016JONSMITH10042016
JONSMITH10042016
 

Similar to Facebook's HBase Backups - StampedeCon 2012

Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
强 王
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统yongboy
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
Uwe Printz
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)baggioss
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
Uwe Printz
 
Putting Wings on the Elephant
Putting Wings on the ElephantPutting Wings on the Elephant
Putting Wings on the ElephantDataWorks Summit
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
Cloudera, Inc.
 
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
Michael Stack
 
Facebook's Approach to Big Data Storage Challenge
Facebook's Approach to Big Data Storage ChallengeFacebook's Approach to Big Data Storage Challenge
Facebook's Approach to Big Data Storage Challenge
DataWorks Summit
 
Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010
Cloudera, Inc.
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at Facebook
MariaDB plc
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
Yoshinori Matsunobu
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)mundlapudi
 
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...Ontico
 
CephFS in Jewel: Stable at Last
CephFS in Jewel: Stable at LastCephFS in Jewel: Stable at Last
CephFS in Jewel: Stable at Last
Ceph Community
 
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaApache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaYahoo Developer Network
 
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
VMware Tanzu
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
Chris Nauroth
 

Similar to Facebook's HBase Backups - StampedeCon 2012 (20)

Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统支撑Facebook消息处理的h base存储系统
支撑Facebook消息处理的h base存储系统
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)[Hi c2011]building mission critical messaging system(guoqiang jerry)
[Hi c2011]building mission critical messaging system(guoqiang jerry)
 
Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?Hadoop 3.0 - Revolution or evolution?
Hadoop 3.0 - Revolution or evolution?
 
Putting Wings on the Elephant
Putting Wings on the ElephantPutting Wings on the Elephant
Putting Wings on the Elephant
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
 
Facebook's Approach to Big Data Storage Challenge
Facebook's Approach to Big Data Storage ChallengeFacebook's Approach to Big Data Storage Challenge
Facebook's Approach to Big Data Storage Challenge
 
Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010Facebook - Jonthan Gray - Hadoop World 2010
Facebook - Jonthan Gray - Hadoop World 2010
 
Migrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at FacebookMigrating from InnoDB and HBase to MyRocks at Facebook
Migrating from InnoDB and HBase to MyRocks at Facebook
 
MyRocks introduction and production deployment
MyRocks introduction and production deploymentMyRocks introduction and production deployment
MyRocks introduction and production deployment
 
Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)Hadoop - Disk Fail In Place (DFIP)
Hadoop - Disk Fail In Place (DFIP)
 
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
Cистема распределенного, масштабируемого и высоконадежного хранения данных дл...
 
CephFS in Jewel: Stable at Last
CephFS in Jewel: Stable at LastCephFS in Jewel: Stable at Last
CephFS in Jewel: Stable at Last
 
CLFS 2010
CLFS 2010CLFS 2010
CLFS 2010
 
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay RadiaApache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia
 
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
Hadoop - Just the Basics for Big Data Rookies (SpringOne2GX 2013)
 
Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4Hdfs 2016-hadoop-summit-san-jose-v4
Hdfs 2016-hadoop-summit-san-jose-v4
 
Hadoop, Taming Elephants
Hadoop, Taming ElephantsHadoop, Taming Elephants
Hadoop, Taming Elephants
 

More from StampedeCon

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
StampedeCon
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
StampedeCon
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
StampedeCon
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
StampedeCon
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
StampedeCon
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
StampedeCon
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
StampedeCon
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
StampedeCon
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
StampedeCon
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
StampedeCon
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
StampedeCon
 
A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017
StampedeCon
 
Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017
StampedeCon
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
StampedeCon
 
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
StampedeCon
 
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
StampedeCon
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
StampedeCon
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
StampedeCon
 
Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016
StampedeCon
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
StampedeCon
 

More from StampedeCon (20)

Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
Why Should We Trust You-Interpretability of Deep Neural Networks - StampedeCo...
 
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
The Search for a New Visual Search Beyond Language - StampedeCon AI Summit 2017
 
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
Predicting Outcomes When Your Outcomes are Graphs - StampedeCon AI Summit 2017
 
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
Novel Semi-supervised Probabilistic ML Approach to SNP Variant Calling - Stam...
 
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
How to Talk about AI to Non-analaysts - Stampedecon AI Summit 2017
 
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
Getting Started with Keras and TensorFlow - StampedeCon AI Summit 2017
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
 
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
Don't Start from Scratch: Transfer Learning for Novel Computer Vision Problem...
 
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
Bringing the Whole Elephant Into View Can Cognitive Systems Bring Real Soluti...
 
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
Automated AI The Next Frontier in Analytics - StampedeCon AI Summit 2017
 
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017AI in the Enterprise: Past,  Present &  Future - StampedeCon AI Summit 2017
AI in the Enterprise: Past, Present & Future - StampedeCon AI Summit 2017
 
A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017A Different Data Science Approach - StampedeCon AI Summit 2017
A Different Data Science Approach - StampedeCon AI Summit 2017
 
Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017Graph in Customer 360 - StampedeCon Big Data Conference 2017
Graph in Customer 360 - StampedeCon Big Data Conference 2017
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
 
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
Doing Big Data Using Amazon's Analogs - StampedeCon Big Data Conference 2017
 
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
Enabling New Business Capabilities with Cloud-based Streaming Data Architectu...
 
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
Big Data Meets IoT: Lessons From the Cloud on Polling, Collecting, and Analyz...
 
Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016Innovation in the Data Warehouse - StampedeCon 2016
Innovation in the Data Warehouse - StampedeCon 2016
 
Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016Creating a Data Driven Organization - StampedeCon 2016
Creating a Data Driven Organization - StampedeCon 2016
 
Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016Using The Internet of Things for Population Health Management - StampedeCon 2016
Using The Internet of Things for Population Health Management - StampedeCon 2016
 

Recently uploaded

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 

Facebook's HBase Backups - StampedeCon 2012

  • 1.
  • 2. HBASE Backups Pritam Damania Software Engineer, Facebook Aug 1, 2012
  • 3. Agenda 1 Introduction to HBASE and HDFS 2 Introduction to HBASE Backups 3 Facebook’s Backup solution 4 Results 5 Further Work
  • 5. What is HDFS ? ▪  Distributed FileSystem ▪  Runs on top of commodity hardware ▪  Scale to Petabytes of data ▪  Tolerates machine failures
  • 6. HDFS Data Model ▪  Data is logically organized into files and directories ▪  Files are divided into uniform-sized blocks ▪  Blocks are distributed across the nodes of the cluster and are replicated to handle hardware failure ▪  HDFS keeps checksums of data for corruption detection and recovery ▪  HDFS exposes block placement so that computation can be migrated to data
  • 7. HDFS Data Model (2) MetaInfo(Filename, replicationFactor, block-ids, …) /users/user1/data/part-0, repl:2, ids: {1,3}, … /users/user1/data/part-1, repl:3, ids: {2,4,5}, … Block Storage 1 2 1 4 2 5 2 3 4 3 4 5 5 7
  • 8. HDFS Architecture Metadata (Name, #replicas, …): Namenode /users/foo/data, 3, … Metadata ops Block ops Client Metadata ops Read Datanodes Datanodes Replication Blocks Write Rack 2 Rack 1 Client 8
  • 10. HBase in a nutshell §  distributed, large-scale data store §  can host very large tables, billions of rows x millions of columns §  efficient at random reads/writes §  open source project modeled after Google’s BigTable
  • 11. HBase Data Model •  An HBase table is: •  a sparse , three-dimensional array of cells, indexed by: RowKey, ColumnKey, Timestamp/Version •  sharded into regions along an ordered RowKey space •  Within each region: •  Data is grouped into column families ▪  Sort order within each column family: •  Row Key (asc), Column Key (asc), Timestamp (desc)
  • 12. HBase System Overview Database Layer HBASE Master Backup Master Region Region Region ... Server Server Server Storage Layer Coordination Service HDFS Zookeeper Quorum Namenode Secondary Namenode ZK ZK ... Peer Peer Datanode Datanode Datanode ...
  • 13. HBase Overview HBASE Region Server .... Region #2 Region #1 .... ColumnFamily #2 ColumnFamily #1 Memstore (in memory data structure) HFiles (in HDFS) flush Write Ahead Log ( in HDFS)
  • 15. Why Backups ? ▪  Data Corruption ▪  Operational error ▪  Hardware failures ▪  Disaster
  • 16. Hbase Backups – The Problem ▪  Need a consistent, point in time backup ▪  Issues : ▪  Live cluster, with traffic ▪  Data in MemStore ▪  Flushes and Compations in the background ▪  Regionserver death ▪  Regions moving
  • 17. CURRENT OPTIONS – Export Table ▪  Pros : ▪  Can export part or full table ▪  Map-Reduce job downloads data to output path provided ▪  Supports start time, end time and versions so could provide a consistent backup ▪  Can specify which Column Families to export ▪  Cons : ▪  Only one table at a time ▪  Full scans and random reads
  • 18. CURRENT OPTIONS - Copy Table ▪  Tool to copy existing table to a intra/inter cluster ▪  Pros : ▪  Another parallel replicated setup to switch ▪  Supports start time, end time, and versions ▪  Cluster being copied to could be in different setup ▪  Can specify which Column Families to export ▪  Cons : ▪  Keep another HBASE cluster up and ready ▪  Full scans and random reads
  • 20. Backups V1 Log(Put A) Application Backup Cluster Log(Put A) Put A Dedup HBase Verify
  • 21. Backups V1 – Pros and Cons ▪  Pros : ▪  Simple solution ▪  Consistency in backup ▪  Point in time restore ▪  Verification of backups ▪  Cons : ▪  Requires replay of large amount of transactions ▪  Requires double writes and deduplication
  • 22. Backups V2 Flush Region RegionServe Get File List Mapper r Flush Copy Files HDFS .regioninfo
  • 23. Backups V2 – Tuning ▪  Locality based mappers ▪  Use in rack replication ▪  Increase .Trash retention for HDFS ▪  Fault tolerant ▪  Use Backups V1 for point in time
  • 24. Backups V2 – Restore ▪  Rewrite backed up .regioninfo ▪  Move backup copy in place ▪  Add regions to .META using .regioninfo
  • 25. Backups V2 – Pros and Cons ▪  Pros : ▪  Faster restore ▪  Backup entire data in hours ▪  Consistency in backup ▪  Point in time restore ▪  Resilient to RS death, region moves ▪  Cons : ▪  Affects production cluster ▪  Not scalable with data growth
  • 26. Backups V2 – HDFS Improvements ▪  Overhead of copying large files ▪  Use locality of data ▪  HDFS HFiles are immutable ▪  HDFS blocks are immutable ▪  Hardlinks at block level!
  • 27. Fast Copy workflow Source Destination B1 B2 ……………….. B1’ B2’ …………………… …. FastCopy Client Add Block Create Destination Get Source NameNode Copy Block B1 B1’ B1 B1’ B1 B1’ B2 B2’ B2 B2’ B2 B2’ Date Node1 Date Node2 Date Node3
  • 28. FastCopy – Pros and Cons ▪  Pros : ▪  Extremely fast ▪  Lots of space saving ▪  Minimal impact to production cluster ▪  Cons : ▪  NameNode not aware ▪  Hardlinks lost on datanode death ▪  Balancer not aware.
  • 29. Operations ▪  Messages Use Case : ▪  3 stage (same cluster, off cluster, off data center) ▪  Stage 1 : once/ day ▪  Stage 2 : once / 10 day ▪  Stage 3 : once / 10 day ▪  Retention based on capacity
  • 31. Backup Numbers Example : ▪  40 TB table ▪  49 Mappers ▪  Normal Copy – 15 hours ▪  Fast Copy – 1.5 hours
  • 32. Disk Savings - FastCopy Disk usage in percent
  • 33. Network Traffic - FastCopy
  • 35. Further Work ▪  Backup HLogs ▪  Point in time backups ▪  Namenode level Hard links ▪  Code and JIRAs : ▪  HBASE 4618 ▪  HDFS code in github (https://github.com/facebook/hadoop-20)
  • 36. Acknowledgements ▪  Madhuwanti Vaidya ▪  Ryan Thiessen ▪  Karthik Ranganathan ▪  Paul Tuckfield ▪  Kannan Muthukkaruppan ▪  Hairong Kuang ▪  Dhruba Borthakur ▪  Amitanand Aiyer ▪  Mikhail Bautin
  • 38. (c) 2009 Facebook, Inc. or its licensors. "Facebook" is a registered trademark of Facebook, Inc.. All rights reserved. 1.0