June 13, 2012

HBase Consistency and
Performance Improvements
Esteban Gutierrez, Gregory Chanan
{esteban, gchanan}@cloudera.com

HBase Consistency

    •  ACID guarantees within a single row
    •  “Any row returned by the scan will be a
       consistent view (i.e. that version of the
       complete row existed at some point in
       time)”[1]

    [1] http://hbase.apache.org/acid-semantics.html



                       ©2012 Cloudera, Inc. All Rights Reserved.
HBase Consistency Issues

    •  Write Consistency Issues
    •  Read Consistency Issues




Write Consistency

    HBASE-4552

    •  Importing HFiles for multiple CFs
       was not an atomic operation




Write Consistency

    HBASE-4552
                  HRegion.bulkLoadHFile()

                  HFile1:      HFile2:      HFile3:      HFile4:
    Row 1         fam1:col1    fam2:col2    fam3:col3    fam4:col4

    T1   Scan     val1
    T2   Scan     val1         val2
    T3   Scan     val1         val2         val3
    T4   Scan     val1         val2         val3         val4

                                                   < HBase 0.90.5


Write Consistency

    HBASE-4552
                  HRegion.bulkLoadHFiles()

                  HFile1:      HFile2:      HFile3:      HFile4:
    Row 1         fam1:col1    fam2:col2    fam3:col3    fam4:col4

    public void bulkLoadHFiles(List<Pair<byte[], String>> familyPaths) {
      ...
      startRegionOperation();          // lock.writeLock().lock()
      try {
        ...
      } finally {
        closeBulkRegionOperation();    // lock.writeLock().unlock()
      }
      ...
    }

    T1   Scan
    T2   Scan
    T3   Scan
    T4   Scan     val1         val2         val3         val4

                                                   ≥ HBase 0.90.5
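
The effect of taking one write lock around the whole multi-family load can be illustrated with a small, self-contained sketch. `ToyRegion` and its methods are hypothetical names used only for illustration, not HBase classes, and the sketch assumes that scans take the matching read lock.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a region; this is not the HBase HRegion class.
final class ToyRegion {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> cells = new TreeMap<>();

    // Adds one value per column family while holding the write lock,
    // so no scan can run between the first and the last family.
    void bulkLoad(Map<String, String> valueByColumn) {
        lock.writeLock().lock();
        try {
            for (Map.Entry<String, String> e : valueByColumn.entrySet()) {
                cells.put(e.getKey(), e.getValue());
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Takes the read lock (an assumption of this sketch) and returns a snapshot copy.
    Map<String, String> scan() {
        lock.readLock().lock();
        try {
            return new TreeMap<>(cells);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With this arrangement a scan sees either none of the loaded values or all of them, which matches the T1 to T4 picture for HBase ≥ 0.90.5.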


Read Consistency

 HBASE-2856

 •  Seen only twice in the wild
 •  Hard to detect unless application
    monitoring is in place


Read Consistency

 HBASE-2856
 •  Table size ≈ 50 M records
 •  Large number of CFs
 •  New records are continuously added to
    the table
 •  Concurrent MR Jobs on the same table
 •  Cluster has to meet strict SLAs


Read Consistency

 HBASE-2856
 Symptoms

     Group                  Counter               Run 1     Run 2     Run 3
     …                      …                     …         …         …
     Map-Reduce Framework   SPLIT_RAW_FILES       …         …         …
                            Map output records    500,000   499,997   500,001

     Some rows come back with column families missing:

     cf1:col1        cf2:col2             cf3:col3
     cf1:col1
                     cf2:col2             cf3:col3
     cf1:col1

     Scale testing shows between 0.5% and 2% inconsistent results between runs


Read Consistency

 HBASE-2856
 Impact
 •  Result is used to update user-facing
    records
 •  Customer is not happy
     –  “Where is my data?”




Read Consistency

 HBASE-2856
 Workarounds
 •  Re-try scan if not all CFs are present
 •  Re-submit job if any inconsistency is found
 •  Sometimes that is not possible because of SLAs
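
The first workaround can be sketched as a client-side check. This is a minimal illustration, not HBase client code: `RetryingReader`, the `Supplier`-based reader, and the row modeled as a map from column family to value are all assumptions made for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical client-side helper; a row is modeled as column family -> value.
final class RetryingReader {
    // Re-reads the row until every expected family is present
    // or the attempt budget is exhausted.
    static Map<String, String> readComplete(Supplier<Map<String, String>> reader,
                                            Set<String> expectedFamilies,
                                            int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Map<String, String> row = reader.get();
            if (row.keySet().containsAll(expectedFamilies)) {
                return row;
            }
        }
        throw new IllegalStateException(
                "row still incomplete after " + maxAttempts + " attempts");
    }
}
```

A check like this only catches rows with missing families. It cannot tell whether the families that are present belong to the same version of the row, and every retry costs time against the SLA.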




MVCC

 •  HBase maintains ACID semantics using
    Multiversion Concurrency Control
 •  Instead of overwriting state, create a new
    version of the object with a timestamp
     Timestamp   Row             fam1:col1                          fam2:col2
     t2          row1            val2                               val2
     t1          row1            val1                               val1
 •  Reads never have to block
 •  Note this timestamp is not externally visible!
    Internally called “memStoreTs”
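
A minimal sketch of the idea, assuming a single cell and a single thread: every put appends a version, and a reader only looks at versions at or below its own read point. `MvccCell`, `writeNumber`, and `readPoint` are illustrative names, not HBase identifiers.

```java
import java.util.ArrayList;
import java.util.List;

// Toy multiversion cell: writes append versions, nothing is overwritten.
final class MvccCell {
    private static final class Version {
        final long writeNumber;
        final String value;

        Version(long writeNumber, String value) {
            this.writeNumber = writeNumber;
            this.value = value;
        }
    }

    private final List<Version> versions = new ArrayList<>();

    // Appends a new version; writeNumber plays the role of the internal timestamp.
    void put(long writeNumber, String value) {
        versions.add(new Version(writeNumber, value));
    }

    // Returns the newest value whose write number is <= readPoint, or null if none.
    String get(long readPoint) {
        Version best = null;
        for (Version v : versions) {
            if (v.writeNumber <= readPoint
                    && (best == null || v.writeNumber > best.writeNumber)) {
                best = v;
            }
        }
        return best == null ? null : best.value;
    }
}
```

A reader that fixed its read point at t1 keeps getting val1 after val2 has been written at t2, which is why reads do not need to block writers.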


HBase Write Path

 1.  Write to WAL (per RegionServer)
 2.  Write to In-Memory Sorted Map (MemStore)
     (per Region+ColumnFamily)
 3.  Flush MemStore to disk as an HFile when the
     MemStore reaches the configurable
     hbase.hregion.memstore.flush.size
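
The three steps can be sketched as a toy store. All names here are hypothetical, the size accounting is a crude estimate, and for brevity the log lives inside the store even though the real WAL is per RegionServer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy store for one region and column family; not HBase code.
final class ToyStore {
    private final List<String> wal = new ArrayList<>();                      // step 1
    private TreeMap<String, String> memStore = new TreeMap<>();              // step 2
    private final List<TreeMap<String, String>> hFiles = new ArrayList<>();  // step 3
    private final long flushSizeBytes;
    private long memStoreBytes = 0;

    ToyStore(long flushSizeBytes) {
        this.flushSizeBytes = flushSizeBytes;
    }

    void put(String key, String value) {
        wal.add(key + "=" + value);                      // 1. append to the log first
        memStore.put(key, value);                        // 2. insert into the sorted map
        memStoreBytes += key.length() + value.length();  // crude size estimate
        if (memStoreBytes >= flushSizeBytes) {
            flush();                                     // 3. threshold reached
        }
    }

    // Freezes the current map as an immutable "file" and starts a fresh MemStore.
    private void flush() {
        hFiles.add(memStore);
        memStore = new TreeMap<>();
        memStoreBytes = 0;
    }

    int walEntries() { return wal.size(); }
    int memStoreEntries() { return memStore.size(); }
    int hFileCount() { return hFiles.size(); }
}
```

The constructor argument stands in for `hbase.hregion.memstore.flush.size`; a flush moves the whole sorted map out of memory in one step.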




Internals / Bug




     Now that we know the internals, what could go wrong?




Putting it together
 Let’s go back to the beginning…

                        MemStore
     Timestamp   Row    fam1:col1   fam2:col2
     t1          row1   val1        val1

 And start a scan.
 And concurrently put.

                        MemStore
     Timestamp   Row    fam1:col1   fam2:col2
     t2          row1   val2        val2
     t1          row1   val1        val1

 Which causes a flush.

                        HFile
     Row    fam2:col2
     row1   val2
     row1   val1




Putting it together
 Now, scan needs to make sense of this…

                 MemStore
     Ts     Row    fam1:col1
     t2     row1   val2
     t1     row1   val1

                 HFile
     Row    fam2:col2
     row1   val2
     row1   val1

 But HFile has no timestamp!

                 Inconsistent Result
     Row    fam1:col1   fam2:col2
     row1   val1        val2




Solution
 Store the timestamp in the HFile

          MemStore                          HFile
 Ts   Row    fam1:col1              Ts   Row    fam2:col2
 t2   row1   val2                   t2   row1   val2
 t1   row1   val1                   t1   row1   val1

          Correct Result (scan started at t1)
 Row    fam1:col1   fam2:col2
 row1   val1        val1

 Now we have all the information we need
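
The difference the stored timestamp makes can be shown with a toy scan over MemStore and HFile cells. This is a sketch under stated assumptions: `ReadPointScan` and `Cell` are hypothetical names, and a flushed cell that lost its timestamp is modeled as having timestamp 0, so it is visible to every reader.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy scan that merges cells from several sources using a read point.
final class ReadPointScan {
    static final class Cell {
        final String column;
        final long ts;
        final String value;

        Cell(String column, long ts, String value) {
            this.column = column;
            this.ts = ts;
            this.value = value;
        }
    }

    // For each column, keeps the newest cell with ts <= readPoint.
    // On equal timestamps the cell listed first wins.
    static Map<String, String> scan(List<Cell> cells, long readPoint) {
        Map<String, Cell> newest = new LinkedHashMap<>();
        for (Cell c : cells) {
            if (c.ts > readPoint) {
                continue; // written after this scan started
            }
            Cell seen = newest.get(c.column);
            if (seen == null || c.ts > seen.ts) {
                newest.put(c.column, c);
            }
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (Cell c : newest.values()) {
            row.put(c.column, c.value);
        }
        return row;
    }
}
```

With a read point of t1, HFile cells carrying timestamp 0 let val2 leak into fam2:col2 while fam1:col1 stays at val1. Once the HFile cells keep t2 and t1, the same scan returns val1 for both columns.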


Consistency
 •  These are only some of the consistency issues in 0.90
    –  e.g. HBASE-5121: MajorCompaction may
       affect scan's correctness
 •  Solution: Upgrade to 0.92 or 0.94




HBase 0.94




        “Performance Release”




Performance Improvements in 0.94
 •  HBASE-5047 Support checksums in HBase block cache
 •  HBASE-5199 Delete out of TTL store files before
    compaction selection
 •  HBASE-4608 HLog Compression
 •  HBASE-4465 Lazy-seek optimization for StoreFile
    scanners




HBASE-5047
 •  HDFS stores checksums in a separate file
            HFile              Checksum

 •  So each file read actually requires two disk I/Os
 •  HBase is often bottlenecked by random disk IOPS




HBASE-5047 Solution
 •  Solution: Store checksum in HFile block

     HFile
       HFile Block
         Chksum
         Data

 •  On by default (“hbase.regionserver.checksum.verify”)
 •  Bytes per checksum (“hbase.hstore.bytes.per.checksum”),
    default is 16K
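
Keeping the checksum next to the data means one read fetches both. A minimal sketch using `java.util.zip.CRC32`: the `ChecksummedBlock` class and its layout (an 8-byte checksum followed by the data) are assumptions for illustration, not the HFile block format, and it uses one checksum per block rather than one per `hbase.hstore.bytes.per.checksum` chunk.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Toy block with an inline checksum; the layout is an assumption of this sketch.
final class ChecksummedBlock {
    // Builds a block laid out as [8-byte CRC32 value][data bytes].
    static byte[] write(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return ByteBuffer.allocate(Long.BYTES + data.length)
                .putLong(crc.getValue())
                .put(data)
                .array();
    }

    // Verifies the inline checksum and returns the data bytes.
    static byte[] read(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        long stored = buf.getLong();
        byte[] data = new byte[buf.remaining()];
        buf.get(data);
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        if (crc.getValue() != stored) {
            throw new IllegalStateException("checksum mismatch");
        }
        return data;
    }
}
```

Because verification needs nothing beyond the block itself, no second file has to be opened or seeked.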




HBASE-5199
 •  User can specify TTL per column family
 •  If all values in the HFile are expired, delete HFile rather
    than compact




 •  Off by default, turn on via
    “hbase.store.delete.expired.storefile”
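
The selection rule can be sketched as follows, assuming each store file records the timestamp of its newest cell. `ExpiredFileSelector` and `StoreFileInfo` are hypothetical names for illustration, not HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy selector for store files whose contents have all passed the TTL.
final class ExpiredFileSelector {
    static final class StoreFileInfo {
        final String name;
        final long maxTimestampMs; // timestamp of the newest cell in the file

        StoreFileInfo(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    // A file is fully expired when even its newest cell is older than now - ttl.
    static List<String> selectExpired(List<StoreFileInfo> files, long nowMs, long ttlMs) {
        long cutoff = nowMs - ttlMs;
        List<String> expired = new ArrayList<>();
        for (StoreFileInfo f : files) {
            if (f.maxTimestampMs < cutoff) {
                expired.add(f.name);
            }
        }
        return expired;
    }
}
```

Files picked this way can be dropped outright, so compaction does not have to read and rewrite data that would be discarded anyway.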


Conclusion
 •  Most consistency issues fixed in 0.92/
    CDH4
 •  Performance improvements in 0.94
 •  0.94 is wire compatible with 0.92, so it will
    be included in a CDH4 update




References
 •  HBase ACID Semantics,
    http://hbase.apache.org/acid-semantics.html
 •  Apache HBase Meetup @ SU, Michael Stack.
    http://files.meetup.com/1350427/20120327hbase_meetup.pdf
 •  HBase Internals, Lars Hofhansl.
    http://www.cloudera.com/resource/hbasecon-2012-learning-hbase-internals/




 
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera, Inc.
 
Oracle 12.2 sharding learning more
Oracle 12.2 sharding learning moreOracle 12.2 sharding learning more
Oracle 12.2 sharding learning moreLeyi (Kamus) Zhang
 
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...buildacloud
 

Similar to HBase Consistency and Performance Improvements (20)

"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
"Petascale Genomics with Spark", Sean Owen,Director of Data Science at Cloudera
 
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
9 DevOps Tips for Going in Production with Galera Cluster for MySQL - Slides
 
"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE"Mobage DBA Fight against Big Data" - NHN TE
"Mobage DBA Fight against Big Data" - NHN TE
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
MySQL Replication
MySQL ReplicationMySQL Replication
MySQL Replication
 
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at TwitterHadoop Performance Optimization at Scale, Lessons Learned at Twitter
Hadoop Performance Optimization at Scale, Lessons Learned at Twitter
 
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HivePerformance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
 
READPAST & Furious: Locking
READPAST & Furious: Locking READPAST & Furious: Locking
READPAST & Furious: Locking
 
The Practice of Alluxio in Ctrip Bigdata Platform
The Practice of Alluxio in Ctrip Bigdata PlatformThe Practice of Alluxio in Ctrip Bigdata Platform
The Practice of Alluxio in Ctrip Bigdata Platform
 
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer SimonDocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
DocValues aka. Column Stride Fields in Lucene 4.0 - By Willnauer Simon
 
Column Stride Fields aka. DocValues
Column Stride Fields aka. DocValuesColumn Stride Fields aka. DocValues
Column Stride Fields aka. DocValues
 
Column Stride Fields aka. DocValues
Column Stride Fields aka. DocValues Column Stride Fields aka. DocValues
Column Stride Fields aka. DocValues
 
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
Hadoop Summit 2015: Performance Optimization at Scale, Lessons Learned at Twi...
 
Steps to identify ONTAP latency related issues
Steps to identify ONTAP latency related issuesSteps to identify ONTAP latency related issues
Steps to identify ONTAP latency related issues
 
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark StreamingReal-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
Real-time Data Pipeline: Kafka Streams / Kafka Connect versus Spark Streaming
 
14 lab-planing
14 lab-planing14 lab-planing
14 lab-planing
 
14 lab-planing
14 lab-planing14 lab-planing
14 lab-planing
 
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
 
Oracle 12.2 sharding learning more
Oracle 12.2 sharding learning moreOracle 12.2 sharding learning more
Oracle 12.2 sharding learning more
 
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
Building Reliable Cloud Storage with Riak and CloudStack - Andy Gross, Chief ...
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

More from DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Recently uploaded

Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...Miss joya
 
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiCall Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiNehru place Escorts
 
Kesar Bagh Call Girl Price 9548273370 , Lucknow Call Girls Service
Kesar Bagh Call Girl Price 9548273370 , Lucknow Call Girls ServiceKesar Bagh Call Girl Price 9548273370 , Lucknow Call Girls Service
Kesar Bagh Call Girl Price 9548273370 , Lucknow Call Girls Servicemakika9823
 
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safenarwatsonia7
 
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...narwatsonia7
 
Aspirin presentation slides by Dr. Rewas Ali
Aspirin presentation slides by Dr. Rewas AliAspirin presentation slides by Dr. Rewas Ali
Aspirin presentation slides by Dr. Rewas AliRewAs ALI
 
VIP Call Girls Pune Vani 9907093804 Short 1500 Night 6000 Best call girls Ser...
VIP Call Girls Pune Vani 9907093804 Short 1500 Night 6000 Best call girls Ser...VIP Call Girls Pune Vani 9907093804 Short 1500 Night 6000 Best call girls Ser...
VIP Call Girls Pune Vani 9907093804 Short 1500 Night 6000 Best call girls Ser...Miss joya
 
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...Garima Khatri
 
Call Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls Service
Call Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls ServiceCall Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls Service
Call Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls Servicenarwatsonia7
 
Call Girls Chennai Megha 9907093804 Independent Call Girls Service Chennai
Call Girls Chennai Megha 9907093804 Independent Call Girls Service ChennaiCall Girls Chennai Megha 9907093804 Independent Call Girls Service Chennai
Call Girls Chennai Megha 9907093804 Independent Call Girls Service ChennaiNehru place Escorts
 
Call Girls Yelahanka Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Yelahanka Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Yelahanka Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Yelahanka Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore EscortsVIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escortsaditipandeya
 
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...Miss joya
 
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort ServiceCall Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Serviceparulsinha
 
Call Girls Doddaballapur Road Just Call 7001305949 Top Class Call Girl Servic...
Call Girls Doddaballapur Road Just Call 7001305949 Top Class Call Girl Servic...Call Girls Doddaballapur Road Just Call 7001305949 Top Class Call Girl Servic...
Call Girls Doddaballapur Road Just Call 7001305949 Top Class Call Girl Servic...narwatsonia7
 
Call Girl Chennai Indira 9907093804 Independent Call Girls Service Chennai
Call Girl Chennai Indira 9907093804 Independent Call Girls Service ChennaiCall Girl Chennai Indira 9907093804 Independent Call Girls Service Chennai
Call Girl Chennai Indira 9907093804 Independent Call Girls Service ChennaiNehru place Escorts
 
CALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune) Girls Service
CALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune)  Girls ServiceCALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune)  Girls Service
CALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune) Girls ServiceMiss joya
 
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Availablenarwatsonia7
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalorenarwatsonia7
 
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowSonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowRiya Pathan
 

Recently uploaded (20)

Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
Low Rate Call Girls Pune Esha 9907093804 Short 1500 Night 6000 Best call girl...
 
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service ChennaiCall Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
Call Girls Service Chennai Jiya 7001305949 Independent Escort Service Chennai
 
Kesar Bagh Call Girl Price 9548273370 , Lucknow Call Girls Service
Kesar Bagh Call Girl Price 9548273370 , Lucknow Call Girls ServiceKesar Bagh Call Girl Price 9548273370 , Lucknow Call Girls Service
Kesar Bagh Call Girl Price 9548273370 , Lucknow Call Girls Service
 
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% SafeBangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
Bangalore Call Girls Marathahalli 📞 9907093804 High Profile Service 100% Safe
 
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
VIP Call Girls Tirunelveli Aaradhya 8250192130 Independent Escort Service Tir...
 
Aspirin presentation slides by Dr. Rewas Ali
Aspirin presentation slides by Dr. Rewas AliAspirin presentation slides by Dr. Rewas Ali
Aspirin presentation slides by Dr. Rewas Ali
 
VIP Call Girls Pune Vani 9907093804 Short 1500 Night 6000 Best call girls Ser...
VIP Call Girls Pune Vani 9907093804 Short 1500 Night 6000 Best call girls Ser...VIP Call Girls Pune Vani 9907093804 Short 1500 Night 6000 Best call girls Ser...
VIP Call Girls Pune Vani 9907093804 Short 1500 Night 6000 Best call girls Ser...
 
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
VIP Mumbai Call Girls Hiranandani Gardens Just Call 9920874524 with A/C Room ...
 
Call Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls Service
Call Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls ServiceCall Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls Service
Call Girls Service Bellary Road Just Call 7001305949 Enjoy College Girls Service
 
Call Girls Chennai Megha 9907093804 Independent Call Girls Service Chennai
Call Girls Chennai Megha 9907093804 Independent Call Girls Service ChennaiCall Girls Chennai Megha 9907093804 Independent Call Girls Service Chennai
Call Girls Chennai Megha 9907093804 Independent Call Girls Service Chennai
 
Call Girls Yelahanka Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Yelahanka Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Yelahanka Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Yelahanka Just Call 7001305949 Top Class Call Girl Service Available
 
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore EscortsVIP Call Girls Indore Kirti 💚😋  9256729539 🚀 Indore Escorts
VIP Call Girls Indore Kirti 💚😋 9256729539 🚀 Indore Escorts
 
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
VIP Call Girls Pune Vrinda 9907093804 Short 1500 Night 6000 Best call girls S...
 
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort ServiceCall Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
Call Girls Service In Shyam Nagar Whatsapp 8445551418 Independent Escort Service
 
Call Girls Doddaballapur Road Just Call 7001305949 Top Class Call Girl Servic...
Call Girls Doddaballapur Road Just Call 7001305949 Top Class Call Girl Servic...Call Girls Doddaballapur Road Just Call 7001305949 Top Class Call Girl Servic...
Call Girls Doddaballapur Road Just Call 7001305949 Top Class Call Girl Servic...
 
Call Girl Chennai Indira 9907093804 Independent Call Girls Service Chennai
Call Girl Chennai Indira 9907093804 Independent Call Girls Service ChennaiCall Girl Chennai Indira 9907093804 Independent Call Girls Service Chennai
Call Girl Chennai Indira 9907093804 Independent Call Girls Service Chennai
 
CALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune) Girls Service
CALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune)  Girls ServiceCALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune)  Girls Service
CALL ON ➥9907093804 🔝 Call Girls Hadapsar ( Pune) Girls Service
 
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service AvailableCall Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
Call Girls Whitefield Just Call 7001305949 Top Class Call Girl Service Available
 
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service BangaloreCall Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
Call Girl Bangalore Nandini 7001305949 Independent Escort Service Bangalore
 
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call NowSonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
Sonagachi Call Girls Services 9907093804 @24x7 High Class Babes Here Call Now
 

HBase Consistency and Performance Improvements

  • 1. June 13, 2012 HBase Consistency and Performance Improvements Esteban Gutierrez, Gregory Chanan {esteban, gchanan}@cloudera.com
  • 2. HBase Consistency •  ACID guarantees within a single row •  “Any row returned by the scan will be a consistent view (i.e. that version of the complete row existed at some point in time)”[1] [1] http://hbase.apache.org/acid-semantics.html 2 ©2012 Cloudera, Inc. All Rights Reserved.
  • 3. HBase Consistency Issues • Write Consistency Issues • Read Consistency Issues
  • 4. Write Consistency HBASE-4552 • Importing Multiple CFs HFiles is not an atomic operation
  • 5. Write Consistency HBASE-4552 • Importing Multiple CFs HFiles was not an atomic operation
  • 6. Write Consistency HBASE-4552 HRegion.bulkLoadHFile() < HBase 0.90.5 Row 1: HFile1 fam1:col1, HFile2 fam2:col2, HFile3 fam3:col3, HFile4 fam4:col4. T1 Scan: val1. T2 Scan: val1 val2. T3 Scan: val1 val2 val3. T4 Scan: val1 val2 val3 val4.
  • 7. Write Consistency HBASE-4552 HRegion.bulkLoadHFiles() ≥ HBase 0.90.5 Row 1: HFile1 fam1:col1, HFile2 fam2:col2, HFile3 fam3:col3, HFile4 fam4:col4. T1 Scan. T2 Scan. T3 Scan. T4 Scan. public void bulkLoadHFiles(List<Pair<byte[], String>> familyPaths) { ... startRegionOperation(); ← lock.writeLock().lock() } finally { closeBulkRegionOperation(); } ...
  • 8. Write Consistency HBASE-4552 HRegion.bulkLoadHFiles() ≥ HBase 0.90.5 Row 1: HFile1 fam1:col1, HFile2 fam2:col2, HFile3 fam3:col3, HFile4 fam4:col4. T1 Scan. T2 Scan. T3 Scan. T4 Scan. public void bulkLoadHFiles(List<Pair<byte[], String>> familyPaths) { ... startRegionOperation(); } finally { closeBulkRegionOperation(); ← lock.writeLock().unlock() } ...
  • 9. Write Consistency HBASE-4552 HRegion.bulkLoadHFiles() ≥ HBase 0.90.5 Row 1: HFile1 fam1:col1, HFile2 fam2:col2, HFile3 fam3:col3, HFile4 fam4:col4. T1 Scan. T2 Scan. T3 Scan. T4 Scan: val1 val2 val3 val4. public void bulkLoadHFiles(List<Pair<byte[], String>> familyPaths) { ... startRegionOperation(); } finally { closeBulkRegionOperation(); } ...
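
The fix on slides 7 to 9 can be sketched outside HBase as follows. This is a minimal illustration, not HBase's code: `ToyRegion`, `bulkLoad` and `scan` are hypothetical names. The only idea taken from the slides is holding a write lock for the whole multi-family load and releasing it in `finally`. The sketch also assumes that scans take the matching read lock, so a scan observes either none or all of the loaded families.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy stand-in for a region (hypothetical; not HBase's HRegion). */
class ToyRegion {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Column ("family:qualifier") -> value for a single row.
    private final Map<String, String> cells = new LinkedHashMap<>();

    /** Loads all families under one write lock, so no scan can run mid-load. */
    void bulkLoad(Map<String, String> familyValues) {
        lock.writeLock().lock();
        try {
            cells.putAll(familyValues);
        } finally {
            // Released in finally, mirroring the slide's closeBulkRegionOperation().
            lock.writeLock().unlock();
        }
    }

    /** Returns a copy of the row taken under the read lock (assumed scan behaviour). */
    Map<String, String> scan() {
        lock.readLock().lock();
        try {
            return new LinkedHashMap<>(cells);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With this shape, a scan that starts during `bulkLoad` blocks until the load finishes, which corresponds to slide 9, where only the T4 scan returns values and it returns all four.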
  • 10. Read Consistency HBASE-2856 •  Seen only twice in the wild •  Hard to detect if application monitoring is not implemented
  • 11. Read Consistency HBASE-2856 •  Table size ≈ 50 M records •  Large number of CFs •  New records are continuously added to the table •  Concurrent MR jobs on the same table •  Cluster has to meet strict SLAs
  • 12. Read Consistency HBASE-2856 Symptoms. Job counters (… SPLIT_RAW_FILES …), Map-Reduce Framework, Map output records: Run 1: 500,000
  • 13. Read Consistency HBASE-2856 Symptoms. Job counters (… SPLIT_RAW_FILES …), Map-Reduce Framework, Map output records: Run 1: 500,000; Run 2: 499,997
  • 14. Read Consistency HBASE-2856 Symptoms. Job counters (… SPLIT_RAW_FILES …), Map-Reduce Framework, Map output records: Run 1: 500,000; Run 2: 499,997; Run 3: 500,001
  • 15. Read Consistency HBASE-2856 Symptoms. Job counters (… SPLIT_RAW_FILES …), Map-Reduce Framework, Map output records: Run 1: 500,000; Run 2: 499,997; Run 3: 500,001. Columns in three returned rows: (cf1:col1, cf2:col2, cf3:col3); (cf1:col1, cf2:col2, cf3:col3); (cf1:col1)
  • 16. Read Consistency HBASE-2856 Symptoms. Job counters (… SPLIT_RAW_FILES …), Map-Reduce Framework, Map output records: Run 1: 500,000; Run 2: 499,997; Run 3: 500,001. Columns in three returned rows: (cf1:col1, cf2:col2, cf3:col3); (cf1:col1, cf2:col2, cf3:col3); (cf1:col1). Scale testing shows that between 0.5% and 2% of results are inconsistent between runs
  • 17. Read Consistency HBASE-2856 Impact •  Result is used to update user-facing records •  Customer is not happy
  • 18. Read Consistency HBASE-2856 Impact •  Result is used to update user-facing records •  Customer is not happy: “Where is my data?”
  • 19. Read Consistency HBASE-2856 Workarounds •  Re-try scan if not all CFs are present •  Re-submit job if any inconsistency is found
  • 20. Read Consistency HBASE-2856 Workarounds •  Re-try scan if not all CFs are present •  Re-submit job if any inconsistency is found •  Sometimes that is not possible
  • 21. Read Consistency HBASE-2856 Workarounds •  Re-try scan if not all CFs are present •  Re-submit job if any inconsistency is found •  Sometimes that is not possible: SLAs!
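A minimal Java sketch of the first workaround, assuming every row is expected to carry at least one cell in each column family. The names `RowRetry`, `hasAllFamilies`, and `readWithRetry` are hypothetical, and the row is modelled as a plain map from "family:qualifier" to value rather than an HBase client type.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper; the reader callback stands in for re-issuing the scan.
class RowRetry {
    // True when the row has at least one column in every expected family.
    static boolean hasAllFamilies(Map<String, String> row, Set<String> families) {
        for (String family : families) {
            boolean found = false;
            for (String column : row.keySet()) {
                if (column.startsWith(family + ":")) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    // Re-reads the row up to maxAttempts times; returns null if it never looks complete.
    static Map<String, String> readWithRetry(Supplier<Map<String, String>> reader,
                                             Set<String> families, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Map<String, String> row = reader.get();
            if (hasAllFamilies(row, families)) {
                return row;
            }
        }
        return null;
    }
}
```

A bounded attempt count matters here: with strict SLAs, the caller needs to know when to give up and fall back to re-submitting the job or reporting the row.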
  • 22. MVCC •  HBase maintains ACID semantics using Multiversion Concurrency Control •  Instead of overwriting state, create a new version of the object with a timestamp. Table (Timestamp, Row, fam1:col1, fam2:col2): (t1, row1, val1, val1)
  • 23. MVCC •  HBase maintains ACID semantics using Multiversion Concurrency Control •  Instead of overwriting state, create a new version of the object with a timestamp. Table (Timestamp, Row, fam1:col1, fam2:col2): (t2, row1, val2, val2); (t1, row1, val1, val1) •  Reads never have to block •  Note that this timestamp is not externally visible! Internally it is called “memStoreTs”
  • 24. HBase Write Path 1.  Write to WAL (per RegionServer) 2.  Write to In-Memory Sorted Map (MemStore) (per Region+ColumnFamily) 3.  Flush MemStore to disk as an HFile when the MemStore reaches the configurable hbase.hregion.memstore.flush.size
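The three steps of the write path can be sketched in plain Java. This is a toy model under stated assumptions, not HBase code: `WritePathSketch` and its fields are hypothetical, the WAL is an in-memory list, an "HFile" is just a frozen sorted map, and the size accounting is a rough character count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the write path; all names are made up for illustration.
class WritePathSketch {
    final List<String> wal = new ArrayList<>();                     // stand-in for the write-ahead log
    TreeMap<String, String> memStore = new TreeMap<>();             // in-memory sorted map
    final List<TreeMap<String, String>> hfiles = new ArrayList<>(); // flushed, immutable snapshots
    final long flushSize;                                           // plays the role of the flush threshold
    long memStoreSize = 0;

    WritePathSketch(long flushSize) {
        this.flushSize = flushSize;
    }

    void put(String key, String value) {
        wal.add(key + "=" + value);                  // 1. append to the log first
        memStore.put(key, value);                    // 2. insert into the sorted map
        memStoreSize += key.length() + value.length();
        if (memStoreSize >= flushSize) {             // 3. flush once the threshold is reached
            hfiles.add(memStore);
            memStore = new TreeMap<>();
            memStoreSize = 0;
        }
    }
}
```

Because the flush swaps the whole map out, a scan that started before the flush may find part of its data in the MemStore and part in a new file, which is the situation the following slides examine.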
  • 25. Internals / Bug Now that we know the internals, what could go wrong?
  • 26. Putting it together Let’s go back to the beginning… MemStore (Timestamp, Row, fam1:col1, fam2:col2): (t1, row1, val1, val1)
  • 27. Putting it together Let’s go back to the beginning… MemStore (Timestamp, Row, fam1:col1, fam2:col2): (t1, row1, val1, val1). And start a scan.
  • 28. Putting it together Let’s go back to the beginning… MemStore (Timestamp, Row, fam1:col1, fam2:col2): (t2, row1, val2, val2); (t1, row1, val1, val1). And start a scan. And concurrently put.
  • 29. Putting it together Let’s go back to the beginning… MemStore (Timestamp, Row, fam1:col1, fam2:col2): (t2, row1, val2, val2); (t1, row1, val1, val1). And start a scan. And concurrently put. Which causes a flush. HFile (Row, fam2:col2): (row1, val2); (row1, val1)
  • 30. Putting it together Now, scan needs to make sense of this… MemStore (Ts, Row, fam1:col1): (t2, row1, val2); (t1, row1, val1). HFile (Row, fam2:col2): (row1, val2); (row1, val1)
  • 31. Putting it together Now, scan needs to make sense of this… MemStore (Ts, Row, fam1:col1): (t2, row1, val2); (t1, row1, val1). HFile (Row, fam2:col2): (row1, val2); (row1, val1). But HFile has no timestamp!
  • 32. Putting it together Now, scan needs to make sense of this… MemStore (Ts, Row, fam1:col1): (t2, row1, val2); (t1, row1, val1). HFile (Row, fam2:col2): (row1, val2); (row1, val1). But HFile has no timestamp! Inconsistent Result (Row, fam1:col1, fam2:col2): (row1, val1, val2)
  • 33. Solution Store the timestamp in the HFile. MemStore (Ts, Row, fam1:col1): (t2, row1, val2); (t1, row1, val1). HFile (Ts, Row, fam2:col2): (t2, row1, val2); (t1, row1, val1). Correct Result (Row, fam1:col1, fam2:col2): (row1, val1, val1). Now we have all the information we need
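The effect of keeping the MVCC timestamp in the flushed file can be shown with a small Java sketch. It is a simplification, not HBase's scanner: `MvccSketch`, `Cell`, and `read` are hypothetical names, each store is a list ordered newest first, and a flushed cell that lost its timestamp is modelled as having timestamp 0, which makes it visible to every reader.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of read-point filtering; names are made up for illustration.
class MvccSketch {
    static final class Cell {
        final long memstoreTs;   // internal MVCC number, not the user-visible timestamp
        final String column;
        final String value;

        Cell(long memstoreTs, String column, String value) {
            this.memstoreTs = memstoreTs;
            this.column = column;
            this.value = value;
        }
    }

    // Returns the first cell of the column that is visible at readPoint.
    // The store is ordered newest first, as a scanner would encounter it.
    static String read(List<Cell> store, String column, long readPoint) {
        for (Cell c : store) {
            if (c.column.equals(column) && c.memstoreTs <= readPoint) {
                return c.value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        long readPoint = 1; // the scan started when only t1 was committed
        List<Cell> memStore = Arrays.asList(
            new Cell(2, "fam1:col1", "val2"), new Cell(1, "fam1:col1", "val1"));
        List<Cell> hfileWithoutTs = Arrays.asList(
            new Cell(0, "fam2:col2", "val2"), new Cell(0, "fam2:col2", "val1"));
        List<Cell> hfileWithTs = Arrays.asList(
            new Cell(2, "fam2:col2", "val2"), new Cell(1, "fam2:col2", "val1"));
        System.out.println(read(memStore, "fam1:col1", readPoint));
        System.out.println(read(hfileWithoutTs, "fam2:col2", readPoint));
        System.out.println(read(hfileWithTs, "fam2:col2", readPoint));
    }
}
```

With the timestamp preserved, both column families are filtered by the same read point, so the scan assembles a row version that actually existed at some point in time.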
  • 34. Consistency •  These are only some of the consistency issues in 0.90 –  e.g. HBASE-5121: MajorCompaction may affect scan's correctness •  Solution: Upgrade to 0.92 or 0.94
  • 35. HBase 0.94 “Performance Release”
  • 36. Performance Improvements in 0.94 •  HBASE-5047 Support checksums in HBase block cache •  HBASE-5199 Delete out of TTL store files before compaction selection •  HBASE-4608 HLog Compression •  HBASE-4465 Lazy-seek optimization for StoreFile scanners
  • 38. HBASE-5047 •  HDFS stores the checksum in a separate file (HFile, Checksum) •  So each file read actually requires two disk IOPS •  HBase is often bottlenecked by random disk IOPS
  • 39. HBASE-5047 Solution •  Store the checksum in the HFile block (HFile → HFile Block: Chksum, Data) •  On by default (“hbase.regionserver.checksum.verify”) •  Bytes per checksum (“hbase.hstore.bytes.per.checksum”): default is 16K
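A short Java sketch of the idea behind HBASE-5047: keep the checksum in the same block as the data so that one read returns both. The layout used here (an 8-byte CRC32 value followed by the data) and the class name `InlineChecksumBlock` are assumptions for illustration, not the actual HFile block format.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Illustrative block layout: [8-byte CRC32 value][data]. Not the HFile format.
class InlineChecksumBlock {
    // Prepends the checksum so it travels with the data in one block.
    static byte[] encode(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return ByteBuffer.allocate(8 + data.length)
            .putLong(crc.getValue())
            .put(data)
            .array();
    }

    // Verifies and returns the data from the block bytes alone,
    // without consulting a separate checksum file.
    static byte[] decode(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        long stored = buf.getLong();
        byte[] data = new byte[buf.remaining()];
        buf.get(data);
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        if (crc.getValue() != stored) {
            throw new IllegalStateException("checksum mismatch");
        }
        return data;
    }
}
```

The sketch uses one checksum per block for brevity; the slide's bytes-per-checksum setting implies a checksum per chunk of data instead.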
  • 41. HBASE-5199 •  User can specify a TTL per column family •  If all values in the HFile are expired, delete the HFile rather than compact it •  Off by default; turn on via “hbase.store.delete.expired.storefile”
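The selection rule can be sketched in Java as follows. This is an illustration under assumptions, not the HBase compaction code: `ExpiredStoreFiles`, `StoreFileInfo`, and `selectExpired` are hypothetical names, and each file is assumed to record the timestamp of its newest cell. If even the newest cell is older than the TTL, every cell in the file is expired and the file can be dropped without being rewritten.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of store files with a known newest-cell timestamp.
class ExpiredStoreFiles {
    static final class StoreFileInfo {
        final String name;
        final long maxTimestampMs; // timestamp of the newest cell in the file

        StoreFileInfo(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    // Returns the names of files whose newest cell is already past the TTL.
    static List<String> selectExpired(List<StoreFileInfo> files, long ttlMs, long nowMs) {
        long cutoff = nowMs - ttlMs;
        List<String> expired = new ArrayList<>();
        for (StoreFileInfo f : files) {
            if (f.maxTimestampMs < cutoff) {
                expired.add(f.name);
            }
        }
        return expired;
    }
}
```

Files that fail this test still go through normal compaction, where individual expired cells are discarded.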
  • 42. Conclusion •  Most consistency issues fixed in 0.92/CDH4 •  Performance improvements in 0.94 •  0.94 is wire compatible with 0.92, so it will be in a CDH4 update
  • 43. References •  HBase ACID Semantics, http://hbase.apache.org/acid-semantics.html •  Apache HBase Meetup @ SU, Michael Stack. http://files.meetup.com/1350427/20120327hbase_meetup.pdf •  HBase Internals, Lars Hofhansl. http://www.cloudera.com/resource/hbasecon-2012-learning-hbase-internals/