1
Rigorous and Multi-tenant HBase Performance
Govind Kamat, Yanpei Chen
Performance Engineering
2
Bio
Govind Kamat
• Member of the Performance Engineering Team at Cloudera
• Focuses on Hadoop and HBase performance and scalability
• Experience includes the development of large-scale software systems, microprocessor architecture, compilers and electronic design
Yanpei Chen
• Member of the Performance Engineering Team at Cloudera
• Works on cross-component performance - Hadoop, HBase, Search and Impala
• Ph.D. from UC Berkeley, focus on performance measurement method and theory
3
Outline
• Apache HBase overview
• Measuring performance + YCSB basics
• Cluster setup best practices
• Techniques for rigorous measurement
• HBase in a multi-tenant environment
4
HBase Overview
• Distributed, "NoSQL" key-value store
• Column-oriented, sorted map
• Keys are lexicographically sorted
• Multiple regions across “regionservers”
• Built on HDFS, MapReduce not required
5
Measuring HBase Performance is Hard!
• Numbers not reproducible
• Large run-to-run variation
• Testbeds not clearly defined or properly set up
• Various workloads have been used
• Configuration parameters not specified
• State of regionservers not taken into account
• Reported numbers not comparable
6
Cluster is
down … sigh!
7
Workloads for Performance Measurement
• Set of transactions to be imposed on the database
• read, update, insert, scan and mixes thereof
• Initial data to be loaded into the DB
• Insert
• Transaction load intensity variation over time
• Possible HBase workloads:
• Actual customer/production workloads (best)
• PerformanceEvaluation (not really a workload)
• YCSB (Yahoo! Cloud Serving Benchmark, commonly used)
8
Yahoo! Cloud Serving Benchmark (YCSB) Basics
• Performance evaluation framework for key-value databases, such as:
• HBase, Cassandra, Sherpa, Accumulo, Voldemort
• Abstracts out the client from the DB
• Flexible and configurable
• Comes with a standard “core” workload
• Reports throughput and latency metrics
9
YCSB Basics - Running YCSB
• Create a table called "usertable" in HBase
$ ycsb [load | run] hbase \
    -p workload=com.yahoo.ycsb.workloads.CoreWorkload \
    -p columnfamily=cf \
    -p operationcount=1000000 \
    -P workloads/randomWrite \
    -threads 10 \
    -s
10
YCSB Basics – YCSB Parameters
• Specified like so: '-p property=value'
• columnfamily, fieldcount, fieldlength
• recordcount, operationcount
• readproportion, updateproportion, scanproportion, ..
• readallfields, writeallfields
• requestdistribution
• maxscanlength, scanlengthdistribution
• maxexecutiontime
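Recording the exact properties used for each run makes results easier to reproduce. A minimal sketch, assuming the `ycsb` launcher and the `-p`, `-threads` and `-target` flags shown on these slides; the helper name `build_ycsb_args` is hypothetical and not part of YCSB:

```python
def build_ycsb_args(phase, binding, properties, threads=None, target=None):
    """Return the argument list for one YCSB invocation (hypothetical helper)."""
    if phase not in ("load", "run"):
        raise ValueError("phase must be 'load' or 'run'")
    args = ["ycsb", phase, binding]
    # Sort the properties so the same settings always give the same command line.
    for key in sorted(properties):
        args += ["-p", f"{key}={properties[key]}"]
    if threads is not None:
        args += ["-threads", str(threads)]
    if target is not None:
        args += ["-target", str(target)]
    return args


args = build_ycsb_args(
    "run",
    "hbase",
    {"columnfamily": "cf", "operationcount": 1000000, "readproportion": 1.0},
    threads=10,
)
print(" ".join(args))
```

Logging the printed command line next to the results keeps the configuration and the numbers together.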
11
YCSB Basics - YCSB Output 1/2
2014-05-28 17:08:34:025 1310 sec: 2951422 operations; 2737.33 current ops/sec; [READ AverageLatency(us)=8098.29]
2014-05-28 17:08:44:026 1320 sec: 2972315 operations; 2089.09 current ops/sec; [READ AverageLatency(us)=8671.15]
[OVERALL], RunTime(ms), 1334884.0
[OVERALL], Throughput(ops/sec), 2247.3862897450267
[READ], Operations, 3000000
[READ], AverageLatency(us), 8876.560442666667
[READ], MinLatency(us), 205
[READ], MaxLatency(us), 2530720
[READ], 95thPercentileLatency(ms), 9
[READ], 99thPercentileLatency(ms), 15
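The summary lines follow a "[SECTION], Metric, value" layout, so they are easy to collect for run-to-run comparison. A minimal sketch, assuming that three-field layout holds for every summary line; the function name `parse_ycsb_summary` is hypothetical:

```python
def parse_ycsb_summary(text):
    """Parse '[SECTION], Metric, value' lines into {section: {metric: float}}."""
    summary = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue  # periodic status lines start with a timestamp instead
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue
        section, metric, value = parts
        try:
            number = float(value)
        except ValueError:
            continue  # ignore lines whose last field is not numeric
        summary.setdefault(section.strip("[]"), {})[metric] = number
    return summary


sample = """\
[OVERALL], RunTime(ms), 1334884.0
[OVERALL], Throughput(ops/sec), 2247.3862897450267
[READ], Operations, 3000000
[READ], AverageLatency(us), 8876.560442666667
[READ], 95thPercentileLatency(ms), 9
"""
summary = parse_ycsb_summary(sample)
print(summary["OVERALL"]["Throughput(ops/sec)"])
```

Histogram lines such as "[READ], 0, 2168499" have the same shape, so this sketch stores them too, keyed by the bucket label.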
12
YCSB Basics - YCSB Output 2/2
[READ], 0, 2168499
[READ], 1, 445777
[READ], 2, 29748
[READ], 3, 32264
[READ], 4, 28154
[READ], 5, 26195
[READ], 6, 32222
[READ], 7, 39343
[READ], 8, 44038
[READ], 9, 41481
[...]
[READ], >1000, 11925
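Each histogram line gives a latency bucket in milliseconds and the number of operations that fell into it. The reported percentiles can be cross-checked against these counts. A minimal sketch, assuming the percentile is the first bucket at which the cumulative count reaches the requested fraction of all operations; the function name is hypothetical:

```python
def percentile_bucket(buckets, total_ops, fraction):
    """Return the first bucket (ms) at which the cumulative count reaches
    `fraction` of `total_ops`, or None if the listed buckets fall short."""
    cumulative = 0
    for latency_ms, count in sorted(buckets.items()):
        cumulative += count
        if cumulative / total_ops >= fraction:
            return latency_ms
    return None


# Buckets 0-9 ms as shown on the slide; the higher buckets are elided there.
read_buckets = {
    0: 2168499, 1: 445777, 2: 29748, 3: 32264, 4: 28154,
    5: 26195, 6: 32222, 7: 39343, 8: 44038, 9: 41481,
}
total_reads = 3000000  # from "[READ], Operations, 3000000"

print(percentile_bucket(read_buckets, total_reads, 0.95))  # 9
print(percentile_bucket(read_buckets, total_reads, 0.99))  # None
```

The 95th percentile comes out at 9 ms, which agrees with the "[READ], 95thPercentileLatency(ms), 9" summary line. The 99th percentile cannot be recovered because it lies in the buckets the slide omits.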
13
Cluster Setup Best Practices
• Setting up the cluster
• Configuring HBase
• Creating tables
• Pre-splitting tables
• Loading data
14
HBase Cluster Configuration Best Practices
• Use the appropriate hardware, correctly sized: memory, disk
• Dedicate separate nodes for master services and worker roles
• No Task Trackers and Node Managers on regionserver nodes
• Segregate clients from the regionservers
• Configure HBase properly:
• Block cache (read), memstore (write)
• Bloom filters, compression, compaction, short-circuit reads, etc.
• Use the appropriate data set size, number of regions, etc.
• Monitor the cluster constantly
15
16
Data Loading – Several Options
• Real, actual, production (hot) data
• Custom loader
• PerformanceEvaluation
• Loading using YCSB
• HFileGenerator followed by bulk-load
17
Data Loading - Pre-split the Table
• Auto-splitting has significant overhead
• RegionSplitter utility
• UniformSplit
• HexStringSplit
• YCSB: user100000 .. user999999
hbase(main):1:0> create 'usertable', 'cf',
  { SPLITS => (1..(50-1)).map { |i| "user#{1000 + i*9000/50}" } }  # 49 split points, 50 regions
• Set maximum region file size to a large value
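The split keys produced by the HBase shell expression on this slide can be previewed offline before the table is created. A minimal sketch in Python that mirrors the same integer arithmetic; the function name and its parameters are hypothetical:

```python
def ycsb_split_points(num_regions, low=1000, span=9000, prefix="user"):
    """Return num_regions - 1 split keys spread evenly over user1000..user9999."""
    # Integer division matches the Ruby expression 1000 + i*9000/50.
    return [f"{prefix}{low + i * span // num_regions}" for i in range(1, num_regions)]


splits = ycsb_split_points(50)
print(len(splits), splits[0], splits[-1])  # 49 user1180 user9820
```

Because every key has the same number of digits, the list is already in the lexicographic order that HBase uses for row keys.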
18
Techniques for Rigorous Measurement
• Keep the input data set fixed
• Warm up the cache
• Set the target throughput
• Use the correct workload distribution
19
Keep the Input Data Set Fixed!
20
Keep the Input Data Set Fixed!
A beginning is the time for taking the most
delicate care that the balances are correct.
The manual of Muad’Dib
From “Dune” by Frank Herbert
21
Cluster is
down … sigh!
22
Warm Up the Cache
• Performance depends significantly on memory
• HBase block cache and OS page cache for reads
• Memstore and WAL for writes
• Load all the rows in the table
• Write until data starts getting flushed
• Compaction can affect performance significantly
• Carry out long-running tests
• Repeat till steady-state
• Otherwise, performance can vary a lot
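"Repeat till steady-state" needs a stopping rule. One possible rule, offered as an assumption rather than the presenters' method, is to stop once the coefficient of variation of the most recent throughput samples drops below a threshold. The window size, the 5% threshold and the sample numbers below are illustrative only:

```python
from statistics import mean, pstdev


def reached_steady_state(throughputs, window=5, max_cv=0.05):
    """True if the coefficient of variation of the last `window` samples
    is at most `max_cv`."""
    if len(throughputs) < window:
        return False
    recent = throughputs[-window:]
    avg = mean(recent)
    if avg == 0:
        return False
    return pstdev(recent) / avg <= max_cv


# Made-up ops/sec samples: throughput climbs while the caches warm up...
warming = [900, 1500, 2100, 2600, 2700]
# ...and then settles.
settled = warming + [2740, 2730, 2750, 2745, 2735]

print(reached_steady_state(warming))  # False
print(reached_steady_state(settled))  # True
```

Only measurements taken after the check passes would be reported.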
23
Warm Up the Cache
24
Set the Target Throughput
• Two parameters to set desired throughput
• -threads
• -target
• Actual throughput will match target throughput ...
• ... until the DB hits its limit
• Performance may then begin to degrade
• This throughput defines maximum cluster performance
• Can be used to evaluate different HBase releases
• Otherwise, HBase is never stressed beyond saturation
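Sweeping `-target` upward and comparing achieved with requested throughput locates the point where the cluster stops keeping up. A minimal sketch, assuming a run "keeps up" when it achieves at least 95% of its target; that tolerance and the numbers below are illustrative, not measurements:

```python
def max_sustained_target(results, tolerance=0.05):
    """results: list of (target_ops_per_sec, achieved_ops_per_sec) pairs.
    Return the highest target the cluster kept up with, or None."""
    sustained = [
        target
        for target, achieved in results
        if achieved >= (1.0 - tolerance) * target
    ]
    return max(sustained) if sustained else None


# Made-up sweep: throughput tracks the target, then flattens and degrades.
runs = [
    (1000, 998.2),
    (2000, 1995.0),
    (3000, 2960.4),
    (4000, 3310.7),
    (5000, 3150.9),
]
print(max_sustained_target(runs))  # 3000
```

Running the same sweep against two HBase releases gives one comparable number per release.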
25
Set the Target Throughput
26
Use the Appropriate Workload Distribution
• Various types possible
• Uniform (default, but unrealistic)
• Latest
• Hotspot
• Zipfian
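The choice of distribution changes how much of the traffic lands on a small set of hot keys, and therefore how much the caches help. A minimal sketch comparing uniform and Zipfian access over 10,000 keys. The exponent 0.99 is an assumption used for illustration, and the helper names are hypothetical:

```python
def zipf_probabilities(num_keys, exponent=0.99):
    """Probability of each key by popularity rank under a Zipfian law."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def share_of_hottest(probabilities, fraction):
    """Fraction of requests that hit the hottest `fraction` of keys."""
    count = max(1, round(len(probabilities) * fraction))
    return sum(sorted(probabilities, reverse=True)[:count])


num_keys = 10000
uniform = [1.0 / num_keys] * num_keys
zipfian = zipf_probabilities(num_keys)

uniform_share = share_of_hottest(uniform, 0.01)
zipfian_share = share_of_hottest(zipfian, 0.01)
print(f"hottest 1% of keys: uniform {uniform_share:.3f}, zipfian {zipfian_share:.3f}")
```

Under the uniform distribution the hottest 1% of keys receive about 1% of requests; under the Zipfian one they receive a far larger share.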
27
Rigorous Measurement Techniques
• Set the cluster up properly
• Keep the input data set fixed
• Pre-split the key space
• Warm up the cache properly
• Set the target throughput
• Use the correct workload distribution
• Monitor cluster statistics continually
28 ©2014 Cloudera, Inc. All rights reserved.
Multi-tenant HBase Performance
• Multi-tenant as in different compute frameworks
29
HBase in a Multi-tenant Environment
[Diagram: platform components. Integration, storage, resource management, metadata; processing engines: batch (MR, …), interactive SQL (Impala), interactive search (Solr), interactive serving (HBase), machine learning; plus system management, data management, support, security.]
30
Real Multi-tenant Use Case
• Customer wants to do free-text search on data in HBase
• Explore relevant data beyond just key look-up
• This is “multi-tenant” as in multiple frameworks
• HBase + MapReduce + Cloudera Search (Apache Solr)
• Data indexed into Solr via MapReduce (or Lily HBase Indexer)
• Challenge is to not impact HBase and Solr performance
31
Multi-tenant Performance is Hard!
• Inevitable constraints
• More processing, different processing on the same hardware
• Multi-tenant performance of each framework < stand-alone perf.
• Good multi-tenant performance means
• Efficient - good aggregate performance across HBase/MR/Search
• Fair - performance of each reflects assigned share of resources
• Elastic - transient spare resources get quickly and fully used
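One way to put numbers on "efficient" and "fair" is to normalize each framework's multi-tenant performance by its stand-alone performance and compare the result with its assigned resource share. This is a sketch under that assumption, not a definition from the talk; the function name and all figures are made up:

```python
def multi_tenant_report(standalone, multi_tenant, shares):
    """Per framework: multi-tenant performance as a fraction of stand-alone,
    and how far that fraction is from the assigned resource share."""
    report = {}
    for name in standalone:
        normalized = multi_tenant[name] / standalone[name]
        report[name] = {
            "normalized": normalized,
            "share_gap": normalized - shares[name],
        }
    return report


# Made-up figures in each framework's own unit (ops/sec, docs/sec, queries/sec).
standalone = {"HBase": 10000.0, "MR indexing": 4000.0, "Search": 800.0}
multi_tenant = {"HBase": 5200.0, "MR indexing": 900.0, "Search": 210.0}
shares = {"HBase": 0.5, "MR indexing": 0.25, "Search": 0.25}

report = multi_tenant_report(standalone, multi_tenant, shares)
aggregate = sum(entry["normalized"] for entry in report.values())
for name, entry in report.items():
    print(f"{name}: normalized {entry['normalized']:.2f}, gap {entry['share_gap']:+.2f}")
print(f"aggregate normalized performance: {aggregate:.2f}")
```

A gap near zero for every framework suggests fairness, and an aggregate close to 1.0 suggests little is lost to sharing. Elasticity would need a time series and is not covered by this sketch.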
32
Practically doing HBase → Solr via MapReduce
• Configure HBase, Search, and MapReduce
• Large set of performance-relevant parameters for each
• Configure each to achieve a desired resource share
• Many implicit resource controls
• Set up the datasets for high performance
• How many regions for the HBase table
• How many shards for the Solr collection
33
Start with stand-alone performance
• Stand-alone MR indexing rate of HBase → Search
• Should be no lower than that for HDFS → Search
34
Start with stand-alone performance
[Figure: resource usage versus time, relative to capacity. HBase, MR, and Solr are all idle before and after the MapReduce indexing (HBase → Solr) run.]
35
Multi-tenant Performance
• MR indexing HBase → Solr while both are active
• Test efficiency, fairness, elasticity
[Figure: resource usage versus time, relative to capacity. HBase transactions, Search queries, and MR indexing (HBase → Solr) run concurrently.]
36
Recap
• HBase essential to an enterprise data hub
• Need for multiple frameworks to analyze HBase data
• Challenging to define/measure multi-tenant performance
• Not tractable without rigorous techniques
• Look for discipline and rigor in performance numbers!
37
Thanks!
• gkamat@cloudera.com
• yanpei@cloudera.com
38
Backup slides
39
Building YCSB
$ git clone http://github.com/brianfrankcooper/YCSB
$ mvn package -DskipTests
diff --git a/pom.xml b/pom.xml
- <maven.assembly.version>2.2.1</maven.assembly.version>
- <hbase.version>0.92.1</hbase.version>
+ <maven.assembly.version>2.4</maven.assembly.version>
+ <hbase.version>0.98.1-hadoop2</hbase.version>
40
Building YCSB (contd.)
diff --git a/hbase/pom.xml b/hbase/pom.xml
- <artifactId>hbase</artifactId>
+ <artifactId>hbase-client</artifactId>
- <artifactId>hadoop-core</artifactId>
- <version>1.0.0</version>
+ <artifactId>hadoop-common</artifactId>
+ <version>2.3.0</version>

WWDC 2024 Keynote Review: For CocoaCoders Austin
Patrick Weigel
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
Peter Muessig
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
Quickdice ERP
 
Project Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdfProject Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdf
Karya Keeper
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
sjcobrien
 
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
gapen1
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
Green Software Development
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
Rakesh Kumar R
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
ToXSL Technologies
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 

Recently uploaded (20)

fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.
 
All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
 
Modelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - AmsterdamModelling Up - DDDEurope 2024 - Amsterdam
Modelling Up - DDDEurope 2024 - Amsterdam
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
The Key to Digital Success_ A Comprehensive Guide to Continuous Testing Integ...
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
Lecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptxLecture 2 - software testing SE 412.pptx
Lecture 2 - software testing SE 412.pptx
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
Hand Rolled Applicative User Validation Code Kata
Hand Rolled Applicative User ValidationCode KataHand Rolled Applicative User ValidationCode Kata
Hand Rolled Applicative User Validation Code Kata
 
WWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders AustinWWDC 2024 Keynote Review: For CocoaCoders Austin
WWDC 2024 Keynote Review: For CocoaCoders Austin
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
UI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design SystemUI5con 2024 - Bring Your Own Design System
UI5con 2024 - Bring Your Own Design System
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
 
Project Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdfProject Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdf
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
 
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
如何办理(hull学位证书)英国赫尔大学毕业证硕士文凭原版一模一样
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
How to write a program in any programming language
How to write a program in any programming languageHow to write a program in any programming language
How to write a program in any programming language
 
How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?How Can Hiring A Mobile App Development Company Help Your Business Grow?
How Can Hiring A Mobile App Development Company Help Your Business Grow?
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 

Rigorous and Multi-tenant HBase Performance

  • 1. 1 Rigorous and Multi-tenant HBase Performance Govind Kamat, Yanpei Chen Performance Engineering
  • 2. 2 Bio Govind Kamat • Member of the Performance Engineering Team at Cloudera • Focuses on Hadoop and HBase performance and scalability • Experience includes the development of large-scale software systems, microprocessor architecture, compilers and electronic design Yanpei Chen • Member of the Performance Engineering Team at Cloudera • Works on cross-component performance - Hadoop, HBase, Search and Impala • Ph.D. from UC Berkeley, focus on performance measurement method and theory
  • 3. 3 Outline • Apache HBase overview • Measuring performance + YCSB basics • Cluster setup best practices • Techniques for rigorous measurement • HBase in a multi-tenant environment
  • 4. 4 HBase Overview • Distributed, "NoSQL" key-value store • Column-oriented, sorted map • Keys are lexicographically sorted • Multiple regions across “regionservers” • Built on HDFS, MapReduce not required
  • 5. 5 Measuring HBase Performance is Hard! • Numbers not reproducible • Large run-to-run variation • Testbeds not clearly defined/properly set up • Various workloads have been used • Configuration parameters not specified • State of regionservers not taken into account • Reported numbers not comparable
  • 7. 7 Workloads for Performance Measurement • Set of transactions to be imposed against the DB • read, update, insert, scan and mixes thereof • Initial data to be loaded into the DB • Insert • Transaction load intensity variation over time • Possible HBase workloads: • Actual customer/production workloads (best) • PerformanceEvaluation (not really a workload) • YCSB (Yahoo! Cloud Serving Benchmark, commonly used)
  • 8. 8 Yahoo! Cloud Serving Benchmark (YCSB) Basics • Performance evaluation framework for key-value databases, such as: • HBase, Cassandra, Sherpa, Accumulo, Voldemort • Abstracts out the client from the DB • Flexible and configurable • Comes with a standard “core” workload • Reports throughput and latency metrics
  • 9. 9 YCSB Basics - Running YCSB • Create a table called "usertable" in HBase $ ycsb [load | run] hbase -p workload=com.yahoo.ycsb.workloads.CoreWorkload -p columnfamily=cf -p operationcount=1000000 -P workloads/randomWrite -threads 10 -s
  • 10. 10 YCSB Basics – YCSB Parameters • Specified like so: '-p property=value' • columnfamily, fieldcount, fieldlength • recordcount, operationcount • readproportion, updateproportion, scanproportion, ... • readallfields, writeallfields • requestdistribution • maxscanlength, scanlengthdistribution • maxexecutiontime
  • 11. 11 YCSB Basics - YCSB Output 1/2 2014-05-28 17:08:34:025 1310 sec: 2951422 operations; 2737.33 current ops/sec; [READ AverageLatency(us)=8098.29] 2014-05-28 17:08:44:026 1320 sec: 2972315 operations; 2089.09 current ops/sec; [READ AverageLatency(us)=8671.15] [OVERALL], RunTime(ms), 1334884.0 [OVERALL], Throughput(ops/sec), 2247.3862897450267 [READ], Operations, 3000000 [READ], AverageLatency(us), 8876.560442666667 [READ], MinLatency(us), 205 [READ], MaxLatency(us), 2530720 [READ], 95thPercentileLatency(ms), 9 [READ], 99thPercentileLatency(ms), 15
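The summary lines in the output above follow a `[SECTION], Metric, value` pattern, which makes them easy to collect for run-to-run comparison. The following is a minimal sketch, not part of YCSB itself: the function name `parse_ycsb_summary` and the `SAMPLE` string are illustrative assumptions, and the sample values are copied from the slide. It also cross-checks the reported throughput against operations divided by run time.

```python
def parse_ycsb_summary(text):
    """Parse '[SECTION], Metric, value' lines into {section: {metric: float}}.

    Periodic status lines (which start with a timestamp) are skipped.
    Hypothetical helper for illustration; not a YCSB API.
    """
    summary = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue
        section, metric, value = parts
        try:
            number = float(value)
        except ValueError:
            continue
        summary.setdefault(section.strip("[]"), {})[metric] = number
    return summary


# Values copied from the slide's sample output.
SAMPLE = """\
2014-05-28 17:08:44:026 1320 sec: 2972315 operations; 2089.09 current ops/sec;
[OVERALL], RunTime(ms), 1334884.0
[OVERALL], Throughput(ops/sec), 2247.3862897450267
[READ], Operations, 3000000
[READ], AverageLatency(us), 8876.560442666667
[READ], 95thPercentileLatency(ms), 9
[READ], 99thPercentileLatency(ms), 15
"""

summary = parse_ycsb_summary(SAMPLE)
reported = summary["OVERALL"]["Throughput(ops/sec)"]
# Throughput should equal total operations divided by run time in seconds.
derived = summary["READ"]["Operations"] / (summary["OVERALL"]["RunTime(ms)"] / 1000.0)
print(f"reported={reported:.2f} ops/sec, derived={derived:.2f} ops/sec")
```

Histogram lines such as `[READ], 0, 2168499` match the same pattern, so they end up in the same dictionary keyed by bucket number.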
  • 12. 12 YCSB Basics - YCSB Output 2/2 [READ], 0, 2168499 [READ], 1, 445777 [READ], 2, 29748 [READ], 3, 32264 [READ], 4, 28154 [READ], 5, 26195 [READ], 6, 32222 [READ], 7, 39343 [READ], 8, 44038 [READ], 9, 41481 [...] [READ], >1000, 11925
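The histogram and the percentile lines are two views of the same data. The sketch below assumes that each bucket `i` counts operations that completed in `i` to `i+1` ms, and that a percentile is the first bucket at which the cumulative count reaches the requested fraction of all operations; both are assumptions about how the output is produced. `percentile_bucket` is a hypothetical helper. The bucket counts and the total of 3,000,000 reads are taken from the slides.

```python
def percentile_bucket(buckets, total_ops, fraction):
    """Return the first bucket index whose cumulative count reaches
    fraction * total_ops, or None if the listed buckets never reach it."""
    threshold = fraction * total_ops
    cumulative = 0
    for index, count in enumerate(buckets):
        cumulative += count
        if cumulative >= threshold:
            return index
    return None


# Buckets 0..9 as shown on the slide; later buckets are elided there.
READ_BUCKETS_MS = [2168499, 445777, 29748, 32264, 28154,
                   26195, 32222, 39343, 44038, 41481]
TOTAL_READS = 3000000  # from "[READ], Operations, 3000000"

p95 = percentile_bucket(READ_BUCKETS_MS, TOTAL_READS, 0.95)
p99 = percentile_bucket(READ_BUCKETS_MS, TOTAL_READS, 0.99)
print(f"95th percentile bucket: {p95} ms")
print(f"99th percentile bucket: {p99} (None means it lies in the elided buckets)")
```

With the ten buckets shown, the 95th percentile lands in bucket 9, which agrees with the reported `95thPercentileLatency(ms)` of 9. The 99th percentile cannot be recovered because the buckets it depends on are elided on the slide.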
  • 13. 13 Cluster Setup Best Practices • Setting up the cluster • Configuring HBase • Creating tables • Pre-splitting tables • Loading data
  • 14. 14 HBase Cluster Configuration Best Practices • Use the appropriate hardware, correctly sized: memory, disk • Dedicate separate nodes for master services and worker roles • No Task Trackers and Node Managers on regionserver nodes • Segregate clients from the regionservers • Configure HBase properly: • Block cache (read), memstore (write) • Bloom filters, compression, compaction, short-circuit reads, etc. • Use the appropriate data set size, number of regions, etc. • Monitor the cluster constantly
  • 16. 16 Data Loading – Several Options • Real, actual, production (hot) data • Custom loader • PerformanceEvaluation • Loading using YCSB • HFileGenerator followed by bulk-load
  • 17. 17 Data Loading - Pre-split the Table • Auto-splitting has significant overhead • RegionSplitter utility • UniformSplit • HexStringSplit • YCSB: user100000 .. user999999 hbase(main):1:0> create 'usertable', 'cf', { SPLITS => (1..(50-1)).map {|i| "user#{1000 + i*9000/50}" } } # 49 split keys => 50 regions • Set maximum region file size to a large value
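To see which split points the HBase shell expression on the pre-split slide produces, the same arithmetic can be reproduced outside the shell. This is a sketch that mirrors the slide's Ruby expression with integer division; the function name `ycsb_split_keys` and its parameter names are illustrative assumptions.

```python
def ycsb_split_keys(num_regions=50, low=1000, span=9000, prefix="user"):
    """Mirror the slide's Ruby expression
    (1..(num_regions-1)).map {|i| "user#{low + i*span/num_regions}" }
    using integer division, as Ruby does for integers."""
    return [f"{prefix}{low + i * span // num_regions}"
            for i in range(1, num_regions)]


keys = ycsb_split_keys()
print(len(keys), "split keys ->", len(keys) + 1, "regions")
print("first:", keys[0], "last:", keys[-1])
```

The keys are four-digit prefixes rather than full row keys. Because HBase sorts keys lexicographically, a split point such as `user1180` still divides longer YCSB keys that begin with `user1...`.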
  • 18. 18 Techniques for Rigorous Measurement • Keep the input data set fixed • Warm up the cache • Set the target throughput • Use the correct workload distribution
  • 19. 19 Keep the Input Data Set Fixed!
  • 20. 20 Keep the Input Data Set Fixed! A beginning is the time for taking the most delicate care that the balances are correct. The manual of Muad’Dib From “Dune” by Frank Herbert
  • 22. 22 Warm Up the Cache • Performance depends significantly on memory • HBase block cache and OS page cache for reads • Memstore and WAL for writes • Load all the rows in the table • Write until data starts getting flushed • Compaction can affect performance significantly • Carry out long-running tests • Repeat till steady-state • Otherwise, performance can vary a lot
  • 23. 23 Warm Up the Cache
  • 24. 24 Set the Target Throughput • Two parameters to set desired throughput • -threads • -target • Actual throughput will match target throughput ... • ... until the DB hits its limit • Performance may then begin to degrade • This throughput defines maximum cluster performance • Can be used to evaluate different HBase releases • Otherwise, HBase is never stressed beyond saturation
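One way to apply the target-throughput technique is to sweep `-target` upward and record the throughput YCSB actually achieves at each step. The sketch below is only an illustration: `find_saturation`, the 5% tolerance, and the measurement pairs are all hypothetical and are not results from the talk.

```python
def find_saturation(measurements, tolerance=0.05):
    """Given (target_ops, actual_ops) pairs, return a tuple of
    (highest target the DB still kept up with, peak actual throughput).

    A target counts as "kept up with" when actual throughput is within
    `tolerance` (as a fraction) of the target. The threshold is an
    assumption chosen for illustration.
    """
    last_met = None
    peak = 0.0
    for target, actual in sorted(measurements):
        peak = max(peak, actual)
        if actual >= target * (1.0 - tolerance):
            last_met = target
    return last_met, peak


# Hypothetical sweep: actual throughput tracks the target, then flattens
# and begins to degrade once the cluster saturates.
sweep = [(1000, 998.0), (2000, 1995.0), (3000, 2240.0), (4000, 2150.0)]

last_met, peak = find_saturation(sweep)
print(f"highest target met: {last_met} ops/sec, peak observed: {peak} ops/sec")
```

Running the sweep past the point where actual throughput stops tracking the target is what reveals the limit; stopping earlier would leave HBase unstressed.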
  • 25. 25 Set the Target Throughput
  • 26. 26 Use the Appropriate Workload Distribution • Various types possible • Uniform (default, but unrealistic) • Latest • Hotspot • Zipfian
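The choice of request distribution matters because it determines how concentrated the accesses are, and therefore how much the caches help. The sketch below compares a uniform sampler with a simple Zipf-like sampler. It is not YCSB's generator: the weighting `1 / rank**exponent`, the exponent of 0.99, and the key and operation counts are assumptions chosen for illustration.

```python
import random
from collections import Counter


def sample_keys(num_keys, num_ops, distribution, seed=42, exponent=0.99):
    """Draw num_ops key indices from a uniform or Zipf-like distribution."""
    rng = random.Random(seed)
    if distribution == "uniform":
        weights = [1.0] * num_keys
    elif distribution == "zipfian":
        # Key at rank r is chosen with probability proportional to 1 / r**exponent.
        weights = [1.0 / (rank ** exponent) for rank in range(1, num_keys + 1)]
    else:
        raise ValueError(f"unknown distribution: {distribution}")
    return rng.choices(range(num_keys), weights=weights, k=num_ops)


def top_share(samples, top_n):
    """Fraction of all requests that hit the top_n most requested keys."""
    counts = Counter(samples)
    hottest = sum(count for _, count in counts.most_common(top_n))
    return hottest / len(samples)


uniform_share = top_share(sample_keys(1000, 100000, "uniform"), top_n=10)
zipfian_share = top_share(sample_keys(1000, 100000, "zipfian"), top_n=10)
print(f"share of requests hitting the 10 hottest keys: "
      f"uniform={uniform_share:.3f}, zipfian={zipfian_share:.3f}")
```

Under the uniform setting the ten hottest keys receive roughly their proportional share of requests, while under the skewed setting they absorb a large fraction of the load, which is closer to how production key-value traffic tends to behave.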
  • 27. 27 Rigorous Measurement Techniques • Set the cluster up properly • Keep the input data set fixed • Pre-split the key space • Warm up the cache properly • Set the target throughput • Use the correct workload distribution • Monitor cluster statistics continually
  • 28. 28 Multi-tenant HBase Performance • Multi-tenant as in different compute frameworks ©2014 Cloudera, Inc. All rights reserved.
  • 29. 29 HBase in a Multi-tenant Environment [Diagram: platform stack with Integration, Storage, Resource Management, Metadata; Processing: Batch (MR …), Interactive SQL (Impala), Interactive Search (Solr), Interactive Serving (HBase), Machine Learning; System Management, Data Management, Support, Security]
  • 30. 30 Real Multi-tenant Use Case • Customer wants to do free-text search on data in HBase • Explore relevant data beyond just key look-up • This is “multi-tenant” as in multiple frameworks • HBase + MapReduce + Cloudera Search (Apache Solr) • Data indexed into Solr via MapReduce (or Lily HBase Indexer) • Challenge is to not impact HBase and Solr performance
  • 31. 31 Multi-tenant Performance is Hard! • Inevitable constraints • More processing, different processing on the same hardware • Multi-tenant performance of each framework < stand-alone perf. • Good multi-tenant performance means • Efficient - good aggregate performance across HBase/MR/Search • Fair - performance of each reflects assigned share of resources • Elastic - transient spare resources get quickly and fully used
  • 32. 32 Practically doing HBase → Solr via MapReduce • Configure HBase, Search, and MapReduce • Large set of performance-relevant parameters for each • Configure each to achieve a desired resource share • Many implicit resource controls • Set up the datasets for high performance • How many regions for the HBase table • How many shards for the Solr collection
  • 33. 33 Start with stand-alone performance • Stand-alone MR indexing rate of HBase → Search • Should be no lower than that for HDFS → Search
  • 34. 34 Start with stand-alone performance • Stand-alone MR indexing rate of HBase → Search • Should be no lower than that for HDFS → Search [Figure: resource usage vs. time, with a capacity line; labels: HBase, MR, Solr all idle; MapReduce indexing HBase → Solr; HBase, MR, Solr all idle]
  • 35. 35 Multi-tenant Performance • MR indexing HBase → Solr while both are active • Test efficiency, fairness, elasticity [Figure: resource usage vs. time, with a capacity line; labels: HBase transactions, MR indexing HBase → Solr, Search queries]
  • 36. 36 Recap • HBase essential to an enterprise data hub • Need for multiple frameworks to analyze HBase data • Challenging to define/measure multi-tenant performance • Not tractable without rigorous techniques • Look for discipline and rigor in performance numbers!
  • 37. 37 Thanks! • gkamat@cloudera.com • yanpei@cloudera.com
  • 38. 38 Backup slides
  • 39. 39 Building YCSB $ git clone http://github.com/brianfrankcooper/YCSB $ mvn package -DskipTests diff --git a/pom.xml b/pom.xml - <maven.assembly.version>2.2.1</maven.assembly.version> - <hbase.version>0.92.1</hbase.version> + <maven.assembly.version>2.4</maven.assembly.version> + <hbase.version>0.98.1-hadoop2</hbase.version>
  • 40. 40 Building YCSB (contd.) diff --git a/hbase/pom.xml b/hbase/pom.xml - <artifactId>hbase</artifactId> + <artifactId>hbase-client</artifactId> - <artifactId>hadoop-core</artifactId> - <version>1.0.0</version> + <artifactId>hadoop-common</artifactId> + <version>2.3.0</version>