YCSB++ Benchmarking Tool
Performance Debugging Advanced
Features of Scalable Table Stores
Swapnil Patil
Milo Polte, Wittawat Tantisiriroj, Kai Ren, Lin Xiao, Julio Lopez, Garth Gibson, Adam Fuchs *, Billie Rinaldi *
Carnegie Mellon University, * National Security Agency
Open Cirrus Summit, October 2011, Atlanta GA
Scalable table stores are critical systems
Swapnil Patil, CMU 2
•  For data processing & analysis (e.g. Pregel, Hive)
•  For systems services (e.g., Google Colossus metadata)
Evolution of scalable table stores
Simple, lightweight -> complex, feature-rich stores
Supports a broader range of applications and services
Hard to debug and understand performance problems
Complex behavior and interaction of various components
Growing set of HBase features
[Figure: timeline of HBASE releases, 2008 to 2011+, marking added features: RangeRowFilters, Batch updates, Bulk load tools, RegEx filtering, Scan optimizations, Co-processors, Access Control]
YCSB++ FUNCTIONALITY
ZooKeeper-based distributed and coordinated testing
API and extensions for the new Apache ACCUMULO DB
Fine-grained, correlated monitoring using OTUS
FEATURES TESTED USING YCSB++
Batch writing, table pre-splitting, bulk loading
Weak consistency, server-side filtering, fine-grained security
Tool released at http://www.pdl.cmu.edu/ycsb++
Need richer tools for understanding
advanced features in table stores …
Outline
•  Problem
•  YCSB++ design
•  Illustrative examples
•  Ongoing work and summary
Yahoo Cloud Serving Benchmark [Cooper2010]
•  For CRUD (create-read-update-delete) benchmarking
•  Single-node system with an extensible API
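To make the "extensible API" idea concrete, here is a minimal sketch of a pluggable store client driven by a small workload executor. It is an illustration only: the names `DBClient`, `InMemoryClient`, and `run_workload` are hypothetical and are not taken from the YCSB code base, and the dictionary-backed store merely stands in for a real table store.

```python
import time
from abc import ABC, abstractmethod


class DBClient(ABC):
    """Hypothetical store binding: one subclass per table store."""

    @abstractmethod
    def insert(self, table, key, values): ...

    @abstractmethod
    def read(self, table, key): ...

    @abstractmethod
    def update(self, table, key, values): ...

    @abstractmethod
    def delete(self, table, key): ...


class InMemoryClient(DBClient):
    """Stand-in binding backed by a dict, so the sketch runs anywhere."""

    def __init__(self):
        self.tables = {}

    def insert(self, table, key, values):
        self.tables.setdefault(table, {})[key] = dict(values)

    def read(self, table, key):
        return self.tables.get(table, {}).get(key)

    def update(self, table, key, values):
        self.tables[table][key].update(values)

    def delete(self, table, key):
        self.tables.get(table, {}).pop(key, None)


def run_workload(db, operations):
    """Run (op, table, key, values) tuples; record per-op latency in seconds."""
    stats = {}
    for op, table, key, values in operations:
        start = time.perf_counter()
        if op in ("insert", "update"):
            getattr(db, op)(table, key, values)
        else:
            getattr(db, op)(table, key)
        stats.setdefault(op, []).append(time.perf_counter() - start)
    return stats


db = InMemoryClient()
stats = run_workload(db, [
    ("insert", "usertable", "user1", {"field0": "a"}),
    ("update", "usertable", "user1", {"field0": "b"}),
    ("read", "usertable", "user1", None),
])
```

Supporting another store in this sketch would mean adding one more `DBClient` subclass; the executor and the statistics collection stay unchanged.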
[Figure: YCSB architecture: command-line parameters, workload parameter file, workload executor, client threads, stats, and DB clients (HBASE, other DBs) talking to storage servers]
YCSB++: New extensions
Added support for the new Apache ACCUMULO DB
− New parameters and workload executors
[Figure: YCSB architecture diagram with added EXTENSIONS and ACCUMULO components]
YCSB++: Distributed & parallel tests
Multi-client, multi-phase coordination using ZooKeeper
− Enables testing at large scales and testing asymmetric features
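A minimal sketch of multi-phase coordination follows, assuming a barrier that every client must reach before any client starts the next phase. Python's `threading.Barrier` stands in for a ZooKeeper-backed barrier here, threads stand in for distributed clients, and the phase names are hypothetical.

```python
import threading

NUM_CLIENTS = 3
PHASES = ["load", "read_update"]  # hypothetical phase names

# Stand-in for a ZooKeeper-backed barrier shared by all clients.
barrier = threading.Barrier(NUM_CLIENTS)
log = []
log_lock = threading.Lock()


def client(client_id):
    for phase in PHASES:
        # No client starts a phase until every client has arrived here.
        barrier.wait()
        with log_lock:
            log.append((phase, client_id))


threads = [threading.Thread(target=client, args=(i,)) for i in range(NUM_CLIENTS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because a client can only reach the second barrier after recording its first phase, every "load" entry in `log` precedes every "read_update" entry, whichever thread runs first.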
[Figure: YCSB architecture diagram with EXTENSIONS, ACCUMULO, and MULTI-PHASE COORDINATION of YCSB clients]
YCSB++: Collective monitoring
OTUS monitor built on Ganglia [Ren2011]
− Collects information from YCSB, table stores, HDFS and OS
[Figure: YCSB architecture diagram with EXTENSIONS, ACCUMULO, MULTI-PHASE COORDINATION of YCSB clients, and OTUS MONITORING]
Example of YCSB++ debugging
OTUS collects fine-grained information
− Both HDFS process and TabletServer process on same node
[Figure: Monitoring Resource Usage and TableStore Metrics. Axes: CPU Usage (%) and Avg Number of StoreFiles Per Tablet vs. Time (Minutes). Series: Accumulo Avg. StoreFiles per Tablet, HDFS DataNode CPU Usage, Accumulo TabletServer CPU Usage]
Outline
•  Problem
•  YCSB++ design
•  Illustrative examples
− YCSB++ on HBASE and ACCUMULO (Bigtable-like stores)
•  Ongoing work and summary
Recap of Bigtable-like table stores
[Figure: write path of tablet T_N across tablet servers and HDFS nodes: data insertion, write-ahead log, memtable, minor compaction into sorted indexed files, major compaction into (fewer) sorted indexed files; steps labeled 1 to 3]
Write-path: in-memory buffering & async FS writes
1) Mutations logged in memory tables (unsorted order)
2) Minor compaction: Memtables -> sorted, indexed files in HDFS
3) Major compaction: LSM-tree based file merging in background
Read-path: lookup both memtables and on-disk files
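The write and read paths above can be sketched in a few lines. This is a toy model under stated assumptions, not how HBase or Accumulo implement them: the `Tablet` class and its `memtable_limit` parameter are hypothetical, the write-ahead log is omitted, and files are plain sorted lists scanned linearly rather than indexed files in HDFS.

```python
class Tablet:
    """Toy LSM-style tablet: a memtable plus a list of sorted files."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # in-memory buffer of recent mutations
        self.files = []      # sorted, immutable files; newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self):
        """Flush the memtable as one sorted file."""
        if self.memtable:
            self.files.append(sorted(self.memtable.items()))
            self.memtable = {}

    def major_compaction(self):
        """Merge all files into one, keeping the newest value per key."""
        merged = {}
        for f in self.files:  # oldest to newest, so later values win
            merged.update(f)
        self.files = [sorted(merged.items())] if merged else []

    def read(self, key):
        """Check the memtable first, then files from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for f in reversed(self.files):
            for k, v in f:
                if k == key:
                    return v
        return None


t = Tablet(memtable_limit=2)
for key, value in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("d", 5)]:
    t.write(key, value)
files_before = len(t.files)   # two flushed files; "d" is still in the memtable
t.major_compaction()
files_after = len(t.files)
```

The sketch shows why reads get cheaper after a major compaction: before it, a lookup may have to consult every file; after it, there is a single file per tablet.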
Apache ACCUMULO
Started at NSA; now an Apache Incubator project
− Designed for high-speed ingest and scan workloads
− http://incubator.apache.org/projects/accumulo.html
New features in ACCUMULO
− Iterator framework for user-specified programs placed in
between different stages of the DB pipeline
  E.g., Support joins and stream processing using iterators
− Also supports fine-grained cell-level access control
ILLUSTRATIVE EXAMPLE #1
Analyzing the
fast inserts vs. weak consistency
tradeoff using YCSB++
Client-side batch writing
Feature: clients batch inserts, delay writes to server
•  Improves insert throughput and latency
•  Newly inserted data may not be immediately visible to
other clients
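A minimal sketch of client-side batching, assuming a size-based flush policy. The `Server` and `BatchingClient` classes and the `batch_bytes` parameter are hypothetical and do not mirror the HBase or Accumulo client APIs; the point is only that a buffered insert stays invisible to a second client until the buffer is flushed.

```python
class Server:
    """Toy table store: rows visible to every client."""

    def __init__(self):
        self.rows = {}


class BatchingClient:
    """Buffers inserts locally and ships them once the buffer is full."""

    def __init__(self, server, batch_bytes):
        self.server = server
        self.batch_bytes = batch_bytes
        self.buffer = []
        self.buffered_bytes = 0
        self.flushes = 0

    def insert(self, key, value):
        self.buffer.append((key, value))
        self.buffered_bytes += len(key) + len(value)
        if self.buffered_bytes >= self.batch_bytes:
            self.flush()

    def flush(self):
        if not self.buffer:
            return
        self.server.rows.update(self.buffer)  # one round trip for the batch
        self.buffer = []
        self.buffered_bytes = 0
        self.flushes += 1

    def read(self, key):
        return self.server.rows.get(key)


server = Server()
writer = BatchingClient(server, batch_bytes=32)
reader = BatchingClient(server, batch_bytes=32)

writer.insert("k1", "v" * 10)           # 12 bytes buffered, nothing sent yet
visible_before = reader.read("k1")      # the other client cannot see k1
writer.insert("k2", "v" * 10)           # 24 bytes buffered
writer.insert("k3", "v" * 10)           # 36 bytes >= 32, triggers a flush
visible_after = reader.read("k1")
```

Three inserts cost a single server round trip in this sketch, which is the throughput benefit; the window in which `k1` is unreadable is the consistency cost.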
[Figure: two YCSB++ store clients (Client #1 with a write batch, Client #2 issuing Read{K}) against the table store servers, with a ZooKeeper cluster manager]
Batch writing improves throughput
6 clients creating 9 million 1-Kbyte records on 6 servers
− Small batches - high client CPU utilization, limits throughput
− Large batches - saturate servers, limited benefit from batching
[Figure: Inserts per second (1000s) vs. batch size (10 KB, 100 KB, 1 MB, 10 MB) for HBase and Accumulo]
Batch writing causes weak consistency
Test setup: ZooKeeper-based client coordination
•  Share producer-consumer queue between readers/writers
•  R-W lag = delay before C2 can read C1’s most recent write
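The measurement idea can be sketched as a small simulation. Everything here is an assumption made for illustration: a `deque` stands in for the shared ZooKeeper queue, time is a logical tick counter rather than wall-clock milliseconds, and `DelayedStore` is a hypothetical store in which a write becomes visible a fixed number of ticks after it is issued.

```python
from collections import deque


class DelayedStore:
    """Toy store: a write becomes visible `delay` ticks after it is issued."""

    def __init__(self, delay):
        self.delay = delay
        self.visible_at = {}

    def write(self, key, now):
        self.visible_at[key] = now + self.delay

    def read(self, key, now):
        return key in self.visible_at and self.visible_at[key] <= now


def measure_lags(store, keys, sample_every):
    """Return (lags in ticks, fraction of sampled keys needing multiple reads)."""
    queue = deque()            # stand-in for the shared producer-consumer queue
    lags, retried = [], 0
    now = 0
    for i, key in enumerate(keys):
        store.write(key, now)                  # C1 inserts the key
        if i % sample_every == 0:
            queue.append((key, now))           # C1 enqueues a sampled key
        while queue:                           # C2 polls and dequeues
            k, written_at = queue.popleft()
            reads = 1
            while not store.read(k, now):      # C2 retries until visible
                now += 1
                reads += 1
            lags.append(now - written_at)
            retried += reads > 1
        now += 1
    return lags, (retried / len(lags) if lags else 0.0)


keys = [f"user{i}" for i in range(10)]
slow_lags, slow_fraction = measure_lags(DelayedStore(delay=3), keys, sample_every=2)
fast_lags, fast_fraction = measure_lags(DelayedStore(delay=0), keys, sample_every=2)
```

The second return value corresponds to the share of sampled requests whose first read missed, and the list of lags is what a cumulative distribution plot of read-after-write lag would be built from.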
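[Figure: test setup with two YCSB++ store clients, table store servers, and ZooKeeper. (1) Client #1 inserts {K:V} (10^6 records) through its batch; (2) Client #1 enqueues K (sample 1% of records); (3) Client #2 polls and dequeues K; (4) Client #2 issues Read{K}]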
Batch writing causes weak consistency
Deferred write wins, but lag can be ~100 seconds
− (N%) = fraction of requests that needed multiple read()s
− Implementation of batching affects the median latency
[Figure: fraction of requests vs. read-after-write time lag (ms, log scale from 1 to 100000), for different buffer sizes
(a) HBase: 10 KB (<1%), 100 KB (7.4%), 1 MB (17%), 10 MB (23%)
(b) Accumulo: 10 KB (<1%), 100 KB (1.2%), 1 MB (14%), 10 MB (33%)]
ILLUSTRATIVE EXAMPLE #2
Benchmarking
high-speed ingest features
using YCSB++
Features for high-speed insertions
Most table stores have high-speed ingest features
− Periodically insert large amounts of data or migrate old
data in bulk
− Classic relational DB techniques applied to new stores
Two features: bulk loading and table pre-splitting
•  Less data migration during inserts
•  Engages more tablet servers immediately
•  Need careful tuning and configuration [Sasha2002]
8-phase test setup: table bulk loading
Bulk loading involves two steps
− Hadoop-based data formatting
− Importing store files into table store
Pre-load phase (1 and 2)
− Bulk load 6M rows in an empty table
− Goal: parallelism by engaging all servers
Load phase (4 and 5)
− Load 48M new rows
− Goal: study rebalancing during ingest
R/U measurements (3, 6 and 8)
− Correlate latency with rebalancing work
[Figure: phases 1 to 8: Pre-Load (re-formatting), Pre-Load (importing), Read/Update workload, Load (re-formatting), Load (importing), Read/Update workload, Sleep, Read/Update workload]
Read latency affected by rebalancing work
[Figure: Accumulo read latency (ms, log scale) vs. measurement phase running time (seconds, 0 to 300) for R/U 1 (Phase 3), R/U 2 (Phase 6), and R/U 3 (Phase 8)]
•  High latency after high insertion periods that
cause servers to rebalance (compactions)
•  Latency drops after store is in a steady state
Rebalancing on ACCUMULO servers
•  OTUS monitor shows the server-side
compactions during post-ingest
measurement phases
[Figure: StoreFiles, Tablets, and Compactions (log scale) vs. experiment running time (sec, 0 to 1800)]
HBASE is slower: Different compaction policies
[Figure: four panels comparing Accumulo and HBase. Read latency (ms, log scale) vs. measurement phase running time (seconds) for R/U 1 (Phase 3), R/U 2 (Phase 6), and R/U 3 (Phase 8); and StoreFiles, Tablets, and Compactions vs. experiment running time (sec)]
Extending to table pre-splitting
[Figure: the table pre-splitting test (phases labeled Pre-split into N ranges, Pre-load, Load, Read/Update workload, and Sleep) shown next to the phases of the bulk loading test]
Pre-split a key range into N partitions to avoid splitting during insertion
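As a rough illustration of pre-splitting, the sketch below computes evenly spaced split points for a numeric key range. The helper names `split_points` and `partition_of` are hypothetical, and the zero-padded "user" key format is an assumption chosen so that lexicographic order matches numeric order; a real store would receive such points through its own table-creation or split API.

```python
import bisect


def split_points(first_key, last_key, n, width=10, prefix="user"):
    """Return the n-1 boundary keys dividing [first_key, last_key) into n ranges."""
    if n < 1 or last_key <= first_key:
        raise ValueError("need n >= 1 and a non-empty key range")
    span = last_key - first_key
    points = []
    for i in range(1, n):
        boundary = first_key + (span * i) // n
        # Zero-pad so that string comparison agrees with numeric order.
        points.append(f"{prefix}{boundary:0{width}d}")
    return points


def partition_of(key, points):
    """Index of the range that `key` falls into, given sorted split points."""
    return bisect.bisect_right(points, key)


points = split_points(0, 1000, 4, width=4)
```

With four partitions created before the load starts, inserts spread across all of them immediately instead of piling into a single range that must split repeatedly.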
Outline
•  Problem
•  YCSB++ design
•  Illustrative examples
•  Ongoing work and summary
Things not covered in this talk
More features: function shipping to servers
− Data filtering at the servers
− Fine-grained, cell-level access control
More details in the ACM SOCC 2011 paper
Ongoing work
− Analyze more table stores: Cassandra, CouchDB, MongoDB
− Continue research through the new Intel Science and
Technology Center for Cloud Computing at CMU (with GaTech)
Summary: YCSB++ tool
•  Tool for performance debugging and benchmarking
advanced features using new extensions to YCSB
•  Two case-studies: Apache HBASE and ACCUMULO
•  Tool available at http://www.pdl.cmu.edu/ycsb++
Weak consistency semantics
Fast insertions (pre-splits & bulk loads)
Server-side filtering
Fine-grained access control

Distributed clients using ZooKeeper
Multi-phase testing (with Hadoop)
New workload generators and database client API extensions

More Related Content

What's hot

Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
DataStax
 
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
Joe Stein
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value StoreSantal Li
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)
Romain Jacotin
 
Cassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + DynamoCassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + Dynamo
jbellis
 
Cassandra useful features
Cassandra useful featuresCassandra useful features
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaHBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
Cloudera, Inc.
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
Instaclustr
 
Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )varasteh65
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
DataStax Academy
 
Bigtable and Dynamo
Bigtable and DynamoBigtable and Dynamo
Bigtable and Dynamo
Iraklis Psaroudakis
 
Cost and performance comparison for OpenStack compute and storage infrastructure
Cost and performance comparison for OpenStack compute and storage infrastructureCost and performance comparison for OpenStack compute and storage infrastructure
Cost and performance comparison for OpenStack compute and storage infrastructure
Principled Technologies
 
Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction
HBaseCon
 
ClustrixDB 7.5 Announcement
ClustrixDB 7.5 AnnouncementClustrixDB 7.5 Announcement
ClustrixDB 7.5 Announcement
Clustrix
 
Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...
Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...
Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...
Principled Technologies
 
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
DataStax
 
006 performance tuningandclusteradmin
006 performance tuningandclusteradmin006 performance tuningandclusteradmin
006 performance tuningandclusteradminScott Miao
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.Jack Levin
 

What's hot (20)

Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
Running 400-node Cassandra + Spark Clusters in Azure (Anubhav Kale, Microsoft...
 
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
Accumulo Summit 2014: Benchmarking Accumulo: How Fast Is Fast?
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 
Distribute Key Value Store
Distribute Key Value StoreDistribute Key Value Store
Distribute Key Value Store
 
The Google File System (GFS)
The Google File System (GFS)The Google File System (GFS)
The Google File System (GFS)
 
Cassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + DynamoCassandra: Open Source Bigtable + Dynamo
Cassandra: Open Source Bigtable + Dynamo
 
Cassandra useful features
Cassandra useful featuresCassandra useful features
Cassandra useful features
 
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - ClouderaHBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
HBaseCon 2012 | Base Metrics: What They Mean to You - Cloudera
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )Oracle Real Application Cluster ( RAC )
Oracle Real Application Cluster ( RAC )
 
Apache Cassandra at Macys
Apache Cassandra at MacysApache Cassandra at Macys
Apache Cassandra at Macys
 
Bigtable and Dynamo
Bigtable and DynamoBigtable and Dynamo
Bigtable and Dynamo
 
Cost and performance comparison for OpenStack compute and storage infrastructure
Cost and performance comparison for OpenStack compute and storage infrastructureCost and performance comparison for OpenStack compute and storage infrastructure
Cost and performance comparison for OpenStack compute and storage infrastructure
 
Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction Apache HBase, Accelerated: In-Memory Flush and Compaction
Apache HBase, Accelerated: In-Memory Flush and Compaction
 
ClustrixDB 7.5 Announcement
ClustrixDB 7.5 AnnouncementClustrixDB 7.5 Announcement
ClustrixDB 7.5 Announcement
 
Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...
Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...
Performance of persistent apps on Container-Native Storage for Red Hat OpenSh...
 
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
One Billion Black Friday Shoppers on a Distributed Data Store (Fahd Siddiqui,...
 
006 performance tuningandclusteradmin
006 performance tuningandclusteradmin006 performance tuningandclusteradmin
006 performance tuningandclusteradmin
 
Hug Hbase Presentation.
Hug Hbase Presentation.Hug Hbase Presentation.
Hug Hbase Presentation.
 
Exch2007 sp1 win2008
Exch2007 sp1 win2008Exch2007 sp1 win2008
Exch2007 sp1 win2008
 

Similar to Ycsb benchmarking

Adventures in RDS Load Testing
Adventures in RDS Load TestingAdventures in RDS Load Testing
Adventures in RDS Load Testing
Mike Harnish
 
Using AWR for IO Subsystem Analysis
Using AWR for IO Subsystem AnalysisUsing AWR for IO Subsystem Analysis
Using AWR for IO Subsystem Analysis
Texas Memory Systems, and IBM Company
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
MLconf
 
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...
Accumulo Summit
 
Thomas+Niewel+ +Oracletuning
Thomas+Niewel+ +OracletuningThomas+Niewel+ +Oracletuning
Thomas+Niewel+ +Oracletuningafa reg
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
Amazon Web Services
 
How should I monitor my idaa
How should I monitor my idaaHow should I monitor my idaa
How should I monitor my idaa
Cuneyt Goksu
 
C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris WolfC* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
DataStax Academy
 
SQLIO - measuring storage performance
SQLIO - measuring storage performanceSQLIO - measuring storage performance
SQLIO - measuring storage performance
valerian_ceaus
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL ServerStephen Rose
 
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
Cloudera, Inc.
 
GECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
GECon2017_High-volume data streaming in azure_ Aliaksandr LaishaGECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
GECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
GECon_Org Team
 
HeroLympics Eng V03 Henk Vd Valk
HeroLympics  Eng V03 Henk Vd ValkHeroLympics  Eng V03 Henk Vd Valk
HeroLympics Eng V03 Henk Vd Valk
hvdvalk
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
Coburn Watson
 
Dba tuning
Dba tuningDba tuning
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performance
Guy Harrison
 
Nosql series-part-3-hypertable
Nosql series-part-3-hypertableNosql series-part-3-hypertable
Nosql series-part-3-hypertablehypertable
 
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...
MapR Technologies
 
Exchange Server 2013 Database and Store Changes
Exchange Server 2013 Database and Store ChangesExchange Server 2013 Database and Store Changes
Exchange Server 2013 Database and Store Changes
Microsoft TechNet - Belgium and Luxembourg
 

Similar to Ycsb benchmarking (20)

Adventures in RDS Load Testing
Adventures in RDS Load TestingAdventures in RDS Load Testing
Adventures in RDS Load Testing
 
Using AWR for IO Subsystem Analysis
Using AWR for IO Subsystem AnalysisUsing AWR for IO Subsystem Analysis
Using AWR for IO Subsystem Analysis
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...
Accumulo Summit 2015: Ferrari on a Bumpy Road: Shock Absorbers to Smooth Out ...
 
Thomas+Niewel+ +Oracletuning
Thomas+Niewel+ +OracletuningThomas+Niewel+ +Oracletuning
Thomas+Niewel+ +Oracletuning
 
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
(DAT402) Amazon RDS PostgreSQL:Lessons Learned & New Features
 
How should I monitor my idaa
How should I monitor my idaaHow should I monitor my idaa
How should I monitor my idaa
 
C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris WolfC* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
C* Summit 2013: CMB: An Open Message Bus for the Cloud by Boris Wolf
 
Using Statspack and AWR for Memory Monitoring and Tuning
Using Statspack and AWR for Memory Monitoring and TuningUsing Statspack and AWR for Memory Monitoring and Tuning
Using Statspack and AWR for Memory Monitoring and Tuning
 
SQLIO - measuring storage performance
SQLIO - measuring storage performanceSQLIO - measuring storage performance
SQLIO - measuring storage performance
 
Troubleshooting SQL Server
Troubleshooting SQL ServerTroubleshooting SQL Server
Troubleshooting SQL Server
 
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini HBaseCon 2012 | HBase, the Use Case in eBay Cassini
HBaseCon 2012 | HBase, the Use Case in eBay Cassini
 
GECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
GECon2017_High-volume data streaming in azure_ Aliaksandr LaishaGECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
GECon2017_High-volume data streaming in azure_ Aliaksandr Laisha
 
HeroLympics Eng V03 Henk Vd Valk
HeroLympics  Eng V03 Henk Vd ValkHeroLympics  Eng V03 Henk Vd Valk
HeroLympics Eng V03 Henk Vd Valk
 
CPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performanceCPN302 your-linux-ami-optimization-and-performance
CPN302 your-linux-ami-optimization-and-performance
 
Dba tuning
Dba tuningDba tuning
Dba tuning
 
Top 10 tips for Oracle performance
Top 10 tips for Oracle performanceTop 10 tips for Oracle performance
Top 10 tips for Oracle performance
 
Nosql series-part-3-hypertable
Nosql series-part-3-hypertableNosql series-part-3-hypertable
Nosql series-part-3-hypertable
 
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...
Data on the Move: Transitioning from a Legacy Architecture to a Big Data Plat...
 
Exchange Server 2013 Database and Store Changes
Exchange Server 2013 Database and Store ChangesExchange Server 2013 Database and Store Changes
Exchange Server 2013 Database and Store Changes
 

More from Sqrrl

Transitioning Government Technology
Transitioning Government TechnologyTransitioning Government Technology
Transitioning Government Technology
Sqrrl
 
Leveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsLeveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your Hunts
Sqrrl
 
How to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkHow to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your Network
Sqrrl
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting Started
Sqrrl
 
Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)Building a Next-Generation Security Operations Center (SOC)
Building a Next-Generation Security Operations Center (SOC)
Sqrrl
 
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphUser and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
Sqrrl
 
Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)
Sqrrl
 
Sqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar UsersSqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl and IBM: Threat Hunting for QRadar Users
Sqrrl
 
Threat Hunting for Command and Control Activity
Threat Hunting for Command and Control ActivityThreat Hunting for Command and Control Activity
Threat Hunting for Command and Control Activity
Sqrrl
 
Modernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led TrainingModernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led Training
Sqrrl
 
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Threat Hunting vs. UEBA: Similarities, Differences, and How They Work Together
Sqrrl
 
Leveraging DNS to Surface Attacker Activity
Leveraging DNS to Surface Attacker ActivityLeveraging DNS to Surface Attacker Activity
Leveraging DNS to Surface Attacker Activity
Sqrrl
 
The Art and Science of Alert Triage
The Art and Science of Alert TriageThe Art and Science of Alert Triage
The Art and Science of Alert Triage
Sqrrl
 
Reducing Mean Time to Know
Reducing Mean Time to KnowReducing Mean Time to Know
Reducing Mean Time to Know
Sqrrl
 
Sqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use CaseSqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl Enterprise: Big Data Security Analytics Use Case
Sqrrl
 
The Linked Data Advantage
The Linked Data AdvantageThe Linked Data Advantage
The Linked Data Advantage
Sqrrl
 
Sqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, AnalyzeSqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl Enterprise: Integrate, Explore, Analyze
Sqrrl
 
Sqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber HuntingSqrrl Datasheet: Cyber Hunting
Sqrrl Datasheet: Cyber Hunting
Sqrrl
 
Benchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value StoreBenchmarking The Apache Accumulo Distributed Key–Value Store
Benchmarking The Apache Accumulo Distributed Key–Value Store
Sqrrl
 
Scalable Graph Clustering with Pregel
Scalable Graph Clustering with PregelScalable Graph Clustering with Pregel
Scalable Graph Clustering with Pregel
Sqrrl
 

More from Sqrrl (20)

Transitioning Government Technology
Transitioning Government TechnologyTransitioning Government Technology
Transitioning Government Technology
 
Leveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsLeveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your Hunts
 
How to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your NetworkHow to Hunt for Lateral Movement on Your Network
How to Hunt for Lateral Movement on Your Network
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting Started
 



YCSB benchmarking

  • 1. YCSB++ Benchmarking Tool: Performance Debugging Advanced Features of Scalable Table Stores. Swapnil Patil, Milo Polte, Wittawat Tantisiriroj, Kai Ren, Lin Xiao, Julio Lopez, Garth Gibson, Adam Fuchs*, Billie Rinaldi*. Carnegie Mellon University, *National Security Agency. Open Cirrus Summit, October 2011, Atlanta GA
  • 2. Scalable table stores are critical systems Swapnil Patil, CMU 2 •  For data processing & analysis (e.g. Pregel, Hive) •  For systems services (e.g., Google Colossus metadata)
  • 3. Evolution of scalable table stores. Simple, lightweight → complex, feature-rich stores •  Supports a broader range of applications and services •  Hard to debug and understand performance problems •  Complex behavior and interaction of various components [Figure: growing set of HBase features by HBASE release, 2008 to 2011+: RangeRowFilters, batch updates, bulk load tools, RegEx filtering, scan optimizations, co-processors, access control]
  • 4. Need richer tools for understanding advanced features in table stores … YCSB++ FUNCTIONALITY: ZooKeeper-based distributed and coordinated testing; API and extensions for the new Apache ACCUMULO DB; fine-grained, correlated monitoring using OTUS. FEATURES TESTED USING YCSB++: batch writing, table pre-splitting, bulk loading, weak consistency, server-side filtering, fine-grained security. Tool released at http://www.pdl.cmu.edu/ycsb++
  • 5. Outline •  Problem •  YCSB++ design •  Illustrative examples •  Ongoing work and summary
  • 6. Yahoo Cloud Serving Benchmark [Cooper2010] •  For CRUD (create-read-update-delete) benchmarking •  Single-node system with an extensible API [Figure: YCSB architecture diagram with command-line parameters, workload parameter file, workload executor, threads, stats, DB clients (HBASE, other DBs) and storage servers]
  • 7. YCSB++: New extensions. Added support for the new Apache ACCUMULO DB − New parameters and workload executors [Figure: the YCSB architecture diagram, now with EXTENSIONS and an ACCUMULO client]
  • 8. YCSB++: Distributed & parallel tests. Multi-client, multi-phase coordination using ZooKeeper − Enables testing at large scales and testing asymmetric features [Figure: the same diagram, adding MULTI-PHASE support and COORDINATION across multiple YCSB clients]
  • 9. YCSB++: Collective monitoring. OTUS monitor built on Ganglia [Ren2011] − Collects information from YCSB, table stores, HDFS and OS [Figure: the same diagram, adding OTUS MONITORING]
  • 10. Example of YCSB++ debugging. OTUS collects fine-grained information − Both HDFS process and TabletServer process on same node [Figure: Monitoring Resource Usage and TableStore Metrics. CPU usage (%) and average number of StoreFiles per tablet vs. time (minutes). Series: Accumulo avg. StoreFiles per tablet, HDFS DataNode CPU usage, Accumulo TabletServer CPU usage]
  • 11. Outline •  Problem •  YCSB++ design •  Illustrative examples − YCSB++ on HBASE and ACCUMULO (Bigtable-like stores) •  Ongoing work and summary
  • 12. Recap of Bigtable-like table stores. Write-path: in-memory buffering & async FS writes 1) Mutations logged in memory tables (unsorted order) 2) Minor compaction: Memtables -> sorted, indexed files in HDFS 3) Major compaction: LSM-tree based file merging in background. Read-path: lookup both memtables and on-disk files [Figure: tablet servers over HDFS nodes, showing data insertion, write-ahead log, memtable, MINOR COMPACTION into sorted indexed files and MAJOR COMPACTION into (fewer) sorted indexed files, with steps numbered 1 to 3]
  • 13. Apache ACCUMULO. Started at NSA; now an Apache Incubator project − Designed for high-speed ingest and scan workloads − http://incubator.apache.org/projects/accumulo.html New features in ACCUMULO − Iterator framework for user-specified programs placed in between different stages of the DB pipeline. E.g., support joins and stream processing using iterators − Also supports fine-grained cell-level access control
  • 14. ILLUSTRATIVE EXAMPLE #1: Analyzing the fast inserts vs. weak consistency tradeoff using YCSB++
  • 15. Client-side batch writing. Feature: clients batch inserts, delay writes to server •  Improves insert throughput and latency •  Newly inserted data may not be immediately visible to other clients [Figure: two YCSB++ store clients, a ZooKeeper cluster manager and the table store servers; CLIENT #1 holds a batch while CLIENT #2 issues Read{K}]
  • 16. Batch writing improves throughput. 6 clients creating 9 million 1-Kbyte records on 6 servers − Small batches: high client CPU utilization, limits throughput − Large batches: saturate servers, limited benefit from batching [Figure: inserts per second (1000s) vs. batch size (10 KB, 100 KB, 1 MB, 10 MB) for HBase and Accumulo]
  • 17. Batch writing causes weak consistency. Test setup: ZooKeeper-based client coordination •  Share producer-consumer queue between readers/writers •  R-W lag = delay before C2 can read C1’s most recent write [Figure: steps 1 to 4 between two YCSB++ store clients, ZooKeeper and the table store servers: (1) CLIENT #1 inserts {K:V} (10^6 records) through its batch, (2) enqueues K (sample 1% records), (3) CLIENT #2 polls and dequeues K, (4) CLIENT #2 issues Read{K}]
  • 18. Batch writing causes weak consistency. Deferred write wins, but lag can be ~100 seconds − (N%) = fraction of requests that needed multiple read()s − Implementation of batching affects the median latency [Figure: fraction of requests vs. read-after-write time lag (ms, log scale from 1 to 100000). (a) HBase: time lag for different buffer sizes: 10 KB (<1%), 100 KB (7.4%), 1 MB (17%), 10 MB (23%). (b) Accumulo: time lag for different buffer sizes: 10 KB (<1%), 100 KB (1.2%), 1 MB (14%), 10 MB (33%)]
  • 19. ILLUSTRATIVE EXAMPLE #2: Benchmarking high-speed ingest features using YCSB++
  • 20. Features for high-speed insertions. Most table stores have high-speed ingest features − Periodically insert large amounts of data or migrate old data in bulk − Classic relational DB techniques applied to new stores. Two features: bulk loading and table pre-splitting •  Less data migration during inserts •  Engages more tablet servers immediately •  Need careful tuning and configuration [Sasha2002]

  • 21. 8-phase test setup: table bulk loading. Bulk loading involves two steps − Hadoop-based data formatting − Importing store files into table store. Pre-load phase (1 and 2) − Bulk load 6M rows in an empty table − Goal: parallelism by engaging all servers. Load phase (4 and 5) − Load 48M new rows − Goal: study rebalancing during ingest. R/U measurements (3, 6 and 8) − Correlate latency with rebalancing work. Phases: 1 Pre-Load (re-formatting), 2 Pre-Load (importing), 3 Read/Update workload, 4 Load (re-formatting), 5 Load (importing), 6 Read/Update workload, 7 Sleep, 8 Read/Update workload
  • 22. Read latency affected by rebalancing work •  High latency after high insertion periods that cause servers to rebalance (compactions) •  Latency drops after store is in a steady state [Figure: Accumulo read latency (ms, log scale) vs. measurement phase running time (seconds) for R/U 1 (Phase 3), R/U 2 (Phase 6) and R/U 3 (Phase 8)]
  • 23. Rebalancing on ACCUMULO servers •  OTUS monitor shows the server-side compactions during post-ingest measurement phases [Figure: StoreFiles, tablets and compactions (log scale) vs. experiment running time (sec)]
  • 24. HBASE is slower: Different compaction policies [Figure: four panels. Accumulo read latency (ms) and HBase read latency (ms) vs. measurement phase running time (seconds) for R/U 1 (Phase 3), R/U 2 (Phase 6) and R/U 3 (Phase 8); Accumulo and HBase StoreFiles, tablets and compactions vs. experiment running time (sec)]
  • 25. Extending to table pre-splitting. Pre-split a key range into N partitions to avoid splitting during insertion [Figure: the table pre-splitting test (phases labeled Pre-split into N ranges, Pre-load, Load, Read/Update workload, Sleep, Read/Update workload) shown next to the 8-phase bulk loading test]
  • 26. Outline •  Problem •  YCSB++ design •  Illustrative examples •  Ongoing work and summary
  • 27. Things not covered in this talk. More features: function shipping to servers − Data filtering at the servers − Fine-grained, cell-level access control. More details in the ACM SOCC 2011 paper. Ongoing work − Analyze more table stores: Cassandra, CouchDB, MongoDB − Continue research through the new Intel Science and Technology Center for Cloud Computing at CMU (with GaTech)
  • 28. Summary: YCSB++ tool •  Tool for performance debugging and benchmarking advanced features using new extensions to YCSB •  Two case-studies: Apache HBASE and ACCUMULO •  Tool available at http://www.pdl.cmu.edu/ycsb++ Features and the extensions used to test them: weak consistency semantics (distributed clients using ZooKeeper); fast insertions, pre-splits & bulk loads (multi-phase testing with Hadoop); server-side filtering and fine-grained access control (new workload generators and database client API extensions)
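
Sketch for slide 8 (multi-client, multi-phase coordination). The slides say only that YCSB++ coordinates clients through ZooKeeper; they do not show the mechanism. The toy below illustrates the general idea of a phase barrier, using Python's threading.Barrier as a local stand-in for a ZooKeeper-based barrier. The function name run_phases and the phase names are hypothetical.

```python
import threading


def run_phases(n_clients, phases):
    """Run every client through the same list of phases in lock step.

    threading.Barrier is a single-process stand-in (an assumption of this
    sketch) for a barrier that real clients would build on ZooKeeper.
    """
    barrier = threading.Barrier(n_clients)
    log = []                      # (phase, client_id) in completion order
    lock = threading.Lock()

    def client(client_id):
        for phase in phases:
            with lock:
                log.append((phase, client_id))  # placeholder for real work
            barrier.wait()        # nobody starts the next phase early

    threads = [threading.Thread(target=client, args=(c,))
               for c in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because every client waits at the barrier after each phase, all entries for one phase appear in the log before any entry of the next phase, which is the property a multi-phase test such as the bulk-loading experiment relies on.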
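
Sketch for slide 12 (write path and read path of a Bigtable-like store). This is a toy in-memory model of the three steps listed on the slide, not the HBase or Accumulo implementation. The class name ToyTablet and the memtable_limit threshold are hypothetical, and the write-ahead log is just a Python list.

```python
import bisect


class ToyTablet:
    """Toy tablet: a memtable plus a list of sorted, immutable runs."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.wal = []       # stand-in for the write-ahead log
        self.memtable = {}  # step 1: mutations buffered in memory, unsorted
        self.files = []     # sorted [(key, value), ...] runs, oldest first

    def insert(self, key, value):
        self.wal.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.minor_compaction()

    def minor_compaction(self):
        """Step 2: write the memtable out as one sorted run."""
        if self.memtable:
            self.files.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.wal.clear()  # logged mutations are now in a file

    def major_compaction(self):
        """Step 3: merge all runs into one; newer values win."""
        merged = {}
        for run in self.files:  # oldest first, so later runs overwrite
            merged.update(run)
        self.files = [sorted(merged.items())] if merged else []

    def read(self, key):
        """Read path: check the memtable, then runs from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.files):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

A read has to consult every run until a major compaction merges them, which is one reason read latency can rise after a burst of inserts, as slides 22 to 24 report.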
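
Sketch for slides 15 to 18 (client-side batching and read-after-write lag). The simulation below is a simplified, single-process illustration of the test idea on slide 17: a writer buffers inserts, a reader polls for each key, and the lag is how long the key stayed invisible. Assumptions of this sketch: time is a logical clock with one tick per insert rather than milliseconds, every key is tracked instead of a 1% sample, a deque replaces the ZooKeeper-backed queue, and all class and function names are hypothetical.

```python
from collections import deque


class ToyServer:
    def __init__(self):
        self.rows = {}

    def put_batch(self, batch):
        self.rows.update(batch)

    def get(self, key):
        return self.rows.get(key)


class BatchingClient:
    """Buffers inserts and sends them once batch_bytes have accumulated."""

    def __init__(self, server, batch_bytes):
        self.server = server
        self.batch_bytes = batch_bytes
        self.buffer = {}
        self.buffered_bytes = 0

    def insert(self, key, value):
        self.buffer[key] = value
        self.buffered_bytes += len(key) + len(value)
        if self.buffered_bytes >= self.batch_bytes:
            self.flush()

    def flush(self):
        self.server.put_batch(self.buffer)
        self.buffer = {}
        self.buffered_bytes = 0


def measure_lag(batch_bytes, n_records=100, value_size=100):
    """Return the read-after-write lag of each record in logical ticks."""
    server = ToyServer()
    writer = BatchingClient(server, batch_bytes)
    pending = deque()  # keys written by the producer, not yet readable
    lags = []
    for tick in range(n_records):
        key = f"user{tick:06d}"
        writer.insert(key, "x" * value_size)
        pending.append((key, tick))
        # The reader polls and dequeues every key that is now visible.
        while pending and server.get(pending[0][0]) is not None:
            _, written_at = pending.popleft()
            lags.append(tick - written_at)
    writer.flush()  # end of run: whatever is still buffered becomes visible
    while pending:
        _, written_at = pending.popleft()
        lags.append(n_records - written_at)
    return lags
```

In this model each record is 110 bytes, so measure_lag(110) flushes on every insert and reports zero lag, while measure_lag(1100) holds ten records per batch and the oldest record in each batch waits nine ticks. The qualitative trend, larger buffers giving longer lag, matches the direction reported on slide 18; the numbers themselves are not comparable.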
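
Sketch for slide 25 (pre-splitting a key range into N partitions). The slides do not say how split points were chosen. The example assumes, purely for illustration, keys of the form "user" followed by a zero-padded integer, evenly spaced boundaries, and the convention that a boundary key is the first key of the next range. The function names split_points and tablet_for are hypothetical.

```python
import bisect


def split_points(n_keys, n_ranges, prefix="user", width=8):
    """Return the n_ranges - 1 boundary keys that cut [0, n_keys) evenly."""
    if n_ranges < 1:
        raise ValueError("n_ranges must be at least 1")
    return [f"{prefix}{(i * n_keys) // n_ranges:0{width}d}"
            for i in range(1, n_ranges)]


def tablet_for(key, splits):
    """Index of the range holding key; a boundary key starts a new range."""
    return bisect.bisect_right(splits, key)
```

With 100 keys and 4 ranges the boundaries are user00000025, user00000050 and user00000075, so each range receives 25 keys and inserts can reach all four partitions from the start instead of waiting for the store to split a single hot tablet.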