Couchbase Performance Benchmarking

Benchmarking Couchbase Server
Renat Khasanshyn
CEO, Altoros Systems, Inc.

CouchConf 2012
September 21, 2012

Copyright © Altoros Systems, Inc. | CONFIDENTIAL

Presentation Outline

• Benchmark Goals
• Benchmark Design and Scenario
• Benchmarking Tools
• Benchmark Results


About Altoros
• Software delivery acceleration specialist for big data application
implementation services
• 200+ employees globally (Eastern Europe, US, UK, Denmark, Norway)
• Big data practice areas:
– Advertising analytics
– Automated device analytics
– Big data warehouse
Customers

Partners

Implementation Partner


Why Benchmark NoSQL technologies?

• All NoSQL technologies say they are “high
performance and scalable”
But this isn’t helpful to end users
• Performance needs to be measured for meaningful
workloads
⇒ To help users understand the performance characteristics of
databases under those workloads

• So we decided to compare the commonly used
NoSQL databases
• MongoDB 2.2RC
• Cassandra 1.1.2
• Couchbase Server 2.0 - Recent Build


Benchmark Goals

• Reproducible by anyone
– Open Source workload generator
• Focus on the use case for which NoSQL is typically
selected
• Use a realistic workload
– Simulate the steady state of a running application
– Meaningful data amounts & runtime
• Compare latency vs throughput
• Measure max throughput (for given scenario)


Benchmarking Scenario

• Target: an interactive web application
• Scalability and performance are the most common
requirements
• Typically leads to users selecting NoSQL over RDBMS
• The working set of data changes with time
• End users using the application change over time
• Example: every few hours, every few days, every few weeks
• There is more data available than memory (RAM)
• Replication is used for fault tolerance
• Real world data sizes
• Use EC2 as deployment platform
– Commonly used
– Easy to replicate results


Benchmarking Scenario Details

Hardware
• 4 Amazon m1.xlarge instances for the NoSQL DBs
• 1 instance used as the client
Workload details
• Operations are a mix of C:R:U:D in the ratio 5:60:33:2
• Each document roughly 1.5-2K in size (15 fields * 100 bytes)
• 15 million active and 15 million replica documents
• Workload with sliding working set
• Load phase, warm-up phase, access phase
• Runtime of the access phase ~1 hour
• Latency measured for varying throughput - 3 times for each run
• Focus on transaction performance
– Latency
– Throughput
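
The 5:60:33:2 operation mix above can be illustrated with a short sketch. This is not the YCSB implementation; the operation names and the function names are assumptions chosen for illustration, and only the ratio comes from the slide.

```python
import random
from collections import Counter

# C:R:U:D ratio from the slide (5:60:33:2). The operation names are illustrative.
OPERATION_WEIGHTS = {"create": 5, "read": 60, "update": 33, "delete": 2}


def next_operation(rng):
    """Pick the next operation with probability proportional to its weight."""
    names = list(OPERATION_WEIGHTS)
    weights = list(OPERATION_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=1)[0]


def sample_mix(n, seed=42):
    """Draw n operations and count how often each one was chosen."""
    rng = random.Random(seed)
    return Counter(next_operation(rng) for _ in range(n))


if __name__ == "__main__":
    total = 100_000
    counts = sample_mix(total)
    for name, weight in OPERATION_WEIGHTS.items():
        observed = 100 * counts[name] / total
        print(f"{name:6s} target={weight:2d}%  observed={observed:.1f}%")
```

Over a long run the observed shares should settle close to the target percentages, which is what a steady-state workload with a fixed mix needs.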


What was measured?

Latency
• Round trip time taken for a request to execute from the client to the
server and back
• Average, 95th and 99th percentile measured
• Why is this important?
– You want your users to have a great experience
– Not just an “average” one

Throughput
• Throughput was varied from 1K ops/sec to 25K ops/sec depending on
NoSQL database
• Max throughput was measured
• Why is this important?
– You want your app to support hundreds of thousands of users

Workloads are not rate limited, focused on max throughput.
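
To show why percentiles are reported next to the average, here is a minimal sketch of how the three latency figures could be computed from raw samples. It uses the nearest-rank percentile definition, which is an assumption; the slides do not say which definition was used. The function names are hypothetical.

```python
def percentile(sorted_values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100.

    Returns the smallest sample such that at least pct percent of all
    samples are less than or equal to it.
    """
    if not sorted_values:
        raise ValueError("no samples")
    # Integer ceiling of pct * n / 100, which avoids float rounding issues.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def summarize(latencies_ms):
    """Return average, 95th and 99th percentile latency in milliseconds."""
    ordered = sorted(latencies_ms)
    return {
        "avg": sum(ordered) / len(ordered),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
    }


if __name__ == "__main__":
    # 98 fast requests and 2 slow ones: the average hides the slow tail.
    samples = [1.0] * 98 + [50.0] * 2
    print(summarize(samples))
```

With these made-up samples the average stays under 2 ms while the 99th percentile is 50 ms, which is the "not just an average experience" point from the slide.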


YCSB


Benchmark Implementation: YCSB

• Yahoo! team offered a “standard” benchmark

• Yahoo! Cloud Serving Benchmark (YCSB)
– Focus on database
– Focus on performance

• YCSB Client consists of 2 parts
– Workload generator
– Workload scenarios


Why YCSB

• Open source
• Extensible
• Rich selection of connectors
– Azure, BigTable, Cassandra, CouchDB, Dynomite, GemFire, HBase,
Hypertable, Infinispan, MongoDB, PNUTS, Redis, Voldemort, GigaSpaces XAP
– Connector for sharded RDBMS (e.g. MySQL)
• We developed a few connectors
– Accumulo, Couchbase, Riak
– Connector for shared-nothing RDBMS (e.g. MySQL Cluster)


How YCSB Works


THE CONFIGURATIONS


Cluster specification

YCSB Client: 1 Amazon m1.xlarge instance
• 15 GB memory
• 4 virtual cores
• 4 EBS 50 GB volumes in RAID0
• 64-bit Amazon Linux (CentOS binary compatible)

Database cluster: 4 Amazon m1.xlarge instances*
• 15 GB memory
• 4 virtual cores
• 4 EBS 50 GB volumes in RAID0
• 64-bit Amazon Linux

* Extra nodes for masters, routers, etc.

Couchbase Configuration

• 4 node Couchbase cluster
• 1 replica setting
• Each node has some active and some replica
data
• 12 GB (12288 MB) used as the Couchbase
bucket size per node


MongoDB Configuration

• 4 shards, each with 1 replica (replication factor: 1);
each shard is a set of 2 nodes, a primary and a
secondary
• Journaling disabled (trying to maximize performance)
• var shards = [
"shard1/ycsb-node1:27017,ycsb-node2:27018",
"shard4/ycsb-node4:27017,ycsb-node3:27018"];
Each node runs:
• 2 mongod processes (8 mongod processes in total
across the 4 nodes)
• 4 mongos processes (the MongoDB router
process) on port 27019

Cassandra Configuration

• Cassandra JVM settings:
– MAX_HEAP_SIZE, the total amount of memory dedicated
to the Java heap: 6G
– HEAP_NEWSIZE, the total amount of memory for the
new generation of objects: 400M

• Cassandra settings:
– RandomPartitioner was used, which distributes
rows across the cluster evenly by MD5
– Memtable size: 4048 MB


THE RESULTS


Reads (Average time)

[Chart: read latencies against throughput. X axis: operations per second, 0 to 22,000. Y axis: average latency in ms, 0 to 7. Series: Cassandra, MongoDB, Couchbase.]

Reads (95th percentile)

[Chart. X axis: 0 to 22,000. Y axis: 95th percentile latency in ms, 0 to 18. Series: Cassandra, Couchbase, MongoDB.]

Reads (99th percentile)

[Chart. X axis: 0 to 22,000. Y axis: 0 to 60. Series: Cassandra, MongoDB, Couchbase.]

Mongo Replica Reads

• MongoDB setup had 4 shards
• By default only masters will service reads
• To allow replica reads and keep the results comparable, replica
data must be kept up to date
• This was done using write-concern (REPLICAS_SAFE)
• Tests showed that results did not improve
• This includes results for writes


Writes (Average time)

[Chart: insert and update latencies against throughput. X axis: operations per second, 0 to 22,000. Y axis: average latency in ms, 0 to 5. Series: MongoDB, Cassandra, Couchbase.]

Writes (95th percentile)

[Chart: insert and update latencies against throughput. X axis: 0 to 22,000. Y axis: 0 to 30. Series: MongoDB, Cassandra, Couchbase.]

Writes (99th percentile)

[Chart: insert and update latencies against throughput. X axis: 0 to 22,000. Y axis: 0 to 50. Series: MongoDB, Cassandra, Couchbase.]

Results Analysis

• Couchbase
• Showed the lowest latencies & highest throughput
• Latency was independent of throughput up to three-quarters of the max
achievable throughput (for both reads and writes)
• Cassandra
• Had the highest latencies of all the databases
• Showed higher max throughput than MongoDB, but only
60% of the throughput achieved by Couchbase
• Latencies rose fast as throughput was increased
• MongoDB
• Read latencies were better than Cassandra but higher than
Couchbase
• Max throughput for reads and writes was the lowest of all the
databases
– Particularly for writes, high latencies were seen at average throughput
– The coarse write lock seems to have a big impact on performance


Other Thoughts

• You decide who is a winner
• NoSQL is a case of “different horses for different courses”
• Evaluate before choosing the “horse”
• Construct your own or use existing workloads
• Benchmark it
• Tune database!
• Benchmark it again

Amazon EC2 observations
• Scales perfectly for NoSQL
• EBS slows down the database on reads
• RAID0 it! Use 4 disks in the array (a good choice); some have reported
degraded performance with a higher number (6 and up)


What are we missing in our benchmarking scenario?

Load phase workload
• Working set is created
• 15 million records
• 1.5 KB record (15 fields by 100 Bytes)
• 45GB total or ≈12GB per node
Ideas, anyone?
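
The load-phase figures above can be checked with a little arithmetic. The sketch below assumes decimal gigabytes, ignores key and metadata overhead, and assumes that the 45 GB total counts one replica copy of every record (the scenario lists 15 million active and 15 million replica documents on 4 nodes). The constant and function names are illustrative.

```python
FIELDS_PER_RECORD = 15
BYTES_PER_FIELD = 100
ACTIVE_RECORDS = 15_000_000
REPLICA_COPIES = 1  # assumption: one replica copy of every record
NODES = 4


def dataset_size_gb():
    """Return (active GB, total GB including replicas, GB per node)."""
    record_bytes = FIELDS_PER_RECORD * BYTES_PER_FIELD  # 1,500 bytes per record
    active_gb = ACTIVE_RECORDS * record_bytes / 1e9
    total_gb = active_gb * (1 + REPLICA_COPIES)
    per_node_gb = total_gb / NODES
    return active_gb, total_gb, per_node_gb


if __name__ == "__main__":
    active, total, per_node = dataset_size_gb()
    print(f"active data:  {active} GB")    # 22.5 GB
    print(f"with replica: {total} GB")     # 45.0 GB
    print(f"per node:     {per_node} GB")  # 11.25 GB
```

Under these assumptions the raw payload comes to 11.25 GB per node, which is in line with the ≈12GB per node quoted on the slide once some per-document overhead is allowed for.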


YCSB Connectors

github.com/Altoros/YCSB


Workload Generator Specs

Hotspot generator with sliding window:

hotspotslidingspeed=10
Speed of the hot set window movement measured in keys per second, with a
default value of 10 keys/sec (can be overridden in workload properties file).
hotspotdatafraction=0.2
Proportion of the hot data set to the whole dataset, default is 0.2
hotspotoperationfraction=0.9
Fraction of operations that target the hot data set rather than the cold
data set; default is 0.8, this benchmark used 0.9
lowerbound=0
The minimal key value allowed to be queried. Set to 0
upperbound=15000000
The maximum key value allowed to be queried. Set to 15 million

Client process settings (the client drives the workload):
threadcount=30
Number of parallel threads spawned on the client node to drive the benchmark
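
As a rough illustration of how a hotspot generator with a sliding window might behave, here is a minimal sketch. It is not the generator from the Altoros YCSB fork: the class, its methods, the wrap-around behaviour and the treatment of upperbound as exclusive are all assumptions. Only the parameter names and their values come from the slide.

```python
import random


class SlidingHotspotGenerator:
    """Hypothetical key chooser with a hot window that slides over time."""

    def __init__(self, lowerbound=0, upperbound=15_000_000,
                 hotspotdatafraction=0.2, hotspotoperationfraction=0.9,
                 hotspotslidingspeed=10, seed=None):
        if not 0 < hotspotdatafraction <= 1:
            raise ValueError("hotspotdatafraction must be in (0, 1]")
        self.lowerbound = lowerbound
        # Assumption: upperbound is treated as exclusive in this sketch.
        self.keyspace = upperbound - lowerbound
        self.hot_size = max(1, round(self.keyspace * hotspotdatafraction))
        self.op_fraction = hotspotoperationfraction
        self.speed = hotspotslidingspeed  # keys per second
        self.rng = random.Random(seed)

    def hot_start(self, elapsed_seconds):
        """Offset of the hot window's first key, wrapping around the key space."""
        return int(elapsed_seconds * self.speed) % self.keyspace

    def is_hot(self, key, elapsed_seconds):
        """True if key lies inside the hot window at the given time."""
        start = self.hot_start(elapsed_seconds)
        return (key - self.lowerbound - start) % self.keyspace < self.hot_size

    def next_key(self, elapsed_seconds):
        """Choose a hot key with probability op_fraction, otherwise a cold key."""
        cold_size = self.keyspace - self.hot_size
        if cold_size == 0 or self.rng.random() < self.op_fraction:
            offset = self.rng.randrange(self.hot_size)
        else:
            offset = self.hot_size + self.rng.randrange(cold_size)
        start = self.hot_start(elapsed_seconds)
        return self.lowerbound + (start + offset) % self.keyspace


if __name__ == "__main__":
    gen = SlidingHotspotGenerator(seed=1)
    one_hour = 3600
    keys = [gen.next_key(one_hour) for _ in range(20_000)]
    hot_share = sum(gen.is_hot(k, one_hour) for k in keys) / len(keys)
    print(f"hot window starts at key {gen.hot_start(one_hour)}")
    print(f"share of operations on hot keys: {hot_share:.2f}")
```

With the slide's values the hot set covers 3 million of the 15 million keys, and after one hour at 10 keys/sec the window has moved by 36,000 keys, so the working set drifts slowly relative to its size.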


Thank You!
renat.k@altoros.com
@renatkhasanshyn
Tel. (650) 395-7002
