Benchmarking, benchmarking, benchmarking. We all do it, mostly it tells us what we want to hear but often hides a mountain of misinformation. In this talk we will walk through the pitfalls that you might find yourself in by looking at some examples where things go wrong. We will then walk through how MongoDB performance is measured, the processes and methodology and ways to present and look at the information.
2. Before we start…
• We are going to look a lot at
– C++ kernel code
– Java benchmarks
– JavaScript tests
• And lots of charts
• And its going to be awesome!
6. "We all live in a house on fire, no fire department
to call; no way out, just the upstairs window to
look out of while the fire burns the house down
with us trapped, locked in it."
The Milk Train Doesn't Stop Here Anymore
Tennessee Williams
7. #1 Time taken to Insert x
Documents
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
8. #1 Time taken to Insert x
Documents
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
9. #1 Time taken to Insert x
Documents
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
10. #1 Time taken to Insert x
Documents
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
11. #1 Time taken to Insert x
Documents
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
12. So that looks ok, right?
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
13. What are else you measuring?
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
Object creation and GC
management?
14. What are else you measuring?
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
Object creation and GC
management?
Thread contention on
nextInt()?
15. What are else you measuring?
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
Object creation and GC
management?
Thread contention on
nextInt()?
Time to synthesize data?
16. What are else you measuring?
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
Object creation and GC
management?
Thread contention on
nextInt()?
Time to synthesize data?
Thread contention on
addAndGet()?
17. What are else you measuring?
long startTime = System.currentTimeMillis();
for (int roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
BasicDBObject doc = new BasicDBObject();
doc.put("_id",id);
doc.put("k",rand.nextInt(numMaxInserts)+1);
String cVal = "…"
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i]=doc;
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
globalInserts.addAndGet(documentsPerInsert);
}
long endTime = System.currentTimeMillis();
Object creation and GC
management?
Thread contention on
nextInt()?
Time to synthesize data?
Thread contention on
addAndGet()?
Clock resolution?
18. Solution: Pre-Create the objects
// Pre Create the Object outside the Loop
BasicDBObject[] aDocs = new BasicDBObject[documentsPerInsert];
for (int i=0; i < documentsPerInsert; i++) {
BasicDBObject doc = new BasicDBObject();
String cVal = "…";
doc.put("c",cVal);
String padVal = "…";
doc.put("pad",padVal);
aDocs[i] = doc;
}
Pre-create non varying
data outside the timing
loop
Alternative
• Pre-create the data in a file; load from file
19. Solution: Remove contention
// Use ThreadLocalRandom generator or an instance of java.util.Random per thread
java.util.concurrent.ThreadLocalRandom rand;
for (long roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
doc = aDocs[i];
doc.put("_id",id);
doc.put("k", nextInt(rand, numMaxInserts)+1);
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
}
// Maintain count outside the loop
globalInserts.addAndGet(documentsPerInsert * roundNum);
Remove contention
nextInt() by making
Thread local
20. Solution: Remove contention
// Use ThreadLocalRandom generator or an instance of java.util.Random per thread
java.util.concurrent.ThreadLocalRandom rand;
Remove contention
nextInt() by making
Thread local
for (long roundNum = 0; roundNum < numRounds; roundNum++) {
for (int i = 0; i < documentsPerInsert; i++) {
id++;
doc = aDocs[i];
doc.put("_id",id);
doc.put("k", nextInt(rand, numMaxInserts)+1);
}
coll.insert(aDocs);
numInserts += documentsPerInsert;
}
// Maintain count outside the loop
globalInserts.addAndGet(documentsPerInsert * roundNum);
Remove contention on
addAndGet()
21. Solution: Timer resolution
long startTime = System.currentTimeMillis();
…
long endTime = System.currentTimeMillis();
long startTime = System.nanoTime();
…
long endTime = System.nanoTime() - startTime;
"granularity of the value
depends on the
underlying operating
system and may be
larger"
"resolution is at least as
good as that of
currentTimeMillis()"
Source
• http://docs.oracle.com/javase/7/docs/api/java/lang/System.html
23. #2 Response time to return all
results
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
24. #2 Response time to return all
results
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
25. #2 Response time to return all
results
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
26. #2 Response time to return all
results
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
27. So that looks ok, right?
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
28. What are else you measuring?
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
Each doc is is 4080 bytes
on disk with powerOf2Sizes
29. What are else you measuring?
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
Each doc is is 4080 bytes
on disk with powerOf2Sizes
Unrestricted predicate?
30. What are else you measuring?
BasicDBObject doc = new BasicDBObject();
doc.put("v", str); // str is a 2k string
for (int i=0; i < 1000; i++) {
doc.put("_id",i); coll.insert(doc);
}
BasicDBObject predicate = new BasicDBObject();
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate);
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
Each doc is is 4080 bytes
on disk with powerOf2Sizes
Unrestricted predicate?
Measuring
• Time to parse &
execute query
• Time to retrieve all
document
But also
• Cost of shipping ~4MB
data through network
stack
31. Solution: Limit the projection
BasicDBObject predicate = new BasicDBObject();
predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20));
BasicDBObject projection = new BasicDBObject();
projection.put("_id", 1);
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate, projection );
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
Return fixed range
32. Solution: Limit the projection
BasicDBObject predicate = new BasicDBObject();
predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20));
BasicDBObject projection = new BasicDBObject();
projection.put("_id", 1);
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate, projection );
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
Return fixed range
Only project _id
33. Solution: Limit the projection
BasicDBObject predicate = new BasicDBObject();
predicate.put("_id", new BasicDBObject("$gte", 10).append("$lte", 20));
BasicDBObject projection = new BasicDBObject();
projection.put("_id", 1);
long startTime = System.currentTimeMillis();
DBCursor cur = coll.find(predicate, projection );
DBObject foundObj;
while (cur.hasNext()) {
foundObj = cur.next();
}
long endTime = System.currentTimeMillis();
Return fixed range
Only project _id
Only 46k transferred
through network stack
36. "Every experiment destroys some of the
knowledge of the system which was obtained by
previous experiments."
The Physical Principles of the Quantum Theory (1930)
Werner Heisenberg
49. Ouch… where's the tree in the
woods?
• 2.4.10 -> 2.6.0
– 4495 git commits
50. git-bisect
• Bisect between good/bad hashes
• git-bisect nominates a new githash
– Build against githash
– Re-run test
– Confirm if this githash is good/bad
• Rinse and repeat
61. Response Times
Bulk Insert 2.6.0 vs 2.6.1
Ceiling similar, lower floor
resulting in 40%
improvement in throughput
62. Secondary effects of Yield policy change
Write lock time reduced
Order of magnitude reduction
of write lock duration
63. Unexpected side effects of
measurement?
> db.serverStatus()
Yes – will cause a read lock to be acquired
> db.serverStatus({recordStats:0})
No – lock is not acquired
> mongostat
Yes - until SERVER-14008 resolved, uses db.serverStatus()
64. CPU sampling
• Get an impression of
– Call Graphs
– CPU time spent on node and called nodes
65. Setup & building with google-profiler
> sudo apt-get install google-perftools
> sudo apt-get install libunwind7-dev
> scons --use-cpu-profiler mongod
66. Start the profiling
> mongodb –dbpath <…>
Note: Do not use –fork
> mongo
> use admin
> db.runCommand({_cpuProfilerStart: {profileFilename: 'foo.prof'}})
Execute some commands that you want to profile
> db.runCommand({_cpuProfilerStop: 1})
70. Public Benchmarks – Not all forks are
the same…
• YCSB
– https://github.com/achille/YCSB
• sysbench-mongodb
– https://github.com/mdcallag/sysbench-mongodb
72. "The future sucks. Change it."
"I'm way cool Beavis, but I cannot change the
future."
Beavis & Butthead
73. What we are working on
• mongo-perf
– UI refactor
– Adding more micro benchmarks (geo, sharding)
• Workloads
– Adding external benchmarks
– Creating benchmarks for common use cases
• Inbox fan in/out
• Analytical dashboards
• Stream / Feeds
• Customers, Partners & Community
74. Here's how you can help change the
future!
• Got a great workload? Great benchmark?
• Want to donate it?
• alvin@mongodb.com
75. Don't be that benchmark…
#1 Know what you are measuring
#2 Measure only what you need to
measure
76. #MongoDBDays
Thank You
Alvin Richards
alvin@mongodb.com / @jonnyeight
Senior Director of Performance Engineering, MongoDB
Editor's Notes
Per Java7 documentation
http://docs.oracle.com/javase/7/docs/api/java/util/Random.html
"Instances of java.util.Random are threadsafe. However, the concurrent use of the same java.util.Random instance across threads may encounter contention and consequent poor performance. Consider instead using ThreadLocalRandom in multithreaded designs."
Per Java7 documentation
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/atomic/package-summary.html
"The specifications of these methods enable implementations to employ efficient machine-level atomic instructions that are available on contemporary processors. However on some platforms, support may entail some form of internal locking."
Per Java7 documentation
http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#currentTimeMillis()
"Returns the current time in milliseconds. Note that while the unit of time of the return value is a millisecond, the granularity of the value depends on the underlying operating system and may be larger. For example, many operating systems measure time in units of tens of milliseconds."
Per Jav7 documentation
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadLocalRandom.html
"A random number generator isolated to the current thread…Use of ThreadLocalRandom is particularly appropriate when multiple tasks (for example, each a ForkJoinTask) use random numbers in parallel in thread pools."
Per Java7 documentation
http://docs.oracle.com/javase/7/docs/api/java/lang/System.html#nanoTime()
"This method provides nanosecond precision, but not necessarily nanosecond resolution (that is, how frequently the value changes) - no guarantees are made except that the resolution is at least as good as that of currentTimeMillis()."