The Truth, the Whole Truth, and
Nothing but the Truth
A Pragmatic Guide to Assessing Empirical Evaluations
Stephen M. Blackburn, Amer Diwan, Matthias Hauswirth, Peter F. Sweeney
José Nelson Amaral, Tim Brecht, Lubomír Bulej, Cliff Click, Lieven Eeckhout,
Sebastian Fischmeister, Daniel Frampton, Laurie J. Hendren, Michael Hind,
Antony L. Hosking, Richard E. Jones, Tomas Kalibera, Nathan Keynes,
Nathaniel Nystrom, and Andreas Zeller
Programming Languages Mentoring Workshop, Santa Barbara, June 2016
Rob Walsh
Anneke
Kathryn McKinley
Ron Morrison
Eliot Moss
Robin Stanton
Dave Thomas
Generosity
Perseverance
Mr Beach
Graham Hellistrand
2016 2010
[Figure: performance as a function of heap size for jython, db, and javac, comparing Depth First, Breadth First, Partial DF 2 Children, and OOR. Panels: (a) jython Total Time, (b) db Total Time, (c) javac Total Time, (d) jython Mutator Time, (e) db Mutator Time, (f) javac Mutator Time, followed by three cropped panels whose axis labels survive only as "torL2Misses" and "ses(10^6)". x-axes: heap size relative to minimum heap size, with absolute heap size in MB. y-axes: normalized time and time (sec), or normalized mutator time and mutator time (sec).]
hypotheses, evaluations, claims
a block-based heap
structure could allow us
to get the best of
compacting, free list,
and copying collectors
hypothesis
claim evaluation
Immix: A Mark-Region Garbage Collector with
Space Efficiency, Fast Collection, and Mutator Performance *
Stephen M. Blackburn
Australian National University
Steve.Blackburn@anu.edu.au
Kathryn S. McKinley
The University of Texas at Austin
mckinley@cs.utexas.edu
Abstract
Programmers are increasingly choosing managed languages for modern applications, which tend to allocate many short-to-medium lived small objects. The garbage collector therefore directly determines program performance by making a classic space-time tradeoff that seeks to provide space efficiency, fast reclamation, and mutator performance. The three canonical tracing garbage collectors: semi-space, mark-sweep, and mark-compact each sacrifice one objective. This paper describes a collector family, called mark-region, and introduces opportunistic defragmentation, which mixes copying and marking in a single pass. Combining both, we implement immix, a novel high performance garbage collector that achieves all three performance objectives. The key insight is to allocate and reclaim memory in contiguous regions, at a coarse block grain when possible and otherwise in groups of finer grain lines. We show that immix outperforms existing canonical algorithms, improving total application performance by 7 to 25% on average across 20 benchmarks. As the mature space in a generational collector, immix matches or beats a highly tuned generational collector, e.g. it improves jbb2000 by 5%. These innovations and the identification of a new family of collectors open new opportunities for garbage collector design.
Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors - Memory management (garbage collection)
General Terms Algorithms, Experimentation, Languages, Performance, Measurement
Keywords Fragmentation, Free-List, Compact, Mark-Sweep, Semi-Space, Mark-Region, Immix, Sweep-To-Region, Sweep-To-Free-List
1. Introduction
Modern applications are increasingly written in managed languages and make conflicting demands on their underlying memory managers. For example, real-time applications demand pause-time guarantees, embedded systems demand space efficiency, and servers demand high throughput. In seeking to satisfy these demands, the literature includes reference counting collectors and three canonical tracing collectors: semi-space, mark-sweep, and mark-compact. These collectors are typically used as building blocks for more sophisticated algorithms. Since reference counting is incomplete, we omit it from further consideration here. Unfortunately, the tracing collectors each achieve only two of: space efficiency, fast collection, and mutator performance through contiguous allocation of contemporaneous objects.
Figure 1 starkly illustrates this dichotomy for full heap versions of mark-sweep (MS), semi-space (SS), and mark-compact (MC) implemented in MMTk [12], running on a Core 2 Duo. It plots the geometric mean of total time, collection time, mutator time, and mutator cache misses as a function of heap size, normalized to the best, for 20 DaCapo, SPECjvm98, and SPECjbb2000 benchmarks, and shows 99% confidence intervals. The crossing lines in Figure 1(a) illustrate the classic space-time trade-off at the heart of garbage collection. Mark-compact is uncompetitive in this setting due to its overwhelming collection costs. In smaller heap sizes, the space and collector efficiency of mark-sweep perform best since the overheads of garbage collection dominate total performance. Figures 1(c) and 1(d) show that the primary advantage for semi-space is 10% better mutator time compared with mark-sweep, due to better cache locality. Once the heap size is large enough, garbage collection time reduces, and the locality of the mutator dominates total performance so semi-space performs best.
To explain this tradeoff, we need to introduce and slightly expand memory management terminology. A tracing garbage collector performs allocation of new objects, identification of live objects, and reclamation of free memory. The canonical collectors all identify live objects the same way, by marking objects during a transitive closure over the object graph.
Reclamation strategy dictates allocation strategy, and the literature identifies just three strategies: (1) sweep-to-free-list, (2) evacuation, and (3) compaction. For example, mark-sweep collectors allocate from a free list, mark live objects, and then sweep-to-free-list
* This work is supported by ARC DP0666059, NSF CNS-0719966, NSF CCF-0429859, NSF EIA-0303609, DARPA F33615-03-C-4106, Microsoft, Intel, and IBM. Any opinions, findings and conclusions expressed herein are the authors’ and do not necessarily reflect those of the sponsors.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
PLDI’08, June 7–13, 2008, Tucson, Arizona, USA.
Copyright © 2008 ACM 978-1-59593-860-2/08/06...$5.00
[Figure 1. Performance Tradeoffs For Canonical Collectors: Geometric Mean for 20 DaCapo and SPEC Benchmarks. Panels: (a) Total Time, (b) Garbage Collection Time, (c) Mutator Time, (d) Mutator L2 Misses. Each panel plots MS, MC, and SS against heap size relative to minimum heap size (1 to 6); the y-axes are normalized time, normalized GC time (log), normalized mutator time, and normalized mutator perf (log).]
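The aggregation named in the Figure 1 text (a geometric mean across benchmarks, normalized to the best) can be sketched roughly as below. This is an illustration only: the function name `normalized_geomean` is hypothetical, the timings are invented, and normalizing each benchmark to the best time seen across collectors is an assumed reading of "normalized to the best".

```python
import math

def normalized_geomean(times_by_collector):
    """Geometric mean, per collector, of per-benchmark times normalized to the best.

    times_by_collector maps a collector name to a list of times, one per
    benchmark, with benchmarks in the same order for every collector.
    """
    collectors = list(times_by_collector)
    n_bench = len(times_by_collector[collectors[0]])
    # Best (smallest) time observed for each benchmark across all collectors.
    best = [min(times_by_collector[c][i] for c in collectors) for i in range(n_bench)]
    result = {}
    for c in collectors:
        ratios = [times_by_collector[c][i] / best[i] for i in range(n_bench)]
        # Geometric mean computed in log space.
        result[c] = math.exp(sum(math.log(r) for r in ratios) / n_bench)
    return result

# Invented timings in seconds for two benchmarks (not data from the paper).
example = {"MS": [10.0, 4.0], "SS": [8.0, 5.0]}
scores = normalized_geomean(example)
print(scores)
```

A score of 1.0 would mean the collector was the best on every benchmark; anything above 1.0 is the average slowdown factor relative to the per-benchmark best.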
evaluation claim?
You are doing it completely wrong!
sins of exposition
sins of reasoning
inscrutability irreproducibility
ignorance inappropriateness inconsistency
sins of reasoning
evaluation claim
ignorance
• claim ignores elements of the evaluation
• may be benign
• not benign when the ignored elements counter the claim
• common idioms
• ignoring data points
• ignoring data distribution
evaluation
claim
ignorance: ignoring data points [Georges et al 2007]
[Figure 1 (Georges et al 2007): execution time (s), on an axis from 9.0 to 12.5, for CopyMS, GenCopy, GenMS, MarkSweep, and SemiSpace, shown once as "mean w/ 95% confidence interval" and once as "best of 30". Caption, cropped on the slide: "An example illustrating the pitfall of prevalent Java performance data analysis methods: the ‘best’ method"]
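The contrast the figure draws, "best of 30" against a mean with a 95% confidence interval, can be sketched as follows. The run times are synthetic, the two configurations A and B are hypothetical, and the interval uses the normal approximation (z = 1.96), which is an assumption rather than the method of the cited paper.

```python
import random
import statistics

def mean_with_ci(samples, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    Uses the normal approximation (z = 1.96); this is an assumption that is
    reasonable only for larger sample counts.
    """
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5
    return mean, z * sem

rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Synthetic execution times (seconds) for two hypothetical configurations.
a = [rng.gauss(10.0, 0.1) for _ in range(30)]  # steady
b = [rng.gauss(10.3, 0.8) for _ in range(30)]  # slower on average, but noisy

for name, runs in (("A", a), ("B", b)):
    mean, half = mean_with_ci(runs)
    print(f"{name}: best={min(runs):.2f}s  mean={mean:.2f}s +/- {half:.2f}s")
```

Reporting only `min(runs)` discards the other 29 data points, so a noisy configuration can post the best single run while being slower on average.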
ignorance: ignoring bimodality of a distribution
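A toy illustration of why a summary statistic can ignore bimodality: in the invented sample below, half the runs take 10 s and half take 12 s, so the mean describes a run time that never occurs. The numbers are made up for the sketch.

```python
import statistics

# Invented bimodal sample: 15 fast runs and 15 slow runs (seconds).
runs = [10.0] * 15 + [12.0] * 15

mean = statistics.mean(runs)
# Runs that actually fall within half a second of the mean.
near_mean = [r for r in runs if abs(r - mean) < 0.5]

print(f"mean={mean:.1f}s, runs within 0.5s of the mean: {len(near_mean)}")
```

Looking at the distribution (for example a histogram of `runs`) before summarizing it is one way to notice this kind of structure.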
‘In our experience, while the sin of ignorance
seems obvious and easy to avoid, in reality it is
far from that. Many factors in the
evaluation that seem irrelevant to a claim
may actually be critical to the soundness
of the claim.
As a community we need to work towards
identifying these factors.’
inappropriateness
• claim extends beyond the evaluation
• may be benign
• assume laws of physics
• assume prior (cited) work
• not benign when extending beyond these
• common idioms
• inappropriate metrics (energy != performance)
• inappropriate use of independent variables (don’t control for space in GC eval)
• inappropriate use of tools (such as biased sampling)
evaluation
claim
A B
inappropriateness: misuse of independent variables (ignoring heap size)
[Blackburn et al 2008]
[Figure: normalized time (1 to 1.5) versus heap size relative to minimum heap size (1 to 6) for SemiSpace and MarkSweep.]
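A claim of the form "X is faster than Y" made at a single heap size ignores the independent variable. A minimal sketch of checking whether the winner changes across a heap-size sweep follows; the function name `crossover_points` is hypothetical and the numbers are invented, not read off the figure.

```python
def crossover_points(heap_sizes, time_a, time_b):
    """Return the heap sizes at which the faster of two systems changes."""
    points = []
    for i in range(1, len(heap_sizes)):
        before = time_a[i - 1] - time_b[i - 1]
        after = time_a[i] - time_b[i]
        if before * after < 0:  # sign flip: the winner changed
            points.append(heap_sizes[i])
    return points

# Invented normalized times at heap sizes 1x to 6x the minimum heap.
heaps = [1, 2, 3, 4, 5, 6]
semi_space = [1.50, 1.25, 1.12, 1.05, 1.02, 1.00]
mark_sweep = [1.20, 1.12, 1.09, 1.07, 1.06, 1.05]

print(crossover_points(heaps, semi_space, mark_sweep))
```

If the list is non-empty, any single-heap-size comparison supports a different conclusion depending on which heap size was picked.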
inappropriateness: inappropriate (biased) sampling: hotness of methods under four
profilers [Mytkowicz et al 2010]
[Figure 1 (Mytkowicz et al 2010): Disagreement in the hottest method for benchmark pmd across four popular Java profilers. x-axis: hprof, jprofile, xprof, yourkit; y-axis: percent of overall execution (0 to 20); methods shown: JavaParser.jj_scan_token, NodeIterator.getPositionFromParent, DefaultNameStep.evaluate.]
‘In our experience, while the sin of
inappropriateness seems obvious and easy to
avoid, in reality it is far from that. Many
factors that may be unaccounted for in an
evaluation may actually be important to
deriving a sound claim.
As a community, we need to work towards
identifying these factors.’
inconsistency
• claim and evaluation are about different things
• compare two incompatible things in the evaluation, but claim ignores the incompatibility (apples v oranges)
• common idioms
• inconsistent measurement contexts (night | day, 32 | 64 machines)
• inconsistent metrics (wall clock | cycles, retired instructions | issued instructions)
• inconsistent workloads
• inconsistent data analysis
evaluation claim
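One way to make the idioms above checkable is to record the measurement context alongside each number and refuse to compare results whose contexts differ. The sketch below is an assumption about how that could look; the `Measurement` fields and the `context_mismatches` helper are hypothetical names, not from the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Measurement:
    system: str
    value: float
    metric: str    # e.g. "wall clock" or "cycles"
    machine: str   # e.g. a host or hardware description
    workload: str  # e.g. benchmark name and input size

def context_mismatches(a, b):
    """Return the context fields on which two measurements disagree."""
    fields = ("metric", "machine", "workload")
    return [f for f in fields if getattr(a, f) != getattr(b, f)]

# Invented example: same machine and workload, but different metrics.
old = Measurement("baseline", 12.5, "wall clock", "host-1", "bench-x")
new = Measurement("optimized", 9.1e9, "cycles", "host-1", "bench-x")

print(context_mismatches(old, new))
```

An empty list does not prove the comparison is sound; it only shows that the recorded context fields agree.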
inconsistency: time of day/week makes a large difference to workload [gmail]
[Figure: requests per second plotted against time.]
‘In our experience, while the sin of
inconsistency seems obvious and easy to
avoid, in reality it is far from that. Many
artifacts that seem comparable may actually
be inconsistent.
As a community we need to work towards
identifying these artifacts.’
sins of exposition
inscrutability
• description of the claim is inadequate
• authors often fail to explicitly identify their claim
• three common forms:
• omission (no claim: when this happens, the work is literally meaningless)
• ambiguity (‘improved performance’: latency? throughput? …)
• distortion (‘improved by 10%’: the GC? the whole program? …)
• claim is synthesized by the authors, so in principle, inscrutability
should never occur
claim
irreproducibility
• the description of the evaluation is inadequate
• three common forms:
• omission (space constraints, not realizing factor is relevant, confidentiality)
• ambiguity (imprecise language, lack of detail, missing units)
• distortion (unambiguous, but incorrect, such as missing units)
evaluation
using this framework
evaluation claim?
sins of exposition
sins of reasoning
inscrutability irreproducibility
ignorance inappropriateness inconsistency
our hope: we’re moving up Phil Armour’s orders of ignorance…
OOI3: I don’t know something and I do not have a
process to find out that I don’t know it
OOI2: I don’t know something but I do have a
process to find out that I don’t know it
call for cultural change
[Diagram: paper outcomes plotted by Novelty and Quality of evaluation; regions labeled Rejected, Often rejected, Safe bet, Often rejected, Rare.]
PLDI 2015: 27/33 artifacts accepted from 58 accepted papers
assert(scope_of_claim == scope_of_eval)
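The assertion on this slide can be unpacked into the two sins of reasoning defined earlier: elements of the evaluation that the claim ignores (ignorance) and parts of the claim that extend beyond the evaluation (inappropriateness). A minimal sketch, assuming scopes can be written down as sets of labels; `check_claim_scope` and the labels are hypothetical.

```python
def check_claim_scope(claim_scope, eval_scope):
    """Compare what a claim covers with what the evaluation covers."""
    claim_scope, eval_scope = set(claim_scope), set(eval_scope)
    return {
        # Evaluated but not reflected in the claim: candidate for ignorance.
        "ignored": eval_scope - claim_scope,
        # Claimed but never evaluated: candidate for inappropriateness.
        "unsupported": claim_scope - eval_scope,
    }

# Invented scopes for illustration.
report = check_claim_scope(
    claim_scope={"small heaps", "large heaps"},
    eval_scope={"large heaps", "one outlier benchmark"},
)
print(report)
```

Both sets being empty corresponds to `scope_of_claim == scope_of_eval`.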
Questions?
With special thanks to my co-authors:
Amer Diwan, Matthias Hauswirth, Peter F. Sweeney
José Nelson Amaral, Tim Brecht, Lubomír Bulej, Cliff Click, Lieven Eeckhout,
Sebastian Fischmeister, Daniel Frampton, Laurie J. Hendren, Michael Hind,
Antony L. Hosking, Richard E. Jones, Tomas Kalibera, Nathan Keynes,
Nathaniel Nystrom, and Andreas Zeller

More Related Content

Viewers also liked

Whither Software Engineering Research? (keynote talk at APSEC 2012)
Whither Software Engineering Research? (keynote talk at APSEC 2012)Whither Software Engineering Research? (keynote talk at APSEC 2012)
Whither Software Engineering Research? (keynote talk at APSEC 2012)David Rosenblum
 
Applications and Abstractions: A Cautionary Tale (invited talk at a DIMACS Wo...
Applications and Abstractions: A Cautionary Tale (invited talk at a DIMACS Wo...Applications and Abstractions: A Cautionary Tale (invited talk at a DIMACS Wo...
Applications and Abstractions: A Cautionary Tale (invited talk at a DIMACS Wo...David Rosenblum
 
Career Management (invited talk at ICSE 2014 NFRS)
Career Management (invited talk at ICSE 2014 NFRS)Career Management (invited talk at ICSE 2014 NFRS)
Career Management (invited talk at ICSE 2014 NFRS)David Rosenblum
 
Known Unknowns: Testing in the Presence of Uncertainty (talk at ACM SIGSOFT F...
Known Unknowns: Testing in the Presence of Uncertainty (talk at ACM SIGSOFT F...Known Unknowns: Testing in the Presence of Uncertainty (talk at ACM SIGSOFT F...
Known Unknowns: Testing in the Presence of Uncertainty (talk at ACM SIGSOFT F...David Rosenblum
 
Probability and Uncertainty in Software Engineering (keynote talk at NASAC 2013)
Probability and Uncertainty in Software Engineering (keynote talk at NASAC 2013)Probability and Uncertainty in Software Engineering (keynote talk at NASAC 2013)
Probability and Uncertainty in Software Engineering (keynote talk at NASAC 2013)David Rosenblum
 
Felicitous Computing (invited Talk for UC Irvine ISR Distinguished Speaker Se...
Felicitous Computing (invited Talk for UC Irvine ISR Distinguished Speaker Se...Felicitous Computing (invited Talk for UC Irvine ISR Distinguished Speaker Se...
Felicitous Computing (invited Talk for UC Irvine ISR Distinguished Speaker Se...David Rosenblum
 
The Power of Probabilistic Thinking (keynote talk at ASE 2016)
The Power of Probabilistic Thinking (keynote talk at ASE 2016)The Power of Probabilistic Thinking (keynote talk at ASE 2016)
The Power of Probabilistic Thinking (keynote talk at ASE 2016)David Rosenblum
 
Jogging While Driving, and Other Software Engineering Research Problems (invi...
Jogging While Driving, and Other Software Engineering Research Problems (invi...Jogging While Driving, and Other Software Engineering Research Problems (invi...
Jogging While Driving, and Other Software Engineering Research Problems (invi...David Rosenblum
 

Viewers also liked (8)

Whither Software Engineering Research? (keynote talk at APSEC 2012)
Whither Software Engineering Research? (keynote talk at APSEC 2012)Whither Software Engineering Research? (keynote talk at APSEC 2012)
Whither Software Engineering Research? (keynote talk at APSEC 2012)
 
Applications and Abstractions: A Cautionary Tale (invited talk at a DIMACS Wo...
Applications and Abstractions: A Cautionary Tale (invited talk at a DIMACS Wo...Applications and Abstractions: A Cautionary Tale (invited talk at a DIMACS Wo...
Applications and Abstractions: A Cautionary Tale (invited talk at a DIMACS Wo...
 
Career Management (invited talk at ICSE 2014 NFRS)
Career Management (invited talk at ICSE 2014 NFRS)Career Management (invited talk at ICSE 2014 NFRS)
Career Management (invited talk at ICSE 2014 NFRS)
 
Known Unknowns: Testing in the Presence of Uncertainty (talk at ACM SIGSOFT F...
Known Unknowns: Testing in the Presence of Uncertainty (talk at ACM SIGSOFT F...Known Unknowns: Testing in the Presence of Uncertainty (talk at ACM SIGSOFT F...
Known Unknowns: Testing in the Presence of Uncertainty (talk at ACM SIGSOFT F...
 
Probability and Uncertainty in Software Engineering (keynote talk at NASAC 2013)
Probability and Uncertainty in Software Engineering (keynote talk at NASAC 2013)Probability and Uncertainty in Software Engineering (keynote talk at NASAC 2013)
Probability and Uncertainty in Software Engineering (keynote talk at NASAC 2013)
 
Felicitous Computing (invited Talk for UC Irvine ISR Distinguished Speaker Se...
Felicitous Computing (invited Talk for UC Irvine ISR Distinguished Speaker Se...Felicitous Computing (invited Talk for UC Irvine ISR Distinguished Speaker Se...
Felicitous Computing (invited Talk for UC Irvine ISR Distinguished Speaker Se...
 
The Power of Probabilistic Thinking (keynote talk at ASE 2016)
The Power of Probabilistic Thinking (keynote talk at ASE 2016)The Power of Probabilistic Thinking (keynote talk at ASE 2016)
The Power of Probabilistic Thinking (keynote talk at ASE 2016)
 
Jogging While Driving, and Other Software Engineering Research Problems (invi...
Jogging While Driving, and Other Software Engineering Research Problems (invi...Jogging While Driving, and Other Software Engineering Research Problems (invi...
Jogging While Driving, and Other Software Engineering Research Problems (invi...
 

Similar to The Truth, the Whole Truth, and Nothing but the Truth: A Pragmatic Guide to Assessing Empirical Evaluations

The Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingThe Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingUniversity of Washington
 
L. Operation of Rougher Flotation Circuits Aided by Industrial Simulator.pdf
L. Operation of Rougher Flotation Circuits Aided by Industrial Simulator.pdfL. Operation of Rougher Flotation Circuits Aided by Industrial Simulator.pdf
L. Operation of Rougher Flotation Circuits Aided by Industrial Simulator.pdfvrcp21062000
 
Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...
Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...
Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...Monica Beckwith
 
Parameter Estimation of Pollutant Removal for Subsurface Horizontal Flow Cons...
Parameter Estimation of Pollutant Removal for Subsurface Horizontal Flow Cons...Parameter Estimation of Pollutant Removal for Subsurface Horizontal Flow Cons...
Parameter Estimation of Pollutant Removal for Subsurface Horizontal Flow Cons...mkbsbs
 
MC2: High-Performance Garbage Collection for Memory-Constrained Environments
MC2: High-Performance Garbage Collection for Memory-Constrained EnvironmentsMC2: High-Performance Garbage Collection for Memory-Constrained Environments
MC2: High-Performance Garbage Collection for Memory-Constrained EnvironmentsEmery Berger
 
Let's talk about Garbage Collection
Let's talk about Garbage CollectionLet's talk about Garbage Collection
Let's talk about Garbage CollectionHaim Yadid
 
Hoard: A Scalable Memory Allocator for Multithreaded Applications
Hoard: A Scalable Memory Allocator for Multithreaded ApplicationsHoard: A Scalable Memory Allocator for Multithreaded Applications
Hoard: A Scalable Memory Allocator for Multithreaded ApplicationsEmery Berger
 
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...IJRESJOURNAL
 
Orpailleur -- triclustering talk
Orpailleur -- triclustering talkOrpailleur -- triclustering talk
Orpailleur -- triclustering talkDmitrii Ignatov
 
11.rotary brown stock pulp washers through mathematical models a review
11.rotary brown stock pulp washers through mathematical models a review11.rotary brown stock pulp washers through mathematical models a review
11.rotary brown stock pulp washers through mathematical models a reviewAlexander Decker
 
Rotary brown stock pulp washers through mathematical models a review
Rotary brown stock pulp washers through mathematical models a reviewRotary brown stock pulp washers through mathematical models a review
Rotary brown stock pulp washers through mathematical models a reviewAlexander Decker
 
JVM Garbage Collection Tuning
JVM Garbage Collection TuningJVM Garbage Collection Tuning
JVM Garbage Collection Tuningihji
 
Time domain analysis and synthesis using Pth norm filter design
Time domain analysis and synthesis using Pth norm filter designTime domain analysis and synthesis using Pth norm filter design
Time domain analysis and synthesis using Pth norm filter designCSCJournals
 

Similar to The Truth, the Whole Truth, and Nothing but the Truth: A Pragmatic Guide to Assessing Empirical Evaluations (20)

The Other HPC: High Productivity Computing
The Other HPC: High Productivity ComputingThe Other HPC: High Productivity Computing
The Other HPC: High Productivity Computing
 
Blinkdb
BlinkdbBlinkdb
Blinkdb
 
L. Operation of Rougher Flotation Circuits Aided by Industrial Simulator.pdf
L. Operation of Rougher Flotation Circuits Aided by Industrial Simulator.pdfL. Operation of Rougher Flotation Circuits Aided by Industrial Simulator.pdf
L. Operation of Rougher Flotation Circuits Aided by Industrial Simulator.pdf
 
Os7
Os7Os7
Os7
 
Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...
Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...
Garbage First Garbage Collector (G1 GC): Current and Future Adaptability and ...
 
Parameter Estimation of Pollutant Removal for Subsurface Horizontal Flow Cons...
Parameter Estimation of Pollutant Removal for Subsurface Horizontal Flow Cons...Parameter Estimation of Pollutant Removal for Subsurface Horizontal Flow Cons...
Parameter Estimation of Pollutant Removal for Subsurface Horizontal Flow Cons...
 
MC2: High-Performance Garbage Collection for Memory-Constrained Environments
MC2: High-Performance Garbage Collection for Memory-Constrained EnvironmentsMC2: High-Performance Garbage Collection for Memory-Constrained Environments
MC2: High-Performance Garbage Collection for Memory-Constrained Environments
 
Let's talk about Garbage Collection
Let's talk about Garbage CollectionLet's talk about Garbage Collection
Let's talk about Garbage Collection
 
SISAP17
SISAP17SISAP17
SISAP17
 
CNN for modeling sentence
CNN for modeling sentenceCNN for modeling sentence
CNN for modeling sentence
 
Srikanta Mishra
Srikanta MishraSrikanta Mishra
Srikanta Mishra
 
J011119198
J011119198J011119198
J011119198
 
Hoard: A Scalable Memory Allocator for Multithreaded Applications
Hoard: A Scalable Memory Allocator for Multithreaded ApplicationsHoard: A Scalable Memory Allocator for Multithreaded Applications
Hoard: A Scalable Memory Allocator for Multithreaded Applications
 
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
An Improved Adaptive Multi-Objective Particle Swarm Optimization for Disassem...
 
Orpailleur -- triclustering talk
Orpailleur -- triclustering talkOrpailleur -- triclustering talk
Orpailleur -- triclustering talk
 
11.rotary brown stock pulp washers through mathematical models a review
11.rotary brown stock pulp washers through mathematical models a review11.rotary brown stock pulp washers through mathematical models a review
11.rotary brown stock pulp washers through mathematical models a review
 
Rotary brown stock pulp washers through mathematical models a review
Rotary brown stock pulp washers through mathematical models a reviewRotary brown stock pulp washers through mathematical models a review
Rotary brown stock pulp washers through mathematical models a review
 
JVM Garbage Collection Tuning
JVM Garbage Collection TuningJVM Garbage Collection Tuning
JVM Garbage Collection Tuning
 
Time domain analysis and synthesis using Pth norm filter design
Time domain analysis and synthesis using Pth norm filter designTime domain analysis and synthesis using Pth norm filter design
Time domain analysis and synthesis using Pth norm filter design
 
50120140503004
5012014050300450120140503004
50120140503004
 

The Truth, the Whole Truth, and Nothing but the Truth: A Pragmatic Guide to Assessing Empirical Evaluations

  • 1. The Truth, the Whole Truth, and Nothing but the Truth: A Pragmatic Guide to Assessing Empirical Evaluations. Stephen M. Blackburn, Amer Diwan, Matthias Hauswirth, Peter F. Sweeney, José Nelson Amaral, Tim Brecht, Lubomír Bulej, Cliff Click, Lieven Eeckhout, Sebastian Fischmeister, Daniel Frampton, Laurie J. Hendren, Michael Hind, Antony L. Hosking, Richard E. Jones, Tomas Kalibera, Nathan Keynes, Nathaniel Nystrom, and Andreas Zeller. Programming Languages Mentoring Workshop, Santa Barbara, June 2016
  • 2. Rob Walsh Anneke Kathryn McKinley Ron Morrison Eliot Moss Robin Stanton Dave Thomas Generosity Perseverance Mr Beach Graham Hellistrand
  • 5. [Figure: total time, mutator time, and mutator L2 misses as a function of heap size (relative to minimum heap size) for jython, db, and javac, comparing Depth First, Breadth First, Partial DF 2 Children, and OOR. Panels: (a) jython Total Time, (b) db Total Time, (c) javac Total Time, (d) jython Mutator Time, (e) db Mutator Time, (f) javac Mutator Time, plus three mutator L2 miss panels.]
  • 7. hypothesis: a block-based heap structure could allow us to get the best of compacting, free list, and copying collectors
    Annotated first page of the paper, with the labels "claim" and "evaluation":
    Immix: A Mark-Region Garbage Collector with Space Efficiency, Fast Collection, and Mutator Performance*. Stephen M. Blackburn, Australian National University, Steve.Blackburn@anu.edu.au. Kathryn S. McKinley, The University of Texas at Austin, mckinley@cs.utexas.edu.
    Abstract: Programmers are increasingly choosing managed languages for modern applications, which tend to allocate many short-to-medium lived small objects. The garbage collector therefore directly determines program performance by making a classic space-time trade-off that seeks to provide space efficiency, fast reclamation, and mutator performance. The three canonical tracing garbage collectors: semi-space, mark-sweep, and mark-compact each sacrifice one objective. This paper describes a collector family, called mark-region, and introduces opportunistic defragmentation, which mixes copying and marking in a single pass. Combining both, we implement immix, a novel high performance garbage collector that achieves all three performance objectives. The key insight is to allocate and reclaim memory in contiguous regions, at a coarse block grain when possible and otherwise in groups of finer grain lines. We show that immix outperforms existing canonical algorithms, improving total application performance by 7 to 25% on average across 20 benchmarks. As the mature space in a generational collector, immix matches or beats a highly tuned generational collector, e.g. it improves jbb2000 by 5%. These innovations and the identification of a new family of collectors open new opportunities for garbage collector design.
    Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors—Memory management (garbage collection)
    General Terms: Algorithms, Experimentation, Languages, Performance, Measurement
    Keywords: Fragmentation, Free-List, Compact, Mark-Sweep, Semi-Space, Mark-Region, Immix, Sweep-To-Region, Sweep-To-Free-List
    1. Introduction: Modern applications are increasingly written in managed languages and make conflicting demands on their underlying memory managers. For example, real-time applications demand pause-time guarantees, embedded systems demand space efficiency, and servers demand high throughput. In seeking to satisfy these demands, the literature includes reference counting collectors and three canonical tracing collectors: semi-space, mark-sweep, and mark-compact. These collectors are typically used as building blocks for more sophisticated algorithms. Since reference counting is incomplete, we omit it from further consideration here. Unfortunately, the tracing collectors each achieve only two of: space efficiency, fast collection, and mutator performance through contiguous allocation of contemporaneous objects.
    Figure 1 starkly illustrates this dichotomy for full heap versions of mark-sweep (MS), semi-space (SS), and mark-compact (MC) implemented in MMTk [12], running on a Core 2 Duo. It plots the geometric mean of total time, collection time, mutator time, and mutator cache misses as a function of heap size, normalized to the best, for 20 DaCapo, SPECjvm98, and SPECjbb2000 benchmarks, and shows 99% confidence intervals. The crossing lines in Figure 1(a) illustrate the classic space-time trade-off at the heart of garbage collection. Mark-compact is uncompetitive in this setting due to its overwhelming collection costs. In smaller heap sizes, the space and collector efficiency of mark-sweep perform best since the overheads of garbage collection dominate total performance. Figures 1(c) and 1(d) show that the primary advantage for semi-space is 10% better mutator time compared with mark-sweep, due to better cache locality. Once the heap size is large enough, garbage collection time reduces, and the locality of the mutator dominates total performance so semi-space performs best.
    To explain this tradeoff, we need to introduce and slightly expand memory management terminology. A tracing garbage collector performs allocation of new objects, identification of live objects, and reclamation of free memory. The canonical collectors all identify live objects the same way, by marking objects during a transitive closure over the object graph.
    Reclamation strategy dictates allocation strategy, and the literature identifies just three strategies: (1) sweep-to-free-list, (2) evacuation, and (3) compaction. For example, mark-sweep collectors allocate from a free list, mark live objects, and then sweep-to-free-list …
    [Figure 1. Performance Tradeoffs For Canonical Collectors: Geometric Mean for 20 DaCapo and SPEC Benchmarks. Panels: (a) Total Time, (b) Garbage Collection Time, (c) Mutator Time, (d) Mutator L2 Misses; x-axis: heap size relative to minimum heap size; series: MS, MC, SS.]
    * This work is supported by ARC DP0666059, NSF CNS-0719966, NSF CCF-0429859, NSF EIA-0303609, DARPA F33615-03-C-4106, Microsoft, Intel, and IBM. Any opinions, findings and conclusions expressed herein are the authors’ and do not necessarily reflect those of the sponsors.
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PLDI’08, June 7–13, 2008, Tucson, Arizona, USA. Copyright © 2008 ACM 978-1-59593-860-2/08/06...$5.00
  • 8. evaluation, claim? You are doing it completely wrong! sins of exposition: inscrutability, irreproducibility. sins of reasoning: ignorance, inappropriateness, inconsistency.
  • 11. ignorance
      • claim ignores elements of the evaluation
      • may be benign
      • not benign when the ignored elements counter the claim
      • common idioms
          • ignoring data points
          • ignoring data distribution
      [Diagram: evaluation and claim]
  • 12. ignorance: ignoring data points [Georges et al 2007] [Figure 1 from Georges et al.: execution time (s) for CopyMS, GenCopy, GenMS, MarkSweep, and SemiSpace, reported as "mean w/ 95% confidence interval" and as "best of 30". Caption: "An example illustrating the pitfall of prevalent Java performance data analysis methods: the ‘best’ method …"]
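A minimal sketch of the contrast on this slide, assuming invented timings: the configuration names `A` and `B`, the numbers, and the helper functions are all hypothetical and are not taken from Georges et al. It reports each configuration once as "best of N" and once as a mean with an approximate 95% confidence interval. The interval uses the normal approximation (z = 1.96), which is loose for so few runs; a Student's t critical value would be the more careful choice.

```python
import math
import statistics


def best_of(samples):
    """Report only the fastest run, ignoring every other data point."""
    return min(samples)


def mean_with_ci(samples, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    Assumption: normal approximation with z = 1.96. With few samples a
    Student's t critical value would give a wider, more honest interval.
    """
    mean = statistics.fmean(samples)
    standard_error = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, z * standard_error


# Hypothetical execution times in seconds (invented for illustration).
runs = {
    "A": [10.0, 10.2, 10.4, 10.6, 10.8],
    "B": [9.8, 10.6, 10.9, 11.2, 11.5],
}

for name, samples in runs.items():
    mean, half_width = mean_with_ci(samples)
    print(f"{name}: best={best_of(samples):.2f}s "
          f"mean={mean:.2f}s +/- {half_width:.2f}s")
```

With these invented numbers, "best of N" favours B while the means favour A, and the two intervals overlap, so neither summary alone supports a claim that one configuration is faster.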
  • 13. ignorance: ignoring bimodality of a distribution
  • 14. ‘In our experience, while the sin of ignorance seems obvious and easy to avoid, in reality it is far from that. Many factors in the evaluation that seem irrelevant to a claim may actually be critical to the soundness of the claim. As a community we need to work towards identifying these factors.’
  • 15. inappropriateness
      • claim extends beyond the evaluation
      • may be benign
          • assume laws of physics
          • assume prior (cited) work
      • not benign when extending beyond these
      • common idioms
          • inappropriate metrics (energy != performance)
          • inappropriate use of independent variables (don’t control for space in GC eval)
          • inappropriate use of tools (such as biased sampling)
      [Diagram: evaluation and claim]
  • 16. inappropriateness: misuse of independent variables (ignoring heap size) [Blackburn et al 2008] [Figure: normalized time versus heap size relative to minimum heap size for SemiSpace and MarkSweep, with two heap sizes marked A and B.]
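The point of the two marked heap sizes can be sketched as follows, assuming invented normalized times: the numbers and the helper `winners_by_heap_size` are hypothetical, chosen only so that the curves cross as the Immix excerpt describes (mark-sweep ahead in small heaps, semi-space ahead in large ones). The sketch finds the fastest collector at each heap size, which shows how a measurement at a single heap size can support either conclusion.

```python
def winners_by_heap_size(times):
    """Map each heap size to the collector with the lowest time at that size.

    Assumes every collector was measured at every heap size.
    """
    heap_sizes = sorted({h for series in times.values() for h in series})
    return {h: min(times, key=lambda c: times[c][h]) for h in heap_sizes}


# Hypothetical normalized total times, keyed by heap size expressed as a
# multiple of the minimum heap size (invented for illustration).
times = {
    "MarkSweep": {1.5: 1.20, 2: 1.12, 3: 1.08, 4: 1.06, 6: 1.05},
    "SemiSpace": {1.5: 1.45, 2: 1.25, 3: 1.10, 4: 1.03, 6: 1.00},
}

winners = winners_by_heap_size(times)
for heap_size, collector in winners.items():
    print(f"{heap_size}x minimum heap: {collector} is fastest")
```

Reporting only the 1.5x point or only the 6x point would each yield a confident but opposite claim; sweeping the independent variable exposes the crossover.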
  • 17. inappropriateness: inappropriate (biased) sampling: hotness of methods under four profilers [Mytkowicz et al 2010] [Figure 1 from Mytkowicz et al.: percent of overall execution attributed to JavaParser.jj_scan_token, NodeIterator.getPositionFromParent, and DefaultNameStep.evaluate by the profilers hprof, jprofile, xprof, and yourkit. Caption: "Disagreement in the hottest method for benchmark pmd across four popular Java profilers."]
  • 18. ‘In our experience, while the sin of inappropriateness seems obvious and easy to avoid, in reality it is far from that. Many factors that may be unaccounted for in an evaluation may actually be important to deriving a sound claim. As a community, we need to work towards identifying these factors.’
  • 19. inconsistency
      • claim and evaluation are about different things
      • compare two incompatible things in the evaluation, but claim ignores the incompatibility (apples v oranges)
      • common idioms
          • inconsistent measurement contexts (night | day, 32 | 64 machines)
          • inconsistent metrics (wall clock | cycles, retired instructions | issued instructions)
          • inconsistent workloads
          • inconsistent data analysis
      [Diagram: evaluation and claim]
  • 20. inconsistency: time of day/week makes a large difference to workload [gmail] [Figure: requests per second over time.]
  • 21. ‘In our experience, while the sin of inconsistency seems obvious and easy to avoid, in reality it is far from that. Many artifacts that seem comparable may actually be inconsistent. As a community we need to work towards identifying these artifacts.’
  • 23. inscrutability
      • description of the claim is inadequate
      • authors often fail to explicitly identify their claim
      • three common forms:
          • omission (no claim: when this happens, the work is literally meaningless)
          • ambiguity (‘improved performance’: latency? throughput? …)
          • distortion (‘improved by 10%’: the GC? the whole program? …)
      • claim is synthesized by the authors, so in principle, inscrutability should never occur
      [Diagram: claim]
  • 24. irreproducibility
      • the description of the evaluation is inadequate
      • three common forms:
          • omission (space constraints, not realizing factor is relevant, confidentiality)
          • ambiguity (imprecise language, lack of detail, missing units)
          • distortion (unambiguous, but incorrect, such as missing units)
      [Diagram: evaluation]
  • 26. evaluation, claim? sins of exposition: inscrutability, irreproducibility. sins of reasoning: ignorance, inappropriateness, inconsistency.
  • 27. our hope: we’re moving up Phil Armour’s orders of ignorance…
      • OOI3: I don’t know something and I do not have a process to find out that I don’t know it
      • OOI2: I don’t know something but I do have a process to find out that I don’t know it
  • 29. [Figure: paper outcomes by novelty versus quality of evaluation, with regions labelled Rejected, Often rejected, Safe bet, Often rejected, and Rare.]
  • 30. PLDI 2015: 27/33 artifacts accepted from 58 accepted papers
  • 32. Questions? With special thanks to my co-authors: Amer Diwan, Matthias Hauswirth, Peter F. Sweeney, José Nelson Amaral, Tim Brecht, Lubomír Bulej, Cliff Click, Lieven Eeckhout, Sebastian Fischmeister, Daniel Frampton, Laurie J. Hendren, Michael Hind, Antony L. Hosking, Richard E. Jones, Tomas Kalibera, Nathan Keynes, Nathaniel Nystrom, and Andreas Zeller