Hermes: Towards Representative Benchmarks
Collaborative work by many PhDs and Post Docs!
Dr. Michael Eichberg (mail@michael-eichberg.de)

Technische Universität Darmstadt
http://www.opal-project.de/Hermes.html
https://twitter.com/@TheOpalProject
https://gitter.im/OPAL-Project/Hermes
OOPSLA 2000
“The benchmarks in our evaluation are meant to be representative of real applications and they include four SPECjvm benchmarks plus three other large, object-oriented benchmarks.”
Empirical Software Engineering 2011
“ For our study, we consider the API
documentation of packages from the
official Java Development Kit, the Eclipse
project, and from the Apache foundation.
[…] By choosing libraries that cover a wide
range of use cases we try to alleviate the
risk that the study is biased towards a
particular domain. The decision is based on
the evidential observation that the level
and the quality of their documentation is
often high.”
FSE 2015
“ In the third research question, we looked at
the performance […] on realistic
programs. We selected five realistic apps
from different sources.”
[Bar chart: size/execution time of the analysis for the five apps Reflection, BookStore, a2z, add.me.fast, and pipefilter]
Why?
FSE 2019
“We evaluate IT on three well-known benchmark suites — DaCapo 2006, DaCapo-9.12-MR1-bach, and ScalaBench. Additionally, we use two real-world performance bug datasets. […]”
Qualitas Corpus,

XCorpus,

…

It’s huge, but…

is it representative?
[Slide shows the full list of the roughly 100 Qualitas Corpus projects, from ant, antlr, and aspectj to struts and tapestry; the extracted text is garbled.]
Properties of the Qualitas Corpus (Sept. 2013)
• No usage of JavaFX (available since 2008)

• Only one project (Hibernate) made use of Java 7 features

• (There is a lot of redundancy.)
Feature-oriented
Benchmarks
• Securibench

• Pointerbench

• …
Features are
not distributed
uniformly!
Judge: Identifying,
Understanding, and
Evaluating Sources of
Unsoundness in Call
Graphs
Michael Reif, Florian Kübler, Michael
Eichberg, Dominik Helm, Mira Mezini

ISSTA’19
WALA 0-1-CFA — all configured with the FULL reflection option. WALA requires specifying packages to be excluded from the analysis. For the comparative analysis (Experiment 2 (see 4.3)) we excluded no package, whereas for the experiment related to RQ3 we use the predefined Java60RegressionExclusions to ensure termination. For all Soot call graphs (SootCHA, SootRTA, SootVTA, and SootSPARK [26]) we use the options safe-forname and safe-newinstance. These options make Soot consider all types as instantiated when Class.forName or Class.newInstance is used. We could not use types-for-invoke due to exceptions being thrown [41]. Furthermore, we use include-all to ensure that no packages are filtered. Our library test cases are additionally started with library:signature-resolution and all-reachable to make use of Soot’s capabilities to analyze library code. DOOPCI’s call graph is set to be context-insensitive with classical-reflection turned on. For OPALRTA, we use the standard configuration.
All test cases w.r.t. libraries are started with the respective library
entry points. We perform all experiments on a server with two Intel
Xeon E5-2620 CPUs and 64 GB RAM.
4.2 Experiment 1
Our corpus for analyzing the prevalence of language and API features (RQ1) includes the XCorpus [13], the top 50 libraries from Maven Central [31] (from July 2018), the top 15 apps from Google’s Playstore (from January 2018), plus five Clojure [20], Groovy [32], Kotlin [16], and Scala [24] projects.
Table 2 visualizes the results using a heatmap. It shows the relative frequency of each feature (cf. Feature column) within each corpus. We include the OpenJDK column as a separate corpus because most corpus projects are built upon it and, hence, partially use its features. A feature’s relative frequency is color coded using a logarithmic scale as shown in the legend of Table 2. Slightly yellow boxes (■) identify unused features and red boxes (■) those found in ≥ 5% of all methods; we chose 5% because only 7 features occur in more than 5% of all methods. Features used in no corpus (e.g., Groovy invokedynamics, or the serialization of lambdas) and always soundly resolved features (e.g., standard poly-/monomorphic call) are not included.
■ All the API and language features supported by Java up to version 7 are used widely across all code bases.
The most frequently used feature that was introduced with Java 8 is the call of static interface methods (J8DIM6). 12% of all methods of the top 50 Maven projects use them; Scalatest [22] is responsible for ≈ 90% of all uses. Clojure and Android code have not yet adopted Java 8 call semantics. Other Java 8 features, e.g., MethodHandle constants, are rarely used; primarily by the Nashorn library.
■ Support for Java 8 is a must, given the frequent use of Java 8 call semantics features in modern code (J8DIMX), unless one analyzes only Android or Clojure code.
Serialization-related functionality (Ser3-7,9, ExtSer) and Java’s Reflection API (cf. TR, LRR, CSR) are both used with medium frequency. […]
[Table 2 legend and feature labels: relative frequencies color coded on a log2 scale from 0% to 5%; corpora: OpenJDK8, XCorpus, Top50Maven, Scala, Groovy, Kotlin, Clojure, Android; features: J8DIM1-6, JVMC1-5, Lambda1-7, LIB3-5, MR1-7, Native, Ser1-9, ExtSer1-3, SI1-8, MH1.1-8, TR1-9, LRR1-3, CSR1-3, Unsafe1-7]
Getting Representative and Relevant Benchmarks
1. Determine the population
2. Get a (manageable) representative subset
2.1. determine the relevant technical features
2.2. compute the subset…
2.2.1. for development and testing purposes
2.2.2. for a comprehensive evaluation
3. Make it publicly available!
Hermes
Assessment and Creation of Effective Test Corpora

(consisting of projects available as Java bytecode)
Facilitating the
Development and Evaluation of
Code Analyses
Idea → Prototyping → Development → Evaluation → Paper
OPAL
Hermes in a Nutshell
Corpus candidates → Hermes → Optimal corpus
Feature Queries
Manual and/or
Automatic Selection
using Choco Solver
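The selection step can be viewed as a covering problem: pick a small set of candidate projects whose combined features cover everything the whole candidate set exhibits. Hermes solves this with the Choco constraint solver; the greedy sketch below (plain Scala, with made-up project and feature names) only illustrates the underlying idea, not the actual implementation:

```scala
// Greedy sketch of corpus selection as set cover. Hermes itself uses
// the Choco constraint solver; project and feature names here are
// hypothetical.
object CorpusSelection {

  /** Repeatedly picks the project covering the most still-uncovered
    * features until every feature of the candidate set is covered. */
  def selectCorpus(candidates: Map[String, Set[String]]): List[String] = {
    var uncovered = candidates.values.flatten.toSet
    var chosen = List.empty[String]
    while (uncovered.nonEmpty) {
      // The project contributing the most new features wins this round.
      val (best, features) =
        candidates.maxBy { case (_, fs) => (fs & uncovered).size }
      chosen ::= best
      uncovered --= features
    }
    chosen.reverse
  }
}

val candidates = Map(
  "ant"     -> Set("Lambda1"),
  "aspectj" -> Set("Lambda1", "TR1"),
  "lucene"  -> Set("Ser1")
)
val corpus = CorpusSelection.selectCorpus(candidates)
// → List("aspectj", "lucene"): "ant" is redundant, since "aspectj"
//   already exhibits Lambda1.
```

Unlike this greedy approximation, a constraint solver can return a provably minimal subset, which is why Hermes delegates the actual selection to Choco.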
General Purpose Queries
• Existence of Bytecode Instructions
• Class File Versions
• Class Types
• Trivial Reflection
• OO Metrics (Fan-In/Fan-Out, …)
• Field Access
• Method w/o Returns
• Call Graph Related
• Method Types
• Recursive Data Structures
• Size of Inheritance Tree
• API Usage
Feature Queries Relevant for Constructing Call Graphs
• Static Initializer
• Lambdas
• Unsafe API
• Polymorphic Calls
• Method Reference
• Serialization
• Java 8 Polymorphic Calls
• Signature Polymorphic Methods
• Non-Java Bytecode
It’s Extensible!
Feature Queries for API Usage
• Bytecode Instrumentation
• Class Loader
• GUI
• Crypto
• JDBC
• Reflection
• System
• Thread
• Unsafe
It’s Extensible!
Feature Queries
trait FeatureQuery {
  // …
  def apply[S](
    projectConfiguration: ProjectConfiguration,
    project: Project[S],
    rawClassFiles: Traversable[(da.ClassFile, S)]
  ): TraversableOnce[Feature[S]]
  // …
}
Identifier, Project JAR Files, Library JAR Files, Statistics
Complete reified project information (classes, fields, methods, bodies, etc.)
Raw class file information (e.g., for extracting information from the constant pool)
List of detected features in the codebase (id, frequency of occurrence, (opt.) locations)
Feature
abstract case class Feature[S] private (
  id: String,
  count: Int,
  extensions: Chain[Location[S]]
) {
  …
}
The name of the
feature.
How often the feature
was found.
Where the feature was
found.
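Putting the two slides together: a feature query is essentially a function from project code to a list of Feature values. The stand-alone sketch below replaces OPAL's Project, da.ClassFile, and Chain types with plain Scala collections and a toy method model, so the names and shapes here are illustrative, not the real Hermes API:

```scala
// Simplified stand-ins for the Hermes types (the real API uses
// Project[S], da.ClassFile, and Chain[Location[S]]).
final case class Feature[S](id: String, count: Int, extensions: List[S])
final case class Method(name: String, opcodes: List[String])

/** A toy feature query: which methods contain an invokedynamic? */
def invokedynamicUsage(methods: List[Method]): Feature[String] = {
  val hits = methods.filter(_.opcodes.contains("invokedynamic")).map(_.name)
  // id = feature name, count = how often it was found,
  // extensions = where it was found.
  Feature("InvokedynamicUsage", hits.size, hits)
}

val methods = List(
  Method("plain",  List("aload_0", "invokespecial", "return")),
  Method("lambda", List("invokedynamic", "areturn"))
)
val feature = invokedynamicUsage(methods)
// → Feature("InvokedynamicUsage", 1, List("lambda"))
```

In Hermes, each such query runs over all corpus candidates, and the per-project feature counts feed the selection step shown earlier.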
Hermes - Example
Constructing a Minimal Corpus
• Hidden Truths in Dead Software Paths @ FSE 15

• Original evaluation and development conducted on the complete Qualitas Corpus

• The minimal corpus computed by Hermes using all available queries consists of only 5 of the 100 projects in the Qualitas Corpus

• Evaluation cut down from 16.77 minutes to 2.82 minutes (~6x faster) while coverage is only 1.06% below the original corpus
• Can help you to build a specifically targeted benchmark

• Hermes Data (i.e., all current feature queries) for all of
Maven Central

• For details talk to Ben :-)
Delphi
