SOFTWARE
TECHNIK
Automating the Generation of Benchmark Suites
Creation, Assessment, and Management of Effective Test Corpora
Ben Hermann
@benhermann
Joint work with Lisa Nguyen Quang Do, Michael Eichberg, Karim Ali, and Eric Bodden
National Java Resource Workshop @ SPLASH, Vancouver
October 23rd, 2017
@benhermannABM @ NJR 2017
Evaluation of Code Analyses
• Compare the results of an analysis against
  • A ground truth, to show soundness
  • A previous analysis, to show improvement (e.g., in precision)
Diagram: the new analysis and the previous analyses both analyze the evaluation corpus; the ground truth is based on the same corpus.
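The comparison described on this slide can be sketched in a few lines of Scala. This is only an illustration, not code from ABM or Hermes: the names `EvaluationSketch`, `Comparison`, and `compare` are hypothetical, findings are modeled as plain set elements, and "sound" is taken to mean that no ground-truth finding is missed.

```scala
object EvaluationSketch {
  /** Outcome of comparing reported findings against a ground truth (hypothetical model). */
  final case class Comparison[A](
      truePositives: Set[A],
      falsePositives: Set[A],
      falseNegatives: Set[A]
  ) {
    // Sound in the sense assumed here: nothing from the ground truth was missed.
    def isSound: Boolean = falseNegatives.isEmpty

    // Precision: share of reported findings that are correct (1.0 if nothing was reported).
    def precision: Double = {
      val reported = truePositives.size + falsePositives.size
      if (reported == 0) 1.0 else truePositives.size.toDouble / reported
    }
  }

  /** Splits the reported findings into correct, spurious, and missed ones. */
  def compare[A](reported: Set[A], groundTruth: Set[A]): Comparison[A] =
    Comparison(
      truePositives = reported.intersect(groundTruth),
      falsePositives = reported.diff(groundTruth),
      falseNegatives = groundTruth.diff(reported)
    )
}
```

Running `compare` for a new analysis and for a previous analysis against the same ground truth gives two `Comparison` values whose `precision` can be compared directly, provided both analyses ran on the same evaluation corpus.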
Construction of a Corpus
• Size
• Content
• Representativeness
• Permanence
Criteria from Tempero et al. 2010
• Sources
• Purpose
How to determine this?
How to achieve this?
Sourcing Projects for the Corpus
ABM: GitHub, BitBucket, … -> collect -> build -> Compiled Projects
• collect: Criteria such as size, license, or programming language apply
• build: We currently support Maven and sbt, but are expanding (e.g., Gradle)
Criteria addressed: Size, Content
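A minimal sketch of the collect step, assuming candidates are described by a handful of metadata fields. Everything here is hypothetical and not ABM's actual data model: `Candidate`, `Criteria`, and `select` are illustrative names, and the only facts taken from the slide are the example criteria (size, license, programming language) and the supported build tools (Maven and sbt).

```scala
object CollectSketch {
  // Hypothetical metadata for a repository found on a hosting service.
  final case class Candidate(
      name: String,
      language: String,
      license: String,
      sizeKb: Long,
      buildTool: String
  )

  // Hypothetical selection criteria; the slide names size, license, and language as examples.
  final case class Criteria(
      languages: Set[String],
      licenses: Set[String],
      minSizeKb: Long,
      maxSizeKb: Long
  )

  // Build tools the slide names as currently supported.
  val supportedBuildTools: Set[String] = Set("maven", "sbt")

  /** True if the candidate satisfies every collection criterion. */
  def matches(c: Candidate, crit: Criteria): Boolean =
    crit.languages.contains(c.language) &&
      crit.licenses.contains(c.license) &&
      c.sizeKb >= crit.minSizeKb &&
      c.sizeKb <= crit.maxSizeKb

  /** True if the candidate uses a build tool that can be driven automatically. */
  def buildable(c: Candidate): Boolean =
    supportedBuildTools.contains(c.buildTool)

  /** Keeps the candidates that match the criteria and can be built. */
  def select(candidates: Seq[Candidate], crit: Criteria): Seq[Candidate] =
    candidates.filter(c => matches(c, crit) && buildable(c))
}
```

Keeping the criteria in a plain value such as `Criteria` means the same definition can be stored and replayed later, which fits the idea of retaining collection definitions.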
How can we achieve representativeness for a corpus?
Representativeness in Custom Collections
We used the three algorithms to construct respective call graphs for a large set of libraries: the 100 most used distinct Java related libraries from Maven Central Repository. The set is representative for a wide range of libraries.
It contains very small (e.g., JUnit) to very large (e.g., Scala Library) libraries; libraries developed primarily in an industrial context (e.g., Guava) or in an open-source setting (e.g., Apache Commons); libraries from very different domains: testing (e.g., Hamcrest, Mockito), databases (e.g., HSQLDB), bytecode engineering (e.g., cglib), runtime environments (e.g., Scala Runtime), containers (e.g., Netty), and also general utility libraries (e.g., osgi.core).
Additionally, it contains two libraries that have unusual properties: jsr305 and easymockclassextension both do not contain a single instance method call. The jsr305 project is just a collection of annotations and easymockclassextension only contains interface definitions and a few classes with static methods.
Lastly, the set also contains libraries that are written in other languages, such as Scala (e.g., ScalaTest), whose compilers only use a subset of the JVM’s concepts. The Scala compiler, e.g., does not use package and protected visibility. This significantly limits our possibilities to identify the library-private implementation (recall that LibCHA_CPA identifies a library’s private implementation based on the evaluation of the code elements’ visibilities). For each library, we also downloaded all of its dependencies to build complete class hierarchies for them.
Description of the Darmstadt Library Corpus (DLC) from:
Michael Reif, Michael Eichberg, Ben Hermann, Johannes Lerch, and Mira Mezini. 2016. Call graph construction for Java libraries. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016).
Representativeness in ABM
ABM -> build -> Compiled Projects
Hermes: inspect, select
Criterion addressed: Representativeness
How Hermes Works
Corpus candidates -> Hermes -> Optimal corpus
• Feature Queries
• Manual or Automatic Selection
• OPAL: Introduced at SOAP 2014
• Hermes: Introduced at SOAP 2017
Feature Queries
trait FeatureQuery {
  // …
  def apply[S](
    projectConfiguration: ProjectConfiguration,
    project: Project[S],
    rawClassFiles: Traversable[(da.ClassFile, S)]
  ): TraversableOnce[Feature[S]]
  // …
}
• projectConfiguration: Identifier, Project JAR Files, Library JAR Files, Statistics
• project: Complete reified project information (classes, fields, methods, bodies, etc.)
• rawClassFiles: Raw class file information (e.g., for extracting information from the constant pool)
• Result: List of detected features in the codebase (id, frequency of occurrence, (opt.) locations)
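To make the shape of a feature query concrete, here is a self-contained sketch that mimics the idea with simplified stand-in types. It does not use the OPAL or Hermes API: `SimpleProject`, `ClassInfo`, `MethodInfo`, `SimpleFeature`, and `SimpleFeatureQuery` are hypothetical, and the native-method query is only an illustrative example of a query that reports an id, a frequency, and optional locations.

```scala
object FeatureQuerySketch {
  // Simplified stand-ins for reified project information (NOT the OPAL types).
  final case class MethodInfo(name: String, isNative: Boolean)
  final case class ClassInfo(name: String, methods: Seq[MethodInfo])
  final case class SimpleProject(classes: Seq[ClassInfo])

  // A detected feature: id, frequency of occurrence, and optional locations.
  final case class SimpleFeature(id: String, count: Int, locations: Seq[String] = Nil)

  // Hypothetical, reduced counterpart of the FeatureQuery trait shown on the slide.
  trait SimpleFeatureQuery {
    def apply(project: SimpleProject): Seq[SimpleFeature]
  }

  // Example query: counts native methods and records where they are declared.
  object NativeMethods extends SimpleFeatureQuery {
    def apply(project: SimpleProject): Seq[SimpleFeature] = {
      val locations = for {
        c <- project.classes
        m <- c.methods if m.isNative
      } yield s"${c.name}.${m.name}"
      Seq(SimpleFeature("Native Methods", locations.size, locations))
    }
  }
}
```

A query of this shape is a pure function from a project to a list of features, so several queries can be run independently over every corpus candidate and their results combined afterwards.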
Already Implemented Queries
• Existence of Bytecode Instructions
• Class File Versions
• Class Types
• Trivial Reflection
• Fan-In/Fan-Out
• Field Access
• Method w/o Returns
• Method Types
• Various Metrics
• Recursive Data Structures
• Size of Inheritance Tree
• API Usage
Feature Queries for API Usage
• Bytecode Instrumentation
• Class Loader
• GUI
• Crypto
• JDBC
• Reflection
• System
• Thread
• Unsafe
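One way to picture an API-usage query is as a mapping from the declaring class of each called method to one of the categories listed above. The sketch below is an assumption-laden illustration, not how Hermes defines these queries: the category names come from the slide, but the JDK packages and classes assigned to each category, as well as the names `ApiUsageSketch`, `categoryOf`, and `countUsage`, are chosen here for illustration.

```scala
object ApiUsageSketch {
  // Assumed mapping from category to JDK packages (ending in ".") or single classes.
  val categories: Seq[(String, Seq[String])] = Seq(
    "Bytecode Instrumentation" -> Seq("java.lang.instrument."),
    "Class Loader" -> Seq("java.lang.ClassLoader"),
    "GUI" -> Seq("java.awt.", "javax.swing."),
    "Crypto" -> Seq("javax.crypto."),
    "JDBC" -> Seq("java.sql.", "javax.sql."),
    "Reflection" -> Seq("java.lang.reflect."),
    "System" -> Seq("java.lang.System"),
    "Thread" -> Seq("java.lang.Thread"),
    "Unsafe" -> Seq("sun.misc.Unsafe")
  )

  // Package patterns match by prefix, class patterns match exactly.
  private def matchesPattern(className: String, pattern: String): Boolean =
    if (pattern.endsWith(".")) className.startsWith(pattern) else className == pattern

  /** First category whose patterns match the fully qualified class name, if any. */
  def categoryOf(className: String): Option[String] =
    categories.collectFirst {
      case (category, patterns) if patterns.exists(p => matchesPattern(className, p)) => category
    }

  /** Counts, per category, how many of the called classes fall into it. */
  def countUsage(calledClasses: Seq[String]): Map[String, Int] =
    calledClasses
      .flatMap(c => categoryOf(c).toList)
      .groupBy(identity)
      .map { case (category, hits) => category -> hits.size }
}
```

The input is assumed to be one entry per call site, holding the fully qualified name of the class that declares the called method; classes outside the listed categories are simply ignored.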
Constructing a Minimal Corpus
• Dead-Path Analysis [FSE15]
• Original evaluation conducted on the complete Qualitas Corpus
• Minimal corpus consists of only 5 of the 100 projects in the Qualitas Corpus
• Evaluation time cut from 16.77 minutes to 2.82 minutes (~6x faster), while coverage is only 1.06% below that of the original corpus
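The slides do not say how the minimal corpus is computed. As one plausible illustration, selecting few projects that together exhibit all observed features can be treated as a set-cover problem and approximated greedily. The sketch below assumes exactly that; `MinimalCorpusSketch` and `greedyCover` are hypothetical names, and this should not be read as the algorithm Hermes uses.

```scala
object MinimalCorpusSketch {
  /**
   * Greedy set-cover approximation (an assumption, not the documented Hermes method):
   * repeatedly picks the project that covers the most still-uncovered features.
   */
  def greedyCover(featuresByProject: Map[String, Set[String]]): List[String] = {
    @annotation.tailrec
    def loop(
        uncovered: Set[String],
        remaining: Map[String, Set[String]],
        chosen: List[String]
    ): List[String] =
      if (uncovered.isEmpty || remaining.isEmpty) chosen.reverse
      else {
        // Sorting by name first makes tie-breaking deterministic.
        val (best, features) = remaining.toSeq
          .sortBy(_._1)
          .maxBy { case (_, fs) => fs.intersect(uncovered).size }
        if (features.intersect(uncovered).isEmpty) chosen.reverse
        else loop(uncovered.diff(features), remaining - best, best :: chosen)
      }

    val allFeatures: Set[String] = featuresByProject.values.flatten.toSet
    loop(allFeatures, featuresByProject, Nil)
  }
}
```

Greedy selection does not guarantee the smallest possible corpus, but it is simple and usually close; an exact solution would need an optimization solver, and a real selection might also accept slightly lower coverage in exchange for fewer projects, as the 1.06% figure above suggests.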
Collection Permanence
• ABM: We store and retain collection definitions
• Collected Projects: Download corpus and provide on your infrastructure
• Publish complete corpus, use DOI for papers
• We would love to see more services like this
Criterion addressed: Permanence
Bringing it all together
• ABM: collect (GitHub, BitBucket, …), build
• Hermes: inspect
• Publish complete corpus, use DOI for papers
SOFTWARE
TECHNIK
Automating the Generation of
Benchmark Suites
Creation, Assessment, and Management of Effective Test Corpora
Ben Hermann
@benhermann
Joint work with Michael Reif, Michael Eichberg, and Mira Mezini
Thank you!

HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 

Automating the Generation of Benchmark Suites

  • 1. SOFTWARE TECHNIK. Automating the Generation of Benchmark Suites: Creation, Assessment, and Management of Effective Test Corpora. Ben Hermann (@benhermann). Joint work with Lisa Nguyen Quang Do, Michael Eichberg, Karim Ali, and Eric Bodden. National Java Resource Workshop @ SPLASH, Vancouver, October 23rd, 2017
  • 2. @benhermann | ABM @ NJR 2017. Evaluation of Code Analyses
  • 4. Evaluation of Code Analyses
       Compare the results of an analysis against:
         • a ground truth, to show soundness
         • a previous analysis, to show improvement (e.g., in precision)
       Diagram: the new analysis and the previous analyses both analyze the evaluation corpus; the ground truth is based on the evaluation corpus.
  • 13. Construction of a Corpus
         • Size
         • Content
         • Representativeness
         • Permanence
       Criteria from Tempero et al. 2010
         • Sources
         • Purpose
       How to determine this? How to achieve this?
  • 18. Sourcing Projects for the Corpus (addresses Size and Content)
       ABM pipeline:
         • collect: projects from GitHub, BitBucket, …
         • apply: criteria such as size, license, or programming language
         • build: produces compiled projects
       We currently support Maven and sbt, but are expanding (e.g., Gradle).
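The collect, apply, and build steps on this slide can be pictured as a filter over candidate projects. The following is only a minimal sketch: the `Candidate` fields, the criterion helpers, and the sample data are all hypothetical and do not reflect ABM's actual data model or API.

```scala
// Hypothetical model of a corpus candidate; field names are assumptions, not ABM's schema.
final case class Candidate(
  name: String,
  source: String,     // e.g. "GitHub" or "BitBucket"
  language: String,
  license: String,
  sizeInKLoc: Int,
  buildTool: String   // e.g. "maven", "sbt", "gradle"
)

object SourcingSketch {
  // A criterion is just a predicate over a candidate.
  type Criterion = Candidate => Boolean

  // Build tools the slide names as currently supported.
  val supportedBuildTools: Set[String] = Set("maven", "sbt")

  def languageIs(lang: String): Criterion = _.language == lang
  def licenseIn(allowed: Set[String]): Criterion = c => allowed.contains(c.license)
  def sizeAtLeast(kloc: Int): Criterion = _.sizeInKLoc >= kloc

  // Keep candidates that satisfy every criterion and use a supported build tool.
  def select(candidates: Seq[Candidate], criteria: Seq[Criterion]): Seq[Candidate] =
    candidates.filter { c =>
      criteria.forall(_(c)) && supportedBuildTools.contains(c.buildTool)
    }

  // Made-up sample data for illustration only.
  val candidates: Seq[Candidate] = Seq(
    Candidate("alpha", "GitHub", "Java", "Apache-2.0", 120, "maven"),
    Candidate("beta", "BitBucket", "Scala", "MIT", 40, "sbt"),
    Candidate("gamma", "GitHub", "Java", "GPL-3.0", 300, "gradle"),
    Candidate("delta", "GitHub", "Java", "MIT", 5, "maven")
  )

  val selected: Seq[Candidate] = select(
    candidates,
    Seq(languageIs("Java"), licenseIn(Set("Apache-2.0", "MIT")), sizeAtLeast(10))
  )
}
```

Modelling criteria as plain predicates keeps them composable: adding a new criterion does not change `select`.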
  • 19. How can we achieve representativeness for a corpus?
  • 25. Representativeness in Custom Collections
       Description of the Darmstadt Library Corpus (DLC) from: Michael Reif, Michael Eichberg, Ben Hermann, Johannes Lerch, and Mira Mezini. 2016. Call graph construction for Java libraries. In Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2016).
       "We used the three algorithms to construct respective call graphs for a large set of libraries: the 100 most used distinct Java related libraries from Maven Central Repository. The set is representative for a wide range of libraries. It contains very small (e.g., JUnit) to very large (e.g., Scala Library) libraries; libraries developed primarily in an industrial context (e.g., Guava) or in an open-source setting (e.g., Apache Commons); libraries from very different domains: testing (e.g., Hamcrest, Mockito), databases (e.g., HSQLDB), bytecode engineering (e.g., cglib), runtime environments (e.g., Scala Runtime), containers (e.g., Netty), and also general utility libraries (e.g., osgi.core). Additionally, it contains two libraries that have unusual properties: jsr305 and easymockclassextension both do not contain a single instance method call. The jsr305 project is just a collection of annotations and easymockclassextension only contains interface definitions and a few classes with static methods. Lastly, the set also contains libraries that are written in other languages, such as Scala (e.g., ScalaTest), whose compilers only use a subset of the JVM’s concepts. The Scala compiler, e.g., does not use package and protected visibility. This significantly limits our possibilities to identify the library-private implementation (recall that LibCHACPA identifies a library’s private implementation based on the evaluation of the code elements’ visibilities). For each library, we also downloaded all of its dependencies to build complete class hierarchies for them."
  • 27. Representativeness in ABM
       ABM builds the compiled projects; Hermes inspects them and selects from them.
  • 34. How Hermes Works
       Corpus candidates → Hermes → Optimal corpus
         • Hermes runs feature queries on the candidates, followed by manual or automatic selection.
         • Hermes builds on OPAL.
         • OPAL was introduced at SOAP 2014; Hermes was introduced at SOAP 2017.
  • 39. Feature Queries
       trait FeatureQuery {
         // …
         def apply[S](
           projectConfiguration: ProjectConfiguration,
           project: Project[S],
           rawClassFiles: Traversable[(da.ClassFile, S)]
         ): TraversableOnce[Feature[S]]
         // …
       }
         • projectConfiguration: identifier, project JAR files, library JAR files, statistics
         • project: complete reified project information (classes, fields, methods, bodies, etc.)
         • rawClassFiles: raw class file information (e.g., for extracting information from the constant pool)
         • result: list of detected features in the codebase (id, frequency of occurrence, (opt.) locations)
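To make the shape of a feature query more concrete, here is a simplified, self-contained sketch. It does not use OPAL: `Method`, `ClassFile`, `Project`, `Feature`, `SimpleFeatureQuery`, and the `NativeMethods` query are all hypothetical stand-ins that only mirror the idea on the slide (take project information in, return features with an id, a frequency, and optional locations).

```scala
// Simplified stand-ins for the types on the slide; these are NOT OPAL's real classes.
final case class Method(name: String, isNative: Boolean)
final case class ClassFile(fqn: String, methods: Seq[Method])
final case class Project(classFiles: Seq[ClassFile])

// A detected feature: id, frequency of occurrence, and optional locations.
final case class Feature(id: String, count: Int, locations: Seq[String] = Seq.empty)

// Reduced version of the query interface: one project in, features out.
trait SimpleFeatureQuery {
  def featureIDs: Seq[String]
  def apply(project: Project): Seq[Feature]
}

// Hypothetical query: counts native methods and records where they occur.
object NativeMethods extends SimpleFeatureQuery {
  val featureIDs: Seq[String] = Seq("Native Methods")

  def apply(project: Project): Seq[Feature] = {
    val locations = for {
      cf <- project.classFiles
      m  <- cf.methods if m.isNative
    } yield s"${cf.fqn}.${m.name}"
    Seq(Feature(featureIDs.head, locations.size, locations))
  }
}

object FeatureQueryDemo {
  // Made-up project for illustration only.
  val project: Project = Project(Seq(
    ClassFile("demo.A", Seq(Method("read", isNative = true), Method("run", isNative = false))),
    ClassFile("demo.B", Seq(Method("write", isNative = true)))
  ))

  val features: Seq[Feature] = NativeMethods(project)
}
```

In this sketch a query always reports its feature, even with a count of zero, so that projects can be compared on the same set of feature ids.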
  • 41. Already Implemented Queries
         • Existence of Bytecode Instructions
         • Class File Versions
         • Class Types
         • Trivial Reflection
         • Fan-In/Fan-Out
         • Field Access
         • Method w/o Returns
         • Method Types
         • Various Metrics
         • Recursive Data Structures
         • Size of Inheritance Tree
         • API Usage
  • 43. Feature Queries for API Usage
         • Bytecode Instrumentation
         • Class Loader
         • GUI
         • Crypto
         • JDBC
         • Reflection
         • System
         • Thread
         • Unsafe
  • 44. Constructing a Minimal Corpus
       Example: Dead-Path Analysis [FSE15]
         • The original evaluation was conducted on the complete Qualitas Corpus.
         • The minimal corpus consists of only 5 of the 100 projects in the Qualitas Corpus.
         • Evaluation time was cut from 16.77 minutes to 2.82 minutes (~6x faster), while coverage is only 1.06% below that of the original corpus.
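One way to picture the selection of a small corpus that still covers the features of the full one is a greedy set cover. This is an illustrative assumption only: the slides do not say which selection procedure Hermes uses, greedy set cover does not guarantee a truly minimal result, and the project and feature names below are made up.

```scala
import scala.annotation.tailrec

object MinimalCorpusSketch {
  // Greedy set cover: repeatedly pick the project that covers the most
  // still-uncovered features. An approximation, not necessarily Hermes' procedure.
  def greedySelect(featuresByProject: Map[String, Set[String]]): Seq[String] = {
    @tailrec
    def loop(uncovered: Set[String], chosen: Vector[String]): Vector[String] =
      if (uncovered.isEmpty) chosen
      else {
        // Sort by name first so that the iteration order is deterministic.
        val (best, gain) = featuresByProject.toSeq
          .sortBy(_._1)
          .map { case (name, fs) => (name, (fs & uncovered).size) }
          .maxBy(_._2)
        if (gain == 0) chosen
        else loop(uncovered -- featuresByProject(best), chosen :+ best)
      }

    loop(featuresByProject.values.flatten.toSet, Vector.empty)
  }

  // Hypothetical feature sets per candidate project.
  val featuresByProject: Map[String, Set[String]] = Map(
    "p1" -> Set("reflection", "jdbc", "thread"),
    "p2" -> Set("jdbc"),
    "p3" -> Set("gui", "thread"),
    "p4" -> Set("crypto", "gui")
  )

  // p1 covers three features first; p4 then covers the remaining two.
  val minimal: Seq[String] = greedySelect(featuresByProject)
}
```

Here two of the four hypothetical projects suffice to cover all five features, which is the same effect the slide reports for the Qualitas Corpus at a larger scale.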
  • 50. Collection Permanence
       ABM: we store and retain collection definitions.
       Options for the collected projects:
         • Download the corpus and provide it on your own infrastructure.
         • Publish the complete corpus and use the DOI for papers.
       We would love to see more services like this.
  • 51. Bringing it all together
         • ABM: collect projects from GitHub, BitBucket, …, then build them
         • Hermes: inspect the built projects
         • Publish the complete corpus and use the DOI for papers
  • 52. SOFTWARE TECHNIK. Automating the Generation of Benchmark Suites: Creation, Assessment, and Management of Effective Test Corpora. Ben Hermann (@benhermann). Joint work with Michael Reif, Michael Eichberg, and Mira Mezini. Thank you!