SlideShare a Scribd company logo
1 of 33
Download to read offline
Model-based Mining of 
Source Code Repositories 
Markus Scheidgen, Joachim Fischer 
{scheidge,fischer}@informatik.hu-berlin.de 
1 
Tuesday, 30. September 2014
Agenda 
▶ Mining Software Repositories (MSR) and Software Evolution 
Research (SER) 
▶ srcrepo – a model-based MSR system 
■ components and mining process 
■ distributed storage of very large models with emf-fragments 
■ a meta-model for source code repositories 
■ gathering software metrics with an OCL-like internal Scala DSL 
▶ work in progress - discussion of remaining problems and 
limitations 
2 
Tuesday, 30. September 2014
Relevant Research Fields 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
3 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
Relevant Research Fields 
3 
Mining Software Repositories (MSR) 
The term mining software repositories (MSR) 
has been coined to describe a broad class of 
techniques for the examination of software 
repositories. 
The premise of MSR is that empirical and 
systematic investigations of repositories will 
shed new light on the process of software 
evolution. [1] 
Software Metrics 
A software metric is a 
mathematical definition mapping 
the entities of a software system 
to numeric metrics values. 
[...] to express features of 
software with numbers in order 
to facilitate software quality 
assessment. [2] 
Reverse Engineering 
Reverse engineering is the process 
of analyzing a subject system to 
(1) identify the system’s 
components and their 
interrelationships and (2) create 
representations of the system in 
another form or at a higher level 
of abstraction [3] 
Software Evolution Research (SER) 
■(dis-)proving Lehmann’s Laws of software evolution 
■empirical investigations of software repositories through statistical analysis of 
software metrics and software change metrics over the evolutionary cause of 
many software systems. 
Model-based Mining Software Repositories (with srcrepo) 
Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and 
retaining meaningful information depth. 
1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software 
evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 
2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 
2008 
3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 
Tuesday, 30. September 2014
srcrepo – Components and Process 
4 
Tuesday, 30. September 2014
srcrepo – Components and Process 
4 
large scale software 
repositories 
(e.g. github, sourceforge) 
source code 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
software projects 
Tuesday, 30. September 2014
srcrepo – Components and Process 
revision tree 
4 
large scale software 
repositories 
(e.g. github, sourceforge) 
srcrepo storage 
(EMF-models via EMF-Fragments, 
e.g. on mongodb) 
source code 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
software projects 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
Tuesday, 30. September 2014
srcrepo – Components and Process 
revision tree 
4 
large scale software 
repositories 
(e.g. github, sourceforge) 
srcrepo storage 
(EMF-models via EMF-Fragments, 
e.g. on mongodb) 
srcrepo runtime 
(headless eclipse RCP) 
source code 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
software projects 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
revision tree 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis OCL 
M! M" M# 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
OCL EMF-Compare 
D!"# 
M!"# 
D#"$ 
M#"$ 
OCL 
M! M" M# 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
M! 
M!"# 
M# M# 
M#"$ 
store 
timelines of metrics 
OCL EMF-Compare 
D!"# 
M!"# 
D#"$ 
M#"$ 
OCL 
M! M" M# 
Tuesday, 30. September 2014
5 
repository 
(e.g. controlled 
by Git, SVN, CVS) 
source code 
(e.g. java, C++, 
eclipse*) 
issue tracker, mailing lists, wiki 
statistics software 
(e.g. R, Matlab) 
C! 
1 
2 3 
A" B# A! 
AST-models of new and 
changed CUs 
import 
1 
2 3 
S! S" S" 
fully resolved snapshot 
models 
A! A!B" 
B"C# A# 
analysis 
M! 
M!"# 
M# M# 
M#"$ 
store 
timelines of metrics 
export 
OCL EMF-Compare 
D!"# 
M!"# 
D#"$ 
M#"$ 
OCL 
M! M" M# 
Tuesday, 30. September 2014
Distributed Model Store with emf-fragments 
6 
normal references 
* 
Tuesday, 30. September 2014
Distributed Model Store with emf-fragments 
6 
normal references 
* 
fragmenting references 
«fragments» 
* 
Tuesday, 30. September 2014
A Meta-Model for Source Code Repositories 
7 
code revision tree 
» 
«fragments» 
Tuesday, 30. September 2014
8 
MoDisco source code revision tree 
«fragments» 
Tuesday, 30. September 2014
A OCL-like internal Scala DSL for Computing Metrics 
▶ OCL-like internal Scala DSL analog to our internal Scala 
model transformation language [1] 
▶ OCL collection operations mapped to Scala’s higher-order 
fuctions [2]: 
1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model 
Transformations - 5th International Conference, ICMT; 2012 
2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare 
9 
Tuesday, 30. September 2014
A OCL-like internal Scala DSL for Computing Metrics 
ownedPackages 
▶ OCL-like internal Scala Model DSL analog Package 
to our internal Scala 
model transformation language [1] 
▶ OCL collection operations mapped to Scala’s higher-order 
fuctions [2]: 
9 
pure OCL 
context Model: 
self.ownedElements->collect(p|p.ownedElements)->size 
Abstract 
Type 
Declaration 
ownedElements 
ownedElements 
* 
* 
* 
1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model 
Transformations - 5th International Conference, ICMT; 2012 
2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare 
Tuesday, 30. September 2014
A OCL-like internal Scala DSL for Computing Metrics 
Abstract 
Type 
Declaration 
ownedPackages 
* 
▶ OCL-like internal Scala DSL ownedElements 
Model analog Package 
to our internal Scala 
model transformation language [1] 
▶ OCL collection operations mapped to Scala’s higher-order 
fuctions [2]: 
9 
pure OCL 
context Model: 
* 
ownedElements 
* 
self.ownedElements->collect(p|p.ownedElements)->size 
OCL-like expression in Scala 
def numberOfFirstPackageLevelTypes(self: Model): Int = 
self.getOwnedElements().collect(p=>p.getOwnedElements()).size() 
1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model 
Transformations - 5th International Conference, ICMT; 2012 
2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare 
Tuesday, 30. September 2014
A OCL-like internal Scala DSL for Computing Metrics 
▶ Extending OCL’s collection operations: 
■ convenience operations 
■ closure 
■ aggregation 
■ execution 
trait OclCollection[E] extends java.lang.Iterable[E] 
{ 
def size(): Int 
def first(): E 
def exists(predicate: (E) => Boolean): Boolean 
def forAll(predicate: (E) => Boolean): Boolean 
def select(predicate: (E) => Boolean): OclCollection[E] 
def reject(predicate: (E) => Boolean): OclCollection[E] 
def collect[R](expr: (E) => R): OclCollection[R] 
def selectOfType[T]:OclCollection[T] 
def collectNotNull[R](expr: (E) => R): OclCollection[R] 
def collectAll[R](expr: (E) => OclCollection[R]): OclCollection[R] 
def closure(expr: (E) => OclCollection[10 
E]): OclCollection[E] 
123456789 
10 
11 
12 
13 
14 
15 
16 
Tuesday, 30. September 2014
trait OclCollection[E] extends java.lang.Iterable[E] 
{ 
def size(): Int 
def first(): E 
def exists(predicate: (E) => Boolean): Boolean 
def forAll(predicate: (E) => Boolean): Boolean 
def select(predicate: (E) => Boolean): OclCollection[E] 
def reject(predicate: (E) => Boolean): OclCollection[E] 
def collect[R](expr: (E) => R): OclCollection[R] 
def selectOfType[T]:OclCollection[T] 
def collectNotNull[R](expr: (E) => R): OclCollection[R] 
def collectAll[R](expr: (E) => OclCollection[R]): OclCollection[R] 
def closure(expr: (E) => OclCollection[E]): OclCollection[E] 
def aggregate[R,I](expr: (E) => I, start: () => R, aggr: (R, I) => R): R 
def sum(expr: (E) => Double): Double 
def product(expr: (E) => Double): Double 
def max(expr: (E) => Double): Double 
def min(expr: (E) => Double): Double 
def stats(expr: (E) => Double): Stats 
def run(runnable: (E) => Unit): Unit 
} 
11 
123456789 
10 
11 
12 
13 
14 
15 
16 
17 
18 
19 
20 
21 
22 
23 
24 
25 
Tuesday, 30. September 2014
Complex Example: Average Weighted Methods per Class (WMC) 
▶ WMC is the first CK-metric [1]. There different commonly 
used weights; here we use cyclomatic complexity. 
def classes(model:Model):OclCollection[ClassDeclaration] = 
model.getOwnedElements() 
.collectClosure(pkg=>pkg.getOwnedPackages()) 
.collectAll(pkg=>pkg.getOwnedElements()) 
.collectClosure(typeDcl=> 
typeDcl.getBodyDeclarations() 
.selectOfType[ClassDeclaration]) 
12 
def WMC(model:Model):Double = 
classes(model).stats(clazz=> 
clazz.getBodyDeclarations() 
.selectOfType[MethodDeclaration]() 
.sum(method=>cyclmaticComplexity(method))).average 
def cyclomaticComplexity(method:MethodDeclaration):Int = 
... 
123456789 
10 
11 
12 
13 
14 
15 
16 
1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 
Tuesday, 30. September 2014
Complex Example: Average Weighted Methods per Class (WMC) 
▶ WMC is the first CK-metric [1]. There different commonly 
used weights; here we use cyclomatic complexity. 
def classes(model:Model):OclCollection[ClassDeclaration] = 
model.getOwnedElements() 
.collectClosure(pkg=>pkg.getOwnedPackages()) 
.collectAll(pkg=>pkg.getOwnedElements()) 
.collectClosure(typeDcl=> 
typeDcl.getBodyDeclarations() 
.selectOfType[ClassDeclaration]) 
Model 
Abstract 
Body 
Declaration 
12 
def WMC(model:Model):Double = 
classes(model).stats(clazz=> 
clazz.getBodyDeclarations() 
Package 
ownedPackages 
.selectOfType[MethodDeclaration]() 
.sum(method=>cyclmaticComplexity(method))).average 
Abstract 
Type 
Declaration 
def cyclomaticComplexity(method:MethodDeclaration):Int = 
... 
123456789 
10 
11 
12 
13 
14 
15 
16 
Class 
Declaration 
1 
1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 
TypeAccess 
ownedElements 
ownedElements 
bodyDeclarations 
superInterfaces 
superClass 
returnType 
type 
usagesInTypeAccess 
* 
* * 
* 
* * 
1 
1 
Tuesday, 30. September 2014
Model 
Package 
ownedElements 
Complex Example: Average Weighted Methods * per Class * 
ownedPackages 
(WMC) 
ownedElements 
* 
▶ WMC is the first CK-metric [1]. There different Abstract 
commonly 
Type 
used weights; here we use cyclomatic complexity. 
Declaration 
bodyDeclarations 
type 
* 
1 
def classes(model:Model):OclCollection[Abstract 
ClassDeclaration] = 
model.getOwnedElements() 
Body 
usagesInTypeAccess 
Declaration 
.collectClosure(pkg=>pkg.getOwnedPackages()) 
.collectAll(pkg=>pkg.getOwnedElements()) 
.collectClosure(typeDcl=> 
typeDcl.getBodyDeclarations() 
.selectOfType[ClassDeclaration]) 
12 
def WMC(model:Model):Double = 
classes(model).stats(clazz=> 
clazz.getBodyDeclarations() 
superInterfaces 
* * 
TypeAccess 
1 
returnType 
1 
Method 
Declaration 
superClass 
.selectOfType[MethodDeclaration]() 
.sum(method=>cyclmaticComplexity(method))).average 
def cyclomaticComplexity(method:MethodDeclaration):Int = 
... 
123456789 
10 
11 
12 
13 
14 
15 
16 
Class 
Declaration 
Abstract 
Method 
Invocation 
method 
1 
1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 
Tuesday, 30. September 2014
Implementation of the OCL-Collection Operations 
▶ Just in time iterator-based implementation rather than 
straight forward aggregation of result collections. 
13 
:RepositoryModel 
:Rev :Rev :Rev 
:Par... :Par... :Par... :Par... :Par... :Par... 
:Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! 
1 
2 
3 
4 
Tuesday, 30. September 2014
Future Work, Remaining Problems, and Limitations 
14 
Scalability 
■very large compilation units 
■incremental snapshot creation 
■batching OCL execution 
■experiments with large scale 
repository (e.g. git.eclipse.org) 
Heterogeneity 
■MoDisco for different 
programming languages 
■common metrics meta-model 
(e.g. OMG, KDM) 
■VCS abstraction and support for 
different VCS 
Accessibility 
■relating results to 
software repository 
entities 
■persisting and 
exporting results 
Information Depth 
■diff-models from 
comparison of 
compilation 
units 
▶ Very large compilation units (CU): e.g. a 3 MB, 600 kLOC CU in org.eclipse.emf 
■ tends to have lots of dependencies ➞ changes often ➞ makes problem even bigger 
■ CUs are smallest common denominator between text-based VCS view and syntax-based 
AST view 
■ smaller units require model-comparison or text-to-AST mappings 
▶ Support for different programming languages: either abstraction, parallel meta-models, or 
mixed approach 
■ MoDisco is extendable, but only Java support exists; other languages need to be 
implemented ➞ parallel meta-models 
■ A reasonable abstraction for multiple (or all) programming language probably does 
not exist. 
■ A shared abstract meta-model that all language meta-models extends could be an 
sensible compromise. 
Tuesday, 30. September 2014
15 
Tuesday, 30. September 2014

More Related Content

Recently uploaded

Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceBrainSell Technologies
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 

Recently uploaded (20)

Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 

Featured

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Model-based Mining of Software Repositories

  • 1. Model-based Mining of Source Code Repositories Markus Scheidgen, Joachim Fischer {scheidge,fischer}@informatik.hu-berlin.de 1 Tuesday, 30. September 2014
  • 2. Agenda ▶ Mining Software Repositories (MSR) and Software Evolution Research (SER) ▶ srcrepo – a model-based MSR system ■ components and mining process ■ distributed storage of very large models with emf-fragments ■ a meta-model for source code repositories ■ gathering software metrics with an OCL-like internal Scala DSL ▶ work in progress - discussion of remaining problems and limitations 2 Tuesday, 30. September 2014
  • 3. Relevant Research Fields 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 3 Tuesday, 30. September 2014
  • 4. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 5. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 6. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 7. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 8. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 9. Relevant Research Fields 3 Mining Software Repositories (MSR) The term mining software repositories (MSR) has been coined to describe a broad class of techniques for the examination of software repositories. The premise of MSR is that empirical and systematic investigations of repositories will shed new light on the process of software evolution. [1] Software Metrics A software metric is a mathematical definition mapping the entities of a software system to numeric metrics values. [...] to express features of software with numbers in order to facilitate software quality assessment. [2] Reverse Engineering Reverse engineering is the process of analyzing a subject system to (1) identify the system’s components and their interrelationships and (2) create representations of the system in another form or at a higher level of abstraction [3] Software Evolution Research (SER) ■(dis-)proving Lehmann’s Laws of software evolution ■empirical investigations of software repositories through statistical analysis of software metrics and software change metrics over the evolutionary cause of many software systems. Model-based Mining Software Repositories (with srcrepo) Overcoming heterogeneity and accessibility by raising the level of abstraction, while ensuring scalability and retaining meaningful information depth. 1. H. Kagdi, M.L. Collard, J.I. Maletic: A survey and taxonomy of approaches for mining software repositories in the context of software evolution; Journal of Software Maintenance and Evolution: Research and Practice; Vol.19/Nr.2/2007 2.R. Lincke, J. Lundberg, W. Löwe: Comparing Software Metrics Tools; 8th International Symposium on Software Testing and Analysis; 2008 3.E.J. Chikofsky, J.H. Cross: Reverse engineering and design recovery: A taxonomy; IEEE Software; Vol.7/Nr.1/1990 Tuesday, 30. September 2014
  • 10. srcrepo – Components and Process 4 Tuesday, 30. September 2014
  • 11. srcrepo – Components and Process 4 large scale software repositories (e.g. github, sourceforge) source code repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki software projects Tuesday, 30. September 2014
  • 12. srcrepo – Components and Process revision tree 4 large scale software repositories (e.g. github, sourceforge) srcrepo storage (EMF-models via EMF-Fragments, e.g. on mongodb) source code repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki software projects C! 1 2 3 A" B# A! AST-models of new and changed CUs import Tuesday, 30. September 2014
  • 13. srcrepo – Components and Process revision tree 4 large scale software repositories (e.g. github, sourceforge) srcrepo storage (EMF-models via EMF-Fragments, e.g. on mongodb) srcrepo runtime (headless eclipse RCP) source code repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki software projects C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 revision tree 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis Tuesday, 30. September 2014
  • 14. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis Tuesday, 30. September 2014
  • 15. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis OCL M! M" M# Tuesday, 30. September 2014
  • 16. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis OCL EMF-Compare D!"# M!"# D#"$ M#"$ OCL M! M" M# Tuesday, 30. September 2014
  • 17. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis M! M!"# M# M# M#"$ store timelines of metrics OCL EMF-Compare D!"# M!"# D#"$ M#"$ OCL M! M" M# Tuesday, 30. September 2014
  • 18. 5 repository (e.g. controlled by Git, SVN, CVS) source code (e.g. java, C++, eclipse*) issue tracker, mailing lists, wiki statistics software (e.g. R, Matlab) C! 1 2 3 A" B# A! AST-models of new and changed CUs import 1 2 3 S! S" S" fully resolved snapshot models A! A!B" B"C# A# analysis M! M!"# M# M# M#"$ store timelines of metrics export OCL EMF-Compare D!"# M!"# D#"$ M#"$ OCL M! M" M# Tuesday, 30. September 2014
  • 19. Distributed Model Store with emf-fragments 6 normal references * Tuesday, 30. September 2014
  • 20. Distributed Model Store with emf-fragments 6 normal references * fragmenting references «fragments» * Tuesday, 30. September 2014
  • 21. A Meta-Model for Source Code Repositories 7 code revision tree » «fragments» Tuesday, 30. September 2014
  • 22. 8 MoDisco source code revision tree «fragments» Tuesday, 30. September 2014
  • 23. A OCL-like internal Scala DSL for Computing Metrics ▶ OCL-like internal Scala DSL analog to our internal Scala model transformation language [1] ▶ OCL collection operations mapped to Scala’s higher-order fuctions [2]: 1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model Transformations - 5th International Conference, ICMT; 2012 2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare 9 Tuesday, 30. September 2014
  • 24. A OCL-like internal Scala DSL for Computing Metrics ownedPackages ▶ OCL-like internal Scala Model DSL analog Package to our internal Scala model transformation language [1] ▶ OCL collection operations mapped to Scala’s higher-order fuctions [2]: 9 pure OCL context Model: self.ownedElements->collect(p|p.ownedElements)->size Abstract Type Declaration ownedElements ownedElements * * * 1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model Transformations - 5th International Conference, ICMT; 2012 2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare Tuesday, 30. September 2014
  • 25. A OCL-like internal Scala DSL for Computing Metrics Abstract Type Declaration ownedPackages * ▶ OCL-like internal Scala DSL ownedElements Model analog Package to our internal Scala model transformation language [1] ▶ OCL collection operations mapped to Scala’s higher-order fuctions [2]: 9 pure OCL context Model: * ownedElements * self.ownedElements->collect(p|p.ownedElements)->size OCL-like expression in Scala def numberOfFirstPackageLevelTypes(self: Model): Int = self.getOwnedElements().collect(p=>p.getOwnedElements()).size() 1. L. George, A. Wider, M. Scheidgen: Type-Safe Model Transformation Languages as Internal DSLs in Scala; Theory and Practice of Model Transformations - 5th International Conference, ICMT; 2012 2.Filip Krikava: Enrichting EMF Models with Scala; Slideshare Tuesday, 30. September 2014
  • 26. A OCL-like internal Scala DSL for Computing Metrics ▶ Extending OCL’s collection operations: ■ convenience operations ■ closure ■ aggregation ■ execution trait OclCollection[E] extends java.lang.Iterable[E] { def size(): Int def first(): E def exists(predicate: (E) => Boolean): Boolean def forAll(predicate: (E) => Boolean): Boolean def select(predicate: (E) => Boolean): OclCollection[E] def reject(predicate: (E) => Boolean): OclCollection[E] def collect[R](expr: (E) => R): OclCollection[R] def selectOfType[T]:OclCollection[T] def collectNotNull[R](expr: (E) => R): OclCollection[R] def collectAll[R](expr: (E) => OclCollection[R]): OclCollection[R] def closure(expr: (E) => OclCollection[10 E]): OclCollection[E] 123456789 10 11 12 13 14 15 16 Tuesday, 30. September 2014
  • 27. trait OclCollection[E] extends java.lang.Iterable[E] { def size(): Int def first(): E def exists(predicate: (E) => Boolean): Boolean def forAll(predicate: (E) => Boolean): Boolean def select(predicate: (E) => Boolean): OclCollection[E] def reject(predicate: (E) => Boolean): OclCollection[E] def collect[R](expr: (E) => R): OclCollection[R] def selectOfType[T]:OclCollection[T] def collectNotNull[R](expr: (E) => R): OclCollection[R] def collectAll[R](expr: (E) => OclCollection[R]): OclCollection[R] def closure(expr: (E) => OclCollection[E]): OclCollection[E] def aggregate[R,I](expr: (E) => I, start: () => R, aggr: (R, I) => R): R def sum(expr: (E) => Double): Double def product(expr: (E) => Double): Double def max(expr: (E) => Double): Double def min(expr: (E) => Double): Double def stats(expr: (E) => Double): Stats def run(runnable: (E) => Unit): Unit } 11 123456789 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Tuesday, 30. September 2014
  • 28. Complex Example: Average Weighted Methods per Class (WMC) ▶ WMC is the first CK-metric [1]. There different commonly used weights; here we use cyclomatic complexity. def classes(model:Model):OclCollection[ClassDeclaration] = model.getOwnedElements() .collectClosure(pkg=>pkg.getOwnedPackages()) .collectAll(pkg=>pkg.getOwnedElements()) .collectClosure(typeDcl=> typeDcl.getBodyDeclarations() .selectOfType[ClassDeclaration]) 12 def WMC(model:Model):Double = classes(model).stats(clazz=> clazz.getBodyDeclarations() .selectOfType[MethodDeclaration]() .sum(method=>cyclmaticComplexity(method))).average def cyclomaticComplexity(method:MethodDeclaration):Int = ... 123456789 10 11 12 13 14 15 16 1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 Tuesday, 30. September 2014
  • 29. Complex Example: Average Weighted Methods per Class (WMC) ▶ WMC is the first CK-metric [1]. There different commonly used weights; here we use cyclomatic complexity. def classes(model:Model):OclCollection[ClassDeclaration] = model.getOwnedElements() .collectClosure(pkg=>pkg.getOwnedPackages()) .collectAll(pkg=>pkg.getOwnedElements()) .collectClosure(typeDcl=> typeDcl.getBodyDeclarations() .selectOfType[ClassDeclaration]) Model Abstract Body Declaration 12 def WMC(model:Model):Double = classes(model).stats(clazz=> clazz.getBodyDeclarations() Package ownedPackages .selectOfType[MethodDeclaration]() .sum(method=>cyclmaticComplexity(method))).average Abstract Type Declaration def cyclomaticComplexity(method:MethodDeclaration):Int = ... 123456789 10 11 12 13 14 15 16 Class Declaration 1 1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 TypeAccess ownedElements ownedElements bodyDeclarations superInterfaces superClass returnType type usagesInTypeAccess * * * * * * 1 1 Tuesday, 30. September 2014
  • 30. Model Package ownedElements Complex Example: Average Weighted Methods * per Class * ownedPackages (WMC) ownedElements * ▶ WMC is the first CK-metric [1]. There different Abstract commonly Type used weights; here we use cyclomatic complexity. Declaration bodyDeclarations type * 1 def classes(model:Model):OclCollection[Abstract ClassDeclaration] = model.getOwnedElements() Body usagesInTypeAccess Declaration .collectClosure(pkg=>pkg.getOwnedPackages()) .collectAll(pkg=>pkg.getOwnedElements()) .collectClosure(typeDcl=> typeDcl.getBodyDeclarations() .selectOfType[ClassDeclaration]) 12 def WMC(model:Model):Double = classes(model).stats(clazz=> clazz.getBodyDeclarations() superInterfaces * * TypeAccess 1 returnType 1 Method Declaration superClass .selectOfType[MethodDeclaration]() .sum(method=>cyclmaticComplexity(method))).average def cyclomaticComplexity(method:MethodDeclaration):Int = ... 123456789 10 11 12 13 14 15 16 Class Declaration Abstract Method Invocation method 1 1. S.R. Chidamber, C.F. Kemerer: A Metrics Suite for Object Oriented Design; IEEE Transactions on Software Eng.; Vol.20/Nr.6/1994 Tuesday, 30. September 2014
  • 31. Implementation of the OCL-Collection Operations ▶ Just in time iterator-based implementation rather than straight forward aggregation of result collections. 13 :RepositoryModel :Rev :Rev :Rev :Par... :Par... :Par... :Par... :Par... :Par... :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! :Di! 1 2 3 4 Tuesday, 30. September 2014
  • 32. Future Work, Remaining Problems, and Limitations 14 Scalability ■very large compilation units ■incremental snapshot creation ■batching OCL execution ■experiments with large scale repository (e.g. git.eclipse.org) Heterogeneity ■MoDisco for different programming languages ■common metrics meta-model (e.g. OMG, KDM) ■VCS abstraction and support for different VCS Accessibility ■relating results to software repository entities ■persisting and exporting results Information Depth ■diff-models from comparison of compilation units ▶ Very large compilation units (CU): e.g. a 3 MB, 600 kLOC CU in org.eclipse.emf ■ tends to have lots of dependencies ➞ changes often ➞ makes problem even bigger ■ CUs are smallest common denominator between text-based VCS view and syntax-based AST view ■ smaller units require model-comparison or text-to-AST mappings ▶ Support for different programming languages: either abstraction, parallel meta-models, or mixed approach ■ MoDisco is extendable, but only Java support exists; other languages need to be implemented ➞ parallel meta-models ■ A reasonable abstraction for multiple (or all) programming language probably does not exist. ■ A shared abstract meta-model that all language meta-models extends could be an sensible compromise. Tuesday, 30. September 2014
  • 33. 15 Tuesday, 30. September 2014