SlideShare a Scribd company logo
Mining Source Code Descriptions
from Developer Communications

Sebastiano
Panichella

Jairo
Aponte

Massimiliano
Di Penta

Andrian
Marcus

Gerardo
Canfora
Context: Software Project

Sequence
diagram

Documentation

Source Code

Class
diagram

Developer

Program
Comprehension

Maintenance
Tasks
Context: Software Project

Sequence
diagram

Documentation

Source Code

Class
diagram

Difficult
understanding

Developer

Program
Comprehension

Maintenance
Tasks
Context: Software Project
describes

Sequence
diagram

Documentation

Source Code

Class
diagram

Difficult
understanding

understanding

Developer

Program
Comprehension

Maintenance
Tasks
Context: Software Project
Coming back
to the reality...

Program
Comprehension

Source Code
Maintenance
Tasks

Difficult
understanding

Developer
Idea
We argue that messages exchanged among contributors/developers
are a useful source of information to help understanding source code.

In such situations developers need to infer knowledge from,
the source code itself

Developer

source code descriptions in external
artifacts.
..................................................

When call the method IndexSplitter.split(File
destDir, String[] segs) from the Lucene cotrib
directory(contrib/misc/src/java/org/apache/luc
ene/index) it creates an index with segments
descriptor file with among contributors/developers
We argue that messages exchanged wrong data. Namely wrong
are a useful source of information to help understanding of segment
is the number representing the name source code.
that would be created next in this index.

Idea

..................................................

In such situations developers need to infer knowledge from,

CLASS: IndexSplitter
the source code itself

Developer

METHOD: split

source code descriptions in external
artifacts.
A Five Step-Approach for Mining
Method Descriptions

Developer
Step 1: Downloading emails/bugs reports and tracing them
onto classes
Two heuristics

Developer
Discussion

The discussion contains a fully-qualified class name (e.g.,
org.apache.lucene.analysis.MappingCharFilter); or the email contains a
file name (e.g., MappingCharFilter.java)
For bug reports, we complement the above heuristic by matching the
bug ID of each closed bug to the commit notes, therefore tracing the bug
report to the files changed in that commit

When call the method IndexSplitter .split(File destDir, String[] segs) from the
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
an index with segments descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this index.

public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }

If some of the segments of the index already has this name this results either to
impossibility to create new segment or in crating of an corrupted segment.
Step 1: Downloading emails/bugs reports and tracing them
onto classes
Two heuristics

Developer
Discussion

The discussion contains a fully-qualified class name (e.g.,
org.apache.lucene.analysis.MappingCharFilter); or the email contains a
file name (e.g., MappingCharFilter.java)
For bug reports, we complement the above heuristic by matching the
bug ID of each closed bug to the commit notes, therefore tracing the bug
report to the files changed in that commit

When call the method IndexSplitter .split(File destDir, String[] segs) from the
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
an index with segments descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this index.

CLASS: IndexSplitter
public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }

If some of the segments of the index already has this name this results either to
impossibility to create new segment or in crating of an corrupted segment.
Step 2: Extracting paragraphs
We consider as paragraphs, text section separated by one or more

white lines
Two heuristics

We prune out paragraph description from source code fragments
and/or stack Traces "by using an approach inspired by the work of
Bacchelli et al.

Developer
Discussion
When call the method IndexSplitter.split(File destDir, String[] segs) from the
PAR
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
1
an index with segments descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this index.

public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }

PAR
2

If some of the segments of the index already has this name this results either to
impossibility to create new segment or in crating of an corrupted segment.

PAR
3
Step 2: Extracting paragraphs
We consider as paragraphs, text section separated by one or more

white lines
Two heuristics

We prune out paragraph description from source code fragments
and/or stack Traces "by using an approach inspired by the work of
Bacchelli et al.

Developer
Discussion
When call the method IndexSplitter.split(File destDir, String[] segs) from the
PAR
Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates
1
an index with segments descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this index.

public void split(File destDir, String[] segs) throws IOException {
destDir.mkdirs();
FSDirectory destFSDir = FSDirectory.open(destDir);
SegmentInfos destInfos = new SegmentInfos }

PAR
2

If some of the segments of the index already has this name this results either to
impossibility to create new segment or in crating of an corrupted segment.

PAR
3
Step 3: Tracing paragraphs onto methods
A) A valid paragraph must contain the keyword “method”
These paragraphs must
respect the following
two conditions:

Developer
Discussion

A)

B) and the method name must be followed by a open parenthesis—
i.e., we match “foo(”

B)

When call the method IndexSplitter.split(File destDir, String[] segs)
PAR
from the Lucene cotrib directory it creates an index with segments1
descriptor file with wrong data. Namely wrong is the number
representing the name of segment that would be created next in this
index.

CLASS: IndexSplitter
METHOD: split(
......................................................................................
......................................................................................
......................................................................................
......................................................................................
Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:

..........................
Problem seems to come from
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher
The Method
searchMainMethods(IProgressMonitor,
IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is
ON. This method add all types subtypes as soon as the given scope
encloses them without testing if
sub-types have a main! After return
IType[] before the excecution
..........................

CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
SCORE
Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:

..........................
Problem seems to come from
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher
The Method
searchMainMethods(IProgressMonitor,
IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is
ON. This method add all types subtypes as soon as the given scope
encloses them without testing if
sub-types have a main! After return
IType[] before the excecution
..........................

CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
% parameter = 100% -> s1= 1
SCORE

a) Method parameters:
% of parameters
s1= mentioned in the
paragraphs. Value
between 0 and 1

1 if the
method
does not
have
parameters
Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:

..........................
Problem seems to come from
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher
The Method
searchMainMethods(IProgressMonitor,
IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is
ON. This method add all types subtypes as soon as the given scope
encloses them without testing if
sub-types have a main! After return
IType[] before the excecution
..........................

CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
% parameter = 100% -> s1= 1
SCORE =

1+

a) Method parameters:
% of parameters
s1= mentioned in the
paragraphs. Value
between 0 and 1

1 if the
method
does not
have
parameters

b) Syntactic descriptions (mentioning return values):
check whether the
Equal to one if
paragraph contains the
the method is
s2= keyword “return”. If YES
void.
Value equal 1, 0 otherwise
Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:

..........................
Problem seems to come from
MainMethodeSearchEngine in
org.eclipse.jdt.internal.ui.launcher
The Method
searchMainMethods(IProgressMonitor,
IJavaSearchScope, boolean) ,there's
a call to addSubTypes(List,
IProgressMonitor, IJavaSearchScope)
Method if includesSubtypes flag is
ON. This method add all types subtypes as soon as the given scope
encloses them without testing if
sub-types have a main! After return
IType[] before the excecution
..........................

CLASS: MainMethodSearchEngine
METHOD: serachMainMethods
% parameter = 100% -> s1= 1
SCORE =

1+ 0+ 1 = 2

a) Method parameters:
% of parameters
s1= mentioned in the
paragraphs. Value
between 0 and 1

1 if the
method
does not
have
parameters

b) Syntactic descriptions (mentioning return values):
check whether the
Equal to one if
paragraph contains the
the method is
s2= keyword “return”. If YES
void.
Value equal 1, 0 otherwise

c) Overriding/Overloading:
1 if any of the “overload” or
s3=“override” keywords appears
in the paragraph, 0 otherwise
d) Method invocations:
1 if any of the “call” or
s4=“excecute” keywords appears
in the paragraph, 0 otherwise
Step 4: Heuristic based Filtering
We defined a set of heuristics to further filter the paragraphs associated with
methods that assign each paragraph a score:

We selected paragraphs that have:
1. s1 ≥ thP = 0.5
2. s2 + s3 + s4 ≥ thH = 1

a) Method parameters:
% of parameters
s1= mentioned in the
paragraphs. Value
between 0 and 1

b) Syntactic descriptions (mentioning return values):
check whether the
Equal to one if
paragraph contains the
the method is
s2= keyword “return”. If YES
void.
Value equal 1, 0 otherwise

OK

% parameter = 100% -> s1= 1 ≥ 0.5
SCORE =

1+ 0+ 1 = 2 ≥ 1

1 if the
method
does not
have
parameters

c) Overriding/Overloading:
1 if any of the “overload” or
s3=“override” keywords appears
in the paragraph, 0 otherwise
d) Method invocations:
1 if any of the “call” or
s4=“execute” keywords appears
in the paragraph, 0 otherwise
Step 5: Similarity based Filtering
We rank filtered paragraphs through their textual similarity with the method they
are likely describing.
METHOD

PARAGRAPH

SCORE

Similarity

Method_3

Paragraph_4

2.5

96.1%

Method_1

Paragraph_1

2.5

95.6%

Method_2

Paragraph_2

1.5

97.4%

Method_3

Paragraph_3

1.5

86.2%

Method_1

Paragraph_3

1.5

79.0%

Method_3

Paragraph_2

1.5

77.5%

Method_2

Paragraph_4

1.5

64.3%

Method_2

Paragraph_3

1.3

83.2%

Method_3

Paragraph_1

1.3

73.9%

Method_2

Paragraph_1

1.3

68.7%

Method_1

Paragraph_4

1.3

53.6%

Removing:
- English stop words;
- Programming
language keywords
Using:
- Camel Case
splitting the on
remaining words
- Vector Space Model
Step 5: Similarity based Filtering
We rank filtered paragraphs through their textual similarity with the method they
are likely describing.
METHOD

PARAGRAPH

SCORE

Similarity

Method_3

Paragraph_4

2.5

96.1%

Method_1

Paragraph_1

2.5

95.6%

Method_2

Paragraph_2

1.5

97.4%

Method_3

Paragraph_3

1.5

86.2%

Method_1

Paragraph_3

1.5

79.0%

Method_3

Paragraph_2

1.5

77.5%

Method_2

Paragraph_4

1.5

64.3%

Method_2

Paragraph_3

1.3

83.2%

Method_3

Paragraph_1

1.3

73.9%

Method_2

Paragraph_1

1.3

68.7%

Method_1

Paragraph_4

1.3

53.6%

Removing:
- English stop words;
- Programming
language keywords
Using:
- Camel Case
splitting the on
remaining words
- Vector Space Model

th>=0.80
Empirical Study
• Goal: analyze source code descriptions in developer
discussions
• Purpose: investigating how developer discussions
describe methods of Java Source Code
• Quality focus: find good method description in
messages exchanged among contributors/developers
• Context: Bug-report and mailing lists of two Java
Project
 Apache Lucene and Eclipse
Context
Research Questions
 RQ1 (method coverage): How many methods from
the analyzed software systems are described by the
paragraphs identified by the proposed approach?

 RQ2 (precision): How precise is the proposed approach
in identifying method descriptions?

 RQ3 (missing descriptions): How many potentially
good method descriptions are missed by the approach?
RQ1: How many methods from the analyzed software
systems are described by the paragraphs identified
by the proposed approach?
RQ1: How many methods from the analyzed software
systems are described by the paragraphs identified
by the proposed approach?
RQ1: How many methods from the analyzed software
systems are described by the paragraphs identified
by the proposed approach?
RQ2: How precise is the proposed approach in identifying
method descriptions?
We sampled 250 descriptions from each project
RQ2: How precise is the proposed approach in identifying
method descriptions?
We sampled 250 descriptions from each project
RQ2: How precise is the proposed approach in identifying
method descriptions?
We sampled 250 descriptions from each project
RQ3: How many potentially good method descriptions are
missed by the approach?
We sampled 100 descriptions from each project
TABLE III
The analysis of a sample of 100 paragraphs traced to methods,
but not satisfying the Step 4 heuristic

System

True Negatives

False Negatives

Eclipse

78

22

Apache Lucene

67

33
Conclusion
Conclusion
Conclusion
Conclusion

More Related Content

What's hot

GSP 125 Enhance teaching/tutorialrank.com
 GSP 125 Enhance teaching/tutorialrank.com GSP 125 Enhance teaching/tutorialrank.com
GSP 125 Enhance teaching/tutorialrank.com
jonhson300
 
GSP 125 Entire Course NEW
GSP 125 Entire Course NEWGSP 125 Entire Course NEW
GSP 125 Entire Course NEW
shyamuopten
 
GSP 125 Education Specialist / snaptutorial.com
  GSP 125 Education Specialist / snaptutorial.com  GSP 125 Education Specialist / snaptutorial.com
GSP 125 Education Specialist / snaptutorial.com
stevesonz146
 
GSP 125 Exceptional Education - snaptutorial.com
GSP 125 Exceptional Education - snaptutorial.comGSP 125 Exceptional Education - snaptutorial.com
GSP 125 Exceptional Education - snaptutorial.com
donaldzs162
 
Gsp 125 Education Organization -- snaptutorial.com
Gsp 125   Education Organization -- snaptutorial.comGsp 125   Education Organization -- snaptutorial.com
Gsp 125 Education Organization -- snaptutorial.com
DavisMurphyB85
 
Gsp 125 Enthusiastic Study / snaptutorial.com
Gsp 125 Enthusiastic Study / snaptutorial.comGsp 125 Enthusiastic Study / snaptutorial.com
Gsp 125 Enthusiastic Study / snaptutorial.com
Stephenson101
 
GSP 125 RANK Education for Service--gsp125rank.com
GSP 125 RANK  Education for Service--gsp125rank.comGSP 125 RANK  Education for Service--gsp125rank.com
GSP 125 RANK Education for Service--gsp125rank.com
claric25
 
BENG 108 Final Project
BENG 108 Final ProjectBENG 108 Final Project
BENG 108 Final Project
Jason Trimble
 
Gsp 125 Future Our Mission/newtonhelp.com
Gsp 125 Future Our Mission/newtonhelp.comGsp 125 Future Our Mission/newtonhelp.com
Gsp 125 Future Our Mission/newtonhelp.com
amaranthbeg8
 
C# Unit 2 notes
C# Unit 2 notesC# Unit 2 notes
C# Unit 2 notes
Sudarshan Dhondaley
 
Git, Docker, Python Package and Module
Git, Docker, Python Package and ModuleGit, Docker, Python Package and Module
Git, Docker, Python Package and Module
Novita Sari
 
Java session04
Java session04Java session04
Java session04
Niit Care
 
ClassesandObjects
ClassesandObjectsClassesandObjects
ClassesandObjects
YourVirtual Class
 
Python reading and writing files
Python reading and writing filesPython reading and writing files
Python reading and writing files
Mukesh Tekwani
 

What's hot (14)

GSP 125 Enhance teaching/tutorialrank.com
 GSP 125 Enhance teaching/tutorialrank.com GSP 125 Enhance teaching/tutorialrank.com
GSP 125 Enhance teaching/tutorialrank.com
 
GSP 125 Entire Course NEW
GSP 125 Entire Course NEWGSP 125 Entire Course NEW
GSP 125 Entire Course NEW
 
GSP 125 Education Specialist / snaptutorial.com
  GSP 125 Education Specialist / snaptutorial.com  GSP 125 Education Specialist / snaptutorial.com
GSP 125 Education Specialist / snaptutorial.com
 
GSP 125 Exceptional Education - snaptutorial.com
GSP 125 Exceptional Education - snaptutorial.comGSP 125 Exceptional Education - snaptutorial.com
GSP 125 Exceptional Education - snaptutorial.com
 
Gsp 125 Education Organization -- snaptutorial.com
Gsp 125   Education Organization -- snaptutorial.comGsp 125   Education Organization -- snaptutorial.com
Gsp 125 Education Organization -- snaptutorial.com
 
Gsp 125 Enthusiastic Study / snaptutorial.com
Gsp 125 Enthusiastic Study / snaptutorial.comGsp 125 Enthusiastic Study / snaptutorial.com
Gsp 125 Enthusiastic Study / snaptutorial.com
 
GSP 125 RANK Education for Service--gsp125rank.com
GSP 125 RANK  Education for Service--gsp125rank.comGSP 125 RANK  Education for Service--gsp125rank.com
GSP 125 RANK Education for Service--gsp125rank.com
 
BENG 108 Final Project
BENG 108 Final ProjectBENG 108 Final Project
BENG 108 Final Project
 
Gsp 125 Future Our Mission/newtonhelp.com
Gsp 125 Future Our Mission/newtonhelp.comGsp 125 Future Our Mission/newtonhelp.com
Gsp 125 Future Our Mission/newtonhelp.com
 
C# Unit 2 notes
C# Unit 2 notesC# Unit 2 notes
C# Unit 2 notes
 
Git, Docker, Python Package and Module
Git, Docker, Python Package and ModuleGit, Docker, Python Package and Module
Git, Docker, Python Package and Module
 
Java session04
Java session04Java session04
Java session04
 
ClassesandObjects
ClassesandObjectsClassesandObjects
ClassesandObjects
 
Python reading and writing files
Python reading and writing filesPython reading and writing files
Python reading and writing files
 

Viewers also liked

Mahmoud CV 2015
Mahmoud CV 2015Mahmoud CV 2015
Mahmoud CV 2015
Mahmoud Abdalla
 
(42) viiolencia sexual estudios
(42) viiolencia sexual estudios(42) viiolencia sexual estudios
(42) viiolencia sexual estudios
Eduardo Andres Jara
 
Trabajo de investigacion semana 10
Trabajo de investigacion semana 10Trabajo de investigacion semana 10
Trabajo de investigacion semana 10
gerenciaderiesgos
 
практика Scratch
практика Scratchпрактика Scratch
практика ScratchDmitriyMezin
 

Viewers also liked (6)

Mahmoud CV 2015
Mahmoud CV 2015Mahmoud CV 2015
Mahmoud CV 2015
 
(42) viiolencia sexual estudios
(42) viiolencia sexual estudios(42) viiolencia sexual estudios
(42) viiolencia sexual estudios
 
Teletrade
TeletradeTeletrade
Teletrade
 
Ecoexpo
EcoexpoEcoexpo
Ecoexpo
 
Trabajo de investigacion semana 10
Trabajo de investigacion semana 10Trabajo de investigacion semana 10
Trabajo de investigacion semana 10
 
практика Scratch
практика Scratchпрактика Scratch
практика Scratch
 

Similar to 130614 sebastiano panichella - mining source code descriptions from developers communications

Nt1310 Unit 3 Language Analysis
Nt1310 Unit 3 Language AnalysisNt1310 Unit 3 Language Analysis
Nt1310 Unit 3 Language Analysis
Nicole Gomez
 
Object oriented basics
Object oriented basicsObject oriented basics
Object oriented basics
vamshimahi
 
chapter 5 Objectdesign.ppt
chapter 5 Objectdesign.pptchapter 5 Objectdesign.ppt
chapter 5 Objectdesign.ppt
TemesgenAzezew
 
Abap object-oriented-programming-tutorials
Abap object-oriented-programming-tutorialsAbap object-oriented-programming-tutorials
Abap object-oriented-programming-tutorials
cesarmendez78
 
Java cheat sheet
Java cheat sheet Java cheat sheet
Java cheat sheet
Saifur Rahman
 
Malware analysis
Malware analysisMalware analysis
Malware analysis
Roberto Falconi
 
These questions will be a bit advanced level 2
These questions will be a bit advanced level 2These questions will be a bit advanced level 2
These questions will be a bit advanced level 2
sadhana312471
 
Java căn bản - Chapter7
Java căn bản - Chapter7Java căn bản - Chapter7
Java căn bản - Chapter7
Vince Vo
 
OOP, Networking, Linux/Unix
OOP, Networking, Linux/UnixOOP, Networking, Linux/Unix
OOP, Networking, Linux/Unix
Novita Sari
 
Discover Tasty Query: The library for Scala program analysis
Discover Tasty Query: The library for Scala program analysisDiscover Tasty Query: The library for Scala program analysis
Discover Tasty Query: The library for Scala program analysis
James Thompson
 
Design patterns in Magento
Design patterns in MagentoDesign patterns in Magento
Design patterns in Magento
Divante
 
Chapter 7 - Defining Your Own Classes - Part II
Chapter 7 - Defining Your Own Classes - Part IIChapter 7 - Defining Your Own Classes - Part II
Chapter 7 - Defining Your Own Classes - Part II
Eduardo Bergavera
 
LectureNotes-02-DSA
LectureNotes-02-DSALectureNotes-02-DSA
LectureNotes-02-DSA
Haitham El-Ghareeb
 
What's New In Python 2.4
What's New In Python 2.4What's New In Python 2.4
What's New In Python 2.4
Richard Jones
 
V_Sound(Visualize Softwar and Understand the Document)
V_Sound(Visualize Softwar  and Understand the Document)V_Sound(Visualize Softwar  and Understand the Document)
V_Sound(Visualize Softwar and Understand the Document)
Imdad Ul Haq
 
CSCI6505 Project:Construct search engine using ML approach
CSCI6505 Project:Construct search engine using ML approachCSCI6505 Project:Construct search engine using ML approach
CSCI6505 Project:Construct search engine using ML approach
butest
 
Deployment
DeploymentDeployment
Annotation Processing in Android
Annotation Processing in AndroidAnnotation Processing in Android
Annotation Processing in Android
emanuelez
 
The smartpath information systems c plus plus
The smartpath information systems  c plus plusThe smartpath information systems  c plus plus
The smartpath information systems c plus plus
The Smartpath Information Systems,Bhilai,Durg,Chhattisgarh.
 
JDD2015: ClassIndex - szybka alternatywa dla skanowania klas - Sławek Piotrowski
JDD2015: ClassIndex - szybka alternatywa dla skanowania klas - Sławek PiotrowskiJDD2015: ClassIndex - szybka alternatywa dla skanowania klas - Sławek Piotrowski
JDD2015: ClassIndex - szybka alternatywa dla skanowania klas - Sławek Piotrowski
PROIDEA
 

Similar to 130614 sebastiano panichella - mining source code descriptions from developers communications (20)

Nt1310 Unit 3 Language Analysis
Nt1310 Unit 3 Language AnalysisNt1310 Unit 3 Language Analysis
Nt1310 Unit 3 Language Analysis
 
Object oriented basics
Object oriented basicsObject oriented basics
Object oriented basics
 
chapter 5 Objectdesign.ppt
chapter 5 Objectdesign.pptchapter 5 Objectdesign.ppt
chapter 5 Objectdesign.ppt
 
Abap object-oriented-programming-tutorials
Abap object-oriented-programming-tutorialsAbap object-oriented-programming-tutorials
Abap object-oriented-programming-tutorials
 
Java cheat sheet
Java cheat sheet Java cheat sheet
Java cheat sheet
 
Malware analysis
Malware analysisMalware analysis
Malware analysis
 
These questions will be a bit advanced level 2
These questions will be a bit advanced level 2These questions will be a bit advanced level 2
These questions will be a bit advanced level 2
 
Java căn bản - Chapter7
Java căn bản - Chapter7Java căn bản - Chapter7
Java căn bản - Chapter7
 
OOP, Networking, Linux/Unix
OOP, Networking, Linux/UnixOOP, Networking, Linux/Unix
OOP, Networking, Linux/Unix
 
Discover Tasty Query: The library for Scala program analysis
Discover Tasty Query: The library for Scala program analysisDiscover Tasty Query: The library for Scala program analysis
Discover Tasty Query: The library for Scala program analysis
 
Design patterns in Magento
Design patterns in MagentoDesign patterns in Magento
Design patterns in Magento
 
Chapter 7 - Defining Your Own Classes - Part II
Chapter 7 - Defining Your Own Classes - Part IIChapter 7 - Defining Your Own Classes - Part II
Chapter 7 - Defining Your Own Classes - Part II
 
LectureNotes-02-DSA
LectureNotes-02-DSALectureNotes-02-DSA
LectureNotes-02-DSA
 
What's New In Python 2.4
What's New In Python 2.4What's New In Python 2.4
What's New In Python 2.4
 
V_Sound(Visualize Softwar and Understand the Document)
V_Sound(Visualize Softwar  and Understand the Document)V_Sound(Visualize Softwar  and Understand the Document)
V_Sound(Visualize Softwar and Understand the Document)
 
CSCI6505 Project:Construct search engine using ML approach
CSCI6505 Project:Construct search engine using ML approachCSCI6505 Project:Construct search engine using ML approach
CSCI6505 Project:Construct search engine using ML approach
 
Deployment
DeploymentDeployment
Deployment
 
Annotation Processing in Android
Annotation Processing in AndroidAnnotation Processing in Android
Annotation Processing in Android
 
The smartpath information systems c plus plus
The smartpath information systems  c plus plusThe smartpath information systems  c plus plus
The smartpath information systems c plus plus
 
JDD2015: ClassIndex - szybka alternatywa dla skanowania klas - Sławek Piotrowski
JDD2015: ClassIndex - szybka alternatywa dla skanowania klas - Sławek PiotrowskiJDD2015: ClassIndex - szybka alternatywa dla skanowania klas - Sławek Piotrowski
JDD2015: ClassIndex - szybka alternatywa dla skanowania klas - Sławek Piotrowski
 

More from Ptidej Team

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software Miniaturisation
Ptidej Team
 
Presentation
PresentationPresentation
Presentation
Ptidej Team
 
Presentation
PresentationPresentation
Presentation
Ptidej Team
 
Presentation
PresentationPresentation
Presentation
Ptidej Team
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel Briand
Ptidej Team
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel Abdellatif
Ptidej Team
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh Kermansaravi
Ptidej Team
 
Mouna Abidi
Mouna AbidiMouna Abidi
Mouna Abidi
Ptidej Team
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel Grichi
Ptidej Team
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano Politowski
Ptidej Team
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisis
Ptidej Team
 
MIPA
MIPAMIPA
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.ppt
Ptidej Team
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.ppt
Ptidej Team
 
Medicine15.ppt
Medicine15.pptMedicine15.ppt
Medicine15.ppt
Ptidej Team
 
Qrs17b.ppt
Qrs17b.pptQrs17b.ppt
Qrs17b.ppt
Ptidej Team
 
Icpc11c.ppt
Icpc11c.pptIcpc11c.ppt
Icpc11c.ppt
Ptidej Team
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
Ptidej Team
 
Msr17a.ppt
Msr17a.pptMsr17a.ppt
Msr17a.ppt
Ptidej Team
 
Icsoc15.ppt
Icsoc15.pptIcsoc15.ppt
Icsoc15.ppt
Ptidej Team
 

More from Ptidej Team (20)

From IoT to Software Miniaturisation
From IoT to Software MiniaturisationFrom IoT to Software Miniaturisation
From IoT to Software Miniaturisation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation
PresentationPresentation
Presentation
 
Presentation by Lionel Briand
Presentation by Lionel BriandPresentation by Lionel Briand
Presentation by Lionel Briand
 
Manel Abdellatif
Manel AbdellatifManel Abdellatif
Manel Abdellatif
 
Azadeh Kermansaravi
Azadeh KermansaraviAzadeh Kermansaravi
Azadeh Kermansaravi
 
Mouna Abidi
Mouna AbidiMouna Abidi
Mouna Abidi
 
CSED - Manel Grichi
CSED - Manel GrichiCSED - Manel Grichi
CSED - Manel Grichi
 
Cristiano Politowski
Cristiano PolitowskiCristiano Politowski
Cristiano Politowski
 
Will io t trigger the next software crisis
Will io t trigger the next software crisisWill io t trigger the next software crisis
Will io t trigger the next software crisis
 
MIPA
MIPAMIPA
MIPA
 
Thesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.pptThesis+of+laleh+eshkevari.ppt
Thesis+of+laleh+eshkevari.ppt
 
Thesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.pptThesis+of+nesrine+abdelkafi.ppt
Thesis+of+nesrine+abdelkafi.ppt
 
Medicine15.ppt
Medicine15.pptMedicine15.ppt
Medicine15.ppt
 
Qrs17b.ppt
Qrs17b.pptQrs17b.ppt
Qrs17b.ppt
 
Icpc11c.ppt
Icpc11c.pptIcpc11c.ppt
Icpc11c.ppt
 
Icsme16.ppt
Icsme16.pptIcsme16.ppt
Icsme16.ppt
 
Msr17a.ppt
Msr17a.pptMsr17a.ppt
Msr17a.ppt
 
Icsoc15.ppt
Icsoc15.pptIcsoc15.ppt
Icsoc15.ppt
 

Recently uploaded

Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
Zilliz
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
Javier Junquera
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
Shinana2
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
Data Hops
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 

Recently uploaded (20)

Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Building Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and MilvusBuilding Production Ready Search Pipelines with Spark and Milvus
Building Production Ready Search Pipelines with Spark and Milvus
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
dbms calicut university B. sc Cs 4th sem.pdf
dbms  calicut university B. sc Cs 4th sem.pdfdbms  calicut university B. sc Cs 4th sem.pdf
dbms calicut university B. sc Cs 4th sem.pdf
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 

130614 sebastiano panichella - mining source code descriptions from developers communications

  • 1. Mining Source Code Descriptions from Developer Communications Sebastiano Panichella Jairo Aponte Massimiliano Di Penta Andrian Marcus Gerardo Canfora
  • 2. Context: Software Project Sequence diagram Documentation Source Code Class diagram Developer Program Comprehension Maintenance Tasks
  • 3. Context: Software Project Sequence diagram Documentation Source Code Class diagram Difficult understanding Developer Program Comprehension Maintenance Tasks
  • 4. Context: Software Project describes Sequence diagram Documentation Source Code Class diagram Difficult understanding understanding Developer Program Comprehension Maintenance Tasks
  • 5. Context: Software Project Coming back to the reality... Program Comprehension Source Code Maintenance Tasks Difficult understanding Developer
  • 6. Idea We argue that messages exchanged among contributors/developers are a useful source of information to help understanding source code. In such situations developers need to infer knowledge from, the source code itself Developer source code descriptions in external artifacts.
  • 7. .................................................. When call the method IndexSplitter.split(File destDir, String[] segs) from the Lucene cotrib directory(contrib/misc/src/java/org/apache/luc ene/index) it creates an index with segments descriptor file with among contributors/developers We argue that messages exchanged wrong data. Namely wrong are a useful source of information to help understanding of segment is the number representing the name source code. that would be created next in this index. Idea .................................................. In such situations developers need to infer knowledge from, CLASS: IndexSplitter the source code itself Developer METHOD: split source code descriptions in external artifacts.
  • 8. A Five Step-Approach for Mining Method Descriptions Developer
  • 9. Step 1: Downloading emails/bugs reports and tracing them onto classes Two heuristics Developer Discussion The discussion contains a fully-qualified class name (e.g., org.apache.lucene.analysis.MappingCharFilter); or the email contains a file name (e.g., MappingCharFilter.java) For bug reports, we complement the above heuristic by matching the bug ID of each closed bug to the commit notes, therefore tracing the bug report to the files changed in that commit When call the method IndexSplitter .split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index. public void split(File destDir, String[] segs) throws IOException { destDir.mkdirs(); FSDirectory destFSDir = FSDirectory.open(destDir); SegmentInfos destInfos = new SegmentInfos } If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment.
  • 10. Step 1: Downloading emails/bugs reports and tracing them onto classes Two heuristics Developer Discussion The discussion contains a fully-qualified class name (e.g., org.apache.lucene.analysis.MappingCharFilter); or the email contains a file name (e.g., MappingCharFilter.java) For bug reports, we complement the above heuristic by matching the bug ID of each closed bug to the commit notes, therefore tracing the bug report to the files changed in that commit When call the method IndexSplitter .split(File destDir, String[] segs) from the Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index. CLASS: IndexSplitter public void split(File destDir, String[] segs) throws IOException { destDir.mkdirs(); FSDirectory destFSDir = FSDirectory.open(destDir); SegmentInfos destInfos = new SegmentInfos } If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment.
  • 11. Step 2: Extracting paragraphs We consider as paragraphs, text section separated by one or more white lines Two heuristics We prune out paragraph description from source code fragments and/or stack Traces "by using an approach inspired by the work of Bacchelli et al. Developer Discussion When call the method IndexSplitter.split(File destDir, String[] segs) from the PAR Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates 1 an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index. public void split(File destDir, String[] segs) throws IOException { destDir.mkdirs(); FSDirectory destFSDir = FSDirectory.open(destDir); SegmentInfos destInfos = new SegmentInfos } PAR 2 If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment. PAR 3
  • 12. Step 2: Extracting paragraphs We consider as paragraphs, text section separated by one or more white lines Two heuristics We prune out paragraph description from source code fragments and/or stack Traces "by using an approach inspired by the work of Bacchelli et al. Developer Discussion When call the method IndexSplitter.split(File destDir, String[] segs) from the PAR Lucene cotrib directory (contrib/misc/src/java/org/apache/lucene/index) it creates 1 an index with segments descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index. public void split(File destDir, String[] segs) throws IOException { destDir.mkdirs(); FSDirectory destFSDir = FSDirectory.open(destDir); SegmentInfos destInfos = new SegmentInfos } PAR 2 If some of the segments of the index already has this name this results either to impossibility to create new segment or in crating of an corrupted segment. PAR 3
  • 13. Step 3: Tracing paragraphs onto methods A) A valid paragraph must contain the keyword “method” These paragraphs must respect the following two conditions: Developer Discussion A) B) and the method name must be followed by a open parenthesis— i.e., we match “foo(” B) When call the method IndexSplitter.split(File destDir, String[] segs) PAR from the Lucene cotrib directory it creates an index with segments1 descriptor file with wrong data. Namely wrong is the number representing the name of segment that would be created next in this index. CLASS: IndexSplitter METHOD: split( ...................................................................................... ...................................................................................... ...................................................................................... ......................................................................................
  • 14. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: .......................... Problem seems to come from MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher The Method searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean) ,there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types subtypes as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution .......................... CLASS: MainMethodSearchEngine METHOD: serachMainMethods SCORE
  • 15. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: .......................... Problem seems to come from MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher The Method searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean) ,there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types subtypes as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution .......................... CLASS: MainMethodSearchEngine METHOD: serachMainMethods % parameter = 100% -> s1= 1 SCORE a) Method parameters: % of parameters s1= mentioned in the paragraphs. Value between 0 and 1 1 if the method does not have parameters
  • 16. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: .......................... Problem seems to come from MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher The Method searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean) ,there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types subtypes as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution .......................... CLASS: MainMethodSearchEngine METHOD: serachMainMethods % parameter = 100% -> s1= 1 SCORE = 1+ a) Method parameters: % of parameters s1= mentioned in the paragraphs. Value between 0 and 1 1 if the method does not have parameters b) Syntactic descriptions (mentioning return values): check whether the Equal to one if paragraph contains the the method is s2= keyword “return”. If YES void. Value equal 1, 0 otherwise
  • 17. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: .......................... Problem seems to come from MainMethodeSearchEngine in org.eclipse.jdt.internal.ui.launcher The Method searchMainMethods(IProgressMonitor, IJavaSearchScope, boolean) ,there's a call to addSubTypes(List, IProgressMonitor, IJavaSearchScope) Method if includesSubtypes flag is ON. This method add all types subtypes as soon as the given scope encloses them without testing if sub-types have a main! After return IType[] before the excecution .......................... CLASS: MainMethodSearchEngine METHOD: serachMainMethods % parameter = 100% -> s1= 1 SCORE = 1+ 0+ 1 = 2 a) Method parameters: % of parameters s1= mentioned in the paragraphs. Value between 0 and 1 1 if the method does not have parameters b) Syntactic descriptions (mentioning return values): check whether the Equal to one if paragraph contains the the method is s2= keyword “return”. If YES void. Value equal 1, 0 otherwise c) Overriding/Overloading: 1 if any of the “overload” or s3=“override” keywords appears in the paragraph, 0 otherwise d) Method invocations: 1 if any of the “call” or s4=“excecute” keywords appears in the paragraph, 0 otherwise
  • 18. Step 4: Heuristic based Filtering We defined a set of heuristics to further filter the paragraphs associated with methods that assign each paragraph a score: We selected paragraphs that have: 1. s1 ≥ thP = 0.5 2. s2 + s3 + s4 ≥ thH = 1 a) Method parameters: % of parameters s1= mentioned in the paragraphs. Value between 0 and 1 b) Syntactic descriptions (mentioning return values): check whether the Equal to one if paragraph contains the the method is s2= keyword “return”. If YES void. Value equal 1, 0 otherwise OK % parameter = 100% -> s1= 1 ≥ 0.5 SCORE = 1+ 0+ 1 = 2 ≥ 1 1 if the method does not have parameters c) Overriding/Overloading: 1 if any of the “overload” or s3=“override” keywords appears in the paragraph, 0 otherwise d) Method invocations: 1 if any of the “call” or s4=“execute” keywords appears in the paragraph, 0 otherwise
  • 19. Step 5: Similarity based Filtering We rank filtered paragraphs through their textual similarity with the method they are likely describing. METHOD PARAGRAPH SCORE Similarity Method_3 Paragraph_4 2.5 96.1% Method_1 Paragraph_1 2.5 95.6% Method_2 Paragraph_2 1.5 97.4% Method_3 Paragraph_3 1.5 86.2% Method_1 Paragraph_3 1.5 79.0% Method_3 Paragraph_2 1.5 77.5% Method_2 Paragraph_4 1.5 64.3% Method_2 Paragraph_3 1.3 83.2% Method_3 Paragraph_1 1.3 73.9% Method_2 Paragraph_1 1.3 68.7% Method_1 Paragraph_4 1.3 53.6% Removing: - English stop words; - Programming language keywords Using: - Camel Case splitting the on remaining words - Vector Space Model
  • 20. Step 5: Similarity based Filtering We rank filtered paragraphs through their textual similarity with the method they are likely describing. METHOD PARAGRAPH SCORE Similarity Method_3 Paragraph_4 2.5 96.1% Method_1 Paragraph_1 2.5 95.6% Method_2 Paragraph_2 1.5 97.4% Method_3 Paragraph_3 1.5 86.2% Method_1 Paragraph_3 1.5 79.0% Method_3 Paragraph_2 1.5 77.5% Method_2 Paragraph_4 1.5 64.3% Method_2 Paragraph_3 1.3 83.2% Method_3 Paragraph_1 1.3 73.9% Method_2 Paragraph_1 1.3 68.7% Method_1 Paragraph_4 1.3 53.6% Removing: - English stop words; - Programming language keywords Using: - Camel Case splitting the on remaining words - Vector Space Model th>=0.80
  • 21. Empirical Study • Goal: analyze source code descriptions in developer discussions • Purpose: investigating how developer discussions describe methods of Java Source Code • Quality focus: find good method description in messages exchanged among contributors/developers • Context: Bug-report and mailing lists of two Java Project  Apache Lucene and Eclipse
  • 23. Research Questions  RQ1 (method coverage): How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?  RQ2 (precision): How precise is the proposed approach in identifying method descriptions?  RQ3 (missing descriptions): How many potentially good method descriptions are missed by the approach?
  • 24. RQ1: How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?
  • 25. RQ1: How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?
  • 26. RQ1: How many methods from the analyzed software systems are described by the paragraphs identified by the proposed approach?
  • 27. RQ2: How precise is the proposed approach in identifying method descriptions? We sampled 250 descriptions from each project
  • 28. RQ2: How precise is the proposed approach in identifying method descriptions? We sampled 250 descriptions from each project
  • 29. RQ2: How precise is the proposed approach in identifying method descriptions? We sampled 250 descriptions from each project
  • 30. RQ3: How many potentially good method descriptions are missed by the approach? We sampled 100 descriptions from each project TABLE III The analysis of a sample of 100 paragraphs traced to methods, but not satisfying the Step 4 heuristic System True Negatives False Negatives Eclipse 78 22 Apache Lucene 67 33