The Interplay between Semantic Coupling and Co-Change of Software Classes (journal first)
Nemitari Ajienka – Edge Hill University (UK)
Andrea Capiluppi – Brunel University London (UK)
Steve Counsell – Brunel University London (UK)
ICSE2018 - Gothenburg2 A Capiluppi
Outline
Rationale
Definitions: Semantic coupling and Co-change
Experimental set-up
Results
Conclusion
Rationale – Software changes: origin and impact
[Generated by Doxygen and Graphviz]
Certain classes tend to change more than others
Identify patterns or metrics that characterise those classes
Definitions
Semantic coupling
– Degree of relationship between classes’ semantic content
Co-change (Logical coupling)
– Based on historical data
– Classes changed in the same timeframe (day? week? commit?)
Logical and Semantic Couplings
Semantic coupling: operationalisation
Research questions
RQ1: Is there a linear relationship between semantic and logical coupling?
– Are very similar classes (semantically) bound to co-evolve more often?
RQ2: Is there a directional relationship between semantic and logical coupling?
– If A and B are co-evolving, does it mean that they’re semantically linked, or
– If A and B are semantically similar, will they co-evolve?
Experimental set-up
Data collection
Population: GoogleCode projects
– 2,599,222 projects
Sampling
– Only Java projects
– 95% confidence level, 5% confidence interval
– 380 projects
– All revisions+metadata downloaded
Pruning
– Projects with fewer than 20 revisions removed
– 79 non-trivial Java projects
– Avg: 117 revisions
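The 95% confidence level and 5% interval (read here as the margin of error) match a standard sample-size calculation. The slides do not say which formula was used. The sketch below assumes Cochran's formula with a finite-population correction, the usual conservative choice p = 0.5, and N standing for the number of Java projects in the population, which the slides do not give.

```latex
n_0 = \frac{z^2 \, p \, (1 - p)}{e^2}
    = \frac{1.96^2 \times 0.5 \times 0.5}{0.05^2}
    \approx 384.16,
\qquad
n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}
```

Under these assumptions, any finite N gives a corrected n slightly below n_0, which is in line with the 380 projects reported.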
Characteristics of projects in sample <excerpt>
Logical and Semantic Couplings
Co-evolution data (logical coupling)
Per project
Per revision
Per pair of OO classes
“what is the likelihood that class A and B co-evolve
together, based on historical data?”
– Low, medium, high likelihood
Logical coupling: operationalisation
Support
– Class A modified in 3 transactions
– 2 also included changes to C
– Support for A → C is 2.
Confidence
– Confidence for A → C (“C depends on A”) is 2/3 ≈ 0.67
– Confidence for C → A (“A depends on C”) is 2/4 = 0.5 (C modified in 4 transactions).
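The support and confidence figures above can be reproduced with a few lines of Python. This is a minimal sketch, not the authors' tooling: the `transactions` list is a hypothetical change history built to match the counts on the slide, and the function names are illustrative.

```python
def support(transactions, a, b):
    """Number of transactions in which both a and b were changed."""
    return sum(1 for t in transactions if a in t and b in t)


def confidence(transactions, a, b):
    """Support for a -> b divided by the number of transactions changing a."""
    changed_a = sum(1 for t in transactions if a in t)
    if changed_a == 0:
        return 0.0  # a never changed, so no evidence for a -> b
    return support(transactions, a, b) / changed_a


# Hypothetical history: A changes in 3 transactions, C in 4, both together in 2.
transactions = [
    {"A", "C"},
    {"A", "C"},
    {"A", "B"},
    {"C"},
    {"B", "C"},
]

print(support(transactions, "A", "C"))               # 2
print(round(confidence(transactions, "A", "C"), 2))  # 0.67
print(confidence(transactions, "C", "A"))            # 0.5
```

Support is symmetric, while confidence depends on the direction of the rule, which is why A → C and C → A differ.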
Semantic coupling: operationalisation
Per project
All revisions
Pair of classes
UrSQLController vs UrSQLEntry
– N-gram similarity of 0.6 for n-grams of n=4
Vector Space Model (VSM): text corpora (full code)
N-gram technique: small sentences (class identifiers)
Disco word synonym: small sentences (class identifiers)
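To illustrate the identifier-based idea, here is a minimal sketch of one possible n-gram similarity: the Dice coefficient over character 4-grams of two class names. The slides do not specify the exact n-gram measure or any preprocessing, so this variant is an assumption and is not expected to reproduce the 0.6 value quoted above.

```python
def char_ngrams(text, n=4):
    """Set of character n-grams of a lower-cased string."""
    text = text.lower()
    if len(text) < n:
        # Strings shorter than n are treated as a single gram (assumption).
        return {text} if text else set()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_similarity(a, b, n=4):
    """Dice coefficient over n-gram sets: 0 = no shared grams, 1 = identical sets."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0  # nothing to compare if either string is empty
    shared = len(grams_a & grams_b)
    return 2 * shared / (len(grams_a) + len(grams_b))


# Class names taken from the slide; the similarity measure itself is assumed.
print(ngram_similarity("UrSQLController", "UrSQLEntry", n=4))
```

Other normalisations (Jaccard, padded n-grams, splitting camel-case identifiers first) would give different scores for the same pair.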
Results
RQ1: linear relationship between Logical and Semantic
Chi-square test
Spearman’s rank correlation (ρ)
Per project, per pair of classes, in all revisions:
– All confidence metrics (logical coupling)
– All coupling strengths between pairs
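Spearman's ρ is the Pearson correlation of the ranks of the two variables, so it captures monotonic association between the two coupling strengths. The sketch below is a plain-Python illustration under that definition. The two input lists are hypothetical values for five class pairs, not data from the study.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical per-pair values: semantic similarity vs. co-change confidence.
semantic_strength = [0.9, 0.2, 0.6, 0.4, 0.8]
logical_confidence = [0.1, 0.5, 0.67, 0.3, 0.2]
print(spearman_rho(semantic_strength, logical_confidence))
```

A value of ρ near zero would correspond to the absence of a relationship between the two strengths.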
RQ1 results
No linear relationship between the strengths of logical and semantic dependencies
Can’t infer co-evolution frequency based on semantic strength
Using semantic coupling to predict co-change has low precision
RQ2: directional relationship between Logical and Semantic
Co-changed Semantic Dependencies (CSD, in %)
– Percentage of semantic dependencies that also co-change
Semantic Logical Dependencies (SLD, in %)
– Percentage of logical dependencies that are also semantically related
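Both percentages are set overlaps taken in opposite directions. The sketch below assumes each dependency is an unordered pair of class names. The two example sets are hypothetical and only illustrate how CSD and SLD can differ for the same project.

```python
def pair(a, b):
    """Unordered class pair (assumption: direction is ignored here)."""
    return frozenset((a, b))


def overlap_percentage(base, other):
    """Percentage of pairs in `base` that also appear in `other`."""
    if not base:
        return 0.0
    return 100.0 * len(base & other) / len(base)


# Hypothetical dependency sets for one project.
semantic = {pair("A", "B"), pair("A", "C")}
logical = {pair("A", "B"), pair("A", "C"), pair("B", "D"), pair("C", "D")}

csd = overlap_percentage(semantic, logical)  # semantic pairs that also co-change
sld = overlap_percentage(logical, semantic)  # co-changing pairs also semantically related

print(csd)  # 100.0
print(sld)  # 50.0
```

In this toy example every semantic dependency co-changes, yet only half of the co-changing pairs are semantically related, which is the kind of asymmetry RQ2 probes.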
RQ2: results
The numbers of semantic and logical dependencies are of a similar order of magnitude
In most projects, 100% of semantic dependencies are also logical dependencies
If two classes are semantically coupled, there is a high chance that they will co-change in the future
Serendipitous findings
Semantic coupling
– use full source code or just
identifiers?
– which is more efficient?
Chi-squared test of
independence
– VSM
– N-Gram + Disco
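A chi-squared test of independence compares observed counts in a contingency table with the counts expected if the two classifications were unrelated. The sketch below computes only the test statistic for a hypothetical 2×2 table (class pairs coupled or not according to VSM, against coupled or not according to an identifier-based technique). The counts are invented for illustration and the code assumes every row and column total is non-zero.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if row and column classifications were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts of class pairs:
#                  identifiers: coupled, not coupled
observed = [
    [30, 10],  # VSM: coupled
    [10, 50],  # VSM: not coupled
]

print(round(chi_squared_statistic(observed), 2))  # 34.03
```

For a 2×2 table there is one degree of freedom, so a statistic above 3.841 rejects independence at the 5% level.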
Results: class corpora or identifiers?
Class corpora and identifiers are related: if one shows
semantic coupling, so does the other
– Identifier-based techniques are much more effective
– N-gram more efficient than Disco
Take-away messages
Very similar classes (highly semantically coupled) do not co-change more often
Semantically linked classes are very likely to co-evolve
Using identifiers instead of full corpora is an efficient and effective way of measuring semantic coupling
Work shared at https://goo.gl/eLuDbB
Thank you
