9.6 Million Links in Source Code Comments: Purpose, Evolution, and Decay
Hideaki Hata, Christoph Treude, Raula Gaikovina Kula, Takashi Ishio

Links in my own code comment
https://stackoverflow.com/a/23838584

Later, I found a new answer
https://stackoverflow.com/a/48027778
My code is obsolete. I can improve my code.

Links in code comments
Links explicitly indicate external sources related to the code.
● Important clues to developers’ intentions
● One aspect of software documentation

Related studies of code comments
● Task annotations [Storey et al., ICSE 2008]
● Self-admitted technical debt [Potdar and Shihab, ICSME 2014]
● Fragile comments [Ratol and Robillard, ASE 2017]
● License evolution [Wu et al., EMSE 2017]

Related studies of links (outside SE)
CHASE 2019:
We encourage authors of accepted papers to make their data public, in order to enhance the transparency of the process and the reproducibility of the results. We encourage you to avoid putting the data on your own websites or systems like Dropbox, since more than 30% of them will not work in a 4 years period.
Koehler, Web page change and persistence: A four-year longitudinal study, https://doi.org/10.1002/asi.10018

How are links used?
How do links evolve?
How do links suffer from decay?
Missing study: links in source code comments

Data collection from the GHTorrent project
https://github.blog/2015-08-19-language-trends-on-github/
25,925 repositories

Link existence (at least one link) in repositories
89% in total

9.6 million links
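
A minimal sketch of how the "at least one link" share could be computed, assuming the comments of each repository have already been extracted. The URL pattern, the `comments_by_repo` structure, and both function names are illustrative assumptions, not the pipeline used in the study.

```python
import re
from typing import Dict, List

# Simplified URL pattern (an assumption): an http(s) scheme followed by
# characters that are not whitespace, quotes, or closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>)\]]+")


def extract_links(comment: str) -> List[str]:
    """Return all URLs found in one comment, with trailing punctuation stripped."""
    return [match.rstrip(".,;:") for match in URL_PATTERN.findall(comment)]


def share_with_links(comments_by_repo: Dict[str, List[str]]) -> float:
    """Fraction of repositories that have at least one link in any comment."""
    if not comments_by_repo:
        return 0.0
    with_link = sum(
        1
        for comments in comments_by_repo.values()
        if any(extract_links(comment) for comment in comments)
    )
    return with_link / len(comments_by_repo)
```

A real comment extractor would need language-aware parsing; this sketch only covers the counting step once comments are available.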
Domains
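
One way to group links by domain, sketched under the assumption that a "domain" is the lowercased host name of the URL. The function name `count_domains` is hypothetical, and the study may have normalized domains differently.

```python
from collections import Counter
from typing import Iterable
from urllib.parse import urlsplit


def count_domains(links: Iterable[str]) -> Counter:
    """Count how many links point at each domain (lowercased host name)."""
    counts: Counter = Counter()
    for link in links:
        try:
            host = urlsplit(link).hostname  # None when the URL has no host part
        except ValueError:
            continue  # skip malformed URLs instead of failing the whole run
        if host:
            counts[host] += 1
    return counts
```

Calling `count_domains(links).most_common(10)` would then list the ten most frequently referenced domains.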

Statistically representative sample
Strata by domain   # domains     # links   Sample size
common                 2,013   9,128,444           384
sometimes             30,851     502,083           384
rare                  24,175      24,175           378
sum                   57,039   9,654,702         1,146
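
The slide does not say how the sample sizes were derived. One common approach that reproduces them is Cochran's formula with a finite population correction, assuming a 95% confidence level (z = 1.96), a 5% margin of error, and rounding to the nearest integer. These parameters and the function name `sample_size` are assumptions for illustration.

```python
def sample_size(population: int, z: float = 1.96, margin: float = 0.05,
                p: float = 0.5) -> int:
    """Sample size for estimating a proportion in a finite population.

    Uses Cochran's formula n0 = z^2 * p * (1 - p) / margin^2, then applies
    the finite population correction n = n0 / (1 + (n0 - 1) / population).
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    corrected = n0 / (1 + (n0 - 1) / population)
    return round(corrected)


# Number of links per stratum, taken from the table above.
strata = {"common": 9_128_444, "sometimes": 502_083, "rare": 24_175}
sizes = {name: sample_size(links) for name, links in strata.items()}
```

Under these assumptions `sizes` comes out as 384, 384, and 378, which sums to the 1,146 sampled links in the table.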
Link targets
● Specification
● Organization homepage
● Tutorial or article
● API documentation
● Blog post
● Bug report
● Application
● Personal homepage
● Code
● Stack Overflow thread
● Research paper

Link purpose
Metadata
○ author, organization, or license
Source/attribution
○ a source of some aspect of the source code
Self-admitted technical debt
○ causes of technical debt

Link evolution (88 out of 1,146)
● License replacement
● Organization update
● Change to https
● Content move
● Content update
● Content change
● Other
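
Of the categories above, "Change to https" is the one that can be detected mechanically. A hedged sketch, assuming the old and new link of a comment revision are available as strings; the function name `is_https_upgrade` is hypothetical, and the other categories would still need manual inspection.

```python
from urllib.parse import urlsplit


def is_https_upgrade(old: str, new: str) -> bool:
    """True if the only difference between two URLs is http -> https."""
    before, after = urlsplit(old), urlsplit(new)
    if before.scheme.lower() != "http" or after.scheme.lower() != "https":
        return False
    # Host names are case-insensitive; path, query, and fragment are not.
    return (
        before.netloc.lower() == after.netloc.lower()
        and before.path == after.path
        and before.query == after.query
        and before.fragment == after.fragment
    )
```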

Link target evolution in the SOTorrent dataset

Dead links
https://github.com/sveawebpay/php-integration/pull/82
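
A rough sketch of a dead-link check using only the Python standard library. Treating "no response, or HTTP status 400 and above" as dead is an assumption made here; the slides do not state the exact criterion, and the names `fetch_status` and `is_dead` are hypothetical.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with a 4xx/5xx status
    except (OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, or malformed URL


def is_dead(status: Optional[int]) -> bool:
    """Assumed criterion: no response at all, or a client/server error status."""
    return status is None or status >= 400
```

For example, `is_dead(fetch_status(link))` can be applied to each unique link. Some servers reject HEAD requests, so a careful checker would retry with GET before declaring a link dead.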
● Links in code comments are prevalent.
● Common link targets: licenses, software homepages, and dead links.
● Common purposes: metadata and attribution.
● Links are rarely updated.
● 75% of Stack Overflow threads attracted at least one change after being first referenced.
● Across all unique links, 9% of the link targets are not available.
● Developers generally responded positively to the request to fix dead links.

Further challenges: supporting coevolution
● Further understanding of external sources
● Further studies of source code comments
● Tool support for external source referencing, tracking, and updating

Summary and online appendix
https://github.com/NAIST-SE/9.6MillionLinks
