SlideShare a Scribd company logo
Automatic link extraction:
The good, the bad and the ugly in
software ecosystem mining
Eleni Constantinou
Tom Mens
Software ecosystem health
Prof. Dr. Tom Mens
Software Engineering Lab
University of Mons
• Ecosystem evolution
• Ecosystem health
www.secohealth.org
2
Software ecosystem health
Sustainability
Longevity Growth
Success
Resilience Survival
Diversity Popularity
3
Software ecosystem health
4
Software ecosystem evolution
Combine data from different sources
• Package manager data
• Repository activity
5
Combine Information
Package manager GitHub repositories
6
Combine Information
Package manager GitHub repositories
GOAL
Link packages to GitHub
repositories
6
Combine Information
Package manager GitHub repositories
6
Combine Information
7
Combine Information
7
Combine Information
7
Combine Information
8
Combine Information
8
The good
9
The good
10
The bad
11
The Bad
Irrelevant links
12
The bad - Irrelevant links
13
The bad - Irrelevant links
14
The Bad
Invalid GitHub links
15
The bad - Invalid GitHub links
16
The bad - Invalid GitHub links
16
The bad - Invalid GitHub links
17
The Bad
GitHub pages
18
The bad - GitHub pages
19
The bad - GitHub pages
https://github.com/krmbzds/qf
19
The Bad
GitHub user profiles
20
The bad - GitHub user profiles
21
The bad - GitHub user profiles
https://github.com/ghaki/[GEM_NAME]
21
The Bad
Sub-directory GitHub links
22
The bad - Sub-directory GitHub links
23
The bad - Sub-directory GitHub links
23
The ugly
24
The ugly
25
The ugly
25
Company Webpage
The ugly
25
Company Webpage
GitHub organization profile
The ugly
25
Company Webpage
GitHub organization profile
GEM_NAME
GEM_
NAME
GEM_REPOSITORY
How to automate
this process?
GitHub link?
Repository
link?
No
Yes
GitHub page?
Link exists?
Yes
No
Link to
GitHub
repository?
IRRELEVANT
Repository
found?
Yes
No
DELETED
REPOSITORY
Same repository
name (GitHub
API)?Yes
No
RENAMED
REPOSITORY
GOODYes
No
IRRELEVANT
User/project
page?
Yes
No IRRELEVANT
USER PROFILE/
PROJECT PAGE
Yes
No
IRRELEVANT
COMPANY
PAGE
Yes
No
26
How to automate this process?
Package
manager
External
Source
Software
repository
27
GitHub links
Investigate external sources
How to automate this process?
Package
manager
External
Source
Software
repository
27
GitHub links
Investigate external sources
The RubyGems ecosystem
Category Packages
Empty Fields 19,645
Good 69,740
Bad
Irrelevant links 8,650
Invalid GitHub links
(deleted)
11,153
Invalid GitHub links
(renamed)
7,069
GitHub pages 401
GitHub user profiles 858
GitHub sub-directories 172
Ugly (candidates) 3,016
28
120,704
packages
Improve and fully automate this process
Verify our process with other ecosystems
Provide support for additional platforms
Merge multi-platform activity
What next?
29
http://www.econst.eu
@eleni_const

More Related Content

Similar to Automatic link extraction: The good, the bad and the ugly in software ecosystem mining

Hidden Speed Bumps on the Road to "Continuous"
Hidden Speed Bumps on the Road to "Continuous"Hidden Speed Bumps on the Road to "Continuous"
Hidden Speed Bumps on the Road to "Continuous"
Sonatype
 
Hacktoberfest 2020 - Open source for beginners
Hacktoberfest 2020 - Open source for beginnersHacktoberfest 2020 - Open source for beginners
Hacktoberfest 2020 - Open source for beginners
DeepikaRana30
 
Open Source Collaboration in Drug Discovery in Pharma
Open Source Collaboration in Drug Discovery in PharmaOpen Source Collaboration in Drug Discovery in Pharma
Open Source Collaboration in Drug Discovery in Pharma
Kees van Bochove
 
Better Data for a Better World
Better Data for a Better WorldBetter Data for a Better World
Better Data for a Better World
Rothamsted Research, UK
 
Will Postgres Live Forever?
Will Postgres Live Forever?Will Postgres Live Forever?
Will Postgres Live Forever?
EDB
 
Building a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache SparkBuilding a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache Spark
Databricks
 
GitHub Universe: 2019: Exemplars, Laggards, and Hoarders A Data-driven Look a...
GitHub Universe: 2019: Exemplars, Laggards, and Hoarders A Data-driven Look a...GitHub Universe: 2019: Exemplars, Laggards, and Hoarders A Data-driven Look a...
GitHub Universe: 2019: Exemplars, Laggards, and Hoarders A Data-driven Look a...
Gene Kim
 
Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps (at PETS 2016)
Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps (at PETS 2016)Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps (at PETS 2016)
Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps (at PETS 2016)
Hamza Harkous
 
The adoption of FOSS workfows in commercial software development: the case of...
The adoption of FOSS workfows in commercial software development: the case of...The adoption of FOSS workfows in commercial software development: the case of...
The adoption of FOSS workfows in commercial software development: the case of...dmgerman
 
i5k Workspace Workshop - AGS2017
i5k Workspace Workshop - AGS2017i5k Workspace Workshop - AGS2017
i5k Workspace Workshop - AGS2017
Monica Poelchau
 
UC San Diego Campus LISA 2014 - Source Code Management
UC San Diego Campus LISA 2014 - Source Code ManagementUC San Diego Campus LISA 2014 - Source Code Management
UC San Diego Campus LISA 2014 - Source Code Management
Matthew Critchlow
 
Software Security Assurance for Devops
Software Security Assurance for DevopsSoftware Security Assurance for Devops
Software Security Assurance for Devops
Jerika Phelps
 
Software Security Assurance for DevOps
Software Security Assurance for DevOpsSoftware Security Assurance for DevOps
Software Security Assurance for DevOps
Black Duck by Synopsys
 
Markings of a Healthy OSS Project
Markings of a Healthy OSS ProjectMarkings of a Healthy OSS Project
Markings of a Healthy OSS Project
Clement Ho
 
Seven Deadly Sins against Research Software Sustainability
Seven Deadly Sins against Research Software SustainabilitySeven Deadly Sins against Research Software Sustainability
Seven Deadly Sins against Research Software Sustainability
anpawlik
 
ScienceCloud: Collaborative Workflows in Biologics R&D
ScienceCloud: Collaborative Workflows in Biologics R&DScienceCloud: Collaborative Workflows in Biologics R&D
ScienceCloud: Collaborative Workflows in Biologics R&D
BIOVIA
 
Council of Science Editors - Viewing Social Media Through Different Lenses
Council of Science Editors - Viewing Social Media Through Different LensesCouncil of Science Editors - Viewing Social Media Through Different Lenses
Council of Science Editors - Viewing Social Media Through Different Lenses
Darrell W. Gunter
 
Divine and felonios cyber security devopsdays austin 2018
Divine and felonios cyber security  devopsdays austin 2018Divine and felonios cyber security  devopsdays austin 2018
Divine and felonios cyber security devopsdays austin 2018
John Willis
 
Better Software, Better Research
Better Software, Better ResearchBetter Software, Better Research
Better Software, Better Research
Carole Goble
 

Similar to Automatic link extraction: The good, the bad and the ugly in software ecosystem mining (20)

Hidden Speed Bumps on the Road to "Continuous"
Hidden Speed Bumps on the Road to "Continuous"Hidden Speed Bumps on the Road to "Continuous"
Hidden Speed Bumps on the Road to "Continuous"
 
Hacktoberfest 2020 - Open source for beginners
Hacktoberfest 2020 - Open source for beginnersHacktoberfest 2020 - Open source for beginners
Hacktoberfest 2020 - Open source for beginners
 
Open Source Collaboration in Drug Discovery in Pharma
Open Source Collaboration in Drug Discovery in PharmaOpen Source Collaboration in Drug Discovery in Pharma
Open Source Collaboration in Drug Discovery in Pharma
 
Better Data for a Better World
Better Data for a Better WorldBetter Data for a Better World
Better Data for a Better World
 
Will Postgres Live Forever?
Will Postgres Live Forever?Will Postgres Live Forever?
Will Postgres Live Forever?
 
Building a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache SparkBuilding a Distributed Collaborative Data Pipeline with Apache Spark
Building a Distributed Collaborative Data Pipeline with Apache Spark
 
GitHub Universe: 2019: Exemplars, Laggards, and Hoarders A Data-driven Look a...
GitHub Universe: 2019: Exemplars, Laggards, and Hoarders A Data-driven Look a...GitHub Universe: 2019: Exemplars, Laggards, and Hoarders A Data-driven Look a...
GitHub Universe: 2019: Exemplars, Laggards, and Hoarders A Data-driven Look a...
 
Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps (at PETS 2016)
Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps (at PETS 2016)Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps (at PETS 2016)
Dissecting and Mitigating the Privacy Risk of Personal Cloud Apps (at PETS 2016)
 
The adoption of FOSS workfows in commercial software development: the case of...
The adoption of FOSS workfows in commercial software development: the case of...The adoption of FOSS workfows in commercial software development: the case of...
The adoption of FOSS workfows in commercial software development: the case of...
 
i5k Workspace Workshop - AGS2017
i5k Workspace Workshop - AGS2017i5k Workspace Workshop - AGS2017
i5k Workspace Workshop - AGS2017
 
How's it Going?
How's it Going?How's it Going?
How's it Going?
 
UC San Diego Campus LISA 2014 - Source Code Management
UC San Diego Campus LISA 2014 - Source Code ManagementUC San Diego Campus LISA 2014 - Source Code Management
UC San Diego Campus LISA 2014 - Source Code Management
 
Software Security Assurance for Devops
Software Security Assurance for DevopsSoftware Security Assurance for Devops
Software Security Assurance for Devops
 
Software Security Assurance for DevOps
Software Security Assurance for DevOpsSoftware Security Assurance for DevOps
Software Security Assurance for DevOps
 
Markings of a Healthy OSS Project
Markings of a Healthy OSS ProjectMarkings of a Healthy OSS Project
Markings of a Healthy OSS Project
 
Seven Deadly Sins against Research Software Sustainability
Seven Deadly Sins against Research Software SustainabilitySeven Deadly Sins against Research Software Sustainability
Seven Deadly Sins against Research Software Sustainability
 
ScienceCloud: Collaborative Workflows in Biologics R&D
ScienceCloud: Collaborative Workflows in Biologics R&DScienceCloud: Collaborative Workflows in Biologics R&D
ScienceCloud: Collaborative Workflows in Biologics R&D
 
Council of Science Editors - Viewing Social Media Through Different Lenses
Council of Science Editors - Viewing Social Media Through Different LensesCouncil of Science Editors - Viewing Social Media Through Different Lenses
Council of Science Editors - Viewing Social Media Through Different Lenses
 
Divine and felonios cyber security devopsdays austin 2018
Divine and felonios cyber security  devopsdays austin 2018Divine and felonios cyber security  devopsdays austin 2018
Divine and felonios cyber security devopsdays austin 2018
 
Better Software, Better Research
Better Software, Better ResearchBetter Software, Better Research
Better Software, Better Research
 

Recently uploaded

Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
sonaliswain16
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Studia Poinsotiana
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
Nistarini College, Purulia (W.B) India
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
zeex60
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
Wasswaderrick3
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
yusufzako14
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
AlaminAfendy1
 
S.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary levelS.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary level
ronaldlakony0
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
RenuJangid3
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
fafyfskhan251kmf
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 

Recently uploaded (20)

Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
in vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptxin vitro propagation of plants lecture note.pptx
in vitro propagation of plants lecture note.pptx
 
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
In silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptxIn silico drugs analogue design: novobiocin analogues.pptx
In silico drugs analogue design: novobiocin analogues.pptx
 
S.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary levelS.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary level
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 

Automatic link extraction: The good, the bad and the ugly in software ecosystem mining