The paper "ModelMine: A Tool to Facilitate Mining Models from Open Source Repositories" was presented by Sayed Mohsin Reza at the ACM/IEEE 23rd International Conference on Model Driven Engineering Languages and Systems (MODELS 2020) in Montreal, Canada.
Tool URL: https://www.smreza.com/projects/modelmine
Abstract:
Mining Software Repositories (MSR) has opened up new pathways and rich sources of data for research and practical purposes. This research discipline facilitates mining data from open source repositories and analyzing software defects, development activities, processes, patterns, and more. Contemporary mining tools are geared towards data extraction and analysis, primarily from textual artifacts, and have limitations in representation, ranking, and availability. This paper presents ModelMine, a novel mining tool that focuses on mining model-based artifacts and designs from open source repositories. ModelMine is designed particularly to mine software repositories, artifacts, and commit history to uncover information about software designs and practices in open-source communities. ModelMine supports features that include identification and ranking of open source repositories based on the extent of the presence of model-based artifacts, and querying repositories to extract models and design artifacts based on customizable criteria. It supports phase-by-phase caching of intermediate results to speed up processing and enable efficient mining of data. We compare ModelMine against a state-of-the-art tool named PyDriller in terms of performance and usability. The results show that ModelMine has the potential to become instrumental for cross-disciplinary research that combines modeling and design with repository mining and artifact extraction.
1. Sayed Mohsin Reza
ModelMine: A Tool to Facilitate Mining Models from Open Source Repositories
Sayed Mohsin Reza, Omar Badreddin, and Khandoker Rahad
Presenter
Sayed Mohsin Reza
PhD Student
Department of Computer Science
University of Texas at El Paso, USA
Email: sreza3@miners.utep.edu
Website: https://www.smreza.com/
Tool available at https://www.smreza.com/projects/modelmine/
2. Introduction & Background
• Mining Software Repositories (MSR) has witnessed tremendous growth in
the past few years
• MSR contributes to establishing research agendas in software
development, cost estimation, testing, quality assurance and more
• MSR contributes to analyzing software defects, development activities,
processes, patterns, and more.
Outcome
Reducing Software Development Cost
Better Software Design
Better Software Quality
3. Problem & Motivation
• Few tools target mining models and design artifacts from open source repositories
• Limitations of existing tools
  • Both repositories and mining tools focus on textual artifacts
  • Data representation
  • Limited search criteria
  • Limited ranking scope
• We faced these problems while writing the paper “The Human in MDE Loop: A Case Study on Integrating Handwritten Code in Model-Driven Engineering Repositories”, recently accepted by a journal.
4. Existing Tools
• Metric Miner (https://github.com/Woutrrr/metricminer2)
Capabilities: mining commits and modifications; exporting results in CSV format
Limitations: search functionality, result ranking, and the need for Java coding knowledge
• Qualitas Corpus (http://qualitascorpus.com/)
A collection of software systems intended for empirical studies of code artefacts
Limitations: updates of the systems, search functionality
• GHTorrent (https://ghtorrent.org/)
Pros: provides GitHub repositories with extracted metadata
Cons: provides GitHub repositories only in a static way
5. Contributions
This paper presents a novel model mining tool called ModelMine
• Facilitates mining models with various search criteria
• Faster data extraction for non-textual artifacts
• User-friendly tool available to non-MSR experts
• Supports search in the following mining areas
  1. Model-based Repository Search – available in some tools
  2. Model-based Artifact Search – novel feature
  3. Commit History Search – available in PyDriller and Metric Miner
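The model-based repository search can be pictured as ranking repositories by how many model artifacts they contain. The following is a minimal sketch under stated assumptions, not ModelMine's actual ranking: the function names, the input shape (a mapping from repository name to file paths), and the extension set beyond .uml are all hypothetical.

```python
from pathlib import PurePosixPath

# Hypothetical set of model file extensions; the slides only name .uml explicitly.
MODEL_EXTENSIONS = {".uml", ".xmi", ".ecore"}


def count_model_artifacts(paths):
    """Count the paths whose extension marks them as a model artifact."""
    return sum(
        1 for p in paths if PurePosixPath(p).suffix.lower() in MODEL_EXTENSIONS
    )


def rank_repositories(repo_files):
    """Order repositories by model-artifact count, highest first.

    repo_files maps a repository name to the list of file paths it contains.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = {
        name: count_model_artifacts(paths) for name, paths in repo_files.items()
    }
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```

Calling rank_repositories with a mapping of repository names to their file lists returns (name, count) pairs, so a caller can show the count next to each ranked repository.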
7. Tool Demonstration
Link: https://www.smreza.com/projects/modelmine/
Live Demo
Upcoming Topics
• Evaluation of the tool
• Results of evaluation
• Conclusion
8. Evaluation
Comparative analysis with PyDriller
1. Performance analysis: execution time and memory consumption
2. Usability analysis: how easy the tool is to learn
  • Ten participants working in software engineering research
    • Eight doctoral students
    • Two master’s students in computer science
Evaluation Forms: https://forms.gle/kJcWASsKM13AHh9a6
9. Evaluation Tasks
1. Task 1 (size related): Retrieve the list of repositories with at least one UML model and a repository size > 30 MB.
2. Task 2 (time related): Retrieve the list of repositories with at least one UML model that were created between January 2019 and December 2019.
3. Task 3 (file property related): Retrieve the list of artifacts with the .uml file extension.
4. Task 4 (commit related): Retrieve the list of commits with a model artifact.
5. Task 5 (file property + commit related): Retrieve the list of commits with any model artifacts (any model-based file extension).
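To make the commit-related tasks concrete, here is a small sketch of filtering a commit history by the extensions of the files each commit touches. It is an illustration only: the Commit record, the helper names, and the .ecore extension are hypothetical, and it uses neither PyDriller's nor ModelMine's API. Reading Task 4 as "a single model extension" and Task 5 as "a broader set" is one possible interpretation of the task list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Commit:
    """Hypothetical, simplified commit record: a hash plus the paths it modified."""
    sha: str
    modified_paths: List[str] = field(default_factory=list)


def has_extension(path, extensions):
    """Return True if the path ends with one of the given extensions (case-insensitive)."""
    return any(path.lower().endswith(ext) for ext in extensions)


def commits_with_models(commits, extensions):
    """Keep the commits that touch at least one file with a matching extension."""
    return [
        c for c in commits
        if any(has_extension(p, extensions) for p in c.modified_paths)
    ]
```

Passing {".uml"} narrows the result to commits touching UML files, while passing a larger set of extensions widens it to any model-based file type the caller chooses to include.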
10. Performance Analysis Results
Performance results show that PyDriller takes more time and memory than ModelMine.
• PyDriller downloads the whole Git repository and then mines commit information.
• ModelMine fetches the information directly, without downloading any file, and has no intermediate process.
Figure: Performance evaluation
results
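One way to picture fetching information directly instead of cloning is to express the size and date criteria from Tasks 1 and 2 as a remote search query. The sketch below builds a URL for GitHub's repository search endpoint using its size and created qualifiers. Whether ModelMine uses this endpoint is an assumption, since the slides do not name the API; the function name and parameters are hypothetical, and the "at least one UML model" condition would still need a separate check that is not shown here.

```python
from urllib.parse import urlencode

# GitHub's repository search endpoint. Using it here is an assumption:
# the slides do not say which API ModelMine calls.
SEARCH_URL = "https://api.github.com/search/repositories"


def repo_search_url(min_size_mb=None, created_from=None, created_to=None):
    """Build a repository search URL from size and creation-date criteria."""
    qualifiers = []
    if min_size_mb is not None:
        # GitHub's size qualifier is in kilobytes; 1 MB = 1024 KB is assumed here.
        qualifiers.append(f"size:>{min_size_mb * 1024}")
    if created_from and created_to:
        # Dates are ISO strings (YYYY-MM-DD) joined into an inclusive range.
        qualifiers.append(f"created:{created_from}..{created_to}")
    return f"{SEARCH_URL}?{urlencode({'q': ' '.join(qualifiers)})}"
```

The function only builds the URL and performs no network request, so the query can be inspected before any call is made.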
11. Usability Analysis Results
• ModelMine scores better than PyDriller on all usability criteria.
• In the user interface and learning curve categories, ModelMine's ratings are 50% higher than PyDriller's.
• One participant commented that ModelMine provides a faster learning experience than PyDriller due to its easy UI design and better readability.
Figure: Usability study results
12. Conclusion
• Mines models and other non-textual artifacts from open source repositories
• User-friendly tool for non-experts
• Performance superior to PyDriller in the evaluation
• Tool available at https://www.smreza.com/projects/modelmine/
Questions?