Software development projects are notoriously complex and difficult to manage. Several support tools, such as issue trackers, code review tools and Source Control Management (SCM) systems, have been introduced in the past decades to ease development activities. While such tools efficiently track the evolution of a given aspect of the project (e.g., bug reports), they provide just a partial view of the project and often lack advanced querying mechanisms, limiting themselves to command-line or simple GUI support. This is particularly true for projects that rely on Git, the most popular SCM system today.
In this paper, we propose a conceptual schema for Git and an approach that, given a Git repository, exports its data to a relational database in order to (1) promote data integration with other existing SCM tools and (2) enable writing queries on Git data using standard SQL syntax. To ensure efficiency, our approach comes with an incremental propagation mechanism that refreshes the database content with the latest modifications. We have implemented our approach in Gitana, an open-source tool available on GitHub (https://github.com/SOM-Research/Gitana).
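To make the idea of querying Git data with standard SQL concrete, here is a minimal sketch. It assumes a hypothetical, heavily simplified table named `commit_log` (not Gitana's actual schema) and uses an in-memory SQLite database only so that the example is self-contained; the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical, simplified schema: one row per commit.
# This is an illustrative assumption, not Gitana's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE commit_log (
        sha          TEXT PRIMARY KEY,
        author       TEXT NOT NULL,
        committed_at TEXT NOT NULL,
        message      TEXT
    )
    """
)

# Invented sample data standing in for an imported repository.
rows = [
    ("a1", "alice", "2015-01-10", "Initial import"),
    ("b2", "bob", "2015-01-12", "Fix parser bug"),
    ("c3", "alice", "2015-02-01", "Add tests"),
]
conn.executemany("INSERT INTO commit_log VALUES (?, ?, ?, ?)", rows)


def commits_per_author(conn):
    """Return (author, commit count) pairs, most active author first."""
    query = """
        SELECT author, COUNT(*) AS n
        FROM commit_log
        GROUP BY author
        ORDER BY n DESC, author
    """
    return conn.execute(query).fetchall()


print(commits_per_author(conn))  # [('alice', 2), ('bob', 1)]
```

Once the data sits in a relational store, a question like this becomes a single aggregate query instead of a chain of command-line calls.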
6. Motivation
Issue trackers
Source Control Management systems
…
Code review tools
• They provide just a partial view of the software project
• They come with insufficient means to perform non-trivial query operations
7. Motivation
• Especially true for Git repositories
8. Gitana
Conceptual model for Git / relational database implementation
Import and incremental update processes
JSON exporter to facilitate the analysis of Git repositories in other technologies
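The import, incremental update and JSON export steps listed above could be sketched roughly as follows. This is an illustration under stated assumptions, not Gitana's implementation: the `commit_log` table, the function names and the list of dictionaries standing in for repository history are all hypothetical, and SQLite is used only to keep the example self-contained.

```python
import json
import sqlite3


def open_db():
    """Create an in-memory database with a hypothetical one-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE commit_log (sha TEXT PRIMARY KEY, author TEXT, message TEXT)"
    )
    return conn


def import_commits(conn, commits):
    """Insert only commits whose SHA is not stored yet; return how many were added."""
    known = {sha for (sha,) in conn.execute("SELECT sha FROM commit_log")}
    new = [c for c in commits if c["sha"] not in known]
    conn.executemany(
        "INSERT INTO commit_log (sha, author, message) "
        "VALUES (:sha, :author, :message)",
        new,
    )
    conn.commit()
    return len(new)


def export_json(conn):
    """Serialize the stored commits as a JSON array of objects."""
    cur = conn.execute("SELECT sha, author, message FROM commit_log ORDER BY sha")
    columns = [d[0] for d in cur.description]
    return json.dumps([dict(zip(columns, row)) for row in cur])


# Invented history standing in for what would be read from a Git repository.
history = [
    {"sha": "a1", "author": "alice", "message": "Initial import"},
    {"sha": "b2", "author": "bob", "message": "Fix parser bug"},
]
conn = open_db()
first = import_commits(conn, history)   # initial import: both commits are new

history.append({"sha": "c3", "author": "alice", "message": "Add tests"})
second = import_commits(conn, history)  # update: only the new commit is added

print(first, second)
print(export_json(conn))
```

The second call touches only the commit that was not yet in the database, which is the property that keeps later updates cheap compared with the initial import.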
10. Gitana
• Easy integration with other tools (issue trackers, etc.) that rely on a database
• Easy inspection of any Git repository
15. Evaluation
The extraction time only refers to the initial import. Once this phase is complete, the incremental mechanism takes over and minimizes the time for future imports.
Executed on a 2.6 GHz Intel Core i7 processor with 8 GB of RAM.
18. Conclusion
The bad
• The import process is slow. It should be parallelized.
• The JSON export process binds the user to the predefined output structure. The exporter should be more tunable.
• The materialized views in the database are recalculated each time the update process is triggered (not good for large repositories). Incremental maintenance on the materialized views could be applied.
The good
• Genericity. Gitana stores all the information in a Git repository.
• Flexibility. Users can perform any kind of query on the repository using SQL.
• Incrementality. Gitana includes an incremental propagation mechanism.
• Exportability. The JSON exporter makes the database information available in other technologies.
• Extensibility. Gitana can be easily integrated with other DB-based tools.
• Availability. Gitana is freely available on GitHub.
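The incremental maintenance of materialized views suggested among the drawbacks could look roughly like the sketch below. It is only one possible approach under stated assumptions: the summary table `author_commit_count` plays the role of a materialized view, both table names and the `add_commits` helper are hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical base table and a summary table acting as a materialized view.
    CREATE TABLE commit_log (sha TEXT PRIMARY KEY, author TEXT NOT NULL);
    CREATE TABLE author_commit_count (author TEXT PRIMARY KEY, n INTEGER NOT NULL);
    """
)


def add_commits(conn, commits):
    """Store new commits and apply only their delta to the summary table."""
    for sha, author in commits:
        conn.execute("INSERT INTO commit_log VALUES (?, ?)", (sha, author))
        # Increment the existing counter; create it if the author is new.
        cur = conn.execute(
            "UPDATE author_commit_count SET n = n + 1 WHERE author = ?", (author,)
        )
        if cur.rowcount == 0:
            conn.execute("INSERT INTO author_commit_count VALUES (?, 1)", (author,))
    conn.commit()


add_commits(conn, [("a1", "alice"), ("b2", "bob")])  # initial import
add_commits(conn, [("c3", "alice")])                 # later update

incremental = dict(conn.execute("SELECT author, n FROM author_commit_count"))
recomputed = dict(
    conn.execute("SELECT author, COUNT(*) FROM commit_log GROUP BY author")
)
print(incremental == recomputed)  # True
```

Each update costs work proportional to the number of new commits rather than to the size of the whole repository, while the summary stays equal to a full recomputation.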
20. What’s next?
• Deeper integration of all kinds of project information
• One single central (database-oriented) shared access point for all the project information, enabling lots of interesting cross-cutting queries.
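As an illustration of what such a cross-cutting query might look like, the sketch below joins commit data with issue tracker data in one database. Everything in it is an assumption made for the example: the tables `commit_log`, `issue` and `commit_issue`, their columns and the sample rows are hypothetical and do not describe an existing Gitana feature.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables: Git commits, tracker issues, and a link between them.
    CREATE TABLE commit_log (sha TEXT PRIMARY KEY, author TEXT NOT NULL);
    CREATE TABLE issue (id INTEGER PRIMARY KEY, title TEXT, status TEXT);
    CREATE TABLE commit_issue (
        sha      TEXT REFERENCES commit_log(sha),
        issue_id INTEGER REFERENCES issue(id)
    );

    -- Invented sample data.
    INSERT INTO commit_log VALUES ('a1', 'alice'), ('b2', 'bob'), ('c3', 'alice');
    INSERT INTO issue VALUES (1, 'Crash on empty repo', 'closed'),
                             (2, 'Slow import', 'open');
    INSERT INTO commit_issue VALUES ('a1', 1), ('b2', 1), ('c3', 2);
    """
)


def authors_on_open_issues(conn):
    """Return (author, issue title) pairs for commits linked to open issues."""
    query = """
        SELECT DISTINCT c.author, i.title
        FROM issue AS i
        JOIN commit_issue AS ci ON ci.issue_id = i.id
        JOIN commit_log AS c ON c.sha = ci.sha
        WHERE i.status = 'open'
        ORDER BY c.author, i.title
    """
    return conn.execute(query).fetchall()


print(authors_on_open_issues(conn))  # [('alice', 'Slow import')]
```

A question that spans both tools ("who has committed work on issues that are still open?") reduces to a single join once both data sources share one access point.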