The large body of existing open source software (OSS) artifacts provides abundant material for understanding how code is reused across the open source universe: in particular, which pieces of code are reused most often, under what circumstances developers reuse code, and so forth. Understanding this process could help with maintaining legacy software and with identifying best practices for software development. Using the change history of thousands of open source projects, we try to answer the following questions. First, how is code reused by other projects? Second, how are code files organized within a project, and how does this organization change over time? Answering these questions requires overcoming several technical difficulties. For example, because projects use different kinds of version control systems (VCSs), it is hard to devise a uniform model that can represent the evolution of the code files stored in them. In addition, each VCS may have its own data format, so extracting data from them is a major challenge. Furthermore, analyzing the version history and reuse of roughly a billion code files with current algorithms and hardware platforms is yet another challenge.