Evolving Software Ecosystems
Marktoberdorf Summer School 2014

Lecture 2
Tom Mens
Software Engineering Lab

MOD2014-Mens-Lecture2

This is my second in a series of 4 lectures on the topic of Evolving Software Ecosystems, presented during the NATO Marktoberdorf 2014 Summer School on Dependable Software System Engineering in Germany, August 2014.

Published in: Education

  1. Evolving Software Ecosystems
     Marktoberdorf Summer School 2014, Lecture 2
     Tom Mens, Software Engineering Lab, University of Mons
     informatique.umons.ac.be/genlog
  2. Software Evolution
  3. Software Evolution: Lehman’s Laws
     • Manny Lehman (1925-2010)
       – Studied the 30-year evolution of the IBM OS/360 mainframe
       – Proposed “laws” that reflect established observations based on empirical evidence
       – EPSRC-funded FEAST project
         • Additional evidence on more industrial software projects
     Lehman and Belady (1985). Software Evolution – Processes of Software Change. Academic Press.
     Lehman (1997). Laws of Software Evolution Revisited. Springer LNCS 1149, pp. 108-124.
  4. Software Evolution: Lehman’s Laws
     • Continuing change
       – A […] program that is used in a real-world environment must be continually adapted, else it becomes progressively less satisfactory.
     • Increasing complexity
       – As a program is evolved its complexity increases unless work is done to maintain or reduce it.
     • Continuing growth
       – Functional content of a program must be continually increased to maintain user satisfaction over its lifetime.
     • Declining quality
       – […] programs will be perceived as of declining quality unless rigorously maintained and adapted to a changing operational environment.
     • Feedback system
       – […] programming processes constitute multi-loop, multi-level feedback systems and must be treated as such to be successfully modified or improved.
  5. Software Evolution: Relevant Books
     (July-August 2014, NATO Marktoberdorf Summer School, Dependable Software Systems Engineering; February 2014, CSMR-WCRE Software Evolution Week, Antwerp, Belgium)
     2006
     Consider the software evolution process as a multi-loop, multi-level feedback system
     • Reports on results from the EPSRC-funded FEAST project
     • Supporting empirical evidence for Lehman’s laws of software evolution
  6. Software Evolution: Relevant Books
     2008
     Relevant chapters
     • Analyzing Software Repositories to Understand Software Evolution (D’Ambros et al.)
     • Predicting Bugs From History (Zimmermann et al.)
     • Empirical Studies of Open Source Evolution (Fernandez-Ramil et al.)
  7. Software Evolution: Relevant Books
     Mens, Tom; Serebrenik, Alexander; Cleve, Anthony (Eds.), 2014, XXIII, 404 p. Springer, ISBN 978-3-642-45398-4
     Chapter 10: Studying Evolving Software Ecosystems based on Ecological Models
     Tom Mens, Maëlick Claes, Philippe Grosjean and Alexander Serebrenik
     Research on software evolution is very active, but evolutionary principles, models and theories that properly explain why and how software systems evolve over time are still lacking. Similarly, more empirical research is needed to understand how different software projects co-exist and co-evolve, and how contributors collaborate within their encompassing software ecosystem. In this chapter, we explore the differences and analogies between natural ecosystems and biological evolution on the one hand, and software ecosystems and software evolution on the other hand. The aim is to learn from research in ecology to advance the understanding of evolving software ecosystems. Ultimately, we wish to use such knowledge to derive diagnostic tools aiming to analyse and optimise the fitness of software projects in their environment, and to help software project communities in managing their projects better.
  8. Software Ecosystems: Definitions
  9. Software Ecosystems: Relevant Books
     • MIT Press, 2005
     • 2013
  10. Software Ecosystems: Relevant PhD Dissertations
      • Reverse Engineering Software Ecosystems. Doctoral dissertation submitted to the Faculty of Informatics of the University of Lugano in partial fulfillment of the requirements for the degree of Doctor of Philosophy, presented by Mircea F. Lungu under the supervision of Michele Lanza, September 2009.
      • Social Aspects of Collaboration in Online Software Communities. Bogdan Vasilescu, Eindhoven University of Technology, 2014.
  11. Software Ecosystems: Definitions
      • Messerschmitt & Szyperski, 2003 [book]
        – “a collection of software products that have some given degree of symbiotic relationships.”
  12. Software Ecosystems: Definitions
      • Messerschmitt & Szyperski, 2003 [book]
        – “a collection of software products that have some given degree of symbiotic relationships.”
      • Lungu, 2008 [dissertation]
        – “a collection of software projects that are developed and evolve together in the same environment.”
  13. Software Ecosystems: Definitions
      • Messerschmitt & Szyperski, 2003 [book]
        – “a collection of software products that have some given degree of symbiotic relationships.”
      • Lungu, 2008 [dissertation]
        – “a collection of software projects that are developed and evolve together in the same environment.”
      • Jansen et al., 2013 [book]
        – “a set of actors functioning as a unit and interacting with a shared market for software and services, together with the relationships among them.”
  14. Software Ecosystems: Definitions
      Business-oriented view
      • “a set of actors functioning as a unit and interacting with a shared market for software and services, together with the relationships among them.”
      Examples
      • Eclipse
      • Android and iOS app store
  15. Software Ecosystems: Definitions
      Development-centric view
      • “a collection of software products that have some given degree of symbiotic relationships.”
      • “a collection of software projects that are developed and evolve together in the same environment.”
      Examples
      • Gnome, KDE
      • Debian, Ubuntu
      • R’s CRAN
      • Apache
  16. Software Ecosystems: Definitions
      Socio-technical view
      • a community of persons (end-users, developers, debuggers, …) contributing to a collection of projects
      [Figure: Project 1, Project 2, Project 3]
  17. Software Ecosystems: Definitions
      Ecosystem <> System of systems (cf. John McDermid)
      An ecosystem is a set of systems that is “designed as a whole”. These systems
      • cannot function in isolation (symbiotic relationships)
      • are usually very diverse
      • function together as a unit
      • are evolved together towards a common (but evolving) goal
  18. Software Ecosystems: Challenges
  19. Software Ecosystem Analysis: Challenges
      Empirically analysing software ecosystems involves many challenges
      • Technical challenges
      • Scientific challenges
      • Practical challenges
      • Ethical challenges
      • …
  20. Software Ecosystem Analysis: Challenges
      Technical challenges
      • Extracting and combining data from different sources
      • Identity merging
      • Dealing with inconsistent and incomplete data
      • Big data analytics
        – special skills and tools needed to store, process and analyse huge amounts of data
      [Figure: Project 1, Project 2, Project 3]
  21. Software Ecosystem Analysis: Challenges
      Scientific challenges
      • Accessibility of data
        – E.g. many apps in Google Play are proprietary and historical information is not accessible
        – Focus on open source software
      • Reproducibility of results
      • Generalisability of results
      • Which research methodology, which metrics, which statistical tools, …
  22. Software Ecosystem Analysis: Challenges
      Practical challenges
      • How can we share our big data with other researchers?
        – Different formats, different tools, storage problems, …
      • How can we make our research results useful to practitioners and development communities?
      • How can we build tools and dashboards that integrate our findings?
  23. Software Ecosystem Analysis: Challenges
      Ethical challenges
      • Privacy issues
        – Can we use and combine information about actual developers?
        – Can we make these results freely available?
      • How to reconcile privacy with reproducibility?
      [Figure: Privacy versus Reproducibility]
  24. Technical Challenges: Extracting data from different sources
      • Source code and other commits stored in version control repositories (e.g., Subversion, Git)
      • Developer mailing lists and user mailing lists
      • Bug reports and change requests stored in issue tracking systems (e.g., Bugzilla, JIRA)
      • Question and Answer websites (e.g., StackOverflow)
  25. Technical Challenges: Extracting data from different sources
      Using the open source MetricsGrimoire tool suite (https://github.com/MetricsGrimoire)
      • CVSAnalY: extracts information from SVN or Git source code repository logs and stores it in a relational database
      • MailingListStats: extracts mailing list information from mbox format
      • Bicho: extracts information from issue tracking systems such as Bugzilla and JIRA
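To make the extraction step concrete, here is a minimal sketch of collecting contributor aliases from version control logs. It is not how CVSAnalY works internally; it only assumes that author lines of the form `Name <email>` are available, for instance from `git log --format='%an <%ae>'`. The function name `parse_author_lines` and the sample log are hypothetical.

```python
import re
from collections import Counter

# One "Name <email>" line per commit, e.g. from: git log --format='%an <%ae>'
AUTHOR_LINE = re.compile(r"^\s*(?P<name>.*?)\s*<(?P<email>[^<>]*)>\s*$")

def parse_author_lines(log_text):
    """Return a Counter mapping (name, email) aliases to commit counts."""
    aliases = Counter()
    for line in log_text.splitlines():
        match = AUTHOR_LINE.match(line)
        if match:  # blank or malformed lines are skipped
            aliases[(match.group("name"), match.group("email"))] += 1
    return aliases

# Hypothetical log excerpt, for illustration only.
sample_log = """\
John Doe <john@doe.org>
john <john.doe@gmail.com>
John Doe <john@doe.org>
"""
aliases = parse_author_lines(sample_log)
```

The resulting (name, email) pairs are the raw aliases that identity merging then has to reconcile.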
  26. Technical Challenges: Identity merging
      The same contributor may use different aliases
      • Euphegenia Doubtfire, euphegenia@hotmail.com
      • Robin Williams, robinw@gmail.com
  27. Technical Challenges: Identity merging
      [Figure: contributors and their aliases across repositories (source code repository, mailing list, bug tracker). Aliases shown: john; John Smith; john <js@gmail.com>; john@doe.org; johnny; “John, Doe”; “Doe, John”; john.doe@gmail.com; john_doe@hotmail.com; jdoe@gmail.com; John W. Doe; Jane]
  28. (6-3-2013) Examples of alias pairs, by kind of difference:
      • Ordering
        – Rajesh Sola vs. Sola Rajesh
      • Spelling: misspelling, diacritics, punctuation
        – Rene Engelhard vs. Fene Engelhard
        – Démurget vs. Demurget
        – J. A. M. Carneiro vs. J A M Carneiro
      • Middle initials, patronyms, nicknames, additional surnames, incomplete names
        – Daniel M. Mueth vs. Daniel Mueth
        – Alexander Alexandrov Shopov vs. Alexander Shopov
        – Carlos Garnacho Parro vs. Carlos Garnacho
        – Jacob “Ulysses” Berkman vs. Jacob Berkman
        – A S Alam vs. Amanpreet Singh Alam
      • Name variants: transliteration, diminutives
        – Γιωργοσ vs. Georgios
        – Mike Gratton vs. Michael Gratton
      • Software-specific: usernames, projects, tooling artefacts
        – mrhappypants vs. Aaron Brown
        – Arturo Tena/libole2 vs. Arturo Tena
        – (16:06) Alex Roberts vs. Alex Roberts
      • Mix
        – Any combination of those
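Some of the differences listed above (ordering, diacritics, punctuation, and tooling artefacts such as a leading timestamp) can be removed by normalising names before comparing them. The sketch below is one possible set of rules, not taken from the lecture; `normalize_alias` is a hypothetical helper. It does not handle misspellings, nicknames, transliteration or usernames.

```python
import re
import unicodedata

def normalize_alias(name):
    """Map a contributor name to a canonical key (illustrative rules only)."""
    # Drop a tooling artefact such as a leading "(16:06)" timestamp.
    name = re.sub(r"^\(\d{1,2}:\d{2}\)\s*", "", name)
    # Strip diacritics: decompose characters, then drop combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    name = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces and lowercase.
    name = re.sub(r"[^\w\s]", " ", name).lower()
    # Sort the tokens so that word order is ignored.
    return " ".join(sorted(name.split()))
```

With these rules, "Rajesh Sola" and "Sola Rajesh" map to the same key, as do "Démurget" and "Demurget", while "Rene Engelhard" and "Fene Engelhard" still differ and need a similarity measure.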
  29. Technical Challenges: Identity merging
      id = 17 { John Doe, Doe John, john@doe.org, john_doe@hotmail.com, john.doe@gmail.com }
      Semi-automatic approach:
      • eliminate specific quirks observed during extraction
        – Example: “(16:06) Alex Roberts”
      • compute similarity between each pair of aliases (based on Levenshtein distance)
      • cluster together aliases with high similarity
      • post-process manually
        – rely on external information (websites)
        – precise but labor-intensive
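The automatic part of the steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the lecture: the similarity is the Levenshtein distance normalised by the longer string, the threshold of 0.8 is an arbitrary choice, and clustering is done by transitively merging similar pairs with a union-find structure. The manual post-processing step is not covered. All function names are hypothetical.

```python
import re
from itertools import combinations

def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, replace)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # replace or match
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Normalised similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest

def clean(alias):
    """Remove an extraction quirk (leading timestamp) and lowercase."""
    return re.sub(r"^\(\d{1,2}:\d{2}\)\s*", "", alias).strip().lower()

def merge_identities(aliases, threshold=0.8):
    """Cluster aliases whose pairwise similarity reaches the threshold."""
    parent = {a: a for a in aliases}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(aliases, 2):
        if similarity(clean(a), clean(b)) >= threshold:
            parent[find(a)] = find(b)  # union the two clusters

    clusters = {}
    for a in aliases:
        clusters.setdefault(find(a), set()).add(a)
    return list(clusters.values())

aliases = ["John Doe", "john doe", "(16:06) Alex Roberts", "Alex Roberts", "Jane"]
clusters = merge_identities(aliases)
```

Because merging is transitive, one spurious similar pair can join two unrelated clusters, which is one reason the manual post-processing step matters.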
  30. Technical Challenges: Identity merging
      Levenshtein distance (1965):
      • Computes the minimal distance between 2 strings in terms of single character edits (deletion, addition or replacement)
      • Example: lev(“Mike”, “Michael”) = 4
        – “Mike” => “Mice” => “Miche” => “Michae” => “Michael”
  31. Technical Challenges: Identity merging
      Levenshtein distance (1965):
      • Computes the minimal distance between 2 strings in terms of single character edits (deletion, addition or replacement)
      • Example: lev(“Mike”, “Michael”) = 4
        – “Mike” => “Mice” => “Miche” => “Michae” => “Michael”
      • Side note
        – Damerau-Levenshtein distance also considers transposition of adjacent characters
        – Applied in biology for DNA sequence alignment
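The distance described above can be computed with the standard dynamic-programming table. The sketch below is one straightforward way to write it; the optional `transpositions` flag (a name chosen here for illustration) switches to the restricted Damerau-Levenshtein variant, also known as optimal string alignment, which counts a swap of two adjacent characters as a single edit.

```python
def edit_distance(a, b, transpositions=False):
    """Levenshtein distance between a and b.

    With transpositions=True, compute the restricted Damerau-Levenshtein
    (optimal string alignment) distance instead.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] = distance between the first i chars of a and first j chars of b
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # addition
                          d[i - 1][j - 1] + cost)  # replacement or match
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]
```

For the slide's example, `edit_distance("Mike", "Michael")` gives 4: one replacement (k to c) plus three additions (h, a, l). A typo such as "Jonh" for "John" costs 2 edits without transpositions and 1 with them.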
  32. Technical Challenges: Identity merging
      • several merge algorithms exist
      • the “noisier” the data, the worse they perform!
      • simple algorithms have higher precision and recall than more complex ones
      Source: Mathieu Goeminne, Tom Mens. A Comparison of Identity Merge Algorithms for Software Repositories. Institut d’Informatique, Faculté des Sciences, Université de Mons. Science of Computer Programming 28(8), August 2013.
      Abstract: Software repository mining research extracts and analyses data originating from multiple software repositories to understand the historical development of software systems, and to propose better ways to evolve such systems in the future. Of particular interest is the study of the activities and interactions between the persons involved in the software development process. The main challenge with such studies lies in the ability to determine the identities (e.g., logins or e-mail accounts) in software repositories that represent the same physical person. To achieve this, different identity merge algorithms have been proposed in the past. This article provides an objective comparison of identity merge algorithms, including some improvements over existing algorithms. The results are validated on a selection of large ongoing open source software projects.
      Keywords: software repository mining, empirical software engineering, identity merging, open source, software evolution, comparison
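Precision and recall of a merge algorithm can be measured in several ways. One common option, sketched here as an assumption rather than as the definition used in the cited article, compares the pairs of aliases that the algorithm puts in the same identity with the pairs in a manually built reference. The function names and the sample data are hypothetical.

```python
from itertools import combinations

def merged_pairs(clusters):
    """All unordered alias pairs that a clustering puts in the same identity."""
    pairs = set()
    for cluster in clusters:
        # Sorting gives each pair a canonical order, so sets can be compared.
        for a, b in combinations(sorted(cluster), 2):
            pairs.add((a, b))
    return pairs

def precision_recall(predicted, reference):
    """Pairwise precision and recall of a predicted merge vs. a reference."""
    pred, ref = merged_pairs(predicted), merged_pairs(reference)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 1.0
    recall = correct / len(ref) if ref else 1.0
    return precision, recall

# Hypothetical reference (ground truth) and algorithm output.
reference = [{"John Doe", "Doe John", "john@doe.org"}, {"Jane"}]
predicted = [{"John Doe", "Doe John"}, {"john@doe.org", "Jane"}]
precision, recall = precision_recall(predicted, reference)
```

In this example the algorithm proposes two pairs, of which one is correct (precision 1/2), and finds one of the three reference pairs (recall 1/3).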
  33. Technical Challenges: Identity merging
      Alternative automated approach
      • Use of Latent Semantic Analysis (LSA)
      • equally good as other algorithms in the average case
      • better performance in the worst case
      Source: Erik Kouters, Bogdan Vasilescu, Alexander Serebrenik, Mark G. J. van den Brand. Who’s who in GNOME: using LSA to merge software repository identities. ICSM 2012, ERA track. Technische Universiteit Eindhoven, Den Dolech 2, P.O. Box 513, 5600 MB Eindhoven, The Netherlands. e.t.m.kouters@student.tue.nl, {b.n.vasilescu, a.serebrenik, m.g.j.v.d.brand}@tue.nl
      Abstract: Understanding an individual’s contribution to an ecosystem often necessitates integrating information from multiple repositories corresponding to different projects within the ecosystem or different kinds of repositories (e.g., mail archives and version control systems). However, recognising that different contributions belong to the same contributor is challenging, since developers may use different aliases. It is known that existing identity merging algorithms are sensitive to large discrepancies between the aliases used by the same individual: the noisier the data, the worse their performance. To assess the scale of the problem for a large software ecosystem, we study all GNOME Git repositories, classify the differences in aliases, and discuss robustness of existing algorithms with respect to these types of differences. We then propose a new identity merging algorithm based on Latent Semantic Analysis (LSA), designed to be robust against more types of differences in aliases, and evaluate it empirically by means of cross-validation on GNOME Git authors. Our results show a clear improvement over existing algorithms in terms of precision and recall on worst-case input data.
      Keywords: identity merging; Gnome; latent semantic analysis
      Excerpt from the introduction (cropped on the slide; […] marks text that is cut off): One of the challenges when mining software repositories is identity merging [5]. To study contributors to software projects or software ecosystems, one often tries to integrate information about their contributions in different software repositories, such as version control systems, bug trackers, or mailing lists. However, developers may use different aliases […] To integrate information about individual contributions, we therefore need a unique identity representing the same contributor across different repositories and different projects. To this end, we need to use an identity merging algorithm [1, 3, 5, 8, 9]. However, performance of existing approaches degrades sharply in presence of “noisy” data, i.e., data containing large discrepancies between the aliases used by the same individual: “the more noisy and complex project data is, the worse the merge algorithms behave” […]. In this paper we concentrate on aliases used by developers in version control systems (VCS); here the term “alias” refers to a ⟨name, email⟩ tuple, typically available in VCS logs. Even for a single repository type such as VCS, the same contributor may use different aliases at different times or in different projects within the ecosystem. Our goal is to design an identity merge algorithm with improved robustness with respect to noisy data, common in ecosystems maintained by large developer communities. We start by extracting commit authorship information from all GNOME Git repositories, and discuss differences in the aliases used by GNOME developers in Section II. Next, we evaluate robustness of two state of the art identity merging algorithms with respect to types of differences in aliases in Section […]. Based on lessons learned from existing approaches, we propose a new identity merging algorithm using Latent Semantic Analysis (LSA) [6] in Section IV, and evaluate it empirically by means of cross-validation in Section […]. Our results show equally-good performance as the state […]
      Excerpt on the evaluation setup (starts mid-sentence on the slide): […] parameters, we first performed a sensitivity analysis by fixing 3 and varying the remaining. After the sensitivity analysis we restricted the range of minLen to {2, 3, 4}, levThr to {0.5, 0.75}, cosThr to {0.65, 0.70, 0.75}, and k was fixed to half of the number of terms. In the average case, for each of the ten repetitions, training was performed on one tenth of the GNOME aliases (≈ 860), and testing on ten random subsets with the same size from the remaining aliases. Samples were chosen instead of the entire remaining data for computational efficiency reasons. In the worst case, because of fewer aliases in the dataset (673), for each of the ten repetitions, training was performed on one third of the data and testing on the other two thirds. All algorithms, as well as the data, can be made available upon request.
  34. Research challenges: Accessibility
      Focus on open-source software
      • Free access to source code, defect data, developer and user communication
      • Historical data available in open repositories
        – Observable communities
        – Observable activities
      • Increasing popularity for personal and commercial use
      • A huge range of community and software sizes