The adoption of FOSS workflows in commercial software development: the case of git and github

Transcript

  • 1. The adoption of FOSS workflows in commercial software development: the case of git and github. Daniel M. German, University of Victoria, Canada
  • 2. Open Source is everywhere
  • 3. On SSL and Heartbleed “[Heartbleed] is a software flaw that has left up to two-thirds of the world’s websites vulnerable to attack by hackers.” – The Economist
  • 4. “There is no such thing as bad publicity except your own obituary.” – Brendan Behan
  • 5. ● “Most open-source software – and Open SSL is no exception – is produced voluntarily by people who are not paid for creating it. They do it for love, professional pride or as a way of demonstrating technical virtuosity. And mostly they do it in their spare time.” – John Naughton, The Observer/The Guardian, “'Heartbleed' bug can't be simply blamed on coders”, April 13, 2014
  • 6. “Responsible corporate use of open-source software should therefore involve some measure of reciprocity: a corporation that benefits hugely from such software ought to put something back, either in the form of financial support for a particular open-source project, or – better still – by encouraging its own software people to contribute to the project.”
  • 7. “Much of the invisible backbone of websites from Google to Amazon to the Federal Bureau of Investigation was built by volunteer programmers in what is known as the open-source community.”
  • 8. “... volunteers, connected over the Internet, work together to build free software, to maintain and improve it and to look for bugs. Ideally, they check one another’s work in a peer review system similar to that found in science.”
  • 9. Linus's Law: “Given Enough Eyeballs, all Bugs are Shallow” – Eric Raymond, The Cathedral and the Bazaar
  • 10. In the case of Heartbleed “There weren't enough eyeballs” – Eric Raymond
  • 11. ● Code was created by a grad student ● Reviewed by S. Henson, core developer of OpenSSL ● Included in OpenSSL in the spring of 2011 ● Not discovered for 3 years! ● Budget of OpenSSL: – US$2,000 for 2013
  • 12. the OpenSSL problem ● important infrastructure projects that are run by small teams of volunteers ● on April 24, the Linux Foundation announces the “Core Infrastructure Initiative” to address it
  • 13. Core Infrastructure Initiative ● Funded by: – Amazon, Cisco, Dell, Facebook, Fujitsu, Google, IBM, Intel, Microsoft, NetApp, Rackspace, Qualcomm, VMware and The Linux Foundation ● Funding to core projects: – Fellowships to core developers – as well as other resources to assist the project in improving its security, enabling outside reviews, and improving responsiveness to patch requests.
  • 14. What is FOSS development? ● Most important feature of FOSS – its free or open source license ● License – Guarantees code is available to others to reuse – Becomes a social contract among participants
  • 15. What is OSS development? ● Most frequently defined as: – Self-organized teams developing software without a central authority ● Code is open for review – and reuse!!! ● Anybody can participate
  • 16. What makes OSS development possible? ● Teams of self-organized developers and contributors ● The Internet ● A common toolkit ● Version control systems
  • 17. Teams ● Come from all sectors: – Professionals and hobbyists – Paid and volunteers – Novices and Experienced – High-school students to PhDs – All over the world!!! ● Highly motivated!
  • 18. Common Toolkit ● To be able to collaborate you need a common set of tools – Programming languages ● gcc, perl, python, java, ruby, lua, php... – Editors and IDEs ● Emacs, vim, Eclipse, Netbeans... – Libraries ● boost, maven, cpan, Pypi... – Infrastructure ● Make, ant, cmake, bugzilla, etc. – Hosting infrastructure ● Sourceforge, Google Code, github, bitbucket ● They must be available at zero cost to anybody
  • 19. FOSS Toolkit ● I posit that one of the biggest influences of FOSS on the practice of Software Development is the wide use of FOSS tools for the development of software – Most implementations of popular programming languages today are open source – FOSS Editors and IDEs are widely used too
  • 20. Free Software Foundation ● The FSF had to bootstrap the development of the OSS toolkit – To build an Operating System you need a compiler – Before you build a compiler you need an editor, but you need a compiler to build an editor – gcc, emacs, bintools (ls, echo, cat, etc.), etc
  • 21. Richard Stallman Created the legal and technical infrastructure for Free and Open Source software
  • 22. on Code Reviews
  • 23. Need for Code Reviews ● Many FOSS teams discovered that to ship good quality software they needed to review the source code
  • 24. Fagan Code Inspections ● Code reviews performed at specific stages of development ● Effective, but not widely used
  • 25. Open Source style Code Reviews ● Fagan inspections were unfeasible – Required participants to be in the same room ● Instead, code reviews started to be incremental – Rather than reviewing the whole, review the delta (the patch)
  • 26. Code Reviews in FOSS
  • 27. the spectrum of Code Reviews
  • 28. code reviews in FOSS (1) early, frequent reviews (2) of small, independent, complete contributions (3) that are broadcast to a large group of stakeholders, but only reviewed by a small set of self-selected experts (4) resulting in an efficient and effective peer review technique. – Peter Rigby
  • 29. Lessons from FOSS
  • 30. on Version Control systems
  • 31. Version Control Systems ● At the beginning, FOSS used tar files in USENET – the FSF would ship physical tapes! ● Today, version control systems are the norm – Centralized or Distributed ● FOSS has a continuous and proven track record of innovation in version control systems – FOSS democratized VC
  • 32. On Version Control ● The VC is the circulatory system of software development ● It brings the code to all stakeholders ● A contribution is a patch – one or more commits
  • 33. the patch ● the patch should be reviewed ● most VCs don't support reviewing of patches
  • 34. the patch and its review ● Two models: – Commit then Review ● Review the code after it has been integrated or – Review Then Commit (RTC) ● Review the patch before it is integrated
  • 35. Linux ● Linux incorporated RTC early in its process ● Linus needed integration of Review process with VC ● No FOSS VC did it – he turned to bitkeeper
  • 36. Bitkeeper and Linux ● Symbiotic relationship – Free (as in beer) licenses to Linux developers with one big condition ● Users should not develop competing tools – Bitkeeper rapidly improved the Linux integration process ● simplified integration of reviewed code – Bitkeeper was probably influenced by Linus's workflow – in 2005 Bitkeeper revokes its license to Linux developers
  • 37. Git ● Many other distributed version control systems before it ● What makes it special? – Many features, but especially: ● Pull-requests ● git incorporates code review process with a distributed version control system – Even via email patches
  • 38. How is distributed version control software being used?
  • 39. Git ● Software engineers are moving towards git – And other DVCs ● Github a major reason
  • 40. The Promise of Git From: http://thkoch2001.github.io/whygitisbetter/
  • 41. Challenge 1 ● Personal repos are beyond reach ● Local commits might never be observable
  • 42. “History is written by the victors” Challenge 2: History
  • 43. Rebasing changes history
  • 44. Save history before it is lost!
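Why rebasing "changes history" can be illustrated with a minimal sketch. Git derives a commit id by hashing the commit object, and that object names its parent, so replaying the same change on a different parent yields a different id. The function below is a simplified illustration, not git's implementation: the tree id, parent ids, identity and timestamp are made-up placeholders, and a real rebase usually changes the tree and committer date as well.

```python
import hashlib

def commit_id(tree, parents, who, message):
    """Hash a simplified commit object: sha1 over 'commit <size>' + NUL + body."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    # Hypothetical fixed timestamp so the example is deterministic.
    lines += ["author " + who + " 1400000000 +0000",
              "committer " + who + " 1400000000 +0000",
              "",
              message]
    body = ("\n".join(lines) + "\n").encode()
    header = b"commit " + str(len(body)).encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

# Placeholder object ids (assumptions, not real commits).
TREE = "a" * 40
OLD_BASE = "1" * 40
NEW_BASE = "2" * 40
WHO = "Dev <dev@example.org>"

before_rebase = commit_id(TREE, [OLD_BASE], WHO, "Fix typo")
after_rebase = commit_id(TREE, [NEW_BASE], WHO, "Fix typo")
print(before_rebase != after_rebase)  # same change, different commit id
```

Once the old id is no longer reachable from any branch, an observer who never recorded it has no way to know the original commit existed.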
  • 45. Super-repository ● Collection of repositories cloned (recursively) from the same repo – At least one per developer ● On their personal computer – At least one public repository ● The blessed – In git, no way to trace them
  • 46. Moving commits across the superRepo ● Push: done at source, needs write access to destination ● Pull: done at destination, needs read access to source ● Email: source creates a patch and mails it; recipient applies it
  • 47. Ecosystem of Repos
  • 48. Can we learn from Linux?
  • 49. Life of a Patch in Linux
  • 50. Continuous Mining of Linux ● Linux has no centralized logging – Nobody really knows what the superRepo is – Commits flow without any event broadcasting mechanism ● How do we find the activity? – Repos – Commits
  • 51. Semiautomatic Process ● Every 3 hrs, ask every repo – What new commits do you have? – What commits did you delete? – Automatically resolve propagations ● Commits might propagate before we scan ● Daily: – Are commits in repo by unknown committers? ● Answer: – is there a new repo? or is committer new to repo?
  • 52. Implementation ● Running since Nov. 2011 – Currently scans 650 repos every 3 hrs – Retrieved ● 2.3 million commits (compared to 400k in Linus repo) ● 109 million records in propagation table <commit-id, added|deleted, repo, when>
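The scanning step on slides 51 and 52 can be sketched as a set difference between two consecutive scans of one repository, emitting records shaped like the propagation table row <commit-id, added|deleted, repo, when>. This is only an illustration under assumptions: the names `Propagation`, `diff_scan` and `first_seen` are hypothetical, and the talk does not describe how the real system stores or resolves its data.

```python
from datetime import datetime, timezone
from typing import NamedTuple

class Propagation(NamedTuple):
    commit_id: str
    event: str      # "added" or "deleted"
    repo: str
    when: datetime  # time of the scan that observed the change

def diff_scan(repo, previous, current, when):
    """Compare the commit ids seen in two consecutive scans of one repo."""
    added = sorted(current - previous)
    deleted = sorted(previous - current)
    return ([Propagation(c, "added", repo, when) for c in added] +
            [Propagation(c, "deleted", repo, when) for c in deleted])

def first_seen(records):
    """Map each commit id to the (repo, scan time) where it was first added."""
    origin = {}
    for r in sorted(records, key=lambda r: r.when):
        if r.event == "added":
            origin.setdefault(r.commit_id, (r.repo, r.when))
    return origin

# Hypothetical scans, three hours apart.
t1 = datetime(2012, 1, 1, 0, 0, tzinfo=timezone.utc)
t2 = datetime(2012, 1, 1, 3, 0, tzinfo=timezone.utc)
records = diff_scan("repoA", {"c1", "c2"}, {"c2", "c3"}, t1)
records += diff_scan("repoB", set(), {"c3"}, t2)
print(first_seen(records)["c3"][0])
```

If a commit shows up in two repositories within the same scan, this sketch cannot tell which one had it first. That is the "commits might propagate before we scan" case the slide says has to be resolved separately.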
  • 53.
        Metric                        Snapshot (Linus)   Continuous
        No. of repos                  1                  479
        Commits                       64k                533k
        Non-merge commits             59k                485k
        Unique non-merges             58k                135k
        % unique non-merges           98.9%              27.9%
        Different author emails       3434               5646
        Different authors             2883               4575
        Different committer emails    283                1185
        Different committers          245                1058
        Non-merges that reached Blessed: 43.1% (the slide gives a single value)
  • 54. Commit vs Patches ● Commit ids are insufficient to track patches ● Large amount of work not reaching blessed
  • 55. Arrival of Commits at Blessed
  • 56. Arrival of Commits at Blessed... ● We can classify patches as a new feature or bug-fix
  • 57. The Latency (figure labels: Time of Authorship, Time of Commit)
  • 58. The Repos
  • 59. Path to Linus
  • 60. ● Large ecosystem of repositories – Producers – Consumers
  • 61. Contributors vs Consumers
  • 62. Linux Dashboard ● We asked two linux maintainers: – Can this info be useful? ● Answer: – “Yes” … but not for what we expected...
  • 63. Tracking commits in Linux ● Need to track patches, not commits – Particularly important in consumer repositories – Need to cross-reference commits ● What commits contain the same patch? – Some repos track commits from blessed via cherry-picking ● Commit ids are useless ● So they annotate log with the origin commit id
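Cross-referencing commits that carry the same patch could look roughly like the sketch below. It hashes a diff after dropping the parts that change when a patch is rebased or cherry-picked (blob ids on `index` lines, hunk line numbers, whitespace), which is similar in spirit to `git patch-id` but is not that tool's algorithm. It also reads the "(cherry picked from commit ...)" line that `git cherry-pick -x` appends to the log message. The function names and the sample diffs are hypothetical.

```python
import hashlib
import re

_CHERRY = re.compile(r"\(cherry picked from commit ([0-9a-f]{7,40})\)")

def patch_key(diff_text):
    """Hash a unified diff, ignoring blob ids, hunk positions and whitespace."""
    kept = []
    for line in diff_text.splitlines():
        if line.startswith("index "):
            continue                 # blob ids differ between repositories
        if line.startswith("@@"):
            line = "@@"              # drop line numbers and hunk context
        kept.append("".join(line.split()))
    return hashlib.sha1("\n".join(kept).encode()).hexdigest()

def origin_commit(message):
    """Return the commit id recorded by a cherry-pick annotation, if any."""
    match = _CHERRY.search(message)
    return match.group(1) if match else None

ORIGINAL = """diff --git a/f.c b/f.c
index 1111111..2222222 100644
--- a/f.c
+++ b/f.c
@@ -10,3 +10,3 @@
 int x;
-int y = 0;
+int y = 1;
"""
# Same change applied elsewhere: different blob ids and hunk position.
PICKED = ORIGINAL.replace("1111111..2222222", "3333333..4444444")
PICKED = PICKED.replace("@@ -10,3 +10,3 @@", "@@ -42,3 +42,3 @@ int main()")
OTHER = ORIGINAL.replace("int y = 1;", "int y = 2;")

print(patch_key(ORIGINAL) == patch_key(PICKED))   # same patch
print(patch_key(ORIGINAL) == patch_key(OTHER))    # different patch
```

Grouping commits by `patch_key` gives candidate sets of "same patch" commits across repositories even when their commit ids differ. A cherry-pick annotation, where present, gives a direct link back to the origin commit.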
  • 64. Linux Commits Dashboard ● Where is my commit? – My original commit, has it reached Linus? ● What was merged? – What commits were merged at once by Linus? ● What commits are related to this one? – Same patch ● Rebasing ● Cherry picking – Mentioned in a commit ● This commit fixes bug introduced in X ● This commit reverts commit X ● http://o.cs.uvic.ca:20810/perl/cid.pl?cid=70cb8bb0d365f0bc8b20fa67347caf9598a4674e
  • 65. Researcher states: “40% of pull requests are not merged” ● Based on simply querying ghtorrent data ● But it ignores what really happens ● Many pull requests are merged without being marked as merged in github ● Ghtorrent data has many potential threats to validity
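One way to see why a naive query overstates the share of unmerged pull requests is to check, for each pull request not flagged as merged, whether its commits are nonetheless present in the main line of the target repository. The sketch below is a hypothetical heuristic, not the method used in the talk: the `PullRequest` record and its fields are assumptions and do not mirror the GHTorrent schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PullRequest:
    number: int
    marked_merged: bool   # what the hosting site records
    commits: tuple        # commit ids proposed by the pull request

def effectively_merged(pr, mainline_commits):
    """Merged if flagged so, or if every proposed commit is in the main line."""
    if pr.marked_merged:
        return True
    return bool(pr.commits) and all(c in mainline_commits for c in pr.commits)

def count_unmerged(prs, mainline_commits=None):
    """Without mainline data, fall back to the naive count that trusts the flag."""
    if mainline_commits is None:
        return sum(not pr.marked_merged for pr in prs)
    return sum(not effectively_merged(pr, mainline_commits) for pr in prs)

# Hypothetical data: PR 2 was applied by hand, so its flag was never set.
prs = [
    PullRequest(1, True, ("a1",)),
    PullRequest(2, False, ("b1", "b2")),
    PullRequest(3, False, ("c1",)),
]
mainline = {"a1", "b1", "b2"}

print(count_unmerged(prs))            # naive count
print(count_unmerged(prs, mainline))  # count after checking the main line
```

This check still misses pull requests whose commits were rebased or squashed before integration, because their ids change. Catching those would require comparing patches rather than commit ids.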
  • 66. What is github used for?
  • 67. "I store my presentations in github. I don't need a USB stick anymore!"
  • 68. Are there potential threats to validity for studies that assume github is about software engineering only?
  • 69. Methodology ● Data sources: – Surveys – Sampling of repositories ● Mixed methods: – Quantitative, and – Qualitative
  • 70. I. A repository is not necessarily a project II. Most projects have few commits III. Most projects are inactive IV. A large proportion of repositories are not for software engineering V. More than two thirds of projects are personal VI. Only a fraction of repos use pull requests VII. If the commits in a pull-request are reworked, github only records the resulting patch VIII. Most pull-requests appear as non-merged, even though they were merged IX. Many active projects do not conduct all their software development activity in github
  • 71. Uses:
  • 72. Most projects are inactive
  • 73. Social? 67% of projects are personal repos; 95% have 3 or fewer committers
  • 74. Self contained? “Any serious project would have to have some separate infrastructure - mailing lists, forums, irc channels and their archives, build farms, etc. [...] Thus while GitHub and all other project hosts are used for collaboration, they are not and can not be a complete solution.”
  • 75. Others are already using github's information to reach conclusions!
  • 76. the open source report card http://osrc.dfm.io/dmgerman/
  • 77. how are github users collaborating?
  • 78. How does github support collaboration? ● Methodology: – Survey ● 240 responses (24% response rate) – Interviews ● 35 interviews from survey respondents – 71% professional developers – 11% managers – 9% students – 9% interns ● Approximately 1hr each
  • 79. Survey: why do you use github?
  • 80. Code centric collaboration
  • 81. Themes: focus ● Simple tools – git branching/merging – github features seem to be enough for most ● Pull requests and issue tracking ● Focused interaction – code-centric, focused communication – asynchronous and unobtrusive
  • 82. Focus: independence ● Decentralized work: – git allows them to work independently – yet they have visibility of what others do ● Low need for management: – Need for a clear process (the workflow) – They shy away from rigid management and team structure – Team managers recognize this – Managers should be educated on using git/github
  • 83. Focus: Exposure ● Easy contribution process – Fork and potentially contribute without pre-authorization ● Peer pressure – Developers are conscious that their code is readily visible to others – Adoption of small, frequent contributions
  • 84. OSS mentality ● At the operational level – the nature of the work allows independence and self-organization. – developers are familiar with the idea of working this way and share the mentality behind it. ● developers are self-driven ● share the mentality of – self-organizing, – minimizing communication and coordination needs, – having ownership of code, and – operating on a meritocratic, expertise-based model
  • 85. The github ecosystem
  • 86. The Github Ecosystem ● github is creating an ecosystem of proprietary, cloud-enabled applications for software development teams – Service integration – JSON API ● Asana, Campfire, Lighthouse, Jira, Travis, Trello, etc, etc.
  • 87. Conclusions ● git and github are promoting the use of the pull-request workflow – small, independent contributions – that can be reviewed before integration ● Effectively, adopting open source code practices into their development – Independent work – Code reviews of contributions before they are integrated