The adoption of FOSS workflows in
commercial software development: the
case of git and github
Daniel M German
University of...
  1. 1. The adoption of FOSS workflows in commercial software development: the case of git and github. Daniel M German, University of Victoria, Canada
  2. 2. Open Source is everywhere
  3. 3. On SSL and Heartbleed “[Heartbleed] is a software flaw that has left up to two-thirds of the world’s websites vulnerable to attack by hackers.” – The Economist
  4. 4. “There is no such thing as bad publicity except your own obituary.” – Brendan Behan
  5. 5. ● “Most open-source software – and Open SSL is no exception – is produced voluntarily by people who are not paid for creating it. They do it for love, professional pride or as a way of demonstrating technical virtuosity. And mostly they do it in their spare time.” – John Naughton The Observer/The Guardian 'Heartbleed' bug can't be simply blamed on coders, April 13, 2014
  6. 6. “Responsible corporate use of open-source software should therefore involve some measure of reciprocity: a corporation that benefits hugely from such software ought to put something back, either in the form of financial support for a particular open-source project, or – better still – by encouraging its own software people to contribute to the project.”
  7. 7. “Much of the invisible backbone of websites from Google to Amazon to the Federal Bureau of Investigation was built by volunteer programmers in what is known as the open-source community.”
  8. 8. “... volunteers, connected over the Internet, work together to build free software, to maintain and improve it and to look for bugs. Ideally, they check one another’s work in a peer review system similar to that found in science.”
  9. 9. Linus's Law: “Given Enough Eyeballs, all Bugs are Shallow” Eric Raymond, The Cathedral and the Bazaar
  10. 10. In the case of Heartbleed “There weren't enough eyeballs” - Eric Raymond
  11. 11. ● Code was created by a grad student ● Reviewed by S. Henson, core developer of OpenSSL ● Included in OpenSSL in the spring of 2011 ● Not discovered for 3 years! Budget of OpenSSL: – US$2,000 for 2013
  12. 12. the OpenSSL problem ● important infrastructure projects that are run by small teams of volunteers ● on April 24, the Linux Foundation announces the “Core Infrastructure Initiative” to address it
  13. 13. Core Infrastructure Initiative ● Funded by: – Amazon, Cisco, Dell, Facebook, Fujitsu, Google, IBM, Intel, Microsoft, NetApp, Rackspace, Qualcomm, VMware and The Linux Foundation ● Funding to core projects: – Fellowships to core developers – as well as other resources to assist the project in improving its security, enabling outside reviews, and improving responsiveness to patch requests.
  14. 14. What is FOSS development? ● Most important feature of FOSS – its free or open source license ● License – Guarantees code is available to others to reuse – Becomes a social contract among participants
  15. 15. What is OSS development? ● Most frequently defined as: – Self-organized teams developing software without a central authority ● Code is open for review – and reuse!!! ● Anybody can participate
  16. 16. What makes OSS development possible? ● Teams of self-organized developers and contributors ● The Internet ● A common toolkit ● Version control systems
  17. 17. Teams ● Come from all sectors: – Professionals and hobbyists – Paid and volunteers – Novices and Experienced – High-school students to PhDs – All over the world!!! ● Highly motivated!
  18. 18. Common Toolkit ● To be able to collaborate you need a common set of tools – Programming languages ● gcc, perl, python, java, ruby, lua, php... – Editors and IDEs ● Emacs, vim, Eclipse, Netbeans... – Libraries ● boost, maven, cpan, Pypi... – Infrastructure ● Make, ant, cmake, bugzilla, etc. – Hosting infrastructure ● Sourceforge, Google Code, github, bitbucket ● They must be available at zero cost to anybody
  19. 19. FOSS Toolkit ● I posit that one of the biggest influences of FOSS on the practice of Software Development is the wide use of FOSS tools for the development of software – Most implementations of popular programming languages today are open source – FOSS Editors and IDEs are widely used too
  20. 20. Free Software Foundation ● The FSF had to bootstrap the development of the OSS toolkit – To build an Operating System you need a compiler – Before you build a compiler you need an editor, but you need a compiler to build an editor – gcc, emacs, coreutils (ls, echo, cat, etc.), etc.
  21. 21. Richard Stallman Created the legal and technical infrastructure for Free and Open Source software
  22. 22. on Code Reviews
  23. 23. Need for Code Reviews ● Many FOSS teams discovered that to ship good quality software they needed to review the source code
  24. 24. Fagan Code Inspections ● Code reviews performed at specific stages of development ● Effective, but not widely used
  25. 25. Open Source style Code Reviews ● Fagan inspections were unfeasible – Required participants to be in the same room ● Instead, code reviews started to be incremental – Rather than reviewing the whole, review the delta (the patch)
  26. 26. Code Reviews in FOSS
  27. 27. the spectrum of Code Reviews
  28. 28. code reviews in FOSS (1) early, frequent reviews (2) of small, independent, complete contributions (3) that are broadcast to a large group of stakeholders, but only reviewed by a small set of self-selected experts (4) resulting in an efficient and effective peer review technique. - Peter Rigby
  29. 29. Lessons from FOSS
  30. 30. on Version Control systems
  31. 31. Version Control Systems ● At the beginning, FOSS used tar files on USENET – the FSF would ship physical tapes! ● Today, version control systems are the norm – Centralized or Distributed ● FOSS has a continuous and proven track record of innovation in version control systems – FOSS democratized VC
  32. 32. On Version Control ● The VC is the circulatory system of software development ● It brings the code to all stakeholders ● A contribution is a patch – one or more commits
  33. 33. the patch ● the patch should be reviewed ● most VCs don't support reviewing of patches
  34. 34. the patch and its review ● Two models: – Commit then Review ● Review the code after it has been integrated or – Review Then Commit (RTC) ● Review the patch before it is integrated
  35. 35. Linux ● Linux incorporated RTC early in its process ● Linus needed integration of Review process with VC ● No FOSS VC did it – he turned to bitkeeper
  36. 36. Bitkeeper and Linux ● Symbiotic relationship – Free (as in beer) licenses to linux developers with one big condition ● User should not develop competing tools – Bitkeeper rapidly improved Linux integration process ● simplified integration of reviewed code – Bitkeeper was probably influenced by Linus's workflow – in 2005 bitkeeper revokes its license to Linux developers
  37. 37. Git ● Many other distributed version control systems before it ● What makes it special? – Many features, but especially: ● Pull-requests ● git incorporates code review process with a distributed version control system – Even via email patches
  38. 38. How is distributed version control software being used?
  39. 39. Git ● Software engineers are moving towards git – And other DVCs ● Github a major reason
  40. 40. The Promise of Git From: http://thkoch2001.github.io/whygitisbetter/
  41. 41. Challenge 1 ● Personal repos are beyond reach ● Local commits might never be observable
  42. 42. “History is written by the victors” Challenge 2: History
  43. 43. Rebasing changes history
  44. 44. Save history before it is lost!
  45. 45. Super-repository ● Collection of repositories cloned (recursively) from the same repo – At least one per developer ● In their personal computer – At least one public repository ● The blessed – In git, no way to trace them
  46. 46. Moving commits across the superRepo
          Method   Description
          Push     Done at source, needs write access to destination
          Pull     Done at destination, needs read access to source
          Email    Source creates patch, mails it; recipient applies it
  47. 47. Ecosystem of Repos
  48. 48. Can we learn from Linux?
  49. 49. Life of a Patch in Linux
  50. 50. Continuous Mining of Linux ● Linux has no centralized logging – Nobody really knows what the superRepo is – Commits flow without any event broadcasting mechanism ● How do we find the activity? – Repos – Commits
  51. 51. Semiautomatic Process ● Every 3 hrs, ask every repo – What new commits do you have? – What commits did you delete? – Automatically resolve propagations ● Commits might propagate before we scan ● Daily: – Are commits in repo by unknown committers? ● Answer: – is there a new repo? or is committer new to repo?
  52. 52. Implementation ● Running since Nov. 2011 – Currently scans 650 repos every 3 hrs – Retrieved ● 2.3 million commits (compared to 400k in Linus repo) ● 109 million records in propagation table <commit-id, added|deleted, repo, when>
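The scan described in the two slides above can be sketched as a set difference between consecutive snapshots of each repository. This is only an illustration under stated assumptions: the function names, the in-memory dictionaries, and the sample commit ids are hypothetical, and the talk does not say how the real system stores or queries its data. Only the record shape `<commit-id, added|deleted, repo, when>` comes from the slide.

```python
from datetime import datetime, timezone
from typing import Dict, List, Set, Tuple

# One row of the propagation table: <commit-id, added|deleted, repo, when>
Record = Tuple[str, str, str, datetime]


def diff_scan(repo: str, previous: Set[str], current: Set[str],
              when: datetime) -> List[Record]:
    """Compare two scans of one repo and emit added/deleted records."""
    added = [(cid, "added", repo, when) for cid in sorted(current - previous)]
    deleted = [(cid, "deleted", repo, when) for cid in sorted(previous - current)]
    return added + deleted


def scan_all(last_seen: Dict[str, Set[str]], observed: Dict[str, Set[str]],
             when: datetime) -> List[Record]:
    """Run one scan cycle over every observed repo; updates last_seen in place."""
    records: List[Record] = []
    for repo in sorted(observed):
        # A repo never scanned before is treated as previously empty.
        previous = last_seen.get(repo, set())
        records.extend(diff_scan(repo, previous, observed[repo], when))
        last_seen[repo] = set(observed[repo])
    return records
```

Calling `scan_all` once per cycle (every 3 hours in the talk) appends only the changes since the previous cycle, so the table grows with activity rather than with repository size.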
  53. 53. Measure                            Snapshot (Linus)   Continuous
          No. of repos                       1                  479
          Commits                            64k                533k
          Non-merge commits                  59k                485k
          Unique non-merges                  58k                135k
          % unique non-merges                98.9%              27.9%
          Non-merges that reached Blessed                       43.1%
          Different authors' emails          3434               5646
          Different authors                  2883               4575
          Different committers' emails       283                1185
          Different committers               245                1058
  54. 54. Commit vs Patches ● Commit ids are insufficient to track patches ● Large amount of work not reaching blessed
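A commit id changes whenever a commit is rebased or cherry-picked, even if the change it carries is identical. One way to recognize the same patch across repositories is to fingerprint the content of the diff instead. The sketch below is a simplified illustration, not the method used in the study: `patch_fingerprint` is a hypothetical helper, and git itself ships a `git patch-id` command that serves a similar purpose.

```python
import hashlib


def patch_fingerprint(diff_text: str) -> str:
    """Hash the content of a unified diff, ignoring position-dependent details."""
    kept = []
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            in_hunk = False          # a new file section starts
            continue
        if line.startswith("@@"):
            in_hunk = True
            kept.append("@@")        # keep the hunk boundary, drop line numbers
            continue
        if not in_hunk:
            # File header area: keep the paths, skip "index ..." and mode lines.
            if line.startswith(("--- ", "+++ ")):
                kept.append(line)
            continue
        # Hunk body: keep the +/-/space marker, ignore whitespace differences.
        kept.append(line[:1] + "".join(line[1:].split()))
    return hashlib.sha1("\n".join(kept).encode("utf-8")).hexdigest()
```

Two commits whose diffs yield the same fingerprint can then be treated as carrying the same patch, whatever their commit ids are.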
  55. 55. Arrival of Commits at Blessed
  56. 56. Arrival of Commits at Blessed... ● We can classify patches as a new feature or bug fix
  57. 57. The Latency Time of Authorship Time of Commit
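Git records two timestamps per commit: when the change was authored and when it was committed. A minimal sketch of measuring the gap between them follows. It assumes latency is defined as commit time minus authorship time and that both timestamps are ISO 8601 strings with a UTC offset; the slide does not give the exact definition used in the study.

```python
from datetime import datetime, timedelta


def latency(author_date: str, commit_date: str) -> timedelta:
    """Time elapsed between authorship and commit (assumed definition)."""
    authored = datetime.fromisoformat(author_date)
    committed = datetime.fromisoformat(commit_date)
    # Both values carry UTC offsets, so the subtraction is timezone-safe.
    return committed - authored
```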
  58. 58. The Repos
  59. 59. Path to Linus
  60. 60. ● Large ecosystem of repositories – Producers – Consumers
  61. 61. Contributors vs Consumers
  62. 62. Linux Dashboard ● We asked two linux maintainers: – Can this info be useful? ● Answer: – “Yes” … but not for what we expected...
  63. 63. Tracking commits in Linux ● Need to track patches, not commits – Particularly important in consumer repositories – Need to cross-reference commits ● What commits contain the same patch? – Some repos track commits from blessed via cherry-picking ● Commit ids are useless ● So they annotate log with the origin commit id
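When a commit is cherry-picked with `git cherry-pick -x`, git appends a line of the form `(cherry picked from commit <id>)` to the log message. A rough sketch of recovering that origin id is shown below. The helper name `origin_commit` is hypothetical, and individual repositories may use other annotation conventions that this pattern would not match.

```python
import re
from typing import Optional

# Trailer that `git cherry-pick -x` appends to the commit message.
_ORIGIN = re.compile(r"\(cherry picked from commit ([0-9a-f]{7,40})\)")


def origin_commit(log_message: str) -> Optional[str]:
    """Return the annotated origin commit id, or None if there is no annotation."""
    match = _ORIGIN.search(log_message)
    return match.group(1) if match else None
```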
  64. 64. Linux Commits Dashboard ● Where is my commit? – My original commit, has it reached Linus? ● What was merged? – What commits were merged at once by Linus? ● What commits are related to this one? – Same patch ● Rebasing ● Cherry picking – Mentioned in a commit ● This commit fixes bug introduced in X ● This commit reverts commit X ● http://o.cs.uvic.ca:20810/perl/cid.pl?cid=70cb8bb0d365f0bc8b20fa67347caf9598a4674e
  65. 65. Researcher states: “40% of pull requests are not merged” ● Based on simply querying ghtorrent data ● But it ignores what really happens ● Many pull requests are merged without being marked as merged in github ● Ghtorrent data has many potential threats to validity
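To illustrate why the "merged" flag alone undercounts, the sketch below treats a pull request as effectively merged when it is flagged as merged, when all of its commits appear in the main branch, or when all of its patches do. Everything here is an assumption made for illustration: the `PullRequest` record and its fields are hypothetical, do not mirror the GHTorrent schema or the GitHub API, and the heuristics are not the ones used in the study.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class PullRequest:
    number: int
    marked_merged: bool                                   # the flag recorded by the host
    commit_ids: List[str] = field(default_factory=list)
    patch_ids: List[str] = field(default_factory=list)    # content fingerprints


def effectively_merged(pr: PullRequest, main_commits: Set[str],
                       main_patches: Set[str]) -> bool:
    """Heuristic: flagged, or all commits present, or all patches present."""
    if pr.marked_merged:
        return True
    if pr.commit_ids and all(c in main_commits for c in pr.commit_ids):
        return True
    return bool(pr.patch_ids) and all(p in main_patches for p in pr.patch_ids)


def unmerged_rate(prs: Iterable[PullRequest], main_commits: Set[str],
                  main_patches: Set[str]) -> float:
    """Fraction of pull requests that no heuristic recognizes as merged."""
    prs = list(prs)
    if not prs:
        return 0.0
    missed = sum(1 for pr in prs
                 if not effectively_merged(pr, main_commits, main_patches))
    return missed / len(prs)
```

Comparing this rate with the share of pull requests whose flag is simply false shows how much a naive query can overstate rejection.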
  66. 66. What is github used for?
  67. 67. "I store my presentations in github. I don't need a USB stick anymore!"
  68. 68. Are there potential threats to validity for studies that assume github is about software engineering only?
  69. 69. Methodology ● Data sources: – Surveys – Sampling of repositories ● Mixed methods: – Quantitative, and – Qualitative
  70. 70. I. A repository is not necessarily a project II. Most projects have few commits III. Most projects are inactive IV. A large proportion of repositories are not for software engineering V. More than two thirds of projects are personal VI. Only a fraction of repos use pull requests VII. If the commits in a pull-request are reworked, github only records the resulting patch VIII. Most pull-requests appear as non-merged, even though they were merged IX. Many active projects do not conduct all their software development activity in github
  71. 71. Uses:
  72. 72. Most projects are inactive
  73. 73. Social? 67% of projects are personal repos 95% have 3 or fewer committers
  74. 74. Self contained? “Any serious project would have to have some separate infrastructure - mailing lists, forums, irc channels and their archives, build farms, etc. [...] Thus while GitHub and all other project hosts are used for collaboration, they are not and can not be a complete solution.”
  75. 75. Others are already using github's information to reach conclusions!
  76. 76. the open source report card http://osrc.dfm.io/dmgerman/
  77. 77. how are github users collaborating?
  78. 78. How does github support collaboration? ● Methodology: – Survey ● 240 responses (24% response rate) – Interviews ● 35 interviews from survey respondents – 71% professional developers – 11% managers – 9% students – 9% interns ● Approximately 1hr each
  79. 79. Survey: why do you use github?
  80. 80. Code centric collaboration
  81. 81. Themes: focus ● Simple tools – git branching/merging – github features seem to be enough for most ● Pull requests and issue tracking ● Focused interaction – code-centric, focused communication – asynchronous and unobtrusive
  82. 82. Focus: independence ● Decentralized work: – git allows them to work independently – yet they have visibility of what others do ● Low need for management: – Need for a clear process (the workflow) – They shy away from rigid management and team structure – Team managers recognize this – Managers should be educated on using git/github
  83. 83. Focus: Exposure ● Easy contribution process – Fork and potentially contribute without pre-authorization ● Peer pressure – Developers are conscious that their code is readily visible to others – Adoption of small, frequent contributions
  84. 84. OSS mentality ● At the operational level – the nature of the work allows independence and self-organization. – developers are familiar with the idea of working this way and share the mentality behind it. ● developers are self-driven ● share the mentality of – self-organizing, – minimizing communication and coordination needs, – having ownership of code, and – operating on a meritocratic, expertise-based model
  85. 85. The github ecosystem
  86. 86. The Github Ecosystem ● github is creating an ecosystem of proprietary, cloud-enabled applications for software development teams – Service integration – JSON API ● Asana, Campfire, Lighthouse, Jira, Travis, Trello, etc.
  87. 87. Conclusions ● git and github are promoting the use of the pull-request workflow – small, independent contributions – that can be reviewed before integration ● Effectively, adopting open source code practices into their development – Independent work – Code reviews of contributions before they are integrated
