Corporate Involvement in FLOSS
Study of Presence in Debian Code over Time




                              Gregorio Roble...
Introduction & Motivation

 The FLOSS landscape has a wide variety of
  participants: individuals, universities,
  foundati...
Goal & Research Questions

 To measure the involvement of companies
 in FLOSS, specifically of those that deliver
 code t...
The Means

 Analyze source code available in Debian
 for contribution of companies
      More than 10,000 source code pac...
Looking for contributions

 Scanning for copyright statements

 Companies have a high motivation to be
 the copyrig...
Methodology

 File selection: source code files
    – >30 programming languages

 'Ownergrep': Heurist...
The 'ownergrep'

 Regular expression
    .*copyright (?:\(c\))?[\d,\-\s:]+(?:by\s+)?([^\d]*)

        Ownergrep algorit...
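
A minimal sketch of how an expression like the one on this slide might be applied line by line. The escaped form of the pattern used in the code is a reconstruction and is an assumption, as are the function name `copyright_holders` and the sample file header; none of them is taken from the authors' tool.

```python
import re

# Assumed reconstruction of the slide's pattern with its escapes restored.
OWNER_RE = re.compile(
    r".*copyright (?:\(c\))?[\d,\-\s:]+(?:by\s+)?([^\d]*)",
    re.IGNORECASE,
)


def copyright_holders(text):
    """Return the holder string captured from each matching line of text."""
    holders = []
    for line in text.splitlines():
        match = OWNER_RE.match(line)
        if match:
            holder = match.group(1).strip()
            if holder:
                holders.append(holder)
    return holders


# Illustrative file header (made up for this example).
header = """/*
 * Copyright (c) 1998, 2001 International Business Machines Corp.
 * Copyright 2003 by Sun Microsystems, Inc.
 * This file carries no further notices.
 */"""

print(copyright_holders(header))
```

Matching one line at a time keeps the final capture group, which accepts any non-digit character, from running on into the following lines.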
Cleaning & Multiple entries

 Ad-hoc heuristics
    – lower case, removing white spaces, dots, etc.

 Splitting  ...
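
The cleaning and splitting steps could be sketched roughly as follows. This is an illustration under assumptions, not the authors' heuristics: the helper names `clean` and `split_joint` are hypothetical, white space is collapsed rather than removed entirely, and joint statements are split only on the word "and".

```python
import re


def clean(holder):
    """Lower-case, drop dots, and collapse runs of white space."""
    return " ".join(holder.lower().replace(".", "").split())


def split_joint(statement):
    """Split a joint statement on the word 'and' and clean each part."""
    parts = re.split(r"\s+and\s+", statement, flags=re.IGNORECASE)
    cleaned = [clean(part) for part in parts]
    return [part for part in cleaned if part]


print(split_joint("Spencer Kimball and Peter Mattis"))
print(split_joint("IBM corporation and others"))
```

Requiring white space on both sides of "and" avoids splitting names that merely contain those letters, such as "Anderson".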
Example: IBM Corporation

    international business machines        international business machines
                    ...
(Avoiding) Double counting

 Using a hash for each file
 Not compared pairwise across all files, but only among files with the same filename
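
A rough sketch of this idea: a file counts as a duplicate only if an earlier file has both the same filename and the same content hash. The hash function (SHA-256), the function name `unique_files`, and the in-memory input format are assumptions made for this example; the slides do not say which hash was used.

```python
import hashlib
import posixpath


def unique_files(files):
    """Return paths whose (filename, content hash) pair was not seen before.

    `files` maps a path to the file's bytes. Paths are visited in sorted
    order so the result is deterministic.
    """
    seen = set()
    kept = []
    for path in sorted(files):
        digest = hashlib.sha256(files[path]).hexdigest()
        key = (posixpath.basename(path), digest)
        if key not in seen:
            seen.add(key)
            kept.append(path)
    return kept


# Illustrative input: package paths and contents are made up.
sample = {
    "pkgA/util.c": b"int f(void) { return 1; }",
    "pkgB/util.c": b"int f(void) { return 1; }",   # same name, same content
    "pkgB/other.c": b"int f(void) { return 1; }",  # same content, other name
    "pkgC/util.c": b"int f(void) { return 2; }",   # same name, other content
}
print(unique_files(sample))
```

Keying on the filename as well as the hash means identical content under a different name is still counted, which matches the restriction to same filenames stated on the slide.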
Results

 Contributed code doubles every 2 years

 Companies have been verified manually!
      by means of Go...
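
To put the doubling claim in more familiar terms, the arithmetic below assumes exactly exponential growth with a two-year doubling period. The function name `growth_factor` is hypothetical, and the resulting figures are consequences of that assumption, not measurements from the study.

```python
def growth_factor(years, doubling_period=2.0):
    """Growth multiple after `years` if size doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


annual = growth_factor(1)            # growth multiple over a single year
span = growth_factor(2005 - 1998)    # Debian 2.0 (1998) to Debian 3.1 (2005)
print(f"annual factor: {annual:.2f}, 1998-2005 factor: {span:.1f}")
```

Under this assumption, doubling every two years corresponds to roughly 41% growth per year, or about an elevenfold increase over the seven years between the two releases.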
Companies vs. rest (with care)
Top 10 contributing companies

 Debian 3.1 (Sarge), 2005
Top 10 contributing companies

 Debian 2.0 (Hamm), 1998
Hall Of Fame

 SUN   (1,2,2,2,1)
 Netscape (X,1,1,X,7)
 Silicon Graphics (3,4,4,5,4)
 IBM (X,X,6,1,2)
 Xerox (4,6,10,...
Summing up

 Approach to quantify the contribution that
  companies give back to the community
 Threats to validity:
  ...
Any questions?



 Thanks for your attendance and interest!

     More information is available at
         http://libreso...
Involvement Of Companies In Debian


  1. Corporate Involvement in FLOSS: Study of Presence in Debian Code over Time
     Gregorio Robles, Santiago Dueñas, Jesús González-Barahona
     Universidad Rey Juan Carlos, libresoft.urjc.es
     Limerick, June 13th 2007
  2. Introduction & Motivation
     • The FLOSS landscape has a wide variety of participants: individuals, universities, foundations... and, of course, companies!
     • Companies in FLOSS have already been studied, especially focusing on the
       – Why? (Building a community, marketing...)
       – How? (Business models, etc.)
       – Where? (Software domains...)
       – ...
  3. Goal & Research Questions
     • To measure the involvement of companies in FLOSS, specifically of those that deliver code to the community
       – How much are companies involved?
       – Has it changed over time?
       – How many companies?
       – Who are the main actors?
  4. The Means
     • Analyze source code available in Debian for contributions by companies
       – More than 10,000 source code packages in Debian 3.1
     • We will apply our methodology from Debian 2.0 (1998) to Debian 3.1 (2005)
  5. Looking for contributions
     • Scanning for copyright statements
     • Companies have a high motivation to be the copyright holders!
  6. Methodology
     • File selection: source code files
       – >30 programming languages
     • 'Ownergrep': heuristic looking for ©
     • Cleaning, multiple entries & merging
     • Avoid double counting
  7. The 'ownergrep'
     • Regular expression
       .*copyright (?:\(c\))?[\d,\-\s:]+(?:by\s+)?([^\d]*)
     • Ownergrep algorithm based on one by Rishab A. Ghosh and Vipul Prakash
  8. Cleaning & Multiple entries
     • Ad-hoc heuristics
       – lower case, removing white spaces, dots, etc.
     • Splitting of joint © statements
       – Spencer Kimball and Peter Mattis
       – IBM corporation and others
     • But... the same copyright holder may appear in (many) different ways!
       – They should really be merged into a unique one
  9. Example: IBM Corporation
     international business machines international business machines inc corporation ibm deutschland entwicklung international business machines gmbh, ibm corporation corp international business machines ibm crop corp international business machines international business machines ibm entwicklung gmbh, ibm corporation and corporation ibm corp ibm ibm deutschland entwicklung gmbh ibm deutschland international business machines, ... and 7 other ways! inc
  10. (Avoiding) Double counting
      • Using a hash for each file
      • Not compared pairwise across all files, but only among files with the same filename
  11. Results
      • Contributed code doubles every 2 years
      • Companies have been verified manually!
        – by means of Google searches (thanks Diego!)
  12. Companies vs. rest (with care)
  13. Top 10 contributing companies: Debian 3.1 (Sarge), 2005
  14. Top 10 contributing companies: Debian 2.0 (Hamm), 1998
  15. Hall Of Fame
      • SUN (1,2,2,2,1)
      • Netscape (X,1,1,X,7)
      • Silicon Graphics (3,4,4,5,4)
      • IBM (X,X,6,1,2)
      • Xerox (4,6,10,X,X)
      • [Other top companies can be found in the paper]
  16. Summing up
      • Approach to quantify the contribution that companies give back to the community
      • Threats to validity:
        – use of heuristics: ownergrep, cleaning, merging
        – completeness (use of © statements)
      • Source code by companies doubles every two years!
      • Share of contribution has remained almost constant over the last 7 years
  17. Any questions?
      Thanks for your attendance and interest!
      More information is available at http://libresoft.urjc.es
