The Ultimate Debian Database

Some comments about the sources of data stored in the Ultimate Debian Database

Published in: Education, Technology

  1. The Ultimate Debian Database
     Israel Herraiz <israel.herraiz@upm.es>
     Davis, CA, July 26th 2012
     Download these slides at http://slideshare.net/herraiz/the-ultimate-debian-database
  2. Outline
     1. Debian: what is it and sources of data
     2. The UDD: what is it and where to get it
     3. What has been done and what we can do
  3. Section 1. Debian: what is it and sources of data
  4. Debian
     • GNU/Linux software distribution
       • Goal: to deliver an entirely and exclusively free distribution
     • Maintained by volunteers
     • Bureaucratic organization (policies, constitution, social contract)
     • Release when ready
     • > 10 years of history
     • > 500 MSLOC
     • > 15k packages
  5. Debian Releases
  6. [Image-only slide; no text]
  7. Debian Source Packages
  8. Source and Binary Packages
     • A source package generates one or more binary packages
     [Diagram labels: octave-core, octave-doc, octave, liboctave, liboctave-dev]
  9. Package uploads
     • There are no repositories like in other software projects
       • Although developers may privately use version control systems
     • When a bug is fixed, a new version is uploaded
       • Uploads == commits
  10. Source Packages metadata
      Source: octave
      Section: math
      Priority: extra
      Maintainer: Debian Octave Group <pkg-octave-devel@lists.alioth.debian.org>
      Uploaders: Thomas Weber <tweber@debian.org>, Sébastien Villemot <sebastien.villemot@ens.fr>
      DM-Upload-Allowed: yes
      Build-Depends: gfortran, debhelper (>= 9), automake, dh-autoreconf, texinfo …
      Standards-Version: 3.9.3
      Homepage: http://www.octave.org/
      Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git
      Vcs-Browser: http://git.debian.org/?p=pkg-octave/octave.git
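A stanza like the one on this slide is a sequence of `Field: value` lines, where a line starting with whitespace continues the previous field. The following is a minimal sketch of reading one, not a complete implementation of the Debian control format: the `parse_stanza` helper is a hypothetical name introduced here, and the sample wraps the `Uploaders` field over two lines only to show continuation handling.

```python
def parse_stanza(text):
    """Parse one control stanza into a dict mapping field name to value."""
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines in this single-stanza sketch
        if line[0] in " \t" and current is not None:
            # Continuation line: append to the previous field's value.
            fields[current] += " " + line.strip()
        else:
            # Split on the first colon only, so URLs in values survive.
            name, _, value = line.partition(":")
            current = name.strip()
            fields[current] = value.strip()
    return fields


# Sample taken from the slide; the line wrap in Uploaders is illustrative.
STANZA = """\
Source: octave
Section: math
Priority: extra
Uploaders: Thomas Weber <tweber@debian.org>,
 Sébastien Villemot <sebastien.villemot@ens.fr>
Standards-Version: 3.9.3
Homepage: http://www.octave.org/
"""

source = parse_stanza(STANZA)
uploaders = [u.strip() for u in source["Uploaders"].split(",")]
print(source["Source"], source["Section"], len(uploaders))
```

Splitting on the first colon keeps values such as the `Homepage` URL intact.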
  11. Binary Packages metadata
      Package: octave
      Priority: extra
      Section: math
      Installed-Size: 4760
      Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
      Architecture: amd64
      Version: 3.6.1-1ubuntu1ppa1~precise1
      Recommends: gnuplot, libatlas3gf-base
      Replaces: octave3.2
      Suggests: octave-info, octave-doc, octave-htmldoc
      Depends: libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1), …
      Conflicts: octave3.2
      Filename: pool/main/o/octave/octave_3.6.1-1ubuntu1ppa1~precise1_amd64.deb
      Size: 1746050
      MD5sum: 2c431556d6cf98fd8a341e865ac63058
      SHA1: b333c49e6f6cb7d4445378020dfffdb5a1626de7
      Description: GNU Octave language for numerical computations…
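Relationship fields such as `Depends` and `Recommends` are comma-separated lists of package names, each with an optional version constraint in parentheses. The sketch below splits such a value into `(name, operator, version)` tuples. It is a simplification under stated assumptions: `parse_relations` is a hypothetical helper, and alternatives (`a | b`) and architecture qualifiers are not handled, so items using them are skipped.

```python
import re

# One relation: a package name, optionally followed by "(op version)".
RELATION = re.compile(
    r"^\s*(?P<name>[a-z0-9][a-z0-9+.\-]*)"
    r"\s*(?:\(\s*(?P<op><<|<=|=|>=|>>)\s*(?P<version>[^)\s]+)\s*\))?\s*$"
)


def parse_relations(value):
    """Split a Depends-style field into (name, operator, version) tuples."""
    relations = []
    for item in value.split(","):
        match = RELATION.match(item)
        if match:  # items this sketch cannot parse are skipped
            relations.append((match["name"], match["op"], match["version"]))
    return relations


# Values copied from the slide, leaving out the elided tail.
depends = parse_relations("libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1)")
recommends = parse_relations("gnuplot, libatlas3gf-base")
print(depends)
print(recommends)
```

Unversioned relations come back with `None` for both the operator and the version.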
  12. Binary Packages metadata (same stanza as slide 11)
  13. Debian Popcon: Tracking Installations
      • Popularity: total install counts
        • Recent use (< 30 days)
        • Old use (beyond 30 days)
      • Data collected daily
      • Users voluntarily opt in
        • Source of bias
  14. Debian Bugs
      • People find bugs in binary packages
        • ~500 bugs per month
      • But bugs are linked to source packages
      • Bugs can be
        • Accepted and solved in Debian
        • Rejected
        • Forwarded to upstream
      • Everything else is similar to other bug tracking systems
        • Life cycle, comments, severity levels…
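Because bugs are found in binary packages but linked to source packages, counting bugs per project means mapping each binary package back to its source first. A minimal sketch of that aggregation follows. The mapping and the list of reports are made-up illustrative values (the package names come from the earlier octave example), and `bugs_per_source` is a hypothetical helper.

```python
from collections import Counter

# Illustrative mapping from binary package to source package.
BINARY_TO_SOURCE = {
    "octave": "octave",
    "octave-doc": "octave",
    "liboctave-dev": "octave",
}


def bugs_per_source(reported_against, binary_to_source):
    """Count bug reports per source package.

    reported_against: iterable of binary package names, one per report.
    Unknown binaries are counted under their own name.
    """
    counts = Counter()
    for binary in reported_against:
        counts[binary_to_source.get(binary, binary)] += 1
    return counts


# Made-up reports, one entry per bug.
reports = ["octave", "octave-doc", "liboctave-dev", "octave", "gnuplot"]
counts = bugs_per_source(reports, BINARY_TO_SOURCE)
print(counts["octave"], counts["gnuplot"])
```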
  15. Section 2. The UDD: what is it and where to get it
  16. Research work: main paper (at MSR 2010)
  17. Other papers at MSR 2010
  18. What is the UDD?
      • PostgreSQL database with all the information from the sources described so far
        • http://udd.debian.org
      • New dumps available every two days
        • ~ 500 MB bz2
      • Used for some Debian internal services
      • Schema too complex and too big for a slide
        • Technical detail: you need a Debian-based system to load the dump of the UDD
  19. Debian sources of data
      • Sources / Packages metadata
      • Bugs
        • including *all* archived bugs
        • 1995-96-97
      • Carnivore
      • Debtags
      • Popularity Contest
      • DEHS
      • Lintian
      • Migrations to testing
      • Uploads
        • All the way back to 1998!
      • New packages queue
      • Translations status
      • Orphaned packages
      • Screenshots
  20. !
  21. Bear in mind!
      • You can also obtain the source code of the packages
        • Easy to automate
      • And the modifications made by the Debian maintainers
      • So add product metrics to the set of data sources
      • But this is not included in the UDD
  22. Section 3. What has been done and what we can do
  23. What kind of questions does Debian answer with the UDD?
      • High priority packages that have Release Candidate blocker bugs
      • Developers with very buggy and/or outdated packages
      • Who uploaded this package to the unstable release?
      • Who reported the RC bugs since the last release?
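Questions of this kind reduce to joins between package metadata and bugs. The sketch below shows the shape of the first one (high priority packages with open release-critical bugs) against an in-memory SQLite database. The table and column names are a toy schema invented for this illustration, not the real UDD schema (which is PostgreSQL and much larger), and the rows are made up. The severities `critical`, `grave` and `serious` are the ones Debian treats as release-critical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy schema and rows, invented for this sketch (NOT the real UDD schema).
conn.executescript("""
CREATE TABLE packages (package TEXT, source TEXT, priority TEXT);
CREATE TABLE bugs (id INTEGER, source TEXT, severity TEXT, done INTEGER);
INSERT INTO packages VALUES
    ('pkg-a', 'src-a', 'required'),
    ('pkg-b', 'src-b', 'extra'),
    ('pkg-c', 'src-c', 'important');
INSERT INTO bugs VALUES
    (1, 'src-a', 'serious', 0),
    (2, 'src-b', 'grave', 0),
    (3, 'src-c', 'minor', 0),
    (4, 'src-a', 'critical', 1);
""")

# High priority packages whose source package has an open RC bug.
QUERY = """
SELECT DISTINCT p.package, b.id, b.severity
FROM packages AS p
JOIN bugs AS b ON b.source = p.source
WHERE p.priority IN ('required', 'important', 'standard')
  AND b.severity IN ('critical', 'grave', 'serious')
  AND b.done = 0
ORDER BY b.id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

The join goes through the source package because bugs are linked to source packages, while priorities are attached to binary packages.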
  24. Some questions solved in the literature
      • The popularity bias
        • http://oa.upm.es/9585/
        • Open source projects get more bug reports if they are popular
        • The actual number of bugs is not related to the number of bugs reported
        • So more bugs actually means more quality
        • Well, at least more people who decide to use the software
  25. The popularity bias
      [Figure: plot for required packages; axes labeled Log(Bugs) and Log(installations)]
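One way to quantify a relationship like the one plotted on this slide is the Pearson correlation between the logarithms of installation counts and bug counts. The sketch below is an assumption about how such a check could be done, not the method of the cited paper; the numbers are synthetic and chosen only so the result is easy to verify by hand.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def loglog_correlation(installs, bugs):
    """Correlate log10(installs) with log10(bugs), skipping zero counts."""
    pairs = [(i, b) for i, b in zip(installs, bugs) if i > 0 and b > 0]
    return pearson(
        [math.log10(i) for i, _ in pairs],
        [math.log10(b) for _, b in pairs],
    )


# Synthetic data: bugs grow exactly in proportion to installations.
installs = [10, 100, 1000, 10000]
bugs = [1, 10, 100, 1000]
r = loglog_correlation(installs, bugs)
print(round(r, 6))
```

Packages with zero installations or zero bugs are dropped because their logarithm is undefined; a real analysis would need to decide how to treat them.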
  26. Summary
      • Packages and sources metadata
        • And source code
      • Bugs
        • All the way back to 1995-96-97!
      • Popularity contest
      • Maintainers' activity (uploads)
        • All the way back to 1998!
      • And much more…
      • Now, what do you think we can do with this?