The Ultimate Debian Database


Some comments about the sources of data stored in the Ultimate Debian Database

Published in: Education, Technology


  1. The Ultimate Debian Database
     Israel Herraiz <israel.herraiz@upm.es>
     Davis, CA, July 26th 2012
     Download these slides at http://slideshare.net/herraiz/the-ultimate-debian-database
  2. Outline
     1. Debian: what is it and sources of data
     2. The UDD: what is it and where to get it
     3. What has been done and what we can do
  3. Part 1. Debian: what is it and sources of data
  4. Debian
     • GNU/Linux software distribution
       • Goal: to deliver an entirely and exclusively free distribution
     • Maintained by volunteers
     • Bureaucratic organization (policies, constitution, social contract)
     • Release when ready
     • > 10 years of history
     • > 500 MSLOC
     • > 15k packages
  5. Debian Releases
  6. [Image-only slide]
  7. Debian Source Packages
  8. Source and Binary Packages
     • A source package generates one or more binary packages
     [Diagram labels: octave-core, octave-doc, octave, liboctave, liboctave-dev]
  9. Package uploads
     • There are no version control repositories as in other software projects
       • Although developers may privately use version control systems
     • When a bug is fixed, a new version is uploaded
       • Uploads == commits
  10. Source Packages metadata
      Source: octave
      Section: math
      Priority: extra
      Maintainer: Debian Octave Group <pkg-octave-devel@lists.alioth.debian.org>
      Uploaders: Thomas Weber <tweber@debian.org>, Sébastien Villemot <sebastien.villemot@ens.fr>
      DM-Upload-Allowed: yes
      Build-Depends: gfortran, debhelper (>= 9), automake, dh-autoreconf, texinfo …
      Standards-Version: 3.9.3
      Homepage: http://www.octave.org/
      Vcs-Git: git://git.debian.org/git/pkg-octave/octave.git
      Vcs-Browser: http://git.debian.org/?p=pkg-octave/octave.git
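The stanza on this slide uses the "Field: value" control-file layout, where a line starting with whitespace continues the previous field. As a minimal sketch (not the parser Debian or the UDD uses), the function `parse_stanza` below is a hypothetical helper that turns one stanza into a dictionary. The sample text reuses fields from the slide; the truncated Build-Depends field is left out, and the line break inside Uploaders is an assumption added to exercise continuation lines.

```python
def parse_stanza(text):
    """Parse one control-file stanza into a dict of field name -> value.

    Hypothetical helper for illustration; it handles a single stanza only.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines in this single-stanza sketch
        if line[0] in " \t":
            # Continuation line: it belongs to the most recent field.
            if current is None:
                raise ValueError("continuation line before any field")
            fields[current] += " " + line.strip()
        else:
            # Split on the first colon only, so URLs in values stay intact.
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"not a field line: {line!r}")
            current = name.strip()
            fields[current] = value.strip()
    return fields


# Fields copied from the slide; the wrapped Uploaders line is an assumption.
STANZA = """\
Source: octave
Section: math
Priority: extra
Maintainer: Debian Octave Group <pkg-octave-devel@lists.alioth.debian.org>
Uploaders: Thomas Weber <tweber@debian.org>,
 Sébastien Villemot <sebastien.villemot@ens.fr>
Standards-Version: 3.9.3
Homepage: http://www.octave.org/
"""

source = parse_stanza(STANZA)
uploaders = [u.strip() for u in source["Uploaders"].split(",")]
print(source["Source"], source["Section"], len(uploaders))
```

Splitting on the first colon only is a deliberate choice here: fields such as Homepage and Vcs-Git carry URLs whose own colons must stay in the value.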
  11. Binary Packages metadata
      Package: octave
      Priority: extra
      Section: math
      Installed-Size: 4760
      Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
      Architecture: amd64
      Version: 3.6.1-1ubuntu1ppa1~precise1
      Recommends: gnuplot, libatlas3gf-base
      Replaces: octave3.2
      Suggests: octave-info, octave-doc, octave-htmldoc
      Depends: libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1), …
      Conflicts: octave3.2
      Filename: pool/main/o/octave/octave_3.6.1-1ubuntu1ppa1~precise1_amd64.deb
      Size: 1746050
      MD5sum: 2c431556d6cf98fd8a341e865ac63058
      SHA1: b333c49e6f6cb7d4445378020dfffdb5a1626de7
      Description: GNU Octave language for numerical computations…
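Relationship fields such as Depends and Recommends are comma-separated lists in which each entry may carry a version constraint in parentheses, and "|" separates alternatives. The sketch below splits such a field into (name, operator, version) tuples. `parse_relations` is a hypothetical helper and its grammar is deliberately simplified: it assumes no architecture qualifiers or other extensions. Only the two Depends entries visible on the slide are used, since the rest is elided there.

```python
import re

# Simplified pattern: a package name, optionally followed by "(op version)".
_RELATION = re.compile(
    r"^(?P<name>[a-z0-9][a-z0-9+.\-]*)\s*"
    r"(?:\((?P<op><<|<=|=|>=|>>)\s*(?P<version>[^)\s]+)\))?$"
)


def parse_relations(value):
    """Return a list of clauses; each clause is a list of alternatives.

    Each alternative is a (name, operator, version) tuple, where operator
    and version are None when the entry has no version constraint.
    """
    clauses = []
    for clause in value.split(","):
        clause = clause.strip()
        if not clause:
            continue
        alternatives = []
        for alt in clause.split("|"):
            match = _RELATION.match(alt.strip())
            if match is None:
                raise ValueError(f"unsupported relation syntax: {alt!r}")
            alternatives.append((match["name"], match["op"], match["version"]))
        clauses.append(alternatives)
    return clauses


depends = parse_relations("libamd2.2.0 (>= 1:3.4.0), libarpack2 (>= 2.1)")
recommends = parse_relations("gnuplot, libatlas3gf-base")
print(depends)
print(recommends)
```

Keeping alternatives as an inner list preserves the difference between "A and B" (two clauses) and "A or B" (one clause with two alternatives), which matters when building a dependency graph.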
  13. Debian Popcon: Tracking Installations
      • Popularity: total install counts
        • Recent Use (< 30 days)
        • Old Use (Beyond 30 days)
      • Data collected daily
      • Users voluntarily opt in
        • Source of bias
  14. Debian Bugs
      • People find bugs in binary packages
        • ~500 bugs per month
      • But bugs are linked to source packages
      • Bugs can be
        • Accepted and solved in Debian
        • Rejected
        • Forwarded to upstream
      • Everything else, similar to other bug tracking systems
        • Life cycle, comments, severity levels…
  15. Part 2. The UDD: what is it and where to get it
  16. Research work: main paper (at MSR 2010)
  17. Other papers at MSR 2010
  18. What is the UDD?
      • PostgreSQL database with all the information from the data sources described so far
        • http://udd.debian.org
      • New dumps available every two days
        • ~ 500 MB bz2
      • Used for some Debian internal services
      • Schema too complex and too big for a slide
        • Technical detail: you need a Debian-based system to load the dump of the UDD
  19. Debian sources of data
      • Sources / Packages
      • Bugs
        • including *all* archived bugs
        • 1995-96-97
      • Carnivore
      • Debtags
      • Popularity Contest
      • DEHS
      • Lintian metadata
      • Migrations to testing
      • Uploads
        • All the way back to 1998!
      • New packages queue
      • Translations status
      • Orphaned packages
      • Screenshots
  20. !
  21. Bear in mind!
      • You can also obtain the source code of the packages
        • Easy to automate
      • And the modifications done by the Debian maintainers
      • So add product metrics to the set of data sources
      • But this is not included in the UDD
  22. Part 3. What has been done and what we can do
  23. What kind of questions does Debian answer with the UDD?
      • High priority packages that have Release Candidate blocker bugs
      • Developers with very buggy and/or outdated packages
      • Who uploaded this package to the unstable release?
      • Who reported the RC bugs since the last release?
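The first question on this slide can be phrased as a join between package metadata and bug reports. The sketch below is illustrative only: it runs on an in-memory SQLite database rather than on the PostgreSQL UDD, and the table names, column names, and sample rows are all assumptions, not the real UDD schema. It also assumes that "high priority" means the required, important, and standard priorities, and it treats the critical, grave, and serious severities as release-critical.

```python
import sqlite3

# Toy stand-in for the UDD; the schema and the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE packages (package TEXT PRIMARY KEY, priority TEXT);
    CREATE TABLE bugs (
        id INTEGER PRIMARY KEY,
        package TEXT,
        severity TEXT,
        done INTEGER  -- 1 if the bug has been closed
    );
    """
)
conn.executemany(
    "INSERT INTO packages VALUES (?, ?)",
    [("pkg-a", "required"), ("pkg-b", "important"), ("pkg-c", "extra")],
)
conn.executemany(
    "INSERT INTO bugs VALUES (?, ?, ?, ?)",
    [
        (1, "pkg-a", "serious", 0),
        (2, "pkg-a", "minor", 0),
        (3, "pkg-b", "grave", 1),     # release-critical but already closed
        (4, "pkg-c", "critical", 0),  # release-critical, low-priority package
        (5, "pkg-a", "grave", 0),
    ],
)

# Open release-critical bugs per high-priority package.
QUERY = """
SELECT p.package, p.priority, COUNT(*) AS rc_bugs
FROM packages AS p
JOIN bugs AS b ON b.package = p.package
WHERE p.priority IN ('required', 'important', 'standard')
  AND b.severity IN ('critical', 'grave', 'serious')
  AND b.done = 0
GROUP BY p.package, p.priority
ORDER BY rc_bugs DESC, p.package
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

Against a real UDD instance the same shape of query would apply, but the table and column names would need to be checked against the actual schema first.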
  24. Some questions solved in the literature
      • The popularity bias
        • http://oa.upm.es/9585/
        • Open source projects get more bug reports if they are popular
        • The actual number of bugs is not related to the number of bugs reported
        • So more bugs actually means more quality
        • Well, at least more people who decide to use the software
  25. The popularity bias
      [Figure: plot with axes Log(Bugs) and Log(installations); label: Required packages]
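The plot on this slide relates Log(Bugs) to Log(installations). One simple way to quantify such a relation is the Pearson correlation of the log-transformed counts. The sketch below is not the analysis from the cited paper: `pearson` is a hypothetical helper, and the sample numbers are made up purely to show the computation. Packages with zero bugs or zero installations would have to be filtered out first, because the logarithm is undefined for them.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# (installations, reported bugs) per package: made-up illustrative values.
SAMPLE = [(100, 2), (1000, 9), (10000, 35), (100000, 180), (1000000, 700)]

log_installs = [math.log10(installs) for installs, _ in SAMPLE]
log_bugs = [math.log10(bugs) for _, bugs in SAMPLE]

r = pearson(log_installs, log_bugs)
print(f"correlation of log(bugs) with log(installations): {r:.3f}")
```

With real data, the install counts would come from the Popularity Contest and the bug counts from the bug tracking data, both of which the slides list among the UDD sources.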
  26. Summary
      • Packages and sources metadata
        • And source code
      • Bugs
        • All the way back to 1995-96-97!
      • Popularity contest
      • Maintainer activity (uploads)
        • All the way back to 1998!
      • And much more…
      • Now, what do you think we can do with this?
