The document discusses the author's 10 years of experience analyzing large code bases, with a focus on improving package quality assurance (QA) for GNU/Linux distributions. It describes how distributions industrialized free software by acting as intermediaries between developers and users. It then outlines the author's work studying and developing tools to help distributions more efficiently ensure package installability, find incompatible packages, and predict issues from repository updates. The talk concludes with a call to action to address software fragility and a discussion of the Software Heritage project.
Ten years analysing large code bases: a perspective
1. Ten years analysing large code bases: a perspective
Roberto Di Cosmo
http://www.dicosmo.org
04/12/2015
EvoLille
Roberto Di Cosmo (INRIA/Paris Diderot) Analysing large code bases EvoLille 2015 1 / 45
2. Credits: joint work with...
Pietro Abate Jaap Boender Yacine Boufkhad
Stefano Zacchiroli Ralf Treinen Jérôme Vouillon
3. Outline
1 GNU/Linux Distributions: industrialising Free Software
2 Ten years studying and improving Package QA
    Find uninstallable packages
    Learning from the future of repositories
    Find non co-installable packages
    Find new non co-installable packages
    Filter incoming packages
3 Taking a step back
    Software is Fragile
4 Software Heritage
5 Call to action
6 Questions
4. Free Software, industrialised: distributions
Idea from FOSS in the 1990s: distributions are intermediate software
vendors between FOSS developers and users, offering to share upstream
tracking, integration, testing and QA among all of us
[Figure: Project 1, Project 2 and Project 3 in the FOSS Bazaar; user installations on the server side and client side]
6. Distributions: a “somehow” successful idea ...
Key notions: packages and package managers
7. Packages and their metadata
Package =
    some files
    some scripts
    metadata
        Identification
        Inter-package rel.
            Dependencies
            Conflicts
            Feature declarations
        Other
            Package maintainer
            Textual descriptions
            ...
Example (package metadata)
    Package: aterm
    Version: 0.4.2-11
    Section: x11
    Installed-Size: 280
    Maintainer: Göran Weinholt ...
    Architecture: i386
    Depends: libc6 (>= 2.3.2.ds1-4), libice6 | xlibs (> 4.1.0), ...
    Conflicts: suidmanager (< 0.50)
    Provides: x-terminal-emulator
    ...
A package is the elemental component of modern distribution systems (not
FOSS-specific). A working system is deployed by installing a package set
(≈ 2’000+ for modern FOSS distros).
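The Depends field in the example above is a conjunction (comma-separated) of disjunctions (separated by |), each atom optionally carrying a version constraint. A minimal sketch of reading such a field into that shape follows. The names Constraint and parse_depends are hypothetical, this is not the parser used by any distribution tool, and the operator is only recorded as written, not interpreted.

```python
import re
from typing import List, NamedTuple, Optional


class Constraint(NamedTuple):
    """One dependency atom: a package name plus an optional version test."""
    name: str
    op: Optional[str]        # operator exactly as written, or None
    version: Optional[str]   # version string exactly as written, or None


# name, optionally followed by "(op version)"
_ATOM = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9+.\-]*)"
    r"\s*(?:\(\s*(<<|<=|=|>=|>>|<|>)\s*([^)\s]+)\s*\))?\s*$"
)


def parse_depends(field: str) -> List[List[Constraint]]:
    """Return a list of clauses; each clause is a list of alternatives."""
    clauses: List[List[Constraint]] = []
    for group in field.split(","):        # "," separates conjuncts
        if not group.strip():
            continue
        alternatives = []
        for atom in group.split("|"):     # "|" separates alternatives
            match = _ATOM.match(atom)
            if match is None:
                raise ValueError(f"cannot parse dependency atom: {atom!r}")
            alternatives.append(Constraint(*match.groups()))
        clauses.append(alternatives)
    return clauses
```

For instance, parse_depends("libc6 (>= 2.3.2.ds1-4), libice6 | xlibs (> 4.1.0)") yields two clauses: one with the single alternative libc6, and one offering a choice between libice6 and xlibs.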
9. Inter-package relationships get complex...
To play Backgammon...
Package: gnubg
Version: 0.14.3+20060923-4
Depends: gnubg-data, ttf-bitstream-vera, libartsc0 (>= 1.5.0-1), ..., libgl1-mesa-glx | libgl1, ...
Conflicts: ...
...pull a few strings
Distributions grow superlinearly
Using and maintaining such large
software collections is becoming hard!
manual package review
semi-automated tools
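The "pull a few strings" effect can be illustrated by walking the dependency graph from one package and collecting everything it may pull in, directly or transitively. This is only a sketch on a toy repository: the Repo type, the function name reachable and all package names in the usage note are made up for illustration, versions are ignored, and every alternative of a disjunction is followed, so the result is an upper bound on what an installation could need.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical toy model: package name -> list of clauses,
# each clause being a list of alternative package names.
Repo = Dict[str, List[List[str]]]


def reachable(repo: Repo, root: str) -> Set[str]:
    """Breadth-first walk over all dependency alternatives starting at root."""
    seen = {root}
    queue = deque([root])
    while queue:
        pkg = queue.popleft()
        for clause in repo.get(pkg, []):   # unknown packages have no edges
            for alternative in clause:
                if alternative not in seen:
                    seen.add(alternative)
                    queue.append(alternative)
    return seen
```

With repo = {"game": [["game-data"], ["gl-a", "gl-b"]], "gl-a": [["libc"]], "gl-b": [["libc"]]}, the call reachable(repo, "game") returns the five names game, game-data, gl-a, gl-b and libc.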
10. Using and maintaining free software distributions is hard
We need advanced tools:
1 end user side – package managers
    install packages and their dependencies, ...
    ...according to the user needs and policies
2 distribution editor side – QA infrastructure (our focus today)
    find “broken” packages (now easy!)
    find packages which impact large parts of the distribution
    predict repository update woes
    identify compatibility issues
    ...
With a big boost from the Mancoosi project (http://www.mancoosi.org), we have
made progress in both areas.
Let’s focus on the distribution editor side.
12. Ensuring Quality throughout evolution
The Quality Assurance team and the release manager need to track tens of
thousands of packages, their bugs, their incompatibilities, etc. that change
every day.
It is important to catch as many installation-related errors as possible
before they hit the user and the BTS, and this requires automation.
Static analysis of package dependencies: the state of the art
find packages that cannot be installed at all, and...
spot the ones that surely need to be fixed (know who to blame)
provide advance warning for future problems
find the incompatibilities between packages
show how these incompatibilities evolve
automate package migration
17. QA 101: find individually broken packages
The installability problem
In a repository R, decide whether a package p can be installed in isolation
Theorem
The installability problem is NP-complete (Di Cosmo et al. ASE 2006)
Solving tens of thousands of NP-complete problems?
In practice: recent SAT solvers handle current instances easily
few explicit conflicts (without conflicts, just dual Horn clauses)
A practical tool
rpmcheck/debcheck (Vouillon, 2006)
finds all broken packages and provides short explanations
fast: analyses ≈ 40 000 (binary) packages in minutes
In dose3 library as distcheck, by Pietro Abate et al.
Extensively used (Abate, Di Cosmo et al. MSR 2015)
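The installability problem stated above can be sketched in a few lines. This is a minimal illustration, not how rpmcheck/debcheck or distcheck work internally: it uses plain backtracking where those tools rely on SAT solving, and the repository layout (a dict with "depends" as a list of alternative lists and "conflicts" as a list of names) and all package names are assumptions made for the example.

```python
def find_installation(repo, target):
    """Return a conflict-free, dependency-closed set of packages that
    contains `target`, or None if `target` is not installable in `repo`."""

    def in_conflict(a, b):
        # Conflicts are treated as symmetric, whichever side declares them.
        return b in repo[a]["conflicts"] or a in repo[b]["conflicts"]

    def extend(selected, pending):
        # `pending` holds dependency clauses (lists of alternatives) that
        # the packages selected so far still need to have satisfied.
        if not pending:
            return selected
        clause, rest = pending[0], pending[1:]
        if any(alt in selected for alt in clause):
            return extend(selected, rest)
        for alt in clause:
            if alt not in repo:
                continue  # alternative missing from the repository
            if any(in_conflict(alt, other) for other in selected):
                continue  # alternative clashes with an earlier choice
            found = extend(selected | {alt}, rest + repo[alt]["depends"])
            if found is not None:
                return found
        return None  # every alternative failed: backtrack

    return extend(frozenset(), [[target]])


# Hypothetical toy repository (names are illustrative only).
repo = {
    "app": {"depends": [["libfoo"], ["gl-mesa", "gl-alt"]], "conflicts": []},
    "libfoo": {"depends": [], "conflicts": ["gl-mesa"]},
    "gl-mesa": {"depends": [], "conflicts": []},
    "gl-alt": {"depends": [], "conflicts": []},
    "broken": {"depends": [["missing"]], "conflicts": []},
}

print(find_installation(repo, "app"))
print(find_installation(repo, "broken"))
```

Here "app" is installable only through the second alternative of its disjunctive dependency, while "broken" depends on a package that the repository does not contain.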
18. QA 101, level 2: what can we say about the future?
Definition (outdated packages)
p is outdated if p is not installable, and it remains uninstallable no matter
how the other packages evolve (i.e. p’s maintainer has no excuses)
Definition (challengers)
p challenges q if upgrading p “forces” q to be upgraded
What they have in common:
properties that hold in all installations of any future evolution of the
repository
seems unfeasible, but we can efficiently decide some properties of this
kind
Abate, Di Cosmo, Treinen, Zacchiroli
Learning from the Future of Component Repositories
CBSE 2012. Best paper award
Tools in the dose library now used in qa.debian.org/dose
19. QA 201: Package Co-Installability
Next step: interaction between packages
Example: is there any package which cannot be installed together
with iceweasel? with kde-full?
Definition: a set of packages is co-installable if all its packages can be
installed together.
all packages should be installable (individually!)
some package incompatibilities are expected
Can we summarise all incompatibility issues, and avoid browsing through
hundreds of hyperlinked pages?
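The definition of co-installability can be made concrete with a brute-force check. This is only a sketch under stated assumptions: the repository encoding (each package maps to a pair of dependency clauses and conflicts) and the package names are hypothetical, and the exhaustive search over subsets is exponential, so it is usable on toy inputs only.

```python
from itertools import combinations


def is_healthy(repo, installed):
    """True if every dependency clause of every installed package is
    satisfied inside `installed` and no declared conflict is violated."""
    for name in installed:
        depends, conflicts = repo[name]
        if any(not (set(clause) & installed) for clause in depends):
            return False
        if set(conflicts) & installed:
            return False
    return True


def co_installable(repo, wanted):
    """True if some healthy installation contains all packages in `wanted`."""
    wanted = set(wanted)
    others = sorted(set(repo) - wanted)
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            if is_healthy(repo, wanted | set(extra)):
                return True
    return False


def incompatible_pairs(repo):
    """Pairs of individually installable packages that cannot coexist."""
    return [
        (a, b)
        for a, b in combinations(sorted(repo), 2)
        if co_installable(repo, {a})
        and co_installable(repo, {b})
        and not co_installable(repo, {a, b})
    ]


# Hypothetical repository: name -> (dependency clauses, conflicts).
repo = {
    "browser": ([["toolkit-a"]], []),
    "desktop": ([["toolkit-b"]], []),
    "toolkit-a": ([], ["toolkit-b"]),
    "toolkit-b": ([], []),
    "editor": ([], []),
}

for pair in incompatible_pairs(repo):
    print(pair)
```

A single declared conflict between the two toolkits is enough to make four pairs incompatible, which is why a summary is more useful than a page per package.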
20. Coinst
A simplification theory for repositories, based on the extraction of a
co-installability kernel, i.e. a repository much smaller than the original but
equivalent with respect to co-installability.
Highlights:
reflexive/transitive dependency closure
equivalence classes and quotients
machine-checked proofs (in Coq!)
In a word: tough maths at work!
A tool: coinst (packaged in Debian)
Vouillon, Di Cosmo
On Software Components Co-Installability
ESEC/FSE 2011: Foundations of Software Engineering.
Best Artifact Award.
25. QA 301: New Co-Installability Issues
Compare two versions of a repository
New issues are more likely to be bugs
Can report precisely what changes caused an issue
Example (figure: graph over packages a, b, p and q):
Many new issues between packages p and q are due to a single
new conflict between packages a and b.
Vouillon, Di Cosmo
Broken Sets in Software Repository Evolution
ICSE 2013
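The comparison of two repository versions can be sketched as follows. This is a simplified illustration and not the coinst-upgrades algorithm: it assumes purely conjunctive dependencies (no alternatives), so that a pair is broken exactly when the union of the two dependency closures contains a conflict. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def closure(depends, pkg):
    """Reflexive/transitive dependency closure of `pkg`."""
    seen, stack = set(), [pkg]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(depends.get(current, ()))
    return seen


def broken_pairs(depends, conflicts):
    """Map each non-co-installable pair to the conflicts that cause it."""
    result = {}
    for p, q in combinations(sorted(depends), 2):
        needed = closure(depends, p) | closure(depends, q)
        causes = {c for c in conflicts if c <= needed}
        if causes:
            result[(p, q)] = causes
    return result


def new_issues(depends_old, conflicts_old, depends_new, conflicts_new):
    """Pairs that are broken in the new version but not in the old one."""
    old = broken_pairs(depends_old, conflicts_old)
    new = broken_pairs(depends_new, conflicts_new)
    return {pair: causes for pair, causes in new.items() if pair not in old}


# Hypothetical repository: two packages depend on a, two depend on b.
depends = {
    "p1": ["a"], "p2": ["a"],
    "q1": ["b"], "q2": ["b"],
    "a": [], "b": [],
}
old_conflicts = set()
new_conflicts = {frozenset({"a", "b"})}

issues = new_issues(depends, old_conflicts, depends, new_conflicts)
print(len(issues), "new issues")
```

Every new issue in this toy run is attributed to the same new conflict between a and b, so a report can point at one root cause instead of listing each pair separately.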
27. Finding New Co-Installability Issues
Tool coinst-upgrades
http://coinst.irill.org/upgrades
graphs illustrating each new issue
context: other packages involved, package popularities (popcon)
The new version of unoconv depends on any version of python3-uno
(figure: graph over unoconv, python3-uno and python-uno)
The new version of tdsodbc conflicts with any version of libiodbc2
(figure: graph over tdsodbc and libiodbc2)
Package libiodbc2 had been unmaintained for years
Should not be a big issue if it gets removed, right?
29. QA 501: package migration
Unstable
Testing
Conflicting goals
packages should reach testing rapidly
keep testing as stable as possible
30. The comigrate tool
Supplement/Replace Britney
Generate hints that can be fed to Britney
Interactively investigate migration issues
Run it repeatedly, studying different scenarios
Report of issues preventing package migration
http://coinst.irill.org/report/
31. Tool Core: Computing Package Migrations
(figure: a loop in which the Boolean solver passes a tentative migration to the
(co-)installability analysis, which returns new constraints to the solver)
Start with simple constraints
The Boolean solver generates a tentative migration
Check for (co-)installability issues; analyse these issues to generate
new constraints (“package A cannot migrate”, or “package A cannot
migrate without package B”)
Repeat until no more issues are found
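The refinement loop described above can be sketched as follows. This is not comigrate's implementation: the "solver" is a simple fixpoint computation standing in for the Boolean solver, the checker is a toy stand-in for the (co-)installability analysis, and all names (solve, make_checker, compute_migration, the package names) are hypothetical.

```python
def solve(candidates, blocked, requires):
    """Largest set of candidates that respects the current constraints."""
    chosen = set(candidates) - blocked
    changed = True
    while changed:
        changed = False
        for pkg in sorted(chosen):
            # "pkg cannot migrate without B": drop pkg if some B is missing.
            if not requires.get(pkg, set()) <= chosen:
                chosen.discard(pkg)
                changed = True
    return chosen


def make_checker(broken, needs_new):
    """Toy analysis: report the first issue found in a tentative migration."""
    def find_issue(migrating):
        for pkg in sorted(migrating):
            if pkg in broken:
                return ("block", pkg)
            for dep in needs_new.get(pkg, ()):
                if dep not in migrating:
                    return ("requires", pkg, dep)
        return None
    return find_issue


def compute_migration(candidates, find_issue):
    blocked, requires = set(), {}          # start with simple constraints
    while True:
        tentative = solve(candidates, blocked, requires)
        issue = find_issue(tentative)      # check the tentative migration
        if issue is None:
            return tentative               # no more issues: done
        if issue[0] == "block":
            blocked.add(issue[1])          # "package A cannot migrate"
        else:
            _, pkg, dep = issue            # "A cannot migrate without B"
            requires.setdefault(pkg, set()).add(dep)


# Hypothetical scenario: the new lib is broken and the new app needs it.
checker = make_checker(broken={"lib"}, needs_new={"app": ["lib"], "tool": ["doc"]})
migration = compute_migration({"app", "lib", "tool", "doc"}, checker)
print(sorted(migration))
```

Each reported issue adds a constraint that rules out the current tentative migration, so the loop terminates after finitely many rounds on a finite set of candidates.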
33. Bibliography and tools excerpts
Di Cosmo, Leroy, Treinen, Vouillon et al.
Managing the complexity of large free and open source package-based software distributions.
ASE 2006: Automated Software Engineering.
Di Cosmo, Vouillon.
On software component co-installability.
ESEC/FSE 2011.
Abate, Di Cosmo, Treinen, Zacchiroli.
Learning from the future of component repositories.
CBSE 2012: Component Based Software Engineering.
Vouillon, Dogguy, Di Cosmo.
Easing software component repository evolution.
ICSE 2014.
Abate, Di Cosmo, Gesbert, Le Fessant, Treinen, Zacchiroli.
Mining component repositories for installability issues.
MSR 2015.
Claes, Mens, Di Cosmo, Vouillon.
A historical analysis of Debian package incompatibilities.
MSR 2015.
Tools
Cudf library: http://gforge.inria.fr/projects/cudf/
Dose library: http://gforge.inria.fr/projects/dose/
Coinst suite: http://coinst.irill.org
Debian QA: http://qa.debian.org/dose
37. A recurring pattern
In all examples above
identify a real world problem whose solution requires a research effort
work hard to find a solution
implement a tool, validate it on real world cases
publish a research article
foster adoption (the hardest part!)
(figures: “In a picture” and “Under the hood”)
Question:
What were the
technical prerequisites
that made this work possible?
41. Technical prerequisites
Availability
all the (history of) Debian packages (after 2005)
no technical restrictions
no legal restrictions on content or metadata
Traceability
Debian packages have
a unique identifier
a central reference repository
Uniformity
Debian packages: a central catalog with
uniform metadata structure
uniform naming and versioning schema
These are all essential features for reproducibility and for preservation...
... we need them for all software!
42. Availability: software is fragile
An example is worth a thousand words...
let’s see a few
44. Inconsiderate or malicious loss of code
The Year 2000 Bug ... uncovered an inconvenient truth
in 1999, an estimated 40% of companies had either lost, or
thrown away the original source code for their systems!
CodeSpaces: source code hosting, 2007-2014
Yes, for seven years all seemed ok.
No, they did not recover the data.
45. Business-driven loss of code support: Google
46. Business-driven loss of code support: Gitorious
From: Rolf Bjaanes <rolf@gitorious.org>
To: zack@upsilon.cc
Subject: Gitorious.org is dead, long live Gitorious.o
Message-Id: <30589491.20150416155909.552fdc4d164758
Hi zacchiro,
I’m Rolf Bjaanes, CEO of Gitorious, and you are receiving this email because you
have a user on gitorious.org. As you may know, Gitorious was acquired by GitLab [1]
about a month ago (editor’s note: 3/3/2015), and we announced that Gitorious.org
would be shutting down at the end of May, 2015.
... Rolf
48. Traceability: disruption of the web of reference
Web links are not permanent (even permalinks)
there is no general guarantee that a URL... which at one time
points to a given object continues to do so
T. Berners-Lee et al. Uniform Resource Locators. RFC 1738.
URLs used in articles decay!
Analysis of URLs cited in IEEE Computer and the Communications of the ACM (CACM),
1995-1999:
the half-life of a referenced URL is approximately 4 years from its
publication date
D. Spinellis. The Decay and Failures of Web References.
Communications of the ACM, 46(1):71-77, January 2003.
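To get a feel for what a 4-year half-life implies, one can read it as exponential decay. That reading is an assumption made here for illustration; the slide only reports the half-life. Under it, the fraction S(t) of cited URLs still reachable t years after publication would be:

```latex
\[
  S(t) = 2^{-t/4}, \qquad
  S(4) = 0.5, \quad
  S(8) = 0.25, \quad
  S(10) = 2^{-2.5} \approx 0.18 .
\]
```

In other words, under this model fewer than one in five cited URLs would still work ten years after an article appears.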
49. Uniformity: nowhere to be seen
Reference repositor(ies)
Bitbucket
GitHub
Gitorious (no, scratch this)
Google Code (no, scratch this)
Maven
Sourceforge
your institution’s forge
your home page
...
And they are all different / incompatible
51. The Software Heritage Project
Our mission
Collect, organise, preserve and share all the software that lies at the heart of
our culture and our society.
Provides exactly
availability
traceability
uniformity
52. We are working on the foundations
one infrastructure to build them all
53. Fostering wider education to computing
A global source referencing all software
a SourceBook for technological education
intrinsic persistent identifiers for stable course materials
extensive access to real-world documentation
54. Supporting more accessible and reproducible Science
A global library referencing all software used in all research fields
completes the infrastructure for Open Access in Science
provides intrinsic persistent identifiers needed for scientific
reproducibility
enables large scale, verifiable Software Studies
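An identifier is intrinsic when it is computed from the content itself, so anyone holding the same bytes can recompute and verify it without asking a registry. The slides do not say how such identifiers are built; the sketch below assumes, purely for illustration, the hashing convention Git uses for file contents (SHA-1 over a "blob <length>\0" header followed by the data). The function name is hypothetical.

```python
import hashlib


def intrinsic_id(content: bytes) -> str:
    """Content-derived identifier, following Git's blob hashing convention
    (an assumption for this sketch, not a scheme stated in the slides)."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


source = b"print('hello, reproducible science')\n"
identifier = intrinsic_id(source)

# The same bytes always yield the same identifier, wherever they are stored.
print(identifier)
print(intrinsic_id(source) == identifier)
print(intrinsic_id(source + b" ") == identifier)
```

Unlike a URL, such an identifier does not depend on where the content is hosted, which is the property needed for stable references in articles and course materials.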
56. The Knowledge Conservancy Magic Triangle
The Knowledge Conservancy Magic Triangle
Legend (links are important!)
articles: ArXiv, HAL, ...
data: Zenodo, ...
software: Software Heritage tackles this
61. Need your help
make it easy to integrate your work
development workflow
publication workflow
contribute importers
make it ok to integrate, from the legal point of view
make licences explicit
make licences of dependencies explicit
make it useful for research
contribute to the API
help us make Software Heritage sustainable
support/sponsorship
open process and collaboration
62. Focus on the legal issues
a plurality of concerns
Who owns the rights to your research?
articles, data, software
too often forgotten: metadata
Software Track in Science of Computer Programming, 2015
You own the software, but who owns the metadata?
we need to recover our rights
it is possible!
compulsory exclusive copyright transfer for free
is illegal in France (art L. 131-4 of CPI)
is debatable in all jurisdictions
see Free Scientific Publication
paying the editors (OpenAire) is not a solution
64. Questions
Questions?
Subscribe to
mailing list: swh-science@inria.fr
https://sympa.inria.fr/sympa/info/swh-science