Finding self-organized criticality in collaborative work via repository mining
J. J. Merelo1, P. A. Castillo1, Mario García-Valdez2
1 University of Granada (Spain)
2 Instituto Tecnológico de Tijuana (México)
Development teams eventually become complex
systems, mainly in collaborative work environments.
Relations and collaborations take place through the
Pattern mining and analysing social-based information
is a complex problem.
Analysing self-organization in collaborative work
Using graphic tools to analyse the dynamics in
collaborative work environments.
To explore and analyse relations-based data:
Do developers self-organize?
Contribute to open science tools and methodologies.
In Statistical Physics, criticality is defined as a type of
behaviour observed when a system undergoes a phase transition.
A state on the edge between two different types of
behaviour is called the critical state, and in this state
the system is at criticality.
Example: The sandpile model
The sandpile model of self-organized criticality:
Dropping an additional grain on the pile may set off
avalanches that slide down the pile's slopes.
Small variation, large effect
We add one grain to the pile, so on average the
steepness of the slopes increases.
The slopes evolve to a critical state where a single
grain of sand is likely to settle on the pile, or to trigger
an avalanche.
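The sandpile dynamics described here can be illustrated with a minimal sketch, assuming the classic two-dimensional formulation in which a cell topples once it holds four grains and sends one grain to each neighbour. The grid size, the threshold and the function names `topple` and `drop_grains` are illustrative assumptions, not taken from the slides.

```python
import random


def topple(grid, threshold=4):
    """Relax the grid in place and return the avalanche size (number of topplings)."""
    size = len(grid)
    topplings = 0
    unstable = [(r, c) for r in range(size) for c in range(size)
                if grid[r][c] >= threshold]
    while unstable:
        r, c = unstable.pop()
        if grid[r][c] < threshold:
            continue  # already relaxed by an earlier step
        grid[r][c] -= threshold
        topplings += 1
        if grid[r][c] >= threshold:
            unstable.append((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            # Grains pushed beyond the border fall off the pile.
            if 0 <= nr < size and 0 <= nc < size:
                grid[nr][nc] += 1
                if grid[nr][nc] >= threshold:
                    unstable.append((nr, nc))
    return topplings


def drop_grains(size=11, grains=2000, seed=0):
    """Drop grains one at a time on random cells; record each avalanche size."""
    rng = random.Random(seed)
    grid = [[0] * size for _ in range(size)]
    avalanches = []
    for _ in range(grains):
        r, c = rng.randrange(size), rng.randrange(size)
        grid[r][c] += 1
        avalanches.append(topple(grid))
    return grid, avalanches


if __name__ == "__main__":
    _, avalanches = drop_grains()
    print("largest avalanche:", max(avalanches))
    print("grains that caused no toppling:", avalanches.count(0))
```

Most grains settle without any toppling while a few set off large avalanches, which is the "small variation, large effect" behaviour the example refers to.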
To present the underlying concepts and ideas from
Statistical Physics and nonlinear dynamics that could
explain relations in collaborative work environments.
Find out the dynamics underlying collaboration and
We examined 4 repositories where the collaborative
writing of scientific papers takes place.
Analysing changes in files, looking for the existence of:
1. a scale-free structure
2. long-distance correlations
3. pink noise
In this report we work on a repository for several papers.
Repositories with a certain “length”: more than 50
Macro measures extracted from the size of changes.
Several macro measures extracted from the size of
changes to the files in the repository.
• Sequence of changes
• Timeline of commit sizes
• Change sizes ranked in descending order
• Long-distance correlations
• Presence of pink noise (1/f)
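As a rough sketch of how the size of changes could be extracted, the following parses the output of `git log --numstat` and sums added and deleted lines per commit. The helper names `parse_numstat` and `commit_sizes`, and the optional `suffix` filter, are hypothetical; the slides do not say which tooling produced the measures.

```python
import subprocess


def parse_numstat(log_text, suffix=None):
    """Return lines changed (added + deleted) per commit, in log order."""
    sizes = []
    for line in log_text.splitlines():
        if line.startswith("commit "):
            sizes.append(0)  # start counting a new commit
            continue
        parts = line.split("\t")
        if len(parts) != 3 or not sizes:
            continue  # blank line or anything that is not a numstat row
        added, deleted, path = parts
        if added == "-" or (suffix and not path.endswith(suffix)):
            continue  # binary file, or a file outside the (assumed) filter
        sizes[-1] += int(added) + int(deleted)
    return sizes


def commit_sizes(repo_path, suffix=None):
    """Run git on a local clone and return change sizes, oldest commit first."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--reverse", "--numstat",
         "--pretty=format:commit %H"],
        capture_output=True, text=True, check=True,
    )
    return parse_numstat(result.stdout, suffix)
```

Calling `commit_sizes(".", suffix=".tex")` inside a clone would give the sequence of change sizes for LaTeX files only; the resulting list is the input assumed by the other measures.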
Sequence of changes: static for a long time, followed by big changes.
Timeline of commit sizes: periods with small changes vs. others that
alternate big and small ones.
Change sizes ranked in descending order: many small changes vs. few
commits that change many lines.
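One simple way to check the ranked change sizes for a scale-free shape is to fit a straight line to log(size) against log(rank). The sketch below uses an ordinary least-squares fit; the function name `rank_size_slope` is an assumption, and a roughly straight line is only suggestive of a power law, not a formal test.

```python
import math


def rank_size_slope(sizes):
    """Least-squares slope of log(size) against log(rank), largest change first."""
    ranked = sorted((s for s in sizes if s > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two non-zero change sizes")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(size) for size in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance
```

A markedly negative slope with points close to the fitted line matches the "many small changes, few large ones" pattern; empty commits (size 0) are skipped because their logarithm is undefined.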
Long-distance correlations: significant if the lines go
over the mean (dashed line).
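Long-distance correlations in the series of change sizes can be sketched with a plain sample autocorrelation. The significance band used here, ±1.96/√n for a white-noise series, is a common convention and an assumption; it may differ from the dashed line drawn in the slides. The function names are illustrative.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation of the series for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    variance = sum(d * d for d in deviations)
    if variance == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(deviations[i] * deviations[i + lag] for i in range(n - lag)) / variance
        for lag in range(1, max_lag + 1)
    ]


def significant_lags(series, max_lag):
    """Lags whose autocorrelation leaves the assumed white-noise band."""
    bound = 1.96 / math.sqrt(len(series))
    values = autocorrelation(series, max_lag)
    return [lag for lag, r in enumerate(values, start=1) if abs(r) > bound]
```

If many distant lags are reported by `significant_lags`, the size of a change still carries information about changes made much later, which is what a long-distance correlation means here.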
Presence of pink noise, as measured by the power spectral density (1/f):
the spectrum should present a slope equal to -1 on a log-log plot.
There is no clear downward trend. The presence of pink noise is not as
clear as the other two characteristics.
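The slope criterion for pink noise can be checked with a small sketch: estimate the power spectral density with a periodogram and fit a line to log(power) against log(frequency). To stay dependency-free this version uses a naive O(n²) discrete Fourier transform, which is an assumption made for illustration; an FFT library would normally be used. The function names are hypothetical.

```python
import cmath
import math


def periodogram(series):
    """Return (frequencies, power) for the positive frequencies below Nyquist."""
    n = len(series)
    mean = sum(series) / n
    centred = [x - mean for x in series]
    freqs, power = [], []
    for k in range(1, (n - 1) // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centred))
        freqs.append(k / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def spectral_slope(series):
    """Least-squares slope of log(power) against log(frequency)."""
    freqs, power = periodogram(series)
    points = [(math.log(f), math.log(p)) for f, p in zip(freqs, power) if p > 0]
    if len(points) < 2:
        raise ValueError("not enough spectral points to fit a slope")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    return covariance / variance
```

A slope near -1 is compatible with 1/f noise, a slope near 0 with white noise. Periodogram estimates from a single short series are noisy, so the fitted value should be read as indicative only.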
After analysing several repositories for scientific papers, we conclude
that they are in a critical state:
• changes have a scale-free form,
• there are long-distance correlations, and
• pink noise is present (only in some cases).
Open Science + reproducibility: draw your own
conclusions using the programs and data published at:
“Measuring progress in literature and in other creative
endeavours, like programming”
Pedro A. Castillo
University of Granada