Theory and Practice of Reproducible
Research
OGRS, Perugia, October 12, 2016
Riccardo Rigon, Francesco Serafin, Marialaura Bancheri
Antonio Canova, Le tre Grazie
2
Antonio Canova's gypsum statues carry a series of
little marks. They served the stonemasons to
reproduce the work “industrially”. Art became
“reproducible” for the first time.
Rigon & Al.
Canova ?
http://simplystatistics.org/2013/01/23/statisticians-and-computer-scientists-if-there-is-no-code-there-is-no-paper/
I have been frustrated often with statisticians and
computer scientists who write papers where they
develop new methods and seem to demonstrate that
those methods blow away all their competitors. But
then no software is available to actually test and see if
that is true. … In my mind, new methods/analyses
without software are just vaporware … If there is no
code, there is no paper.
By Jeff Leek*
4
Science must be reproducible (i.e. repeatable)
This is fundamental. It means that everyone (in principle) should
be able to take what you wrote, the experiment you did, the
mathematics you derived, and do it again with their own resources.
“In principle” means that science is often
not shared … for various reasons …
Why reproducibility ?
5
Not everyone can reproduce scientific achievements
“In principle” means that
S/he must be trained to do it (there are problems of transmission of
information here). And, in fact, more advanced results can be difficult to
grasp, even for the very same authors.
Introduction
6
Getting more: Replicability
Reproducibility vs Replicability
7
Analysing a paper for reproducibility
(the case of Formetta et al., 2011)
Geosci. Model Dev., 4, 943–955, 2011
www.geosci-model-dev.net/4/943/2011/
doi:10.5194/gmd-4-943-2011
© Author(s) 2011. CC Attribution 3.0 License.
Geoscientific
Model Development
The JGrass-NewAge system for forecasting and managing the
hydrological budgets at the basin scale: models of flow generation
and propagation/routing
G. Formetta1, R. Mantilla2, S. Franceschi3, A. Antonello3, and R. Rigon1
1University of Trento, 77 Mesiano St., Trento, 38123, Italy
2The University of Iowa, C. Maxwell Stanley Hydraulics Laboratory, Iowa 52242-1585, USA
3Hydrologis S.r.l., Bolzano, BZ, Italy
Received: 16 April 2011 – Published in Geosci. Model Dev. Discuss.: 29 April 2011
Revised: 20 September 2011 – Accepted: 31 October 2011 – Published: 4 November 2011
Abstract. This paper presents a discussion of the predic-
tive capacity of the implementation of the semi-distributed
hydrological modeling system JGrass-NewAge. This model
focuses on the hydrological budgets of medium scale to large
scale basins as the product of the processes at the hillslope
scale with the interplay of the river network. The part of the
modeling system presented here deals with the: (i) estimation
of the space-time structure of precipitation, (ii) estimation of
runoff production; (iii) aggregation and propagation of flows
in channel; (v) estimation of evapotranspiration; (vi) auto-
matic calibration of the discharge with the method of particle
swarming.
The system is based on a hillslope-link geometrical par-
tition of the landscape, combining raster and vectorial treat-
ment of hillslope data with vector based tracking of flow in
channels. Measured precipitation are spatially interpolated
with the use of kriging. Runoff production at each channel
link is estimated through a peculiar application of the Hymod
model. Routing in channels uses an integrated flow equation
and produces discharges at any link end, for any link in the
river network. Evapotranspiration is estimated with an im-
plementation of the Priestley-Taylor equation. The model
system assembly is calibrated using the particle swarming
algorithm. A two year simulation of hourly discharge of the
Little Washita (OK, USA) basin is presented and discussed
with the support of some classical indices of goodness of fit,
and analysis of the residuals. A novelty with respect to tra-
ditional hydrological modeling is that each of the elements
above, including the preprocessing and the analysis tools,
is implemented as a software component, built upon Object
Modelling System v3 and jgrasstools prescriptions, that can
be cleanly switched in and out at run-time, rather than at
compiling time. The possibility of creating different modeling
products by the connection of modules with or without
the calibration tool, as for instance the case of the present
modeling chain, reduces redundancy in programming, promotes
collaborative work, enhances the productivity of researchers,
and facilitates the search for the optimal modeling
solution.
Correspondence to: G. Formetta (formetta@ing.unitn.it)
1 Introduction
Hydrological forecasting over time has focused on differ-
ent issues. Determining the discharge of rivers during flood
events has been a central topic for more than a century;
firstly through the rational model of Mulvaney (1851), later
through the use of instantaneous unit hydrograph models
(Sherman, 1932; Dooge, 1959), and more recently including
the geomorphological approach (i.e. GIUH; Rodríguez-Iturbe
and Valdés, 1979; Gupta and Waymire, 1980; Rosso,
1984; D’Odorico and Rigon, 2003). Even models of runoff
generation such as Topmodel (Beven and Kirkby, 1979;
Beven, 2001; Franchini et al., 1996) have been used mainly
for this purpose.
Subsequently, however, the water resource and river man-
agement required the need to estimate a whole set of hydro-
logical quantities such as discharge, evapotranspiration, and
soil moisture, bringing very soon to the implementation of
more comprehensive modeling systems, like the pioneering
Stanford watershed model (Crawford and Linsley, 1966), the
Sacramento model (e.g. Burnash et al., 1973), and the PRMS
model (Leavesley et al., 1983). They were usually based on
the idea of intercommunicating compartments (reservoirs),
each representing a process domain, each one with its resi-
dence time.
Published by Copernicus Publications on behalf of the European Geosciences Union.
Formetta et al. 2011
8
This is a paper, which I co-authored, dealing with a rainfall-runoff model.
It mostly presents a hydrological model, with an application to a case
study
Formetta et al. 2011
9
Reproducibility, in this case, first requires
Consistency of notation
In this respect, the paper is certainly consistent (guaranteeing it is part of
the peer-review process).
A stronger statement would require consistency of notation
across a series of companion papers.
But this paper, in particular, is not a heavy
theoretical treatment of some topic, and
notation is not really crucial here.
Notation helps
10
Different story for this paper
(the case of Botter et al., 2010)
Transport in the hydrologic response: Travel time
distributions, soil moisture dynamics, and the old
water paradox
Gianluca Botter,1 Enrico Bertuzzo,2 and Andrea Rinaldo1,2
Received 8 July 2009; revised 23 October 2009; accepted 29 October 2009; published 12 March 2010.
[1] We propose a mathematical framework for the general definition and computation of
travel time distributions defined by the closure of a catchment control volume, where the
input flux is an arbitrary rainfall pattern and the output fluxes are green and blue water
flows (namely, evapotranspiration and the hydrologic response embedding runoff
production through soil water dynamics). The relevance of the problem is both practical,
owing to implications in hydrologic watershed modeling, and conceptual for the linkages
and the explanations the theory provides, chiefly concerning the role of geomorphology,
climate, soils, and vegetation through soil water dynamics and the treatment of the so‐
called old water paradox. The work focuses in particular on the origins of the conditional
and time‐variant nature of travel time distributions and on the differences between unit
hydrographs and travel time distributions. Both carrier flow and solute matter transport in
the control volume are accounted for coherently. The key effect of mixing processes
occurring within runoff production is also investigated, in particular by a model that
assumes that mobilization of soil water involves randomly sampled particles from the
available storage. Travel time distributions are analytically expressed in terms of the major
water fluxes driving soil moisture dynamics, irrespectively of the specific model used to
compute them. Relevant numerical examples and a set of generalized applications are
provided and discussed.
Citation: Botter, G., E. Bertuzzo, and A. Rinaldo (2010), Transport in the hydrologic response: Travel time distributions, soil
moisture dynamics, and the old water paradox, Water Resour. Res., 46, W03514, doi:10.1029/2009WR008371.
1. Introduction
[2] The age of water (or residence time) represents the
time spent by water molecules ideally sampled from a given
hydrologic system within the reference control volume
(measured since the entry through rainfall). Thus, the age of
water blends in a single quantitative attribute information
about hydrological and chemical storages, flow pathways,
and water sources [e.g., McGuire and McDonnell, 2006].
Several field observations (especially built through exten-
sive rainfall/runoff dating by isotope hydrology) and a few
theoretical results have established the so‐called “old water
paradox,” according to which a sizable part of the runoff
within the hydrologic response of catchment transport vo-
lumes is constituted by aged water particles (i.e., by water
particles injected at times preceding the event causally re-
lated to the observed runoff) [e.g., Maloszewski and Zuber,
1982; McDonnell, 1990; McDonnell et al., 1991; Stewart
and McDonnell, 1991; Wilson et al., 1991a, 1991b;
Leaney et al., 1993; Rodhe et al., 1996; Cirmo and
McDonnell, 1998; Nyberg et al., 1999; Peters and
Ratcliffe, 1998; Burns et al., 1998; Weiler et al., 2003;
McGuire et al., 2007; Botter et al., 2007, 2008a, 2009]. The
release of old water has been explained by the propagation
of pressure waves induced by precipitation inputs with a
celerity exceeding the pore water velocity [e.g., Beven,
1981, 1989b], including displacement of water previously
immobilized within the soil matrix into preferential flow
pathways [e.g., Beven and Germann, 1982]. However, some
of the physical processes controlling the release of preevent
water from catchments are still poorly understood or
roughly modeled, and the observational data do not suggest
either universal behaviors, nor do they support linear and
time‐invariant behaviors as assumed by unit hydrograph
schemes [e.g., Weiler and McDonnell, 2006]. The com-
plexity of the mixing patterns involving event and preevent
waters in hillslopes is partly a byproduct of the structural
complexity of subsurface environments, which are typically
characterized by pronounced heterogeneity and time vari-
able connectivity of flow pathways. For this reason, it is
inappropriate to use the point‐scale physical laws deter-
mining the movement of water and solutes within hillslopes
to make predictions at larger scales because of the nonlin-
earity of flow processes and the uncertain distribution of
hydrologic, geological and morphological properties of
control volumes [e.g., Beven, 1989a, 2006; Kirchner, 2009].
Hence, lumped approaches are frequently employed to
describe in an effective manner the overall behavior of
hillslopes/catchments. In particular, the water travel time
1 Dipartimento di Ingegneria Idraulica Marittima Ambientale e
Geotecnica, Università degli Studi di Padova, Padua, Italy.
2 Laboratory of Ecohydrology, Faculté ENAC, Ecole Polytechnique
Federale, Lausanne, Switzerland.
Copyright 2010 by the American Geophysical Union.
0043‐1397/10/2009WR008371
WATER RESOURCES RESEARCH, VOL. 46, W03514, doi:10.1029/2009WR008371, 2010
W03514 1 of 18
R. Rigon
Botter et al., 2010
11
This is an outstanding paper dealing with transport and residence times, which
I read several times over the last few months in order to reproduce their
research (with my own tools)
It is mostly a theoretical paper, with an application to an idealised case study
Botter et al., 2010
12
JGrass-NewAGE 1.0
Hymod and RHymod in fig.(7.9).
Figure 7.9: Modelling solutions: Hymod (red dashed line) and RHymod (blue dashed line).
Back to Formetta et al., 2011
13
JGrass-NewAGE 1.0: more
Therefore, to reproduce the JGrass-NewAGE 1.0 results, one has to know
the theory of each of the above components. Unfortunately, this is
only the first impression. You actually have to know more
6. NEWAGE-JGRASS SHORTWAVE RADIATION MODEL
Figure 6.1: OMS3 SWRB components of NewAge-JGrass and the flowchart to model shortwave
radiation at the terrain surface with generic sky conditions. Where not specified, quantity for input
Back to Formetta et al., 2011
14
JGrass-NewAGE 1.0: even more
for different time steps. The output can be either a .csv file or a raster map with the interpolated
values.
Comparisons with the R-package Gstat (115) are presented in Appendix 1 in order to test
the implemented algorithms (ordinary and local kriging).
Figure 5.3: The Kriging flowchart.
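A check of this kind (results of a new implementation against those of a trusted reference package) can be sketched in a few lines. What follows is only an illustration, not the code used for the Gstat comparison: the function name, the tolerances and the idea of passing both results as plain lists of numbers are assumptions made here.

```python
import math


def compare_with_reference(computed, reference, rel_tol=1e-6, abs_tol=1e-9):
    """Return the indices where two result series disagree beyond a tolerance.

    `computed` comes from the implementation under test, `reference` from a
    trusted tool; both are sequences of floats of the same length.
    """
    if len(computed) != len(reference):
        raise ValueError("series have different lengths")
    return [
        index
        for index, (value, expected) in enumerate(zip(computed, reference))
        if not math.isclose(value, expected, rel_tol=rel_tol, abs_tol=abs_tol)
    ]
```

An empty list means the two implementations agree within the chosen tolerances; publishing the reference values together with the code lets anybody repeat the check.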
Back to Formetta et al., 2011
15
JGrass-NewAGE 1.0: even more than more
The analysis of the catchment starts with the acquisition of a Digital Terrain Model (DTM)
of the catchment, e.g. (159). It is performed as illustrated in fig.(4.1) and summarized for the
reader below.
Figure 4.1: The workflow for the basin delineation in NewAge-JGrass -
4.2.1 Geomorphological analysis
Back to Formetta et al., 2011
16
Scared Enough ?
R. Rigon
Help me!
17
JGrass-NewAGE 1.0: Sorry, I forgot a piece
G. Formetta et al.: The JGrass-NewAge System for forecasting and managing hydrological budgets 953
Fig. 9. Application of the JGrass-NewAge model for the period 01/01/2002 to 31/12/2003.
case of two submodels for runoff production, one of which,
whilst appealing from a theoretical point of view, revealed
unfeasible during calibration. This models was, in fact, eas-
ily substituted by another without the need to rebuild the
whole model system.
The versatility of the modeling approach was also tested
by implementing two different modeling chains, one sub-
stantially performing simulation with a very lumped appli-
cation of the model, just using Hymod for the whole catch-
ment, the other representing a more distributed “version” of
the same Hymod runoff generating mechanism, connected
with a routing scheme. The forecasts were tested by analysis
of the residuals and through the estimation of some objective
indices, which were also implemented as software compo-
nents. These allowed us to objectively state that, at least for
the case in study, the performances of the distributed ver-
sion of the modeling chain was significantly better than the
lumped version, thus supporting the idea that the increase in
model complexity was worthwhile. It is noteworthy that this
comparison was made between systems where most of the
code was the same, thus guaranteeing, in our opinion, the
The modeling chain, although seemingly very traditional,
was actually implemented using advanced specifications of
the geographical objects, as required by OGC, and uses a
particular specification of the river network hierarchy and the
related hillslopes that was built upon the Pfafstetter ordering
scheme.
Even though the overall performances of the forecasting
can be considered very good, in the future some new compo-
nents could substitute the older ones and be compared consis-
tently along the same lines, even if further improvements in
the ability to forecast measured discharge could not be con-
sidered significant without a proper assessment of the uncer-
tainties inherent to the description of the processes.
These comparisons could be made by the same authors
or independently by other researchers, since the JGrass-
NewAge modeling system is freely available, with just the
new component requiring coding. In this sense the infras-
tructure promotes independent testing and verification of re-
search results with unprecedented easiness. In this perspec-
tive a component by component and interoperability com-
parison of the JGrass-NewAge system with others, such as
You need the same data!
In this case, you are lucky: we used open data … but this is not always the case
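One simple way to make sure a reader really works on the same data is to publish a checksum for every input file. The sketch below is a suggestion, not something done in the paper: the function names and the dictionary of expected digests are hypothetical, and only the Python standard library is used.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_inputs(expected, base_dir="."):
    """Compare local files with published digests.

    `expected` maps relative file names to hex digests; the function returns
    the names of the files that are missing or whose digest differs.
    """
    mismatches = []
    for name, published in expected.items():
        path = Path(base_dir) / name
        if not path.is_file() or sha256_of_file(path) != published:
            mismatches.append(name)
    return mismatches
```

If `verify_inputs` returns an empty list, the local copies match the files the authors used, byte for byte.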
Back to Formetta et al., 2011
18
Assuming you are bold and smart
It will take you at least a couple of years to put all the parts
together on your own, just following verbatim the indications you can get
from the paper. (We think we put all the necessary information in the
paper: but, you know, this is practically unverifiable)
Mumbling
19
Our paper is theoretically reproducible … but practically not: it requires
a trained person to do it, with all the right tools in her hands (including
programming skills)…
If you are a Ph.D. student starting from scratch, you cannot
afford it! Almost nobody goes back and repeats something that's
already been published, though.*
*http://arstechnica.com/science/2012/08/scientific-reproducibility-for-fun-and-profit/
Mumbling Mumbling
20
So are we doing science or just pretending to do
science?
Theoretically reproducible … but practically not: does it mean that
theoretically we are doing science, but practically not?
Mumbling Mumbling Mumbling
21
In today's science this is even worse than commonly believed
Because of the massive use of computation.
Computation is now central to the scientific
enterprise and it adds a further layer of
complexity to the science visible in papers.
Some papers that come
out of computation
are beyond any control
Not just one single case
22
“Computation is now central to the scientific
enterprise, and the emergence of powerful
computational hardware, combined with a vast array
of computational software, presents novel
opportunities for researchers. Unfortunately, the
scientific culture surrounding computational work
has evolved in ways that make it difficult to verify
findings, efficiently build on past research, or even
apply the basic tenets of the scientific method to
computational procedures.”
By Victoria Stodden, Jonathan M. Borwein, David H. Bailey, SIAM news
http://sinews.siam.org/DetailsPage/tabid/607/ArticleID/351/%E2%80%9CSetting-the-Default-to-
Reproducible%E2%80%9D-in-Computational-Science-Research.aspx
23
To dispel any doubt
I decided to make all the code public (all the source code, actually) under a copyleft
license (GPL v 3.0). See:
http://abouthydrology.blogspot.it/2015/03/jgrass-newage-essentials.html
So we reduced a couple of years of work to three months (with instructions)
No fake science
24
And we plan to make our work
Replicable
in every paper, not only Reproducible
but we are not alone
No fake science
25
Editorial: The publication of geoscientific model
developments v1.0
one of the EGU’s Open Access journals, i.f. 3.6
Journals
26
Editorial: Vadose Zone Journal
Reproducible Research
in Vadose Zone Sciences
T.H. Skaggs,* M.H. Young, and J.A. Vrugt
A significant portion of present-day soil and Earth science research is
computational, involving complex data analysis pipelines, advanced
mathematical and statistical models, and sophisticated computer codes.
Opportunities for scientific progress are greatly diminished if reproduc-
ing and building on published research is difficult or impossible due to the
complexity of these computational systems. Vadose Zone Journal (VZJ) is
launching a Reproducible Research (RR) program in which code and data
underlying a research article will be published alongside the article, thereby
enabling readers to analyze data in a manner similar to that presented in
the article and build on results in future research and applications. In this
article, we discuss reproducible research, its background and use across
other disciplines, its value to the scientific community, and its implementa-
tion in VZJ.
Abbreviations: NIH, National Institutes of Health; RR, Reproducible Research; VZJ, Vadose
Zone Journal.
A hallmark of the scientific method is that research results must be reproduc-
ible. Although the reproducibility requirement has always existed, technological advances
over the last few decades have changed the way science is practiced and communicated,
creating for researchers and publishers new opportunities and challenges with respect to
openness and reproducibility.
One set of opportunities involves increased reuse of experimental data. The internet and
related information technologies have allowed greater archiving and sharing of environmen-
tal and geoscience data. Data sharing makes the validation of scientific findings possible,
lessens the need for wasteful duplication of research efforts, and facilitates new data synthesis
and aggregation activities. A number of environmental and geoscience publishers have pro-
Core Ideas
•	A significant portion of present-day geoscience research is computational.
•	Science would benefit from greater transparency in computational research.
•	Vadose Zone Journal is launching a Reproducible Research program.
•	Code and data underlying a research article will be published alongside articles.
Opinion and Policy
Published October 12, 2015
Journals
27
1 Make our source code open source (actually not strictly necessary: the
executable alone could serve the purpose) and available through
https://github.com/
Counterattack: a strategy to make our work replicable
28
a) Documenting our code as well as possible, according to a
standard format (still to be defined … but we are working on it).
b) Documenting our algorithms.
c) Using the Object Modelling System v3 (David et al., 2013;
Formetta et al., 2014)
2
Counterattack: a strategy to make our work replicable
29
Counterattack: a strategy to make our work replicable
3
Using the appropriate building tools
https://gradle.org/
30
Use standard names for hydrological variables. For instance, use the
Basic Model Interface (BMI) standards
http://csdms.colorado.edu/wiki/BMI_Description
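To give an idea of what this means in practice, here is a toy component written in the spirit of BMI. It is only a sketch under several assumptions: it does not use any official BMI package, it implements just a handful of the BMI method names (the real `initialize` takes a configuration file path and the real `get_value` fills a destination array), the linear reservoir is a made-up model, and the two variable names are written in the style of the CSDMS Standard Names and should be checked against the official list.

```python
class LinearReservoir:
    """Toy linear reservoir exposing a few BMI-style methods (illustrative only)."""

    # Names in the style of the CSDMS Standard Names (to be verified).
    PRECIPITATION = "atmosphere_water__precipitation_leq-volume_flux"
    DISCHARGE = "channel_exit_water_x-section__volume_flow_rate"

    def initialize(self, config=None):
        """Set parameters and state from a dictionary (simplification of BMI)."""
        config = config or {}
        self._k = config.get("k", 0.1)              # fraction released per step
        self._storage = config.get("storage", 0.0)  # arbitrary consistent units
        self._time = 0
        self._values = {self.PRECIPITATION: 0.0, self.DISCHARGE: 0.0}

    def get_component_name(self):
        return "Toy linear reservoir"

    def get_input_var_names(self):
        return (self.PRECIPITATION,)

    def get_output_var_names(self):
        return (self.DISCHARGE,)

    def set_value(self, name, value):
        """Set an input variable, addressed by its standard name."""
        if name not in self.get_input_var_names():
            raise KeyError(name)
        self._values[name] = float(value)

    def get_value(self, name):
        """Return the current value of a variable, addressed by its standard name."""
        return self._values[name]

    def get_current_time(self):
        return self._time

    def update(self):
        """Advance one time step: add the input, release a fraction k of the storage."""
        self._storage += self._values[self.PRECIPITATION]
        outflow = self._k * self._storage
        self._storage -= outflow
        self._values[self.DISCHARGE] = outflow
        self._time += 1

    def finalize(self):
        """Release resources (nothing to do in this toy model)."""
        self._values = {}
```

Because every quantity is addressed through a shared name rather than a private variable, another group can drive the component, or swap it for their own, without reading its internals.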
Counterattack: a strategy to make our work replicable
4
Use Authorea for uploading complementary material and
documentation. You can also use Jupyter or Beaker.
https://www.authorea.com/
Use Open Data in our research as much as possible, and make
our own data openly available*.
*Scientific Data is a Nature journal: http://www.nature.com/sdata/
https://en.wikipedia.org/wiki/Open_data
Communities
Other (more valuable) experiences:
The R community
(https://cran.r-project.org/web/views/ReproducibleResearch.html)
Python
http://software-carpentry.org/
R-based reproducible research on Coursera
https://www.coursera.org/course/repdata?from_restricted_preview=1&course_id=973513&r=https%3A%2F%2Fclass.coursera.org%2Frepdata-012%2Fclass#
XXXV Convegno Nazionale di Idraulica e Costruzioni Idrauliche
Bologna, 14-16 September 2016
Bancheri M. et al., Research reproducibility and replicability: the case of JGrass-NewAge
Source code, project examples, community blog, documentation:
http://geoframe.blogspot.com & https://github.com/geoframecomponents
R. Rigon, M. Bancheri, F. Serafin, W. Abera, G. Formetta
Become a Reproducible Research Warrior!
R2: the stairs (for yourself)
Do not wait! Make your stuff available
on the Web (in whatever format)
under an open license*.
*As Tim Berners-Lee suggests: waiting to have it in better shape will delay publication forever,
and your contribution will be lost (like tears in rain): http://5stardata.info/
38
M a k e i t a v a i l a b l e w i t h
documentation (e.g. a README
file for any data set and for any
model)
R2
The
stairs
For yourself
Rigon & Al.
39
Provide examples of runs, and give
some reference. Structure your
documentation. Include figures and
their making.
R2
The
stairs
For yourself
Rigon & Al.
40
Use URLs and providers like Github
to store code and data, so people
can point at your stuff, and browse
it freely*
R2
The
stairs
For yourself
Rigon & Al.
41
Maintain a user group (and answer to
questions when asked). Provide any run you
do on the web with the appropriate
metadata.** ***
**: https://earthsystemcog.org/projects/es-doc-models/
***http://abouthydrology.blogspot.it/2014/10/naming-things-in-hydrological-models.html
R2
The
stairs
For yourself
Rigon & Al.
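Publishing a run "with the appropriate metadata" can start very simply, for instance with a small file written next to the outputs. The following is a minimal sketch in Python, not a prescribed format: the function names, the file name `run_metadata.json`, and the chosen fields are all assumptions made for illustration. It records the model name and version, the parameters, a timestamp, and a checksum of each input file, so that a reader can verify they are using the same data.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_run_metadata(run_dir, model_name, model_version, parameters, input_files):
    """Write run_metadata.json (hypothetical name) into run_dir.

    Returns the metadata dictionary that was written.
    """
    metadata = {
        "model": model_name,
        "version": model_version,
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "parameters": parameters,
        # Checksums let others confirm they hold identical input data.
        "inputs": {Path(p).name: sha256_of(p) for p in input_files},
    }
    out_path = Path(run_dir) / "run_metadata.json"
    out_path.write_text(json.dumps(metadata, indent=2, sort_keys=True))
    return metadata
```

A call such as `write_run_metadata("runs/2003", "my-model", "1.0", {"k": 0.5}, ["rain.csv"])` (all values hypothetical) would leave a human-readable record beside the results; richer schemes, such as the ES-DOC project linked above, go much further.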
See also
http://sciencecodemanifesto.org/
Conclusions
• Research must be reproducible
• In many cases it would be better if it were also replicable
• Making our research replicable can be an advantage
• It can favour the progress of science
• Do not be shy: share your research
• Nobody is going to hurt you
Find your own way to Reproducible Research
Questions?
Find this presentation at
http://abouthydrology.blogspot.com
Other material at
http://abouthydrology.blogspot.it/2015/07/theory-and-practice-of-reproducible.html
Ulrici, 2000?
References
For the web references, see the slides.
Formetta, G., Mantilla, R., Franceschi, S., Antonello, A., and Rigon, R., The JGrass-NewAge system
for forecasting and managing the hydrological budgets at the basin scale: models of flow
generation and propagation/routing, Geoscientific Model Development, 4(4),
943-955, doi:10.5194/gmd-4-943-2011, 2011.
Botter, G., E. Bertuzzo, and A. Rinaldo (2010), Transport in the hydrologic response: Travel
time distributions, soil moisture dynamics, and the old water paradox, Water Resour. Res.,
46, W03514, doi:10.1029/2009WR008371.
Formetta, G., Antonello, A., Franceschi, S., David, O., and Rigon, R., Hydrological
modelling with components: A GIS-based open-source framework, Environmental
Modelling & Software, 55 (2014), 190-200.
David, O., Ascough II, J.C., Lloyd, W., Green, T.R., Rojas, K.W., Leavesley, G.H., Ahuja, L.R.,
2013. A software engineering perspective on environmental modeling framework design: the
Object Modeling System. Environ. Model. Softw. 39, 201–213.
References right to the point
Hutton, C., Wagener, T., Freer, J., Han, D., Duffy, C., & Arheimer, B. (2016). Most
computational hydrology is not reproducible, so is it really science?
Water Resources Research, 1–14. http://doi.org/10.1002/2016WR019285
Ince, D. C., Hatton, L., & Graham-Cumming, J. (2012). The case for open
computer programs. Nature, 482(7386), 485–488. http://doi.org/10.1038/nature10836
Skaggs, T. H., Young, M. H., & Vrugt, J. A. (2015). Reproducible Research
in Vadose Zone Sciences. Vadose Zone Journal, 1–5. http://doi.org/10.2136/vzj

Research reproducibility - Code etc.

  • 1.
    Theory and Practiceof Reproducible Research OGRS, Perugia, October 12, 2016 Riccardo Rigon, Francesco Serafin, Marialaura Bancheri AntonioCanova,Letregrazie
  • 2.
    2 Antonio Canova gypsumstatues bring a series of little signs. They served the stonemasons to reproduce “industrially” the opera. Art became “reproducible” for the fist time. Rigon & Al. Canova ?
  • 3.
    http://simplystatistics.org/2013/01/23/statisticians-and-computer-scientists-if-there-is-no-code-there-is-no-paper/ I have beenfrustrated often with statisticians and computer scientists who write papers where they develop new methods and seem to demonstrate that those methods blow away all their competitors. But then no software is available to actually test and see if that is true. … In my mind, new methods/analyses without software are just vaporware … If there is no code, there is no paper. By Jeff Leek*
  • 4.
    4 Science must bereproducible (i.e. repeatable) It is the fundamental. It means that everyone (in principle) should be able to take what you write, the experiment you did, the mathematics you drew, and doing it again with his own resources. “In principle means” that science is often not is not shared … for various reasons … Why reproducibility ? Rigon & Al.
  • 5.
    5 Not anyone canreproduce scientific achievements “In principle” means that S/he must be trained to do it (there are problems of transmission of information here). And, in fact, more advanced results, can be difficult to grab, even for the very same autors. Introduction Rigon & Al.
  • 6.
  • 7.
    7 Analysing a paperfor reproducibility (the case of Formetta et al., 2011) Geosci. Model Dev., 4, 943–955, 2011 www.geosci-model-dev.net/4/943/2011/ doi:10.5194/gmd-4-943-2011 © Author(s) 2011. CC Attribution 3.0 License. Geoscientific Model Development The JGrass-NewAge system for forecasting and managing the hydrological budgets at the basin scale: models of flow generation and propagation/routing G. Formetta1, R. Mantilla2, S. Franceschi3, A. Antonello3, and R. Rigon1 1University of Trento, 77 Mesiano St., Trento, 38123, Italy 2The University of Iowa, C. Maxwell Stanley Hydraulics Laboratory, Iowa 52242-1585, USA 3Hydrologis S.r.l., Bolzano, BZ, Italy Received: 16 April 2011 – Published in Geosci. Model Dev. Discuss.: 29 April 2011 Revised: 20 September 2011 – Accepted: 31 October 2011 – Published: 4 November 2011 Abstract. This paper presents a discussion of the predic- tive capacity of the implementation of the semi-distributed hydrological modeling system JGrass-NewAge. This model focuses on the hydrological budgets of medium scale to large scale basins as the product of the processes at the hillslope scale with the interplay of the river network. The part of the modeling system presented here deals with the: (i) estimation of the space-time structure of precipitation, (ii) estimation of runoff production; (iii) aggregation and propagation of flows in channel; (v) estimation of evapotranspiration; (vi) auto- matic calibration of the discharge with the method of particle swarming. The system is based on a hillslope-link geometrical par- tition of the landscape, combining raster and vectorial treat- ment of hillslope data with vector based tracking of flow in channels. Measured precipitation are spatially interpolated with the use of kriging. Runoff production at each channel link is estimated through a peculiar application of the Hymod model. 
Routing in channels uses an integrated flow equation and produces discharges at any link end, for any link in the river network. Evapotranspiration is estimated with an im- plementation of the Priestley-Taylor equation. The model system assembly is calibrated using the particle swarming algorithm. A two year simulation of hourly discharge of the Little Washita (OK, USA) basin is presented and discussed with the support of some classical indices of goodness of fit, and analysis of the residuals. A novelty with respect to tra- ditional hydrological modeling is that each of the elements above, including the preprocessing and the analysis tools, is implemented as a software component, built upon Object Modelling System v3 and jgrasstools prescriptions, that can be cleanly switched in and out at run-time, rather than at Correspondence to: G. Formetta ( formetta@ing.unitn.it) compiling time. The possibility of creating different mod- eling products by the connection of modules with or without the calibration tool, as for instance the case of the present modeling chain, reduces redundancy in programming, pro- motes collaborative work, enhances the productivity of re- searchers, and facilitates the search for the optimal modeling solution. 1 Introduction Hydrological forecasting over time has focused on differ- ent issues. Determining the discharge of rivers during flood events has been a central topic for more than a century; firstly through the rational model of Mulvaney (1851), later through the use of instantaneous unit hydrograph models (Sherman, 1932; Dooge, 1959), and more recently includ- ing the geomorphological approach (i.e. GIUH; Rodr´ıguez- Iturbe and Vald´es, 1979; Gupta and Waymire, 1980; Rosso, 1984; D’Odorico and Rigon, 2003). Even models of runoff generation such as Topmodel (Beven and Kirkby, 1979; Beven, 2001; Franchini et al., 1996) have been used mainly for this purpose. 
Subsequently, however, the water resource and river man- agement required the need to estimate a whole set of hydro- logical quantities such as discharge, evapotranspiration, and soil moisture, bringing very soon to the implementation of more comprehensive modeling systems, like the pioneering Stanford watershed model (Crawford and Linsley, 1966), the Sacramento model (e.g. Burnash et al., 1973), and the PRMS model (Leavesley et al., 1983). They were usually based on the idea of intercommunicating compartments (reservoirs), each representing a process domain, each one with its resi- dence time. Published by Copernicus Publications on behalf of the European Geosciences Union. Formetta et al. 2011 Rigon & Al.
  • 8.
    8 This is apaper, which I co-authored, dealing with a model for rainfall runoff, It is mostly which presents a hydrological model, with an application to a case study Geosci. Model Dev., 4, 943–955, 2011 www.geosci-model-dev.net/4/943/2011/ doi:10.5194/gmd-4-943-2011 © Author(s) 2011. CC Attribution 3.0 License. Geoscientific Model Development The JGrass-NewAge system for forecasting and managing the hydrological budgets at the basin scale: models of flow generation and propagation/routing G. Formetta1, R. Mantilla2, S. Franceschi3, A. Antonello3, and R. Rigon1 1University of Trento, 77 Mesiano St., Trento, 38123, Italy 2The University of Iowa, C. Maxwell Stanley Hydraulics Laboratory, Iowa 52242-1585, USA 3Hydrologis S.r.l., Bolzano, BZ, Italy Received: 16 April 2011 – Published in Geosci. Model Dev. Discuss.: 29 April 2011 Revised: 20 September 2011 – Accepted: 31 October 2011 – Published: 4 November 2011 Abstract. This paper presents a discussion of the predic- tive capacity of the implementation of the semi-distributed hydrological modeling system JGrass-NewAge. This model focuses on the hydrological budgets of medium scale to large scale basins as the product of the processes at the hillslope scale with the interplay of the river network. The part of the modeling system presented here deals with the: (i) estimation of the space-time structure of precipitation, (ii) estimation of runoff production; (iii) aggregation and propagation of flows in channel; (v) estimation of evapotranspiration; (vi) auto- matic calibration of the discharge with the method of particle swarming. The system is based on a hillslope-link geometrical par- tition of the landscape, combining raster and vectorial treat- ment of hillslope data with vector based tracking of flow in channels. Measured precipitation are spatially interpolated with the use of kriging. Runoff production at each channel link is estimated through a peculiar application of the Hymod model. 
Routing in channels uses an integrated flow equation and produces discharges at any link end, for any link in the river network. Evapotranspiration is estimated with an im- plementation of the Priestley-Taylor equation. The model system assembly is calibrated using the particle swarming algorithm. A two year simulation of hourly discharge of the Little Washita (OK, USA) basin is presented and discussed with the support of some classical indices of goodness of fit, and analysis of the residuals. A novelty with respect to tra- ditional hydrological modeling is that each of the elements above, including the preprocessing and the analysis tools, is implemented as a software component, built upon Object Modelling System v3 and jgrasstools prescriptions, that can be cleanly switched in and out at run-time, rather than at Correspondence to: G. Formetta ( formetta@ing.unitn.it) compiling time. The possibility of creating different mod- eling products by the connection of modules with or without the calibration tool, as for instance the case of the present modeling chain, reduces redundancy in programming, pro- motes collaborative work, enhances the productivity of re- searchers, and facilitates the search for the optimal modeling solution. 1 Introduction Hydrological forecasting over time has focused on differ- ent issues. Determining the discharge of rivers during flood events has been a central topic for more than a century; firstly through the rational model of Mulvaney (1851), later through the use of instantaneous unit hydrograph models (Sherman, 1932; Dooge, 1959), and more recently includ- ing the geomorphological approach (i.e. GIUH; Rodr´ıguez- Iturbe and Vald´es, 1979; Gupta and Waymire, 1980; Rosso, 1984; D’Odorico and Rigon, 2003). Even models of runoff generation such as Topmodel (Beven and Kirkby, 1979; Beven, 2001; Franchini et al., 1996) have been used mainly for this purpose. 
Subsequently, however, the water resource and river man- agement required the need to estimate a whole set of hydro- logical quantities such as discharge, evapotranspiration, and soil moisture, bringing very soon to the implementation of more comprehensive modeling systems, like the pioneering Stanford watershed model (Crawford and Linsley, 1966), the Sacramento model (e.g. Burnash et al., 1973), and the PRMS model (Leavesley et al., 1983). They were usually based on the idea of intercommunicating compartments (reservoirs), each representing a process domain, each one with its resi- dence time. Published by Copernicus Publications on behalf of the European Geosciences Union. Formetta et al. 2011 Rigon & Al.
  • 9.
    9 Reproducible, in thiscase requires first Consistency of notation For what regards to this, the paper is certainly consistent (it is part of the peer-review process to guarantee it). A more strong statement would require consistency of notation through series of companion papers. But this paper, in particular, is not a heavy theoretical treatment of some topic, and notation is not really crucial here. Notation helps Rigon & Al.
  • 10.
    10 Different story forthis paper (the case of Botter et al., 2010) Click Here for Full Article Transport in the hydrologic response: Travel time distributions, soil moisture dynamics, and the old water paradox Gianluca Botter,1 Enrico Bertuzzo,2 and Andrea Rinaldo1,2 Received 8 July 2009; revised 23 October 2009; accepted 29 October 2009; published 12 March 2010. [1] We propose a mathematical framework for the general definition and computation of travel time distributions defined by the closure of a catchment control volume, where the input flux is an arbitrary rainfall pattern and the output fluxes are green and blue water flows (namely, evapotranspiration and the hydrologic response embedding runoff production through soil water dynamics). The relevance of the problem is both practical, owing to implications in hydrologic watershed modeling, and conceptual for the linkages and the explanations the theory provides, chiefly concerning the role of geomorphology, climate, soils, and vegetation through soil water dynamics and the treatment of the so‐ called old water paradox. The work focuses in particular on the origins of the conditional and time‐variant nature of travel time distributions and on the differences between unit hydrographs and travel time distributions. Both carrier flow and solute matter transport in the control volume are accounted for coherently. The key effect of mixing processes occurring within runoff production is also investigated, in particular by a model that assumes that mobilization of soil water involves randomly sampled particles from the available storage. Travel time distributions are analytically expressed in terms of the major water fluxes driving soil moisture dynamics, irrespectively of the specific model used to compute them. Relevant numerical examples and a set of generalized applications are provided and discussed. Citation: Botter, G., E. Bertuzzo, and A. 
Rinaldo (2010), Transport in the hydrologic response: Travel time distributions, soil moisture dynamics, and the old water paradox, Water Resour. Res., 46, W03514, doi:10.1029/2009WR008371. 1. Introduction [2] The age of water (or residence time) represents the time spent by water molecules ideally sampled from a given hydrologic system within the reference control volume (measured since the entry through rainfall). Thus, the age of water blends in a single quantitative attribute information about hydrological and chemical storages, flow pathways, and water sources [e.g., McGuire and McDonnell, 2006]. Several field observations (especially built through exten- sive rainfall/runoff dating by isotope hydrology) and a few theoretical results have established the so‐called “old water paradox,” according to which a sizable part of the runoff within the hydrologic response of catchment transport vo- lumes is constituted by aged water particles (i.e., by water particles injected at times preceding the event causally re- lated to the observed runoff) [e.g., Maloszewski and Zuber, 1982; McDonnell, 1990; McDonnell et al., 1991; Stewart and McDonnell, 1991; Wilson et al., 1991a, 1991b; Leaney et al., 1993; Rodhe et al., 1996; Cirmo and McDonnell, 1998; Nyberg et al., 1999; Peters and Ratcliffe, 1998; Burns et al., 1998; Weiler et al., 2003; McGuire et al., 2007; Botter et al., 2007, 2008a, 2009]. The release of old water has been explained by the propagation of pressure waves induced by precipitation inputs with a celerity exceeding the pore water velocity [e.g., Beven, 1981, 1989b], including displacement of water previously immobilized within the soil matrix into preferential flow pathways [e.g., Beven and Germann, 1982]. 
However, some of the physical processes controlling the release of preevent water from catchments are still poorly understood or roughly modeled, and the observational data do not suggest either universal behaviors, nor do they support linear and time‐invariant behaviors as assumed by unit hydrograph schemes [e.g., Weiler and McDonnell, 2006]. The com- plexity of the mixing patterns involving event and preevent waters in hillslopes is partly a byproduct of the structural complexity of subsurface environments, which are typically characterized by pronounced heterogeneity and time vari- able connectivity of flow pathways. For this reason, it is inappropriate to use the point‐scale physical laws deter- mining the movement of water and solutes within hillslopes to make predictions at larger scales because of the nonlin- earity of flow processes and the uncertain distribution of hydrologic, geological and morphological properties of control volumes [e.g., Beven, 1989a, 2006; Kirchner, 2009]. Hence, lumped approaches are frequently employed to describe in an effective manner the overall behavior of hillslopes/catchments. In particular, the water travel time 1 Dipartimento di Ingegneria Idraulica Marittima Ambientale e Geotecnica, Università degli Studi di Padova, Padua, Italy. 2 Laboratory of Ecohydrology, Faculte´ ENAC, Ecole Polytechinque Federale, Lausanne, Switzerland. Copyright 2010 by the American Geophysical Union. 0043‐1397/10/2009WR008371 WATER RESOURCES RESEARCH, VOL. 46, W03514, doi:10.1029/2009WR008371, 2010 W03514 1 of 18 R. Rigon Botter et al., 2010
  • 11.
    11 This is anoutstanding paper dealing with transport for residence time, which I read several times during the last months, in order to reproduce their research (with my own tools) Click Here for Full Article Transport in the hydrologic response: Travel time distributions, soil moisture dynamics, and the old water paradox Gianluca Botter,1 Enrico Bertuzzo,2 and Andrea Rinaldo1,2 Received 8 July 2009; revised 23 October 2009; accepted 29 October 2009; published 12 March 2010. [1] We propose a mathematical framework for the general definition and computation of travel time distributions defined by the closure of a catchment control volume, where the input flux is an arbitrary rainfall pattern and the output fluxes are green and blue water flows (namely, evapotranspiration and the hydrologic response embedding runoff production through soil water dynamics). The relevance of the problem is both practical, owing to implications in hydrologic watershed modeling, and conceptual for the linkages and the explanations the theory provides, chiefly concerning the role of geomorphology, climate, soils, and vegetation through soil water dynamics and the treatment of the so‐ called old water paradox. The work focuses in particular on the origins of the conditional and time‐variant nature of travel time distributions and on the differences between unit hydrographs and travel time distributions. Both carrier flow and solute matter transport in the control volume are accounted for coherently. The key effect of mixing processes occurring within runoff production is also investigated, in particular by a model that assumes that mobilization of soil water involves randomly sampled particles from the available storage. Travel time distributions are analytically expressed in terms of the major water fluxes driving soil moisture dynamics, irrespectively of the specific model used to compute them. 
Relevant numerical examples and a set of generalized applications are provided and discussed. Citation: Botter, G., E. Bertuzzo, and A. Rinaldo (2010), Transport in the hydrologic response: Travel time distributions, soil moisture dynamics, and the old water paradox, Water Resour. Res., 46, W03514, doi:10.1029/2009WR008371. 1. Introduction [2] The age of water (or residence time) represents the time spent by water molecules ideally sampled from a given hydrologic system within the reference control volume (measured since the entry through rainfall). Thus, the age of water blends in a single quantitative attribute information about hydrological and chemical storages, flow pathways, and water sources [e.g., McGuire and McDonnell, 2006]. Several field observations (especially built through exten- sive rainfall/runoff dating by isotope hydrology) and a few theoretical results have established the so‐called “old water paradox,” according to which a sizable part of the runoff within the hydrologic response of catchment transport vo- lumes is constituted by aged water particles (i.e., by water particles injected at times preceding the event causally re- lated to the observed runoff) [e.g., Maloszewski and Zuber, 1982; McDonnell, 1990; McDonnell et al., 1991; Stewart and McDonnell, 1991; Wilson et al., 1991a, 1991b; Leaney et al., 1993; Rodhe et al., 1996; Cirmo and McDonnell, 1998; Nyberg et al., 1999; Peters and Ratcliffe, 1998; Burns et al., 1998; Weiler et al., 2003; McGuire et al., 2007; Botter et al., 2007, 2008a, 2009]. The release of old water has been explained by the propagation of pressure waves induced by precipitation inputs with a celerity exceeding the pore water velocity [e.g., Beven, 1981, 1989b], including displacement of water previously immobilized within the soil matrix into preferential flow pathways [e.g., Beven and Germann, 1982]. 
However, some of the physical processes controlling the release of preevent water from catchments are still poorly understood or roughly modeled, and the observational data do not suggest either universal behaviors, nor do they support linear and time‐invariant behaviors as assumed by unit hydrograph schemes [e.g., Weiler and McDonnell, 2006]. The com- plexity of the mixing patterns involving event and preevent waters in hillslopes is partly a byproduct of the structural complexity of subsurface environments, which are typically characterized by pronounced heterogeneity and time vari- able connectivity of flow pathways. For this reason, it is inappropriate to use the point‐scale physical laws deter- mining the movement of water and solutes within hillslopes to make predictions at larger scales because of the nonlin- earity of flow processes and the uncertain distribution of hydrologic, geological and morphological properties of control volumes [e.g., Beven, 1989a, 2006; Kirchner, 2009]. Hence, lumped approaches are frequently employed to describe in an effective manner the overall behavior of hillslopes/catchments. In particular, the water travel time 1 Dipartimento di Ingegneria Idraulica Marittima Ambientale e Geotecnica, Università degli Studi di Padova, Padua, Italy. 2 Laboratory of Ecohydrology, Faculte´ ENAC, Ecole Polytechinque Federale, Lausanne, Switzerland. Copyright 2010 by the American Geophysical Union. 0043‐1397/10/2009WR008371 WATER RESOURCES RESEARCH, VOL. 46, W03514, doi:10.1029/2009WR008371, 2010 W03514 1 of 18 It is mostly a theoretical paper, with an application to an idealised case study Botter et al., 2010 Rigon & Al.
  • 12.
    12 JGrass-NewAGE 1.0 Hymod andRHymod in fig.(7.9). Figure 7.9: Modelling solutions: Hymod (in red dashed line) and RHymod (in blued dashed line). Back to Formetta et al., 2011 Rigon & Al.
  • 13.
    13 JGrass-NewAGE 1.0: more Therefore,to reproduce JGrass-NewAGE 1.0 results, one has to know the theory of any of the above components. Unfortunately, this is only the first impression. You have to know actually more 6. NEWAGE-JGRASS SHORTWAVE RADIATION MODEL Figure 6.1: OMS3 SWRB components of NewAge-JGrass and the flowchart to model shortwave radiation at the terrain surface with generic sky conditions. Where not specified, quantity for input Back to Formetta et al., 2011 Rigon & Al.
  • 14.
    14 JGrass-NewAGE 1.0: evenmore for di↵erent time steps. The outputs could be or a .csv file or a raster map with the interpolated values. Comparisons with the R-package Gstat (115) are presented in Appendix 1 in order to test the implemented algorithms (ordinary and local kriging). Figure 5.3: The Kriging flowchart. Back to Formetta et al., 2011 Rigon & Al.
  • 15.
    15 JGrass-NewAGE 1.0: evenmore than more The analysis of the catchment, starts with the acquisition of a Digital Terrain Model (DTM) of the catchment, e.g. (159). It is performed as illustrated in fig.(4.1) and summarized for the reader below. Figure 4.1: The workflow for the basin delineation in NewAge-JGrass - 4.2.1 Geomorphological analysis Back to Formetta et al., 2011 Rigon & Al.
  • 16.
    16 Scared Enough ? R.Rigon Help me!
  • 17.
    17 JGrass-NewAGE 1.0: Sorry,I forgot a pieceG. Formetta et al.: The JGrass-NewAge System for forecasting and managing hydrological budgets 953 Fig. 9. Application of the JGrass-NewAge model for the period 01/01/2002 to 31/12/2003. case of two submodels for runoff production, one of which, whilst appealing from a theoretical point of view, revealed unfeasible during calibration. This models was, in fact, eas- ily substituted by another without the need to rebuild the whole model system. The versatility of the modeling approach was also tested by implementing two different modeling chains, one sub- stantially performing simulation with a very lumped appli- cation of the model, just using Hymod for the whole catch- ment, the other representing a more distributed “version” of the same Hymod runoff generating mechanism, connected with a routing scheme. The forecasts were tested by analysis of the residuals and through the estimation of some objective indices, which were also implemented as software compo- nents. These allowed us to objectively state that, at least for the case in study, the performances of the distributed ver- sion of the modeling chain was significantly better than the lumped version, thus supporting the idea that the increase in model complexity was worthwhile. It is noteworthy that this comparison was made between systems where most of the code was the same, thus guaranteeing, in our opinion, the The modeling chain, although seemingly very traditional, was actually implemented using advanced specifications of the geographical objects, as required by OGC, and uses a particular specification of the river network hierarchy and the related hillslopes that was built upon the Pfafstetter ordering scheme. 
Even though the overall performances of the forecasting can be considered very good, in the future some new compo- nents could substitute the older ones and be compared consis- tently along the same lines, even if further improvements in the ability to forecast measured discharge could not be con- sidered significant without a proper assessment of the uncer- tainties inherent to the description of the processes. These comparisons could be made by the same authors or independently by other researchers, since the JGrass- NewAge modeling system is freely available, with just the new component requiring coding. In this sense the infras- tructure promotes independent testing and verification of re- search results with unprecedented easiness. In this perspec- tive a component by component and interoperability com- parison of the JGrass-NewAge system with others, such as You need the same data ! In this case, you are lucky. We used open data … but this is not always the case Back to Formetta et al., 2011 Rigon & Al.
  • 18.
    18 Assuming you arebold and smart This will take for you at least a couple of years for putting all the parts together for your own and just following verbatim the indication you can get from the paper. (We think we put all of the information in the paper necessary: but, you know, this is practically unverifiable) Mumbling Rigon & Al.
  • 19.
    19 Our paper istheoretically reproducible … but practically not: it requires a trained person to do it, having all the right tools in her hands (including programming skills)… If you are a Ph.D. student that starts from the scratch you cannot afford it ! Almost nobody goes back and repeats something that's already been published, though.* *http://arstechnica.com/science/2012/08/scientific-reproducibility-for-fun-and-profit/ Mumbling Mumbling Rigon & Al.
  • 20.
    So are we doing science or just pretending to do science?
    "Theoretically reproducible … but practically not": does this mean that theoretically we are doing science, but practically not?
    Mumbling Mumbling Mumbling
  • 21.
    In today's sciences this is even worse than believed
    Because of the massive use of computation. Computation is now central to the scientific enterprise, and it adds a further layer of complexity to the science visible in papers. Some papers that come out of computation are beyond any control.
    Not just one single case
  • 22.
    Not just one single case
    “Computation is now central to the scientific enterprise, and the emergence of powerful computational hardware, combined with a vast array of computational software, presents novel opportunities for researchers. Unfortunately, the scientific culture surrounding computational work has evolved in ways that make it difficult to verify findings, efficiently build on past research, or even apply the basic tenets of the scientific method to computational procedures.”
    By Victoria Stodden, Jonathan M. Borwein, David H. Bailey, SIAM News
    http://sinews.siam.org/DetailsPage/tabid/607/ArticleID/351/%E2%80%9CSetting-the-Default-to-Reproducible%E2%80%9D-in-Computational-Science-Research.aspx
  • 23.
    To dispel any doubt
    I decided to make all the code public (all the source code, actually) under a copyleft license (GPL v3.0). See: http://abouthydrology.blogspot.it/2015/03/jgrass-newage-essentials.html
    So we reduced a couple of years of work to three months (with instructions).
    No fake science
  • 24.
    And we plan to make our work replicable, not only reproducible, in every paper
    But we are not alone.
    No fake science
  • 25.
    Editorial: The publication of geoscientific model developments v1.0
    Geoscientific Model Development, one of the EGU’s Open Access journals (impact factor 3.6)
    Journals
  • 26.
    Editorial: Vadose Zone Journal
    Reproducible Research in Vadose Zone Sciences
    T.H. Skaggs,* M.H. Young, and J.A. Vrugt
    Opinion and Policy. Published October 12, 2015.
    Core Ideas
      • A significant portion of present-day geoscience research is computational.
      • Science would benefit from greater transparency in computational research.
      • Vadose Zone Journal is launching a Reproducible Research program.
      • Code and data underlying a research article will be published alongside articles.
    A significant portion of present-day soil and Earth science research is computational, involving complex data analysis pipelines, advanced mathematical and statistical models, and sophisticated computer codes. Opportunities for scientific progress are greatly diminished if reproducing and building on published research is difficult or impossible due to the complexity of these computational systems. Vadose Zone Journal (VZJ) is launching a Reproducible Research (RR) program in which code and data underlying a research article will be published alongside the article, thereby enabling readers to analyze data in a manner similar to that presented in the article and build on results in future research and applications. In this article, we discuss reproducible research, its background and use across other disciplines, its value to the scientific community, and its implementation in VZJ.
    Abbreviations: NIH, National Institutes of Health; RR, Reproducible Research; VZJ, Vadose Zone Journal.
    A hallmark of the scientific method is that research results must be reproducible. Although the reproducibility requirement has always existed, technological advances over the last few decades have changed the way science is practiced and communicated, creating for researchers and publishers new opportunities and challenges with respect to openness and reproducibility. One set of opportunities involves increased reuse of experimental data. The internet and related information technologies have allowed greater archiving and sharing of environmental and geoscience data. Data sharing makes the validation of scientific findings possible, lessens the need for wasteful duplication of research efforts, and facilitates new data synthesis and aggregation activities.
    A number of environmental and geoscience publishers have pro…
    Journals
  • 27.
    Counterattack: a strategy to make our work replicable
    1. Make our source code open source (strictly speaking this is not necessary: the executable alone could serve the purpose) and available through https://github.com/
  • 28.
    Counterattack: a strategy to make our work replicable
    2. a) Documenting our code as well as possible, according to a standard format (still to be defined … but we are working on it).
       b) Documenting our algorithms.
       c) Using the Object Modelling System v3 (David et al., 2013; Formetta et al., 2014).
  • 29.
    Counterattack: a strategy to make our work replicable
    3. Using the appropriate build tools: https://gradle.org/
  • 30.
    Counterattack: a strategy to make our work replicable
    4. Use standard names for hydrological variables. For instance, use the Basic Model Interface (BMI) standards: http://csdms.colorado.edu/wiki/BMI_Description
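    A minimal Python sketch of what this step could look like in practice. It is not the JGrass-NewAge code and it does not use the official BMI bindings: the class `LinearReservoir`, its parameters `k` and `dt`, and the three variable names are hypothetical, chosen only to imitate the BMI method names (`initialize`, `update`, `get_value`, …) and the long, self-describing "object__quantity" naming style. Real BMI implementations read a configuration file in `initialize` and copy values into arrays; both are simplified here.

```python
class LinearReservoir:
    """Toy runoff model exposing a small, BMI-like subset of methods."""

    # Hypothetical variable names in an "object__quantity" style, with units.
    _units = {
        "atmosphere_water__precipitation_rate": "mm h-1",  # input
        "reservoir_water__depth": "mm",                    # internal state
        "reservoir_water__outflow_rate": "mm h-1",         # output
    }

    def initialize(self, k=0.1, dt=1.0):
        # Assumption: plain arguments instead of a configuration file.
        self.k = k      # recession coefficient, h-1
        self.dt = dt    # time step, h
        self.time = 0.0
        self.values = {name: 0.0 for name in self._units}

    def get_component_name(self):
        return "Toy linear reservoir"

    def get_input_var_names(self):
        return ("atmosphere_water__precipitation_rate",)

    def get_output_var_names(self):
        return ("reservoir_water__outflow_rate",)

    def get_var_units(self, name):
        return self._units[name]

    def get_current_time(self):
        return self.time

    def set_value(self, name, value):
        self.values[name] = float(value)

    def get_value(self, name):
        return self.values[name]

    def update(self):
        # Add rainfall to storage, then release a fraction k of it.
        rain = self.values["atmosphere_water__precipitation_rate"]
        storage = self.values["reservoir_water__depth"] + rain * self.dt
        outflow = self.k * storage
        self.values["reservoir_water__depth"] = storage - outflow * self.dt
        self.values["reservoir_water__outflow_rate"] = outflow
        self.time += self.dt


model = LinearReservoir()
model.initialize(k=0.5, dt=1.0)
model.set_value("atmosphere_water__precipitation_rate", 10.0)
model.update()
for name in model.get_output_var_names():
    print(name, model.get_value(name), model.get_var_units(name))
```

    Because the caller only uses the standard method and variable names, a different runoff component exposing the same names could be swapped in without changing the calling code.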
  • 31.
    Counterattack: a strategy to make our work replicable
    Using Authorea for uploading complementary material and documentation: https://www.authorea.com/
    You can also use Jupyter or Beaker.
  • 32.
    Counterattack: a strategy to make our work replicable
    Using Open Data as much as possible in our research, and making our own data openly available*.
    *Scientific Data is a Nature journal: http://www.nature.com/sdata/
    https://en.wikipedia.org/wiki/Open_data
  • 33.
    Other (more valuable) experiences
    The R community (https://cran.r-project.org/web/views/ReproducibleResearch.html)
    Communities
  • 34.
  • 35.
  • 36.
    XXXV CONVEGNO NAZIONALE DI IDRAULICA E COSTRUZIONI IDRAULICHE, Bologna, 14-16 September 2016
    Bancheri M. et al., Research reproducibility and replicability: the case of JGrass-NewAge
    Source code, project examples, community blog, documentation: http://geoframe.blogspot.com & https://github.com/geoframecomponents
    R. Rigon, M. Bancheri, F. Serafin, W. Abera, G. Formetta
  • 37.
    Become a Reproducible Research Warrior!
    Do not wait! Make your stuff available on the Web (whatever format) under an open license*.
    *As in Tim Berners-Lee's 5-star scheme: http://5stardata.info/. Waiting to have it in better shape will delay publication forever, and your contribution will be lost (like tears in rain).
    R2 The stairs. For yourself
  • 38.
    Make it available with documentation (e.g. a README file for every data set and every model).
    R2 The stairs. For yourself
  • 39.
    Provide examples of runs, and give some references. Structure your documentation. Include the figures and how they were made.
    R2 The stairs. For yourself
  • 40.
    Use URLs and providers like GitHub to store code and data, so people can point at your stuff and browse it freely.
    R2 The stairs. For yourself
  • 41.
    Maintain a user group (and answer questions when asked). Provide every run you do on the web with the appropriate metadata.** ***
    **https://earthsystemcog.org/projects/es-doc-models/
    ***http://abouthydrology.blogspot.it/2014/10/naming-things-in-hydrological-models.html
    R2 The stairs. For yourself
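    One possible way to attach metadata to a run is a small JSON "sidecar" record stored next to the results. The sketch below is only an illustration under stated assumptions, not the es-doc format linked above: the function `describe_run`, the field names, the model name `toy-model`, and the sample rainfall bytes are all hypothetical. The checksum of the input is included because anyone repeating the run first needs to know they hold the same data.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def describe_run(model, version, parameters, input_bytes, period):
    """Build a metadata record for one model run (hypothetical field names)."""
    return {
        "model": model,
        "version": version,
        "parameters": parameters,
        # Fingerprint of the input data, so others can check they use the same file.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "period": period,
        "python": platform.python_version(),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


# Made-up input data standing in for the content of a rainfall file.
rainfall = b"2002-01-01 00:00,1.2\n2002-01-01 01:00,0.0\n"

record = describe_run(
    model="toy-model",
    version="0.1",
    parameters={"k": 0.5},
    input_bytes=rainfall,
    period=["2002-01-01", "2003-12-31"],
)

# Serialize with sorted keys so the record is easy to compare across runs.
sidecar = json.dumps(record, indent=2, sort_keys=True)
print(sidecar)
```

    The printed text could be saved as a file alongside the outputs and published together with them.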
  • 42.
    R2 The stairs. For yourself
      • Maintain a user group (and answer questions when asked). Provide every run you do on the web with the appropriate metadata.** ***
      • Use URLs and providers like GitHub to store code and data, so people can point at your stuff and browse it freely*
      • Make it available with documentation (e.g. a README file for every data set and every model).
      • Provide examples of runs, and give some references. Structure your documentation. Include the figures and how they were made.
      • Do not wait! Make your stuff available on the Web (whatever format) under an open license*.
    *http://5stardata.info/
    **https://earthsystemcog.org/projects/es-doc-models/
    ***http://abouthydrology.blogspot.it/2014/10/naming-things-in-hydrological-models.html
  • 43.
    See also: http://sciencecodemanifesto.org/
    Journals
  • 44.
    In conclusion
      • Research must be reproducible
      • In many cases it would be better if it were replicable
      • Making our research replicable can be an advantage
      • It can favour the progress of science
      • Do not be shy: share your research
      • Nobody is going to hurt you
    Find your own way to Reproducible Research
    Conclusions
  • 45.
    Questions?
    Find this presentation at http://abouthydrology.blogspot.com
    Other material at http://abouthydrology.blogspot.it/2015/07/theory-and-practice-of-reproducible.html
    Ulrici, 2000?
  • 46.
    References
    For the web references, see the slides.
    Botter, G., Bertuzzo, E., & Rinaldo, A. (2010). Transport in the hydrologic response: Travel time distributions, soil moisture dynamics, and the old water paradox. Water Resources Research, 46, W03514. doi:10.1029/2009WR008371
    David, O., Ascough II, J.C., Lloyd, W., Green, T.R., Rojas, K.W., Leavesley, G.H., & Ahuja, L.R. (2013). A software engineering perspective on environmental modeling framework design: the Object Modeling System. Environmental Modelling & Software, 39, 201–213.
    Formetta, G., Antonello, A., Franceschi, S., David, O., & Rigon, R. (2014). Hydrological modelling with components: A GIS-based open-source framework. Environmental Modelling & Software, 55, 190–200.
    Formetta, G., Mantilla, R., Franceschi, S., Antonello, A., & Rigon, R. (2011). The JGrass-NewAge system for forecasting and managing the hydrological budgets at the basin scale: models of flow generation and propagation/routing. Geoscientific Model Development, 4(4), 943–955. doi:10.5194/gmd-4-943-2011
  • 47.
    References right to the point
    Hutton, C., Wagener, T., Freer, J., Han, D., Duffy, C., & Arheimer, B. (2016). Most computational hydrology is not reproducible, so is it really science? Water Resources Research, 1–14. http://doi.org/10.1002/2016WR019285
    Ince, D. C., Hatton, L., & Graham-Cumming, J. (2012). The case for open computer programs. Nature, 482(7386), 485–488. http://doi.org/10.1038/nature10836
    Skaggs, T. H., Young, M. H., & Vrugt, J. A. (2015). Reproducible Research in Vadose Zone Sciences. Vadose Zone Journal, 1–5. http://doi.org/10.2136/vzj