1) The document describes a workshop on research synthesis and reproducibility.
2) It discusses challenges with reproducibility in science and proposes provenance and conceptual tools like PRIMAD to help address these challenges.
3) The document presents a case study where an intern was able to reproduce results from a 2006 ecological niche modeling paper using the Whole Tale environment and MaxEnt software, demonstrating computational reproducibility.
A technology architecture for managing explicit knowledge over the entire lif... (William Hall)
This document discusses managing explicit knowledge over the entire lifecycle of large projects. It covers theories of knowledge management, including different paradigms of knowledge and how technology has revolutionized knowledge transmission. As an example, it examines issues around managing knowledge for an ANZAC ship project. It suggests content management needs to evolve to understand paradigm shifts in how knowledge itself is defined and managed.
Writing and Publishing about Applied Technologies in Tech Journals and Books (Shalin Hai-Jew)
This slideshow provides insights on how to write and publish about applied technologies in tech journals and books, including the following:
Getting started in tech publishing
Cost-benefit calculations
Parts to an article; parts to a chapter
Writing process
Collaborating
Publishing process
Acquiring readers (and citations)
Post-publishing
Next works
This document discusses several key concepts related to information architecture and understanding systems. It addresses issues like fragmentation in websites, findability of information, and the relationship between information and culture. It also discusses categories as cornerstones of cognition, connections in systems happening simultaneously in many directions, and the importance of making the invisible visible.
Andrea Scharnhorst (2016) Why do we need to model the science system? Talk at the seminar of the Eindhoven Centre for Innovation Sciences, June 2, 2016
The document discusses several key challenges for users of the Library of Congress:
1) Fragmentation across multiple sites and domains causes confusion for users about where to find different resources.
2) Users have difficulty finding what they need from the home page or when entering via search/deep links.
3) Many potential users never access the Library's resources because they are not easily findable.
The document argues that improving findability and reducing fragmentation across the Library's online presence would help more users access and utilize its resources.
Kno.e.sis is an Ohio Center of Excellence focused on knowledge-enabled computing. It was established to contribute to basic theory about computation and cognitive systems, and address problems associated with productive thinking using large amounts of data. Kno.e.sis has exceptional regional, national, and international collaborations with organizations like AFRL, Microsoft, IBM, and W3C. It is well funded with over $10 million currently and has world class students and faculty, including one of the most cited computer science authors.
From Research Objects to Reproducible Science Tales (Bertram Ludäscher)
University of Southampton. Electronics & Computer Science. Research Seminar (Invited Talk).
TITLE: From Research Objects to Reproducible Science Tales
ABSTRACT. Rumor has it that there is a reproducibility crisis in science. Or maybe there are multiple crises? What do we mean by reproducibility and replicability anyways? In this talk I will first make an attempt at sorting out some of the terminological confusion in this area, focusing on computational aspects. The PRIMAD model is another attempt to describe different aspects of reproducibility studies by focusing on the "delta" between those studies and the original study. In addition to these more theoretical investigations, I will discuss practical efforts to create more reproducible and more transparent computational platforms such as the one developed by the Whole-Tale project: here 'tales' are executable research objects that may combine data, code, runtime environments, and narratives (i.e., the traditional "science story"). I will conclude with some thoughts about the remaining challenges and opportunities to bridge the large conceptual gaps that continue to exist despite the recognition of problems of reproducibility and transparency in science.
ABOUT the Speaker. Bertram Ludäscher is a professor at the School of Information Sciences at the University of Illinois, Urbana-Champaign and a faculty affiliate with the National Center for Supercomputing Applications (NCSA) and the Department of Computer Science at Illinois. Until 2014 he was a professor at the Department of Computer Science at the University of California, Davis. His research interests range from practical questions in scientific data and workflow management, to database theory and knowledge representation and reasoning. Prior to his faculty appointments, he was a research scientist at the San Diego Supercomputer Center (SDSC) and an adjunct faculty at the CSE Department at UC San Diego. He received his M.S. (Dipl.-Inform.) in computer science from the University of Karlsruhe (now K.I.T.), and his PhD (Dr. rer. nat.) from the University of Freiburg, in Germany.
The document discusses several key concepts related to information architecture and understanding systems. It examines the challenges of fragmentation and findability on websites, and how users struggle to understand complex systems when those systems are described only with words. It emphasizes that information architecture must account for how information and culture are interconnected, and that the effective design and management of information systems requires understanding the nature of information and how it relates to categories, connections, and consequences within a cultural context.
Presentation to CRC Mental Health Early Career Researcher Workshop, Melbourne 29.11.17 for @andsdata.
Workshop title: A by-product of scientific training: We're all a little bit biased.
The Architecture of Understanding (and Happiness) (Peter Morville)
This document discusses several key topics related to information architecture and understanding systems. It addresses issues like fragmentation in websites, problems with findability of resources, and the importance of understanding the nature of information in systems. It also discusses concepts like categories, connections, consequences, and culture as they relate to information and understanding. Throughout the document there are various quotes about topics like systems thinking, planning, and the role of the information architect.
Reproducibility of computational research: methods to avoid madness (Session ...) (Mike Hucka)
Introduction on the session "Reproducibility of computational research: methods to avoid madness" held Wednesday, September 17, during ICSB 2014 in Melbourne, Australia, 2014.
Open Educational Resources (OER) are fast gaining traction amongst the academic community as a viable means of increasing access and equity in education. The concept of OER is of special significance to marginalised communities in the Global South, where distance education is prominent due to the inability of conventional brick-and-mortar institutions to cope with the growing demand. However, the wider adoption of OER by academics in the Global South has been inhibited by various social, economic, and technological factors. One of the major technological inhibitors is the current inability to search for OER which are academically useful and of an acceptable academic standard. Many technological initiatives have been proposed over the recent past to provide potential solutions to this issue. Among these are OER curation standards such as GLOBE, federated search, social semantic search, and search engines such as DiscoverEd, OCW Finder, and Pearson's Project Blue Sky. The research discussed in this paper was carried out in the form of a literature review and informal interviews with experts. The objective of the study is to document the extent to which OER search issues contribute to the slow uptake of the concept of OER. This review paper discusses the current OER search dilemma and the impact of some of the key initiatives which propose potential solutions.
Future agenda: repositories, and the research process (Martin Donnelly)
This document discusses research data management in the context of non-standard archiving of research outputs, with a focus on challenges in the arts and humanities. It notes that while data reuse has long been integral to various creative disciplines, archiving creative research data presents unique issues not present in scientific disciplines. These include the personal nature of creative works, differentiating between research and personal works, issues with non-digital materials, and the blurry boundaries of creative research processes. The document raises questions around concepts like evidence, facts, and replication in subjective creative research.
Scientific software engineering methods and their validity (Daniel Mendez)
This document summarizes a talk on scientific methods and their validity given at Technische Universität München. The talk discusses key concepts in the philosophy of science like epistemology and different views of science. It provides an overview of common scientific methods like empirical methods, case studies, and hypothesis testing. The talk delves into challenges of obtaining truth and impacts of human factors. It also discusses how scientific methods can be applied in a PhD dissertation and the importance of increasing validity. The overall document aims to discuss implications of scientific methods for everyday scientific work.
Presentation given at NUI, Galway 2019-04-11 for Open Science Week.
An overview of Early Career Researchers, their innovation and contribution towards Open Infrastructure
The document discusses inquiry as a cognitive process that allows humans to understand their surroundings through discovery, invention, and testing of solutions to problems. It describes inquiry as a process of considered thought rather than reflexive response. It then outlines different types of knowledge generation and structures of disciplinary inquiry, including proto-curiosity, curiosity, replicative, technological, informal personal learning, formal authoritative community instruction, and more.
The paper discusses distributed cognition in an airline cockpit, where the cognitive labor of flying a modern jet is distributed across the crew. It presents a case study simulation of a flight from Sacramento to Los Angeles to illustrate how information processing is distributed across representational media like checklists, displays, and standard operating procedures. The study puts forth the hypothesis that understanding human cognition requires examining how it is distributed in social and cultural systems using tools and artifacts.
Ontologies for baby animals and robots: From "baby stuff" to the world of adul... (Aaron Sloman)
In contrast with ontology developers concerned with a symbolic or digital environment (e.g. the internet), I draw attention to some features of our 3-D spatio-temporal environment that challenge young humans and other intelligent animals and will also challenge future robots. Evolution provides most animals with an ontology that suffices for life, whereas some animals, including humans, also have mechanisms for substantive ontology extension based on results of interacting with the environment. Future human-like robots will also need this. Since pre-verbal human children and many intelligent non-human animals, including hunting mammals, nest-building birds and primates can interact, often creatively, with complex structures and processes in a 3-D environment, that suggests (a) that they use ontologies that include kinds of material (stuff), kinds of structure, kinds of relationship, kinds of process (some of which are process-fragments composed of bits of stuff changing their properties, structures or relationships), and kinds of causal interaction and (b) since they don't use a human communicative language they must use information encoded in some form that existed prior to human communicative languages both in our evolutionary history and in individual development. Since evolution could not have anticipated the ontologies required for all human cultures, including advanced scientific cultures, individuals must have ways of achieving substantive ontology extension. The research reported here aims mainly to develop requirements for explanatory designs. The attempt to develop forms of representation, mechanisms and architectures that meet those requirements will be a long term research project.
Biodiversity Informatics: An Interdisciplinary Challenge (Bryan Heidorn)
"Impacto de la Informática en el Conocimiento de la Biodiversidad: Actualidad y Futuro" ("Impact of Informatics on Biodiversity Knowledge: Present and Future") at Universidad Nacional de Colombia on August 12, 2011. https://sites.google.com/site/simposioinformaticaicn/home
Emergence and Growth of Knowledge and Diversity in Hierarchically Complex Org... (Bill Hall)
Seminar presentation: University of Melbourne Department of Information Systems, 13 October, 2006. Summarises the development of a biologically based theory of knowledge that combines Karl Popper's evolutionary epistemology (as developed in his 1972 book, Objective Knowledge) with Humberto Maturana and Francisco Varela's concept of autopoiesis (as developed in their 1980 book, Autopoiesis and Cognition).
Open Data and the Social Sciences - OpenCon Community Webcast (Right to Research)
The document discusses issues with transparency and reproducibility in social science research. It notes that research influences policy and decisions that affect millions of lives. However, weak academic norms like publication bias, p-hacking, non-disclosure, and failure to replicate can distort the body of evidence. The document proposes solutions like pre-registering studies and pre-specifying analyses to address these issues. It also discusses resources and efforts like the Berkeley Initiative for Transparency in the Social Sciences to raise awareness, foster adoption of transparent practices, and identify strategies to improve reproducibility.
This document provides details of a proposed panel discussion on domain analysis at the CoLIS9 conference in Uppsala, Sweden. The panel aims to introduce emerging methodological approaches and analytical techniques for conducting domain analysis. It will feature presentations from several experts in the field, including Birger Hjørland, Sanna Talja, Isto Huvila, Eva Jansen, and Jenna Hartel. They will discuss techniques such as ethnographic studies, arts-informed research, and ecological approaches. The goal is to disrupt normative assumptions about domain analysis and represent the expanding diversity of approaches. The panel also seeks to inspire more researchers to engage with domain analysis and contribute to ongoing debates around research methods in library and information science.
Reconciling Conflicting Data Curation Actions: Transparency Through Argument... (Bertram Ludäscher)
Yilin Xia (yilinx2@illinois.edu),
Shawn Bowers (bowers@gonzaga.edu),
Lan Li (lanl2@illinois.edu), and
Bertram Ludäscher (ludaesch@illinois.edu)
Presented at IDCC-2024 in Edinburgh.
ABSTRACT. We propose a new approach for modeling and reconciling conflicting data cleaning actions. Such conflicts arise naturally in collaborative data curation settings where multiple experts work independently and then aim to put their efforts together to improve and accelerate data cleaning. The key idea of our approach is to model conflicting updates as a formal argumentation framework (AF). Such argumentation frameworks can be automatically analyzed and solved by translating them to a logic program PAF whose declarative semantics yield a transparent solution with many desirable properties, e.g., uncontroversial updates are accepted, unjustified ones are rejected, and the remaining ambiguities are exposed and presented to users for further analysis. After motivating the problem, we introduce our approach and illustrate it with a detailed running example introducing both well-founded and stable semantics to help understand the AF solutions. We have begun to develop open source tools and Jupyter notebooks that demonstrate the practicality of our approach. In future work we plan to develop a toolkit for conflict resolution that can be used in conjunction with OpenRefine, a popular interactive data cleaning tool.
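The core idea of the abstract (conflicting updates as arguments, resolved by an argumentation-framework semantics) can be illustrated with a small sketch. This is a hypothetical example, not the authors' code: the argument names (u1, u2, u3) and the attack relation are invented, and only the grounded (well-founded) extension is computed.

```python
def grounded_extension(args, attacks):
    """Compute the grounded extension of the argumentation framework
    AF = (args, attacks) by iterating the characteristic function
    from the empty set until it reaches its least fixed point."""
    # For each argument, precompute the set of arguments attacking it.
    attackers = {a: {x for (x, y) in attacks if y == a} for a in args}
    accepted = set()
    while True:
        # An argument is acceptable w.r.t. `accepted` if every one of its
        # attackers is itself attacked by some accepted argument.
        new = {a for a in args
               if all(any((d, b) in attacks for d in accepted)
                      for b in attackers[a])}
        if new == accepted:
            return accepted
        accepted = new

# Three conflicting cleaning actions: u1 and u2 contradict each other,
# and u3 (say, independent evidence) attacks u2.
args = {"u1", "u2", "u3"}
attacks = {("u1", "u2"), ("u2", "u1"), ("u3", "u2")}
print(sorted(grounded_extension(args, attacks)))  # → ['u1', 'u3']
```

Here u3 is unattacked and therefore accepted; it defeats u2, which in turn defends u1, matching the behaviour described in the abstract: uncontroversial updates are accepted and unjustified ones rejected.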
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion (Bertram Ludäscher)
Research Seminar Talk (online) at KRR@UP (Uni Potsdam) on Dec 6, 2023, loosely based on a paper with the same title at the 7th Workshop on Advances in Argumentation in Artificial Intelligence (AI3)
Similar to Dissecting Reproducibility: A case study with ecological niche models in the Whole Tale environment
Games, Queries, and Argumentation Frameworks: Time for a Family Reunion!Bertram Ludäscher
7th Workshop on Advances in Argumentation in Artificial Intelligence (AI3) at
AIxIA 2023: 22nd International Conference of the Italian Association for Artificial Intelligence.
Presentation of a paper by Bertram Ludäscher, Shawn Bowers, and Yilin Xia, given virtually on November 9, 2023.
[Flashback] Integration of Active and Deductive Database RulesBertram Ludäscher
Slides of my PhD defense at the University of Freiburg, 1998.
Statelog and similar state-oriented extensions of Datalog have seen renewed interest subsequently, e.g., see
[Hel10] Hellerstein, J.M., 2010. The declarative imperative: experiences and conjectures in distributed logic. ACM SIGMOD Record, 39(1), pp.5-19.
[AMC+11]
Alvaro, P., Marczak, W.R., Conway, N., Hellerstein, J.M., Maier, D. and Sears, R., 2011. Dedalus: Datalog in time and space. In Datalog Reloaded: First International Workshop, Datalog 2010, Oxford, UK, March 16-19, 2010. Revised Selected Papers (pp. 262-281). Springer
[Flashback] Statelog: Integration of Active & Deductive Database RulesBertram Ludäscher
This document discusses Statelog, which integrates active and deductive database rules. Statelog allows both active rules, which trigger actions and modify the database, and deductive rules, which derive new facts. It defines the semantics of different types of rules and how they interact. Statelog guarantees termination of rule evaluation at both compile-time and runtime through techniques like state-stratification and delta-monotonicity. It can express complex temporal queries and supports features like nested transactions.
Answering More Questions with Provenance and Query PatternsBertram Ludäscher
This document discusses using provenance information to improve transparency and reproducibility in research. It begins by asking questions about the input data, methods, and parameter settings used in a study in order to assess its reliability. It then provides examples of how workflow systems can capture provenance at both the design level (prospective provenance) and runtime level (retrospective provenance). These include a Kepler workflow that simulates X-ray data collection and provenance traces captured by DataONE. The document argues that provenance is a critical link between workflow modeling and runtime traces that can increase trust in research findings.
Computational Reproducibility vs. Transparency: Is It FAIR Enough?Bertram Ludäscher
Keynote at CLIR Workshop (Webinar): Torward Open, Reproducible, and Reusable Research. February 10, 2021. https://reusableresearch.com/
ABSTRACT. The “reproducibility crisis” has resulted in much interest in methods and tools to improve computational reproducibility. FAIR data principles (data should be findable, accessible, interoperable, and reusable) are also being adapted and evolved to apply to other artifacts, notably computational analyses (scientific workflows, Jupyter notebooks, etc.). The current focus on computational reproducibility of scripts and other computational workflows sometimes overshadows a somewhat neglected and arguably more important issue: transparency of data analysis, including data wrangling and cleaning. In this talk I will ask the question: What information is gained by conducting a reproducibility experiment? This leads to a simple model (PRIMAD) that aims to answer this question by sorting out different scenarios. Finally, I will present some features of Whole-Tale, a computational platform for reproducible and transparent computational experiments.
By Michael Gryk and Bertram Ludäscher. Presented at 2020 JCDL-SIGCM Workshop, August 1, 2020.
ABSTRACT. Conceptual models can serve multiple purposes: communication of information between stakeholders, information abstraction and generalization, and information organization for archival and retrieval. An ongoing research question is how to formally define the fit-for-purpose of a conceptual model as well as to define metrics or tests to determine whether a given model faithfully supports a designated purpose.
This paper summarizes preliminary investigations in this area by presenting toy problems along with different conceptual models for the system under study. It is argued that the different models are adequate in supporting a sophisticated query and yet they adopt different normalization schemes and will differ in expressiveness depending on the implied purpose of the models. As the subtitle suggests, this work is intended to be primarily exploratory as to the constraints a formal system would require in defining the “usefulness”, “expressiveness” and “equivalence” of conceptual models.
From Workflows to Transparent Research Objects and Reproducible Science TalesBertram Ludäscher
The document discusses prospective and retrospective provenance in scientific workflows. Prospective provenance involves modeling the workflow design, while retrospective provenance records the workflow execution. The YesWorkflow and noWorkflow tools demonstrate these two types of provenance. YesWorkflow annotates scripts to recreate a workflow model from the script, while noWorkflow records step-by-step runtime logs. Combining both approaches provides a more complete view of a workflow's provenance. Maintaining provenance is important for reproducibility and understanding the origins of scientific results.
Possible Worlds Explorer: Datalog & Answer Set Programming for the Rest of UsBertram Ludäscher
PWE: Datalog & ASP for the Rest of Us discusses using Possible Worlds Explorer (PWE) to make Datalog and Answer Set Programming (ASP) more accessible to non-experts. It covers topics like using provenance to explain query results, capturing rule firings to track provenance, representing provenance as a graph, using states to track derivation rounds, and declarative profiling of Datalog programs. The presentation advocates for tools like PWE that wrap Datalog/ASP engines to combine them with Python ecosystems and allow interactive use in Jupyter notebooks. This makes the languages more approachable and helps users build on existing work by experimenting further.
Deduktive Datenbanken & Logische Programme: Eine kleine ZeitreiseBertram Ludäscher
Deductive Databases & Logic Programs: Back to the Future!
Colloquium talk on the occasion of the retirement of Prof. Dr. Georg Lausen, May 10th, 2019, Universität Freiburg, Germany
Incremental Recomputation: Those who cannot remember the past are condemned ...Bertram Ludäscher
Talk given at "Problems and techniques for Incremental Re-computation: provenance and beyond".
A workshop co-organized with Provenance Week 2018
King's College London, 12th and 13th July, 2018
Organizers: Paolo Missier (Newcastle University), Tanu Malik (DePaul University), Jacek Cala (Newcastle University)
Abstract: Incremental recomputation has applications, e.g., in databases and workflow systems. Methods and algorithms for recomputation depend on the underlying model of computation (MoC) and model of provenance (MoP). This relation is explored with some examples from databases and workflow systems.
Validation and Inference of Schema-Level Workflow Data-Dependency AnnotationsBertram Ludäscher
Presentation slides of paper by Shawn Bowers, Timothy McPhillips, and Bertram Ludäscher, given by Shawn at Provenance and Annotation of Data and Processes - 7th International Provenance and Annotation Workshop, IPAW 2018, King's College London, UK, July 9-10, 2018.
The paper won a the IPAW best paper award: https://twitter.com/kbelhajj/status/1017082775856467968
ABSTRACT. An advantage of scientific workflow systems is their ability to collect runtime provenance information as an execution trace. Traces include the computation steps invoked as part of the workflow run along with the corresponding data consumed and produced by each workflow step. The information captured by a trace is used to infer "lineage'' relationships among data items, which can help answer provenance queries to find workflow inputs that were involved in producing specific workflow outputs. Determining lineage relationships, however, requires an understanding of the dependency patterns that exist between each workflow step's inputs and outputs, and this information is often under-specified or generally assumed by workflow systems. For instance, most approaches assume all outputs depend on all inputs, which can lead to lineage "false positives''. In prior work, we defined annotations for specifying detailed dependency relationships between inputs and outputs of computation steps. These annotations are used to define corresponding rules for inferring fine-grained data dependencies from a trace. In this paper, we extend our previous work by considering the impact of dependency annotations on workflow specifications. In particular, we provide a reasoning framework to ensure the set of dependency annotations on a workflow specification is consistent. The framework can also infer a complete set of annotations given a partially annotated workflow. Finally, we describe an implementation of the reasoning framework using answer-set programming.
An ontology-driven framework for data transformation in scientific workflowsBertram Ludäscher
Presentation given by Bertram at the Data Integration in the Life Sciences (DILS) Workshop in Leipzig, Germany, 2004.
Reference:
Bowers, Shawn, and Bertram Ludäscher. "An ontology-driven framework for data transformation in scientific workflows." In International Workshop on Data Integration in the Life Sciences (DILS), pp. 1-16. Springer, 2004.
So this isn't new -- but still relevant :-)
ABSTRACT. Ecologists spend considerable effort integrating heterogeneous data for statistical analyses and simulations, for example, to run and test predictive models. Our research is focused on reducing this effort by providing data integration and transformation tools, allowing researchers to focus on “real science,” that is, discovering new knowledge through analysis and modeling. This paper defines a generic framework for transforming heterogeneous data within scientific workflows. Our approach relies on a formalized ontology, which serves as a simple, unstructured global schema. In the framework, inputs and outputs of services within scientific workflows can have structural types and separate seman- tic types (expressions of the target ontology). In addition, a registration mapping can be defined to relate input and output structural types to their corresponding semantic types. Using registration mappings, ap- propriate data transformations can then be generated for each desired service composition. Here, we describe our proposed framework and an initial implementation for services that consume and produce XML data.
The document describes the Whole Tale platform, which aims to facilitate reproducibility in computational research. Whole Tale allows researchers to package computational narratives, data, code, and provenance information into "tales" that can be shared and re-executed. Key features of Whole Tale include running interactive notebooks, versioning and sharing tales, and integrating provenance tracking tools to provide transparency into computational workflows. The speaker demonstrates several example tales and discusses upcoming Whole Tale features and applications in different domains like archaeology, astronomy, and materials science.
From Provenance Standards and Tools to Queries and Actionable ProvenanceBertram Ludäscher
The document discusses computational provenance and the need for tracking data lineage and workflow processes. It presents several tools and projects that aim to capture and manage provenance information, including DataONE, SKOPE, KURATOR, WHOLE-TALE, and YesWorkflow. The document argues that provenance is important for understanding what happened in computational and data-driven research in order to ensure transparency and reproducibility.
Wild Ideas at TDWG'17: Embrace multiple possible worlds; abandon techno-ligionBertram Ludäscher
The document discusses two ideas: 1) Embracing multiple possible worlds by using techniques like answer set programming to represent alternative scenarios rather than a single consensus view. 2) Abandoning strict adherence to technology stacks and standards ("techno-ligion") by focusing on simple powerful solutions, using natural language when possible, and paying a fee each time a complex technical term is used. It suggests using techniques like technology golf to explore problems through minimal programs instead of lengthy debates over formal representations.
2. All-in-One (Teaser)
• Reproducibility Crisis in Science
• A conceptual tool: Provenance
• Transparency? Explanation? Provenance!
• … why-, how-, where-, why-not-, data-, workflow- … provenance …
• Terminological Chaos Reigns
– … replicability … reproducibility … repeatability …
• A modest proposal and (evolving) conceptual tool: PRIMAD
– What's fixed? What varies? (X → X′, Y → Y′, …)
– What is the information gain when succeeding or failing to reproduce?
• Tool Tools (cf. audio-book, e-book, book-book)
– Computational Reproducibility? Whole-Tale (VMs++)!
– Modeling (Dataflow) Dependencies? YesWorkflow!
– Terminological Confusion? EulerX! ("Semantics")
• A Case Study
– Whole Tale Summer Internship (Santiago Núñez-Corrales):
– Reproducibility in Ecological Niche Models: the case of Phillips et al. (2006)
Ludäscher & Núñez-Corrales
Whole Tale
6. Computational Provenance …
• Origin and processing history of artifacts
– data products, figures, …
– also: the underlying workflow
⇒ understand methods, dataflow, and dependencies
⇒ role of computational provenance in HoH!?
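The idea of recording an artifact's origin and processing history can be sketched in a few lines of Python. This is a minimal illustration, not any particular provenance tool: each step logs digests of its inputs and output plus a timestamp, so a result can later be traced back through the steps that produced it.

```python
import hashlib
import json
import time

def run_step(name, func, inputs, trace):
    """Run one processing step and append a provenance record to the trace."""
    output = func(*inputs.values())
    record = {
        "step": name,
        # short content digests stand in for full data copies
        "inputs": {k: hashlib.sha256(repr(v).encode()).hexdigest()[:8]
                   for k, v in inputs.items()},
        "output_digest": hashlib.sha256(repr(output).encode()).hexdigest()[:8],
        "timestamp": time.time(),
    }
    trace.append(record)
    return output

trace = []
cleaned = run_step("clean", lambda xs: [x for x in xs if x is not None],
                   {"raw": [1, None, 3]}, trace)
mean = run_step("mean", lambda xs: sum(xs) / len(xs),
                {"cleaned": cleaned}, trace)
# the trace now documents which steps led to the final number
print(json.dumps([r["step"] for r in trace]))
```

Even this toy trace answers the two basic provenance questions from the slide: what data went in, and through which processing steps the result was derived.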
[Figure: report cover, "Climate Change Impacts in the United States", U.S. National Climate Assessment, U.S. Global Change Research Program]
9. Provenance in DataONE
A DataONE search (here: “grass”) yields different packages with Data Provenance
(not covered: Semantic Search)
10. Exploring Provenance in DataONE
• Let's go there ⇒ Mark Carls. 2017. Analysis of hydrocarbons following the Exxon Valdez oil spill, Gulf of Alaska, 1989–2014. Gulf of Alaska Data Portal. urn:uuid:3249ada0-afe3-4dd6-875e-0f7928a4c171.
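Searches like the "grass" example can also be issued programmatically against DataONE's coordinating-node query service. The endpoint and parameters below reflect my understanding of the DataONE Solr query API and should be verified against the current API documentation; the sketch only constructs the request URL.

```python
from urllib.parse import urlencode

# Assumed DataONE coordinating-node Solr endpoint (verify against current docs).
BASE = "https://cn.dataone.org/cn/v2/query/solr/"

def dataone_search_url(keyword, rows=10):
    """Build a Solr keyword-search URL over the DataONE metadata index."""
    params = {"q": keyword, "rows": rows, "wt": "json"}
    return BASE + "?" + urlencode(params)

url = dataone_search_url("grass")
print(url)
```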
13. Adding YesWorkflow to DataONE
• Yaxing's script with input and output products
• Christopher's YesWorkflow model
• Christopher using Yaxing's outputs as inputs for his script
• Christopher's results can be traced back all the way to Yaxing's input
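YesWorkflow recovers a dataflow model purely from structured comments (`@begin`/`@end`, `@in`/`@out`, `@uri`) embedded in an otherwise ordinary script. A minimal script in that style, with illustrative file names and a placeholder model step:

```python
# @begin main
# @in  species_occurrences @uri file:occurrences.csv
# @out prediction_map @uri file:prediction.csv

# @begin clean_data
# @in species_occurrences
# @out cleaned_occurrences
def clean_data(rows):
    # drop records that lack coordinates
    return [r for r in rows
            if r.get("lat") is not None and r.get("lon") is not None]
# @end clean_data

# @begin fit_model
# @in cleaned_occurrences
# @out prediction_map
def fit_model(rows):
    # placeholder for the actual model fit
    return {"n_records": len(rows)}
# @end fit_model

rows = [{"lat": 9.9, "lon": -84.1}, {"lat": None, "lon": None}]
result = fit_model(clean_data(rows))
print(result)
# @end main
```

Running `yw graph` over such a script yields the kind of box-and-arrow model shown on the slide, which is what lets one trace Christopher's results back to Yaxing's input.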
17. To succeed or to fail? What do we gain?
• Successful reproducibility study:
– increases trust in the prior study :)
– … but no surprises :(
• Failed reproducibility study:
– decreases trust in (or falsifies) the prior study :(
– … but a surprising failure yields new information/knowledge :)
• Learning from failures!
– not a totally new idea …
– What does a positive vs. negative result mean, anyway?
– When developing software and tools: fail early, fail often …
21. But first: Some Tools ("Cyberinfrastructure")
• SKOPE: system and tools to discover, access, analyze, and visualize paleoenvironmental data
– unprecedented ability to explore provenance (a detailed, comprehensible record of the computational derivation of results)
– for researchers, tinkerers, and modelers
• Whole Tale:
– leverage and contribute to existing CI to support the whole tale ("living paper"), from workflow run to scholarly publication
– integrate tools & CI (DataONE, Globus, iRODS, NDS, …) to simplify use and promote best practices
– driven by science WGs (Archaeology/SKOPE, materials science, astronomy, biology, …)
Ludäscher: Provenance Back & Forth
25. Project Goals (… Reproducibility in Ecological Niche Models …)
● Try to reproduce one set of results reported in the literature using maximum entropy methods (MaxEnt) within the Whole Tale environment
○ Phillips, S. J., Anderson, R. P., & Schapire, R. E. (2006). Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190(3–4), 231–259.
● Determine whether existing software tools focus more on the scientific modeling problem than on software usage, while still covering reproducibility concerns
○ Not possible with existing tools, which are either incomplete or desktop-based, and hence not comparable
● Build scientific software for ecological niche modeling that helps users diversify and trace their stories
○ Introspection-based model
26. intros-MaxEnt: view in PRIMAD++
(columns: Action | Parameter | Raw data | Platform/Stack | Implementation | Method | Research Objective | Actor | Gain)
• Re-code [(x) x]: Run MaxEnt models in the Whole Tale
• Validate [(x) (x) (x) (x) x]: Determine MaxEnt robustness factors
• Re-use [x]: Increase the user base for MaxEnt methods
• Independent [x x]: Collectively verify MaxEnt experiments
• Introspect [(x) (x) x]: Explore and adjust model contents
• Diff [(x) (x) (x) x]: Test hypotheses dependent on state change
• Trace (log) [(x) (x) (x) (x) x]: Capture time-dependent decision-modeling pathways
• Package [(x) (x) (x) (x) x]: Provide a zero cold-start entry for experiments
Freire, J., Fuhr, N., & Rauber, A. (2016). Reproducibility of data-oriented experiments in e-Science (Dagstuhl Seminar 16041). Dagstuhl Reports, 6(1). Schloss Dagstuhl–Leibniz-Zentrum für Informatik.
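The PRIMAD reading of "what varies" can be made executable. The sketch below maps a set of varied dimensions to a named reproducibility scenario; the scenario names follow common usage around PRIMAD rather than a fixed standard, and the single-dimension cases are illustrative.

```python
# PRIMAD dimensions: Platform, Research objective, Implementation,
# Method, Actor, Data (plus Parameters in the PRIMAD++ table above).
SCENARIOS = {
    frozenset(): "repetition (nothing varies)",
    frozenset({"actor"}): "independent verification",
    frozenset({"platform"}): "portability check",
    frozenset({"implementation"}): "re-code",
    frozenset({"data"}): "generalizability check",
    frozenset({"parameters"}): "robustness / sensitivity check",
}

def classify(varied):
    """Name the reproducibility scenario implied by the varied dimensions."""
    return SCENARIOS.get(frozenset(varied),
                         "mixed scenario: vary " + ", ".join(sorted(varied)))

print(classify({"actor"}))
print(classify({"data", "method"}))
```

The point of the table, and of this sketch, is that "the gain" of a reproducibility experiment depends entirely on which dimensions were held fixed and which were wiggled.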
27. Ecological niche models …
1. Positive observations (i.e., presence-only data) suffice to compute a distribution of a species
2. The likelihood of the presence of an individual depends on biologically relevant environmental factors
3. Interactions between species can be abstracted as environmental factors, and hence are not modeled explicitly
4. The distribution is stated in terms of the probability of finding a member of the species at the locations of interest
5. An exact fit is not a good fit, but rather an overfit
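Assumption 4, a per-location probability of presence, is exactly what maximum entropy modeling delivers: among all distributions over locations that match the empirical feature averages of the presence records, pick the one with maximum entropy, which is a Gibbs distribution p_i ∝ exp(λ·f_i). A toy one-feature fit via bisection on λ, as an illustrative sketch and not the Phillips et al. implementation:

```python
import math

def gibbs(lam, features):
    """Maximum-entropy (Gibbs) distribution p_i proportional to exp(lam * f_i)."""
    w = [math.exp(lam * f) for f in features]
    z = sum(w)
    return [x / z for x in w]

def fit_lambda(features, target_mean, lo=-50.0, hi=50.0, iters=100):
    """Bisection on lambda: E_p[f] is monotone in lambda, so we can
    solve E_p[f] = empirical mean of f over the presence records."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        p = gibbs(mid, features)
        mean = sum(pi * f for pi, f in zip(p, features))
        if mean < target_mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# 4 grid cells, one environmental feature each; presences were observed
# at the cells with feature values 0.8 and 0.6, so the empirical mean is 0.7
features = [0.2, 0.4, 0.6, 0.8]
lam = fit_lambda(features, target_mean=0.7)
p = gibbs(lam, features)
print([round(x, 3) for x in p])  # higher probability at higher feature values
```

Assumption 5 then corresponds to the regularization in real MaxEnt: constraints are matched only up to a tolerance, precisely to avoid the "exact fit" overfitting named above.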
38. Summary of Outcomes
1. Able to execute a version of MaxEnt with the original data from Phillips et al. (2006) within the Whole Tale
a. Stated in terms of a regularized support vector machine (complex code!)
b. Discovered problems with reproducibility and how to evaluate it
2. Implemented a tool for batch georeferencing of DarwinCore records based on minimal location data
a. Helpful for assigning geolocation data after taxonomy alignment
b. Discovered the data is much less clean than expected
3. A new "introspective" software version of MaxEnt
a. Available on PyPI
b. Based on a state machine
39. … now what?
• PRIMAD++
– PRIMAD is built on the idea of keeping some things the same and "wiggling" others
– We can start from the "execution stack":
• hardware … operating system … libraries … programming languages … IDEs …
– Then move into the domain:
• … varying datasets, parameters, assumptions …
– Experimental Design++!
• PRIMAD++ HoH (v2?)
• Tools to support
– "higher-order" {data, parameter, method, …} sweeps
– automating these (workflow tools!)
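The "higher-order sweeps" idea is essentially a cross product over everything one chooses to wiggle. A minimal sketch with illustrative axis names, where each combination would become one workflow run:

```python
from itertools import product

# Each axis is one PRIMAD-style dimension we choose to vary.
datasets = ["phillips2006", "resampled"]
regularization = [0.5, 1.0, 2.0]
methods = ["maxent", "logistic"]

def run_experiment(dataset, reg, method):
    # placeholder for a real (workflow-managed) model run
    return {"dataset": dataset, "reg": reg, "method": method}

runs = [run_experiment(d, r, m)
        for d, r, m in product(datasets, regularization, methods)]
print(len(runs))  # 2 * 3 * 2 = 12 configurations
```

A workflow system adds to this skeleton exactly what the slide asks for: automated scheduling of the runs and provenance capture for each configuration.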
43. Taxonomic concept alignment, Andropogon glomeratus–virginicus complex, spanning 11 classifications authored 1889–2015
• 36 unique taxonomic names
• 88 taxonomic concept labels
⇒ name sec. author strings
• Alignment by A.S. Weakley
⇒ row position = congruence
• 1/36 names with a unique 1:1 name:meaning cardinality across all classifications
• Andropogon virginicus
• Source: Franz et al. 2016¹
¹ Franz et al. 2016. Names are not good enough: reasoning over taxonomic change in the Andropogon complex. Semantic Web Journal (IOS). doi:10.3233/SW-160220
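Alignments like Weakley's relate concepts pairwise using the five RCC-5 relations that Euler/X-style reasoning builds on. A minimal sketch, modeling each concept as a set of hypothetical specimen identifiers (the memberships below are invented for illustration):

```python
def rcc5(a, b):
    """Classify two taxonomic concepts, modeled as member sets,
    into one of the five RCC-5 relations used in concept alignment."""
    if a == b:
        return "congruent (==)"
    if a < b:                 # proper subset
        return "included in (<)"
    if a > b:                 # proper superset
        return "includes (>)"
    if a & b:                 # non-empty intersection, neither contains the other
        return "overlaps (><)"
    return "disjoint (!)"

# Hypothetical membership of two authors' concepts of "A. virginicus"
virginicus_1889 = {"s1", "s2", "s3"}
virginicus_2015 = {"s2", "s3", "s4"}
print(rcc5(virginicus_1889, virginicus_2015))
```

This is why "names are not good enough": the same name used in 1889 and 2015 can denote merely overlapping concepts, and only 1 of the 36 names in this complex keeps a congruent meaning across all classifications.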
45. Half-Smokes in DC: Typical for the Northeast? … or the South!? (A tale of two taxonomies: NDC vs. CEN)
"…in the face of incompatible information or data structures among users or among those specifying the system, attempts to create unitary knowledge categories are futile. Rather, parallel or multiple representational forms are required" [Bowker & Star, 2000, p. 159]
[Two maps: National Diversity Council (NDC) map with regions West, Southwest, Southeast, Midwest, and Northeast; US Census Bureau (CEN) map with regions West, South, Midwest, and Northeast]
Source: Yi-Yun (Jessica) Cheng (PhD student, iSchool @ Illinois)
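The "parallel representational forms" point can be made concrete: the same place maps to different regions under the two classifications. The CEN assignments below follow the Census Bureau's four-region scheme; the NDC assignments are illustrative readings of the slide's map, not authoritative data.

```python
# Two classifications of the same places (partial, illustrative)
NDC = {"DC": "Northeast", "TX": "Southwest", "GA": "Southeast", "OH": "Midwest"}
CEN = {"DC": "South",     "TX": "South",     "GA": "South",     "OH": "Midwest"}

def disagreements(t1, t2):
    """Places on which the two taxonomies assign different regions."""
    return {s for s in t1.keys() & t2.keys() if t1[s] != t2[s]}

print(sorted(disagreements(NDC, CEN)))  # is DC's half-smoke Northeast or South?
```

As with the Andropogon names, neither taxonomy is "wrong"; keeping both representations, plus an explicit mapping of their disagreements, is what Bowker & Star's argument recommends.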