Data-Driven Model Development
David LeBauer, Mike Dietze, Deepak Jaiswal, Rob Kooper, Stephen P. Long, Shawn Serbin, Dan Wang
Information

Objective: Useful Predictions

Precision, Accuracy
Clark et al. 2001. Ecological Forecasts: An Emerging Imperative. Science.
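Precision and accuracy can be separated numerically. Here is a minimal sketch, assuming an ensemble of predictions for one quantity and a single observation. Accuracy is summarized as bias (ensemble mean minus observation), and precision as the ensemble standard deviation. The function name `accuracy_precision` and the yield values are hypothetical.

```python
import statistics

def accuracy_precision(ensemble, observed):
    """Return (bias, spread) for an ensemble of predictions.

    bias   : ensemble mean minus the observation (accuracy; 0 is ideal)
    spread : sample standard deviation of the ensemble (precision; smaller is more precise)
    """
    if len(ensemble) < 2:
        raise ValueError("need at least two ensemble members")
    bias = statistics.mean(ensemble) - observed
    spread = statistics.stdev(ensemble)
    return bias, spread

# Hypothetical yield predictions (Mg/ha) compared with one observation of 12 Mg/ha
precise_but_biased = [20.1, 19.9, 20.0, 20.2, 19.8]
accurate_but_imprecise = [8.0, 16.0, 12.0, 10.0, 14.0]
print(accuracy_precision(precise_but_biased, 12.0))      # large bias, small spread
print(accuracy_precision(accurate_but_imprecise, 12.0))  # no bias, large spread
```

A useful prediction needs both: a small bias and a spread that honestly reflects uncertainty.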
Sources of Uncertainty

Schlesinger et al. 1979. Terminology for model credibility. Simulation.
Windows

An error has occurred. To continue:
Press Enter to return to Windows, or
Press CTRL+ALT+DEL to restart your computer. If you do this, you will lose any
unsaved information in all open applications.
Error: 0E : 016F : BFF9B3D4
Press any key to continue _
Technical Uncertainty: A Cautionary Tale

[Figure: Yield. Observed values compared with predictions from successive builds: Priors, + Trait Data, + Flux Data, Annual Merge, + Latest Version.]
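One safeguard against this kind of technical error is a regression test. It compares output from the latest code version against saved output from a reference version and flags any site where the change exceeds a tolerance. The sketch below assumes yields are stored as dictionaries keyed by site. The function name, site names, values, and the 5% tolerance are all hypothetical.

```python
def compare_versions(reference, latest, rel_tol=0.05):
    """Return sites whose yield changed by more than rel_tol (relative) between versions.

    reference, latest: dicts mapping site name -> predicted yield (Mg/ha).
    Sites missing from either version are reported as "missing".
    """
    flagged = {}
    for site in sorted(set(reference) | set(latest)):
        if site not in reference or site not in latest:
            flagged[site] = "missing"
            continue
        ref, new = reference[site], latest[site]
        # Relative change; fall back to absolute change when the reference is zero
        change = abs(new - ref) / abs(ref) if ref != 0 else abs(new - ref)
        if change > rel_tol:
            flagged[site] = round(change, 3)
    return flagged

# Hypothetical output before and after merging a new model version
before = {"site_a": 14.2, "site_b": 9.8, "site_c": 11.5}
after = {"site_a": 14.3, "site_b": 6.1, "site_c": 11.5}
problems = compare_versions(before, after)
if problems:
    print("Output changed beyond tolerance:", problems)
```

A flagged site does not by itself mean the new version is wrong. It means the change has to be explained before the new output is trusted.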
Best Practices
Write programs for people, not computers
Automate repetitive tasks
Use the computer to record history

Make incremental changes
Use version control
Don't repeat yourself (or others)
Plan for mistakes
Optimize software only after it works correctly
Document the design and purpose of code
Conduct code reviews
Wilson et al. 2012. Best Practices for Scientific Computing. arXiv:1210.0530v3.
Best Practices 1: Automation
Altintas et al. 2004. Kepler: an extensible system for design and execution of scientific workflows. Proc. 16th ICSSDM.
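Automating repetitive tasks and letting the computer record history can go together. This minimal sketch is plain Python and does not use Kepler or any workflow engine's API. It runs a list of steps in order and appends a provenance record for each step: its name, a hash of its input and output, and a UTC start time. The step names and unit-conversion example are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def fingerprint(obj):
    """Stable SHA-256 hash of a JSON-serializable object (keys sorted)."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode("utf-8")).hexdigest()

def run_workflow(steps, data, history):
    """Run each (name, function) step in order, appending one provenance record per step."""
    for name, func in steps:
        record = {
            "step": name,
            "input_hash": fingerprint(data),
            "started": datetime.now(timezone.utc).isoformat(),
        }
        data = func(data)
        record["output_hash"] = fingerprint(data)
        history.append(record)
    return data

# Hypothetical steps: convert plot yields from kg/ha to Mg/ha, then average them
steps = [
    ("to_mg_per_ha", lambda d: {k: v / 1000 for k, v in d.items()}),
    ("mean_yield", lambda d: {"mean": sum(d.values()) / len(d)}),
]
history = []
result = run_workflow(steps, {"plot1": 12000.0, "plot2": 14000.0}, history)
print(result)
print(json.dumps(history, indent=2))
```

Because each record stores input and output hashes, a later run can be checked against the log to see exactly which step first produced different results.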
Parameter Uncertainty: Test Case
Single analysis: contribution of parameter uncertainty to uncertainty in switchgrass yield prediction.

LeBauer, Wang, Richter, Davidson, and Dietze 2013. Facilitating feedbacks between ecological models and data. Ecological Monographs.
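A simplified way to apportion output uncertainty among parameters is a one-at-a-time analysis. Each parameter is varied across samples from its prior while the others are held at their medians, and the resulting partial variances are compared. This sketch shows only that generic idea and is not the analysis in the cited paper. The toy yield model, parameter names, and normal priors are hypothetical.

```python
import random
import statistics

def toy_yield(params):
    """Hypothetical linear yield model (Mg/ha); stands in for a real crop model."""
    return params["vmax"] * 0.3 + params["sla"] * 0.5 - params["resp"] * 2.0

# Hypothetical priors: (mean, sd) of normal distributions
priors = {"vmax": (40.0, 8.0), "sla": (15.0, 3.0), "resp": (1.0, 0.5)}

def partial_variances(model, priors, n=2000, seed=1):
    """One-at-a-time partial variance of model output for each parameter."""
    rng = random.Random(seed)
    medians = {name: mu for name, (mu, sd) in priors.items()}  # normal: median == mean
    result = {}
    for name, (mu, sd) in priors.items():
        outputs = []
        for _ in range(n):
            params = dict(medians)
            params[name] = rng.gauss(mu, sd)  # vary only this parameter
            outputs.append(model(params))
        result[name] = statistics.variance(outputs)
    return result

pv = partial_variances(toy_yield, priors)
total = sum(pv.values())
for name, v in sorted(pv.items(), key=lambda item: -item[1]):
    print(f"{name}: {100 * v / total:.1f}% of summed partial variance")
```

For this additive toy model the partial variances sum to the total output variance. For a nonlinear model with interactions they generally do not, so the percentages are an approximation.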
Parameter Uncertainty: Automated
* 17 plant functional types
* 6 biomes
* 8 scientists
* 6 months
Dietze, Serbin, Davidson, Desai, Feng, Kelly, Kooper, LeBauer, Mantooth, McHenry, and Wang. Submitted. A quantitative assessment of a terrestrial biosphere model's data needs across North American biomes. JGR.

[Figure: Contribution of parameter uncertainty to model uncertainty (% SD explained).]
Best Practices 2: Iteration with Testing
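Iterating with testing means checking every incremental change against benchmark data before accepting it. This sketch of an integration-test gate accepts a candidate model version only if its RMSE against observations is no worse than the baseline's, plus an optional tolerance. Such a gate could be run automatically on each commit. The function names and benchmark values are hypothetical.

```python
import math

def rmse(predicted, observed):
    """Root-mean-square error between paired predictions and observations."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("predicted and observed must be non-empty and the same length")
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))

def accept_change(baseline_pred, candidate_pred, observed, tolerance=0.0):
    """Accept a model change only if benchmark RMSE does not worsen by more than tolerance."""
    return rmse(candidate_pred, observed) <= rmse(baseline_pred, observed) + tolerance

# Hypothetical benchmark: observed biomass (Mg/ha) and predictions from two model versions
observed = [10.0, 20.0, 30.0]
baseline = [14.0, 17.0, 25.0]
candidate = [11.0, 19.0, 28.0]
print("accept" if accept_change(baseline, candidate, observed) else "reject")
```

A single RMSE threshold is a deliberately simple criterion. In practice several metrics (bias, correlation, spread) could be tracked together.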
Case Study: C4 Crop to Coppice Willow
* C3 Photosynthesis
* Perennial Stem
* Leaf Senescence
Benchmark Data: Aboveground Biomass
* 23 calibration sites
* 72 observations

[Figure: Observed aboveground biomass (Mg/ha).]
Results:

[Figure: Model skill against the benchmark data (standard deviation*, correlation, and RMSE*) as each change is added: Start (C4 Grass), + C3 Photosynthesis, + Perennial Stem, + Fixed Respiration, + Leaf Senescence. *Scaled so that the standard deviation of the data (sd_data) = 1.]
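Standard deviation, correlation, and RMSE scaled to sd_data = 1 are the three statistics of a Taylor diagram, and they can be computed directly. This sketch assumes paired lists of predicted and observed values and uses population (divide-by-n) statistics. It also assumes the centered RMSE customary for Taylor diagrams, for which RMSE'^2 = sd_ratio^2 + 1 - 2 * sd_ratio * R holds. The function name and data are hypothetical.

```python
import math

def taylor_stats(predicted, observed):
    """Normalized Taylor-diagram statistics (observed SD scaled to 1).

    Returns (sd_ratio, correlation, centered_rmse) using population statistics.
    """
    n = len(observed)
    mp = sum(predicted) / n
    mo = sum(observed) / n
    dp = [p - mp for p in predicted]  # centered predictions
    do = [o - mo for o in observed]   # centered observations
    sd_p = math.sqrt(sum(x * x for x in dp) / n)
    sd_o = math.sqrt(sum(x * x for x in do) / n)
    r = sum(a * b for a, b in zip(dp, do)) / (n * sd_p * sd_o)
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(dp, do)) / n)
    return sd_p / sd_o, r, crmse / sd_o

# Hypothetical predicted vs observed aboveground biomass (Mg/ha)
obs = [5.0, 12.0, 20.0, 31.0, 44.0]
pred = [8.0, 10.0, 24.0, 27.0, 40.0]
sd_ratio, r, crmse = taylor_stats(pred, obs)
# The three statistics satisfy the law-of-cosines relation used to draw the diagram
assert math.isclose(crmse ** 2, sd_ratio ** 2 + 1 - 2 * sd_ratio * r)
print(f"SD ratio {sd_ratio:.2f}, correlation {r:.2f}, centered RMSE {crmse:.2f}")
```

A perfect model sits at SD ratio 1, correlation 1, and RMSE 0. Constant inputs (zero standard deviation) would make the statistics undefined.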
[Figure: Predicted vs. observed aboveground biomass (Mg/ha).]
Conclusions
* Best practices lead to more effective and efficient modeling
* Applied integration tests to support model development
* Controlling technical error produces more robust and accurate inference
Future Directions
* Track benchmark metrics for specific model runs
* Maintain ability to reproduce published results
* Automated testing with each code commit or major release
* Use current metrics to define the limits of model credibility
More Information
Email: dlebauer@illinois.edu
Web: pecanproject.org
Development: github.com/pecanproject

LeBauer: Data-Driven Model Development