Abstract:
Humans need a secure and sustainable food supply, and science can help. We have an opportunity to transform agriculture by combining knowledge of organisms and ecosystems to engineer ecosystems that sustainably produce food, fuel, and other services. The challenge is that the information we have (measurements, theories, and laws found in publications, notebooks, software, and human brains) is difficult to combine. We homogenize, encode, and automate the synthesis of data and mechanistic understanding in a way that links understanding at different scales and across domains. This allows extrapolation, prediction, and assessment. Reusable components allow automated construction of new knowledge that can be used to assess, predict, and optimize agro-ecosystems.
Developing reusable software and open-access databases is hard. Examples will illustrate how we use the Predictive Ecosystem Analyzer (PEcAn, pecanproject.org), the Biofuel Ecophysiological Traits and Yields database (BETYdb, betydb.org), and ecophysiological crop models to predict crop yield, decide which crops to plant, and identify which traits can be selected for the next generation of data-driven crop improvement. A next step is to automate the use of sensors mounted on robots, drones, and tractors to assess plants in the field. The TERRA Reference Phenotyping Platform (TERRA-Ref, terraref.github.io) will provide an open-access database and computing platform on which researchers can use and develop tools that use sensor data to assess and manage agricultural and other terrestrial ecosystems.
TERRA-Ref will adopt existing standards and develop modular software components and common interfaces, in collaboration with researchers from iPlant, NEON, AgMIP, USDA, rOpenSci, and ARPA-E, as well as many other scientists and industry partners. Our goal is to advance science by enabling efficient use, reuse, exchange, and creation of knowledge.
---
Invited talk for the "Informatics for Reproducibility in Earth and Environmental Science Research" session at the American Geophysical Union Fall Meeting, Dec 17 2015.
Reusable Software and Open Data To Optimize Agriculture
1. Reusable Software and Open Data To Optimize Agriculture
David LeBauer
AGU 2015 Fall Meeting
@dlebauer
2. Overview
Ideas:
Software: Modular, Reusable, and Useable
Data: Harmonization, Distribution
Workflows: Reproducible, Automated
Science: Cumulative and Synthetic
Examples:
PEcAn Project, BETYdb, TERRA Ref
3. Agriculture: Model and Application
Food, fuel, and other ecosystem services (e.g. C, N, H2O)
Basic science: genes to organism to ecosystem
Engineering applications: computing, data collection, prediction
Scales: Enzyme, Ecosystem, Continent
4. Overview
Ideas:
Software: Modular, Reusable, and Useable
Data: Harmonization, Distribution
Workflows: Reproducible, Automated
Science: Cumulative and Synthetic
Examples:
PEcAn Project, BETYdb, TERRA Ref
betydb.org github.com/pecanproject/bety @BETYdatabase
14. PEcAn:
complex models in complex workflows
Modeling Information Systems
Dietze 2016 Princeton University Press
BioCro / Wimovac Crop Model
Humphries and Long, 2005
Miguez et al 2009
15. Ecosystem Modeling c. 2012
Workflow: Select Site, Configure Run, Run Model, Visualize, Export
Dietze, Kooper, LeBauer 2012
16. LeBauer et al 2013
Given available data,
How well do we know parameters?
How does this affect prediction?
What should we collect?
PEcAn:
Sensitivity Analysis & Variance Decomposition
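The questions on this slide can be illustrated with a small sketch. It is not PEcAn's implementation: the `toy_yield` model, the parameter names, and their uncertainty distributions are all hypothetical. Each parameter is sampled from its assumed distribution while the others are held at their means, and the resulting output variance is reported as a share of the summed one-at-a-time variances.

```python
import random
import statistics

# Hypothetical parameter uncertainties as (mean, standard deviation).
PARAMS = {
    "vcmax": (100.0, 20.0),              # assumed poorly constrained
    "specific_leaf_area": (15.0, 1.0),   # assumed well constrained
    "root_fraction": (0.3, 0.05),
}

def toy_yield(p):
    """Made-up stand-in for a crop model; returns an arbitrary yield value."""
    return 0.1 * p["vcmax"] + 0.5 * p["specific_leaf_area"] - 10.0 * p["root_fraction"]

def variance_decomposition(model, params, n=2000, seed=1):
    """Vary one parameter at a time; return each one's share of output variance."""
    rng = random.Random(seed)
    means = {name: mean for name, (mean, _) in params.items()}
    variances = {}
    for name, (mean, sd) in params.items():
        outputs = []
        for _ in range(n):
            trial = dict(means)              # all other parameters stay at their means
            trial[name] = rng.gauss(mean, sd)
            outputs.append(model(trial))
        variances[name] = statistics.variance(outputs)
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}

shares = variance_decomposition(toy_yield, PARAMS)
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.1%} of output variance")
```

In this toy case the poorly constrained parameter dominates the output variance, which is the kind of result that would suggest where new measurements are most useful.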
18. BETYdb + PEcAn
BETYdb is PEcAn’s informatics backend
Provides data, workflow and data provenance
Federated network of databases
19. Overview
Ideas:
Software: Modular, Reusable, and Useable
Data: Harmonization, Distribution
Workflows: Reproducible, Automated
Science: Cumulative and Synthetic
Examples:
PEcAn Project, BETYdb, TERRA Ref
terraref.ncsa.illinois.edu github.com/terraref @terra_ref
20. TERRA: Better Breeding Through Science
We have increased yields many times in the last 60 years.
What new opportunities does modern science provide?
University of Illinois
Integrated Pest Management
• Use scientific understanding to select for traits
• Replace manual measurement with remote sensing
• Target specific genes and phenotypes in crosses
21. ARPA-E TERRA Program
Six Funded Teams
$30M in awards
$5M in sensors
TERRA Ref:
Public reference data
High-performance computing (HPC)
22. TERRA Ref: An Agricultural Observatory
Similar to and informed by:
Large Synoptic Survey Telescope
National Ecological Observatory Network
23. Open: Science, Data, Software
Useable: Useful and Familiar to Scientists, Breeders, Precision Ag
Modular: Extensible, Distributed, Automated, Interoperable
Interdisciplinary: Genes to Ecosystems with Robots, Vision, Statistics
Scalable: From Mobile Devices to High Performance Computers
terraref.ncsa.illinois.edu @terra_ref github.com/terraref
TERRA Reference Data and Computing
24. Sensor Data Sources
LemnaTec Indoor
Danforth, St. Louis
LemnaTec Field
USDA ALARC, Maricopa, AZ
Tractor and UAV
Kansas State
Plus other teams and the public (sharing optional)
Shared sorghum genomics and germplasm
25. Reference Data
Raw Sequence Data
Aligned Reads
SNPs
Images
Spectra
Point clouds
Shapes
Biomass, Growth
Tissue Chemistry
Photosynthesis
Yield
Stress Tolerance
Ecosystem Services
26. Big Data Volume & Velocity
Imaging Spectrometers:
VNIR ~3-4 TB/d
SWIR ~1 TB/d
3D Laser Scanner ~ 1 TB/d
4 Year Total: 1 - 40 PB
[Chart: share of data volume by source: VNIR, SWIR, 3D …, everything else]
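As a rough check on these figures, the sketch below multiplies the daily rates listed on this slide by a four-year collection period. The assumption that every sensor collects data every day (`duty_cycle=1.0`) is made here for illustration and is not stated on the slide; 1 PB is taken as 1000 TB, and "everything else" is left out.

```python
# Daily rates from the slide, in TB/day, as (low, high) ranges.
DAILY_TB = {
    "VNIR imaging spectrometer": (3.0, 4.0),
    "SWIR imaging spectrometer": (1.0, 1.0),
    "3D laser scanner": (1.0, 1.0),
}

def total_pb(daily_tb, years=4, duty_cycle=1.0):
    """Return (low, high) cumulative volume in PB.

    duty_cycle is the assumed fraction of days on which sensors collect data.
    """
    days = years * 365 * duty_cycle
    low = sum(lo for lo, _ in daily_tb.values()) * days / 1000.0
    high = sum(hi for _, hi in daily_tb.values()) * days / 1000.0
    return low, high

low, high = total_pb(DAILY_TB)
print(f"4-year total: {low:.1f} to {high:.1f} PB")
```

Under these assumptions the three sensors alone would produce roughly 7 to 9 PB, which falls inside the 1 - 40 PB range quoted on the slide. Lowering `duty_cycle` shows how seasonal or intermittent operation would reduce the total.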
27. Computing and Storage
ROGER server: 1 PB online, GIS-optimized
Nebula: NCSA OpenStack server
Blue Waters: 10 PB tape storage
Your Local: [Desktop, HPC, or Sensor Platform]
28. Data Products Standards Committee
Paul Bartlett Near Earth Autonomy
Jeff White USDA ALARC, ICASA
Melba Crawford Purdue University
Michael Gore, Elodie Garazave Cornell University
Matt Colgan Blue River
Christer Janssen PNNL
Barnabas Poczos Carnegie Mellon
Alex Thomasson Texas A&M University
Cheryl Porter University of Florida, AgMIP, USDA
Shawn Serbin Brookhaven National Lab, PEcAn
Shelly Petroy, Christine Laney NEON
Carolyn J. Lawrence-Dill Iowa State, AgBioData
Eric Lyons University of Arizona, CoGE
Ted Habermann HDF Group
Participants
• Project representatives
• Domain Experts
• Scientific Community (You)*
Responsibilities
• Define Data
• Revise, Improve
• Training, Outreach
* github.com/terraref/reference-data/issues
29. Computing Pipeline
Data Uploaded via API
Triggers Analytical Pipeline
Generates and Stores Data, Metadata
Users select data, launch VM:
Favorite Software
Data Mounted
HPC Access
States can be Shared, Archived
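The upload-triggers-processing pattern on this slide can be sketched as follows. This is a minimal in-memory illustration and not the TERRA Ref API: the `Pipeline` class, its `register` and `upload` methods, the `count_bytes` step, and the file name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Pipeline:
    """Hypothetical upload endpoint that runs registered steps on new data."""
    steps: Dict[str, List[Callable[[bytes], dict]]] = field(default_factory=dict)
    store: List[dict] = field(default_factory=list)

    def register(self, data_type: str, step: Callable[[bytes], dict]) -> None:
        """Attach an analysis step to a data type (e.g. 'image')."""
        self.steps.setdefault(data_type, []).append(step)

    def upload(self, name: str, data_type: str, payload: bytes) -> dict:
        """Record the upload, run every matching step, and keep derived metadata."""
        record = {"name": name, "type": data_type, "size": len(payload), "derived": {}}
        for step in self.steps.get(data_type, []):
            record["derived"][step.__name__] = step(payload)
        self.store.append(record)
        return record

def count_bytes(payload: bytes) -> dict:
    """Placeholder analysis; a real step might extract plant height or spectra."""
    return {"n_bytes": len(payload)}

pipeline = Pipeline()
pipeline.register("image", count_bytes)
# Hypothetical file name and stand-in payload.
result = pipeline.upload("plot_001.png", "image", b"\x89PNG...")
print(result["derived"])
```

Keying the steps by data type is one way to keep components modular: a new sensor or algorithm would be added by registering another step rather than by editing the upload code.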
30. Acknowledgements
Projects: PEcAn, NCSA, BrownDog, Plants In Silico, CyberGIS,
National Data Service, USDA, AgMIP
Data: Providers and Curators
Mentors: Mike Dietze, Steve Long, Kathleen Treseder
Funding: NSF, EBI, ARPA-E, DOE, NASA
31. Contact
               Web                         GitHub              Twitter
David LeBauer  dlebauer@illinois.edu       dlebauer            @dlebauer
BETYdb         betydb.org                  pecanproject/bety   @BETYdatabase
PEcAn Project  pecanproject.org            pecanproject/pecan  @PEcAnproject
TERRA Ref      terraref.ncsa.illinois.edu  terraref            @terra_ref
32. Plants in Silico
PIs: Amy Marshall-Colon, Steve Long, James O’Dwyer, Diwakar Shukla
Multi-scale modeling platform to predict crop response to climate change
33. Plants in Silico: Modular Architecture
Zhu et al, 2015 Plant Cell Environ