SlideShare a Scribd company logo
Predicting Phenotype from Multi-Scale Genomic
and Environment Data using Neural Networks
and Knowledge Graphs: An Introduction to the
NSF GenoPhenoEnvo Project
Anne E Thessen, Michael Behrisch, Emily J Cain, Remco Chang, Bryan Heidorn,
Pankaj Jaiswal, David LeBauer, Ab Mosca, Monica C Munoz-Torres, Arun Ross,
Tyson Swetnam
Acknowledgements
● NSF Ideas Lab
● NSF Award 1940330 Harnessing
the Data Revolution
● Translational and Integrative
Sciences Lab OSU (tislab.org)
● Two new members: Ishita Debnath
(MSU) and Ryan Bartelme (UA)
Predicting Phenotype from Genes and Environments
● G + E = P only works
in the simplest
systems, if at all
● There’s a lot we
don’t know about
how genes are
translated into
phenotypes
● How do phenotypes
affect the
ecosystem?
How can we predict phenotype given an organism’s environmental conditions and
genomic endowment?
● State-of-the-art statistical
modeling has led to many
insights, but has been applied
to very controlled systems.
● Getting the phenotype is only
part of the answer.
● Can we use the predictive
model to reveal hidden
processes? Critical variables?
Machine Learning for Results and Process
● Pros
○ Capable of coping with non-linearity
in biological systems
○ Find hidden relationships
● Cons
○ Output is opaque
○ Not enough curated data not available
○ Data that are available not “ML ready”
GenoPhenoEnvo Project
● Goal: Develop a machine learning framework capable of predicting
phenotypes based on multi-scale data about genes and environments.
○ Leverage existing, well-structured, cross-species reference data about genes and phenotypes
○ Provide interactive data visualizations for examining and interpreting the “black-box” behavior
of ML models and their results
○ Realize a new model for relating phenotypes, genetic endowment, and environmental
characteristics
● Just started Oct 1
● Now in Year 1 Q2
Training Data
Year 1
● TERRA-REF sorghum
● Heavily controlled and measured
environment data
● Thorough genotyping and
phenotyping
● Knowledge graph links
● How do we prepare these data to
be ML ready?
Year 2
● TERRA-REF wheat
● NEON, EOS
● Citizen science phenology
What is a Knowledge Graph?
How Can Knowledge Graphs Help?
DOI: 10.1126/scitranslmed.3009262
How Can Knowledge Graphs Help?
How Can Knowledge Graphs Help?
● Constrain ML and prioritize results
● Quality control - sanity check
● Integrate heterogeneous data
○ Manage terminology
○ Manage scale and granularity
● Find new relationships
● Fill in data gaps with inferencing
Training Data - Genomic
● Use VCF files and phenotype data from
TERRA-REF
● SNP to phenotype associations with p values and
effect sizes (GWAS)
● Manhattan plot for all phenotype data
List 1: TERRA-REF data for Y1
Gene Information (G)
● Sorghum whole genome
● Sorghum genotypes[219]
Phenotype Information (P)
● Emergence Date
● End of Season Biomass
● End of Season Height
● Flowering Date
Environment Information (E)
● Soil moisture
● Air temperature
● PAR (Irradiance)
● Wind speed
● Humidity
● Precipitation and Irrigation
● Fertilizer inputs
By M. Kamran Ikram et al - Ikram MK et al (2010) Four Novel Loci (19q13, 6q24, 12q24, and
5q14) Influence the Microcirculation In Vivo. PLoS Genet. 2010 Oct 28;6(10):e1001184.
doi:10.1371/journal.pgen.1001184.g001, CC BY 2.5,
https://commons.wikimedia.org/w/index.php?curid=18056138
*example
GWAS results can be combined
with the knowledge graph results to
reduce input variables for ML
Training Data - Phenomic
● Days to flowering
● Growing Degree Days to flowering
● Days to flag leaf emergence
● Growing Degree Days to flag leaf emergence
● Canopy height
● End of season canopy height
● Above ground dry biomass at harvest
List 1: TERRA-REF data for Y1
Gene Information (G)
● Sorghum whole genome
● Sorghum genotypes[219]
Phenotype Information (P)
● Emergence Date
● End of Season Biomass
● End of Season Height
● Flowering Date
Environment Information (E)
● Soil moisture
● Air temperature
● PAR (Irradiance)
● Wind speed
● Humidity
● Precipitation and Irrigation
● Fertilizer inputs
Phenotypes can be
represented as
categories or
integers (10 cm or
decreased height?)
Training Data - Environmental
● Air temperature
● Relative humidity
● Precipitation
● Wind speed and direction
● Growing degree days
● Cumulative precipitation
List 1: TERRA-REF data for Y1
Gene Information (G)
● Sorghum whole genome
● Sorghum genotypes[219]
Phenotype Information (P)
● Emergence Date
● End of Season Biomass
● End of Season Height
● Flowering Date
Environment Information (E)
● Soil moisture
● Air temperature
● PAR (Irradiance)
● Wind speed
● Humidity
● Precipitation and Irrigation
● Fertilizer inputs
Data from weather station and gantry
Abstracted to daily average, min, and max
Machine Learning - Preliminary
1. Regression Models
2. Simple Neural Networks
3. Deep Neural Networks
Year 2 - Expanding to Ecosystems (Preliminary)
Leverage data and resources from
multiple NSF supported programs:
● Analyze genetic and remote
sensing data from NEON
● Utilize the CyVerse Data Science
Workbench
● XSEDE for ML computations
List 2: Observational & EOS data for Y2
Gene Information (G)
● Cottonwood whole genome
● Cottonwood genotypes
● Sorghum whole genome
● Sorghum genotypes
● Wheat whole genome
● Wheat genotypes
Environment Information (E)
● Soil moisture (e.g., SMAP)
● Precipitation (Daymet)
● Air temperature (Daymet)
● PAR (NARR)
● Soil Type (USDA)
Phenotype Information (P)
● Leafing date
● Flowering date
● Breaking leaf buds (#)
● Ripe fruits (#)
● Increasing leaf size (Y/N)
● Falling leaves (Y/N)
● Colored leaves (Y/N)
● Flowers or buds (#)
● Open flowers (#)
● Pollen release (Y/N)
● Recent fruit/seed drop
(Y/N)
● Fruits (#)
● NDVI (EOS)
GenoPhenoEnvo Project Information
● Join our Google Group
● Watch our GitHub Repo
github.com/genophenoenvo
● Search Twitter hashtag
#GenoPhenoEnvo
● Visit the project web page
● Anne E Thessen
● annethessen@gmail.com
Questions?

More Related Content

Similar to Predicting Phenotype from Multi-Scale Genomic and Environment Data using Neural Networks and Knowledge Graphs: An Introduction to the NSF GenoPhenoEnvo Project

Open Science and Ecological meta-anlaysis
Open Science and Ecological meta-anlaysisOpen Science and Ecological meta-anlaysis
Open Science and Ecological meta-anlaysis
Antica Culina
 
Phenomics in crop improvement
Phenomics in crop  improvementPhenomics in crop  improvement
Phenomics in crop improvement
sukruthaa
 
My presentation at ICEM 2017: From data mining to information extraction: usi...
My presentation at ICEM 2017: From data mining to information extraction: usi...My presentation at ICEM 2017: From data mining to information extraction: usi...
My presentation at ICEM 2017: From data mining to information extraction: usi...
matteodefelice
 
Affordable field high-throughput phenotyping - some tips
Affordable field high-throughput phenotyping - some tipsAffordable field high-throughput phenotyping - some tips
Affordable field high-throughput phenotyping - some tips
CIMMYT
 
NTT
NTTNTT
Perth ausplots presentation_070616_internet_qu
Perth ausplots presentation_070616_internet_quPerth ausplots presentation_070616_internet_qu
Perth ausplots presentation_070616_internet_qu
bensparrowau
 
Undergraduate Modeling Workshop - Vegetation Working Group Final Presentatio...
Undergraduate Modeling Workshop -  Vegetation Working Group Final Presentatio...Undergraduate Modeling Workshop -  Vegetation Working Group Final Presentatio...
Undergraduate Modeling Workshop - Vegetation Working Group Final Presentatio...
The Statistical and Applied Mathematical Sciences Institute
 
Imprint of nutrient availability on WUE and Ts- α relationship
Imprint of nutrient availability on WUE and Ts- α relationshipImprint of nutrient availability on WUE and Ts- α relationship
Imprint of nutrient availability on WUE and Ts- α relationship
Integrated Carbon Observation System (ICOS)
 
Kim_WE3_T05_2.pptx
Kim_WE3_T05_2.pptxKim_WE3_T05_2.pptx
Kim_WE3_T05_2.pptxgrssieee
 
New predictive characterization methods for accessing and using crop wild rel...
New predictive characterization methods for accessing and using crop wild rel...New predictive characterization methods for accessing and using crop wild rel...
New predictive characterization methods for accessing and using crop wild rel...
Bioversity International
 
Utility of transcriptome sequencing for phylogenetic
Utility of transcriptome sequencing for phylogeneticUtility of transcriptome sequencing for phylogenetic
Utility of transcriptome sequencing for phylogenetic
EdizonJambormias2
 
EcoTas13 BradEvans e-MAST
EcoTas13 BradEvans e-MASTEcoTas13 BradEvans e-MAST
EcoTas13 BradEvans e-MAST
TERN Australia
 
agriculture decision support system
agriculture decision support systemagriculture decision support system
agriculture decision support system
Pankaj Khodifad
 
Jianqiang Ren_Simulation of regional winter wheat yield by EPIC model.ppt
Jianqiang Ren_Simulation of regional winter wheat yield by EPIC model.pptJianqiang Ren_Simulation of regional winter wheat yield by EPIC model.ppt
Jianqiang Ren_Simulation of regional winter wheat yield by EPIC model.pptgrssieee
 
Baseline study for EIA
Baseline study for EIABaseline study for EIA
Baseline study for EIA
Jenson Samraj
 
Drought Assessment + Impacts: A Preview
Drought Assessment + Impacts: A PreviewDrought Assessment + Impacts: A Preview
Drought Assessment + Impacts: A Preview
Jenkins Macedo
 
Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...
Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...
Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...
John Blue
 
General ausplots school
General ausplots schoolGeneral ausplots school
General ausplots school
bensparrowau
 
Topological Data Analysis What is it? What is it good for? How can it be use...
Topological Data Analysis  What is it? What is it good for? How can it be use...Topological Data Analysis  What is it? What is it good for? How can it be use...
Topological Data Analysis What is it? What is it good for? How can it be use...
DanChitwood
 

Similar to Predicting Phenotype from Multi-Scale Genomic and Environment Data using Neural Networks and Knowledge Graphs: An Introduction to the NSF GenoPhenoEnvo Project (20)

Open Science and Ecological meta-anlaysis
Open Science and Ecological meta-anlaysisOpen Science and Ecological meta-anlaysis
Open Science and Ecological meta-anlaysis
 
Phenomics in crop improvement
Phenomics in crop  improvementPhenomics in crop  improvement
Phenomics in crop improvement
 
My presentation at ICEM 2017: From data mining to information extraction: usi...
My presentation at ICEM 2017: From data mining to information extraction: usi...My presentation at ICEM 2017: From data mining to information extraction: usi...
My presentation at ICEM 2017: From data mining to information extraction: usi...
 
Affordable field high-throughput phenotyping - some tips
Affordable field high-throughput phenotyping - some tipsAffordable field high-throughput phenotyping - some tips
Affordable field high-throughput phenotyping - some tips
 
NTT
NTTNTT
NTT
 
Perth ausplots presentation_070616_internet_qu
Perth ausplots presentation_070616_internet_quPerth ausplots presentation_070616_internet_qu
Perth ausplots presentation_070616_internet_qu
 
Undergraduate Modeling Workshop - Vegetation Working Group Final Presentatio...
Undergraduate Modeling Workshop -  Vegetation Working Group Final Presentatio...Undergraduate Modeling Workshop -  Vegetation Working Group Final Presentatio...
Undergraduate Modeling Workshop - Vegetation Working Group Final Presentatio...
 
Imprint of nutrient availability on WUE and Ts- α relationship
Imprint of nutrient availability on WUE and Ts- α relationshipImprint of nutrient availability on WUE and Ts- α relationship
Imprint of nutrient availability on WUE and Ts- α relationship
 
Kim_WE3_T05_2.pptx
Kim_WE3_T05_2.pptxKim_WE3_T05_2.pptx
Kim_WE3_T05_2.pptx
 
New predictive characterization methods for accessing and using crop wild rel...
New predictive characterization methods for accessing and using crop wild rel...New predictive characterization methods for accessing and using crop wild rel...
New predictive characterization methods for accessing and using crop wild rel...
 
Newproject
NewprojectNewproject
Newproject
 
Utility of transcriptome sequencing for phylogenetic
Utility of transcriptome sequencing for phylogeneticUtility of transcriptome sequencing for phylogenetic
Utility of transcriptome sequencing for phylogenetic
 
EcoTas13 BradEvans e-MAST
EcoTas13 BradEvans e-MASTEcoTas13 BradEvans e-MAST
EcoTas13 BradEvans e-MAST
 
agriculture decision support system
agriculture decision support systemagriculture decision support system
agriculture decision support system
 
Jianqiang Ren_Simulation of regional winter wheat yield by EPIC model.ppt
Jianqiang Ren_Simulation of regional winter wheat yield by EPIC model.pptJianqiang Ren_Simulation of regional winter wheat yield by EPIC model.ppt
Jianqiang Ren_Simulation of regional winter wheat yield by EPIC model.ppt
 
Baseline study for EIA
Baseline study for EIABaseline study for EIA
Baseline study for EIA
 
Drought Assessment + Impacts: A Preview
Drought Assessment + Impacts: A PreviewDrought Assessment + Impacts: A Preview
Drought Assessment + Impacts: A Preview
 
Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...
Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...
Dr. Andres Perez - PRRS Epidemiology: Best Principles of Control at a Regiona...
 
General ausplots school
General ausplots schoolGeneral ausplots school
General ausplots school
 
Topological Data Analysis What is it? What is it good for? How can it be use...
Topological Data Analysis  What is it? What is it good for? How can it be use...Topological Data Analysis  What is it? What is it good for? How can it be use...
Topological Data Analysis What is it? What is it good for? How can it be use...
 

More from Anne Thessen

Unifying Genomics, Phenomics, and Environments
Unifying Genomics, Phenomics, and EnvironmentsUnifying Genomics, Phenomics, and Environments
Unifying Genomics, Phenomics, and Environments
Anne Thessen
 
Combining Phenomes and Genomes to Fill Analytical Gaps: Data Management in Ph...
Combining Phenomes and Genomes to Fill Analytical Gaps: Data Management in Ph...Combining Phenomes and Genomes to Fill Analytical Gaps: Data Management in Ph...
Combining Phenomes and Genomes to Fill Analytical Gaps: Data Management in Ph...
Anne Thessen
 
Bridging discrepancies across North American butterfly naming authorities: Su...
Bridging discrepancies across North American butterfly naming authorities: Su...Bridging discrepancies across North American butterfly naming authorities: Su...
Bridging discrepancies across North American butterfly naming authorities: Su...
Anne Thessen
 
Ontological Support of Data Discovery and Synthesis in Estuarine and Coastal ...
Ontological Support of Data Discovery and Synthesis in Estuarine and Coastal ...Ontological Support of Data Discovery and Synthesis in Estuarine and Coastal ...
Ontological Support of Data Discovery and Synthesis in Estuarine and Coastal ...
Anne Thessen
 
Next-Gen Taxonomic Descriptions for Microbial Eukaryotes
Next-Gen Taxonomic Descriptions for Microbial EukaryotesNext-Gen Taxonomic Descriptions for Microbial Eukaryotes
Next-Gen Taxonomic Descriptions for Microbial Eukaryotes
Anne Thessen
 
Linking biodiversity data for ecology
Linking biodiversity data for ecologyLinking biodiversity data for ecology
Linking biodiversity data for ecology
Anne Thessen
 
Data Infrastructure for Coastal and Estuarine Science
Data Infrastructure for Coastal and Estuarine ScienceData Infrastructure for Coastal and Estuarine Science
Data Infrastructure for Coastal and Estuarine Science
Anne Thessen
 
Gulf of Mexico Hydrocarbon Database: Integrating Heterogeneous Data for Impro...
Gulf of Mexico Hydrocarbon Database: Integrating Heterogeneous Data for Impro...Gulf of Mexico Hydrocarbon Database: Integrating Heterogeneous Data for Impro...
Gulf of Mexico Hydrocarbon Database: Integrating Heterogeneous Data for Impro...
Anne Thessen
 
Knowledge extraction from the Encyclopedia of Life using Python NLTK
Knowledge extraction from the Encyclopedia of Life using Python NLTKKnowledge extraction from the Encyclopedia of Life using Python NLTK
Knowledge extraction from the Encyclopedia of Life using Python NLTK
Anne Thessen
 
Marrying models and data: Adventures in Modeling, Data Wrangling and Software...
Marrying models and data: Adventures in Modeling, Data Wrangling and Software...Marrying models and data: Adventures in Modeling, Data Wrangling and Software...
Marrying models and data: Adventures in Modeling, Data Wrangling and Software...
Anne Thessen
 
Visualizing Evolution
Visualizing EvolutionVisualizing Evolution
Visualizing Evolution
Anne Thessen
 
The Future of Microalgal Taxonomy
The Future of Microalgal TaxonomyThe Future of Microalgal Taxonomy
The Future of Microalgal Taxonomy
Anne Thessen
 
Knowledge Extraction and Semantic Linking in the Encyclopedia of Life
Knowledge Extraction and Semantic Linking in the Encyclopedia of LifeKnowledge Extraction and Semantic Linking in the Encyclopedia of Life
Knowledge Extraction and Semantic Linking in the Encyclopedia of Life
Anne Thessen
 

More from Anne Thessen (13)

Unifying Genomics, Phenomics, and Environments
Unifying Genomics, Phenomics, and EnvironmentsUnifying Genomics, Phenomics, and Environments
Unifying Genomics, Phenomics, and Environments
 
Combining Phenomes and Genomes to Fill Analytical Gaps: Data Management in Ph...
Combining Phenomes and Genomes to Fill Analytical Gaps: Data Management in Ph...Combining Phenomes and Genomes to Fill Analytical Gaps: Data Management in Ph...
Combining Phenomes and Genomes to Fill Analytical Gaps: Data Management in Ph...
 
Bridging discrepancies across North American butterfly naming authorities: Su...
Bridging discrepancies across North American butterfly naming authorities: Su...Bridging discrepancies across North American butterfly naming authorities: Su...
Bridging discrepancies across North American butterfly naming authorities: Su...
 
Ontological Support of Data Discovery and Synthesis in Estuarine and Coastal ...
Ontological Support of Data Discovery and Synthesis in Estuarine and Coastal ...Ontological Support of Data Discovery and Synthesis in Estuarine and Coastal ...
Ontological Support of Data Discovery and Synthesis in Estuarine and Coastal ...
 
Next-Gen Taxonomic Descriptions for Microbial Eukaryotes
Next-Gen Taxonomic Descriptions for Microbial EukaryotesNext-Gen Taxonomic Descriptions for Microbial Eukaryotes
Next-Gen Taxonomic Descriptions for Microbial Eukaryotes
 
Linking biodiversity data for ecology
Linking biodiversity data for ecologyLinking biodiversity data for ecology
Linking biodiversity data for ecology
 
Data Infrastructure for Coastal and Estuarine Science
Data Infrastructure for Coastal and Estuarine ScienceData Infrastructure for Coastal and Estuarine Science
Data Infrastructure for Coastal and Estuarine Science
 
Gulf of Mexico Hydrocarbon Database: Integrating Heterogeneous Data for Impro...
Gulf of Mexico Hydrocarbon Database: Integrating Heterogeneous Data for Impro...Gulf of Mexico Hydrocarbon Database: Integrating Heterogeneous Data for Impro...
Gulf of Mexico Hydrocarbon Database: Integrating Heterogeneous Data for Impro...
 
Knowledge extraction from the Encyclopedia of Life using Python NLTK
Knowledge extraction from the Encyclopedia of Life using Python NLTKKnowledge extraction from the Encyclopedia of Life using Python NLTK
Knowledge extraction from the Encyclopedia of Life using Python NLTK
 
Marrying models and data: Adventures in Modeling, Data Wrangling and Software...
Marrying models and data: Adventures in Modeling, Data Wrangling and Software...Marrying models and data: Adventures in Modeling, Data Wrangling and Software...
Marrying models and data: Adventures in Modeling, Data Wrangling and Software...
 
Visualizing Evolution
Visualizing EvolutionVisualizing Evolution
Visualizing Evolution
 
The Future of Microalgal Taxonomy
The Future of Microalgal TaxonomyThe Future of Microalgal Taxonomy
The Future of Microalgal Taxonomy
 
Knowledge Extraction and Semantic Linking in the Encyclopedia of Life
Knowledge Extraction and Semantic Linking in the Encyclopedia of LifeKnowledge Extraction and Semantic Linking in the Encyclopedia of Life
Knowledge Extraction and Semantic Linking in the Encyclopedia of Life
 

Recently uploaded

如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
yqqaatn0
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
fafyfskhan251kmf
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
IqrimaNabilatulhusni
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Studia Poinsotiana
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiologyBLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
NoelManyise1
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
University of Maribor
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
S.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary levelS.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary level
ronaldlakony0
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
zeex60
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
yqqaatn0
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
kejapriya1
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
muralinath2
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
University of Rennes, INSA Rennes, Inria/IRISA, CNRS
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
sonaliswain16
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
RenuJangid3
 

Recently uploaded (20)

如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
如何办理(uvic毕业证书)维多利亚大学毕业证本科学位证书原版一模一样
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 
general properties of oerganologametal.ppt
general properties of oerganologametal.pptgeneral properties of oerganologametal.ppt
general properties of oerganologametal.ppt
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
 
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiologyBLOOD AND BLOOD COMPONENT- introduction to blood physiology
BLOOD AND BLOOD COMPONENT- introduction to blood physiology
 
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
Remote Sensing and Computational, Evolutionary, Supercomputing, and Intellige...
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
S.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary levelS.1 chemistry scheme term 2 for ordinary level
S.1 chemistry scheme term 2 for ordinary level
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
bordetella pertussis.................................ppt
bordetella pertussis.................................pptbordetella pertussis.................................ppt
bordetella pertussis.................................ppt
 
platelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptxplatelets_clotting_biogenesis.clot retractionpptx
platelets_clotting_biogenesis.clot retractionpptx
 
Deep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless ReproducibilityDeep Software Variability and Frictionless Reproducibility
Deep Software Variability and Frictionless Reproducibility
 
role of pramana in research.pptx in science
role of pramana in research.pptx in sciencerole of pramana in research.pptx in science
role of pramana in research.pptx in science
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 

Predicting Phenotype from Multi-Scale Genomic and Environment Data using Neural Networks and Knowledge Graphs: An Introduction to the NSF GenoPhenoEnvo Project

  • 1. Predicting Phenotype from Multi-Scale Genomic and Environment Data using Neural Networks and Knowledge Graphs: An Introduction to the NSF GenoPhenoEnvo Project Anne E Thessen, Michael Behrisch, Emily J Cain, Remco Chang, Bryan Heidorn, Pankaj Jaiswal, David LeBauer, Ab Mosca, Monica C Munoz-Torres, Arun Ross, Tyson Swetnam
  • 2. Acknowledgements ● NSF Ideas Lab ● NSF Award 1940330 Harnessing the Data Revolution ● Translational and Integrative Sciences Lab OSU (tislab.org) ● Two new members: Ishita Debnath (MSU) and Ryan Bartelme (UA)
  • 3. Predicting Phenotype from Genes and Environments ● G + E = P only works in the simplest systems, if at all ● There’s a lot we don’t know about how genes are translated into phenotypes ● How do phenotypes affect the ecosystem?
  • 4. How can we predict phenotype given an organism’s environmental conditions and genomic endowment? ● State-of-the-art statistical modeling has led to many insights, but has been applied to very controlled systems. ● Getting the phenotype is only part of the answer. ● Can we use the predictive model to reveal hidden processes? Critical variables?
  • 5. Machine Learning for Results and Process ● Pros ○ Capable of coping with non-linearity in biological systems ○ Find hidden relationships ● Cons ○ Output is opaque ○ Not enough curated data not available ○ Data that are available not “ML ready”
  • 6. GenoPhenoEnvo Project ● Goal: Develop a machine learning framework capable of predicting phenotypes based on multi-scale data about genes and environments. ○ Leverage existing, well-structured, cross-species reference data about genes and phenotypes ○ Provide interactive data visualizations for examining and interpreting the “black-box” behavior of ML models and their results ○ Realize a new model for relating phenotypes, genetic endowment, and environmental characteristics ● Just started Oct 1 ● Now in Year 1 Q2
  • 7. Training Data Year 1 ● TERRA-REF sorghum ● Heavily controlled and measured environment data ● Thorough genotyping and phenotyping ● Knowledge graph links ● How do we prepare these data to be ML ready? Year 2 ● TERRA-REF wheat ● NEON, EOS ● Citizen science phenology
  • 8. What is a Knowledge Graph?
  • 9. How Can Knowledge Graphs Help? DOI: 10.1126/scitranslmed.3009262
  • 10. How Can Knowledge Graphs Help?
  • 11. How Can Knowledge Graphs Help? ● Constrain ML and prioritize results ● Quality control - sanity check ● Integrate heterogeneous data ○ Manage terminology ○ Manage scale and granularity ● Find new relationships ● Fill in data gaps with inferencing
  • 12. Training Data - Genomic ● Use VCF files and phenotype data from TERRA-REF ● SNP to phenotype associations with p values and effect sizes (GWAS) ● Manhattan plot for all phenotype data List 1: TERRA-REF data for Y1 Gene Information (G) ● Sorghum whole genome ● Sorghum genotypes[219] Phenotype Information (P) ● Emergence Date ● End of Season Biomass ● End of Season Height ● Flowering Date Environment Information (E) ● Soil moisture ● Air temperature ● PAR (Irradiance) ● Wind speed ● Humidity ● Precipitation and Irrigation ● Fertilizer inputs By M. Kamran Ikram et al - Ikram MK et al (2010) Four Novel Loci (19q13, 6q24, 12q24, and 5q14) Influence the Microcirculation In Vivo. PLoS Genet. 2010 Oct 28;6(10):e1001184. doi:10.1371/journal.pgen.1001184.g001, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=18056138 *example GWAS results can be combined with the knowledge graph results to reduce input variables for ML
  • 13. Training Data - Phenomic ● Days to flowering ● Growing Degree Days to flowering ● Days to flag leaf emergence ● Growing Degree Days to flag leaf emergence ● Canopy height ● End of season canopy height ● Above ground dry biomass at harvest List 1: TERRA-REF data for Y1 Gene Information (G) ● Sorghum whole genome ● Sorghum genotypes[219] Phenotype Information (P) ● Emergence Date ● End of Season Biomass ● End of Season Height ● Flowering Date Environment Information (E) ● Soil moisture ● Air temperature ● PAR (Irradiance) ● Wind speed ● Humidity ● Precipitation and Irrigation ● Fertilizer inputs Phenotypes can be represented as categories or integers (10 cm or decreased height?)
  • 14. Training Data - Environmental ● Air temperature ● Relative humidity ● Precipitation ● Wind speed and direction ● Growing degree days ● Cumulative precipitation List 1: TERRA-REF data for Y1 Gene Information (G) ● Sorghum whole genome ● Sorghum genotypes[219] Phenotype Information (P) ● Emergence Date ● End of Season Biomass ● End of Season Height ● Flowering Date Environment Information (E) ● Soil moisture ● Air temperature ● PAR (Irradiance) ● Wind speed ● Humidity ● Precipitation and Irrigation ● Fertilizer inputs Data from weather station and gantry Abstracted to daily average, min, and max
  • 15. Machine Learning - Preliminary 1. Regression Models 2. Simple Neural Networks 3. Deep Neural Networks
  • 16. Year 2 - Expanding to Ecosystems (Preliminary) Leverage data and resources from multiple NSF supported programs: ● Analyze genetic and remote sensing data from NEON ● Utilize the CyVerse Data Science Workbench ● XSEDE for ML computations List 2: Observational & EOS data for Y2 Gene Information (G) ● Cottonwood whole genome ● Cottonwood genotypes ● Sorghum whole genome ● Sorghum genotypes ● Wheat whole genome ● Wheat genotypes Environment Information (E) ● Soil moisture (e.g., SMAP) ● Precipitation (Daymet) ● Air temperature (Daymet) ● PAR (NARR) ● Soil Type (USDA) Phenotype Information (P) ● Leafing date ● Flowering date ● Breaking leaf buds (#) ● Ripe fruits (#) ● Increasing leaf size (Y/N) ● Falling leaves (Y/N) ● Colored leaves (Y/N) ● Flowers or buds (#) ● Open flowers (#) ● Pollen release (Y/N) ● Recent fruit/seed drop (Y/N) ● Fruits (#) ● NDVI (EOS)
  • 17. GenoPhenoEnvo Project Information ● Join our Google Group ● Watch our GitHub Repo github.com/genophenoenvo ● Search Twitter hashtag #GenoPhenoEnvo ● Visit the project web page ● Anne E Thessen ● annethessen@gmail.com Questions?