Abstract
The challenge of precision medicine is to model complex interactions among DNA variants, phenotypes, development, environments, and treatments. Researchers who use animal models must address the same genetic and environmental complexity that is typical of human populations. We have developed large families of mice and rats that can be used as uniquely powerful models for experimental versions of precision medicine. For example, the BXD family of mice segregates for 6 million common DNA variants, a level that exceeds that of many human populations. Because each member is an isogenic strain, the entire family can be replicated in many environments and given many treatments. Heritable traits can be mapped with high power and precision. The current BXD phenome is unsurpassed in coverage and includes deep omics data and thousands of quantitative traits, including a great deal of data relevant to metabolism, obesity, aging, kidney function, and insulin levels. These new Experimental Precision Medicine resources can be expanded to as many as 20,000 isogenic but non-inbred F1 progeny and used as a far more effective platform for testing causal models and for predictive validation: unique core resources for the fields of prevention and therapeutics.
The top 3 key questions that GeneNetwork can answer:
1. What is the relation between insulin levels and diet, age, body weight, and lifespan in mice? (BXD family; see Nature Metabolism paper by Roy et al., Sept-Oct 2021)
2. How can researchers using single strains of mice broaden their findings to improve translational relevance to human health and to diabetes prevention and treatment?
3. How in the world can a molecular or cell biologist master complex statistical genetic methods to test causal (aka mechanistic) linkages between DNA variants and disease risk?
Presenter: Robert W. Williams, Ph.D., Chair, Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center; UT-ORNL Governor's Chair in Computational Genomics
Upcoming webinars schedule: https://dknet.org/about/webinar
2. 2
Dial-in Information:
Date/Time: Friday, October 8, 2021, 11 am - 12 pm PDT
https://uchealth.zoom.us/meeting/register/tZ0scOChrz8qGdOX_p4fh3XsaFC3QSBb53zH
3. 3
Experimental Precision Medicine: Smart FAIR+ Data for Metabolomics and Diabetes Research
Robert W. Williams
University of Tennessee Health Science Center, rwilliams@uthsc.edu
With thanks to Ivan Gerling, PhD, who has been my mentor in anything and all things related to diabetes. And to nPOD for data.
4. Three simple themes
What is the lifespan of your data, your code, and your collaborations?
We as a community have developed large families of fully sequenced and fully extendable isogenic lines of mice and rats:
The BXDs: n = 120 x 120 = 14,400 F1s and parents
The Collaborative Cross: n = 60 x 60 = 3,600
The rat hybrid diversity panel: n = 96 x 96 = 9,216
BXD and Collaborative Cross combined (120 + 60 strains): 180 x 180 = 32,400 isogenic, fully sequenced mice!
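The family sizes above follow from crossing every strain with every other strain, which gives n x n isogenic genometypes. A minimal sketch of that arithmetic, assuming reciprocal F1s (A x B and B x A) are counted separately; the function name `diallel_counts` is illustrative, and the strain counts are the ones given on the slide:

```python
def diallel_counts(n_strains: int) -> dict[str, int]:
    """Count genometypes in a full n x n diallel cross.

    Assumption: reciprocal F1s (A x B and B x A) are counted separately,
    which is what makes the total equal n * n.
    """
    parents = n_strains                        # diagonal of the cross matrix
    f1_hybrids = n_strains * (n_strains - 1)   # off-diagonal cells
    return {
        "parents": parents,
        "f1_hybrids": f1_hybrids,
        "total": parents + f1_hybrids,
    }


# Strain counts as given on the slide.
panels = {
    "BXD": 120,
    "Collaborative Cross": 60,
    "Rat hybrid diversity panel": 96,
}

for name, n in panels.items():
    counts = diallel_counts(n)
    print(f"{name}: {counts['total']:,} genometypes "
          f"({counts['parents']} parents + {counts['f1_hybrids']:,} F1s)")

# Pooling both mouse panels: 120 + 60 = 180 strains.
mouse_strains = panels["BXD"] + panels["Collaborative Cross"]
mouse_total = diallel_counts(mouse_strains)["total"]
print(f"Mouse panels combined: {mouse_total:,}")
```

Because the total grows with the square of the number of strains, pooling the two mouse panels yields far more genometypes than the two panels do separately.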
These three Reference Families mirror the genetic complexity of humans (6.0–45.0 million variants at MAFs > 0.1). We are using these families to generate multiplicatively useful and fully FAIR+ quantitative electronic health record (EHR) data, and to build and test causal models of disease processes.
In my case the focus has been on diseases of aging, environmental exposures, metabolic disease, and addiction. But any disease or process or GXE is FAIR game. And this mixes very well with IMPC KO resources!
THEME 1: The tension in the fields of genetics and genomics between
1. reductionist approaches (often reverse genetics)
2. more integrative approaches (often forward genetics, complex traits, families, GWAS)
Synthesis is critical for our near-term survival as mouse geneticists and molecular biologists!
THEME 2: Human disease is almost entirely due to complex interactions among genetic and environmental factors (GXE)
1. But GWAS only provides POPULATION-level estimates of genetic effects
2. There is no Big E in the GWAS equation
3. Polygenic risk scores are a pseudo-solution and are not suitable for individual X in environment Y
THEME 3: The solution for both challenges: We need to acquire the right types of "smart" data: coherent and multiplicative data. This is essential to make more accurate predictions about risk and outcome for n = 1 humans, a daunting data and computational task.
6. Human disease is all GxE—not just G, not just E
A useful figure, a good start, but misses more than half the model — We need to add E, GxE, and GxG (epistasis)
7. Polygenic risk scores are population averages
Also a useful start, but dangerous if a single PERSON takes the PRS as a serious prognosis — it is NOT
Your risk of coronary artery disease (CAD) is NOT your PRS. It is PRS, GxE, and even GxG
[Figure: environmental risk score for CAD, with Individual A marked]
DOI: 10.1016/j.hlc.2019.12.004
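To make the point concrete, here is a toy sketch in which two individuals share the same polygenic score but live in different environments. Every coefficient and the simple additive-plus-interaction form are hypothetical, chosen only for illustration; they are not estimates from the cited paper or from any data set, and GxG terms are left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRiskModel:
    """Toy risk model: baseline + G + E + GxE.

    All coefficients are hypothetical placeholders, not fitted values.
    """
    baseline: float = 0.10
    beta_g: float = 0.05     # effect per unit of polygenic score (G)
    beta_e: float = 0.04     # effect per unit of environmental exposure (E)
    beta_gxe: float = 0.06   # interaction: G matters more under exposure

    def risk(self, g: float, e: float) -> float:
        return (self.baseline
                + self.beta_g * g
                + self.beta_e * e
                + self.beta_gxe * g * e)


model = ToyRiskModel()
same_prs = 1.0  # both individuals carry the same polygenic score

risk_low_exposure = model.risk(same_prs, e=0.0)
risk_high_exposure = model.risk(same_prs, e=1.0)

print(f"Same PRS, low exposure:  {risk_low_exposure:.2f}")
print(f"Same PRS, high exposure: {risk_high_exposure:.2f}")
```

In this toy setting the polygenic score alone cannot distinguish the two individuals; the difference in risk comes entirely from the E and GxE terms.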
8. What about animal models for common disease?
N = 1
95% or more of rodent biomedical research relies on a single genometype of mouse or rat, usually C57BL/6 (B6) or Sprague Dawley
N = 8 billion
Again: a good start but misses half the picture—We need to add the G and the E
9. DNA variants, SNPs, CNVs, indels
Molecular traits: omics tiers
Synapses, cells, and circuits
Brain regions, systems, behaviors
Environment, Disease Risk, Sex, Aging
Data integration across tiers
10. The solution 1 — large replicable murine “randomized clinical trials”
22Sep2021
Data get better with age: Ben Taylor et al 1973—I can now clone it in 10 min
www.genenetwork.org/show_trait?trait_id=13035&dataset=BXDPublish
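Trait pages such as the one above share a simple URL pattern: a `show_trait` path with `trait_id` and `dataset` query parameters. A small helper for building such links, assuming only the pattern visible in the URLs in this talk; the function name `trait_url` is hypothetical, and nothing here relies on a GeneNetwork API:

```python
from urllib.parse import urlencode

# Base path taken from the trait links shown in this talk.
BASE = "https://genenetwork.org/show_trait"


def trait_url(trait_id: int, dataset: str = "BXDPublish") -> str:
    """Build a link to a GeneNetwork trait page from its ID and data set name."""
    query = urlencode({"trait_id": trait_id, "dataset": dataset})
    return f"{BASE}?{query}"


# Trait IDs that appear in this talk.
for tid in (13035, 18435, 18524):
    print(trait_url(tid))
```

Keeping the trait ID and data set name as explicit parameters makes it easy to store links alongside analysis code, so that a record can be found again years later.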
11. The solution 2: smart polyphenomes that build with age
Data should be FAIR (FINDABLE, ACCESSIBLE, INTEROPERABLE, REUSABLE) but that is not nearly enough
Data should also be exponentially useful. Data should be paired with open analysis code; dynamic workbooks: Pluto.jl
Data generation should be compatible with causal modeling and even more: future AI data mining and integration
Of course, great metadata! Of course, QAed and QCed data that have been wrangled into a database
Most extramural (and intramural) data are evaporative, with a short lifespan of 10 years
There is the huge lost-opportunity cost of R01 independence!
FAIR and multiplicative data are still rare: Framingham Heart Study, GTEx, PDB, Allen Brain Atlas, IMPC, GeneNetwork... And very few data sets are both FAIR and structurally coherent and quadratically useful. Even fewer data sets come integrated with code and services for their use. Even fewer get better with age.
12. What does a polyphenome look like?
X = millions of rs
Main attribute: a single new data set or trait can be integrated with thousands of companion columns of data; that is “coherent” data. You have to be willing to use the same set of genomes. Here is one example:
Longevity data for ~80 members of the BXD mouse family (females only)
https://genenetwork.org/show_trait?trait_id=18435&dataset=BXDPublish
[Figure: lifespan plot; labeled values 646 ± 22 SE days and 699 ± 23 days; parental strains D2 and B6 marked]
Why “smart”? Because it is smart to use replicable families and cohorts to generate extensible data and phenomes
13. High N > Locus > Gene > Mechanism > Treatment
Genetic reference families: Stable (immortal) families of experimental subjects (genometypes) used to generate coherent and multiplicatively useful data to study complex and causal relations. For the HRDP and BXDs we have decades of coherent data. All sequenced at 40X or more. Dense polyphenome.
GeneNetwork.org: An open data analysis resource for systems genetics, also known as experimental precision health care. GTEx for model organisms (and humans too) with EHR.
A live interlude of GN
Systems genetics: An integrative, quantitative and predictive approach to study genomes and genes, molecules, cells, mechanisms, environments, and diseases (aka experimental precision health care).
Ashbrook DG 2021
14. Live 1: FAIR-compliant quantitative EHR-like data with open source code
Let’s work with some smart data in GeneNetwork.org (one of the first web services in biomedical research, started Jan 1994, PMID 8043953)
www.genenetwork.org/show_trait?trait_id=18524&dataset=BXDPublish#redirect
16. 16
And a 4th key question that GeneNetwork can answer:
4. What gene variants or loci control insulin expression levels in normal humans?
FAIR+ data:
https://genenetwork.org/search?species=human&group=GTEx_v5&type=Pancreas+mRNA&dataset=GTEXv5_Panc_0915&search_terms_or=insulin&search_terms_and=&FormID=searchResult
17. Strategies: The next decades: Where this should go
Generate far more coherent and multiplicative data (proteomics data please!)
The price we pay for data evaporation and obsolescence: inestimable, inexcusable
Upon this gifted age, in its dark hour,
Rains from the sky a meteoric shower
Of facts...they lie unquestioned, uncombined.
Wisdom enough to leech us of our ill
Is daily spun; but there exists no loom
To weave it into fabric;
—Edna St. Vincent Millay (Huntsman, What Quarry? 1939)
As much as we need to preserve good data, we need new approaches to data generation based on statistical principles, causal models, and future AI needs. Thousands of F1 crosses (a so-called diallel cross) may be the right path forward. Scientific leadership required!
Data must stay alive and thrive!
18. Thanks to
Lu Lu
Pjotr Prins
Johan Auwerx
David G. Ashbrook
Saunak Sen
Megan K. Mulligan
Erik Garrison
Byron Jones
Khyobeni Mozhui
Evan G. Williams
Eldon Geisert
Elissa Chesler
Monica Jablonski
Hao Chen
Danny Arends
G. Allan Johnson
Catherine Kaczorowski
Karl Broman
Darryl Quarles
Siamak Yousefi
Suheeta Roy
Jesse Ingels
Casey Bohl
Melinda McCarty
Daniel Ciobanu
Jeremy Peirce
Xusheng Wang
Ashutosh Pandey
Benjamin A. Taylor (TJL)
Cat Lutz (TJL)
Pjotr Prins
Arthur Centeno
Zachary Sloan
Bonface Munyoki
Lei Yan
Christian Fischer
Alex Williams
Evan Williams
Fred Manglis
Alexander Kabui
UT-ORNL Governor’s Chair and the UT Center for Integrative & Translational Genomics
NIGMS, NIDA, NIA, NIAAA, NIMH, NCI, NEI, NSF