1. Toward Meaningful Whole-Genome
Interpretation with Open Access Tools
From the Genome Commons
BioIT World Expo
2010-04-22
Reece Hart, Ph.D.
Chief Scientist, Genome Commons
QB3 / Center for Computational Biology
UC Berkeley
reece@berkeley.edu
2. What did we learn from their genomes?
Not much.
3. Can we agree to disagree? Probably not.
Heart Attack Risk Prediction
from Experimental Man, DE Duncan
Gene            Marker      Risk Allele  Genotype  Risk  Company
CELSR2/PSRC1    rs599839    G            AG        0.86  deCODEme
CDKN2A/CDKN2B?  rs10116277  T            GT        1     deCODEme
CDKN2A/CDKN2B?  rs1333049   C            CC        1.72  Navigenics
MTHFD1L         rs6922269   A            AA        1.53  Navigenics
CDKN2A/CDKN2B?  rs2383207   G            GG        1.22  23andMe
5. There's lots of good news, too.
➢ Disease diagnosis & prognosis
➢ Drug dosing and side effects
➢ Disease variant/gene identification
➢ Technological advances
6. The Genome Commons seeks to build
open access, open source tools that
maximize the predictive, preventative,
and personalized value of genomic data.
● Technical – organize data and streamline
tools
● Scientific – improve predictive accuracy
● Clinical – engage clinicians and counselors
● ELSI – address ineluctable ethical, legal, and
social dilemmas
8. Database isolation impedes effective use.
Data are studied, compiled, and stored gene-wise.
That makes sense for collection, but not for genome-wide use.
OMIM
GeneTests/GeneReviews
935 genes
LSDBs: 1177 Locus-Specific Databases
Source: http://www.hgvs.org/dblist/glsdb.html on Oct 15. Some genes have multiple LSDBs.
NHGRI GWAS
PharmGKB
Literature
dbSNP
9. GCdb will be a repository of variants and traits.
[Diagram labels: OMIM, dbSNP, LSDBs, ⋮, GeneTests, PharmGKB; Genome Commons Database (variants, phenotypes); GO, ICD-10, UMLS]
Automated bulk loading of structured data
➢ Genotypes in standard coordinates
➢ Phenotype ontologies
➢ Associations with likelihood, confidence, evidence, and severity
Curated, high-quality, and traceable association data
➢ Up-to-date
➢ Quality-controlled
➢ Open access
➢ Based on Unison
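The record shape implied by this slide (a genotype in standard coordinates, a phenotype ontology term, and an association carrying likelihood, confidence, evidence, and severity) might be sketched as below. This is a minimal illustration only: the class names, field names, and the `traceable` helper are hypothetical and do not describe the actual GCdb or Unison schema, and all values are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Variant:
    """A variant in standard genomic coordinates (hypothetical field names)."""
    assembly: str  # name of the reference assembly
    chrom: str
    pos: int       # 1-based position on the chromosome
    ref: str       # reference allele
    alt: str       # alternate allele


@dataclass
class Association:
    """A variant-to-phenotype link with the four attributes named on the slide."""
    variant: Variant
    phenotype_id: str   # term from a phenotype ontology
    likelihood: float
    confidence: float
    evidence: List[str] = field(default_factory=list)  # traceable sources
    severity: str = "unknown"


def traceable(assocs: List[Association], min_confidence: float) -> List[Association]:
    """Keep associations that meet a confidence floor and cite at least one source."""
    return [a for a in assocs if a.confidence >= min_confidence and a.evidence]


# Placeholder data, not real variants or phenotypes.
v = Variant("example-assembly", "chr1", 12345, "A", "G")
assocs = [
    Association(v, "EXAMPLE:0001", 0.8, 0.9, ["source-1"], "moderate"),
    Association(v, "EXAMPLE:0002", 0.5, 0.95, []),  # no evidence, so not traceable
]
kept_ids = [a.phenotype_id for a in traceable(assocs, 0.7)]
print(kept_ids)
```

Keeping evidence as an explicit list is one way to make "traceable" checkable: an association with no cited source can be filtered out mechanically.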
11. The Navigator will integrate data and tools.
Genome Commons Navigator
Inputs:
➢ Genotypes (e.g., by hybridization)
➢ Whole Genome/Exome Sequences
Components:
➢ Imputer: Infer variants in LD with typed markers
➢ Remapper: Align variants to specified genome
➢ Annotator: Identify variants with known phenotypic impact
➢ Assembler/Aligner and Variant Caller: Assemble genome sequence and call variants (separately or jointly)
➢ Variants: Phased, aligned variants, from genotyping, imputation, or sequencing
➢ Impact Predictor: Infer effect of unclassified genetic variants
➢ Variant Annotation Integrator: Integrate and reconcile all classified variants into a comprehensive report
➢ Facile user interfaces for basic research, clinical application, drug development, epidemiology, and other uses
Resources:
➢ External Data and Tools
➢ Genome Commons Database
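One way to picture how the Navigator components chain together is a list of stages applied in order to a set of variants. The sketch below is an assumption-laden illustration, not the Navigator's design: the function names, the dictionary-based variant records, and the stub predictor are all hypothetical, and only the Annotator, Impact Predictor, and Integrator steps are modeled.

```python
from typing import Any, Callable, Dict, List, Tuple

Variant = Dict[str, Any]
Stage = Callable[[List[Variant]], List[Variant]]


def annotator(known_impacts: Dict[str, str]) -> Stage:
    """Annotator: attach a known phenotypic impact where one exists, else None."""
    def stage(variants: List[Variant]) -> List[Variant]:
        return [
            {**v, "impact": known_impacts.get(v["id"]), "predicted": False}
            for v in variants
        ]
    return stage


def impact_predictor(predict: Callable[[Variant], str]) -> Stage:
    """Impact Predictor: infer an effect only for variants still unclassified."""
    def stage(variants: List[Variant]) -> List[Variant]:
        out = []
        for v in variants:
            if v.get("impact") is None:
                v = {**v, "impact": predict(v), "predicted": True}
            out.append(v)
        return out
    return stage


def integrate(variants: List[Variant]) -> Dict[str, Tuple[str, bool]]:
    """Integrator: reconcile all classified variants into one report."""
    return {v["id"]: (v["impact"], v.get("predicted", False)) for v in variants}


def run(variants: List[Variant], stages: List[Stage]) -> List[Variant]:
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        variants = stage(variants)
    return variants


# Placeholder identifiers and impacts, not real annotations.
variants = [{"id": "var-1"}, {"id": "var-2"}]
stages = [
    annotator({"var-1": "known impact"}),
    impact_predictor(lambda v: "predicted impact"),  # stub predictor
]
report = integrate(run(variants, stages))
print(report)
```

Flagging predicted impacts separately from known ones keeps the final report honest about which classifications come from curated data and which are inferred.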
13. CAGI – Critical Assessment of Genome Interpretation
A community assessment of the state-of-the-art in phenotype prediction.
➢ Follow the successful CASP framework
● Solicit unpublished data
● Collect blind predictions from participants
● Assess against revealed annotations,
mechanisms, and phenotypes
➢ Prediction Domains:
● Molecular phenotype
● Cellular phenotype
● Organismal phenotype
With John Moult & Steven Brenner
14. MTHFR and Methylation
[Figure labels: exogenous folate, fol3, met13]
5,10-Methylene tetrahydrofolate (TH4) is required for the synthesis of nucleic acids, while 5-methyl TH4
is required for the formation of methionine from homocysteine. Methionine, in the form of
S-adenosylmethionine, is required for many biological methylation reactions, including DNA methylation.
Methylene TH4 reductase is a flavin-dependent enzyme required to catalyze the reduction of
5,10-methylene TH4 to 5-methyl TH4.
Linus Pauling Institute
http://lpi.oregonstate.edu
15. Sequencing 18 Genes of Folate Pathway
Guthrie-Spot Sequencing Protocol
➢ 250 NTD children and 250 case-matched
controls
➢ Protocol
● 2 mm punch
● Isolate genomic DNA
● Amplification
● Purification
● Sequencing by JGI
➢ Variant calls of 238 exons in 18 genes
● Analysis
● Curation
● QC
Jasper Rine
17. Step 1: Collect predictions.
mutation Team 1 Team 2
M110I No Effect Remediable
R134C Impaired Remediable
D223N Remediable
R519C No Effect No Effect
18. Step 2: Assess predictions.
mutation Team 1 Team 2 Experiment
M110I No Effect Remediable Remediable
R134C Impaired Remediable Impaired
D223N Remediable Remediable
R519C No Effect No Effect No Effect
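The assessment step can be sketched as a comparison of each team's blind predictions with the experimental result. The snippet below is only an illustration: counting exact matches is an assumed scoring rule, not necessarily the one CAGI uses, and the values are as read from the table above. D223N is left out because the slide text does not show which team made the single prediction for it.

```python
from typing import Dict, Tuple

# Blind predictions per team, as read from the slide's table.
predictions: Dict[str, Dict[str, str]] = {
    "Team 1": {"M110I": "No Effect", "R134C": "Impaired", "R519C": "No Effect"},
    "Team 2": {"M110I": "Remediable", "R134C": "Remediable", "R519C": "No Effect"},
}

# Experimental results revealed after predictions were collected.
experiment: Dict[str, str] = {
    "M110I": "Remediable",
    "R134C": "Impaired",
    "R519C": "No Effect",
}


def score(pred: Dict[str, str], truth: Dict[str, str]) -> Tuple[int, int]:
    """Return (number correct, number scored), skipping mutations with no prediction."""
    scored = [m for m in truth if m in pred]
    correct = sum(pred[m] == truth[m] for m in scored)
    return correct, len(scored)


results = {team: score(pred, experiment) for team, pred in predictions.items()}
for team, (correct, total) in results.items():
    print(f"{team}: {correct}/{total} correct")
```

Skipping mutations without a prediction, rather than counting them as wrong, is a design choice; a real assessment would need to state how abstentions are handled.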
19. Step 3: Celebrate and learn.
It's not whether you win or lose...
mutation Team 1 Team 2 Experiment
M110I No Effect Remediable Remediable
R134C Impaired Remediable Impaired
D223N Remediable Remediable
R519C No Effect No Effect No Effect
23. A few ineluctable ethical issues.
➢ How to fairly acknowledge aggregated
data?
➢ Should scientifically suggestive results be
used for clinical care?
➢ What is the balance between openness and
preventing misinterpretation?
➢ What happens to confidentiality
agreements during bankruptcy?
➢ How do we balance personal privacy with
opportunities for public health advances?
Bernard Lo