Building a clinical genome interpretation services company

Talk given to the Berkeley sequencing supergroup in January 2012.

Transcript

  • 1. Building a clinical genome interpretation services company. Reece Hart, Ph.D., reece@locusdev.net. Locus Development Inc., http://locusdevelopmentinc.com/ Reece Hart — Locus Development 1/28
  • 2. Opportunity
  • 3. Clinical Genome Interpretation. Patient presents with symptoms. If genomic interpretation might influence diagnosis or treatment, the doctor refers the patient to a genetic counselor (GC). The GC takes a history; the sample is sent to an internal lab or to one of hundreds of labs that provide specific genomic tests. Sequencing and other lab data are processed into a preliminary interpretation. The report is returned to the GC and/or physician, who verify the interpretation and consult with the patient. Photos: Baylor College of Medicine, Univ. Utah, learningradiology.com, sciencephotos.com
  • 4. 100s of laboratory diagnostic testing labs
  • 5. Common variants are hard to interpret
  • 6. Some variants are informative
  • 7. The Significance of “Variants of Uncertain Significance”. “VUS – Variant of uncertain significance. A variation in a genetic sequence whose association with disease risk is unknown. Also called variant of uncertain significance, variant of unknown significance, and unclassified variant.” http://www.cancer.gov/cancertopics/genetics-terms-alphalist
  • 8. The long tail of rare diseases. “A rare disease typically affects a patient population estimated at fewer than 200,000 in the U.S. There are more than 6,000 rare diseases known today and they affect an estimated 25 million persons in the U.S.” NIH Office of Rare Diseases Research http://rarediseases.info.nih.gov/
  • 9. The Problems to Solve
      ➢ Develop a reliable database of genotypes and phenotypes.
      ➢ Develop methods to interpret all types of variants, not just common SNVs.
      ➢ Provide meaningful, reliable interpretations based on genomic data.
      ➢ Do it better than everyone else.
  • 10. Plan
  • 11. Company Overview. Genomic Sequence and Variants → Locus → Clinical Interpretation
  • 12. Curating Genotypes, Phenotypes, and Risk. Genotype-Phenotype Database: Genotypes/Variants; Phenotypes/Conditions; Risk Models. Sources shown: dbSNP, GO, LSDBs, OMIM, PharmGKB, ICD-9/10, …
  • 13. Locus Overview: hospitals/clinics, physicians, insurers; workflow and tracking; variants/sequences, condition attributes, predictions, interpretation
  • 14. Implementation
  • 15. Curation Content
      ➢ Many sources
        ● automated and manual tools
        ● databases and literature
      ➢ Most kinds of variants
        ● SNV, del, ins, delins, repeat, conv, CNV, haplotypes
      ➢ Many kinds of conditions
        ● inherited, spontaneous, dominant, recessive, X-linked, preventative, cancer, metabolic, pharmacogenomic, cardio
      ➢ Examples:
        ● Cystic Fibrosis (w/modifiers)
        ● CMT (~21 subclasses)
        ● Long and Short QT
        ● TPMT, warfarin, CYP2D6
  • 16. The pipeline
      Diagram labels: LIMS; reqn and sample info; reads (fastq); variant calling; calls (vcf); condn selection; var. attributes (xml); curation; risk models; interpretation; report (xml); execution framework.
      [Overlaid file excerpts, interleaved in extraction: Makefile, reads.fastq, req.xml, sample.xml, variants_and_refagree.vcf (VCFv4.1, reference human_g1k_v37), attr.xml, and the locus-report XML. The VCF records carry INFO strings such as AC=0;AF=0.00;AN=2;DP=137;MQ=213.16;MQ0=0.]
      Makefile rules recoverable from the slide:
        variants_and_refagree.vcf filtered_on_callable.vcf: lake.mk
            (set -e; source /locus/opt/lake/bin/lakeSetupEnv; $(MAKE) -f $< $@; ) 2>$@.err
        calls.vcf: variants_and_refagree.vcf #filtered_on_callable.vcf
            ln -s $< $@
        ${ATTR_FILE} attr.xml: calls.vcf req.xml sample.xml
            generate_attributes_file.py …
        report.xml: attr.xml req.xml
            sampleconditionreport $^ -o $@
        report.html: report.xml
            reportrenderer $< -o $@
  • 17. The pipeline in action
      $ ls reads.fastq.gz req.xml sample.* Makefile
      Makefile reads.fastq.gz req.xml sample.info sample.xml
      $ time make report.html report.pdf
      gzip -cdq <reads.fastq.gz >reads.fastq
      lake --recipe reads_to_variants >lake.mk
      ln -s variants_and_refagree.vcf calls.vcf
      generate_attributes_file.py ...
      sampleconditionreport attr.xml req.xml -o report.xml
      reportrenderer report.xml -o report.html
      wkhtmltopdf report.html report.pdf
      real 7m14.804s
      user 7m16.490s
      sys 2m0.150s
  • 18. Locus Interpretation
  • 19. The big lesson… Transcripts are much messier than expected.
  • 20. Problem statement. There is no single source of transcripts that are all of: stable (archived), mapped, in agreement with the reference genome, and assigned RefSeq accessions.
      ➢ Issues:
        ● Poor access / programmability
        ● No archived mappings
        ● RefSeq != reference genome due to origin, ambiguity, error
        ● Patches are difficult to use
  • 21. When RefSeq != Genome Reference
      Genomic coordinates: NC_000006.11:g.31030103C>T, NC_000006.11:g.31038124T>G
      Transcript coordinates: NM_0123.4:c.45C>T (variant published relative to RefSeq), NM_0123.4:c.832T>G (discovered variant reported relative to RefSeq)
      RefSeq/genome differences: mismatch; ins/del A, which shifts downstream coordinates
  • 22. 17.8% of RefSeq transcripts differ from GRCh37; 5.4% have coordinate-changing differences. Garla, V., Kong, Y., Szpakowski, S., & Krauthammer, M. (2011). MU2A--reconciling the genome and transcriptome to determine the effects of base substitutions. Bioinformatics (Oxford, England), 27(3), 416-8. doi:10.1093/bioinformatics/btq658
  • 23. Sources of transcript information
      ➢ NCBI:
        ● maps current transcripts to current genome only
        ● maps with splign
        ● doesn't agree with ref genome ~18% of the time
        ● no local database option
      ➢ UCSC:
        ● current transcripts only
        ● maps using blat
      ➢ Ensembl:
        ● aligns using in-house gene building process
        ● cross-linked to RefSeqs
        ● incorporates NCBI transcripts ad hoc
        ● well-maintained; good API; broad data; VEP
  • 24. PTEN: insertion/deletion in 5' UTR
  • 25. NEFL: genome insertion leads to frameshift/stop
  • 26. RefSeq Handling
      ---------- Forwarded message ----------
      Date: Wed, Jan 25, 2012 at 1:59 PM
      Subject: [Genome] How does UCSC hg19 gene model add exons to RefSeqs?
      To: genome@soe.ucsc.edu
      Hi, when using the human reference hg19 gene model … where the hg19 model has an exon that does not exist in the RefSeq accession (or any historical version of the RefSeq accession). How/why does the alignment introduce an intron in this case? Does it ensure there are plausible flanking splice junctions before inserting an intron to a RefSeq sequence that lacks it but it maps to?
  • 27. 338 genes so far
      ➢ We should encourage LRG and adopt it when ready (and we'll still have to deal with legacy transcripts)
  • 28. Not pictured: Jon Sorenson
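
Sketch for slide 16: the VCF excerpt there carries INFO strings such as AC=0;AF=0.00;AN=2;DP=137;MQ=213.16;MQ0=0. The following is a minimal illustration of splitting such a string into typed values, not the Locus pipeline's own code. The names parse_info and _convert are hypothetical. Keys without "=" are treated as flags, and comma-separated multi-value entries are left as strings.

```python
def _convert(value):
    """Return value as int or float when it parses as one, else unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_info(info):
    """Split a VCF INFO string (key=value pairs joined by ';') into a dict."""
    fields = {}
    if info in ("", "."):  # "." marks an empty INFO column
        return fields
    for item in info.split(";"):
        if "=" not in item:
            fields[item] = True  # flag-style key with no value
            continue
        key, value = item.split("=", 1)
        fields[key] = _convert(value)
    return fields


if __name__ == "__main__":
    # INFO string copied from the slide's VCF excerpt
    info = parse_info("AC=0;AF=0.00;AN=2;DP=137;MQ=213.16;MQ0=0")
    print(info["DP"], info["MQ"])
```

A real pipeline would more likely rely on an established VCF library and the header's ##INFO type declarations rather than guessing types from the values.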

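Sketch for slide 21: when a RefSeq transcript and the reference genome differ by an insertion or deletion, every transcript coordinate downstream of that difference maps to a shifted genomic position. The following illustrates the effect under strong simplifying assumptions: a single-exon, forward-strand transcript, positions counted from the first transcript base, and a known list of length differences. The function name, the (position, delta) encoding, and all numbers are hypothetical and are not taken from the slide.

```python
from typing import List, Optional, Tuple

# Each difference is (transcript_pos, delta): transcript_pos is the last
# 1-based transcript base before the gap, and delta is the genome length
# minus the transcript length at that gap.
#   delta > 0: the genome has extra bases that the transcript lacks
#   delta < 0: the transcript has extra bases that the genome lacks
Indel = Tuple[int, int]


def transcript_to_genome(t_pos: int, genome_start: int,
                         indels: List[Indel]) -> Optional[int]:
    """Map a 1-based transcript position to a genomic position.

    genome_start is the genomic position of transcript base 1.
    Returns None for bases that exist only in the transcript.
    """
    offset = 0
    for pos, delta in sorted(indels):
        if pos >= t_pos:
            break  # this gap lies at or after the queried base
        if delta < 0 and t_pos <= pos - delta:
            return None  # queried base falls inside a transcript-only run
        offset += delta
    return genome_start + (t_pos - 1) + offset


if __name__ == "__main__":
    # Hypothetical transcript starting at genomic position 1000, with one
    # extra genomic base after transcript position 100.
    diffs = [(100, 1)]
    print(transcript_to_genome(45, 1000, diffs))   # upstream of the gap
    print(transcript_to_genome(832, 1000, diffs))  # downstream, shifted by 1
```

Positions upstream of the difference map identically with or without the indel list, while positions downstream are off by the accumulated delta. That is why a variant reported in transcript coordinates can land on the wrong genomic base if the alignment differences are ignored.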