VAAST: Deciphering Genetic Disease with Next-Generation Sequencing
 

  • I’m going to begin the discussion of VAAST with a simple description of how the pipeline runs
  • Numerator = Null Model (No Difference); Denominator = Alternate Model (Difference)
  • The maximum likelihood of the null model over the maximum likelihood of the alternate model, weighted by the frequency of the AAS in the healthy dataset over the frequency of that AAS in a disease dataset. n = frequency of that AAS in the background; p = estimated probability of...; B=; T=; Y=; X=; a = frequency of this AAS in OMIM
  • Miller Syndrome
  • A-G Transition: Serine to Proline
  • Thank the family

VAAST: Deciphering Genetic Disease with Next-Generation Sequencing Presentation Transcript

  • VAAST: Deciphering Genetic Disease with Next-Generation Sequencing. Barry Moore, M.S., Research Scientist, Department of Human Genetics, Department of Biomedical Informatics
  • Outline
      • The VAAST Analysis Pipeline
      • Ogden Syndrome: Application of VAAST to a Genetic Disease of Unknown Cause
      • The Future of VAAST Development
  • $10,000,000: Venter Genome; $1,000,000: Watson; $5,000: You?
  • Next Generation Sequencing [Figure: Disease and Healthy genomes compared across geneA, geneB, geneX, geneY, geneZ]
  • Variant Annotation Tool (VAT); Variant Selection Tool (VST); Variant Annotation, Analysis and Search Tool (VAAST)
  • VAAST Pipeline: 3.5 million variants (GVF), the reference genome (Fasta), and the reference genes (GFF3) go into VAT (Variant Annotation Tool), which produces annotated variants (GVF). VST (Variant Selection Tool) then merges the annotated variant sets into a CDR file.
  • VAAST Pipeline: terms VAT uses to annotate each of the 3.5 million variants (GVF)
      • Variant Type: sequence_alteration, deletion, insertion, duplication, inversion, substitution, SNV, MNP, complex substitution, translocation
      • Variant Effect: sequence_variant, gene_variant, five_prime_UTR_variant, three_prime_UTR_variant, exon_variant, splice_region_variant, splice_donor_variant, splice_acceptor_variant, intron_variant, coding_sequence_variant, stop_retained, stop_lost, stop_gained, synonymous_codon, non_synonymous_codon, amino_acid_substitution, frameshift_variant, inframe_variant
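GVF builds on the nine-column, tab-separated GFF3 layout, with the variant details carried in the attribute column. As a rough illustration of how an annotated record could be read, here is a minimal Python sketch. The function names and the example line are made up for illustration, and this is not the parser VAT or VST actually uses.

```python
def parse_gvf_line(line):
    """Split one GVF feature line into its nine GFF3-style columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns, got %d" % len(cols))
    seqid, source, so_type, start, end, score, strand, phase, attr_text = cols

    # Attributes are semicolon-separated key=value pairs; values may be comma lists.
    attributes = {}
    for pair in attr_text.strip().split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        attributes[key.strip()] = [v.strip() for v in value.split(",")]

    return {
        "seqid": seqid,
        "source": source,
        "type": so_type,  # a variant type term such as SNV
        "start": int(start),
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def effect_terms(record):
    """Return the leading effect term of each Variant_effect value."""
    effects = record["attributes"].get("Variant_effect", [])
    return [e.split()[0] for e in effects if e]


# Hypothetical record, invented for this example (not taken from the talk).
EXAMPLE = (
    "chr16\texample_caller\tSNV\t12345\t12345\t.\t+\t.\t"
    "ID=var_1;Variant_seq=A,G;Reference_seq=G;"
    "Variant_effect=non_synonymous_codon 0 mRNA tx_1;"
)

record = parse_gvf_line(EXAMPLE)
```

Calling `effect_terms(record)` on the example pulls out the effect term that VAT would have attached to the variant.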
  • Background genomes (CDR) and target genomes (CDR) go into VAAST, which outputs prioritized candidate genes in a VAAST report.
  • Key Features of VAAST
      • Probabilistic
      • Feature Based
      • Both Allele and AAS Frequencies
      • Considers Inheritance Model
      • Fast
      • Standardized Ontology Based Format
      • Modular and Flexible in Design
  • VAAST Uses Variant Frequencies in a Probabilistic Fashion. Likelihood Ratio Test: Maximum Likelihood of the Null Model (No Difference) over Maximum Likelihood of the Alternate Model (There is a Difference)
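To make the ratio concrete, here is a minimal Python sketch of an allele-frequency likelihood ratio test for one gene. It assumes a simple binomial model per variant site and leaves out the AAS weighting term, so it is a simplified illustration rather than VAAST's actual scoring function. All function names are hypothetical.

```python
import math


def binom_loglik(k, n, p):
    """Log-likelihood of k variant alleles among n at frequency p.

    The binomial coefficient is omitted because it cancels in the ratio.
    """
    if p <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if p >= 1.0:
        return 0.0 if k == n else -math.inf
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def site_log_lr(x_target, n_target, x_background, n_background):
    """ln(L_null / L_alt) for one site; always <= 0.

    Null model: target and background share one allele frequency.
    Alternate model: each group has its own frequency.
    """
    p_null = (x_target + x_background) / (n_target + n_background)
    p_t = x_target / n_target
    p_b = x_background / n_background
    l_null = (binom_loglik(x_target, n_target, p_null)
              + binom_loglik(x_background, n_background, p_null))
    l_alt = (binom_loglik(x_target, n_target, p_t)
             + binom_loglik(x_background, n_background, p_b))
    return l_null - l_alt


def gene_score(sites):
    """Composite score for a gene: -2 * sum of per-site log ratios.

    Summing across sites treats them as independent.
    """
    return -2.0 * sum(site_log_lr(*site) for site in sites)
```

Each site is a tuple `(x_target, n_target, x_background, n_background)` of variant-allele counts and total allele counts. A variant carried on every target chromosome but absent from the background scores much higher than one seen on a single target chromosome.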
  • VAAST Uses Variant Frequencies in a Probabilistic Fashion
      • VAAST gives us the likelihood of the composite genotype at GENE X in the target given the background.
      • Do allele frequencies differ between Background and Target genomes within a given gene or feature?
      • Composite likelihood calculation assumes independence across sites. To control for LD, statistical significance is estimated by permutation test.
      • Multiple test correction for number of features (~20,000) is two orders of magnitude better than for the number of variants (~3,500,000).
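The permutation test and the multiple-test correction above can be sketched in a few lines of Python. This is a generic illustration, not VAAST's implementation. Each genome is assumed to be a tuple of variant-allele counts (0, 1, or 2) at the sites of one gene, and `burden_score` is a toy stand-in statistic. Shuffling whole genomes between target and background keeps each genome's sites together, which is how a permutation test can respect LD. The Bonferroni helper shows why testing ~20,000 features needs a far less stringent threshold than testing ~3,500,000 variants.

```python
import random


def burden_score(target, background):
    """Toy per-gene statistic: summed difference in mean allele count per site."""
    n_sites = len(target[0])
    total = 0.0
    for j in range(n_sites):
        t_mean = sum(g[j] for g in target) / len(target)
        b_mean = sum(g[j] for g in background) / len(background)
        total += t_mean - b_mean
    return total


def permutation_p_value(target, background, score_fn=burden_score,
                        n_permutations=1000, seed=0):
    """Estimate a p-value by shuffling target/background labels across genomes."""
    rng = random.Random(seed)
    observed = score_fn(target, background)
    pooled = list(target) + list(background)
    n_target = len(target)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # whole genomes move, so within-gene LD is preserved
        if score_fn(pooled[:n_target], pooled[n_target:]) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


per_feature = bonferroni_threshold(0.05, 20_000)      # 2.5e-06
per_variant = bonferroni_threshold(0.05, 3_500_000)   # roughly 1.4e-08
```

The ratio `per_feature / per_variant` is 175, which is the "two orders of magnitude" gain from testing genes or features instead of individual variants.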
  • Noise Decreases Dramatically with Increasing Number of Genomes [Figure: 1 genome target, 1 genome background]
  • [Figure: 1 genome target, 10 genome background]
  • [Figure: 1 genome target, 250 genome background]
  • [Figure: 1 genome target, 250 genome background, trio data]
  • Alleles Responsible for Miller Syndrome in Utah Kindred
      • CHR 16: DHODH. Mom G:R; Dad R:Q; Son G:R and R:Q; Daughter G:R and R:Q
      • CHR 5: DNAH5. Mom G:A; Dad R:*; Son R:* and G:A; Daughter R:* and G:A
      • Ng et al., Nature Genetics 42, 30–35 (2010) doi:10.1038/ng.499
      • Roach et al., Science 328, 636 (2010)
  • Schematic of VAAST Analysis of Utah Miller Kindred Using a Single Quartet [Figure: DHODH and DNAH5 highlighted]
  • Average Rank for 100 Dominant and Recessive Diseases [Bar chart: average genome-wide rank for dominant and recessive diseases by size of case cohort (2, 4, and 6 allele copies); 443 genomes in background. Bar labels: 156, 132, 21, 9, 8, 3]
  • Impact of Missing Data [Bar chart: average genome-wide rank for dominant and recessive diseases with 2 of 6, 4 of 6, and 6 of 6 allele copies; 443 genomes in background. Bar labels: 639, 373, 61, 21, 9, 3]
  • Outline The VAAST Analysis Pipeline Ogden Syndrome: Application of VAAST to a Genetic Disease of Unknown Cause The Future of VAAST Development
  • A Rare X-linked Mendelian Disorder
      • A Utah family coming to the University Hospital for 20+ years
      • About half of the male offspring die around 1 year of age
      • Aged appearance
      • Craniofacial anomalies
      • Hypotonia
      • Global developmental delays
      • Cardiac arrhythmias
  • Four Affected Boys over Two Generations [Pedigree figure: generations I, II, III]
  • Exome Sequencing
      • Agilent SureSelect In-Solution X Chromosome Capture
      • Covaris S series Sonication (150-200 bp)
      • 76 bp single-end reads on one lane each of the Illumina GAIIx
    Variant Calling
      • Sequence alignment with bwa
      • Remove duplicate reads with PICARD
      • Realign indel regions with GATK
      • Variant calling with Samtools, GATK
  • Identifying Candidate Genes: VAAST Identifies NAA10 as Candidate Gene
      • About 20 min. run time
      • 3 candidate genes (NAA10 ranked 2) with proband only
      • 1 candidate gene (NAA10) with pedigree
  • Additional Analyses
      • Microarray based CNV analysis: no likely causal variants found
      • Sanger sequencing confirmation: variant segregates perfectly with disease in 13 family members
      • Haplotype sharing (STR genotyping): ~11 Mb shared between two affected boys
      • A second family discovered with the same mutation: IBD relatedness analysis indicates independent mutational events
  • N(alpha)-acetyltransferase
      • N-alpha-acetylation is one of the most common protein modifications that occurs during protein synthesis.
      • NatA (catalytic subunit NAA10 (hARD1))
      • Eight exons, Crick strand, highly conserved
      • A:G transition causes p.Ser37Pro
  • Functional Analyses
      • Quantitative in vitro N-terminal acetylation assay (RP-HPLC).
      • Four peptide substrates previously shown to be acetylated by NatA (NAA10)
      • Assays indicate loss-of-function allele.
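One way to see how an A:G transition can produce a serine-to-proline change is to enumerate the standard genetic code. The Python sketch below assumes that "Crick strand" means the gene is read from the strand opposite the one on which the A:G change is reported. It uses only the standard serine and proline codons, not the actual NAA10 sequence, and the function names are illustrative.

```python
# Standard genetic code, written as DNA codons on the coding strand.
SERINE = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}
PROLINE = {"CCT", "CCC", "CCA", "CCG"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def ser_to_pro_changes():
    """All single-base changes (ref, alt) that turn a Ser codon into a Pro codon."""
    changes = set()
    for codon in sorted(SERINE):
        for pos in range(3):
            for base in "ACGT":
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if mutant in PROLINE:
                    changes.add((codon[pos], base))
    return changes


def on_opposite_strand(change):
    """Express a (ref, alt) base change as seen on the complementary strand."""
    ref, alt = change
    return COMPLEMENT[ref], COMPLEMENT[alt]


coding_changes = ser_to_pro_changes()
opposite_changes = {on_opposite_strand(c) for c in coding_changes}
```

Under these assumptions the only single-base route from serine to proline is T to C on the coding strand, which reads as A to G on the opposite strand.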
  • VAAST in Summary
      • Probabilistic Disease Gene Finder
      • Feature Based not Variant Based
      • Both Allele and AAS Frequencies
      • Considers Inheritance Model
      • As few as two target genomes can be sufficient to identify causative gene.
      • Background Genomes are “Reusable”
      • Not Limited to Human Analyses
  • VAAST: Future Directions
      • Indel support
      • Splice-site
      • No-call support
      • Pedigree support
      • Phylogenetic conservation
  • Acknowledgements
      • VAAST Development: Chad Huff, Hao Hu, Lynn Jorde, Barry Moore, Martin Reese, Marc Singleton, Jinchuan Xing, Mark Yandell
      • Yandell Lab: Michael Campbell, Daniel Ence, Guozhen Fan, Steven Flygare, Hao Hu, Zev Kronenberg, Barry Moore, Marc Singleton, Robert Ross, Mark Yandell
      • Ogden Syndrome: John Carey, Steven Chin, Heidi Deborah Fain, Gholson Lyon, John Opitz, Theodore J. Pysher, Alan Rope, Reid Robison, Sarah T. South, Tao Jiang, Jefferey Swensen, Evan Johnson, Hakon Hakonarson, Lynn B. Jorde, Mark Yandell, Thomas Arnesen, Rune Evjenth, Johan R. Lillehaug, Leslie G. Biesecker, Jennifer J. Johnston, Cathy A. Stevens, Brian Dalley, Chad Huff, Barry Moore, Christa Schank, Kai Wang, Jinchuan Xing