Church_NCBIvariation2013

NCBI Variation resources for CSHL Genome Access Course.

Published in: Technology
  • Put plug for Tim here…
  • RefSeqGene/LRG screen shot: stable coordinate system for gene level reporting. Gene centric genomic sequences.
  • Distribution of RefSeqGenes on GRCh37
  • Remap

    1. 1. Variation Resources at NCBI Deanna M. Church Staff Scientist, NCBI @deannachurch
    2. 2. Variation Resources Team at NCBI Ming Ward Lon Phan Brad Holmes Anna Glodek Michael Kholodov Rama Maiti Juliana Sampson David Shao Eugene Shekhtman Qiang Wang Hua Zhang Key Collaborators Heidi Rehm, Harvard Partners Christa Lese Martin, Geisinger Sherri Bale, GeneDx Lisa Kalman, CDC Birgit Funke, Harvard Partners Madhuri Hegde, Emory Donna Maglott Melissa Landrum Jennifer Lee George Riley Ray Tully Craig Wallin Shanmuga Chitipiralla Douglas Hoffman Wonhee Jang Ken Katz Michael Ovetsky Ricardo Villamarin Tim Hefferon John Lopez John Garner Chao Chen
    3. 3. Figure credit: http://itknowledgeexchange.techtarget.com/
    4. 4. Data from external sources dbSNP dbVar ClinVar GTR Quality Control Ref variants References Annotations Visualization Tools
    5. 5. Variant Definitions Location Evidence Methodology dbSNP dbVar Variant Annotations Phenotypes Consequences Tests Other Biology ClinVar GTR dbSNP
    6. 6. GenBank vs RefSeq. GenBank: Submitter Owned; Redundancy; Updated rarely; INSDC. RefSeq: RefSeq Owned; Non-Redundant; Curated; Not INSDC. BRCA1: 83 genomic records, 31 mRNA records, 27 protein records (GenBank); 3 genomic records, 5 mRNA records, 1 RNA record, 5 protein records (RefSeq)
    7. 7. Genome Res. 1999. 9: 677-679 http://www.ncbi.nlm.nih.gov/snp
    8. 8. SNPs defined by flanking position: >gnl|dbSNP|ss76078129|allelePos=17|len=33|alleles='A/G' GTGGCAGAGA CTGAAT R AAGGGTTGAC CCAGGG >gnl|dbSNP|ss3354770|allelePos=499|len=661|alleles='T/C' actattcaca atagcaaaga cttggaacca acccaaatgt ccaacaatga tagactggat taagaaaatg tggcacatat acaccatgga atactaggca TTCCATTCTA CTGTGCACGA GTCACTGCAA ACTCAAGCAT TTCCAGAGTT CTGAAAGCTC AACTAAGAAC CAAGCCTACT CATTCAACAT CAACACACAC AGCACCCTGA GCGTCCAAAA CCACGGGGGT TATGTTCTAG ACCACAGGAC TGGCTACCTG GCCCTGCTCA AGGCGGCAGG ATCAATGGGC AAGAATGTGC AAGAATTTAC CACAACTCAG CCTTGCTGTG TCAACCACAG AGGCCAAGTA CCCCTAACAC CCAGATAGAG TAATTGTGCC TTACTTCTTT GTTCATTCCC ACCATTACAT TTTGTAAATT GGAACTTCTA GGAGGTTAGA AGGATATGCT GATCAAAAAA AGGGGACATA TTCAAGGAGT GTCCCTGGGT CAACCCTT Y ATTCAGTCTC TGCCACATGT CTAGTAACTG TGAGTGATGG GTGCATCAGT ATAATCCTGA GCCTCCCAAG GTACAGCCTT TCACTACTAT TCATCATATT GGCTAAGGTA TTCATCATAT TGGCTAAGGT ATTCACCAAC AGGGCTCATT TTCTATCAGA CC
    9. 9. ss76078129 'A/G' (aligns to plus strand, 33bp); ss3354770 'T/C' (aligns to minus strand, 661bp)
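The two submissions on slides 8 and 9 can be checked against each other: reverse-complementing the ss76078129 flank should reproduce a stretch of the ss3354770 flank, with the A/G allele (IUPAC code R) becoming T/C (IUPAC code Y). The following is a minimal sketch, assuming the sequences as transcribed in slide 8; the function name and the short ss3354770 excerpt window are chosen here for illustration only.

```python
# IUPAC-aware complement table (the four bases plus the two ambiguity codes used on the slide).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "R": "Y", "Y": "R"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string containing A, C, G, T, R, or Y."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# Full 33 bp flank of ss76078129, spaces removed; R marks the A/G allele position.
SS76078129 = "GTGGCAGAGACTGAATRAAGGGTTGACCCAGGG"

# Excerpt of the ss3354770 flank around its Y (T/C) allele position, spaces removed.
SS3354770_EXCERPT = "GTCCCTGGGTCAACCCTTYATTCAGTCTCTGCCACATGT"

flipped = reverse_complement(SS76078129)
same_locus = flipped in SS3354770_EXCERPT

print(flipped)
print("ss76078129 matches ss3354770 on the opposite strand:", same_locus)
```

A match on the opposite strand is consistent with the two submissions describing the same variant, which is why flanking-sequence definitions have to be aligned before they can be compared.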
    10. 10. rs397515413
    11. 11. rs397515413 Hydin NC_000016.9 (chr16) Hydin2 NW_003871055.3 (chr1 fix patch)
    12. 12. VCF (Variant Call Format): defines variant by location rather than flanking sequence
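To illustrate what "by location" means in practice, here is a small sketch that reads the first five fixed, tab-separated VCF columns (CHROM, POS, ID, REF, ALT) from one data line. The record shown is hypothetical and was made up for this example; it is not taken from the slides or from any real data set, and the helper names are also illustrative.

```python
from typing import List, NamedTuple


class VcfRecord(NamedTuple):
    chrom: str
    pos: int          # 1-based position on the reference sequence
    variant_id: str   # "." when the submitter gave no identifier
    ref: str
    alts: List[str]   # one or more alternate alleles


def parse_vcf_line(line):
    """Parse the first five fixed columns of a VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt = fields[:5]
    return VcfRecord(chrom, int(pos), variant_id, ref, alt.split(","))


# Hypothetical record: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
example_line = "16\t1000\t.\tA\tG,T\t.\t.\t.\n"
record = parse_vcf_line(example_line)
print(record)
```

Unlike the flanking-sequence records on slide 8, nothing here needs to be aligned: the chromosome and position already state where the variant sits on a given assembly.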
    13. 13. Clustering microsatellites
    14. 14. rs62645748 To be replaced by a Variation Viewer To be replaced by a link to ClinVar
    15. 15. rs62645748 (NCBI Homo sapiens annotation run 104)
    16. 16. http://www.ncbi.nlm.nih.gov/dbvar
    17. 17. Submitter Information: Contact and author information. Study Information: Study meta-data (description, PMID, ProjectID, etc). Sample/Sampleset data: Sample IDs (if samples are consented); Sampleset ID for pooled samples (case v control sets). Experiment data: Assay method (sequencing, array); Platform and analysis information. Variants: Variant definitions
    18. 18. Variant Call Ambiguity. Figure labels: start, stop; probes with decreased signal intensity; probes with expected signal intensity; breakpoints; inner start, inner stop; outer start, outer stop
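One way to carry this ambiguity through an analysis is to keep all four coordinates and derive a size range from them. The sketch below assumes, as a reading of the figure labels, that the inner coordinates bound the probes with decreased signal and the outer coordinates are set by the nearest probes with expected signal. The class name and the coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AmbiguousCall:
    """A variant call whose breakpoints are only known to lie within ranges."""
    outer_start: int
    inner_start: int
    inner_stop: int
    outer_stop: int

    def min_length(self):
        # Smallest event consistent with the data: inner start to inner stop.
        return self.inner_stop - self.inner_start + 1

    def max_length(self):
        # Largest event consistent with the data: outer start to outer stop.
        return self.outer_stop - self.outer_start + 1


# Hypothetical coordinates, chosen only to show the arithmetic.
call = AmbiguousCall(outer_start=1000, inner_start=1500, inner_stop=4500, outer_stop=5000)
print(call.min_length(), call.max_length())
```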
    19. 19. Variant Call Ambiguity: Fosmid clone (40 Kb +/- 1 Kb). Figure labels: outer start, outer stop; 20 Kb; clone has a deletion relative to the genome; 60 Kb; clone has an insertion relative to the genome
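The reasoning behind clone-based calls can be sketched as a comparison between the expected insert size and the distance at which the clone ends map on the reference. This sketch assumes the 20 Kb and 60 Kb labels are such mapped distances: ends that map farther apart than the insert size imply the clone lacks sequence present in the reference, and ends that map closer together imply the clone carries extra sequence. The function and constant names are illustrative.

```python
EXPECTED_INSERT_KB = 40.0   # fosmid insert size from the slide
TOLERANCE_KB = 1.0          # the "+/- 1 Kb" from the slide


def classify_clone(mapped_span_kb, expected_kb=EXPECTED_INSERT_KB, tolerance_kb=TOLERANCE_KB):
    """Classify a clone by the reference distance between its mapped ends."""
    if mapped_span_kb > expected_kb + tolerance_kb:
        # Ends map too far apart: the clone is missing sequence the reference has.
        return "deletion in clone relative to genome"
    if mapped_span_kb < expected_kb - tolerance_kb:
        # Ends map too close together: the clone has sequence the reference lacks.
        return "insertion in clone relative to genome"
    return "concordant"


for span_kb in (20.0, 40.5, 60.0):
    print(span_kb, classify_clone(span_kb))
```

Because only the clone ends are placed, the event can lie anywhere between them, which is why such calls are reported with outer start and outer stop coordinates only.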
    20. 20. http://www.ncbi.nlm.nih.gov/clinvar
    21. 21. ClinVar data model and display. Diagram levels: Allele; Variant; Variant + Phenotype (RCV); Variant + Phenotype + Submitter (SCV)
    22. 22. ClinVar RCV report - Overview. Interpretation: Significance; Review status*; Accession.version*. Allele summary: Gene; Variant type; Genomic location; HGVS expressions*; Molecular consequence*; Links*; Frequency*. Phenotype summary: Names; Links*; Age of onset*; Prevalence*. (* May be provided by NCBI)
    23. 23. ClinVar RCV report – Summary of assertions: Each submission is accessioned and versioned. Terms provided by the submitter are mapped to controlled values. Method of review is clearly reported so primary data can be distinguished from that reported in the literature.
    24. 24. ClinVar RCV report - Evidence
    25. 25. Allele report – available December
    26. 26. http://www.lrg-sequence.org/ http://www.ncbi.nlm.nih.gov/refseq/rsg
    27. 27. RefSeq Gene L R http://www.ncbi.nlm.nih.gov/refseq/rsg
    28. 28. Remap from: Assembly 1 <-> Assembly 2; Assembly <-> RefSeqGene/LRG; Primary Assembly <-> Alternate loci. http://www.ncbi.nlm.nih.gov/genome/tools/remap
    29. 29. 1:215844373
    30. 30. This new look coming next month http://www.ncbi.nlm.nih.gov/variation/tools/reporter
    31. 31. http://www.ncbi.nlm.nih.gov/variation/view
    32. 32. Target audience: Clinical testing labs. Submissions from: Clinical and Research labs. Diagram labels: NA, Concordant, Discordant, Calls, Tests, cSRA. http://www.ncbi.nlm.nih.gov/variation/tools/get-rm
    33. 33. Twelve submitting labs to date. Twelve custom scripts to regularize data. Defined formats here: http://www.ncbi.nlm.nih.gov/projects/variation/get-rm
    34. 34. NA12878 Tests by Platform (bar chart; platforms: HiSeq 2000, HiSeq 2500, MiSeq, Ion Torrent, Sanger, 454)
    35. 35. Lab Provided Validation: Variants validated in this sample using another platform; Variants validated in another sample using another platform; Variants seen in other samples from submitting lab using this platform; Variants seen in public data set; Variants that are novel; Variants that were not assessed
    36. 36. Supporting Read Counts (histogram; y-axis: number of variants; x-axis: read count bins 0, 10, 50, 100, 500, 1000, 5000). Based on May 2013 Data release
    37. 37. Based on May 2013 Data release
    38. 38. http://www.ncbi.nlm.nih.gov/variation/tools/get-rm
    39. 39. Gene level concordance: $\sum \left( \max_i(x_i) / \sum T \right)$, where i = genotype call, x = count per call for each variant, and T = total genotype calls per variant. Sums are taken over all variants in a gene. Tested regions taken into account. Phasing ignored.
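The concordance score on slide 39 can be sketched in a few lines: for each variant, take the count of the most frequently reported genotype call, sum those counts across the gene, and divide by the total number of calls across the gene. This is a minimal reading of the formula as transcribed; the input layout and the example counts are hypothetical, and the adjustment for tested regions mentioned on the slide is not modeled.

```python
def gene_concordance(variants):
    """Compute sum(max_i x_i) / sum(T) over all variants in one gene.

    `variants` is a list with one dict per variant, mapping each genotype
    call to the number of tests that reported it (a hypothetical layout).
    """
    agreeing = sum(max(counts.values()) for counts in variants)
    total = sum(sum(counts.values()) for counts in variants)
    return agreeing / total if total else float("nan")


# Hypothetical gene with two variants: one with a split call, one unanimous.
example_gene = [
    {"A/G": 8, "A/A": 2},   # max = 8, T = 10
    {"C/C": 10},            # max = 10, T = 10
]
print(gene_concordance(example_gene))
```

A gene in which every test agrees on every variant scores 1.0; disagreement at any variant pulls the score down in proportion to how many calls differ from the majority genotype.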
