Church clinical2012

Published in: Health & Medicine

  • What is variant calling? Identifying differences from a reference.
  • Technical noise: ideogram showing gaps in the human genome.
  • Technical noise example: an ISCA variant was submitted that lay completely within a gap. Note: most people don’t look at assembly tracks when they review data, so you might never see anything odd about this unless you did (this is actually what caused us to start validating variant definitions against the assembly).
  • Technical issue: we get tiny reads (relative to the genome) that we have to align to a reference and interpret. Screen shot from 1000 Genomes data in the 1000 Genomes browser: tenth exon of CDC27, highlighting samples that have been sequenced using one technology and aligned using two different methods. Note the lack of 1000 Genomes calls in this region, as well as a questionable het SNP (seen in BWA, but not in Mosaik).
  • This problem can also arise from population-specific issues. The APOL1 gene seems to be correctly assembled, but there may be an African-specific copy-number duplication that causes a SNP to be called; this may not be a true SNP but rather a difference between paralogous gene copies.
  • Focal segmental glomerulosclerosis 4: this shows the utility of having data in a centralized resource. We were able to add annotation from multiple sources onto this genome location; keeping variant calls in a central repository allows knowledge to be added over time. Early in the new year we will be adding tracks for Evan’s SUN data and for alignments of known paralogs/pseudogenes to the genome.
  • To address assembly issues, the GRC centralizes the production of the reference assembly. This gives the community a single point of contact for reporting problems and finding information about the assembly. Additionally, we serve as an aggregator of information: as individual labs find or fix problems, we can integrate this information into the reference assembly so everyone has access to it.
  • Region curation slide: we curate the genome region by region and make this information available to users on a website (and as downloadable files for integration with browsers).
  • The GRC releases patches to the human assembly on a quarterly cycle. There are two varieties of patches:
      - FIX patches correct existing assembly problems: the chromosome will be updated, and these patches will be integrated in GRCh38.
      - NOVEL patches add new sequence representations: these will become alternate loci.
    Approximately 3% of the current public human assembly, GRCh37, is associated with a region that is represented by a patch or alternate locus. As you can see, the GRC has been very busy updating the assembly. I’d now like to talk about the tools and software we use to do this.
  • If you are not using the entire assembly in your efforts, you may be missing genes in your exome capture reagents.
  • Example of a fix patch: no one is really screening for these right now, despite their clear importance in neuronal development.
  • RefSeqGene/LRG screen shot: a stable coordinate system for gene-level reporting, based on gene-centric genomic sequences.
  • Distribution of RefSeqGenes on GRCh37
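The ISCA example above comes down to an interval-overlap test between a submitted variant definition and the assembly's gap intervals. A minimal Python sketch, assuming the gaps are available as non-overlapping, 0-based, half-open intervals (for example, parsed from an assembly's gap annotation); the function names, labels, and coordinates are hypothetical, and this is not NCBI's actual validation pipeline.

```python
from typing import NamedTuple, Sequence


class Interval(NamedTuple):
    """A half-open, 0-based interval [start, end) on a named sequence."""
    chrom: str
    start: int
    end: int


def overlap_bp(a: Interval, b: Interval) -> int:
    """Bases shared by two intervals; 0 if they lie on different sequences."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def gap_fraction(variant: Interval, gaps: Sequence[Interval]) -> float:
    """Fraction of the variant's span that falls inside assembly gaps.

    Assumes the gap intervals do not overlap one another.
    """
    length = variant.end - variant.start
    if length <= 0:
        raise ValueError("variant interval must have positive length")
    covered = sum(overlap_bp(variant, gap) for gap in gaps)
    return covered / length


def classify_against_gaps(variant: Interval, gaps: Sequence[Interval]) -> str:
    """Hypothetical review label for a variant definition based on gap overlap."""
    fraction = gap_fraction(variant, gaps)
    if fraction == 1.0:
        return "entirely in gap"
    if fraction > 0.0:
        return "partially in gap"
    return "ok"


if __name__ == "__main__":
    # Hypothetical gap and variant coordinates, for illustration only.
    gaps = [Interval("chr1", 1_000, 2_000)]
    for v in (Interval("chr1", 1_200, 1_800),
              Interval("chr1", 500, 1_500),
              Interval("chr1", 3_000, 3_100)):
        print(v, classify_against_gaps(v, gaps))
```

A variant flagged "entirely in gap" is one whose definition cannot be supported by the reference sequence at all, which is the situation the ISCA submission illustrates.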

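The two patch varieties described in the notes determine what happens to each patch at the next major release: FIX patches are folded into the chromosome (GRCh38), while NOVEL patches become alternate loci. A small Python sketch of that rule as a data structure; the class names, function names, and example patch names are hypothetical and do not correspond to any GRC file format or API.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class PatchType(Enum):
    FIX = "fix"      # corrects an existing assembly problem
    NOVEL = "novel"  # adds a new sequence representation


@dataclass(frozen=True)
class Patch:
    name: str             # hypothetical identifier, not a real GRC patch name
    patch_type: PatchType
    chrom: str


def fate_in_next_major_release(patch: Patch) -> str:
    """Where a patch's sequence ends up at the next major assembly release."""
    if patch.patch_type is PatchType.FIX:
        return f"integrated into chromosome {patch.chrom}"
    return f"retained as an alternate locus for chromosome {patch.chrom}"


def summarize(patches: Iterable[Patch]) -> dict[str, int]:
    """Count patches by type (every type is reported, even with a zero count)."""
    counts = Counter(p.patch_type.value for p in patches)
    return {t.value: counts.get(t.value, 0) for t in PatchType}


if __name__ == "__main__":
    patches = [
        Patch("EXAMPLE_FIX_1", PatchType.FIX, "1"),
        Patch("EXAMPLE_NOVEL_1", PatchType.NOVEL, "6"),
    ]
    for p in patches:
        print(p.name, "->", fate_in_next_major_release(p))
    print(summarize(patches))
```

Keeping the patch type explicit makes it easy to see, before a major release, which regions of an analysis will shift onto an updated chromosome and which will only ever exist as alternate loci.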
    1. Improving the accuracy of variant identification. Deanna M. Church, NCBI, @deannachurch
    2. Image credit: http://www.tohlejokes.com
    3. ISCA submitted variant: nsv534088
    4. CDC27 rs11491191 http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
    5. APOL1. CEPH: A=1.000, G=0. http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
    6. APOL1: multiple suspect submissions. 1000G frequency data, YRI: A=0.5852, G=0.4148. Sudmant et al., 2010. http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
    7. http://genomereference.org
    8. Submitted on NCBI35 (hg17): nsv832911 (nstd68) http://www.ncbi.nlm.nih.gov/dbvar
    9. Moved approximately 2 Mb distal on chr15. NCBI35 (hg17) tiling path, NC_000015.8 (chr15): gap inserted; removed from assembly. GRCh37 (hg19) tiling path, NC_000015.9 (chr15): added to assembly. HG-24
    10. GRCh37.p10 (160 regions: 2.89% of chromosomes). 111 FIX patches: chromosome update in GRCh38 (adds >5 Mb of novel sequence to the assembly). 71 NOVEL patches: additional sequence added (adds >800 kb of novel sequence to the assembly). Releasing patches quarterly. Summer of 2013.
    11. MHC (chr6): Chr 6 representation (PGF); Alt_Ref_Locus_2 (COX)
    12. Richa Agarwala, Eugene Yaschenko
    13. 1q32, 1q21, 1p21. 1p21 patch alignment to chromosome 1. Dennis et al., 2012
    14. http://www.lrg-sequence.org/ http://www.ncbi.nlm.nih.gov/refseq/rsg
    15. RefSeq Gene L R http://www.ncbi.nlm.nih.gov/refseq/rsg
    16. Assembly <-> RefSeqGene (including transcripts and coding sequences) http://www.ncbi.nlm.nih.gov/genome/tools/remap
    17. Coordinate remapping, NCBI36 <-> GRCh37 http://www.ncbi.nlm.nih.gov/genome/tools/remap
    18. The Reference Assembly is Evolving. Centralization of assembly and sequence data facilitates: reporting problems, implementing fixes, building tools, data management.
    19. Thanks! Genome Reference Consortium: The Genome Institute at Washington University, The Wellcome Trust Sanger Institute, European Bioinformatics Institute, National Center for Biotechnology Information. NCBI: Assembly Group, RefSeq Group, Genome Annotation Group.
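Slide 17 refers to coordinate remapping between NCBI36 and GRCh37. Remapping tools work from alignments between the two assemblies; a much-simplified Python sketch of the core lookup, assuming the alignment has already been reduced to non-overlapping, ungapped, same-strand blocks on a single chromosome, might look like this. The `Block` and `Remapper` names and the example coordinates are hypothetical, and this is not how NCBI Remap is actually implemented.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class Block(NamedTuple):
    """Ungapped, same-strand alignment block; source range is 0-based, half-open."""
    src_start: int
    src_end: int
    dst_start: int


class Remapper:
    """Map coordinates from a source assembly to a target assembly on one chromosome."""

    def __init__(self, blocks: Sequence[Block]):
        # Assumes blocks do not overlap on the source assembly.
        self.blocks = sorted(blocks, key=lambda b: b.src_start)
        self.starts = [b.src_start for b in self.blocks]

    def _block_index(self, pos: int) -> Optional[int]:
        """Index of the block covering pos, or None if pos lies outside every block."""
        i = bisect_right(self.starts, pos) - 1
        if i < 0 or pos >= self.blocks[i].src_end:
            return None
        return i

    def map_position(self, pos: int) -> Optional[int]:
        """Target coordinate for pos, or None (e.g. sequence absent from the target)."""
        i = self._block_index(pos)
        if i is None:
            return None
        block = self.blocks[i]
        return block.dst_start + (pos - block.src_start)

    def map_interval(self, start: int, end: int) -> Optional[tuple[int, int]]:
        """Map [start, end) only if it lies within a single block; otherwise None."""
        if end <= start:
            raise ValueError("interval must have positive length")
        i = self._block_index(start)
        if i is None or self._block_index(end - 1) != i:
            return None  # spans a block boundary: needs manual review
        block = self.blocks[i]
        offset = block.dst_start - block.src_start
        return (start + offset, end + offset)


if __name__ == "__main__":
    # Hypothetical alignment: 500 bp inserted in the target after source
    # position 1000, and source 2000-2500 absent from the target.
    remapper = Remapper([Block(0, 1000, 0), Block(1000, 2000, 1500), Block(2500, 3000, 2500)])
    print([remapper.map_position(p) for p in (10, 1200, 2200, 2600)])
    print(remapper.map_interval(1100, 1200), remapper.map_interval(900, 1100))
```

Refusing to map intervals that cross a block boundary is a deliberately conservative choice: such variants sit on sequence that changed between assemblies, which is exactly where a silent coordinate shift would mislead.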