Church gia13


  • The reference is not just the chromosome sequences of the primary assembly unit; it also includes the alternate loci and patches, which provide additional sequence representations at selected genomic regions. The GRC has been releasing patches to the human assembly on a quarterly cycle, and we’re now at GRCh37.p12. There are two varieties of patches. FIX patches correct existing assembly problems: the chromosome sequence will be updated, and these patches will be integrated in GRCh38. NOVEL patches add new sequence representations: they will become alternate loci. This ideogram shows the current distribution of patches and alternate loci, and you can see that many regions have changed since GRCh37. Approximately 3% of the current public human assembly, GRCh37, is associated with a region represented by a patch or alternate locus.

    1. Converting from Analog to Digital: Integrating the historical archive of human variation in an NGS world. Deanna M. Church, Staff Scientist, NCBI (@deannachurch). Genome Informatics Alliance 2013
    2. Acknowledgements. GeT-RM: Lisa Kalman (CDC), Birgit Funke (Harvard), Madhuri Hegde (Emory), Maryam Halavi, Chao Chen, Jon Trow, Douglas Slotta, Peter Meric, Daniel Frishberg, Victor Ananiev. ClinVar: Alex Astashyn, Shanmuga Chitipiralla, Douglas Hoffman, Wonhee Jang, Brandi Kattman, Melissa Landrum, Jennifer Lee, Adriana Malheiro, Wendy Rubinstein, George Riley, Amanjeev Sethi, Ricardo Villamarin. ISCA: Christa Lese Martin (Geisinger), Erin Riggs (Geisinger), Jose Mena, Mike Feolo, Tim Hefferon, John Garner, John Lopez. GRC: Valerie Schneider (NCBI), The Genome Institute at Washington University, The Wellcome Trust Sanger Institute, The European Bioinformatics Institute.
    3. Variation / Phenotypes
    4. Phenotypes
    5. Data flow (diagram labels): Variant call (dbVar submission); Array data files; Clinical labs; QC analysis; Curation; Data regularization; dbGaP (controlled access, approved users); Web access; FTP access; Assembly remapping; dbVar; ISCA; UCSC; DGV; DGVa; NCBI; BioProject ID; ClinVar. dbGaP projects need a sponsoring NIH institute to run the DAC (NICHD).
    6. ASD: Atrial Septal Defect or Autism Spectrum Disorder? Cases with no HPO terms: 1,814; cases with HPO terms: 6,770 (Riggs et al., 2012). About 2 HPO terms per case (maximum of 16). The Human Phenotype Ontology
    7.
    8. Variation
    9. Figure: data size (gigabytes, log scale from 1 to 100,000) by component for 1092 genomes (low coverage + exome): sequences (FASTQ), alignments (BAM), genotype likelihoods (VCF), individual variants (VCF). Call set: 38.2M SNPs, 3.9M short indels, and 14K deletions. (Steve Sherry, NCBI)
    10.
    11.
    12. GRCh37 (http://genomereference.org)
    13. 1p21 patch alignment to chromosome 1 (regions 1q32, 1q21, 1p21; Dennis et al., 2012)
    14. Hydin: chr16 (16q22.2); Hydin2: chr1 (1q21.1). Missing in NCBI35, unlocalized in NCBI36/GRCh37, finished in GRCh38. Alignment to Hydin2: genomic, 300 Kb, 99.4% ID. Alignment to Hydin1: CHM1_1.0, >99.9% ID. (Doggett et al., 2006)
    15. APOBEC cluster (Kidd et al., 2007): part of the chr22 assembly; alternate locus for chr22. White: insertion; black: deletion.
    16.
    17. Human Resolved for GRCh38
    18. GRCh38 is coming (September 2013)
    19. Audience: clinical testing labs. Submissions from: clinical and research labs.
    20. Reporting Standards: Not standard. Twelve submitting labs to date; twelve custom scripts to regularize data. Despite defined formats, what are the issues?
    21. Reporting Standards: Not standard. What are the issues? Better example: QUAL* (*required sixth column in VCF file). Values submitted: 10.01-18357.112.6-21.20-21.220-3070; Allele string; 34.79-44624.03; None; 20-46006
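Because QUAL is a fixed, required column, a simple check at submission time would flag ranges, allele strings, and the word "None" before they reach curation. A minimal sketch, assuming tab-separated VCF text and treating "." as the only permitted non-numeric value; `parse_qual` and `find_bad_qual` are hypothetical names, not part of any NCBI tool.

```python
import math


def parse_qual(field):
    """Return QUAL as a float, or None for the VCF missing value '.'.

    Raises ValueError for anything else, such as a range ('20-30'),
    an allele string, or the word 'None'.
    """
    if field == ".":
        return None
    try:
        value = float(field)
    except ValueError:
        raise ValueError(f"QUAL is neither a number nor '.': {field!r}") from None
    # Design choice of this sketch: also reject NaN, infinities, and negatives.
    if not math.isfinite(value) or value < 0:
        raise ValueError(f"QUAL out of range: {field!r}")
    return value


def find_bad_qual(lines):
    """Return (line_number, reason) for every data line whose QUAL fails parse_qual."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("#"):  # skip meta-information and header lines
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:  # CHROM..INFO are the eight fixed columns
            problems.append((number, f"expected >= 8 columns, got {len(columns)}"))
            continue
        try:
            parse_qual(columns[5])  # QUAL is the sixth column (index 5)
        except ValueError as err:
            problems.append((number, str(err)))
    return problems
```

Run over a lab's file, `find_bad_qual` returns the offending line numbers and reasons, which can go back to the submitter instead of into a lab-specific conversion script.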
    22. Reporting Standards: Not standard. What are the issues? Lab reporting a single nucleotide change (C->T) het change as: c.1956+15C>CT. HGVS standards say this should be reported as: c.1956+15C>T[=]. Lab reporting a single nucleotide change (A->G) hom change as: c.670+9A>G. HGVS standards say this should be reported as: c.[670+9A>G];[670+9A>G]
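For simple substitutions, the rewrite from lab notation into HGVS allele syntax can be automated. A minimal sketch: `to_hgvs_genotype` is a hypothetical helper, and writing the unchanged allele of a heterozygote as `[=]` (giving c.[...];[=]) is an assumption to check against the current HGVS recommendations. A bare call such as c.670+9A>G carries no zygosity, so the caller must supply it.

```python
import re
from typing import Optional

# c. position: optional '-' or '*' prefix, a number, optional intronic offset (+15, -3),
# then the reference base, '>', and one or two observed bases.
_CALL = re.compile(r"^(?P<pos>[-*]?\d+(?:[+-]\d+)?)(?P<ref>[ACGT])>(?P<obs>[ACGT]{1,2})$")


def to_hgvs_genotype(call: str, zygosity: Optional[str] = None) -> str:
    """Rewrite a lab's substitution call as an HGVS two-allele description."""
    body = call[2:] if call.startswith("c.") else call
    m = _CALL.match(body)
    if m is None:
        raise ValueError(f"unrecognized call: {call!r}")
    pos, ref, obs = m.group("pos"), m.group("ref"), m.group("obs")
    if len(obs) == 2:
        # Genotype-style notation such as C>CT lists both observed alleles,
        # so zygosity is inferred from them rather than taken from the caller.
        alts = sorted(set(obs) - {ref})
        if len(alts) != 1:
            raise ValueError(f"not a single-allele substitution: {call!r}")
        alt = alts[0]
        zygosity = "het" if ref in obs else "hom"
    else:
        alt = obs
        if alt == ref:
            raise ValueError(f"no change described: {call!r}")
        if zygosity not in ("het", "hom"):
            raise ValueError(f"zygosity ('het' or 'hom') must be given for {call!r}")
    change = f"{pos}{ref}>{alt}"
    if zygosity == "hom":
        return f"c.[{change}];[{change}]"
    return f"c.[{change}];[=]"  # assumed form for 'other allele unchanged'
```

For instance, `to_hgvs_genotype("c.1956+15C>CT")` infers a heterozygote and returns `c.[1956+15C>T];[=]`, while `to_hgvs_genotype("c.670+9A>G", "hom")` returns `c.[670+9A>G];[670+9A>G]`.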
    23. Defining a reference sequence: Data validation. Reported as: NM_007171.3:c.942T>C. The base in the transcript is a ‘C’, not a ‘T’.
    24.
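The check on slide 23 is mechanical once the transcript is available: map the c. position onto the transcript and compare the reported reference base with the actual one. A minimal sketch, assuming the transcript sequence and the position of its CDS start are already in hand (fetching NM_007171.3 itself is not shown). The accession `TOY_000001.1` and its sequence are made up for illustration, and only exonic coding substitutions (c.1 and above, no intronic offsets) are handled.

```python
import re

# "ACCESSION:c.<pos><ref>><alt>" for simple coding substitutions only.
DESC_RE = re.compile(r"^(?P<acc>[^:]+):c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$")


def reference_base(transcript, cds_start, c_pos):
    """Base at coding position c_pos.

    cds_start is the 1-based transcript position of the A of the ATG start
    codon, i.e. the position that c.1 refers to.
    """
    if c_pos < 1:
        raise ValueError("this sketch handles coding positions c.1 and above only")
    index = cds_start + c_pos - 2  # 1-based transcript position, converted to 0-based
    if index >= len(transcript):
        raise ValueError("position lies beyond the end of the transcript")
    return transcript[index].upper()


def validate(description, transcripts):
    """Return (ok, actual_base) for one description.

    transcripts maps accession -> (sequence, cds_start); how that mapping
    is populated is left open here.
    """
    m = DESC_RE.match(description)
    if m is None:
        raise ValueError(f"unsupported description: {description!r}")
    sequence, cds_start = transcripts[m.group("acc")]
    actual = reference_base(sequence, cds_start, int(m.group("pos")))
    return actual == m.group("ref"), actual


# Hypothetical toy transcript: GGC (5' UTR) then ATG AAA CTT GCC; c.1 is at position 4.
TOY_TRANSCRIPTS = {"TOY_000001.1": ("GGCATGAAACTTGCC", 4)}
```

With the toy entry, `validate("TOY_000001.1:c.7T>C", TOY_TRANSCRIPTS)` returns `(False, "C")`: the transcript has a C where a T was reported, the same kind of mismatch as NM_007171.3:c.942T>C.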
    25. Standardize data: what is the variation? Descriptions submitted: 607008.0001; 985A>G; 985A>G (K304E); A985G; ACADM, LYS304GLU; K304E; K304E (985 A->G); K304E (K329E); K304E only; K329E; K329E(985A>G); LYS304GLU; Mutation c.985A>G (p.K304E); c.985A>G; c.985A>G (p.K304E); c.985A>G (p.Lys304Glu); includes: K304E (985A>G); p.K304E; p.Lys329Glu; previously known as p.Lys329Glu; Analysis of ACADM 985A>G mutation. Standard representations: NC_000001.10:g.76226846A>G; NG_007045.1:g.41804A>G; NM_000016.4:c.985A>G; NP_000007.1:p.Lys329Glu; rs77931234
    26. Miki et al., 1994
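Part of that regularization can be scripted. A minimal sketch that pulls cDNA and protein substitutions out of free-text strings like those above and checks protein positions against the codon implied by c.985: (985 - 1) // 3 + 1 = 329, consistent with p.Lys329Glu. The offset of 25 between K304E and p.Lys329Glu is taken only from the pairing shown on the slide and is treated as a hypothetical parameter; the regular expressions and function names are illustrative, and identifiers such as 607008.0001 or rs77931234 still need a database lookup.

```python
import re

# Standard amino acids, one-letter -> three-letter code.
AA3 = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}
THREE = {v.upper(): v for v in AA3.values()}  # "LYS" -> "Lys"

# cDNA substitutions with '>' or '->', with or without "c.": 985A>G, 985 A->G, c.985A>G.
# Written for transcript-level strings like those on the slide, not for g. descriptions.
CDNA_RE = re.compile(r"(\d+)\s*([ACGT])\s*-?>\s*([ACGT])")
# Protein substitutions, one- or three-letter codes: K304E, p.K304E, LYS304GLU, p.Lys329Glu.
PROT_RE = re.compile(r"([A-Za-z]{3}|[A-Z])(\d+)([A-Za-z]{3}|[A-Z])")


def _aa3(token):
    """Map a one- or three-letter residue token to its three-letter code, or None."""
    return AA3.get(token) if len(token) == 1 else THREE.get(token.upper())


def parse(description):
    """Extract cDNA and protein substitutions from one free-text description."""
    found = {"cdna": set(), "protein": set(), "ambiguous": set()}
    for pos, ref, alt in CDNA_RE.findall(description):
        found["cdna"].add(f"c.{pos}{ref}>{alt}")
    for ref, pos, alt in PROT_RE.findall(description):
        if len(ref) == len(alt) == 1 and ref in "ACGT" and alt in "ACGT":
            # "A985G" reads as a nucleotide change or as p.Ala985Gly; leave it to a curator.
            found["ambiguous"].add(f"{ref}{pos}{alt}")
            continue
        ref3, alt3 = _aa3(ref), _aa3(alt)
        if ref3 and alt3:
            found["protein"].add((ref3, int(pos), alt3))
    return found


def codon_of(c_pos):
    """Codon containing coding position c_pos (c.1 is the A of the ATG start codon)."""
    return (c_pos - 1) // 3 + 1


# Hypothetical parameter: offset between legacy and current protein numbering,
# inferred only from the slide's pairing of K304E with NP_000007.1:p.Lys329Glu.
LEGACY_OFFSET = 329 - 304


def numbering(protein_pos, c_pos=985, legacy_offset=LEGACY_OFFSET):
    """Say whether a protein position matches the codon of c_pos, in current or legacy numbering."""
    codon = codon_of(c_pos)
    if protein_pos == codon:
        return "current"
    if protein_pos == codon - legacy_offset:
        return "legacy"
    return "unrelated"
```

For example, `parse("K304E (985 A->G)")` yields `c.985A>G` and `("Lys", 304, "Glu")`, and `numbering(304)` labels that position as legacy. A string like A985G is deliberately set aside as ambiguous rather than guessed.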