The reference is not just the chromosome sequences of the primary assembly unit; it also includes the alternate loci and patches, which provide additional sequence representations at selected genomic regions. The GRC has been releasing patches to the human assembly on a quarterly cycle, and we’re now at GRCh37.p12. There are two varieties of patches:
FIX patches correct existing assembly problems: the chromosome will be updated, and the patches will be integrated in GRCh38.
NOVEL patches add new sequence representations: they will become alternate loci.
This ideogram shows the current distribution of patches and alternate loci, and you can see that many regions have changed since GRCh37. Note that approximately 3% of the current public human assembly, GRCh37, is associated with a region that is represented by a patch or alternate locus.
Converting from Analog to Digital
Integrating the historical archive of human variation in an NGS world
Deanna M. Church
Staff Scientist, NCBI
@deannachurch
Genome Informatics Alliance 2013
Acknowledgements
GeT-RM: Lisa Kalman (CDC), Birgit Funke (Harvard), Madhuri Hegde (Emory), Maryam Halavi, Chao Chen, Jon Trow, Douglas Slotta, Peter Meric, Daniel Frishberg, Victor Ananiev
ClinVar: Alex Astashyn, Shanmuga Chitipiralla, Douglas Hoffman, Wonhee Jang, Brandi Kattman, Melissa Landrum, Jennifer Lee, Adriana Malheiro, Wendy Rubinstein, George Riley, Amanjeev Sethi, Ricardo Villamarin
ISCA: Christa Lese Martin (Geisinger), Erin Riggs (Geisinger), Jose Mena, Mike Feolo, Tim Hefferon, John Garner, John Lopez
GRC: Valerie Schneider (NCBI), The Genome Institute at Washington University, The Wellcome Trust Sanger Institute, The European Bioinformatics Institute
http://www.ncbi.nlm.nih.gov/variation/tools/get-rm
[Figure labels: Calls, Tests, cSRA, Concordant, Discordant, NA]
Target audience: Clinical testing labs
Submissions from: Clinical and Research labs
Reporting Standards: Not standard
Twelve submitting labs to date
Twelve custom scripts to regularize data
Despite defined formats here: http://www.ncbi.nlm.nih.gov/projects/variation/get-rm
What are the issues?
Reporting Standards: Not standard
What are the issues?
Better Example: QUAL
**Required sixth column in VCF file
10.01-18357.112.6-21.20-21.220-3070
Allele string
34.79-44624.03
None
20-46006
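The QUAL issue can be illustrated with a small check. This is a minimal sketch, assuming the VCF convention that QUAL is a single Phred-scaled number or "." when missing; the function name `check_qual` and its three return labels are hypothetical, not part of any GeT-RM tooling.

```python
import math


def check_qual(field: str) -> str:
    """Classify a VCF QUAL field as 'ok', 'missing', or 'invalid'.

    Assumption: a valid QUAL is one non-negative, finite number,
    and a lone "." marks a missing value.
    """
    if field == ".":
        return "missing"
    try:
        value = float(field)
    except ValueError:
        # Ranges such as "20-46006" and free text such as "None" land here.
        return "invalid"
    if math.isnan(value) or math.isinf(value) or value < 0:
        return "invalid"
    return "ok"


# Values of the kind shown on the slide.
for raw in ["24.03", ".", "None", "Allele string", "20-46006"]:
    print(f"{raw!r}: {check_qual(raw)}")
```

A check like this only flags the problem; deciding what a range or a text label was meant to convey still needs a conversation with the submitting lab.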
Reporting Standards: Not standard
What are the issues?
Lab reporting a single nucleotide (C->T) het change as: c.1956+15C>CT
HGVS standard says this should be reported as: c.1956+15C>T[=]
Lab reporting a single nucleotide (A->G) hom change as: c.670+9A>G
HGVS standard says this should be reported as: c.[670+9A>G];[670+9A>G]
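A rough way to catch strings like these automatically is sketched below. It assumes a deliberately narrow pattern (a coding-DNA substitution with an optional intronic offset), and the names `SUBST`, `check_substitution`, and `homozygous_form` are hypothetical. It is not a full HGVS parser.

```python
import re

# Hypothetical, narrow pattern: "c." + position (optional +/- intronic offset)
# + reference base(s) + ">" + alternate base(s).
SUBST = re.compile(
    r"^c\.(?P<pos>\d+(?:[+-]\d+)?)(?P<ref>[ACGT]+)>(?P<alt>[ACGT]+)$"
)


def check_substitution(expr: str) -> str:
    """Return 'single-allele', 'nonstandard', or 'unparsed' for a c. string."""
    m = SUBST.match(expr)
    if m is None:
        return "unparsed"
    # A substitution replaces exactly one base with one base, so "C>CT"
    # (a genotype written as an allele) is flagged.
    if len(m["ref"]) != 1 or len(m["alt"]) != 1:
        return "nonstandard"
    return "single-allele"


def homozygous_form(expr: str) -> str:
    """Wrap a single-allele substitution as c.[change];[change]."""
    if check_substitution(expr) != "single-allele":
        raise ValueError(f"not a simple substitution: {expr!r}")
    change = expr[2:]  # drop the leading "c."
    return f"c.[{change}];[{change}]"


print(check_substitution("c.1956+15C>CT"))
print(homozygous_form("c.670+9A>G"))
```

The homozygous wrapper reproduces the two-allele form shown on the slide; zygosity itself has to come from the lab's record, since the single-allele string does not carry it.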
Defining a reference sequence: Data validation
Reported as: NM_007171.3:c.942T>C
Base in transcript is a ‘C’, not a ‘T’
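This kind of validation amounts to looking up the stated reference base in the transcript and comparing. The sketch below assumes the transcript sequence and the 1-based position of the start codon are already in hand; the function `check_reference_base`, the toy sequence, and the accession `NM_000000.0` are all hypothetical and stand in for a real record.

```python
import re

# Narrow pattern for an exonic single-base substitution on a versioned accession.
HGVS_SUBST = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+\.\d+):c\.(?P<pos>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def check_reference_base(hgvs: str, transcript_seq: str, cds_start: int):
    """Compare the stated reference base with the transcript sequence.

    cds_start is the 1-based transcript position of the A in the ATG
    start codon, so c.1 maps to transcript_seq[cds_start - 1].
    Returns (matches, actual_base).
    """
    m = HGVS_SUBST.match(hgvs)
    if m is None:
        raise ValueError(f"unsupported expression: {hgvs!r}")
    index = (cds_start - 1) + (int(m["pos"]) - 1)
    if not 0 <= index < len(transcript_seq):
        raise ValueError("position falls outside the transcript")
    actual = transcript_seq[index].upper()
    return actual == m["ref"], actual


# Toy transcript: 2 bases of 5' UTR, then the coding sequence ATGCCC.
toy_seq = "GGATGCCC"
print(check_reference_base("NM_000000.0:c.4T>C", toy_seq, cds_start=3))
```

In the toy case the expression claims a T at c.4 while the sequence has a C, the same kind of mismatch the slide reports for NM_007171.3:c.942T>C.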
Standardize data: what is the variation?
607008.0001
985A>G
985A>G (K304E)
A985G
ACADM, LYS304GLU
K304E
K304E (985 A->G)
K304E (K329E)
K304E only
K329E
K329E(985A>G)
LYS304GLU
Mutation c.985A>G (p.K304E)
c.985A>G
c.985A>G (p.K304E)
c.985A>G (p.Lys304Glu
includes: K304E (985A>G)
p.K304E
p.Lys329Glu
previously known as p.Lys329Glu
Analysis of ACADM 985A>G mutation

NC_000001.10:g.76226846A>G
NG_007045.1:g.41804A>G
NM_000016.4:c.985A>G
NP_000007.1:p.Lys329Glu
rs77931234
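One simple way to collapse names like these is a curated alias table keyed on a normalized form of the string. The sketch below is illustrative only: it assumes every alias has already been reviewed and confirmed to mean the ACADM variant, and the names `CANONICAL`, `ALIASES`, `normalize`, and `standardize` are hypothetical.

```python
import re

# Canonical expression taken from the slide.
CANONICAL = "NM_000016.4:c.985A>G"

# A hand-picked subset of the reported names; a real table would be curated.
ALIASES = [
    "985A>G", "A985G", "K304E", "K329E", "LYS304GLU",
    "c.985A>G", "p.K304E", "p.Lys329Glu", "K304E (985 A->G)",
]


def normalize(name: str) -> str:
    """Lower-case the name and drop whitespace and hyphens.

    Dropping hyphens also turns "A->G" into "A>G".
    """
    return re.sub(r"[\s\-]", "", name).lower()


LOOKUP = {normalize(alias): CANONICAL for alias in ALIASES}


def standardize(name: str):
    """Return the canonical expression, or None if the name is unknown."""
    return LOOKUP.get(normalize(name))


print(standardize("K304E (985A>G)"))
print(standardize("lys304glu"))
print(standardize("some other variant"))
```

A lookup table handles spelling and spacing differences but cannot resolve names it has never seen, which is why unknown strings return None for manual review rather than a guess.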