The Clinical Significance of Transcript Alignment Discrepancies
… and tools to help you deal with them.
Reece Hart, Ph.D.
rhart@23andme.com
Genentech
2014-10-16
Available on SlideShare (http://www.slideshare.net/reecehart)
The fidelity of transcript-genome mapping matters.
Variants are identified, and computed on, in genome coordinates.
Variants are analyzed and communicated using transcript coordinates.
genome to transcript (g. to c.)
transcript to genome (c. to g.)
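As a rough illustration of the g. to c. and c. to g. conversions, the sketch below maps positions through a list of exon intervals. The exon coordinates, the plus-strand orientation, and the function names are assumptions made for illustration only; real mapping also has to handle strand, CDS offsets, and indels, which this sketch ignores.

```python
from typing import List, Optional, Tuple

# Hypothetical exons as 0-based, half-open genomic intervals on the plus strand.
Exon = Tuple[int, int]


def genome_to_transcript(exons: List[Exon], g_pos: int) -> Optional[int]:
    """Return the 0-based transcript offset of g_pos, or None if it is not in an exon."""
    offset = 0
    for start, end in exons:
        if start <= g_pos < end:
            return offset + (g_pos - start)
        offset += end - start
    return None


def transcript_to_genome(exons: List[Exon], t_pos: int) -> Optional[int]:
    """Return the genomic position of transcript offset t_pos, or None if out of range."""
    if t_pos < 0:
        return None
    for start, end in exons:
        length = end - start
        if t_pos < length:
            return start + t_pos
        t_pos -= length
    return None


# Invented exon structure, for illustration only.
exons = [(100, 150), (300, 400), (700, 750)]
print(genome_to_transcript(exons, 310))  # position inside the second exon
print(transcript_to_genome(exons, 60))   # the inverse of the line above
print(genome_to_transcript(exons, 200))  # intronic position, so None
```

Both directions depend entirely on the exon list, so any disagreement about exon coordinates propagates directly into the converted position.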
Motivation 1: Discordant exon coordinates
NCBI and UCSC report different coordinates for CARD9, NM_052813.3, exon 12
[Figure: UCSC (BLAT) and NCBI (Splign) alignments; exon 12 displaced 322 nt]
Consequences: 
1. An assay that targets the wrong genomic region will generate 
uninformative sequence data. 
2. A genomic variant will be interpreted as exonic when it is 
intronic, or vice versa.
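Consequence 2 can be illustrated with a small sketch in which the same genomic position is classified against two exon sets that differ only in one displaced exon. The coordinates are invented (only the 322 nt displacement is taken from the slide), and `classify` is a hypothetical helper, not part of any tool named here.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 0-based, half-open genomic interval


def classify(exons: List[Exon], g_pos: int) -> str:
    """Label a genomic position relative to one exon set."""
    if any(start <= g_pos < end for start, end in exons):
        return "exonic"
    if exons and min(s for s, _ in exons) <= g_pos < max(e for _, e in exons):
        return "intronic"
    return "outside transcript"


# Invented coordinates: source B places the last exon 322 nt downstream of source A.
source_a = [(1000, 1100), (2000, 2150)]
source_b = [(1000, 1100), (2322, 2472)]

for pos in (2050, 2400):
    print(pos, classify(source_a, pos), classify(source_b, pos))
```

The same position is reported as exonic by one source and as intronic (or outside the transcript) by the other, which is the situation the slide warns about.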
Motivation 2: indels confound mapping
NM_006158.3 (NEFL) contains indel in CDS
Deletion justified differently!
Motivation 3: Data management challenges
➢ Mutable data (!)
➢ Sporadic failures
➢ Inconsistent data from a single source
➢ Inconsistent data across sources
➢ Opaque and implicit data definitions
➢ Historical alignment data not available

Source    AC            Reference     exons
EUtils    NM_005168.3   GRCh37.p10    1146 / 125 / 320 / 1998
          NM_005168.4   NG_008492.1   1398 / 125 / 320 / 1998
seqgene   NM_005168.3   GRCh37.p10    102 / 1046 / 125 / 321 / 143 / 1855
UCSC      NM_005168.4   hg19          1398 / 135 / 244 / 76 / 1997
Motivation 4: Use Ensembl for Variant Effect Prediction
RefAgree: Do transcript and genome sequences agree?
Transcript Equivalence: Which RefSeq and Ensembl transcripts are equivalent?
[Diagram: RefSeq (NM), Ensembl (ENST), UCSC (NM), and LRG, BIC, … shown against the Genome (GRCh37); markers ➊ SNV, ➋ Indel, ➌, ➍ Historical Transcripts]
Garla, V., Kong, Y., Szpakowski, S., & Krauthammer, M. (2011). 
MU2A--reconciling the genome and transcriptome to determine the effects of base substitutions. 
Bioinformatics (Oxford, England), 27(3), 416-8. doi:10.1093/bioinformatics/btq658 
Challenges and Solutions in Transcript Management
➢ Biological
● Alternative splicing
● Paralogs
● Natural polymorphisms
● Alternative references
➢ Technical / Logistical
● Multiple transcript sources
● Multiple alignment methods
● Multiple references
● Genome-transcript sequence differences
● Historical transcript alignments
➢ Existing resources
● RefSeq, UCSC, Ensembl
● Locus Reference Genomic
● Mutalyzer
➢ See also
● McCarthy DJ, et al. Genome Medicine 6:26 (2014).
● Garla V, et al. Bioinformatics 27(3): 416–8 (2011).
Part 1 
The Universal Transcript Archive 
UTA solves four issues with transcript management.
Transcript ≠ Genome Reference
➊ SNV (A vs. T)
➋ InDel
➌ Exon coordinate differences between sources for same accession (RefSeq NM_01234.5 vs. UCSC NM_01234.5)
➍ Historical transcript alignments no longer available (RefSeq NM_01234.4)
Universal Transcript Archive (UTA)
Multiple sources, multiple versions, multiple alignment methods in one database

exon set:
transcript    reference     method
NM_01234.4    NM_01234.4    self
NM_01234.4    NC_000012.3   splign
NM_01234.5    NM_01234.5    self
NM_01234.5    NC_000012.3   splign
NM_01234.5    AC_45678.9    splign
NM_01234.5    NC_000012.3   blat
ENST012345    ENST012345    self
ENST012345    NC_000012.3   genebuild

Each exon set has exons.

exon alignments (➊➋):
transcript    reference     exon   alignment
NM_01234.4    NC_000012.3   0      50=
NM_01234.4    NC_000012.3   1      100=1X49=
NM_01234.4    NC_000012.3   2      5=1I44=

Alignments use coordinates from source databases.
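The per-exon alignment strings on this slide look like extended CIGAR strings. Assuming the usual operators (`=` match, `X` mismatch, `I` insertion, `D` deletion), a minimal sketch for summarizing one could look like the following; `summarize_cigar` is a hypothetical helper and not part of UTA.

```python
import re
from collections import Counter

# Assumed operators: '=' match, 'X' mismatch, 'I' insertion, 'D' deletion.
CIGAR_RE = re.compile(r"(\d+)([=XID])")


def summarize_cigar(cigar: str) -> Counter:
    """Sum the run lengths of each operator in an extended CIGAR string."""
    ops = CIGAR_RE.findall(cigar)
    # Reject strings that contain anything other than <count><operator> pairs.
    if "".join(n + op for n, op in ops) != cigar:
        raise ValueError(f"unparseable CIGAR: {cigar!r}")
    totals = Counter()
    for n, op in ops:
        totals[op] += int(n)
    return totals


for cigar in ("50=", "100=1X49=", "5=1I44="):
    print(cigar, dict(summarize_cigar(cigar)))
```

Any exon whose summary contains something other than `=` differs from the reference, which is the information marked ➊ (substitutions) and ➋ (indels).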
➌ Alignments of the same accession from different sources are stored side by side (here, NM_01234.5 on NC_000012.3 by splign and by blat).
➍ Earlier transcript versions are stored alongside current ones (here, NM_01234.4 and NM_01234.5).
“RefAgree” Statistics by Protein Coding Transcript
Sequence concordance between RefSeq and GRCh37 primary assembly (➊➋)

34531 NM transcripts (Jan 2014)
  760   0.2%   with length discrepancies
 3481   10%    with substitutions
  321   0.9%   with deletions
  255   0.7%   with insertions

c.f. Garla V, et al. Bioinformatics 27(3): 416–8 (2011).
Exon structures have unique fingerprints
Identifying ENST-NM equivalences with fingerprints
=> select N.hgnc,N.es_fingerprint,N.tx_ac,E.tx_ac 
from uta_20140210.tx_exon_set_summary_mv N 
join uta_20140210.tx_exon_set_summary_mv E 
on N.es_fingerprint=E.es_fingerprint 
and N.tx_ac ~ '^NM_' and E.tx_ac ~ '^ENST' 
and N.alt_aln_method='transcript' 
and E.alt_aln_method='transcript'; 
┌─────────┬──────────────────────────────────┬────────────────┬─────────────────┐ 
│ hgnc │ es_fingerprint │ tx_ac │ tx_ac │ 
├─────────┼──────────────────────────────────┼────────────────┼─────────────────┤ 
│ AFF2 │ db0e20be1a2bb687c33227d2e6bf9d53 │ NM_002025.3 │ ENST00000370460 │ 
│ UBE3A │ d1eace7da295c45378fa5f898f2f03f6 │ NM_130838.1 │ ENST00000438097 │ 
│ ANXA8L1 │ 1f6fd4f3fe9854aa468489ec7f507512 │ NM_001098845.1 │ ENST00000359178 │ 
│ APOL5 │ 939a9e9e4a46ef9aef862cf9b369afe6 │ NM_030642.1 │ ENST00000249044 │ 
│ ARID4B │ 524fc954d10b08a4014e86aee81d0358 │ NM_016374.5 │ ENST00000264183 │ 
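One way such a fingerprint could be computed is to hash a canonical rendering of the exon coordinates, so that two transcripts with identical exon structures receive the same digest. The serialization format and the function name below are assumptions; the slides do not say how UTA derives `es_fingerprint`, only that the values shown are 32 hexadecimal characters, which is consistent with an MD5 digest.

```python
import hashlib
from typing import Iterable, Tuple


def exon_set_fingerprint(exons: Iterable[Tuple[int, int]]) -> str:
    """Hash a sorted, canonical rendering of (start, end) exon coordinates."""
    canonical = ";".join(f"{start},{end}" for start, end in sorted(exons))
    return hashlib.md5(canonical.encode("ascii")).hexdigest()


# Invented coordinates, for illustration only.
nm_exons = [(100, 150), (300, 400), (700, 750)]
enst_exons = [(700, 750), (100, 150), (300, 400)]   # same exons, different order
other_exons = [(100, 150), (300, 401), (700, 750)]  # one boundary shifted by 1

print(exon_set_fingerprint(nm_exons) == exon_set_fingerprint(enst_exons))
print(exon_set_fingerprint(nm_exons) == exon_set_fingerprint(other_exons))
```

With a fingerprint stored per exon set, finding equivalent transcripts reduces to an equality join, as in the query on this slide.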
NCBI (Splign) v. UCSC (BLAT) Alignment Statistics
Splign and BLAT provide significantly different exon structures for 886 transcripts
32358 transcripts w/exon structures: are Splign and BLAT similar?
Y: 31472 (97.3%) transcripts
N: 886 (2.7%) transcripts ➌
“similar” means either
1) identical exon coordinates, or
2) coordinates that differ only by short 3' terminal artifacts
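The two-part definition of “similar” can be sketched as a predicate over two exon lists. This is only an illustration under stated assumptions: exons are listed in transcript order on the plus strand (so the 3' terminus is the end of the last exon), and the 20 nt cutoff for “short” is invented because the slide gives no threshold.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # 0-based, half-open genomic interval


def are_similar(a: List[Exon], b: List[Exon], max_tail: int = 20) -> bool:
    """True if exon sets are identical or differ only at the 3' end of the last exon."""
    if a == b:
        return True  # case 1: identical exon coordinates
    if not a or len(a) != len(b) or a[:-1] != b[:-1]:
        return False
    (a_start, a_end), (b_start, b_end) = a[-1], b[-1]
    # case 2: same last-exon start, 3' ends within the assumed cutoff
    return a_start == b_start and abs(a_end - b_end) <= max_tail


# Invented coordinates, for illustration only.
splign = [(100, 150), (300, 400), (700, 750)]
blat_tail = [(100, 150), (300, 400), (700, 762)]     # 12 nt longer at the 3' end
blat_shifted = [(100, 150), (622, 722), (700, 750)]  # internal exon displaced

print(are_similar(splign, blat_tail))
print(are_similar(splign, blat_shifted))
```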
Characterization of transcript discrepancies
Whether alignments provided by NCBI and UCSC agree with GRCh37 primary sequence.

              BLAT
              T      F
Splign   T    14     18
         F    545    311

886 transcripts with significant discrepancies
Characterization of transcript discrepancies
Reference agreement (blue) and alignment “simplicity” (green)
[Figure: four Splign × BLAT (T/F) sub-tables; values as extracted, in slide order, with parenthesized values as shown]
  T: 200 (0), 4 (97); F: 90 (82), 16 (84)
  T: 6 (41), 12 (180)
  T: 434 (7); F: 110 (652)
  T: 14 (11)
886 transcripts with significant discrepancies
ACMG “Must Report” Genes
Green, R. C., Berg, J. S., Grody, W. W., Kalia, S. S., Korf, B. R., Martin, C. L., … Biesecker, L. G. (2013).
ACMG recommendations for reporting of incidental findings in clinical exome and genome
sequencing. Genetics in Medicine : Official Journal of the American College of Medical Genetics,
15(7), 565–74. doi:10.1038/gim.2013.73
Summary of Splign-BLAT gene-wise coordinate deltas.

delta     # genes   # ACMG must report
=0        15206     45
>=1       183       8
>=10      116       0
>=25      6         0
>=50      5         0
>=250     13        0
>=1000    94        3

delta ≝ minimum per gene of maximum per transcript of
difference of exon coordinates between NCBI and UCSC.

=0: Identical Exon Structures
>=1: (all trivial diffs) LDLR, MYL2, PRKAG2, SDHB, SDHC, TGFBR1, TGFBR2, WT1
>=1000: MYBPC3, MYH7, TNNI3
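The delta definition on this slide (minimum per gene of the maximum per transcript of exon coordinate differences) can be sketched as follows. The data layout, the gene and transcript names, and the restriction to exon sets of equal size are assumptions made to keep the example short; they are not taken from the analysis behind the slide.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]
# gene -> transcript -> (NCBI exons, UCSC exons); all names and values are invented.
Alignments = Dict[str, Dict[str, Tuple[List[Exon], List[Exon]]]]


def transcript_delta(ncbi: List[Exon], ucsc: List[Exon]) -> int:
    """Largest absolute difference between corresponding exon coordinates."""
    if len(ncbi) != len(ucsc):
        raise ValueError("this sketch only compares exon sets of equal size")
    return max(
        (max(abs(ns - us), abs(ne - ue)) for (ns, ne), (us, ue) in zip(ncbi, ucsc)),
        default=0,
    )


def gene_deltas(data: Alignments) -> Dict[str, int]:
    """delta per gene = minimum over its transcripts of the per-transcript maximum."""
    return {
        gene: min(transcript_delta(n, u) for n, u in transcripts.values())
        for gene, transcripts in data.items()
    }


data: Alignments = {
    "GENE_A": {
        "tx1": ([(100, 150), (300, 400)], [(100, 150), (300, 400)]),
        "tx2": ([(100, 150), (300, 400)], [(100, 150), (622, 722)]),
    },
    "GENE_B": {"tx3": ([(10, 60)], [(10, 72)])},
}
print(gene_deltas(data))
```

Taking the minimum across transcripts means a gene is counted as discordant only if every one of its transcripts disagrees, so the table gives a conservative per-gene view.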
Part 2 
Using HGVS “Nomenclature” 
(http://www.hgvs.org/mutnomen/) 
HGVS Python Package
http://bitbucket.org/hgvs/hgvs/
➢ Parser
● HGVS → Python object
● Based on a Parsing Expression Grammar
➢ Formatter
● Python object → HGVS
➢ Validator
● intrinsic & extrinsic validation
➢ Mapping tools (indel-aware!)
● g. ↔ c. → p. (m,n,r also supported)
● transcript-to-transcript liftover
● uses UTA data
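To give a feel for the parser and formatter steps (HGVS → Python object → HGVS), here is a deliberately small sketch that handles only simple substitutions, using a regular expression. It is not the hgvs package's parser, which per the slide is based on a Parsing Expression Grammar; `SimpleVariant` and `parse_substitution` are hypothetical names introduced for this example.

```python
import re
from typing import NamedTuple


class SimpleVariant(NamedTuple):
    ac: str    # sequence accession with version
    kind: str  # coordinate type: c, g, m, n, or r
    pos: str   # position kept as text; may carry an intronic offset such as 688+403
    ref: str
    alt: str

    def __str__(self) -> str:
        # Formatter step: object back to an HGVS-style string.
        return f"{self.ac}:{self.kind}.{self.pos}{self.ref}>{self.alt}"


_SUB_RE = re.compile(
    r"^(?P<ac>[A-Za-z0-9_]+\.\d+):(?P<kind>[cgmnr])\."
    r"(?P<pos>[-*]?\d+(?:[+-]\d+)?)(?P<ref>[ACGTU])>(?P<alt>[ACGTU])$"
)


def parse_substitution(text: str) -> SimpleVariant:
    """Parser step: simple substitution string to a Python object."""
    m = _SUB_RE.match(text)
    if m is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    return SimpleVariant(**m.groupdict())


v = parse_substitution("NM_182763.2:c.688+403C>T")
print(v.ac, v.kind, v.pos, v.ref, v.alt)
print(str(v))
```

A regular expression is enough for this one variant form but does not scale to the full nomenclature (ranges, deletions, duplications, protein changes), which is one reason a grammar-based parser is the better fit.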
Example: Variant liftover between transcripts
Map
from ➀ NM_182763.2:c.688+403C>T
to ➁ NC_000001.10:g.150550916G>A
to ➂ NM_001197320.1:c.281C>T
with Splign alignments
[Diagram: ➀ on NM_182763.2 (NP_877495.1) and ➂ on NM_001197320.1 (NP_001184249.1), both aligned to ➁ on NC_000001.10]
Developer Info
Testing
➢ 91% code coverage
➢ 25665 test variants
● ~200 hand curated, rest from dbSNP
● 23436 sub, 1254 del, 908 ins, 45 delins, 22 dup
● 44 distinct transcripts, many selected for difficulty
➢ >99% concordance with Mutalyzer
● using >100K variants from ClinVar
Upcoming directions
(all issues are publicly readable)
➢ multi-variant alleles
➢ release LRG
➢ GRCh38
➢ API changes
Conclusions
➢ The fidelity of reference-transcript mapping matters
● For 886 transcripts, Splign and BLAT generate significantly different alignments
● These differences might affect the interpretation of clinically-relevant genes (including 3 ACMG must report genes)
➢ Current resources have important limitations
➢ Two tools may help you deal with these limitations
● UTA – Freely available archive of transcripts from multiple sources
● HGVS – Comprehensive parsing, formatting, manipulation, and validation of variants
Acknowledgements
➢ Invitae 
● Vince Fusaro 
● John Garcia 
● Emily Hare 
● Kevin Jacobs 
● Geoff Nilsen 
● Rudy Rico 
● Jody Westbrook 
http://goo.gl/dq2uoW
http://bitbucket.com/hgvs/hgvs 
http://bitbucket.com/uta/uta 
➢ Code (Python) 
➢ Documentation & Examples 
➢ Issues 
➢ BED files 
➢ Code testing is public 
Or just: 
pip install hgvs

More Related Content

Similar to Clinical significance of transcript alignment discrepancies gne - 20141016

genomeannotation2013-140127002622-phpapp02.ppt
genomeannotation2013-140127002622-phpapp02.pptgenomeannotation2013-140127002622-phpapp02.ppt
genomeannotation2013-140127002622-phpapp02.ppt
MohamedHasan816582
 
Church_GenomeAccess_2013_genome2013
Church_GenomeAccess_2013_genome2013Church_GenomeAccess_2013_genome2013
Church_GenomeAccess_2013_genome2013
Deanna Church
 
Adaptive Molecular Evolution (dN/dS). 2011
Adaptive Molecular Evolution (dN/dS). 2011Adaptive Molecular Evolution (dN/dS). 2011
Adaptive Molecular Evolution (dN/dS). 2011
Hernán Dopazo
 
MS thesis presentation_FINAL
MS thesis presentation_FINALMS thesis presentation_FINAL
MS thesis presentation_FINALTom Hajek
 
A Genome Sequence Analysis System Built with Hypertable
A Genome Sequence Analysis System Built with HypertableA Genome Sequence Analysis System Built with Hypertable
A Genome Sequence Analysis System Built with Hypertable
DATAVERSITY
 
Generalizing phylogenetics to infer patterns predicted by processes of divers...
Generalizing phylogenetics to infer patterns predicted by processes of divers...Generalizing phylogenetics to infer patterns predicted by processes of divers...
Generalizing phylogenetics to infer patterns predicted by processes of divers...
Jamie Oaks
 
Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Paul Gardner
 
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
IJERD Editor
 
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Elia Brodsky
 
The Clinical Significance of Transcript Alignment Discrepancies … and tools t...
The Clinical Significance of Transcript Alignment Discrepancies … and tools t...The Clinical Significance of Transcript Alignment Discrepancies … and tools t...
The Clinical Significance of Transcript Alignment Discrepancies … and tools t...Human Variome Project
 
Sept2016 plenary mercer_sequins
Sept2016 plenary mercer_sequinsSept2016 plenary mercer_sequins
Sept2016 plenary mercer_sequins
GenomeInABottle
 
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble ApproachDetecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Hong ChangBum
 
SNP genotyping using Affymetrix' Axiom Genotyping Solution
SNP genotyping using Affymetrix' Axiom Genotyping SolutionSNP genotyping using Affymetrix' Axiom Genotyping Solution
SNP genotyping using Affymetrix' Axiom Genotyping Solution
Affymetrix
 
proteome.pptx
proteome.pptxproteome.pptx
proteome.pptx
MohamedHasan816582
 
SAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene ExpressionSAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene Expression
Aashish Patel
 
Towards a systems-level understanding of RNA secondary structure and interact...
Towards a systems-level understanding of RNA secondary structure and interact...Towards a systems-level understanding of RNA secondary structure and interact...
Towards a systems-level understanding of RNA secondary structure and interact...
Alexander Junge
 
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Databricks
 
20100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture0720100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture07Computer Science Club
 

Similar to Clinical significance of transcript alignment discrepancies gne - 20141016 (20)

genomeannotation2013-140127002622-phpapp02.ppt
genomeannotation2013-140127002622-phpapp02.pptgenomeannotation2013-140127002622-phpapp02.ppt
genomeannotation2013-140127002622-phpapp02.ppt
 
20150115_JQO_NYAPopulationGenomics
20150115_JQO_NYAPopulationGenomics20150115_JQO_NYAPopulationGenomics
20150115_JQO_NYAPopulationGenomics
 
Church_GenomeAccess_2013_genome2013
Church_GenomeAccess_2013_genome2013Church_GenomeAccess_2013_genome2013
Church_GenomeAccess_2013_genome2013
 
Adaptive Molecular Evolution (dN/dS). 2011
Adaptive Molecular Evolution (dN/dS). 2011Adaptive Molecular Evolution (dN/dS). 2011
Adaptive Molecular Evolution (dN/dS). 2011
 
MS thesis presentation_FINAL
MS thesis presentation_FINALMS thesis presentation_FINAL
MS thesis presentation_FINAL
 
A Genome Sequence Analysis System Built with Hypertable
A Genome Sequence Analysis System Built with HypertableA Genome Sequence Analysis System Built with Hypertable
A Genome Sequence Analysis System Built with Hypertable
 
Generalizing phylogenetics to infer patterns predicted by processes of divers...
Generalizing phylogenetics to infer patterns predicted by processes of divers...Generalizing phylogenetics to infer patterns predicted by processes of divers...
Generalizing phylogenetics to infer patterns predicted by processes of divers...
 
Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...Avoidance of stochastic RNA interactions can be harnessed to control protein ...
Avoidance of stochastic RNA interactions can be harnessed to control protein ...
 
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
 
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
Mastering RNA-Seq (NGS Data Analysis) - A Critical Approach To Transcriptomic...
 
The Clinical Significance of Transcript Alignment Discrepancies … and tools t...
The Clinical Significance of Transcript Alignment Discrepancies … and tools t...The Clinical Significance of Transcript Alignment Discrepancies … and tools t...
The Clinical Significance of Transcript Alignment Discrepancies … and tools t...
 
Sept2016 plenary mercer_sequins
Sept2016 plenary mercer_sequinsSept2016 plenary mercer_sequins
Sept2016 plenary mercer_sequins
 
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble ApproachDetecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
Detecting Somatic Mutations in Impure Cancer Sample - Ensemble Approach
 
SNP genotyping using Affymetrix' Axiom Genotyping Solution
SNP genotyping using Affymetrix' Axiom Genotyping SolutionSNP genotyping using Affymetrix' Axiom Genotyping Solution
SNP genotyping using Affymetrix' Axiom Genotyping Solution
 
proteome.pptx
proteome.pptxproteome.pptx
proteome.pptx
 
final_presentation
final_presentationfinal_presentation
final_presentation
 
SAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene ExpressionSAGE- Serial Analysis of Gene Expression
SAGE- Serial Analysis of Gene Expression
 
Towards a systems-level understanding of RNA secondary structure and interact...
Towards a systems-level understanding of RNA secondary structure and interact...Towards a systems-level understanding of RNA secondary structure and interact...
Towards a systems-level understanding of RNA secondary structure and interact...
 
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
Building Genomic Data Processing and Machine Learning Workflows Using Apache ...
 
20100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture0720100515 bioinformatics kapushesky_lecture07
20100515 bioinformatics kapushesky_lecture07
 

More from Reece Hart

HGVS 2015 poster: hgvs, uta, variantanalyzer
HGVS 2015 poster: hgvs, uta, variantanalyzerHGVS 2015 poster: hgvs, uta, variantanalyzer
HGVS 2015 poster: hgvs, uta, variantanalyzerReece Hart
 
Invitae PSB 2014 poster
Invitae PSB 2014 posterInvitae PSB 2014 poster
Invitae PSB 2014 poster
Reece Hart
 
ASHG 2012 Poster
ASHG 2012 PosterASHG 2012 Poster
ASHG 2012 Poster
Reece Hart
 
Building a clinical genome interpretation services company
Building a clinical genome interpretation services companyBuilding a clinical genome interpretation services company
Building a clinical genome interpretation services company
Reece Hart
 
Bio-IT 2010 Genome Commons
Bio-IT 2010 Genome CommonsBio-IT 2010 Genome Commons
Bio-IT 2010 Genome CommonsReece Hart
 
HVP Critical Assessment of Genome Interpretation
HVP Critical Assessment of Genome InterpretationHVP Critical Assessment of Genome Interpretation
HVP Critical Assessment of Genome Interpretation
Reece Hart
 
A Tour of Research Computing at Genentech
A Tour of Research Computing at GenentechA Tour of Research Computing at Genentech
A Tour of Research Computing at GenentechReece Hart
 
Integrating Public and Private Data: Lessons Learned from Unison
Integrating Public and Private Data: Lessons Learned from UnisonIntegrating Public and Private Data: Lessons Learned from Unison
Integrating Public and Private Data: Lessons Learned from UnisonReece Hart
 

More from Reece Hart (8)

HGVS 2015 poster: hgvs, uta, variantanalyzer
HGVS 2015 poster: hgvs, uta, variantanalyzerHGVS 2015 poster: hgvs, uta, variantanalyzer
HGVS 2015 poster: hgvs, uta, variantanalyzer
 
Invitae PSB 2014 poster
Invitae PSB 2014 posterInvitae PSB 2014 poster
Invitae PSB 2014 poster
 
ASHG 2012 Poster
ASHG 2012 PosterASHG 2012 Poster
ASHG 2012 Poster
 
Building a clinical genome interpretation services company
Building a clinical genome interpretation services companyBuilding a clinical genome interpretation services company
Building a clinical genome interpretation services company
 
Bio-IT 2010 Genome Commons
Bio-IT 2010 Genome CommonsBio-IT 2010 Genome Commons
Bio-IT 2010 Genome Commons
 
HVP Critical Assessment of Genome Interpretation
HVP Critical Assessment of Genome InterpretationHVP Critical Assessment of Genome Interpretation
HVP Critical Assessment of Genome Interpretation
 
A Tour of Research Computing at Genentech
A Tour of Research Computing at GenentechA Tour of Research Computing at Genentech
A Tour of Research Computing at Genentech
 
Integrating Public and Private Data: Lessons Learned from Unison
Integrating Public and Private Data: Lessons Learned from UnisonIntegrating Public and Private Data: Lessons Learned from Unison
Integrating Public and Private Data: Lessons Learned from Unison
 

Recently uploaded

ABDOMINAL TRAUMA in pediatrics part one.
ABDOMINAL TRAUMA in pediatrics part one.ABDOMINAL TRAUMA in pediatrics part one.
ABDOMINAL TRAUMA in pediatrics part one.
drhasanrajab
 
basicmodesofventilation2022-220313203758.pdf
basicmodesofventilation2022-220313203758.pdfbasicmodesofventilation2022-220313203758.pdf
basicmodesofventilation2022-220313203758.pdf
aljamhori teaching hospital
 
Temporomandibular Joint By RABIA INAM GANDAPORE.pptx
Temporomandibular Joint By RABIA INAM GANDAPORE.pptxTemporomandibular Joint By RABIA INAM GANDAPORE.pptx
Temporomandibular Joint By RABIA INAM GANDAPORE.pptx
Dr. Rabia Inam Gandapore
 
Maxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptx
Maxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptxMaxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptx
Maxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptx
Dr. Rabia Inam Gandapore
 
Netter's Atlas of Human Anatomy 7.ed.pdf
Netter's Atlas of Human Anatomy 7.ed.pdfNetter's Atlas of Human Anatomy 7.ed.pdf
Netter's Atlas of Human Anatomy 7.ed.pdf
BrissaOrtiz3
 
Pictures of Superficial & Deep Fascia.ppt.pdf
Pictures of Superficial & Deep Fascia.ppt.pdfPictures of Superficial & Deep Fascia.ppt.pdf
Pictures of Superficial & Deep Fascia.ppt.pdf
Dr. Rabia Inam Gandapore
 
Colonic and anorectal physiology with surgical implications
Colonic and anorectal physiology with surgical implicationsColonic and anorectal physiology with surgical implications
Colonic and anorectal physiology with surgical implications
Dr Maria Tamanna
 
Ophthalmology Clinical Tests for OSCE exam
Ophthalmology Clinical Tests for OSCE examOphthalmology Clinical Tests for OSCE exam
Ophthalmology Clinical Tests for OSCE exam
KafrELShiekh University
 
The Electrocardiogram - Physiologic Principles
The Electrocardiogram - Physiologic PrinciplesThe Electrocardiogram - Physiologic Principles
The Electrocardiogram - Physiologic Principles
MedicoseAcademics
 
Identification and nursing management of congenital malformations .pptx
Identification and nursing management of congenital malformations .pptxIdentification and nursing management of congenital malformations .pptx
Identification and nursing management of congenital malformations .pptx
MGM SCHOOL/COLLEGE OF NURSING
 
Top-Vitamin-Supplement-Brands-in-India List
Top-Vitamin-Supplement-Brands-in-India ListTop-Vitamin-Supplement-Brands-in-India List
Top-Vitamin-Supplement-Brands-in-India List
SwisschemDerma
 
Ozempic: Preoperative Management of Patients on GLP-1 Receptor Agonists
Ozempic: Preoperative Management of Patients on GLP-1 Receptor Agonists  Ozempic: Preoperative Management of Patients on GLP-1 Receptor Agonists
Ozempic: Preoperative Management of Patients on GLP-1 Receptor Agonists
Saeid Safari
 
Physiology of Special Chemical Sensation of Taste
Physiology of Special Chemical Sensation of TastePhysiology of Special Chemical Sensation of Taste
Physiology of Special Chemical Sensation of Taste
MedicoseAcademics
 
Physiology of Chemical Sensation of smell.pdf
Physiology of Chemical Sensation of smell.pdfPhysiology of Chemical Sensation of smell.pdf
Physiology of Chemical Sensation of smell.pdf
MedicoseAcademics
 
ANATOMY AND PHYSIOLOGY OF URINARY SYSTEM.pptx
ANATOMY AND PHYSIOLOGY OF URINARY SYSTEM.pptxANATOMY AND PHYSIOLOGY OF URINARY SYSTEM.pptx
ANATOMY AND PHYSIOLOGY OF URINARY SYSTEM.pptx
Swetaba Besh
 
Integrating Ayurveda into Parkinson’s Management: A Holistic Approach
Integrating Ayurveda into Parkinson’s Management: A Holistic ApproachIntegrating Ayurveda into Parkinson’s Management: A Holistic Approach
Integrating Ayurveda into Parkinson’s Management: A Holistic Approach
Ayurveda ForAll
 
Cervical & Brachial Plexus By Dr. RIG.pptx
Cervical & Brachial Plexus By Dr. RIG.pptxCervical & Brachial Plexus By Dr. RIG.pptx
Cervical & Brachial Plexus By Dr. RIG.pptx
Dr. Rabia Inam Gandapore
 
Aortic Association CBL Pilot April 19 – 20 Bern
Aortic Association CBL Pilot April 19 – 20 BernAortic Association CBL Pilot April 19 – 20 Bern
Aortic Association CBL Pilot April 19 – 20 Bern
suvadeepdas911
 
Dehradun #ℂall #gIRLS Oyo Hotel 8107221448 #ℂall #gIRL in Dehradun
Dehradun #ℂall #gIRLS Oyo Hotel 8107221448 #ℂall #gIRL in DehradunDehradun #ℂall #gIRLS Oyo Hotel 8107221448 #ℂall #gIRL in Dehradun
Dehradun #ℂall #gIRLS Oyo Hotel 8107221448 #ℂall #gIRL in Dehradun
chandankumarsmartiso
 
Top 10 Best Ayurvedic Kidney Stone Syrups in India
Top 10 Best Ayurvedic Kidney Stone Syrups in IndiaTop 10 Best Ayurvedic Kidney Stone Syrups in India
Top 10 Best Ayurvedic Kidney Stone Syrups in India
Swastik Ayurveda
 

Recently uploaded (20)

ABDOMINAL TRAUMA in pediatrics part one.
ABDOMINAL TRAUMA in pediatrics part one.ABDOMINAL TRAUMA in pediatrics part one.
ABDOMINAL TRAUMA in pediatrics part one.
 
basicmodesofventilation2022-220313203758.pdf
basicmodesofventilation2022-220313203758.pdfbasicmodesofventilation2022-220313203758.pdf
basicmodesofventilation2022-220313203758.pdf
 
Temporomandibular Joint By RABIA INAM GANDAPORE.pptx
Temporomandibular Joint By RABIA INAM GANDAPORE.pptxTemporomandibular Joint By RABIA INAM GANDAPORE.pptx
Temporomandibular Joint By RABIA INAM GANDAPORE.pptx
 
Maxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptx
Maxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptxMaxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptx
Maxilla, Mandible & Hyoid Bone & Clinical Correlations by Dr. RIG.pptx
 
Netter's Atlas of Human Anatomy 7.ed.pdf
Netter's Atlas of Human Anatomy 7.ed.pdfNetter's Atlas of Human Anatomy 7.ed.pdf
Netter's Atlas of Human Anatomy 7.ed.pdf
 
Pictures of Superficial & Deep Fascia.ppt.pdf
Pictures of Superficial & Deep Fascia.ppt.pdfPictures of Superficial & Deep Fascia.ppt.pdf
Pictures of Superficial & Deep Fascia.ppt.pdf
 
Colonic and anorectal physiology with surgical implications
Colonic and anorectal physiology with surgical implicationsColonic and anorectal physiology with surgical implications
Colonic and anorectal physiology with surgical implications
 
Ophthalmology Clinical Tests for OSCE exam
Ophthalmology Clinical Tests for OSCE examOphthalmology Clinical Tests for OSCE exam
Ophthalmology Clinical Tests for OSCE exam
 
The Electrocardiogram - Physiologic Principles
The Electrocardiogram - Physiologic PrinciplesThe Electrocardiogram - Physiologic Principles
The Electrocardiogram - Physiologic Principles
 
Identification and nursing management of congenital malformations .pptx
Identification and nursing management of congenital malformations .pptxIdentification and nursing management of congenital malformations .pptx
Identification and nursing management of congenital malformations .pptx
 
Top-Vitamin-Supplement-Brands-in-India List
Top-Vitamin-Supplement-Brands-in-India ListTop-Vitamin-Supplement-Brands-in-India List

Clinical significance of transcript alignment discrepancies gne - 20141016

  • 1. The Clinical Significance of Transcript Alignment Discrepancies … and tools to help you deal with them. Reece Hart, Ph.D. rhart@23andme.com Genentech 2014-10-16 Available on SlideShare (http://www.slideshare.net/reecehart)
  • 2. The fidelity of transcript-genome mapping matters. Variants are identified and computed on using genome coordinates; variants are analyzed and communicated using transcript coordinates. Mapping runs in both directions: genome to transcript (g. to c.) and transcript to genome (c. to g.).
  • 3. Motivation 1: Discordant exon coordinates. NCBI and UCSC report different coordinates for CARD9, NM_052813.3, exon 12: exon 12 is displaced 322 nt between the UCSC (BLAT) and NCBI (Splign) alignments. Consequences: 1. An assay that targets the wrong genomic region will generate uninformative sequence data. 2. A genomic variant will be interpreted as exonic when it is intronic, or vice versa.
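The second consequence can be illustrated with a minimal Python sketch. The coordinates below are hypothetical (they are not the real CARD9 alignment); only the 322 nt displacement is taken from the slide. The function name `classify` and the interval convention are assumptions made for this illustration.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end): 0-based, half-open genomic interval


def classify(pos: int, exons: List[Exon]) -> str:
    """Classify a genomic position relative to one exon structure."""
    exons = sorted(exons)
    if pos < exons[0][0] or pos >= exons[-1][1]:
        return "outside transcript"
    if any(start <= pos < end for start, end in exons):
        return "exonic"
    return "intronic"


# Hypothetical coordinates: the two sources agree on the first exon
# but place the last exon 322 nt apart.
exons_source_a = [(1000, 1200), (5000, 5150)]
exons_source_b = [(1000, 1200), (5322, 5472)]

variant_pos = 5100
for name, exons in (("source A", exons_source_a), ("source B", exons_source_b)):
    print(name, classify(variant_pos, exons))
```

The same genomic position is reported as exonic under one exon structure and intronic under the other, which is the interpretation risk the slide describes.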
  • 4. Motivation 2: indels confound mapping. NM_006158.3 (NEFL) contains an indel in the CDS. Deletion justified differently!
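Why a deletion can be "justified differently" is easiest to see in a repeat: removing any one base of a homopolymer run yields the same sequence, so two aligners may report different positions for the same event. The sketch below is an illustration on a made-up sequence, not the NEFL alignment; `shift_deletion` is a hypothetical helper.

```python
def shift_deletion(seq: str, start: int, length: int, direction: str) -> int:
    """Slide a deletion of `length` bases (0-based `start`) as far left or
    right as possible without changing the sequence left after deletion."""
    if direction == "left":
        # A one-base left shift is neutral when the base entering the deleted
        # window equals the base leaving it.
        while start > 0 and seq[start - 1] == seq[start + length - 1]:
            start -= 1
    elif direction == "right":
        while start + length < len(seq) and seq[start] == seq[start + length]:
            start += 1
    else:
        raise ValueError("direction must be 'left' or 'right'")
    return start


def apply_deletion(seq: str, start: int, length: int) -> str:
    """Return the sequence with seq[start:start+length] removed."""
    return seq[:start] + seq[start + length:]


seq = "GATTTTCA"  # made-up reference with a run of four T
left = shift_deletion(seq, 3, 1, "left")
right = shift_deletion(seq, 3, 1, "right")
print(left, right)
print(apply_deletion(seq, left, 1) == apply_deletion(seq, right, 1))
```

Both placements describe the same edited sequence, so a mapping tool has to pick one convention and apply it consistently.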
  • 5. Motivation 3: Data management challenges ➢ Mutable data (!) ➢ Sporadic failures ➢ Inconsistent data from a single source ➢ Inconsistent data across sources ➢ Opaque and implicit data definitions ➢ Historical alignment data not available
      Source  | AC          | Reference   | exons
      EUtils  | NM_005168.3 | GRCh37.p10  | 1146 / 125 / 320 / 1998
              | NM_005168.4 | NG_008492.1 | 1398 / 125 / 320 / 1998
      seqgene | NM_005168.3 | GRCh37.p10  | 102 / 1046 / 125 / 321 / 143 / 1855
      UCSC    | NM_005168.4 | hg19        | 1398 / 135 / 244 / 76 / 1997
  • 6. Motivation 4: Use Ensembl for Variant Effect Prediction. RefAgree: Do transcript and genome sequences agree? Transcript Equivalence: Which RefSeq and Ensembl transcripts are equivalent? [Figure: RefSeq (NM), Ensembl (ENST), UCSC (NM), historical transcripts, and LRG, BIC, … shown against the genome (GRCh37); labels ➊ SNV, ➋ Indel, ➌, ➍]
  • 7. Garla, V., Kong, Y., Szpakowski, S., & Krauthammer, M. (2011). MU2A--reconciling the genome and transcriptome to determine the effects of base substitutions. Bioinformatics (Oxford, England), 27(3), 416-8. doi:10.1093/bioinformatics/btq658
  • 8. Challenges and Solutions in Transcript Management ➢ Biological ● Alternative splicing ● Paralogs ● Natural polymorphisms ● Alternative references ➢ Technical / Logistical ● Multiple transcript sources ● Multiple alignment methods ● Multiple references ● Genome-transcript sequence differences ● Historical transcript alignments ➢ Existing resources ● RefSeq, UCSC, Ensembl ● Locus Reference Genomic ● Mutalyzer ➢ See also ● McCarthy DJ, et al. Genome Medicine 6:26 (2014). ● Garla V, et al. Bioinformatics 27(3): 416–8 (2011).
  • 9.
  • 10. Part 1: The Universal Transcript Archive
  • 11. UTA solves four issues with transcript management. Transcript ≠ Genome Reference: ➊ SNV (the figure shows A vs. T), ➋ InDel. Exon coordinate differences between sources for the same accession (RefSeq NM_01234.5 vs. UCSC NM_01234.5). Historical transcript alignments no longer available (RefSeq NM_01234.4). [Figure labels ➌ and ➍ mark the latter two issues.]
  • 12. Universal Transcript Archive (UTA): Multiple sources, multiple versions, multiple alignment methods in one database
      transcript | reference   | method
      NM_01234.4 | NM_01234.4  | self
      NM_01234.4 | NC_000012.3 | splign
      NM_01234.5 | NM_01234.5  | self
      NM_01234.5 | NC_000012.3 | splign
      NM_01234.5 | AC_45678.9  | splign
      NM_01234.5 | NC_000012.3 | blat
      ENST012345 | ENST012345  | self
      ENST012345 | NC_000012.3 | genebuild
      Figure labels: exons; exon set.
  • 13. Universal Transcript Archive (UTA): Multiple sources, multiple versions, multiple alignment methods in one database. [Same transcript / reference / method table as slide 12.] Exon alignments (transcript, reference, exon number, alignment string):
      NM_01234.4 | NC_000012.3 | 0 | 50=
      NM_01234.4 | NC_000012.3 | 1 | 100=1X49=
      NM_01234.4 | NC_000012.3 | 2 | 5=1I44=
      ➊➋ Alignments use coordinates from source databases.
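The per-exon alignment strings on this slide can be read as extended CIGAR strings, written here with `=` for a run of matches, `X` for a mismatch, and `I`/`D` for an insertion or deletion. That reading is an assumption; the slide does not name the format. Under that assumption, a small sketch for summarizing one string follows (`summarize_cigar` and `agrees_with_reference` are hypothetical helpers, not UTA functions).

```python
import re
from typing import Dict

_CIGAR_OP = re.compile(r"(\d+)([=XID])")


def summarize_cigar(cigar: str) -> Dict[str, int]:
    """Total the bases covered by each operation in an extended CIGAR string."""
    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    pos = 0
    for match in _CIGAR_OP.finditer(cigar):
        if match.start() != pos:  # unparsed characters between operations
            raise ValueError(f"cannot parse CIGAR: {cigar!r}")
        counts[match.group(2)] += int(match.group(1))
        pos = match.end()
    if pos != len(cigar):
        raise ValueError(f"cannot parse CIGAR: {cigar!r}")
    return counts


def agrees_with_reference(cigar: str) -> bool:
    """True when the exon aligns with matches only (no X, I, or D)."""
    counts = summarize_cigar(cigar)
    return counts["X"] == counts["I"] == counts["D"] == 0


for exon, cigar in enumerate(["50=", "100=1X49=", "5=1I44="]):
    print(exon, summarize_cigar(cigar), agrees_with_reference(cigar))
```

On the slide's three exons, only exon 0 aligns without a substitution or indel; exons 1 and 2 carry the kinds of differences labeled ➊ and ➋.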
  • 14. Universal Transcript Archive (UTA): Multiple sources, multiple versions, multiple alignment methods in one database. [Same transcript / reference / method table as slide 12, annotated with ➌.]
  • 15. Universal Transcript Archive (UTA): Multiple sources, multiple versions, multiple alignment methods in one database. [Same transcript / reference / method table as slide 12, annotated with ➍.]
  • 16. “RefAgree” Statistics by Protein Coding Transcript: Sequence concordance between RefSeq and GRCh37 primary assembly ➊➋
      34531 NM transcripts (Jan 2014)
        760 |  0.2% | with length discrepancies
       3481 | 10%   | with substitutions
        321 |  0.9% | with deletions
        255 |  0.7% | with insertions
      cf. Garla V, et al. Bioinformatics 27(3): 416–8 (2011).
  • 17. Exon structures have unique fingerprints: Identifying ENST-NM equivalences with fingerprints
      => select N.hgnc, N.es_fingerprint, N.tx_ac, E.tx_ac
         from uta_20140210.tx_exon_set_summary_mv N
         join uta_20140210.tx_exon_set_summary_mv E
           on N.es_fingerprint = E.es_fingerprint
          and N.tx_ac ~ '^NM_' and E.tx_ac ~ '^ENST'
          and N.alt_aln_method = 'transcript' and E.alt_aln_method = 'transcript';
      ┌─────────┬──────────────────────────────────┬────────────────┬─────────────────┐
      │ hgnc    │ es_fingerprint                   │ tx_ac          │ tx_ac           │
      ├─────────┼──────────────────────────────────┼────────────────┼─────────────────┤
      │ AFF2    │ db0e20be1a2bb687c33227d2e6bf9d53 │ NM_002025.3    │ ENST00000370460 │
      │ UBE3A   │ d1eace7da295c45378fa5f898f2f03f6 │ NM_130838.1    │ ENST00000438097 │
      │ ANXA8L1 │ 1f6fd4f3fe9854aa468489ec7f507512 │ NM_001098845.1 │ ENST00000359178 │
      │ APOL5   │ 939a9e9e4a46ef9aef862cf9b369afe6 │ NM_030642.1    │ ENST00000249044 │
      │ ARID4B  │ 524fc954d10b08a4014e86aee81d0358 │ NM_016374.5    │ ENST00000264183 │
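The idea behind the fingerprint join can be sketched in Python: hash a canonical form of each transcript's exon structure, then group accessions by hash. How UTA actually computes `es_fingerprint` is not stated on the slide, so the canonical string and the use of MD5 below are assumptions (the 32-hex-digit values shown are merely consistent with an MD5 digest). The accessions are the placeholder ones used on earlier slides.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end) in transcript coordinates


def exon_set_fingerprint(exons: List[Exon]) -> str:
    """Hash a canonical, order-independent rendering of an exon structure.

    Assumed scheme for illustration only; not UTA's documented algorithm.
    """
    canonical = ";".join(f"{start},{end}" for start, end in sorted(exons))
    return hashlib.md5(canonical.encode("ascii")).hexdigest()


def group_by_fingerprint(exon_sets: Dict[str, List[Exon]]) -> Dict[str, List[str]]:
    """Map each fingerprint to the accessions that share it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for accession, exons in exon_sets.items():
        groups[exon_set_fingerprint(exons)].append(accession)
    return dict(groups)


# Placeholder accessions with made-up exon structures.
exon_sets = {
    "NM_01234.5": [(0, 50), (50, 200), (200, 250)],
    "ENST012345": [(0, 50), (50, 200), (200, 250)],
    "NM_01234.4": [(0, 50), (50, 210), (210, 260)],
}
for fingerprint, accessions in group_by_fingerprint(exon_sets).items():
    print(fingerprint, accessions)
```

Transcripts that land in the same group have identical exon structures, which is the equivalence the SQL join on `es_fingerprint` reports.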
  • 18. NCBI (Splign) v. UCSC (BLAT) Alignment Statistics: Splign and BLAT provide significantly different exon structures for 886 transcripts. Of 32358 transcripts w/exon structures ➌, are Splign and BLAT similar? Yes: 31472 (97.3%) transcripts. No: 886 (2.7%) transcripts. “Similar” means either 1) identical exon coordinates, or 2) coordinates that differ only by short 3' terminal artifacts.
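The two-part definition of "similar" can be written as a predicate. The slide does not say how short a "short 3' terminal artifact" is, so the `max_artifact` threshold is a hypothetical parameter; the sketch also assumes exons are listed 5'→3' with coordinates oriented along the transcript strand, so that the 3' terminus is the end of the last exon.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), oriented 5'->3' along the transcript


def similar(a: List[Exon], b: List[Exon], max_artifact: int = 5) -> bool:
    """True if the exon structures are identical, or differ only at the
    3' end of the last exon by at most `max_artifact` nt (assumed cutoff)."""
    if a == b:
        return True
    if len(a) != len(b) or a[:-1] != b[:-1]:
        return False
    (a_start, a_end), (b_start, b_end) = a[-1], b[-1]
    return a_start == b_start and abs(a_end - b_end) <= max_artifact


splign = [(0, 100), (200, 300)]
print(similar(splign, [(0, 100), (200, 300)]))  # identical coordinates
print(similar(splign, [(0, 100), (200, 303)]))  # short 3' terminal difference
print(similar(splign, [(0, 100), (522, 622)]))  # displaced exon
```

Only the third comparison counts as a significant discrepancy under this definition.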
  • 19. Characterization of transcript discrepancies: Whether alignments provided by NCBI and UCSC agree with GRCh37 primary sequence (886 transcripts with significant discrepancies).
      Splign / BLAT | T   | F
      T             | 14  | 18
      F             | 545 | 311
  • 20. Characterization of transcript discrepancies: Reference agreement (blue) and alignment “simplicity” (green), for the 886 transcripts with significant discrepancies. [Figure: the 2×2 Splign / BLAT reference-agreement table (14, 18, 545, 311), with each cell broken down into a Splign / BLAT T/F sub-table. Sub-table values as extracted: 200 (0), 4 (97), 90 (82), 16 (84); 6 (41), 12 (180); 434 (7), 110 (652); 14 (11).]
  • 21. ACMG “Must Report” Genes. Green, R. C., Berg, J. S., Grody, W. W., Kalia, S. S., Korf, B. R., Martin, C. L., … Biesecker, L. G. (2013). ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing. Genetics in Medicine : Official Journal of the American College of Medical Genetics, 15(7), 565–74. doi:10.1038/gim.2013.73
  • 22. Summary of Splign-BLAT gene-wise coordinate deltas.
      delta   | # genes | # ACMG must report
      =0      | 15206   | 45
      >=1     | 183     | 8
      >=10    | 116     | 0
      >=25    | 6       | 0
      >=50    | 5       | 0
      >=250   | 13      | 0
      >=1000  | 94      | 3
      delta ≝ minimum per gene of maximum per transcript of difference of exon coordinates between NCBI and UCSC.
      delta =0: Identical Exon Structures
      ACMG must-report genes with delta >=1 (all trivial diffs): LDLR, MYL2, PRKAG2, SDHB, SDHC, TGFBR1, TGFBR2, WT1
      ACMG must-report genes with delta >=1000: MYBPC3, MYH7, TNNI3
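The delta definition (minimum per gene of the maximum per transcript of exon coordinate differences) can be sketched as two nested reductions. The sketch assumes both sources report the same number of exons for a transcript; the slide does not say how differing exon counts are handled. Transcript names and coordinates are made up.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # (start, end) genomic coordinates


def transcript_delta(ncbi: List[Exon], ucsc: List[Exon]) -> int:
    """Largest absolute difference over all exon start and end coordinates."""
    if len(ncbi) != len(ucsc):
        raise ValueError("exon counts differ; not handled in this sketch")
    return max(
        max(abs(n_start - u_start), abs(n_end - u_end))
        for (n_start, n_end), (u_start, u_end) in zip(ncbi, ucsc)
    )


def gene_delta(transcripts: Dict[str, Tuple[List[Exon], List[Exon]]]) -> int:
    """Minimum over a gene's transcripts of the per-transcript maximum."""
    return min(transcript_delta(ncbi, ucsc) for ncbi, ucsc in transcripts.values())


# Hypothetical gene with two transcripts: (NCBI exons, UCSC exons).
gene = {
    "tx_a": ([(100, 200), (300, 400)], [(100, 200), (622, 722)]),
    "tx_b": ([(100, 200), (300, 400)], [(100, 200), (300, 403)]),
}
print({name: transcript_delta(n, u) for name, (n, u) in gene.items()})
print(gene_delta(gene))
```

Taking the minimum across transcripts means a gene is only counted in a high-delta row when every one of its transcripts shows a large discrepancy.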
  • 23. Part 2: Using HGVS “Nomenclature” (http://www.hgvs.org/mutnomen/)
  • 24. HGVS Python Package (http://bitbucket.org/hgvs/hgvs/) ➢ Parser ● HGVS → Python object ● Based on a Parsing Expression Grammar ➢ Formatter ● Python object → HGVS ➢ Validator ● intrinsic & extrinsic validation ➢ Mapping tools (indel-aware!) ● g. ↔ c. → p. (m, n, r also supported) ● transcript-to-transcript liftover ● uses UTA data
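The parser/formatter round trip (HGVS string → object → HGVS string) can be illustrated without the package itself. The sketch below is a stand-in, not the hgvs package's API: it uses a single regular expression rather than a Parsing Expression Grammar, and it handles only simple c. and g. substitutions. `Substitution` and `parse_substitution` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Substitution:
    ac: str    # sequence accession, e.g. NM_182763.2
    kind: str  # coordinate type: "c" or "g"
    pos: str   # position, optionally with an intronic offset such as 688+403
    ref: str
    alt: str

    def __str__(self) -> str:  # formatter: object -> HGVS string
        return f"{self.ac}:{self.kind}.{self.pos}{self.ref}>{self.alt}"


_SUBSTITUTION = re.compile(
    r"^(?P<ac>[A-Z]+_\d+\.\d+):(?P<kind>[cg])\."
    r"(?P<pos>\d+(?:[+-]\d+)?)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_substitution(text: str) -> Substitution:
    """Parser: HGVS string -> object; raises ValueError on malformed input."""
    match = _SUBSTITUTION.match(text)
    if match is None:
        raise ValueError(f"not a simple c./g. substitution: {text!r}")
    return Substitution(**match.groupdict())


variant = parse_substitution("NM_182763.2:c.688+403C>T")
print(variant.ac, variant.kind, variant.pos, variant.ref, variant.alt)
print(str(variant) == "NM_182763.2:c.688+403C>T")
```

Rejecting a string that lacks the coordinate-type prefix (`c.` or `g.`) is a small instance of what the slide calls intrinsic validation: the check needs no external sequence data.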
  • 25. Example: Variant liftover between transcripts. Map from ➀ NM_182763.2:c.688+403C>T to ➁ NC_000001.10:g.150550916G>A to ➂ NM_001197320.1:c.281C>T with Splign alignments. [Figure: transcripts NM_182763.2 (NP_877495.1) ➀ and NM_001197320.1 (NP_001184249.1) ➂ aligned to NC_000001.10 ➁]
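In this example the alleles change from C>T on the transcripts to G>A on the genome. That is what one would expect if the transcripts align to the minus strand of NC_000001.10, which is an inference from the alleles rather than something the slide states. A minimal sketch of the allele flip (`flip_substitution` is a hypothetical helper):

```python
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string of A/C/G/T."""
    return seq.translate(_COMPLEMENT)[::-1]


def flip_substitution(ref: str, alt: str) -> Tuple[str, str]:
    """Express a substitution's alleles on the opposite strand."""
    return reverse_complement(ref), reverse_complement(alt)


print(flip_substitution("C", "T"))
print(flip_substitution(*flip_substitution("C", "T")))
```

Flipping twice returns the original alleles, which matches the slide's round trip from c. C>T through g. G>A back to c. C>T on the second transcript.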
  • 26. Developer Info. Testing ➢ 91% code coverage ➢ 25665 test variants ● ~200 hand curated, rest from dbSNP ● 23436 sub, 1254 del, 908 ins, 45 delins, 22 dup ● 44 distinct transcripts, many selected for difficulty ➢ >99% concordance with Mutalyzer ● using >100K variants from ClinVar. Upcoming directions (all issues are publicly readable) ➢ multi-variant alleles ➢ release LRG ➢ GRCh38 ➢ API changes
  • 27. Conclusions ➢ The fidelity of reference-transcript mapping matters ● For ~800 transcripts, Splign and BLAT generate significantly different alignments ● These differences might affect the interpretation of clinically relevant genes (including 3 ACMG must-report genes) ➢ Current resources have important limitations ➢ Two tools may help you deal with these limitations ● UTA – Freely available archive of transcripts from multiple sources ● HGVS – Comprehensive parsing, formatting, manipulation, and validation of variants
  • 28. Acknowledgements ➢ Invitae ● Vince Fusaro ● John Garcia ● Emily Hare ● Kevin Jacobs ● Geoff Nilsen ● Rudy Rico ● Jody Westbrook. Links ● http://goo.gl/dq2uoW ● http://bitbucket.com/hgvs/hgvs ● http://bitbucket.com/uta/uta ➢ Code (Python) ➢ Documentation & Examples ➢ Issues ➢ BED files ➢ Code testing is public. Or just: pip install hgvs