The document discusses the clinical significance of transcript alignment discrepancies and tools to help deal with them. It provides several motivations for why accurate transcript alignments are important, such as different exon coordinates being reported between databases, indels confounding mappings, and data management challenges. It then introduces the Universal Transcript Archive (UTA) as a solution to consolidate transcript alignment data from multiple sources and versions into a single database. The UTA allows comparisons of exon structures and fingerprints to determine RefSeq-Ensembl transcript equivalences. Statistics on reference agreement and alignment simplicity are also characterized for transcripts with significant alignment discrepancies between NCBI and UCSC. Finally, the document discusses using HGVS nomenclature and an HGVS Python package
The Clinical Significance of Transcript Alignment Discrepancies
Reece Hart
Gene transcripts are the lens through which we understand variants that are identified by genome sequencing, reported in scientific literature, and communicated on clinical reports. An accurate, shared representation of transcripts is essential to communicating variants reliably. This talk presents observations of significant discrepancies between sources of transcripts that will lead to discrepancies in the clinical interpretation of variants, and tools that we have released to contend with these complexities.
2. The fidelity of transcript-genome mapping matters.
Variants are identified and computed on in genome coordinates.
Variants are analyzed and communicated using transcript coordinates.
genome to transcript (g. to c.)
transcript to genome (c. to g.)
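A minimal sketch of what g. to c. and c. to g. mapping involves, assuming a hypothetical forward-strand transcript described only by its exon intervals (0-based, half-open genomic coordinates). The exon list and function names are illustrative and not taken from the talk; a real mapping must also handle strand, CDS offsets, and indels in the alignment.

```python
from typing import List, Optional, Tuple

Exon = Tuple[int, int]  # (start, end): 0-based, half-open genomic interval


def genome_to_transcript(pos: int, exons: List[Exon]) -> Optional[int]:
    """Return the 0-based transcript offset of a genomic position.

    Returns None when the position is intronic or outside the transcript.
    """
    offset = 0  # transcript bases consumed by the exons seen so far
    for start, end in exons:
        if start <= pos < end:
            return offset + (pos - start)
        offset += end - start
    return None


def transcript_to_genome(tpos: int, exons: List[Exon]) -> Optional[int]:
    """Return the genomic position of a 0-based transcript offset, or None."""
    if tpos < 0:
        return None
    offset = 0
    for start, end in exons:
        length = end - start
        if tpos < offset + length:
            return start + (tpos - offset)
        offset += length
    return None


# Hypothetical three-exon transcript on the forward strand.
EXONS = [(100, 150), (200, 260), (400, 430)]
```

With these made-up exons, a position inside an intron (for example 150) has no transcript coordinate, which is why intronic variants are written relative to the nearest exon boundary in c. notation.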
3. Motivation 1: Discordant exon coordinates
NCBI and UCSC report different coordinates for CARD9, NM_052813.3, exon 12
[Figure: UCSC (BLAT) and NCBI (Splign) alignments; exon 12 displaced 322 nt]
Consequences:
1. An assay that targets the wrong genomic region will generate uninformative sequence data.
2. A genomic variant will be interpreted as exonic when it is intronic, or vice versa.
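The second consequence can be illustrated with a short sketch. Only the 322 nt displacement comes from the slide; the exon coordinates, the variant position, and the source labels are hypothetical.

```python
def classify(pos, exons):
    """Label a genomic position 'exonic' if it lies in any half-open (start, end) exon."""
    return "exonic" if any(start <= pos < end for start, end in exons) else "intronic"


DISPLACEMENT = 322  # nt, the exon 12 displacement reported on the slide

# Hypothetical placement of the same exon by two alignment sources.
exon_source_a = [(10_000, 10_120)]
exon_source_b = [(s + DISPLACEMENT, e + DISPLACEMENT) for s, e in exon_source_a]

variant_pos = 10_050  # hypothetical genomic variant position
calls = {
    "source_a": classify(variant_pos, exon_source_a),
    "source_b": classify(variant_pos, exon_source_b),
}
```

Here the same genomic position is called exonic under one set of coordinates and intronic under the other, so the choice of alignment source changes the interpretation.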
5. Motivation 3: Data management challenges
➢ Mutable data (!)
➢ Sporadic failures
➢ Inconsistent data from a single source
➢ Inconsistent data across sources
➢ Opaque and implicit data definitions
➢ Historical alignment data not available
Source    Accession     Reference     Exons
EUtils    NM_005168.3   GRCh37.p10    1146 / 125 / 320 / 1998
          NM_005168.4   NG_008492.1   1398 / 125 / 320 / 1998
seqgene   NM_005168.3   GRCh37.p10    102 / 1046 / 125 / 321 / 143 / 1855
UCSC      NM_005168.4   hg19          1398 / 135 / 244 / 76 / 1997
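A small sketch of how inconsistencies like those in the table above might be flagged automatically. The rows are transcribed from the table; the exon values are treated as opaque tuples, and the function name is hypothetical.

```python
from collections import defaultdict

# (source, accession, reference, exon values) as listed in the table.
# The Source cell of the second row is blank in the table; EUtils is assumed.
ROWS = [
    ("EUtils", "NM_005168.3", "GRCh37.p10", (1146, 125, 320, 1998)),
    ("EUtils", "NM_005168.4", "NG_008492.1", (1398, 125, 320, 1998)),
    ("seqgene", "NM_005168.3", "GRCh37.p10", (102, 1046, 125, 321, 143, 1855)),
    ("UCSC", "NM_005168.4", "hg19", (1398, 135, 244, 76, 1997)),
]


def discordant_accessions(rows):
    """Return accessions reported with more than one distinct exon structure."""
    structures = defaultdict(set)
    for _source, accession, _reference, exons in rows:
        structures[accession].add(exons)
    return sorted(ac for ac, seen in structures.items() if len(seen) > 1)


discordant = discordant_accessions(ROWS)
```

Both versions of the accession come out as discordant, since no two rows for the same accession list the same exon values.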
6. Motivation 4: Use Ensembl for Variant Effect Prediction
RefAgree: Do transcript and genome sequences agree?
Transcript Equivalence: Which RefSeq and Ensembl transcripts are equivalent?
[Diagram: RefSeq (NM), Ensembl (ENST), UCSC (NM), and LRG, BIC, … transcripts related to the Genome (GRCh37); markers ➊ SNV, ➋ Indel, ➌, ➍ Historical Transcripts]
7. Garla, V., Kong, Y., Szpakowski, S., & Krauthammer, M. (2011). MU2A--reconciling the genome and transcriptome to determine the effects of base substitutions. Bioinformatics (Oxford, England), 27(3), 416-8. doi:10.1093/bioinformatics/btq658
8. Challenges and Solutions in Transcript Management
➢ Biological
● Alternative splicing
● Paralogs
● Natural polymorphisms
● Alternative references
➢ Technical / Logistical
● Multiple transcript sources
● Multiple alignment methods
● Multiple references
● Genome-transcript sequence differences
● Historical transcript alignments
➢ Existing resources
● RefSeq, UCSC, Ensembl
● Locus Reference Genomic
● Mutalyzer
➢ See also
● McCarthy DJ, et al. Genome Medicine 6:26 (2014).
● Garla V, et al. Bioinformatics 27(3): 416–8 (2011).
9.
10. Part 1
The Universal Transcript Archive
11. UTA solves four issues with transcript management.
Transcript ≠ Genome Reference:
➊ SNV
➋ InDel
➌ Historical transcript alignments no longer available
➍ Exon coordinate differences between sources for same accession
[Diagram: RefSeq NM_01234.5, RefSeq NM_01234.4, and UCSC NM_01234.5 aligned to the genome reference, with an A/T mismatch]
18. NCBI (Splign) v. UCSC (BLAT) Alignment Statistics
Splign and BLAT provide significantly different exon structures for 886 transcripts
32358 transcripts w/exon structures
Are Splign and BLAT similar?
Y: 31472 (97.3%) transcripts
N: 886 (2.7%) transcripts ➌
“similar” means either
1) identical exon coordinates, or
2) coordinates that differ only by short 3' terminal artifacts
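The two-part definition of “similar” above could be sketched as follows. This is an illustration under stated assumptions, not the method used in the talk: exons are listed in transcript order on the forward strand, so the 3' terminus is the end of the last exon, and the threshold `max_tail_diff` is a hypothetical stand-in for “short”, which the slide does not quantify.

```python
def similar(exons_a, exons_b, max_tail_diff=10):
    """Return True if two exon structures are 'similar' in the slide's sense.

    exons_a, exons_b: lists of (start, end) tuples in transcript order.
    max_tail_diff: hypothetical cutoff (nt) for a 'short' 3' terminal artifact.
    """
    # 1) identical exon coordinates
    if exons_a == exons_b:
        return True
    # 2) everything identical except the 3' end of the last exon
    if len(exons_a) != len(exons_b):
        return False
    if exons_a[:-1] != exons_b[:-1]:
        return False
    (start_a, end_a), (start_b, end_b) = exons_a[-1], exons_b[-1]
    return start_a == start_b and abs(end_a - end_b) <= max_tail_diff
```

For example, `similar([(0, 10), (20, 30)], [(0, 10), (20, 35)])` holds under the default cutoff, while a difference in an internal exon boundary does not.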
19. Characterization of transcript discrepancies
Whether alignments provided by NCBI and UCSC agree with GRCh37 primary sequence.
Splign BLAT
        T      F
T      14     18
F     545    311
886 transcripts with significant discrepancies
20. Characterization of transcript discrepancies
Reference agreement (blue) and alignment “simplicity” (green)
Splign BLAT
        T      F
T      14     18
F     545    311
[Figure: nested Splign/BLAT T/F tables; values in slide order:
 200 (0), 4 (97), 90 (82), 16 (84);
 6 (41), 12 (180);
 434 (7), 110 (652);
 14 (11)]
886 transcripts with significant discrepancies
21. ACMG “Must Report” Genes
Green, R. C., Berg, J. S., Grody, W. W., Kalia, S. S., Korf, B. R., Martin, C. L., … Biesecker, L. G. (2013). ACMG recommendations for reporting of incidental findings in clinical exome and genome sequencing. Genetics in Medicine: Official Journal of the American College of Medical Genetics, 15(7), 565–74. doi:10.1038/gim.2013.73
22. Summary of Splign-BLAT gene-wise coordinate deltas.
delta     # genes   # ACMG must report   Notes
=0        15206     45                   Identical exon structures
>=1       183       8                    (all trivial diffs) LDLR, MYL2, PRKAG2, SDHB, SDHC, TGFBR1, TGFBR2, WT1
>=10      116       0
>=25      6         0
>=50      5         0
>=250     13        0
>=1000    94        3                    MYBPC3, MYH7, TNNI3
delta ≝ minimum per gene of maximum per transcript of difference of exon coordinates between NCBI and UCSC.
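The delta definition above can be sketched as a min-of-max computation. The gene and transcript names and all coordinates below are hypothetical, and the sketch assumes that both sources report the same number of exons for a transcript, which the slide does not state.

```python
def transcript_delta(ncbi_exons, ucsc_exons):
    """Largest absolute difference between paired exon coordinates."""
    if len(ncbi_exons) != len(ucsc_exons):
        raise ValueError("this sketch assumes equal exon counts")
    return max(
        max(abs(n_start - u_start), abs(n_end - u_end))
        for (n_start, n_end), (u_start, u_end) in zip(ncbi_exons, ucsc_exons)
    )


def gene_delta(transcripts):
    """Minimum over a gene's transcripts of the per-transcript maximum."""
    return min(transcript_delta(ncbi, ucsc) for ncbi, ucsc in transcripts.values())


# Hypothetical data: gene -> transcript -> (NCBI exons, UCSC exons).
GENES = {
    "GENE_X": {
        "tx1": ([(100, 200), (300, 400)], [(100, 200), (300, 400)]),
        "tx2": ([(100, 200), (300, 400)], [(100, 200), (622, 722)]),
    },
    "GENE_Y": {
        "tx1": ([(50, 80)], [(50, 92)]),
    },
}

deltas = {gene: gene_delta(txs) for gene, txs in GENES.items()}
```

Taking the minimum per gene means a gene is counted as discordant only if every one of its transcripts is, so in this made-up data GENE_X scores 0 despite one transcript with a 322 nt shift.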
23. Part 2
Using HGVS “Nomenclature”
(http://www.hgvs.org/mutnomen/)
24. HGVS Python Package
http://bitbucket.org/hgvs/hgvs/
➢ Parser
● HGVS → Python object
● Based on a Parsing Expression Grammar
➢ Formatter
● Python object → HGVS
➢ Validator
● intrinsic & extrinsic validation
➢ Mapping tools (indel-aware!)
● g. ↔ c. → p. (m,n,r also supported)
● transcript-to-transcript liftover
● uses UTA data
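To make the parser and formatter roles concrete, here is a toy round trip for simple substitutions only. It is a standard-library regex sketch, not the hgvs package's API or its Parsing Expression Grammar; the class and function names are hypothetical.

```python
import re
from dataclasses import dataclass

# Toy pattern: accession, coordinate type, position (optional intronic offset), ref>alt.
_SIMPLE_SUB = re.compile(
    r"(?P<accession>[A-Z]{2}_\d+\.\d+):"
    r"(?P<kind>[cgmn])\."
    r"(?P<position>\d+(?:[+-]\d+)?)"
    r"(?P<ref>[ACGT])>(?P<alt>[ACGT])"
)


@dataclass(frozen=True)
class SimpleSub:
    accession: str
    kind: str
    position: str
    ref: str
    alt: str

    def __str__(self):
        # Formatter direction: Python object -> HGVS string
        return f"{self.accession}:{self.kind}.{self.position}{self.ref}>{self.alt}"


def parse_simple_sub(text):
    """Parser direction: HGVS string -> Python object (substitutions only)."""
    match = _SIMPLE_SUB.fullmatch(text)
    if match is None:
        raise ValueError(f"not a simple substitution: {text!r}")
    return SimpleSub(**match.groupdict())


variant = parse_simple_sub("NM_182763.2:c.688+403C>T")
```

`str(variant)` reproduces the input string, and a string that lacks the coordinate type prefix is rejected, which is a very small example of intrinsic validation.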
25. Example: Variant liftover between transcripts
Map
from ➀ NM_182763.2:c.688+403C>T
to ➁ NC_000001.10:g.150550916G>A
to ➂ NM_001197320.1:c.281C>T
with Splign alignments
[Diagram: transcripts NM_182763.2 (NP_877495.1) and NM_001197320.1 (NP_001184249.1) aligned to NC_000001.10, with ➀ and ➂ on the transcripts and ➁ on the genome]
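The allele change from C>T at ➀ to G>A at ➁ and back to C>T at ➂ is what one would expect if both transcripts align to the reverse strand of NC_000001.10; that orientation is inferred from the example, not stated on the slide. A minimal sketch of the allele part of such a mapping follows (positions would come from alignment data, which is not modeled here; function names are hypothetical).

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement a DNA string of A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def alleles_to_genome(ref, alt, strand):
    """Express transcript-strand alleles on the genome's forward strand.

    strand: +1 if the transcript aligns to the forward strand, -1 if reverse.
    """
    if strand == 1:
        return ref, alt
    if strand == -1:
        return reverse_complement(ref), reverse_complement(alt)
    raise ValueError("strand must be +1 or -1")


genomic_alleles = alleles_to_genome("C", "T", strand=-1)
```

Applying the same function again with `strand=-1` returns the original transcript alleles, matching the C>T seen on the second transcript.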
26. Developer Info
Testing
➢ 91% code coverage
➢ 25665 test variants
● ~200 hand curated, rest from dbSNP
● 23436 sub, 1254 del, 908 ins, 45 delins, 22 dup
● 44 distinct transcripts, many selected for difficulty
➢ >99% concordance with Mutalyzer
● using >100K variants from ClinVar
Upcoming directions
(all issues are publicly readable)
➢ multi-variant alleles
➢ release LRG
➢ GRCh38
➢ API changes
27. Conclusions
➢ The fidelity of reference-transcript mapping matters
● For ~800 transcripts, Splign and BLAT generate significantly different alignments
● These differences might affect the interpretation of clinically-relevant genes (including 3 ACMG must report genes)
➢ Current resources have important limitations
➢ Two tools may help you deal with these limitations
● UTA – Freely available archive of transcripts from multiple sources
● HGVS – Comprehensive parsing, formatting, manipulation, and validation of variants
28. Acknowledgements
➢ Invitae
● Vince Fusaro
● John Garcia
● Emily Hare
● Kevin Jacobs
● Geoff Nilsen
● Rudy Rico
● Jody Westbrook
● http://goo.gl/dq2uoW
http://bitbucket.com/hgvs/hgvs
http://bitbucket.com/uta/uta
➢ Code (Python)
➢ Documentation & Examples
➢ Issues
➢ BED files
➢ Code testing is public
Or just:
pip install hgvs