To fully interpret variants in the context of clinical genomics, as outlined by the ACMG interpretation guidelines, variants near canonical splice boundaries must be evaluated for their potential to disrupt gene splicing and thus be classified as a gene-damaging mutation. Five splicing methods have been canonized for this purpose in the clinical testing market: GeneSplicer, MaxEntScan, NNSplice, and Position Weight Matrix (PWM). Although these algorithms vary wildly in their performance characteristics, such as sensitivity and specificity, they are treated as black-box oracles on equal footing when being used by variant curators to classify variants.
In this presentation, we will review these algorithms and their technical strengths and weakness from the perspective of clinical variant interpretation. Additionally, we will present our approach to splice site prediction within VarSeq, demonstrating how we improve on the status quo by employing these algorithms both in the batch annotation and genomic visualization contexts. Finally, we will demonstrate how these methods can be leveraged during the interpretation of variants following the ACMG guidelines within our upcoming product, VSClinical.
1. Splice Site Algorithms for Clinical
Genomics
20 most promising
Biotech Technology
Providers
Top 10 Analytics
Solution Providers
Hype Cycle for
Life sciences
2. NIH Grant Funding Acknowledgments
Research reported in this publication was supported by the National Institute Of General
Medical Sciences of the National Institutes of Health under:
- Award Number R43GM128485
- Award Number 2R44 GM125432-01
- Award Number 2R44 GM125432-02
PI is Dr. Andreas Scherer, CEO Golden Helix.
The content is solely the responsibility of the authors and does not necessarily represent the
official views of the National Institutes of Health.
3. Q & A
Please enter your questions into your GoToWebinar Panel
4. Golden Helix – Who We Are
Golden Helix is a global
bioinformatics company founded
in 1998.
GWAS
Genomic Prediction
Large-N-Population Studies
RNA-Seq
Large-N CNV-Analysis
Variant Warehouse
Centralized Annotations
Hosted Reports
Sharing and Integration
Variant Calling
Filtering and Annotation
Clinical Reports
CNV Analysis
Pipeline: Run Workflows
7. Golden Helix – Who We Are
When you choose a Golden Helix solution, you get more than just
software
REPUTATION
TRUST
EXPERIENCE
INDUSTRY FOCUS
THOUGHT
LEADERSHIP
COMMUNITY
TRAINING
SUPPORT
RESPONSIVENES
S
INNOVATION and
SPEED
CUSTOMIZATIONS
8. Genetic Testing Process
Sample Prep Sequencing Align & Call Annotate
& Filter
Variant
Interpretation
Report
Sentieon
& VS-CNV
VarSeq VSReportsVSClinical
Golden Helix Clinical Suite
9. VSClinical
Complete Support for ACMG Guideline
Workflow:
- Implements a guided workflow for following the ACMG guideline
scoring and classifying
- Place criteria into conceptually related groups, paired with their
opposites, and formatted as answerable question.
Aggregate and Automate:
- Questions have supporting evidence presented with rich and
interactive visuals
- Automatically computed recommendations for questions that have
explicit bioinformatic evidence, with supporting reasons for each
answer.
Expert and Beginner Friendly:
- Start with descriptive summaries and recommendations for a
variant
- Deep dive into Population Catalogs, Gene Impact, Published Studies
and Clinical tabs
- Integrated documentation, readings on scoring criteria and citations
10. Splice Sites
Introns have distinct nucleotide
pairs at each end
- GT at the 5’ end (Donor Site)
- AG at the 3’ end (Acceptor Site)
Sequences around splice sites
are highly variable
Machine learning and
probabilistic methods are used to
identify sites
11. Algorithms
VSClinical supports four splice site prediction algorithms
- PWM: Uses position weight matrix similar to SpliceSiteFinder and
Human Splice Finder
- MaxEntScan: Approximates sequence motifs using Maximum
Entropy Distribution
- NNSplice: Identifies splice sites using neural networks
- GeneSplicer: Uses Markov models combined with maximal
dependence decomposition
12. Experiments
Algorithms were compared in terms of
- Accuracy
- Sensitivity
- Specificity
- Precision
The test data set was generated as follows:
- 20,000 Known splice sites were extracted from the 1000
Genomes GRCh38 reference sequence using exon
boundaries specified by NCBI RefSeq Genes
- This was combined with 20,000 false splice sites from
the HS3D splice site dataset
15. Discussion
GeneSplicer has exceptional performance for all
metrics
MaxEntScan has high accuracy and is competitive
with GeneSplicer in terms of specificity
NNSplice also performs well and is competitive
with MaxEntScan on donor data
PWM has a high false positive rate and performs
poorly on donor data
17. Q & A
Please enter your questions into your GoToWebinar Panel
Editor's Notes
The ACMG guidelines provide a set of criteria to score variants and place them into one of five classification tiers. Following the guidelines requires diving into the annotations, genomic context and existing clinical assertions about every variant. VSClinical provides a tailored workflow to score each relevant criterion, while also providing all the bioinformatic literature and evidence from clinical knowledgebases. This includes population frequencies, functional prediction scores, conservation scores, and effect prediction.
Through standardized questions, automatically computed evidence, and historical context, VSClinical provides clinicians with everything they need to reach a final classification consistently and reproducibly, thereby reducing the subjectivity of the variant classification process.
While a final classification of a variant requires the manual inspection of the sample’s clinical details, published literature and the content of existing related clinical interpretations, many criteria from the ACMG guidelines can be automatically computed with bioinformatic algorithms and specially curated annotation sources.
The work done in your lab to classify variants will be automatically included in future analyses. As the number of samples processed increases, the number of variants requiring classification will be reduced, along with sample turn-around time. Previously classified variants will be marked with their last evaluation date, allowing them to be accepted in the context of the current sample without extra analysis time.
Look at 16 and 12
We are a bioinformatics support company with over 17 years of expertise in the field of bioinformatics.
Founded in 1998
Founder spun off from early work in pharmacogenomics at GlaxoSmith Kline, who was a key investor and still remain a solid partner with us.
We have two flagship products Varseq and SNP and Variation Suite (SVS) for short.
SVS is our research application platform. It’s a powerful analytic tool created to give researchers the ability to perform complex analysis and visualizations on genomic and phenotypic data.
SVS has a broad range of tools to easily perform GWAS, Genomic Prediction, Differential expression analysis on RNA-Seq Data and has the ability to process CNV analysis.
VarSeq is our testing platform for filtering and annotating variants of interest.
Within VarSeq, We have the option to automatically create customizable clinical reports from the results of various workflows.
We also have an add-in utility called VSPipeline – take workflows that you created and automate the process with very little human interaction. Our pipeline function can be integrated into a secondary analysis pipeline, so you can take VCF files coming out of your secondary pipeline and take it all through filtering, annotation, and create a clinical report.
We also offer a Data Warehousing solution, which provides the ability to store samples, variants, and CNVs in a centralized repository. All projects and reports stored on the warehouse can be queried, exported, and used as frequency annotations.
Our software has been very well received by the industry. We have been cited in over 1000 peer-reviewed publications and that’s a testament to the utility of our software and the satisfaction of our customer base.
Global
We work with over 350 organizations including pharmaceutical companies, top-tier institutions, government organizations, clinics, and genetic testing labs.
Approaching 10,000 installs of our products and 1,000’s of users.
Section 1:
We are well known for quality products and excellent customer service. We have been developing software for over 17 years and have a wide range of talented experts on staff including biostatisticians, mathematicians and software engineers. We want to work with you to ensure the software is staying relevant and is meeting all of your requirements. We are a nimble organization and we can add features, functions, and algorithms rather quickly.
Section 2:
Our executives attend and often speak at top conferences and we pride ourselves in staying relevant and being in the forefront of the bioinformatics industry.
Section 3:
We have an exceptional team of experts available to provide the resources needed to help complete your analysis, from training, to technical support. We aim to support our customers every step of the way.
Section 4:
Transparency - All major methods and algorithms are documented (including the math, use cases, and article citations). We completely shunned the “black box” model. We are not aware of another commercial competitor whose methods are as transparent as ours.
Innovation and Speed – All of our software is designed for your standard desktop or laptop, without needing access to the cloud. We eliminate the need to work with a biostatics core and our clients can get results very often within hours or minutes. Golden Helix is a decisive edge in today’s competitive environment.
Customizations – We offer an expandable and customizable platform via scripting, and many scripts can be downloaded on our website at no additional cost. Additionally, all of our report templates are completely customizable. We are also species agnostic, working with data from all diploid organisms: humans, plants and animals.
If you’re a clinical lab, research lab, or hospital looking to interpret individual variants in the context of disorders you will follow this standard workflow for taking the raw sample and getting to the point where you have a reportable set of variants in the context of an individual.
This starts with sample preparation: extracting out the DNA from the nucleus of the cells, amplifying it, preparing it, and batching a number of samples onto a run so that you can place it on a sequencer, which does the primary analysis of reading the raw data to determine the order of the A’s, C’s, G’s, and T’s coming from the DNA. This information usually gets placed into individual fastq files per sample, which will then be fed into a secondary analysis pipeline which we call the align and call process, which takes the data in the fastq files and compares it to the human reference sequence so that you can get to the point of knowing which mutations occur in which individuals. Next comes the step of annotating and filtering your variants to determine which mutations are of clinical interest, by combining each variant with a slew of genomic data, populations frequencies and computations, interpreting that variant in the context of the individual, and then creating a report over the variants that make it through this process.
Golden Helix has this whole post sequencing pipeline covered for you. For align and call we have our fantastic sention tool along with our CNV caller VS-CNV. For annotating and filtering we have VarSeq, which is used to design the filters, annotations, and computations used to obtain a list of clinically relevant candidate variants. Next we have VSClinical, which allows clinicians to interpret and evaluate the impact of each variant. Finally we have VSReports, which allows you to create the beautiful customizable reports that present the information coming out of the interpretation process for a given test.