Validating an NGS workflow is an iterative process that begins with collaboration with personnel and planning protocols for the entire workflow from sample preparation, sequencing and variant calling, all the way to data analysis and reporting. At Golden Helix, while we do not provide pre-validated black-box workflows, we provide our customers with support to validate workflows in a transparent manner, and assist them in reaching production deadlines. This webcast will be led by members of our Field Application Scientist team, and we will explore some of the best practices for NGS workflow validation that we have observed and helped to implement based on real-world examples from our customer base. Key topics for discussion will include:
Sample preparation and collection of adequate case/control data
Designing a robust workflow with special considerations for single versus family analyses and phenotypic considerations
Generating the desired output for clinical or other reports
Real world NGS workflow validation strategies
Tune in for tips and strategies that you can deploy when designing and validating your NGS workflow.
Best Practices for Validating a Next-Gen Sequencing Workflow
1. Best Practices for Validating a Next-Gen
Sequencing Workflow
August 16, 2023
Presented by Darby Kammeraad, Director of Field Application Services and
Rana Smalling, PhD, Field Application Scientist
4. NIH Grant Funding Acknowledgments
• Research reported in this publication was supported by the National Institute of General Medical Sciences of
the National Institutes of Health under:
o Award Number R43GM128485-01
o Award Number R43GM128485-02
o Award Number 2R44 GM125432-01
o Award Number 2R44 GM125432-02
o Montana SBIR/STTR Matching Funds Program Grant Agreement Number 19-51-RCSBIR-005
• PI is Dr. Andreas Scherer, CEO of Golden Helix.
• The content is solely the responsibility of the authors and does not necessarily represent the official views of the
National Institutes of Health.
5. Who Are We?
Golden Helix is a global bioinformatics company founded in 1998
Filtering and Annotation
ACMG & AMP Guidelines
Clinical Reports
CNV Analysis
GWAS | Genomic Prediction
Large-N Population Studies
RNA-Seq
Large-N CNV-Analysis
Variant Warehouse
Centralized Annotations
Hosted Reports
Sharing and Integration
Pipeline: Run Workflows
8. The Golden Helix Difference
FLEXIBLE DEPLOYMENT
On premise or in a private
cloud
BUSINESS MODEL
Annual fee for software,
training and support
CLIENT CENTRIC
Unlimited support from the
very beginning
SINGLE SOLUTION
Comprehensive cancer and
germline diagnostics
SCALABILITY
Gene panels to whole
exomes or genomes
THROUGHPUT
Automated pipeline
capabilities
QUALITY
Clinical reports correct the
first time
9. Today’s Presenters
Rana Smalling, PhD
Field Application Scientist
Darby Kammeraad
Director of Field Application
Services
NGS Clinical Workflow
Golden Helix provides comprehensive data analytics software that scales across gene panels, whole exomes, and whole genomes
DNA Extraction in Wet
Lab and Sequence
Generation
Interpretation and
Result Reporting
Primary
Read Processing and
Quality Filtering
Alignment and Variant
Calling
Secondary
*Golden Helix provides
Secondary Analysis through
a reseller agreement
Tertiary
Golden Helix’s software and
primary focus
Comprehensive
secondary and tertiary
analysis solutions for
primary data
aggregated by all
commercially available
sequencers
Type          Size
Gene Panel    Small (100MB)
Whole Exome   Medium (1GB)
Whole Genome  Large (100GB)
Cancer use case
Hereditary use case
Process Analysis
… and scales across multiple
data set sizes for cancer and
hereditary use cases
Filtering and Annotation
Data Warehousing
Workflow Automation
Golden Helix works with all major
sequencers…
Topic of
Validation
12. Content Overview
• Preparation for NGS workflow validation
Adequate controls
Defining expected outcomes
• Design of the NGS workflow in VarSeq
Types of workflows needed
Sample related search terms
Automation
• Expert tips
Unique methods in VarSeq to expedite
validation
• Use cases
1) Somatic workflow
2) Germline workflow scenario
3) CNV validation example
13. Validation begins with well characterized sample
controls
Collection of case/control data
o Insightful: Kit with generic controls or catalog (sample or database file with numerous pathogenic variants)
Pros: Useful when testing accuracy of classifier or benchmarking algorithms
Cons: Do not suitably test efficacy of overall assay/filter for real world application
o Practical: Designed controls or real-world data with established results (more suitable for workflow design/validation)
o Determine the number of samples needed to establish statistical robustness
Example for GHI CNV caller
• minimum 30 controls, read-depth 100X (panels and exomes), consistent library prep method.
Potentially >100s of samples with repeat runs for robustness
Handle spectrum of variant types (SNVs, Indels, CNVs, Fusions)
Handle workflow design/template (TN, T-only, Single germline, Trio/Duo)
Sample collection (blood, saliva, solid tumor, FFPE)
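The CNV control requirements above can be checked before any validation run. The following is a minimal sketch, not part of VarSeq: the `ControlSample` record and `check_cnv_reference_set` function are hypothetical names, and the only values taken from the slide are the minimum of 30 controls, the 100X read depth, and the requirement for a consistent library prep method.

```python
from dataclasses import dataclass


@dataclass
class ControlSample:
    """Hypothetical record for one control sample (not a VarSeq data structure)."""
    name: str
    mean_depth: float   # mean read depth over targets, e.g. 120.0 for 120X
    library_prep: str   # free-text label for the library prep method


def check_cnv_reference_set(samples, min_controls=30, min_depth=100.0):
    """Return a list of problems; an empty list means the control set passes."""
    problems = []
    if len(samples) < min_controls:
        problems.append(f"only {len(samples)} controls, need at least {min_controls}")
    low = [s.name for s in samples if s.mean_depth < min_depth]
    if low:
        problems.append(f"{len(low)} sample(s) below {min_depth:g}X: {', '.join(low)}")
    preps = sorted({s.library_prep for s in samples})
    if len(preps) > 1:
        problems.append(f"mixed library prep methods: {', '.join(preps)}")
    return problems


# Illustrative data only: 30 controls at 150X with one prep method.
controls = [ControlSample(f"ctrl{i:02d}", 150.0, "prepA") for i in range(30)]
print(check_cnv_reference_set(controls))
```

Returning a list of problems rather than a single pass/fail flag makes it easier to record in the validation log exactly why a control set was rejected.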
14. Example control sources
Horizon Molecular reference standards:
https://horizondiscovery.com/en/reference-standards
o Mimic patient material from sample prep to downstream
analysis
Platform agnostic
Oncology focused with >370 clinically-relevant
variants
SNVs/Indels/CNVs/Fusions
Various DNA source types
15. Example validation process
• Phase 1: Software installation and verification of user access
• Phase 2: Definition of all deliverables: clinical reports, exported
data... (outputs)
• Phase 3: Initial workflow design tested with controls (inputs)
• Phase 4: Peer-review and verification of workflow design
• Phase 5: Analytical verification (expected outcomes)
• Phase 6: Finalization of all SOPs
• Phase 7: Training of key employees
• Phase 8: Pipeline approval and go-live
16. Optimal NGS Workflow
Workflow design – simplifying the workflow upfront streamlines automation later
o Variant filtering : In order to finalize the filter chain, develop a clear understanding of
the applicable cut-offs that are being modeled within the workflow.
Variant quality (unique to each bioinformatic pipeline but VarSeq is agnostic
and can handle any VCF)
Alt allele frequency in population (typically 1-5% or less but easily adjusted
for disorders more prevalent in population)
Ontology (Missense, LOF effect, or predicted to impact canonical or novel
splice site)
Sample specific information (phenotypes/panels or tumor type)
Classifier (default cutoffs adjustable to accommodate founder populations as
example)
o Implementation tip: Testing filter accuracy with flags
Use variant flag sets to test efficacy of filtering strategies (Where does my
known pathogenic variant get lost? Adjusting the filter cutoffs/thresholds)
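The flag-based check described above (finding where a known pathogenic variant gets lost) can be illustrated outside the software. This is a sketch under stated assumptions: the record fields, the `FILTER_CHAIN` list, the quality cutoffs (GQ 20, depth 10), and the three-gene panel are all illustrative choices, not VarSeq fields or defaults. Only the order of the filters and the 1% allele frequency cutoff follow the slide.

```python
# Hypothetical variant records; field names are illustrative, not a VarSeq schema.
FILTER_CHAIN = [
    ("quality", lambda v: v["gq"] >= 20 and v["depth"] >= 10),          # assumed cutoffs
    ("population_af", lambda v: v["pop_af"] <= 0.01),                   # 1% or less
    ("ontology", lambda v: v["effect"] in {"missense", "lof", "splice"}),
    ("panel", lambda v: v["gene"] in {"MYH7", "MYBPC3", "TNNT2"}),      # example panel
]


def first_failed_filter(variant, chain=FILTER_CHAIN):
    """Return the name of the first filter that drops the variant, or None if it survives."""
    for name, keep in chain:
        if not keep(variant):
            return name
    return None


def audit_flagged(variants, chain=FILTER_CHAIN):
    """Map each flagged (known pathogenic) variant id to the filter that removed it."""
    return {v["id"]: first_failed_filter(v, chain) for v in variants if v.get("flagged")}


variants = [
    {"id": "known1", "gq": 99, "depth": 40, "pop_af": 0.0001,
     "effect": "missense", "gene": "MYH7", "flagged": True},
    {"id": "known2", "gq": 99, "depth": 40, "pop_af": 0.03,
     "effect": "lof", "gene": "MYBPC3", "flagged": True},
]
print(audit_flagged(variants))
```

A flagged variant that maps to a filter name is one the chain would have missed, which points directly at the cutoff to revisit.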
17. Crucial to define scope of reportable findings as it creates
novel workflow designs
o Establish with the clinical stakeholders the scope of
genomic data to report on.
For example, should the report include
incidental/secondary findings?
• In a somatic test, report germline findings such as
hereditary risk later in life or related to
relatedness
• Perhaps an opt-in and opt-out policy
o Format choices: Exportable .json or simple visual with pdf
or word document
Reporting: Desired Output
18. Leveraging sample relationships
Types of potential germline workflows
o Carrier risk analysis
Pro: Interesting findings can facilitate early
genomic investigation for future prenatal
situation
Con: List of “risky alleles” requires careful
reporting language
o Trios/Duos/Extended pedigrees
Pro: Highly efficient filtering strategy
Con: Requires sequencing data from other family
members, which may not always be available and adds
cost
Inheritance Models: family-based analysis
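To illustrate why trio data makes filtering so efficient, here is a small sketch of one inheritance check. The de novo model is chosen as an example and is an assumption: the slide does not name specific inheritance models. Genotypes are assumed to be VCF-style strings such as "0/1", and the function name is hypothetical.

```python
def is_de_novo_candidate(child_gt, mother_gt, father_gt):
    """True when the child carries an alt allele and both parents are homozygous reference."""
    def alleles(gt):
        # Treat phased ("|") and unphased ("/") genotypes the same way.
        return gt.replace("|", "/").split("/")

    child, mother, father = alleles(child_gt), alleles(mother_gt), alleles(father_gt)
    if "." in child + mother + father:
        return False  # a missing call cannot support a de novo hypothesis
    child_has_alt = any(a != "0" for a in child)
    parents_are_ref = all(a == "0" for a in mother + father)
    return child_has_alt and parents_are_ref


print(is_de_novo_candidate("0/1", "0/0", "0/0"))
print(is_de_novo_candidate("0/1", "0/1", "0/0"))
```

A real workflow would also weigh parental read depth and genotype quality before trusting a homozygous reference call in a parent.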
19. o Leveraging sample phenotypes
Single sample: Phenotypic based search or use of panel when disorders are consistent in lab
• Pro: Phenotypes search expands beyond limit of panel. Not missing
potentially interesting pathogenic variants
• Con: User may need to research novel gene that falls outside current
version of panel
Approach supported with VarSeq PhoRank algorithm
• Can be set up to be deployed alongside panel to reinforce variant search
• Can be automated with VSPipeline on per sample basis if each case
disorder is unique
Leveraging sample specific search terms: PhoRank
20. Somatic
Workflow
Strategy
• Priority Lists
• Project design
Workflow
Template
design
• Parallel filters by priority
• Reporting vs. tracking decision
tree
Workflow
automation
•VSPipeline
deployment strategy
•LIMS/API integration
Lab scenarios: Somatic
o Somatic Workflow Strategy for TSO500 or other panels
Priority 1 list: known Tier I oncogenic variants with treatments
Priority 2 list: user explores Tier II variants to collect available
knowledge
Priority 3 list: VUS variants for future review
o Workflow Template design insights
Parallel filters for
• Report: Rapid discovery and report of Priority 1 variants
• Report: User investigation of Priority 2 variants
• No Report on VUS: tracking of VUS for future
reclassification via VSWarehouse
i. Bulk upload of entire variant cohorts
ii. Easily track changing classifications
iii. Track and filter out artifacts
o Workflow automation with VSPipeline
Streamline deployment of validated template
Integration of VarSeq with existing LIMS and external genomic
software via APIs
Somatic workflow
Tier1
Tier2
Tier3
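The parallel-filter idea above, reporting Priority 1 and 2 while only tracking VUS, can be sketched as a simple routing step. This is illustrative only: the `tier` field, the mapping of Tier III to the VUS tracking list, and the function names are assumptions for the example, not the VarSeq template itself.

```python
# Assumed mapping from tier to the priority lists on the slide.
PRIORITY_BY_TIER = {"I": 1, "II": 2, "III": 3}
REPORTED_PRIORITIES = {1, 2}  # Priority 3 (VUS) is tracked, not reported


def route_variants(variants):
    """Split variants into reportable, track-only, and unassigned buckets."""
    routed = {"report": [], "track_only": [], "unassigned": []}
    for v in variants:
        priority = PRIORITY_BY_TIER.get(v.get("tier"))
        if priority is None:
            routed["unassigned"].append(v["id"])
        elif priority in REPORTED_PRIORITIES:
            routed["report"].append((priority, v["id"]))
        else:
            routed["track_only"].append(v["id"])
    routed["report"].sort()  # highest priority first
    return routed


# Illustrative variant ids only.
example = [
    {"id": "var_b", "tier": "II"},
    {"id": "var_a", "tier": "I"},
    {"id": "var_c", "tier": "III"},
]
print(route_variants(example))
```

Keeping the track-only bucket separate mirrors the slide's decision tree: those variants would be uploaded for future reclassification rather than placed on the report.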
21. Lab scenarios: Germline
Germline workflow
Routine workflow filters applied to almost all sample scenarios
o Following scenarios are currently deployed across our global customer base
Scenario 1: University lab running Genomes with designated panels
Scenario 2: Commercial lab running Genomes for any unique disorder
(case by case basis) and report must include any interesting incidental
findings for risk alleles.
o Workflow design insights
Parallel filters for
• Focused search for variants under “Standard Diagnostic” filter
• Parallel filter to capture scope of reportable “Incidental Findings”
o Second project: CNV calling with VarSeq
Reviewing the quality of CNV reference set
Comparison of findings against truth set
Standard
Diagnostic
Incidental
Findings
Scenario 1 Scenario 2
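The comparison of CNV findings against a truth set can be summarized with sensitivity and precision. The sketch below is one possible way to do this, not the method used by the VarSeq CNV caller: matching by 50% reciprocal overlap is an assumed convention, and the record fields and function names are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Overlap length divided by the longer interval (0.0 if disjoint or mismatched)."""
    if a["chrom"] != b["chrom"] or a["type"] != b["type"]:
        return 0.0
    overlap = min(a["end"], b["end"]) - max(a["start"], b["start"])
    if overlap <= 0:
        return 0.0
    return overlap / max(a["end"] - a["start"], b["end"] - b["start"])


def compare_to_truth(called, truth, min_overlap=0.5):
    """Return sensitivity and precision of called CNVs against a truth set."""
    matched_truth = [t for t in truth
                     if any(reciprocal_overlap(c, t) >= min_overlap for c in called)]
    matched_calls = [c for c in called
                     if any(reciprocal_overlap(c, t) >= min_overlap for t in truth)]
    return {
        "sensitivity": len(matched_truth) / len(truth) if truth else None,
        "precision": len(matched_calls) / len(called) if called else None,
    }


# Illustrative coordinates only.
truth = [{"chrom": "1", "start": 100, "end": 200, "type": "DEL"},
         {"chrom": "2", "start": 500, "end": 900, "type": "DUP"}]
called = [{"chrom": "1", "start": 110, "end": 210, "type": "DEL"},
          {"chrom": "3", "start": 0, "end": 50, "type": "DEL"}]
print(compare_to_truth(called, truth))
```

Dividing by the longer interval is equivalent to requiring the overlap fraction to hold for both the call and the truth event, which prevents a very large call from matching a small true event.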
25. 25 Licenses for 25 Months
Celebrating 25 Years in Business
• Limited quantity
• Licenses are 25-month license periods
• Available to new customers only
• Orders must be received by Sept 15, 2023
• Visit goldenhelix.com/forms/25-for-25 or
scan the QR code below
26. Conferences
European Human Genetics Conference, Booth #566
• June 10 – 13, 2023
• Glasgow, UK
• Monday, June 12, 12:00 - Corporate Satellite Talk (ALSH 1,
Level 0) Achieving Economic Success as an NGS Lab:
Strategy and Implementation
AMP Europe, Milan, Italy, Booth #14
• June 18 – 20, 2023
• Milan, Italy
• Monday, June 19, 1:00 – Industry Symposium Achieving
Economic Success as an NGS Lab: Strategy and
Implementation
Thanks Casey! We can’t wait to dive in to this subject
Thank you Casey, and good morning everyone. Today we will be presenting on the topic: Best Practices for Validating a Next-Gen Sequencing Workflow.
Before we dive into the subject, I want to mention our appreciation for our grant funding from the NIH.
The research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under the listed awards.
We are also grateful to have received local grant funding from the state of Montana. Our PI is Dr. Andreas Scherer, who is also the CEO of Golden Helix. The content described today is the responsibility of the authors and does not officially represent the views of the NIH.
So with that covered, let's take just a few minutes to talk a little bit about our company, Golden Helix.
Golden Helix is a global bioinformatics software and analytics company that enables research and clinical practices to analyze large genomic datasets. We were originally founded in 1998 based on pharmacogenomics work performed at GlaxoSmithKline, which is still a primary investor in our company.
VarSeq, our flagship product, serves as a clinical tertiary analysis tool. At its core, it is a variant annotation and filtration engine. In addition, users have access to automated AMP or ACMG variant guidelines. VarSeq also has the capability to detect copy number variations, scaling from single exon to large aneuploidy events. Lastly, the finalization of variant interpretation and classification is further optimized with the VarSeq clinical reporting capability. Users can integrate all of these features into a standardized workflow.
Paired with VarSeq are VSWarehouse and VSPipeline. VSWarehouse serves as a repository for the large amount of useful genomic data wrangled by our customers. Warehouse not only solves the issue of data storage for ever-increasing genomic content, but is also fully queryable and auditable, and allows user access to be defined for project managers or collaborators. In tandem with this, VSPipeline, which will be a large part of today's discussion, allows for the automated execution of routine workflows, further optimizing users' abilities to handle large amounts of data and throughput.
Lastly, our research platform, SVS, enables researchers to perform complex analysis and visualizations on genomic and phenotypic data. SVS has a range of tools to perform GWAS, genomic prediction, and RNA-Seq analysis, among other common research applications.
Our software has been very well received by the industry. We have been cited in thousands of peer-reviewed publications, and that’s a testament to our customer base.
We work with over 400 organizations all over the globe. This includes top-tier institutions like Stanford and Yale, government organizations like the NCI and NIH, clinics such as Sick Kids, and many other genetic testing labs. We now have well over 20,000 installs of our products, with thousands of unique users.
So how is this relevant to you?
At Golden Helix, we focus on the seven pillars of customer success. Golden Helix offers a single software solution that encompasses germline, somatic, and CNV analysis. Our software is also highly scalable, supporting gene panel to whole genome sequencing workflows. With our complete automation capabilities, we now offer a FASTQ or VCF to report pipeline. Our software can be locally deployed or installed in the cloud, and our business model of annual subscription per user means you are able to increase your workload without increasing analysis fees. And it goes without saying that our FAS team is here to support you on your analysis journey.
Thank you to everyone who is in attendance today and thank you Rana for including me in this presentation. Rana and I are members of the Golden Helix support team that serves to assist with the workflow construction that we are going to highlight today.
Let's start with a bird's-eye view of an NGS clinical workflow and explore how VarSeq fits in. When validating a workflow, it is important to plan with the beginning and end in mind: starting from sample collection and primary analysis to get your samples sequenced, then running through the secondary stage, which handles alignment and variant calling, and lastly through the tertiary stage paired with data warehousing. VarSeq mainly encompasses the tertiary analysis steps of filtering, annotation, interpretation and result reporting. However, its modular and flexible design makes it compatible with a variety of inputs coming from many secondary pipelines. Golden Helix software functions with all major sequencers, and our partnership with Sentieon allows users to establish industry-leading secondary analysis. Moreover, VarSeq tackles the issue of scalability quite well, allowing users to automate workflows for increasing sizes of datasets, from small gene panels to the increasingly affordable genome. For this webcast, we will be focusing on key points of validating the tertiary analysis stage in VarSeq.
VarSeq facilitates handling of all your variant types for both somatic and germline analysis. The utility of the software can be broken into stages. The first is the import of your SNVs/indels, CNVs and fusions, which are then passed through a user-defined variant filter coupled with many annotations and algorithms to isolate the clinically relevant variant. These filters and the project structure are saved as templates to facilitate automation with our VSPipeline command line tool. Once the clinically relevant variant is isolated, it moves into stage 2, VSClinical, which serves as the interpretation hub to collect all relevant evidence for germline or somatic variants via the ACMG and AMP guidelines. Once the variant is evaluated, it is saved locally in a user database and carried into the final report stage. You'll learn today that the reporting feature comes with quite expansive options for the user to customize, but overall, think of VarSeq as the one software suite that handles everything from full import of all variants to isolating and reporting the clinically relevant ones. So now that you have a high-level understanding of the tool's purpose, let's move into discussing today's topic.
So if you've attended our webcasts in the past, you know that we typically give a demonstration of the software. Today is no exception; however, we're going to keep the discussion high level and specifically related to the process of NGS workflow validation. One reason Rana and I were excited to discuss this topic is that we get a lot of hands-on experience helping our users develop their workflows. Prior to any workflow construction, the user will always benefit from having established controls and an early framework for the desired outputs of the NGS workflow, so keep reports in mind for this topic. The initial push in our NGS platform VarSeq is to develop the ideal filtering template that can be used routinely across all samples, and we're going to cover some filtering recommendations you may align with. Additionally, keep in mind that this conversation is centered around the early exposure to the software; once the workflow is established, it is all highly automatable, which quickly becomes a necessity for our busy customers. So let's take a second to discuss adequate controls.
Proper validation begins with well characterized sample controls. One important distinction to make is the purpose of the control. From our experience, users come to us with two types. One being a mock sample or database of known pathogenic variants which are great for testing against our algorithms when benchmarking. However, they fall short when it comes to testing the application of the NGS workflow for real samples. To validate your variant filtration strategy, the more practical controls are real-world data or manufactured samples made to mimic samples in your pipeline with established variants.
It will also be important to determine early on the number of samples that will give enough power to validate your assay, as you will need enough to cover the spectrum of variant types.
For example, for the GHI CNV caller, at least 30 germline controls are needed with a read depth of about 100x for panels, and these must be done with the same library prep and sequencing method.
Beyond just CNVs, you may need to run hundreds of samples, including biological or technical replicates, for robustness. VarSeq allows you to handle the spectrum of variant types and does not limit the user to any specific sample type, whether that's blood, tumor or FFPE, just as long as you can get a VCF.
When sourcing controls, users are not limited to any specific vendor but the most useful types of samples will be well characterized truth sets that offer biological samples taking the user through the entire primary, secondary and tertiary process.
From the support realm, one example source for control samples is Horizon Molecular reference standards (mainly for cancer, but they appear to have some germline options). These samples will mimic patient material from sample prep to downstream analysis regardless of the platform and variant type. Ideally, the user approaches the tertiary stage with a collection of adequate controls with established pathogenic variants for both germline and somatic workflows. Then in the tertiary stage, we can easily develop filters that ensure capture of these variants even when running against a whole genome sample. We will illustrate the potential NGS assay in our tertiary tool VarSeq shortly, but first we will give an overview of a hypothetical NGS assay validation process.
Here is an example process overview. We typically initiate this process with installation of all necessary tools, in this case our software VarSeq, and ensure all system requirements are met and all necessary users have adequate permissions. Next is the crucial discussion on output formats: not only deciding what file type may be generated, but also getting an early framework for the report format, for example custom work on a Word or PDF template generated by the lab. Then comes the user's due diligence in building the workflow to filter and prioritize variants. This phase is paired with the available control samples we previously discussed to construct and test the efficacy of the variant filter. Once built, the assay will likely undergo peer review, which may result in some revisions to the workflow, but it is subsequently locked down and verified for automation. Along with this verification comes standardization of the necessary protocols, training of all potential users, and finally full approval prior to the institution's go-live date. Obviously this is meant to be a high-level representation of the process for the purpose of expediting its implementation, and we realize things may not happen exactly in this order. One technically rich area is phase 3, building the assay, and this is something the Golden Helix support staff is more than comfortable with. Let's review the key components of constructing your dream NGS workflow.
Yes, workflow design is one of our favorite training topics to go through with our customers, so don't hesitate to reach out if you would like some guidance.
VarSeq is a great variant prioritization tool, as it gives the user full control over the filters and algorithms that are applied and allows the user to define the ideal workflow for their use case. To expedite this stage we provide prebuilt filtering templates, and we also lend our expertise to each user as they build their preferred templates. These workflows are meant to be sensitive yet effective, retaining clinically relevant variants while excluding uninteresting or benign variants. Fortunately, many of these workflows leverage routine strategies that we can break down here:
Filter first on variant quality fields from your VCF. In this case VarSeq is secondary pipeline agnostic, meaning you can import VCFs from any caller as long as they meet standard VCF format. Our templates provide recommendations, but the cutoffs ultimately depend on your sample quality and lab testing SOPs, or will be informed by best practices within your field.
Then we typically filter on population alt allele frequency, where the user can set a cutoff of 5% or less following the ACMG standard, or even 1% or less, which is commonly used.
Next, our ontology filters allow you to capture variants that may impact the function of the gene's protein product. For example, keep all missense, LOF and potential splice variants.
In a workflow like this, a virtual panel or phenotype will be useful for focusing on sample-specific disorders and will help identify the subset of variants most relevant to an individual. For phenotypic prioritization we have our PhoRank algorithm, which we will discuss in more detail shortly.
One of the most powerful strategies for variant prioritization in VarSeq is our group of variant classifier algorithms. For germline variants we provide automated ACMG guideline-based classification, and we would like to give you a teaser: our cancer classifier for somatic variants is going to be released later this year.
Applying classifiers is a powerful way to zero in on the most relevant pathogenic or oncogenic variants in your sample. These can be edited to accommodate varying founder populations or perhaps the inclusion of additional consortium databases with known variant classifications.
When testing a workflow, a pro tip is to use our variant flags to assess the accuracy of a filter. I will demonstrate an example of this when we get into the software.
Another critical part of the workflow validation process is early designation of the report outputs. VarSeq gives users the option to generate reports in Microsoft Word or PDF format, or even as a machine-readable JSON file for their existing LIMS. Overall there are a lot of options for how the report gets generated in VarSeq; however, it is also critical to define the scope of the evaluation being done in the software. For example, are you focused on primary findings only, or perhaps on including secondary germline findings in your somatic workflow, or even incidental findings for your germline analysis? We seek to define the scope of evaluations to be carried out in VarSeq, then establish what customization efforts are required to make the ideal document format for your pipeline. This is one of the strongest value points of the VarSeq software, in that it gives users full control of the customized format of the report for their lab.
Most of our demo will be focused on single samples, but we just want to make you aware that we have strategies for multiple relationship structures. The same filters largely apply, but we fit the inheritance models below the common piece.
When designing an initial workflow, it is important to keep in mind the relationships between samples in a project: are they individuals of the same family, matched tumor and normal from the same patient, or groups of affected vs unaffected samples in a case/control study? VarSeq facilitates defining these relationships, and we specifically want to highlight family-based analysis.
It can be quite powerful to leverage family relationships and shared phenotypes or disorders for identifying disease causing variants.
There are pros and cons to family NGS workflows that will need to be taken into account when validating such workflows. Two examples of inheritance or family-based analyses are carrier risk analysis and trios (a broad term that encapsulates duos and extended family analysis as well). Carrier risk analysis can facilitate early investigations into disease risk for family planning, but a downside is that reporting a list of high-risk alleles requires very delicate handling and careful reporting language.
With regard to trio analysis, it is a highly efficient strategy for tracking down disease variants, but it needs data from multiple family members, which is limited by sample availability and adds cost.
Another aspect of designing an efficient filter is leveraging the patient's phenotype. Often a user will have a list of genes of interest for a well-defined panel. However, we are seeing more and more commonly that a filter should be designed to accommodate any unique phenotype on a case-by-case basis. This can be easily set up in VarSeq by use of our PhoRank algorithm, which will expand variant capture beyond the rigidity of a panel. A downside is that the user may need to do further research to validate novel genes that fall outside the panel and ensure that the relationship to the patient's disorder is truly solid. VarSeq gives the user the best of both worlds, with a virtual gene panel approach or phenotypic matching via PhoRank.
What you don’t filter on, you may want to display in the variant table and/or GB.
Another goal for today’s discussion is to address hypothetical workflow scenarios for labs running germline and somatic analysis. We will be exploring three scenarios in our demo based on real-world experience with GHI customers.
One scenario is a lab running a CGP kit like the TSO500 panel from Illumina on a large number of samples. There is a strong initial focus on building a filter for the various variant types, including SNVs, indels, CNVs, and fusions, but it also involves parallel filters to capture different categories of variant priority.
For example, one filter can facilitate the rapid processing of variants in the top priority list, while users may develop other workflows that best handle lower priority matches or even expedite the capture of VUS for future reassessment if not presently reportable. One crucial tool that plays a role in this reassessment of VUS is VSWarehouse, which we will also present today. We will demonstrate some example germline workflows as well.
Keep in mind that the scope of this conversation is the early exposure to the software, but we can save these workflows as templates, which I will show.
I will demonstrate a couple of VarSeq projects. One handles genome-level data with a filter designed to capture pathogenic variants under a standard diagnostic workflow channeled through a designated panel for cardiomyopathy, and also shows how the user can utilize parallel filter logic to capture phenotypically ranked variants and construct a secondary findings filter. A second project will be dedicated to the quality assessment of CNV calling with VarSeq: first reviewing the sample coverage quality and overall comparison, then comparing called CNVs to a known truth set to ensure accuracy of the caller.
Before wrapping up, we'd like to again state our appreciation for the grants included here. And with that, I'll hand things back to Casey to talk about some exciting marketing updates and take us through a Q&A session.
Again, I want to mention how grateful we are for grants such as these, which support the advancement and development of our software to create the high-quality software you'll see today.