Combined Impact! New Tools to Assess
Complex and Compound Heterozygous
Variants with VarSeq
September 18, 2024
Presented by: Julia Love, Associate Director of Product & Quality
Jennifer Dankoff, PhD, Field Application Scientist
NIH Grant Funding Acknowledgments
• Research reported in this publication was supported by the National Institute of General Medical Sciences of
the National Institutes of Health under:
o Award Number R43GM128485-01
o Award Number R43GM128485-02
o Award Number 2R44 GM125432-01
o Award Number 2R44 GM125432-02
o Montana SBIR/STTR Matching Funds Program Grant Agreement Number 19-51-RCSBIR-005
o NIH SBIR Grant 1R43HG013456-01
• PI is Dr. Andreas Scherer, CEO of Golden Helix.
• The content is solely the responsibility of the authors and does not necessarily represent the official views of
the National Institutes of Health.
Golden Helix at-a-Glance
Company Snapshot: Leading SaaS provider of tertiary genomic analysis solutions for NGS labs
Golden Helix is a SaaS bioinformatics solution provider specializing in next-gen sequencing
(“NGS”) data analysis

The Company’s software enables automated workflows and variant analysis for gene panels,
exomes, and whole genomes

Key Clinical Applications
Prenatal testing
Hereditary disease testing
Reproductive testing
Oncology
Marquee Global Clients
Golden Helix’s solutions allow clients to increase throughput, ensure consistent quality,
maximize revenue, and save time

Company Founded: 1998
Headquarters: Bozeman, Montana
Recognitions
Government Research
Pharmaceuticals
Agrigenomics
Testing Labs
Translational Labs
Human Genetics Research
Hospitals
Academia
Publications
Content & Resources
Pharmacogenetics testing
NGS Clinical Workflow
Golden Helix provides comprehensive data analytics software that scales across gene panels, whole exomes, and whole genomes
Primary: DNA Extraction in Wet Lab and Sequence Generation
Secondary: Read Processing and Quality Filtering; Alignment and Variant Calling
*Golden Helix provides Secondary Analysis through a reseller agreement
Tertiary: Interpretation and Result Reporting (Golden Helix’s software and primary focus)
Comprehensive secondary and tertiary analysis solutions for primary data aggregated by all commercially available sequencers
Type            Size
Gene Panel      Small (100MB)
Whole Exome     Medium (1GB)
Whole Genome    Large (100GB)
Cancer use case
Hereditary use case
Process Analysis
… and scales across multiple data set sizes for cancer and hereditary use cases
Filtering and Annotation
Data Warehousing
Workflow Automation
Golden Helix works with all major sequencers…
Medical Device Certification
Secured CE Mark for EU
• VarSeq Dx
• VarSeq Dx is designed with compliance and reliability for your
clinical analysis.
• VarSeq Dx is our flagship software, VarSeq, that is CE marked
to meet the European In Vitro Diagnostic Regulation (IVDR
2017/746) requirements. VarSeq Dx satisfies the IVDR
requirements within the European Economic Area (EEA).
• Verification
• CE MARK
• ISO Certification
• Our customers will work with our Field Application Scientist to
verify the installation and ensure proper usage of the
software. This can be used for ISO QMS software validation
documentation.
Recent webcasts
• Pharmacogenomics
https://www.goldenhelix.com/resources/webcasts/pgx-analysis-in-varseq-a-users-perspective/index.html
https://www.goldenhelix.com/resources/webcasts/introducing-vspgx-pharmacogenomics-testing-in-varseq/index.html
• VarSeq Dx – Medical device certification in Europe
https://www.goldenhelix.com/resources/webcasts/introducing-varseq-dx-as-a-medical-device-in-the-european-union/index.html
• Integrating Long and Short Read Sequencing for Comprehensive NGS Analysis
https://www.goldenhelix.com/resources/webcasts/integrating-long-and-short-read-sequencing-for-comprehensive-NGS-analysis/
Topics for today
• Overview of new VarSeq tools and algorithms for analyzing the following variant types:
• Complex variants and corresponding allelic primitives, collapsed phased variants, their combined
impact, and the compound het condition.
• Small variant and CNV compound het status in trios.
• A look at the current state of Methylation analysis with the VarSeq software
Introducing the need for more complex variant tools
New capabilities in VarSeq v.2.6.2
• Advanced sequencing technologies are producing massive amounts of data, opening up avenues for more diverse complex variant analysis.
• Genetic variants come in many shapes, sizes, and structures.
• Contextualizing variants and complex variant interactions enables more thorough classifications.
• These contribute to improved patient treatment options.
Long-Read Sequencing: The Wholistic NGS Test
• Clinically relevant properties of long-read sequencing:
• Better precision and accuracy with lower coverage across the
whole genome. This is particularly helpful for identifying very rare
variants with high confidence.
• Improved calling of Structural Variants and CNVs.
• Higher yield capturing regulatory sequences (UTRs),
pseudogenes (SMN1, SMN2), centromeres, Alu elements (SINE),
short tandem repeats, LINE1 elements, and long repeats.
• Other improvements include
• Ability to provide methylation calls.
• Genotype phasing allows compound het analysis to be
completed on a single proband.
PacBio read length histogram
https://www.pacb.com/technology/hifi-sequencing/how-it-works/
2023 “Method of the Year” – Nature Methods journal
Phasing 101
Unphased example:
Chr17:56350190 G>A Likely Pathogenic GT: 0/1
Chr17:56350196 G>A Pathogenic GT: 0/1
Unclear whether the two variants are compound het.
Phased example:
Chr17:56350190 G>A Likely Pathogenic GT: 0|1
Chr17:56350196 G>A Pathogenic GT: 1|0
Both copies of the gene are affected.
• Phasing is the ability to infer haplotypes from genotype data.
• A phased variant will be shown as 0|1 or 1|0.
• Phasing information can come from short reads or long reads.
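The difference between the unphased and phased examples above can be sketched in a few lines of Python. This is only an illustration of how VCF-style GT strings are read, not VarSeq's implementation; the function names `parse_gt` and `same_haplotype` are hypothetical, and the sketch assumes both calls belong to the same phase set.

```python
def parse_gt(gt):
    """Split a VCF-style GT string into (alleles, is_phased)."""
    phased = "|" in gt
    alleles = tuple(gt.split("|" if phased else "/"))
    return alleles, phased


def same_haplotype(gt_a, gt_b):
    """Return True if both ALT alleles sit on the same haplotype,
    False if they sit on opposite haplotypes, and None when either
    call is unphased (the arrangement cannot be determined).
    Assumption: both calls come from the same phase set."""
    a, a_phased = parse_gt(gt_a)
    b, b_phased = parse_gt(gt_b)
    if not (a_phased and b_phased):
        return None
    # Compare haplotype by haplotype: is ALT present on both at once?
    return any(x not in ("0", ".") and y not in ("0", ".")
               for x, y in zip(a, b))


# The two slide examples:
print(same_haplotype("0/1", "0/1"))  # unphased pair
print(same_haplotype("0|1", "1|0"))  # phased pair on opposite copies
```

With unphased genotypes the function cannot answer, which is the "unclear" case on the slide; with 0|1 and 1|0 the ALT alleles are on opposite copies.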
Case study:
Singleton Analysis
• Analysis of complex variants in an immunocompromised individual.
• Examples:
o Likely Pathogenic Multiallelic Variant in HLA-DRB5
o Merged In-Phase Variants in NOD2
o Evaluate Phased Compound Het Variants in the recessive IL10RA gene
Introducing the Complex Variant Table
• Analyze complex variant calls alongside variants split into allelic primitives.
• Annotations and algorithms can be applied to both tables.
• Set up independent filter strategies for SNVs and MNVs.
• Analyze variants in VSClinical in either configuration.
• Example:
• A complex variant in HLA-DRB5 that is Benign when split into allelic primitives but Likely Pathogenic
when conserved as a complex variant.
New capability in VarSeq v.2.6.2
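To make "split into allelic primitives" concrete, here is a minimal Python sketch that decomposes an equal-length complex substitution (an MNV) into its single-base changes. It is an assumption-laden illustration, not the algorithm VarSeq uses: the function name `split_into_primitives` is hypothetical, the coordinates in the usage line are made up, and indel-containing complex variants (which need alignment) are deliberately out of scope.

```python
def split_into_primitives(chrom, pos, ref, alt):
    """Decompose an equal-length REF/ALT substitution into SNVs.

    Returns one (chrom, pos, ref_base, alt_base) tuple per position
    where the two alleles differ. Positions that match are skipped.
    """
    if len(ref) != len(alt):
        raise ValueError("this sketch only handles equal-length REF/ALT")
    return [
        (chrom, pos + offset, r, a)
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


# Hypothetical MNV: ACG>TCA at an arbitrary position.
for snv in split_into_primitives("chr6", 100, "ACG", "TCA"):
    print(snv)
```

Each primitive can change a different codon position on its own, which is why the split records and the intact complex variant may receive different classifications, as in the HLA-DRB5 example.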
Introducing Collapsed Phased Variants
• Merges variants that are in-phase and within a certain distance
threshold together.
• Identify groups of variants and their compound effect.
• Can be used for both short and long read phased data.
• Annotate and evaluate collapsed variants in VSClinical.
• Example:
• Two variants in-phase are collapsed together, resulting in a Likely Pathogenic
classification for the Dominant NOD2 gene.
New Algorithm in VarSeq v.2.6.2
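The grouping step described above (in-phase and within a distance threshold) can be sketched as follows. This is not VarSeq's algorithm: the `PhasedVariant` record, the `group_in_phase` function, and the default threshold of 10 bases are all assumptions chosen for illustration, and the sketch only groups variants rather than building the merged allele, which would require the reference sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhasedVariant:
    chrom: str
    pos: int
    ref: str
    alt: str
    phase_set: int
    haplotype: int  # 0 or 1: which side of the "|" carries ALT


def group_in_phase(variants, max_distance=10):
    """Group variants sharing chromosome, phase set, and haplotype
    when each is within max_distance of the previous group member.
    Only groups with two or more members are returned."""
    groups = []
    open_groups = {}  # (chrom, phase_set, haplotype) -> latest group
    for v in sorted(variants, key=lambda v: (v.chrom, v.pos)):
        key = (v.chrom, v.phase_set, v.haplotype)
        current = open_groups.get(key)
        if current and v.pos - current[-1].pos <= max_distance:
            current.append(v)
        else:
            current = [v]
            open_groups[key] = current
            groups.append(current)
    return [g for g in groups if len(g) > 1]


# Hypothetical calls: two nearby variants on haplotype 1, one on haplotype 0.
calls = [
    PhasedVariant("chr16", 100, "C", "T", 1, 1),
    PhasedVariant("chr16", 103, "A", "G", 1, 0),
    PhasedVariant("chr16", 105, "G", "A", 1, 1),
]
print(group_in_phase(calls))
```

Keying the open groups by haplotype means a variant on the other copy that falls between two in-phase variants does not break the group.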
Introducing Phased Compound Het Detection
• Identify potential compound heterozygous variants in a single sample analysis!
• For variants to be considered compound het from a single sample the following
need to be true:
• The variants are in the same gene
• The variants are in the same phase set
• The variants are not in phase
• Example:
• A Likely Pathogenic variant and a Pathogenic variant in the recessive IL10RA gene are not
in phase and are found on opposite chromosome copies. This results in a compound het
status.
New Algorithm in VarSeq v.2.6.2
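The three conditions listed above translate almost directly into code. The following Python sketch is an illustration under stated assumptions rather than the VarSeq algorithm itself: the `Call` record, the function names, and the positions and phase-set IDs in the usage example are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional


@dataclass(frozen=True)
class Call:
    gene: str
    pos: int
    gt: str                    # VCF-style genotype, e.g. "0|1"
    phase_set: Optional[int]   # None when the call is unphased


def alt_haplotypes(gt):
    """Indices of haplotypes carrying a non-reference allele,
    or None if the genotype is unphased."""
    if "|" not in gt:
        return None
    return frozenset(i for i, allele in enumerate(gt.split("|"))
                     if allele not in ("0", "."))


def is_compound_het(a, b):
    """Same gene, same phase set, and ALT alleles on different copies."""
    ha, hb = alt_haplotypes(a.gt), alt_haplotypes(b.gt)
    if not ha or not hb:  # unphased, or no ALT allele present
        return False
    return (a.gene == b.gene
            and a.phase_set is not None
            and a.phase_set == b.phase_set
            and ha.isdisjoint(hb))


def compound_het_pairs(calls):
    """All pairs of calls that satisfy the three conditions."""
    return [(a, b) for a, b in combinations(calls, 2)
            if is_compound_het(a, b)]


# Hypothetical IL10RA calls in one phase set.
lp = Call("IL10RA", 100, "0|1", 7)
p = Call("IL10RA", 250, "1|0", 7)
print(is_compound_het(lp, p))
```

Requiring a shared phase set matters because haplotype labels (left or right of the "|") are only comparable within one phase set.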
Case Study:
Trio Analysis
• Analysis of compound SNP and CNV
variants in BRCA2 and breast cancer risk.
• Heterozygous deletion of exons 2-8 of
BRCA2 inherited from the father.
• Pathogenic NM_000059.4:c.3812C>G
variant in exon 11 inherited from the
mother.
• Compound het for Pathogenic variants
found in the proband.
Introducing the CNV Variant Compound Het Algorithm
• Compound heterozygosity can occur when a CNV and an SNV affect opposite copies of the same gene.
• This is an essential analysis for autosomal recessive disorders.
• Identifying compound het SNVs and CNVs aids the understanding of disease diagnosis and disease mechanisms.
New Algorithm in VarSeq v.2.6.2
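In a trio, the opposite-copy question can be answered from inheritance instead of read-based phasing. The sketch below shows the idea in Python; it is not the VarSeq algorithm. The `ProbandEvent` record and function names are hypothetical, and every event passed in is assumed to be heterozygous in the proband.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbandEvent:
    gene: str
    kind: str        # "SNV" or "CNV"
    label: str
    in_father: bool  # event also observed in the father
    in_mother: bool  # event also observed in the mother


def parental_origin(event):
    """'father' or 'mother' when exactly one parent carries the
    event; None when origin is ambiguous (both or neither)."""
    if event.in_father and not event.in_mother:
        return "father"
    if event.in_mother and not event.in_father:
        return "mother"
    return None


def compound_het_genes(events):
    """Genes with at least one paternal and one maternal event."""
    origins = {}
    for event in events:
        origin = parental_origin(event)
        if origin is not None:
            origins.setdefault(event.gene, set()).add(origin)
    return sorted(g for g, o in origins.items()
                  if o == {"father", "mother"})


# The BRCA2 case study: a paternal deletion plus a maternal SNV.
trio = [
    ProbandEvent("BRCA2", "CNV", "het deletion of exons 2-8", True, False),
    ProbandEvent("BRCA2", "SNV", "NM_000059.4:c.3812C>G", False, True),
]
print(compound_het_genes(trio))
```

Events seen in both parents or in neither are skipped, since their parental origin cannot be assigned from presence or absence alone.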
What is Methylation?
5-mC
CpG
From ThermoFisher Sci
• DNA methylation is a chemical epigenetic modification of the DNA sequence.
• 5-methyl-cytosine is considered the ‘5th base’ because of its unique properties.
• DNA methylation, especially in CpG islands, is a cellular mechanism used for precise regulation of gene expression.
• In cancer, there is a departure from this state.
• Hypermethylation of CpG islands is linked to gene silencing in tumor suppressor genes.
• Hypomethylation is associated with genomic instability.
A Sneak Peek at an Up-and-Coming Research Field
Up-and-Coming Clinical Applications of
DNA Methylation in Cancer
https://cancerci.biomedcentral.com/articles/10.1186/s12935-023-03074-7
• Research is showing that there are several methylation patterns which are relevant to a broad range of cancers.
• Lung cancer (LC), colorectal cancer (CRC), gastric cancer (GC), hepatocellular carcinoma (HCC), and esophageal cancer (EC).
• Using cfDNA in blood/serum to detect the hypermethylation status of certain TSGs is useful as an early diagnostic marker for several cancers as well as a prognostic indicator.
• VarSeq acts as your research assistant, importing long-read-derived methylation data for viewing and filtering.
A Sneak Peek at an Evolving Research Field
Current VarSeq Capabilities with Methylation Data
• In VarSeq you can import differentially methylated regions files.
• In your project, plot methylated regions in GenomeBrowse.
• Filter the methylation region table for differentially methylated regions.
• Annotate with custom databases.
• MethMarkerDB to be released later in 2024!
• Make your own interpretations in CancerKB for clinical reporting.
• Reach out to Golden Helix for more information about your workflow!
A Sneak Peek at an Evolving Research Field
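As a rough picture of what filtering a methylation region table involves, the Python sketch below keeps regions whose methylation difference is large and statistically supported. Everything specific here is an assumption for illustration only: the column names (`mean_diff`, `q_value`), the thresholds, and the example rows are invented and do not describe the file format or filter logic VarSeq uses.

```python
import csv
import io


def filter_dmrs(rows, min_abs_diff=0.25, max_q=0.05):
    """Keep rows whose absolute methylation difference is at least
    min_abs_diff and whose q-value is at most max_q."""
    kept = []
    for row in rows:
        diff = float(row["mean_diff"])
        q_value = float(row["q_value"])
        if abs(diff) >= min_abs_diff and q_value <= max_q:
            kept.append(row)
    return kept


# Made-up example table (tab-separated), not real data.
EXAMPLE_TSV = (
    "chrom\tstart\tend\tmean_diff\tq_value\n"
    "chr1\t1000\t1500\t0.40\t0.01\n"
    "chr1\t9000\t9400\t0.05\t0.001\n"
    "chr2\t200\t800\t-0.35\t0.20\n"
)

rows = list(csv.DictReader(io.StringIO(EXAMPLE_TSV), delimiter="\t"))
for row in filter_dmrs(rows):
    print(row["chrom"], row["start"], row["end"], row["mean_diff"])
```

Taking the absolute value of the difference keeps both hypermethylated and hypomethylated regions, which matches the two directions discussed on the earlier methylation slide.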
Product Demo
In Summary
• Higher resolution sequencing has brought about a need for more
sophisticated complex variant analysis tools.
• Our VarSeq v2.6.2 introduces a number of tools to analyze:
• Complex Variants
• In-Phase Variants
• Compound Het Variants not In-Phase
• SNV and CNV Compound Het in Trios
• We are also developing new tools for the up-and-coming field of
methylation analysis.
• In VarSeq you can import your DMR long-read data
• Plot and filter
• Create custom biomarkers to import into reports
• Reach out to Golden Helix if you would like to talk more about your
methylation analysis needs.
eBook Library
• Prenatal Genetics – Learn the Following
o Existing approaches to prenatal WES, along with clinical indications for its use
o How VarSeq and VSClinical can be used for prenatal WES analysis
o A few interesting cases of variants and their classifications
• Pharmacogenetics – Learn the Following
o Foundations of Pharmacogenomics
o Genetic variability and drug response
o Pharmacogenomic test reporting nomenclature and
terminology
o The Pharmacogenomic eco-system
o VSPGx - A pharmacogenomics application


Editor's Notes

  • #1 Casey’s intro
  • #3 Casey’s intro
  • #4 To get started today, I want to first express our appreciation for our grant funding from the NIH. The research and development efforts for a number of our software capabilities have been supported by the National Institute of General Medical Sciences of the National Institutes of Health under the listed awards, as well as local grant funding from the state of Montana. Our PI is Dr. Andreas Scherer, who is also the CEO of Golden Helix. I must mention here that the content described today is the responsibility of the authors and does not officially represent the views of the NIH.
  • #5 Before diving into the topic of today’s webcast, I would like to take a moment to give our attendees, especially those who may be new to Golden Helix, a brief introduction to our company. Golden Helix is a bioinformatics software company based out of Bozeman, Montana, that has been serving customers all over the globe for over 25 years. We began by providing research-focused software for array-based analysis but early on shifted our focus to next-generation sequencing applications, and we have emerged as market leaders in the NGS space. Our focus now is to provide high-quality bioinformatic software that specifically enables our customers to conduct routine clinical applications for their NGS analyses. Our tertiary software solutions are scalable for routine work with gene panels all the way to whole genomes and are highly automatable to facilitate high-throughput operations where large numbers of samples are being processed. This, combined with our subscription-based business model, allows users to freely process an unlimited number of samples as needed, without the concerns about scaling costs that would be experienced with most per-sample applications on the market. The assays designed in our software are flexible and user refined, and span a wide spectrum of applications, including somatic workflows for oncology-based analyses, germline workflows for hereditary cancer, inherited and rare diseases, prenatal testing, carrier screening analysis, family-based analysis, and of course our recently added pharmacogenomics analysis. Taking advantage of these capabilities is our widespread global customer base: our users span government and testing labs, hospitals, universities, and many research and pharmaceutical labs.
Our communication with our customers informs our software development as we aim to stay abreast of the most important features to develop, edge cases and different data types that we can support, and through this partnership our software has been regularly cited in reputable scientific journals, which is a testament to the work of our customer base.
  • #6 Now that we have discussed some use cases for our software, let’s review where our tools fit into the bigger picture of an NGS workflow. Generally speaking, the NGS workflow is divided into three stages, where primary analysis encompasses everything from sample collection to sequencing, secondary analysis describes the processes for read alignment and variant calling, and the tertiary stage is where variant evaluation and reporting take place. VarSeq is a tertiary analysis tool that is designed to be agnostic to upstream sequencing platforms and secondary analysis pipelines, which means we accept NGS variant calling and alignment files from the various platforms and pipelines that are commonly used, provided that these adhere to standard VCF and alignment file formats. These upstream pipelines include Illumina and ThermoFisher and some of the emerging sources like MGI. Very pertinent to today’s topic, we also accommodate PacBio and Oxford Nanopore long-read technologies; in fact, we are an endorsed PacBio tertiary analysis solution. We also have a long-standing partnership with Sentieon to provide labs with a secondary analysis solution as needed. VarSeq is one of the few platforms that can handle the range of variant calling outputs from upstream analysis pipelines, tackling both short- and long-read data, and scaling from small targeted panels all the way to complete whole genomes to accommodate the large number of variants analyzed and the increase in computational and storage demands. The graphical user interface of VarSeq serves as the front end for annotation and filtering, as well as clinical interpretation and reporting for variants, CNVs, and fusions. However, we couple this GUI with our command-line-interface workflow automation tool, VSPipeline, for higher throughput processing for each component of tertiary analysis.
Lastly, we provide robust data warehousing solutions via VSWarehouse which serves as a repository for aggregating and storing variant frequency data from your own cohorts over time. Warehouse facilitates efficient data management and enables easy retrieval of variant assessments or interpretations that can be applied to a growing cohort of samples and is deployed locally in your environment to enable data security. To learn more about automation and warehousing with Golden Helix, or any of the components of our software, we encourage you to review the collection of webcasts hosted on our site.
  • #7 Each of the applications we just discussed has been diligently developed by our team here at Golden Helix, and we adhere to a highly structured and thoroughly documented manufacturing process. As a result of this commitment to quality, Golden Helix has been an ISO 13485 certified medical device manufacturer since January 2024, and VarSeq Dx has been a CE marked medical device under IVDR since April 2024. This certification holds significant value for laboratories seeking their own ISO certification and IVDR compliance, especially those within the EU or those processing European samples, as the software can more easily be incorporated into a lab’s quality management systems and processes. Our certification and continued adherence to a robust QMS assures reproducibility of our quality specifications and manuals, thus simplifying the validation process for any lab using current and future versions of our software. It is important to note that VarSeq is not CE marked for users by default. If a user desires to use VarSeq as a CE marked medical device, we have developed VarSeq Dx Mode, which is available in VarSeq 2.6.1 and all future versions. When implementing this feature, we have a certification process which users must complete, and our support staff is ready to guide you through our user onboarding, installation and verification process, and proficiency certification process as we move through the workflow validation process together.
  • #8 I invite you to review some of the latest developments to our software by checking out some recent presentations by members of our team here at Golden Helix. In the first half of this year we primarily focused on developing our pharmacogenomics analysis tool in VarSeq, for which we’ve had a number of webcasts. And as just discussed, we have a webcast highlighting the VarSeq Dx mode. We also recently presented on a co-published article with TWIST Bioscience for a whole exome based CNV project showing 100% concordance with known events detected by MLPA in a Coriell truth set. Stay tuned for more updates on our software stack throughout the year. For today’s presentation, we will be discussing using VarSeq to analyze complex variants and their combined effects in a single sample and a trio. In addition, we will briefly touch on methylation analysis within VarSeq, as it is an emerging topic of discussion.
  • #9 So now, regardless of the sequencing approach you take, VarSeq gives you a streamlined way to conduct tertiary analyses as efficiently as possible. VarSeq software provides the interface that can create comprehensive workflows on both short and long read data, and our partnership with Sentieon also has you covered on the secondary analysis front if needed. It’s simplest to separate VarSeq into three steps. Step 1 is to import the full list of variants: SNPs and indels, CNVs and SVs, and any STRs, from both long and short read pipelines. Our demo will show you a single sample project with these imported data types. The usual tools are available to annotate the variant list and build a filtering strategy to isolate the clinically relevant variant, and this filter strategy can be used for both short and long read data. We also allow you to see phasing and look at other items like methylated regions. The goal at the end of step 1 is a single variant, or a few selected variants, that are carried into the VSClinical interpretation hub in Step 2. VSClinical houses the automated ACMG and AMP guidelines for evaluation of your germline and somatic variants. Here the user will assess every available layer of evidence for a variant, draft and catalog comprehensive interpretations, and ultimately complete step 3, which is to render your customized clinical reports. We ship a number of example report templates, but know that users have a wide spectrum of report customization options, and reports are rendered with the click of a button. For today’s presentation, we’d like to highlight the capacity of VarSeq to handle the complexity of a comprehensive approach, taking both short and long read sequencing data with complex variants for both a singleton and a family analysis. We will show you how our new tools give more flexibility for a comprehensive variant analysis and how to bring this into a clinical report via VSClinical.
  • #10 Now, let’s jump into today’s webcast topic by discussing the importance of complex variant analysis and why the need for complex variant analysis tools is becoming more and more important, particularly in the genomics, bioinformatics, and personalized medicine fields. There are several factors that contribute to this need. With sequencing technologies advancing, massive amounts of data are being produced, and sophisticated algorithms are required to handle this data efficiently and accurately. As we all know, genetic variants come in many forms and all require different analytical approaches. Single nucleotide polymorphisms and small insertions and deletions call for a simpler, more straightforward approach to variant analysis, such as determining the frequency of the variant in the population, identifying genotype-phenotype associations, and understanding the functional impact of the variant based on gene function and location within a gene. However, complex variants such as copy number variations, structural variants, complex indels, and multi-allelic variants require intricate data interpretation and more complex strategies for data analysis. Analyzing complex variants often requires WGS, ideally including variant phase information, to detect and analyze the complex variant’s effect and functional consequences. To add to the complexity of complex variant analysis, we know that genetic variants do not exist in isolation. Their effects can be influenced by factors such as genetic background, gene interactions, and environmental factors. Importantly, genetic variants often interact with one another in ways that can influence disease risk and phenotype. Complex tools can incorporate multi-dimensional data and contextual information, providing a more nuanced understanding of the variant’s impact. This understanding leads to treatments and interventions that are tailored based on an individual’s genetic profile.
To tailor these effectively, it’s necessary to understand how multiple genetic variants interact to affect disease risk and drug response. Tools that analyze compound effects can provide insights into how different variants together might impact an individual's health and treatment outcomes. In research, understanding the compound effects of genetic variants can improve disease models, making them more reflective of human biology. This can enhance the development of new therapies and improve our understanding of disease mechanisms.
  • #11 Now that we have established the importance of complex variant analysis, I’d like to introduce one of the key advancements in sequencing technology that will come up a few times during the presentation today, and that’s long-read sequencing. There is whole genome sequencing, and then there is long-read sequencing. Long-read sequencing is having its moment in the context of NGS analysis methodology, being named Method of the Year 2022 by Nature Methods. This technology has enabled large scale projects such as the T2T consortium. So what exactly are we getting out of long-read technology in terms of variant detection? What makes long-read sequencing a technological breakthrough is improved read quality and high confidence in calling structural variants and CNVs. A large part of this is the fact that the reads are just longer, allowing you to sequence through those breakpoints and breakends to capture a more accurate picture of structural genomic variation. There are other benefits of long-read technology as well. In general, long-read technology enables better precision and accuracy with lower coverage, and is particularly beneficial in difficult-to-capture regions. In the context of this webcast, we have the ability to see phased genotypes, which is an asset for detecting compound heterozygous variants in a single proband. We are also able to detect methylated regions, and we will show how these can be plotted in VarSeq later in this presentation. Overall, long-read sequencing brings a number of data types to the table, and VarSeq has the tools needed to analyze these data types. VarSeq is able to import long-read sequencing data from our partners over at PacBio, and for our friends using ONT data, don’t worry, we support the import of that sequencing data as well. Now let’s take a look at how phasing plays into variant analysis.
  • #12 Variant phasing information provides an inference about whether multiple nearby variants sit on the same sequencing read, and therefore whether those variants are on the same copy of the chromosome or not. Phasing information can also come from analyzing overlapping sequencing reads. Variant phasing is represented in the genotype field of the VCF file, and phased genotypes can be identified by the use of the pipe symbol (|) instead of the forward slash (/). It is important to note that some variants will not have phase information, so their genotypes will be represented in the classical format, separated by the forward slash. Short reads can produce immediately adjacent ‘phased’ variants. Short-read data can also use some inference methods, but these are not as reliable as the phasing produced by long reads. Long reads can produce higher phasing definition across an entire gene, but can be quite expensive to perform. Let’s take a look at the example for the MPO gene and myeloperoxidase deficiency here on the right. Myeloperoxidase deficiency is an autosomal recessive disorder: if both copies of the MPO gene are affected, there is no functional activity from MPO and the immune system is compromised, whereas the immune system can still function as long as there is one good copy of the gene. First, when there is no phase information, we cannot know whether these two pathogenic variants are on the same copy of the gene or not. However, when we know these variants are part of the same phase set and out of phase with one another, we know that both copies of the gene are affected and MPO function is lost, resulting in myeloperoxidase deficiency.
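To make the pipe-versus-slash distinction concrete, here is a minimal Python sketch that reads a VCF genotype (GT) string and, for two heterozygous calls, reports whether they are in phase (cis), out of phase (trans), or undeterminable. The function names and the example phase set values are illustrative assumptions, not VarSeq functionality; only the GT separator convention and the idea of a shared phase set come from the discussion above.

```python
from typing import NamedTuple, Optional


class Genotype(NamedTuple):
    alleles: tuple   # allele indices; None stands for a missing call "."
    phased: bool     # True when the GT string uses the pipe separator


def parse_gt(gt: str) -> Genotype:
    """Parse a VCF GT string such as '0/1' (unphased) or '0|1' (phased)."""
    phased = "|" in gt
    sep = "|" if phased else "/"
    alleles = tuple(None if a == "." else int(a) for a in gt.split(sep))
    return Genotype(alleles, phased)


def configuration(gt_a: str, gt_b: str,
                  ps_a: Optional[str], ps_b: Optional[str]) -> str:
    """Classify two heterozygous calls as 'cis', 'trans', or 'unknown'."""
    a, b = parse_gt(gt_a), parse_gt(gt_b)
    # Phase is only comparable when both calls are phased and share a phase set.
    if not (a.phased and b.phased) or ps_a is None or ps_a != ps_b:
        return "unknown"
    # Which side of the pipe carries a non-reference allele?
    hap_a = {i for i, x in enumerate(a.alleles) if x not in (0, None)}
    hap_b = {i for i, x in enumerate(b.alleles) if x not in (0, None)}
    if len(hap_a) != 1 or len(hap_b) != 1:
        return "unknown"  # not a simple heterozygous call
    return "cis" if hap_a == hap_b else "trans"
```

In the MPO example, two pathogenic calls written as `0|1` and `1|0` within one phase set would come back as "trans", meaning both copies of the gene are affected, while two `0/1` calls would come back as "unknown".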
  • #13 OK, with that covered, we can now jump into the new complex variant analysis tools and support that VarSeq 2.6.2 has to offer! We will be looking at 2 case studies today. The first case we will look at is an immunocompromised single sample. This sample is long-read WGS, and we are going to make good use of its phased variants with two of our algorithms. We will look at three examples of different complex variants and their analysis within VarSeq. Each of these examples involves assessing the combined effect of multiple variants that would not be impactful if analyzed individually. Now let’s dive into our first complex variant.
  • #14 Previously, on import, a choice needed to be made: either split alleles into allelic primitives (single nucleotide variants) or keep variants in their multiallelic states. The recommendation was always to split alleles into allelic primitives on import, because variants are typically represented this way in annotation sources, so splitting gave better variant annotation. However, in order to still capture the context of the variant in multiallelic form, the same sample would have needed to be re-imported with the split allelic import option turned off. But now, instead of having to pick and choose between split or no-split, you can have both! Annotations and algorithms can be applied to both the SNV and the complex variant tables. Annotating single nucleotide variants will still, in general, be more informative and describe well the impact of that single variant. You might run into a few annotation sources that will have information on a complex variant, but in general multi-nucleotide variants are less frequently annotated than single nucleotide variants. Using the ability to filter the SNV table differently from the multi-nucleotide variants table will give the most holistic view of the variants and their impacts. Let’s take a look at the example shown here for the HLA-DRB5 gene. On the left we have the complex variants and on the right we have the variants split into allelic primitives. If we look at the output of both the ACMG Sample Classifier and our transcript annotation algorithm, we can see that the effects of the two variants that make up the complex variant are not very interesting individually: they are both benign. But when analyzed as a complex variant, we see that this variant is classified as Likely Pathogenic, which we would not have seen before unless we had imported the same project twice with different settings.
The complex variant table saves us that trouble, and allows flexibility for which variants we want to report with VSClinical.
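As a rough illustration of what splitting into allelic primitives means, the Python sketch below decomposes an equal-length multi-nucleotide substitution into its single-base changes. This is a simplified assumption for illustration only: the function name and coordinates are hypothetical, and VarSeq's actual import logic (including how it handles indels and multiallelic sites) is not described in this presentation.

```python
def split_to_primitives(pos: int, ref: str, alt: str):
    """Decompose an equal-length substitution into single-base changes.

    Returns a list of (position, ref_base, alt_base) tuples, one per
    mismatching base. Positions are in the same coordinate system as `pos`.
    """
    if len(ref) != len(alt):
        # Indels and complex length-changing events need alignment logic
        # that is out of scope for this sketch.
        raise ValueError("this sketch only handles equal-length REF/ALT")
    return [
        (pos + i, r, a)
        for i, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]
```

For a hypothetical record at position 100 with REF `ACG` and ALT `TCA`, this yields two primitives, at positions 100 and 102. Each primitive can then be annotated on its own, while the original record is what would live in a complex variant table.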
  • #15 Next we can take a look at our first new algorithm in VS 2.6.2, collapsed phased variants. This algorithm will merge variants that are in phase and within a certain distance threshold of one another. Now groups of in-phase variants can be analyzed together and we can see the overall compound effect. Earlier in the presentation I mentioned the advantages of long versus short read variant phasing, but I want to point out that this algorithm can work well with both sequencing strategies. Short-read phased data will work well to see multiple variants within a single amino acid, whereas with long-read data you can define a threshold to see the compound effect of variants that are in phase but farther apart. Similar to the complex variant table, the collapsed phased variants table can be annotated, and variants can be added to VSClinical in their collapsed state. Here we have an example of a phased variant in NOD2 where, individually, the variants are each missense variants. Together, we get a stop gained variant, which we would not have captured otherwise.
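The grouping step behind collapsing in-phase variants can be sketched in Python as follows. This is not the VarSeq implementation: the record layout, the `collapse_in_phase` name, and the default `max_distance` are assumptions made for illustration. The sketch only groups variants that share a phase set and haplotype and lie within the distance threshold of their neighbor; building the merged allele would additionally require the reference sequence between the variants.

```python
from collections import defaultdict
from typing import NamedTuple


class PhasedVariant(NamedTuple):
    pos: int
    ref: str
    alt: str
    phase_set: str
    haplotype: int  # 0 or 1: which side of the '|' carries the ALT allele


def collapse_in_phase(variants, max_distance=2):
    """Group in-phase variants whose neighbors are within max_distance bases."""
    by_haplotype = defaultdict(list)
    for v in variants:
        by_haplotype[(v.phase_set, v.haplotype)].append(v)

    groups = []
    for hap_variants in by_haplotype.values():
        hap_variants.sort(key=lambda v: v.pos)
        current = [hap_variants[0]]
        for v in hap_variants[1:]:
            if v.pos - current[-1].pos <= max_distance:
                current.append(v)          # close enough: extend the group
            else:
                if len(current) > 1:
                    groups.append(current)  # keep only true multi-variant groups
                current = [v]
        if len(current) > 1:
            groups.append(current)
    return groups
```

With `max_distance=2`, two missense changes in the same codon on the same haplotype end up in one group, which is the situation in the NOD2 example where the combined change is a stop gain. A larger threshold is the kind of setting that would make sense for long-read data.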
  • #16 The second new algorithm introduced in VS 2.6.2 that utilizes phased genotypes is the phased compound het detection algorithm. This algorithm will identify potential compound heterozygous variants in a single sample analysis. Unlike the collapsed phased variants algorithm, this algorithm searches for variants that are in the same phase set but not in phase, and therefore could be contributing to a compound heterozygous state. Historically, detection of compound het variants required a trio with at least one proband with designated parents. But now, by utilizing variant phase sets, this algorithm can determine if there are 2 variants within a gene that are potentially compound het variants in the absence of parental data. The real takeaway is that with the phase information we know that each copy of the gene is being affected. In this example, we can see 2 pathogenic variants in the IL10RA gene. The genotypes indicate that these are heterozygous variants on different copies of the gene, meaning they have a compound effect on IL10RA function. Jennifer will be walking us through the analysis of these variants within VarSeq in just a little bit, but I did mention we have a second case study to cover as well.
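The core test described here, same gene, same phase set, opposite haplotypes, can be sketched in a few lines of Python. The `Call` record, the function names, and the positions used in the usage example are hypothetical and chosen only for illustration; the real algorithm in VarSeq may apply further filters that are not covered in this presentation.

```python
from itertools import combinations
from typing import NamedTuple, Optional


class Call(NamedTuple):
    gene: str
    pos: int
    gt: str                    # VCF genotype string, e.g. "0|1" or "0/1"
    phase_set: Optional[str]   # None when the call is unphased


def alt_haplotype(gt: str) -> Optional[int]:
    """Return 0 or 1 for a phased heterozygous call, otherwise None."""
    if "|" not in gt:
        return None
    alleles = gt.split("|")
    if len(alleles) != 2:
        return None
    alt_sides = [i for i, a in enumerate(alleles) if a not in ("0", ".")]
    return alt_sides[0] if len(alt_sides) == 1 else None


def phased_compound_hets(calls):
    """Find pairs in one gene and phase set whose ALT alleles sit on opposite haplotypes."""
    pairs = []
    ordered = sorted(calls, key=lambda c: (c.gene, c.pos))
    for a, b in combinations(ordered, 2):
        if a.gene != b.gene:
            continue
        if a.phase_set is None or a.phase_set != b.phase_set:
            continue
        hap_a, hap_b = alt_haplotype(a.gt), alt_haplotype(b.gt)
        if hap_a is not None and hap_b is not None and hap_a != hap_b:
            pairs.append((a, b))
    return pairs
```

Two calls in IL10RA with genotypes `0|1` and `1|0` in the same phase set would be returned as a candidate pair, while two `0|1` calls (in phase) or any unphased `0/1` call would not.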
  • #17 Our second case study is a trio analysis where we will look at a compound het CNV and small variant in the BRCA2 gene conferring heightened risk for familial breast cancer. Though BRCA2 mutations are generally inherited in an autosomal dominant pattern, this is an interesting use case, as the 2-hit hypothesis suggests that people who are predisposed to breast and ovarian cancer are more likely to actually develop cancer when both copies of the gene are affected. In this case we have a heterozygous deletion of exons 2-8 of BRCA2 that is inherited from the father and a pathogenic variant in exon 11 that is inherited from the mother.
  • #18 Compound heterozygosity does not only occur between SNVs in a single gene. It often occurs between CNVs and SNVs, and even between CNVs and other CNVs, so in VS 2.6.2 we built an algorithm that will detect compound het variants and CNVs. This analysis is particularly relevant for genes and disorders that are inherited in an autosomal recessive manner, wherein both copies of the gene need to be affected to cause the disorder. Detecting compound heterozygosity between variants and CNVs is especially important, as this combination can often lead to a more severe phenotype and even different manifestations of the disorder. Understanding the combined effect of SNVs and CNVs inherited together will improve our ability to diagnose diseases and our understanding of disease mechanisms. Before I hand things over to Jennifer to show us the CNV and variant compound het analysis in action, there is one more form of data analysis that may help increase our understanding of disease mechanisms: the up and coming research field of DNA methylation analysis.
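For the trio setting, the underlying logic can be sketched as checking that a gene carries one heterozygous event inherited from each parent, regardless of whether the event is a small variant or a CNV. The Python below is a simplified illustration under stated assumptions: the `Event` record, its carrier flags, and the function names are hypothetical, every event is assumed to be heterozygous in its carriers, and this is not a description of how VarSeq implements the algorithm.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    gene: str
    kind: str       # "SNV" or "CNV"
    label: str      # free-text description of the event
    proband: bool   # True if the event is present (heterozygous) in that individual
    mother: bool
    father: bool


def parental_origin(e: Event) -> Optional[str]:
    """Return 'mother' or 'father' when origin is unambiguous, otherwise None."""
    if not e.proband:
        return None
    if e.mother and not e.father:
        return "mother"
    if e.father and not e.mother:
        return "father"
    return None  # de novo, or carried by both parents


def trio_compound_hets(events):
    """Pair maternally and paternally inherited events that fall in the same gene."""
    by_gene = {}
    for e in events:
        by_gene.setdefault(e.gene, []).append(e)

    hits = []
    for gene, gene_events in by_gene.items():
        maternal = [e for e in gene_events if parental_origin(e) == "mother"]
        paternal = [e for e in gene_events if parental_origin(e) == "father"]
        for m in maternal:
            for p in paternal:
                hits.append((gene, m, p))
    return hits
```

Applied to the BRCA2 case, a paternally inherited deletion of exons 2-8 and a maternally inherited pathogenic variant in exon 11 would be reported as one pair, because each copy of the gene carries one event.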
  • #19 DNA methylation is a chemical epigenetic modification of the DNA sequence that is heritable through cell division cycles and essentially involves the addition of a methyl group to a cytosine base to create the base methyl-cytosine (Andrew E Jaffe, 2012). The modification occurs almost exclusively at genomic loci termed CpGs, which are cytosines immediately followed by guanines in the 5’ to 3’ direction. DNA methylation occurs at the carbon-5 position of cytosines, so the modified base is called 5-methylcytosine (5-mC). Methylation is actually widespread throughout the CpG sites of the genome. In CpG islands, DNA methylation is a cellular mechanism used for precise regulation of gene expression (it regulates imprinting, tissue specific gene expression, cancer, etc.). In normal cells, the DNA is usually methylated in the gene body and unmethylated in CpG islands at the 5’ ends or promoter regions of genes. In cancer, there is a departure from this state. So we can say that aberrant DNA methylation is a hallmark of cancer, best described by two separate paradigms, focal hypermethylation and global hypomethylation, which are relatively independent processes in the cancer genome and in tumor progression. Global hypomethylation is strongly associated with genomic instability owing to events such as replication stress, derepression of transposable elements leading to structural variation, and increased transcription of oncogenes. Focal hypermethylation in CpG islands of gene promoters, on the other hand, typically results in silencing of tumor suppressor genes due to a blockade of the transcriptional machinery.
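As a small illustration of the CpG definition, the Python sketch below lists the positions of cytosines that are immediately followed by a guanine when a sequence is read 5' to 3'. The function name and the example sequence are assumptions for illustration; it says nothing about whether a given site is actually methylated, which comes from the sequencing data.

```python
def cpg_sites(seq: str):
    """Return 0-based positions of the C in each CpG dinucleotide.

    The sequence is assumed to be written 5' to 3'. Note that 'GC' is not
    a CpG site: the cytosine must come first.
    """
    s = seq.upper()
    return [i for i in range(len(s) - 1) if s[i] == "C" and s[i + 1] == "G"]
```

For the sequence `ACGTCGGC`, the CpG cytosines are at positions 1 and 4. The trailing `GC` does not count, which is the point of specifying the 5' to 3' direction.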
  • #20 https://cancerci.biomedcentral.com/articles/10.1186/s12935-023-03074-7 Research is showing that several methylation markers are relevant to a broad range of cancers, such as lung cancer (LC), colorectal cancer (CRC), gastric cancer (GC), hepatocellular carcinoma (HCC), and esophageal cancer (EC). Using cfDNA in blood/serum to detect the hypermethylation status of certain TSGs is useful as an early diagnostic marker for several cancers, as well as a prognostic indicator. VarSeq can act as your research assistant by importing long-read derived methylation data for viewing and filtering. Note: there are no official ACMG or AMP guidelines and no targeted therapies related to methylation biomarkers at this time.
  • #21 How exactly can VarSeq be used to assist in analyzing methylation data? To start, VarSeq can import differentially methylated regions. These region files are generated from pipelines such as methbat from PacBio and ONT. Once imported, you can plot the methylated regions in GenomeBrowse, and filter and annotate the methylated regions table for differentially methylated regions of interest. We plan on releasing the MethMarkerDB this year, which will help prioritize methylated regions for their relevance in different cancer types. One area of interest is cancers of the central nervous system, and an example is shown here on the right. Here we are looking at differential methylation of the MGMT gene for a glioblastoma. There are a couple of ways to report on methylation biomarkers: for example, you can use a custom script to bring in the regions, and then create your own custom interpretations with CancerKB. Methylation analysis is a highly customizable and evolving field. Please reach out to Golden Helix for more information on how methylation could be brought into your unique workflows. Now I will turn things over to Jennifer, and she will provide us with a VarSeq 2.6.2 demonstration focused on the two case studies that I introduced earlier in the presentation.
  • #23 Turn it back over to Casey.
  • #24 Before we start diving into the subject, I wanted to mention our appreciation for our grant funding from NIH. The research reported in this publication was supported by the National Institute of General Medical Sciences of the National Institutes of Health under the listed awards. We are also grateful to have received local grant funding from the state of Montana. Our PI is Dr. Andreas Scherer, who is also the CEO at Golden Helix, and the content described today is the responsibility of the authors and does not officially represent the views of the NIH. So with that covered, before diving into today's topic, I'd like to offer some background and context on what Golden Helix brings to the table as a company.