Using VarSeq to Improve Variant Analysis Research Workflows (Golden Helix Inc)
In this webinar presentation, we will review workflow strategies for quality control and analysis of DNA sequence variants using the VarSeq software package from Golden Helix. VarSeq is a powerful platform for analysis of DNA sequence variants in clinical and translational research settings. VarSeq provides researchers with easy access to curated public databases of variant annotation information, and also enables users to incorporate their own local databases or downloaded information about variants and genomic regions.
Many of today's researchers are generating DNA sequence data for large numbers of samples in population-based experiments. This may include whole genomes, exomes, or targeted regions. The Golden Helix SNP and Variation Suite (SVS) provides a powerful computing environment for analyzing these data and performing association tests at the gene and/or variant level.
In this presentation, Dr. Christensen will review fundamentals of population-based variant analysis and demonstrate some of the tools available in SVS for analysis of both common and rare variants. The presentation will feature the recently implemented SKAT-O method, as well as other functions for annotation, visualization, quality control and statistical analysis of DNA sequence variants.
Communication of chemistry in the internet era, while it has improved, remains challenged in terms of the exchange of data in a lossless fashion. While there are moves afoot within the publishing industry to produce “data journals”, including embracing some of the new approaches for making data available to the community, many challenges remain. Chemistry data sharing, at even the most basic level, remains a challenge for many chemistry journals. The vast majority of chemistry data is provided as PDF files or trapped on webpages and therefore not available for reuse and repurposing without a significant amount of effort to extract the data. Some of the responsibility resides with the scientists who need to be educated and encouraged in the adoption of appropriate exchange formats and utilization of online platforms for data hosting and dissemination. There are certain practices which, if adopted, could increase both the availability and utility of data for the community. This includes recognition that data, in itself, has value above and beyond inclusion in peer-reviewed publications, the adoption of standard (not necessarily open) formats, clear data licensing, and distribution of the data across multiple platforms. This presentation will provide an overview of ongoing efforts within the National Center for Computational Toxicology to publish chemistry data, both in databases and associated with peer-reviewed publications, in a manner that makes our data and models consumable by the community.
This abstract does not reflect U.S. EPA policy.
RNA-seq for DE analysis: extracting counts and QC - part 4 (BITS)
Part 4 of the training session 'RNA-seq for differential expression analysis' considers extracting the count table from a mapping, and performing QC to detect sample biases. See http://www.bits.vib.be
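As a minimal illustration of the count-table QC this session covers, the sketch below computes per-sample library sizes, CPM-normalizes the counts, and flags a sample with a suspiciously small library. The gene names and counts are invented; real pipelines would use tools such as featureCounts/HTSeq and edgeR/DESeq2.

```python
# Toy count table (genes x samples); all values are invented for illustration.
counts = {
    "geneA": {"s1": 120, "s2": 115, "s3": 10},
    "geneB": {"s1": 300, "s2": 290, "s3": 25},
    "geneC": {"s1":  80, "s2":  95, "s3":  5},
}
samples = ["s1", "s2", "s3"]

# Library size = total counted reads per sample; a common first QC metric.
lib_size = {s: sum(counts[g][s] for g in counts) for s in samples}

# Counts-per-million normalizes away sequencing-depth differences.
cpm = {g: {s: counts[g][s] / lib_size[s] * 1e6 for s in samples} for g in counts}

# Flag samples whose library size is far below the median (possible failure/bias).
sizes = sorted(lib_size.values())
median = sizes[len(sizes) // 2]
flagged = [s for s in samples if lib_size[s] < 0.2 * median]

print(lib_size)   # s3 has a much smaller library
print(flagged)
```

Here sample s3 would be flagged for follow-up before any differential-expression testing.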
BioAssay Express: Creating and exploiting assay metadata (Philip Cheung)
The challenge of accurately characterizing bioassays is a real pain point for many drug discovery organizations. Research has shown that some organizations have legacy assay collections exceeding 20,000 protocols, the great majority of which are not accurately characterized. This problem is compounded by the fact that many new protocol registrations are still not following FAIR (Findability, Accessibility, Interoperability, and Reusability) Data principles.
BioAssay Express is a tool focused on transforming the traditional protocol description from an unstructured free form text into a well-curated data store based upon FAIR Data principles. By using well-defined annotations for assays, the tool enables precise ontology based searches without having to resort to imprecise keyword searches.
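A toy sketch of why structured annotations beat keyword search. The assay records, term IDs, and the mini "ontology" below are invented; BioAssay Express's actual data model and vocabularies are not reproduced here.

```python
# Invented assay records: free text plus curated annotations.
assays = [
    {"id": "A1", "text": "Kinase inhibition measured by fluorescence",
     "annotations": {"bao:assay_format": "biochemical", "bao:detection": "fluorescence"}},
    {"id": "A2", "text": "Cell viability, luminescent readout",
     "annotations": {"bao:assay_format": "cell-based", "bao:detection": "luminescence"}},
    {"id": "A3", "text": "Fluorescent label was NOT used; absorbance readout",
     "annotations": {"bao:assay_format": "biochemical", "bao:detection": "absorbance"}},
]

# Keyword search: A3 matches too, because the free text merely mentions the word.
keyword_hits = [a["id"] for a in assays if "fluorescen" in a["text"].lower()]

# Annotation search: precise, because the curated term states what was actually used.
annotated_hits = [a["id"] for a in assays
                  if a["annotations"].get("bao:detection") == "fluorescence"]

print(keyword_hits)    # includes a false positive
print(annotated_hits)  # only the truly fluorescence-based assay
```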
This talk explores a number of important new features designed to help scientists accelerate the drug discovery process. Example use cases include enabling drug repositioning projects, improving SAR models, identifying appropriate machine learning data sets, and fine-tuning integrative-omic pathways.
An aspirational goal for our team is to build a metadata schema based on semantic web vocabularies that is comprehensive to the extent that the text description becomes optional. One of the many possibilities is to take the initial prospective ELN entry for a bioassay protocol and feed it directly to an automated instrument. While there are many challenges involved in creating the ELN-to-robot loop, we will provide some insights into our collaborations with UCSF automation experts.
In summary, the ability to quickly and accurately search or analyze bioassay data (public or internal) is a rate limiting problem in drug discovery. We will present the latest developments toward removing this bottleneck.
https://plan.core-apps.com/acs_sd2019/abstract/6f58993d-a716-49ad-9b09-609edde5a3f4
Making NGS Data Analysis Clinically Practical: Repeatable and Time-Effective ... (Golden Helix Inc)
Exploring next-generation sequence data requires an iterative process whereby a researcher can find a "needle in the haystack" that contributes to a particular disease or other phenotype. Once that needle has been found, a workflow can be established for analyzing other samples or to create a repeatable, time-effective process for clinical usage.
Yet, repeating a workflow that involves several different quality control, filtering, and analysis steps is burdensome and error-prone.
To solve this problem, we introduce custom workflow automation in SVS, which allows you to collapse dozens of steps into a few run-specific options. This click-and-go process saves substantial time, eliminates the inevitable user error that comes with tedious repetition, and ensures that the exact same protocol is followed with each run - a critical requirement for use in the clinic.
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ... (Tom Plasterer)
Edge Informatics is an approach to accelerate collaboration in the BioPharma pipeline. By combining technical and social solutions, knowledge can be shared and leveraged across the multiple internal and external silos participating in the drug development process. This is accomplished by making data assets findable, accessible, interoperable and reusable (FAIR). Public consortia and internal efforts embracing FAIR data and Edge Informatics are highlighted in both preclinical and clinical domains.
This talk was presented at the Molecular Medicine Tri-Conference in San Francisco, CA on February 20, 2017
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...) (Tom Plasterer)
As scientists in the life sciences, we are trained to pursue singular goals around a publication, a validated target or a drug submission. Our failure rates are exceedingly high, especially as we move closer to patients in the attempt to collect sufficient clinical evidence to demonstrate the value of novel therapeutics. This wastes resources as well as time for patients depending upon us for the next breakthrough.
Edge Informatics is an approach to ameliorate these failures. By using technical and social solutions together, knowledge can be shared and leveraged across the drug development process. This is accomplished by making data assets discoverable, accessible, self-described, reusable and annotatable. The Open PHACTS project pioneered this approach and has provided a number of the technical and social solutions that enable Edge Informatics. A number of pre-competitive consortia and some content providers have also embraced this approach, facilitating networks of collaborators within and outside a given organization. Taken together, these efforts foster more accurate, timely and inclusive decision-making.
Data Integration in a Big Data Context: An Open PHACTS Case Study (Alasdair Gray)
Keynote presentation at the EU Ambient Assisted Living Forum workshop 'The Crusade for Big Data in the AAL Domain'.
The presentation explores the Open PHACTS project and how it overcame various Big Data challenges.
Dealers in Hope - Programme Leaders in the 21st Century
Benedict Pinches
APM Programme Management SIG Conference 2017, 2 March 2017, Rolls-Royce Learning and Development Centre, Derby
Public and private sector collaboration to deliver major infrastructure
APM Programme Management SIG Conference 2017, 2 March 2017, Rolls-Royce Learning and Development Centre, Derby
Best Practices for Validating a Next-Gen Sequencing Workflow (Golden Helix)
Validating an NGS workflow is an iterative process that begins with collaborating with laboratory personnel and planning protocols for the entire workflow, from sample preparation, sequencing and variant calling through data analysis and reporting. At Golden Helix, while we do not provide pre-validated black-box workflows, we support our customers in validating workflows in a transparent manner and assist them in reaching production deadlines. This webcast will be led by members of our Field Application Scientist team, and we will explore some of the best practices for NGS workflow validation that we have observed and helped to implement, based on real-world examples from our customer base. Key topics for discussion will include:
Sample preparation and collection of adequate case/control data
Designing a robust workflow with special considerations for single versus family analyses and phenotypic considerations
Generating the desired output for clinical or other reports
Real world NGS workflow validation strategies
Tune in for tips and strategies that you can deploy when designing and validating your NGS workflow.
Using VarSeq to Improve Variant Analysis Research Workflows (Delaina Hawkins)
Many questions must be answered when analyzing DNA sequence variants: How do I determine which variants are potentially deleterious? Is the sequencing quality sufficient? How do I prioritize the results? Which annotation sources may help answer my research question?
In this webinar presentation, we will review workflow strategies for quality control and analysis of DNA sequence variants using the VarSeq software package from Golden Helix. VarSeq is a powerful platform for analysis of DNA sequence variants in clinical and translational research settings. VarSeq provides researchers with easy access to curated public databases of variant annotation information, and also enables users to incorporate their own local databases or downloaded information about variants and genomic regions.
The presentation will include interactive demonstrations using VarSeq to analyze variants found by exome sequencing of an extended family with a complex disease. We will review strategies for assessing variant quality, applying genomic annotations, incorporating custom annotation sources, and creating variant filters in VarSeq. We will also demonstrate the PhoRank gene ranking algorithm and its application for prioritizing variants.
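To make the idea of a variant-filter chain concrete, here is a minimal sketch of the kind of quality/frequency/effect filtering described above. Field names, thresholds, and the variants themselves are invented; VarSeq's actual filter interface and annotation sources are not reproduced here.

```python
# Invented variant records with quality, population frequency, and effect fields.
variants = [
    {"id": "v1", "qual": 99, "gnomad_af": 0.0001, "effect": "missense"},
    {"id": "v2", "qual": 12, "gnomad_af": 0.0002, "effect": "missense"},
    {"id": "v3", "qual": 87, "gnomad_af": 0.3500, "effect": "synonymous"},
    {"id": "v4", "qual": 95, "gnomad_af": 0.0000, "effect": "frameshift"},
]

def passes(v, min_qual=30, max_af=0.01, effects=("missense", "frameshift")):
    """Keep high-quality, rare variants with a potentially damaging effect."""
    return (v["qual"] >= min_qual
            and v["gnomad_af"] <= max_af
            and v["effect"] in effects)

kept = [v["id"] for v in variants if passes(v)]
print(kept)  # v2 fails quality; v3 is common and synonymous
```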
The ionization state of a chemical, reflected in pKa values, affects lipophilicity, solubility, protein binding and the ability of a chemical to cross the plasma membrane. These properties govern pharmacokinetic parameters such as absorption, distribution, metabolism, excretion and toxicity; pKa is thus a fundamental chemical property used in many models of chemical toxicity.
Experimentally determining pKa is not feasible for high-throughput assays. Predicting pKa is challenging, and existing models have been developed only on restricted chemical space (e.g., anilines, phenols, benzoic acids, primary amines); the lack of a generalized model impedes ADME modeling.
No free and open-source pKa models exist for heterogeneous chemical classes, although several proprietary programs do. In this work, open pKa data bundled with DataWarrior (http://www.openmolecules.org/) were used to develop predictive models for pKa. After data cleaning, there were ~3,100 and ~3,900 monoprotic chemicals with an acidic or basic pKa, respectively. 1D and 2D chemical descriptors (AlogP, topological polar surface area, etc.), in addition to 12 fingerprints (presence or absence of a chemical group), were generated using the PaDEL software. Three datasets were used: acidic, basic, and acidic and basic combined.
Thirteen feature sets were examined: the 1D/2D descriptors and the 12 fingerprints. Using the Extreme Gradient Boosting algorithm, the MACCS and Substructure Count fingerprints yielded the best results, with models showing an R-squared of ~0.78 and an RMSE of ~1.42.
Recently, deep learning models have shown remarkable progress in image recognition and natural language processing. To determine whether deep learning would increase model performance, we examined the same datasets and found that the deep learning models were somewhat superior to Extreme Gradient Boosting, with an R-squared of ~0.80 and an RMSE of ~1.38.
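For reference, the two metrics reported above can be computed as follows. The predictions below are invented for illustration, not taken from the paper's data.

```python
import math

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def rmse(y_true, y_pred):
    """Root-mean-squared error, in the same units as pKa."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Invented experimental vs. predicted pKa values.
y_true = [4.2, 7.1, 9.8, 3.3, 10.5]
y_pred = [4.0, 7.5, 9.0, 3.9, 10.1]

print(round(r_squared(y_true, y_pred), 3))
print(round(rmse(y_true, y_pred), 3))
```

An RMSE of ~1.4 pKa units, as reported, means predictions are typically off by more than an order of magnitude in ionization constant, which is why further model improvement matters.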
This work does not reflect U.S. EPA policy.
VarSeq 2.4.0: VSClinical ACMG Workflow from the User Perspective (Golden Helix)
Earlier this year, we released VarSeq 2.3.0, which brought major updates to our VSClinical AMP interface, such as enhanced capabilities for automation and analysis of structural variants in the cancer context. Naturally, we wanted to follow that up shortly with similar advancements to our VSClinical ACMG interface for our customers doing germline variant analysis.
Our latest software release, VarSeq 2.4.0, therefore focused on advancements in VSClinical ACMG: support for importing and clinically evaluating structural variants, long-read sequencing, advanced automation with evaluation scripts in VSClinical ACMG, and end-to-end automation of ACMG workflows with VSPipeline. These new and improved features were discussed in a webcast by our VP of Product and Engineering, Gabe Rudy, last month.
This upcoming webcast by our FAS team will offer a user's perspective on the new features in VarSeq 2.4.0 and VSClinical ACMG, and on how our tools can precisely and efficiently enable full-spectrum NGS analysis for Mendelian disorders.
Production Bioinformatics, emphasis on Production (Chris Dwan)
Production bioinformatics at Sema4 can be thought of as data ops - a peer to the lab ops organization. We operate 24/7 to deliver correct and timely results on NGS and other data for thousands of samples per week. This deck introduces the Prod BI organization and systems architecture with a focus on what it takes to run bioinformatics in production rather than for R&D or pure research.
Introducing VSWarehouse - A Scalable Genetic Data Warehouse for VarSeq (Golden Helix Inc)
As precision medicine takes off, the number of samples in a testing lab and the associated data volume are increasing exponentially. In order to organize the data and build a knowledge base of cases that can be used for future analysis as well as ongoing research, labs need to leverage state-of-the-art warehousing technology.
Speaker: Benedict C. S. Cross, PhD, Team leader (Discovery Screening), Horizon Discovery
CRISPR–Cas9 mediated genome editing provides a highly efficient way to probe gene function. Using this technology, thousands of genes can be knocked out and their function assessed in a single experiment. We have conducted over 150 of these complex and powerful screens and will use our experience to guide you through the process of screen design, performance and analysis.
We'll be discussing:
• How to use CRISPR screening for target ID and validation, understanding drug MOA and patient stratification
• The screen design, quality control and how to evaluate success of your screening program
• Horizon’s latest developments to the platform
• Horizon’s novel approaches to target validation screening
Understanding and controlling for sample and platform biases in NGS assays (Candy Smellie)
What is the impact of assay failure in your laboratory and how do you monitor for it?
The advancement of next-generation sequencing has provided invaluable resources to researchers in multiple industries and disciplines, and will be a major driver of the personalized medicine revolution that is upon us. However, while the cost of generating sequencing data continues to decrease, this does not account for the significant costs associated with the infrastructure and expertise required to develop a robust, routine NGS pipeline.
Specifically, as predicted by Sboner et al. in 2011, the cost of the sequencing portion of the experiment continues to decrease, while the costs associated with upfront experimental design and downstream analysis come to dominate the cost of each assay. This holds for pre-clinical R&D projects, and perhaps even more so for clinical assays. In the paper, the authors note the unpredictable and considerable 'human time' spent on upstream design and downstream analysis. Here at Horizon, we aim to develop tools that help researchers and clinicians optimize these workflows to make NGS more reliable and, ultimately, more affordable by streamlining these resource-intensive areas.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ... (Sérgio Sacani)
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4–0.9 µm) and novel JWST images with 14 filters spanning 0.8–5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3–31.0 AB mag (5σ, r = 0.1″ circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5–15. These objects show compact half-light radii of R_1/2 ∼ 50–200 pc, stellar masses of M⋆ ∼ 10⁷–10⁸ M⊙, and star-formation rates of SFR ∼ 0.1–1 M⊙ yr⁻¹. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity that marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for evolution of the dark matter halo mass function.
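For readers converting the quoted imaging depths, the AB magnitude system is defined by m_AB = −2.5 log10(f_ν) − 48.60 (f_ν in erg s⁻¹ cm⁻² Hz⁻¹), equivalently m_AB = 23.9 − 2.5 log10(f_ν/µJy). This is a standard conversion, not taken from the paper; a quick check of the stacked-image depth:

```python
def ab_mag_to_ujy(m_ab):
    """Flux density in microjanskys for a given AB magnitude."""
    return 10 ** ((23.9 - m_ab) / 2.5)

# The quoted stack depth of ~31.4 AB mag corresponds to ~0.001 uJy = 1 nJy.
depth_stack = ab_mag_to_ujy(31.4)
print(depth_stack)
```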
Cancer Cell Metabolism: Special Reference to the Lactate Pathway (AADYARAJPANDEY1)
Normal Cell Metabolism:
Cellular respiration describes the series of steps that cells use to break down sugar and other chemicals to get the energy they need to function.
Energy is stored in the bonds of glucose and when glucose is broken down, much of that energy is released.
Cells utilize energy in the form of ATP.
The first step of respiration is called glycolysis. In a series of steps, glycolysis breaks glucose into two smaller molecules of a chemical called pyruvate. A small amount of ATP is formed during this process.
Most healthy cells continue the breakdown in a second process, called the Krebs cycle. The Krebs cycle allows cells to “burn” the pyruvate made in glycolysis to get more ATP.
The last step in the breakdown of glucose is called oxidative phosphorylation (Ox-Phos).
It takes place in specialized cell structures called mitochondria. This process produces a large amount of ATP. Importantly, cells need oxygen to complete oxidative phosphorylation.
If a cell completes only glycolysis, only 2 molecules of ATP are made per glucose. However, if the cell completes the entire respiration process (glycolysis, Krebs cycle, oxidative phosphorylation), about 36 molecules of ATP are created, giving it much more energy to use.
In Cancer Cells:
Unlike healthy cells that "burn" the entire molecule of sugar to capture a large amount of energy as ATP, cancer cells are wasteful.
Cancer cells only partially break down sugar molecules. They overuse the first step of respiration, glycolysis. They frequently do not complete the second step, oxidative phosphorylation.
This results in only 2 molecules of ATP per each glucose molecule instead of the 36 or so ATPs healthy cells gain. As a result, cancer cells need to use a lot more sugar molecules to get enough energy to survive.
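The arithmetic behind this comparison, worked through with the approximate textbook figures quoted above:

```python
# Approximate ATP yields per glucose molecule (textbook figures from the text).
ATP_GLYCOLYSIS = 2   # glycolysis only (typical cancer-cell mode)
ATP_FULL = 36        # glycolysis + Krebs cycle + oxidative phosphorylation

# How many glucose molecules must a glycolysis-only cell consume to match the
# ATP a fully respiring cell extracts from a single glucose?
glucose_needed = ATP_FULL / ATP_GLYCOLYSIS
print(glucose_needed)  # 18.0 -- why cancer cells are such heavy glucose consumers
```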
Introduction to the Warburg Phenomenon:
Warburg effect: Cancer cells are usually highly glycolytic (glucose addiction) and take up more glucose from their surroundings than normal cells do.
Otto Heinrich Warburg (8 October 1883 – 1 August 1970) was awarded the 1931 Nobel Prize in Physiology or Medicine for his "discovery of the nature and mode of action of the respiratory enzyme."
Warburg effect: the metabolism of glucose to lactate by cancer cells under aerobic (well-oxygenated) conditions (aerobic glycolysis) is known as the Warburg effect. Warburg observed that tumor slices consume glucose and secrete lactate at a higher rate than normal tissues.
The increased availability of biomedical data, particularly in the public domain, offers the opportunity to better understand human health and to develop effective therapeutics for a wide range of unmet medical needs. However, data scientists remain stymied by the fact that data remain hard to find and to productively reuse because data and their metadata i) are wholly inaccessible, ii) are in non-standard or incompatible representations, iii) do not conform to community standards, and iv) have unclear or highly restricted terms and conditions that preclude legitimate reuse. These limitations require a rethink of how data can be made machine- and AI-ready - the key motivation behind the FAIR Guiding Principles. Concurrently, while recent efforts have explored the use of deep learning to fuse disparate data into predictive models for a wide range of biomedical applications, these models often fail even when the correct answer is already known, and fail to explain individual predictions in terms that data scientists can appreciate. These limitations suggest that new methods to produce practical artificial intelligence are still needed.
In this talk, I will discuss our work in (1) building an integrative knowledge infrastructure to prepare FAIR and "AI-ready" data and services along with (2) neurosymbolic AI methods to improve the quality of predictions and to generate plausible explanations. Attention is given to standards, platforms, and methods to wrangle knowledge into simple but effective semantic and latent representations, and to make these available through standards-compliant and discoverable interfaces that can be used in model building, validation, and explanation. Our work, and that of others in the field, creates a baseline for building trustworthy and easy-to-deploy AI models in biomedicine.
Bio
Dr. Michel Dumontier is the Distinguished Professor of Data Science at Maastricht University, founder and executive director of the Institute of Data Science, and co-founder of the FAIR (Findable, Accessible, Interoperable and Reusable) data principles. His research explores socio-technological approaches for responsible discovery science, which includes collaborative multi-modal knowledge graphs, privacy-preserving distributed data mining, and AI methods for drug discovery and personalized medicine. His work is supported through the Dutch National Research Agenda, the Netherlands Organisation for Scientific Research, Horizon Europe, the European Open Science Cloud, the US National Institutes of Health, and a Marie-Curie Innovative Training Network. He is the editor-in-chief for the journal Data Science and is internationally recognized for his contributions in bioinformatics, biomedical informatics, and semantic technologies including ontologies and linked data.
Richard's entangled adventures in wonderland
Richard Gill
Since the loophole-free Bell experiments of 2015 and the Nobel Prize in Physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, "super-determinism" is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think, however, that it is a smoke screen, and the slogan "lost in math" comes to mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
1. XAbTracker® & SeqAgent®: Integrated LIMS and Sequence
Analysis Tools for Antibody Phage Display
February 20, 2017
Mark Evans
Best Practices in Personalized and Translational Medicine Short Course
2. 2
Established in 1981
Located in Berkeley, CA
Small, publicly traded biotech company
Experts in antibody discovery, optimization, cell line and process
development
Currently supporting ongoing Phase 2 clinical trials
• XOMA 358: congenital hyperinsulinism & post-bariatric surgery hyperinsulinism
• XOMA 213: Various hyperprolactinemias
About Xoma
3. 3
Antibody Phage Display technologies are well established
after more than 10 years of use in the Pharmaceutical industry
as important drug discovery tools.
Scientific Background
Phage library Combine phage + antigen
Wash
EluteAmplify
Assay
Sequence
Heavy chain
Light chain
4. 4
What has not kept up is adequate data analysis and data
management systems.
Screening, DNA sequence analysis and candidate selection
can still be very time consuming.
We found that data analysis was a major bottleneck,
limiting the number of drug development projects the pipeline
could handle
Problem
5. 5
Developed two integrated software applications
SeqAgent™ - integrated DNA sequence analysis package
specifically designed for use with antibody V-regions (Fv)
• Semi-automated pipeline
• Input is zipped DNA sequence files
• Converted to protein sequence
• Identifies framework and CDR structural features
• Produces protein sequence alignment
• Highly annotated and ready for final analysis.
Solution
6. 6
Developed two integrated software applications
SeqAgent™ - integrated DNA sequence analysis package
specifically designed for use with antibody V-regions (Fv)
XAbTracker™ - a clone / assay data management system.
• Tracks clones throughout discovery process
• Tracks and evaluates associated assay results
• Integrated sequence identification via SeqAgent™
• Provides flexible workflow and data management for antibody discovery
Solution
7. 7
Heterotetramers
8 Constant and 4 variable regions
16 light chain families
7 heavy chain families
Variable region
• 4 conserved framework regions
• 3 hyper-variable regions
Specific Challenges: Problem of data complexity
VL
CL
VH
CH1
CH2CH3
CH2CH3
CH1
VH
CL
VL
Fv
Fab
Light Chain
Heavy
Chain
8. 8
SeqAgent™ analysis workflow
Upload compressed
DNA sequences
Evaluate sequence quality
Identify low-quality regions
Translate proper reading frame
Compare protein seq against profile HMM
Identify light and heavy chain families
Identify constant and variable regions
Specific sequence pattern recognition
Cluster HC and LC sequences by
Levenshtein algorithm
Assemble and annotate display view
Display analysis result
User Activity
Server Activity
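The final clustering step of the workflow above groups closely related heavy- and light-chain sequences into bins by edit distance. A minimal sketch of that idea follows; the function names, the greedy single-linkage scheme, and the distance threshold are illustrative assumptions, not XOMA's actual implementation.

```python
# Sketch: bin antibody chain sequences by Levenshtein (edit) distance.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cluster_chains(seqs, max_dist=2):
    """Greedy single-linkage binning: a sequence joins the first bin whose
    representative is within max_dist edits, otherwise it founds a new bin."""
    bins = []  # list of (representative, members)
    for s in seqs:
        for rep, members in bins:
            if levenshtein(s, rep) <= max_dist:
                members.append(s)
                break
        else:
            bins.append((s, [s]))
    return bins

# Toy CDR-like sequences: the first two differ by one residue and share a bin.
for rep, members in cluster_chains(["CARDYW", "CARDYF", "CTRSSW"]):
    print(rep, members)
```

In the real pipeline the representative sequence per bin is chosen by the user (callout 7 in the results display); here the first member simply stands in for the bin.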
9. 9
SeqAgent™ Results Display
(Annotated screenshot of the results display; numbered callouts 1-11 are described below.)
1) View management. Add / remove additional
sequences or copy view.
2) View controls.
3) Sequence selection box
4) Unique coded sequence identifiers.
5) Heavy and Light chain bin identifiers. Closely
related chain sequences have the same bin
identifier.
6) Unique light and heavy chain sequence
identifiers.
7) Representative box. Select sequence to
represent a bin.
8) Example of additional tags, signals, etc that
are automatically identified.
9) Framework and CDR regions are identified
and color coded. Alignment gaps are
indicated by a dash.
10) Poor DNA sequence quality glyph.
11) Grouped rows that have the same color
background indicate identical chain sequence
10. 10
Low-quality sequence regions, stop codons, and potential post-translational
modification sites are indicated on the sequences
SeqAgent™ Results Display
(Annotated screenshot; callouts described below.)
1) Query-anchored view: the first row is the anchor for the bin and the
second row is identical to it.
2) Additional glyphs indicating a post-translational modification site and an
amber stop codon.
11. 11
SeqAgent™ Results Display
• Individual sequences can be inspected
• Sequences of light and heavy chains are tracked as paired sets which
represent functional antibodies
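One way to model the paired tracking described above is a record type that keeps each clone's heavy and light chains together, so every record corresponds to a functional antibody. The class and field names below are illustrative assumptions, not the actual schema.

```python
# Sketch: a clone record that keeps heavy/light chains as a paired set.
from dataclasses import dataclass

@dataclass(frozen=True)
class Chain:
    sequence: str
    bin_id: str        # similarity bin shared by closely related chains

@dataclass(frozen=True)
class CloneRecord:
    clone_id: str
    heavy: Chain
    light: Chain

# Toy example with abbreviated V-region sequences.
clone = CloneRecord("X001",
                    heavy=Chain("EVQLVESG", "HC-07"),
                    light=Chain("DIQMTQSP", "LC-03"))
print(clone.clone_id, clone.heavy.bin_id, clone.light.bin_id)
```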
13. 13
Tracking large numbers of clones, replicates, assays, rearrays,
etc. is no trivial task
100s to 1000s of individual bacterial colonies are picked into
96 well plates for screening.
In addition to sequencing, clones are assayed via ELISA,
FACS or SPR methods.
In most cases, the original raw data file is parsed directly into
XAbTracker™, with the exceptions coming in as tab-text after
preprocessing elsewhere, and is associated with the correct
clone.
XAbTracker™ Data Management System
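The text above notes that raw plate-reader files are parsed directly into the system and associated with the correct clone. A minimal sketch of such a parser follows; the tab-delimited layout (a header row of column numbers, then one row per plate row letter) is a common export format but is an assumption here, not XAbTracker's actual input format.

```python
# Sketch: parse a tab-delimited plate-reader export into well -> value records.
import csv
import io

raw = "\t1\t2\t3\nA\t0.12\t1.45\t0.08\nB\t0.95\t0.11\t2.03\n"

def parse_plate(text):
    """Return {well_id: reading}, e.g. {'A1': 0.12, ...}."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    cols = rows[0][1:]                       # column numbers from the header
    readings = {}
    for row in rows[1:]:
        letter, values = row[0], row[1:]     # row letter, then one value per column
        for col, val in zip(cols, values):
            readings[f"{letter}{col}"] = float(val)
    return readings

plate = parse_plate(raw)
print(plate["B3"])  # 2.03
```

A real importer would additionally map each well back to the clone picked into it, which is the association step the slide describes.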
14. 14
Which libraries are used
What the target is
Which antigens are being screened in each assay
Organizational concepts such as Projects, Studies, Study
Rounds, Screens and Assays
Several unique naming conventions
• Individual heavy and light chain sequences
• Antibodies and their format (IgG, Fab, scFv)
• Individual clones, reformatted clones, engineered clones
XAbTracker™ keeps track of…
15. 15
1. Set thresholds for all plates dynamically to the data-set min and
max values
2. Button locks the results of this analysis in the database
3. Total hit indicator for each plate
4. Data quality (histogram / scatter plot)
5. Per-plate thresholds can override the master threshold
6. Threshold line indicated in blue, hits in red. Graph values are
dynamically linked to the plate view; hit colors are automatically
reflected in the assay plate.
7. Wells in the plate view are colored in shades of blue across 11
scaled bins to indicate the data range. When a well is a hit, it is
colored on a red scale to indicate the magnitude of the hit.
XAbTracker™
(Annotated screenshot; callouts 1-7 described above.)
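The hit-calling scheme on this slide (a master threshold derived from the data-set range, per-plate overrides, and an 11-bin shading scale) can be sketched as follows. The function names and the fraction-of-range threshold rule are illustrative assumptions, not the product's actual algorithm.

```python
# Sketch: master threshold from the data range, with per-plate overrides,
# plus an 11-bin scale for shading non-hit wells.

def call_hits(plates, master_frac=0.5, overrides=None):
    """plates: {plate_id: {well: value}}; overrides: {plate_id: threshold}."""
    overrides = overrides or {}
    values = [v for wells in plates.values() for v in wells.values()]
    lo, hi = min(values), max(values)
    master = lo + master_frac * (hi - lo)        # threshold set from data-set min/max
    results = {}
    for pid, wells in plates.items():
        threshold = overrides.get(pid, master)   # a per-plate threshold wins
        results[pid] = {w: v >= threshold for w, v in wells.items()}
    return results

def blue_bin(value, lo, hi, n_bins=11):
    """Scale a non-hit value into one of 11 shade bins (0..10)."""
    if hi == lo:
        return 0
    return min(n_bins - 1, int((value - lo) / (hi - lo) * n_bins))

plates = {"P1": {"A1": 0.1, "A2": 0.9}, "P2": {"A1": 0.6}}
hits = call_hits(plates, overrides={"P2": 0.7})
print(hits)  # {'P1': {'A1': False, 'A2': True}, 'P2': {'A1': False}}
```

Here P2's override (0.7) demotes a well that the master threshold (0.5) would have called a hit, mirroring callout 5 above.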
16. 16
Multiple antigens and/or
multiple analysis criteria can be
part of an assay
Two different antigens and a
total of three different analyses
are summarized.
Red, orange and green in the
pie chart legend indicate which
pie slice will show the analysis
result for that assay.
Hits are indicated in yellow; non-hits in blue
XAbTracker™ Analysis
Summary View
20. 20
Developed in-house
• Derived from open-source resources
• Less than a year by a team of three
Number of samples that can be analyzed increased >10x
Analysis time reduced from days/weeks to minutes/hours
Standardized analysis methods allow consistent data
interpretation
Prosecution of drug targets per year has increased 3x
A significant ROI on manpower costs, with minimal commercial
license fees (< $12K) needed for access to certain open-source
libraries
ROI
21. 21
Laboratory software is often plagued by antiquated interfaces
We have developed a relatively lightweight, nimble data management and sequence
analysis application suite that is specifically designed for antibody discovery
• As thin client systems, they are able to run in web browsers.
• Since they utilize responsive web UI components, the applications work equally well on
PC, tablet and even smart phone platforms, providing the users with maximum
flexibility.
We believe that these applications provide a good illustration of what the future of
laboratory software will look like
Conclusions
22. 22
For inquiries contact Zander Strange
• zander.strange@xoma.com
Thanks to
• Yevheniy (Eugene) Chuba
• Matthew Batterton
• Lauren Schwimmer
• The Discovery Research group at Xoma
• BioIT World and CHI for providing this opportunity
Finally…
Editor's Notes
Much progress has been made to improve the characteristics of the libraries, increasing diversity to more than 10^11, utilizing wide repertoires of antibody frameworks, etc.
Phage display technology can be automated and hundreds of clones can be identified per screening round.
Antibodies are heterotetramers consisting of two light chains and two heavy chains, each having constant (CL, CH1, CH2, and CH3) and variable (VL and VH) regions.
The variable regions are responsible for binding specificity and are classified into families according to their sequence similarity.
For VL, there are six kappa (Vκ) families and ten lambda families (Vλ) and for VH there are seven families.
Phage display libraries are made in two formats:
Fab libraries which have VL/CL paired with VH/CH1 and
scFv libraries which have VL and VH connected with a flexible linker.
Within each variable region there are sub-regions classified as framework regions (FR1-4) and complementarity-determining regions (CDR1-3).
Each sequence segment (constant, variable, linker, tags, etc) is labeled.
Differential shading is used to denote sequence similarity bins.
Each bin is assigned a unique id, as is each unique sequence.
Users then have the ability to interact with the view, for example sorting by HCDR3.
Additional mining can be done using Markov clustering.
Users can choose to hide light or heavy sequences from view to allow them to focus on only one.
They can also change the alignment representation to be query-anchored in which the first sequence in a bin is shown completely and subsequent sequences only indicate difference from the first sequence.
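The query-anchored representation described above (show the anchor in full; for subsequent sequences show only the differences) reduces to a simple per-position comparison. This is a minimal sketch; the function name and the '.' match character are assumptions.

```python
# Sketch: render a sequence relative to its bin's anchor, masking matches.

def query_anchored(anchor: str, seq: str, match_char: str = ".") -> str:
    """Return seq with positions identical to the anchor replaced by a dot,
    so only differences remain visible (sequences assumed equal length)."""
    return "".join(match_char if a == b else b for a, b in zip(anchor, seq))

anchor = "CARDYWGQGT"
print(anchor)
print(query_anchored(anchor, "CARDFWGQGT"))  # ....F.....
```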
Users may then select one or more sequences to be “representative” of a particular sequence bin. These sequence representatives are transferred back to XAbTracker™ as “hits” in a sequencing assay, providing a link between sequencing results and sample tracking in XAbTracker™
XAbTracker™ allows users to perform data analysis directly in the LIMS system,
providing standard normalization and analysis methodology
This integrated analysis approach along with multiple data visualizations provides the ability to perform exploratory analysis, testing various parameter thresholds.
Analysis results from multiple antigens in complex assays are summarized in top level summary result page for each assay
This makes it easy for the user to identify trends or unexpected outcomes as well as normal hits.
The system aims to fully capture the decision making process.
Prospective candidate clones are rearrayed to new plates based on a combination of hit criteria from different assays that have been performed on the same samples.
The details of these rearraying decisions are captured and the analyses involved in the decision are locked from further manipulation
All studies in a screening round are available to be included, but only six can be compared at a time
Diagram instantly changes shape based on number selected
Select any intersection to see the combined assay results for those samples
Clones that are displayed can be selected for rearray
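The comparison view described above (select up to six studies, inspect any intersection of their hit sets) amounts to set intersection over every combination of the selected studies. The sketch below uses illustrative names and toy data; it is not the application's actual code.

```python
# Sketch: compute every non-empty overlap among selected studies' hit sets.
from itertools import combinations

MAX_STUDIES = 6  # only six studies can be compared at a time

def intersections(study_hits):
    """study_hits: {study_name: set(clone_ids)} -> {(names...): shared clones}."""
    assert len(study_hits) <= MAX_STUDIES
    out = {}
    names = list(study_hits)
    for r in range(2, len(names) + 1):
        for combo in combinations(names, r):
            shared = set.intersection(*(study_hits[n] for n in combo))
            if shared:
                out[combo] = shared
    return out

hits = {"ELISA": {"c1", "c2", "c3"}, "FACS": {"c2", "c3"}, "SPR": {"c3"}}
print(intersections(hits)[("ELISA", "FACS", "SPR")])  # {'c3'}
```

Clones found in a chosen intersection would then be the candidates flagged for rearray.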
Prior to the XAbTracker™/SeqAgent™ applications, antibody phage display data analysis was performed piecemeal utilizing different applications such as VectorNTI, Excel and SoftMaxPro.
The task of correlating DNA multiple sequence alignments (MSA) with Protein MSA was very onerous. Users often printed out pages of alignments, then drew the CDR and framework regions to identify differences.
Nonstandard analysis in Excel meant data QC and normalization methods were often inconsistent between users.
This time consuming process took days or weeks to complete, delaying the next assay start. It also limited the number of samples that could be screened and drug targets that could be simultaneously prosecuted per year.
Laboratory software is often plagued by antiquated interfaces, restricted to specific operating systems or requires extensive, expensive customization in order to be useful.
This can result in tools that have a low user adoption rate, are not used effectively and create risks for companies as technology in general continues to improve while their data languishes in outdated legacy systems.
We have developed a relatively lightweight, nimble data management and sequence analysis application suite that is specifically designed for antibody discovery using phage display.
It was developed quickly and cheaply in-house while providing a robust drug screening platform.
By pairing a flexible web application utilizing current best practices and frameworks with existing bioinformatics expertise, XAbTracker™ and SeqAgent™ are open to refinements and improvements to meet the ever changing needs of the phage screening process in R&D.
As thin client systems, they are able to run in the web browser of any computer.
Since they utilize responsive web UI components, the applications work equally well on PC, tablet and even smart phone platforms, providing the users with maximum flexibility.
In addition to using the applications to support antibody discovery via phage display, we have successfully used them in antibody discovery for hybridomas, antibody engineering utilizing light chain shuffling or XOMA’s proprietary TAE™ system.
We believe that these applications represent a significant advance over current applications that are available and provide a good illustration of what the future of laboratory software will look like.