ASMS 2010 Poster - Mark Bayliss, Virscidian Inc - Towards automated evaluation of result accuracy for LC/MS/UV/ELSD/CLND substance screening – supporting Library Management and Medicinal Chemistry
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search - Kalle
In certain applications such as radiology and imagery analysis, it is important to minimize errors. In this paper we evaluate a structured inspection method that uses eye tracking information as a feedback mechanism to the image inspector. Our two-phase method starts with a free viewing phase during which gaze data is collected. During the next phase, we either segment the image, mask previously seen areas of the image, or combine the two techniques, and repeat the search. We compare the different methods proposed for the second search phase by evaluating the inspection method using true positive and false negative rates, and subjective workload. Results show that gaze-blocked configurations reduced the subjective workload, and that gaze-blocking without segmentation showed the largest increase in true positive identifications and the largest decrease in false negative identifications of previously unseen objects.
Komogortsev Qualitative And Quantitative Scoring And Evaluation Of The Eye Mo... - Kalle
This paper presents a set of qualitative and quantitative scores designed to assess the performance of any eye movement classification algorithm. The scores are designed to provide a foundation for eye tracking researchers to communicate about the performance validity of various eye movement classification algorithms. The paper concentrates on five algorithms in particular: Velocity Threshold Identification (I-VT), Dispersion Threshold Identification (I-DT), Minimum Spanning Tree Identification (MST), Hidden Markov Model Identification (IHMM), and Kalman Filter Identification (I-KF). The paper presents an evaluation of the classification performance of each algorithm as the values of the input parameters are varied. Advantages provided by the new scores are discussed, and the question of which classification algorithm is "best" is considered for several applications. General recommendations for the selection of input parameters for each algorithm are provided.
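As a concrete illustration of the simplest of these algorithms, the sketch below labels gaze samples as fixations or saccades with a velocity threshold (I-VT). It is a minimal Python sketch on synthetic data; the 250 Hz sampling rate and 30 deg/s threshold are illustrative defaults, not the settings evaluated in the paper.

```python
import numpy as np

def ivt_classify(x, y, sample_rate_hz=250.0, velocity_threshold_deg_s=30.0):
    """Minimal I-VT sketch: label each gaze sample as fixation or saccade.

    x, y are gaze positions in degrees of visual angle; the threshold and
    sampling rate here are illustrative defaults, not values from the paper.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    dt = 1.0 / sample_rate_hz
    # Point-to-point angular velocity (deg/s); prepend 0 so lengths match.
    velocity = np.hypot(np.diff(x), np.diff(y)) / dt
    velocity = np.concatenate([[0.0], velocity])
    # Samples slower than the threshold are fixations, faster ones saccades.
    return np.where(velocity < velocity_threshold_deg_s, "fixation", "saccade")

if __name__ == "__main__":
    t = np.linspace(0, 1, 250)
    x = np.where(t < 0.5, 1.0, 6.0) + 0.05 * np.random.randn(250)  # one gaze jump
    y = np.zeros(250) + 0.05 * np.random.randn(250)
    labels = ivt_classify(x, y)
    print(dict(zip(*np.unique(labels, return_counts=True))))
```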
Stephen Friend, Institute for Cancer Research, 2011-11-01 - Sage Base
This document discusses building models of disease using data intensive science. It describes integrating omics data and computational models in a compute space. The challenges of the current drug discovery process are outlined, noting a need to better understand disease biology before testing compounds. Network models are proposed to capture disease complexity beyond single components. Examples are given of building gene co-expression networks from large datasets and using them to identify disease modules and key drivers. The potential for predictive models of genotype-specific drug responses is also mentioned.
Stephen Friend, National Heart Lung & Blood Institute, 2011-07-19 - Sage Base
The document discusses using "data intensive science" and network models to better understand human disease. It describes how large datasets from equipment that can generate massive amounts of data, combined with open information systems and evolving computational models, can be used to build better maps of human disease. This "fourth paradigm" of data-driven science is presented as an advantage over traditional reductionist approaches for accelerating disease elimination through open innovation.
Stephen Friend, Molecular Imaging Program at Stanford (MIPS), 2011-08-15 - Sage Base
The document discusses using "data intensive science" and building better disease maps through comprehensive monitoring of disease and molecular traits in large populations. It describes constructing co-expression networks from gene expression measures across hundreds of samples to identify modules of genes that interact. Preliminary probabilistic models have been built using these networks to directly identify genes that are causal for disease.
This document discusses using "data intensive science" and integrated omics data to build better maps of human diseases. It proposes using bionetworks to monitor molecular traits in populations at scale and generate massive datasets. These datasets could then be analyzed using computational approaches like Bayesian networks and co-expression networks to identify causal relationships between genes, traits, and diseases. This would allow moving beyond just identifying altered components to understanding disease mechanisms.
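As a toy illustration of the co-expression idea mentioned in these summaries, the sketch below builds a small network by thresholding pairwise Pearson correlations between genes. It is a deliberately simplified Python sketch on synthetic data; real co-expression pipelines use soft thresholds, module detection, and far larger cohorts.

```python
import numpy as np

def coexpression_edges(expr, gene_names, r_threshold=0.8):
    """Return gene pairs whose absolute Pearson correlation exceeds a cutoff.

    expr: samples x genes matrix. The 0.8 cutoff is illustrative; published
    co-expression pipelines typically use soft thresholds and module detection.
    """
    corr = np.corrcoef(expr, rowvar=False)          # gene x gene correlations
    n = corr.shape[0]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) >= r_threshold:
                edges.append((gene_names[i], gene_names[j], round(corr[i, j], 2)))
    return edges

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(200, 1))                              # shared driver signal
    expr = np.hstack([base + 0.3 * rng.normal(size=(200, 3)),     # a co-expressed module
                      rng.normal(size=(200, 3))])                 # unrelated genes
    print(coexpression_edges(expr, [f"g{i}" for i in range(6)]))
```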
1. The document describes a study using spectroscopy to detect cervical dysplasia. Non-negative matrix factorization (NNMF) was used to decompose spectroscopy data into constituent source spectra and concentrations.
2. A machine learning model (Lasso regression) combined NNMF source concentrations to predict dysplasia levels. This improved prediction performance over individual reflectance or fluorescence data.
3. Two-dimensional disease maps were created locating cervical dysplasia tissue using the machine learning results. These maps correctly identified biopsy-confirmed normal and dysplastic tissue locations.
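A minimal Python sketch of the two-step analysis described above, using scikit-learn's NMF and Lasso on synthetic nonnegative "spectra"; the component count, regularization strength, and data are all illustrative assumptions rather than the study's settings.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import Lasso

# Synthetic stand-in for spectroscopy data: each spectrum is a nonnegative
# mixture of a few source spectra (the real study used measured reflectance/
# fluorescence spectra and biopsy-graded dysplasia levels).
rng = np.random.default_rng(1)
sources = np.abs(rng.normal(size=(3, 100)))          # 3 source spectra, 100 wavelengths
weights = np.abs(rng.normal(size=(60, 3)))           # per-sample concentrations
spectra = weights @ sources + 0.01 * rng.random((60, 100))

# Step 1: NMF decomposes spectra into source spectra and concentrations.
nmf = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
concentrations = nmf.fit_transform(spectra)          # samples x components

# Step 2: Lasso regression maps component concentrations to a dysplasia score
# (here a synthetic target correlated with one of the true components).
dysplasia_score = weights[:, 0] + 0.1 * rng.normal(size=60)
model = Lasso(alpha=0.01).fit(concentrations, dysplasia_score)
print("Lasso coefficients per NMF component:", np.round(model.coef_, 3))
```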
Have cooperative retailers and distributors.
9) Be willing to participate in test marketing programs.
10) Have adequate facilities for data collection and analysis.
11) Be economically accessible.
12) Be willing to accept new products on a test basis.
13) Have stable economic and social conditions.
A presentation on smart content (what it is, how it is produced, why it is useful, and its relevance to the future of scholarly publishing) for the Association of American Publishers Professional and Scholarly Publishing Pre-Conference in Washington, D.C. on 2012-02-01.
ProteoMatch: modified block matching based technique for analysis of 2D gel images - techpack. Developed by the RNASA Lab at the University of A Coruña, Spain.
BAEB601 Chapter 2: Research Types, Objectives and Problem Statement - Dr Nur Suhaili Ramli
This chapter discusses research problem statements, objectives, and types. It defines basic components like variables and hypotheses. There are three main types of research: exploratory research to clarify problems, descriptive research to describe situations, and causal research to identify relationships. Developing a clear problem statement involves understanding the decision-maker's objectives, the background, isolating the core problem, and determining relevant variables and units of analysis. The research questions and objectives must be specific and measurable.
Test for HIV-associated cognitive impairment in India - Kimberly Schafer
This document describes the development of a brief screening battery to detect HIV-associated neurocognitive impairment (HAND) in India. Researchers administered a comprehensive neuropsychological (NP) battery to 206 HIV-positive Indian patients. Statistical analysis identified that combinations of two tests - the Brief Visuospatial Memory Test-Revised for learning and either the Color Trails 1 test for processing speed, Grooved Pegboard test for motor skills, or Digit Symbol test for processing speed - achieved high sensitivity and specificity for detecting HAND. The study aims to develop a quick iPad-based screening tool to assess cognitive functioning in resource-limited settings like India.
OWA BASED MAGDM TECHNIQUE IN EVALUATING DIAGNOSTIC LABORATORY UNDER FUZZY ENV... - ijfls
The aim of this paper is to present an evaluation process using the OWA operator in a fuzzy multi-attribute group decision making (MAGDM) technique for helping the health-care department choose a suitable diagnostic laboratory among several alternatives. In the decision-making process, experts provide linguistic terms to evaluate each of the alternatives, which are parameterized by generalized triangular fuzzy numbers (GTFNs). Subsequently, the fuzzy MAGDM method is applied to determine the overall performance value for each alternative (laboratory) and make a final decision. Finally, a diagnostic laboratory evaluation problem is presented involving seven evaluation attributes, five laboratories, and five experts.
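A minimal Python sketch of the aggregation step described above: expert linguistic ratings are mapped to triangular fuzzy numbers, aggregated componentwise with an OWA operator, and defuzzified to rank the laboratories. The linguistic scale, the orness level, and the centroid defuzzification are illustrative choices, not the paper's exact parameterization.

```python
import numpy as np

def owa_weights(n, orness=0.6):
    """Simple RIM-quantifier OWA weights from Q(r) = r**alpha; orness is illustrative."""
    alpha = (1.0 - orness) / orness
    q = lambda r: r ** alpha
    return np.array([q(i / n) - q((i - 1) / n) for i in range(1, n + 1)])

def owa(values, weights):
    """OWA: reorder values in descending order, then take the weighted sum."""
    return float(np.dot(np.sort(values)[::-1], weights))

# Linguistic ratings mapped to triangular fuzzy numbers (a, b, c); hypothetical scale.
scale = {"poor": (0, 1, 3), "fair": (3, 5, 7), "good": (7, 9, 10)}
experts_per_lab = {
    "Lab A": ["good", "good", "fair", "good", "fair"],
    "Lab B": ["fair", "poor", "fair", "good", "poor"],
}
w = owa_weights(5)
for lab, terms in experts_per_lab.items():
    # Aggregate each component of the triangular numbers with OWA, then
    # defuzzify by the centroid (a + b + c) / 3 to rank alternatives.
    agg = [owa([scale[t][k] for t in terms], w) for k in range(3)]
    print(lab, "score:", round(sum(agg) / 3, 2))
```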
Aggregating Published Prediction models with Individual Patient Data - NightlordTW
The document compares different approaches for aggregating published prediction models with individual patient data. It finds that Bayesian inference, which incorporates prior evidence from other models, performs similarly or better than standard logistic regression when heterogeneity across models is moderate. However, for strongly heterogeneous evidence, standard approaches without aggregation may be preferable. The study used simulations and a traumatic brain injury prediction task to evaluate the approaches.
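One simple way to realize the "prior evidence" idea is maximum a posteriori (MAP) logistic regression with a Gaussian prior centred on a published model's coefficients; the sketch below illustrates this on synthetic data. It is only an assumption-laden stand-in for the Bayesian inference compared in the study: the published coefficients and the prior standard deviation are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def map_logistic(X, y, prior_mean, prior_sd):
    """MAP logistic regression with a Gaussian prior centred on a published
    model's coefficients; prior_sd controls how much the prior evidence counts."""
    def neg_log_posterior(beta):
        z = X @ beta
        log_lik = np.sum(y * z - np.log1p(np.exp(z)))
        log_prior = -0.5 * np.sum(((beta - prior_mean) / prior_sd) ** 2)
        return -(log_lik + log_prior)
    return minimize(neg_log_posterior, x0=prior_mean, method="BFGS").x

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = np.hstack([np.ones((150, 1)), rng.normal(size=(150, 2))])  # intercept + 2 predictors
    true_beta = np.array([-1.0, 0.8, -0.5])
    y = (rng.random(150) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)
    published_beta = np.array([-0.9, 1.0, -0.4])      # hypothetical prior model
    print(map_logistic(X, y, published_beta, prior_sd=0.5).round(2))
```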
Enhancing high throughput screening for Mycobacterium tuberculosis drug discov... - Sean Ekins
This document summarizes research applying Bayesian machine learning models to enhance high-throughput screening for drug discovery against Mycobacterium tuberculosis (Mtb). The researchers built Bayesian classification models using over 200,000 compounds and their bioactivity data against Mtb. They tested the models on new screening data, achieving hit rates 4-10 times higher than random. The models were also used to prospectively select compounds for screening from large libraries, identifying several novel potent lead series. This work demonstrates that computational models can efficiently prioritize compounds for screening to increase hit discovery for Mtb drug development.
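The sketch below illustrates the general workflow with a Bernoulli naive Bayes classifier over binary fingerprint bits and a top-ranked "virtual screen" of held-out compounds. Everything in it is synthetic and hypothetical; the published models were built on real Mtb bioactivity data and molecular fingerprints, not the random bits used here.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for a screening set: binary substructure fingerprints
# (rows = compounds, columns = fingerprint bits) with an active/inactive label.
rng = np.random.default_rng(3)
fingerprints = rng.integers(0, 2, size=(2000, 512))
# Make a few bits weakly predictive of activity so the example has signal.
logit = fingerprints[:, :5].sum(axis=1) - 2.5
active = (rng.random(2000) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(fingerprints, active, random_state=0)
clf = BernoulliNB().fit(X_tr, y_tr)

# Rank the held-out "library" by predicted probability of activity and compare
# the hit rate in the top 5% against the overall (random-screening) hit rate.
scores = clf.predict_proba(X_te)[:, 1]
top = np.argsort(scores)[::-1][: len(scores) // 20]
print("top-5% hit rate:", round(y_te[top].mean(), 3), "baseline:", round(y_te.mean(), 3))
```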
Unification Of Randomized Anomaly In Deception Detection Using Fuzzy Logic Un... - IJORCS
In the current era of electronic communication, deception has a critical impact on efficient information sharing. Identifying deception in any mode of communication is a tedious process without proper tools for detecting these vulnerabilities. This paper deals with tools for deception detection, with the combined application of those tools, rather than their individual use, as the main focus. We propose a research model comprising fuzzy logic, uncertainty, and randomization, and describe an experiment implementing this mixture of techniques along with its results. We also discuss the combined approach in comparison with the individual performance of each technique.
The document presents research on defining and understanding youth preferences and behaviors. It outlines the research problem, design, phases, hypotheses, analysis and key findings. The research was conducted by a team and included exploratory qualitative research followed by a quantitative survey. Key findings included that age does define youth, independence is most important, loss of identity is a top worry, surfing the internet is a favorite pastime, peers have strong influence, products sell themselves over endorsers, and youth icons lose influence with age. Recommendations focused on using the internet for communication targeting independence among peers without an endorser.
The document discusses extreme spatio-temporal data analysis in biomedical informatics. It summarizes contributions in computer science methods for analyzing large datasets from sensors and in biomedical fields to mine insights. The talk outlines analyzing pathology images, tumor subtyping in brain tumors, whole slide image analysis in clinical practice, and tissue flow analysis using high performance computing.
The importance of learning how patients feel and function when taking a new clinical therapy has been acknowledged by the FDA, EMA, and other global regulatory authorities. Sponsors currently engaged in drug development programs appreciate and leverage the added value of patient-reported outcome (PRO) data. They no longer ask whether PROs should be collected, but in which phase to begin PRO collection.
This issue of Insights is intended to identify the costs (delays and expenses) of collecting patient-reported outcomes on paper and to compare these against electronic PRO capture. The intention is to provide clinical teams with industry data that can refute the presumption that paper methods are cheaper than ePRO.
This document describes two machine learning techniques, particle swarm optimization with support vector machines (PSO-SVM) and recursive feature elimination with support vector machines (RFE-SVM), that were used to classify autism neuroimaging data from the Autism Brain Imaging Data Exchange database. PSO-SVM was used to select discriminative features for classification, while RFE-SVM ranked features by importance. Both techniques aimed to improve classification accuracy and reduce overfitting by selecting optimal feature subsets from the high-dimensional neuroimaging data. The results could help develop brain-based diagnostic criteria for autism.
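A minimal Python sketch of the RFE-SVM half of that approach, using scikit-learn's RFE wrapper around a linear SVM on synthetic high-dimensional "neuroimaging" features; the feature counts, regularization, and data are illustrative assumptions, and the PSO-SVM variant is not shown.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for high-dimensional neuroimaging features (e.g. one value
# per connection or region); the ABIDE data themselves are not reproduced here.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 500))
y = rng.integers(0, 2, size=200)
X[y == 1, :10] += 0.8                      # ten weakly informative features

# RFE-SVM: repeatedly fit a linear SVM and drop the lowest-weight features.
selector = RFE(LinearSVC(C=0.1, dual=False, max_iter=5000),
               n_features_to_select=10, step=0.1)
model = make_pipeline(StandardScaler(), selector)
print("5-fold CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))
```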
In our experience, we have found a significant number of situations that force us to QC a much greater percentage of our LC/MS, UV, and ELSD compound QC results than should really be necessary; oftentimes this means 100% QC. Some of the reasons can be summarized as: target(s) found (green) but with the purity or concentration of the sample too low to be of practical use; targets found but eluting in a region with significant levels of impurities, and therefore more challenging for auto-purification; targets eluting within the solvent front or at the end of the chromatographic run, typically with poor integration; and targets poorly classified as found, maybe, or not found due to challenges in signal processing, baselining, peak integration, MS peak classification, poor assignment of adducts, and so on. The major issue, of course, was that we were not really sure to what extent these issues were prevalent or were causing us to over-QC results. To better understand these effects we have undertaken a relatively large-scale review of our results to determine where most of the problem situations occur and to remedy as many as possible. We were also looking to increase the trust we have in our processing and to be able to trap those situations where an analyst needs to make an informed decision and communicate this effectively. This presentation summarizes some of our findings and how we have attempted to solve these issues.
1) The document describes a study analyzing over 1,000 LC/UV and MS compound quality control results to understand common issues requiring unnecessary re-review and improve processing methods.
2) Using statistical analysis tools, the researchers compared traditional "traffic light" result categorization to a customized "combined query" approach, finding the latter exposed more hidden detail and improved focused review accuracy.
3) The study determined that while 100% review ensures the highest accuracy, a targeted workflow-based approach using specific result tags and queries could reliably classify results with 97% accuracy, reducing unnecessary re-review needs.
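A minimal sketch of what such a tag-and-query workflow could look like in Python is shown below; the tag names, thresholds, and release rule are hypothetical illustrations of the idea, not Virscidian's actual rules or accuracy figures.

```python
# Hypothetical sketch of the "result tag + combined query" idea: each sample's
# processed result is reduced to a set of tags, and a query over those tags
# decides whether it can be auto-released or must go to an analyst for review.

def tag_result(result):
    tags = set()
    if result["target_found"]:
        tags.add("TARGET_FOUND")
    if result["purity_pct"] < 85:
        tags.add("LOW_PURITY")
    if result["rt_min"] < 0.5 or result["rt_min"] > result["run_length_min"] - 0.5:
        tags.add("ELUTES_NEAR_VOID_OR_END")
    if result["coeluting_impurity_pct"] > 10:
        tags.add("COELUTING_IMPURITIES")
    return tags

def combined_query(tags):
    """Auto-release only clean hits; everything else is routed for review."""
    if "TARGET_FOUND" in tags and not tags & {
        "LOW_PURITY", "ELUTES_NEAR_VOID_OR_END", "COELUTING_IMPURITIES"
    }:
        return "auto-release"
    return "analyst review"

example = {"target_found": True, "purity_pct": 92.0, "rt_min": 2.8,
           "run_length_min": 5.0, "coeluting_impurity_pct": 3.0}
print(combined_query(tag_result(example)))   # -> auto-release
```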
Modeling XCS in class imbalances: Population sizing and parameter settings - kknsastry
This paper analyzes the scalability of the population size required in XCS to maintain niches that are infrequently activated. Facetwise models have been developed to predict the effect of the imbalance ratio—ratio between the number of instances of the majority class and the minority class that are sampled to XCS—on population initialization, and on the creation and deletion of classifiers of the minority class. While theoretical models show that, ideally, XCS scales linearly with the imbalance ratio, XCS with standard configuration scales exponentially.
The causes that are potentially responsible for this deviation from the ideal scalability are also investigated. Specifically, the inheritance procedure of classifiers’ parameters, mutation, and subsumption are analyzed, and improvements in XCS’s mechanisms are proposed to effectively and efficiently handle imbalanced problems. Once the recommendations are incorporated to XCS, empirical results show that the population size in XCS indeed scales linearly with the imbalance ratio.
My poster presentation in the jcms2011 conference - Pawitra Masa-ah
1) The study created a new scheme for calculating standardized uptake values (SUVs) from DICOM files using MATLAB and tested it by comparing results to GE Healthcare software.
2) The SUVs calculated from the MATLAB scheme showed a high correlation of 0.974 with the GE software. The accuracy was 85% on average, based on a 95% confidence interval.
3) The results demonstrated that the SUVs from the MATLAB scheme can be used interchangeably with the GE software, providing increased accessibility for physicians to interpret PET/CT scans without other applications.
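For reference, the body-weight SUV calculation itself is a short formula; the sketch below computes it from an activity concentration, the decay-corrected injected dose, and patient weight. The numeric example is illustrative and does not reproduce the study's MATLAB scheme.

```python
import math

def suv_bw(activity_bq_per_ml, injected_dose_bq, patient_weight_kg,
           half_life_s, delay_s):
    """Body-weight SUV for a PET voxel.

    activity_bq_per_ml: voxel value after applying the DICOM rescale slope;
    delay_s: time from injection to scan, used to decay-correct the dose.
    The numbers in the example below are illustrative, not from the study.
    """
    decay_corrected_dose = injected_dose_bq * math.exp(-math.log(2) * delay_s / half_life_s)
    # SUV = tissue activity concentration / (injected activity / body mass);
    # weight in grams and ~1 g/mL tissue density make the result dimensionless.
    return activity_bq_per_ml / (decay_corrected_dose / (patient_weight_kg * 1000.0))

# Example: 18F (half-life ~6588 s), 370 MBq injected, 60 min uptake, 70 kg patient.
print(round(suv_bw(5000.0, 370e6, 70.0, 6588.0, 3600.0), 2))
```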
Open science resources for `Big Data' Analyses of the human connectome - Cameron Craddock
Neuroimaging has become a `Big Data' pursuit that requires very large datasets and high throughput computational tools. In this talk I will highlight many open science resources for acquiring the necessary data. This is from a lecture that I gave in 2015 at the USC Neuroimaging and Informatics Institute.
This study developed a new quantitative method for mass spectrometry imaging (MSI) using matrix-assisted laser desorption/ionization (MALDI). Liver tissue samples from rats administered varying doses of olanzapine were analyzed by both MALDI-MSI and liquid chromatography-tandem mass spectrometry (LC/MS/MS) to determine drug concentrations. A linear correlation between MSI response and LC/MS/MS concentrations was obtained, allowing MSI data to be quantitated based on a conversion factor. This new method provides a way to quantitatively interpret MSI data in terms of drug concentrations and could help advance MSI for applications in drug development and safety assessment.
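A minimal sketch of the calibration step described above: fit a line relating mean MSI intensity to LC/MS/MS concentrations and invert it to convert new MSI measurements. All numbers are invented for illustration; the study's actual conversion factor and correlation come from its measured data.

```python
import numpy as np

# Hypothetical calibration: mean MALDI-MSI signal per tissue section plotted
# against the drug concentration measured by LC/MS/MS in the same sections.
lcmsms_conc_ug_per_g = np.array([0.5, 1.2, 2.4, 4.8, 9.5])
msi_mean_intensity = np.array([110.0, 260.0, 515.0, 1030.0, 2010.0])

# Least-squares fit gives the conversion factor used to turn MSI intensities
# into concentrations.
slope, intercept = np.polyfit(lcmsms_conc_ug_per_g, msi_mean_intensity, 1)
r = np.corrcoef(lcmsms_conc_ug_per_g, msi_mean_intensity)[0, 1]
print(f"conversion: intensity ~ {slope:.1f} * conc + {intercept:.1f}, r = {r:.3f}")

# Apply the inverse relation to convert a new MSI measurement to concentration.
new_intensity = 750.0
print("estimated concentration:", round((new_intensity - intercept) / slope, 2), "ug/g")
```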
Deep learning methods applied to physicochemical and toxicological endpoints - Valery Tkachenko
Chemical and pharmaceutical companies, and government agencies regulating both chemical and biological compounds, all strive to develop new methods that provide efficient prioritization, evaluation, and safety assessment for the hundreds of new chemicals that enter the market annually. While a lot of historical data is available within the various agencies, organizations, and companies, significant gaps remain in both the quantity and quality of the available data, as well as in optimal predictive methods. Traditional QSAR methods are based on sets of features (fingerprints) that represent the functional characteristics of chemicals. Unfortunately, due to both data gaps and limitations in the development of QSAR models, read-across approaches have become a popular area of research. Successes in the application of artificial neural networks, and specifically deep learning neural networks, have delivered new optimism that the lack of data and limited feature sets can be overcome by using deep learning methods. In this poster we present a comparison of various machine learning methods applied to several toxicological and physicochemical parameter endpoints. This abstract does not reflect U.S. EPA policy.
Metabolomic Data Analysis Workshop and Tutorials (2014) - Dmitry Grapov
This document provides an introduction and overview of tutorials for metabolomic data analysis. It discusses downloading required files and software. The goals of the analysis include using statistical and multivariate analyses to identify differences between sample groups and impacted biochemical domains. It also discusses various data analysis techniques including data quality assessment, univariate and multivariate statistical analyses, clustering, principal component analysis, partial least squares modeling, functional enrichment analysis, and network mapping.
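As a small taste of one step in that workflow, the sketch below autoscales a synthetic metabolite matrix and inspects group separation with PCA; the data and group effect are fabricated purely for illustration, and the tutorials cover far more (PLS modelling, clustering, enrichment, and network mapping).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Autoscale a metabolite intensity matrix (samples x metabolites) and inspect
# group separation on the first two principal components.
rng = np.random.default_rng(5)
group = np.repeat([0, 1], 20)                       # e.g. control vs treated
X = rng.normal(size=(40, 120))
X[group == 1, :8] += 1.0                            # a handful of shifted metabolites

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
for g in (0, 1):
    centroid = scores[group == g].mean(axis=0).round(2)
    print(f"group {g} PC1/PC2 centroid: {centroid}")
```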
Implementation of the Defined Approaches on Skin Sensitisation (OECD GL 497) ... - OECD Environment
Humans and the environment are exposed every day to chemicals. How do we make sure that these chemicals are safe?
Industry is required to test these chemicals to understand how they may affect people and the environment. In the past, these tests were most commonly carried out on animals. As scientific methods and tools progress, the use of animals to test products designed for humans is becoming obsolete, in addition to being unethical. With new methods being developed, it is possible to perform these tests on human and animal cell cultures with equally rigorous and robust results. Because the OECD is committed to chemical safety and animal welfare, a new ground-breaking Guideline on Defined Approaches for Skin Sensitisation (OECD GL 497: https://doi.org/10.1787/b92879a4-en) was released on 14 June 2021. It is the first ever Guideline that uses non-animal methods to predict whether a chemical can cause skin allergies.
The OECD organised a webinar on 18 October 2021 at 14:00 to discuss the implementation of the Defined Approaches on Skin Sensitisation for chemical safety in member countries. This webinar paved the way for companies and authorities to determine the environmental toxicity of chemicals without having to resort to animal testing.
Speakers:
Nicole Kleinstreuer: NTP Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM)
Silvia Casati: European Union Reference Laboratory for alternatives to animal testing (EURL ECVAM)
Anna Lowit: U.S. Environmental Protection Agency's Office of Pesticide Programs (US EPA OPP)
Paul Brown: U.S. Food and Drug Administration (US FDA)
Laura Rossi: European Chemicals Agency (ECHA)
Andre Muller: National Institute for Public Health and the Environment (RIVM)
Access the video replay and more information about our work at: https://oe.cd/testing-assessment-webinars
The document discusses using a probabilistic neural network (PNN) to analyze seismic data and well logs to identify physical attributes. It describes the layers and processing of the PNN model, as well as examples of preprocessing seismic data and attributes to train the PNN to accurately predict properties such as porosity and hydrocarbon volume. The PNN is trained on normalized seismic attribute data and well logs, then applied to the full 3D seismic volume to generate property predictions across the area.
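In this seismic setting, the PNN prediction amounts to kernel-weighted regression from attributes to a log property; the sketch below shows that core idea on synthetic data. The kernel width and the attribute-to-porosity relationship are illustrative assumptions, not values from any real survey.

```python
import numpy as np

def pnn_predict(train_attrs, train_logs, test_attrs, sigma=0.3):
    """Minimal Parzen-window ("probabilistic neural network") regression sketch.

    Each training sample (a vector of seismic attributes paired with a log
    value such as porosity) contributes a Gaussian kernel; the prediction is
    the kernel-weighted average of the training log values.
    """
    preds = []
    for x in test_attrs:
        d2 = np.sum((train_attrs - x) ** 2, axis=1)      # squared distances
        k = np.exp(-d2 / (2.0 * sigma ** 2))             # pattern-layer kernels
        preds.append(np.dot(k, train_logs) / k.sum())    # summation/output layers
    return np.array(preds)

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    attrs = rng.normal(size=(200, 3))                    # normalized seismic attributes
    porosity = 0.15 + 0.05 * attrs[:, 0] + 0.01 * rng.normal(size=200)
    test = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
    print(pnn_predict(attrs, porosity, test).round(3))   # roughly 0.20 and 0.10
```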
The document summarizes a research paper that proposed a link prediction model for citation networks. It applied support vector machines (SVMs) as the classifier and used 11 features optimized for citation networks across 5 academic fields. The model was able to better predict links compared to just using the classifier's performance metrics. However, the effective features varied by academic field, suggesting different models should be applied for different research areas.
1) The document describes using a Bayesian network model in BayesiaLab to classify cancer samples into two types (ALL vs AML) based on gene expression data from microarray analysis.
2) It imports gene expression data from 72 samples and over 7,000 genes, discretizes the continuous gene expression levels, and identifies a subset of genes best for classification through Markov blanket learning.
3) The model achieves equal or better classification performance compared to previous studies, demonstrating that Bayesian networks can efficiently generate effective classification models from high-dimensional genomic data.
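The sketch below is a deliberately simplified stand-in for that workflow: quantile discretization of expression, a crude gene filter, and a naive Bayes classifier (a Bayesian network in which the class node is the only parent). It does not reproduce BayesiaLab's Markov blanket learning, and the expression matrix is synthetic rather than the 72-sample leukemia set.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.model_selection import train_test_split

# Synthetic expression matrix with two classes (e.g. ALL vs AML) and a few
# discriminative genes planted among many uninformative ones.
rng = np.random.default_rng(7)
labels = np.repeat([0, 1], 36)
expression = rng.normal(size=(72, 1000))
expression[labels == 1, :15] += 1.5

# Quantile discretization into 3 levels per gene (0 = low, 1 = medium, 2 = high).
cuts = np.quantile(expression, [1 / 3, 2 / 3], axis=0)
discrete = (expression > cuts[0]).astype(int) + (expression > cuts[1]).astype(int)

# Crude filter: keep the 15 genes whose mean discretized level differs most
# between the two classes, then fit and evaluate the naive Bayes classifier.
gap = np.abs(discrete[labels == 1].mean(0) - discrete[labels == 0].mean(0))
top = np.argsort(gap)[::-1][:15]
X_tr, X_te, y_tr, y_te = train_test_split(discrete[:, top], labels,
                                          stratify=labels, random_state=0)
print("held-out accuracy:", CategoricalNB(min_categories=3).fit(X_tr, y_tr).score(X_te, y_te))
```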
Use of the Crowdsourcing Methodology to Generate a Problem-Laboratory Test Kn... - Allison McCoy
We evaluated the use of a previously described crowdsourcing methodology to generate a problem-laboratory test knowledge base, identifying appropriately linked problem-laboratory test pairs by clinicians during e-ordering. Existing evaluation metrics, including patient frequency and link ratio, were not correlated with appropriateness for 600 links manually validated. Further research is necessary to better evaluate these associations.
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop... - IRJET Journal
This document presents a method for detecting hookworm infection and ulcers in wireless capsule endoscopy images using saliency-based segmentation. The proposed method uses multi-level superpixel segmentation followed by feature extraction of color and texture properties. A particle swarm optimization algorithm is then used to classify images as healthy or infected/ulcerous based on the extracted features. Experimental results on capsule endoscopy images demonstrate the effectiveness of the proposed method at automatically detecting abnormalities in an efficient and non-invasive manner.
In this presentation I show a set of important topics about empirical studies in software engineering that can be useful for increasing the quality of your thesis and monographs in general. You can read this presentation and think about how to do good experimentation: defining objectives, validation methods, questions and expected answers, and defining and measuring metrics. I also show how researchers can select data so as to avoid biased case studies, using the GQM (Goal-Question-Metric) methodology to organize the study in a simpler view.
This document provides an overview of hypothesis testing basics and introduces related concepts. It discusses:
1) The difference between population parameters and sample statistics, and how samples are used to estimate populations.
2) Key terms like means, medians, and standard deviations, and how sample statistics provide estimates of population parameters.
3) The Central Limit Theorem and how the distribution of sample means approaches normality as sample size increases.
4) Examples of applying hypothesis testing to compare processes and identify statistical differences in metrics like cycle time, accuracy, and quality of service.
This document provides an overview of hypothesis testing basics and confidence intervals. It discusses key concepts such as population parameters versus sample statistics, the central limit theorem, and variability of means. It also covers confidence intervals when the population standard deviation is known and unknown. Examples are provided to demonstrate how to calculate confidence intervals for the mean. The goal is to introduce statistical tests and understand how sample sizes influence results.
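A worked example of the two confidence-interval cases mentioned above (population standard deviation known versus unknown) is sketched below; the "cycle time" sample is invented purely to illustrate the calculation.

```python
import numpy as np
from scipy import stats

# Hypothetical cycle-time measurements (e.g. minutes per transaction).
sample = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 12.8, 11.6, 12.5, 12.2])
n, mean, s = len(sample), sample.mean(), sample.std(ddof=1)

# Case 1: population standard deviation known (say sigma = 0.6) -> z interval.
sigma = 0.6
z = stats.norm.ppf(0.975)
print("95% z interval:", (round(mean - z * sigma / np.sqrt(n), 2),
                          round(mean + z * sigma / np.sqrt(n), 2)))

# Case 2: sigma unknown -> use the sample SD and the t distribution (n - 1 df).
t = stats.t.ppf(0.975, df=n - 1)
print("95% t interval:", (round(mean - t * s / np.sqrt(n), 2),
                          round(mean + t * s / np.sqrt(n), 2)))
```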
This document discusses developing robust and controlled bioanalytical methods through the use of statistics and experimental design. It emphasizes scoping potential method parameters through screening experiments, using designs of experiments to understand interactions and optimize conditions, and validating methods to verify robustness under deliberate variations. The goal is to generate data-driven methods with well-defined performance that can be reliably transferred to quality control and brought into routine use.
Similar to Virscidian Poster Asms2010 Final Version Letter (20)
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability while sacrificing security. This best practices guide outlines steps users can take to better protect their personal devices and information.
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
What do a Lego brick and the XZ backdoor have in common? - Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might have in common the fact that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to immerse yourself in a story of interoperability, standards, and open formats, and then discuss the important role contributors play in a sustainable open source community.
BIO: Advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations, and training activities. Previously she worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not following her passion for computers and for Geeko she cultivates her curiosity about astronomy (which is where her nickname deneb_alpha comes from).
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
How to Get CNIC Information System with Paksim Ga.pptx - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
TrustArc Webinar - 2024 Global Privacy Survey - TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Full-RAG: A modern architecture for hyper-personalization - Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
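For readers who want a concrete starting point before the session, a minimal Atlas Vector Search query via PyMongo might look like the sketch below. The connection string, database, collection, index name (`vector_index`), embedding field (`embedding`) and the placeholder `query_embedding` are all assumptions for illustration, not details from the presentation.

```python
from pymongo import MongoClient

# Hypothetical deployment details; replace with your own Atlas cluster
client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")
collection = client["demo"]["articles"]

# Vector produced by the same embedding model used when the documents were indexed
query_embedding = [0.12, -0.03, 0.57]  # placeholder; real vectors have hundreds of dimensions

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",   # Atlas Search index with a vector field mapping
            "path": "embedding",       # field holding the document embeddings
            "queryVector": query_embedding,
            "numCandidates": 100,      # approximate-search candidate pool
            "limit": 5,                # top results returned
        }
    },
    {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
]

for doc in collection.aggregate(pipeline):
    print(doc.get("title"), doc["score"])
```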
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally taught in software curricula, so many of us cobble this knowledge together from whichever vendor or ecosystem we were first introduced to and whatever happens to be part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefit it brings you. Above all, you certainly want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also practices that can lead to unnecessary expense, for example using a person document instead of a mail-in database for shared mailboxes. We will show you such cases and their solutions. And of course we will explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep track of your environment. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices you can apply immediately
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx toolkit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware, and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Virscidian Poster Asms2010 Final Version Letter
1. Towards automated evaluation of result accuracy for LC/MS/UV/ELSD/CLND substance screening – supporting Library Management and Medicinal Chemistry
Mark A. Bayliss, Joseph D. Simpkins, Virscidian Inc., Raleigh NC 27607
Abstract

The analysis of data supporting corporate compound library management, synthesis and medicinal chemistry relies on LC/MS/UV/ELSD/CLND/CAD as its primary means of substance confirmation and is often highly automated. Confirmation is defined here as the presence of the substance of interest, its purity (%Area of some chosen detector stream, typically UV) and, in some cases, an empirical concentration calculation using CLND, ELSD or CAD. Our perception after performing millions of sample analyses is that we had to manually review more results and make more modifications than we felt was time efficient. Our greatest challenges were baseline determination inaccuracies, poor signal differentiation in the MS for weakly ionizing compounds, and poor assessment of adducts. Our challenge was to find a way to quantify these aspects and evaluate solutions.

Method

• A statistically relevant batch of 600 random crude synthesis data sets was selected, representing reasonably challenging samples of the type found in Library Management support and Medicinal Chemistry. The data were originally acquired on an Agilent Technologies ion trap with the following data streams: MS1 (+ve), UV310 and ELSD. A fast chromatographic gradient over 2 minutes was used for separation of the substances.

Data Processing
• All data were analysed using Virscidian’s Analytical Studio Professional – Process Chemistry Plug-in software, pre-release version 1.2.
• The original instrument raw data were imported and converted to an Analytical Studio Archive file (*.ASA) for processing.
• The processing method was optimized for:
  • Peak picking, integration and peak selection criteria, using the interactive tuning system included in the software application and shown in Figure 1. An integration window was set to remove the contributions from the solvent front and the tail end of the gradient, where some excessive baseline ripples were present.
  • Specific method settings.
• Two different processing methods were then saved, with baseline settings optimized for the two test baseline algorithms that form the focus of this evaluation:
  • Baseline Algorithm 1 – a generic peak-picking based algorithm.
  • AsLS2 – a proprietary, in-house developed baseline based on a least squares approach (an illustrative sketch of the general technique is given after this list).
• Batches of data were then selected from different, non-consecutive days of sample acquisition to make up the test sample collection.
• All data were processed first using Baseline Algorithm 1 and then with the AsLS2 baseliner, and an Excel peak report was created in each case without review of the results.
• The post-AsLS2 baseline results were then inspected manually and, where appropriate, baseline adjustments and peak re-integrations were made and peaks added if required.
• For each baseline algorithm tested, the %Area results for the target were subtracted from the manually integrated results and then normalized to the %Area of the manually integrated results.
• Figure 1 shows an example low-intensity chromatogram of an expected target. Note that intensities ranged, as would normally be expected in this type of experiment, from no detection through to saturation.
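The AsLS2 baseliner itself is proprietary, but it is presumably related to the published asymmetric least squares (AsLS) smoothing approach of Eilers. Purely as a point of reference (this is not Virscidian’s algorithm, and the smoothness and asymmetry parameters shown are arbitrary assumptions), a minimal AsLS baseline sketch looks like this:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def asls_baseline(y, lam=1e6, p=0.01, n_iter=10):
    """Estimate a slowly varying baseline under a chromatogram trace y.

    lam controls baseline smoothness and p the asymmetry: points above the
    current baseline estimate (peaks) receive the small weight p, points
    below receive 1 - p, so the fit is pushed underneath the peaks.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-difference operator used in the smoothness penalty
    D = sparse.diags([1.0, -2.0, 1.0], [0, -1, -2], shape=(n, n - 2))
    w = np.ones(n)
    z = y.copy()
    for _ in range(n_iter):
        W = sparse.spdiags(w, 0, n, n)
        Z = sparse.csc_matrix(W + lam * (D @ D.T))
        z = spsolve(Z, w * y)
        # Re-weight: peaks get weight p, baseline points get 1 - p
        w = p * (y > z) + (1.0 - p) * (y <= z)
    return z


# Usage: subtract the estimated baseline before integrating peak areas
# corrected = intensity - asls_baseline(intensity)
```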
Results

Of the 600 samples, a total of 500 were detected with the target substance determined as “Present”. Figure 2 shows the level of deviation in the calculated Area% – one of the key values typically reported back to Medicinal Chemists. The key point to note is that, in all but 4 of the 500 samples, the AsLS2 baseline performed well within the experimental limits expected of this type of study. (A change of +100 represents the addition of a target substance peak through manual integration, whereas a negative deviation represents a reduction in the %Area contribution of the target of interest.) The comparison baseliner was subject to greater levels of deviation than observed with AsLS2, which is in line with our previous smaller-scale investigations.

Sample 245 for the AsLS2 baseliner shows the addition of a previously non-picked peak. The reason this target was not selected was the peak filtering settings, not the baseliner; this is shown as the small chromatogram inset within the graph.

The key point that this study clearly highlights is that careful choice of baselining is a key criterion in obtaining accurate results that require minimal user review.

Figure 2: Plot of the deviation in Area% for the two baseline algorithms (Baseline Algorithm 1 and AsLS2) against the corresponding manually reviewed and integrated results (normalized AsLS2 difference in %Area of the component of interest, UV310; y-axis: % deviation from manually reviewed results, −100 to +100; x-axis: sample number, 0–500). Inset annotations mark a manual peak addition and a low-signal-intensity peak filtered by the peak selection settings.
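The comparison metric plotted in Figure 2 – the automated target %Area subtracted from the manually reviewed value and normalized to the manual result – reduces to a one-line calculation. A minimal sketch, with hypothetical values:

```python
def normalized_area_deviation(auto_area_pct: float, manual_area_pct: float) -> float:
    """Deviation of the automated target %Area from the manually reviewed
    value, normalized to the manual result. An automated result of 0
    (a peak recovered only by manual integration) gives +100."""
    return 100.0 * (manual_area_pct - auto_area_pct) / manual_area_pct


# e.g. automated 72 %Area vs. 80 %Area after manual review -> +10.0
print(normalized_area_deviation(72.0, 80.0))
```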
Sample-to-sample challenges

Another challenge in obtaining reliable %Area calculation results is being able to differentiate baseline disturbances from sample-related peaks. Figure 3 gives an indication of some of the challenges faced during this investigation. Even in the presence of the baseline ripples at the end of the chromatography, both baseliners, together with the peak filtration parameters that were applied, were able to deal with the majority of these issues. Additional future investigation of sample-to-sample baseline recognition and realignment may nevertheless yield an incremental improvement.
Solving the challenge of multiple data streams and signal contributions to determine target relevance

Challenge 1 – Target Relevance Determination

One of the challenges in automated analysis of synthetic targets is being able to determine whether the target is really there or not – we refer to this as “Target Relevance”. Because we have to analyze compounds with a wide structural diversity, we find a wide degree of detection differences between MS (positive and negative ionization) and the available analog detectors.

Some example scenarios that can cause challenges in target selection:
• (Target is found by MS (positive AND/OR negative ionization)) AND (has a high %TIC Area contribution) AND (has no UV response): this target compound may still be relevant for further investigation.
• (Target is weak by MS (positive AND/OR negative ionization)) AND (%TIC Area contribution is high) AND (has a high %UV Area contribution): may still be relevant for further investigation. The UV response could be based on multiple UV wavelengths, e.g. 210 nm, 254 nm and 310 nm.

Within the application used for these experiments is the ability to calculate a wide range of user-defined mathematical expressions, where the expressions can use calculated peak results. The calculated expression values are then exposed in the application interface and can be applied as interactive slider and logical-query based visualization filters to highlight samples of interest.

Example implementation

For a Target to be classified as Found, the following conditions must be met:
Target = Y AND (Area% UV210 >= 80% OR Area% UV254 >= 80%)

For a Target to be classified as a Maybe, the following conditions must be met:
Target = Y AND ((Area% UV210 is between 50 AND 80%) OR (Area% UV254 is between 50 AND 80%))

For a Target to be classified as Not Found, the following conditions must be met:
Target = Y or N AND ((Area% UV210 < 50%) OR (Area% UV254 < 50%))
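Read literally, these rules amount to a small decision function. The following is a minimal sketch of that logic (illustrative names, not Analytical Studio code), assuming the target-detected flag and the UV210/UV254 target %Area values have already been extracted from the processed peaks:

```python
def classify_target(target_detected: bool, area_uv210: float, area_uv254: float) -> str:
    """Apply the example Found / Maybe / Not Found rules to one sample.

    area_uv210 / area_uv254 are the target's %Area contributions in the
    UV210 and UV254 detector streams.
    """
    if target_detected and (area_uv210 >= 80.0 or area_uv254 >= 80.0):
        return "Found"
    if target_detected and (50.0 <= area_uv210 < 80.0 or 50.0 <= area_uv254 < 80.0):
        return "Maybe"
    # Everything else: target not detected, or detected only below 50 %Area on both channels
    return "Not Found"


# e.g. a detected target at 62 %Area (UV210) and 35 %Area (UV254) -> "Maybe"
print(classify_target(True, 62.0, 35.0))
```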
Challenge 2 – Visualization of Target Relevance

A further challenge is how to visualize arrays of results in a way that facilitates decision making based on Target Relevance. One approach that has been adopted is to allow the values of the calculated expressions to be visualized using a user-defined query and coloration system, contrasting a traditional plate display with a composite detector stream result visualization; differences are displayed as blue colored markers. The colorations are controlled entirely by the query system, and if a different series of target relevance criteria is required, these are simply added as new expressions and queries. Example queries:

(TARGET FOUND) AND (%AREA(210) >= 90% AND %AREA(254) >= 90%)
(TARGET FOUND) AND ((%AREA(210) >= 50% AND < 90%) OR (%AREA(254) >= 50% AND < 90%))
(TARGET NOT FOUND) OR ((TARGET FOUND) AND ((%AREA(210) < 50%) OR (%AREA(254) < 50%)))

Figure 4: Rapid batch-wise visualization of complex logic and value based decision making for target selection.
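As a rough illustration of how such query-driven coloration might be rendered outside the application (this is not the Analytical Studio display; the 96-well layout, the color mapping and the `per_well_results` input are assumptions), classifications can be painted onto a plate-style grid:

```python
import matplotlib.pyplot as plt

COLORS = {"Found": "green", "Maybe": "orange", "Not Found": "red"}


def plot_plate(classifications):
    """Color an 8 x 12 (96-well) grid by per-well classification,
    given 96 labels in row-major order (A1, A2, ..., H12)."""
    fig, ax = plt.subplots(figsize=(6, 4))
    for i, label in enumerate(classifications):
        row, col = divmod(i, 12)
        # Row A is drawn at the top of the grid
        ax.scatter(col, 7 - row, s=300, color=COLORS.get(label, "grey"))
    ax.set_xticks(range(12))
    ax.set_xticklabels([str(c + 1) for c in range(12)])
    ax.set_yticks(range(8))
    ax.set_yticklabels(list("HGFEDCBA"))
    ax.set_aspect("equal")
    ax.set_title("Target relevance by well")
    plt.show()


# Usage (hypothetical): plot_plate([classify_target(*peaks) for peaks in per_well_results])
```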
Conclusions

What is the practical application of this study?
1) Automation of raw data processing that requires minimal results review hinges fundamentally on the ability of the software to determine baselines and peak integrations accurately.
2) The study was designed holistically to evaluate the accuracy of peak determination, which encompasses accurate baselining, peak detection, tailing determination and peak selection.
3) It assesses the applicability of using predefined settings to process a range of data acquired with the same method and instrument but across different days.
4) It shows how logical tests can be defined across a range of detector streams to reduce the need for manual results QC.

1) Accuracy of baselining and peak integrations

This study confirms our earlier statement that not all baseline algorithms produce equivalent results. The comparison of our own internal baselining algorithms presented here clearly shows that the AsLS2 baseline outperforms the comparison peak-picking based approach (Baseline Algorithm 1).

In our manual review of integrated peaks, we found that the majority of failure situations fell into a small number of basic categories:
• Recurrent sample-to-sample baseline disturbances contributed the majority of non-sample-related peaks that were picked and included in the final result. While a background subtraction would clearly help, some form of sample-to-sample baseline realignment and recognition may provide additional improvements.
• Missed peaks due to tailing or fronting effects. A small percentage of samples required some form of manual peak re-integration or peak addition to overcome shouldering on peaks that were highly overlapped or of poor signal to noise. These are challenging situations for any automated algorithm; it is possible that data-driven adaptive approaches may provide additional improvements.
• Occasional peak filtration due to the peak picking and selection criteria. These were categorized as:
  o Low intensity peaks that were below the defined minimum area for peak selection.
  o Peaks dramatically wider than the normal peak widths set in the processing method.

While 100% accuracy in automated results is the shared goal in the community, the practicality of real data means that challenges will persist. A combination of result visualization approaches and exposure of data validation elements can provide a key way to guide the reviewer to these problems, as shown in this poster.

Certainly, not all baseline algorithms are equivalent. A high-performance baseliner is imperative if high-accuracy results that require minimal quality control are the goal. We have found that the peak picking and peak filtration algorithms that differentiate peaks from noise are equally important.

Figure 1: Low intensity UV310 extracted chromatogram that was used to calculate %Area for this series of analyses. Peaks displayed with a cross (x) have been picked but filtered out by the user-defined peak selection settings within the processing method.

Figure 3: Overlay visualization of UV310 for 64 test samples extracted from the matrix of samples processed. Note the baseline resonance towards the end of the chromatographic analysis; this region is problematic for any baseliner and peak detection algorithm, and its removal was not possible due to the elution of a small number of Target compounds. Due to the fast gradient, these late eluting resonance peaks are typically shifted in their retention times, making a simple baseline subtraction approach less effective.

For Further Information
www.virscidian.com
Contact Joseph Simpkins at jsimpkins@virscidian.com
Contact Mark Bayliss at mbayliss@virscidian.com
Virscidian Inc., 7330 Chapel Hill Road, Suite 201, Raleigh, NC 27607, USA
(919) 809-7651 or (919) 655-8050