ASMS 2010 Poster - Mark Bayliss, Virscidian Inc - Towards automated evaluation of result accuracy for LC/MS/UV/ELSD/CLND substance screening – supporting Library Management and Medicinal Chemistry
Qvarfordt Understanding The Benefits Of Gaze Enhanced Visual Search - Kalle
In certain applications such as radiology and imagery analysis, it is important to minimize errors. In this paper we evaluate a structured inspection method that uses eye tracking information as a feedback mechanism to the image inspector. Our two-phase method starts with a free viewing phase during which gaze data is collected. During the next phase, we either segment the image, mask previously seen areas of the image, or combine the two techniques, and repeat the search. We compare the different methods proposed for the second search phase by evaluating the inspection method using true positive and false negative rates, and subjective workload. Results show that gaze-blocked configurations reduced the subjective workload, and that gaze-blocking without segmentation showed the largest increase in true positive identifications and the largest decrease in false negative identifications of previously unseen objects.
Komogortsev Qualitative And Quantitative Scoring And Evaluation Of The Eye Mo... - Kalle
This paper presents a set of qualitative and quantitative scores designed to assess the performance of any eye movement classification algorithm. The scores are designed to provide a foundation for eye tracking researchers to communicate about the performance validity of various eye movement classification algorithms. The paper concentrates on five algorithms in particular: Velocity Threshold Identification (I-VT), Dispersion Threshold Identification (I-DT), Minimum Spanning Tree Identification (MST), Hidden Markov Model Identification (IHMM), and Kalman Filter Identification (I-KF). The paper presents an evaluation of the classification performance of each algorithm as the values of the input parameters are varied. Advantages provided by the new scores are discussed, and the question of which classification algorithm is "best" is considered for several applications. General recommendations for the selection of input parameters for each algorithm are provided.
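As a concrete illustration of the simplest of these algorithms, the sketch below labels gaze samples as fixations or saccades with a velocity threshold (I-VT). It is a minimal Python sketch on synthetic data; the 250 Hz sampling rate and 30 deg/s threshold are illustrative defaults, not the settings evaluated in the paper.

```python
import numpy as np

def ivt_classify(x, y, sample_rate_hz=250.0, velocity_threshold_deg_s=30.0):
    """Minimal I-VT sketch: label each gaze sample as fixation or saccade.

    x, y are gaze positions in degrees of visual angle; the threshold and
    sampling rate here are illustrative defaults, not values from the paper.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    dt = 1.0 / sample_rate_hz
    # Point-to-point angular velocity (deg/s); prepend 0 so lengths match.
    velocity = np.hypot(np.diff(x), np.diff(y)) / dt
    velocity = np.concatenate([[0.0], velocity])
    # Samples slower than the threshold are fixations, faster ones saccades.
    return np.where(velocity < velocity_threshold_deg_s, "fixation", "saccade")

if __name__ == "__main__":
    t = np.linspace(0, 1, 250)
    x = np.where(t < 0.5, 1.0, 6.0) + 0.05 * np.random.randn(250)  # one gaze jump
    y = np.zeros(250) + 0.05 * np.random.randn(250)
    labels = ivt_classify(x, y)
    print(dict(zip(*np.unique(labels, return_counts=True))))
```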
Stephen Friend, Institute for Cancer Research, 2011-11-01 - Sage Base
This document discusses building models of disease using data intensive science. It describes integrating omics data and computational models in a compute space. The challenges of the current drug discovery process are outlined, noting a need to better understand disease biology before testing compounds. Network models are proposed to capture disease complexity beyond single components. Examples are given of building gene co-expression networks from large datasets and using them to identify disease modules and key drivers. The potential for predictive models of genotype-specific drug responses is also mentioned.
Stephen Friend, National Heart Lung & Blood Institute, 2011-07-19 - Sage Base
The document discusses using "data intensive science" and network models to better understand human disease. It describes how large datasets from equipment that can generate massive amounts of data, combined with open information systems and evolving computational models, can be used to build better maps of human disease. This "fourth paradigm" of data-driven science is presented as an advantage over traditional reductionist approaches for accelerating disease elimination through open innovation.
Stephen Friend, Molecular Imaging Program at Stanford (MIPS), 2011-08-15 - Sage Base
The document discusses using "data intensive science" and building better disease maps through comprehensive monitoring of disease and molecular traits in large populations. It describes constructing co-expression networks from gene expression measures across hundreds of samples to identify modules of genes that interact. Preliminary probabilistic models have been built using these networks to directly identify genes that are causal for disease.
This document discusses using "data intensive science" and integrated omics data to build better maps of human diseases. It proposes using bionetworks to monitor molecular traits in populations at scale and generate massive datasets. These datasets could then be analyzed using computational approaches like Bayesian networks and co-expression networks to identify causal relationships between genes, traits, and diseases. This would allow moving beyond just identifying altered components to understanding disease mechanisms.
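As a toy illustration of the co-expression idea mentioned in these summaries, the sketch below builds a small network by thresholding pairwise Pearson correlations between genes. It is a deliberately simplified Python sketch on synthetic data; real co-expression pipelines use soft thresholds, module detection, and far larger cohorts.

```python
import numpy as np

def coexpression_edges(expr, gene_names, r_threshold=0.8):
    """Return gene pairs whose absolute Pearson correlation exceeds a cutoff.

    expr: samples x genes matrix. The 0.8 cutoff is illustrative; published
    co-expression pipelines typically use soft thresholds and module detection.
    """
    corr = np.corrcoef(expr, rowvar=False)          # gene x gene correlations
    n = corr.shape[0]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            if abs(corr[i, j]) >= r_threshold:
                edges.append((gene_names[i], gene_names[j], round(corr[i, j], 2)))
    return edges

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    base = rng.normal(size=(200, 1))                              # shared driver signal
    expr = np.hstack([base + 0.3 * rng.normal(size=(200, 3)),     # a co-expressed module
                      rng.normal(size=(200, 3))])                 # unrelated genes
    print(coexpression_edges(expr, [f"g{i}" for i in range(6)]))
```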
1. The document describes a study using spectroscopy to detect cervical dysplasia. Non-negative matrix factorization (NNMF) was used to decompose spectroscopy data into constituent source spectra and concentrations.
2. A machine learning model (Lasso regression) combined NNMF source concentrations to predict dysplasia levels. This improved prediction performance over individual reflectance or fluorescence data.
3. Two-dimensional disease maps were created locating cervical dysplasia tissue using the machine learning results. These maps correctly identified biopsy-confirmed normal and dysplastic tissue locations.
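A minimal Python sketch of the two-step analysis described above, using scikit-learn's NMF and Lasso on synthetic nonnegative "spectra"; the component count, regularization strength, and data are all illustrative assumptions rather than the study's settings.

```python
import numpy as np
from sklearn.decomposition import NMF
from sklearn.linear_model import Lasso

# Synthetic stand-in for spectroscopy data: each spectrum is a nonnegative
# mixture of a few source spectra (the real study used measured reflectance/
# fluorescence spectra and biopsy-graded dysplasia levels).
rng = np.random.default_rng(1)
sources = np.abs(rng.normal(size=(3, 100)))          # 3 source spectra, 100 wavelengths
weights = np.abs(rng.normal(size=(60, 3)))           # per-sample concentrations
spectra = weights @ sources + 0.01 * rng.random((60, 100))

# Step 1: NMF decomposes spectra into source spectra and concentrations.
nmf = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
concentrations = nmf.fit_transform(spectra)          # samples x components

# Step 2: Lasso regression maps component concentrations to a dysplasia score
# (here a synthetic target correlated with one of the true components).
dysplasia_score = weights[:, 0] + 0.1 * rng.normal(size=60)
model = Lasso(alpha=0.01).fit(concentrations, dysplasia_score)
print("Lasso coefficients per NMF component:", np.round(model.coef_, 3))
```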
Have cooperative retailers and distributors.
9) Be willing to participate in test marketing programs.
10) Have adequate facilities for data collection and analysis.
11) Be economically accessible.
12) Be willing to accept new products on a test basis.
13) Have stable economic and social conditions.
A presentation on smart content (what it is, how it is produced, why it is useful, and its relevance to the future of scholarly publishing) for the Association of American Publishers Professional and Scholarly Publishing Pre-Conference in Washington, D.C. on 2012-02-01.
ProteoMatch: modified block matching based technique for analysis of 2D gel images - techpack. Developed by the RNASA Lab at the University of A Coruña, Spain.
BAEB601 Chapter 2: Research Types, Objectives and Problem Statement - Dr Nur Suhaili Ramli
This chapter discusses research problem statements, objectives, and types. It defines basic components like variables and hypotheses. There are three main types of research: exploratory research to clarify problems, descriptive research to describe situations, and causal research to identify relationships. Developing a clear problem statement involves understanding the decision-maker's objectives, the background, isolating the core problem, and determining relevant variables and units of analysis. The research questions and objectives must be specific and measurable.
Test for HIV-associated cognitive impairment in India - Kimberly Schafer
This document describes the development of a brief screening battery to detect HIV-associated neurocognitive impairment (HAND) in India. Researchers administered a comprehensive neuropsychological (NP) battery to 206 HIV-positive Indian patients. Statistical analysis identified that combinations of two tests - the Brief Visuospatial Memory Test-Revised for learning and either the Color Trails 1 test for processing speed, Grooved Pegboard test for motor skills, or Digit Symbol test for processing speed - achieved high sensitivity and specificity for detecting HAND. The study aims to develop a quick iPad-based screening tool to assess cognitive functioning in resource-limited settings like India.
OWA BASED MAGDM TECHNIQUE IN EVALUATING DIAGNOSTIC LABORATORY UNDER FUZZY ENV... - ijfls
The aim of this paper is to present an evaluation process using the OWA operator in a fuzzy multi-attribute group decision making (MAGDM) technique for helping the health-care department choose a suitable diagnostic laboratory among several alternatives. In the decision-making process, experts provide linguistic terms to evaluate each of the alternatives, which are parameterized by generalized triangular fuzzy numbers (GTFNs). Subsequently, the fuzzy MAGDM method is applied to determine the overall performance value for each alternative (laboratory) and make a final decision. Finally, a diagnostic laboratory evaluation problem is presented involving seven evaluation attributes, five laboratories, and five experts.
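A minimal Python sketch of the aggregation step described above: expert linguistic ratings are mapped to triangular fuzzy numbers, aggregated componentwise with an OWA operator, and defuzzified to rank the laboratories. The linguistic scale, the orness level, and the centroid defuzzification are illustrative choices, not the paper's exact parameterization.

```python
import numpy as np

def owa_weights(n, orness=0.6):
    """Simple RIM-quantifier OWA weights from Q(r) = r**alpha; orness is illustrative."""
    alpha = (1.0 - orness) / orness
    q = lambda r: r ** alpha
    return np.array([q(i / n) - q((i - 1) / n) for i in range(1, n + 1)])

def owa(values, weights):
    """OWA: reorder values in descending order, then take the weighted sum."""
    return float(np.dot(np.sort(values)[::-1], weights))

# Linguistic ratings mapped to triangular fuzzy numbers (a, b, c); hypothetical scale.
scale = {"poor": (0, 1, 3), "fair": (3, 5, 7), "good": (7, 9, 10)}
experts_per_lab = {
    "Lab A": ["good", "good", "fair", "good", "fair"],
    "Lab B": ["fair", "poor", "fair", "good", "poor"],
}
w = owa_weights(5)
for lab, terms in experts_per_lab.items():
    # Aggregate each component of the triangular numbers with OWA, then
    # defuzzify by the centroid (a + b + c) / 3 to rank alternatives.
    agg = [owa([scale[t][k] for t in terms], w) for k in range(3)]
    print(lab, "score:", round(sum(agg) / 3, 2))
```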
Aggregating Published Prediction models with Individual Patient Data - NightlordTW
The document compares different approaches for aggregating published prediction models with individual patient data. It finds that Bayesian inference, which incorporates prior evidence from other models, performs similarly or better than standard logistic regression when heterogeneity across models is moderate. However, for strongly heterogeneous evidence, standard approaches without aggregation may be preferable. The study used simulations and a traumatic brain injury prediction task to evaluate the approaches.
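One simple way to realize the "prior evidence" idea is maximum a posteriori (MAP) logistic regression with a Gaussian prior centred on a published model's coefficients; the sketch below illustrates this on synthetic data. It is only an assumption-laden stand-in for the Bayesian inference compared in the study: the published coefficients and the prior standard deviation are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def map_logistic(X, y, prior_mean, prior_sd):
    """MAP logistic regression with a Gaussian prior centred on a published
    model's coefficients; prior_sd controls how much the prior evidence counts."""
    def neg_log_posterior(beta):
        z = X @ beta
        log_lik = np.sum(y * z - np.log1p(np.exp(z)))
        log_prior = -0.5 * np.sum(((beta - prior_mean) / prior_sd) ** 2)
        return -(log_lik + log_prior)
    return minimize(neg_log_posterior, x0=prior_mean, method="BFGS").x

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = np.hstack([np.ones((150, 1)), rng.normal(size=(150, 2))])  # intercept + 2 predictors
    true_beta = np.array([-1.0, 0.8, -0.5])
    y = (rng.random(150) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)
    published_beta = np.array([-0.9, 1.0, -0.4])      # hypothetical prior model
    print(map_logistic(X, y, published_beta, prior_sd=0.5).round(2))
```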
Enhancing high throughput screening for Mycobacterium tuberculosis drug discov... - Sean Ekins
This document summarizes research applying Bayesian machine learning models to enhance high-throughput screening for drug discovery against Mycobacterium tuberculosis (Mtb). The researchers built Bayesian classification models using over 200,000 compounds and their bioactivity data against Mtb. They tested the models on new screening data, achieving hit rates 4-10 times higher than random. The models were also used to prospectively select compounds for screening from large libraries, identifying several novel potent lead series. This work demonstrates that computational models can efficiently prioritize compounds for screening to increase hit discovery for Mtb drug development.
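The sketch below illustrates the general workflow with a Bernoulli naive Bayes classifier over binary fingerprint bits and a top-ranked "virtual screen" of held-out compounds. Everything in it is synthetic and hypothetical; the published models were built on real Mtb bioactivity data and molecular fingerprints, not the random bits used here.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for a screening set: binary substructure fingerprints
# (rows = compounds, columns = fingerprint bits) with an active/inactive label.
rng = np.random.default_rng(3)
fingerprints = rng.integers(0, 2, size=(2000, 512))
# Make a few bits weakly predictive of activity so the example has signal.
logit = fingerprints[:, :5].sum(axis=1) - 2.5
active = (rng.random(2000) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(fingerprints, active, random_state=0)
clf = BernoulliNB().fit(X_tr, y_tr)

# Rank the held-out "library" by predicted probability of activity and compare
# the hit rate in the top 5% against the overall (random-screening) hit rate.
scores = clf.predict_proba(X_te)[:, 1]
top = np.argsort(scores)[::-1][: len(scores) // 20]
print("top-5% hit rate:", round(y_te[top].mean(), 3), "baseline:", round(y_te.mean(), 3))
```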
Unification Of Randomized Anomaly In Deception Detection Using Fuzzy Logic Un... - IJORCS
In the current era of electronic communication, deception has a critical impact on efficient information sharing. Identifying deception in any mode of communication is a tedious process without proper tools for detecting these vulnerabilities. This paper deals with tools for deception detection, with the combined application of those tools, rather than their individual use, as the main focus. We propose a research model comprising fuzzy logic, uncertainty, and randomization, and describe an experiment implementing this mixture of techniques along with its results. We also discuss the combined approach in comparison with the individual performance of each technique.
The document presents research on defining and understanding youth preferences and behaviors. It outlines the research problem, design, phases, hypotheses, analysis and key findings. The research was conducted by a team and included exploratory qualitative research followed by a quantitative survey. Key findings included that age does define youth, independence is most important, loss of identity is a top worry, surfing the internet is a favorite pastime, peers have strong influence, products sell themselves over endorsers, and youth icons lose influence with age. Recommendations focused on using the internet for communication targeting independence among peers without an endorser.
The document discusses extreme spatio-temporal data analysis in biomedical informatics. It summarizes contributions in computer science methods for analyzing large datasets from sensors and in biomedical fields to mine insights. The talk outlines analyzing pathology images, tumor subtyping in brain tumors, whole slide image analysis in clinical practice, and tissue flow analysis using high performance computing.
The importance of learning how patients feel and function when taking a new clinical therapy has been acknowledged by the FDA, EMA, and other global regulatory authorities. Sponsors currently engaged in drug development programs appreciate and leverage the added value of patient-reported outcome (PRO) data. They no longer ask whether PROs should be collected, but in which phase to begin PRO collection.
This issue of Insights is intended to identify the costs (delays and expenses) of collecting patient-reported outcomes on paper and to compare these against electronic PRO capture. The intention is to provide clinical teams with industry data that can refute the presumption that paper methods are cheaper than ePRO.
This document describes two machine learning techniques, particle swarm optimization with support vector machines (PSO-SVM) and recursive feature elimination with support vector machines (RFE-SVM), that were used to classify autism neuroimaging data from the Autism Brain Imaging Data Exchange database. PSO-SVM was used to select discriminative features for classification, while RFE-SVM ranked features by importance. Both techniques aimed to improve classification accuracy and reduce overfitting by selecting optimal feature subsets from the high-dimensional neuroimaging data. The results could help develop brain-based diagnostic criteria for autism.
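A minimal Python sketch of the RFE-SVM half of that approach, using scikit-learn's RFE wrapper around a linear SVM on synthetic high-dimensional "neuroimaging" features; the feature counts, regularization, and data are illustrative assumptions, and the PSO-SVM variant is not shown.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for high-dimensional neuroimaging features (e.g. one value
# per connection or region); the ABIDE data themselves are not reproduced here.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 500))
y = rng.integers(0, 2, size=200)
X[y == 1, :10] += 0.8                      # ten weakly informative features

# RFE-SVM: repeatedly fit a linear SVM and drop the lowest-weight features.
selector = RFE(LinearSVC(C=0.1, dual=False, max_iter=5000),
               n_features_to_select=10, step=0.1)
model = make_pipeline(StandardScaler(), selector)
print("5-fold CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))
```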
In our experience, we have found a significant number of situations that force us to QC a much greater percentage of our LC/MS, UV, and ELSD compound QC results than should really be necessary; oftentimes this means 100% QC. Some of the reasons can be summarized as: target(s) found (green) but with the purity or concentration of the sample too low to be of practical use; targets found but eluting in a region with significant levels of impurities, and therefore more challenging for auto-purification; targets eluting within the solvent front or at the end of the chromatographic run, typically with poor integration; and targets poorly classified as found, maybe, or not found due to challenges in signal processing, baselining, peak integration, MS peak classification, poor assignment of adducts, and so on. The major issue, of course, was that we were not really sure to what extent these issues were prevalent or were causing us to over-QC results. To better understand these effects we have undertaken a relatively large-scale review of our results to determine where most of the problem situations occur and to remedy as many as possible. We were also looking to increase the trust we have in our processing and to be able to trap those situations where an analyst needs to make an informed decision and communicate this effectively. This presentation summarizes some of our findings and how we have attempted to solve these issues.
1) The document describes a study analyzing over 1,000 LC/UV and MS compound quality control results to understand common issues requiring unnecessary re-review and improve processing methods.
2) Using statistical analysis tools, the researchers compared traditional "traffic light" result categorization to a customized "combined query" approach, finding the latter exposed more hidden detail and improved focused review accuracy.
3) The study determined that while 100% review ensures the highest accuracy, a targeted workflow-based approach using specific result tags and queries could reliably classify results with 97% accuracy, reducing unnecessary re-review needs.
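A minimal sketch of what such a tag-and-query workflow could look like in Python is shown below; the tag names, thresholds, and release rule are hypothetical illustrations of the idea, not Virscidian's actual rules or accuracy figures.

```python
# Hypothetical sketch of the "result tag + combined query" idea: each sample's
# processed result is reduced to a set of tags, and a query over those tags
# decides whether it can be auto-released or must go to an analyst for review.

def tag_result(result):
    tags = set()
    if result["target_found"]:
        tags.add("TARGET_FOUND")
    if result["purity_pct"] < 85:
        tags.add("LOW_PURITY")
    if result["rt_min"] < 0.5 or result["rt_min"] > result["run_length_min"] - 0.5:
        tags.add("ELUTES_NEAR_VOID_OR_END")
    if result["coeluting_impurity_pct"] > 10:
        tags.add("COELUTING_IMPURITIES")
    return tags

def combined_query(tags):
    """Auto-release only clean hits; everything else is routed for review."""
    if "TARGET_FOUND" in tags and not tags & {
        "LOW_PURITY", "ELUTES_NEAR_VOID_OR_END", "COELUTING_IMPURITIES"
    }:
        return "auto-release"
    return "analyst review"

example = {"target_found": True, "purity_pct": 92.0, "rt_min": 2.8,
           "run_length_min": 5.0, "coeluting_impurity_pct": 3.0}
print(combined_query(tag_result(example)))   # -> auto-release
```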
Modeling XCS in class imbalances: Population sizing and parameter settings - kknsastry
This paper analyzes the scalability of the population size required in XCS to maintain niches that are infrequently activated. Facetwise models have been developed to predict the effect of the imbalance ratio—ratio between the number of instances of the majority class and the minority class that are sampled to XCS—on population initialization, and on the creation and deletion of classifiers of the minority class. While theoretical models show that, ideally, XCS scales linearly with the imbalance ratio, XCS with standard configuration scales exponentially.
The causes that are potentially responsible for this deviation from the ideal scalability are also investigated. Specifically, the inheritance procedure of classifiers’ parameters, mutation, and subsumption are analyzed, and improvements in XCS’s mechanisms are proposed to effectively and efficiently handle imbalanced problems. Once the recommendations are incorporated to XCS, empirical results show that the population size in XCS indeed scales linearly with the imbalance ratio.
My poster presentation in the jcms2011 conference - Pawitra Masa-ah
1) The study created a new scheme for calculating standardized uptake values (SUVs) from DICOM files using MATLAB and tested it by comparing results to GE Healthcare software.
2) The SUVs calculated from the MATLAB scheme showed a high correlation of 0.974 with the GE software. The accuracy was 85% on average, based on a 95% confidence interval.
3) The results demonstrated that the SUVs from the MATLAB scheme can be used interchangeably with the GE software, providing increased accessibility for physicians to interpret PET/CT scans without other applications.
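For reference, the body-weight SUV calculation itself is a short formula; the sketch below computes it from an activity concentration, the decay-corrected injected dose, and patient weight. The numeric example is illustrative and does not reproduce the study's MATLAB scheme.

```python
import math

def suv_bw(activity_bq_per_ml, injected_dose_bq, patient_weight_kg,
           half_life_s, delay_s):
    """Body-weight SUV for a PET voxel.

    activity_bq_per_ml: voxel value after applying the DICOM rescale slope;
    delay_s: time from injection to scan, used to decay-correct the dose.
    The numbers in the example below are illustrative, not from the study.
    """
    decay_corrected_dose = injected_dose_bq * math.exp(-math.log(2) * delay_s / half_life_s)
    # SUV = tissue activity concentration / (injected activity / body mass);
    # weight in grams and ~1 g/mL tissue density make the result dimensionless.
    return activity_bq_per_ml / (decay_corrected_dose / (patient_weight_kg * 1000.0))

# Example: 18F (half-life ~6588 s), 370 MBq injected, 60 min uptake, 70 kg patient.
print(round(suv_bw(5000.0, 370e6, 70.0, 6588.0, 3600.0), 2))
```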
Open science resources for `Big Data' Analyses of the human connectome - Cameron Craddock
Neuroimaging has become a `Big Data' pursuit that requires very large datasets and high throughput computational tools. In this talk I will highlight many open science resources for acquiring the necessary data. This is from a lecture that I gave in 2015 at the USC Neuroimaging and Informatics Institute.
This study developed a new quantitative method for mass spectrometry imaging (MSI) using matrix-assisted laser desorption/ionization (MALDI). Liver tissue samples from rats administered varying doses of olanzapine were analyzed by both MALDI-MSI and liquid chromatography-tandem mass spectrometry (LC/MS/MS) to determine drug concentrations. A linear correlation between MSI response and LC/MS/MS concentrations was obtained, allowing MSI data to be quantitated based on a conversion factor. This new method provides a way to quantitatively interpret MSI data in terms of drug concentrations and could help advance MSI for applications in drug development and safety assessment.
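A minimal sketch of the calibration step described above: fit a line relating mean MSI intensity to LC/MS/MS concentrations and invert it to convert new MSI measurements. All numbers are invented for illustration; the study's actual conversion factor and correlation come from its measured data.

```python
import numpy as np

# Hypothetical calibration: mean MALDI-MSI signal per tissue section plotted
# against the drug concentration measured by LC/MS/MS in the same sections.
lcmsms_conc_ug_per_g = np.array([0.5, 1.2, 2.4, 4.8, 9.5])
msi_mean_intensity = np.array([110.0, 260.0, 515.0, 1030.0, 2010.0])

# Least-squares fit gives the conversion factor used to turn MSI intensities
# into concentrations.
slope, intercept = np.polyfit(lcmsms_conc_ug_per_g, msi_mean_intensity, 1)
r = np.corrcoef(lcmsms_conc_ug_per_g, msi_mean_intensity)[0, 1]
print(f"conversion: intensity ~ {slope:.1f} * conc + {intercept:.1f}, r = {r:.3f}")

# Apply the inverse relation to convert a new MSI measurement to concentration.
new_intensity = 750.0
print("estimated concentration:", round((new_intensity - intercept) / slope, 2), "ug/g")
```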
Deep learning methods applied to physicochemical and toxicological endpoints - Valery Tkachenko
Chemical and pharmaceutical companies, and government agencies regulating both chemical and biological compounds, all strive to develop new methods that provide efficient prioritization, evaluation, and safety assessment for the hundreds of new chemicals that enter the market annually. While a lot of historical data is available within the various agencies, organizations, and companies, significant gaps remain in both the quantity and quality of the available data, as well as in optimal predictive methods. Traditional QSAR methods are based on sets of features (fingerprints) that represent the functional characteristics of chemicals. Unfortunately, due to both data gaps and limitations in the development of QSAR models, read-across approaches have become a popular area of research. Successes in the application of artificial neural networks, and specifically deep learning neural networks, have delivered new optimism that the lack of data and limited feature sets can be overcome by using deep learning methods. In this poster we present a comparison of various machine learning methods applied to several toxicological and physicochemical parameter endpoints. This abstract does not reflect U.S. EPA policy.
Metabolomic Data Analysis Workshop and Tutorials (2014) - Dmitry Grapov
This document provides an introduction and overview of tutorials for metabolomic data analysis. It discusses downloading required files and software. The goals of the analysis include using statistical and multivariate analyses to identify differences between sample groups and impacted biochemical domains. It also discusses various data analysis techniques including data quality assessment, univariate and multivariate statistical analyses, clustering, principal component analysis, partial least squares modeling, functional enrichment analysis, and network mapping.
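As a small taste of one step in that workflow, the sketch below autoscales a synthetic metabolite matrix and inspects group separation with PCA; the data and group effect are fabricated purely for illustration, and the tutorials cover far more (PLS modelling, clustering, enrichment, and network mapping).

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Autoscale a metabolite intensity matrix (samples x metabolites) and inspect
# group separation on the first two principal components.
rng = np.random.default_rng(5)
group = np.repeat([0, 1], 20)                       # e.g. control vs treated
X = rng.normal(size=(40, 120))
X[group == 1, :8] += 1.0                            # a handful of shifted metabolites

scores = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
for g in (0, 1):
    centroid = scores[group == g].mean(axis=0).round(2)
    print(f"group {g} PC1/PC2 centroid: {centroid}")
```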
Implementation of the Defined Approaches on Skin Sensitisation (OECD GL 497) ... - OECD Environment
Humans and the environment are exposed every day to chemicals. How do we make sure that these chemicals are safe?
Industry is required to test these chemicals to understand how they may affect people and the environment. In the past, these tests were most commonly carried out on animals. As scientific methods and tools progress, the use of animals to test products designed for humans is becoming obsolete, in addition to being unethical. With new methods being developed, it is possible to perform these tests on human and animal cell cultures with equally rigorous and robust results. Because the OECD is committed to chemical safety and animal welfare, a new ground-breaking Guideline on Defined Approaches for Skin Sensitisation (OECD GL 497: https://doi.org/10.1787/b92879a4-en) was released on 14 June 2021. It is the first ever Guideline that uses non-animal methods to predict whether a chemical can cause skin allergies.
The OECD organised a webinar on 18 October 2021 at 14:00 to discuss the implementation of the Defined Approaches on Skin Sensitisation for chemical safety in member countries. This webinar paved the way for companies and authorities to determine the environmental toxicity of chemicals without having to resort to animal testing.
Speakers:
Nicole Kleinstreuer: NTP Interagency Center for the Evaluation of Alternative Toxicological Methods (NICEATM)
Silvia Casati: European Union Reference Laboratory for alternatives to animal testing (EURL ECVAM)
Anna Lowit: U.S. Environmental Protection Agency's Office of Pesticide Programs (US EPA OPP)
Paul Brown: U.S. Food and Drug Administration (US FDA)
Laura Rossi: European Chemicals Agency (ECHA)
Andre Muller: National Institute for Public Health and the Environment (RIVM)
Access the video replay and more information about our work at: https://oe.cd/testing-assessment-webinars
The document discusses using a probabilistic neural network (PNN) to analyze seismic data and well logs to identify physical attributes. It describes the layers and processing of the PNN model, as well as examples of preprocessing seismic data and attributes to train the PNN to accurately predict properties such as porosity and hydrocarbon volume. The PNN is trained on normalized seismic attribute data and well logs, then applied to the full 3D seismic volume to generate property predictions across the area.
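In this seismic setting, the PNN prediction amounts to kernel-weighted regression from attributes to a log property; the sketch below shows that core idea on synthetic data. The kernel width and the attribute-to-porosity relationship are illustrative assumptions, not values from any real survey.

```python
import numpy as np

def pnn_predict(train_attrs, train_logs, test_attrs, sigma=0.3):
    """Minimal Parzen-window ("probabilistic neural network") regression sketch.

    Each training sample (a vector of seismic attributes paired with a log
    value such as porosity) contributes a Gaussian kernel; the prediction is
    the kernel-weighted average of the training log values.
    """
    preds = []
    for x in test_attrs:
        d2 = np.sum((train_attrs - x) ** 2, axis=1)      # squared distances
        k = np.exp(-d2 / (2.0 * sigma ** 2))             # pattern-layer kernels
        preds.append(np.dot(k, train_logs) / k.sum())    # summation/output layers
    return np.array(preds)

if __name__ == "__main__":
    rng = np.random.default_rng(6)
    attrs = rng.normal(size=(200, 3))                    # normalized seismic attributes
    porosity = 0.15 + 0.05 * attrs[:, 0] + 0.01 * rng.normal(size=200)
    test = np.array([[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]])
    print(pnn_predict(attrs, porosity, test).round(3))   # roughly 0.20 and 0.10
```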
The document summarizes a research paper that proposed a link prediction model for citation networks. It applied support vector machines (SVMs) as the classifier and used 11 features optimized for citation networks across 5 academic fields. The model was able to better predict links compared to just using the classifier's performance metrics. However, the effective features varied by academic field, suggesting different models should be applied for different research areas.
1) The document describes using a Bayesian network model in BayesiaLab to classify cancer samples into two types (ALL vs AML) based on gene expression data from microarray analysis.
2) It imports gene expression data from 72 samples and over 7,000 genes, discretizes the continuous gene expression levels, and identifies a subset of genes best for classification through Markov blanket learning.
3) The model achieves equal or better classification performance compared to previous studies, demonstrating that Bayesian networks can efficiently generate effective classification models from high-dimensional genomic data.
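The sketch below is a deliberately simplified stand-in for that workflow: quantile discretization of expression, a crude gene filter, and a naive Bayes classifier (a Bayesian network in which the class node is the only parent). It does not reproduce BayesiaLab's Markov blanket learning, and the expression matrix is synthetic rather than the 72-sample leukemia set.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.model_selection import train_test_split

# Synthetic expression matrix with two classes (e.g. ALL vs AML) and a few
# discriminative genes planted among many uninformative ones.
rng = np.random.default_rng(7)
labels = np.repeat([0, 1], 36)
expression = rng.normal(size=(72, 1000))
expression[labels == 1, :15] += 1.5

# Quantile discretization into 3 levels per gene (0 = low, 1 = medium, 2 = high).
cuts = np.quantile(expression, [1 / 3, 2 / 3], axis=0)
discrete = (expression > cuts[0]).astype(int) + (expression > cuts[1]).astype(int)

# Crude filter: keep the 15 genes whose mean discretized level differs most
# between the two classes, then fit and evaluate the naive Bayes classifier.
gap = np.abs(discrete[labels == 1].mean(0) - discrete[labels == 0].mean(0))
top = np.argsort(gap)[::-1][:15]
X_tr, X_te, y_tr, y_te = train_test_split(discrete[:, top], labels,
                                          stratify=labels, random_state=0)
print("held-out accuracy:", CategoricalNB(min_categories=3).fit(X_tr, y_tr).score(X_te, y_te))
```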
Use of the Crowdsourcing Methodology to Generate a Problem-Laboratory Test Kn... - Allison McCoy
We evaluated the use of a previously described crowdsourcing methodology to generate a problem-laboratory test knowledge base, identifying appropriately linked problem-laboratory test pairs by clinicians during e-ordering. Existing evaluation metrics, including patient frequency and link ratio, were not correlated with appropriateness for 600 links manually validated. Further research is necessary to better evaluate these associations.
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop... - IRJET Journal
This document presents a method for detecting hookworm infection and ulcers in wireless capsule endoscopy images using saliency-based segmentation. The proposed method uses multi-level superpixel segmentation followed by feature extraction of color and texture properties. A particle swarm optimization algorithm is then used to classify images as healthy or infected/ulcerous based on the extracted features. Experimental results on capsule endoscopy images demonstrate the effectiveness of the proposed method at automatically detecting abnormalities in an efficient and non-invasive manner.
In this presentation I show a set of important topics about empirical studies in software engineering that can be useful for increasing the quality of your thesis and monographs in general. You can read this presentation and think about how to do good experimentation: defining objectives, validation methods, questions and expected answers, and defining and measuring metrics. I also show how researchers can select data so as to avoid biased case studies, using the GQM (Goal-Question-Metric) methodology to organize the study in a simpler view.
This document provides an overview of hypothesis testing basics and introduces related concepts. It discusses:
1) The difference between population parameters and sample statistics, and how samples are used to estimate populations.
2) Key terms like means, medians, and standard deviations, and how sample statistics provide estimates of population parameters.
3) The Central Limit Theorem and how the distribution of sample means approaches normality as sample size increases.
4) Examples of applying hypothesis testing to compare processes and identify statistical differences in metrics like cycle time, accuracy, and quality of service.
This document provides an overview of hypothesis testing basics and confidence intervals. It discusses key concepts such as population parameters versus sample statistics, the central limit theorem, and variability of means. It also covers confidence intervals when the population standard deviation is known and unknown. Examples are provided to demonstrate how to calculate confidence intervals for the mean. The goal is to introduce statistical tests and understand how sample sizes influence results.
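A worked example of the two confidence-interval cases mentioned above (population standard deviation known versus unknown) is sketched below; the "cycle time" sample is invented purely to illustrate the calculation.

```python
import numpy as np
from scipy import stats

# Hypothetical cycle-time measurements (e.g. minutes per transaction).
sample = np.array([12.1, 11.4, 13.0, 12.7, 11.9, 12.3, 12.8, 11.6, 12.5, 12.2])
n, mean, s = len(sample), sample.mean(), sample.std(ddof=1)

# Case 1: population standard deviation known (say sigma = 0.6) -> z interval.
sigma = 0.6
z = stats.norm.ppf(0.975)
print("95% z interval:", (round(mean - z * sigma / np.sqrt(n), 2),
                          round(mean + z * sigma / np.sqrt(n), 2)))

# Case 2: sigma unknown -> use the sample SD and the t distribution (n - 1 df).
t = stats.t.ppf(0.975, df=n - 1)
print("95% t interval:", (round(mean - t * s / np.sqrt(n), 2),
                          round(mean + t * s / np.sqrt(n), 2)))
```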
This document discusses developing robust and controlled bioanalytical methods through the use of statistics and experimental design. It emphasizes scoping potential method parameters through screening experiments, using designs of experiments to understand interactions and optimize conditions, and validating methods to verify robustness under deliberate variations. The goal is to generate data-driven methods with well-defined performance that can be reliably transferred to quality control and brought into routine use.
Similar to Virscidian Poster Asms2010 Final Version Letter (20)
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability while sacrificing security. This best practices guide outlines steps users can take to better protect their personal devices and information.
Communications Mining Series - Zero to Hero - Session 1 - DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Sudheer Mechineni, Head of Application Frameworks, Standard Chartered Bank
Discover how Standard Chartered Bank harnessed the power of Neo4j to transform complex data access challenges into a dynamic, scalable graph database solution. This keynote will cover their journey from initial adoption to deploying a fully automated, enterprise-grade causal cluster, highlighting key strategies for modelling organisational changes and ensuring robust disaster recovery. Learn how these innovations have not only enhanced Standard Chartered Bank’s data infrastructure but also positioned them as pioneers in the banking sector’s adoption of graph technology.
What do a Lego brick and the XZ backdoor have in common? - Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might have in common the fact that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the XZ backdoor case have much more in common than that.
Join the presentation to immerse yourself in a story of interoperability, standards, and open formats, and then discuss the important role contributors play in a sustainable open source community.
BIO: Advocate of free software and of standard, open formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations, and training activities. Previously she worked on LibreOffice migrations and training courses for several public administrations and private companies. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not following her passion for computers and for Geeko she cultivates her curiosity about astronomy (which is where her nickname deneb_alpha comes from).
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
How to Get CNIC Information System with Paksim Ga.pptx - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... - SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
TrustArc Webinar - 2024 Global Privacy Survey - TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Full-RAG: A modern architecture for hyper-personalization - Zilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
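For readers who want a concrete starting point before the session, a minimal Atlas Vector Search query via PyMongo might look like the sketch below. The connection string, database, collection, index name (`vector_index`), embedding field (`embedding`) and the placeholder `query_embedding` are all assumptions for illustration, not details from the presentation.

```python
from pymongo import MongoClient

# Hypothetical deployment details; replace with your own Atlas cluster
client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")
collection = client["demo"]["articles"]

# Vector produced by the same embedding model used when the documents were indexed
query_embedding = [0.12, -0.03, 0.57]  # placeholder; real vectors have hundreds of dimensions

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",   # Atlas Search index with a vector field mapping
            "path": "embedding",       # field holding the document embeddings
            "queryVector": query_embedding,
            "numCandidates": 100,      # approximate-search candidate pool
            "limit": 5,                # top results returned
        }
    },
    {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
]

for doc in collection.aggregate(pipeline):
    print(doc.get("title"), doc["score"])
```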
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally taught in software curricula, so many of us cobble this knowledge together from whichever vendor or ecosystem we were first introduced to and whatever happens to be part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on:
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!SOFTTECHHUB
As the digital landscape continually evolves, operating systems play a critical role in shaping user experiences and productivity. The launch of Nitrux Linux 3.5.0 marks a significant milestone, offering a robust alternative to traditional systems such as Windows 11. This article delves into the essence of Nitrux Linux 3.5.0, exploring its unique features, advantages, and how it stands as a compelling choice for both casual users and tech enthusiasts.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series, part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of a CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new type of licensing works and what benefit it brings you. Above all, you certainly want to stay within budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also practices that can lead to unnecessary expense, for example using a person document instead of a mail-in database for shared mailboxes. We will show you such cases and their solutions. And of course we will explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep track of your environment. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Real-world examples and best practices you can apply immediately
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx toolkit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware, and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Virscidian Poster Asms2010 Final Version Letter
1. Towards automated evaluation of result accuracy for LC/MS/UV/ELSD/CLND substance screening – supporting Library Management and Medicinal Chemistry
Mark A. Bayliss, Joseph D. Simpkins, Virscidian Inc., Raleigh NC 27607
Abstract

The analysis of data supporting corporate compound library management, synthesis and medicinal chemistry relies on LC/MS/UV/ELSD/CLND/CAD as its primary means of substance confirmation and is often highly automated. Confirmation is defined here as the presence of the substance of interest, its purity (%Area of some chosen detector stream, typically UV) and, in some cases, an empirical concentration calculation using CLND, ELSD or CAD. Our perception after performing millions of sample analyses is that we had to manually review more results and make more modifications than we felt was time efficient. Our greatest challenges were baseline determination inaccuracies, poor signal differentiation in the MS for weakly ionizing compounds, and poor assessment of adducts. Our challenge was to find a way to quantify these aspects and evaluate solutions.

Method

• A statistically relevant batch of 600 random crude synthesis data sets was selected, representing reasonably challenging samples of the type found in Library Management support and Medicinal Chemistry. The data were originally acquired on an Agilent Technologies ion trap with the following data streams: MS1 (+ve), UV310 and ELSD. A fast chromatographic gradient over 2 minutes was used for separation of the substances.

Data Processing
• All data were analysed using Virscidian’s Analytical Studio Professional – Process Chemistry Plug-in software, pre-release version 1.2.
• The original instrument raw data were imported and converted to an Analytical Studio Archive file (*.ASA) for processing.
• The processing method was optimized for:
  • Peak picking, integration and peak selection criteria, using the interactive tuning system included in the software application and shown in Figure 1. An integration window was set to remove the contributions from the solvent front and the tail end of the gradient, where some excessive baseline ripples were present.
  • Specific method settings.
• Two different processing methods were then saved, with baseline settings optimized for the two test baseline algorithms that form the focus of this evaluation:
  • Baseline Algorithm 1 – a generic peak-picking based algorithm.
  • AsLS2 – a proprietary, in-house developed baseline based on a least squares approach (an illustrative sketch of the general technique is given after this list).
• Batches of data were then selected from different, non-consecutive days of sample acquisition to make up the test sample collection.
• All data were processed first using Baseline Algorithm 1 and then with the AsLS2 baseliner, and an Excel peak report was created in each case without review of the results.
• The post-AsLS2 baseline results were then inspected manually and, where appropriate, baseline adjustments and peak re-integrations were made and peaks added if required.
• For each baseline algorithm tested, the %Area results for the target were subtracted from the manually integrated results and then normalized to the %Area of the manually integrated results.
• Figure 1 shows an example low-intensity chromatogram of an expected target. Note that intensities ranged, as would normally be expected in this type of experiment, from no detection through to saturation.
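The AsLS2 baseliner itself is proprietary, but it is presumably related to the published asymmetric least squares (AsLS) smoothing approach of Eilers. Purely as a point of reference (this is not Virscidian’s algorithm, and the smoothness and asymmetry parameters shown are arbitrary assumptions), a minimal AsLS baseline sketch looks like this:

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def asls_baseline(y, lam=1e6, p=0.01, n_iter=10):
    """Estimate a slowly varying baseline under a chromatogram trace y.

    lam controls baseline smoothness and p the asymmetry: points above the
    current baseline estimate (peaks) receive the small weight p, points
    below receive 1 - p, so the fit is pushed underneath the peaks.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-difference operator used in the smoothness penalty
    D = sparse.diags([1.0, -2.0, 1.0], [0, -1, -2], shape=(n, n - 2))
    w = np.ones(n)
    z = y.copy()
    for _ in range(n_iter):
        W = sparse.spdiags(w, 0, n, n)
        Z = sparse.csc_matrix(W + lam * (D @ D.T))
        z = spsolve(Z, w * y)
        # Re-weight: peaks get weight p, baseline points get 1 - p
        w = p * (y > z) + (1.0 - p) * (y <= z)
    return z


# Usage: subtract the estimated baseline before integrating peak areas
# corrected = intensity - asls_baseline(intensity)
```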
Results

Of the 600 samples, a total of 500 were detected with the target substance determined as “Present”. Figure 2 shows the level of deviation in the calculated Area% – one of the key values typically reported back to Medicinal Chemists. The key point to note is that, in all but 4 of the 500 samples, the AsLS2 baseline performed well within the experimental limits expected of this type of study. (A change of +100 represents the addition of a target substance peak through manual integration, whereas a negative deviation represents a reduction in the %Area contribution of the target of interest.) The comparison baseliner was subject to greater levels of deviation than observed with AsLS2, which is in line with our previous smaller-scale investigations.

Sample 245 for the AsLS2 baseliner shows the addition of a previously non-picked peak. The reason this target was not selected was the peak filtering settings, not the baseliner; this is shown as the small chromatogram inset within the graph.

The key point that this study clearly highlights is that careful choice of baselining is a key criterion in obtaining accurate results that require minimal user review.

Figure 2: Plot of the deviation in Area% for the two baseline algorithms (Baseline Algorithm 1 and AsLS2) against the corresponding manually reviewed and integrated results (normalized AsLS2 difference in %Area of the component of interest, UV310; y-axis: % deviation from manually reviewed results, −100 to +100; x-axis: sample number, 0–500). Inset annotations mark a manual peak addition and a low-signal-intensity peak filtered by the peak selection settings.
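The comparison metric plotted in Figure 2 – the automated target %Area subtracted from the manually reviewed value and normalized to the manual result – reduces to a one-line calculation. A minimal sketch, with hypothetical values:

```python
def normalized_area_deviation(auto_area_pct: float, manual_area_pct: float) -> float:
    """Deviation of the automated target %Area from the manually reviewed
    value, normalized to the manual result. An automated result of 0
    (a peak recovered only by manual integration) gives +100."""
    return 100.0 * (manual_area_pct - auto_area_pct) / manual_area_pct


# e.g. automated 72 %Area vs. 80 %Area after manual review -> +10.0
print(normalized_area_deviation(72.0, 80.0))
```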
Sample-to-sample challenges

Another challenge in obtaining reliable %Area calculation results is being able to differentiate baseline disturbances from sample-related peaks. Figure 3 gives an indication of some of the challenges faced during this investigation. Even in the presence of the baseline ripples at the end of the chromatography, both baseliners, together with the peak filtration parameters that were applied, were able to deal with the majority of these issues. Additional future investigation of sample-to-sample baseline recognition and realignment may nevertheless yield an incremental improvement.
Solving the challenge of multiple data streams and signal contributions to determine target relevance

Challenge 1 – Target Relevance Determination

One of the challenges in automated analysis of synthetic targets is being able to determine whether the target is really there or not – we refer to this as “Target Relevance”. Because we have to analyze compounds with a wide structural diversity, we find a wide degree of detection differences between MS (positive and negative ionization) and the available analog detectors.

Some example scenarios that can cause challenges in target selection:
• (Target is found by MS (positive AND/OR negative ionization)) AND (has a high %TIC Area contribution) AND (has no UV response): this target compound may still be relevant for further investigation.
• (Target is weak by MS (positive AND/OR negative ionization)) AND (%TIC Area contribution is high) AND (has a high %UV Area contribution): may still be relevant for further investigation. The UV response could be based on multiple UV wavelengths, e.g. 210 nm, 254 nm and 310 nm.

Within the application used for these experiments is the ability to calculate a wide range of user-defined mathematical expressions, where the expressions can use calculated peak results. The calculated expression values are then exposed in the application interface and can be applied as interactive slider and logical-query based visualization filters to highlight samples of interest.

Example implementation

For a Target to be classified as Found, the following conditions must be met:
Target = Y AND (Area% UV210 >= 80% OR Area% UV254 >= 80%)

For a Target to be classified as a Maybe, the following conditions must be met:
Target = Y AND ((Area% UV210 is between 50 AND 80%) OR (Area% UV254 is between 50 AND 80%))

For a Target to be classified as Not Found, the following conditions must be met:
Target = Y or N AND ((Area% UV210 < 50%) OR (Area% UV254 < 50%))
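Read literally, these rules amount to a small decision function. The following is a minimal sketch of that logic (illustrative names, not Analytical Studio code), assuming the target-detected flag and the UV210/UV254 target %Area values have already been extracted from the processed peaks:

```python
def classify_target(target_detected: bool, area_uv210: float, area_uv254: float) -> str:
    """Apply the example Found / Maybe / Not Found rules to one sample.

    area_uv210 / area_uv254 are the target's %Area contributions in the
    UV210 and UV254 detector streams.
    """
    if target_detected and (area_uv210 >= 80.0 or area_uv254 >= 80.0):
        return "Found"
    if target_detected and (50.0 <= area_uv210 < 80.0 or 50.0 <= area_uv254 < 80.0):
        return "Maybe"
    # Everything else: target not detected, or detected only below 50 %Area on both channels
    return "Not Found"


# e.g. a detected target at 62 %Area (UV210) and 35 %Area (UV254) -> "Maybe"
print(classify_target(True, 62.0, 35.0))
```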
Challenge 2 – Visualization of Target Relevance

A further challenge is how to visualize arrays of results in a way that facilitates decision making based on Target Relevance. One approach that has been adopted is to allow the values of the calculated expressions to be visualized using a user-defined query and coloration system, contrasting a traditional plate display with a composite detector stream result visualization; differences are displayed as blue colored markers. The colorations are controlled entirely by the query system, and if a different series of target relevance criteria is required, these are simply added as new expressions and queries. Example queries:

(TARGET FOUND) AND (%AREA(210) >= 90% AND %AREA(254) >= 90%)
(TARGET FOUND) AND ((%AREA(210) >= 50% AND < 90%) OR (%AREA(254) >= 50% AND < 90%))
(TARGET NOT FOUND) OR ((TARGET FOUND) AND ((%AREA(210) < 50%) OR (%AREA(254) < 50%)))

Figure 4: Rapid batch-wise visualization of complex logic and value based decision making for target selection.
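As a rough illustration of how such query-driven coloration might be rendered outside the application (this is not the Analytical Studio display; the 96-well layout, the color mapping and the `per_well_results` input are assumptions), classifications can be painted onto a plate-style grid:

```python
import matplotlib.pyplot as plt

COLORS = {"Found": "green", "Maybe": "orange", "Not Found": "red"}


def plot_plate(classifications):
    """Color an 8 x 12 (96-well) grid by per-well classification,
    given 96 labels in row-major order (A1, A2, ..., H12)."""
    fig, ax = plt.subplots(figsize=(6, 4))
    for i, label in enumerate(classifications):
        row, col = divmod(i, 12)
        # Row A is drawn at the top of the grid
        ax.scatter(col, 7 - row, s=300, color=COLORS.get(label, "grey"))
    ax.set_xticks(range(12))
    ax.set_xticklabels([str(c + 1) for c in range(12)])
    ax.set_yticks(range(8))
    ax.set_yticklabels(list("HGFEDCBA"))
    ax.set_aspect("equal")
    ax.set_title("Target relevance by well")
    plt.show()


# Usage (hypothetical): plot_plate([classify_target(*peaks) for peaks in per_well_results])
```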
Conclusions

What is the practical application of this study?
1) Automation of raw data processing that requires minimal results review hinges fundamentally on the ability of the software to determine baselines and peak integrations accurately.
2) The study was designed holistically to evaluate the accuracy of peak determination, which encompasses accurate baselining, peak detection, tailing determination and peak selection.
3) It assesses the applicability of using predefined settings to process a range of data acquired with the same method and instrument but across different days.
4) It shows how logical tests can be defined across a range of detector streams to reduce the need for manual results QC.

1) Accuracy of baselining and peak integrations

This study confirms our earlier statement that not all baseline algorithms produce equivalent results. The comparison of our own internal baselining algorithms presented here clearly shows that the AsLS2 baseline outperforms the comparison peak-picking based approach (Baseline Algorithm 1).

In our manual review of integrated peaks, we found that the majority of failure situations fell into a small number of basic categories:
• Recurrent sample-to-sample baseline disturbances contributed the majority of non-sample-related peaks that were picked and included in the final result. While a background subtraction would clearly help, some form of sample-to-sample baseline realignment and recognition may provide additional improvements.
• Missed peaks due to tailing or fronting effects. A small percentage of samples required some form of manual peak re-integration or peak addition to overcome shouldering on peaks that were highly overlapped or of poor signal to noise. These are challenging situations for any automated algorithm; it is possible that data-driven adaptive approaches may provide additional improvements.
• Occasional peak filtration due to the peak picking and selection criteria. These were categorized as:
  o Low intensity peaks that were below the defined minimum area for peak selection.
  o Peaks dramatically wider than the normal peak widths set in the processing method.

While 100% accuracy in automated results is the shared goal in the community, the practicality of real data means that challenges will persist. A combination of result visualization approaches and exposure of data validation elements can provide a key way to guide the reviewer to these problems, as shown in this poster.

Certainly, not all baseline algorithms are equivalent. A high-performance baseliner is imperative if high-accuracy results that require minimal quality control are the goal. We have found that the peak picking and peak filtration algorithms that differentiate peaks from noise are equally important.

Figure 1: Low intensity UV310 extracted chromatogram that was used to calculate %Area for this series of analyses. Peaks displayed with a cross (x) have been picked but filtered out by the user-defined peak selection settings within the processing method.

Figure 3: Overlay visualization of UV310 for 64 test samples extracted from the matrix of samples processed. Note the baseline resonance towards the end of the chromatographic analysis; this region is problematic for any baseliner and peak detection algorithm, and its removal was not possible due to the elution of a small number of Target compounds. Due to the fast gradient, these late eluting resonance peaks are typically shifted in their retention times, making a simple baseline subtraction approach less effective.

For Further Information
www.virscidian.com
Contact Joseph Simpkins at jsimpkins@virscidian.com
Contact Mark Bayliss at mbayliss@virscidian.com
Virscidian Inc., 7330 Chapel Hill Road, Suite 201, Raleigh, NC 27607, USA
(919) 809-7651 or (919) 655-8050