Validating automated structure confirmation in a blind study

PRESENTED AT ENC MEETING IN 2007

The current study focuses not only on the performance of the verification algorithms but also on the automated preparation of experimental data through a blind test. This study was designed to prove that such a system would hold up in an industrial environment without any human intervention.

This study consisted of two distinct sets of structures and spectra. The first contained 19 sets of spectra (each dataset contained 1D 1H and 2D HSQC spectra) that were provided ahead of time so that processing settings and options could be adjusted. This step was necessary to identify the best software settings for the instrument and data collection practices of the laboratory where the samples were prepared and run. Once the first set had been run through the system and the results of the verification procedure obtained, the second (blind) test was performed on 10 distinct datasets (with chemical structures) that were not available to the software or the software operators in advance. The details and results of these two tests are presented here, along with a comprehensive look at the structures that could not be confirmed.
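The two-phase design (tune settings on a pilot set, then apply them unchanged to unseen data) can be sketched as a settings object that is adjusted once and then frozen. This is a minimal Python sketch under stated assumptions, not the ACD/Labs macro system: the class ProcessingSettings, its fields, the helper functions, and all numeric values are hypothetical. Only the workflow itself comes from the study, as do the kinds of adjustments made (water-peak exclusion, stricter peak picking, line broadening reduced by a factor of 10).

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class ProcessingSettings:
    """Hypothetical processing options; names and values are illustrative only."""

    line_broadening_hz: float    # intended exponential line broadening for the 1D FID
    exclude_water_region: bool   # whether to mask a "dark region" over the water signal
    peak_pick_threshold: float   # minimum peak height as a fraction of the tallest peak


# Hypothetical defaults standing in for the vendor's standard macros.
STANDARD = ProcessingSettings(
    line_broadening_hz=1.0, exclude_water_region=False, peak_pick_threshold=0.01
)


def tune_on_pilot_set(base: ProcessingSettings) -> ProcessingSettings:
    """Pilot phase: return a copy with the kinds of adjustments reported for the pilot data."""
    return replace(
        base,
        exclude_water_region=True,                        # broad water peaks
        peak_pick_threshold=0.05,                         # stricter picking for low S/N (value hypothetical)
        line_broadening_hz=base.line_broadening_hz / 10,  # reduced by a factor of 10
    )


def run_blind_test(dataset_ids, settings):
    """Blind phase: record the settings used for each unseen dataset (processing omitted)."""
    return {dataset_id: settings for dataset_id in dataset_ids}


tuned = tune_on_pilot_set(STANDARD)
used = run_blind_test([f"blind-{i}" for i in range(1, 11)], tuned)

# Every blind dataset gets the identical settings object, and it cannot be edited mid-run.
assert all(settings is tuned for settings in used.values())
try:
    tuned.line_broadening_hz = 0.5
except FrozenInstanceError:
    print("settings are frozen for the blind test")
```

Using dataclasses.replace leaves the baseline untouched, and frozen=True makes any attempt to change a setting during the blind run fail loudly, which mirrors the study's rule that no processing or verification parameters were changed between the pilot and blind tests.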


Ryan Sasaki, Brent Lefebvre, Antony J. Williams, and Sergey Golotvin
Advanced Chemistry Development, Inc., Toronto, ON, Canada

INTRODUCTION

In previous work, we have presented several findings on the automated evaluation of chemical structures using 1H, 13C, and 2D NMR verification algorithms [1, 2]. These studies showed that such systems performed extremely well across numerous challenges.

Setting Up Ideal Processing and Evaluation Parameters

For a system to run without human intervention, automated processing and structure verification procedures (macros) must be created in the software to perform these tasks. The raw 1D and 2D NMR datasets for 19 Aldrich compounds were first evaluated using ACD/Labs' standard macros. These settings proved insufficient because the datasets contained several abnormally broad water peaks and had low signal-to-noise (S/N) ratios. The macros were therefore modified to exclude these water peaks and to apply more stringent peak-picking criteria to counter the low S/N. The second attempt was an improvement but had problems with the referencing of one of the 2D datasets. In addition, the 1D spectra were not well resolved, which led to an inaccurate evaluation of some multiplets. These issues were rectified by decreasing the line-broadening setting in the software by a factor of 10, after which the settings were deemed sufficient.

Results of the First Test

The combined verification algorithms used to evaluate spectrum-to-structure matches have been described previously [2]. After the modifications to ACD/Labs' standard macros described in the previous section, the raw data for the 19 Aldrich compounds were fully processed and evaluated automatically.

The software successfully evaluated 13 of the 19 datasets provided. In other words, for this particular dataset, 68% of the samples were evaluated automatically without any human intervention. The remaining 6 samples would require manual analysis by an NMR spectroscopist because the software had flagged them as either inconsistent or incorrect (Table 1). Of these 6 samples, 4 false negatives (IDs 6, 12, 13, and 15) were traced to algorithm errors that have since been fixed in Version 10 of the software. The other two flagged results (IDs 8 and 11) required a closer look to be explained.

ID    Verification Result
 1    0.95
 2    0.81
 3    0.98
 4    0.75
 5    0.79
 6    0.00
 7    0.76
 8    0.36
 9    0.66
10    0.75
11    0.00
12    0.29
13    0.40
14    0.92
15    0.29
16    0.95
17    0.50
18    0.88
19    0.57

Table 1. The results of the 19 Aldrich datasets. For this dataset, 13 of the 19 datasets (68%) were automatically evaluated by the software.

The software was unable to confirm the proposed structure of ID #8 (Figure 1). Closer inspection showed that the failure occurred because the experimental peak at 4.74 ppm, corresponding to atom #11 in the proposed structure, had an integration value too low to be assigned correctly. As a result, the software flagged this result as ambiguous. It was later observed that the low integration value could be due to enol formation. A longer relaxation delay might have better prepared the experimental spectrum for automatic evaluation by the software.

Figure 1. ID #8 was a false negative because the integration value for the multiplet at 4.74 ppm was too low.

A second compound, ID #11, was also rejected by the software. Closer inspection of the experimental data showed that the purity of this compound was insufficient and that the sample contained several different components. In these two examples, the software correctly flagged the two problem spectra that required a closer look.

Results of the Blind Test

After the previous results and settings had been agreed upon, a blind test set of 10 compounds was run through the system in exactly the same fashion. No changes were made to the processing or verification parameters. The results of this test are shown in Table 2.

ID    Verification Result
 1    0.90
 2    0.62
 3    0.12
 4    0.99
 5    0.79
 6    0.99
 7    0.39
 8    0.85
 9    0.92
10    0.18

Table 2. The results of the 10 blind Aldrich datasets. For this dataset, 7 of the 10 datasets (70%) were automatically evaluated by the software.

Looking more closely at the failed samples provides some insight into the nature of the failures. ID #3 failed specifically because of an unassigned 2D peak at 17.8 ppm (13C) and 2.2 ppm (1H). This peak was determined to arise from the aromatic methyl group in the structure. The inconsistency between the predicted and experimental chemical shifts in this case was caused by slow rotation around the N-CO bond: the resulting rotamers produce an experimental spectrum that looks like a mixture. The software was unable to predict this mixture of forms under the experimental conditions; as a result, the predicted spectrum did not match the experimental one and the sample was flagged for manual analysis.

Figure 2. ID #3 was a false negative because the software was unable to assign the methyl group owing to slow rotation around the N-CO bond. The presence of this rotamer resulted in an inconsistency between the experimental and predicted chemical shifts.

The software also flagged sample ID #7 for closer inspection. In this case, the problem was traced to the presence of a mixture. Based on spectral features in both the 1H and HSQC–DEPT spectra, it is believed that some of the product had converted to an alcohol, giving a mixture of the brominated and hydroxylated products. The software therefore correctly identified this spectrum as inconsistent with the proposed structure, and it was flagged for manual analysis.

Figure 3. ID #7 was flagged as an ambiguous result. Spectral features in the 1H and HSQC–DEPT dataset suggest a mixture of the brominated and hydroxylated compounds.

The final sample in the blind test set (ID #10) was also flagged by the software. In this case, the software was unable to identify two protons in the experimental spectrum because it had correctly set a dark region over a large water peak. Unfortunately, this dark region also excluded an important multiplet lying close to the water peak, so the software was unable to confirm a match between the spectrum and the structure.

Figure 6. ID #10 failed because an important multiplet lies close to an intense water peak in the experimental spectrum.

CONCLUSIONS

The goal of this study was to evaluate a fully automated NMR processing and structure verification workflow on a blind test set of compounds. The processing and evaluation settings for a typical group of samples were set up using a pilot set of 19 compounds. Once these settings had been adjusted, they were used to automatically process and evaluate a blind set of 10 compounds prepared under the same conditions. The results revealed that this completely automated system could reduce the interpretation workload of a spectroscopist by up to 90% if samples with rotamer and impurity problems are filtered out before the NMR verification step, and by up to 70% when these problem samples are left in.

This study highlighted several examples in which datasets were flagged by the software for closer inspection by a spectroscopist. These examples illustrate the software's discrimination ability, which helps reduce the risk of false positives. The results of this blind study suggest that a fully automated processing and interpretation system can perform sufficiently well in an industrial environment.

ACKNOWLEDGEMENTS

The authors would like to acknowledge Dr. Timothy D. Spitzer and Randy D. Rutkowske of GlaxoSmithKline for providing the NMR data for the compounds in this study.

REFERENCES

1. Sergey S. Golotvin, Eugene Vodopianov, Brent A. Lefebvre, Antony J. Williams, and Timothy D. Spitzer. Automated Structure Verification Based on 1H NMR Prediction. Magn. Reson. Chem. 2006; 44: 524-538.
2. Sergey S. Golotvin, Eugene Vodopianov, Rostislav Pol, Brent A. Lefebvre, Antony J. Williams, and Timothy D. Spitzer. Automated Evaluation of a Chemical Structure with Only 1D 1H and 2D 1H–13C HSQC. ENC Poster, 2006.

Advanced Chemistry Development, Inc.
110 Yonge Street, 14th floor, Toronto, Ontario, Canada M5C 1T4
Tel: (416) 368-3435
Fax: (416) 368-5596
Toll Free: 1-800-304-3988
Email: info@acdlabs.com
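The automation rates behind Tables 1 and 2 can be checked directly from the listed verification scores. The poster does not state the pass/fail cutoff. In both tables, however, every score of 0.40 or below was flagged and every score of 0.50 or above was accepted, so this Python sketch assumes a hypothetical threshold of 0.45 purely for illustration. The function name automation_rate is likewise illustrative, not part of any ACD/Labs API.

```python
# Verification scores transcribed from Table 1 (pilot set) and Table 2 (blind set).
PILOT_SCORES = {1: 0.95, 2: 0.81, 3: 0.98, 4: 0.75, 5: 0.79, 6: 0.00, 7: 0.76,
                8: 0.36, 9: 0.66, 10: 0.75, 11: 0.00, 12: 0.29, 13: 0.40,
                14: 0.92, 15: 0.29, 16: 0.95, 17: 0.50, 18: 0.88, 19: 0.57}
BLIND_SCORES = {1: 0.90, 2: 0.62, 3: 0.12, 4: 0.99, 5: 0.79,
                6: 0.99, 7: 0.39, 8: 0.85, 9: 0.92, 10: 0.18}

# Hypothetical cutoff: not stated in the poster, but any value in (0.40, 0.50]
# separates the accepted and flagged samples in both tables.
THRESHOLD = 0.45


def automation_rate(scores: dict[int, float], threshold: float = THRESHOLD) -> tuple[float, list[int]]:
    """Return the fraction of samples accepted automatically and the IDs flagged for review."""
    flagged = sorted(sample_id for sample_id, score in scores.items() if score < threshold)
    accepted = len(scores) - len(flagged)
    return accepted / len(scores), flagged


for label, scores in (("pilot", PILOT_SCORES), ("blind", BLIND_SCORES)):
    rate, flagged = automation_rate(scores)
    print(f"{label}: {rate:.1%} automated, flagged IDs {flagged}")
```

Under this assumed cutoff the script prints 68.4% for the pilot set with flagged IDs [6, 8, 11, 12, 13, 15], and 70.0% for the blind set with flagged IDs [3, 7, 10]. These are the same samples discussed individually in the text.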
