Intervet Chemicals Directory (ICD) - A Framework Combining Accelrys Pipeline Pilot and Symyx Isentris

Published on

Accelrys European User Group Meeting, Barcelona, Spain, October 26, 2010

Published in: Health & Medicine, Technology

  1. 1. Intervet Chemicals Directory (ICD): A Framework Combining Accelrys Pipeline Pilot and Symyx Isentris
     Frank Oellien, 10/26/2010, Accelrys European User Group Meeting, Barcelona
  2. 2. Outline
     • Motivation: ICD project (historical review)
     • Technical Implementation (2003)
     • ICD Today (enhancements in recent years)
     • Technical limitations of the Isentris approach
     • Solution: Combining Symyx Isentris & Accelrys PP
       – Structure Registration, Synchronization
       – Database Cleaning
       – Property Calculations
     SP Intervet Chemicals Directory (ICD) 10/26/2010 2
  3. 3. Motivation
     • Start of the ICD project: 2003
     • Company was still young
     • BioChemInformatics group (more precisely, the cheminformatics branch) started its work on a regular basis
       – Ligand- and structure-based virtual screening (LO and Hit2Lead projects)
       – Property and descriptor calculations
       – QSAR
       – Substructure and similarity searches
     → Access to many in-house data sources, especially structures, required
     → Many exchange formats used (including Excel and SD files)
     → Many diverse tools and applications used
  4. 4. Pre-ICD Time (before Q2 2003)
  5. 5. The Idea – A Central Data Source
     [Diagram; recoverable labels: Other Data Sources, In-house Databases, Supplier Data, SD (×6), ICD, CompLog, BCI Applications, Medicinal Chemists]
  6. 6. Requirements
     • Standard data source for all BCI tasks
     • Merged data source including in-house structures, supplier structures and other data sources
     • Dynamically updated
     • Structure database with unique structure identifier
     • Standardized and normalized data (including chemical normalization)
     • Extendable system that can store other BCI-relevant information (e.g. virtual screening data)
     Ask other scientists in the Drug Discovery department
     • Storing supplier catalogues and other supplier information
     • Data source for compound ordering
     • Accessible by other scientists (especially medicinal chemists)
     • Storage of physico-chemical properties for research projects
  7. 7. Implementation: Reasons for Isentris (2003)
     • Not many systems available in 2003 (Auspyx, Accord, Isentris)
     • Isentris used many technologies that were already available in-house (MDL Direct, Oracle)
     • Chemical normalization available: Cheshire
     • Advanced J2EE architecture and API that allow good customization and extension
     • CoRe: an already existing project based on Isentris
       – Intervet was an early adopter of Isentris
       – No additional software costs
       – Synergy effects (e.g. chemical business rules)
  8. 8. Implementation Overview
     [Diagram; recoverable labels:]
     • SD Files: supplier catalogs, TORE Updates (in-house)
     • CACTVS (Linux): file syntax normalisation of SD files; generation of salt information and ParentHash codes; prepared SD Files
     • Chemical Rules (CheckAndFix_Main.cct)
     • MDL Isentris (Client-Server)
     • Java applications (Windows): chemical normalisation, registration
     • ADME data (phys-chem properties): Oracle SQLLoader, Java application
     • ICD
  9. 9. Implementation: SD File Syntax Standardisation
     • Based on the CACTVS application (by Xemistry)
     • SD files can have different inputs
     • 2 generic scripts (supplier-specific, in-house-specific) to standardize the format of the input SD files, plus supplier-specific configuration files
     • SDF fields for supplier-related files: SupplierName, OrderNo, CatalogName, CatalogType, CatalogRelease, Confidential, CompoundName, IsSalt, Salt, Quantity, Purity
     • SDF fields for in-house data: AHNO, CompoundName, IsSalt, Salt
     • Calculation of structural hash codes (parent structure hash code)
       – Hash codes are insensitive to: isotope, salt, tautomer, stereochemistry
     • Automatic knowledge-based identification of salts
       → 174 different salts can be determined
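The field standardisation described on this slide could be sketched roughly as follows. This is a minimal Python illustration, not the CACTVS scripts used for the ICD. The source-side field names (`Vendor`, `CatNo`, `Name`), the mapping dictionary and the sample record are hypothetical; only the target field names (SupplierName, OrderNo, CompoundName) come from the slide.

```python
import re

# Target names are taken from the slide. The source-side names are
# hypothetical examples of what a supplier-specific file might use.
SUPPLIER_FIELD_MAP = {
    "Vendor": "SupplierName",
    "CatNo": "OrderNo",
    "Name": "CompoundName",
}

# Header line of an SD data item, e.g. "> <CatNo>"
DATA_HEADER = re.compile(r"^>\s*<([^>]+)>")


def parse_sd_fields(record):
    """Collect the data items ("> <FIELD>" blocks) of one SD record."""
    fields = {}
    current = None
    for line in record.splitlines():
        match = DATA_HEADER.match(line)
        if match:
            current = match.group(1)
            fields[current] = []
        elif current is not None:
            if line.strip() == "" or line.startswith("$$$$"):
                current = None  # a blank line ends the data item
            else:
                fields[current].append(line)
    return {name: "\n".join(lines) for name, lines in fields.items()}


def standardise_fields(fields, field_map):
    """Rename mapped supplier fields; drop anything not in the map."""
    return {field_map[k]: v for k, v in fields.items() if k in field_map}


# Hypothetical supplier record; the structure block is left out here.
record = """\
  (molfile block omitted in this sketch)
M  END
> <Vendor>
ExampleChem

> <CatNo>
EC-0001

> <Internal>
ignored

$$$$
"""

standardised = standardise_fields(parse_sd_fields(record), SUPPLIER_FIELD_MAP)
print(standardised)
```

Keeping the supplier-specific part in a mapping mirrors the slide's split between generic scripts and supplier-specific configuration files: the code stays the same and only the mapping changes per supplier.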
  10. 10. Implementation: Chemical Normalisation
      • Based on Cheshire (part of the Isentris framework)
      • JavaScript clone
      • Valence checks, Ion2kov, nitro group, transition metals, queries, geometries, stereochemistry, …
      • 99 rules
        – 45 correction functions
        – 29 warning functions
        – 25 error functions
      • Used by CoRe and ICD applications
      • Input: molfile string
      • Output: molfile string and message string
        → Category; no. of changes; list of descriptions
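One way to picture how the three rule categories could roll up into a per-record status is sketched below. This is a Python illustration only, not the Cheshire scripts. The precedence (errors first, then corrections combined with warnings) is an assumption, the message descriptions are invented, and the status labels are borrowed from the registration log shown on the following slide.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    CORRECTION = "correction"
    WARNING = "warning"
    ERROR = "error"


@dataclass(frozen=True)
class RuleMessage:
    category: Category
    description: str  # free text; the examples below are invented


def record_status(messages: list[RuleMessage]) -> str:
    """Reduce all rule messages of one record to a single status label.

    The precedence used here is an assumption made for this sketch.
    """
    categories = {m.category for m in messages}
    if Category.ERROR in categories:
        return "with errors"
    if Category.CORRECTION in categories and Category.WARNING in categories:
        return "fixed but still have warnings"
    if Category.CORRECTION in categories:
        return "fixed"
    if Category.WARNING in categories:
        return "with warnings"
    return "without changes"


messages = [
    RuleMessage(Category.CORRECTION, "hypothetical: nitro group redrawn"),
    RuleMessage(Category.WARNING, "hypothetical: undefined stereo centre"),
]
print(record_status(messages))
```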
  11. 11. Implementation: Registration
      • Based on Symyx Isentris Java Client (now Accelrys Isentris)
      • Using Isentris Data Sources (Data Source Factory)
      • 3 Java applications (in-house structures, supplier, virtual screening)
        → 31 Java classes, ~9,500 lines of code
      • Run types: command line, GUI, batch mode
      • Chem. normalisation, duplicate check, registration logic

      *******************************************************
      *
      * ICD Supplier Registration
      * Version null
      * Frank Oellien, Intervet Innovation GmbH
      *
      *******************************************************
      1:10:21 PM INFO: Chemical Normalization status:
          304334 records without changes
          4685 records fixed
          11 records fixed but still have warnings
          200 records with warnings
          2 records with errors
      1:10:21 PM INFO: Chemical Registration status:
      1:10:21 PM INFO: New supplier has been registered.
          309230 records to register
          309222 records passed registration
          8 records failed registration
          180284 new structues registered
          128938 structues already found in the DB
      *******************************************************
      1:10:21 PM INFO: Closing Cheshire environment...
      1:10:22 PM INFO: Releasing the ICD datasource resources ...
      1:10:22 PM INFO: Closing the ICD DataSourceFactory...
      1:10:22 PM INFO: Logout...
      1:10:22 PM INFO: All resources released.
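The counts in the log above are mutually consistent, which a short Python check makes explicit. All numbers are copied from the log. The reading that the two records with errors are the ones withheld from registration is an inference from the arithmetic, not something the slide states.

```python
# Counts copied from the registration log on the slide.
normalisation = {
    "without changes": 304334,
    "fixed": 4685,
    "fixed but still have warnings": 11,
    "with warnings": 200,
    "with errors": 2,
}
to_register = 309230
passed, failed = 309222, 8
new_structures, already_in_db = 180284, 128938

total_checked = sum(normalisation.values())

# Inference: records with errors are not handed on to registration.
assert total_checked - normalisation["with errors"] == to_register

# Every record handed to registration either passed or failed.
assert passed + failed == to_register

# Every registered record is either a new structure or a duplicate.
assert new_structures + already_in_db == passed

duplicate_share = already_in_db / passed
print(f"{total_checked} records checked, {duplicate_share:.1%} already in the DB")
```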
  12. 12. ICD Today - Datasheet
      • ~ 11,500,000 structures
      • 237 different catalogues (including screening libraries, focused data sets)
      • 60 suppliers
      • A broad range of standard physicochemical properties
      • Intervet’s in-house database
      • Specific Intervet data sets
      • References to external sources (PubChem)
  13. 13. ICD Today – Change of Relevance
      • Still the main data source for the BCI group, although almost all other BCI technologies have changed in the meantime
      • Moreover, it has become a key technology platform for the whole Drug Discovery process
        – Almost all compound logistics activities are based on the ICD (applications for compound ordering)
        – Stores specific essential information for CompLog
        – Important database for Hit2Lead and LO projects (contains decision-critical properties)
        – Has become the most important structure database for medicinal chemists
      • Isentris upgrade to 3.1 → re-design of the ICD Isentris part necessary
      • New demands by BCI and others had to be implemented → could not be realized with the former setup because of its limitations
      • Solution: combination with Pipeline Pilot
  14. 14. Limitations of the original Isentris Setup
      From the Beginning
      • Starting with Isentris 1.1, early adopters
      • Hard to implement: large, over-designed J2EE API, no developer guides, only some small code snippets
      • Limited and complicated functions
        – e.g. no support for very large structure files
      • Re-design of applications was necessary because of Isentris updates
      • No automation, everything is done in user context!
      Regarding recent Demands
      • Missing automation was still the most critical issue:
        – Synchronisation
        – Adding non-structural data
      • Elaborate database cleaning mechanisms
  15. 15. Registration of Supplier Catalogues
      [Diagram; recoverable labels:]
      • SD Files
      • CACTVS (Linux): structural normalisation of SD files; generation of salt information and ParentHash codes; prepared SD Files
      • Chemical Rules (CheckAndFix_Main.cct)
      • MDL Isentris (Client-Server)
      • Java applications (Windows): chemical normalisation, registration
      • ICD
  16. 16. Registration of in-house Structures by PP I
      [Diagram; recoverable labels:]
      • in-house database
      • CACTVS called by PP: structural normalisation of SD files; generation of salt information and ParentHash codes
      • Chemical Rules (CheckAndFix_Main.cct)
      • Cheshire PP Component: chemical normalisation, registration
      • Synchronisation by Pipeline Pilot (Linux)
      • ICD
  17. 17. Registration of in-house Structures by PP II
      [Protocol screenshot; recoverable stage labels:]
      • Retrieve structures from database
      • Call CACTVS application
      • Chemical Normalisation & Registration
  18. 18. Cheshire PP Component (Java)
      • Implemented as PP Java component
      • Based on Cheshire Java API
      • Calls Cheshire core library (shared object files called by JNI)
  20. 20. Cheshire PP Component (Java)
  21. 21. Importing physico-chemical Properties I
      [Diagram; recoverable labels: ADME data (phys-chem properties), Oracle SQLLoader, Java application, ICD]
  22. 22. Importing physico-chemical Properties I
      [Diagram; recoverable labels:]
      • Retrieval of structures without properties
      • External application 1 (standardize)
      • External applications 2, 3 and 4 (descriptors)
      • Internal PP components (descriptors)
      • ADME data (phys-chem properties)
      • Oracle SQLLoader, Java application
      • Import properties
      • ICD
      • Managed by Pipeline Pilot (Linux)
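The "retrieval of structures without properties" and "import properties" steps on this slide could look roughly like the sketch below. It is an illustration only: the ICD runs on Oracle and the workflow is managed by Pipeline Pilot, whereas this uses Python's built-in `sqlite3` as a stand-in. The table names, column names and the placeholder property are hypothetical.

```python
import sqlite3

# In-memory stand-in for the database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE structures (icd_id INTEGER PRIMARY KEY, molfile TEXT);
    CREATE TABLE properties (icd_id INTEGER, name TEXT, value REAL);
    INSERT INTO structures VALUES (1, 'mol-1'), (2, 'mol-2'), (3, 'mol-3');
    INSERT INTO properties VALUES (1, 'logP', 2.1);
""")


def structures_without_properties(conn):
    """Return (icd_id, molfile) for structures with no property rows."""
    cursor = conn.execute("""
        SELECT s.icd_id, s.molfile
        FROM structures AS s
        LEFT JOIN properties AS p ON p.icd_id = s.icd_id
        WHERE p.icd_id IS NULL
        ORDER BY s.icd_id
    """)
    return cursor.fetchall()


def import_properties(conn, rows):
    """Insert (icd_id, name, value) rows produced by the calculators."""
    conn.executemany("INSERT INTO properties VALUES (?, ?, ?)", rows)
    conn.commit()


pending = structures_without_properties(conn)

# Placeholder for the external and internal descriptor calculations.
computed = [(icd_id, "example_property", 0.0) for icd_id, _ in pending]
import_properties(conn, computed)

print(pending)
print(structures_without_properties(conn))
```

Selecting only structures that lack properties keeps each run incremental, so a scheduled job does not recalculate the whole database.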
  23. 23. Importing physico-chemical Properties II
  24. 24. Database Maintenance
  25. 25. Isentris PP Components @ SP Intervet
      • Isentris Cheshire PP
      • Converter:
        – Chime string to Molecule
        – Chime string to CTAB
        – Molecule to Chime string
        – CTAB to Chime string
  26. 26. Acknowledgement
      Information Management
      • Werner Schlüter
      • Thomas Fischer
      BioChemInformatics
      • Richard Marhöfer
      • Andreas Krasky
      • (Jörg Cramer)
      • Jörg Schröder
      • Paul M. Selzer
      Thank you
