EUGM 2013 - Michael Dippolito (Deltasoft): Great Migrations! – Approaches to moving your chemistry


Many research teams are facing chemistry data migrations in the coming months due to legacy system retirements and/or the opportunity to move to new informatics platforms. Some approaches, case studies, and experiences will be shared on migrating chemistry data between vendor technologies, with a focus on moving to ChemAxon cartridge technology.

Published in: Technology, Travel

  1. Great Migrations! Approaches to Moving your Chemistry
     Michael Dippolito – 2013 ChemAxon UGM Budapest
  2. DeltaSoft – Migrations ‘R’ Us
     • Cartridges: Accelrys Accord, Accelrys Direct, ChemAxon JChem, PerkinElmer CS Cartridge, OpenEye Cartridge
     • ChemCart: Query, Browse, Update, Report, Analyze
     • ChemCart: Choose your chemistry
  3. Migration Drivers
     • Technology retirement
     • Mergers and Acquisitions
     • Competitive Advantage
     • Cost Savings
  4. Chemistry Oracle Cartridges
     • ChemAxon JChem
     • Accelrys (formerly Symyx, MDL, ISIS) Direct
     • Accelrys Accord
     • PerkinElmer CambridgeSoft Cartridge
     • OpenEye Cartridge
     • Daylight DayCart
     • InfoChem ICCartridge
     • IDBS ChemXtra
     • Scitouch (GGA) Bingo
     • Dotmatics Pinpoint
     • Digital Chemistry Torus
     • Scilligence MolSQL
     • Biochemfusion Proteax
     • EBI OrChem
     • Custom Cartridge
  5. Older technologies
     • ISIS/Host, ISIS/Base
     • ChemFinder
     • Spreadsheets
     • etc…
  6. Chemistry Format ‘Standards’
     • Molfile
       • V2000 – The ‘original’ molfile
       • V3000 – Extended molfile capabilities
     • SDFile – Molfile + Data
     • SMILES
     • InChI
     • many others…
  7. Chemical structure indexing
     An Oracle Cartridge Domain Index provides chemical operators for Oracle.
     • JChem cartridge indexes a number of formats directly
     • Migration needed for some proprietary formats
     • Many options are possible for migration
     • A few approaches and case studies will be shared…
  8. Tools of the trade
     • Cartridge operators: accessing chemistry
     • JChem MolConvert: converting chemical formats
     • Structure Standardizer: clean up and standardize
     • SQL scripts: easily automate migration and testing
     • Cloud servers: great staging, test, and work environment
  9. Migration Approach 1: SDFile or SMILES export / import
     • Fairly simple
     • Best for older technologies or small databases
     • Less practical for large databases
     • Requires
       • Tool or script for import / export
       • Field mapping
  10. Case Study #1: ISIS/Base → ChemAxon JChem
     • Small local ISIS database
     • Export SDFile from ISIS
     • Create Oracle table with SDFile fields
     • Map and import SDFile using ChemCart SDFile import
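The export / import route in Approach 1 comes down to splitting an SDFile into records and mapping its data fields onto target columns. The sketch below illustrates that step with the Python standard library only. It is not the ChemCart importer; the `FIELD_MAP` names, the `parse_sdfile` and `map_record` helpers, and the sample record are all hypothetical, and the parser assumes simple `>  <NAME>` style data headers.

```python
import re

# Hypothetical mapping from SDFile data-field names to target column names.
FIELD_MAP = {"NAME": "compound_name", "MW": "mol_weight"}

def parse_sdfile(text):
    """Split SDFile text into (molfile, fields) records."""
    records, mol_lines, fields = [], [], {}
    in_data, current = False, None
    for line in text.splitlines():
        if line.strip() == "$$$$":            # record delimiter
            records.append(("\n".join(mol_lines), fields))
            mol_lines, fields, in_data, current = [], {}, False, None
        elif not in_data:                      # still inside the molfile block
            mol_lines.append(line)
            if line.startswith("M  END"):
                in_data = True
        elif line.startswith(">"):             # data header, e.g. ">  <MW>"
            match = re.search(r"<([^>]+)>", line)
            current = match.group(1) if match else None
            if current is not None:
                fields[current] = ""
        elif current is not None and line.strip():
            fields[current] = (fields[current] + "\n" + line).strip()
        else:                                  # blank line closes the data item
            current = None
    return records

def map_record(molfile, fields, field_map):
    """Build one target row: the structure plus the mapped data fields."""
    row = {"structure": molfile}
    for source_name, column in field_map.items():
        row[column] = fields.get(source_name)  # None when the field is absent
    return row

# Made-up single-record SDFile used only to exercise the sketch.
SAMPLE = "\n".join([
    "methane", "  sketch", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    ">  <NAME>", "methane", "",
    ">  <MW>", "16.04", "",
    "$$$$",
])

rows = [map_record(mol, data, FIELD_MAP) for mol, data in parse_sdfile(SAMPLE)]
```

Each resulting row can then be bound to an insert statement for the target table. Fields missing from a record come through as `None`, which makes gaps in the mapping easy to spot before loading.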
  11. Migration Approach 2: One Table, Two Cartridge
     • Install JChem cartridge in same instance
     • Add additional field for new structure
     • Use cartridge operators to populate new structure field (molfile or smiles)
       • Ex: update moltable set newstructure = molfile(oldstr);
     • Create JChem domain index on new structure field
     • Least movement of data, fastest for large datasets
  12. Case Study #2: Accord → JChem conversion
     • Large Accord cartridge database
     • Added new field for JChem structure
     • Convert and insert new structure
       • update structure_table set JChem_struct = Accord.convertout(oldstruct, 'MDL Molfile')
     • Remove old structure field
     • > 99% conversion
     • Manual remediation of ‘bad’ structures
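The one-table pattern, together with the "> 99% conversion" and manual remediation steps, can be sketched as follows. This is an illustration only: SQLite stands in for Oracle, and `to_molfile` is a hypothetical placeholder for whatever conversion operator the source cartridge provides, so none of this reflects a real cartridge API.

```python
import sqlite3

def to_molfile(old_structure):
    """Hypothetical stand-in for a cartridge conversion operator.

    Returns None when the input cannot be converted, which SQLite
    stores as NULL.
    """
    if old_structure is None or not old_structure.startswith("OLD:"):
        return None
    return "MOLFILE:" + old_structure[4:]

conn = sqlite3.connect(":memory:")
conn.create_function("to_molfile", 1, to_molfile)

# Existing table with the legacy structure column (made-up sample rows).
conn.execute("CREATE TABLE structure_table (id INTEGER PRIMARY KEY, oldstruct TEXT)")
conn.executemany(
    "INSERT INTO structure_table (id, oldstruct) VALUES (?, ?)",
    [(1, "OLD:benzene"), (2, "OLD:ethanol"), (3, "corrupt")],
)

# Step 1: add a column for the new structure format.
conn.execute("ALTER TABLE structure_table ADD COLUMN new_struct TEXT")

# Step 2: populate it in place using the conversion function.
conn.execute("UPDATE structure_table SET new_struct = to_molfile(oldstruct)")

# Step 3: collect rows that failed to convert for manual remediation.
failed_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM structure_table WHERE new_struct IS NULL ORDER BY id"
    )
]
total = conn.execute("SELECT COUNT(*) FROM structure_table").fetchone()[0]
rate = 100.0 * (total - len(failed_ids)) / total
print(f"{rate:.1f}% converted; ids needing remediation: {failed_ids}")
```

Keeping the old column until the failed rows have been remediated leaves a fallback in place, which is why the sketch stops short of dropping it.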
  13. ‘Convert to JChem’ button
  15. Migration Approach 3: Two Table, Two Cartridge
     • Install JChem cartridge in same instance
     • Add new table with new structure field
     • Use cartridge operators (or JChem molconvert) to populate new structure table and field (molfile or smiles)
       • Ex: insert into moltable2 (newstructure) select molfile(oldstr) from moltable;
     • Create JChem domain index on new structure field
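For the two-table variant, the point worth illustrating is that the source table stays untouched and keys carry over to the new table. A minimal sketch, again with SQLite standing in for Oracle and a hypothetical `molfile` function standing in for the cartridge operator named on the slide:

```python
import sqlite3

def molfile(old_structure):
    """Hypothetical stand-in for the source cartridge's molfile operator."""
    return "MOLFILE:" + old_structure

conn = sqlite3.connect(":memory:")
conn.create_function("molfile", 1, molfile)

# Source table (left as-is) and the new target table.
conn.execute("CREATE TABLE moltable (id INTEGER PRIMARY KEY, oldstr TEXT)")
conn.execute("CREATE TABLE moltable2 (id INTEGER PRIMARY KEY, newstructure TEXT)")
conn.executemany(
    "INSERT INTO moltable (id, oldstr) VALUES (?, ?)",
    [(10, "benzene"), (20, "ethanol")],
)

# Copy keys and converted structures across in a single statement.
conn.execute(
    "INSERT INTO moltable2 (id, newstructure) "
    "SELECT id, molfile(oldstr) FROM moltable"
)

# Quick sanity check: both tables should hold the same number of rows.
source_count = conn.execute("SELECT COUNT(*) FROM moltable").fetchone()[0]
target_count = conn.execute("SELECT COUNT(*) FROM moltable2").fetchone()[0]
```

Carrying the primary key into the new table keeps related data tables joinable during the transition.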
  16. Case Study #3: CambridgeSoft → ChemAxon JChem conversion
     • Registry and ELN conversion
     • Molecules and reactions converted using JChem molconvert
     • Data reformatted to new table structures
     • Some manual remediation needed
  17. Migration Approach 4: Two instance
     • Install JChem cartridge in 2nd instance
     • Convert structure using cartridge tools in 1st instance
     • Export schema from Oracle instance 1
     • Import schema into Oracle instance 2
  18. Case Study #4: Accelrys Direct → ChemAxon JChem
     • Registry database
     • Molfile column added to structure table
     • Populated using Direct cartridge tools
     • Export complete schema
     • Import to 2nd instance
     • Create JChem domain index
  19. Validation & Verification
     Automated checks
     • SQL scripts
     • Search each structure with exact match to ensure it returns
     • Search each structure with substructure search to ensure it is contained in the hits
     • Timings – before and after migration
     Manual checks
     • Searching
     • Registration
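The automated checks above amount to a loop: search for every migrated structure, confirm it finds itself, and record how long the run took so it can be compared before and after migration. A minimal harness might look like the sketch below. The `validate` function and both search callables are hypothetical; in a real migration they would wrap the cartridge's exact and substructure queries, whereas the toy versions here use plain string comparison and are not real chemistry.

```python
import time

def validate(structures, exact_search, substructure_search):
    """Check that each structure finds itself with both search types.

    structures maps id -> structure; each search callable takes a
    structure and returns an iterable of matching ids.
    Returns (failures, elapsed_seconds).
    """
    failures = []
    started = time.perf_counter()
    for sid, structure in structures.items():
        if sid not in set(exact_search(structure)):
            failures.append((sid, "exact"))
        if sid not in set(substructure_search(structure)):
            failures.append((sid, "substructure"))
    elapsed = time.perf_counter() - started
    return failures, elapsed

# Toy data and toy searches (string equality / containment only).
DB = {1: "CCO", 2: "CCOC", 3: "c1ccccc1"}

def toy_exact(query):
    return [sid for sid, s in DB.items() if s == query]

def toy_substructure(query):
    return [sid for sid, s in DB.items() if query in s]

failures, elapsed = validate(DB, toy_exact, toy_substructure)
print(f"{len(failures)} failures in {elapsed:.4f} s")
```

Running the same harness against the old and the new cartridge gives comparable failure lists and timings for the two systems.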
  20. Ideally invisible to the user
     (image labels: Before, After)
  21. Summary
     • Chemistry migrations are likely in your future.
     • Several good approaches are possible.
     • With proper planning you can experience a painless and Great Migration!