Combining disparate cheminformatics resources into a single toolkit
Noel M. O’Boyle and Geoffrey R. Hutchison
Mar 2010
239th ACS National Meeting, San Francisco
Toolkits, toolkits and more toolkits
  • Commercial cheminformatics toolkits:
Toolkits, toolkits and more toolkits
  • Open Source cheminformatics toolkits:
      • CDK
      • OpenBabel
      • OASA
      • PerlMol
The importance of being interoperable
  • Good for users
      • Can take advantage of complementary features
          • CDK: Gasteiger π charges, maximal common substructure, shape similarity with ultrafast shape descriptors, mass-spectrometry analysis
          • RDKit: RECAP fragmentation, calculation of R/S, atom pair fingerprints, shape similarity with volume overlap
          • OpenBabel: several forcefields, crystallography, large number of file formats, conformer searching, InChIKey
The importance of being interoperable
  • Good for users
      • Can take advantage of complementary features
      • Can choose between different implementations
          • Faster SMARTS searching, better 2D depiction, more accurate 3D structure generation
      • Avoid vendor lock-in
  • Good for developers
      • Less reinvention of wheel, more time to spend on development of complementary features
      • Avoid balkanisation of field
      • Bigger pool of users
J. Chem. Inf. Model., 2006, 46, 991
http://www.blueobelisk.org
Bringing it all together with Cinfony
  • Different languages
      • Java (CDK), C++ (OpenBabel, RDKit)
      • Use Python, a higher-level language that can bridge to both
  • Different APIs
      • Each toolkit uses different commands to carry out the same tasks
      • Implement a common API
  • Different chemical models
      • Different internal representation of a molecule
      • Use existing method for storage and transfer of chemical information: chemical file formats
      • MDL mol file for 2D and 3D, SMILES for 0D
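The third design point, passing molecules between toolkits as strings in a standard file format, can be sketched without any real toolkit. Everything below is hypothetical: `ToyMolA` and `ToyMolB` stand in for two toolkits with incompatible internal models, and the only contract they share is a `write("smi")` method plus a constructor that accepts any object offering one. This is a minimal illustration of the idea, not Cinfony's actual implementation.

```python
class ToyMolA:
    """Hypothetical toolkit A: keeps the molecule as a SMILES string."""

    def __init__(self, source):
        # Accept a SMILES string or any object that can write itself as SMILES.
        self._smiles = source if isinstance(source, str) else source.write("smi")

    def write(self, fmt):
        if fmt != "smi":
            raise ValueError("this sketch only supports the 'smi' format")
        return self._smiles


class ToyMolB:
    """Hypothetical toolkit B: keeps a list of characters instead."""

    def __init__(self, source):
        smiles = source if isinstance(source, str) else source.write("smi")
        self._tokens = list(smiles)

    def write(self, fmt):
        if fmt != "smi":
            raise ValueError("this sketch only supports the 'smi' format")
        return "".join(self._tokens)


a = ToyMolA("CCC")
b = ToyMolB(a)           # conversion goes through the SMILES string
round_trip = ToyMolA(b)  # and back again
print(round_trip.write("smi"))
```

Neither class needs to know anything about the other's internals; the file format is the whole interface, which is why a 0D format such as SMILES or a 2D/3D format such as the MDL mol file is enough to move a molecule between toolkits.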
Cinfony API
One API to rule them all
  • Example - create a Molecule from a SMILES string:
      RDKit:
        mol = Chem.MolFromSmiles(SMILESstring)
      OpenBabel:
        mol = openbabel.OBMol()
        obconversion = openbabel.OBConversion()
        obconversion.SetInFormat("smi")
        obconversion.ReadString(mol, SMILESstring)
      CDK:
        builder = cdk.DefaultChemObjectBuilder.getInstance()
        sp = cdk.smiles.SmilesParser(builder)
        mol = sp.parseSmiles(SMILESstring)
      Cinfony:
        mol = toolkit.readstring("smi", SMILESstring)
        where toolkit is either obabel, cdk or rdk
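One way to picture how a single `readstring(format, string)` call can sit on top of very different native APIs is a thin wrapper per backend. The sketch below uses two invented backends (`_OneCallBackend` and `_ConverterBackend`, loosely shaped like the one-call and converter-object styles in the slide); none of these names belong to RDKit, OpenBabel, CDK or Cinfony.

```python
class _OneCallBackend:
    """Hypothetical native API that parses SMILES in a single call."""

    @staticmethod
    def mol_from_smiles(smiles):
        return {"smiles": smiles}


class _ConverterBackend:
    """Hypothetical native API that needs a configured converter object."""

    def __init__(self):
        self.in_format = None

    def set_in_format(self, fmt):
        self.in_format = fmt

    def read_string(self, mol, text):
        if self.in_format != "smi":
            raise ValueError("unsupported format: %r" % self.in_format)
        mol["smiles"] = text


class OneCallToolkit:
    """Common API over the single-call backend."""

    @staticmethod
    def readstring(fmt, text):
        if fmt != "smi":
            raise ValueError("unsupported format: %r" % fmt)
        return _OneCallBackend.mol_from_smiles(text)


class ConverterToolkit:
    """Common API over the converter-style backend."""

    @staticmethod
    def readstring(fmt, text):
        mol = {}
        converter = _ConverterBackend()
        converter.set_in_format(fmt)
        converter.read_string(mol, text)
        return mol


# Caller code is identical whichever toolkit is chosen.
results = [tk.readstring("smi", "CCC") for tk in (OneCallToolkit, ConverterToolkit)]
print(results)
```

The calling code never changes when the backend does, which is the property the slide's `toolkit.readstring("smi", SMILESstring)` line relies on.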
Design of Cinfony API
  • API is small (“fits your brain”)
      • Covers core functionality of toolkits
      • Corollary: need to access underlying toolkit for additional functionality
  • Makes it easy to carry out common tasks
  • API is stable
  • Make it easy to find relevant methods
  • Example: add hydrogens to a molecule
      CDK:
        atommanip = cdk.tools.manipulator.AtomContainerManipulator
        atommanip.convertImplicitToExplicitHydrogens(molecule)
      Cinfony:
        molecule.addh()
cinfony.toolkit
cinfony.toolkit.Molecule
Examples of use
  • Chemistry Toolkit Rosetta
  • http://ctr.wikia.com
  • Andrew Dalke
Combining toolkits
    >>> from cinfony import rdk, cdk, obabel
    >>> obabelmol = obabel.readstring("smi", "CCC")
    >>> rdkmol = rdk.Molecule(obabelmol)
    >>> rdkmol.draw(show=False, filename="propane.png")
    >>> print(cdk.Molecule(rdkmol).calcdesc())
    {'chi0C': 2.7071067811865475, 'BCUT.4': 4.4795252101839402, 'rotatableBondsCount': 2, 'mde.9': 0.0, 'mde.8': 0.0, ... }
  • Import Cinfony
  • Read in a molecule from a SMILES string with OpenBabel
  • Convert it to an RDKit Molecule
  • Create a 2D depiction of the molecule with RDKit
  • Convert it to a CDK Molecule and calculate descriptor values
Comparing toolkits
    >>> from cinfony import rdk, cdk, obabel
    >>> for toolkit in [rdk, cdk, obabel]:
    ...     mol = toolkit.readstring("smi", "CCC")
    ...     print(mol.molwt)
    ...     mol.draw(filename="%s.png" % toolkit.__name__)
  • Import Cinfony
  • For each toolkit...
      • ... Read in a molecule from a SMILES string
      • ... Print its molecular weight
      • ... Create a 2D depiction
  • Useful for sanity checks, identifying limitations, bugs
      • Calculating the molecular weight (http://tinyurl.com/chemacs3)
          • implicit hydrogen, isotopes
      • Comparison of descriptor values (http://tinyurl.com/chemacs2)
          • Should be highly correlated
      • Comparison of depictions (http://tinyurl.com/chemacs1)
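The molecular-weight comparison is sensitive to implicit hydrogens: the SMILES string "CCC" names only three carbons, yet propane is C3H8. The arithmetic can be worked through with rounded standard atomic weights (C 12.011, H 1.008). The function `alkane_molwt` is made up for this illustration and is not part of Cinfony, and its hydrogen-counting rule is deliberately simplistic: it is only valid for unbranched alkanes written as a plain run of `C` characters.

```python
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008}


def alkane_molwt(smiles, include_implicit_h=True):
    """Molecular weight of an unbranched alkane written as 'C', 'CC', 'CCC', ..."""
    if not smiles or set(smiles) != {"C"}:
        raise ValueError("this sketch only handles strings such as 'CCC'")
    n_carbon = len(smiles)
    # An unbranched alkane with n carbons carries 2n + 2 hydrogens,
    # none of which appear explicitly in the SMILES string.
    n_hydrogen = 2 * n_carbon + 2 if include_implicit_h else 0
    return n_carbon * ATOMIC_WEIGHT["C"] + n_hydrogen * ATOMIC_WEIGHT["H"]


with_h = alkane_molwt("CCC")
without_h = alkane_molwt("CCC", include_implicit_h=False)
print(round(with_h, 3), round(without_h, 3))
```

For propane the two results are 44.097 and 36.033, so a toolkit that overlooked implicit hydrogens would be off by about 8 mass units, which is the kind of discrepancy a side-by-side comparison makes obvious.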
Cinfony and the Web
Webel - Chemistry for Web 2.0
  • Webel is a new Cinfony module that runs entirely using web services
      • CDK webservices by Rajarshi Guha, hosted at Uppsala University
      • NCI/CADD Chemical Identifier Resolver by Markus Sitzmann (uses Cactvs for much of backend) - see CINF147 at 2:20pm in Room 212
  • Easy to install – no dependencies
  • Can be used in environments where installing a cheminformatics toolkit is not possible
  • Web services may provide additional services not available elsewhere
  • Example: how similar is aspirin to Dr. Scholl’s Wart Remover Kit?
    >>> from cinfony import webel
    >>> aspirin = webel.readstring("name", "aspirin")
    >>> wartremover = webel.readstring("name",
    ...                     "Dr. Scholl's Wart Remover Kit")
    >>> print(aspirin.calcfp() | wartremover.calcfp())
    0.59375
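The `|` between the two `calcfp()` results yields a similarity score rather than a bitwise OR. In Pybel the same operator on fingerprints returns the Tanimoto coefficient, and the sketch below assumes Webel follows that convention. `ToyFingerprint` and its bit positions are invented; they were chosen only so that the ratio works out to 19/32 = 0.59375 and say nothing about the real fingerprints of either substance.

```python
class ToyFingerprint:
    """Hypothetical fingerprint: the set of bit positions that are switched on."""

    def __init__(self, bits):
        self.bits = frozenset(bits)

    def __or__(self, other):
        # Tanimoto coefficient: bits set in both / bits set in either.
        union = self.bits | other.bits
        if not union:
            return 0.0  # convention chosen for this sketch when no bits are set
        return len(self.bits & other.bits) / len(union)


fp_a = ToyFingerprint(range(0, 25))  # bits 0..24
fp_b = ToyFingerprint(range(6, 32))  # bits 6..31
similarity = fp_a | fp_b             # 19 shared bits out of 32 set in either
print(similarity)
```

A score of 1.0 would mean identical bit patterns and 0.0 no bits in common, so 0.59375 indicates a moderate overlap.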
Cheminformatics in the browser
  • See http://baoilleach.webfactional.com/site_media/webel/ or just Google “webel silverlight”
Cinfony makes it easy to...
  • Start using a new toolkit
  • Carry out common tasks
  • Combine functionality from different toolkits
  • Compare results from different toolkits
  • Do cheminformatics through the web, and on the web
Combining disparate cheminformatics resources into a single toolkit
  • Chem. Cent. J., 2008, 2, 24.
  • http://cinfony.googlecode.com
  • http://baoilleach.blogspot.com
Acknowledgements
  • CDK: Egon Willighagen, Rajarshi Guha
  • OpenBabel: Chris Morley, Tim Vandermeersch
  • RDKit: Greg Landrum
  • OASA: Beda Kosata
  • JPype: Steve Ménard
  • Chemical Identifier Resolver: Markus Sitzmann
  • Interactive Tutorial: Michael Foord
  • Image: Tintin44 (Flickr)
Cheminformatics in the browser
  • As Webel is pure Python, it can run in places where traditional cheminformatics software cannot...
      • ...such as in a web browser
  • Microsoft have developed a browser plugin called Silverlight for developing applications for the web
      • It includes a Python interpreter (IronPython)
      • So you can use Webel in Silverlight applications
  • Michael Foord has developed an interactive Python tutorial using Silverlight
      • See http://ironpython.net/tutorial/
  • I have combined this with Webel to develop an interactive Cheminformatics tutorial
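Because a web-service client needs nothing beyond string handling and an HTTP request, it ports easily to any Python interpreter. As a rough sketch of what a name lookup involves, the helper below builds a request URL for the NCI/CADD Chemical Identifier Resolver, assuming its public `/chemical/structure/<identifier>/<representation>` URL scheme. The function name `resolver_url` is invented, no request is actually sent, and the slides do not say whether Webel constructs its requests this way.

```python
from urllib.parse import quote

# Assumed base URL of the resolver's public interface.
RESOLVER_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def resolver_url(identifier, representation="smiles"):
    """Build a resolver URL for a chemical name or other identifier."""
    # Percent-encode everything, including spaces and '/', so that the
    # identifier stays inside a single path segment.
    return "%s/%s/%s" % (RESOLVER_BASE, quote(identifier, safe=""), representation)


url = resolver_url("aspirin")
print(url)
```

Fetching that URL with any HTTP client would then return the requested representation as plain text, which is all a pure-Python module needs in order to turn a name into a structure.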
Performance


Editor's Notes

  • #13 (Design of Cinfony API): OpenBabel has around 112 classes and 227 global functions; CDK has 882 classes.