Cheminformatics Software Development: Case Studies


Published on

Guest lecture given via webcast to Indiana University School of Informatics and Computing class I-571, Intro to Cheminformatics, Fall 2011.

1 Like
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Cheminformatics Software Development: Case Studies

  1. 1. CheminformaticsSoftware Development: Case Studies Direct Observations and Informed Opinions Jeremy J Yang PhD Student, IU Cheminformatics Mgr, Systems & Programming, UNM Biocomputing Indiana University School of Informatics and Computing - I571, Intro to Cheminformatics - Fall 2011
  2. 2. My experience1)  Daylight (1989 - 2002): Support Coordinator, SoftwareEngineer –  support, user education, software engineering, meetings, application science, web apps, QA, databases, methodology research, etc.2)  OpenEye (2002 - 2007): Director/VP of Support, Sr.Software Engineer –  support, management, software engineering, QA, web apps, methodology research, etc.3)  UNM (2002 - ): Mgr., Systems & Programming –  software engineering, management, support, computational methodolgy, bioassay screening informatics, biomedical informatics research, etc.
  3. 3. Concept of this talk...Describe direct observations from experiences incheminformatics over last 22 years, relevant todayto understand and navigate complex landscape ofcheminformatics software roles and choices.Avoiding excessive idle reminiscing, include someof the colorful personalities and curious events.Suggest some lessons learned and trendsobserved, in the opinion of the author.
  4. 4. OutlineCase studies included:1)  Daylight2)  OpenEye3)  Symyx a.k.a. MDL Some interesting others also mentioned4)  Accelrys briefly.5)  OpenBabel(Chosen mostly based on my familiarity.)
  5. 5. Perspectives on scientific software•  Developer, programmer•  Computational/informatics scientist, scholar•  Support, educator, maintainer•  Software as scientific publishing•  Open source collaboration•  Consumer: toolkit user (programmer)•  Consumer: app user, scientist•  Licensing, intellectual property, legal•  Business (commercial or non-commercial)
  6. 6. Case Study #1Daylight Chemical Information Systems, Inc.
  7. 7. Daylight Chemical Information Systems, Inc.l Founded 1987 by Dave Weininger, Art Weininger, Yosi Taitz, in Claremont, CAl Ancestry: Pomona College MedChem program (Corwin Hansch, Al Leo)l Innovations: SMILES, SMARTS, SMIRKS, fingerprints, rigorous syntax/grammar/semanticsl Products: ClogP, Thor, Merlin, C toolkits, Oracle Chemistry Cartridgel Fortran → C ~1990, oop-ish API (Scofields).l DEC-VAX/VMS → Unix → Linux, Windows
  8. 8. Daylight “MUG” meetings known for scientific impact Daylight MUG ‘98, Santa Fe, NM
  9. 9. Daylight Chemical Information Systems, Inc.l Success via appeal to examplars who became advocates at various pharmas.l Published, supported APIs and open transparent system appealed to “hackers, nurds and geeks”.l From research computing to enterprise (IT).l Many other software developers learned and borrowed from Daylight.l Focused on software, to enable science.l Max ~12 employees, $5M annual revenue.
  10. 10. Some idle reminiscing... Daylight Krewe @ Zulu Ball, New Orleans 1992 Dave Weininger ArtCraig James Weininger Jeremy Yang
  11. 11. Daylight Chemical Information Systems, Inc.l Scientific mission: To effectively store, retrieve, process all existing chemical information.l This mission often guided what to do, and also what to not do.l  Often that conflicted with profit motives.l  Avoided: consulting, macromolecules, biology, Windows, investors, sales, marketingIf I have seen further it is only by standing on the shoulders ofgiants. - Issac NewtonI cant see far because there are giants on my shoulders! - DaveWeininger
  12. 12. Daylight Thor, Merlin database apps ~ 1995
  13. 13. Daylight toolkit API (C, Perl)
  14. 14. Example research project using Daylight tools
  15. 15. Example research project using Daylight tools Slide 15
  16. 16. Case Study #2OpenEye Scientific Software, Inc.
  17. 17. OpenEye Scientific Software, Inc.l Founded 1997 by Anthony Nicholls in Santa Fe, NM.l Ancestry: Honig lab, biophysics, Columbia.l Innovations: molecular shape and continuum dielectric Poisson-Boltzmann electrostatics via atom centered Gaussians, 3D speed, rigor, accuracy and validation, comprehensive APIsl Products: Rocs, Fred, OEChem, Vidal APIs: C++, Python, Java, many OSs
  18. 18. OpenEye apps Rocs shape overlay Fred docking result Szmap hydrophillic regionsBrood bioisosteric fragments
  19. 19. OpenEye toolkit API
  20. 20. OpenEye Scientific Software, Inc.l Focused on software and science, sometimes a difficult balance.l High-throughput 3D virtual screening (esp. Rocs) has become standard practice (enterprise-like).l Spectrum of functionality between 2D cheminformatics and 3D high performance computing.l Max 34 employees (current).
  21. 21. OpenEye Example Code (2D & 3D)$ [options] [<infile] [>outfile] --i=<INFILE> --outlig=<OUTLIGFILE> --outpro=<OUTPROFILE> ... output protein or other macromol --multiligfiles ... one output file per ligand --minatoms=<N> ... minimum atomcount cutoff for ligand [7] --maxatoms=<N> ... maximum atomcount cutoff for ligand [100] --metal ... disconnected metal ions stay with protein --f ... force processing of non-PDB file --v ... verbose --vv ... very verbose --h ... help
  22. 22. ROCS web app (Python, source code available)
  23. 23. ROCS web app (Python, source code available)
  24. 24. FRED web app (Python, source code available)
  25. 25. FRED web app (Python, source code available)
  26. 26. Example research project using OpenEye tools
  27. 27. Example research project using OpenEye tools
  28. 28. Case Study #3Symyx a.k.a. MDL 2010,
  29. 29. Symyx a.k.a. MDLl  Founded 1978 by Stuart Marson and Todd Wipke (ancestry: Harvard, Princeton, Stanford).l  “Molecular Design Ltd”, reflected initial ab initio design goals, renamed “MDL Information Systems” (93)l  Products: REACCS (82), MACCS (84), MACCS-3D (88), ISIS (91), Isentris (04), chem +bio database systemsl  Invented molfile/SD format (proprietary till 1991), and extensions (query, R-group).l  Based Chime on Rasmol source code.l  Purchased by Reed Elsevier (97), then Symyx (07), then Accelrys (10).
  30. 30. Symyx a.k.a. MDLl Max ~300 employees. (My guess.)l By 1990, all pharma research companies used MDL software for their compound database.l Then the MBAs, lawyers and admen took over!l And innovation ceased. Slide 30
  31. 31. Case Study #4 Accelrys
  32. 32. Accelrys A tale of mergers and acquisitions (~1990 - 2011): –  MSI (Molecular Simulations) • Biodesign • Cambridge Molecular Design • Polygen • Biocad • Biosym Technologies –  Synopsys Scientific Systems –  Oxford Molecular –  Genetics Computer Group –  Synomyx –  SciTegic –  Symyx –  Contur Software AB
  33. 33. Accelrys SELECT hts_ap_archive.runset_number as "RUN", hts_ap_archive.ap_alias as "APlateName", hts_plate.plate_id, hts_plate.alternate_id as "IPlateName", example hts_well.well_no, hts_sample.alternate_id, of hts_result_detail.value_char AS Target, hts_result_type.type_desc AS Result_Type, hts_assay_result.concentration || hts_conc_unit.unit_value AS CONC, AEI hts_assay_result.dilution, hts_assay_result.result_value AS Value SQL: FROM hts_well, hts_plate, hts_sample, hts_conc_unit, ddi_container_master,corporate hts_ap_archive, hts_assay_result,merging hts_result_type, hts_result_detail WHERE hts_ap_archive.ap_alias=213_20110712_135454-1reflected in AND hts_assay_result.sample_id=hts_well.sample_id AND hts_assay_result.sample_id=hts_sample.sample_idschema, AND hts_well.plate_id=hts_plate.plate_id AND hts_assay_result.plate_id=hts_ap_archive.ap_numbertechnology AND hts_plate.alternate_id=ddi_container_master.container_name AND hts_well.plate_id=hts_plate.plate_id AND hts_well.sample_id=hts_sample.sample_id AND hts_assay_result.sample_plate=ddi_container_master.container_id AND hts_result_type.result_type=hts_assay_result.result_type AND hts_assay_result.result_id=hts_result_detail.result_id AND hts_assay_result.conc_unit=hts_conc_unit.unit_id ORDER BY hts_well.well_no,Target,Result_Type
  34. 34. Accelrys Merging reflected in technologyl  Chemical cartridges: AEI (v6 & v7), Symyx,new SciTegic cartridgel  Cheminformatics? E.g., canonicalize asmiles w/ Accelrys, Scitegic or MDL code.l  The Accord Enterprise Informatics (AEI6.2) UNM purchased in 2009 is now a“legacy product”.l  Re-branding and re-packagingl  May be great technically, but challengingfor customers and Accelrys alike.
  35. 35. Accelryskey product:Pipeline Pilot (Scitegicfounded 1999, acquired byAccelrys 2004 for $21.5M.)
  36. 36. Opinionated Observation :There is a sometimes subtle differencebetween (1) a software product with a supportcontract, and (2) a service contract whichinvolves customizing and installing andconfiguring a software product, requiring orlikely to require ongoing, additional servicecontracts. Accelrys and Symyx (“solutionsproviders”) have generated much of theirrevenue using the latter. In contrast: toolsproviders.
  37. 37. Accelrys (My opinions -- feel free to disagree.)1)  M&As reflected US & global businesstrends, but perpetual re-org is stressful andthere are huge costs to employees, customers,and technical progress.2)  No coherent scientific mission, onlybusiness growth.3)  Lots of excellent software, science andpeople. But all challenged by the chaos. "If in your science you only look for business, then you risk finding neither knowledge nor business." — Haldor Topsøe
  38. 38. Case Study #5: OpenBabel
  39. 39. OpenBabell Open-source C++ community project based on OELib by Matt Stahl, OpenEye.l  Founded 2001, by Geoff Hutchinson et al. Now ~100 credited contributors.l  C++ w/ wrappers via SWIG: Python, Perl, Java, Ruby, C#.l  2011 paper in Journal of Cheminformatics*l >160,000 downloads, >400 citations, used by >40 projects*”Open Babel: An open chemical toolbox”, Noel M. OBoyle, Michael Banck, Craig A. James, Chris Morley, TimVandermeersch and Geoffrey R. Hutchison, Journal of Cheminformatics 2011, 3:33doi:10.1186/1758-2946-3-33.
  40. 40. $ obscreenOpenBabel obscreen - screen molecules based on calculated properties OpenBabel v2.2.3 (Dec 4 2010)Example syntax: obscreen [options] options:Code -i <infile> -o <outfile> -otable <output table> -badsmarts <badfile> ... bad smarts file [default:builti -minmwt <MINWT> ... minimum molweight [200] -maxmwt <MAXWT> ... maximum molweight [600] -minhbd <MINHBD> ... min Hbond donors [0] -maxhbd <MAXHBD> ... max Hbond donors [5] -minhba <MINHBA> ... min Hbond acceptors [0] -maxhba <MAXHBA> ... max Hbond acceptors [10] -minrot <MINROT> ... min rotors [0] -maxrot <MAXROT> ... max rotors [12] -minchiral <MINCHIRAL> ... min chiral atoms [0] -maxchiral <MAXCHIRAL> ... max chiral atoms [5] -minlogp <MINLOGP> ... min logP [-5.0] -maxlogp <MAXLOGP> ... max logP [5.0] -hbasmarts <HBASMARTS> ... default=[#6,#7;R0]=[#8] -hbdsmarts <HBDSMARTS> ... default=[!H0;#7,#8,#9] -n_inmax <N> ... input limit -n_outmax <N> ... output limit -v ... verbose -vv ... very verbose
  41. 41. OpenBabel: active discussion listsindicators of community health
  42. 42. OpenBabel Used by eMolecules, quite successfullyCraig James, CTOformerly of Daylight,Accelrys
  43. 43. OpenBabel (My opinions -- feel free to disagree.)l OB is very good but not yet close to Daylight, OEChem or JChem in comprehensiveness, quality, and other measures.l  SMARTS accuracy not state of the art.l  OB continually improving thanks to capable and active community of developers and users.l  The value reflects the total quality developer years invested. So there is no reason OB cannot catch up, if the community continues to grow.
  44. 44. Some interesting and successful others:1)  ChemAxon - Budapest, Marvin, Java, JChem2)  Tripos (now Certara) - St. Louis, Sybyl, SGI3)  Chem Comp Group - Montreal, MOE, SVL4)  PyMol - was Delano, now Schrödinger (OSS)5)  Schrodinger - quantum, etc.6)  UCSF School of Pharmacy - DOCK, Kuntz & al.7)  NIH NCTT - OSS, Java, Tripod, bioassay analysis8)  Scitouch - Russia, Indigo, Dingo, Bingo9)  CDK - FOSS, Java10)  RDKit - FOSS, incl. machine learning11)  Silicos NV - OSS & commercial12)  Bioreason, VC-funded, bought by Simulations Plus13)  Mesa Analytics & Computing14)  NextMove Software15)  UNM Division of Biocomputing16)  IU SOIC CCRG
  45. 45. Some General Conclusionsl  Cheminformatics software landscape is very complexl  Economics & Freakonomics are drivers but alsopersonalitiesl  FOSS can compete with $$$ software.l  Landscape includes lots of hidden costs.l  “Software engineering” has been an ideal, far fromreality overall.l  Software can enable or hinder sciencel  Diverse business models, diverse everythingl  Plenty of technological changel  Some human change too
  46. 46. The EndFeel free to contact me directly with questions orideas!Jeremy J