Cheminformatics Software Development: Case Studies
1. Cheminformatics
Software Development:
Case Studies
Direct Observations and Informed Opinions
Jeremy J Yang
PhD Student, IU Cheminformatics
Mgr, Systems & Programming, UNM Biocomputing
Indiana University School of Informatics and Computing - I571, Intro to Cheminformatics - Fall 2011
2. My experience
1) Daylight (1989 - 2002): Support Coordinator, Software
Engineer
– support, user education, software engineering,
meetings, application science, web apps, QA,
databases, methodology research, etc.
2) OpenEye (2002 - 2007): Director/VP of Support, Sr.
Software Engineer
– support, management, software engineering, QA, web
apps, methodology research, etc.
3) UNM (2002 - ): Mgr., Systems & Programming
– software engineering, management, support,
computational methodology, bioassay screening
informatics, biomedical informatics research, etc.
3. Concept of this talk...
Describe direct observations from experiences in
cheminformatics over last 22 years, relevant today
to understand and navigate complex landscape of
cheminformatics software roles and choices.
Avoiding excessive idle reminiscing, include some
of the colorful personalities and curious events.
Suggest some lessons learned and trends
observed, in the opinion of the author.
4. Outline
Case studies included:
1) Daylight
2) OpenEye
3) Symyx a.k.a. MDL
4) Accelrys
5) OpenBabel
Some interesting others also mentioned briefly.
(Chosen mostly based on my familiarity.)
5. Perspectives on scientific software
• Developer, programmer
• Computational/informatics scientist, scholar
• Support, educator, maintainer
• Software as scientific publishing
• Open source collaboration
• Consumer: toolkit user (programmer)
• Consumer: app user, scientist
• Licensing, intellectual property, legal
• Business (commercial or non-commercial)
7. Daylight Chemical Information Systems, Inc.
• Founded 1987 by Dave Weininger, Art Weininger, Yosi Taitz, in Claremont, CA
• Ancestry: Pomona College MedChem program (Corwin Hansch, Al Leo)
• Innovations: SMILES, SMARTS, SMIRKS, fingerprints, rigorous syntax/grammar/semantics
• Products: ClogP, Thor, Merlin, C toolkits, Oracle Chemistry Cartridge
• Fortran → C ~1990, oop-ish API (Scofields).
• DEC-VAX/VMS → Unix → Linux, Windows
9. Daylight Chemical Information Systems, Inc.
• Success via appeal to exemplars who became advocates at various pharmas.
• Published, supported APIs and open transparent system appealed to “hackers, nurds and geeks”.
• From research computing to enterprise (IT).
• Many other software developers learned and borrowed from Daylight.
• Focused on software, to enable science.
• Max ~12 employees, $5M annual revenue.
10. Some idle reminiscing...
Daylight Krewe @ Zulu Ball, New Orleans 1992
Pictured: Dave Weininger, Art Weininger, Craig James, Jeremy Yang
11. Daylight Chemical Information Systems, Inc.
• Scientific mission: To effectively store, retrieve, process all existing chemical information.
• This mission often guided what to do, and also what to not do.
• Often that conflicted with profit motives.
• Avoided: consulting, macromolecules, biology, Windows, investors, sales, marketing
If I have seen further it is only by standing on the shoulders of giants. - Isaac Newton
I can't see far because there are giants on my shoulders! - Dave Weininger
17. OpenEye Scientific Software, Inc.
• Founded 1997 by Anthony Nicholls in Santa Fe, NM.
• Ancestry: Honig lab, biophysics, Columbia.
• Innovations: molecular shape and continuum dielectric Poisson-Boltzmann electrostatics via atom centered Gaussians, 3D speed, rigor, accuracy and validation, comprehensive APIs
• Products: Rocs, Fred, OEChem, Vida
• APIs: C++, Python, Java, many OS's
18. OpenEye apps
• Rocs: shape overlay
• Fred: docking result
• Szmap: hydrophilic regions
• Brood: bioisosteric fragments
20. OpenEye Scientific Software, Inc.
• Focused on software and science, sometimes a difficult balance.
• High-throughput 3D virtual screening (esp. Rocs) has become standard practice (enterprise-like).
• Spectrum of functionality between 2D cheminformatics and 3D high performance computing.
• Max 34 employees (current).
21. OpenEye Example Code (2D & 3D)
$ pdb2lig.py
Usage:
pdb2lig.py [options] [<infile] [>outfile]
--i=<INFILE>
--outlig=<OUTLIGFILE>
--outpro=<OUTPROFILE> ... output protein or other macromol
--multiligfiles ... one output file per ligand
--minatoms=<N> ... minimum atomcount cutoff for ligand [7]
--maxatoms=<N> ... maximum atomcount cutoff for ligand [100]
--metal ... disconnected metal ions stay with protein
--f ... force processing of non-PDB file
--v ... verbose
--vv ... very verbose
--h ... help
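The usage message above can be made more concrete with a small sketch. This is not the pdb2lig.py source, which is not shown in these slides and presumably uses OEChem; it is a plain-Python stand-in that assumes ligands can be picked out as HETATM residue groups whose atom count falls inside the [minatoms, maxatoms] window. The function names build_parser and split_pdb are hypothetical, and sending out-of-range groups (waters, ions) back to the protein is also an assumption.

```python
import argparse
from collections import defaultdict


def build_parser():
    """Mirror a few option names from the usage message (hypothetical re-creation)."""
    p = argparse.ArgumentParser(prog="pdb2lig_sketch.py")
    p.add_argument("--i", dest="infile", help="input PDB file")
    p.add_argument("--outlig", help="output ligand file")
    p.add_argument("--outpro", help="output protein or other macromol")
    p.add_argument("--minatoms", type=int, default=7,
                   help="minimum atomcount cutoff for ligand")
    p.add_argument("--maxatoms", type=int, default=100,
                   help="maximum atomcount cutoff for ligand")
    return p


def split_pdb(lines, minatoms=7, maxatoms=100):
    """Split PDB records into (protein_lines, ligands).

    ligands maps (resName, chainID, resSeq) -> list of HETATM lines.
    Assumption: HETATM groups outside the size window stay with the protein.
    """
    protein = []
    hetero = defaultdict(list)
    for line in lines:
        record = line[:6].strip()
        if record == "ATOM":
            protein.append(line)
        elif record == "HETATM":
            # PDB fixed columns (1-based): resName 18-20, chainID 22, resSeq 23-26
            key = (line[17:20].strip(), line[21], line[22:26].strip())
            hetero[key].append(line)
    ligands = {}
    for key, atoms in hetero.items():
        if minatoms <= len(atoms) <= maxatoms:
            ligands[key] = atoms
        else:
            protein.extend(atoms)
    return protein, ligands
```

With the default cutoffs, a lone water oxygen (1 atom) is below the minimum of 7 and is not reported as a ligand, which is the practical purpose of the minatoms option.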
28. Case Study #3
Symyx a.k.a. MDL
http://www.symyx.com
(since 2010, http://www.accelrys.com)
29. Symyx a.k.a. MDL
• Founded 1978 by Stuart Marson and Todd Wipke (ancestry: Harvard, Princeton, Stanford).
• “Molecular Design Ltd”, reflected initial ab initio design goals, renamed “MDL Information Systems” ('93)
• Products: REACCS ('82), MACCS ('84), MACCS-3D ('88), ISIS ('91), Isentris ('04), chem+bio database systems
• Invented molfile/SD format (proprietary till 1991), and extensions (query, R-group).
• Based Chime on Rasmol source code.
• Purchased by Reed Elsevier ('97), then Symyx ('07), then Accelrys ('10).
30. Symyx a.k.a. MDL
• Max ~300 employees. (My guess.)
• By 1990, all pharma research companies used MDL software for their compound database.
• Then the MBAs, lawyers and admen took over!
• And innovation ceased.
32. Accelrys
A tale of mergers and acquisitions (~1990 - 2011):
– MSI (Molecular Simulations)
• Biodesign
• Cambridge Molecular Design
• Polygen
• Biocad
• Biosym Technologies
– Synopsys Scientific Systems
– Oxford Molecular
– Genetics Computer Group
– Synomics
– SciTegic
– Symyx
– Contur Software AB
33. Accelrys
Example of AEI SQL: corporate merging reflected in schema, technology

SELECT
  hts_ap_archive.runset_number AS "RUN",
  hts_ap_archive.ap_alias AS "APlateName",
  hts_plate.plate_id,
  hts_plate.alternate_id AS "IPlateName",
  hts_well.well_no,
  hts_sample.alternate_id,
  hts_result_detail.value_char AS Target,
  hts_result_type.type_desc AS Result_Type,
  hts_assay_result.concentration || hts_conc_unit.unit_value AS CONC,
  hts_assay_result.dilution,
  hts_assay_result.result_value AS Value
FROM hts_well,
  hts_plate,
  hts_sample,
  hts_conc_unit,
  ddi_container_master,
  hts_ap_archive,
  hts_assay_result,
  hts_result_type,
  hts_result_detail
WHERE hts_ap_archive.ap_alias='213_20110712_135454-1'
  AND hts_assay_result.sample_id=hts_well.sample_id
  AND hts_assay_result.sample_id=hts_sample.sample_id
  AND hts_well.plate_id=hts_plate.plate_id
  AND hts_assay_result.plate_id=hts_ap_archive.ap_number
  AND hts_plate.alternate_id=ddi_container_master.container_name
  AND hts_well.plate_id=hts_plate.plate_id
  AND hts_well.sample_id=hts_sample.sample_id
  AND hts_assay_result.sample_plate=ddi_container_master.container_id
  AND hts_result_type.result_type=hts_assay_result.result_type
  AND hts_assay_result.result_id=hts_result_detail.result_id
  AND hts_assay_result.conc_unit=hts_conc_unit.unit_id
ORDER BY hts_well.well_no,Target,Result_Type
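One way to make the table relationships in a query like this easier to read is to move the join conditions out of WHERE and into explicit JOIN ... ON clauses. The sketch below uses Python's sqlite3 module with an in-memory database to check that the two styles return the same rows. It covers only three of the nine tables, and everything beyond the table and column names taken from the query above is an assumption: the column types are guessed and the rows are toy values invented for illustration, not AEI data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy schema and rows: types and values are assumptions for illustration only.
conn.executescript("""
CREATE TABLE hts_plate  (plate_id INTEGER, alternate_id TEXT);
CREATE TABLE hts_sample (sample_id INTEGER, alternate_id TEXT);
CREATE TABLE hts_well   (well_no INTEGER, plate_id INTEGER, sample_id INTEGER);
INSERT INTO hts_plate  VALUES (1, 'TOY_PLATE');
INSERT INTO hts_sample VALUES (10, 'TOY_SAMPLE_A');
INSERT INTO hts_sample VALUES (11, 'TOY_SAMPLE_B');
INSERT INTO hts_well   VALUES (2, 1, 11);
INSERT INTO hts_well   VALUES (1, 1, 10);
""")

# Style used in the slide: comma-separated FROM, join conditions in WHERE.
IMPLICIT = """
SELECT hts_well.well_no, hts_plate.alternate_id, hts_sample.alternate_id
FROM hts_well, hts_plate, hts_sample
WHERE hts_well.plate_id = hts_plate.plate_id
  AND hts_well.sample_id = hts_sample.sample_id
ORDER BY hts_well.well_no
"""

# Same query with each relationship stated once, next to the table it joins.
EXPLICIT = """
SELECT w.well_no, p.alternate_id, s.alternate_id
FROM hts_well AS w
JOIN hts_plate  AS p ON w.plate_id  = p.plate_id
JOIN hts_sample AS s ON w.sample_id = s.sample_id
ORDER BY w.well_no
"""

implicit_rows = conn.execute(IMPLICIT).fetchall()
explicit_rows = conn.execute(EXPLICIT).fetchall()
```

Stating each relationship once in an ON clause also makes a repeated predicate easier to spot; in the query above, hts_well.plate_id=hts_plate.plate_id appears twice in the WHERE list.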
34. Accelrys
Merging reflected in technology
• Chemical cartridges: AEI (v6 & v7), Symyx, new SciTegic cartridge
• Cheminformatics? E.g., canonicalize a SMILES w/ Accelrys, SciTegic or MDL code.
• The Accord Enterprise Informatics (AEI 6.2) UNM purchased in 2009 is now a “legacy product”.
• Re-branding and re-packaging
• May be great technically, but challenging for customers and Accelrys alike.
36. Opinionated Observation:
There is a sometimes subtle difference
between (1) a software product with a support
contract, and (2) a service contract which
involves customizing and installing and
configuring a software product, requiring or
likely to require ongoing, additional service
contracts. Accelrys and Symyx (“solutions
providers”) have generated much of their
revenue using the latter. In contrast: tools
providers.
38. Accelrys
(My opinions -- feel free to disagree.)
1) M&A's reflected US & global business
trends, but perpetual re-org is stressful and
there are huge costs to employees, customers,
and technical progress.
2) No coherent scientific mission, only
business growth.
3) Lots of excellent software, science and
people. But all challenged by the chaos.
"If in your science you only look for business, then you risk
finding neither knowledge nor business."
— Haldor Topsøe
40. OpenBabel
• Open-source C++ community project based on OELib by Matt Stahl, OpenEye.
• Founded 2001, by Geoff Hutchison et al. Now ~100 credited contributors.
• C++ w/ wrappers via SWIG: Python, Perl, Java, Ruby, C#.
• 2011 paper in Journal of Cheminformatics*
• >160,000 downloads, >400 citations, used by >40 projects
* "Open Babel: An open chemical toolbox", Noel M. O'Boyle, Michael Banck, Craig A. James, Chris Morley, Tim Vandermeersch and Geoffrey R. Hutchison, Journal of Cheminformatics 2011, 3:33, doi:10.1186/1758-2946-3-33.
41. Example Code
$ obscreen
OpenBabel obscreen - screen molecules based on calculated properties
OpenBabel v2.2.3 (Dec 4 2010)
syntax: obscreen [options]
options:
-i <infile>
-o <outfile>
-otable <output table>
-badsmarts <badfile> ... bad smarts file [default:builti
-minmwt <MINWT> ... minimum molweight [200]
-maxmwt <MAXWT> ... maximum molweight [600]
-minhbd <MINHBD> ... min Hbond donors [0]
-maxhbd <MAXHBD> ... max Hbond donors [5]
-minhba <MINHBA> ... min Hbond acceptors [0]
-maxhba <MAXHBA> ... max Hbond acceptors [10]
-minrot <MINROT> ... min rotors [0]
-maxrot <MAXROT> ... max rotors [12]
-minchiral <MINCHIRAL> ... min chiral atoms [0]
-maxchiral <MAXCHIRAL> ... max chiral atoms [5]
-minlogp <MINLOGP> ... min logP [-5.0]
-maxlogp <MAXLOGP> ... max logP [5.0]
-hbasmarts <HBASMARTS> ... default='[#6,#7;R0]=[#8]'
-hbdsmarts <HBDSMARTS> ... default='[!H0;#7,#8,#9]'
-n_inmax <N> ... input limit
-n_outmax <N> ... output limit
-v ... verbose
-vv ... very verbose
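The min/max options above amount to a range filter over calculated properties. A minimal sketch of that idea follows, in plain Python with no toolkit: the default limits are copied from the usage message, but the names MolProps, failed_filters and screen are hypothetical, the property values are assumed to be precomputed elsewhere, and treating both bounds as inclusive is an assumption, since the slides do not show the obscreen source.

```python
from dataclasses import dataclass

# Default (min, max) ranges copied from the usage message above.
DEFAULT_LIMITS = {
    "mwt": (200.0, 600.0),   # molweight
    "hbd": (0, 5),           # Hbond donors
    "hba": (0, 10),          # Hbond acceptors
    "rot": (0, 12),          # rotors
    "chiral": (0, 5),        # chiral atoms
    "logp": (-5.0, 5.0),     # logP
}


@dataclass
class MolProps:
    """Precomputed properties for one molecule (hypothetical container)."""
    name: str
    mwt: float
    hbd: int
    hba: int
    rot: int
    chiral: int
    logp: float


def failed_filters(props, limits=DEFAULT_LIMITS):
    """Return the names of properties outside their range (bounds assumed inclusive)."""
    return [key for key, (lo, hi) in limits.items()
            if not lo <= getattr(props, key) <= hi]


def screen(mols, limits=DEFAULT_LIMITS):
    """Keep only molecules that pass every range filter."""
    return [m for m in mols if not failed_filters(m, limits)]
```

For example, with toy values rather than measured data, MolProps("toy_heavy", 750.0, 2, 5, 4, 1, 2.5) fails only the "mwt" filter. Returning the list of failed properties instead of a bare pass/fail makes it easy to report why a molecule was rejected.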
43. OpenBabel
Used by eMolecules, quite successfully
Craig James, CTO, formerly of Daylight, Accelrys
44. OpenBabel
(My opinions -- feel free to disagree.)
• OB is very good but not yet close to Daylight, OEChem or JChem in comprehensiveness, quality, and other measures.
• SMARTS accuracy not state of the art.
• OB continually improving thanks to capable and active community of developers and users.
• The value reflects the total quality developer years invested. So there is no reason OB cannot catch up, if the community continues to grow.
45. Some interesting and successful others:
1) ChemAxon - Budapest, Marvin, Java, JChem
2) Tripos (now Certara) - St. Louis, Sybyl, SGI
3) Chem Comp Group - Montreal, MOE, SVL
4) PyMOL - was DeLano, now Schrödinger (OSS)
5) Schrödinger - quantum, etc.
6) UCSF School of Pharmacy - DOCK, Kuntz et al.
7) NIH NCTT - OSS, Java, Tripod, bioassay analysis
8) Scitouch - Russia, Indigo, Dingo, Bingo
9) CDK - FOSS, Java
10) RDKit - FOSS, incl. machine learning
11) Silicos NV - OSS & commercial
12) Bioreason, VC-funded, bought by Simulations Plus
13) Mesa Analytics & Computing
14) NextMove Software
15) UNM Division of Biocomputing
16) IU SOIC CCRG
46. Some General Conclusions
• Cheminformatics software landscape is very complex.
• Economics & Freakonomics are drivers, but so are personalities.
• FOSS can compete with $$$ software.
• Landscape includes lots of hidden costs.
• “Software engineering” has been an ideal, far from reality overall.
• Software can enable or hinder science.
• Diverse business models, diverse everything.
• Plenty of technological change.
• Some human change too.
47. The End
Feel free to contact me directly with questions or
ideas!
Jeremy J Yang
jejyang@indiana.edu