SlideShare a Scribd company logo
1 of 33
Download to read offline
Open-source from/in the enterprise: the RDKit
Gregory Landrum
NIBR Informatics
Novartis Institutes for BioMedical Research, Basel, Switzerland
Outline
§  What is the RDKit?
§  RDKit integration with other open-source projects
•  Knime
•  PostgreSQL
•  IPython
•  Pandas
•  Lucene
§  RDKit in NIBR, some case studies
RDKit: What is it?
§  Open-source C++ toolkit for cheminformatics
§  Wrappers for Python (2.x), Java, C#
§  Functionality:
•  2D and 3D molecular operations
•  Descriptor generation for machine learning
•  PostgreSQL database cartridge for substructure and similarity searching
•  Knime nodes
•  IPython integration
•  Lucene integration (experimental)
•  Supports Mac/Windows/Linux
§  Releases every 6 months
§  business-friendly BSD license
§  Code: https://github.com/rdkit
§  http://www.rdkit.org
The community
§  Mailing lists hosted at sourceforge: https://sourceforge.net/p/rdkit/
mailman/
§  Active participants from academia, small and large pharma, software
companies, and service providers
§  30+ attendees at each of the two user group meetings
Some features
§  Input/Output: SMILES/SMARTS, SDF, TDT, PDB,
SLN [1], Corina mol2 [1]
§  “Cheminformatics”:
•  Substructure searching
•  Canonical SMILES
•  Chirality support (i.e. R/S or E/Z labeling)
•  Chemical transformations (e.g. remove matching
substructures)
•  Chemical reactions
§  2D depiction, including constrained depiction
§  2D->3D conversion/conformational analysis via
distance geometry
§  UFF and MMFF94 implementation for cleaning up
structures
§  Fingerprinting: Daylight-like, atom pairs, topological
torsions, Morgan algorithm, “MACCS keys”, etc.
§  Similarity/diversity picking
§  2D pharmacophores [1]
§  Gasteiger-Marsili charges
§  Hierarchical subgraph/fragment analysis
§  Bemis and Murcko scaffold determination
§  RECAP and BRICS implementations
§  Multi-molecule maximum common substructure
§  Feature maps
§  Shape-based similarity
§  Fraggle similarity (from GSK)
§  Molecule-molecule alignment
§  Open3DAlign implementation
§  Integration with PyMOL for 3D visualization
§  Functional group filtering
§  Salt stripping
§  Molecular descriptor library:
Topological (κ3, Balaban J, etc.), Compositional (Number
of Rings, Number of Aromatic Heterocycles, etc.),
EState, SlogP/SMR (Wildman and Crippen approach),
“MOE like” VSA descriptors, Feature-map vectors
§  Machine Learning:
•  Clustering (hierarchical)
•  Information theory (Shannon entropy, information
gain, etc.)
§  Tight integration with the IPython notebook and
pandas
§  Integration with the InChI library
[1] These implementations are functional but are not necessarily
the best, fastest, or most complete.
The contrib dir
§  LEF (Anna Vulpetti, NIBR): Local Environment of Fluorine
§  PBF (Nicholas Firth, ICR): Plane of best fit descriptor
§  SA_Score (Peter Ertl, NIBR): synthetic-accessibility score
§  fraggle (Jameed Hussain, GSK): fragment-based similarity
§  mmpa (Jameed Hussain, GSK): molecular matched pairs
§  pzc (Paul Czodrowski, Merck KGaA): tools for building and validating
classifiers
§  ConformerParser (Sereina Riniker, NIBR): parser for Amber trajectory
files
C++ :
Core data structures and algorithms
Postgre
SQL
Java
SWIG
Python
Boost.Python
Knime
What is this all about?
script
inter-
active
Exact same algorithms/implementations accessible from
many different endpoints
C#
App
Knime integration
§  Open-source RDKit-based nodes for Knime providing cheminformatics
functionality
+
§  Trusted nodes distributed from
knime community site
§  Work in progress: more nodes being
added (new wizard makes it easy)
What’s there?
+
RDKit Interactive Table
§  KNIME interactive table with molecules as column headers
+
+
Functionality for working with 3D molecules
§  Example: flexible molecule-molecule alignment
PostgreSQL integration
§  PostgreSQL (http://www.postgresql.org): a robust, flexible, and
extensible relational open-source database. Rich collection of
extensions available
§  RDKit “cartridge”:
•  Fast substructure and similarity search
•  Fingerprints (count-based and bit-vector):
Morgan (ECFP-like), FeatMorgan (FCFP-like), RDKit (Daylight like), atom pair,
topological torsion, MACCS
•  Standard molecule properties and descriptors
§  Basis for myChEMBL (http://chembl.blogspot.co.uk/2013/10/chembl-
virtual-machine-aka-mychembl.html) Ochoa, R., Davies, M., Papadatos, G.,
Atkinson, F., & Overington, J. P. (2014). myChEMBL: a virtual machine implementation of
open data and cheminformatics tools. Bioinformatics, 30(2), 298–300.
+
PostgreSQL integration
Substructure search
+
chembl_17=# select molregno,m from rdk.mols where
m@>'c1ccc2c(c1)C(=NN(C2=O)Cc3nc4cc(ccc4s3)C)CC(=O)O';!
molregno | m !
----------+---------------------------------------------------------------!
7502 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12!
23364 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(C(F)(F)F)c3s2)c(=O)c2ccccc12!
23439 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(Cl)c3s2)c(=O)c2ccccc12!
23462 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(F)c3s2)c(=O)c2ccccc12!
24192 | Cc1cc2nc(Cn3nc(CC(=O)O)c4ccccc4c3=O)sc2c(C)c1!
24190 | COc1cc2sc(Cn3nc(CC(=O)O)c4ccccc4c3=O)nc2cc1C(F)(F)F!
24194 | Cc1ccc2sc(Cn3nc(CC(=O)O)c4ccccc4c3=O)nc2c1!
24237 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)c(O)cc3s2)c(=O)c2ccccc12!
24331 | CC(c1nc2cc(C(F)(F)F)ccc2s1)n1nc(CC(=O)O)c2ccccc2c1=O!
(9 rows)!
!
Time: 112.325 ms!
PostgreSQL integration
Similarity search
+
chembl_17=# select * from get_mfp2_neighbors('O=C(O)Cc1nn(Cc2nc3cc(C(F)
(F)F)ccc3s2)c(=O)c2ccccc12') limit 5;!
molregno | m | similarity !
----------+------------------------------------------------------+-------------------!
7502 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12 | 1!
24184 | O=C(O)Cc1nn(Cc2nc3ccc(C(F)(F)F)cc3s2)c(=O)c2ccccc12 | 0.859649122807018!
24153 | O=C(O)Cc1nn(CCc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12 | 0.830508474576271!
24152 | O=C(O)Cc1nn(Cc2nc3ccccc3s2)c(=O)c2cc(C(F)(F)F)ccc12 | 0.813559322033898!
24150 | O=C(O)Cc1nn(Cc2nc3ccccc3s2)c(=O)c2ccc(C(F)(F)F)cc12 | 0.813559322033898!
(5 rows)!
!
Time: 1222.426 ms!
!
!
Notice that results come back in sorted order
PostgreSQL integration
Other functionality
+
chembl_17=# select mol_formula('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');!
mol_formula !
---------------!
C19H12F3N3O3S!
(1 row)!
chembl_17=# select mol_logp('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');!
mol_logp !
----------!
3.7004!
(1 row)!
chembl_17=# select mol_inchi('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');
mol_inchi !
------------------------------------------------------------------------------------------
-----------------------------------------------!
InChI=1S/C19H12F3N3O3S/
c20-19(21,22)10-5-6-15-14(7-10)23-16(29-15)9-25-18(28)12-4-2-1-3-11(12)13(24-25)8-17(26)27
/h1-7H,8-9H2,(H,26,27)!
(1 row)!
!
!
!
PostgreSQL integration
Other functionality
+
chembl_17=# select mol_to_ctab('CC'::mol);!
mol_to_ctab !
-----------------------------------------------------------------------!
+!
RDKit 2D +!
+!
2 1 0 0 0 0 0 0 0 0999 V2000 +!
0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0+!
1.2990 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0+!
1 2 1 0 +!
M END +!
!
(1 row)!
!
!
!
IPython notebok integration
§  IPython: a very powerful interactive shell for python
http://www.ipython.org
§  IPython notebook: IPython in the browser, with graphics
•  combines code and output in one place
•  great tool for reproducible research
•  Example notebook with graphics.
§  RDKit integration:
•  Display molecules, substructure matches, reactions, graphics from PyMOL
+
IPython notebook integration:
Molecule tables
http://rdkit.blogspot.ch/2014/02/more-on-datasets-ii.html
+
IPython notebook integration:
Similarity Maps
+
Riniker, S. & Landrum, G. A. J Cheminf (2013). http://www.jcheminf.com/content/5/1/43
IPython notebook integration:
PyMol
http://rdkit.blogspot.ch/2013/12/using-allchemconstrainedembed.html
+
Pandas integration
§  Pandas: library for working with data tables in Python. Integrates well
with matplotlib and ipython
http://pandas.pydata.org/
§  RDKit integration:
•  Load smiles tables or SD files into Pandas data tables
•  Adds molecule columns to existing tables with smiles/SD columns
•  Enables substructure filters on tables
•  Integration with IPython notebook to render molecules
+
Pandas integration
+
http://nbviewer.ipython.org/github/rdkit/UGM_2013/blob/master/Tutorials/pandastools/Pandas_RDKit_UGM.ipynb
Substructure filters
Molecules in tables
Lucene integration
§  Still in the experimental stage
§  Adds substructure search functionality with fingerprint screenout to
Lucene
§  Includes demo app for testing
+
RDKit in NIBR
§  Extensive use by CADD, informaticians, and IT
§  Lots of convenience code/wrappers for accessing internal data sources
and tools
§  Combined with the Avalon toolkit (another NIBR-supported open-
source project), provides the underpinning for many of our global
chemistry-based applications
+
The Avalon toolkit
§  C/Java cheminformatics toolkit
§  Primary author: Bernd Rohde (NIBRIT Basel)
§  http://sourceforge.net/projects/avalontoolkit/
§  Functionality:
•  Canonical SMILES
•  Avalon fingerprint (highly optimized substructure fingerprint)
•  Molecular standardization (STRUCHK)
•  2D Coordinate generation
•  Tomcat webapp for 2D rendering
§  The RDKit has (optional) Python bindings for much of the functionality
+
RDKit in NIBR
Case study 1: CIx Framework
§  “Service bus” for cheminformatics/CADD services
§  Handles format conversions for input/output automatically
i.e. callers can provide SMILES input to a service/model wants CTABs with 3D
coordinates
§  Supports versioning of models/services
§  Tight integration with scientific tools (e.g. Tibco Spotfire, Knime, Instant
JChem, etc.)
§  Enables trivial addition of “chemical intelligence” to web apps
§  Makes it easy to globally deploy models: once a new model/service (or
new version of a model/service) is registered with the Framework, it is
instantly globally accessible
+
CIx Framework architecture
Translation service
- molecule format conversion
- name lookup
XML File exchange
between engine and the
Models
Database to store
Model information
Model registration and
Request service
Web Model
Registration
Portal Front
end
Cix Tools Framework:
Cix Tools Web
Service
-SOAP
-REST
Model
Script
Model
Model
Script
Model
Model
Script
Model
Model
Script
Model
CIX Tools Engine
Data
In one of the following
formats:
- TSV/CSV File
- SMILES/CPD_NO
- SD-File
- DART query
XML File exchange
between engine and the
Translation service
Get the Model info from the Database
Client
- web app
-  KNIME
-  Spotfire
-  IJC
-  Python
Java/Tomcat
Python/Django
Geographically diverse servers
Most models are Python/Django
+
RDKit in NIBR
Case study 2: Small-Molecule Registration
§  Internally developed web application for compound registration
§  C#-based web services writing to Oracle
§  RDKit + Avalon toolkit for structure standardization
§  RDKit + InChI used for structure-key calculation
§  Calls out to CIx Framework for standard computed properties
§  Independent (but validated) Python implementation of standardization
and structure-key calculation for standalone use
+
RDKit in NIBR
Case study 3: QSAR Toolkit
§  Descriptor calculator providing access to all available internal
descriptors
§  Tools for pulling assay data from our data warehouse
§  Standardized model-building
§  Standardized reporting for evaluation and peer review
§  Packaging for deployment via CIx Framework
§  Model Watchdog:
Pulls most recent data, generates predictions, creates report showing evolution
of model accuracy over time
+
RDKit in NIBR
Case study 4: Similarity Server
§  Central PostgreSQL database with easily available compounds
•  in-house available
•  available from reliable vendors
§  Kept up-to-date
§  Substructure search
§  Similarity search with various fingerprints:
•  Avalon
•  Morgan2, Morgan3, FeatMorgan2
•  Atom Pairs, Topological Torsions
§  Web services interface
§  Available to chemists via one of their standard desktop tools
+
NIBR Open Source
Something new
Acknowledgements
§  General:
•  Remy Evard (NIBR/Informatics)
•  Richard Lewis (NIBR/GDC)
•  Tom Digby (NIBR/Legal)
•  Peter Gedeck (NIBR/GDC)
•  Nik Stiefl (NIBR/GDC)
§  RDKit Community
•  Roger Sayle (NextMove): PDB Parser
•  Andrew Dalke (Dalke Scientific): FMCS
•  Paolo Tosco (University of Turin):
MMFF94, Open3DAlign
•  Jameed Hussain (GSK): Fraggle,
mmpa
§  Pandas, scikit-learn:
•  Sereina Riniker (NIBR/Informatics)
•  Nikolas Fechner (NIBR/Informatics)
http://www.rdkit.org
§  Knime:
•  Manuel Schwarze (NIBR/Informatics)
•  Thorsten Meinl (knime.com)
•  Bernd Wiswedel (knime.com)
§  SMR
•  Thomas Mueller (NIBR/Informatics)
•  Thomas Veith (NIBR/Informatics)
•  Dave Cotter (NIBR/Informatics)
§  QSAR Toolkit:
•  Peter Gedeck (NIBR/GDC)
•  Nikolas Fechner (NIBR/Informatics)
§  CIx Framework
•  Sandra Mueller (NIBR/Informatics)
•  Joerg Muehlbacher (NIBR/CPC)
•  Riccardo Vianello (NIBR/Informatics)
§  NIBR Open Source
•  Ken Robbins (NIBR/Informatics)
•  Dennis Jen (NIBR/Informatics)
•  Mark Schreiber (NIBR/Informatics)
Advertising
33
3rd RDKit User Group Meeting
22-24 October 2014
Merck KGaA, Darmstadt, Germany
Talks, “talktorials”, lightning talks, social activities, and a hackathon on
the 24th.
Registration: http://goo.gl/z6QzwD
Full announcement: http://goo.gl/ZUm2wm
We’re looking for speakers. Please contact greg.landrum@gmail.com

More Related Content

What's hot

MobileNet - PR044
MobileNet - PR044MobileNet - PR044
MobileNet - PR044Jinwon Lee
 
Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...
Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...
Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...Edureka!
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)Yu Huang
 
Understanding RNN and LSTM
Understanding RNN and LSTMUnderstanding RNN and LSTM
Understanding RNN and LSTM健程 杨
 
Fake Image Identification
Fake Image IdentificationFake Image Identification
Fake Image IdentificationVenkat Projects
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Binary Similarity : Theory, Algorithms and Tool Evaluation
Binary Similarity :  Theory, Algorithms and  Tool EvaluationBinary Similarity :  Theory, Algorithms and  Tool Evaluation
Binary Similarity : Theory, Algorithms and Tool EvaluationLiwei Ren任力偉
 
Docker 基礎介紹與實戰
Docker 基礎介紹與實戰Docker 基礎介紹與實戰
Docker 基礎介紹與實戰Bo-Yi Wu
 
Semantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesSemantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesFellowship at Vodafone FutureLab
 
Graph Neural Network - Introduction
Graph Neural Network - IntroductionGraph Neural Network - Introduction
Graph Neural Network - IntroductionJungwon Kim
 
Introduction to Computer Vision using OpenCV
Introduction to Computer Vision using OpenCVIntroduction to Computer Vision using OpenCV
Introduction to Computer Vision using OpenCVDylan Seychell
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningMohamed Loey
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersJinwon Lee
 
Deep Learning - RNN and CNN
Deep Learning - RNN and CNNDeep Learning - RNN and CNN
Deep Learning - RNN and CNNPradnya Saval
 
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object DetectionTaegyun Jeon
 
Word representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vecananth
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것NAVER Engineering
 

What's hot (20)

MobileNet - PR044
MobileNet - PR044MobileNet - PR044
MobileNet - PR044
 
Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...
Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...
Artificial Neural Network Tutorial | Deep Learning With Neural Networks | Edu...
 
LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)LiDAR-based Autonomous Driving III (by Deep Learning)
LiDAR-based Autonomous Driving III (by Deep Learning)
 
Understanding RNN and LSTM
Understanding RNN and LSTMUnderstanding RNN and LSTM
Understanding RNN and LSTM
 
Fake Image Identification
Fake Image IdentificationFake Image Identification
Fake Image Identification
 
Opencv
OpencvOpencv
Opencv
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language Processing
 
Binary Similarity : Theory, Algorithms and Tool Evaluation
Binary Similarity :  Theory, Algorithms and  Tool EvaluationBinary Similarity :  Theory, Algorithms and  Tool Evaluation
Binary Similarity : Theory, Algorithms and Tool Evaluation
 
OpenCV Workshop
OpenCV WorkshopOpenCV Workshop
OpenCV Workshop
 
Docker 基礎介紹與實戰
Docker 基礎介紹與實戰Docker 基礎介紹與實戰
Docker 基礎介紹與實戰
 
Semantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network ApproachesSemantic segmentation with Convolutional Neural Network Approaches
Semantic segmentation with Convolutional Neural Network Approaches
 
Graph Neural Network - Introduction
Graph Neural Network - IntroductionGraph Neural Network - Introduction
Graph Neural Network - Introduction
 
Introduction to Computer Vision using OpenCV
Introduction to Computer Vision using OpenCVIntroduction to Computer Vision using OpenCV
Introduction to Computer Vision using OpenCV
 
Convolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep LearningConvolutional Neural Network Models - Deep Learning
Convolutional Neural Network Models - Deep Learning
 
PR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision LearnersPR-355: Masked Autoencoders Are Scalable Vision Learners
PR-355: Masked Autoencoders Are Scalable Vision Learners
 
Deep Learning - RNN and CNN
Deep Learning - RNN and CNNDeep Learning - RNN and CNN
Deep Learning - RNN and CNN
 
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
[PR12] You Only Look Once (YOLO): Unified Real-Time Object Detection
 
Word representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vec
 
오토인코더의 모든 것
오토인코더의 모든 것오토인코더의 모든 것
오토인코더의 모든 것
 
Introduction to Transformer Model
Introduction to Transformer ModelIntroduction to Transformer Model
Introduction to Transformer Model
 

Viewers also liked

1st Zone Asian Photo Circuit 2016
1st Zone Asian Photo Circuit 20161st Zone Asian Photo Circuit 2016
1st Zone Asian Photo Circuit 2016maditabalnco
 
The Do's of Onboarding: How to Improve Employee Retention
The Do's of Onboarding: How to Improve Employee RetentionThe Do's of Onboarding: How to Improve Employee Retention
The Do's of Onboarding: How to Improve Employee RetentionCGS
 
Automotive SEO - Don't Risk Your Business
Automotive SEO - Don't Risk Your BusinessAutomotive SEO - Don't Risk Your Business
Automotive SEO - Don't Risk Your BusinessGreg Gifford
 
Quand lecture rime avec plaisir
Quand lecture rime avec plaisirQuand lecture rime avec plaisir
Quand lecture rime avec plaisirSoumia EL Yaacoubi
 
Strategy Instruction in writing
Strategy Instruction in writingStrategy Instruction in writing
Strategy Instruction in writingmystiquemel
 
Ppt eng y4
Ppt eng y4Ppt eng y4
Ppt eng y4azura272
 
Zentangle Animals
Zentangle AnimalsZentangle Animals
Zentangle Animalsquicarroll
 
Musicas cifradas mpb 5
Musicas cifradas mpb 5Musicas cifradas mpb 5
Musicas cifradas mpb 5Nome Sobrenome
 
(Nunca) perder la esperanza.
(Nunca) perder la esperanza.(Nunca) perder la esperanza.
(Nunca) perder la esperanza.José María
 
Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Greg Landrum
 
Rpp matematika SMA (lingkaran)
Rpp matematika SMA (lingkaran)Rpp matematika SMA (lingkaran)
Rpp matematika SMA (lingkaran)Heriyanto Asep
 

Viewers also liked (16)

BEDP II
BEDP IIBEDP II
BEDP II
 
1st Zone Asian Photo Circuit 2016
1st Zone Asian Photo Circuit 20161st Zone Asian Photo Circuit 2016
1st Zone Asian Photo Circuit 2016
 
The Do's of Onboarding: How to Improve Employee Retention
The Do's of Onboarding: How to Improve Employee RetentionThe Do's of Onboarding: How to Improve Employee Retention
The Do's of Onboarding: How to Improve Employee Retention
 
Automotive SEO - Don't Risk Your Business
Automotive SEO - Don't Risk Your BusinessAutomotive SEO - Don't Risk Your Business
Automotive SEO - Don't Risk Your Business
 
Quand lecture rime avec plaisir
Quand lecture rime avec plaisirQuand lecture rime avec plaisir
Quand lecture rime avec plaisir
 
Strategy Instruction in writing
Strategy Instruction in writingStrategy Instruction in writing
Strategy Instruction in writing
 
Ppt eng y4
Ppt eng y4Ppt eng y4
Ppt eng y4
 
P7 e2 josemariabarrio
P7 e2 josemariabarrio P7 e2 josemariabarrio
P7 e2 josemariabarrio
 
538df1cdf0b7f
538df1cdf0b7f538df1cdf0b7f
538df1cdf0b7f
 
Zentangle Animals
Zentangle AnimalsZentangle Animals
Zentangle Animals
 
Musicas cifradas mpb 5
Musicas cifradas mpb 5Musicas cifradas mpb 5
Musicas cifradas mpb 5
 
(Nunca) perder la esperanza.
(Nunca) perder la esperanza.(Nunca) perder la esperanza.
(Nunca) perder la esperanza.
 
Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...Reproducibility in cheminformatics and computational chemistry research: cert...
Reproducibility in cheminformatics and computational chemistry research: cert...
 
Rpp matematika SMA (lingkaran)
Rpp matematika SMA (lingkaran)Rpp matematika SMA (lingkaran)
Rpp matematika SMA (lingkaran)
 
SPA: Key Questions
SPA: Key QuestionsSPA: Key Questions
SPA: Key Questions
 
New IBM Mainframe 2016 - Z13
New IBM Mainframe 2016 - Z13 New IBM Mainframe 2016 - Z13
New IBM Mainframe 2016 - Z13
 

Similar to Open-source from/in the enterprise: the RDKit

A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptSanket Shikhar
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Anubhav Jain
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningAnubhav Jain
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookKeiichiro Ono
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsGaignard Alban
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Rothamsted Research, UK
 
Handling data and workflows in computational materials science: the AiiDA ini...
Handling data and workflows in computational materials science: the AiiDA ini...Handling data and workflows in computational materials science: the AiiDA ini...
Handling data and workflows in computational materials science: the AiiDA ini...Research Data Alliance
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupNed Shawa
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksAnyscale
 
OpenDiscovery
OpenDiscoveryOpenDiscovery
OpenDiscoverygwprice
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsTanu Malik
 
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for SparkMark Kerzner
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqEnis Afgan
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingPaco Nathan
 
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...Keiichiro Ono
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging EnvironmentsPaul Groth
 

Similar to Open-source from/in the enterprise: the RDKit (20)

A Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.pptA Hands-on Intro to Data Science and R Presentation.ppt
A Hands-on Intro to Data Science and R Presentation.ppt
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter Notebook
 
Sharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reportsSharing massive data analysis: from provenance to linked experiment reports
Sharing massive data analysis: from provenance to linked experiment reports
 
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
Getting the best of Linked Data and Property Graphs: rdf2neo and the KnetMine...
 
Handling data and workflows in computational materials science: the AiiDA ini...
Handling data and workflows in computational materials science: the AiiDA ini...Handling data and workflows in computational materials science: the AiiDA ini...
Handling data and workflows in computational materials science: the AiiDA ini...
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
Jump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with DatabricksJump Start on Apache Spark 2.2 with Databricks
Jump Start on Apache Spark 2.2 with Databricks
 
Distributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark MeetupDistributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark Meetup
 
From Laboratory to e-Laboratory
From Laboratory to e-LaboratoryFrom Laboratory to e-Laboratory
From Laboratory to e-Laboratory
 
OpenDiscovery
OpenDiscoveryOpenDiscovery
OpenDiscovery
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC Programs
 
IBM Strategy for Spark
IBM Strategy for SparkIBM Strategy for Spark
IBM Strategy for Spark
 
Introduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-SeqIntroduction to Galaxy and RNA-Seq
Introduction to Galaxy and RNA-Seq
 
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
Bosco r users2013
Bosco r users2013Bosco r users2013
Bosco r users2013
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
SDCSB Advanced Tutorial: Reproducible Data Visualization Workflow with Cytosc...
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging Environments
 

More from Greg Landrum

Chemical registration
Chemical registrationChemical registration
Chemical registrationGreg Landrum
 
Mike Lynch Award Lecture, ICCS 2022
Mike Lynch Award Lecture, ICCS 2022Mike Lynch Award Lecture, ICCS 2022
Mike Lynch Award Lecture, ICCS 2022Greg Landrum
 
Google BigQuery for analysis of scientific datasets: Interactive exploration ...
Google BigQuery for analysis of scientific datasets: Interactive exploration ...Google BigQuery for analysis of scientific datasets: Interactive exploration ...
Google BigQuery for analysis of scientific datasets: Interactive exploration ...Greg Landrum
 
ACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformaticsACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformaticsGreg Landrum
 
Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Greg Landrum
 
Moving from Artisanal to Industrial Machine Learning
Moving from Artisanal to Industrial Machine LearningMoving from Artisanal to Industrial Machine Learning
Moving from Artisanal to Industrial Machine LearningGreg Landrum
 
Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Greg Landrum
 
Let’s talk about reproducible data analysis
Let’s talk about reproducible data analysisLet’s talk about reproducible data analysis
Let’s talk about reproducible data analysisGreg Landrum
 
How Do You Build and Validate 1500 Models and What Can You Learn from Them?
How Do You Build and Validate 1500 Models and What Can You Learn from Them? How Do You Build and Validate 1500 Models and What Can You Learn from Them?
How Do You Build and Validate 1500 Models and What Can You Learn from Them? Greg Landrum
 
Interactive and reproducible data analysis with the open-source KNIME Analyti...
Interactive and reproducible data analysis with the open-source KNIME Analyti...Interactive and reproducible data analysis with the open-source KNIME Analyti...
Interactive and reproducible data analysis with the open-source KNIME Analyti...Greg Landrum
 
Processing malaria HTS results using KNIME: a tutorial
Processing malaria HTS results using KNIME: a tutorialProcessing malaria HTS results using KNIME: a tutorial
Processing malaria HTS results using KNIME: a tutorialGreg Landrum
 
Big (chemical) data? No Problem!
Big (chemical) data? No Problem!Big (chemical) data? No Problem!
Big (chemical) data? No Problem!Greg Landrum
 
Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchGreg Landrum
 
Some "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontSome "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontGreg Landrum
 
Large scale classification of chemical reactions from patent data
Large scale classification of chemical reactions from patent dataLarge scale classification of chemical reactions from patent data
Large scale classification of chemical reactions from patent dataGreg Landrum
 
Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knimeGreg Landrum
 
Open-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesOpen-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesGreg Landrum
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Greg Landrum
 

More from Greg Landrum (18)

Chemical registration
Chemical registrationChemical registration
Chemical registration
 
Mike Lynch Award Lecture, ICCS 2022
Mike Lynch Award Lecture, ICCS 2022Mike Lynch Award Lecture, ICCS 2022
Mike Lynch Award Lecture, ICCS 2022
 
Google BigQuery for analysis of scientific datasets: Interactive exploration ...
Google BigQuery for analysis of scientific datasets: Interactive exploration ...Google BigQuery for analysis of scientific datasets: Interactive exploration ...
Google BigQuery for analysis of scientific datasets: Interactive exploration ...
 
ACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformaticsACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformatics
 
Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)
 
Moving from Artisanal to Industrial Machine Learning
Moving from Artisanal to Industrial Machine LearningMoving from Artisanal to Industrial Machine Learning
Moving from Artisanal to Industrial Machine Learning
 
Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)
 
Let’s talk about reproducible data analysis
Let’s talk about reproducible data analysisLet’s talk about reproducible data analysis
Let’s talk about reproducible data analysis
 
How Do You Build and Validate 1500 Models and What Can You Learn from Them?
How Do You Build and Validate 1500 Models and What Can You Learn from Them? How Do You Build and Validate 1500 Models and What Can You Learn from Them?
How Do You Build and Validate 1500 Models and What Can You Learn from Them?
 
Interactive and reproducible data analysis with the open-source KNIME Analyti...
Interactive and reproducible data analysis with the open-source KNIME Analyti...Interactive and reproducible data analysis with the open-source KNIME Analyti...
Interactive and reproducible data analysis with the open-source KNIME Analyti...
 
Processing malaria HTS results using KNIME: a tutorial
Processing malaria HTS results using KNIME: a tutorialProcessing malaria HTS results using KNIME: a tutorial
Processing malaria HTS results using KNIME: a tutorial
 
Big (chemical) data? No Problem!
Big (chemical) data? No Problem!Big (chemical) data? No Problem!
Big (chemical) data? No Problem!
 
Is one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical researchIs one enough? Data warehousing for biomedical research
Is one enough? Data warehousing for biomedical research
 
Some "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontSome "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data front
 
Large scale classification of chemical reactions from patent data
Large scale classification of chemical reactions from patent dataLarge scale classification of chemical reactions from patent data
Large scale classification of chemical reactions from patent data
 
Machine learning in the life sciences with knime
Machine learning in the life sciences with knimeMachine learning in the life sciences with knime
Machine learning in the life sciences with knime
 
Open-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databasesOpen-source tools for querying and organizing large reaction databases
Open-source tools for querying and organizing large reaction databases
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...
 

Recently uploaded

JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorTier1 app
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingShane Coughlan
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencessuser9e7c64
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 

Recently uploaded (20)

JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Effectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryErrorEffectively Troubleshoot 9 Types of OutOfMemoryError
Effectively Troubleshoot 9 Types of OutOfMemoryError
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full RecordingOpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
OpenChain AI Study Group - Europe and Asia Recap - 2024-04-11 - Full Recording
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 

Open-source from/in the enterprise: the RDKit

  • 1. Open-source from/in the enterprise: the RDKit Gregory Landrum NIBR Informatics Novartis Institutes for BioMedical Research, Basel, Switzerland
  • 2. Outline §  What is the RDKit? §  RDKit integration with other open-source projects •  Knime •  PostgreSQL •  IPython •  Pandas •  Lucene §  RDKit in NIBR, some case studies
  • 3. RDKit: What is it? §  Open-source C++ toolkit for cheminformatics §  Wrappers for Python (2.x), Java, C# §  Functionality: •  2D and 3D molecular operations •  Descriptor generation for machine learning •  PostgreSQL database cartridge for substructure and similarity searching •  Knime nodes •  IPython integration •  Lucene integration (experimental) •  Supports Mac/Windows/Linux §  Releases every 6 months §  business-friendly BSD license §  Code: https://github.com/rdkit §  http://www.rdkit.org
  • 4. The community §  Mailing lists hosted at sourceforge: https://sourceforge.net/p/rdkit/ mailman/ §  Active participants from academia, small and large pharma, software companies, and service providers §  30+ attendees at each of the two user group meetings
  • 5. Some features §  Input/Output: SMILES/SMARTS, SDF, TDT, PDB, SLN [1], Corina mol2 [1] §  “Cheminformatics”: •  Substructure searching •  Canonical SMILES •  Chirality support (i.e. R/S or E/Z labeling) •  Chemical transformations (e.g. remove matching substructures) •  Chemical reactions §  2D depiction, including constrained depiction §  2D->3D conversion/conformational analysis via distance geometry §  UFF and MMFF94 implementation for cleaning up structures §  Fingerprinting: Daylight-like, atom pairs, topological torsions, Morgan algorithm, “MACCS keys”, etc. §  Similarity/diversity picking §  2D pharmacophores [1] §  Gasteiger-Marsili charges §  Hierarchical subgraph/fragment analysis §  Bemis and Murcko scaffold determination §  RECAP and BRICS implementations §  Multi-molecule maximum common substructure §  Feature maps §  Shape-based similarity §  Fraggle similarity (from GSK) §  Molecule-molecule alignment §  Open3DAlign implementation §  Integration with PyMOL for 3D visualization §  Functional group filtering §  Salt stripping §  Molecular descriptor library: Topological (κ3, Balaban J, etc.), Compositional (Number of Rings, Number of Aromatic Heterocycles, etc.), EState, SlogP/SMR (Wildman and Crippen approach), “MOE like” VSA descriptors, Feature-map vectors §  Machine Learning: •  Clustering (hierarchical) •  Information theory (Shannon entropy, information gain, etc.) §  Tight integration with the IPython notebook and pandas §  Integration with the InChI library [1] These implementations are functional but are not necessarily the best, fastest, or most complete.
  • 6. The contrib dir §  LEF (Anna Vulpetti, NIBR): Local Environment of Fluorine §  PBF (Nicholas Firth, ICR): Plane of best fit descriptor §  SA_Score (Peter Ertl, NIBR): synthetic-accessibility score §  fraggle (Jameed Hussain, GSK): fragment-based similarity §  mmpa (Jameed Hussain, GSK): molecular matched pairs §  pzc (Paul Czodrowski, Merck KGaA): tools for building and validating classifiers §  ConformerParser (Sereina Riniker, NIBR): parser for Amber trajectory files
  • 7. C++ : Core data structures and algorithms Postgre SQL Java SWIG Python Boost.Python Knime What is this all about? script inter- active Exact same algorithms/implementations accessible from many different endpoints C# App
  • 8. Knime integration §  Open-source RDKit-based nodes for Knime providing cheminformatics functionality + §  Trusted nodes distributed from knime community site §  Work in progress: more nodes being added (new wizard makes it easy)
  • 10. RDKit Interactive Table §  KNIME interactive table with molecules as column headers +
  • 11. + Functionality for working with 3D molecules §  Example: flexible molecule-molecule alignment
  • 12. PostgreSQL integration §  PostgreSQL (http://www.postgresql.org): a robust, flexible, and extensible relational open-source database. Rich collection of extensions available §  RDKit “cartridge”: •  Fast substructure and similarity search •  Fingerprints (count-based and bit-vector): Morgan (ECFP-like), FeatMorgan (FCFP-like), RDKit (Daylight like), atom pair, topological torsion, MACCS •  Standard molecule properties and descriptors §  Basis for myChEMBL (http://chembl.blogspot.co.uk/2013/10/chembl- virtual-machine-aka-mychembl.html) Ochoa, R., Davies, M., Papadatos, G., Atkinson, F., & Overington, J. P. (2014). myChEMBL: a virtual machine implementation of open data and cheminformatics tools. Bioinformatics, 30(2), 298–300. +
  • 13. PostgreSQL integration Substructure search + chembl_17=# select molregno,m from rdk.mols where m@>'c1ccc2c(c1)C(=NN(C2=O)Cc3nc4cc(ccc4s3)C)CC(=O)O';! molregno | m ! ----------+---------------------------------------------------------------! 7502 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12! 23364 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(C(F)(F)F)c3s2)c(=O)c2ccccc12! 23439 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(Cl)c3s2)c(=O)c2ccccc12! 23462 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)cc(F)c3s2)c(=O)c2ccccc12! 24192 | Cc1cc2nc(Cn3nc(CC(=O)O)c4ccccc4c3=O)sc2c(C)c1! 24190 | COc1cc2sc(Cn3nc(CC(=O)O)c4ccccc4c3=O)nc2cc1C(F)(F)F! 24194 | Cc1ccc2sc(Cn3nc(CC(=O)O)c4ccccc4c3=O)nc2c1! 24237 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)c(O)cc3s2)c(=O)c2ccccc12! 24331 | CC(c1nc2cc(C(F)(F)F)ccc2s1)n1nc(CC(=O)O)c2ccccc2c1=O! (9 rows)! ! Time: 112.325 ms!
  • 14. PostgreSQL integration Similarity search + chembl_17=# select * from get_mfp2_neighbors('O=C(O)Cc1nn(Cc2nc3cc(C(F) (F)F)ccc3s2)c(=O)c2ccccc12') limit 5;! molregno | m | similarity ! ----------+------------------------------------------------------+-------------------! 7502 | O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12 | 1! 24184 | O=C(O)Cc1nn(Cc2nc3ccc(C(F)(F)F)cc3s2)c(=O)c2ccccc12 | 0.859649122807018! 24153 | O=C(O)Cc1nn(CCc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12 | 0.830508474576271! 24152 | O=C(O)Cc1nn(Cc2nc3ccccc3s2)c(=O)c2cc(C(F)(F)F)ccc12 | 0.813559322033898! 24150 | O=C(O)Cc1nn(Cc2nc3ccccc3s2)c(=O)c2ccc(C(F)(F)F)cc12 | 0.813559322033898! (5 rows)! ! Time: 1222.426 ms! ! ! Notice that results come back in sorted order
  • 15. PostgreSQL integration Other functionality + chembl_17=# select mol_formula('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');! mol_formula ! ---------------! C19H12F3N3O3S! (1 row)! chembl_17=# select mol_logp('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12');! mol_logp ! ----------! 3.7004! (1 row)! chembl_17=# select mol_inchi('O=C(O)Cc1nn(Cc2nc3cc(C(F)(F)F)ccc3s2)c(=O)c2ccccc12'); mol_inchi ! ------------------------------------------------------------------------------------------ -----------------------------------------------! InChI=1S/C19H12F3N3O3S/ c20-19(21,22)10-5-6-15-14(7-10)23-16(29-15)9-25-18(28)12-4-2-1-3-11(12)13(24-25)8-17(26)27 /h1-7H,8-9H2,(H,26,27)! (1 row)! ! ! !
  • 16. PostgreSQL integration Other functionality + chembl_17=# select mol_to_ctab('CC'::mol);! mol_to_ctab ! -----------------------------------------------------------------------! +! RDKit 2D +! +! 2 1 0 0 0 0 0 0 0 0999 V2000 +! 0.0000 0.0000 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0+! 1.2990 0.7500 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0+! 1 2 1 0 +! M END +! ! (1 row)! ! ! !
  • 17. IPython notebok integration §  IPython: a very powerful interactive shell for python http://www.ipython.org §  IPython notebook: IPython in the browser, with graphics •  combines code and output in one place •  great tool for reproducible research •  Example notebook with graphics. §  RDKit integration: •  Display molecules, substructure matches, reactions, graphics from PyMOL +
  • 18. IPython notebook integration: Molecule tables http://rdkit.blogspot.ch/2014/02/more-on-datasets-ii.html +
  • 19. IPython notebook integration: Similarity Maps + Riniker, S. & Landrum, G. A. J Cheminf (2013). http://www.jcheminf.com/content/5/1/43
  • 21. Pandas integration §  Pandas: library for working with data tables in Python. Integrates well with matplotlib and ipython http://pandas.pydata.org/ §  RDKit integration: •  Load smiles tables or SD files into Pandas data tables •  Adds molecule columns to existing tables with smiles/SD columns •  Enables substructure filters on tables •  Integration with IPython notebook to render molecules +
  • 23. Lucene integration §  Still in the experimental stage §  Adds substructure search functionality with fingerprint screenout to Lucene §  Includes demo app for testing +
  • 24. RDKit in NIBR §  Extensive use by CADD, informaticians, and IT §  Lots of convenience code/wrappers for accessing internal data sources and tools §  Combined with the Avalon toolkit (another NIBR-supported open- source project), provides the underpinning for many of our global chemistry-based applications +
  • 25. The Avalon toolkit §  C/Java cheminformatics toolkit §  Primary author: Bernd Rohde (NIBRIT Basel) §  http://sourceforge.net/projects/avalontoolkit/ §  Functionality: •  Canonical SMILES •  Avalon fingerprint (highly optimized substructure fingerprint) •  Molecular standardization (STRUCHK) •  2D Coordinate generation •  Tomcat webapp for 2D rendering §  The RDKit has (optional) Python bindings for much of the functionality +
  • 26. RDKit in NIBR Case study 1: CIx Framework §  “Service bus” for cheminformatics/CADD services §  Handles format conversions for input/output automatically i.e. callers can provide SMILES input to a service/model wants CTABs with 3D coordinates §  Supports versioning of models/services §  Tight integration with scientific tools (e.g. Tibco Spotfire, Knime, Instant JChem, etc.) §  Enables trivial addition of “chemical intelligence” to web apps §  Makes it easy to globally deploy models: once a new model/service (or new version of a model/service) is registered with the Framework, it is instantly globally accessible +
  • 27. CIx Framework architecture Translation service - molecule format conversion - name lookup XML File exchange between engine and the Models Database to store Model information Model registration and Request service Web Model Registration Portal Front end Cix Tools Framework: Cix Tools Web Service -SOAP -REST Model Script Model Model Script Model Model Script Model Model Script Model CIX Tools Engine Data In one of the following formats: - TSV/CSV File - SMILES/CPD_NO - SD-File - DART query XML File exchange between engine and the Translation service Get the Model info from the Database Client - web app -  KNIME -  Spotfire -  IJC -  Python Java/Tomcat Python/Django Geographically diverse servers Most models are Python/Django +
  • 28. RDKit in NIBR Case study 2: Small-Molecule Registration §  Internally developed web application for compound registration §  C#-based web services writing to Oracle §  RDKit + Avalon toolkit for structure standardization §  RDKit + InChI used for structure-key calculation §  Calls out to CIx Framework for standard computed properties §  Independent (but validated) Python implementation of standardization and structure-key calculation for standalone use +
  • 29. RDKit in NIBR Case study 3: QSAR Toolkit §  Descriptor calculator providing access to all available internal descriptors §  Tools for pulling assay data from our data warehouse §  Standardized model-building §  Standardized reporting for evaluation and peer review §  Packaging for deployment via CIx Framework §  Model Watchdog: Pulls most recent data, generates predictions, creates report showing evolution of model accuracy over time +
  • 30. RDKit in NIBR Case study 4: Similarity Server §  Central PostgreSQL database with easily available compounds •  in-house available •  available from reliable vendors §  Kept up-to-date §  Substructure search §  Similarity search with various fingerprints: •  Avalon •  Morgan2, Morgan3, FeatMorgan2 •  Atom Pairs, Topological Torsions §  Web services interface §  Available to chemists via one of their standard desktop tools +
  • 32. Acknowledgements §  General: •  Remy Evard (NIBR/Informatics) •  Richard Lewis (NIBR/GDC) •  Tom Digby (NIBR/Legal) •  Peter Gedeck (NIBR/GDC) •  Nik Stiefl (NIBR/GDC) §  RDKit Community •  Roger Sayle (NextMove): PDB Parser •  Andrew Dalke (Dalke Scientific): FMCS •  Paolo Tosco (University of Turin): MMFF94, Open3DAlign •  Jameed Hussain (GSK): Fraggle, mmpa §  Pandas, scikit-learn: •  Sereina Riniker (NIBR/Informatics) •  Nikolas Fechner (NIBR/Informatics) http://www.rdkit.org §  Knime: •  Manuel Schwarze (NIBR/Informatics) •  Thorsten Meinl (knime.com) •  Bernd Wiswedel (knime.com) §  SMR •  Thomas Mueller (NIBR/Informatics) •  Thomas Veith (NIBR/Informatics) •  Dave Cotter (NIBR/Informatics) §  QSAR Toolkit: •  Peter Gedeck (NIBR/GDC) •  Nikolas Fechner (NIBR/Informatics) §  CIx Framework •  Sandra Mueller (NIBR/Informatics) •  Joerg Muehlbacher (NIBR/CPC) •  Riccardo Vianello (NIBR/Informatics) §  NIBR Open Source •  Ken Robbins (NIBR/Informatics) •  Dennis Jen (NIBR/Informatics) •  Mark Schreiber (NIBR/Informatics)
  • 33. Advertising 33 3rd RDKit User Group Meeting 22-24 October 2014 Merck KGaA, Darmstadt, Germany Talks, “talktorials”, lightning talks, social activities, and a hackathon on the 24th. Registration: http://goo.gl/z6QzwD Full announcement: http://goo.gl/ZUm2wm We’re looking for speakers. Please contact greg.landrum@gmail.com