RDKit: where did we come from and where are
we going?
Greg Landrum (@dr_greg_landrum)
12th International Conference on Chemical Structures
12 June, 2022
The Trustees of the CSA Trust are pleased to announce that
Greg Landrum has been awarded the 2022 Mike Lynch
Award, in recognition of his work on the development of
RDKit and his fostering of the community around it, a
transformative software resource for cheminformatics and
machine learning. https://csa-trust.org/2022/05/13/mike-lynch-award-2022-greg-landrum/
The purpose of the Award is to recognise and encourage outstanding
accomplishments in education, research and development activities that are
related to the systems and methods used to store, process and retrieve
information about chemical structures, reactions and properties.
The Mike Lynch Award will be presented at a prestigious, relevant conference
to be identified prior to each presentation and the awardee will be asked to
give a presentation at the conference. https://csa-trust.org/awards-and-grants/awards/
3
The RDKit
4
Acknowledgements
● Everyone who has contributed code, questions,
answers, bug reports, etc
● The people who manage RDKit packaging
● The organizers and sponsors of the RDKit
UGMs
● People who have funded RDKit development
(directly or indirectly)
● The others in our community who've been
pushing the idea and adoption of open source
5
An open source toolkit for cheminformatics
● Business-friendly BSD license
● Core data structures and algorithms in
C++
● Python 3.x wrapper generated using
Boost.Python
● Java and C# wrappers generated with
SWIG
● JavaScript wrappers
● CFFI wrapper for usage from other
languages
● 2D and 3D molecular operations
● Descriptor generation for machine
learning
● Molecular database cartridge for
PostgreSQL
● Cheminformatics nodes for KNIME
(distributed from the KNIME
community site:
http://www.knime.org/rdkit)
6
Ecosystem
Exact same implementation regardless of where you are using it from
7
Releases, reproducibility, and citability
● 2 feature releases per year
● ~monthly patch releases with bug fixes
● Every release is assigned a DOI and archived on Zenodo
https://zenodo.org/record/6483170
8
Packaging
- conda-forge: conda install -c conda-forge rdkit
- pypi: pip install rdkit-pypi
- npm: npm i @rdkit/rdkit
- apt: apt install python3-rdkit postgresql-14-rdkit
9
Sustainability: the bus problem
https://commons.wikimedia.org/wiki/File:Postauto_susten.jpg
10
Sustainability: the bus problem
RDKit maintainers:
- Greg
- Brian Kelley (Relay Therapeutics)
- Ricardo Rodriguez (Schrödinger)
- Paolo Tosco (Novartis)
Regular code contributors:
- David Cosgrove
- Peter Gedeck
- Gareth Jones
- Eisuke Kawashima
- Dan Nealschneider
- Sereina Riniker
- Roger Sayle
- Riccardo Vianello
The RDKit community
How it started…
The RDKit community
How it’s going…
Where we came from, where we’re going
14
The early days
● 2000-2006: initial development work at Rational Discovery
● 2006: code open sourced and released on sourceforge.net
15
Aside: some motivations for open-sourcing scientific code
● Recognition
● Helping the scientific community
● Feedback and help from others
● You get to keep using the code when you move on
to your next position
16
Some history
● 2000-2006: initial development work at Rational Discovery
● 2006: code open sourced and released on sourceforge.net
● 2007: First NIBR contribution (chemical reaction handling); Noel discovers the RDKit
● 2008: first POC of Java wrapper; Mac support added; SLN and Mol2 parsers
● 2009: Morgan fingerprints; switch to cmake; switch to VF2 for SSS
● 2010: PostgreSQL cartridge; First iteration of the KNIME nodes; $RDBASE/Contrib appears;
SaltRemover and FunctionalGroups code
● 2011: New Java wrappers; more functionality moved to C++; InChI support; AvalonTools
integration
● 2012: First UGM; Speed improvements; MCS implementation; IPython integration; “RDKit
Cookbook” appears
● 2013: Move to github; Pandas integration; MMFF and Open3DAlign support; PDB support;
rdkit blog started
17
Some history, continued
● 2014: python3 support; conda integration; experimental lucene integration; MCS implementation in
C++
● 2015: new drawing code; improved canonicalization algorithm; ETKDG; reduced memory usage
● 2016: Regular patch releases; easier builds; performance improvements; KNIME nodes move to
Github
● 2017: Modern C++; R-group decomposition, first GSoC participation, conda-forge packages
● 2018: CoordGen integration; molecular standardization
● 2019: Azure DevOps, substructure speedup, new molecule hashing code, Neo4J integration, new JS
wrappers
● 2020: new CIP implementation, scaffold network, abbreviations, tautomer-insensitive substructure
search
● 2021: rdkit-cffi, more drawing improvements, R-group decomposition improvements
● 2022: C++17, generics for searching, non-tetrahedral symmetry…
An aside…
19
Looking forward
20
Longer term RDKit objectives
● Improved support for other classes of molecules
■ Polymers
■ Organometallics
● Ensuring that the PostgreSQL cartridge is a plausible
candidate for use in a corporate “data warehouse”¹
● Ensuring all the pieces are in place to make it easy to
write a compound registration system
¹ or whatever such things are called these days
21
Future directions: the cartridge
Ensuring that the PostgreSQL cartridge is a plausible candidate
for use in a corporate “data warehouse”
- Integration of tautomer insensitive search
- Integration of the MolStandardize code
- Improvements to the chemical reaction handling
- Integration of the generics for searching
Further ideas
- Adding some 3D search capabilities
22
Future directions: registration systems
First: what is a chemical registration system?
23
Aside: Goals of a compound registration system
We want to be able to answer these questions:
- Have we seen this compound before?
- Give me a key for this compound
- Give me the structure for this key
So what do we need to be able to do?
- Standardize molecules
- Generate hashes/keys for standardized molecules
- Store structures
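The three questions and the three capabilities above fit into a very small data structure. Here is a minimal sketch, assuming structures are passed around as plain strings and that the caller supplies a `standardize` function and a `make_key` function. Both are hypothetical placeholders here; a real system would need chemistry-aware versions of each.

```python
from typing import Callable, Dict, Optional


class Registry:
    """Toy in-memory compound registry (illustration only)."""

    def __init__(self, standardize: Callable[[str], str],
                 make_key: Callable[[str], str]) -> None:
        self._standardize = standardize
        self._make_key = make_key
        self._structures: Dict[str, str] = {}  # key -> structure as submitted

    def key_for(self, structure: str) -> str:
        """Give me a key for this compound."""
        return self._make_key(self._standardize(structure))

    def seen(self, structure: str) -> bool:
        """Have we seen this compound before?"""
        return self.key_for(structure) in self._structures

    def register(self, structure: str) -> str:
        """Store the structure (the first submission wins) and return its key."""
        key = self.key_for(structure)
        self._structures.setdefault(key, structure)
        return key

    def structure_for(self, key: str) -> Optional[str]:
        """Give me the structure for this key."""
        return self._structures.get(key)


# Placeholder standardizer and key function: strip whitespace, then use the
# string itself as the key. Neither knows anything about chemistry.
registry = Registry(standardize=str.strip, make_key=lambda s: s)
key = registry.register(" CCO ")
```

Everything interesting lives in the two injected functions, which is why the standardization and hashing code get so much attention in the rest of the talk.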
25
Using keys for registration
Idea: use a hash to combine:
- The molecular structure (via a fixed H
InChI)
- A stereo code
- A stereo comment
https://github.com/rdkit/UGM_2015/blob/8f562e70add17bab35f43823af0f03673f8a1f2d/Presentations/KeyToRegistration.GregLandrum.pdf
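As a rough illustration of the idea, the three pieces can be fed into a standard hash function to get one fixed-length key. This is a sketch under assumptions, not the scheme from the linked presentation: the choice of SHA-256, the length-prefixed encoding, and the function name `registration_key` are all mine, and the stereo code and comment values are invented placeholders.

```python
import hashlib


def registration_key(structure: str, stereo_code: str, stereo_comment: str) -> str:
    """Combine three text fields into one fixed-length hexadecimal key."""
    fields = (structure, stereo_code, stereo_comment)
    # Length-prefix each field so the field boundaries stay unambiguous:
    # ("ab", "c", "") and ("a", "bc", "") produce different payloads.
    payload = "".join(f"{len(field)}:{field}" for field in fields)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Placeholder inputs: the standard InChI of ethanol stands in for a fixed-H
# InChI, and the stereo code and comment strings are made up for this example.
inchi = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
key_a = registration_key(inchi, "achiral", "")
key_b = registration_key(inchi, "achiral", "second batch")
```

The same structure with a different stereo comment gets a different key, which is the point of including the comment in the hash.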
26
Future directions: registration systems
Ensuring all the pieces are in place to make it easy to write a compound registration system
- Improvements to MolStandardize code
- Improvements to the molecular hashing code
- Support for other classes of molecules
27
Let’s talk about molecular identity
This isn’t just a topic for standard compound registration systems.
28
Molecular identity and computational questions
● Which molecules were used to generate this
result?
● Have I already done a calculation using this
molecule?
● Was this molecule part of my training set?
All of these require us to be able to answer
the question
“are these two molecules the same?”
Here be dragons…
29
Some things making molecular identity nontrivial
● Counterions, solvents
● Resonance forms
● Charges
● Tautomers
● Stereochemistry
Sometimes we care about these differences, sometimes we don’t. It depends on the context
in which we ask the question “are these two molecules the same?”
This is not a comprehensive list
31
Identity hashes for molecules
Idea: convert the molecule into some form which allows us to test whether or not it’s
identical to other molecules via a simple string (or numerical) comparison.
What “identical” means will be determined by the identity hash used.
Familiar examples:
- Canonical SMILES
- InChI
32
Contextual identity
Instead of having a single key/hash for a molecule, store a collection of layers with different
levels of detail/types of information. When searching, choose the layers which are relevant
for the current use case
● Store molecules using some relatively lossless format (e.g. v3000 SDF)
● Use molecular hashes capturing different levels of information to establish whether or
not duplicates exist
Note: it’s possible to do a limited version of this via careful manipulation of InChI strings
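Choosing the layers at query time can be sketched in a few lines. The example below assumes each molecule has already been reduced to a dictionary of layer name to string; the function name `same_under` is hypothetical, and the layer values are copied from the ethambutol example later in this talk.

```python
from typing import Dict, Iterable

Layers = Dict[str, str]


def same_under(a: Layers, b: Layers, layers: Iterable[str]) -> bool:
    """Return True if the two records agree on every requested layer."""
    return all(a[name] == b[name] for name in layers)


# Two records that differ only in how the stereochemistry is annotated.
plain = {
    "CANONICAL_SMILES": "CC[C@@H](CO)NCCN[C@@H](CC)CO",
    "NO_STEREO_SMILES": "CCC(CO)NCCNC(CC)CO",
}
and_group = {
    "CANONICAL_SMILES": "CC[C@@H](CO)NCCN[C@@H](CC)CO |&1:2,9|",
    "NO_STEREO_SMILES": "CCC(CO)NCCNC(CC)CO",
}

# The answer to "are these the same?" depends on which layers we ask about.
stereo_sensitive = same_under(plain, and_group, ["CANONICAL_SMILES"])
stereo_blind = same_under(plain, and_group, ["NO_STEREO_SMILES"])
```

The stored data never changes; only the list of layers passed to the comparison does.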
33
Some more identity hashes
https://www.nextmovesoftware.com/talks/OBoyle_MolHash_ACS_201908.pdf
Available in the RDKit since the 2019.09 release
34
Some of the basic identity hashes in rdMolHash
● Molecular formula
● Anonymous graph
● Element graph
● Murcko scaffold
● Tautomer
● Canonical smiles
There are many others
35
Hashes for registration
The team at Schrödinger¹ have contributed a new RDKit module for calculating layered
hashes which are useful for compound identity testing and registration. This will be in the
2022.09 release.
Layers it currently supports:
- Formula
- Canonical SMILES : with and without stereo
- Tautomer hash: with and without stereo
- Sgroup data (for some help with polymers and things like atropisomers)
- “Escape layer” (free text allowing a structure to be different even if everything else says
it’s the same)
¹ Chris Von Bargen, Hussein Faara, Dan Nealschneider, Ricardo Rodriguez, Rachel Walker
36
Registration hash example
{<HashLayer.CANONICAL_SMILES: 1>: 'COc1ccc2[nH]c([S@@](=O)Cc3ncc(C)c(OC)c3C)nc2c1',
<HashLayer.ESCAPE: 2>: '',
<HashLayer.FORMULA: 3>: 'C17H19N3O3S',
<HashLayer.NO_STEREO_SMILES: 4>: 'COc1ccc2[nH]c(S(=O)Cc3ncc(C)c(OC)c3C)nc2c1',
<HashLayer.NO_STEREO_TAUTOMER_HASH: 5>:
'CO[C]1[CH][CH][C]2[N][C]([S]([O])C[C]3[N][CH][C](C)[C](OC)[C]3C)[N][C]2[CH]1_1_0',
<HashLayer.SGROUP_DATA: 6>: '[]',
<HashLayer.TAUTOMER_HASH: 7>:
'CO[C]1[CH][CH][C]2[N][C]([S@@]([O])C[C]3[N][CH][C](C)[C](OC)[C]3C)[N][C]2[CH]1_1_0'}
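For indexing or lookups it is often convenient to collapse a chosen subset of these layers into a single digest. The sketch below shows one way to do that with the standard library. It is not necessarily how the RDKit module combines layers: the `HashLayer` enum here is a local stand-in that mirrors the names and numbers printed above, and `digest`, `ALL_LAYERS` and `STEREO_BLIND` are hypothetical names.

```python
import hashlib
import json
from enum import Enum


class HashLayer(Enum):
    # Local stand-in; names and numbers mirror the repr shown above.
    CANONICAL_SMILES = 1
    ESCAPE = 2
    FORMULA = 3
    NO_STEREO_SMILES = 4
    NO_STEREO_TAUTOMER_HASH = 5
    SGROUP_DATA = 6
    TAUTOMER_HASH = 7


def digest(layers, scheme):
    """Hash only the layers named in `scheme`, independent of their order."""
    ordered = sorted(scheme, key=lambda layer: layer.value)
    payload = json.dumps([[layer.name, layers[layer]] for layer in ordered])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Layer values copied from the example above.
record = {
    HashLayer.CANONICAL_SMILES: "COc1ccc2[nH]c([S@@](=O)Cc3ncc(C)c(OC)c3C)nc2c1",
    HashLayer.ESCAPE: "",
    HashLayer.FORMULA: "C17H19N3O3S",
    HashLayer.NO_STEREO_SMILES: "COc1ccc2[nH]c(S(=O)Cc3ncc(C)c(OC)c3C)nc2c1",
    HashLayer.NO_STEREO_TAUTOMER_HASH:
        "CO[C]1[CH][CH][C]2[N][C]([S]([O])C[C]3[N][CH][C](C)[C](OC)[C]3C)[N][C]2[CH]1_1_0",
    HashLayer.SGROUP_DATA: "[]",
    HashLayer.TAUTOMER_HASH:
        "CO[C]1[CH][CH][C]2[N][C]([S@@]([O])C[C]3[N][CH][C](C)[C](OC)[C]3C)[N][C]2[CH]1_1_0",
}

ALL_LAYERS = tuple(HashLayer)
STEREO_BLIND = (HashLayer.FORMULA, HashLayer.NO_STEREO_SMILES,
                HashLayer.NO_STEREO_TAUTOMER_HASH)

full_key = digest(record, ALL_LAYERS)
blind_key = digest(record, STEREO_BLIND)
```

Storing one digest per scheme next to the raw layers keeps lookups cheap while still allowing a layer-by-layer comparison when two records need to be told apart.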
37
Handling tautomers
{<HashLayer.CANONICAL_SMILES: 1>:
'CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2c[nH]c3ncc(-c4ccc(Cl)cc4)cc23)c1F',
<HashLayer.ESCAPE: 2>: '',
<HashLayer.FORMULA: 3>: 'C23H18ClF2N3O3S',
…
<HashLayer.TAUTOMER_HASH: 7>:
'CCCS([O])([O])[N][C]1[CH][CH][C](F)[C]([C]([O])[C]2[CH][N][C]3[N][CH][C]([C]4[CH][CH][C](Cl)[CH][CH]4)[CH][C]32)[C]1F_2_0'}
{<HashLayer.CANONICAL_SMILES: 1>:
'CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2cnc3[nH]cc(-c4ccc(Cl)cc4)cc2-3)c1F',
<HashLayer.ESCAPE: 2>: '',
<HashLayer.FORMULA: 3>: 'C23H18ClF2N3O3S',
…
<HashLayer.TAUTOMER_HASH: 7>:
'CCCS([O])([O])[N][C]1[CH][CH][C](F)[C]([C]([O])[C]2[CH][N][C]3[N][CH][C]([C]4[CH][CH][C](Cl)[CH][CH]4)[CH][C]32)[C]1F_2_0'}
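A quick way to see what the two records above have in common is to compare them layer by layer. This is a small sketch using plain dictionaries with string keys; `compare_layers` is a hypothetical helper, and the values are copied from the two records with the line wraps removed.

```python
def compare_layers(a, b):
    """Split the shared layer names into those that agree and those that differ."""
    shared = sorted(set(a) & set(b))
    agree = [name for name in shared if a[name] == b[name]]
    differ = [name for name in shared if a[name] != b[name]]
    return agree, differ


TAUTOMER_HASH = (
    "CCCS([O])([O])[N][C]1[CH][CH][C](F)[C]([C]([O])[C]2[CH][N][C]3[N][CH]"
    "[C]([C]4[CH][CH][C](Cl)[CH][CH]4)[CH][C]32)[C]1F_2_0"
)

tautomer_1 = {
    "CANONICAL_SMILES": "CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2c[nH]c3ncc(-c4ccc(Cl)cc4)cc23)c1F",
    "FORMULA": "C23H18ClF2N3O3S",
    "TAUTOMER_HASH": TAUTOMER_HASH,
}
tautomer_2 = {
    "CANONICAL_SMILES": "CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2cnc3[nH]cc(-c4ccc(Cl)cc4)cc2-3)c1F",
    "FORMULA": "C23H18ClF2N3O3S",
    "TAUTOMER_HASH": TAUTOMER_HASH,
}

agree, differ = compare_layers(tautomer_1, tautomer_2)
```

The canonical SMILES layer tells the two tautomers apart, while the formula and tautomer hash layers treat them as the same compound.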
38
Handling atropisomers
Structures from: https://doi.org/10.1016/j.xphs.2021.10.011
The bold and hashed bonds are just drawing features and don’t survive translation
to things like CXSMILES or mol files. But we can use S groups to indicate the
stereochemistry
40
Handling atropisomers
Structures from: https://doi.org/10.1016/j.xphs.2021.10.011
{<HashLayer.CANONICAL_SMILES: 1>:
'COc1cc2ncc3c(c2cc1-c1cn(C)nc1C)n(-c1c(F)cncc1OC)c(=O)n3C',
<HashLayer.ESCAPE: 2>: '',
<HashLayer.FORMULA: 3>: 'C23H21FN6O3',
…
<HashLayer.SGROUP_DATA: 6>:
'[{"fieldName": "atropisomer", "atom": [19, 20], "bonds": [], "value": "M"}]',
…}
{<HashLayer.CANONICAL_SMILES: 1>:
'COc1cc2ncc3c(c2cc1-c1cn(C)nc1C)n(-c1c(F)cncc1OC)c(=O)n3C',
<HashLayer.ESCAPE: 2>: '',
<HashLayer.FORMULA: 3>: 'C23H21FN6O3',
…
<HashLayer.SGROUP_DATA: 6>:
'[{"fieldName": "atropisomer", "atom": [19, 20], "bonds": [], "value": "P"}]',
…}
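The SGroup layer in these records is a JSON string, so it can be inspected with the standard library. A minimal sketch, assuming the layout shown above (a list of objects with `fieldName` and `value` entries); `sgroup_values` is a hypothetical helper name.

```python
import json


def sgroup_values(sgroup_layer: str, field_name: str):
    """Return the 'value' of every SGroup entry with the given fieldName."""
    return [entry["value"] for entry in json.loads(sgroup_layer)
            if entry.get("fieldName") == field_name]


# SGroup layers copied from the two atropisomer records above.
layer_m = '[{"fieldName": "atropisomer", "atom": [19, 20], "bonds": [], "value": "M"}]'
layer_p = '[{"fieldName": "atropisomer", "atom": [19, 20], "bonds": [], "value": "P"}]'

label_m = sgroup_values(layer_m, "atropisomer")
label_p = sgroup_values(layer_p, "atropisomer")
```

For identity testing a plain string comparison of the two layers is enough; parsing only matters when you want to report why two records differ.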
41
Handling polymers
{<HashLayer.CANONICAL_SMILES: 1>: '*c1cnc(*)s1',
…,
<HashLayer.SGROUP_DATA: 6>:
'[{"type": "SRU", "atoms": [1, 2, 3, 4, 6], "bonds": [[0, 1], [4, 5]], "index": 1, "connect": "HT", "label": "n"}]',
…}
{<HashLayer.CANONICAL_SMILES: 1>: '*c1cnc(*)s1',
…,
<HashLayer.SGROUP_DATA: 6>:
'[{"type": "SRU", "atoms": [1, 2, 3, 4, 6], "bonds": [[0, 1], [4, 5]], "index": 1, "connect": "HH", "label": "n"}]',
…}
42
Handling enhanced stereochemistry
Ethambutol
These two describe the same racemic mixture
43
Handling enhanced stereochemistry
{<HashLayer.CANONICAL_SMILES: 1>:
'CC[C@@H](CO)NCCN[C@@H](CC)CO',
…,
<HashLayer.NO_STEREO_SMILES: 4>:
'CCC(CO)NCCNC(CC)CO',
…}
{<HashLayer.CANONICAL_SMILES: 1>:
'CC[C@@H](CO)NCCN[C@@H](CC)CO |&1:2,9|',
…,
<HashLayer.NO_STEREO_SMILES: 4>:
'CCC(CO)NCCNC(CC)CO',
…}
We get the same hash if the molecule is drawn with
wedged bonds.
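The `|&1:2,9|` suffix in the second record is a CXSMILES extension: `&1` is an AND stereo group numbered 1, and `2,9` are the indices of the atoms in it (an OR group is written with `o` instead of `&`). The sketch below pulls that information out with the standard library. It handles only this simple case, not the full CXSMILES syntax, and the function names are hypothetical.

```python
import re

# Matches one enhanced-stereo group such as "&1:2,9" or "o1:2,9".
_GROUP = re.compile(r"([&o])(\d+):(\d+(?:,\d+)*)")


def split_cxsmiles(text):
    """Split 'SMILES |extension|' into (smiles, extension); extension may be ''."""
    smiles, _, rest = text.partition(" |")
    return smiles, rest.rstrip("|")


def stereo_groups(extension):
    """List (kind, group id, atom indices) for each AND/OR stereo group."""
    kinds = {"&": "AND", "o": "OR"}
    return [(kinds[kind], int(number), [int(i) for i in atoms.split(",")])
            for kind, number, atoms in _GROUP.findall(extension)]


smiles, extension = split_cxsmiles("CC[C@@H](CO)NCCN[C@@H](CC)CO |&1:2,9|")
groups = stereo_groups(extension)
```

Atoms 2 and 9 are the two stereocenters of ethambutol, so the AND group says that their configurations are only known relative to each other.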
44
Using the escape layer
Suppose I start with the racemic mixture, run it through a chiral column, and
collect the two fractions
I want to register the two fractions separately without determining the absolute
stereochemistry
45
Using the escape layer
{<HashLayer.CANONICAL_SMILES: 1>:
'CC[C@@H](CO)NCCN[C@@H](CC)CO |o1:2,9|',
<HashLayer.ESCAPE: 2>: 'first fraction',
…}
{<HashLayer.CANONICAL_SMILES: 1>:
'CC[C@@H](CO)NCCN[C@@H](CC)CO |o1:2,9|',
<HashLayer.ESCAPE: 2>: 'second fraction',
…}
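The effect of the escape layer on duplicate detection can be sketched by counting distinct identities with and without it. The records below reuse the two layer values shown above; the third submission is a hypothetical resubmission of the first fraction, and `identity` is a hypothetical helper.

```python
from collections import Counter

STRUCTURE = "CC[C@@H](CO)NCCN[C@@H](CC)CO |o1:2,9|"

submissions = [
    {"CANONICAL_SMILES": STRUCTURE, "ESCAPE": "first fraction"},
    {"CANONICAL_SMILES": STRUCTURE, "ESCAPE": "second fraction"},
    # Hypothetical resubmission of the first fraction.
    {"CANONICAL_SMILES": STRUCTURE, "ESCAPE": "first fraction"},
]


def identity(record, use_escape=True):
    """Build a hashable identity from the structure and, optionally, the escape layer."""
    names = ["CANONICAL_SMILES"] + (["ESCAPE"] if use_escape else [])
    return tuple(record[name] for name in names)


with_escape = Counter(identity(r) for r in submissions)
without_escape = Counter(identity(r, use_escape=False) for r in submissions)
```

With the escape layer the two fractions register as two compounds and the resubmission is caught as a duplicate; without it all three collapse into one.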
46
Aside: using the escape layer for comp chem
{…
<HashLayer.ESCAPE: 2>: 'conformer 1',
…}
{…
<HashLayer.ESCAPE: 2>: 'conformer 2',
…}
Suppose I want to store multiple conformers/poses of the same molecule
47
Wrapping up: molecular identity
● For many computational tasks we want to be
able to figure out whether or not we have
seen/used a particular molecule
● The definition of “same” for molecules
depends on the context/question being asked
● Layered registration hashes make it easy (and
cheap) to store sets of molecules and answer
the context-dependent “are these the same?”
question
48
Thanks!
Thanks!

More Related Content

What's hot

Red Hat OpenShift Container Storage
Red Hat OpenShift Container StorageRed Hat OpenShift Container Storage
Red Hat OpenShift Container Storage
Takuya Utsunomiya
 
PostgreSQLをKubernetes上で活用するためのOperator紹介!(Cloud Native Database Meetup #3 発表資料)
PostgreSQLをKubernetes上で活用するためのOperator紹介!(Cloud Native Database Meetup #3 発表資料)PostgreSQLをKubernetes上で活用するためのOperator紹介!(Cloud Native Database Meetup #3 発表資料)
PostgreSQLをKubernetes上で活用するためのOperator紹介!(Cloud Native Database Meetup #3 発表資料)
NTT DATA Technology & Innovation
 
Profiling deep learning network using NVIDIA nsight systems
Profiling deep learning network using NVIDIA nsight systemsProfiling deep learning network using NVIDIA nsight systems
Profiling deep learning network using NVIDIA nsight systems
Jack (Jaegeun) Han
 
TVMの次期グラフIR Relayの紹介
TVMの次期グラフIR Relayの紹介TVMの次期グラフIR Relayの紹介
TVMの次期グラフIR Relayの紹介
Takeo Imai
 
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
Preferred Networks
 
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
MITSUNARI Shigeo
 
GPGPU Seminar (PyCUDA)
GPGPU Seminar (PyCUDA)GPGPU Seminar (PyCUDA)
GPGPU Seminar (PyCUDA)
智啓 出川
 
ストリーム処理勉強会 大規模mqttを支える技術
ストリーム処理勉強会 大規模mqttを支える技術ストリーム処理勉強会 大規模mqttを支える技術
ストリーム処理勉強会 大規模mqttを支える技術
Keigo Suda
 
Python用ゲームエンジンPyxelで遊んでみた
Python用ゲームエンジンPyxelで遊んでみたPython用ゲームエンジンPyxelで遊んでみた
Python用ゲームエンジンPyxelで遊んでみた
Hirofumi Watanabe
 
オープンソースライセンスの基礎と実務
オープンソースライセンスの基礎と実務オープンソースライセンスの基礎と実務
オープンソースライセンスの基礎と実務
Yutaka Kachi
 
20221021_JP5.0.2-Webinar-JP_Final.pdf
20221021_JP5.0.2-Webinar-JP_Final.pdf20221021_JP5.0.2-Webinar-JP_Final.pdf
20221021_JP5.0.2-Webinar-JP_Final.pdf
NVIDIA Japan
 
暗号技術の実装と数学
暗号技術の実装と数学暗号技術の実装と数学
暗号技術の実装と数学
MITSUNARI Shigeo
 
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
NTT DATA Technology & Innovation
 
Chainerで流体計算
Chainerで流体計算Chainerで流体計算
Chainerで流体計算
Preferred Networks
 
リアルタイムサーバー 〜Erlang/OTPで作るPubSubサーバー〜
リアルタイムサーバー 〜Erlang/OTPで作るPubSubサーバー〜 リアルタイムサーバー 〜Erlang/OTPで作るPubSubサーバー〜
リアルタイムサーバー 〜Erlang/OTPで作るPubSubサーバー〜
Yugo Shimizu
 
PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜
PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜
PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜
Preferred Networks
 
Xbyakの紹介とその周辺
Xbyakの紹介とその周辺Xbyakの紹介とその周辺
Xbyakの紹介とその周辺
MITSUNARI Shigeo
 
Under The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database ArchitectureUnder The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database Architecture
ScyllaDB
 
データ分析基盤を支えるエンジニアリング
データ分析基盤を支えるエンジニアリングデータ分析基盤を支えるエンジニアリング
データ分析基盤を支えるエンジニアリング
Recruit Lifestyle Co., Ltd.
 
Moving computation to the data (1)
Moving computation to the data (1)Moving computation to the data (1)
Moving computation to the data (1)
Kazunori Sato
 

What's hot (20)

Red Hat OpenShift Container Storage
Red Hat OpenShift Container StorageRed Hat OpenShift Container Storage
Red Hat OpenShift Container Storage
 
PostgreSQLをKubernetes上で活用するためのOperator紹介!(Cloud Native Database Meetup #3 発表資料)
PostgreSQLをKubernetes上で活用するためのOperator紹介!(Cloud Native Database Meetup #3 発表資料)PostgreSQLをKubernetes上で活用するためのOperator紹介!(Cloud Native Database Meetup #3 発表資料)
PostgreSQLをKubernetes上で活用するためのOperator紹介!(Cloud Native Database Meetup #3 発表資料)
 
Profiling deep learning network using NVIDIA nsight systems
Profiling deep learning network using NVIDIA nsight systemsProfiling deep learning network using NVIDIA nsight systems
Profiling deep learning network using NVIDIA nsight systems
 
TVMの次期グラフIR Relayの紹介
TVMの次期グラフIR Relayの紹介TVMの次期グラフIR Relayの紹介
TVMの次期グラフIR Relayの紹介
 
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
Kubernetes ControllerをScale-Outさせる方法 / Kubernetes Meetup Tokyo #55
 
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
深層学習フレームワークにおけるIntel CPU/富岳向け最適化法
 
GPGPU Seminar (PyCUDA)
GPGPU Seminar (PyCUDA)GPGPU Seminar (PyCUDA)
GPGPU Seminar (PyCUDA)
 
ストリーム処理勉強会 大規模mqttを支える技術
ストリーム処理勉強会 大規模mqttを支える技術ストリーム処理勉強会 大規模mqttを支える技術
ストリーム処理勉強会 大規模mqttを支える技術
 
Python用ゲームエンジンPyxelで遊んでみた
Python用ゲームエンジンPyxelで遊んでみたPython用ゲームエンジンPyxelで遊んでみた
Python用ゲームエンジンPyxelで遊んでみた
 
オープンソースライセンスの基礎と実務
オープンソースライセンスの基礎と実務オープンソースライセンスの基礎と実務
オープンソースライセンスの基礎と実務
 
20221021_JP5.0.2-Webinar-JP_Final.pdf
20221021_JP5.0.2-Webinar-JP_Final.pdf20221021_JP5.0.2-Webinar-JP_Final.pdf
20221021_JP5.0.2-Webinar-JP_Final.pdf
 
暗号技術の実装と数学
暗号技術の実装と数学暗号技術の実装と数学
暗号技術の実装と数学
 
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
JavaでCPUを使い倒す! ~Java 9 以降の CPU 最適化を覗いてみる~(NTTデータ テクノロジーカンファレンス 2019 講演資料、2019...
 
Chainerで流体計算
Chainerで流体計算Chainerで流体計算
Chainerで流体計算
 
リアルタイムサーバー 〜Erlang/OTPで作るPubSubサーバー〜
リアルタイムサーバー 〜Erlang/OTPで作るPubSubサーバー〜 リアルタイムサーバー 〜Erlang/OTPで作るPubSubサーバー〜
リアルタイムサーバー 〜Erlang/OTPで作るPubSubサーバー〜
 
PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜
PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜
PFN のオンプレML基盤の取り組み / オンプレML基盤 on Kubernetes 〜PFN、ヤフー〜
 
Xbyakの紹介とその周辺
Xbyakの紹介とその周辺Xbyakの紹介とその周辺
Xbyakの紹介とその周辺
 
Under The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database ArchitectureUnder The Hood Of A Shard-Per-Core Database Architecture
Under The Hood Of A Shard-Per-Core Database Architecture
 
データ分析基盤を支えるエンジニアリング
データ分析基盤を支えるエンジニアリングデータ分析基盤を支えるエンジニアリング
データ分析基盤を支えるエンジニアリング
 
Moving computation to the data (1)
Moving computation to the data (1)Moving computation to the data (1)
Moving computation to the data (1)
 

Similar to Mike Lynch Award Lecture, ICCS 2022

ACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformaticsACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformatics
Greg Landrum
 
Querying a Complex Web-Based KB for Cultural Heritage Preservation
Querying a Complex Web-Based KB  for Cultural Heritage PreservationQuerying a Complex Web-Based KB  for Cultural Heritage Preservation
Querying a Complex Web-Based KB for Cultural Heritage Preservation
Ester Giallonardo
 
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
Alexandre Gouaillard
 
Maintaining and Releasing Open Source Software
Maintaining and Releasing Open Source SoftwareMaintaining and Releasing Open Source Software
Maintaining and Releasing Open Source Software
Joel Nothman
 
The path to an hybrid open source paradigm
The path to an hybrid open source paradigmThe path to an hybrid open source paradigm
The path to an hybrid open source paradigm
Jonathan Challener
 
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering
 
Primers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code ReviewPrimers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code Review
Delft University of Technology
 
Open Chemistry: Input Preparation, Data Visualization & Analysis
Open Chemistry: Input Preparation, Data Visualization & AnalysisOpen Chemistry: Input Preparation, Data Visualization & Analysis
Open Chemistry: Input Preparation, Data Visualization & Analysis
Marcus Hanwell
 
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
sparkfabrik
 
Docs as Code: Publishing Processes for API Experiences
Docs as Code: Publishing Processes for API ExperiencesDocs as Code: Publishing Processes for API Experiences
Docs as Code: Publishing Processes for API Experiences
Anne Gentle
 
Continuous Security for GitOps
Continuous Security for GitOpsContinuous Security for GitOps
Continuous Security for GitOps
Weaveworks
 
OpenChain Mini-Summit May 2023
OpenChain Mini-Summit May 2023OpenChain Mini-Summit May 2023
OpenChain Mini-Summit May 2023
Shane Coughlan
 
Service computation20.ppt
Service computation20.pptService computation20.ppt
Service computation20.ppt
Yann-Gaël Guéhéneuc
 
BlockchainLAB Hackathon
BlockchainLAB HackathonBlockchainLAB Hackathon
BlockchainLAB Hackathon
Aleksandr Kopnin
 
Workshop - Architecting Innovative Graph Applications- GraphSummit Milan
Workshop -  Architecting Innovative Graph Applications- GraphSummit MilanWorkshop -  Architecting Innovative Graph Applications- GraphSummit Milan
Workshop - Architecting Innovative Graph Applications- GraphSummit Milan
Neo4j
 
Not all open source is the same
Not all open source is the sameNot all open source is the same
Not all open source is the same
EDB
 
PRO TALK - Kubernetes Security Workshop.pdf
PRO TALK - Kubernetes Security Workshop.pdfPRO TALK - Kubernetes Security Workshop.pdf
PRO TALK - Kubernetes Security Workshop.pdf
AvinashDesireddy
 
Kubernetes Security Workshop
Kubernetes Security WorkshopKubernetes Security Workshop
Kubernetes Security Workshop
Mirantis
 
Juni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_Insights
Juni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_InsightsJuni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_Insights
Juni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_Insights
TriNimbus
 
ExSchema - ICSM'13
ExSchema - ICSM'13ExSchema - ICSM'13
ExSchema - ICSM'13
jccastrejon
 

Similar to Mike Lynch Award Lecture, ICCS 2022 (20)

ACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformaticsACS San Diego - The RDKit: Open-source cheminformatics
ACS San Diego - The RDKit: Open-source cheminformatics
 
Querying a Complex Web-Based KB for Cultural Heritage Preservation
Querying a Complex Web-Based KB  for Cultural Heritage PreservationQuerying a Complex Web-Based KB  for Cultural Heritage Preservation
Querying a Complex Web-Based KB for Cultural Heritage Preservation
 
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
IIT-RTC 2017 Qt WebRTC Tutorial (Qt Janus Client)
 
Maintaining and Releasing Open Source Software
Maintaining and Releasing Open Source SoftwareMaintaining and Releasing Open Source Software
Maintaining and Releasing Open Source Software
 
The path to an hybrid open source paradigm
The path to an hybrid open source paradigmThe path to an hybrid open source paradigm
The path to an hybrid open source paradigm
 
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
Boston Data Engineering: Kedro Python Framework for Data Science: Overview an...
 
Primers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code ReviewPrimers or Reminders? The Effects of Existing Review Comments on Code Review
Primers or Reminders? The Effects of Existing Review Comments on Code Review
 
Open Chemistry: Input Preparation, Data Visualization & Analysis
Open Chemistry: Input Preparation, Data Visualization & AnalysisOpen Chemistry: Input Preparation, Data Visualization & Analysis
Open Chemistry: Input Preparation, Data Visualization & Analysis
 
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
Drupal Dev Days Vienna 2023 - What is the secure software supply chain and th...
 
Docs as Code: Publishing Processes for API Experiences
Docs as Code: Publishing Processes for API ExperiencesDocs as Code: Publishing Processes for API Experiences
Docs as Code: Publishing Processes for API Experiences
 
Continuous Security for GitOps
Continuous Security for GitOpsContinuous Security for GitOps
Continuous Security for GitOps
 
OpenChain Mini-Summit May 2023
OpenChain Mini-Summit May 2023OpenChain Mini-Summit May 2023
OpenChain Mini-Summit May 2023
 
Service computation20.ppt
Service computation20.pptService computation20.ppt
Service computation20.ppt
 
BlockchainLAB Hackathon
BlockchainLAB HackathonBlockchainLAB Hackathon
BlockchainLAB Hackathon
 
Workshop - Architecting Innovative Graph Applications- GraphSummit Milan
Workshop -  Architecting Innovative Graph Applications- GraphSummit MilanWorkshop -  Architecting Innovative Graph Applications- GraphSummit Milan
Workshop - Architecting Innovative Graph Applications- GraphSummit Milan
 
Not all open source is the same
Not all open source is the sameNot all open source is the same
Not all open source is the same
 
PRO TALK - Kubernetes Security Workshop.pdf
PRO TALK - Kubernetes Security Workshop.pdfPRO TALK - Kubernetes Security Workshop.pdf
PRO TALK - Kubernetes Security Workshop.pdf
 
Kubernetes Security Workshop
Kubernetes Security WorkshopKubernetes Security Workshop
Kubernetes Security Workshop
 
Juni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_Insights
Juni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_InsightsJuni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_Insights
Juni_Mukherjee_The_DevSecOps_Journey_AntiPatterns_Analytics_and_Insights
 
ExSchema - ICSM'13
ExSchema - ICSM'13ExSchema - ICSM'13
ExSchema - ICSM'13
 

More from Greg Landrum

Chemical registration
Chemical registrationChemical registration
Chemical registration
Greg Landrum
 
Google BigQuery for analysis of scientific datasets: Interactive exploration ...
Google BigQuery for analysis of scientific datasets: Interactive exploration ...Google BigQuery for analysis of scientific datasets: Interactive exploration ...
Google BigQuery for analysis of scientific datasets: Interactive exploration ...
Greg Landrum
 
Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)
Greg Landrum
 
Moving from Artisanal to Industrial Machine Learning
Moving from Artisanal to Industrial Machine LearningMoving from Artisanal to Industrial Machine Learning
Moving from Artisanal to Industrial Machine Learning
Greg Landrum
 
Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)Building useful models for imbalanced datasets (without resampling)
Building useful models for imbalanced datasets (without resampling)
Greg Landrum
 
Let’s talk about reproducible data analysis
Let’s talk about reproducible data analysisLet’s talk about reproducible data analysis
Let’s talk about reproducible data analysis
Greg Landrum
 
How Do You Build and Validate 1500 Models and What Can You Learn from Them?
Mike Lynch Award Lecture, ICCS 2022

  • 1. RDKit: where did we come from and where are we going? Greg Landrum (@dr_greg_landrum) 12th International Conference on Chemical Structures 12 June, 2022
  • 2. The Trustees of the CSA Trust are pleased to announce that Greg Landrum has been awarded the 2022 Mike Lynch Award, in recognition of his work on the development of RDKit and his fostering of the community around it, a transformative software resource for cheminformatics and machine learning. https://csa-trust.org/2022/05/13/mike-lynch-award-2022-greg-landrum/ The purpose of the Award is to recognise and encourage outstanding accomplishments in education, research and development activities that are related to the systems and methods used to store, process and retrieve information about chemical structures, reactions and properties. The Mike Lynch Award will be presented at a prestigious, relevant conference to be identified prior to each presentation and the awardee will be asked to give a presentation at the conference. https://csa-trust.org/awards-and-grants/awards/
  • 4. Acknowledgements ● Everyone who has contributed code, questions, answers, bug reports, etc. ● The people who manage RDKit packaging ● The organizers and sponsors of the RDKit UGMs ● People who have funded RDKit development (directly or indirectly) ● The others in our community who've been pushing the idea and adoption of open source
  • 5. An open source toolkit for cheminformatics ● Business-friendly BSD license ● Core data structures and algorithms in C++ ● Python 3.x wrapper generated using Boost.Python ● Java and C# wrappers generated with SWIG ● JavaScript wrappers ● CFFI wrapper for usage from other languages ● 2D and 3D molecular operations ● Descriptor generation for machine learning ● Molecular database cartridge for PostgreSQL ● Cheminformatics nodes for KNIME (distributed from the KNIME community site: http://www.knime.org/rdkit)
  • 6. Ecodesystem: exact same implementation regardless of where you are using it from
  • 7. Releases, reproducibility, and citability ● 2 feature releases per year ● ~monthly patch releases with bug fixes ● Every release is assigned a DOI and archived on Zenodo https://zenodo.org/record/6483170
  • 8. Packaging - conda-forge: conda install -c conda-forge rdkit - PyPI: pip install rdkit-pypi - npm: npm i @rdkit/rdkit - apt: apt install python3-rdkit postgresql-14-rdkit
  • 9. Sustainability: the bus problem https://commons.wikimedia.org/wiki/File:Postauto_susten.jpg
  • 10. Sustainability: the bus problem RDKit maintainers: - Greg - Brian Kelley (Relay Therapeutics) - Ricardo Rodriguez (Schrödinger) - Paolo Tosco (Novartis) Regular code contributors: - David Cosgrove - Peter Gedeck - Gareth Jones - Eisuke Kawashima - Dan Nealschneider - Sereina Riniker - Roger Sayle - Riccardo Vianello
  • 11. The RDKit community How it started…
  • 12. The RDKit community How it’s going…
  • 13. Where we came from, where we’re going
  • 14. The early days ● 2000-2006: initial development work at Rational Discovery ● 2006: code open sourced and released on sourceforge.net
  • 15. Aside: some motivations for open-sourcing scientific code ● Recognition ● Helping the scientific community ● Feedback and help from others ● You get to keep using the code when you move on to your next position
  • 16. Some history ● 2000-2006: initial development work at Rational Discovery ● 2006: code open sourced and released on sourceforge.net ● 2007: First NIBR contribution (chemical reaction handling); Noel discovers the RDKit ● 2008: first POC of Java wrapper; Mac support added; SLN and Mol2 parsers ● 2009: Morgan fingerprints; switch to cmake; switch to VF2 for SSS ● 2010: PostgreSQL cartridge; First iteration of the KNIME nodes; $RDBASE/Contrib appears; SaltRemover and FunctionalGroups code ● 2011: New Java wrappers; more functionality moved to C++; InChI support; AvalonTools integration ● 2012: First UGM; Speed improvements; MCS implementation; IPython integration; “RDKit Cookbook” appears ● 2013: Move to github; Pandas integration; MMFF and Open3DAlign support; PDB support; rdkit blog started
  • 17. Some history, cont'd ● 2014: python3 support; conda integration; experimental lucene integration; MCS implementation in C++ ● 2015: new drawing code; improved canonicalization algorithm; ETKDG; reduced memory usage ● 2016: Regular patch releases; easier builds; performance improvements; KNIME nodes move to Github ● 2017: Modern C++; R-group decomposition, first GSoC participation, conda-forge packages ● 2018: CoordGen integration; molecular standardization ● 2019: Azure DevOps, substructure speedup, new molecule hashing code, Neo4J integration, new JS wrappers ● 2020: new CIP implementation, scaffold network, abbreviations, tautomer-insensitive substructure search ● 2021: rdkit-cffi, more drawing improvements, R-group decomposition improvements ● 2022: C++17, generics for searching, non-tetrahedral symmetry…
  • 20. Longer term RDKit objectives ● Improved support for other classes of molecules ■ Polymers ■ Organometallics ● Ensuring that the PostgreSQL cartridge is a plausible candidate for use in a corporate “data warehouse” (or whatever such things are called these days) ● Ensuring all the pieces are in place to make it easy to write a compound registration system
  • 21. Future directions: the cartridge Ensuring that the PostgreSQL cartridge is a plausible candidate for use in a corporate “data warehouse” - Integration of tautomer-insensitive search - Integration of the MolStandardize code - Improvements to the chemical reaction handling - Integration of the generics for searching Further ideas - Adding some 3D search capabilities
  • 22. Future directions: registration systems First: what is a chemical registration system?
  • 23. Aside: Goals of a compound registration system We want to be able to answer these questions: - Have we seen this compound before? - Give me a key for this compound - Give me the structure for this key
  • 24. Aside: Goals of a compound registration system We want to be able to answer these questions: - Have we seen this compound before? - Give me a key for this compound - Give me the structure for this key So what do we need to be able to do? - Standardize molecules - Generate hashes/keys for standardized molecules - Store structures
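The three questions on this slide can be sketched as a very small in-memory registry. This is only an illustration under stated assumptions: `TinyRegistry` and its methods are hypothetical names (not an RDKit API), the input is assumed to be an already standardized, canonical structure string, and a truncated SHA-256 digest stands in for whatever key scheme a real system would use.

```python
import hashlib


class TinyRegistry:
    """Hypothetical in-memory sketch of a compound registry (not an RDKit API)."""

    def __init__(self):
        # Maps key -> standardized structure string.
        self._key_to_structure = {}

    @staticmethod
    def make_key(standardized_structure: str) -> str:
        # Assumption: the caller has already standardized the structure,
        # so equal strings mean "same compound" for this registry.
        digest = hashlib.sha256(standardized_structure.encode("utf-8"))
        return digest.hexdigest()[:16]

    def seen_before(self, standardized_structure: str) -> bool:
        # "Have we seen this compound before?"
        return self.make_key(standardized_structure) in self._key_to_structure

    def register(self, standardized_structure: str) -> str:
        # "Give me a key for this compound" (stores the structure on first use).
        key = self.make_key(standardized_structure)
        self._key_to_structure.setdefault(key, standardized_structure)
        return key

    def structure_for(self, key: str) -> str:
        # "Give me the structure for this key"
        return self._key_to_structure[key]


registry = TinyRegistry()
smiles = "CCC(CO)NCCNC(CC)CO"  # stereo-free ethambutol SMILES shown later in the talk
key = registry.register(smiles)
print(registry.seen_before(smiles), registry.structure_for(key) == smiles)
```

Registering the same string twice returns the same key, so the registry never stores a duplicate; everything interesting in practice happens in the standardization and hashing steps that this sketch takes as given.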
  • 25. Using keys for registration Idea: use a hash to combine: - The molecular structure (via a fixed-H InChI) - A stereo code - A stereo comment https://github.com/rdkit/UGM_2015/blob/8f562e70add17bab35f43823af0f03673f8a1f2d/Presentations/KeyToRegistration.GregLandrum.pdf
  • 26. Future directions: registration systems Ensuring all the pieces are in place to make it easy to write a compound registration system - Improvements to MolStandardize code - Improvements to the molecular hashing code - Support for more classes of molecules
  • 27. Let’s talk about molecular identity This isn’t just a topic for standard compound registration systems.
  • 28. Molecular identity and computational questions ● Which molecules were used to generate this result? ● Have I already done a calculation using this molecule? ● Was this molecule part of my training set? All of these require us to be able to answer the question “are these two molecules the same?” Here be dragons…
  • 29. Some things making molecular identity nontrivial
  • 30. Some things making molecular identity nontrivial ● Counterions, solvents ● Resonance forms ● Charges ● Tautomers ● Stereochemistry Sometimes we care about these differences, sometimes we don’t. It depends on the context in which we ask the question “are these two molecules the same?” This is not a comprehensive list.
  • 31. Identity hashes for molecules Idea: convert the molecule into some form which allows us to test whether or not it’s identical to other molecules via a simple string (or numerical) comparison. What “identical” means will be determined by the identity hash used. Familiar examples: - Canonical SMILES - InChI
  • 32. Contextual identity Instead of having a single key/hash for a molecule, store a collection of layers with different levels of detail/types of information. When searching, choose the layers which are relevant for the current use case ● Store molecules using some relatively lossless format (e.g. v3000 SDF) ● Use molecular hashes capturing different levels of information to establish whether or not duplicates exist Note: it’s possible to do a limited version of this via careful manipulation of InChI strings
  • 33. Some more identity hashes https://www.nextmovesoftware.com/talks/OBoyle_MolHash_ACS_201908.pdf Available in the RDKit since the 2019.09 release
  • 34. Some of the basic identity hashes in rdMolHash ● Molecular formula ● Anonymous graph ● Element graph ● Murcko scaffold ● Tautomer ● Canonical SMILES There are many others
  • 35. Hashes for registration The team at Schrödinger (Chris Von Bargen, Hussein Faara, Dan Nealschneider, Ricardo Rodriguez, Rachel Walker) have contributed a new RDKit module for calculating layered hashes which are useful for compound identity testing and registration. This will be in the 2022.09 release. Layers it currently supports: - Formula - Canonical SMILES: with and without stereo - Tautomer hash: with and without stereo - Sgroup data (for some help with polymers and things like atropisomers) - “Escape layer” (free text allowing a structure to be different even if everything else says it’s the same)
  • 36. Registration hash example {<HashLayer.CANONICAL_SMILES: 1>: 'COc1ccc2[nH]c([S@@](=O)Cc3ncc(C)c(OC)c3C)nc2c1', <HashLayer.ESCAPE: 2>: '', <HashLayer.FORMULA: 3>: 'C17H19N3O3S', <HashLayer.NO_STEREO_SMILES: 4>: 'COc1ccc2[nH]c(S(=O)Cc3ncc(C)c(OC)c3C)nc2c1', <HashLayer.NO_STEREO_TAUTOMER_HASH: 5>: 'CO[C]1[CH][CH][C]2[N][C]([S]([O])C[C]3[N][CH][C](C)[C](OC)[C]3C)[N][C]2[CH]1_1_0', <HashLayer.SGROUP_DATA: 6>: '[]', <HashLayer.TAUTOMER_HASH: 7>: 'CO[C]1[CH][CH][C]2[N][C]([S@@]([O])C[C]3[N][CH][C](C)[C](OC)[C]3C)[N][C]2[CH]1_1_0'}
  • 37. Handling tautomers {<HashLayer.CANONICAL_SMILES: 1>: 'CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2c[nH]c3ncc(-c4ccc(Cl)cc4)cc23)c1F', <HashLayer.ESCAPE: 2>: '', <HashLayer.FORMULA: 3>: 'C23H18ClF2N3O3S', … <HashLayer.TAUTOMER_HASH: 7>: 'CCCS([O])([O])[N][C]1[CH][CH][C](F)[C]([C]([O])[C]2[CH][N][C]3[N][CH][C]([C]4[CH][CH][C](Cl)[CH][CH]4)[CH][C]32)[C]1F_2_0'} {<HashLayer.CANONICAL_SMILES: 1>: 'CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2cnc3[nH]cc(-c4ccc(Cl)cc4)cc2-3)c1F', <HashLayer.ESCAPE: 2>: '', <HashLayer.FORMULA: 3>: 'C23H18ClF2N3O3S', … <HashLayer.TAUTOMER_HASH: 7>: 'CCCS([O])([O])[N][C]1[CH][CH][C](F)[C]([C]([O])[C]2[CH][N][C]3[N][CH][C]([C]4[CH][CH][C](Cl)[CH][CH]4)[CH][C]32)[C]1F_2_0'}
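The tautomer example above can be turned into a context-dependent comparison by hashing only the layers that matter for the question being asked. The sketch below is a stdlib-only illustration, not the RDKit implementation: `layered_key` is a hypothetical helper, the layer names are written as plain strings rather than `HashLayer` members, and the layer values are copied from the slide (with the line-wrap spaces removed).

```python
import hashlib
import json


def layered_key(layers, selected):
    """Hash only the selected layers (hypothetical helper, not an RDKit function)."""
    # Sort the layer names so the key does not depend on the order requested.
    payload = json.dumps([[name, layers[name]] for name in sorted(selected)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Layer values copied from the two tautomers shown on the slide.
TAUTOMER_HASH = (
    "CCCS([O])([O])[N][C]1[CH][CH][C](F)[C]([C]([O])[C]2[CH][N][C]3[N][CH]"
    "[C]([C]4[CH][CH][C](Cl)[CH][CH]4)[CH][C]32)[C]1F_2_0"
)
a = {
    "CANONICAL_SMILES": "CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2c[nH]c3ncc(-c4ccc(Cl)cc4)cc23)c1F",
    "FORMULA": "C23H18ClF2N3O3S",
    "TAUTOMER_HASH": TAUTOMER_HASH,
}
b = {
    "CANONICAL_SMILES": "CCCS(=O)(=O)Nc1ccc(F)c(C(=O)c2cnc3[nH]cc(-c4ccc(Cl)cc4)cc2-3)c1F",
    "FORMULA": "C23H18ClF2N3O3S",
    "TAUTOMER_HASH": TAUTOMER_HASH,
}

# Tautomer-sensitive question: the two records differ.
strict_same = layered_key(a, ["FORMULA", "CANONICAL_SMILES"]) == layered_key(
    b, ["FORMULA", "CANONICAL_SMILES"]
)
# Tautomer-insensitive question: the two records match.
loose_same = layered_key(a, ["FORMULA", "TAUTOMER_HASH"]) == layered_key(
    b, ["FORMULA", "TAUTOMER_HASH"]
)
print(strict_same, loose_same)
```

The design point is that the stored layers never change; only the subset fed into the key changes with the question.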
  • 38. Handling atropisomers Structures from: https://doi.org/10.1016/j.xphs.2021.10.011
  • 39. Handling atropisomers Structures from: https://doi.org/10.1016/j.xphs.2021.10.011 The bold and hashed bonds are just drawing features and don’t survive translation to things like CXSMILES or mol files. But we can use S groups to indicate the stereochemistry
  • 40. Handling atropisomers Structures from: https://doi.org/10.1016/j.xphs.2021.10.011 {<HashLayer.CANONICAL_SMILES: 1>: 'COc1cc2ncc3c(c2cc1-c1cn(C)nc1C)n(-c1c(F)cncc1OC)c(=O)n3C', <HashLayer.ESCAPE: 2>: '', <HashLayer.FORMULA: 3>: 'C23H21FN6O3', … <HashLayer.SGROUP_DATA: 6>: '[{"fieldName": "atropisomer", "atom": [19, 20], "bonds": [], "value": "M"}]', …} {<HashLayer.CANONICAL_SMILES: 1>: 'COc1cc2ncc3c(c2cc1-c1cn(C)nc1C)n(-c1c(F)cncc1OC)c(=O)n3C', <HashLayer.ESCAPE: 2>: '', <HashLayer.FORMULA: 3>: 'C23H21FN6O3', … <HashLayer.SGROUP_DATA: 6>: '[{"fieldName": "atropisomer", "atom": [19, 20], "bonds": [], "value": "P"}]', …}
  • 41. Handling polymers {<HashLayer.CANONICAL_SMILES: 1>: '*c1cnc(*)s1', …, <HashLayer.SGROUP_DATA: 6>: '[{"type": "SRU", "atoms": [1, 2, 3, 4, 6], "bonds": [[0, 1], [4, 5]], "index": 1, "connect": "HT", "label": "n"}]', …} {<HashLayer.CANONICAL_SMILES: 1>: '*c1cnc(*)s1', …, <HashLayer.SGROUP_DATA: 6>: '[{"type": "SRU", "atoms": [1, 2, 3, 4, 6], "bonds": [[0, 1], [4, 5]], "index": 1, "connect": "HH", "label": "n"}]', …}
  • 42. Handling enhanced stereochemistry: Ethambutol. These two describe the same racemic mixture
  • 43. Handling enhanced stereochemistry {<HashLayer.CANONICAL_SMILES: 1>: 'CC[C@@H](CO)NCCN[C@@H](CC)CO', …, <HashLayer.NO_STEREO_SMILES: 4>: 'CCC(CO)NCCNC(CC)CO', …} {<HashLayer.CANONICAL_SMILES: 1>: 'CC[C@@H](CO)NCCN[C@@H](CC)CO |&1:2,9|', …, <HashLayer.NO_STEREO_SMILES: 4>: 'CCC(CO)NCCNC(CC)CO', …} We get the same hash if the molecule is drawn with wedged bonds.
  • 44. Using the escape layer Suppose I start with the racemic mixture, run it through a chiral column, and collect the two fractions. I want to register the two fractions separately without determining the absolute stereochemistry
  • 45. Using the escape layer {<HashLayer.CANONICAL_SMILES: 1>: 'CC[C@@H](CO)NCCN[C@@H](CC)CO |o1:2,9|', <HashLayer.ESCAPE: 2>: 'first fraction', …} {<HashLayer.CANONICAL_SMILES: 1>: 'CC[C@@H](CO)NCCN[C@@H](CC)CO |o1:2,9|', <HashLayer.ESCAPE: 2>: 'second fraction', …}
  • 46. Aside: using the escape layer for comp chem Suppose I want to store multiple conformers/poses of the same molecule {… <HashLayer.ESCAPE: 2>: 'conformer 1', …} {… <HashLayer.ESCAPE: 2>: 'conformer 2', …}
  • 47. Wrapping up: molecular identity ● For many computational tasks we want to be able to figure out whether or not we have seen/used a particular molecule ● The definition of “same” for molecules depends on the context/question being asked ● Layered registration hashes make it easy (and cheap) to store sets of molecules and answer the context-dependent “are these the same?” question
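As a rough illustration of the last point, storing layers per record makes the "are these the same?" lookup a matter of intersecting sets. The sketch below assumes a hypothetical `LayerIndex` class (not part of the RDKit), plain string layer names, and the two chiral-column fractions from the escape-layer slide as sample data; only the two layers actually shown on that slide are used.

```python
from collections import defaultdict


class LayerIndex:
    """Hypothetical per-layer lookup table: layer name -> value -> record ids."""

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(set))

    def add(self, record_id, layers):
        # Index the record under every layer it carries.
        for name, value in layers.items():
            self._index[name][value].add(record_id)

    def matches(self, layers, selected):
        # Records that agree with the query on all of the selected layers.
        hits = None
        for name in selected:
            found = self._index[name].get(layers[name], set())
            hits = set(found) if hits is None else hits & found
        return hits or set()


smiles = "CC[C@@H](CO)NCCN[C@@H](CC)CO |o1:2,9|"
first_fraction = {"CANONICAL_SMILES": smiles, "ESCAPE": "first fraction"}
second_fraction = {"CANONICAL_SMILES": smiles, "ESCAPE": "second fraction"}

index = LayerIndex()
index.add("frac-1", first_fraction)
index.add("frac-2", second_fraction)

# Structure-only context: both fractions count as the same compound.
by_structure = index.matches(first_fraction, ["CANONICAL_SMILES"])
# Registration context: the escape layer keeps the fractions apart.
by_registration = index.matches(first_fraction, ["CANONICAL_SMILES", "ESCAPE"])
print(sorted(by_structure), sorted(by_registration))
```

Each lookup is a handful of dictionary accesses and a set intersection, which is one way to read the "easy (and cheap)" claim on the slide.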