Slides from my talk at the ACS CINF Symposium on Chemical Nomenclature & Representation on 26 August 2019 in San Diego.
Abstract:
The first edition of the Beilstein Handbook of Organic Chemistry was published nearly 140 years ago. Electronic laboratory notebooks have been in use in chemistry for almost 20 years. Yet the life science industry still lacks a well-defined way of capturing and exchanging information about chemical reactions, relying instead on imprecise or vendor-specific data formats. Without a common language and structure for describing experiments, data integration is unnecessarily expensive, and a significant part of published data is not readily available for processing or analysis.
The Unified Data Model (UDM) project team aims to improve this situation. UDM is a collective effort of vendors and life science organizations to create an open, extendable and freely available reference model and data format for the exchange of experimental information about compound synthesis and testing. Run under the umbrella of the Pistoia Alliance, the project team has published two releases of the UDM data format, and the model is expected to continue to improve as demand dictates, in concert with the Pistoia Alliance's work on FAIR data implementation with the industry community.
In a series of announcements that left more than 1,200 gamers gathered in Cologne alternately breathless, giddy with laughter, and shouting their enthusiasm, Jensen Huang introduced the GeForce RTX series of gaming processors, representing the biggest leap in performance in NVIDIA’s history.
With the components already introduced to the market, we are making the platform truly end-to-end by launching:
- The market’s first complete 5G radio system
- The first version of an E2E Core network capable of 5G use cases based on network slices
- A 5G core network which can now be connected to 5G NR radio
This already enables some 5G use cases today, allowing telecom operators to capture growth opportunities in 5G and Internet of Things services for consumers and enterprises.
This webinar focuses on the particular use case of graph databases in network and IT management. It is designed for people who work in network management at telecom companies, and for professionals in industries that handle and rely on complex networks.
We’ll start with an overview of Neo4j and graph thinking within networks, explaining how networks are naturally modelled as graphs. We’ll explain how graph databases help mitigate some of the major challenges network and security managers face on a daily basis, including intrusions and other cyber crimes, performance optimization, outage simulations, fraud prevention and more.
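As a toy illustration of why networks model naturally as graphs (plain Python rather than Neo4j, and with an invented four-router topology), the sketch below represents links as an adjacency map and simulates an outage by checking reachability with the failed node excluded:

```python
from collections import deque

# Toy network topology as an adjacency map: router -> set of linked routers.
# The router names are made up for illustration.
topology = {
    "core1": {"core2", "edge1"},
    "core2": {"core1", "edge2"},
    "edge1": {"core1", "edge2"},
    "edge2": {"core2", "edge1"},
}

def reachable(graph, start, failed=frozenset()):
    """Breadth-first search over the topology, skipping failed nodes."""
    if start in failed:
        return set()
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in failed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

# Outage simulation: which nodes can edge2 still reach if core1 fails?
print(reachable(topology, "edge2", failed={"core1"}))
```

In a real graph database the same question becomes a path query over the stored topology rather than hand-written traversal code, which is exactly the kind of workload the webinar discusses.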
With uCPE/SD-WAN taking center stage in enabling software-defined Cloud services to enterprise branch offices globally, this session will provide a uCPE review from a solution, deployment and reference design standpoint.
Speaker: Sab Gosal, Segment Manager
Network Platforms Group (NPG), September 2018
This is a document made available online by ASHRAE for consultation on TC 9.9 guidance for data center operation worldwide; the guide covers the environmental classes and their minimum and maximum operating limits.
An introduction to Neo4j and Graph Databases. Learn about the primary use cases for Graph Databases and explore the properties of Neo4j that make those use cases possible.
ScottMadden has developed an approach for analyzing data center requirements and driving improvements in existing data center retrofits. Our approach takes into account the technological requirements, the physical attributes of a data center, and the requirements for a rigorous measurement and verification program needed to ensure improvements actually capture the energy efficiency gains and the resultant greenhouse gas reductions.
Our approach addresses the latest trends in data center management, such as virtualization and cloud computing, and provides a framework for developing metrics needed to drive changes in data center performance.
AI model efficiency is crucial for making AI ubiquitous, leading to smarter devices and enhanced lives. Besides the performance benefit, quantized neural networks also increase power efficiency for two reasons: reduced memory access costs and increased compute efficiency.
The quantization work done by the Qualcomm AI Research team is crucial in implementing machine learning algorithms on low-power edge devices. In network quantization, we focus on both pushing the state-of-the-art (SOTA) in compression and making quantized inference as easy to access as possible. For example, our SOTA work on oscillations in quantization-aware training pushes the boundaries of what is possible with INT4 quantization. Furthermore, for ease of deployment, integer formats such as INT16 and INT8 give comparable accuracy to floating-point formats such as FP16 and FP8, but with significantly better performance per watt. Researchers and developers can make use of this quantization research to successfully optimize and deploy their models across devices with open-source tools like the AI Model Efficiency Toolkit (AIMET).
Presenters: Tijmen Blankevoort and Chirag Patel
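The memory savings that quantization buys can be seen in a minimal NumPy sketch of symmetric per-tensor INT8 quantization. This illustrates the general technique only, not AIMET code, and the example weights are made up:

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: x is approximated by scale * q."""
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximate float tensor from the integer codes."""
    return q.astype(np.float32) * scale

weights = np.array([0.5, -1.27, 0.03, 1.0], dtype=np.float32)
q, scale = quantize_int8(weights)

# INT8 storage is 4x smaller than FP32 for the same number of values.
assert q.itemsize * q.size == weights.nbytes // 4

# Round-trip error stays small relative to the weight range.
error = np.abs(dequantize(q, scale) - weights).max()
```

Smaller integer formats also cut memory-access cost and let hardware pack more multiply-accumulates per cycle, which is where the power-efficiency claim above comes from.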
In this deck, Greg Wahl from Advantech presents: Transforming Private 5G Networks.
Advantech Networks & Communications Group is driving innovation in next-generation network solutions with its High Performance Servers. We provide business-critical hardware to the world's leading telecom and networking equipment manufacturers with both standard and customized products. Our High Performance Servers are highly configurable platforms designed to balance the best in x86 server-class processing performance with maximum I/O and offload density. The systems are cost effective, highly available and optimized to meet next-generation networking and media processing needs.
“Advantech’s Networks and Communication Group has been both an innovator and trusted enabling partner in the telecommunications and network security markets for over a decade, designing and manufacturing products for OEMs that accelerate their network platform evolution and time to market,” said Advantech Vice President of Networks & Communications Group, Ween Niu. “In the new IP Infrastructure era, we will be expanding our expertise in Software Defined Networking (SDN) and Network Function Virtualization (NFV), two of the essential conduits to 5G infrastructure agility, making networks easier to install, secure, automate and manage in a cloud-based infrastructure.”
In addition to innovation in air interface technologies and architecture extensions, 5G will also need a new generation of network computing platforms to run the emerging software defined infrastructure, one that provides greater topology flexibility, essential to deliver on the promises of high availability, high coverage, low latency and high bandwidth connections. This will open up new parallel industry opportunities through dedicated 5G network slices reserved for specific industries dedicated to video traffic, augmented reality, IoT, connected cars etc. 5G unlocks many new doors and one of the keys to its enablement lies in the elasticity and flexibility of the underlying infrastructure.
Advantech’s corporate vision is to enable an intelligent planet. The company is a global leader in the fields of IoT intelligent systems and embedded platforms. To embrace the trends of IoT, big data, and artificial intelligence, Advantech promotes IoT hardware and software solutions with the Edge Intelligence WISE-PaaS core to assist business partners and clients in connecting their industrial chains. Advantech is also working with business partners to co-create business ecosystems that accelerate the goal of industrial intelligence.
Watch the video: https://wp.me/p3RLHQ-lPQ
* Company website: https://www.advantech.com/
* Solution page: https://www2.advantech.com/nc/newsletter/NCG/SKY/benefits.html
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Nice presentation by Nokia talking about 5G network and radio enhancements such as 5G Quality of Service, Network Slicing, Latency Reduction and architecture issues. Thanks Benoist for this and your work in 3GPP RAN2.
Beginners: 5G Terminology (Updated - Feb 2019), by 3G4G
An updated short presentation and video looking at 5G terminology that is being used in 3GPP standards and specifications.
Terms such as NG-RAN, NR, ng-eNB, en-gNB, RIT, SRIT, Option 3, etc. will be discussed.
Simplifying AI Infrastructure: Lessons in Scaling on DGX Systems, by Renee Yao
Simplifying AI Infrastructure: Lessons in Scaling on DGX Systems, the world's most powerful AI systems. This is a presentation I did at GTC Israel in 2018.
LTE is a common standard covering both FDD and TDD flavors, enabling the industry to build common FDD/TDD infrastructure, common devices, and a large common ecosystem. LTE and its evolution LTE Advanced play a critical role in addressing the 1000x increase in mobile data.
Qualcomm has been leading LTE proliferation from the very beginning— from the industry-first Gobi LTE/3G multimode, common FDD/TDD modems to the current third-generation solutions that powered the world’s first LTE Advanced carrier-aggregation launch in June 2013.
For more information please visit www.qualcomm.com/lte
Download the presentation here: http://www.qualcomm.com/media/documents/lte-qualcomm-leading-global-success
Artificial Intelligence (AI), specifically deep learning, is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences. However, the deep neural networks of today use too much memory, compute, and energy. To make AI truly ubiquitous, it needs to run on the end device within tight power and thermal budgets. Advancements in multiple areas are necessary to improve AI model efficiency, including quantization, compression, compilation, and neural architecture search (NAS). In this presentation, we’ll discuss:
- Qualcomm AI Research’s latest model efficiency research
- Our new NAS research to optimize neural networks more easily for on-device efficiency
- How the AI community can take advantage of this research through our open-source projects, such as the AI Model Efficiency Toolkit (AIMET) and AIMET Model Zoo
In recent years, the growth of scientific data and the increasing need for data sharing and collaboration in the field of environmental chemistry have led to the creation of various software and databases that facilitate research and development into the safety and toxicity of chemicals. The US EPA Center for Computational Toxicology and Exposure has been developing software and databases that serve the chemistry community for many years. This presentation will focus on several web-based software applications which have been developed at the US EPA and made available to the community. While the primary software application from the Center is the CompTox Chemicals Dashboard, almost a dozen proof-of-concept applications have been built serving various capabilities. The publicly accessible Cheminformatics Modules (https://www.epa.gov/chemicalresearch/cheminformatics) provide access to six individual modules allowing hazard comparison for sets of chemicals; structure, substructure, and similarity searching; structure alerts; and batch QSAR prediction of both physicochemical and toxicity endpoints. A number of other applications in development include a chemical transformations database (ChET) and a database of analytical methods and open mass spectral data (AMOS). Each of these depends on the underlying DSSTox chemicals database, a rich source of chemistry data for over 1.2 million chemical substances. I will provide an overview of all tools in development and the integrated nature of the applications based on the underlying chemistry data. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
ICIC 2014 Increasing the efficiency of pharmaceutical research through data i..., by Dr. Haxel Consult
The pressures of pharmaceutical research and development demand increasing efficiency from scientists. High-quality decisions must be made faster and encompass all available information. At the same time there is a growing desire to better utilize the multi-billion dollar research investment recorded in laboratory notebooks and bioassay databases. Key values for data integration in a data exploration environment include gathering data from disparate E-notebooks and bioassay databases into a single searchable “virtual” system and increased discoverability by accessing data through a system designed for exploration. Key benefits are better chemistry decisions through easier access to broader data and reduced time for preparing patent filings. The ability to interlink in-house and reported assay data with in-house and published chemistry provides a data-rich environment for developing insights and predictive models. We will discuss our experience with integrating information from journals, patents, bio-assay databases, and E-lab notebooks to address these needs.
The U.S. Environmental Protection Agency (EPA) Computational Toxicology Program integrates advances in biology, chemistry, exposure and computer science to help prioritize chemicals for further research based on potential human health risks. This work involves computational and data-driven approaches that integrate chemistry, exposure and biological data. As an outcome of these efforts the National Center for Computational Toxicology (NCCT) has measured, assembled and delivered an enormous quantity and diversity of data for the environmental sciences including high-throughput in vitro screening data, legacy in vivo animal data, consumer use and production information, exposure models and chemical structure databases with associated properties. A series of software applications and databases have been produced over the past decade to deliver these data, but recent developments have focused on the development of a new software architecture that assembles the resources into a single platform. Our web application, the CompTox Chemistry Dashboard, provides access to data associated with ~750,000 chemical substances. These data include experimental and predicted physicochemical property data, bioassay screening data associated with the ToxCast program, product and functional use information and a myriad of related data of value to environmental scientists.
The dashboard provides chemical-based searching based on chemical names, synonyms and CAS Registry Numbers. Flexible search capabilities allow for chemical identification based on non-targeted analysis studies using mass spectrometry. Chemical identification using both mass and formula-based searching utilizes rank-ordering of results via functional use statistics, thereby providing a solution to help prioritize chemicals for further review when detected in environmental media.
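The mass-search-plus-ranking idea described above can be sketched in a few lines of Python. The candidate list and functional-use counts below are invented for illustration and are not EPA data:

```python
# Hypothetical candidate records: (name, monoisotopic_mass, functional_use_count).
# "Candidate X" is an invented near-isobaric compound to show the ranking step.
CANDIDATES = [
    ("Bisphenol A", 228.1150, 412),
    ("Triclosan",   287.9512, 198),
    ("Candidate X", 228.1147, 3),
    ("Caffeine",    194.0804, 350),
]

def search_by_mass(measured_mass, tolerance_da=0.005):
    """Return candidates within the mass window, rank-ordered by how often
    the chemical appears in functional-use records (most common first)."""
    hits = [c for c in CANDIDATES if abs(c[1] - measured_mass) <= tolerance_da]
    return sorted(hits, key=lambda c: c[2], reverse=True)

# A measured mass of 228.115 Da matches two candidates; the more commonly
# used chemical is listed first for reviewer attention.
for name, mass, uses in search_by_mass(228.115):
    print(f"{name}: mass={mass}, functional-use records={uses}")
```

The real dashboard works against curated databases and richer use statistics, but the core prioritization step is this kind of filter-then-rank over candidates that are indistinguishable by mass alone.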
This presentation will provide an overview of the dashboard, its capabilities for delivering data to the environmental chemistry community and how the architecture provides a foundation for the development of additional applications to support chemical risk assessment. This abstract does not reflect U.S. EPA policy.
The CompTox Chemistry Dashboard was developed by the Environmental Protection Agency’s National Center for Computational Toxicology. The dashboard has been architected in a manner that allows for the deployment of multiple “applications”, both as publicly available databases and for deployment under the constraints of confidential business information (CBI). The public dashboard provides access to multiple types of data for ~750,000 chemicals. This includes, when available for a chemical substance, physicochemical parameters, toxicity and bioassay data, consumer use and analytical data. Fate, exposure, and hazard calculations can benefit from access to the data aggregation and curation efforts that underpin the public dashboard. Also, regulators can benefit from the integration of their own data within their closed infrastructure environments. This presentation will provide a review of the chemistry dashboard architecture and its present application providing access to data for the research and regulatory communities. We will also review present developments in the area of delivering an application programming interface, web services, and software components for integration into third-party applications providing access to the data exposed via the dashboard. This abstract does not reflect U.S. EPA policy.
OpenAIRE provide dashboard #OpenAIREweek2020, by Pedro Príncipe
OpenAIRE provide session at the OpenAIRE week 2020 - A user journey in OpenAIRE provide - services and the interoperability guidelines, by Pedro Principe
During the last two decades, Clinical Decision Support (CDS) standards and technologies have progressed significantly, making CDS systems more robust and scalable. However, the current context of medicine places high demands on aspects such as interoperability to enable the use of EHR data in CDS systems, the need to address communication challenges so that the patient can become an active participant in decision making, and collaborative learning and sharing of CDS systems across institutional borders, to name a few.
In this thesis I tackle some of these challenges. In particular, I evolve previous conceptual computerized decision support frameworks and I postulate a CDS systems environment where different models interact to enable:
• Secondary use of data for CDS systems: The dissertation presents a model to leverage different developments in data access and standardization of medical information. The result is an openEHR-based Data Warehouse architecture that enables access, standardization and abstraction of clinical data for CDS systems. The architecture allows: a) accessing heterogeneous data sources; b) standardizing data into openEHR to ensure interoperability of data; and c) exploiting an openEHR repository as a Data Warehouse that allows querying data in a technology-independent format (the Archetype Query Language).
• CDS systems semantic specification: The semantic model proposed exploits the paradigm of Linked Services to unambiguously describe CDS systems in a machine-understandable fashion. This grants ontological descriptions of functional, non-functional and data semantics. These descriptions help overcome some of the barriers to sharing CDS functionality. In particular, the proposed semantic model allows using expressive queries to discover CDS services in health networks, and analyzing CDS system interfaces to understand how to interoperate with them.
• Effective patient-CDS systems interaction: The dissertation proposes a method to evaluate the communication process between patients and consumer-oriented CDS systems. The method aims to detect whether important human-computer interaction barriers that could lead to negative outcomes are present in CDS system user interfaces.
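To make the Archetype Query Language mention in the first bullet concrete, here is a simplified AQL query embedded in a Python sketch. The query text follows common openEHR examples (a blood-pressure observation archetype), and the mock executor merely stands in for a real openEHR repository:

```python
# A simplified Archetype Query Language (AQL) query, illustrating how a CDS
# system can request clinical data in a technology-independent way. The paths
# here are abbreviated for readability rather than copied from a real template.
AQL_QUERY = """
SELECT obs/data/events/data/items/value/magnitude AS systolic
FROM EHR e
CONTAINS OBSERVATION obs [openEHR-EHR-OBSERVATION.blood_pressure.v1]
WHERE obs/data/events/data/items/value/magnitude > 140
"""

def mock_execute(query, magnitudes):
    """Stand-in for a real repository: apply the threshold encoded in the
    query's WHERE clause to a list of pre-extracted systolic readings."""
    threshold = float(query.rsplit(">", 1)[1])
    return [m for m in magnitudes if m > threshold]

# Readings above 140 mmHg are the ones a hypertension CDS rule would flag.
print(mock_execute(AQL_QUERY, [120.0, 150.0, 165.0]))
```

The point of the Data Warehouse architecture above is that the CDS system only ever sees archetype paths like these, never the schema of the underlying source systems.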
Slides to be presented at a webinar arranged by MetaSolutions as part of a Vinnova project http://metasolutions.se/2014/03/webbinarium-med-kerstin-forsberg-om-lankade-data-i-lakemedelsforskningen/
Information and data on chemicals is used by scientists to evaluate potential health and ecological risks due to environmental exposures. EPA’s CompTox Chemicals Dashboard (https://comptox.epa.gov) helps evaluate the safety of chemicals by providing public access to a variety of information on over 760,000 chemicals. Within the Dashboard, users can access chemical structures, chemistry information, toxicity data, hazard data, exposure information, and additional links to relevant websites and applications. These data are compiled from sources including EPA’s computational toxicology research databases, from public domain databases and with collaborators across the world. Chemical lists have been added that provide access to various classes of chemicals and project-based datasets are under constant development. Specific functionality has been delivered within the Dashboard to support mass spectrometry including “MS-ready forms” of chemical substances that would be detectable by mass spectrometry. Workflows have been developed to assist in candidate identification and have now been proven with multiple published studies. An integration path between the dashboard and MetFrag has also been established to provide users the significant benefits resulting from the marriage between the two applications. The datasets underpinning the dashboard are freely available (https://comptox.epa.gov/dashboard/downloads) for integration into third party databases. This presentation will provide an overview of the available data types and functionality of the dashboard prior to examining how it is developing to support mass spectrometry based analyses within the agency and for the community in general. This will include a review of our research efforts to enhance the dashboard using in silico MS/MS fragmentation prediction for spectral matching. This abstract does not necessarily represent the views or policies of the U.S. Environmental Protection Agency.
Richard Bolton (GSK and Pistoia's ELN query services workstream coordinator) discusses the Alliance's chemistry strategy, which includes ELN query standards, hosted ELN, and chemistry externalization faciliation
October 24, 2014: Joseph DeCarolis, Assistant Professor at North Carolina State University, will present The Importance of Open Data and Models for Energy Systems Analysis.
Energy system models represent a critical planning tool that can be used to deliver policy-relevant insights at scales ranging from local to global. When such models are used to inform public policy, the associated data and source code should be open in order to enable third party replication of results, expose hidden assumptions, and identify key model sensitivities. In this talk, I describe my own effort to push open data and models within the international energy modeling community.
Unambiguous representation of Lab Medicine requests & results - UK’s approach...Jay Kola
Unambiguous representation of Lab Medicine requests & results - UK’s approach & considerations for SNOMED CT community. A presentation to the SNOMED International General Assembly to showcase how UK's approach could help countries share lab test/result information even if they did not use SNOMED CT.
ICIC 2013 Conference Proceedings Sebastian RadestockDr. Haxel Consult
Making hidden data discoverable: How to build effective drug discovery engines?
Sebastian Radestock (Elsevier, Germany)
In a complex IT environment comprising dozens if not hundreds of databases and likely as many user interfaces it becomes difficult if not impossible to find all the relevant information needed to make informed decisions. Historical data get lost, not normalized data cannot be compared and maintenance becomes a nightmare. We will discuss a new approach to address this issue by showing various examples and use cases on how in-house data and public data can be integrated in various ways to address the unique and individual needs of companies to keep the competitive edge.
FAIR Data and Model Management for Systems Biology(and SOPs too!)Carole Goble
MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
FAIR Data and model management for Systems Biology
Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research – that is the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Yes, data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes for pre- and post- publication. And the multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Data and model management for the Systems Biology community is a multi-faceted one including: the development and adoption appropriate community standards (and the navigation of the standards maze); the sustaining of international public archives capable of servicing quantitative biology; and the development of the necessary tools and know-how for researchers within their own institutes so that they can steward their assets in a sustainable, coherent and credited manner while minimizing burden and maximising personal benefit.
The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has grown out of several efforts in European programmes (SysMO and EraSysAPP ERANets and the ISBE ESRFI) and national initiatives (de.NBI, German Virtual Liver Network, SystemsX, UK SynBio centres). It aims to support Systems Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges multi-scale biology presents.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org
Similar to UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Information (CINF 58, ACS National Meeting 2019-08-26) (20)
0x01 - Newton's Third Law: Static vs. Dynamic AbusersOWASP Beja
f you offer a service on the web, odds are that someone will abuse it. Be it an API, a SaaS, a PaaS, or even a static website, someone somewhere will try to figure out a way to use it to their own needs. In this talk we'll compare measures that are effective against static attackers and how to battle a dynamic attacker who adapts to your counter-measures.
About the Speaker
===============
Diogo Sousa, Engineering Manager @ Canonical
An opinionated individual with an interest in cryptography and its intersection with secure software development.
This presentation by Morris Kleiner (University of Minnesota), was made during the discussion “Competition and Regulation in Professions and Occupations” held at the Working Party No. 2 on Competition and Regulation on 10 June 2024. More papers and presentations on the topic can be found out at oe.cd/crps.
This presentation was uploaded with the author’s consent.
Have you ever wondered how search works while visiting an e-commerce site, internal website, or searching through other types of online resources? Look no further than this informative session on the ways that taxonomies help end-users navigate the internet! Hear from taxonomists and other information professionals who have first-hand experience creating and working with taxonomies that aid in navigation, search, and discovery across a range of disciplines.
Sharpen existing tools or get a new toolbox? Contemporary cluster initiatives...Orkestra
UIIN Conference, Madrid, 27-29 May 2024
James Wilson, Orkestra and Deusto Business School
Emily Wise, Lund University
Madeline Smith, The Glasgow School of Art
Acorn Recovery: Restore IT infra within minutesIP ServerOne
Introducing Acorn Recovery as a Service, a simple, fast, and secure managed disaster recovery (DRaaS) by IP ServerOne. A DR solution that helps restore your IT infra within minutes.
UDM (Unified Data Model) - Enabling Exchange of Comprehensive Reaction Information (CINF 58, ACS National Meeting 2019-08-26)
1. UDM (Unified Data Model) – Enabling Exchange of Comprehensive Reaction Information
CINF 58 – 26 August 2019
Frederik van den Broek, Gerd Blanke, Jarek Tomczak, Markus Fischer
3. Drivers and Opportunities of Change
There is a renewed interest in reaction-centric cheminformatics (most of it was done before the 1990s), but this requires standardised data sets:
• Improved reaction searching and navigation
• Reaction similarity and classification
• Improved automatic determination of reaction mapping
• Mechanism elucidation
• Synthetic feasibility
• Retrosynthesis (design and planning)
• Reaction outcome prediction (products, yield, specificity, safety)
4. Drivers and Opportunities of Change
• Making data FAIR
Source: https://www.dtls.nl/fair-data/fair-principles-explained/
5. Ideal World
(Diagram: ELNs 1–4 and robots across Pharma, CRO, Academia, Vendor and Publisher feed, via UDM conversion, into integrated data, public data sources, commercial databases and other systems, supporting analysis, reporting and publication supplements.)
6. Current Situation
Various systems storing reaction information have been available for more than three decades; however:
• No common data model that can comprehensively describe chemical reactions
  • Most of the commercially available databases use models similar to the former ChemInform database distributed by MDL
• No common file format that allows representation of chemical reactions, their conditions and outcomes
  • Common formats:
    • RXN/RD files, originally created by MDL in the mid-1980s
    • PerkinElmer ELN XML
• No reaction drawing standards
  • An IUPAC project, “Graphical Representation Standards for Chemical Reaction Diagrams”, was restarted by Keith Taylor in 2017
7. Challenges
• Collaboration and data exchange with partners and CROs using different ELNs is difficult
• Integration, comparison and analysis of reaction data from various sources is very laborious
• The lack of a common data model makes it difficult to develop and share business rules for consistent representation of reactions and IP capture
• There are very limited open-source/open-data activities around reaction databases and searching
8. UDM Objective
The goal of the UDM project is to create and publish an open, extendable and freely available data format for the exchange of experimental information about compound synthesis and testing.
10. ELN Query Service Definition (Pistoia Alliance)
Scope:
• Define and publish high-level foundation design principles for an ELN data mart and its query services
• Design and implement a prototype version of a synthetic chemistry ELN data mart using information from the Discovery Chemistry workflow
• System-independent query interface and tools, accessed via a published API
• Expand the data model to incorporate data from ELN sources across the life science space – biology, pharmaceutical sciences and analytical sciences
11. UDM Origins
• Roche UDM project (2012–2013) to integrate in-house chemistry data into Elsevier’s Reaxys database
• Further developed by Roche and Elsevier, with contributions from other pharma companies, as a data transfer format for chemical reactions from a variety of ELNs into Elsevier’s Reaxys database
• The originators provided the UDM XML file format to the Pistoia Alliance and are committed to working together to make it more generic and to extend it to other experiment types (October 2017)
• Founders: Roche and Elsevier
12. Roche Key Drivers
From ACS presentation by Michael Kapler, Roche Pharma Research and Early Development
http://abstracts.acs.org/chem/245nm/program/view.php?obj_id=188977
13. Roche integration: Data source overview
From ACS presentation by Michael Kapler, Roche Pharma Research and Early Development
(Diagram: Roche in-house and licensed data sources are exported and unified into Reaxys – “Roche in Reaxys” and “Licensed in Reaxys”.)
14. Others have followed
From Bio-IT World 2019 presentation by Ludovic Otterbein, Director Research Informatics & Operations, Lundbeck
15. UDM – Simplified Data Model
• UDM
  • UDM_VERSION
  • LEGAL
  • CITATIONS
  • MOLECULES
  • REACTIONS
    • VARIATION
      • CONDITIONS
        • CONDITION_GROUP
      • PREPARATION
      • REACTANT
      • PRODUCT
      • CATALYST
      • SOLVENT
      • REAGENT
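As an illustration only, the simplified hierarchy can be sketched with Python's standard library. The element names come from the slide; the nesting is a plausible reading of the simplified diagram, not the normative UDM schema:

```python
import xml.etree.ElementTree as ET

# Illustrative skeleton only -- element names are taken from the simplified
# model above; real files must follow the published UDM XML schema.
udm = ET.Element("UDM")
for tag in ("UDM_VERSION", "LEGAL", "CITATIONS", "MOLECULES"):
    ET.SubElement(udm, tag)

# A reaction variation groups the components, preparation and conditions.
variation = ET.SubElement(ET.SubElement(udm, "REACTIONS"), "VARIATION")
for tag in ("REACTANT", "PRODUCT", "CATALYST", "SOLVENT", "REAGENT",
            "PREPARATION"):
    ET.SubElement(variation, tag)
conditions = ET.SubElement(variation, "CONDITIONS")
ET.SubElement(conditions, "CONDITION_GROUP")

print(ET.tostring(udm, encoding="unicode"))
```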
18. UDM Example – MOLECULE
<MOLECULES>
...
<MOLECULE ID="3247633">
<MOLSTRUCTURE><![CDATA[
Mrv0541 05221820572D
HDR
0 0 0 0 0 999 V3000
M V30 BEGIN CTAB
M V30 COUNTS 12 12 0 0 0 REGNO=3247633
M V30 BEGIN ATOM
M V30 1 C 5.39 1.3336 0 0
...
M V30 END CTAB
M END
]]></MOLSTRUCTURE>
<NAME>4-(prop-2-ynyloxy)benzaldehyde</NAME>
</MOLECULE>
...
</MOLECULES>
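A minimal sketch of consuming such a file with Python's standard library, assuming only the element names shown above (`list_molecules` is a hypothetical helper, not part of any official UDM toolkit):

```python
import xml.etree.ElementTree as ET

# Fragment mirroring the example above (molfile body elided).
UDM_FRAGMENT = """<MOLECULES>
  <MOLECULE ID="3247633">
    <MOLSTRUCTURE><![CDATA[...V3000 connection table...]]></MOLSTRUCTURE>
    <NAME>4-(prop-2-ynyloxy)benzaldehyde</NAME>
  </MOLECULE>
</MOLECULES>"""

def list_molecules(xml_text):
    """Return (ID, NAME) pairs for every MOLECULE element."""
    root = ET.fromstring(xml_text)
    return [(m.get("ID"), m.findtext("NAME")) for m in root.iter("MOLECULE")]

print(list_molecules(UDM_FRAGMENT))
# [('3247633', '4-(prop-2-ynyloxy)benzaldehyde')]
```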
20. UDM Project Roadmap
October 2017 – Elsevier donates UDM to Pistoia Alliance
June 2018 – UDM Release 4.0
• Fully compatible with the Elsevier version
• Cleaned-up and documented XML schema
• Added support for units of measure
• Included sample data sets (Reaxys, SPRESI)
• Included conversion tool from SPRESI RD file to UDM
November 2018 – UDM Release 5.0 (Brooklyn)
• Support for various representations of molecular structures and reaction diagrams
• Improved semantics of the model
• New representation of molecular properties
• New properties of reaction components (reactants, products, catalysts, solvents, reagents)
• Improved representation of reaction conditions
• Support for vendor extensions of the model
• Special tags for capturing legal information
• Support for various formats of the PREPARATION section
• Glossary of UDM terms
• Change log
November 2019 – UDM Release 6.0
• Further improvements to the reaction model
• Extended support for analytical data
• Possible BFO-compatible ontology representation + SHACL model
2020 – Planned
• Health and safety data
• Compound testing: screening / DMPK
• Support for galenic formulation development
• Biochemical reactions
• Support for large molecules
• …
21. UDM workshop 9 May 2019
• Workshop at Elsevier in Amsterdam, attended by representatives from pharma, ELN vendors, chemistry content providers and industry experts.
• Outcomes:
  • Confirmed UDM priorities and roadmap for 2019
  • Identified various UDM use cases
  • Identified the need for more sample data sets to improve UDM and its coverage of various synthesis types, especially those not frequently found in the literature
  • Identified various data types that need to be supported by UDM
  • Discussed factors and risks influencing the adoption of UDM – to be largely mitigated by developing an open-source UDM Toolkit (funding applied for; additional donations are welcome)
22. Elsevier UDM Roadmap (2013–2020)
• 2013 – Initial UDM version: developed by Elsevier and Roche to integrate customer reactions
• Jan 2014 – ReaxysPlus: UDM established as the ReaxysPlus import format for customers, with UDM extensions to accommodate customer requirements
• Oct 2017 – UDM goes open source: Pistoia Alliance takes on governance of UDM as an open-source project
• Mar 2018 – Support for the Pistoia Alliance UDM project
• Jun 2018 – UDM 4.0 release: first Pistoia Alliance version
• Nov 2018 – UDM 5.0 release: improved version based on project members’ requirements
• Nov 2018 – Reaxys Reaction Flat File: Reaxys exports single-step reactions as UDM (RDF is part of the offering)
• Apr 2019 – Entellect press release: Elsevier announces a data platform to harmonize proprietary and external data
• Apr 2019 – Scilligence–Reaxys interoperability press release: the Scilligence ELN integrates Reaxys reaction query capabilities
• Reaxys SCI collaboration: Reaxys collaborates with the Società Chimica Italiana to support data-driven chemistry research
• ELN–Reaxys data exchange using UDM: evaluating UDM as a bi-directional exchange format to improve Scilligence ELN–Reaxys interoperability
• Reaxys UDM export/import: implement UDM export for Reaxys and ReaxysPlus reactions; support import for batch searching
• ReaxysPlus support for Pistoia UDM versions: enable reaction ingestion using the Pistoia UDM versions
• Entellect Reaction Workbench: advanced reaction analytics platform; upcoming support for UDM as an import format
• Entellect RMC Workbench: utilize UDM to integrate bioactivity data
• Lower data exchange barriers in academic research: evaluate the use of UDM and ELNs in academic chemistry research in collaboration with the Società Chimica Italiana; KIT open-source ELN adopting UDM; Beilstein investigating the impact of adopting UDM
23. Entellect
(Diagram: scientific data from internal and external sources → ingest & enrich → connect → serve; data types include compound & reaction, assay, -omics, translational, trial and post-market data; capabilities include search & workflow, visualization, and exploratory and predictive analytics, accelerating data-science-driven R&D across chemistry, disease, safety, efficacy, trial, drug and commercial intelligence.)
Entellect is a smart and flexible life sciences platform that powers R&D discovery by using Elsevier’s trusted approach to data integration and harmonization. Entellect delivers connected and AI-ready data by linking and enriching disparate content against established life science taxonomies. Combined with the option of Elsevier data, the result is a scalable knowledge environment enabling exploratory and predictive analytics applications.
24. Benefits of UDM
• Provides improved quality of experimental data
• Supports integration, comparison and analysis of research data from various sources
• Enables collaboration and data exchange with partners and CROs using different ELNs
• Helps in defining and sharing business rules and protocols for consistent representation of experiments and IP capture
• Supports validation of ELN data