Implementing ISO 11238 standard compliance with ChemAxon tools
Roger Sayle
NextMove Software, Cambridge, UK
ChemAxon User Group Meeting 2014, Budapest, Hungary, Wednesday 21st May 2014
What is ISO 11238?
• ISO standard 11238 entitled “Health Informatics –
Identification of medicinal products – Data elements
and structures for the unique identification and
exchange of regulated information on substances”.
• Defines a framework for uniquely identifying and
exchanging compounds of pharmaceutical interest.
• The framework serves a similar role to CAS registry
numbers, PubChem CID or InChI-Key, assigning
unique identifiers to substances.
Meet the IDMP (Identification of Medicinal Products) family
• 11238 is one of a suite of 5 related standards, all for
“unique identification and exchange of …”
– 11238 “… regulated information on substances”.
– 11239 “… dose forms, units, administration, etc.”.
– 11240 “… units of measurement”.
– 11615 “… regulated medicinal product information”.
– 11616 “… regulated pharmaceutical product information”.
Why is 11238 important?
• EU regulation 520/2012 on “pharmacovigilance”
requires countries, regulatory authorities and
pharma to adopt the 5 IDMP standards (articles 25
and 26) by 1st July 2016 (article 40).
• Executive summary: It’s the law!
How it works
[Figure: Code Assignment (Authority) takes a name/identifier, a connection table and properties (significant text) and returns a unique code; Code Look-up (Authority) takes a unique code and returns the name/identifier, connection table and properties (significant text).]
Likely implementation
[Figure: the same workflow annotated with likely implementations: FDA UNII (code assignment and unique code), FDA SRS Search (code look-up), XML, INN/USAN/CID (names/identifiers), FDA/NCATS GInAS, MOL2000/SMILES/InChI and Protein/NA Sequence (connection tables), ISO11238 Groups 1-4 (properties).]
Current status
• The standard has been ratified and its use has been written into EU law (EU Reg. 520/2012).
• The framework requires non-semantic, random, fixed-length unique identifiers that include an internal integrity check.
• The standard also details constraints on uniqueness.
• Exact implementation details are yet to be determined (to appear in a future “Implementation Guide”).
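The identifier requirements above can be made concrete with a minimal sketch. Everything in it is an assumption for illustration: a 10-character code over the alphabet 0-9A-Z, nine characters drawn at random and one check character computed with the Luhn mod N algorithm. The source does not say which integrity check UNII or a future ISO 11238 code uses, so `RandomSubstanceId` is a hypothetical class, not an implementation of either.

```java
import java.security.SecureRandom;
import java.util.Random;

/** Hypothetical non-semantic, random, fixed-length identifier with one check character. */
final class RandomSubstanceId {
    static final String ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    static final int N = ALPHABET.length();   // 36 symbols
    static final int PAYLOAD_LENGTH = 9;      // assumption: 9 random characters + 1 check character

    /** Draw the payload at random (it encodes no chemistry) and append the check character. */
    static String generate(Random rng) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < PAYLOAD_LENGTH; i++) {
            sb.append(ALPHABET.charAt(rng.nextInt(N)));
        }
        return sb.append(checkChar(sb.toString())).toString();
    }

    static String generate() {
        return generate(new SecureRandom());
    }

    /** Luhn mod N check character; the payload must use only ALPHABET symbols. */
    static char checkChar(String payload) {
        int factor = 2, sum = 0;                 // the rightmost payload symbol is doubled
        for (int i = payload.length() - 1; i >= 0; i--) {
            int addend = factor * ALPHABET.indexOf(payload.charAt(i));
            factor = (factor == 2) ? 1 : 2;
            sum += addend / N + addend % N;      // add the base-N digits of the weighted value
        }
        return ALPHABET.charAt((N - sum % N) % N);
    }

    /** Valid if the length is right, every symbol is in ALPHABET and the Luhn mod N sum is 0. */
    static boolean isValid(String id) {
        if (id.length() != PAYLOAD_LENGTH + 1) return false;
        int factor = 1, sum = 0;                 // the check character itself is not doubled
        for (int i = id.length() - 1; i >= 0; i--) {
            int code = ALPHABET.indexOf(id.charAt(i));
            if (code < 0) return false;
            int addend = factor * code;
            factor = (factor == 2) ? 1 : 2;
            sum += addend / N + addend % N;
        }
        return sum % N == 0;
    }
}
```

`RandomSubstanceId.generate()` returns a fresh 10-character code, and `isValid` rejects any code with a single mistyped character, because for this 36-symbol alphabet the Luhn mod N weighting maps every symbol to a distinct value.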
What will the future look like?
• ISO 11238-compliant identifiers will be very similar to the FDA’s UNII (UNique Ingredient Identifier).
• The fixed-width, non-semantic identifier requirement rules out the use of plain SMILES, InChI, V2000 mol file and similar encodings.
• The random requirement rules out plain CAS registry numbers, PubChem CIDs and ChEMBL IDs (which use sequential or monotonic number assignment).
• Alternatively, InChIKeys or similar hashes (with CRC checks) of connection tables plus text may be possible.
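For the hash-based alternative in the last bullet, a minimal sketch might look like the following. It assumes SHA-256 over the connection table and the significant text (joined with a NUL separator), truncated to 64 bits and written as 16 hex digits, followed by an 8-hex-digit CRC-32 of those digits as the integrity check. This is not the InChIKey algorithm, and `HashedSubstanceId` is a hypothetical name. The result only looks random: the same input always yields the same code, so whether such a scheme satisfies the “random” wording is exactly the open question raised above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;
import java.util.zip.CRC32;

/** Hypothetical fixed-length identifier: truncated SHA-256 body plus CRC-32 check digits. */
final class HashedSubstanceId {
    static final int BODY_HEX = 16;   // 64 bits of the digest
    static final int CHECK_HEX = 8;   // full CRC-32 value

    static String of(String connectionTable, String significantText) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            // The separator keeps ("ab", "c") and ("a", "bc") from hashing identically.
            byte[] digest = sha.digest(
                (connectionTable + "\u0000" + significantText).getBytes(StandardCharsets.UTF_8));
            StringBuilder body = new StringBuilder();
            for (int i = 0; i < BODY_HEX / 2; i++) {
                body.append(String.format(Locale.ROOT, "%02X", digest[i] & 0xFF));
            }
            return body + crcHex(body.toString());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }

    /** Recompute the CRC over the body and compare it with the stored check digits. */
    static boolean isValid(String id) {
        return id.length() == BODY_HEX + CHECK_HEX
            && id.substring(BODY_HEX).equals(crcHex(id.substring(0, BODY_HEX)));
    }

    private static String crcHex(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.US_ASCII));
        return String.format(Locale.ROOT, "%08X", crc.getValue());
    }
}
```

CRC-32 detects every error burst of up to 32 bits, so a single mistyped hex digit in the body is always caught by `isValid`.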
What’s available now
• ISO charges for access to the official standards documents (which is why 5 IDMP standards are more profitable than one): about 158 CHF ($177 USD) from ISO for 11238 [between $120 and $340 online].
• However, as with many ISO standards, late drafts of
ISO 11238 are freely available on the internet.
• Caution: Many of the technical examples (all XML)
were removed from the final standard and are due to
appear in the upcoming “Implementation Guide(s)”.
Example requirement
• §3.4 “Naming of substances” states “at least one
substance name or company code shall be associated
with each substance”.
• For the envisioned workflows, this typically assumes that an INN or USAN name has already been assigned.
• One way to guarantee the existence of a suitable
substance name for investigational compounds is to
use IUPAC naming software (such as ChemAxon’s)
during submission to the unique coding authority.
• Plug: ChemAxon’s structure-to-name (s2n) coverage is state-of-the-art.
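The §3.4 rule and the suggested fallback can be sketched as a submission-time check. The `Substance` record and its fields are hypothetical (they are not ISO 11238 element names), and the structure-to-name step is passed in as a plain `java.util.function.Function` so that no particular naming API is assumed; in practice it could wrap ChemAxon's IUPAC naming.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of enforcing the §3.4 naming rule when a substance is submitted. */
final class NamingRule {
    /** Illustrative record shape; field names are assumptions, not taken from the standard. */
    record Substance(String structure, List<String> names, List<String> companyCodes) {}

    /** §3.4: at least one substance name or company code per substance. */
    static boolean satisfiesNamingRule(Substance s) {
        return hasNonBlank(s.names()) || hasNonBlank(s.companyCodes());
    }

    /**
     * If neither a name nor a company code exists (e.g. an investigational compound with no
     * INN/USAN yet), ask the supplied structure-to-name generator for a systematic name.
     */
    static Substance ensureNamed(Substance s, Function<String, String> structureToName) {
        if (satisfiesNamingRule(s)) return s;
        String generated = structureToName.apply(s.structure());
        if (generated == null || generated.isBlank()) {
            throw new IllegalArgumentException("no name could be generated for " + s.structure());
        }
        return new Substance(s.structure(), List.of(generated), s.companyCodes());
    }

    private static boolean hasNonBlank(List<String> values) {
        return values != null && values.stream().anyMatch(v -> v != null && !v.isBlank());
    }
}
```

Keeping the generator behind a `Function` means records that already carry an INN, USAN or company code never trigger a naming call.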
The devil is in the details
• One of the interesting cheminformatics challenges of working with the published ISO standard and the examples from the draft annex is the typography.
• The document has been typeset by editors with
expertise outside the field of cheminformatics who
have inadvertently changed whitespace without
appreciating the impact this has on chemistry tools.
Final ISO11238 standard Annex A
• §A.2.3 SMILES uses the example “C1 = CC = CC = C1”, where the spurious spaces break SMILES readers (whitespace terminates a SMILES string).
• §A.2.4 InChI both strips the “InChI=” prefix and again suffers from spaces: “1/C6H6 /c1-2-4-6-5-3-1/h1-6H”.
– Interestingly, this is an old-style InChI, not a standard InChI.
• §A.2.2 Molfile fails to mention that V2000 mol files use fixed-width columns and blank lines; as a result, the example given in the text (next slide) cannot easily be read.
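A minimal sketch of undoing the two line-notation typos above. It simply deletes all whitespace and, for InChI, restores the “InChI=” prefix; this is only safe for these Annex A examples, because a real SMILES may legitimately be followed by whitespace and a title. The old-style “1/” InChI is left as-is, since turning it into a standard “1S/” InChI would require regenerating it from a structure. `AnnexAFix` is a hypothetical name.

```java
/** Hypothetical clean-up of the two Annex A line-notation examples (not a general repair). */
final class AnnexAFix {
    /** Delete all whitespace; only valid when the string is known to carry no trailing title. */
    static String fixSmiles(String typeset) {
        return typeset.replaceAll("\\s+", "");
    }

    /** Delete all whitespace and restore the "InChI=" prefix if it was stripped. */
    static String fixInchi(String typeset) {
        String s = typeset.replaceAll("\\s+", "");
        return s.startsWith("InChI=") ? s : "InChI=" + s;
    }
}
```

With these two helpers, “C1 = CC = CC = C1” becomes “C1=CC=CC=C1” and the §A.2.4 example becomes “InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H”.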
Annex A: example.mol
ACD/Labs0812062058
6 6 0 0 0 0 0 0 0 0 1 V2000
1.9050 −0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
1.9050 −2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7531 −0.1282 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
0.7531 −2.7882 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
−0.3987 −0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
−0.3987 −2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
2 1 1 0 0 0 0
3 1 2 0 0 0 0
4 2 2 0 0 0 0
5 3 1 0 0 0 0
6 4 1 0 0 0 0
6 5 2 0 0 0 0
M END
$$$$
[Callouts: missing blank lines; incorrectly aligned columns]
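The fixed-width layout lost from example.mol can be restored with formatted output, as in this sketch. It assumes every line is still intact and its tokens are separated by whitespace, which holds for this example but not for the §B.2.4 one (where “0999” fuses two fields). It also maps the typographic minus sign (U+2212), as it appears in the coordinates above, to ASCII '-'. Two further fixes are left out of the code: “ACD/Labs0812062058” appears to be the program/timestamp line (line 2) of the three-line header, so a blank name line is needed before it and a blank comment line after it, and “M END” should read “M  END”. `V2000Columns` is a hypothetical name.

```java
import java.util.Locale;

/** Hypothetical re-padding of whitespace-separated V2000 lines into fixed-width columns. */
final class V2000Columns {
    /** Atom line: x, y, z as %10.4f, a space, a 3-column symbol, a 2-column field, then 3-column fields. */
    static String atomLine(String loose) {
        // Normalise typographic minus signs to ASCII '-' before parsing numbers.
        String[] t = loose.replace('\u2212', '-').trim().split("\\s+");
        StringBuilder sb = new StringBuilder(String.format(Locale.ROOT, "%10.4f%10.4f%10.4f %-3s",
                Double.parseDouble(t[0]), Double.parseDouble(t[1]), Double.parseDouble(t[2]), t[3]));
        sb.append(String.format(Locale.ROOT, "%2d", Integer.parseInt(t[4])));   // mass difference
        for (int i = 5; i < t.length; i++) {
            sb.append(String.format(Locale.ROOT, "%3d", Integer.parseInt(t[i])));
        }
        return sb.toString();
    }

    /** Counts or bond line: every integer right-justified in 3 columns, "V2000" kept as " V2000". */
    static String intFields(String loose) {
        StringBuilder sb = new StringBuilder();
        for (String tok : loose.trim().split("\\s+")) {
            if (tok.equals("V2000")) {
                sb.append(" V2000");
            } else {
                sb.append(String.format(Locale.ROOT, "%3d", Integer.parseInt(tok)));
            }
        }
        return sb.toString();
    }
}
```

Formatting with `Locale.ROOT` keeps the decimal point a '.', whatever the default locale of the machine running the repair.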
Benefit of the doubt?
• These unintentional typographical errors in the normative text may be the result of poor fonts, with the exception of the missing “InChI=” prefix.
• Alas, the content of the original Annex B from the draft indicates that these issues were more widespread and may arise from ignorance of cheminformatics file formats.
§B.2.2 InChI in XML Example
<STRUCTURAL_REPRESENTATION_TYPE>INCHI</STRUCTURAL_REPRESENTATION_TYPE>
<STRUCTURAL_REPRESENTATION>1S/C2H5NO2.AL.CLH.2H2O.ZR/C3-1-
2(4)5;;;;;/H1,3H2,(H,4,5);;1H;2*1H2;/Q;+3;;;;+4/P-
2</STRUCTURAL_REPRESENTATION>
[Callouts: missing “InChI=”; standard and non-standard InChI?; converted to upper case; indentation; spurious spaces; line breaks]
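Pulling the structure text out of such an XML record is straightforward; the interesting part is deciding which damage is safe to undo. This sketch, with the hypothetical name `StructuralRepresentation`, uses regular expressions rather than an XML parser for brevity (so entities such as `&amp;` are not decoded). For INCHI it deletes whitespace, indentation and line breaks, which can never be part of an InChI; it does not restore “InChI=” or the lost lower case. For MOL it returns the text untouched, because there whitespace is significant and needs a grammar-aware repair instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical extraction of the structure text from a §B.2.2-style XML fragment. */
final class StructuralRepresentation {
    private static final Pattern TYPE = Pattern.compile(
        "<STRUCTURAL_REPRESENTATION_TYPE>\\s*(.*?)\\s*</STRUCTURAL_REPRESENTATION_TYPE>", Pattern.DOTALL);
    // The literal '>' after REPRESENTATION stops this from matching the _TYPE element.
    private static final Pattern VALUE = Pattern.compile(
        "<STRUCTURAL_REPRESENTATION>(.*?)</STRUCTURAL_REPRESENTATION>", Pattern.DOTALL);

    static String extract(String xml) {
        Matcher type = TYPE.matcher(xml);
        Matcher value = VALUE.matcher(xml);
        if (!type.find() || !value.find()) {
            throw new IllegalArgumentException("structural representation not found");
        }
        String text = value.group(1);
        // InChIs never contain whitespace, so indentation and line breaks can simply be dropped.
        return type.group(1).equalsIgnoreCase("INCHI") ? text.replaceAll("\\s+", "") : text;
    }
}
```

Applied to the §B.2.2 example, this yields a single unbroken “1S/C2H5NO2.AL.CLH.2H2O.ZR/…” string that still needs its prefix and case restored.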
§B.2.4 V2000 Mol File in XML Example
<STRUCTURAL_REPRESENTATION_TYPE>MOL</STRUCTURAL_REPRESENTATION_TYPE>
<STRUCTURAL_REPRESENTATION>30 29 0 0 0 0 0 0 0 0999 V2000 9.9563 -7.3055 0.0000 Y
1 1 0 0 0 0 0 0 0 0 0 0 15.0355 -4.8847 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0 13.3609 -
8.0134 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 13.8867 -9.9869 0.0000 O 0 5 0 0 0 0 0 0 0 0 0
0 6.4178 -6.8678 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 5.8872 -4.8955 0.0000 O 0 5 0 0 0 0
0 0 0 0 0 0 6.7218 -5.7285 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 13.0541 -9.1519 0.0000 C
0 0 0 0 0 0 0 0 0 0 0 0 13.3408 -6.8634 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 13.8599 -
4.8881 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 13.0301 -5.7260 0.0000 C 0 0 0 0 0 0 0 0 0 0 0
0 5.9099 -9.9441 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0 6.4492 -7.9743 0.0000 O 0 0 0 0 0 0
0 0 0 0 0 0 6.7482 -9.1149 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.8605 -5.4221 0.0000 C 0
0 0 0 0 0 0 0 0 0 0 0 11.8897 -5.4263 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 11.9147 -9.4555
0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.8855 -9.4263 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
7.6897 -8.0305 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.6897 -6.8513 0.0000 C 0 0 0 0 0 0 0
0 0 0 0 0 8.7018 -6.2618 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 9.2908 -5.2506 0.0000 C 0 0
0 0 0 0 0 0 0 0 0 0 10.4700 -5.2524 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 11.0577 -6.2664
0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 12.0761 -6.8427 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
12.0891 -8.0218 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 8.7257 -8.5952 0.0000 N 0 0 0 0 0 0
0 0 0 0 0 0 11.0839 -8.6223 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 10.4848 -9.6275 0.0000
C 0 0 0 0 0 0 0 0 0 0 0 0 9.3057 -9.6139 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 10 2 1 0 0 0 0
8 3 2 0 0 0 0 25 24 1 0 0 0 0 8 4 1 0 0 0 0 27 18 1 0 0 0 0 7 5 2 0 0 0 0 26 28 1 0 0 0 0
7 6 1 0 0 0 0 19 27 1 0 0 0 0 15 7 1 0 0 0 0 20 21 1 0 0 0 0 17 8 1 0 0 0 0 30 27 1 0 0 0
0 11 9 2 0 0 0 0 30 29 1 0 0 0 0 11 10 1 0 0 0 0 20 19 1 0 0 0 0 16 11 1 0 0 0 0 22 21 1
0 0 0 0 14 12 1 0 0 0 0 23 24 1 0 0 0 0 14 13 2 0 0 0 0 18 14 1 0 0 0 0 26 25 1 0 0 0 0
21 15 1 0 0 0 0 29 28 1 0 0 0 0 24 16 1 0 0 0 0 23 22 1 0 0 0 0 28 17 1 0 0 0 0 M CHG 4
1 3 4 -1 6 -1 12 -1 M ISO 1 1 90 M END </STRUCTURAL_REPRESENTATION>
Where to begin?
All is not lost!
• Back at the 2011 ChemAxon UGM here in Budapest, Sorel Muresan from AstraZeneca Sweden gave a presentation on how spelling correction improves the recall of ChemAxon’s name-to-structure tools.
• The same CaffeineFix technology can be applied to perform aggressive “spelling correction” on SMILES strings, InChIs and V2000 mol files.
• As with IUPAC-like systematic names, these can each
be specified by a formal grammar.
How the algorithm works
• The regular expression describing V2000 mol files is compiled into a “finite state machine” (FSM) with 1333 states.
• The only allowed “corrections” are the deletion of
new lines and the insertion of spaces or new lines,
but only where permitted in the grammar/FSM.
• Depth-first recursion is used to identify a minimal set
of edits to correct the input.
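The idea of searching for a minimal set of edits against a grammar can be shown on a much smaller scale. This toy is not NextMove's FixMolFile: its grammar is a single line of right-justified unsigned integer fields of given widths followed by a literal tail (enough for a V2000 bond or counts line), its only edit is inserting a space, and it uses iterative deepening so that the first depth-first solution found uses the fewest insertions. `FixedWidthRepair` is a hypothetical name.

```java
/** Hypothetical toy: repair one collapsed fixed-width line by inserting the fewest spaces. */
final class FixedWidthRepair {
    private final int[] widths;  // widths of right-justified unsigned integer fields
    private final String tail;   // literal text that must follow the last field

    FixedWidthRepair(int[] widths, String tail) {
        this.widths = widths.clone();
        this.tail = tail;
    }

    /** Iterative deepening: the first budget that succeeds gives a minimal number of insertions. */
    String repair(String input, int maxInserts) {
        for (int budget = 0; budget <= maxInserts; budget++) {
            StringBuilder out = new StringBuilder();
            if (dfs(input, 0, 0, 0, false, budget, out)) return out.toString();
        }
        return null; // no repair within maxInserts
    }

    // i: input position; f: current field (== widths.length while matching the tail);
    // c: column within the field or tail; inNumber: a digit has already been placed in this field.
    private boolean dfs(String in, int i, int f, int c, boolean inNumber, int budget, StringBuilder out) {
        if (f == widths.length) {
            if (c == tail.length()) return i == in.length();
            return i < in.length() && in.charAt(i) == tail.charAt(c)
                && step(in, i + 1, f, c + 1, false, budget, out, in.charAt(i));
        }
        if (c == widths[f]) return dfs(in, i, f + 1, 0, false, budget, out); // field complete
        boolean padAllowed = !inNumber && c < widths[f] - 1; // leave room for at least one digit
        if (i < in.length()) {
            char ch = in.charAt(i);
            boolean isDigit = ch >= '0' && ch <= '9';
            if ((isDigit || (ch == ' ' && padAllowed))
                    && step(in, i + 1, f, c + 1, inNumber || isDigit, budget, out, ch)) return true;
        }
        // Edit operation: insert a padding space without consuming any input.
        return budget > 0 && padAllowed && step(in, i, f, c + 1, false, budget - 1, out, ' ');
    }

    /** Append one output character, recurse, and undo the append if that branch fails. */
    private boolean step(String in, int i, int f, int c, boolean inNumber, int budget,
                         StringBuilder out, char emitted) {
        out.append(emitted);
        if (dfs(in, i, f, c, inNumber, budget, out)) return true;
        out.setLength(out.length() - 1);
        return false;
    }
}
```

For example, `new FixedWidthRepair(new int[]{3, 3, 3, 3, 3, 3, 3}, "").repair("10 2 1 0 0 0 0", 20)` returns `" 10  2  1  0  0  0  0"`. A minimal edit count does not always pick a unique answer: in this toy grammar the “0999” of the §B.2.4 counts line could become “  0” + “999” or “ 09” + “ 99” at the same cost, so a practical tool needs extra knowledge to break such ties.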
§B.2.4 example after correction
30 29 0 0 0 0 0 0 0 0999 V2000
9.9563 -7.3055 0.0000 Y 1 1 0 0 0 0 0 0 0 0 0 0
15.0355 -4.8847 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0
13.3609 -8.0134 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
13.8867 -9.9869 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0
6.4178 -6.8678 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
5.8872 -4.8955 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0
6.7218 -5.7285 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
13.0541 -9.1519 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0
13.3408 -6.8634 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0
13.8599 -4.8881 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0
...
21 15 1 0 0 0 0
29 28 1 0 0 0 0
24 16 1 0 0 0 0
23 22 1 0 0 0 0
28 17 1 0 0 0 0
M CHG 4 1 3 4 -1 6 -1 12 -1
M ISO 1 1 90
M END
[Callout: 3-line header block before the counts line]
ChemAxon toolkit implementation
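import chemaxon.formats.MolFormatException;
import chemaxon.formats.MolImporter;
import chemaxon.struc.Molecule;

// Import with the standard Marvin/JChem reader first; only if that fails,
// run the FixMolFile repair (from the forum post below) and retry once.
public static Molecule molFileToChemaxonMol(String molFileStr)
        throws MolFormatException {
    try {
        return MolImporter.importMol(molFileStr);
    } catch (MolFormatException e) {
        String fixed = FixMolFile.fixMolFile(molFileStr); // null when no repair is found
        if (fixed == null) {
            throw e; // rethrow the original parse error
        }
        return MolImporter.importMol(fixed);
    }
}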
// Java source code available at http://www.chemaxon.com/forum/ftopic1265.html
Geek of the week
• A particularly tricky corner case concerns Accelrys’ Pipeline Pilot-style V2000 mol files, which abbreviate the columns in the atom block (to save space).
• In these files there’s potential ambiguity where the
first bond line is mistaken as a continuation of the
last (abbreviated) atom line.
• Our solution relies on the atom stereo care field
being zero in non-query mol files vs. the non-zero
values that appear in the first three fields of bonds.
Lest we forget
• A similar “spelling correction” variant, which allows uppercase characters to be mapped to lowercase and the prefix “InChI=” to magically appear at the start of a string, can also be used to fix ISO InChIs.
• Alas, uppercasing an InChI (or any molecular formula) is potentially lossy, e.g. “CsN” vs. “CSn”.
Before and after InChI example
Before:
1S/C17H21CLN4O/C1-22-12-3-2-4-13(22)8-11(7-12)21-17(23)14-5-10(18)6-15-16(14)20-9-19-15/H5-6,9,11-13H,2-4,7-8H2,1H3,(H,19,20)(H,21,23)
After:
InChI=1S/C17H21ClN4O/c1-22-12-3-2-4-13(22)8-11(7-12)21-17(23)14-5-10(18)6-15-16(14)20-9-19-15/h5-6,9,11-13H,2-4,7-8H2,1H3,(H,19,20)(H,21,23)
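The lossiness can be examined for a single formula by enumerating every way an uppercased formula can be re-cased into element symbols. This sketch (hypothetical class `FormulaRecase`) assumes a single-component formula, such as the formula layer of the example above, and prunes candidates with the Hill ordering that InChI formulas follow (C first, then H, then the other elements alphabetically; purely alphabetical when there is no carbon). It does not lowercase the layer letters or restore “InChI=”, and when more than one candidate survives it makes no attempt to choose between them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical enumeration of Hill-ordered re-casings of an uppercased single-component formula. */
final class FormulaRecase {
    private static final Set<String> ELEMENTS = Set.of((
        "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge "
      + "As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In Sn Sb Te I Xe Cs Ba La Ce Pr Nd Pm "
      + "Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th "
      + "Pa U Np Pu Am Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og").split(" "));

    static List<String> recase(String upper) {
        List<String> results = new ArrayList<>();
        walk(upper, 0, new ArrayList<>(), new StringBuilder(), results);
        return results;
    }

    // Depth-first: at each position try a 1- or 2-letter element symbol, then its optional count.
    private static void walk(String s, int pos, List<String> symbols, StringBuilder sb, List<String> results) {
        if (pos == s.length()) {
            if (isHillOrder(symbols)) results.add(sb.toString());
            return;
        }
        for (int len = 1; len <= 2 && pos + len <= s.length(); len++) {
            String sym = s.charAt(pos) + s.substring(pos + 1, pos + len).toLowerCase(Locale.ROOT);
            if (!ELEMENTS.contains(sym)) continue;
            int end = pos + len;
            while (end < s.length() && s.charAt(end) >= '0' && s.charAt(end) <= '9') end++;
            int mark = sb.length();
            sb.append(sym).append(s, pos + len, end);
            symbols.add(sym);
            walk(s, end, symbols, sb, results);
            symbols.remove(symbols.size() - 1);
            sb.setLength(mark);
        }
    }

    private static boolean isHillOrder(List<String> symbols) {
        List<String> rest = new ArrayList<>(symbols);
        if (rest.contains("C")) {
            if (!rest.get(0).equals("C")) return false;          // carbon must come first
            rest.remove(0);
            if (!rest.isEmpty() && rest.get(0).equals("H")) rest.remove(0);
            if (rest.contains("C") || rest.contains("H")) return false;
        }
        for (int i = 1; i < rest.size(); i++) {
            if (rest.get(i - 1).compareTo(rest.get(i)) >= 0) return false; // sorted, no repeats
        }
        return true;
    }
}
```

With Hill ordering, “CSN” still has two readings, CSn and CsN, which is exactly the ambiguity noted above, whereas “C17H21CLN4O” has the single reading C17H21ClN4O because neither “L” nor “Ln” is an element symbol.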
How common are the ambiguities?
• 1.35 million standard InChIs from ChEMBL
• Uppercase the InChIs, fix them and check
whether the original InChI can be regenerated
• 99.5% roundtrip (6596 discrepancies)
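As a quick consistency check on these figures (treating 1.35 million as approximate), the discrepancy rate is:

```latex
\frac{6596}{1.35 \times 10^{6}} \approx 0.0049
\qquad\Rightarrow\qquad
1 - 0.0049 \approx 0.995 = 99.5\%
```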
InChI case-insensitive ambiguities
Conclusions
• The Java source code for recovering V2000 mol files and InChIs from the types of corruption seen in the ISO 11238 draft has now been contributed to the ChemAxon forum, allowing Marvin and JChem to read the examples given in that document.
• Whether this functionality will be required to fully
support the final (pending) “Implementation Guide”
requirements remains to be seen (and voted upon).
• Attention to detail is important in standards writing.
Final words
• ISO 11238 IDs may become as popular as
Chemical Abstracts’ registry numbers.
Acknowledgements
• Daniel Lowe, NextMove Software, Cambridge, UK.
• Richard Bolton, GSK, Stevenage, UK.
• Evan Bolton, NCBI PubChem, Bethesda, MD, USA.
• Dac-Trung Nguyen, NIH NCATS, Rockville, MD, USA.
• Tyler Peryea, NIH NCATS, Rockville, MD, USA.
• Noel Southall, NIH NCATS, Rockville, MD, USA.
• Yulia Borodina, FDA, Silver Spring, MD, USA.
• Lawrence Callahan, FDA, Silver Spring, MD, USA.
• Andrew Marr, Marr Consultancy, Knebworth, UK.
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 

Recently uploaded (20)

Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 

EUGM 2014 - Roger Sayle (NextMove Software): Implementing ISO standard 11238 compliance with ChemAxon tools

  • 1. Implementing ISO 11238 Standard Compliance with ChemAxon Tools • Roger Sayle, NextMove Software, Cambridge, UK • ChemAxon User Group Meeting 2014, Budapest, Hungary, Wednesday 21st May 2014
  • 2. What is ISO 11238? • ISO standard 11238 is entitled “Health Informatics – Identification of medicinal products – Data elements and structures for the unique identification and exchange of regulated information on substances”. • It defines a framework for uniquely identifying and exchanging compounds of pharmaceutical interest. • The framework serves a similar role to CAS registry numbers, PubChem CIDs or InChIKeys, assigning unique identifiers to substances.
  • 3. Meet the (IDMP) family • 11238 is one of a suite of 5 related standards, all for “unique identification and exchange of …” – 11238 “… regulated information on substances”. – 11239 “… dose forms, units, administration, etc.”. – 11240 “… units of measurement”. – 11615 “… regulated medicinal product information”. – 11616 “… regulated pharmaceutical product information”.
  • 4. Why is 11238 important? • EU regulation 520/2012 on “pharmacovigilance” requires countries, regulatory authorities and pharma to adopt the five IDMP standards (articles 25 and 26) by 1st July 2016 (article 40). • Executive summary: It’s the law!
  • 5. How it works [Diagram: a Code Assignment authority and a Code Look-up authority, each linking a Unique Code with a substance's Name/Identifier, Connection Table and Properties (Significant Text).]
  • 6. Likely implementation [Diagram: the same workflow, annotated with likely implementations: FDA UNII; FDA SRS Search; FDA UNII XML; INN/USAN/CID; FDA/NCATS GInAS; MOL2000/SMILES/InChI; Protein/NA Sequence; ISO11238 Groups 1-4.]
  • 7. Current status • The standard has been ratified and its use has been written into EU law (EU Reg. 520/2012). • The framework requires non-semantic, random, fixed-length unique identifiers that include an internal integrity check. • The standard also details constraints on uniqueness. • Exact implementation details are yet to be determined (to appear in a future “Implementation Guide”).
  • 8. What will the future look like? • ISO 11238-compliant identifiers will be very similar to the FDA’s UNII (UNique Ingredient Identifier). • The fixed-width, non-semantic identifier requirement rules out the use of plain SMILES, InChI, V2000 mol files and similar encodings. • The randomness requirement rules out plain CAS registry numbers, PubChem CIDs and ChEMBL IDs (which use sequential or monotonic number assignment). • Alternatively, InChIKeys or similar hashes (with CRC checks) of connection tables plus text may be possible.
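To make the last bullet concrete, here is a minimal sketch of a fixed-length, non-semantic identifier derived from a hash of a connection table plus significant text, with a trailing check character as the internal integrity check. Everything here is an assumption for illustration: the class name `HashedSubstanceId`, the SHA-256 hash, the 31-symbol alphabet, the 9+1 length and the weighted-sum check are not the UNII, InChIKey or any ISO-specified scheme. Note also that a hash is deterministic rather than truly random.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: fixed-length ID = 9 hash-derived symbols + 1 check symbol.
// Alphabet, length and check scheme are illustrative assumptions only.
final class HashedSubstanceId {
    // 31 symbols (a prime): digits plus letters, skipping I and O, truncated at W.
    static final String ALPHABET = "0123456789ABCDEFGHJKLMNPQRSTUVW";
    static final int BODY_LENGTH = 9;

    static String make(String connectionTable, String significantText) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-256").digest(
                    (connectionTable + '\0' + significantText).getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
        StringBuilder body = new StringBuilder();
        for (int i = 0; i < BODY_LENGTH; i++) {
            // Map each digest byte onto the alphabet (slightly biased; fine for a sketch).
            body.append(ALPHABET.charAt((digest[i] & 0xFF) % ALPHABET.length()));
        }
        return body.append(checkChar(body)).toString();
    }

    // Position-weighted sum mod 31. Because 31 is prime and larger than every weight
    // and every symbol difference, any single-symbol substitution or adjacent swap
    // within the body changes the check character.
    static char checkChar(CharSequence body) {
        int sum = 0;
        for (int i = 0; i < body.length(); i++) {
            sum += (i + 1) * ALPHABET.indexOf(body.charAt(i));
        }
        return ALPHABET.charAt(sum % ALPHABET.length());
    }

    static boolean isValid(String id) {
        return id.length() == BODY_LENGTH + 1
                && id.chars().allMatch(c -> ALPHABET.indexOf(c) >= 0)
                && checkChar(id.substring(0, BODY_LENGTH)) == id.charAt(BODY_LENGTH);
    }
}
```

In practice the connection table would first have to be canonicalised (for example as a standard InChI) so that different drawings of one substance hash to the same code, and the coding authority would still need to detect hash collisions.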
  • 9. What’s available now • ISO charges for access to official standards documents (which is why five IDMP standards are more profitable than one): about 158 CHF ($177 USD) from ISO for 11238 [between $120 and $340 online]. • However, as with many ISO standards, late drafts of ISO 11238 are freely available on the internet. • Caution: many of the technical examples (all XML) were removed from the final standard and are due to appear in the upcoming “Implementation Guide(s)”.
  • 10. Example requirement • §3.4 “Naming of substances” states “at least one substance name or company code shall be associated with each substance”. • For the envisioned workflows, this typically assumes that an INN or USAN name has already been assigned. • One way to guarantee that a suitable substance name exists for investigational compounds is to use IUPAC naming software (such as ChemAxon’s) during submission to the unique coding authority. • Plug: ChemAxon s2n coverage is state-of-the-art.
  • 11. The devil is in the details • One of the interesting cheminformatics challenges of working with the published ISO standard and the examples from the draft annex is the typography. • The document has been typeset by editors with expertise outside the field of cheminformatics, who have inadvertently changed whitespace without appreciating the impact this has on chemistry tools.
  • 12. Final ISO11238 standard Annex A • §A.2.3 SMILES uses the example “C1 = CC = CC = C1”, where the spurious spaces create problems for SMILES readers. • §A.2.4 InChI both strips the “InChI=” prefix and again suffers from spaces: “1/C6H6 /c1-2-4-6-5-3-1/h1-6H”. – Interestingly, this is an old InChI, not a standard InChI. • §A.2.2 Molfile fails to mention that V2000 mol files use fixed-width columns and blank lines; as a result, the example given in the text [next slide] can’t easily be read.
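The two string problems on this slide have simple repairs, sketched below under the assumption that every whitespace character inside the quoted string is spurious and that the string holds a single structure. The class and method names are hypothetical and not part of any ChemAxon API. The SMILES repair relies on the fact that a SMILES string cannot contain internal whitespace: a space terminates it.

```java
// Hypothetical helpers for the two Annex A strings (not a ChemAxon API).
final class AnnexAStrings {
    // A space ends a SMILES string, so spaces inside a SMILES quoted in
    // running text can be assumed to be typesetting debris.
    static String repairSmiles(String s) {
        return s.replaceAll("\\s+", "");
    }

    // Remove spurious whitespace and restore the mandatory "InChI=" prefix.
    static String repairInchi(String s) {
        String t = s.replaceAll("\\s+", "");
        return t.startsWith("InChI=") ? t : "InChI=" + t;
    }
}
```

Blindly deleting whitespace is only safe when the input is known to be a lone SMILES; a SMILES followed by a title (for example a line from a .smi file) would be corrupted, which is one reason grammar-aware correction is preferable.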
  • 13. Annex A: example.mol (callouts: missing blank lines; incorrectly aligned columns) ACD/Labs0812062058 6 6 0 0 0 0 0 0 0 0 1 V2000 1.9050 −0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 1.9050 −2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.7531 −0.1282 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 0.7531 −2.7882 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 −0.3987 −0.7932 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 −0.3987 −2.1232 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 2 1 1 0 0 0 0 3 1 2 0 0 0 0 4 2 2 0 0 0 0 5 3 1 0 0 0 0 6 4 1 0 0 0 0 6 5 2 0 0 0 0 M END $$$$
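For contrast with the damaged example above, the sketch below writes the same benzene coordinates in the fixed-column V2000 layout: a 3-line header (name, program/timestamp, comment, where blank lines must be kept), a counts line of 3-character integer fields followed by " V2000", atom lines with three 10.4 coordinates, a left-justified 3-character symbol and 2- or 3-wide integer fields, and 3-wide bond fields. The class `V2000Writer` is a hypothetical helper, all unused fields are written as zero, and the "additional properties" count is set to the conventional 999 rather than the 1 used in the annex.

```java
import java.util.Locale;

// Hypothetical sketch: emit V2000 lines with the fixed column widths the format requires.
final class V2000Writer {
    // Counts line: atoms, bonds, eight zero fields, then 999 and the version tag.
    static String countsLine(int atoms, int bonds) {
        return String.format(Locale.ROOT, "%3d%3d", atoms, bonds) + "  0".repeat(8) + "999 V2000";
    }

    // Atom line: x, y, z as 10.4 fixed-point, a space, the symbol left-justified in 3,
    // mass difference in 2, charge in 3, then ten further 3-wide fields (all zero here).
    static String atomLine(double x, double y, double z, String symbol) {
        return String.format(Locale.ROOT, "%10.4f%10.4f%10.4f %-3s%2d%3d", x, y, z, symbol, 0, 0)
                + "  0".repeat(10);
    }

    // Bond line: first atom, second atom, bond type, stereo, then three unused fields.
    static String bondLine(int from, int to, int type) {
        return String.format(Locale.ROOT, "%3d%3d%3d%3d", from, to, type, 0) + "  0".repeat(3);
    }

    // Blank name and comment lines are kept so the counts line stays on line 4.
    static String molfile(String program, double[][] xyz, String[] symbols, int[][] bonds) {
        StringBuilder sb = new StringBuilder();
        sb.append('\n').append(program).append('\n').append('\n');
        sb.append(countsLine(symbols.length, bonds.length)).append('\n');
        for (int i = 0; i < symbols.length; i++) {
            sb.append(atomLine(xyz[i][0], xyz[i][1], xyz[i][2], symbols[i])).append('\n');
        }
        for (int[] b : bonds) {
            sb.append(bondLine(b[0], b[1], b[2])).append('\n');
        }
        return sb.append("M  END\n").toString();
    }
}
```

`Locale.ROOT` is passed so that the coordinates always use a decimal point, whatever the platform's default locale. The first atom line comes out as `    1.9050   -0.7932    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0`, 69 characters wide.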
  • 14. Benefit of the doubt? • These unintentional typographical errors in the normative text may perhaps be the result of poor fonts, with the exception of the missing “InChI=”. • Alas, the content of the original Annex B from the draft indicates that these issues were more widespread and may arise from ignorance of cheminformatics file formats.
  • 15. §B.2.2 InChI in XML Example <STRUCTURAL_REPRESENTATION_TYPE>INCHI</STRUCTURAL_REPRESENTATION_TYPE> <STRUCTURAL_REPRESENTATION>1S/C2H5NO2.AL.CLH.2H2O.ZR/C3-1- 2(4)5;;;;;/H1,3H2,(H,4,5);;1H;2*1H2;/Q;+3;;;;+4/P- 2</STRUCTURAL_REPRESENTATION> (callouts: missing “InChI=”; standard and non-standard InChI?; converted to upper case; indentation; spurious spaces; line breaks)
  • 16. §B.2.4 V2000 Mol File in XML Example <STRUCTURAL_REPRESENTATION_TYPE>MOL</STRUCTURAL_REPRESENTATION_TYPE> <STRUCTURAL_REPRESENTATION>30 29 0 0 0 0 0 0 0 0999 V2000 9.9563 -7.3055 0.0000 Y 1 1 0 0 0 0 0 0 0 0 0 0 15.0355 -4.8847 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0 13.3609 - 8.0134 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 13.8867 -9.9869 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0 6.4178 -6.8678 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 5.8872 -4.8955 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0 6.7218 -5.7285 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 13.0541 -9.1519 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 13.3408 -6.8634 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 13.8599 - 4.8881 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 13.0301 -5.7260 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 5.9099 -9.9441 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0 6.4492 -7.9743 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 6.7482 -9.1149 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.8605 -5.4221 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 11.8897 -5.4263 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 11.9147 -9.4555 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.8855 -9.4263 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.6897 -8.0305 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 7.6897 -6.8513 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 8.7018 -6.2618 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 9.2908 -5.2506 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 10.4700 -5.2524 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 11.0577 -6.2664 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 12.0761 -6.8427 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 12.0891 -8.0218 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 8.7257 -8.5952 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 11.0839 -8.6223 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 10.4848 -9.6275 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 9.3057 -9.6139 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 10 2 1 0 0 0 0 8 3 2 0 0 0 0 25 24 1 0 0 0 0 8 4 1 0 0 0 0 27 18 1 0 0 0 0 7 5 2 0 0 0 0 26 28 1 0 0 0 0 7 6 1 0 0 0 0 19 27 1 0 0 0 0 15 7 1 0 0 0 0 20 21 1 0 0 0 0 17 8 1 0 0 0 0 30 27 1 0 0 0 0 11 9 2 0 0 0 0 30 29 1 0 0 0 0 11 10 1 0 0 0 0 20 19 1 0 0 0 0 16 11 1 0 0 0 0 22 21 1 0 0 0 0 14 12 1 0 0 0 0 23 24 1 0 0 0 0 14 13 2 0 0 0 0 18 14 1 0 0 0 0 26 25 1 0 0 0 
0 21 15 1 0 0 0 0 29 28 1 0 0 0 0 24 16 1 0 0 0 0 23 22 1 0 0 0 0 28 17 1 0 0 0 0 M CHG 4 1 3 4 -1 6 -1 12 -1 M ISO 1 1 90 M END </STRUCTURAL_REPRESENTATION> Where to begin?
  • 17. All is not lost! • Back at the 2011 ChemAxon UGM here in Budapest, Sorel Muresan from AstraZeneca Sweden gave a presentation on how spelling correction improves the recall of ChemAxon’s name-to-structure tools. • The exact same CaffeineFix technology can be applied to perform aggressive “spelling correction” on SMILES strings, InChIs and V2000 mol files. • As with IUPAC-like systematic names, each of these can be specified by a formal grammar.
  • 18. How the algorithm works • The regular expression describing V2000 mol files is compiled into a “finite state machine” (FSM) with 1333 states. • The only allowed “corrections” are the deletion of newlines and the insertion of spaces or newlines, and only where permitted by the grammar/FSM. • Depth-first recursion is used to identify a minimal set of edits that corrects the input.
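The three bullets above can be illustrated on a much smaller grammar. This is a hypothetical sketch, not NextMove's CaffeineFix code and not the 1333-state V2000 machine: the toy grammar stands in for a V2000 bond block, where each line holds a fixed number of right-justified, 3-character integer fields and ends with a newline. The hand-written transition function plays the role of the FSM, the only edits are the three the slide allows (insert a space, insert a newline, delete a newline) and only where the FSM permits, and iterative deepening over a depth-first search guarantees the first repair found uses the fewest edits. The class name `FixedWidthRepair` and its methods are assumptions for illustration.

```java
// Hypothetical sketch of grammar-guided repair on a toy fixed-width grammar.
final class FixedWidthRepair {
    private static final int WIDTH = 3;

    // FSM state: field index in the line, column within the field,
    // and whether the current field has started its digits.
    private record State(int field, int col, boolean digits) {}

    private static final State LINE_START = new State(0, 0, false);

    private final int fields;

    FixedWidthRepair(int fields) {
        this.fields = fields;
    }

    // Transition function; returns null where the grammar forbids ch.
    private State step(State s, char ch) {
        if (s.field() == fields) {                    // line complete: only '\n' may follow
            return ch == '\n' ? LINE_START : null;
        }
        boolean lastCol = s.col() == WIDTH - 1;
        if (ch == ' ') {                               // padding only before the digits
            return s.digits() || lastCol ? null : new State(s.field(), s.col() + 1, false);
        }
        if (Character.isDigit(ch)) {
            return lastCol ? new State(s.field() + 1, 0, false)
                           : new State(s.field(), s.col() + 1, true);
        }
        return null;
    }

    // Depth-first search over (input position, FSM state) using at most `budget` edits.
    private boolean search(String in, int i, State s, int budget, StringBuilder out) {
        if (i == in.length() && s.equals(LINE_START)) {
            return true;
        }
        if (i < in.length()) {                         // keep the next input character
            State t = step(s, in.charAt(i));
            if (t != null) {
                out.append(in.charAt(i));
                if (search(in, i + 1, t, budget, out)) return true;
                out.setLength(out.length() - 1);
            }
        }
        if (budget == 0) {
            return false;
        }
        if (i < in.length() && in.charAt(i) == '\n'   // delete a spurious newline
                && search(in, i + 1, s, budget - 1, out)) {
            return true;
        }
        for (char ins : new char[] {' ', '\n'}) {     // insert a space or newline
            State t = step(s, ins);
            if (t != null) {
                out.append(ins);
                if (search(in, i, t, budget - 1, out)) return true;
                out.setLength(out.length() - 1);
            }
        }
        return false;
    }

    // Returns the repaired text, or null if no repair needs maxEdits edits or fewer.
    String repair(String in, int maxEdits) {
        for (int budget = 0; budget <= maxEdits; budget++) {
            StringBuilder out = new StringBuilder();
            if (search(in, 0, LINE_START, budget, out)) {
                return out.toString();
            }
        }
        return null;
    }
}
```

For example, `new FixedWidthRepair(7).repair("2 1 1 0 0 0 0\n", 10)` restores the collapsed bond line to `"  2  1  1  0  0  0  0\n"` using eight space insertions. Because the FSM rejects a path as soon as a character is impossible, most branches die after a few steps, which is what keeps the exhaustive search practical.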
  • 19. §B.2.4 example after correction (callout: 3-line header block before the counts line) 30 29 0 0 0 0 0 0 0 0999 V2000 9.9563 -7.3055 0.0000 Y 1 1 0 0 0 0 0 0 0 0 0 0 15.0355 -4.8847 0.0000 * 0 0 0 0 0 0 0 0 0 0 0 0 13.3609 -8.0134 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 13.8867 -9.9869 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0 6.4178 -6.8678 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 5.8872 -4.8955 0.0000 O 0 5 0 0 0 0 0 0 0 0 0 0 6.7218 -5.7285 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 13.0541 -9.1519 0.0000 C 0 0 0 0 0 0 0 0 0 0 0 0 13.3408 -6.8634 0.0000 O 0 0 0 0 0 0 0 0 0 0 0 0 13.8599 -4.8881 0.0000 N 0 0 0 0 0 0 0 0 0 0 0 0 ... 21 15 1 0 0 0 0 29 28 1 0 0 0 0 24 16 1 0 0 0 0 23 22 1 0 0 0 0 28 17 1 0 0 0 0 M CHG 4 1 3 4 -1 6 -1 12 -1 M ISO 1 1 90 M END
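  • 20. ChemAxon toolkit implementation
      import chemaxon.formats.MolFormatException;
      import chemaxon.formats.MolImporter;
      import chemaxon.struc.Molecule;

      // Try a normal import first; only if it fails, attempt the grammar-guided repair.
      public static Molecule molFileToChemaxonMol(String molFileStr) throws MolFormatException {
          try {
              return MolImporter.importMol(molFileStr);
          } catch (MolFormatException e) {
              // FixMolFile is the repair class posted to the ChemAxon forum (link below).
              molFileStr = FixMolFile.fixMolFile(molFileStr);
              if (molFileStr == null) {
                  throw e;  // no repair found: report the original parse error
              }
              return MolImporter.importMol(molFileStr);
          }
      }
      // Java source code available at http://www.chemaxon.com/forum/ftopic1265.html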
  • 21. Geek of the week • A particularly tricky corner case concerns Accelrys’ Pipeline Pilot-style V2000 mol files, which abbreviate the columns in the atom block (to save space). • In these files there’s a potential ambiguity where the first bond line is mistaken for a continuation of the last (abbreviated) atom line. • Our solution relies on the atom stereo care field being zero in non-query mol files vs. the non-zero values that appear in the first three fields of bonds.
  • 22. Lest we forget • A similar “spelling correction” variant, which allows uppercase characters to be mapped to lowercase and the prefix “InChI=” to magically appear at the start of a string, can also be used to fix ISO InChIs. • Alas, uppercasing an InChI (or any molecular formula) is potentially lossy, e.g. “CsN” vs. “CSn”.
  • 23. Before and after InChI example • Before: 1S/C17H21CLN4O/C1-22-12-3-2-4-13(22)8-11(7-12)21-17(23)14-5-10(18)6-15-16(14)20-9-19-15/H5-6,9,11-13H,2-4,7-8H2,1H3,(H,19,20)(H,21,23) • After: InChI=1S/C17H21ClN4O/c1-22-12-3-2-4-13(22)8-11(7-12)21-17(23)14-5-10(18)6-15-16(14)20-9-19-15/h5-6,9,11-13H,2-4,7-8H2,1H3,(H,19,20)(H,21,23)
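Slides 22 and 23 can be approximated with a much simpler tool than an FSM-based corrector. The sketch below is a hypothetical illustration (the class `InchiCaseFixer` is not NextMove's or ChemAxon's code): it restores the "InChI=" prefix, lowercases the single-letter prefix of every layer after the formula (InChI layer prefixes such as /c, /h, /q and /p are lowercase), and enumerates every way the uppercased formula can be re-split into valid element symbols. It assumes the input has already had spurious whitespace removed, and it does not handle sublayers that repeat a formula (such as /f or /r).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: candidate InChIs from an uppercased, prefix-less string.
final class InchiCaseFixer {
    private static final Set<String> ELEMENTS = Set.of(
        "H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne", "Na", "Mg", "Al", "Si",
        "P", "S", "Cl", "Ar", "K", "Ca", "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni",
        "Cu", "Zn", "Ga", "Ge", "As", "Se", "Br", "Kr", "Rb", "Sr", "Y", "Zr", "Nb",
        "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd", "In", "Sn", "Sb", "Te", "I", "Xe",
        "Cs", "Ba", "La", "Ce", "Pr", "Nd", "Pm", "Sm", "Eu", "Gd", "Tb", "Dy", "Ho",
        "Er", "Tm", "Yb", "Lu", "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg",
        "Tl", "Pb", "Bi", "Po", "At", "Rn", "Fr", "Ra", "Ac", "Th", "Pa", "U", "Np",
        "Pu", "Am", "Cm", "Bk", "Cf", "Es", "Fm", "Md", "No", "Lr", "Rf", "Db", "Sg",
        "Bh", "Hs", "Mt", "Ds", "Rg", "Cn", "Nh", "Fl", "Mc", "Lv", "Ts", "Og");

    // Every element-case restoration of an uppercased formula layer.
    static List<String> formulas(String upper) {
        List<String> out = new ArrayList<>();
        expand(upper, 0, new StringBuilder(), out);
        return out;
    }

    private static void expand(String s, int i, StringBuilder sb, List<String> out) {
        if (i == s.length()) {
            out.add(sb.toString());
            return;
        }
        int mark = sb.length();
        char c = s.charAt(i);
        if (!Character.isLetter(c)) {                   // counts and '.' are case-free
            sb.append(c);
            expand(s, i + 1, sb, out);
        } else {
            String one = String.valueOf(c);
            if (ELEMENTS.contains(one)) {               // one-letter symbol, e.g. "C"
                sb.append(one);
                expand(s, i + 1, sb, out);
                sb.setLength(mark);
            }
            if (i + 1 < s.length() && Character.isLetter(s.charAt(i + 1))) {
                String two = one + Character.toLowerCase(s.charAt(i + 1));
                if (ELEMENTS.contains(two)) {           // two-letter symbol, e.g. "Cl"
                    sb.append(two);
                    expand(s, i + 2, sb, out);
                }
            }
        }
        sb.setLength(mark);
    }

    // Candidate InChIs for an uppercased string such as "1S/C6H6/C1-2-4-6-5-3-1/H1-6H".
    static List<String> fix(String upper) {
        String[] layers = upper.split("/", -1);
        if (layers.length < 2) {
            return List.of();
        }
        StringBuilder tail = new StringBuilder();
        for (int k = 2; k < layers.length; k++) {      // "/C1-2..." becomes "/c1-2..."
            String layer = layers[k];
            tail.append('/');
            if (!layer.isEmpty()) {
                tail.append(Character.toLowerCase(layer.charAt(0))).append(layer, 1, layer.length());
            }
        }
        List<String> result = new ArrayList<>();
        for (String formula : formulas(layers[1])) {
            result.add("InChI=" + layers[0] + "/" + formula + tail);
        }
        return result;
    }
}
```

On the slide 23 "before" string this returns exactly the single "after" InChI, because "CL" can only be read as chlorine. On "CSN" it returns three candidates, CSN, CSn and CsN, which is the lossiness slide 22 warns about; the §B.2.2 formula "C2H5NO2.AL.CLH.2H2O.ZR" likewise yields two candidates, since "NO" may be nitrogen plus oxygen or nobelium. Choosing between candidates needs extra evidence, for example checking which one regenerates a self-consistent InChI.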
  • 24. How common are the ambiguities? • 1.35 million standard InChIs from ChEMBL. • Uppercase the InChIs, fix them and check whether the original InChI can be regenerated. • 99.5% round-trip (6596 discrepancies).
  • 25. InChI case-insensitive ambiguities
  • 26. Conclusions • The Java source code for recovering V2000 mol files and InChIs from the types of corruption seen in the ISO 11238 draft has now been contributed to the ChemAxon forum, allowing Marvin and JChem to read the examples given in that document. • Whether this functionality will be required to fully support the final (pending) “Implementation Guide” requirements remains to be seen (and voted upon). • Attention to detail is important in standards writing.
  • 27. Final words • ISO 11238 IDs may become as popular as Chemical Abstracts’ registry numbers.
  • 28. Acknowledgements • Daniel Lowe, NextMove Software, Cambridge, UK. • Richard Bolton, GSK, Stevenage, UK. • Evan Bolton, NCBI PubChem, Bethesda, MD, USA. • Dac-Trung Nguyen, NIH NCATS, Rockville, MD, USA. • Tyler Peryea, NIH NCATS, Rockville, MD, USA. • Noel Southall, NIH NCATS, Rockville, MD, USA. • Yulia Borodina, FDA, Silver Spring, MD, USA. • Lawrence Callahan, FDA, Silver Spring, MD, USA. • Andrew Marr, Marr Consultancy, Knebworth, UK.