Coordination InChI (2019)

Preliminary survey of inorganic compounds

For extensive details and updates, see https://github.com/aclarkxyz/data_coordinchi

  1. Coordination InChI: Preliminary survey of inorganic compounds. Alex M. Clark, Ph.D., August 2019
  2. Goal
     • Ideally:
       - all drawings of a chemical entity produce the same InChI/C
       - one InChI/C can never match two drawings of different molecules
     • Probably impossible, but can we get close enough to be useful?
  3. Deliverable
     • Training set for inorganic compounds:
       - real-world compounds (CSD, PubChem, misc.)
       - some drawn well, others drawn badly
     • Prognosis for issues to expect:
       a. current InChI works fine, or
       b. a new layer is required, or
       c. intractable problems persist
     • Use as a definitive pass/fail validation key
  4. Source Data
     • Cambridge Structural Database (CSD):
       - ≤ 500K inorganics that aren't polymers
       - 2D coordinates, intelligent bonds, H-counts
       - selected ~500 by diverse clustering
     • PubChem:
       - picked ~200 from a large subset of garbage
       - most had to be redrawn
     • Miscellaneous:
       - privately curated data, ~500 compounds
       - carefully drawn inorganic valences
  5. Exotic Bonding Types
     • Identified 17 types that need attention: alkene, alternating, arene, bidentate, carbene
  6. Exotic Bonding Types (cont.): carbonyl, dative, disconnected, H-bond, hypervalent, hypovalent, metallabenzene
  7. Exotic Bonding Types (cont.): metal-metal, multicentre, nitrosyl, symmetry, terminal O
  8. Core Datastructure (diagram only)
  9. Rule 1
     • If your representation does not imply the correct molecular formula, then you are wrong
     • Most cheminformatics formats/editors/use patterns fail this test for nontrivial inorganics
  10. Rule 2
      • In order of preference:
        (a) correct valence for early main-group elements
        (b) inferred electron delocalisation paths
        (c) realistic bond orders & formal charges
        (d) sensible oxidation states on metals
        (e) symmetry
      • Usually possible to satisfy all conditions, with the frequent exception of symmetry
  11. Rule 3
      • Non-trivial inorganics usually offer many correct ways to draw them
      • Avoid overspecification: more metadata can be added later
      • Use only the minimum information needed to:
        - satisfy rule 1 (imply formula)
        - optimise for rule 2
        - resolve genuinely different molecules
  12. Core Datastructure (diagram only)
  13. Algorithm: Prerequisites
      1. complete heavy atom graph
      2. hydrogen counts
      3. bond orders → delocalisation islands
      4. net charges for each island
      • GIGO (garbage in, garbage out)
  14. Algorithm: Implementation
      • atom priority → [element, hcount, chg*]
      • bond → <0, 0..1, 1, 1..2, 2, 2..3, 3+>
      • iterate: atom priority → [a, ⇪{b1, a1}, {b2, a2}, …]
      • if degenerate, bump lowest priority atom & repeat
      • outcome: atom priority = walk order
      • can now serialise in various ways, e.g. SMILES-esque, InChI-esque
  15. Algorithm: Outcome
      • Weakest link in the algorithm: detecting delocalisation islands
      • Weakest link for users: implying correct hydrogen counts
      • Remarkably tolerant of multiple ways of drawing inorganic bonds
      • Preliminary results are promising for disambiguating inorganics correctly
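Rule 1 and the first two prerequisites (a complete heavy-atom graph plus hydrogen counts) lend themselves to a mechanical check. The following is a minimal Python sketch under stated assumptions: a hypothetical `Atom` record holds an element symbol, an explicit hydrogen count and a formal charge, and the formula is written in Hill order. The names `Atom`, `molecular_formula` and `implies_formula` are illustrative only; they are not part of InChI or the data_coordinchi repository.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Atom:
    # Hypothetical atom record: element symbol, attached hydrogens, formal charge.
    element: str
    hcount: int = 0
    charge: int = 0


def molecular_formula(atoms):
    """Return (Hill-order formula, net charge) implied by a drawing."""
    counts = Counter()
    for atom in atoms:
        counts[atom.element] += 1
        counts["H"] += atom.hcount  # hydrogens exist only if the drawing implies them
    counts = +counts  # unary plus drops zero entries (e.g. H when there are none)
    if "C" in counts:
        # Hill order: C first, then H, then everything else alphabetically.
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # No carbon: all elements (including H) alphabetically.
        order = sorted(counts)
    formula = "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)
    return formula, sum(atom.charge for atom in atoms)


def implies_formula(atoms, expected_formula, expected_charge=0):
    """Rule 1 check: does the drawing imply the expected formula and charge?"""
    return molecular_formula(atoms) == (expected_formula, expected_charge)


# Ferrocene drawn as Fe plus ten CH atoms (bonds do not affect the formula).
ferrocene = [Atom("Fe")] + [Atom("C", hcount=1) for _ in range(10)]
print(molecular_formula(ferrocene))  # ('C10H10Fe', 0)

# The same drawing with hydrogen counts omitted fails Rule 1.
no_h = [Atom("Fe")] + [Atom("C") for _ in range(10)]
print(implies_formula(no_h, "C10H10Fe"))  # False
```

Omitting hydrogen counts is the user-side weak point noted on the Outcome slide: the heavy-atom graph alone implies C10Fe rather than C10H10Fe.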

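The Implementation slide outlines an iterative atom-priority refinement, broadly similar in spirit to Morgan-style canonical numbering. Below is a hedged Python sketch of one possible reading of it, not the author's code. Assumptions: atoms are `(element, hcount, charge)` tuples (the slide's `chg*` is not explained, so plain formal charge is used); bond classes are compared by their position in the slide's list; ⇪ is read as "sort the (bond, neighbour) pairs"; and "bump lowest priority atom" is read as splitting one atom off the lowest-priority tied class. The names `rank_atoms` and `canonical_string` are hypothetical.

```python
from collections import Counter

# Bond classes from the Implementation slide, lowest to highest; the list
# index serves as the sort key (an assumption about how they are compared).
BOND_CLASSES = ["0", "0..1", "1", "1..2", "2", "2..3", "3+"]


def dense_rank(keys):
    """Replace each key by its position among the sorted distinct keys."""
    lookup = {k: r for r, k in enumerate(sorted(set(keys)))}
    return [lookup[k] for k in keys]


def rank_atoms(atoms, bonds):
    """atoms: [(element, hcount, charge)]; bonds: [(i, j, bond_class)].

    Returns a distinct priority (0..n-1) for every atom."""
    n = len(atoms)
    nbrs = [[] for _ in range(n)]
    for i, j, cls in bonds:
        b = BOND_CLASSES.index(cls)
        nbrs[i].append((b, j))
        nbrs[j].append((b, i))

    rank = dense_rank(atoms)  # initial priority from (element, hcount, charge)
    while True:
        # Refine: extend each atom's key with its sorted (bond, neighbour) pairs
        # until no class of equal-priority atoms splits any further.
        while True:
            keys = [(rank[a], tuple(sorted((b, rank[j]) for b, j in nbrs[a])))
                    for a in range(n)]
            new_rank = dense_rank(keys)
            if len(set(new_rank)) == len(set(rank)):
                break
            rank = new_rank
        if len(set(rank)) == n:
            return rank
        # Degenerate: bump one atom of the lowest-priority tied class below its
        # peers, then refine again (greedy; no backtracking over choices).
        tied = min(r for r, c in Counter(rank).items() if c > 1)
        doubled = [2 * r for r in rank]
        doubled[rank.index(tied)] -= 1
        rank = dense_rank(doubled)


def canonical_string(atoms, bonds):
    """Serialise atoms in priority order plus bonds as (rank, rank, class)."""
    rank = rank_atoms(atoms, bonds)
    order = sorted(range(len(atoms)), key=lambda a: rank[a])
    atom_part = " ".join("{}:{}:{}".format(*atoms[a]) for a in order)
    edges = sorted((min(rank[i], rank[j]), max(rank[i], rank[j]), cls)
                   for i, j, cls in bonds)
    bond_part = " ".join(f"{x}-{y}:{cls}" for x, y, cls in edges)
    return atom_part + " | " + bond_part


# Cr(CO)6 drawn with two different atom orders (bond classes chosen for illustration).
atoms_a = [("Cr", 0, 0)] + [("C", 0, 0)] * 6 + [("O", 0, 0)] * 6
bonds_a = [(0, k, "1") for k in range(1, 7)] + [(k, k + 6, "3+") for k in range(1, 7)]
atoms_b = [("C", 0, 0), ("O", 0, 0)] * 6 + [("Cr", 0, 0)]
bonds_b = [(12, 2 * k, "1") for k in range(6)] + [(2 * k, 2 * k + 1, "3+") for k in range(6)]
print(canonical_string(atoms_a, bonds_a) == canonical_string(atoms_b, bonds_b))  # True
```

Because ties are broken greedily without backtracking, this sketch relies on tied atoms being symmetry-equivalent, which holds for Cr(CO)6 but not for every graph; a production canonicaliser would need to handle the harder cases.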
