Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Mixtures InChI: a story of how standards drive upstream products

Presented to the NIH (National Institutes of Health) Big Data InChI symposium, March 2021.

  • Be the first to comment

  • Be the first to like this

Mixtures InChI: a story of how standards drive upstream products

  1. 1. Alex M. Clark & Leah R. McEwen Mixtures InChI A story of how standards drive upstream products alex@collaborativedrug.com
  2. 2. It's always a mixture ✤ The pure molecule approximation has value... but in the real world: 2
  3. 3. State of mixture informatics ✤ Bespoke formats exist ✤ Generally in silos: ‣ low machine readability ‣ low interoperability 3 ✤ Machine readable molecules ~ ½ century ago, but mixtures limited to text Phenolphthalein, 1% (w/v ) Indicator Solutio n in 95% Ethanol
  4. 4. Mixtures InChI 4 molmatinf.com/minchidemo
  5. 5. Mixtures InChI 4 molmatinf.com/minchidemo CH2O/c1-2/h1H2 1 37wf-2 CH4O/c1-2/h2H,1H3 3 10:15pp0 H2O/h1H2 2
  6. 6. Mixtures InChI 4 molmatinf.com/minchidemo MInChI=0.00.1S/ & & /n{{ & }& }/g{{ &}& } CH2O/c1-2/h1H2 1 37wf-2 CH4O/c1-2/h2H,1H3 3 10:15pp0 H2O/h1H2 2
  7. 7. Mixtures InChI 4 molmatinf.com/minchidemo MInChI=0.00.1S/ & & /n{{ & }& }/g{{ &}& } CH2O/c1-2/h1H2 1 37wf-2 CH4O/c1-2/h2H,1H3 3 10:15pp0 H2O/h1H2 2
  8. 8. One brick at a time... ✤ Have: 5 ✤ What to do? Chicken vs. egg problems... ‣ no community without data ‣ no data without tools ‣ no tools without community ☑︎ use cases ☑︎ standard ☒ tools ☒ data ☒ community talking to people: demand is real MInChI speci fi cation, IUPAC endorsed ... custom or limited purpose ... not machine readable ... not informatics oriented have to build simultaneously
  9. 9. Tools ✤ 2018: NIH SBIR grant awarded to Collaborative Drug Discovery ✤ First step: open source mixture editor and software libraries ✤ Coded in TypeScript, cross- compiled to JavaScript, for: ‣ web ‣ desktop (via Electron) ‣ server (via NodeJS) ✤ Operates on "Mix fi le" which is ELN-like, JSON-based, mixture analog of Mol fi le 6 github.com/cdd/mixtures
  10. 10. Upstream/Downstream 7 Mix fi le MInChI commercial open source standards use cases customers searching indexing labelling categorising time
  11. 11. ELN integration ✤ Mixture creation is also part of a commercial product ✤ Scientists use the ELN already... ✤ ... machine readable data is a side e ff ect of normal use ✤ First class citizens: ‣molecules ‣reactions ‣mixtures 8 collaborativedrug.com
  12. 12. Data ✤ Bootstrap from text sources ✤ Proprietary deep learning algorithm ✤ ~30K mixtures marked up, public release 9 ✤ Substantial body of exemplars, and upstream test data for MInChI generation ✤ Can rapidly markup inventories and vendor catalogs ✤ Integrated into software-as-a-service products
  13. 13. Support resources 10 ✤ Looking up known content speeds up data creation... ✤ INCI and UNII collections available to quick search
  14. 14. Demidigital ✤ Partially marked up data can be upgraded by document-wide options... 11 ✤ Currently in design phase MATERIAL MATERIAL QUANTITY B A C
  15. 15. Community ✤ Creating technology is easy, getting everyone to use it is hard... ✤ Requires concurrent strategies 12 ✤ Endorsement by respected standards organisations is a good start ✤ InChI derivatives have enthusiastic champions (that's us!)
  16. 16. Engagement ✤ Code it up: using MInChI notation is easy enough ✤ Got use cases? Let us know ✤ Spread the word: data resources need to be digitised 13
  17. 17. Further viewing ✤ Peer reviewed literature: 14 ✤ Webinars: 2019: www.youtube.com/watch?v=PcAJ4HoRnFU Capturing mixtures — bringing informatics to the world of practical chemistry 2021: www.youtube.com/watch?v=0ILc0owuEzQ (starts at 1:05:00) Mixtures as fi rst class citizens in the realm of informatics 2020: www.youtube.com/watch?v=aSQEVKKnrWw (starts at 4:13:00) Mixtures: informatics for formulations and consumer products github.com/cdd/mixtures
  18. 18. Further work ✤ Finalising MInChI 1.0 speci fi cation, reference implementation, validation ✤ MInChI needs to extend to less well de fi ned chemical entities ‣ variable structures ‣ polymers ‣ biologics ‣ nanomaterials ‣ reaction products ✤ Properties and metadata: ontology based / IUPAC Gold Book ✤ Implementation at scale: registration systems 15 MInChI Open Meeting: 20 April 11am-1pm (EDT)
  19. 19. Questions? ✤ Contact: ‣ Leah R. McEwen lrm1@cornell.edu (Cornell University, IUPAC/InChI Trust) ‣ Alex M. Clark alex@collaborativedrug.com (Collaborative Drug Discovery) ✤ Thanks to the MInChI team 16

×