Application of CurlySMILES to the encoding of polymer systems


Polymer informatics: topics in encoding and sharing macromolecular information presented and discussed on August 16, 2015, at the ACS 250th National Meeting in Boston

Published in: Software

  1. Application of CurlySMILES to the encoding of polymer systems. Presented on August 16, 2015, at the ACS 250th National Meeting in Boston. Division: Computers in Chemistry. Session: Accelerated Discovery of Chemical Compounds: Design New Polymers & Inorganic Materials from Integration of Polymer Science, Materials Science & Informatics. Axel Drefahl, Axeleratio, Reno, Nevada
  2. Copyright © 2015 Axel Drefahl. What is CurlySMILES? CurlySMILES is: ● a chemical language to capture, process and share nanostructures, based on molecular constitution, connectivity and arrangement; ● a line notation system integrating SMILES with atom- and molecule-anchored annotations, inserted via curly braces: {…}; ● customizable by annotation: encoding of polymers, complexes, multi-phase systems, ...; ● available as a suite of Python 3 modules, including a notation parser and unique notation generator.
  3. Overview ● Current state of polymer informatics ● Brief introduction to CurlySMILES ● Encoding of Structural Repeat Units (SRUs) ● Encoding of single-strand polymers ● Encoding of multi-strand polymers ● Encoding of copolymers and miscellaneous polymer systems ● CurlySMILES software/task-specific integration ● Perspective: virtual polymer chemistry
  4. Cheminformatics (sub)domains
      Established informatics: ● “Small molecules” ● Crystalline solids ● Peptides, DNAs, ...
      Capturing & processing: ● Molecular graph ● Unit cell, space group ● Fragment sequence
      Evolving informatics: ● Polymer systems ● Nanomaterials ● Material classes ● Composites & design
      Capturing & processing: ● Struct. repeat unit (SRU) ● Nano-object (sphere, rod, ...) ● Variable groups: R, X, Y, Z, ... ● Metalevel components
  5. Polymer informatics: approaches and tools ● IUPAC nomenclature & seniority rules (head-tail selection), ● S-group (superatom): SRU with crossing bonds and brackets (common representation, MDL, MarvinSketch), ● ThermoML with polymer block to specify compounds, ● Polymer Markup Language (PML), ● Polymer Informatics Knowledge System (PIKS) - PolyInfo database | “walled gardens” of polymer information, ● InChI polymer project (awaits implementation), ● CurlySMILES project, which actively designs a human-machine interface for nanoarchitectures, including polymers, and develops open-source Python code.
  6. From SMILES to CurlySMILES. SMILES: Simplified Molecular Input Line Entry System. Published by David Weininger in 1988, doi: 10.1021/ci00057a005. CurlySMILES: Curly-braces enhanced Smart Material Input Line Entry Specification. Published by Axel Drefahl in 2011, doi: 10.1186/1758-2946-3-1
  7. CurlySMILES Motivation ● Chemical nomenclature and encoding languages typically employ idealized representations, while minor structural irregularities and impurities are ignored. CurlySMILES encoding enables their insertion via annotation, if desired. ● A molecular-graph-derived notation is often taken to represent molecule and substance interchangeably. CurlySMILES employs molecule multipliers and allows for phase distinction, for example, by using state and shape annotations such as lq, tf, am, cr, np, ... ● Variability of detail: stoichiometric formula notation (SFN) ● Encoding of molecular arrangements: hydrogen-bonded molecules, complexes, macromolecules and other nanoassemblies.
  8. Format of curly-enclosed annotations in CurlySMILES: {AMk1=v1;...;kn=vn}. AM is a one-char or two-char annotation marker; a two-char AM may be followed by an annotation dictionary, a semicolon-separated list of key/value (ki/vi) pairs. Keys are predefined, but extensible by customization ($ prefix). Example: n-octanethiol functionalized gold nanoparticle dispersed in toluene (●-SCH2(CH2)6CH3 in toluene): S{-|c=[Au]{np}}CCCCCCCC{dpc=Cc1ccccc1}. AMs: -| for surface-attached, dp for dispersed, np for nanoparticle.
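The annotation format on this slide can be illustrated with a small parser. This is a minimal sketch and not the parser shipped with the CurlySMILES modules; the names `split_top_level` and `parse_annotation` are hypothetical. It assumes that a body of one or two characters is a bare marker, and that any longer body starts with a two-char marker followed by the dictionary.

```python
def split_top_level(text, sep):
    """Split text on sep, ignoring separators nested inside {...}."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def parse_annotation(annotation):
    """Return (marker, dictionary) for one curly-enclosed annotation."""
    if not (annotation.startswith("{") and annotation.endswith("}")):
        raise ValueError("annotation must be enclosed in curly braces")
    body = annotation[1:-1]
    if len(body) <= 2:
        # Assumption: short bodies such as {Z}, {-} or {np} carry no dictionary.
        return body, {}
    marker, rest = body[:2], body[2:]
    entries = {}
    for pair in split_top_level(rest, ";"):
        # Split on the first "=" only, so a value may itself be "=" (as in b==).
        key, _, value = pair.partition("=")
        entries[key] = value
    return marker, entries
```

Applied to the gold nanoparticle example, `parse_annotation("{-|c=[Au]{np}}")` yields the marker `-|` with key `c` mapped to the nested notation `[Au]{np}`.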
  9. Annotated Molecular Graph. Example: (Z)-but-1-ene-1,4-diyl substructure. CurlySMILES: C{-}=C{Z}CC{-}. Atom-anchored annotations: Structural unit annotation (pendent single bond): {-}. Stereodescriptive annotation: {Z}
  10. Poly[(Z)-but-1-ene-1,4-diyl]. CurlySMILES: C{-}=C{Z}CC{+n}. Atom-anchored annotations: Structural unit annotation: {-} at head node. Stereodescriptive annotation: {Z}. Operational notation: {+n} at tail node
  11. Does CurlySMILES encode macromolecules or polymers? Answer: both (user choice). CurlySMILES comes with a rich annotation dictionary to encode chain length variation and phases. A macromolecule is a single molecule. The term “polymer” can mean “macromolecule” or a “substance” composed thereof, typically with a “degree of polymerization” (DOP) range. An oligomer or a macromolecule of a specific length is encoded based on the chain graph, i.e. the SRU graph, using annotation dictionary key n: {+nn=10} for a tenfold occurrence of the SRU. A polymer is encoded by leaving out n (generic polymer). The key dpr may specify a DOP range: {+ndpr=gt250}. AMs such as am (amorphous) or cr (crystalline) indicate a particular polymer phase. A polymer system is encoded by additional annotations specifying, for example, impurities, additives and solvents.
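The three cases named on this slide (fixed length, generic polymer, DOP range) differ only in the dictionary that follows the +n marker. The sketch below builds that annotation as a string; `tail_annotation` is a hypothetical helper, and rejecting n together with dpr is an assumption made here, not a rule stated in the slides.

```python
def tail_annotation(n=None, dpr=None):
    """Build a {+n...} tail-node annotation for a single-strand chain.

    n   -- exact number of SRU occurrences (oligomer or macromolecule)
    dpr -- degree-of-polymerization range, e.g. "gt250"
    """
    if n is not None and dpr is not None:
        # Assumption: an exact length and a DOP range exclude each other.
        raise ValueError("give either n or dpr, not both")
    if n is not None:
        return "{+nn=" + str(n) + "}"
    if dpr is not None:
        return "{+ndpr=" + str(dpr) + "}"
    return "{+n}"  # generic polymer: no length information


# Poly(oxymethylene) SRU with the head node already annotated and the
# tail node written last, so the tail annotation can simply be appended.
sru = "O{-}C"
decamer = sru + tail_annotation(n=10)
generic = sru + tail_annotation()
long_chain = sru + tail_annotation(dpr="gt250")
```

Appending works only when the tail node is the last atom written; for a notation such as O{-}c1ccc{+n}cc1 the annotation has to be placed directly after the tail atom instead.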
  12. Tail node annotations to formally construct polymers. {+n} anchored at tail node of divalent SRU to build single-strand polymer via head-tail single-bond connection. {+r} anchored at tail node of SRU to build single-strand macrocycle (last tail node connects first head node). {+m} anchored at tail node of non-single-bond or multivalent SRU to build multibond/multi-strand polymer via specified head-tail connection using key ich to provide index of corresponding head node. {+s} anchored at tail node of the last (right-most) SRU in a copolymer sequence to provide copolymer details; for example, a copolymer qualifier via key cpq to specify an alternating, block or random sequence.
  13. CurlySMILES notations of some common single-strand homopolymers
      Structure-Based Name | Structural Formula | CurlySMILES Notation
      Poly(oxymethylene) | -[OCH2]-n | O{-}C{+n}
      Poly(iminoethylene) | -[NHCH2CH2]-n | N{-}CC{+n}
      Poly(1-hydroxyethylene) | -[CH(OH)CH2]-n | OC{-}C{+n}
      Poly(1-cyanoethylene) | -[CH(CN)CH2]-n | N#CC{-}C{+n}
      Poly(1,1-difluoroethylene) | -[CF2CH2]-n | FC{-}(F)C{+n}
      Poly(1-phenylethylene) | -[CH(Ph)CH2]-n | C{-}(c1ccccc1)C{+n}
      Poly(oxy-1,4-phenylene) | -[O-paraPh]-n | O{-}c1ccc{+n}cc1
      Poly(methylene) | -[CH2]-n | C{-}{+n}
      Poly(difluoromethylene) | -[CF2]-n | FC{-}{+n}F
  14. Polydispersity characterization. With the exception of the dimensionless pdi, units are kg/mol. Example: {+npMn=89.2} to encode a single-strand polymer with a number-average molar mass of 89.2 kg/mol.
      Key | Symbol | Meaning | ThermoML tag name
      pMn | Mn | Number-average molar mass | nNumberAvgMolWt
      pMm | Mm | Mass-average molar mass | nWeightAvgMolWt
      pMz | Mz | z-Average molar mass | nZAvgMolWt
      pMv | Mv | Viscosity-average molar mass | nViscosityAvgMolWt
      pMp | Mp | Peak molar mass | nPeakAvgMolWt (?)
      pdi | Mm/Mn | Polydispersity index | nPolydispersityIndex
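As a hedged illustration of the keys in this table, the sketch below computes pdi = Mm/Mn and writes the values into a {+n...} annotation. The helper names are hypothetical, and combining several of these keys in one dictionary is an assumption based on the general semicolon-separated format; the slide itself only shows the single-key case {+npMn=89.2}. The molar masses in the usage lines are made-up inputs.

```python
ALLOWED_KEYS = ("pMn", "pMm", "pMz", "pMv", "pMp", "pdi")


def polydispersity_index(mass_average, number_average):
    """Return pdi = Mm / Mn (dimensionless); both inputs in kg/mol."""
    if number_average <= 0:
        raise ValueError("number-average molar mass must be positive")
    return mass_average / number_average


def mass_annotation(**values):
    """Build a {+n...} annotation from polydispersity keys of the table."""
    unknown = set(values) - set(ALLOWED_KEYS)
    if unknown:
        raise KeyError(f"unknown keys: {sorted(unknown)}")
    # Emit keys in table order so the output is reproducible.
    entries = [f"{key}={values[key]}" for key in ALLOWED_KEYS if key in values]
    return "{+n" + ";".join(entries) + "}"


# Illustrative inputs only: Mn = 80.0 kg/mol, Mm = 120.0 kg/mol.
pdi = polydispersity_index(120.0, 80.0)
annotation = mass_annotation(pMn=80.0, pMm=120.0, pdi=pdi)
```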
  15. Anionic homopolymer with monoatomic cations. Example: poly(sodium 1-carboxylatoethylene). CurlySMILES: O=C([O-]{+Cc=[Na+]})C{-}C{+n}. The operational annotation marker +C is used to include [Na+] as counterion to [O-]. [Na+] is part of the repeat unit.
  16. Homopolymer with terminating groups at head and tail. Example: poly(ethylene terephthalate) by esterification of terephthalic acid with ethylene glycol: [H]O{-}CCOC(=O)c1ccc(cc1)C{+ninc=2-15;ich=2}(=O)OCCO. Nodes 2 to 15 are part of the SRU. Node 1 forms the head terminus and nodes 16-20 belong to the tail end group.
  17. Cyclic polymers or oligomers. Example: cyclic poly(silaether): [Si]{-}(C)(C)[Si](C)(C)O{+rn=24}. Shortcut for a long SMILES notation: [Si]1(C)(C)[Si](C)(C)O...[Si](C)(C)[Si](C)(C)O1. Such cyclic poly(silaether)s are obtained, for example, as by-products while making their linear homologs by ring-opening polymerization of octamethyl-1,4-dioxatetrasilacyclohexane [doi: 10.1021/ma00086a048].
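The long SMILES that {+rn=24} abbreviates can be written out mechanically. The sketch below is an illustration only; `expand_macrocycle` is a hypothetical helper, not part of the CurlySMILES modules. It assumes the head node is the first atom of the unit, the tail node is the last atom, and the chosen ring-closure label does not already occur in the unit.

```python
def expand_macrocycle(first_atom, remainder, n, ring_label="1"):
    """Write a single-strand macrocycle of n SRUs as plain SMILES.

    first_atom -- head node of the SRU, e.g. "[Si]"
    remainder  -- rest of the SRU, ending with the tail node
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    unit = first_atom + remainder
    if ring_label in unit:
        # Reusing a label that the unit already uses would close the wrong ring.
        raise ValueError("ring label clashes with the repeat unit")
    # Open the ring at the first head node and close it at the last tail node.
    return first_atom + ring_label + remainder + unit * (n - 1) + ring_label


dimer = expand_macrocycle("[Si]", "(C)(C)[Si](C)(C)O", 2)
ring24 = expand_macrocycle("[Si]", "(C)(C)[Si](C)(C)O", 24)
```

For n = 24 the result starts with `[Si]1(C)(C)[Si](C)(C)O` and ends with `[Si](C)(C)[Si](C)(C)O1`, matching the abbreviated form shown on the slide.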
  18. Surface-grafted functional oligomer. Example: polyacrylamide brush grown on silicon: N{-|c=[Si]}C(=O)c1ccc(cc1)CCC{-}C{+ninc=12-16;ich=12}C(=O)N. Group environment annotation -| for bond to substrate. Growth of such polyacrylamide brushes on a silicon wafer is studied to understand how to reduce or prevent microbial adhesion on surfaces by chemical surface modification [doi: 10.1021/la063531v].
  19. Regular double-strand polymers: chain of formally fused cycloalkane rings. Example: poly(butane-1,4:3,2-tetrayl). CurlySMILES notation: C{-}C{+mich=1}C{+mich=4}C{-}. Two head nodes: C{-}; two tail nodes: C{+m}. For IUPAC nomenclature of this polymer see A Brief Guide to Polymer Nomenclature.
  20. Regular double-strand polymers: chain of formally fused heterocycles. Example: poly(2,4-dimethyl-1,3,5-trioxa-2,4-disilapentane-1,5:4,2-tetrayl). CurlySMILES notation: O{-}[Si]{+mich=1}(C)O[Si]{+mich=7}(C)O{-}. Two head nodes: O{-}; two tail nodes: [Si]{+m}. For IUPAC nomenclature of this polymer see page 1573 in
  21. Double bond between head and tail. Example: poly(piperidine-3,5-diylideneethanediylidene). CurlySMILES notations: A: C1{=}CNCC(C1)=CC{+mich=1;b==} B: C{-}C1CNCC(C1)=C{+n}. Both notations encode correct atom connectivity. In the IUPAC-compliant notation A, key b specifies = as bond between tail and head. For IUPAC nomenclature of this polymer see page 1941 in Nomenclature of Regular Single-Strand Organic Polymers.
  22. Encoding with copolymer qualifiers. Example: poly(styrene-co-isoprene). CurlySMILES notation of above example: C{-}C{+ninc=1-8;ich=1}(c1ccccc1)C{-}C=C(C)C{+ninc=9-13;ich=9}{+scpq=c}
      Copolymer Qualifiers
      cpq | Qualifier | Meaning
      a | alt | alternating
      b | block | block
      c | co | generic
      g | graft | graft
      p | per | periodic
      r | ran | random
      s | stat | statistical
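The inc and ich keys refer to node indices. A minimal sketch of how such indices can be checked follows: it assumes that nodes are the atoms of the notation counted from 1 in writing order with annotations ignored, which is consistent with the copolymer example above. The regular expression covers only bracket atoms and the organic-subset symbols, and none of these helpers belong to the CurlySMILES modules.

```python
import re

# Bracket atoms first, then two-letter halogens, then one-letter symbols.
ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def strip_annotations(notation):
    """Remove all curly-enclosed annotations, including nested ones."""
    out, depth = [], 0
    for ch in notation:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def atom_nodes(notation):
    """Return the atom symbols of a notation in writing order."""
    return ATOM.findall(strip_annotations(notation))


def nodes_in_range(notation, inc):
    """Return the atoms addressed by an inc value such as "9-13" or "4"."""
    first, _, last = inc.partition("-")
    last = last or first
    return atom_nodes(notation)[int(first) - 1:int(last)]


copolymer = ("C{-}C{+ninc=1-8;ich=1}(c1ccccc1)"
             "C{-}C=C(C)C{+ninc=9-13;ich=9}{+scpq=c}")
styrene_unit = nodes_in_range(copolymer, "1-8")
isoprene_unit = nodes_in_range(copolymer, "9-13")
```

With this counting, nodes 1 to 8 are the two backbone carbons plus the six aromatic carbons of the styrene unit, and nodes 9 to 13 are the five carbons of the isoprene unit.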
  23. Encoding of a terpolymer. Example: poly[methyl-N-(3,4-dimethylphenyl)-N-(4-biphenyl)-N-(4-phenyloxy)siloxane-co-phenylmethylsiloxane-co-methylhydrosiloxane]: c1ccccc1[Si]{-}(C)O{+ninc=1-9;ich=7}[SiH]{-}(C)O{+ninc=10-12;ich=10}[Si]{-}(C)(Oc2ccc(cc2)N(c3cc(C)c(C)cc3)c4ccc(cc4)-c5ccccc5)O{+ninc=13-43;ich=13}{+scpq=c}. For more about this terpolymer see doi: 10.1021/ma202041u.
  24. Nesting of SRUs. Example: unsaturated polyester with α,ω-alkanediyl bridges. CurlySMILES notation: C{-}(=O)OC{-}{+ninc=4;ich=4;n=5-9}OC(=O)C(C)=CCC{+ninc=1-13;ich=1}C
  25. Encoding of polymer blends. Example: polystyrene/poly(methyl methacrylate) blend. CurlySMILES notation: C{-}C{+n}c1ccccc1.C{-}C{+n}(C)C(=O)OC{mx}. Annotation {mx} indicates a compatible or incompatible mixture. CurlySMILES encoding as a two-phase system (composite): {/C{-}C{+n}c1ccccc1/C{-}C{+n}(C)C(=O)OC}
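Both encodings on this slide are simple string compositions of the component notations, which the following sketch reproduces. The function names are hypothetical, and the generalization to more than two components is an assumption; the slide shows only the two-component case.

```python
def encode_blend(*components):
    """Join component notations with "." and mark the result as a mixture."""
    return ".".join(components) + "{mx}"


def encode_composite(*components):
    """Wrap component notations as phases of a multi-phase system."""
    return "{" + "".join("/" + component for component in components) + "}"


polystyrene = "C{-}C{+n}c1ccccc1"
pmma = "C{-}C{+n}(C)C(=O)OC"

blend = encode_blend(polystyrene, pmma)
composite = encode_composite(polystyrene, pmma)
```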
  26. Encoding of polymer solutions. Example: poly(1-cyanoethylene) dissolved in dihydrofuran-2(3H)-one (γ-butyrolactone). CurlySMILES notation: C{-}(C#N)C{+n}{dsc=O=C1OCCC1}. Annotation marker ds for dissolved. Key c for CurlySMILES notation with assigned value O=C1OCCC1
  27. Encoding of doped polymers. Example: poly(1,4-phenylene sulfide) doped with arsenic pentafluoride. CurlySMILES notation: c1{-}ccc(cc1)S{+n}{IMc=F[As](F)(F)(F)F}. Annotation marker IM for impurity. Key c specifying dopant F[As](F)(F)(F)F
  28. Encoding of polymer sets. Example: poly[(alkylimino)methyleneimino-1,3-phenylene] with specified alkyl groups. CurlySMILES notation: N{-}{+Rcc=C{-},CC{-},CCC{-},CC{-}C,CC{-}(C)C}CNc1cccc{+n}c1. Annotation marker +R for alkyl group insertion. Key cc for list of comma-separated CurlySMILES notations; here, encoding the specified alkyl groups methyl, ethyl, n-propyl, isopropyl and tert-butyl
  29. CurlySMILES in Python 3. Current, iteratively tested implementations: ● Modules to parse and analyze molecular-graph-based notations and their annotations ● CANGEN-based methods for input-to-unique conversion of notations (regular single-strands) ● Substructure and descriptor generation methods ● Programs to maintain and screen Axeleratio's in-house bibliography of CurlySMILES-tagged literature, including nanodevice and polymer publications.
  30. Transformation of a CurlySMILES notation based on node ranks. Example: poly[(2-propyl-1,3-dioxane-4,6-diyl)methylene]. Entered: C1{-}OC(CCC)OC(C1)C{+n} Unique: C1{-}CC(OC(CCC)O1)C{+n}. The CH2 ring node ranks lower than the left O node; the CH2 tail node ranks higher than the right O node.
  31. Uniqueness depending on selection of head/tail (H/T) pair. Example: poly(oxyethylene) can be written as O{-}CC{+n}, C{-}OC{+n} or C{-}CO{+n}. Nomenclature-conforming selection of head and tail nodes is recommended in polymer encoding. [see examples of unique notations for regular single-strand polymers]
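The three notations for poly(oxyethylene) are cyclic rotations of the same backbone. The sketch below enumerates them for the simplest case; `head_tail_variants` is a hypothetical helper that assumes an unbranched SRU with single bonds only, rotates in one writing direction, and does not implement the IUPAC seniority rules that pick the recommended variant.

```python
def head_tail_variants(backbone):
    """List the notations obtained by rotating an unbranched SRU backbone.

    backbone -- atom symbols in chain order, e.g. ["O", "C", "C"]
    """
    variants = []
    for shift in range(len(backbone)):
        rotated = backbone[shift:] + backbone[:shift]
        # Head annotation after the first atom, tail annotation after the last.
        notation = rotated[0] + "{-}" + "".join(rotated[1:]) + "{+n}"
        if notation not in variants:
            variants.append(notation)
    return variants


oxyethylene = head_tail_variants(["O", "C", "C"])
```

Each variant describes the same chain, which is why a unique notation requires fixing the head/tail pair first.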
  32. Task-specific integration of CurlySMILES modules ● Interfacing polymer structure (input/output): form-to-notation editors; notation-to-sketch and notation-to-query software ● Pipelining polymer data (data administration): automatic ranking and comparison of structure/data pairs; screening of structured lists and repositories ● Generating virtual libraries: automatically building lists of polymer notations for QSPR analysis and identification of optimal-design candidates
  33. Application to polymer data mining: “nurturing the mine sites”. SRU-based CurlySMILES notations in unique form are identifiers of macromolecules and polymer systems that can be employed to • function as search keys in database applications, • tag factsheets, notes and bibliographic entries, • populate spreadsheet cells and XML text nodes, • index and abstract the polymer literature & patents, • create ontologies that organize polymer information, ... which can be shared via Semantic Web technologies.
  34. Application to polymer data mining: search and data extraction. The CurlySMILES language has a rich and extensible dictionary to encode polymers in diverse contexts and at various levels of detail. Notations work both ways: as precise data annotations and as query formulations for “needle-in-a-haystack” requests. Today's polymer knowledge systems are not marked up by CurlySMILES. But client-server mediation can be achieved, behind the scenes, via CurlySMILES code to • compact polymer input provided through entry forms, • expand notations into query language formats.
  35. Application to polymer modeling. CurlySMILES representations of polymer systems contain detailed structural information to derive macromolecular descriptors and substructures (groups) as entry points for property prediction and model development: • Structure-property relationships (QSPRs, GCMs) • SRU similarity (kNN and pattern recognition methods) • MC & MD simulations (flexibility, solution behavior) • Backbone modeling (polymer stability & degradation) • Kinetic & ab initio methods (controlled polymerization)
  36. Application to polymer design. Specialty polymers must meet multifaceted requirements (multi-dimensional property windows). The virtual design of polymers by permutationally building (co)polymers (or blends) based on systematically varied monomer structures often results in large libraries of structurally related polymers with predictable properties. The automatic generation of the polymer structures of such libraries as compact CurlySMILES notations and the implementation of predictive methods for the desired properties will allow virtual high-throughput screening to initialize the synthesis of potential candidates.
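A very small virtual library can be generated with string templates. The following sketch is an illustration under stated assumptions: it uses the 1-substituted ethylene pattern C{-}(X)C{+n} seen in earlier slides, an arbitrary three-member substituent list chosen here for illustration, and hypothetical helper names. It does not produce unique notations and performs no property prediction.

```python
from itertools import combinations


def vinyl_homopolymer(substituent):
    """Return the notation of a poly(1-substituted ethylene)."""
    # Doubled braces are literal braces inside an f-string.
    return f"C{{-}}({substituent})C{{+n}}"


def blend_library(substituents):
    """Return notations for all pairwise blends of the homopolymers."""
    polymers = [vinyl_homopolymer(s) for s in substituents]
    return [a + "." + b + "{mx}" for a, b in combinations(polymers, 2)]


# Illustrative substituents: phenyl, cyano and hydroxy groups.
substituents = ["c1ccccc1", "C#N", "O"]
homopolymers = [vinyl_homopolymer(s) for s in substituents]
blends = blend_library(substituents)
```

Three substituents give three homopolymers and three pairwise blends; the list grows quadratically with the number of monomer variants, which is where automated screening becomes useful.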
  37. Summary & Outlook. Done: ● SRU annotations to encode polymers ● Polymer description grammar ● Python implementation. To Do: ● Stereochemical descriptions ● Unique notations for nested polymers ● Conquering polymer space. Topics to be addressed for CurlySMILES applications: ● Representation and iterative development of models for structure/property estimation ● Extension to advanced architectures: dendrimers, 3D polymers and nanostructure designs combining polymers with carbon nanotubes and fullerene-based bowls and cages