Chemical Semantics Sopron Talk


Published in: Technology, Education

  1. Hypercube Chemical Semantics, Inc. Publication and Retrieval of Computational Chemical-Physics Data via the Semantic Web: Applying the Semantic Web to Computational Chemistry
  2. Hypercube Chemical Semantics, September 2013. What is this all about? The principal objective of our enterprise is to create a testbed for exploring the ideas behind the practical application of the Semantic Web in computational chemistry. This working testbed (the Chemical Semantics Portal) is initially limited to computational chemistry and to a limited class of users. We focus on the semi-empirical, ab-initio and density functional (DFT) calculations of quantum chemistry and their typical results. The purpose of this talk is to present the ideas of the Semantic Web and their possible application in computational chemistry, and to present the working prototype of the Chemical Semantics Portal.
  3. Dr Mirek Sopek. INTRODUCTION: The Basics of the Semantic Web
  4. The evolution of the Web
       • WEB 1.0 - Web of documents
       • WEB 2.0 - Social, Read/Write Web
       • WEB 3.0 - Semantic Web = Web of Data?
       • WEB 4.0 - Intelligent Web?
     The Web is only 8287 days* (23 years) old! For comparison: Print – 203,800 days; Newspapers – 142,800 days; Radio – 41,200 days; TV – 28,000 days.
     * Assuming Christmas 1990 as its beginning.
  5. Web 1.0 – Web of documents. 1989-2000: Web of hyperlinked documents
  6. Web 2.0 – Social/Read-Write Web. 2000-2010: The Web of Social Networks and “Wisdom of the Crowds”
  7. Web 3.0 – Semantic Web. 2010-2020(?): Web of Data, Linked Data Web. [Diagram: resources (Organization, HR, Services, Products, People, Product) connected by typed links (hasPeople, humanResources, hasServices, hasProducts, hasProduct, colleague).]
  8. What is wrong with today’s Web?
  9. Web 1.0 & 2.0 major issues
       • The WEB is TOO BIG to know
       • Social Web dwells in isolated silos
       • Data Deluge - scientific data stored in isolated silos
       • People look at the Web through Google’s Goggles
  10. THE SOLUTION: Semantic Web – Web 3.0
  11. What is the Semantic Web? The Semantic Web is a Web of data. It is an extension of the current Web that provides an easier way to find, share, reuse and combine information. “The vision of the Semantic Web is to extend principles of the Web from documents to data. (...) This also means creation of a common framework that allows data to be shared and reused across application, enterprise, and community boundaries, to be processed automatically by tools as well as manually, including revealing possible new relationships among pieces of data.”
  12. Foundations of the Semantic Web
       • “Semantic” in “Semantic Web” is about the MEANING of data, not about the syntax it is expressed in. Semantic Web = Web Full of Meaning = Web of meaningful Data.
       • The Semantic Web is about the representation of THINGS (OBJECTS and CONCEPTS) and their properties on the Web, not just about documents.
       • The Semantic Web uses a global NAMING scheme to identify THINGS, not just to address documents.
       • The Semantic Web links THINGS with TYPED LINKS, not with “blind” hyperlinks.
       • The Semantic Web allows DISCOVERY of new FACTS about THINGS, not just browsing through pages.
     * Picture by Roger Sayle
  13. Example: “Plavix” (Clopidogrel)*
       • SMILES: COC(=O)[C@H](C1=CC=CC=C1Cl)N2CCC3=C(C2)C=CS3
       • InChI: InChI=1S/C16H16ClNO2S/c1-20-16(19)15(12-4-2-3-5-13(12)17)18-8-6-14-11(10-18)7-9-21-14/h2-5,7,9,15H,6,8,10H2,1H3/t15-/m0/s1
       • InChIKey: GKTWGGQPFAXNFI-HNNXBMFYSA-N
     * Based on “Foreign Language Translation of Chemical Nomenclature by Computer” by Roger Sayle (DOI: 10.1021/ci800243w)
  14. How do we represent THINGS on the Semantic Web? On the Semantic Web we represent THINGS using elementary UNITS of data: TRIPLES. We can create logical and structural relations between the elements of a triple, build taxonomies, vocabularies and classes, and finally “reason” on large sets of triples. The framework in which we express the triples is called RDF (Resource Description Framework). For example:
          Subject (Thing)    Predicate (Property)     Object (Value)
          :H2O               gnvc:hasInChIString      “1S/H2O/h1H2”
          :H2O               :hasMolecularMass        “18.0153”
     “RDF is for THINGS as HTML is for DOCUMENTS”
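The subject/predicate/object idea on this slide can be sketched in a few lines of Python. This is only an illustration, not how the Chemical Semantics Portal stores data: the `Triple` type, the `TRIPLES` list and the `match` helper are hypothetical names, and attaching `:hasMolecularMass` to `:H2O` is an assumption read off the slide's layout.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One elementary unit of data: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Toy in-memory "triple store" holding the slide's water example.
# Assumption: both statements describe the same subject, :H2O.
TRIPLES = [
    Triple(":H2O", "gnvc:hasInChIString", "1S/H2O/h1H2"),
    Triple(":H2O", ":hasMolecularMass", "18.0153"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    # Everything we know about water.
    for t in match(TRIPLES, subject=":H2O"):
        print(t.subject, t.predicate, t.obj)
```

Pattern matching with wildcards is, in miniature, what a SPARQL triple pattern such as `?s ?p ?o` does against a real triple store.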
  15. How do we identify THINGS on the Semantic Web? For unambiguous identification of things (objects) on the Web and of their properties, the Semantic Web uses URIs (Uniform Resource Identifiers), a generalization of URLs, i.e. ordinary Web addresses. [Example on the slide: the thing “Water” and the property “Molecular Mass” are each named by a URI; the value “18.0153” is a number.]
  16. RDF Serialization – preliminary example. RDF/XML or Turtle (Terse RDF Triple Language):
          @prefix cs: <> .
          @prefix mol: <> .
          @prefix xs: <> .
          mol:molecule_31 a cs:molecule ;
              cs:name "water" ;
              cs:atom _:atom31_1 ;
              cs:atom _:atom31_2 ;
              cs:atom _:atom31_3 ;
              cs:bond _:bond31_1 ;
              cs:bond _:bond31_2 .
          _:atom31_1 cs:atomType cs:O ;
              cs:x3 "-0.381950"^^xs:double ;
              cs:y3 "0.243825"^^xs:double ;
              cs:z3 "0.000000"^^xs:double .
          _:atom31_2 cs:atomType cs:H ;
              cs:x3 "-0.381950"^^xs:double ;
              cs:y3 "1.203825"^^xs:double ;
              cs:z3 "0.000000"^^xs:double .
          _:atom31_3 cs:atomType cs:H ;
              cs:x3 "0.523148"^^xs:double ;
          (.....)
  17. The Semantic Web allows Discovery. The Semantic Web tools for building “intelligent” vocabularies, RDFS (RDF Schema) and OWL ontologies, allow for simple logical INFERENCES and the discovery of IMPLICIT facts. For example: when a user searches for a molecule with specific properties, it is possible to automatically offer them other molecules that belong to the same “class” of molecules.
  18. Semantic Web = GGG (Giant Global Graph). GGG is a term coined by Tim Berners-Lee in 2007. Ooops… sorry, but it’s BIG. [Diagram: Organization, HR, Services, Products, People and Product nodes connected by hasPeople, humanResources, hasServices, hasProducts, hasProduct and colleague links.]
  19. Core Semantic Web Technologies
       • RDF (Resource Description Framework): deals with THINGS
       • RDFa (RDF “in attributes”): enables embedding RDF into ordinary HTML Web pages
       • RDFS (Resource Description Framework Schema): deals with SETS and CLASSES of THINGS
       • OWL (Web Ontology Language): deals with intelligent VOCABULARIES (with logical relations between concepts)
       • SPARQL (SPARQL Protocol and RDF Query Language): allows searching through graphs of triples stored in “triple stores”
       • RIF (Rule Interchange Format): allows expressing and interchanging generalized IF...THEN constructs
  20. ... and one about Semantic Web Philosophy
       • AAA: Anyone can say Anything about Any topic.
       • OWA: Open World Assumption. We must assume that a new piece of information may arrive at any time, so we cannot assume that we have ALL the information at the moment we consume it. It also means that not knowing something does not necessarily imply falsity!
       • Hendler Hypothesis: “A Little Semantics Goes a Long Way”
  21. Linked Data: Hendler Hypothesis in action... Four principles:
       • Use WEB ADDRESSES (URLs) as names for things.
       • Use ADDRESSES THAT WORK ON THE WEB, so that people can look up those names.
       • When someone looks up a URL, PROVIDE USEFUL INFORMATION, USING THE STANDARDS (like RDF).
       • Include LINKS TO OTHER URLs, so that they can discover more things.
     “The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.” (Tim Berners-Lee)
  22. Ontologies. “An ontology formally represents knowledge as a set of concepts within a domain, and the relationships between pairs of concepts. It can be used to model a domain and support reasoning about concepts.” (Wikipedia) The fundamental goals of ontologies:
       • Define the concepts used in semantic graphs (like RDF)
       • Enable terminological standardisation
       • Provide tools for building intelligent dictionaries with synonyms and cross-references
       • Enable encoding of taxonomies (hierarchical definitions)
       • Enable reasoning and inferencing: discovering implicit knowledge
  23. Early ideas in ontology. Antoine Lavoisier, “Traité élémentaire de chimie”: “We think only through the medium of words. Languages are true analytical methods. (…) The art of reasoning is nothing more than a language well arranged. Thus, while I thought myself employed only in forming a Nomenclature, and while I proposed to myself nothing more than to improve the chemical language, my work transformed itself by degrees, without my being able to prevent it, into a treatise upon the Elements of Chemistry.”
  24. Example of Ontology “Hello world” (after Nivaldo J. Tro, “Chemistry. A Molecular Approach”):
          @prefix rdfs: <> .
          @prefix chem: <> .
          @prefix rdf: <> .
          @prefix xsd: <> .
          @prefix foo: <> .

          ## Classes
          chem:Matter a rdfs:Class ;
              rdfs:label "Matter"@en ;
              rdfs:label "Matière"@fr ;
              rdfs:label "Materia"@pl .
          chem:PureSubstances a rdfs:Class ;
              rdfs:label "Pure Substances"@en ;
              rdfs:label "Substances Pures"@fr ;
              rdfs:label "Substancja"@pl ;
              rdfs:subClassOf chem:Matter .
          chem:Mixture a rdfs:Class ;
              rdfs:label "Mixture"@en ;
              rdfs:label "Mélange"@fr ;
              rdfs:label "Mieszanina"@pl ;
              rdfs:subClassOf chem:Matter .
          chem:Heterogeneous a rdfs:Class ;
              rdfs:label "Heterogeneous"@en ;
              rdfs:label "Hétérogène"@fr ;
              rdfs:label "Heterogeniczny"@pl ;
              rdfs:subClassOf chem:Mixture .
          chem:Homogeneous a rdfs:Class ;
              rdfs:label "Homogeneous"@en ;
              rdfs:label "Homogène"@fr ;
              rdfs:label "Jednorodny"@pl ;
              rdfs:subClassOf chem:Mixture .

          ## Properties
          chem:atomicNumber a rdf:Property ;
              rdfs:domain chem:Element ;
              rdfs:range rdfs:Literal .
          chem:moleculeName a rdf:Property ;
              rdfs:domain chem:Compound ;
              rdfs:range rdfs:Literal .
          chem:componentName a rdf:Property ;
              rdfs:domain chem:Mixture ;
              rdfs:range chem:Matter .
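To make the "discovery of implicit facts" concrete, here is a minimal Python sketch of the simplest RDFS inference, following `rdfs:subClassOf` links upward. It is not a real reasoner: the `SUBCLASS_OF` table copies only the subclass statements from the "Hello world" ontology, it assumes each class has a single direct parent, and the function names are hypothetical.

```python
# Direct rdfs:subClassOf statements taken from the "Hello world" ontology.
# Assumption for this sketch: every class has at most one direct parent.
SUBCLASS_OF = {
    "chem:PureSubstances": "chem:Matter",
    "chem:Mixture": "chem:Matter",
    "chem:Heterogeneous": "chem:Mixture",
    "chem:Homogeneous": "chem:Mixture",
}


def superclasses(cls, subclass_of=SUBCLASS_OF):
    """Return every ancestor of cls, nearest first, by walking up the hierarchy."""
    found = []
    seen = {cls}
    while cls in subclass_of:
        cls = subclass_of[cls]
        if cls in seen:  # guard against accidental cycles in the data
            break
        seen.add(cls)
        found.append(cls)
    return found


def inferred_types(asserted_type):
    """All classes a thing belongs to, given the one class that was asserted."""
    return [asserted_type] + superclasses(asserted_type)


if __name__ == "__main__":
    # A thing asserted to be a homogeneous mixture is implicitly
    # also a mixture and a kind of matter.
    print(inferred_types("chem:Homogeneous"))
```

A search for all instances of `chem:Matter` could then also return things that were only ever declared as `chem:Homogeneous`, which is the kind of implicit fact an RDFS-aware store derives.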
  25. Non-Trivial Ontologies in Chemistry: ChEBI – Chemical Entities of Biological Interest. A project of EMBL-EBI, the European Bioinformatics Institute (Cambridge) of the European Molecular Biology Laboratory (Heidelberg). An OBO Foundry ontology (The Open Biological and Biomedical Ontologies). Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on ‘small’ chemical compounds. The term ‘molecular entity’ refers to any constitutionally or isotopically distinct atom, molecule, ion, ion pair, radical, radical ion, complex, conformer, etc., identifiable as a separately distinguishable entity. The molecular entities in question are either products of nature or synthetic products used to intervene in the processes of living organisms. ChEBI incorporates an ontological classification, whereby the relationships between molecular entities or classes of entities and their parents and/or children are specified.
  26. Non-Trivial Ontologies in Chemistry: ChemINF – Chemical Information Ontology. Janna Hastings, Nico Adams, Christoph Steinbeck (EBI), Leonid Chepelev, Michel Dumontier, Egon Willighagen, Nico Adams. OBO Foundry candidate. ChemINF describes:
       • Chemical graphs, and various formats for encoding them.
       • Chemical descriptors, with definitions and axioms describing what they are specifically about.
       • Specifications for certain descriptors.
       • Algorithms and their software implementations, and axioms describing their inputs and outputs.
       • Chemical data representation formalisms and formats.
  27. Chemical Semantics Ontology: Gainesville Core (alpha edition). Gainesville Core describes:
       • Molecular Publications
       • Molecular Systems
       • Molecular Calculations
     Structure:
       • Molecular Systems contain Molecules.
       • The Molecules may have Residues (for biopolymers and polymers).
       • Molecular Calculations contain Initial Data and Results.
       • The Initial Data may have Methods, Basis Sets, Functionals, etc.
       • The Results may have Energies, Wave Functions, Spectra, etc.
     GC aims at a complete description of a typical computational chemistry experiment.
  28. Chemical Semantics Ontology: gc.owl with Protege
  29. Related Ontologies ...
       • SIO – Semanticscience Integrated Ontology
       • OPB – Ontology of Physics for Biology
       • RXNO – Name Reaction Ontology
       • CMO – Chemical Methods Ontology
       • MOP – Molecular Process Ontology
       • SO – The Sequence Ontology Project
  30. Importance of Structural Data Structures: CML – Chemical Markup Language. “CML is not 'just another file format'; it is capable of holding extremely complex information structures and so acting as an interchange mechanism or for archival. It interfaces easily with modern database architectures such as relational databases or object-oriented databases. Most importantly, a large amount of generic XML software to process and transform it is already available from the community.” (P. Murray-Rust, H. S. Rzepa, 2001) CML “paved the road” to Semantics in Chemistry. It is extremely useful as an interchange format between CC software and the Semantic Web. Our position: Chemical Semantics will use CSX, a similar structural format enriched by explicit description of molecular constituents and by enriched description of computation inputs and results.
  31. A timeline of the Semantic Web
       • RDF – 1999
       • CML (Chemical Markup Language) – 1999
       • FOAF – 2000
       • RDFa – 2004
       • DBPedia – 2007
       • ChEBI (Chemical Entities of Biological Interest) – 2007
       • GoodRelations (2008, Google adoption: November 2, 2010) – June 2011
       • Google’s Knowledge Graph – May 2012
       • Facebook Graph Search – January 2013
  32. Conclusion. An emerging successor to the Web, the Semantic Web, will likely profoundly change the very nature of how scientific knowledge is produced and shared, in ways that we can now barely imagine.
  33. Chemical Semantics Portal
  34. CS Portal main targets
       • Interoperable PUBLISHING of computational chemistry calculations
       • FEDERATION of published data with existing web-based chemical datasets
       • Cloud-like ARCHIVING of computational chemistry calculation results, input/output files, etc.
  37. Manual publication (upload). Automated publication directly from modelling software, via Web API.
  38. Automated generation of permanent URIs
  39. Permanent Chemical URIs. Automated generation of permanent URIs. Annotations on the parts of the URI:
       • Owned and controlled by OCLC (Online Computer Library Center). Is claimed to be persistent and eternal.
       • Owned by OCLC, controlled by Chemical Semantics, Inc.
       • Generated by Chemical Semantics, Inc. for the user.
       • Owned by the user.
  40. URI naming scheme. URIs are assigned to:
       • Publication
       • Molecular Calculations
       • Molecular System
       • A Molecule of the system
       • Bonds between atoms in the molecule
  41. Dual nature of the URIs: realizes Linked Data principles. For humans (i.e. as seen via a web browser). Returns:
  42. Dual nature of the URIs: realizes Linked Data principles. For machines (i.e. as seen via Semantic Web tools such as rdfEditor or Fiddler). Returns: Content negotiation: “One gets what one asks for”
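The "one gets what one asks for" behaviour is HTTP content negotiation: the same URI returns HTML to a browser and RDF to a Semantic Web tool, depending on the request's Accept header. The sketch below shows the idea in plain Python under simplifying assumptions. The list of `AVAILABLE` formats and the HTML default are hypothetical (the slides do not say what the portal serves), and partial wildcards such as `text/*` are ignored.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so equal-q entries keep their header order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


# Hypothetical representations a portal might offer for one URI.
AVAILABLE = ["text/html", "text/turtle", "application/rdf+xml"]


def negotiate(accept_header, available=AVAILABLE, default="text/html"):
    """Pick the representation to return for a given Accept header."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            return default
        if media_type in available:
            return media_type
    return default


if __name__ == "__main__":
    # A browser-like request and a Semantic Web tool asking for Turtle.
    print(negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
    print(negotiate("text/turtle"))
```

Falling back to HTML when nothing matches is a design choice made for this sketch; a stricter server could answer 406 Not Acceptable instead.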
  43. More on “Human-oriented” views. “Results”: a prototype for a future publication “digest”
  44. More on “Human-oriented” views. “Molecules”: generic, WebGL-based molecular viewer
  45. More on “Human-oriented” views. “Wavefunction”: visualization of orbital energies
  46. More on “Human-oriented” views. “Graph”: explore the knowledge structure about your system
  47. More on “Human-oriented” views. “Data Federation”: explore Semantic Links to external resources
  48. More on “Human-oriented” views. “Datasets”: use the CS Portal for archiving purposes
  49. SPARQL queries on CS Portal. Counting the number of triples in the graphs of the CS Portal:
          SELECT ?graph (count(*) as ?count)
          WHERE {
            GRAPH ?graph { ?s ?p ?o . }
          }
          GROUP BY ?graph
          ORDER BY DESC(?count)
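For readers new to SPARQL, the GROUP BY / ORDER BY query above can be mirrored over a toy dataset in Python. The `QUADS` below are invented for illustration (graph names, molecule names and atom names are all hypothetical); only the `gc:hasAtom` predicate is borrowed from the later slides.

```python
from collections import Counter

# Hypothetical quads: (named graph, subject, predicate, object).
QUADS = [
    ("g:water", "m:H2O", "gc:hasAtom", "a:O1"),
    ("g:water", "m:H2O", "gc:hasAtom", "a:H1"),
    ("g:water", "m:H2O", "gc:hasAtom", "a:H2"),
    ("g:hcl", "m:HCl", "gc:hasAtom", "a:H3"),
    ("g:hcl", "m:HCl", "gc:hasAtom", "a:Cl1"),
]


def triples_per_graph(quads):
    """Count triples in each named graph, largest graph first.

    Mirrors: SELECT ?graph (count(*) as ?count) ... GROUP BY ?graph
             ORDER BY DESC(?count)
    """
    counts = Counter(graph for graph, _s, _p, _o in quads)
    return counts.most_common()


if __name__ == "__main__":
    for graph, count in triples_per_graph(QUADS):
        print(graph, count)
```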
  50. SPARQL queries on CS Portal. Counting the number of elements in all molecular systems on the CS Portal:
          PREFIX rdf: <>
          PREFIX gc: <>
          PREFIX rdfs: <>
          SELECT ?element (count(*) as ?count)
          WHERE {
            ?atom gc:isElement ?element .
          }
          GROUP BY ?element
          ORDER BY DESC(?count)
  51. SPARQL queries on CS Portal. Number of different calculations in all molecular systems of the CS Portal:
          PREFIX rdf: <>
          PREFIX gc: <>
          SELECT ?resultType (count(*) as ?count)
          WHERE {
            GRAPH ?graph {
              ?calc rdf:type gc:Calculation ;
                    gc:hasResult ?result .
              ?result rdf:type ?resultType .
            }
          }
          GROUP BY ?resultType
          ORDER BY DESC(?count)
  52. SPARQL queries on CS Portal. Number of molecular systems with halogen atoms on the CS Portal:
          PREFIX rdf: <>
          PREFIX gc: <>
          PREFIX rdfs: <>
          SELECT ?graph WHERE {
            GRAPH ?graph {
              { ?something gc:hasAtom ?atom1 ;
                           rdf:type ?somethingType ;
                           rdfs:label ?somethingLabel .
                ?atom1 gc:isElement "F" . }
              UNION
              { ?something gc:hasAtom ?atom2 ;
                           rdf:type ?somethingType ;
                           rdfs:label ?somethingLabel .
                ?atom2 gc:isElement "Cl" . }
              UNION
              { ?something gc:hasAtom ?atom3 ;
                           rdf:type ?somethingType ;
                           rdfs:label ?somethingLabel .
                ?atom3 gc:isElement "Br" . }
              UNION
              { ?something gc:hasAtom ?atom4 ;
          (.....)
  53. SPARQL queries on CS Portal. Number of inorganic molecular systems:
          ## Show all molecular systems (graphs) that contain none of the elements C, O, N, H
          PREFIX rdf: <>
          PREFIX gc: <>
          PREFIX rdfs: <>
          SELECT DISTINCT ?graph WHERE {
            { GRAPH ?graph { ?mol gc:hasAtom ?atom } }
            MINUS { GRAPH ?graph { ?a gc:isElement "C" } }
            MINUS { GRAPH ?graph { ?b gc:isElement "O" } }
            MINUS { GRAPH ?graph { ?c gc:isElement "N" } }
            MINUS { GRAPH ?graph { ?d gc:isElement "H" } }
          }
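The chained MINUS clauses are easy to misread, so a small set-based Python sketch may help. As written, each MINUS discards every graph that contains at least one atom of the named element, which leaves only graphs containing none of C, O, N or H; that is stricter than "contains some atom other than C, O, N, H". The data below is hypothetical, and the sketch assumes each graph's `gc:isElement` statements can be summarized as a set of element symbols.

```python
# Hypothetical summary of the portal data: named graph -> element symbols present.
ELEMENTS_BY_GRAPH = {
    "g:water": {"H", "O"},
    "g:methane": {"C", "H"},
    "g:salt": {"Na", "Cl"},
    "g:chloroform": {"C", "H", "Cl"},
}

EXCLUDED = {"C", "O", "N", "H"}


def graphs_without(elements_by_graph, excluded):
    """Graphs that contain atoms but none of the excluded elements.

    This is what the chained MINUS clauses compute.
    """
    return sorted(
        g for g, elements in elements_by_graph.items()
        if elements and elements.isdisjoint(excluded)
    )


def graphs_with_other(elements_by_graph, excluded):
    """Graphs with at least one atom outside the excluded set (a looser reading)."""
    return sorted(
        g for g, elements in elements_by_graph.items()
        if elements - excluded
    )


if __name__ == "__main__":
    print(graphs_without(ELEMENTS_BY_GRAPH, EXCLUDED))     # strict reading
    print(graphs_with_other(ELEMENTS_BY_GRAPH, EXCLUDED))  # looser reading
```

With this toy data the two readings differ on chloroform, which contains chlorine but also carbon and hydrogen: the MINUS query drops it, while the looser reading keeps it.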
  54. SPARQL queries on CS Portal. Energy values computed for all molecular systems:
          PREFIX rdf: <>
          PREFIX rdfs: <>
          PREFIX gc: <>
          SELECT ?sysEnergy ?energyValue ?energyName
          WHERE {
            GRAPH ?graph {
              ?molSys rdf:type gc:MolecularSystem ;
                      gc:hasCalculationOn ?molCalc .
              ?molCalc rdf:type gc:Calculation ;
                       gc:hasResult ?sysEnergy .
              ?sysEnergy rdf:type gc:SystemEnergies ;
                         ?p ?o .
              ?o gc:hasFloatValue ?energyValue ;
                 rdfs:label ?energyName .
            }
          }
          ORDER BY ?energyName
  55. Stay tuned ... If you want to work with us, or just share your opinions, do not hesitate to contact us at:
  56. Thank you…
       • Neil Ostlund, Hypercube, Inc., 1115 NW 4th St., Gainesville, FL 32608, USA. Phone: (352) 371 7744. Web: eMail:
       • Mirek Sopek, MakoLab SA, Demokratyczna 46, 93-430 Lodz, Poland. Phone: +48 600 814 537. Web: eMail: