Towards an Open Research Knowledge Graph

Document-oriented workflows in science have reached (or already exceeded) the limits of adequacy, as highlighted for example by recent discussions on the increasing proliferation of scientific literature and the reproducibility crisis. It is now possible to rethink this dominant paradigm of document-centered knowledge exchange and transform it into knowledge-based information flows by representing and expressing knowledge through semantically rich, interlinked knowledge graphs. At the core of establishing knowledge-based information flows is the creation and evolution of information models that give the various stakeholders a common understanding of data and information, together with the integration of these technologies into the infrastructure and processes of search and knowledge exchange in the research library of the future. By integrating these information models into existing and new research infrastructure services, the information structures that are currently still implicit and deeply hidden in documents can be made explicit and directly usable. This has the potential to revolutionize scientific work, because information and research results can be seamlessly interlinked with each other and better mapped to complex information needs. Research results also become directly comparable and easier to reuse.

Published in: Science

  1. 1. Towards an Open Research Knowledge Graph Sören Auer
  2. 2. Gottfried Wilhelm Leibniz, born 21 June / 1 July 1646 in Leipzig, died 14 November 1716 in Hanover Namesake Member of Library of Namesake
  3. 3. Had to do some research on serials…
  4. 4. 5 Serials Mail order catalogs
  5. 5. 6
  6. 6. 7 Mail order catalogs
  7. 7. 8
  8. 8. 9
  9. 9. 10 Road Maps
  10. 10. 11 Phone Books
  11. 11. How does it work today?
  12. 12. 13
  13. 13. 14
  14. 14. 15
  15. 15. 16 New means adapted to the new possibilities were developed, e.g. "zooming", dynamics Business models changed completely More focus on data, interlinking of data and services and search in the data Integration, crowdsourcing play an important role The World of Publishing & Communication has profoundly changed
  16. 16. What about Scholarly Communication?
  17. 17. 18 Scientific publishing in the 17th century One of the earliest research journals: Philosophical Transactions of the Royal Society © CC BY Henry Oldenburg
  18. 18. 19 Publishing in 1970s
  19. 19. 20 Scientific publishing today We have: BUT • Mainly based on PDF • Is only partially machine-readable • Does not preserve structure • Does not allow embedding of semantics • Does not facilitate interactivity/dynamicity/ repurposing • …
  20. 20. 21 Proliferation of scientific literature Duplication and inefficiency Deficiency of peer-review Reproducibility crisis Science is Seriously Flawed
  21. 21. 22 Science and engineering articles by region, country: 2004 and 2014 Proliferation of scientific literature National Science Foundation: Science and Engineering Publication Output Trends: https://www.nsf.gov/statistics/2018/nsf18300/nsf18300.pdf
  22. 22. 23 1,500 scientists lift the lid on reproducibility Monya Baker in Nature, 2016. 533 (7604): 452–454. doi:10.1038/533452a: • 70% failed to reproduce at least one other scientist's experiment • 50% failed to reproduce one of their own experiments Failure to reproduce results among disciplines (in brackets own results): • chemistry: 87% (64%), • biology: 77% (60%), • physics and engineering: 69% (51%), • Earth sciences: 64% (41%). Reproducibility Crisis © Stanford Medicine - Stanford University
  23. 23. 24 How can we avoid duplication if the terminology, research problems, approaches, methods, characteristics, evaluations, … are not properly defined and identified? How would you build an engine/building without properly defining their parts, relationships, materials, characteristics … ? Duplication and Inefficiency
  24. 24. 25 Lack of: • Transparency – information is hidden in text • Integrability – fitting different research results together • Machine assistance – unstructured content is hard to process • Identifiability of concepts beyond metadata • Collaboration – one brain barrier • Overview – scientists look for the needle in the haystack Root Cause - Deficiency of Scholarly Communication?
  25. 25. How can we fix it? 26
  26. 26. 27 Realizing Vannevar Bush's vision of Memex
  27. 27. Linked Data Principles 1. Use URIs to identify the "things" in your data 2. Use http:// URIs so people (and machines) can look them up on the web 3. When a URI is looked up, return a description of the thing in the W3C Resource Description Framework (RDF) 4. Include links to related things http://www.w3.org/DesignIssues/LinkedData.html 28 [1] Auer, Lehmann, Ngomo, Zaveri: Introduction to Linked Data and Its Lifecycle on the Web. Reasoning Web 2013
  28. 28. Page 29 RDF & Linked Data in a Nutshell 1. Graph-based RDF data model consisting of S-P-O (Subject, Predicate, Object) statements (facts). Diagram: NASIG conf:organizes NasigConf2018; NasigConf2018 conf:starts 09.06.2018; NasigConf2018 conf:takesPlaceAt dbpedia:Atlanta 2. Serialised as RDF triples: NASIG conf:organizes NasigConf2018 . NasigConf2018 conf:starts "2018-06-09"^^xsd:date . NasigConf2018 conf:takesPlaceAt dbpedia:Atlanta . 3. Publication under a URL on the Web, intranet, or extranet
  29. 29. Page 30 Creating Knowledge Graphs with RDF Linked Data (figure: example graph; nodes: DHL, Post Tower, Bonn, Logistics; literal values: "DHL International GmbH", 162.5 m, "Logistik", "物流"; edge labels: located in, label, industry, headquarters, full name, height)
  30. 30. Page 31 RDF Data Model (a bit more technical) Graph consists of: • Resources (identified via URIs) • Literals: data values with data type (URI) or language (multilinguality integrated) • Attributes of resources are also URI-identified (from vocabularies) Various data sources and vocabularies can be arbitrarily mixed and meshed. URIs can be shortened with namespace prefixes, e.g. dbp: → http://dbpedia.org/resource/ (figure: the example graph with prefixed names; resources: dbp:DHL_International_GmbH, dbp:Post_Tower, dbp:Bonn, dbp:Logistics, unit:Meter; literals: "DHL International GmbH"^^xsd:string, "162.5"^^xsd:decimal, "Logistik"@de, "物流"@zh; properties: gn:locatedIn, rdfs:label, dbo:industry, ex:headquarters, foaf:name, ex:height, rdf:value, ex:unit)
  31. 31. Page 32 Knowledge Graphs – A definition • Fabric of concept, class, property, relationships, entity descriptions • Uses a knowledge representation formalism (typically RDF, RDF-Schema, OWL) • Holistic knowledge (multi-domain, source, granularity): • instance data (ground truth), • open (e.g. DBpedia, WikiData), private (e.g. supply chain data), closed data (product models), • derived, aggregated data, • schema data (vocabularies, ontologies) • meta-data (e.g. provenance, versioning, documentation, licensing) • comprehensive taxonomies to categorize entities • links between internal and external data • mappings to data stored in other systems and databases Smart Data for Machine Learning
  32. 32. Page 33
  33. 33. Page 34 Emerging Knowledge Graphs & Data Spaces Search Engine Optimization & Web-Commerce • Schema.org used by >20% of Web sites • Major search engines exploit semantic descriptions Pharma, Life Sciences • Mature, comprehensive vocabularies and ontologies • Billions of disease, drug, clinical trial descriptions Digital Libraries • Many established vocabularies (DublinCore, FRBR, EDM) • Millions of items aggregated from thousands of memory institutions in Europeana, German Digital Library
  34. 34. Paradigm Change in Scholarly Communication Towards more Knowledge-based Information Flows
  35. 35. 36 Paradigm Change in Scholarly Communication Knowledge-based Information Flows in Science & Technology Challenges: Digitalisation of Science, monopolisation by commercial actors, Proliferation of publications, Reproducibility Crisis
  36. 36. 37 Mathematics • Definitions • Theorems • Proofs • Methods • … Physics • Experiments • Data • Models • … Chemistry • Substances • Structures • Reactions • … Computer Science • Concepts • Implementations • Evaluations • … Technology • Standards • Processes • Elements • Units, Sensor data Architecture • Regulations • Elements • Models • … Open Research Knowledge Graph Overarching Concepts • Research problems • Definitions • Research approaches • Methods Artefacts • Publications • Data • Software • Image/Audio/Video • Knowledge Graphs / Ontologies Domain specific concepts Open Research Knowledge Graph makes comprehensive and subject-specific concepts clearly identifiable and links them semantically (with clearly described relations) with each other and with relevant further artifacts.
  37. 37. 38
  38. 38. 39 Search for CRISPR: >4,000 Results
  39. 39. 40 Chemistry Example: CRISPR/Cas Genome Editing
  40. 40. 41 Semantic Representation using a Knowledge Graph Author Robert Reed Research Problem Methods Experimental Data related Concepts Genome editing in Lepidoptera CRISPR/cas9 Lepidoptera; Genome editing; CRISPR https://doi.org/10.5281/zenodo.896916 A practical guide to CRISPR/cas9 editing in Lepidoptera <https://doi.org/10.1101/130344> Robert Reed <https://orcid.org/0000-0002-6065-6728> Genome editing in Lepidoptera Experimental Data https://doi.org/10.5281/zenodo.896916 isAuthorOf addresses CRISPR/cas9 isImplementedBy isEvaluatedWith Genome editing <https://www.wikidata.org/wiki/Q24630389> relatesConcept 3. Graph representation 2. Graph Curation Form 1. Original Publication
  41. 41. 42 Automatic Generation of Comparisons/Surveys
  42. 42. 43 Open Research Knowledge Graph interlinks existing Services and Resources
  43. 43. 44
  44. 44. Interlinking Article, Software, Video and Graph resources describing the research
  45. 45. 47 Advantages of knowledge-based scholarly communication • Clear identification of all relevant artifacts, concepts, attributes, relationships → terminological and conceptual precision and sharpness, less ambiguity • Better and explicit networking of all relevant artifacts and information sources → traceability • ORKG machine-readability → new search, retrieval, mining and assistance applications • Avoidance of media discontinuities in the different phases of scientific work → Increased efficiency • Use of concepts and relationships across disciplinary boundaries → Interdisciplinarity and transdisciplinarity • Halting the proliferation of scientific publications → less duplication • Facilitating the entry of young academics or laypersons → Open Science
  46. 46. 48 Outlook There is a lot to do: • Equip existing services with Linked Data interfaces • Enable the deep semantic description of research, which requires: • Good user interfaces • Scalable storage and search facility • Collaboration between scientists, librarians, knowledge engineers, machines Stay tuned • Mailing list/group: https://groups.google.com/forum/#!forum/orkg • Coming soon: Open Research Knowledge Graph: https://orkg.org • Next workshop at TIB on November 22nd (after DILS Conference: https://events.tib.eu/dils2018/)
  47. 47. https://de.linkedin.com/in/soerenauer https://twitter.com/soerenauer https://www.xing.com/profile/Soeren_Auer http://www.researchgate.net/profile/Soeren_Auer TIB & Leibniz University of Hannover Soeren.Auer@tib.eu Sören Auer
  48. 48. 50 Said Fathalla, Sahar Vahdati, Sören Auer, Christoph Lange: Towards a Knowledge Graph Representing Research Findings by Semantifying Survey Articles. TPDL 2017: 315-327, https://www.researchgate.net/publication/319419350 Sahar Vahdati, Natanael Arndt, Sören Auer, Christoph Lange: OpenResearch: Collaborative Management of Scholarly Communication Metadata. EKAW 2016: 778-793, https://www.researchgate.net/publication/309700661 Sören Auer: Towards an Open Research Knowledge Graph https://zenodo.org/record/1157185 Sören Auer, Viktor Kovtun, Manuel Prinz, Anna Kasprzik, Markus Stocker: Towards a Knowledge Graph for Science. https://doi.org/10.15488/3401 References
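
The S-P-O statements and namespace prefixes described on the RDF slides can be illustrated with a small sketch. It is not taken from the talk: it uses only the Python standard library rather than an RDF toolkit, and the `conf:` and `ex:` namespace URIs, as well as the helper names `expand`, `match` and `to_ntriples`, are assumptions made up for illustration. Only the `dbpedia:` and `xsd:` namespaces are the well-known ones.

```python
from typing import NamedTuple, Optional

# Namespace prefixes. The dbpedia and xsd URIs are the well-known ones;
# "conf" and "ex" are hypothetical namespaces invented for this sketch.
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "conf": "http://example.org/conf#",
    "ex": "http://example.org/resource/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'dbpedia:Atlanta' into a full URI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


class Literal(NamedTuple):
    """A data value with an optional datatype URI."""
    value: str
    datatype: Optional[str] = None


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: object  # either a URI string or a Literal


# The three statements from the NASIG conference example.
GRAPH = [
    Triple(expand("ex:NASIG"), expand("conf:organizes"),
           expand("ex:NasigConf2018")),
    Triple(expand("ex:NasigConf2018"), expand("conf:starts"),
           Literal("2018-06-09", expand("xsd:date"))),
    Triple(expand("ex:NasigConf2018"), expand("conf:takesPlaceAt"),
           expand("dbpedia:Atlanta")),
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def to_ntriples(triple: Triple) -> str:
    """Serialise one triple as an N-Triples line (no string escaping)."""
    if isinstance(triple.obj, Literal):
        obj = f'"{triple.obj.value}"'
        if triple.obj.datatype:
            obj += f"^^<{triple.obj.datatype}>"
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


if __name__ == "__main__":
    # Everything the graph states about the conference itself.
    for t in match(GRAPH, subject=expand("ex:NasigConf2018")):
        print(to_ntriples(t))
```

Because every statement has the same three-part shape, a question such as "what is known about this conference?" reduces to a pattern match with wildcards, which is the property the knowledge graph approach relies on. A production system would more likely use an RDF library and a triple store than hand-rolled tuples.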
