Earth science knowledge graph

An automatic approach to developing an Earth science knowledge graph using NLP

  1. National Aeronautics and Space Administration, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California
     An Automatic Approach to Building an Earth Science Knowledge Graph (ESKG) to Improve Data Discovery
     Lewis J. McGibbney (JPL), Yongyao Jiang (George Mason University)
  2. Agenda/Objectives
     This presentation aims to provide:
     • an introduction to ESKG (for those who may never have heard of it before)
     • an update on the ESKG Testbed project (for those who have…)
     • a budget update
     • future plans and community building efforts
     Who is this presentation for?
     • Put simply, anyone with an interest in the ESIP Labs project
     • Semantic Technologists
     • Data integration and discovery enthusiasts
     What are the takeaways?
     • Learn about the growing ESIP Semantic Technology stack
     • Learn about the ESKG codebase: https://github.com/ESIPFed/eskg/
     • Consider getting involved in building linked open data and knowledge graph(s) for the Earth Sciences community.
     ESIP Summer 2017
  3. The ESKG Team
  4. Introduction
     What is ESKG?
     ESKG [1] is an effort to revolutionize the way in which ESIP communities interact with Earth science (ES) data in the open world, through the entity, spatial and temporal linkages and characteristics that make up that data. The project will advance ESIP collaboration areas, including Discovery, Semantic Technologies and possibly the Drone community, by putting graph information at our fingertips in an interactive, modern manner and by reducing the effort required to construct ontologies.
     ESKG will strengthen ties between observations and user communities by:
     • developing a knowledge graph derived from heterogeneous sources via natural language processing and knowledge extraction techniques, and
     • allowing users to traverse, explore, query, reason over and navigate ES data via knowledge graph interaction.
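To make "traverse, explore, query" concrete, here is a minimal sketch of a knowledge graph stored as subject-predicate-object triples, with wildcard pattern queries and a breadth-first traversal. It is an illustrative assumption, not ESKG's actual data model or code; the class name `KnowledgeGraph` and the example entities (`dataset:ExampleSST`, `SeaSurfaceTemperature`, `region:Global`) are hypothetical.

```python
from collections import defaultdict, deque


class KnowledgeGraph:
    """A tiny in-memory knowledge graph of (subject, predicate, object) triples (illustrative only)."""

    def __init__(self):
        self.triples = set()
        # Outgoing-edge index: subject -> {(predicate, object), ...}
        self._out = defaultdict(set)

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))
        self._out[subject].add((predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

    def reachable(self, start, max_hops=2):
        """Breadth-first traversal: entities reachable from `start` within `max_hops` edges."""
        depth = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if depth[node] == max_hops:
                continue  # do not expand beyond the hop limit
            for _predicate, neighbour in self._out.get(node, ()):
                if neighbour not in depth:
                    depth[neighbour] = depth[node] + 1
                    queue.append(neighbour)
        depth.pop(start)  # the start node is not reported as reachable from itself
        return sorted(depth)


if __name__ == "__main__":
    kg = KnowledgeGraph()
    kg.add("dataset:ExampleSST", "measures", "SeaSurfaceTemperature")
    kg.add("SeaSurfaceTemperature", "subClassOf", "OceanTemperature")
    kg.add("dataset:ExampleSST", "spatialCoverage", "region:Global")
    print(kg.query(predicate="measures"))
    print(kg.reachable("dataset:ExampleSST", max_hops=2))
    # ['OceanTemperature', 'SeaSurfaceTemperature', 'region:Global']
```

A production graph would use an RDF store and a query language such as SPARQL; the point here is only the shape of the data and of a traversal.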
  5. How is this achieved? What are the goals?
     • Leverage progress made through recent and existing Semantic Technologies Testbed projects [2], [3] to deliver the ESKG concept, originally focused on NASA JPL's PO.DAAC [4].
     • Provide the ESIP community with a semi- or fully automated knowledge representation methodology that overcomes the current limitations of manual ontology development approaches.
     • Ensure that both of the above goals advance the Semantic Technologies and Discovery collaboration areas within the ESIP Federation.
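One way an automated approach can avoid hand-building an ontology is to derive a class hierarchy from metadata that is already hierarchical. The sketch below assumes GCMD-style keyword paths (levels separated by `>`, e.g. `EARTH SCIENCE > OCEANS > OCEAN TEMPERATURE > SEA SURFACE TEMPERATURE`) and treats each level as a subclass of the level above it. That mapping is a simplifying assumption (keyword hierarchies express broader/narrower terms, not always strict subclassing), and the function names are hypothetical rather than part of ESKG.

```python
def keyword_path_to_triples(path, sep=">"):
    """Turn one hierarchical keyword path into (child, "subClassOf", parent) triples."""
    # Normalise each level, e.g. "SEA SURFACE TEMPERATURE" -> "Sea Surface Temperature"
    levels = [part.strip().title() for part in path.split(sep) if part.strip()]
    # Pair each level with the one below it: the lower level becomes a subclass of the upper one
    return [(child, "subClassOf", parent) for parent, child in zip(levels, levels[1:])]


def build_taxonomy(paths):
    """Merge the triples from many keyword paths, dropping duplicates."""
    triples = set()
    for path in paths:
        triples.update(keyword_path_to_triples(path))
    return sorted(triples)


if __name__ == "__main__":
    paths = [
        "EARTH SCIENCE > OCEANS > OCEAN TEMPERATURE > SEA SURFACE TEMPERATURE",
        "EARTH SCIENCE > OCEANS > SALINITY/DENSITY > SALINITY",
    ]
    for triple in build_taxonomy(paths):
        print(triple)
```

Shared upper levels (here `Oceans subClassOf Earth Science`) are emitted once, which is what lets many metadata records accumulate into a single taxonomy.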
  6. Rationale behind Ontology Integration and Knowledge Graphs
     • Ontology-based data integration involves the use of ontologies to effectively combine data or information from multiple heterogeneous sources [5].
     • Within the Earth sciences there are several persistent efforts to improve semantics and increase their uptake.
     • Such efforts take numerous forms, e.g.
       – Community level: ESIP Semantic Technology Committee [6]
       – Infrastructure level: ESIP Semantic Technology Portal [2] and Community Ontology Repository [3] (more to come on this…)
       – Vocabulary level: SWEET [7], ENVO [8], etc.
     The goal is that, by using such resources, we can better align search and discovery functionality with the needs of the Earth Science communities it will eventually serve.
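At its simplest, ontology-based data integration means mapping each source's local vocabulary onto shared ontology terms so that records from different sources can be combined. The following sketch assumes two hypothetical sources (`source_a`, `source_b`) whose field names differ; the mapping table and the term names are illustrative, not taken from SWEET, ENVO or ESKG.

```python
# Hypothetical mappings from each source's local field names to shared ontology terms.
SOURCE_MAPPINGS = {
    "source_a": {"sst": "SeaSurfaceTemperature", "lat": "Latitude", "lon": "Longitude"},
    "source_b": {
        "sea_surface_temp": "SeaSurfaceTemperature",
        "latitude": "Latitude",
        "longitude": "Longitude",
    },
}


def to_shared_terms(source, record):
    """Rename a record's fields to shared ontology terms; unmapped fields are dropped."""
    mapping = SOURCE_MAPPINGS[source]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def integrate(records):
    """Combine (source_name, record) pairs into one list that uses a single vocabulary."""
    return [{**to_shared_terms(source, record), "source": source} for source, record in records]


if __name__ == "__main__":
    merged = integrate([
        ("source_a", {"sst": 291.2, "lat": 10.0, "lon": -120.0, "qc_flag": 1}),
        ("source_b", {"sea_surface_temp": 290.8, "latitude": 10.0, "longitude": -120.0}),
    ])
    for row in merged:
        print(row)
```

This covers only the renaming step; a fuller integration would also reconcile units, controlled value vocabularies and the relationships between concepts.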
  7. What kind of things does ESKG allow?
     Currently…
     1. Generation of high-quality, domain-specific (oceanographic) ontologies
     2. Free and open use and availability of those ontologies to the wider Earth Science community, e.g. persistence and archival within ESIP infrastructure [2], [3]
     3. Consumption of our own (and others') semantic resources within the NASA AIST (NNX15AM85G) MUDROD [9] project for semantic search, search engine ranking and possibly visualization:
        1. Semantic search: term expansion/auto-completion/suggestions through the use of synonyms and sub/super classes, term negation through antonyms, etc.
        2. Search engine ranking: using ontological relationships as a fundamental element of an overall MUDROD scoring metric.
        3. Visualization: through an ontology graph interface such as the one deployed on the old SWEET JPL website. This use case typically aligns with domain discovery.
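The semantic search and ranking use cases can be illustrated with a small query-expansion sketch: a query term is expanded with its synonyms and with sub- and superclasses from an ontology, each given a weight, and documents are ranked by the weights of the expanded terms they contain. The mini-ontology, the weights (1.0, 0.9, 0.7, 0.5) and the substring scoring are assumptions made for illustration; this is not MUDROD's actual scoring metric, and antonym-based negation is not shown.

```python
# Hypothetical mini-ontology (lower-case terms).
SYNONYMS = {"sst": {"sea surface temperature"}, "sea surface temperature": {"sst"}}
SUBCLASS_OF = {  # child -> parent
    "sea surface temperature": "ocean temperature",
    "skin temperature": "sea surface temperature",
}


def subclasses(term):
    """Direct subclasses of `term` in the mini-ontology."""
    return {child for child, parent in SUBCLASS_OF.items() if parent == term}


def expand_query(term):
    """Return {expanded_term: weight}; the original term gets weight 1.0."""
    term = term.lower()
    weights = {term: 1.0}
    for synonym in SYNONYMS.get(term, ()):
        weights.setdefault(synonym, 0.9)
    # Add direct sub- and superclasses of the term and of its synonyms.
    for base in list(weights):
        for child in subclasses(base):
            weights.setdefault(child, 0.7)  # narrower terms stay fairly relevant
        parent = SUBCLASS_OF.get(base)
        if parent:
            weights.setdefault(parent, 0.5)  # broader terms are weaker matches
    return weights


def score(doc_text, expanded):
    """Sum the weights of expanded terms that appear (as substrings) in the document."""
    text = doc_text.lower()
    return sum(weight for term, weight in expanded.items() if term in text)


def rank(docs, query):
    """Order documents by ontology-aware score, highest first."""
    expanded = expand_query(query)
    return sorted(docs, key=lambda doc: score(doc, expanded), reverse=True)


if __name__ == "__main__":
    docs = ["Sea ice concentration", "Skin temperature from infrared radiometers", "Global SST analysis"]
    print(expand_query("SST"))
    print(rank(docs, "SST"))
```

A document that never mentions "SST" (the skin temperature record) still ranks above an unrelated one, because the ontology links it to the query through a subclass relation.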
  8. ESKG Architecture
     [Architecture diagram; components shown: PODAACWebServicesClient, PODAACOntologyMapper, Storage Abstraction, MUDROD]
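The architecture slide names four components. A minimal sketch of how such stages could be chained follows, under the assumption that the flow is: fetch metadata, map it to ontology triples, then persist them through a storage abstraction that a search engine such as MUDROD could later read. The Python classes (`StubClient`, `OntologyMapper`, `InMemoryStore`, `run_pipeline`) are hypothetical stand-ins for the roles of `PODAACWebServicesClient`, `PODAACOntologyMapper` and the storage abstraction; they are not the ESKG Java API.

```python
from typing import Dict, Iterable, List, Protocol, Tuple

Triple = Tuple[str, str, str]


class MetadataClient(Protocol):
    def fetch(self, dataset_id: str) -> Dict: ...


class TripleStore(Protocol):
    def persist(self, triples: Iterable[Triple]) -> None: ...


class StubClient:
    """Stands in for a web-services client; returns canned metadata instead of calling PO.DAAC."""

    def __init__(self, records: Dict[str, Dict]):
        self.records = records

    def fetch(self, dataset_id: str) -> Dict:
        return self.records[dataset_id]


class OntologyMapper:
    """Maps one metadata record onto triples (illustrative mapping rules)."""

    def map(self, dataset_id: str, record: Dict) -> List[Triple]:
        subject = f"dataset:{dataset_id}"
        triples = [(subject, "rdf:type", "Dataset")]
        triples += [(subject, "hasParameter", p) for p in record.get("parameters", [])]
        return triples


class InMemoryStore:
    """Storage abstraction backed by a set; a real deployment would target a portal or repository."""

    def __init__(self):
        self.triples = set()

    def persist(self, triples: Iterable[Triple]) -> None:
        self.triples.update(triples)


def run_pipeline(dataset_ids, client: MetadataClient, mapper: OntologyMapper, store: TripleStore):
    """Fetch, map and persist each dataset; returns the store for downstream consumers."""
    for dataset_id in dataset_ids:
        store.persist(mapper.map(dataset_id, client.fetch(dataset_id)))
    return store
```

Keeping storage behind a small interface means the same mapping code could persist to the ESIP Semantic Portal, COR or a local file simply by swapping the store object.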
  9. PO.DAAC Datasets Ontology (PDO)
     • PO.DAAC offers web services APIs for programmatic access to a variety of PO.DAAC data holdings [10].
     • Data is available in a variety of serializations, e.g. HTML, ATOM/RSS, GCMD, etc.
     • All PO.DAAC metadata is currently structured according to GCMD DIF v9.8.2. The XSD can be found at [11], and an example PO.DAAC GCMD metadata record can be found at [12] and at the link below:
       https://podaac.jpl.nasa.gov/ws/metadata/dataset?datasetId=PODAAC-TELND-PGTX1&format=gcmd
     • Complete end-to-end code for generating PDO and persisting it into either the ESIP Semantic Portal or COR can be found at [13]. This is essentially the canonical ESKG source.
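The example record link above can be reproduced programmatically. This sketch builds the metadata URL from the endpoint shown on the slide and extracts a few GCMD DIF fields (`Entry_ID`, `Entry_Title`, and the `Parameters` keyword levels) with Python's standard library. Only a small subset of DIF 9.8.2 is handled, the parser ignores XML namespaces rather than assuming a particular one, and because the 2017 endpoint may no longer be live, parsing is shown on a hypothetical inline record rather than a real response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Endpoint as shown on the slide (2017); it may have been retired since.
PODAAC_METADATA_ENDPOINT = "https://podaac.jpl.nasa.gov/ws/metadata/dataset"

# DIF Parameters child elements joined into a keyword path (a subset of the schema).
KEYWORD_LEVELS = ("Category", "Topic", "Term", "Variable_Level_1")


def metadata_url(dataset_id, fmt="gcmd"):
    """Build the dataset-metadata request URL for one dataset ID."""
    return f"{PODAAC_METADATA_ENDPOINT}?{urlencode({'datasetId': dataset_id, 'format': fmt})}"


def _local(tag):
    """Drop an XML namespace prefix: '{ns}Entry_ID' becomes 'Entry_ID'."""
    return tag.rsplit("}", 1)[-1]


def parse_dif(xml_text):
    """Extract the entry ID, title and science-keyword paths from a DIF record."""
    root = ET.fromstring(xml_text)
    result = {"entry_id": None, "title": None, "keywords": []}
    for element in root.iter():
        name = _local(element.tag)
        if name == "Entry_ID":
            result["entry_id"] = (element.text or "").strip()
        elif name == "Entry_Title":
            result["title"] = (element.text or "").strip()
        elif name == "Parameters":
            levels = [
                (child.text or "").strip()
                for child in element
                if _local(child.tag) in KEYWORD_LEVELS
            ]
            result["keywords"].append(" > ".join(level for level in levels if level))
    return result


if __name__ == "__main__":
    print(metadata_url("PODAAC-TELND-PGTX1"))
    sample = (
        '<DIF xmlns="urn:example:dif">'  # hypothetical namespace; parsing ignores it
        "<Entry_ID>EXAMPLE-DATASET</Entry_ID><Entry_Title>Hypothetical example</Entry_Title>"
        "<Parameters><Category>EARTH SCIENCE</Category><Topic>OCEANS</Topic>"
        "<Term>OCEAN TEMPERATURE</Term><Variable_Level_1>SEA SURFACE TEMPERATURE</Variable_Level_1>"
        "</Parameters></DIF>"
    )
    print(parse_dif(sample))
```

To try a live record, one could pass `metadata_url(...)` to `urllib.request.urlopen` and decode the response before calling `parse_dif`, assuming the service still responds.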
  10. Project Update
      A full project update as of 30 June 2017 is available at [14]. It provides:
      • An updated budget showing project spending to date, as well as projected spending to accommodate specific events (such as proposed conference/workshop attendance), as explained within the text.
      • Alignment of projected conference/workshop attendance with the deliverables outlined in a previously submitted Memorandum of Understanding (MOU), attached as Appendix I.
      To execute the ESKG Testbed project, we originally requested the full available budget of $7,000. In the project proposal we also stated that:
      • the budget would be distributed equally over the project duration,
      • no costs would be directed towards travel or ESIP meeting attendance (e.g. travel to meetings, accommodation and meeting registration), as funding for such costs had already been secured by the proposed participating collaborators, and finally
      • a proportionally small part of the budget 'may', however, be directed towards additional community/breakout events at ESIP meetings, preparatory work such as producing poster presentation(s), and transitioning the ESKG project to an appropriate long-term home.
  11. Project Budgeting
  12. Future Work
      • Integration of NOAA (and hopefully USGS) data sources.
      • Community building through attendance at the International Conference on Biomedical Ontology (ICBO) 2017, September 13th–17th, 2017, in Newcastle, England [15]. In particular, ESKG (and therefore ESIP) will be present at the ONTOEDIT 2017 Workshop [16], which addresses challenges in the design, authoring and publication of ontologies.
      • Make a formal release of ESKG to Maven Central [17], enabling the ESKG Java client to be used as a dependency in other projects.
      • Present ESKG at the AGU Fall Meeting, New Orleans, December 11th–15th, 2017, in session IN028: Enabling Interoperability, Interdisciplinary Use, and Stewardship of Scientific Data through Knowledge Representation Frameworks [18].
  13. Acknowledgements
      • ESKG was funded as an ESIP Testbed project; this support is gratefully acknowledged. No salaries were paid to either Lewis John McGibbney or Yongyao Jiang.
      • Lewis John McGibbney's participation in ESIP is funded by the National Aeronautics and Space Administration; this support is gratefully acknowledged.
      • Thanks to the ESIP Testbed Committee/Review Panel.
      Please contact us at [19] with any questions.
  14. References
      1. http://bit.ly/2jFjEDB
      2. http://semanticportal.esipfed.org
      3. http://cor.esipfed.org/
      4. https://podaac.jpl.nasa.gov
      5. H. Wache, T. Vögele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann, S. Hübner (2001). Ontology-Based Integration of Information: A Survey of Existing Approaches. CiteSeerX 10.1.1.142.4390
      6. http://wiki.esipfed.org/index.php/Semantic_Technologies
      7. https://github.com/ESIPFed/sweet
      8. https://github.com/EnvironmentOntology/envo/
      9. https://mudrod.github.io/
      10. https://podaac.jpl.nasa.gov/ws
      11. http://gcmd.nasa.gov/Aboutus/xml/dif/dif_v9.8.2.xsd
      12. https://podaac.jpl.nasa.gov/ws/metadata/dataset?datasetId=PODAAC-TELND-PGTX1&format=gcmd
  15. References (continued)
      13. https://github.com/ESIPFed/eskg
      14. http://bit.ly/2vdaHre
      15. https://conferences.ncl.ac.uk/icbo17/
      16. https://conferences.ncl.ac.uk/icbo17/workshops/
      17. http://search.maven.org/
      18. https://agu.confex.com/agu/fm17/preliminaryview.cgi/Session23620
      19. https://github.com/ESIPFed/eskg/#community
