Ontology Development for Provenance Tracing in National Climate Assessment of the US Global Change Research Program

The periodic National Climate Assessment (NCA) of the US Global Change Research Program (USGCRP) [1] produces reports on findings about global climate change and its impacts on the United States. Those findings are of great public and academic concern and are used in policy and management decisions, which makes the provenance information of the findings in those reports especially important. The USGCRP is developing a Global Change Information System (GCIS), in which the NCA reports and associated provenance information are the primary records.

We modeled and developed Semantic Web applications for the GCIS. By applying a use case-driven iterative methodology [2], we developed an ontology [3] to represent the content structure of a report and the associated provenance information. We also mapped the classes and properties in our ontology to the W3C PROV-O ontology [4] to give the provenance a formal representation. We implemented the ontology in several pilot systems for a recent National Climate Assessment report (the NCA3). These systems let users browse and search provenance information by topic of interest. Applying the ontology has made the provenance information of the NCA3 structured and interoperable. Besides the pilot systems we developed, other tools and services can also interact with the data in the context of the “Web of data” and thus create added value.
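
To make the idea of browsing structured provenance more concrete, the following is a minimal sketch in plain Python (standard library only, no RDF store). It encodes the chain described in the third use case of the slides (Figure 1.2 cites a paper, the paper uses a dataset, the dataset is derived from the TOPEX/Poseidon and Jason missions, which are funded by NASA and CNES) and searches for every path from the figure to NASA. All identifiers and property names except “paper/103” are hypothetical and are not taken from the GCIS ontology.

```python
from collections import deque

# Hypothetical identifiers and property names, for illustration only;
# they are not copied from the GCIS ontology file.
TRIPLES = [
    ("figure/1.2", "cites", "paper/103"),
    ("paper/103", "usesDataset", "dataset/sea-level"),
    ("dataset/sea-level", "derivedFrom", "mission/topex-poseidon"),
    ("dataset/sea-level", "derivedFrom", "mission/jason"),
    ("mission/topex-poseidon", "fundedBy", "org/nasa"),
    ("mission/topex-poseidon", "fundedBy", "org/cnes"),
    ("mission/jason", "fundedBy", "org/nasa"),
    ("mission/jason", "fundedBy", "org/cnes"),
]


def trace(start, target, triples):
    """Return every path of (subject, predicate, object) steps from start to target."""
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))

    paths = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == target:
            paths.append(path)
            continue
        # Nodes already on this path; skipping them prevents cycles.
        on_path = {s for s, _, _ in path} | {node}
        for p, o in outgoing.get(node, []):
            if o not in on_path:
                queue.append((o, path + [(node, p, o)]))
    return paths


if __name__ == "__main__":
    for path in trace("figure/1.2", "org/nasa", TRIPLES):
        print(" -> ".join([path[0][0]] + [o for _, _, o in path]))
```

In a real deployment the same question would more likely be asked as a SPARQL query against the GCIS triple store; the breadth-first search here only illustrates the kind of link-following that the ontology makes possible.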

Our research shows that the use case-driven iterative method bridges the gap between Semantic Web researchers and earth and environmental scientists and can be deployed rapidly for developing Semantic Web applications. Our work also provides first-hand experience of re-using the W3C PROV-O ontology in the field of earth and environmental sciences; the PROV-O ontology was ratified only recently (on 04/30/2013) by the W3C as a recommendation, and relevant applications are still rare.
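
As a rough illustration of what re-using PROV-O buys, the sketch below shows how asserting domain classes as sub-classes of PROV-O classes lets a generic query over `prov:Entity` or `prov:Activity` also return domain instances. It is a hand-rolled transitive sub-class lookup in plain Python, not a real reasoner, and every `ex:` class and instance name is a hypothetical stand-in rather than a term from the GCIS ontology.

```python
# Hypothetical class hierarchy; the real GCIS ontology may name or place
# these classes differently. Only the prov: terms are real PROV-O classes.
SUBCLASS_OF = {
    "ex:Report": {"prov:Entity"},
    "ex:Chapter": {"prov:Entity"},
    "ex:Figure": {"prov:Entity"},
    "ex:ChapterWriting": {"prov:Activity"},
    "ex:ConveningLeadAuthor": {"ex:Author"},
    "ex:Author": {"prov:Agent"},
}

# Hypothetical instances with their asserted (most specific) class.
INSTANCE_TYPES = {
    "ex:nca3": "ex:Report",
    "ex:nca3-chapter-6": "ex:Chapter",
    "ex:writing-of-chapter-6": "ex:ChapterWriting",
    "ex:author-a": "ex:ConveningLeadAuthor",
}


def superclasses(cls, subclass_of):
    """All direct and inherited superclasses of cls (transitive closure)."""
    found = set()
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def instances_of(target, instance_types, subclass_of):
    """Instances whose asserted class is target or a sub-class of it."""
    return sorted(
        inst
        for inst, cls in instance_types.items()
        if cls == target or target in superclasses(cls, subclass_of)
    )


if __name__ == "__main__":
    print(instances_of("prov:Entity", INSTANCE_TYPES, SUBCLASS_OF))
    print(instances_of("prov:Agent", INSTANCE_TYPES, SUBCLASS_OF))
```

A tool that only knows PROV-O can therefore still list the report and its chapter as entities, which is the interoperability benefit the mapping is meant to provide.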

[1] http://www.globalchange.gov
[2] Fox, P., McGuinness, D.L., 2008. TWC Semantic Web Methodology. Accessible at: http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
[3] https://scm.escience.rpi.edu/svn/public/projects/gcis/trunk/rdf/schema/GCISOntology.ttl
[4] http://www.w3.org/TR/prov-o/

Published in: Education, Technology

  1. TWC, AGU Fall Meeting 2013, San Francisco, CA. Ontology Development for Provenance Tracing in National Climate Assessment of the US Global Change Research Program. Xiaogang Ma (a), Jin Guang Zheng (a), Justin Goldstein (b, c), Linyun Fu (a), Brian Duggan (b, c), Patrick West (a), Jun Xu (a), Chengcong Du (a), Anusha Akkiraju (a), Steve Aulenbach (b, c), Curt Tilmes (c, d), Peter Fox (a). Affiliations: (a) Tetherless World Constellation, Rensselaer Polytechnic Institute; (b) University Corporation for Atmospheric Research; (c) U.S. Global Change Research Program; (d) NASA Goddard Space Flight Center
  2. Background
     • United States Global Change Research Program (USGCRP): An interagency program that coordinates and integrates Federal research on changes in the global environment and their implications for society
     • National Climate Assessment (NCA): An assessment conducted under the auspices of the Global Change Research Act of 1990, which requires a report to the President and the Congress every four years that evaluates, integrates and interprets the findings of the USGCRP with the intent to advance an inclusive and sustained process for assessing and communicating scientific knowledge of the impacts, risks and vulnerabilities associated with a changing global climate in support of decision making across the United States
     • Global Change Information System (GCIS): An information system under development through the USGCRP that establishes data interfaces and interoperable repositories of climate and global change data which can be easily and efficiently accessed, integrated with other data sets, maintained over time and expanded as needed into the future
     From: The National Global Change Research Plan 2012-2021
  3. Collaborators: National Science and Technology Council (NSTC); Committee on Environment, Natural Resources and Sustainability (CENRS); White House Office of Science and Technology Policy (OSTP); Subcommittee on Global Change Research (SGCR); U.S. Global Change Research Program (USGCRP); GCIS: Information Model and Semantic Application Prototypes (GCIS-IMSAP); Global Change Information System (GCIS); National Climate Assessment (NCA); National Climate Assessment Development Advisory Committee (NCADAC)
  4. What we do
     • Ongoing: provenance* for the NCA3** report
     • Future: provenance of publications, datasets, models, organizations, instruments, experiments, people, etc., eventually covering the entire scope of global change
     * Provenance: Information about entities, activities, people and organizations involved in the production of the research findings and the supporting datasets and methods (cf. Moreau and Missier, 2013)
     ** NCA3: The National Climate Assessment Development Advisory Committee (NCADAC) engaged more than 240 authors in the creation of the third NCA (NCA3) report, which is to be released in early 2014
  5. An example: “Figure 1.2: Sea Level Rise: Past, Present, and Future” in draft NCA3
  6. Remote sensing sensors, platforms, and instruments are used in global change research. Image source: Yang et al., 2013. Nature Climate Change
  7. An example question of provenance tracing: What are NASA contributions to Figure 1.2 in the draft NCA3? “Figure 1.2: Sea Level Rise: Past, Present, and Future” in draft NCA3
  8. Ontology Development for Provenance Tracing in the third National Climate Assessment
     • The third National Climate Assessment Report (NCA3)
     • Provenance: Information about entities, activities, people and organizations involved in the production of the research findings and the supporting datasets and methods
     • Ontology: In this work the ontology (GCIS ontology) is a conceptual model of classes, properties and instances that can be used to capture provenance information in the NCA3
     Image courtesy of nature.com
  9. Method: a use case-driven iterative approach. Source: Fox and McGuinness, 2008. http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
  10. A use case identifies:
      • goals/objectives to be accomplished
      • resources to be used to achieve these objectives
      • methods to be used to produce the desired results
      A template for documenting use cases: http://tw.rpi.edu/media/2013/07/25/ae99/UseCase_Template_SeS.doc
      Source: Fox and McGuinness, 2008. http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
  11. A facilitator:
      • sets and monitors direction
      • provides guidance for scoping the use case
      • sets milestones for implementation
      Team formation: domain experts, data and information producers, knowledge and information modelers, software engineers, and a scribe.
      Source: Fox and McGuinness, 2008. http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
  12. In the GCIS-IMSAP work we used:
      • Group meeting: Titanpad, Skype, GoToMeeting
      • Conceptual modeler: CMapTools
      • Ontology editor: Protege, Notepad++
      • Ontology documentation: LODE, Parrot
      • Evolution environments: TopBraid
      • Validator/Browser: ELDA, S2S
      Source: Fox and McGuinness, 2008. http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
  13. Provenance-explicit use cases. The first use case
      • Title: Visit data center website of dataset used to generate a report figure
      • Actor and system: a reader of the draft NCA3 on the GCIS website
      • Flow of interactions: A reader wishes to identify the source of the data used to produce a particular figure in the draft NCA3. A reference to the paper in which the image contained in this figure was originally published appears in the figure caption. Clicking that reference displays a page of metadata about the paper, including links to the datasets used in that paper. Following each of those links presents a page of metadata about the dataset, including a link back to the agency/data center web page that describes the dataset in more detail and makes the actual data available for order or download.
  14. An intuitive concept map of the use case
  15. An intuitive concept map of the use case. Classes and properties recognized from the use case
  16. An intuitive concept map of the use case. Classes and properties recognized from the use case. From an intuitive model to an ontology: (1) A defined class or property should be meaningful and robust enough to meet the requirements of various use cases. (2) An ontology can be extended by adding classes and properties recognized from new use cases through the iterative approach.
  17. The second use case
      • Title: Identify roles of people in the generation of a chapter in the draft NCA3
      • Actor and system: a viewer of the GCIS website
      • Flow of interactions: A viewer sees that Chapter 6 (Agriculture) in the draft NCA3 was written by a group of authors named in a list. On the title page of that chapter the viewer can see the role of each author, e.g., convening lead author, lead author or contributing author, in the generation of this report chapter.
      • We decided to use the PROV-O ontology to describe this use case
  18. The three Starting Point classes in the PROV-O ontology and the properties that relate them. Source: http://www.w3.org/TR/prov-o/
  19. Mapping the use case into PROV-O. Diagram labels: “Author of Chapter 6”, “Chapter 6 in NCA3” and “Writing of Chapter 6 in NCA3”, each carrying an isA link
  20. Roles of agents in an activity in PROV-O. Source: http://www.w3.org/TR/prov-o/
  21. Mapping roles of chapter authors into PROV-O. Diagram labels: “Author of Chapter 6”, “Writing of Chapter 6 in NCA3”, “Convening lead author”, “Lead author” and “Contributing author”, connected by isA links
  22. Roles of people in the activity ‘Writing of Chapter 6’. Here only three of the eight authors of this chapter are shown. Each author had a specific role for this chapter.
  23. We used PROV-O for describing roles of agents in an activity. We can also describe roles of agents for an entity.
  24. Roles of people for the entity ‘Chapter 6: Agriculture’. Here only three of the eight authors of this chapter are shown. Each author had a specific role for this chapter.
  25. More instances of prov:Role collected in the GCIS ontology
  26. Re-using existing ontologies for the GCIS ontology. With such mappings we can use reasoners that are suitable for the PROV-O ontology and thus retrieve provenance graphs from the established GCIS.
  27. The third use case
      • Title: Provenance tracing of NASA contributions to Figure 1.2 in the draft NCA3
      • Actor and system: a viewer of the GCIS website
      • Flow of interactions: A viewer sees that the caption of Figure 1.2 “Sea Level Rise: Past, Present and Future” of the draft NCA3 cites four data sources. Selecting the third citation displays a page of information about the cited paper and a citation to the dataset used in that paper. Information about the dataset includes a formal description of its origin, that is, the dataset is derived from data produced by the TOPEX/Poseidon and Jason altimeter missions funded by NASA and CNES. Clicking a link to each of these missions presents a page about the platforms, instruments and sensors in that mission.
  28. “Figure 1.2: Sea Level Rise: Past, Present, and Future” in draft NCA3
  29. Provenance tracing of NASA contributions to Figure 1.2 in draft NCA3. (a) Instances of calibration, model and software underpinning “paper/103”. Here only the details of one paper (i.e., “paper/103”) cited by that figure are shown. (b) Instances of sensor, instrument and platform underpinning that paper. Here only the details of the Topex-Poseidon mission are shown.
  33. Current result
      • GCIS ontology version 1.1
        – http://tw.rpi.edu/web/project/gcis-imsap/GCISOntology
        – Ontology documentation
        – Conceptual map
        – Ontology RDF
      • We have had and will have more use cases, and
      • New versions of GCIS ontologies
  34. Current result: GCIS ontology version 1.1 (a) Classes and properties representing a brief structure of the draft NCA3
  35. GCIS ontology version 1.1 (b) Classes and properties related to the findings of the draft NCA3 and each chapter in it
  36. GCIS ontology version 1.1 (c) Classes and properties about sensors, instruments, platforms, algorithms, etc. that datasets are derived from
  37. A few classes are asserted as sub-classes of “prov:Entity” and “prov:Activity”, respectively
  38. Wrap up
      • The use case-driven iterative method bridges the gap between Semantic Web researchers and Earth and environmental scientists
        – It is capable of rapid deployment for Semantic Web application development
      • First-hand experience of re-using the W3C PROV-O ontology in the field of Earth and environmental sciences
      • GCIS will enrich the provenance tracing capability of the GCIS ontology, eventually covering provenance information for the entire scope of global change
      • Collaboration for a PROV-ES ontology for Earth and environmental sciences
  39. Thank you! Sponsors: GCIS, RPI. max7@rpi.edu
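
The second use case in the transcript (roles of authors in the activity “Writing of Chapter 6”) relies on PROV-O's qualified association pattern, in which an activity points to a `prov:Association` node that names both the agent and its role. The following is a minimal sketch of that pattern using plain Python tuples instead of an RDF library. The `prov:` terms are real PROV-O terms; the `http://example.org/gcis/` namespace, the person identifiers and the role URIs are placeholders, and only the three role names come from the slides.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/gcis/"  # placeholder namespace (assumption)


def qualified_association(activity, agent, role, assoc_id):
    """Triples stating that `agent` took part in `activity` with `role`."""
    assoc = EX + "association/" + assoc_id
    return [
        (activity, PROV + "wasAssociatedWith", agent),
        (activity, PROV + "qualifiedAssociation", assoc),
        (assoc, RDF_TYPE, PROV + "Association"),
        (assoc, PROV + "agent", agent),
        (assoc, PROV + "hadRole", role),
    ]


def roles_in(activity, triples):
    """Map each agent of `activity` to the role recorded for it."""
    assocs = [o for s, p, o in triples
              if s == activity and p == PROV + "qualifiedAssociation"]
    roles = {}
    for assoc in assocs:
        agent = next(o for s, p, o in triples
                     if s == assoc and p == PROV + "agent")
        role = next(o for s, p, o in triples
                    if s == assoc and p == PROV + "hadRole")
        roles[agent] = role
    return roles


WRITING = EX + "activity/writing_of_chapter_6"
AUTHORS = [  # hypothetical people; the three role names come from the slides
    ("person/a", "role/convening_lead_author"),
    ("person/b", "role/lead_author"),
    ("person/c", "role/contributing_author"),
]

GRAPH = []
for i, (person, role) in enumerate(AUTHORS):
    GRAPH += qualified_association(WRITING, EX + person, EX + role, str(i))

ROLES = roles_in(WRITING, GRAPH)

if __name__ == "__main__":
    for agent, role in ROLES.items():
        print(agent, "->", role)
```

Roles for an entity, such as ‘Chapter 6: Agriculture’, can be sketched the same way by swapping in `prov:qualifiedAttribution` and `prov:Attribution`, which PROV-O provides for the entity-to-agent case.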