Climate Science for a Sustainable Energy Future Provenance


Invited talk at the Earth System Grid Federation workshop
  1. Climate Science for a Sustainable Energy Future (CSSEF) Provenance
     Eric Stephan, Pacific Northwest National Laboratory, Richland, WA
     December 26, 2012
  2. Provenance Definitions
     - Provenance is a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing.
     - Metadata used to describe the origin of the data and any of its modifications.
     - A log of historical events describing the origin of data and any subsequent changes.
  3. Popular Provenance Vocabularies
     - Dublin Core Provenance Task Force
     - Open Provenance Model
     - Proof Markup Language Ontology
     - The Provenance Ontology (PROV-O)
     See also: W3C Incubator Group, http://
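To make the PROV-O entry concrete, here is a minimal sketch that writes one derivation record as Turtle text using only the Python standard library. The prov: terms (prov:Entity, prov:Activity, prov:Agent, prov:wasDerivedFrom, prov:wasGeneratedBy, prov:used, prov:wasAssociatedWith) come from the W3C PROV-O vocabulary; the ex: namespace, the derivation_record helper, and every resource name are hypothetical illustrations, not actual CSSEF records.

```python
# Minimal sketch: emit a PROV-O derivation record as Turtle text.
# prov: terms are from W3C PROV-O; the ex: namespace and all resource
# names are hypothetical.
PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix ex: <http://example.org/cssef#> .\n"
)


def derivation_record(output: str, source: str, activity: str, agent: str) -> str:
    """Describe `output` as derived from `source` by `activity`, run by `agent`."""
    body = [
        f"ex:{output} a prov:Entity ;",
        f"    prov:wasDerivedFrom ex:{source} ;",
        f"    prov:wasGeneratedBy ex:{activity} .",
        f"ex:{source} a prov:Entity .",
        f"ex:{activity} a prov:Activity ;",
        f"    prov:used ex:{source} ;",
        f"    prov:wasAssociatedWith ex:{agent} .",
        f"ex:{agent} a prov:Agent .",
    ]
    return PREFIXES + "\n" + "\n".join(body) + "\n"


if __name__ == "__main__":
    # Hypothetical lineage, used only to show the output shape.
    print(derivation_record("CSSEFARMBE", "ARMBE",
                            "diagnosticsProcessing", "cssefDevelopers"))
```

Writing origin statements against a shared vocabulary such as PROV-O, rather than a project-specific schema, is what allows records from different producers to be searched together.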
  4. The Systems Science Challenge
     - Studying complex systems typically has the following characteristics:
       - Interdisciplinary studies involve multiple stakeholders.
       - Studies leverage multiple tools, algorithms, data products, and sensors.
       - Studies rely on highly iterative and repetitive techniques.
       - Steps are difficult to document and are often committed to memory or notes.
     - Sharing complex systems data between collaborators has the following inherent problems:
       - To establish confidence in the data, scientists accessing it (consumers) need to know its origin and modification history (data provenance).
       - Scientists producing the data need a consistent means to convey data provenance to targeted scientific communities.
       - The provenance representation needs to be general enough to support any data.
       - It must also be based on community standards to support cross-referenced searches.
  5. Example: Motivating User Questions About the CSSEFARMBE Diagnostics Dataset
     [Figure: an atmosphere scientist and a CAM modeler ask the following questions.]
     - How did both CSSEFARMBE and ARMBE originate?
     - How do CAM output variables map to the CSSEFARMBE variables?
     - What additional ancillary information is available about this dataset?
  6. The Knowledge Gap: CSSEF Users Needing Additional Answers from Data Producers
     [Figure: the CSSEFARMBE developers read, wrote, and compared a set of native sources (test code, NCL code, CF terms, the ARMBE header, the CSSEF ARMBE header, the CAM web page, and a tech report), while the CAM modeler and the atmosphere scientist are left asking the same three questions: how CSSEFARMBE and ARMBE originated, how CAM output variables map to CSSEFARMBE variables, and what additional ancillary information is available.]
  7. Goals of CSSEF Provenance Environment (ProvEn) Services
     - While scientists are still generating the data, identify the future user communities that will need its provenance.
     - Produce knowledge products (e.g., reports and archivable provenance records).
     - Create consumer-oriented provenance products by:
       - Capturing historical information from any native source needed to describe the origin of the dataset.
       - Retaining a copy of each native source, in the form familiar to the domain community, for user reference.
  8. Goals of CSSEF Provenance Environment (ProvEn) Services (continued)
     - Store this information in a cross-referenced knowledge model by mapping domain ontologies to foundational ontologies.
       - Domain ontologies are diverse and change constantly, because they are defined by the concepts extracted from native sources.
       - Foundational ontologies are stable and seldom change.
     - Use the composite knowledge model to provide finished products to different kinds of consumers.
       - Because foundational ontologies are stable, many methodologies, tools, and services are available to leverage.
     Foundational ontologies and the cross-reference capability each provides:
       - W3C Provenance Ontology (PROV-O): core ontology describing data origin
       - Dublin Core Terms: data citations and software
       - Friend of a Friend (FOAF): description of scientists and collaborators
       - (Future) Proof Markup Language 3.0: description of justification and trust
       - (Future) Dublin Core to PROV-O Mapping: support for integrating DC provenance and PROV-O
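The cross-reference capability can be illustrated with a small sketch: domain classes are aligned to foundational classes with rdfs:subClassOf, so a search phrased in foundational terms (here prov:Entity or foaf:Person) also finds resources typed only with domain terms. The atm: and ex: names and the TRIPLES data are hypothetical, and the subclass closure is computed by hand in plain Python; in practice an RDFS-inferencing triple store would normally perform this step.

```python
# Minimal sketch: cross-referenced search through foundational classes.
# Domain classes (hypothetical atm: terms) are aligned to PROV-O / FOAF
# classes with rdfs:subClassOf; queries use only the foundational names.
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def superclasses(cls, triples):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS:
            parents[s].add(o)
    seen, stack = set(), [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def instances_of(cls, triples):
    """Return resources whose asserted type is cls or any subclass of cls."""
    return {s for s, p, o in triples
            if p == TYPE and cls in superclasses(o, triples)}


TRIPLES = [  # hypothetical aligned knowledge model
    ("atm:ObservationalDataset", SUBCLASS, "prov:Entity"),
    ("atm:DiagnosticsDataset", SUBCLASS, "atm:ObservationalDataset"),
    ("atm:AtmosphereScientist", SUBCLASS, "foaf:Person"),
    ("ex:ARMBE", TYPE, "atm:ObservationalDataset"),
    ("ex:CSSEFARMBE", TYPE, "atm:DiagnosticsDataset"),
    ("ex:scientist1", TYPE, "atm:AtmosphereScientist"),
]

if __name__ == "__main__":
    print(sorted(instances_of("prov:Entity", TRIPLES)))
    # prints ['ex:ARMBE', 'ex:CSSEFARMBE']
```

Only the alignment triples need to change when a domain ontology evolves; queries written against the stable foundational names keep working.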
  9. Identifying a New Product with Native Sources, Domain Concepts, and Terms for the Dataset
     [Figure: the CSSEF ARMBE header, the ARMBE header, and the tech report yield observational data origin concepts; the test code, NCL code, CF terms, and CAM web page yield identified variable mapping concepts and terms.]
  10. Creating and Maintaining Domain Ontologies (Knowledge Engineer)
     [Figure: the knowledge engineer adds atmosphere diagnostics dataset origin and mapping terms and concepts to build an atmosphere domain ontology, then registers it against the foundational ontologies (aligning the ontologies) to produce an aligned knowledge model for atmosphere within ProvEn Services.]
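The Add and Register steps can be sketched as two functions over plain (subject, predicate, object) tuples. This is an assumption-laden illustration rather than ProvEn's implementation: build_domain_ontology, align, the atm: prefix, and the alignment table are all hypothetical, and reporting unaligned classes is just one way a knowledge engineer might see which domain terms still lack a foundational counterpart.

```python
# Minimal sketch of the knowledge-engineer workflow (hypothetical names):
#   Add      -> declare extracted domain terms as classes
#   Register -> link domain classes to foundational classes
FOUNDATIONAL = {"prov:Entity", "prov:Activity", "prov:Agent", "foaf:Person"}


def build_domain_ontology(terms, prefix="atm"):
    """Add: turn extracted domain terms into class-declaration triples."""
    return {(f"{prefix}:{t}", "rdf:type", "owl:Class") for t in terms}


def align(domain, alignments):
    """Register: add rdfs:subClassOf links to foundational classes.

    Returns (aligned_model, unaligned_classes). Alignments that point at a
    class outside FOUNDATIONAL are ignored.
    """
    classes = {s for s, p, o in domain if p == "rdf:type" and o == "owl:Class"}
    links = {(c, "rdfs:subClassOf", alignments[c])
             for c in classes
             if alignments.get(c) in FOUNDATIONAL}
    aligned = {s for s, _, _ in links}
    return domain | links, sorted(classes - aligned)


if __name__ == "__main__":
    onto = build_domain_ontology(
        ["ObservationalDataset", "VariableMapping", "TechReport"])
    model, missing = align(onto, {
        "atm:ObservationalDataset": "prov:Entity",
        "atm:VariableMapping": "prov:Activity",
    })
    print(missing)  # ['atm:TechReport']
```

Keeping the domain ontology and the alignment links as separate sets mirrors the split between frequently changing domain terms and stable foundational ontologies.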
  11. Creating a New Product by Populating ProvEn Services with CSSEFARMBE Dataset Native Sources
     [Figure: native sources from the CSSEFARMBE developers (CSSEF ARMBE header, ARMBE header, tech report, test code, NCL code, CAM web page, CF terms), together with ARMBE knowledge contributed by the atmosphere scientist and relevant to the CAM modeler, pass through native source concept extraction into ProvEn Services. There the native provenance is mapped to the atmosphere domain ontology and the aligned knowledge model for atmosphere (backed by the foundational ontologies), and copies of the corresponding native sources are stored with native source references.]
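The capture step can be sketched as follows: each native source is kept verbatim and given a content-based reference (a SHA-256 digest), then tagged with the domain concepts found in it. The keyword table is a deliberately crude, hypothetical stand-in for native source concept extraction, and the NativeSource and capture names are illustrative only.

```python
# Minimal sketch: keep a verbatim copy of a native source, give it a
# content-addressed reference, and tag it with domain concepts.
# The keyword -> concept table is a hypothetical stand-in for real
# concept extraction.
import hashlib
from dataclasses import dataclass, field


@dataclass
class NativeSource:
    name: str
    content: bytes                      # verbatim copy for user reference
    concepts: set = field(default_factory=set)

    @property
    def reference(self) -> str:
        """Content-addressed reference to the stored copy."""
        return "sha256:" + hashlib.sha256(self.content).hexdigest()


CONCEPT_KEYWORDS = {  # hypothetical keyword -> domain concept table
    "ARM": "atm:ObservationalDataset",
    "CAM": "atm:ModelOutput",
    "standard_name": "atm:CFTerm",
}


def capture(name: str, content: bytes) -> NativeSource:
    """Store a native source and tag it with any matching domain concepts."""
    src = NativeSource(name, content)
    text = content.decode("utf-8", errors="replace")
    src.concepts = {c for kw, c in CONCEPT_KEYWORDS.items() if kw in text}
    return src


if __name__ == "__main__":
    hdr = capture("ARMBE Header", b"ARMBE header (hypothetical sample text)")
    print(hdr.reference, sorted(hdr.concepts))
```

A content hash is one convenient way to tie a provenance statement to the exact bytes a user can later retrieve; any stable identifier would serve the same purpose.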
  12. Producing a ProvEn Services Product: CSSEFARMBE Dataset Origin Report
     [Figure: the ProvEn Services store (native provenance mapped to the atmosphere domain ontology, the aligned knowledge model for atmosphere, and the foundational ontologies) supports standard-vocabulary cross-reference searching and reasoning, which answers user questions such as "How did both CSSEFARMBE and ARMBE originate?" and "What additional ancillary information is available about this dataset?" for the atmosphere scientist and the CAM modeler.]
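An origin report of this kind can be sketched as a breadth-first walk over prov:wasDerivedFrom statements. The lineage function and the example triples are hypothetical; the actual CSSEFARMBE and ARMBE lineage is not given in the talk.

```python
# Minimal sketch: answer "How did this dataset originate?" by following
# prov:wasDerivedFrom links breadth-first. Example data is hypothetical.
from collections import deque


def lineage(entity, triples, relation="prov:wasDerivedFrom"):
    """Return (derived, source) pairs reachable from `entity`, in BFS order."""
    sources = {}
    for s, p, o in triples:
        if p == relation:
            sources.setdefault(s, []).append(o)
    edges, queue, seen = [], deque([entity]), {entity}
    while queue:
        current = queue.popleft()
        for src in sources.get(current, []):
            edges.append((current, src))
            if src not in seen:  # guard against cycles in the data
                seen.add(src)
                queue.append(src)
    return edges


TRIPLES = [  # hypothetical lineage statements
    ("ex:CSSEFARMBE", "prov:wasDerivedFrom", "ex:ARMBE"),
    ("ex:ARMBE", "prov:wasDerivedFrom", "ex:armMeasurements"),
]

if __name__ == "__main__":
    for derived, source in lineage("ex:CSSEFARMBE", TRIPLES):
        print(f"{derived} was derived from {source}")
```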
  13. ProvEn Services Architecture
     [Figure: two entry points, "Store Native Provenance" and "Query and Cross-Reference Provenance", are served by ProvEn (Jersey) REST services, backed by an AliBaba object-to-RDF API and a searching and inferencing API, running on a Glassfish server over a Sesame store. The services are packaged as a portable jarfile that can be deployed to an ESGF node, a local compute cluster, or UVCDAT.]
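Because the store is Sesame, a client could in principle query it through Sesame's HTTP protocol, where a repository's SPARQL endpoint is /repositories/{id} and the query travels in the query parameter. The sketch below only builds the request with urllib; the host, port, repository ID, and resource URI are hypothetical, and ProvEn's own Jersey REST endpoints, which the talk does not list, are not shown.

```python
# Minimal sketch: build (without sending) a SPARQL GET request for a
# Sesame repository. Host, port, repository ID and URIs are hypothetical.
from urllib.parse import urlencode
from urllib.request import Request

SESAME_BASE = "http://localhost:8080/openrdf-sesame"  # hypothetical server
REPOSITORY = "proven"                                  # hypothetical repository

ORIGIN_QUERY = (
    "PREFIX prov: <http://www.w3.org/ns/prov#>\n"
    "SELECT ?source WHERE {\n"
    "  <http://example.org/cssef#CSSEFARMBE> prov:wasDerivedFrom ?source .\n"
    "}"
)


def sparql_request(query: str, base: str = SESAME_BASE,
                   repo: str = REPOSITORY) -> Request:
    """Return a GET request asking for SPARQL results as JSON."""
    url = f"{base}/repositories/{repo}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = sparql_request(ORIGIN_QUERY)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a running server.
```

In the architecture above such queries would more likely pass through the ProvEn REST layer than hit the store directly; the direct request only illustrates what the Sesame layer accepts.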
  14. Questions?
     Contact: eric.stephan@pnnl.gov