Leveraging The Open Provenance Model as a Multi-Tier Model for Global Climate Research

My web page: http://www.linkedin.com/in/ericstephan

My citations: http://scholar.google.com/citations?hl=en&user=f4bH2esAAAAJ

Presentation Transcript

  • Leveraging The Open Provenance Model as a Multi-Tier Model for Global Climate Research. Eric Stephan, Todd Halter, Brian Ermold. IPAW, 2010
  • Discussion Outline
      - Background on Atmospheric Radiation Measurement (ARM) program
      - Challenges without Provenance
      - Requirements Analysis
      - Multi-Tier Provenance Model
      - Use of Open Provenance Model
      - Impacts
  • Background: Atmospheric Radiation Measurement Program
      - Production system designed and developed in 1990
      - Data is collected from over 300 remote sensors worldwide, expanding to over 400 sensors in 2010
      - Data collection will reach over 500 GB/day of atmospheric and satellite data by FY11
      - Value added products (VAPs) developed to correlate, aggregate and support quality studies of raw data into computational models
  • Challenges Facing Current VAP Development
      - Causality, Lineage, Referential Knowledge Not Formalized:
          - Captured in multiple ways and stored in different media and representation forms
          - Sample causality not directly accessible to scientists
          - Inability to seamlessly analyze and visualize knowledge
      - Provenance Required By Different Audiences
          - Producers: Operations/VAP developers
          - Consumers: scientists relying on VAPs
  • Requirements Analysis 1 of 2 (figure labels)
      - Value Added Product Lineage (Path): Directed Graph
      - Value Added Product Workflow Causality (Hedge): Acyclic Graph and Common Properties
      - Sample Causality … When Processing Data Product (Branch): Ordered Autonomous Acyclic Graphs
  • Requirements Analysis 2 of 2

    Tier          Purpose    Resources                 Status  Operations  Developer  Researcher
    Path          Lineage    N/A                       Future  Needed      Needed     Needed
    Path          Curation   Sample Level QC           Exists  In Use      Needed     Needed
    Path/Hedge    Reference  Metadata Repository       Exists  In Use      In Use     Needed
    Hedge         Reference  Configuration files       Exists  In Use      In Use     Needed
    Hedge/Branch  Causality  Log files                 Exists  Needed      In Use     Needed
    Hedge/Branch  Derived    Trends/Anomalies          Future  Needed      Needed     Needed
    Branch        Causality  Sample Derivation Method  Exists  In Use      Needed     Needed
    Branch        Causality  Sample Source             Exists  In Use      Needed     Needed
  • ARM Provenance Model
      - Characteristics
          - Knowledge required to depict interdependency, overall processing, and discrete sample processing
          - Multi-tier
              - Each tier represents a different granularity and purpose
              - Each hedge is in the context of a path, and each branch in the context of a hedge
              - Declared tiers make it easier to cross-compare knowledge
              - Because sample provenance at the branch tier is autonomous and ordered, provenance can be processed in parallel or stored in chunks
          - Leverage Standards and Community Efforts
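The tier nesting described on this slide can be sketched as a small data structure. This is only an illustration under stated assumptions, not the ARM implementation: the class names (`PathTier`, `HedgeTier`, `BranchTier`, `Graph`), the identifiers (`vap_a`, `vap_b`, `sample-0001`, and so on), and the idea of storing edges as triples are all hypothetical. The only part taken from a standard is the set of five edge types, which are the causal dependencies defined by the Open Provenance Model.

```python
from dataclasses import dataclass, field

# The five causal edge types defined by the Open Provenance Model.
OPM_EDGES = {"used", "wasGeneratedBy", "wasTriggeredBy",
             "wasDerivedFrom", "wasControlledBy"}


@dataclass
class Graph:
    """Edge list of (effect, relation, cause) triples (hypothetical layout)."""
    edges: list = field(default_factory=list)

    def add(self, effect, relation, cause):
        if relation not in OPM_EDGES:
            raise ValueError(f"not an OPM edge type: {relation}")
        self.edges.append((effect, relation, cause))


@dataclass
class BranchTier:
    """High granularity: the causality of one sample."""
    sample_id: str
    graph: Graph = field(default_factory=Graph)


@dataclass
class HedgeTier:
    """Medium granularity: one VAP workflow run and its common properties."""
    run_id: str
    properties: dict = field(default_factory=dict)
    graph: Graph = field(default_factory=Graph)
    branches: list = field(default_factory=list)  # ordered, one per sample


@dataclass
class PathTier:
    """Low granularity: lineage between value added products."""
    graph: Graph = field(default_factory=Graph)
    hedges: dict = field(default_factory=dict)  # VAP name -> list of runs


# Hypothetical example: vap_b is derived from vap_a.
path = PathTier()
path.graph.add("vap_b", "wasDerivedFrom", "vap_a")

# Each hedge lives in the context of a path entry.
hedge = HedgeTier(run_id="vap_b/run-001", properties={"config": "vap_b.conf"})
path.hedges.setdefault("vap_b", []).append(hedge)

# Each branch lives in the context of a hedge.
branch = BranchTier(sample_id="sample-0001")
branch.graph.add("sample-0001", "wasGeneratedBy", "derive_step")
branch.graph.add("derive_step", "used", "raw-0001")
hedge.branches.append(branch)
```

Because each branch carries its own graph and refers to nothing outside its hedge, a list of branches can be split or stored in chunks without losing information, which is the property the slide relies on.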
  • PROVENANCE LISTENER PICTURE
  • Estimated Cost of Provenance (figure labels)
      - Tiers: Path (Low Granularity), Hedge (Medium Granularity), Branch (High Granularity)
      - Elements: VAP Lineage, VAP Sample, Sample Quality Control, Field Origin
      - Size annotations: < 5K graph; ~5-10K; 2 bytes for each VAP sample; ~30K for each VAP sample
  • Analysis Examples
      - Timeline Inspection Anomaly and Trend Detection
      - Aggregation
      - Out of 43,200 potential samples (560K log entries):
          - 15 distinct processes
          - 60 distinct process results, e.g.
              - No AERO G data within minutes of x
              - No RRTM_LW output for x
              - No RRTM_SW output for x
              - No clear sky longwave cloud forcing run for x
              - No clear sky shortwave cloud forcing run for x
              - No emissivities file RRTM_SW_sfcemissdata
      - This can be used to help users know the kinds of questions they can ask.
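One way to picture the aggregation step is to collapse log entries into distinct (process, result) pairs by replacing the sample-specific part of each message with a placeholder, as the slide does with "x". The sketch below assumes a hypothetical log layout of `<process>|<message>` and a hypothetical timestamp format; neither comes from the ARM log files, and the three sample lines are invented for illustration.

```python
import re
from collections import Counter

# Assumed (hypothetical) timestamp format inside a log message.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}")


def aggregate(log_lines):
    """Count log entries per distinct (process, result template) pair."""
    results = Counter()
    for line in log_lines:
        process, message = line.split("|", 1)
        # Replace the sample-specific time with "x" so that entries for
        # different samples collapse into one result template.
        template = TIMESTAMP.sub("x", message.strip())
        results[(process.strip(), template)] += 1
    return results


# Invented log lines in the assumed format.
logs = [
    "rrtm_lw|No RRTM_LW output for 2010-01-01T00:00",
    "rrtm_lw|No RRTM_LW output for 2010-01-01T00:01",
    "rrtm_sw|No RRTM_SW output for 2010-01-01T00:01",
]

counts = aggregate(logs)
processes = {process for process, _ in counts}

for (process, template), n in sorted(counts.items()):
    print(f"{process}: {template} ({n} entries)")
```

The keys of `counts` play the role of the distinct process results listed on the slide, and `processes` the distinct processes; presenting those templates to users is one way to show them which questions the provenance can answer.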
  • Impacts
      - Provenance articulates ARM data processing causality and lineage in a formal and recognizable way.
      - Adding provenance creates a data intensive computing challenge due to the sheer volume of provenance represented as a large semantic graph.
      - Use of a multi-tier model makes analysis and visualization possible because the provenance graph can be broken into chunks for distributed or parallel processing.
      - Modeling the branch tier as autonomous acyclic graphs makes quantitative analysis possible to look for trends or anomalies within one data product, or between multiple data products.
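The chunked processing mentioned here can be sketched as follows. This is a minimal illustration, not the authors' system: the branch layout (a sample id plus a list of edge triples), the sample names, and the anomaly rule (an edge count that differs from the most common one) are all assumptions. A thread pool from the Python standard library stands in for whatever distributed or parallel framework would be used in practice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def summarize_branch(branch):
    """Reduce one autonomous branch graph to (sample id, edge count)."""
    sample_id, edges = branch
    return sample_id, len(edges)


def summarize_chunk(chunk):
    """Summarize a chunk of branches; no chunk needs data from another."""
    return [summarize_branch(branch) for branch in chunk]


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical branch graphs: sample-0003 lacks its "used" edge.
branches = [
    ("sample-0001", [("s1", "wasGeneratedBy", "step"), ("step", "used", "raw1")]),
    ("sample-0002", [("s2", "wasGeneratedBy", "step"), ("step", "used", "raw2")]),
    ("sample-0003", [("s3", "wasGeneratedBy", "step")]),
    ("sample-0004", [("s4", "wasGeneratedBy", "step"), ("step", "used", "raw4")]),
]

# Process the chunks concurrently; map() returns results in input order.
with ThreadPoolExecutor(max_workers=2) as pool:
    parts = pool.map(summarize_chunk, chunked(branches, 2))
    summaries = [summary for part in parts for summary in part]

# Assumed anomaly rule: flag branches whose size is not the typical one.
typical = Counter(count for _, count in summaries).most_common(1)[0][0]
anomalies = [sample_id for sample_id, count in summaries if count != typical]
print(anomalies)
```

The same comparison could be run across branches from several data products, since each summary depends only on its own branch graph.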