Leveraging The Open Provenance Model as a Multi-Tier Model for Global Climate Research
Eric Stephan, Todd Halter, Brian Ermold
IPAW, 2010
Discussion Outline

-   Background on the Atmospheric Radiation Measurement (ARM) program
-   Challenges without Provenance
-   Requirements Analysis
-   Multi-Tier Provenance Model
-   Use of Open Provenance Model
-   Impacts
Background

-   Atmospheric Radiation Measurement Program
     -    Production system designed and developed in 1990
     -    Data is collected from over 300 remote sensors worldwide,
          expanding to over 400 sensors in 2010
     -    Data collection will reach over 500 GB/day of atmospheric
          and satellite data by FY11
     -    Value added products (VAPs) developed to correlate and
          aggregate raw data, and to support quality studies of it,
          for input into computational models




Challenges Facing Current VAP Development

    -   Causality, Lineage, Referential Knowledge Not
        Formalized:
       -    Captured in multiple ways and stored in different media and
            representation forms.
       -    Sample causality not directly accessible to scientists
       -    Inability to seamlessly analyze and visualize knowledge
    -   Provenance Required By Different Audiences
       -    Producers – Operations/VAP developers
       -    Consumers – scientists relying on VAPs




Requirements Analysis 1 of 2


Knowledge                                Graph structure                                                  Tier
Value Added Product Lineage              Directed Graph                                                   Path
Value Added Product Workflow Causality   Acyclic Graph and Common Properties                              Hedge
Sample Causality                         Ordered Autonomous Acyclic Graphs When Processing Data Product   Branch
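
A minimal sketch of the path tier as a directed graph, assuming lineage is stored as a mapping from each product to the products it was derived from. The product names and the `upstream` helper are hypothetical and are not taken from the ARM system.

```python
from collections import deque

# Hypothetical VAP lineage: each product maps to the products it derives from.
# These names are placeholders, not real ARM datastreams.
LINEAGE = {
    "vap_c": ["vap_a", "vap_b"],
    "vap_b": ["raw_1"],
    "vap_a": ["raw_1", "raw_2"],
    "raw_1": [],
    "raw_2": [],
}


def upstream(product, lineage):
    """Return every ancestor of `product`, nearest first (breadth-first)."""
    seen = []
    queue = deque(lineage.get(product, []))
    while queue:
        node = queue.popleft()
        # The path tier is a directed graph that may contain cycles,
        # so skip nodes already visited and the starting product itself.
        if node not in seen and node != product:
            seen.append(node)
            queue.extend(lineage.get(node, []))
    return seen


print(upstream("vap_c", LINEAGE))
```

The visited check is what lets the same traversal work on a general directed graph; the acyclic hedge and branch tiers would not strictly need it.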
Requirements Analysis 2 of 2


Tier           Purpose     Resources                  Status   Operations   Developer   Researcher
Path           Lineage     N/A                        Future   Needed       Needed      Needed
Path           Curation    Sample Level QC            Exists   In Use       Needed      Needed
Path/Hedge     Reference   Metadata Repository        Exists   In Use       In Use      Needed
Hedge          Reference   Configuration files        Exists   In Use       In Use      Needed
Hedge/Branch   Causality   Log files                  Exists   Needed       In Use      Needed
Hedge/Branch   Derived     Trends/Anomalies           Future   Needed       Needed      Needed
Branch         Causality   Sample Derivation Method   Exists   In Use       Needed      Needed
Branch         Causality   Sample Source              Exists   In Use       Needed      Needed




ARM Provenance Model


    -   Characteristics
       -    Knowledge required to depict interdependency, overall
            processing, and discrete sample processing
       -    Multi-tier
             -   Each tier represents a different granularity and purpose
             -   Each hedge exists in the context of a path, and each
                 branch in the context of a hedge
             -   Declared tiers make cross-comparison of knowledge easier
             -   Because sample provenance at the branch tier is autonomous
                 and ordered, provenance can be processed in parallel or
                 stored in chunks
    -   Leverage Standards and Community Efforts
PROVENANCE LISTENER PICTURE




Estimated Cost of Provenance




Tier     Scope         Granularity   Estimated cost
Path     VAP Lineage   Low           < 5K graph
Hedge    VAP           Medium        ~5-10K
Branch   Sample        High          ~30K for each VAP sample

Sample Quality Control, Field Origin: 2 bytes for each VAP sample
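
A rough worked estimate built from the per-tier figures above. The slide does not say what unit "K" denotes, so the sketch carries it through as an opaque unit. It uses the upper bound for each tier and leaves out the 2-byte items. The sample count of 43,200 is borrowed from the Analysis Examples slide purely for illustration, and the function name is hypothetical.

```python
def provenance_cost_k(n_samples, path_k=5, hedge_k=10, branch_k_per_sample=30):
    """Rough total in the slide's 'K' unit (the unit is not specified there)."""
    # One path graph and one hedge per VAP, plus one branch per sample.
    return path_k + hedge_k + n_samples * branch_k_per_sample


print(provenance_cost_k(43_200))
```

Under these assumptions the branch tier accounts for nearly all of the total, which is consistent with the emphasis on chunking and parallel processing at that tier.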
Analysis Examples
     -   Timeline Inspection
     -   Anomaly and Trend Detection
     -   Aggregation
     -   Out of 43,200 potential samples (560K log entries)
           -   15 distinct processes
           -   60 distinct process results, e.g.
                  -   No AERO G data within minutes of x
                  -   No RRTM_LW output for x
                  -   No RRTM_SW output for x
                  -   No clear sky longwave cloud forcing run for x
                  -   No clear sky shortwave cloud forcing run for x
                  -   No emissivities file RRTM_SW_sfcemissdata
           -   This can be used to help users know the kinds of questions they can ask.
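
The aggregation above can be sketched as a count over log entries. The entry layout (sample time, process name, result message) and the process names are assumptions made for illustration; only the result messages echo the slide.

```python
from collections import Counter

# Hypothetical log entries as (sample time, process name, result message).
# The tuple layout and process names are assumed, not taken from ARM logs.
LOG = [
    ("00:00", "rrtm_lw", "No RRTM_LW output for x"),
    ("00:01", "rrtm_lw", "No RRTM_LW output for x"),
    ("00:01", "rrtm_sw", "No RRTM_SW output for x"),
    ("00:02", "rrtm_sw", "No emissivities file RRTM_SW_sfcemissdata"),
]


def aggregate(entries):
    """Count log entries per distinct process and per distinct result."""
    processes = Counter(proc for _, proc, _ in entries)
    results = Counter(msg for _, _, msg in entries)
    return processes, results


processes, results = aggregate(LOG)
print(len(processes), "distinct processes;", len(results), "distinct results")
for message, count in results.most_common():
    print(count, message)
```

Listing the distinct results in this way is one route to showing users which questions the provenance can answer.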
Impacts

-   Provenance articulates ARM data processing causality
    and lineage in a formal and recognizable way.

-   Adding provenance creates a data-intensive computing
    challenge due to the sheer volume of provenance
    represented as a large semantic graph.

-   Use of a multi-tier model makes analysis and visualization
    possible because the provenance graph can be broken
    into chunks for distributed or parallel processing.

-   Modeling the branch tier as autonomous acyclic graphs
    makes quantitative analysis possible to look for trends or
    anomalies within one data product, or between multiple
    data products.
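
One way the quantitative analysis in the last point could look, as a sketch only: reduce each branch to a single number (here a hypothetical count of failed processing steps per sample) and flag samples that sit far from the mean. The threshold, the data, and the function name are assumptions, and a simple standard-deviation test stands in for whatever method is actually used.

```python
from statistics import mean, pstdev


def flag_anomalies(counts, threshold=2.0):
    """Return sample ids more than `threshold` standard deviations from the mean."""
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every sample looks the same, so nothing stands out
    return [sid for sid, v in counts.items() if abs(v - mu) / sigma > threshold]


# Hypothetical per-sample counts of failed processing steps for one data product.
failures = {0: 1, 1: 0, 2: 1, 3: 0, 4: 1, 5: 0, 6: 1, 7: 9}
print(flag_anomalies(failures))
```

The same function could be run on counts from a second data product to compare the two, since each branch is summarized independently.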
