DataFed: View-Based Mediated Web Service Architecture
Rudolf B. Husar and Kari Hoijarvi, Washington University, St. Louis
Presented at IGARSS07, Barcelona, ES, July 22, 2007
Diagram labels: User query, Mediator, Wrapper, Wrapper, Source 1, Source 2, Data View
Information Landscape: Providers (Geography, Content, Agency, Form)
Diagram axes: Content (Emission, Ambient, Satellite, Model) | Agency (EPA, NOAA, NASA, Other) | Form
Data are distributed geographically by autonomous providers.
Data include emissions, ambient data, satellite data and model output.
Data are provided by multiple agencies: EPA, NOAA, NASA and others.
Emission: EPA NEI, EPA NEISGEI, FS FireInv, State/Local Emission
Ambient: NOAA ASOS, RPO VIEWS, EPA AIRNow, EPA-AQS (AIRS, DataMart)
Satellite: NASA DAACs, NOAA GASP, NASA IDEA, NASA Missions
Model: NOAA WeaMod, EPA AQModel, NASA GloModel, NOAA Forecast
Furthermore, data are provided in varied formats and access protocols.
Data on the Internet are geography-independent and can be ‘linearized’ (diagram labels: Internet, NASA DAACs, EPA R&D Model, EPA AIRNow, others).
Information Landscape: Users (Types, Agency, Info Needs)
Diagram axes: Stakeholder (Policy, Manager, Public, Scientist) | Agency (EPA, NOAA, NASA, Other) | Form
Users are distributed geographically.
Users include policy makers, the public, AQ managers and scientists.
Users are affiliated with multiple agencies: EPA, NOAA, NASA, as well as others.
Furthermore, users need various types of information provided in multiple formats.
Since the users are also on the Internet, their geographic location is irrelevant.
Let's agree on a Space-Time-Parameter Data Access Query Protocol
OGC WCS/WMS Protocols: Space-Time-Parameter Queries
Loose coupling of services: the client front end and the server back end each expose a standard interface.
GetCapabilities returns the capabilities (‘profile’); GetData returns the data.
A GetData query asks: Where? When? What? Which format?
Where? BBOX (standards: OGC, ISO)
When? Time (standards: OGC, ISO)
What? Temperature (standards: CF)
Format: netCDF, HDF.. (standards: CF, EOS, OGC)
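A space-time-parameter query of this kind can be sketched as a key-value URL. The sketch below is only an illustration: it borrows the WCS 1.0-style parameter names (COVERAGE, BBOX, TIME, FORMAT, with GetCoverage as the request the slide calls GetData), and the endpoint, coverage name, bounding box and date are hypothetical values, not ones taken from the talk.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_getdata_url(endpoint, what, bbox, when, fmt):
    """Encode a where-when-what-format request as a WCS 1.0-style URL.

    Assumption: parameter names follow WCS 1.0 key-value conventions.
    A real server would also expect grid size or resolution parameters.
    """
    west, south, east, north = bbox
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",   # the 'GetData' step on the slide
        "COVERAGE": what,           # What?
        "CRS": "EPSG:4326",
        "BBOX": f"{west},{south},{east},{north}",  # Where?
        "TIME": when,               # When?
        "FORMAT": fmt,              # Which format?
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and values, for illustration only.
url = build_getdata_url(
    "http://example.org/wcs",
    what="Temperature",
    bbox=(-125, 25, -65, 50),
    when="2007-07-22",
    fmt="NetCDF",
)
query = parse_qs(urlsplit(url).query)
print(url)
```

Because every dataset is addressed with the same four questions, a client written against this query shape does not need to know how any individual provider stores its data.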
Gio Wiederhold 1992: Mediators
Jeff Ullman, 1998: Answering Queries Using Views (Jeff Ullman, Stanford, 1998)
Heterogeneous sources.
Wrapper (adapter) translates between the local and global language, model.
Mediator answers queries; collects data from wrappers or mediators.
Diagram labels: User query, View, Mediator (Global Schema), Wrapper, Wrapper, Source 1, Source 2, with Query and Result arrows between the layers.
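The mediator and wrapper roles can be sketched in a few lines. This is a minimal illustration, not DataFed's implementation: the class names, the three-field global schema (when, what, value), the local field names and the sample rows are all hypothetical.

```python
class Wrapper:
    """Adapter that translates one source's local field names to the global schema."""

    def __init__(self, rows, field_map):
        self.rows = rows            # the source's records, in its local model
        self.field_map = field_map  # local field name -> global field name

    def query(self, what):
        translated = (
            {self.field_map[k]: v for k, v in row.items() if k in self.field_map}
            for row in self.rows
        )
        return [r for r in translated if r.get("what") == what]


class Mediator:
    """Answers a query by collecting results from wrappers (or other mediators)."""

    def __init__(self, members):
        self.members = members      # anything with a query(what) method

    def query(self, what):
        results = []
        for member in self.members:
            results.extend(member.query(what))
        return sorted(results, key=lambda r: r["when"])


# Two heterogeneous sources with different local vocabularies (made-up data).
source1 = [{"param": "O3", "date": "2007-07-22", "val": 41.0}]
source2 = [
    {"species": "O3", "t": "2007-07-21", "conc": 38.5},
    {"species": "PM25", "t": "2007-07-21", "conc": 12.0},
]
mediator = Mediator([
    Wrapper(source1, {"param": "what", "date": "when", "val": "value"}),
    Wrapper(source2, {"species": "what", "t": "when", "conc": "value"}),
])
view = mediator.query("O3")
print(view)
```

Since a mediator exposes the same `query` method as a wrapper, mediators can be nested, which matches the slide's note that a mediator collects data from wrappers or mediators.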
Wrapper Classes: Point, Image, Grid
5-Dim Data Model
Common Views
Anatomy of a Wrapper Service: TOMS Satellite Image Data
Daily TOMS images on FTP archive: ftp://toms.gsfc.nasa.gov/pub/eptoms/images/aerosol/y2000/ea000820.gif
Template: ftp://toms.gsfc.nasa.gov/pub/eptoms/images/aerosol/y[yyyy]/ea[yy][mm][dd].gif
Image description for data access: image_width=502 image_height=329 margin_bottom=105 margin_left=69 margin_right=69 margin_top=46 lat_min=-70 lat_max=70 lon_min=-180 lon_max=180
Transparent colors for overlays: RGB(89,140,255) RGB(41,117,41) RGB(23,23,23) RGB(0,0,0)
The wrapper service can access and spatially subset the image for any day (WMS).
The wrapper service and mediation are performed by a third party.
This makes a non-intrusive, adaptive system for agile networking.
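The two pieces of wrapper metadata above, the URL template and the image description, are enough to sketch how a date and a bounding box could be turned into a file name and a pixel window. The function names are hypothetical, and the sketch assumes the map area inside the margins is a linear latitude/longitude grid, which the slide does not state.

```python
from datetime import date

TEMPLATE = ("ftp://toms.gsfc.nasa.gov/pub/eptoms/images/aerosol/"
            "y[yyyy]/ea[yy][mm][dd].gif")

IMAGE = dict(image_width=502, image_height=329,
             margin_bottom=105, margin_left=69, margin_right=69, margin_top=46,
             lat_min=-70, lat_max=70, lon_min=-180, lon_max=180)


def image_url(day):
    """Fill the date placeholders of the template for one day."""
    return (TEMPLATE
            .replace("[yyyy]", f"{day.year:04d}")
            .replace("[yy]", f"{day.year % 100:02d}")
            .replace("[mm]", f"{day.month:02d}")
            .replace("[dd]", f"{day.day:02d}"))


def lonlat_to_pixel(lon, lat, d=IMAGE):
    """Map lon/lat to pixel x/y, assuming a linear grid inside the margins."""
    plot_w = d["image_width"] - d["margin_left"] - d["margin_right"]
    plot_h = d["image_height"] - d["margin_top"] - d["margin_bottom"]
    x = d["margin_left"] + (lon - d["lon_min"]) / (d["lon_max"] - d["lon_min"]) * plot_w
    y = d["margin_top"] + (d["lat_max"] - lat) / (d["lat_max"] - d["lat_min"]) * plot_h
    return x, y


def bbox_to_crop(west, south, east, north):
    """Pixel window (left, top, right, bottom) covering a lon/lat bounding box."""
    left, top = lonlat_to_pixel(west, north)
    right, bottom = lonlat_to_pixel(east, south)
    return round(left), round(top), round(right), round(bottom)


print(image_url(date(2000, 8, 20)))
print(bbox_to_crop(-180, -70, 180, 70))
```

With these inputs the template reproduces the example file name from the slide, and the full-extent bounding box maps back to the image area left after removing the margins.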
Integrated Data System for Air Quality (IDAQ)
ESIP AQ Cluster, 050510 Draft
The challenge is to design a general supportive infrastructure.
Providers supply the ‘raw material’ (data and models) for ‘refined’ info products. Providers: Emission, Surface, Satellite, Model, Single Datasets.
Info needs (reports): AQ Compliance, Nowcast/Forecast, Status & Trends, Find Data Gaps, ID New Problems, ………
Simply connecting the relevant providers and users for each info product is messy.
The info system infrastructure needs to facilitate the creation of info products.
Federate (wrappers, data structuring): structuring the heterogeneous data into where-when-what ‘cubes’ simplifies the mess.
Explore (data viewers, slice & dice): the ‘cubed’ data can be accessed and explored by slicing-dicing tools.
Integrate and understand (programs): more elaborate data integration and fusion can be done by web service chaining.
This infrastructure support for IDAQ can be provided by the ESIP Federation: non-intrusive linking and mediation between data providers and data users.
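The where-when-what ‘cube’ and its slicing can be illustrated with a small in-memory sketch. The record layout, function names, station codes and values below are hypothetical and chosen only to show the idea; they are not taken from DataFed.

```python
def build_cube(records):
    """Index observations by (where, when, what)."""
    return {(r["where"], r["when"], r["what"]): r["value"] for r in records}


def slice_cube(cube, where=None, when=None, what=None):
    """Keep the cells that match every dimension that was fixed."""
    wanted = (where, when, what)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


# Made-up observations from hypothetical stations.
records = [
    {"where": "STL", "when": "2007-07-21", "what": "O3", "value": 40.0},
    {"where": "STL", "when": "2007-07-22", "what": "O3", "value": 44.0},
    {"where": "CHI", "when": "2007-07-22", "what": "O3", "value": 39.0},
    {"where": "STL", "when": "2007-07-22", "what": "PM25", "value": 12.0},
]
cube = build_cube(records)

# Fix when and what, vary where: a map view for one day.
map_view = slice_cube(cube, when="2007-07-22", what="O3")
# Fix where and what, vary when: a time series at one site.
time_series = slice_cube(cube, where="STL", what="O3")
print(map_view)
print(time_series)
```

Once heterogeneous sources are wrapped into this common shape, the same slicing function serves map views, time series and parameter comparisons.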
Model-Data Comparison Workflow
Software workflow (flow program): Lego-like assembly of components.
AeroCom chemical models (Paris, FR) and VIEWS chemical data (Ft. Collins, CO) connect to the model-data comparison workflow through standard I/O.
OGC services: WMS, WCS.
 
DataFed: 100+ Datasets Non-intrusively Federated
Data are accessed from autonomous, distributed providers.
DataFed ‘wrappers’ provide uniform geo-time referencing.
Tools allow space/time overlay, comparisons and fusion.
Near Real Time Data Integration; Delayed Data Integration
Surface Air Quality: AIRNOW (O3, PM25); ASOS_STI (Visibility, 300 sites); METAR (Visibility, 1200 sites); VIEWS_OL (40+ Aerosol Parameters)
Satellite: MODIS_AOT (AOT, IDEA Project); GASP (Reflectance, AOT); TOMS (Absorption Index, Refl.); SEAW_US (Reflectance, AOT)
Model Output: NAAPS (Dust, Smoke, Sulfate, AOT); WRF (Sulfate)
Fire Data: HMS_Fire (Fire Pixels); MODIS_Fire (Fire Pixels)
Surface Meteorology: RADAR (NEXRAD); SURF_MET (Temp, Dewp, Humidity…); SURF_WIND (Wind vectors); ATAD (Trajectory, VIEWS locs.)
Summary
Third-party mediation can homogenize distributed ES data.
Agile SOA-based IS can deliver diverse info products to users.
Since 2005, one such IS, DataFed, has been used by EPA and in research.
For networking, more data and services need to be federated.
Parting thoughts
Think outside the stovepipe: think networking.
Divide and Conquer, NO! Connect and Enable, YES!
Thank you

