Interoperable Web Services for Distributed Data Access and Analysis of Emissions Inventories
November 30, 2006
GEIA 2006 Open Conference, Paris, France
Stefan Falke [email_address]
Center for Air Pollution Impact and Trend Analysis (CAPITA), Department of Energy, Environmental and Chemical Engineering, Washington University, St. Louis, Missouri, USA
Terry Keating [email_address]
US Environmental Protection Agency, Office of Air & Radiation, Washington, DC, USA
An air emissions “cyberinfrastructure” <ul><li>Objectives: advance the implementation of the Networked Environmental Information Systems for Global Emissions Inventories (NEISGEI), a US EPA initiative to develop a web-based global air emissions inventory network that provides </li></ul><ul><ul><li>access to distributed emission inventory data at multiple spatial and temporal scales </li></ul></ul><ul><ul><li>tools for data processing and analysis </li></ul></ul><ul><ul><li>means for sharing data &amp; tools </li></ul></ul><ul><ul><li>an environment for collaboration among researchers, regulators, policy analysts, and the interested public </li></ul></ul><ul><li>Approach: develop, test, and implement components of an air quality cyberinfrastructure, using the latest advances in information technology to make multi-scale air emissions data and tools easier to find, use, and integrate. </li></ul>
Cyberinfrastructure
Cyberinfrastructure: information sciences and technologies used to build new types of scientific and engineering knowledge environments, with the goal of pursuing research and management more effectively and efficiently.
“Contemporary projects require effective federation of both distributed resources (data and facilities) and distributed, multidisciplinary expertise and cyberinfrastructure is a key to making this possible.” - NSF Blue Ribbon Report on Cyberinfrastructure, 2003 (Atkins, 2004)
Conceptual Diagram of an Emissions Cyberinfrastructure
[Diagram. Labels: Data; Emissions Inventories; Activity Data; Emissions Factors; Surrogates; Data Catalogs; Wrappers/Adapters/Standards; XML; GIS; Mediators/Portals; Geospatial One-Stop; GEIA/ACCENT Data Portal; Web Tools/Services; Estimation Methods; Spatial Allocation; Transport Models; Users & Projects; Comparison of Emissions Methods; Data Analysis; Model Development; Report Generation]
Networked Inventories Principles
<ul><li>Distributed/Federated. Data are shared but remain distributed and maintained by their original inventory organizations. The data are accessed dynamically from multiple sources over the Internet rather than collected in a single repository. </li></ul>
<ul><li>Non-intrusive. The technologies that bring inventory nodes together in a distributed network should not require emission inventory organizations to make substantial modifications in order to participate. Some harmonization of existing inventory data will still be needed. </li></ul>
<ul><li>Transparent. From the emission inventory user’s perspective, the distributed data should appear to originate from a single database. One interface to multiple data sets should be possible without requiring special software or downloads onto the user’s computer. </li></ul>
<ul><li>Flexible/Extendable (Interoperable). An emission data network should be designed so that new data and tools from providers joining the network can be easily incorporated and integrated with existing data and tools. </li></ul>
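The federated, transparent, and extendable principles can be sketched in a few lines of Python. This is an illustration only, not the NEISGEI or DataFed implementation: the `Federation` class, the `make_node` helper, the node names `inventory_a` and `inventory_b`, and all numeric values are hypothetical placeholders, and each "node" is an in-memory stand-in for a remote inventory.

```python
from typing import Callable, Dict, List, NamedTuple, Tuple


class Record(NamedTuple):
    source: str      # which inventory node supplied the value
    pollutant: str
    year: int
    value: float     # placeholder number, not real emissions data


# A node is any callable that answers a (pollutant, year) query from its own data.
Node = Callable[[str, int], List[Record]]


def make_node(name: str, table: Dict[Tuple[str, int], float]) -> Node:
    """Wrap an in-memory table as a stand-in for a remotely maintained inventory."""
    def query(pollutant: str, year: int) -> List[Record]:
        value = table.get((pollutant, year))
        if value is None:
            return []
        return [Record(name, pollutant, year, value)]
    return query


class Federation:
    """Single query interface over data that stays with its providers."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Node] = {}

    def register(self, name: str, node: Node) -> None:
        # Extendable: a new provider joins without changes to existing nodes.
        self._nodes[name] = node

    def query(self, pollutant: str, year: int) -> List[Record]:
        # Transparent: the caller sees one result list, not separate sources.
        results: List[Record] = []
        for node in self._nodes.values():
            results.extend(node(pollutant, year))
        return results


federation = Federation()
federation.register("inventory_a", make_node("inventory_a", {("SO2", 2000): 1.0}))
federation.register(
    "inventory_b",
    make_node("inventory_b", {("SO2", 2000): 2.0, ("NOx", 2000): 3.0}),
)
so2_records = federation.query("SO2", 2000)
```

In this sketch the data never leave the node tables; the federation only forwards the query and merges the answers, which is the behavior the principles describe.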
NEISGEI Web Portal
Built using Liferay, an open-source portal package. Accessible through http://www.neisgei.org
A community resource providing access to, descriptions of, and dialogues about an array of content and services for exploring and sharing emissions data, tools, and ideas.
Federated data system - DataFed
The Data Federation (DataFed) is a web-based infrastructure for distributed data access and collaborative processing/analysis of air quality data (Husar et al., 2004). http://datafed.net
50+ datasets. Export or connect to other web services.
NEISGEI is built on DataFed infrastructure and services.
Geospatial Web Standards <ul><li>Standards for finding, accessing, portraying, and processing geospatial data are defined by the Open Geospatial Consortium (OGC). </li></ul><ul><ul><li>Web Map Service (WMS) exchanges map images </li></ul></ul><ul><ul><li>Web Feature Service (WFS) retrieves discrete feature data (roads, political boundaries) </li></ul></ul><ul><ul><li>Web Coverage Service (WCS) provides access to multidimensional data that represent coverages, such as grids or point monitoring data </li></ul></ul><ul><ul><li>Sensor Observation Service (SOS) provides multidimensional access to measurement data </li></ul></ul><ul><li>While these standards are rooted in the geospatial domain, many are designed to be extended to support non-geographic data “dimensions,” such as time and the many other dimension tables found in emissions inventories. </li></ul>Geospatial One-Stop
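Each of the four services listed above exposes a `GetCapabilities` operation for discovery, plus its own data operation. The Python sketch below builds the discovery URLs for a key-value-pair endpoint. The base URL `http://example.org/ogc` is a hypothetical placeholder, and real servers typically expect further parameters (such as a version), so treat this as an outline rather than a complete client.

```python
from urllib.parse import urlencode

# Main data-retrieval operation defined by each OGC service type.
DATA_OPERATIONS = {
    "WMS": "GetMap",
    "WFS": "GetFeature",
    "WCS": "GetCoverage",
    "SOS": "GetObservation",
}


def capabilities_url(base_url: str, service: str) -> str:
    """Build a GetCapabilities URL asking a server what it offers."""
    if service not in DATA_OPERATIONS:
        raise ValueError(f"unknown OGC service type: {service}")
    query = urlencode({"SERVICE": service, "REQUEST": "GetCapabilities"})
    return f"{base_url}?{query}"


BASE_URL = "http://example.org/ogc"  # hypothetical endpoint, for illustration only
urls = {service: capabilities_url(BASE_URL, service) for service in DATA_OPERATIONS}
```

Because the request pattern is the same across service types, a client written once can discover maps, features, coverages, and observations by changing only the `SERVICE` value.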
Web Coverage Service (WCS)
Example GetCoverage request:
http://webapps.datafed.net/ogc_EPA.wsfl?SERVICE=wcs&REQUEST=GetCoverage&VERSION=1.0.0&CRS=EPSG:4326&COVERAGE=EPA_CAMD_HOUR.SO2_MASS&FORMAT=NetCDF-table&BBOX=-82.4606,42.9258,-82.4606,42.9258,0,0&TIME=2002-04-01T15:00:00Z/2002-04-30T15:00:00Z&WIDTH=700&HEIGHT=350&DEPTH=99
[Diagram: a WCS client sends a GetCoverage request to a WCS server; the server returns the coverage in a format such as GeoTIFF, HDF, netCDF, CSV, ASCII… (netCDF in this example).]
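The GetCoverage request above can be assembled from its parts, which makes each parameter's role easier to see. The Python sketch below copies the parameter values from the slide; the helper name `build_request_url` is an assumption made for illustration. It only builds and parses the URL and sends nothing over the network, since the 2006 endpoint may no longer respond.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://webapps.datafed.net/ogc_EPA.wsfl"  # endpoint shown on the slide

# Parameter values copied from the slide's example request.
PARAMS = {
    "SERVICE": "wcs",
    "REQUEST": "GetCoverage",
    "VERSION": "1.0.0",
    "CRS": "EPSG:4326",                       # longitude/latitude on WGS 84
    "COVERAGE": "EPA_CAMD_HOUR.SO2_MASS",     # dataset and parameter to fetch
    "FORMAT": "NetCDF-table",
    "BBOX": "-82.4606,42.9258,-82.4606,42.9258,0,0",  # a single point location
    "TIME": "2002-04-01T15:00:00Z/2002-04-30T15:00:00Z",
    "WIDTH": "700",
    "HEIGHT": "350",
    "DEPTH": "99",
}


def build_request_url(endpoint: str, params: Dict[str, str]) -> str:
    """Join key-value parameters onto the endpoint, leaving ':', ',' and '/' readable."""
    return f"{endpoint}?{urlencode(params, safe=':,/')}"


url = build_request_url(ENDPOINT, PARAMS)

# Parsing the URL back recovers the individual parameters.
parsed = parse_qs(urlsplit(url).query)
```

Changing only `COVERAGE`, `BBOX`, or `TIME` selects a different dataset, area, or period while the rest of the request stays the same.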
Using Standard Interfaces for Web Access <ul><li>Data access without standard interfaces (current process): </li></ul><ul><ul><li>1. Find data in portal </li></ul></ul><ul><ul><li>2. Download data </li></ul></ul><ul><ul><li>3. Reformat / “wrap” the data </li></ul></ul><ul><ul><li>4. Repeat steps 1-3 for other datasets </li></ul></ul><ul><ul><li>5. Browse, visualize, analyze </li></ul></ul><ul><li>Data access with standard interfaces (possible future process): </li></ul><ul><ul><li>1. Find data in portal </li></ul></ul><ul><ul><li>2. Access through standard interfaces </li></ul></ul><ul><ul><li>3. Browse, visualize, analyze </li></ul></ul>[Diagram: current process (user, emissions portal, RETRO) versus possible future process (user, emissions portal(s); RETRO, EDGAR, and GEIA each behind a WCS interface; DataFed multi-dimensional cube accessed via WCS).]
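The reformat / “wrap” step is where most of the per-dataset effort goes in the current process. The Python sketch below shows what such a wrapper does: it maps two differently formatted tables onto one shared schema so that a single analysis routine can run over both. The provider names, column names, delimiters, and values are all hypothetical and do not describe the actual RETRO, EDGAR, or GEIA formats.

```python
import csv
import io
from typing import Dict, List

Row = Dict[str, float]

# Hypothetical native exports: each provider uses its own columns and delimiter.
NATIVE_A = "lat,lon,so2\n10.0,20.0,1.5\n10.0,21.0,0.5\n"
NATIVE_B = "LATITUDE;LONGITUDE;SO2_KT\n10.0;20.0;2.5\n"


def wrap(text: str, delimiter: str, columns: Dict[str, str]) -> List[Row]:
    """Convert one provider's table to the shared lat/lon/value schema.

    `columns` maps each shared field name to the provider's native column name.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {shared: float(record[native]) for shared, native in columns.items()}
        for record in reader
    ]


# With a standard interface, this mapping is done once by each provider
# instead of being repeated by every user for every dataset.
standardized = {
    "provider_a": wrap(NATIVE_A, ",", {"lat": "lat", "lon": "lon", "value": "so2"}),
    "provider_b": wrap(
        NATIVE_B, ";", {"lat": "LATITUDE", "lon": "LONGITUDE", "value": "SO2_KT"}
    ),
}


def total_by_source(datasets: Dict[str, List[Row]]) -> Dict[str, float]:
    """One analysis routine that works for every standardized dataset."""
    return {name: sum(row["value"] for row in rows) for name, rows in datasets.items()}


totals = total_by_source(standardized)
```

Once every dataset arrives in the shared schema, the analysis code no longer needs to know which provider a table came from, which is the shortcut the second list describes.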
Summary
<ul><li>Information technologies (particularly service-oriented architectures and web services) provide opportunities to realize the benefits of distributed databases using standardized interfaces </li></ul>
<ul><li>Distributed databases allow data to remain maintained by their owners </li></ul><ul><ul><li>dynamically updated (avoids versioning issues) </li></ul></ul><ul><ul><li>connect once and always get the latest version </li></ul></ul>
<ul><li>Standard interfaces foster networked activity and sharing of data and tools through interoperability </li></ul><ul><ul><li>simplify integration and analysis by moving the information technology details to the background </li></ul></ul>
<ul><li>Federated inventories, datasets, models, analysis tools, portals </li></ul><ul><ul><li>no “one-stop” can meet all user needs </li></ul></ul><ul><ul><li>faster progress through distributed, shared efforts </li></ul></ul>