Ws Stuff

http://capitawiki.wustl.edu/index.php/20050317_Air_Quality_Cluster_TechTrack



Transcript

  • 1. Air Quality Cluster TechTrack Earth Science Information Partners Data Flow Technologies and Demonstrations Contact: Rudolf Husar, rhusar@me.wustl.edu Draft, March 2005 (intended as background for AQ Cluster discussions) Partners(?)
    • NASA
    • NOAA
    • EPA
    • USGS
    • DOE
    • NSF
    • Industry…
    Flow of Data Flow of Control Air Quality Data Meteorology Data Emissions Data Informing Public AQ Compliance Status and Trends Network Assess. Tracking Progress Data to Knowledge Transformation
  • 2. Information ‘Refinery’ Value Chain Information Informing Knowledge Action Productive Knowledge Data Organizing Grouping Classifying Formatting Displaying Analyzing Separating Evaluating Interpreting Synthesizing Judging Options Quality Advantages Disadvantages Deciding Matching goals Compromising Bargaining Deciding e.g. DAAC e.g. Langley IDEA, FASTNET, Forecast Modeling e.g. WG Reports e.g. RPO Manager Info. Quality Transform. Processes Examples ESIP – Facilitation ? States, EPA, International Value Chain Based on Taylor, 1975: Value-Added Processes in Information Systems
  • 3. Data Flow & Processing in AQ Management AQ DATA EPA Networks IMPROVE Visibility Satellite-PM Pattern METEOROLOGY Met. Data Satellite-Transport Forecast model EMISSIONS National Emissions Local Inventory Satellite Fire Locs Status and Trends AQ Compliance Exposure Assess. Network Assess. Tracking Progress AQ Management Reports ‘ Knowledge’ Derived from Data Primary Data Diverse Providers Data ‘Refining’ Processes Filtering, Aggregation, Fusion
  • 4. Data Flow and Flow Control in AQ Management
    • Data are supplied by the provider and exposed on the ‘smorgasbord’
    • However, the choice of which data is used is made by the user
    • Thus, data consumers, providers and mediators together form the info system
    Provider Push User Pull Flow of Data Flow of Control AQ DATA METEOROLOGY EMISSIONS DATA Informing Public AQ Compliance Status and Trends Network Assess. Tracking Progress Data to Knowledge Transformation
  • 5. Air Quality Data Use (EPA Ambient Air Monitoring Strategy, 2004) Ecosystem assessment Science support, processes Strategies, SIPs, model eval. Tracking progress (trends) NAAQS compliance NAAQS (Nat. Amb AQ Std) Informing the public ESIP Role AQ Management Activity
  • 6. A Sample of Datasets Accessible through ESIP Mediation Near Real Time (~ day)
    • It has been demonstrated (project FASTNET) that these and other datasets can be accessed, repackaged and delivered by AIRNow through ‘Consoles’
    MODIS Reflectance MODIS AOT TOMS Index GOES AOT GOES 1km Reflec NEXTRAD Radar MODIS Fire Pix NRL MODEL NWS Surf Wind, Bext
  • 7. Web Services
    • A Web service is a software system designed to support interoperable machine-to-machine interaction over a network.
    • It has an interface described in a machine-processable format (specifically WSDL).
    • Web services communicate using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.
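The SOAP-over-HTTP pattern described above can be illustrated with a short sketch that builds such an XML envelope. The operation name (`GetDataSlice`), its parameters, and the target namespace are invented for illustration; a real service would define these in its WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

def build_soap_envelope(operation, params, target_ns="http://example.org/aq"):
    """Return a SOAP 1.1 envelope (as a string) wrapping one operation call."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{target_ns}}}{operation}")
    for name, value in params.items():
        p = ET.SubElement(op, f"{{{target_ns}}}{name}")  # one element per parameter
        p.text = str(value)
    return ET.tostring(env, encoding="unicode")

envelope = build_soap_envelope("GetDataSlice",
                               {"param": "PM25", "date": "2005-03-17"})
```

The resulting string would typically be POSTed over HTTP to the service endpoint; the envelope itself is plain XML, which is what makes the interaction machine-processable.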
  • 8.  
  • 9. AQ Data and Analysis: Challenges and Opportunities
    • Shift from primary to secondary pollutants. Ozone and PM2.5 travel 500+ miles across state or international boundaries, and their sources are not well established
    • New Regulatory approach . Compliance evaluation based on ‘weight of evidence’ and tracking the effectiveness of controls
    • Shift from command & control to participatory management. Inclusion of federal, state, local, industry, international stakeholders.
    • Challenges
    • Broader user community . The information systems need to be extended to reach all the stakeholders ( federal, state, local, industry, international)
    • A richer set of data and analysis. Establishing causality, ‘weight of evidence’, emissions tracking requires more data and air quality analysis
    • Opportunities
    • Rich AQ data availability . Abundant high-grade routine and research monitoring data from EPA, NASA, NOAA and other agencies are now available .
    • New information technologies. DBMS, data exploration tools and web-based communication now allow cooperation (sharing) and coordination among diverse groups.
  • 10. The Researcher’s Challenge “ The researcher cannot get access to the data; if he can, he cannot read them; if he can read them, he does not know how good they are; and if he finds them good he cannot merge them with other data.” Information Technology and the Conduct of Research: The Users View National Academy Press, 1989
    • These resistances can be overcome through
    • A catalog of distributed data resources for easy data ‘ discovery ’
    • Uniform data coding and formatting for easy access, transfer and merging
    • Rich and flexible metadata structure to encode the knowledge about data
    • Powerful shared tools to access, merge and analyze the data
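The first of the remedies above, a catalog for data "discovery", can be sketched in a few lines: a registry of distributed datasets queried by parameter or provider. The dataset names and attributes here are hypothetical stand-ins, not actual catalog entries.

```python
# Hypothetical catalog of distributed data resources.
CATALOG = [
    {"name": "AIRNOW_O3",  "provider": "EPA",  "params": ["O3"]},
    {"name": "MODIS_AOT",  "provider": "NASA", "params": ["AOT"]},
    {"name": "IMPROVE_PM", "provider": "EPA",  "params": ["PM25", "Bext"]},
]

def discover(param=None, provider=None):
    """Return names of catalog entries matching the requested parameter/provider."""
    hits = CATALOG
    if param is not None:
        hits = [d for d in hits if param in d["params"]]
    if provider is not None:
        hits = [d for d in hits if d["provider"] == provider]
    return [d["name"] for d in hits]
```

A researcher would first discover candidate datasets this way, then use the catalog's access instructions to retrieve and merge them.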
  • 11. Recap: Harnessing the Winds
    • Secondary pollutants along with more open environmental management style are placing increasing demand on data analysis. Meanwhile, rich AQ data sets and the computer and communications technologies offer unique opportunities.
    • It appears timely to consider the development of a web-based, open, distributed air quality data integration, analysis and dissemination system.
    • The challenge is to learn how to harness the winds of change, as sailors have learned to use the winds for going from A to B
  • 12. Uniform Coding and Formatting of Distributed Data
    • Data are now easily accessible through standard Internet protocols, but the coding and formatting of the data is very heterogeneous
    • On the other hand data sharing is most effective if the codes/formats/protocols are uniform (e.g. the Web formats and protocols )
    • Re-coding and reformatting all the heterogeneous data into a universal form on their respective servers is unrealistic
    • An alternative is to enrich the heterogeneous data with uniform coding along the way from the provider to the user.
    • A third party ‘proxy’ server can perform the necessary homogenization with the following benefits:
      • The data user interfaces with a simple universal data query and delivery system (interface, formats..)
      • The data provider does not need to change the system; it also gains additional security protection, since the data are accessed through the proxy
      • Reduced data flow resistance results in increased overall data flow and data usage.
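The homogenizing proxy described above can be sketched as a function that accepts provider payloads in different encodings and emits one uniform record schema. Both provider formats here (a CSV feed and a key=value feed) are invented stand-ins for real heterogeneous sources.

```python
def from_csv_provider(text):
    """Hypothetical provider A: comma-separated 'site,time,value' rows."""
    records = []
    for line in text.strip().splitlines():
        site, time, value = line.split(",")
        records.append({"site": site, "time": time, "value": float(value)})
    return records

def from_keyvalue_provider(text):
    """Hypothetical provider B: 'site=X time=Y value=Z' rows."""
    records = []
    for line in text.strip().splitlines():
        fields = dict(pair.split("=") for pair in line.split())
        records.append({"site": fields["site"], "time": fields["time"],
                        "value": float(fields["value"])})
    return records

def proxy_fetch(payload, fmt):
    """Uniform entry point: the user sees one schema regardless of source format."""
    wrappers = {"csv": from_csv_provider, "kv": from_keyvalue_provider}
    return wrappers[fmt](payload)
```

The user always receives the same record shape; the format-specific knowledge lives only in the proxy, so neither provider has to change its system.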
  • 13. DataFed Services
    • Dvoy Services offer a homogeneous, read-only access mechanism to a dynamically changing collection of heterogeneous, autonomous and distributed information sources.
    • Data access uses a global multidimensional schema consisting of spatial, temporal and parameter dimensions, suitable for data browsing and online analytical processing, OLAP. The limited query capabilities yield slices through the spatio-temporal data cubes.
    • The main software components of Dvoy are wrappers , which encapsulate sources and remove technical heterogeneity, and mediators , which resolve the logical heterogeneity.
    • Wrapper classes are available for geo-spatial (incl. satellite) images, SQL servers, text files, etc. The mediator classes are implemented as web services for uniform data access, transformation and portrayal.
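A minimal sketch of the wrapper/mediator split described above: wrappers hide each source's technical format, while a mediator resolves a logical difference (here, a unit mismatch) and merges the results into one view. All class names, field names, and values are hypothetical.

```python
class SqlWrapper:
    """Stand-in SQL source reporting PM2.5 in ug/m3."""
    def query(self):
        return [{"site": "A", "pm25_ugm3": 14.0}]

class TextFileWrapper:
    """Stand-in text-file source reporting PM2.5 in mg/m3."""
    def query(self):
        return [{"site": "B", "pm25_mgm3": 0.021}]

class Mediator:
    """Merge wrapper outputs into one logical view (ug/m3 everywhere)."""
    def __init__(self, wrappers):
        self.wrappers = wrappers
    def view(self):
        rows = []
        for w in self.wrappers:
            for rec in w.query():
                if "pm25_mgm3" in rec:  # resolve the logical (unit) heterogeneity
                    rows.append({"site": rec["site"],
                                 "pm25_ugm3": rec["pm25_mgm3"] * 1000.0})
                else:
                    rows.append(rec)
        return rows
```

The wrappers remove technical heterogeneity (how the data are stored and fetched); the mediator removes logical heterogeneity (what the numbers mean).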
  • 14. DVOY Interfaces
    • Data Input
    • Data input
    • Data Output - Browser
    • The DVOY interface is composed of data viewers and controllers, all displayed on a webpage
    • The web services and the preparation of the webpage interface are implemented through .NET (Microsoft)
    • The graphic data display on the webpage uses an SVG plugin (Adobe)
    • The DVOY controls are linked to the SVG plugin and the .NET through client-side JavaScript
    • Data Output – Web Service
      • The DVOY outputs are XML formatted datasets suitable for chaining with processing or rendering services
  • 15. NSF-NOAA-EPA/EMAP (NASA)? Project: Real-Time Aerosol Watch System Real-Time Virtual PM Monitoring Dashboard. A web-page for one-stop access to pre-set views of current PM monitoring data including surface PM, satellite, weather and model data. Virtual Workgroup Website. An interactive website which facilitates the active participation of diverse members in the interpretation, discussion, summary and assessment of the on-line PM monitoring data. Air Quality Managers Console. Helps PM managers make decisions during major aerosol events; delivers a subset of the PM data relevant to the AQ managers, including summary reports prepared by the Virtual workgroups.
  • 16. Dvoy Federated Information System
    • Dvoy offers a homogeneous, read-only access mechanism to a dynamically changing collection of heterogeneous, autonomous and distributed information sources.
    • Data access uses a global multidimensional schema consisting of spatial, temporal and parameter dimensions
    • The uniform global schema is suitable for data browsing and online analytical processing, OLAP
    • The limited global query capabilities yield slices along the spatial, temporal and parameter dimensions of the multidimensional data cubes.
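The limited query capability described above, slicing a spatio-temporal data cube along fixed dimensions, can be sketched with a toy cube keyed by (site, time, parameter). The sites, dates, and values are fabricated for illustration.

```python
# Toy data cube: (site, time, parameter) -> value
CUBE = {
    ("StLouis", "2005-03-17", "PM25"): 18.0,
    ("StLouis", "2005-03-18", "PM25"): 22.0,
    ("Chicago", "2005-03-17", "PM25"): 15.0,
    ("StLouis", "2005-03-17", "O3"):   41.0,
}

def slice_cube(site=None, time=None, param=None):
    """Fix any subset of the spatial/temporal/parameter dimensions;
    return the matching slice through the cube."""
    return {k: v for k, v in CUBE.items()
            if (site is None or k[0] == site)
            and (time is None or k[1] == time)
            and (param is None or k[2] == param)}
```

Fixing time and parameter yields a spatial map; fixing site and parameter yields a time series. This is the OLAP-style browsing the uniform global schema enables.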
  • 17. Mediator-Based Integration Architecture (Wiederhold, 1992)
    • Software agents (mediators) can perform many of the data integration chores
    • Heterogeneous sources are wrapped by translation software local to global language
    • Mediators (web services) obtain data from wrappers or other mediators and pass it on …
    • Wrappers remove technical, while mediators resolve the logical heterogeneity
    • The job of the mediator is to provide an answer to a user query ( Ullman , 1997 )
    • In database theory sense, a mediator is a view of the data found in one or more sources
    Wrapper Wrapper Service Service User Query View Busse et al., 1999
  • 18. Value-Added Processing in Service Oriented Architecture
    • Peer-to-peer network representation
    Data, services and users are distributed throughout the network Users compose data processing chains from reusable services Intermediate data are also exposed for possible further use Chains can be linked to form compound value-adding processes Service chain representation User Tasks: Find data and services Compose service chains Expose output User Carries Less Burden In service-oriented peer-to-peer architecture, the user is aided by software ‘agents’ Control Data Chain 1 Chain 2 Chain 3 Data Service Catalog User Chain 2 Chain 1 Chain 3 Data Service
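Composing reusable services into a chain, with intermediate data exposed for further use, can be sketched as follows. The services themselves (a filter and an aggregator) are toy stand-ins for real processing nodes.

```python
def filter_service(values, threshold):
    """Toy service: keep values at or above a threshold."""
    return [v for v in values if v >= threshold]

def aggregate_service(values):
    """Toy service: reduce a list to its mean."""
    return sum(values) / len(values) if values else None

def compose_chain(*steps):
    """Return a service that pipes data through each step, keeping intermediates."""
    def chain(data):
        intermediates = [data]
        for step in steps:
            data = step(data)
            intermediates.append(data)  # expose intermediate data for reuse
        return data, intermediates
    return chain

chain = compose_chain(lambda xs: filter_service(xs, 10.0), aggregate_service)
```

Because intermediates are kept, a second chain could branch off any step's output, which is how chains link into compound value-adding processes.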
  • 19. Dvoy Federated Information System
    • Dvoy offers a homogeneous, read-only access mechanism to a dynamically changing collection of heterogeneous, autonomous and distributed information sources.
    • Data access uses a global multidimensional schema consisting of spatial, temporal and parameter dimensions
    • The uniform global schema is suitable for data browsing and online analytical processing, OLAP
    • The limited global query capabilities yield slices along the spatial, temporal and parameter dimensions of the multidimensional data cubes.
  • 20. Architecture of DATAFED Federated Data System After Busse et al., 1999
    • The main software components of Dvoy are wrappers , which encapsulate sources and remove technical heterogeneity, and mediators , which resolve the logical heterogeneity.
    • Wrapper classes are available for geo-spatial (incl. satellite) images, SQL servers, text files, etc. The mediator classes are implemented as web services for uniform data access to n-dimensional data.
  • 21. Integration Architecture ( Ullman , 1997 )
    • Heterogeneous sources are wrapped by software that translates between the sources local language, model and concepts and the shared global concepts
    • Mediators obtain information from one or more components (wrappers or other mediators) and pass it on to other mediators or to external users.
    • In a sense, a mediator is a view of the data found in one or more sources; it does not hold the data but it acts as if it did. The job of the mediator is to go to the sources and provide an answer to the query.
    Information Integration Using Logical Views Source Lecture Notes In Computer Science; Vol. 1186 archive Proceedings of the 6th International Conference on Database Theory
  • 22. Referencing Ullman(1)
    • Source integration for data warehousing
    • The main goal of a data warehouse is to provide support for data analysis and management's decisions; a key aspect in the design of a data warehouse system is the process of acquiring the raw data from a set of relevant information sources. We will call source integration system the component of a data warehouse system dealing with this process. The goal of a source integration system is to deal with the transfer of data from the set of sources constituting the application-oriented operational environment, to the data warehouse. Since sources are typically autonomous, distributed, and heterogeneous, this task has to deal with the problem of cleaning, reconciling, and integrating data coming from the sources. The design of a source integration system is a very complex task, which comprises several different issues. The purpose of this chapter is to discuss the most important problems arising in the design of a source integration system, with special emphasis on schema integration, processing queries for data integration, and data cleaning and reconciliation.
    • Data integration under integrity constraints
    • Data integration systems provide access to a set of heterogeneous, autonomous data sources through a so-called global schema . There are basically two approaches for designing a data integration system. In the global-as-view approach, one defines the elements of the global schema as views over the sources, whereas in the local-as-view approach, one characterizes the sources as views over the global schema. It is well known that processing queries in the latter approach is similar to query answering with incomplete information, and, therefore, is a complex task. On the other hand, it is a common opinion that query processing is much easier in the former approach. In this paper we show the surprising result that, when the global schema is expressed in the relational model with integrity constraints, even of simple types, the problem of incomplete information implicitly arises, making query processing difficult in the global-as-view approach as well. We then focus on global schemas with key and foreign key constraints, which represents a situation which is very common in practice, and we illustrate techniques for effectively answering queries posed to the data integration system in this case.
    • MedMaker: A Mediation System Based on Declarative Specifications
    • Ullman & co: Mediators are used for integration of heterogeneous information sources . We present a system for declaratively specifying mediators. It is targeted for integration of sources with unstructured or semi-structured data and/or sources with changing schemas. We illustrate the main features of the Mediator Specification Language (MSL), show how they facilitate integration, and describe the implementation of the system that interprets the MSL specifications
    • Mediators in the Architecture of Future Information Systems
    • Gio Wiederhold For single databases, primary hindrances for end-user access are the volume of data that is becoming available, the lack of abstraction, and the need to understand the representation of the data. When information is combined from multiple databases, the major concern is the mismatch encountered in information representation and structure. Intelligent and active use of information requires a class of software modules that mediate between the workstation applications and the databases. It is shown that mediation simplifies, abstracts, reduces, merges, and explains data . A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications. A model of information processing and information system components is described. The mediator architecture, including mediator interfaces, sharing of mediator modules, distribution of mediators , and triggers for knowledge maintenance, are discussed.
  • 23. Referencing Ullman (2)
    • Extracting information from heterogeneous information sources using ontologically specified target views
    • Being deluged by exploding volumes of structured and unstructured data contained in databases, data warehouses, and the global Internet, people have an increasing need for critical information that is expertly extracted and integrated in personalized views . Allowing for the collective efforts of many data and knowledge workers , we offer in this paper a framework for addressing the issues involved. In our proposed framework we assume that a target view is specified ontologically and independently of any of the sources , and we model both the target and all the sources in the same modeling language. Then, for a given target and source we generate a target-to-source mapping, that has the necessary properties to enable us to load target facts from source facts. The mapping generator raises specific issues for a user's consideration, but is endowed with defaults to allow it to run to completion with or without user input. The framework is based on a formal foundation, and we are able to prove that when a source has a valid interpretation, the generated mapping produces a valid interpretation for the part of the target loaded from the source
    • Secure mediation: requirements, design, and architecture
    • In mediated information systems clients and various autonomous sources are brought together by mediators . The mediation paradigm needs powerful and expressive security mechanisms considering the dynamics and conflicting interests of the mediation participants. Firstly, we discuss the security requirements for mediation with an emphasis on confidentiality and authenticity. We argue for basing the enforcement of these properties on certified personal authorization attributes rather than on identification. Using a public key infrastructure such personal authorization attributes can be bound to asymmetric encryption keys by credentials. Secondly, we propose a general design of secure mediation where credentials are roughly used as follows: clients show their eligibility for receiving requested information by the contained personal authorization attributes, and sources and the mediator guarantee confidentiality by using the contained encryption keys. Thirdly, we refine the general design for a specific approach to mediation, given by our prototype of a Multimedia Mediator, MMM. Among other contributions, we define the authorization model and the specification of query access authorizations within the framework of ODL, as well as the authorization and encryption policies for mediation, and we outline the resulting security architecture of the MMM. We also analyze the achievable security properties including support for anonymity, and we discuss the inevitable tradeoffs between security and mediation functionality
    • Research Commentary: An Agenda for Information Technology Research in Heterogeneous and Distributed Environments
    • Application-driven, technology-intensive research is critically needed to meet the challenges of globalization, interactivity, high productivity, and rapid adaptation faced by business organizations. Information systems researchers are uniquely positioned to conduct such research, combining computer science, mathematical modeling, systems thinking, management science, cognitive science, and knowledge of organizations and their functions . We present an agenda for addressing these challenges as they affect organizations in heterogeneous and distributed environments. We focus on three major capabilities enabled by such environments: Mobile Computing, Intelligent Agents, and Net-Centric Computing . We identify and define important unresolved problems in each of these areas and propose research strategies to address them.
  • 24. IS Interoperability
    • Information systems interoperability: What lies beneath?
    • A comprehensive framework for managing various semantic conflicts is proposed. Our proposed framework provides a unified view of the underlying representational and reasoning formalism for the semantic mediation process. This framework is then used as a basis for automating the detection and resolution of semantic conflicts among heterogeneous information sources. We define several types of semantic mediators to achieve semantic interoperability . A domain-independent ontology is used to capture various semantic conflicts. A mediation-based query processing technique is developed to provide uniform and integrated access to the multiple heterogeneous databases . A usable prototype is implemented as a proof-of-concept for this work. Finally, the usefulness of our approach is evaluated using three cases in different application domains. Various heterogeneous datasets are used during the evaluation phase. The results of the evaluation suggest that correct identification and construction of both schema and ontology-schema mapping knowledge play very important roles in achieving interoperability at both the data and schema levels
    • Mediation:
    • Syntactic – Web services, Dimensional
    • Dimensional - Global Data Model
    • Semantic - ??? Ontology, namespaces
    • Semantic mediation is for content. Dimensional mediation is by the wrappers!!
    • In a recent research commentary March et al. [2000] identified semantic interoperability as one of the most important research issues and technical challenges in heterogeneous and distributed environments.
    • Semantic interoperability is the knowledge-level interoperability that provides cooperating businesses with the ability to bridge semantic conflicts arising from differences in implicit meanings, perspectives, and assumptions. Semantic interoperability creates semantically compatible information environment based on the agreed concepts among the cooperating entities.
    • Syntactic interoperability , is the application-level interoperability that allows multiple software components to cooperate even though their implementation languages, interfaces, and execution platforms are different. Emerging standards, such as XML and Web Services based on SOAP (Simple Object Access Protocol), UDDI (Universal, Description, Discovery, and Integration), and WSDL (Web Service Description Language), can resolve many application-level interoperability problems through technological means.
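Bridging a semantic conflict of the kind described above can be sketched as a mapping from each source's local terms to a shared global vocabulary, a very small stand-in for an ontology or agreed namespace. All term names here are hypothetical.

```python
# Hypothetical shared vocabulary: local source terms -> global concepts.
GLOBAL_TERMS = {
    "pm_fine":         "PM25",
    "particulate_2_5": "PM25",
    "ozone_ppb":       "O3",
    "o3_surface":      "O3",
}

def to_global(record):
    """Rename a source record's keys into the shared vocabulary;
    unknown keys pass through unchanged."""
    return {GLOBAL_TERMS.get(k, k): v for k, v in record.items()}
```

Once records from different sources are expressed in the agreed concepts, they can be compared and merged; syntactic interoperability (XML, SOAP, WSDL) alone does not provide this.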
  • 25. Referencing Ullman (3)
    • Federated Information Systems: Concepts, Terminology and Architectures (1999)
    • Busse & co: We are currently witnessing the emerging of a new generation of software systems: Federated information systems. Their main characteristic is that they are constructed as an integrating layer over existing legacy applications and databases. They can be broadly classified in three dimensions: the degree of autonomy they allow in integrated components, the degree of heterogeneity between components they can cope with, and whether or not they support distribution. Whereas the communication an
    • IDB: Toward the Scalable Integration of Queryable Internet Data Sources
    • As the number of databases accessible on the Web grows, the ability to execute queries spanning multiple heterogeneous queryable sources is becoming increasingly important. To date, research in this area has focused on providing semantic completeness, and has generated solutions that work well when querying over a relatively small number of databases that have static and well-defined schemas. Unfortunately, these solutions do not extend to the scale of the present Internet, let alone the
    • Building Intelligent Web Applications Using Lightweight Wrappers (2000)
    • The Web so far has been incredibly successful at delivering information to human users. So successful actually, that there is now an urgent need to go beyond a browsing human. Unfortunately, the Web is not yet a well organized repository of nicely structured documents but rather a conglomerate of volatile HTML pages. To address this problem, we present the World Wide Web Wrapper Factory (W4F), a toolkit for the generation of wrappers for Web sources, that offers: (1) an expressive language to specify the extraction of complex structures from HTML pages; (2) a declarative mapping to various data formats like XML; (3) some visual tools to make the engineering of wrappers faster and easier.
    • A Unified Framework for Wrapping, Mediating and Restructuring Information from the Web
    • The goal of information extraction from the Web is to provide an integrated view on heterogeneous information sources via a common data model and query language. A main problem with current approaches is that they rely on very different formalisms and tools for wrappers and mediators, thus leading to an "impedance mismatch" between the wrapper and mediator level. In contrast, our approach integrates wrapping and mediation in a unified framework based on an object-oriented data model which
    • Wrappers translate data from different local languages to common format, mediators provide integrated VIEW of the data.
    • Multi-Mediator Browser
    • With the mediator/wrapper approach to data integration [Wie92] wrappers define interfaces to heterogeneous data sources while mediators are virtual database layers where queries and views to the mediated data can be defined. The mediator approach to data integration has gained a lot of interest in recent years [BE96,G97,HKWY97,LP97,TRV98]. Early mediator systems are central in that a single mediator database server integrates data from several wrapped data sources. In the work presented here, the integration of many sources is facilitated through a scalable mediator architecture where views are defined in terms of object-oriented (OO) views from other mediators and where different wrapped data sources can be plugged in. This allows for a component-based development of mediator modules, as early envisioned in [Wie92].
  • 26. Distributed Programming: Interpreted and Compiled
    • Web services allow processing of distributed data
      • Data are distributed and maintained by their custodians,
      • Processing nodes (web-services) are also distributed
      • ‘ Interpreted’ web-programs for data processing can be created ad hoc by end users
    • However, ‘interpreted’ web programs are slow, fragile and uncertain
      • Slow due to large data transfers between nodes
      • Fragile due to instability of connections
      • Uncertain due to failures of data provider and processing nodes
    • One solution is to ‘compile’ the data and processing services
      • Data compilation transforms the data for fast, effective access (e.g. OLAP)
      • Web service compilation combines processes for effective execution
    • Interpreted or compiled?
      • Interpreted web programs are simpler and up to date but slow, fragile, uncertain
      • Compiled versions are more elaborate and latent but also faster and more robust
      • Frequently used datasets and processing chains should be compiled and kept current
  • 27. Interpreted and Compiled Service
    • Interpreted Service
    • Processes distributed
    • Data flow on Internet
    • Compiled Service
    • Processes in the same place
    • Data flow within aggregate service
    • Controllers, e.g. zoom can be shared
    Point Access Point Grid Grid Render Point Render PtGrid Overlay Point Access Point Grid Grid Render Point Access Point Render PtGrid Overlay Data Flow Control Flow
  • 28. Services Program Execution: Reverse Polish Notation
    • Writing the WS program:
    • - Write the program on the command line of a URL call
    • - Services are written sequentially using RPN
    • - Replacements
    • Connector/Adaptor:
    • - Reads the service name from the command line and loads its WSDL
    • - Scans the input WSDL
    • - The schema walker populates the service input fields from:
    • - the data on the command line
    • - the data output of the upstream process
    • - the catalog for the missing data
    • Service Execution
    • For each service
    • Reads the command line, one service at a time
    • Passes the service parameters to the above Connector/Adaptor, which prepares the service
    • Executes the service
    • It also handles the data stack for RPN
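The RPN execution model above, services named left to right on a command line, operating on a shared data stack, can be illustrated with a toy interpreter. The service registry here ("double", "add") is invented; the real system resolves service names via WSDL and the catalog.

```python
# Hypothetical service registry: each service consumes/produces stack data.
SERVICES = {
    "double": lambda stack: stack.append(stack.pop() * 2),
    "add":    lambda stack: stack.append(stack.pop() + stack.pop()),
}

def run_rpn(program):
    """Execute tokens in RPN order: literals push onto the data stack,
    service names pop their inputs and push their output."""
    stack = []
    for token in program.split():
        if token in SERVICES:
            SERVICES[token](stack)       # execute the named service
        else:
            stack.append(float(token))   # literal input from the command line
    return stack
```

Because each service reads its inputs from the stack, the output of one service feeds the next without any explicit wiring, which is what makes the command-line program composition possible.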
  • 29.  
  • 30. Mediator-Based Integration Architecture (Wiederhold, 1992)
    • Software agents (mediators) can perform many of the data integration chores
    • Heterogeneous sources are wrapped by translation software local to global language
    • Mediators (web services) obtain data from wrappers or other mediators and pass it on …
    • Wrappers remove technical, while mediators resolve the logical heterogeneity
    • The job of the mediator is to provide an answer to a user query ( Ullman , 1997 )
    • In database theory sense, a mediator is a view of the data found in one or more sources
    Wrapper Wrapper Service Service User Query View Busse et al., 1999
  • 31. An Application Program: Voyager Data Browser
    • The web program consists of a stable core and adaptive input/output layers
    • The core maintains the state and executes the data selection, access and render services
    • The adaptive, abstract I/O layers connect the core to evolving web data, flexible displays and a configurable user interface:
      • Wrappers encapsulate the heterogeneous external data sources and homogenize the access
      • Device Drivers translate generic, abstract graphic objects to specific devices and formats
      • Ports connect the internal parameters of the program to external controls
      • WSDL web service description documents
    Data Sources Controls Displays I/O Layer Device Drivers Wrappers App State Data Flow Interpreter Core Web Services WSDL Ports
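The stable-core/adaptive-layer split above can be sketched with a core that holds application state and "ports" that connect individual state parameters to external controls without exposing core internals. Class names, state fields, and the stubbed render step are all hypothetical.

```python
class Core:
    """Stable application core: holds state, runs a (stubbed) render service."""
    def __init__(self):
        self.state = {"dataset": "MODIS_AOT", "date": "2005-03-17"}
    def render(self):
        return f"view of {self.state['dataset']} on {self.state['date']}"

class Port:
    """Connects one internal state parameter to an external control."""
    def __init__(self, core, name):
        self.core, self.name = core, name
    def get(self):
        return self.core.state[self.name]
    def set(self, value):
        self.core.state[self.name] = value  # control writes through the port

core = Core()
date_port = Port(core, "date")
```

A user-interface control (say, a date slider) would only ever touch its port, so controls and displays can evolve independently of the core, which is the point of the abstract I/O layer.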
  • 32. DataFed Topology: Mediated Peer-to-Peer Network, MP2P Mediated Peer-to-Peer Network Broker maintains a catalog of accessible resources Peers find data and access instructions in the catalog Peers get resources directly from peer providers Google Example: Finding Images on Network Topology Google catalogs the images related to Network Topology User selects an image from the cached image catalog User visits the provider web-page where the image resides Source: Federal Standard 1037C Mediator Peers
  • 33. Generic Data Flow and Processing in DATAFED DataView 1 Data Processed Data Portrayed Data Process Data Portrayal/ Render Abstract Data Access View Wrapper Physical Data Abstract Data Physical Data Resides in autonomous servers; accessed by view-specific wrappers which yield abstract data ‘slices’ Abstract Data Abstract data slices are requested by viewers; uniform data are delivered by wrapper services DataView 2 DataView 3 View Data Processed data are delivered to the user as multi-layer views by portrayal and overlay web services Processed Data Data passed through filtering, aggregation, fusion and other web services
  • 34. Tightly and Loosely Coupled Programs
    • Coupling is the dependency between interacting systems
    • Dependency can be real (the service one consumes) or artificial (language, platform…)
    • One can never get rid of real dependency, though real dependency itself evolves
    • One can never get rid of artificial dependency, but one can reduce artificial dependency or its cost
    • Hence, loose coupling describes the state in which artificial dependency, or the cost of artificial dependency, has been reduced to a minimum
    Web services are self-contained, self-describing, modular applications that can be published, located, and invoked across the Web. They perform functions that can be anything from simple requests to complicated business processes... SOA is the right mechanism—a transmission of sorts—for an IT environment in which data-crunching legacy systems must mesh with agile front-facing applications.
  • 35. The pathway to a service-oriented architecture Bob Sutor, IBM
    • In an SOA world, business tasks are accomplished by executing a series of "services"
      • Services have well-defined ways of talking to them and well-defined ways in which they talk back
      • It doesn't really matter how a service is implemented, as long as it properly responds and offers the quality of service
      • The service must be secure, reliable and fast enough
      • SOA is a suitable technology to use in an IT environment where software and hardware from multiple vendors is deployed
      • IBM identified four steppingstones on the path to SOA nirvana and its full business benefits
    • 1. Make applications available as Web services to multiple consumers via a middle-tier Web application server.
      • This is an ideal entry point for those wishing to deploy an SOA with existing enterprise applications
      • Target customer retention or operational efficiency projects
      • Work with multiple consumers to correctly define the granularity of your services
      • Pay proper attention to keeping the services and the applications loosely coupled.
    • 2. Choreography of web services
      • Create business logic outside the boundaries of any one application
      • Make new apps easy to adjust as business conditions or strategy require
      • Keep in mind asynchronous services and sequences of actions that don't complete successfully
      • Don’t forget managing the processes and services
    • 3. Leverage existing integration tools
      • Leverage your existing enterprise messaging software; enterprise messaging and brokers
      • Web services give you an easy entry to SOA adoption
      • Web services are not the gamut of what service orientation means; modeling is likely required
      • Integration with your portal for "people integration" will also probably happen
    • 4. Be a flexible, responsive on-demand business
      • Use SOA to gain efficiencies, better use of software and information assets, and competitive differentiation
  • 36.
    • The pathway to a service-oriented architecture
    • Opinion by Bob Sutor, IBM Bob Sutor is IBM's director of WebSphere Infrastructure Software. DECEMBER 03, 2003 (COMPUTERWORLD) - I recently read through a large collection of analyst reports on service-oriented architecture (SOA) that have been published in the past year. I was pleasantly surprised at the amount of agreement among these industry observers and their generally optimistic outlook for the adoption of this technology.
    • SOA is not really new -- by some accounts, it dates back to the mid-1980s -- but it's starting to become practical across enterprise boundaries because of the rise of the Internet as a way of connecting people and companies. Even though the name sounds very technical, it's the big picture behind the use of Web services, the plumbing that's now being used to tie together companies with their customers, suppliers and partners.
    • In an SOA world, business tasks are accomplished by executing a series of "services," jobs that have well-defined ways of talking to them and well-defined ways in which they talk back. It doesn't really matter how a particular service is implemented, as long as it responds in the expected way to your commands and offers the quality of service you require. This means that the service must be secure, reliable and fast enough to meet your needs. This makes SOA a nearly ideal technology to use in an IT environment where software and hardware from multiple vendors is deployed.
    • At IBM, we've identified four steppingstones on the path to SOA nirvana and its full business benefits. Unlike most real paths, this is one you can jump on at almost any point.
    • The first step is to start making individual applications available as Web services to multiple consumers via a middle-tier Web application server. I'm not precluding writing new Web services here, but this is an ideal entry point for those wishing to deploy an SOA with existing Java or Cobol enterprise applications, perhaps targeting customer retention or operational efficiency projects.
    • You should work with multiple consumers to correctly define the granularity of your services, and pay proper attention to keeping the services and the applications using them loosely coupled.
    • The second step involves having several services that are integrated to accomplish a task or implement a business process. Here you start getting serious about modeling the process and choreographing the flow among the services, both inside and outside your organization, that implement the process.
    • The choreography is essentially new business logic that you have created outside the boundaries of any one application, therefore making it easier to adjust as business conditions or strategy require. As part of this, you will likely concern yourself with asynchronous services as well as compensation for sequences of actions that don't complete successfully. Your ability to manage the processes and services and their underlying implementations will start to become important as well. This entry point is really "service-oriented integration" and will probably affect a single department or a business unit such as a call center.
    • If you already have an enterprise application integration infrastructure, you will most likely enter the SOA adoption path at the third steppingstone. At this point, you are looking to use SOA consistently across your entire organization and want to leverage your existing enterprise messaging software investment.
    • I want to stress here that proper SOA design followed by use of enterprise messaging and brokers constitutes a fully valid deployment of SOA. Web services give you an easy entry to SOA adoption, but they don't constitute the gamut of what service orientation means. Modeling is likely required at this level, and integration with your portal for "people integration" will also probably happen here if you didn't already do this at the previous SOA adoption point.
    • The final step on the path is the one to which you aspire: being a flexible, responsive on-demand business that uses SOA to gain efficiencies, better use of software and information assets, and competitive differentiation. At this last level, you'll be able to leverage your SOA infrastructure to effectively run, manage and modify your business processes and outsource them should it make business sense to do so.
    • You are probably already employing some SOA in your organization today. Accelerating your use of it will help you build, deploy, consume, manage and secure the services and processes that can make you a better and more nimble business.
  • 37. Don Box, Microsoft
    • Integration in distributed object technology was by code injection
    • This introduced deep technological dependency between the distributed objects
    • Web Services introduced protocol-based integration, or service-orientation
  • 38. Don Box, Microsoft
    • When we started working on SOAP in 1998, the goal was to get away from this model of integration by code injection that distributed object technology had embraced . Instead, we wanted an integration model that made as few possible assumptions about the other side of the wire, especially about the technology used to build the other side of the wire. We've come to call this style of integration protocol-based integration or service-orientation . Service-orientation doesn't replace object-orientation - I don't see the industry (or Microsoft) abandoning objects as the primary metaphor for building individual programs. I do see the industry (and Microsoft) moving away from objects as the primary metaphor for integrating and coordinating multiple programs that are developed, deployed and versioned independently, especially across host boundaries.  In the 1990's, we stretched the object metaphor as far as we could, and through communal experience, we found out what the limits are. With Indigo, we're betting on the service metaphor and attempting to make it as accessible as humanly possible to all developers on our platform. How far we can "shrink" the service metaphor remains to be seen. Is it suitable for cross-process work? Absolutely. Will every single CLR type you write be a service in the future? Probably not - at some point you know where your integration points are and want the richer interaction you get with objects.
  • 39. Data Processing in Service Oriented Architecture
    • Data Values
      • Immediacy
      • Quality
    • Loose coupling of data
    • Open data processing – let competitive approaches deliver the appropriate products to the right place
  • 40. Major Service Categories, as envisioned by the Open GIS Consortium (OGC)
    • Human Interaction – Managing user interfaces, graphics, presentation
    • Information Management – Managing and storage of metadata, schemas, datasets
    • Workflow – Services that support specific tasks or work-related activities
    • Processing – Data processing, computations; no data storage or transfer
    • Communication – Services that encode and transfer data across networks
    • System Management – Managing system components, applications, networks (access)
    OGC service taxonomy (OGC code – service class):
    0000 OGC web service [ROOT]
      1000 Human interaction
        1100 Portrayal
          1110 Geospatial viewer
            1111 Animation
            1112 Mosaicing
            1113 Perspective
            1114 Imagery
          1120 Geospatial symbol editor
          1130 Feature generalization editor
        1200 Service interaction editor
        1300 Registry browser
      2000 Information Management
        2100 Feature access
        2200 Coverage access
          2210 Real-time sensor
        2300 Map access
        2400 Gazetteer
        2500 Registry
        2600 Sensor access
        2700 Order handling
      3000 Workflow
        3100 Chain definition
        3200 Enactment
        3300 Subscription
      4000 Processing
        4100 Spatial
          4110 Coordinate conversion
          4120 Coordinate transformation
          4130 Representation conversion
          4140 Orthorectification
          4150 Subsetting
          4160 Sampling
          4170 Feature manipulation
          4180 Feature generalization
          4190 Route determination
          41A0 Positioning
        4200 Thematic
          4210 Geoparameter calculation
          4220 Thematic classification
            4221 Unsupervised
            4222 Supervised
          4230 Change detection
          4240 Radiometric correction
          4250 Geospatial analysis
          4260 Image processing
            4261 Reduced resolution generation
            4262 Image manipulation
            4263 Image synthesis
          4270 Geoparsing
          4280 Geocoding
        4300 Temporal
          4310 Reference system transformation
          4320 Subsetting
          4330 Sampling
          4340 Proximity analysis
        4400 Metadata
          4410 Statistical analysis
          4420 Annotation
      5000 Communication
        5100 Encoding
        5200 Format conversion
        5300 Messaging
      6000 System Management
  • 41. SOAP and WSDL
    • SOAP
      • Envelope for message description and processing
      • A set of encoding rules for expressing data types
      • Convention for remote procedure calls and responses
      • A binding convention for exchanging messages
    • WSDL
      • Message format
      • Ports
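As a minimal sketch of the envelope convention listed above, a SOAP message can be assembled with the standard library. The `getValue`/`stationId` operation below is a hypothetical example, not part of any real WSDL:

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"

# Build the Envelope and Body elements in the SOAP namespace
envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")

# A hypothetical RPC-style operation carried in the Body
op = ET.SubElement(body, "getValue")
param = ET.SubElement(op, "stationId")
param.text = "STL"

message = ET.tostring(envelope, encoding="unicode")
```

The resulting `message` string is what a SOAP client would POST to the service endpoint; the encoding rules and binding conventions decide how the payload types and transport are handled.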
  • 42. Services in 500 words
    • Introduction
    • Search and Retrieve Web Service (SRW) and Search and Retrieve URL Service (SRU) are Web Services-based protocols for querying databases and returning search results. SRW and SRU requests and results are similar; the difference lies in the ways the queries and results are encapsulated and transmitted between client and server applications.
    • Basic "operations"
    • Both protocols define three and only three basic "operations": explain, scan, searchRetrieve.
    • explain . Explain operations are requests sent by clients as a way of learning about the server's database. At minimum, responses to explain operations return the location of the database, a description of what the database contains, and what features of the protocol the server supports.
    • scan . Scan operations enumerate the terms found in the remote database's index. Clients send scan requests and servers return lists of terms. The process is akin to browsing a back-of-the-book index where a person looks up a term in a book index and "scans" the entries surrounding the term.
    • searchRetrieve . SearchRetrieve operations are the heart of the matter. They provide the means to query the remote database and return search results. Queries must be articulated using the Common Query Language. CQL queries range from simple free-text searches to complex Boolean operations with nested queries and proximity qualifications. Servers do not have to implement every aspect of CQL, but they have to know how to return diagnostic messages when something is requested but not supported. The results of searchRetrieve operations can be returned in any number of formats, as specified via explain operations. Examples might include structured but plain text streams or data marked up in XML vocabularies such as Dublin Core, MARCXML, MODS, etc.
    • Differences in operation
    • The differences between SRW and SRU lie in the way operations are encapsulated and transmitted between client and server, as well as how results are returned. SRW is essentially a SOAP-ful Web service. Operations are encapsulated by clients as SOAP requests and sent to the server. Likewise, responses by servers are encapsulated using SOAP and returned to clients.
    • On the other hand, SRU is essentially a REST-ful Web Service. Parameters are encoded as name/value pairs in the query string of a URL. As such, operations sent by SRU clients can only be transmitted via HTTP GET requests. The results of SRU requests are XML streams, the same streams returned via SRW requests sans the SOAP envelope.
    • Summary
    • SRW and SRU are "brother and sister" standardized protocols for accomplishing the task of querying databases and returning search results. If index providers were to expose their services via SRW and/or SRU, then access to these services would become more ubiquitous.
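Because SRU encodes everything as name/value pairs in a URL query string, a request can be assembled with nothing more than URL encoding. The endpoint below is hypothetical; the parameter names (`operation`, `version`, `query`, `maximumRecords`) follow the SRU convention described above:

```python
from urllib.parse import urlencode

base = "http://example.org/sru"  # hypothetical SRU endpoint

# An SRU searchRetrieve request: a CQL query plus protocol parameters,
# all carried as name/value pairs in the query string of a GET request.
params = {
    "operation": "searchRetrieve",
    "version": "1.1",
    "query": 'dc.title = "air quality"',  # a CQL query
    "maximumRecords": "10",
}
url = base + "?" + urlencode(params)
```

Issuing an HTTP GET on `url` would return the XML result stream; the equivalent SRW request would wrap the same operation and parameters in a SOAP envelope and POST it instead.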
  • 43. Lib Congress
    • SOAP Web Services (SWS) and URL Web Services (UWS) are protocols for querying and returning results from remote servers. The difference is in encapsulation of queries and results transmitted between clients and servers.
    • In SWS, the messages between the client and server are encapsulated in an XML SOAP envelope.
    • In UWS, the web service parameters are encoded as name/value pairs in the query string of a URL and transmitted via HTTP GET requests. The results are returned as XML streams or ASCII streams, without the SOAP envelope.
    • When providers expose their services via SWS or UWS, then the use of these services (data/processing/rendering, etc.) can become more ubiquitous.
  • 44. REST Web Services
    • REST, unlike SOAP, doesn't require you to install a separate tool kit to send and receive data. Instead, the idea is that everything you need to use Web services is already available if you know where to look. HTTP lets you communicate your intentions through GET, POST, PUT, and DELETE requests. To access resources, you request URIs from Web servers.
    • Therefore, REST advocates claim, there's no need to layer the increasingly complicated SOAP specification on top of these fundamental tools. For more on REST, see the RESTwiki and Paul Prescod's pieces on XML.com .
    • There may be some community support for this philosophy. While SOAP gets all the press, there are signs REST is the Web service that people actually use. Since Amazon.com has both SOAP and REST APIs, they're a great way to measure usage trends. Sure enough, at OSCon, Jeff Barr, Amazon.com's Web Services Evangelist, revealed that Amazon handles more REST than SOAP requests. I forgot the exact percentage, but Tim O'Reilly blogged this number as 85%! The number I do remember from Jeff's talk, however, is 6, as in "querying Amazon using REST is 6 times faster than with SOAP".
    • The hidden battle between web services: REST versus SOAP
    • Is SOAP a washout?
    • Are more developers turning their backs on SOAP for Web services? Redmonk’s James Governor just posted this provocative thought at his MonkChips blogsite:
    • "Evidence continues to mount that developers can't be bothered with SOAP and the learning requirements associated with use of the standard for information interchange. It is often described as 'lightweight', but its RPC [remote procedure call] roots keep showing. …semantically rich platforms like flickr and Amazon are being accessed by RESTful methods, not IBM/MS defined 'XML Web Services' calls."
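The REST claim above, that "everything you need to use Web services is already available", can be illustrated with the standard library alone: the resource is named by a URI and intent is carried by the HTTP verb. The resource URI below is hypothetical:

```python
from urllib import parse, request

# REST: the resource is named by a URI; intent is the HTTP verb.
base = "http://example.org/stations/STL"  # hypothetical resource URI

get_req = request.Request(base, method="GET")                       # read it
put_req = request.Request(base, data=b"<station/>", method="PUT")   # replace it
del_req = request.Request(base, method="DELETE")                    # remove it

# A filtered read is just a query string on the same URI:
query_url = base + "?" + parse.urlencode({"fields": "pm25"})
```

No toolkit, stub generation, or envelope is involved, which is exactly the contrast with SOAP the passage draws; whether that simplicity outweighs SOAP's richer machinery is the debate the slide reports.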
  • 45. NetKernel
    • Every software component on NetKernel is addressed by URI like a Web resource. A component is executed as a result of issuing Web-like[REST] requests. A software component on NetKernel is therefore a simple software service which hides the complexity of its internal implementation.
    • Applications or higher-order services are built by composing simple services. Service composition can be written in a wide-choice of either procedural or declarative languages. You can think of this as Unix-like application composition and pipelines in a uniform URI-based application context.
    • Complexity is managed through the URI address space which may be remodelled and extended indefinitely. A complex aggregated service may always be abstracted into one or more higher-level services or URI interfaces.
    • It generalizes the principles of REST, the basis for the successful operation of the World Wide Web, and applies them down to the finest granularity of service-based software composition. The Web is the most scalable and adaptive information system ever - now these properties can be realized in general software systems too.
    • The NetKernel abstraction borrows many Unix-like principles including enabling emergent complexity through the pipelining of simple components and by offering managed localized application contexts.
    • NetKernel provides many features which create a truly satisfying and productive environment for developing services and service oriented applications.
    • Service Oriented Development
    • Why limit service oriented architectures to distributed system integration? With NetKernel the service oriented concepts are intrinsic in the development model right down to the finest levels of granularity. With NetKernel, Software Development is the dynamic composition of simple services
    • I never heard the phrase "REST microkernel" before, but I had an immediate expectation of what that would mean. An hour's experimentation with the system met that expectation. Wildly interesting stuff.
    • Jon Udell, InfoWorld.com
    • The philosophy and thinking behind the development of the NetKernel is captured in our whitepaper, NetKernel : From Websites to Internet Operating Systems . A Hewlett Packard Research Report presents the Dexter Project, the project which seeded NetKernel.
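The NetKernel idea of URI-addressed components composed into higher-order services can be caricatured in a few lines. This is a toy sketch, not the actual product; the `toy:` URI scheme and service names are invented:

```python
# Toy illustration of URI-addressed services: every component is registered
# under a URI, and applications are compositions of URI-addressed requests.

registry = {}

def service(uri):
    """Register a function under a URI address."""
    def bind(fn):
        registry[uri] = fn
        return fn
    return bind

def resolve(uri, **args):
    """Issue a web-like request against the URI address space."""
    return registry[uri](**args)

@service("toy:uppercase")
def uppercase(text):
    return text.upper()

@service("toy:greet")
def greet(name):
    # Composition: a higher-order service built from a simpler one,
    # addressed only by its URI, never by its implementation.
    return resolve("toy:uppercase", text=f"hello, {name}")

result = resolve("toy:greet", name="netkernel")
```

Because callers see only URIs, any registered implementation can be swapped or remoted without touching its consumers, which is the "complexity managed through the URI address space" point made above.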
  • 46. NetKernel – Peter Rodgers
    • CASE STUDY: Service-Oriented Development on NetKernel. Patterns, processes and product to reduce the complexity of IT systems. A 1060 NetKernel case study on applying service-oriented abstraction to any application, component or service.
    • Web services hold great promise for exposing functionality to the outside world. They allow organizations to quickly connect disparate systems in a platform neutral manner. The real challenge occurs when Web services need to address the underlying complexity and inflexibility of the systems they connect together. While Web services provide an interface to connect systems - there remains the increasing complexity of the applications you have built, and are currently building, which sit behind those interfaces. 1060 NetKernel applies the underlying architectural principles of the Web and Web services together with Unix-like scheduling and pipelines to provide radical flexibility and improved simplicity by providing a platform to apply Service Oriented Architecture throughout your application environment. Developed through the exploration of some of the most complex Internet commerce systems, 1060 NetKernel will allow you to apply service oriented abstraction to any application, component or service.
    • The result: the ability to realize the promise of adaptive SOA with service-implementations which are dynamically adaptive and easily change with your business.
    • Peter Rodgers is the founder and CEO of 1060 Research and architect of the 1060 NetKernel XML Application Server. Prior to starting 1060 he established and led Hewlett-Packard's XML research program and provided strategic consultancy to Hewlett-Packard's software businesses. Peter holds a PhD in solid-state quantum mechanics from the University of Nottingham. ( more )
  • 47. Coordinated Views and Exploratory visualization
    • This project involves investigating a novel coordination model and developing a related software system. The objectives of the project are:
      • investigate aspects of coupling within an exploratory visualization environment;
      • develop a formal coordination model for use with exploratory visualization;
      • produce a publicly available visualization software system with the model as an integral part.
    • Motivation for this research comes from the rapid rise in numbers of people using and developing exploratory visualization techniques. Indeed, coordination is used in many visualization tools, coupling navigation or selection of elements. The use of such exploratory tools generates an explosion of different views, but there has been little research into an underlying model and effective techniques for connecting elements, particularly in the context of the abundant windows generated when using exploratory methods that provide profuse views with slightly different content (aggregations, parameterizations or forms).
    • There are many terms to do with multiples that may be used in this context [6]. From multiple windows, an all-encompassing term to describe any multi-window system, through multiple views of many separate presentations of the same information, to multiform, which refers to different representations (different forms) of the same data. Multiform is a useful technique; as Brittain et al [21] explain, "it is best to allow the user to have as many renderings (e.g. cutting planes, isosurface, probes) as desired on the screen at once": the user sees the information in different ways to hopefully provide a better understanding of the underlying information. Additionally, abstract techniques are useful. These present the information in a view that is related to the original view but has been altered or generalized to simplify the image [1]. The London Underground map is a good example of an abstract map: by displaying the connectivity of the underground stations and losing the positional information of the stations, it simplifies the whole schematic.
    • Coordination. There are two different reasons for using coordination: selection and navigation [32]. Selection allows the user to highlight one or many items, either as a choice of items for a filtering operation or as an exploration in its own right; this is often done by direct manipulation, where the user directly draws or wands the mouse over the visualization itself (a brushing operation [39]). Joint navigation provides methods to quickly view related information in multiple different windows, thus providing rapid exploration by saving the user from performing the same or similar operations multiple times. Moreover, these operations need not be applied to the same information but, more interestingly, to collections of different information. Coordination and abstract views provide a powerful exploratory visualization tool [1]: for example, in a three-dimensional visualization, a navigation or selection operation may be inhibited by occlusion, but the operation may be easier using an abstract view; thus, a linked abstract view may be used to better control and investigate the information in the coupled view.
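The selection coordination (brushing) described above is, at its core, an observer pattern over a shared selection model: brushing in any one view updates the highlight in every linked view. A minimal sketch, with invented view names:

```python
# Minimal sketch of coordinated views for selection ("brushing"): a shared
# selection model notifies every attached view, so highlighting in one view
# is propagated to all coupled views.

class SelectionModel:
    def __init__(self):
        self._selected = set()
        self._views = []

    def attach(self, view):
        self._views.append(view)

    def select(self, items):
        self._selected = set(items)
        for view in self._views:      # propagate to all coupled views
            view.update(self._selected)

class View:
    def __init__(self, name):
        self.name = name
        self.highlighted = set()

    def update(self, selected):
        self.highlighted = set(selected)

model = SelectionModel()
scatter, map_view = View("scatter"), View("map")
model.attach(scatter)
model.attach(map_view)
model.select({"station-12", "station-47"})  # brush in any view
```

Joint navigation can be handled the same way by sharing a viewpoint or extent object instead of a selection set; the occlusion example above then amounts to brushing in the abstract view and letting the model update the 3D view.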
    • REFERENCES
    • [1]. Jonathan C. Roberts. Aspects of Abstraction in Scientific Visualization. Ph.D thesis, University of Kent at Canterbury, Computing Laboratory, Canterbury, Kent, England, UK, CT2 7NF, October 1995.
    • [6.] Jonathan C. Roberts. Multiple-View and Multiform Visualization. Visual Data Exploration and Analysis VII. Proceedings of SPIE. Vol. 3960, pages 176--185. January 2000.
    • [21.] Donald L. Brittain, Josh Aller, Michael Wilson, and Sue-Ling C. Wang. Design of an end-user data visualization system. In Proceedings Visualization '90. IEEE Computer Society, pages 323--328. 1990.
    • [32.] Chris North and Ben Shneiderman. A Taxonomy of Multiple Window Coordinations. University of Maryland Computer Science Dept. Technical Report #CS-TR-3854. 1997.
    • [39.] Matthew O. Ward. XmdvTool: Integrating multiple methods for visualizing multivariate data. In Proceedings Visualization '94, pages 326--333. IEEE Computer Society. 1994.
  • 48.
    • Note that in distributed systems, responsibility is distributed. For NVODS:
      • Responsibility for the data lies with the data providers
      • Data location lies with the GCMD and NVODS
      • Application packages (Matlab, Ferret, Excel…) lie with the developers of these packages (Services??)
      • The data access protocol lies with OPeNDAP
  • 49. OpenDAP
    • OPeNDAP allows serving and accessing data over the internet using analysis and visualization tools like Matlab, Ferret, IDL and many other OPeNDAP clients
    • OPeNDAP unifies the variety of data formats and allows accessing only the data of interest
    • OPeNDAP allows you to make your data available remotely
    • Your data can be in a variety of data formats, including netCDF and HDF
    • For other available formats, see our complete list of OPeNDAP servers
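In practice, "accessing only the data of interest" with OPeNDAP comes down to a URL: a suffix selects the response form and a constraint expression subsets the data. A sketch of the DAP 2 URL convention, against a hypothetical server and dataset:

```python
from urllib.parse import quote

# Hypothetical OPeNDAP server and dataset
dataset = "http://example.org/opendap/sst.nc"

dds_url = dataset + ".dds"   # dataset descriptor structure (variables, shapes)
das_url = dataset + ".das"   # dataset attributes (units, metadata)

# ASCII response for a subset: variable indexed as [start:stride:stop],
# so only the requested hyperslab crosses the wire.
constraint = quote("sst[0:1:10][0:1:10]", safe="[]:")
subset_url = dataset + ".ascii?" + constraint
```

A client such as Matlab or Ferret issues exactly these kinds of requests under the hood; the server translates them against whatever storage format (netCDF, HDF, …) actually holds the data.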
  • 51. Interoperability Stack (Layer – Description – Standards)
    • Semantics – meaning – WSDL ext., Policy, RDF
    • Protocol – communication behavior – SOAP, WS-* ext.
    • Data – types – Schema, WSDL
    • Syntax – data format – XML
    • Transport – addressing, data flow – HTTP, SMTP