Going beyond the data.doc



  1. Geo-Data Portal: Final Report. Go-Geo! – the Geo-Data Portal Project. Going beyond the data - data mining, visualisation, exploitation and analysis of geospatial data. Date: 14 August 2003. Status of document: Final. 10/05/10
  2. Going beyond the data - data mining, visualisation, exploitation and analysis of geospatial data

     Part of the Go-Geo! portal project involved investigating:

     • how access could be provided to 'deep' geospatial resources (data mining), with a demonstration of this functionality if possible;
     • how a comprehensive service would include search and browse, visualisation, exploitation and analysis of geospatial data.

     In the original scoping study, it was noted that, having found a geospatial dataset that seems to meet their requirements, a user should be able to gain access to that data. Three sorts of access can be identified and are discussed below.

     First, we describe the situation where the user is taken to the data, usually via a link to some on-line service. The service may just provide a data extraction function or it may provide clients through which the data can be used.

     Second, we describe the still rare situation where the data can be automatically ‘mined’ from a remote, on-line data source and delivered to some type of web client where it can be visualised and possibly analysed. This client could be an application offered by the geo-data portal.

     In the two types of access described so far, some form of recognised service (which may be free, subscription-based but free at point of use, or pay as you go) is offered through which users are able to access the geospatial data. The third sort of access is where the data to which a user requires access is held by an individual, research centre or academic department. We describe how this type of access might be facilitated at the end of this document.

     Taking the user to the data

     In its simplest form, gaining access to geospatial data may just involve the user following a link provided in the metadata record to an on-line ordering service. Through the service the user can order a copy of the data they require.
     This is the sort of service traditionally offered by commercial data providers. Once an order has been placed, the data is normally dispatched to the user on a CD or some other type of media, along with guidance on how to load the data into application software (typically a GIS) and a codebook which describes the structure of the data in detail. Increasingly, documentation is available on-line and a number of commercial data providers now also deliver the data on-line.

     For specific types of data, e.g. aerial photography, satellite imagery, digital and paper maps, there may be an intermediate step where the user follows a link from a metadata record to a specialist inventory. This contains information to identify and retrieve metadata about specific instances of a collection item, e.g. a map in a map series, a single satellite image or aerial photograph, and from there an order can be placed.

     Where the data is on-line, i.e. is provided through some form of on-line service, it should be possible to access the service directly, subject to the user having the authority to do so. (If the user doesn’t have the authority, then they should be able to get information about how they can register to use the service.) UKBORDERS and EDINA Digimap are two obvious examples of this type of service.

     In the case of UKBORDERS, the user can use the service to extract the data they require (digitised boundaries of UK administrative, civil and census geographies), specifying the format they require the data to be in. The data is extracted from a large geospatial database managed by EDINA, converted to the format requested by the user, packaged up into one or more files and placed in an FTP directory. The user can then download the data either via a link in a web page or from the FTP directory.
  3. EDINA Digimap is different to UKBORDERS in that, in addition to a facility to download Ordnance Survey digital map data, web applications are provided so that the data can be used directly without needing to be downloaded. Two applications are provided to produce maps from the Ordnance Survey data: a standard mapping tool and Digimap Carto. The difference between the two is the level of user control over the map content and the size and format of the output.

     In an ideal world, when the user moves from the geo-data portal to a data service, the user’s current thematic, geographic and temporal context should be transferred as well, so that the service the user is accessing knows what data the user wishes to access. Although it is rumoured that some services do this, we have been unable to identify them.

     The OpenGIS Consortium (OGC) (see below) are working on a variation of what has been described so far. They are proposing that the metadata contain declarations of service associations for the data; that is, where known, the potentially multiple ways in which a dataset can be accessed are listed. These could, separately, express that a particular dataset is known to be bound to an FTP download, an OGC Web Map Service (for visualisation), and an OGC Web Feature Service (for data mining of the whole dataset or some subset of it).

     Bringing the data to the user

     In this section we look at how data identified using the Go-Geo! portal might be brought to the user, rather than the user having to go to another service to access it. By this we mean the data can be automatically ‘mined’ from an on-line data source and delivered to some type of on-line client where it can be visualised and, possibly, analysed. This client could be a part of the service offered by the Go-Geo! portal; however, once a mechanism for requesting data is in place, there is no reason why other services couldn’t offer clients with similar or more specialised functionality.
     To achieve interoperability, the adoption of open standards is key. In this regard, the building blocks are already mature thanks to the work of the Open GIS Consortium.

     The OpenGIS Consortium (OGC) was formed in 1994 with the mission to promote the development and use of advanced open systems standards and techniques in the area of geoprocessing and related information technologies. Its focus is on the “interoperability of geospatial systems”. It has over 200 members made up of Geographic Information Systems (GIS) vendors, user organisations, academic institutions and government bodies. Members include Oracle Corporation, United States Geological Survey, Ordnance Survey GB, NASA, Sun Microsystems, HP, Microsoft Corporation and Environmental Systems Research Institute (ESRI).

     OGC manages an industry-wide specification process designed to develop “interoperable geoprocessing”. This is based around the idea of web services, where a web service is a “URL-addressable software resource that performs functions and provides answers” (Seybold, 2002). OpenGIS Implementation Specifications are publicly available engineering specifications of interfaces and protocols that enable interoperability among software systems dealing with geospatial data and geoprocessing functions. OpenGIS specifications are created, evaluated, and adopted through a formal consensus process facilitated by the OGC. Practical testbeds, pilot projects and a consensus specification development process are used to arrive at open specifications.
  4. Products and services that conform to OGC open interface specifications enable users to freely exchange and apply spatial information, applications and services across networks and different platforms. A large number of specifications have now either been formally adopted or are in preparation. Below we describe the adopted open interface specifications of most interest in the context of geo-data services within the JISC Information Environment.

     Web Map Servers

     The OpenGIS Web Map Server (WMS) Interface Specification specifies open protocols that provide uniform access by HTML clients to maps rendered by map servers on the Internet. Figure 1 is an example of a simple client using the WMS interface specification to display maps from a variety of remote sources. In this case the data on which the maps are based relate to global environmental change.

     Software conforming to the WMS specification is able to automatically overlay, in ordinary web browsers, map images obtained from multiple dissimilar map servers, regardless of map scale, projection, earth co-ordinate system or digital format. This is the most mature of the OpenGIS Implementation Specifications. Vendors of Geographic Information Systems (GIS), earth imaging systems and the like have already implemented the OpenGIS Web Map Server Interface Specification in software upgrades and new software. Map and imagery suppliers are beginning to make their data available over the Web through these vendors' OpenGIS-conformant servers.

     Figure 1 – Example of a client using the WMS interface specification to display maps from a variety of sources.
  5. The specification defines a syntax for Uniform Resource Locators (URLs) that invoke each of these operations. The current WMS Implementation Specification (version 1.1.0) defines keyword/value encodings for operation requests using HTTP GET. Also, an Extensible Markup Language (XML) encoding is defined for service-level metadata.

     When requesting a map, a client may specify the information to be shown on the map (one or more "Layers"), possibly the "Styles" (cartographic symbolisation) of those Layers, what portion of the Earth is to be mapped (a "Bounding Box"), the projected or geographic co-ordinate reference system to be used (the "Spatial Reference System"), the desired output format, the output size (Width and Height), and background transparency and colour.

     When two or more maps are produced with the same Bounding Box, Spatial Reference System, and output size, the results can be accurately layered to produce a composite map. The use of image formats that support transparent backgrounds allows the lower Layers to be visible. Furthermore, individual map Layers can be requested from different Servers. The WMS specification thus enables the creation of a network of distributed Map Servers from which clients can build customised maps.

     Web Feature Servers

     The Web Feature Service (WFS) Interface Specification is an interface that supports query-level access to vector data (points, lines, and areas (polygons)) in spatial databases. In other words, it provides a data mining capability. Unlike the WMS, which returns a ‘picture’, a WFS returns data. A request is generated on the client and is posted to a WFS server. The WFS server reads and executes the request, returning the resulting data in a feature set as Geography Markup Language (GML) (see footnote 1).
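     Before continuing with WFS, the keyword/value GetMap encoding described above for WMS can be illustrated with a minimal Python sketch. The parameter names follow the WMS 1.1 GetMap request; the server addresses, layer names and the helper functions `getmap_url` and `can_overlay` are hypothetical and are not part of Go-Geo! or any EDINA service. The second helper checks the layering condition stated above: two maps can be composited only when Bounding Box, Spatial Reference System and output size agree.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def getmap_url(base_url, layers, bbox, width, height,
               srs="EPSG:4326", styles=None, fmt="image/png",
               transparent=True, version="1.1.0"):
    """Build a WMS GetMap URL from keyword/value pairs (HTTP GET).

    base_url and layers are placeholders for a real map server.
    bbox is (minx, miny, maxx, maxy) in the units of `srs`.
    """
    params = {
        "VERSION": version,
        "REQUEST": "GetMap",
        "LAYERS": ",".join(layers),
        # One (possibly empty) style per layer; empty means server default.
        "STYLES": ",".join(styles or [""] * len(layers)),
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": fmt,
        "TRANSPARENT": "TRUE" if transparent else "FALSE",
    }
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode(params)


def can_overlay(url_a, url_b):
    """True when two GetMap requests share BBOX, SRS and output size."""
    keys = ("BBOX", "SRS", "WIDTH", "HEIGHT")
    query_a = parse_qs(urlsplit(url_a).query, keep_blank_values=True)
    query_b = parse_qs(urlsplit(url_b).query, keep_blank_values=True)
    return all(query_a.get(k) == query_b.get(k) for k in keys)


if __name__ == "__main__":
    bbox = (-8.0, 49.0, 2.0, 61.0)
    backdrop = getmap_url("http://example.org/wms", ["backdrop"], bbox, 400, 480)
    overlay = getmap_url("http://example.net/wms", ["boundaries"], bbox, 400, 480)
    print(backdrop)
    print(can_overlay(backdrop, overlay))
```

     A client would fetch each URL as an image and draw the transparent overlay on top of the backdrop; that fetching and compositing step is left out here.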
     A GML-enabled client can then use the data either directly to support query and analysis tasks, or it can pass the data to another service to be turned into a form that can be displayed to the user.

     Bundled with the WFS specification is the Filter Encoding Specification. This defines a standard encoding for query predicates using XML. Using the XML encoding, a query operation can be defined that, for example, retrieves features that lie in a particular geographic region or match some attribute criteria, e.g. all lakes greater than 5 km² in area. HTTP GET and/or POST methods may be used to make requests. The former uses keyword/value pairs to encode the various parameters of a request, the latter XML.

     Web Coverage Servers

     The Web Coverage Service (WCS) supports the networked interchange of geospatial data as "coverages" containing values or properties of geographic locations. Unlike the Web Map Service (WMS), which filters and portrays spatial data to return static maps (server-rendered as pictures), the Web Coverage Service provides access to intact (unrendered) geospatial information, as needed for client-side rendering and for input into scientific models and other clients beyond simple viewers. The current version emphasises "simple" coverages (defined on some regular, rectangular grid or tessellation of space), for example, a rectangular grid of heights.

     Footnote 1: GML has been developed co-operatively as a global standard for geospatial data by the OGC. It is an XML encoding for the transport and storage of geographic information, including both the geometry and attributes of geographic features. The new OS MasterMap data, replacing the large-scale mapping product Land-line, is only available as GML.
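     As an illustration of the XML request encoding described above for WFS and the Filter Encoding Specification, the following Python sketch builds a GetFeature request body for the "lakes greater than 5 km²" example. The element names and namespaces are those of WFS 1.0 and Filter Encoding 1.0; the feature type `lakes` and the property `area_km2` are hypothetical names chosen for the example and do not refer to a real dataset.

```python
import xml.etree.ElementTree as ET

WFS_NS = "http://www.opengis.net/wfs"
OGC_NS = "http://www.opengis.net/ogc"


def getfeature_request(type_name, property_name, min_value):
    """Return a WFS GetFeature XML body selecting features whose
    `property_name` is greater than `min_value`.

    type_name and property_name are placeholders for whatever the
    target feature server actually publishes.
    """
    ET.register_namespace("wfs", WFS_NS)
    ET.register_namespace("ogc", OGC_NS)

    root = ET.Element(f"{{{WFS_NS}}}GetFeature",
                      {"service": "WFS", "version": "1.0.0"})
    query = ET.SubElement(root, f"{{{WFS_NS}}}Query", {"typeName": type_name})

    # The query predicate, encoded as an OGC Filter.
    flt = ET.SubElement(query, f"{{{OGC_NS}}}Filter")
    comparison = ET.SubElement(flt, f"{{{OGC_NS}}}PropertyIsGreaterThan")
    ET.SubElement(comparison, f"{{{OGC_NS}}}PropertyName").text = property_name
    ET.SubElement(comparison, f"{{{OGC_NS}}}Literal").text = str(min_value)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(getfeature_request("lakes", "area_km2", 5))
```

     The resulting document would be sent to the feature server with HTTP POST, and the response would be a GML feature collection. A spatial predicate such as a bounding box could be added to the same Filter element in the same way.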
  6. Like the WMS specification, the WCS specification defines a syntax for Uniform Resource Locators (URLs) that invoke each of these operations and defines keyword/value encodings for operation requests using HTTP GET.

     Web Services

     In September 2001, OGC launched the OGC Web Services (OWS1) 1.1 Initiative, an OGC Interoperability Program Testbed. The testbed ended in March 2002. A critical product from the OWS 1.1 testbed was a reference architecture for web-based geospatial services. OGC are now moving towards adopting a Web Services architecture: one that employs the commonly recognised protocols making up what is known as the Web Service protocol stack, i.e. HTTP, FTP, XML, SOAP, WSDL and UDDI/ebXML.

     OpenGIS Implementation Specifications and Go-Geo!

     Figure 2 shows diagrammatically a possible architecture by which geospatial data could be mined and displayed to users for visualisation and analysis within the Go-Geo! portal. In this example, the portal operates two clients. One of these is a web mapping client which displays maps, requested using the Web Map Server (WMS) Interface Specification, from two map servers: one hosted at a JISC data centre and the other by a research institute.

     The second client is a WFS client. This takes GML data, requested using the Web Feature Server Interface Specification from a data service provided by a JISC national data centre, and renders it for display to the user using, for example, SVG (Scalable Vector Graphics) or some similar mechanism. Because the data is now local to the user, it can be manipulated and queried by the user, subject to the client supporting this functionality. The WFS client can also request maps from the map server operated by the research institute as a backdrop for the rendered data.

     It is likely that access to ‘raw’ data may be restricted to authorised users only.
     Therefore, between the WFS client and the WFS server at the JISC Data Centre, there is a Web Authentication and Authorisation Server (WAAS) service. This communicates with Athens to authenticate the user and determine whether they have rights to access the data being supplied by the JISC data centre.
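     The gatekeeping role described above can be sketched in a few lines of Python. This is only an illustration of the control flow (check rights first, forward the data request second); it does not represent the Athens protocol or any actual WAAS interface. The names `AccessDenied`, `gated_feature_request`, `is_authorised` and `fetch_features` are all hypothetical.

```python
from typing import Callable


class AccessDenied(Exception):
    """Raised when the user has no rights to the requested dataset."""


def gated_feature_request(
    user_id: str,
    dataset_id: str,
    is_authorised: Callable[[str, str], bool],
    fetch_features: Callable[[str], str],
) -> str:
    """Forward a feature request only if the authorisation check passes.

    is_authorised stands in for the call to the authentication and
    authorisation service; fetch_features stands in for the request to
    the feature server. Both are supplied by the caller.
    """
    if not is_authorised(user_id, dataset_id):
        raise AccessDenied(f"{user_id} may not access {dataset_id}")
    return fetch_features(dataset_id)


if __name__ == "__main__":
    # Stub implementations, for demonstration only.
    rights = {("alice", "boundaries")}
    check = lambda user, dataset: (user, dataset) in rights
    fetch = lambda dataset: f"<gml>features of {dataset}</gml>"

    print(gated_feature_request("alice", "boundaries", check, fetch))
    try:
        gated_feature_request("bob", "boundaries", check, fetch)
    except AccessDenied as err:
        print("refused:", err)
```

     Keeping the check and the fetch as separate callables mirrors the separation in the architecture between the authorisation service and the data service: either can be replaced without changing the other.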
  7. Figure 2 - OpenGIS Implementation Specifications and Go-Geo! A possible architecture. [Diagram not reproduced. Components shown: the user's WWW browser; the Go-Geo! portal with WMS, WFS and WAAS clients; WAAS, WMS and WFS services; a security zone with Athens; Geo-Data sets 1 and 2; hosts labelled JISC Data Centre and HE Research Institute.]
  8. Current state of play

     The sorts of services able to support remote requests are likely to be large-scale, nationally operated services such as those run by EDINA and MIMAS. For example, over the last two years EDINA has implemented several of the OpenGIS implementation specifications. The result is a set of prototype ‘web services’ through which we are able to demonstrate how maps and data might be served up to the Go-Geo! portal and other JISC services (see footnote 2).

     A series of web map servers have been implemented on top of the Digimap server, one server for each Ordnance Survey digital map product. Some of these servers are already used in the Go-Geo! portal to produce maps for display as part of geographic searching and as backdrops when displaying the geographic extent of a dataset described in a metadata record.

     As well as map servers, EDINA has also implemented an OGC web coverage server and a basic OGC web feature server. The former serves up digital terrain model (DTM) height data, while the latter supports the extraction of OS Strategi data. Recently, the latter has been chained with another service to allow the conversion of GML to SVG, the result of which is rendered in a simple web-based visualisation tool.

     The new UKBORDERS system (under development) is to deploy OGC-compliant map and feature server interfaces, thus allowing census and other digital boundaries to be requested and delivered to third-party services. However, there are a number of licensing issues to be resolved before this can be deployed generally.

     Before data-mining-like functionality can develop further within the JISC IE, a number of issues need resolving. First, it is important that any contracts giving access to geospatial content negotiated by the JISC include the right to disseminate data in the manner described above. Existing contracts need to be revisited and re-negotiated to permit this kind of use, as EDINA has done with the Ordnance Survey.
     Second, funding is required so that MIMAS and other service providers holding geospatial data on-line can extend their services to support remote requests for maps and data using OGC specifications.

     Linked to this is the issue of performance and capacity. It is essential that existing geospatial services do not suffer a performance penalty because they are supporting remote requests as well as direct usage. In time, a large number of services could be making multiple simultaneous and independent requests of the map and data servers operated by a JISC service provider. The hardware infrastructure must: i) scale to cope with high demand without degradation in response times and ii) be sufficiently robust to handle hardware and database failures transparently, without impinging on service availability. Network bandwidth might also prove to be a limiting factor if traffic volumes are high and data requests large in size. Establishing such architectures may be costly.

     Finally, given that access to the JISC-funded web mapping and data services should not be open to all web services, some form of access control mechanism would have to be implemented so that only authorised JISC services could make requests and receive responses.

     Footnote 2: EDINA are near the end of negotiations with Ordnance Survey which will allow maps, supplied by the EDINA Digimap service using the WMS specification, to be displayed in web applications provided by other UK academic services and development projects.
  9. Data Sharing

     One of the hopes of the Go-Geo! project is that researchers and research centres will provide metadata to Go-Geo! about smaller, more specialist datasets they have created. One of the benefits to individuals and centres of doing this is that demonstrating continued usage of data after the original research is completed can influence funders to provide further research money.

     However, feedback has indicated that provision of metadata from these sorts of data providers might be limited because of concerns about how they would distribute their data. Individuals, research centres and departments are generally not organised in such a way as to be able to administer distribution of data to people who approach them. While a researcher might be willing to burn data onto a CD as a one-off request, they would be less willing to do this on a regular basis.

     It seems clear that what is required is a cost-effective and easy way for those holding geospatial data to share their data with others within UK tertiary education. Three approaches might be considered.

     By depositing copies of data with archive organisations such as the UK Data Archive.

     The archive would maintain controls over the data on behalf of the owner and ensure the long-term safekeeping of the data. A key benefit is that the owner avoids the administrative tasks associated with external users and their queries. Potential users of the data typically find data through an on-line catalogue provided by the archive. Popular or large datasets may be available on-line; otherwise, on-line ordering systems are provided to order copies of datasets. If researchers and research centres did deposit their data with an archive, it would be important that the metadata records displayed by Go-Geo! recorded this fact and how to contact the archive.

     Through the use of a self-archiving service.
     Here we envisage the establishment of one or more self-archiving services which data producers/holders could use to publish data for use by others. The service would need to provide mechanisms for users to submit data, metadata and accompanying documentation (PDF, Word files etc.). Metadata would also be published, possibly using OAI, and therefore harvested and stored in the Go-Geo! catalogue.

     To date, self-archiving has been about depositing a digital document, typically a full-text document (see footnote 3), in a publicly accessible web site. Depositing involves a simple web interface where the depositor copies and pastes in the "metadata" (date, author-name, title, journal-name, etc.) and then attaches the full-text document. Consideration of the use of self-archiving for depositing copies of datasets seems to have been limited.

     DSpace (http://dspace.org) is an exception to this. DSpace is a specialised type of digital asset management system: it manages and distributes digital items, which include datasets, and allows for the creation, indexing, and searching of associated metadata to locate and retrieve the items. It is designed to support the long-term preservation of the digital material stored in the repository. DSpace is also designed to make submission easy: DSpace “Communities” (such as university departments, labs, and centres) can adapt the system to meet their individual needs and manage the submission process themselves. DSpace is an open-source system, available for anyone to download and run at any type of institution, organisation, or company. We might therefore envisage Go-Geo! running an instance of DSpace, into which those creating geospatial data deposit their data.

     Footnote 3: Frequently these documents are Eprints, digital texts of peer-reviewed research articles, before and after refereeing.

     Through provision of a peer-to-peer (p2p) application.

     Data holders/custodians would set up a p2p server on institutional machines (probably at an institutional or department level) and store data on them. Metadata would be published announcing the existence of servers and geospatial data. Metadata could also be published to the Go-Geo! catalogue. Users would use a p2p client or the Go-Geo! portal to search for data and then, having located a copy of the data, download it to their machine.

     Funding has been provided in the Phase 3 project to allow us to investigate options 2 and 3 and compare them against option 1. Whatever the result, a data sharing mechanism is seen as critical. Without it, the amount of geospatial data available for re-use may be limited.

     In conclusion, there are a variety of ways in which users can be given access to the data they discover using Go-Geo!. This reflects the resources available to the data creator/custodian to distribute the data. More investigation is required into the best means by which individuals and smaller organisations such as research groups/centres can share data with others. For larger datasets hosted and made available through national data centres and other large service providers, the technology already exists through which the Go-Geo! portal could support on-line data mining, visualisation, exploitation and analysis of geospatial data. However, issues of licence to use data in this way and funding to establish the technical infrastructure need to be resolved before further developments can take place.

     References

     Open GIS Consortium Inc: http://www.opengis.org/

     Open GIS Consortium Inc. (2002). Web Map Server Interfaces Implementation Specification, Version 1.1.1, OGC 01-068r3.

     Open GIS Consortium Inc. (2002). Web Feature Service Implementation Specification, Version 1.0, OGC 02-058.

     Open GIS Consortium Inc. (2002). Geography Markup Language (GML) Implementation Specification, Version 2.1.1, OGC 02-009.

     Seybold, P.A. (2002). Web Services Guide for Customer-Centric Executives, Patricia Seybold Group, Inc., Boston.