SPIN!, IST-99-10536, 15.06.1999
proposal/SPIN proposal final version.doc
Transcript of "proposal/SPIN proposal final version.doc"

Part B

B1. Title

Spatial Mining for Data of Public Interest
SPIN!
Proposal No. IST-1999-10536
Proposal for: IST programme, 1.1.2-5.1.4 Cross-Programme Action CPA4: New Indicators and statistical methods
B3. OBJECTIVES
B4. CONTRIBUTION TO PROGRAMME/KEY ACTION OBJECTIVES
B5. INNOVATIONS
STATE OF THE ART
TECHNOLOGICAL & SCIENTIFIC ADVANCES
DISTRIBUTION OF WORKLOAD ON WORK PACKAGES
INTRODUCTION TO WORKPACKAGES
RISK MANAGEMENT
PERT DIAGRAM
WORK PACKAGE DESCRIPTION
C2. CONTENTS FOR PART C
C3. COMMUNITY ADDED VALUE AND CONTRIBUTION TO EU POLICIES
C4. CONTRIBUTION TO COMMUNITY SOCIAL OBJECTIVES
C5. PROJECT MANAGEMENT
C6. DESCRIPTION OF THE CONSORTIUM
C7. DESCRIPTION OF THE PARTICIPANTS
GMD - GERMAN NATIONAL RESEARCH CENTER FOR INFORMATION TECHNOLOGY
DEPARTMENT OF INFORMATICS OF THE UNIVERSITY OF BARI
SCHOOL OF GEOGRAPHY AT THE UNIVERSITY OF LEEDS
THE INSTITUTE FOR INFORMATION TRANSMISSION PROBLEMS, RUSSIAN ACADEMY OF SCIENCES (IITP RAS)
DIALOGIS SOFTWARE & SERVICES GMBH, ST. AUGUSTIN, GERMANY
PROFESSIONAL GEO SYSTEMS B.V. (PGS), AMSTERDAM
GEOFORSCHUNGSZENTRUM, POTSDAM, GERMANY DESCRIPTION OF THE PARTNER
MANCHESTER METROPOLITAN UNIVERSITY/MIMAS
C8. ECONOMIC DEVELOPMENT AND SCIENTIFIC AND TECHNOLOGICAL PROSPECTS
APPENDIX - PUBLICATIONS OF PARTNERS CITED IN PART B
REFERENCES PARTNER P1 - GMD
REFERENCES PARTNER P2 - UNIVERSITY OF BARI
REFERENCES PARTNER P3 - IITP, RUSSIAN ACADEMY OF SCIENCES
REFERENCES PARTNER 4 - LEEDS
REFERENCES PARTNER P5 - DIALOGIS
REFERENCES PARTNER P6 - PGS
B3. Objectives

To develop an integrated interactive internet-enabled spatial data mining system.

Data mining systems (DMS) and geographical information systems (GIS) are complementary tools for describing, transforming, analysing and modelling data about real world systems. Most contemporary GIS offer only very basic spatial analysis and data mining functionality, and many are confined to simplistic analysis that involves comparing maps or descriptive statistical displays such as histograms and pie charts. There is growing demand for integrated geographical or spatial data mining systems (SDMS) from public and private sector organisations that need both enhanced decision making capabilities and innovative solutions to a wide range of problems. An integrated, user friendly SDMS operable over the internet offers exciting new possibilities for geographical research and spatial decision making. Thus the overall objective of SPIN! is to develop a state of the art, fully functional, truly integrated, internet-enabled, easily extendable and modifiable GIS-DMS platform, SPIN: a comprehensive and intuitive SDMS for data of public interest.

In recent years, a number of project partners have developed the technological components and scientific tools that are needed to develop the kernel of this type of SDMS. During this project these individual efforts and the associated expertise and experience will be united in a joint European effort. SPIN! Consortium partners from statistical offices and seismic research centres will use the system in applied research and provide feedback to direct the development efforts. The applications of SPIN will demonstrate the generic utility and additional benefits that this type of SDMS will have over existing technologies. Industrial partners will develop a business model for web-based information brokering with georeferenced statistical data, and estimate the likely economic impacts of the technology.

The following scenarios describe some of the wide ranging potential benefits that statistical analysts, environmental decision makers, seismic data experts, biodiversity researchers and other public and private sector users can expect from such a system, and introduce some of the main features that SPIN will include.

To improve knowledge discovery by providing an enhanced capability to visualise data mining results in spatial, temporal and attribute dimensions.

Imagine a statistical officer has to prepare a report describing unusual aspects of African demography inter-related with socio-economics and the physical environment. Suppose the officer first applies a data mining technique to classify all countries by death rate and life expectancy, and one of the resulting subgroups, with unusually high death rate and low life expectancy, contains only 51 countries in all, 40 of them African. Suppose the officer creates a statistical display of all the classified groups (Fig. 1) and then decides to map the geographical distribution of the unusual subgroup, distinguishing between African countries and those elsewhere (Fig. 2). The geographical distribution of the subgroups shown by the map may prompt ideas for further analysis. For instance, the analyst may wish to select sets of countries from the map to take a closer look at their demography and at other geographical variables that describe socio-economic and environmental conditions. In addition, the officer may wish to discover which demographic attributes best characterise each continent at different points in time, and to investigate which groups of demographic attributes have interesting spatio-temporal co-distributions and inter-relationships with other socio-economic and environmental variables.

All this analysis, some of which is quite complex, could be performed more quickly and easily if an integrated SDMS with a linked display component and reporting system were available. It would be a major benefit if the maps and other data displays were generated automatically from a knowledge base of statistical display and thematic data mapping, and if these displays were automatically linked, so that the information the officer is focussing on during the analysis is simultaneously highlighted in all the relevant displays. This type of linked GIS style display component will be developed as a fundamental part of the integrated visualisation component of SPIN, which would facilitate this kind of statistical analysis (see partner P1, publication 3).
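The linked displays described above can be illustrated with a small sketch. It assumes a shared selection object that every display subscribes to; the class names `SelectionModel` and `Display` and the country codes are hypothetical and are not taken from the proposal or from any partner's software.

```python
from typing import Callable, Iterable, List, Set


class SelectionModel:
    """Shared selection observed by all linked displays (hypothetical name)."""

    def __init__(self) -> None:
        self._listeners: List[Callable[[Set[str]], None]] = []

    def subscribe(self, listener: Callable[[Set[str]], None]) -> None:
        self._listeners.append(listener)

    def select(self, ids: Iterable[str]) -> None:
        # Notify every display; each receives its own copy of the selection.
        selected = set(ids)
        for listener in self._listeners:
            listener(set(selected))


class Display:
    """Stand-in for a map or chart; it only records what it should highlight."""

    def __init__(self, name: str, model: SelectionModel) -> None:
        self.name = name
        self.highlighted: Set[str] = set()
        model.subscribe(self._on_selection)

    def _on_selection(self, ids: Set[str]) -> None:
        self.highlighted = ids


model = SelectionModel()
map_view = Display("map", model)
chart_view = Display("subgroup chart", model)

# Selecting countries in any one place highlights them in every display.
model.select({"KE", "TZ"})
```

Routing all selections through one shared object keeps the displays independent of each other: a new display type only needs to subscribe, which is one plausible way to obtain the simultaneous highlighting the scenario asks for.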
Figure 1. Descriptions of interesting subgroups

Figure 2. Visualisation of the subgroup

To develop new and integrated ways of revealing complex patterns in spatio-temporally referenced data that were previously undiscovered using existing methods.

Suppose an environmental decision maker is asked to look for relations between lung cancer and environmental pollution. What may be desired initially is some kind of exploratory spatial data analysis (ESDA) technique that automatically detects unusual spatial clustering of lung cancer incidence in the entire data set and for specific time periods. Additional spatial and aspatial analysis methods might then be used to try to explain any unusual spatial clustering patterns observed, using a range of other spatio-temporal and aspatio-temporal variables. In SPIN, exploratory spatio-temporal pattern analysis techniques derived from existing ESDA tools will be integrated with a wide variety of temporal, spatial and aspatial analysis methods. Partner P4 has developed a suite of ESDA tools that detect unusual clusters of incidence and produce mappable output that reveals the clustering pattern. Temporal versions of these tools and outputs will be developed, along with the mechanisms for exporting the results of the analysis into other temporal, spatial and aspatial data mining techniques. Having all the tools available in one integrated SDMS would allow the decision maker to perform an in-depth spatio-temporal analysis quickly and thereby help develop understanding of the geographical processes and inter-relationships that may result in an increased risk of contracting lung cancer. The analytical speed-up will allow the decision maker to generate and test more hypotheses regarding the observed spatial, temporal and spatio-temporal patterns and to investigate even more advanced hypotheses about causal relationships.
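To make the idea of automatically detecting unusual spatial clustering concrete, here is a minimal sketch of a circle-based scan. It is not a description of partner P4's tools: the `Area` record, the Poisson test against the overall incidence rate, the choice of radii and the toy data are all assumptions made for illustration, and no correction for multiple testing is attempted.

```python
import math
from typing import List, NamedTuple, Sequence, Tuple


class Area(NamedTuple):
    x: float          # centroid coordinates (arbitrary units)
    y: float
    population: int
    cases: int


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    term = math.exp(-expected)        # P(X = 0)
    below = 0.0
    for k in range(observed):
        below += term                 # accumulate P(X = k)
        term *= expected / (k + 1)    # P(X = k + 1)
    return max(0.0, 1.0 - below)


def scan_for_clusters(
    areas: Sequence[Area], radii: Sequence[float], alpha: float = 0.01
) -> List[Tuple[float, float, float, float]]:
    """Return (x, y, radius, p) for circles with unusually many cases."""
    rate = sum(a.cases for a in areas) / sum(a.population for a in areas)
    hits = []
    for centre in areas:
        for r in radii:
            inside = [a for a in areas
                      if math.hypot(a.x - centre.x, a.y - centre.y) <= r]
            observed = sum(a.cases for a in inside)
            expected = rate * sum(a.population for a in inside)
            p = poisson_upper_tail(observed, expected)
            if p < alpha:
                hits.append((centre.x, centre.y, r, p))
    return hits


# Toy data: a 3 x 3 grid of areas with one raised count in the middle.
areas = [Area(float(i), float(j), 1000, 20 if (i, j) == (1, 1) else 1)
         for i in range(3) for j in range(3)]
hits = scan_for_clusters(areas, radii=[0.5])
```

Each hit is a circle that could be drawn on a map, which is the kind of mappable output the scenario refers to. Running the same scan on data restricted to a time window would give a simple temporal variant.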
To enhance decision making capabilities by developing interactive GIS techniques, which provide an integrated exploratory and statistical basis for investigating spatial patterns.

Seismic data experts regularly use GIS to help them spot geoenvironmental data patterns related to seismic activity. However, the complexity of geoenvironmental processes and the noise in the spatial patterns of these variables make it very difficult to compare seismic maps objectively with other geoenvironmental maps and to identify interesting patterns and relationships. To reduce the likelihood of becoming overly subjective, a seismologist may wish first to classify and select groups of areas with similar geoenvironmental characteristics, and then to perform statistical tests to investigate general differences in the localised distributions of selected areas belonging to the same geoenvironmental group in the classification. An interactive version of SPIN will aid the seismologist in the process of classifying and selecting these areas and in performing the statistical tests. Because this analysis task is simplified, the user can focus on looking for interesting patterns and on testing a great number of alternative hypotheses.

To deepen the understanding of spatio-temporal patterns by visual simulation.

Imagine a biodiversity researcher wants to investigate the migratory flight route of a flock of storks travelling from Europe to Africa. Suppose the researcher uses a global positioning system (GPS) to track the progress of these birds and wishes to visually simulate the migration in order to provide an overview of the migratory route, show the speed of different parts of the journey and identify areas where the storks rested along the way. SPIN will provide the capability to develop and play back this type of simulation over the internet. The same technique can be applied in many other areas: for example, logistics companies may want to use it to help keep track of orders and optimise transport routes, and transport planners may want it to aid the development of integrated transport networks.

To publish and disseminate geographical data mining services over the internet.

Suppose the various analysts described above (i.e. the statistical officer, the environmental decision maker, the seismic data expert and the biodiversity researcher) want to distribute their results quickly and cost effectively to encourage similar applications and promote world-wide scientific exchange of their research. Furthermore, suppose they want to publish both the conclusions and the details of their entire geographical data mining investigation so that other similar research can extend, generalise and build on their analyses. Imagine also that these researchers want to enable others to access and use the same analysis tools that were available to them. To realise all of this, they would probably need a relatively automatic way to plug their specific application into a Java-based internet enabled SDMS. This would then enable anyone with a standard web-browser to replicate and perform similar analyses wherever and whenever desired (see partner P4, publications 2 and 9; partner P1, publications 1, 2; partner P3, publications 1, 2). The proposed SDMS, SPIN, will provide this type of capability in an integrated, organised fashion.

B4. Contribution to programme/key action objectives

The proposal contributes to the IST programme objective of building key, user-friendly applications that enable the potential of the information society in several ways:

Merging data mining and GIS based technology offers exciting new possibilities for spatial data research that is applicable in a wide variety of problem domains. Much expert geographical analysis has been restricted by prescribing in advance, and then exclusively following, either a statistical or a GIS based approach. When both approaches have been applied, error prone and cumbersome data transfer between different applications has been necessary. Nonetheless, useful information has been extracted from georeferenced data much more effectively by employing both approaches simultaneously.
Clearly, an integrated SPIN will facilitate such analysis and help to develop understanding of a wide range of geographical processes faster, enhancing research and decision making in diverse application areas.

SPIN will provide a user friendly interface to advanced data mining functionality, GIS and exploratory spatial data analysis tools that can be accessed via the internet. The system will enable quick and cost effective dissemination of information via the internet and enhance web-based research capabilities.

The objective of nurturing emergent technologies is supported by the development of an innovative business model. A web-based brokering service is proposed that is designed to add value to the dissemination of data and information, providing a key to the commercialisation of the software and the service it facilitates.

The proposal contributes to CPA4 (New indicators and statistical methods) by developing new tools for extracting information from data by adapting data mining functions specifically for spatial analysis. This includes adapting methods from Bayesian statistics, machine learning and other adaptive techniques so that they can be launched from an integrated environment, which assists experimental comparison of their relative strengths and weaknesses.

A further contribution to CPA4 derives from developing technology for the user-friendly dissemination of statistical data. SPIN will enable the dissemination of interactive statistical maps and provide data mining services over the internet, where the users need nothing but a standard web-browser such as Netscape or Internet Explorer. Many of the problems relevant to this use of SPIN will be addressed in an application that aims to facilitate the analysis of census data over the internet. The proposed web-based brokering service aims to go even further by enhancing the user-friendly and cost-effective dissemination of data.

The proposed system will be generic and easily adaptable to diverse application areas, and the research is specifically relevant to the following key actions of the cross-programmatic action (CPA) of the IST programme:

Key Action I.4: Systems and services for citizen administration; systems enhancing the efficiency and user-friendliness of administrations. This is addressed in work package WP9 by the application to develop user friendly dissemination of statistical data.

Key Action I.5: Intelligent environmental monitoring and management systems; environmental risk and emergency management systems (in conjunction with hazards and earth observation). These are addressed in work package WP8 by an application of the proposed system to the analysis of seismic and volcano data.

Key Action II.3.2: New methods of work and electronic commerce. New market mediation systems, to develop innovative market place concepts and technologies. This will be addressed in the web-based brokering application in work package WP9.

Key Action II.4.3: Digital object transfer. This will be addressed by a specific task within work package WP2 that aims to develop efficient and appropriate means of distributing data and maps over the internet.

Key Action III.1: The future priority action line concerning geographic information is also clearly addressed.

B5. Innovations

State of the Art

Contemporary GIS are monolithic closed systems that can be difficult to use and are usually very expensive. In the last few years a new generation of GIS has been emerging that enables interactive, dynamic maps to be disseminated via the Internet (see partner P1, publications 1, 3; partner P4, publication 4; partner P3, publications 10, 11). So far, most of these systems are confined to projecting descriptive statistical displays, such as histograms or pie charts, onto geographical space (maps). As decision making and inference using these projected map displays is not always straightforward, data mining offers great potential benefits. The range of application areas is huge: there are many different types of applications in statistical analysis, urban planning, environmental decision making and geomarketing, for example.

Largely unconnected to GIS research, a wide range of analysis techniques now commonly referred to as data mining functions have been developed. These data mining functions are extensions of analytical techniques known for decades and have been packaged in various ways to form a large number of essentially very similar data mining systems (DMS). Some DMS provide user friendly interfaces and visual programming environments that the non-expert can use to help automate the search for hidden patterns in large databases. Interest in DMS has boomed in recent years, partly as a result of the packaged nature of the technology and improving graphical user interfaces, but mainly because of the desperate need for commercial enterprises to make returns on often large investments in data warehouses.

Since the GIS revolution in the early 1980s there has been an explosion of geographically referenced information forming a rapidly expanding geocyberspace (see partner P4, publication 1), wherein much of the data is also temporally referenced. Commercial enterprises and government organisations have been swamped by this data explosion, with few tools to extract useful information that can be applied in decision making contexts to solve problems and improve their function. By combining the strengths of GIS and DMS, the proposed SDMS, SPIN, will have even greater functionality and should be a huge help to decision makers and spatial analysts charged with the task of backing up their intuitive insights using real world data. Some of the integrated components not currently present in either GIS or DMS include exploratory spatial data analysis methods that search for geographical patterns and relationships in complex space-time-attribute domains.

Extending and integrating GIS and DMS to develop an internet enabled geographical data mining system is a logical progression for spatial data analysis technology. This development is poised to play a major role in the proposed terms of reference 1999-2003 of the Commission on Visualisation and Virtual Environments of the International Cartographic Association (MacEachren and Kraak 1999; see footnote 1), and it can be expected that a great deal of research effort will be needed to this effect in coming years.
Footnote 1: See the following URL for details: http://www.geovista.psu.edu/ica/icavis/terms.html

DMS and GIS are quite complex tools with wide ranging functionality and capabilities, so the SPIN! Consortium does not propose to start from scratch, but to build on existing tools. Many of these existing tools have been developed by various partners during 4th framework research, and many have passed the prototype stage and have well established user communities. One major advantage of the SPIN! Consortium is that the software developers will have access to the source code of all the various module components, which facilitates a seamless integration of all the technology in SPIN. (This would not be possible if the system were to be developed on top of third party proprietary products.) The system will be based on open standards such as Java and TCP/IP.

The evolutionary prototype development approach proposed has many benefits. Users will be able to provide feedback on SPIN prototype requirements and performance throughout the project (starting from day one), and progressive prototype versions of the system will guide the development effort to fulfil user expectations by the end. The early development of prototypes is known to be one of the most effective counter-measures to limit the risks of such software development.

Technological & Scientific Advances

First system that tightly integrates state of the art GIS and data mining functionality in an open, extensible, internet-enabled plug-in architecture.

The system will integrate a rich functionality: a data mining platform (see partners P1 and P5, publication 10); an internet enabled tool for interactive manipulation of statistical maps (P1, publications 1, 2); an application for exploratory spatial data analysis (partner P4, publication 2); new modules for spatial data mining (see below); new modules for visualising temporal data and spatial data mining results; and a Java based GIS (partner P6, publication 1). The generic system architecture is easily adaptable to diverse application areas such as seismic data analysis and hazard management, environmental decision making, and census data dissemination.

Adapting machine learning methods to spatial analysis.

It is generally accepted that there is currently no single data mining or machine learning method that is efficacious in every case. Available methods differ in complexity, representational power, accuracy, scalability, comprehensibility, their ability to cope with noise and missing values, and many other factors. Methods based on different approaches make different assumptions about the data being analysed, which may not matter in some cases and may be totally inappropriate in others. It is therefore important that users have access to a variety of spatial data mining methods, together with help in choosing and combining whichever methods seem most appropriate for their task. In developing SPIN we will advance the state of the art in spatial data mining in several ways.

Symbolic machine learning methods will be adapted to spatial data analysis, in particular inductive logic programming (ILP) algorithms for the discovery of subgroups and spatial association rules. Efficient methods for the discovery of (non-spatial) association rules have been proposed in the field of data mining, most of which can deal with propositional, or zeroth-order, representations; however, they are unsuitable for expressing higher order spatial relationships. ILP is based on first-order predicate logic, which allows for the representation of relations such as adjacent_to, inside, and close_to. This makes ILP a natural and promising approach to many forms of spatial data mining.

Methods for the induction of first-order rules have been extensively investigated within ILP. Some of these methods have already been applied to the automated interpretation of topographic maps (see partner P2, publications 2, 3). In this case, symbolic first-order descriptions of cells of a map are automatically extracted from a vector representation of maps stored in an object-oriented database. Intelligent map feature extraction is a challenging task.
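As a rough illustration of why relations such as adjacent_to matter, the sketch below evaluates one fixed spatial association rule over a handful of ground facts. It only scores a given rule and does not perform ILP-style rule induction; the objects, facts, predicate names other than adjacent_to, and the rule itself are invented for the example.

```python
from typing import Set, Tuple

# Hypothetical ground facts about map objects.
towns: Set[str] = {"t1", "t2", "t3", "t4"}
motorways: Set[str] = {"m1", "m2"}
adjacent_to: Set[Tuple[str, str]] = {("t1", "m1"), ("t2", "m1"), ("t3", "m2")}
high_density: Set[str] = {"t1", "t2", "t4"}


def rule_stats(
    towns: Set[str],
    motorways: Set[str],
    adjacent_to: Set[Tuple[str, str]],
    high_density: Set[str],
) -> Tuple[float, float]:
    """Support and confidence of the rule
    town(X) AND adjacent_to(X, Y) AND motorway(Y) => high_density(X).
    """
    # Towns X for which some motorway Y with adjacent_to(X, Y) exists.
    body = {x for (x, y) in adjacent_to if x in towns and y in motorways}
    both = body & high_density
    support = len(both) / len(towns)
    confidence = len(both) / len(body) if body else 0.0
    return support, confidence


support, confidence = rule_stats(towns, motorways, adjacent_to, high_density)
```

The rule body quantifies over a second object Y that is related to X, which is what a single-table propositional representation cannot state directly without first precomputing a derived attribute such as "is adjacent to a motorway".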
Advances in this field would open new possibilities for enhancing intelligent automated map design; also first-order descriptions of maps could be fed into (future) first-order learning systems as background knowledge, e.g. for topographically informed subgroup discovery. Combining the expressive power of first-order learning methods with the coherence and scalability of Bayesian statistics. First-order machine learning methods tend to be search intensive, and when dealing with large sets of data and highly dimensional dependencies, scalability might become a problem. To overcome this problem, we will investigate how scalability can be improved by the use of adaptive sampling, i.e. active learning techniques based on Bayesian Decision Theory. This will also help to bridge the gap between first-order learning and statistics. Applies advanced Bayesian classification, prediction, and interpolation to spatial data. In the last years computationally intensive Bayesian methods have been developed that compare favourably with classical approaches. Instead of selecting an “optimal” model they generate a whole distribution of models which characterise their uncertainty in the light of the available data. On the one hand they derive predictive distributions for new inputs reflecting the actual uncertainty and information. On the other hand they allow a rigorous assessment of the adequacy of different model types. This method has already been successfully applied by partner P1 (see partner P1, publication x13) to credit scoring and will now be adapted to spatial data. Automating the exploratory spatial data analysis of geographical data. Various exploratory spatial data analysis tools have been developed by partner P4 (see partner P4, publication 2) and made available for research via the internet. However the current format of the application may be criticised in that it is not user-friendly enough, and users are restricted to a select few input and output data formats. 
The search methods used in it are unintelligent brute-force heuristics that could be improved by applying artificial intelligence methods to direct the search. Early experiments by partner P4 indicate that there is great potential for these heuristics, especially when analysing data in a multi-attribute space-time-attribute tri-space (see partner P4, publication 3). By improving the quality of the search procedure, we expect that much larger and more complex data sets can be investigated in a scalable way. To address the need for the system to communicate with other packages, both local and remote, the tool will use CORBA for data input and results output. Partner P4 also plans to develop improved visualisation tools that let users view the outputs in an easy and obvious way, aiding their understanding of the results instead of hampering it as many current tools do.
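To make the notion of a brute-force search concrete, the sketch below exhaustively tests circles on a regular grid and flags those whose case rate is well above the overall rate. It is a generic illustration under assumed inputs (the point records, grid extent, radii, and threshold are all hypothetical) and does not reproduce partner P4's tool.

```python
import math

def circle_scan(points, extent, step, radii, threshold):
    """Test every grid-centred circle; return those with an elevated case rate.

    points    -- list of (x, y, cases, population) records (hypothetical data)
    extent    -- (xmax, ymax) of the study area, with the origin at (0, 0)
    threshold -- flag a circle if its rate >= threshold * overall rate
    """
    total_cases = sum(c for _, _, c, _ in points)
    total_pop = sum(p for _, _, _, p in points)
    overall = total_cases / total_pop
    hits = []
    for i in range(int(extent[0] / step) + 1):
        for j in range(int(extent[1] / step) + 1):
            cx, cy = i * step, j * step
            for r in radii:
                cases = pop = 0
                for x, y, c, p in points:
                    if math.hypot(x - cx, y - cy) <= r:
                        cases += c
                        pop += p
                if pop > 0 and cases / pop >= threshold * overall:
                    hits.append((cx, cy, r, cases / pop / overall))
    return hits

# Two neighbouring high-rate locations and three low-rate ones.
points = [
    (1, 1, 9, 100), (1.5, 1, 8, 100),
    (5, 5, 1, 100), (6, 5, 1, 100), (5, 6, 1, 100),
]
hits = circle_scan(points, extent=(6, 6), step=1, radii=[1.0], threshold=2.0)
centres = {(cx, cy) for cx, cy, _, _ in hits}
```

The cost grows with the number of grid centres, radii, and records, which is why directing the search more intelligently is attractive for larger data sets.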
Using knowledge-based systems technology to bring expertise in thematic cartography to bear on the visual mining of spatial and temporal data. There is currently a recognised need to combine cartographic visualisation (that is, building maps to facilitate visual data exploration) with data mining (see, for example, the special issue of Int. J. Geographical Information Science on Visualization for Exploration of Spatial Data, v.13(4), June 1999). Within the project we plan to develop both a cartographic interface for preparing (selecting, preprocessing, etc.) data for data mining and an interactive map presentation of data mining results, dynamically linked with specially designed non-geographic illustrations. Special attention will be paid to the interactivity of maps and other graphical displays and to the visualisation and analysis of the temporal aspect of data.

Using new techniques for the efficient distribution of large maps over low-bandwidth networks. Special attention will be given to developing efficient mechanisms that reduce the amount of data that has to be transferred between the client and the server.
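One way the amount of transferred map data could be reduced is to simplify line geometry before sending it. The sketch below uses the well-known Douglas-Peucker algorithm purely as an assumed example; the proposal does not state which mechanisms will be used, and the sample polyline and tolerance are hypothetical.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)

def simplify(points, tolerance):
    """Douglas-Peucker: drop vertices that deviate less than `tolerance`."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [first, last]
    # Keep the farthest vertex and simplify both halves recursively.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right

line = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 3), (5, 3.02), (6, 3)]
simplified = simplify(line, tolerance=0.1)
```

A larger tolerance removes more vertices, so the tolerance could in principle be matched to the zoom level requested by the client.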
B1. Workpackage list

Work-     Workpackage title                              Lead        Person-   Start    End      Phase  Deliverable
package                                                  contractor  months    month    month    (7)    No (8)
No (2)                                                   No (3)      (4)       (5)      (6)

WP1       Coordination                                   P1          34        0        36       -      D1.1-1.4
WP2       Identify user needs, define and realize a      P1          69        0        36       -      D2.1-2.6
          generic system architecture that integrates
          GIS and Data Mining functionality
WP3       Extend machine-learning methods to spatial     P2          42        0        36       -      D3.1-3.9
          mining
WP4       Generalize Bayesian Markov Chain Monte Carlo   P1          40        0        36       -      D4.1-4.7
          to spatial mining
WP5       Adapt and integrate methods for spatial        P4          40        0        36       -      D5.1-5.7
          pattern analysis
WP6       Develop support of visual analysis of          P1          40        0        36       -      D6.1-6.6
          time-dependent spatial data
WP7       Develop methods for visualization of Data      P1          40        0        36       -      D7.1-7.6
          Mining results within GIS
WP8       Application to seismic and volcano data        P7          70        0        36       -      D8.1-8.9
WP9       Application to web-based dissemination of      P8          49        0        36       -      D9.1-9.6
          data from statistical offices

(2) Workpackage number: WP 1 - WP n.
(3) Number of the contractor leading the work in this workpackage.
(4) The total number of person-months allocated to each workpackage.
(5) Relative start date for the work in the specific workpackages, month 0 marking the start of the project, and all other start dates being relative to this start date.
(6) Relative end date, month 0 marking the start of the project, and all end dates being relative to this start date.
(7) Only for combined research and demonstration projects: Please indicate R for research and D for demonstration.
(8) Deliverable number: Number for the deliverable(s)/result(s) mentioned in the workpackage: D1 - Dn.
WP10      Develop a business model for web based         P6          24        0        36       -      D10.1-10.5
          information and service brokering with
          geo-referenced data
WP11      Dissemination                                  P8          38        0        36       -      D11.1-11.5
TOTAL                                                                482

Distribution of Workload on work packages

Partner             P1 P2 P3 P4 P5 P6 P7 P8 Total
Coord       WP1     28 6 34
Techn. Dev. WP2     30 2 9 18 10 69
ML          WP3     18 24 42
Bayes       WP4     30 4 6 40
ESDA        WP5     36 36
Vis. Spa-T  WP6     28 12 40
Vis. DM     WP7     28 12 40
Seis.Dat    WP8     3 18 3 2 12 32 70
Stat. Off.  WP9     3 6 2 4 34 49
Web-Brok.   WP10    2 12 10 24
Dissem.     WP11    2 8 2 14 4 8 38
                    172 24 20 96 36 56 36 42 482
B2. Deliverables list

No (9)  Deliverable title                                           Delivery  Nature  Dissemination
                                                                    date (10) (11)    level (12)

D1.1    Project workplan                                            3         R       PU
D1.2    Reports for EC                                              period.   R       PU
D1.3    Project handbook                                            6         R       PU
D1.4    Project meetings                                            period.   R       PU
D2.1    System design document                                      8         R       CO
D2.2    Prototype 0 (incl. documentation)                           12        P       CO
D2.3    Implementation of efficient methods for map transfer        15        P       CO
D2.4    Prototype 1 (incl. documentation)                           18        P       CO
D2.5    Prototype 2 (incl. documentation)                           30        P       CO
D2.6    Revision Release Prototype 2 (incl. documentation)          32        P       CO
        (Final Release)
D3.1    Theoretical report on spatio-temporal subgroup discovery    6         R       PU
D3.2    Theoretical report on adaptive sampling                     21        R       PU
D3.3    Theoretical report on spatial association rules             5         R       PU
D3.4    Specifications of the descriptions to be automatically      15        R       CO
        extracted from vectorized maps
D3.5    Implementation of subgroup discovery                        8         P       CO

(9) Deliverable numbers in order of delivery dates: D1 - Dn.
(10) Month in which the deliverables will be available. Month 0 marking the start of the project, and all delivery dates being relative to this start date.
(11) Please indicate the nature of the deliverable using one of the following codes: R = Report, P = Prototype, D = Demonstrator, O = Other.
(12) Please indicate the dissemination level using one of the following codes:
     PU = Public
     PP = Restricted to other programme participants (including the Commission Services).
     RE = Restricted to a group specified by the consortium (including the Commission Services).
     CO = Confidential, only for members of the consortium (including the Commission Services).
D3.6    Implementation of adaptive sampling for subgroup            23        P       CO
        discovery
D3.7    Implementation of spatial association rules                 11        P       CO
D3.8    Software for the extraction of symbolic descriptions from   18        P       CO
        vectorized maps
D3.9    Report evaluating the application of first-order learning   36        R       PU
        methods to spatial data
D4.1    Report reviewing current Bayesian approaches                6         R       PU
D4.2    Software Implementation for bootstrap                       11        P       CO
D4.3    Report on advanced spatial models and corresponding         15        R       PU
        Bayesian models
D4.4    Implementation of MCMC                                      18        P       CO
D4.5    Implementation of model selection                           28        P       CO
D4.6    Performance evaluation and guidelines                       36        R       PU
D4.7    Generic software library for spatial data transformations   6         P       CO
D5.1    Theoretical paper on algorithms for handling interaction    5         R       PU
        with spatial location
D5.2    Software for handling interaction with spatial location     11        P       CO
D5.3    Theoretical paper evaluating statistical clustering tests   14        R       PU
D5.4    Implementation of selected statistical clustering tests     18        P       CO
D5.5    Theoretical paper on algorithms for multiple search         24        R       PU
D5.6    Implementation of algorithms for multiple search            30        P       CO
D5.7    Reports on testing and evaluation of Spatial Analysis       36        R       PU
        software tool
D6.1    Rule base on application of visualisation and interaction   16        P       CO
        techniques depending on characteristics of data and the
        type of their time variation.
D6.2    Software library implementing the proposed methods          26        P       CO
D6.3    Expert system engine performing selection of methods        30        P       CO
        according to characteristics of data
D6.4    Theoretical paper on algorithms for investigation of        18        R       PU
        temporal changes
D6.5    Implementation of algorithms for investigation of temporal  24        P       CO
        changes
D6.6    Evaluation report                                           36        R       PU
D7.1    Description of the presentation methods proposed to apply   6         R       PU
        to results of the considered data mining methods
D7.2    Implementation of visualization method for subgroup         11        P       CO
        discovery
D7.3    Implementation of visualization method for spatial          12        P       CO
        association rules
D7.4    Implementation of visualization method for Bayesian         17        P       CO
        classification
D7.5    Implementation of best-practice methods for visualisation   17        P       CO
        in ESDA
D7.6    Report on current & potential application methods in        36        R       PU
        ESDA
D8.1    Definition of user requirements                             3         R       PU
D8.2    Description of the methods of space-time analysis and data  10        R       PU
        mining of seismic data
D8.3    Description of the methodology for designing seismic        15        R       PU
        hazard information models
D8.4    Software implementing the proposed methods within the       26        P       CO
        SPIN! architecture
D8.5    Evaluation report                                           24        R       PU
D8.6    Application of the software tools to the seismic active     34        P       CO
        Eastern Mediterranean region
D8.7    Application of the software tools to the high risk Merapi   36        P       CO
        volcano
D8.8    Integration of continuous monitoring data into the          36        P       CO
        analysis process
D8.9    Report on the application of Spatial Mining to seismic and  36        R       PU
        volcano data
D9.1    User requirements document for dissemination of             3         R       PU
        statistical data
D9.2    Description of data model                                   12        R       CO
D9.3    A prototype web site with interactive thematic maps that    16        P       CO
        can be accessed over the internet
D9.4    Prototype web-site based on SPIN prototype 2                30        P       CO
D9.5    Report about different user acceptance, recommendation      24        R       PU
        for use, etc.
D9.6    Report: recommendation of use                               36        R       PU
D10.1   Define requirements for web-brokering                       3         R       PU
D10.2   Report describing existing brokering services, business     8         R       PU
        model and property of rights problematic
D10.3   Report addressing technical infrastructure                  24        R       CO
D10.4   Prototype web-site for web-brokering                        30        R       PU
D10.5   Final report on web-brokering                               36        R       CO
D11.1   Project web page                                            3         R       PU
D11.2   Project description for the general public                  2         P       PU
D11.3   First dissemination workshop                                24        O       PU
D11.4   Second dissemination workshop                               36        O       PU
D11.5   Feasibility study about commercialization                   33        R       PU
Introduction to workpackages

The workpackages fall into several categories: technology development, research, application, and exploitation. Figure 3 shows the main dependencies between the workpackages, but does not display the feedback mechanisms which will be set up between all workpackages, as described in the section about project management.

Building a spatial mining system is a demanding task. It requires expertise in many fields including Geographic Information Systems, Cartography, Statistics, Machine Learning, and Databases, as well as excellent software engineering skills. The consortium has been carefully chosen to ensure uncompromising competence in all these areas. It includes two industrial partners active in Data Mining and Geographic Information Systems (partners P5 and P6), a university and a national research center active in the areas of Data Mining, Machine Learning, and GIS (partners P2 and P1), an institute for geography active in Exploratory Spatial Data Analysis since the 1980s (partner P4), a university having a leading role in the dissemination of statistical data (partner P8), and two institutes active in seismic data research (partners P3 and P7). Each partner in the consortium has a unique area of competence not shared by the others, and brings its expertise as well as its technologies into the consortium.

[Figure: work packages grouped into the categories Technology, Research, Application, and Exploitation. Boxes: Design, integrate GIS & DM platform; Adapt Bayes Markov Chain Monte Carlo to Spatial Mining; Develop, adapt Machine Learning algorithms to Spatial Mining; Develop, adapt Spatial Point Pattern Analysis; Methods for spatio-temporal visualization; Visualization of Data Mining results; Extending system for application to Seismic Data; Application to Statistical Offices; Web-Based Information Brokering; Coordination; Dissemination.]

Figure 3. Main dependencies between work packages.
Risk management

Many research and technology development projects fail because the typical risks of such a project are not taken into account. The workplan has therefore been designed to address typical causes of failure in advance. The main approaches taken towards risk management are:

- software reuse and incremental evolution of existing technology
- modular design of software components (plug-in architecture)
- strong user involvement
- early delivery of prototypes

Involving users at all stages of the system's development is of utmost importance. Once an original prototype has been delivered for users to evaluate and to suggest generic design modifications, development will proceed by iterative improvement of incremental versions of the system. The users will be involved in defining the system analysis requirements and in designing and testing the system right from the start. The users are responsible for providing evaluation reports, which serve as input to specific system design modifications.

Since important modules of the final system already exist in a preliminary and non-integrated form, the users will be trained in using the individual systems at an early stage. This will help to shape their expectations and provide valuable feedback to the software developers. The users in work package WP9 already use the GIS technology developed by partner P1, so they can formulate specific requirements at an early stage, minimising the likelihood that generic system requirements will undergo continuous change.

The base integrating system platform will be an object-oriented, plug-in style architecture to facilitate technological integration. The dependencies between work packages are reduced, as plug-in components can be incorporated incrementally as they become available. In this way, revisions to the internal structure of either the client or the server should not affect the other parts.
CORBA and RMI will be evaluated as integrating middleware. Strong modularization should minimise the dangers of integrating technology developed separately by different groups. If for some reason one module were not delivered on time, this would not necessarily affect the implementation of other modules. Since partners P1, P3, and P4 have already implemented major parts of the existing technology in Java, the risk of technology integration problems is low.

The Unified Modelling Language (UML) will be used for documentation and design to ensure product quality.

Potential performance bottlenecks should be easy to spot at an early stage by applying the existing technology to test data provided by the users. The system needs to be interactive, and users should not be made to wait too long for analysis results. Performance issues are addressed in a special task within WP2.

Our approach to risk management has been tightly integrated within the overall technology development cycle of SPIN. Since an evolutionary approach containing several iterations has been chosen, all work packages start at the kick-off meeting and end with the final workshop.
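The plug-in idea described in this section can be sketched as follows. This is a minimal illustration, not the SPIN architecture: the class names, the registry, and the example plug-in are hypothetical, and Python is used only for brevity although the existing technology is implemented in Java.

```python
from abc import ABC, abstractmethod

class MiningPlugin(ABC):
    """Hypothetical contract that every analysis module would implement."""
    name: str

    @abstractmethod
    def run(self, records):
        """Analyse an iterable of records and return a result."""

class PluginRegistry:
    """Base-system side: knows plug-ins only through the contract above."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        if plugin.name in self._plugins:
            raise ValueError(f"plug-in {plugin.name!r} is already registered")
        self._plugins[plugin.name] = plugin

    def available(self):
        return sorted(self._plugins)

    def run(self, name, records):
        if name not in self._plugins:
            # A module that was not delivered affects only requests for it.
            raise LookupError(f"plug-in {name!r} is not available")
        return self._plugins[name].run(records)

class MeanByRegion(MiningPlugin):
    """Example plug-in: average of a value per region."""
    name = "mean_by_region"

    def run(self, records):
        totals = {}
        for region, value in records:
            s, n = totals.get(region, (0.0, 0))
            totals[region] = (s + value, n + 1)
        return {region: s / n for region, (s, n) in totals.items()}

registry = PluginRegistry()
registry.register(MeanByRegion())
result = registry.run(
    "mean_by_region", [("north", 2.0), ("north", 4.0), ("south", 1.0)]
)
```

Because the registry depends only on the shared contract, a module can be added when it becomes available without changing the base system or the other modules.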
Gantt Chart
Main stages of technology development cycle

Each entry gives the month, the event, and the description of the event.

Month 1: Kick-off meeting. A kick-off meeting will be held, where the users are informed in detail about the prospects of developing an SDMS, where alternative approaches will be discussed, and where the users will articulate specific expectations and requirements for the system. There will also be a tutorial session on Spatial Mining based on the existing technology.

Month 3: User requirements report. The developer teams and the users will jointly define the user requirements report, which is due by month 3 and for which the users are responsible. This will be a major input for the system design.

Month 5: Test applications. The existing, non-integrated systems will be applied to example data sets to further clarify user needs, to spot performance bottlenecks at an early stage, etc.

Month 8: Design specification. The design specification is due in month 8. It is located mainly in WP2, but all work packages will contribute from their perspective. The report defines the intended applications on a detailed level. On the basis of this document, the integration of the existing technologies will start and they will be merged into a single, coherent architecture.

Month 12: Developer version (prototype 0). A developer version (prototype 0) is due by month 12. This will be used for integrating the modules developed in WP3-7, which will start at month 12. Users will get access to this version as a technology preview.

Month 15: Revised system design document. Initial feedback from users and developers will be used for making a revised system design document, which is due in month 15.

Month 18: Prototype 1. The revised design will be used for developing prototype 1, which is due in month 18. In this prototype, functionality from all work packages WP3-WP7 will be integrated; however, some functionality will still be missing (e.g. adaptive sampling for subgroup discovery in WP3). This prototype will be delivered to the users, who will use it in their experimental applications.

Month 24: User evaluation report. Users will evaluate whether the system meets the requirements specified in the user requirements, and whether it meets the system design. The users will write an evaluation report, which is due in month 24. In this month, an external workshop will be held (WP11), where additional user groups and partners for commercial exploitation (WP10) will be targeted. Users will have installed web-sites that are accessible internally and in part externally, which will feature initial applications of the technology.

Month 27: Final design document revision. The user evaluation of prototype 1 will lead to modifications of the system design; the final design document will be delivered in month 27.

Month 30: Prototype 2. The final design document will be input for the development of prototype 2, which is due in month 30. It will integrate all technology developed in work packages WP3-WP7, and will be delivered to the users. With the full functionality available, the users will work intensely on their applications. The web-sites should be publicly accessible, so that feedback from a wider audience can be gathered.

Month 32: Revision release of prototype 2. Experience in applications will lead to a revision release of prototype 2 in month 32. The revision will cover the base system as well as the modules from work packages WP3-WP7.

Month 36: Final user evaluation; dissemination workshop. At the end of the project, the users will deliver a report describing their applications, and they will give a final evaluation. A workshop for dissemination to a wider audience, for identifying partners for follow-up projects (WP11), and for identifying partners for potential commercialisation (WP10) will be held in this month.
Pert diagram

The diagram shows dependencies between tasks. To give a better overview, we have grouped tasks by category. Task numbers refer to the Gantt chart, which shows the exact start and end dates of tasks.

[Figure: Pert diagram. Stages in sequence, with task numbers and months as shown in the diagram:
Kick-Off meeting (2.1), month 1
User requirements (8.1, 8.2, 9.1, 10.1); Visualization Requirements (6.1, 7.1)
System design (2.2), month 8
Data Mining (3.1, 3.3, 3.5, 3.7, 4.1, 4.2, 4.7, 5.1, 5.4); Visualization (7.1-7.2)
Prototype 0 (2.4), month 12
Test & Evaluation (8.3, 9.2, 9.3, 9.4, 10.2)
Design revision (2.2), month 15
Data Mining (3.4, 3.8, 4.3, 4.4, 5.3, 5.6); Visualization (6.1, 6.4)
Prototype 1 (2.5), month 18
Seismic data & statistical offices (8.4, 8.5, 9.4, 9.5, 10.3, 10.4, 11.3)
Evaluation (2.2), month 24
Data Mining (3.2, 3.6, 4.5, 5.2, 5.5); Visualization (6.2, 6.3, 6.5)
Prototype 2 (2.6), month 30
Real-world Application (8.6, 8.7, 8.8, 8.9, 9.6, 9.7, 10.5)
Final Workshop (11.5), month 36]
Work package description

Co-ordination

The project brings together researchers, software developers, and users from a number of European countries, with different backgrounds and different approaches to spatial analysis and geographical modelling. To manage technology development and research, and to exploit the component tools and the system effectively, work package WP1 is devoted to co-ordination. Special attention has been given to defining clear responsibilities and modular work packages and deliverables. The SPIN consortium will meet approximately every four months to establish and maintain an effective team. The management plan is based on that of a successful EU project co-ordinated by partner P1, which is detailed in section C5 below.

Technology development

WP2 has the objective of designing an integrated system for Data Mining and GIS. This work package has the overall task of integrating the existing GIS and Data Mining software and of incorporating the modules developed in the other work packages in a coherent manner. It is the project's technological hub, to which all partners will deliver, and whose deliverables all partners will need to have access to at some point. This will serve as a technological basis.

We conceptually distinguish a base system and an integrated Spatial Mining system.

Figure 4. The basic architecture of SPIN. Spatial mining and visualization methods can be added as plug-ins to the base system. Clients can access the system over the internet.
The base system contains

- internet enabled GIS for automatic generation of interactive thematic maps
- Data Mining methods for nearest neighbour, decision trees, association rules, subgroup discovery, inductive logic programming
- visualisation for these methods
- data transformation capabilities for discretization, restriction, projection, union, join, and calculated rows
- access to heterogeneous data sources (JDBC-compliant databases, ODBC, flat files, spatial data interfaces etc.), also over the internet
- facilities for organising and documenting analysis tasks.

The existing Data Mining methods complement the spatial mining methods in the task of “explaining” spatial patterns in terms of non-spatial attributes.

The internet enabled base GIS module contains facilities for interactive manipulation of thematic maps. To provide automated visualisation, the GIS incorporates the knowledge of thematic cartography in the form of generic, domain-independent rules. To choose adequate presentation techniques for given data, it takes into account data characteristics and relations among data components or attributes. The automation of map generation releases users from the need to think about how to present the data and from the routine work of map building, and allows them to concentrate on the analysis of their data.

This work package includes the steps of requirement analysis, design, implementation, testing, and documentation. Building the base system requires integrating an already existing GIS tool and an existing Data Mining platform, both developed by partner P1. For tight integration, a common Task manager, Data Management Layer, Extension API, and user interface have to be defined and implemented.

The integrated system incorporates the Spatial Mining and visualisation methods developed in WP3-7 into the base system.
The main inputs of this work package are the existing Data Mining and GIS systems and the modules developed in WP3-7; the main output will be the integrated system. This integrated system will be developed in three main stages: prototype 0 (developer version), prototype 1, and prototype 2. User feedback will be gathered and evaluated from the first day on and will be used for improving the system.

Research

Work packages WP3, WP4, and WP5 develop methods for Spatial Data Mining that can be added as plug-ins to the base system. A variety of methods have been selected for implementation, partially depending on previous experiences and results of the partners. Each partner has chosen for adaptation a method to whose advancement it has already made a theoretical and practical contribution, so that it is well acquainted with the subtleties of the chosen method. By combining the project partners' expertise, a broad range of advanced Data Mining techniques will be covered, from Bayesian Statistics (Partner P1, publications 6, 8, 9) and Neural Networks (Partner P1, publication 7) to symbolic approaches from Machine Learning and Inductive Logic Programming (Partner P1, publications 4, 10, 11; Partner P2, publications 1, 2, 3) and genuine approaches to Spatial Cluster Analysis (Partner P4, publications 2, 4). This gives the project a unique blend of depth of expertise and breadth of methods covered. Since all these methods can be launched within a single, coherent platform, the project can also contribute to a comparison of the relative strengths and weaknesses of the methods and develop guidelines for their use in spatial mining.
All these work packages include a) state of the art review; b) theoretical advances, which will be communicated in a report; c) implementation and validation of the methods; d) integration with the base system; e) application to real-world tasks; f) documentation and final report. These stages are synchronised with the technology development cycle. These work packages have as their input previous theoretical and practical work of the partners and will have as their main output a theoretical description of the respective methods.

Machine Learning (WP3). This work package is mainly concerned with the adaptation of symbolic machine learning methods to spatial data analysis. In particular, the methods to be adapted are Inductive Logic Programming algorithms for the discovery of subgroups and spatial association rules. They tend to be search intensive, and when dealing with large data sets and high-dimensional dependencies, scalability may become a problem. Moreover, most have been developed to satisfy the classical properties of consistency and completeness, whereas in spatial data mining the interest is in detecting patterns that satisfy minimum criteria for support and consistency. Adaptation of these machine learning tools will be based on the use of adaptive sampling, i.e. active learning techniques based on Bayesian Decision Theory, or on more efficient search strategies. Another contribution of this work package is the definition of appropriate algorithms for the automated extraction of symbolic descriptions of parts (e.g., cells) of a map from vectorised maps.

Bayesian Statistics (WP4). A spatial relation may be described by a number of different models, leading to widely varying results. Currently the support for assessing and selecting models in GIS is very limited.
Based on the extrapolation of the uncertainty of individual predictions of different models, we will develop methods for a well-founded selection or combination of models. In recent years, computationally intensive Bayesian methods have been developed that compare favourably with classical approaches. Instead of selecting an “optimal” model, they generate a whole distribution of models, which characterises the uncertainty in the light of the available data. On the one hand, they derive predictive distributions for new inputs that reflect the actual information. On the other hand, they allow a rigorous assessment of the adequacy of different model types. Partner P1 (publications 8, 9) has developed Bayesian classification methods which use a Bayesian ensemble of decision trees or neural networks. These methods have already been successfully applied to credit scoring and will now be adapted to spatial data.

Exploratory Spatial Data Analysis (WP5). This work package will explore ways of extending existing methods of spatial pattern detection. Currently, ESDA methods tend to be concerned solely with the detection of spatial pattern and often overlook other data attributes. This shortcoming will be addressed by extending existing tools developed by partner P4 to handle attribute interaction with spatial location and to consider how temporal changes in spatial data can be investigated (see partner P4, publications 4 and 2). The tool will be expanded to use multiple search methods in addition to the heuristic search used currently. There is also potential to investigate how different statistical tests of clustering can be used in the tool.

Work packages WP6 and WP7 develop methods for the visualisation of spatial and temporal information, and for the visualisation of the Data Mining methods developed in WP3-5.

Visualisation of spatial and temporal data (WP6). In most areas, spatially referenced data also refer to different moments or intervals in time.
The study of such data is meaningless if their development in time is not taken into account. Analysis of spatially referenced data should be supported by their visual presentation in maps. Spatio-temporal data require substantial advancement of the traditional map form of presentation towards dynamics and high user interactivity. The work package aims to develop methods for visualising spatio-temporal data that can facilitate the analysis of such data. The methods include not only graphical presentation itself but also various data transformations and interactive manipulation of the displays.
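The Bayesian idea described under WP4 above, keeping a distribution over models rather than a single “optimal” one, can be illustrated with a deliberately small sketch. The three candidate models, the uniform prior, and the observations are hypothetical; the project's methods (MCMC over ensembles of decision trees or neural networks) are far richer than this discrete example.

```python
def posterior_weights(models, data, prior=None):
    """Posterior probability of each candidate model given binary observations.

    models -- dict: model name -> assumed event probability (Bernoulli model)
    data   -- list of 0/1 observations
    prior  -- dict of prior probabilities; uniform if omitted
    """
    k, n = sum(data), len(data)
    if prior is None:
        prior = {name: 1.0 / len(models) for name in models}
    unnormalised = {
        name: prior[name] * p ** k * (1.0 - p) ** (n - k)
        for name, p in models.items()
    }
    z = sum(unnormalised.values())
    return {name: w / z for name, w in unnormalised.items()}

def predictive_probability(models, weights):
    """Probability of an event for a new input, averaged over all models."""
    return sum(weights[name] * models[name] for name in models)

# Hypothetical candidates for the event rate in one region.
models = {"low": 0.2, "mid": 0.5, "high": 0.8}
observations = [1, 1, 0, 1]

weights = posterior_weights(models, observations)
p_next = predictive_probability(models, weights)
best_single = max(weights, key=weights.get)
```

Selecting only the most probable model would predict with its rate alone, whereas the averaged prediction is pulled towards the other candidates in proportion to how well they explain the data, which reflects the remaining uncertainty.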
Visualisation of Data Mining results (WP7). The form in which data mining results are presented to the user is crucial for their appropriate interpretation. Large amounts of information or complex concepts can be more easily comprehended when represented graphically. This especially applies to data and concepts having spatial reference or distribution. The objective of this work package is to design appropriate graphical techniques to represent the results of the data mining methods developed within the project. The approach to be taken is a combination of cartographic and non-cartographic displays linked together through simultaneous dynamic highlighting of the corresponding parts (see partner P1, publication 1). The non-cartographic displays will represent the data mining results in summarised, generalised form, while maps will provide the transition from general descriptions to the individual spatial objects and phenomena characterised by them.

Application

The system will be used in several applications. One criterion for the selection of application areas is that a broad range of problem domains of special importance for the EU is covered, underlining the generality of the approach. A second criterion is that each of these areas should contribute in a unique way to evaluating and validating the adequacy of the chosen approach to Spatial Mining. This makes the evaluation process more focussed. An objective common to all application areas is to explore the applicability of advanced Data Mining methods. Specifically, spatial subgroup discovery, spatial Markov Chain Monte Carlo, and localised Spatial Point Pattern Analysis will be evaluated in each application area.

Application to Seismic Data (WP8). In WP1-7 a generic Spatial Mining System is developed. Such a system has the important advantage that it has a potentially broad range of application areas and promotes technology reuse.
However, some application areas will also need to incorporate specialised analysis methods. One of the main risks associated with the development of generic information technology is that an architecture that is not extensible may end up not addressing the real needs of the user. Work package WP8 addresses this problem in an exemplary way. This will ensure that the generic system is designed in a modular and extensible way right from the start. A key component is the plug-in architecture of the already existing Data Mining platform developed by partner P1, which allows for easy integration of new modules.

The application area selected for this task is earthquake prediction. This is a well-established scientific field belonging to physical geography, where a great amount of spatio-temporally referenced data from different sources is available. Research in this area has an obvious and great potential benefit for public health and quality of life. Advances in earthquake prediction could help to prevent massive financial losses.

The objective of this work package is to adapt the generic system to the specialised application area of earthquake prediction and hazard assessment by integrating methods for natural hazard assessment that have been developed by partner P3. To achieve this goal, an integration layer between the generic Spatial Mining system and the specialised methods implemented by partner P3 has to be designed. Partner P7, which has been active in the area of earthquake prediction for a long time, will profit from this technology by getting access to advanced and complementary methods for data analysis and by getting an instrument for the web-based dissemination of research results.

Web-based dissemination of census data from statistical offices. A second application area is the analysis and web-based dissemination of census data from statistical offices.
Here the main objective is to put to practical use the timely, cost-effective dissemination of statistical information over the internet. Partner P8 has several years' experience in developing tools for web-based access to large spatial data sets and provides an academic service for access to census data. These tools are primarily for visualising database contents, browsing data, and locating and mapping spatial data, and they can handle spatial and aspatial referencing systems. Partner P8 also has access to a SUN E6500 super-server for academic applications. Additionally, the project will be supported by the national census agency, which, together with the partner, is currently planning the tools and services for public access to the forthcoming national census in 2001.
This work package will allow evaluation of the efficiency of the developed methods and of the responsiveness of the application, as well as of its acceptance by customers of statistical offices. Potential problem areas are the availability of bandwidth, the number of concurrent users, and the size of maps and data sets. Especially if Data Mining analysis over the internet is permitted, the performance of the server will be of central importance. Experiences in this application area will be crucial for improving the efficiency of the prototype 1 system (which is a task within WP2).

Dissemination and Exploitation

Web-based brokering. Statistical offices, public agencies, and scientific institutions often face the problem that their initial efforts to build up a public database are externally funded, but the maintenance of such a service is not. Funding agencies increasingly require that these institutions develop business plans for commercialising such a service in the long run (at least for non-scientific use). The aim of this work package, for which the industrial partners will be responsible, is to develop a detailed concept for a web-based information brokering service with georeferenced data as a foundation for a cost-effective dissemination of data. Web-based, interactive Spatial Mining can add tremendous value to the mere distribution of data. This added value can be the key to commercialising the distribution of data for statistical offices, public agencies, and scientific institutions. What is new about this proposal is that the customer does not need to buy or install any complex and expensive software on his computer, yet is not confined to the usual printed, non-interactive reports. An interactive thematic map is delivered over the internet using Java technology. This map can be used by the customer for further exploration as well as for presentation and decision making.
There will be different levels of service, as suggested by the following example business scenarios. The project will deliver technology to solve tasks 1-4 and will provide the technological basis for task 5. The feasibility of this concept will be tested in a demonstrator.

1. Customer needs: An institute for ecological studies prepares an environmental report and needs a visualisation for their vegetation data and vegetation maps to make a presentation.
   Business solution: Building a thematic map for predefined data and map.
   Customer supplies: Data & Maps.
   Customer gets: Interactive map on the internet.
2. Customer needs: A statistical office needs a visualisation of data about land use.
   Business solution: Building a thematic map for predefined data.
   Customer supplies: Data.
   Customer gets: Interactive map on the internet.
3. Customer needs: A department for urban development needs a local map showing hazard risks for decision making.
   Business solution: Building a map, data & map brokering.
   Customer supplies: Description of Data & Location.
   Customer gets: Interactive map with cluster detection, significance testing.
4. Customer needs: A company running a power plant needs visualisation of monthly aggregated environmental data for monitoring.
   Business solution: Maps periodically updated from a database via the internet.
   Customer supplies: Description Location; Data that have to be periodically refreshed.
   Customer gets: Interactive map with cluster detection, significance testing, periodically updated.
5. Customer needs: A consulting company prepares a market study for the chances of sustainable tourism; for this it needs access to data from different sources such as census data and data about nature protection and pollution in this area.
   Business solution: Geomarketing consulting task.
   Customer supplies: A descriptive
   Customer gets: Interactive map with cluster detection, significance testing, visualisation of data mining results; a summary report about Data Mining results.

Dissemination. The technology developed in this project is of a generic nature and has a broad range of potential applications. Yet potential user groups may be unaware of the existence of the type of technology the project develops, or they may have false expectations about it. The aim of this work package is to address the general public, as well as potential users and partners for commercial exploitation. Dissemination will be an ongoing activity and will include organising workshops, maintaining a project web page, systematically identifying additional user groups that could act as partners in follow-up projects, and providing project descriptions for the general public.

Partner P6 will perform a feasibility study for commercialising technology developed especially within the application to seismic data. To this end they will actively search for a partner in the area of noise-level zoning. This is expected to become a major issue in the next two to three years in Holland, because of anticipated new legislation. This third application, where the partner will not be directly involved in the project, is also an application that demonstrates the potential of the technology for environmental decision making.

A project sheet will be due in month 3, as well as a project web site. Beginning with month 12, when a technological preview version will be available, potential additional user groups and potential customers will be systematically identified and contacted, so that knowledge about the project is spread. This activity will increase when prototype 1 becomes available in month 18. A public workshop bringing together users, developers, potential users, and other interested people will be organised in month 24.
A second public workshop will be organised in month 36, concluding the project.
B3. Workpackage description

Workpackage number: WP1 - Coordination
Start date or starting event: 0
Participant number:            P1  P4
Person-months per participant: 28   6

Objectives
Overall and technical management. This will involve:
A) Overall Management
  Ensure that the various phases of the project are properly coordinated
  Development of project workplan
  Monitoring and reviewing progress of work
  Handling administrative procedures relating to European Commission
  Reporting to the European Commission
  Supporting a good communication between the partners
B) Technical Management
  Writing of a project handbook including quality management plan
  Responsibility for critical technical decisions which affect the project as a whole
  Definition of quality standards relevant to the project and determination of how to satisfy them

Description of work
A) Overall Management
T1. Ensure that the various phases of the project are properly coordinated
T2. Development of project workplan (partners P1, P4)
T3. Monitoring and reviewing progress of work
T4. Handling administrative procedures relating to European Commission
T5. Reporting to the European Commission
T6. Scheduling of meetings
B) Technical Management
T7. Write a project handbook including quality management plan (partners P1, P4)
T8. Responsibility for critical technical decisions which affect the project as a whole (partners P1, P4)
T9. Define quality standards relevant to the project and determine how to satisfy them (partners P1, P4)

Deliverables
D1. Project workplan (T2)
D2. Reports for EC (T5)
D3. Project handbook (T7)
D4. Periodical project meetings (T6)

Milestones and expected result
Milestones of this workpackage are synchronized with the milestones of WP2: M1: system design (month 8), M2: prototype 0 (month 12), M3: prototype 1 (month 18), M4: prototype 2 (month 30)
B3. Workpackage description

Workpackage number: WP2 - Integrate Data Mining and GIS (Technology development)
Start date or starting event: 0
Participant number:            P1  P4  P3  P5  P6
Person-months per participant: 30   9   2  18  10

Objectives
This workpackage has the overall task of technologically integrating the existing GIS and Data Mining software and of incorporating the modules developed in the other workpackages in a coherent manner. It is the project's technological hub, to which all partners will deliver, and whose deliverables all partners will need to have access to at some point. For tight integration of existing components, a common Task Manager, Data Management Layer, Extension API, and user interface have to be defined and implemented. The base system is designed as an object-oriented plug-in architecture, facilitating technological integration. The Unified Modelling Language (UML) will be used for documentation and design to ensure product quality. CORBA and RMI will be evaluated as middleware for integration. The integrated system incorporates the Spatial Mining and visualization methods developed in WP3-7 into the base system.

Description of work
T1. Organize kick-off meeting for identification of user needs
T2. Design of the SPIN! system architecture
T3. Develop efficient methods for transfer of data and maps over the internet (partner P6)
T4. Implementation of developer version (prototype 0)
T5. Technological integration of software developed in Task 1.3, 1.4 with spatial mining modules and visualization modules, resulting in prototype 1
T6. Testing and validation, revision of design, getting user input, improving system, resulting in prototype 2
T7. Revision release of second prototype (final release)

Deliverables
D1. System design document (T1, T2)
D2. Prototype 0 (software & documentation) (T3)
D3. Implementation of efficient methods for transfer of data and maps over the internet (partner P6)
D4. Prototype 1 (software & documentation) (T4, T5)
D5. Prototype 2 (software & documentation) (T6)
D6. Revision release of prototype 2 (Final Release) (software & documentation) (T7)

Milestones and expected result
A user-friendly, internet-enabled, extensible Spatial Mining software tightly integrating Data Mining and GIS functionality
A system providing a broad variety of methodological approaches to Spatial Mining that can be operated within a single environment
M1. Specification of design (month 8)
M2. Delivery of prototype 0 (month 12)
M3. Delivery of prototype 1 (month 18)
M4. Delivery of prototype 2 (month 30)
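
As an illustration of the object-oriented plug-in architecture named in the WP2 objectives, the following minimal sketch shows how a task manager might dispatch work to registered extension modules. It is not the SPIN! Extension API, which this document does not specify: the names MiningPlugin, TaskManager and MeanByRegion are hypothetical, and Python is used only for brevity.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MiningPlugin(ABC):
    """Hypothetical extension interface (assumption, not the SPIN! API)."""

    name: str

    @abstractmethod
    def run(self, records: List[dict]) -> dict:
        """Analyse georeferenced records and return a result summary."""


class TaskManager:
    """Hypothetical task manager that dispatches work to registered plug-ins."""

    def __init__(self) -> None:
        self._plugins: Dict[str, MiningPlugin] = {}

    def register(self, plugin: MiningPlugin) -> None:
        # Refuse duplicate names so that a module cannot be silently replaced.
        if plugin.name in self._plugins:
            raise ValueError(f"plug-in already registered: {plugin.name}")
        self._plugins[plugin.name] = plugin

    def run(self, name: str, records: List[dict]) -> dict:
        return self._plugins[name].run(records)


class MeanByRegion(MiningPlugin):
    """Toy plug-in: average of a numeric value per region."""

    name = "mean_by_region"

    def run(self, records: List[dict]) -> dict:
        sums: Dict[str, float] = {}
        counts: Dict[str, int] = {}
        for r in records:
            sums[r["region"]] = sums.get(r["region"], 0.0) + r["value"]
            counts[r["region"]] = counts.get(r["region"], 0) + 1
        return {region: sums[region] / counts[region] for region in sums}
```

In such a design a new analysis method is added by writing one class and registering it; the base system itself is not modified.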
B3. Workpackage description

Workpackage number: WP3 - Extending machine learning methods to spatial mining
Start date or starting event: 0
Participant number:            P2  P1
Person-months per participant: 24  18

Objectives
This workpackage is mainly concerned with the adaptation of symbolic machine learning methods to spatial data analysis. In particular, the methods to be adapted are Inductive Logic Programming algorithms for the discovery of subgroups and spatial association rules. Moreover, some of these methods have been developed to satisfy the classical properties of consistency and completeness, whereas in spatial data mining the interest is in detecting patterns that satisfy minimum criteria for support and consistency. Adaptation of these machine learning tools will be based on the use of adaptive sampling, i.e. active learning techniques based on Bayesian Decision Theory, or on more efficient search strategies, to increase scalability. Another contribution of this workpackage is the definition of appropriate algorithms for the automated extraction from vectorized maps of symbolic descriptions of parts (e.g., cells) of a map.
By evaluating Bayesian posterior distributions or their approximations, the uncertainty of subgroup quality indicators may be assessed. Relatively large subgroups with potentially high indicator values have a high utility, and the sampling of new data from the corresponding spatial locations is rewarding. Active learning stops if the cost (negative utility) of collecting new data is higher than the expected utility of the subgroups that might be discovered.

Description of work
T1. Develop concepts for the definition of subgroup criteria linking space, time, domain knowledge.
T2. Define criteria for adaptive sampling integrating the utility of subgroups as well as the cost of data collection and computation. Develop adaptive sampling methods based on Bayesian posterior distributions or their approximations
T3. Investigate properties of spatial association rules and adapt a rule discovery system to spatial association rules
T4. Investigate the representation language to be adopted for the representation of parts of a vectorized map.
T5. Software implementation of spatio-temporal subgroup discovery (without adaptive sampling)
T6. Software implementation of spatio-temporal subgroup discovery with adaptive sampling
T7. Software for the discovery of spatial association rules
T8. Develop algorithms for the extraction of symbolic descriptions from vectorized maps
T9. Application and evaluation of implemented methods to real-world data

Deliverables
D1. Theoretical report on spatio-temporal subgroup discovery (T1)
D2. Theoretical report on adaptive sampling (T2)
D3. Theoretical report on spatial association rules (T3)
D4. Specifications of descriptions to be automatically extracted from vectorized maps (T4)
D5. Software for spatio-temporal subgroup discovery (T5)
D6. Software for adaptive sampling (T6)
D7. Software for the discovery of spatial association rules (T7)
D8. Software for the extraction of symbolic descriptions from vectorized maps (T8)
D9. Report evaluating the application of first-order learning methods to spatial data (T9)

Milestones and expected result
The work done in this workpackage will advance the state of the art in spatial data analysis by adapting methods from Machine Learning to Spatial Mining, especially first-order learning methods. They are a natural and promising approach to Spatial Mining, since they allow spatial relations to be represented directly. Work in this package is synchronized with the milestones M1-M4 of WP2: for each prototype a set of methods will be delivered
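
The stopping rule stated in the WP3 objectives (stop sampling once the cost of new data exceeds the expected utility of the subgroups that might be discovered) can be illustrated with a toy sketch. The utility model below is an assumption made only for illustration: the subgroup's target share is given a Beta posterior, and the expected gain from further sampling is approximated as subgroup size times posterior standard deviation. The function names and the optimism parameter are hypothetical and are not part of the project's methods.

```python
import math


def beta_posterior(successes, failures, prior_a=1.0, prior_b=1.0):
    """Return (mean, std) of the Beta posterior for a subgroup's target share."""
    a = prior_a + successes
    b = prior_b + failures
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


def should_sample_more(successes, failures, subgroup_size,
                       cost_per_batch, optimism=1.0):
    """Toy stopping rule: continue while the approximate gain exceeds the cost.

    Assumption: expected gain ~ subgroup_size * optimism * posterior std,
    so large subgroups with uncertain quality are the most rewarding to sample.
    """
    _, std = beta_posterior(successes, failures)
    expected_gain = subgroup_size * optimism * std
    return expected_gain > cost_per_batch
```

With no observations the posterior is wide and sampling continues; after many observations the posterior narrows, the approximate gain falls below the cost, and sampling stops.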
B3. Workpackage description

Workpackage number: WP4 - Generalize Bayesian Markov Chain Monte Carlo to Spatial Mining
Start date or starting event: 0
Participant number:            P1  P4  P6
Person-months per participant: 30   4   6

Objectives
Currently the support for assessing and selecting models in GIS is very limited. Based on the extrapolation of the uncertainty of individual predictions of different models, we will develop methods for a well-founded selection or combination of models. Partner P1 has developed Bayesian classification methods which use a Bayesian ensemble of decision trees or neural networks, and which will be adapted to spatial data. We will use the Bayesian approach in several directions:
  calculation of a predictive density characterizing the predictive or classification uncertainty for new inputs. The main algorithms use asymptotic expansions and Markov Chain Monte Carlo (MCMC);
  selection of optimal models by comparing their performance according to the Bayes factor and related methods;
  generation of ensembles of models of different type, e.g. using Bayesian model averaging and reversible jump MCMC.
An approximate Bayesian technique is the bootstrap. We will analyse the relative merits of this approach in comparison to Bayesian models.
Besides the classical spatial statistics models (e.g. kriging) we will concentrate on localized models which adaptively partition the input area and generate different submodels. Promising candidates are radial basis functions, mixtures of experts and multivariate adaptive regression splines. The selection criterion is their adequacy for the intended application.

Description of work
T1. Report reviewing current approaches of spatial classification, prediction and interpolation
T2. Implementation of selected current approaches using bootstrap techniques.
T3. Report on advanced spatial models and the corresponding Bayesian algorithms.
T4. A basic implementation of Bayesian MCMC for selected models.
T5. Implementation of MCMC- or approximate Bayesian model selection / averaging.
T6. Report on performance evaluation for spatial mining methods and guidelines for selecting models depending on data and prior conditions.
T7. Implement a generic library for spatial data transformations used by the mining algorithms (Partner P6, P4)

Deliverables
D1. Report reviewing current approaches of spatial classification, prediction and interpolation (T1)
D2. Implementation for bootstrap (T2)
D3. Report on advanced spatial models and the corresponding Bayesian models (T3)
D4. Implementation for MCMC (T4)
D5. Implementation for model selection (T5)
D6. Report on performance evaluation for spatial mining methods and guidelines (T6)
D7. Generic software library for spatial data transformations (T7)

Milestones and expected result
Adaptation of several advanced statistical models to the spatial domain, a comprehensive assessment of prediction/classification uncertainty for GIS, and a flexible framework for model formation and model checking in a GIS context. Work in this package is synchronized with the milestones M1-M4 of WP2, where methods will be delivered
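
The model selection and averaging steps named in the WP4 objectives can be sketched as follows. The sketch assumes that a log marginal likelihood (log evidence) is already available for each candidate model, for example from MCMC or an asymptotic approximation; how that quantity is obtained is outside its scope. The function names are hypothetical.

```python
import math


def bayes_factor(log_evidence_a, log_evidence_b):
    """Bayes factor of model A against model B from their log evidences."""
    return math.exp(log_evidence_a - log_evidence_b)


def posterior_model_weights(log_evidences, log_priors=None):
    """Posterior model probabilities; a uniform model prior is assumed by default."""
    n = len(log_evidences)
    if log_priors is None:
        log_priors = [math.log(1.0 / n)] * n
    scores = [e + p for e, p in zip(log_evidences, log_priors)]
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    top = max(scores)
    unnormalised = [math.exp(s - top) for s in scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def averaged_prediction(predictions, weights):
    """Bayesian model averaging of point predictions for one new input."""
    return sum(w * p for w, p in zip(weights, predictions))
```

Selecting the single model with the largest weight corresponds to model selection; using all weights corresponds to model averaging.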
B3. Workpackage description

Workpackage number: WP5 - Adapt and integrate methods for spatial pattern analysis
Start date or starting event: 0
Participant number:            P4
Person-months per participant: 36

Objectives
This work package will explore ways of extending existing methods of spatial pattern detection. Currently ESDA methods tend to be concerned solely with the detection of spatial pattern and often overlook other data attributes. This shortcoming will be addressed by extending existing tools developed by partner P4 (Partner P4, publication 3) to handle attribute interaction with spatial location and to consider how temporal changes in spatial data can be investigated. The tool will be expanded to use multiple search methods in addition to the heuristic search used currently. These methods will include genetic algorithms, artificial life, and multi-agent techniques (WP 3). Partner P4 has already carried out some limited experiments with these techniques (Partner P4, publication 3) but will also investigate ways in which the search techniques can be used together in the form of a hybrid search system. There is also potential to investigate how different statistical tests of clustering can be used in the tool. The development of the system as a modular Java-based program allows other tests to be dropped into the tool for testing and comparison. In addition, the methods developed in this work package will be designed to work closely with the input and output functions developed in work packages 2 and 7. This will include the evaluation of CORBA and ODBC methods for data input and output.

Description of work
T1. Investigate algorithms for handling attribute interaction with spatial location
T2. Implement attribute interaction with spatial location
T3. Evaluate statistical clustering tests
T4. Implement selected statistical clustering tests
T5. Investigate algorithms for multiple search
T6. Implement algorithms for multiple search
T7. Testing and evaluation of software tool.

Deliverables
D1. Theoretical paper on algorithms for handling attribute interaction with spatial location (T1)
D2. Implementation of attribute interaction with spatial location (T2)
D3. Theoretical paper evaluating statistical clustering tests (T3)
D4. Implementation of selected statistical clustering tests (T4)
D5. Theoretical paper on algorithms for multiple search (T5)
D6. Implementation of algorithms for multiple search (T6)
D7. Reports of testing and evaluation of software tool. (T7)

Milestones and expected result
This workpackage will provide a variety of spatial pattern analysis methods for the SPIN! system. Work in this package is synchronized with the milestones M1-M4 of WP2, where the implemented methods will be successively integrated into the prototype
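
The idea that different statistical tests of clustering can be dropped into the tool may be illustrated by a small sketch in which the test is passed as a function. This is not partner P4's tool: the circle scan, the Poisson tail test, the uniform background rate and all names are assumptions chosen for illustration, and Python stands in for the Java implementation mentioned in the objectives.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]
# A clustering test maps (observed count, expected count) to a p-value.
ClusterTest = Callable[[int, float], float]


def poisson_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def scan_circles(cases: List[Point], centres: List[Point], radius: float,
                 rate_per_area: float, test: ClusterTest = poisson_tail,
                 alpha: float = 0.01):
    """Return (centre, observed, p) for circles with a significant case count.

    Assumption: cases occur at a uniform background rate per unit area.
    """
    expected = rate_per_area * math.pi * radius ** 2
    hits = []
    for cx, cy in centres:
        observed = sum(1 for x, y in cases
                       if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2)
        p = test(observed, expected)
        if p < alpha:
            hits.append(((cx, cy), observed, p))
    return hits
```

Replacing the test argument with another function of the same signature changes the significance test without touching the search loop, which is the kind of modularity the objectives describe.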
