proposal/SPIN proposal final version.doc

SPIN!, IST-99-10536, 15.06.1999

Part B

B1. Title.

Spatial Mining for Data of Public Interest
SPIN!
Proposal No. IST-1999-10536
Proposal for: IST programme, 1.1.2-5.1.4 Cross-Programme Action CPA4: New Indicators and statistical methods
B3. OBJECTIVES
B4. CONTRIBUTION TO PROGRAMME/KEY ACTION OBJECTIVES
B5. INNOVATIONS
    STATE OF THE ART
    TECHNOLOGICAL & SCIENTIFIC ADVANCES
    DISTRIBUTION OF WORKLOAD ON WORK PACKAGES
    INTRODUCTION TO WORKPACKAGES
    RISK MANAGEMENT
    PERT DIAGRAM
    WORK PACKAGE DESCRIPTION
C2. CONTENTS FOR PART C
C3. COMMUNITY ADDED VALUE AND CONTRIBUTION TO EU POLICIES
C4. CONTRIBUTION TO COMMUNITY SOCIAL OBJECTIVES
C5. PROJECT MANAGEMENT
C6. DESCRIPTION OF THE CONSORTIUM
C7. DESCRIPTION OF THE PARTICIPANTS
    GMD - GERMAN NATIONAL RESEARCH CENTER FOR INFORMATION TECHNOLOGY
    DEPARTMENT OF INFORMATICS OF THE UNIVERSITY OF BARI
    SCHOOL OF GEOGRAPHY AT THE UNIVERSITY OF LEEDS
    THE INSTITUTE FOR INFORMATION TRANSMISSION PROBLEMS, RUSSIAN ACADEMY OF SCIENCES (IITP RAS)
    DIALOGIS SOFTWARE & SERVICES GMBH, ST. AUGUSTIN, GERMANY
    PROFESSIONAL GEO SYSTEMS B.V. (PGS), AMSTERDAM
    GEOFORSCHUNGSZENTRUM, POTSDAM, GERMANY DESCRIPTION OF THE PARTNER
    MANCHESTER METROPOLITAN UNIVERSITY/MIMAS
C8. ECONOMIC DEVELOPMENT AND SCIENTIFIC AND TECHNOLOGICAL PROSPECTS
APPENDIX – PUBLICATIONS OF PARTNERS CITED IN PART B
    REFERENCES PARTNER P1 – GMD
    REFERENCES PARTNER P2 - UNIVERSITY OF BARI
    REFERENCES PARTNER P3 – IITP, RUSSIAN ACADEMY OF SCIENCES
    REFERENCES PARTNER 4 – LEEDS
    REFERENCES PARTNER P5 – DIALOGIS
    REFERENCES PARTNER P6 – PGS
B3. Objectives

To develop an integrated interactive internet-enabled spatial data mining system.

Data mining systems (DMS) and geographical information systems (GIS) are complementary tools for describing, transforming, analysing and modelling data about real world systems. Most contemporary GIS offer only very basic spatial analysis and data mining functionality, and many are confined to simplistic analysis that involves comparing maps or descriptive statistical displays such as histograms and pie charts. There is growing demand for integrated geographical or spatial data mining systems (SDMS) from public and private sector organisations that need both enhanced decision making capabilities and innovative solutions to a wide range of problems. An integrated, user friendly SDMS operable over the internet offers exciting new possibilities for all manner of geographical research and spatial decision making. Thus the overall objective of SPIN! is to develop a state of the art, fully functional, truly integrated, internet-enabled, easily extendable and modifiable GIS-DMS platform, SPIN: a comprehensive and intuitive SDMS for data of public interest.

In recent years, a number of project partners have developed the technological components and scientific tools that are needed to build the kernel of this type of SDMS. During this project these individual efforts and the associated expertise and experience will be united in a joint European effort. SPIN! Consortium partners from statistical offices and seismic research centres will use the system in applied research and provide feedback to direct the development efforts. The applications of SPIN will demonstrate the generic utility and additional benefits that this type of SDMS has over existing technologies.
Industrial partners will develop a business model for web-based information brokering with georeferenced statistical data, and estimate the likely economic impacts of the technology.

The following scenarios describe some of the wide ranging potential benefits that statistical analysts, environmental decision makers, seismic data experts, biodiversity researchers and other public and private sector users can expect from such a system, and introduce some of the main features that SPIN will include.

To improve knowledge discovery by providing an enhanced capability to visualise data mining results in spatial, temporal and attribute dimensions.

Imagine a statistical officer has to prepare a report describing unusual aspects of African demography in relation to socio-economics and the physical environment. Suppose the officer first applies a data mining technique to classify all countries by death rate and life expectancy, and one of the resulting subgroups, with unusually high death rate and low life expectancy, contains only 51 countries in all, 40 of them African. Suppose the officer creates a statistical display of all the classified groups (Fig. 1) and then decides to map the geographical distribution of the unusual subgroup, distinguishing between African countries and those elsewhere (Fig. 2). The geographical distribution of the subgroups shown by the map may prompt ideas for further analysis. For instance, the analyst may wish to select sets of countries from the map to take a closer look at their demography and at other geographical variables that describe socio-economic and environmental conditions. In addition, the officer may wish to discover which demographic attributes best characterise each continent at different points in time, and to investigate which groups of demographic attributes have interesting spatio-temporal co-distributions and inter-relationships with other socio-economic and environmental variables.
All of this analysis, some of which is quite complex, could clearly be performed more quickly and easily if an integrated SDMS with a linked display component and reporting system were available. It would be a major benefit if the maps and other data displays were generated automatically by a knowledge base of statistical display and thematic data mapping, and if the displays were automatically linked so that the information the officer is focusing on during the analysis is simultaneously highlighted in all the relevant displays. This type of linked GIS style display component will be developed as a fundamental part of the integrated visualisation component of SPIN, which would facilitate this kind of statistical analysis (see partner P1, publication 3).
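The linked-display behaviour described above can be illustrated with a minimal sketch, assuming a shared selection object that notifies every registered display. The class names `LinkedSelection` and `Display` and the country identifiers are hypothetical and are not taken from the SPIN design.

```python
class LinkedSelection:
    """Shared selection that notifies every registered display when it changes."""

    def __init__(self):
        self._listeners = []
        self._selected = frozenset()

    def register(self, listener):
        # listener is any callable that accepts the set of selected ids
        self._listeners.append(listener)

    def select(self, ids):
        self._selected = frozenset(ids)
        for listener in self._listeners:
            listener(self._selected)


class Display:
    """Stand-in for a map or statistical chart (hypothetical)."""

    def __init__(self, name, selection):
        self.name = name
        self.highlighted = frozenset()
        selection.register(self.on_selection)

    def on_selection(self, ids):
        # A real display would redraw here; the sketch only records the state.
        self.highlighted = ids


selection = LinkedSelection()
world_map = Display("map", selection)
scatter = Display("scatter plot", selection)

# Selecting countries once highlights them in every linked display.
selection.select({"Chad", "Mali"})
```

Keeping the selection in one shared object, rather than letting displays talk to each other directly, means a new display type can be linked in without changing the existing ones.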
Figure 1. Descriptions of interesting subgroups
Figure 2. Visualisation of the subgroup.

To develop new and integrated ways of revealing complex patterns in spatio-temporally referenced data that were previously undiscovered using existing methods.

Suppose an environmental decision maker is asked to look for relations between lung cancer and environmental pollution. What may be desired initially is some kind of exploratory spatial data analysis (ESDA) technique that automatically detects unusual spatial clustering of lung cancer incidence, both in the entire data set and for specific time periods. Additional spatial and aspatial analysis methods might then be used to try to explain any unusual spatial clustering patterns observed, using a range of other spatio-temporal and aspatio-temporal variables. In SPIN, exploratory spatio-temporal pattern analysis techniques derived from existing ESDA tools will be integrated with a wide variety of temporal, spatial and aspatial analysis methods. Partner P4 has developed a suite of ESDA tools that detect unusual clusters of incidence and produce mappable output that reveals the clustering pattern. Temporal versions of these tools and outputs will be developed, along with the mechanisms for exporting the results of the analysis into other temporal, spatial and aspatial data mining techniques. Having all the tools available in one integrated SDMS would allow the decision maker to perform an in-depth spatio-temporal analysis quickly, and thereby help develop understanding of the geographical processes and inter-relationships that may result in an increased risk of contracting lung cancer. The analytical speed-up will allow the decision maker to generate and test more hypotheses regarding the observed spatial, temporal and spatio-temporal patterns, and to investigate even more advanced hypotheses about causal relationships.
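To make the idea of automatically detecting unusual clusters of incidence concrete, here is a minimal sketch of a brute-force circle scan. It is an illustration under stated assumptions, not the ESDA tools of partner P4: cases and populations at risk are attached to point locations, circles of a fixed radius are centred on a regular grid, and a circle is flagged when its case count is improbably high under a Poisson model with the overall rate. The function names, the grid step and the threshold `alpha` are hypothetical.

```python
import math


def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def scan_clusters(areas, radius, step, alpha=0.01):
    """areas: list of (x, y, cases, population). Returns flagged circles."""
    rate = sum(a[2] for a in areas) / sum(a[3] for a in areas)
    min_x, max_x = min(a[0] for a in areas), max(a[0] for a in areas)
    min_y, max_y = min(a[1] for a in areas), max(a[1] for a in areas)
    nx = int((max_x - min_x) / step) + 1
    ny = int((max_y - min_y) / step) + 1
    flagged = []
    for i in range(nx):
        for j in range(ny):
            cx, cy = min_x + i * step, min_y + j * step
            inside = [a for a in areas
                      if math.hypot(a[0] - cx, a[1] - cy) <= radius]
            cases = sum(a[2] for a in inside)
            population = sum(a[3] for a in inside)
            if population == 0:
                continue
            p_value = poisson_tail(cases, rate * population)
            if p_value < alpha:
                flagged.append((cx, cy, cases, p_value))
    return flagged
```

The flagged circle centres can be drawn directly on a map. The sketch tests every circle independently and makes no correction for multiple testing, which a real tool would need to address.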
To enhance decision making capabilities by developing interactive GIS techniques, which provide an integrated exploratory and statistical basis for investigating spatial patterns.

Seismic data experts regularly use GIS to help them spot geoenvironmental data patterns related to seismic activity. However, the complexity of geoenvironmental processes and the noise in the spatial patterns of these variables make it very difficult to objectively compare seismic maps with other geoenvironmental maps and identify interesting patterns and relationships. To help reduce the likelihood of becoming overly subjective, a seismologist may wish to initially classify and select groups of areas with similar geoenvironmental characteristics and then perform statistical tests to investigate general differences in localised distributions of selected areas belonging to the same geoenvironmental group in the classification. An interactive version of SPIN will clearly aid the seismologist in the process of classifying and selecting these areas and in performing the statistical tests. Because the analysis task is simplified, the user can focus on looking for interesting patterns and testing a great number of alternative hypotheses.

To deepen the understanding of spatio-temporal patterns by visual simulation.

Imagine a biodiversity researcher wants to investigate the migratory flight route of a flock of storks travelling from Europe to Africa. Suppose the researcher uses a global positioning system (GPS) to track the progress of these birds and wishes to visually simulate the migration to provide an overview of the migratory route, show the speed of different parts of the journey and identify areas where the storks rested along the way. SPIN will provide the capability to develop and play back this type of simulation over the internet. The same technique can be applied in many other areas. For example, logistics companies may want to use it to help keep track of orders and optimise transport routes, and transport planners may want it to aid the development of integrated transport networks.

To publish and disseminate geographical data mining services over the internet.

Suppose the various analysts described above (i.e. the statistical officer, the environmental decision maker, the seismic data expert and the biodiversity researcher) want to distribute their results quickly and cost effectively to encourage similar applications and promote world-wide scientific exchange of their research. Furthermore, suppose they want to publish both the conclusions and the details of their entire geographical data mining investigation so that other similar research can extend, generalise and build on their analyses. Imagine also that these researchers want to enable others to access and use the same analysis tools that were available to them. To realise all of this, they would probably need a relatively automatic way to plug their specific application into a Java-based internet-enabled SDMS. This would then enable anyone with a standard web browser to replicate and perform similar analyses wherever and whenever desired (see partner P4, publications 2 and 9; partner P1, publications 1, 2; partner P3, publications 1, 2). The proposed SDMS, SPIN, will provide this type of capability in an integrated, organised fashion.

B4. Contribution to programme/key action objectives

The proposal contributes in several ways to the IST programme objective of building key, user-friendly applications that enable the potential of the information society.

Merging data mining and GIS based technology offers exciting new possibilities for spatial data research that is applicable in a wide variety of problem domains. Much expert geographical analysis has been restricted by prescribing in advance, and then exclusively following, either a statistical or a GIS based approach. When both approaches have been applied, error prone and cumbersome data transfer between different applications has been necessary. Nonetheless, useful information has been extracted from georeferenced data much more effectively by employing both approaches simultaneously.
Clearly an integrated SPIN will facilitate such analysis and help to develop understanding of a wide range of geographical processes faster, enhancing research and decision making in diverse application areas.

SPIN will provide a user friendly interface to advanced data mining functionality, GIS and exploratory spatial data analysis tools that can be accessed via the internet. The system will enable quick and cost effective dissemination of information via the internet and enhance web-based research capabilities.

The objective of nurturing emergent technologies is supported by the development of an innovative business model. A web-based brokering service is proposed that is designed to add value to the dissemination of data and information, providing a key to the commercialisation of the software and the service it facilitates.

The proposal contributes to CPA4 (New indicators and statistical methods) by developing new tools for extracting information from data, adapting data mining functions specifically for spatial analysis. This includes adapting methods from Bayesian statistics, machine learning and other adaptive techniques so that they can be launched from an integrated environment, which assists experimental comparison of their relative strengths and weaknesses.

A further contribution to CPA4 derives from developing technology for the user-friendly dissemination of statistical data. SPIN will enable the dissemination of interactive statistical maps and provide data mining services over the internet, where users need nothing but a standard web browser such as Netscape or Internet Explorer. Many of the problems relevant to this use of SPIN will be addressed in an application that aims to facilitate the analysis of census data over the internet. The proposed web-based brokering service aims to go even further by enhancing the user-friendly and cost-effective dissemination of data.

The proposed system will be generic and easily adaptable to diverse application areas, and the research is specifically relevant to the following key actions of the cross-programmatic action (CPA) of the IST programme:

Key Action I.4: Systems and services for citizen administration; systems enhancing the efficiency and user-friendliness of administrations. This is addressed in work package WP9 by the application to develop user friendly dissemination of statistical data.

Key Action I.5: Intelligent environmental monitoring and management systems; environmental risk and emergency management systems (in conjunction with hazards and earth observation).
These are addressed in work package WP8 by an application of the proposed system to the analysis of seismic and volcano data.

Key Action II.3.2: New methods of work and electronic commerce. New market mediation systems, to develop innovative market place concepts and technologies. This will be addressed in the web-based brokering application in work package WP9.

Key Action II.4.3: Digital object transfer. This will be addressed by a specific task within work package WP2 that aims to develop efficient and appropriate means of distributing data and maps over the internet.

Key Action III.1: The future priority action line concerning geographic information is also clearly addressed.

B5. Innovations

State of the Art

Contemporary GIS are monolithic closed systems that can be difficult to use and are usually very expensive. In the last few years a new generation of GIS has been emerging that enables interactive, dynamic maps to be disseminated via the Internet (see partner P1, publications 1, 3; partner P4, publication 4; partner P3, publications 10, 11). So far, most of these systems are confined to projecting descriptive statistical displays, such as histograms or pie charts, onto geographical space (maps). As decision making and inference using these projected map displays is not always straightforward, data mining offers great potential benefits. The range of application areas is huge, with many different types of applications in, for example, statistical analysis, urban planning, environmental decision making and geomarketing.

Largely unconnected to GIS research, a wide range of analysis techniques now commonly referred to as data mining functions has been developed. These data mining functions are extensions of analytical techniques known for decades and have been packaged in various ways to form a large number of essentially very similar data mining systems (DMS). Some DMS provide user friendly interfaces and visual programming environments that the non-expert can use to help automate the search for hidden patterns in large databases. Interest in DMS has boomed in recent years, partly as a result of the packaged nature of the technology and improving graphical user interfaces, but mainly because of the pressing need for commercial enterprises to make returns on often large investments in data warehouses.

Since the GIS revolution in the early 1980s there has been an explosion of geographically referenced information forming a rapidly expanding geocyberspace (see partner P4, publication 1), wherein much of the data is also temporally referenced. Commercial enterprises and government organisations have been swamped by this data explosion, with few tools to extract useful information that can be applied in decision making contexts to solve problems and improve their function. By combining the strengths of GIS and DMS, the proposed SDMS, SPIN, will have even greater functionality and should be a considerable help to decision makers and spatial analysts charged with the task of backing up their intuitive insights using real world data. Some of the integrated components not currently present in either GIS or DMS include exploratory spatial data analysis methods that search for geographical patterns and relationships in complex space-time-attribute domains.

Extending and integrating GIS and DMS to develop an internet enabled geographical data mining system is a logical progression for spatial data analysis technology. This development is poised to play a major role in the proposed terms of reference 1999-2003 of the Commission on Visualisation and Virtual Environments of the International Cartographic Association (MacEachren and Kraak 1999 [1]), and a great deal of research effort can be expected to be needed in this direction in the coming years.
DMS and GIS are quite complex tools with wide ranging functionality and capabilities, so the SPIN! Consortium does not propose to start from scratch, but to build on existing tools. Many of these existing tools have been developed by various partners during 4th framework research, and many have passed the prototype stage and have well established user communities. One major advantage of the SPIN! Consortium is that the software developers will have access to the source code of all the various module components, which facilitates a seamless integration of all the technology in SPIN. (This would not be possible if the system were to be developed on top of third party proprietary products.) The system will be based on open standards such as Java and TCP/IP.

The evolutionary prototype development approach proposed has many benefits. Users will be able to provide feedback on SPIN prototype requirements and performance throughout the project (starting from day one), and progressive prototype versions of the system will guide the development effort to fulfil user expectations by the end. The early development of prototypes is known to be one of the most effective counter-measures to limit the risks of such software development.

Technological & Scientific Advances

First system that tightly integrates state of the art GIS and data mining functionality in an open, extensible, internet-enabled plug-in architecture.

The system will integrate rich functionality: a data mining platform (see partner P1 and P5, publication 10); an internet enabled tool for interactive manipulation of statistical maps (P1, publications 1, 2); an application for exploratory spatial data analysis (partner P4, publication 2); new modules for spatial data mining (see below); new modules for visualising temporal data and spatial data mining results; and a Java based GIS (partner P6, publication 1).
The generic system architecture is easily adaptable to diverse application areas such as seismic data analysis and hazard management, environmental decision making, and census data dissemination.

[1] See the following URL for details: http://www.geovista.psu.edu/ica/icavis/terms.html

Adapting machine learning methods to spatial analysis.

It is generally accepted that currently there exists no single data mining or machine learning method that is efficacious in every case. Available methods differ in complexity, representational power, accuracy, scalability, comprehensibility, their ability to cope with noise and missing values, and many other factors. Methods based on different approaches make different assumptions about the data being analysed, which may not matter in some cases and may be totally inappropriate in others. It is therefore important that users have access to a variety of spatial data mining methods, and to help in choosing and combining whichever methods seem most appropriate for their task. In developing SPIN we will advance the state of the art in spatial data mining in several ways.

Symbolic machine learning methods will be adapted to spatial data analysis, in particular inductive logic programming (ILP) algorithms for the discovery of subgroups and spatial association rules. Efficient methods for the discovery of (non-spatial) association rules have been proposed in the field of data mining, most of which can deal with propositional, or zeroth-order, representations; however, they are unsuitable for expressing higher order spatial relationships. ILP is based on first-order predicate logic, which allows for the representation of relations such as adjacent_to, inside, and close_to. This makes ILP a natural and promising approach to many forms of spatial data mining. Methods for the induction of first-order rules have been extensively investigated within ILP. Some of these methods have already been applied to the automated interpretation of topographic maps (see partner P2, publications 2, 3). In this case, symbolic first-order descriptions of cells of a map are automatically extracted from a vector representation of maps stored in an object-oriented database. Intelligent map feature extraction is a challenging task.
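As an illustration of what a spatial association rule over first-order relations looks like, the following minimal sketch computes the support and confidence of one rule over a handful of ground facts. Only the relation name adjacent_to comes from the text above; the objects, the predicates `large_town`, `motorway` and `high_income`, and the rule itself are invented for the example, and the code is a plain rule evaluator rather than an ILP learner.

```python
# Hypothetical ground facts (all object names and values are invented).
adjacent_to = {("t1", "m1"), ("t2", "m1"), ("t3", "m2"), ("t4", "r1")}
is_a = {
    "t1": "large_town", "t2": "large_town", "t3": "large_town",
    "t4": "large_town", "t5": "large_town",
    "m1": "motorway", "m2": "motorway", "r1": "river",
}
high_income = {"t1", "t2", "t5"}


def rule_stats(reference_objects):
    """Support and confidence of the rule
    large_town(X) AND adjacent_to(X, Y) AND motorway(Y) => high_income(X).
    Y is existentially quantified: one adjacent motorway is enough."""
    body = {
        x for x in reference_objects
        if any(a == x and is_a.get(y) == "motorway" for a, y in adjacent_to)
    }
    both = body & high_income
    support = len(both) / len(reference_objects)
    confidence = len(both) / len(body) if body else 0.0
    return support, confidence


towns = [name for name, kind in is_a.items() if kind == "large_town"]
support, confidence = rule_stats(towns)
```

The body of the rule links two different objects through a relation, which is what a single propositional table of per-town attributes cannot express without first flattening the neighbourhood information into extra columns.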
Advances in intelligent map feature extraction would open new possibilities for enhancing intelligent automated map design. First-order descriptions of maps could also be fed into (future) first-order learning systems as background knowledge, e.g. for topographically informed subgroup discovery.

Combining the expressive power of first-order learning methods with the coherence and scalability of Bayesian statistics.

First-order machine learning methods tend to be search intensive, and when dealing with large sets of data and highly dimensional dependencies, scalability might become a problem. To overcome this problem, we will investigate how scalability can be improved by the use of adaptive sampling, i.e. active learning techniques based on Bayesian Decision Theory. This will also help to bridge the gap between first-order learning and statistics.

Applies advanced Bayesian classification, prediction, and interpolation to spatial data.

In recent years computationally intensive Bayesian methods have been developed that compare favourably with classical approaches. Instead of selecting an “optimal” model, they generate a whole distribution of models, which characterises the uncertainty about the model in the light of the available data. On the one hand, they derive predictive distributions for new inputs that reflect the actual uncertainty and information. On the other hand, they allow a rigorous assessment of the adequacy of different model types. This method has already been successfully applied by partner P1 (see partner P1, publication x13) to credit scoring and will now be adapted to spatial data.

Automating the exploratory spatial data analysis of geographical data.

Various exploratory spatial data analysis tools have been developed by partner P4 (see partner P4, publication 2) and made available for research via the internet. However, the current format of the application may be criticised in that it is not user-friendly enough, and users are restricted to a select few input and output data formats.
The search methods used in the current application are unintelligent brute force heuristics that could be improved by applying artificial intelligence methods to direct the search. Early experiments by partner P4 indicate that there is great potential for these heuristics, especially when analysing data in a multi-attribute space-time-attribute tri-space (see partner P4, publication 3). The belief is that improving the quality of the search procedure will allow much larger and more complex data sets to be investigated in a scalable way. To address the need for the system to communicate with other packages, both local and remote, the tool developed will make use of CORBA for data input and results output. Partner P4 also plans to develop improved visualisation tools that let users view the outputs of the tools in an easy and obvious way that aids their understanding of the results instead of hampering it, as many current tools do.
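Returning to the Bayesian approach described earlier in this section, the idea of keeping a whole distribution of models rather than one "optimal" model can be shown with a deliberately tiny sketch. It assumes a one-parameter model (the probability p of an event), a uniform prior and a random-walk Metropolis sampler; the counts, the function names and the tuning values are hypothetical and say nothing about the models partner P1 actually uses.

```python
import math
import random


def log_posterior(p, events, trials):
    """Log of (uniform prior x binomial likelihood), up to a constant."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return events * math.log(p) + (trials - events) * math.log(1.0 - p)


def sample_posterior(events, trials, n_samples=20000, step=0.1, seed=0):
    """Random-walk Metropolis sampler; returns samples after burn-in."""
    rng = random.Random(seed)
    p = 0.5
    lp = log_posterior(p, events, trials)
    samples = []
    for _ in range(n_samples):
        candidate = p + rng.gauss(0.0, step)
        lc = log_posterior(candidate, events, trials)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lc - lp)):
            p, lp = candidate, lc
        samples.append(p)
    return samples[n_samples // 10:]  # drop the first 10% as burn-in


# Hypothetical data: 12 events observed in 40 trials.
samples = sample_posterior(events=12, trials=40)

# Predictive probability that the next trial is an event: the average
# over all sampled models, not the prediction of a single best model.
predictive = sum(samples) / len(samples)

ordered = sorted(samples)
interval = (ordered[int(0.025 * len(ordered))],
            ordered[int(0.975 * len(ordered))])
```

For this simple model the exact answer is known (the posterior mean is 13/42), which makes the sketch easy to check; the spread of `interval` is the uncertainty that a single point estimate would hide.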
Uses knowledge-based systems technology to apply expertise in thematic cartography to the visual mining of spatial and temporal data.

There is currently a recognised need to combine cartographic visualisation (that is, building maps to facilitate visual data exploration) with data mining (see, for example, the special issue of Int. J. Geographical Information Science on Visualization for Exploration of Spatial Data, v.13(4), June 1999). Within the project we plan to develop both a cartographic interface for preparing (selecting, preprocessing, etc.) data for data mining and an interactive map presentation of data mining results, dynamically linked with specially designed non-geographic illustrations. Special attention will be paid to the interactivity of maps and other graphical displays and to the visualisation and analysis of the temporal aspect of data.

Use of new techniques for efficient distribution of large maps over low bandwidth networks.

Special attention will be given to developing efficient mechanisms that reduce the amount of data that has to be transferred from the client to the server.
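The text does not say which mechanisms will be used to reduce the volume of map data. One well-known option for vector maps is line simplification before transfer, and the sketch below uses the Douglas-Peucker algorithm purely as a stand-in illustration. The function names and the tolerance value are assumptions for the example.

```python
import math


def _point_segment_distance(p, a, b):
    """Distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Douglas-Peucker: drop vertices closer than tolerance to the chord."""
    if len(points) < 3:
        return list(points)
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            dmax, index = d, i
    if dmax <= tolerance:
        return [points[0], points[-1]]
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right  # the split vertex appears in both halves


boundary = [(0, 0), (1, 0.05), (2, -0.05), (3, 0), (4, 3), (5, 0)]
reduced = simplify(boundary, tolerance=0.1)
```

A larger tolerance could be used for overview zoom levels and a smaller one when the user zooms in, so that only as much detail as the display can show is sent over the network.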
  • SPIN!, IST-99-10536, 15.06.1999 10 B1. Workpackage list Work- Workpackage title Lead Person- Start End Phas Deliv- package contract months4 month5 month e7 erable No2 or 6 No8 No3 Coordination WP1 P1 34 0 36 - D1.1- 1.4 Identify user needs, define and WP2 P1 69 0 36 - D2.1- realize a generic system 2.6 architecture that integrates GIS and Data Mining functionality WP3 Extend machine-learning P2 42 0 36 - D3.1- methods to spatial mining 3.9 WP4 Generalize Bayesian Markov P1 40 0 36 - D4.1- Chain Monte Carlo to spatial 4.7 mining WP5 Adapt and integrate methods for P4 40 0 36 - D5.1- spatial pattern analysis 5.7 WP6 Develop support of visual P1 40 0 36 - D6.1- analysis of time-dependent 6.6 spatial data WP7 Develop methods for P1 40 0 36 - D7.1- visualization of Data Mining 7.6 results within GIS WP8 Application to seismic and P7 70 0 36 - D8.1- volcano data 8.9 WP9 Application to web-based P8 49 0 36 - D9.1- dissemination of data from 9.6 statistical offices 2 Workpackage number: WP 1 – WP n.- 3 Number of the contractor leading the work in this workpackage. 4 The total number of person-months allocated to each workpackage. 5 Relative start date for the work in the specific workpackages, month 0 marking the start of the project, and all other start dates being relative to this start date. 6 Relative end date, month 0 marking the start of the project, and all end dates being relative to this start date. 7 Only for combined research and demonstration projects: Please indicate R for research and D for demonstration. 8 Deliverable number: Number for the deliverable(s)/result(s) mentioned in the workpackage: D1 - Dn. 10
  • WP10 | Develop a business model for web based information and service brokering with geo-referenced data | P6 | 24 | 0 | 36 | - | D10.1-10.5
WP11 | Dissemination | P8 | 38 | 0 | 36 | - | D11.1-11.5
TOTAL | | | 482 | | | |
Distribution of Workload on work packages
Workpackage | P1 | P2 | P3 | P4 | P5 | P6 | P7 | P8 | Total
Coord. WP1 | 28 | | | 6 | | | | | 34
Techn. Dev. WP2 | 30 | | 2 | 9 | 18 | 10 | | | 69
ML WP3 | 18 | 24 | | | | | | | 42
Bayes WP4 | 30 | | | 4 | | 6 | | | 40
ESDA WP5 | | | | 36 | | | | | 36
Vis. Spa-T WP6 | 28 | | | 12 | | | | | 40
Vis. DM WP7 | 28 | | | 12 | | | | | 40
Seis. Dat. WP8 | 3 | | 18 | 3 | 2 | 12 | 32 | | 70
Stat. Off. WP9 | 3 | | | 6 | 2 | 4 | | 34 | 49
Web-Brok. WP10 | 2 | | | | 12 | 10 | | | 24
Dissem. WP11 | 2 | | | 8 | 2 | 14 | 4 | 8 | 38
Total | 172 | 24 | 20 | 96 | 36 | 56 | 36 | 42 | 482
  • B2. Deliverables list
Deliverable No [9] | Deliverable title | Delivery date [10] | Nature [11] | Dissemination level [12]
D1.1 | Project workplan | 3 | R | PU
D1.2 | Reports for EC | period. | R | PU
D1.3 | Project handbook | 6 | R | PU
D1.4 | Project meetings | period. | R | PU
D2.1 | System design document | 8 | R | CO
D2.2 | Prototype 0 (incl. documentation) | 12 | P | CO
D2.3 | Implementation of efficient methods for map transfer | 15 | P | CO
D2.4 | Prototype 1 (incl. documentation) | 18 | P | CO
D2.5 | Prototype 2 (incl. documentation) | 30 | P | CO
D2.6 | Revision Release Prototype 2 (incl. documentation) (Final Release) | 32 | P | CO
D3.1 | Theoretical report on spatio-temporal subgroup discovery | 6 | R | PU
D3.2 | Theoretical report on adaptive sampling | 21 | R | PU
D3.3 | Theoretical report on spatial association rules | 5 | R | PU
D3.4 | Specifications of the descriptions to be automatically extracted from vectorized maps | 15 | R | CO
D3.5 | Implementation of subgroup discovery | 8 | P | CO
D3.6 | Implementation of adaptive sampling for subgroup discovery | 23 | P | CO
D3.7 | Implementation of spatial association rules | 11 | P | CO
D3.8 | Software for the extraction of symbolic descriptions from vectorized maps | 18 | P | CO
D3.9 | Report evaluating the application of first-order learning methods to spatial data | 36 | R | PU
D4.1 | Report reviewing current Bayesian approaches | 6 | R | PU
D4.2 | Software Implementation for bootstrap | 11 | P | CO
D4.3 | Report on advanced spatial models and corresponding Bayesian models | 15 | R | PU
D4.4 | Implementation of MCMC | 18 | P | CO
D4.5 | Implementation of model selection | 28 | P | CO
D4.6 | Performance evaluation and guidelines | 36 | R | PU
D4.7 | Generic software library for spatial data transformations | 6 | P | CO
D5.1 | Theoretical paper on algorithms for handling interaction with spatial location | 5 | R | PU
D5.2 | Software for handling interaction with spatial location | 11 | P | CO
D5.3 | Theoretical paper evaluating statistical clustering tests | 14 | R | PU
D5.4 | Implementation of selected statistical clustering tests | 18 | P | CO
D5.5 | Theoretical paper on algorithms for multiple search | 24 | R | PU
D5.6 | Implementation of algorithms for multiple search | 30 | P | CO
D5.7 | Reports on testing and evaluation of Spatial Analysis software tool | 36 | R | PU
D6.1 | Rule base on application of visualisation and interaction techniques depending on characteristics of data and the type of their time variation | 16 | P | CO
D6.2 | Software library implementing the proposed methods | 26 | P | CO
D6.3 | Expert system engine performing selection of methods according to characteristics of data | 30 | P | CO
D6.4 | Theoretical paper on algorithms for investigation of temporal changes | 18 | R | PU
D6.5 | Implementation of algorithms for investigation of temporal changes | 24 | P | CO
D6.6 | Evaluation report | 36 | R | PU
D7.1 | Description of the presentation methods proposed to apply to results of the considered data mining methods | 6 | R | PU
D7.2 | Implementation of visualization method for subgroup discovery | 11 | P | CO
D7.3 | Implementation of visualization method for spatial association rules | 12 | P | CO
D7.4 | Implementation of visualization method for Bayesian classification | 17 | P | CO
D7.5 | Implementation of best-practice methods for visualisation in ESDA | 17 | P | CO
D7.6 | Report on current & potential application methods in ESDA | 36 | R | PU
D8.1 | Definition of user requirements | 3 | R | PU
D8.2 | Description of the methods of space-time analysis and data mining of seismic data | 10 | R | PU
D8.3 | Description of the methodology for designing seismic hazard information models | 15 | R | PU
D8.4 | Software implementing the proposed methods within the SPIN! architecture | 26 | P | CO
D8.5 | Evaluation report | 24 | R | PU
D8.6 | Application of the software tools to the seismic active Eastern Mediterranean region | 34 | P | CO
D8.7 | Application of the software tools to the high risk Merapi volcano | 36 | P | CO
D8.8 | Integration of continuous monitoring data into the analysis process | 36 | P | CO
D8.9 | Report on the application of Spatial Mining to seismic and volcano data | 36 | R | PU
D9.1 | User requirements document for dissemination of statistical data | 3 | R | PU
D9.2 | Description of data model | 12 | R | CO
D9.3 | A prototype web site with interactive thematic maps that can be accessed over the internet | 16 | P | CO
D9.4 | Prototype web-site based on SPIN prototype 2 | 30 | P | CO
D9.5 | Report about different user acceptance, recommendation for use, etc. | 24 | R | PU
D9.6 | Report: recommendation of use | 36 | R | PU
D10.1 | Define requirements for web-brokering | 3 | R | PU
D10.2 | Report describing existing brokering services, business model and property of rights problematic | 8 | R | PU
D10.3 | Report addressing technical infrastructure | 24 | R | CO
D10.4 | Prototype web-site for web-brokering | 30 | R | PU
D10.5 | Final report on web-brokering | 36 | R | CO
D11.1 | Project web page | 3 | R | PU
D11.2 | Project description for the general public | 2 | P | PU
D11.3 | First dissemination workshop | 24 | O | PU
D11.4 | Second dissemination workshop | 36 | O | PU
D11.5 | Feasibility study about commercialization | 33 | R | PU
[9] Deliverable numbers in order of delivery dates: D1 – Dn.
[10] Month in which the deliverables will be available. Month 0 marking the start of the project, and all delivery dates being relative to this start date.
[11] Please indicate the nature of the deliverable using one of the following codes: R = Report, P = Prototype, D = Demonstrator, O = Other.
[12] Please indicate the dissemination level using one of the following codes: PU = Public; PP = Restricted to other programme participants (including the Commission Services); RE = Restricted to a group specified by the consortium (including the Commission Services); CO = Confidential, only for members of the consortium (including the Commission Services).
  • Introduction to workpackages
The workpackages fall into several categories: technology development, research, application, and exploitation. Figure 3 shows the main dependencies between the workpackages, but does not display the feedback mechanisms that will be set up between all workpackages, as described in the section on project management.
Building a spatial mining system is a demanding task. It requires expertise in many fields, including Geographic Information Systems, Cartography, Statistics, Machine Learning, and Databases, as well as excellent software engineering skills. The consortium has been carefully chosen to ensure uncompromising competence in all these areas. It includes two industrial partners active in Data Mining and Geographic Information Systems (partners P5 and P6), a university and a national research center active in the areas of Data Mining, Machine Learning, and GIS (partners P2 and P1), an institute for geography active in Exploratory Spatial Data Analysis since the 1980s (partner P4), a university with a leading role in the dissemination of statistical data (partner P8), and two institutes active in seismic data research (partners P3 and P7). Each partner in the consortium has a unique area of competence not shared by the others, and brings its expertise as well as its technologies into the consortium.
[Figure 3. Main dependencies between work packages. Boxes shown: Coordination; Dissemination; Design, integrate GIS & DM platform; Adapt Bayes Markov Chain Monte Carlo to Spatial Mining; Develop, adapt Machine Learning algorithms to Spatial Mining; Develop, adapt Spatial Point Pattern Analysis; Methods for spatio-temporal visualization; Visualization of Data Mining results; Extending system for application to Seismic Data; Application to Statistical Offices; Web-Based Information Brokering. Legend: Technology, Research, Application, Exploitation.]
  • Risk management
Many research and technology development projects fail because the typical risks of such projects are not taken into account. The workplan has therefore been designed to address typical causes of failure in advance. The main approaches taken towards risk management are:
- software reuse and incremental evolution of existing technology
- modular design of software components (plug-in architecture)
- strong user involvement
- early delivery of prototypes
Involving users at all stages of the system's development is of utmost importance. The development process will improve the system iteratively and incrementally, starting from an initial prototype that is delivered to the users so that they can evaluate it and suggest generic design modifications. The users will be involved in defining the system analysis requirements and in designing and testing the system right from the start. The users are responsible for providing evaluation reports, which serve as input to specific system design modifications. Since important modules of the final system already exist in a preliminary and non-integrated form, the users will be trained in using the individual systems at an early stage. This will help to shape their expectations and provide valuable feedback to the software developers. The users in work package WP9 already use the GIS technology developed by partner P1, so they can formulate specific requirements at an early stage, minimising the likelihood that generic system requirements will undergo continuous change.
The base integrating system platform will be an object-oriented, plug-in style architecture to facilitate technological integration. The dependencies between work packages are reduced, as plug-in components can be incorporated incrementally as they become available. In this way, revisions to the internal structure of either the client or the server should not affect the other parts.
CORBA and RMI will be evaluated as integrating middleware. Strong modularization should minimise the dangers of integrating technology developed separately by different groups. If for some reason one module were not delivered on time, this would not necessarily affect the implementation of other modules. Since partners P1, P3, and P4 have already implemented major parts of the existing technology in Java, the risk of technology integration problems is low. The Unified Modelling Language (UML) will be used for documentation and design to ensure product quality.
Potential performance bottlenecks should be easy to spot at an early stage by applying the existing technology to test data provided by the users. The system needs to be interactive, and users should not be made to wait too long for analysis results. Performance issues are addressed in a special task within WP2.
Our approach to risk management has been tightly integrated within the overall technology development cycle of SPIN. Since an evolutionary approach containing several iterations has been chosen, all work packages start at the kick-off meeting and end with the final workshop.
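The plug-in idea behind this risk strategy can be illustrated with a minimal Java sketch. It is not the SPIN! Extension API, which this proposal does not specify: the names `SpatialMiningPlugin`, `PluginRegistry` and `SubgroupDiscoveryStub`, and the string-based `run` signature, are assumptions made for illustration only. The sketch shows how a base system could accept modules as they become available while a module that has not yet been delivered leaves the others unaffected.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical extension point implemented by every analysis or visualisation module. */
interface SpatialMiningPlugin {
    String name();
    String run(String datasetId);
}

/** Hypothetical registry of the base system; modules are added as they become available. */
class PluginRegistry {
    private final Map<String, SpatialMiningPlugin> plugins = new LinkedHashMap<>();

    void register(SpatialMiningPlugin plugin) {
        plugins.put(plugin.name(), plugin);
    }

    /** Runs the named module, or returns empty if that module has not been registered. */
    Optional<String> run(String pluginName, String datasetId) {
        SpatialMiningPlugin plugin = plugins.get(pluginName);
        return plugin == null ? Optional.empty() : Optional.of(plugin.run(datasetId));
    }
}

/** Stand-in for a real module; it only reports what it was asked to do. */
class SubgroupDiscoveryStub implements SpatialMiningPlugin {
    public String name() { return "subgroup-discovery"; }
    public String run(String datasetId) { return "subgroups for " + datasetId; }
}

class PluginDemo {
    public static void main(String[] args) {
        PluginRegistry registry = new PluginRegistry();
        registry.register(new SubgroupDiscoveryStub());
        // A registered module runs normally.
        System.out.println(registry.run("subgroup-discovery", "census"));
        // A module that is not registered yields an empty result instead of a failure.
        System.out.println(registry.run("mcmc", "census"));
    }
}
```

Returning an empty `Optional` for a missing module is one possible design choice; a real platform might instead hide the corresponding menu entry or report the missing module to the user.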
  • Gantt Chart
  • Main stages of technology development cycle
Month | Event | Description of event
1 | Kick-off meeting | A kick-off meeting will be held, where the users are informed in detail about the prospects of developing an SDMS, where alternative approaches will be discussed, and where the users will articulate specific expectations and requirements for the system. There will also be a tutorial session on Spatial Mining based on the existing technology.
3 | User requirements report | The developer teams and the users will jointly define the user requirements report, which is due by month 3 and for which the users are responsible. This will be a major input for the system design.
5 | Test applications | The existing, non-integrated systems will be applied to example data sets to further clarify user needs, to spot performance bottlenecks at an early stage, etc.
8 | Design specification | The design specification is due in month 8. It is located mainly in WP2, but all work packages will contribute from their perspective. The report defines the intended applications at a detailed level. On the basis of this document, the integration of the existing technologies will start and they will be merged into a single, coherent architecture.
12 | Developer version (prototype 0) | A developer version (prototype 0) is due by month 12. This will be used for integrating the modules developed in WP3-7, which will start at month 12. Users will get access to this version as a technology preview.
15 | Revised system design document | Initial feedback from users and developers will be used to produce a revised system design document, which is due in month 15.
18 | Prototype 1 | The revised design will be used for developing prototype 1, which is due in month 18. In this prototype, functionality from all work packages WP3-WP7 will be integrated; however, some functionality will still be missing (e.g. adaptive sampling for subgroup discovery in WP3). This prototype will be delivered to the users, who will use it in their experimental applications.
24 | User evaluation report | Users will evaluate whether the system meets the requirements specified in the user requirements report, and whether it meets the system design. The users will write an evaluation report, which is due in month 24. In this month, an external workshop will be held (WP11), where additional user groups and partners for commercial exploitation (WP10) will be targeted. Users will have installed web-sites that are accessible internally and, in part, externally, and that feature initial applications of the technology.
27 | Final design document revision | The user evaluation of prototype 1 will lead to modifications of the system design; the final design document will be delivered in month 27.
30 | Prototype 2 | The final design will be the input for the development of prototype 2, which is due in month 30. It will integrate all technology developed in work packages WP3-WP7, and will be delivered to the users. With the full functionality available, the users will work intensely on their applications. The web-sites should be publicly accessible, so that feedback from a wider audience can be gathered.
32 | Revision release of prototype 2 | Experience in applications will lead to a revision release of prototype 2 in month 32. The revision will cover the base system as well as the modules from work packages WP3-WP7.
36 | Final user evaluation; dissemination workshop | At the end of the project, the users will deliver a report describing their applications, and they will give a final evaluation. A workshop for dissemination to a wider audience, for identifying partners for follow-up projects (WP11), and for identifying partners for potential commercialisation (WP10) will be held in this month.
  • Pert diagram
The diagram shows dependencies between tasks. To give a better overview, we have grouped tasks by category. Task numbers refer to the Gantt chart, which shows the exact start and end dates of tasks.
[Pert diagram. Groups of tasks in reading order, with milestone months where given:
- Kick-off meeting: 2.1 (month 1)
- User requirements: 8.1, 8.2, 9.1, 10.1
- Visualization requirements: 6.1, 7.1
- System design: 2.2 (month 8)
- Data Mining: 3.1, 3.3, 3.5, 3.7, 4.1, 4.2, 4.7, 5.1, 5.4
- Visualization: 7.1-7.2
- Prototype 0: 2.4 (month 12)
- Test & Evaluation: 8.3, 9.2, 9.3, 9.4, 10.2
- Design revision: 2.2 (month 15)
- Data Mining: 3.4, 3.8, 4.3, 4.4, 5.3, 5.6
- Visualization: 6.1, 6.4
- Prototype 1: 2.5 (month 18)
- Seismic data & statistical offices: 8.4, 8.5, 9.4, 9.5, 10.3, 10.4, 11.3
- Evaluation: 2.2 (month 24)
- Data Mining: 3.2, 3.6, 4.5, 5.2, 5.5
- Visualization: 6.2, 6.3, 6.5
- Prototype 2: 2.6 (month 30)
- Real-world application: 8.6, 8.7, 8.8, 8.9, 9.6, 9.7, 10.5
- Final workshop: 11.5 (month 36)]
  • Work package description
Co-ordination
The project brings together researchers, software developers, and users from a number of European countries, with different backgrounds and different approaches to spatial analysis and geographical modelling. To manage technology development and research effectively, and to exploit the component tools and the system, work package WP1 is devoted to co-ordination. Special attention has been given to defining clear responsibilities and modular work packages and deliverables. The SPIN consortium will meet approximately every four months to establish and maintain an effective team. The management plan is based on one applied successfully in an EU project co-ordinated by partner P1, which is detailed in section C5 below.
Technology development
WP2 has the objective of designing an integrated system for Data Mining and GIS. This work package has the overall task of technologically integrating the existing GIS and Data Mining software and of incorporating the modules developed in the other work packages in a coherent manner. It is the project's technological hub, to which all partners will deliver, and whose deliverables all partners will need to access at some point. This will serve as a technological basis. We conceptually distinguish a base system and an integrated Spatial Mining system.
Figure 4. The basic architecture of SPIN. Spatial mining and visualization methods can be added as plug-ins to the base system. Clients can access the system over the internet.
  • The base system contains
- an internet-enabled GIS for automatic generation of interactive thematic maps
- Data Mining methods for nearest neighbour, decision trees, association rules, subgroup discovery, and inductive logic programming
- visualisation for these methods
- data transformation capabilities for discretization, restriction, projection, union, join, and calculated rows
- access to heterogeneous data sources (JDBC-compliant databases, ODBC, flat files, spatial data interfaces etc.), also over the internet
- facilities for organising and documenting analysis tasks.
The existing Data Mining methods complement the spatial mining methods in the task of “explaining” spatial patterns in terms of non-spatial attributes.
The internet-enabled base GIS module contains facilities for interactive manipulation of thematic maps. To provide automated visualisation, the GIS incorporates the knowledge of thematic cartography in the form of generic, domain-independent rules. To choose adequate presentation techniques for given data, it takes into account data characteristics and relations among data components or attributes. The automation of map generation frees the user from having to think about how to present the data and from the routine work of map building, and allows the user to concentrate on the analysis of the data.
This work package includes the steps of requirement analysis, design, implementation, testing, and documentation. Building the base system requires integrating an already existing GIS tool and an existing Data Mining platform, both developed by partner P1. For tight integration, a common Task Manager, Data Management Layer, Extension API, and user interface have to be defined and implemented. The integrated system incorporates the Spatial Mining and visualisation methods developed in WP3-7 into the base system.
The main inputs of this work package are the existing Data Mining and GIS systems and the modules developed in WP3-7; the main output will be the integrated system. This integrated system will be developed in three main stages: prototype 0 (developer version), prototype 1, and prototype 2. User feedback will be gathered and evaluated from the first day on and will be used for improving the system.
Research
Work packages WP3, WP4, and WP5 develop methods for Spatial Data Mining that can be added as plug-ins to the base system. A variety of methods have been selected for implementation, partially depending on previous experiences and results of the partners. Each partner has chosen to adapt a method to whose advancement it has already made a theoretical and practical contribution, so that it is well acquainted with the subtleties of the chosen method. By combining the project partners' expertise, a broad range of advanced Data Mining techniques will be covered, from Bayesian Statistics (Partner P1, publications 6, 8, 9) and Neural Networks (Partner P1, publication 7) to symbolic approaches from Machine Learning and Inductive Logic Programming (Partner P1, publications 4, 10, 11; Partner P2, publications 1, 2, 3) and genuine approaches to Spatial Cluster Analysis (Partner P4, publications 2, 4). This gives the project a unique blend of depth of expertise and breadth of methods covered. Since all these methods can be launched within a single, coherent platform, the project can also contribute to a comparison of the relative strengths and weaknesses of the methods and develop guidelines for their use in spatial mining.
  • All these work packages include a) a state-of-the-art review; b) theoretical advances, which will be communicated in a report; c) implementation and validation of the methods; d) integration with the base system; e) application to real-world tasks; f) documentation and a final report. These stages are synchronised with the technology development cycle. These work packages have as their input the partners' previous theoretical and practical work and will have as their main output a theoretical description of the respective methods.
Machine Learning (WP3). This work package is mainly concerned with the adaptation of symbolic machine learning methods to spatial data analysis. In particular, the methods to be adapted are Inductive Logic Programming algorithms for the discovery of subgroups and spatial association rules. They tend to be search intensive, and when dealing with large data sets and high-dimensional dependencies, scalability may become a problem. Moreover, most have been developed to satisfy the classical properties of consistency and completeness, whereas in spatial data mining the interest is in detecting patterns that satisfy minimum criteria for support and consistency. Adaptation of these machine learning tools will be based on the use of adaptive sampling, i.e. active learning techniques based on Bayesian Decision Theory, or on more efficient search strategies. Another contribution of this work package is the definition of appropriate algorithms for the automated extraction, from vectorised maps, of symbolic descriptions of parts (e.g., cells) of a map.
Bayesian Statistics (WP4). A spatial relation may be described by a number of different models, leading to widely varying results. Currently the support for assessing and selecting models in GIS is very limited.
Based on the extrapolation of the uncertainty of individual predictions of different models, we will develop methods for a well-founded selection or combination of models. In recent years, computationally intensive Bayesian methods have been developed that compare favourably with classical approaches. Instead of selecting an “optimal” model, they generate a whole distribution of models, which characterises their uncertainty in the light of the available data. On the one hand, they derive predictive distributions for new inputs reflecting the actual information. On the other hand, they allow a rigorous assessment of the adequacy of different model types. Partner P1 (publications 8, 9) has developed Bayesian classification methods which use a Bayesian ensemble of decision trees or neural networks. These methods have already been successfully applied to credit scoring and will now be adapted to spatial data.
Exploratory Spatial Data Analysis (WP5). This work package will explore ways of extending existing methods of spatial pattern detection. Currently, ESDA methods tend to be concerned solely with the detection of spatial pattern and often overlook other data attributes. This shortcoming will be addressed by extending existing tools developed by partner P4 to handle attribute interaction with spatial location and to consider how temporal changes in spatial data can be investigated (see partner P4, publications 4 and 2). The tool will be expanded to use multiple search methods in addition to the heuristic search currently used. There is also potential to investigate how different statistical tests of clustering can be used in the tool.
Work packages WP6 and WP7 develop methods for the visualisation of spatial and temporal information, and for the visualisation of the Data Mining methods developed in WP3-5.
Visualisation of spatial and temporal data (WP6). In most areas, spatially referenced data also refer to different moments or intervals in time.
The study of such data is meaningless if their development in time is not taken into account. Analysis of spatially referenced data should be supported by their visual presentation in maps. Spatio-temporal data require a substantial advancement of the traditional map form of presentation towards dynamics and high user interactivity. The work package aims to develop methods for visualising spatio-temporal data that facilitate the analysis of such data. The methods include not only the graphical presentation itself but also various data transformations and interactive manipulation of the displays.
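As an illustration of the kind of data transformation that could precede a map display of temporal change, the following Java sketch derives absolute and percentage change per region between two moments in time. It is a sketch under stated assumptions, not one of the methods the work package will deliver: the class `TemporalChange`, its method names, the region identifiers and the values are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: derives change values that a change map could display. */
class TemporalChange {

    /** Absolute change per region; regions missing at either moment are skipped. */
    static Map<String, Double> absolute(Map<String, Double> earlier, Map<String, Double> later) {
        Map<String, Double> change = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : earlier.entrySet()) {
            Double after = later.get(entry.getKey());
            if (after != null) {
                change.put(entry.getKey(), after - entry.getValue());
            }
        }
        return change;
    }

    /** Percentage change per region; regions with an earlier value of zero are skipped. */
    static Map<String, Double> percent(Map<String, Double> earlier, Map<String, Double> later) {
        Map<String, Double> change = new LinkedHashMap<>();
        for (Map.Entry<String, Double> entry : absolute(earlier, later).entrySet()) {
            double before = earlier.get(entry.getKey());
            if (before != 0.0) {
                change.put(entry.getKey(), 100.0 * entry.getValue() / before);
            }
        }
        return change;
    }
}

class TemporalChangeDemo {
    public static void main(String[] args) {
        // Made-up attribute values for two moments in time.
        Map<String, Double> year1 = new LinkedHashMap<>();
        year1.put("region-A", 100.0);
        year1.put("region-B", 50.0);
        Map<String, Double> year2 = new LinkedHashMap<>();
        year2.put("region-A", 110.0);
        year2.put("region-B", 40.0);
        year2.put("region-C", 5.0); // no earlier value, so it is skipped
        System.out.println(TemporalChange.absolute(year1, year2));
        System.out.println(TemporalChange.percent(year1, year2));
    }
}
```

Skipping regions without a value at both moments keeps the derived map free of undefined entries; an interactive display might instead mark such regions explicitly.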
  • Visualisation of Data Mining results (WP7). The form in which data mining results are presented to the user is crucial for their appropriate interpretation. Large amounts of information or complex concepts can be more easily comprehended when represented graphically. This especially applies to data and concepts having a spatial reference or distribution. The objective of this work package is to design appropriate graphical techniques to represent the results of the data mining methods developed within the project. The approach to be taken is a combination of cartographic and non-cartographic displays linked together through simultaneous dynamic highlighting of the corresponding parts (see partner P1, publication 1). The non-cartographic displays will represent the data mining results in summarised, generalised form, while maps will provide the transition from general descriptions to the individual spatial objects and phenomena characterised by them.
Application
The system will be used in several applications. One criterion for the selection of application areas is that a broad range of problem domains of special importance for the EU is covered, underlining the generality of the approach. A second criterion is that each of these areas should contribute in a unique way to evaluating and validating the adequacy of the chosen approach to Spatial Mining. This makes the evaluation process more focussed. An objective common to all application areas is to explore the applicability of advanced Data Mining methods. Specifically, spatial subgroup discovery, spatial Markov Chain Monte Carlo, and localised Spatial Point Pattern Analysis will be evaluated in each application area.
Application to Seismic Data (WP8). In WP1-7 a generic Spatial Mining System is developed. Such a system has the important advantage that it has a potentially broad range of application areas and promotes technology reuse.
However, some application areas will also need to incorporate specialised analysis methods. One of the main risks associated with the development of generic information technology is that an architecture that is not extensible may end up not addressing the real needs of the user. Work package WP8 addresses this problem in an exemplary way. This will ensure that the generic system is designed in a modular and extensible way right from the start. A key component is the plug-in architecture of the already existing Data Mining platform developed by partner P1, which allows for easy integration of new modules. The application area selected for this task is earthquake prediction. This is a well-established scientific field belonging to physical geography, where a great amount of spatio-temporally referenced data from different sources is available. Research in this area has an obvious and great potential benefit for public health and quality of life. Advances in earthquake prediction could help to prevent massive financial losses. The objective of this work package is to adapt the generic system to the specialised application area of earthquake prediction and hazard assessment by integrating methods for natural hazard assessment that have been developed by partner P3. To achieve this goal, an integration layer between the generic Spatial Mining system and the specialised methods implemented by partner P3 has to be designed. Partner P7, which has long been active in the area of earthquake prediction, will profit from this technology by getting access to advanced and complementary methods for data analysis and by getting an instrument for the web-based dissemination of research results.
Web-based dissemination of census data from statistical offices. A second application area is the analysis and web-based dissemination of census data from statistical offices.
Here the main objective is to put to practical use the timely, cost-effective dissemination of statistical information over the internet. Partner P8 has several years' experience in developing tools for web-based access to large spatial data sets and provides an academic service for access to census data. These tools are primarily for visualising database contents, data browsing, and locating and mapping spatial data, and they can handle spatial and aspatial referencing systems. Partner P8 also has access to a SUN E6500 super-server for academic applications. Additionally, the project will be supported by the national census agency, which is currently planning, together with the partner, the tools and services for public access to the forthcoming national census in 2001.
  • This work package will allow evaluation of the efficiency of the developed methods and of the responsiveness of the application, as well as of acceptance by customers of statistical offices. Potential problem areas are the availability of bandwidth, the number of concurrent users, and the size of maps and data sets. Especially if Data Mining analysis over the internet is permitted, the performance of the server will be of central importance. Experience in this application area will be crucial for improving the efficiency of the prototype 1 system (which is a task within WP2).
Dissemination and Exploitation
Web-based brokering. Statistical offices, public agencies, and scientific institutions often face the problem that their initial efforts to build up a public database are externally funded, but the maintenance of such a service is not. Funding agencies increasingly require these institutions to develop business plans for commercialising such a service in the long run (at least for non-scientific use). The aim of this work package, for which the industrial partners will be responsible, is to develop a detailed concept for a web-based information brokering service with georeferenced data as a foundation for the cost-effective dissemination of data. Web-based, interactive Spatial Mining can add tremendous value to the mere distribution of data. This added value can be the key to commercialising the distribution of data for statistical offices, public agencies, and scientific institutions. What is new about this proposal is that the customer does not need to buy or install any complex and expensive software, yet is not confined to the usual printed, non-interactive reports. An interactive thematic map is delivered over the internet using Java technology. This map can be used by the customer for further exploration as well as for presentation and decision making.
There will be different levels of service, as suggested by the following example business scenarios. The project will deliver technology to solve tasks 1-4 and will provide the technological basis for task 5. The feasibility of this concept will be tested in a demonstrator.

Scenario 1
  Customer needs: An institute for ecological studies prepares an environmental report and needs a visualisation for their vegetation data and maps to make a presentation
  Business solution: Building a thematic map for predefined data vegetation and map
  Customer supplies: Data & Maps
  Customer gets: Interactive map on the internet

Scenario 2
  Customer needs: A statistical office needs a visualisation of data about land use
  Business solution: Building a thematic map for predefined data
  Customer supplies: Data
  Customer gets: Interactive map on the internet

Scenario 3
  Customer needs: A department for urban development needs a local map showing hazard risks for decision making
  Business solution: Building a map, data & map brokering
  Customer supplies: Description of Data & Location
  Customer gets: Interactive Map with cluster detection, significance testing

Scenario 4
  Customer needs: A company running a power plant needs visualisation of monthly aggregated environmental data for monitoring.
  Business solution: Maps periodically updated from a database via the internet
  Customer supplies: Description; Location; Data that have to be periodically refreshed
  Customer gets: Interactive Map with cluster detection, significance testing, periodically updated

Scenario 5
  Customer needs: A consulting company prepares a market study for the chances of sustainable tourism; for this it needs access to data from different sources such as census data and data about nature protection and pollution in this area.
  Business solution: Geomarketing consulting
  Customer supplies: A descriptive task
  Customer gets: Interactive Map with cluster detection, significance testing, visualisation of data mining results; a summary report about Data Mining results

Dissemination. The technology developed in this project is of a generic nature and has a broad range of potential applications. Yet potential user groups may be unaware of the existence of the type of technology the project develops, or they may have false expectations about it. The aim of this work package is to address the general public, as well as potential users and partners for commercial exploitation. Dissemination will be an ongoing activity and will include organising workshops, maintaining a project web page, systematically identifying additional user groups that could act as partners in follow-up projects, and providing project descriptions for the general public.

Partner P6 will perform a feasibility study for commercialising the technology developed, especially within the application to seismic data. To this end they will actively search for a partner in the area of noise-level zoning. This is expected to become a major issue in the next two to three years in Holland, because of anticipated new legislation. This third application, where the partner will not be directly involved in the project, is also an application that demonstrates the potential of the technology for environmental decision making.

A project sheet will be due in month 3, as well as a project web site. Beginning with month 12, when a technological preview version will be available, potential additional user groups and potential customers will be systematically identified and contacted, so that knowledge about the project is spread. This activity will increase when prototype 1 becomes available in month 18. A public workshop bringing together users, developers, potential users, and other interested people will be organised in month 24. A second public workshop will be organised in month 36, concluding the project.
B3. Workpackage description

Workpackage number: WP1 - Coordination
Start date or starting event: 0
Participant number:            P1  P4
Person-months per participant: 28   6

Objectives
Overall and technical management. This will involve:
A) Overall Management
- Ensure that the various phases of the project are properly coordinated
- Development of project workplan
- Monitoring and reviewing progress of work
- Handling administrative procedures relating to the European Commission
- Reporting to the European Commission
- Supporting good communication between the partners
B) Technical Management
- Writing of a project handbook including a quality management plan
- Responsibility for critical technical decisions which affect the project as a whole
- Definition of quality standards relevant to the project and determination of how to satisfy them

Description of work
A) Overall Management
T1. Ensure that the various phases of the project are properly coordinated
T2. Development of project workplan (partners P1, P4)
T3. Monitoring and reviewing progress of work
T4. Handling administrative procedures relating to the European Commission
T5. Reporting to the European Commission
T6. Scheduling of meetings
B) Technical Management
T7. Write a project handbook including a quality management plan (partners P1, P4)
T8. Responsibility for critical technical decisions which affect the project as a whole (partners P1, P4)
T9. Define quality standards relevant to the project and determine how to satisfy them (partners P1, P4)

Deliverables
D1. Project workplan (T2)
D2. Reports for EC (T5)
D3. Project handbook (T7)
D4. Periodic project meetings (T6)

Milestones and expected result
Milestones of this workpackage are synchronized with the milestones of WP2:
M1. System design (month 8)
M2. Prototype 0 (month 12)
M3. Prototype 1 (month 18)
M4. Prototype 2 (month 30)
B3. Workpackage description

Workpackage number: WP2 - Integrate Data Mining and GIS (Technology development)
Start date or starting event: 0
Participant number:            P1  P4  P3  P5  P6
Person-months per participant: 30   9   2  18  10

Objectives
This workpackage has the overall task of technologically integrating the existing GIS and Data Mining software, and of incorporating the modules developed in the other workpackages in a coherent manner. It is the project's technological hub, to which all partners will deliver, and whose deliverables all partners will need to access at some point. For tight integration of existing components, a common Task Manager, Data Management Layer, Extension API, and user interface have to be defined and implemented. The base system is designed as an object-oriented plug-in architecture, facilitating technological integration. Unified Modelling Language (UML) will be used for documentation and design to ensure product quality. CORBA and RMI will be evaluated as middleware for integration. The integrated system incorporates the Spatial Mining and visualization methods developed in WP3-7 into the base system.

Description of work
T1. Organize kick-off meeting for identification of user needs
T2. Design of the SPIN! system architecture
T3. Develop efficient methods for transfer of data and maps over the internet (partner P6)
T4. Implementation of developer version (prototype 0)
T5. Technological integration of software developed in Task 1.3, 1.4 with spatial mining modules and visualization modules, resulting in prototype 1
T6. Testing and validation, revision of design, getting user input, improving system, resulting in prototype 2
T7. Revision release of second prototype (final release)

Deliverables
D1. System design document (T1, T2)
D2. Prototype 0 (software & documentation) (T3)
D3. Implementation of efficient methods for transfer of data and maps over the internet (partner P6)
D4. Prototype 1 (software & documentation) (T4, T5)
D5. Prototype 2 (software & documentation) (T6)
D6. Revision release of prototype 2 (Final Release) (software & documentation) (T7)

Milestones and expected result
- A user-friendly, internet-enabled, extensible Spatial Mining software tightly integrating Data Mining and GIS functionality
- A system providing a broad variety of methodological approaches to Spatial Mining that can be operated within a single environment
M1. Specification of design (month 8)
M2. Delivery of prototype 0 (month 12)
M3. Delivery of prototype 1 (month 18)
M4. Delivery of prototype 2 (month 30)
B3. Workpackage description

Workpackage number: WP3 – Extending machine learning methods to spatial mining
Start date or starting event: 0
Participant number:            P2  P1
Person-months per participant: 24  18

Objectives
This workpackage is mainly concerned with the adaptation of symbolic machine learning methods to spatial data analysis. In particular, the methods to be adapted are Inductive Logic Programming algorithms for the discovery of subgroups and spatial association rules. Some of these methods have been developed to satisfy the classical properties of consistency and completeness, whereas in spatial data mining the interest is in detecting patterns that satisfy minimum criteria for support and consistency. Adaptation of these machine learning tools will be based on the use of adaptive sampling, i.e. active learning techniques based on Bayesian Decision Theory, or on more efficient search strategies, to increase scalability. Another contribution of this workpackage is the definition of appropriate algorithms for the automated extraction, from vectorized maps, of symbolic descriptions of parts (e.g., cells) of a map.

By evaluating Bayesian posterior distributions or their approximations, the uncertainty of subgroup quality indicators may be assessed. Relatively large subgroups with potentially high indicator values have a high utility, and sampling new data from the corresponding spatial locations is rewarding. Active learning stops if the cost (negative utility) of collecting new data is higher than the expected utility of the subgroups that might be discovered.

Description of work
T1. Develop concepts for the definition of subgroup criteria linking space, time, and domain knowledge.
T2. Define criteria for adaptive sampling integrating the utility of subgroups as well as the cost of data collection and computation. Develop adaptive sampling methods based on Bayesian posterior distributions or their approximations
T3. Investigate properties of spatial association rules and adapt a rule discovery system to spatial association rules
T4. Investigate the representation language to be adopted for the representation of parts of a vectorized map.
T5. Software implementation of spatio-temporal subgroup discovery (without adaptive sampling)
T6. Software implementation of spatio-temporal subgroup discovery with adaptive sampling
T7. Software for the discovery of spatial association rules
T8. Develop algorithms for the extraction of symbolic descriptions from vectorized maps
T9. Application and evaluation of implemented methods on real-world data

Deliverables
D1. Theoretical report on spatio-temporal subgroup discovery (T1)
D2. Theoretical report on adaptive sampling (T2)
D3. Theoretical report on spatial association rules (T3)
D4. Specifications of descriptions to be automatically extracted from vectorized maps (T4)
D5. Software for spatio-temporal subgroup discovery (T5)
D6. Software for adaptive sampling (T6)
D7. Software for the discovery of spatial association rules (T7)
D8. Software for the extraction of symbolic descriptions from vectorized maps (T8)
D9. Report evaluating the application of first-order learning methods to spatial data (T9)

Milestones and expected result
The work done in this workpackage will advance the state of the art in spatial data analysis by adapting methods from Machine Learning to Spatial Mining, especially first-order learning methods. They are a natural and promising approach to Spatial Mining, since they allow spatial relations to be represented directly. Work in this package is synchronized with the milestones M1-M4 of WP2: for each prototype a set of methods will be delivered.
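The stopping rule described in the WP3 objectives can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and not part of the proposal: the quality indicator is taken to be the share of target cases in a subgroup, its uncertainty is modelled with a Beta posterior, and the "expected utility" is approximated by an optimistic bound (posterior mean plus two standard deviations) scaled by subgroup size. The names BetaPosterior, optimistic_utility and should_sample_more are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class BetaPosterior:
    """Beta(alpha, beta) posterior for the share of target cases in a subgroup.

    The uniform Beta(1, 1) prior is an assumption of this sketch.
    """
    alpha: float = 1.0
    beta: float = 1.0

    def update(self, hits: int, misses: int) -> "BetaPosterior":
        # Conjugate update: add observed target / non-target counts.
        return BetaPosterior(self.alpha + hits, self.beta + misses)

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def sd(self) -> float:
        n = self.alpha + self.beta
        return math.sqrt(self.alpha * self.beta / (n * n * (n + 1)))


def optimistic_utility(post: BetaPosterior, base_rate: float,
                       subgroup_size: int) -> float:
    """Rough upper estimate of the utility a subgroup might still reach.

    Large subgroups whose share may plausibly exceed the base rate score high.
    """
    upper = min(1.0, post.mean + 2.0 * post.sd)
    return subgroup_size * max(0.0, upper - base_rate)


def should_sample_more(post: BetaPosterior, base_rate: float,
                       subgroup_size: int, sampling_cost: float) -> bool:
    """Keep sampling only while the possible gain outweighs the cost."""
    return optimistic_utility(post, base_rate, subgroup_size) > sampling_cost


prior = BetaPosterior()
print(should_sample_more(prior, base_rate=0.2, subgroup_size=100, sampling_cost=10.0))
after_data = prior.update(hits=2, misses=98)
print(should_sample_more(after_data, base_rate=0.2, subgroup_size=100, sampling_cost=10.0))
```

With no data the posterior is wide, so sampling the region is still worthwhile; once the data show a share well below the base rate, the bound collapses and sampling stops. A full decision-theoretic treatment would integrate the utility over the posterior rather than use a two-sigma bound.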
B3. Workpackage description

Workpackage number: WP4 - Generalize Bayesian Markov Chain Monte Carlo to Spatial Mining
Start date or starting event: 0
Participant number:            P1  P4  P6
Person-months per participant: 30   4   6

Objectives
Currently the support for assessing and selecting models in GIS is very limited. Based on the extrapolation of the uncertainty of individual predictions of different models, we will develop methods for a well-founded selection or combination of models. Partner P1 has developed Bayesian classification methods which use a Bayesian ensemble of decision trees or neural networks; these will be adapted to spatial data. We will use the Bayesian approach in several directions:
- calculation of a predictive density characterizing the predictive or classification uncertainty for new inputs; the main algorithms use asymptotic expansions and Markov Chain Monte Carlo (MCMC);
- selection of optimal models by comparing their performance according to the Bayes factor and related methods;
- generation of ensembles of models of different types, e.g. using Bayesian model averaging and reversible jump MCMC.
An approximate Bayesian technique is the bootstrap. We will analyse the relative merits of this approach in comparison to Bayesian models. Besides the classical spatial statistics models (e.g. kriging), we will concentrate on localized models which adaptively partition the input area and generate different submodels. Promising candidates are radial basis functions, mixtures of experts, and multivariate adaptive regression splines. The selection criterion is their adequacy for the intended application.

Description of work
T1. Report reviewing current approaches to spatial classification, prediction and interpolation
T2. Implementation of selected current approaches using bootstrap techniques.
T3. Report on advanced spatial models and the corresponding Bayesian algorithms.
T4. A basic implementation of Bayesian MCMC for selected models.
T5. Implementation of MCMC-based or approximate Bayesian model selection / averaging.
T6. Report on performance evaluation for spatial mining methods and guidelines for selecting models depending on data and prior conditions.
T7. Implement a generic library for spatial data transformations used by the mining algorithms (partners P6, P4)

Deliverables
D1. Report reviewing current approaches to spatial classification, prediction and interpolation (T1)
D2. Implementation for bootstrap (T2)
D3. Report on advanced spatial models and the corresponding Bayesian models (T3)
D4. Implementation for MCMC (T4)
D5. Implementation for model selection (T5)
D6. Report on performance evaluation for spatial mining methods and guidelines (T6)
D7. Generic software library for spatial data transformations (T7)

Milestones and expected result
- Adaptation of several advanced statistical models to the spatial domain
- A comprehensive assessment of prediction/classification uncertainty for GIS
- A flexible framework for model formation and model checking in a GIS context
Work in this package is synchronized with the milestones M1-M4 of WP2, where methods will be delivered.
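The model selection and averaging ideas named in the WP4 objectives can be sketched as follows. This is only an illustration under stated assumptions: the log marginal likelihoods (evidences) of the candidate models are taken as given, whereas in practice they would have to be estimated, for example by MCMC or asymptotic expansions. The function names and the example numbers are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def bayes_factor(log_evidence_a: float, log_evidence_b: float) -> float:
    """Bayes factor of model A against model B from their log evidences."""
    return math.exp(log_evidence_a - log_evidence_b)


def posterior_model_weights(log_evidences: Sequence[float],
                            priors: Optional[Sequence[float]] = None) -> List[float]:
    """Posterior model probabilities, proportional to prior times evidence."""
    k = len(log_evidences)
    if priors is None:
        priors = [1.0 / k] * k  # assumed: equal prior weight for every model
    shift = max(log_evidences)  # subtract the maximum for numerical stability
    unnormalised = [p * math.exp(le - shift)
                    for le, p in zip(log_evidences, priors)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def averaged_prediction(predictions: Sequence[float],
                        weights: Sequence[float]) -> float:
    """Bayesian model averaging of point predictions for one location."""
    return sum(w * p for w, p in zip(weights, predictions))


# Hypothetical example: two spatial models, the second with 3x the evidence.
log_ev = [0.0, math.log(3.0)]
weights = posterior_model_weights(log_ev)
print(weights)
print(averaged_prediction([10.0, 20.0], weights))
```

Selecting the single model with the largest weight corresponds to Bayes factor based selection; using all weights corresponds to model averaging.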
B3. Workpackage description

Workpackage number: WP5 – Adapt and integrate methods for spatial pattern analysis
Start date or starting event: 0
Participant number:            P4
Person-months per participant: 36

Objectives
This work package will explore ways of extending existing methods of spatial pattern detection. Currently ESDA methods tend to be concerned solely with the detection of spatial pattern and often overlook other data attributes. This shortcoming will be addressed by extending existing tools developed by partner P4 (Partner P4, publication 3) to handle attribute interaction with spatial location and to consider how temporal changes in spatial data can be investigated. The tool will be expanded to use multiple search methods in addition to the heuristic search used currently. These methods will include genetic algorithms, artificial life, and multi-agent techniques (WP 3). Partner P4 has already carried out some limited experiments with these techniques (Partner P4, publication 3) but will also investigate ways in which the search techniques can be used together in the form of a hybrid search system. There is also potential to investigate how different statistical tests of clustering can be used in the tool. The development of the system as a modular Java-based program allows other tests to be dropped into the tool for testing and comparison. The methods developed in this work package will be designed to work closely with the input and output functions developed in work packages 2 and 7. This will include the evaluation of CORBA and ODBC methods for data input and output.

Description of work
T1. Investigate algorithms for handling attribute interaction with spatial location
T2. Implement attribute interaction with spatial location
T3. Evaluate statistical clustering tests
T4. Implement selected statistical clustering tests
T5. Investigate algorithms for multiple search
T6. Implement algorithms for multiple search
T7. Testing and evaluation of software tool.

Deliverables
D1. Theoretical paper on algorithms for handling attribute interaction with spatial location (T1)
D2. Implementation of attribute interaction with spatial location (T2)
D3. Theoretical paper evaluating statistical clustering tests (T3)
D4. Implementation of selected statistical clustering tests (T4)
D5. Theoretical paper on algorithms for multiple search (T5)
D6. Implementation of algorithms for multiple search (T6)
D7. Reports on testing and evaluation of software tool. (T7)

Milestones and expected result
This workpackage will provide a variety of spatial pattern analysis methods for the SPIN! system. Work in this package is synchronized with the milestones M1-M4 of WP2, where the implemented methods will be successively integrated into the prototype.
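The idea that different statistical tests of clustering can be dropped into the tool can be sketched as a small registry of interchangeable test functions. This is a hypothetical illustration, not partner P4's tool: the Poisson upper-tail test, the registry name TESTS, the dictionary layout of a search circle, and the significance level are all assumptions. Python is used here for brevity, although the proposal describes a Java-based program.

```python
import math
from typing import Callable, Dict, List

# A clustering test maps (observed count, expected count) to a p-value.
ClusterTest = Callable[[int, float], float]


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    term = math.exp(-expected)  # P(X = 0)
    below = 0.0                 # accumulates P(X < observed)
    for k in range(observed):
        below += term
        term *= expected / (k + 1)
    return max(0.0, 1.0 - below)


# Registry of interchangeable tests; further tests can be added by name.
TESTS: Dict[str, ClusterTest] = {"poisson": poisson_upper_tail}


def flag_clusters(circles: List[dict], test_name: str = "poisson",
                  alpha: float = 0.01) -> List[str]:
    """Return the ids of search circles whose case count is significant."""
    test = TESTS[test_name]
    return [c["id"] for c in circles
            if test(c["observed"], c["expected"]) < alpha]


# Hypothetical search circles with observed and expected case counts.
circles = [
    {"id": "a", "observed": 12, "expected": 3.0},
    {"id": "b", "observed": 4, "expected": 3.0},
]
print(flag_clusters(circles))
```

Because each test shares the same signature, a Monte Carlo or bootstrap test could be registered under another name and compared on the same circles.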
B3. Workpackage description

Workpackage number: WP6 - Support of visual analysis of time-dependent spatial data
Start date or starting event: 0
Participant number:            P1  P4
Person-months per participant: 28  12

Objectives
In most areas, spatially referenced data also refer to different moments or intervals in time. The study of such data is meaningless if their development in time is not taken into account. Analysis of spatially referenced data should be supported by their visual presentation in maps. Spatio-temporal data require substantial advancement of the traditional map form of presentation towards dynamics and high user interactivity. The workpackage aims to develop methods for visualising spatio-temporal data that can facilitate analysis of such data. The methods include not only graphical presentation by itself but also various data transformations (e.g. calculation of the absolute or relative magnitude or the rate of change since the previous or a specified moment, time aggregation, etc.) and interactive manipulation of the displays. Thus, the user may move back and forth along the time axis, vary the animation step or the length of the aggregation interval, select objects or areas in the map to view data and temporal trends for them in detail, possibly in supplementary non-cartographic displays, and so on. The results of data transformation may be directed to the data mining procedures.

Description of work
T1. Review the existing types of time variation (e.g. changes in object existence, position, shape, or associated attribute values) and the analysis tasks that can emerge in relation to these types.
T2. Develop combined visualisation-interaction methods productive for fulfilling these analysis tasks.
T3. Software implementation of the visualisation and interaction methods and their selection depending on characteristics of data and their temporal variation.
T4. Develop algorithms for investigation of temporal changes
T5. Implementation of algorithms for investigation of temporal changes
T6. Evaluate methods developed in T1-T5 in applications

Deliverables
D1. Rule base on application of visualisation and interaction techniques depending on characteristics of data and the type of their time variation. (T1)
D2. Software library implementing the combined visualisation-interaction methods proposed. (T2)
D3. Expert system engine performing selection of methods according to characteristics of data. (T3)
D4. Report describing the implemented visualisation-interaction methods (T3)
D5. Theoretical paper on algorithms for investigation of temporal changes (partner P4) (T4)
D6. Implementation of algorithms for investigation of temporal changes (partner P4) (T5)
D7. Evaluation report (T6)

Milestones and expected result
Advancing the state of the art in visualization methods, especially for the visualization of temporal data. Work in this package is synchronized with the milestones M1-M4 of WP2, where the implemented methods will be successively integrated into the prototype.
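The data transformations mentioned in the WP6 objectives (change since the previous or a specified moment, and time aggregation) can be sketched for a single spatial object as below. The function names, the use of None for undefined values, and the sample series are assumptions of this sketch, not part of the proposal.

```python
from typing import Callable, List, Optional, Sequence


def change_since(values: Sequence[float],
                 reference_index: Optional[int] = None,
                 relative: bool = False) -> List[Optional[float]]:
    """Absolute or relative change of a time series for one spatial object.

    With reference_index=None each value is compared with the previous
    moment; otherwise all values are compared with the specified moment.
    Undefined results (no predecessor, zero reference) are returned as None.
    """
    result: List[Optional[float]] = []
    for i, value in enumerate(values):
        ref_i = i - 1 if reference_index is None else reference_index
        if ref_i < 0:
            result.append(None)
            continue
        reference = values[ref_i]
        diff = value - reference
        if relative:
            result.append(diff / reference if reference != 0 else None)
        else:
            result.append(diff)
    return result


def aggregate(values: Sequence[float], interval: int,
              func: Callable[[Sequence[float]], float] = sum) -> List[float]:
    """Aggregate consecutive moments into intervals of the given length."""
    return [func(values[i:i + interval])
            for i in range(0, len(values), interval)]


series = [10, 12, 15, 12]  # hypothetical attribute values at four moments
print(change_since(series))
print(change_since(series, reference_index=0, relative=True))
print(aggregate(series, interval=2))
```

In an interactive display, the reference moment and the aggregation interval would be the parameters the user varies while the map is redrawn.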
B3. Workpackage description

Workpackage number: WP7 - Visualisation of data mining results
Start date or starting event: 0
Participant number:            P1  P4
Person-months per participant: 28  12

Objectives
The form in which data mining results are presented to the user is crucial for their appropriate interpretation. Large amounts of information or complex concepts can be more easily comprehended when represented graphically. This especially applies to data and concepts having spatial reference or distribution. However, to play their role effectively, graphical displays must be properly designed with respect to the principles of human perception. The objective of this workpackage is to design appropriate graphical techniques to represent results of the data mining methods developed within the project. This work will be informed by the results obtained by partner P4 (publication 9) during development of public access GIS. The approach to be taken is a combination of cartographic and non-cartographic displays linked together through simultaneous dynamic highlighting of the corresponding parts (Partner P1, publication 3; Partner P4, publication 3). The non-cartographic displays will represent the data mining results in summarised, generalised form, while maps will provide the transition from general descriptions to the individual spatial objects and phenomena characterised by them. The techniques to be developed will apply general principles of graphical presentation established through cognitive psychological studies and analysis of "best practice" in visualisation. These principles are expounded in the literature on graphics design and cartography.

Description of work
T1. Identify the types and formats of results produced by the data mining and statistical methods to be developed, and develop a methodology for visual representation of the results based on principles of graphics design. (partners P1, P4)
T2. Implementation of visualization method for spatial subgroup discovery (partner P1)
T3. Implementation of visualization method for spatial association rules (partner P1)
T4. Implementation of visualization method for Bayesian classification (partner P1)
T5. Implementation of best practice in visualisation methods in ESDA (partner P4)
T6. Testing & validation in applications (partners P1, P4)

Deliverables
D1. Description of the presentation methods proposed for the results of the considered data mining methods (T1)
D2. Implementation of visualization method for spatial subgroup discovery (T2)
D3. Implementation of visualization method for spatial association rules (T3)
D4. Implementation of visualization method for Bayesian classification (T4)
D5. Implementation of best practice in visualisation methods for ESDA (T5)
D6. Report on current and potential visualisation methods in ESDA (T6)

Milestones and expected result
This WP provides visualizations for the methods developed in WP3-5, so that they can be used in linked displays. Work in this package is synchronized with the milestones M1-M4 of WP2, where the implemented methods will be successively integrated into the prototype.
B3. Workpackage description

Workpackage number: WP8 – Application to seismic and volcano data
Start date or starting event: 0
Participant number:            P7  P3  P1  P4  P6  P5
Person-months per participant: 32  18   3   3  12   2

Objectives
The objective of this workpackage is to adapt the generic system to the specialised application area of earthquake and volcanic eruption prediction, and seismic hazard assessment. This will be achieved by integrating Data Mining methods for natural hazard assessment that have been developed by partners P3 and P7. Partner P7 runs monitoring observatories at the Merapi volcano, which has been classified as a high-risk volcano. Integration of these monitoring data into the analysis process may give new results for understanding volcanic hazards. As an extension to the core SPIN! system, an online intelligent system for processing and analysis of natural hazard monitoring data, as well as for decision making support, will be integrated. To achieve this goal, an integration layer between the generic Spatial Mining system and the specialized methods implemented by partner P3 has to be designed.

From the point of view of the application area, the technical objectives of using the SPIN! system are:
- Cartographic representation of earthquake and volcano monitoring information.
- Automatic extraction of the most essential information on seismic hazard (parameters of the seismic regime such as seismic activity, seismic energy, b-value and so on) from seismic monitoring data.
- Automatic extraction of essential information on volcanic hazard (seismic activity, dome deformation measurement, rock fall, gas chromatography, as well as geology, topography and land use data).
- Support of administrative decisions related to seismic and volcanic hazard.
- Support of earthquake and volcanic eruption prediction research.
This application will test the SPIN! system on scientifically highly important real-world problems and will provide valuable feedback for directing the development efforts. A successful application would demonstrate the usefulness of the SPIN! system for applications in the natural sciences.

Description of work
T1. Define user requirements
T2. Investigate methods of space-time analysis and data mining of seismic data.
T3. Investigate methodology for designing seismic hazard information models
T4. Implement software for the proposed methods as a plug-in to the SPIN! architecture.
T5. Evaluation report
T6. Apply software tools to the seismically active Eastern Mediterranean region.
T7. Apply software tools to the high-risk Merapi volcano.
T8. Integrate continuous monitoring data into the analysis process.
T9. Final report on the application of Spatial Mining to seismic and volcano data

Deliverables
D1. User requirements report (T1)
D2. Description of the methods of space-time analysis and data mining of seismic data. (T2)
D3. Description of the methodology for designing seismic hazard information models (T3)
D4. Software implementing the proposed methods as a plug-in to the SPIN! architecture. (T4)
D5. Evaluation report (T5)
D6. Application of the software tools to the seismically active Eastern Mediterranean region. (T6)
D7. Application of the software tools to the high-risk Merapi volcano. (T7)
D8. Integration of continuous monitoring data into the analysis process. (T8)
D9. Report on the application of Spatial Mining to seismic and volcano data (T9)

Milestones and expected result
Cartographic representation of earthquake and volcanic monitoring information; automatic extraction of the most essential information on seismic hazard from seismic monitoring data; automatic extraction of essential information on volcanic hazard; support of administrative decisions related to seismic and volcanic hazard; and support of earthquake and volcanic eruption prediction research.
M1. Description of methodology (month 10)
M2. Software integration into the SPIN! system
M3. Applications to seismic and volcano data
Work in this package is synchronized with the milestones M1-M4 of WP2, where the application area helps to direct the development by formulating user requirements and evaluations.
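One of the seismic regime parameters named in the WP8 objectives is the b-value of the Gutenberg-Richter frequency-magnitude relation. The proposal does not state how it is to be estimated; the sketch below assumes the widely used maximum-likelihood estimator of Aki, b = log10(e) / (mean magnitude - completeness magnitude), with the usual half-bin correction for binned magnitudes. The function name, the parameters, and the sample catalogue are hypothetical.

```python
import math
from typing import Sequence


def b_value(magnitudes: Sequence[float], completeness_magnitude: float,
            bin_width: float = 0.0) -> float:
    """Maximum-likelihood b-value estimate (Aki's estimator, assumed here).

    Only events at or above the completeness magnitude are used. For
    magnitudes rounded to bins of width bin_width, half a bin is
    subtracted from the completeness magnitude.
    """
    selected = [m for m in magnitudes if m >= completeness_magnitude]
    if len(selected) < 2:
        raise ValueError("need at least two events above completeness")
    mean_magnitude = sum(selected) / len(selected)
    denominator = mean_magnitude - (completeness_magnitude - bin_width / 2.0)
    if denominator <= 0:
        raise ValueError("mean magnitude must exceed completeness magnitude")
    return math.log10(math.e) / denominator


# Hypothetical catalogue for one map cell; the 2.1 event is below completeness.
catalogue = [3.0, 3.2, 3.4, 3.8, 4.6, 2.1]
print(b_value(catalogue, completeness_magnitude=3.0))
```

Computed per map cell or per time window, such a value could be one of the attributes passed to the cartographic display and to the mining methods.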
B3. Workpackage description

Workpackage number: WP9 – Web based dissemination of data from statistical offices
Start date or starting event: 0
Participant number:            P8  P4  P1  P5  P6
Person-months per participant: 34   6   3   2   4

Objectives
A second application area is the analysis and web-based dissemination of census data from statistical offices. Here the main objective is to put to practical use the timely, cost-effective dissemination of statistical information over the internet. This will allow evaluation of the efficiency of the developed methods and the responsiveness of the application, as well as of acceptance by customers of statistical offices. Potential problem areas are the availability of bandwidth, the number of concurrent users, and the size of maps and data sets. Especially if Data Mining analysis over the internet is permitted, the performance of the server will be of central importance. Experiences in this application area will be crucial for improving the prototype 1 system for better efficiency (which is a task within WP2).

Description of work
T1. Defining user requirements
T2. Selecting and preparing maps and data for the application
T3. Adapting the generic Spatial Mining system to the specific needs of the application and building a prototype web site with maps and data
T4. Using the GIS as a web-based front end for Data Mining tasks, using prototype 1 from WP2
T5. Collecting user experiences and improving the system
T6. Delivering the final release, using prototype 2 from WP2
T7. Writing a final report about the application and about user acceptance

Deliverables
D1. User requirements document (T1)
D2. Report containing description of data model and data characterization schema (T3)
D3. A prototype web site with interactive thematic maps that can be accessed over the internet (T2, T4)
D4. Evaluation report (T5)
D5. A prototype web site with interactive thematic maps based on SPIN! prototype 2 (T6)
D6. Report about user acceptance, recommendation for use (T7)

Milestones and expected result
- A web site based at a statistical office, used for web-published dissemination of statistical data and spatial mining over the internet
- Knowledge about the different types of users accessing such a system
- Practical experience in the use of internet-based spatial mining
M1. Prototype web site
M2. Extended web site containing prototype 1 from WP2
M3. Extended web site containing prototype 2 from WP2
Work in this package is synchronized with the milestones M1-M4 of WP2, where the application area helps to direct the development by formulating user requirements and evaluations.
B3. Workpackage description

Workpackage number: WP10 – Web-based information brokering service
Start date or starting event: 0
Participant number:            P5  P6  P1
Person-months per participant: 12  10   2

Objectives
Statistical offices, public agencies, and scientific institutions often face the problem that their initial efforts to build up a public database are externally funded, but the maintenance of such a service is not. Funding agencies increasingly require that these institutions develop business plans for commercializing such a service in the long run (at least for non-scientific use). The aim of this workpackage, for which the industrial partners will be responsible, is to develop a detailed concept for a web-based information brokering service with geo-referenced data as a foundation for cost-effective dissemination of data. Web-based, interactive Spatial Mining can add tremendous value to the mere distribution of data. This added value can be the key to commercializing the distribution of data for statistical offices, public agencies, and scientific institutions. A web-based information brokering service which distributes interactive thematic maps over the internet is proposed. What is new about this proposal is that the customer does not need to buy or install any complex and expensive software on his computer, yet is not confined to the usual printed, non-interactive reports. An interactive thematic map is delivered over the internet using Java technology. This map can be used by the customer for further exploration as well as for presentation and decision making. There can be different levels of service.

Description of work
T1. Identify needs of statistical offices, public agencies, and scientific institutions with respect to dissemination of geo-referenced data
T2. Make a survey of existing services, describe business models applicable to this service, address property rights issues
T3. Describe required technical infrastructure for web-based information brokering service
T4. Build a prototype web site for information brokering
T5. Prepare final report with specific recommendations for setting up such a service

Deliverables
D1. Report defining requirements of the application (T1)
D2. Report addressing existing services, business model and property rights issues (T2)
D3. Report on technical infrastructure (T3)
D4. Prototype web site (T4)
D5. Final report (T5)

Milestones and expected result
- Detailed technical and economic guidelines for setting up a web-based dissemination service
- Identification of advantages and risks
M1. Delivery of D2
M2. Delivery of D3
M3. Delivery of D4
B3. Workpackage description

Workpackage number: WP11 - Dissemination
Start date or starting event: 0
Participant number: P6 P5 P1 P8 P7 P4
Person-months per participant: 14 2 2 8 4 8

Objectives
The technology developed in this project is of a generic nature and has a broad range of potential applications. Yet potential user groups may be unaware of the existence of the type of technology the project develops, or they may have false expectations about it. The aim of this workpackage is to address the general public, as well as potential users and partners. Dissemination will be an ongoing activity and will include organizing workshops, maintaining a project web page, systematically identifying additional user groups that could act as partners in follow-up projects, and providing project descriptions for the general public. Partner P6 will carry out a feasibility study for commercializing the SPIN! system in the area of noise pollution.

Description of work
T1. Maintaining a project web page
T2. Providing project descriptions for the general public
T3. Organization of dissemination workshop 1
T4. Feasibility study for commercialization
T5. Organization of dissemination workshop 2
T6. Systematically identifying additional user groups that could act as partners in follow-up projects

Deliverables
D1. Project web page (T1)
D2. Project description for the general public (T2)
D3. First dissemination workshop (T3)
D4. Second dissemination workshop (T5)
D5. Feasibility study on prospects of commercialization (partner P6) (T4)

Milestones and expected result
Effective dissemination of project results
Identification of user groups for follow-up projects
M1. Project web page (month 3)
M2. Workshop 1 (month 24)
M3. Workshop 2 (month 36)
Part C

C1. Title.
Spatial Mining for Data of Public Interest
SPIN!
Proposal No. IST-1999-10536
Proposal for: IST programme, 1.1.2-5.1.4 Cross-Programme Action CPA4: New Indicators and statistical methods
C2. Contents for part C
C3. Community added value and contribution to EU policies
C4. Contribution to Community social objectives
C5. Project management
C6. Description of the consortium
C7. Description of the participants
C8. Economic development and scientific and technological prospects
Appendix - Publications of partners cited in part B
C3. Community added value and contribution to EU policies

Building a spatial mining system is a demanding task, since it requires expertise in many fields, including Geographic Information Systems, Cartography, Statistics, Machine Learning, and Databases, as well as excellent software engineering skills. The consortium has been carefully chosen to ensure competence in all these areas. It includes two industrial partners active in Data Mining and Geographic Information Systems (Dialogis GmbH, Germany; PGS, Holland), a university and a national research center for informatics active in the areas of Data Mining, Machine Learning, and GIS (University of Bari, Italy; GMD, Germany), an institute for geography active in Exploratory Spatial Data Analysis since the 1980s (University of Leeds, England), a university with a leading role in the dissemination of statistical data in the UK (Manchester Metropolitan University/MIMAS), and two institutes active in seismic data research (IITP, Russian Academy of Sciences, Moscow; GeoForschungszentrum Potsdam, Germany). The partners thus come from four different EU countries (England, Holland, Italy, and Germany) and from one NIS country (Russia), forming a truly European consortium.

Involvement of the Russian Academy of Sciences promotes scientific exchange with NIS countries. Europe gains added value through access to the work of a group that has more than 20 years of expertise in this field and has developed some very mature technologies. The group combines technological skill with expertise in its application area, earthquake prediction and hazard management, in a unique way. Earthquake prediction and hazard management is an area with an enormous and obvious potential impact on quality of life and health, and it could help to prevent massive financial losses. It is in the vital interest of the EU to gain access to technologies for improved hazard management.
Independently of the proposal, GMD and GFZ have made arrangements to invite members of IITP as guest researchers. This will promote scientific interchange with NIS countries and will make an intense collaboration possible.

In recent years, the partners have individually developed many of the technological and methodological pieces needed to build an integrated spatial mining system. A project that set out to build a spatial mining system from scratch would need dozens of person-years to develop the tools that are already available as the starting point for the SPIN! consortium. The existence of this body of technology is a precondition for the iterative approach to software development chosen by the consortium, since user input can be gathered right from the start of the project. This in turn reduces the risk of failure.

A concentration of expertise and existing tools such as that in the SPIN! consortium cannot be found within a single European country. Only by joining efforts on a European scale can the critical mass needed to develop a spatial mining system ready for real-world applications be achieved. This will offer prospects for the dissemination and exploitation of the results that would be impossible at a national level.

One such area for further exploitation is European biodiversity research. The Organisation for Economic Co-operation and Development (OECD) working group for Biodiversity Informatics has recommended the installation of a Global Biodiversity Information Facility (GBIF). Key technology needs that have been identified include some kind of integrated DMS and GIS. This is an opportunity to develop a spatial mining solution as a coordinated European effort that can help establish a European perspective within GBIF, which is currently dominated by research centred in the USA and Australia. Biological informatics is perceived as a key technology of the next century. From the strategic perspective of the GMD Knowledge Discovery team, biodiversity informatics will be a major application area in which the techniques developed in this project can be put to very good use, supporting several European conventions, especially the Convention on Biological Diversity (CBD).

The project supports EU policies directed towards SMEs. Both Dialogis and PGS are SMEs active in the Data Mining and GIS market. These companies will be responsible for the exploitation of the SPIN! technology. The Data Mining platform Kepler and the GIS tool Descartes are the technological basis on which the SPIN! system will be built. Both systems have been co-developed and commercially distributed by Dialogis, and its market position will be significantly strengthened by the new technology. PGS also plans to incorporate the technology in its product line (see C8 for details).

C4. Contribution to Community social objectives

New economic prospects
The SPIN! project has the goal of combining state-of-the-art research with commercial exploitation. Both goals have been kept firmly in mind in the design of the workpackages. The commercial potential of this new technology is specifically addressed in workpackages 10 and 11. In WP10 a business concept for a web-based information brokering service is developed by the industrial partners Dialogis GmbH and PGS. A key goal of this business concept is to support public agencies and similar institutions in the cost-effective dissemination of data of public interest.
Funding agencies increasingly require that such services be commercialized in the long run (at least for non-scientific use). The added value that the SPIN! technology offers can be a key to commercialization.

Sustainable Development
The SPIN! project can have a major impact on promoting the Local Agenda 21. The Local Agenda 21 is the process that aims to involve local people and communities in the design of a way of life that can be sustained and thus protect the quality of life for future generations. It originates in the Rio conference of 1992, which led to the agreement of an Agenda 21 document detailing a series of strategies for action world-wide. The Local Agenda 21 is a highly democratic, consensus-building and empowering process. This can only be achieved with the help of leading-edge information technology. The SPIN! project, with its focus on data of public interest, provides such a technology. Specifically, it helps statistical offices in the user-friendly dissemination of census data, where customers get access to powerful yet easy-to-use analysis tools. The Descartes system is already used by statistical offices and by urban planners in several European countries, and they are also potential users of the new technology.

Quality of life & health
As the German national research centre for Earth sciences, the GFZ carries out research and development projects across a very broad range of fields which are of direct relevance to the fulfilment of the principle of sustainable development as enshrined in the Treaty of Amsterdam.
Specifically, the Fifth European Community environment programme, "Towards sustainability" (see European Parliament and Council Decision 2179/98/EC), calls for appropriate measures to improve health and safety, in particular in relation to the management of natural and industrial hazards, nuclear safety and radiation protection, as well as the improvement of energy efficiency, a reduction in the consumption of fossil fuels, and the promotion of renewable energy sources (see e.g. Communication from the Commission COM(1998) 571 final). Co-operative research carried out on a long-term basis at a very high level makes substantial contributions to the construction of the legislative framework aimed at combating pollution and protecting the environment, as documented in Communication from the Commission COM(97) 592 final. Through the creation of the European Macroseismic Scale 1998 (EMS-98) and the resulting and associated regulations and standards, such as EUROCODE-8 with all its accompanying National Application Documents, the GFZ has laid an important cornerstone for the safety of the local communities of the Europe of tomorrow. The application of the SPIN! system to hazard management will help GFZ to contribute to those policies by improving the quality of data analysis and by providing means for the timely distribution of data relevant to hazard management.

Protection of environment
PGS will apply the SPIN! system to the problem of noise-level zoning, which is expected to become a major issue in Holland in the next two to three years because of anticipated new legislation (WP11). This will demonstrate the potential of the new technology for the protection of the environment.

C5. Project management

Work Organization
The work is organized in a set of well-identified work-packages. For each work-package a partner acting as coordinator is identified. The work-package coordinator is responsible for managing the execution of the tasks associated with his work-package. In turn, for each single task an operative partner is identified. The work-package coordinators have been chosen from among the partners according to their past experience and present role in the specific technical fields. Work-package coordinators are responsible for the performance of their associated operative partners and will have discretion to manage the resources allocated to them. Furthermore, work-package coordinators will report directly to the Technical Committee (see below). See the work-plan for a detailed description of the work organization.

Team Organization
The team organization directly reflects the division of the work into work-packages.
For each work-package, a working team will be formed from the operative partners and the respective coordinator. In addition to these work-package teams, a Technical Committee will be appointed. Its mission will be to manage the project developments in terms of technical content. It will consist of one member from each SPIN! partner.

Overall Management
The project management of SPIN! is itself treated as a work-package, and as such it will have an appointed coordinator. GMD will be coordinator of this work-package, thus acting as the overall project manager of this proposal. The overall project manager will act as the contact point between the consortium and the Commission Project Officer, and is responsible for the overall execution and performance of the project. The management work-package of SPIN! is divided into two tasks: overall management and technical management.

Overall Management. The objective of overall management is to ensure that the various phases of the project are properly coordinated in order to maximize the project's success.
Two complementary steering activities are foreseen:

Development of the project work plan. This activity should result in a formally approved work-plan document used to manage and control the project execution. The work-plan will serve as a basis for follow-up. Nevertheless, it should be expected to be complemented and modified over time as more detailed information becomes available.

Project monitoring and review. Project performance must be measured on a regular basis to identify deviations from the plan. In addition to the "continuous" monitoring done in an informal way, mainly by e-mail, status review meetings are scheduled on a quarterly basis. These meetings will be attended by all the work-package coordinators. The result of these meetings should be a detailed assessment of the work progress of the consortium. All project plan change requests should be presented and discussed in this forum. The result of these meetings will be updated work-plan documents.

At each status review meeting, the work-package coordinators should present a progress report that addresses the current progress of the work-package in general and of each task in particular, as well as unresolved issues and the actions required to solve them.

All administrative procedures between the consortium and the Commission are the responsibility of the overall project manager. They include the distribution of EC funding to the participating partners, the preparation of documentation for the required periodic reports, and the management of all procedures related to the potential commercial exploitation of the results. GMD will co-ordinate the project and will be the point of contact with the Commission for all administrative and financial business.

Technical Management. Technical management will be carried out by the Technical Committee.
Its role is to address specific technical issues, namely: to take critical technical decisions which affect the project as a whole, such as general system architecture, integration requirements for the several software components, common development tools, and so on; and to define which quality standards are relevant to the project and determine how to satisfy them. It will also be responsible for monitoring specific project results, determining whether they comply with the relevant quality standards, and identifying ways to eliminate causes of unsatisfactory quality performance.

The Technical Committee will meet whenever considered necessary, and at least once every four months. Ideally, the technical meetings will be merged with the status review meetings. In the course of the first three months of the project, the Technical Committee should document the organization of the technical work in written form by providing a Project Handbook that also includes a Quality Management Plan.

IPR. The SPIN! Consortium has agreed to handle IPR-related matters along the following lines:
1. Each partner will keep the rights to the software and methods it brings into the project
2. Each partner will get the rights for the commercial exploitation of its main deliverables
3. If a partner contributes a small part to another partner's main deliverable, this second partner can exploit the contribution for free
4. If a partner wants to exploit the main deliverable of another partner, a special agreement is needed
Several such agreements are foreseen and are vital for joint exploitation; the details, however, have to be settled in the first three months after the project starts. Dialogis and PGS have already made a successful joint exploitation agreement in the CommonGIS project.

C6. Description of the consortium

GMD will be the coordinator of the project. It will provide the technology on which the project is based and will bring in its expertise in Data Mining and GIS. Within several EU projects, original Data Mining methods have been developed. The GIS tool Descartes is used in several web-based applications in the areas of nature protection, urban decision making, and census data in several countries. GMD has extensive experience in the design and implementation of Data Mining tools and client/server systems. In recent years, the data mining platform Kepler and the GIS tool Descartes have been developed.

The University of Bari (Machine Learning) will be active in the evaluation, adaptation, and development of machine learning algorithms for the task of spatial analysis. More specifically, it will be in charge of the specification of quality measures for spatial association rules and the adaptation, implementation, and testing of the algorithms for the discovery of such rules. A further contribution from the University of Bari will be the specification and implementation of algorithms for automated feature extraction from maps.

The University of Leeds (Geography) has theoretical and practical expertise in web-based mapping and in developing and applying spatial analysis and modelling tools to geographical data, and will be responsible for the spatial pattern analysis module. In the last year, an internet-enabled version of the Geographical Analysis Machine, whose origins go back to the 1980s, has been developed to allow exploratory spatial data analysis to be carried out over the internet.
This work has also developed other, more advanced spatial analysis tools, which can take attributes relating to cases into account. The group also has experience in the development of web-based mapping tools based on a Java toolkit, GeoTools.

The Institute for Information Transmission Problems, Russian Academy of Sciences (IITP RAS) will bring in its expertise in seismic data analysis, spatial statistics, and decision support. It will join the project as a full partner, yet without funding. The group of Valeri Gitis has been working in the field of geoinformation technology for more than 20 years. Members of the group have fundamental knowledge and experience both in modern information technology and in seismology. The group has obtained original results in pattern recognition and artificial intelligence. Part of these results received an award from Hewlett Packard in the Competition of Works on Pattern Recognition in 1992. The group has developed several original geoinformation technologies for natural hazard assessment and environmental zonation. Its main current activity is the development of intelligent network geoinformation technologies and systems.

Dialogis GmbH, an SME located in Sankt Augustin, will be responsible for technology integration and exploitation. Dialogis commercially distributes Descartes and Kepler, is active in the areas of Data Mining and GIS consulting, and develops Data Mining solutions for database marketing. It has strong experience in software design and development, as well as in Data Mining and GIS consulting.
PGS, Amsterdam, will make its Lava/Magma products available to the project. It will adapt and extend the interfaces to Lava/Magma so that it can be integrated with the knowledge acquisition tools and the knowledge-based visualization environment. PGS will develop software modules for data characterization tools and the data visualization environment. PGS will be responsible, together with Dialogis, for the packaging of the developed software into commercially viable components that can be integrated into its Lava/Magma product line.

GeoForschungszentrum Potsdam, Physik des Erdkörpers und Desasterforschung. The Section Earthquakes and Volcanism of GFZ is a research group (10 scientists, 8 PhD students, 10 technicians and engineers, several students) with a research focus on, and experience in, the origins of hazards, the development and installation of monitoring networks and early warning systems, and the training of experts in seismic hazard assessment, in particular in developing countries.

Manchester Metropolitan University is the largest non-federal university in the UK. Within the Department of Environmental and Geographical Sciences is the GIS and Remote Sensing Research Group. The main areas of research are Internet mapping, access to spatial databases over the Internet, web-based educational technologies, satellite remote sensing, digital image processing, and environmental modelling.

C7. Description of the participants

GMD - German National Research Center for Information Technology

Description of the partner
GMD is Germany's national research center for information technology. It is a non-profit, limited liability private company (GmbH) whose shareholders are the Federal Republic of Germany and the Federal States of Hesse, Berlin and North Rhine-Westphalia. GMD has a staff of about 1300. The annual budget is approximately Euro 95 million, almost 30% of which comes from externally funded R&D projects.
GMD's main research areas are communication and co-operation, intelligent multimedia systems, system design technology, and scientific computing. Research and development activities are application-oriented, and most projects co-operate with partners from industry and science.

The Institute for Autonomous Intelligent Systems is one of the eight institutes of GMD and has a staff of about 150 people. The Knowledge Discovery Team (KD) is a research group (12 scientists, 2 PhD students, several students) working in the field of artificial intelligence. A recent survey established that GMD as a whole is the leading German research institution in this area in terms of publications and citations. Professional experience and expertise in this group include Data Mining, Inductive Logic Programming, Bayesian Statistics, Neural Networks, Geographic Information Systems, and databases.

The Knowledge Discovery team has extensive experience with EU projects: it currently participates in several EU projects (ILP2 (Inductive Logic Programming), KESO (Knowledge Extraction for Statistical Offices), and MLNet 2 (Machine Learning Network of Excellence 2)), coordinates one (CommonGIS (Common Access to Geographically Referenced Data)), and has participated in several others in the past. The team has considerable experience in the design and implementation of commercial-quality software systems. In recent years, it has developed the data mining platform Kepler and the GIS tool Descartes. Kepler and Descartes are also used by scientific partners in the USA, Russia, the Netherlands, the UK, Portugal, and Germany.
Key personnel

Dr. Willi Klösgen has developed methods and tools for partially automating data exploration at GMD since the mid-eighties. He has led various projects for a wide range of data mining applications, including market research, tax and transfer legislation, medical research, and production control. Willi Klösgen has contributed to the main KDD (Knowledge Discovery in Databases) workshops and conferences and has organized the international conferences on New Techniques and Technologies for Statistics for Eurostat. He is the chief editor of the Handbook of Data Mining and Knowledge Discovery, to be published later this year by Oxford University Press. He is a member of the editorial boards of KDD and related journals. He studied mathematics, statistics, and physics at several German universities and received his Ph.D. in 1972 from Bonn University.

Dr. Gerhard Paass has designed statistical and knowledge-based algorithms and tools for extracting structure from data at GMD since the mid-eighties. Among other things, he has worked on probabilistic Bayes networks, neural networks, bootstrap methods, Bayesian Markov Chain Monte Carlo procedures, and Bayesian decision theory. He has led a number of projects aiming at the elicitation of the information content and uncertainty of statistical procedures, with applications in database security, vague reasoning, adaptive sampling and exploration, as well as credit scoring of enterprises. He is adjunct professor of Neurocomputing at the Queensland University of Technology, Brisbane, and is on the editorial board of the International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. He studied mathematics, statistics, computer science, and economics and received his Ph.D. in 1983 from Bonn University.

Dr. Gennady Andrienko and Dr. Natalia Andrienko received a Ph.D. equivalent in Computer Science from Moscow State University in 1992 and 1993, respectively.
They worked on knowledge-based systems at the Mathematics Institute of the Moldavian Academy of Sciences (Kishinev, Moldova), then at the Institute of Mathematical Problems of Biology of the Russian Academy of Sciences (Pushchino Research Center, Russia). Dr. Gennady Andrienko also worked as an assistant professor at Pushchino State University, teaching a course on GIS and supervising students for their master's degrees. In 1995 and 1996, they visited GMD as guest researchers. Since July 1997 they have held research positions at GMD. Since November 1998 they have played key roles in the EU-funded CommonGIS project. They are the authors of many papers published in international journals and conference proceedings. Their research interests and experience cover interactive computer graphics, automated knowledge-based cartographic visualization, and visual geo-data exploration.

Dr. Michael May is the team leader of the GMD Knowledge Discovery Team and coordinates the research efforts at the intersection of Data Mining and Knowledge Discovery, visual knowledge exploration, and databases. He studied Philosophy of Science and Computer Science and holds a PhD degree in Philosophy of Science, where he worked on a computer simulation of causal reasoning. He has professional experience as a software consultant and database developer for major companies, among them Deutsche Shell AG and SAP AG. His research interests are the formalization of causal reasoning processes and the application of Data Mining and Data Warehousing, especially to the analysis of biological data.

Dr. Hans Voss received his Diploma in Computer Science from Bonn University in 1981, and his Research Doctorate in Computer Science from the University of Kaiserslautern in 1986. Since 1986 he has been working at GMD, Sankt Augustin, Germany. He was head of projects on real-time expert systems, on the product development of the hybrid expert system tool Babylon, and on knowledge engineering in the context of diagnostic expert systems.
From 1992 until 1998 he was head of the research area on Cooperative Design (some twenty researchers). He was coordinator of the EU-funded project GeoMed-F, and he is currently coordinator of the EU-funded project CommonGIS. He is particularly interested in the integration of various technologies in order to support cooperative/competitive spatio-temporal planning and decision-making.
Department of Informatics of the University of Bari

Description of the partner
The "Università degli Studi" of Bari is one of the largest state universities of Italy, both in the number of enrolled students (more than sixty thousand) and in the number of curricula and specialisation courses available (more than forty). The Department of Informatics was founded in 1991 in order to guarantee administrative autonomy to the former Institute of Information Science, created in 1973. It recruits 200-250 undergraduates annually for the Diploma (three-year) and Laurea (five-year) degrees in Informatics, and currently has an academic board of 32 full, associate, and assistant professors and an administrative and technical staff of 15 people. The research strengths of the Department fall into four main categories: Machine Learning, Image Processing and Pattern Recognition, Software Engineering, and Human-Computer Interaction.

The Machine Learning Group, which will be the main research group of the Department involved in the SPIN! project, comprises 3 permanent members, 2 Ph.D. students, and approximately 3 external collaborators. The Group has access to the facilities of the research laboratory LACAM (Laboratorio di Acquisizione della Conoscenza ed Apprendimento nelle Macchine). Other members of the LACAM laboratory who collaborate with the Machine Learning Group have competence in human-computer interaction and automated interpretation of topographic/cadastral maps.

The Machine Learning Group has been active in the area of knowledge acquisition and machine learning since 1986. Its members have developed several machine learning systems, both supervised and unsupervised. They have worked on real-world applications of machine learning tools and techniques, such as intelligent document processing, digital libraries, and geographic information systems.
The Group has been involved in several national and European research projects, which include: Esprit Project N.5203 INTREPID (INnovative Techniques for REcognition and ProcessIng of Documents), 1991-93; ESPRIT project SODAS 20821 (Symbolic Official Data Analysis System). ESPRIT project CONCERTO 29159 (Conceptual Indexing, Querying and Retrieval of Digital Documents) 1998-2000. National project Intelligent Agents (funded by the Italian Ministry for Universities and Scientific/Technological Research), 1998-1999. In this project our unit has to develop a learning server available on-line for intelligent agent-based applications. The Machine Learning Group is a node of MLNet, the European Network of Excellence on Machine Learning (Esprit Projects 7115 and 29288), and of Compunet European Network of Excellence on Computational Logic (Esprit Project 7230). Its members have also participated to the project LHM (Human and Machine Learning) funded by the European Science Foundation. Key personnel Floriana Esposito is full professor of Computer Science and responsible of the laboratory LACAM. Since 1997 she has been director of the Interdepartmental Centre for Logic and Applications of the University of Bari, and Dean of the Faculty of Informatics of the University of Bari. Currently, she lectures on “Algorithms and Data Structures” and “Knowledge Engineering and Expert Systems”. Currently, her main research interests are in similarity based learning, multistrategy learning, incremental learning and discovery of causal models. She is author of more than 100 papers published in refereed journals and conference proceedings. She is in the directorial board of the Italian Association for Artificial Intelligence (AI*IA) and is currently responsible of the national Machine Learning Group. She has been in the program committees of many international conferences (ECML‟94-98-2000, AI*IA‟93-95-97-99, ECAI‟96, ICDAR‟97-99, ICML‟99); she organised the 13th Int. Conf. 
on Machine Learning (ICML’96), and co-chaired the 4th Int. Workshop on Multistrategy Learning MSL’98. Donato Malerba is an associate professor at the University of Bari, Department of Informatics, where he lectures on “Databases and Knowledge-Base Systems” and “Computer Programming.” For
  • the past decade, he has been active in machine learning and its applications to intelligent document processing, knowledge discovery in databases, map interpretation, and intelligent interfaces. He has published several papers in refereed conferences and journals and received the best paper award of the Symposium on “Knowledge Discovery in Databases” - 13th European Meeting on Cybernetics and Systems Research. He has served on the program committees of the Int. Conf. on Machine Learning (ICML’96, ICML’99), of the AI*IA workshop on Machine Learning and Natural Language Processing (Turin, December 1997), of the ICML’99 Workshop on Machine Learning in Text Data Analysis, and of the 2nd Int. Conf. on Innovation through Electronic Commerce (IeC’99). He has acted, and continues to act, as key personnel in all ESPRIT projects in which the Machine Learning Group of Bari has been involved. Antonietta Lanza is assistant professor at the University of Bari, Department of Informatics, where she teaches courses for the computer science curriculum. She received her first appointment with the University of Bari in July 1984. From 1978 to 1981 she held a fellowship with the C.S.A.T.A. (Centro Studi di Automazione e Tecnologie Avanzate) in Bari and the Institute of Physics of the University of Bari. Initially her research interests were in student modeling and computer-based instruction with applications of many technologies (CAI, CBT, Hypertext, AI, ITS). At present, her main research activity is in man-machine interaction, machine learning and knowledge acquisition; applications include pre-processing, feature extraction and interpretation of topographic charts and cadastral maps. She has published several papers in national and international journals and conferences on the above topics.
School of Geography at the University of Leeds Description of the partner The Centre for Computational Geography is the largest research group within the School of Geography at the University of Leeds, consisting of 14 researchers and 8 postgraduate research students. The School of Geography was rated five (on a scale of 1-5*) in the last UK university research assessment exercise. Research in the centre involves problems from both human and physical geography. The group specialises in the development and application of exploratory spatial data analysis tools and other artificial intelligence techniques in geography. This work has included the application of fuzzy logic to areas such as flood forecasting and new geodemographic systems. Other projects the group has worked on include the design of census output areas for both the UK and Italy, and flexible output systems that assure the confidentiality of the data. The group has recently finished a project on predicting the interaction of human systems and land use degradation processes in the Mediterranean basin (MEDALUS). The centre is also the premier high performance computing group involved in social science research in the UK. The group has been active in developing a culture of use of high performance computers in the social sciences, and especially in geography, by developing key parallel applications for others to use. Key personnel Stan Openshaw is professor of Human Geography and a fellow of the Royal Geographical Society and of the Royal Statistical Society. He is the director of the Centre for Computational Geography. He has been researching intelligent hyperspace search methods and AI techniques for a significant period of his career. He has written a book on AI in geography and developed several generations of search machines for cluster location in multi-dimensional space.
His other research interests are in the application of parallel computing techniques to the development of computational geography and human systems modelling. He gained his PhD from the University of Newcastle, UK in 1974 and was employed at the University of Newcastle until 1992, when he moved to the School of Geography at Leeds University to become professor of human geography.
  • Ian Turton is a senior departmental research fellow in the Centre for Computational Geography. After completing his BSc in Geophysics and Planetary Physics at the University of Newcastle in 1988 he moved to the University of Edinburgh, where he completed a PhD in Geophysics in 1992. He has been a researcher in the Centre for Computational Geography for the past seven years. During this time he has worked on a variety of projects utilising artificial intelligence methods in geographical applications and on the application of parallel programming methods to hard problems in geography. At present he is working on the development of smart pattern detection methods for the analysis of rare diseases. He is the co-author, with Prof. Openshaw, of a textbook on parallel programming applications for geography. He and Prof. Openshaw teach a masters level module in Java for Geographers and are writing a book on this topic. Ian is also working with a postgraduate student in the CCG on the development of a portable mapping toolkit written in Java. Linda See is a research fellow in the Centre for Computational Geography. She obtained a degree in physical geography and environmental management at the University of Toronto in 1988 and then completed an MSc in Climatology at McMaster University in 1990. Linda then became an Associate Professional Officer at the Max-Planck-Institut für Aeronomie in Germany. In 1991 she became the technical co-ordinator in the Global Information and Early Warning System of the Food and Agriculture Organisation (FAO) of the UN in Rome. In 1995 she began a PhD in Fuzzy Logic Applications in Geography at the School of Geography, Leeds. Since completing this in 1998 she has worked on the application of soft computing methods, i.e., fuzzy logic, neural networks and genetic algorithms, to spatial problems.
She is currently involved in the development of better geodemographic systems using these technologies. Andy Turner is a research fellow in the Centre for Computational Geography. In 1996 he completed a degree in Maths, Statistics and Geography at the University of Leeds. Following this he gained an MA in GIS from the School of Geography at Leeds. Since 1997 he has been employed in the Centre for Computational Geography on a variety of projects. Recently Andy has compared the performance of two major commercial data mining packages with the capabilities of in-house spatial analysis tools for geodemographic targeting. He also has extensive experience with neuro-fuzzy methods, spatial pattern analysis and geographical information systems (GIS). The Institute for Information Transmission Problems, Russian Academy of Sciences (IITP RAS) Description of the partner IITP RAS was founded in 1961. At present its basic directions of research are information theory and applied mathematics, and computer and communication sciences in technique, management, nature, language and living systems. Among the most important topics researched by the Institute are problems in the theory of linear analysis of complex systems, image processing, pattern recognition, intelligent geoinformation technologies, and error-correcting coding. There is a stable scientific body of highly trained and young specialists, composed of mathematicians, physicists, biologists, linguists, computer scientists and engineers – a total of 320 collaborators. It currently includes 9 members and corresponding members of the Russian Academy of Sciences, and 205 full professors and doctors of science. IITP RAS will bring in its expertise in seismic data analysis, spatial statistics, and decision support. It will join the project as a partner, yet without funding. The group of Valeri Gitis has been working in the field of geoinformation technology for more than 20 years.
Members of the group have fundamental knowledge and experience both in modern information technology and in seismology. The group has obtained original results in pattern recognition and artificial intelligence. Part of these results received an award from Hewlett Packard in the Competition of Works on Pattern Recognition in 1992. The Group developed several original geoinformation technologies for natural hazard assessment and
  • environmental zonation. The basic direction of the group's activity nowadays is the development of intelligent network geoinformation technologies and systems. Key personnel Valeri Gitis holds a doctorate in Technical Cybernetics and Information and is the head of the department on Geoinformation Technologies and Systems. His fields of research are geoinformation technology, artificial intelligence, seismic hazard and risk assessment, and earthquake prediction. He holds grants from the Russian Basic Research Foundation (N 97-07-90326), “Information technology for space-time forecasting in Earth sciences” (project leader); INCO-COPERNICUS (IC 15 CT97 0200), “Assessment of Seismic Potential in European Large Earthquake Areas (ASPELEA)” (leader of the Russian part of the project); and the Russian Basic Research Foundation (N 99-07-90326), “Word Data Center Online” (researcher). Arkadi Vainchtok graduated from the Moscow Electrotechnical Institute of Telecommunication in 1970. He is a Member of the Artificial Intelligence Association of Russia. His fields of research are pattern recognition, expert systems, speech analysis and recognition, geoinformation technology and instrumental environment. Boris Osher holds a Ph.D. in Physics and Mathematics (thesis: “Uncertainties in earthquake maximum magnitude estimation”) from the Institute of Physics of the Earth, Russian Academy of Sciences, Moscow (1997). His main scientific interests are geographic information systems; estimation of seismic hazard and risk; time and space variations and interconnection of geophysical features; and earthquake records processing. Dialogis Software & Services GmbH, St. Augustin, Germany Description of the partner Dialogis Software & Services GmbH is a spin-off company of GMD, the German National Research Center for Information Technology. Dialogis was founded in March 1997. Dialogis turns, in close collaboration with GMD, research prototypes into marketable products.
The currently evolving product line of Dialogis is composed of three packages that offer support for different facets of decision making processes: the data mining system Kepler offers facilities to extract knowledge from collections of data, hence providing the user with information for an informed decision; the geographic visualization tool dialoGIS (Descartes) displays geographically based data in an easily understood manner, hence making otherwise often incomprehensible data intuitive; the Zeno system for mediated decision making processes helps groups of users to effectively employ each participant’s knowledge to arrive at a decision that optimally reflects the entire group’s knowledge and opinions. Dialogis is also a partner in the EU ESPRIT project CommonGIS, number 28983. Key personnel Dietrich Wettschereck holds a Ph.D. in Computer Science (machine learning) from Oregon State University, USA. From 1994 to 1997 he was a post-doctoral researcher at GMD, where he pursued research topics related to data mining and machine learning, participated in several European research projects, and participated in the development of the data mining system Kepler. He is co-founder and technical director of Dialogis Software & Services GmbH. His responsibilities at Dialogis include supervision of and participation in research and development projects as well as data mining consultancy. Andrea Lüthje works as a Data Mining Consultant and product manager for Kepler at Dialogis. Her responsibilities at Dialogis include coordination of user requirements and software development as
  • well as providing data mining consultancy and end-user training. She has a Diploma in the Social Sciences. Professional GEO Systems B.V. (PGS), Amsterdam Description of the partner Professional GEO Systems B.V. was founded in March 1996 by a group of researchers from TNO (a national Dutch research organization) and the University of Amsterdam. PGS has developed the first commercial pure Java map viewing environment (Lava/Magma). This was a further development of GEO++ (developed by TNO and marketed by PGS). Currently PGS offers the following products and services: the Java-based map viewing environment Lava/Magma (used, for example, in a decision support system built by the United States Geological Survey (http://dss1.er.usgs.gov/)); the ‘Vastgoed Informatie Web’ (VIW, Real-Estate Information Web), a data-warehouse-based system for integrated information management for all real-estate related information in a municipality (including cadastral, environmental, planning, zoning and management information), whose map viewing environment is based on Lava/Magma and which is completely Internet based; services for implementing the VIW at municipalities in Holland; and general consulting services related to the design and implementation of geographical information systems. Key personnel Frank Tuijnman holds a Ph.D. in Computer Science from the University of Amsterdam. As a lecturer at the University of Amsterdam he pursued research topics related to AI, robotics and distributed database management systems, and has published numerous articles on these topics. He has participated in several European research projects. He is co-founder and a director of Professional GEO Systems B.V. GeoForschungsZentrum, Potsdam, Germany Description of the partner The GeoForschungsZentrum (GFZ) is the German national research centre for Earth sciences, founded on January 1st, 1992 on the Telegrafenberg in Potsdam.
Financing is provided by the Federal Ministry of Education and Research. GFZ has a staff of about 600, of whom 300 are scientists. The annual budget is approximately Euro 35 million, about 30% of which is externally funded. As the first of its kind world-wide, the GFZ combines all solid earth science fields, including geodesy, geology, geophysics, mineralogy and geochemistry, in a multidisciplinary research centre. 22 sections are organised in five divisions according to the main topics of the GFZ: Kinematics and Dynamics of the Earth; Solid Earth Physics and Disaster Research; Structure and Evolution of the Lithosphere; Material Properties and Transport Processes; and Rock Mechanics and Management of Drilling Projects. Research is accomplished by the use of a broad spectrum of methods and techniques, such as satellite geodesy and remote sensing, geophysical deep sounding, scientific drilling, experiments under in-situ conditions and modelling of geo-processes. The GFZ maintains various instrument pools for field research and global measurement campaigns, a team of engineers for the development of geoscientific instruments and a group of specialists for the Task Force Earthquake. An underlying principle is to combine the geoscientific know-how of universities and other research centres in national and international joint projects. The Section Earthquakes and Volcanism is a research group (15 scientists, 8 PhD students, 10 technicians and engineers, several students) with research focus and experience on origins of hazards,
  • development and installation of monitoring networks and early warning systems, and training experts in seismic hazard assessment, in particular in developing countries. The group has experience with EU projects (PRENLAB 1,2; BBMT 1,2 and EPOC-CT91/0043). Key Personnel Prof. Dr. Jochen Zschau has been director of the GFZ division “Disaster Research” since 1992 and, since 1996, director of the GFZ division “Solid Earth Physics and Disaster Research” and head of the section “Earthquakes and Volcanism”. He received his Ph.D. from Kiel University in 1974 and became a professor of geophysics at Kiel University in 1980. His field of research is general and theoretical geophysics, potential theory, regional and global dynamics, rheology, earthquake prediction and volcano monitoring. He is a member of the European Seismological Commission – Subcommission on Earthquake Prediction (Chairman, since 1996), the European Advisory Evaluation Committee for Earthquake Prediction (Council of Europe, Vice president, since 1994), the Scientific Advisory Board of the German Committee for the IDNDR (Member, since 1994), the IASPEI Subcommission on Earthquake Prediction (Member, since 1993) and the German Task Force Committee for Earthquakes (Chairman, since 1993). Heiko Woith received his Ph.D. in geology from the University of Kiel in 1996, with research in hydrology, nuclear physics and earthquake prediction. He is the responsible scientist and manager of the project READINESS (REAltime Data Information Network in Earth ScienceS), which is related to large scale fault zone interaction in the Eastern Mediterranean. Claus Milkereit received his Ph.D. in geophysics from the University of Potsdam in 1998, with research in theoretical geophysics, time series analysis, seismology and earthquake prediction. His main task is monitoring of the seismic activity at the western end of the North Anatolian Fault near Istanbul. Malte Westerhaus holds a Ph.D.
since 1996 from the University of Kiel in geophysics, with research in tilt and well level tides along active faults, volcano monitoring and earthquake prediction. He is the responsible scientist and manager of the ground deformation within the project MERAPI and the deformation network at the North Anatolian Fault in Turkey. Anita Pfaff holds a Diploma in Geography and is a Ph.D. student working on the presentation of geological and geophysical mapping and monitoring data. Manchester Metropolitan University/MIMAS Description of the partner Manchester Metropolitan University is the largest non-federal university in the UK. Within the Department of Environmental and Geographical Sciences is the GIS and Remote Sensing Research Group. The main areas of research are Internet mapping, access to spatial databases over the Internet, web-based educational technologies, satellite remote sensing, digital image processing and environmental modelling. The group also hosts UNIGIS, a world-wide consortium of educational establishments providing a common programme of distance education in GIS. Currently this comprises over twenty institutions in sixteen countries and operates through a web-based system of education management and delivery. The group has led several research projects in the field of GIS and World Wide Web technologies. The main ones are the KINDS projects for access to large spatial data sets over the Internet (http://midas.ac.uk/kinds). A summary of the functionality of the KINDS system is given in Table 1. The KINDS Projects are undertaken in collaboration with the University of Salford IT Institute and Manchester Computing, which includes MIMAS (formerly MIDAS), the national academic data provider that hosts and supports the use of national spatial data sets including the census and Bartholomew’s and Ordnance Survey map data. A major collaborator in the KINDS Project is the Office of National Statistics, which produces and distributes the UK Census.
  • Key Personnel Dr. Jim Petch is the leader of the GIS and Remote Sensing Group. He is coordinating several projects in the Application of Mathematical and Statistical Models of Complex Systems to the Analysis of Spatial Structure of Remotely Sensed Images, the Effective Sustainable Use of Network Accessible Datasets, and Catchment Modelling of Hydrological Parameters. C8. Economic development and scientific and technological prospects Public access to the immense volume of existing geo-data and their exploitation is of significant value for the development of an open and democratic “information society” and a true global market. The widespread use of geo-data and GIS will promote general public awareness and further social cohesion. Publicly available geo-data is, however, of little use unless people can easily access and exploit it. Here, the SPIN! system will advance the state of the art, and we expect that the exploitation of the results will proceed along the following axes: Software Components; Demonstrators; Application framework; additional user groups; other dissemination activities. Software Components. The software architecture of SPIN! is based on a set of reusable and self-contained components. The great advantage of the generic approach to Spatial Mining in SPIN! is that the components are independent of the specific application domain and can be used as building blocks for developing particular applications. Robustness, scalability, platform-independence and timeliness are the foreseen benefits when following this approach for developing new applications.
Two results of the project will be of particular commercial value: (i) Integrated software system (ii) Application to web based brokering The target market for the first products will be end users themselves, software developers aiming to incorporate in their GIS applications the ”intelligence” of an automatic data analysis mechanism, and GIS companies who would like to add this functionality to their solutions. The target market for the second product are information providers in the public sector, but also commercial companies, e.g. active in geomarketing. Dialogis intends to commercialize the results of this project. The Descartes environment already enables government agencies or statistical analysts to make their geographic information available over the Internet. Kepler allows users in the industry to analyze their data with Data Mining methods. The markets for data mining tools and services as well as for geographical information systems are growing at a rate substantially higher than that for the entire sector of information technology. The results of the proposed project will enable Dialogis to market a greatly improved and highly competitive data mining / GIS product. We strongly believe that the resulting product will: enable Dialogis to establish the resulting system world wide as a highly competitive European data mining tool / GIS tool, make data mining and GIS technology accessible to and affordable for clients that currently refrain from investing into such technology due to the high recurring consulting costs, open up entirely new markets for Dialogis due to the substantially enhanced functionality of Kepler and dialoGIS that will be part of the resulting product. Dialogis sees the resulting product at the core of its product line, and will exploit its results of the project by marketing the improved product to all existing and future customers of Dialogis. Existing customers will be utilized as reference customers for the end result of SPIN!. 
Further customers can be acquired through the standard sales channels of Dialogis (own sales activities, affiliates outside of Germany, value added resellers and OEM partners). The results of this project are of such importance to Dialogis that we see them as a core precondition for the international expansion of Dialogis in the data
  • mining market. Conservatively estimated, we expect a two-fold increase in sales through the proposed project. PGS expects to integrate the results of the project into its VIW and Lava/Magma product lines. The VIW is a data-warehouse system for real-estate related information for Dutch municipalities. The system is completely based on Internet technology, so that all information is accessible to any (authorized) Internet user. Currently most projects with VIW concentrate on building up the data warehouse. Typically this is a complex process, requiring organizational changes. The reason is that a unified view of all information is required across the entire organization. Even for basic information (such as addresses and who lives where) different departments (social security, tax, planning) use different datasets. We expect that in two to three years a substantial number of municipalities will have constructed a data warehouse with VIW and have a good, consistent dataset. We anticipate that at that time a great interest will arise in an easy-to-use analysis and data-mining environment to fully exploit the information in the data warehouse. PGS will also exploit the results of the project to improve its consulting capabilities for complex geographical analysis operations. In particular we expect noise-level zoning to become a major issue in the next two to three years in Holland, because of anticipated new legislation. The Lava/Magma environment already enables local and other government agencies to effectively make their geographic information available through the Internet to a large public. With the results of this project added to that environment, PGS expects to dramatically improve the attractiveness of its product line for expert users who want to carry out complex analysis operations on their data sets.
The technology developed in this project is uniquely suited for innovative ways of web based information brokering and has a broad range of applications. PGS and Dialogis are convinced that a shared exploitation of the SPIN! results will considerably enhance the market potential of both companies. They therefore intend joint marketing and sales activities building on their respective expertise and customer base. They already made a similar agreement for Lava/Magma and Descartes within the CommonGIS project. Manchester Metropolitan University(MMU) with MIMAS (formerly MIDAS) runs the KINDS Project (http://www.midas.ac.uk/kinds) for accessing national spatial data sets over the Internet. The SPIN! Project will enhance considerably the service which can be offered to academic users by providing a major extension of functionality to complement the data browsing, data access and visualisation services which are currently available. MIMAS provides the main academic data service to the UK academic community and the SPIN! Project will have an immediate and maintained role in this service. A collaborator of MMU in the SPIN Project is the Office of National Statistics that produces and distributes the UK National Census. A new census will be undertaken in 2001. The SPIN Project is timely in the planning phase for the distribution of the Census data to commercial and academic users and is expected to be a major platform for the dissemination of data. Additional user groups and Scientific exploitation. The Global Biodiversity Information Facility (GBIF), whose installation is recommended by the OECD, has identified DMS and GIS as key technologies. Here is an exciting opportunity to develop a spatial mining solution as a coordinated European effort which can be linked to develop a European perspective within GBIF. 
From the strategic perspective of the GMD knowledge discovery team, biodiversity informatics will be a major application area in which the techniques developed in this project can be put to very good use, supporting several European conventions. Partners at the University of Leeds have also been involved in environmental EU research, namely MEDALUS III. This Mediterranean desertification and land use project completed a third stage of research this year, and a further proposal has been submitted to the Framework 5 research program. CCG research in MEDALUS III was geared to designing and developing a Synoptic Prediction
  • System (SPS) which aimed to forecast future land use change impacts and land degradation risks based on imposed climate change scenarios. The researchers of the Department of Informatics of the University of Bari (I) have been working for a decade on the application of machine learning tools and techniques to problems related to image processing and computer vision. The SPIN! project provides a natural extension of the work done in Bari (I) along two new directions: the embedding of the machine learning algorithms in a platform that tightly integrates GIS and Data Mining tools, and the application of the developed research techniques and tools to a new domain, namely earthquake prediction and hazard assessment. The former extension is important in order to define standard algorithms for the automated extraction of features from maps. Currently, many proprietary formats of vectorised maps are available, which makes tools for automated extraction of information from maps hardly reusable. The collaboration with researchers having experience in GIS will give researchers from the University of Bari (I) a better understanding of how to develop interoperable feature extraction algorithms for vectorised maps. Moreover, the close collaboration with end users requiring innovative tools for discovery of geographic knowledge in data-rich environments will potentially result in a better understanding of the possible application areas and also open up some new research problems.
Experiences and non-confidential research results will be disseminated in the scientific community by publishing papers and by organizing a workshop in collaboration with other partners of the European Network of Excellence on Machine Learning II (Esprit Project 29288), especially those actively involved in the “Industrial Application Initiative”, and partners of the ESPRIT project SODAS 20821 (Symbolic Official Data Analysis System), namely the statistical offices that already have geographically referenced data. The partner GFZ will contribute to the design and application of a demonstrator for seismic data research. GFZ will profit from the technology by getting access to advanced and complementary methods for data analysis. It intends to maintain a service to make research results on seismic and volcano data as well as on hazard management accessible via the Internet using the technology developed in this project. The research activities of the research group “Earthquakes and Volcanism” of the GFZ are world-wide, with permanent observation networks in the Mediterranean region and Indonesia. Co-operative partner institutions are located, for example, in Greece, Italy, Turkey, Armenia, Israel, Venezuela, China and Indonesia. World-wide exchange of data and of scientific results between research groups is therefore crucial. As many countries do not yet possess a fast Internet connection, exchange of large data sets or graphic information is still a bottleneck. As long as there are limitations in the bandwidth of telecommunication lines, scientists are in need of intelligent and effective methods for transferring geographical information and results, so that results can be examined by scientists and administrative persons not only in developed but also in developing countries. Other Dissemination Activities.
With regard to promotion and diffusion, the final goal for the partnership should be to offer end users concrete examples of how to disseminate geo-data to a wide audience of users and how to exploit such data: the success of these demonstrators will significantly contribute to the visibility of European GIS and Data Mining technology. In the course of the project, when the first prototype becomes available, the establishment of an Advisory Board with external members will be seriously considered. The project will then have to assign some financial resources for paying the expenses of the Board members. We have already asked the EEA (European Environment Agency) for their participation, and they have shown explicit interest. EEA has also shown interest in becoming an end user of the results of the project, having in mind the recently started work on Sustainable Local indicators, together with DGXI, and also their information gathering, assessment and reporting cycle activities. For example, EIONET – the European Environment Information and Observation NETwork – was created as the main vehicle
  • SPIN!, IST-99-10536, 15.06.1999 57 of the European Environment Agency to collect data, information and knowledge for the process of reporting on the state of environment. 57
Appendix: Publications of partners cited in part B

References partner P1 - GMD

The Descartes system can be found at the following places:
http://allanon.gmd.de/and/java/iris/
http://ais.gmd.de/descartes/IcaVisApplet/

1. Andrienko, G. and Andrienko, N. Interactive Maps for Visual Data Exploration. International Journal of Geographical Information Science, 1999, 13(4), pp. 355-374.
2. Andrienko, G. and Andrienko, N. Intelligent Visualization and Dynamic Manipulation: Two Complementary Instruments to Support Data Exploration with GIS. In Proceedings of AVI'98: Advanced Visual Interfaces Int. Working Conference (L'Aquila, Italy, May 24-27, 1998), ACM Press, pp. 66-75.
3. Andrienko, G. and Andrienko, N. Knowledge-Based Visualization to Support Spatial Data Mining. In Proceedings Intelligent Data Analysis IDA'99, Springer-Verlag, 1999 (accepted).
4. Klösgen, W. (1998). Deviation and association patterns for subgroup mining in temporal, spatial, and textual data bases. In: Polkowski, L., Skowron, A. (eds): Rough sets and current trends in computing. Lecture Notes in Artificial Intelligence, Vol. 1424, pp. 1-18, Springer, Berlin, Heidelberg, New York.
5. Klösgen, W. and Zytkow, J. (eds) (1999). Handbook of Data Mining and Knowledge Discovery, Oxford University Press, New York.
6. G. Paass and J. Kindermann (1995), Bayesian Query Construction for Neural Network Models. In: G. Tesauro, D. Touretzky, T. Leen (eds.): Advances in Neural Information Processing Systems 7, pp. 443-450, MIT Press.
7. Kindermann, J., Paaß, G. and Weber, F. (1995), Query Construction for Neural Networks Using the Bootstrap. In: Fogelman-Soulie, F. and Gallinari, P. (eds.), Proc. ICANN 95, International Conference on Artificial Neural Networks, Paris, EC2 & Cie, 135-140.
8. Gerhard Paaß, Jörg Kindermann: Bayesian Classification Trees with Overlapping Leaves Applied to Credit-Scoring. In: X. Wu, R. Kotagiri, K.B. Korb (eds.): Research and Development in Knowledge Discovery and Data Mining. Springer-Verlag, Berlin, 1998, pp. 234-245.
9. J. Kindermann and G. Paass (1998), Model Switching for Bayesian Classification Trees with Soft Splits. In: J. Zytkow and M. Quafafou (eds.): Principles of Data Mining and Knowledge Discovery, 148-157, Springer.
10. Stefan Wrobel. Scalability Issues in Inductive Logic Programming Data. In Proc. 9th Int. Workshop on Algorithmic Learning Theory (ALT-98), Berlin, 1998. Springer Verlag.
References partner P2 - University of Bari

1. Malerba D., Esposito F., and Lisi, F.A. (1998). Learning recursive theories with ATRE. In H. Prade (Ed.), Proceedings of the 13th European Conference on Artificial Intelligence, 435-439, John Wiley & Sons, Chichester, England.
2. F. Esposito, A. Lanza, D. Malerba, & G. Semeraro (1997). Machine learning for map interpretation: An intelligent tool for environmental planning. Applied Artificial Intelligence: An Artificial Intelligence Journal, 11, 10, 673-696.
3. F. Esposito, A. Lanza, D. Malerba, & G. Semeraro (1998). Information capture from topographic maps using machine learning. Proceedings of the Joint Workshop of the Italian Association for Artificial Intelligence (AI*IA) and the International Association for Pattern Recognition - Italian Chapter (IAPR-IC) on "Artificial Intelligence and Pattern Recognition Techniques for Computer Vision", 122-127.

References partner P3 - IITP, Russian Academy of Sciences

The GeoProcessor system can be accessed at:
http://www.iitp.ru/projects/geo/index.html
http://www.iitp.ru/projects/geo/geoprocessor.html

1. Gitis V., Dovgyallo A., Osher B. An information technology for analysis of geological and geophysical data in INTERNET. Proceedings of VI national conference on Artificial Intelligence, Puschino, 1998, 473-479 (in Russian).
2. Gitis V., Dovgyallo A., Osher B., Gergely T. GeoNet: an information technology for WWW on-line intelligent Geodata analysis. Abstracts of 4th EC-GIS Workshop, Hungary, 1998.
3. Gitis V., Dovgyallo A., Osher B., Gergely T. An approach to Online Geoinformation Modeling. Proceedings of the 1st International Workshop on Computer Science and Information Technologies, Moscow, January 18-22, 1999, 181-186.
4. Gitis V.G. GIS technology for the design of computer-based models in seismic hazard assessment. Geographical Information Systems in Assessing Natural Hazards, A. Carrara and F. Guzzetti (eds), 1995, Kluwer Academic Publishers, 219-233.
5. Gitis V.G., Jurkov E.F., Osher B.V., Pirogov S.A., Ponomarev A.V., Sobolev G.A. A system for analysis of geological catastrophe precursors. Journal of Earthquake Prediction Research 3, 1994, 540-555.

References partner P4 - Leeds

The internet version of the GAM system can be found at:
http://www.ccg.leeds.ac.uk/smart/gam/gam.html

1. Openshaw, S. and Perrée, T. (1996) 'User centred intelligent spatial analysis of point data', in Parker, D. (ed.) Innovations in GIS 3, Taylor and Francis, London, 119-134.
2. Openshaw, S., Turton, I., Macgill, J. and Davy, J. (1999) Putting the Geographical Analysis Machine on the Internet, in Gittings, B. (ed.) Innovations in GIS 6, Taylor and Francis, London (in press).
3. Openshaw, S., Turner, A., Turton, I., Macgill, J. and Brunsdon, C. (2000) Testing space-time and more complex hyperspace geographical analysis tools, in Martin, D. (ed.) Innovations in GIS and GeoComputation 7, Taylor and Francis, London (in press).
4. Openshaw, S. (1998) 'Building automated Geographical Analysis and Exploration Machines', in Longley, P. A., Brooks, S. M. and Mcdonnell, B. (eds) Geocomputation: A primer Macmillan Wiley Chichester, p95-115.
5. Turton, I. (1999) Using Pattern Recognition to Discover Concepts in Spatial Data, in Gittings, B. (ed.) Innovations in GIS 6, Taylor and Francis, London (in press).
6. Turton, I. (1999) Application of Pattern Recognition to Concept Discovery in Geography, in Allan, R.J., Guest, M.F., Simpson, A., Henty, D. and Nicole, D., High-Performance Computing, p467-486, Plenum Press, New York.
7. Openshaw, S. and Turton, I. (1998) Application of GAM to crime analysis, Crime Mapping Research Centre Report, U.S. Department of Justice.
8. Openshaw, S., Turton, I. and Macgill, J. (1999) Using the Geographical Analysis Machine to analyse census limiting long term illness, Geographical & Environmental Modelling, vol. 3.1, p83-99.
9. Carver, S., Blake, M., Turton, I. and Duke-Williams, O. (1997) Open spatial decision making: Evaluation of the potential of the world wide web, in Z. Kemp (ed.), Innovations in GIS 4, pp. 267-278, Taylor and Francis, London.

References partner P5 - Dialogis

1. Wrobel, S., Wettschereck, D., Sommer, E., Emde, W. (1996). Extensibility in Data Mining Systems, Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, AAAI Press, Menlo Park, California.

References partner P6 - PGS

The Lava/Magma system can be accessed at:
http://www.pgs.nl/

1. C. van den Berg, F. Tuinman, T. Vijbrief, C. Meijer, P. van Oosterom, and H. Uitermark (1999), Multi-server Internet GIS: Standardization and Practical Experiences. In Goodchild, M., Egenhofer, M., Fegeas, R., and Kottman, C. (eds.) Interoperating Geographic Information Systems. Boston: Kluwer Academic Publishers, 1999, pp. 365-377.