Real-world objects (streets, roads, homes, etc.) are surveyed by various companies and institutions. The result is a variety of heterogeneous geo-spatial data sets that represent the same area of the real world but differ in their geometrical and thematic accuracy. Insufficient geometrical and thematic accuracy creates the need for a new method of geo-spatial information acquisition. In previous work, the Geoinformation Research Group at the University of Potsdam has developed a method that increases the thematic and geometrical quality of the available spatial data sets. The new method incorporates the data fusion process.
DAFU has three components: 1) pre-processing of the input data sets, 2) fusion/filtering of the input data sets, and 3) post-processing of the resulting data set.
The developed algorithms were implemented in Perl, a widely used interpreted, dynamic programming language. Perl is particularly suitable for processing and manipulating large ASCII data sets (DAFU works mainly with ASCII files). The graphical user interface, which provides easy control of the core routines, was developed using the Tk library. During the implementation of DAFU, a modular software design was particularly important, because it allows generic programming and considerably simplifies subsequent extension. DAFU contains a core with five modules. Three of them (the ATKIS, TeleAtlas and Navteq modules) cover the input data sets. Their objective is to convert the input data into the same internal data structure. Once the input data sets have been converted into the same data structure during the pre-processing step, the next step calls the assignment module, which relates individual objects of the different data sets to each other. This is a necessary condition for the fusion module, which merges two different data structures into one. The merging follows rules that the user specifies through the graphical user interface (GUI). The GUI module controls not only the core of DAFU but also the periphery, which comprises the input and output modules and the pre-processing and post-processing modules. The input module supplies the data for the core, which in turn provides data for the output module. The GUI module is also responsible for visualising the input data sets and the output data.
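The conversion of differently structured input data into one internal data structure can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the Perl of the actual system, and the `Feature` class, the `parse_ascii_line` function and the semicolon-separated record layout are assumptions made for illustration, not DAFU's real format.

```python
from dataclasses import dataclass, field


@dataclass
class Feature:
    """Hypothetical common internal representation of one spatial object."""
    source: str                      # data set of origin, e.g. "ATKIS"
    geometry: tuple                  # tuple of (x, y) coordinate pairs
    attributes: dict = field(default_factory=dict)


def parse_ascii_line(line, source, sep=";"):
    """Parse one record of an assumed layout: id;key=value,...;x y,x y,..."""
    ident, attr_part, coord_part = line.strip().split(sep)
    attributes = {"id": ident}
    for pair in filter(None, attr_part.split(",")):
        key, value = pair.split("=", 1)
        attributes[key] = value
    geometry = tuple(
        tuple(float(v) for v in point.split())
        for point in coord_part.split(",")
    )
    return Feature(source=source, geometry=geometry, attributes=attributes)


# One made-up record, tagged with the data set it came from.
feature = parse_ascii_line("42;name=Main St,class=road;0 0,10 0,10 5", "ATKIS")
```

In such a design each input module would only need its own parser, while the assignment and fusion steps operate on `Feature` objects regardless of their origin.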
The first DAFU component, pre-processing, comprises the following process steps: analysis of the input data sets, determination of the data quality, and data preparation. The first process step analyses the geo-spatial input information, that is, the vector model of every input data set. In the first step of this analysis, each input data set must be converted into a unified coordinate system, which is important for the later merging of the geo-data. The next step is the transfer of the input data into the same data format. In the third analysis step, the uniqueness and completeness of the spatial input data sets must be verified. All data sets used in the DAFU system are vector data, and every spatial object contains geometric and semantic information. In addition, one or more quality measures are computed, which define the quality of the available input data sets. To define a quality measure, two characteristics of each input data set have to be examined: the degree of topological correctness and the thematic completeness. For the merged geo-data to be topologically correct, the geo-data must first be corrected geometrically, which includes removing duplicate geometries. The result of the pre-processing component is the output-pre-pro-data sets, which serve as input data for the fusion/filtering component.
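Removing duplicate geometries can be sketched as follows. This is an illustrative Python example, not the DAFU implementation: the function names and the rounding precision are assumptions, and the direction-independent comparison shown here is suitable for lines only (polygon rings with a shifted start vertex are not detected).

```python
def normalise(geometry, precision=3):
    """Round coordinates and make the vertex order direction-independent."""
    rounded = tuple((round(x, precision), round(y, precision)) for x, y in geometry)
    # A line and its reversed copy map to the same key.
    return min(rounded, rounded[::-1])


def remove_duplicates(geometries, precision=3):
    """Keep the first occurrence of every geometry and drop later copies."""
    seen = set()
    unique = []
    for geom in geometries:
        key = normalise(geom, precision)
        if key not in seen:
            seen.add(key)
            unique.append(geom)
    return unique


lines = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(1.0, 1.0), (0.0, 0.0)],   # same line, digitised in the opposite direction
    [(0.0, 0.0), (2.0, 2.0)],
]
unique = remove_duplicates(lines)   # the reversed copy is dropped
```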
After the input data sets have been analysed and prepared successfully, they are merged. All DAFU fusion algorithms are based on the direct comparison of coordinates. Separate algorithms were developed for fusing vector data of different geometrical types, but the requirements for the successful execution of the subroutines are the same for all geometrical types. The input data sets must have the same coordinate system and the same data format. A further important requirement is that the input data sets are free of redundancy, meaning that each spatial object may be represented by only one geometry. Pairs of coordinates are used to determine the relationships between objects that are present in the various input data sets and represent the same object in the real world. Creating a relation between two objects makes it possible to transfer attributes (thematic information). The user-defined set of attributes ensures that thematic information about an object is transferred from two or more input data sets. This transfer (or cross-referencing) extends the attribute table and generates new geometrical features. In a single DAFU run, only one input data set can be extended with the other; the user decides which data set is extended. The result of the fusion/filtering component is the output-merge-data set.
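The principle of relating objects by direct coordinate comparison and then transferring user-selected attributes can be sketched in a few lines. The example is in Python and makes several assumptions that are not taken from DAFU: objects are plain dictionaries, coordinates are rounded before comparison, existing target attributes are never overwritten, and only the attribute transfer (not the generation of new geometrical features) is shown.

```python
def coord_key(geometry, precision=3):
    """Direction-independent key built from the rounded coordinate pairs."""
    rounded = tuple((round(x, precision), round(y, precision)) for x, y in geometry)
    return min(rounded, rounded[::-1])


def fuse(target, source, attributes, precision=3):
    """Extend target objects with selected attributes of matching source objects."""
    index = {coord_key(obj["geometry"], precision): obj for obj in source}
    fused = []
    for obj in target:
        merged = {"geometry": obj["geometry"], "attributes": dict(obj["attributes"])}
        match = index.get(coord_key(obj["geometry"], precision))
        if match is not None:
            for name in attributes:
                if name in match["attributes"]:
                    # Assumed rule: values already present in the target win.
                    merged["attributes"].setdefault(name, match["attributes"][name])
        fused.append(merged)
    return fused


target = [
    {"geometry": [(0.0, 0.0), (10.0, 0.0)], "attributes": {"id": "a1"}},
    {"geometry": [(0.0, 0.0), (0.0, 10.0)], "attributes": {"id": "a2"}},
]
source = [
    {"geometry": [(10.0, 0.0), (0.0, 0.0)], "attributes": {"name": "Main St", "speed": "50"}},
]
result = fuse(target, source, attributes=["name"])
```

Here the first target object receives the `name` attribute of the matching source object, while `speed` is ignored because it is not in the user-selected attribute list. The second target object has no counterpart and is passed through unchanged.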
In the post-processing component, the output-merge-data set must be verified. The quality of the fusion process is calculated and evaluated. This is followed by manual correction of possible errors, which are usually below five percent for line-like objects and below 10-15 percent for polygonal objects. After the manual correction of errors, the geo-data can be transferred into different coordinate systems. The final step of the post-processing component is the conversion of the merged data set into other data formats. These last two steps are performed only at the user's request.
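As an illustration of the final format conversion, the following Python sketch writes a coordinate list as Well-Known Text (WKT), one of the file formats DAFU supports. The function name `to_wkt` and its `kind` parameter are hypothetical, and the sketch covers only simple lines and polygons without holes.

```python
def to_wkt(geometry, kind="line"):
    """Serialise a list of (x, y) pairs as a WKT LINESTRING or POLYGON."""
    if kind not in ("line", "polygon"):
        raise ValueError(f"unsupported geometry kind: {kind}")
    coords = [(float(x), float(y)) for x, y in geometry]
    if not coords:
        raise ValueError("geometry must contain at least one coordinate pair")
    if kind == "polygon" and coords[0] != coords[-1]:
        coords.append(coords[0])    # WKT polygon rings must be closed
    text = ", ".join(f"{x!r} {y!r}" for x, y in coords)
    if kind == "polygon":
        return f"POLYGON (({text}))"
    return f"LINESTRING ({text})"


line_wkt = to_wkt([(0, 0), (10, 5.5)])
polygon_wkt = to_wkt([(0, 0), (1, 0), (1, 1)], kind="polygon")
```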
This screenshot shows the input elements of the DAFU GUI, where information about the source, target and final data sets is entered. It is important to set the type of each data set (ATKIS, TeleAtlas, Navteq or user-defined) and the file format (SVG, SHP, WKT or user-defined), because DAFU executes different subroutines depending on these settings. Four debugging levels (0 to 3) can be set. Once the input data have been entered successfully in the input section, the user can begin to analyse the input data sets in the attribute section. Analysing the attributes and values of the source and target data sets is important for determining which attributes should be included in the final data set.
A Data Fusion System for Spatial Data Mining, Analysis and Improvement Silvija Stankute, Hartmut Asche - University of Potsdam