A Data Fusion System for Spatial Data Mining, Analysis and Improvement Silvija Stankute, Hartmut Asche - University of Potsdam


  • Real-world objects (streets, roads, homes, etc.) are surveyed by various companies and institutions. The result is a variety of heterogeneous geospatial data sets that represent the same area of the real world but differ in geometric and thematic accuracy. Where that accuracy is insufficient, a new method of acquiring geospatial information is needed. In previous work [1], [2], [3], the Geoinformation Research Group at the University of Potsdam developed a method that increases the thematic and geometric quality of the available spatial data sets. The method is built around a data fusion process.
  • DAFU has three components: 1) pre-processing of the input data sets, 2) fusion/filtering of the input data sets, 3) post-processing of the final data set.
  • The algorithms were implemented in Perl, a widely used interpreted, dynamic programming language. Perl is particularly suitable for processing and manipulating large ASCII data sets (DAFU works essentially with ASCII files). The graphical user interface for easy control of the core routines was developed using the Tk library. During implementation, a modular software design was particularly important: it allows generic programming and considerably simplifies later extension. DAFU contains a core with five modules. Three of them (the ATKIS, TeleAtlas and Navteq modules) cover the input data sets; their job is to convert the input data into the same internal data structure. Once the input data sets have been converted into the same data structure during pre-processing, the next step calls the assignment module, which relates individual objects of the different data sets to each other. This is a necessary condition for the fusion module, which merges two different data structures into one. The merge follows rules that the user specifies via the graphical user interface (GUI). The GUI module controls not only the core of DAFU but also the periphery, which comprises the input and output modules and the pre-processing and post-processing modules. The input module supplies the data for the core, which in turn provides data for the output module. The GUI module is also responsible for visualising the input data sets and the output data.
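The idea of one input module per data source, each converting to a shared internal structure, can be sketched as follows. This is an illustration in Python, not DAFU's Perl code: the `Feature` class, the `register` decorator and the record field names (`coords`, `kind`, `points`, `road_class`) are all invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Coordinate = Tuple[float, float]


@dataclass
class Feature:
    """Hypothetical common internal structure: one geometry plus attributes."""
    geometry: List[Coordinate]
    attributes: Dict[str, str] = field(default_factory=dict)


# Registry mapping a source type to its converter (one "module" per source).
CONVERTERS: Dict[str, Callable[[dict], Feature]] = {}


def register(source_type: str):
    """Decorator that adds a converter function to the registry."""
    def decorator(func: Callable[[dict], Feature]):
        CONVERTERS[source_type] = func
        return func
    return decorator


@register("ATKIS")
def from_atkis(record: dict) -> Feature:
    # The keys "coords" and "kind" are assumptions, not the real ATKIS schema.
    return Feature([tuple(p) for p in record["coords"]], {"class": record["kind"]})


@register("TeleAtlas")
def from_teleatlas(record: dict) -> Feature:
    # The keys "points" and "road_class" are assumptions as well.
    return Feature([tuple(p) for p in record["points"]], {"class": record["road_class"]})


def to_internal(source_type: str, record: dict) -> Feature:
    """Convert a source-specific record into the common structure."""
    try:
        converter = CONVERTERS[source_type]
    except KeyError:
        raise ValueError(f"no input module registered for {source_type!r}")
    return converter(record)
```

With this layout, supporting a further source only means registering one more converter; the assignment and fusion steps see nothing but `Feature` objects.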
  • The first DAFU component, pre-processing, comprises the following steps: analysis of the input data sets, determination of the data quality, and data preparation. The subject of this component is the analysis of the geospatial input information: the vector model of every input data set is analysed. In the first pre-processing step, each input data set must be converted into a unified coordinate system, which is important for the later merge. The next step transfers the input data into the same data format. In the third step, the uniqueness and completeness of the spatial input data sets must be verified. All data sets used in DAFU are vector data, and every spatial object contains geometric and semantic information. In addition, one or more quality measures are computed to define the quality of the available input data sets. To define a quality measure, two characteristics of each input data set are examined: the degree of topological correctness and the thematic completeness. So that the merged geodata are topologically correct, the geodata must be corrected geometrically, which includes removing duplicate geometries. The result of the pre-processing component is the output-pre-pro-data sets, which serve as input for the fusion/filtering component.
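Removing duplicate geometries could look roughly like the sketch below. It assumes that two geometries count as duplicates when they have the same vertex sequence after rounding, in either direction; the notes do not say how DAFU decides this, so the criterion, the `precision` parameter and the function names are assumptions.

```python
from typing import List, Sequence, Tuple

Coordinate = Tuple[float, float]


def geometry_key(geometry: Sequence[Coordinate], precision: int = 6) -> tuple:
    """Build a direction-independent key from the rounded vertex sequence."""
    rounded = tuple((round(x, precision), round(y, precision)) for x, y in geometry)
    # A line digitised from A to B and from B to A gets the same key.
    return min(rounded, rounded[::-1])


def remove_duplicate_geometries(
    geometries: Sequence[Sequence[Coordinate]],
) -> List[List[Coordinate]]:
    """Keep the first occurrence of each geometry and drop later duplicates."""
    seen = set()
    unique = []
    for geometry in geometries:
        key = geometry_key(geometry)
        if key not in seen:
            seen.add(key)
            unique.append(list(geometry))
    return unique
```

Rounding before comparison is a design choice of this sketch: it tolerates tiny numerical differences left over from coordinate conversion, at the cost of treating vertices closer than the rounding step as identical.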
  • After the input data sets have been analysed and prepared, they are merged. Separate algorithms were developed for fusing vector data of different geometry types, but all of them are based on the direct comparison of coordinates, and the requirements for running the subroutines are the same for all geometry types. The input data sets must have the same coordinate system and the same data format. They must also be free of redundancy, meaning that each real-world object may be represented by only one geometry. Pairs of coordinates are used to determine which objects in the various input data sets represent the same object in the real world. Creating a relation between two objects makes it possible to transfer attributes (thematic information). The user-defined set of attributes determines which thematic information about an object is transferred from two or more input data sets. This transfer (or cross-referencing) extends the attribute table and generates new geometric features. In one run of DAFU, only one input data set can be extended with the other; the user decides which one. The result of the fusion/filtering component is the output-merge-data set.
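A minimal sketch of coordinate-based assignment and attribute transfer follows, under stated assumptions: features are plain dictionaries with `geometry` and `attributes` keys, and two features are related when they share at least `min_shared` identical coordinate pairs. Both the data layout and the matching threshold are invented for illustration; the actual DAFU algorithms differ per geometry type and are not described in enough detail here to reproduce.

```python
from typing import Dict, FrozenSet, Iterable, List, Sequence, Tuple

Coordinate = Tuple[float, float]


def coordinate_set(geometry: Sequence[Coordinate], precision: int = 6) -> FrozenSet[Coordinate]:
    """Rounded coordinate pairs of a geometry, for direct comparison."""
    return frozenset((round(x, precision), round(y, precision)) for x, y in geometry)


def fuse(target: List[dict], source: List[dict],
         attributes: Iterable[str], min_shared: int = 2) -> List[dict]:
    """Extend each target feature with user-selected attributes of the first
    source feature that shares at least `min_shared` coordinate pairs."""
    wanted = list(attributes)
    source_index = [(coordinate_set(f["geometry"]), f) for f in source]
    fused = []
    for feature in target:
        coords = coordinate_set(feature["geometry"])
        merged: Dict[str, object] = {
            "geometry": feature["geometry"],
            "attributes": dict(feature["attributes"]),  # copy, inputs stay untouched
        }
        for other_coords, other in source_index:
            if len(coords & other_coords) >= min_shared:
                for name in wanted:
                    if name in other["attributes"]:
                        # Existing target values are kept, only gaps are filled.
                        merged["attributes"].setdefault(name, other["attributes"][name])
                break
        fused.append(merged)
    return fused
```

Only the target data set is extended, which mirrors the note that one data set is extended with the other and the user chooses which. The pairwise loop is quadratic; a spatial index would be the obvious refinement for large inputs.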
  • In the post-processing component, the output-merge-data set must be verified. The quality of the fusion process is calculated and evaluated. This is followed by manual correction of any errors, whose share is usually below five percent for line-like objects and below 10-15 percent for polygonal objects. After manual correction, the geodata can be transferred into different coordinate systems. The final step of the post-processing component converts the merged data set into other data formats. The last two steps are performed only at the user's request.
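The notes do not say how the fusion quality is calculated. As a rough illustration only, the sketch below assumes the simplest possible measure, the share of merged objects flagged for manual correction, and compares it with the typical upper bounds quoted above (5 percent for lines, 15 percent for polygons). The function names and the use of these figures as thresholds are assumptions.

```python
# Typical upper bounds quoted in the notes, used here as illustrative thresholds.
TYPICAL_UPPER_BOUND = {"line": 0.05, "polygon": 0.15}


def error_share(total_objects: int, flagged_objects: int) -> float:
    """Fraction of merged objects that were flagged for manual correction."""
    if total_objects <= 0:
        raise ValueError("total_objects must be positive")
    if not 0 <= flagged_objects <= total_objects:
        raise ValueError("flagged_objects must lie between 0 and total_objects")
    return flagged_objects / total_objects


def within_typical_range(geometry_type: str, total_objects: int, flagged_objects: int) -> bool:
    """True if the error share stays below the typical bound for this geometry type."""
    return error_share(total_objects, flagged_objects) < TYPICAL_UPPER_BOUND[geometry_type]
```

A result outside the typical range would not prove the fusion failed; in this sketch it is only a hint that the input data or the fusion rules deserve a closer look before manual correction begins.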
  • This screenshot shows the input elements of the DAFU GUI. Here information about the source, target and final data sets is entered. It is important to set the type of the data set (ATKIS, TeleAtlas, Navteq or user-defined) and the file format (SVG, SHP, WKT, or user-defined). DAFU implements different sub-routines based on these settings. It is possible to set four different debugging levels (0 to 3). Once the input data has been entered successfully in the input section, the user can begin to analyse the input datasets. This is carried out in the attribute-section. The analysis of the various attributes and the values of the source and target data sets is important in order to determine which attributes should be included in the final data set.
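The settings entered in the input section can be pictured as a small validated configuration. The sketch below takes the allowed data set types, file formats and debugging levels from the note above; the class and field names (`DatasetInput`, `RunSettings`, `output_path`) are hypothetical and do not describe DAFU's actual interface.

```python
from dataclasses import dataclass

# Allowed values as listed in the slide notes.
DATASET_TYPES = {"ATKIS", "TeleAtlas", "Navteq", "user-defined"}
FILE_FORMATS = {"SVG", "SHP", "WKT", "user-defined"}


@dataclass(frozen=True)
class DatasetInput:
    """One data set as entered in the input section (hypothetical structure)."""
    path: str
    dataset_type: str
    file_format: str

    def __post_init__(self):
        if self.dataset_type not in DATASET_TYPES:
            raise ValueError(f"unknown data set type: {self.dataset_type!r}")
        if self.file_format not in FILE_FORMATS:
            raise ValueError(f"unknown file format: {self.file_format!r}")


@dataclass(frozen=True)
class RunSettings:
    """Source, target and final data set plus the debugging level (0 to 3)."""
    source: DatasetInput
    target: DatasetInput
    output_path: str
    debug_level: int = 0

    def __post_init__(self):
        if self.debug_level not in range(4):
            raise ValueError("debug_level must be between 0 and 3")
```

Validating the type and format up front matters because, as the note says, different subroutines are selected based on these settings.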
  • Transcript

    • 1. data|fusion 1/18 A data fusion system for spatial data mining, analysis and improvement Silvija Stankute, Hartmut Asche Geoinformation Research Group Dept of Geography | University of Potsdam | Germany ICCSA 2012 | GEOG-AN-MOD 2012 | Salvador da Bahia, Brazil | 18-21/06/2012 © stankute|asche·ifg·uni·potsdam 2012
    • 2. Summary: Data fusion system for spatial data mining
        • 1. Motivation
        • 2. Concept: Automated data fusion
        • 3. System architecture: Generic components
        • 4. Fusion pipeline: Operations and workflow
        • 5. System operation: User interface
        • 6. Conclusion
    • 3. 1 Motivation: Improvement of geodata quality
        • Acquisition of geodata by a range of actors, including state institutions (NMAs) and private enterprises, resulting in heterogeneous, frequently redundant geospatial databases
        • Geometric and semantic quality of geospatial data heterogeneous, frequently insufficient or inaccurate: unreliable data quality of existing datasets for an identical real-world section
        • Effective geodata management and use necessitate harmonisation of heterogeneous geodata according to application-specific data quality specifications
        • To avoid fresh data acquisition, an automated process is required to fuse imperfect geometric and/or semantic information of 2 or more datasets to produce optimal application-specific data
    • 4. 2 Concept: Automated fusion of imperfect geodata [Figure: datasets 1 and 2 are fused into 1+2; 1+2 and dataset 3 are fused into 1+2+3]
    • 5. 2 Concept: Automated fusion of imperfect geodata
        • Development and implementation of an automated fusion process (DataFusion) to produce a single geospatial dataset from existing datasets, superior in geometric and/or semantic quality to the imperfect source data
        • Objective: to extract, filter and combine relevant features from diverse source data into a single best-fit quality dataset according to user and application specifications
        • Data harmonisation and fusion process allows for selection, elimination and/or substitution of unwanted source attribute features by user-specified geometric and/or semantic attributes
        • DataFusion or DAFU provides a user-defined data filter to generate optimal geodata in an automated filtering process
    • 6. 3 System architecture: Modular components
    • 7. 3 System architecture: Modular component system
        • Implementation of DataFusion based on a generic, modular component architecture and an object-oriented, procedural cross-platform programming language (Perl)
        • Presently DataFusion consists of 3 components, sequentially linked in a fusion pipeline
        • Preprocessing component: preprocessing modules for Teleatlas, Navteq and ATKIS input data, at present
        • Filtering/fusion component: merge of 2 or more different input datasets into a single optimal dataset
        • Validation component: quality assessment of the merged dataset according to user or application specifications
    • 8. 4 Fusion pipeline: Preprocessing of source data [Flowchart: source data → conversion to uniform coordinate system → conversion to uniform data format → quality measures (analysis for uniqueness, completeness and topological errors) → geometric correction → preprocessed input data]
    • 9. 4 Fusion pipeline: Preprocessing of source data. The preprocessing component executes the following operations on heterogeneous geospatial source data:
        • Objective: Quality assessment of the input vector data model underlying each source dataset
        • Operations: Selection of source data; integration of source data by conversion to a unified coordinate system; transformation into a common data format; source data assessment for uniqueness and completeness; quality assessment and adjustment of topological correctness and thematic completeness
        • Result: Preprocessed input datasets used as input data for the subsequent fusion/filtering component
    • 10. 4 Fusion pipeline: Fusion of preprocessed data [Flowchart: preprocessed input data → detection of relations among input data → assignment of related objects → transfer of geometric information and transfer of thematic information → merged output data]
    • 11. 4 Fusion pipeline: Fusion of preprocessed data. The data filtering/fusion component executes the following operations on preprocessed geospatial input data:
        • Objective: Generation of a single optimal dataset by transmission and augmentation of attribute features from n input datasets
        • Operations: Iterative comparison of geometric features (coordinates) of vector input datasets; determination of relationships between data features and real-world objects; generation of non-redundant fusion data (1 semantic feature assigned to 1 geometric feature only, and vice versa); transfer (cross-referencing) and extension of specified attributes to the target dataset
        • Result: Merged dataset used as input data for the subsequent validation component
    • 12. 4 Fusion pipeline: Validation of merged data [Flowchart: merged output data → validation of fusion quality → interactive error correction → coordinate system transformation → data format conversion → specified DAFU data]
    • 13. 4 Fusion pipeline: Validation of merged data. The validation component executes the following operations on the single merged geospatial dataset:
        • Objective: Quality verification of the fusion process
        • Operations: Calculation and evaluation of data fusion quality; if required and/or specified: interactive correction of errors of source data (< 5 percent for linear objects, < 10-15 percent for polygonal objects); transfer of merged geodata to specified coordinate systems; conversion of the merged dataset into specified data formats (SVG, CSV, SHP, etc.)
        • End result: Application and/or user-specified optimal geospatial dataset
    • 14. 5 System operation: User interface
        • Front-end of the DataFusion system allows for 2 operation modes: graphical user interface (GUI) or command-line interface
        • Command-line operation for integration into remote systems, such as servers, clusters, etc., by GI experts
        • GUI operation is the standard operation mode for application-oriented GI users
        • GUI composed of 8 widgets covering the core functions of DAFU; widgets communicate via data exchange and signal exchange (bindings)
        • Additional flexible support system provides the user with relevant information on operation and understanding of DAFU
    • 15. 5 System operation: User interface > GUI [Figure: Fig. 5-3 of the dissertation]
    • 16. 6 Conclusion: Data fusion: what's the benefit?
    • 17. 6 Conclusion: Data fusion: what's the benefit?
        • The DataFusion system presents an innovative approach to geospatial data mining by harmonising and improving the geometric and semantic quality of digital vector data
        • DAFU demonstrates that a single optimal geospatial dataset can be generated from existing suboptimal datasets, making repeated data acquisition unnecessary
        • DAFU facilitates cost-effective geospatial data management by multiple re-use of existing datasets customised to individual user and/or application requirements
        • DAFU contributes to reducing heterogeneity and redundancy of geospatial data in geodatabases, at the same time increasing efficient, meaningful use of geographically related mass data
    • 18. Thank you for your attention. Questions? Comments? Feedback? Contact: Hartmut Asche | Dept of Geography | University of Potsdam | GER Web ICCSA 2012 | GEOG-AN-MOD 2012 | Salvador da Bahia, Brazil | 18-21/06/2012