WORKSHOP ON SPATIAL DATA USABILITY
CENTRE FOR GEOINFORMATION
UNDERSTANDING SPATIAL DATA USABILITY
Gary J. Hunter
Department of Geomatics, University of Melbourne, Victoria 3010, Australia
In recent scientific literature, a number of researchers have made mention of a seemingly new
characteristic of spatial data known as ‘usability’. This apparent property is also receiving
mention in the data mining and knowledge discovery literature, so it would seem to be something
that is not wholly restricted to the spatial domain and has much broader impact as well. While
concepts such as the use and value, and diffusion of spatial information have been the subject of
research since the late-1980s, the current references to usability clearly represent something that
is both novel and different. Accordingly, the purposes of this paper are initially to understand
what is meant by usability and to assess whether it is a significant concept worthy of more
detailed scientific pursuit. If this is so, then the secondary aims of the paper are to identify the
elements that comprise usability, and to consider what the related research questions might be
and how an appropriate research agenda should be shaped.
Keywords: Spatial data, usability, research agenda.
In the public announcement for this workshop on spatial data usability, mention was made of the
well-known example that occurred almost 150 years ago when a London doctor, John Snow,
combined spatial data relating to the locations of cholera deaths with the positions of water
pumps in that city, to test his theory about the source and transmission of an outbreak of that
deadly disease that killed 600 people in its first ten days (UCLA, 2001). That famous example is
now taught to students worldwide in fields such as geography and epidemiology, and serves as a
perfect example of how spatial data can be very effectively applied in critical situations.
Similarly, there have been more modern applications of spatial data which, although they do not
have the same impact as Dr. Snow's work, have nevertheless proven to be extremely valuable.
For instance, exploratory data analysis techniques have been used to locate previously
unidentified cancer clusters (Openshaw et al., 1987), while the value-added linkage of electronic
telephone directories with street mapping products is causing people to replace their hardcopy
“YellowPages” telephone directories with enhanced digital “YellowMap” alternatives.
On the other hand, a bold initiative of the 1970s to provide on-line interactive color maps of
statistical data as part of the U.S. White House Information System (the Domestic Information
Display System, DIDS) was completely abandoned within the space of a few short years
(Cowen, 1982). Then, more recently there have been cases reported in the past two or three
years where a lack of faith in the reliability of outputs from environmental models that employ
spatial data has caused governments to abandon major projects—in essence because they are
unwilling to proceed with their decision-making because of major concerns about the
trustworthiness of the scientific information presented to them (Beven, 2000). Of course, there
are many thousands of applications of spatial data that fall between these two extremes, but it is
these particular cases at the outer limits that are both exciting and distressing, and are therefore
deserving of closer scrutiny.
So there seems to be a common link between these examples, involving some fundamental
characteristic that has resulted in these spatial data applications being either very successful or
unsuccessful. It would appear the cases all demonstrate either a very high or very low degree of
data ‘usefulness’ or ‘usability’, which in turn produces very positive or very negative economic,
social, environmental or scientific impacts. Our interest here lies in knowing exactly what it is that
distinguishes these cases from other, perhaps more mundane, examples. For instance, is low
usability caused by a poor choice of data, models and algorithms for the given application, or is it
simply a matter of bad data quality? Alternatively, is high usability proportional to the degree of
‘interestingness’ or ‘unexpectedness’ in the data (as data miners would say), or the result of data
integration and value-adding? Or are these differences caused simply by some unpredictable,
indescribable phenomenon that produces such extreme examples? At this stage we do not know,
but given the very large expenditure of resources nowadays on the development of spatial
information products, it would seem to be a goal worth pursuing to ensure that all spatial data is
as ‘usable’ as possible. Clearly, with a better understanding of data usability we might be able to
increase the number of ‘successes’ and reduce the ‘failures’ in the application of spatial data.
Accordingly, this paper seeks to provide a better understanding of spatial data usability.
Following this introduction, it examines the meaning of the concept within a technological
setting in both non-spatial and spatial contexts, and then investigates what fundamental elements
comprise usability. Finally, the paper focuses upon what the research questions associated with
usability might be, and what priorities a future research agenda in this subject should adopt.
Usability – a scientific or a procedural issue for
Geographical Information Science?
Peter A. Burrough, Utrecht University.
The organisers of this meeting have set up five questions on the theme of “usability”. The aim of
this talk will be to discuss each of these questions in order to determine the usefulness to GI-practitioners of pursuing this topic and attempting to arrive at formal, reproducible and useful definitions.
1. What do we mean by 'usability'?
Collins Eng Dict (mid 90s) says: usable: able to be used. It gives as nouns: usability,
useability and usableness. This is different from useful: able to be used advantageously,
beneficially, or for several purposes; helpful, serviceable: Noun: usefulness
Therefore usability is the degree to which an object or unit of information serves a defined
purpose. This could be indicated on a binary scale or on a gradual scale (degrees of
usefulness or usability).
Usability depends on context – something could be usable for one purpose and useless for
Usability may also depend on availability, formatting, coding, language, source, provenance,
history, etc; many of these aspects may prevent a potentially useful object/piece of
information from being used/useful.
Usability is different from quality/accuracy. These may determine usability, but may also be
irrelevant if the context is wrong. For example, you may buy a new tyre for the car. The tyre
is of high quality, but will not be usable if it does not fit the wheels.
2. Why is usability important?
The degree of usability provides an indication to the user of the degree to which an
object/unit of information will serve its intended purpose.
3. What are the characteristics of spatial data usability?
What are the characteristics of spatial data use?
What do users need?
How long is a piece of string?
4. What are the research problems to be solved in spatial data usability?
Is this a scientific or a practical question?
Can you derive usability from metadata?
Can you define usability in metadata?
How do you set up a generic means for determining usability for a wide (unlimited) range of applications?
5. What should the research priorities be?
Is there sufficient science in this to make research worthwhile?
How much is common sense?
All these points will be discussed, mainly in the context of environmental modelling.
Services for Data Integration
for the Workshop on Spatial Data Usability
November 19-20, 2001, Center for Geo-Information, Wageningen UR, The Netherlands
Catharina Riedemann & Werner Kuhn
Institute for Geoinformatics
University of Münster
Tel.: +49 (0)251 83-31963/34707
Data users are interested in the information necessary to solve problems and make decisions.
Data handling and processing are not their goal. Therefore, we call data usable when they easily reveal
their inherent information without demanding technical expertise. This is usually achieved by wrapping data
into services, which give direct access to the relevant information.
A major problem with the exploitation of geospatial data is that the needed information can
often only be obtained by combining various sources (e. g. finding locations for industrial plants
requires topographic, infrastructure, environmental, and demographic data). Today this integration
of existing data is mostly achieved by creating new datasets, but, among other problems, these derived
datasets are difficult to update when the original data change. We seek a way to integrate data just-in-time by
wrapping them in suitable services which immediately give the answer to a user’s question instead of
producing a new dataset. This will eliminate the update problem, because at any time the most current
data are accessed.
We envision an environment with data and service providers, where services can be coupled
with data on demand to form a “wrapped object” exposing the desired information. The platform is the
Internet. Promising technologies are available and under development that help to build the necessary
infrastructure, among them the Web Services Description Language (WSDL), Universal Description,
Discovery and Integration (UDDI), and the Simple Object Access Protocol (SOAP). The challenge is
to ensure that only permissible operations are performed on the data. This will need research
concerning the exposure and evaluation of data and operation semantics.
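The "wrapped object" idea (data coupled on demand with a service that exposes only permissible operations and always reads the current data) can be illustrated abstractly. Everything below, including the class, the parcel dataset and the operation name, is invented for illustration; it is not part of WSDL, UDDI or SOAP, which would provide the description, discovery and messaging layers around such a service:

```python
from dataclasses import dataclass, field


@dataclass
class WrappedObject:
    """Couples a dataset with the operations its provider permits,
    so users receive answers rather than raw data copies."""
    data: dict
    operations: dict = field(default_factory=dict)

    def ask(self, question, *args):
        if question not in self.operations:
            raise PermissionError(f"operation not permitted: {question}")
        # Data are accessed only through a permitted operation, and the
        # current state is read each time (no derived dataset to update).
        return self.operations[question](self.data, *args)


# Invented example: parcel data wrapped with a single permitted query.
parcels = {"A1": {"zoning": "industrial"}, "B2": {"zoning": "residential"}}
service = WrappedObject(
    data=parcels,
    operations={"zoning_of": lambda d, pid: d[pid]["zoning"]},
)
print(service.ask("zoning_of", "A1"))  # industrial
```

Because the answer is computed against the live data at call time, the update problem described above disappears: no derived dataset exists to fall out of date.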
We examine case studies to gain crisp problem statements and test research results. In the end,
we are interested in how the notion, the handling, and the results of data integration change and
influence the future use of existing datasets.
Shared Earth System Models for the Dutch Subsurface
J.D. VAN WEES, R. VERSSEPUT, H.J. SIMMELINK, R. ALLARD, H. PAGNIER
Netherlands Institute of Applied Geoscience TNO- National Geological Survey,
P.O. Box 6012, 2600JA Delft, The Netherlands
One of TNO-NITG's primary missions is to map the Deep Subsurface of the Netherlands, down
to a depth of approximately 5 000 m, at a scale of 1:250 000. Over the years, the institute has compiled
a wealth of such mapping data and stored it in digital form. The horizons mapped with these data were
used to construct a volumetrically consistent stratigraphic model: the three-dimensional atlas of the
deep subsurface of the Netherlands. These data and parameters (which include borehole locations,
seismic interpretations, horizons, fault lines and sub-crop lines) are now available for digital dissemination.
As part of the work done on the atlas, TNO-NITG developed a dissemination and visualisation
system for a broader public. This system will be available to all interested parties, free of charge,
starting in the spring of 2001. All they need is a computer with an Internet connection to start the
HTML navigator page designed by TNO-NITG. Users can select data on any given part of the
Netherlands; the system will then copy all of the relevant data from the database into a zip file and
send it to them by e-mail.
TNO-NITG also designed a three-dimensional viewer in Java 3D, a program that runs on any
computer, so that these data could be visualised. Users can adjust it to their computer’s speed. The
viewer is available free of charge for interactively viewing the digital atlas of the Dutch subsurface.
The atlas can and will be further expanded to include fault structures and property models, among
other things, in three dimensions. Moreover, the dissemination system and viewer provide a generic
means of integrating data from other disciplines and sources.
The digital atlas provides a great deal of added value compared to standard, static information
media such as maps and profiles. For instance, the system architecture makes it possible to distribute
customised data and keep these constantly updated in response to the needs of society and the market.
Another benefit is that users can interactively determine what part of the subsurface they want to see
depicted and on what scale. Interested parties can also choose to look at a two-dimensional cross-
section of any given area of the subsurface. They can peel off, as it were, layers from the volumetric
model to get a clearer picture of the interconnection between the layers.
For all these reasons, spatial underground models that integrate knowledge and their
visualisation will become increasingly important for obtaining greater insight into the complex
geological structure of the subsurface. And that insight is crucial for the spatial planning of the deep
subsurface, especially now that more intensive uses for it, such as gas and CO2 storage and geothermal
energy generation, are being discussed. Seen in that light, this recently developed dissemination and
visualisation system is a valuable policy instrument.
Analysing uncertainty propagation in GIS: why is it not that simple?
Gerard B.M. Heuvelink
Institute for Biodiversity and Ecosystem Dynamics
Universiteit van Amsterdam
Nieuwe Achtergracht 166
1018 WV Amsterdam
Attention to spatial accuracy assessment and uncertainty propagation in GIS has been with us
ever since the introduction of Geographical Information Systems in the 1980s. Over the past 20 years,
an impressive number of scientific articles have addressed the issue of uncertainty and error in relation
to GIS. But have we made much progress? Speaking of uncertainty propagation in GIS, most of us
would agree that the answer to this question should be a 'yes'. We now have a number of techniques that
enable us to track how error propagates in GIS operations. Arguably the most appealing technique –
because it is intuitively clear, easily implemented and generally applicable – is Monte Carlo
simulation. Given uncertain input to a GIS operation, the idea behind this method is to draw a
realisation from the input probability distribution, submit it to the operation, compute and store the
result, and to repeat this procedure many times, so that the collection of outputs approximates the true
output probability distribution. Indeed, the Monte Carlo method is a straightforward and effective
technique. The method is computationally demanding, but nowadays this can hardly be considered a
serious drawback. In other words, have we not solved the problem of uncertainty propagation in GIS?
Alas, we have not. There still are a number of fundamental problems to be resolved before we may
expect to see standard GIS equipped with a universal ‘error propagation button’. In this presentation I
discuss some of these problems. Not necessarily the most important ones, but rather those that I find
the most interesting. These problems have to do with difficulties in the assessment of input error,
difficulties in representing uncertain spatial attributes in conventional GIS databases, the ever-present
mistiness about what uncertainty really is and problems concerning the scale or support of the model
entities. Although the presentation is likely to generate more questions than it will provide answers,
some suggestions for tackling the problems raised will be proposed. Will we see a GIS that is
completely tuned to uncertain spatial attributes in the year 2010? Given the many fundamental
problems to be resolved, it seems to me that this very much depends on our ability to launch a large-
scale concerted initiative in this direction.
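The Monte Carlo procedure described above (draw a realisation from the input probability distribution, submit it to the operation, store the result, repeat many times) can be sketched in a few lines. The GIS operation below, a slope computed from two uncertain elevations, and the Gaussian error model are invented for illustration; any re-runnable operation can take their place:

```python
import random
import statistics


def gis_operation(elev_a, elev_b, distance=100.0):
    """Stand-in GIS operation: slope between two elevation points."""
    return (elev_b - elev_a) / distance


def monte_carlo(operation, draw_inputs, n_runs=10_000):
    """Draw a realisation of the uncertain inputs, apply the operation,
    and collect the outputs; their spread approximates the true output
    probability distribution."""
    return [operation(*draw_inputs()) for _ in range(n_runs)]


# Hypothetical uncertain inputs: two elevations, each with a
# 0.5 m Gaussian measurement error.
draw = lambda: (random.gauss(120.0, 0.5), random.gauss(125.0, 0.5))

outputs = monte_carlo(gis_operation, draw)
print(statistics.mean(outputs), statistics.stdev(outputs))
```

The standard deviation of the outputs is the propagated uncertainty; as the abstract notes, the method is computationally demanding only in the trivial sense that the loop runs many times.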
Why do we use geographic data so ineffectively?
Department of Geography and Human Environment, University Tel-Aviv
Huge geographic databases at different resolutions have been constructed during the last
decade and continue to grow rapidly in extent, resolution, and layers of information.
Surprisingly, the usage of geographic data lags far behind this progress. Planning, decision-
making, marketing, and locating procedures are still based on aggregate information, and
geographic data at fine resolution remain unused.
The poor use of data is characteristic not only of politicians and managers, but also of
researchers. For example, the exceptional GIS of the Israeli population census of 1995, which contains
multiple data (including salaried income!) on all households in the country and is geo-referenced at
the level of individual buildings, has been explored by only a handful of researchers. The public use of geo-data
is not much better. As analysis of Internet GIS services demonstrates, existing location and
way-finding engines are used far below their capacity.
If so, the problem might lie not in the data and methods, but in human cognition of geo-
information. Initial analysis suggests that the problem lies in the artificial character
of the basic GIS navigation and presentation operations, such as zoom, pan, and thematic mapping,
which do not allow viewing of geo-data at a resolution that varies with the distance to objects
and thus do not fit human perception of space.
The current presentation and navigation tools can be modified. Several approaches that fit
human cognition and are based on 3D visualization can be proposed. Three-dimensional visualization
enables human-friendly presentation of information, including varying resolution within one map
window and continuous navigation. I review recent advances in these 3D approaches and propose
ways to make them part of the standard GIS GUI.
Is interestingness an indicator of data usability?
Centre for Geo-Information
Data mining consists of a variety of techniques and tools used for the discovery of previously
unknown, valid, potentially useful, and understandable patterns in very large databases. These
techniques differ in the types of data they can mine and the kinds of knowledge representation they
use to convey the discovered patterns. Text, multimedia and geographic data are some examples of the
forms of data being used for mining patterns. These patterns can in turn be represented as
classification rules, association rules, clusters, sequential patterns, time series, contingency tables and
concept hierarchies. However, the number of patterns being generated is often very large, and only a
few of these patterns are likely to be of interest to the user or an organisation. To address this problem,
researchers have been working on defining various measures of pattern 'interestingness'.
One approach to determining interestingness is to define it in objective terms, where
interestingness is measured in terms of its structure and the type of data used in the mining task. The
measures used to achieve this capture the statistical strength of a pattern, such as the amounts of
'confidence' and 'support' for a rule. An alternative approach is to define subjective measures of
interestingness that do not depend solely on the statistical strength of a pattern, but also on the opinion
of the user who examines the pattern. Subjective measures are based upon user beliefs or biases
regarding relationships in the data. Some examples of subjective interestingness measures are
'unexpectedness' (a pattern is interesting if it is surprising to a user) and 'actionability' (a pattern is
interesting if the user can act on it to his/her advantage).
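The two objective measures named here, support and confidence, have direct definitions over transaction data. A minimal sketch, with an invented market-basket example:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions matching the antecedent, the fraction that
    also match the consequent: support(A and B) / support(A)."""
    return (support(transactions, set(antecedent) | set(consequent))
            / support(transactions, antecedent))


# Invented example: items bought together in four transactions.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
print(support(baskets, {"bread"}))               # 0.75
print(confidence(baskets, {"bread"}, {"milk"}))  # 2/3
```

A rule such as "bread implies milk" would be reported only when both values exceed user-chosen thresholds, which is how these measures prune the very large number of candidate patterns.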
Interestingness measures have already proven to be successful in reducing the number of rules
that need to be considered in data mining tasks. As such, this paper proposes another potential
application of interestingness to determine 'usability' (the state or quality of being useful) and utility
('the degree of usefulness') of very large data sets. Ultimately, different interestingness measures and
their relationships could provide users with knowledge about patterns in the data, as opposed to
knowledge about the data itself (for example, metadata).
The paper will address the following questions:
- Can interestingness measures be used for determining the usability and utility of very
large data sets?
- Once we have determined an interestingness measure of a pattern of a particular data set,
how can we search for patterns in other similar data sets using the same measure? Will
this approach be a way to determine the usefulness and utility of similar data sets?
- When data sets change over time, patterns also keep changing with the data. How will an
update to a data set affect the definition of interestingness measures? How can changes
be incorporated into interestingness measures? Would it be possible to make use of
the knowledge about emerging, sequential, and constraint-based frequent patterns for
defining usefulness and utility over time?
Data in light of the GNOSTICAL THEORY OF SPATIAL UNCERTAIN DATA
Dr. Karel SEVCIK, Australia, email@example.com
This paper is focused on quantitative data. Any individual datum is a product of defined
processes called quantification. Quantification is studied by the theory of measurement and in practice
usually applied as measurement or counting. The quantification process defines not just the value
of an individual datum, but also its mathematical structure. The structure precisely determines the datum's
numeric, geometric and algebraic properties. The properties of sets of such data then follow, and the
impossibility of composing data of different structures is a matter of course. This approach to an
individual datum (accepted in relativistic paradigm: e.g. special relativity, theory of measurement,
gnostical theory of spatial uncertain data, etc.) is in sharp contrast with today still much predominant
statistical model of random selection of an individual datum from some basket of theoretical
distribution (paradigm of Newtonian mechanics, Euclidean geometry and statistics).
Each measured quantity has its ideal value (e.g. concentration of gold in a sample). However,
no measurement is absolutely precise. Therefore, repeated measurements produce different results.
This difference is caused by the influence of uncertainty. Because these two components (ideal value
and uncertainty) have the same group structure, they project into the same numerical result – an
individual uncertain datum.
Quantification process producing data can be described in 2D plane, where one axis represents the
ideal value and the second axis the uncertainty.
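The quantification picture described here, an ideal value and an uncertainty component projecting into one numerical result, can be illustrated with a toy simulation. The gold-concentration value and the Gaussian noise model below are invented assumptions for illustration, not part of the theory:

```python
import random
import statistics

IDEAL_VALUE = 2.5  # hypothetical true gold concentration, g/t


def measure(noise_sd=0.1):
    """One quantification: the ideal value and an uncertainty component
    project into a single uncertain datum."""
    return IDEAL_VALUE + random.gauss(0.0, noise_sd)


# Repeated measurements of the same quantity differ, as the text notes,
# because of the uncertainty component; their mean approaches the ideal.
readings = [measure() for _ in range(1000)]
print(statistics.mean(readings))
```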
Analysis of the data structure, and mainly of the physical interpretation of its geometric
properties, results in a new generation of information theory. It describes the quantification process in detail.
Its fundamental discovery lies in the definition, properties and interpretations of an estimation process as a
counterpart of quantification. This complete information theory by P. Kovanic is called the Gnostical
theory of uncertain data (see references). The GTUD offers very powerful, easy programmable, robust
and universal tools for processing of any sets of quantitative data (even small and highly disturbed).
Data are treated regardless of shape of their distribution, because the only condition for validity of
estimates is defined structure of processed data.
Space has its structure well known from geometry and physics. Spatial / temporal structures
are identical with data structures and structures of quantification and estimation processes. Existence
of the Gnostical Theory of Spatial Uncertain Data is just logical consequence.
The above mentioned paradigm puts the first proposed question of data usability in a new
light. Each individual datum carries information on property, for which its measurement method was
designed. The quality of measurement determines the damage to the data's information caused by various
amounts of uncertainty. However, the influence of uncertainty is partially compensated by the estimation
process (compensation of uncertainty is always below 100%, as follows from the Law of thermodynamics).
If the quantification process is in order, then, consequently, the data are in order. There is no question of
unsuitable data (except for obvious errors not related to information theory – GIGO, 'garbage in, garbage out').
The above turns the question of data usability into a question of the suitability of processing methods. The
reason is obvious. If a processing method does not respect the data structure or fundamental natural laws (or
both), its results are more or less false. The development of the relativistic paradigm and related individual
mathematical, physical and information theories has pinpointed many mistakes of previous paradigms.
The most common mistakes are e.g.:
(1) random distribution of quantitative data (data are individual products of quantification);
(2) composition of squares of error as a measure of uncertainty of a set (uncertainty is part of a datum
and its measure is cos or cosh function according to geometry of given process, i.e. like power
factor in physics);
(3) composition of squares of data errors into semivariogram (spatial weights depend on location of
individual estimate points; information and spatial properties of data are strictly separated);
(4) Estimation of "in-average-all-sample" minimum variability (real errors might be much smaller,
local disturbances "make" wide results doubtful).
(5) Negative weights of kriging contradict the Law of thermodynamics (weight relates to entropy).
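For reference, the classical construction criticised in point (3), the empirical semivariogram built by composing squared differences of data values, looks as follows in a minimal 1-D sketch. The transect coordinates, values and lag tolerance are invented for illustration:

```python
from itertools import combinations


def empirical_semivariogram(points, values, lag, tol=0.5):
    """Classical estimator: half the mean squared difference of values
    at pairs of points whose separation is roughly `lag`."""
    diffs = [(v1 - v2) ** 2
             for (p1, v1), (p2, v2) in combinations(zip(points, values), 2)
             if abs(abs(p1 - p2) - lag) <= tol]
    return sum(diffs) / (2 * len(diffs))


# Invented 1-D samples along a transect.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
vs = [1.0, 1.2, 0.9, 1.4, 1.1]
print(empirical_semivariogram(xs, vs, lag=1.0))  # 0.05875
```

This is exactly the "composition of squares of data errors" that the author argues mixes the information and spatial properties of the data, which the gnostical approach keeps strictly separate.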
Spatial data are usable if the data information structure and the structure of the sampling space are
known and quantitative. However, such data still need not satisfy the purpose of the investigation for which
they are used. Although GTSUD extracts the maximum of the information contained in data, it cannot violate
the Law of thermodynamics and "create" information. Other questions include sampling density and
pattern, and the quality of data values. However, the density and pattern of sampling depend on the purpose of
the data and the scale of the required result (definition from a semivariogram is in this view irrelevant). The quality of
data relates to the applied analytical methods and belongs to other professions rather than to information
theory. Data quality can always be improved by better sampling and/or analytical design, which does not
always mean higher cost.
Present practical processing of spatial data is troubled much more by unrealistic expectations
and trivial mistakes in data collection, analysis and processing than by serious theoretical problems.
The insufficiency of statistical approaches for many tasks need not be discussed. A more advanced paradigm
is also known (GTUD has been published for almost 20 years; special relativity and Riemannian geometry
for about a century).
Research priorities are thus set by the common scientific paradigm.
Kovanic P. (1984): Gnostical Theory of Individual Data. Problems of Control and Information Theory,
Vol. 13(4), pp. 259-274.
Kovanic P. (1984): Gnostical Theory of Small Samples of Real Data. Problems of Control and
Information Theory, Vol. 13(5), pp. 303-319.
Kovanic P. (1984): On Relations between Information and Physics. Problems of Control and
Information Theory, Vol. 13(6), pp. 383-399.
Kovanic P. (1986): A New Theoretical and Algorithmical Basis for Estimation, Identification and
Control. Automatica, Vol. 22, No. 6, pp. 657-674.
SUMMARY OF GNOSTICAL THEORY OF SPATIAL UNCERTAIN DATA
Dr. Karel SEVCIK, Australia, firstname.lastname@example.org
Knowledge of the spatial properties of studied variables is fundamental for many scientific fields,
from geology to, for example, engineering, information technology, automation and economics, and
particularly for artificial intelligence and artificial sensing (vision).
Although the Theory of Regionalized Variables (geostatistics) represents a significant attempt,
until now no modern scientific approach to these estimation problems has existed. The Gnostical Theory
of Spatial Uncertain Data (GTSUD, or perhaps "geognostics"), the principles of which are summarized
in this abstract, constitutes a new generation of approaches to quantitative spatial data. Some of the
main results given by GTSUD are shown and critically compared with classical methods of geostatistics.
GTSUD grows from the mathematical properties of space and numbers. Each individual datum
carries complete information and is considered a unique individual object. A spatial datum is composed of
two separate parts: its uncertain value and its spatial location. Each part must have the structure of a
quantitative numerical group. The kind of structure of the group completely determines the data model and
the space model. Consequently, GTSUD is applicable to any quantitative data (measured or counted).
There is no assumption on the data distribution or on spatial properties like stationarity or homogeneity.
A natural consequence of the existence of information uncertainty (i.e. a difference between the
uncertain information value of a datum (e.g. a measured ore concentration) and its ideal value) is a pair
of information characteristics: information weight and information irrelevance. Because space (or
time) is also a quantitative variable, the existence of spatial uncertainty (the difference between the location of a
datum (sample) and the location of an estimate) naturally results in a pair of spatial
characteristics: spatial weight and spatial irrelevance. Each individual spatial uncertain datum
possesses these four characteristics regardless of its other properties (e.g. geostatistical). There is no
relationship between the information and spatial characteristics (e.g. no need for stationarity, homogeneity
or a model distribution).
The squares of weight and irrelevance have direct physical interpretations in the growth of entropy
and the loss of information. Entropy and information form two mutually compensating fields. The
mentioned functions result in the definition of two distribution functions of an individual spatial
uncertain datum, one for the information structure and a second for space. The interpretation of all the above-
mentioned functions is completely isomorphic with the interpretations of the corresponding characteristics in
the Special Theory of Relativity, and significant correspondence with quantum mechanics has also been
shown in the literature.
The proven additive composition of information weight and information irrelevance results in two
kinds of distribution functions: the global distribution function (GDF) and the local distribution function
(LDF). Although spatial weight and spatial irrelevance are also additive, they are not used to estimate a
"spatial distribution"; instead, they serve to optimize the distribution estimates of the observed
variable at the point of an estimate.
The global distribution function is very robust and describes data as one (homogeneous) cluster. It
has no general statistical counterpart. The field of GDF estimates over the studied space is always unimodal,
but need not be continuous and may partially not exist at all. If the data are not homogeneous in their
values in some area, this estimate simply does not exist. The practical consequences are: (1) protection of
the estimate against the influence of inhomogeneity (e.g. the nugget effect) and the consequent extreme
robustness; and (2) detection of spatial discontinuities in values, e.g. faults or different
geochemical units. There is no need for any test of the existence of the GDF, because it simply does
not exist if at least one point of its derivative (the data density) is negative (the general probabilistic
definition of a distribution function).
The local distribution function is infinitely flexible and thus can describe multimodal data. Its
statistical counterpart can be found in Parzen's kernels. The practical consequences are: (1) separation of
different objects, e.g. one map for the main concentration field and separate maps of nuggets,
pollution, or leached zones in a single estimate; and (2) detection of spatial discontinuities in values,
e.g. faults or different geochemical units. There is also no need for testing, because this estimate always exists.
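The statistical counterpart mentioned for the LDF, Parzen's kernels (kernel density estimation), can be sketched as follows. The bimodal sample data and the bandwidth are invented for illustration; like the LDF, the resulting estimate can describe multimodal data and exists for any sample:

```python
import math


def parzen_density(x, samples, bandwidth=0.5):
    """Parzen-window (kernel) density estimate with Gaussian kernels:
    each datum contributes a small bump, and the sum of bumps can be
    multimodal if the data fall into several clusters."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
                      for s in samples)


# Invented bimodal data: two clusters of measurements.
data = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
# Density is high near each cluster centre and low between them.
print(parzen_density(1.0, data), parzen_density(3.0, data))
```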
The quality of estimates is measured by the growth of entropy and the loss of information, which
guarantees the best possible results. GTSUD extracts the maximum information from data, but cannot "make"
more information than the information contained in the data.
GTSUD yields simple, universal and strictly logical algorithms that are easy to program. Such programs are applicable to any data without special knowledge or human intervention, unlike the "art of geostatistics". The properties of GTSUD protect applications from producing mistaken results: for example, if the data are inhomogeneous in value at some point, returning no result at that point is preferred to returning a wrong global estimate, while the local estimate always exists but may have more than one value.
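The statistical counterpart mentioned above can be made concrete. The following is a minimal sketch of a Parzen kernel density estimate on one-dimensional data; it is illustrative only, not the GTSUD algorithm itself, which the abstract does not specify. With bimodal data the local (kernel) estimate resolves both modes, which a single unimodal global description of the same data could not represent.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def parzen_density(x, samples, bandwidth):
    """Parzen (kernel) density estimate at x from 1-D samples."""
    n = len(samples)
    return sum(gaussian_kernel((x - s) / bandwidth) for s in samples) / (n * bandwidth)

# Two well-separated clusters of values: the local estimate shows two modes
# separated by a near-zero gap, i.e. multimodal structure is preserved.
data = [1.0, 1.1, 0.9, 1.05, 5.0, 5.2, 4.8, 5.1]
d_low, d_mid, d_high = (parzen_density(x, data, 0.3) for x in (1.0, 3.0, 5.0))
print(d_low > d_mid and d_high > d_mid)  # True: two modes, near-zero between them
```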
Opportunities and limitations of sharing spatial data
Institute of Geodesy and Cartography
Nowadays the availability of digital spatial data is growing rapidly, along with the need to use it in all kinds of GIS applications and to support decision-making processes. Developments in communication technology make it possible to collect datasets from a variety of sources and application types. Data providers make data available to users via the Internet. There appear to be many databases, datasets and other kinds of geographic information, such as satellite imagery, aerial photographs and maps, in both digital and analogue form. It is also becoming possible for every user to share existing spatial data rather than collect them from the very beginning. Sharing data requires, first of all, broad information about the scope of the data and where they are stored, and furthermore translation from the original source into the user's system and adaptation to specific GIS applications. Understanding the quality of the data, so that they can be applied consciously, is of the utmost importance; its absence limits both data sharing and data usability.
GIS allows data of different origin and accuracy to be mixed, which leads to the assumption that the objects in GIS databases are of varying quality. Digital systems can process data more precisely than analogue ones, but the final accuracy still depends on the quality of the original input data and on the precision with which the input data are processed.
Data usability means the effectiveness, efficiency and satisfaction with which users can achieve their goals using the data in a particular environment. Because user requirements are very diverse, usability is relative and can only be determined in the context of use. Users always define usability for particular applications that fulfil the goal of a GI system. For most users, usability means that the data must be in a form that can be handled with the tools they possess. Users need to know what data are in the database and what their characteristics are.
Since the same data should be shared by different applications, one of the most important questions is how the data should be stored, maintained and made available to users in order to be usable.
Three conditions are important and necessary to enable high data usability:
1) Methodological – a methodology for building the GIS application (conceptual, logical and physical design, object definitions, data dictionary), metadata, standards for data recording, storage and transfer, available and clear documentation for further development, a reference system and reference data, and data integration and harmonisation.
2) Organizational – rules for sharing data and agreements between the actors of spatial data sharing.
3) Technical – computer communication technology and openness in software and hardware.
Requests for spatial data are very high in both the public and private sectors in Poland. The most useful data are those collected in the National Land Information System: the land and building register, administrative unit boundaries, and cadastral and utilities maps. These data have the best usability because they are always stored in a similar way, the information content is always the same, and the responsibility, quality and all other necessary information about the data are well known. The LIS databases can easily be incorporated into and integrated with other systems.
Selected topographic data (transport networks, hydrography, settlements), DTMs and land-use data are also of interest. Owing to the lack of common access to these data in digital form, they have rather poor usability. The data are collected and maintained mostly by private companies, are seldom updated and are poorly documented. A metadata system for geographic data, to all intents and purposes, does not exist. Sometimes data providers have data descriptions in written or digital form; sometimes they exist only in people's minds. The data are difficult to access and rather expensive.
Other limitations on sharing data among GIS applications in Poland are as follows:
− inadequate technical infrastructure;
− too many coordinate systems, defined in zones, which prevent the creation of seamless databases;
− a lack of accepted standards for data recording, storage and transfer;
− unclear rules for sharing data.
Only recently have increased efforts been put into making data more usable. At the central level of administration, some essential decisions have been made concerning the NSDI project, the topographic database, the creation of a database of general data, and standardization. At the regional level, GIS systems are being built with topographic data as the reference data. Although the GIS environments differ, the regional databases are harmonized at the conceptual level.
Usability issues in information visualization applications
Carla M. D. S. Freitas 1 , Marco A. Winckler 2 , Paulo R.G. Luzzardi 1 , Luciana P. Nedel 1
and Marcelo S. Pimenta 1
1 Universidade Federal do Rio Grande do Sul, Instituto de Informática
Caixa Postal 15064 91501-970 Porto Alegre, RS, Brazil
2 LIHS Université Toulouse 1 Place Anatole France
31042 Toulouse France
In the last few years, the increasing volume of information provided by various applications, different instruments and, above all, the Web has led to the development of techniques for selecting, from the bulk of data, the subset of information relevant to a particular goal or need. Research on visual query systems, data mining and interactive visualization has produced a wide variety of visual presentation and interaction techniques that can be applied in different situations. However, although there is a great variety of models and techniques for information visualization, each application requires a particular study to determine whether the selected technique is useful and usable. Such a study is usually guided by the type of data to be represented and by the user tasks or analysis process that the visualization should support. Previous work in our research group [2, 3, 4] resulted in a classification of both data categories and visual representations that provides a conceptual framework for developing new techniques [5, 6]. During these projects it became evident that we cannot separate the visual aspects of data representation and interface design from the interaction mechanisms that help a user browse and query the data set through its visual representation. Moreover, our experience confirms that evaluating these two aspects is an important issue that must be addressed with different approaches, including, of course, empirical tests with users, which have shown that users often have their own analysis tools and are not aware of the benefits of visualization techniques.
We separate usability issues into three main categories: visual representation usability, tool usability and data usability. Developing an application in which the visual representation of data is the basis for interaction raises the following questions. Does the usability of the visualization technique affect data usability? How do we separate the visual representation and interaction aspects that affect tool usability from the modelling aspects that clearly affect data usability?
Our approach is to link interface usability knowledge with evaluation of the expressiveness, semantic content and interaction facilities of visualization techniques. Classical techniques for evaluating user interfaces, for example usability inspection methods and user testing, are being investigated in order to select an adequate framework for a methodology of usability testing at all three levels mentioned above. At present, we have empirical evidence collected from case studies suggesting that these three categories can be distinguished.
[1] Card, S.K., Mackinlay, J.D. and Shneiderman, B. (eds.) Readings in Information Visualization: Using Vision to Think. San Francisco, Morgan Kaufmann, 1999.
[2] Manssour, I., Freitas, C.M.D.S., Claudio, D.M. and Wagner, F.R. Visualizing and Exploring Meteorological Data Using a Tool-Oriented Approach. In: Earnshaw, R., Vince, J. and Jones, H. (eds.) Visualization and Modeling. Cambridge, Academic Press, 1995, pp. 47-62.
[3] Freitas, C.M.D.S., Basso, K., Drehmer, M., Oliveira, J.B., Hofmann, L.S. and Freitas, T.R.O. Visualizing Dolphins' Behavior in a Limited Area. Unpublished case study, 1999.
[4] Basso, K. and Freitas, C.M.D.S. Visualization of Geological Prospecting Data. In: Proceedings of the International Symposium on Computer Graphics, Image Processing, and Vision. Rio de Janeiro, Brazil, 1998, pp. 142-149.
[5] Manssour, I.H., Furuie, S., Nedel, L.P. and Freitas, C.M.D.S. A Framework to Visualize and Interact with Multimodal Medical Images. In: International Workshop on Volume Graphics 2001. Stony Brook, New York, IEEE Computer Society.
[6] Cava, R.A. and Freitas, C.M.D.S. Visualizing Hierarchies Using a Modified Focus+Context Technique. In: IEEE Information Visualization 2001, Late Breaking Hot Topics Proceedings. (Contribution accepted as interactive
Users' perception of spatial data usability
Centre for Geo-information
Spatial data usability is a complex issue. What are the key factors that determine spatial data usability? How do different users judge the usability of spatial data? What is the best way to present spatial data so that users can assess its usability? Answers to these questions are not easy. Clearly, the specific demand of a user at a certain moment in space and time plays a crucial role, but the characteristics and accessibility of the spatial data are also important.
To gain some notion of users' perception of spatial data usability, a limited survey was held among 40 persons. Each person was asked to classify his or her own level of knowledge as that of a spatial data expert or a spatial data amateur; 20 persons classified themselves as experts and 20 as amateurs. Next, the question "What makes spatial data usable for you?" was put to all persons. The time for responding was limited, so instant reactions to the question were recorded. The results of the survey will be presented at the workshop. Although limited in scope and scale, the survey clearly indicates some key aspects of spatial data usability.
Data usability for operational modelling in British forestry
Juan C. Suárez
Silv (N) Forest Research
Forestry in Great Britain is evolving towards a multi-purpose role in which concerns over the environment and the provision of recreation match the more traditional requirements of timber production. There is, therefore, a need to realign research outputs to meet these new demands through the provision of suitable models and tools.
Models are a means of encoding knowledge so as to address the volume, complexity and uncertainty in our understanding of natural processes. Modelling can therefore provide one of the most effective methods of technology transfer for research results, and it also provides forest managers with decision-making tools. The British Forest Research Agency is developing the CoreModel programme as a way of integrating models in a multi-model structure that revolves around a process-based model of tree growth and an object-oriented design architecture, reinforced by the addition of GIS capabilities.
Models frequently offer limited adaptability in their predictions when confronted with new situations and applications (e.g. different species compositions, irregular stand structures). Constant changes in the problem domain require existing models to adapt to new situations and still give sensible predictions. In model integration, up-scaling and down-scaling operations are limited by the availability of data at different spatial and temporal scales. In forestry, each level operates at a different spatial and temporal scale, varying from a few minutes or hours affecting individual leaves (physiological models) to thousands of years affecting an entire forest over hundreds or thousands of square kilometres (forest succession models).
There are different approaches to harmonising data scales when integrating models. Aggregation is generally perceived as computationally intensive, and the effort of running an entire successional sequence is certainly intimidating. Alternatively, the use of proportional multipliers can be an optimal solution when contrasted with empirical datasets. However, the use of multipliers may be constrained by the absence of mechanistic cause-and-effect information that could be useful in predicting the same response across different scales. Data modelling techniques such as data trends are used in situations where the data are highly correlated with baseline data (e.g. temperature and elevation). In other situations, intermediate models can be used to create the data inputs required by other models; this process is not exempt from uncertainties that are difficult to quantify in the absence of validation data. Natural processes that are random in appearance may pose a limit on our capacity for data modelling or data usability. Gap models in forestry describe forest structure in terms of competition for nutrients between trees; nevertheless, they are limited in describing survival rates when a stand is affected by abiotic hazards such as wind or snow.
A case of model integration within the CoreModel framework is ForestGALES, a wind risk model for forestry plantations. One of the main problems in predicting wind damage is the level of aggregation of the crop characteristics in current datasets. Stocking density, tree height and tree diameter are each recorded in separate fields as a single value representing the whole forest stand in the component table of the Forestry Commission Sub-Compartment Database. Monte Carlo methods and a height-diameter distribution have been applied to give stand variability a spatio-temporal dimension. A second approach is the use of a commercial lidar sensor to map spatial variability within the polygons representing each forest stand; this approach may also be used to analyse locational effects not originally considered by the model.
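The Monte Carlo step described above can be sketched as follows. The normal height distribution, the coefficient of variation and the allometric height-diameter relation below are illustrative assumptions only, not the forms used in ForestGALES or the Sub-Compartment Database.

```python
import random

def simulate_stand(mean_height, n_trees, cv=0.15, seed=42):
    """Monte Carlo sketch: regenerate within-stand variability from the single
    mean height stored per stand.  The normal distribution (cv = coefficient of
    variation) and the height-diameter relation are illustrative assumptions."""
    rng = random.Random(seed)
    sd = cv * mean_height
    trees = []
    for _ in range(n_trees):
        h = max(1.0, rng.gauss(mean_height, sd))  # tree height (m)
        dbh = 1.2 * h ** 0.85                     # diameter at breast height, assumed relation
        trees.append((h, dbh))
    return trees

stand = simulate_stand(mean_height=20.0, n_trees=500)
heights = [h for h, _ in stand]
print(min(heights), max(heights))  # a spread of heights instead of one aggregated value
```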
Fitness for purpose as a component of data usability
Sytze de Bruin
Centre for Geo-Information
At the onset of the workshop, there might be as many ideas about what ‘data usability’ means
as there are participants. Possibly, we will end up with a list of components that all contribute to the
concept. My intended contribution to that list would be its interpretation in terms of ‘fitness for use’. I
will deal with assessment of fitness for use in cases where we know how data are to be used, but we
are uncertain as to whether or not a particular data set suits the intended objective. I will briefly
discuss three case studies that demonstrate a decision analytical method for assessing the expected
utility of data and address the data requirements of such an approach.
In the first case study, fitness for use is determined by the uncertainty in the data set and by
the risk of undesirable consequences when making decisions based on that data. Here, the utility of the
data set lies in its ability to control the probability of adverse consequences. Another aspect
highlighted in this case study is that of spatial variability of uncertainty.
In the second case study, two candidate data sets provide information about a process that can
be considered stochastic. The data sets will not be able to control this process, but rather provide
information on its unknown realisation. Again, the expected utility of the information can be assessed
before actually using the data sets.
Finally, the operational practicability of the decision analytical approach adopted in the first
two case studies is discussed in a setting where the expected benefits and risks of decision
consequences are unknown and where a proper model of data uncertainty cannot be specified.
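The decision-analytical assessment of expected data utility can be illustrated with a toy example. The states, probabilities and utilities below are hypothetical, not taken from the case studies; the sketch computes the expected value of perfect information (EVPI), an upper bound on what any candidate data set could contribute to the decision.

```python
def expected_utility(decision, p_state, utility):
    """Expected utility of one decision over uncertain states of the world."""
    return sum(p * utility[decision][s] for s, p in p_state.items())

# Hypothetical decision: whether to build on land that may be flood-prone.
# Utilities are in arbitrary units.
p_prior = {"flood": 0.3, "dry": 0.7}
utility = {
    "build":    {"flood": -100.0, "dry": 50.0},
    "no_build": {"flood":    0.0, "dry":  0.0},
}

# Without further data: pick the act with the highest prior expected utility.
best_act = max(utility, key=lambda d: expected_utility(d, p_prior, utility))
eu_without = expected_utility(best_act, p_prior, utility)

# With perfect data the decision maker acts on the true state; the difference
# is the expected value of perfect information.
eu_with = sum(p * max(utility[d][s] for d in utility) for s, p in p_prior.items())
evpi = eu_with - eu_without
print(best_act, evpi)  # build 30.0
```

The same machinery extends to imperfect data by replacing the "perfect information" branch with posterior probabilities conditioned on the data, which is essentially what the expected-utility assessment of a candidate data set amounts to.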
Estimating the usability of old information
Luleå University of Technology
Department of Environmental Engineering
SE-971 87 Luleå
Several user surveys indicate that the currency of a dataset is very important for its usefulness. In today's standard proposals, currency is specified by a date stamp in the lineage section. Despite the clear importance of the aging factor, very few attempts have been made to find a theoretical framework for managing non-current data.
Examples of such frameworks are statistical reliability theory and maintenance engineering. Using these theories, it is possible to estimate the reduced usability of a dataset as well as an optimal data maintenance programme.
In this paper, reliability theories are reviewed in the light of geospatial data maintenance and usage. It is concluded that we often lack the information needed to estimate the aging effect with high precision. Such information has so far not been intended to form part of the quality specifications defined by ISO. Nevertheless, the statistical theories reviewed here provide a solid foundation for further research.
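As a minimal illustration of how reliability theory might be applied here, the sketch below assumes a constant rate of real-world change, so that the probability that a stored feature still matches reality decays exponentially with age. Both the model form and the numbers are illustrative assumptions, not ISO-specified quality elements.

```python
import math

def reliability(t, change_rate):
    """Probability that a stored feature still matches reality t years after
    capture, under a constant-rate (exponential) change model."""
    return math.exp(-change_rate * t)

def latest_update_time(change_rate, min_reliability):
    """Longest revisit interval that keeps reliability above the threshold:
    solve exp(-rate * t) = min_reliability for t."""
    return -math.log(min_reliability) / change_rate

# Illustrative numbers: 5% of features change per year; require 80% validity.
rate = 0.05
t_max = latest_update_time(rate, 0.80)
print(round(t_max, 2))  # 4.46 (years between updates)
```

An optimal maintenance programme would then weigh this revisit interval against the cost of re-survey, in the spirit of maintenance engineering.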
Robustness in Spatial Analysis
D. Josselin, THEMA, CNRS, France
Key-words : Robustness, Spatial Analysis, Data Quality, Statistical Tools Efficiency,
Expert Approach, ESDA
We propose to discuss the relationship between robustness and spatial analysis. Is robustness important in spatial analysis? How can it induce skewed knowledge and decisions? Are there different forms of robustness, and at which levels of the spatial analysis process do they arise? We will try to find out at which stages of the process these questions are pertinent, and whether it is possible to improve robustness in contexts of spatial decision support.
More precisely, we divide our text into four parts.
First, we give a global definition of the notion of robustness. We try to show why it is so consequential to take robustness into account in spatial analysis, in order to help actors make decisions while keeping the reliability of the information in mind throughout its processing. This is highlighted with several examples and applications.
In the second part, we present three aspects of « robustness » (in a broad sense), related to different levels of spatial analysis:
- the data: their « quality » (notably the notions of accuracy, pertinence and completeness);
- the statistical tools used to qualify and quantify a spatial phenomenon by exploring these data: the tools' « efficiency » (resistance and robustness);
- the way the expert investigates the data to extract relevant information: the expert's « approaches » (global vs. local; exploratory vs. confirmatory).
In the third part, we present three propositions for improving robustness:
- at the data level: a series of maps that put the quality of the data into perspective (applied to French agricultural flows and the related spatial partitioning);
- at the statistical level: robust estimations of the central value of a statistical distribution (for instance the « meadian », a robust estimator built on the mean and the median, compared with different robust M-estimators);
- at the expert level: different ways (propositions!) of modelling and exploring a spatial phenomenon by coupling local and global analysis.
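The contrast between an efficient but fragile estimator (the mean) and a resistant one (the median) can be shown in a few lines. The blend function below is only an illustrative mean-median compromise, not the « meadian » estimator itself, whose construction is not detailed in this abstract.

```python
def mean(xs):
    """Arithmetic mean: efficient for clean data, but not resistant."""
    return sum(xs) / len(xs)

def median(xs):
    """Median: resistant to gross outliers (50% breakdown point)."""
    s = sorted(xs)
    n, mid = len(s), len(s) // 2
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2.0

def mean_median_blend(xs, w=0.5):
    """Illustrative compromise between efficiency (mean) and resistance
    (median); NOT the authors' « meadian », which is not specified here."""
    return w * mean(xs) + (1 - w) * median(xs)

# Clean attribute values plus one gross outlier (e.g. a mis-coded record).
contaminated = [10.0, 11.0, 9.0, 10.5, 9.5, 1000.0]
print(mean(contaminated))    # 175.0  -- dragged far from the bulk of the data
print(median(contaminated))  # 10.25  -- barely moved
```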
Finally, we conclude by discussing a global framework able to enhance robustness and to provide the expert with different complementary keys for improving the spatial analysis.