Unlocking the geospatial potential of survey data

Paper on a JISC-funded project based at the UK Data Archive, as presented at the GISRUK 2012 conference, Lancaster University. The project set out to better enable the use of Archive datasets in GIS, primarily by addressing metadata and quality issues of geospatial identifiers.

Published in: Technology

  • Based on JISC Geospatial-funded work at the UK Data Archive
  • Archived survey data has great potential for secondary analysis in GIS, a potential which is not yet fully realised. The UK Data Archive, as distributor of the UK’s largest collection of social survey data, is well positioned to spearhead developments in this area.
  • We curate the largest collection of digital data in the social sciences in the UK: over 5,000 datasets from government departments, research institutes and other organisations, all of which are made available online to UK academia. Some of you might be familiar with our datasets; some of the better-known series include the Quarterly Labour Force Survey (QLFS), the British Household Panel Survey (BHPS) and the British Crime Survey (BCS).
  • The UGeo project looked in depth at this survey data, much of which contains geographic variables of some kind. We wanted to assess the quality and condition of the identifiers and the metadata describing them.
  • The first part of the project was a systematic information-gathering exercise, working through datasets one by one and pulling out the geography variables for further examination. One observation we were quickly able to make was that the availability of geo-referenced data has been steadily increasing, particularly over the last 10 years. Not only are new studies being geo-referenced, but new varieties of identifier are being added, with uses in different disciplines. Lower-level geographies such as postcode and grid reference have also increasingly been made available, thanks to new licensing options and secure data services. However, the actual state of the variables and their metadata is still relatively poor. For example, timestamps are often missing, making appropriate linking impossible, and inappropriate units are used, preventing meaningful analysis.
  • The next stage of our project, then, was to work out exactly how to remedy the data problems so evident in our investigation. What exactly are we looking for in ready-to-use georeferences? We suggest three criteria:
  • Selection is the choice of geographic identifier. Identifiers should be: of a sufficiently low level that they can be transformed into any other variable, e.g. grid reference or postcode; appropriate for analysis, e.g. statistics-appropriate units such as output areas; and appropriate to the data subject, e.g. researchers are likely to want parliamentary constituencies for a political survey, or police force areas for the BCS.
  • Quality is how easily the variable and its codes can be interpreted unambiguously. Use standard names for units: the term Scottish Region, for example, could refer to administrative or electoral regions, so disambiguate them in the name. Use standards such as the GSS Coding and Naming scheme produced by the ONS, which provides a standard set of codes for each division of many popular spatial units.
  • Metadata: ensure any spatial unit is well documented. Give a timestamp for each variable, for example Government Office Region as defined in 2001 as opposed to 1998, and document provenance sufficiently. For example, if you’re including a grid reference, how was it derived? From postcode centroids?
  • In order to meet these criteria, new approaches are needed at many stages of the pre-analysis data lifecycle, both from those who gather the data and go on to deposit it, and from those who preserve and disseminate it, such as the UK Data Archive.
  • Briefly, what should those collecting data be doing? This is relevant to those working on research projects as well as to big government surveys. They should be: considering geographic identifiers prior to data collection, rather than tacking them on afterwards, and asking which units to include and why; using data standards at the collection stage; and documenting in precise terms how each unit has been derived.
  • What are we doing to make the lives of researchers easier? The UK Data Archive will be leading the way in new developments for archives. 1. INSPIRE compliance: INSPIRE is an EU directive which helps to ensure a minimum level of information about the geospatial content of a dataset. 2. A number of improvements specific to survey data and spatial units will be required; much data cleansing work will be taking place on our catalogue over the coming months to bring it up to scratch.
  • 3. Using the enhanced metadata, we will try to make it easier for users to find the data they need. We will be considering interface design and making the relevant documentation easier to find. All of this will take into account the semantics of the relationship between units and datasets. 4. And finally, we will of course be encouraging data depositors to give us better geospatial data.
  • An immediate development of the project has been a web tool called the UGeo Browser. This is a demonstration tool that brings geospatial information to the forefront of searchable metadata. For a subset of our survey data collection, it meets many of the requirements I have just outlined: revised and augmented variable-level metadata to ensure accuracy and completeness; extra quality information (e.g. this variable is Ward, but its value labels are missing); clear and immediately accessible unit definitions; and verified links to boundary files, with any divergence between dataset and boundary clarified.
  • Interface preview
  • Interface preview
  • In many ways this functions as a proof of concept for our ideas on how ‘studies’ and ‘units’ as entities should interact. The long-term goal is for this technology to be integrated into the Archive’s central catalogue. Data cleansing and application development work has already begun. We’re also now considering the best way of creating formal semantics between units and studies. Perhaps a first step will be persistent identifiers for units…
  • Thanks and contact details :)
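The metadata criteria in these notes (a timestamp for each spatial unit, documented provenance, a standard coding scheme, value labels) lend themselves to simple automated checks of the kind the UGeo Browser surfaces as quality information. The Python sketch below is a minimal illustration under stated assumptions: the `SpatialUnitVariable` record, its field names and the `quality_notes` and `can_link` helpers are hypothetical and do not describe any UK Data Archive schema or the UGeo Browser's actual implementation. It treats two identifiers as linkable only when both refer to the same unit type as defined in the same year, which is why a missing timestamp blocks linkage.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpatialUnitVariable:
    """Hypothetical variable-level metadata for a geographic identifier."""
    name: str                             # variable name in the dataset, e.g. "ward"
    unit: str                             # spatial unit type, e.g. "Ward"
    vintage: Optional[int] = None         # year the unit definitions refer to
    derivation: Optional[str] = None      # how the identifier was produced
    coding_scheme: Optional[str] = None   # e.g. "GSS Coding and Naming"
    value_labels: dict = field(default_factory=dict)  # code -> unit name


def quality_notes(var: SpatialUnitVariable) -> list:
    """Return one warning for each metadata criterion the variable fails."""
    notes = []
    if var.vintage is None:
        notes.append(f"{var.name}: no timestamp, so the {var.unit} definitions used are unknown")
    if not var.derivation:
        notes.append(f"{var.name}: derivation undocumented")
    if not var.coding_scheme:
        notes.append(f"{var.name}: no standard coding scheme recorded")
    if not var.value_labels:
        notes.append(f"{var.name}: missing value labels")
    return notes


def can_link(a: SpatialUnitVariable, b: SpatialUnitVariable) -> bool:
    """Assumed rule: identifiers are safely linkable only if both are
    timestamped and refer to the same unit type as defined in the same year."""
    return a.vintage is not None and a.unit == b.unit and a.vintage == b.vintage


# Illustrative records (hypothetical variable names and values).
gor_2001 = SpatialUnitVariable("gor", "Government Office Region", vintage=2001)
gor_1998 = SpatialUnitVariable("gor_98", "Government Office Region", vintage=1998)
ward = SpatialUnitVariable(
    "ward", "Ward", vintage=2011,
    derivation="matched from respondent postcode (hypothetical)",
    coding_scheme="GSS Coding and Naming",
)

print(can_link(gor_2001, gor_1998))  # False: same unit type, different definitions
print(quality_notes(ward))           # ['ward: missing value labels']
```

Keeping the timestamp as a separate field, rather than folding it into the unit name, lets the same check distinguish "Government Office Region (2001)" from "Government Office Region (1998)" without string parsing.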
  • Transcript

    • 2. Archived survey data presents a vast wealth of material with potential for secondary use in GIS. UNLOCKING THE GEOSPATIAL POTENTIAL OF SURVEY DATA
    • 3. UK DATA ARCHIVE • Over 5,000 datasets • Popular survey data series include: Quarterly Labour Force Survey; British Household Panel Survey / Understanding Society; British Crime Survey
    • 4. We set out to explore the availability and usability of geo-identifiers in the UK Data Archive collection. These identifiers come in the form of ‘spatial units’, e.g. Ward and Constituency.
    • 5. • The availability of geo-referenced data is ever increasing • The usability of geo-referenced data ‘out of the box’ is still generally poor. This is reflective of, and contributing to, a divide between: • GIS experts – idiosyncratic methodologies • Untrained users with an interest – steep learning curve
    • 6. Three key features of ‘ready-to-link’ survey data for GIS 1. SELECTION 2. QUALITY 3. METADATA
    • 7. 1. SELECTION Include geographical identifiers which: • Can be readily transformed • Are of sufficient resolution to allow for fine-grained analysis • Are appropriate to the data subject
    • 8. 2. QUALITY Include geographical identifiers which: • Use standard names • Are coded with a standard coding scheme, e.g. ONS’ GSS Coding and Naming
    • 9. 3. METADATA Include geographical identifiers which are: • Time-referenced, e.g. Government Office Region as defined in 2001 as opposed to 1998 • Well documented in their derivation
    • 10. Those collecting data need to adjust their workflows to enable this. Those curating data need to adjust their workflows to enable this.
    • 11. What should data collectors be doing? • Considering geographic identifiers BEFORE data collection! • Considering standards: INSPIRE/GEMINI; GSS Coding and Naming • Documenting the provenance of geographic identifiers
    • 12. What will we be doing at the UK Data Archive? • INSPIRE compliance (we have published a metadata mapping for DDI-INSPIRE-GEMINI) • Improving spatial unit definitions through extensive data cleansing: standardised; time-referenced
    • 13. What will we be doing at the UK Data Archive? • Improving resource discovery tools / interface: user-friendly; less time spent searching through text; considering semantics • Feeding back to data depositors: guidance on best practice
    • 14. U·Geo Browser: a new web tool for resource discovery • Revised and augmented variable metadata • Information clarifying the quality of the geo-identifier • Integrated spatial unit definitions • Links to boundary files. Live beta at:
    • 15. U·Geo Browser • A demo tool using a simple, pragmatic approach • This tech will be integrated into a central Archive resource discovery tool, and catalogued data will be updated to reflect these refinements • A step in the right direction, but we need formal semantics built on persistent vocabularies • A drive is needed to establish this
    • 16. Thanks to: • all those at the UK Data Archive • EDINA, for their contributions as consultants. Tom Ensom @UKDataArchive
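Slide 8 recommends coding identifiers with the ONS GSS Coding and Naming scheme. In that scheme a code is nine characters long: a leading letter, usually indicating the country (E, W, S or N for England, Wales, Scotland and Northern Ireland), two digits that together with the letter identify the entity type (for example E05 for electoral wards and divisions in England), and a six-digit instance number. The minimal Python sketch below checks that a value has this structure and splits out its parts, a first step in confirming that an identifier is standardised. The `parse_gss` and `is_gss_code` functions and the small entity table are illustrative assumptions, not an ONS tool; a real check should use the ONS Register of Geographic Codes.

```python
import re

# Structure assumed here: one uppercase letter, two digits (the letter plus these
# digits form the entity code, e.g. "E05"), then a six-digit instance number.
GSS_PATTERN = re.compile(r"^([A-Z])([0-9]{2})([0-9]{6})$")

COUNTRIES = {"E": "England", "W": "Wales", "S": "Scotland", "N": "Northern Ireland"}

# Illustrative subset only; consult the ONS Register of Geographic Codes.
ENTITIES = {
    "E00": "Output Areas",
    "E05": "Electoral Wards and Divisions",
    "E12": "Regions",
    "E14": "Westminster Parliamentary Constituencies",
}


def parse_gss(code: str) -> dict:
    """Split a GSS code into its parts, raising ValueError if it is malformed."""
    cleaned = code.strip().upper()
    match = GSS_PATTERN.match(cleaned)
    if match is None:
        raise ValueError(f"not a well-formed GSS code: {code!r}")
    letter, entity_digits, instance = match.groups()
    entity = letter + entity_digits
    return {
        "code": cleaned,
        "country": COUNTRIES.get(letter, "other"),
        "entity": entity,
        "entity_name": ENTITIES.get(entity, "not in this illustrative table"),
        "instance": instance,
    }


def is_gss_code(code: str) -> bool:
    """True if the value has the GSS structure (not whether the code exists)."""
    try:
        parse_gss(code)
    except ValueError:
        return False
    return True


print(parse_gss("E05000026")["entity_name"])  # Electoral Wards and Divisions
print(is_gss_code("E0500002"))                # False: only eight characters
```

A structural check like this cannot tell whether a code was in use in a given year; it is most useful alongside the time-referencing recommended on slide 9.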