Incentivising the uptake of reusable metadata in the survey production process
ESRA15
Reykjavik
July 2015
Louise Corti
Collections Development and Producer Support
Why worry about metadata?
• No universal language used to document questions and variables
• Too many bespoke systems and vocabularies in use
• Massive waste of human resources across the survey data lifecycle
• Interoperability saves money
• Why don’t we all use the Data Documentation Initiative (DDI)?
Who needs incentivising?
Show how to exploit metadata for surveys
• Challenge: getting established survey operations to recognise the benefits of reusable metadata
• The Midlife in the United States study (MIDUS) is quite unique!
• Help funders, owners and producers ‘see the light’
• For this we need to show something very cool
• Some good experimental work is happening
Benefits of publishing rich survey metadata
• Survey documentation systems
• Question banks
• Survey data exploration systems
• Nesstar
• SDA
• Bespoke visualisation systems
Published outputs – question bank
Published outputs – online access
The reality
• Hard to match up question and variable information
• Too much manual data entry involved in publishing
• Must do better
• Capture rich, reusable metadata from the survey design and production process
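Matching questions to variables can be thought of as a join between a questionnaire export and a variable list on a shared question identifier. The sketch below is a minimal illustration under that assumption: the `Question`/`Variable` record layouts, the `question_id` field and the `match_questions_to_variables` helper are all hypothetical and are not part of DDI or of any agency's system. It reports the gaps that currently have to be closed by hand.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Question:
    question_id: str  # hypothetical ID assigned when the questionnaire is designed
    text: str


@dataclass
class Variable:
    name: str
    label: str
    question_id: Optional[str]  # None if the source question was never recorded


def match_questions_to_variables(
    questions: List[Question], variables: List[Variable]
) -> Tuple[Dict[str, List[str]], List[str], List[str]]:
    """Map each question ID to the variables derived from it.

    Also returns the questions with no variables and the variables with
    no known source question, i.e. the gaps that need manual attention.
    """
    by_id = {q.question_id: q for q in questions}
    mapping: Dict[str, List[str]] = {}
    orphan_variables: List[str] = []
    for var in variables:
        if var.question_id in by_id:
            mapping.setdefault(var.question_id, []).append(var.name)
        else:
            orphan_variables.append(var.name)
    unused_questions = [qid for qid in by_id if qid not in mapping]
    return mapping, unused_questions, orphan_variables


# Hypothetical example data
questions = [
    Question("Q1", "How old are you?"),
    Question("Q2", "What is your sex?"),
    Question("Q3", "How many people live in this household?"),
]
variables = [
    Variable("age", "Age of respondent", "Q1"),
    Variable("agegrp", "Age group (derived)", "Q1"),
    Variable("sex", "Sex of respondent", "Q2"),
    Variable("wt", "Survey weight", None),
]
mapping, unused_questions, orphan_variables = match_questions_to_variables(questions, variables)
```

Here the derived `agegrp` maps back to the same question as `age`, `Q3` has no output variable, and the weight has no source question. If the identifier were captured at design time, none of these links would need to be reconstructed manually at publication.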
Survey production lifecycle
• Beset with manual processes
• Legacy systems
• Reluctance to change or adapt systems
• New ways of working are hard to embrace: disruptive and expensive
Typical process – worst case scenario
• Manual questionnaire entry (Word/Excel/database)
• Export in Word format
• Deliver to survey agency
• Manual transfer to IBM Data Collection
• Export SPSS data files and a PDF/Word questionnaire
Survey Metadata: Barriers & Opportunities
Workshop: 26 June 2014
Meeting outcomes
• Great turnout and knowledge exchange!
• Quick turnaround of principles into a ‘campaign’ document and a published ‘Questionnaire Profile’
• Some very positive responses: a shared problem
• Be an advocate!
Increasing use of XML for survey design and publishing
Such as:
• Social science data archives: published survey metadata (DDI 2.5)
• Essex panel studies: bespoke XML Questionnaire Specification Language for survey design
• UK LifeStudy: XML-based survey design instrument
Discussing DDI implementation today
• CLOSER cohorts portal using the DDI 3.2 Questionnaire Profile
• DASISH: use of DDI 3.2
• Blaise: import via the Michigan Questionnaire Documentation System (MQDS) into DDI 3
• IBM Data Collection DDI experiments
Short brochure for sharable survey products
• Work closely with data owners and producers
• Existing information on data sharing is complex
• Set out what is really expected!
• Transferable information
• Not a bible
Sticks?
• Specify data documentation requirements in the commissioning tender for fieldwork
• Mapping between questions and data outputs
• A more readable questionnaire for end users
CLOSER project
• Funded variable/question discovery service
• Long-running birth cohorts & longitudinal studies
• Drivers for the project
• Harmonisation (biomedical, socio-economic)
• Capacity building
• Data Linkage
• Impact
• Discovery
• Encourage use of existing data resources
• Tools for enhancing survey metadata
Incentives for CLOSER PIs?
• Large award to get prestigious cohort studies on board (£££)
• Reduce burden: enhancement work done centrally
• Survey data managers:
  • were happy to be part of a peer group
  • found it rewarding to go back and look at their data
  • liked having a shared controlled vocabulary
  • received training
  • found variable-to-questionnaire mappings useful
  • liked the visibility of their study in the search platform
Forward-looking survey design
• Think up front about the reusability of questionnaire metadata
• New studies bring new opportunities
• Moving old, messy survey design metadata into a new environment may be worth investing in
• This can make harmonisation work much easier: XML schemas allow formal linkage of variables across time, recording equivalences, differences, etc.
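To illustrate what a formal linkage of variables across time might look like, the sketch below builds and queries a small XML fragment with Python's standard `xml.etree.ElementTree`. The element names (`VariableLinkage`, `VariableRef`), the `relation` vocabulary (`reference`, `equivalent`, `differs`) and the variable names are hypothetical simplifications chosen for illustration; they are not the DDI 3.2 schema, which defines its own structures for this.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def build_linkage(concept: str, links: List[Tuple[int, str, str, str]]) -> ET.Element:
    """Record how one concept is measured across waves.

    Each link is (wave, variable_name, relation, note). The relation is
    stated relative to the first wave: "reference", "equivalent" or
    "differs" (hypothetical vocabulary, not a DDI code list).
    """
    root = ET.Element("VariableLinkage", {"concept": concept})
    for wave, name, relation, note in links:
        ref = ET.SubElement(
            root, "VariableRef", {"wave": str(wave), "name": name, "relation": relation}
        )
        if note:
            ref.text = note  # free-text explanation of how the measure differs
    return root


def comparable_variables(root: ET.Element) -> List[str]:
    """Variables that can be pooled directly with the reference wave."""
    return [
        ref.get("name")
        for ref in root.iter("VariableRef")
        if ref.get("relation") in ("reference", "equivalent")
    ]


linkage = build_linkage(
    "respondent_age",
    [
        (1, "w1_age", "reference", ""),
        (2, "w2_age", "equivalent", ""),
        (3, "w3_agegrp", "differs", "Age recorded in bands, not single years"),
    ],
)
xml_text = ET.tostring(linkage, encoding="unicode")  # serialise for exchange
parsed = ET.fromstring(xml_text)  # a consumer reading the same linkage back
```

Because equivalences and differences are stated explicitly rather than inferred from similar variable names, a harmonisation tool can select `w1_age` and `w2_age` for pooling and flag `w3_agegrp` for recoding.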
Data publishers
• Survey owners/producers: documentation online
• Question banks
• Journals: supporting data with sufficient metadata
• Use the DDI 3.2 Questionnaire Profile, not bespoke schemas
Self-deposit expectations?
• Peer review by data centres of all data published, including the quality of the metadata
• Journals: no unified standard for data description or documentation
• Start with minimal metadata expectations:
  • data collection description
  • provenance
  • data description: file and variable names, labels
  • relationships between tables/files
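A data centre or journal could check a deposit against these minimal expectations automatically before peer review. The sketch below assumes a hypothetical dictionary layout for a deposit record (the keys `data_collection_description`, `provenance`, `files` and `relationships` are illustrative names, not a published standard) and only requires relationships when more than one file is deposited.

```python
from typing import Any, Dict, List

# Hypothetical field names for the minimal expectations listed above.
REQUIRED_FIELDS = ("data_collection_description", "provenance", "files")


def check_minimal_metadata(record: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the minimum is met."""
    problems: List[str] = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing {field}")
    files = record.get("files") or []
    # Relationships between tables/files only apply when there is more than one file.
    if len(files) > 1 and not record.get("relationships"):
        problems.append("missing relationships between files")
    for f in files:
        fname = f.get("name") or "<unnamed file>"
        if not f.get("name"):
            problems.append("file without a name")
        for var in f.get("variables", []):
            if not var.get("name"):
                problems.append(f"{fname}: variable without a name")
            elif not var.get("label"):
                problems.append(f"{fname}: variable {var['name']} has no label")
    return problems


# Hypothetical deposit used for illustration
deposit = {
    "data_collection_description": "Face-to-face interviews with adults",
    "provenance": "Deposited by the survey producer",
    "files": [
        {"name": "individual.sav",
         "variables": [{"name": "age", "label": "Age of respondent"},
                       {"name": "sex", "label": ""}]},
        {"name": "household.sav",
         "variables": [{"name": "hhsize", "label": "Number of people in household"}]},
    ],
}
problems = check_minimal_metadata(deposit)
```

Returning a list of plain-language problems, rather than a pass/fail flag, keeps the check usable as feedback to depositors: start with the lowest common denominator and tell people exactly what is missing.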
Some tips on incentivising
• Speak a common language
• On DDI, don’t drown in detail; use existing profiles
• Start with the lowest common denominator: baby steps
• Show value: shiny interfaces and examples!
• Provide supporting tools where possible, e.g. for metadata entry
• Integrate into everyday workflows and research tools
CONTACT
UK Data Service
University of Essex
Wivenhoe Park
Colchester
Essex CO4 3SQ
• ……………..…..………
………………..
T +44 (0)1206 872145
E corti@essex.ac.uk