Data context: new developments for
research in the social sciences
4th Luso-Brazilian Conference on Open Access,
University of Sao Paulo
9th October 2013
Structure of the presentation
• Recent reports - what’s going on?
• What constitutes data in the social sciences?
• What problems do we face with the more
traditional forms of data?
• New forms of data
• Challenges using new data types
• The report of the Administrative Data Taskforce
• What does this mean for journals?
Science as an Open Enterprise
(Royal Society 2012)
The main thrust of this report was that transparency and
openness should characterise all scientific research. As a
major part of this, data sharing should be regarded as the
norm and researchers, their funders and research
institutions should adopt this stance in all their research
activities. An important recommendation relates to
situations where data hold personal information. In such
cases, appropriate safeguards should be put in place to
prevent disclosure of such details whilst facilitating data sharing.
New Data for Understanding the
Human Condition: international
perspectives. (OECD 2013)
The focus of this report was on the need for global
collaboration over data sharing. This will require
improved incentives for researchers who agree to share
data, and the adoption of agreed standards and
protocols for data description. Additionally, the report
calls for an international approach to the use of ‘Big Data’
for research, covering collaboration over the exploration
of the research value of new forms of data, the
development of tools for their analysis and improved
access to administrative datasets on a cross-national basis.
Report of the Administrative Data
Taskforce 2012 (ESRC, MRC, Wellcome Trust)
This cross-departmental Taskforce proposes a
major boost to the resources available for linkage
and sharing across administrative datasets with
the establishment of Administrative Data
Research Centres in the countries of the UK.
Additionally, all taskforce members are agreed
that new legislation is required in order to
overcome current legal obstacles to record-level
linkage between data held by different departments.
Investing for Growth: Capital
infrastructure for the 21st Century
This report sets out priorities for capital
investment for research. A major theme
throughout is to improve UK capacity to harness
‘Big Data’, emphasising the key importance of
longitudinal data, of linking socioeconomic data
sources to other data, including administrative
records, private sector, and biomedical data, as
well as ensuring these resources are accessible
for social scientific research to benefit the
economy, health and other sectors.
What constitutes data in the social sciences?
• Research interests focus upon people and organisations, their
interaction, their evolution – seeking to understand better the
behavioural relationships between them
• Data types of interest relate to people and organisations,
variously classified as
• Data structures include ‘rectangular’ datasets, hierarchical data,
textual, numerical, audio, video
What problems do we face with the
more traditional forms of data?
• Discovery (NESSTAR; CESSDA; Data
• Documentation (DDI; SDMX)
• Access (DWB; IHSN)
• Reuse (CESSDA)
• Preservation (CESSDA)
New forms of data
Individual tax records
Corporate tax records
Corporation tax; sales tax; value added tax
Property tax records
Tax on sales of property; tax on value of property
Social security payments
State pensions; hardship payments; unemployment benefits;
Income tax; tax credits
Border control records; import/export licensing records
Housing and land use
Registers of ownership
Criminal justice registers
Police records; court records
Social security registers
School inspections; pupil results
Registers of eligible persons
Voter registration records
Employer census records; registers of persons joining/leaving
Births; marriages; civil unions; deaths; immigration/emigration
records; census records
Health system registers
Personal medical records; hospital records
Driver licence registers; vehicle licence registers
Political parties; charities; clubs
Challenges using new data types
Access may be strictly controlled
Focus from here on one particular data type
Administrative data – reuse for research
What are administrative data?
Data which are the product of an administrative
system. They are generated by organisations for
operational purposes or as a legal requirement.
They might identify people and/or organisations
and may contain detailed spatial information and be
time-stamped. They are produced by public and
private sector organisations. They are not
designed for research.
What is the research value of such data?
• They already exist. No additional data collection costs associated
with research use.
• They are typically large national datasets, permitting more
detailed research to be undertaken than would otherwise be the case.
• They record a process, which can be documented and
• Linkage between data relating to different time periods can
create longitudinal resources.
• Linkage to other data sources (e.g. surveys) can enhance these
resources.
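The two linkage points above can be illustrated with a minimal sketch. It assumes every source already shares a pseudonymised person identifier (here called `pid`). All field names, values and function names are hypothetical and do not come from any real administrative system.

```python
from collections import defaultdict

# Hypothetical extracts for two time periods; all values are invented.
benefits_2011 = [
    {"pid": "a1", "year": 2011, "benefit": 1200},
    {"pid": "b2", "year": 2011, "benefit": 0},
]
benefits_2012 = [
    {"pid": "a1", "year": 2012, "benefit": 900},
    {"pid": "c3", "year": 2012, "benefit": 400},
]
# Hypothetical survey responses keyed by the same identifier.
survey = [{"pid": "a1", "self_rated_health": "good"}]


def build_longitudinal(*extracts):
    """Group records from several time periods by person identifier."""
    panel = defaultdict(list)
    for extract in extracts:
        for record in extract:
            panel[record["pid"]].append(record)
    # Order each person's history by year.
    return {pid: sorted(recs, key=lambda r: r["year"]) for pid, recs in panel.items()}


def attach_survey(panel, survey_rows):
    """Pair each person's history with a survey response (None if absent)."""
    responses = {row["pid"]: row for row in survey_rows}
    return {
        pid: {"history": history, "survey": responses.get(pid)}
        for pid, history in panel.items()
    }


panel = build_longitudinal(benefits_2011, benefits_2012)
linked = attach_survey(panel, survey)
```

In practice a shared identifier is often unavailable, so linkage may have to rely on matching names, dates of birth or addresses, and it takes place under the access and governance conditions discussed in the rest of this presentation.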
What are the problems associated with
their research use?
• Not designed for research. This may pose difficulties for their
use in specific research areas.
• They are not subject to statistical standards or statistical
• They may be difficult to access, and linkage may be prohibited
or may not be feasible.
• As the systems that generate them change, so might the data.
• Their preservation for research is not regarded as a
fundamental objective – may lead to problems with metadata.
Some of the problems currently faced by
• Inconsistent access conditions.
• Severe delays in granting or refusing access.
• Lack of information about selection and/or linking of
• Restricted access to datasets – especially for addressing the
• Data controllers making unilateral decisions about the
appropriateness of data for research.
• Research permitted, then publication denied.
Terms of reference for the Taskforce
• identification of potential risks and benefits from increased
research use of administrative data;
• identification of likely resource implications arising from
increased research use of administrative data;
• the development and introduction of common procedures to
provide more efficient access to administrative datasets;
• clarification of the legal situation governing the use of routine
• clarification of when consent is required and what consent
procedures should be used;
• identification of possible need for legislative change to improve
access to administrative data for research.
What has the Taskforce recommended?
• Improved access and linkage procedures and arrangements
for their governance.
• A clearer legal environment for linkage between data held by
• A common accreditation process for researchers applying for
access to and linkage between administrative datasets.
Where are we now?
• £34 million released by government.
• Four Administrative Data Centres commissioned.
• A new UK Administrative Data Service set up.
• A national governing authority is being established.
• New legislation under preparation.
• Now commissioning centres for local government and private
What are the implications for libraries and
• Libraries as a home for secure remote access facilities.
• More attention to data documentation and discovery tools.
• Building up capacity within the research community to
facilitate research using the improved access and data linkage
• Subject knowledge of librarians to extend to administrative
• To be solved – open access and access to administrative data