Workshop 6 - Confoa 2013 - Management of Scientific Journals - given by Prof. Dr. Peter Elias

  1. Data context: new developments for research in the social sciences
     Peter Elias, 4th Luso-Brazilian Conference on Open Access, University of Sao Paulo, 9th October 2013
  2. Structure of the presentation
     • Recent reports: what's going on?
     • What constitutes data in the social sciences?
     • What problems do we face with the more traditional forms of data?
     • New forms of data
     • Challenges using new data types
     • The report of the Administrative Data Taskforce
     • What does this mean for journals?
  3. Recent reports…
     • Royal Society 2012
     • OECD 2013
     • ESRC, MRC, Wellcome Trust 2012
     • RCUK 2012
  4. Science as an Open Enterprise (Royal Society 2012)
     The main thrust of this report was that transparency and openness should characterise all scientific research. As a major part of this, data sharing should be regarded as the norm and researchers, their funders and research institutions should adopt this stance in all their research activities. An important recommendation relates to situations where data hold personal information. In such cases, appropriate safeguards should be put in place to prevent disclosure of such details whilst facilitating data sharing.
  5. New Data for Understanding the Human Condition: international perspectives (OECD 2013)
     The focus of this report was on the need for global collaboration over data sharing. This will require improved incentives for researchers who agree to share data, and the adoption of agreed standards and protocols for data description. Additionally, the report calls for an international approach to the use of ‘Big Data’ for research, covering collaboration over the exploration of the research value of new forms of data, the development of tools for their analysis and improved access to administrative datasets on a cross-national basis.
  6. Report of the Administrative Data Taskforce (ESRC, MRC, Wellcome Trust 2012)
     This cross-departmental Taskforce proposes a major boost to the resources available for linkage and sharing across administrative datasets with the establishment of Administrative Data Research Centres in the countries of the UK. Additionally, all Taskforce members are agreed that new legislation is required in order to overcome current legal obstacles to record-level linkage between data held by different administrative bodies.
  7. Investing for Growth: Capital infrastructure for the 21st Century (RCUK 2012)
     This report sets out priorities for capital investment for research. A major theme throughout is to improve UK capacity to harness ‘Big Data’, emphasising the key importance of longitudinal data, of linking socioeconomic data sources to other data, including administrative records, private sector, and biomedical data, as well as ensuring these resources are accessible for social scientific research to benefit the economy, health and other sectors.
  8. What constitutes data in the social sciences?
     • Research interests focus upon people and organisations, their interaction and their evolution, seeking to understand better the behavioural relationships between them
     • Data types of interest relate to people and organisations, variously classified as:
         - Aggregated/disaggregated
         - Spatially referenced/time-stamped
         - Longitudinal/cross-sectional
         - Quantitative/qualitative
         - Structured/unstructured
     • Data structures include ‘rectangular’ datasets, hierarchical data, textual, numerical, audio, video
  9. What problems do we face with the more traditional forms of data?
     • Discovery (NESSTAR; CESSDA; Data Management Plans)
     • Documentation (DDI; SDMX)
     • Access (DWB; IHSN)
     • Reuse (CESSDA)
     • Preservation (CESSDA)
  10. New forms of data
     (Broad category of data, then detailed category: examples)
     Category A: Government transactions
         - Individual tax records: Income tax; tax credits
         - Corporate tax records: Corporation tax; sales tax; value added tax
         - Property tax records: Tax on sales of property; tax on value of property
         - Social security payments: State pensions; hardship payments; unemployment benefits; child benefits
         - Import/export records: Border control records; import/export licensing records
     Category B: Government and other registration records
         - Housing and land use registers: Registers of ownership
         - Educational registers: School inspections; pupil results
         - Criminal justice registers: Police records; court records
         - Social security registers: Registers of eligible persons
         - Electoral registers: Voter registration records
         - Employment registers: Employer census records; registers of persons joining/leaving employment
         - Population registers: Births; marriages; civil unions; deaths; immigration/emigration records; census records
         - Health system registers: Personal medical records; hospital records
         - Vehicle/driver registers: Driver licence registers; vehicle licence registers
         - Membership registers: Political parties; charities; clubs
  11. New forms of data (contd.)
     (Broad category of data, then detailed category: examples)
     Category C: Commercial transactions
         - Store cards: Supermarket loyalty cards
         - Customer accounts: Utilities; financial institutions; mobile phone usage
         - Other customer records: Product purchases; service agreements
     Category D: Internet usage
         - Search terms: Google; Bing; Yahoo search activity
         - Website interactions: Visit statistics; user generated content
         - Downloads: Music; films; TV
         - Social networks: Facebook; Twitter; LinkedIn
         - Blogs; news sites: Reddit
     Category E: Tracking data
         - CCTV images: Security/safety camera recordings
         - Traffic sensors: Vehicle tracking records; vehicle movement records
         - Mobile phone locations: GPS data
     Category F: Satellite and aerial imagery
         - Visible light spectrum: Google Earth©
         - Night-time visible radiation: Landsat
         - Infrared; radar mapping
  12. Challenges using new data types
     • Provenance
     • Replicability
     • Durability
     • Volume
     • Ethics
     • Confidentiality
     • Legal issues
     • Access may be strictly controlled
  13. Focus from here on one particular data type: administrative data and their reuse for research
  14. What are administrative data?
     Data which are the product of an administrative system. They are generated by organisations for operational purposes or as a legal requirement. They might identify people and/or organisations, and may contain detailed spatial information or be time-stamped. They are produced by public and private sector organisations. They are not designed for research.
  15. What is the research value of such data?
     • They already exist. No additional data collection costs are associated with research use.
     • They are typically large national datasets, permitting more detailed research to be undertaken than would otherwise be the case.
     • They record a process, which can be documented and understood.
     • Linkage between data relating to different time periods can create longitudinal resources.
     • Linkage to other data sources (e.g. surveys) can enhance these resources.
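     The linkage idea in the last two bullets can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method described in the presentation: the field names (`person_id`, `benefit`), the sample records, the keyed-hash pseudonymisation step and the `SECRET_KEY` value are all hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held by whoever performs the linkage (assumption).
SECRET_KEY = b"hypothetical-linkage-key"


def pseudonymise(person_id: str) -> str:
    """Replace a direct identifier with a keyed hash so linked data carry no raw IDs."""
    return hmac.new(SECRET_KEY, person_id.encode("utf-8"), hashlib.sha256).hexdigest()


def link_longitudinal(extracts):
    """Link yearly extracts into per-person histories.

    `extracts` maps a year to a list of records; each record is a dict
    with a 'person_id' field (hypothetical layout).
    """
    histories = defaultdict(dict)
    for year, records in extracts.items():
        for record in records:
            key = pseudonymise(record["person_id"])
            # Keep every field except the direct identifier.
            histories[key][year] = {k: v for k, v in record.items() if k != "person_id"}
    return dict(histories)


# Made-up sample data for two time periods.
extracts = {
    2011: [
        {"person_id": "A1", "benefit": "unemployment"},
        {"person_id": "B2", "benefit": "child"},
    ],
    2012: [
        {"person_id": "A1", "benefit": "none"},
    ],
}

histories = link_longitudinal(extracts)
for key, by_year in histories.items():
    print(key[:8], sorted(by_year))
```

     Each person ends up with one history keyed by year, which is the sense in which linking extracts from different periods yields a longitudinal resource. Linking a survey would follow the same pattern with the survey responses as an additional extract.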
  16. What are the problems associated with their research use?
     • Not designed for research. This may pose difficulties for their use in specific research areas.
     • They are not subject to statistical standards or statistical quality controls.
     • They may be difficult to access, and linkage may be prohibited or may not be feasible.
     • As the systems that generate them change, so might the data.
     • Their preservation for research is not regarded as a fundamental objective, which may lead to problems with metadata.
  17. Some of the problems currently faced by researchers
     • Inconsistent access conditions.
     • Severe delays in granting or refusing access.
     • Lack of information about selection and/or linking of administrative datasets.
     • Restricted access to datasets, especially for addressing the counterfactual.
     • Data controllers making unilateral decisions about the appropriateness of data for research.
     • Research permitted, then publication denied.
  18. Terms of reference for the Taskforce
     • identification of potential risks and benefits from increased research use of administrative data;
     • identification of likely resource implications arising from increased research use of administrative data;
     • the development and introduction of common procedures to provide more efficient access to administrative datasets;
     • clarification of the legal situation governing the use of routine data;
     • clarification of when consent is required and what consent procedures should be used;
     • identification of possible need for legislative change to improve access to administrative data for research.
  19. What has the Taskforce recommended?
     • Improved access and linkage procedures and arrangements for their governance.
     • A clearer legal environment for linkage between data held by different departments.
     • A common accreditation process for researchers applying for access to and linkage between administrative datasets.
  20. Where are we now?
     • £34 million released by government.
     • Four Administrative Data Centres commissioned.
     • A new UK Administrative Data Service set up.
     • A national governing authority is being established.
     • New legislation under preparation.
     • Now commissioning centres for local government and private sector data.
  21. What are the implications for libraries and journals?
     • Libraries as home for secure remote access facilities.
     • More attention to data documentation and discovery tools.
     • Building up capacity within the research community to facilitate research using the improved access and data linkage arrangements.
     • Subject knowledge of librarians to extend to administrative datasets.
     • To be solved: open access and access to administrative data.