ROER4D Open Data Initiative


Overview of the Research on Open Educational Resources for Development (ROER4D) Open Data initiative, highlighting data management principles, the five pillars of the ROER4D data publication approach and the project de-identification approach.

Published in: Education

  1. 1. The ROER4D Open Data initiative Michelle Willmers and Thomas King January 2018 CC BY
  2. 2. Introduction to ROER4D • Research on Open Educational Resources for Development project – 18 sub-projects, across 26 countries in the Global South from Chile to Mongolia, with 100 researchers, supported by a Network Hub team based in the University of Cape Town and Wawasan Open University. – Datasets in multiple languages (English, Spanish, Mongolian) – Mostly mixed-methods data (mix of quantitative and qualitative) • ROER4D Open Data initiative: supporting interested sub-projects in sharing their data openly
  3. 3. Unpacking the “ROER4D” project title… Research On Open Educational Resources (OER) for Development • Imperative to establish empirical baseline research on OER in Global South • 86 researchers in 26 countries across 3 continents • Project ‘Open’ ethos manifests in Open Research strategy, bridging ‘Open’ silos • Open content (typically used in a teaching and learning context) that can be reused, revised, remixed, redistributed and retained • Made possible by open licensing, although increasing focus on differentiating implicit vs. explicit open content • Focus on role OER can play in improving access to quality education • Focus on role project can play in building Global South Open Education research capacity • Strong advocacy and activism component (NGO, CBO sectors – not only career researchers) Focus on empirical baseline manifests in focus on curatorial and publishing capacity within the research project. The project acts as publisher, providing greater agency and control (but presenting some challenges in terms of accreditation/reward).
  4. 4. Curation & Dissemination strategy • Provide a content management and publishing service to SP researchers and the Network Hub team in order to advance research capacity development efforts and increase visibility of outputs. • Support Principal Investigators and SP researchers in editorial development of ROER4D outputs. • Address infrastructure deficits and provide content management solutions (including content hosting) in a research community with uneven institutional support and capacity challenges. • Ensure that the ROER4D legacy is freely accessible for reuse in line with international curatorial and publishing standards. • Complement Network Hub Communications efforts in an integrated communications/dissemination approach.
  5. 5. Data management principles • Data sharing as component of generalised open content focus. • Organising and profiling open content increases the potential for reuse and citation (impact). • Well-organised, strategic research management and content organisation promotes rigour in the research process. • Copyright vests with the author, so data-sharing activity is determined by the author’s willingness and capacity to engage. • Format and platform/tool agnostic. • Share openly by default on condition that it is valuable, legal and ethical.
  6. 6. Research Data Management • Collect data: ethics clearance, methodology, instruments • Organise data: formats, naming conventions • Refine data: verification, validation • Share data: de-identification, publishing, open data • Document data: metadata, dataset description • Store data: backup, archive, on-site storage, cloud storage
  7. 7. The two pillars of Open Data sharing • Consensual, ethical, legal • Comprehensible, coherent, valuable • Research Data Management & Open Data sharing
  8. 8. ROER4D project data flow • Researcher • Network Hub (Google, Vula) • ROER4D archive (internal): Google, Vula, UCT eResearch Centre • Project archive (external): Zenodo • Publisher: DataFirst • Internal sharing and collaboration • External sharing and collaboration
  9. 9. Open Data terminology • Open Data = Microdata – Unit record data (survey data, census data) – Interview and Focus Group transcripts – i.e. the ‘raw material’ from which outputs, reports, publications etc. are produced. • Supportive documentation = Metadata – Dataset descriptions – Study descriptions (methods/methodology, data collection schedules) – Data processing information (e.g. de-identification schema)
  10. 10. Terms and definitions • Microdata (aka Unit Record Data): The information that underlies a research project’s analysis (i.e. the ‘thing’) • Metadata: Data that describes a file or record on a database (for example, keywords, author fields, ISBNs, DOIs) • Research Data Management (RDM): Overall term for how individuals/projects/institutions manage their data • Data Management Plan (DMP): Outlines an individual or project’s strategy around all aspects of data management • Curation: Organising, storing/archiving and describing data to ensure and control its long-term accessibility and usability. May include collating/concatenating from other sources • De-identification: Removing, eliding or replacing pieces of information that reveal research participants’ (possibly also referents’) identity • Anonymity: Personal details (identifiers) are not gathered • Confidentiality: Personal details (identifiers) are not shared • Curation platform: An on-premises or cloud-based storage space that contains metadata capabilities, Search Engine Optimisation, and backup capabilities
  11. 11. Why should researchers share data? • ROER4D motivations: – Build the empirical base for future research – Coherent with our generally ‘open’ approach – publishing open access outputs, actively communicating with audiences and stakeholders, etc. • Good practice – many research funders now require some sort of data-sharing activity or plan • Improve rigour – Sharing data openly demands that the dataset is well described and organised – Increased scrutiny of the dataset often leads to more refined analysis
  12. 12. Five pillars of ROER4D data publication approach
  13. 13. Step 1: Evaluate contractual framework, articulate strategy
  14. 14. Step 2: Get researchers on board
  15. 15. Recruiting participants • Emphasising social justice through sharing – Sharing open data allows for latitudinal studies using data from multiple sites • Emphasising personal reputation – Sharing open data as a means of building one’s personal profile as a researcher • Emphasising rigour – Sharing data openly enhances the quality of the research
  16. 16. Step 3: Source sub-project micro-data • Check ethics approval and consent • Ensure first-tier de-identification takes place prior to Network Hub transfer in order to ensure research subject confidentiality • ROER4D agnostic in its approach (in terms of scale, format and technical sophistication) • Challenges of varying researcher sophistication in terms of data collection and presentation • Challenges of varying researcher sophistication in terms of technology employed to capture, present, and analyse data
  17. 17. Step 4: Network Hub curation and quality assurance • Archive in LMS and secure institutional archive • Network Hub C&D team audits researchers’ submitted dataset > What does the dataset comprise? > Are all the pieces there? > What were the data collection processes, and do we have all the instruments to share? > What languages are represented? > Does something else like it exist? > Who might it be of use to? • Address file naming and format issues • Articulate sub-project-specific data management plan
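The file naming clean-up mentioned under Step 4 could be partly automated. The following is a minimal Python sketch under stated assumptions: the convention it applies (lowercase, words joined by underscores, no spaces or punctuation) and the function name `normalise_filename` are hypothetical and are not taken from the ROER4D project. Letters from any script are kept, since the datasets include Spanish and Mongolian material.

```python
import re
from pathlib import PurePath


def normalise_filename(name: str) -> str:
    """Return a lowercase, underscore-separated version of a file name.

    Hypothetical convention for illustration only; a real project would
    apply whatever convention its data management plan specifies.
    """
    path = PurePath(name)
    stem = path.stem.lower()
    # Collapse every run of non-letter, non-digit characters into one "_".
    # \W is Unicode-aware, so accented and non-Latin letters are preserved.
    stem = re.sub(r"[\W_]+", "_", stem).strip("_")
    return stem + path.suffix.lower()


print(normalise_filename("Interview 03 (Final).DOCX"))
```

A helper like this only proposes names; renaming files in an archive would still need a human check so that references in instruments and metadata stay valid.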
  18. 18. Step 5: Preparing data for publication • Scope and conceptualise the dataset > Which components of the project-generated micro-data are you ethically and legally allowed to share? > Which components of the project-generated micro-data will you invest resources in curating and sharing? > Which instruments will you include? • Identify focus of data and points of sensitivity • Define appropriate second-tier de-identification approach
  19. 19. ROER4D data interrogation process 1. Read data 2. Coherence: format & layout 3. Editing: fix typos & identify anomalous data 4. De-identifying: remove identifiers 5. Validation: identify and account for missing data
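For quantitative data, the validation step (identify and account for missing data) might begin with a per-column count of empty values. The sketch below uses only the Python standard library; the column names, the sample rows and the list of missing-value markers are invented for illustration.

```python
import csv
import io
from collections import Counter


def count_missing(rows, missing_markers=("", "NA", "N/A")):
    """Count missing values per column in a sequence of dict rows."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            # DictReader yields None for fields absent from a short row.
            if value is None or str(value).strip() in missing_markers:
                missing[column] += 1
    return dict(missing)


# Invented sample data standing in for a survey export.
sample = io.StringIO(
    "respondent,age,uses_oer\n"
    "R01,34,yes\n"
    "R02,,no\n"
    "R03,NA,\n"
)
rows = list(csv.DictReader(sample))
print(count_missing(rows))
```

The counts only locate the gaps; accounting for them (for example, documenting why a question went unanswered) remains a task for the researcher and belongs in the dataset description.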
  20. 20. The de-identification balancing act • First, do no harm: Remove as much as needed to ensure the confidentiality or anonymity of the research participants. Ensure that all ethical and consent processes have been adhered to. • Don’t go overboard: Remove as little as is ethical to ensure the richness of the data. Take the unit of analysis as the guide – de-identify up to the Unit of Analysis. E.g. if Study X compares two universities, you can safely remove all identifiers lower than the university affiliation. • HOWEVER: Your data may be useful to others. The purpose of de-identification is to preserve confidentiality – don’t de-identify for the sake of it.
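The "de-identify up to the Unit of Analysis" guideline can be illustrated for tabular records. In this Python sketch the field names, the records and the function `deidentify_to_unit` are hypothetical; it assumes a study that compares universities, so the university affiliation is kept and lower-level identifiers are dropped.

```python
def deidentify_to_unit(records, unit_field, identifier_fields):
    """Drop identifier fields below the unit of analysis.

    The unit of analysis itself (unit_field) is always kept, even if it
    also appears in identifier_fields.
    """
    drop = set(identifier_fields) - {unit_field}
    return [
        {key: value for key, value in record.items() if key not in drop}
        for record in records
    ]


# Invented records for a study comparing two universities.
records = [
    {"name": "Respondent A", "department": "Department 1",
     "university": "University X", "score": 4},
    {"name": "Respondent B", "department": "Department 2",
     "university": "University Y", "score": 2},
]

shared = deidentify_to_unit(
    records,
    unit_field="university",
    identifier_fields=["name", "department", "university"],
)
print(shared)
```

Whether a lower-level field such as the department can be retained for the benefit of other researchers is an ethical judgement that a helper like this cannot make.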
  21. 21. ROER4D de-identification process 1. First-level de-identification by researcher – Removal of direct identifiers (names of people/institutions/companies, ID numbers, etc.) – Important to ensure that raw data is not shared 2. Second-level de-identification by C&D team to catch remaining direct identifiers 3. In-depth sweep of the text to identify indirect identifiers – Meticulous, thorough, repeated reading of the text (which ties back to general data enhancement)
  22. 22. Qualitative de-identification • De-identification located in the same ecosystem as data cleaning and data validation – no clear line between data improvement and de-identification – Cleaning up typos – Standardising presentation and layout – Identifying unanswered questions (or additional questions), mislabelled responses, etc. • Many of these also apply to quantitative data • Articulation of principles in RDM and description of these processes included in metadata
  23. 23. Qualitative de-identification example • Raw data – Well my name is Susan Tsvangirai, and I’m the Head of the Anthropology department at the University of Zimbabwe. I first started getting involved in publishing my data – see I’m the only person in the country who works on human ecologies, well it’s me and Ishaan at Wits, but I’m the only one locally, and I started out using the institutional repository but it didn’t really work. It kept timing out when I tried to upload resources. So I switched to Zenodo which was fine but it felt a little bit sterile… • Cleaned/processed data – Well my name is [redacted], and I’m the Head of [my] department at the University of Zimbabwe. I first started getting involved in publishing my data – see I’m the only person in the country who works [in my area], well it’s me and [a colleague] at Wits, but I’m the only one locally, and I started out using the institutional repository but it didn’t really work. It kept timing out when I tried to upload resources. So I switched to Zenodo which was fine but it felt a little bit sterile…
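Once a reader has identified the disclosive phrases, the substitutions in an example like the one above can be applied consistently across transcripts. This is a minimal Python sketch; the function `redact` and the replacement table are assumptions for illustration, and the slides do not describe any tooling used by the project. It handles only phrases that have already been listed, so the repeated close reading described in the de-identification process is still required to find indirect identifiers.

```python
def redact(text, replacements):
    """Replace each known identifying phrase with its bracketed placeholder."""
    # Handle longer phrases first so a short phrase cannot split a longer one.
    for phrase in sorted(replacements, key=len, reverse=True):
        text = text.replace(phrase, replacements[phrase])
    return text


# Excerpt of the raw transcript from the slide.
raw = (
    "Well my name is Susan Tsvangirai, and I’m the Head of the Anthropology "
    "department at the University of Zimbabwe. I’m the only person in the "
    "country who works on human ecologies, well it’s me and Ishaan at Wits."
)

# Hypothetical table compiled by a human reader of the transcript.
replacements = {
    "Susan Tsvangirai": "[redacted]",
    "the Anthropology department": "[my] department",
    "on human ecologies": "[in my area]",
    "Ishaan": "[a colleague]",
}

cleaned = redact(raw, replacements)
print(cleaned)
```

Plain string replacement is case-sensitive and misses misspellings or nicknames, which is one reason the sweep cannot be fully automated.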
  24. 24. Step 6: Publish • Generate metadata and dataset description (accompanying narrative) • Submit content to publisher (in ROER4D instance, DataFirst) • Link to published outputs • Include description of process in research Methodology statements • Profile in project communications activity
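A dataset description can be drafted as a small structured record before it is entered into a publisher's submission form. The Python sketch below is illustrative only: the field names, the placeholder values and the function `build_metadata` are assumptions, and they do not represent DataFirst's or Zenodo's actual metadata schema.

```python
import json


def build_metadata(title, creators, description, languages, keywords, licence):
    """Assemble a dataset description and refuse to return one with gaps."""
    record = {
        "title": title,
        "creators": creators,
        "description": description,
        "languages": languages,
        "keywords": keywords,
        "licence": licence,
    }
    empty = [field for field, value in record.items() if not value]
    if empty:
        raise ValueError("Missing metadata fields: " + ", ".join(empty))
    return record


# Placeholder values, not a real ROER4D dataset.
record = build_metadata(
    title="Example sub-project dataset",
    creators=["Example Researcher"],
    description="De-identified interview transcripts and survey instruments.",
    languages=["English", "Spanish"],
    keywords=["OER", "open data"],
    licence="CC BY",
)
print(json.dumps(record, indent=2, ensure_ascii=False))
```

Failing early on an empty field mirrors the audit question "Are all the pieces there?" and keeps incomplete descriptions from reaching the publisher.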
  25. 25. Challenges • Data collected in multiple languages – De-identification (particularly in qualitative data) far more difficult – greater reliance on the researcher to identify disclosive information • Post-hoc consent process – Departments merge or close, participants retire or disappear • Data collected by multiple researchers – Different collection strategies, adherence to interview schedules, use/non-use of clarifying questions, etc.
  26. 26. Ways forward: ‘Open by design’ • Help researchers write consent forms to facilitate ethical open data sharing. • ‘Red flag’ clauses abound in template consent forms, including: – “will be used for research purposes only” – “data will be destroyed after use” – “only researchers will have access to the data” • More open consent forms allow for data sharing but do not mandate it.
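The three ‘red flag’ clauses quoted above lend themselves to a simple check of draft consent forms. The Python sketch below searches for exactly those three phrases; the function name `find_red_flags` and the sample form text are invented, and a real review would also need to catch paraphrased versions by reading the form.

```python
RED_FLAG_CLAUSES = (
    "will be used for research purposes only",
    "data will be destroyed after use",
    "only researchers will have access to the data",
)


def find_red_flags(consent_text, clauses=RED_FLAG_CLAUSES):
    """Return the red-flag clauses that appear verbatim in a consent form."""
    # Lowercase and collapse whitespace so line breaks do not hide a clause.
    normalised = " ".join(consent_text.lower().split())
    return [clause for clause in clauses if clause in normalised]


# Invented consent form text for illustration.
draft = """
Your responses will be used for research
purposes only. Data will be destroyed after use.
"""
print(find_red_flags(draft))
```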
  27. 27. Lessons learned 1. Openness increases rigour. Preparing data for publication promotes professional approach to research process. 2. Preparing data for publication exposes weaknesses in instrument design and research process. 3. Introducing C&D and data-sharing focus midway through a project poses many challenges, particularly in terms of ethical and consent components. 4. Data sharing drives focus on reproducibility, transforming traditional approach to crafting methodology statements. 5. The data preparation process takes time (approx. one week of researchers’ time in ROER4D context). 6. Obtaining balance between utility and adequate protection in de-identification of qualitative data is a challenge. 7. Openness is threatening to researchers in terms of exposing weakness in processes and perceived threat of losing publication advantage. 8. C&D and data sharing activity require support, capacity development and resourcing.