Overview of the Research on Open Educational Resources for Development (ROER4D) Open Data initiative, highlighting data management principles, the five pillars of the ROER4D data publication approach and the project de-identification approach.
1. The ROER4D Open Data initiative
Michelle Willmers and Thomas King
January 2018
CC BY
2. Introduction to ROER4D
• Research on Open Educational Resources for Development project
– 18 sub-projects, across 26 countries in the Global South from Chile to
Mongolia, with 100 researchers, supported by a Network Hub team based in
the University of Cape Town and Wawasan Open University.
– Datasets in multiple languages (English, Spanish, Mongolian)
– Mostly mixed-methods data (mix of quantitative and qualitative)
• ROER4D Open Data initiative: supporting interested sub-projects in
sharing their data openly
3.
4. Unpacking the “ROER4D” project title…
Research
On Open Educational Resources (OER)
for Development
• Imperative to establish empirical baseline research on OER in Global South
• 86 researchers in 26 countries across 3 continents
• Project ‘Open’ ethos manifests in Open Research strategy, bridging ‘Open’
silos
• Open content (typically used in a teaching and learning
context) that can be reused, revised, remixed,
redistributed and retained
• Made possible by open licensing, although increasing
focus on differentiating implicit vs. explicit open
content
• Focus on role OER can play in improving access to quality education
• Focus on role project can play in building Global South Open Education
research capacity
• Strong advocacy and activism component (NGO, CBO sectors – not only
career researchers)
Focus on empirical baseline manifests in focus on curatorial and publishing capacity within the
research project. The project acts as publisher, providing greater agency and control (but
presenting some challenges in terms of accreditation/reward).
5. Curation & Dissemination strategy
• Provide a content management and publishing service to SP researchers and the
Network Hub team in order to advance research capacity development efforts and
increase visibility of outputs.
• Support Principal Investigators and SP researchers in editorial development of
ROER4D outputs.
• Address infrastructure deficits and provide content management solutions
(including content hosting) in a research community with uneven institutional
support and capacity challenges.
• Ensure that the ROER4D legacy is freely accessible for reuse in line with international
curatorial and publishing standards.
• Complement Network Hub Communications efforts in an integrated
communications/dissemination approach.
6. Data management principles
• Data sharing as component of generalised open content focus.
• Organising and profiling open content increases the potential for reuse and citation
(impact).
• Well-organised, strategic research management and content organisation promotes
rigour in the research process.
• Copyright vests with the author > data-sharing activity determined by their willingness
and capacity to engage.
• Format and platform/tool agnostic.
• Share openly by default on condition that it is valuable, legal and ethical.
7. Research Data Management
• Collect data: ethics clearance, methodology, instruments
• Organise data: formats, naming conventions
• Refine data: verification, validation
• Share data: de-identification, publishing, open data
• Document data: metadata, dataset description
• Store data: backup, archive, on-site storage, cloud storage
8. The two pillars of Open Data sharing
• Consensual: ethical, legal
• Comprehensible: coherent, valuable
Research Data Management & Open Data sharing
9. ROER4D project data flow
• Project archive (external): Zenodo
• Researcher
• ROER4D archive (internal): Google, Vula, UCT eResearch Centre
• Publisher: DataFirst
• Network Hub (Google, Vula)
• Internal sharing and collaboration
• External sharing and collaboration
10. Open Data terminology
• Open Data = Microdata
– Unit record data (survey data, census data)
– Interview and Focus Group transcripts
– i.e. the ‘raw material’ from which outputs, reports, publications etc. are
produced.
• Supportive documentation = Metadata
– Dataset descriptions
– Study descriptions (methods/methodology, data collection schedules)
– Data processing information (e.g. de-identification schema)
11. Terms and definitions
TERM: DEFINITION
Microdata (aka Unit Record Data): The information that underlies a research project’s analysis (i.e. the ‘thing’)
Metadata: Data that describes a file or record on a database (for example, keywords, author fields, ISBNs, DOIs)
Research Data Management (RDM): Overall term for how individuals/projects/institutions manage their data
Data Management Plan (DMP): Outlines an individual or project’s strategy around all aspects of data management
Curation: Organising, storing/archiving and describing data to ensure & control its long-term accessibility and usability. May include collating/concatenating from other sources
De-identification: Removing, eliding or replacing pieces of information that reveal research participants’ (possibly also referents’) identity
Anonymity: Personal details (identifiers) are not gathered
Confidentiality: Personal details (identifiers) are not shared
Curation platform: An on-premises or cloud-based storage space that contains metadata capabilities, Search Engine Optimisation, and backup capabilities
12. Why should researchers share data?
• ROER4D motivations:
– Build the empirical base for future research
– Coherent with our generally ‘open’ approach – publishing open
access outputs, actively communicating with audiences and
stakeholders, etc.
• Good practice – many research funders now require some sort of data-
sharing activity or plan
• Improve rigour
– Sharing data openly demands that the dataset is well described
and organised
– Increased scrutiny of the dataset often leads to more refined
analysis
16. Recruiting participants
• Emphasising social justice through sharing
– Sharing open data allows for latitudinal studies using data from multiple sites
• Emphasising personal reputation
– Sharing open data as a means of building one’s personal profile as a
researcher
• Emphasising rigour
– Sharing data openly enhances the quality of the research
17. Step 3: Source sub-project micro-data
• Check ethics approval and consent
• Ensure first-tier de-identification takes place prior to Network Hub transfer in
order to ensure research subject confidentiality
• ROER4D agnostic in its approach (in terms of scale, format and technical
sophistication)
• Challenges of varying researcher sophistication in terms of data collection and
presentation
• Challenges of varying researcher sophistication in terms of technology employed
to capture, present, and analyse data
18. Step 4: Network Hub curation and quality assurance
• Archive in LMS and secure institutional archive
• Network Hub C&D team audits researchers’ submitted dataset
> What is the dataset comprised of?
> Are all the pieces there?
> What were the data collection processes, and do we have all the instruments to
share?
> What languages are represented?
> Does something else like it exist?
> Who might it be of use to?
• Address file naming and format issues
• Articulate sub-project-specific data management plan
19. Step 5: Preparing data for publication
• Scope and conceptualise the dataset
> Which components of the project-generated micro-data are you ethically and
legally allowed to share?
> Which components of the project-generated micro-data will you invest
resources in curating and sharing?
> Which instruments will you include?
• Identify focus of data and points of sensitivity
• Define appropriate second-tier de-identification approach
20. ROER4D data interrogation process
[Diagram of numbered steps 1–5: READ DATA and the stages below]
• Coherence: format & layout
• Editing: fix typos & identify anomalous data
• Validation: identify and account for missing data
• De-identifying: remove identifiers
21. The de-identification balancing act
First, do no harm
Remove as much as needed to ensure the
confidentiality or anonymity of the
research participants.
Ensure that all ethical and consent
processes have been adhered to.
Don’t go overboard
Remove as little as is ethical to ensure the
richness of the data.
Take the unit of analysis as the guide – de-
identify up to the Unit of Analysis.
E.g. if Study X compares two universities,
you can safely remove all identifiers lower
than the university affiliation.
HOWEVER
Your data may be useful to others. The
purpose of de-identification is to preserve
confidentiality – don’t de-identify for the
sake of it
22. ROER4D de-identification process
1. First-level de-identification by researcher
– Removal of direct identifiers (names of people/institutions/companies, ID
numbers, etc.)
– Important to ensure that raw data is not shared
2. Second-level de-identification by C&D team to catch remaining direct
identifiers
3. In-depth sweep of the text to identify indirect identifiers
– Meticulous, thorough, repeated reading of the text (which ties back to
general data enhancement)
23. Qualitative de-identification
• De-identification located in the same ecosystem as data cleaning and data
validation – no clear line between data improvement and de-identification
– Cleaning up typos
– Standardising presentation and layout
– Identifying unanswered questions (or additional questions), mislabelled
responses, etc.
• Many of these also apply to quantitative data
• Articulation of principles in RDM and description of these processes included in
metadata
24. Qualitative de-identification example
• Raw data
– Well my name is Susan Tsvangirai, and I’m the Head of the
Anthropology department at the University of Zimbabwe. I first
started getting involved in publishing my data – see I’m the only
person in the country who works on human ecologies, well it’s me
and Ishaan at Wits, but I’m the only one locally, and I started out
using the institutional repository but it didn’t really work. It kept
timing out when I tried to upload resources. So I switched to Zenodo
which was fine but it felt a little bit sterile…
• Cleaned/processed data
– Well my name is [redacted], and I’m the Head of [my] department at
the University of Zimbabwe. I first started getting involved in
publishing my data – see I’m the only person in the country who
works [in my area], well it’s me and [a colleague] at Wits, but I’m the
only one locally, and I started out using the institutional repository
but it didn’t really work. It kept timing out when I tried to upload
resources. So I switched to Zenodo which was fine but it felt a little
bit sterile…
25. Step 6: Publish
• Generate metadata and dataset description (accompanying narrative)
• Submit content to publisher (in ROER4D instance, DataFirst)
• Link to published outputs
• Include description of process in research Methodology statements
• Profile in project communications activity
26. Challenges
• Data collected in multiple languages
– De-identification (particularly in qualitative data) far more difficult –
greater reliance on the researcher to identify disclosive information
• Post-hoc consent process
– Departments merge or close, participants retire or disappear
• Data collected by multiple researchers
– Different collection strategies, adherence to interview schedules, use/non-
use of clarifying questions, etc.
27. Ways forward: ‘Open by design’
• Help researchers write consent forms to facilitate ethical open
data sharing.
• ‘Red flag’ clauses abound in template consent forms,
including:
– “will be used for research purposes only”
– “data will be destroyed after use”
– “only researchers will have access to the data”
• More open consent forms allow for data sharing but do not
mandate it.
28. Lessons learned
1. Openness increases rigour. Preparing data for publication promotes a professional approach to
the research process.
2. Preparing data for publication exposes weaknesses in instrument design and research
process.
3. Introducing C&D and data-sharing focus midway through a project poses many challenges,
particularly in terms of ethical and consent components.
4. Data sharing drives focus on reproducibility, transforming traditional approach to crafting
methodology statements.
5. The data preparation process takes time (approx. one week of researchers’ time in ROER4D
context).
6. Obtaining balance between utility and adequate protection in de-identification of qualitative
data is a challenge.
7. Openness is threatening to researchers in terms of exposing weakness in processes and
perceived threat of losing publication advantage.
8. C&D and data sharing activity require support, capacity development and resourcing.
Editor's Notes
The ROER4D project, conceived in 2012 and running from 2013 to the end of 2017, was explicitly scoped with an ambition to conduct Open Research inasmuch as that proved viable and valuable. An early ambition mentioned in the scoping document was the desire to share data openly, but this process was not begun until 2015 with the elevation of Curation and Dissemination as a core project objective and the subsequent launch of the Open Data Initiative.
The graphic above shows where the ROER4D sub-projects were located and where they conducted their research activities. The research participants included high-school (secondary) and university (tertiary) students, teachers in secondary and tertiary education, government officials, and members of NGOs.
In the networked model of the ROER4D project, the Curation and Dissemination team were not involved in the gathering and validation of data, but supported the sub-projects in processing and organising their data for long-term curation and storage, and in some cases for publication and sharing as Open Data. Due to contractual requirements, the Network Hub
There are two competing influences on Open Data sharing, namely the ethical imperative – the requirement to actively inform research participants of the Open Data process and protect them from potential negative consequences – and ensuring the integrity and value of the shared dataset by not removing so much content that the final product is incomprehensible or so sparse as to lack value.
The ROER4D Open Data initiative was scoped to serve internal curation purposes (professionalising data stewardship) as well as external, public sharing of micro-data (where ethically and legally possible). The internal curation component was crucial in terms of keeping track of and curating the large amount of data produced by the 17 sub-projects, particularly as relates to the project’s meta-synthesis activities.
Microdata is the raw material that underpins the analysis of a research project. It can consist of quantitative (large-scale datasets, often represented in tabular form) or qualitative data (personal observations, field notes, interview and focus group transcripts). Metadata is the data that supports and describes microdata, and can consist of some or all of the following: dataset descriptions, study methods or methodologies, production dates, data collection and processing schema, etc.
Difference between anonymity and confidentiality: an anonymous survey contains no questions about personal identifiers; a confidential survey does contain these questions, but will not share/publish them.
While there are potential and real benefits to civil society and government from sharing Open Data, there is also a case to be made for the individual benefits accruing to researchers from sharing their data. Open data sharing is increasingly being mandated by funder institutions, particularly large national and regional funders in the Global North, and so familiarising oneself with open data principles and sharing data openly is good practice for those interested in applying to these funders. Finally, and significantly, the process of preparing one’s data for open sharing necessitates deep and thorough data sophistication, through improvement of the microdata and/or metadata.
As the Open Data Initiative was a voluntary activity (not mandated in the original project scoping), participants had to be persuaded to participate. The three primary methods used to encourage participation were through:
1) An appeal to the project’s overall Open Research agenda, by emphasising the value of Open Data for future studies and potentially latitudinal research
2) An appeal to the benefits accruing to contributors’ personal reputation, through the production of a citable research object (an open dataset)
3) An emphasis on the rigour-enhancement inherent in preparing a dataset for open sharing.
As the project was conceived with an explicit open agenda, much of the first strategy was implicit in the project’s general Open Research orientation. The second emphasis (personal reputation) relied on the standard practice of measuring citations as a means of measuring an academic’s public profile. Finally, the third strategy highlighted the Open Data sharing process as serving the core academic principle of ethical, rigorous research practice.
The ROER4D Network Hub conceives of data publishing as a ‘data interrogation’ process that may result in published Open Data, but still provides value even if the decision is made not to publish. The data interrogation process relies on frequently returning to read the original data in between coherence checking, editorial work, validation and verification activity, and finally de-identification. This process helps surface issues, especially indirect identifiers, that are particularly relevant and prevalent in qualitative data.
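The validation stage of this process (identifying and accounting for missing data) can be partly supported by a simple script for tabular data. The following is a minimal sketch only, not the ROER4D team's tooling: the column names, the sample rows and the set of markers treated as "missing" are all assumptions made for illustration.

```python
import csv
import io

# Assumed conventions for "missing" values; adjust to the dataset's codebook.
MISSING_MARKERS = {"", "NA", "N/A", "n/a"}


def missing_report(rows, fieldnames):
    """Count, per column, how many rows hold a missing value."""
    counts = {name: 0 for name in fieldnames}
    for row in rows:
        for name in fieldnames:
            value = (row.get(name) or "").strip()
            if value in MISSING_MARKERS:
                counts[name] += 1
    return counts


# Hypothetical survey extract, inlined so the example is self-contained.
sample = io.StringIO(
    "respondent,country,uses_oer\n"
    "r01,Kenya,yes\n"
    "r02,,no\n"
    "r03,India,NA\n"
)
reader = csv.DictReader(sample)
rows = list(reader)
report = missing_report(rows, reader.fieldnames)
print(report)
```

A report like this only locates the gaps. Deciding how to account for them (and documenting that decision in the metadata) remains a judgement for the researcher.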
While ethical considerations and the protection of research participants must come first, part of the value of Open Data lies in the ability of other researchers to mine datasets according to different conceptual and analytical frameworks. In such instances, a de-identification approach that retains only such content as supports the original study’s analysis limits the reusability, and thus the value, of the dataset.
In quantitative data, disclosive information is typically isolated to specific variables or data values that can be identified and removed automatically. In qualitative data, however, the interplay between otherwise non-disclosive pieces of information or insights may itself be disclosive. Therefore, more attention must be paid to identifying and removing, eliding or obfuscating these indirect identifiers, which may only be recognisable after repeated passes through the data.
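For the quantitative case, automatic removal can be as simple as dropping the variables known to hold direct identifiers. The sketch below assumes hypothetical column names (`name`, `id_number`) and a made-up record. It keeps the institution column on the assumption that the institution is the unit of analysis, in line with the "de-identify up to the Unit of Analysis" guidance in the slides.

```python
# Hypothetical names of the variables that hold direct identifiers.
DIRECT_IDENTIFIERS = {"name", "id_number"}


def drop_identifier_columns(records, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without the identifier variables."""
    return [
        {key: value for key, value in record.items() if key not in identifiers}
        for record in records
    ]


# Made-up record for illustration only.
records = [
    {
        "name": "A. Person",
        "id_number": "0001",
        "institution": "University A",
        "role": "lecturer",
        "uses_oer": "yes",
    },
]

cleaned = drop_identifier_columns(records)
print(cleaned)
```

This handles direct identifiers only. Combinations of the remaining variables (for example a rare role at a small institution) may still be disclosive and need the manual review described above.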
The above is an excerpt from a fictional qualitative dataset with the disclosive information indicated in red. The bold text indicates an indirect identifier that, in combination with the directly disclosive information, becomes disclosive itself. The second paragraph serves as an example of one way of de-identifying this excerpt.
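Once a human reader has compiled the list of disclosive phrases, applying the replacements consistently can be scripted. The following is an illustrative sketch using the first sentence of the fictional excerpt. The replacement table and the function names are assumptions, and the script does not find identifiers on its own, it only applies decisions already made by the reader.

```python
# Replacement table compiled by hand from a close reading of the transcript.
REPLACEMENTS = {
    "Susan Tsvangirai": "[redacted]",
    "the Anthropology department": "[my] department",
    "on human ecologies": "[in my area]",
    "Ishaan": "[a colleague]",
}


def deidentify(text, replacements):
    """Apply each replacement, longest phrase first to avoid partial matches."""
    for phrase in sorted(replacements, key=len, reverse=True):
        text = text.replace(phrase, replacements[phrase])
    return text


def remaining_identifiers(text, replacements):
    """List any disclosive phrases still present (case-insensitive check)."""
    lowered = text.lower()
    return [phrase for phrase in replacements if phrase.lower() in lowered]


raw = (
    "Well my name is Susan Tsvangirai, and I'm the Head of the "
    "Anthropology department at the University of Zimbabwe."
)
processed = deidentify(raw, REPLACEMENTS)
print(processed)
print(remaining_identifiers(processed, REPLACEMENTS))
```

The second function is a cheap safety net for the repeated passes: a non-empty result means a phrase survived in a different capitalisation and the text needs another look.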
As a networked project, ROER4D covered a vast area with different linguistic and cultural norms, and contained sub-projects with different research methodologies. This introduced complexity into the data cleaning process, which was further complicated by the fact that the Open Data Initiative had not formed part of the original scoping; in some cases research participants therefore had to be recontacted in order to gain consent for their data to be shared.