Challenges in Preparing and Sharing Open Data
OpenCon 2016 Cape Town
14 December 2016
Michelle Willmers and Thomas King
ROER4D Curation and Dissemination Manager
CC BY
Research on Open Educational Resources (OER) for Development
• Imperative to establish empirical baseline research on OER in Global South
• 86 researchers in 26 countries across 3 continents
• Project ‘Open’ ethos manifests in Open Research strategy, bridging ‘Open’
silos
• Open content (typically used in a teaching and
learning context) that can be reused, revised, remixed,
redistributed and retained
• Made possible by open licensing, although increasing
focus on differentiating implicit vs. explicit open
content
• Focus on role OER can play in improving access to quality education
• Focus on role project can play in building Global South Open Education
research capacity
• Strong advocacy and activism component (NGO, CBO sectors – not only
career researchers)
Focus on empirical baseline manifests in focus on curatorial and publishing capacity
within the research project. The project acts as publisher, providing greater agency and
control (but presenting some challenges in terms of accreditation/reward).
ROER4D Curation & Dissemination Strategy
• Provide a content management and publishing service to SP researchers and the
Network Hub team in order to advance research capacity development efforts and
increase visibility of outputs.
• Support Principal Investigators and SP researchers in editorial development of
ROER4D outputs.
• Address infrastructure deficits and provide content management solutions
(including content hosting) in a research community with uneven institutional
support and capacity challenges.
• Ensure that the ROER4D legacy is freely accessible for reuse in line with international
curatorial and publishing standards.
• Complement Network Hub Communications efforts in an integrated
communications/dissemination approach.
• Data sharing as component of open content focus.
• Organising and profiling open content increases the potential for reuse and citation
(impact).
• Well-organised, strategic research management and content organisation promotes
rigour in the research process.
• Copyright vests with the author > data-sharing activity determined by their willingness
and capacity to engage.
• Format and platform/tool agnostic.
• Share openly by default on condition that it is valuable, legal and ethical
ROER4D data management principles
[Figure: ROER4D project data flow. Labels: Network Hub (Google, Vula); Project archive (external), Zenodo; Internal sharing and collaboration; External sharing and collaboration.]
Five pillars of ROER4D data publication approach
Step 1: Evaluate contractual framework, articulate strategy
Step 2: Get researchers on board
• Check ethics approval and consent
• Ensure first-tier de-identification takes place prior to Network Hub transfer in order to
ensure research subject confidentiality
• ROER4D agnostic in its approach (in terms of scale, format and technical
sophistication)
• Challenges of varying researcher sophistication in terms of data collection and
presentation
• Challenges of varying researcher sophistication in terms of technology employed to
capture, present, and analyse data
Step 3: Obtain source sub-project micro-data
• Archive in Vula and UCT e-Research Centre secure institutional archive
• Network Hub C&D team audits researchers’ submitted dataset
> What is the dataset comprised of?
> Are all the pieces there?
> What were the data collection processes, and do we have all the instruments to share?
> What languages are represented?
> Does something else like it exist?
> Who might it be of use to?
• Address file naming and format issues
• Articulate sub-project-specific data management plan
Step 4: Network Hub curation and quality assurance
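File naming issues of the kind mentioned above can be partly handled with a small script. The following is a minimal sketch, assuming a hypothetical convention of lowercase ASCII names joined by underscores; the function name and the convention itself are illustrative and are not ROER4D's actual scheme.

```python
import re
import unicodedata
from pathlib import PurePosixPath


def normalise_filename(name: str) -> str:
    """Return a lowercase, ASCII-only file name with underscores (hypothetical convention)."""
    path = PurePosixPath(name)
    stem, suffix = path.stem, path.suffix.lower()
    # Strip accents so names survive transfer between systems.
    ascii_stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    # Collapse any run of characters other than letters and digits into one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", ascii_stem).strip("_").lower()
    return f"{cleaned or 'unnamed'}{suffix}"
```

Renaming should be logged alongside the dataset so that the original file names remain traceable.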
• Scope and conceptualise the dataset
> Which components of the project-generated micro-data are you ethically and
legally allowed to share?
> Which components of the project-generated micro-data will you invest
resources in curating and sharing?
> Which instruments will you include?
• Identify focus of data and points of sensitivity
• Define appropriate second-tier de-identification approach
Step 5: Prepare data for publication
• Generate metadata and dataset description (accompanying narrative)
• Submit content to publisher (DataFirst)
• Link to published outputs
• Include description of process in research Methodology statements
• Profile in project communications activity
Step 6: Publish
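The metadata and dataset description step could be supported by a simple completeness check. This is a minimal sketch only: the field names are assumptions chosen to mirror the audit questions (languages, instruments, de-identification), not the schema used by DataFirst or any other publisher.

```python
import json


def build_metadata(title, description, languages, instruments, deidentification_notes):
    """Assemble a minimal dataset description (hypothetical field names)."""
    record = {
        "title": title,
        "description": description,
        "languages": sorted(languages),
        "instruments": list(instruments),
        "deidentification": deidentification_notes,
    }
    # Refuse to produce a record with empty fields, so gaps surface before submission.
    missing = [key for key, value in record.items() if not value]
    if missing:
        raise ValueError(f"Missing metadata fields: {', '.join(missing)}")
    return json.dumps(record, indent=2, ensure_ascii=False)
```

Failing loudly on an empty field is a design choice: it pushes the question "are all the pieces there?" to the point where the record is written rather than after submission.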
Some lessons learned
1. Openness increases rigour. Preparing data for publication promotes professional
approach to research process.
2. Preparing data for publication exposes weaknesses in instrument design and
research process.
3. Introducing C&D and data-sharing focus midway through a project poses many
challenges, particularly in terms of ethical and consent components.
4. Data sharing drives focus on reproducibility, transforming traditional approach to
crafting methodology statements.
5. The data preparation process takes time (approx. one week of researchers’ time in
ROER4D context).
6. Obtaining balance between utility and adequate protection in de-identification of
qualitative data is a challenge.
7. Openness is threatening to researchers in terms of exposing weakness in processes
and perceived threat of losing publication advantage.
8. C&D and data sharing activity require support, capacity development and
resourcing.
Qualitative de-identification
Thomas King
Terms and definitions
• De-identification – removing, eliding or replacing
pieces of information that reveal research
participants’ (possibly also referents’) identity.
• Anonymity – personal details are not gathered.
• Confidentiality – personal details are not shared.
• E.g. an anonymous survey contains no questions
about personal identifiers. A confidential survey
does contain these questions, but will not
share/publish them.
The two pillars of open data sharing
• Consensual: ethical, legal
• Comprehensible: coherent, valuable
Research Data Management & Open Data sharing
The de-identification balancing act
First, do no harm
Remove as much as needed to ensure the
confidentiality or anonymity of the
research participants.
Ensure that all ethical and consent
processes have been adhered to.
Don’t go overboard
Remove as little as is ethical to ensure the
richness of the data.
Take the unit of analysis as the guide – de-identify
up to the Unit of Analysis.
E.g. if Study X compares two universities,
you can safely remove all identifiers lower
than the university affiliation.
HOWEVER
Your data may be useful to others. The
purpose of de-identification is to preserve
confidentiality – don’t de-identify for the
sake of it
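The "de-identify up to the Unit of Analysis" rule can be illustrated with a short sketch. The records, field names and function below are hypothetical; whether the unit itself (here, the university) may be named still depends on the ethics and consent processes.

```python
def deidentify_to_unit(records, unit_field, identifier_fields):
    """Drop identifier fields finer-grained than the unit of analysis; keep the unit itself."""
    to_drop = set(identifier_fields) - {unit_field}
    return [
        {key: value for key, value in record.items() if key not in to_drop}
        for record in records
    ]


# Hypothetical records for a study comparing two universities.
records = [
    {"university": "University A", "department": "Physics", "name": "P. Example", "response": "Yes"},
    {"university": "University B", "department": "History", "name": "Q. Example", "response": "No"},
]
shared = deidentify_to_unit(records, "university", ["university", "department", "name"])
```

Only the fields below the unit of analysis are removed; the responses and the university affiliation are kept, so the data stays useful to others.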
Qualitative de-identification
• De-identification located in the same ecosystem
as data cleaning and data validation – no clear
line between data improvement and de-identification
– Cleaning up typos
– Standardising presentation and layout
– Identifying unanswered questions (or additional
questions), mislabelled responses, etc.
• Many of these also apply to quantitative data
• Articulation of principles in RDM and description
of these processes included in metadata
[Figure: ROER4D data interrogation process. Panels: Read data; Coherence (format & layout); Editing (fix typos & identify anomalous data); De-identifying (remove identifiers); Validation (identify and account for missing data).]
NETWORK HUB
Principal Investigator
Curation and Dissemination team
Communication and Evaluation consultants
SUB PROJECTS
ROER4D project structure
Using largely mixed-methods data (both
quantitative and qualitative)
ROER4D de-identification process
1. First-level de-identification by researcher
– Removal of direct identifiers (names of
people/institutions/companies, ID numbers, etc.)
– Important to ensure that raw data is not shared
2. Second-level de-identification by C&D team to
catch remaining direct identifiers
3. In-depth sweep of the text to identify indirect
identifiers
– Meticulous, thorough, repeated reading of the text
• (which ties back to general data enhancement)
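Removal of direct identifiers can be partly scripted once a list of known names exists. The sketch below is illustrative only: the identifier list, placeholder labels and function name are assumptions, and it catches only identifiers someone has already listed. Indirect identifiers still require the repeated close reading described above.

```python
import re


def replace_direct_identifiers(text, identifiers):
    """Replace each known direct identifier with its placeholder, longest match first."""
    # Longest first so "Jane Doe" is replaced before a shorter entry such as "Jane".
    for original in sorted(identifiers, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(original) + r"\b", flags=re.IGNORECASE)
        replacement = identifiers[original]
        # A function replacement avoids any backslash interpretation in the placeholder.
        text = pattern.sub(lambda _match: replacement, text)
    return text


# Hypothetical identifier list, compiled by the researcher who knows the participants.
identifiers = {
    "Jane Doe": "[PARTICIPANT_1]",
    "Jane": "[PARTICIPANT_1]",
    "Example University": "[INSTITUTION_A]",
}
transcript = "Jane Doe said that Example University supports OER. Jane agreed."
cleaned = replace_direct_identifiers(transcript, identifiers)
```

Using consistent placeholders rather than deleting names keeps it clear who is speaking, which helps preserve the richness of qualitative data.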
Tricky situations
• Data collected in multiple languages
– De-identification (particularly in qualitative data) far
more difficult – greater reliance on the researcher
• Post-hoc consent process
– Departments merge or close, participants retire or
disappear
• Data collected by multiple researchers
– Different collection strategies, adherence to interview
schedules, use/non-use of clarifying questions, etc.
Open by design
• Help researchers write consent forms!
Particularly for open data sharing.
• ‘Red flag’ clauses abound in template consent
forms, including:
– “will be used for research purposes only”
– “data will be destroyed after use”
– “only researchers will have access to the data”
• More open consent forms allow for data
sharing but do not mandate it.
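A quick check for the red-flag clauses listed above can be sketched as follows. The phrase list is shortened from the examples on this slide; the function name is hypothetical, and a plain substring match will miss paraphrased clauses, so it supplements rather than replaces reading the form.

```python
RED_FLAG_PHRASES = [
    "research purposes only",
    "data will be destroyed",
    "only researchers will have access",
]


def find_red_flags(consent_text):
    """Return the red-flag phrases that occur in the consent form text (case-insensitive)."""
    # Lowercase and collapse whitespace so phrases split across lines still match.
    normalised = " ".join(consent_text.lower().split())
    return [phrase for phrase in RED_FLAG_PHRASES if phrase in normalised]
```

Running this over template consent forms before data collection starts is one way to catch clauses that would later block sharing.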

Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

Willmers&King open con2016-ct-14.11.16

  • 1. Challenges in Preparing and Sharing Open Data OpenCon 2016 Cape Town 14 December 2016 Michelle Willmers and Thomas King ROER4D Curation and Dissemination Manager CC BY
  • 2. Research On Open Educational Resources (OER) for Development • Imperative to establish empirical baseline research on OER in Global South • 86 researchers in 26 countries across 3 continents • Project ‘Open’ ethos manifests in Open Research strategy, bridging ‘Open’ silos • Open content (typically used in a teaching and learning context) that can be reused, revised, remixed, redistributed and retained • Made possible by open licensing, although increasing focus on differentiating implicit vs. explicit open content • Focus on role OER can play in improving access to quality education • Focus on role project can play in building Global South Open Education research capacity • Strong advocacy and activism component (NGO, CBO sectors – not only career researchers) Focus on empirical baseline manifests in focus on curatorial and publishing capacity within the research project. The project acts as publisher, providing greater agency and control (but presenting some challenges in terms of accreditation/reward).
  • 3. ROER4D Curation & Dissemination Strategy • Provide a content management and publishing service to SP researchers and the Network Hub team in order to advance research capacity development efforts and increase visibility of outputs. • Support Principal Investigators and SP researchers in editorial development of ROER4D outputs. • Address infrastructure deficits and provide content management solutions (including content hosting) in a research community with uneven institutional support and capacity challenges. • Ensure that the ROER4D legacy is freely accessible for reuse in line with international curatorial and publishing standards. • Complement Network Hub Communications efforts in an integrated communications/dissemination approach.
  • 4. ROER4D data management principles • Data sharing as component of open content focus. • Organising and profiling open content increases the potential for reuse and citation (impact). • Well-organised, strategic research management and content organisation promotes rigour in the research process. • Copyright vests with the author > data-sharing activity determined by their willingness and capacity to engage. • Format and platform/tool agnostic. • Share openly by default on condition that it is valuable, legal and ethical.
  • 5. ROER4D project data flow (diagram labels): Network Hub (Google, Vula) • Project archive (external) • Zenodo • Internal sharing and collaboration • External sharing and collaboration
  • 6. Five pillars of ROER4D data publication approach
  • 7. Step 1: Evaluate contractual framework, articulate strategy
  • 8. Step 2: Get researchers on board
  • 9. Step 3: Obtain source sub-project micro-data • Check ethics approval and consent • Ensure first-tier de-identification takes place prior to Network Hub transfer in order to ensure research subject confidentiality • ROER4D agnostic in its approach (in terms of scale, format and technical sophistication) • Challenges of varying researcher sophistication in terms of data collection and presentation • Challenges of varying researcher sophistication in terms of technology employed to capture, present, and analyse data
  • 10. Step 4: Network Hub curation and quality assurance • Archive in Vula and UCT e-Research Centre secure institutional archive • Network Hub C&D team audits researchers’ submitted dataset > What is the dataset comprised of? > Are all the pieces there? > What were the data collection processes, and do we have all the instruments to share? > What languages are represented? > Does something else like it exist? > Who might it be of use to? • Address file naming and format issues • Articulate sub-project-specific data management plan
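
The "address file naming and format issues" part of Step 4 could be supported by a small script. What follows is only a minimal sketch, not the ROER4D team's actual tooling: the naming rule (`SAFE_NAME`), the function name `audit_file_names` and the sample file names are all hypothetical and chosen purely for illustration.

```python
from collections import Counter
from pathlib import PurePath
import re

# Hypothetical naming rule (an assumption, not a ROER4D standard):
# lowercase letters, digits, hyphens and underscores, then one extension.
SAFE_NAME = re.compile(r"^[a-z0-9_-]+\.[a-z0-9]+$")


def audit_file_names(names):
    """Return (files per extension, names that break the naming rule)."""
    formats = Counter(PurePath(n).suffix.lower() for n in names)
    flagged = [n for n in names if not SAFE_NAME.match(PurePath(n).name)]
    return formats, flagged


# Made-up file names standing in for a submitted dataset.
formats, flagged = audit_file_names(
    ["survey_2015.csv", "Interview 03 (final).docx", "codebook.pdf"]
)
print(dict(formats))
print(flagged)
```

A report like this only answers the mechanical questions (which formats are present, which names need attention); the audit questions about completeness, instruments and potential users still need a person.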
  • 11. Step 5: Prepare data for publication • Scope and conceptualise the dataset > Which components of the project-generated micro-data are you ethically and legally allowed to share? > Which components of the project-generated micro-data will you invest resources in curating and sharing? > Which instruments will you include? • Identify focus of data and points of sensitivity • Define appropriate second-tier de-identification approach
  • 12. Step 6: Publish • Generate metadata and dataset description (accompanying narrative) • Submit content to publisher (DataFirst) • Link to published outputs • Include description of process in research Methodology statements • Profile in project communications activity
  • 14. 1. Openness increases rigour. Preparing data for publication promotes professional approach to research process. 2. Preparing data for publication exposes weaknesses in instrument design and research process. 3. Introducing C&D and data-sharing focus midway through a project poses many challenges, particularly in terms of ethical and consent components. 4. Data sharing drives focus on reproducibility, transforming traditional approach to crafting methodology statements. 5. The data preparation process takes time (approx. one week of researchers’ time in ROER4D context). 6. Obtaining balance between utility and adequate protection in de-identification of qualitative data is a challenge. 7. Openness is threatening to researchers in terms of exposing weakness in processes and perceived threat of losing publication advantage. 8. C&D and data sharing activity require support, capacity development and resourcing.
  • 16. Terms and definitions • De-identification – removing, eliding or replacing pieces of information that reveal research participants’ (possibly also referents’) identity. • Anonymity – personal details are not gathered. • Confidentiality – personal details are not shared. • E.g. an anonymous survey contains no questions about personal identifiers. A confidential survey does contain these questions, but will not share/publish them.
  • 17. The two pillars of open data sharing • Consensual, ethical, legal • Comprehensible, coherent, valuable • Research Data Management & Open Data sharing
  • 18. The de-identification balancing act • First, do no harm: Remove as much as needed to ensure the confidentiality or anonymity of the research participants. Ensure that all ethical and consent processes have been adhered to. • Don’t go overboard: Remove as little as is ethical to ensure the richness of the data. Take the unit of analysis as the guide – de-identify up to the Unit of Analysis. E.g. if Study X compares two universities, you can safely remove all identifiers lower than the university affiliation. • HOWEVER: Your data may be useful to others. The purpose of de-identification is to preserve confidentiality – don’t de-identify for the sake of it.
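
The "de-identify up to the unit of analysis" rule can be illustrated in code. This is a sketch under stated assumptions: the identifier hierarchy in `LEVELS`, the function name `deidentify_to_unit` and the sample record are invented for illustration and are not taken from ROER4D data.

```python
# Hypothetical identifier hierarchy, ordered from coarsest to finest.
LEVELS = ["university", "faculty", "department", "name"]


def deidentify_to_unit(record, unit_of_analysis):
    """Drop identifier fields finer than the unit of analysis.

    Fields that are not identifiers (e.g. responses) are kept untouched.
    """
    keep = set(LEVELS[: LEVELS.index(unit_of_analysis) + 1])
    return {k: v for k, v in record.items() if k not in LEVELS or k in keep}


# Made-up record for a study that compares two universities.
row = {
    "university": "University A",
    "department": "Physics",
    "name": "J. Doe",
    "response": "We reuse OER weekly.",
}
cleaned = deidentify_to_unit(row, "university")
print(cleaned)
```

Here the university affiliation survives because it is the unit of analysis, while the department and name are removed. Free-text responses may still contain indirect identifiers, which this field-level approach does not catch.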
  • 19. Qualitative de-identification • De-identification located in the same ecosystem as data cleaning and data validation – no clear line between data improvement and de-identification – Cleaning up typos – Standardising presentation and layout – Identifying unanswered questions (or additional questions), mislabelled responses, etc. • Many of these also apply to quantitative data • Articulation of principles in RDM and description of these processes included in metadata
  • 20. ROER4D data interrogation process (five numbered elements in the slide diagram) • Read data • Coherence: format & layout • Editing: fix typos & identify anomalous data • De-identifying: remove identifiers • Validation: identify and account for missing data
  • 21. ROER4D project structure • NETWORK HUB: Principal Investigator, Curation and Dissemination team, Communication and Evaluation consultants • SUB PROJECTS • Using largely mixed-methods data (both quantitative and qualitative)
  • 22. ROER4D de-identification process 1. First-level de-identification by researcher – Removal of direct identifiers (names of people/institutions/companies, ID numbers, etc.) – Important to ensure that raw data is not shared 2. Second-level de-identification by C&D team to catch remaining direct identifiers 3. In-depth sweep of the text to identify indirect identifiers – Meticulous, thorough, repeated reading of the text • (which ties back to general data enhancement)
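
The removal of direct identifiers in the process above could be partly automated once the identifiers are known. The sketch below assumes a researcher-supplied mapping of known identifiers to placeholders; the function name `replace_direct_identifiers`, the placeholder style and the sample text are hypothetical. It does not attempt the sweep for indirect identifiers, which the slide describes as repeated close reading.

```python
import re


def replace_direct_identifiers(text, identifiers):
    """Replace each known identifier with its placeholder (case-insensitive)."""
    # Longest first, so "Jane Doe" is handled before the shorter "Jane".
    for original in sorted(identifiers, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(original) + r"\b", re.IGNORECASE)
        placeholder = identifiers[original]
        # A function replacement avoids backslash interpretation in placeholders.
        text = pattern.sub(lambda _m, p=placeholder: p, text)
    return text


# Made-up identifiers and transcript text, for illustration only.
known = {
    "Jane Doe": "[PARTICIPANT_1]",
    "Jane": "[PARTICIPANT_1]",
    "Example University": "[INSTITUTION_A]",
}
out = replace_direct_identifiers(
    "Jane Doe teaches at Example University. Jane uses OER.", known
)
print(out)
```

Using consistent placeholders rather than deleting names keeps the text readable and preserves who said what, which supports the "don't go overboard" principle.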
  • 23. Tricky situations • Data collected in multiple languages – De-identification (particularly in qualitative data) far more difficult – greater reliance on the researcher • Post-hoc consent process – Departments merge or close, participants retire or disappear • Data collected by multiple researchers – Different collection strategies, adherence to interview schedules, use/non-use of clarifying questions, etc.
  • 24. Open by design • Help researchers write consent forms! Particularly for open data sharing. • ‘Red flag’ clauses abound in template consent forms, including: – “will be used for research purposes only” – “data will be destroyed after use” – “only researchers will have access to the data” • More open consent forms allow for data sharing but do not mandate it.
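
The "red flag" clauses quoted on the slide lend themselves to a simple check of draft consent forms. This is a minimal sketch: the three phrases are those quoted above, while the function name `find_red_flags` and the sample sentence are assumptions made for illustration. Exact phrase matching will miss paraphrases, so it supplements rather than replaces a careful reading.

```python
# The three clauses quoted on the slide.
RED_FLAGS = [
    "will be used for research purposes only",
    "data will be destroyed after use",
    "only researchers will have access to the data",
]


def find_red_flags(consent_text):
    """Return the red-flag clauses that appear in the consent form text."""
    # Lowercase and collapse whitespace so line breaks do not hide a clause.
    normalised = " ".join(consent_text.lower().split())
    return [clause for clause in RED_FLAGS if clause in normalised]


# Made-up consent form excerpt.
hits = find_red_flags("Your data\nwill be destroyed after use.")
print(hits)
```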