Natasha Simons
#AM6 Data in the Scholarly Communications Lifecycle
FORCE11 Scholarly Communications Institute
Monday 30 July – Friday 3 August 2018
San Diego, USA
Thursday 2 August
Today’s course outline
• Licensing research data for reuse
• Managing personal and sensitive data
• Describing, publishing and sharing research data
Link to slides: https://tinyurl.com/y7s5okr4
Licensing research data for reuse
Rachael Samberg & Maria Gould
University of California Berkeley
Data rights and licensing
Thursday
Managing personal & sensitive data
What are personal and sensitive data?
Privacy Act 1988
• Personal information
• Sensitive information
• Health information
Sensitive data
“data that can be used to identify an individual, species, object, process or location that introduces a risk of discrimination, harm or unwanted attention.”
Guide to Publishing and Sharing Sensitive Data
http://www.ands.org.au/guides/sensitivedata
Why it matters
http://www.abc.net.au/news/2018-03-18/cambridge-analytica-suspended-by-facebook/9560272
https://www.smh.com.au/national/guilty-health-department-breached-privacy-laws-publishing-data-of-2-5m-people-20180329-p4z6wf.html
Ethics
• Informed consent
• A key principle of ethics is to avoid harm
• Can be achieved by removing/minimising sensitivity
• De-identifying data if possible (and the meaning is not
lost in the process)
• Conditions around access to data (mediated access, 5
safes)
• Ethics committee approval needs to cover consent and access
conditions
• See also ANDS’ medical webinar series
http://www.ands.org.au/working-with-data/sensitive-data/medical-and-health/webinars-health-and-medical
Informed consent for data sharing
1. Avoid precluding data de-identification,
publication and sharing
2. State possibility of future data publication
3. State conditions of access
4. Document consent with collected data to inform
subsequent users
Example wording available in ANDS Guide to
Publishing and Sharing Sensitive Data
Data de-identification
• Identifiable*: identity of an individual can be reasonably ascertained
• Re-identifiable*: possible to re-identify an individual
• Non-identifiable*: no specific individual can be identified
De-identification/Anonymisation
http://www.ands.org.au/working-with-data/sensitive-data/de-identifying-data
* Terms from National Statement on Ethical Conduct in Human Research 2007 (Updated May 2015)
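A minimal sketch of what de-identifying a tabular record might look like. It is an illustration only: the column names (`participant_id`, `name`, `email`, `age`), the salt and the 10-year age bands are all assumptions, not part of the ANDS guidance. Note that a salted hash is a pseudonym: whoever holds the salt could re-link records, so the output is closer to "re-identifiable" than "non-identifiable" in the terms above.

```python
import hashlib

# Hypothetical column names; adjust to the dataset at hand.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonym(value: str, salt: str) -> str:
    """Return a short salted SHA-256 digest to stand in for an identifier."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:12]


def age_band(age: int, width: int = 10) -> str:
    """Generalise an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict, salt: str) -> dict:
    """Drop direct identifiers, pseudonymise the ID, and coarsen the age."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant_id"] = pseudonym(str(record["participant_id"]), salt)
    cleaned["age"] = age_band(int(record["age"]))
    return cleaned


if __name__ == "__main__":
    row = {
        "participant_id": 17,
        "name": "Jane Citizen",
        "email": "jane@example.org",
        "age": 34,
        "response": "agree",
    }
    print(deidentify(row, salt="keep-this-secret"))
```

Removing columns and coarsening values does not by itself guarantee anonymity; combinations of the remaining fields may still single out a person, which is why the check that "the meaning is not lost" has to be balanced against re-identification risk.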
What about sharing data that can’t be
de-identified?
healthtalkaustralia.org
Informed consent /
mediated access
Mediated access
• The metadata is openly available but the data is
not
• Access mediated through
• The researcher
• The research team
• The repository
• A data access committee
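The idea of mediated access can be sketched in a few lines: the metadata is readable by anyone, while the data is released only to users the mediator has approved. The class and function names here are hypothetical and stand in for whatever a researcher, repository or data access committee actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    metadata: dict                 # openly available description
    data_path: str                 # location of the restricted data
    approved_users: set = field(default_factory=set)


def get_metadata(dataset: Dataset) -> dict:
    """Anyone may read the metadata."""
    return dataset.metadata


def request_data(dataset: Dataset, user: str) -> str:
    """Release the data location only to users the mediator has approved."""
    if user not in dataset.approved_users:
        raise PermissionError(f"{user} must apply to the data access committee")
    return dataset.data_path


if __name__ == "__main__":
    ds = Dataset(
        metadata={"title": "Example interview transcripts"},
        data_path="/restricted/interviews.zip",
        approved_users={"approved.researcher"},
    )
    print(get_metadata(ds))
    print(request_data(ds, "approved.researcher"))
```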
Resources for medical and health data
ands.org.au/working-with-data/sensitive-data
ands.org.au/medical
Publishing and sharing sensitive data Guide
Data sharing considerations for Human Research Ethics Committees Guide
De-identification Guide
Exercise: Thing 10: Sharing sensitive
data
Sharing sensitive data requires careful consideration, but it
can be done. Find out how.
Getting started: If it’s so sensitive - how can it possibly be
shared and published?!
Learn more: Who are the “data gatekeepers”?
Challenge me: Make me anonymous
http://www.ands.org.au/partners-and-communities/23-research-data-things/all23/thing-10
Activity
Look at the consent forms
• UK Data Archive sample consent form
• Global Alliance for Genomics and Health consent
tools (focus on Section C)
• Health Science Alliance Biobank Consent
• https://www.icpsr.umich.edu/files/ICPSR/access/dataprep.pdf (bottom of page 13)
Discuss between groups some of the good and bad
points of the consent form you examined.
Describing, publishing and sharing research data
Why do people search for data?
Why do people search for data*?
•Exploratory/Scoping
•Reuse/Secondary data analysis
•Can be starting point or ad hoc
•Peer review
•Reproduce/extend results
•Repurpose (e.g. for mashups, visualisations, simulations)
•Verify claims (e.g. report findings)
*Not in any order; not exhaustive!
How do people find data?
How do people find data*?
•Google
•Ask a colleague
•Find link to data in a journal article
•Data journals
•Data registries e.g. re3data
•Open data portals e.g. data.gov
•Institutional repositories
•Data / Discipline repositories e.g. Dryad
•Project website
•Data discovery aggregators like Research Data Australia
•Library catalogues, databases
*Not in any order; not exhaustive!
Characteristics of finding data
When creating metadata records, keep in mind that finding data is:
● Movable feast / changing beast
● No standard practice, universal standard or vocab
● Databases are non-exhaustive
● Methods for searching and terms driven by why people are
looking and how the data is stored
FAIR Data
To aid discovery and reuse, data needs to be:
● Findable
● Accessible
● Interoperable
● Reusable
More on FAIR Data:
● FAIR Data Principles (FORCE11): https://www.force11.org/group/fairgroup/fairprinciples
● ANDS and FAIR Data: https://www.ands.org.au/working-with-data/fairdata
● FAIR Data ANDS Webinar series: https://www.youtube.com/user/andsdata (FAIR Data Playlist)
Good metadata is key to FAIR data
https://www.youtube.com/watch?v=ABF2FvSPVYE
Hands-on exercise: data description
Your task:
1. Divide into pairs
2. Each pair take one of the CSV datasets at
https://tinyurl.com/y8ttg2cj
3. Describe the dataset by creating a metadata record. Think
about: title, creators, date, short description and so on.
4. Bring your record to whole class discussion
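For pairs who prefer to draft their record in a structured form, here is one possible sketch. The field list simply follows the prompts in step 3, and every value is a placeholder rather than a description of either CSV dataset; `REQUIRED_FIELDS` and `missing_fields` are illustrative names, not part of any standard.

```python
import json

# Fields suggested in the exercise; treat this list as a starting point.
REQUIRED_FIELDS = ("title", "creators", "date", "description")


def missing_fields(record: dict) -> list:
    """List required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


record = {
    "title": "Example wildlife observations",   # placeholder values throughout
    "creators": ["Example Agency"],
    "date": "2018-08-02",
    "description": "One row per observation, with date and location.",
    "licence": "",                               # optional, left blank here
}

if __name__ == "__main__":
    print("Missing:", missing_fields(record))
    print(json.dumps(record, indent=2))
```

Writing the record as structured data makes gaps visible early and gives the class something concrete to compare against the original descriptions.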
Data description
How did you go? What did you learn?
What do you think makes a ‘good’ metadata record?
Here are the original metadata descriptions for your csv datasets:
CSV dataset #1 - https://data.qld.gov.au/dataset/crocodile-sightings-in-queensland
- subset https://data.qld.gov.au/dataset/crocodile-sightings-in-queensland/resource/6b0e71dd-4148-4934-b919-d50935d14417
CSV dataset #2 – https://data.qld.gov.au/dataset/koala-hospital-data
Types of metadata
Metadata elements can describe either a single item or a collection, and can serve different
purposes. Examples of metadata for a photograph could include:
• descriptive metadata, such as the name of the photographer, the location and subject of the
photograph, the date and time that the photograph was taken
• technical metadata, such as the type of camera used to take the photograph, the file format in
which the photograph is stored, the exposure time and dimensions of the photograph, and so
on
• access and rights metadata, defining who is allowed to view the photograph under what
conditions, and what they can do with it (reuse)
• preservation metadata, which keeps track of actions taken to preserve or sustain the
photograph for later access and use.
Source: ANDS website
Where does metadata come from?
• Metadata can be created manually by people or automatically by instruments or computers.
• Metadata capture is easiest if it is automatically generated when the data is created, for
example, the metadata your camera captures every time you take a photo.
• For much research data, the researcher needs to create the descriptive and provenance
metadata, as only they have that information.
When should metadata for research data be created?
Whenever it is needed, particularly:
• During the course of data collection
• When the data changes
• And at the end when the data is deposited and ‘published’
Where is metadata stored?
• Metadata can be stored in local source systems like repositories – often with the data it is
about.
• Metadata that enables research data to be discovered and accessed should be published in
discovery portals like Research Data Australia, or in discipline or institutional portals.
• Metadata that gives detailed contextual information and supports reuse, such as data-item-level
metadata, workflows, analysis, and detailed methods information, is usually stored with the
data.
The power of rich metadata
• Well-described metadata records show the power of rich metadata in making research data
collections discoverable, citable, reusable and accessible for the long term.
• Two-Rocks moorings data 2004 - 2005 metadata record in the CSIRO Data Access Portal
contains 35 metadata fields which enable researchers to quickly and accurately assess the
relevance of this dataset to their research. The metadata record and the data are closely
linked through co-location on the same access page. The Files tab contains additional
metadata about each of the 17 files within this collection: file type, last modified, and file size.
• Rich metadata allows records to be syndicated to other data catalogues; here is the same Two-
Rocks mooring data record syndicated to:
• Research Data Australia: Australia’s aggregated research data catalogue
• Marlin Oceans and Atmosphere: a discipline-specific metadata catalogue
Metadata standards
A metadata standard is a schema that has been formally approved and published, with governance
procedures in place to maintain and update the standard.
Examples:
Dublin Core - http://dublincore.org/documents/dces/
RIF-CS – http://services.ands.org.au/documentation/rifcs/1.6/guidelines/rif-cs.html
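To make the idea of a standard concrete, here is a hedged sketch that serialises a record using Dublin Core element names and the Dublin Core elements namespace. The `metadata` wrapper element and the sample values are assumptions made for illustration; real repositories usually wrap the elements in their own container format.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(fields: dict) -> ET.Element:
    """Build a wrapper element with one dc:* child per value.

    `fields` maps a Dublin Core element name (e.g. 'title') to a list of
    values, because Dublin Core elements are repeatable.
    """
    root = ET.Element("metadata")  # wrapper name is an arbitrary choice
    for name, values in fields.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return root


if __name__ == "__main__":
    record = {
        "title": ["Example wildlife observations"],
        "creator": ["Example Agency"],
        "date": ["2018-08-02"],
        "description": ["One row per observation."],
    }
    print(ET.tostring(to_dublin_core(record), encoding="unicode"))
```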
How do I find metadata standards?
• See this disciplinary metadata directory
• Also see https://data.research.cornell.edu/content/writing-metadata
Let’s delve a little deeper into how this works….
Vocabularies and research data
• A vocabulary sets out the common language a discipline has agreed to use to refer to concepts of
interest in that discipline.
• Researchers planning observations or surveys need to define their data items clearly.
• An agreed vocabulary (a standard) makes a good starting point for translating concepts into other
vocabularies so that collaboration can occur.
• Indexing vocabularies are used to tag items in library catalogues and search portals and to provide
keywords for academic journal articles.
• A vocabulary service is a machine-to-machine service that can support activities such as creating,
managing and querying vocabularies.
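As a rough illustration of why an agreed vocabulary helps, the sketch below maps free-text keywords onto preferred terms. The two-entry vocabulary is made up for this example (using species scientific names as preferred terms); a real project would query a vocabulary service rather than hard-code a dictionary.

```python
# Hypothetical mini-vocabulary: preferred term -> accepted variants (lowercase).
VOCABULARY = {
    "Crocodylus porosus": {"saltwater crocodile", "estuarine crocodile", "saltie"},
    "Phascolarctos cinereus": {"koala"},
}


def preferred_term(keyword: str):
    """Map a free-text keyword to its preferred term, or None if unknown."""
    needle = keyword.strip().lower()
    for term, variants in VOCABULARY.items():
        if needle == term.lower() or needle in variants:
            return term
    return None


if __name__ == "__main__":
    for word in ("Saltie", "koala", "wombat"):
        print(word, "->", preferred_term(word))
```

Tagging records with the preferred term means a search for any variant can find the same datasets.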
Want more?
• Read the ANDS Guide on vocabularies for research data
• Explore Research Vocabularies Australia
• Check out the COAR vocabularies
m-2-m metadata exchange
m-2-m (machine-to-machine) = networked devices exchanging information and performing actions without the manual assistance of humans.
Examples:
OAI-PMH – also known as metadata harvesting and commonly used by repositories and repository
aggregators
APIs – such as the ORCID API that enables things like authenticating against the ORCID registry
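A small sketch of what an OAI-PMH harvest involves: a `ListRecords` request asking for the `oai_dc` metadata format, and a parser that pulls identifiers and titles from the XML response. The base URL and the sample response are invented for illustration, and no network call is made; a real harvester would also need to handle resumption tokens and errors.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def harvest_titles(response_xml: str) -> list:
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iter(f"{OAI_NS}record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        title = record.findtext(f".//{DC_NS}title")
        pairs.append((identifier, title))
    return pairs


# Invented response, trimmed to the parts this sketch reads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    print(harvest_titles(SAMPLE_RESPONSE))
```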
Want more?
Have a go at these Things:
Thing 4 – data discovery
Thing 11 – what’s my metadata schema?
Thing 12 – vocabularies for data description
Thing 13 – walk the crosswalk
Data repositories
Slide credit (above): Amanda Rinehart, Illinois State University
https://www.slideserve.com/lisle/amanda-rinehart-data-librarian-illinois-state-university
Why are data repositories a key part of scholarly communications?
Exploring data repositories
Exercise:
1. Go to re3data.org
2. Find a repository
3. Explore the records and
features of the repository
4. Share what you found in the class discussion
Bonus Q: what do you think makes a repository trustworthy?
Who knows what this is?
With the exception of third party images or where otherwise indicated, this work is licensed under the Creative
Commons 4.0 International Attribution Licence.
ANDS, Nectar and RDS are supported by the Australian Government through the National Collaborative Research
Infrastructure Strategy Program (NCRIS).
Natasha Simons
Associate Director, Skilled Workforce | Australian Research Data Commons
Industry Fellow | The University of Queensland
T: +61 7 3346 9991 | E: natasha.simons@ands.org.au | W: ands.org.au
ORCID: https://orcid.org/0000-0003-0635-1998 Tw: @n_simons
Thank you!
