SlideShare a Scribd company logo
Dataverse and DataTags
Mercè Crosas, Ph.D.
Chief Data Science and Technology Officer
Institute for Quantitive Social Science
Harvard University
@mercecrosas
NFAIS Open Data Fostering Open Science
June 20, 2016
“Research	
  data	
  publishing	
  is	
  the	
  release	
  of	
  research	
  data,	
  associated	
  metadata,	
  accompanying	
  documenta8on,	
  and	
  so9ware	
  code	
  (in	
  cases	
  where	
  the	
  raw	
  data	
  have	
  been	
  processed	
  or	
  manipulated)	
  for	
  re-­‐use	
  and	
  
analysis	
  in	
  such	
  a	
  manner	
  that	
  they	
  can	
  be	
  discovered	
  on	
  the	
  Web	
  and	
  referred	
  to	
  in	
  a	
  unique	
  and	
  persistent	
  way.	
  
“Research	
  data	
  publishing	
  is	
  the	
  release	
  of	
  research	
  
data,	
  associated	
  metadata,	
  accompanying	
  
documenta8on,	
  and	
  so9ware	
  code	
  (in	
  cases	
  where	
  the	
  
raw	
  data	
  have	
  been	
  processed	
  or	
  manipulated)	
  for	
  re-­‐
use	
  and	
  analysis	
  in	
  such	
  a	
  manner	
  that	
  they	
  can	
  be	
  
discovered	
  on	
  the	
  Web	
  and	
  referred	
  to	
  in	
  a	
  unique	
  and	
  
persistent	
  way.”
RDA Data Publishing Workflows Working Group; 10.5281/zenodo.34542
Data Publishing is sharing data
that are:
Findable Accessible
Interoperable Reusable
Why publish data?
Researchers
Get credit for their
data
Publishers and
Journals
Verify published work
Federal funding
agencies
Make public assets
public
Science
Validate, reuse and
extend previous work
Ways of Publishing Data
Scholarly Article
Data in
Repository
Data
Descriptor
or Data Paper
Data in
Repository
Published
Dataset
in Repository
Scholarly Article
Scholarly Article
Journal’s
data policy
A data repository system for sharing and
archiving research data
A Solution for Publishing FAIR research data:
Findable,Accessible, Interoperable, Reusable
http://dataverse.org
Created and developed at Harvard’s Institute
for Quantitative Social Science
Harvard Dataverse:
Generic data repository open
to researchers world wide
http://dataverse.harvard.edu
Dataverse Today:A growing Community
Dataverse Project:
• Dataverse installations:19; serving >
200 Universities
• User Community group: 294
members
• Open-source software: 29
contributors
• Dataverse Community Meeting (July,
2016):107 registered, so far
• Twitter: 2940 followers
Harvard Dataverse Repository:
• Registered users: 13,795; 300
new per month
• Dataverses: 1,677; 50 new per
month
• Journal Dataverses: 91
• Datasets: 61,781; 400 new per
month
• Data Files: 330,462; 3,000 new
per month
Dataverses contain datasets or dataverses
Datasets contain metadata and data files
Dataverse follows best practices
for FAIR Data Publishing
Best Practices
Data Citation Metadata
Access
Control and
Rules
Reference,
locate and
attribute
Discover and
reuse
Access
protecting
privacy
APIs and
Standards
Interoperate
Data Citation in Dataverse
PublishedYear Dataset Title
Global
Persistent
Identifier
Repository
= Data Publisher
Version (or time
range)
Authors
Data Citation Basics
Force11, Joint Declaration of Data Citation Principles, 2014; Starr et al, 2015
The dataset landing page is accessible and guaranteed by the
repository (data publisher), even when data are restricted or
deaccessioned
Metadata in Dataverse
Citation Metadata author, title, repository, year
published, version, etc
Dublin Core
DataCite
Domain-specific
Metadata
data collection info (methods,
organism, observation, survey,
experiment, etc)
DDI (social sciences)
ISA-Tab BioCaddie (biomed)
Virtual Observatory (astro)
+ Custom metadata blocks
File-level Metadata
metadata inside the data file
(variables, instrument details,
geospatial info, etc)
DDI (for variables),
+ more to be determined
Fields StandardsMetadata Level
DataverseJSONSchema
Tiered Access
Open (default):
CC0
Open Open Click to Download
GuestBook Open Open
Fill in guestbook before
download
Terms of Use Open Open
Click through terms of
use before download
Data Restricted Open Restricted Request Access via
click through
Data Restricted Open Restricted
Request Access via
application
Metadata Files How to Access
Data Publishing Workflows
Create Dataset
(landing page
restricted)
Publish v. 1
Minor change
(metadata only)
Publish v. 1.1
Major change
(might include new
data file)
Publish v. 2
Review
(collaborators or
anonymous review)
Learn more at dataverse.org guides
Privacy tools to
share sensitive data
Data provenance
Biomedical large-
scale data
Social Science
Big Data
Journal articles
connected to data
Data Privacy
Current Research Grants
How can we maximize data
publishing of sensitive data
while being mindful of privacy?
Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with Confidence:The DataTags System.
Technology Science. 2015101601. October 16, 2015. http://techscience.org/a/2015101601
The DataTags System
A datatag is a set of security features and
access requirements for file handling.
A datatags repository is one that stores and
shares data files in accordance with a
standardized and ordered levels of security
and access requirements
Datatags&Levels&
Tag$Type$ Descrip-on$ Security$Features$ Access$Requirements$
Blue$ Public& Clear&storage&
Clear&transmission&
&
Open&
Green$ Controlled$
public&
Clear&storage&
Clear&transmission&
Email,&OAuth&verified&
registra:on&
Yellow$ Accountable& Clear&storage&
Encrypted&transmit&
Password,&Registered&,&
Approval,&Click&DUA&
Orange$ More$
accountable&
Encrypted&storage&
Encrypted&transmit&
Password,&Registered,&
Approval,&Signed&DUA&
Red$ Fully$
accountable&
Encrypted&storage&
Encrypted&transmit&
TwoDfactor&authen:ca:on,&
Approval,&Signed&DUA&
Crimson$ Maximally$
restricted&
Mul:Encrypt&store&
Encrypted&transmit&
TwoDfactor&authen:ca:on,&
Approval,&Signed&DUA&
DataTags Workflow in a Dataverse Repository
(under development)
Data$File$
Inges-on$
Sensi-ve$
Dataset$
Direct$
Access$
Privacy$
Preserving$
Access$
Automa-c$
Interview$$
Review$Board$
Approval$
http://datatags.org
http://privacytools.seas.harvard.edu
Two-factor
Authentication;
Signed DUA
Example of DataTags Interview:
A sequence of questions from an expert system
Example of DataTags Interview:
Final datatag human-readable and machine-actionable policy
Summary
• Data sharing is good for researchers, journals, funding
agencies, and science
• Dataverse is an open-source software for building data
repositories to share research data
• Data citation and rich metadata support are key to
Dataverse, and enable FAIR data publishing
• Dataverse also supports tiered access to data and data
publishing review and versioning workflows
• DataTags generates human-readable and machine-actionable
policies to support sensitive datasets in data repositories
Join us to this year’s Dataverse
Community Meeting
References
@mercecrosas and http://scholar.harvard.edu/mercecrosas
http://dataverse.org
http://dataverse.harvard.edu
http://datatags.org
Wilkinson, et al, 2016,The FAIR Guiding Principles for Scientific Data Management
and Stewardship, Scientific Data
Altman, Borgman, Crosas, Martone, 2015,An Introduction to the Joint Data Citation
Principles, Bulletin of the Association for Information Science and Technology
Starr et al, 2015, Achieving Human and Machine Accessibility of Cited Data in
Scholarly Publications, PeerJ Computer Science
Meyer et al, 2016, Data Publication with the Structural Biology Grid Supports Live
Analysis, Nature Communications
Sweeney, Crosas, Bar-Sinai. 2015, Sharing Sensitive Data with Confidence:The
DataTags System.Technology Science

More Related Content

What's hot

Preparing Data for Sharing: The FAIR Principles
Preparing Data for Sharing: The FAIR PrinciplesPreparing Data for Sharing: The FAIR Principles
Preparing Data for Sharing: The FAIR Principles
London School of Hygiene and Tropical Medicine
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
Carole Goble
 
Data Citation Implementation at Dataverse
Data Citation Implementation at DataverseData Citation Implementation at Dataverse
Data Citation Implementation at Dataverse
Merce Crosas
 
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Tom Plasterer
 
dkNET Poster ENDO 2016
dkNET Poster ENDO 2016 dkNET Poster ENDO 2016
dkNET Poster ENDO 2016
dkNET
 
Dataverse on the MOC
Dataverse on the MOCDataverse on the MOC
Dataverse on the MOC
Merce Crosas
 
BioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageBioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative Advantage
Tom Plasterer
 
Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck
Todd Vision
 
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Tom Plasterer
 
The blessing and the curse: handshaking between general and specialist data r...
The blessing and the curse: handshaking between general and specialist data r...The blessing and the curse: handshaking between general and specialist data r...
The blessing and the curse: handshaking between general and specialist data r...
Hilmar Lapp
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management EcosystemJohn Kunze
 
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
dkNET
 
dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017
dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017
dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017
dkNET
 
NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016
Susanna-Assunta Sansone
 
Linked Data for Biopharma
Linked Data for BiopharmaLinked Data for Biopharma
Linked Data for Biopharma
Tom Plasterer
 
Data Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim ClarkData Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim Clark
datascienceiqss
 
dkNET Introductory Webinar 05/10/2017
dkNET Introductory Webinar 05/10/2017dkNET Introductory Webinar 05/10/2017
dkNET Introductory Webinar 05/10/2017
dkNET
 
Collaboratively creating a network of ideas, data and software
Collaboratively creating a network of ideas, data and softwareCollaboratively creating a network of ideas, data and software
Collaboratively creating a network of ideas, data and software
Anita de Waard
 
The expanding dataverse
The expanding dataverseThe expanding dataverse
The expanding dataverse
Merce Crosas
 
RDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuseRDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuse
ASIS&T
 

What's hot (20)

Preparing Data for Sharing: The FAIR Principles
Preparing Data for Sharing: The FAIR PrinciplesPreparing Data for Sharing: The FAIR Principles
Preparing Data for Sharing: The FAIR Principles
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
 
Data Citation Implementation at Dataverse
Data Citation Implementation at DataverseData Citation Implementation at Dataverse
Data Citation Implementation at Dataverse
 
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
Edge Informatics and FAIR (Findable, Accessible, Interoperable and Reusable) ...
 
dkNET Poster ENDO 2016
dkNET Poster ENDO 2016 dkNET Poster ENDO 2016
dkNET Poster ENDO 2016
 
Dataverse on the MOC
Dataverse on the MOCDataverse on the MOC
Dataverse on the MOC
 
BioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative AdvantageBioPharma and FAIR Data, a Collaborative Advantage
BioPharma and FAIR Data, a Collaborative Advantage
 
Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck Leveraging publication metadata to help overcome the data ingest bottleneck
Leveraging publication metadata to help overcome the data ingest bottleneck
 
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
Harnessing Edge Informatics to Accelerate Collaboration in BioPharma (Bio-IT ...
 
The blessing and the curse: handshaking between general and specialist data r...
The blessing and the curse: handshaking between general and specialist data r...The blessing and the curse: handshaking between general and specialist data r...
The blessing and the curse: handshaking between general and specialist data r...
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management Ecosystem
 
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
bioCADDIE Webinar: The NIDDK Information Network (dkNET) - A Community Resear...
 
dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017
dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017
dkNET-NURSA Challenge Kick-Off Webinar 04/27/2017
 
NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016NIH BD2K DataMed metadata model - Force11, 2016
NIH BD2K DataMed metadata model - Force11, 2016
 
Linked Data for Biopharma
Linked Data for BiopharmaLinked Data for Biopharma
Linked Data for Biopharma
 
Data Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim ClarkData Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim Clark
 
dkNET Introductory Webinar 05/10/2017
dkNET Introductory Webinar 05/10/2017dkNET Introductory Webinar 05/10/2017
dkNET Introductory Webinar 05/10/2017
 
Collaboratively creating a network of ideas, data and software
Collaboratively creating a network of ideas, data and softwareCollaboratively creating a network of ideas, data and software
Collaboratively creating a network of ideas, data and software
 
The expanding dataverse
The expanding dataverseThe expanding dataverse
The expanding dataverse
 
RDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuseRDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuse
 

Similar to Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and DataTags

Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access Symposium
Merce Crosas
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data Sharing
Merce Crosas
 
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
ARDC
 
Research Data Management and Librarians
Research Data Management and LibrariansResearch Data Management and Librarians
Research Data Management and Librarians
Johann van Wyk
 
SHARE Update for CNI, Fall 2014
SHARE Update for CNI, Fall 2014SHARE Update for CNI, Fall 2014
SHARE Update for CNI, Fall 2014
SHARE
 
Jonathan David Crabtree - The Dataverse Community: Supporting Open Science an...
Jonathan David Crabtree - The Dataverse Community: Supporting Open Science an...Jonathan David Crabtree - The Dataverse Community: Supporting Open Science an...
Jonathan David Crabtree - The Dataverse Community: Supporting Open Science an...
SciELO - Scientific Electronic Library Online
 
Fsci 2018 thursday2_august_am6
Fsci 2018 thursday2_august_am6Fsci 2018 thursday2_august_am6
Fsci 2018 thursday2_august_am6
ARDC
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
Carole Goble
 
Managing, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital EnvironmentManaging, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital Environment
philipdurbin
 
Research Integrity Advisor and Data Management
Research Integrity Advisor and Data ManagementResearch Integrity Advisor and Data Management
Research Integrity Advisor and Data Management
ARDC
 
Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & ReuseLaurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
GigaScience, BGI Hong Kong
 
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Natsuko Nicholls
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practice
Carole Goble
 
SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...
Fiona Nielsen
 
Natasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptxNatasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptx
ARDC
 
FAIR Ddata in trustworthy repositories: the basics
FAIR Ddata in trustworthy repositories: the basicsFAIR Ddata in trustworthy repositories: the basics
FAIR Ddata in trustworthy repositories: the basics
OpenAIRE
 
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
DeVonne Parks, CEM
 
It summit dataverse-bigdata-mercecrosas
It summit dataverse-bigdata-mercecrosasIt summit dataverse-bigdata-mercecrosas
It summit dataverse-bigdata-mercecrosas
kevin_donovan
 
DataONE Education Module 08: Data Citation
DataONE Education Module 08: Data CitationDataONE Education Module 08: Data Citation
DataONE Education Module 08: Data Citation
DataONE
 
Managing your data paget
Managing your data pagetManaging your data paget
Managing your data paget
TERN Australia
 

Similar to Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and DataTags (20)

Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access Symposium
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data Sharing
 
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
Research Data Management in practice, RIA Data Management Workshop Brisbane 2017
 
Research Data Management and Librarians
Research Data Management and LibrariansResearch Data Management and Librarians
Research Data Management and Librarians
 
SHARE Update for CNI, Fall 2014
SHARE Update for CNI, Fall 2014SHARE Update for CNI, Fall 2014
SHARE Update for CNI, Fall 2014
 
Jonathan David Crabtree - The Dataverse Community: Supporting Open Science an...
Jonathan David Crabtree - The Dataverse Community: Supporting Open Science an...Jonathan David Crabtree - The Dataverse Community: Supporting Open Science an...
Jonathan David Crabtree - The Dataverse Community: Supporting Open Science an...
 
Fsci 2018 thursday2_august_am6
Fsci 2018 thursday2_august_am6Fsci 2018 thursday2_august_am6
Fsci 2018 thursday2_august_am6
 
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
FAIR Software (and Data) Citation: Europe, Research Object Systems, Networks ...
 
Managing, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital EnvironmentManaging, Sharing and Curating Your Research Data in a Digital Environment
Managing, Sharing and Curating Your Research Data in a Digital Environment
 
Research Integrity Advisor and Data Management
Research Integrity Advisor and Data ManagementResearch Integrity Advisor and Data Management
Research Integrity Advisor and Data Management
 
Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & ReuseLaurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
Laurie Goodman at NDIC: Big Data Publishing, Handling & Reuse
 
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
Enriching Scholarship 2014 Beyond the Journal Article: Publishing and Citing ...
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practice
 
SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...SciDataCon - How to increase accessibility and reuse for clinical and persona...
SciDataCon - How to increase accessibility and reuse for clinical and persona...
 
Natasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptxNatasha intro to rdm c3 dis may 2018.pptx
Natasha intro to rdm c3 dis may 2018.pptx
 
FAIR Ddata in trustworthy repositories: the basics
FAIR Ddata in trustworthy repositories: the basicsFAIR Ddata in trustworthy repositories: the basics
FAIR Ddata in trustworthy repositories: the basics
 
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
 
It summit dataverse-bigdata-mercecrosas
It summit dataverse-bigdata-mercecrosasIt summit dataverse-bigdata-mercecrosas
It summit dataverse-bigdata-mercecrosas
 
DataONE Education Module 08: Data Citation
DataONE Education Module 08: Data CitationDataONE Education Module 08: Data Citation
DataONE Education Module 08: Data Citation
 
Managing your data paget
Managing your data pagetManaging your data paget
Managing your data paget
 

More from Merce Crosas

Practical Implementation of research data policies: Solutions with Dataverse
Practical Implementation of research data policies: Solutions with DataversePractical Implementation of research data policies: Solutions with Dataverse
Practical Implementation of research data policies: Solutions with Dataverse
Merce Crosas
 
Research Data Management @Harvard
Research Data Management @HarvardResearch Data Management @Harvard
Research Data Management @Harvard
Merce Crosas
 
Cloud Dataverse: A Data repository platform for an OpenStack Cloud
Cloud Dataverse: A Data repository platform for an OpenStack CloudCloud Dataverse: A Data repository platform for an OpenStack Cloud
Cloud Dataverse: A Data repository platform for an OpenStack Cloud
Merce Crosas
 
Can data access combat fake news?
Can data access combat fake news?Can data access combat fake news?
Can data access combat fake news?
Merce Crosas
 
The Data Lifecycle (Harvard DataFest)
The Data Lifecycle (Harvard DataFest)The Data Lifecycle (Harvard DataFest)
The Data Lifecycle (Harvard DataFest)
Merce Crosas
 
Cloud Dataverse
Cloud DataverseCloud Dataverse
Cloud Dataverse
Merce Crosas
 
Making Data Accessible
Making Data AccessibleMaking Data Accessible
Making Data Accessible
Merce Crosas
 
Abcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasAbcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosas
Merce Crosas
 
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
Merce Crosas
 
A very Brief History of Communicating Science
A very Brief History of Communicating ScienceA very Brief History of Communicating Science
A very Brief History of Communicating Science
Merce Crosas
 
Dataverse hpdm symposium
Dataverse   hpdm symposiumDataverse   hpdm symposium
Dataverse hpdm symposium
Merce Crosas
 
Collaboration in science and technology it summit
Collaboration in science and technology   it summitCollaboration in science and technology   it summit
Collaboration in science and technology it summit
Merce Crosas
 
Dataverse for Journals
Dataverse for JournalsDataverse for Journals
Dataverse for Journals
Merce Crosas
 
Collaboration in science and technology
Collaboration in science and technologyCollaboration in science and technology
Collaboration in science and technology
Merce Crosas
 
Force11 jddcp intro
Force11  jddcp introForce11  jddcp intro
Force11 jddcp intro
Merce Crosas
 

More from Merce Crosas (15)

Practical Implementation of research data policies: Solutions with Dataverse
Practical Implementation of research data policies: Solutions with DataversePractical Implementation of research data policies: Solutions with Dataverse
Practical Implementation of research data policies: Solutions with Dataverse
 
Research Data Management @Harvard
Research Data Management @HarvardResearch Data Management @Harvard
Research Data Management @Harvard
 
Cloud Dataverse: A Data repository platform for an OpenStack Cloud
Cloud Dataverse: A Data repository platform for an OpenStack CloudCloud Dataverse: A Data repository platform for an OpenStack Cloud
Cloud Dataverse: A Data repository platform for an OpenStack Cloud
 
Can data access combat fake news?
Can data access combat fake news?Can data access combat fake news?
Can data access combat fake news?
 
The Data Lifecycle (Harvard DataFest)
The Data Lifecycle (Harvard DataFest)The Data Lifecycle (Harvard DataFest)
The Data Lifecycle (Harvard DataFest)
 
Cloud Dataverse
Cloud DataverseCloud Dataverse
Cloud Dataverse
 
Making Data Accessible
Making Data AccessibleMaking Data Accessible
Making Data Accessible
 
Abcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasAbcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosas
 
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
 
A very Brief History of Communicating Science
A very Brief History of Communicating ScienceA very Brief History of Communicating Science
A very Brief History of Communicating Science
 
Dataverse hpdm symposium
Dataverse   hpdm symposiumDataverse   hpdm symposium
Dataverse hpdm symposium
 
Collaboration in science and technology it summit
Collaboration in science and technology   it summitCollaboration in science and technology   it summit
Collaboration in science and technology it summit
 
Dataverse for Journals
Dataverse for JournalsDataverse for Journals
Dataverse for Journals
 
Collaboration in science and technology
Collaboration in science and technologyCollaboration in science and technology
Collaboration in science and technology
 
Force11 jddcp intro
Force11  jddcp introForce11  jddcp intro
Force11 jddcp intro
 

Recently uploaded

Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Subhajit Sahu
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
mbawufebxi
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
dwreak4tg
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
rwarrenll
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 

Recently uploaded (20)

Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
Algorithmic optimizations for Dynamic Levelwise PageRank (from STICD) : SHORT...
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
一比一原版(BCU毕业证书)伯明翰城市大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.My burning issue is homelessness K.C.M.O.
My burning issue is homelessness K.C.M.O.
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 

Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and DataTags

  • 1. Dataverse and DataTags Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitive Social Science Harvard University @mercecrosas NFAIS Open Data Fostering Open Science June 20, 2016
  • 2. “Research  data  publishing  is  the  release  of  research  data,  associated  metadata,  accompanying  documenta8on,  and  so9ware  code  (in  cases  where  the  raw  data  have  been  processed  or  manipulated)  for  re-­‐use  and   analysis  in  such  a  manner  that  they  can  be  discovered  on  the  Web  and  referred  to  in  a  unique  and  persistent  way.   “Research  data  publishing  is  the  release  of  research   data,  associated  metadata,  accompanying   documenta8on,  and  so9ware  code  (in  cases  where  the   raw  data  have  been  processed  or  manipulated)  for  re-­‐ use  and  analysis  in  such  a  manner  that  they  can  be   discovered  on  the  Web  and  referred  to  in  a  unique  and   persistent  way.” RDA Data Publishing Workflows Working Group; 10.5281/zenodo.34542
  • 3. Data Publishing is sharing data that are: Findable Accessible Interoperable Reusable
  • 4. Why publish data? Researchers Get credit for their data Publishers and Journals Verify published work Federal funding agencies Make public assets public Science Validate, reuse and extend previous work
  • 5. Ways of Publishing Data Scholarly Article Data in Repository Data Descriptor or Data Paper Data in Repository Published Dataset in Repository Scholarly Article Scholarly Article Journal’s data policy
  • 6. A data repository system for sharing and archiving research data A Solution for Publishing FAIR research data: Findable,Accessible, Interoperable, Reusable
  • 7. http://dataverse.org Created and developed at Harvard’s Institute for Quantitative Social Science Harvard Dataverse: Generic data repository open to researchers world wide http://dataverse.harvard.edu
  • 8. Dataverse Today:A growing Community Dataverse Project: • Dataverse installations:19; serving > 200 Universities • User Community group: 294 members • Open-source software: 29 contributors • Dataverse Community Meeting (July, 2016):107 registered, so far • Twitter: 2940 followers Harvard Dataverse Repository: • Registered users: 13,795; 300 new per month • Dataverses: 1,677; 50 new per month • Journal Dataverses: 91 • Datasets: 61,781; 400 new per month • Data Files: 330,462; 3,000 new per month
  • 9. Dataverses contain datasets or dataverses Datasets contain metadata and data files
  • 10. Dataverse follows best practices for FAIR Data Publishing
  • 11. Best Practices Data Citation Metadata Access Control and Rules Reference, locate and attribute Discover and reuse Access protecting privacy APIs and Standards Interoperate
  • 12. Data Citation in Dataverse PublishedYear Dataset Title Global Persistent Identifier Repository = Data Publisher Version (or time range) Authors
  • 13. Data Citation Basics Force11, Joint Declaration of Data Citation Principles, 2014; Starr et al, 2015 The dataset landing page is accessible and guaranteed by the repository (data publisher), even when data are restricted or deaccessioned
  • 14. Metadata in Dataverse Citation Metadata author, title, repository, year published, version, etc Dublin Core DataCite Domain-specific Metadata data collection info (methods, organism, observation, survey, experiment, etc) DDI (social sciences) ISA-Tab BioCaddie (biomed) Virtual Observatory (astro) + Custom metadata blocks File-level Metadata metadata inside the data file (variables, instrument details, geospatial info, etc) DDI (for variables), + more to be determined Fields StandardsMetadata Level DataverseJSONSchema
  • 15. Tiered Access Open (default): CC0 Open Open Click to Download GuestBook Open Open Fill in guestbook before download Terms of Use Open Open Click through terms of use before download Data Restricted Open Restricted Request Access via click through Data Restricted Open Restricted Request Access via application Metadata Files How to Access
  • 16. Data Publishing Workflows Create Dataset (landing page restricted) Publish v. 1 Minor change (metadata only) Publish v. 1.1 Major change (might include new data file) Publish v. 2 Review (collaborators or anonymous review)
  • 17. Learn more at dataverse.org guides
  • 18. Privacy tools to share sensitive data Data provenance Biomedical large- scale data Social Science Big Data Journal articles connected to data Data Privacy Current Research Grants
  • 19. How can we maximize data publishing of sensitive data while being mindful of privacy?
  • 20. Sweeney L, Crosas M, Bar-Sinai M. Sharing Sensitive Data with Confidence:The DataTags System. Technology Science. 2015101601. October 16, 2015. http://techscience.org/a/2015101601 The DataTags System
  • 21. A datatag is a set of security features and access requirements for file handling. A datatags repository is one that stores and shares data files in accordance with a standardized and ordered levels of security and access requirements
  • 22. Datatags&Levels& Tag$Type$ Descrip-on$ Security$Features$ Access$Requirements$ Blue$ Public& Clear&storage& Clear&transmission& & Open& Green$ Controlled$ public& Clear&storage& Clear&transmission& Email,&OAuth&verified& registra:on& Yellow$ Accountable& Clear&storage& Encrypted&transmit& Password,&Registered&,& Approval,&Click&DUA& Orange$ More$ accountable& Encrypted&storage& Encrypted&transmit& Password,&Registered,& Approval,&Signed&DUA& Red$ Fully$ accountable& Encrypted&storage& Encrypted&transmit& TwoDfactor&authen:ca:on,& Approval,&Signed&DUA& Crimson$ Maximally$ restricted& Mul:Encrypt&store& Encrypted&transmit& TwoDfactor&authen:ca:on,& Approval,&Signed&DUA&
  • 23. DataTags Workflow in a Dataverse Repository (under development) Data$File$ Inges-on$ Sensi-ve$ Dataset$ Direct$ Access$ Privacy$ Preserving$ Access$ Automa-c$ Interview$$ Review$Board$ Approval$ http://datatags.org http://privacytools.seas.harvard.edu Two-factor Authentication; Signed DUA
  • 24. Example of DataTags Interview: A sequence of questions from an expert system
  • 25. Example of DataTags Interview: Final datatag human-readable and machine-actionable policy
  • 26. Summary • Data sharing is good for researchers, journals, funding agencies, and science • Dataverse is an open-source software for building data repositories to share research data • Data citation and rich metadata support are key to Dataverse, and enable FAIR data publishing • Dataverse also supports tiered access to data and data publishing review and versioning workflows • DataTags generates human-readable and machine-actionable policies to support sensitive datasets in data repositories
  • 27. Join us to this year’s Dataverse Community Meeting
  • 28. References @mercecrosas and http://scholar.harvard.edu/mercecrosas http://dataverse.org http://dataverse.harvard.edu http://datatags.org Wilkinson, et al, 2016,The FAIR Guiding Principles for Scientific Data Management and Stewardship, Scientific Data Altman, Borgman, Crosas, Martone, 2015,An Introduction to the Joint Data Citation Principles, Bulletin of the Association for Information Science and Technology Starr et al, 2015, Achieving Human and Machine Accessibility of Cited Data in Scholarly Publications, PeerJ Computer Science Meyer et al, 2016, Data Publication with the Structural Biology Grid Supports Live Analysis, Nature Communications Sweeney, Crosas, Bar-Sinai. 2015, Sharing Sensitive Data with Confidence:The DataTags System.Technology Science