ADDRESSING THE NEXT CHALLENGES IN DATA SHARING:
LARGE-SCALE DATA AND SENSITIVE DATA
Mercè Crosas, Ph.D.
Chief Data Science and Technology Officer
Institute for Quantitative Social Science
Harvard University
@mercecrosas
  
Data sharing: good for you and good for the world
•  Researchers: Get credit for their data
•  Publishers and Journals: Verify published work
•  Federal funding agencies: Make public assets accessible
•  Science: Validate, reuse and extend previous work
  
Data Sharing (or Publishing)
A formal data citation
•  Reference
•  Access (persistent identifier)
Information about the data (metadata)
•  Discovery
•  Use
A trusted data repository
•  Access (long-term archival)
Data Sharing needs to support data discovery, referencing, access, and reuse
  	
dataverse.org
Open-source software developed at Harvard’s IQSS since 2006
Used to share, publish, cite and archive research data
Installed in 12 sites worldwide
Serving 100s of universities and organizations

Harvard Dataverse: dataverse.harvard.edu
Started as a community repository for Social Science
Now open to all research fields and all researchers
More than 1300 dataverses
More than 59,000 datasets
More than 1,400,000 downloads
  
Data Sharing with Dataverse
Now
•  No sensitive data
•  Seldom versioning
•  Datasets up to ~GB
The Next 5 Years
•  Highly-sensitive data
•  Streaming or frequently updated data
•  Datasets > GBs, TBs, PBs
–  Thousands of files per dataset
–  Large dataset in a Big Data, NoSQL storage (MongoDB, Cassandra, Lucene)
Large-scale data sharing needs to continue supporting discovery, referencing, access and reuse.
  
Adhering to the same high standards for large-scale data
•  Metadata for discovery:
–  citation metadata
–  domain-specific descriptive metadata
–  file-level or variable metadata
•  Data citation for reference and access:
–  for entire dataset and for subsets of the dataset (based on time of retrieval or variables selected)
•  Fast queries, data exploration and visualizations for reuse:
–  might not be able to download entire dataset
Data retrieval, explorations and visualizations of large-scale datasets require data repositories to be closer to computing resources.
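One way to read this: when a dataset is too large to download, the computation can run next to the storage and return only a small result. A minimal sketch, assuming the data can be read as an iterable of chunks; `summarize_in_chunks` and the toy data are hypothetical and stand in for whatever compute service sits beside the repository.

```python
from typing import Iterable, List


def summarize_in_chunks(chunks: Iterable[List[float]]) -> dict:
    """Compute count, mean, min and max in one pass without holding all values."""
    count = 0
    total = 0.0
    lowest = float("inf")
    highest = float("-inf")
    for chunk in chunks:
        for value in chunk:
            count += 1
            total += value
            lowest = min(lowest, value)
            highest = max(highest, value)
    if count == 0:
        return {"count": 0, "mean": None, "min": None, "max": None}
    return {"count": count, "mean": total / count, "min": lowest, "max": highest}


def toy_chunks():
    """Hypothetical stand-in for a large dataset read chunk by chunk."""
    yield [1.0, 2.0, 3.0]
    yield [4.0, 5.0]


summary = summarize_in_chunks(toy_chunks())
print(summary)
```

Only the summary dictionary would need to travel to the user, not the underlying values.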
  
Current collaborations to address the next challenges in data sharing
•  SB Grid Data Repository (HMS, IQSS)
•  Social Science Big Data (IQSS)
•  Data Provenance (SEAS, IQSS)
•  Privacy Tools to share sensitive data (SEAS, Berkman, Privacy Lab, IQSS, MIT)
  
Sharing and Preserving Large Structural Biology Data
Funded by [funder logo]
https://data.sbgrid.org/
Structural Biology Primary Data
1 Dataset is 180-360 images of X-ray diffraction data, 3.5-7 GB; ~1 TB per dataset, with a total up to 100 PBs
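The figures above imply a roughly constant size per image. A quick check in Python, assuming decimal units (1 GB = 1000 MB, 1 PB = 1000 TB) and assuming the low and high ends of the two ranges belong together; the variable names are illustrative.

```python
MB_PER_GB = 1000
TB_PER_PB = 1000

# 180 images at 3.5 GB and 360 images at 7 GB, the two ends of the stated ranges.
small_mb_per_image = 3.5 * MB_PER_GB / 180
large_mb_per_image = 7 * MB_PER_GB / 360

# How many ~1 TB units fit in the stated 100 PB total.
units_of_1tb = 100 * TB_PER_PB

print(round(small_mb_per_image, 1), round(large_mb_per_image, 1), units_of_1tb)
```

Under these assumptions both ends of the range work out to about 19.4 MB per image.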
  
Integration with Dataverse:
●  Long-term access
●  Formal Data Citation
●  Standard Metadata
●  Data Exploration (OME)
●  Preservation, with copies in multiple sites (following dataPASS approach)
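Keeping copies in multiple sites is most useful when the copies can be checked against each other. A minimal sketch of such a check using SHA-256 checksums; the site names and the `verify_replicas` helper are hypothetical and are not taken from any specific preservation tooling.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_replicas(replicas: dict) -> list:
    """Return the sites whose copy differs from the most common checksum.

    With an exact tie between checksums the choice of reference is arbitrary,
    so this sketch is only meaningful when a clear majority of copies agree.
    """
    sums = {site: checksum(data) for site, data in replicas.items()}
    values = list(sums.values())
    reference = max(set(values), key=values.count)
    return sorted(site for site, value in sums.items() if value != reference)


# Hypothetical replicas: one copy has a single changed byte.
replicas = {
    "site_a": b"image-bytes",
    "site_b": b"image-bytes",
    "site_c": b"image-bytez",
}
damaged = verify_replicas(replicas)
print(damaged)
```

A site reported here would be a candidate for repair from one of the agreeing copies.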
  
Dataverse on the Massachusetts Open Cloud (MOC): Computing closer to data storage
Current Architecture
•  UI Layer (PrimeFaces, js)
•  Application Logic (Java EE)
•  API
•  PostgreSQL (user data, metadata)
•  Solr (Index)
•  RServe (R ingest, analysis)
•  Network File System (data files)
On the MOC
•  COMPUTE SERVICES (R, Python, Spark, Hadoop, …)
•  CINDER block storage
•  SWIFT object storage
•  Dataverse:
–  UI Layer (PrimeFaces, js)
–  Application Logic (Java EE)
–  API
–  PostgreSQL (user data, metadata)
–  Solr (Index)
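A visible difference between the two diagrams is where data files live: a network file system in the current architecture, object storage on the MOC. A minimal sketch of how application logic could stay the same across both, assuming a small storage interface. `FileStore`, `DirectoryStore` and `InMemoryObjectStore` are hypothetical names, and the in-memory class only stands in for an object store; it does not call any real Swift client.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class FileStore(Protocol):
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class DirectoryStore:
    """Files under a directory, as on a mounted network file system."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryObjectStore:
    """Stand-in for an object store: flat keys mapped to byte blobs."""

    def __init__(self) -> None:
        self.objects: dict = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]


def roundtrip(store: FileStore, key: str, data: bytes) -> bytes:
    """Application code depends only on the interface, not on the backend."""
    store.put(key, data)
    return store.get(key)


with tempfile.TemporaryDirectory() as tmp:
    from_directory = roundtrip(DirectoryStore(Path(tmp)), "dataset1/file.tab", b"a\tb\n")
from_objects = roundtrip(InMemoryObjectStore(), "dataset1/file.tab", b"a\tb\n")
print(from_directory == from_objects)
```

With an interface like this, swapping the backend would not require changes in the code that calls `roundtrip`.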
  
Sharing Sensitive Data with Confidence: DataTags System
DataTag: A set of security features and access requirements for file handling
Sweeney, Crosas, Bar-Sinai, 2015, “Sharing Sensitive Data with Confidence: The DataTags System”, Technology Science
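Read literally, a DataTag is a named bundle of handling requirements. A minimal sketch of that idea as a data structure; the field names and the two example tags are hypothetical illustrations and are not the tag definitions from the cited paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataTag:
    name: str
    encrypt_in_storage: bool
    encrypt_in_transit: bool
    requires_login: bool
    requires_signed_dua: bool


# Hypothetical example tags, for illustration only.
OPEN = DataTag("open", False, False, False, False)
RESTRICTED = DataTag("restricted", True, True, True, True)


def unmet_requirements(tag: DataTag, logged_in: bool, dua_signed: bool) -> list:
    """List the access requirements of a tag that a request does not satisfy."""
    unmet = []
    if tag.requires_login and not logged_in:
        unmet.append("login")
    if tag.requires_signed_dua and not dua_signed:
        unmet.append("signed DUA")
    return unmet


print(unmet_requirements(RESTRICTED, logged_in=True, dua_signed=False))
```

Making the tag immutable (`frozen=True`) reflects the idea that a tag is a fixed policy label attached to a file, not something edited per request.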
  
Data Sharing Workflow for Sensitive Data
[Diagram: a Sensitive Dataset reached through Direct Access or Privacy Preserving Access; labels: Authorized, Signed DUA]
http://datatags.org
http://privacytools.seas.harvard.edu
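One possible reading of the workflow, sketched as a routing function: a request gets direct access only when the user is authorized and has signed a data use agreement (DUA), and otherwise falls back to privacy-preserving access. That fallback rule and the names below are assumptions made for illustration; the slide lists the two access modes but does not spell out the decision logic.

```python
from enum import Enum


class AccessMode(Enum):
    DIRECT = "direct access"
    PRIVACY_PRESERVING = "privacy-preserving access"


def route_request(authorized: bool, dua_signed: bool) -> AccessMode:
    """Hypothetical rule: both conditions are needed for direct access."""
    if authorized and dua_signed:
        return AccessMode.DIRECT
    return AccessMode.PRIVACY_PRESERVING


for authorized, dua_signed in [(True, True), (True, False), (False, False)]:
    print(authorized, dua_signed, route_request(authorized, dua_signed).value)
```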
  
THANKS
Piotrek Sliz (SBGrid, HMS), Latanya Sweeney (Data Privacy Lab, Harvard), Dataverse team (IQSS, Harvard)
@mercecrosas
  

Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareGraham Ware
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 

Recently uploaded (20)

+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 

Addressing the New Challenges in Data Sharing: Large-Scale Data and Sensitive Data

  • 1. ADDRESSING THE NEXT CHALLENGES IN DATA SHARING: LARGE-SCALE DATA AND SENSITIVE DATA. Mercè Crosas, Ph.D., Chief Data Science and Technology Officer, Institute for Quantitative Social Science, Harvard University. @mercecrosas
  • 2. Data sharing: good for you and good for the world. Researchers: get credit for their data. Publishers and journals: verify published work. Federal funding agencies: make public assets accessible. Science: validate, reuse and extend previous work.
  • 3. Data Sharing (or Publishing). A formal data citation: reference; access (persistent identifier). Information about the data (metadata): discovery; use. A trusted data repository: access (long-term archival). Data sharing needs to support data discovery, referencing, access, and reuse.
  • 4. dataverse.org. Open-source software developed at Harvard's IQSS since 2006. Used to share, publish, cite and archive research data. Installed in 12 sites worldwide. Serving 100s of universities and organizations.
  • 5. Harvard Dataverse: dataverse.harvard.edu. Started as a community repository for Social Science. Now open to all research fields and all researchers. More than 1300 dataverses. More than 59,000 datasets. More than 1,400,000 downloads.
  • 6. Data Sharing with Dataverse. Now: no sensitive data; seldom versioning; datasets up to ~GB. The next 5 years: highly sensitive data; streaming or frequently updated data; datasets > GBs, TBs, PBs (thousands of files per dataset; large dataset in a Big Data, NoSQL storage: MongoDB, Cassandra, Lucene).
  • 7. Large-scale data sharing needs to continue supporting discovery, referencing, access and reuse.
  • 8. Adhering to the same high standards for large-scale data. Metadata for discovery: citation metadata; domain-specific descriptive metadata; file-level or variable metadata. Data citation for reference and access: for the entire dataset and for subsets of the dataset (based on time of retrieval or variables selected). Fast queries, data exploration and visualizations for reuse: might not be able to download the entire dataset.
  • 9. Data retrieval, exploration and visualization of large-scale datasets require data repositories to be closer to computing resources.
  • 10. Current collaborations to address the next challenges in data sharing: SB Grid Data Repository (HMS, IQSS); Social Science Big Data (IQSS); Data Provenance (SEAS, IQSS); Privacy Tools to share sensitive data (SEAS, Berkman, Privacy Lab, IQSS, MIT).
  • 11. Sharing and Preserving Large Structural Biology Data. https://data.sbgrid.org/ Funded by [funder logo not preserved in the transcript].
  • 12. Structural Biology Primary Data. 1 dataset is 180-360 images of X-ray diffraction data, 3.5-7 GB; ~1 TB per dataset, with a total up to 100 PBs. Integration with Dataverse: long-term access; formal data citation; standard metadata; data exploration (OME); preservation, with copies in multiple sites (following dataPASS approach).
  • 13. Dataverse on the Massachusetts Open Cloud (MOC): computing closer to data storage. Current architecture: Network File System (data files); UI Layer (PrimeFaces, js); Application Logic (Java EE); API; PostgreSQL (user data, metadata); Solr (index); RServe (R ingest, analysis). On the MOC: compute services (R, Python, Spark, Hadoop, …); Cinder block storage; Swift object storage; Dataverse, made up of UI Layer (PrimeFaces, js), Application Logic (Java EE), API, PostgreSQL (user data, metadata) and Solr (index).
  • 14. Sharing Sensitive Data with Confidence: DataTags System. DataTag: a set of security features and access requirements for file handling. Sweeney, Crosas, Bar-Sinai, 2015, "Sharing Sensitive Data with Confidence: The DataTags System", Technology Science.
  • 15. Data Sharing Workflow for Sensitive Data. Sensitive dataset: direct access. Sensitive dataset: privacy-preserving access. Authorized; signed DUA. http://datatags.org http://privacytools.seas.harvard.edu
  • 16. THANKS. Piotrek Sliz (SBGrid, HMS), Latanya Sweeney (Data Privacy Lab, Harvard), Dataverse team (IQSS, Harvard). @mercecrosas
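
Slide 8 asks for data citations that cover subsets of a dataset, based on time of retrieval or variables selected. The Python sketch below is one possible way to represent such a citation and is not how Dataverse implements it. The `SubsetCitation` class, its field names, the `subset-id` token and the example DOI are all hypothetical, introduced only for illustration.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SubsetCitation:
    """Hypothetical record for citing a slice of a large dataset."""
    authors: str
    year: int
    title: str
    persistent_id: str      # identifier of the entire dataset (placeholder DOI below)
    variables: tuple        # names of the variables selected for the subset
    retrieved_at: datetime  # time of retrieval

    def fingerprint(self) -> str:
        # Hash the sorted variable names plus the retrieval time, so the same
        # selection retrieved at the same time always maps to the same token.
        key = "|".join(sorted(self.variables)) + "@" + self.retrieved_at.isoformat()
        return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]

    def render(self) -> str:
        # Assemble a human-readable citation: dataset-level part first,
        # then the subset-specific part.
        selected = ", ".join(sorted(self.variables))
        return (
            f'{self.authors}, {self.year}, "{self.title}", {self.persistent_id}, '
            f"subset [{selected}] retrieved {self.retrieved_at.isoformat()}, "
            f"subset-id:{self.fingerprint()}"
        )


if __name__ == "__main__":
    when = datetime(2015, 6, 1, 12, 0, tzinfo=timezone.utc)
    citation = SubsetCitation(
        authors="Doe, J.",
        year=2015,
        title="Example Survey Data",
        persistent_id="doi:10.0000/EXAMPLE",  # made-up identifier
        variables=("income", "age"),
        retrieved_at=when,
    )
    print(citation.render())
```

Sorting the variable names before hashing means the order in which a user picks variables does not change the subset identifier, while a different retrieval time does. That matches the slide's point that a subset is defined by both its selection and its time of retrieval.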