December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Part 1 Large Data Sets

ADDRESSING THE NEXT CHALLENGES IN DATA SHARING: LARGE-SCALE DATA AND SENSITIVE DATA
Mercè Crosas, Ph.D.
Chief Data Science and Technology Officer
Institute for Quantitative Social Science
Harvard University
@mercecrosas

Data sharing: good for you and good for the world
Researchers: Get credit for their data
Publishers and Journals: Verify published work
Federal funding agencies: Make public assets accessible
Science: Validate, reuse and extend previous work

Data Sharing (or Publishing)
A formal data citation
• Reference
• Access (persistent identifier)
Information about the data (metadata)
• Discovery
• Use
A trusted data repository
• Access (long-term archival)
Data sharing needs to support data discovery, referencing, access, and reuse.
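
A formal data citation pairs a human-readable reference with a persistent identifier that resolves to the data. The sketch below assembles one from a small metadata record. The field names, their order, the repository name, and the example DOI (under the 10.5072 test prefix) are illustrative assumptions, loosely modeled on the style of citation Dataverse displays rather than taken from its code. It needs Python 3.9+ for str.removeprefix.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DatasetCitation:
    """Hypothetical citation record; field names are illustrative only."""
    authors: Tuple[str, ...]
    year: int
    title: str
    persistent_id: str  # e.g. "doi:10.5072/FK2/EXAMPLE"
    repository: str
    version: str

    def resolver_url(self) -> str:
        # A DOI resolves through https://doi.org/; drop an optional "doi:" prefix.
        return "https://doi.org/" + self.persistent_id.removeprefix("doi:")

    def format(self) -> str:
        # Reference (who, when, what) followed by access (persistent identifier).
        names = "; ".join(self.authors)
        return (f'{names}, {self.year}, "{self.title}", '
                f"{self.resolver_url()}, {self.repository}, {self.version}")


citation = DatasetCitation(
    authors=("Doe, Jane", "Roe, Richard"),
    year=2015,
    title="Example Survey Data",
    persistent_id="doi:10.5072/FK2/EXAMPLE",
    repository="Example Dataverse",
    version="V1",
)
print(citation.format())
```

Keeping the identifier separate from the rendered string lets a repository change how citations look without changing what they point to.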

dataverse.org
Open-source software developed at Harvard’s IQSS since 2006
Used to share, publish, cite and archive research data
Installed in 12 sites worldwide
Serving 100s of universities and organizations
Harvard Dataverse: dataverse.harvard.edu
Started as a community repository for Social Science
Now open to all research fields and all researchers
More than 1,300 dataverses
More than 59,000 datasets
More than 1,400,000 downloads

Data Sharing with Dataverse
Now
• No sensitive data
• Seldom versioning
• Datasets up to ~GB
The Next 5 Years
• Highly-sensitive data
• Streaming or frequently updated data
• Datasets > GBs, TBs, PBs
– Thousands of files per dataset
– Large datasets in Big Data/NoSQL storage (MongoDB, Cassandra, Lucene)

Large-scale data sharing needs to continue supporting discovery, referencing, access and reuse.

Adhering to the same high standards for large-scale data
• Metadata for discovery:
– citation metadata
– domain-specific descriptive metadata
– file-level or variable metadata
• Data citation for reference and access:
– for the entire dataset and for subsets of the dataset (based on time of retrieval or variables selected)
• Fast queries, data exploration and visualizations for reuse:
– users might not be able to download the entire dataset
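
The slide asks for citations of subsets defined by time of retrieval or by selected variables. One way to make such a subset citable, sketched here under assumptions, is to attach the selection and a UTC retrieval timestamp to the parent citation, plus a hash of both so the exact query can be re-identified later. The bracketed format and the 12-character SHA-256 fingerprint are hypothetical choices, not a Dataverse feature.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Sequence


def subset_citation(parent: str, variables: Sequence[str],
                    retrieved_at: datetime) -> str:
    """Hypothetical citation for a subset of a large dataset.

    Appends the selected variables, the UTC retrieval time, and a short
    SHA-256 fingerprint of both, so the same selection made at the same
    time always yields the same citation string.
    """
    if retrieved_at.tzinfo is None:
        raise ValueError("retrieved_at must be timezone-aware")
    when = retrieved_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    selection = sorted(set(variables))  # order-insensitive, no duplicates
    query = json.dumps({"variables": selection, "retrieved_at": when},
                       sort_keys=True, separators=(",", ":"))
    fingerprint = hashlib.sha256(query.encode("utf-8")).hexdigest()[:12]
    return (f"{parent} [subset: {', '.join(selection)}; "
            f"retrieved {when}; query {fingerprint}]")


t = datetime(2015, 12, 9, 15, 0, tzinfo=timezone.utc)
print(subset_citation("Doe, Jane, 2015, Example Survey Data", ["income", "age"], t))
```

Sorting the variable list makes the fingerprint independent of the order in which a user picked variables; a real repository would also need to store the query so that a fingerprint can be resolved back to the data.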

Data retrieval, exploration and visualization of large-scale datasets require data repositories to be closer to computing resources.

Current collaborations to address the next challenges in data sharing
SBGrid Data Repository (HMS, IQSS)
Social Science Big Data (IQSS)
Data Provenance (SEAS, IQSS)
Privacy Tools to share sensitive data (SEAS, Berkman, Privacy Lab, IQSS, MIT)

Sharing and Preserving Large Structural Biology Data
Funded by
https://data.sbgrid.org/
Structural Biology Primary Data
1 dataset is 180-360 images of X-ray diffraction data, 3.5-7 GB; ~1 TB per dataset, with a total up to 100 PBs
Integration with Dataverse:
● Long-term access
● Formal Data Citation
● Standard Metadata
● Data Exploration (OME)
● Preservation, with copies in multiple sites (following the dataPASS approach)
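
As a quick consistency check on the figures above, pairing the low and high ends of each range and assuming decimal units (1 GB = 1,000 MB), both ends imply the same size per diffraction image:

```latex
\frac{3.5\ \text{GB}}{180\ \text{images}}
  = \frac{7\ \text{GB}}{360\ \text{images}}
  = \frac{3500\ \text{MB}}{180\ \text{images}}
  \approx 19.4\ \text{MB per image}
```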

Dataverse on the Massachusetts Open Cloud (MOC): Computing closer to data storage
Figure: Dataverse architecture, Current Architecture vs. On the MOC
Current Architecture: UI layer (PrimeFaces, js); application logic (Java EE); API; PostgreSQL (user data, metadata); Solr (index); RServe (R ingest, analysis); data files on a network file system.
On the MOC: the same Dataverse UI layer, application logic, API, PostgreSQL and Solr, alongside compute services (R, Python, Spark, Hadoop, …), Cinder block storage and Swift object storage.

Sharing Sensitive Data with Confidence: DataTags System
DataTag: A set of security features and access requirements for file handling
Sweeney, Crosas, Bar-Sinai, 2015, “Sharing Sensitive Data with Confidence: The DataTags System”, Technology Science
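
A DataTag bundles security features with access requirements. The sketch below models tags as ordered levels, each mapped to a set of handling requirements, and gives a dataset the most restrictive tag among its files. The three level names, the requirement strings, and the "most restrictive wins" rule are hypothetical simplifications; the DataTags paper defines its own tag levels and requirements.

```python
from enum import IntEnum
from typing import Dict, FrozenSet, Iterable


class Tag(IntEnum):
    """Hypothetical tag levels, ordered from least to most restrictive."""
    PUBLIC = 0
    REGISTERED = 1
    RESTRICTED = 2


# Hypothetical security features and access requirements per tag.
REQUIREMENTS: Dict[Tag, FrozenSet[str]] = {
    Tag.PUBLIC: frozenset(),
    Tag.REGISTERED: frozenset({"authentication"}),
    Tag.RESTRICTED: frozenset({
        "authentication",
        "encryption_at_rest",
        "encryption_in_transit",
        "signed_dua",
    }),
}


def dataset_tag(file_tags: Iterable[Tag]) -> Tag:
    """A dataset is handled at the level of its most sensitive file."""
    return max(file_tags, default=Tag.PUBLIC)


def required_handling(file_tags: Iterable[Tag]) -> FrozenSet[str]:
    """Everything a repository must enforce before releasing the dataset."""
    return REQUIREMENTS[dataset_tag(file_tags)]


print(sorted(required_handling([Tag.PUBLIC, Tag.REGISTERED])))
# ['authentication']
```

Ordering the levels lets a single max() decide how a mixed dataset is handled, which keeps the policy in the REQUIREMENTS table rather than scattered through code.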

Data Sharing Workflow for Sensitive Data
Figure: a sensitive dataset shared through Direct Access or through Privacy Preserving Access; labels: Authorized, Signed DUA (data use agreement)
http://datatags.org
http://privacytools.seas.harvard.edu
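
The workflow separates direct access to a sensitive dataset, labeled with authorization and a signed DUA, from privacy-preserving access. A minimal routing sketch follows, assuming that direct access requires both conditions; the Requester fields are hypothetical. The slide does not say which privacy-preserving mechanism is used, so noisy_count swaps in one common technique, the Laplace mechanism from differential privacy, applied to a counting query.

```python
import random
from dataclasses import dataclass
from enum import Enum


class AccessRoute(Enum):
    DIRECT = "direct access"
    PRIVACY_PRESERVING = "privacy-preserving access"


@dataclass(frozen=True)
class Requester:
    """Hypothetical requester record."""
    user_id: str
    authorized: bool  # approved by the data custodian
    signed_dua: bool  # data use agreement on file


def route_request(requester: Requester) -> AccessRoute:
    """Assumed rule: raw files only for authorized users with a signed DUA."""
    if requester.authorized and requester.signed_dua:
        return AccessRoute.DIRECT
    return AccessRoute.PRIVACY_PRESERVING


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Laplace mechanism for a counting query, whose sensitivity is 1."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = 1.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(2015)
for who in (Requester("alice", True, True), Requester("bob", True, False)):
    route = route_request(who)
    if route is AccessRoute.DIRECT:
        print(who.user_id, route.value)
    else:
        print(who.user_id, route.value, round(noisy_count(42, 0.5, rng), 2))
```

Smaller epsilon values add more noise and give stronger privacy; choosing epsilon and tracking a privacy budget across repeated queries are policy decisions outside this sketch.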

THANKS
@mercecrosas
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 

Recently uploaded (20)

A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 

December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Part 1 Large Data Sets

  • 1. ADDRESSING THE NEXT CHALLENGES IN DATA SHARING: LARGE-SCALE DATA AND SENSITIVE DATA. Mercè Crosas, Ph.D., Chief Data Science and Technology Officer, Institute for Quantitative Social Science, Harvard University, @mercecrosas
  • 2. Data sharing: good for you and good for the world. Researchers: get credit for their data. Publishers and journals: verify published work. Federal funding agencies: make public assets accessible. Science: validate, reuse and extend previous work.
  • 3. Data sharing (or publishing) rests on three components: a formal data citation (for reference, and for access through a persistent identifier); information about the data, i.e. metadata (for discovery and use); and a trusted data repository (for access through long-term archival). Data sharing needs to support data discovery, referencing, access, and reuse.
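As a rough illustration of how citation metadata and a persistent identifier combine into a formal data citation, the sketch below assembles a citation string. The element order loosely mirrors the style Dataverse displays (authors, year, title, persistent identifier, repository, version), but the `DatasetCitation` class, its methods, and the example DOI are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Hypothetical container for citation metadata (not a Dataverse class)."""

    authors: List[str]
    year: int
    title: str
    persistent_id: str  # e.g. "doi:10.1234/EXAMPLE" (made-up); the access handle
    repository: str
    version: str

    def resolver_url(self) -> str:
        # Turn a "doi:" identifier into a resolvable https://doi.org/ link.
        pid = self.persistent_id
        if pid.startswith("doi:"):
            pid = pid[len("doi:"):]
        return "https://doi.org/" + pid

    def format(self) -> str:
        # A fixed element order keeps the citation stable across renderings.
        authors = "; ".join(self.authors)
        return (f'{authors}, {self.year}, "{self.title}", '
                f"{self.resolver_url()}, {self.repository}, {self.version}")


if __name__ == "__main__":
    c = DatasetCitation(["Doe, Jane"], 2015, "Example Survey Data",
                        "doi:10.1234/EXAMPLE", "Harvard Dataverse", "V1")
    print(c.format())
    # Doe, Jane, 2015, "Example Survey Data", https://doi.org/10.1234/EXAMPLE, Harvard Dataverse, V1
```

Including the version in the citation matters once datasets change over time: the same persistent identifier can then point readers to the exact state of the data that was used.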
  • 4. dataverse.org: open-source software developed at Harvard's IQSS since 2006, used to share, publish, cite and archive research data. Installed at 12 sites worldwide, serving hundreds of universities and organizations.
  • 5. Harvard Dataverse (dataverse.harvard.edu) started as a community repository for the social sciences and is now open to all research fields and all researchers. It holds more than 1,300 dataverses and more than 59,000 datasets, with more than 1,400,000 downloads.
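Discovery in an installation such as Harvard Dataverse is also available programmatically through the Dataverse Search API (`/api/search`, with query parameters such as `q`, `type` and `per_page`). The sketch below only builds the request URL and sends nothing; the helper name `search_url` is hypothetical, and the Dataverse API guide should be checked for the full parameter list before relying on it.

```python
from urllib.parse import urlencode


def search_url(server: str, query: str, kind: str = "dataset", per_page: int = 10) -> str:
    """Build a Dataverse Search API URL (hypothetical helper; no request is sent)."""
    if kind not in {"dataverse", "dataset", "file"}:
        raise ValueError(f"unsupported type: {kind}")
    params = {"q": query, "type": kind, "per_page": per_page}
    # urlencode percent-escapes the values; spaces become '+'.
    return f"{server.rstrip('/')}/api/search?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url("https://dataverse.harvard.edu", "voter turnout"))
    # https://dataverse.harvard.edu/api/search?q=voter+turnout&type=dataset&per_page=10
```

A client could pass the resulting URL to `urllib.request.urlopen` and parse the JSON response; keeping URL construction separate makes it easy to test without network access.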
  • 6. Data sharing with Dataverse. Now: • no sensitive data • rarely versioned data • datasets up to the GB scale. The next 5 years: • highly sensitive data • streaming or frequently updated data • datasets from GBs to TBs and PBs – thousands of files per dataset – a large dataset held in Big Data, NoSQL storage (MongoDB, Cassandra, Lucene)
  • 7. Large-scale data sharing needs to continue supporting discovery, referencing, access and reuse.
  • 8. Adhering to the same high standards for large-scale data: • Metadata for discovery: – citation metadata – domain-specific descriptive metadata – file-level or variable-level metadata • Data citation for reference and access: – for the entire dataset and for subsets of the dataset (based on time of retrieval or variables selected) • Fast queries, data exploration and visualizations for reuse: – users might not be able to download the entire dataset
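One way to cite a subset "based on time of retrieval or variables selected" is to record the parent dataset's persistent identifier, the selected variables and a retrieval timestamp, and to derive a stable fingerprint from that normalized query, similar in spirit to the query-store approach in the RDA recommendations on dynamic data citation. The function name, fingerprint scheme and output format below are assumptions for illustration, not a Dataverse feature.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Iterable


def subset_citation(dataset_pid: str, variables: Iterable[str], retrieved_at: datetime) -> str:
    """Cite a variable subset as retrieved at a point in time (hypothetical scheme).

    `retrieved_at` should be timezone-aware; it is normalized to UTC.
    """
    # Normalize: sorted, de-duplicated variables and a UTC timestamp, so the
    # same logical subset always yields the same fingerprint.
    vars_sorted = sorted(set(variables))
    ts = retrieved_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = json.dumps({"pid": dataset_pid, "vars": vars_sorted, "ts": ts}, sort_keys=True)
    fingerprint = hashlib.sha256(query.encode("utf-8")).hexdigest()[:12]
    return (f"{dataset_pid} [subset: {', '.join(vars_sorted)}; "
            f"retrieved {ts}; query {fingerprint}]")
```

Hashing the normalized query means two requests for the same subset at the same time resolve to the same citation. To make such a citation re-executable, the repository would also have to keep the versioned data the timestamp refers to.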
  • 9. Data retrieval, exploration and visualization of large-scale datasets require data repositories to be closer to computing resources.
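When users cannot download a whole dataset, the repository side can run the selection and return only what was asked for. A minimal sketch, assuming the data sit in a delimited text file next to the compute service: it streams rows and yields only the requested variables, so memory use does not grow with file size. The function name and file layout are hypothetical; at larger scale this work would more likely be pushed into a query or compute engine.

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO


def project_columns(stream: TextIO, variables: List[str]) -> Iterator[Dict[str, str]]:
    """Yield only the selected variables from a CSV stream, one row at a time."""
    reader = csv.DictReader(stream)
    missing = [v for v in variables if v not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"unknown variables: {missing}")
    for row in reader:
        # Build a small dict per row instead of materializing the whole table.
        yield {v: row[v] for v in variables}


if __name__ == "__main__":
    data = io.StringIO("id,age,income,zip\n1,34,52000,02138\n2,51,61000,02139\n")
    print(list(project_columns(data, ["age", "income"])))
    # [{'age': '34', 'income': '52000'}, {'age': '51', 'income': '61000'}]
```

Because the function is a generator, the variable check runs when the first row is requested, and a caller can stop early without reading the rest of the file.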
  • 10. Current collaborations to address the next challenges in data sharing: SBGrid Data Repository (HMS, IQSS); Social Science Big Data (IQSS); Data Provenance (SEAS, IQSS); Privacy Tools for sharing sensitive data (SEAS, Berkman, Privacy Lab, IQSS, MIT).
  • 11. Sharing and Preserving Large Structural Biology Data (https://data.sbgrid.org/). Funded by: [funder logo]
  • 12. Structural biology primary data: one dataset is 180-360 images of X-ray diffraction data, 3.5-7 GB; ~1 TB per dataset, with a total of up to 100 PB. Integration with Dataverse: • Long-term access • Formal data citation • Standard metadata • Data exploration (OME) • Preservation, with copies in multiple sites (following the dataPASS approach)
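Taking the slide's figures at face value (decimal units assumed), both ends of the quoted range imply the same size per diffraction image, and the 100 PB total amounts to about 100,000 of the ~1 TB units mentioned:

```latex
\frac{3.5\ \text{GB}}{180\ \text{images}} \approx 19.4\ \text{MB/image},
\qquad
\frac{7\ \text{GB}}{360\ \text{images}} \approx 19.4\ \text{MB/image}

\frac{100\ \text{PB}}{1\ \text{TB}}
  = \frac{100 \times 10^{15}\ \text{B}}{10^{12}\ \text{B}}
  = 10^{5}
```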
  • 13. Dataverse on the Massachusetts Open Cloud (MOC): computing closer to data storage. [Architecture diagram] Current architecture: UI layer (PrimeFaces, JS); application logic (Java EE) with an API; PostgreSQL (user data, metadata); Solr (index); RServe (R ingest, analysis); network file system (data files). On the MOC: the same UI layer, application logic, API, PostgreSQL and Solr, together with Cinder block storage, Swift object storage, and compute services (R, Python, Spark, Hadoop, …).
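Moving data files from a network file system to cloud storage is easier when the application logic reaches storage only through a narrow interface. The sketch below is a hypothetical illustration of that idea, not Dataverse's actual storage code: a `StorageDriver` protocol with a local-filesystem backend and an in-memory stand-in for an object store such as Swift (a real backend would call the Swift API instead of a dict).

```python
import tempfile
from pathlib import Path
from typing import Dict, Protocol


class StorageDriver(Protocol):
    """Minimal storage interface the application layer codes against (hypothetical)."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class FileSystemDriver:
    """Stores each object as a file under a root directory (e.g. an NFS mount)."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class InMemoryObjectStore:
    """Stand-in for an object store: a flat key -> bytes namespace."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


def store_file(driver: StorageDriver, dataset_id: str, name: str, data: bytes) -> str:
    # Application logic only builds keys; it never knows which backend is in use.
    key = f"{dataset_id}/{name}"
    driver.put(key, data)
    return key


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for driver in (FileSystemDriver(Path(tmp)), InMemoryObjectStore()):
            key = store_file(driver, "ds1", "data.csv", b"a,b\n1,2\n")
            assert driver.get(key) == b"a,b\n1,2\n"
```

With storage behind such an interface, compute services running next to the storage can read the same keys without going through the web application.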
  • 14. Sharing Sensitive Data with Confidence: the DataTags System. DataTag: a set of security features and access requirements for file handling. Sweeney, Crosas, Bar-Sinai, 2015, “Sharing Sensitive Data with Confidence: The DataTags System,” Technology Science.
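The definition on this slide, a tag as a bundle of security features and access requirements, maps naturally onto an ordered list of levels. In the sketch below, the tag names borrow a subset of the color names used in the DataTags paper, but the requirement values attached to each tag are illustrative placeholders, not the paper's table; `least_restrictive_tag` is a hypothetical helper that picks the first tag whose handling covers everything a dataset needs.

```python
from dataclasses import dataclass
from typing import List

FEATURES = ("encrypt_transit", "encrypt_storage", "two_factor_auth", "signed_dua")


@dataclass(frozen=True)
class Requirements:
    encrypt_transit: bool = False
    encrypt_storage: bool = False
    two_factor_auth: bool = False
    signed_dua: bool = False


@dataclass(frozen=True)
class DataTag:
    name: str
    handling: Requirements  # features enforced for files carrying this tag

    def covers(self, need: Requirements) -> bool:
        # Every feature the dataset needs must be enforced by the tag.
        return all(getattr(self.handling, f) or not getattr(need, f) for f in FEATURES)


# Ordered from least to most restrictive; the values are placeholders.
TAGS: List[DataTag] = [
    DataTag("blue", Requirements()),
    DataTag("green", Requirements(encrypt_transit=True)),
    DataTag("yellow", Requirements(encrypt_transit=True, encrypt_storage=True)),
    DataTag("orange", Requirements(encrypt_transit=True, encrypt_storage=True, signed_dua=True)),
    DataTag("red", Requirements(True, True, True, True)),
]


def least_restrictive_tag(need: Requirements) -> DataTag:
    for tag in TAGS:
        if tag.covers(need):
            return tag
    raise ValueError("no tag satisfies the requirements")
```

Choosing the least restrictive adequate tag avoids over-protecting data, which would make sharing harder than necessary.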
  • 15. Data sharing workflow for sensitive data: a sensitive dataset can be made available through direct access (authorized users with a signed data use agreement, DUA) or through privacy-preserving access. http://datatags.org, http://privacytools.seas.harvard.edu
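The two paths in this workflow can be sketched as one routing decision. In the hypothetical sketch below, an authorized requester with a signed DUA receives the matching records directly, while anyone else receives only a noisy count. The slide does not say which privacy-preserving method is used; the Laplace mechanism for a counting query, a basic differential-privacy technique, is swapped in purely as an example, and `epsilon` is an assumed parameter.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Requester:
    authorized: bool
    signed_dua: bool


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def access(dataset: List[dict], who: Requester, predicate: Callable[[dict], bool],
           epsilon: float = 1.0,
           rng: Optional[random.Random] = None) -> Union[List[dict], float]:
    """Route a request: direct access, or a noisy count (privacy-preserving path)."""
    if who.authorized and who.signed_dua:
        # Direct access path: return the matching records as-is.
        return [r for r in dataset if predicate(r)]
    rng = rng or random.Random()
    true_count = sum(1 for r in dataset if predicate(r))
    # A counting query has sensitivity 1, so the noise scale is 1/epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Smaller `epsilon` values add more noise and give stronger privacy; passing a seeded `random.Random` makes the noisy answer reproducible for testing.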