SlideShare a Scribd company logo
1 of 12
DATAVERSE ON THE MOC
Mercè Crosas, Ph.D.
Chief Data Science and Technology Officer
Institute for Quantitative Social Science
Harvard University
@mercecrosas
MOC Workshop, Boston University, November 19, 2015
Data Repositories vs
Repository Software
Domain-
specific
repositories
GenBank
WW Protein Data
Bank
SBGrid Data Bank
…
General-
purpose
repositories
Harvard
Dataverse
DataDryad
Figshare
…
Repository
Software
Dataverse
Software
Dspace
Fedora
…
dataverse.org
Open-source software developed at Harvard’s IQSS since 2006
Used to share, publish, cite and archive research data
Installed in 12 sites world wide
Serving 100s of universities and organizations
Harvard Dataverse: dataverse.harvard.edu
Started as a community repository for Social Science
Now open to all research fields and all researchers
More than 1300 dataverses
More than 59,000 datasets
More than 1,400,000 downloads
Dataverses are containers for Datasets
Each Dataverse can be for a researcher, a research project, a department, a
journal, or a larger organization.
Dataverse offers a rich feature set
to publish research data
Credit and Visibility
• Standard,
persistent data
citation
• Branding for each
dataverse
• Widgets to
embed in your
own website
Discovery
• Faceted search
for all metadata
• Standard
metadata:
• citation
• scientific
domain
• file-level
Access Control &
Roles
• CCO waiver for
public datasets
• Tiered access:
• terms of use
• guestbook
• restricted data
• Publishing
workflow
• Multiple roles:
• contribute
• curate, review
• administrate
Data Features
• Versioning
• Conversion of
tabular data files
to standard
format
• Automatic
extraction of file
metadata (R,
STATA, SPSS,
XSLX, FITS)
Journal Systems (Open Journal System, ScholarOne); Open Science Framework
Data Analysis (TwoRavens); Spatial Viz (WorldMap); Preservation systems (Archivematica)
Interoperability through APIs
Current Collaborations: Addressing the
Next Challenges in Data Sharing
Structural Biology Grid Data
Repository (Sliz, HMS, Crosas,
IQSS)
Social Science Big Data (King,
Crosas, IQSS, CGA)
Data Provenance (Seltzer,
SEAS, Crosas, King, IQSS)
Privacy Tools to share sensitive
data (SEAS, Berkman Center,
Privacy Lab, IQSS, MIT)
Sharing Sensitive Data with Confidence:
DataTags System
DataTag: A set of security features and access requirements for file handling
Sweeney, Crosas, Bar-Sinai, 2015, Technology Science
Data Sharing Workflow
for Sensitive Data
Sensitive
Dataset
Sensitive
Dataset
Direct
Access
Privacy
Preserving
Access
http://datatags.org
http://privacytools.seas.harvard.edu
Authorized
Signed DUA
Dataverse on the MOC
Current Architecture On the MOC
Network
File System
(data files)
UI Layer
(PrimeFaces, js)
Application Logic
(Java EE)
A
P
I
PostgreSQL
(user data,
metadata)
Solr
(Index)
RServe
(R ingest,
analysis)
COMPUTE SERVICES
(R, Python, Spark,
Hadoop, …) CINDER
block storage
SWIFT
object storage
UI Layer
(PrimeFaces, js)
Application Logic
(Java EE)
A
P
I
PostgreSQL
(user data,
metadata)
Solr
(Index)
Dataverse
Dataverse-MOC Projects
We propose to pilot a framework for sharing and publishing
large and streaming datasets, and enabling collaborative
computing on them.
Boston Data from City Hall and Boston Area Research Initiative (BARI):
• Dan O’Brien (Northeastern University)
• Storage: 911 and 311 calls streaming data:
• 311 calls: 800,000 reports, at 500 reports/day
• 911 calls: 2,500,000 reports, at 1,500 reports/day
• Computing: merge data + geospatial, temporal exploration
Partners Healthcare Clinical Trial Data:
• Shawn Murphy (Partners Healthcare)
• Scalable Collaborative Infrastructure for a Learning Healthcare System
(SCILHS)
• Data from 14 health systems sites
• Sensitive data sets, categorized using the DataTags levels
THANKS
@mercecrosas
mcrosas@iq.harvard.edu
http://scholar.harvard.edu/mercecrosas

More Related Content

What's hot

Connecting Dataverse with the Research Life Cycle
Connecting Dataverse with the Research Life CycleConnecting Dataverse with the Research Life Cycle
Connecting Dataverse with the Research Life CycleMerce Crosas
 
DataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse IntegrationDataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse IntegrationMichael Bar-Sinai
 
Data Citation Implementation at Dataverse
Data Citation Implementation at DataverseData Citation Implementation at Dataverse
Data Citation Implementation at DataverseMerce Crosas
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsAnita de Waard
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseAnita de Waard
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryAnita de Waard
 
6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation Slides6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation SlidesDuraSpace
 
Data Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim ClarkData Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim Clarkdatascienceiqss
 
SEAD Prototype: Data Curation and Preservation for Sustainability Science
SEAD Prototype: Data Curation and Preservation for Sustainability ScienceSEAD Prototype: Data Curation and Preservation for Sustainability Science
SEAD Prototype: Data Curation and Preservation for Sustainability ScienceSEAD
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceCarole Goble
 
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Anita de Waard
 
RDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest GroupRDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest GroupAnita de Waard
 
Building an institutional repository using dspace
Building an institutional repository using dspaceBuilding an institutional repository using dspace
Building an institutional repository using dspaceBharat Chaudhari
 
The DATS model: datasets descriptions for data discovery in DataMed
The DATS model: datasets descriptions for data discovery in DataMedThe DATS model: datasets descriptions for data discovery in DataMed
The DATS model: datasets descriptions for data discovery in DataMedAlejandra Gonzalez-Beltran
 
Clinical Trials & Big Data-Final
Clinical Trials & Big Data-FinalClinical Trials & Big Data-Final
Clinical Trials & Big Data-FinalManoj Vig
 

What's hot (20)

Connecting Dataverse with the Research Life Cycle
Connecting Dataverse with the Research Life CycleConnecting Dataverse with the Research Life Cycle
Connecting Dataverse with the Research Life Cycle
 
DataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse IntegrationDataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse Integration
 
Data Citation Implementation at Dataverse
Data Citation Implementation at DataverseData Citation Implementation at Dataverse
Data Citation Implementation at Dataverse
 
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
 
Networked Science, And Integrating with Dataverse
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with Dataverse
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost Recovery
 
Accelerating your research with Microsoft Azure
Accelerating your research with Microsoft AzureAccelerating your research with Microsoft Azure
Accelerating your research with Microsoft Azure
 
6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation Slides6.15.17 DSpace-Cris Webinar Presentation Slides
6.15.17 DSpace-Cris Webinar Presentation Slides
 
Data Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim ClarkData Citation Implementation Guidelines By Tim Clark
Data Citation Implementation Guidelines By Tim Clark
 
Reproducible Research and the Cloud
Reproducible Research and the CloudReproducible Research and the Cloud
Reproducible Research and the Cloud
 
SEAD Prototype: Data Curation and Preservation for Sustainability Science
SEAD Prototype: Data Curation and Preservation for Sustainability ScienceSEAD Prototype: Data Curation and Preservation for Sustainability Science
SEAD Prototype: Data Curation and Preservation for Sustainability Science
 
Preparing Data for Sharing: The FAIR Principles
Preparing Data for Sharing: The FAIR PrinciplesPreparing Data for Sharing: The FAIR Principles
Preparing Data for Sharing: The FAIR Principles
 
Software Sustainability: Better Software Better Science
Software Sustainability: Better Software Better ScienceSoftware Sustainability: Better Software Better Science
Software Sustainability: Better Software Better Science
 
mx & dbs
mx & dbsmx & dbs
mx & dbs
 
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
Optimising Scientific Knowledge Transfer: How Collective Sensemaking Can Ena...
 
RDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest GroupRDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest Group
 
Building an institutional repository using dspace
Building an institutional repository using dspaceBuilding an institutional repository using dspace
Building an institutional repository using dspace
 
A4 r overview deck_1.7
A4 r overview deck_1.7A4 r overview deck_1.7
A4 r overview deck_1.7
 
The DATS model: datasets descriptions for data discovery in DataMed
The DATS model: datasets descriptions for data discovery in DataMedThe DATS model: datasets descriptions for data discovery in DataMed
The DATS model: datasets descriptions for data discovery in DataMed
 
Clinical Trials & Big Data-Final
Clinical Trials & Big Data-FinalClinical Trials & Big Data-Final
Clinical Trials & Big Data-Final
 

Viewers also liked (8)

Props
PropsProps
Props
 
инструкция по работе с Wikiwall
инструкция по работе с Wikiwallинструкция по работе с Wikiwall
инструкция по работе с Wikiwall
 
Taller de integración 2015 matías
Taller de integración 2015 matíasTaller de integración 2015 matías
Taller de integración 2015 matías
 
Bbfc
BbfcBbfc
Bbfc
 
The Power of Chunks Flyer 11x17
The Power of Chunks Flyer 11x17The Power of Chunks Flyer 11x17
The Power of Chunks Flyer 11x17
 
презентация умвб газ рус Blue
презентация умвб газ рус Blueпрезентация умвб газ рус Blue
презентация умвб газ рус Blue
 
innovative lesson plan
innovative lesson planinnovative lesson plan
innovative lesson plan
 
CV ALI 2016
CV ALI 2016CV ALI 2016
CV ALI 2016
 

Similar to Dataverse on the MOC

December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...DeVonne Parks, CEM
 
Data Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseData Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseMicah Altman
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptxmantatheralyasriy
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptxmantatheralyasriy
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptxmantatheralyasriy
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptxmantatheralyasriy
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunitiesvty
 
Metadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data RepositoriesMetadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data Repositoriesandrea huang
 
Using Dataverse Virtual Archive Technology for Research Data Management
Using Dataverse Virtual Archive Technology for Research Data ManagementUsing Dataverse Virtual Archive Technology for Research Data Management
Using Dataverse Virtual Archive Technology for Research Data ManagementGary Wilhelm
 
Data curation issues for repositories
Data curation issues for repositoriesData curation issues for repositories
Data curation issues for repositoriesChris Rusbridge
 
How Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useHow Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useMatthew Vaughn
 
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Laurent Alquier
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceCarole Goble
 
It summit dataverse-bigdata-mercecrosas
It summit dataverse-bigdata-mercecrosasIt summit dataverse-bigdata-mercecrosas
It summit dataverse-bigdata-mercecrosaskevin_donovan
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesASIS&T
 
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...aceas13tern
 

Similar to Dataverse on the MOC (20)

December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
December 9, 2015 NISO Webinar: Two-Part Webinar: Emerging Resource Types - Pa...
 
Data Publishing Workflows with Dataverse
Data Publishing Workflows with DataverseData Publishing Workflows with Dataverse
Data Publishing Workflows with Dataverse
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Dataverse opportunities
Dataverse opportunitiesDataverse opportunities
Dataverse opportunities
 
Metadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data RepositoriesMetadata as Linked Data for Research Data Repositories
Metadata as Linked Data for Research Data Repositories
 
wdb-01.ppt
wdb-01.pptwdb-01.ppt
wdb-01.ppt
 
Ndsa 2013-abrams-integrating-repositories-for-data-sharing
Ndsa 2013-abrams-integrating-repositories-for-data-sharingNdsa 2013-abrams-integrating-repositories-for-data-sharing
Ndsa 2013-abrams-integrating-repositories-for-data-sharing
 
Using Dataverse Virtual Archive Technology for Research Data Management
Using Dataverse Virtual Archive Technology for Research Data ManagementUsing Dataverse Virtual Archive Technology for Research Data Management
Using Dataverse Virtual Archive Technology for Research Data Management
 
Or 2013-abrams-sharing-data-rich-research
Or 2013-abrams-sharing-data-rich-researchOr 2013-abrams-sharing-data-rich-research
Or 2013-abrams-sharing-data-rich-research
 
Data curation issues for repositories
Data curation issues for repositoriesData curation issues for repositories
Data curation issues for repositories
 
How Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-useHow Cyverse.org enables scalable data discoverability and re-use
How Cyverse.org enables scalable data discoverability and re-use
 
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.Exploration of a Data Landscape using a Collaborative Linked Data Framework.
Exploration of a Data Landscape using a Collaborative Linked Data Framework.
 
Sailing on the ocean of 1s and 0s
Sailing on the ocean of 1s and 0sSailing on the ocean of 1s and 0s
Sailing on the ocean of 1s and 0s
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practice
 
It summit dataverse-bigdata-mercecrosas
It summit dataverse-bigdata-mercecrosasIt summit dataverse-bigdata-mercecrosas
It summit dataverse-bigdata-mercecrosas
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication Repositories
 
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
SPatially Explicit Data Discovery, Extraction and Evaluation Services (SPEDDE...
 

More from Merce Crosas

Practical Implementation of research data policies: Solutions with Dataverse
Practical Implementation of research data policies: Solutions with DataversePractical Implementation of research data policies: Solutions with Dataverse
Practical Implementation of research data policies: Solutions with DataverseMerce Crosas
 
Research Data Management @Harvard
Research Data Management @HarvardResearch Data Management @Harvard
Research Data Management @HarvardMerce Crosas
 
Cloud Dataverse: A Data repository platform for an OpenStack Cloud
Cloud Dataverse: A Data repository platform for an OpenStack CloudCloud Dataverse: A Data repository platform for an OpenStack Cloud
Cloud Dataverse: A Data repository platform for an OpenStack CloudMerce Crosas
 
Can data access combat fake news?
Can data access combat fake news?Can data access combat fake news?
Can data access combat fake news?Merce Crosas
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingMerce Crosas
 
The Data Lifecycle (Harvard DataFest)
The Data Lifecycle (Harvard DataFest)The Data Lifecycle (Harvard DataFest)
The Data Lifecycle (Harvard DataFest)Merce Crosas
 
Making Data Accessible
Making Data AccessibleMaking Data Accessible
Making Data AccessibleMerce Crosas
 
Abcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasAbcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasMerce Crosas
 
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...Merce Crosas
 
A very Brief History of Communicating Science
A very Brief History of Communicating ScienceA very Brief History of Communicating Science
A very Brief History of Communicating ScienceMerce Crosas
 
Dataverse hpdm symposium
Dataverse   hpdm symposiumDataverse   hpdm symposium
Dataverse hpdm symposiumMerce Crosas
 
Collaboration in science and technology it summit
Collaboration in science and technology   it summitCollaboration in science and technology   it summit
Collaboration in science and technology it summitMerce Crosas
 
Dataverse for Journals
Dataverse for JournalsDataverse for Journals
Dataverse for JournalsMerce Crosas
 
Collaboration in science and technology
Collaboration in science and technologyCollaboration in science and technology
Collaboration in science and technologyMerce Crosas
 
Force11 jddcp intro
Force11  jddcp introForce11  jddcp intro
Force11 jddcp introMerce Crosas
 
The expanding dataverse
The expanding dataverseThe expanding dataverse
The expanding dataverseMerce Crosas
 

More from Merce Crosas (17)

Practical Implementation of research data policies: Solutions with Dataverse
Practical Implementation of research data policies: Solutions with DataversePractical Implementation of research data policies: Solutions with Dataverse
Practical Implementation of research data policies: Solutions with Dataverse
 
Research Data Management @Harvard
Research Data Management @HarvardResearch Data Management @Harvard
Research Data Management @Harvard
 
Cloud Dataverse: A Data repository platform for an OpenStack Cloud
Cloud Dataverse: A Data repository platform for an OpenStack CloudCloud Dataverse: A Data repository platform for an OpenStack Cloud
Cloud Dataverse: A Data repository platform for an OpenStack Cloud
 
Can data access combat fake news?
Can data access combat fake news?Can data access combat fake news?
Can data access combat fake news?
 
FAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data SharingFAIR Data Management and FAIR Data Sharing
FAIR Data Management and FAIR Data Sharing
 
The Data Lifecycle (Harvard DataFest)
The Data Lifecycle (Harvard DataFest)The Data Lifecycle (Harvard DataFest)
The Data Lifecycle (Harvard DataFest)
 
Cloud Dataverse
Cloud DataverseCloud Dataverse
Cloud Dataverse
 
Making Data Accessible
Making Data AccessibleMaking Data Accessible
Making Data Accessible
 
Abcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosasAbcd iqs ssoftware-projects-mercecrosas
Abcd iqs ssoftware-projects-mercecrosas
 
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
The Rise of Data Publishing in the Digital World (and how Dataverse and DataT...
 
A very Brief History of Communicating Science
A very Brief History of Communicating ScienceA very Brief History of Communicating Science
A very Brief History of Communicating Science
 
Dataverse hpdm symposium
Dataverse   hpdm symposiumDataverse   hpdm symposium
Dataverse hpdm symposium
 
Collaboration in science and technology it summit
Collaboration in science and technology   it summitCollaboration in science and technology   it summit
Collaboration in science and technology it summit
 
Dataverse for Journals
Dataverse for JournalsDataverse for Journals
Dataverse for Journals
 
Collaboration in science and technology
Collaboration in science and technologyCollaboration in science and technology
Collaboration in science and technology
 
Force11 jddcp intro
Force11  jddcp introForce11  jddcp intro
Force11 jddcp intro
 
The expanding dataverse
The expanding dataverseThe expanding dataverse
The expanding dataverse
 

Recently uploaded

专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookmanojkuma9823
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 

Recently uploaded (20)

专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 

Dataverse on the MOC

  • 1. DATAVERSE ON THE MOC Mercè Crosas, Ph.D. Chief Data Science and Technology Officer Institute for Quantitative Social Science Harvard University @mercecrosas MOC Workshop, Boston University, November 19, 2015
  • 2. Data Repositories vs Repository Software Domain- specific repositories GenBank WW Protein Data Bank SBGrid Data Bank … General- purpose repositories Harvard Dataverse DataDryad Figshare … Repository Software Dataverse Software Dspace Fedora …
  • 3. dataverse.org Open-source software developed at Harvard’s IQSS since 2006 Used to share, publish, cite and archive research data Installed in 12 sites world wide Serving 100s of universities and organizations
  • 4. Harvard Dataverse: dataverse.harvard.edu Started as a community repository for Social Science Now open to all research fields and all researchers More than 1300 dataverses More than 59,000 datasets More than 1,400,000 downloads
  • 5. Dataverses are containers for Datasets Each Dataverse can be for a researcher, a research project, a department, a journal, or a larger organization.
  • 6. Dataverse offers a rich feature set to publish research data Credit and Visibility • Standard, persistent data citation • Branding for each dataverse • Widgets to embed in your own website Discovery • Faceted search for all metadata • Standard metadata: • citation • scientific domain • file-level Access Control & Roles • CCO waiver for public datasets • Tiered access: • terms of use • guestbook • restricted data • Publishing workflow • Multiple roles: • contribute • curate, review • administrate Data Features • Versioning • Conversion of tabular data files to standard format • Automatic extraction of file metadata (R, STATA, SPSS, XSLX, FITS) Journal Systems (Open Journal System, ScholarOne); Open Science Framework Data Analysis (TwoRavens); Spatial Viz (WorldMap); Preservation systems (Archivematica) Interoperability through APIs
  • 7. Current Collaborations: Addressing the Next Challenges in Data Sharing Structural Biology Grid Data Repository (Sliz, HMS, Crosas, IQSS) Social Science Big Data (King, Crosas, IQSS, CGA) Data Provenance (Seltzer, SEAS, Crosas, King, IQSS) Privacy Tools to share sensitive data (SEAS, Berkman Center, Privacy Lab, IQSS, MIT)
  • 8. Sharing Sensitive Data with Confidence: DataTags System DataTag: A set of security features and access requirements for file handling Sweeney, Crosas, Bar-Sinai, 2015, Technology Science
  • 9. Data Sharing Workflow for Sensitive Data Sensitive Dataset Sensitive Dataset Direct Access Privacy Preserving Access http://datatags.org http://privacytools.seas.harvard.edu Authorized Signed DUA
  • 10. Dataverse on the MOC Current Architecture On the MOC Network File System (data files) UI Layer (PrimeFaces, js) Application Logic (Java EE) A P I PostgreSQL (user data, metadata) Solr (Index) RServe (R ingest, analysis) COMPUTE SERVICES (R, Python, Spark, Hadoop, …) CINDER block storage SWIFT object storage UI Layer (PrimeFaces, js) Application Logic (Java EE) A P I PostgreSQL (user data, metadata) Solr (Index) Dataverse
  • 11. Dataverse-MOC Projects We propose to pilot a framework for sharing and publishing large and streaming datasets, and enabling collaborative computing on them. Boston Data from City Hall and Boston Area Research Initiative (BARI): • Dan O’Brien (Northeastern University) • Storage: 911 and 311 calls streaming data: • 311 calls: 800,000 reports, at 500 reports/day • 911 calls: 2,500,000 reports, at 1,500 reports/day • Computing: merge data + geospatial, temporal exploration Partners Healthcare Clinical Trial Data: • Shawn Murphy (Partners Healthcare) • Scalable Collaborative Infrastructure for a Learning Healthcare System (SCILHS) • Data from 14 health systems sites • Sensitive data sets, categorized using the DataTags levels