- Dataverse is an open source data repository platform that aims to make research data findable, accessible, interoperable, and reusable (FAIR).
- It has features like persistent identifiers, metadata enrichment, and APIs that help promote and disseminate datasets through search engines and other tools to increase citations and visibility.
- The DataverseNL instance integrates with services from CLARIAH.nl and DANS EASY to ensure datasets are preserved and meet standards for interoperability.
1. dans.knaw.nl
DANS is an institute of KNAW and NWO
FAIR DataverseNL
Vyacheslav Tykhonov
Senior Data Scientist (DANS-KNAW),
Royal Netherlands Academy of Arts and Sciences
vyacheslav.tykhonov@dans.knaw.nl
Den Haag, the Netherlands, 06.12.2016
2. Why Dataverse?
• Open source project developed by IQSS at Harvard University
and published on GitHub
• Mature product with a long history (since 2006)
• Very dynamic and experienced development team working in an
Agile environment (community call every two weeks)
• Clear vision and understanding of research communities'
requirements, public roadmap
• The strong community behind Dataverse helps to improve the
basic functionality and develop it further
• Well-developed architecture with rich APIs makes it possible to build
application layers around Dataverse
3. Q&A: the value of a data repository for the researcher
- I’ve deposited my dataset and it’s archived now. Are you like Dropbox, but with
some rich metadata?
- No, we’ll make your data visible and bring it to an interested audience.
- Are you going to help me get more citations?
- We can help promote your datasets in Google, Yahoo and other search engines.
- Are you persistent? What if I lose my hard drive with the data someday?
- Every dataset gets its own persistent identifier after deposit. You’ll be able to access
it at the same URL at any time.
- Is your service good enough to store my data?
- The Nature Research journal Scientific Data recommended Dataverse for researchers
submitting supporting datasets from any research subject.
- It’s probably very expensive, isn’t it?
- At the moment we’re working with many partners and charge every customer
only 4000 EUR per year for basic services.
4. Dataverse widget disseminates scientific data
like YouTube or Slideshare
<iframe width="560" height="315" src="https://www.youtube.com/embed/fgn6dmfsZ_M" frameborder="0" allowfullscreen></iframe>
7. Dataverse, SEO and FAIR (F and A)
Search engine optimization (SEO) is the process of affecting the visibility of a website or a web page in a
web search engine's unpaid results—often referred to as "natural", "organic", or "earned" results. In general,
the earlier (or higher ranked on the search results page), and more frequently a site appears in the search
results list, the more visitors it will receive from the search engine's users, and these visitors can be converted
into customers.[1] SEO may target different kinds of search, including image search, local search, video
search, academic search,[2] news search and industry-specific vertical search engines.
from Wikipedia
FAIR - a set of guiding principles to make data Findable, Accessible, Interoperable, and
Re-usable.
So… SEO is a major approach to make research data FAIR.
8. DataverseNL and the research community
Common goals:
- a higher position for DataverseNL in search engines raises the
researcher's rank in their community. A great way to be cited more!
- pointers to deposited metadata and data are persistent, using handles
(ongoing research) and DOIs (archive)
- the depositor gets not just a citation but their own dynamic research media
channel that can go up (or down)
- adding more dataverses and datasets automatically increases the
importance of DataverseNL in search engines and boosts the visibility of the
datasets
- metadata enrichment attracts more interested visitors to researchers'
landing pages and at the same time increases the popularity of the DataverseNL
website
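The handles and DOIs mentioned above stay stable because they are resolved through public resolver services rather than pointing at a server directly. A minimal sketch of that mapping, assuming the common `doi:` and `hdl:` prefix notation; the function name `resolver_url` and both example identifiers are hypothetical placeholders, not real datasets:

```python
def resolver_url(pid: str) -> str:
    """Map a 'doi:...' or 'hdl:...' identifier to its public resolver URL."""
    scheme, _, value = pid.partition(":")
    resolvers = {
        "doi": "https://doi.org/",         # DOI resolver
        "hdl": "https://hdl.handle.net/",  # Handle System resolver
    }
    base = resolvers.get(scheme.lower())
    if base is None or not value:
        raise ValueError(f"unsupported persistent identifier: {pid!r}")
    return base + value


# Placeholder identifiers for illustration only.
print(resolver_url("doi:10.5072/FK2/EXAMPLE"))
print(resolver_url("hdl:12345/EXAMPLE"))
```

If the repository moves a landing page, only the resolver record has to change; the link that was cited keeps working.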
9. The role of the Archivist in the digital age
- guiding depositors to describe their data with relevant and
rich keywords
- collecting information from search engines about similar
research projects
- exchanging links to raise inbound rank (higher position)
- suggesting new keywords that should increase the visibility of datasets
and attract more visitors
- a digital archivist should have good analytics skills to understand
trends
- research and collections are coming together
11. Community efforts
Every member of the Dataverse community improves their own
metadata and the visibility of their data - and other members
automatically benefit through higher rank and new
citations
The value of the community grows and citation rank increases
More partners will join to benefit from this collaboration
Research data become Findable and Accessible
13. Back to FAIR: Interoperable (I)
To be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge
representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
The CLARIAH.nl project delivers standards that enable researchers
to find and use data and tools.
DataverseNL will use its services to ingest, map, convert,
curate, harvest, query, explore, visualize and export structured
humanities research data.
14. Back to FAIR: Re-usable (R)
Dataverse features:
• clear licences for every dataset
• plurality of accurate and relevant attributes.
• provenance of data (version 4.6, Q4 2016)
• domain-relevant community standards
15. Data Provenance is the key to being Re-usable
Provenance lets researchers see the context in which
datasets were captured, processed, analyzed, and
validated, along with other information that
enables interpretation and reuse:
Source: Harvard’s IQSS director Gary King’s Balsamiq
16. API economy
Dataverse is a data repository platform with four APIs:
- Native API
- SWORD API
- Search API
- Data Access API
The API token is the key to connecting Dataverse with an unlimited
number of tools developed by different research communities
and to integrating it with other repositories.
We can benefit from other FAIR tools and datasets today!
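To make the four APIs more concrete, the sketch below builds one request per API without sending anything. It assumes the endpoint paths and the `X-Dataverse-key` header described in the Dataverse API guide, so check them against the installed version. The helper `build_request`, the token, the DOI, and the file id are hypothetical placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://demo.dataverse.nl"               # demo site named in this deck
API_TOKEN = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # placeholder, not a real token


def build_request(path, params=None, token=None):
    """Build (but do not send) a GET request against a Dataverse installation."""
    url = BASE_URL + path
    if params:
        url += "?" + urlencode(params)
    # Assumption: the token travels in the X-Dataverse-key header.
    headers = {"X-Dataverse-key": token} if token else {}
    return Request(url, headers=headers)


# Search API: public metadata search, no token needed for published data.
search = build_request("/api/search", {"q": "provenance", "type": "dataset"})

# Native API: dataset metadata looked up by persistent identifier (placeholder DOI).
dataset = build_request(
    "/api/datasets/:persistentId/",
    {"persistentId": "doi:10.5072/FK2/EXAMPLE"},
    token=API_TOKEN,
)

# Data Access API: download a file by its numeric id (42 is a placeholder).
datafile = build_request("/api/access/datafile/42", token=API_TOKEN)

# SWORD API: service document; SWORD normally expects HTTP Basic auth with
# the token as the user name, which is left out of this sketch.
sword = build_request("/dvn/api/data-deposit/v1.1/swordv2/service-document")

for req in (search, dataset, datafile, sword):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would perform the call. Keeping construction separate from sending makes it easy to inspect what a tool is about to request before it touches a live repository.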
17. Data Preservation
A Trusted Digital Repository (TDR) is a permanent archive for data,
metadata, and provenance information.
Data citation alone does not solve the transparency issue. Full
documentation of dataset provenance and context is necessary.
The vision is to have Dataverse as the deposit service for ongoing
research and DANS EASY (TDR) as the permanent archive.
18. Try it now
Dataverse is the way to make your data FAIR.
Contact us today!
http://demo.dataverse.nl
info@dataverse.nl