February 11, 2020
Anita de Waard, Alberto Zigoni, Elsevier
Enhancing Data Discovery,
Sharing and Reuse
Meet our hero: a cancer biologist
• Specialized in liver cancer
• Runs a lab of 6 people
• Focusing on understanding
the molecular mechanics of
liver cancer to identify
biomarkers for diagnosis and
prognosis
• Mostly pre-clinical research,
mouse models and diseased
tissue samples
• Research has been
supported by various NIH
grants
Private
The research (data) journey: private and public
Researcher
• Discover: literature (and data) search to assess the state of the art and identify gaps
• Collaborate: run experiments and collect data; collaborate on project data with team members and external researchers
• Publish: report on data produced for a specific grant; share data in domain-specific and generalist / institutional repositories
Institution
• Curate: data published in the institutional data repository
• Showcase: data and related publications / projects / awards in a public portal
• Track: datasets published everywhere; compliance with funders’ policies
Public
Discover: Mendeley Data Search: find related work
Sequencing Data (NIH repository) Western Blots (Mendeley Data)
Deep indexing
Acronym expansion
HCC = Hepatocellular
Carcinoma
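The acronym expansion mentioned on this slide can be illustrated with a minimal sketch. It is not Mendeley Data Search's implementation; the lookup table `ACRONYMS` and the function `expand_query` are hypothetical names introduced here, and a real index would presumably use a much larger, curated vocabulary.

```python
# Hypothetical acronym table; only the HCC entry comes from the slide.
ACRONYMS = {"hcc": "hepatocellular carcinoma"}


def expand_query(query: str, acronyms: dict[str, str] = ACRONYMS) -> list[str]:
    """Return the original query plus one variant per known acronym found."""
    variants = [query]
    words = query.split()
    for i, word in enumerate(words):
        expansion = acronyms.get(word.lower())
        if expansion:
            # Swap the acronym for its long form, keeping the other words.
            variants.append(" ".join(words[:i] + [expansion] + words[i + 1:]))
    return variants


if __name__ == "__main__":
    for variant in expand_query("HCC biomarkers"):
        print(variant)
```

Searching on every variant lets a query for "HCC" also match datasets that only spell out "hepatocellular carcinoma".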
20.5m datasets from
1,700+ sources
Collaborate: store and share experiment/project data
Project
members can
share data
sources inside
project
Multiple data sources
(incl. institutional
servers) available
Datasets can be
viewed & edited by
all project members
(role based access)
Large files are copied server-to-server asynchronously.
Curate: institute/library moderates data before
dataset is publicly shared
Institutional
moderation queues
Dataset moderator view
Moderation history
email notification
Publish: add metadata and license to data
Type-ahead from taxonomy
Custom metadata
templates configured
Broad choice of data and
software licenses
Semantic links to
articles, datasets,
software
Reserved DOI prior to
publication, can be used
in manuscript to cite data
Multiple sharing options
100 GB per dataset
(configurable)
Repository page includes
relevant datasets from
domain specific
repositories as well
Showcase: institution-branded data sharing
Report: on created/published data in repository
Report: integrate grants, publications and datasets in
funder’s reporting system
Track: monitor institutional datasets published anywhere
Enriched dataset metadata include
institutional IDs (Scopus, SciVal,
Mendeley) to facilitate tracking
1,700+ data sources
indexed
APIs for integration
Mendeley Data is a full research data management suite:
Data
Repository
Publish
MD
Data
Search
Discover
dB Institutional Product
Data
Manager
Curate
Data
Monitor
Report
Track
NIH
Collaborate
Free
Open,
APIs
Secure
Tailored
Paid
• Data is always
owned by the
user
• 16 possible open
licenses
• Archiving with
DANS
• Open APIs to
many platforms
Data
Repository
Publish
Showcase MD
We are well-connected to the global RDM ecosystem:
NB: ‘Data Repository Selection, Criteria That Matter’
https://osf.io/m2bce/
Anita de Waard, a.dewaard@elsevier.com
Alberto Zigoni, a.zigoni@elsevier.com
Eric Livingston, e.livingston@elsevier.com
Thank you!
https://data.mendeley.com/
Extra Slides
Elsevier in numbers:
• 1 billion: Over 16m people a month use ScienceDirect, our flagship platform for academic research, and download over 1 billion articles a year.
• 73m+: Scopus, the leading abstract and citation database of research literature, contains over 73m records across 24,000 journals, sourced from more than 5,000 publishers.
• 25,000: Our products are used at more than 25,000 academic and government institutes globally.
• 7,500: Elsevier has 7,500 employees and serves customers in over 180 countries.
• 430,000: Elsevier publishes 430,000 peer-reviewed articles annually.
• ~11m: Mendeley is a scientific social networking tool that enables over 11 million users worldwide to organize, write, collaborate and promote their research.
• 490+: ClinicalKey provides over 490 clinical overviews that give quick clinical answers and summaries; over 4m images and 51,000 medical and surgical videos in a single, fully integrated site.
• > 6m: ScienceDirect contains over 6 million articles, 2,500 journals, 900 full open access journals, 39,000 books and 330,000 topic pages.
https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
Research Data Should Be FAIR.. and Happy!
https://www.force11.org/group/fairgroup/fairprinciples
Maslow’s Hierarchy of Research Data
• Following TOP guidelines for data deposition:
• Data deposition (and citation) fully integrated in article submission process
• Following Force11 Data citation principles
Supporting research data sharing through our publishing workflow:
This Photo by Unknown Author is licensed under CC BY-NC-ND
This Photo by Unknown Author is licensed under CC BY-SA
https://cos.io/our-services/top-guidelines/
https://www.elsevier.com/connect/elsevier-supports-top-guidelines-in-ongoing-efforts-to-ensure-research-quality-and-transparency
Enabling FAIR Data in the Earth, Environmental and Space Sciences
Scholix: Multi-stakeholder Linked Data Effort
Publishers
Data Centers
Repositories
Past: disconnected sources with
heterogeneous practices
Publishers
Data Centers
Repositories
Current: standard set of guidelines for exposing
and consuming links, supported by hubs
http://www.scholix.org/
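The hub model on this slide, where contributors expose article-dataset links and other services consume them, can be sketched as follows. This is a simplified illustration and not the Scholix schema or any real hub API: the `Link` fields, the `LinkHub` class, and the example identifiers are all assumptions made for the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Simplified link record (hypothetical fields, not the Scholix schema)."""
    source_id: str     # e.g. an article DOI
    target_id: str     # e.g. a dataset DOI
    relationship: str  # e.g. "References"
    provider: str      # who contributed the link


class LinkHub:
    """Toy in-memory hub: contributors expose links, consumers look them up."""

    def __init__(self) -> None:
        self._by_source: dict[str, set[Link]] = defaultdict(set)

    def expose(self, link: Link) -> None:
        # Storing links in a set de-duplicates identical contributions.
        self._by_source[link.source_id].add(link)

    def consume(self, source_id: str) -> list[Link]:
        links = self._by_source.get(source_id, set())
        return sorted(links, key=lambda l: (l.target_id, l.provider))


if __name__ == "__main__":
    hub = LinkHub()
    # Made-up identifiers for illustration only.
    hub.expose(Link("10.1234/article-1", "10.1234/dataset-9", "References", "repository"))
    for link in hub.consume("10.1234/article-1"):
        print(link.source_id, link.relationship, link.target_id)
```

The point of the shared record shape is that a publisher, a data center and a repository can all contribute to and read from the same hub without pairwise agreements.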
FAIRSharing: Following Metadata Guidelines
https://fairsharing.org/graph/#/collection/bsg-c000041
NHLBI Stage Project
https://www.nhlbidatastage.org/about/
Seven Bridges
Fair4CURES Platform
Storage
Task 1 Workflow
Input
Task 2
Research Object Profiler
Add annotation and
relationships (metadata)
to collection to describe a
research object:
- URI
- Length
- Filename
- Checksums
etc.
Mendeley Data
Research Object Serializer (a
manifest itemizing file names)
Serialise Research Object in a
standard format based on BagIt
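The profiler and serializer steps above (recording filenames and checksums, then writing a manifest in a BagIt-based layout) can be sketched with the standard library alone. This is an illustrative minimum and not the Research Object Composer's code: `write_minimal_bag` and `verify_bag` are hypothetical helpers, and only the `bagit.txt` declaration, the `data/` payload directory and the `manifest-sha256.txt` naming follow the BagIt specification (RFC 8493).

```python
import hashlib
import tempfile
from pathlib import Path


def write_minimal_bag(payload: dict[str, bytes], bag_dir: Path) -> list[str]:
    """Write payload files under data/ plus a SHA-256 manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    manifest_lines = []
    for name in sorted(payload):
        content = payload[name]
        (data_dir / name).write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Each manifest line pairs a checksum with a path relative to the bag.
        manifest_lines.append(f"{digest}  data/{name}")

    # Bag declaration required by the BagIt specification.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )
    return manifest_lines


def verify_bag(bag_dir: Path) -> bool:
    """Recompute each payload checksum and compare it with the manifest."""
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        digest, rel_path = line.split(maxsplit=1)
        actual = hashlib.sha256((bag_dir / rel_path).read_bytes()).hexdigest()
        if actual != digest:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "research-object"
        write_minimal_bag({"readme.txt": b"example payload"}, bag)
        print(verify_bag(bag))
```

Because the manifest travels with the files, a receiving repository can check that a transferred research object arrived intact before unpacking it.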
Mix of digital
objects
Research Object Composer
File source
TOPMed
RO
Open API
RO
Future: need to address how to
deserialize the BagIt bag so that the
user has no additional repacking work
Storing ROs
Creating the RO
Zenodo
Dataverse
https://github.com/ResearchObject/research-object-composer/blob/master/introduction.ipynb
Partners:
https://rdmla.github.io/home/
Research Data Management Librarian Academy:
Mendeley Data integrates through open APIs
+ 35 repositories
(BePress planned)
• Mendeley Data Repository
datasets are automatically
synced with the Pure
curation workflow
• Projects, grants,
equipment, showcase
on portal (planned)
• Mendeley Data Search results
are visible on Scopus
• Notify new articles to Monitor
for data sharing compliance
• Datasets appear as records
on Scopus (planned)
• Mendeley Data usage is
accessible through Plum API
and widget
• PlumX metrics (citations,
usage, social mentions) are
captured and shown on
Mendeley Data Repository
Publish datasets
alongside an article
on Mendeley Data
within the SSRN
publication flow
Publish or link datasets
alongside an article on
Mendeley Data within the
ScienceDirect publication flow
Researcher and
Institutional
Dataset metrics
• User identity & login
• Library (planned)
• Notes (planned)
• Projects (planned)
Mendeley Data
Existing integration
Planned integration
• Mendeley Data indexed
by OpenAIRE
• OpenAIRE Zenodo
repository indexed by
Mendeley Data Search
Long-term
preservation of
published datasets
Links between articles and datasets:
• Contributed by Mendeley
Data to Scholix
• Indexed by Mendeley Data
Search and Data Monitor
• Consumed by Scopus and
ScienceDirect
Integrate with machine-readable
data management plans
• For more than 35 repositories the
metadata as well as the underlying
datasets are indexed by Mendeley
Data Search
• First repositories are actively
integrating with the free and open
‘push API’ of Mendeley Data
Search
• Mint DOIs for Mendeley Data
Repository
• DataCite indexed by
Mendeley Data Search

Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
