RDA Fourth Plenary Keynote - Prof. Barend Mons, Biosemantics Group at Leiden University Medical Center and Head of Node of ELIXIR-NL - Keynote "Bringing Data to Broadway" - Monday 22nd Sept 2014, Amsterdam, the Netherlands
https://rd-alliance.org/plenary-meetings/fourth-plenary/plenary4-programme.html
II-SDV 2017: The Next Era: Deep Learning for Biomedical Research - Dr. Haxel Consult
Deep learning is hot, making waves, delivering results, and is somewhat of a buzzword today. There is a desire to apply deep learning to anything that is digital. Unlike the brain, these artificial neural networks have a very strict, predefined structure. The brain is made up of neurons that talk to each other via electrical and chemical signals; artificial neural networks do not differentiate between these two types of signals. They are essentially a series of advanced, statistics-based exercises that review the past to indicate the likely future. Another buzzword used over the last few years across all industries is "big data". In biomedical and health sciences, both unstructured and structured information constitute "big data". Deep learning needs a lot of data, while "big data" has value only when it generates actionable insight. Given this, the two areas are destined to be married. The time is ripe for a synergistic association that will benefit pharmaceutical companies. It may be only a short time before we have vice presidents of machine learning or deep learning in pharmaceutical and biotechnology companies. This presentation will review the prominent deep learning methods and discuss their usefulness in biomedical and health informatics.
IBC FAIR Data Prototype Implementation slideshow - Mark Wilkinson
A discussion of ways of achieving FAIRness of both metadata and data. Both brute-force approaches and more elegant "projection" approaches are shown.
Relevant papers are at:
doi: 10.7717/peerj-cs.110 (https://peerj.com/articles/cs-110/)
doi: 10.3389/fpls.2016.00641 (https://doi.org/10.3389/fpls.2016.00641)
Spanish Ministerio de Economía y Competitividad grant number TIN2014-55993-R
smartAPIs: EUDAT Semantic Working Group Presentation @ RDA 9th Plenary - Mark Wilkinson
smartAPIs are an approach to the incremental, machine-aided, semantic annotation of Web APIs. Starting from existing, popular standards, we will provide enhanced tools for authoring ever-richer metadata, guided by global community knowledge encapsulated in ontologies, and aided by "smart suggestions" based on mining the metadata from previous API specifications.
The project is led by Michel Dumontier (Maastricht University). This presentation was given on his behalf by Mark Wilkinson (UPM, Madrid; Spanish Ministerio de Economía y Competitividad grant number TIN2014-55993-R)
Data analysis software for upper atmospheric research. The software was written in JavaFX and can handle many kinds of upper atmospheric data collected by ground-based observation.
A Template-Based Approach for Annotating Long-Tailed Datasets - dgarijo
An increasing amount of data is shared on the Web through heterogeneous spreadsheets and CSV files. In order to homogenize and query these data, the scientific community has developed Extract, Transform and Load (ETL) tools and services that help make these files machine-readable in Knowledge Graphs (KGs). However, tabular data may be complex, and the level of expertise required by existing ETL tools makes it difficult for users to describe their own data. In this paper we propose a simple annotation schema to guide users when transforming complex tables into KGs. We have implemented our approach by extending T2WML, a table annotation tool designed to help users annotate their data and upload the results to a public KG. We have evaluated our effort with six non-expert users, obtaining promising preliminary results.
FOOPS!: An Ontology Pitfall Scanner for the FAIR principles - dgarijo
Slides presented at DBpedia Day, at the SEMANTiCS conference in 2021. FOOPS! (available at https://w3id.org/foops) is a validator based on the FAIR principles that guides users in making their ontologies conform to them. For each principle, FOOPS! runs a series of tests and reports errors, suggestions, and ways to follow best practices.
Towards Knowledge Graphs of Reusable Research Software Metadata - dgarijo
Research software is a key asset for understanding, reusing and reproducing results in computational sciences. An increasing amount of software is stored in code repositories, which usually contain human-readable instructions indicating how to use it and set it up. However, developers and researchers often need to spend a significant amount of time to understand how to invoke a software component, prepare data in the required format, and use it in combination with other software. In addition, this time investment makes it challenging to discover and compare software with similar functionality. In this talk I will describe our efforts to address these issues by creating and using Open Knowledge Graphs that describe research software in a machine-readable manner. Our work includes: 1) an ontology that extends schema.org and codemeta, designed to describe software and the specific data formats it uses; 2) an approach to publish software metadata as an open knowledge graph, linked to other Web of Data objects; 3) a framework for automatically extracting metadata from software repositories; and 4) a framework to curate, query, explore and compare research software metadata in a collaborative manner. The talk will illustrate our approach with real-world examples, including a domain application for inspecting and discovering hydrology, agriculture, and economic software models, and the results of our framework when enriching the research software entries in Zenodo.org.
A Linked Data Prototype for the Union Catalog of Digital Archives Taiwan - andrea huang
The linked data paradigm gives any data the potential to link, or be linked, with structural information, internally and externally. To improve the current cultural service of the Union Catalog of Digital Archives Taiwan (catalog.digitalarchives.tw), a linked data prototype was developed that benefits from extending the Art & Architecture Thesaurus (AAT) to provide a machine-understandable catalog service.
However, knowledge engineering is time- and labor-intensive, especially for an archive that is non-Western in culture and multidisciplinary in nature. This makes mapping the data semantics of the UCdaT to international standards and vocabularies extremely challenging.
At this stage, the triple store is an experimental addition to the existing Union Catalog of Digital Archives Taiwan architecture, providing semantic links to target collections for related suggestions. This will guide us in creating a future technical architecture that scales to the whole archive, follows learning-by-doing guidelines, and preserves data that cannot be fully understood at present but can at least be linked by others, whose third-party understandings may support their own reuse.
Eureka Research Workbench: A Semantic Approach to an Open Source Electroni... - Stuart Chalk
Scientists are looking for ways to leverage Web 2.0 technologies in the research laboratory, and as a consequence a number of approaches to web-based electronic notebooks are being evaluated. In this presentation I discuss the Eureka Research Workbench, an electronic laboratory notebook built on semantic technology and XML. Using this approach, the context of the information recorded in the laboratory can be captured and searched along with the data itself. A discussion of the current system is presented, along with the next planned development of the framework and long-term plans relative to linked open data. Presented at the 246th American Chemical Society Meeting in Indianapolis, IN, USA on September 12th, 2013.
Keynote on software sustainability given at the 2nd Annual Netherlands eScience Symposium, November 2014.
Based on the article:
Carole Goble, "Better Software, Better Research", IEEE Internet Computing, vol. 18, no. 5, pp. 4-8, Sept.-Oct. 2014. IEEE Computer Society.
http://www.computer.org/csdl/mags/ic/2014/05/mic2014050004.pdf
http://doi.ieeecomputersociety.org/10.1109/MIC.2014.88
http://www.software.ac.uk/resources/publications/better-software-better-research
II-PIC 2017: Artificial Intelligence, Machine Learning, And Deep Neural Netwo... - Dr. Haxel Consult
Parthiban Srinivasan (VINGYANI, India)
When new technologies become easier to use, they transform industries. That's what's happening with artificial intelligence (AI) and big data. Machine learning is often described as a type of AI where computers learn to do something without being programmed to do it. Deep learning, a subset of machine learning, is proving to work especially well on classification. Big breakthroughs happen when what is suddenly possible meets what is desperately needed. For years, patent analysts have been searching and reviewing terabytes of information, not only patents but also non-patent literature, not only to find prior art but also to identify patents of interest, rate their quality, assess the potential value of patent clusters, and identify potential business partners or infringers. With the rapid increase in the number of patent documents worldwide, demand for their automatic clustering/categorization has grown significantly. Many information science researchers have started to experiment with machine learning tools, but adoption in the patent information space has been sporadic. In this talk, we aim to review the prevailing machine learning techniques and present several sample implementations by various research groups. We will also discuss how data science compares with machine learning, deep learning, AI, statistics and applied mathematics.
FAIR Workflows: A step closer to the Scientific Paper of the Future - dgarijo
Keynote presented at the Computational and Autonomous Workflows workshop (CAW-2021) at Oak Ridge National Laboratory. The keynote gives an overview of the different aspects to take into account when aiming to create FAIR workflows and associated resources.
ICIC 2017: The Next Era: Deep Learning for Biomedical Research - Dr. Haxel Consult
Srinivasan Parthiban (VINGYANI, India)
Deep learning is hot, making waves, delivering results, and is somewhat of a buzzword today. There is a desire to apply deep learning to anything that is digital. Unlike the brain, these artificial neural networks have a very strict, predefined structure. The brain is made up of neurons that talk to each other via electrical and chemical signals; artificial neural networks do not differentiate between these two types of signals. They are essentially a series of advanced, statistics-based exercises that review the past to indicate the likely future. Another buzzword used over the last few years across all industries is "big data". In biomedical and health sciences, both unstructured and structured information constitute "big data". Deep learning needs a lot of data, while "big data" has value only when it generates actionable insight. Given this, the two areas are destined to be married. The time is ripe for a synergistic association that will benefit pharmaceutical companies. It may be only a short time before we have vice presidents of machine learning or deep learning in pharmaceutical and biotechnology companies. This presentation will review the prominent deep learning methods and discuss their usefulness in biomedical and health informatics.
Presentation of the "Coming to terms to FAIR semantics" paper at the 22nd International Conference on Knowledge Engineering and Knowledge Management (EKAW 2020).
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs - dgarijo
In this presentation we describe the Ontology-Based APIs framework (OBA), our approach to automatically creating REST APIs from ontologies while following RESTful API best practices. Given an ontology (or ontology network), OBA uses standard technologies familiar to web developers (the OpenAPI Specification, JSON) and combines them with W3C standards (OWL, JSON-LD frames, and SPARQL) to create maintainable APIs with documentation, unit tests, automated validation of resources, and clients (in Python, JavaScript, etc.) that let users who are not Semantic Web experts access the contents of a target knowledge graph. We showcase OBA with three examples that illustrate the capabilities of the framework for different ontologies.
An increasing number of researchers rely on computational methods to generate the results described in their publications. Research software created to this end is heterogeneous (e.g., scripts, libraries, packages, notebooks, etc.) and usually difficult to find, reuse, compare and understand due to its disconnected documentation (dispersed in manuals, readme files, web sites, and code comments) and a lack of structured metadata to describe it. In this talk I will describe the main challenges in finding, comparing and reusing research software; how structured metadata can help address some of them; the best practices being proposed by the community; and current initiatives to aid their adoption by researchers within EOSC.
Impact: The talk addresses an important aspect of the EOSC infrastructure for quality research software by ensuring that software contributed to the EOSC ecosystem can be found, compared and reused by researchers. The talk also aims to address metadata quality of current research products, which is critical for successful adoption.
Presented at the EOSC symposium
Barend Mons slides from #ISMB 2014: Trends in data publishing. Talk 3 in the "What Bioinformaticians need to know about digital publishing beyond the PDF2" workshop at ISMB 2014, Boston, 16th July 2014
A number of recent milestones in AI have rekindled the faith that human-grade computer intelligence can fuel the next technological revolution. In parallel, and almost independently, the job role of Data Scientist rose to become one of the hottest tickets in the technology sector. Despite the obvious overlap between the domains of Data Science and Artificial Intelligence, the two approaches are sufficiently distinct that choosing the wrong one can cause a product to fail or a hiring process to go wrong. This presentation will offer some clarity and best practices with regard to understanding what data analysis requirements you really have, as opposed to what you think you have.
The Research Object Initiative: Frameworks and Use Cases
Opening talk at the "Interdisciplinary Data Resources to Address the Challenges of Urban Living" Workshop at the Urban Big Data Centre, University of Glasgow, 4 April 2016
Big Data [sorry] & Data Science: What Does a Data Scientist Do? - Data Science London
What 'kind of things' does a data scientist do? What are the foundations and principles of data science? What is a Data Product? What does the data science process look like? Learning from data: Data Modeling or Algorithmic Modeling? - talk by Carlos Somohano @ds_ldn at The Cloud and Big Data: HDInsight on Azure, London, 25/01/13
How to Feed a Data Hungry Organization – by Traveloka Data Team - Traveloka
In Traveloka's inaugural Data Meetup, held in April 2017, Ainun Najib (Head of Data), Dr. Philip Thomas (Lead Data Scientist), and Rendy B. Junior (Lead Data Engineer) shared the journey Traveloka's Data Team has taken so far, so that the audience could learn from the struggles and triumphs of managing Traveloka's burgeoning data.
You will learn more about:
1) Data culture in Traveloka
2) Data engineering in Traveloka
3) Data science in Traveloka
MAKING SENSE OF IOT DATA W/ BIG DATA + DATA SCIENCE - CHARLES CAI - Big Data Week
Charles Cai has more than two decades of experience and a track record of delivering global transformational programmes, from vision and evangelism to end-to-end execution, in global investment banks and energy trading companies, where he has excelled at designing and building innovative, large-scale Big Data systems for high-volume, low-latency trading, global Energy Trading & Risk Management, and advanced temporal and geospatial predictive analytics, as Chief Front Office Technical Architect and Head of Data Science. He is also a frequent speaker at Google Campus, the Big Data Innovation Summit, Cloud World Forum, Data Science London, QCon London, the MoD CIO Symposium, etc., promoting knowledge and best-practice sharing with audiences ranging from developers and data scientists to CXO-level senior executives from both IT and business backgrounds. He has in-depth knowledge and experience of the Scala, Python, C# / F#, C++, Node.js, Java, R and Haskell programming languages across Mobile, Desktop, Hadoop/Spark, Cloud, IoT/MCU and BlockChain, and holds TOGAF9, EMC-DS, AWS CNE4 and other certifications.
Publishing of Scientific Data - Science Foundation Ireland Summit 2010jodischneider
Slides prepared for the Publishing of Scientific Data workshop at the Science Foundation Ireland Summit 2010. I was one of three panelists. We had a lively discussion!
Similar to Prof. Barend Mons, Biosemantics Group at Leiden University Medical Center and Head of Node of ELIXIR-NL - Keynote "Bringing Data to Broadway" (20)
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
The Art of the Pitch: WordPress Relationships and SalesLaura Byrne
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if sometime changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips, and strategies for successful relationship building that leads to closing the deal.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of Information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build inspired by diverse explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies needs to be explicitly articulated, and we need to develop theories of change in context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
7. 2005: Text Mining?
Why bury it first and then mine it again!
8.
9.
10. Part II: The Explicitome and the Elusive Part (our own fault)
The Explicitome: everything we already asserted
11. The Elusive Explicitome Phenomenon
(example from: Yepes & Verspoor, 2013)
# of assertions per article section: abstract 5; narrative 500*; tables/figures 1,000; supplementary data 50K-1M+
# of SNP-Phen associations found: 2% / 4% / 50%*
The Elusive Explicitome: what escapes us (95%)
Hurdle 1: Paywalls
Hurdle 2: 'TIF'walls
Hurdle 3: The Wall of Broken Links
12. Data loss is real and significant, while data growth is staggering
(Nature news, 19 December 2013)
• Computer speed and storage capacity is doubling every 18 months, and this rate is steady
• DNA sequence data has been doubling every 6-8 months over the last 3 years, and this looks to continue for this decade
'Oops, that link was the laptop of my PhD student'
13. The trends in e-Science
• Computer analytics (takes charge)
• Enormity of datasets (beyond narrative)
• Collaborative intelligence (calls for a million minds)
• Irreversible movement (towards OA)
→ FAIR Data Publishing & Stewardship?
22. FAIR for computers vs FAIR for people
• Aerial survey: pattern recognition in 'ridiculograms'
• Human excavation: rationalisation and 'confirmational reading'
'Why would I believe this association'???
23. For knowledge discovery (KD) we need each association only once
Cardinal Assertion (<10^11): n identical assertions, with 'n' different provenances
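The cardinal-assertion idea can be sketched in a few lines: collapse every group of identical assertions into one cardinal assertion that keeps all of its provenances. This is a minimal illustration, not the Biosemantics implementation; the triples and PMIDs are hypothetical examples.

```python
from collections import defaultdict

def to_cardinal(assertions):
    """Collapse identical (subject, predicate, object) triples into one
    cardinal assertion each, keeping every provenance it was seen with."""
    cardinal = defaultdict(list)  # triple -> list of provenances
    for subj, pred, obj, provenance in assertions:
        cardinal[(subj, pred, obj)].append(provenance)
    return dict(cardinal)

# Hypothetical example: the same assertion published twice
assertions = [
    ("BRCA1", "associated_with", "breast cancer", "PMID:0001"),
    ("BRCA1", "associated_with", "breast cancer", "PMID:0002"),
    ("TP53", "associated_with", "Li-Fraumeni syndrome", "PMID:0003"),
]
cardinal = to_cardinal(assertions)
# 2 cardinal assertions; the BRCA1 triple carries 2 provenances
```

The deduplicated store is what makes the <10^11 bound on cardinal assertions plausible, while no provenance is lost.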
24. We publish on fewer than a million LS Concepts!
~10^6 concept clusters (Knowlets)
25. BioSemantics Knowledge Discovery Pipeline (www.biosemantics.org, LUMC - LIACS)
[Diagram] Data sources → 'coordinated' data → nanopub cache → cardinal assertion store → semantic data indexing and modelling → reasoning algorithms. Outputs: trends, phase transitions, 'new' data alerts, differentials, and funding priorities, reached via semantic queries (e.g. by gene or disease).
27. Part IV: Towards Solutions
Bigger is not Better: Zipping the Explicitome
(unavoidable: some science of 'our own', but... as examples, sorry)
28. The Rescued Explicitome
[Diagram] Abstracts, narrative, tables/figures, and supplementary data, plus Electronic Health Databases and Value-Added Databases, feed (with provenance) into the Total Explicitome: an estimated 10^14 asserted associations in 2,500 data sources. Two routes: 'ETL to FAIR' and 'FAIR to read'.
29. Zipping the Explicitome
• 10^14 assertions → 10^11 cardinal assertions → 10^6 concepts
• Semantic MedLine: U+C+CT+EG+GO = 36 M (80% / 20%)
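The compression the slide describes is just orders-of-magnitude arithmetic; a quick sketch makes the 'zipping' ratio explicit (figures taken directly from the slide):

```python
# Orders of magnitude from the slide:
raw_assertions = 10**14       # total asserted associations in the Explicitome
cardinal_assertions = 10**11  # after deduplication to cardinal assertions
concepts = 10**6              # concept clusters (Knowlets)

dedup_factor = raw_assertions // cardinal_assertions
per_concept = cardinal_assertions // concepts
print(f"deduplication factor: {dedup_factor}x")            # → 1000x
print(f"cardinal assertions per concept: ~{per_concept}")  # → ~100000
```

So deduplication alone shrinks the store a thousand-fold before any semantic compression over the ~10^6 concepts.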
30. Part V: (FAIR) data should take CENTER STAGE
(unavoidable: some science of 'our own', but... as examples, sorry)
32. A simplified diagram of a Digital (data) Object, irrespective of technological choices and naming:
• PID
• Metadata (intrinsic)
• 'Provenance' (user defined)
• Data (elements)
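The four-part object on the slide can be sketched as a data structure. This is a hypothetical minimal model (field names and the example PID are mine, not a Digital Object Architecture API):

```python
from dataclasses import dataclass, field

@dataclass
class DigitalObject:
    """The slide's simplified Digital (data) Object: a PID plus intrinsic
    metadata, user-defined provenance, and data elements."""
    pid: str                                        # persistent identifier
    metadata: dict = field(default_factory=dict)    # intrinsic
    provenance: dict = field(default_factory=dict)  # user defined
    data: list = field(default_factory=list)        # elements

# Hypothetical handle-style example
nanopub = DigitalObject(
    pid="hdl:11.T123/example-nanopub",
    metadata={"type": "nanopublication", "created": "2014-09-22"},
    provenance={"asserted_by": "Biosemantics Group"},
    data=["BRCA1 associated_with breast-cancer"],
)
```

The point of the structure is that metadata and provenance travel with the data under one resolvable PID, whatever technology implements it.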
33. Digital Object Architecture
Same structure: PID; Metadata (intrinsic); 'provenance' (user defined); Data (elements).
Nanopublications are Research Objects; some Research Objects are Digital Objects.
34. Data as increasingly FAIR Digital Objects
(each stage shows the same object: PID; Metadata (intrinsic); 'provenance' (user defined); Data (elements))
• Totally UNFAIR
• Usable for Humans
• Findable (PID)
• FAIR metadata
• FAIR data - restricted access
• FAIR data - Open Access
• FAIR data - Open Access / Functionally Linked
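The ladder above is ordinal: each stage strictly includes the guarantees of the one before it. That ordering can be sketched as an enumerated scale (the stage names are my own labels for the slide's stages, not a standard vocabulary):

```python
from enum import IntEnum

class FairStage(IntEnum):
    """The slide's ladder of increasing FAIRness as an ordered scale."""
    TOTALLY_UNFAIR = 0
    USABLE_FOR_HUMANS = 1
    FINDABLE = 2              # object has a resolvable PID
    FAIR_METADATA = 3
    FAIR_DATA_RESTRICTED = 4
    FAIR_DATA_OPEN = 5
    OPEN_FUNCTIONALLY_LINKED = 6

def at_least(stage, required):
    """True when a dataset meets or exceeds a required FAIRness stage."""
    return stage >= required

# Open FAIR data satisfies a 'FAIR metadata' requirement, not vice versa
print(at_least(FairStage.FAIR_DATA_OPEN, FairStage.FAIR_METADATA))  # → True
```

An ordered scale like this is what lets a search index (or a funder) filter datasets by minimum FAIRness rather than a yes/no label.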
37. FAIRport proof of concept: ELIXIR FAIR Data Search Index
[Diagram] Data owners ((supplementary) data, databases, repositories) publish through a FAIRport (FAIR L1/L2) into ELIXIR federated data and an ELIXIR semantic data repository spanning the national nodes (Elixir UK, SWE, NL, Fin., Esp., Nor., ...). End-users, ASPs, in-house IT, bioinformatics tools & applications search for datasets and download data (sub)sets in many formats (XML, RDF, JSON, etc.) via FAIR L3/L4. www.nanopubmed.org
38. Parties needed / typical candidates / NL-example
1. Trusted party: usually public sector, with a 'data stewardship' mandate
2. Executive party / coordinator: usually public or private sector, with expert knowledge on the project and relation management
Technology providers:
3-4. PID/ARTA stewards: DTL/ELIXIR-nl, others
5. DOA architecture / IMS: CNRI + EURETOS
6. Publishing pipeline: EURETOS
7. Repository software
8. eInfrastructure
39. Malpractices.......
• Journal Impact Factor
• Ignore Altmetrics
• No data stewardship plan
• Obstruct tenure of data experts
• 'Supplementary data'
→ Knowledge sharing impaired
40. Interoperability landscape (4/10/14)
[Word cloud] NITRD, FORCE11, ORCID, VIVO, EUDAT, DATAVERSE, BD2K, DANS, ELIXIR, NIH Commons, H2020, DRYAD, RDA, FigShare, Nanopub, Biosharing, Elsevier, Science, Nature, SageBio, HVP, DataCite, EGA, Research Objects, Nebulus, Embassy, SADI, EURETOS, YARCdata, IMI, ISA, Open PHACTS, Data Fabric
41. Good practices (apart from collaborating)
• 'Professional data publishing'
• RO Impact Factor
• Award Altmetrics
• 5% for data stewardship plan
• Train & tenure data experts
→ FAIR play
43. Endorsed by 82 organisations and [y] individuals
1. FAIR guiding principles with public discussion forum:
https://www.force11.org/group/fairgroup/fairprinciples
2. Notes and Annexes: https://www.force11.org/node/6062/
3. Group home page https://www.force11.org/group/fairgroup
COMMENT: until October 1st
ENDORSE: after October 1st