Real-World Data Challenges: Moving Towards Richer Data Ecosystems

| 1
Anita de Waard 0000-0002-9034-4119
VP Research Data Collaborations
Elsevier RDM Services
a.dewaard@elsevier.com
Big Data PI Meeting
March 16, 2016

| 2
Trend #1: Repositories are becoming virtual labs
[Figure: Planned Earth System Grid System Evolution and Planned Earth System Grid System Data Archival]
Project phases: ESG-I (1999-2001), ESG-II (2001-2006), ESG-CET (2006-2011), ESGF (2011-2020), ESGF-VL (2020-)
Legend: prototype capabilities, usable capabilities, future capabilities
Archived data: Model Intercomparison Projects; Remote Sensing, In Situ, Climatology, Diagnostics, Ecosystem, Hydrology, Biology, etc.
Archive size: petabytes (10^15) growing to exabytes (10^18); timeline markers 1999, 2017, 2022
Stages: Centralized Archive, Distributed Data Ecosystem, Virtual Laboratory
Source: Dean Williams, Lawrence Livermore/ESGF, March 1st 2017

| 3
Trend #2: Scientists are Moving ‘Beyond Downloads’

| 4
Trend #3: Computers are scientists, too!
“intelligent systems for computer-aided discovery can complement and integrate into the insight generation loop in scalable ways…”
http://ieeexplore.ieee.org/abstract/document/7515118/: Computer-Aided Discovery: Toward Scientific Insight Generation with Machine Support
“This work combines time series Principal Component Analysis with InSAR to constrain the space of possible model explanations on current empirical data sets and achieve a better identification of deformation patterns”
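A minimal sketch of the time series PCA idea in the quote above, assuming a small synthetic displacement matrix (rows are acquisition epochs, columns are pixels). The data, the function name `first_principal_component`, and the use of power iteration are illustrative assumptions, not the method of the cited paper.

```python
import math

def first_principal_component(rows, iterations=200):
    """Leading principal component of `rows` (observations x variables).

    Returns (component, scores, explained_fraction). Uses plain power
    iteration on the covariance matrix, which is fine for a tiny example.
    """
    n, m = len(rows), len(rows[0])
    # Center each column (pixel) on its mean over time.
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    centered = [[r[j] - means[j] for j in range(m)] for r in rows]
    # Sample covariance matrix between pixels (m x m).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(m)]
           for i in range(m)]
    # Power iteration; the uniform start vector is an assumption and would
    # fail only if it were exactly orthogonal to the leading component.
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(m))
                     for i in range(m))
    total_variance = sum(cov[i][i] for i in range(m))
    # Scores: projection of each epoch onto the component (the temporal mode).
    scores = [sum(c[j] * v[j] for j in range(m)) for c in centered]
    explained = eigenvalue / total_variance if total_variance else 0.0
    return v, scores, explained

# Synthetic example (hypothetical): three pixels share one linear
# deformation trend with different amplitudes, plus a small perturbation.
pattern = [1.0, 0.5, -0.2]
series = [[pattern[j] * t + 0.01 * ((t * (j + 1)) % 3 - 1) for j in range(3)]
          for t in range(10)]

component, scores, explained = first_principal_component(series)
print("spatial pattern:", [round(x, 3) for x in component])
print("explained variance fraction:", round(explained, 4))
```

The component recovers the shared spatial pattern and the scores give its evolution over time; on real InSAR stacks a library SVD would be the more usual choice.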

| 5
Raising many technical/organisational/policy questions:
• Is Long-Tail Data + Semantics = Big Data?
• Is Data Science a field, or a skill? (A department, or a class?)
• Are supercomputing centers research departments or bits of infrastructure? (And if infrastructure, are they part of IT? “Oh, no, anything but that!”)
• Are repositories places to store outputs, or places where science is conducted?
• If so, how are repositories and HPCs recognised and rewarded?
• How can we keep track of (micro)provenance of parts of data sets?
• Should we explore Blockchain technology for this? (“Oh no, anything but that!”)
• Is a piece of software part of the University’s Research Outputs?
• If so, how do we reward brilliant coders who blog, but don’t write?
• How do we reward (virtual) collaboration?
• Why won’t those damn scientists share their data?
• Who will own the Data Science Cloud: Amazon? Or the joint HPCs (NDS?)? Is the NIH Data Commons the model? Or is this a free-for-all? What is the role of commercial parties?
• Is data curation/stewardship a part of science, or a glorified administrator's job?
• What is the role of libraries, in all this?
• And why the hell is a publisher talking about it?

| 6
What Elsevier is Interested in: Supporting RDM Networks
[Figure: research data management (RDM) workflow, shown in two panels]
Research cycle steps: Find topic; Identify gaps; Plan & fund; Discover data, people, methods & protocols; Collect, analyze & visualize; Store, preserve & share; Publish; Prepare, reproduce, re-use & benchmark
Systems and actors: Institutional data repositories; Lab ELN(s); LIMS; Data center; Faculty; Data journal; Journal; Link to article; Data search; General search; Domain-specific repositories
Added in the second panel: Data Management Plans; Metadata, methods & protocols ready for preservation and publishing; Publish data (under embargo); Secure discoverability in & outside the institution; Plan each step from experiment to publish

| 7
What Elsevier is Interested in: Knowledge Graphs in Life Science
Biological pathways extracted via semantic text mining:
A upregulates B
B upregulates C
C increases disease D
A → B → C → D
Normalizing vocabularies required: proteins, diseases, drugs, chemicals
Bioactivities through text analysis: IC50 6.3nM, kinase binding assay; 10mM concentration
Chemical structures and properties: InChI, Name
Vocabularies: NCBI, UniProt; EMTREE; ReaxysTree, Structures
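A minimal sketch of how the pathway statements on this slide could be held as a small knowledge graph and chained from A to D. The synonym table, the function names, and the breadth-first search are illustrative assumptions, not a description of Elsevier's pipeline.

```python
from collections import defaultdict, deque

# Hypothetical synonym table standing in for vocabulary normalization
# (the slide lists proteins, diseases, drugs, chemicals as needing this).
SYNONYMS = {"protein a": "A", "disease d": "D"}

def normalize(term):
    """Map a surface form to a canonical identifier when one is known."""
    cleaned = term.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)

def build_graph(triples):
    """Index (subject, predicate, object) triples by subject."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph

def find_path(graph, start, goal):
    """Breadth-first search; returns the chain of triples or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, predicate, nxt)]))
    return None

# Statements as they might come out of text mining, before normalization.
raw_statements = [
    ("Protein A", "upregulates", "B"),
    ("B", "upregulates", "C"),
    ("C", "increases", "Disease D"),
]
triples = [(normalize(s), p, normalize(o)) for s, p, o in raw_statements]
graph = build_graph(triples)
path = find_path(graph, "A", "D")
for subject, predicate, obj in path:
    print(subject, predicate, obj)
```

Normalizing before building the graph is what lets statements mined from different papers connect into one chain.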

| 8
What Elsevier is Interested in: Knowledge Graphs in Research

| 9
Thank you!
Links to things we’re involved with:
• https://www.elsevier.com/connect/10-aspects-of-highly-effective-research-data
• https://www.elsevier.com/about/open-science/research-data
• https://www.hivebench.com
• https://data.mendeley.com/
• https://datasearch.elsevier.com/
• https://www.elsevier.com/books-and-journals/content-innovation/data-base-linking
• http://www.journals.elsevier.com/softwarex/
• https://www.elsevier.com/physical-sciences/earth-and-planetary-sciences/the-2015-international-data-rescue-award-in-the-geosciences
• https://rd-alliance.org/groups/rdawds-publishing-data-services-wg.html
• https://www.force11.org/
• http://www.nationaldataservice.org/
• https://rd-alliance.org/
Anita de Waard, a.dewaard@elsevier.com
