'Stories that persuade with data' - talk at CENDI meeting January 9 2014

Anita de Waard
Anita de WaardVice President, Research Data Collaborations at Elsevier
Stories, that Persuade With Data:
Some Thoughts on Making Networks
of Knowledge.
Anita de Waard
VP Research Data Collaborations
a.dewaard@elsevier.com
Outline:
• Stories, that persuade with data
• Networks of claims and data
• Research Data Management: some
thoughts.
Scientific articles are stories...
The Story of Goldilocks and the
Three Bears

Story

Grammar

Paper

The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its
Interaction with Gfi-1/Senseless Proteins

Once upon a time

Time

Setting

Background

The mechanisms mediating SCA1 pathogenesis are still not fully
understood, but some general principles have emerged.

a little girl named Goldilocks

Characters

Objects of study the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,

She went for a walk in the forest.
Pretty soon, she came upon a
house.

Location

Experimental
setup

studied and compared in vivo effects and interactions to those of the
human protein

She knocked and, when no one
answered,

Goal

Research
goal

Gain insight into how Atx-1's function contributes to SCA1 pathogenesis.
How these interactions might contribute to the disease process and how
they might cause toxicity in only a subset of neurons in SCA1 is not fully
understood.

she walked right in.

Attempt

Hypothesis

Atx-1 may play a role in the regulation of gene expression

At the table in the kitchen, there
were three bowls of porridge.

Name

Name

dAtX-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in File

Goldilocks was hungry.

Subgoal

Subgoal

test the function of the AXH domain

She tasted the porridge from the
first bowl.

Attempt

Method

overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and
Perrimon, 1993) and compared its effects to those of hAtx-1.

This porridge is too hot! she
exclaimed.

Outcome

Results

Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives
expression in the differentiated R1-R6 photoreceptor cells (Mollereau et a
2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as
does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion,
overexpression of either Atx-1 does not show obvious morphological
changes in the photoreceptor cells

Data

(data not shown),

Results

both genotypes show many large holes and loss of cell integrity at 28 days

So, she tasted the porridge from the Activity
second bowl.
3
This porridge is too cold, she said
Outcome

Theme

Episode 1
...that persuade (editors/authors/readers!)…
Aristotle

Quintilian

Scientific Paper

prooimion

The introduction of a speech, where one announces the
Introduction subject and purpose of the discourse, and where one usually
/ exordium
employs the persuasive appeal to ethos in order to establish
credibility with the audience.

Introduction:
positioning

prothesis

Statement
of
The speaker here provides a narrative account of what has
Facts/narrati happened and generally explains the nature of the case.
o

Introduction: research
question

Summary/
propostitio

The propositio provides a brief summary of what one is about
to speak on, or concisely puts forth the charges or accusation.

Summary of contents

Proof/
confirmatio

The main body of the speech where one offers logical
arguments as proof. The appeal to logos is emphasized here.

Results

Refutation/
refutatio

As the name connotes, this section of a speech was devoted to
answering the counterarguments of one's opponent.

Related Work

peroratio

Following the refutatio and concluding the classical oration, the
peroratio conventionally employed appeals through pathos,
and often included a summing up.

Discussion: summary,
implications.

pistis

epilogos
... with data.

5
As claims get cited, they become facts:
Voorhoeve et al, Cell, 2006:
To investigate the possibility that miR-372 and miR-373 suppress the
expression of LATS2, we...

Hypothesis

Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373
effects on cell proliferation and tumorigenicity,
Implication

Raver-Shapira et.al, JMolCell 2007
Cited Implication
... two miRNAs, miRNA-372 and-373, function as potential novel oncogenes in
testicular germ cell tumors by inhibition of LATS2 expression, which suggests that
Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).
Yabuta, JBioChem 2007:
miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006)

Fact
There are many problems
with this system!
• There are too many papers to read – for
people.
• Papers are not written in a way that makes
reading easy – for computers.
• Issues with reproducibility.
• Hard to access or assess the data.
• And how do we know how many data points a
claim was based on?
What might work: networks of claims and data:
Claim
PHC

undergo

Paper A:

Growth arrest

Paper B:
implication

implication

method

fact
method link

method

fact

goal

fact

goal

fact

results

results

data 1

data 4
data 2

data 3

data 5

data 6
How do we get there? Find claims:
E.g., scientific discourse analysis:
In contrast with previous hypotheses compact plaques form before significant
deposition of diffuse A beta, suggesting that different mechanisms are involved
in the deposition of diffuse amyloid and the aggregation into plaques.
Entities
Relationships
Temporality
Connections

Status

core information
(proposition)

information extraction

discourse structure

discourse analysis

rhetorical
metadiscourse

discourse analysis

thematic roles

Sándor, Àgnes and de Waard, Anita, (2012).
Turn claims into formal representations:
Biological statement with BEL/ epistemic
markup

BEL representation:

Epistemic
evaluation

These miRNAs neutralize p53-mediated CDK
inhibition, possibly through direct inhibition
of the expression of the tumor-suppressor
LATS2.

r(MIR:miR-372) |(tscript(p(HUGO:Trp53)) -|
kin(p(PFH:”CDK Family”)))
Increased abundance of miR372 decreases abundance of
LATS2
r(MIR:miR-372) -|
r(HUGO:LATS2)

Value =
Possible
Source =
Unknown
Basis =
Unknown

Biological statement with
Medscan/epistemic markup

MedScan Representation:

Epistemic
evaluation

Furthermore, we present evidence that the
secretion of nesfatin-1 into the culture
media was dramatically increased during the
differentiation of 3T3-L1 preadipocytes into
adipocytes (P < 0.001) and after treatments
with TNF-alpha, IL-6, insulin, and
dexamethasone (P < 0.01).

IL-6  NUCB2 (nesfatin-1)
Relation: MolTransport
Effect: Positive
CellType: Adipocytes
Cell Line: 3T3-L1

Value =
Probable
Source =
Author
Basis = Data
Use Linked Data to point to claims, and connect them:
the xml is fixed, but the structure is open!

allows for layers of annotation
but we all know
she was deluded then

said @anitawaard
on January 9, 2014

<ce:section id=#123>

this says

mice like cheese
What about the data?
•
•
•
•

Can we see it?
How can we evaluate it?
Can it be reproduced?
Can we combine or compare data points?
Elsevier Research Data Services
• Goals:
–
–
–
–

Increase data preservation: quantity and quality
Improve data use: by and for authors, readers, and lay people
Enhance interoperability: between systems, journals, databases
Help develop a sustainable data infrastructure.

• Guiding principles:
– In principle, all data stays open
– Work with existing repositories and tools (so URLs, front end etc
stay where they are)
– 2013/2014: Series of pilots and questionnaires to drive data
strategy/data policy and contribute optimally to an integrated
data infrastructure, enabling networked knowledge.
Research data management today:
Using antibodies
and squishy bits
Grad Students experiment
and enter details into their
lab notebook.
The PI then tries to make
sense of their slides,
and writes a paper.
End of story.
Where research data goes now:
> 50 My Papers
2 M scientists
2 My papers/year

A small portion of data
(1-2%?) stored in small,
topic-focused
data repositories

Majority of data
(90%?) is stored
on local hard drives

PDB:
88,3 k

PetDB:
1,5 k
MiRB:
25k

Some data
(8%?) stored in large,
generic data
repositories

Dryad:
7,631 files

SedDB:
0.6 k
TAIR:
72,1 k

Dataverse:
0.6 My

Institutional
Repositories
An Urban Legend is born:
• How can we make a standard neuroscience
wet lab more data-sharing savvy?
• Incorporate structured workflows into the daily
practice of a typical electrophysiology lab (the
Urban Lab at CMU)
– What does it take?
– Where are points of conflict?

• 1-year pilot, funded by Elsevier RDS:
– CMU: Shreejoy Tripathy, manage/user test
– Elsevier: development, UI, project management
Annotating data during
experimentation:
What does high-quality data
curation take?
Pilot project with IEDA:
– Build a database for lunar geochemistry
– Write joint report on building
repository, curation, costs and
challenges
Some thoughts
about the role of
the institute:

Funding
Agencies
Performance
reporting
Library

Institution
Research Office

Usage/Citation
reporting
Institutional
Repository

Indexing
Integrated
Performance
Query

Usage/Citation
reporting
Indexing

Research Data
Repositories

Unified Metadata Layer

Curation

Deposit /
Store
Indexing
Generic Data Storage
(such as Dropbox)

Electronic Lab
Notebooks

Integrated
Data Search

Data Flow

Deposit /
Store

Indexing & Search

Researchers

Performance
Reporting
Data Initiatives:
• Data Citation group:
– Synthesize principles of proper data citation
– ‘Declaration of Data Citation Principles’, 8 principles of
successful data citation -http://www.force11.org/datacitation

• Resource Identification Initiative:
– Promote research resource identification, discovery, and reuse
– Resource Identification Portal http://scicrunch.com/resources
– Central location for obtaining research resource identifiers
(RRIDs) for materials and software used in biomedical research
• Antibody: Abgent Cat# AP7251E, ABR:AB_2140114
• Tool: CellProfiler Image Analysis Software, NIFRegistry:nif-0000-00280
• Organism: MGI:MGI:3840442
In summary:
• Stories, that persuade with data:
– We need better ways to communicate science!

• Networks of claims and data:
– Promising steps towards identifying claims
– Entity identification and Linked Data helps
– Problem: access to data

• Research Data Management:
– Key issue: get data in up-front
– Evaluate and scale up role of repositories
– Codevelop view of role of institutions/libraries
Questions?
Anita de Waard
VP Research Data Collaborations
a.dewaard@elsevier.com

http://researchdata.elsevier.com/
Thank you!
Collaborations and discussions gratefully acknowledged:
• CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Rick
Gerkin, Santosh Chandrasekaran, Matthew Geramita, Eduard
Hovy
• UCSD: Phil Bourne, Brian Shoettlander, David Minor, Declan
Fleming, Ilya Zaslavsky
• NIF/Force11: Maryann Martone, Anita Bandrowski
• OHSU: Melissa Haendel, Nicole Vasilevsky
• California Digital Library: Carly Strasser, John Kunze, Stephen
Abrams
• IEDA: Kerstin Lehnert, Annika
• Elsevier: Mark Harviston, Jez Alder, David Marques
1 of 23

Recommended

Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO... by
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...Neuroscience Information Framework
1.3K views31 slides
From Advanced Queries to Algorithms and Graph-Based ML: Tackling Diabetes wit... by
From Advanced Queries to Algorithms and Graph-Based ML: Tackling Diabetes wit...From Advanced Queries to Algorithms and Graph-Based ML: Tackling Diabetes wit...
From Advanced Queries to Algorithms and Graph-Based ML: Tackling Diabetes wit...Neo4j
327 views34 slides
Integration of heterogeneous data by
Integration of heterogeneous dataIntegration of heterogeneous data
Integration of heterogeneous dataLars Juhl Jensen
294 views252 slides
Pydata London January 2017 by
Pydata London January 2017Pydata London January 2017
Pydata London January 2017Edward Perello
149 views43 slides
2015 ohsu-metagenome by
2015 ohsu-metagenome2015 ohsu-metagenome
2015 ohsu-metagenomec.titus.brown
1K views57 slides
Literature mining and large-scale data integration by
Literature mining and large-scale data integrationLiterature mining and large-scale data integration
Literature mining and large-scale data integrationLars Juhl Jensen
316 views130 slides

More Related Content

What's hot

2016 davis-plantbio by
2016 davis-plantbio2016 davis-plantbio
2016 davis-plantbioc.titus.brown
729 views55 slides
2015 aem-grs-keynote by
2015 aem-grs-keynote2015 aem-grs-keynote
2015 aem-grs-keynotec.titus.brown
1.4K views32 slides
Literature Mining and Systems Biology by
Literature Mining and Systems BiologyLiterature Mining and Systems Biology
Literature Mining and Systems BiologyLars Juhl Jensen
418 views68 slides
Quality Assessment of Biomedical Metadata using Topic Modeling by
Quality Assessment of Biomedical Metadata using Topic ModelingQuality Assessment of Biomedical Metadata using Topic Modeling
Quality Assessment of Biomedical Metadata using Topic ModelingStuti Nayak
138 views21 slides
Practical Guide to the $1000 Genome (2014) by
Practical Guide to the $1000 Genome (2014)Practical Guide to the $1000 Genome (2014)
Practical Guide to the $1000 Genome (2014)AllSeq
2.6K views47 slides
Text Mining for Biocuration of Bacterial Infectious Diseases by
Text Mining for Biocuration of Bacterial Infectious DiseasesText Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious DiseasesDan Sullivan, Ph.D.
623 views26 slides

What's hot(13)

Literature Mining and Systems Biology by Lars Juhl Jensen
Literature Mining and Systems BiologyLiterature Mining and Systems Biology
Literature Mining and Systems Biology
Lars Juhl Jensen418 views
Quality Assessment of Biomedical Metadata using Topic Modeling by Stuti Nayak
Quality Assessment of Biomedical Metadata using Topic ModelingQuality Assessment of Biomedical Metadata using Topic Modeling
Quality Assessment of Biomedical Metadata using Topic Modeling
Stuti Nayak138 views
Practical Guide to the $1000 Genome (2014) by AllSeq
Practical Guide to the $1000 Genome (2014)Practical Guide to the $1000 Genome (2014)
Practical Guide to the $1000 Genome (2014)
AllSeq2.6K views
Text Mining for Biocuration of Bacterial Infectious Diseases by Dan Sullivan, Ph.D.
Text Mining for Biocuration of Bacterial Infectious DiseasesText Mining for Biocuration of Bacterial Infectious Diseases
Text Mining for Biocuration of Bacterial Infectious Diseases
Building a Biomedical Knowledge Garden by Benjamin Good
Building a Biomedical Knowledge Garden Building a Biomedical Knowledge Garden
Building a Biomedical Knowledge Garden
Benjamin Good6.1K views
Exploring proteins, chemicals and their interactions with STRING and STITCH by biocs
Exploring proteins, chemicals and their interactions with STRING and STITCHExploring proteins, chemicals and their interactions with STRING and STITCH
Exploring proteins, chemicals and their interactions with STRING and STITCH
biocs552 views

Similar to 'Stories that persuade with data' - talk at CENDI meeting January 9 2014

How Scientists Read, How Computers Read, and What We Should Do by
How Scientists Read, How Computers Read, and What We Should DoHow Scientists Read, How Computers Read, and What We Should Do
How Scientists Read, How Computers Read, and What We Should DoAnita de Waard
1.9K views37 slides
Are we finally ready for transclusion?* by
Are we finally ready for transclusion?*Are we finally ready for transclusion?*
Are we finally ready for transclusion?*Paul Groth
785 views14 slides
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy... by
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...Anita de Waard
674 views39 slides
How to persuade with data by
How to persuade with dataHow to persuade with data
How to persuade with dataAnita de Waard
1.4K views41 slides
Argumentation in biology papers by
Argumentation in biology papersArgumentation in biology papers
Argumentation in biology papersAnita de Waard
1.3K views38 slides
2013 alumni-webinar by
2013 alumni-webinar2013 alumni-webinar
2013 alumni-webinarc.titus.brown
810 views50 slides

Similar to 'Stories that persuade with data' - talk at CENDI meeting January 9 2014(20)

How Scientists Read, How Computers Read, and What We Should Do by Anita de Waard
How Scientists Read, How Computers Read, and What We Should DoHow Scientists Read, How Computers Read, and What We Should Do
How Scientists Read, How Computers Read, and What We Should Do
Anita de Waard1.9K views
Are we finally ready for transclusion?* by Paul Groth
Are we finally ready for transclusion?*Are we finally ready for transclusion?*
Are we finally ready for transclusion?*
Paul Groth785 views
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy... by Anita de Waard
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
Anita de Waard674 views
How to persuade with data by Anita de Waard
How to persuade with dataHow to persuade with data
How to persuade with data
Anita de Waard1.4K views
Argumentation in biology papers by Anita de Waard
Argumentation in biology papersArgumentation in biology papers
Argumentation in biology papers
Anita de Waard1.3K views
EEG and Telemetry: Best Practices for Managing Large Data Sets to Investigate... by InsideScientific
EEG and Telemetry: Best Practices for Managing Large Data Sets to Investigate...EEG and Telemetry: Best Practices for Managing Large Data Sets to Investigate...
EEG and Telemetry: Best Practices for Managing Large Data Sets to Investigate...
InsideScientific1.8K views
Amia tb-review-08 by Russ Altman
Amia tb-review-08Amia tb-review-08
Amia tb-review-08
Russ Altman291 views
CINECA webinar slides: Modular and reproducible workflows for federated molec... by CINECAProject
CINECA webinar slides: Modular and reproducible workflows for federated molec...CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECA webinar slides: Modular and reproducible workflows for federated molec...
CINECAProject9 views
IJSRED-V2I1P5 by IJSRED
IJSRED-V2I1P5IJSRED-V2I1P5
IJSRED-V2I1P5
IJSRED22 views
dkNET Webinar: RRIDs and Naughty Cell Lines 04/12/2019 by dkNET
dkNET Webinar: RRIDs and Naughty Cell Lines 04/12/2019dkNET Webinar: RRIDs and Naughty Cell Lines 04/12/2019
dkNET Webinar: RRIDs and Naughty Cell Lines 04/12/2019
dkNET502 views
Data analysis & integration challenges in genomics by mikaelhuss
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
mikaelhuss10.2K views
Talk at ISWC 2012 Workshop on Semantic Technologies Applied to Biomedical In... by Anita de Waard
 Talk at ISWC 2012 Workshop on Semantic Technologies Applied to Biomedical In... Talk at ISWC 2012 Workshop on Semantic Technologies Applied to Biomedical In...
Talk at ISWC 2012 Workshop on Semantic Technologies Applied to Biomedical In...
Anita de Waard1.2K views
Friend harvard 2013-01-30 by Sage Base
Friend harvard 2013-01-30Friend harvard 2013-01-30
Friend harvard 2013-01-30
Sage Base775 views
Crofton McKim Conf Thyroid QSAR Talk 10-17-2008 by KevinCrofton
Crofton McKim Conf Thyroid QSAR Talk 10-17-2008Crofton McKim Conf Thyroid QSAR Talk 10-17-2008
Crofton McKim Conf Thyroid QSAR Talk 10-17-2008
KevinCrofton293 views
Bioinformatics final by Rainu Rajeev
Bioinformatics finalBioinformatics final
Bioinformatics final
Rainu Rajeev432 views
Molecular and data visualization in drug discovery by Deepak Bandyopadhyay
Molecular and data visualization in drug discoveryMolecular and data visualization in drug discovery
Molecular and data visualization in drug discovery

More from Anita de Waard

Mendeley Data: Enhancing Data Discovery, Sharing and Reuse by
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseAnita de Waard
464 views25 slides
Why would a publisher care about open data? by
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?Anita de Waard
308 views59 slides
Research Object Composer: A Tool for Publishing Complex Data Objects in the C... by
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Anita de Waard
383 views12 slides
NFAIS Talk on Enabling FAIR Data by
NFAIS Talk on Enabling FAIR DataNFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR DataAnita de Waard
230 views21 slides
CNI 2018: A Research Object Authoring Tool for the Data Commons by
CNI 2018: A Research Object Authoring Tool for the Data CommonsCNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsAnita de Waard
301 views16 slides
Enabling FAIR Data: TAG B Authoring Guidelines by
Enabling FAIR Data: TAG B Authoring GuidelinesEnabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring GuidelinesAnita de Waard
187 views4 slides

More from Anita de Waard(20)

Mendeley Data: Enhancing Data Discovery, Sharing and Reuse by Anita de Waard
Mendeley Data: Enhancing Data Discovery, Sharing and ReuseMendeley Data: Enhancing Data Discovery, Sharing and Reuse
Mendeley Data: Enhancing Data Discovery, Sharing and Reuse
Anita de Waard464 views
Why would a publisher care about open data? by Anita de Waard
Why would a publisher care about open data?Why would a publisher care about open data?
Why would a publisher care about open data?
Anita de Waard308 views
Research Object Composer: A Tool for Publishing Complex Data Objects in the C... by Anita de Waard
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Anita de Waard383 views
NFAIS Talk on Enabling FAIR Data by Anita de Waard
NFAIS Talk on Enabling FAIR DataNFAIS Talk on Enabling FAIR Data
NFAIS Talk on Enabling FAIR Data
Anita de Waard230 views
CNI 2018: A Research Object Authoring Tool for the Data Commons by Anita de Waard
CNI 2018: A Research Object Authoring Tool for the Data CommonsCNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data Commons
Anita de Waard301 views
Enabling FAIR Data: TAG B Authoring Guidelines by Anita de Waard
Enabling FAIR Data: TAG B Authoring GuidelinesEnabling FAIR Data: TAG B Authoring Guidelines
Enabling FAIR Data: TAG B Authoring Guidelines
Anita de Waard187 views
Scientific facts are myths, told through fairytales and spread by gossip. by Anita de Waard
Scientific facts are myths, told through fairytales and spread by gossip.Scientific facts are myths, told through fairytales and spread by gossip.
Scientific facts are myths, told through fairytales and spread by gossip.
Anita de Waard396 views
Data, Data Everywhere: What's A Publisher to Do? by Anita de Waard
Data, Data Everywhere: What's  A Publisher to Do?Data, Data Everywhere: What's  A Publisher to Do?
Data, Data Everywhere: What's A Publisher to Do?
Anita de Waard338 views
Talk on Research Data Management by Anita de Waard
Talk on Research Data ManagementTalk on Research Data Management
Talk on Research Data Management
Anita de Waard546 views
Networked Science, And Integrating with Dataverse by Anita de Waard
Networked Science, And Integrating with DataverseNetworked Science, And Integrating with Dataverse
Networked Science, And Integrating with Dataverse
Anita de Waard596 views
Big Data and the Future of Publishing by Anita de Waard
Big Data and the Future of PublishingBig Data and the Future of Publishing
Big Data and the Future of Publishing
Anita de Waard276 views
Real-World Data Challenges: Moving Towards Richer Data Ecosystems by Anita de Waard
Real-World Data Challenges: Moving Towards Richer Data EcosystemsReal-World Data Challenges: Moving Towards Richer Data Ecosystems
Real-World Data Challenges: Moving Towards Richer Data Ecosystems
Anita de Waard593 views
Data Repositories: Recommendation, Certification and Models for Cost Recovery by Anita de Waard
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Anita de Waard423 views
The Economics of Data Sharing by Anita de Waard
The Economics of Data SharingThe Economics of Data Sharing
The Economics of Data Sharing
Anita de Waard374 views
Public Identifiers in Scholarly Publishing by Anita de Waard
Public Identifiers in Scholarly PublishingPublic Identifiers in Scholarly Publishing
Public Identifiers in Scholarly Publishing
Anita de Waard129 views
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum by Anita de Waard
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne UlitmatumElsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Elsevier‘s RDM Program: Habits of Effective Data and the Bourne Ulitmatum
Anita de Waard519 views
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data by Anita de Waard
Elsevier‘s RDM Program: Ten Habits of Highly Effective DataElsevier‘s RDM Program: Ten Habits of Highly Effective Data
Elsevier‘s RDM Program: Ten Habits of Highly Effective Data
Anita de Waard573 views
RDA-WDS Publishing Data Interest Group by Anita de Waard
RDA-WDS Publishing Data Interest GroupRDA-WDS Publishing Data Interest Group
RDA-WDS Publishing Data Interest Group
Anita de Waard401 views

Recently uploaded

Melek BEN MAHMOUD.pdf by
Melek BEN MAHMOUD.pdfMelek BEN MAHMOUD.pdf
Melek BEN MAHMOUD.pdfMelekBenMahmoud
14 views1 slide
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf by
STKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdfSTKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdf
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdfDr. Jimmy Schwarzkopf
19 views29 slides
AMAZON PRODUCT RESEARCH.pdf by
AMAZON PRODUCT RESEARCH.pdfAMAZON PRODUCT RESEARCH.pdf
AMAZON PRODUCT RESEARCH.pdfJerikkLaureta
26 views13 slides
Special_edition_innovator_2023.pdf by
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdfWillDavies22
17 views6 slides
Piloting & Scaling Successfully With Microsoft Viva by
Piloting & Scaling Successfully With Microsoft VivaPiloting & Scaling Successfully With Microsoft Viva
Piloting & Scaling Successfully With Microsoft VivaRichard Harbridge
12 views160 slides
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors by
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensorssugiuralab
19 views15 slides

Recently uploaded(20)

STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf by Dr. Jimmy Schwarzkopf
STKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdfSTKI Israeli Market Study 2023   corrected forecast 2023_24 v3.pdf
STKI Israeli Market Study 2023 corrected forecast 2023_24 v3.pdf
AMAZON PRODUCT RESEARCH.pdf by JerikkLaureta
AMAZON PRODUCT RESEARCH.pdfAMAZON PRODUCT RESEARCH.pdf
AMAZON PRODUCT RESEARCH.pdf
JerikkLaureta26 views
Special_edition_innovator_2023.pdf by WillDavies22
Special_edition_innovator_2023.pdfSpecial_edition_innovator_2023.pdf
Special_edition_innovator_2023.pdf
WillDavies2217 views
Piloting & Scaling Successfully With Microsoft Viva by Richard Harbridge
Piloting & Scaling Successfully With Microsoft VivaPiloting & Scaling Successfully With Microsoft Viva
Piloting & Scaling Successfully With Microsoft Viva
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors by sugiuralab
TouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective SensorsTouchLog: Finger Micro Gesture Recognition  Using Photo-Reflective Sensors
TouchLog: Finger Micro Gesture Recognition Using Photo-Reflective Sensors
sugiuralab19 views
Igniting Next Level Productivity with AI-Infused Data Integration Workflows by Safe Software
Igniting Next Level Productivity with AI-Infused Data Integration Workflows Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Igniting Next Level Productivity with AI-Infused Data Integration Workflows
Safe Software263 views
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... by Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker37 views
Serverless computing with Google Cloud (2023-24) by wesley chun
Serverless computing with Google Cloud (2023-24)Serverless computing with Google Cloud (2023-24)
Serverless computing with Google Cloud (2023-24)
wesley chun11 views
Attacking IoT Devices from a Web Perspective - Linux Day by Simone Onofri
Attacking IoT Devices from a Web Perspective - Linux Day Attacking IoT Devices from a Web Perspective - Linux Day
Attacking IoT Devices from a Web Perspective - Linux Day
Simone Onofri16 views
PharoJS - Zürich Smalltalk Group Meetup November 2023 by Noury Bouraqadi
PharoJS - Zürich Smalltalk Group Meetup November 2023PharoJS - Zürich Smalltalk Group Meetup November 2023
PharoJS - Zürich Smalltalk Group Meetup November 2023
Noury Bouraqadi127 views
Empathic Computing: Delivering the Potential of the Metaverse by Mark Billinghurst
Empathic Computing: Delivering  the Potential of the MetaverseEmpathic Computing: Delivering  the Potential of the Metaverse
Empathic Computing: Delivering the Potential of the Metaverse
Mark Billinghurst478 views

'Stories that persuade with data' - talk at CENDI meeting January 9 2014

  • 1. Stories, that Persuade With Data: Some Thoughts on Making Networks of Knowledge. Anita de Waard VP Research Data Collaborations a.dewaard@elsevier.com
  • 2. Outline: • Stories, that persuade with data • Networks of claims and data • Research Data Management: some thoughts.
  • 3. Scientific articles are stories... The Story of Goldilocks and the Three Bears Story Grammar Paper The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins Once upon a time Time Setting Background The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged. a little girl named Goldilocks Characters Objects of study the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract, She went for a walk in the forest. Pretty soon, she came upon a house. Location Experimental setup studied and compared in vivo effects and interactions to those of the human protein She knocked and, when no one answered, Goal Research goal Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood. she walked right in. Attempt Hypothesis Atx-1 may play a role in the regulation of gene expression At the table in the kitchen, there were three bowls of porridge. Name Name dAtX-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in File Goldilocks was hungry. Subgoal Subgoal test the function of the AXH domain She tasted the porridge from the first bowl. Attempt Method overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1. This porridge is too hot! she exclaimed. Outcome Results Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et a 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells Data (data not shown), Results both genotypes show many large holes and loss of cell integrity at 28 days So, she tasted the porridge from the Activity second bowl. 3 This porridge is too cold, she said Outcome Theme Episode 1
  • 4. ...that persuade (editors/authors/readers!)… Aristotle Quintilian Scientific Paper prooimion The introduction of a speech, where one announces the Introduction subject and purpose of the discourse, and where one usually / exordium employs the persuasive appeal to ethos in order to establish credibility with the audience. Introduction: positioning prothesis Statement of The speaker here provides a narrative account of what has Facts/narrati happened and generally explains the nature of the case. o Introduction: research question Summary/ propostitio The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. Summary of contents Proof/ confirmatio The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. Results Refutation/ refutatio As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. Related Work peroratio Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. Discussion: summary, implications. pistis epilogos
  • 6. As claims get cited, they become facts: Voorhoeve et al, Cell, 2006: To investigate the possibility that miR-372 and miR-373 suppress the expression of LATS2, we... Hypothesis Therefore, these results point to LATS2 as a mediator of the miR-372 and miR-373 effects on cell proliferation and tumorigenicity, Implication Raver-Shapira et.al, JMolCell 2007 Cited Implication ... two miRNAs, miRNA-372 and-373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006). Yabuta, JBioChem 2007: miR-372 and miR-373 target the Lats2 tumor suppressor (Voorhoeve et al., 2006) Fact
  • 7. There are many problems with this system! • There are too many papers to read – for people. • Papers are not written in a way that makes reading easy – for computers. • Issues with reproducibility. • Hard to access or assess the data. • And how do we know how many data points a claim was based on?
  • 8. What might work: networks of claims and data: Claim PHC undergo Paper A: Growth arrest Paper B: implication implication method fact method link method fact goal fact goal fact results results data 1 data 4 data 2 data 3 data 5 data 6
  • 9. How do we get there? Find claims: E.g., scientific discourse analysis: In contrast with previous hypotheses compact plaques form before significant deposition of diffuse A beta, suggesting that different mechanisms are involved in the deposition of diffuse amyloid and the aggregation into plaques. Entities Relationships Temporality Connections Status core information (proposition) information extraction discourse structure discourse analysis rhetorical metadiscourse discourse analysis thematic roles Sándor, Àgnes and de Waard, Anita, (2012).
  • 10. Turn claims into formal representations: Biological statement with BEL/ epistemic markup BEL representation: Epistemic evaluation These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2. r(MIR:miR-372) |(tscript(p(HUGO:Trp53)) -| kin(p(PFH:”CDK Family”))) Increased abundance of miR372 decreases abundance of LATS2 r(MIR:miR-372) -| r(HUGO:LATS2) Value = Possible Source = Unknown Basis = Unknown Biological statement with Medscan/epistemic markup MedScan Representation: Epistemic evaluation Furthermore, we present evidence that the secretion of nesfatin-1 into the culture media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01). IL-6  NUCB2 (nesfatin-1) Relation: MolTransport Effect: Positive CellType: Adipocytes Cell Line: 3T3-L1 Value = Probable Source = Author Basis = Data
  • 11. Use Linked Data to point to claims, and connect them: the xml is fixed, but the structure is open! allows for layers of annotation but we all know she was deluded then said @anitawaard on January 9, 2014 <ce:section id=#123> this says mice like cheese
  • 12. What about the data? • • • • Can we see it? How can we evaluate it? Can it be reproduced? Can we combine or compare data points?
  • 13. Elsevier Research Data Services • Goals: – – – – Increase data preservation: quantity and quality Improve data use: by and for authors, readers, and lay people Enhance interoperability: between systems, journals, databases Help develop a sustainable data infrastructure. • Guiding principles: – In principle, all data stays open – Work with existing repositories and tools (so URLs, front end etc stay where they are) – 2013/2014: Series of pilots and questionnaires to drive data strategy/data policy and contribute optimally to an integrated data infrastructure, enabling networked knowledge.
  • 14. Research data management today: Using antibodies and squishy bits Grad Students experiment and enter details into their lab notebook. The PI then tries to make sense of their slides, and writes a paper. End of story.
  • 15. Where research data goes now: > 50 My Papers 2 M scientists 2 My papers/year A small portion of data (1-2%?) stored in small, topic-focused data repositories Majority of data (90%?) is stored on local hard drives PDB: 88,3 k PetDB: 1,5 k MiRB: 25k Some data (8%?) stored in large, generic data repositories Dryad: 7,631 files SedDB: 0.6 k TAIR: 72,1 k Dataverse: 0.6 My Institutional Repositories
  • 16. An Urban Legend is born: • How can we make a standard neuroscience wet lab more data-sharing savvy? • Incorporate structured workflows into the daily practice of a typical electrophysiology lab (the Urban Lab at CMU) – What does it take? – Where are points of conflict? • 1-year pilot, funded by Elsevier RDS: – CMU: Shreejoy Tripathy, manage/user test – Elsevier: development, UI, project management
  • 18. What does high-quality data curation take? Pilot project with IEDA: – Build a database for lunar geochemistry – Write joint report on building repository, curation, costs and challenges
  • 19. Some thoughts about the role of the institute: Funding Agencies Performance reporting Library Institution Research Office Usage/Citation reporting Institutional Repository Indexing Integrated Performance Query Usage/Citation reporting Indexing Research Data Repositories Unified Metadata Layer Curation Deposit / Store Indexing Generic Data Storage (such as Dropbox) Electronic Lab Notebooks Integrated Data Search Data Flow Deposit / Store Indexing & Search Researchers Performance Reporting
  • 20. Data Initiatives: • Data Citation group: – Synthesize principles of proper data citation – ‘Declaration of Data Citation Principles’, 8 principles of successful data citation -http://www.force11.org/datacitation • Resource Identification Initiative: – Promote research resource identification, discovery, and reuse – Resource Identification Portal http://scicrunch.com/resources – Central location for obtaining research resource identifiers (RRIDs) for materials and software used in biomedical research • Antibody: Abgent Cat# AP7251E, ABR:AB_2140114 • Tool: CellProfiler Image Analysis Software, NIFRegistry:nif-0000-00280 • Organism: MGI:MGI:3840442
  • 21. In summary: • Stories, that persuade with data: – We need better ways to communicate science! • Networks of claims and data: – Promising steps towards identifying claims – Entity identification and Linked Data helps – Problem: access to data • Research Data Management: – Key issue: get data in up-front – Evaluate and scale up role of repositories – Codevelop view of role of institutions/libraries
  • 22. Questions? Anita de Waard VP Research Data Collaborations a.dewaard@elsevier.com http://researchdata.elsevier.com/
  • 23. Thank you! Collaborations and discussions gratefully acknowledged: • CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Rick Gerkin, Santosh Chandrasekaran, Matthew Geramita, Eduard Hovy • UCSD: Phil Bourne, Brian Shoettlander, David Minor, Declan Fleming, Ilya Zaslavsky • NIF/Force11: Maryann Martone, Anita Bandrowski • OHSU: Melissa Haendel, Nicole Vasilevsky • California Digital Library: Carly Strasser, John Kunze, Stephen Abrams • IEDA: Kerstin Lehnert, Annika • Elsevier: Mark Harviston, Jez Alder, David Marques

Editor's Notes

  1. Walk through pieces 1 by 1, also mention that this is very much an uncompleted work in progress