How to find networked knowledge: About
Stories, that Persuade With Data
Anita de Waard
VP Research Data Collaborations
a.dewaard@elsevier.com
Federal Big Data Meetup, May 20 2014
Discourse Comprehension 101
• Letter < syllable < word < clause < sentence < discourse:
This is how linguistics is structured.
But it is not how we understand text!
• Kintsch and Van Dijk, ‘93: we read a text at three levels:
– surface code: literal text, exact words/syntax
– text base: preserves meaning, but not exact wording
– situation model: ‘microworld’ that the text is about:
constructed inferentially through interaction between the
text and background knowledge
• We use knowledge about text genre to activate a schema:
this allows creation of the text base and situation model
In summary, how scientists read:
• Surface code provides noun phrases and triples that offer
pointers re. topical relevance
• Text base and situation model are created through specific
metadiscourse conventions (e.g. refs at the end) that create a
biological reasoning model:
• This can be expressed as a set of claims, linked to evidence, that
can help represent key points in the paper
• Journal name and author’s affiliation help define schema and
provide ‘willingness to be convinced’ socially/interpersonally.
Hypothesis: We next asked whether …
Goal/Method: To do so, we transiently inhibited…
Result: Suppression of X enhanced invasion …
Results: but F was unaffected …(Figure 3A). …
Implication: Collectively, these data indicated that … .
Examples of schemas:
human breast cancer
noninvasive MCF7-Ras
antisense oligonucleotides
high-grade malignancy
cell viability
retroviral vector
miR-31
cloned
transiently expressed miRNA sponges
Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?
What is this paper about?
A. NOUN PHRASES
Noun Phrases: some issues
• Problem 1: disambiguating terms (© GoPubMed):
– Hnrpa1 = Tis = Fli-2 = nuclear ribonucleoprotein A1 = helix
destabilizing protein = single-strand binding protein = hnRNP core
protein A1 = HDP-1 = topoisomerase-inhibitor suppressed.
– Cellulose 1,4-beta-cellobiosidase = exoglucanase
– COLD ≠ C.O.L.D. ≠ cold (runny nose) ≠ cold (low T)
• Problem 2: disambiguating entities (© M. Martone):
– 95 antibodies were (manually!) identified in 8 articles
– 52 did not contain enough information to determine the antibody
used
– Some provided details in other papers
– Failed to give species, clonality, vendor, or catalog number
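The term-disambiguation problem above can be sketched as a synonym lookup. This is a minimal illustration, not GoPubMed's method: the table contains only the synonyms listed on the slide, and choosing the first name in each list as the preferred label is an assumption made for the example.

```python
# Synonym groups copied from the slide; the first entry of each group is
# (arbitrarily, for this sketch) treated as the preferred label.
SYNONYM_GROUPS = [
    ["Hnrpa1", "Tis", "Fli-2", "nuclear ribonucleoprotein A1",
     "helix destabilizing protein", "single-strand binding protein",
     "hnRNP core protein A1", "HDP-1", "topoisomerase-inhibitor suppressed"],
    ["Cellulose 1,4-beta-cellobiosidase", "exoglucanase"],
]


def build_index(groups):
    """Map every synonym to the preferred label of its group."""
    index = {}
    for group in groups:
        preferred = group[0]
        for name in group:
            index[name] = preferred
    return index


INDEX = build_index(SYNONYM_GROUPS)


def normalize(term):
    """Return the preferred label for a term, or None if it is unknown."""
    return INDEX.get(term.strip())


print(normalize("HDP-1"))         # Hnrpa1
print(normalize("exoglucanase"))  # Cellulose 1,4-beta-cellobiosidase
print(normalize("cold"))          # None
```

The lookup is deliberately case-sensitive: folding case would merge COLD and cold, which the slide lists as different things. Telling those apart needs context, which a flat table cannot supply.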
Noun Phrases: some progress
• Despite these difficulties, noun phrase recall/precision is
quite high, e.g. i2b2 2011 [1], [2], others: 90%-98%
• Many tools, see [3] for a list; e.g. GoPubMed:
miR-31 PREVENT acquisition of aggressive traits
miR-31 INHIBIT noninvasive MCF7-Ras cells
miR-31 ENHANCE invasion
cell viability AFFECT inhibitor
miR-31 expression DEPRIVE metastatic cells
Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?
What is this paper about?
B. TRIPLES
Triples: some issues:
• Contingent on good NP & VP detection
• Hard to parse text! E.g. a commercial tool gave:
insulin maintaining glucose homeostasis
When insulin secretion cannot be increased adequately (type I
diabetes defect) to overcome insulin resistance in maintaining
glucose homeostasis, hyperglycemia and glucose intolerance
ensues.
insulin may be involved glucose homeostasis
Because PANDER is expressed by pancreatic beta-cells and in
response to glucose in a similar way to those of insulin, PANDER
may be involved in glucose homeostasis.
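One way to see what the extracted triples lose is to keep the source sentence next to each triple and flag conditional, negating, or modal words in it. The sketch below does that. The `Triple` class and the cue list are illustrative assumptions, not the commercial tool's output format, and the cue list is far from complete.

```python
from dataclasses import dataclass

# Illustrative cue words only; a real system would need a proper
# negation/modality detector.
CUES = ("cannot", "may", "when", "because")


@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    obj: str
    sentence: str  # the sentence the triple was extracted from


def context_cues(triple):
    """List the cue words in the source sentence that the triple dropped."""
    words = triple.sentence.lower().replace(",", " ").split()
    return [cue for cue in CUES if cue in words]


t1 = Triple(
    "insulin", "maintaining", "glucose homeostasis",
    "When insulin secretion cannot be increased adequately (type I "
    "diabetes defect) to overcome insulin resistance in maintaining "
    "glucose homeostasis, hyperglycemia and glucose intolerance ensues.",
)
t2 = Triple(
    "insulin", "may be involved", "glucose homeostasis",
    "Because PANDER is expressed by pancreatic beta-cells and in "
    "response to glucose in a similar way to those of insulin, PANDER "
    "may be involved in glucose homeostasis.",
)

print(context_cues(t1))  # ['cannot', 'when']
print(context_cues(t2))  # ['may', 'because']
```

Both triples come back flagged, which matches the point of the slide: the first sentence is about insulin failing, and in the second the subject of "may be involved" is PANDER, not insulin.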
Triples: some progress:
Biological Expression Language [4]:
We provide evidence that these miRNAs are potential novel oncogenes participating in the development
of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in
the presence of wild-type p53.
Increased abundance of miR-372 decreases activity of TP53
r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
Context: cancer
SET Disease = “Cancer”
Activity of TP53 decreases cell growth
tscript(p(HUGO:Trp53)) -| bp(GO:”Cell Growth”)
The preceding observations demonstrated that X expression deprives Y cells of
attributes associated with Z.
We next asked whether X also prevents the acquisition of A traits by B cells.
To do so, we transiently inhibited X in C cells with either D or E.
Both approaches inhibited X function by > 4.5-fold (Figure S7A).
Suppression of X enhanced invasion by 20-fold and motility by 5-fold, but F was
unaffected by either inhibitor (Figure 3A; Figure S7B).
The E sponge reduced X function by 2.5-fold, but did not affect the activity of other
known Js (Figures S8A and S8B).
Collectively, these data indicated that sustained X activity is necessary to prevent the
acquisition of Z traits by both K and untransformed B cells.
Is it pertinent? -> Need content
Is it true? -> Sounds likely! I know this stuff!
Is it new, but in agreement with what I know? -> Need content
What is this paper about?
C. METADISCOURSE
Metadiscourse: why it matters
• Voorhoeve et al., 2006: “These miRNAs neutralize p53-mediated CDK
inhibition, possibly through direct inhibition of the expression of the tumor
suppressor LATS2.”
• Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373
were found to allow proliferation of primary human cells that express
oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor
LATS2 (Voorhoeve et al., 2006).”
• Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and -373,
function as potential novel oncogenes in testicular germ cell tumors by
inhibition of LATS2 expression, which suggests that Lats2 is an important tumor
suppressor (Voorhoeve et al., 2006).”
• Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly
inhibit the expression of Lats2, thereby allowing tumorigenic growth in the
presence of p53 (Voorhoeve et al., 2006).”
“[Y]ou can transform .. fiction into fact just by adding or
subtracting references”, Bruno Latour [5]
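The drift from hedged claim to stated fact in the four quotes above can be made visible with a very small hedge counter. This is a toy sketch: the three cue words are simply the ones that occur in these quotes, the quotes are shortened to the relevant clause, and substring matching is a crude stand-in for the hedging detectors cited in the deck.

```python
# Cue words taken from the quotes themselves; not a general hedge lexicon.
HEDGES = ("possibly", "potential", "suggests")

# Shortened excerpts of the four quotes on the slide.
QUOTES = {
    "Voorhoeve et al., 2006":
        "These miRNAs neutralize p53-mediated CDK inhibition, possibly "
        "through direct inhibition of the expression of the tumor "
        "suppressor LATS2.",
    "Kloosterman and Plasterk, 2006":
        "miR-372 and miR-373 were found to allow proliferation of primary "
        "human cells, possibly by inhibiting the tumor suppressor LATS2",
    "Yabuta et al., 2007":
        "miRNA-372 and -373 function as potential novel oncogenes by "
        "inhibition of LATS2 expression, which suggests that Lats2 is an "
        "important tumor suppressor",
    "Okada et al., 2011":
        "miR-372 and miR-373 directly inhibit the expression of Lats2, "
        "thereby allowing tumorigenic growth in the presence of p53",
}


def hedges_in(text):
    """Return the hedge cues that occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [cue for cue in HEDGES if cue in lowered]


for source, quote in QUOTES.items():
    print(source, hedges_in(quote))
```

On these excerpts the 2006 and 2007 quotes each carry at least one hedge, while the 2011 quote carries none.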
Metadiscourse: some progress
• Hedging cues, speculative language, modality/negation:
– Light et al [6]: finding speculative language
– Wilbur et al (Hagit) [7]: focus, polarity, certainty, evidence, and
directionality
– Thompson et al (Sophia) [8]: level of speculation, type/source of
the evidence and level of certainty
• Sentiment detection (e.g. Kim and Hovy [9] a.m.o.):
– Holder of the opinion, strength, polarity as ‘mathematical
function’ acting on main propositional content
• Can make this part of the semantic web: (e.g., Ontology for
Reasoning, Certainty and Attribution, ORCA [10]):
– Value (Presumed True, Probable, Possible, Unknown)
– Source (Author, Named Other, Unknown)
– Basis (Data, Reasoning, Unknown)
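The three ORCA dimensions listed above can be written down as a small data structure. The sketch below uses plain Python enums; the class and method names are assumptions made for illustration and say nothing about how ORCA itself is serialized.

```python
from dataclasses import dataclass
from enum import Enum


class Value(Enum):
    PRESUMED_TRUE = "Presumed True"
    PROBABLE = "Probable"
    POSSIBLE = "Possible"
    UNKNOWN = "Unknown"


class Source(Enum):
    AUTHOR = "Author"
    NAMED_OTHER = "Named Other"
    UNKNOWN = "Unknown"


class Basis(Enum):
    DATA = "Data"
    REASONING = "Reasoning"
    UNKNOWN = "Unknown"


@dataclass(frozen=True)
class EpistemicEvaluation:
    """One Value/Source/Basis triple attached to a statement."""
    value: Value
    source: Source
    basis: Basis

    def label(self):
        return (f"Value = {self.value.value}; "
                f"Source = {self.source.value}; "
                f"Basis = {self.basis.value}")


# A statement the author backs with their own data:
evaluation = EpistemicEvaluation(Value.PROBABLE, Source.AUTHOR, Basis.DATA)
print(evaluation.label())
# Value = Probable; Source = Author; Basis = Data
```

Keeping the evaluation separate from the proposition means the same biological statement can be stored once and re-rated as later papers cite it.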
Claim:
• sustained miR-31 activity is necessary to prevent the acquisition of aggressive
traits by both tumor cells and untransformed breast epithelial cells
Evidence: Method:
• We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either
antisense oligonucleotides or miRNA sponges.
Evidence: Result:
• Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).
• Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold,
but cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B).
• The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect
the activity of other known antimetastatic miRNAs (Figures S8A and S8B).
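The claim-and-evidence layout above maps naturally onto a linked record. The following is a minimal sketch with hypothetical class names (`Claim`, `Evidence`); the text is abbreviated from the slide and the figure labels are the ones it cites.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    kind: str            # "Method" or "Result", as labelled on the slide
    text: str
    figures: tuple = ()  # figure labels cited by this piece of evidence


@dataclass
class Claim:
    text: str
    evidence: list = field(default_factory=list)

    def by_kind(self, kind):
        """All evidence items of one kind."""
        return [e for e in self.evidence if e.kind == kind]

    def figures(self):
        """Figure labels supporting the claim, in order, without repeats."""
        seen = []
        for item in self.evidence:
            for fig in item.figures:
                if fig not in seen:
                    seen.append(fig)
        return seen


claim = Claim(
    "sustained miR-31 activity is necessary to prevent the acquisition "
    "of aggressive traits",
    [
        Evidence("Method", "We transiently inhibited miR-31 in noninvasive "
                           "MCF7-Ras cells."),
        Evidence("Result", "Both approaches inhibited miR-31 function by "
                           ">4.5-fold.", ("S7A",)),
        Evidence("Result", "Suppression of miR-31 enhanced invasion by "
                           "20-fold and motility by 5-fold.", ("3A", "S7B")),
        Evidence("Result", "The miR-31 sponge reduced miR-31 function by "
                           "2.5-fold.", ("S8A", "S8B")),
    ],
)

print(len(claim.by_kind("Result")))  # 3
print(claim.figures())               # ['S7A', '3A', 'S7B', 'S8A', 'S8B']
```

With the evidence attached, a reader asking "Is it true?" can go from the claim straight to the figures instead of rereading the section.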
What is this paper about?
D. CLAIMS AND EVIDENCE
Is it pertinent? -> Probably
Is it true? -> Sounds likely!
Is it new, but in agreement with what I know? -> Check/know
Claims and Evidence: some issues:
• Data2Semantics [11]: linking clinical guidelines to evidence.
Inconsistency within guideline and guidelines v. evidence:
• Studies have demonstrated inconsistent results regarding the use of such
markers of inflammation as C-reactive protein (CRP), interleukins-6 (IL-6) and
-8, and procalcitonin (PCT) in neutropenic patients with cancer [55–57].
• [55]: PCT and IL-6 are more reliable markers than CRP for predicting
bacteremia in patients with febrile neutropenia
• [56] In conclusion, daily measurement of PCT or IL-6 could help identify
neutropenic patients with a stable course when the fever lasts >3 d. …,
it would reduce adverse events and treatment costs.
• [57] Our study supports the value of PCT as a reliable tool to predict
clinical outcome in febrile neutropenia.
• Drug Interaction Knowledgebase [12]: how to identify evidence?
• R-citalopram_is_not_substrate_of_cyp2c19:
• At 10 µM R- or S-CT, ketoconazole reduced reaction velocity to 55-60% of
control, quinidine to 80%, and omeprazole to 80-85% of control (Fig. 6).
Claims and Evidence: some progress
• Defining ‘salient knowledge components’ in text:
– Argumentative zones, CoreSC can both be found
– Blake, Claim networks (2012)
– Claimed Knowledge Updates (Sandor/de Waard, 2012):
Finding claims in XIP:
E.g. through scientific discourse analysis:
In contrast with previous hypotheses compact plaques form before significant
deposition of diffuse A beta, suggesting that different mechanisms are involved in the
deposition of diffuse amyloid and the aggregation into plaques.
[Figure labels: Entities; Relationships; Temporality; Connections; thematic roles; Status; core information (proposition); information extraction; rhetorical metadiscourse; discourse analysis; discourse structure]
Sándor, Àgnes and de Waard, Anita, (2012).
Formalizing claims with hedging:
Biological statement with BEL/epistemic markup:
These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.
BEL representation:
r(MIR:miR-372) -| (tscript(p(HUGO:Trp53)) -| kin(p(PFH:”CDK Family”)))
Increased abundance of miR-372 decreases abundance of LATS2
r(MIR:miR-372) -| r(HUGO:LATS2)
Epistemic evaluation:
Value = Possible; Source = Unknown; Basis = Unknown
Biological statement with MedScan/epistemic markup:
Furthermore, we present evidence that the secretion of nesfatin-1 into the culture media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01).
MedScan representation:
IL-6 → NUCB2 (nesfatin-1)
Relation: MolTransport
Effect: Positive
CellType: Adipocytes
Cell Line: 3T3-L1
Epistemic evaluation:
Value = Probable; Source = Author; Basis = Data
Schemas: scientific articles are stories...
Story: The Story of Goldilocks and the Three Bears
Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
Columns: Story text | Story grammar | Paper element | Paper text
Setting
Once upon a time | Time | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
a little girl named Goldilocks | Characters | Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
She went for a walk in the forest. Pretty soon, she came upon a house. | Location | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein
Theme
She knocked and, when no one answered, | Goal | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
she walked right in. | Attempt | Hypothesis | Atx-1 may play a role in the regulation of gene expression
Episode 1
At the table in the kitchen, there were three bowls of porridge. | Name | Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
Goldilocks was hungry. | Subgoal | Subgoal | test the function of the AXH domain
She tasted the porridge from the first bowl. | Attempt | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
This porridge is too hot! she exclaimed. | Outcome | Results | Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
So, she tasted the porridge from the second bowl. | Activity | Data | (data not shown),
This porridge is too cold, she said | Outcome | Results | both genotypes show many large holes and loss of cell integrity at 28 days
...that persuade (editors/authors/readers!)…
Columns: Aristotle | Quintilian | Description | Scientific Paper
prooimion | Introduction / exordium | The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Introduction: positioning
prothesis | Statement of Facts / narratio | The speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Introduction: research question
- | Summary / propositio | The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Summary of contents
pistis | Proof / confirmatio | The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results
- | Refutation / refutatio | As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Related Work
epilogos | peroratio | Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion: summary, implications.
... with data.
What about the data?
Using antibodies and squishy bits, grad students experiment and enter details into their lab notebook.
The PI then tries to make sense of their slides, and writes a paper.
End of story.
Maslow’s Hierarchy of Needs for Research Data
1. Preserved (existing in some form)
2. Archived (long-term & format-independent)
3. Accessible (can be accessed by others)
4. Comprehensible (others can understand data & processes)
5. Discoverable (can be indexed by a system)
6. Reproducible (others can redo experiments)
7. Trusted (validated/checked by reviewers)
8. Citable (able to point & track citations)
9. Usable (allow tools to run on it)
1. Preserve: Data Rescue Challenge
• With IEDA/Lamont: award successful data
rescue attempts
• Awarded at AGU 2013
• 23 submissions of data that was digitized,
preserved, made available
• Winner: NIMBUS Data Rescue:
– Recovery, reprocessing and digitization of the
infrared and visible observations along with their
navigation and formatting.
– Over 4000 7-track tapes of global infrared
satellite data were read and reprocessed.
– Nearly 200,000 visible light images were
scanned, rectified and navigated.
– All the resultant data was converted to HDF-5
(NetCDF) format and freely distributed to users
from NASA and NSIDC servers.
– This data was then used to calculate monthly sea
ice extents for both the Arctic and the Antarctic.
• Conclusion: we (collectively) need to do more
of this! How can we fund it?
2. Archive: Olive Project
• CMU CS & Library: funded by a grant
from the IMLS, Elsevier is partner
• Goal: Preservation of executable content
- nowadays a large part of intellectual
output, and very fragile
• Identified a series of software packages
and prepared VM to preserve
• Does it work? Yes – see video (1:24)
3. Access: Urban Legend
• Part 1: Metadata acquisition
• Step through experimental process in series of dropdown
menus in simple web UI
• Can be tailored to workflow of individual researcher
• Connected to shared ontologies through lookup table,
managed centrally in lab
• Connect to data input console (Igor Pro)
4. Comprehend: Urban Legend
• Part 2: Data Dashboard
• Access, select and manipulate data (calculate
properties, sort and plot)
• Final goal: interactive figures linked to data
• Plan to expand to more neuroscience labs
• Plan to build for geochemistry use case
5. Discover: Data Indexing proposals
• Collaborated on Data Discovery Index
proposal with UCSD/Carnegie Mellon
• Also worked with UIUC!
• Interested in developing distributed
infrastructures for making data easier to
search: what is the ‘Goldilocks Index’ where
search is scalable, yet useful?
• Looking for academic/industry partners/use
cases/platforms to address the next stage
• Discoverability is key driver for
metadata/data format structure!
6. Reproduce: Resource Identifier Initiative
Force11 Working Group to add data identifiers
to articles that are
– 1) Machine readable;
– 2) Free to generate and access;
– 3) Consistent across publishers and journals.
• Authors publishing in participating journals
will be asked to provide RRID's for their
resources; these are added to the keyword
field
• RRID's will be drawn from:
– The Antibody Registry
– Model Organism Databases
– NIF Resource Registry
• So far, Springer, Wiley, Biomednet, Elsevier
journals have signed up with 11 journals,
more to come
• Wide community adoption!
7. Trust: Moonrocks
How can we scale up data curation?
Pilot project with IEDA:
• A database for lunar geochemistry:
leapfrog & improve curation time
• 1-year pilot, funded by Elsevier
• Main conclusion: if spreadsheet
columns/headers map to RDB
schema we can scale curation cost!
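The pilot's conclusion, that curation scales once spreadsheet headers map to the database schema, can be sketched as a header-to-column lookup. Everything named here is hypothetical: the schema columns (`sample_id`, `sio2_wt_pct`), the alias lists, and the sample rows are invented for illustration and are not taken from the IEDA database.

```python
import csv
import io

# Hypothetical schema columns and the spreadsheet headers accepted for each.
SCHEMA_ALIASES = {
    "sample_id": {"sample", "sample id", "sample_id"},
    "sio2_wt_pct": {"sio2", "sio2 (wt%)"},
}


def map_headers(headers):
    """Split headers into {header: schema column} and a list of leftovers."""
    mapping, unmapped = {}, []
    for header in headers:
        key = header.strip().lower()
        for column, aliases in SCHEMA_ALIASES.items():
            if key in aliases:
                mapping[header] = column
                break
        else:
            unmapped.append(header)
    return mapping, unmapped


def load_rows(text):
    """Read CSV text and rename the mapped columns to schema names."""
    reader = csv.DictReader(io.StringIO(text))
    mapping, unmapped = map_headers(reader.fieldnames)
    rows = [
        {mapping[h]: value for h, value in row.items() if h in mapping}
        for row in reader
    ]
    return rows, unmapped


sheet = "Sample ID,SiO2 (wt%),Notes\nA-1,45.2,ok\n"
rows, unmapped = load_rows(sheet)
print(rows)      # [{'sample_id': 'A-1', 'sio2_wt_pct': '45.2'}]
print(unmapped)  # ['Notes']
```

Columns that map cleanly load without anyone touching them; only the unmapped headers need a curator, and that is where the saving in curation time would come from.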
8. Cite: Force11 Data Citation Principles
• Another Force11 Working group
• Defined 8 principles:
• Now seeking endorsement/working on
implementation
1. Importance: Data should be considered legitimate, citable products of
research. Data citations should be accorded the same importance in
the scholarly record as citations of other research objects, such as
publications.
2. Credit and attribution: Data citations should facilitate giving scholarly
credit and normative and legal attribution to all contributors to the
data, recognizing that a single style or mechanism of attribution may
not be applicable to all data.
3. Evidence: Where a specific claim rests upon data, the corresponding
data citation should be provided.
4. Unique Identification: A data citation should include a persistent
method for identification that is machine actionable, globally unique,
and widely used by a community.
5. Access: Data citations should facilitate access to the data themselves
and to such associated metadata, documentation, and other materials,
as are necessary for both humans and machines to make informed use
of the referenced data.
6. Persistence: Metadata describing the data, and unique identifiers
should persist, even beyond the lifespan of the data they describe.
7. Versioning and granularity: Data citations should facilitate
identification and access to different versions and/or subsets of data.
Citations should include sufficient detail to verifiably link the citing
work to the portion and version of data cited.
8. Interoperability and flexibility: Data citation methods should be
sufficiently flexible to accommodate the variant practices among
communities but should not differ so much that they compromise
interoperability of data citation practices across communities.
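As a rough illustration of principles 4 and 7, a data citation can be modelled as a record that reports when it lacks a persistent identifier or a version. The `DataCitation` class, its fields, and the rendered format are assumptions for this sketch; the principles do not prescribe a citation style, and the identifier shown is a placeholder, not a real DOI.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    creators: str
    year: int
    title: str
    repository: str
    identifier: Optional[str] = None  # persistent, machine-actionable ID
    version: Optional[str] = None

    def missing(self):
        """Name the fields that principles 4 and 7 would ask for."""
        gaps = []
        if not self.identifier:
            gaps.append("identifier (principle 4)")
        if not self.version:
            gaps.append("version (principle 7)")
        return gaps

    def render(self):
        """Render a human-readable citation; the layout is illustrative."""
        parts = [f"{self.creators} ({self.year}). {self.title}."]
        if self.version:
            parts.append(f"Version {self.version}.")
        parts.append(f"{self.repository}.")
        if self.identifier:
            parts.append(self.identifier)
        return " ".join(parts)


# Placeholder values throughout; none of this refers to a real dataset.
citation = DataCitation(
    creators="Example, A.",
    year=2014,
    title="Example dataset",
    repository="Example Repository",
    identifier="doi:10.0000/example",
    version="1.0",
)
print(citation.render())
print(DataCitation("Example, A.", 2014, "Example dataset",
                   "Example Repository").missing())
```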
9. Use: Executable Papers
• Result of a challenge to come up with
cyberinfrastructure components to
enable executable papers
• Pilot in Computer Science journals
– See all code in the paper
– Save it, export it
– Change it and rerun on data set:
10: Integrate data creation with data use
1. Preserved (existing in some form): Key is to have data be digital and preservable, e.g. data rescue challenge. Need funding for digitisation projects.
2. Archived (long-term & format-independent): Software standards change: need investment in updating, e.g. Olive Project to save OSs.
3. Accessible (can be accessed by others): Build and encourage electronic lab notebooks to ensure data can be shared if/when needed; follow workflow.
4. Comprehensible (others can understand data & processes): Build tools that allow researchers to interpret and reevaluate their data directly; drive adoption of ELNs.
5. Discoverable (can be indexed by a system): Collaborate on grants to develop data discovery tools; promote and use common standards, indices.
6. Reproducible (others can redo experiments): Follow Force11 Resource Identification initiative; reproducibility initiative. Support standard protocols.
7. Trusted (validated/checked by reviewers): Work with domain data repositories to develop easier ways to upload data that conforms to their schema.
8. Citable (able to point & track citations): Force11 Data Citation Principles link data to papers and v.v. Issues: need better identifiers, granularity, versioning.
9. Usable (allow tools to run on it): Content enrichment using data, e.g. executable papers, virtual microscope, database linking, and others.
10. Integrate upstream and downstream: make metadata to serve use.
Thank you!
Collaborations and discussions gratefully acknowledged:
• CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Rick
Gerkin, Santosh Chandrasekaran, Matthew Geramita, Eduard
Hovy
• UCSD: Phil Bourne, Brian Shoettlander, David Minor, Declan
Fleming, Ilya Zaslavsky
• NIF/Force11: Maryann Martone, Anita Bandrowski
• OHSU: Melissa Haendel, Nicole Vasilevsky
• California Digital Library: Carly Strasser, John Kunze, Stephen
Abrams
• IEDA: Kerstin Lehnert, Annika
• Elsevier: Mark Harviston, Jez Alder, David Marques
Questions?
Anita de Waard
VP Research Data Collaborations
a.dewaard@elsevier.com
http://researchdata.elsevier.com/

More Related Content

Similar to How to persuade with data

The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...
The Narrative Structure of Research Articles, or, Why Science is Like a Fairy...Anita de Waard
 
BiTeM / SIBTex @ TREC CDS 2014
BiTeM / SIBTex @ TREC CDS 2014BiTeM / SIBTex @ TREC CDS 2014
BiTeM / SIBTex @ TREC CDS 2014Julien Gobeill
 
Preparing a manuscript
Preparing a manuscriptPreparing a manuscript
Preparing a manuscriptlemberger
 
Research Statement Chien-Wei Lin
Research Statement Chien-Wei LinResearch Statement Chien-Wei Lin
Research Statement Chien-Wei LinChien-Wei Lin
 
Essential Biology 04.1 Chromosomes, Genes, Alleles, Mutations
Essential Biology 04.1   Chromosomes, Genes, Alleles, MutationsEssential Biology 04.1   Chromosomes, Genes, Alleles, Mutations
Essential Biology 04.1 Chromosomes, Genes, Alleles, MutationsStephen Taylor
 
Preparing a manuscript
Preparing a manuscriptPreparing a manuscript
Preparing a manuscriptlemberger
 
Talk at ISWC 2012 Workshop on Semantic Technologies Applied to Biomedical In...
 Talk at ISWC 2012 Workshop on Semantic Technologies Applied to Biomedical In... Talk at ISWC 2012 Workshop on Semantic Technologies Applied to Biomedical In...
Talk at ISWC 2012 Workshop on Semantic Technologies Applied to Biomedical In...Anita de Waard
 
Citation practices and the construction of scientific fact--ECA-facts-preconf...
Citation practices and the construction of scientific fact--ECA-facts-preconf...Citation practices and the construction of scientific fact--ECA-facts-preconf...
Citation practices and the construction of scientific fact--ECA-facts-preconf...jodischneider
 
Debunk bullshit in statistics QN
Debunk bullshit in statistics QNDebunk bullshit in statistics QN
Debunk bullshit in statistics QNQuan Nguyen
 
Optional Enrichment Activities. Online assignment writing service.
Optional Enrichment Activities. Online assignment writing service.Optional Enrichment Activities. Online assignment writing service.
Optional Enrichment Activities. Online assignment writing service.Lesly Lockwood
 
Nlp for the precision medicine
Nlp for the precision medicineNlp for the precision medicine
Nlp for the precision medicineVishwas N
 
Analysing & interpreting data.ppt
Analysing & interpreting data.pptAnalysing & interpreting data.ppt
Analysing & interpreting data.pptmanaswidebbarma1
 
eumr_issue_2-2_2015-2.compressed
eumr_issue_2-2_2015-2.compressedeumr_issue_2-2_2015-2.compressed
eumr_issue_2-2_2015-2.compressedSharon Hsieh
 
Annotated Bibliography – Gender, Race & CrimeMACJ560 (PLEASE P.docx
Annotated Bibliography – Gender, Race & CrimeMACJ560 (PLEASE P.docxAnnotated Bibliography – Gender, Race & CrimeMACJ560 (PLEASE P.docx
Annotated Bibliography – Gender, Race & CrimeMACJ560 (PLEASE P.docxjustine1simpson78276
 
Narrative Essay With Dialogue Example. 008 Essay Example Dialogue Narrative W...
Narrative Essay With Dialogue Example. 008 Essay Example Dialogue Narrative W...Narrative Essay With Dialogue Example. 008 Essay Example Dialogue Narrative W...
Narrative Essay With Dialogue Example. 008 Essay Example Dialogue Narrative W...Susan Belcher
 
Bivariate RegressionRegression analysis is a powerful and comm.docx
Bivariate RegressionRegression analysis is a powerful and comm.docxBivariate RegressionRegression analysis is a powerful and comm.docx
Bivariate RegressionRegression analysis is a powerful and comm.docxhartrobert670
 
Microtask crowdsourcing for disease mention annotation in PubMed abstracts
Microtask crowdsourcing for disease mention annotation in PubMed abstractsMicrotask crowdsourcing for disease mention annotation in PubMed abstracts
Microtask crowdsourcing for disease mention annotation in PubMed abstractsBenjamin Good
 
CORE: Quantitative Research Methodology: An Overview
CORE: Quantitative Research Methodology: An OverviewCORE: Quantitative Research Methodology: An Overview
CORE: Quantitative Research Methodology: An OverviewTrident University
 
Why Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About ItWhy Life is Difficult, and What We MIght Do About It
Why Life is Difficult, and What We MIght Do About ItAnita de Waard
 

How to persuade with data

  • 1. How to find networked knowledge: About Stories that Persuade With Data. Anita de Waard, VP Research Data Collaborations, a.dewaard@elsevier.com. Federal Big Data Meetup, May 20, 2014
  • 2. Discourse Comprehension 101
    • Letter < syllable < word < clause < sentence < discourse: This is how linguistics is structured. But it is not how we understand text!
  • 8. Discourse Comprehension 101
    • Kintsch and Van Dijk, ‘93: we read a text at three levels:
      – surface code: literal text, exact words/syntax
      – text base: preserves meaning, but not exact wording
      – situation model: ‘microworld’ that the text is about, constructed inferentially through interaction between the text and background knowledge
    • We use knowledge about text genre to activate a schema: this allows creation of the text base and situation model
  • 9. In summary, how scientists read:
    • Surface code provides noun phrases and triples that offer pointers regarding topical relevance
    • Text base and situation model are created through specific metadiscourse conventions (e.g. refs at the end) that create a biological reasoning model:
    • This can be expressed as a set of claims, linked to evidence, that can help represent key points in the paper
    • Journal name and author’s affiliation help define schema and provide ‘willingness to be convinced’ socially/interpersonally.
    Example passage: “We next asked whether … To do so, we transiently inhibited … Suppression of X enhanced invasion … but F was unaffected … (Figure 3A). … Collectively, these data indicated that …”
    Segment labels on the passage: Hypothesis, Goal/Method, Result, Results, Implication
  • 11. What is this paper about? A. NOUN PHRASES
    human breast cancer noninvasive MCF7-Ras antisense oligonucleotides high-grade malignancy cell viability retroviral vector miR-31 cloned transiently expressed miRNA sponges
    • Is it pertinent? -> Possibly…
    • Is it true? -> ?
    • Is it new, but in agreement with what I know? -> ?
  • 12. Noun Phrases: some issues
    • Problem 1: disambiguating terms (© GoPubMed):
      – Hnrpa1 = Tis = Fli-2 = nuclear ribonucleoprotein A1 = helix destabilizing protein = single-strand binding protein = hnRNP core protein A1 = HDP-1 = topoisomerase-inhibitor suppressed.
      – Cellulose 1,4-beta-cellobiosidase = exoglucanase
      – COLD ≠ C.O.L.D. ≠ cold (runny nose) ≠ cold (low T)
    • Problem 2: disambiguating entities (© M. Martone):
      – 95 antibodies were (manually!) identified in 8 articles
      – 52 did not contain enough information to determine the antibody used
      – Some provided details in other papers
      – Failed to give species, clonality, vendor, or catalog number
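The term-disambiguation problem on this slide can be illustrated with a small lookup table. This is only a sketch: the synonym groups are copied from the slide, but the choice of canonical label for each group, the names `SYNONYM_GROUPS` and `normalize`, and the decision to match case-sensitively are assumptions made for illustration, not how GoPubMed works.

```python
from typing import Optional

# Synonym groups copied from the slide. Which member serves as the
# canonical label is an assumption made for this sketch.
SYNONYM_GROUPS = {
    "Hnrpa1": [
        "Hnrpa1", "Tis", "Fli-2", "nuclear ribonucleoprotein A1",
        "helix destabilizing protein", "single-strand binding protein",
        "hnRNP core protein A1", "HDP-1",
        "topoisomerase-inhibitor suppressed",
    ],
    "Cellulose 1,4-beta-cellobiosidase": [
        "Cellulose 1,4-beta-cellobiosidase", "exoglucanase",
    ],
}

# Invert the groups into a surface-form -> canonical-label lookup.
LOOKUP = {
    form: canonical
    for canonical, forms in SYNONYM_GROUPS.items()
    for form in forms
}


def normalize(term: str) -> Optional[str]:
    """Return the canonical label for a term, or None if it is unknown.

    Matching is deliberately case-sensitive: the slide notes that
    COLD, C.O.L.D. and cold are different things.
    """
    return LOOKUP.get(term.strip())


print(normalize("Fli-2"))
print(normalize("exoglucanase"))
print(normalize("COLD"))
```

Folding case before the lookup would merge exactly the forms the slide says must stay apart, which is why the sketch leaves unknown strings unresolved instead of guessing.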
  • 13. Noun Phrases: some progress
    • Despite these difficulties, noun phrase recall/precision is quite high, e.g. I2B2 2011 [1], [2], others: 90%-98%
    • Many tools, see [3] for a list; e.g. GoPubMed:
  • 14. What is this paper about? B. TRIPLES
    – miR-31 PREVENT acquisition of aggressive traits
    – miR-31 INHIBIT noninvasive MCF7-Ras cells
    – miR-31 ENHANCE invasion
    – cell viability AFFECT inhibitor
    – miR-31 expression DEPRIVE metastatic cells
    • Is it pertinent? -> Possibly…
    • Is it true? -> ?
    • Is it new, but in agreement with what I know? -> ?
  • 15. Triples: some issues:
    • Contingent on good NP & VP detection
    • Hard to parse text! E.g. a commercial tool gave:
      – Triple: insulin / maintaining / glucose homeostasis
        Sentence: “When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues.”
      – Triple: insulin / may be involved / glucose homeostasis
        Sentence: “Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis.”
  • 16. Triples: some progress: Biological Expression Language [4]:
    • Text: “We provide evidence that these miRNAs are potential novel oncogenes participating in the development of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53.”
    • Increased abundance of miR-372 decreases activity of TP53: r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
    • Context: cancer: SET Disease = "Cancer"
    • Activity of TP53 decreases cell growth: tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
  • 17. What is this paper about? C. METADISCOURSE
    “The preceding observations demonstrated that X expression deprives Y cells of attributes associated with Z. We next asked whether X also prevents the acquisition of A traits by B cells. To do so, we transiently inhibited X in C cells with either D or E. Both approaches inhibited X function by > 4.5-fold (Figure S7A). Suppression of X enhanced invasion by 20-fold and motility by 5-fold, but F was unaffected by either inhibitor (Figure 3A; Figure S7B). The E sponge reduced X function by 2.5-fold, but did not affect the activity of other known Js (Figures S8A and S8B). Collectively, these data indicated that sustained X activity is necessary to prevent the acquisition of Z traits by both K and untransformed B cells.”
    • Is it pertinent? -> Need content
    • Is it true? -> Sounds likely! I know this stuff!
    • Is it new, but in agreement with what I know? -> Need content
  • 18. Metadiscourse: why it matters
    • Voorhoeve et al., 2006: “These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.”
    • Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).”
    • Yabuta et al., 2007: “[On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).”
    • Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).”
    “[Y]ou can transform .. fiction into fact just by adding or subtracting references”, Bruno Latour [5]
  • 19. Metadiscourse: some progress
    • Hedging cues, speculative language, modality/negation:
      – Light et al [6]: finding speculative language
      – Wilbur et al (Hagit) [7]: focus, polarity, certainty, evidence, and directionality
      – Thompson et al (Sophia) [8]: level of speculation, type/source of the evidence and level of certainty
    • Sentiment detection (e.g. Kim and Hovy [9] a.m.o.):
      – Holder of the opinion, strength, polarity as ‘mathematical function’ acting on main propositional content
    • Can make this part of the semantic web (e.g., Ontology for Reasoning, Certainty and Attribution, ORCA [10]):
      – Value (Presumed True, Probable, Possible, Unknown)
      – Source (Author, Named Other, Unknown)
      – Basis (Data, Reasoning, Unknown)
  • 20. What is this paper about? D. CLAIMS AND EVIDENCE
    • Claim:
      – sustained miR-31 activity is necessary to prevent the acquisition of aggressive traits by both tumor cells and untransformed breast epithelial cells
    • Evidence: Method:
      – We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either antisense oligonucleotides or miRNA sponges.
    • Evidence: Result:
      – Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).
      – Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold, but cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B).
      – The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect the activity of other known antimetastatic miRNAs (Figures S8A and S8B).
    • Is it pertinent? -> Probably
    • Is it true? -> Sounds likely!
    • Is it new, but in agreement with what I know? -> Check/know
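The claim-and-evidence breakdown on this slide maps naturally onto a small linked data structure. The following is a minimal sketch, not a format proposed in the talk: the class names `Claim` and `Evidence`, their fields, and the helper methods are all hypothetical, and the evidence texts are shortened paraphrases of the slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Evidence:
    kind: str                      # "Method" or "Result", as labeled on the slide
    text: str
    figures: List[str] = field(default_factory=list)


@dataclass
class Claim:
    text: str
    evidence: List[Evidence] = field(default_factory=list)

    def results(self) -> List[Evidence]:
        """Evidence items that report a result rather than a method."""
        return [e for e in self.evidence if e.kind == "Result"]

    def cited_figures(self) -> List[str]:
        """Figures backing the claim, in order of first mention."""
        seen: List[str] = []
        for item in self.evidence:
            for fig in item.figures:
                if fig not in seen:
                    seen.append(fig)
        return seen


claim = Claim(
    text="Sustained miR-31 activity is necessary to prevent the "
         "acquisition of aggressive traits",
    evidence=[
        Evidence("Method", "Transiently inhibited miR-31 in MCF7-Ras cells"),
        Evidence("Result", "Both approaches inhibited miR-31 function by >4.5-fold",
                 ["Figure S7A"]),
        Evidence("Result", "Invasion up 20-fold, motility up 5-fold, viability unaffected",
                 ["Figure 3A", "Figure S7B"]),
        Evidence("Result", "Sponge reduced miR-31 function by 2.5-fold",
                 ["Figure S8A", "Figure S8B"]),
    ],
)

print(len(claim.results()))
print(claim.cited_figures())
```

Keeping the figure references on each evidence item lets a reader trace a claim back to the data that supports it, which is what the "Is it true?" question on the slide depends on.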
  • 21. Claims and Evidence: some issues:
    • Data2Semantics [11]: linking clinical guidelines to evidence. Inconsistency within guideline and guidelines v. evidence:
      – Studies have demonstrated inconsistent results regarding the use of such markers of inflammation as C-reactive protein (CRP), interleukins-6 (IL-6) and -8, and procalcitonin (PCT) in neutropenic patients with cancer [55–57].
      – [55]: PCT and IL-6 are more reliable markers than CRP for predicting bacteremia in patients with febrile neutropenia
      – [56]: In conclusion, daily measurement of PCT or IL-6 could help identify neutropenic patients with a stable course when the fever lasts >3 d. …, it would reduce adverse events and treatment costs.
      – [57]: Our study supports the value of PCT as a reliable tool to predict clinical outcome in febrile neutropenia.
    • Drug Interaction Knowledgebase [12]: how to identify evidence?
      – R-citalopram_is_not_substrate_of_cyp2c19:
      – At 10uM R- or S-CT, ketoconazole reduced reaction velocity to 55-60% of control, quinidine to 80%, and omeprazole to 80-85% of control (Fig. 6).
  • 22. Claims and Evidence: some progress
    • Defining ‘salient knowledge components’ in text:
      – Argumentative zones, CoreSC can both be found
      – Blake, Claim networks (2012)
      – Claimed Knowledge Updates (Sandor/de Waard, 2012):
  • 23. Finding claims in XIP: E.g. through scientific discourse analysis:
    “In contrast with previous hypotheses compact plaques form before significant deposition of diffuse A beta, suggesting that different mechanisms are involved in the deposition of diffuse amyloid and the aggregation into plaques.”
    [Figure: the sentence annotated with the labels Entities, Relationships, Temporality, Connections, thematic roles, Status, core information (proposition), information extraction, rhetorical metadiscourse, discourse analysis, discourse structure]
    Sándor, Ágnes and de Waard, Anita (2012).
  • 24. Formalizing claims with hedging:
    • Biological statement with BEL/epistemic markup
      – Statement: “These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.”
      – BEL representation: r(MIR:miR-372) -|(tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family")))
      – BEL representation: Increased abundance of miR-372 decreases abundance of LATS2: r(MIR:miR-372) -| r(HUGO:LATS2)
      – Epistemic evaluation: Value = Possible; Source = Unknown; Basis = Unknown
    • Biological statement with MedScan/epistemic markup
      – Statement: “Furthermore, we present evidence that the secretion of nesfatin-1 into the culture media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01).”
      – MedScan representation: IL-6 → NUCB2 (nesfatin-1); Relation: MolTransport; Effect: Positive; CellType: Adipocytes; Cell Line: 3T3-L1
      – Epistemic evaluation: Value = Probable; Source = Author; Basis = Data
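The Value/Source/Basis markup shown on this slide can be sketched as three enumerations attached to a statement. The enumeration members come from the ORCA values listed in the talk; everything else here (the class `EpistemicClaim`, the `is_hedged` rule that treats anything other than Presumed True as hedged, and the `data_backed` helper) is an assumption for illustration and not part of ORCA itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Value(Enum):
    PRESUMED_TRUE = "Presumed True"
    PROBABLE = "Probable"
    POSSIBLE = "Possible"
    UNKNOWN = "Unknown"


class Source(Enum):
    AUTHOR = "Author"
    NAMED_OTHER = "Named Other"
    UNKNOWN = "Unknown"


class Basis(Enum):
    DATA = "Data"
    REASONING = "Reasoning"
    UNKNOWN = "Unknown"


@dataclass(frozen=True)
class EpistemicClaim:
    statement: str      # formal or plain-text form of the biological statement
    value: Value
    source: Source
    basis: Basis

    def is_hedged(self) -> bool:
        # Assumption for this sketch: anything below "Presumed True" is hedged.
        return self.value is not Value.PRESUMED_TRUE


def data_backed(claims: List[EpistemicClaim]) -> List[EpistemicClaim]:
    """Keep only claims whose stated basis is data."""
    return [c for c in claims if c.basis is Basis.DATA]


# The two examples from the slide, with their epistemic evaluations.
lats2 = EpistemicClaim("r(MIR:miR-372) -| r(HUGO:LATS2)",
                       Value.POSSIBLE, Source.UNKNOWN, Basis.UNKNOWN)
nesfatin = EpistemicClaim("IL-6, NUCB2 (nesfatin-1): MolTransport, Positive",
                          Value.PROBABLE, Source.AUTHOR, Basis.DATA)

for c in data_backed([lats2, nesfatin]):
    print(c.statement, "|", c.value.value, c.source.value, c.basis.value)
```

Storing the evaluation next to the statement, instead of discarding it during extraction, is what would let a downstream reader tell a "possibly" from an established fact.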
  • 25. Schemas: scientific articles are stories...
    Story: The Story of Goldilocks and the Three Bears. Paper: The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins.
    Columns: story text | story grammar | paper element | paper text
    • Setting
      – Once upon a time | Time | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
      – a little girl named Goldilocks | Characters | Objects of study | the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
      – She went for a walk in the forest. Pretty soon, she came upon a house. | Location | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein
    • Theme
      – She knocked and, when no one answered, | Goal | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
      – she walked right in. | Attempt | Hypothesis | Atx-1 may play a role in the regulation of gene expression
    • Episode 1
      – At the table in the kitchen, there were three bowls of porridge. | Name | Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
      – Goldilocks was hungry. | Subgoal | Subgoal | test the function of the AXH domain
      – She tasted the porridge from the first bowl. | Attempt | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
      – This porridge is too hot! she exclaimed. | Outcome | Results | Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
      – So, she tasted the porridge from the second bowl. | Activity | Data | (data not shown),
      – This porridge is too cold, she said | Outcome | Results | both genotypes show many large holes and loss of cell integrity at 28 days
  • 26. ...that persuade (editors/authors/readers!)…
    Columns: Aristotle | Quintilian | Scientific Paper
    • Aristotle: prooimion | Quintilian: Introduction / exordium | Scientific paper: Introduction: positioning
      – The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience.
    • Aristotle: prothesis | Quintilian: Statement of Facts / narratio | Scientific paper: Introduction: research question
      – The speaker here provides a narrative account of what has happened and generally explains the nature of the case.
    • Quintilian: Summary / propositio | Scientific paper: Summary of contents
      – The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation.
    • Aristotle: pistis | Quintilian: Proof / confirmatio | Scientific paper: Results
      – The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here.
    • Quintilian: Refutation / refutatio | Scientific paper: Related Work
      – As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent.
    • Aristotle: epilogos | Quintilian: peroratio | Scientific paper: Discussion: summary, implications.
      – Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up.
  • 28. What about the data? Using antibodies and squishy bits, grad students experiment and enter details into their lab notebook. The PI then tries to make sense of their slides, and writes a paper. End of story.
  • 29. Maslow’s Hierarchy of Needs for Research Data
    1. Preserved (existing in some form)
    2. Archived (long-term & format-independent)
    3. Accessible (can be accessed by others)
    4. Comprehensible (others can understand data & processes)
    5. Discoverable (can be indexed by a system)
    6. Reproducible (others can redo experiments)
    7. Trusted (validated/checked by reviewers)
    8. Citable (able to point & track citations)
    9. Usable (allow tools to run on it)
  • 30. 1. Preserve: Data Rescue Challenge
    • With IEDA/Lamont: award successful data rescue attempts
    • Awarded at AGU 2013
    • 23 submissions of data that was digitized, preserved, made available
    • Winner: NIMBUS Data Rescue:
      – Recovery, reprocessing and digitization of the infrared and visible observations along with their navigation and formatting.
      – Over 4000 7-track tapes of global infrared satellite data were read and reprocessed.
      – Nearly 200,000 visible light images were scanned, rectified and navigated.
      – All the resultant data was converted to HDF-5 (NetCDF) format and freely distributed to users from NASA and NSIDC servers.
      – This data was then used to calculate monthly sea ice extents for both the Arctic and the Antarctic.
    • Conclusion: we (collectively) need to do more of this! How can we fund it?
  • 31. 2. Archive: Olive Project
    • CMU CS & Library: funded by a grant from the IMLS, Elsevier is partner
    • Goal: Preservation of executable content, nowadays a large part of intellectual output, and very fragile
    • Identified a series of software packages and prepared VM to preserve
    • Does it work? Yes – see video (1:24)
  • 32. 3. Access: Urban Legend
    • Part 1: Metadata acquisition
    • Step through experimental process in series of dropdown menus in simple web UI
    • Can be tailored to workflow of individual researcher
    • Connected to shared ontologies through lookup table, managed centrally in lab
    • Connect to data input console (Igor Pro)
  • 33. 4. Comprehend: Urban Legend
    • Part 2: Data Dashboard
    • Access, select and manipulate data (calculate properties, sort and plot)
    • Final goal: interactive figures linked to data
    • Plan to expand to more neuroscience labs
    • Plan to build for geochemistry use case
  • 34. 5. Discover: Data Indexing proposals
    • Collaborated on Data Discovery Index proposal with UCSD/Carnegie Mellon
    • Also worked with UIUC!
    • Interested in developing distributed infrastructures for making data easier to search: what is the ‘Goldilocks Index’ where search is scalable, yet useful?
    • Looking for academic/industry partners/use cases/platforms to address the next stage
    • Discoverability is key driver for metadata/data format structure!
  • 35. 6. Reproduce: Resource Identifier Initiative
    Force11 Working Group to add data identifiers to articles that are:
      – 1) Machine readable;
      – 2) Free to generate and access;
      – 3) Consistent across publishers and journals.
    • Authors publishing in participating journals will be asked to provide RRIDs for their resources; these are added to the keyword field
    • RRIDs will be drawn from:
      – The Antibody Registry
      – Model Organism Databases
      – NIF Resource Registry
    • So far, Springer, Wiley, Biomednet and Elsevier have signed up with 11 journals; more to come
    • Wide community adoption!
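As a rough illustration of why machine-readable identifiers in a keyword field matter, the sketch below pulls RRIDs out of a keyword string. The regular expression is a simplifying assumption for illustration, not the initiative's official syntax, and the identifiers in the sample string are made-up placeholders rather than real registry entries.

```python
import re

# Assumed pattern: "RRID:" followed by an authority prefix, a separator,
# and an accession. Illustrative only; not an official grammar.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+[_:][A-Za-z0-9_:\-]+)")


def extract_rrids(text: str) -> list[str]:
    """Return the unique RRIDs found in text, in order of first appearance."""
    found: list[str] = []
    for match in RRID_PATTERN.finditer(text):
        rrid = "RRID:" + match.group(1)
        if rrid not in found:
            found.append(rrid)
    return found


# Hypothetical keyword field; the accession numbers are placeholders.
keywords = "ataxin-1; RRID:AB_0000001; Drosophila; RRID:SCR_0000002; RRID:AB_0000001"
print(extract_rrids(keywords))  # ['RRID:AB_0000001', 'RRID:SCR_0000002']
```

Because the identifiers follow one consistent shape across publishers, an indexer could run the same extraction over any participating journal's keyword field.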
  • 36. 7. Trust: Moonrocks
    How can we scale up data curation? Pilot project with IEDA:
    • A database for lunar geochemistry: leapfrog & improve curation time
    • 1-year pilot, funded by Elsevier
    • Main conclusion: if spreadsheet columns/headers map to RDB schema, we can scale curation cost!
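The main conclusion of the pilot, that curation scales when spreadsheet headers map onto the database schema, can be sketched roughly as follows. The header names, column names, and sample row are hypothetical and do not come from the IEDA pilot; the point is only that mapped columns load automatically while unmapped ones are routed to a curator.

```python
import csv
import io

# Hypothetical lookup from spreadsheet headers to relational-database columns.
HEADER_TO_COLUMN = {
    "sample id": "sample_id",
    "sio2 (wt%)": "sio2_wt_pct",
    "mgo (wt%)": "mgo_wt_pct",
}


def map_rows(csv_text: str):
    """Return (schema-ready rows, headers that still need a human curator)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = reader.fieldnames or []
    unmapped = [h for h in headers if h.strip().lower() not in HEADER_TO_COLUMN]
    rows = []
    for record in reader:
        # Keep only the cells whose header has a known schema column.
        rows.append({
            HEADER_TO_COLUMN[h.strip().lower()]: value
            for h, value in record.items()
            if h.strip().lower() in HEADER_TO_COLUMN
        })
    return rows, unmapped


# Made-up spreadsheet content for illustration.
sheet = "Sample ID,SiO2 (wt%),Notes\nS-001,45.2,picked by hand\n"
rows, unmapped = map_rows(sheet)
print(rows)      # [{'sample_id': 'S-001', 'sio2_wt_pct': '45.2'}]
print(unmapped)  # ['Notes']
```

The fewer headers land in the unmapped list, the less manual work each submitted spreadsheet costs.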
  • 37. 8. Cite: Force11 Data Citation Principles
    • Another Force11 Working group
    • Defined 8 principles:
      1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.
      2. Credit and attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.
      3. Evidence: Where a specific claim rests upon data, the corresponding data citation should be provided.
      4. Unique Identification: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.
      5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.
      6. Persistence: Metadata describing the data, and unique identifiers should persist, even beyond the lifespan of the data they describe.
      7. Versioning and granularity: Data citations should facilitate identification and access to different versions and/or subsets of data. Citations should include sufficient detail to verifiably link the citing work to the portion and version of data cited.
      8. Interoperability and flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities but should not differ so much that they compromise interoperability of data citation practices across communities.
    • Now seeking endorsement/working on implementation
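To make a few of the principles concrete, here is a minimal sketch of a data citation record that checks for the elements named in principles 2, 4 and 7. The principles do not prescribe a format, so the class, its field names, and the example values (including the placeholder identifier) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation record; field names are illustrative assumptions."""
    creators: tuple[str, ...]        # principle 2: credit and attribution
    title: str
    repository: str
    identifier: str                  # principle 4: persistent, globally unique ID
    version: Optional[str] = None    # principle 7: versioning
    subset: Optional[str] = None     # principle 7: granularity

    def missing_elements(self) -> list[str]:
        """List the checked elements that this record does not provide."""
        missing = []
        if not self.creators:
            missing.append("creators (credit and attribution)")
        if not self.identifier:
            missing.append("identifier (unique identification)")
        if self.version is None:
            missing.append("version (versioning and granularity)")
        return missing

    def render(self) -> str:
        """Format the record as a single human-readable citation string."""
        parts = ["; ".join(self.creators), self.title]
        if self.version:
            parts.append(f"version {self.version}")
        parts.extend([self.repository, self.identifier])
        return ". ".join(parts) + "."


# Placeholder values; the identifier below is not a real DOI.
citation = DataCitation(
    creators=("A. Researcher",),
    title="Example lunar geochemistry table",
    repository="Example Repository",
    identifier="doi:10.0000/example",
)
print(citation.missing_elements())  # ['version (versioning and granularity)']
print(citation.render())
```

A check like this only covers what can be verified from the record itself; principles such as persistence and access depend on the repository that hosts the data.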
  • 38. 9. Use: Executable Papers
    • Result of a challenge to come up with cyberinfrastructure components to enable executable papers
    • Pilot in Computer Science journals
      – See all code in the paper
      – Save it, export it
      – Change it and rerun on data set
  • 39. 10: Integrate data creation with data use
    1. Preserved (existing in some form): Key is to have data be digital and preservable, e.g. data rescue challenge. Need funding for digitisation projects.
    2. Archived (long-term & format-independent): Software standards change: need investment in updating, e.g. Olive Project to save OSs.
    3. Accessible (can be accessed by others): Build and encourage electronic lab notebooks to ensure data can be shared if/when needed; follow workflow.
    4. Comprehensible (others can understand data & processes): Build tools that allow researchers to interpret and reevaluate their data directly; drive adoption of ELNs.
    5. Discoverable (can be indexed by a system): Collaborate on grants to develop data discovery tools; promote and use common standards, indices.
    6. Reproducible (others can redo experiments): Follow Force11 Resource Identification initiative; reproducibility initiative. Support standard protocols.
    7. Trusted (validated/checked by reviewers): Work with domain data repositories to develop easier ways to upload data that conforms to their schema.
    8. Citable (able to point & track citations): Force11 Data Citation Principles link data to papers and v.v. Issues: need better identifiers, granularity, versioning.
    9. Usable (allow tools to run on it): Content enrichment using data, e.g. executable papers, virtual microscope, database linking, and others.
    10. Integrate upstream and downstream – make metadata to serve use.
  • 40. Thank you! Collaborations and discussions gratefully acknowledged:
    • CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Rick Gerkin, Santosh Chandrasekaran, Matthew Geramita, Eduard Hovy
    • UCSD: Phil Bourne, Brian Shoettlander, David Minor, Declan Fleming, Ilya Zaslavsky
    • NIF/Force11: Maryann Martone, Anita Bandrowski
    • OHSU: Melissa Haendel, Nicole Vasilevsky
    • California Digital Library: Carly Strasser, John Kunze, Stephen Abrams
    • IEDA: Kerstin Lehnert, Annika
    • Elsevier: Mark Harviston, Jez Alder, David Marques
  • 41. Questions? Anita de Waard VP Research Data Collaborations a.dewaard@elsevier.com http://researchdata.elsevier.com/