Why Life is Difficult, and What We Might Do About It
1. Why Research Data Management May Save Science
Anita de Waard
VP Research Data Collaborations
a.dewaard@elsevier.com
http://researchdata.elsevier.com/
Why Life is Difficult, and What We Can Do About It
2. Outline:
• The problem: life is difficult.
• One approach to tackling this: claim-evidence networks.
– How do we find claims?
– How do we find evidence?
– How do we connect the two?
• What is still missing?
• Call to action!
4. Problem 1: a rose is not a rose:
• “…there was significant variability of the injected venom composition from specimen to specimen, in spite of their common biogeographic origin.”
Jose A. Rivera-Ortiz, Herminsul Cano, Frank Marí, Intraspecies variability of the injected venom of Conus ermineus, doi:10.1016/j.peptides.2010.11.014
• “…Strains DV-3/84 DV-7/84 (group 3) showed 76.6% similarity to each other and were similar to all other strains at the 67.6% level.”
Zofia Dzierżewicz et al., Intraspecies variability of Desulfovibrio desulfuricans strains determined by the genetic profiles, FEMS Microbiology Letters, Volume 219, Issue 1, 14 February 2003, Pages 69–74, doi:10.1016/S0378-1097(02)01199-0
=> A specimen is not a species!
5. Problem 2: gene expression varies with:
Age: “SIRT1-Associated genes are deregulated in the aged brain”
Philipp Oberdoerffer et al., SIRT1 Redistribution on Chromatin Promotes Genomic Stability but Alters Gene Expression
during Aging, Cell, Volume 135, Issue 5, 28 November 2008, Pages 907–918, doi:10.1016/j.cell.2008.10.025
Smell: “…major urinary proteins […] mediate the pregnancy blocking effects of male urine”
P.A. Brennan et al., Patterns of expression of the immediate-early gene egr-1 in the accessory olfactory bulb of female mice exposed to pheromonal constituents of male urine, Neuroscience, Volume 90, Issue 4, June 1999, Pages 1463–1470, doi:10.1016/S0306-4522(98)00556-9
Hunger: “Out of the ~30K genes, about 10K are differentially expressed in liver cells when an animal is in different states of satiety.”
Zhang F, Xu X, Zhou B, He Z, Zhai Q (2011) Gene Expression Profile Change and Associated Physiological and Pathological Effects in Mouse Liver Induced by Fasting and Refeeding. PLoS ONE 6(11): e27553. doi:10.1371/journal.pone.002755
Light: “Longer-term enrichment training also altered the mRNA levels of many genes associated with structural changes that occur during neuronal growth.”
Cailotto C., et al. (2009) Effects of Nocturnal Light on (Clock) Gene Expression in Peripheral Organs: A Role for the Autonomic Innervation of the Liver. PLoS ONE 4(5): e5650. doi:10.1371/journal.pone.0005650
=> Knowing genes is not knowing how they are expressed!
6. Problem 3: No man (or mouse) is an island…
• “We found the diversity and abundance of each habitat’s signature microbes to vary widely even among healthy subjects, with strong niche specialization both within and among individuals.”
The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “Colonization of an infant’s gastrointestinal tract begins at birth. The acquisition and normal development of the neonatal microflora is vital for the healthy maturation of the immune system.”
Mackie RI, Sghir A, Gaskins HR, Developmental microbial ecology of the neonatal gastrointestinal tract. Am J Clin Nutr. 1999 May;69(5):1035S-1045S
=> An animal is an ecosystem!
7. Problem 4: Interactions create more complexity:
• Computing cancer: “No amount of information about what happens inside a single cell can ever tell you what a tissue is going to do,” [Glazier] said. “Much of the information and complexity of tissues and life is embedded in the way cells talk to each other and the extracellular environment.”
• Megadata: “These complex emergent systems are impossible to understand,” … “[we] founded Applied Proteomics to create a protein diagnostic that reveals not just where a cancer is, but how it interacts with the body.”
Nature Special Issue Vol. 491 No. 7425, ‘Physical Scientists Take On Cancer’
=> The whole is more than the sum of its parts!
8. Big problems in biology:
(Image: http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg)
1. Intraspecies variability => A specimen is not a species!
2. Gene expression variability => Knowing genes is not knowing how they are expressed!
3. Microbiome => An animal is an ecosystem!
4. Systems biology => The whole is more than the sum of its parts!
5. Models vs. experiment => Are we talking about the same things? In a way we can all use?
6. Dynamics => Life is not in equilibrium!
Life is complicated! Reductionism doesn’t work for living systems.
9. Statistics could help!
With enough observations, trends and anomalies can be detected:
• “Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far.”
The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
• “The large sample size — 4,298 North Americans of
European descent and 2,217 African Americans — has
enabled the researchers to mine down into the human
genome.”
Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing
study emphasizes importance of rare variants in disease.
10. But biological research is insular!
• Biology is small: it works at scales of 10^-5 to 10^2 m, so a scientist can work alone (‘King’ and ‘subjects’).
• Biology is messy: it doesn’t happen behind a terminal.
• Biology is competitive: many people with similar skill sets are vying for the same grants.
• In summary: the structure of biological research does not inherently promote collaboration (unlike, for instance, high-energy physics or astronomy, and they’re not all they’re cracked up to be, either…).
(Figure: the research cycle: Prepare, Observe, Analyze, Ponder, Communicate)
13. Converging on Claim/Evidence/Networks, e.g. here:
• The Karyotype Ontology: a computational representation for human cytogenetic patterns. Jennifer Warrender and Phillip Lord
• Lexical Analysis and Characterization of the OBOFoundry Ontologies. Manuel Quesada-Martínez, Jesualdo Tomás Fernández-Breis and Robert Stevens
• Exomiser: improved exome prioritization of disease genes through cross species phenotype comparison. Peter Robinson, Sebastian Köhler, Anika Oellrich, Kai Wang, Chris Mungall, Suzanna E. Lewis, Sebastian Bauer, Dominik Seelow, Peter Krawitz, Christian Gilissen, Melissa Haendel and Damian Smedley
• BioAssay Ontology (BAO): Modularization, Integration and Applications. Uma Vempati, Hande Kucuk, Saminda Abeyruwan, Ubbo Visser, Vance Lemmon, Ahsan Mir and Stephan Schürer
• eXframe: A Semantic Web Platform for Genomics Experiments. Emily Merrill, Stephane Corlosquet, Paolo Ciccarese, Tim Clark and Sudeshna Das
• Ovopub: Modular data publication with minimal provenance. Alison Callahan and Michel Dumontier
• Zooma – A tool for automated ontology annotation. Tony Burdett, Simon Jupp, James Malone, Helen Parkinson, Eleanor Williams and Adam Faulconbridge
• A Probabilistic Framework for Ontology-Based Annotation in Neuroimaging Literature. Chayan Chakrabarti, Thomas B. Jones, Jiawei F. Xu, George F. Luger, Angela R. Laird, Matthew D. Turner and Jessica A. Turner
• Preserving sequence annotations across reference sequences. Zuotian Tatum, Andrew Gibson, Marco Roos, Peter E.M. Taschner, Mark Thompson, Erik A. Schultes and Jeroen F. J. Laros
• A Taxonomy for Immunologists. James A. Overton, Randi Vita, Jason A. Greenbaum, Heiko Dietze, Alessandro Sette and Bjoern Peters
• Health Data Ontology Trunk: A middle-layer ontology for health-care. Ulf Schwarz, Luc Schneider, Emilio Sanfilippo, Holger Stenzhorn and Nikolina Koleva
• Structured representation of scientific evidence using semantic web techniques – a biochemistry use case. Christian Bölling, Michael Weidlich and Hermann-Georg Holzhütter
• Synthetic Biology Open Language Visual: an ontological use case. Jacqueline Quinn, Michal Galdzicki, Robert
14. Step 1: Find claims:
E.g., using XIP for discourse analysis:
In contrast with previous hypotheses compact plaques form before significant deposition of diffuse A beta, suggesting that different mechanisms are involved in the deposition of diffuse amyloid and the aggregation into plaques.
(Figure: layers of analysis of this sentence: Entities, Relationships, Temporality, Connections/thematic roles, Status; core information (proposition) => information extraction; rhetorical and metadiscourse layers => discourse analysis, discourse structure.)
Sándor, Ágnes and de Waard, Anita (2012).
15. Finding Claimed Knowledge Updates (CKUs):
Sándor, Á. and de Waard, A. (2012)
Examples:
• Here we used mass spectrometry to identify HuD as a novel neuronal SMN-interacting partner
• Our analysis of known HuD-associated mRNAs in neurons identified cpg15 mRNA as a highly abundant mRNA in HuD IPs
• Our finding that SMN protein associates with HuD protein and the HuD target cpg15 mRNA in neurons …
Definition:
1) A CKU expresses a verbal or nominal proposition about biological entities.
2) A CKU is a new proposition.
3) The authors present the CKU as factual.
4) A CKU is derived from the experimental work described in the article.
5) The ownership of the proposition is attributed to the author(s) of the article.
6) 4) and 5) are either expressed explicitly or conveyed implicitly by a structural position such as a title, section title or caption title.
16. Allow for Hedging and Uncertainty:
Ontology of Reasoning, Certainty and Attribution (ORCA)
For a proposition P, an epistemically marked clause E is an evaluation of P, written $E_{V,B,S}(P)$, with:
– V = Value:
3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown
(-1 = Possibly untrue, -2 = Probably untrue, -3 = Assumed untrue)
– B = Basis:
Reasoning
Data
– S = Source:
A = speaker is author A, explicit
IA = speaker is author A, implicit
N = other author N, explicit
NN = other author NN, implicit
Based on a conversation with Ed Hovy;
de Waard, A. and Schneider, J. (2012)
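As a data structure, the evaluation $E_{V,B,S}(P)$ is a small record over three controlled vocabularies. A minimal Python sketch, assuming that an unknown basis or source (as in the BEL example that follows) is represented as `None`; the enum member names and the `leans_true` helper are hypothetical and not part of ORCA as published:

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Optional


class Value(IntEnum):
    """V: certainty, ordered from -3 (assumed untrue) to 3 (assumed true)."""
    ASSUMED_UNTRUE = -3
    PROBABLY_UNTRUE = -2
    POSSIBLY_UNTRUE = -1
    UNKNOWN = 0
    POSSIBLE = 1
    PROBABLE = 2
    ASSUMED_TRUE = 3


class Basis(Enum):
    """B: what the evaluation rests on."""
    REASONING = "Reasoning"
    DATA = "Data"


class Source(Enum):
    """S: who holds the evaluation."""
    A = "speaker is author A, explicit"
    IA = "speaker is author A, implicit"
    N = "other author N, explicit"
    NN = "other author NN, implicit"


@dataclass(frozen=True)
class Evaluation:
    """E_{V,B,S}(P); None stands in for an unknown basis or source (assumption)."""
    proposition: str
    value: Value
    basis: Optional[Basis] = None
    source: Optional[Source] = None

    def leans_true(self) -> bool:
        # IntEnum keeps V ordered, so certainty can be compared numerically.
        return self.value > Value.UNKNOWN


e = Evaluation("miR-372 decreases abundance of LATS2", Value.POSSIBLE)
print(e.value.name, e.leans_true())  # POSSIBLE True
```

Using an `IntEnum` for V keeps the scale ordered, which is what makes questions like "did this claim become more certain when it was cited?" computable.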
17. Turning claims into formal representations:
Biological statement with BEL/epistemic markup:
These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.
BEL representation:
r(MIR:miR-372) -| (tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family")))
Increased abundance of miR-372 decreases abundance of LATS2:
r(MIR:miR-372) -| r(HUGO:LATS2)
Epistemic evaluation: Value = Possible; Source = Unknown; Basis = Unknown

Biological statement with MedScan/epistemic markup:
Furthermore, we present evidence that the secretion of nesfatin-1 into the culture media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01).
MedScan representation:
Entities: IL-6, NUCB2 (nesfatin-1); Relation: MolTransport; Effect: Positive; CellType: Adipocytes; Cell Line: 3T3-L1
Epistemic evaluation: Value = Probable; Source = Author; Basis = Data
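The BEL statements above nest: the object of the first statement is itself a statement. A toy Python parser can make that structure explicit. It is a sketch, not the OpenBEL tooling: it handles only the two short-form operators used here (`->` for increases, `-|` for decreases), ignores the rest of the BEL grammar, and the function names are hypothetical:

```python
# Short-form BEL relationship operators handled by this sketch only.
RELATIONS = {"->": "increases", "-|": "decreases"}


def split_top_level(stmt: str):
    """Return (subject, relation, object) for the first operator found
    outside parentheses and quotes, or None if there is none."""
    depth = 0
    in_quotes = False
    for i, ch in enumerate(stmt):
        if ch == '"':
            in_quotes = not in_quotes
        elif in_quotes:
            continue
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif depth == 0 and stmt[i:i + 2] in RELATIONS:
            return stmt[:i].strip(), RELATIONS[stmt[i:i + 2]], stmt[i + 2:].strip()
    return None


def parse(stmt: str):
    """Parse into nested (subject, relation, object) tuples; terms stay strings."""
    stmt = stmt.strip()
    parts = split_top_level(stmt)
    if parts is None:
        if stmt.startswith("(") and stmt.endswith(")"):
            return parse(stmt[1:-1])  # a nested statement used as an object
        return stmt  # a bare term such as r(HUGO:LATS2)
    subject, relation, obj = parts
    return (parse(subject), relation, parse(obj))


nested = parse('r(MIR:miR-372) -| (tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family")))')
print(nested)
```

Tracking parenthesis depth is what stops the hyphen inside `miR-372`, or the inner `-|`, from being mistaken for the top-level relation.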
19. The evidence is in data. To structure this:
• There are many different research databases, both generic (Dryad, Dataverse, DataBank, Zenodo, etc.) and specific (NIF, IEDA, PDB)
• There are many systems for creating/sharing workflows (Taverna, myExperiment, VisTrails, Workflow4Ever)
• There are many e-lab notebooks (LabGuru, LabArchives, LaBlog, etc.)
• There are scores of projects, committees, standards, bodies, grants, initiatives and conferences for discussing and connecting all of this (KEfED, Pegasus, PROV, RDA, Science Gateways, CODATA, BRDI, EarthCube, etc.)
• … you could make a living out of this!
20. …but this is what most scientists do:
Using antibodies and squishy bits, grad students experiment and enter details into their lab notebooks.
The PI then tries to make sense of their slides and writes a paper.
End of story.
21. One attempt to structure data:
CMU Urban Legend
de Waard, A., Burton, S. et al., 2013
26. Step 3: Connect Claims and Evidence
Example: Hunter et al., Hanalyzer:
27. Example: Drug-Drug Interactions
Step 1: Manually identify DDIs and drug names in a wide collection of content sources
Step 2: Develop a model of drug-drug interaction and define candidates
Step 3: Automate this process and store as Linked Data
Boyce, Schneider et al., 2013
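Step 3 ends with storing candidates as Linked Data. A minimal Python sketch of that last step, serialising one candidate interaction as N-Triples with only the standard library. The `example.org` namespace, the property names and the precipitant/object field split are all hypothetical; a real deployment would reuse an established DDI vocabulary:

```python
from dataclasses import dataclass

# Hypothetical namespace and properties, for illustration only.
BASE = "http://example.org/ddi/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class DDICandidate:
    precipitant: str   # drug assumed to cause the interaction
    object_drug: str   # drug whose effect is assumed to change
    source: str        # content source the claim was found in


def literal(text: str) -> str:
    """Quote a plain N-Triples string literal, escaping backslashes, quotes, newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(candidate: DDICandidate, ident: str) -> str:
    """One N-Triples line per property of the candidate."""
    subject = f"<{BASE}{ident}>"
    triples = [
        (f"<{RDF_TYPE}>", f"<{BASE}DrugDrugInteraction>"),
        (f"<{BASE}precipitant>", literal(candidate.precipitant)),
        (f"<{BASE}objectDrug>", literal(candidate.object_drug)),
        (f"<{BASE}source>", literal(candidate.source)),
    ]
    return "\n".join(f"{subject} {p} {o} ." for p, o in triples)


out = to_ntriples(DDICandidate("drug-A", "drug-B", "product label"), "ddi-1")
print(out)
```

N-Triples is used here because it is line-based and needs no RDF library; any RDF toolkit can load the output.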
29. Example: do science ON the web:
Using what is known about interactions in fly & yeast, predict new interactions with a human protein, running over data on the web that the researcher neither created nor knew about!
Given a protein P in Species X:
– Find proteins similar to P in Species Y
– Retrieve interactors in Species Y
– Sequence-compare Y-interactors with the Species X genome
– (1) Keep only those with a homologue in Species X
– Find proteins similar to P in Species Z
– Retrieve interactors in Species Z
– Sequence-compare Z-interactors with (1)
=> Putative interactors in Species X
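The workflow above can be sketched in a few lines of Python. Two loud assumptions: the web services the original workflow queried are replaced by in-memory dictionaries, and the sequence comparison is replaced by a precomputed homologue lookup. "Compare Z-interactors with (1)" is read as an intersection. All function names and the toy protein names are hypothetical:

```python
from typing import Dict, Iterable, Set

Links = Dict[str, Set[str]]  # protein -> related proteins (similar or interacting)


def expand(proteins: Iterable[str], links: Links) -> Set[str]:
    """All proteins linked to any protein in `proteins`."""
    return {q for p in proteins for q in links.get(p, set())}


def putative_interactors(p: str, similar_y: Links, interacts_y: Links,
                         similar_z: Links, interacts_z: Links,
                         homologue_in_x: Dict[str, str]) -> Set[str]:
    # Species Y: similar proteins -> their interactors -> homologues in X = set (1)
    y_interactors = expand(expand([p], similar_y), interacts_y)
    set_1 = {homologue_in_x[q] for q in y_interactors if q in homologue_in_x}
    # Species Z: same route; keep only X proteins that are also in set (1)
    z_interactors = expand(expand([p], similar_z), interacts_z)
    z_in_x = {homologue_in_x[q] for q in z_interactors if q in homologue_in_x}
    return set_1 & z_in_x


result = putative_interactors(
    "P",
    similar_y={"P": {"yP"}}, interacts_y={"yP": {"yA", "yB"}},
    similar_z={"P": {"zP"}}, interacts_z={"zP": {"zA", "zC"}},
    homologue_in_x={"yA": "A", "yB": "B", "zA": "A", "zC": "C"},
)
print(result)  # {'A'}
```

Requiring support from two species trades recall for precision: `B` and `C` each have evidence from only one organism and are dropped.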
30. Great! So we’re almost done, right – and we can all go home!
Not so fast…
31. What is a claim? In a paragraph?
Both seminomas and the EC component of nonseminomas share features with ES cells. To exclude that the detection of miR-371-3 merely reflects its expression pattern in ES cells, we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004). In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8), suggesting that miR-371-3 expression is a selective event during tumorigenesis.
The same paragraph, split into discourse segments:
– Both seminomas and the EC component of nonseminomas share features with ES cells.
– To exclude that the detection of miR-371-3 merely reflects its expression pattern in ES cells,
– we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
– In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
– suggesting that
– miR-371-3 expression is a selective event during tumorigenesis.
Segment types: Fact, Hypothesis, Method, Result, Implication, Goal, Reg-Implication; grouped into conceptual knowledge and experimental evidence.
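To make the segment types concrete, here is a deliberately naive cue-phrase labeller in Python. The cue rules are hypothetical and chosen to fit this one paragraph; they are not the classifier used in the work cited here, they do not cover Hypothesis or Reg-Implication, and anything without a cue defaults to Fact. The last two segments from the slide are merged into one:

```python
from typing import Callable, List, Tuple

# Hypothetical cue rules, checked in order; the first match wins.
RULES: List[Tuple[str, Callable[[str], bool]]] = [
    ("Goal", lambda s: s.startswith(("to exclude", "to test", "to determine"))),
    ("Method", lambda s: any(c in s for c in ("we tested", "we used", "we measured"))),
    ("Result", lambda s: "(fig" in s),  # segments citing a figure
    ("Implication", lambda s: any(c in s for c in ("suggesting that", "indicating that"))),
]


def label(segment: str) -> str:
    """Return the first segment type whose cue matches, else 'Fact'."""
    s = segment.lower().strip()
    for name, matches in RULES:
        if matches(s):
            return name
    return "Fact"  # default when no cue fires


segments = [
    "Both seminomas and the EC component of nonseminomas share features with ES cells.",
    "To exclude that the detection of miR-371-3 merely reflects its expression pattern in ES cells,",
    "we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).",
    "In many of the miR-371-3 expressing seminomas and nonseminomas, "
    "miR-302a-d was undetectable (Figs S7 and S8),",
    "suggesting that miR-371-3 expression is a selective event during tumorigenesis.",
]
print([label(s) for s in segments])
```

Even this crude version shows why the paragraph, not the sentence, is the natural unit: the conclusion only becomes an Implication through its link to the preceding Result.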
32. What is the claim? Who makes it?
• Voorhoeve et al., 2006: “These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.”
• Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).”
• Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).”
“[Y]ou can transform … fiction into fact, just by adding or subtracting references”, Latour, 1987
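The three citations above show a hedge disappearing as a claim travels. A minimal Python sketch that flags hedge cues in each citing sentence; the cue lexicon is a small hypothetical list, far simpler than the epistemic markers catalogued in the work referenced in this talk:

```python
import re

# Small, hypothetical hedge-cue lexicon; real systems use much richer cue lists.
HEDGE_CUES = {"possibly", "may", "might", "suggest", "suggests", "suggesting",
              "likely", "putative", "perhaps"}


def hedges(sentence: str) -> set:
    """Return the hedge cues (lower-cased whole words) present in a sentence."""
    return set(re.findall(r"[a-z]+", sentence.lower())) & HEDGE_CUES


citations = [
    ("Voorhoeve et al., 2006", "These miRNAs neutralize p53-mediated CDK inhibition, "
     "possibly through direct inhibition of the expression of the tumor suppressor LATS2."),
    ("Kloosterman and Plasterk, 2006", "In a genetic screen, miR-372 and miR-373 were found "
     "to allow proliferation of primary human cells that express oncogenic RAS and active "
     "p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006)."),
    ("Okada et al., 2011", "Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the "
     "expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 "
     "(Voorhoeve et al., 2006)."),
]

for source, sentence in citations:
    cues = hedges(sentence)
    print(f"{source}: {'hedged ' + str(sorted(cues)) if cues else 'unhedged'}")
```

Run over this chain, the first two citations keep "possibly" and the 2011 citation states the same relation without any hedge: Latour's fiction-into-fact step, made detectable.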
33. Evidence is largely lost…
> 50 million papers; 2 million scientists; 2 million papers/year
• Majority of data (90%?) is stored on local hard drives
• Some data (8%?) is stored in large, generic data repositories: Dryad: 7,631 files; Dataverse: 0.6 million; DataCite: 1.5 million
• A small portion of data (1–2%?) is stored in small, topic-focused data repositories: MiRB: 25k; PetDB: 1.5k; TAIR: 72.1k; PDB: 88.3k; SedDB: 0.6k
35. …and you can’t extract it after the fact.
• In 220 publications, only 40% of antibodies, 40% of cell lines and 25% of constructs can be manually identified (Vasilevsky et al., submitted)
• The good news: we can automatically find what we can find manually
• Proposal (NIH, June 2013):
– The author is asked to submit the methods section to a tool
– The tool extracts likely reagents / resources
– A user interface asks the author to confirm or select
Entity type    Precision (%)    Recall (%)
Antibody       87.5             63.3
Resource       95.6             98.9
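The slide reports precision and recall separately. A common single summary is the F1 score, their harmonic mean; it is not reported on the slide and is computed here only as an illustration, assuming the table values are percentages:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; the inputs' units (here %) carry through."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall from the table above, read as percentages.
scores = {"Antibody": (87.5, 63.3), "Resource": (95.6, 98.9)}
for entity, (p, r) in scores.items():
    print(f"{entity}: F1 = {f1(p, r):.1f}")
```

This gives about 73.5 for antibodies and 97.2 for resources; the harmonic mean is pulled toward the lower value, so the antibody score is dominated by its recall of 63.3.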
36. Even if we can link to evidence:
• Is it true?
38. We need to improve claim networks:
• Can we make systems of computer-readable meaning that still represent the fullness of natural language?
>> Let’s work with computational linguists!
• Trace claims across publications:
>> Let’s work with legal/political argumentation specialists! Sentiment analysis!
39. Improve evidence: scale up data curation!
(Same data landscape as slide 33: > 50 million papers, 2 million scientists, 2 million papers/year; majority of data (90%?) on local hard drives, some data (8%?) in large, generic repositories, a small portion (1–2%?) in small, topic-focused repositories.)
• Increase data digitisation
• Develop sustainable models
• Improve repository interoperability
40. Keep asking big questions:
• Is this true?
• Does it matter?
• To whom?
“Let us now build systems that allow a kid in Mali
who wants to learn about proteomics to not be
overwhelmed by the irrelevant and the untrue.”
- John Perry Barlow, iAnnotate, SF 2013
41. In Memoriam Douglas C. Engelbart, 1925-2013:
“This is an initial summary report of a project taking a new and systematic approach to improving the intellectual effectiveness of the individual human being. A detailed conceptual framework explores the nature of the system composed of the individual and the tools, concepts, and methods that match his basic capabilities to his problems. One of the tools that shows the greatest immediate promise is the computer, when it can be harnessed for direct on-line assistance, integrated with new concepts and methods.”
42. Summary:
• The problem: life is difficult.
• One approach to tackling this: claim-evidence networks:
– Find claims
– Identify evidence
– Connect the two.
• But we still need:
– Better ways to represent subtlety of natural language
– Better evidence: more structured, better connected
– Focus on the big questions.
• There’s a lot of work to do!
43. Collaborations and discussions gratefully acknowledged:
• CMU: Nathan Urban, Shreejoy Tripathy, Shawn Burton, Ed Hovy
• UCSD: Phil Bourne, Brian Schottlaender, Ilya Zaslavsky
• NIF: Maryann Martone, Anita Bandrowski
• MSU: Brian Bothner
• OHSU: Melissa Haendel, Nicole Vasilevsky
• CDL: Carly Strasser, John Kunze, Stephen Abrams
• Harvard/MGH: Tim Clark, Paolo Ciccarese
• VU: Rinke Hoekstra, Frank van Harmelen, Paul Groth
• Columbia/IEDA: Kerstin Lehnert, Leslie Hsu
• University of Pittsburgh: Richard Boyce
• Xerox Research Centre Europe: Ágnes Sándor
• DERI: Jodi Schneider
Thank you!
44. References:
• de Waard, Buckingham Shum, Park, Samwald, Sandor, 2009: Hypotheses, Evidence and Relationships, ISWC2009
• Biological Expression Language – http://www.openbel.org
• Latour, B. and Woolgar, S., Laboratory Life: the Social Construction of Scientific Facts, 1979, Sage Publications
• Latour, B., Science in Action, 1987
• de Waard, A. and Pander Maat, H. (2012). Epistemic Modality and Knowledge Attribution in Scientific Discourse: A
Taxonomy of Types and Overview of Features. Proceedings of the 50th Annual Meeting of the Association for
Computational Linguistics, pages 47–55, Jeju, Republic of Korea, 12 July 2012.
• Data2Semantics project: http://www.data2semantics.org/
• Sándor, Ágnes and de Waard, Anita (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles, Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.
• de Waard, A. and Schneider, J. (2012) Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution
(ORCA), Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine workshop, ISWC 2012
• de Waard, A., Burton, S.D., Gerkin, R.C., Harviston, M., Marques, D., Tripathy, S.J., Urban, N.N., Creating an Urban
Legend: A System for Electrophysiology Data Management and Exploration, Discovery Informatics, 2013
• Boyce, R.D., Horn, J.R., Hassanzadeh, O., de Waard, A., Schneider, J., Luciano, J.S., Liakata, M., Dynamic enhancement of drug product labels to support drug safety, efficacy, and effectiveness. Journal of Biomedical Semantics, 2013, 4:5.
• Hoekstra, R., de Waard, A., Vdovjak, R. (2012) Annotating Evidence Based Clinical Guidelines - A Lightweight Ontology, Proceedings of SWAT4LS 2012, Paris, Adrian Paschke, Albert Burger, Paolo Romano, M. Scott Marshall, Andrea Splendiani (eds.), Springer.