How Scientists Read, How Computers Read, and What We Should Do
1. How Scientists Read,
How Computers Read,
and What We Should Do
(= not what it says in the abstract!)
Anita de Waard
Disruptive Technologies Director
Elsevier Labs
2. Outline
1. How do scientists read?
2. How do computers read?
3. What should we do?
4. How we read
• Letter < syllable < word < clause < sentence < discourse:
This is how linguistics is structured.
But it is not how we understand text!
10. Scientists read:
• Why do scientists read?
– They want to ingest knowledge:
– read, integrate with their current knowledge
• What do scientists read?
– Things that are ‘interesting’ :
– Pertinent (within their ‘shell of interest’)
– Possibly or probably true
– Novel, but in agreement with what we know
11. What is this paper about?
NOUN PHRASES
transiently expressed miRNA sponges
human breast cancer high-grade malignancy
miR-31
noninvasive MCF7-Ras
antisense oligonucleotides
cell viability cloned
retroviral vector
Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?
12. What is this paper about?
TRIPLES
miR-31 expression DEPRIVE metastatic cells
miR-31 PREVENT acquisition of aggressive traits
miR-31 INHIBIT noninvasive MCF7-Ras cells
miR-31 ENHANCE invasion
cell viability AFFECT inhibitor
Is it pertinent? -> Possibly…
Is it true? -> ?
Is it new, but in agreement with what I know? -> ?
13. What is this paper about?
METADISCOURSE
The preceding observations demonstrated that X expression deprives Y cells of
attributes associated with Z.
We next asked whether X also prevents the acquisition of A traits by B cells.
To do so, we transiently inhibited X in C cells with either D or E.
Both approaches inhibited X function by > 4.5-fold (Figure S7A).
Suppression of X enhanced invasion by 20-fold and motility by 5-fold, but F was
unaffected by either inhibitor (Figure 3A; Figure S7B).
The E sponge reduced X function by 2.5-fold, but did not affect the activity of other
known Js (Figures S8A and S8B).
Collectively, these data indicated that sustained X activity is necessary to prevent the
acquisition of Z traits by both K and untransformed B cells.
Is it pertinent? -> Need content
Is it true? -> Sounds likely! I know this stuff!
Is it new, but in agreement with what I know? -> Need content
14. What is this paper about?
CLAIMS AND EVIDENCE
Claim:
• sustained miR-31 activity is necessary to prevent the acquisition of aggressive
traits by both tumor cells and untransformed breast epithelial cells
Evidence: Method:
• We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either
antisense oligonucleotides or miRNA sponges.
Evidence: Result:
• Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).
• Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold,
but cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B).
• The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect
the activity of other known antimetastatic miRNAs (Figures S8A and S8B).
Is it pertinent? -> Probably
Is it true? -> Sounds likely!
Is it new, but in agreement with what I know? -> Check/know
15. What is this paper about?
DATA
Is it pertinent? -> Need content
Is it true? -> Need methods
Is it new, but in agreement with what I know? -> Check/know
16. What is this paper about?
METADATA
Is it pertinent? -> Possibly
Is it true? -> Probably!
Is it new, but in agreement with what I know? -> Need background
17. How scientists read:
Representation   Pertinence   Truth   Fit with knowledge
Noun phrases x
Triples x
Metadiscourse x
Claims and evidence x x x
Data x x x
Metadata x
Text mining
Publishing
Data-centric science
18. Outline
1. How do scientists read?
2. How do computers read?
3. What should we do?
20. Noun Phrases: some progress
• Despite these difficulties, noun phrase recall/precision is
quite high, e.g. I2B2 2011 [1], [2], others: 90%-98%
• Many tools, see [3] for a list; e.g. GoPubMed:
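For readers unfamiliar with the recall/precision figures quoted above: both are computed by comparing the extracted noun phrases with a hand-annotated gold standard. A minimal sketch follows; the two phrase sets are made-up toy data reusing phrases from the noun-phrase slide, not results from [1] or [2].

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for two collections of noun phrases."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of extracted phrases that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard phrases that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data, for illustration only.
gold = {"miR-31", "retroviral vector", "antisense oligonucleotides", "cell viability"}
predicted = {"miR-31", "retroviral vector", "antisense oligonucleotides", "cloned"}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here three of the four extracted phrases are correct and three of the four gold phrases are found, so both scores are 0.75.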
21. Triples: some issues:
• Contingent on good NP & VP detection
• Hard to parse text! E.g. a commercial tool gave:
insulin maintaining glucose homeostasis
When insulin secretion cannot be increased adequately (type I
diabetes defect) to overcome insulin resistance in maintaining
glucose homeostasis, hyperglycemia and glucose intolerance
ensues.
insulin may be involved glucose homeostasis
Because PANDER is expressed by pancreatic beta-cells and in
response to glucose in a similar way to those of insulin, PANDER
may be involved in glucose homeostasis.
22. Triples: some progress:
Biological Expression Language [4]:
We provide evidence that these miRNAs are potential novel oncogenes participating in the development
of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in
the presence of wild-type p53.
Increased abundance of miR-372 decreases activity of TP53
r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
Context: cancer
SET Disease = “Cancer”
Activity of TP53 decreases cell growth
tscript(p(HUGO:Trp53)) -| bp(GO:”Cell Growth”)
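One way to see why such statements are useful to a computer is to hold them as subject/relation/object records and chain them. The sketch below is an illustration only: the `Statement` class, `render` and `chains_to` are hypothetical names, not part of any OpenBEL tooling, and the BEL terms are copied from the slide as plain strings without being parsed or validated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A BEL-style triple plus optional context annotations (hypothetical type)."""
    subject: str
    relation: str
    obj: str
    context: tuple = ()  # pairs such as ("Disease", "Cancer")

    def render(self) -> str:
        """Write the statement in the line layout used on the slide."""
        lines = [f'SET {key} = "{value}"' for key, value in self.context]
        lines.append(f"{self.subject} {self.relation} {self.obj}")
        return "\n".join(lines)


def chains_to(first: Statement, second: Statement) -> bool:
    """True when the object of `first` is the subject of `second`."""
    return first.obj == second.subject


mir372 = Statement("r(MIR:miR-372)", "-|", "tscript(p(HUGO:Trp53))",
                   context=(("Disease", "Cancer"),))
tp53 = Statement("tscript(p(HUGO:Trp53))", "-|", 'bp(GO:"Cell Growth")')

print(mir372.render())
print(tp53.render())
print("statements chain:", chains_to(mir372, tp53))
```

Because the two statements share the TP53 term, a program can link them into a path from miR-372 to cell growth without reading the sentence they came from.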
25. Metadiscourse: why it matters:
“[Y]ou can transform .. fiction into fact just by adding or
subtracting references”, Bruno Latour [5]
• Voorhoeve et al., 2006: “These miRNAs neutralize p53-mediated CDK
inhibition, possibly through direct inhibition of the expression of the tumor
suppressor LATS2.”
• Kloosterman and Plasterk, 2006: “In a genetic screen, miR-372 and miR-373
were found to allow proliferation of primary human cells that express
oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor
LATS2 (Voorhoeve et al., 2006).”
• Okada et al., 2011: “Two oncogenic miRNAs, miR-372 and miR-373, directly
inhibit the expression of Lats2, thereby allowing tumorigenic growth in the
presence of p53 (Voorhoeve et al., 2006).”
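The drift in the three quotations above can be made visible with a very crude hedge detector. This is a sketch under strong assumptions: the word list in `HEDGES` is a small hand-picked illustration, not the taxonomy of [10] or any published lexicon, and a regular expression is far weaker than the annotation schemes in [6]-[8].

```python
import re

# Hypothetical, deliberately tiny hedge lexicon (an assumption for illustration).
HEDGES = re.compile(
    r"\b(possibly|probably|may|might|could|suggest\w*|appear\w*)\b",
    re.IGNORECASE,
)


def hedges(sentence):
    """Return the hedging words found in a sentence, lower-cased, in order."""
    return [match.group(0).lower() for match in HEDGES.finditer(sentence)]


# The three sentences quoted on the slide.
citations = {
    "Voorhoeve et al., 2006": (
        "These miRNAs neutralize p53-mediated CDK inhibition, possibly through "
        "direct inhibition of the expression of the tumor suppressor LATS2."
    ),
    "Kloosterman and Plasterk, 2006": (
        "In a genetic screen, miR-372 and miR-373 were found to allow "
        "proliferation of primary human cells that express oncogenic RAS and "
        "active p53, possibly by inhibiting the tumor suppressor LATS2 "
        "(Voorhoeve et al., 2006)."
    ),
    "Okada et al., 2011": (
        "Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the "
        "expression of Lats2, thereby allowing tumorigenic growth in the "
        "presence of p53 (Voorhoeve et al., 2006)."
    ),
}

for source, sentence in citations.items():
    print(source, "->", hedges(sentence))
```

The first two sentences keep the hedge "possibly"; the third has none, although it cites the same source.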
26. Adding metadiscourse to triples:
Biological statement with BEL/epistemic markup:
These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition
of the expression of the tumor-suppressor LATS2.
BEL representation:
r(MIR:miR-372) -|(tscript(p(HUGO:Trp53)) -| kin(p(PFH:”CDK Family”)))
Increased abundance of miR-372 decreases abundance of LATS2
r(MIR:miR-372) -| r(HUGO:LATS2)
Epistemic evaluation:
Value = Possible
Source = Unknown
Basis = Unknown
Biological statement with MedScan/epistemic markup:
Furthermore, we present evidence that the secretion of nesfatin-1 into the culture
media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into
adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and
dexamethasone (P < 0.01).
MedScan analysis:
IL-6 NUCB2 (nesfatin-1)
Relation: MolTransport
Effect: Positive
CellType: Adipocytes
Cell Line: 3T3-L1
Epistemic evaluation:
Value = Probable
Source = Author
Basis = Data
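The Value / Source / Basis evaluation on this slide can travel with each extracted relation as three extra fields. The sketch below assumes a hypothetical `EpistemicStatement` record and a hypothetical `data_backed` filter; the relation text is stored as an opaque string and the field values are the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpistemicStatement:
    """An extracted relation with its epistemic evaluation (hypothetical type)."""
    relation: str  # BEL-style or MedScan-style relation, kept as plain text
    value: str     # e.g. "Possible", "Probable"
    source: str    # e.g. "Author", "Unknown"
    basis: str     # e.g. "Data", "Unknown"


statements = [
    EpistemicStatement("r(MIR:miR-372) -| r(HUGO:LATS2)",
                       value="Possible", source="Unknown", basis="Unknown"),
    EpistemicStatement("MolTransport of NUCB2 (nesfatin-1), Effect: Positive",
                       value="Probable", source="Author", basis="Data"),
]


def data_backed(items):
    """Keep only statements whose stated basis is experimental data."""
    return [item for item in items if item.basis == "Data"]


for item in data_backed(statements):
    print(item.value, "|", item.relation)
```

A downstream tool could use such a filter to treat a hedged, unattributed relation differently from one the authors back with their own data.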
27. Claims and Evidence, some examples:
Data2Semantics [11]
• Linking clinical guidelines to evidence in a linked data form
• Goal: improve speed of integration of research > practice
• Issue: is the evidence even cited correctly within the guideline?
• Studies have demonstrated inconsistent results regarding the
use of such markers of inflammation as C-reactive protein (CRP),
interleukins-6 (IL-6) and -8, and procalcitonin (PCT) in
neutropenic patients with cancer [55–57].
• [55]: PCT and IL-6 are more reliable markers than CRP for
predicting bacteremia in patients with febrile neutropenia
• [56] In conclusion, daily measurement of PCT or IL-6
could help identify neutropenic patients with a stable
course when the fever lasts >3 d. …,
it would reduce adverse events and treatment costs.
• [57] Our study supports the value of PCT as a reliable tool to
predict clinical outcome in febrile neutropenia.
28. Claims and Evidence, example:
Drug Interaction Knowledgebase [12]
• Extracting adverse drug interactions (ADIs) from the literature
and creating a linked data node for each
• Goal: improve the speed and coverage of ADI extraction and give
patients and doctors better access
• Issue: how to identify evidence?
– Claim:
R-citalopram_is_not_substrate_of_cyp2c19:
– Evidence:
At 10uM R- or S-CT, ketoconazole reduced reaction velocity to 55-60%
of control, quinidine to 80%, and omeprazole to 80-85% of
control (Fig. 6)
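In linked-data terms, the claim and its evidence become nodes joined by typed links. The sketch below uses plain Python tuples; the predicate names (`type`, `hasEvidence`, `quote`) and the identifier `evidence-1` are hypothetical and are not taken from the Drug Interaction Knowledgebase schema.

```python
# Hypothetical identifiers and predicates, for illustration only.
CLAIM = "R-citalopram_is_not_substrate_of_cyp2c19"

triples = [
    (CLAIM, "type", "Claim"),
    (CLAIM, "hasEvidence", "evidence-1"),
    ("evidence-1", "type", "Evidence"),
    ("evidence-1", "quote",
     "At 10uM R- or S-CT, ketoconazole reduced reaction velocity to 55-60% "
     "of control, quinidine to 80%, and omeprazole to 80-85% of control (Fig. 6)"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def evidence_quotes(claim):
    """Follow claim -> evidence -> quote and collect the quoted passages."""
    return [quote
            for evidence in objects(claim, "hasEvidence")
            for quote in objects(evidence, "quote")]


for quote in evidence_quotes(CLAIM):
    print(CLAIM, "<-", quote)
```

Storing the link explicitly means the question "how to identify evidence?" only has to be answered once, at extraction time; afterwards any application can walk from the claim to the quoted passage.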
29. Data, e.g. Web Science 2.0:
Mark Wilkinson (SADI, Madrid)
Using what is known about interactions in fly & yeast:
predict new interactions with a human protein
30. Wilkinson: doing science ON the web:
These are different
Web services!
...selected at run-time based
on the same model
31. Data
• All this evidence is based on data
• Increasingly: science is distributed between
– Groups creating data
– Groups using data – creating tools
– Groups using tools on data – ideas
• All of these groups need to communicate!
32. In summary:
1. How do scientists read?
2. How do computers read?
3. What should we do?
33. How we read vs. computers:
Level:                People read:               Computers read:
Noun phrases          Know topic                 Pretty well
Triples               Know topic                 Pretty well
Metadiscourse         Trust method               Not very well
Claims and evidence   Understand and trust       Not very well
Data                  Trust - and new science!   Can enable!
34. Is this the future of publishing? [17]
1. Research: Each item in the system has metadata
(including provenance) and relations to other data items
added to it.
2. Workflow: All data items created in the lab are added to a
(lab-owned) workflow system.
3. Authoring: A paper is written in an authoring tool which can
pull data with provenance from the workflow tool in the
appropriate representation into the document.
4. Editing and review: Once the co-authors agree, the paper is
‘exposed’ to the editors, who in turn expose it to reviewers.
Reports are stored in the authoring/editing system, the paper
gets updated, until it is validated.
5. Publishing and distribution: When a paper is
published, a collection of validated information is exposed to
the world. It remains connected to its related data item, and
its heritage can be traced.
6. User applications: distributed applications run on this
‘exposed data’ universe.
[Figure: workflow diagram with metadata attached at each stage; stages labeled Review, Revise, Edit; two boxes labeled "Publisher runs service (‘app’)"; sample paper text: "Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-"]
35. What should we do?
• Experiment! All over the place. Scientists get it!
• Support scientists working on these (e.g. text
miners, web science evangelists, data repositories,
etc.) – great return for your investment!
• Join forums where interactions happen between
scientists, publishers, libraries, etc. e.g. Force11.org:
– Collective, sponsored by Sloan, aimed at
enabling/supporting this discussion
– Planning workshop,
innovative projects for 2013
– Please join us at
http://force11.org!
36. Thank you!
Anita de Waard
a.dewaard@elsevier.com
http://elsatglabs.com/labs/anita/
37. References
[1] J Am Med Inform Assoc. 2010 September; 17(5): 514–518 http://dx.doi.org/10.1136/jamia.2010.003947
[2] Quanzhi Li, Yi-Fang Brook Wu (2006): Identifying important concepts from medical documents, Journal of Biomedical Informatics 39 (2006)
668–679
[3] Useful list of resources in bioinformatics http://www.bioinformatics.ca/
[4] Biological Expression Language – http://www.openbel.org
[5] Latour, B. and Woolgar, S., Laboratory Life: the Social Construction of Scientific Facts, 1979, Sage Publications
[6] Light M, Qiu XY, Srinivasan P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking
Biological Literature, Ontologies and Databases 2004:17-24.
[7] Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC
Bioinformatics 2006, 7:356.
[8] Thompson P., Venturi G., McNaught J, Montemagni S, Ananiadou S. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Wkshp
Building and Evaluating Resources for Biomedical Text Mining 2008.
[9] Kim, S-M. Hovy, E.H. (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
[10] de Waard, A. and Pander Maat, H. (2012). Epistemic Modality and Knowledge Attribution in Scientific Discourse: A Taxonomy of Types and
Overview of Features. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 47–55, Jeju, Republic of
Korea, 12 July 2012.
[11] Data2Semantics project: http://www.data2semantics.org/
[12] Boyce R, Collins C, Horn J, Kalet I. (2009) Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward
confidence assignment. J Biomed Inform. 2009 Dec;42(6):979-89. Epub 2009 May 10, see also
http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
[13] Sándor, Àgnes and de Waard, Anita, (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles, Workshop on Detecting
Structure in Scholarly Discourse, ACL 2012.
[14] Blake, C. (2010) Beyond genes, proteins, and abstracts: Identifying scientific claims from full-text biomedical articles, Journal of Biomedical
Informatics, 43(2):173-189
[15] See e.g. http://ucsdbiolit.codeplex.com/ and http://research.microsoft.com/en-us/projects/ontology/ for MS Word ontology add-ins
[16] de Waard, A. and Schneider, J. (2012) Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA), Semantic
Technologies Applied to Biomedical Informatics and Individualized Medicine workshop, ISWC 2012
[17] de Waard, A. (2010). The Future of the Journal? Integrating research data with scientific discourse, LOGOS: The Journal of the World Book
Community, Volume 21, Numbers 1-2, 2010, pp. 7-11(5), also published in Nature
Precedings, http://precedings.nature.com/documents/4742/version/1