How Scientists Read, And Whether Computers Can Help Them

Talk given at the COBRE workshop August 23-25 2012, Bozeman, MT http://www.chemistry.montana.edu/cobre/workshop/Program.html

Transcript

  • 1. How Scientists Read, And Whether Computers Can Help Them
    Anita de Waard, Disruptive Technologies Director, Elsevier Labs
    Making Sense of Biological Systems, Bozeman, MT
  • 2. Outline
      • Why do scientists read?
      • How do we read? (Discourse comprehension 101)
      • What do we need to read:
          – Noun phrases
          – Triples
          – Metadiscourse
          – Claims and evidence
      • Can the computer identify these components?
      • Some thoughts on explaining our texts to computers
  • 3. How and why scientists read:
      • Why do we read?
        To learn, i.e.: obtain the knowledge contained within the text and integrate it with what we already know.
      • What do we read?
        Things that are 'interesting':
          – Pertinent
          – Possibly/probably true
          – Novel, but in agreement with what I know
      • How do we read?
  • 4. Discourse Comprehension 101
      • Letter < syllable < word < clause < sentence < discourse:
        This is how linguistics is structured.
        But it is not how we understand text!
  • 10. Discourse Comprehension 101
      • Letter < syllable < word < clause < sentence < discourse:
        This is how linguistics is structured.
        But it is not how we understand text!
      • Kintsch and Van Dijk, '93: we read a text at three levels:
          – surface code: literal text, exact words/syntax
          – text base: preserves meaning, but not exact wording
          – situation model: the 'microworld' that the text is about, constructed inferentially through interaction between the text and background knowledge
      • We use knowledge about text genre to activate a schema: this allows creation of the text base and situation model
  • 11. Examples of schemas:
  • 12. What is this paper about?
  • 13. What is this paper about?
    A. NOUN PHRASES
      transiently expressed miRNA sponges
      human breast cancer
      high-grade malignancy
      miR-31
      noninvasive MCF7-Ras
      antisense oligonucleotides
      cell viability
      cloned
      retroviral vector
    Is it pertinent? -> Possibly…
    Is it true? -> ?
    Is it new, but in agreement with what I know? -> ?
  • 14. What is this paper about?
    B. TRIPLES
      miR-31 expression DEPRIVE metastatic cells
      miR-31 PREVENT acquisition of aggressive traits
      miR-31 INHIBIT noninvasive MCF7-Ras cells
      miR-31 ENHANCE invasion
      cell viability AFFECT inhibitor
    Is it pertinent? -> Possibly…
    Is it true? -> ?
    Is it new, but in agreement with what I know? -> ?
  • 15. What is this paper about?
    C. METADISCOURSE
    The preceding observations demonstrated that X expression deprives Y cells of attributes associated with Z. We next asked whether X also prevents the acquisition of A traits by B cells. To do so, we transiently inhibited X in C cells with either D or E. Both approaches inhibited X function by > 4.5-fold (Figure S7A). Suppression of X enhanced invasion by 20-fold and motility by 5-fold, but F was unaffected by either inhibitor (Figure 3A; Figure S7B). The E sponge reduced X function by 2.5-fold, but did not affect the activity of other known Js (Figures S8A and S8B). Collectively, these data indicated that sustained X activity is necessary to prevent the acquisition of Z traits by both K and untransformed B cells.
    Is it pertinent? -> Need content
    Is it true? -> Sounds likely! I know this stuff!
    Is it new, but in agreement with what I know? -> Need content
  • 16. What is this paper about?
    D. CLAIMS AND EVIDENCE
    Claim:
      • Sustained miR-31 activity is necessary to prevent the acquisition of aggressive traits by both tumor cells and untransformed breast epithelial cells.
    Evidence: Method:
      • We transiently inhibited miR-31 in noninvasive MCF7-Ras cells with either antisense oligonucleotides or miRNA sponges.
    Evidence: Result:
      • Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A).
      • Suppression of miR-31 enhanced invasion by 20-fold and motility by 5-fold, but cell viability was unaffected by either inhibitor (Figure 3A; Figure S7B).
      • The miR-31 sponge reduced miR-31 function by 2.5-fold, but did not affect the activity of other known antimetastatic miRNAs (Figures S8A and S8B).
    Is it pertinent? -> Probably
    Is it true? -> Sounds likely!
    Is it new, but in agreement with what I know? -> Check/know
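To make the claim/evidence view concrete, here is a minimal sketch of a claim object that carries its evidence items, each labelled "Method" or "Result" as on the slide. The `Claim` and `Evidence` classes are hypothetical names for illustration, not part of any existing tool; the texts are abridged from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    kind: str  # "Method" or "Result", the two evidence labels used on the slide
    text: str


@dataclass
class Claim:
    text: str
    evidence: list[Evidence] = field(default_factory=list)

    def by_kind(self, kind: str) -> list[Evidence]:
        """Return only the evidence items with the given label."""
        return [e for e in self.evidence if e.kind == kind]


claim = Claim(
    "Sustained miR-31 activity is necessary to prevent the acquisition of aggressive "
    "traits by both tumor cells and untransformed breast epithelial cells.",
    [
        Evidence("Method", "Transiently inhibited miR-31 in noninvasive MCF7-Ras cells "
                           "with antisense oligonucleotides or miRNA sponges."),
        Evidence("Result", "Both approaches inhibited miR-31 function by >4.5-fold (Figure S7A)."),
        Evidence("Result", "Suppression of miR-31 enhanced invasion by 20-fold and motility "
                           "by 5-fold (Figure 3A; Figure S7B)."),
        Evidence("Result", "The miR-31 sponge reduced miR-31 function by 2.5-fold "
                           "(Figures S8A and S8B)."),
    ],
)

# List the results that back up the claim.
for item in claim.by_kind("Result"):
    print(item.text)
```

Keeping the evidence attached to the claim means a reader (or a program) can ask "what supports this?" without re-reading the whole paragraph.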
  • 17. What is this paper about?
    E. JOURNAL & AUTHOR'S NAMES/AFFILIATIONS
    Is it pertinent? -> Possibly
    Is it true? -> Probably!
    Is it new, but in agreement with what I know? -> Need background
  • 18. In summary, how scientists read:
      • Surface code provides noun phrases and triples that offer pointers regarding topical relevance
      • Text base and situation model are created through specific metadiscourse conventions (e.g. refs at the end) that create a biological reasoning model:
          We next asked whether … -> Hypothesis
          To do so, we transiently inhibited … -> Goal/Method
          Suppression of X enhanced invasion … -> Result
          but F was unaffected … (Figure 3A). … -> Results
          Collectively, these data indicated that … . -> Implication
      • This can be expressed as a set of claims, linked to evidence, that can help represent the key points in the paper
      • Journal name and author's affiliation help define the schema and provide 'willingness to be convinced' socially/interpersonally.
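The mapping from metadiscourse cues to reasoning roles can be sketched as a simple cue-phrase tagger. This is a minimal illustration, assuming a hand-written cue table taken only from the example sentences above, plus the extra assumption that a sentence citing a figure reports a result; real systems use far richer features. The function and table names are hypothetical.

```python
import re

# Hypothetical cue table, built only from the example sentences on this slide.
CUES = [
    ("we next asked whether", "Hypothesis"),
    ("to do so, we", "Goal/Method"),
    ("collectively, these data indicated that", "Implication"),
]
# Sentences that point at a figure are treated as reporting results (an assumption).
FIGURE_REF = re.compile(r"\(fig(ure)?s?\b", re.IGNORECASE)


def tag_segment(sentence: str) -> str:
    """Assign a reasoning role to one sentence using opening cue phrases."""
    lowered = sentence.strip().lower()
    for cue, role in CUES:
        if lowered.startswith(cue):
            return role
    if FIGURE_REF.search(sentence):
        return "Result"
    return "Other"


paragraph = [
    "We next asked whether X also prevents the acquisition of A traits by B cells.",
    "To do so, we transiently inhibited X in C cells with either D or E.",
    "Suppression of X enhanced invasion by 20-fold, but F was unaffected (Figure 3A).",
    "Collectively, these data indicated that sustained X activity is necessary.",
]
for s in paragraph:
    print(f"{tag_segment(s):12} {s}")
```

Note that the tagger never looks at X, Y or Z: like the anonymised paragraph on slide 15, it works purely from the metadiscourse.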
  • 19. Can computers help us identify:
    A. Noun phrases
    B. Triples
    C. Metadiscourse elements
    D. Claims + evidence
    E. Journal and author's names and affiliation
  • 21. Noun Phrases: some issues
      • Problem 1: disambiguating terms (© GoPubMed):
          – Hnrpa1 = Tis = Fli-2 = nuclear ribonucleoprotein A1 = helix destabilizing protein = single-strand binding protein = hnRNP core protein A1 = HDP-1 = topoisomerase-inhibitor suppressed
          – Cellulose 1,4-beta-cellobiosidase = exoglucanase
          – COLD ≠ C.O.L.D. ≠ cold (runny nose) ≠ cold (low T)
      • Problem 2: disambiguating entities (© M. Martone):
          – 95 antibodies were (manually!) identified in 8 articles
          – 52 did not contain enough information to determine the antibody used
          – Some provided details in other papers
          – Failed to give species, clonality, vendor, or catalog number
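Problem 1 is often attacked first with a synonym dictionary that maps every surface form to one canonical name. The sketch below assumes a tiny hand-built table copied from the examples above; the table and `normalize` are hypothetical, not the GoPubMed implementation.

```python
# Hypothetical synonym table, copied from the GoPubMed examples on this slide.
SYNONYMS = {
    "Hnrpa1": ["Tis", "Fli-2", "nuclear ribonucleoprotein A1", "helix destabilizing protein",
               "single-strand binding protein", "hnRNP core protein A1", "HDP-1",
               "topoisomerase-inhibitor suppressed"],
    "exoglucanase": ["Cellulose 1,4-beta-cellobiosidase"],
}

# Invert to surface form -> canonical name. Matching is deliberately case-sensitive:
# folding case would merge COLD (an acronym) with the everyday word "cold".
LOOKUP = {name: canonical
          for canonical, variants in SYNONYMS.items()
          for name in [canonical, *variants]}


def normalize(term):
    """Return the canonical name for a known surface form, or None if it is unknown."""
    return LOOKUP.get(term.strip())


print(normalize("HDP-1"))  # Hnrpa1
```

A lookup table only solves the synonym half of the problem: a form such as "cold" that has several senses still needs context to resolve, and no table helps with Problem 2 when the article never states which antibody was used.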
  • 22. Noun Phrases: some progress
      • Despite these difficulties, noun phrase recall/precision is quite high, e.g. I2B2 2011 [1], [2], others: 90%-98%
      • Many tools, see [3] for a list; e.g. GoPubMed:
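For readers less familiar with the two scores quoted above: precision is the share of predicted noun phrases that are correct, recall the share of true noun phrases that were found. A minimal sketch, assuming exact-match scoring over character spans (shared-task evaluations may also report partial-match scores); the spans are invented toy values, not data from [1] or [2].

```python
def precision_recall(predicted, gold):
    """Exact-match precision and recall over (start, end) character spans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy example: 4 gold noun phrases, 3 predicted, 2 of them exactly right.
gold = {(0, 24), (30, 50), (55, 62), (70, 96)}
predicted = {(0, 24), (30, 50), (63, 69)}
p, r = precision_recall(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```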
  • 23. Triples: some issues:
      • Contingent on good NP & VP detection
      • Hard to parse text! E.g. a commercial tool gave:
          – insulin | maintaining | glucose homeostasis
            "When insulin secretion cannot be increased adequately (type I diabetes defect) to overcome insulin resistance in maintaining glucose homeostasis, hyperglycemia and glucose intolerance ensues."
          – insulin | may be involved | glucose homeostasis
            "Because PANDER is expressed by pancreatic beta-cells and in response to glucose in a similar way to those of insulin, PANDER may be involved in glucose homeostasis."
  • 24. Triples: some progress: Biological Expression Language [4]:
    "We provide evidence that these miRNAs are potential novel oncogenes participating in the development of human testicular germ cell tumors by numbing the p53 pathway, thus allowing tumorigenic growth in the presence of wild-type p53."
      – Increased abundance of miR-372 decreases activity of TP53:
        r(MIR:miR-372) -| tscript(p(HUGO:Trp53))
      – Context: cancer
        SET Disease = "Cancer"
      – Activity of TP53 decreases cell growth:
        tscript(p(HUGO:Trp53)) -| bp(GO:"Cell Growth")
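Each BEL statement above is itself a triple: a subject term, a relationship, and an object term. The sketch below splits simple statements into that form. It assumes only four short-form relationship symbols (to our understanding, `->` increases, `-|` decreases, `=>` directly increases, `=|` directly decreases); check them against the OpenBEL documentation before relying on this. Nested statements and `SET` annotations are deliberately not handled, and `parse_bel` is a hypothetical helper, not part of any BEL toolkit.

```python
import re

# Assumed subset of BEL short-form relationship symbols.
RELATIONS = {
    "->": "increases",
    "-|": "decreases",
    "=>": "directlyIncreases",
    "=|": "directlyDecreases",
}

# Subject, whitespace, relationship symbol, whitespace, object.
STATEMENT = re.compile(r"^(?P<subject>.+?)\s+(?P<rel>->|-\||=>|=\|)\s+(?P<object>.+)$")


def parse_bel(statement):
    """Split a simple BEL statement into a (subject, relationship, object) triple."""
    match = STATEMENT.match(statement.strip())
    if match is None:
        raise ValueError(f"not a simple subject-relation-object statement: {statement!r}")
    return match["subject"], RELATIONS[match["rel"]], match["object"]


print(parse_bel("r(MIR:miR-372) -| tscript(p(HUGO:Trp53))"))
# ('r(MIR:miR-372)', 'decreases', 'tscript(p(HUGO:Trp53))')
```

The contrast with slide 23 is the point: because the author (or curator) states subject and relationship explicitly, no parser has to guess whether "insulin" or "PANDER" is the subject.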
  • 25. Metadiscourse: why it matters
    "[Y]ou can transform … fiction into fact just by adding or subtracting references", Bruno Latour [5]
      • Voorhoeve et al., 2006: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.
      • Kloosterman and Plasterk, 2006: In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).
      • Yabuta et al., 2007: [On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).
      • Okada et al., 2011: Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).
  • 26. Metadiscourse: some progress
      • Hedging cues, speculative language, modality/negation:
          – Light et al. [6]: finding speculative language
          – Wilbur et al. (Hagit) [7]: focus, polarity, certainty, evidence, and directionality
          – Thompson et al. (Sophia) [8]: level of speculation, type/source of the evidence and level of certainty
      • Sentiment detection (e.g. Kim and Hovy [9], among others):
          – Holder of the opinion, strength, polarity as a 'mathematical function' acting on the main propositional content
      • Can make this part of the semantic web (e.g., Ontology for Reasoning, Certainty and Attribution, ORCA [10]):
          – Value (Presumed True, Probable, Possible, Unknown)
          – Source (Author, Named Other, Unknown)
          – Basis (Data, Reasoning, Unknown)
  • 27. Claims and Evidence: some issues:
      • Data2Semantics [11]: linking clinical guidelines to evidence. Inconsistency within a guideline, and between guidelines and evidence:
          – "Studies have demonstrated inconsistent results regarding the use of such markers of inflammation as C-reactive protein (CRP), interleukins-6 (IL-6) and -8, and procalcitonin (PCT) in neutropenic patients with cancer [55–57]."
          – [55]: PCT and IL-6 are more reliable markers than CRP for predicting bacteremia in patients with febrile neutropenia
          – [56]: In conclusion, daily measurement of PCT or IL-6 could help identify neutropenic patients with a stable course when the fever lasts >3 d. …, it would reduce adverse events and treatment costs.
          – [57]: Our study supports the value of PCT as a reliable tool to predict clinical outcome in febrile neutropenia.
      • Drug Interaction Knowledgebase [12]: how to identify evidence?
          – R-citalopram_is_not_substrate_of_cyp2c19:
            "At 10 µM R- or S-CT, ketoconazole reduced reaction velocity to 55-60% of control, quinidine to 80%, and omeprazole to 80-85% of control (Fig. 6)."
  • 28. Claims and Evidence: some progress
      • Defining 'salient knowledge components' in text:
          – Argumentative zones, CoreSC can both be found
          – Blake, Claim networks (more soon!)
          – Claimed Knowledge Updates (Sandor/de Waard [13]):
  • 29. Perhaps we should start writing for computers?
      • So why doesn't the author add this information? If you know you're going to mine it, why bury it?
      • Authoring tools for entity identification: MS for Chemistry, Math, proteins; some experiments but no solution yet [14]
      • Authoring tool for triple identification (MS ActiveText)
      • But the question remains: after we've 'extracted' all the 'facts', what is all the gunk that remains in the filter?
  • 30. Perhaps we should explain: a paper is rhetorical?
    Aristotle | Quintilian | Description | Scientific paper
    prooimion | exordium | The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Introduction: positioning
    prothesis | Statement of Facts / narratio | The speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Introduction: research question
    - | Summary / propositio | The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Summary of contents
    pistis | Proof / confirmatio | The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here. | Results
    - | Refutation / refutatio | As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Related Work
    epilogos | peroratio | Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Discussion: summary, implications
      – The goal of the paper is to be published; it uses author/journal as a host
      – The format has co-evolved: a predator-prey relationship with reviewers
  • 31. Perhaps we should explain: a paper is a story?
    Story grammar | The Story of Goldilocks and the Three Bears | Paper grammar | The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins
    Setting: Time | Once upon a time | Background | The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
    Setting: Character | a little girl named Goldilocks | Objects of study | the Drosophila Atx-1 homolog (dAtx-1), which lacks a polyQ tract
    Setting: Location | She went for a walk in the forest. Pretty soon, she came upon a house. | Experimental setup | studied and compared in vivo effects and interactions to those of the human protein
    Theme: Goal | She knocked and, when no one answered, | Research goal | Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
    Theme: Attempt | she walked right in. | Hypothesis | Atx-1 may play a role in the regulation of gene expression
    Episode: Name | At the table in the kitchen, there were three bowls of porridge. | Name | dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
    Subgoal | Goldilocks was hungry. | Subgoal | test the function of the AXH domain
    Attempt | She tasted the porridge from the first bowl. | Method | overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
    Outcome | "This porridge is too hot!" she exclaimed. | Results | Overexpression of dAtx-1 by Rhodopsin1 (Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q].
    Attempt | So, she tasted the porridge from the second bowl. | | Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
    Outcome | "This porridge is too cold," she said. | Data | (data not shown),
    Attempt | So, she tasted the last bowl of porridge. | Results | both genotypes show many large holes and loss of cell integrity at 28 days
    Outcome | "Ahhh, this porridge is just right," she | | (Figures 1B-1D).
  • 32. A closer look at verb tense:
      • Conceptual realm: 'state' (gnomic) present
          – 'Dopaminergic innervation plays a major role in the control of mood and its perturbation'
      • Experimental realm: 'event' past
          – 'Four out of seven cell lines expressed this cluster'
          – 'Adult rats were individually housed for 2 days before testing.'
      • Argumentational realm: 'instantaneous' present; to-infinitive
          – 'These results suggest that...'
          – 'To identify these mechanisms…'
      • Discourse progression: 'instantaneous' present
          – 'Fig 2a shows that'
          – 'see figure 7A'
      • Reference to other work: present perfect, 'finalised' past
          – 'Previous work has demonstrated that VPCs are sensitive to the levels of let-60/RAS (Han and Sternberg, 1990).'
  • 33. Tense use in science and mythology:
    Tense use | Science | Mythology
    Facts in the eternal present | Endogenous small RNAs (miRNAs) regulate gene expression by mechanisms conserved across metazoans. | I sing of golden-throned Hera whom Rhea bare. Queen of the immortals is she, surpassing all in beauty: she is the sister and the wife of loud-thundering Zeus, the glorious one whom all the blessed throughout high Olympus reverence and honor.
    Events in the simple past | Vehicle-treated animals spent equivalent time investigating a juvenile in the first and second sessions in experiments conducted in the NAC and the striatum: T1 values were 122 ± 6 s and 114 ± 5 s. | Now the wooers turned to the dance and to gladsome song, and made them merry, and waited till evening should come; and as they made merry dark evening came upon them.
    Events with embedded facts | We also generated BJ/ET cells expressing the RASV12-ERTAM chimera gene, which is only active when tamoxifen is added (De Vita et al, 2005). | And she took her mighty spear, tipped with sharp bronze, heavy and huge and strong, wherewith she vanquishes the ranks of men, of warriors, with whom she is wroth, she, the daughter of the mighty sire.
    Attribution in the present perfect | miRNAs have emerged as important regulators of development and control processes such as cell fate determination and cell death (Abrahante et al., 2003, Brennecke et al., 2003, Chang et al., 2004, Chen et al., 2004, Johnston and Hobert, 2003, Lee et al., 1993). | In this book I have had old stories written down, as I have heard them told by intelligent people, concerning chiefs who have held dominion in the northern countries, and who spoke the Danish tongue; and also concerning some of their family branches, according to what has been told me.
    Implications are hedged, and in the present tense | These results indicate that although miR-372&3 confer complete protection to oncogene-induced senescence in a manner similar to p53 inactivation, the cellular response to DNA damage remains intact | Now it is said that ever since then whenever the camel sees a place where ashes have been scattered, he wants to get revenge with his enemy the rat and stomps and rolls in the ashes hoping to get the rat
  • 34. Some conclusions:
      • How we read: surface code, text base, situation model
      • Useful components: find noun phrases, triples, metadiscourse, claims and evidence
      • Computers keep getting better at identifying these
      • Authoring tools might let us help computers
      • But for the foreseeable future, scientists will continue to need to scan the literature to understand and believe science and make connections between knowledge
      • To achieve progress, perhaps focus less on what computers can do and more on how humans communicate?
      • Let's pursue collaborations with linguists, cognitive psychologists etc. on how we read and learn!
  • 35. Acknowledgements
      • Funding:
          – Elsevier Labs
          – NWO
      • Discussion partners:
          – Phil Bourne, UCSD
          – Ed Hovy
      • Collaborators:
          – Gully Burns, ISI
          – Henk Pander Maat, UU
          – Joanne Luciano, RPI
          – Agnes Sandor, XRCE
          – Tim Clark et al., Harvard
          – Jodi Schneider, DERI
          – Rinke Hoekstra & co, VU
          – Richard Boyce & co, UPitt
          – Maria Liakata, EBI
          – Sophia Ananiadou & co, NaCTeM
      … and all of you!
  • 36. Questions?
    Anita de Waard
    a.dewaard@elsevier.com
    http://elsatglabs.com/labs/anita/
  • 37. References
    [1] J Am Med Inform Assoc. 2010 September; 17(5): 514–518. http://dx.doi.org/10.1136/jamia.2010.003947
    [2] Quanzhi Li, Yi-Fang Brook Wu (2006). Identifying important concepts from medical documents. Journal of Biomedical Informatics 39 (2006) 668–679.
    [3] Useful list of resources in bioinformatics: http://www.bioinformatics.ca/
    [4] Biological Expression Language: http://www.openbel.org
    [5] Latour, B. and Woolgar, S. (1979). Laboratory Life: the Social Construction of Scientific Facts. Sage Publications.
    [6] Light M, Qiu XY, Srinivasan P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases 2004:17-24.
    [7] Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.
    [8] Thompson P., Venturi G., McNaught J, Montemagni S, Ananiadou S. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Wkshp Building and Evaluating Resources for Biomedical Text Mining 2008.
    [9] Kim, S-M., Hovy, E.H. (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
    [10] de Waard, A. and Schneider, J. (2012). Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA). Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine workshop at ISWC 2012 (submitted).
    [11] Data2Semantics project: http://www.data2semantics.org/
    [12] Boyce R, Collins C, Horn J, Kalet I. (2009). Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. J Biomed Inform. 2009 Dec;42(6):979-89. Epub 2009 May 10; see also http://dbmi-icode-01.dbmi.pitt.edu/dikb-evidence/front-page.html
    [13] Sándor, Ágnes and de Waard, Anita (2012). Identifying Claimed Knowledge Updates in Biomedical Research Articles. Workshop on Detecting Structure in Scholarly Discourse, ACL 2012.
    [14] See e.g. http://ucsdbiolit.codeplex.com/ and http://research.microsoft.com/en-us/projects/ontology/ for MS Word ontology add-ins.
  • 38. Appendix: ORCA
  • 39. Logical structure of epistemic evaluations:
    For a proposition P, an epistemically marked clause E is an evaluation of P, where $E = E_{V,B,S}(P)$, with:
      – V = Value: 3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown (-1 = possibly untrue, -2 = probably untrue, -3 = assumed untrue)
      – B = Basis: Reasoning, Data
      – S = Source: A = speaker is author A, explicit; IA = speaker is author A, implicit; N = other author N, explicit; NN = other author NN, implicit
    Model suggested by Eduard Hovy, Information Sciences Institute, University of Southern California
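The triple $(V, B, S)$ attached to a proposition maps naturally onto a small typed record. This is a minimal sketch, not the ORCA ontology itself (which is an OWL vocabulary); the class names, the `Unknown` basis (taken from the ORCA summary on slide 26) and the validation rule are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    REASONING = "Reasoning"
    DATA = "Data"
    UNKNOWN = "Unknown"  # listed as a basis in the ORCA summary on slide 26


class Source(Enum):
    A = "speaker is author, explicit"
    IA = "speaker is author, implicit"
    N = "other author, explicit"
    NN = "other author, implicit"


VALUE_LABELS = {
    3: "Assumed true", 2: "Probable", 1: "Possible", 0: "Unknown",
    -1: "Possibly untrue", -2: "Probably untrue", -3: "Assumed untrue",
}


@dataclass(frozen=True)
class Evaluation:
    """An epistemic evaluation E_{V,B,S}(P) of a proposition P."""
    proposition: str
    value: int
    basis: Basis
    source: Source

    def __post_init__(self):
        if self.value not in VALUE_LABELS:
            raise ValueError(f"value must be an integer in -3..3, got {self.value}")

    @property
    def label(self):
        return VALUE_LABELS[self.value]


# Second example claim from slide 40: Value = 3, Source = N, Basis = Data.
e = Evaluation("Only hMOB1A and hMOB1B interact with both LATS1 and LATS2.",
               3, Basis.DATA, Source.N)
print(e.label, e.basis.value, e.source.name)  # Assumed true Data N
```

Keeping the proposition separate from its evaluation is the design point: the same proposition can carry different values in different papers, as the LATS2 citation chain on slide 25 shows.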
  • 40. Adding Epistemic Evaluation
    Claim | ORCA value
    Together, Lats2 and ASPP1 shunt p53 to proapoptotic promoters and promote the death of polyploid cells [1]. (…) | Value = 3; Source = N; Basis = 0
    Further biochemical characterization of hMOBs showed that only hMOB1A and hMOB1B interact with both LATS1 and LATS2 in vitro and in vivo [39]. (…) | Value = 3; Source = N; Basis = Data
    Our findings reveal that miR-373 would be a potential oncogene and it participates in the carcinogenesis of human esophageal cancer by suppressing LATS2 expression. | Value = 1 or 2?; Source = Author; Basis = Data
    Furthermore, we demonstrated that the direct inhibition of LATS2 protein was mediated by miR-373 and manipulated the expression of miR-373 to affect esophageal cancer cells growth. | Value = 2 (or 3?); Source = Author; Basis = Data
  • 41. Textual Markers
      • Modal auxiliary verbs (e.g. can, could, might)
      • Qualifying adverbs and adjectives (e.g. interestingly, possibly, likely, potential, somewhat, slightly, powerful, unknown, undefined)
      • References, either external (e.g. '[Voorhoeve et al., 2006]') or internal (e.g. 'See fig. 2a')
      • Reporting/epistemic verbs (e.g. suggest, imply, indicate, show)
          – either within the clause: 'These results suggest that...'
          – or in a subordinate clause governed by a reporting-verb matrix clause: '{These results suggest that} indeed, this represents the true endogenous activity.'
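The four marker types above lend themselves to a first-pass detector built from regular expressions. A minimal sketch, assuming short hand-picked cue lists drawn from the examples on this slide (plus "may", which slide 43 lists as a modal); it is not exhaustive and does not handle the matrix-clause case, where the marker sits outside the clause it governs.

```python
import re

# Hypothetical cue lists, drawn from the examples on this slide (not exhaustive).
MARKER_PATTERNS = {
    "modal_aux": r"\b(can|could|might|may)\b",
    "qualifier": r"\b(interestingly|possibly|likely|potential|somewhat|slightly|unknown|undefined)\b",
    "reference": r"\[[^\]]*\d{4}\]|\bsee fig\.?\s*\w+",
    "reporting_verb": r"\b(suggests?|impl(?:y|ies)|indicates?|shows?)\b",
}
MARKERS = {name: re.compile(p, re.IGNORECASE) for name, p in MARKER_PATTERNS.items()}


def find_markers(clause):
    """Return the sorted names of all marker types that occur in a clause."""
    return sorted(name for name, rx in MARKERS.items() if rx.search(clause))


print(find_markers("miR-373 might possibly act as an oncogene [Voorhoeve et al., 2006]."))
# ['modal_aux', 'qualifier', 'reference']
```

Finding a marker is only a signal, not a value: the same marker type occurs with several epistemic values, so a classifier still has to weigh which marker it is and where it occurs.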
  • 42. Markers v. Types: 1 paper, 640 segments
    Value | Modal Aux | Reporting Verb | Ruled by RV | Adverbs/Adjectives | References | None | Total
    3 | 1 (0.5%) | 81 (40%) | 24 (12%) | 7 (4%) | 41 (20%) | 47 (24%) | 201 (100%)
    2 | - | 29 (51%) | 23 (40%) | 1 (2%) | 4 (7%) | - | 57 (100%)
    1 | 9 (27%) | 11 (33%) | 11 (33%) | 1 (3%) | 1 (3%) | - | 33 (100%)
    0 | - | 9 (64%) | 3 (21%) | 1 (7%) | 1 (7%) | - | 14 (100%)
    No modality | - | 16 (37%) | 3 (7%) | 0 | 3 (7%) | 22 (50%) | 44 (100%)
    Overall total | 10 (2%) | 146 (23%) | 64 (10%) | 10 (2%) | 50 (8%) | 69 (11%) | 640 (100%)
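Several rows of this table have fewer entries than columns, so the empty cells have to be placed somehow. The sketch below records the layout used here (empty cells as 0) and checks that every row adds up to its row total and every column adds up to the "Overall total" row, which is the consistency condition that fixes where the blanks go. The arithmetic also shows that the six marker columns sum to 349, so the percentages in the last row are taken over all 640 segments. Only the slide's counts are used; the layout is our reading of the flattened original.

```python
# Counts from the "Markers v. Types" table; empty cells are recorded as 0.
COLUMNS = ["Modal Aux", "Reporting Verb", "Ruled by RV", "Adverbs/Adjectives", "References", "None"]
ROWS = {
    "Value = 3":   [1, 81, 24, 7, 41, 47],
    "Value = 2":   [0, 29, 23, 1, 4, 0],
    "Value = 1":   [9, 11, 11, 1, 1, 0],
    "Value = 0":   [0, 9, 3, 1, 1, 0],
    "No modality": [0, 16, 3, 0, 3, 22],
}
ROW_TOTALS = {"Value = 3": 201, "Value = 2": 57, "Value = 1": 33, "Value = 0": 14, "No modality": 44}
OVERALL = [10, 146, 64, 10, 50, 69]


def column_totals(rows):
    """Sum each marker column over all value rows."""
    return [sum(col) for col in zip(*rows.values())]


def row_shares(counts):
    """Percentage of each marker type within one row, rounded to whole percent."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


print(column_totals(ROWS) == OVERALL)   # True
print(sum(OVERALL))                     # 349
print(row_shares(ROWS["Value = 2"]))    # [0, 51, 40, 2, 7, 0]
```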
  • 43. Most prevalent clause type: "These results suggest that..."
    Adverb/Connective: thus, therefore, together, recently, in summary
    Determiner/Pronoun: it, this, these, we/our
    Adjective: previous, future, better
    Noun phrase: data, report, study, result(s); method or reference
    Modal: form of 'to be', may, remain
    Adjective: often, recently, generally
    Verb: show, obtain, consider, view, reveal, suggest, hypothesize, indicate, believe
    Preposition: that, to
  • 44. Reporting verbs vs. epistemic value:
      • Value = 0 (unknown): establish, (remain to be) elucidated, be (clear/useful), (remain to be) examined/determined, describe, make difficult to infer, report
      • Value = 1 (hypothetical): be important, consider, expect, hypothesize (5x), give insight, raise possibility that, suspect, think
      • Value = 2 (probable): appear, believe, implicate (2x), imply, indicate (12x), play a role, represent, suggest (18x), validate (2x)
      • Value = 3 (presumed true): be able/apparent/important/positive/visible, compare (2x), confirm (2x), define, demonstrate (15x), detect (5x), discover, display (3x), eliminate, find (3x), identify (4x), know, need, note (2x), observe (2x), obtain (success/results, 3x), prove to be, refer, report (2x), reveal (3x), see (2x), show (24x), study, view
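The lists above can be turned into a lookup from a reporting verb to the values it was observed with. This is a minimal sketch, assuming verbs have already been lemmatized and leaving out the multi-word cues (e.g. "raise possibility that", "prove to be"); the lexicon is a partial transcription of this slide, and `epistemic_values` is a hypothetical helper.

```python
from collections import defaultdict

# Partial transcription of the slide (single-word base forms only).
LEXICON = {
    0: ["establish", "elucidate", "examine", "determine", "describe", "report"],
    1: ["consider", "expect", "hypothesize", "suspect", "think"],
    2: ["appear", "believe", "implicate", "imply", "indicate", "represent", "suggest", "validate"],
    3: ["compare", "confirm", "define", "demonstrate", "detect", "discover", "display",
        "eliminate", "find", "identify", "know", "need", "note", "observe", "obtain",
        "refer", "report", "reveal", "see", "show", "study", "view"],
}

# Invert the lexicon: one verb may occur under more than one value.
VERB_TO_VALUES = defaultdict(set)
for value, verbs in LEXICON.items():
    for verb in verbs:
        VERB_TO_VALUES[verb].add(value)


def epistemic_values(lemma):
    """Return the sorted ORCA values a reporting verb was observed with (empty if unseen)."""
    return sorted(VERB_TO_VALUES.get(lemma, ()))


print(epistemic_values("suggest"))  # [2]
print(epistemic_values("report"))   # [0, 3]
```

"report" shows why a verb lexicon alone is not enough: it appears under both Value = 0 and Value = 3, so the surrounding clause (modality, negation, "remain to be") still has to decide.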