Scientific Sensemaking


Talk at Microsoft Research, Bellevue, WA, January 24th 2013; overview of past 5 years of my research.

Published in: Technology

  1. 1. Supporting Scientific Sensemaking
     Anita de Waard, VP Research Data Collaborations, Elsevier
     Visit Microsoft Research, January 23, 2013
  2. 2. Outline
     • A model of scientific sensemaking:
       – Stories, that persuade with data
       – Discourse segments and verb tense
     • Towards extracting claim-evidence networks:
       – Hedging in science
       – Creating claim-evidence networks
     • Data:
       – Why life is so complicated
       – Connecting biological experiments into collaboratories
  3. 3. A paper is a story…
     Story grammar, illustrated with The Story of Goldilocks and the Three Bears; paper grammar, illustrated with "The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins".
     Setting
     – [Story] Time: Once upon a time
       [Paper] Background: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
     – [Story] Character: a little girl named Goldilocks
       [Paper] Objects of study: the Drosophila Atx-1 homolog (dAtx-1) which lacks a polyQ tract,
     – [Story] Location: She went for a walk in the forest. Pretty soon, she came upon a house.
       [Paper] Experimental setup: studied and compared in vivo effects and interactions to those of the human protein
     Theme
     – [Story] Goal: She knocked and, when no one answered,
       [Paper] Research goal: Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
     – [Story] Attempt: she walked right in.
       [Paper] Hypothesis: Atx-1 may play a role in the regulation of gene expression
     Episode
     – [Story] Name: At the table in the kitchen, there were three bowls of porridge.
       [Paper] Name: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
     – [Story] Subgoal: Goldilocks was hungry.
       [Paper] Subgoal: test the function of the AXH domain
     – [Story] Attempt: She tasted the porridge from the first bowl.
       [Paper] Method: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
     – [Story] Outcome: This porridge is too hot! she exclaimed.
       [Paper] Results: Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q]. Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
     – [Story] Attempt: So, she tasted the porridge from the second bowl.
     – [Story] Outcome: This porridge is too cold, she said
       [Paper] Data: (data not shown),
     – [Story] Attempt: So, she tasted the last bowl of porridge.
       [Paper] Results: both genotypes show many large holes and loss of cell integrity at 28 days (Figures 1B-1D).
     – [Story] Outcome: Ahhh, this porridge is just right, she
  4. 4. …that persuades…
     Parts of a classical oration (Aristotle / Quintilian) and the matching part of a scientific paper:
     – Introduction / exordium (Aristotle: prooimion). Scientific paper: Introduction: positioning.
       The introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience.
     – Statement of Facts / narratio (Aristotle: prothesis). Scientific paper: Introduction: research question.
       The speaker here provides a narrative account of what has happened and generally explains the nature of the case.
     – Summary / propositio. Scientific paper: Summary of contents.
       The propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation.
     – Proof / confirmatio (Aristotle: pistis). Scientific paper: Results.
       The main body of the speech where one offers logical arguments as proof. The appeal to logos is emphasized here.
     – Refutation / refutatio. Scientific paper: Related Work.
       As the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent.
     – peroratio (Aristotle: epilogos). Scientific paper: Discussion: summary, implications.
       Following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up.
     Goal of the paper is to be published; it uses author/journal as a host.
     Format has co-evolved: predator-prey relationship with reviewers.
  5. 5. ... with data.
  6. 6. In defense of the clause as the unit of thought:
     1. Importantly, our results so far indicate that the expression of miR-372&3 did not reduce the activity of RASV12, as these cells were still growing faster than normal cells and were tumorigenic, for which RAS activity is indispensable (Hahn et al, 1999 and Kolfschoten et al, 2005).
     2. To shed more light on this aspect, we examined the effect of miR-372&3 expression on p53 activation in response to oncogenic stimulation.
     3. We used for this experiment BJ/ET cells containing p14ARFkd because, following RASV12 treatment, in those cells p53 is still activated but more clearly stabilized than in parental BJ/ET cells (Voorhoeve and Agami, 2003), resulting in a sensitized system for slight alterations in p53 in response to RASV12.
     4. Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.
     • More than one 'thought unit' per sentence.
     • Verb tense changes within sentence (several times).
     • Attribution, actions/states, and prepositions all contained within a sentence.
  7. 7. In defense of the clause as the unit of thought:
     (Same four sentences as on slide 6, with clauses marked by position.)
     – Head: premise, motivation, attribution (matrix clause)
     – Middle: main biological statement
     – End: interpretation, elaboration, attribution (reference)
  8. 8. In defense of the clause as the unit of thought:
     (Same four sentences as on slide 6, with clauses marked by type.)
     Clause types: Regulatory clause, Fact, Goal, Method, Result, Implication
  9. 9. Clause, realm and tense:
     Realms: conceptual knowledge; experimental evidence.
     – Fact: Both seminomas and the EC component of nonseminomas share features with ES cells.
     – Goal: To exclude that
     – Hypothesis: the detection of miR-371-3 merely reflects its expression pattern in ES cells,
     – Method: we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
     – Result: In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
     – Reg-Implication: suggesting that
     – Implication: miR-371-3 expression is a selective event during tumorigenesis.
  10. 10. Clause, realm and tense:
      Concepts, models, 'facts': present tense
      – Fact: (1) Both seminomas and the EC component of nonseminomas share features with ES cells.
      – Problem: (2) b. the detection of miR-371-3 merely reflects its expression pattern in ES cells,
      – Implication: (3) c. miR-371-3 expression is a selective event during tumorigenesis.
      Transitions: present tense
      – Goal: (2) a. To exclude that
      – Regulatory-Implication: (3) b. suggesting that
      Experiment: past tense
      – Method: (2) c. we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
      – Result: (3) a. In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
  11. 11. Tense use in science and mythology:
      – Facts in the eternal present
        Science: Endogenous small RNAs (miRNAs) regulate gene expression by mechanisms conserved across metazoans.
        Mythology: I sing of golden-throned Hera whom Rhea bare. Queen of the immortals is she, surpassing all in beauty: she is the sister and the wife of loud-thundering Zeus, the glorious one whom all the blessed throughout high Olympus reverence and honor.
      – Events in the simple past
        Science: Vehicle-treated animals spent equivalent time investigating a juvenile in the first and second sessions in experiments conducted in the NAC and the striatum: T1 values were 122 ± 6 s and 114 ± 5 s.
        Mythology: Now the wooers turned to the dance and to gladsome song, and made them merry, and waited till evening should come; and as they made merry dark evening came upon them.
      – Events with embedded facts
        Science: We also generated BJ/ET cells expressing the RASV12-ERTAM chimera gene, which is only active when tamoxifen is added (De Vita et al, 2005).
        Mythology: And she took her mighty spear, tipped with sharp bronze, heavy and huge and strong, wherewith she vanquishes the ranks of men, of warriors, with whom she is wroth, she, the daughter of the mighty sire.
      – Attribution in the present perfect
        Science: miRNAs have emerged as important regulators of development and control processes such as cell fate determination and cell death (Abrahante et al., 2003, Brennecke et al., 2003, Chang et al., 2004, Chen et al., 2004, Johnston and Hobert, 2003, Lee et al., 1993)
        Mythology: In this book I have had old stories written down, as I have heard them told by intelligent people, concerning chiefs who have held dominion in the northern countries, and who spoke the Danish tongue; and also concerning some of their family branches, according to what has been told me.
      – Implications are hedged, and in the present tense
        Science: These results indicate that although miR-372&3 confer complete protection to oncogene-induced senescence in a manner similar to p53 inactivation, the cellular response to DNA damage remains intact
        Mythology: Now it is said that ever since then whenever the camel sees a place where ashes have been scattered, he wants to get revenge with his enemy the rat and stomps and rolls in the ashes hoping to get the rat
  12. 12. From fiction to fact: Hedging
      "[Y]ou can transform .. fiction into fact just by adding or subtracting references", Bruno Latour [1]
      • Voorhoeve et al., 2006: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.
      • Kloosterman and Plasterk, 2006: In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).
      • Yabuta et al., 2007: [On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).
      • Okada et al., 2011: Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).
  13. 13. Hedging in science:
      • Why do authors hedge?
        – Make a claim 'pending […] acceptance in the community' [2]
        – 'Create A Research Space': hedging allows authors to insert themselves into the discourse in a community [3]
        – 'the strongest claim a careful researcher can make' [4]
      • Hedging cues, speculative language, modality/negation:
        – Light et al [5]: finding speculative language
        – Wilbur et al [6]: focus, polarity, certainty, evidence, and directionality
        – Thompson et al [7]: level of speculation, type/source of the evidence and level of certainty
      • Sentiment detection (e.g. Kim and Hovy [8], among others):
        – Holder of the opinion, strength, polarity as 'mathematical function' acting on main propositional content
        – Wide applications in product reviews; but not (yet) in science!
  14. 14. A model for epistemic evaluations:
      For a Proposition P, an epistemically marked clause E is an evaluation of P, where E = E_{V,B,S}(P), with:
      – V = Value: 3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown (-1 = possibly untrue, -2 = probably untrue, -3 = assumed untrue)
      – B = Basis: Reasoning; Data
      – S = Source: A = speaker is author A, explicit; IA = speaker is author A, implicit; N = other author N, explicit; NN = other author NN, implicit
      Model suggested by Eduard Hovy, Information Sciences Institute, University of Southern California
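The evaluation E_{V,B,S}(P) on this slide can be read as a small record type. The following is a minimal Python sketch under stated assumptions: the class and field names are hypothetical, the `UNKNOWN` basis is an addition for unmarked clauses, and the example annotation is illustrative rather than taken from the talk.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Value(IntEnum):
    """Epistemic value V on the slide's -3..3 scale."""
    ASSUMED_UNTRUE = -3
    PROBABLY_UNTRUE = -2
    POSSIBLY_UNTRUE = -1
    UNKNOWN = 0
    POSSIBLE = 1
    PROBABLE = 2
    ASSUMED_TRUE = 3


class Basis(Enum):
    """Basis B of the evaluation."""
    REASONING = "Reasoning"
    DATA = "Data"
    UNKNOWN = "Unknown"  # assumption: not in the slide's list, added for unmarked clauses


class Source(Enum):
    """Source S of the evaluation, using the slide's codes."""
    A = "author, explicit"
    IA = "author, implicit"
    N = "other author, explicit"
    NN = "other author, implicit"


@dataclass(frozen=True)
class EpistemicEvaluation:
    """One evaluation E_{V,B,S}(P) of a proposition P."""
    proposition: str
    value: Value
    basis: Basis
    source: Source

    def label(self) -> str:
        # Compact rendering of the three subscripts V, B and S.
        return f"E[V={int(self.value)}, B={self.basis.value}, S={self.source.name}]"


# Hypothetical annotation: "suggesting that" is read here as Probable,
# based on data, with the authors as implicit source.
evaluation = EpistemicEvaluation(
    proposition="miR-371-3 expression is a selective event during tumorigenesis",
    value=Value.PROBABLE,
    basis=Basis.DATA,
    source=Source.IA,
)
print(evaluation.label())
```

Because `Value` is an ordered integer scale, evaluations of the same proposition can be compared directly, for instance to track a claim hardening from Possible to Assumed true across citing papers.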
  15. 15. Reporting verbs vs. epistemic value:
      – Value = 0 (unknown): establish, (remain to be) elucidated, be (clear/useful), (remain to be) examined/determined, describe, make difficult to infer, report
      – Value = 1 (hypothetical): be important, consider, expect, hypothesize (5x), give insight, raise possibility that, suspect, think
      – Value = 2 (probable): appear, believe, implicate (2x), imply, indicate (12x), play a role, represent, suggest (18x), validate (2x)
      – Value = 3 (presumed true): be able/apparent/important/positive/visible, compare (2x), confirm (2x), define, demonstrate (15x), detect (5x), discover, display (3x), eliminate, find (3x), identify (4x), know, need, note (2x), observe (2x), obtain (success/results, 3x), prove to be, refer, report (2x), reveal (3x), see (2x), show (24x), study, view
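One way to use such a verb table is as a lookup from reporting verb to epistemic value. The sketch below is only an illustration: it covers a subset of the verbs listed on the slide (ambiguous ones such as "report", which appears under both 0 and 3, are left out), and the suffix-stripping `lemma` helper is a crude stand-in for real lemmatization, not a method described in the talk.

```python
import re
from typing import Optional

# Subset of the slide's table: reporting verb -> epistemic value.
VERB_VALUE = {
    "hypothesize": 1, "expect": 1, "suspect": 1,
    "suggest": 2, "indicate": 2, "imply": 2, "appear": 2, "believe": 2,
    "show": 3, "demonstrate": 3, "confirm": 3, "find": 3,
    "identify": 3, "observe": 3, "reveal": 3, "detect": 3,
}

# Inflected forms that simple suffix stripping would miss.
IRREGULAR = {
    "found": "find", "shown": "show",
    "identified": "identify", "identifies": "identify",
    "implied": "imply", "implies": "imply",
}


def lemma(token: str) -> Optional[str]:
    """Map a token to a verb in VERB_VALUE, or None if it is not listed."""
    token = token.lower()
    if token in IRREGULAR:
        return IRREGULAR[token]
    if token in VERB_VALUE:
        return token
    for suffix in ("ing", "ed", "es", "s", "d"):
        if token.endswith(suffix):
            stem = token[: -len(suffix)]
            for candidate in (stem, stem + "e"):
                if candidate in VERB_VALUE:
                    return candidate
    return None


def epistemic_value(clause: str) -> Optional[int]:
    """Return the value of the first listed reporting verb in the clause."""
    for token in re.findall(r"[A-Za-z]+", clause):
        verb = lemma(token)
        if verb is not None:
            return VERB_VALUE[verb]
    return None


print(epistemic_value("These results suggest that expression is selective"))
print(epistemic_value("Figure 4A shows that p53 was stabilized"))
```

Taking the first matching verb is a deliberate simplification; a clause with several reporting verbs, or with negation, would need more than a dictionary lookup.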
  16. 16. Most prevalent clause type: "These results suggest that..."
      – Adverb/Connective: thus, therefore, together, recently, in summary
      – Determiner/Pronoun: it, this, these, we/our
      – Adjective: previous, future, better
      – Noun phrase: data, report, study, result(s); method or reference
      – Modal: form of 'to be', may, remain
      – Adjective: often, recently, generally
      – Verb: show, obtain, consider, view, reveal, suggest, hypothesize, indicate, believe
      – Preposition: that, to
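The slot structure above can be approximated with a regular expression. This is a rough sketch, assuming the slots appear in the listed order and that all but the pronoun, verb and final "that/to" are optional; the pattern and the function name `is_regulatory_clause` are hypothetical and cover only the word lists on the slide.

```python
import re

# Word lists taken from the slide; inflections are approximated by suffixes.
CONNECTIVE = r"(?:thus|therefore|together|recently|in summary)"
PRONOUN = r"(?:it|this|these|we|our)"
ADJECTIVE = r"(?:previous|future|better)"
NOUN = r"(?:data|reports?|stud(?:y|ies)|results?)"
MODAL = r"(?:is|are|was|were|may|remains?)"
ADVERB = r"(?:often|recently|generally)"
VERB = (
    r"(?:show|obtain|consider|view|reveal|suggest|hypothesize|indicate|believe)"
    r"(?:s|d|ed)?"
)
LINK = r"(?:that|to)"

REGULATORY_CLAUSE = re.compile(
    rf"^\s*(?:{CONNECTIVE},?\s+)?{PRONOUN}\b"
    rf"(?:\s+{ADJECTIVE})?(?:\s+{NOUN})?(?:\s+{MODAL})?(?:\s+{ADVERB})?"
    rf"\s+{VERB}\s+{LINK}\b",
    re.IGNORECASE,
)


def is_regulatory_clause(text: str) -> bool:
    """True if the text opens with a 'These results suggest that' style clause."""
    return REGULATORY_CLAUSE.search(text) is not None


for sentence in (
    "These results suggest that miR-371-3 expression is a selective event.",
    "It is generally believed that miRNAs regulate gene expression.",
    "We examined the effect of expression on p53 activation.",
):
    print(is_regulatory_clause(sentence), "-", sentence)
```

A pattern like this only catches the prototypical form; sentence 1 on slide 6 ("Importantly, our results so far indicate that...") would be missed because "importantly" and "so far" are not in the slide's lists.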
  17. 17. Ontology for Reasoning, Certainty and Attribution [11]
  18. 18. Adding metadiscourse to triples:
      Biological statement with BEL/epistemic markup:
      – Statement: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2.
      – BEL representation: r(MIR:miR-372) -| (tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family")))
        Increased abundance of miR-372 decreases abundance of LATS2: r(MIR:miR-372) -| r(HUGO:LATS2)
      – Epistemic evaluation: Value = Possible; Source = Unknown; Basis = Unknown
      Biological statement with MedScan/epistemic markup:
      – Statement: Furthermore, we present evidence that the secretion of nesfatin-1 into the culture media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01).
      – MedScan analysis: IL-6 → NUCB2 (nesfatin-1); Relation: MolTransport; Effect: Positive; CellType: Adipocytes; Cell Line: 3T3-L1
      – Epistemic evaluation: Value = Probable; Source = Author; Basis = Data
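The two examples on this slide pair a subject-relation-object triple with an epistemic evaluation. A minimal sketch of that pairing in Python follows; the `AnnotatedTriple` type and the `at_least` filter are hypothetical names, the triples are stored as plain strings copied from the slide, and no BEL or MedScan software is involved.

```python
from dataclasses import dataclass
from typing import List

# Ordering of the positive value labels, following the scale on slide 14.
VALUE_RANK = {"Unknown": 0, "Possible": 1, "Probable": 2, "Assumed true": 3}


@dataclass(frozen=True)
class AnnotatedTriple:
    """A biological triple plus its epistemic markup (value, source, basis)."""
    subject: str
    relation: str
    obj: str
    value: str
    source: str
    basis: str


def at_least(statements: List[AnnotatedTriple], minimum: str) -> List[AnnotatedTriple]:
    """Keep only the statements whose value is at or above the given label."""
    threshold = VALUE_RANK[minimum]
    return [s for s in statements if VALUE_RANK[s.value] >= threshold]


# The two statements from the slide, with their markup as given there.
statements = [
    AnnotatedTriple("r(MIR:miR-372)", "-|", "r(HUGO:LATS2)",
                    value="Possible", source="Unknown", basis="Unknown"),
    AnnotatedTriple("IL-6", "MolTransport (positive)", "NUCB2 (nesfatin-1)",
                    value="Probable", source="Author", basis="Data"),
]

for s in at_least(statements, "Probable"):
    print(s.subject, s.relation, s.obj, "|", s.value, s.source, s.basis)
```

Keeping the markup next to the triple lets a downstream consumer decide how much hedging to tolerate, rather than treating every extracted relation as equally certain.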
  19. 19. Claim-Evidence example: Data2Semantics
      Goal: improve speed of integration of research into practice
      – Step 1: Patient data + diagnosis link to Guideline recommendation
        A. Philips' Electronic Patient Records
        B. Elsevier-published Clinical Guideline
      – Step 2: Guideline recommendation links to evidence in report or data
        C. Elsevier (or other publisher's) Research Report or Data
  20. 20. Claim-Evidence Chains in Drug-Drug Interactions
      – Step 1: Manually identify DDIs and drug names in a wide collection of content sources
      – Step 2: Develop a model of Drug-Drug Interaction and define candidates
      – Step 3: Automate this process and store as Linked Data
  21. 21. Claimed Knowledge Updates
      Definition:
      1) A CKU expresses a proposition about biological entities
      2) A CKU is a new proposition
      3) The authors present the CKU as factual: ⇒ Strength = Certainty
      4) A CKU is derived from experimental work described in the article: ⇒ Basis = Data
      5) The ownership is attributed to the author(s) of the article: ⇒ Source = Author, Explicit
      Sandor/de Waard, [13]
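The five criteria can be expressed as a simple predicate over an annotated clause. The sketch below is an assumption-laden illustration: the `ClauseAnnotation` fields are hypothetical, "Strength = Certainty" is mapped to value 3 (assumed true) on the scale from slide 14, and "Source = Author, Explicit" to the code "A".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseAnnotation:
    """Hypothetical annotation of one clause, following slides 14 and 21."""
    about_biological_entities: bool  # criterion 1
    is_new: bool                     # criterion 2
    value: int                       # epistemic value, -3..3
    basis: str                       # "Reasoning" or "Data"
    source: str                      # "A", "IA", "N" or "NN"


def is_cku(clause: ClauseAnnotation) -> bool:
    """Check the five CKU criteria against one annotated clause."""
    return (
        clause.about_biological_entities
        and clause.is_new
        and clause.value == 3        # assumption: 'factual' means assumed true
        and clause.basis == "Data"
        and clause.source == "A"
    )


# Illustrative annotations, not taken from the corpus.
factual_result = ClauseAnnotation(True, True, 3, "Data", "A")
hedged_interpretation = ClauseAnnotation(True, True, 1, "Reasoning", "A")

print(is_cku(factual_result))
print(is_cku(hedged_interpretation))
```

Under this reading, a hedged interpretation such as "possibly through direct inhibition of ... LATS2" would not count as a CKU, while the experimental result it builds on would.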
  22. 22. A corpus for citation analysis:
      – Method
        Voorhoeve text: We subsequently created a human miRNA expression library (miR-Lib) by cloning almost all annotated human miRNAs into our vector (Rfam release 6) (Figure S3)
        Citing text: Voorhoeve et al. (116) employed a novel strategy by combining an miRNA vector library and corresponding bar code array
        Citing text: Using a novel retroviral miRNA expression library, Agami and co-workers performed a cell-based screen
      – Result
        Voorhoeve text: we identified miR-372 and miR-373, each permitting proliferation and tumorigenesis of primary human cells that harbor both oncogenic RAS and active wild-type p53.
        Citing text: miR-372 and miR-373 were consequently found to permit proliferation and tumorigenesis of these primary cells carrying both oncogenic RAS and wild-type p53,
        Citing text: Voorhoeve et al. (2006) identified miR-372 and miR-373
        Citing text: miR-372 has been recently described as potential oncogene that collaborate with oncogenic RAS in cellular transformation
      – Interpretation
        Voorhoeve text: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.
        Citing text: probably through direct inhibition of the expression of the tumor-suppressor LATS2 and subsequent neutralization of the p53 pathway.
        Citing text: Compromised Lats2 functionality might reduce the selective pressure for p53 inactivation during tumor progression.
      Work done with Lucy Vanderwende
  23. 23. Data sharing in biology
      • Interspecies variability: A specimen is not a species!
      • Gene expression variability: Knowing genes is not knowing how they are expressed!
      • Microbiome: An animal is an ecosystem!
      • Systems biology: Whole is more than the sum of its parts!
      • Models vs. experiment: Are we talking about the same things? In a way we can all use?
      • Dynamics: Life is not in equilibrium!
      = Life is complicated! Reductionism doesn't work for living systems.
      http://
  24. 24. Statistics to the rescue!
      With enough observations, trends and anomalies can be detected:
      • "Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far."
        The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
      • "The large sample size – 4,298 North Americans of European descent and 2,217 African Americans – has enabled the researchers to mine down into the human genome."
        Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
  25. 25. Enable 'incidental collaboratories':
      • Collect: store data at the level of the experiment:
        – Accessible through a single interface
        – Add enough metadata to know what was done/seen
      • Connect: allow analyses over:
        – Similar experiment types
        – Experiments done with/on similar biological 'things' (species, strains, systems, cells etc.)
        – In a way that can be used by modelers!
      • Keep:
        – Long-term preservation of data and software
        – Fulfill Data Management Plan requirements
        – Allow 'gated' access when and to whom researcher wants
  26. 26. Let's look at a typical lab:
      • How to get the right antibody IDs
      • And messy bits
      • From the lab notebook
      • Into the PI's command center?
  27. 27. Objections and rebuttals re. data sharing
      – Objection: "But our lab notebooks are all on paper"
        Rebuttal: Develop smart phone/tablet apps for data input
      – Objection: "I need to see a direct benefit from something I spend my time on"
        Rebuttal: Develop 'data manipulation dashboard' for PI to allow better access to full experimental output for his/her lab
      – Objection: "I want things to be peer reviewed before I expose them"
        Rebuttal: Allow reviewers access to experimental database before publication (of data or paper)
      – Objection: "I don't really trust anyone else's data – well, except for the guys I went to Grad School with…"
        Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.
      – Objection: "I am afraid other people might scoop my discoveries"
        Rebuttal: ⇒ Reward system moves from a competition to a 'shared mission'
  28. 28. Problem: biological research is quite insular
      • Biology is small: size 10^-5 – 10^2 m, scientist can work alone ('King' and 'subjects').
      • Biology is messy: it doesn't happen behind a terminal.
      • Biology is competitive: many people with similar skill sets, vying for the same grants
      • In summary: the structure of biological research does not inherently promote collaboration (vs., for instance, big physics or astronomy).
      [Diagram labels: Prepare, Ponder, Observe, Communicate, Analyze]
  29. 29. So we can do joint experiments:
      Across labs, experiments: track reagents and how they are used
      [Diagram labels: Prepare, Observations, Analyze, Communicate]
  30. 30. So we can do joint experiments:
      Compare outcome of interactions with these entities
      [Diagram labels: Prepare, Observations, Analyze, Communicate]
  31. 31. So we can do joint experiments:
      Build a 'virtual reagent spectrogram' by comparing how different entities interacted in different experiments
      [Diagram labels: Prepare, Observations, Analyze, Communicate]
  32. 32. Elsevier Research Data Services:
      1. Help increase the amount of data shared from the lab, enabling incidental collaboratories
      2. Help increase the value of the data shared by increasing annotation, normalization, provenance, enabling enhanced interoperability
      3. Help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms
  33. 33. Summary – Possible Collaborations?
      • A model of scientific sensemaking (Thesis: joint research?):
        – Stories, that persuade with data
        – Discourse segments and verb tense
      • Towards claim-evidence networks (Labs: research collaborations?):
        – Hedging in science
        – Creating claim-evidence networks
      • Data (RDS: joint development?):
        – Why life is so complicated
        – Connecting experiments into collaboratories
  34. 34. References:
      [1] J Am Med Inform Assoc. 2010 September; 17(5): 514–518 http://
      [2] Quanzhi Li, Yi-Fang Brook Wu (2006): Identifying important concepts from medical documents, Journal of Biomedical Informatics 39 (2006) 668–679
      [3] Useful list of resources in bioinformatics http://
      [4] Biological Expression Language – http://
      [5] Latour, B. and Woolgar, S., Laboratory Life: the Social Construction of Scientific Facts, 1979, Sage Publications
      [6] Light M, Qiu XY, Srinivasan P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases 2004:17-24.
      [7] Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.
      [8] Thompson P., Venturi G., McNaught J, Montemagni S, Ananiadou S. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Wkshp Building and Evaluating Resources for Biomedical Text Mining 2008.
      [9] Kim, S-M. Hovy, E.H. (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
      [10] de Waard, A. and Schneider, J. (2012) Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA), Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine workshop at ISWC 2012 (submitted)
      [11] Data2Semantics project: http://
      [12] Boyce R, Collins C, Horn J, Kalet I. (2009) Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. J Biomed Inform. 2009 Dec;42(6):979-89. Epub 2009 May 10, see also http://dbmi-icode--evidence/front-page.html