Scientific Sensemaking




Talk at Microsoft Research, Bellevue, WA, January 24th 2013; an overview of the past 5 years of my research.








Usage Rights

© All Rights Reserved


Scientific Sensemaking: Presentation Transcript

  • Supporting Scientific Sensemaking. Anita de Waard, VP Research Data Collaborations, Elsevier. Visit Microsoft Research, January 23, 2013
  • Outline
    • A model of scientific sensemaking:
      – Stories that persuade with data
      – Discourse segments and verb tense
    • Towards extracting claim-evidence networks:
      – Hedging in science
      – Creating claim-evidence networks
    • Data:
      – Why life is so complicated
      – Connecting biological experiments into collaboratories
  • A paper is a story… Story grammar of "The Story of Goldilocks and the Three Bears" compared with paper grammar of "The AXH Domain of Ataxin-1 Mediates Neurodegeneration through Its Interaction with Gfi-1/Senseless Proteins":
    – Setting/Time: Once upon a time | Background: The mechanisms mediating SCA1 pathogenesis are still not fully understood, but some general principles have emerged.
    – Character: a little girl named Goldilocks | Objects of study: the Drosophila Atx-1 homolog (dAtx-1), which lacks a polyQ tract
    – Location: She went for a walk in the forest. Pretty soon, she came upon a house. | Experimental setup: studied and compared in vivo effects and interactions to those of the human protein
    – Theme/Goal: She knocked and, when no one answered, | Research goal: Gain insight into how Atx-1's function contributes to SCA1 pathogenesis. How these interactions might contribute to the disease process and how they might cause toxicity in only a subset of neurons in SCA1 is not fully understood.
    – Attempt: she walked right in. | Hypothesis: Atx-1 may play a role in the regulation of gene expression
    – Episode name: At the table in the kitchen, there were three bowls of porridge. | Name: dAtx-1 and hAtx-1 Induce Similar Phenotypes When Overexpressed in Flies
    – Subgoal: Goldilocks was hungry. | Subgoal: test the function of the AXH domain
    – Attempt: She tasted the porridge from the first bowl. | Method: overexpressed dAtx-1 in flies using the GAL4/UAS system (Brand and Perrimon, 1993) and compared its effects to those of hAtx-1.
    – Outcome: "This porridge is too hot!" she exclaimed. | Results: Overexpression of dAtx-1 by Rhodopsin1(Rh1)-GAL4, which drives expression in the differentiated R1-R6 photoreceptor cells (Mollereau et al., 2000 and O'Tousa et al., 1985), results in neurodegeneration in the eye, as does overexpression of hAtx-1[82Q].
    – Attempt: So, she tasted the porridge from the second bowl. | Results: Although at 2 days after eclosion, overexpression of either Atx-1 does not show obvious morphological changes in the photoreceptor cells
    – Outcome: "This porridge is too cold," she said. | Data: (data not shown),
    – Attempt: So, she tasted the last bowl of porridge. | Results: both genotypes show many large holes and loss of cell integrity at 28 days
    – Outcome: "Ahhh, this porridge is just right," she said. | Data: (Figures 1B-1D).
  • …that persuades… Classical rhetoric (Aristotle, Quintilian) compared with the scientific paper:
    – Introduction (Aristotle: prooimion; Quintilian: exordium): the introduction of a speech, where one announces the subject and purpose of the discourse, and where one usually employs the persuasive appeal to ethos in order to establish credibility with the audience. | Scientific paper: Introduction: positioning
    – Statement of Facts (Aristotle: prothesis; Quintilian: narratio): the speaker here provides a narrative account of what has happened and generally explains the nature of the case. | Scientific paper: Introduction: research question
    – Summary (Quintilian: propositio): the propositio provides a brief summary of what one is about to speak on, or concisely puts forth the charges or accusation. | Scientific paper: Summary of contents
    – Proof (Aristotle: pistis; Quintilian: confirmatio): the main body of the speech, where one offers logical arguments as proof. The appeal to logos is emphasized here. | Scientific paper: Results
    – Refutation (Quintilian: refutatio): as the name connotes, this section of a speech was devoted to answering the counterarguments of one's opponent. | Scientific paper: Related Work
    – Peroration (Aristotle: epilogos; Quintilian: peroratio): following the refutatio and concluding the classical oration, the peroratio conventionally employed appeals through pathos, and often included a summing up. | Scientific paper: Discussion: summary, implications.
    • The goal of the paper is to be published; it uses the author/journal as a host.
    • The format has co-evolved: a predator-prey relationship with reviewers.
  • ...with data.
  • In defense of the clause as the unit of thought:
    1. Importantly, our results so far indicate that the expression of miR-372&3 did not reduce the activity of RASV12, as these cells were still growing faster than normal cells and were tumorigenic, for which RAS activity is indispensable (Hahn et al, 1999 and Kolfschoten et al, 2005).
    2. To shed more light on this aspect, we examined the effect of miR-372&3 expression on p53 activation in response to oncogenic stimulation.
    3. We used for this experiment BJ/ET cells containing p14ARFkd because, following RASV12 treatment, in those cells p53 is still activated but more clearly stabilized than in parental BJ/ET cells (Voorhoeve and Agami, 2003), resulting in a sensitized system for slight alterations in p53 in response to RASV12.
    4. Figure 4A shows that following RASV12 stimulation, p53 was stabilized and activated, and its target gene, p21cip1, was induced in all cases, indicating an intact p53 pathway in these cells.
    • More than one 'thought unit' per sentence.
    • Verb tense changes within a sentence (several times).
    • Attribution, actions/states, and prepositions all contained within a sentence.
  • In defense of the clause as the unit of thought (the same four example sentences, segmented by position within the sentence):
    • Head: premise, motivation, attribution (matrix clause)
    • Middle: main biological statement
    • End: interpretation, elaboration, attribution (reference)
  • In defense of the clause as the unit of thought (the same four example sentences, segmented by clause type): Regulatory clause, Fact, Goal, Method, Result, Implication.
  • Clause, realm and tense: the example paragraph, split into clauses and labelled by segment type (realms: conceptual knowledge vs. experimental evidence):
    – Fact: Both seminomas and the EC component of nonseminomas share features with ES cells.
    – Goal: To exclude that
    – Hypothesis: the detection of miR-371-3 merely reflects its expression pattern in ES cells,
    – Method: we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
    – Result: In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
    – Regulatory-Implication: suggesting that
    – Implication: miR-371-3 expression is a selective event during tumorigenesis.
  • Clause, realm and tense:
    • Concepts, models, 'facts': present tense
      – Fact: (1) Both seminomas and the EC component of nonseminomas share features with ES cells.
      – Problem: (2b) the detection of miR-371-3 merely reflects its expression pattern in ES cells,
      – Implication: (3c) miR-371-3 expression is a selective event during tumorigenesis.
    • Transitions: present tense
      – Goal: (2a) To exclude that
      – Regulatory-Implication: (3b) suggesting that
    • Experiment: past tense
      – Method: (2c) we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster (Suh et al, 2004).
      – Result: (3a) In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable (Figs S7 and S8),
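The grouping on the two clause/realm/tense slides can be written down as a simple lookup from segment type to realm and expected tense. This is a minimal sketch for illustration: the segment labels and tenses come from the slides, but `Realm`, `SEGMENT_PROFILE` and `expected_tense` are hypothetical names, not part of any tool described in the talk.

```python
from enum import Enum


class Realm(Enum):
    """The three groupings shown on the slide."""
    CONCEPTUAL = "concepts, models, 'facts'"
    TRANSITION = "transitions"
    EXPERIMENTAL = "experiment"


# Hypothetical lookup: segment type -> (realm, typical verb tense), per the slide.
SEGMENT_PROFILE = {
    "Fact": (Realm.CONCEPTUAL, "present"),
    "Problem": (Realm.CONCEPTUAL, "present"),
    "Implication": (Realm.CONCEPTUAL, "present"),
    "Goal": (Realm.TRANSITION, "present"),
    "Regulatory-Implication": (Realm.TRANSITION, "present"),
    "Method": (Realm.EXPERIMENTAL, "past"),
    "Result": (Realm.EXPERIMENTAL, "past"),
}


def expected_tense(segment_type: str) -> str:
    """Return the tense the slide associates with a segment type."""
    return SEGMENT_PROFILE[segment_type][1]


# The example paragraph in reading order, labelled as on the slide.
EXAMPLE = [
    ("Fact", "Both seminomas and the EC component of nonseminomas share features with ES cells."),
    ("Goal", "To exclude that"),
    ("Problem", "the detection of miR-371-3 merely reflects its expression pattern in ES cells,"),
    ("Method", "we tested by RPA miR-302a-d, another ES cells-specific miRNA cluster"),
    ("Result", "In many of the miR-371-3 expressing seminomas and nonseminomas, miR-302a-d was undetectable"),
    ("Regulatory-Implication", "suggesting that"),
    ("Implication", "miR-371-3 expression is a selective event during tumorigenesis."),
]

if __name__ == "__main__":
    # Print realm, expected tense and label next to each clause.
    for label, clause in EXAMPLE:
        realm, tense = SEGMENT_PROFILE[label]
        print(f"{realm.name:<12} {tense:<8} {label:<23} {clause}")
```

Reading the output top to bottom shows the alternation the slides describe: present-tense conceptual clauses frame past-tense experimental ones, joined by present-tense transitions.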
  • Tense use in science and mythology:
    • Facts in the eternal present:
      – Science: "Endogenous small RNAs (miRNAs) regulate gene expression by mechanisms conserved across metazoans."
      – Mythology: "I sing of golden-throned Hera whom Rhea bare. Queen of the immortals is she, surpassing all in beauty: she is the sister and the wife of loud-thundering Zeus, the glorious one whom all the blessed throughout high Olympus reverence and honor."
    • Events in the simple past:
      – Science: "Vehicle-treated animals spent equivalent time investigating a juvenile in the first and second sessions in experiments conducted in the NAC and the striatum: T1 values were 122 ± 6 s and 114 ± 5 s."
      – Mythology: "Now the wooers turned to the dance and to gladsome song, and made them merry, and waited till evening should come; and as they made merry dark evening came upon them."
    • Events with embedded facts:
      – Science: "We also generated BJ/ET cells expressing the RASV12-ERTAM chimera gene, which is only active when tamoxifen is added (De Vita et al, 2005)."
      – Mythology: "And she took her mighty spear, tipped with sharp bronze, heavy and huge and strong, wherewith she vanquishes the ranks of men, of warriors, with whom she is wroth, she, the daughter of the mighty sire."
    • Attribution in the present perfect:
      – Science: "miRNAs have emerged as important regulators of development and control processes such as cell fate determination and cell death (Abrahante et al., 2003, Brennecke et al., 2003, Chang et al., 2004, Chen et al., 2004, Johnston and Hobert, 2003, Lee et al., 1993)."
      – Mythology: "In this book I have had old stories written down, as I have heard them told by intelligent people, concerning chiefs who have held dominion in the northern countries, and who spoke the Danish tongue; and also concerning some of their family branches, according to what has been told me."
    • Implications are hedged, present tense:
      – Science: "These results indicate that although miR-372&3 confer complete protection to oncogene-induced senescence in a manner similar to p53 inactivation, the cellular response to DNA damage remains intact"
      – Mythology: "Now it is said that ever since then whenever the camel sees a place where ashes have been scattered, he wants to get revenge with his enemy the rat and stomps and rolls in the ashes hoping to get the rat"
  • From fiction to fact: Hedging
    "[Y]ou can transform .. fiction into fact just by adding or subtracting references", Bruno Latour [1]
    • Voorhoeve et al., 2006: These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2.
    • Kloosterman and Plasterk, 2006: In a genetic screen, miR-372 and miR-373 were found to allow proliferation of primary human cells that express oncogenic RAS and active p53, possibly by inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006).
    • Yabuta et al., 2007: [On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006).
    • Okada et al., 2011: Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of Lats2, thereby allowing tumorigenic growth in the presence of p53 (Voorhoeve et al., 2006).
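One way to make this gradual loss of hedging visible is to count hedge cues in each sentence of the citation chain. The sketch below assumes a tiny, hand-picked cue list (`HEDGE_CUES` is hypothetical; the studies cited on the next slide use far richer resources) and plain word tokenization, so it will miss multi-word cues.

```python
import re

# Hypothetical, deliberately small list of single-word hedge cues.
HEDGE_CUES = {"possibly", "probably", "potential", "suggest", "suggests", "may", "might"}

# The citation chain from the slide, oldest first.
CITATION_CHAIN = [
    ("Voorhoeve et al., 2006",
     "These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct "
     "inhibition of the expression of the tumor suppressor LATS2."),
    ("Kloosterman and Plasterk, 2006",
     "In a genetic screen, miR-372 and miR-373 were found to allow proliferation of "
     "primary human cells that express oncogenic RAS and active p53, possibly by "
     "inhibiting the tumor suppressor LATS2 (Voorhoeve et al., 2006)."),
    ("Yabuta et al., 2007",
     "[On the other hand,] two miRNAs, miRNA-372 and -373, function as potential novel "
     "oncogenes in testicular germ cell tumors by inhibition of LATS2 expression, which "
     "suggests that Lats2 is an important tumor suppressor (Voorhoeve et al., 2006)."),
    ("Okada et al., 2011",
     "Two oncogenic miRNAs, miR-372 and miR-373, directly inhibit the expression of "
     "Lats2, thereby allowing tumorigenic growth in the presence of p53 "
     "(Voorhoeve et al., 2006)."),
]


def hedge_cues(sentence: str) -> list[str]:
    """Return the hedge cues found in a sentence, in order of appearance."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [t for t in tokens if t in HEDGE_CUES]


if __name__ == "__main__":
    for source, sentence in CITATION_CHAIN:
        print(f"{source:<32} {hedge_cues(sentence)}")
```

On this chain the 2011 sentence carries no cues at all, which is the pattern Latour's remark anticipates.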
  • Hedging in science:
    • Why do authors hedge?
      – Make a claim 'pending […] acceptance in the community' [2]
      – 'Create A Research Space': hedging allows authors to insert themselves into the discourse in a community [3]
      – 'the strongest claim a careful researcher can make' [4]
    • Hedging cues, speculative language, modality/negation:
      – Light et al. [5]: finding speculative language
      – Wilbur et al. [6]: focus, polarity, certainty, evidence, and directionality
      – Thompson et al. [7]: level of speculation, type/source of the evidence and level of certainty
    • Sentiment detection (e.g. Kim and Hovy [8], among others):
      – Holder of the opinion, strength, polarity as a 'mathematical function' acting on the main propositional content
      – Wide applications in product reviews; but not (yet) in science!
  • A model for epistemic evaluations: for a proposition P, an epistemically marked clause E is an evaluation of P, where E = E_{V,B,S}(P), with:
    – V = Value: 3 = Assumed true, 2 = Probable, 1 = Possible, 0 = Unknown (-1 = Possibly untrue, -2 = Probably untrue, -3 = Assumed untrue)
    – B = Basis: Reasoning, Data
    – S = Source: A = speaker is author A, explicit; IA = speaker is author A, implicit; N = other author N, explicit; NN = other author NN, implicit
    Model suggested by Eduard Hovy, Information Sciences Institute, University of Southern California
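The evaluation E_{V,B,S}(P) maps naturally onto a small record type. This is a hedged sketch, not an implementation from the talk: the class names are hypothetical, and `basis`/`source` are made optional (None printed as "Unknown") because the BEL example later in the deck leaves both unknown.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Optional


class Value(IntEnum):
    """V: epistemic value, from assumed untrue (-3) to assumed true (3)."""
    ASSUMED_UNTRUE = -3
    PROBABLY_UNTRUE = -2
    POSSIBLY_UNTRUE = -1
    UNKNOWN = 0
    POSSIBLE = 1
    PROBABLE = 2
    ASSUMED_TRUE = 3


class Basis(Enum):
    """B: what the evaluation rests on."""
    REASONING = "Reasoning"
    DATA = "Data"


class Source(Enum):
    """S: who holds the evaluation, and whether that is stated explicitly."""
    A = "speaker is author A, explicit"
    IA = "speaker is author A, implicit"
    N = "other author N, explicit"
    NN = "other author NN, implicit"


@dataclass(frozen=True)
class EpistemicEvaluation:
    """E_{V,B,S}(P); basis and source are None when the text does not say."""
    proposition: str
    value: Value
    basis: Optional[Basis] = None
    source: Optional[Source] = None

    def __str__(self) -> str:
        basis = self.basis.value if self.basis is not None else "Unknown"
        source = self.source.name if self.source is not None else "Unknown"
        return f"E_{{{self.value.name}, {basis}, {source}}}({self.proposition})"


claim = EpistemicEvaluation("These miRNAs neutralize p53-mediated CDK inhibition", Value.POSSIBLE)
print(claim)  # E_{POSSIBLE, Unknown, Unknown}(These miRNAs neutralize p53-mediated CDK inhibition)
```

Using `IntEnum` for V keeps the ordering of the scale, so evaluations can be compared directly (e.g. `Value.PROBABLE > Value.POSSIBLE`).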
  • Reporting verbs vs. epistemic value:
    – Value = 0 (unknown): establish, (remain to be) elucidated, be (clear/useful), (remain to be) examined/determined, describe, make difficult to infer, report
    – Value = 1: be important, consider, expect, hypothesize (5x), give (hypothetical) insight, raise possibility that, suspect, think
    – Value = 2: appear, believe, implicate (2x), imply, indicate (12x), play a (probable) role, represent, suggest (18x), validate (2x)
    – Value = 3 (presumed true): be able/apparent/important/positive/visible, compare (2x), confirm (2x), define, demonstrate (15x), detect (5x), discover, display (3x), eliminate, find (3x), identify (4x), know, need, note (2x), observe (2x), obtain (success/results, 3x), prove to be, refer, report (2x), reveal (3x), see (2x), show (24x), study, view
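The verb lists above suggest a simple lexicon lookup for tagging a clause with an epistemic value. The sketch below is an assumption-laden illustration: `VERB_VALUE` holds only a subset of the slide's verbs (it drops "report", which the slide lists under both 0 and 3, and the irregular "find"), and `lemma_candidates` is a naive, hypothetical suffix stripper rather than a real lemmatizer.

```python
import re
from typing import Optional

# Subset of the slide's verb lists, keyed by base form.
VERB_VALUE = {
    # Value 0 (unknown)
    "elucidate": 0, "examine": 0, "determine": 0, "describe": 0,
    # Value 1 (possible)
    "consider": 1, "expect": 1, "hypothesize": 1, "suspect": 1, "think": 1,
    # Value 2 (probable)
    "appear": 2, "believe": 2, "implicate": 2, "imply": 2, "indicate": 2, "suggest": 2,
    # Value 3 (presumed true)
    "confirm": 3, "demonstrate": 3, "detect": 3, "identify": 3,
    "observe": 3, "reveal": 3, "show": 3,
}


def lemma_candidates(word: str) -> list[str]:
    """Naive inflection stripping, e.g. 'suggests' -> 'suggest', 'implies' -> 'imply'."""
    w = word.lower()
    candidates = [w]
    if w.endswith(("ies", "ied")):
        candidates.append(w[:-3] + "y")
    for suffix in ("es", "ed", "s", "d"):
        if w.endswith(suffix):
            candidates.append(w[: -len(suffix)])
    return candidates


def epistemic_value(sentence: str) -> Optional[int]:
    """Value of the first reporting verb found in the sentence, or None."""
    for token in re.findall(r"[A-Za-z]+", sentence):
        for candidate in lemma_candidates(token):
            if candidate in VERB_VALUE:
                return VERB_VALUE[candidate]
    return None


print(epistemic_value("These results suggest that miR-371-3 expression is a selective event."))  # 2
```

A first-match rule is crude (it ignores negation and the hedged variants such as "play a (probable) role"), but it shows how the slide's counts could seed an automatic tagger.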
  • Most prevalent clause type: "These results suggest that..."
    – Adverb/Connective: thus, therefore, together, recently, in summary
    – Determiner/Pronoun: it, this, these, we/our
    – Adjective: previous, future, better
    – Noun phrase: data, report, study, result(s); method or reference
    – Modal: form of 'to be', may, remain
    – Adverb: often, recently, generally
    – Verb: show, obtain, consider, view, reveal, suggest, hypothesize, indicate, believe
    – Preposition: that, to
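The word classes above can be combined into a pattern that finds the most common shape of this clause. This is a minimal sketch under stated assumptions: `REGULATORY_CLAUSE` and `regulatory_clauses` are hypothetical names, only a few words from each class are used, and the optional "so far" is borrowed from the earlier example sentence ("our results so far indicate that").

```python
import re

# Hypothetical pattern built from a few words of each class on the slide:
# <determiner> <noun> [so far] [may] <verb> that
DETERMINER = r"(?:these|this|our)"
NOUN = r"(?:data|reports?|study|results?)"
VERB = r"(?:show(?:s|ed)?|reveal(?:s|ed)?|suggest(?:s|ed)?|indicate[sd]?)"

REGULATORY_CLAUSE = re.compile(
    rf"\b{DETERMINER}\s+{NOUN}(?:\s+so\s+far)?\s+(?:may\s+)?{VERB}\s+that\b",
    re.IGNORECASE,
)


def regulatory_clauses(text: str) -> list[str]:
    """Return every 'These results suggest that'-style clause in the text."""
    return [m.group(0) for m in REGULATORY_CLAUSE.finditer(text)]


print(regulatory_clauses("Importantly, our results so far indicate that RAS is active."))
# ['our results so far indicate that']
```

A regular expression is used here only because the clause is so formulaic; a parser-based approach would generalize better to the adverb and modal variants listed on the slide.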
  • Ontology for Reasoning, Certainty and Attribution [11]
  • Adding metadiscourse to triples:
    • Biological statement with BEL/epistemic markup:
      – Statement: "These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor-suppressor LATS2."
      – BEL representation: r(MIR:miR-372) -| (tscript(p(HUGO:Trp53)) -| kin(p(PFH:"CDK Family"))); "Increased abundance of miR-372 decreases abundance of LATS2": r(MIR:miR-372) -| r(HUGO:LATS2)
      – Epistemic evaluation: Value = Possible; Source = Unknown; Basis = Unknown
    • Biological statement with MedScan/epistemic markup:
      – Statement: "Furthermore, we present evidence that the secretion of nesfatin-1 into the culture media was dramatically increased during the differentiation of 3T3-L1 preadipocytes into adipocytes (P < 0.001) and after treatments with TNF-alpha, IL-6, insulin, and dexamethasone (P < 0.01)."
      – MedScan analysis: IL-6 → NUCB2 (nesfatin-1); Relation: MolTransport; Effect: Positive; CellType: Adipocytes; Cell Line: 3T3-L1
      – Epistemic evaluation: Value = Probable; Source = Author; Basis = Data
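Attaching the epistemic evaluation to a triple amounts to keeping the subject-relation-object split of the BEL statement next to the Value/Source/Basis fields. The sketch below is hypothetical throughout (`split_bel`, `AnnotatedTriple`, `annotate`); it is not a BEL parser. It assumes the subject is a simple term, so the first relation token is the top-level one, and it handles only the short forms "-|" (decreases) and "->" (increases).

```python
from dataclasses import dataclass
from typing import Optional

# BEL short-form relations: "-|" (decreases) and "->" (increases) only.
RELATIONS = ("-|", "->")


def split_bel(statement: str) -> tuple[str, str, str]:
    """Split 'subject REL object' at the first relation token.

    Assumes the subject is a simple term, not a nested statement.
    """
    hits = [(statement.find(f" {rel} "), rel) for rel in RELATIONS if f" {rel} " in statement]
    if not hits:
        raise ValueError(f"no supported relation in {statement!r}")
    index, rel = min(hits)
    return statement[:index].strip(), rel, statement[index + len(rel) + 2:].strip()


@dataclass(frozen=True)
class AnnotatedTriple:
    sentence: str
    subject: str
    relation: str
    obj: str
    value: str                     # e.g. "Possible", "Probable"
    source: Optional[str] = None   # None stands for "Unknown" on the slide
    basis: Optional[str] = None


def annotate(sentence: str, bel: str, value: str,
             source: Optional[str] = None, basis: Optional[str] = None) -> AnnotatedTriple:
    """Combine a BEL statement with its epistemic evaluation."""
    subject, relation, obj = split_bel(bel)
    return AnnotatedTriple(sentence, subject, relation, obj, value, source, basis)


triple = annotate(
    "These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct "
    "inhibition of the expression of the tumor-suppressor LATS2.",
    "r(MIR:miR-372) -| r(HUGO:LATS2)",
    value="Possible",
)
print(triple.subject, triple.relation, triple.obj, triple.value)
```

Keeping the evaluation beside the triple, rather than discarding it at extraction time, is what allows a knowledge base to distinguish "possibly inhibits" from "inhibits".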
  • Claim-Evidence example: Data2Semantics
    Goal: improve the speed of integration of research into practice
    – Step 1: Patient data + diagnosis (A. Philips' Electronic Patient Records) link to a guideline recommendation (B. Elsevier-published Clinical Guideline)
    – Step 2: The guideline recommendation links to evidence in a report or data (C. Elsevier (or other publisher's) Research Report or Data)
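The two steps form a chain that can be followed from a patient record to the underlying evidence. The sketch below assumes a plain adjacency map with made-up identifiers (`LINKS` and all node names are hypothetical; the actual Data2Semantics data model is not shown in the talk) and an acyclic link graph.

```python
# Hypothetical link graph: record -> guideline recommendation -> evidence.
LINKS: dict[str, list[str]] = {
    "patient:record-1+diagnosis": ["guideline:recommendation-1"],
    "guideline:recommendation-1": ["report:study-1", "data:dataset-1"],
}


def evidence_paths(node: str, links: dict[str, list[str]],
                   path: tuple[str, ...] = ()) -> list[list[str]]:
    """Depth-first walk from a claim to every terminal piece of evidence.

    Assumes the link graph is acyclic.
    """
    path = path + (node,)
    targets = links.get(node, [])
    if not targets:
        return [list(path)]
    paths: list[list[str]] = []
    for target in targets:
        paths.extend(evidence_paths(target, links, path))
    return paths


for p in evidence_paths("patient:record-1+diagnosis", LINKS):
    print(" -> ".join(p))
```

Returning whole paths rather than just the evidence nodes keeps the intermediate guideline visible, which is the point of a claim-evidence chain.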
  • Claim-Evidence Chains in Drug-Drug Interactions:
    – Step 1: Manually identify DDIs and drug names in a wide collection of content sources
    – Step 2: Develop a model of Drug-Drug Interaction and define candidates
    – Step 3: Automate this process and store as Linked Data
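Step 3 stores the results as Linked Data. As a rough illustration, a single DDI claim and its evidence link can be serialized as N-Triples with the standard library alone. Everything except the `rdf:type` IRI is an assumption here: the `http://example.org/ddi/` namespace, the property names `objectDrug`, `precipitantDrug` and `evidence`, and the placeholder drug names are hypothetical and not the project's vocabulary.

```python
from urllib.parse import quote

# Hypothetical namespace and property names, for illustration only.
EX = "http://example.org/ddi/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(local: str) -> str:
    """Mint an IRI in the example namespace, percent-encoding unsafe characters."""
    return f"<{EX}{quote(local)}>"


def ddi_to_ntriples(claim_id: str, object_drug: str, precipitant_drug: str,
                    evidence_url: str) -> list[str]:
    """Serialize one drug-drug interaction claim, with its evidence link, as N-Triples."""
    subject = iri(f"claim/{claim_id}")
    return [
        f"{subject} <{RDF_TYPE}> {iri('DrugDrugInteraction')} .",
        f"{subject} {iri('objectDrug')} {iri('drug/' + object_drug)} .",
        f"{subject} {iri('precipitantDrug')} {iri('drug/' + precipitant_drug)} .",
        f"{subject} {iri('evidence')} <{evidence_url}> .",
    ]


print("\n".join(ddi_to_ntriples("1", "drug a", "drug b", "http://example.org/evidence/report-1")))
```

In practice an RDF library and an established vocabulary would replace the hand-built strings; the sketch only shows that each claim stays linked to its evidence once stored.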
  • Claimed Knowledge Updates. Definition:
    1) A CKU expresses a proposition about biological entities
    2) A CKU is a new proposition
    3) The authors present the CKU as factual: => Strength = Certainty
    4) A CKU is derived from experimental work described in the article: => Basis = Data
    5) The ownership is attributed to the author(s) of the article: ⇒ Source = Author, Explicit
    Sandor/de Waard [13]
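The five criteria read as a conjunction over a clause's annotation. This is a sketch only: `ClauseAnnotation` and `is_claimed_knowledge_update` are hypothetical names, and mapping "Strength = Certainty" to value 3 (assumed true) on the epistemic scale from the earlier slide is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClauseAnnotation:
    """Hypothetical annotation record for one clause."""
    about_biological_entities: bool   # criterion 1
    is_new_proposition: bool          # criterion 2
    value: int                        # 3 = assumed true ... -3 = assumed untrue
    basis: str                        # "Data" or "Reasoning"
    source: str                       # "A", "IA", "N" or "NN"


def is_claimed_knowledge_update(c: ClauseAnnotation) -> bool:
    """Apply the five CKU criteria; 'Certainty' is mapped to value 3 (an assumption)."""
    return (
        c.about_biological_entities
        and c.is_new_proposition
        and c.value == 3          # criterion 3: presented as factual
        and c.basis == "Data"     # criterion 4: derived from the article's experiments
        and c.source == "A"       # criterion 5: attributed explicitly to the authors
    )


# "we identified miR-372 and miR-373, each permitting proliferation ..." (Voorhoeve et al.)
print(is_claimed_knowledge_update(ClauseAnnotation(True, True, 3, "Data", "A")))  # True
# "... possibly through direct inhibition of the expression of ... LATS2"
print(is_claimed_knowledge_update(ClauseAnnotation(True, True, 1, "Data", "A")))  # False
```

The hedged interpretation fails only on criterion 3, which matches the intuition that it is a claim still "pending acceptance" rather than an update.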
  • A corpus for citation analysis (Voorhoeve text vs. citing text):
    • Method:
      – Voorhoeve text: "We subsequently created a human miRNA expression library (miR-Lib) by cloning almost all annotated human miRNAs into our vector (Rfam release 6) (Figure S3)"
      – Citing text: "Voorhoeve et al. (116) employed a novel strategy by combining an miRNA vector library and corresponding bar code array"; "Using a novel retroviral miRNA expression library,"; "Agami and co-workers performed a cell-based screen"
    • Result:
      – Voorhoeve text: "we identified miR-372 and miR-373, each permitting proliferation and tumorigenesis of primary human cells that harbor both oncogenic RAS and active wild-type p53."
      – Citing text: "miR-372 and miR-373 were consequently found to permit proliferation and tumorigenesis of these primary cells carrying both oncogenic RAS and wild-type p53,"; "Voorhoeve et al. (2006) identified miR-372 and miR-373"; "miR-372 has been recently described as potential oncogene that collaborate with oncogenic RAS in cellular transformation"
    • Interpretation:
      – Voorhoeve text: "These miRNAs neutralize p53-mediated CDK inhibition, possibly through direct inhibition of the expression of the tumor suppressor LATS2."
      – Citing text: "probably through direct inhibition of the expression of the tumor-suppressor LATS2 and subsequent neutralization of the p53 pathway."; "Compromised Lats2 functionality might reduce the selective pressure for p53 inactivation during tumor progression."
    Work done with Lucy Vanderwende
  • Data sharing in biology:
    • Interspecies variability > A specimen is not a species!
    • Gene expression variability > Knowing genes is not knowing how they are expressed!
    • Microbiome > An animal is an ecosystem!
    • Systems biology > The whole is more than the sum of its parts!
    • Models vs. experiment > Are we talking about the same things? In a way we can all use?
    • Dynamics > Life is not in equilibrium!
    => Life is complicated! Reductionism doesn't work for living systems.
    http://
  • Statistics to the rescue! With enough observations, trends and anomalies can be detected:
    • "Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far." The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
    • "The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome." Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
  • Enable 'incidental collaboratories':
    • Collect: store data at the level of the experiment:
      – Accessible through a single interface
      – Add enough metadata to know what was done/seen
    • Connect: allow analyses over:
      – Similar experiment types
      – Experiments done with/on similar biological 'things' (species, strains, systems, cells etc.)
      – In a way that can be used by modelers!
    • Keep:
      – Long-term preservation of data and software
      – Fulfill Data Management Plan requirements
      – Allow 'gated' access when and to whom the researcher wants
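The Collect and Connect steps can be pictured as experiment-level records that are then bucketed by a shared attribute. A minimal sketch, assuming a hypothetical metadata record (`Experiment` and its fields are illustrative, not a proposed schema) and made-up lab and reagent names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Experiment:
    """Experiment-level record; the fields are a hypothetical minimal metadata set."""
    lab: str
    experiment_type: str
    organism: str
    reagents: list[str] = field(default_factory=list)


def group_by(experiments: list[Experiment],
             key: Callable[[Experiment], str]) -> dict[str, list[Experiment]]:
    """Connect step: bucket experiments that share a type, organism, etc."""
    groups: dict[str, list[Experiment]] = defaultdict(list)
    for exp in experiments:
        groups[key(exp)].append(exp)
    return dict(groups)


experiments = [
    Experiment("lab-1", "western blot", "mouse", ["antibody-X"]),
    Experiment("lab-2", "western blot", "human cell line", ["antibody-X"]),
    Experiment("lab-2", "immunostaining", "mouse", ["antibody-Y"]),
]

by_organism = group_by(experiments, key=lambda e: e.organism)
print({k: len(v) for k, v in by_organism.items()})  # {'mouse': 2, 'human cell line': 1}
```

Passing the grouping key as a function keeps one helper usable for every "Connect" axis on the slide (experiment type, organism, reagent).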
  • Let's look at a typical lab: how do we get the right antibody IDs, and the messy bits, from the lab notebook into the PI's command center?
  • Objections and rebuttals re. data sharing:
    – Objection: "But our lab notebooks are all on paper" | Rebuttal: Develop smart phone/tablet apps for data input
    – Objection: "I need to see a direct benefit from something I spend my time on" | Rebuttal: Develop a 'data manipulation dashboard' for the PI to allow better access to the full experimental output of his/her lab
    – Objection: "I want things to be peer reviewed before I expose them" | Rebuttal: Allow reviewers access to the experimental database before publication (of data or paper)
    – Objection: "I don't really trust anyone else's data, well, except for the guys I went to Grad School with…" | Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.
    – Objection: "I am afraid other people might scoop my discoveries" | Rebuttal: => Reward system moves from a competition to a 'shared mission'
  • Problem: biological research is quite insular
    • Biology is small: size 10^-5 to 10^2 m; a scientist can work alone ('King' and 'subjects').
    • Biology is messy: it doesn't happen behind a terminal.
    • Biology is competitive: many people with similar skill sets, vying for the same grants.
    • In summary: the structure of biological research does not inherently promote collaboration (vs., for instance, big physics or astronomy).
    (Diagram: research cycle of Prepare, Observe, Analyze, Ponder, Communicate)
  • So we can do joint experiments: across labs and experiments, track reagents and how they are used.
    (Diagram: parallel lab cycles of Prepare, Observations, Analyze, Communicate)
  • So we can do joint experiments: compare the outcome of interactions with these entities.
    (Diagram: parallel lab cycles of Prepare, Observations, Analyze, Communicate)
  • So we can do joint experiments: build a 'virtual reagent spectrogram' by comparing how different entities interacted in different experiments.
    (Diagram: parallel lab cycles of Prepare, Observations, Analyze, Communicate)
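One possible reading of a "virtual reagent spectrogram" is a per-reagent profile of outcomes, broken down by experiment type and pooled across labs. The sketch below assumes that reading; `OBSERVATIONS`, the reagent names and the outcome labels are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical pooled observations from several labs: (reagent, experiment type, outcome).
OBSERVATIONS = [
    ("antibody-X", "western blot", "specific band"),
    ("antibody-X", "western blot", "specific band"),
    ("antibody-X", "immunostaining", "no signal"),
    ("antibody-Y", "immunostaining", "specific staining"),
]


def reagent_profiles(observations):
    """Build reagent -> experiment type -> Counter of outcomes, pooled across labs."""
    profiles = defaultdict(lambda: defaultdict(Counter))
    for reagent, experiment_type, outcome in observations:
        profiles[reagent][experiment_type][outcome] += 1
    return profiles


profiles = reagent_profiles(OBSERVATIONS)
for experiment_type, outcomes in profiles["antibody-X"].items():
    print(experiment_type, dict(outcomes))
```

Read row by row, such a profile shows at a glance where a reagent behaves consistently and where it does not, which no single lab's notebook would reveal.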
  • Elsevier Research Data Services:
    1. Help increase the amount of data shared from the lab, enabling incidental collaboratories
    2. Help increase the value of the data shared by increasing annotation, normalization, and provenance, enabling enhanced interoperability
    3. Help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms
  • Summary – Possible Collaborations?
    • A model of scientific sensemaking (Thesis: joint research?):
      – Stories that persuade with data
      – Discourse segments and verb tense
    • Towards claim-evidence networks (Labs: research collaborations?):
      – Hedging in science
      – Creating claim-evidence networks
    • Data (RDS: joint development?):
      – Why life is so complicated
      – Connecting experiments into collaboratories
  • References:
    [1] J Am Med Inform Assoc. 2010 September; 17(5): 514–518 http://
    [2] Quanzhi Li, Yi-Fang Brook Wu (2006): Identifying important concepts from medical documents, Journal of Biomedical Informatics 39 (2006) 668–679
    [3] Useful list of resources in bioinformatics http://www.bioinformati
    [4] Biological Expression Language – http://
    [5] Latour, B. and Woolgar, S., Laboratory Life: the Social Construction of Scientific Facts, 1979, Sage Publications
    [6] Light M, Qiu XY, Srinivasan P. (2004). The language of bioscience: facts, speculations, and statements in between. BioLINK 2004: Linking Biological Literature, Ontologies and Databases 2004:17-24.
    [7] Wilbur WJ, Rzhetsky A, Shatkay H (2006). New directions in biomedical text annotations: definitions, guidelines and corpus construction. BMC Bioinformatics 2006, 7:356.
    [8] Thompson P., Venturi G., McNaught J, Montemagni S, Ananiadou S. (2008). Categorising modality in biomedical texts. Proc. LREC 2008 Wkshp Building and Evaluating Resources for Biomedical Text Mining 2008.
    [9] Kim, S-M., Hovy, E.H. (2004). Determining the Sentiment of Opinions. Proceedings of the COLING conference, Geneva, 2004.
    [10] de Waard, A. and Schneider, J. (2012) Formalising Uncertainty: An Ontology of Reasoning, Certainty and Attribution (ORCA), Semantic Technologies Applied to Biomedical Informatics and Individualized Medicine workshop at ISWC 2012 (submitted)
    [11] Data2Semantics project: http://www.data2semanti
    [12] Boyce R, Collins C, Horn J, Kalet I. (2009) Computing with evidence Part I: A drug-mechanism evidence taxonomy oriented toward confidence assignment. J Biomed Inform. 2009 Dec;42(6):979-89. Epub 2009 May 10, see also http://dbmi-icode--evidence/front-page.html