Towards Incidental Collaboratories; Research Data Services


Presentation given at the CNI Fall 2012 meeting on why we need research data services.

  1. Research Data Services: Towards a Framework for Incidental Collaboratories
     Anita de Waard, VP Research Data Collaborations, Elsevier RDS, Jericho, VT, USA
  2. Brief bio:
     • Background:
       – Low-temperature physics (Leiden & Moscow)
       – Joined Elsevier in 1988 as publisher in solid state physics
       – 1991: arXiv => publishers will go out of business very soon!
     • 1997-now: Disruptive Technologies Director, focus on better representation of scientific knowledge:
       – Identifying key knowledge elements in articles (linguistics thesis)
       – Building claim-evidence networks (through collaborations)
       – Help build communities to accelerate rate of change (Force11)
     • Starting 1/1/2013: VP Research Data Collaborations - why?
       – Douglas Engelbart's thinking: connect minds!
       – My (non-biologist's) understanding of biology:
  3. The big problem in biology:
     • Interspecies variability: A specimen is not a species
     • Gene expression variability: Knowing genes is not knowing how they are expressed
     • Microbiome: An animal is an ecosystem
     • Systems biology: A whole is more than the sum of its parts
     Reductionist science doesn't work for living systems!
     http://
  4. Statistics to the rescue!
     With enough observations, trends and anomalies can be detected:
     • "Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far."
       The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
     • "The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome."
       Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
     • "A profile unique for a DNA sample source is obtained … a series of numbers are generated which can be used as a bar code for that DNA source. A registry of bar codes would make it easy to compare DNA samples"
       Roland M. Nardone, Ph.D., Eradication of Cross-Contaminated Cell Lines: A Call for Action, http://
  5. Enable 'incidental collaboratories':
     • Collect: store data at the level of the experiment:
       – Accessible through a single interface
       – Add enough metadata to know what was done/seen
     • Connect: allow analyses over:
       – Similar experiment types
       – Experiments done with/on similar biological 'things':
         • Species, strains, systems, cells
         • Anatomical components (e.g. spleen, hypothalamus)
         • Antibodies, biomarkers, bioactive chemicals, etc.
     • Keep:
       – Long-term preservation of data and software (Olive)
       – Fulfill Data Management Plan requirements
       – Allow gated access, if needed
  6. Problem: biological research is quite insular
     • Biology is small: because objects/equipment are 10^-5 – 10^2 m, you can work alone ('King' and 'subjects').
     • Biology is messy: it doesn't happen behind a terminal.
     • Biology is competitive: different people with similar skill sets, vying for the same grants.
     • In summary: it does not promote inherent collaboration (vs., for instance, big physics or astronomy).
     [Diagram labels: Prepare, Ponder, Observe, Communicate, Analyze]
  7. Try to pop the 'lab bubble'!
     Labs go from being information islands, to being 'sensors in a network'.
     [Diagram labels, repeated for several labs: Prepare, Analyze, Communicate, Think, Observations]
  8. Some objections, and rebuttals:
     • Objection: "But our lab notebooks are all on paper"
       Rebuttal: Develop smart phone/tablet apps for data input
     • Objection: "I need to see a direct benefit from something I spend my time on"
       Rebuttal: Develop 'data manipulation dashboard' for PI to allow better access to full experimental output for his/her lab
     • Objection: "I am afraid other people might scoop my discoveries"
       Rebuttal: Develop intra-lab data communication systems first and allow timed/granular data export
     • Objection: "I want things to be peer reviewed before I expose them"
       Rebuttal: Allow reviewers access to experimental database before publication (of data or paper)
     • Objection: "I don't really trust anyone else's data – well, except for the guys I went to Grad School with…"
       Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.
  9. Elsevier Research Data Services: Goals
     1. Help increase the amount of data shared from the lab, enabling incidental collaboratories
     2. Help increase the value of the data shared by increasing annotation, normalization, provenance, enabling enhanced interoperability
     3. Help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms
  10. RDS Guiding Principles:
     • In principle, all open data stays open and URLs, front end etc. stay where they are (i.e. with repository)
     • Collaboration is tailored to data repositories' unique needs/interests and of a 'service-model' type:
       – Aspects where collaboration is needed are discussed
       – A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
       – All communication, finance, IPR etc. is completely transparent at all times.
     • Very small (2/3 people) department; immediate communication; instant deployment of ideas
  11. RDS Approach:
     • Collaborate and build on relationships with data repositories (life science, earth science, others)
     • Integrate with other content sources, if possible
     • Build annotation and standardisation tools and processes to implement this
     • Develop next-generation infrastructure solutions for back-end integration
     • Explore creative revenue opportunities
  12. NIF Antibody Registry:
     Problem:
     • 95 antibodies were identified in 8 papers
     • 52 did not contain enough information to determine the antibody used
     • Some provided details in another paper
     • Failed to give species, vendor, catalog #
     Solution #1:
     • Journals ask authors to provide antibody catalog nr
     • Link to NIF Registry from manufacturers/vendors' sites
     Solution #2:
     • Pilot with a lab:
  13. Let's start with the Urban Lab
     • Getting antibodies
     • And messy bits
     • From the notebook
     • Into Nathan Urban's command center
     • By providing
       – 7" Tablets
       – Links to IgorPro
       – A dashboard UI
  14. My questions to you:
     • Thoughts on this approach:
       – In principle?
       – In practice?
     • Do you see serious hurdles:
       – Are we overlapping with other initiatives; if so, are we complementary?
       – How does this connect to libraries/local repositories?
       – Are there sensitivities/pain points we are overlooking?
     • Where to start:
       – How to collaborate?
       – Who to talk to – funding agencies, societies: who else?
       – Thoughts on data repositories/platforms to connect to?
  15. Your questions to me?
     http://
     http://
     Thanks go to:
     • Anita Bandrowski and Maryann Martone, NIF
     • Nathan Urban, Shreejoy Tripathy, CMU
     • David Marques, SVP RDS
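
The antibody problem on slide 12 (52 of 95 antibodies, roughly 55%, could not be identified because papers failed to give species, vendor, or catalog number) could be checked at submission time with something like the sketch below. This is only an illustration: the record layout, the field names, and the example entries are assumptions made here, not the NIF Antibody Registry's actual schema or data.

```python
from dataclasses import dataclass
from typing import List, Optional

# The three details slide 12 says papers failed to give.
# The field names are hypothetical, chosen for this sketch.
REQUIRED_FIELDS = ("species", "vendor", "catalog_number")


@dataclass
class AntibodyCitation:
    """One antibody mention in a paper (hypothetical record layout)."""
    name: str
    species: Optional[str] = None
    vendor: Optional[str] = None
    catalog_number: Optional[str] = None


def missing_fields(citation: AntibodyCitation) -> List[str]:
    """Return the required fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS
            if not (getattr(citation, f) or "").strip()]


def unidentifiable_fraction(citations: List[AntibodyCitation]) -> float:
    """Share of citations lacking at least one required field."""
    if not citations:
        return 0.0
    incomplete = sum(1 for c in citations if missing_fields(c))
    return incomplete / len(citations)


# Made-up example records, for illustration only.
examples = [
    AntibodyCitation("anti-GFAP", species="mouse",
                     vendor="ExampleVendor", catalog_number="X-001"),
    AntibodyCitation("anti-NeuN", vendor="ExampleVendor"),
]

for c in examples:
    print(c.name, "missing:", missing_fields(c))
print("unidentifiable fraction:", unidentifiable_fraction(examples))
```

A journal submission form or a lab tablet app could run a check of this kind before accepting an entry, which is one way to read "Journals ask authors to provide antibody catalog nr".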