Towards Incidental Collaboratories; Research Data Services

Presentation given at the CNI Fall 2012 meeting on why we need research data services.


  1. Research Data Services: Towards a Framework for Incidental Collaboratories
     Anita de Waard, VP Research Data Collaborations, Elsevier RDS, Jericho, VT, USA
  2. Brief bio:
     • Background:
       – Low-temperature physics (Leiden & Moscow)
       – Joined Elsevier in 1988 as publisher in solid state physics
       – 1991: arXiv => publishers will go out of business very soon!
     • 1997-now: Disruptive Technologies Director, focus on better representation of scientific knowledge:
       – Identifying key knowledge elements in articles (linguistics thesis)
       – Building claim-evidence networks (through collaborations)
       – Help build communities to accelerate rate of change (Force11)
     • Starting 1/1/2013: VP Research Data Collaborations - why?
       – Douglas Engelbart's thinking: connect minds!
       – My (non-biologist's) understanding of biology:
  3. The big problem in biology:
     • Interspecies variability: A specimen is not a species
     • Gene expression variability: Knowing genes is not knowing how they are expressed
     • Microbiome: An animal is an ecosystem
     • Systems biology: A whole is more than the sum of its parts
     Reductionist science doesn't work for living systems!
     [Image: http://en.wikipedia.org/wiki/File:Duck_of_Vaucanson.jpg]
  4. Statistics to the rescue! With enough observations, trends and anomalies can be detected:
     • "Here we present resources from a population of 242 healthy adults sampled at 15 or 18 body sites up to three times, which have generated 5,177 microbial taxonomic profiles from 16S ribosomal RNA genes and over 3.5 terabases of metagenomic sequence so far."
       The Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome, Nature 486, 207–214 (14 June 2012) doi:10.1038/nature11234
     • "The large sample size — 4,298 North Americans of European descent and 2,217 African Americans — has enabled the researchers to mine down into the human genome."
       Nidhi Subbaraman, Nature News, 28 November 2012, High-resolution sequencing study emphasizes importance of rare variants in disease.
     • "A profile unique for a DNA sample source is obtained … a series of numbers are generated which can be used as a bar code for that DNA source. A registry of bar codes would make it easy to compare DNA samples"
       Roland M. Nardone, Ph.D., Eradication of Cross-Contaminated Cell Lines: A Call for Action, http://www.sivb.org/publicPolicy_Eradication.pdf
  5. Enable 'incidental collaboratories':
     • Collect: store data at the level of the experiment:
       – Accessible through a single interface
       – Add enough metadata to know what was done/seen
     • Connect: allow analyses over:
       – Similar experiment types
       – Experiments done with/on similar biological 'things':
         • Species, strains, systems, cells
         • Anatomical components (e.g. spleen, hypothalamus)
         • Antibodies, biomarkers, bioactive chemicals, etc.
     • Keep:
       – Long-term preservation of data and software (Olive)
       – Fulfill Data Management Plan requirements
       – Allow gated access, if needed
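The "Collect" and "Connect" steps on this slide can be illustrated with a minimal sketch. It is not the RDS design: the `ExperimentRecord` class, its field names, the `connect` function, and the sample records are all assumptions made for illustration. The idea shown is only that experiment-level records carrying a little metadata can be grouped by experiment type and shared biological entity, so that overlaps between labs surface without anyone arranging a collaboration in advance.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ExperimentRecord:
    """One experiment's metadata: enough to know what was done and on what (hypothetical schema)."""
    record_id: str
    lab: str
    experiment_type: str
    entities: set = field(default_factory=set)  # e.g. species, anatomy, antibodies


def connect(records):
    """Group record ids by (experiment type, entity), keeping groups that span more than one lab."""
    groups = defaultdict(list)
    for rec in records:
        for entity in rec.entities:
            groups[(rec.experiment_type, entity)].append(rec)
    return {
        key: sorted(r.record_id for r in recs)
        for key, recs in groups.items()
        if len({r.lab for r in recs}) > 1
    }


# Hypothetical records from two labs.
records = [
    ExperimentRecord("exp-1", "lab-a", "patch-clamp", {"mouse", "hypothalamus"}),
    ExperimentRecord("exp-2", "lab-b", "patch-clamp", {"mouse", "spleen"}),
    ExperimentRecord("exp-3", "lab-a", "imaging", {"mouse"}),
]
shared = connect(records)
```

Here only the patch-clamp experiments on mouse are reported as connected, because they are the only group that involves both labs. A real service would also need controlled vocabularies for the entity names, which this sketch leaves out.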
  6. Problem: biological research is quite insular
     • Biology is small: because objects/equipment are 10^-5 – 10^2 m, you can work alone ('King' and 'subjects').
     • Biology is messy: it doesn't happen behind a terminal.
     • Biology is competitive: different people with similar skill sets, vying for the same grants.
     • In summary: it does not promote inherent collaboration (vs., for instance, big physics or astronomy).
     [Figure labels: Prepare, Ponder, Observe, Communicate, Analyze]
  7. Try to pop the 'lab bubble'!
     Labs go from being information islands, to being 'sensors in a network'.
     [Figure labels: Prepare, Observations, Analyze, Communicate, Think]
  8. Some objections, and rebuttals:
     • Objection: "But our lab notebooks are all on paper"
       Rebuttal: Develop smart phone/tablet apps for data input
     • Objection: "I need to see a direct benefit from something I spend my time on"
       Rebuttal: Develop 'data manipulation dashboard' for PI to allow better access to full experimental output for his/her lab
     • Objection: "I am afraid other people might scoop my discoveries"
       Rebuttal: Develop intra-lab data communication systems first and allow timed/granular data export
     • Objection: "I want things to be peer reviewed before I expose them"
       Rebuttal: Allow reviewers access to experimental database before publication (of data or paper)
     • Objection: "I don't really trust anyone else's data – well, except for the guys I went to Grad School with…"
       Rebuttal: Add a social networking component to this data repository so you know who (to the individual) created that data point.
  9. Elsevier Research Data Services: Goals
     1. Help increase the amount of data shared from the lab, enabling incidental collaboratories
     2. Help increase the value of the data shared by increasing annotation, normalization, provenance, enabling enhanced interoperability
     3. Help measure and deliver credit for shared data to the researchers, the institute, and the funding body, enabling more sustainable platforms
  10. RDS Guiding Principles:
      • In principle, all open data stays open and URLs, front end etc. stay where they are (i.e. with repository)
      • Collaboration is tailored to data repositories' unique needs/interests and of a 'service-model' type:
        – Aspects where collaboration is needed are discussed
        – A collaboration plan is drawn up using a Service-Level Agreement: agree on time, conditions, etc.
        – All communication, finance, IPR etc. is completely transparent at all times.
      • Very small (2/3 people) department; immediate communication; instant deployment of ideas
  11. RDS Approach:
      • Collaborate and build on relationships with data repositories (life science, earth science, others)
      • Integrate with other content sources, if possible
      • Build annotation and standardisation tools and processes to implement this
      • Develop next-generation infrastructure solutions for back-end integration
      • Explore creative revenue opportunities
  12. NIF Antibody Registry:
      Problem:
      • 95 antibodies were identified in 8 papers
      • 52 did not contain enough information to determine the antibody used
      • Some provided details in another paper
      • Failed to give species, vendor, catalog #
      Solution #1:
      • Journals ask authors to provide antibody catalog number
      • Link to NIF Registry from manufacturers'/vendors' sites
      Solution #2:
      • Pilot with a lab:
  13. Let's start with the Urban Lab
      • Getting antibodies
      • And messy bits
      • From the notebook
      • Into Nathan Urban's command center
      • By providing
        – 7" Tablets
        – Links to IgorPro
        – A dashboard UI
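The reporting gap described for the NIF Antibody Registry (papers that fail to give species, vendor, or catalog number) suggests a simple check that a notebook-to-dashboard tool could run at data entry. The sketch below is only an assumption about how such a check might look: the field names, the `missing_fields` and `percent_incomplete` helpers, and the sample entries are hypothetical and are not part of the NIF Registry or the Urban Lab pilot.

```python
# Fields the slide says papers failed to report; the key names are assumptions.
REQUIRED_FIELDS = ("species", "vendor", "catalog_number")


def missing_fields(antibody):
    """Return the required fields that are absent or blank in one record."""
    return [f for f in REQUIRED_FIELDS if not str(antibody.get(f) or "").strip()]


def percent_incomplete(incomplete, total):
    """Share of antibodies that cannot be identified, as a whole percentage."""
    return round(100 * incomplete / total)


# Hypothetical notebook entries, not taken from the papers in the survey.
entries = [
    {"name": "anti-GFP", "species": "rabbit", "vendor": "ExampleCo", "catalog_number": "AB-001"},
    {"name": "anti-NeuN", "species": "mouse", "vendor": "", "catalog_number": None},
]
report = {e["name"]: missing_fields(e) for e in entries}

# Figures from the slide: 52 of 95 antibodies lacked identifying information.
survey_share = percent_incomplete(52, 95)
```

With the slide's own numbers, `survey_share` comes out at roughly 55 percent, which is the scale of the problem the registry is meant to address. Flagging the gap when the antibody is first recorded is cheaper than reconstructing it after publication.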
  14. My questions to you:
      • Thoughts on this approach:
        – In principle?
        – In practice?
      • Do you see serious hurdles:
        – Are we overlapping with other initiatives; if so, are we complementary?
        – How does this connect to libraries/local repositories?
        – Are there sensitivities/pain points we are overlooking?
      • Where to start:
        – How to collaborate?
        – Who to talk to – funding agencies, societies: who else?
        – Thoughts on data repositories/platforms to connect to?
  15. Your questions to me?
      a.dewaard@elsevier.com
      http://elsatglabs.com/labs/anita/
      http://www.slideshare.net/anitawaard
      Thanks go to:
      • Anita Bandrowski and Maryann Martone, NIF
      • Nathan Urban, Shreejoy Tripathy, CMU
      • David Marques, SVP RDS
