Andrew Treloar - The life‐sciences as a pathfinder in data‐intensive research practice Abstract: The

The advent of the Internet is bringing about fundamental changes in the ways that research is performed and communicated. These have been particularly driven by the growing importance of data, as well as the tools available to work with this data. This presentation will examine this shift, drawing on examples from the life‐sciences, and try to make some predictions about the next five years.

First presented at the 2014 Winter School in Mathematical and Computational Biology http://bioinformatics.org.au/ws14/program/

Published in: Science

  1. The life-sciences as a pathfinder in data-intensive research practice
     Dr Andrew Treloar, Director of Technology
     July 10, 2014, CC-BY-SA, @atreloar
  2. Structure of the presentation
     • Research lifecycles
     • Functions of scholarly communication
     • Pointers to the future
     • Characterising the future
     • Pathfinder problems
     • Conclusions
  3. So many lifecycles…
     July 10, 2014, CC-BY-SA, @hvdsomp and @atreloar
  4. Minimal Research Lifecycle
     Think, Do, Share
  5. Sharing: Scholarly Communication System and its Functions
     • Registration
     • Certification
     • Awareness
     • Archiving
     (Roosendaal and Geurts, 1997)
  6. System of Journals
     • Registration
       • submission of manuscript
     • Certification
       • peer-review (pre-publication)
       • commentary (post-publication)
     • Awareness
       • discovery services
     • Archiving
       • libraries (print)
       • publishers (electronic)
       • special purpose organisations (e.g. Portico)
  7. Pointers to the future
     “the future is already here – it’s just not very evenly distributed”
     William Gibson, NPR interview
  8. Registration: BioRxiv
  9. Registration: Github
  10. Registration: WikiPathways
  11. Registration: NeuroLex
  12. Registration: Nanopublications
  13. Registration: some observations
     • Decoupling registration from certification
     • Timestamping, versioning
     • Registration of various types of objects
     • Machines as creators and contributors
  14. Certification: PubMed Commons
  15. Certification: PubPeer
  16. Certification: Publons
  17. Certification: some observations
     • Peer-review decoupled from publication process
     • Certification of various types of objects
     • Machines validating form
     • Social endorsement
  18. Awareness: myExperiment
  19. Awareness: eLabNotebook RSS
  20. Awareness: Twitter
  21. Awareness: some observations
     • Awareness for various types of objects
     • Real time awareness
     • Awareness support targeted at machines
     • Awareness through social media
  22. Archiving: PDB
  23. Archiving: GenBank
  24. Characterising the future
     • Research process: from hidden to visible
     • Nature of object: from fixed to varying
     • Process of making public: from discrete to continuous
     • Speed of communication: from delayed to instant
     • Atomicity of object: from atomic to compound
     • Communicated object: from publication + data proxies to publication + linked data + linked models
     • Nature of process: from formal to informal
  25. Fundamental changes
     • The research process (objects, social dimension) is becoming more exposed
     • Articles, books are no longer the only relevant objects for research communication
     • Objects are no longer static
     • Machines are joining humans as (co-)creators and consumers of research objects
  26. Pathfinder problems
     • Integrity of the scholarly record
     • The three obsolescences
       • hardware
       • file format
       • software
  27. System of Journals: Archiving
  28. Web of Objects: Archiving?
  29. Not just citation relationships
  30. The problem of obsolescence
     • The life-science research environment can be viewed as undergoing a process of accelerated evolution
     • Other disciplines will hit these problems in time
  31. Cambrian explosion
  32. Hardware obsolescence: Roche 454
  33. Software obsolescence: too much choice, not enough support
  34. Abandonware
     • “Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.”
     • Sam Jaffe, Scientists Abandon their Software, The Scientist, Feb 16, 2004
  35. File format obsolescence: Illumina
     • Probability of error in basecalling encoded using ASCII code to reduce file size
     • Meaning of the ASCII code changed along the life cycle, and for data generated at different time points the quality might be encoded differently
     • “If you get an error like "Invalid quality score value", your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You'll need to add the option "-Q33" to your FASTX Toolkit arguments”. Obviously…
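
The offset problem on the slide above can be made concrete with a minimal sketch. It assumes the usual Phred convention (score = ASCII code minus offset, error probability = 10^(-Q/10)) and uses hypothetical helper names (`decode_quality`, `error_probability`, `guess_offset`) that do not come from the talk or from any toolkit. The offset guess is a rough heuristic with assumed thresholds, not a reliable detector.

```python
from typing import List, Optional

SANGER_OFFSET = 33    # Sanger and Illumina 1.8+ encoding
ILLUMINA_OFFSET = 64  # Illumina 1.3 to 1.7 encoding


def decode_quality(quality: str, offset: int) -> List[int]:
    """Convert a FASTQ quality string to scores for a given ASCII offset."""
    return [ord(ch) - offset for ch in quality]


def error_probability(phred: int) -> float:
    """A Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-phred / 10)


def guess_offset(quality: str) -> Optional[int]:
    """Heuristically guess the offset; return None when it is ambiguous.

    Assumed thresholds: characters below ';' (ASCII 59) cannot occur in
    offset-64 data, and characters above 'J' (ASCII 74) are unusual in
    offset-33 Illumina data.
    """
    if not quality:
        return None
    codes = [ord(ch) for ch in quality]
    if min(codes) < 59:
        return SANGER_OFFSET
    if max(codes) > 74:
        return ILLUMINA_OFFSET
    return None


if __name__ == "__main__":
    line = "IIIIHHGG"
    # The same characters give very different scores under the two offsets.
    print(decode_quality(line, SANGER_OFFSET))
    print(decode_quality(line, ILLUMINA_OFFSET))
    # This line only uses characters valid in both encodings.
    print(guess_offset(line))
```

The same string decodes to scores around 40 under offset 33 but below 10 under offset 64, so a wrong assumption silently changes the apparent quality of the data. When a file contains only characters valid in both encodings, the heuristic returns `None`, which is exactly the ambiguity the slide points to.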
  36. Everett Rogers, Diffusion of Innovations, 1962
  37. Conclusions
     • Need to move to a smaller number of standard file formats
     • Need to move to a more sustainable model of software development and maintenance
     • Need to encourage platform manufacturers to innovate around the hardware, not the software
     • NOTE: other disciplines are looking to life-sciences to work out how to solve some of these problems
  38. On best practices in the development of bioinformatics software, Front. Genet., 02 Jul 14
     • Source code available to reviewers
     • Software indexed, citable, available
     • Source code documented
     • Source code managed
     • Test libraries, sample data and dataset repositories available
  39. Questions?
     • andrew.treloar@ands.org.au
     • @atreloar
     • https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice
