The life-sciences as a pathfinder in data-intensive research practice

Presentation given at UQ Winterschool 2014. The advent of the Internet is bringing about fundamental changes in the ways that research is performed and communicated. These have been particularly driven by the growing importance of data, as well as the tools available to work with this data. This presentation will examine this shift, drawing on examples from the life‐sciences, and try to make some predictions about the next five years.

Published in: Science, Technology
  1. 1. The life-sciences as a pathfinder in data-intensive research practice Dr Andrew Treloar, Director of Technology 11 July 2014 CC-BY-SA, @atreloar 1
  2. 2. Structure of the presentation: Research Lifecycles; Functions of Scholarly Communication; Pointers to the future; Characterising the future; Pathfinder problems; Conclusions. 11 July 2014 CC-BY-SA, @atreloar 2
  3. 3. So many lifecycles… 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 3
  4. 4. Minimal Research Lifecycle: Think, Do, Share. 11 July 2014 CC-BY-SA, @atreloar 4
  5. 5. Sharing: Scholarly Communication System and its Functions. Registration; Certification; Awareness; Archiving (Roosendaal and Geurts, 1997). 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 5
  6. 6. System of Journals. Registration: submission of manuscript. Certification: peer review (pre-publication), commentary (post-publication). Awareness: discovery services. Archiving: libraries (print), publishers (electronic), special-purpose organisations (e.g. Portico). 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 6
  7. 7. Pointers to the future “the future is already here – it’s just not very evenly distributed” William Gibson, NPR interview 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 7
  8. 8. Registration: bioRxiv 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 8
  9. 9. Registration: GitHub 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 9
  10. 10. Registration: WikiPathways 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 10
  11. 11. Registration: NeuroLex 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 11
  12. 12. Registration: Nanopublications 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 12
  13. 13. Registration: some observations. Decoupling registration from certification; timestamping, versioning; registration of various types of objects; machines as creators and contributors. 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 13
  14. 14. Certification: PubMed Commons 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 14
  15. 15. Certification: PubPeer 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 15
  16. 16. Certification: Publons 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 16
  17. 17. Certification: some observations. Peer review decoupled from the publication process; certification of various types of objects; machines validating form; social endorsement. 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 17
  18. 18. Awareness: myExperiment 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 18
  19. 19. Awareness: eLabNotebook RSS 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 19
  20. 20. Awareness: Twitter 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 20
  21. 21. Awareness: some observations. Awareness for various types of objects; real-time awareness; awareness support targeted at machines; awareness through social media. 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 21
  22. 22. Archiving: PDB 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 22
  23. 23. Archiving: GenBank 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 23
  24. 24. Characterising the future. Research process: Hidden → Visible. Nature of object: Fixed → Varying. Process of making public: Discrete → Continuous. Speed of communication: Delayed → Instant. Atomicity of object: Atomic → Compound. Communicated object: Publication + data proxies → Publication + linked data + linked models. Nature of process: Formal → Informal. 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 24
  25. 25. Fundamental changes. The research process (objects, social dimension) is becoming more exposed; articles and books are no longer the only relevant objects for research communication; objects are no longer static; machines are joining humans as (co-)creators and consumers of research objects. 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 25
  26. 26. Pathfinder problems. Integrity of the scholarly record; the three obsolescences: hardware, file format, software. 11 July 2014 CC-BY-SA, @atreloar 26
  27. 27. System of Journals: Archiving 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 27
  28. 28. Web of Objects: Archiving? 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 28
  29. 29. Not just citation relationships 11 July 2014 CC-BY-SA, @hvdsomp and @atreloar 29
  30. 30. The problem of obsolescence. The life-science research environment can be viewed as undergoing a process of accelerated evolution; other disciplines will hit these problems in time. 11 July 2014 CC-BY-SA, @atreloar 30
  31. 31. Cambrian explosion 11 July 2014 31
  32. 32. Hardware obsolescence: Roche 454 11 July 2014 CC-BY-SA, @atreloar 32
  33. 33. Software obsolescence: too much choice, not enough support 11 July 2014 CC-BY-SA, @atreloar 33
  34. 34. Abandonware. “Last summer, a member of the biology department of the University of Udine in Italy approached Nicola Vitacolonna with an intriguing project. The ANREP program, which annotates structural motifs in gene or protein sequences, was out of date having been written more than a decade ago. Although still used by molecular biologists, its slow computing ability meant a straightforward multiple search could take all night on a desktop PC. The Udine biologist wanted Vitacolonna, a postdoctoral fellow in computational biology, to write a program that could do the job more quickly.” (Sam Jaffe, “Scientists Abandon their Software”, The Scientist, Feb 16, 2004) 11 July 2014 CC-BY-SA, @atreloar 34
  35. 35. File format obsolescence: Illumina. The probability of a base-calling error is encoded as an ASCII character to reduce file size. The meaning of the ASCII code changed over the platform's lifetime, so data generated at different points in time may encode quality differently. “If you get an error like "Invalid quality score value", your fastq file probably has Sanger (offset 33) instead of Illumina (ASCII offset 64) quality scores. You'll need to add the option "-Q33" to your FASTX Toolkit arguments”. Obviously… 11 July 2014 CC-BY-SA, @atreloar 35
  36. 36. Everett Rogers, Diffusion of Innovations, 1962 11 July 2014 CC-BY-SA, @atreloar 36
  37. 37. Conclusions. Need to move to a smaller number of standard file formats; need to move to a more sustainable model of software development and maintenance; need to encourage platform manufacturers to innovate around the hardware, not the software. NOTE: other disciplines are looking to the life sciences to work out how to solve some of these problems. 11 July 2014 CC-BY-SA, @atreloar 37
  38. 38. “On best practices in the development of bioinformatics software”, Front. Genet., 2 July 2014. Source code available to reviewers; software indexed, citable, available; source code documented; source code managed; test libraries, sample data and dataset repositories available. 11 July 2014 CC-BY-SA, @atreloar 38
  39. 39. Questions? andrew.treloar@ands.org.au; @atreloar; https://www.slideshare.net/atreloar/the-lifesciences-as-a-pathfinder-in-dataintensive-research-practice 11 July 2014 CC-BY-SA, @atreloar 39
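
The file-format problem on slide 35 can be made concrete with a small sketch. It is not part of the presentation and is not how the FASTX Toolkit works internally; the function names (`guess_phred_offset`, `decode_quality`, `error_probability`) are hypothetical, and the offset guess is only a heuristic. It assumes the two encodings named on the slide: Sanger (ASCII offset 33) and older Illumina (ASCII offset 64).

```python
def guess_phred_offset(quality: str) -> int:
    """Heuristically guess the ASCII offset (33 or 64) of a FASTQ quality string."""
    if not quality:
        raise ValueError("empty quality string")
    lowest = min(ord(ch) for ch in quality)
    # Offset-64 encodings do not use characters below ';' (code 59),
    # so anything lower must be offset 33. Higher characters are
    # ambiguous; this sketch simply assumes offset 64 for them.
    return 33 if lowest < 59 else 64


def decode_quality(quality: str, offset: int) -> list[int]:
    """Turn a quality string into integer scores for the given offset."""
    scores = [ord(ch) - offset for ch in quality]
    # Phred scores are non-negative. (Older Solexa scores, which can be
    # slightly negative, are deliberately ignored in this sketch.)
    if any(score < 0 for score in scores):
        raise ValueError(f"invalid quality score for offset {offset}")
    return scores


def error_probability(score: int) -> float:
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)
```

The same characters mean very different things under the two offsets: the string `IIII` decodes to scores of 40 under offset 33 but to scores of 9 under offset 64, which is roughly the difference between a 1-in-10,000 and a 1-in-8 chance of a wrong base call. Reading an offset-33 file as offset 64 can also produce negative scores, which is the kind of "Invalid quality score value" failure the slide quotes.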
