Towards a global open scientific notebook infrastructure
Jeremy Frey, Andrew Milsted, Simon Coles, Colin Bird, Cerys Willoughby, Cameron Neylon & Matthew Todd
Science is increasingly interdisciplinary
Infrastructures - Architecture
• Collaboration
• Sharing
• Curation
• Reuse
Comparison with traditional paper notebooks
Electronic Laboratory Notebooks (ELNs)
• Higher Quality Record
• Natural linking to data and external resources
• Easier Collaboration
• Improved planning
• Improved discussions
• Efficiency gain in production of presentations/reports
• Change the nature of Professor/Student interactions
Communication, Collaboration, Sharing, Linking, Curating
Developments in ELN implementation and characteristics
[Timeline figure, 1980 to 2010. Implementations shown: RS/1, PNNL, Smart Tea, LabTrove, Web 2.0, Commercial offerings. Characteristics shown: Semantics, User focus, Collaboration, Trust in ELNs for IP compliance.]
The LabTrove story
http://www.labtrove.org
How do we communicate?
• Surprisingly difficult to explain what a process involves
• Much of the detail is assumed to be understood and not explicitly discussed
• This is where the misunderstandings usually arise.

"If you can't describe what you are doing as a process, you don't know what you're doing." W. Edwards Deming

Growing need for the global (virtual) equivalent of the “Tea Room”
LabTrove: Easy Communication
AutoTrove from Matlab
Computational processes also blog
BlogMyData Project - Godiva
LabTrove Open Notebooks: Mat Todd’s PZQ Project
Open Notebooks
• Troves can be open Read/Comment/Write
  – Can control this access so it is your choice
• All contributions attributable (login needed)
  – Anonymous contributions not usually enabled
• Open contribution does worry the IT services
  – Provides potential pathway for abuse of systems
  – Not just our systems
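
The access rules on this slide can be sketched in a few lines of Python. This is only an illustrative model, not LabTrove's actual implementation or API: the `Access` levels, the `Trove` class and its `allowed` method are hypothetical names, and the assumption that each level includes the ones below it is ours.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Optional


class Access(IntEnum):
    """Hypothetical access levels; each level is assumed to include those below it."""
    NONE = 0
    READ = 1
    COMMENT = 2
    WRITE = 3


@dataclass
class Trove:
    """Hypothetical notebook with a public access level and per-user levels."""
    name: str
    public_access: Access = Access.NONE          # what anyone may do
    members: dict = field(default_factory=dict)  # username -> Access
    allow_anonymous: bool = False                # contributions without login

    def allowed(self, action: Access, user: Optional[str] = None) -> bool:
        """Return True if `user` (None means not logged in) may perform `action`."""
        # Reading may be open to anyone, but contributions must be attributable
        # unless anonymous contributions have been explicitly enabled.
        if user is None and action > Access.READ and not self.allow_anonymous:
            return False
        level = max(self.public_access, self.members.get(user, Access.NONE))
        return action <= level


# An open notebook: anyone can read, logged-in visitors can comment,
# and only a named member can write.
open_trove = Trove("example", public_access=Access.COMMENT,
                   members={"owner": Access.WRITE})
```

With this model, opening or closing a notebook is a change to `public_access` rather than a different system, which is one way to read "it is your choice" on the slide.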
Global open scientific notebook infrastructure
• Global collaboration:
  – International
  – Interdisciplinary
• Open science

• To ascend the knowledge pyramid, we need
  open collaboration and sharing of results
We must speed up the knowledge discovery process
All I am saying is that now is the time to develop the technology to deflect an asteroid

RDAP13 Cerys Willoughby: Towards a global open scientific notebook infrastructure


Editor's Notes

  • #2 Talk will discuss applications of work that originated in Southampton on the development of electronic laboratory notebooks to support collaborative investigations, illustrated by work undertaken at Southampton, the ISIS neutron facility (Neylon) and the University of Sydney (Todd). The work comes out of the e-Science funding (CombeChem Project) from RCUK (Research Councils UK) [e-Science maps to Cyber-Infrastructure in the USA], further developed by funding from the Universities Modernization Fund, with collaborative R&D between chemistry, computer science and the library.
  • #3 Open Access debate has been high profile, but is primarily an economic argument. From our perspective the question is: open access to what? We are interested in access to the data! Thus the role of data management plans. The Royal Society report is key as it stresses that access to the data is essential for the whole basis of science, enabling other researchers to build on the published work. That is much harder, and can be impossible, if the data is not available (and easier if it is freely available), but only if the data is comprehensible, so intelligent access is highlighted as necessary (i.e. the importance of metadata).
  • #4 Infrastructure needs to support the collection and curation of data for high quality dissemination with context and provenance. Infrastructure parallels the DIKW (Data, Information, Knowledge, Wisdom) hierarchy.
  • #5 Having the ELN leads to changes in behaviour.
  • #6 Development of the ELNs: the trade-off in effort devoted to Semantics, Usability and IP, building these up over time, showing our Smart Tea and LabTrove projects.
  • #7 The LabTrove system – designed to be quite easy to use for open and closed projects; allows and encourages use of metadata but does not require or enforce it – the approach needed for adoption. Open Source software, with hosting and advice services.
  • #8 Skip this slide – LabTrove was further developed under the SRF project
  • #9 Process is important! As important as the Data. Need to describe as we can’t all “visit” – global tea room [Chemists are big on tea rooms]
  • #10 Images are important: able to sketch comments as well as text comments, with highly linked notes. For example, starting from a record (post) about a substrate, one can trace which processes used this substrate and what results were then produced, so if it transpires there was an issue with the material, the consequences can be readily traced.
  • #11 Computational processes can “blog” as well. A Matlab script can be run from a publish script so that all aspects (data, code, figures, output) are added to a Trove to give full provenance of a figure/result, so a clear record is kept of what material generated what outputs. Very useful once students have left and figures need modifying for a paper.
  • #12 Comments on computational models – in this case GODIVA is a way to show ocean models over the web (University of Reading), and with LabTrove added people can comment on geo-coded regions of the model results and have the video in the post – metadata is taken from the models and put in the Trove.
  • #13 Just shows the use in the x-ray project… computationally intensive image reconstruction in a complex, multi-disciplinary project, use of timelines, I have this to show that my work is grounded in physical science as well as computer science. You may want to stress your background in usability which is as we know so important to actually making this all work
  • #14 Examples from USyd of the Open Notebook science use in malaria drugs. Enables global collaboration, link back to notebook from the publications, has industrial participation, links with other platforms (wiki etc). Pictures of the research are really useful.
  • #15 Social media to disseminate open research, links to Twitter, and perhaps Facebook etc, make sure metadata is good enough for search engines to find, perhaps need some specialist metadata for research findings, researcher and funder ids are certainly useful!
  • #16 Attribution requires similar infrastructure to security, so switching between Open Notebook Science, Open on Publication, and Closed (i.e. industrially funded private research) is not hard: in industry the work may not be public but often does need to be shared within the company, so similar issues to Open Science apply.
  • #17 Well, more rapidly and more efficiently, but openness is viewed by many as a problem when it comes to establishing reputation and advancement in career or potential financial gain. Open does not mean free, perhaps free at the point of use, but someone has paid for the work and is paying to maintain the access. Could comment that the collective action of the long tail of laboratory science needs the global collaboration that semantics + the web (not necessarily the formal semantic web) provides.
  • #18 Attitudes to undertaking research need to change so that when data is collected the assumption is that it will be shared (at some point) and that collaboration is essential for rapid progress – don't wait until it is right before you share, at least with your collaborators. This is something students seem to resist, not understanding that sharing and discussing is the best way to find out what is right.
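
The tracing described in notes #10 and #11 (from a substrate, or a script, to everything produced from it) can be sketched as a walk over linked records. This is a minimal illustration only: the `Post` class, its `uses` field, the `downstream` function and the sample records are hypothetical and do not reflect LabTrove's data model.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Post:
    """One notebook record; `uses` lists the ids of posts it was derived from."""
    post_id: str
    title: str
    uses: list = field(default_factory=list)


def downstream(posts, start_id):
    """Return ids of every post that depends, directly or indirectly, on start_id."""
    # Invert the "uses" links so we can walk from a material to its consequences.
    used_by = {}
    for post in posts:
        for source in post.uses:
            used_by.setdefault(source, []).append(post.post_id)

    # Breadth-first walk over the inverted links.
    affected = []
    seen = {start_id}
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        for child in used_by.get(current, []):
            if child not in seen:
                seen.add(child)
                affected.append(child)
                queue.append(child)
    return affected


# Made-up records for illustration.
notebook = [
    Post("substrate-1", "Substrate batch"),
    Post("reaction-1", "Reaction using the substrate", uses=["substrate-1"]),
    Post("spectrum-1", "Spectrum of product", uses=["reaction-1"]),
    Post("script-1", "Plotting script"),
    Post("figure-1", "Figure for paper", uses=["spectrum-1", "script-1"]),
]

print(downstream(notebook, "substrate-1"))
```

If the substrate turns out to be faulty, the returned list names the reaction, the spectrum and the figure that would need checking; the same walk from `script-1` finds the figure that a modified script would regenerate.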