a centre of expertise in data curation and preservation




Moving the Repository upstream


                       Chris Rusbridge
                    ARROW Repositories Day
                       14 October 2008
                                                                                                 Funded by:
  This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 2.5
  UK: Scotland License. To view a copy of this license, visit
  http://creativecommons.org/licenses/by-nc-sa/2.5/scotland/ or send a letter to Creative Commons, 543 Howard
  Street, 5th Floor, San Francisco, California, 94105, USA.




                   Contents
    •   The resistant scholar
    •   Researcher work flow
    •   On negative clicks…
    •   Can the repository help rather than hinder?
    •   Towards a Research Repository System?




2                   ARROW Repositories Day




        The resistant scholar
    • Edinburgh Research Archive has 1100+
      (publicly accessible) items
    • Edinburgh scholarly output?
      • Wet finger in the air: annual output ~ number of
        academics? (the RAE wants 4 papers every 4 years)
      • i.e. ~2500 papers per year!
    • So after 4 years we have only ~10% of output
    • A common story everywhere!
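The slide's estimate can be written out explicitly. This is a minimal Python sketch using only the slide's own rough figures; the assumption of about 2,500 academics each producing about one paper a year is the slide's "wet finger" guess, not measured data. It gives roughly 10,000 papers over four years, so 1,100 deposited items is only about a tenth of output.

```python
# Back-of-envelope estimate of repository coverage, using the slide's rough figures.
ACADEMICS = 2500                      # implied by "~2500 papers per year"
PAPERS_PER_ACADEMIC_PER_YEAR = 4 / 4  # RAE: 4 papers every 4 years
YEARS = 4                             # age of the archive in the estimate
DEPOSITED_ITEMS = 1100                # "1100+" publicly accessible items


def coverage(deposited: int, academics: int, rate: float, years: int) -> float:
    """Fraction of estimated scholarly output held in the repository."""
    total_output = academics * rate * years
    return deposited / total_output


total = ACADEMICS * PAPERS_PER_ACADEMIC_PER_YEAR * YEARS
share = coverage(DEPOSITED_ITEMS, ACADEMICS, PAPERS_PER_ACADEMIC_PER_YEAR, YEARS)
print(f"Estimated output over {YEARS} years: {total:,.0f} papers")  # 10,000 papers
print(f"Share deposited: {share:.0%}")                               # 11%
```

Because "1100+" is a lower bound and the academic count is a guess, the result is only an order-of-magnitude indication.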






               Why is this so?
    • Uncertainty, risk?
        • About copyright
        • About the Ingelfinger rule
    •   Change
    •   Too busy
    •   Doesn’t fit the way they do things now
    •   Not well motivated by advantages to others
    •   Little in it for them!





       Researcher work flow?
    • Many projects/tasks in parallel
    • All at different stages
    • Teaching (several), research (several), writing
      up research, writing grant proposals,
      reviewing papers, administrative tasks,
      University governance, etc








         Researcher work flow?
    •   Think up research idea
    •   Write grant proposal with colleagues
    •   Submit, wait, refine/revise, resubmit
    •   Hire/assign staff, plan project
    •   Gather data, analyse data, refine hypothesis
    •   Refine methods, more data etc
    •   Write draft paper with colleagues
    •   Refine, revise, submit paper, repeat until successful
    •   New directions, more data, new paper
    •   Conference presentations, discussion, new research
        ideas…




    Who are you working with?
    •   Your group
    •   Your department
    •   Other departments in your university
    •   Colleagues elsewhere worldwide
        • Often more of the latter than the former!
        • Wide variety of IT environments








         Paper-writing work flow?
    • PI and co-PIs outline structure
    • PI assigns sections to colleagues
    • Gather sections, edit, circulate
    • Identify weaknesses, gather more data
    • Select & organise citations, images, tables, graphs,
      supplementary data
    • Comment, revise, circulate, repeat until deadline
    • Submit, wait
    • Revise following review, circulate, resubmit
        • By now working on other research!
    • It’s published! Add to bibliography…





       When do you submit to
           repository?
    • There’s no obvious point
    • It’s always extra work
    • Any doubt, uncertainty or distraction is enough to
      put it off
    • “The library” wants you to do it? Sure, RSN (real soon now)

    • The repository doesn’t help, it hinders






            On negative clicks
     • Research at Glasgow for the Effective Records
       Management project (JISC-funded)
       • Currall, Johnson, Johnston, Moss, Richmond
     • “How many extra clicks are you willing to
       make to ensure preservation of the records
       you are creating?”
       • Answer: zero
     • Design goal follows: reduce work for clerks
       (fewer clicks) AND ensure preservation






     How could a repository help?
     •   Support the research
     •   Support the researchers
     •   Support the writing
     •   Support the publishing
     •   Be a natural part of the work flow…

     • How could we do that?






      Negative click repository?
     • Could a repository reduce the workload of
       researchers?
       • Perhaps not on its own…
     • What would be needed to make things
       easier?
       • Maybe a system and services with the repository
         embedded?
       • Research Repository System







 Some known research issues
     • Extended teams across institutions, even legal
       jurisdictions
        • Varying technology: Windows/Mac/Linux versions, MS Office,
          OpenOffice, LaTeX, EndNote, BibTeX, etc
        • Localised, segregated identity management
        • Informal extranets
     • Distrust of anyone who (ever!) adds complexity or
       difficulty, even if only apparent
        • University, IT dept, Library, School…
     • Individualist local IT management
        •   Backup, version control, security, patching
        •   Data quantity, quality, provenance, metadata, version, sharing
        •   Analytic software version
        •   Lab notebooks…




           How could “we” help?
     • Who are “we”?
       •   Repository managers
       •   In/with the Library
       •   And IT Services
       •   Backed by the administration








              Maybe we could…
     • Help with publisher liaison
     • Support co-authoring across several institutions
         • More permissive identity management/extranet
     • Support multiple versions
         • Fine-grained access control
         • Checkpointing
     •   Support supplementary data
     •   Provide basic data management capability
     •   Provide simple, cross-platform, persistent storage
     •   Provide some longevity
     •   Provide additional benefits
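To make the "multiple versions / fine-grained access control / checkpointing" items concrete, here is a minimal in-memory Python sketch. It does not describe any existing repository platform: the `SharedDraft` class, its methods and the simple read/write permission model are hypothetical, chosen only to show how a named checkpoint (for example, the version sent to a journal) can be kept alongside ongoing edits by co-authors.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sketch: a shared draft that keeps every saved version,
# supports named checkpoints, and checks per-user permissions.
READ, WRITE = "read", "write"


@dataclass
class Version:
    number: int
    author: str
    content: str
    saved_at: datetime


@dataclass
class SharedDraft:
    title: str
    owner: str
    acl: dict[str, set[str]] = field(default_factory=dict)  # user -> permissions
    versions: list[Version] = field(default_factory=list)   # oldest first
    checkpoints: dict[str, int] = field(default_factory=dict)  # label -> version no.

    def __post_init__(self) -> None:
        # The owner always has full access.
        self.acl.setdefault(self.owner, set()).update({READ, WRITE})

    def grant(self, user: str, *perms: str) -> None:
        self.acl.setdefault(user, set()).update(perms)

    def _require(self, user: str, perm: str) -> None:
        if perm not in self.acl.get(user, set()):
            raise PermissionError(f"{user} lacks {perm} access to {self.title!r}")

    def save(self, user: str, content: str) -> int:
        """Append a new version; earlier versions are never overwritten."""
        self._require(user, WRITE)
        version = Version(len(self.versions) + 1, user, content,
                          datetime.now(timezone.utc))
        self.versions.append(version)
        return version.number

    def checkpoint(self, user: str, label: str) -> None:
        """Give the latest version a name, e.g. 'submitted'."""
        self._require(user, WRITE)
        if not self.versions:
            raise ValueError("nothing has been saved yet")
        self.checkpoints[label] = self.versions[-1].number

    def read(self, user: str, label: str | None = None) -> str:
        """Return the latest version, or the one a checkpoint label points to."""
        self._require(user, READ)
        if not self.versions:
            raise ValueError("nothing has been saved yet")
        number = self.versions[-1].number if label is None else self.checkpoints[label]
        return self.versions[number - 1].content


draft = SharedDraft("Paper draft", owner="pi")
draft.grant("coauthor", READ, WRITE)
draft.grant("reviewer", READ)
draft.save("pi", "Outline")
draft.save("coauthor", "Outline + section 2")
draft.checkpoint("pi", "submitted")
draft.save("pi", "Revised after review")
print(draft.read("reviewer", "submitted"))  # Outline + section 2
```

Keeping every version and only naming checkpoints means the submitted text can be recovered later without freezing the working draft, while read-only collaborators see what they are entitled to and cannot edit.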





                  Exposure
     • Digital Curation Blog posts
       • Comments and feedback
     • JISC Repository Ideascale discussions
       • Comments, feedback, voting
     • Blue Ribbon Task Force Ideascale
       discussions
       • Voting








               Some comments…
     •   Need to be careful of doing this as if it complicates the workflow, it just
         won't happen
     •   I think the RRS you envisage sounds fantastic and would be a 'good
         thing', what worries me is the 'function creep' taking us a few miles on
         from some of the more basic, simpler 'few keystrokes' approach…
     •   The RRS sounds to have many features of a Virtual Research
         Environment, albeit perhaps a less data centric VRE
     •   The availability of research via Open Access would increase if the same
         systems that provide Open Access also provided, or were integrated
         with, tools which support the authoring process
     •   My experience is, that feature requests of this sort [authoring support]
         are exactly the ones which end up in the "users didn't know what they
         wanted bin" (I'm a developer).
     •   A system to help streamline the editorial/publication process is fine (if
         we can persuade academics to use it)






                       CRIS
     • Several commenters spotted that Current Research
       Information Systems (CRIS) can provide “fill in the
       blanks” metadata
       •   Reduces workload
       •   Provides context
       •   Supports research disclosure
       •   Needs administration support
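The "fill in the blanks" idea can be illustrated with a small Python sketch: fields the researcher leaves empty on a deposit form are copied from the corresponding CRIS record, while anything they did enter is kept. The field names, the mapping and the example values are all hypothetical; a real CRIS and repository would each have their own schema, and the mapping would need to be agreed locally.

```python
# Hypothetical sketch: pre-fill a repository deposit record from a CRIS record.
# All field names here are illustrative assumptions, not a real CRIS or
# repository metadata schema.
CRIS_TO_DEPOSIT = {
    "title": "title",
    "authors": "creators",
    "project_id": "funding_project",
    "funder": "funder",
    "department": "department",
}


def fill_blanks(deposit: dict, cris_record: dict) -> dict:
    """Return a copy of `deposit` with empty fields filled from `cris_record`.

    Values the depositor has already entered are never overwritten.
    """
    filled = dict(deposit)
    for cris_field, deposit_field in CRIS_TO_DEPOSIT.items():
        value = cris_record.get(cris_field)
        if value and not filled.get(deposit_field):
            filled[deposit_field] = value
    return filled


cris_record = {  # illustrative values only
    "title": "Example paper title",
    "authors": ["A. Author", "B. Author"],
    "project_id": "PRJ-0001",
    "funder": "Example Funder",
    "department": "Example School",
}
deposit = {"title": "Example paper title (accepted version)", "creators": []}
print(fill_blanks(deposit, cris_record))
```

Here the researcher's own title survives, while creators, project, funder and department arrive from the CRIS without extra typing, which is the workload reduction the comments point to.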








Is there something useful here?



     • DISCUSS!








           Thank you
     c.rusbridge@ed.ac.uk



