EuroSakai CLIF project presentation


A presentation given at the EuroSakai 2011 conference in Amsterdam on 27th September 2011. It covers the work of the CLIF project to investigate the management of the digital lifecycle across systems, using the integration of the Sakai collaboration and learning environment with the Fedora digital repository system as an exemplar.

  • The lifecycle diagram that came out of REMAP, entirely contained within the repository.
  • This lifecycle is given as an example, to emphasise that there are multiple stages, but that CLIF was not in the business of creating its own version. A focus on the individual stages was found to be more beneficial as this allowed us to be generic in our approach to the integrations carried out (as the stages were common between them).
  • These characteristics also came from the literature review
  • The Sakai part of this architecture should have a box indicating use of the Hydranet code, as for SharePoint. This highlights that a common means of communicating with Fedora was found, based on the Fedora web services and the construction of Hydra-compliant objects. In Sakai this interacted with the ContentHostingHandler (CHH), whilst in SharePoint it interacted with several components: workflow, deposit and browse.
  • Based on interviews and demonstrations with local academics and a records manager. Evaluation was related to how the digital content lifecycle could be managed, bearing in mind the integrated functionality demonstrated.
  • The reference to the structure of the content is a reference to the Hydra work and the benefit this brought the project. The last point refers to the situation of using content in situ in other systems, without moving it.
  • Please note that if there is nothing on the GitHub site yet, it will be soon!
  • EuroSakai CLIF project presentation

    1. 1. Enabling the digital content lifecycle: content flow between Sakai and Fedora<br />Chris Awre<br />Library and Learning Innovation<br />EuroSakai<br />Amsterdam, 27th September 2011<br />1<br />
    2. 2. CLIF Project<br />CLIF - Content Lifecycle Integration Framework<br />Funded by JISC<br /> 01 July 2009 – 31 March 2011<br />Project partners<br /> University of Hull<br /> King’s College London<br /> Centre for e-Research (CeRch)<br />2<br />
    3. 3. Background<br /><ul><li>CLIF is building on work within the JISC-funded RepoMMan and REMAP projects
    4. 4. In particular, REMAP explored how a repository could support records management and digital preservation as part of a lifecycle management approach for digital content
    5. 5. Previous work had sought to push the repository upstream in the workflow
    6. 6. Dilemma was that the repository risked becoming another content silo alongside other content management systems on campus (in our case, Sakai and SharePoint)
    7. 7. How can the repository become more integrated in the institutional environment?</li></ul>3<br />
    8. 8. Fedora<br /><ul><li>Powerful digital repository framework
    9. 9. Adopted at University of Hull in 2005
    10. 10. Live institutional repository since 2008
    11. 11. Developed and managed through DuraSpace
    12. 12. Strong community model, akin to Sakai
    13. 13. Features we like (the advert!)
    14. 14. Powerful digital object model
    15. 15. Extensible metadata management
    16. 16. Expressive inter-object relationships
    17. 17. Version management
    18. 18. Configurable security architecture</li></ul>4<br />
    19. 19. Local repository need<br /><ul><li>Scalable solution (not one that has upper limit)
    20. 20. Digital content is only going to grow
    21. 21. Standards-based (open standards where possible)
    22. 22. To provide a future-proof exit strategy
    23. 23. Content agnosticism
    24. 24. We don’t know what types of content may come along
    25. 25. Content semantics
    26. 26. Recording the relationships between different pieces of content supports future use and preservation </li></ul>5<br />
    27. 27. Other repository systems?<br /><ul><li>The focus of the work was based around systems that were in place at Hull
    28. 28. Other repository options were not actively considered
    29. 29. Following on from work looking at integration of DSpace and Sakai through CTREP project
    30. 30. Aimed to achieve the same end goal of seamless integration for Fedora
    31. 31. Regardless of the system, it is important to understand what you are trying to achieve in the management of content through integration
    32. 32. Repository choice driven by external factors of how repository management is carried out</li></ul>6<br />
    33. 33. CTREP<br /><ul><li>CTREP project was a JISC-funded project, 2007-9
    34. 34. Aimed to increase repository usage through integration within the LMS, using Sakai as the platform
    35. 35. Cambridge examined integration with DSpace
    36. 36. University of Highlands & Islands (UHI) examined integration with Fedora
    37. 37. Work focused on use of Sakai ContentHostingHandler
    38. 38. DSpace work successful, albeit that information being sent between the two was limited
    39. 39. Fedora work halted as it became clear that the version of Sakai CHH at the time was not able to deal with rich Fedora objects
    40. 40. Re-visiting this has been possible through Sakai developments
    41. 41. We are grateful to CTREP for pioneering this approach</li></ul>7<br />
    42. 42. Lifecycle<br />Lifecycle<br />management<br />within a<br />repository<br />8<br />Can this be<br />enabled across<br />systems?<br />
    43. 43. Lifecycle integration<br />9<br />Sakai<br />SharePoint<br />Repository<br />Content flows between systems according to need in lifecycle<br />
    44. 44. Sakai and content management<br /><ul><li>Content management for teaching & learning makes heavy use of the Resources tool
    45. 45. Some imaginative ways used for how content from here is used by other tools within the system
    46. 46. Content is also shared between sites, and staff are encouraged to make their content shareable
    47. 47. Focus of content management is to support use within Sakai
    48. 48. Focus is on Sakai, not the content
    49. 49. A content silo?
    50. 50. How could integration with a content store – a repository – enhance how Sakai manages and uses content?</li></ul>10<br />
    51. 51. CLIF project objectives<br /><ul><li>Understand how digital content can be managed across systems as part of the digital content lifecycle
    52. 52. Recognising that individual systems cannot always support the whole lifecycle from creation to preservation or deletion
    53. 53. Specifically investigate the role of repositories in the digital content lifecycle
    54. 54. Where is the repository best positioned within the lifecycle?
    55. 55. What roles can digital repositories play?
    56. 56. Understand how content will flow in and out of a repository as part of the lifecycle
    57. 57. CLIF has been agnostic about this</li></ul>11<br />
    58. 58. CLIF use cases I<br /><ul><li>Use cases cover research, teaching and administration
    59. 59. Based on interviews with staff at partner institutions
    60. 60. Academic staff (Head of Department / Senior Lecturer)
    61. 61. Records Manager
    62. 62. Research active staff
    63. 63. Interviews highlighted that staff were managing as best they could within single systems they were familiar with
    64. 64. Potential to exploit additional functionality in other systems welcomed</li></ul>12<br />
    65. 65. CLIF use cases II<br /><ul><li>Research
    66. 66. Capturing data produced through experimental equipment and archiving this for use in future work in the repository
    67. 67. Preparation of research outputs and archiving of these for dissemination
    68. 68. Teaching
    69. 69. Teaching materials accessed from within a repository to inform current courses
    70. 70. Exam papers created in one system and archived for future reference in the repository (marks could be archived for private access as well)
    71. 71. Administration
    72. 72. Committee papers circulated to committee members before a meeting are moved to the repository for wider access post-meeting</li></ul>13<br />
    73. 73. CLIF outputs<br /><ul><li>Literature review on managing the digital content lifecycle across systems
    74. 74. Technology integrations as exemplars of how a repository can support lifecycle management across systems
    75. 75. Fedora – Sakai integration
    76. 76. Fedora – SharePoint integration
    77. 77. Software available on GitHub
    78. 78. Technical appendix to final report describing architecture and implementation </li></ul>14<br />
    79. 79. A digital content lifecycle<br />15<br />There are many variations and<br />versions of lifecycle models<br /> - another is not required<br />Each has a number of stages<br />CLIF sought to capture use cases<br />that encompassed a number of<br />these stages and tested how they<br />could be managed across systems<br />© Digital Curation Centre<br />
    80. 80. Literature review<br /><ul><li>There was little literature directly addressing the system aspects of managing the digital content lifecycle
    81. 81. Work was focused within a system or was more architecture-based without addressing specific systems
    82. 82. Possibly due to flux in technology development
    83. 83. Terminology is key to addressing lifecycle management
    84. 84. There are many different lifecycles (knowledge, digitisation, metadata, etc.) that may overlap
    85. 85. Can be easier to break down the lifecycle into stages, many of which are common</li></ul>16<br />
    86. 86. Lifecycle characteristics<br /><ul><li>The use of standards can greatly ease movement between systems
    87. 87. cf. the use of the Hydra digital object approach
    88. 88. Policy is as important as technology in determining how different systems are used to manage a lifecycle
    89. 89. Digital preservation can be greatly supported if considered at the beginning of the lifecycle (as REMAP found)
    90. 90. There is a need to identify how people and roles fit into an overall lifecycle
    91. 91. It may be valuable to record information about the lifecycle itself as content moves, but this has resource implications
    92. 92. cf. the use of PREMIS events metadata recording what happens to an object</li></ul>17<br />
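The last point above, recording information about the lifecycle itself as content moves, might be sketched as follows. This is a minimal illustration rather than anything CLIF produced: the class and field names (`LifecycleEventLog`, `Event`, `logEvent`) are hypothetical, and the record only loosely mirrors the kind of information a PREMIS event carries (an event type, a date and time, a detail note, and the object it relates to).

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an append-only log of lifecycle events for digital objects.
final class LifecycleEventLog {

    // Loosely modelled on the information a PREMIS event carries (assumption).
    static final class Event {
        final String objectId;
        final String eventType;
        final Instant dateTime;
        final String detail;

        Event(String objectId, String eventType, Instant dateTime, String detail) {
            this.objectId = objectId;
            this.eventType = eventType;
            this.dateTime = dateTime;
            this.detail = detail;
        }
    }

    private final List<Event> events = new ArrayList<>();

    // Record that something happened to an object, e.g. a move between systems.
    void logEvent(String objectId, String eventType, Instant when, String detail) {
        events.add(new Event(objectId, eventType, when, detail));
    }

    // Return the events for one object, in the order they were recorded.
    List<Event> historyOf(String objectId) {
        List<Event> history = new ArrayList<>();
        for (Event e : events) {
            if (e.objectId.equals(objectId)) {
                history.add(e);
            }
        }
        return history;
    }

    public static void main(String[] args) {
        LifecycleEventLog log = new LifecycleEventLog();
        log.logEvent("demo:1", "creation",
                Instant.parse("2011-01-10T09:00:00Z"), "Created in Sakai Resources");
        log.logEvent("demo:1", "transfer",
                Instant.parse("2011-03-01T12:00:00Z"), "Copied from Sakai to the repository");
        for (Event e : log.historyOf("demo:1")) {
            System.out.println(e.dateTime + " " + e.eventType + ": " + e.detail);
        }
    }
}
```

Every logged event is extra data to store and manage, which is the resource implication the slide mentions.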
    93. 93. System overview<br />18<br />
    94. 94. Sakai – Fedora integration <br /><ul><li>Sakai 2.6.1
    95. 95. Fedora v3.4
    96. 96. Extends and enhances the JISC CTREP Fedora ContentHostingHandler plugin
    97. 97. CHH is a pluggable provider model for hosting content
    98. 98. Content displayed in standard Sakai Resources Tool
    99. 99. Enabled and configured by uploading a text file
    100. 100. Resources Tree view shows a ‘live view’ of a specific Fedora collection
    101. 101. ‘Show other sites’ allows files and/or nested folders to be copied/moved between MyWorkspace site and Fedora mounted site</li></ul>19<br />
    102. 102. .properties configuration file<br />20<br />
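The content of the configuration slide is not reproduced in this transcript, so the following is only a sketch of how a text file of this kind could be read. The property keys (`repository.url`, `repository.collection`), the example values and the `MountConfigSketch` class are all hypothetical and are not the keys used by the CLIF plugin; only `java.util.Properties` is a real API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch: reading a small .properties file describing a repository mount.
final class MountConfigSketch {

    // Holds the two settings this sketch assumes a mount would need.
    static final class MountConfig {
        final String repositoryUrl;
        final String collectionId;

        MountConfig(String repositoryUrl, String collectionId) {
            this.repositoryUrl = repositoryUrl;
            this.collectionId = collectionId;
        }
    }

    // Parse the uploaded text; the keys are placeholders, not the plugin's real keys.
    static MountConfig parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String url = props.getProperty("repository.url");
        String collection = props.getProperty("repository.collection");
        if (url == null || collection == null) {
            throw new IllegalArgumentException("Both repository.url and repository.collection are required");
        }
        return new MountConfig(url.trim(), collection.trim());
    }

    public static void main(String[] args) {
        String uploaded = "repository.url=http://localhost:8080/fedora\n"
                + "repository.collection=demo:collection1\n";
        MountConfig config = parse(uploaded);
        System.out.println(config.repositoryUrl + " -> " + config.collectionId);
    }
}
```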
    103. 103. Sakai to Fedora<br />21<br />
    104. 104. Or…<br />22<br />Resources Tool<br />CHS API<br />BaseContentService<br />DBContentService<br />ContentHostingHandlerResolverImpl<br />ContentHostingHandlerImplFedora<br />
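The call chain on this slide ends with a resolver handing requests to a Fedora-specific handler. The general idea of a pluggable provider, where paths beneath a mount point are delegated to a different store, can be sketched as below. The interface and class names are hypothetical stand-ins and do not reproduce Sakai's actual ContentHostingHandler interfaces.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a pluggable content handler chain; not Sakai's real API.
final class HandlerResolverSketch {

    // A store that can answer for resources beneath some path.
    interface ContentHandler {
        String describe(String path);
    }

    // Stands in for the default database-backed store.
    static final class DefaultStore implements ContentHandler {
        public String describe(String path) {
            return "local:" + path;
        }
    }

    // Stands in for a handler that maps paths onto a repository collection.
    static final class RepositoryHandler implements ContentHandler {
        private final String mountPoint;

        RepositoryHandler(String mountPoint) {
            this.mountPoint = mountPoint;
        }

        public String describe(String path) {
            // Strip the mount point so the remainder is relative to the collection.
            return "repository:" + path.substring(mountPoint.length());
        }
    }

    private final ContentHandler fallback = new DefaultStore();
    private final Map<String, ContentHandler> mounts = new LinkedHashMap<>();

    // Register a handler for everything beneath a mount point.
    void mount(String mountPoint, ContentHandler handler) {
        mounts.put(mountPoint, handler);
    }

    // Delegate to the first mounted handler whose mount point prefixes the path.
    String resolve(String path) {
        for (Map.Entry<String, ContentHandler> entry : mounts.entrySet()) {
            if (path.startsWith(entry.getKey())) {
                return entry.getValue().describe(path);
            }
        }
        return fallback.describe(path);
    }

    public static void main(String[] args) {
        HandlerResolverSketch resolver = new HandlerResolverSketch();
        String mountPoint = "/group/site1/fedora/";
        resolver.mount(mountPoint, new RepositoryHandler(mountPoint));
        System.out.println(resolver.resolve("/group/site1/fedora/reports/a.pdf"));
        System.out.println(resolver.resolve("/group/site1/notes.txt"));
    }
}
```

Because the calling tool only ever sees `resolve`, it does not need to know which store answered, which matches the point that users need not know they are working with the repository.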
    105. 105. Linking Sakai and Fedora<br /><ul><li>Content is held very differently in Sakai and Fedora
    106. 106. Sakai holds files
    107. 107. Fedora holds objects made up of a collection of datastreams, one of which is the file (others will contain metadata)
    108. 108. In linking Sakai and Fedora, three considerations need to be addressed
    109. 109. Displaying Fedora objects in a tree structure and Fedora collections as folders</li></ul>Issue for security around the objects<br /><ul><li>Depositing a file in Fedora from Sakai requires a Fedora object with associated metadata to be created
    110. 110. Retrieving a file from Fedora for use in Sakai requires use of the search capability within Fedora</li></ul>23<br />
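The difference between a file and a repository object, and the display of collections as folders, can be illustrated with a small in-memory model. This is a sketch under stated assumptions: `RepoObject`, `deposit` and `asTree` are hypothetical names, the datastream IDs are illustrative, and no Fedora API is called.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of repository objects; it does not talk to Fedora.
final class ObjectTreeSketch {

    // An object is a set of named datastreams; a collection also has members.
    static final class RepoObject {
        final String id;
        final String label;
        final boolean collection;
        final Map<String, String> datastreams = new LinkedHashMap<>();
        final List<RepoObject> members = new ArrayList<>();

        RepoObject(String id, String label, boolean collection) {
            this.id = id;
            this.label = label;
            this.collection = collection;
        }
    }

    // Depositing a file means wrapping it in a new object with minimal metadata.
    static RepoObject deposit(RepoObject parent, String id, String fileName, String content) {
        RepoObject object = new RepoObject(id, fileName, false);
        object.datastreams.put("content", content);                 // the file itself
        object.datastreams.put("descMetadata", "title=" + fileName); // illustrative metadata
        parent.members.add(object);
        return object;
    }

    // Render collections as folders (trailing slash) and other objects as files.
    static List<String> asTree(RepoObject root) {
        List<String> lines = new ArrayList<>();
        walk(root, 0, lines);
        return lines;
    }

    private static void walk(RepoObject node, int depth, List<String> lines) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        lines.add(indent + node.label + (node.collection ? "/" : ""));
        for (RepoObject member : node.members) {
            walk(member, depth + 1, lines);
        }
    }

    public static void main(String[] args) {
        RepoObject root = new RepoObject("demo:root", "Teaching", true);
        RepoObject exams = new RepoObject("demo:exams", "Exams", true);
        root.members.add(exams);
        deposit(exams, "demo:1", "paper.pdf", "(file bytes)");
        for (String line : asTree(root)) {
            System.out.println(line);
        }
    }
}
```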
    111. 111. Lessons learned<br /><ul><li>SOAP messaging between the two systems made the link very slow
    112. 112. Due to use of HTTPS
    113. 113. Switching to HTTP improved performance and allowed easier debugging
    114. 114. Other performance improvements included:
    115. 115. Caching of resources and folder objects
    116. 116. Minimising web service calls by using one call to retrieve multiple properties
    117. 117. No pre-fetching of datastreams
    118. 118. The CHH code is over-complicated at times
    119. 119. Impact of changes at high level can be extensive lower down</li></ul>24<br />
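Two of the performance measures above, caching and fetching several properties in one call, can be sketched together. The code is illustrative only: `RemoteRepository`, `CountingRepository` and the property names are hypothetical, and the remote service is simulated in memory so the number of calls can be counted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: cache per-object properties fetched in a single remote call.
final class CachingClientSketch {

    // Stands in for a slow web service that returns all properties at once.
    interface RemoteRepository {
        Map<String, String> fetchProperties(String objectId);
    }

    // Simulated remote service that counts how often it is called.
    static final class CountingRepository implements RemoteRepository {
        int calls = 0;

        public Map<String, String> fetchProperties(String objectId) {
            calls++;
            Map<String, String> props = new HashMap<>();
            props.put("label", "Label of " + objectId);
            props.put("state", "Active");
            props.put("owner", "demo");
            return props;
        }
    }

    private final RemoteRepository remote;
    private final Map<String, Map<String, String>> cache = new HashMap<>();

    CachingClientSketch(RemoteRepository remote) {
        this.remote = remote;
    }

    // The first lookup costs one remote call; later lookups are served from the cache.
    String property(String objectId, String name) {
        Map<String, String> props = cache.get(objectId);
        if (props == null) {
            props = remote.fetchProperties(objectId);
            cache.put(objectId, props);
        }
        return props.get(name);
    }

    // Drop a cached entry, e.g. after the object has been changed.
    void invalidate(String objectId) {
        cache.remove(objectId);
    }

    public static void main(String[] args) {
        CountingRepository remote = new CountingRepository();
        CachingClientSketch client = new CachingClientSketch(remote);
        client.property("demo:1", "label");
        client.property("demo:1", "state");
        client.property("demo:1", "owner");
        System.out.println("Remote calls made: " + remote.calls);
    }
}
```

The trade-off is staleness: cached values need invalidating when an object changes in the repository, which this sketch leaves to the caller.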
    120. 120. Sakai – Fedora features<br /><ul><li>The repository is embedded as a set of resources that appear like any other set of resources
    121. 121. The majority of menu functions work in the same manner as with standard resources, e.g., upload, copy, paste, move, delete, create
    122. 122. This applies to folders as well as individual objects
    123. 123. Folders represent collection objects in the repository
    124. 124. Metadata can be captured in Sakai for use in Fedora (though Sakai is not able to re-use this when retrieving an object from Fedora)
    125. 125. User can browse Fedora collection (though not yet search)
    126. 126. User does not need to know they are working with the repository</li></ul>25<br />
    127. 127. Fedora 2<br /><ul><li>Very flexible – this has made exchanging objects between Fedora instances and between Fedora and other systems difficult
    128. 128. Common approach to structuring digital objects is required
    129. 129. Systems interacting with Fedora can build objects using this common approach
    130. 130. CLIF adopted the approach developed through the Hydra project
    131. 131.</li></ul>26<br />
    132. 132. Fedora 2 contd.<br /><ul><li>Common structuring/modelling approach allows for object metadata to be edited in the repository as part of their lifecycle management
    133. 133. Each object has:</li></ul>rightsmetadata<br /><ul><li>…and could have…</li></ul>descmetadata (using MODS)<br />contentmetadata<br />techmetadata<br />etc.<br /><ul><li>If Sakai can provide this</li></ul>27<br />
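The structure described on this slide, one datastream every object has plus several it could have, lends itself to a simple check that a depositing system might run before sending an object. The sketch below is an assumption-laden illustration: `ObjectStructureCheck` is a hypothetical class, and the datastream IDs are written in camel case here as an assumption about naming rather than taken from the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: check an object's datastreams against a common structure.
final class ObjectStructureCheck {

    // The slide lists rights metadata as the datastream each object has.
    static final String REQUIRED = "rightsMetadata";

    // Datastreams an object could also have, per the slide (IDs are assumed spellings).
    static final List<String> OPTIONAL =
            Arrays.asList("descMetadata", "contentMetadata", "techMetadata");

    // An object is acceptable if the required datastream is present.
    static boolean isWellFormed(Set<String> datastreamIds) {
        return datastreamIds.contains(REQUIRED);
    }

    // List the optional datastreams that the depositing system did not supply.
    static List<String> missingOptional(Set<String> datastreamIds) {
        List<String> missing = new ArrayList<>();
        for (String id : OPTIONAL) {
            if (!datastreamIds.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<String> supplied = new TreeSet<>(Arrays.asList("rightsMetadata", "descMetadata"));
        System.out.println("Well formed: " + isWellFormed(supplied));
        System.out.println("Not supplied: " + missingOptional(supplied));
    }
}
```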
    134. 134. Copy/move to/from Repository<br />28<br />Copying and moving folders/files between Fedora and MyWorkspace is easy! Copy…<br />
    135. 135. Copy/move to/from Repository<br />…paste!<br />29<br />
    136. 136. It looks easy, but…<br />© 2008 Richard Green<br />… you don’t see what is going on underneath!<br />30<br />
    137. 137. Outstanding work<br /><ul><li>Managing versions from within Sakai, or accessing them, isn’t currently possible
    138. 138. Some of the commands under the Edit functionality have no current effect on the object in Fedora
    139. 139. The metadata captured is minimal, and Sakai cannot make use of metadata added within Fedora
    140. 140. Folders with large numbers of resources have a noticeable impact on performance when browsing or carrying out actions upon them</li></ul>31<br />
    141. 141. Evaluation<br /><ul><li>There needs to be a clear understanding and view about where the boundaries are between the different systems being used, to avoid confusion
    142. 142. There needs to be clarity over why different systems are being used, to overcome concerns about having to work with multiple systems
    143. 143. There is a need for better preservation and a recognition that integrating the repository could support this, but also a need to be clear about what needs preserving
    144. 144. There is benefit in being able to access other content stores from within your current working environment in order to see what is available more broadly</li></ul>32<br />
    145. 145. Sakai-repository evaluation<br /><ul><li>The seamless access was much valued
    146. 146. Having access to resources that could be used within Sakai was a valuable addition to being able to browse resources inside Sakai
    147. 147. Providing access to resources in context was considered very important, hence, linking to the files in the repository instead of copying them across may be preferred
    148. 148. Why create a copy if access is OK where the content is?
    149. 149. Reference or irregular content was considered to fit best into the model of access via repository
    150. 150. Bulk movement likely to be more useful than object by object movement</li></ul>33<br />
    151. 151. Sakai OAE<br /><ul><li>Focus on presentation of content in context
    152. 152. This tallies with findings in CLIF
    153. 153. Focus on use of APIs where available
    154. 154. Institutional repository systems are not so good at this
    155. 155. A challenge for these systems
    156. 156. Capturing annotations alongside original content would enhance archival records
    157. 157. Exporting multiple resources, as IMS CP or other, also a route for managing content across systems</li></ul>34<br />
    158. 158. Conclusions<br /><ul><li>Diverse content management systems can be effectively integrated to allow cross-system lifecycle management
    159. 159. Better adoption of interface standards would be helpful
    160. 160. Standardisation in the structure of the content being moved maximises how the content can be managed by the different systems
    161. 161. Where the repository is one of the systems involved its current primary role appears to be as a recipient of content (for preservation)
    162. 162. Perception that content in the repository can be used there without moving it into the other integrated systems </li></ul>35<br />
    163. 163. Demo<br />36<br />Copyright ©<br />
    164. 164. Thank you<br />Chris Awre –<br />Richard Green –<br />Andrew Thompson –<br />Simon Waddington –<br />Project website -<br />Project GitHub - and<br />Project final report -<br />37<br />