Web 2.0 and repositories - have we got our repository architecture right?


A presentation given at the Talis Xiphos Research Day, 10 June 2008.

  • Great share.
  • This is a very thoughtful and interesting presentation. I really like your point about replicating in IRs what we have done with collections in print. I currently intern at an IR, and am learning a lot about the process and the politics, and slideshows like this really open up my eyes to a few things.

    Thanks for creating this!
  • Thanks for sharing. Cool some interesting input for my workshop at OAI6 Where I ask: 'What can repositories learn from the web 2.0?'
  • This is so spot on. It is kind of embarrassing to admit as someone who has been in the 'repository' business far too long, but it wasn't until last year that I stopped thinking of 'search engine optimization' as the dirty words they are so often understood as.
  • Images used in this slide show:

    http://www.flickr.com/photos/striatic/729822/
    http://www.flickr.com/photos/estherase/128983854/
    http://www.flickr.com/photos/bwr/327994546/
    http://www.flickr.com/photos/pbo31/96243148/
    http://www.flickr.com/photos/dullhunk/303503677/
    http://www.flickr.com/photos/good_day/212468675/

Transcript

  • 1. Web 2.0 and repositories… … have we got our repository architecture right?
  • 2. Outline
    • where are we now?
    • what’s wrong with where we are now?
    • what can we do about it?
    • do we need a new vision?
  • 3. Where are we now?
  • 4. What is a repository?
    • “a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution. … An institutional repository is not simply a fixed set of software and hardware” (Cliff Lynch, 2003)
  • 5. Repository “doing” words
    • manage
    • deposit
    • disclose
    • make openly available
    • curate
    • preserve
  • 6. Repository content
    • all sorts… but most “academic” focus currently on
      • scholarly publications
      • learning objects
      • research data
  • 7. Repository content
    • all sorts… but most “academic” focus currently on
      • scholarly publications
      • learning objects
      • research data
    • this talk focuses on the first of these, but with the intention that most of what I say will be generic
  • 8. Repository architecture
    • largely institutional focus though some exceptions – arXiv, RePEC, JORUM, etc.
    • interoperability through centralised aggregators (national and global)
      • search services (OAIster, Intute, …)
      • registries (DOAR, ROAR, …)
    • harvesting metadata about content using OAI-PMH (metadata = simple Dublin Core)
    • content = PDF
    • SWORD as deposit API
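To make the harvesting model on this slide concrete, here is a minimal sketch of what an aggregator does with OAI-PMH and simple Dublin Core. The repository base URL and the sample record are hypothetical, and the response is an inline string rather than a live request; only the `ListRecords` verb, the `oai_dc` metadata prefix and the two namespaces come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request to list simple DC records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Hypothetical response from an imaginary repository (illustration only).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
          <dc:creator>Smith, J.</dc:creator>
          <dc:identifier>https://repository.example.org/1/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def extract_dc(xml_text):
    """Return one dict per record, mapping DC element names to lists of values."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter("{%s}record" % OAI_NS):
        fields = {}
        for el in record.iter():
            if el.tag.startswith("{%s}" % DC_NS):
                name = el.tag.split("}", 1)[1]
                fields.setdefault(name, []).append((el.text or "").strip())
        records.append(fields)
    return records


if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    print(extract_dc(SAMPLE_RESPONSE))
```

Note that the harvester only ever sees the flat metadata; the PDF behind `dc:identifier` is a separate fetch that OAI-PMH itself says nothing about.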
  • 9. What’s “wrong” with where we are now?
  • 10. #1 We talk about “repositories”…
  • 11. …rather than “the Web”
    • a focus on ‘making content available on the Web’ would be more intuitive to researchers
  • 12. Whatever happened to the CMS?
    • a focus on ‘content management’ would change our emphasis
    • OAI-PMH out…
    • search engine optimisation, usability, accessibility, Web design, tagging, information architecture, cool URIs in…
  • 13. #2 We don’t emphasise…
    • Google indexing
    • RSS feeds
    • widget technology – embedding functionality into other sites
  • 14. #3 Our focus is on sharing metadata…
    • … even though we have full-text to share
    • worse… the full-text we share tends to be PDF rather than native Web format
      • the Web equivalent of a cul de sac
    • and the metadata we share tends to be “simple Dublin Core”
      • little consistency in approaches to describing ‘files’ vs. ‘documents’
      • little consistency in naming authors and subjects
      • ultimately, it is both too simple and too complex!
  • 15. #4 We ignore the Web Architecture
    • we have tended to adopt service oriented approaches
    • in line with long tradition from Z39.50 to SOAP/WSDL
      • e.g. JISC eFramework
    • focus is on building “services on content” rather than on the “content”
    • (image credit: pbo31 @ flickr)
  • 16. REST is good
    • we don’t tend to adopt a resource oriented approach
    • we don’t adopt REST – an architectural style with a focus on resources, their identifiers (e.g. URIs), and a simple uniform set of operations that each resource supports (e.g. GET, PUT, POST, DELETE)
    • we don’t encourage a Web style “follow your nose” approach
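As a rough illustration of the resource oriented style described on this slide, the sketch below models resources keyed by URI, a small uniform set of operations, and a "follow your nose" walk over the links found in each representation. The `ResourceStore` class, the URIs and the `links` field are all hypothetical; it is an in-memory toy, not an HTTP server, and POST is left out for brevity.

```python
class ResourceStore:
    """In-memory stand-in for a set of Web resources identified by URI."""

    def __init__(self):
        self._resources = {}

    def put(self, uri, representation):
        """Create or replace the resource at uri; returns an HTTP-like status."""
        created = uri not in self._resources
        self._resources[uri] = dict(representation)
        return 201 if created else 200

    def get(self, uri):
        """Return (status, representation); representation is None on 404."""
        if uri not in self._resources:
            return 404, None
        return 200, dict(self._resources[uri])

    def delete(self, uri):
        """Remove the resource at uri if it exists."""
        if self._resources.pop(uri, None) is None:
            return 404
        return 204


def follow_your_nose(store, start_uri, max_resources=10):
    """Discover resources by following links, starting from a single URI.

    Stops once max_resources resources have been visited.
    """
    visited = []
    queue = [start_uri]
    while queue and len(visited) < max_resources:
        uri = queue.pop(0)
        if uri in visited:
            continue
        status, representation = store.get(uri)
        if status != 200:
            continue  # dangling link: skip it
        visited.append(uri)
        queue.extend(representation.get("links", {}).values())
    return visited


if __name__ == "__main__":
    store = ResourceStore()
    store.put("/papers/1", {"title": "An example eprint",
                            "links": {"author": "/people/7"}})
    store.put("/people/7", {"name": "A. Researcher",
                            "links": {"papers": "/papers/1"}})
    print(follow_your_nose(store, "/papers/1"))
```

The point of the sketch is that a client needs only one starting URI and the uniform operations; everything else is discovered from the representations themselves rather than from a service description.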
  • 17. #5 We are antisocial…
    • … at least, we tend to treat “content” in isolation from the “social networks” that need to grow around that content
    • successful “repositories” (Flickr, YouTube, Slideshare, etc.) promote the social activity that takes place around content as well as the content management and disclosure activity
      • friends, groups, social tagging, comments, embedding, re-purposing, etc.
  • 18. But not just about functionality…
    • the institutional approach has a fundamental mismatch with the real-life social networks adopted by researchers
      • subject-based
      • cross-institutional
      • global
    • while the institutional approach is good from the perspective of institutional management, preservation, etc.
    • globally “concentrated” repositories might better reflect the social networks that need to arise
  • 19. The net effect…
    • … is that there is no net effect
    • repositories remain uncompelling places to disclose scholarly publications from POV of the researcher
    • perceived cost of deposit remains higher than perceived benefits
    • we resort to institutional or funder mandates, “thou shalt deposit”, to fill what would otherwise remain empty
  • 20. Wait just a minute…
    • didn’t we use to have globally “concentrated” repository services?
    • arXiv – the first Web 2.0 service?
    • invented before the Web
    • unfortunately, also invented before Amazon S3
    • i.e. before we knew how to scale things
  • 21. Wait just another minute…
    • … doesn’t the blogosphere successfully layer a set of globally concentrated services over a distributed network of content?
      • e.g. Technorati
    • yes… but…
    • the content is under the control of ‘individuals’ rather than ‘institutions’, and…
    • the interoperability “glue” (RSS and tagging) is very lightweight and RESTful
  • 22. Having the conversation is hard
    • highly political space
    • strong “open access” voices who, understandably, don’t want their agenda de-railed by discussion about
      • preservation
      • search engine optimisation
      • Web 2.0
      • social networks
      • semantic Web
      • the future of peer review
    • it can be hard to get the conversation started
  • 23. What can we do about it?
  • 24. Things can go two ways…
    • I think that things can go two ways…
    • The Web 2.0 Way or The Semantic Web Way
    • … possibly both
  • 25. Things can go two ways… what would a Web 2.0 repository look like?
  • 26. Like this?
  • 27. A Web 2.0 repository?
    • high-quality browser-based document viewer (not Acrobat!)
    • tagging, commentary, more-like-this, favorites, …
    • persistent (cool) URIs to content
    • ability to form simple social groups
    • ability to embed documents in other Web sites
    • high visibility to Google
    • offer RSS as primary API
    • use of Amazon S3 to cope with scalability
  • 28. In short… we go “simple”
    • we develop simple(ish) repositories
    • and complex aggregators and search engines
    • RSS/Atom as primary “glue”
    • social tagging as “description”
    • full-text indexing
    • microformats
    • Google Sitemaps to guide harvesters to content
    • complex functional requirements (e.g. author disambiguation) either ignored or met thru complexity in aggregators
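A simple repository of this kind might expose little more than a feed and a sitemap. The sketch below generates a minimal Atom feed and a Sitemaps-protocol file from the same list of items using only the Python standard library. The item URIs, titles, authors and the feed URI are hypothetical, and the feed carries just the core elements (id, title, updated, author, link); a real feed would likely add summaries, categories for tags, and so on.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Prefixes only affect how the serialised XML looks, not what it means.
ET.register_namespace("atom", ATOM_NS)
ET.register_namespace("sm", SITEMAP_NS)

# Hypothetical repository items (illustration only).
ITEMS = [
    {"uri": "https://repository.example.org/papers/1",
     "title": "An example eprint",
     "author": "A. Researcher",
     "updated": "2008-06-10T09:00:00Z"},
    {"uri": "https://repository.example.org/papers/2",
     "title": "Another example eprint",
     "author": "B. Scholar",
     "updated": "2008-06-11T09:00:00Z"},
]


def _child(parent, ns, name, text=None, **attrs):
    """Append a namespaced child element with optional text and attributes."""
    el = ET.SubElement(parent, "{%s}%s" % (ns, name), attrs)
    if text is not None:
        el.text = text
    return el


def build_atom_feed(feed_uri, feed_title, items):
    """Serialise items as a minimal Atom feed (returned as a string)."""
    feed = ET.Element("{%s}feed" % ATOM_NS)
    _child(feed, ATOM_NS, "id", feed_uri)
    _child(feed, ATOM_NS, "title", feed_title)
    # Timestamps share one format, so string comparison finds the latest.
    _child(feed, ATOM_NS, "updated", max(item["updated"] for item in items))
    for item in items:
        entry = _child(feed, ATOM_NS, "entry")
        _child(entry, ATOM_NS, "id", item["uri"])
        _child(entry, ATOM_NS, "title", item["title"])
        _child(entry, ATOM_NS, "updated", item["updated"])
        author = _child(entry, ATOM_NS, "author")
        _child(author, ATOM_NS, "name", item["author"])
        _child(entry, ATOM_NS, "link", href=item["uri"], rel="alternate")
    return ET.tostring(feed, encoding="unicode")


def build_sitemap(items):
    """Serialise item URIs as a Sitemaps-protocol urlset (returned as a string)."""
    urlset = ET.Element("{%s}urlset" % SITEMAP_NS)
    for item in items:
        url = _child(urlset, SITEMAP_NS, "url")
        _child(url, SITEMAP_NS, "loc", item["uri"])
        _child(url, SITEMAP_NS, "lastmod", item["updated"])
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_atom_feed("https://repository.example.org/feed",
                          "Recent deposits", ITEMS))
    print(build_sitemap(ITEMS))
```

Both outputs are derived from one list of cool URIs, which is the point of going "simple": the repository publishes what it has, and anything harder is left to aggregators.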
  • 29. Alternatively… we go “complex”
    • …we look to the Semantic Web
    • we create and share much richer metadata about scholarly publications than we do currently
    • we explicitly model complexity (a la FRBR)
    • and aggregations
    • we expose resulting metadata thru the SW “graph”
  • 30. We go “complex”...
    • SWAP and ORE
  • 31. We go “complex”…
    • SWAP – Scholarly Works Application Profile
      • an application of the Dublin Core Abstract Model and Application Profiles
      • capturing relationships between works, expressions, manifestations, items and agents
    • ORE – OAI Object Re-use and Exchange
      • capturing relationships between aggregations and aggregated resources
      • note that ORE not tied to specific entity in FRBR
      • note that ORE implemented as profile of Atom
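  • 32. SWAP application profile model
    • [diagram] entities: ScholarlyWork, Expression, Manifestation, Copy, Agent, AffiliatedInstitution
    • [diagram] entity chain: ScholarlyWork isExpressedAs Expression, Expression isManifestedAs Manifestation, Manifestation isAvailableAs Copy
    • [diagram] agent relationships: isCreatedBy, isPublishedBy, isEditedBy, isFundedBy, isSupervisedBy
    • [diagram] cardinalities shown as 0..∞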
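One way to read the SWAP entities named on this slide is as a small nested data structure. The sketch below uses Python dataclasses for the ScholarlyWork, Expression, Manifestation and Copy chain. Which entity each agent relationship hangs off is an assumption based on my reading of the profile and should be checked against the SWAP documentation; the field names, sample values and the `copies_of` helper are hypothetical, and isFundedBy and isSupervisedBy are omitted to keep it short.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    name: str


@dataclass
class Copy:
    uri: str  # where this particular copy can be retrieved


@dataclass
class Manifestation:
    media_type: str  # e.g. "application/pdf" or "text/html"
    is_available_as: List[Copy] = field(default_factory=list)
    is_published_by: List[Agent] = field(default_factory=list)  # assumed placement


@dataclass
class Expression:
    version: str  # e.g. "author's accepted manuscript"
    is_manifested_as: List[Manifestation] = field(default_factory=list)
    is_edited_by: List[Agent] = field(default_factory=list)  # assumed placement


@dataclass
class ScholarlyWork:
    title: str
    is_expressed_as: List[Expression] = field(default_factory=list)
    is_created_by: List[Agent] = field(default_factory=list)  # assumed placement


def copies_of(work):
    """Walk work -> expressions -> manifestations -> copies and list copy URIs."""
    return [
        copy.uri
        for expression in work.is_expressed_as
        for manifestation in expression.is_manifested_as
        for copy in manifestation.is_available_as
    ]


if __name__ == "__main__":
    work = ScholarlyWork(
        title="An example eprint",
        is_created_by=[Agent("A. Researcher")],
        is_expressed_as=[
            Expression(
                version="author's accepted manuscript",
                is_manifested_as=[
                    Manifestation(
                        media_type="application/pdf",
                        is_available_as=[
                            Copy("https://repository.example.org/1/paper.pdf"),
                        ],
                    ),
                ],
            ),
        ],
    )
    print(copies_of(work))
```

Compared with a flat simple DC record, this keeps "the paper", its versions, its formats and its retrievable copies as separate things, which is the distinction simple DC tends to blur.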
  • 33. OAI ORE
  • 34. Summary
    • what can we learn from Web 2.0?
      • user interface design matters
      • global ‘concentration’ is an enabler of social interaction
    • simple DC is both too simple and too complex
    • richer DC application profiles such as SWAP and/or RDF applications like ORE may be a way forward
    • but need to ensure that their use does not over-complicate user interfaces and workflows
  • 35. A new vision?
  • 36. Flickr and digital cameras…
    • didn’t just take the practice of photography and put it on the Web
    • they fundamentally changed what photography was about
  • 37. What’s our vision?
    • the standards we adopt in the scholarly communication space…
    • OAI-PMH, OpenURL, DOI, PDF, …
    • are primarily about replicating in a Web world what we have always done on paper
    • this is not surprising given the necessary inertia of the scholarly communication life-cycle
    • but… do we need to re-envision scholarly communication as a true Web process?
    • if so, what would a repository look like?
  • 38. thank you