Aggregation as tactic sm new
 

Usage Rights

© All Rights Reserved

  • Based on a presentation by Peter Burnhill at the launch of Discovery – “a UK metadata ecology for UK education and research” – in May 2011. JISC consolidation of service.
  • Creating new or novel knowledge products, increasing serendipity, cross-fertilisation of resources. A metadata ecology for UK education and research. No one-size-fits-all solutions – mixed economies (technical and subject expertise, infrastructures, cultural differences across domains and organisations).
  • A key concept in the RDTF Vision is aggregation, directly or represented through metadata: to unlock riches held in our organisations, typically digital and online. This recognises the added value in assembly for tactical purpose – to improve ‘discoverability’. The ultimate aim is to improve access for research and educational purposes: for researchers, students and their teachers, with few barriers for potential take-up beyond that. Promiscuous metadata, washing your dirty metadata in public, making sure your metadata is well dressed! It’s very easy to publish on the web. Maximising the full potential of that content is another matter entirely.
  • The audience for any given work, service or aggregation is now machine as well as human. Make content easier to use in the global information ecosystem: reducing technical and licensing barriers, potential to create new knowledge products or services, foster social innovation. Let your metadata speak for you when you have no one to speak for you. Removal of duplication of effort.
  • The process of disintermediation
  • Technical support provided by EDINA SDSS Expert Group in Identity and Access Management. Keep at a strategic level rather than diving into detailed issues and concerns. Orientate towards opportunities and feasible steps, however small those might be.
  • Restricted and unrestricted content outside JISC Media Hub; collections inside JISC Media Hub available under subscription.
  • SUNCAT – getting permission from contributing libraries to make data available under ODC PDDL. Carmichael Watson – Gaelic folklorist – digitising his greatest work, Carmina Gadelica, an anthology of Hebridean charms, hymns, songs and poetry.
  • Aggregation should be regarded as intervention to achieve value-added improvement and context as an aid to discoverability. Among their recommendations is that “such Aggregations should have supported APIs which are attractive to and convenient for developers”. Interpretation of other recommendations suggests that the Framework should: (a) assist Aggregators (extant and as funded aggregation projects) to demonstrate value to Content Providers were they to progressively follow each of the four Linked Data steps; (b) encourage Content Providers to provide a semantic sitemap prior to data aggregation, e.g. publication of the RDF schema of the underlying database, from a registry of recommended schemas provided as guidance by Aggregation services. http://www.w3.org/DesignIssues/LinkedData
  • Multi-part work – data is meaningless without ancillary material (i.e. it provides both context & meaning) – ancillary material in machine-readable formats. Publish your DDI-compliant XML codebook.
  • An extension of one of Tim O’Reilly’s 7 principles of Web 2.0 (2005) – Harnessing Collective Intelligence. Bearing in mind that there is a cost to keeping data closed! Open data are really open when they can ‘always’ be reused!

Aggregation as tactic sm new Presentation Transcript

  • ‘aggregation as a tactic’ - to support discovery. Peter Burnhill & Stuart Macdonald, EDINA national data centre, University of Edinburgh. CERN workshop on Innovations in Scholarly Communication (OAI7), University of Geneva, 23 June 2011
    • RDTF Vision:
    • The joint JISC / RLUK Resource Discovery Task Force (RDTF) Vision:
    • “UK researchers and students will have easy, flexible, and ongoing access to content and services through a collaborative, aggregated and integrated resource discovery and delivery framework which is comprehensive, open and sustainable”
    • Making content more discoverable both by people and machine via a mixed economy of technological solutions.
    • The Discovery Initiative aims to:
    • Engage stakeholders across libraries, archives and museums
    • Build critical mass of open content to inspire others to participate
    • Encourage development of ‘purposeful aggregations and compelling applications’ - mashing at the macro-level
    • Exemplify what can be done across domains to free data and explore how to make that data work harder
    • No one-size-fits-all solution!
    Context
    • Key concept in RDTF Vision is aggregation, directly or represented through metadata – to unlock the online & digital riches held in our organisations
    • Regard aggregation as intervention to exploit the telematic opportunity for things [that] are ‘remote, digital & published’ - a phrase derived from an IASSIST conference in 1990 exploring what it meant with the Internet if we regarded all [content] as ‘remote and published’.
    • The Web in mid-1990s simplified and thus improved
    • Unfortunately, even now, much which is online and on the Web is badly or inadequately published …
    • We have to improve, re-interpreting what it means to be ‘well-published’
    ‘aggregation as a tactic’ - a phrase coined to end an impasse during a meeting to discuss technical aspects of the RDTF Vision statement to identify stakeholder groups
    • The term aggregation is used a lot in computer science for:
      • “objects … assembled or configured together to create a more complex object” UML, IBM
      • “aggregating resources based on … properties. … they are owl:sameAs and their other properties can be intermixed.”
    • For purposes of RDTF aggregation means:
    • an assembly of data sources
      • more than a collection of objects (image banks, data services, catalogues, activity data) – related or otherwise
    • for machine-as-user – independent of presentation layer
    • However, aggregation is neither a goal nor an end in itself - it is an intervention to be used for a twofold strategic purpose:
    • ‘improvement’ - merge & match, customisation and consumption, multiple output formats, reduce duplication of effort
    • ‘discoverability’ – via ‘promiscuous’ or ‘well-dressed’ metadata through e.g. Google or tailored services
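The ‘merge & match’ idea, and the owl:sameAs sense of aggregation quoted above, can be illustrated with a minimal Python sketch. It assumes metadata arrives as simple (subject, predicate, object) tuples; the identifiers `libA:j1` and `libB:x9`, the placeholder ISSN and the function name `merge_same_as` are all made up for illustration and do not come from any EDINA service.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"


def merge_same_as(triples):
    """Group subjects linked by owl:sameAs and pool their other properties."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Use the lexically smaller identifier as representative,
            # so the result does not depend on input order.
            parent[max(ra, rb)] = min(ra, rb)

    # First pass: record which subjects are declared equivalent.
    for s, p, o in triples:
        if p == SAME_AS:
            union(s, o)

    # Second pass: attach every other property to the representative.
    merged = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p != SAME_AS:
            merged[find(s)][p].add(o)

    return {
        subject: {p: sorted(values) for p, values in props.items()}
        for subject, props in merged.items()
    }


# Hypothetical records from two contributing libraries.
triples = [
    ("libA:j1", "dc:title", "Journal of Examples"),
    ("libB:x9", "dc:identifier", "issn:0000-0000"),
    ("libA:j1", "owl:sameAs", "libB:x9"),
]

aggregated = merge_same_as(triples)
```

Here the title held by one library and the identifier held by the other end up on a single aggregated record, which is the ‘improvement’ half of the twofold purpose.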
    • Digital Library has mixed parentage - a ‘re-mix’ of the document tradition & the computation tradition
      • “approaches based on a concern with documents, with signifying records: archives, bibliography, documentation, librarianship, records management, and the like …” [Content Provider speak]
      • “approaches based on uses of formal techniques, whether mechanical (such as punch cards and data-processing equipment) or mathematical/computational (as in algorithmic procedures).” [Developer speak]
        • Prof. Michael Buckland, Presidential Address, American Society for Information Science, JASIS’s 50th (1998)
        • http://people.ischool.berkeley.edu/~buckland/asis62.html
    Language & Perspectives
    • EDINA - develops and delivers JISC-sponsored national online services
      • adding value to data and content
        • Digimap Collections (OS mapping; SeaZone; BGS)
        • NewsfilmOnline (various; digitised with JISC £)
        • UK Access Management Federation (institutions; authentication)
    • Data Library – move from support to middle folk
        • Research data support for Edinburgh researchers
        • Research data management guidelines, training, OER materials
        • Edinburgh DataShare – open data repository
        • RADAR – Researching A Data Asset Registry
    • Maybe as ‘middle folk’ - c.f. those who deal in middleware
        • sometimes having the role of creator and supplier of some service
        • sometimes being the user of what others supply
        • ‘inter-operator’
    Perspectives … as provider
  • Perspective … as aggregator: developing and delivering JISC-sponsored aggregation services
      • JISCMediahub - links to collections & hosted content (c. 1m resources)
        • CultureGrid; First World War Poetry; Films of Scotland; Getty images (all content searchable and viewable within JISC Media Hub)
      • GoGeo! - metadata registry for spatially-referenced data
        • Geodoc Metadata creation tool, ShareGeo Open
      • SUNCAT – serials union catalogue: 80 libraries
        • metadata/links to full text, download MARC records (& XML & SUTRS - Simple Unstructured Text Record Syntax - data exchange format widely used in Z39.50)
      • PEPRS - e-journal preservation registry jointly led by EDINA with the ISSN International Centre
        • metadata registry of available back copy e-journals - aggregated from preservation agencies (incl. British Library, UK LOCKSS Alliance, CLOCKSS)
  • Some RDTF-related projects @ EDINA
      • GOgeo Linked Data (GOLD) – triplify INSPIRE-compliant metadata to improve discoverability of metadata records via search engines
      • SUNCAT : Exploring Open [bibliographic] Metadata (working with OKF to open up data sent by contributing libraries – convert to RDF)
      • Sharing OpenURL Activity Data - monthly usage data: date & time; anonymised IP address/inst. ID; title; author; ISSN, DOI
      • Uses – article/journal recommendations, publishers reviewing what content is of interest to specific communities, innovative services to meet users’ needs
      • CHALICE – Use data mining to extract placenames from the English Place Name Survey to create a UK historic gazetteer published as Linked Data & link it to the Geonames ontology on the semantic web.
      • AddressingHistory – Geo-parsing of Scottish Post Office Directories, API onto digitised content, output in XML, CSV, JSON
      • 3 further case studies on other EDINA services illustrating how other collections can benefit from the same techniques.
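As a rough illustration of what ‘triplifying’ a metadata record can involve, the sketch below turns a flat record into N-Triples lines. It is not the GOLD or SUNCAT implementation: the record, the `example.org` URIs and the function name `to_ntriples` are hypothetical, and only the two Dublin Core Terms predicates are real vocabulary.

```python
def to_ntriples(subject_uri, record, vocab):
    """Serialise a flat metadata record as N-Triples lines.

    Fields without a predicate in `vocab` are skipped; values that look
    like http(s) URLs become IRI objects, everything else a plain literal.
    """
    lines = []
    for field, value in sorted(record.items()):
        predicate = vocab.get(field)
        if predicate is None:
            continue  # no agreed predicate for this field, so leave it out
        if isinstance(value, str) and value.startswith(("http://", "https://")):
            obj = f"<{value}>"
        else:
            escaped = (
                str(value)
                .replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            obj = f'"{escaped}"'
        lines.append(f"<{subject_uri}> <{predicate}> {obj} .")
    return "\n".join(lines)


# Mapping from local field names to Dublin Core Terms predicates.
vocab = {
    "title": "http://purl.org/dc/terms/title",
    "spatial": "http://purl.org/dc/terms/spatial",
}

# Hypothetical catalogue record; `internal_id` has no mapping and is dropped.
record = {
    "title": 'Edinburgh "Old Town" survey',
    "spatial": "http://example.org/place/edinburgh",
    "internal_id": 42,
}

nt = to_ntriples("http://example.org/record/1", record, vocab)
print(nt)
```

The explicit mapping table is the interesting design choice: it is where a content provider states which of its fields correspond to shared vocabulary, which is what makes the record usable by a machine that has never seen the source database.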
  • The end is the start of a new beginning …
    • In earlier ‘web time’ we had the MODELS ‘user-verbs’:
      • Discover -> Locate -> Request -> Access (Deliver)
      • Dempsey, Russell & Murray (1999) http://www.ukoln.ac.uk/dlis/models/publications/utopia/
      • where Access was the end game for us ‘middle folk’ even if the beginning & part of a deeper process for researchers, students …
    • Now there is a call for more than bilateral & negotiated interoperability, where Access is the beginning for developers and for other services
    • RDF/Linked Data enables information to be shared in a more Web-friendly way
    • RDF/Linked Data enables structure and content of those data sources to be explicit - vocabularies, ontologies, relationships
    • Exposing the complexity and relationship in the underlying data, hanging the insides on the outside!
  • The treasures are on show inside, but … Centre Pompidou
  • … and so to summarise..
    • Early web approaches focused on making content accessible for humans
    • hiding the complexity and relationship in the underlying data
      • paying attention to the user interface: HCI & GUI; Usability and Accessibility
    • However to ensure content gets noticed it must be made easier for machines to understand by:
    • exposing the complexity and relationship in the underlying data
      • having in mind the machine-as-user: API as well as HCI
    • Aggregation should be seen as intervention, with strategic purpose:
      • to engage in value-added improvement of content
      • to enhance the discoverability of that which is ‘aggregated’
          • to be a focus of attention (thro’ promiscuous metadata!)
    • If it is with RDF, then that’s good; don’t make a fuss if not
      • Publish RDBMS schemas, catalogue records, codebooks, and ancillary or related content in multiple, machine-readable formats
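Publishing the same content in several machine-readable formats need not be elaborate. The following sketch uses only the Python standard library to emit one set of records as JSON, CSV and XML; the record fields and function names are invented for illustration and do not mirror the AddressingHistory API or any other EDINA interface.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def as_json(records):
    """Serialise records as a JSON array."""
    return json.dumps(records, indent=2, sort_keys=True)


def as_csv(records):
    """Serialise records as CSV, one column per field name seen."""
    fields = sorted({key for record in records for key in record})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def as_xml(records):
    """Serialise records as XML.

    Assumes every field name is also a valid XML element name.
    """
    root = ET.Element("records")
    for record in records:
        element = ET.SubElement(root, "record")
        for key in sorted(record):
            ET.SubElement(element, key).text = str(record[key])
    return ET.tostring(root, encoding="unicode")


# Hypothetical directory entry.
records = [{"name": "A. Smith", "street": "High Street", "year": 1851}]

json_out = as_json(records)
csv_out = as_csv(records)
xml_out = as_xml(records)
```

Keeping a single in-memory representation and deriving each format from it means the outputs cannot drift apart, and a further serialisation such as RDF can be added later without touching the others.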
  • The Many Minds principle
    • “the coolest thing to do with your data will be thought of by someone else”
    • Using data as the building platform
    • Jo Walsh & Rufus Pollock (2007-05-17). Open Data and Componentization. XTech 2007 (slide 14)
    • “Benefits of freeing data are many, arguably being the most relevant one the ‘Many Minds principle’: there’ll always be someone that will find out a way to reuse data that you wouldn’t have even figured.”
    • José Manuel Alonso, Notes from the 5th Internet, Law and Politics Conference: The Pros and Cons of Social Networking Sites, organized by the Open University of Catalonia, School of Law and Political Science, and held in Barcelona, Spain, on July 6th and 7th, 2009.
  • THANK YOU. [email_address] [email_address] http://edina.ac.uk/ Repository Fringe 2011 – call for participants: http://www.repositoryfringe.org/ Image by enggul courtesy of Flickr (CC BY-NC-ND 2.0) – http://www.flickr.com/photos/enggul/2361808668/