
Aggregation as Tactic


Presented by Peter Burnhill and Stuart Macdonald at CERN Workshop on Innovations in Scholarly Communication (OAI7), Geneva Switzerland, 23 June 2011.

Published in: Education, Technology

  1. ‘aggregation as a tactic’ - to support discovery
     Peter Burnhill & Stuart Macdonald, EDINA national data centre, University of Edinburgh
     CERN workshop on Innovations in Scholarly Communication (OAI7), University of Geneva, 23 June 2011
  2. Context
     RDTF Vision:
     • The joint JISC / RLUK Resource Discovery Task Force (RDTF) Vision: “UK researchers and students will have easy, flexible, and ongoing access to content and services through a collaborative, aggregated and integrated resource discovery and delivery framework which is comprehensive, open and sustainable”
     • Making content more discoverable both by people and machine via a mixed economy of technological solutions.
     The Discovery Initiative aims to:
     • Engage stakeholders across libraries, archives and museums
     • Build critical mass of open content to inspire others to participate
     • Encourage development of ‘purposeful aggregations and compelling applications’ - mashing at the macro-level
     • Exemplify what can be done across domains to free data and explore how to make that data work harder
     No one-size-fits-all solution!
  3. ‘aggregation as a tactic’ - a phrase coined to end an impasse during a meeting to discuss technical aspects of the RDTF Vision statement to identify stakeholder groups
     • Key concept in RDTF Vision is aggregation, directly or represented through metadata, to unlock the online & digital riches held in our organisations
     • Regard aggregation as intervention to exploit the telematic opportunity for things [that] are ‘remote, digital & published’ - a phrase derived from an IASSIST conference in 1990 exploring what it meant with the Internet if we regarded all [content] as ‘remote and published’.
     • The Web in the mid-1990s simplified and thus improved
     • Unfortunately, even now, much which is online and on the Web is badly or inadequately published …
     • We have to improve, re-interpreting what it means to be ‘well-published’
  4. The term aggregation is used a lot in computer science for:
       • “objects … assembled or configured together to create a more complex object” (UML, IBM)
       • “aggregating resources based on … properties. … they are owl:sameAs and their other properties can be intermixed.”
     For purposes of RDTF, aggregation means:
     • an assembly of data sources
       • more than a collection of objects (image banks, data services, catalogues, activity data), related or otherwise
     • for machine-as-user, independent of presentation layer
     However, aggregation is not a goal nor an end in itself. It is an intervention to be used for a twofold strategic purpose:
     • ‘improvement’ - merge & match, customisation and consumption, multiple output formats, reduce duplication of effort
     • ‘discoverability’ - via ‘promiscuous’ or ‘well-dressed’ metadata through e.g. Google or tailored services
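The quoted idea that resources declared owl:sameAs can have "their other properties intermixed" can be sketched in a few lines. This is only an illustration under stated assumptions, not the RDTF or EDINA approach: the triples, the prefixes (`lib:`, `mus:`, `arc:`) and the function name `merge_same_as` are all made up for the example, and identifiers are plain strings rather than full URIs.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"


def merge_same_as(triples):
    """Group subjects linked by owl:sameAs and pool their other properties.

    `triples` is an iterable of (subject, predicate, object) strings.
    Returns {representative_subject: {predicate: sorted list of objects}}.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smaller id as the representative.
            if rb < ra:
                ra, rb = rb, ra
            parent[rb] = ra

    triples = list(triples)
    for s, p, o in triples:
        if p == SAME_AS:
            union(s, o)

    merged = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        if p != SAME_AS:
            merged[find(s)][p].add(o)
    return {k: {p: sorted(v) for p, v in props.items()} for k, props in merged.items()}


# Hypothetical records from two contributing sources plus one unrelated record.
example = [
    ("lib:a", "owl:sameAs", "mus:b"),
    ("lib:a", "dc:title", "Films of Scotland"),
    ("mus:b", "dc:creator", "Unknown"),
    ("arc:c", "dc:title", "Other"),
]
result = merge_same_as(example)
```

Here `lib:a` and `mus:b` end up as one aggregated record carrying both the title and the creator, while `arc:c` stays separate. A real aggregation would also need to decide what to do when the pooled properties conflict.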
  5. Language & Perspectives
     Digital Library has mixed parentage - a ‘re-mix’ of the document tradition & the computation tradition
       • “approaches based on a concern with documents, with signifying records: archives, bibliography, documentation, librarianship, records management, and the like …” [Content Provider speak]
       • “approaches based on uses of formal techniques, whether mechanical (such as punch cards and data-processing equipment) or mathematical/computational (as in algorithmic procedures).” [Developer speak]
         • Prof. Michael Buckland, Presidential Address, American Society for Information Science, JASIS’s 50th (1998)
  6. Perspectives … as provider
     EDINA - develops and delivers JISC-sponsored national online services
       • adding value to data and content
         • Digimap Collections (OS mapping; SeaZone; BGS)
         • NewsfilmOnline (various; digitised with JISC £)
         • UK Access Management Federation (institutions; authentication)
     Data Library - move from support to middle folk
         • Research data support for Edinburgh researchers
         • Research data management guidelines, training, OER materials
         • Edinburgh DataShare - open data repository
         • RADAR - Researching A Data Asset Registry
     Maybe as ‘middle folk’ - cf. those who deal in middleware
         • sometimes having the role of creator and supplier of some service
         • sometimes being the user of what others supply
         • ‘inter-operator’
  7. Perspective … as aggregator: developing and delivering JISC-sponsored aggregation services
       • JISC MediaHub - links to collections & hosted content (c. 1m resources)
         • CultureGrid; First World War Poetry; Films of Scotland; Getty images (all content searchable and viewable within JISC MediaHub)
       • GoGeo! - metadata registry for spatially-referenced data
         • Geodoc Metadata creation tool, ShareGeo Open
       • SUNCAT - serials union catalogue: 80 libraries
         • metadata/links to full text, download MARC records (& XML & SUTRS - Simple Unstructured Text Record Syntax - data exchange format widely used in Z39.50)
       • PEPRS - e-journal preservation registry jointly led by EDINA with the ISSN International Centre
         • metadata registry of available back copy e-journals - aggregated from preservation agencies (incl. British Library, UK LOCKSS Alliance, CLOCKSS)
  8. Some RDTF-related projects @ EDINA
       • GOgeo Linked Data (GOLD) - triplify INSPIRE-compliant metadata to improve discoverability of metadata records via search engines
       • SUNCAT: Exploring Open [bibliographic] Metadata (working with OKF to open up data sent by contributing libraries; convert to RDF)
       • Sharing OpenURL Activity Data - monthly usage data: date & time; anonymised IP address/inst. ID; title; author; ISSN; DOI
         • Uses: article/journal recommendations, publishers reviewing what content is of interest to specific communities, innovative services to meet users’ needs
       • CHALICE - use data mining to extract placenames from the English Place Name Survey to create a UK historic gazetteer published as Linked Data & link it to the Geonames ontology on the semantic web.
       • AddressingHistory - geo-parsing of Scottish Post Office Directories, API onto digitised content, output in XML, CSV, JSON
       • 3 further case studies on other EDINA services illustrating how other collections can benefit from the same techniques.
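As a rough illustration of what "triplify" means for a metadata record, the sketch below turns a flat record into N-Triples lines. It is not the GOLD project's method: the record, the `example.org` subject URI and the function names are invented for the example, the field names are assumed to map directly onto Dublin Core terms, and every value is treated as a plain string literal. Real INSPIRE-compliant metadata is far richer than this.

```python
# Dublin Core terms namespace; field names below are assumed to be valid
# local names in it (title and spatial both are).
DCT = "http://purl.org/dc/terms/"


def escape_literal(value):
    """Escape the characters that N-Triples requires escaping in literals."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def triplify(subject_uri, record):
    """Return one N-Triples line per field of a flat metadata record."""
    lines = []
    for field, value in sorted(record.items()):
        lines.append(
            '<%s> <%s%s> "%s" .' % (subject_uri, DCT, field, escape_literal(value))
        )
    return lines


# Hypothetical record and subject URI, for illustration only.
record = {"title": 'Map of "Leith"', "spatial": "Edinburgh"}
triples = triplify("http://example.org/record/1", record)
print("\n".join(triples))
```

Once a record is in this form each statement stands on its own, which is what lets search engines and other services pick up and link individual facts rather than whole catalogue entries.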
  9. The end is the start of a new beginning …
     • In earlier ‘web time’ we had the MODELS ‘user-verbs’:
       • Discover -> Locate -> Request -> Access (Deliver)
       • Dempsey, Russell & Murray (1999)
       • where Access was the end game for us ‘middle folk’ even if the beginning & part of a deeper process for researchers, students …
     • Now there is a call for more than bilateral & negotiated interoperability, where Access is the beginning for developers and for other services
     • RDF/Linked Data enables information to be shared in a more Web-friendly way
     • RDF/Linked Data enables structure and content of those data sources to be explicit - vocabularies, ontologies, relationships
     • Exposing the complexity and relationship in the underlying data, hanging the insides on the outside!
  10. The treasures are on show inside, but … [Centre Pompidou]
  11. … and so to summarise …
     • Early web approaches focused on making content accessible for humans
     • hiding the complexity and relationship in the underlying data
       • paying attention to the user interface: HCI & GUI; Usability and Accessibility
     • However, to ensure content gets noticed it must be made easier for machines to understand by:
     • exposing the complexity and relationship in the underlying data
       • having in mind the machine-as-user: API as well as HCI
     • Aggregation should be seen as intervention, with strategic purpose:
       • to engage in value-added improvement of content
       • to enhance the discoverability of that which is ‘aggregated’
         • to be a focus of attention (through promiscuous metadata!)
     • If it is with RDF, then that’s good; don’t make a fuss if not
       • Publish RDBMS schemas, catalogue records, codebooks, and ancillary or related content in multiple, machine-readable formats
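The advice to publish in multiple machine-readable formats, RDF or not, can be sketched with the Python standard library alone. Everything here is an assumption made for illustration: the record, its field names, the placeholder ISSN and the function names are made up, and a real service would follow an agreed schema for each format.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def to_json(records):
    """Serialise a list of flat records as a JSON array."""
    return json.dumps(records, indent=2, sort_keys=True)


def to_csv(records, fields):
    """Serialise records as CSV with a header row, in the given field order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_xml(records):
    """Serialise records as simple XML: <records><record><field>…</field></record></records>."""
    root = ET.Element("records")
    for rec in records:
        item = ET.SubElement(root, "record")
        for key, value in rec.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical catalogue record; the ISSN is a placeholder, not a real one.
records = [{"title": "Serials list", "issn": "1234-5678"}]

as_json = to_json(records)
as_csv = to_csv(records, ["title", "issn"])
as_xml = to_xml(records)
```

The same source records feed all three outputs, so the cost of each extra format is one small serialiser, while each one makes the data usable by a different kind of consumer.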
  12. The Many Minds principle
     • “the coolest thing to do with your data will be thought of by someone else”
     • Using data as the building platform
     • Jo Walsh & Rufus Pollock (2007-05-17). Open Data and Componentization. XTech 2007 (slide 14)
     • “Benefits of freeing data are many, arguably being the most relevant one the ‘Many Minds principle’: there’ll always be someone that will find out a way to reuse data that you wouldn’t have even figured.”
     • José Manuel Alonso, Notes from the 5th Internet, Law and Politics Conference: The Pros and Cons of Social Networking Sites, organized by the Open University of Catalonia, School of Law and Political Science, and held in Barcelona, Spain, on July 6th and 7th, 2009.
  13. THANK YOU
     [email_address] [email_address]
     Repository Fringe 2011 - call for participants
     Image by enggul courtesy of Flickr, CC BY-NC-ND 2.0