Edina cigs-21-september-2012
 

    Presentation Transcript

    • Will’s World: Walking Through Shakespeare
      The use of Linked Data in the Shakespeare Registry Project
      Muriel Mewissen, Project Manager
      21 September 2012, http://willsworld.blogs.edina.ac.uk
    • Outline
      • Shakespeare Registry background
      • British Museum SPARQL endpoint
      • Conclusion
    • Background
      • JISC Discovery Programme: 10-month projects, Dec 2011 to Sep 2012
      • Aim: to improve discoverability and usability of online data through better access to better metadata
      • Demonstrate the benefits and principles of assembling metadata: ‘aggregation as a tactic’
      • Focus on Shakespeare
        – Lots of data
        – Cultural Olympiad & anniversary on 23rd April
          • Glasgow culture hack event
    • Shakespeare Registry
      Content Providers
      • Register online sources
      • Contribute metadata about resources
      Users/Developers
      • Search, identify, locate online resources using metadata aggregation
      Shakespeare Registry
      • Filters (themes, data types, metadata, APIs, sources, …)
      • Useful tools, license, documentation (build, use, register, contribute, …)
    • Linked Data Fit
      [Diagram labels, in slide order: Questions; Users/Developers; Who? What?; Registry; How? Attractive? Format?; Self-sustainable, transferable; Schema? License?; Content; How much? Sharing?; Easy; Rich, complex; Access? Format?]
      Wikipedia on Linked Data: “linked data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried… using standard formats such as RDF/XML…”
      Answer?
    • Linked Data Provision
      • Over 40 sources of online resources
        – Royal Shakespeare Company, Shakespeare Birthplace Trust, Shakespeare’s Globe, Shakespeare Institute, Folger Shakespeare Library, Open Shakespeare, World Shakespeare Festival, Open Source Shakespeare, …
        – British Museum, British Library, Bodleian Library, British Universities Film & Video Council, National Library of Scotland, Wellcome Images, British Library of Sounds, JISC MediaHUB, BBC, …
        – National Theatre Poster, Bosak’s Play of Shakespeare in XML, The work of the Bard, Internet Shakespeare Editions, PlayShakespeare.com, Seanco Technology Shakespeare Quote Generator, ...
      • Many images, some XML, one SPARQL endpoint!
        British Museum: http://collection.britishmuseum.org/Sparql
    • SPARQL Endpoint
      • Service endpoint
      • Web interface
      • Run SPARQL queries
      • Linked Data
      • Structured RDF stores
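
As an illustration of what "running a SPARQL query" against a service endpoint can look like in code, here is a minimal sketch in Python. It only builds the HTTP request and does not send it. The endpoint URL is the one quoted on the slides and may no longer be live; the sample query and the helper name `build_request` are assumptions made for this example, not part of the presentation. Passing the query in a `query` parameter and asking for `application/sparql-results+json` follow the standard SPARQL Protocol, but whether this particular endpoint honoured them is not confirmed by the source.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL as quoted on the slides (2012); it may no longer be available.
ENDPOINT = "http://collection.britishmuseum.org/Sparql"

# A deliberately tiny, schema-independent query: fetch any five triples.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a GET request carrying the query.

    The query text goes into the 'query' parameter, URL-encoded, and the
    Accept header asks for SPARQL results in JSON.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_request(ENDPOINT, SAMPLE_QUERY)
print(req.full_url)
```

Sending the request would be a matter of passing `req` to `urllib.request.urlopen`, ideally with a timeout, given the slow responses reported later in the talk.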
    • Using the British Museum SPARQL
      Easy to start:
      • Sample query: document ontologies
      • Help: data structure, access & URIs
      • Documentation: controlled terms, object names thesaurus
      Search for “Shakespeare”, “William Shakespeare”:
      • Difficult to do keyword search
      • Difficult to do multi-stage search
        – Find the unique ID for an entity
        – Retrieve information related to the entity
      • Limited or no results
      • Overload the service
    • SPARQL Common Issues
      • Lack of documentation (ontologies, identifiers)
      • Lack of example queries
      • Lack of identifiers
      • Slow responses, timeouts & result size limits
      • Inefficient queries (text & keyword search)
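
One common way to live with timeouts and result size limits is to fetch results in pages. The sketch below is an assumption about how that could be done, not something shown in the talk: it appends the standard SPARQL `LIMIT` and `OFFSET` clauses to a base query. The function name `paged_queries` and the base query are hypothetical. The base query should contain an `ORDER BY`, since paging over unordered results is not guaranteed to be stable.

```python
from typing import Iterator


def paged_queries(base_query: str, page_size: int, max_pages: int) -> Iterator[str]:
    """Yield copies of base_query with LIMIT/OFFSET appended, one per page."""
    for page in range(max_pages):
        yield f"{base_query} LIMIT {page_size} OFFSET {page * page_size}"


# Hypothetical base query; ORDER BY keeps the page boundaries stable.
BASE = "SELECT ?s WHERE { ?s ?p ?o } ORDER BY ?s"

for query in paged_queries(BASE, page_size=100, max_pages=3):
    print(query)
```

In practice a client would stop as soon as a page comes back with fewer than `page_size` rows, rather than always running `max_pages` queries.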
    • SPARQL endpoint
      SPARQL is not
      • A relational DB (search on a given value for a field)
        – A simple SQL query can be complex to express in SPARQL
      • A text DB like Solr (flexible text search)
        – Not suited for discovery
      SPARQL provides links & context
      Think about Linked Data in the right way
    • Asking the Right Questions
      • Structured data needs structured queries
      • To build meaningful queries, we need to know:
        – Data, structure, schema, identifiers
      • Internally specified
      How do we identify “William Shakespeare” and related objects before we can retrieve the relevant Linked Data?
      • Need an identifier for “William Shakespeare”
      • URI or ID in the British Museum schema
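
The difference between a keyword-style question and a structured one can be sketched as two query builders. Everything specific here is an assumption for illustration: the use of `rdfs:label` as the label property, the example URI, and the function names are hypothetical and do not reflect the British Museum schema. The first query has to scan every label with a regular expression, which is the kind of inefficient text search the slides warn about; the second goes straight to one entity once its identifier is known.

```python
def keyword_query(term: str) -> str:
    """Text-style search: filters every label with a case-insensitive regex.

    Only quotes and backslashes are escaped; regex metacharacters in the
    term are passed through unchanged.
    """
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?entity ?label WHERE {\n"
        "  ?entity rdfs:label ?label .\n"
        f'  FILTER regex(str(?label), "{escaped}", "i")\n'
        "}"
    )


def entity_query(uri: str) -> str:
    """Structured lookup: every property and value of one known entity."""
    return f"SELECT ?property ?value WHERE {{ <{uri}> ?property ?value }}"


print(keyword_query("William Shakespeare"))
# Hypothetical identifier; the real URI has to come from the provider.
print(entity_query("http://example.org/person/william-shakespeare"))
```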
    • Workflow for extracting metadata
      1. Collection Database Search GUI
    • Collection Database Search
      2. Select object
      4209 results!
    • 3. Extract the ID for the object
    • 4. Use the ID in the SPARQL query
    • 5. Extract the metadata
      Repeat for the remaining 4208 objects!
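
Steps 4 and 5 are the part of this workflow that lends itself to scripting once the IDs have been collected by hand. The sketch below is one possible approach, not the project's own: it groups IDs into batches and builds one query per batch with the SPARQL 1.1 `VALUES` clause, so that thousands of objects need far fewer round trips. The URI pattern, the placeholder IDs and the function names are all hypothetical, and the source does not say whether the endpoint supported SPARQL 1.1.

```python
from typing import Iterable, Iterator, List

# Hypothetical URI pattern; the real one must come from the endpoint's documentation.
OBJECT_URI = "http://example.org/object/{id}"


def batches(ids: List[str], size: int) -> Iterator[List[str]]:
    """Split the list of IDs into consecutive chunks of at most `size` items."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def batch_query(ids: Iterable[str]) -> str:
    """Build one query returning all properties of every object in the batch."""
    uris = " ".join(f"<{OBJECT_URI.format(id=i)}>" for i in ids)
    return (
        "SELECT ?object ?property ?value WHERE {\n"
        f"  VALUES ?object {{ {uris} }}\n"
        "  ?object ?property ?value .\n"
        "}"
    )


object_ids = ["A1", "A2", "A3", "A4", "A5"]  # placeholder IDs, not real object IDs
queries = [batch_query(batch) for batch in batches(object_ids, 2)]
print(len(queries))
print(queries[0])
```

The batch size is a trade-off: larger batches mean fewer requests but a higher risk of hitting the timeouts and result size limits mentioned earlier.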
    • Sustainable Workflow?
      Workaround
      • Multiple GUI searches on Shakespeare, William Shakespeare, Macbeth, Hamlet, …
      • Manual steps
      • Many small queries, few large queries
      Feedback on blog post: Person ID for “Shakespeare, William”
    • Internal ID
    • 400 triples
    • Conclusions
      • SPARQL is best suited to linking data from different informational silos, not to text search and discovery
      • Common identifiers are essential (e.g. ISSN)
        – Use of standards (ISNI), common language & ontologies
      • Documentation & example queries
      • Be prepared
        – To use additional data sources to identify URIs
        – To run many queries
    • Thanks
      • British Museum
        – SPARQL endpoint is a beta version to generate feedback
        – New version available within a few months
      • Owen Stephens
      • EDINA: Peter Burnhill, Jackie Clark, Catherine Fleming, Andrew Dorward, Neil Mayo, Nicola Osborne, Christine Rees, Tim Stickland