Your SlideShare is downloading. ×
0
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Semantic web technologies and digital library search
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Semantic web technologies and digital library search

1,212

Published on

semantic web technologies and digital library search presentation discussing linked data basics and STELLAR project work

semantic web technologies and digital library search presentation discussing linked data basics and STELLAR project work

Published in: Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
1,212
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
20
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • Today I’m going to talk about some work we’ve been doing at the Open University to explore the potential of semantic web technologies as a tool to improve search in digital libraries
  • I’ll give you a bit of background to OU Library ServicesA very brief introduction to semantic web technologiesTalk about the context for looking at these search technologies and look at the work of the Stellar search projectThere’ll be time for questions at the end but feel free to interrupt me as we go along if you have questions
  • Brief background about the OUDistance learning so most students never visit the main campus
  • Range of library services – we aim to embed links to library material directly into course websites – online and digital library resources – not providing access to books for students
  • We also run a range of Information Management services, as well as the Institutional Repository and a University ArchiveA key area of expertise is metadata – I’ll cover later how important that is in relation to the semantic web
  • We’ve a good track record in recent years of running several Jisc-funded projects covering a range of different subjects – reference management, activity data and mobiles
  • And have carried out a number of other projects looking at video technology – so the AVA project looked at the University’s large collection of videos and built a system to find and digitise video content, particularly material on obsolete video media formatsWe’ve also been working with the University’s Knowledge Media Institute on several semantic web projects, including the Lucero project that built data.open.ac.uk which was the first University-wide linked data repository in the UK
  • Why searchSearch is critical to us – we spend a lot of time and effort on itWhether that is buying search systems, or teaching people how to search, or providing help and support materials, or trying to resolve problems that people have with trying to find library materialsIt’s a major part of what libraries do
  • And if people can’t find what they are searching for, then it’s a big problemStudents get frustrated, it affects their studies, it might affect your reputation through National Student Survey scoresSo it’s an important area for usFor staff, it can stop them from being able to effectively do their job and cost you money – this quote is from a person in an academic department asked to look at whether there was material from earlier courses that could be reused – in many cases they end up redoing the teaching materials
  • Ok searchWell we all know how to searchWe put some words into a search boxWe might also use some special instructions such as boolean logic or put things between inverted commas to help refine the search query
  • So a traditional search system takes your search query and compares it against an indexEssentially what systems are doing is matching what you typed with what is in the index – character by characterThey might tweak what gets presented, or offer you a ‘did you mean’ feature if you’ve typed something that looks similar to something else, but essentially they match your string of characters with an index derived from whatever they are indexing
  • But the systems don’t actually understand what you mean when type your search into the search boxThey don’t understand what the words mean, they don’t understand what the concepts are about, and they don’t understand how one search term might relate to another, other than by trawling their logs of what searchers who search for this… searched for that…
  • So even Google have been starting to think that just responding to a search with things that match character by character might not be the only approachSo they’ve started to explore alternatives with their Knowledge Graph
  • Which is what they using to flag up relevant material in their search resultsSo where does this different approach come from?
  • So where did the concept that the web might be organised differently come from?The concept for the semantic web came in an article by Tim Berners-Lee and others in Scientific American in 2001The article outlined an idea that the web should be adapted to allow computers to understand the ‘meaning’ of pages (the semantic content) and proposed structured collections of data
  • Semantic web technologies have been described as the web of meaning and the web of data.So there is a stack of protocols and technologies and standards used within this spaceThere are a few resources listed here if you want to follow up in more detail
  • These are some of the key elements of the semantic web toolkitAnd I’ll talk briefly about some of the basic concepts but won’t go into a lot of detail
  • URIs are Uniform Resource Identifiers – a string of characters used to identify a name of a web resource, this might include the name and the location but could be eitherSo an example of a URI is this OU course – the simple structure of a URI like this leads to a URI like this being described as a ‘cool’ URI
  • Linked data is the term used to describe the methods of exposing data on the semantic web
  • The key concept around linked data is around relationships between one piece of data and another. This picture comes from the original semantic web article in Scientific AmericanLinked data uses the concept called triples – where there is a subject and an object that are connected together by a predicate, so for example Jane Austen ‘is the author of’ Pride and PrejudiceThese triples are represented using a data interchange framework called RDF (the resource description framework) and stored in databases called ‘triplestoresThis is the basic building block of linked data
  • To be able to link data together you need to be able to have a shared vocabulary or ontology that allows you to describe the concepts in your data.These could be about people or places for example, or about things – such as a university courseSo you will need to go through an exercise to map all the concepts against relevant ontologies and then use RDF (the Resource Description Framework) to encode it
  • By coding the concepts in your data with standard vocabularies, essentially saying that we know where there is data about this concept that we can link to, you can allow connections to be made between websitesSo VIAF and FOAF are ways of linking together your content (in a catalogue for example) that mentions Jane Austen with other pages on the web that contain information about Jane Austen
  • So if we dig into the schema for that record you can see that there is a URI that identifies that your data is about a person and that there is information about that person at the VIAF URIAnd what you can do is to link the concepts represented by your triples to different ontonlogies
  • And there are places that you can go to find suitable ontologies and content
  • So this is part of the basic model
  • And this shows a bit more of the complexity
  • Within Fedora this is the approach This is the record for an OU module S100, showing some of the study materials that form part of the course
  • From the Fedora record then we can automatically create the linked data triple store content
  • Having created the data in RDF format we were then able to look at testing semantic web search tools to see if it improved the discoverability of the materialWe were able to use a tool called DiscOU developed by the Knowledge Media Institute (KMi)So the tool analyses the meaning of the words in your query to extract concepts in the digital library dataset then displays closest matchesAs well as showing digital library content we’ve also been able to easily connect it to other linked data datasets at the OU such as our OER platform OpenLearn
  • If you look a bit more closely at the toolThe tool works best with a rich piece of textYou get the concepts in the query displayed on the left with results on the rightYou can then change the priority of the concepts, or exclude them entirely, and as you change the concepts the data displayed refreshesThere’s a link to a short 1 minute screencast on the stellar blog
  • When you click on the item you get taken through to the content from the digital library – this might be the page of a book or a video
  • If you click on an OU course code it takes you to a page from data.open.ac.uk – but it could equally well take you to the main page on the university website about that particular course
  • This is how the system architecture looks, showing the main elements of the system and how it fits together
  • There’s a publicly available version that you can use to try out the technologyIn this case it searches the OU’s OER platform OpenLearn
  • But as you can see it is similar to DiscOU Stellar
  • If you remember our approach to Stellar – which was to measure value, carry out an enhancement and then assess the impact on perceptions of value, using a balanced scorecard approach with these four dimensions of value – how did this work affect user views?
  • A good positive reactionIn every area the majority agreed that the enhanced materials had valueWhere we saw particular increases in perceived value were maybe unsurprisingly to internal processes and financial value
  • So if we look in a bit more detailIt’s clear that users are more likely to explore enhanced materialSemantic search also finds material that a traditional ‘strings’ search would not have foundMaterial found through such a semantic search system would be more likely to be used through our course production system
  • When asked about the impact on the financial aspects it’s clear that the semantic enhancement has actually increased users perception of the value of the materialsIn particular there is a big jump in the percentage of people that think that the University could make cost savings if material was enhanced in this wayBut there’s a caveat that money needs to be invested in the process to make more material available
  • So to summarise how semantic search helpsUsers are more likely to find material, so more of it can be reused as an alternative to recreating course materialA key point is that semantic search finds things that traditional search can’tAnd users views had shifted in that an even greater percentage thought that there were cost benefits in reusing material
  • For us the key messages about the semantic search is that it was quite time-consuming to setup the ontologies and metadata for each of the types of material you are dealing with. But for that type of material once the process is done then you can continue to add more material of that type and automatically generate linked dataThere is a trade-off between the amount of enhancement you do and the improvements that gives you with discoverability – but that is pretty much the case with anything that involves metadataBut stakeholders do see and appreciate the value of an improved search feature
  • We’re now working on embedding the learning from STELLAR into the OU Digital Library. So linked data is in-built and we are using that capability to link out to related materials in other linked data datasets, in this case in OpenLearn and iTunesU.We haven’t yet built a semantic search tool as that will need some further development to build a version of the semantic meaning engine that we can integrate into the digital libraryWe’re also talking more widely across the university about how we could use the technology across other repositories of learning and teaching materials.
  • Ok, so that was a run through linked data and the semantic searching work carried out by the STELLAR projectFurther information including the final report along with about 20 blogs posts we put up during the project are on the project blogHope that was of interest
  • Are there any questions?
  • Transcript

    • 1. Semantic web and search Richard Nurse Open University Library Services
    • 2. Outline • • • • Background Basics of semantic web technologies Relevance to libraries and search STELLAR search project
    • 3. Open University • • • • • UK distance learning University +200,000 students Undergraduate/Postgraduate/Research Online learning supported by course materials & local tutors Milton Keynes campus and regional/national offices BUT… most students never visit the main campus
    • 4. Library Services • • • • 24/7 helpdesk Online library resources Online help sessions Links to library resources and skills activities embedded in VLE • Discovery platform, website resource lists • Librarians work with academics to build new courses
    • 5. Library Services • • • • • • Cross-university Information Management services Institutional Repository ORO http://oro.open.ac.uk/ Research Data Management Project Data retention and records management University Archive Metadata expertise
    • 6. Library Services • Innovation projects http://www.open.ac.uk/blogs/macon/ http://www.open.ac.uk/blogs/RISE/ http://www.open.ac.uk/blogs/telstar/
    • 7. Library Services • • • • Innovation and development OU Knowledge Media Institute and others Semantic web Video search http://www.open.ac.uk/blogs/AVA/ http://projects.kmi.open.ac.uk/reflex/index.xml http://kmi.open.ac.uk/projects/name/lucero
    • 8. Search
    • 9. Search “It‟s always so hit-and-miss… I used to sit there for hours and just not find anything. There were thousands and thousands of bits of material but no way of drilling down to find what I really needed. My manager needed to know, by tomorrow, whether there was something we could use or not and I didn‟t know the answer, so had to say no”.
    • 10. Search • Terms • Boolean logic – AND, OR, NOT • - site: “ “
    • 11. Search http://www.flickr.com/photos/niallkennedy/
    • 12. Search http://www.flickr.com/photos/dullhunk/
    • 13. Search • „things not strings‟ http://googleblog.blogspot.co.uk/2012/05/introducing-knowledge-graph-things-not.html
    • 14. Search Google‟s Knowledge Graph
    • 15. Semantic web Definition: "The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation." The Semantic Web Tim Berners-Lee, James Hendler, and Ora Lassila Scientific American, 2001 http://www.sciam.com/article.cfm?id=the-semantic-web http://www.nature.com/scientificamerican/journal/v284/n5/pdf/scientifica merican0501-34.pdf
    • 16. Semantic web basics • „web of meaning‟ • „web of data‟ http://www.w3.org/2001/sw/ http://semanticweb.org/wiki/Main_Page http://www.slideshare.net/fadirra/semanticweb-intro-040411
    • 17. Semantic web basics • • • • URIs Linked data Ontologies but also…
    • 18. Semantic web basics • URIs – Uniform Resource Identifier • http://en.wikipedia.org/wiki/Uniform_resource_identifier http://www.slideshare.net/mdaquin/sssw13-ldtut
    • 19. Linked data • “Linked Data is about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.” • Wikipedia defines Linked Data as "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF." http://linkeddata.org/home
    • 20. Subject > Predicate < Object Jane Austen „is the author of‟ Pride and Prejudice http://www.nature.com/scientificamerican/journal/ v284/n5/pdf/scientificamerican0501-34.pdf
    • 21. Ontologies “An ontology is a formal specification of a shared conceptualization” Tom Gruber http://en.wikipedia.org/wiki/Tom_Gruber http://viaf.org/viaf/72955884/ http://www.slideshare.net/mdaquin/sssw13-ldtut
    • 22. Ontologies eg Virtual International Authority File – VIAF – maintained by OCLC Friend of a Friend – FOAF http://www.foaf-project.org/ http://oclc.org/developer/documentation/virtual-international-authority-file-viaf/viaf-rdf-example
    • 23. Ontologies http://viaf.org/viaf/102333412/#foaf:Person
    • 24. Ontologies http://dbpedia.org/About http://lov.okfn.org/dataset/lov/
    • 25. Linked data ‘cloud’ Richard Cyganiak and Anja Jentzsch http://lod-cloud.net/
    • 26. Why is this of interest? Lorcan Dempsey OCLC http://www.slideshare.net/lisld/the-inside-out-library
    • 27. Why is this of interest? Quoted by Lorcan Dempsey “Inside Out library: Scale, Learning and Engagement” http://www.slideshare.net/lisld/the-inside-out-library
    • 28. Why is this of interest? “The change that libraries will need to make … must include the transformation of the library’s public catalog from a stand-alone database of bibliographic records to a highly hyperlinked data set that can interact with information resources on the World Wide Web.” Karen Coyle Understanding the semantic web http://www.alatechsource.org/library-technologyreports/understanding-the-semantic-webbibliographic-data-and-metadata
    • 29. Why is this of interest? Search is a major “pain point” for students and staff Students ‘The library is very expansive which is great but you can never find what you need. They need to redo the system make it easier.’ NSS comment Staff ‘I would be more likely to explore existing noncurrent learning materials if there were a better way of finding them.’ STELLAR survey comment
    • 30. What are libraries doing? http://lodlam.net/ http://datahub.io/group/lld http://www.w3.org/2005/Incubator/lld/
    • 31. at the OU Library • Library catalogue • Archival material • Old course materials in the University Archive
    • 32. University Archive • OU study materials – print and audio-visual • Historical materials – photographs, oral history • Papers of OU people http://www.open.ac.uk/library/library-resources/the-openuniversity-archive
    • 33. Range of learning resource types
    • 34. Range of learning resource types
    • 35. The OU Digital Library (OUDL) FEDORA Flexible Extensible Digital Object Repository Architecture Open source, created by and supported by the digital preservation community purpose-designed Supports international metadata standards PREMIS – METS – MODS – EAD – DC - OAI Supports Linked Data natively Mulgara triplestore
    • 36. The STELLAR project • Semantic Technologies Enhancing the Lifecycle of Learning Resources • OU Library Services/OU Knowledge Media Institute • Experiment with semantic technologies in a digital library environment … and to consider the sustainability implications of using semantic technologies. • Jisc-funded 2012-2013 • Jisc Digital Infrastructure programme – Sustainability of digital content
    • 37. STELLAR project aims Taking collections preserved in the OUDL, the STELLAR project was established to: • Develop a detailed understanding of the value of legacy learning materials as perceived by academic staff and other key stakeholders • Experiment with the use of semantic technologies in a digital library environment to ascertain the extent to which the perceived value of these materials might be enhanced and to consider the sustainability implications of using semantic technologies. • Inform the development of digital libraries of learning resources by contributing to the evidence base for their effectiveness • Increase the return on investment of learning materials by developing an evidence based model for lifecycle management
    • 38. The STELLAR project • • • • Project approach Create a baseline of perceptions of the value of the collection Carry out an enhancement of the collection Assess the impact of that enhancement on perceptions of value
    • 39. Initial survey into value • 89.2% of respondents (501) agreed or strongly agreed with the statement that maintaining an archive of non-current OU learning materials is important to the reputation of the OU. • 75.9% of respondents thought that this should be maintained in perpetuity. • 90.16% of respondents (504) agreed or strongly agreed that non-current learning materials are important to the context of the history of higher education. • 91.75% of those respondents who were involved in module production (356) agreed or strongly agreed that when producing new OU learning material, I am likely to look to previous material, whether for inspiration or for potential reuse. “We are the world leaders in distance learning, so our curriculum designs are much admired and so are our materials. It would be remiss of us not to treat them as potential objects of scholarship themselves”.
    • 40. Capturing perceptions Using a balanced scorecard approach we conducted a benchmarking survey of academic staff and stakeholders to investigate the value they place on non-current learning materials Personal and professional perspectives of value I would be disappointed if the OU learning materials that I helped to produce were not kept I keep my own copies of the OU learning materials that I am involved in producing I would be pleased if others chose to reuse of reversion the OU learning materials that I have helped to produce Value to internal processes and cultures I keep my own copies of the OU learning materials that I am involved in producing When producing new OU learning material, I am likely to look to previous material, whether for inspiration or for potential reuse I would be more likely to explore existing noncurrent learning materials if there were a better way of finding them. Value to HE and academic communities Maintaining an archive of non-current OU learning materials is important to the reputation of the OU I think the non-current OU learning materials are important in the context of the history of higher education I think the non-current OU learning materials are important in showing how the OU taught at particular times in history Financial / bottom line perspectives of value I think that there is a monetary value to non-current OU learning materials The OU could make savings if more learning material were reused http://www.gla.ac.uk/services/library/espida/
    • 41. Module Information A metadata module record was created which connects the complicated web of content and metadata associated with each module STELLAR allowed us to link the metadata for all this module content, making it more discoverable & reusable
    • 42. Basic linked data model (for data.open.ac.uk and to comply with current module descriptions) doau:a103 dc:title | rdfs:label | courseware:has-title rdf:type courseware:istaught-present courseware:ha s-courseware daou-library:339347 dc:title dc:isVersionOf aiiso#code “An Introduction to the Humanities” courseware:Course | mlo:LearningOppor tunitySpecification | aiiso:Module | xcri:course “false” dc:subject“A103” doau:a102 dc:isVersionOf doau:a101 “An introduction to the humanities : resource book 2” jacs:V900 | doau-topic:artsand-humanities
    • 43. Relationship model
    • 44. Fedora record course
    • 45. Fedora record
    • 46. Application of Linked Data • Text entered into the tool is passed through a semantic meaning engine and concepts are matched against the concepts contained within the digital library dataset. • A selection of the closest matches are then displayed. These link through to the object in the Fedora digital library • The semantic web tool analyses the meaning of those words and finds related material • the tool can also show related material from other datasets from data.open.ac.uk
    • 47. http://www.open.ac.uk/blogs/stellar/wp-content/uploads/2013/07/stellar2.mp4
    • 48. Directly access digitised content stored in the OUDL Materials include those originally in print, audio and video formats Links to the extensive metadata about the course or element of the course, held on a data.open.ac.uk page
    • 49. data.open.ac.uk
    • 50. Architecture of the STELLAR tool
    • 51. Try the technology • http://discou.info/alfa/
    • 52. Headline findings
    • 53. Headline findings • A consistently positive reaction to the enhanced collections. In every area the majority of respondents agreed or strongly agreed that the enhanced materials had value • Were two dimensions where the evaluation indicates the transformation of the materials has increased the perceived value of the material: • value to internal processes & culture • financial/bottom line value • Participants also made several comments regarding which materials should be preserved & enhanced • Read the full report on the blog: http://www.open.ac.uk/blogs/stellar/wp-content/uploads/2013/07/STELLAR-Post-Enhancement-Survey-Report.pdf
    • 54. Value to internal processes & culture • 89% of respondents agreed or strongly agreed that they would be more likely to explore existing materials if they knew they had been enhanced • 94% agreed or strongly agreed that such enhancement makes content easier to reuse or refer to for inspiration during module production • When thinking about existing systems, 94% also agreed or strongly agreed that the semantic analysis they had seen suggested material which they would not have found using a traditional search • 78% of respondents agreed or strongly agreed that enhanced materials are more likely to be referred to during module production than those preserved in existing OU systems
    • 55. Financial / bottom line value • Improving the discoverability and reusability of the materials appears to have increased the perceived financial value of the materials • In the pre-enhancement survey 75.9% of respondents agreed that the OU could make savings if more learning material were reused • Following the enhancement, an increased 83% agreed or strongly agreed that the OU could make cost savings if existing materials were enhanced to make them more discoverable “It will be helpful to know what kind of support and budget is available to make more old course resources available. This will help reducing costly budgets for new modules in production.”
    • 56. Value of semantic search Stakeholder views of semantic search • ‘More likely to use material’ - 89% agreed/strongly agreed • ‘Content easier to reuse’ – 94% agreed/strongly agreed • ‘Found material that traditional search wouldn’t – 94% agreed/strongly agreed Cost-savings could be made if material re-used After Before 72.00% 74.00% 76.00% 78.00% 80.00% 82.00% 84.00% http://www.open.ac.uk/blogs/stellar/wp-content/uploads/2013/07/STELLARPost-Enhancement-Survey-Report.pdf
    • 57. Key findings • Significant effort required to improve the metadata • To make best use of the Linked Data, it was beneficial to digitise and preserve all course materials for the selected courses • Trade-off between value of extra content digitised and the cost of cataloguing • Once you’ve built it into your system you can automatically generate linked data for new content of that type • Stakeholders can see the value of this type of search
    • 58. Follow-up work to STELLAR • Linked Data embedded into OU Digital Library • Used to link to related iTunesU and OpenLearn material
    • 59. STELLAR • STELLAR blog www.open.ac.uk/blogs/stellar • Final report http://www.open.ac.uk/blogs/stellar/wpcontent/uploads/2013/09/STELLAR-JISC-Final-Report.pdf • Final report in Jorum http://hdl.handle.net/10949/18379
    • 60. Questions?

    ×