3. Open University
•
•
•
•
•
UK distance learning University +200,000 students
Undergraduate/Postgraduate/Research
Online learning supported by course materials & local tutors
Milton Keynes campus and regional/national offices
BUT… most students never visit the main campus
4. Library Services
•
•
•
•
24/7 helpdesk
Online library resources
Online help sessions
Links to library resources and skills activities
embedded in VLE
• Discovery platform, website resource lists
• Librarians work with academics to build new courses
5. Library Services
•
•
•
•
•
•
Cross-university Information Management services
Institutional Repository ORO http://oro.open.ac.uk/
Research Data Management Project
Data retention and records management
University Archive
Metadata expertise
7. Library Services
•
•
•
•
Innovation and development
OU Knowledge Media Institute and others
Semantic web
Video search
http://www.open.ac.uk/blogs/AVA/
http://projects.kmi.open.ac.uk/reflex/index.xml
http://kmi.open.ac.uk/projects/name/lucero
9. Search
“It‟s always so hit-and-miss… I used to sit
there for hours and just not find anything.
There were thousands and thousands of bits
of material but no way of drilling down to find
what I really needed. My manager needed
to know, by tomorrow, whether there was
something we could use or not and I didn‟t
know the answer, so had to say no”.
15. Semantic web
Definition: "The Semantic Web is not a separate Web
but an extension of the current one, in which information
is given well-defined meaning, better enabling computers
and people to work in cooperation."
The Semantic Web
Tim Berners-Lee, James Hendler, and Ora Lassila
Scientific American, 2001
http://www.sciam.com/article.cfm?id=the-semantic-web
http://www.nature.com/scientificamerican/journal/v284/n5/pdf/scientifica
merican0501-34.pdf
16. Semantic web basics
• „web of meaning‟
• „web of data‟
http://www.w3.org/2001/sw/
http://semanticweb.org/wiki/Main_Page
http://www.slideshare.net/fadirra/semanticweb-intro-040411
19. Linked data
• “Linked Data is about using the Web to connect related
data that wasn't previously linked, or using the Web to
lower the barriers to linking data currently linked using
other methods.”
• Wikipedia defines Linked Data as "a term used to
describe a recommended best practice for exposing,
sharing, and connecting pieces of data, information,
and knowledge on the Semantic Web using URIs and
RDF."
http://linkeddata.org/home
20. Subject > Predicate < Object
Jane Austen
„is the author of‟
Pride and Prejudice
http://www.nature.com/scientificamerican/journal/
v284/n5/pdf/scientificamerican0501-34.pdf
21. Ontologies
“An ontology is a formal specification of a shared conceptualization”
Tom Gruber
http://en.wikipedia.org/wiki/Tom_Gruber
http://viaf.org/viaf/72955884/
http://www.slideshare.net/mdaquin/sssw13-ldtut
22. Ontologies
eg
Virtual International Authority File – VIAF – maintained by OCLC
Friend of a Friend – FOAF http://www.foaf-project.org/
http://oclc.org/developer/documentation/virtual-international-authority-file-viaf/viaf-rdf-example
26. Why is this of interest?
Lorcan
Dempsey
OCLC
http://www.slideshare.net/lisld/the-inside-out-library
27. Why is this of interest?
Quoted by Lorcan
Dempsey
“Inside Out library:
Scale, Learning and
Engagement”
http://www.slideshare.net/lisld/the-inside-out-library
28. Why is this of interest?
“The change that libraries will need to make …
must include the transformation of the library’s
public catalog from a stand-alone database of
bibliographic records to a highly hyperlinked data
set that can interact with information resources on
the World Wide Web.”
Karen Coyle
Understanding the semantic web
http://www.alatechsource.org/library-technologyreports/understanding-the-semantic-webbibliographic-data-and-metadata
29. Why is this of interest?
Search is a major “pain point” for students and staff
Students
‘The library is very expansive which is great but
you can never find what you need. They need
to redo the system make it easier.’
NSS comment
Staff
‘I would be more likely to explore existing noncurrent learning materials if there were a
better way of finding them.’
STELLAR survey comment
30. What are libraries doing?
http://lodlam.net/
http://datahub.io/group/lld
http://www.w3.org/2005/Incubator/lld/
31. at the OU Library
• Library catalogue
• Archival material
• Old course materials in the University Archive
32. University Archive
• OU study materials – print and audio-visual
• Historical materials – photographs, oral history
• Papers of OU people
http://www.open.ac.uk/library/library-resources/the-openuniversity-archive
36. The OU Digital Library (OUDL)
FEDORA Flexible Extensible Digital Object Repository
Architecture
Open source, created by and supported by the digital
preservation community
purpose-designed
Supports international metadata standards
PREMIS – METS – MODS – EAD – DC - OAI
Supports Linked Data natively
Mulgara triplestore
37. The STELLAR project
• Semantic Technologies Enhancing the Lifecycle of Learning
Resources
• OU Library Services/OU Knowledge Media Institute
• Experiment with semantic technologies in a digital library
environment … and to consider the sustainability implications
of using semantic technologies.
• Jisc-funded 2012-2013
• Jisc Digital Infrastructure programme – Sustainability of digital
content
38. STELLAR project aims
Taking collections preserved in the OUDL, the STELLAR project was
established to:
• Develop a detailed understanding of the value of legacy learning materials
as perceived by academic staff and other key stakeholders
• Experiment with the use of semantic technologies in a digital library
environment to ascertain the extent to which the perceived value of these
materials might be enhanced and to consider the sustainability
implications of using semantic technologies.
• Inform the development of digital libraries of learning resources by
contributing to the evidence base for their effectiveness
• Increase the return on investment of learning materials by
developing an evidence based model for lifecycle
management
39. The STELLAR project
•
•
•
•
Project approach
Create a baseline of perceptions of the value of the collection
Carry out an enhancement of the collection
Assess the impact of that enhancement on perceptions of value
40. Initial survey into value
• 89.2% of respondents (501) agreed or strongly agreed
with the statement that maintaining an archive of
non-current OU learning materials is important to the
reputation of the OU.
• 75.9% of respondents thought that this should be
maintained in perpetuity.
• 90.16% of respondents (504) agreed or strongly
agreed that non-current learning materials are
important to the context of the history of higher
education.
• 91.75% of those respondents who were involved in
module production (356) agreed or strongly agreed
that when producing new OU learning material, I am
likely to look to previous material, whether for
inspiration or for potential reuse.
“We are the world leaders in distance learning, so our curriculum designs
are much admired and so are our materials. It would be remiss of us not to
treat them as potential objects of scholarship themselves”.
41. Capturing perceptions
Using a balanced scorecard approach we conducted a benchmarking survey of academic staff
and stakeholders to investigate the value they place on non-current learning materials
Personal and professional perspectives of value
I would be disappointed if the OU learning
materials that I helped to produce were not kept
I keep my own copies of the OU learning
materials that I am involved in producing
I would be pleased if others chose to reuse of
reversion the OU learning materials that I have
helped to produce
Value to internal processes and cultures
I keep my own copies of the OU learning
materials that I am involved in producing
When producing new OU learning material, I am
likely to look to previous material, whether for
inspiration or for potential reuse
I would be more likely to explore existing noncurrent learning materials if there were a better
way of finding them.
Value to HE and academic communities
Maintaining an archive of non-current OU learning
materials is important to the reputation of the OU
I think the non-current OU learning materials are
important in the context of the history of higher
education
I think the non-current OU learning materials are
important in showing how the OU taught at
particular times in history
Financial / bottom line perspectives of value
I think that there is a monetary value to non-current
OU learning materials
The OU could make savings if more learning material
were reused
http://www.gla.ac.uk/services/library/espida/
42. Module Information
A metadata
module record
was created
which connects
the complicated
web of content
and metadata
associated with
each module
STELLAR allowed us to link the metadata for all this
module content, making it more discoverable & reusable
43. Basic linked data model
(for data.open.ac.uk and to comply with current module descriptions)
doau:a103
dc:title | rdfs:label | courseware:has-title
rdf:type
courseware:istaught-present
courseware:ha
s-courseware
daou-library:339347
dc:title
dc:isVersionOf
aiiso#code
“An Introduction to
the Humanities”
courseware:Course
|
mlo:LearningOppor
tunitySpecification |
aiiso:Module |
xcri:course
“false”
dc:subject“A103”
doau:a102
dc:isVersionOf
doau:a101
“An introduction to the
humanities : resource book 2”
jacs:V900 |
doau-topic:artsand-humanities
47. Application of Linked Data
• Text entered into the tool is passed through a semantic meaning
engine and concepts are matched against the concepts
contained within the digital library dataset.
• A selection of the closest matches are then displayed. These link
through to the object in the Fedora digital library
• The semantic web tool analyses
the meaning of those words and
finds related material
• the tool can also show related
material from other
datasets from data.open.ac.uk
49. Directly access digitised
content stored in the OUDL
Materials include those
originally in print, audio and
video formats
Links to the extensive metadata about the
course or element of the course, held on a
data.open.ac.uk page
56. Headline findings
• A consistently positive reaction to the enhanced collections. In every area
the majority of respondents agreed or strongly agreed that the enhanced
materials had value
• Were two dimensions where the evaluation indicates the transformation of
the materials has increased the perceived value of the material:
• value to internal processes & culture
• financial/bottom line value
• Participants also made several comments regarding which materials should
be preserved & enhanced
• Read the full report on the blog:
http://www.open.ac.uk/blogs/stellar/wp-content/uploads/2013/07/STELLAR-Post-Enhancement-Survey-Report.pdf
57. Value to internal processes
& culture
• 89% of respondents agreed or strongly agreed that they would be more likely
to explore existing materials if they knew they had been enhanced
• 94% agreed or strongly agreed that such enhancement makes content easier to
reuse or refer to for inspiration during module production
• When thinking about existing systems, 94% also agreed or strongly agreed that
the semantic analysis they had seen suggested material which they would not
have found using a traditional search
• 78% of respondents agreed or strongly agreed that enhanced materials are
more likely to be referred to during module production than those preserved
in existing OU systems
58. Financial / bottom line value
• Improving the discoverability and reusability of the materials appears to
have increased the perceived financial value of the materials
• In the pre-enhancement survey 75.9% of respondents agreed that the OU
could make savings if more learning material were reused
• Following the enhancement, an increased 83% agreed or strongly agreed
that the OU could make cost savings if existing materials were enhanced
to make them more discoverable
“It will be helpful to know what kind of support
and budget is available to make more old course
resources available. This will help reducing costly
budgets for new modules in production.”
59. Value of semantic search
Stakeholder views of semantic search
• ‘More likely to use material’ - 89% agreed/strongly agreed
• ‘Content easier to reuse’ – 94% agreed/strongly agreed
• ‘Found material that traditional search wouldn’t – 94% agreed/strongly agreed
Cost-savings could be made if material re-used
After
Before
72.00% 74.00% 76.00% 78.00% 80.00% 82.00% 84.00%
http://www.open.ac.uk/blogs/stellar/wp-content/uploads/2013/07/STELLARPost-Enhancement-Survey-Report.pdf
60. Key findings
• Significant effort required to improve the metadata
• To make best use of the Linked Data, it was beneficial to digitise and
preserve all course materials for the selected courses
• Trade-off between value of extra content digitised and the
cost of cataloguing
• Once you’ve built it into your system you can automatically generate linked
data for new content of that type
• Stakeholders can see the value of this type of search
61. Follow-up work to STELLAR
• Linked Data embedded into OU Digital Library
• Used to link to related iTunesU and OpenLearn material
62. STELLAR
• STELLAR blog www.open.ac.uk/blogs/stellar
• Final report http://www.open.ac.uk/blogs/stellar/wpcontent/uploads/2013/09/STELLAR-JISC-Final-Report.pdf
• Final report in Jorum http://hdl.handle.net/10949/18379
Today I’m going to talk about some work we’ve been doing at the Open University to explore the potential of semantic web technologies as a tool to improve search in digital libraries
I’ll give you a bit of background to OU Library ServicesA very brief introduction to semantic web technologiesTalk about the context for looking at these search technologies and look at the work of the Stellar search projectThere’ll be time for questions at the end but feel free to interrupt me as we go along if you have questions
Brief background about the OUDistance learning so most students never visit the main campus
Range of library services – we aim to embed links to library material directly into course websites – online and digital library resources – not providing access to books for students
We also run a range of Information Management services, as well as the Institutional Repository and a University ArchiveA key area of expertise is metadata – I’ll cover later how important that is in relation to the semantic web
We’ve a good track record in recent years of running several Jisc-funded projects covering a range of different subjects – reference management, activity data and mobiles
And have carried out a number of other projects looking at video technology – so the AVA project looked at the University’s large collection of videos and built a system to find and digitise video content, particularly material on obsolete video media formatsWe’ve also been working with the University’s Knowledge Media Institute on several semantic web projects, including the Lucero project that built data.open.ac.uk which was the first University-wide linked data repository in the UK
Why searchSearch is critical to us – we spend a lot of time and effort on itWhether that is buying search systems, or teaching people how to search, or providing help and support materials, or trying to resolve problems that people have with trying to find library materialsIt’s a major part of what libraries do
And if people can’t find what they are searching for, then it’s a big problemStudents get frustrated, it affects their studies, it might affect your reputation through National Student Survey scoresSo it’s an important area for usFor staff, it can stop them from being able to effectively do their job and cost you money – this quote is from a person in an academic department asked to look at whether there was material from earlier courses that could be reused – in many cases they end up redoing the teaching materials
Ok searchWell we all know how to searchWe put some words into a search boxWe might also use some special instructions such as boolean logic or put things between inverted commas to help refine the search query
So a traditional search system takes your search query and compares it against an indexEssentially what systems are doing is matching what you typed with what is in the index – character by characterThey might tweak what gets presented, or offer you a ‘did you mean’ feature if you’ve typed something that looks similar to something else, but essentially they match your string of characters with an index derived from whatever they are indexing
But the systems don’t actually understand what you mean when type your search into the search boxThey don’t understand what the words mean, they don’t understand what the concepts are about, and they don’t understand how one search term might relate to another, other than by trawling their logs of what searchers who search for this… searched for that…
So even Google have been starting to think that just responding to a search with things that match character by character might not be the only approachSo they’ve started to explore alternatives with their Knowledge Graph
Which is what they using to flag up relevant material in their search resultsSo where does this different approach come from?
So where did the concept that the web might be organised differently come from?The concept for the semantic web came in an article by Tim Berners-Lee and others in Scientific American in 2001The article outlined an idea that the web should be adapted to allow computers to understand the ‘meaning’ of pages (the semantic content) and proposed structured collections of data
Semantic web technologies have been described as the web of meaning and the web of data.So there is a stack of protocols and technologies and standards used within this spaceThere are a few resources listed here if you want to follow up in more detail
These are some of the key elements of the semantic web toolkitAnd I’ll talk briefly about some of the basic concepts but won’t go into a lot of detail
URIs are Uniform Resource Identifiers – a string of characters used to identify a name of a web resource, this might include the name and the location but could be eitherSo an example of a URI is this OU course – the simple structure of a URI like this leads to a URI like this being described as a ‘cool’ URI
Linked data is the term used to describe the methods of exposing data on the semantic web
The key concept around linked data is around relationships between one piece of data and another. This picture comes from the original semantic web article in Scientific AmericanLinked data uses the concept called triples – where there is a subject and an object that are connected together by a predicate, so for example Jane Austen ‘is the author of’ Pride and PrejudiceThese triples are represented using a data interchange framework called RDF (the resource description framework) and stored in databases called ‘triplestoresThis is the basic building block of linked data
To be able to link data together you need to be able to have a shared vocabulary or ontology that allows you to describe the concepts in your data.These could be about people or places for example, or about things – such as a university courseSo you will need to go through an exercise to map all the concepts against relevant ontologies and then use RDF (the Resource Description Framework) to encode it
By coding the concepts in your data with standard vocabularies, essentially saying that we know where there is data about this concept that we can link to, you can allow connections to be made between websitesSo VIAF and FOAF are ways of linking together your content (in a catalogue for example) that mentions Jane Austen with other pages on the web that contain information about Jane Austen
So if we dig into the schema for that record you can see that there is a URI that identifies that your data is about a person and that there is information about that person at the VIAF URIAnd what you can do is to link the concepts represented by your triples to different ontonlogies
And there are places that you can go to find suitable ontologies and content
So this is part of the basic model
And this shows a bit more of the complexity
Within Fedora this is the approach This is the record for an OU module S100, showing some of the study materials that form part of the course
From the Fedora record then we can automatically create the linked data triple store content
Having created the data in RDF format we were then able to look at testing semantic web search tools to see if it improved the discoverability of the materialWe were able to use a tool called DiscOU developed by the Knowledge Media Institute (KMi)So the tool analyses the meaning of the words in your query to extract concepts in the digital library dataset then displays closest matchesAs well as showing digital library content we’ve also been able to easily connect it to other linked data datasets at the OU such as our OER platform OpenLearn
If you look a bit more closely at the toolThe tool works best with a rich piece of textYou get the concepts in the query displayed on the left with results on the rightYou can then change the priority of the concepts, or exclude them entirely, and as you change the concepts the data displayed refreshesThere’s a link to a short 1 minute screencast on the stellar blog
When you click on the item you get taken through to the content from the digital library – this might be the page of a book or a video
If you click on an OU course code it takes you to a page from data.open.ac.uk – but it could equally well take you to the main page on the university website about that particular course
This is how the system architecture looks, showing the main elements of the system and how it fits together
There’s a publicly available version that you can use to try out the technologyIn this case it searches the OU’s OER platform OpenLearn
But as you can see it is similar to DiscOU Stellar
If you remember our approach to Stellar – which was to measure value, carry out an enhancement and then assess the impact on perceptions of value, using a balanced scorecard approach with these four dimensions of value – how did this work affect user views?
A good positive reactionIn every area the majority agreed that the enhanced materials had valueWhere we saw particular increases in perceived value were maybe unsurprisingly to internal processes and financial value
So if we look in a bit more detailIt’s clear that users are more likely to explore enhanced materialSemantic search also finds material that a traditional ‘strings’ search would not have foundMaterial found through such a semantic search system would be more likely to be used through our course production system
When asked about the impact on the financial aspects it’s clear that the semantic enhancement has actually increased users perception of the value of the materialsIn particular there is a big jump in the percentage of people that think that the University could make cost savings if material was enhanced in this wayBut there’s a caveat that money needs to be invested in the process to make more material available
So to summarise how semantic search helpsUsers are more likely to find material, so more of it can be reused as an alternative to recreating course materialA key point is that semantic search finds things that traditional search can’tAnd users views had shifted in that an even greater percentage thought that there were cost benefits in reusing material
For us the key messages about the semantic search is that it was quite time-consuming to setup the ontologies and metadata for each of the types of material you are dealing with. But for that type of material once the process is done then you can continue to add more material of that type and automatically generate linked dataThere is a trade-off between the amount of enhancement you do and the improvements that gives you with discoverability – but that is pretty much the case with anything that involves metadataBut stakeholders do see and appreciate the value of an improved search feature
We’re now working on embedding the learning from STELLAR into the OU Digital Library. So linked data is in-built and we are using that capability to link out to related materials in other linked data datasets, in this case in OpenLearn and iTunesU.We haven’t yet built a semantic search tool as that will need some further development to build a version of the semantic meaning engine that we can integrate into the digital libraryWe’re also talking more widely across the university about how we could use the technology across other repositories of learning and teaching materials.
Ok, so that was a run through linked data and the semantic searching work carried out by the STELLAR projectFurther information including the final report along with about 20 blogs posts we put up during the project are on the project blogHope that was of interest