Back in 2009 it became obvious that library search simply didn’t work as well as users expected. We were regularly getting the sort of comments you see on screen, which showed that library users were struggling with the federated search system we were using. So the library embarked on some work to improve search by introducing a new discovery search system and making improvements to the website.
We moved to a new-generation library search system, EBSCO Discovery Service (EDS). Instead of searching library resources individually and telling you how many results are in each database, it searches a single index and shows the results in one list. Throughout 2010 and 2011 we worked on the system to make sure it was integrated with our improvements to the library website and included as many of our resources as possible. We are still making further enhancements to include subject-based searching.
We then started thinking about whether there was more we could do to improve the user experience. For a while we’d been following with interest some JISC work, in projects such as TILE and MOSAIC, on whether libraries could use activity data to improve services. So we started to consider whether using our own activity data could improve the user experience of library search.
So when we knew that JISC were going to be funding some more work on activity data, we thought about what we’d want to do, and came up with this hypothesis.
The project we came up with was RISE – Recommendations Improve the Search Experience. We set out to test two things. Can you use search data to make recommendations? Are recommendations useful for these new systems?
RISE was funded as part of the Activity Data strand of the JISC Infrastructure for Education and Research programme. It was a very short project, just six months, with a small team consisting of a developer and a project manager. There were seven other projects in the programme. Some were working with libraries, such as SALT and LIDP; others were looking at activity data in a range of other areas, from Virtual Learning Environments, through repositories and student systems, to video-conferencing data. They included the UCIAD project in the OU’s Knowledge Media Institute, which was looking at a user-centred approach to web click stream data.
So why the focus on activity data? Since the early 1990s the business sector, particularly companies such as Tesco, Amazon and Wal-Mart, has been exploiting the data it holds about customer activities to support decision making. Industry analysts have noted how many big retailers now use complex algorithms to analyse large amounts of customer data, creating new revenue opportunities and increasing customer retention.
Equally, publishers are looking at ways to reach readers online who are interested in particular topics. On Facebook, Twitter and other social networking sites, worldwide networks of friends and people with similar interests form a large online community of potential readers who are just waiting to discover new relevant content. Knowing what these people are looking at, buying and recommending could be key to marketing products in the years to come.
Some early research by JISC, in the TILE and MOSAIC projects, identified that the HE sector also had extensive user data with some potential for use, but that it was greatly underused. So this JISC programme set out to explore the area in more detail. Across the sector we are being told to be more business-like, and the use of customer data is one of the areas that businesses seem to be exploiting far more than we do.
For a traditional ‘bricks and mortar’ university these are some of the ways that you’d typically interact with your customers. Well, for the OU things are a bit different…
We don’t really loan many books to students or have many accessing the library. All our students are distance learners, so they interact with us online and use our resources electronically. And with more than 450,000 unique users of our website and over 100,000 unique users of our e-resources each year, there’s a fair amount of activity data for us to use.
So, concentrating on our e-resources, the systems we use are: SAMS for single sign-on; the EZproxy system from OCLC, which allows students to access our resources as if they were locally within the library; SFX from Ex Libris as our resources knowledge base and OpenURL link resolver; and finally EBSCO Discovery Service in place of an older federated search system.
The stages of the project were to build the database, fill it with activity data, write some software to create the recommendations, create a search interface to show the recommendations, and test it with some users to get feedback.
We push as much as possible through EZproxy: we use it for access through our discovery system, for links from SFX and for links placed in our VLE. So it seemed the obvious place to start looking at e-resource activity data. We didn’t have access to the EBSCO Discovery log files, and we hadn’t been using that system for long, whereas we did have a few months of log files from EZproxy. So we started with the EZproxy log files as the core dataset.
When you look in detail at what the log files contain, some of the data is useful and some isn’t so useful for activity data purposes. We know the user name – that’s the OUCU, the Open University Computer User account name. We know the request, that is, the website being accessed. So when you look at the detail of the record what you get is…
Something that looks like this (we’ve anonymised the OUCU for obvious reasons). This is one record out of tens of thousands of rows, but with a bit of work you can break it down to…
So you’ve got the date and time – useful to be able to know when something happened
And the username of the user
And the request that has been made – in this case an EBSCOHost search
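A minimal Python sketch of that breakdown. It assumes each record is delimited by `|||`, with the fields in the order listed on the slide and one unlabelled column in front. The field names (`flag`, `search_term` and so on) and the sample values are illustrative, not taken from the RISE code.

```python
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Assumed field order: the slide's list, preceded by one unlabelled column.
FIELDS = ["flag", "remote_host", "timestamp", "oucu", "request",
          "status", "size", "referrer", "user_agent", "session"]


def parse_record(line):
    """Split one '|||'-delimited log record into a dict of named fields."""
    values = [v.strip().strip('"') for v in line.strip().split("|||")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    record["timestamp"] = datetime.strptime(record["timestamp"], "%Y%m%d%H%M%S")
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])

    # The request field looks like "GET <url> HTTP/1.1"; keep the URL part.
    parts = record["request"].split(" ")
    url = parts[1] if len(parts) >= 2 else ""
    # For EBSCOhost searches the search term travels in the 'bquery' parameter.
    query = parse_qs(urlsplit(url).query)
    record["search_term"] = query.get("bquery", [None])[0]
    return record


# Illustrative record: shortened, with a documentation-range IP address.
SAMPLE = (
    '"0"|||"192.0.2.1"|||20110115235421|||"nn1234"|||'
    '"GET http://libezproxy.open.ac.uk:80/connect?Session=sABC'
    '&url=http://search.ebscohost.com/login.aspx?direct=true&site=edslive'
    '&bquery=War%20Against%20the%20Panthers HTTP/1.1"'
    '|||302|||0|||http://library.open.ac.uk/|||"Mozilla/5.0"|||"ABC"'
)

if __name__ == "__main__":
    rec = parse_record(SAMPLE)
    print(rec["timestamp"], rec["oucu"], rec["search_term"])
```

Not every request carries a `bquery` parameter, so `search_term` is often `None`.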
So our database starts to build up with details of the user and resources
We can then get data about the course(s) that students were studying from our internal student information system.
This tells us their course, subject area of interest and degree programme.
So, the data we have so far can tell us which courses people are on, so we can make recommendations based on that, i.e. these are the most popular resources that people on your course are looking at. We can also start to say that if you looked at resource C and then straight away looked at resource D, there is likely to be some relationship between resource C and resource D. And we can also say which are the most popular articles or journals overall.
But there are limitations. From the logs you don’t always know what search terms were used, and you don’t have much information about the item being accessed. And if you want to make a recommendation you don’t even have an article or journal title to show as the recommendation. So we looked at how we could improve the data. At the moment we use a further EBSCO EDS API call to get bibliographic details, which we then use to look up data in Crossref that we can store in the database.
That meant we could then retrieve some bibliographic data. Originally we’d hoped to store basic metadata from EBSCO in the system, but after discussion with them we realised that the licence terms wouldn’t let us do that. So we had to look for other metadata sources. We set the system up to retrieve data keys from EBSCO and use them to search Crossref, whose data licence allows you to store the data locally.
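The three kinds of signal described here can be sketched in a few lines of Python. This is an illustration under assumed data shapes, not the RISE implementation: `views` is a hypothetical time-ordered list of (user, resource) pairs, and `user_courses` is a hypothetical lookup from user to course codes.

```python
from collections import Counter, defaultdict


def popular_by_course(views, user_courses):
    """Count how often each resource was viewed by people on each course."""
    counts = defaultdict(Counter)
    for user, resource in views:
        for course in user_courses.get(user, ()):
            counts[course][resource] += 1
    return counts


def consecutive_pairs(views):
    """Count how often a user viewed one resource straight after another."""
    pairs = Counter()
    last_seen = {}
    for user, resource in views:
        previous = last_seen.get(user)
        if previous is not None and previous != resource:
            pairs[(previous, resource)] += 1
        last_seen[user] = resource
    return pairs


def popular_overall(views):
    """Count views of each resource across all users."""
    return Counter(resource for _, resource in views)


# Hypothetical sample data, in time order.
views = [("u1", "C"), ("u1", "D"), ("u2", "C"),
         ("u2", "D"), ("u3", "B"), ("u1", "B")]
user_courses = {"u1": ["A123"], "u2": ["A123"], "u3": ["B456"]}

if __name__ == "__main__":
    print(popular_by_course(views, user_courses)["A123"].most_common())
    print(consecutive_pairs(views).most_common())
    print(popular_overall(views).most_common())
```

A real version would probably also limit "straight away" to views within the same session or a short time window, which this sketch ignores.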
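As a rough illustration of the Crossref step, the Python sketch below assumes the key retrieved from EBSCO is a DOI. It uses Crossref’s current public REST API, which may well not be the interface RISE used in 2011. The function names, the example DOI and the sample response are hypothetical; the sketch only builds the lookup URL and reduces a response to the few fields a recommendation needs, without making a network call.

```python
from urllib.parse import quote

# Base URL of Crossref's public REST API for single works.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi):
    """Build the lookup URL for one DOI."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def to_local_record(message):
    """Keep only the fields needed to display a recommendation.

    'message' is the object under the "message" key of a Crossref response;
    title, container-title and ISSN arrive as lists there.
    """
    def first(key):
        values = message.get(key) or []
        return values[0] if values else None

    return {
        "doi": message.get("DOI"),
        "title": first("title"),
        "journal": first("container-title"),
        "issn": first("ISSN"),
    }


# Hypothetical response fragment, shaped like a Crossref 'message'.
sample_message = {
    "DOI": "10.1000/example",
    "title": ["An example article"],
    "container-title": ["Journal of Examples"],
    "ISSN": ["1234-5678"],
}

if __name__ == "__main__":
    print(crossref_url("10.1000/example"))
    print(to_local_record(sample_message))
```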
We created a test search interface called MyRecommendations, built on the EBSCO EDS API, to test recommendations with users. It provided a search screen and included recommendations based on other resources the user had recently viewed.
Once a search had been carried out the search results page presented the user with further recommendations based on the articles viewed by people who had used similar search terms.
If you viewed one of the recommended resources it then opened the record in another window and you were given the chance to rate the usefulness of the recommendation.
We also built a second interface – this one is a Google Gadget version with pretty much the same functions as the main interface.
We then also started to capture search terms used in the MyRecommendations (RISE) interface.
Now we can add search terms that are being used
So we’ve ended up with a set of data that can give us a range of different types of recommendations, from ‘people on your course are looking at these articles’ through ‘people who looked at this article also looked at this article’ to ‘people using this search term looked at these resources’. And we are sure that you could put the data to other types of use.
When we were looking at recommendations we thought the simplest approach was to start with something very basic. What drives the recommendations is a set of relationship values. Values are assigned based on resource views and subsequent ratings by users. The relationships are ranked by value, so the top ones get shown as recommendations.
Each relationship starts with a value of 0, then:
• +1 each time the resource is viewed
• +1 each time the recommendation is viewed
• +1 each time the recommendation is rated as ‘Useful’
• -2 each time the recommendation is rated as ‘Not Useful’
Recommendations are displayed in value order.
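The scoring rules above are simple enough to sketch in Python. The class and event names are hypothetical, and a relationship is represented here as an assumed (module, resource) pair; only the weights come from the rules themselves.

```python
from collections import defaultdict

# Weights taken from the rules above; the event names are illustrative.
WEIGHTS = {
    "resource_viewed": 1,
    "recommendation_viewed": 1,
    "rated_useful": 1,
    "rated_not_useful": -2,
}


class RelationshipValues:
    """Keep a running value for each relationship and rank by it."""

    def __init__(self):
        self.values = defaultdict(int)  # every relationship starts at 0

    def record(self, relationship, event):
        """Apply one event to a relationship and return its new value."""
        self.values[relationship] += WEIGHTS[event]
        return self.values[relationship]

    def top(self, n=5):
        """Return the n highest-valued relationships, best first."""
        return sorted(self.values, key=self.values.get, reverse=True)[:n]


if __name__ == "__main__":
    rv = RelationshipValues()
    pair = ("A123", "Resource B")
    rv.values[pair] = 14                      # illustrative starting value
    rv.record(pair, "resource_viewed")        # 15
    rv.record(pair, "recommendation_viewed")  # 16
    rv.record(pair, "rated_useful")           # 17
    print(rv.values[pair], rv.top(1))
```

Weighting ‘Not Useful’ at -2 means a single negative rating outweighs a single view, so poorly rated recommendations drop down the list quickly.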
With EZproxy data on its own there are limits to the recommendations you can make; they would mostly be about which are the most popular resources. Our main issue was getting access to bibliographic data about the articles being accessed and recommended. To create something meaningful you need to combine the activity (EZproxy) data with other data, such as student data. The more data you can get the better: the more data you add to the mix, the more types of recommendations you can then make. Licence restrictions on article-level metadata limit what you can store in your database.
The original plan for the project was to release an open data set of search data. We spent quite a lot of time looking at methods of anonymising the data: removing usernames, generalising courses to broad subjects, and looking at whether there was a threshold number of students that we needed on a course to be able to release any data from that course. We faced a major challenge because the activity data we had was fairly meaningless without some article metadata, and at the time we could only find metadata we could use ourselves, nothing we could make available in an open data set. So unfortunately it wasn’t possible to release the data. But others at EDINA, LIDP and SALT were able to do so.
The Google Gadget version of MyRecommendations will go into the list of tools for students. We are migrating the database so we can use it for more mainstream use. We plan to use it for the new JISC MACON (Mobilising Academic Content Online) mobile devices search and accessibility project. And we’re interested in how this data could be used by Learning Analytics, an OU project to gather all user and activity data together into one single data warehouse.
We are also looking at how we can use these approaches to provide personalised services to users through the library website, so we have been looking at being able to show people which articles are being looked at, and have been developing some beta services to demonstrate this.
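The three anonymisation steps mentioned here (dropping usernames, generalising courses to subjects, and applying a minimum course size) can be sketched in Python as follows. The row shape, the function name and the threshold of 5 are assumptions for illustration; the project did not state which threshold, if any, it settled on.

```python
def anonymise(rows, course_to_subject, min_students=5):
    """Return (subject, resource) rows that are safe to release.

    rows: iterable of (username, course, resource) tuples.
    course_to_subject: hypothetical mapping from course code to broad subject.
    min_students: assumed minimum number of distinct students on a course
        before any of that course's rows are released.
    """
    rows = list(rows)

    # Count distinct students per course.
    students = {}
    for user, course, _ in rows:
        students.setdefault(course, set()).add(user)

    released = []
    for _user, course, resource in rows:
        if len(students[course]) < min_students:
            continue  # too few students: drop the row entirely
        # The username is dropped and the course generalised to its subject.
        released.append((course_to_subject.get(course, "Other"), resource))
    return released


if __name__ == "__main__":
    rows = [("u1", "A123", "r1"), ("u2", "A123", "r2"), ("u3", "B456", "r3")]
    subjects = {"A123": "Arts", "B456": "Science"}
    print(anonymise(rows, subjects, min_students=2))
```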
Outline
• Open University (OU) context
• Why use activity data?
• Scope of the project
• What we did
• Evaluation and next steps
OU context
“The search engine on the library is not very user friendly. I had to find a specific article recommended in the text and it took several attempts to locate it.”
“The search facility is poor and doesn’t find stuff that is supposed to be there”
http://www.flickr.com/photos/james_lumb/3921968993/sizes/z/in/photostream
New search system
New generation discovery system from EBSCO (EDS)
http://www.flickr.com/photos/jiscimages/435135071/sizes/m/in/photostream/
Could we do more?
http://www.flickr.com/photos/davepattern/5808712333/sizes/z/in/photostream/
Recommendations Improve the Search Experience?
“That recommender systems can enhance the student experience in new generation e-resource discovery services”
Do recommendations improve the search experience?
• Can you use search data to make recommendations?
• Are recommendations useful in Discovery systems?
http://www.flickr.com/photos/davepattern/3473326634/sizes/z/in/photostream/
JISC Activity Data Programme
• JISC funded project
• February – July 2011
• One of eight projects [list at http://bit.ly/gwCmNS]
Why activity data?
"Every day I wake up and ask, how can I flow data better, manage data better, analyse data better?" Rollin Ford, the CIO of Wal-Mart
http://www.flickr.com/photos/zerimski/5215633183/sizes/z/in/photostream/
So what is in the EZproxy logs?
• Remote host
• Date/Time
• OUCU
• Request
• Status
• Size of response
• Referrer
• User agent
• Session
http://www.flickr.com/photos/vixon/116447718/sizes/m/in/photostream/
So what is in the EZproxy logs?
"0"|||"18.104.22.168"|||20110115235421|||"nn1234"|||"GET http://libezproxy.open.ac.uk:80/connect?Session=st3ShtizgtrS7tU5&url=http://search.ebscohost.com/login.aspx?direct=true&site=edslive&scope=site&type=0&cli0=FT&clv0=Y&cli1=FT1&clv1=Y&authtype=ip&group=VCStud&bquery=War%20Against%20the%20Panthers HTTP/1.1"|||302|||0|||http://library.open.ac.uk/|||"Mozilla/5.0 (X11; U; Linux i686; en-US;rv:22.214.171.124) Gecko/20101206 Ubuntu/10.10(maverick) Firefox/3.6.13"|||"t3ShtizgtrS7tU5"
So what is in the EZproxy logs? (the same record, with each field highlighted in turn)
• Date and time: 20110115235421
• User name: "nn1234"
• Request: the GET to http://libezproxy.open.ac.uk:80/connect carrying the EBSCOhost search URL (bquery=War%20Against%20the%20Panthers)
What can the data tell us?
• People on course ‘A’ viewed resource ‘B’
• People who looked at resource ‘C’ also looked at resource ‘D’
• Which are the most popular resources
• This resource is being used by people studying this course
But what isn’t there?
• ISSNs
• DOI
• Article information
• Subject terms
http://www.flickr.com/photos/kevharb/5466661946/sizes/z/in/photostream/
So how do you improve your data?
• EZproxy: Remote host | Date/Time | Oucu | request | status | size of response | referrer | user agent | session
• CIRCE: user type | course code(s)
• EDS and Crossref: Bibliographic data matching
So how do you improve your data?
• EZproxy: Remote host | Date/Time | Oucu | request | status | size of response | referrer | user agent | session
• CIRCE: user type | course code(s)
• EDS and Crossref: Bibliographic data matching
• RISE: Searches in RISE
What can the data tell us?
• People on course ‘A’ viewed resource ‘B’
• People who looked at resource ‘C’ also looked at resource ‘D’
• People who searched for subject ‘E’ looked at resource ‘F’
• People are looking at resources on this subject
• This resource is being used by people studying this course
Getting a recommendation
• User A views Resource B (Module A123): views +1, RV=14 becomes RV=15
• User C views recommended Resource B (Module A123): views +1, RV=15 becomes RV=16
• User C rates it ‘Useful’: +1, giving RV=17
• User C rates it ‘Not Useful’: -2, giving RV=14
Evaluation
• Online survey
• Face to face interviews
• Review of web analytics
Survey results: Related to records you have viewed
• Very useful: 45%
• Quite useful: 22%
• Slightly useful: 0%
• Not useful: 22%
• Not sure: 11%
Focus groups
Undergraduates
• Like ratings and reviews from other students
• ‘other people’s experiences valuable’
• Which module studied?
• How high a mark?
Postgraduates
• Citation as a recommendation
• Wary of provenance
• Feed to module website
• Want synonyms
• Trust repository
Face to face interviews
• First impressions of recommendations (course-related)
• Asked to enter a search term. Results and recommendations explored.
• Asked about relevance
• Asked about preference for type of recommendation
Should we have a recommender system?
“I think it would be a very good useful feature. It would be definitely very very useful” postgraduate Maths student
“Im afraid my first reaction is to be a bit sceptical - it presumably doesnt tell you if fellow students found the information/article useful or relevant to what they were looking for. I would hate to waste time following unproductive links laid down by others who might be failing students or think that any "lazy" students might develop poor practice by relying on what others had looked at. It sounds like a good idea but I think caution needs to be exercised.”
“I have just had a go, it was good with suggested papers that I had already found (which shows potential in my view) through Google.”