1330 mon dochart2 brock


  • If we go back to 2009, it became obvious that library search simply didn’t work as well as users expected it to. We were regularly getting the sort of comments you see on screen, which showed that library users were struggling with the federated search system that we were using. So the library embarked on some work to improve search, by introducing a new discovery search system and making improvements to the web site.
  • We changed the search system to a new generation of library search system, EBSCO Discovery System (EDS). Instead of searching library resources individually and telling you how many results are in each database, it now searches one index and shows the results in a single list. Throughout 2010 and 2011 we worked with the system to make sure it was integrated into our improvements to the library web site and included as many of our resources as possible. We are still in the process of making further enhancements to include subject based searching.
  • We then started thinking about whether there was more that we could do to improve the user experience. For a while we’d been following with interest some JISC work, in projects such as TILE and MOSAIC, looking at whether activity data could be used by libraries to improve services. So we started to think about whether our own activity data could improve the user experience of library search.
  • So when we knew that JISC were going to be funding some more work on activity data, we thought about what we’d want to do, and came up with this hypothesis.
  • The project we came up with was RISE: Recommendations Improve the Search Experience. We set out to test two things. Can you use search data to make recommendations? Are recommendations useful for these new systems?
  • RISE was funded as part of the Activity Data strand of the JISC Infrastructure for Education and Research programme. It was a very short project, just six months, with a small team consisting of a developer and a project manager. There were seven other projects in the programme. Some were working with libraries, such as SALT and LIDP. Others were looking at activity data in a range of other areas, from Virtual Learning Environments, through repositories and student systems, to video-conferencing data. They included the UCIAD project in the OU’s Knowledge Media Institute, which looked at a user-centred approach to web click stream data.
  • So why the focus on activity data? Since the early 1990s the business sector, particularly companies such as Tesco, Amazon and Wal-Mart, has been exploiting the data it holds about customer activities to support decision making. Industry analysts have noted how many big retailers now use complex algorithms to analyse large amounts of customer data to create new revenue opportunities and increase customer retention. Equally, publishers are looking at ways to reach readers who are interested in particular topics online. On Facebook, Twitter and other social networking sites, worldwide networks of friends and people with similar interests form a large online community of potential readers who are just waiting to discover new relevant content. Knowing what these people are looking at, buying, and recommending could be key to marketing products in the years to come. Some early research by JISC, in the TILE and MOSAIC projects, identified that the HE sector also had extensive user data with some potential for use, but that it was greatly underused. So this JISC programme has set out to explore this area in more detail. Across the sector we are being told to be more business-like, and the use of customer data is one of the areas that businesses seem to be exploiting far more than we do.
  • For a traditional ‘bricks and mortar’ university these are some of the ways that you’d typically interact with your customers. Well, for the OU things are a bit different…
  • We don’t really loan many books to students or have many accessing the library. All our students are distance learners, so they interact with us online and use our resources electronically. And with more than 450,000 unique users of our website and over 100,000 unique users of our e-resources each year, there’s a fair amount of activity data for us to use.
  • So, if we are concentrating on our e-resources, the systems we use are as follows. There is SAMS single sign-on. There is the EZProxy system from OCLC, which allows students to access our resources as if they were locally within the library. We are using SFX from ExLibris as our resources knowledge base and as the OpenURL link resolver. And finally there is the EBSCO Discovery System in place of an older federated search system.
  • The stages of the project were to build the database, fill it with activity data, write some software to create the recommendations, create a search interface to show the recommendations, test it with some users and get feedback.
  • We push as much as possible through EZProxy, so we use it for access through our discovery system, for links from SFX, and for links placed in our VLE. So it seemed the obvious choice as the place to start to look at e-resource activity data. We didn’t have access to the EBSCO Discovery log files and we hadn’t been using that system for long, whereas we did have a few months of log files from EZProxy. So we started with the EZProxy log files as the core dataset.
  • So when we start to look in detail at what data is contained within the log files, you’ve got some useful data and other data that isn’t so useful for activity data purposes. We know the user name: that’s the OUCU, the Open University Computer User account name. We know the request, that is, the website that is being accessed. So when you look at the detail of the record what you get is…
  • Something that looks like this (we’ve anonymised the OUCU for obvious reasons). This is one record out of tens of thousands of rows, but with a bit of work you can break it down to…
  • So you’ve got the date and time – useful to be able to know when something happened
  • And the username of the user
  • And the request that has been made – in this case an EBSCOHost search
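The breakdown described in the last few notes can be sketched in Python. This is only an illustration, not the project's own code: it assumes the `|||` delimiter and field order seen in the sample record on the slides, the names `FIELDS`, `parse_record` and `search_terms` are made up for this sketch, and the first two columns of the sample are not explained on the slides, so they are given placeholder names.

```python
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Assumed column order, taken from the sample record on the slides.
# "col0" and "col1" are placeholders: the slides do not say what they hold.
FIELDS = ["col0", "col1", "timestamp", "oucu", "request", "status",
          "size", "referrer", "user_agent", "session"]

# The anonymised sample record from the slides, with straight quotes.
SAMPLE = (
    '"0"|||""|||20110115235421|||"nn1234"|||'
    '"GET http://libezproxy.open.ac.uk:80/connect?Session=st3ShtizgtrS7tU5'
    '&url=http://search.ebscohost.com/login.aspx?direct=true&site=edslive'
    '&scope=site&type=0&cli0=FT&clv0=Y&cli1=FT1&clv1=Y&authtype=ip'
    '&group=VCStud&bquery=War%20Against%20the%20Panthers HTTP/1.1"'
    '|||302|||0|||http://library.open.ac.uk/|||'
    '"Mozilla/5.0 (X11; U; Linux i686; en-US;rv: Gecko/20101206 '
    'Ubuntu/10.10(maverick) Firefox/3.6.13"|||"t3ShtizgtrS7tU5"'
)


def parse_record(line):
    """Split one log line into named fields and pull out the search terms."""
    parts = [p.strip().strip('"') for p in line.split("|||")]
    record = dict(zip(FIELDS, parts))
    record["timestamp"] = datetime.strptime(record["timestamp"], "%Y%m%d%H%M%S")
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    # The request looks like "GET <url> HTTP/1.1"; the URL is the second token.
    tokens = record["request"].split()
    url = tokens[1] if len(tokens) > 1 else ""
    # In this sample, EBSCOhost searches carry the query in "bquery".
    query = parse_qs(urlsplit(url).query)
    record["search_terms"] = query.get("bquery", [None])[0]
    return record


record = parse_record(SAMPLE)
print(record["timestamp"], record["oucu"], record["search_terms"])
```

Not every request will carry a `bquery` parameter, in which case `search_terms` is simply `None`.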
  • So our database starts to build up with details of the user and resources
  • We can then get data about the course(s) that students were studying from our internal student information system.
  • This tells us their course, subject area of interest and degree programme.
  • So, the data we have so far can tell us which courses people are on, so we can make recommendations based on that, i.e. these are the most popular resources that people on your course are looking at. We can also start to say that if you looked at resource C and then straightaway looked at resource D, there is a likelihood that there is some relationship between resource C and resource D. And we can also say which overall are the most popular articles or journals.
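As a rough illustration of those two ideas, the sketch below counts views per course and counts pairs of resources viewed one after the other by the same user. It is a minimal sketch under stated assumptions, not the RISE code: the function names, the tuple layout of `views`, and the 10 minute window standing in for ‘straightaway’ are all invented for the example.

```python
from collections import Counter
from datetime import datetime, timedelta


def course_popularity(views, enrolments):
    """Count views of each resource per course.

    views: iterable of (user, resource, when) tuples.
    enrolments: dict mapping user -> set of course codes.
    """
    counts = Counter()
    for user, resource, _when in views:
        for course in enrolments.get(user, ()):
            counts[(course, resource)] += 1
    return counts


def consecutive_pairs(views, max_gap=timedelta(minutes=10)):
    """Count (first, next) resource pairs viewed back to back by one user.

    max_gap is an assumed threshold for 'straightaway'.
    """
    by_user = {}
    for user, resource, when in sorted(views, key=lambda v: (v[0], v[2])):
        by_user.setdefault(user, []).append((resource, when))
    pairs = Counter()
    for events in by_user.values():
        for (first, t1), (second, t2) in zip(events, events[1:]):
            if first != second and t2 - t1 <= max_gap:
                pairs[(first, second)] += 1
    return pairs


# Made-up sample data: two users on the same course.
t0 = datetime(2011, 1, 15, 23, 0, 0)
views = [
    ("u1", "C", t0),
    ("u1", "D", t0 + timedelta(minutes=2)),
    ("u2", "C", t0),
    ("u2", "D", t0 + timedelta(minutes=3)),
    ("u2", "B", t0 + timedelta(hours=2)),
]
enrolments = {"u1": {"A123"}, "u2": {"A123"}}

popularity = course_popularity(views, enrolments)
pairs = consecutive_pairs(views)
print(popularity.most_common(3))
print(pairs.most_common(3))
```

A fixed time window is only one way to approximate ‘straightaway’; grouping by EZProxy session would be another.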
  • But there are limitations. From the logs you don’t always know what search terms were used or have much information about the item that is being accessed. And if you want to make a recommendation you don’t even have an article or journal title to show as the recommendation. So we looked at how we could improve the data. At the moment we use another EBSCO EDS API call to extract bibliographic details, which are then used to retrieve data from Crossref that we can store in the database.
  • That meant we could then retrieve some bibliographic data. Originally we’d hoped that we would be able to store basic metadata from EBSCO in the system, but after discussion with them we realised that the license terms wouldn’t let us do that. So we had to look for other metadata sources that we could use. We set the system up to retrieve data keys from EBSCO and use them to search Crossref. The Crossref data license allows you to store that data locally.
  • We created a test search interface called MyRecommendations, using the EBSCO EDS API, to test recommendations with users. It gave a search screen, and included recommendations based on other resources the user had recently viewed.
  • Once a search had been carried out the search results page presented the user with further recommendations based on the articles viewed by people who had used similar search terms.
  • If you viewed one of the recommended resources it then opened the record in another window and you were given the chance to rate the usefulness of the recommendation.
  • We also built a second interface – this one is a Google Gadget version with pretty much the same functions as the main interface.
  • We then also started to capture search terms used in the MyRecommendations (RISE) interface.
  • Now we can add search terms that are being used
  • So we’ve ended up with a set of data that can give us a range of different types of recommendations, from ‘people on your course are looking at these articles’ through ‘people who looked at this article also looked at this article’ to ‘people using this search term looked at these resources’. And we are sure that you could put the data to other types of use.
  • When we were looking at recommendations we thought that the simplest approach was just to start with something very basic. What drives the recommendations is a set of relationship values. Values are assigned based on resource views and subsequent ratings by users. The relationships are ranked according to value, so the top ones get shown as recommendations.
  • Each relationship starts at value 0 and then changes as follows: +1 each time the resource is viewed; +1 each time the recommendation is viewed; +1 each time the recommendation is rated as ‘Useful’; -2 each time the recommendation is rated as ‘Not Useful’. Recommendations are displayed in value order.
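The scoring rule above is simple enough to sketch directly. The weights come from the notes. Everything else here is an assumption made for illustration: the class name `RelationshipStore`, the event names, and keying relationships by (context, resource), where the context might be a module code, another resource or a search term.

```python
from collections import defaultdict

# Weights as described in the notes.
EVENT_WEIGHTS = {
    "resource_viewed": 1,
    "recommendation_viewed": 1,
    "rated_useful": 1,
    "rated_not_useful": -2,
}


class RelationshipStore:
    """Hypothetical in-memory store of relationship values."""

    def __init__(self):
        # Every relationship starts at value 0.
        self.values = defaultdict(int)

    def record(self, context, resource, event):
        """Apply one event and return the new relationship value."""
        self.values[(context, resource)] += EVENT_WEIGHTS[event]
        return self.values[(context, resource)]

    def top(self, context, n=5):
        """Return up to n resources for a context, highest value first."""
        ranked = sorted(
            ((value, res) for (ctx, res), value in self.values.items()
             if ctx == context),
            key=lambda pair: -pair[0],
        )
        return [res for _value, res in ranked[:n]]


# Made-up usage: module A123 with two related resources.
store = RelationshipStore()
for _ in range(3):
    store.record("A123", "Resource B", "resource_viewed")
store.record("A123", "Resource C", "resource_viewed")
store.record("A123", "Resource C", "rated_not_useful")
print(store.top("A123"))
```

Weighting a ‘Not Useful’ rating at -2 means a single negative rating cancels two views, so a poorly rated recommendation drops down the list faster than views alone would raise it.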
  • Any system that deals with personal data has to be mindful of privacy and data protection requirements. After discussion within the JISC Activity Data programme and some helpful information, particularly from EDINA’s OpenURL project, we put together a specific privacy policy and discussed it with our data protection people at the University. The policy explicitly covered activity data and we have linked to it from the RISE interfaces, from our main EBSCO EDS page and from SFX. The policy gives people an opt-out to have their data removed from the recommendations, even though they aren’t identified personally in any of the recommendations. With the new EU ‘cookies’ legislation we are doing some more work to ensure that we are legally compliant. Ideally we would want any institutional ‘cookie’ policy and agreement to cover permission to use data for this type of activity.
  • With EZProxy data on its own there are limits to the recommendations you can make; they would mostly be about which are the most popular resources. Our main issue was to get access to bibliographic data about the articles being accessed and recommended. To create something meaningful you need to combine the activity (EZProxy) data with other sources, such as student data. The more data you can get the better: the more data you add to the mix, the more types of recommendations you can then make. License restrictions on article level metadata limit what you can store in your database.
  • The original plan with the project was to be able to release an open data set of search data. We spent quite a lot of time looking at methods of anonymising the data, by removing usernames, generalising courses to broad subjects and looking at whether there was a threshold of students that we needed on a course to be able to release any data from that course. We faced a major challenge because the activity data we had was fairly meaningless without some article metadata, and at the time we could only find data we could use ourselves and nothing we could make available in an open data set. So unfortunately it wasn’t possible to release the data. But others at EDINA, LIDP and SALT were able to do so.
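The three anonymising steps mentioned above (removing usernames, generalising courses to broad subjects, and applying a minimum number of students per course) could look something like the sketch below. It is purely illustrative: the function name, the row layout, the course-to-subject mapping and the threshold value are all assumptions, and the notes do not say what threshold, if any, was settled on.

```python
from collections import defaultdict


def anonymise(rows, course_to_subject, min_students=5):
    """Return rows that are safer to share.

    rows: list of dicts with 'oucu', 'course' and 'resource' keys.
    course_to_subject: dict mapping course code -> broad subject.
    min_students: assumed threshold; courses with fewer distinct
        students are left out entirely.
    """
    students = defaultdict(set)
    for row in rows:
        students[row["course"]].add(row["oucu"])

    released = []
    for row in rows:
        if len(students[row["course"]]) < min_students:
            continue  # too few students on this course to release
        released.append({
            # The username is dropped; the course becomes a broad subject.
            "subject": course_to_subject.get(row["course"], "Other"),
            "resource": row["resource"],
        })
    return released


# Made-up sample data with a low threshold for demonstration.
rows = [
    {"oucu": "aa1", "course": "A123", "resource": "Article 1"},
    {"oucu": "bb2", "course": "A123", "resource": "Article 2"},
    {"oucu": "cc3", "course": "B456", "resource": "Article 3"},
]
subjects = {"A123": "Arts", "B456": "Business"}
released = anonymise(rows, subjects, min_students=2)
print(released)
```

Steps like these reduce the risk of identifying individuals but do not remove it on their own, which is part of why the threshold question mattered.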
  • The Google Gadget version of MyRecommendations will go into the list of tools for students. We are migrating the database so we can use it for more mainstream use. We plan to use it for the new JISC MACON (Mobilising Academic Content Online) mobile devices search and accessibility project. And we’re interested in how this data could be used by Learning Analytics, an OU project to gather all user and activity data together into one single data warehouse.
  • We are also looking at how we can use these approaches to provide personalised services to users through the library website, so have been looking at being able to show people what articles are being looked at and have been developing some beta services to demonstrate this.

    1. 1. Recommendations Improve the Search Experience. Alison Brock. www.open.ac.uk/blogs/rise
    2. 2. Outline: Open University (OU) context; Why use activity data?; Scope of the project; What we did; Evaluation and next steps
    3. 3. OU context: “The search engine on the library is not very user friendly. I had to find a specific article recommended in the text and it took several attempts to locate it.” “The search facility is poor and doesn’t find stuff that is supposed to be there” http://www.flickr.com/photos/james_lumb/3921968993/sizes/z/in/photostream
    4. 4. New search system: new generation Discovery System from EBSCO (EDS) http://www.flickr.com/photos/jiscimages/435135071/sizes/m/in/photostream/
    5. 5. Could we do more? http://www.flickr.com/photos/davepattern/5808712333/sizes/z/in/photostream/
    6. 6. Recommendations Improve the Search Experience? “That recommender systems can enhance the student experience in new generation e-resource discovery services”
    7. 7. Do recommendations improve the search experience? Can you use search data to make recommendations? Are recommendations useful in Discovery systems? http://www.flickr.com/photos/davepattern/3473326634/sizes/z/in/photostream/
    8. 8. JISC Activity Data Programme: JISC funded project; February – July 2011; one of eight projects [list at http://bit.ly/gwCmNS]
    9. 9. Why activity data? "Every day I wake up and ask, how can I flow data better, manage data better, analyse data better?" Rollin Ford, the CIO of Wal-Mart http://www.flickr.com/photos/zerimski/5215633183/sizes/z/in/photostream/
    10. 10. Typical library activity data: Loans; Holds; Computer bookings; Library access; e-resources
    11. 11. OU Library activity data: Loans; Holds; Computer bookings; Library access; e-resources
    12. 12. OU Library systems environment: Athens DA authentication built into local (SAMS) login system; EZProxy remote resource access; SFX knowledge base and OpenURL link resolver; EBSCO Discovery Solution
    13. 13. Scope of our project: Activity data; Algorithms & recommender code; Search interface
    14. 14. What data is RISE using? bookmarklet
    15. 15. So what is in the EZProxy logs? • Remote host • Date/Time • OUCU • Request • Status • Size of response • Referrer • User agent • Session http://www.flickr.com/photos/vixon/116447718/sizes/m/in/photostream/
    16. 16. So what is in the EZProxy logs? "0"|||""|||20110115235421|||"nn1234"|||"GET http://libezproxy.open.ac.uk:80/connect?Session=st3ShtizgtrS7tU5&url=http://search.ebscohost.com/login.aspx?direct=true&site=edslive&scope=site&type=0&cli0=FT&clv0=Y&cli1=FT1&clv1=Y&authtype=ip&group=VCStud&bquery=War%20Against%20the%20Panthers HTTP/1.1"|||302|||0|||http://library.open.ac.uk/|||"Mozilla/5.0 (X11; U; Linux i686; en-US;rv: Gecko/20101206 Ubuntu/10.10(maverick) Firefox/3.6.13"|||"t3ShtizgtrS7tU5"
    17. 17. So what is in the EZProxy logs? [Same record as slide 16, with the date and time highlighted: 20110115235421]
    18. 18. So what is in the EZProxy logs? [Same record as slide 16, with the user name highlighted: "nn1234"]
    19. 19. So what is in the EZProxy logs? [Same record as slide 16, with the request highlighted: the GET request to search.ebscohost.com via libezproxy.open.ac.uk]
    20. 20. RISE database
    21. 21. RISE database. EZProxy: Remote host | Date/Time | OUCU | request | status | size of response | referrer | user agent | session. CIRCE: user type | course code(s)
    22. 22. RISE database
    23. 23. What can the data tell us? People on course ‘A’ viewed resource ‘B’. People who looked at resource ‘C’ also looked at resource ‘D’. Which are the most popular resources. This resource is being used by people studying this course.
    24. 24. But what isn’t there? ISSNs; DOI; Article information; Subject terms http://www.flickr.com/photos/kevharb/5466661946/sizes/z/in/photostream/
    25. 25. So how do you improve your data? EZProxy: Remote host | Date/Time | OUCU | request | status | size of response | referrer | user agent | session. CIRCE: user type | course code(s). EDS and Crossref: bibliographic data matching
    26. 26. MyRecommendations
    27. 27. http://www.google.com/ig/directory?type=gadgets&url=library.open.ac.uk/rise/google_gadget/risesearch.xml
    28. 28. So how do you improve your data? EZProxy: Remote host | Date/Time | OUCU | request | status | size of response | referrer | user agent | session. CIRCE: user type | course code(s). EDS and Crossref: bibliographic data matching. RISE: searches in RISE
    29. 29. RISE database
    30. 30. What can the data tell us? People on course ‘A’ viewed resource ‘B’. People who looked at resource ‘C’ also looked at resource ‘D’. People who searched for subject ‘E’ looked at resource ‘F’. People are looking at resources on this subject. This resource is being used by people studying this course.
    31. 31. Getting a recommendation
    32. 32. Getting a recommendation. User A views Resource B (Module A123): views +1, RV=14 becomes RV=15. User C views recommended Resource B (Module A123): views +1, RV=15 becomes RV=16. User C rates Resource B (Module A123) Useful: +1, RV=17. User C rates Resource B (Module A123) Not Useful: -2, RV=14.
    33. 33. Data Protection and privacy: Added a privacy policy to RISE, EDS and SFX interfaces. Provided an opt-out feature. Privacy and opt-out URL: http://library.open.ac.uk/rise/?page=privacy
    34. 34. Evaluation: Online survey; Face to face interviews; Review of web analytics
    35. 35. Survey results: recommendations related to records you have viewed [pie chart]: Very useful 45%; Not sure 11%; Not useful 22%; Quite useful 22%; Slightly useful 0%
    36. 36. Survey results
    37. 37. Focus groups. Undergraduates: Like ratings and reviews from other students; ‘other people’s experiences valuable’; Which module studied?; How high a mark? Postgraduates: Citation as a recommendation; Wary of provenance; Feed to module website; Want synonyms; Trust repository
    38. 38. Face to face interviews: First impressions of recommendations (course-related). Asked to enter a search term. Results and recommendations explored. Asked about relevance. Asked about preference for type of recommendation.
    39. 39. Should we have a recommender system? “I think it would be a very good useful feature. It would be definitely very very useful” (postgraduate Maths student). “Im afraid my first reaction is to be a bit sceptical - it presumably doesnt tell you if fellow students found the information/article useful or relevant to what they were looking for. I would hate to waste time following unproductive links laid down by others who might be failing students or think that any "lazy" students might develop poor practice by relying on what others had looked at. It sounds like a good idea but I think caution needs to be exercised.” “I have just had a go, it was good with suggested papers that I had already found (which shows potential in my view) through Google.”
    40. 40. Recommendations usage [chart: usage of Relationship, Course and Search recommendations; vertical axis 0 to 2000, horizontal axis 1 to 4]
    41. 41. Findings and lessons learnt • Users like recommendations ‘in principle’ • Recommendations provenance • Interest in the search tools
    42. 42. Findings and lessons learnt • EZProxy data • Use other data sources • Search terms • Need more data • License restrictions on metadata
    43. 43. Open Data http://www.flickr.com/photos/okfn/6262973028/sizes/z/in/photostream/
    44. 44. What next? • Google Gadget search tool • Recommendations database • MACON • Learning Analytics http://www.flickr.com/photos/shandrew/2102808886/sizes/m/in/photostream/
    45. 45. What next?
    46. 46. Blog: www.open.ac.uk/blogs/RISE Code: http://code.google.com/p/rise-project/source/browse/trunk/rise/
    47. 47. My thanks go to Richard Nurse and Liz Mallett of the Open University Library for giving me the use of their slides on the project for this presentation. Any questions?