RSS feeds using Millennium data

I presented this talk on creating RSS feeds at the European Innovative Users Group (EIUG) 2010 conference held at Aston University, 15-16 June 2010.

I describe a method of exporting metadata held in the Innovative Millennium LMS so that it can be reused and presented as an RSS feed, in this case as new books lists. This is achieved using Free/Open Source software.

I explain how the process can be generalised to export other bibliographic data, for example exporting reading list information to a VLE (Blackboard) as XML, or presenting lists of e-resources on a Web site using a PHP front end.

  • I’m going to talk about work I did at Durham University. I no longer work there, having moved to the University of London. I’d like to thank Jon Purcell, University Librarian at Durham, for permission to talk to you about this system today.
  • The problem is the new books list. These are also known as acquisitions and accessions lists, which are more horrible, library-jargony terms for the same sort of thing. We’re talking about new books and stuff. New items, maybe? Suggestions welcome. What we’re going to do is create lists that make readers aware of new items available at the library.
  • Note: I do not recommend doing this as your only way of advertising new stock.
  • Durham’s previous solution was a featured list.
  • Problems with this… The featured list was taking up substantial staff time in manual tweaking each week. The list wasn’t split by subject, and there was no practical way of achieving this without taking up loads of review files. The list couldn’t easily be reused or displayed elsewhere. By academic year 2007-08 usage had dropped to just ~6 unique visitors a week. As the only advertising of new books, that’s not good enough.
  • The idea of moving new books to an RSS feed is one of those “obvious” Web 2.0 improvements libraries come up with. RSS feeds allow readers to view the lists wherever they want in their choice of client. “Save the time of the reader”. We can move much of the processing to automated scripts. “Save the time of the staff”. Better, we can reuse the RSS feeds to push our new books lists to other places – like the Web and Twitter. We’ll get on to that stuff later.
  • An important real reason for doing this was to pilot this approach of data export and processing. Making RSS feeds of new books is low-risk and demonstrates that the technique is workable before we start asking other University departments to do development work of their own to reuse our data.
  • So this is what I wanted to see at the end. I wanted a system that would run without requiring constant attention or manual fettling of data. It was important that this didn’t introduce additional, onerous work for our cataloguers. Even more important, I had no budget for any extra software.
  • I’m just going to assume everyone has a Millennium server. I wanted to make use of the excellent database and Web hosting platform provided by Durham’s IT department, so my choice of technologies was made. Of course you could use different scripting languages and databases. You might even run it all on Windows… but friends, why punish yourself?
  • I mentioned the featured list being created each week. This was based on marking items as “new” by changing their item status. “New books” as an idea was already integrated into the cataloguing workflow. It was an easy next step to export the contents of the review file and reuse it. This might not work for you. At Senate House I’ve found it best to talk to the head cataloguer to work through how to approach this.
  • Our cataloguer just needs to export these fields into a tilde-delimited text file. This file is saved onto a networked drive that will be accessible to my Linux server for processing.
  • Sorry for putting this wall of text in front of you. I wanted you to get an idea of what we’re actually working with.
  • Onwards… several Perl scripts running on a Linux server now do all the work here. Here’s how it works in practice: on a Friday morning, a cataloguer saves a copy of the exported list of new items. Shortly after, the script will run and notice there is new data. This is processed, then loaded into a database. It’s worth looking at the “processing” stage in a bit of detail. I promise not to subject you to pages of Perl script…
  • This is what “processing” the data means. I want to demystify this. Basically we’re just getting it into a form that can be loaded into MySQL. The program loads the exported data, rewrites it to tidy up the formatting, then writes it out into another file…
  • This is the processed version of the same item we looked at before. This is loaded straight into a MySQL database by the Perl script.
  • For clarity I wanted to break this item up to show you where the data has come from in Millennium. [This can be skipped]
  • A little bit about the database. There are two tables – items contains the new books themselves, whereas fundmap is a table to relate the Millennium fund codes to the subjects they represent. At Durham, the fund codes used can be trusted to always relate to the department they were purchased for. This won’t work very well at Senate House - I’m looking at using item locations instead.
  • Here’s a snippet of the fundmap table to show you what it looks like. We’re going to use the deptcode (department code) from this table to clump together multiple fundcodes into one department or subject name. I’ll spare you the gory detail as it involves SQL.
  • The final step is a PHP application that will actually serve the RSS feeds to the end user. It’s in PHP because that’s what is supported on the IT Service Web server. Down the bottom is the form of the URL for querying the database for new items. We’re using the history department code here.
  • This is a very broad outline… The PHP program connects to the database and selects items, either all of them or those for one subject. The sorting of the list happens at this stage – our feeds are sorted by shelfmark, which is DDC. I don’t want to wade through the whole PHP script telling you what it does; here are some highlights...
  • This is the finished, formatted RSS feed. Firstly the program writes in what are called the “channel” elements, data that describe the feed as a whole. Then for each entry retrieved from the database, we write out an “item” element. The <link> element is a link to the OPAC bib record display. You can include HTML in an RSS <description>. Most clients will render it. I’ve hyperlinked author, shelfmark and LC subject headings to searches in the OPAC. Subject headings are an attempt to provide some “find more like this” functionality. Presentation is meant to be simple. Everything has to make sense displayed out of context, away from a desktop PC. The 245 $a is used to present a nice short title for reuse elsewhere, for example.
  • So here’s the finished product in an RSS news reader. As you can see, I’ve not been keeping up with new books at Durham. It’s been working with very few problems since August 2007.
  • Here’s an example showing reuse of the RSS feeds to provide a display of new books on the Library Web site. The feeds can be reused elsewhere, such as in in-house screensavers and flat screen displays.
  • This is a summary of the process: start with a review file; export the bib and item data to a flat file; process it, then load it into a database; use this as a basis for creating RSS feeds.
  • Some lessons learned during this process. It’s easiest just to make everything use Unicode end-to-end from the very beginning. It’s polite and quite easy to write valid RSS or Atom feeds. Use feedvalidator to provide tips on good practice even if yours are already valid. We had very few complaints except from one or two people who’d been using the featured lists extensively. Only one person really ranted on about it…
  • Yes indeed, we can automate the review file and export stage. I recommend Expect.
  • Following this trial I implemented more automated export and processing of Millennium data. Any of these could easily be a separate presentation…! The Course Reserves feed creates an XML feed of reading list items which is read in by Blackboard. The e-resources feed creates chunks of HTML which are reused in the CMS to list databases and e-journals information. The fines feed securely uploads patron data direct to the university treasurer for end-of-year fines clearing purposes.

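The "processing" stage described in the notes (strip the quotation marks, trim the short title, lowercase the fund code, escape ampersands, add a date) can be sketched in a few lines. The talk's scripts were written in Perl and are not reproduced here; what follows is a minimal Python sketch under stated assumptions. The field names in `FIELDS` and the function name `tidy_record` are hypothetical, and the sketch assumes no field contains a tilde.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# Hypothetical names for the nine exported fields, in export order.
FIELDS = ["title_short", "title_full", "author", "imprint", "subjects",
          "record_no", "fund_code", "shelfmark", "location"]


def tidy_record(line, now=None):
    """Turn one tilde-delimited export line into a dict of tidied fields."""
    now = now or datetime.now(timezone.utc)
    values = line.rstrip("\n").split("~")  # assumes no "~" inside a field
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")

    record = {}
    for name, value in zip(FIELDS, values):
        # Strip the quotation marks wrapped around each field.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        record[name] = value

    # Strip trailing non-alphanumeric characters (e.g. ":") from the 245 $a.
    record["title_short"] = re.sub(r"[\W_]+$", "", record["title_short"])
    # Lowercase the fund code so it matches the lookup table.
    record["fund_code"] = record["fund_code"].lower()
    # Escape bare ampersands so the text is safe to embed in XML later.
    for name in FIELDS:
        record[name] = record[name].replace("&", "&amp;")
    # Add an RFC 822-style date for the feed's <pubDate>.
    record["pub_date"] = format_datetime(now)
    return record
```

Passing `now` explicitly keeps the output reproducible; in a scheduled job it would simply default to the current time.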
    1. RSS feeds using Millennium data. Andrew Preater, University of London Research Library Services. Presented at EIUG 2010, 15th June 2010. www.london.ac.uk
    2. A short break in County Durham. I work for University of London Research Library Services, at Senate House. But I will talk about my previous development work at Durham University Library.
    3. Introduction. The problem is the new books list. We use these to list new items for readers as a current awareness tool. Various ways to do this...
    4. A traditional approach
    5. Durham’s new books featured list
    6. Problems. High maintenance. Not split by subject; not easily ‘mashable’. Usage next to nothing by 2007-08. 10 hits!
    7. RSS feed improvements. Puts our metadata where the reader is. Much less work for library staff. Standards-based XML data, can be reused elsewhere or mashed up. RSS feed icon from www.feedicons.com
    8. Project as proof of concept. Low-risk pilot for automated export and processing of Millennium data. Demonstrates this utility for future projects. Quickest and easiest example using this approach.
    9. Desired outcomes. Automated as much as possible. Minimal effort by non-systems staff to maintain. No special software – no budget! Stable and reliable, ‘just works’.
    10. Software used. Other than Millennium... 1. Linux server with Perl installed 2. MySQL database 3. Web server running PHP
    11. Basic idea. A featured list was created each week based on changing book item status to ‘d’. So a ‘new books’ review file was being made... New step added: export the contents of the review file and reuse it.
    12. Export these fields: BIB MARC 245 $a; BIB MARC 245; BIB AUTHOR; BIB IMPRINT; BIB SUBJECT; BIB RECORD #; ITEM FUND CODE; ITEM SHELFMARK; ITEM LOCATION
    13. Example single item: "Dead white men and other important people:"~"Dead white men and other important people : sociologys big ideas / Ralph Fevre and Angus Bancroft."~"Fevre, Ralph, 1955-"~"Basingstoke : Palgrave Macmillan, 2010."~"Social sciences -- Philosophy.";"Sociology."~"b25978974"~"bgsoc"~"300.1 FEV"~"main4"
    14. Processing this list. Perl script run every 15 minutes by cron: 1. Checks if there is a new file 2. Processes the data 3. Loads it into a MySQL database 4. Cleans up
    15. Step 2: tidying up the data. 1. Replace & with &amp; 2. Insert RFC822-compliant date 3. Strip quotation marks around fields 4. Strip trailing non-alphanumeric character in 245 $a 5. Lowercase fund codes
    16. Step 2: example single item: |Dead white men and other important people|Dead white men and other important people : sociologys big ideas / Ralph Fevre and Angus Bancroft.|Fevre, Ralph, 1955-|Basingstoke : Palgrave Macmillan, 2010.|Social sciences -- Philosophy.";"Sociology.|b25978974|bgsoc|300.1 FEV|main4|Mon, 07 Jun 10 12:31:01 BST|Mon, 07 Jun 10 12:31:01 BST|
    17. Step 2: example single item
        245 $a: Dead white men and other important people
        245: Dead white men and other important people : sociologys big ideas / Ralph Fevre and Angus Bancroft.
        Author: Fevre, Ralph, 1955-
        Imprint: Basingstoke : Palgrave Macmillan, 2010.
        Subject: Social sciences -- Philosophy.";"Sociology.
        Record #: b25978974
        Fund code: bgsoc
        Shelfmark: 300.1 FEV
        Location: main4
        Date: Mon, 07 Jun 10 12:31:01 BST
    18. Database. Two tables are used: items is refreshed weekly and contains our books information; fundmap maps Millennium fund codes to subjects. Export is automated but doesn’t need to run weekly.
    19. fundmap example
        deptcode  fundcode  deptname                    foobar  site
        ECON      bceco     Economics & Finance                 DURHAM
        HIST      bchis     History                             DURHAM
        MEIS      bbcme     Govt & Intl Affairs/IMEIS           DURHAM
        MEIS      bxabc     Govt & Intl Affairs/IMEIS           DURHAM
        CTV       ctvl1     Trevelyan College Library           DURHAM
    20. Web front end. PHP script hosted on IT Service Web server will serve the feeds. http://www.dur.ac.uk/reading.list/newitems.php?dept=HIST Parameter is ‘all’ or a subject code.
    21. What it does: 1. Selects items from database 2. Writes beautiful, valid RSS 3. Serves it up to the browser. A bit more detail...
    22. Generating RSS feed XML. Write <title>, <description>, <link>, <image> once. For each database line, write one RSS news <item> ... Item <title> is 245 $a and links to catalogue bib record. Item <description> contains full item data. Can include encoded HTML. Item <description> author, shelfmark and subjects hyperlinked to catalogue search.
    23. Finished product - I. Shown in Akregator feed reader. Running happily since August 2007.
    24. Finished product - II. HTML version of RSS feeds on Library Web site. Also: in-house PC screensavers, plasma displays...
    25. Summary. Millennium review file → Exported flat file → Process and load into database → Display with Web front end
    26. Lessons learned. Easiest to use Unicode everywhere. Write valid RSS 2.0 or Atom, use http://feedvalidator.org for hints. Few complaints; change uncovered a tiny hard core of featured lists fans. That said...
    27. “Couldn’t you automate this?” You can automate much of it with Expect or AutoIt. Recommend Marc Dahl’s presentation on Expect for Innopac: http://bit.ly/dahl-expect
    28. Following on from this... Automated export and processing used for: exporting Course Reserves to Blackboard; display of e-resources data in CMS; sending fines data to Oracle Financials.
    29. Thank you! Any questions? Contact me. Email: andrew.preater@london.ac.uk Twitter: @preater
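The two-table design on slides 18 and 19 comes down to a join: look up which fund codes belong to a department code, then select the matching items in shelfmark order. The SQL used in the talk is not shown, so the following is only a sketch under stated assumptions. It uses Python's built-in `sqlite3` as a stand-in for MySQL; the `items` column names and the item rows are made-up placeholders, while the `fundmap` rows are taken from slide 19.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for the items and fundmap tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE fundmap (deptcode TEXT, fundcode TEXT, deptname TEXT);
        CREATE TABLE items (title TEXT, shelfmark TEXT, fundcode TEXT);
    """)
    conn.executemany("INSERT INTO fundmap VALUES (?, ?, ?)", [
        ("HIST", "bchis", "History"),
        ("MEIS", "bbcme", "Govt & Intl Affairs/IMEIS"),
        ("MEIS", "bxabc", "Govt & Intl Affairs/IMEIS"),
    ])
    # Placeholder items, invented purely to exercise the query.
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", [
        ("Example title A", "956 EXA", "bbcme"),
        ("Example title B", "320.9 EXB", "bxabc"),
        ("Example title C", "942 EXC", "bchis"),
    ])
    return conn


def new_items(conn, dept="all"):
    """Return (title, shelfmark) rows for one department code, or for all."""
    sql = ("SELECT DISTINCT i.title, i.shelfmark "
           "FROM items i JOIN fundmap f ON f.fundcode = i.fundcode")
    params = ()
    if dept != "all":
        # Parameterised, because dept would arrive from the URL query string.
        sql += " WHERE f.deptcode = ?"
        params = (dept,)
    sql += " ORDER BY i.shelfmark"
    return conn.execute(sql, params).fetchall()
```

With this data, `new_items(conn, "MEIS")` gathers the items bought on both `bbcme` and `bxabc`, which is the "clumping together" of fund codes the notes describe. Sorting shelfmarks as plain text is a simplification that only behaves for Dewey numbers with the same number of leading digits.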

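Slide 22 describes the feed structure: channel elements written once, then one `<item>` per database row, with HTML allowed inside `<description>` as long as it is encoded. The talk's front end was a PHP script that is not reproduced here; this is a rough Python sketch using the standard library's `xml.etree.ElementTree`. The OPAC base URL, its URL patterns, the record keys and the function name `build_feed` are all placeholders invented for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import quote_plus

OPAC = "https://opac.example.org"  # placeholder catalogue address


def build_feed(channel_title, records):
    """Build an RSS 2.0 document from a list of record dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # Channel elements describe the feed as a whole and are written once.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = OPAC
    ET.SubElement(channel, "description").text = "New items added to the library"

    for rec in records:
        item = ET.SubElement(channel, "item")
        # Short title (245 $a), linking to the bib record display.
        ET.SubElement(item, "title").text = rec["title_short"]
        ET.SubElement(item, "link").text = f"{OPAC}/record={rec['record_no']}"
        # HTML body with the author hyperlinked to a catalogue search.
        author_url = f"{OPAC}/search?author={quote_plus(rec['author'])}"
        body = (f"{escape(rec['title_full'])}<br/>"
                f'<a href="{escape(author_url)}">{escape(rec["author"])}</a>')
        # ElementTree entity-encodes this text when the XML is serialised.
        ET.SubElement(item, "description").text = body
        ET.SubElement(item, "pubDate").text = rec["pub_date"]

    return ET.tostring(rss, encoding="unicode")
```

Letting the XML library do the encoding means the records can hold raw text, so nothing gets escaped twice. The output is still worth running through a feed validator, as slide 26 suggests.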