Perspectives on National Library of Australia Developments Part 1 Rose Holley


Presentation given at the GLAM-wiki event held in Canberra, Australia, August 6-7, 2009, on how far the Australian Newspapers Digitisation Service meets the needs of next-generation users and Wikimedians, and on recent user engagement activities. Presented by Rose Holley and Kent Fitch.

Published in: Technology, Education

  • We are pleased to be able to speak here today. Kent and I will be giving our personal perspectives on some of the innovative work the National Library of Australia is doing with users and user generated content. Our views are not necessarily those of the organisation as a whole but rather our personal viewpoints.
  • I want to focus on the Australian Newspapers program, which Kent and I have been working closely on for the last 2 years, because the Library is using it as a test case for user interaction and user generated content. It was always intended that what was learnt from this would be applied to other Library collections. The digitised newspapers were released in a beta service exactly one year ago. The service was released with 1 million articles and now contains 5 million. We aim to have 40 million articles available by the end of 2010. The service has had no planned promotion as yet, with most users hearing of it via blogs and forums. However, we are planning a public release next week and will be removing the beta status. So far around 2000 highly active users have corrected 5 million lines of text. We encourage this because the computer generated OCR text from old newspapers is of such poor quality, and the only way to improve it is by hand. As soon as users improve the text, the improved version is available to all. They have also been adding tags to articles (around 100,000 so far) and can add historical comments to articles. Without any formal publicity we have had around half a million users so far, and the public and researchers are very excited and pleased about the service. It has revolutionised the speed at which people can do research because the newspapers are available online and are full-text searchable.
  • This is what you see if you do a keyword search for ‘the price of petrol’. We are looking at article view. Users can zoom in or out and choose to view the article in the context of the entire page. They can also navigate to any other page within the newspaper issue. The electronically generated text created through the OCR process is displayed on the left hand side. This is also where users can use the 3 enhancement features. Users can tag the article with keywords and they can write comments and notes about the article. If users log in they can choose to make their tags and comments public or private, so they can share their comments with all users or add private research notes that only they can access. The innovative feature, which is not available in any other online newspaper service, is the ability for the user to correct the electronically generated text. Users can correct the text by clicking on the ‘Help fix this text’ button.
  • Text correction is perhaps not the best way to describe what is happening since actually users are enhancing the text. Every enhancement layer is kept and the history is viewable. All enhancement layers as well as the original text are searched across.
  • We don’t know exactly how many users are actively and regularly undertaking text correction, but last year it was 1200 and we know it has increased since then. The text correctors are not all tagging as well; the taggers are mostly different users. Our top text correctors put in around 40 hours a week to remain in the top 10 and are very dedicated to the task. They chart their progress in our ‘hall of fame’.
  • The Australian Newspapers program has been an experiment in innovation and software development, and it has been very successful. The user interactions and their results, which I have talked about and reported on widely, have convinced the National Library, the international library community and the users that enabling this type of activity is good for everyone. There was of course initial uncertainty, and perhaps fear, from some quarters. But at the end of the day the data is improved, which is what the institution wants, and the users feel involved and valued, which is what they want.
  • Some of the major concerns of the GLAM sector are listed on this slide. Interestingly there was a double concern: either the community would start madly using and vandalising the data, or they would show a complete lack of interest and, after all our hard work on developing a text correction/enhancement module, not bother to use it. This was mainly why we decided to release as a beta or test version and not to build a moderation module at the start, in case the users didn’t want to do text correction. However we didn’t take into account reverse psychology and motivational factors. Most users were initially rather surprised and awed that they could actually change the text. They understood the huge level of trust that had been placed in them and responded appropriately and responsibly. It was clear to them from the start that there was no moderation and that we would trust their goodwill, and they rose to the challenge. It is worth noting that some people likened it to Wikipedia, but it is not the same. We have the fallback that users can always see the original image, so they can compare the text themselves and see whether someone has changed it incorrectly. Also all our user profiles are open, so anyone can see what a particular user has been doing. Once users understood that they weren’t overwriting or deleting anything, just building up layers, they were much more comfortable with it. The layers also enable rollback if necessary. Lastly, losing control. Users were quite insistent that we should give them guidelines, rules and co-ordination rather than leave it open, which to date we still haven’t done. Gatekeepers should open as well as close doors, and in my opinion opening a door as powerful as this is surely not losing control but gaining power. So the issues have effectively been put to bed.
  • Now I will step into the Wikimedians’ shoes and think about what they want from online sites in 2009, and how far Australian Newspapers has met their needs. These assumptions, 1 to 5, are what we started with; we tried to address all of them and have mostly been able to. That is, most users want to find stuff via Google rather than come direct to your institution’s website; they want to link to useful items they find, know whether they can re-purpose items, and find and use stuff in their own spaces.
  • As soon as we could we contacted Google and set up a harvesting site specifically to trawl the newspapers in the Google News Archive. We made sure that if you came to an article directly via Google it would be clear where you were when you arrived. We populated Wikipedia newspaper biography pages with links to the digitised titles, and we dynamically pulled the information from Wikipedia through to our own website as a good central source. I will show you this shortly. If there were no relevant Wikipedia pages I created them to get the ball rolling. I have to say we were not flavour of the month in the library sector for doing this, since there was a strong feeling that a catalogue record should be the primary source. However newspapers are catalogued very poorly, if at all, and having a link to a source that contains a lot more about a newspaper than just its bibliographic details seemed preferable to us. I suggested that if libraries had written newspaper biographies they should add the link to the Wikipedia pages. Of course most libraries then said they hadn’t got the time to do this anyway, whereas interested members of the public have. As far as I am concerned, adding your library site as an external link to a relevant Wikipedia page is an entirely safe process and will only boost your traffic. I do understand the concerns libraries have about creating information directly in a Wikipedia page, which may then be changed or improved.
  • These charts show that we are slowly achieving our objective of gaining traffic to the site via other routes. We never expected users coming direct to our site to be the primary way in. The pale blue shows the growing traffic from search engines.
  • It’s really important on a site like this to have persistent identifiers (PIs), or stable URLs, so that when users link to things the links don’t break this year or next. We re-wrote the National Library’s policy for PIs in order to be able to create a hierarchical PI system for newspapers. We created PIs at title, page and article level and clearly showed them under a ‘cite this’ link. Users like this a lot. Also, if they print out articles the PI will always appear on the printout. One area where we fall down is having a clear copyright/re-use statement. At present it is the Library’s opinion that all newspapers prior to 1954 are out of copyright (which covers everything in the service), and in order to promote accessibility we are very happy for re-purposing and re-use. However, although we have said this verbally a lot, it is still not in writing on the site, which is of course not helpful to users. There is still an extreme caution barrier to be overcome before this step can be taken.
  • Pushing it out there to users’ spaces is technically very simple, but there are hurdles to overcome with regard to traditional library practices and the way libraries communicate with their users.
  • Sharing our code was something we planned to do from the start, and if you want it, just ask us. Our mission is to help enable more people to access newspapers and to help institutions enable the same level of user interaction in their services that we have in ours.
  • So now let’s have a look at what I’m talking about.
  • Here is the home page of Australian Newspapers.
  • This is a newspaper title information page, showing the date range digitised, the title PI and information being pulled from Wikipedia.
  • A closer view shows that we bring in the first 4 lines of the Wikipedia page. If you click the link to see the full Wikipedia entry then …
  • This is what you would see for the Argus. We’ve added the link to the digitised newspapers as an external link at the bottom. For most newspapers there is already a page existing, but if there wasn’t I created one and it quickly got populated.
  • This Wikipedia page for the Sydney Gazette shows that a user has already popped in an image from our site. (If anyone feels like editing this, the user has actually used the URL rather than the PI, so that needs to be changed!)
  • Now if I want to search full text in a newspaper via Google (which a lot of genealogists are doing with family names), I’ll type some words into the main Google web search and click search.
  • How good is this? A Wikipedia entry for the newspaper comes first, and then it has found the article with these words (first the corrections and then the article), all clearly cited in the results list.
  • Alternatively, if I had searched in Google News Archive, the result is still number 2, clearly cited.
  • If we click either link this is what we get: directly through to the article in the beta service, again clearly cited. Of course I have picked this article on purpose for what it says and its relevance to Wikipedia and why we are here today. Someone suggested it should be the Wikipedia manifesto, and it says ‘innumerable as the obstacles were which threatened to oppose our undertaking, yet we are happy to affirm that they were not insurmountable, however difficult the task before us. The utility of a paper in the colony as it must open a source of solid information will, we hope, be universally felt and acknowledged. We have courted the assistance of the ingenious and the intelligent. We open no channel to political discussion or personal animadversion – information is our only purpose.’
  • Things I am interested in are the future potential for user activity in text enhancement of full-text collections, and how to co-ordinate and manage thousands of willing volunteers who really want to help. This is not something which cultural heritage institutions have done much of. However I can see from the Wikipedia phenomenon and the Mormons’ FamilySearch Indexing phenomenon that we are only touching the tip of the iceberg here with volunteers. FamilySearch told me that in 4 years they went from 0 to 160,000 volunteers. Potentially every Australian at home with internet access could be a volunteer, so that is a pool of around 10 million people at least.
  • What strikes me is what a lot of power is out there and also sitting in this room. There is the power of the public and the power of the GLAM sector. What we really need to do is harness these 2 powers and have ‘super power’, and that is what this event is about. Traditionally the GLAM sector has held the power and control over data, but the Australian Newspapers service has demonstrated how that can shift very effectively and usefully towards the community, with great outcomes for both. Barack Obama said “Don’t under-estimate the power of people who join together …. They can accomplish amazing things”. This is true, but the public could do even more if the GLAM sector committed to really pro-actively enabling this on a much larger scale. We know technically we can do it and that’s not what’s holding us back. In my experience of managing IT projects for the last few years it’s very rarely technical issues that hold us back; it’s other things. For example, it has not been technically hard to implement text correction/enhancement, so why did no one do it before? It required creative thinking to solve a problem and letting go of some of our library rules about who can do what and why and when. Sometimes rules are made to be broken….
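The notes above describe text correction as layering: every enhancement layer is kept, the history is viewable, all layers plus the original OCR text are searched, and rollback is possible. A minimal Python sketch of that idea follows. The class and method names (`Layer`, `ArticleLine`, `revert_to`) and the sample text are hypothetical illustrations, not taken from the Library's delivery system.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One version of a line's text; never edited after it is stored."""
    text: str
    user: str  # "ocr" marks the machine-generated original (an assumption)


@dataclass
class ArticleLine:
    """A line of article text stored as a stack of layers, oldest first."""
    layers: list[Layer] = field(default_factory=list)

    def correct(self, text: str, user: str) -> None:
        # A correction appends a new layer; nothing is overwritten or deleted.
        self.layers.append(Layer(text, user))

    def current(self) -> str:
        # The newest layer is the one shown in the public view.
        return self.layers[-1].text

    def history(self) -> list[tuple[str, str]]:
        # Every layer, oldest first, as (user, text) pairs.
        return [(layer.user, layer.text) for layer in self.layers]

    def revert_to(self, index: int, user: str) -> None:
        # Rolling back re-applies an earlier layer as a new one,
        # so the unwanted correction still appears in the history.
        self.layers.append(Layer(self.layers[index].text, user))

    def matches(self, term: str) -> bool:
        # Search every layer: a hit in the raw OCR text or in any correction counts.
        needle = term.lower()
        return any(needle in layer.text.lower() for layer in self.layers)


line = ArticleLine([Layer("tbe prlce of petrol", "ocr")])
line.correct("the price of petrol", "volunteer_a")
print(line.current())
print(line.matches("prlce"), line.matches("price"))
```

Searching across all layers means a query can match either the corrected wording or the original OCR reading, and keeping rollback as one more layer preserves the full record of who changed what.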

    1. 1. Perspectives on National Library of Australia developments Rose Holley and Kent Fitch August 6-7 2009, Canberra, Australia Galleries, Libraries, Archives, Museums and Wikimedia: Finding the Common Ground.
    2. 2. User generated content, 1 year since public release
          • 5 million lines of text enhanced/corrected in 216,000 articles
          • 105,000 tags added
          • 3400 comments added
          • 5000 registered users, 480,000 anonymous users
    3. 3. User Interaction at article level
    4. 4. View all corrections on an article
    5. 5. ‘Crowd sourcing’?? Volunteers now. Flickr: LucLeqay
    6. 6. Advantages of interactions
          • Very active community involvement (Many Hands Make Light Work)
          • Quality of resource improved (data enhancement, more accurate searching on text)
          • Value added to data (comments)
          • Data discoverable in different ways (tags)
    7. 7. Concerns of GLAM - disproved
          • Vandalism or disinterest
              • Giving users high level of responsibility and trust = loyalty and hard work
              • Original image with text always shows
          • Data corrupted
              • Adequate measures to ensure that data kept in layers, kept separate, can be restored back, but integrated together for public view
              • Data is enhanced and value added.
          • Loss of control/power
              • Volunteers want guidance/co-ordination
              • Gate keepers should open as well as close doors!
    8. 8. Assumptions addressed
          • Users will want to find stuff via Google/Wikipedia, not come direct to your website
          • Users will want to link to useful items
          • Users will want to know if they can re-purpose items
          • Orgs/users may want the search API and source code
          • Pro-actively need to ‘push out’ to users into their spaces
    9. 9. Helping users find stuff
          • Enable the site to be indexed to article level by Google, and make it clear where you are if you arrive at article level from Google
          • Arrange a web harvesting site with Google
          • Add ‘external links’ to the site on relevant Wikipedia pages
          • Integrate relevant information dynamically from Wikipedia into the site
    10. 10. Incoming traffic sources
          • August 2008, at release: Google 15%, direct 37%
          • July 2009, 1 year on: Google 55%, direct 21%
    11. 11. Helping users link to and re-use stuff
          • Have a persistent identifier (PI) at article, page, and newspaper title level and use the words ‘cite this’. This is a stable link that will never change.
          • Have a clear copyright, rights ownership and re-use statement on the site or at image level (Caution/fear about doing this…)
    12. 12. Pushing it out there
          • RSS feeds to alert users to new content (not yet)
          • Talking to users in their forum spaces
          • Responding to blogs by commenting
          • Have an open communication channel (not yet)
          • Facebook fan page
    13. 13. What’s yours is mine - sharing
          • Search API
          • Open source code for delivery system
    14. 14. Screenshot/demo
          • PIs
          • ‘About’ page showing Wikipedia pull-in
          • Wikipedia page showing link to ANDP
          • Google results page showing ANDP
    15. 24. Potential and volunteer co-ordination
          • August 2005: FamilySearch Indexing on web introduced
          • January 2006: 2,004 online volunteers
          • January 2007: 23,000 online volunteers
          • January 2009: 160,000 online volunteers
          • Potential digital volunteers, Australia: 21 million residents, and over half of the households have access to the internet = 10 million people at least
          • Question: How can GLAM nurture, co-ordinate and manage digital volunteers?
    16. 25. User power and GLAM power = SUPER POWER!!
          • "Don't under estimate the power of people who join together…. they can accomplish amazing things," Barack Obama, 19 Jan 2009, speaking on community engagement and involvement and voluntary work
          • Rose says: People want to work together to achieve amazing things. We as librarians have the power to give them both the data and tools to do this; they will do the rest……
    17. 26. Links and references
          • Australian Newspapers:
          • Report on user activity:
          • Facebook Fan page:
          • Open source code for delivery system: https://
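
The slides on linking describe persistent identifiers at title, page and article level that stay stable while the pages they point to may move (for example, when the beta status is removed). The sketch below shows one way such a scheme could work, under stated assumptions: the identifier format `example.news-<level><number>`, the `example.org` URLs, and the `PersistentId` and `Resolver` names are all invented for illustration and are not the Library's actual identifiers or code.

```python
from dataclasses import dataclass
from typing import Optional

LEVELS = ("title", "page", "article")


@dataclass(frozen=True)
class PersistentId:
    """An opaque, stable identifier; the format is a made-up example."""
    level: str   # one of LEVELS
    number: int  # assigned once and never reused

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown level: {self.level!r}")

    def __str__(self) -> str:
        return f"example.news-{self.level}{self.number}"


class Resolver:
    """Maps each stable identifier to whatever URL currently serves the item."""

    def __init__(self) -> None:
        self._urls: dict[PersistentId, str] = {}
        self._parents: dict[PersistentId, PersistentId] = {}

    def register(self, pi: PersistentId, url: str,
                 parent: Optional[PersistentId] = None) -> None:
        if pi in self._urls:
            raise ValueError(f"{pi} is already assigned")
        self._urls[pi] = url
        if parent is not None:
            self._parents[pi] = parent

    def move(self, pi: PersistentId, new_url: str) -> None:
        # The delivery URL changes; the identifier users cited does not.
        if pi not in self._urls:
            raise KeyError(str(pi))
        self._urls[pi] = new_url

    def resolve(self, pi: PersistentId) -> str:
        return self._urls[pi]

    def ancestors(self, pi: PersistentId) -> list[PersistentId]:
        # Walk up the hierarchy: article -> page -> title.
        chain = []
        while pi in self._parents:
            pi = self._parents[pi]
            chain.append(pi)
        return chain


resolver = Resolver()
title = PersistentId("title", 1)
page = PersistentId("page", 10)
article = PersistentId("article", 42)
resolver.register(title, "https://example.org/beta/title/1")
resolver.register(page, "https://example.org/beta/page/10", parent=title)
resolver.register(article, "https://example.org/beta/article/42", parent=page)
resolver.move(article, "https://example.org/article/42")
print(article, "->", resolver.resolve(article))
```

This is why citing the identifier rather than the page URL matters: a link recorded as the identifier keeps resolving after `move`, whereas a copied delivery URL would break.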