Papers Past Revitalisation: NDF 2007


    Papers Past Revitalisation: NDF 2007 - Presentation Transcript

    1. Papers Past: Present and Future. Revitalising the Papers Past Historic Newspaper Collection. Gordon Paynter & Tracy Powell, National Library of New Zealand. National Digital Forum Conference, 29 November 2007.
    2. Outline
      • What is Papers Past?
      • What users wanted
      • Large-scale OCR for newspapers
      • User interface development
      • User response to new site
      • Papers Past: Present and Future
    3. Papers Past (2001-2007)
    4. What users wanted
      • Papers Past was popular, but users wanted more:
        • Searchability
        • More newspapers
        • Better printability
        • Easier downloads
      User Survey: “[I] would love to be able to search across the newspapers (but I guess that would be a pretty big project with OCR!).” (A respondent to the Papers Past user survey conducted in the planning stage of the project.)
    5. User research
      • Online survey:
        • On the front page of Papers Past
        • 212 responses in about a month
      • Comparative usability study:
        • Five Papers Past users were invited to the Library.
        • Performed tasks on Colorado and Utah collections
          • Observed using the features of these sites
          • Asked what features were important
    6. Online survey: who are the users?
      [Chart: user types, based on 212 responses to the online survey]
    7. Comparative usability study
      • Everybody used search, and used it first
      • Essential:
        • Printing (with context, such as citation information)
        • Browsing from page to page within a paper
        • Search term highlighting
      • Important:
        • Browse by region and by title
        • Browse by date (important, but somehow confusing)
        • Background information – for advanced users
    8. Our perspective
      • An old website, and looks it
      • It is not compliant with Government web standards
      • Web browser needs a Java Applet to view TIFF files
      • Valuable (and expensive) content not being exploited
    9. Large-scale Optical Character Recognition for Newspapers
    10. How do we make them searchable?
      • The collection is too large to transcribe:
        • over a million pages of text,
        • about 200,000 newspaper issues,
        • around 26 million articles,
        • approximately 7 billion words.
      • Where do we get a text equivalent to search?
      Optical Character Recognition: “A process by which software reads a page image and translates it into a text file by recognizing the shapes of the letters.” (The NINCH Guide to Good Practice in the Digital Representation and Management of Cultural Heritage Materials.)
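The collection figures on this slide can be turned into rough per-unit averages, which help show why manual transcription was ruled out. The sketch below uses only the approximate counts quoted above; the typing speed and working hours per year are hypothetical assumptions added for illustration, not figures from the presentation.

```python
# Approximate collection figures quoted on the slide.
PAGES = 1_000_000          # "over a million pages"
ISSUES = 200_000           # "about 200,000 newspaper issues"
ARTICLES = 26_000_000      # "around 26 million articles"
WORDS = 7_000_000_000      # "approximately 7 billion words"


def per_unit_averages(pages, issues, articles, words):
    """Return rough averages derived from the collection totals."""
    return {
        "pages_per_issue": pages / issues,
        "articles_per_page": articles / pages,
        "words_per_page": words / pages,
        "words_per_article": words / articles,
    }


def transcription_person_years(words, words_per_minute=40, hours_per_year=1600):
    """Estimate manual transcription effort.

    Both defaults are hypothetical assumptions (typing speed and
    working hours per year), not values from the presentation.
    """
    minutes = words / words_per_minute
    return minutes / 60 / hours_per_year


averages = per_unit_averages(PAGES, ISSUES, ARTICLES, WORDS)
effort = transcription_person_years(WORDS)
print(averages)
print(f"Roughly {effort:,.0f} person-years at the assumed typing rate")
```

Under these assumed rates the estimate runs to well over a thousand person-years, which is consistent with the slide's point that the collection is too large to transcribe.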
    11. Large-scale OCR for newspapers
      • Performed by Planman Consulting in New Delhi
      • CCS docWORKS software
        • ABBYY FineReader 8.0 OCR engine (industrial version)
        • Includes our list of Māori and geographical terms
    12. Large-scale OCR for newspapers
      • Pages cropped and deskewed
      • Pages organised into issues
      • Pages zoned into Articles, Advertisements and Illustrations
      • Text is captured with OCR software
      • Selected Issue metadata manually cleaned up
      • Headline metadata manually corrected
      • Output to XML and image files
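The processing steps above imply a simple data model: pages grouped into issues, and each page divided into typed zones that carry OCR text. The following is a minimal sketch of that model only. The class names, fields, and the grouping key (newspaper title plus date) are assumptions made for illustration; they do not describe the docWORKS output format or the XML actually delivered.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class ZoneType(Enum):
    # The three zone types named on the slide.
    ARTICLE = "article"
    ADVERTISEMENT = "advertisement"
    ILLUSTRATION = "illustration"


@dataclass
class Zone:
    zone_type: ZoneType
    headline: str = ""   # headlines were manually corrected
    ocr_text: str = ""   # text captured by the OCR engine


@dataclass
class Page:
    title: str           # newspaper title (hypothetical field)
    date: str            # issue date, ISO format (hypothetical field)
    page_number: int
    zones: list = field(default_factory=list)


def organise_into_issues(pages):
    """Group pages into issues keyed by (title, date), ordered by page number."""
    issues = defaultdict(list)
    for page in pages:
        issues[(page.title, page.date)].append(page)
    for issue_pages in issues.values():
        issue_pages.sort(key=lambda p: p.page_number)
    return dict(issues)


def searchable_text(page):
    """Join the OCR text of all non-illustration zones on a page."""
    return "\n".join(
        zone.ocr_text
        for zone in page.zones
        if zone.zone_type is not ZoneType.ILLUSTRATION and zone.ocr_text
    )
```

Keeping zones as separate records is what would let a search result point at an individual article rather than a whole page.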
    13. User Interface
    14. Building a new web interface
      • Prototype developed by DL Consulting
      • Based on Greenstone software
      • A hybrid collection, containing:
        • Searchable newspapers with OCR text
        • Browse-only newspapers from old Papers Past
      • User interface redesigned by ClickSuite
      • User interface testing and refinement
      • Launched 03 September 2007
    15. User testing
      • 15 users observed
      • One-on-one sessions
      • Free browsing, then some fixed tasks
      • Positive response overall
      • Design of interface well-received
      • Some changes made in response:
        • Search page rearranged
        • Search history moved from search page to own page
      [Chart: overall rating]
    16. New Papers Past
    17-22. (No transcript text for these slides.)
    23. User Response
    24. User response
      • We now have more people using the site
      • The terms Search and Browse are not well understood
        • “Searchable” interpreted as “online”
        • “Not searchable” interpreted as “not available”
      • Search functionality is very popular
      • Browse less well-received by hard-core researchers
      • Some of the issues relate only to material that is not searchable, and will disappear when all the material is searchable.
    25. Web statistics after first month
      • Number of unique visitors: increase of 21%
      • Number of visits: increase of 331%
      • Average visits per visitor: was 2, now 6
      • Average length of visit: was 14 minutes, now 29 minutes
      • Visitor repeat rate: was 23%, now 73.2%
      • Number of page views: increase of 443%
      Conclusion: we have more users, and they are using the site a lot more
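The percentage figures on this slide are easier to compare when converted to growth multipliers. The sketch below assumes that each "increase" is a percentage increase over the old site's figure (so 331% means about 4.31 times as many visits); that reading is an assumption, since the slide does not define the baseline.

```python
def growth_factor(percent_increase):
    """Convert a percentage increase into a multiplier (e.g. 21% -> 1.21)."""
    return 1 + percent_increase / 100


# Percentage increases quoted on the slide.
visitors_growth = growth_factor(21)
visits_growth = growth_factor(331)
page_views_growth = growth_factor(443)

# Visits = visitors x visits per visitor, so the ratio of the two
# multipliers gives the implied growth in visits per visitor.
implied_visits_per_visitor_growth = visits_growth / visitors_growth

# The slide reports visits per visitor as "was 2, now 6".
reported_visits_per_visitor_growth = 6 / 2

# Page views grew faster than visits, implying more pages per visit.
pages_per_visit_growth = page_views_growth / visits_growth

print(f"Implied visits-per-visitor growth:  {implied_visits_per_visitor_growth:.2f}x")
print(f"Reported visits-per-visitor growth: {reported_visits_per_visitor_growth:.2f}x")
print(f"Implied pages-per-visit growth:     {pages_per_visit_growth:.2f}x")
```

The implied and reported growth in visits per visitor are close but not identical. One possible explanation is that the "2" and "6" on the slide are rounded, though the presentation does not say so.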
    26. Papers Past: Present and Future
      • Website
        • Promote website to new user groups
        • Potential new features in response to user feedback
        • Evaluating Planman’s prototype metadata editing tool
      • Annual digitisation programme
        • Digitising new newspapers (and filling gaps)
        • Making all the existing pages searchable
      • Research
        • Documenting the relative advantages of OCR and transcription of textual materials
        • Testing whether changing from bitonal digitisation to greyscale digitisation improves OCR accuracy
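To illustrate the difference between the two digitisation modes named in the last research item: a bitonal image stores one bit per pixel, so any detail on the wrong side of the threshold is discarded before OCR sees it. The sketch below is a toy example with a hypothetical fixed threshold and made-up pixel values; it does not represent the Library's scanning workflow or predict the outcome of the test.

```python
def to_bitonal(grey_rows, threshold=128):
    """Reduce greyscale pixels (0 = black ink, 255 = white paper) to 1-bit.

    Pixels at or above the threshold become 1 (paper); the rest become 0 (ink).
    The threshold of 128 is a hypothetical choice for illustration.
    """
    return [[1 if px >= threshold else 0 for px in row] for row in grey_rows]


# A made-up scan line: dark ink (30), faded ink (140), and paper (200).
greyscale_line = [[200, 30, 200, 140, 200]]

bitonal_line = to_bitonal(greyscale_line)
print(bitonal_line)  # the faded stroke at 140 is classified as paper
```

In the greyscale data the faded stroke is still distinguishable from the paper (140 versus 200); after thresholding it is not. Whether that difference improves OCR accuracy in practice is the open question the slide describes.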
    27. Thank you
