The Australian Newspapers Digitisation Program: Helping Communities Access and Explore their Newspaper Heritage - Keynote. 2007

1 Comment
  • I've really enjoyed using the National Library Trove search page to research family history, turning up photos and news articles from the Sydney Morning Herald and Brisbane Courier going back as far as the 1880s. The ability to edit and correct the transcript as we read the scanned pages is a great interactive tool that lets users help with the project, as is the ability to leave comments and information about the photos I had ties to. This is a great way to add to the store of knowledge being presented. I hope that regional papers already held on microfilm may come to be added to this resource.

  1. Helping communities access and explore their newspaper heritage.
     Rose Holley – Manager Newspaper Digitisation Program
     http://www.nla.gov.au/ndp
     rholley@nla.gov.au
     Australian Media Traditions Conference, 23 November 2007, Charles Sturt University, Bathurst
  2. Status of the Program
     November 2006 Minister for Arts and Sports approval
     Budget approval - $8 million for 3 million pages over 4 years
     Contracts signed with digitisation suppliers
     April 2007 program pilot phase commences
  3. Content and Coverage
     National Content
     Initially a title from each state
     Focus on major titles from each state first
     Anticipated that ‘regional’ titles may be contributed later
     Coverage: published between 1803 – 1954 (out of copyright)
     Titles shown on slide: Northern Territory Times, Courier Mail, West Australian, Advertiser, Sydney Gazette, Canberra Times, Argus, Mercury
  4. First Newspaper
     • First page of first Australian newspaper ever published
     The Sydney Gazette and New South Wales Advertiser, Saturday March 5 1803
  5. Through 150 years
     • Up to 1954 (when Copyright applies), and later if agreement with publishers.
     The Argus 22 August 1945
  6. Relationship - ANPLAN
     Website: http://www.nla.gov.au/anplan/
  7. Keep Up to Date with Progress
     • Website: http://www.nla.gov.au/ndp/
  8. National Help
     • NLA working with State and Territory Libraries as part of ANPLAN.
     • Libraries suggest titles and dates and provide microfilm for digitising.
     • ANPLAN members and other stakeholders will provide feedback on the search and delivery prototype.
     • Developing model for national contribution of regional newspapers.
  9. Process in brief
     National sourcing of selected newspaper microfilm masters.
     Masters scanned by Contractor, Sydney to tiff files.
     NLA perform quality assurance, add metadata.
     Contractor, India process tiff files - OCR, zoning, xml markup.
     NLA QA files, ingest to system, create derivatives for delivery.
  10. Logistics
      Australia (State Capitals – Sydney/Canberra)
      USA (Virginia) - India (Hyderabad, Chennai)
  11. 6 Month Progress
      • IT Infrastructure and storage implemented at NLA
      • Content management and ingest software developed by NLA to support workflow
      • Quality assurance and production software developed by US/India contractor
      • Pilot data sent to contractors to test workflows, systems and software against agreed project spec.
  12. Next 6 months
      • Acceptance of pilot data then commence production phase (3 million pages)
      • Development of search and delivery prototype
      • Public launch of service with a good body of content in 2008
      • Progressive addition of content – national program ongoing
  13. Technology – internal NLA
      Old newspapers being processed and delivered using latest digital technology
      • NLA developing in house:
        – Ingest and storage system
        – Workflow and content management system including quality assurance module
        – Search and delivery system
      • NLA providing:
        – System Infrastructure (storage, backup, disaster recovery)
  14. Infrastructure and Storage
      Online Storage – 70 TB:
      • Working space for images in processing: 40 TB for 1 million pages
      • Search and delivery derivatives: 30 TB for 3 million pages
      • XML files, database systems and indexes: 1 TB
      Offline Storage – unlimited for master images on tape.
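
The storage figures on slide 14 imply a per-page allowance for each category. The following is a rough check rather than anything taken from the presentation; it assumes decimal units (1 TB = 1,000,000 MB), which the slide does not specify.

```python
# Per-page storage implied by the online storage figures on slide 14.
# Assumption: decimal units (1 TB = 1,000,000 MB); the slide does not say.
MB_PER_TB = 1_000_000

allocations = {
    # category: (terabytes allocated, pages covered)
    "working space": (40, 1_000_000),
    "delivery derivatives": (30, 3_000_000),
}


def mb_per_page(terabytes: float, pages: int) -> float:
    """Return the average storage allowance per page in megabytes."""
    return terabytes * MB_PER_TB / pages


for name, (tb, pages) in allocations.items():
    print(f"{name}: {mb_per_page(tb, pages):.0f} MB per page")

# The three itemised amounts (40 + 30 + 1) add up to 71 TB,
# slightly more than the 70 TB headline figure.
total_tb = 40 + 30 + 1
print(f"itemised total: {total_tb} TB")
```

On these assumptions the working space allows about 40 MB per page and the delivery derivatives about 10 MB per page.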
  15. Establishing Workflows
  16. Technology - external
      • Scanning microfilm using Flexscan/Eclipse scanner and latest software (nextstar) from NextScan
        www.nextscan.com
        20,000 pages a week.
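
As a back-of-envelope check, the scanning rate on slide 16 can be set against the 3 million page target from slide 2. This sketch assumes the 20,000 pages a week figure is the total sustained rate with no downtime; the slide does not say whether it is per scanner or overall.

```python
import math

# Figures from the slides.
PAGES_TARGET = 3_000_000   # production target (slide 2)
PAGES_PER_WEEK = 20_000    # scanning rate (slide 16)

# Assumption: a constant rate with no interruptions.
weeks = math.ceil(PAGES_TARGET / PAGES_PER_WEEK)
years = weeks / 52

print(f"{weeks} weeks, about {years:.1f} years")
```

Under that assumption, scanning alone would take a little under three years, which fits within the four-year budget period.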
  17. Scanning Contractor
  18. Digital Images returned to NLA
  19. Quality Assurance at NLA
      Use 2 widescreen monitors placed vertically.
      Can view complete page within context of issue.
      Add metadata, sort out missing and duplicate pages within an issue.
      Prepare batches to send for OCR.
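
The "missing and duplicate pages" check on slide 19 can be illustrated with a small sketch. This is not the NLA's quality assurance module, which the slides do not describe in detail; the function name `check_issue` and the idea that each scanned image carries a page number are assumptions made for illustration.

```python
from collections import Counter


def check_issue(page_numbers: list[int], expected_pages: int) -> dict[str, list[int]]:
    """Report page numbers that are absent or scanned more than once.

    page_numbers   -- page number recorded for each scanned image (hypothetical input)
    expected_pages -- number of pages the issue should contain
    """
    counts = Counter(page_numbers)
    missing = [p for p in range(1, expected_pages + 1) if counts[p] == 0]
    duplicates = sorted(p for p, n in counts.items() if n > 1)
    return {"missing": missing, "duplicates": duplicates}


# Example: a six-page issue where page 2 was scanned twice and page 3 is absent.
report = check_issue([1, 2, 2, 4, 5, 6], expected_pages=6)
print(report)
```

A report like this would tell an operator which issues need attention before a batch is sent for OCR.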
  20. Metadata
  21. Page verification
  22. (no transcribed text)
  23. Technology - external
      Software developed to:
      • Zone areas and articles on a page
      • Flag continuing articles across multiple pages
      • Categorise articles on a page
      • OCR text on a page
      • Re-key headings and first 4 lines of text.
      • Deliver XML files (ALTO) and METS/MODS files.
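
Slide 23 names ALTO as the XML format for the OCR output. ALTO records recognised words as `String` elements (with the text in a `CONTENT` attribute) inside `TextLine` elements. The sketch below reads lines of text from a heavily simplified, invented fragment; real ALTO files carry a namespace, coordinates and more structure, and the slides do not say which ALTO version the contractor delivered. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified ALTO-style fragment for illustration only.
SAMPLE_ALTO = """<alto>
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="SYDNEY" WC="0.98"/>
            <SP/>
            <String CONTENT="GAZETTE" WC="0.91"/>
          </TextLine>
          <TextLine>
            <String CONTENT="Saturday" WC="0.87"/>
            <SP/>
            <String CONTENT="March" WC="0.95"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix so the code works with or without one."""
    return tag.rsplit("}", 1)[-1]


def text_lines(alto_xml: str) -> list[str]:
    """Return the text of each TextLine, joining its String elements with spaces."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) == "TextLine":
            words = [child.get("CONTENT", "") for child in element
                     if local_name(child.tag) == "String"]
            lines.append(" ".join(words))
    return lines


print(text_lines(SAMPLE_ALTO))
```

Matching on the local element name keeps the sketch independent of the namespace URI, which differs between ALTO versions.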
  24. India Facility - Hyderabad
  25. (no transcribed text)
  26. Quality Assurance
  27. OCR Accuracy
  28. Batch reporting
  29. Acceptance Criteria
  30. Prototype Development
      Under discussion:
      • Derivative sizes and zoom technology testing
      • Search and Browse features
      • Results and refinement of results
      • User interaction with source (web 2.0)
      • Interface design
  31. Digital Newspaper Searching
      • Newspapers full text searchable
      • Image captions searchable
      • Search across multiple papers, e.g. by person's name.
      • Refine searching by:
        – Date
        – Newspaper title
        – State published
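
The search-then-refine behaviour listed on slide 31 can be sketched as a text match followed by optional filters on date, newspaper title and state. This is a toy illustration with a naive substring match, not the NLA's search system, which was still at the prototype stage; the `Article` class, the `search` function and the sample records are all invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass
class Article:
    newspaper: str   # newspaper title
    state: str       # state of publication
    published: date
    text: str        # OCR text of the article


def search(articles: Iterable[Article], query: str, *,
           newspaper: Optional[str] = None,
           state: Optional[str] = None,
           start: Optional[date] = None,
           end: Optional[date] = None) -> list[Article]:
    """Case-insensitive substring search, narrowed by any refinements given."""
    needle = query.lower()
    hits = []
    for article in articles:
        if needle not in article.text.lower():
            continue
        if newspaper is not None and article.newspaper != newspaper:
            continue
        if state is not None and article.state != state:
            continue
        if start is not None and article.published < start:
            continue
        if end is not None and article.published > end:
            continue
        hits.append(article)
    return hits


# Invented sample records, for illustration only.
SAMPLE = [
    Article("Canberra Times", "ACT", date(1928, 7, 26), "Mr. J. Smith opened the show."),
    Article("Argus", "Vic", date(1945, 8, 22), "Private J. Smith returned home."),
    Article("Argus", "Vic", date(1901, 1, 2), "Federation celebrations held."),
]

all_hits = search(SAMPLE, "j. smith")
refined = search(SAMPLE, "j. smith", state="Vic", start=date(1940, 1, 1))
print(len(all_hits), len(refined))
```

Searching by name across all papers returns both matching articles; adding the state and date refinements narrows the result to one.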
  32. Refine search by categories
      • News
      • Advertising
      • Birth Death Marriage notices
      • Obituaries
      • Editorial commentary and letters
      • Shipping News
      • Arts and leisure
      • Detailed lists, results, guides
  33. Search Illustrations
      Categorised as:
      • Photo
      • Cartoon
      • Map
      • Graph
      • Illustration
      Captions searchable
      Canberra Times 26 July 1928 page 6
  34. Browsing and Viewing
      • Browse papers page by page
      • Zoom in and out of image
        – to read small text
        – to view context of article within page layout
      • Print article or entire page or issue
  35. Zoom technology
  36. Testing derivative sizes and zooming
  37. Prototype wireframe
  38. Other features
      Under discussion:
      • OCR correction by users
      • Personal annotation of articles by users
      • Tagging results
      • Creating public sets (for historical events)
      • Clustering results
      • Searching across other relevant resources (paid subscription services, international resources, other digital resources)
  39. Prototype release
      • To be released to stakeholders who have given microfilm content
      • Stakeholders able to view their data
      • Feedback on data quality and search functionality
      • Amendments made and then ‘search and delivery version 1’ released to a wider group for testing and feedback before public launch in 2008.
  40. Pilot Data
      • Canberra Times
      • Sydney Gazette
      • Northern Territory Times
      • South Australia Advertiser
      • Hobart Town Gazette, Courier, Colonial, Mercury
      • Melbourne Argus
      • Perth Gazette
      • West Australian
      • Brisbane Courier Mail
      (12 titles, 8000 issues = 50,000 pages = 500,000 articles)
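
The pilot totals on slide 40 give rough ratios of pages per issue and articles per page. The projection at the end is only an extrapolation, assuming the pilot's articles-per-page ratio holds across the full 3 million pages; the slides make no such claim.

```python
# Pilot totals from slide 40.
issues = 8_000
pages = 50_000
articles = 500_000

pages_per_issue = pages / issues        # average pages in an issue
articles_per_page = articles / pages    # average articles on a page

# Assumption: the pilot ratio carries over to the production phase.
PRODUCTION_PAGES = 3_000_000
projected_articles = PRODUCTION_PAGES * articles_per_page

print(f"{pages_per_issue} pages per issue")
print(f"{articles_per_page} articles per page")
print(f"about {projected_articles:,.0f} articles at production scale")
```

On that assumption the production phase would yield roughly 30 million articles.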
  41. http://www.nla.gov.au/ndp