It is over one hundred years since a Commonwealth legal deposit scheme was first established. Legal deposit is an obligation on publishers to deposit a copy of every publication with a major library. In Australia it was introduced before Federation by various state governments, and these state-based schemes remain in place today. Federally, legal deposit was first established as a statutory principle shortly after Federation, in the Copyright Act 1912. In 1912, a National Library for the new nation was still just an aspiration of the Parliamentarians who established the scheme. The nearest thing to a National Library was the Commonwealth Parliamentary Library, then housed in a grand old building in Melbourne. Under the 1912 provisions, Australian publishers had to deliver a copy of each book published to the Librarian of the Commonwealth Parliament within one month of publication. The legislation was quite explicit. The book delivered had to be "a copy of the whole book with all maps and illustrations… finished and coloured in the same manner as the best copies of the book are published, and bound, sewed, or stitched together, and on the best paper on which the book is printed."
The material deposited under these provisions formed the basis of the national collection, which was later devolved to the National Library in the 1960s. For many years afterwards, legal deposit was the most important mechanism by which the Library built its published Australian collections.
To ensure that documentary resources of national significance relating to Australia and the Australian people, as well as significant non-Australian library material, are collected, preserved and made accessible.
So, finally let me tell you a little bit about the variety and scale of discoverable content in Trove.
Digitised Newspapers and more: 203,784,410
Journals, articles and research: 151,380,087
Archived websites (1996 - now): 115,424,117
Books: 21,280,629
Pictures, photos, objects: 8,390,501
Music, sound and video: 4,243,480
Government Gazettes: 2,320,236
People and organisations: 1,097,608
Diaries, letters, archives: 654,461
Maps: 636,222
Lists: 88,428
Total: 509,300,179
But numbers aside, it’s the bringing together of collections that is so useful for people searching Trove.
It's about taking these extraordinary Australian collections, particularly items that are personal, often ephemeral and unique, these cultural and personal artefacts, and turning a spotlight on them.
It's about giving people glimpses into these pieces of the past and the present that might otherwise be completely overlooked, and hopefully offering them some inspiration for the future.
While Australia lagged behind many other countries in providing a legislative mandate to collect electronic publications, the experience the Library gained in the preceding years was not wasted. Not only did it produce a representative picture of an electronic world that has now almost completely vanished, it also developed the Library’s understanding of publishers’ concerns and of the underlying requirements for an effective and workable electronic deposit scheme. On both fronts, the Library has been well served by the legislation eventually formulated and passed by the Commonwealth Parliament.
The new provisions in the Copyright Act 1968 (ss. 195CA-195CJ) cover all Australian publications that are literary, dramatic, musical or artistic works. This includes print and electronic books, journals, magazines, newsletters, reports, sheet music, maps, websites and public social media. They apply to any Australian person, group or organisation that makes their online or offline publications available to the public either for sale or for free. Material that is primarily audio-visual is excluded from the provisions. The key features of the legislation are:
The legislation is format and technology neutral, i.e. it does not specify particular formats. The legislation also allows the National Library Minister to prescribe new electronic forms to be treated as online, allowing for future developments in publishing and technology;
There are separate deposit requirements for offline and online electronic publications: offline publications must be deposited within one month of publication, whereas online publications only need to be deposited on receipt of a request from the Library. Requests for online material can be automated, allowing the Library to issue requests via a web harvester;
Electronic material deposited with the National Library must be free from technological protection measures and accompanied by any software required to access it, while offline works must be of an appropriate quality to be handled (that is, a ‘best copy’);
The legislation has been framed to reduce the compliance burden on both publishers and the Library. For print and other offline material, the legislation preserves many of the features of the old scheme. However, because the scheme for online electronic material is request-driven, publishers do not need to supply the large volume of online material that the National Library does not require for its collection. Given the potential for uncertainty arising from the scale and dynamic nature of online publishing, it also gives publishers clarity about what content they need to deliver. A particularly innovative feature of the legislation, reflecting an understanding of the digital age, is that it enables the Library to request electronic material without any restriction based on place of publication or distribution. While the provisions for offline material specify that it must be published in Australia, there is no equivalent requirement for online material. Although the Act is only enforceable against those subject to Australian law, this provision enables the Library to request self-published works by Australians, even where the authors are using online services based overseas.
Publishers of offline publications are required to deliver their work to the National Library within one month of publication. For print publishers, this is no different from the previous legislation. But it is a new requirement for publishers of works on discs or other physical media, which were not previously covered by the legislation. Publishers need to supply the ‘best copy’ (e.g. hardback rather than paperback). Publishers of online publications are required to deliver their work to the Library within one month of receiving a request from us. If publishers don’t receive a request from us, they are not required to deposit. Publishers that release a work both offline and online, such as a hardcopy book and an ebook version, are required to deposit the offline copy within one month of publication. If they would prefer to deposit the online version, we encourage them to contact us.
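The deadline rules above reduce to a small decision. The following is only an illustrative sketch. It assumes that "one month" means one calendar month, clamped to the last day of a shorter month. The function names are hypothetical and not part of any Library system.

```python
import calendar
from datetime import date
from typing import Optional


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def deposit_due_date(offline_published: Optional[date] = None,
                     online_requested: Optional[date] = None) -> Optional[date]:
    """Hypothetical helper: when does a deposit fall due?

    - If an offline edition exists, it is due one month after publication,
      even when an online version also exists (depositing the online version
      instead is a matter for arrangement with the Library).
    - An online-only work is due one month after the Library's request.
    - With no offline edition and no request, nothing is due (None).
    """
    if offline_published is not None:
        return add_months(offline_published, 1)
    if online_requested is not None:
        return add_months(online_requested, 1)
    return None
```

For example, under this sketch's calendar-month assumption, a request issued on 31 January 2016 gives a due date of 29 February 2016.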
The Library’s traditional and social media efforts peaked on 17 February, the day the new legislation came into effect. At 9am, staff at Penguin Random House deposited the first ebook, Thomas Keneally’s Napoleon’s Last Island, “live” while a number of Library staff watched. Photographs and a short video capturing the moment provided great social media content, and social media was also a wonderful way to share the moment with our publisher and library stakeholders. Involving staff in this celebratory moment was also an effective way to recognise the achievement of their personal development goals and to reinforce pride in the system they had built.
This is a summary of the workflow for individual books, music scores and maps.
As you can see from this diagram, much of the process is completely automated and requires no intervention by staff.
Importantly, as soon as metadata is transferred to the Library’s catalogue and then through to Libraries Australia and Trove, the material is discoverable and available for use by readers – even before the record has been upgraded/enhanced by cataloguers.
Long-term preservation of content is critical to successful implementation of the Library’s legislative mandate. Data and files, whether deposited through the edeposit service or a batch process, are transferred to a dedicated digital preservation management environment to ensure ongoing preservation. There is currently a three-month lag between submission and transfer to the digital preservation system. This allows time for cataloguing work to be completed and gives publishers enough time to re-submit their content if any problems are found during cataloguing.
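The three-month lag works as a simple eligibility filter that a weekly batch could apply. The sketch below assumes that "three months" means calendar months and that a deposit becomes eligible on the day the lag ends. The `(id, submitted_date)` pairs and all function names are hypothetical.

```python
import calendar
from datetime import date
from typing import Iterable, List, Tuple

LAG_MONTHS = 3  # current lag between submission and preservation transfer


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def ready_for_preservation(submitted: date, today: date) -> bool:
    """True once the cataloguing / re-submission window has elapsed."""
    return today >= add_months(submitted, LAG_MONTHS)


def select_for_transfer(deposits: Iterable[Tuple[str, date]], today: date) -> List[str]:
    """Return the IDs of (hypothetical) deposits that are due for transfer."""
    return [dep_id for dep_id, submitted in deposits
            if ready_for_preservation(submitted, today)]
```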
The process of ingesting content into the digital preservation system, which occurs in weekly batches, is fully automated. It identifies content that is ready for ingest, creates Submission Information Packages and ingests them into the preservation system. The original checksum created when the publisher submitted the content is included in the Submission Information Package and is used to verify the integrity of the content during ingest.
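The checksum step can be pictured roughly as follows. This is a minimal sketch, not the Library's implementation. The presentation does not name the hash algorithm, so SHA-256 is an assumption. `SubmissionPackage`, `build_sip` and `verify_sip` are hypothetical names for a much-simplified Submission Information Package.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass
class SubmissionPackage:
    """Hypothetical, minimal stand-in for a Submission Information Package."""
    deposit_id: str
    files: List[Path]
    checksums: Dict[str, str]  # file name -> checksum recorded at submission


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Hash a file in chunks so large deposits need not fit in memory."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_sip(deposit_id: str, files: List[Path], submitted: Dict[str, str]) -> SubmissionPackage:
    """Bundle the files with the checksums captured when the publisher submitted them."""
    return SubmissionPackage(deposit_id, list(files), dict(submitted))


def verify_sip(sip: SubmissionPackage, algorithm: str = "sha256") -> List[str]:
    """Return the names of files that lack a recorded checksum or no longer match it."""
    failures = []
    for path in sip.files:
        expected = sip.checksums.get(path.name)
        if expected is None or file_checksum(path, algorithm) != expected:
            failures.append(path.name)
    return failures
```

An empty list from `verify_sip` means every file still matches the checksum recorded at submission.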
Digital Preservation staff perform regular pre- and post-ingest audits to ensure the quality and completeness of ingested content and metadata.
Quick quiz #2: Which file format has been deposited the most so far?
Publishers can deposit books, music scores or maps without creating a publisher account.
If a publisher creates an account, they will also have the option to register an ongoing title such as a journal, magazine or newsletter.
Let’s start by seeing just how easy it is to deposit an ebook.
The first part of the form gives the publisher a quick summary of their contact details which can easily be edited from the ‘My publisher details’ link on the right.
The maps deposit form also provides fields such as sheet number / designation and scale.
Once the publisher has entered the title details, they can then add the name of the creator.
Multiple creators, whether people or organisations, can be added.
Year of birth is optional.
Next is year of publication and edition details.
You may be surprised to see that this edition of Cloudstreet was published in 2011. Quite a few publishers are keen to deposit their backlist titles as well as new release titles.
If the Library has already received an edeposit title with a matching identifier such as an ISBN, the system will alert the publisher that we already hold this title.
This will help ensure that we don’t receive duplicate titles.
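A duplicate check of this kind depends on comparing identifiers in a consistent form. The sketch below assumes, without confirmation from the presentation, that ISBNs are compared after removing hyphens and spaces, validating the check digit and converting ISBN-10 to ISBN-13. `held_isbns` is a hypothetical set of identifiers that the Library already holds.

```python
from typing import Optional, Set


def isbn13_check_digit(first12: str) -> str:
    """ISBN-13 check digit: weights alternate 1, 3, 1, 3, ..."""
    total = sum(int(c) * (1 if i % 2 == 0 else 3) for i, c in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalise_isbn(raw: str) -> Optional[str]:
    """Return a validated ISBN-13 (converting ISBN-10 if needed), or None if invalid."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        # ISBN-10: the weighted sum (weights 10 down to 1) must be divisible by 11.
        check = 10 if s[9] == "X" else int(s[9])
        total = sum((10 - i) * int(c) for i, c in enumerate(s[:9])) + check
        if total % 11 != 0:
            return None
        core = "978" + s[:9]
        return core + isbn13_check_digit(core)
    if len(s) == 13 and s.isdigit():
        return s if isbn13_check_digit(s[:12]) == s[12] else None
    return None


def already_held(raw_isbn: str, held_isbns: Set[str]) -> bool:
    """True if the normalised ISBN matches one that is already held."""
    key = normalise_isbn(raw_isbn)
    return key is not None and key in held_isbns
```

Normalising first means that "0-306-40615-2" and "978-0-306-40615-7" are recognised as the same book.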
The next section on the form is about access provisions.
We ask publishers to confirm that they have the authority to enter into an access agreement and to confirm whether the publication is commercially available or not.
Data gathered from the commercial availability status will assist us with valuation of the legal deposit digital collections, as well as with providing access to these titles.
Publishers with freely available material can provide its URL.
We then ask publishers to choose the level of public access for their titles.
The first option, basic access, is intended for commercial publications. Library users will only be able to view the publications in the National Library’s Reading Rooms here in Canberra. It will not be possible to download or copy the files.
The second option of freely available from the Library’s catalogue and Trove is intended for non-commercial publications. Anyone with access to the internet can download and copy the files to their computer or mobile device.
The third option is a 12-month embargo for books, music and maps. This means that the title will only be available onsite at the National Library for the first year before being made freely available. Serial publications have a 6-month embargo.
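The three access options map onto a simple rule for when a title becomes freely available online. This sketch assumes that the embargo runs from the deposit date, since the presentation does not say when it starts. The material-type labels and function names are hypothetical.

```python
import calendar
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    BASIC = "basic"      # onsite viewing only, no download or copying
    OPEN = "open"        # freely available via the catalogue and Trove
    EMBARGO = "embargo"  # onsite only at first, then freely available


# Embargo lengths as described above; the keys are hypothetical labels.
EMBARGO_MONTHS = {"book": 12, "music": 12, "map": 12, "serial": 6}


def add_months(d: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    return date(year, month, min(d.day, calendar.monthrange(year, month)[1]))


def online_from(access: Access, material: str, deposited: date) -> Optional[date]:
    """Date from which a title is freely available online, or None if onsite only."""
    if access is Access.OPEN:
        return deposited
    if access is Access.BASIC:
        return None
    return add_months(deposited, EMBARGO_MONTHS[material])
```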
Access preferences can also be set up in the publisher’s account to save time.
For non-commercial publications, we may in future be able to provide offsite viewing access to these titles.
The next step is uploading the file.
For books, serials and music scores, the edeposit service can accept EPUB, MOBI or PDF files. For maps, we can accept TIFF and PDF files, including the fancy ‘geo’ versions: GeoTIFF, geospatial PDF or georeferenced PDF.
Publishers can also upload a cover image as a jpeg or tiff file.
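A first-pass format check for these uploads could look like the sketch below. It checks file extensions only. That is a simplifying assumption, and a real service would presumably also inspect the file contents. The mapping and the function names are hypothetical. Geo variants are assumed to keep the ordinary .tif/.tiff/.pdf extensions.

```python
from pathlib import PurePath

# Hypothetical mapping from material type to accepted file extensions.
ACCEPTED_CONTENT = {
    "book": {".epub", ".mobi", ".pdf"},
    "serial": {".epub", ".mobi", ".pdf"},
    "music": {".epub", ".mobi", ".pdf"},
    "map": {".tif", ".tiff", ".pdf"},  # includes GeoTIFF and geo PDFs
}
ACCEPTED_COVER = {".jpg", ".jpeg", ".tif", ".tiff"}


def accepts_content(filename: str, material: str) -> bool:
    """True if the file extension is accepted for this material type."""
    return PurePath(filename).suffix.lower() in ACCEPTED_CONTENT[material]


def accepts_cover(filename: str) -> bool:
    """True if the file extension is accepted for a cover image."""
    return PurePath(filename).suffix.lower() in ACCEPTED_COVER
```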
We require files that do not have any digital protection measures so that we can preserve and migrate the content as file formats change over time.
If DRM is detected or if there is a virus, the system will not allow the file to be uploaded.
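The presentation does not explain how DRM is detected. The sketch below shows one common heuristic for EPUB files only, as an assumption rather than a description of the Library's system. EPUB containers record encrypted resources in META-INF/encryption.xml, and Adobe-style DRM typically also adds META-INF/rights.xml. Font obfuscation uses the same file but does not protect the text, so it is ignored here. PDF and MOBI DRM, and virus scanning, are not covered.

```python
import xml.etree.ElementTree as ET
import zipfile

XMLENC_NS = "http://www.w3.org/2001/04/xmlenc#"
# Font obfuscation (IDPF and Adobe schemes) also appears in encryption.xml,
# but it only scrambles embedded fonts and is not DRM on the content.
FONT_OBFUSCATION = {
    "http://www.idpf.org/2008/embedding",
    "http://ns.adobe.com/pdf/enc#RC",
}


def epub_looks_drm_protected(epub) -> bool:
    """Heuristic DRM check for an EPUB given as a path or binary file object."""
    with zipfile.ZipFile(epub) as zf:
        names = set(zf.namelist())
        if "META-INF/rights.xml" in names:  # typical of Adobe-style DRM
            return True
        if "META-INF/encryption.xml" not in names:
            return False
        root = ET.fromstring(zf.read("META-INF/encryption.xml"))
    # Any encryption method other than font obfuscation suggests protected content.
    for method in root.iter(f"{{{XMLENC_NS}}}EncryptionMethod"):
        if method.get("Algorithm") not in FONT_OBFUSCATION:
            return True
    return False
```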
Publishers can provide any other information about their title such as a summary, abstract or subject heading suggestions.
The final step in the process is to click the green deposit button!
The publisher receives a thank you message and an email confirmation with details of their deposit.
They can then deposit more titles or view a list of their recent deposits. There is also an option to download their deposit history as a CSV spreadsheet.
I will now hand you over to Badra Karunarathna, who will demonstrate what happens once we receive an ebook via the edeposit service.
We ask for some basic bibliographic data in the form, such as the title and ISSN, as well as the frequency of the publication and the numbering and publication details of the issue being deposited.
The publisher also fills out an access agreement for the title. Immediate access was chosen for Break o’day which means that the newsletter will be freely available through the Library’s online services.
The publisher then uploads their issue.
At the moment only one issue at a time can be uploaded. However, support for multiple issue uploads is in development.
All serial titles go through a selection process before being added to the collection and the team uses the Reftracker system to manage the intake of digital serials and correspondence with publishers. Reftracker is used throughout the Library in various areas such as Trove, Libraries Australia and Reader Services.
This is a screenshot of deposited serials in Reftracker.
All of the data entered by the publisher in the edeposit form is transferred to Reftracker and a link is provided to access the deposited issue.
Rostered staff in Australian Serials allocate themselves titles for processing.
This is a closer look at a serials application.
Once a digital serial is selected for the collection, a staff member creates a catalogue record as well as a record in our digital management system, Banjo.
Banjo pulls some bibliographic data from the catalogue record, so we are not duplicating the creation of bibliographic data.
The staff member also uploads the deposited issue.
A link is then created between this record and the edeposit service, which allows the publisher to upload further issues with no staff intervention.
Now that the title has been registered, Bundaleer Press can opt to add a new issue to an existing title.
The relevant title can be selected from the dropdown box and the issue number details and year of publication are entered.
The publisher then uploads the issue and the file is automatically checked for DRM and viruses.
We are working with our IT colleagues to enable multiple issues to be uploaded.
The publisher clicks the green deposit button and receives a thank you message and email confirmation.
The issue is then deposited and travels all the way through to our digital management system with no staff intervention.
Since we are not having to manually process the new serial issues, this is a huge efficiency for us!
Here is a summary of the automated bulk deposit ingest process for ebooks via CoreSource.
Our system retrieves the files from the FTP servers at CoreSource and Wiley. It automatically checks these servers every five minutes for new publications and checks any new files to make sure they contain no DRM or viruses.
If everything is fine, the metadata is extracted and grouped into a submission package, which is automatically transferred into the Library’s digital collection management system, affectionately known as Banjo (after Banjo Paterson!).
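The five-minute check can be sketched as a polling loop that compares each directory listing with what has already been seen. This is an illustration under assumptions, not the Library's code. `ftp` can be any object with an `nlst()` method, such as `ftplib.FTP`. `handle` is a hypothetical callback that would download each item, run the DRM and virus checks, and extract the metadata.

```python
import time
from typing import Callable, Iterable, List, Set

POLL_INTERVAL_SECONDS = 5 * 60  # check every five minutes


def find_new_files(listing: Iterable[str], seen: Set[str]) -> List[str]:
    """Names in the current listing that have not been processed before."""
    return sorted(name for name in listing if name not in seen)


def poll_once(ftp, seen: Set[str]) -> List[str]:
    """One polling pass; `ftp` is anything with an nlst() method, e.g. ftplib.FTP."""
    new = find_new_files(ftp.nlst(), seen)
    seen.update(new)
    return new


def poll_forever(ftp, seen: Set[str], handle: Callable[[str], None]) -> None:
    """Hand each newly listed item to the hypothetical `handle` callback."""
    while True:
        for name in poll_once(ftp, seen):
            handle(name)
        time.sleep(POLL_INTERVAL_SECONDS)
```

With the standard library, `ftp = ftplib.FTP(host)` followed by `ftp.login(user, passwd)` would supply the connection. The host and credentials here are placeholders. In this sketch an item is marked as seen as soon as it is listed, so a failed download would not be retried. A more robust version would record an item only after it has been processed successfully.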
Here is an example of the file structure for ebooks received via CoreSource and journal issues received from Wiley.
For ebooks, the top-level folder name is the ISBN. The CoreSource file directory contains three parts:
• the ebook in EPUB format (DRM-free)
• the JPG cover
• the ONIX XML metadata (ONIX version 3.0)
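Here is a rough sketch of checking such a folder and reading the basic ONIX metadata. It assumes ONIX 3.0 reference tags (not short tags), and it takes code 15 in `ProductIDType` to mean ISBN-13. The presentation does not say which ONIX fields the Library extracts, and the file and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict, List, Optional

ONIX_NS = {"o": "http://ns.editeur.org/onix/3.0/reference"}


def parse_onix_product(xml_bytes: bytes) -> Dict[str, Optional[str]]:
    """Extract the ISBN-13 and main title from the first <Product> in an
    ONIX 3.0 (reference-tag) message."""
    root = ET.fromstring(xml_bytes)
    product = root.find(".//o:Product", ONIX_NS)
    if product is None:
        raise ValueError("no <Product> element in ONIX message")
    isbn = None
    for pid in product.findall("o:ProductIdentifier", ONIX_NS):
        if pid.findtext("o:ProductIDType", namespaces=ONIX_NS) == "15":  # 15 = ISBN-13
            isbn = pid.findtext("o:IDValue", namespaces=ONIX_NS)
    title = product.findtext(
        "o:DescriptiveDetail/o:TitleDetail/o:TitleElement/o:TitleText",
        namespaces=ONIX_NS)
    return {"isbn": isbn, "title": title}


def check_coresource_folder(folder: Path) -> List[str]:
    """List problems with a delivery folder laid out as described above."""
    problems = []
    if not (folder.name.isdigit() and len(folder.name) == 13):
        problems.append("folder name is not a 13-digit ISBN")
    suffixes = {p.suffix.lower() for p in folder.iterdir()}
    for required in (".epub", ".jpg", ".xml"):
        if required not in suffixes:
            problems.append(f"missing {required} file")
    return problems
```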
For journal issues from Wiley, we receive zip files. Each zip file contains the individual PDF articles and Wiley’s customised XML metadata.
This is what a folder for a single Wiley journal issue looks like.
This is Volume 52, number 2 of the Australian Journal of Politics and History. You can see there are 9 PDF articles and xml files.
Our IT developer has extracted metadata from the xml file to create a virtual issue in our digital collection management system for all of the articles to link from.
We will be receiving 135 journal titles from Wiley in the near future. To indicate the scale of bulk deposit, we have already received 22 journal issues from Wiley as part of our testing, with over 550 PDF articles and metadata files received. Soon we will receive thousands of Wiley journal articles on a range of topics!
The Library’s automated batch ingest process allows journal content and associated metadata to be collected at the article level for the very first time.
This is an example of the Wiley xml data for a journal article. It includes information about the article such as: Title, creators, year of publication, number of pages and PDF file name.
It also includes metadata about the issue, such as: Journal title, volume and numbering, ISSN and the digital object identifier for the issue and individual article.
Before we can receive the journal articles, there is some pre-ingest work required.
Three of our staff in Australian Serials have created individual records for each of the 135 journal titles in our LMS (Voyager).
Our staff also create ‘work’ or ‘parent’ level records for each of the 135 journal titles in the Library’s Digital Collection Management System.
The issue metadata enables the system to automatically create ‘virtual issues’ to link from the parent record for the journal title.
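Creating virtual issues amounts to grouping article metadata by issue under the title-level parent record. The sketch below assumes the parent record is matched by ISSN and the issue is identified by volume and number. The field names stand in for values read from Wiley's XML and are hypothetical, as are the function names.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ArticleMeta:
    """Hypothetical fields standing in for values read from the article XML."""
    issn: str
    volume: str
    issue: str
    title: str
    pdf: str


IssueKey = Tuple[str, str]  # (volume, issue number)


def build_virtual_issues(
    articles: Iterable[ArticleMeta], parents: Dict[str, str]
) -> Tuple[Dict[str, Dict[IssueKey, List[ArticleMeta]]], List[ArticleMeta]]:
    """Group articles into virtual issues under the parent record matched by ISSN.

    Articles whose ISSN has no parent record are returned separately,
    e.g. for staff follow-up.
    """
    tree = defaultdict(lambda: defaultdict(list))
    unmatched: List[ArticleMeta] = []
    for art in articles:
        parent_id = parents.get(art.issn)
        if parent_id is None:
            unmatched.append(art)
        else:
            tree[parent_id][(art.volume, art.issue)].append(art)
    return {pid: dict(issues) for pid, issues in tree.items()}, unmatched
```

Keeping unmatched articles separate, rather than dropping them, reflects the pre-ingest step described above. Articles can only attach to a title once staff have created its parent record.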
Here are some issues for the Australian Journal of Politics and History.
The journal articles are then automatically linked from the ‘virtual issue’.
Here you can see the title and creator metadata for each of the journal articles.
This is how the record looks from our online catalogue.
If you click on the online resources link in the catalogue, this is what you can currently see in Trove.
You can browse the issues and also see how many ‘children’ journal articles there are.
Here are some of the journal articles and metadata displayed in Trove.
As the Wiley journals are commercial publications, access will be restricted to onsite only at the National Library.
Libor Coufal - National Library of Australia - lecture, Prague, 22 March 2017
DIGITAL COLLECTIONS AND THEIR LONG-TERM PRESERVATION AT THE NATIONAL LIBRARY OF AUSTRALIA
1912 – legal deposit
• Copyright Act
• Introduces legal deposit at the federal level
“The publisher of every book which is first published in the Commonwealth … shall, within one month after publication, deliver at his own expense a copy of the book to the Librarian of …”
The first crate of books loaded during the move to Canberra
1927 – Canberra
1960 – National Library
“A body corporate is hereby established under the name ‘National Library of Australia’.” (National Library Act 1960)
Mosaic of home page thumbnails from the PANDORA web archive (by Alex Osborne)
1968 – its own building
Mr Ryan Stokes, Managing Director & CEO, Seven Group Holdings Pty Ltd
Dr Marie-Louise Ayres, Director-General, Executive Member
Ms Jane Hemstritch
Prof Kent Anderson, Mr Thomas Bradley, Ms Janet Hirst, Mr Douglas Snedden
Julian Leeser MP, Senator Claire Moore, Ms Alice Wong
Collect, preserve and make accessible:
• Documentary resources of national significance relating to Australia and the Australian people
• Other significant non-Australian library material
• Australian (national) collection
– Published materials (comprehensive/representative)
– Unpublished materials (selective)
– Focus on the humanities and social sciences + performing arts
– East/South-East Asia and the Pacific (selective)
• In local languages
• In “Western” languages, published outside the region
– “Rest” of the world (selective)
• Relating to or influencing Australia
• Physical and digital
– Newspapers, journals, magazines, newsletters, …
– Web material
– Pictorial material (photographs, drawings, paintings, …)
– Oral history and folklore
– Personal and institutional archives
– Government publications
– Annual reports
• > 10 million items, 6.9 million titles
• 414 staff (June 2016, full-time equivalent), 73
• Budget (2015/16):
– Government – AUD 58.4 million (AUD 9.7 million for collection acquisitions)
– External sources – AUD 17.3 million
• 1980s
• Floppy disks (CD, DVD, USB flash drives)
• < 5 TB
The oldest known publication on a physical carrier (1980)
– 46 thousand hours
– 22.4 million pages
• Books, journals
– 1.1 million pages
– 1.3 million “pages”
Example of a digitisation workflow
Comparative data size of the NLA web archive collections
Total web archive collections: 434 TB
– AU domain harvests from Internet Archive (whole domain harvests): 380 TB
– PANDORA Archive: 28 TB
– Web Archive: 26 TB
• 2008 – discovery/delivery
• Aggregator of Australian content
– NLA, state and territory libraries, cultural and …
– 1,400 collections
– Metadata + full texts + digital objects
Trove as a platform
• User-generated content
– Photographs (Flickr)
– OCR corrections
– Comments, tagging
– User lists
• Data available via API
Legal deposit 2016
All Australian publications
• Literary, dramatic, musical and artistic works
• Print and digital
• Non-commercial and commercial
• Harvest the web
• Require DRM-free publications
• Define “Australian”
Requirements for publishers
Offline: deliver to the Library within 1 month of publication
Online: deliver to the Library within 1 month of request
Offline & Online
Legally required to deliver the “offline” copy
By agreement with the Library, the “online” copy may be delivered instead
Digital challenges and opportunities
• Financial situation
• Growth of digital material
• Digital legal deposit
• New technologies
• Public and government expectations
Digital Library Infrastructure Replacement Program (DLIR)
• Mandate to collect, process, preserve and provide ongoing access to digital information
• Rapid growth of digital resources
• Original infrastructure outdated/inadequate
• 7/2012 – 6/2017 (5 annual phases)
• Approx. AUD 15 million
• From the Library’s budget
• Internal resources
– Project management
– Development, implementation, integration
• Tender (DocWorks + Preservica)
DLIR – projects
• Digital Collecting Project
• Digital Preservation Project
• System Replacement Project
• Web Archiving Project
• NCM Upgrade
• Mass Digitisation
• Legacy Digital Library
• Digital Library Security
• Pictures and Manuscripts
Digital Collecting Project
• Discovery Systems Integration
• Data Migration project
Diagram: weekly transfer of metadata to ILMS & Libraries Australia