In this very short presentation I intend to give a brief history of progress in resource discovery in the archives domain, to talk about The National Archives' experience of using linked data, and to suggest some possible future directions.
As this is a mixed audience of librarians, archivists and curators, it may help to start with a few points about the nature of archival catalogues and how they operate in an online world.
In the beginning was the National Register of Archives (NRA), founded in 1945 as a paper-based national union catalogue of manuscripts, held in London and indexed by the 'creators' of manuscripts. It now holds information on over 300,000 collections. Its indexes were computerised from the 1970s and given additional place-name and subject access points; they were mounted online in the 1990s; linking to online catalogues began in 2005; and in 2007 the NRA was adapted as an ISAAR-compliant name authority file. In 1998 the National Council on Archives published its seminal report, Archives Online: the creation of a National Archives Network. The report articulated what remains a valid objective: a single online point of access from which it would be possible to search and browse all the available catalogue descriptions of UK archives, linked to a name authority file. It identified the major tasks as the retroconversion of existing paper-based catalogues and the creation of the technical infrastructure. It also recognised that funding silos might mean that a range of different projects would take this vision forward, but that adherence to some simple rules on interoperability would enable them to be linked or joined up in future.
As a result, over the last decade or so, many flowers have bloomed. Some, like A2A and SCAN, have not been taking on new content for some years, while others continue in active development and are indeed represented at this conference. In the current funding environment, however, there must be a concern about how sustainable services that depend on project or renewable funding will prove, and about whether their funders will reward their continuing fulfilment of a core information function as well as supporting further development.
About five years ago there was a prevalent view that the future lay in each repository hosting and managing its own data. Local authorities and universities have invested considerably in making this possible, using two main commercial products, CALM and ADLIB, and a variety of other commercial and bespoke approaches. My view, and I recognise that this is contentious, is that the results have been rather disappointing. Some of the reasons for this are on the screen. And again, in the current funding environment there must be a question about whether the range of repositories providing their own catalogues will continue to grow, and even whether those which have them will continue to afford the licences and investment needed to maintain them.
I want to turn now to The National Archives' own work with linked data. As an organisation we are committed to the principles of open data. We are the arm of the UK government responsible for implementing the PSI Directive, and the UK is widely viewed across Europe as the most enthusiastic advocate of open data. The current government has taken this agenda further through a range of initiatives: data.gov.uk and now proposals for a Public Data Corporation. We are fortunate to have, in John Sheridan, one of the pioneers and greatest advocates of linked data, certainly in the UK and possibly internationally. He has built up a team which has delivered the legislation.gov.uk site, of which TNA is extremely proud. It has revolutionised the accessibility of primary and secondary legislation, both in its original enacted form and as amended, and it is possible to see the state of the legislation at any particular date in the past. Linked data has been critical in making this possible. John's work has prompted us to explore how linked data can work in other contexts. We are producing a linked data version of PRONOM, which enables the file format definitions it contains to be matched with those in similar registries across the world. And we are exploring its application to resource discovery.
The context for this is the beta launch of our "Discovery" system, an improved resource discovery system initially focusing on the Catalogue and some digitised resources. We are planning to explore the use of linked data, in the form of the "Open Annotations" model, to link catalogue records and related user-generated content (UGC). Our ambition is to enable UGC to enrich the catalogue while making it easy to see at a glance what is authoritative, TNA-sourced catalogue data and what is possibly less authoritative UGC. In 2011 we will be seeking views on the future development of the NRA, with a renewed appetite for sector leadership. We aim to move to a new infrastructure platform in 2012-14 which will radically enhance the technical and collaborative possibilities. We are willing to consider rebuilding the data structure to facilitate new ways of working, and we will explore the potential of linked data, web crawling and other approaches to extending the effective data content. We are considering offering a hosting service for repository catalogues that would allow you to edit them remotely, potentially removing the need for smaller services to build an online catalogue at all. And we are interested in enabling crowdsourcing, and possibly in linking to online resources like Wikipedia, as ways of rapidly building name authority content. SO: TELL US WHAT YOU WANT THE NRA TO DO FOR YOU!
I hope there is enough there to stir up debate! Thank you.
Towards a national archives network - Nick Kingsley (The National Archives)
Nick Kingsley 22 April 2010 Towards a National Archives Network?
Non-archivists start here... <ul><li>Archival holdings consist of collections (or fonds) representing any number of archival objects; the collections are the primary units of management </li></ul><ul><li>Collections consisting of more than a few documents are likely to have a natural or imposed internal hierarchical structure, which should be reflected in detailed catalogues </li></ul><ul><li>Ideally catalogues are linked to authority records for names and places, and to taxonomies for subjects, which serve as access points, disambiguate terms and provide context </li></ul><ul><li>Online representations of detailed catalogues need to render the hierarchical structure and linkages successfully </li></ul><ul><li>Archive users typically use a combination of search and browse approaches in resource discovery </li></ul><ul><li>Catalogues compiled over a century or more are not consistent in style, language or structure, but the basic elements of the modern international standard, ISAD(G), can usually be recognised </li></ul>
A short history of archival networking <ul><li>In the beginning there was the National Register of Archives </li></ul><ul><li>Archives Online report published by National Council on Archives in 1998 </li></ul><ul><ul><li>Articulated the concept of a single online point of access from which it would be possible to search and browse all the available catalogue descriptions of UK archives, linked to a name authority file </li></ul></ul><ul><ul><li>The technology envisaged at the time, of course, has changed. But the objective remains valid. </li></ul></ul><ul><li>The realities of the funding silos meant that this report was taken forward by a series of different projects which committed to a basis of interoperability to protect the potential for future integration or cross-searching </li></ul>
Many flowers bloom... <ul><li>A2A: multi-level lists mainly from local authority archives in England </li></ul><ul><li>Archives Hub: often new or edited collection-level descriptions mainly from university and specialist institutions </li></ul><ul><li>AIM25: often new or edited collection-level descriptions from specialist archives in London </li></ul><ul><li>Archives Wales: collection-level descriptions (perhaps multi-level in future) from all archives in Wales </li></ul><ul><li>SCAN: collection and multi-level lists from mainly local authority archives in Scotland </li></ul><ul><li>JANUS: lists from higher education institutions in Cambridge </li></ul><ul><li>...but how sustainable are they in the current funding environment? </li></ul>
Repository catalogues <ul><li>Individual repository online catalogues were thought likely to replace the networks, but they have usually proved disappointing by comparison with the facilities supported by the aggregators: </li></ul><ul><ul><li>Sometimes constrained by lack of technical support from the parent organisation or by the use of platforms acquired for other purposes </li></ul></ul><ul><ul><li>Two widely adopted commercial platforms, whose suppliers prefer to take forward only developments commanding majority support among their user groups </li></ul></ul><ul><ul><li>Rarely offer robust and flexible search and browse facilities </li></ul></ul><ul><ul><li>Rarely comprehensive in coverage </li></ul></ul><ul><ul><li>Will the next generation of the technology be affordable in the current climate? ICA-AtoM is an open source competitor which may become more widely adopted. </li></ul></ul>
The National Archives and Linked, Open Data <ul><li>The National Archives, as the UK regulator for the EU Public Sector Information Directive, is committed to supporting and promoting open data </li></ul><ul><li>The National Archives has also been a pioneer in exploiting the potential of Linked Data through its website www.legislation.gov.uk, one of the first large-scale implementations of linked data </li></ul><ul><li>Experience with legislation has led us to rebuild the PRONOM file format registry using a linked data approach. This is available on the TNA labs site: http://labs.nationalarchives.gov.uk/wordpress/ </li></ul><ul><li>The third area where we are exploring the use of linked data is resource discovery. </li></ul>
Future of resource discovery at TNA <ul><li>We have recently launched the Discovery system on the TNA Labs site. This does not employ a linked data approach but uses the Autonomy category classifier to create a taxonomy which can be applied automatically across the 11,000,000 records in the dataset. </li></ul><ul><li>We are exploring using a linked data approach (Open Annotations) to connect entries in the catalogue to user generated content relating to them </li></ul><ul><li>We are about to review the business purpose and technical infrastructure of the NRA and will explore using a linked data approach to connect elements of a distributed name authority file and also to connect different levels of description of the same collection on different sites: for example a short description in the NRA, a collection level description in AIM25 and a multi-level description on a repository website </li></ul>
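The idea of using Open Annotations to attach user-generated content to catalogue entries, while keeping authoritative data distinguishable, can be sketched in a few lines. This is a minimal illustration, not TNA's implementation. It assumes the JSON-LD shape of the W3C Web Annotation Data Model (the later standardised successor to the Open Annotation work), and every URI, the `TRUSTED_CREATORS` set and the helper functions `annotate` and `split_by_authority` are hypothetical. Each contribution is stored as an annotation whose target is a catalogue record, and annotations are grouped as authoritative or user-contributed according to who created them.

```python
import json

# Hypothetical identifiers for illustration only; these are not real TNA URIs.
RECORD = "http://example.org/catalogue/C123456"
TNA_AGENT = "http://example.org/agent/tna"
USER_AGENT = "http://example.org/user/42"

# Hypothetical policy: annotations by these creators count as authoritative.
TRUSTED_CREATORS = {TNA_AGENT}


def annotate(anno_id, target, text, creator_id, motivation="commenting"):
    """Return a dict shaped like a W3C Web Annotation in JSON-LD."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "motivation": motivation,
        "creator": {"id": creator_id},
        "body": {"type": "TextualBody", "value": text, "format": "text/plain"},
        "target": target,
    }


def split_by_authority(annotations, record):
    """Split the annotations targeting one record into (authoritative, user-contributed)."""
    on_record = [a for a in annotations if a["target"] == record]
    authoritative = [a for a in on_record if a["creator"]["id"] in TRUSTED_CREATORS]
    contributed = [a for a in on_record if a["creator"]["id"] not in TRUSTED_CREATORS]
    return authoritative, contributed


annotations = [
    annotate("http://example.org/anno/1", RECORD,
             "Covering dates checked against the original.", TNA_AGENT,
             motivation="describing"),
    annotate("http://example.org/anno/2", RECORD,
             "My great-grandfather is named on folio 3.", USER_AGENT),
]

official, contributed = split_by_authority(annotations, RECORD)
print(len(official), len(contributed))  # 1 1
print(json.dumps(contributed[0], indent=2))
```

Deciding authority by creator is only one possible design choice. A live service might instead keep TNA's catalogue data outside the annotation layer altogether and treat every annotation as user-contributed, which would make the at-a-glance distinction even simpler.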