The Archives Hub ~ Interoperability, Spokes and the Distributed Model
The Hub in a Nutshell
- Based at Mimas, University of Manchester
- In service since 2000
- Over 23,000 collection descriptions
- 170 repositories
- JISC funded
- Management and service team at Manchester
- Development team at Liverpool
- Cheshire software
- Cheshire for Archives: works with EAD descriptions
- Distributed system

Content and contributors
- Strategic aim: build and enhance content
- Meeting the needs of the UK research community
- Meeting the needs of the wider community
- Archives for education and research
- The success of the Hub is a reflection of the rich content available from Hub contributors
- Image: Flickr cc licence, eileenaway's photostream
Hub Workshop 2009

Current contributors
- Higher/Further Education
- Consortium contributions
- Institutions with a research agenda
- Others on a case-by-case basis
- We encourage institutions to contact us
- Image: John Rylands Library, Manchester

Collection or lower-level…?
- Originally funded for collection-level descriptions
- Software/searches effective with both
- Complementary approaches
- Researchers ask for detail
- Images useful at item level
- Images: Flickr cc licence, Muffet’s photostream; Flickr cc licence, soylentgreen23’s photostream

JISC Information Environment
"… a vast and sometimes bewildering range of potential sources of electronic information. Each source of information has its own name, its own interface, features and search facilities. Little wonder, then, that many users remain unaware of their existence or fail to discover their value for their own learning, teaching or research. A key challenge is therefore to achieve a managed, coherent and shared information environment that will overcome these obstacles."
"Being able to cross-search and use customised, value added and other services will considerably simplify users’ interactions with online resources. This should encourage take-up and greatly improve means of accessing these resources."
"… these activities need to be based on standards for the creation, access, use, preservation and interoperability of networked resources."
Source: http://www.jisc.ac.uk/index.cfm?name=ie_home

JISC Information Environment
"Most content providers will already offer a Web site through which end-users can access their content. To be a part of the JISC-IE, content providers also need to support machine oriented interfaces to their resources."
- Support searching using Z39.50/SRW
- Support metadata harvesting using OAI-PMH
Source: Andy Powell, 5 step guide to becoming a content provider in the JISC Information Environment, http://www.ariadne.ac.uk/issue33/info-environment

e-GIF, open source and open standards
e-GIF version 6.1 (18th March 2005)
- The e-Government Interoperability Framework (e-GIF) sets out the government’s technical policies and specifications for achieving interoperability… across the public sector.
- There is a strategic decision to adopt XML and XSL as the core standards for data integration and management.
- It is a pragmatic strategy that aims to reduce cost and risk for government systems while aligning them to the global Internet revolution.
Source: http://www.govtalk.gov.uk/documents/eGIF%20v6_1(1).pdf
Open Source, Open Standards and Re-Use: Government Action Plan, http://www.netvibes.com/cabinetoffice#Open_Source

Isn’t technology brilliant?!!
- Technical know-how
- XML
- Data creation/editing template
- Web interface
- Machine interfaces
- Distributed model
- Web 2.0
- Dissemination = satisfying user experience + understanding users

Hub Data Flow
- Sustainable model
- Data held as XML
- Efficient search mechanism
- Flexible access
- Easy to become a Spoke

The Distributed Hub
"The main goal of a distributed computing system is to connect users and resources in a transparent, open, and scalable way. Ideally this arrangement is drastically more fault tolerant and more powerful than many combinations of stand-alone computer systems." [Wikipedia]
- Administration interface
- Customisable web front-end
- Machine-to-machine interfaces
- Data Creation Template
- Local control
- Technical support locally
- Hub team support
- Image: Flickr cc licence, Thomas Hawk

Spokes software
- Offers a means of storing and sharing archival descriptions in XML
- Provides machine-to-machine access to the descriptions through Z39.50 and SRU (Search/Retrieve via URL), plus OAI-PMH for harvesting records
- Provides a customisable Web search interface
- Is open source and based on open standards
- Includes a data creation and editing template

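As an illustration of the SRU access mentioned above, here is a minimal sketch of how a client might build a searchRetrieve request URL. The parameter names (operation, version, query, startRecord, maximumRecords) come from the SRU specification; the base URL and the function name build_sru_url are assumptions made for this example and are not taken from the Spokes software.

```python
from urllib.parse import urlencode


def build_sru_url(base_url, cql_query, start_record=1, maximum_records=10):
    """Return an SRU 1.1 searchRetrieve URL for the given CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,                  # CQL string, URL-encoded below
        "startRecord": start_record,         # 1-based position of first record
        "maximumRecords": maximum_records,   # page size requested by the client
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical Spoke endpoint; a real Spoke's SRU address will differ.
url = build_sru_url("http://spoke.example.ac.uk/sru/ead", 'dc.title = "papers"')
print(url)
```

The same URL could be pasted into a browser, which is much of the appeal of SRU compared with Z39.50: the request is ordinary HTTP.
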
Anatomy of a Spoke
- EAD XML files
- Cheshire indexes of EAD data
- Web search interface
- Direct searching access for other applications through standards-based machine-to-machine protocols … including the central Hub!
- Protocols shown: HTTP, Z39.50, SRU

Spokes indexes
The database will provide indexes based on the following standards:

Data standard          Data field(s)
cql.anywhere           full text
dc.description         unittitle, controlaccess, and scopecontent fields
dc.title               collection title (titleproper)
dc.creator             creator of the collection
dc.identifier          eadid
dc.date                unitdate
dc.subjects            controlaccess fields
bath.name              personal, family, corporate and geographic names
bath.personalName      personal names
bath.corporateName     corporate names
bath.geographicName    geographic names
bath.genreForm         genre

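The index names listed for Spokes are the kind of names a client would use in a CQL query. The sketch below shows one way to assemble such queries and reject unknown index names early. The helper names cql_clause and cql_and are hypothetical, and whether a given Spoke supports a particular relation or boolean combination is an assumption that should be checked against that Spoke.

```python
# Index names as listed for Spokes; the set itself is just a local lookup.
SPOKE_INDEXES = {
    "cql.anywhere", "dc.description", "dc.title", "dc.creator",
    "dc.identifier", "dc.date", "dc.subjects", "bath.name",
    "bath.personalName", "bath.corporateName", "bath.geographicName",
    "bath.genreForm",
}


def cql_clause(index, term, relation="="):
    """Build one CQL search clause: index, relation, quoted term."""
    if index not in SPOKE_INDEXES:
        raise ValueError(f"unknown index: {index}")
    # Escape backslashes and double quotes so the term stays one quoted string.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index} {relation} "{escaped}"'


def cql_and(*clauses):
    """Join clauses with boolean AND, bracketing each one."""
    return " and ".join(f"({clause})" for clause in clauses)


query = cql_and(
    cql_clause("dc.title", "trade union"),
    cql_clause("bath.personalName", "Gaskell"),
)
print(query)
```
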
Administration Interface
http://spoke.mimas.archiveshub.ac.uk/ead/admin/
 
Liverpool Spoke
John Rylands Spoke
Agreement with Spokes
Hosted Spokes
- Spokes at Manchester
- Configuration
- Agreement between parties
- Manchester team undertake agreed level of support
- Institution still responsible for the data

Being an Archives Hub Spoke…
- Gives you control over your own EAD files
- Allows you to update and add new files when you need to
- Exposes your EAD to other applications which need to cross-search the descriptions, using standards-compliant methods
- Means you benefit from using software that has been developed with the Archives Hub community

Collaboration & Sharing
- Networks and communities: the National Archives Network
- Cross-service and cross-domain collaboration: Copac, Intute, digitisation projects
- Expand and share content: import/export/M2M
- Links to other archive services: NRA

The National Archives Network
A comprehensive national resource discovery mechanism
‘Our vision of the future of British archives is of a flow of archival information which takes account of all the opportunities offered by digital networks and offers opportunity for exploration - historical, personal, social - to the broadest possible range of people wherever they can use it - in the home, the classroom or the office.’
Source: British Archives: The Way Forward (NCA, 2000)

The importance
‘There can be no higher priority for archives than the creation of this collaborative electronic network, overcoming the limitations of geography, crossing the many archival sectors and creating a truly unified digital directory or encyclopaedia of British historical documents.’
Source: British Archives: The Way Forward (NCA, 2000)

National Archives Network
The opportunity
‘Outreach has been a developing preoccupation for archives in recent years, but the arrival of the internet age provides the opportunities to take archives, as never before, to the doorstep of the community at large.’
Source: British Archives: The Way Forward (2000)

Progress of the NAN
- Many archives took part in this drive towards a national archives network … many still are taking part
- The importance of recognised standards
- Intention to create collection-level catalogues of all substantial collections within a defined timeframe

Success of the NAN
- Strands of the national archives network provide access to archives that were previously inaccessible
- The HLF has played a major role in enabling access and online discovery
- Users of archives have benefited enormously
- Data standards have become of central importance

Shortcomings of the NAN
- We don’t have a single national network
- Differences in data structure, content, search capabilities, and look and feel
- Strands are not fully interoperable
- Politics, funding and willpower may not combine in favour of this approach
- The landscape has changed substantially since 2000; maybe this solution is no longer appropriate?

The NAN today
- Many ‘strands’
- Only a few use EAD (support EAD export)
- Lack of funding for a joint solution
- Key is interoperability and machine-to-machine interfaces:
  - NAN as a community, sharing knowledge and experiences
  - NAN as a promoter of standards and facilitator for data sharing
  - NAN strands as promoters of flexible and open approaches

The Interoperable Hub
Interoperability: the ability of software and hardware on different machines to share data
- Content standards
- Structural standards
- Validation of content
- Data Editor
- Training and awareness
- Contributor responsibility
- Networking and community building

Machine-to-machine interfaces
- Web access is just one means of access to the data
- Machine access provides flexible access, so people can set their own agendas
  - Z39.50
  - SRU
  - OAI-PMH (harvester)
- Need to provide semantic data: properly marked up, well-structured

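To make the OAI-PMH harvesting route more concrete, the following is a minimal sketch of the two halves of a harvest step: building a ListRecords request and reading record identifiers and the resumption token from a response. The verb, the metadataPrefix and resumptionToken arguments, and the XML namespace come from the OAI-PMH 2.0 protocol; the endpoint URL, the identifiers, and the sample response are invented for illustration, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords URL; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text for el in root.findall(".//oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Invented sample response, trimmed to the elements the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example:gb-0000-abc</identifier></header></record>
    <record><header><identifier>oai:example:gb-0000-def</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://spoke.example.ac.uk/oai")
ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://spoke.example.ac.uk/oai", resumption_token=token)
print(first_url, ids, next_url, sep="\n")
```

A harvester would repeat the request with each new token until the response carries none.
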
Pilot project for SRU: Genesis portal for Women’s Studies
- Hub hosts data
- Genesis searches the Hub using SRU
- Implications for data: how to search just for appropriate descriptions?
- Possible issues with search speeds

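One possible way to restrict a portal's searches to appropriate descriptions is to AND a fixed scope clause onto every query the portal sends. This is only a sketch of that idea, not a description of how the Genesis pilot worked: the function name scope_query, the choice of the dc.subjects index, and the subject terms are all assumptions made for illustration.

```python
def scope_query(user_query, scope_clause):
    """AND a fixed scope clause onto the CQL query supplied by the portal user."""
    user_query = user_query.strip()
    if not user_query:
        raise ValueError("user_query must not be empty")
    # Bracket both sides so boolean operators inside either part stay grouped.
    return f"({user_query}) and ({scope_clause})"


# Hypothetical scope: descriptions indexed under any of these subject terms.
WOMENS_STUDIES_SCOPE = 'dc.subjects any "women suffrage feminism"'

scoped = scope_query('cql.anywhere all "trade union"', WOMENS_STUDIES_SCOPE)
print(scoped)
```

Filtering on subject terms depends on those terms being present and consistent in the descriptions, which ties back to the need for well-structured data.
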
Persistent Identifiers
- All Hub descriptions have their own identifiers: a unique reference
- Gives them their own web address, so a link can point to any description
- Facilitates linking, e.g. from the National Register of Archives
- Enables bookmarking of content
See: http://www.archiveshub.ac.uk/arch/glossary.shtml#identifier

Challenges (of which there are many)
- Understanding our users
- Encouraging item-level descriptions
- Encouraging images/links to content
- Which technology?
- Perceptions of relevancy
- Understanding impact
- Sustainability
- Image: Flickr cc licence, hoodwink’s photostream

Moving Forward
- Increasing content and contributors
- Branding and new website
- More engagement with users / user-generated content
- Continuing to be standards-based, open and interoperable

Hub Distributed Model 2009
