3. Plan
Quick introduction to Discovery Interface
1. Why it was needed
2. What it offers
3. How it was built
Conclusion on key reasons for success and benefits
http://discovery.nationalarchives.gov.uk/
10. Quick Introduction to Discovery
What we just saw:
• Search, browse, or read research guides
• Display a record description
  - download it if available
  - if it is closed, see when the document will be opened and whether you can make an FOI request in the meantime to open it
• Check details about the archive that holds the record
11. Plan
Quick introduction to Discovery Interface
1. Why it was needed
   - Prior to Discovery
   - New policies, business requirements
2. What it offers
3. How it was built
Conclusion on key reasons for success and benefits
12. Why it was needed: Prior to Discovery
TNA records information was provided across many different systems and parts of the website
→ End users: so many tools to search on
→ TNA teams: so many systems to manage
We want
• 1 place for the public to find records information
• 1 tool to host, maintain, update, and contribute to
13. Why it was needed: new requirements
14. Plan
Quick introduction to Discovery Interface
1. Why it was needed
   - Prior to Discovery
   - New policies, business requirements
2. What it offers
   - Goal
   - To the end users
   - Technically
   - What our users said about it
3. How it was built
Conclusion on key reasons for success and benefits
15. What it offers: Goal
Help the users to
• Find relevant records
• Understand the records
• Obtain the records
16. What it offers to end users
to researchers, students, the public, government, TNA staff
• A single point of entry
  - to collections from 2500 archives
  - records from public bodies but also private individuals or organisations
• Search
• Guides for searching
• Browse
• View & download records (TNA records only)
• Search for archive contacts
• Collaborate, bookmark pages, save searches, ...
17. What it offers technically
• Robust
  - handles failures
  - handles high peaks of traffic
  - detection of DoS attacks by firewalls
• Flexible and future proof
  - 3-tier architecture
  - use of shards on databases and search engine indexes
• Secure
  - platform highly constrained with (too) many physical firewalls
  - meets government legislation on security
  - Single Sign On
• Keeps up with current technologies
  - from Service Oriented Architecture to microservices
  - mainly .NET but also involves JavaScript (front end), Scala and Java
  - uses several open source technologies
• Accessible from a very broad range of devices (desktops, tablets, smartphones)
• Merged together NRA, A2A, ARCHON, MDR, TNA Catalogue, etc.
• Over 32.4 m "records"
• Over 233k "record creators"
• Over 3,300 archive addresses
• 40+ back-end services
• 5,595,180 visitors last year
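To illustrate the sharding point above, here is a minimal sketch, not Discovery's actual implementation, of how a record identifier could be routed to one of several shards by hashing. The shard names, the `shard_for` helper, and the hash-based strategy are all assumptions made for illustration; the slides do not say which shard key or strategy is used.

```python
import hashlib

# Hypothetical shard names; the real deployment is not described in the slides.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(record_id: str, shards: list[str] = SHARDS) -> str:
    """Pick a shard for a record by hashing its identifier.

    A stable hash (rather than Python's randomised built-in hash())
    keeps the routing identical across processes and restarts.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    # IAID taken from the sample record on the ISAD(G) slide.
    print(shard_for("adb5d9bc-f67b-4e1d-af0a-4d52fb67c923"))
```

Plain modulo routing like this remaps most keys when a shard is added, which is one reason sharded databases and search engines usually manage the assignment of data to shards themselves.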
19. What our users said about it
"It is great to have all these resources in one place so you can search across them all easily"
Feedback from a member of staff
"This is brilliant! Being able to reasonably identify genealogical records and order them online from your archives at a reasonable cost is beyond my wildest dreams!"
Feedback from a customer
"The project has succeeded in integrating diverse data and databases from hundreds of archives to create a beautiful and intuitive new resource that provides a greatly enhanced platform for catalogue information. Discovery will enable archivists to promote their collections more effectively and is certain to attract new users to archives"
Feedback from a Senior Archives Services Manager, King's College London
"The assessment panel very much enjoyed hearing about this important and fascinating service and its adherence to the Digital by Default service standard, this is a challenge to other government services to reach the same level of quality and user focus."
From the Discovery assessment by the GDS assessment panel
20. Plan
Quick introduction to Discovery Interface
1. Why it was needed
   - Prior to Discovery
   - New policies, business requirements
2. What it offers
   - Goal
   - To the end users
   - Technically
   - What our users said about it
3. How it was built
   - Work with other archives (A2A, Expert Contributions)
   - User centred design
   - Agile methodology
   - Evolution of features
   - Learnings
Conclusion on key reasons for success and benefits
21. How it was built: work with other archives
Access 2 Archives
• Retrieved contents from 400 archives, about 10m documents
• One of the biggest works of that scale worldwide at that time (2000 to 2008)
Discovery
• 1 point of entry to TNA archives, then to other archives
• Schema designed through trial and error
• Integration of A2A in 2014
Expert Contributions
• Currently in development
• Provides a back office to other archives so that they can directly update their documents in our database
• Starting with the British Library
22. ISAD(G)
{
"IAID" : "adb5d9bc-f67b-4e1d-af0a-4d52fb67c923",
"PIAID" : "148b7157-ec5e-4c58-8240-c5ebe05a6e25",
"LvlId" : 9,
"Lang" : "English and French",
"HeldBys" : [{"XRefId" : "A13530664"}],
"Ref" : "QSP/244/10",
"Ttl" : "Papers relating to Harrow Parish",
"CovDts" : "1879-1932",
"CFrmDt" : "18790101",
"CToDt" : "19321231",
"PhysDescFrm" : "1 volume",
"AcsConds" : "<p>View by appointment only</p>",
"RstrOnUse" : "<p>Copyright restrictions apply</p>",
"ImmSrcOfAcs" : [{"Desc" : "<p>The papers in this collection were made available by Catherine Stoye, the daughter of Professor G P Wells, in July 2001</p>"}],
"CustHist" : "<p>This collection was first deposited at the United Reformed Church (URC) History Society in 1991</p>",
"LocOfOrigs" : [{"Desc" : "<p>Original in possession of the Hon. Mary Berkeley</p>"}],
"CpsInfo" : [{"Desc" : "<p>Microfilm Roll: PB 83</p>"}],
"SC" : {"Desc" : "<p>Including business accounts and photographs</p>"},
"Links" : [{"XRefT" : "Related material","XRefD" : "<p>See also Vestry, accounts and rates</p>"}],
"Src" : "A2A"
}
<c level="file" langmaterial="eng fre">
<did>
<unitid label="Reference" countrycode="GB" repositorycode="55">QSP/244/10</unitid>
<unittitle label="Title">Papers relating to Harrow Parish</unittitle>
<unitdate label="Date" normal="18790101/19321231">1879-1932</unitdate>
<physdesc label="Extent"><extent>1</extent> <genreform>volume</genreform></physdesc>
</did>
<admininfo>
<accessrestrict><head>Conditions of access</head>
<p>View by appointment only</p>
</accessrestrict>
<userestrict><head>Restriction on use</head>
<p>Copyright restrictions apply</p>
</userestrict>
<acqinfo><head>Immediate source of acquisition</head>
<p>The papers in this collection were made available by Catherine Stoye, the daughter of Professor G P Wells, in July 2001</p>
</acqinfo>
<custodhist><head>Custodial history</head>
<p>This collection was first deposited at the United Reformed Church (URC) History Society in 1991</p>
</custodhist>
<altformavail><head>Location of originals</head>
<p>Original in possession of the Hon. Mary Berkeley</p>
</altformavail>
<altformavail><head>Copies information</head>
<p>Microfilm Roll: PB 83</p>
</altformavail>
</admininfo>
<scopecontent><head>Description</head>
<p>Including business accounts and photographs</p>
</scopecontent>
<add>
<relatedmaterial><head>Related material</head>
<p>See also Vestry, accounts and rates</p>
</relatedmaterial>
</add>
</c>
The first block is the Discovery format (JSON); the second is EAD (XML).
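The correspondence between the two blocks can be sketched in code. This is only an illustration, not TNA's converter: the `ead_to_discovery` function is hypothetical, the field pairings are inferred by comparing the two samples on this slide, and only a handful of fields are covered.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed copy of the EAD component shown on the slide.
EAD_SAMPLE = """
<c level="file" langmaterial="eng fre">
  <did>
    <unitid label="Reference" countrycode="GB" repositorycode="55">QSP/244/10</unitid>
    <unittitle label="Title">Papers relating to Harrow Parish</unittitle>
    <unitdate label="Date" normal="18790101/19321231">1879-1932</unitdate>
  </did>
  <scopecontent><head>Description</head>
    <p>Including business accounts and photographs</p>
  </scopecontent>
</c>
"""


def ead_to_discovery(ead_xml: str) -> dict:
    """Map a few EAD elements to the Discovery-style keys from the slide.

    Hypothetical helper: the pairing of elements and keys is an assumption
    drawn from the sample data, not a documented mapping.
    """
    c = ET.fromstring(ead_xml)
    unitdate = c.find("did/unitdate")
    # The "normal" attribute holds the date range as "YYYYMMDD/YYYYMMDD".
    date_from, date_to = unitdate.get("normal").split("/")
    scope = c.find("scopecontent/p")
    return {
        "Ref": c.findtext("did/unitid"),
        "Ttl": c.findtext("did/unittitle"),
        "CovDts": unitdate.text,
        "CFrmDt": date_from,
        "CToDt": date_to,
        "SC": {"Desc": f"<p>{scope.text}</p>"},
    }


if __name__ == "__main__":
    print(json.dumps(ead_to_discovery(EAD_SAMPLE), indent=2))
```

Splitting the normalised date range into separate from and to values is what makes simple range comparisons on dates possible, which is presumably why the Discovery record carries both alongside the free-text covering dates.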
23. How it was built: User Centred Design
24. How it was built: Agile Methodology
[Diagram: the product drawn as a cake. Layers: User Interface, Back end, Database. Slices (features): Browse collections, Search, Find archive. Search sub-features: search on records' titles, search on all records' metadata, search on records' contents (scanned documents).]
30. How it was built: Learnings
• Use of cutting edge technologies (MongoDB) paid off
• Moved from proprietary to open source technologies
• Get it right within TNA before reaching other archives
• Work with archives in a more collaborative way
31. Plan
Quick introduction to Discovery Interface
1. Why it was needed
   - Prior to Discovery
   - New policies, business requirements
2. What it offers
   - Goal
   - To the end users
   - Technically
   - What our users said about it
3. How it was built
   - Work with other archives (A2A, Expert Contributions)
   - User centred design
   - Agile methodology
   - Evolution of features
   - Learnings
Conclusion on key reasons for success and benefits
32. Conclusion
Key reasons for success
• Focused on user needs
• Iterated throughout and stayed flexible about timescale
• Happy to take risks and redo things if necessary
33. Conclusion
Key benefits to TNA
• Make savings (retire 16+ systems), invest in the future
• Pull all the data + users together
• Put us in a position where we lead the sector
Key benefits to archives
• Access to a wide audience (6 million visitors on Discovery last year)
• Added value to their services with more searchable data than ever
• Security that their catalogue will stay online and be maintained and improved
• All of that regardless of the cuts they may experience
34. Thank you for listening
Any questions?
jeremie.charlet@nationalarchives.gsi.gov.uk
Find out more on http://blog.nationalarchives.gov.uk/
Editor's Notes
A presentation of about 40 minutes
The National Archives is the official archive and publisher for the UK government and for England and Wales. We are the guardians of some of our most iconic national documents, dating back over 1,000 years.
We're going to tell you the story of how we built Discovery, a story that spans the past 5 years.
Can do a live demo instead of using the slides below
Describe what we see
Read the "What is Discovery" description to introduce the project
If the document is classified, you will see it here. You can possibly make an FOI request
If the document is available for download, there will be a link, or the price of the document if it is not free
TNA records information was provided across many different systems and parts of the website
End users: so many tools to master; they had to search on every system
Archival teams: so many systems, managed and supported by different teams and contributed to in different ways, some of them limited for expansion and future development
Sector leadership and our support of the government policy on archives: we were given that responsibility as we started working on Discovery
Increased digitisation and born-digital records: we are limited in how many paper documents we can digitise (we cannot finance the digitisation of everything), but we have to handle all new born-digital documents, and there are more and more of them
Legislative changes, e.g. managing FOI and Data Protection: we want documents to be open by default and to enable users to make FOI requests if they are closed
Increasingly bleak financial outlook for the sector: as for other organisations in the cultural field, there is less and less funding from the government. Many archives do not have the budget to build a website, and some may no longer be able to afford one after a budget cut.
The goals helped to prioritise tasks: if something does not serve any of those goals, we probably do not need it
One of the main focuses of Discovery is that it needs to be future proof, in the sense that it can handle the high volumes of digital records that will be coming our way over the next few years.
• Robust
  - Pairs of servers to handle failures
  - Extra scale-down instances hosted on cloud services to handle high peaks of traffic
  - Detection of DoS attacks by firewalls (Check Point IPS) plus an automated reaction (in development)
• Flexible and future proof
  - 3-tier architecture enables us to move tiers to cloud services if needed
  - Use of shards on databases and search engine indexes: can be extended with extra servers, and can be easily transferred to cloud services
• Secure
  - Platform highly constrained with (too) many physical firewalls
  - Meets government legislation on security
  - Federated Single Sign On to TNA websites using the .NET membership provider: Discovery; the Transfer website for government staff; Record copying (get a copy of non-digitised records); Expert Contributions
• Keep up with current technologies: a lot of innovation going on
  - From SOA to microservices
  - Diverse: mainly .NET Framework (WCF on the back end and ASP.NET MVC for the front end), but also JavaScript and PHP (front end), Scala and Java applications (back end), and MongoDB + SQL Server on the data layer
  - Several open source technologies (Apache Solr, WordPress, MongoDB, AngularJS, Akka.NET)
NRA: National Register of Archives
A2A: Access 2 Archives
ARCHON: contact directory of all archival repositories in the UK
MDR: Manorial Documents Register, very niche and historical, but we have a legal requirement to maintain it
TNA Catalogue: our own archives here
Gives a good overview of what Discovery is focusing on:
• Several servers dedicated to search (Solr servers)
• Lots of servers dedicated to metadata storage and, above all, images: GridFS
• Does not include the filers that still hold most of the files (but we are progressively migrating them to GridFS)
Use each quote to illustrate something:
1: as stated earlier
2: people pay for documents online; we make no profit on it, the money just finances the digitisation of documents
3: we (TNA) are not the only ones to benefit from Discovery
4: this focus on user needs will be detailed later
Complete quote from King's College:
"The Discovery team have succeeded in building an attractive new search engine that will enable users for the first time to fully explore descriptions of the nation's archival heritage, alongside records held by The National Archives. The project has succeeded in integrating diverse data and databases from hundreds of archives to create a beautiful and intuitive new resource that provides a greatly enhanced platform for catalogue information. Discovery will enable archivists to promote their collections more effectively and is certain to attract new users to archives"
Access 2 Archives
• Biggest work of that scale worldwide at that time (2000 to 2008). It gathered the collections of 400 other archives (out of 2,000 in the UK). Stopped in 2008 for lack of funding
• We defined a common EAD schema by analysing their own schemas (most of them were using ISAD(G) but not EAD)
• Very complicated: most of them were not technical, so we had to do the mapping ourselves, and the schemas varied enormously
• 10m documents, contents from 400 archives; now outdated, but it still offers online visibility to many archives which did not even have a website
Discovery
• Schema designed through trial and error
• Integrated A2A into Discovery in 2014 using the original data (does not include their latest updates but still gives them visibility)
Expert Contributions
• Ongoing project to provide a back office to external contributors so that they can directly update their documents in our database
• Start with the British Library, which is our most expert/technical partner
• Work iteratively. We will start reaching other archives once this one is live and most issues are fixed
• We agreed with Axiell (archival software) to provide an "export to TNA EAD format" feature to our archives in the future, to get the latest version of their catalogues
At the end, speak about the timeline:
• First we built the Discovery website with our own contents
• Then we integrated other archives' contents
• And now we are working on a back office to enable other archives to update their contents
> Work step by step: get things working internally, then integrate others
From ISAD(G), as used by other archives
to the TNA EAD format designed in Access 2 Archives
to the Discovery Information Asset format
We did it through user centred design, which means that we focus on users throughout the complete development of our products.
It starts with user research: we need knowledge about our users first (and this is something we had already done before Discovery).
We defined several categories of users using a hiking analogy, according to how advanced they are.
Then we created personas for specific groups of people: a persona is a description of a fictional user, with what they know, what they do, and what they expect from us and from Discovery.
The purpose of personas is to create reliable and realistic representations of your key audience for reference within the project team and organisation-wide. That way we're clear that we're not designing for ourselves but for our user group. We updated the personas with staff from across the organisation in a series of interactive workshops.
When we look at the complete list of personas, we find again the list of users mentioned earlier:
• Academics
• People interested in genealogy
• People completing administrative work
• People passionate about history
• Paid researchers from the government, our staff, or businesses
User centred design has been practised at TNA since 2008, before Discovery
When work started on Discovery, we already had a good amount of knowledge about our users
1 - Different forms of user research
We have carried out different forms of research throughout the project. We
• sought feedback through an online exercise
• sought feedback from visitors in our reading rooms, showing them prototypes of the new designs
• spent a day speaking to members of the public in a cafe to get feedback from non-users of the site
• ran online surveys and used web analytics for more quantitative insights into how people use our website
• ran one-to-one, hour-long sessions in London and Bristol with our users, showing them the new designs and recruiting to our personas to ensure we spoke to the right people. We observed and listened as the participants carried out tasks and gave their feedback.
We then took all this feedback, analysed it, and made changes to finalise the new pages for release in beta.
Agile is a methodology that is used a lot in government, is recommended by the Government Digital Service, and is the norm at TNA
• It describes how people interact
• It describes how we deliver a project
Imagine you want to bake a big cake that represents your website: each slice is a functionality, each layer a component of your software.
How do you implement that?
If you do it like a construction site, you're going to build everything, one layer at a time, and wait for everything to be built before testing it. If you missed something, misunderstood something, or maybe had it completely wrong, you're going to find out after 6 months of development, and this is going to cost you a lot.
If you do it the agile way, you're going to build one feature, maybe one sub-feature, at a time, and publish it immediately. So you can test it very quickly, fail very quickly if anything is wrong, and mend it very quickly, having wasted only 2 to 4 weeks' worth of work.
Now we do that iteratively.
We go through iterations (sprints) of 1 to 4 weeks,
in which we go through discover, design, develop, and test phases. We check with the user in both the discover and the test phases, so twice in each very short iteration.
And then we iterate, and work on the next steps
Concrete example on Discovery
Example of the discover phase: we build a mockup, a fake website, go into the reading room, and get it tested by end users. We immediately know whether it is going to be useful to them, and we don't risk wasting weeks of implementation to learn it
Example of the test phase:
We deliver most of our services directly in beta, a non-finalised version
We iterate not only to build new features, but sometimes to redo things, to improve existing features.
A good showcase is how our front page evolved from 2011 to 2014, as shown here.
Discovery is a search engine, and when we think of a good example of a search engine, we think of Google. Its user interface is minimal, super simplified. You don't need to learn it to use it, you already know how, but you can still do complex searches if you need to and are more familiar with it.
Gathering all those collections from different sources is extremely hard, and so is making a simple UI. But this is what we are trying to achieve with Discovery.
The latest version has a very visual interface, so that end users can quickly scan it. We try to bring what they really need first. They search all collections by default but can tick a box to search only within TNA. We hid the menu bar behind that red button because few users are going to use it, etc.
First technical learnings, then broader ones:
• Use cutting edge technologies: we adopted MongoDB in its early days. It was tough to manage for 9 months, with a lot of exchanges with Mongo, but it paid off in the end
  - better performance than MySQL; a document database is great for representing archival documents; it scales with sharding
  - we built a very good relationship with Mongo, and they shaped their product for us and gave us discounts
• Move from proprietary Autonomy to Solr, and from RedDot to WordPress, because of problems with supplier deliveries and their products' life expectancy, and for money reasons
• We waited for 5 years to work on Expert Contributions, to get our own house in order (have a proper schema matching all our in-house collections)
• Look at how we have been working with other archives in A2A and then Expert Contributions: first we tried to get their contents ourselves, but it could not really work, and it stopped for lack of funding. Expert Contributions follows a different philosophy: we provide them with a platform and are going to guide, teach, and help them to get their contents on it, but they are going to do it mainly themselves.
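On the learning above that a document database represents archival documents well: archival descriptions form a hierarchy, and the sample record on the ISAD(G) slide carries both an `IAID` and a `PIAID` key. The sketch below assumes that `PIAID` holds the parent record's `IAID` (inferred from the key names, not confirmed by the slides) and walks those links to build a breadcrumb. The records, their short identifiers, the two ancestor titles, and the `breadcrumb` helper are invented for illustration.

```python
# Hypothetical in-memory records; only the key names mirror the slide's sample.
RECORDS = [
    {"IAID": "a1", "PIAID": None, "Ttl": "Quarter Sessions"},
    {"IAID": "b2", "PIAID": "a1", "Ttl": "Sessions papers"},
    {"IAID": "c3", "PIAID": "b2", "Ttl": "Papers relating to Harrow Parish"},
]


def breadcrumb(iaid: str, records: list[dict]) -> list[str]:
    """Return titles from the top-level ancestor down to the given record."""
    by_id = {r["IAID"]: r for r in records}
    path = []
    seen = set()
    current = by_id.get(iaid)
    while current is not None:
        if current["IAID"] in seen:  # guard against cyclic parent links
            raise ValueError("cycle in parent links")
        seen.add(current["IAID"])
        path.append(current["Ttl"])
        # A top-level record has no parent, so the lookup yields None.
        current = by_id.get(current["PIAID"])
    return list(reversed(path))


if __name__ == "__main__":
    print(" > ".join(breadcrumb("c3", RECORDS)))
```

Because each record is a self-contained document that only points at its parent, records with very different sets of fields can live in the same collection, which fits catalogues merged from many sources.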
Maybe add:
Shiny vs non-shiny: implementation of the back office AFTER the front end. Would it have been better to implement both together?
> Working on the front end first was good to show that it is useful and beneficial, but you must know that you will have to invest in the non-shiny parts to get all the benefits.
Flexible: being agile does not mean that we don't have a plan. We plan a year ahead which features we want to bring, but we're flexible with that plan: we reassess the priority of our features on each iteration. We may implement only a sub-feature of a big feature, or postpone another big feature. Agile is about setting a number of people working for a specific period of time, not about trying to implement a specified set of features. We implement what we think is best in the time we have, step by step, and we have the guarantee that we will provide something that works as a whole and suits our users' needs.
Open to failure: happy to redo things if necessary (browse is on its 3rd iteration)
Key benefits to TNA
• Pull all the data + users together
> easier to have a holistic picture of everything, and to better understand who our users are, what they need, and what our most popular collections are
Key benefits to archives
• Allows smaller archives to serve their contents to a wider audience (6m people used Discovery last year: no way a small archive could achieve that on its own)