This article discusses the differences between Versailles, Versailles and Versailles. Versailles refers to a city near Paris known for its royal palace. Versailles is a commune in France also known for its royal palace. Versailles is a commune in western France that shares its name with the other locations but is distinct from them.
We had hoped to work together much more over the last year but, for our own reasons, we haven’t been able to. Now that RES is built, it is a good time to reboot our relationship.
We won’t be putting it to a referendum
The Research and Education Space (RES) is a partnership project between Jisc, the British Universities Film & Video Council (BUFVC) and the BBC that aims to make it easier for teachers, students and academics to discover, access and use material held in the public collections of broadcasters, museums, libraries, galleries and publishers.
The RES initiative comprises:
A platform, built by the BBC, which aggregates the catalogues of publicly-held archives and makes them accessible to the UK’s educational establishments.
A collaborative project to work with collection holders - public sector organisations, archives and libraries - to release their catalogues in the form of linked open data, to assist in the discovery of these assets.
An ambition to stimulate public and private companies to build teaching products, underpinned by the platform, for the UK’s education sector.
Right now we’re indexing data from the huge British medical charity, the Wellcome Trust. We’re working with the British Museum, The National Archives, the British Library, Nature, the Library of Congress, the Ordnance Survey and many, many more.
To start, let’s go back 2500 years
Back then, the citizens of Athens had greater access to archives than we do today
They could go into the ‘metroon’, which held all of the state’s political, administrative and cultural documents, and read and take away copies of anything they found.
But, times have changed.
There are now too many Metroons
Only a select few are allowed to enter and they may be able to look at what they find, but probably not copy it or borrow it…
There’s obviously a lot more data as well
In a story on the BBC News website in March last year, IBM estimated that 2.5 exabytes - that's 2.5 billion gigabytes (GB) - of data was generated every day in 2012.
“About 75% of data is unstructured, coming from sources such as text, voice and video.”
So we need to start making some kind of sense of all that …
And this data from the National Archives neatly illustrates the issue that RES is trying to solve.
In the first data set we have Alfred Frederick Minall; in the second, AF Minall. Are they the same person?
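To see why that question is hard for software, here is a minimal sketch in Python of a naive name comparison. It is purely illustrative and is not how RES works: the function names and the matching rule are assumptions made up for this example. All it can tell us is that two names are compatible, never that they refer to the same person.

```python
def name_tokens(name: str) -> list[str]:
    """Split a name into lowercase tokens, expanding a short all-caps
    token such as "AF" into separate initials ("a", "f")."""
    tokens = []
    for part in name.replace(".", " ").split():
        if part.isupper() and len(part) <= 3:
            tokens.extend(ch.lower() for ch in part)
        else:
            tokens.append(part.lower())
    return tokens


def _forenames_match(x: str, y: str) -> bool:
    """A forename matches another in full, or as its initial."""
    if x == y:
        return True
    if len(x) == 1:
        return y.startswith(x)
    if len(y) == 1:
        return x.startswith(y)
    return False


def could_be_same_person(a: str, b: str) -> bool:
    """True if the surnames agree and every forename is compatible.
    This shows the names do not contradict each other, nothing more."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb or ta[-1] != tb[-1]:
        return False
    fa, fb = ta[:-1], tb[:-1]
    if len(fa) != len(fb):
        return False
    return all(_forenames_match(x, y) for x, y in zip(fa, fb))


# Both of these are compatible with "AF Minall", which is exactly the problem.
print(could_be_same_person("Alfred Frederick Minall", "AF Minall"))
print(could_be_same_person("Arthur Frank Minall", "AF Minall"))
```

A string comparison like this cannot settle the question on its own, which is why explicit statements of equivalence in the data matter so much.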
We’ve set ourselves a big task … but someone once asked … how do you eat an elephant? And the answer is
One mouthful at a time …
So let’s start small … with a school student looking for information about Versailles for their homework.
Do they mean the Palace of Versailles
Or The Treaty of Versailles
Or the Japanese visual kei metal band formed in 2007
And this is what I get if I put Versailles into Google. Now, I know that this is a deliberately incomplete example, but it does illustrate how Google, while it does its job brilliantly, is perhaps not the answer to the problems posed by aggregating cultural data
Google does not particularly care about provenance – we do
Google does not particularly care about authenticity – we do
Google does not particularly care about licensing – we do
Google does not particularly care about permanence – we do
Google cares about what’s contemporary – we don’t
Google cares about the number of links to an asset – we don’t
Wouldn’t it be better if, when you’d found a reliable source of data … that clicking on a person’s name or an event in a document would deliver you a comprehensive list of everything about them held in any institution around the world
There are lots of good reasons to open up access to our collective heritage and memory, but there are many challenges in the way.
The internet has the capability and the technology to enable this, but we need to work together to deliver this change
We need to use open standards to make it work
That’s why we’ve backed Linked Open Data … a mechanism for publishing structured data on the Web about virtually anything, in a form which can be consistently retrieved and processed by software.
The result will be added to the world wide web of data which works in parallel to the web of documents our browsers usually access, transparently using the same protocols and infrastructure.
Turning legacy datasets into linked open semantic data is not hugely difficult technically, but it can be time-consuming and requires some specialist expertise.
Where the ordinary web of documents is a means of publishing a page about something intended for a human being to understand, this web of data is a means of publishing data about those things.
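As a rough illustration of what "data about things" means, here is a minimal sketch that holds a few statements as subject, predicate, object triples in plain Python. The `rdf:type` and `rdfs:label` predicate URIs are the standard ones; every `example.org` identifier and class is hypothetical and invented for this example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical identifiers and classes, used for illustration only.
EX = "http://example.org/"

triples = [
    (EX + "palace-of-versailles", RDFS_LABEL, "Versailles"),
    (EX + "palace-of-versailles", RDF_TYPE, EX + "Place"),
    (EX + "treaty-of-versailles", RDFS_LABEL, "Versailles"),
    (EX + "treaty-of-versailles", RDF_TYPE, EX + "Event"),
    (EX + "versailles-band", RDFS_LABEL, "Versailles"),
    (EX + "versailles-band", RDF_TYPE, EX + "Group"),
]


def things_labelled(label, data):
    """Return {subject URI: type URI} for every subject carrying this label."""
    subjects = [s for s, p, o in data if p == RDFS_LABEL and o == label]
    types = {s: o for s, p, o in data if p == RDF_TYPE}
    return {s: types.get(s) for s in subjects}


# Three different things share one label, but each has its own identifier
# and type, so software can tell them apart.
for uri, kind in things_labelled("Versailles", triples).items():
    print(uri, "is a", kind)
```

The word "Versailles" is ambiguous to a search box, but the three identifiers are not, and that is the property the web of data relies on.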
So here’s what we’re building.
Powering the Research & Education Space is Acropolis, a technical stack which collects, indexes and organises rich structured data about archive collections published as Linked Open Data (LOD) on the Web. The collected data is organised around the people, places, events, concepts and things related to the items in the archive collections—and, if the archive assets themselves are available in digital form, that data includes the information on how to access them, all in a consistent machine-readable form.
The Research and Education Space is made up of three main components: a specialised web crawler, Anansi; an aggregator, Spindle; and a public API layer, Quilt.
Anansi’s role is to crawl the web, retrieving permissively-licensed Linked Open Data, and passing it to the aggregator for processing.
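Spindle examines the data, aggregating descriptions of the same person, place, event, concept or thing into a single entry in the index.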
Quilt is responsible for making the index available to applications, also by publishing it as Linked Open Data. Because RES maintains an index, rather than a complete copy of all data that it finds, applications must consume data both from the RES index and from the original data sources.
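The "index, not a copy" point can be sketched as follows. This is a toy model, not the Quilt API: the two dictionaries stand in for the RES index and for the collection holders' own servers, and every URL, field name and record in them is hypothetical.

```python
# Hypothetical stand-in for the collection holders' servers: URL -> full record.
SOURCES = {
    "http://archive-a.example/records/42": {
        "title": "Treaty of Versailles, signed copy",
        "holder": "Archive A",
    },
    "http://archive-b.example/items/7": {
        "title": "Newsreel: Versailles, 1919",
        "holder": "Archive B",
    },
}

# Hypothetical stand-in for the index: a thin entry holding only a label and links.
INDEX = {
    "http://index.example/things/treaty-of-versailles": {
        "label": "Treaty of Versailles",
        "sources": [
            "http://archive-a.example/records/42",
            "http://archive-b.example/items/7",
        ],
    },
}


def fetch_source(url):
    """Simulates an HTTP request to the original data source."""
    return SOURCES[url]


def describe(entity_uri):
    """Step 1: ask the index what it knows about the entity.
    Step 2: follow each link back to the source for the full record."""
    entry = INDEX[entity_uri]
    records = [fetch_source(url) for url in entry["sources"]]
    return {"label": entry["label"], "records": records}


result = describe("http://index.example/things/treaty-of-versailles")
for record in result["records"]:
    print(record["holder"], "-", record["title"])
```

The design choice this illustrates is that the index only tells an application where to look; the detail always comes from the collection holder.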
The real cleverness lies in Spindle
It’s designed to evaluate rich descriptions of people, places, events, collections, concepts and things … primarily where the data explicitly states the equivalence … and aggregate and store that information in an index … preserving complete provenance information … in a manner which makes it most useful for those who aren’t trained and experienced archivists or librarians to use.
For example, the Treaty of Versailles is an event, and one which will appear in the catalogues of the British Library, the National Archives, the BBC, the Imperial War Museum, and countless others. Spindle aims to be able to aggregate all of those occurrences of the Treaty of Versailles under a single entity from which all of the material related to it can be located.
By doing this, multiple sets of catalogue data can be represented in a form which matches the way in which people tend to try to use archives (or, indeed, the Web in general): homing in on the subjects they are interested in, safe in the knowledge that all of the available archive material will be grouped under those subjects.
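One common way to group records that explicitly declare their equivalence is a union-find over `owl:sameAs` statements, keeping a note of which source made each statement. The sketch below uses that technique as an illustration; it is an assumption for the sake of example, not a description of Spindle's actual implementation, and the catalogue URLs and source names are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def aggregate(statements):
    """Group subjects linked by owl:sameAs into single entities.

    statements: iterable of (subject, predicate, object, source).
    Returns {entity key: {"members": set of URIs, "provenance": set of sources}}.
    """
    statements = list(statements)
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the lexicographically smallest URI as the stable key.
            parent[max(ra, rb)] = min(ra, rb)

    for s, p, o, _source in statements:
        find(s)
        if p == OWL_SAME_AS:
            union(s, o)

    entities = {}
    for s, p, o, source in statements:
        entry = entities.setdefault(find(s), {"members": set(), "provenance": set()})
        entry["members"].add(s)
        entry["provenance"].add(source)  # remember who said it
        if p == OWL_SAME_AS:
            entry["members"].add(o)
    return entities


# Hypothetical catalogue data from three hypothetical collections.
data = [
    ("http://a.example/treaty", RDFS_LABEL, "Treaty of Versailles", "Collection A"),
    ("http://a.example/treaty", OWL_SAME_AS, "http://b.example/versailles-1919", "Collection A"),
    ("http://b.example/versailles-1919", RDFS_LABEL, "Versailles, Treaty of", "Collection B"),
    ("http://c.example/palace", RDFS_LABEL, "Palace of Versailles", "Collection C"),
]

for key, entity in aggregate(data).items():
    print(key, sorted(entity["members"]), sorted(entity["provenance"]))
```

Here the two treaty records collapse into one entity with both collections recorded as provenance, while the palace stays separate because nothing in the data says it is the same thing.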
BUT the RES project will NOT be directly developing end-user applications, although sample code and demonstrations will be published to assist software developers in doing so. RES only indexes and publishes catalogue data released under terms which permit re-use in both commercial and non-commercial settings.
We have seed-funded a small number of prototypes such as this – RES Builder from Gooi
PDF or document scanner
Example: a teacher scans an exam spec or a set of learning objectives into the RES Builder platform. It then picks out relevant words and phrases from the document, matches them to keywords in the metadata attached to the assets, and brings up a variety of different assets for use at different levels in the classroom.
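The matching step described here can be approximated with a very small sketch: pull candidate keywords out of the scanned text, then rank assets by how many keywords they share with it. This is an illustration only and says nothing about how RES Builder is actually built; the stopword list, the function names and the sample assets are all assumptions.

```python
import re
from collections import Counter

# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"the", "and", "of", "for", "with", "that", "this", "from", "into"}


def keywords(text, top=10):
    """Return up to `top` of the most frequent non-trivial words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return {word for word, _count in counts.most_common(top)}


def match_assets(document, assets):
    """Rank assets by the number of keywords shared with the document.

    assets: {asset id: set of metadata keywords}.
    Returns a list of (asset id, shared keywords), best match first.
    """
    doc_keywords = keywords(document)
    matches = []
    for asset_id, asset_keywords in assets.items():
        shared = doc_keywords & {k.lower() for k in asset_keywords}
        if shared:
            matches.append((asset_id, shared))
    return sorted(matches, key=lambda m: (-len(m[1]), m[0]))


# Hypothetical learning objective and hypothetical asset metadata.
objective = (
    "Explain the causes and consequences of the Treaty of Versailles, "
    "and the role of the League of Nations."
)
assets = {
    "newsreel-1919": {"Versailles", "treaty", "peace"},
    "palace-tour": {"Versailles", "palace"},
    "cricket-report": {"cricket"},
}

for asset_id, shared in match_assets(objective, assets):
    print(asset_id, sorted(shared))
```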
Soundpools from Touchpress is a deliberately left-field application, in which a mobile app draws audio clips from data in RES to create an immersive audio experience that fires a student’s imagination …
So, how are we doing this?
A critical mass of linked open semantic data is necessary before the RES platform can really demonstrate its true power
We are working with archive collections across the UK to help them publish Linked Open Data describing their collections (including digital assets, where they exist). Although many collections are already publishing LOD or plan to, the RES project partners will be providing tools and advice to collection-holders in order to assist them throughout the lifetime of the project.
In order for data to be RES compliant there needs to be a digitised asset.
But we don’t care about the format the asset has been digitised in, as it will always be served from the collection holder’s servers,
So we don’t care where it’s stored
Neither do we care how it’s licensed – free, subscription or pay per view – it’s not our business
The RES platform will not directly consume or publish digital media (audio, video, images, documents) itself. It will only index data about digital media which has been published in a form which can be used consistently by RES applications.
Each collection holder must take responsibility for writing and maintaining good quality data about their assets,
But they need to do that anyway? Right?
They also need to assign usage rights in machine-readable terms
But they need to do that anyway? Right?
Then they need to publish it as Linked Open Data on a publicly accessible server
explicitly and machine-readably online, using Linked Open Data principles
The data about the representation must include a rights information triple referring to the well-known URI of a supported license.
The data describing digital assets must be made available under the terms of a supported license and include explicit licensing data in order for it to be indexed by the Research & Education Space and be usable by applications. Our approach is aligned with the Open Data Institute’s guide to publishing machine-readable rights data, and with the work of the Copyright Hub.
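A check of this kind might look like the sketch below: look for a rights triple on the description, then test whether it points at a licence URI on an accepted list. The predicate is the Dublin Core `license` term; the choice of that predicate, the two Creative Commons URIs on the list and the `example.org` documents are assumptions for illustration, not the project's actual list of supported licences.

```python
DCT_LICENSE = "http://purl.org/dc/terms/license"

# Illustrative list only; the real set of supported licences may differ.
SUPPORTED_LICENCES = {
    "https://creativecommons.org/licenses/by/4.0/",
    "https://creativecommons.org/publicdomain/zero/1.0/",
}


def licence_of(subject, triples):
    """Return the licence URI stated for the subject, or None if there is none."""
    for s, p, o in triples:
        if s == subject and p == DCT_LICENSE:
            return o
    return None


def is_indexable(subject, triples):
    """A description qualifies only if it states a licence explicitly
    and that licence is on the supported list."""
    return licence_of(subject, triples) in SUPPORTED_LICENCES


# Hypothetical descriptions: one licensed explicitly, one with no rights triple,
# one pointing at a licence page that is not on the list.
triples = [
    ("http://example.org/doc/1", DCT_LICENSE, "https://creativecommons.org/licenses/by/4.0/"),
    ("http://example.org/doc/2", "http://purl.org/dc/terms/title", "No rights stated"),
    ("http://example.org/doc/3", DCT_LICENSE, "http://example.org/terms/all-rights-reserved"),
]

for doc in ("http://example.org/doc/1", "http://example.org/doc/2", "http://example.org/doc/3"):
    print(doc, "indexable:", is_indexable(doc, triples))
```

Because the licence is a well-known URI rather than free text, the test is a simple set lookup with no need to interpret wording.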
We only keep a thin layer of assertions and links. That’s all
Where?
Well, because it’s Linked Open Data published under a permissive licence, with the content licensed explicitly, you can use it pretty much anywhere you like, how you like.
We’ll be using it in RES to transform access to content, data, information for children in schools across the UK …
But as RES will enable frictionless sharing … there is no reason why the use of our technology should be confined to education projects
So we’re opening up the opportunities for incredible collaborations between cultural organisations …
And of course the internet knows no boundaries
When RES is up and running … if you’re the curator of an exhibition at a cultural institute in the UK, you may worry about borrowing physical objects from other institutions, but we’re providing the technology to make culture jams, object mashups and seamless sharing child’s play.
Because RES is open source
It is possible to implement a distributed architecture
Please take it away and use it.
It’s another British gift to the world.
But better than cricket, or sandwiches,