This document discusses the challenges of representing cultural heritage digitally in an integrated and contextualized manner. It argues that current digital representations are still fragmented and disruptive because they rely on rigid classifications rather than conceptual models that capture relationships. The paper advocates learning from past practices, such as Wunderkammer collections, that integrated diverse objects conceptually. A conceptual reference model is proposed to bridge divisions between collections and allow exploration of heterogeneous cultural data in meaningful ways.
Connecting Cultural Heritage Through Conceptual Modeling
1. Abstract
The last 20 years have seen a vast amount of digitisation and a large number of digital projects, which have not yet succeeded in establishing long-lasting, collaborative and integrated platforms for quality research, education and engagement.
The task of representing both the diversity of cultural resources and common, related histories from different perspectives is still a significant challenge. Despite considerable investment, the use of technology in representing cultural heritage is still in a ‘disruptive’ and fragmented phase. This paper looks at the role of ‘real world’ knowledge representation in providing a truly representative and joined-up picture of cultural heritage within and across national borders.
2. Digital Curation in
the Open World
Learning the Lessons
(of the last 500 years)
Dominic Oldman
British Museum
27th May 2015
Journées ABES
2015 Montpellier
4. Or,
The Answer to Life, the Universe and Everything
The artificial closed world
The real or open world
5. • Background: What’s the Problem?
• What is a Conceptual Reference Model?
• Our will to order
• Reuniting collections
• Exploring heterogeneous information.
• Building Knowledge
7. UCL Survey of BM
Collection Online
“The majority are seeking a known object,
and utilise discipline specific search terms,
showing goal-driven intent and a detailed
prior knowledge of the museum.”
[Scholarly Information Seeking Behaviour in the British Museum Online Collection, (2011), Terras, Ross]
8. “It also suggests that
academic browsing in a
museum environment is
somewhat problematic as
users have to be fairly linear
in their search strategies with
little satisfaction when
searching broadly or
browsing.”
[Scholarly Information Seeking Behaviour in the British Museum Online Collection, (2011), Terras, Ross]
UCL Survey of BM
Collection Online
9. “A Library platform will give
access not only to all… the
items on its shelves, the e-
items it has permission to
provide, and the items
(physical and virtual) within
its network of collaborating
institutions - but also to all
the data it can find: Data
from a curated set of reliable
institutions, including
scientific and non-profit”
David Weinberger
Issue for Libraries?
13. Square Pegs, Round Holes
Type: Image or Man-made (physical) object?
Subject: Mummy mask, or is that Object Type?
14. “he was convinced that everything he had written hitherto
consisted solely in a string of the most abysmal errors and
lies, the consequences of which were immeasurable”.
W.G. Sebald
“Meaning cannot be counted, even as it can be
counted upon, so meaning has become
marginalized in an informational culture.”
Mark Pesce
Quantity and Quality
But
15. “…the information explosion, far
from serving the needs of the
burgeoning knowledge economy,
intensifies the need for quality
information and expertise that
libraries and librarians provide”.
Beyond the Book - Schnapp & Battles
• Public Library Visitor figures down -
particularly digital visitors.
• Academic library enquiries down.
Libraries
16. Museums
“to claim some anchoring space in
a world of puzzling and often
threatening heterogeneity, non-
synchronicity, and information
overload”
Andreas Huyssen
• Visitor figures up - a beneficiary of current
digital disruption?
17. How do we achieve digital stability?
How do we achieve a contextual, semantic, cross-disciplinary OPAC?
2. What is a Conceptual Reference Model?
18. Animal, Vegetable, Mineral…
A variant of Twenty Questions, derived from the Linnaean taxonomy of the natural world, with origins in prehistory
19. “[It] … contained promiscuously, rocks of unusual shape, coins, stuffed animals, manuscript volumes, ostrich eggs, and unicorn horns. Statues and paintings stood side by side with curios and exemplars of natural history in these cabinets of wonder when people started collecting art objects.”
Giorgio Agamben
The Wunderkammer (Cabinet of Curiosity)
Ferrante Imperato's Dell'Historia Naturale (Naples, 1599, 16th C); Museum Wormianum, Ole Worm's cabinet of curiosities (17th C)
Augsburg cabinet – 17th C
20. “Only seemingly does chaos reign in the Wunderkammer, however: to the mind of the medieval scholar, it was the sort of microcosm that reproduced, in its harmonious confusion, the animal, vegetable, and mineral macrocosm. This is why the individual objects seem to find their meaning only side by side with others, between the walls of a room in which the scholar could measure at every moment the boundaries of the universe!”
Giorgio Agamben - The Man Without Content
21. The Microcosms: The Cabinets of Curiosity, The Wunderkammers
The Macrocosm: The Conceptual Reference Model
Wunderkammer CRM: Animal Entity, Vegetable Entity, Mineral Entity
CIDOC CRM: Biological Object, Physical Thing, Physical Man-made Thing, Physical Object, Man-Made Object, Conceptual Object
25. Question
“Is it an accident that the
library, natural history
specimens, sculptures and
antiquities were part of the
same institution?”
Answer:
“I think it is”
Edmund Oldfield
Assistant Keeper of Antiquities at
the British Museum in 1857.
Antonio Panizzi
British Museum’s Principal Librarian
Asserted a fundamental distinction between Christian
art and “heathen antiquities”
26. Natural History Museum - 1881
National Gallery - 1824
British Museum - 1753
British Library - 1973
Divisions & Re-classifications
• Specialisations of expertise
• New classification systems
• The unity of things is forgotten.
27. “We study the order of things, but we cannot grasp their innermost
essence. And because it is so, it befits our philosophy to be writ small”
W.G. Sebald on Thomas Browne
“For if you look at them you will not see
something that is common to all, but
similarities, relationships, and a whole series of
them at that.”
Ludwig Wittgenstein
“don't think, but look!”
Ars Contextualis
31. Different Levels of Knowledge (CIDOC CRM)
Thing: everything consisting of or carried by matter
  Man-Made Thing: + must exist due to human intention (this erases "natural")
    Physical Man-Made Thing: + must consist of matter
      Man-Made Object: + must be physically separate
      Man-Made Feature: + must be physically embedded
From general (but precisely defined) to less general (and precisely defined)
Other classes shown: Legal Object, Physical Thing, Physical Object, Biological Object, Person, Conceptual Object, Propositional Object, Information Object
37. “Their gaze is directed just
past it to focus on the
open anatomical atlas in
which the appalling
physical facts are reduced
to a diagram, a schematic
plan of the human being”
W.G. Sebald
“everything in the world
exists to give rise to a book”
Stéphane Mallarmé
41. Museums & Archives – Building a Picture
• Earliest site for human habitation in
Britain.
• Roman settlement.
• Settled by the Danes in the 9th century
after killing the King of the East Angles.
• Many important naval battles fought off the coast, including the Battle of Lowestoft.
• Important fishing town since middle ages.
• Victorian Holiday Spa Town
56. Lowestoft & Scotland
‘Mary MacDonald from Point, Lewis
was a herring girl in the 1930s. She
said “I saw a lot of the world. I went to
Lerwick, Stonsay, Lochmaddy,
Yarmouth, Lowestoft, Peterhead and
Fraserburgh.”’
http://wovencommunities.org/collection/the-herring-industry/
62. Using a Conceptual Reference Model
Joseph Conrad (Actor) was present at Sailing of the SS Tuscania (Event)
  Sailing of the SS Tuscania has time span (Time-Span): beginning 21 April 1923, ending 1 May 1923
SS Tuscania (Man-Made Thing) was produced by Shipbuilding (Production Event)
  Shipbuilding has time span (Time-Span): some time within 1921 - 1922
Portrait of Joseph Conrad (Man-Made Thing) was produced by Drawing (Production Event)
  Drawing has time span (Time-Span): some time within April 1923 – May 1923
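The statements on slide 62 amount to a small graph of (subject, property, object) triples. A minimal sketch in plain Python, assuming a simple list of tuples rather than any particular RDF library; the identifiers are hypothetical, property labels follow the slide, and the CIDOC CRM codes in the comments are given for orientation only and should be checked against the current CRM release.

```python
# Hypothetical encoding of slide 62 as (subject, property, object) triples.
# Identifiers are illustrative, not real URIs.
TRIPLES = [
    ("conrad", "was present at", "sailing_tuscania"),     # CRM P12i (check release)
    ("sailing_tuscania", "has time span", "ts_sailing"),  # CRM P4 (check release)
    ("ts_sailing", "beginning", "1923-04-21"),
    ("ts_sailing", "ending", "1923-05-01"),
    ("ss_tuscania", "was produced by", "shipbuilding"),   # CRM P108i (check release)
    ("shipbuilding", "has time span", "ts_shipbuilding"),
    ("ts_shipbuilding", "some time within", "1921/1922"),
    ("portrait_conrad", "was produced by", "drawing"),
    ("drawing", "has time span", "ts_drawing"),
    ("ts_drawing", "some time within", "1923-04/1923-05"),
]

# Entity types as labelled on the slide.
TYPES = {
    "conrad": "Actor",
    "sailing_tuscania": "Event",
    "ss_tuscania": "Man-Made Thing",
    "shipbuilding": "Production Event",
    "portrait_conrad": "Man-Made Thing",
    "drawing": "Production Event",
}


def objects(subject, prop):
    """Return every object linked to `subject` by `prop`."""
    return [o for s, p, o in TRIPLES if s == subject and p == prop]


def of_type(entity_type):
    """Return the identifiers typed with `entity_type`, sorted."""
    return sorted(e for e, t in TYPES.items() if t == entity_type)


def production_time_spans():
    """For each produced thing, follow 'was produced by' then 'has time span'."""
    result = {}
    for thing, prop, event in TRIPLES:
        if prop == "was produced by":
            for ts in objects(event, "has time span"):
                result[thing] = objects(ts, "some time within")
    return result
```

Because the ship and the portrait are both reached through a production event, one query pattern answers "when was this made?" for both, whatever kind of thing each is.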
63. No Dead Ends!
Etching (Man-made Thing) was produced by A Production Event
A Production Event took place at Workshop of Muirhead Bone (Petersfield, Hampshire) (Place)
64. The Point – at Last!
1. This took ages – even with computers!
2. It should be instantaneous. The computer should
already have found these relationships for me!
3. We need a conceptual reference model to mediate
between these resources.
65. Summary
A common reference for the
purpose of agreeing what we are
talking about even if we disagree
about the essence or nature of
these things.
67. Balancing Recall and Precision
“Google can bring you back
100,000 answers, a librarian
can bring you back the right
one.”
Neil Gaiman
68. Serious Consequences…
“The heavy reliance on keyword search in e-discovery places an
enormous burden on today’s legal teams. Inconsistencies in
language, inefficiencies in search techniques and software user
interfaces, which conceal more than reveal, place the attorney in a
difficult position”.
Legal Profession
• More and More data.
• Variable vocabularies.
• Inadequate search tools.
• Poor balance of recall and precision.
• Serious consequences for legal cases.
77. • Digital representation – Should not be a
surrogate of reality
• Rather a platform for the externalisation
of argument.
Digital Argument Requires Context
Diagram labels: Observation; Belief; Proposition; Belief Value; Belief Adoption; Inference Making; linked by “concluded that”, “that”, “hold to be”, “adopted” and “used as a premise”. Annotation: “This isn’t a belief of the original organisations.”
78. Thinking About Data?
• Closed World
• Flat properties
• Tables and Fields
Poor Reuse Value
• Open World
• Levels of Knowledge
• Events & Activities
High Reuse Value
79. Core Fixed Models Misrepresent Information
“It's still essentially impossible to bring data from
existing museum automation systems into a common
view…
Increasingly it seems that we should have concerned
ourselves with the relationships…between the objects.”
(David Bearman 1995)
80. The Intersection of the Digital Humanities
we are “still in an era of confusion
in the digital environment around
the sort of business models we
apply to questions of impact,
depth of data and so on. We may
not yet have the tools to
understand how we make those
decisions about the depth of data.”
Andrew Prescott
Editor's Notes
The dog is on the first slide because it stands for a form of unsystematic research. The more information we have, the more we need to find different routes through it. A dog doesn't sniff something out systematically, going meticulously over the ground inch by inch, but darts from one lead to another until it has found its quarry. This is closer to how we need to deal with bigger data repositories.
This presentation is loosely based on an article that I wrote with colleagues from the CRM Special Interest Group, but in that article we were talking about digital representation issues of the last 20 years, particularly around the creation of digital libraries. In this presentation I want to start looking at lessons from the last 500 years (or perhaps more) to try to understand how we manage and integrate increasing amounts of diverse and open information.
I want to make a distinction between internal institutional terms of reference and those that have wider or more universal application in the Open World.
We all have information systems that were designed to fulfil the requirements of our own organisations, areas of work and operations. We have heterogeneous datasets shaped by our mission, disciplines and locality - and are valuable because of that. They use unique and customised models and vocabularies that mean most to the institutions that manage and use them. In most cases these standards and the technology they use were never designed for open consumption and needed mediation, often using hand-crafted Web sites. They only make sense in conjunction with the additional knowledge of the institution.
Therefore, when we publish information to the open world, we need a mechanism by which this implicit knowledge can be made understandable to all. It needs to operate using a real-world model, not an artificial one.
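One concrete difference between an internal system and the open world is how a missing fact is read. A minimal sketch, assuming a hypothetical set of facts held by one institution: under a closed-world reading, anything not recorded is false; under an open-world reading, it is only unknown, because another source may record it. The identifiers and function names are illustrative.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical facts recorded by one institution (illustrative identifiers).
FACTS = {
    ("portrait_conrad", "depicts", "joseph_conrad"),
    ("portrait_conrad", "was produced by", "drawing_1923"),
}

# Facts this source explicitly records as false; empty in this sketch.
NEGATED = set()


def ask_closed_world(fact):
    """Closed world: anything not in the database is treated as false."""
    return Answer.TRUE if fact in FACTS else Answer.FALSE


def ask_open_world(fact):
    """Open world: a missing fact is unknown, since another source may hold it."""
    if fact in FACTS:
        return Answer.TRUE
    if fact in NEGATED:
        return Answer.FALSE
    return Answer.UNKNOWN
```

Asked whether Conrad was present at the sailing of the Tuscania, which this source never recorded, the closed-world function answers `FALSE` and the open-world function answers `UNKNOWN`, leaving room for another institution's data to supply the answer.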
The main problem we are trying to solve is how we can bring together very different but related information from different knowledge organisations and integrate it without losing any of the original meaning, context and language.
The key to doing this is not technology but a system of knowledge representation that is independent of technology and which describes things using universal concepts grounded in reality.
This requires a different mindset, and an approach to representing digital information that is very different from the way we normally order information.
Finally we will look at the benefits this provides in terms of exploring information across institutional boundaries and in building and layering new knowledge.
What I hope to do is define what the problem is and show how we can use knowledge representation to solve it.
When we present our data on the Web we find that it has limited usefulness for people who are not familiar with our internal conventions. It's not as useful for wider and different audiences who live and work outside our institutional worlds.
A University College London survey (Terras and Ross, 2011) found that people using the BM's Collection Online system were better able to find the object record they wanted if they already had an understanding of the data and the terminology used. The implication is that the system is less accessible, and therefore less used, by other groups who are interested in the information but are not well served by the way the data is represented.
Additionally, because it generally provides what amounts to an electronic reference card, it doesn't readily support non-linear exploration. That wasn't the objective of the original catalogue systems, which have simply been transplanted onto the Web.
Again, the UCL survey makes the point that other groups interested in broader exploration of the dataset, such as academic researchers, find that the system doesn't fit normal research methods. The model is one designed for museum documentation, not for external audiences. Just because we publish openly to the Web doesn't necessarily make the data open in this other sense.
This problem is magnified when we start to bring different types of system together. The challenge is to provide information systems that allow us to use computers to reason across these datasets without homogenising the data. The future of the library system is to point users at a whole range of resources that might have related information and which is sourced from outside the library. When we aggregate data it still sits in knowledge silos.
How, for example, can the library OPAC evolve into a system that not only allows searching and exploration across different resources but also integrates those resources, so that other systems are not simply lumped together to be searched separately, and a single search can find related things regardless of the source classification systems?
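One way to read this requirement: keep each source's own fields, but map each local field to a shared relationship, so that a single query can reach every source. A minimal sketch; the two source schemas, the field names and the shared relationship labels are all hypothetical, chosen only to illustrate the pattern.

```python
# Two hypothetical sources describing related material with different local fields.
LIBRARY = [{"id": "lib:1", "title": "The Mirror of the Sea", "author": "Joseph Conrad"}]
MUSEUM = [{"id": "bm:1", "object_name": "drawing", "sitter": "Joseph Conrad",
           "maker": "Muirhead Bone"}]

# Per-source mappings from a local field to a shared relationship (hypothetical labels).
MAPPINGS = {
    "library": {"author": "created by"},
    "museum": {"maker": "created by", "sitter": "depicts"},
}


def to_statements(source, records):
    """Produce (subject, relationship, value, source, local_field) rows.

    The local field name travels with each row, so the original
    meaning is not lost when sources are combined.
    """
    rows = []
    for record in records:
        for field, relationship in MAPPINGS[source].items():
            if field in record:
                rows.append((record["id"], relationship, record[field], source, field))
    return rows


STATEMENTS = to_statements("library", LIBRARY) + to_statements("museum", MUSEUM)


def find_related(name):
    """One search across both sources: records linked to `name` by any relationship."""
    return sorted({(subj, rel) for subj, rel, value, _, _ in STATEMENTS if value == name})
```

The search runs over the shared relationships, while every row still records which local field it came from; nothing is homogenised away.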
When we don't attend to these problems, even data that relates to the same subject matter becomes almost impossible to harmonise.
These examples from the digital library system Europeana (comparing the Europeana record with the original record) show that inconsistent and overly generalised approaches to museum records result in misrepresentations of data. There is no agreement about what the source fields mean, and therefore data mapping is applied inconsistently. Fields are misrepresented in an overly generalised data model, making integration limited and unsuitable for quality research.
In this example the record type is incorrect. Have they used the department name as the subject? Is the subject matter of this object archaeology? In addition, this is an object record, not an image record as specified.
Here the object type is overly specialised, again making integration very unlikely. Is the subject really the location? Isn't this the location where the object was found or originally came from? What do these different identifiers mean and how are they used?
Again, the record type is not an image, and the object name or type is substituted for the subject.
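The mapping problem in these examples can be illustrated with a single hypothetical record. A flat, generic target model has only a few slots, so several distinct source fields end up in "subject"; an event-based mapping keeps the find-spot as the place of a find event and the department as the keeper. Field names, values and target labels below are assumptions for illustration, not the actual British Museum or Europeana schemas.

```python
# A hypothetical source record (field names and values are illustrative).
SOURCE = {
    "record_type": "object",
    "object_type": "mummy mask",
    "department": "Egypt and Sudan",
    "find_spot": "Thebes",
}


def flat_mapping(record):
    """A generic mapping: every descriptive field is pushed into 'subject'."""
    return {
        "type": "image",  # hard-coded default, so the object record is mislabelled
        "subject": [record["object_type"], record["department"], record["find_spot"]],
    }


def event_based_mapping(record):
    """A mapping that keeps each field's meaning as an explicit relationship."""
    obj = "obj:1"
    return [
        (obj, "is a", "Man-Made Object"),
        (obj, "has type", record["object_type"]),
        (obj, "has current keeper", record["department"]),
        ("find:1", "found", obj),  # a hypothetical find event
        ("find:1", "took place at", record["find_spot"]),
    ]
```

In the flat output, "Thebes" cannot be told apart from the object type or the department; in the event-based output, a question such as "which objects were found at Thebes?" can be answered without guessing what the field meant.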
The preoccupation with quantity rather than quality seems to show a lack of regard for meaning and for representing cultural objects faithfully. The French writer Gustave Flaubert slaved over every word with a perfectionism that showed how concerned he was with conveying the right meaning. In a world with huge amounts of information, where quantity trumps quality, meaning seems to have been pushed out. Data doesn't seem to attract the same attention to detail and is considered an inferior type of content by the people who curate it and build infrastructure for it.
The idea behind the game of Animal, Vegetable, Mineral is that you use a common reference to help deduce the thing you are trying to discover. This has its origins in the Linnaean system and, in particular, in the original ‘museums’, the Wunderkammers of the 16th and 17th centuries.
These are all Wunderkammers, or Cabinets of Curiosity.
On the left is Ole Worm's Museum Wormianum. Worm was a Danish philosopher and collector in the 17th century. His Wunderkammer is a representation of nature containing natural and artificial objects (which are nevertheless reflections of nature). Artificial objects are still animal, vegetable or mineral. There is an underlying knowledge system, or ontology of the day, that allows resemblances (for example patterns) and meaning (common concepts) to be identified between these objects. In this way a unity or harmony is found across the chaos of nature.
The Augsburg cabinet (displayed at the Getty Institute) on the right, although different in appearance, works on the same principles.
These cabinets are part of the same network of meanings. They are compatible within the conceptual reference that governs them.
The quote also underlines the reference to Animal, Mineral and Vegetable. The confusion of the Wunderkammer is resolved by applying a reference model, a macro mask laid over the microcosm of the cabinet.
The CIDOC CRM Conceptual Reference Model works, if you like, with the same principles but reflects a wider and more sophisticated set of universal concepts.
The British Museum opened to the public in 1759, still in the tradition of Animal, Mineral and Vegetable. Displays mixed together different types of object in the same tradition as the cabinets, but on a completely different scale. The collection amassed by Sir Hans Sloane, unlike the Wunderkammers and cabinets of curiosity, was comprehensive and vast, and aimed at answering ever more sophisticated questions about the history of the world. People were able to travel further, trade was becoming more global, and bigger questions across cultures and time came to the fore.
During the 19th century, as scientific techniques and methodologies become more sophisticated, specialisations emerge and the sheer size of the collections puts pressure on existing institutions.
Gradually divisions are established between different types of collection. Art is distinguished from antiquity, as is natural history. This leads to a break up of the comprehensive collection and the establishment of more specialised institutions and classification systems.
The National Gallery takes on responsibility for the nation's art, while the Natural History Museum and other museums are established in Kensington. The library is finally separated from the British Museum in the 20th century.
In the modern world we have tended to concentrate on the objective categorisation of things (their essence) and on putting them into different boxes. The problem with this approach is that it fails to take into account different characteristics, perspectives and contexts across different cultures. Objectively grasping the innermost essence of something is extremely difficult and often removes meaning. It tries to fit a common and fixed model onto things that are more complex and varied.
Ludwig Wittgenstein gives the example of games in his description of family resemblances. There is no one characteristic that is common to all games.
At the very top level of the CRM, Animal, Vegetable and Mineral are put into an event-based context with people, places and temporal entities such as events. This gives a framework under which all things can be given universal semantics, so that we can start to harmonise information across the classification divisions.
The CRM, however, also operates as a hierarchy that reflects different levels of knowledge, so that not only can data sources with different classifications be integrated, but also data sources that record different levels of knowledge.
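The practical effect of the hierarchy is that a query at a general level also finds data recorded at a more specific level. A minimal sketch, assuming a hypothetical, simplified fragment of the class tree shown on slide 31 (class names follow the slide; some parents are left out, for instance Person's link to Actor), so it is not a complete or authoritative rendering of the CIDOC CRM.

```python
# Simplified, hypothetical fragment of the CRM class tree (multiple inheritance allowed).
PARENTS = {
    "Thing": [],
    "Man-Made Thing": ["Thing"],
    "Legal Object": ["Thing"],
    "Physical Thing": ["Legal Object"],
    "Physical Man-Made Thing": ["Physical Thing", "Man-Made Thing"],
    "Physical Object": ["Physical Thing"],
    "Man-Made Object": ["Physical Man-Made Thing", "Physical Object"],
    "Man-Made Feature": ["Physical Man-Made Thing"],
    "Biological Object": ["Physical Object"],
    "Person": ["Biological Object"],
    "Conceptual Object": ["Man-Made Thing"],
    "Propositional Object": ["Conceptual Object"],
    "Information Object": ["Propositional Object"],
}

# Hypothetical records, each typed as specifically as its source allows.
RECORDS = {
    "etching:1": "Man-Made Object",
    "harbour_wall": "Man-Made Feature",
    "conrad": "Person",
    "text:1": "Information Object",
}


def ancestors(cls):
    """Return `cls` and every class above it, following all parent links."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def instances_of(cls, records):
    """Records whose declared class is `cls` or any subclass of it."""
    return sorted(item for item, declared in records.items() if cls in ancestors(declared))
```

A query for "Physical Thing" returns the etching, the harbour wall and the person, even though no source used that class directly; a query for "Man-Made Thing" returns the etching, the wall and the text, but not the person.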
It uses the same semantic framework to record prints, medals, ethnographic materials, art, and classical antiquities.
But equally it records monument information, in this case differentiating between the original thing and reproductions of it, including photographic and digital media.
It can record bibliographic information and text and refer to translations, but it can then also link to the objects related to the people and places named in the text.
Books themselves have always drawn upon the evidence of archives and cultural heritage. The book by Max Sebald recounts his travels around the English county of Suffolk, the county where I was born, making observations about local history and linking these to other events in the world.
Everything in the world gives rise to a book but books by themselves do not always tell the full story.
In this painting by Rembrandt, Sebald reinterprets what is shown. He proposes that the surgeons are looking at a book in a way that dehumanises the person. Aris Kindt was a thief, and Rembrandt has the chief surgeon dissecting the hand first, something that would never actually have happened, drawing attention to the history and life of the subject, something that the surgeons themselves seem less interested in.
Thomas Browne was thought to be watching this public dissection.
The Anatomy Lesson resides in The Hague at the Mauritshuis, just across the North Sea from Lowestoft, the town where I was born. This painting, by Samuel Wale, is from the Louvre collection. The scene is just a few miles up the coast from Lowestoft, part of a coastline that has been painted by many artists. It's not surprising that there are many connections between the east coast of England and the coast of Holland.
But these connections may extend far beyond Europe, which again is not surprising given the extent of previous colonial empires.
Lowestoft itself has a reasonably rich history captured in many different books, this one from 1849, when Lowestoft flourished both as a fishing port and as a Victorian holiday resort.
It has the following claims to fame!
It was recently featured as a town of economic depression and downturn with high levels of unemployment and lack of investment.
This is a picture of a steam train from the National Railway Museum in York, showing the steam trains that brought in visitors and took fish away to the markets in London and elsewhere.
The British Museum showcases the 18th-century Lowestoft porcelain industry, an industry brought to a close by increasing costs, in a factory that was unique in paying equal wages to men and women.
The coast off Lowestoft has been the subject of many paintings and drawings, not least those of J. M. W. Turner.
At the Tate, an oil painting from some time after the closure of the porcelain factory shows a Lowestoft bowl, painted in a Dutch style.
Imagine sitting with your bucket and spade on Lowestoft beach watching 200 ships do battle in the Battle of Lowestoft in 1665. The sheer numbers of ships and people must have made for an incredible landscape captured in paintings found in many different museums and galleries.
The Metropolitan also reflects that migration of people and things from Great Britain to America. John Barker
When you look at related objects from the collection they move away from possible subject areas and concentrate on catalogue/inventory information.
These items from the National Maritime Museum provide a maritime history of the boats from Lowestoft but also the famous battle of Lowestoft.
And in Scotland we find a link between the Scottish fishing industry and Lowestoft: the Star of Scotland, a fishing boat of a new kind that allowed longer trips.
An archival picture of the boat in Scotland.
A schematic in the Aberdeen Local Authority records.
And the same boat again, renamed and moored in Lowestoft harbour after being bought for the fleet there.
This is a boat that my Father worked on as an engineer on the fleet at Lowestoft.
The links between the Scottish and English east coast fishing industries are both long and close. Again, from a small archive project, we see the migration of people from the very north of Scotland to the east coast of Suffolk, where many people from Scotland settled.
The drawing from the National Gallery of Art in Washington by Muirhead Bone, a famous engraver and artist, shows Lowestoft harbour at work.
This pub, now open in the centre of Lowestoft, suggests other links.
Joseph Conrad, having left Poland, first landed in England at Lowestoft, and learnt English in Lowestoft and from the fishermen who sailed up the coast to Scotland.
He meets Muirhead Bone and his brother, and you can see these drawings of Conrad by Bone at the British Museum.
And here is Conrad on the deck of the Tuscania with the captain, David Bone, and Muirhead Bone.
These relationships can be captured and perhaps inferred from different resources using a conceptual reference model.
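As one example of what "perhaps inferred" might mean: on slide 62 the drawing's time-span (some time within April to May 1923) overlaps the Tuscania sailing (21 April to 1 May 1923), which makes it possible, though not certain, that the portrait was drawn on the voyage. A minimal sketch, assuming each time-span is reduced to its outer bounds as an inclusive date interval; the constant and function names are hypothetical, and an overlap only flags a candidate relationship for a researcher to check.

```python
from datetime import date

# Outer bounds of each time-span on slide 62 ("some time within" = outer bounds).
SAILING = (date(1923, 4, 21), date(1923, 5, 1))
DRAWING = (date(1923, 4, 1), date(1923, 5, 31))
SHIPBUILDING = (date(1921, 1, 1), date(1922, 12, 31))


def may_overlap(a, b):
    """True if two inclusive intervals share at least one day."""
    return a[0] <= b[1] and b[0] <= a[1]


def must_fall_within(inner, outer):
    """True if interval `inner` lies entirely inside `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]
```

The overlap is evidence for a hypothesis, not a conclusion: within the same two months the drawing could equally have been made before or after the voyage, which is why the check is "may overlap" rather than "happened during".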
These links may end up going in a completely new direction. Mukul Dey, another famous engraver and a photographer of the terracotta temples at Surul, came to England, met Muirhead Bone and worked at his workshop in Hampshire.
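The chain of links in these notes, from Conrad through Muirhead Bone to Mukul Dey, is a path through a graph, and finding it can be automated. A minimal sketch using breadth-first search over a small hypothetical graph built from the statements above; node names and edge labels are simplified assumptions (for example "belonged to" for the workshop), not records from any of the institutions mentioned.

```python
from collections import deque

# Hypothetical edges drawn from the slides and notes (simplified labels).
EDGES = [
    ("Joseph Conrad", "was present at", "Sailing of the SS Tuscania"),
    ("Muirhead Bone", "was present at", "Sailing of the SS Tuscania"),
    ("Portrait of Joseph Conrad", "depicts", "Joseph Conrad"),
    ("Portrait of Joseph Conrad", "was produced by", "Drawing"),
    ("Drawing", "carried out by", "Muirhead Bone"),
    ("Etching", "was produced by", "Etching production"),
    ("Etching production", "took place at", "Workshop of Muirhead Bone"),
    ("Workshop of Muirhead Bone", "belonged to", "Muirhead Bone"),
    ("Mukul Dey", "worked at", "Workshop of Muirhead Bone"),
]


def neighbours(node):
    """Yield (neighbour, label); links are followed in both directions: no dead ends."""
    for s, p, o in EDGES:
        if s == node:
            yield o, p
        elif o == node:
            yield s, p + " (inverse)"


def shortest_path(start, goal):
    """Breadth-first search; return the nodes on a shortest path, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt, _label in neighbours(node):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None
```

`shortest_path("Joseph Conrad", "Mukul Dey")` returns the route via the Tuscania sailing, Muirhead Bone and his workshop. Following inverse links is the design choice that matters: without it, the search would stop at the workshop, since the only edge into Mukul Dey points the other way.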
See slide
Keyword search gives you high recall, but precision is low, particularly for the more detailed searching required by researchers and professionals.
People like the fact that they get 1 million hits in 0.27 seconds, but are they getting the information they want? How relevant is it to their question?
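The trade-off can be stated with the standard definitions: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that are retrieved. A minimal sketch with hypothetical result sets, purely to show the arithmetic; the numbers are invented for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two sets of item identifiers."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical: 5 records are relevant to a research question.
RELEVANT = {"r1", "r2", "r3", "r4", "r5"}
# A broad keyword search finds 4 of them, buried among 100 hits...
KEYWORD_HITS = {"r1", "r2", "r3", "r4"} | {f"noise{i}" for i in range(96)}
# ...while a narrow structured query finds 3 of them and nothing else.
STRUCTURED_HITS = {"r1", "r2", "r3"}
```

Here the keyword search scores precision 0.04 and recall 0.8, and the structured query precision 1.0 and recall 0.6: broad search finds most things but drowns them, narrow search is exact but misses records described differently.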
The Web of structured data provides more precise terminology and properties to search on, but there are problems of accessibility. We encounter this when we look at an advanced search screen. Here we can search by the precise terminologies of the British Museum, but we are not quite sure how to apply them to get the results that we need.
These issues are not confined to cultural heritage and the arts. In law, inconsistencies in language mean that keyword searching can hide important information, and this can have serious consequences, affecting people's lives and costing huge amounts of money.
Keyword searching doesn't understand context. The question "influenced by Rembrandt" means very little to it.
A similar search in a contextual system allows the user to change the context as appropriate to their research area.
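The difference can be sketched with a few hypothetical records in which the same name plays different roles. A keyword match returns every record that mentions the name; a query that names the relationship returns only records where Rembrandt plays the role the researcher cares about. Record identifiers and role labels are assumptions for illustration.

```python
# Hypothetical records: the same name appears in different roles.
RECORDS = [
    {"id": "print:1", "text": "Etching by Rembrandt",
     "roles": {"produced by": ["Rembrandt"]}},
    {"id": "print:2", "text": "Etching in the manner of Rembrandt",
     "roles": {"produced by": ["Unknown artist"], "influenced by": ["Rembrandt"]}},
    {"id": "book:1", "text": "A biography of Rembrandt",
     "roles": {"is about": ["Rembrandt"]}},
]


def keyword_search(term):
    """Case-insensitive substring match on free text: no notion of role."""
    return [r["id"] for r in RECORDS if term.lower() in r["text"].lower()]


def contextual_search(name, role):
    """Match only records where `name` fills the given role."""
    return [r["id"] for r in RECORDS if name in r["roles"].get(role, [])]
```

The keyword search returns all three records; asking for "influenced by" returns only the second print, and switching the role to "produced by" changes the context without changing the name.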
Associative searching tries to make use of structured semantic relationships, but it is the quality of these relationships, the balance between generalisation and specialisation, and their applicability to certain types of information and domain that determine how well these associative searches work. Generalised associative searches may not provide much more than broad relationships backed up by ranking algorithms. Schema.org may provide useful structured data to support one level of information retrieval, but may not be appropriate in more specialist applications.
The issue of recall and precision also affects the CIDOC Conceptual Reference Model. While searching using universal concepts, including relationships, provides better semantic support for searching, using the CRM in its raw state would prioritise precision over recall and make searching difficult and inaccessible. The solution is to combine different semantic paths into more generalised ones, increasing recall but keeping the underlying precision intact. In this search system the 80+ entities in the CRM are reduced to only six: Things, Actors, Places, Events, Periods and Concepts. Equally, the relationships within these domains of knowledge are condensed to a manageable number that still make semantic sense for serious data exploration.
We call these the Fundamental Categories and Fundamental Relationships.
This means that we can produce searches that exploit the relationships between these different entities. We may want to search for people rather than things, but we may want to use the results of that search to find things, or events, and so on.
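Collapsing many precise paths into a few broad relationships can be sketched as a lookup from each Fundamental Relationship to the set of detailed paths it stands for. The relationship names, paths and data below are hypothetical simplifications, not the definitions used by the actual search system. Searching the broad relationship raises recall, each hit keeps the precise path it came from, and a pivot reuses one search's results as the start of the next.

```python
# Hypothetical detailed statements: (thing, precise path, target).
STATEMENTS = [
    ("bowl:1", "was produced by / took place at", "Lowestoft"),
    ("coin:1", "was found by / took place at", "Lowestoft"),
    ("model:1", "has current location", "Greenwich"),
    ("drawing:1", "was produced by / carried out by", "Muirhead Bone"),
    ("drawing:1", "depicts", "Joseph Conrad"),
    ("etching:1", "was produced by / carried out by", "Muirhead Bone"),
    ("etching:1", "was produced by / took place at", "Workshop of Muirhead Bone"),
]

# Hypothetical Fundamental Relationships, each grouping several precise paths.
FUNDAMENTAL_RELATIONSHIPS = {
    "Thing associated with Place": {"was produced by / took place at",
                                    "was found by / took place at",
                                    "has current location"},
    "Thing related to Actor": {"was produced by / carried out by", "depicts"},
}


def search(fr, target):
    """Match any precise path grouped under `fr`; keep the path so precision is not lost."""
    paths = FUNDAMENTAL_RELATIONSHIPS[fr]
    return sorted((thing, path) for thing, path, value in STATEMENTS
                  if path in paths and value == target)


def pivot(fr_first, target, fr_second):
    """Use the things found by one search as the starting point of a second one."""
    things = {thing for thing, _ in search(fr_first, target)}
    second_paths = FUNDAMENTAL_RELATIONSHIPS[fr_second]
    return sorted({value for thing, path, value in STATEMENTS
                   if thing in things and path in second_paths})
```

A search for things associated with Lowestoft finds both the bowl made there and the coin found there, with each precise path still attached; pivoting from the actor Muirhead Bone to places leads to his workshop.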
Research data requires context. It needs to be transparent, reproducible, referenceable and respectful, and it must carry provenance.
Most Linked Data sources fail to meet most of these requirements for serious research.
If you don’t represent data correctly then how can you use it for serious research or education – or for that matter engagement.
The information that knowledge institutions publish is a proposition of the organisation often based on observation of the objects they describe.
Adoption of this information by aggregators is only true belief adoption if the information is represented faithfully. Otherwise it is simply an independent proposition of the aggregator, open to challenge.
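This distinction can be made checkable if every published statement carries its holder. A minimal sketch, assuming hypothetical class and field names loosely modelled on the labels in slide 77 (Proposition, Belief, Belief Value, Belief Adoption); it is an illustration of the idea, not an implementation of any published argumentation model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Proposition:
    subject: str
    relationship: str
    value: str


@dataclass(frozen=True)
class Belief:
    holder: str                         # who holds the proposition to be true
    proposition: Proposition
    value: str = "true"                 # the belief value
    adopted_from: Optional[str] = None  # original holder, if this belief was adopted


def republish(source_belief, aggregator, published):
    """Return the belief an aggregator holds about what it publishes."""
    if published == source_belief.proposition:
        # Unchanged proposition: a genuine belief adoption, with provenance kept.
        return Belief(aggregator, published, source_belief.value,
                      adopted_from=source_belief.holder)
    # Altered proposition (e.g. after a lossy mapping): the aggregator's own claim.
    return Belief(aggregator, published)
```

If an aggregator republishes "found at Thebes" unchanged, the result records the museum as the source of the belief; if a lossy mapping turns it into "subject: Thebes", the result is a new proposition held by the aggregator alone, with no claim on the museum's authority.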
We need to rethink what it means to digitise and publish information. We need to think about the requirements of open world and describing data according to open and universal concepts.
We need to move away from core fixed models that misrepresent data and concentrate instead on the relationships between things, which do not rely on variable taxonomies.
The current state of digital humanities……
We need to change our relationship with machines by teaching them to work with models that comply with our understanding of the work, rather than a database view of it. We need to ensure that we are not slaves to technology and technologists.