Co-funded by the European Union
Modelling of Archival User Needs
Steffen Hennicke
Berlin School of Library and Information Science
Humboldt-Universität zu Berlin
27.06.2013
APEX, Dublin
Modelling of Archival User Needs, APEX 2013, 27.06.2013
Research Context
• The thesis investigates the hypothesis that
● there are sets of typical patterns in user enquiries to archives, and
● these patterns can be formally represented in an ontological model.
• It addresses prevailing research issues:
● What are the information needs of users of archives?
● What questions do users pose to archives?
● Answers to these questions are a cornerstone for designing better access and discovery systems for the huge (but largely dormant) information potential of archives and other historical knowledge bases.
Research Interest
• Provide empirical insight into the nature of written free-text user enquiries to archives.
• Investigate how common patterns of enquiries can reasonably be represented in an ontological model in order to produce adequate answers for the user.
• Main research question: “Is there a hypothetical ontological model which can represent user enquiries and their probable interpretations as formal queries against a model of the archival target world that would adequately answer the enquiry or its implicit purpose?”
Research Data
• Federal Archives of Germany (Bundesarchiv)
• Written “reference questions” contain largely unfiltered information needs.
• 60 user files with 546 individual questions
– 260 explicit or implicit “resource discovery” questions
– 70 “factual” questions
– 216 “other” questions
• In total, 330 questions (“resource discovery” and “factual”) are currently being examined for common patterns.
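The figures on this slide add up as follows. This is a minimal Python tally that uses only the counts given above. The dictionary and the percentage shares are added here for illustration and are not part of the original analysis.

```python
# Question counts from the Bundesarchiv sample (figures taken from the slide)
COUNTS = {
    "resource discovery": 260,
    "factual": 70,
    "other": 216,
}

total = sum(COUNTS.values())
# Only "resource discovery" and "factual" questions are examined for patterns
analysed = COUNTS["resource discovery"] + COUNTS["factual"]

# Share of each category in the whole sample, in percent (one decimal place)
shares = {name: round(100 * n / total, 1) for name, n in COUNTS.items()}

print(total, analysed)  # 546 330
print(shares)           # {'resource discovery': 47.6, 'factual': 12.8, 'other': 39.6}
```

Roughly 60 percent of the sample (330 of 546 questions) therefore falls into the two categories that are examined for patterns.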
Methodological Approach
• Wendy Duff and Catherine Johnson (2001)
– Analysis of E-mail Reference Questions
– Type of Questions
– Type of Wanted and Given Information
• 2-Step-Interpretation of Questions
– Wanted Information: Explicit user perspective
– Context of Information: Relations between information entities
– Aim: To find common patterns in questions
• Ontology Engineering: Generalisation and formalisation of common patterns
• Ontological model: CIDOC CRM
– Empirically based Conceptual Reference Model (CRM) for the cultural heritage domain
– Conceptualizes information and history around the notion of events
Example
• Context: A source I would like to see are the police and surveillance reports from the Weimar Republic which are about revolutionary movements. I would like to know what the surveillance agency of the Reich (or those of the Länder) had to say about [person name].
• Question 1: Do you know if the Bundesarchiv holds such documents?
• Question 2: Which agency of the Reich was responsible for the surveillance of revolutionary movements? The Reich or the Länder?
Example – Wanted Information
• First Interpretation: What are probable and adequate answers to the question?
• Question 1 (material-finding): “pointers” to documents…
– …about [person name]
– …about “revolutionary movements”
• Question 2 (fact-finding): name of a legal body
• Given Contextual Information
– name of a specific actor (“[person name]”)
– type of a group (“revolutionary movements”)
– type of a legal body (“surveillance agency of the Reich”)
– type of documents (“police and surveillance reports”)
– name of a period (“Weimar Republic”)
Example – CIDOC CRM
• Second Interpretation: How can the interpretation of the question be translated into CIDOC CRM?
• CIDOC CRM: Historical entities are connected through events.
• Question 1: The documents in question are the result of a “surveillance activity” targeted at a specific type of group.
• Question 2: The legal body whose name is in question was involved in a “surveillance activity” which was targeted at a specific type of group and a specific actor.
• The common denominator is a “documentation activity”, the most general abstraction of a “surveillance activity”.
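The two interpretations can be sketched as basic graph patterns over an event-centred graph. The following minimal Python sketch uses a toy graph. Property names loosely follow CIDOC CRM (P2 has type, P10 falls within, P14 carried out by, P94 has created). The property `targeted_at`, all identifiers, and the naive matcher are hypothetical stand-ins for a real triple store and SPARQL. Both questions reuse the same surveillance-activity pattern and differ only in which node they ask for.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy graph for the example enquiry. "targeted_at" and all identifiers are
# hypothetical; P94's domain in the CRM is E65 Creation, hence the rdf:type triple.
GRAPH: List[Triple] = [
    ("act:s1", "rdf:type", "crm:E65_Creation"),
    ("act:s1", "P2_has_type", "type:surveillance_activity"),
    ("act:s1", "P10_falls_within", "period:weimar_republic"),
    ("act:s1", "P14_carried_out_by", "actor:agency_x"),
    ("act:s1", "targeted_at", "group:g1"),
    ("act:s1", "targeted_at", "actor:person_name"),
    ("group:g1", "P2_has_type", "type:revolutionary_movement"),
    ("act:s1", "P94_has_created", "doc:report1"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are query variables."""
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern with one triple; return the extended binding or None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if extended.get(p, t) != t:  # variable already bound to something else
                return None
            extended[p] = t
        elif p != t:  # constant term does not match
            return None
    return extended


def query(patterns: List[Triple], graph: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Naive backtracking join over a list of triple patterns."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from query(patterns[1:], graph, extended)


# Shared pattern: a surveillance activity in the Weimar Republic
# targeted at a group of type "revolutionary movement".
SHARED: List[Triple] = [
    ("?act", "P2_has_type", "type:surveillance_activity"),
    ("?act", "P10_falls_within", "period:weimar_republic"),
    ("?act", "targeted_at", "?group"),
    ("?group", "P2_has_type", "type:revolutionary_movement"),
]

# Question 1 (material-finding): which documents did such an activity create?
Q1 = SHARED + [("?act", "P94_has_created", "?doc")]

# Question 2 (fact-finding): which body carried out such an activity on [person name]?
Q2 = SHARED + [
    ("?act", "targeted_at", "actor:person_name"),
    ("?act", "P14_carried_out_by", "?agency"),
]

print([b["?doc"] for b in query(Q1, GRAPH)])     # ['doc:report1']
print([b["?agency"] for b in query(Q2, GRAPH)])  # ['actor:agency_x']
```

Because `SHARED` is written once and extended per question, the sketch mirrors the observation that both questions hinge on the same activity node.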
Example – Question 1
Original Question: Do you know if the Bundesarchiv holds such documents?
Example – Question 2
Original Question: Which agency of the Reich was responsible for the surveillance of revolutionary movements?
Example – Documentation Activity
• Questions 1 and 2 and their interpretations share patterns which can be unified as a “Documentation Activity”.
• New classes and properties for the CIDOC CRM
– E7.1 Documentation Activity
– Self-Documentation
– Documentation of Others
– E29.1 Mandate
– Several properties (e.g. “follows mandate”, “has mandate”)
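The new class helps retrieval because a query for the general class also finds its specialisations. This minimal sketch assumes the following placements, which are read off the class numbering and are not stated on the slide: E7.1 Documentation Activity specialises E7 Activity, Self-Documentation and Documentation of Others specialise E7.1, E29.1 Mandate specialises E29 Design or Procedure, and a surveillance activity counts as Documentation of Others. The properties ("follows mandate", "has mandate") are left out because their domains and ranges are not given here. All instance identifiers are hypothetical.

```python
from typing import Dict, List, Set

# Direct superclass of each class (assumed placement of the proposed extensions;
# E7 -> E5 and E29 -> E73 follow the CIDOC CRM hierarchy)
SUPERCLASS: Dict[str, str] = {
    "E7_Activity": "E5_Event",
    "E7.1_Documentation_Activity": "E7_Activity",
    "Self_Documentation": "E7.1_Documentation_Activity",
    "Documentation_of_Others": "E7.1_Documentation_Activity",
    "E29_Design_or_Procedure": "E73_Information_Object",
    "E29.1_Mandate": "E29_Design_or_Procedure",
}

# Hypothetical instances and their most specific class
INSTANCES: Dict[str, str] = {
    "act:surveillance1": "Documentation_of_Others",  # assumed: surveillance documents others
    "act:diary1": "Self_Documentation",
    "act:excavation1": "E7_Activity",                # an activity that documents nothing
    "mandate:m1": "E29.1_Mandate",
}


def ancestors(cls: str) -> Set[str]:
    """All transitive superclasses of cls."""
    found: Set[str] = set()
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        found.add(cls)
    return found


def instances_of(target: str) -> List[str]:
    """Instances whose class is target or one of its subclasses, sorted by id."""
    return sorted(
        inst for inst, cls in INSTANCES.items()
        if cls == target or target in ancestors(cls)
    )


print(instances_of("E7.1_Documentation_Activity"))  # ['act:diary1', 'act:surveillance1']
print(instances_of("E29_Design_or_Procedure"))      # ['mandate:m1']
```

A query phrased against "Documentation Activity" thus covers both surveillance reports and self-documentation, while plain activities that produced no documentation stay out of the result.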
Value and Potential
• Formal model of (archival) user needs.
– Strong conceptual and empirical reference for the design of (archival) information systems.
– Design and extension of (archival) metadata schemas.
• Method to formalise (archival) user needs.
– 2-Step-Interpretation of user questions.
• Unobtrusive access layer to archival holdings.
– Does not interfere with archival documentation principles.
– May, however, affect documentation practices.
• Extensible context layer for archival holdings.
– Connects to information outside the archive.
Feasibility
• CIDOC CRM representations exist for RDF(S)/OWL.
• Case study with EAD-encoded finding aids.
• The Documentation Activity makes explicit the (historical) context in which documents were created.
• (Semi-)automatic information extraction (NLP).
• Proactive documentation for high-value collections.
• Community-based contributions.
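The EAD case study can be illustrated with a minimal sketch that derives a Documentation Activity from the creator (`<origination>`) recorded in an EAD `<did>`. It uses Python's standard `xml.etree.ElementTree`. The finding-aid fragment is simplified, namespace-free, and uses placeholder content. The mapping (creator to P14 carried out by, plus the hypothetical properties `has_created`, `has_title` and `has_date`) is an assumption for illustration and not the case study's actual mapping.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical, simplified EAD 2002 fragment (no namespace, placeholder content)
EAD = """
<ead>
  <archdesc level="fonds">
    <dsc>
      <c01 level="file">
        <did>
          <unitid>EX-001</unitid>
          <unittitle>Reports on revolutionary movements (placeholder)</unittitle>
          <unitdate>1919-1933</unitdate>
          <origination><corpname>Surveillance agency (placeholder)</corpname></origination>
        </did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def documentation_activities(ead_xml: str) -> List[Triple]:
    """Derive one documentation activity per <did> that names a corporate creator."""
    root = ET.fromstring(ead_xml.strip())
    triples: List[Triple] = []
    for i, did in enumerate(root.iter("did"), start=1):
        creator = did.findtext("origination/corpname")
        if not creator:
            continue  # no creator recorded: creation context unknown
        unit = did.findtext("unitid") or str(i)  # fall back to position if no unitid
        act, doc = f"act:{unit}", f"doc:{unit}"
        triples.append((act, "rdf:type", "E7.1_Documentation_Activity"))
        triples.append((act, "P14_carried_out_by", creator.strip()))
        triples.append((act, "has_created", doc))  # hypothetical property
        title = did.findtext("unittitle")
        if title:
            triples.append((doc, "has_title", title.strip()))  # hypothetical property
        date = did.findtext("unitdate")
        if date:
            triples.append((act, "has_date", date.strip()))  # hypothetical property
    return triples


for triple in documentation_activities(EAD):
    print(triple)
```

The sketch only reads elements that EAD already provides, so it adds the activity layer on top of the finding aid without changing the archival description itself.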