Experts Workshop on Controlled Vocabularies
Mainz 10-11/10/2013

Giovanni Colavizza

Topic Introduction
Controlled vocabularies and humanities, a problematic relationship.
The functional categorization of historical place types and the problems it raises.

Giovanni Colavizza
Leibniz Institute of European History
Colavizza@ieg-mainz.de

The scenario
Controlled vocabulary: a selected list of terms, which refer to concepts, used for
categorization. Criteria of concept selection are usually domain specific.
Focus for this talk: vocabularies of concepts, not proper names.
The term-concept relation is often not specified: it relies on an intended (?) use of
natural language, which is context and interpretation specific.
But there goes language independence!
Source: Dalia Varanka, A topographic feature taxonomy for a US national topographic
mapping ontology, 2009.

The problem
Quantitative and computer-based methods scale up our responsibilities together
with our means.

[Diagram: the data and metadata loop, with the stages Retrieve, Extend, Share and Reuse.]

Stricter requirements: classification systems must be shared, to some extent.
The shared part must be formally specified (machine-readable). The term-concept bond has to become explicit.

New design requirements
• Allow for comparison beyond a single project (data integration)
• Interoperability and portability
• Scalability
• More accurate retrieval
• Automatic classification
• Named entity recognition
• Reasoning...
One possible solution: integrate a stricter knowledge model on top of
controlled vocabularies. Express it via ontologies: simplified specifications of
(shared!) conceptualizations.
Already possible! ISO 25964 (data model), SKOS (web format).
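
To make the idea of an explicit term-concept bond concrete, here is a minimal sketch of how one concept with multilingual labels could be written out using SKOS properties. The `Concept` class, the `example.org` URIs and the "market" labels are hypothetical and chosen only for illustration; only the SKOS namespace and the property names (`skos:Concept`, `skos:prefLabel`, `skos:altLabel`, `skos:broader`) come from the standard.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Official SKOS namespace; everything else below is illustrative.
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


@dataclass
class Concept:
    """A concept identified by a URI, with natural-language labels attached."""
    uri: str
    pref_labels: dict[str, str]                      # language tag -> preferred label
    alt_labels: dict[str, list[str]] = field(default_factory=dict)
    broader: list[str] = field(default_factory=list)  # URIs of broader concepts

    def to_turtle(self) -> str:
        """Serialize as Turtle (labels are assumed to contain no quotes)."""
        statements = ["a skos:Concept"]
        for lang, label in sorted(self.pref_labels.items()):
            statements.append(f'skos:prefLabel "{label}"@{lang}')
        for lang, labels in sorted(self.alt_labels.items()):
            for label in labels:
                statements.append(f'skos:altLabel "{label}"@{lang}')
        for parent in self.broader:
            statements.append(f"skos:broader <{parent}>")
        body = " ;\n    ".join(statements)
        return f"@prefix skos: <{SKOS_NS}> .\n\n<{self.uri}>\n    {body} .\n"


# Hypothetical place-type concept: one URI, labels in three languages.
market = Concept(
    uri="http://example.org/placetypes/market",
    pref_labels={"en": "market", "de": "Markt", "it": "mercato"},
    alt_labels={"en": ["marketplace"]},
    broader=["http://example.org/placetypes/place-of-trade"],
)
turtle = market.to_turtle()
print(turtle)
```

The point of the sketch is that the concept is identified by its URI, while the terms in each language are only labels attached to it, so the vocabulary no longer depends on one natural language.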

IEG proposal - concept
• Keep both natural language vocabularies AND formalized ontologies
• An integrated approach:

1. Develop back-end ontologies, well formalized and documented*
2. Build vocabularies as needed, in natural language, associating tags with
formally defined concepts (this prevents late integration)

But!
There is no 1-1 mapping between vocabularies and ontologies. Focus on what’s shared*.
Pareto principle: 80% of the effects (the tags we need) come from 20% of the causes (concepts).
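
The relation between tags and concepts described above can be sketched as a simple lookup table. This is an illustration under stated assumptions, not the IEG implementation: the project names, tags and `concept:` identifiers are all invented. It shows that several tags may point to one concept, that some tags stay project-specific, and how the shared part can be found.

```python
from collections import defaultdict

# Hypothetical assignments: (project, natural-language tag) -> concept id.
# None marks a project-specific tag left outside the shared ontology.
TAG_TO_CONCEPT = {
    ("project_a", "market"): "concept:trade",
    ("project_a", "fair"): "concept:trade",
    ("project_a", "parish church"): "concept:worship",
    ("project_b", "Markt"): "concept:trade",
    ("project_b", "Kloster"): "concept:worship",
    ("project_b", "local toll point"): None,
}


def tags_by_concept(mapping):
    """Group tags under the concept they refer to (many tags, one concept)."""
    grouped = defaultdict(list)
    for (project, tag), concept in mapping.items():
        if concept is not None:
            grouped[concept].append((project, tag))
    return dict(grouped)


def shared_concepts(mapping):
    """Return the concepts that more than one project refers to."""
    projects = defaultdict(set)
    for (project, _tag), concept in mapping.items():
        if concept is not None:
            projects[concept].add(project)
    return sorted(c for c, used_by in projects.items() if len(used_by) > 1)


grouped = tags_by_concept(TAG_TO_CONCEPT)
shared = shared_concepts(TAG_TO_CONCEPT)
print(grouped["concept:trade"])
print(shared)
```

Because tags are bound to concepts when they are created, two projects can be compared through the concepts they share without first reconciling their wording.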

IEG proposal - implementation
Implementation is key:
1. Upper ontologies (integration among domains)
2. Domain ontologies (e.g. functions)
3. Labeling system
4. Controlled vocabularies
> Linked data enabled, user friendly (minimize learning curve and overhead),
single entry-point to standards: bridges tags and concepts.
Large-scale collaborative and community-driven framework (items 1, 2, 3 and, in
part, 4): few experts for the back-end, many users for the front-end, everything open.
Could we think about a consortium for controlled vocabularies (like the TEI)?

Historical place types
Quite problematic:
The same names mean different things across space, time and culture
Generic tags for specific meanings: ambiguity
Layers of interpretations: agents, socio-political context, historians

From nouns to verbs:
Most vocabularies of place types/features are already loosely classified by
functionality (economic activity, leisure facility, place of culture, etc.)
There are fewer verbs than nouns (WordNet synsets: ~82k nouns, ~14k verbs)
Verbs bring us closer to concrete events (and linked data triples)
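
The move from nouns to verbs can be sketched as follows: a place is described by what happens there, written as subject-predicate-object triples, and two places are compared by the functions they share rather than by their names. All identifiers below (`place:`, `function:`, `activity:`) are hypothetical and the statements are invented examples, not historical data.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A linked-data style statement: subject, predicate (the verb), object."""
    subject: str
    predicate: str
    obj: str


# Invented example statements describing places by function.
TRIPLES = [
    Triple("place:market_square", "function:hosts", "activity:trade"),
    Triple("place:market_square", "function:hosts", "activity:news_exchange"),
    Triple("place:tavern", "function:hosts", "activity:news_exchange"),
    Triple("place:tavern", "function:provides", "activity:lodging"),
]


def functions_of(place, triples):
    """Return the set of (verb, activity) pairs stated for a place."""
    return {(t.predicate, t.obj) for t in triples if t.subject == place}


def shared_functions(place_a, place_b, triples):
    """Functions two places have in common, regardless of what they are called."""
    return functions_of(place_a, triples) & functions_of(place_b, triples)


common = shared_functions("place:market_square", "place:tavern", TRIPLES)
print(common)
```

Under this modelling choice a "market square" and a "tavern" stay distinct as nouns but become comparable through the one function they share.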

Functional categorization - I

Source: Filippo De Vivo, Patrizi, informatori, barbieri. Politica e comunicazione a Venezia nella prima età moderna. Milan: Feltrinelli, 2012.
In English: id., Information and Communication in Venice: Rethinking Early Modern Politics. Oxford: Oxford University Press, 2007.

Open questions
Is all this useful and feasible? (let’s try it)
Where to start (historical place types)
What to model (functions)
Design requirements
Explore technical solutions
How to integrate existing vocabularies
> Sketch guidelines
Partners, anyone? :)

Thanks!
