Ontologies and the humanities: some issues affecting the design of digital infrastructure
1. Ontologies and the Humanities
Some Issues Affecting the Design of Digital
Infrastructure
Toby Burrows, Department of Digital Humanities
2. Of making many ontologies, there is no end…
• “A joint project … which aims to develop an ontology of digital
research methods in the arts and humanities…”
• “Cette réflexion a nécessité la modélisation d’une ontologie de la
transtextualité…” [“This reflection required the modelling of an
ontology of transtextuality…”]
• “The proposed ontology for 3D visualisation for cultural
heritage…”
• “Über die Modellierung einer Ontologie wissenschaftlicher
Prozesse für den Exzellenzcluster…” [“On the modelling of an
ontology of scholarly processes for the Cluster of Excellence…”]
• “The model can be aligned with upper level ontologies like the
CIDOC-CRM…”
Digital Humanities 2014, Lausanne, July 2014
3. What is an ontology?
• “The representation of entities, ideas, and events, along with
their properties and relations, according to a system of
categories” (Wikipedia)
• “An ontology formally represents knowledge as a set of concepts
within a domain, using a shared vocabulary to denote the types,
properties and interrelationships of those concepts” (Wikipedia)
• “The most typical kind of ontology for the Web has a taxonomy
and a set of inference rules” (Tim Berners-Lee)
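The “taxonomy plus inference rules” idea in the last definition can be illustrated with a minimal sketch. The class names, the example instance assignment, and the helper functions below are hypothetical and chosen only for illustration. A real ontology would normally be written in a language such as OWL rather than in Python dictionaries.

```python
# Hypothetical mini-ontology: a taxonomy plus one inference rule.
# Taxonomy: each class points to its direct superclass.
SUBCLASS_OF = {
    "Manuscript": "Work",
    "Painting": "Work",
    "Work": "Thing",
}

# Asserted facts: each entity is given exactly one class explicitly.
INSTANCE_OF = {"Phillipps MS 16402": "Manuscript"}


def superclasses(cls):
    """Walk up the taxonomy from cls, yielding each ancestor class in turn."""
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        yield cls


def inferred_types(entity):
    """Inference rule: an instance of a class is also an instance of
    every superclass of that class."""
    asserted = INSTANCE_OF[entity]
    return [asserted, *superclasses(asserted)]


print(inferred_types("Phillipps MS 16402"))
```

Only the “Manuscript” assignment is stated explicitly. The other two types follow from the rule, which is the kind of automated reasoning the definitions have in mind.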
6. Gill, Tony (2004) “Building semantic bridges between museums, libraries and archives: The CIDOC Conceptual Reference Model” First Monday vol. 9 no. 5
8. Why ontologies?
• Computational perspective:
– Machine-processable
– Support automated reasoning and logic
– Enable contextual search and browse
– Enable software agents to identify trusted sources and provide
service discovery
• Humanities perspective:
– Semantic analysis of the contents of scholarly materials
– Categorization of scholarly materials
– Relating different categorization schemes to each other
– Computational reasoning – faceted searching and browsing
Berners-Lee, Tim, James Hendler and Ora Lassila (2001) "The Semantic Web", Scientific American, May 2001, p. 29-37
Allen, Colin (2013) “Cross-Cutting Categorization Schemes in the Digital Humanities”, Isis, Vol. 104, No. 3, pp. 573-583
9. Linguistic and semantic difficulties
• Variations in terminology
• Ambiguity of terminology
• Historical change in language and meaning
• Multilingualism – use of different languages
• Interdisciplinarity – different perspectives (“cross-domain”)
• Responses within ontologies:
– Definitions of terms
– Semantic context (provided by ontological structure)
– Ontology mapping across domains
– Ontology integration across domains
– Ontology learning and modification
11. Alternative strategies?
• Search – use ontologies to classify search results (facets)
• Topic modeling – automatic generation of semantic categories
and relations from text-based NLP
• Fluid ontologies and vernacular ontologies
• Linked Data with light categorization for reasoning
– Vocabularies & thesauri encoded for the Semantic Web
(SKOS)
• “Folksonomies” or social tagging
– Tags are applied to entities
– There is no formal classification or categorization of concepts
– There are no relationships between tags (other than being used to tag the
same entity)
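The one relationship a folksonomy does provide, tags being used on the same entity, can be made concrete with a small sketch. The tagging data and the function name are hypothetical. The point is only that co-occurrence counts are the sole structure available.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagging data: entity -> set of free-text tags applied by users.
TAGGINGS = {
    "Massive Attack": {"trip-hop", "bristol", "electronic"},
    "Portishead": {"trip-hop", "bristol", "female vocalists"},
    "Aphex Twin": {"electronic", "idm"},
}


def co_occurrence(taggings):
    """Count how often each pair of tags is applied to the same entity.

    Pairs are stored in alphabetical order so (a, b) and (b, a) are the same key.
    """
    counts = Counter()
    for tags in taggings.values():
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts


pairs = co_occurrence(TAGGINGS)
print(pairs[("bristol", "trip-hop")])
```

Nothing in this data says that “trip-hop” is a kind of “electronic” music, or that “bristol” is a place. Any such relation would have to be inferred statistically or added by hand.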
12. Massive Attack Tags (last.fm)
00s 80s 90s acid jazz
alternative alternative dance alternative rock
ambient atmospheric beautiful
bristol bristol sound british
chill chill out chillout
dance dark downbeat downtempo dub
easy listening electro electronic electronica
england english experimental
favorite favorites favourite female vocalists
hip hop hip-hop house hypnotic
idm indie indie rock industrial instrumental
jazz lounge male vocalists massive attack
mellow pop psychedelic rap relax rock sexy soul
soundtrack techno trance trip hop trip-hop triphop uk
13. Categorizing contemporary popular music
[Diagram of genre labels: Alternative rock, Grunge, Punk, Indie, Riot grrrl, Alt-country, Hard rock, Seventies rock, Goth-punk, Slacker punk, “That old weird America”, Stoner rock, Rap metal, Nu metal, Old-school punk, Metal, Hardcore, Post-punk, D-beat]
16. Deeper issues for the humanities
• More than just linguistic or semantic difficulties
• Debates about “the nature of things” (ontology!)
• Debates about “how to represent the world”
• The nature of perception and cognition
• Cognitive themes:
– Similarity and dissimilarity
– Relationships: metonymic and metaphorical, not just
semantic or logical
– Connections and trails
– Seeing things holistically
17. “The problem of modeling representations”
“The symbolic approach starts from the assumption that cognitive
systems can be described as Turing machines. Cognition is seen as
essentially being computation, involving symbol manipulation.”
(Gärdenfors 2000:1)
“The Semantic Web is a machine for creating syllogisms” (Shirky
2003)
“It is an unfortunate dogma of computer science in general, and
the Semantic Web in particular, that all semantic contents are
reducible to first-order logic or to set theory” (Gärdenfors
2014:258)
18. Conceptual spaces
In the current Semantic Web, the information mainly concerns
taxonomies and inference rules. If conceptual spaces are used as a
foundational methodology, the focus will be on describing domain
structures. This involves, above all, specifying the geometric and
topological structure of the domains. (Gärdenfors 2014: 261)
The issue is this: Do meaningful thought and reason concern merely the
manipulation of abstract symbols and their correspondence to an
objective reality? Or do meaningful thought and reason essentially
concern the nature of the organism doing the thinking – including the
nature of its body, its interactions in its environment, its social character
and so on? (Lakoff 1987: xv-xvi)
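The geometric view in the Gärdenfors quotation can be sketched in a few lines: concepts are regions of a space of quality dimensions, and an item belongs to the concept whose prototype is nearest. The two dimensions, the prototype coordinates, and the function name are invented for illustration and are not taken from Gärdenfors.

```python
import math

# Hypothetical prototypes in a two-dimensional quality space.
# Both dimensions (tempo, distortion) are invented and scaled to 0-1.
PROTOTYPES = {
    "ambient": (0.2, 0.1),
    "trip hop": (0.4, 0.3),
    "punk": (0.9, 0.9),
}


def categorize(point):
    """Return the concept whose prototype is closest to point.

    Assigning every point to its nearest prototype divides the space
    into regions, one per concept.
    """
    return min(PROTOTYPES, key=lambda name: math.dist(point, PROTOTYPES[name]))


print(categorize((0.35, 0.25)))
print(categorize((0.8, 0.7)))
```

Similarity here is distance in the space rather than a symbolic relation, which is the contrast with taxonomies and inference rules that the quotation draws.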
19. The world as graph
The theory of graph-theoretic structure is sufficient to account for all
structure in thought or world. Minimally, it has the information-theoretic
content to describe the complexity of the apparent world, it mirrors the
“computational” difficulty we have in grasping this world, and it has the
combinatoric texture to give a theoretically satisfying account of the
nature of the world. That is, the world is of daunting size and complexity,
parts of it are difficult precisely to isolate and conceive, but it is
fundamentally made up of parts arranged in simple, graspable
arrangements.
This is an extremely speculative assertion that a graph – large graphs
anyway – have the same compositional “feel” as the world; and that the
“facts” or sentences of first-order predicate logic of logico-metaphysical
analysis do not. (Dipert 1997:351)
20. HuNI’s approach – socially-linked data
• Aggregate heterogeneous data to a simple data model
• Keep the categorization of data entities to a minimum: six basic
categories
• No imported relationships between entities
• Allow users to express the relationships they see in the data – by
creating links between entities
• Allow multiple relationships between the same entities (even if
they are contradictory)
• The user-contributed links give meaning and add value
• Users can also create and share collections of entities
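The approach in these bullets can be sketched as a very small data model. This is an illustrative assumption about how such a model might look, not HuNI's actual implementation. The class names, fields, and example records are hypothetical; only the six category names come from the HuNI data model.

```python
from dataclasses import dataclass, field

# The six basic HuNI categories.
CATEGORIES = {"Concept", "Event", "Organisation", "Person", "Place", "Work"}


@dataclass(frozen=True)
class Entity:
    name: str
    category: str

    def __post_init__(self):
        # Categorization is kept to a minimum: one of six labels, nothing more.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")


@dataclass(frozen=True)
class Link:
    source: Entity
    target: Entity
    label: str       # free-text relationship chosen by the user
    created_by: str  # hypothetical user identifier


@dataclass
class LinkStore:
    links: list = field(default_factory=list)

    def add(self, link):
        # No consistency check: contradictory links are allowed to coexist.
        self.links.append(link)

    def between(self, a, b):
        """Return every user-contributed link joining a and b, in either direction."""
        return [l for l in self.links if {l.source, l.target} == {a, b}]


# Hypothetical records and users, for illustration only.
author = Entity("A. Author", "Person")
novel = Entity("Example Novel", "Work")
store = LinkStore()
store.add(Link(author, novel, "wrote", created_by="user1"))
store.add(Link(author, novel, "did not write", created_by="user2"))
print(len(store.between(author, novel)))
```

The store deliberately does not decide between the two statements. Each link remains attributed to the user who created it.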
21. HuNI Record Category
[Diagram: HuNI Record Category, with six categories: Concept, Event, Organisation, Person, Place, Work]
• PERSON – A natural person
• ORGANISATION – A company, club, trust, gallery, political party, etc
• WORK – A cultural artefact or “man-made” thing created by
someone, that has some existence in its own right,
either physical or digital
• PLACE – A real, spatial location
• EVENT – An activity that occurs in space and time and may
involve people, organisations, places, works, etc.
• CONCEPT – Something whose existence is primarily mental
http://wiki.huni.net.au/display/DS/Data+Model
23. Events
• Central to humanities perspectives on the world
• “Each entity is an event” – Bruno Latour
• Attempts at ontological models of events:
– Simple Event Model; LODE; The Event Ontology
– Within larger models: CIDOC-CRM, Europeana, FRBRoo
– Treat events as nameable entities
• Knowledge representation of events:
– CultureSampo (Finland)
• Events as conceptual spaces – Peter Gärdenfors
24. The Simple Event Model (SEM)
[Diagram: the Simple Event Model (SEM), by Willem Robert van Hage and Véronique Malaisé, Vrije Universiteit Amsterdam]
• Core classes: sem:Core, sem:Event, sem:Actor, sem:Place, sem:Time
• Core properties: sem:hasSubEvent, sem:hasActor, sem:hasPlace, sem:hasTime, sem:hasTimeStamp (literal)
• (Foreign) type system: sem:Type, sem:EventType, sem:ActorType, sem:PlaceType, sem:TimeType, sem:RoleType
– Linked by sem:eventType, sem:actorType, sem:placeType, sem:timeType, sem:roleType, sem:hasSubType
• Property constraints: sem:Constraint, sem:Role, sem:Temporary, sem:View, sem:Authority (via sem:accordingTo)
25. T. Ruotsalo, E. Hyvönen, An event-based approach for semantic metadata interoperability (2007)
26. In 1862, Sir Thomas Phillipps bought Phillipps MS 16402 in London as
part of the Sotheby’s sale of the collection of Guglielmo Libri.
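As a rough sketch, the purchase described above could be written as subject-predicate-object triples using property names from the Simple Event Model. Several things here are assumptions: the `sem:` namespace URI as given, the `http://example.org/` identifiers, the helper function, and the choice to treat the manuscript as a participant via `sem:hasActor`. A real project would use an RDF library and its own identifiers.

```python
# Assumed namespace URI for the Simple Event Model; check it against the SEM specification.
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for local identifiers


def event_triples(event_id, event_type, actors, place, timestamp, parent=None):
    """Describe one event as a list of (subject, predicate, object) triples."""
    event = EX + event_id
    triples = [
        (event, RDF_TYPE, SEM + "Event"),
        (event, SEM + "eventType", EX + event_type),
        (event, SEM + "hasPlace", EX + place),
        (event, SEM + "hasTimeStamp", timestamp),  # literal value
    ]
    # Every participant, person or object, is attached with sem:hasActor.
    triples += [(event, SEM + "hasActor", EX + actor) for actor in actors]
    if parent is not None:
        # The purchase is modelled as a sub-event of the larger sale.
        triples.append((EX + parent, SEM + "hasSubEvent", event))
    return triples


purchase = event_triples(
    "purchase-ms-16402",
    "Purchase",
    actors=["Thomas_Phillipps", "Phillipps_MS_16402"],
    place="London",
    timestamp="1862",
    parent="Sothebys_Libri_sale",
)
for triple in purchase:
    print(triple)
```

Treating the purchase as a nameable entity in its own right is what lets the buyer, the manuscript, the place, the date, and the wider sale all be attached to it.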
27. The task for DH
• Future lines of DH research: looking beyond ontologies
• Computational modeling of humanities thought: going beyond
“reasoning” in the logical sense, as embedded in ontologies
• Examine alternatives from cognitive science and philosophy
– Conceptual spaces: the geometry of meaning
– Cognitive models
– The world as a graph
28. Dr Toby Burrows
Marie Curie Fellow
Department of Digital Humanities
King’s College London
26-29 Drury Lane
London WC2B 5RL
toby.burrows@kcl.ac.uk
Editor's Notes
Ironically, although “ontology” has a specific meaning in computer science, the term is often used very loosely
Two definitional sentences from the Wikipedia entry
Key words: “entities”, “categories”, “concepts”, “representation”, “properties and relations”
Provide the link to the original meaning in philosophy – “the study of what exists, what has being”
A form of knowledge representation
Different from vocabularies, thesauri, taxonomies, topic maps, data models and metadata schemas
As the definitions make clear, an ontology contains a vocabulary or taxonomy
PLUS relations between terms (including categories) – which can be expressed as inference rules
Designed for computer reasoning – as Berners-Lee emphasizes
Fundamental questions about the underlying approach (symbolic, logic-based) and its limitations
Raised by sceptics like Shirky
But also by cognitive scientists like Peter Gärdenfors
Gärdenfors proposes a very different approach to “making the Semantic Web more semantic”
Based on conceptual spaces, domains, and cognitive models – “the geometry of thought”
Drawing on earlier work by a range of people, including George Lakoff’s work on cognitive models of classification
Another interesting alternative proposal, from the philosophical side:
Replacing standard predicate logic with a graph-based approach to modelling the world and our understanding of it
DH needs to think through the implications of these fundamentally different approaches
Need to be aware of the limitations of the classic ontology framework and its assumptions
Need to try different approaches
I’m going to look at two projects which I’m involved in, as possible tentative examples
HuNI – explain what it is
30 different humanities datasets – 716,000 entities
The challenge is to organize and link heterogeneous data without pre-determining the structure and relationships
Sufficient organization is required to make the data aggregate useful – but without imposing too much of a conceptual framework
We need to be able to share – we need to be able to talk about the entities
Definitions of the six core categories
Documented in detail on the HuNI wiki
Challenge: link Hugh Jackman to Switzerland
Manually created links – diagram doesn’t show the nature of the link
That information is in a tabular form below the diagram
Can explore the graph by clicking on any of the entities
Designed for browsing and exploration, rather than reasoning or network analysis
Events in relation to provenance
Gärdenfors is at pains to address this in his latest book (2014), which looks at the potential of CSML