The document discusses issues in authoring ontologies and describes a study conducted to better understand the ontology authoring process. The study used an instrumented version of Protégé called Protégé4US to collect interaction logs and eye tracking data from ontology authors. Analysis of the data revealed common patterns of exploration, editing, and reasoning activities. Key findings include the repetitive nature of editing tasks and lack of situational awareness after running reasoning. Design recommendations aim to better support activities like bulk editing and anticipating the effects of reasoning.
1. Issues and activities in authoring ontologies
Robert Stevens
School of Computer Science
University of Manchester
robert.stevens@manchester.ac.uk
2. We need to know what we're talking about…
• … if we don't, our data are useless
• If we are to interpret our data, then we need to know what entities they describe
• We need to share and re-use data
• We need to find, compare and analyse data
• We need to know what we know, and to agree about it…
3. What is an Ontology?
• Ontology (Socrates & Aristotle, 400-360 BC)
• The study of being
• Word borrowed by computing for the explicit description of the conceptualisation of a domain:
– concepts
– properties and attributes of concepts
– constraints on properties and attributes
– individuals (often, but not always)
• An ontology defines
– an agreement on the entities of a domain
– a common vocabulary for the entities of a domain
4. Web Ontology Language (OWL)
• W3C recommendation for ontologies on the Semantic Web
• OWL DL maps to a decidable fragment of first-order logic
• Classes, properties and instances
• Boolean operators, plus existential and universal quantification
• Rich class expressions used in restrictions on properties
– hasDomain some (ImmunoglobulinDomain or FibronectinDomain)
• Automated reasoners reveal entailments from the axioms of an ontology in OWL
6. Some OWL and why it’s hard
Class: RanunculusRepens
SubClassOf:
* Flower,
Flower
and (hasFlowerSymmetry some RadialSymmetry)
and (hasPart some
(Androecium
and (hasAndroecialFusion some Apostemonous)
and (hasPart some
(Stamen
and (hasPart some Filament)
and (hasPart some
(Anther
and (hasAntherAttachment some AdnateAntherAttachment)
and (hasDehiscenceType some LongitudinalDehiscence)))))))
and (hasPart some
(Gynoecium
and (hasGynoecialFusion some Apocarpous)
and (hasPart some
(Pistil
and (hasPart some Carpel)
and (hasPart some Style)
and (hasPart some
(Stigma
and (hasStickiness some Stickiness)
and (hasStigmaShape some HookedStigmaShape)))
and (hasPart only
(Carpel
or Stigma
or Style))))
and (hasSexualPartArrangement some SpiralArrangement)))
and (hasPart exactly 1 (Perianth
7. Some OWL and why it’s hard
Class: RanunculusRepens
SubClassOf:
* Flower,
Flower
and (hasPart some
(Calyx
and (hasPart exactly 5 (Sepal
and (hasColour some Green)
and (hasRegion some
(BaseRegion
and (hasForm some Truncate)))
and (hasRegion some
(MarginRegion
and (hasSepalPetalFeature some Entire)
and (hasSepalPetalFeature some Membranous)))
and (hasRegion some
(SurfaceRegion
and (hasSepalPetalFeature some Pubescent)
and (hasSurfaceSelector some LowerSurfaceSelector)))
and (hasRegion some
(SurfaceRegion
and (hasSepalPetalFeature some Smooth)
and (hasSurfaceSelector some UpperSurfaceSelector)))
and (hasRegion some
(TipRegion
and (hasForm some Truncate)))
and (hasSepalPetalFeature some PalmatelyNetted)
and (hasSepalPetalShape some Ovate)
and (hasSepalousity some Aposepalos)))))
8. Some OWL and why it’s hard
Class: RanunculusRepens
SubClassOf:
* Flower,
Flower
and (hasPart some
(Corolla
and (hasPart exactly 5 (Petal
and (hasColour some Yellow)
and (hasPetalousity some Apopetalos)
and (hasRegion some
(BaseRegion
and (hasForm some Acute)))
and (hasRegion some
(MarginRegion
and (hasSepalPetalFeature some Entire)))
and (hasRegion some
(TipRegion
and (hasForm some Acute)))
and (hasSepalPetalFeature some PalmatelyNetted)
and (hasSepalPetalShape some Obovate)
and (hasPart exactly 1 Nectary)))))
and (hasPerianthArrangement some AlternatingPerianthArrangement)
and (hasPart only
(Calyx
or Corolla))))
9. Describing potatoes
[Figure: class hierarchy showing Potato, BoilingPotato, LateFirstEarlyPotato and Accent]
Class: BoilingPotato
EquivalentTo: Potato and hasPreferredCookingMethod some Boiling
Class: LateFirstEarlyPotato
EquivalentTo: Potato and hasCroppingTime some LateFirstEarlyCropping
Class: Accent
SubClassOf:
Potato,
hasPreferredCookingMethod some Boiling,
hasYield some HighYield,
hasCroppingTime some LateFirstEarlyCropping
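To illustrate why a reasoner places Accent under both defined classes, here is a toy sketch. It assumes a drastically simplified model: every description is a flat conjunction of named classes and `property some Filler` restrictions with named fillers, with no nesting and no hierarchy among fillers. Under that assumption, a defined class subsumes Accent when every conjunct of its definition appears among Accent's asserted conjuncts. This is a structural check for illustration only, not how HermiT or any real OWL reasoner works; the names `Conjunct` and `subsumes` are hypothetical.

```python
# Toy structural subsumption for the potato example (flat conjunctions only).
# A conjunct is either ("ClassName",) or ("property", "Filler") meaning "property some Filler".
Conjunct = tuple[str, ...]

# EquivalentTo definitions: necessary and sufficient conditions.
definitions: dict[str, set[Conjunct]] = {
    "BoilingPotato": {("Potato",), ("hasPreferredCookingMethod", "Boiling")},
    "LateFirstEarlyPotato": {("Potato",), ("hasCroppingTime", "LateFirstEarlyCropping")},
}

# SubClassOf axioms for Accent: necessary conditions only.
accent: set[Conjunct] = {
    ("Potato",),
    ("hasPreferredCookingMethod", "Boiling"),
    ("hasYield", "HighYield"),
    ("hasCroppingTime", "LateFirstEarlyCropping"),
}


def subsumes(definition: set[Conjunct], described: set[Conjunct]) -> bool:
    """In this flat model, a defined class subsumes a described class
    if the description contains every conjunct of the definition."""
    return definition <= described


inferred_parents = sorted(name for name, d in definitions.items() if subsumes(d, accent))
print(inferred_parents)  # ['BoilingPotato', 'LateFirstEarlyPotato']
```

The extra conjunct `hasYield some HighYield` does not block either inference: a description with more conditions is more specific, so it still falls under each less specific definition.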
11. Understanding how ontologies are authored in OWL
• We want to understand how these complex, cognitively hard artefacts are authored
• HCI approaches have not spread to all computing disciplines
• Instruments for running user studies are scarce
• Consequences for the OWL realm:
– No real understanding of the authoring process
– Authoring tools are not human-centred
• What if we want to go further?
– Automatic detection of authoring patterns
– Intelligent support for authoring
12. How we tackle the problem
Qualitative approach: interview study, thematic analysis
• Become familiar with the problem
• Set the scope
• Acquire insights for the quantitative approach
Quantitative approach: instrumentation of Protégé, lab study, data-driven analysis
• Collect quantifiable data
• Use lab apparatus (eye-tracker, video, etc.)
• Find authoring patterns
• Quantify and generalise
13. Little is known about the human factors of ontology authoring
• What we know is mostly based on anecdotal evidence
• We asked ontology authors about the problems they face and the strategies they use
14. Uncovering issues in ontology authoring
• Exploration and navigation
– Increase situational awareness by giving feedback about the consequences of actions (e.g. undo, reasoning)
– Provide overviews for those who are not familiar with a given ontology
– For those who are familiar with an ontology, allow bookmarks and provide landmarks
– Facilitate navigation through filters, faceted navigation mechanisms and hyperlinked entities
15. Uncovering issues in ontology authoring
• Search and retrieval
– Integrated support for searching remote ontologies and incorporating their entities into the working ontology
• Efficient authoring
– Include design templates and spreadsheets
• Provide on-the-fly reasoning capabilities
• Remove information overload from explanations
• Include predefined unit tests for evaluation
16. Protégé4US: a step towards having observational instruments
• Protégé4US: Protégé for User Studies
• Logging capabilities of:
– Interaction events: click, hover, expand hierarchy...
– Authoring events: add siblings, add restrictions...
– Environment commands: reason, search, undo...
76585,2,Classes,Element edited,Juliette subclass of: Potato and hasCroppingTime some 'Main cropping'
77786,3,Classes,Save ontology,http://owl.cs.manchester.ac.uk/ontology/start-here.owl
80204,3,Classes,Reasoner invoked,HermiT 1.3.8
80647,1,Classes,Mouse entered,Class hierarchy (inferred)
82910,1,Classes,Element hovered,Early_cropping_potato
83049,1,Classes,Element selected,Early_cropping_potato
83661,1,Classes,Hierarchy expanded,Early_cropping_potato
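The log excerpt above appears to use five comma-separated fields: a timestamp, a numeric event category, the active tab, the event name and a free-text argument. The sketch below parses lines in that shape and tallies events per category. Mapping codes 1, 2 and 3 to interaction, authoring and environment events is an assumption inferred from the excerpt and the three event types listed above, not a documented Protégé4US format; `LogEvent`, `parse_line` and `CATEGORY_NAMES` are hypothetical names.

```python
from collections import Counter
from typing import NamedTuple


class LogEvent(NamedTuple):
    timestamp: int  # first field; units not documented (possibly milliseconds)
    category: int   # assumed: 1 interaction, 2 authoring, 3 environment
    tab: str        # e.g. "Classes"
    name: str       # e.g. "Element selected"
    argument: str   # free text; may itself contain commas


CATEGORY_NAMES = {1: "interaction", 2: "authoring", 3: "environment"}  # assumption


def parse_line(line: str) -> LogEvent:
    # Split on the first four commas only, so the argument stays intact.
    ts, cat, tab, name, arg = line.rstrip("\n").split(",", 4)
    return LogEvent(int(ts), int(cat), tab.strip(), name.strip(), arg.strip())


SAMPLE = """\
76585,2,Classes,Element edited,Juliette subclass of: Potato and hasCroppingTime some 'Main cropping'
77786,3,Classes,Save ontology,http://owl.cs.manchester.ac.uk/ontology/start-here.owl
80204,3,Classes,Reasoner invoked,HermiT 1.3.8
80647,1,Classes,Mouse entered,Class hierarchy (inferred)
82910,1,Classes,Element hovered,Early_cropping_potato
83049,1,Classes,Element selected,Early_cropping_potato
83661,1,Classes,Hierarchy expanded,Early_cropping_potato
"""

events = [parse_line(ln) for ln in SAMPLE.splitlines() if ln.strip()]
by_category = Counter(CATEGORY_NAMES.get(e.category, "unknown") for e in events)
print(by_category)
```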
17. User study to show the strengths of Protégé4US
• Experimental design:
– Participants: 16 expert authors
– Stimuli: a potato ontology and Protégé4US
– 3 authoring tasks of increasing complexity
• Collected data
– Protégé4US logs: 10K events
– Completion times
– Self reported expertise
– Perceived task difficulty
– Screen video and eye-tracking
20. Analysis of log data
• Interaction events account for 65% of events, while authoring events account for 30%
• The top 3 events (entity selection, description selection and invocation of the editing menu) account for 56% of events
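Both figures on this slide are simple proportions over the event log. A minimal sketch of how such shares can be computed, assuming events are available as a list of category labels and a list of event names; the function names and the toy lists are hypothetical and are not the study's data.

```python
from collections import Counter


def category_shares(categories: list[str]) -> dict[str, float]:
    """Fraction of all events that fall into each category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def top_k_share(event_names: list[str], k: int = 3) -> float:
    """Fraction of all events accounted for by the k most frequent event types."""
    counts = Counter(event_names)
    top = sum(n for _, n in counts.most_common(k))
    return top / len(event_names)


# Hypothetical toy data, NOT the study's logs.
cats = ["interaction"] * 13 + ["authoring"] * 6 + ["environment"]
names = (["Entity selected"] * 5 + ["Description selected"] * 2
         + ["Hierarchy expanded"] * 2 + ["Class addition"])

print(category_shares(cats))
print(top_k_share(names, k=3))
```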
21. Analysis of log data
• N-gram analysis of consecutive events suggests a lot of repetition
• Especially for entity selection and hierarchy expansion
• Mouse-driven functionality makes this possible in Protégé
• We built adjacency matrices for each participant: the number of transitions from event x to event y
[Figure: frequency (0 to 1000) against N-gram size (2 to 10) for the events Class addition, Description selected, Entity selected, Entity selected(i), Hierarchy expanded and Hierarchy expanded(i)]
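The two analyses on this slide can be sketched as follows. N-grams are counted with a sliding window over one participant's event sequence; one plausible measure of repetition, assumed here rather than taken from the study, is how often an N-gram consists of a single event type repeated N times. The adjacency matrix counts direct transitions from event x to event y. All function names and the toy sequence are hypothetical.

```python
from collections import Counter


def ngram_counts(events: list[str], n: int) -> Counter:
    """Count every run of n consecutive events (sliding window)."""
    return Counter(tuple(events[i:i + n]) for i in range(len(events) - n + 1))


def repeated_ngram_frequency(events: list[str], n: int) -> Counter:
    """For each event type, how many n-grams consist only of that event."""
    return Counter({
        gram[0]: c for gram, c in ngram_counts(events, n).items() if len(set(gram)) == 1
    })


def adjacency_matrix(events: list[str]) -> tuple[list[str], list[list[int]]]:
    """matrix[x][y] = number of transitions from event x directly to event y."""
    labels = sorted(set(events))
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, b in zip(events, events[1:]):
        matrix[index[a]][index[b]] += 1
    return labels, matrix


# Hypothetical single-participant sequence.
seq = ["Entity selected", "Entity selected", "Entity selected",
       "Hierarchy expanded", "Hierarchy expanded", "Entity selected"]

print(repeated_ngram_frequency(seq, 2))
labels, m = adjacency_matrix(seq)
print(labels, m)
```

Summing these per-participant matrices gives an aggregate view of which transitions dominate, which is the information a web diagram visualises.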
22. Reconstructing the interaction to identify patterns through visualisation
• Left: web diagrams show the most frequent transitions between states
• Right: time diagrams show the authoring rhythm (P8)
[Figure: left, web diagram of transitions between event states (Load ontology, Entity selected, Entity selected(i), Hierarchy expanded, Hierarchy expanded(i), Hierarchy collapsed, Hierarchy collapsed(i), Description selected, Description selected(i), Class addition, Property addition, Set property, Entity edited:start, Entity edited:finish, Entity renamed, Entity dragged, Entity deleted, Convert into defined, Run reasoner, Get explanation, Undo, Back, Save); right, time diagram of the same events for P8, time axis 0 to 4000]
23. Analysis of eye-tracking data
• Distribution of aggregated dwell times across the areas of interest
• The class hierarchy and the entity editing menu receive the majority of fixations and dwell time
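Aggregated dwell time per area of interest (AOI) is the sum of fixation durations that land in that area. A minimal sketch, assuming fixations have already been mapped to AOIs as (AOI name, duration in ms) pairs; the AOI names, durations and `dwell_shares` are hypothetical and not the study's data.

```python
from collections import defaultdict


def dwell_shares(fixations: list[tuple[str, float]]) -> dict[str, float]:
    """Sum fixation durations per area of interest (AOI) and return
    each AOI's share of the total dwell time."""
    totals: dict[str, float] = defaultdict(float)
    for aoi, duration_ms in fixations:
        totals[aoi] += duration_ms
    grand_total = sum(totals.values())
    return {aoi: t / grand_total for aoi, t in totals.items()}


# Hypothetical fixations (AOI, duration in ms), not study data.
fixations = [("Class hierarchy", 300.0), ("Description area", 200.0),
             ("Class hierarchy", 250.0), ("Entity editing menu", 250.0)]

shares = dwell_shares(fixations)
print(shares)  # Class hierarchy gets 550 of 1000 ms
```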
24. Analysis of eye-tracking data
• Number of fixation transitions between areas of interest
• High frequency expected on the diagonal
• Symmetry suggests checking behaviours
• The class hierarchy is the pivotal window
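A transition matrix of this kind counts gaze moves from one AOI to the next, with self-transitions on the diagonal. The slide reads symmetry (A to B about as often as B to A) as evidence of checking behaviour. The sketch below builds the counts and a simple symmetry ratio; the ratio is an illustrative measure chosen here, not necessarily the one used in the study, and all names are hypothetical.

```python
def transition_counts(aoi_sequence: list[str]) -> dict[tuple[str, str], int]:
    """Count gaze transitions between consecutive fixations,
    including self-transitions (the diagonal of the matrix)."""
    counts: dict[tuple[str, str], int] = {}
    for a, b in zip(aoi_sequence, aoi_sequence[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


def symmetry_ratio(counts: dict[tuple[str, str], int]) -> float:
    """Share of off-diagonal transitions matched by a transition in the
    opposite direction: 1.0 means A->B occurs exactly as often as B->A."""
    off_diag = {k: v for k, v in counts.items() if k[0] != k[1]}
    total = sum(off_diag.values())
    if total == 0:
        return 1.0
    matched = sum(min(v, counts.get((b, a), 0)) for (a, b), v in off_diag.items())
    return matched / total


# Hypothetical AOI sequence: the class hierarchy acts as a hub, visited between checks.
checking = ["Hierarchy", "Hierarchy", "Description", "Hierarchy", "Editor", "Hierarchy"]
print(symmetry_ratio(transition_counts(checking)))
```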
25. Log data + eye-tracking data
• Synchronised both data sources
• Merged identical consecutive events, e.g. class addition_t, class addition_{t+1}, class addition_{t+2}, entity selected_{t+3} becomes M_class_addition_{t+2}, entity selected_{t+3}
• Computed N-grams and found 3 main activities:
– Exploration activity
– Authoring activity
– Reasoning activity
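The merging step is a run-length collapse: a run of identical consecutive events becomes one event, prefixed with `M_` and stamped with the time of the last event in the run, as in the example on this slide. A minimal sketch, assuming events are (timestamp, name) pairs; replacing spaces with underscores mirrors the slide's `M_class_addition` and is otherwise an arbitrary choice. `merge_repeats` is a hypothetical name.

```python
from itertools import groupby


def merge_repeats(events: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Collapse runs of identical consecutive events into a single event.
    Runs longer than one become 'M_<name>' and keep the last timestamp."""
    merged = []
    for name, group in groupby(events, key=lambda e: e[1]):
        run = list(group)
        if len(run) > 1:
            merged.append((run[-1][0], "M_" + name.replace(" ", "_")))
        else:
            merged.append(run[0])
    return merged


example = [(1, "class addition"), (2, "class addition"),
           (3, "class addition"), (4, "entity selected")]
print(merge_repeats(example))  # [(3, 'M_class_addition'), (4, 'entity selected')]
```

Merging before computing N-grams stops long runs of a single repeated action from dominating the counts, so that multi-step activities become visible.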
26. Exploration activity
[Figure: transition diagram of the exploration activity linking Load ontology, Expand hierarchy, Select entity, Select description, Expand inferred hierarchy and Select inferred entity, with transition probabilities on the edges]
• Authors expand the asserted class hierarchy after loading an ontology
• Exploration of the asserted hierarchy is about finding a specific location to add or modify an entity, while exploration of the inferred hierarchy is about checking the state of the ontology
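The edge labels in the activity diagrams look like transition probabilities. Assuming they are first-order estimates, meaning the count of x followed by y divided by the count of x followed by anything (the slides do not say exactly how they were computed), they can be obtained by row-normalising transition counts. The function name and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(events: list[str]) -> dict[str, dict[str, float]]:
    """Estimate P(next = y | current = x) from consecutive event pairs,
    i.e. row-normalised transition counts (a first-order Markov model)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for x, y in zip(events, events[1:]):
        counts[x][y] += 1
    return {
        x: {y: n / sum(nexts.values()) for y, n in nexts.items()}
        for x, nexts in counts.items()
    }


# Hypothetical exploration sequence.
seq = ["Load ontology", "Expand hierarchy", "Select entity",
       "Expand hierarchy", "Select entity", "Select description"]
probs = transition_probabilities(seq)
print(probs["Select entity"])  # {'Expand hierarchy': 0.5, 'Select description': 0.5}
```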
27. Editing activity
[Figure: transition diagram linking Select entity, Select description and Modify entity, with transition probabilities (0.29, 0.37, 0.59, 0.63) on the edges]
• Sequence found 362 times
• 22.6 times per participant
• The high transition probabilities, together with how often this activity is performed, indicate that entities were modified in batches
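Counting how often an activity occurs amounts to counting occurrences of its event sequence in each participant's log; dividing the total by the 16 participants gives 362 / 16 = 22.625, reported as 22.6. The sketch counts non-overlapping occurrences of a contiguous pattern; the exact matching rules used in the study are not stated, so this is an assumption, and `EDIT_CYCLE` and `count_pattern` are hypothetical names.

```python
# Hypothetical editing pattern built from the node labels of the diagram.
EDIT_CYCLE = ["Select entity", "Select description", "Modify entity"]


def count_pattern(events: list[str], pattern: list[str]) -> int:
    """Count non-overlapping occurrences of a contiguous event pattern."""
    count, i, n = 0, 0, len(pattern)
    while i <= len(events) - n:
        if events[i:i + n] == pattern:
            count += 1
            i += n  # skip past the match so occurrences do not overlap
        else:
            i += 1
    return count


log = EDIT_CYCLE * 2 + ["Select entity"]
print(count_pattern(log, EDIT_CYCLE))  # 2

per_participant = 362 / 16  # 22.625, reported on the slide as 22.6
```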
28. Reasoning activity
[Figure: transition diagram linking Run reasoner, Save, Convert into defined class, Select description, Select entity, Expand inferred hierarchy and Select inferred entity, with transition probabilities on the edges]
• After running the reasoner, participants either:
– observe the consequences of reasoning on the asserted hierarchy and the description area, or
– check the classification by expanding the inferred class hierarchy and selecting inferred entities
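The two post-reasoning strategies can be told apart by looking at the few events that follow each reasoner run. A minimal sketch, assuming event names taken from the diagram's node labels and a fixed look-ahead window of 3 events (an arbitrary choice made here); a run is labelled "inferred" if the inferred hierarchy is used within the window, otherwise "asserted". All names are hypothetical.

```python
# Hypothetical event names taken from the reasoning diagram's node labels.
INFERRED_VIEW_EVENTS = {"Expand inferred hierarchy", "Select inferred entity"}


def events_after(events: list[str], trigger: str, window: int) -> list[list[str]]:
    """Return the `window` events that follow each occurrence of `trigger`."""
    return [events[i + 1:i + 1 + window] for i, e in enumerate(events) if e == trigger]


def post_reasoning_strategies(events: list[str], window: int = 3) -> list[str]:
    """Label each reasoner run by what the author looked at next:
    'inferred' if the inferred hierarchy is used within the window,
    otherwise 'asserted' (asserted hierarchy and description area)."""
    strategies = []
    for follow_up in events_after(events, "Run reasoner", window):
        if INFERRED_VIEW_EVENTS & set(follow_up):
            strategies.append("inferred")
        else:
            strategies.append("asserted")
    return strategies


session = ["Run reasoner", "Select entity", "Select description",
           "Run reasoner", "Expand inferred hierarchy", "Select inferred entity"]
print(post_reasoning_strategies(session))  # ['asserted', 'inferred']
```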
29. Discussion
• Ontology editing is highly repetitive
• The class hierarchy received users' attention 45% of the time
– Acts as an external memory of the ontology
– Plays the role of an index with pointers to extended information
• Navigation of the inferred hierarchy is exploratory, while navigation of the asserted hierarchy is directed
30. Discussion
• Some outcomes corroborate the initial findings: the repetitiveness of editing tasks and the lack of situational awareness after running the reasoner
• Design recommendations
– Support bulk editing
– Place editing features close to the class hierarchy
– Show entity descriptions close to the class
hierarchy
– Anticipate reasoner invocation
– Make changes to the inferred hierarchy explicit
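One way to make changes to the inferred hierarchy explicit is to diff snapshots of it taken before and after a reasoner run. A minimal sketch, assuming each snapshot is a set of direct (subclass, superclass) edges as displayed in a hierarchy view; this is an illustration of the recommendation, not an existing Protégé feature, and the names are hypothetical. The example reuses the potato classes.

```python
Edge = tuple[str, str]  # (subclass, direct superclass) as shown in a hierarchy view


def hierarchy_diff(before: set[Edge], after: set[Edge]) -> tuple[set[Edge], set[Edge]]:
    """Return (added, removed) direct subclass edges between two snapshots."""
    return after - before, before - after


def describe(added: set[Edge], removed: set[Edge]) -> list[str]:
    """Render the diff as short, human-readable change notes."""
    notes = [f"+ {sub} is now under {sup}" for sub, sup in sorted(added)]
    notes += [f"- {sub} is no longer directly under {sup}" for sub, sup in sorted(removed)]
    return notes


before = {("BoilingPotato", "Potato"), ("LateFirstEarlyPotato", "Potato"),
          ("Accent", "Potato")}
after = {("BoilingPotato", "Potato"), ("LateFirstEarlyPotato", "Potato"),
         ("Accent", "BoilingPotato"), ("Accent", "LateFirstEarlyPotato")}

added, removed = hierarchy_diff(before, after)
for note in describe(added, removed):
    print(note)
```

Accent disappears from directly under Potato because a hierarchy view shows only direct superclasses. Once the reasoner places it under the two defined classes, its link to Potato is only indirect.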
31. Acknowledgements
Markel Vigo did the work.
Caroline Jay and Robert Stevens helped out with design,
analysis, and so on.
WhatIf: Answering “What if...” questions for Ontology Authoring.
EPSRC reference EP/J014176/1