Practical Ontologies for
Information Professionals
Stuart_Practical ontologies_TEXT PROOF_04 28/07/2016 09:14 Page i
Every purchase of a Facet book helps to fund CILIP’s advocacy,
awareness and accreditation programmes
for information professionals.
Practical Ontologies for
Information Professionals
David Stuart
© David Stuart 2016
Published by Facet Publishing
7 Ridgmount Street, London WC1E 7AE
www.facetpublishing.co.uk
Facet Publishing is wholly owned by CILIP: the Chartered Institute
of Library and Information Professionals.
David Stuart has asserted his right under the Copyright, Designs and Patents
Act 1988 to be identified as author of this work.
Except as otherwise permitted under the Copyright, Designs and Patents
Act 1988 this publication may only be reproduced, stored or transmitted in any
form or by any means, with the prior permission of the publisher, or, in the case of
reprographic reproduction, in accordance with the terms of a licence issued by
The Copyright Licensing Agency. Enquiries concerning reproduction outside
those terms should be sent to Facet Publishing, 7 Ridgmount Street,
London WC1E 7AE.
Every effort has been made to contact the holders of copyright material
reproduced in this text, and thanks are due to them for permission to reproduce
the material indicated. If there are any queries please contact the publisher.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.
ISBN 978-1-78330-062-4 (paperback)
ISBN 978-1-78330-104-1 (hardback)
ISBN 978-1-78330-152-2 (e-book)
First published 2016
Text printed on FSC accredited material.
Typeset from author’s files in 10/13 pt Minion Pro and Myriad Pro by
Facet Publishing Production.
Printed and made in Great Britain by CPI Group (UK) Ltd, Croydon, CR0 4YY.
Contents
List of figures and tables
1 What is an ontology?
Introduction
The data deluge and information overload
Defining terms
Knowledge organization systems and ontologies
Ontologies, metadata and linked data
What can an ontology do?
Ontologies and information professionals
Alternatives to ontologies
The aims of this book
The structure of this book
2 Ontologies and the semantic web
Introduction
The semantic web and linked data
Resource Description Framework (RDF)
Classes, subclasses and properties
The semantic web stack
Embedded RDF
Alternative semantic visions
Libraries and the semantic web
Other cultural heritage institutions and the semantic web
Other organizations and the semantic web
Conclusion
3 Existing ontologies
Introduction
Ontology documentation
Ontologies for representing ontologies
Ontologies for libraries
Upper ontologies
Cultural heritage data models
Ontologies for the web
Conclusion
4 Adopting ontologies
Introduction
Reusing ontologies: application profiles and data models
Identifying ontologies
The ideal ontology discovery tool
Selection criteria
Conclusion
5 Building ontologies
Introduction
Approaches to building an ontology
The twelve steps
Ontology development example: Bibliometric Metrics Ontology element set
Conclusion
6 Interrogating ontologies
Introduction
Interrogating ontologies for reuse
Interrogating a knowledge base
Understanding ontology use
Conclusion
7 The future of ontologies and the information professional
Introduction
The future of ontologies for knowledge discovery
The future role of library and information professionals
The practical development of ontologies
Conclusion
Bibliography
Index
List of figures and tables
Figures
1.1 Section of the British National Bibliography graph visualized using RDF Gravity
1.2 A graph of Jesus and his twelve apostles
2.1 David hates Apple graph
2.2 David hates Apple, but knows Bob who loves Apple
2.3 The semantic web stack
2.4 An example of an RDF graph
3.1 A simple person and place ontology using RDF and RDFS
3.2 nature.com data categories as SKOS Play tree visualization
3.3 FRBR entities and relationships representing the intellectual content
3.4 Structuring intellectual content in FaBiO
4.1 Linking between Schema.org and other vocabularies as shown on Linked Open Vocabularies
4.2 Word cloud of subject headings of ontologies in BARTOC
4.3 A search for ‘person’ within the Falcons Ontology Search
5.1 WebVOWL visualization of FOAF
5.2 First draft of the Bibliometric Metrics Ontology, with two classes and provisional relationships
5.3 Second draft of the renamed Bibliometric Indicators Ontology
5.4 Screenshot of Protégé 5.0 with the Entities tab selected
5.5 Properties associated with the Bibliometric Indicators Ontology
5.6 Bibliometric Indicators Ontology (BInO) – v. 0.1
6.1 Number of reusing vocabularies in rank order
Tables
3.1 Dublin Core Terms properties
3.2 Comparison of schema:Person with foaf:Person
5.1 Overview of steps in different ontology development methodologies
5.2 Different entities and concepts identified with different spotter algorithms
6.1 The most common properties associated with schema:Book
CHAPTER 1
What is an ontology?
Introduction
Today more data and information are being produced and shared than ever before;
data is streaming forth from new online social behaviours as well as high-specification
digital tools and instruments. If we are to extract the maximum value from this data
then we need to make use of the most appropriate tools and technologies. Ontologies,
formal representations of knowledge with rich semantic relationships, are one such
tool, and the focus of this book.
This chapter provides an introduction to ontologies, and considers their increasing
importance to information professionals. Following a brief overview of the growing
information overload and data deluge, the chapter considers the various definitions
that have been applied to the term ‘ontology’ and how ontologies differ from associated
and overlapping information concepts such as controlled vocabularies, taxonomies,
metadata and knowledge bases. Finally, the chapter considers the potential of ontologies
for information retrieval and discovering ‘undiscovered public knowledge’, and the role
of the librarian in the development, maintenance and curation of ontologies.
The data deluge and information overload
It is important to start with an understanding of the changing information landscape,
reminding ourselves of why we need new tools and technologies, and why it is no
longer acceptable to continue with the way things have always been done. We are
awash with a wide variety of information and data, but due to the tools that we are
currently using the value of much of the data is going to waste. As John Naisbitt (1984,
17) put it, ‘We are drowning in information, but starved for knowledge’.
Information is coming from a wide variety of sources. There has been an explosion
in the publishing and sharing of text across the whole of the communication spectrum,
from the informal to the formal. Traditional formal publications, such as books and
journals, have been joined by e-books and e-journals, with new publishing models
based on combinations of self-publishing and open access: the number of self-published
titles published in the USA rose from 85,468 titles in 2008 to 458,564 titles in 2013
(Bowker, 2014); whilst Chen (2014) estimated that the proportion of articles published
in the previous year available as open access had either passed or was very close to 50%.
In the middle of the formal–informal spectrum of publishing is the grey literature:
white papers, reports, technical papers and other, more informal, publications.
Whereas once this grey literature could be costly to create and had limited circulation,
desktop publishing soware and electronic publishing on the web have put it within
reach of a wide range of individuals and organizations. But the growth in these
numbers has been dwarfed by the growth of social media and other informal
publishing, where the associated numbers are oen in the hundreds of millions if not
billions: there are 1.49 billion active Facebook users each month (Facebook, 2015);
and over 500 million updates are sent on Twitter on a typical day (Twitter Engineering
Blog, 2013). No one can hope to read anything but the smallest fraction of this
information, even within the smallest of fields. ere is a need for new tools to help
with information retrieval, increasing precision without excessively impacting recall.
e narrative text has also been joined by increasing quantities of other text, such
as computer code and data sets, as well as rich media (i.e., images and video). Although
the lack of data sharing within the academic community has been labelled as the ‘dirty
little secret’ of open science data promotion (Borgman, 2012, 1059), the potential of
open data and open code to transform the rate of scientific progress (Hey, Tansley and
Tolle, 2009) and to encourage more open and accountable governments and encourage
citizens’ participation (Raman, 2012) has led to numerous open programs and policies.
Governments have signed up to open data charters promising data to be open by
default (Cabinet Office, 2013) and funding agencies and journals are increasingly
stipulating the need for open data and open code (e.g., Nature, 2014). It is not enough,
however, that data and code are open; they need to be findable and reusable by those
who want to make use of them too.
Whilst the growth of open data may have been slower than some would like, growth
in the number of images and videos shared has exploded: since its launch in 2010,
over 30 billion images have been shared on Instagram (Instagram, 2015); in May 2014
Snapchat reported 700 million photos sent per day (Techcrunch, 2014); and YouTube
counts billions of views every day as people watch hundreds of millions of hours of
video (YouTube, 2015). This media is also increasingly of higher quality, part of the
trend towards increasingly high specification digital tools and instruments. By 2007
83% of mobile phones had digital cameras, and over the years the specification
of these cameras has increased dramatically. By 2012 there were mobile phones with
41 megapixel cameras available, many times more powerful than the first camera
phones with 0.1–1 megapixels. The rise of increasingly high specification mobile phone
cameras reflects an increase in digital data collection at increasingly high-level
specifications across a wide range of disciplines and professions. Data per 360 degree
scan in computed tomography has gone from 57.6 kB in 1972 to 0.1–1 GB by 2010
(Kalender, 2011), whilst the rise in quality and fall in price have increased the number
of scans made and the areas outside medicine where computed tomography may be
used (e.g., archaeology and paleontology). When the first human genome was declared
complete in 2003 it had been a mammoth project taking over ten years and costing
US$3 billion; now we have entered the US$1000 genome era, where the cost of
sequencing the human genome has fallen to a price where it may play a role in
predictive and personalized medicine (Hayden, 2014). Projects such as the 100,000
Genomes Project are now sequencing thousands of genomes to identify genetic causes
for a wide range of human diseases (www.genomicsengland.co.uk/the-100000-genomes-project).
The content in any single human genome, however, is dwarfed by
the amount of data produced by big science projects such as the Large Hadron
Collider, where 19 gigabytes of data were created in the first minute and thirteen
petabytes (10^15 bytes) in the first year (Brumfiel, 2011). With so much data available,
and in increasingly large chunks, it becomes increasingly important that we are
accessing and downloading only the most relevant data for analysis.
As well as the data people are making a conscious decision to share, there are also
the vast digital trails we all increasingly leave as an increasing proportion of our lives
are lived online, and processes are digitized. Mobile phones can not only capture
pictures, but have built-in GPS and accelerometers to track location and movement.
Phone (or VOIP) calls can now simply be captured in their entirety, to be indexed or
played back in full at a later date if necessary. With the internet as the first port of call for
our information needs we are leaving trails of information about the searches we are
carrying out, the pages we are visiting and the links we are following. This information
is not only restricted to the log files of a single site, but may be aggregated by
advertising companies and content providers across multiple sites, enabling the
building of increasingly complex profiles on individuals for the tailoring of increasingly
personalized advertising and services.
As data storage and processing prices have fallen it is no longer necessary to be
selective in what we capture: increasingly we capture everything and then search the
captured information for what we need later, a process that is epitomized by
note-taking software designed for capturing ‘everything’ and by ideas such as life streaming.
Wearable technology, such as Google Glass, streamlines the process, as it is no longer
necessary to even go to the trouble of taking a smartphone from a pocket.
Data inevitably produces more data. The data that is captured is often indexed,
analysed, or combined to spawn more data. A file may be indexed, the contents analysed
according to different criteria (e.g., searching for patterns or antecedents), and be
accompanied by an ever growing quantity of descriptive, access, and preservation
metadata. As new questions are asked, and new methods of data analysis developed,
the same data set can continue to produce ever increasing quantities of data. We have
entered the era of Big Data. There are vast amounts of structured and unstructured data
available, and there are new challenges to ensure that we make use of this data.
Neither the exponential growth of science nor the problems of information overload
are particularly new problems. The growth and communication of science began to
be explored scientifically in the 1950s and 60s, and its exponential growth was one of
the subjects of Derek J. de Solla Price’s (1963) seminal Little Science, Big Science. The
history of scientific publishing can be seen as one of trying to help researchers
overcome the problem of information overload, first with publication of specialist
journals, then with specialist abstract and indexing services. However, the web has
provided a step-change in the publishing of information. When Ziman (1969) wrote
of the problem of having to wade through ‘tomes of irresponsible nonsense’ without
peer review, he would have had no idea how large these tomes of irresponsible
nonsense would become.
The web requires new tools and methods to help users engage with the information
that is available, and its brief history has already been one of rapid innovation: from
directories to search engines, from information searching to information discovery.
We no longer expect always to have to search for the information that we require, but
are instead alerted to information we may require, either through the filter of social
network sites or algorithmic suggestions (e.g., Google Scholar).
Those who successfully find ways of managing the information overload, and of
making use of the increasing quantities of data available, will have the competitive
advantage, whether that is the company gathering competitive intelligence on its rivals,
the researcher looking for new ways to encode and analyse data, or the international
non-governmental organization looking for efficiencies in sharing information.
Ontologies are one way of helping to tame some of the problems identified above,
providing a structure for this information in such a manner that it can be read
automatically and unambiguously, and shared more widely.
Defining terms
Whenever writing on a specialist subject it is generally advisable to start by defining
your terms, as all too often we follow the example of Humpty Dumpty when he says
in Lewis Carroll’s Through the Looking Glass: ‘When I use a word, it means just what
I choose it to mean – neither more nor less’. Even within the smallest of fields the same
term may have multiple meanings, some of which may be conflicting, a feature that is
true for both ‘ontology’ and concepts such as data, information and knowledge, which
the ontology is trying to encode.
Defining data, information, knowledge and wisdom
Most topics in information science can’t be discussed for long without running into the
terms data, information, or knowledge. Unfortunately the terms are notoriously hard
to define, and attempts at capturing knowledge within the library and information
science community (e.g., through knowledge management) have sometimes been
controversial for seemingly being little more than rebranding exercises.
Data, information, knowledge and wisdom are often conceptualized as a four-step
pyramid, from data at the bottom, through information and knowledge, to wisdom at
the top. This model was popularized by Ackoff (1989), but analysis of how the terms
are used (Rowley, 2007; Zins, 2007) finds them to be the subject of wide-ranging and
often overlapping definitions. Rather than thinking of them as distinct terms, it is more
useful to think of them as overlapping areas on a continuum from highly structured
and codified information at one end (data) to highly personal tacit understanding at
the other (wisdom).
Data is the ‘building blocks’ of information and knowledge (Kitchin, 2014),
although much of the information and knowledge that we have can seem quite
detached from the underlying data. Whereas the route from data to knowledge may
seem quite direct in the hard sciences, within the arts and the humanities the
relationships between abstract ideas and concepts that form information and
knowledge are less readily structured. Ontologies emerged as a way of capturing
knowledge, and codifying it in a highly structured manner as data, and this may be
applied to knowledge in any discipline.
. . . knowledge is inherently complex and the task of capturing it is correspondingly
complex. us, we cannot afford to waste whatever knowledge we do succeed in acquiring.
Neches et al., 1991, 54
Knowledge organization systems and ontologies
Ontologies are one of a number of different knowledge organization systems that have
been developed within the information profession to improve information discovery.
These knowledge organization systems are also variously known as ‘taxonomies’ or
‘controlled vocabularies’, depending on the sector within which they are used. Whereas
cultural heritage institutions err more towards ‘controlled vocabularies’, the
commercial sector tends to use the term ‘taxonomies’.
Harpring (2013, 13) defines a controlled vocabulary as: ‘an organized arrangement
of words and phrases used to index content and/or to retrieve content through
browsing or searching’, very similar to Hedden’s broad definition of a taxonomy in her
introduction to e Accidental Taxonomist:
. . . any knowledge organization system (controlled vocabulary, synonym ring, thesaurus,
hierarchical term tree, or ontology) used to support information/content findability,
discovery, and access.
Hedden, 2010, xxii
There is also a narrower use of the term taxonomy, in which it refers to a
hierarchical set of terms (Hedden, 2010; Harpring, 2013), such as the Linnaean
taxonomy of biological classification, most people’s first introduction to the term.
Within this work the term controlled vocabulary is preferred rather than taxonomy,
partly due to the potential for confusion caused by the dual meaning, but also due to
the author’s own background within library and information science.
Controlled vocabularies have both advantages and disadvantages. Advantages of a
controlled vocabulary include improved recall and greater precision through reducing
polysemy (van Hooland and Verborgh, 2014). Recall, the proportion of relevant
documents that are retrieved out of all the relevant documents in a collection, is
increased by the reduction of the number of terms associated with a particular concept.
For example, the Dublin Core Metadata Initiative Type Vocabulary is a controlled
vocabulary of 12 terms: collection, dataset, event, image (still image and moving
image), interactive resource, physical object, service, software, sound, and text.
Without a controlled vocabulary, a wide range of resources that adhere to each of these
types could have been referred to differently. The ‘text’ resource type includes letters,
books, theses, reports, newspapers, and poems, as well as a host of other texts primarily
designed for reading. To ensure the recall of all the associated text resources would
require entering all the possible terms.
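The effect on recall described above can be illustrated with a minimal Python sketch. The six records, their free-text labels and the mapping to controlled terms are all invented for illustration; only the idea of a single ‘text’ type covering letters, books, theses, reports and poems comes from the discussion above.

```python
# Hypothetical catalogue: record id -> free-text type label chosen by a cataloguer.
records = {
    1: "letter",
    2: "book",
    3: "thesis",
    4: "report",
    5: "photograph",
    6: "poem",
}

# Illustrative mapping from free-text labels to a controlled term. This is an
# assumption: a tiny subset inspired by the DCMI Type Vocabulary, not the real thing.
controlled = {
    "letter": "text",
    "book": "text",
    "thesis": "text",
    "report": "text",
    "poem": "text",
    "photograph": "still image",
}


def recall(retrieved, relevant):
    """Proportion of the relevant records that were actually retrieved."""
    return len(retrieved & relevant) / len(relevant)


# Every record whose label maps to 'text' is relevant to a search for texts.
relevant = {rid for rid, label in records.items() if controlled[label] == "text"}

# Searching the free-text labels for 'book' finds only one of the five texts.
free_text_hits = {rid for rid, label in records.items() if label == "book"}

# Searching on the controlled term finds all of them.
controlled_hits = {rid for rid, label in records.items() if controlled[label] == "text"}

free_text_recall = recall(free_text_hits, relevant)
controlled_recall = recall(controlled_hits, relevant)
print(free_text_recall, controlled_recall)
```

In this toy collection a search on one free-text label retrieves a fifth of the relevant records, whereas a search on the controlled term retrieves them all.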
Polysemy refers to multiple meanings for the same term. A controlled vocabulary
enables distinctions to be made between the different terms. For example, ‘Apple’ may
refer to the fruit, the technology company, a computer created by the technology
company, or the record label founded by the Beatles. Within the Library of Congress
Subject Headings the fruit has the term ‘Apples’ and the computer is ‘Apple computer’,
whilst in the Library of Congress Name Authority File the technology company is
‘Apple Computer, Inc.’ and the record label is ‘Apple Records’.
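A similar sketch can suggest how distinct headings improve precision. The five documents and the assumption that the searcher wants material on the fruit are hypothetical; the heading strings are those quoted above.

```python
# Hypothetical documents, each indexed with one of the headings quoted above.
docs = {
    "d1": "Apples",
    "d2": "Apple computer",
    "d3": "Apple Computer, Inc.",
    "d4": "Apple Records",
    "d5": "Apples",
}


def precision(retrieved, relevant):
    """Proportion of the retrieved documents that are actually relevant."""
    return len(retrieved & relevant) / len(retrieved)


# Assumption for this example: the searcher is interested in the fruit.
relevant = {"d1", "d5"}

# A plain keyword match on 'apple' retrieves every sense of the word.
keyword_hits = {doc for doc, heading in docs.items() if "apple" in heading.lower()}

# Matching the exact controlled heading retrieves only the intended sense.
heading_hits = {doc for doc, heading in docs.items() if heading == "Apples"}

keyword_precision = precision(keyword_hits, relevant)
heading_precision = precision(heading_hits, relevant)
print(keyword_precision, heading_precision)
```

The keyword search returns all five documents, of which only two are wanted; the heading search returns only the two that are wanted.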
There are also a number of disadvantages to controlled vocabularies: the cost, the
complexity, the slow evolution, and their subjectivity (van Hooland and Verborgh,
2014). Controlled vocabularies are not only expensive to create in the first place, but
also to maintain as new names and terminology enter a field.
In some situations the slow speed of change may be simply due to limitations in
resources; in other situations there may be conflict between the terminology of
conservative and progressive perspectives. For example, a comparison of the style
guides of le- and right-wing newspapers can be particularly enlightening regarding
their associated politics. Controlled vocabularies are inevitably subjective, and reflect
the world view of the creators at a particular time, and different people in more
enlightened times inevitably baulk at previous decisions, especially when there are
prohibitively large legacy costs to rectifying previous decisions. For example, the
Dewey Decimal Classification system is infamous for class 200 – religion, where seven
out of the ten divisions relate to the Bible or Christianity:
• 200 Religion
• 210 Philosophy & theory of religion
• 220 The Bible
• 230 Christianity
• 240 Christian practice & observance
• 250 Christian pastoral practice & religious orders
• 260 Christian organization, social work, & worship
• 270 History of Christianity
• 280 Christian denominations
• 290 Other religions.
Although there have been attempts to extend many of the other religions in DDC in
recent years, particularly Islam (Idrees, 2012), the Dewey legacy nonetheless supports
the perception of it being Christian-centric.
Some of the most widely used forms of controlled vocabularies within the
information profession are subject headings, authority files and thesauri. It is worth
considering each of these types of controlled vocabulary, and their limited nature, for
comparison with the more expressive nature of ontologies:
Subject headings are a controlled set of terms designed to describe the subject or
topic of a resource, whether it is a book, article, or data set. Popular examples include
the Library of Congress Subject Headings (http://id.loc.gov/authorities/subjects.html)
and the Medical Subject Headings (MeSH) (www.nlm.nih.gov/mesh/meshhome.html).
Subject heading lists ensure that the same term is used to describe a work, rather
than multiple similar terms.
Authority files are sets of preferred headings. As well as preferred subject headings,
there may be preferred organization names, person names, and place names. History is
replete with people, places, and organizations that have different names at different times,
and successful information retrieval requires the consistent use of terms and
relationships between the alternatives: those looking for information on Mark Twain
may also want to retrieve information on Samuel Clemens, whilst those researching
Constantinople may also wish to retrieve information on Istanbul. Well known examples
include the authority files of the major national libraries (e.g., Library of Congress, British
Library and Bibliothèque Nationale de France). VIAF (Virtual International Authority
File) (http://viaf.org) is a project from several national libraries designed to link together
the separate authority files of the libraries into one virtual authority file.
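The role of an authority file in retrieval can be illustrated with a minimal Python sketch. The two headings and their variants below are hand-made for illustration (the form of the preferred headings is an assumption, not copied from any national library's file); the point is only that every variant name resolves to a single preferred heading.

```python
# Hypothetical, hand-made authority file: each preferred heading lists its variants.
# The heading forms are illustrative assumptions, not real authority records.
AUTHORITY_FILE = {
    "Twain, Mark": ["Mark Twain", "Samuel Clemens"],
    "Istanbul": ["Constantinople"],
}


def build_lookup(authority_file):
    """Map every name (preferred or variant), lower-cased, to its preferred heading."""
    lookup = {}
    for preferred, variants in authority_file.items():
        lookup[preferred.lower()] = preferred
        for variant in variants:
            lookup[variant.lower()] = preferred
    return lookup


LOOKUP = build_lookup(AUTHORITY_FILE)


def preferred_heading(name):
    """Return the preferred heading for a name, or None if the name is unknown."""
    return LOOKUP.get(name.strip().lower())
```

A search for either 'Samuel Clemens' or 'Mark Twain' would then be run against the same heading, so resources catalogued under either name are retrieved together.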
A thesaurus, like a taxonomy (in the narrower sense of the term), provides
hierarchical relationships between concepts (i.e., broader and narrower terms), as well
as equivalence and associative relationships. A typical entry in a thesaurus might
include all three types of relationship, as in the example below for information science:
Information Science
    Broader terms:   Sciences
    Narrower terms:  Computer Science
                     Library Science
    Use instead of:  Informatics
                     Information Industry
    Related terms:   Information Processing
                     Information Skills
                     Knowledge Management
                     Knowledge Representation
                     Library Education
The above example is based on ‘Information Science’ in the ERIC (Education
Resources Information Center) thesaurus (http://eric.ed.gov). The relationships within
a thesaurus enable a reader to traverse from one concept to another more easily,
helping to find related content. Other well known examples of thesauri include the
Getty Thesaurus of Geographic Names (www.getty.edu/research/tools/vocabularies/tgn),
the Art & Architecture Thesaurus (www.getty.edu/research/tools/vocabularies/aat),
and the Thesaurus for Graphic Materials (www.loc.gov/pictures/collection/tgm)
from the Library of Congress.
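The three types of thesaurus relationship can be sketched as a small Python data structure. The entry reproduces the ‘Information Science’ example; the dictionary layout, key names and the two helper functions are assumptions made for illustration rather than the format used by ERIC.

```python
# One thesaurus entry, keyed by preferred term. The key names are illustrative.
THESAURUS = {
    "Information Science": {
        "broader": ["Sciences"],
        "narrower": ["Computer Science", "Library Science"],
        "use_instead_of": ["Informatics", "Information Industry"],
        "related": [
            "Information Processing",
            "Information Skills",
            "Knowledge Management",
            "Knowledge Representation",
            "Library Education",
        ],
    }
}


def resolve(term, thesaurus=THESAURUS):
    """Return the preferred term for `term`, following 'use instead of' entries."""
    if term in thesaurus:
        return term
    for preferred, entry in thesaurus.items():
        if term in entry["use_instead_of"]:
            return preferred
    return None


def expand(term, relations=("narrower", "related"), thesaurus=THESAURUS):
    """Return the preferred term plus the terms reached by the chosen relationships."""
    preferred = resolve(term, thesaurus)
    if preferred is None:
        return []
    terms = [preferred]
    for relation in relations:
        terms.extend(thesaurus[preferred][relation])
    return terms
```

Calling `expand("Informatics", relations=("narrower",))` first resolves the non-preferred term to ‘Information Science’ (equivalence) and then adds its narrower terms (hierarchy), which is one way a search might be broadened.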
Today controlled vocabularies should also be compared with tagging, which came
to prominence with the rise of social media and social networking sites. The vast size
and diversity of the web, and its users, drove the need for an approach to classification
that was equally global and diverse in outlook, and could be applied by members of
the public as well as information professionals. Tagging, the application of
uncontrolled terms to online resources, has been incorporated into a large number of
services with varying degrees of success. Whilst many of the sites for bookmarking
web resources (e.g., del.icio.us) have fallen out of favour, tagging nonetheless continues to
have an important role within sites that are focused around user-generated content:
for example, the tagging of images in Flickr and Instagram, and the use of hashtags in
Twitter (so called because of the ‘#’ used to denote the tag). In comparison to a
controlled vocabulary, tagging is likely to have reduced recall and lack precision, but
where the scale of the web is concerned there may be few alternative options.
An ontology is like a thesaurus, in that there are multiple types of relationship
between terms, but it can be non-hierarchical, with a far richer set of relationships,
and typically holds a far greater variety of information. The richness of the
relationships and information means that it is not only suitable for indexing resources,
but may be a knowledge base for knowledge discovery in its own right.
Defining an ontology
Ontologies first emerged in the Artificial Intelligence (AI) community, borrowing the
term ‘ontology’ from philosophy, where ontology is concerned with the study of being
or existence. e term was adopted by the AI community in the 1980s for
computational models that can enable automated reasoning (Gruber, 2009), having
recognized that ‘capturing knowledge is the key to building large and powerful AI
systems’ (Neches et al., 1991, 37).
Today the most widely used definition of ontology is Gruber’s (1993, 199)
definition: ‘an explicit specification of a conceptualization’. This has been criticized
for its broadness, incorporating both simple glossaries and ‘logical theories couched
in predicate calculus’ (Gruber, 2009, 1964), and also for its focus on subjective
concepts rather than entities as they exist in reality (Smith, 2004). Nevertheless, an
ontology might be considered a near-synonym with knowledge organization system
or taxonomy (in the broad sense). This continuum from informal vocabularies to
formal ontologies has been reiterated by the World Wide Web Consortium (W3C)
in their introduction to ontologies: ‘There is no clear division between what is
referred to as “vocabularies” and “ontologies”’ (W3C, 2013). The broadness of the
definition is an important part of the inclusiveness of ontologies for information
professionals. It is not just a subject for the AI community, but rather all those
involved in the codifying of knowledge, including librarians, archivists, museum
workers and domain experts. Nonetheless, a more specific definition is useful for
distinguishing between those ontologies that are the primary focus of this book and
other examples of controlled vocabularies.
Within most definitions of ontologies the distinctive feature of ontologies is the
richness of the relationships between terms. For Hedden (2010, 12), an ontology ‘can
be considered a type of taxonomy with even more complex relationships between
terms than in a thesaurus . . . it aims to describe a domain of knowledge, a subject
area, by both its terms . . . and their relationships’. Within an ontology a person does
not have to just be related to an event: they may be present at an event, organize an
event, take part in an event, be an authority on an event, or possibly instigate an event.
An example of the richness of the information associated with a particular entity
in an ontology is provided below with an author record:
Ranganathan, S.R. (Shiyali Ramamrita), 1892-1972
    event:               1892
                         1972
    family name:         Ranganathan
    given name:          S.R.
    has created:         Colon classification / S.R. Ranganathan
                         The five laws of library science / S.R. Ranganathan
    name:                S.R. Ranganathan
    type:                Agent
                         Person
    has contributed to:  An essay in personal bibliography / A.K. Das Gupta
    same as:             49268668
The above record is based on the British National Bibliography record for S.R.
Ranganathan. It expresses two types of relationship between the author and his
associated works: has created, and has contributed to. With the exception of the
name, family name, and given name values, each of the properties on this record
links to another record for the particular instance, for example, The five laws of
library science:
The five laws of library science / S.R. Ranganathan
    bnb:                GB6417211
    description:        2nd ed originally published (B58-927) Madras Library Association; Blunt 1958.
    edition statement:  2nd ed. reprinted (with minor amendments)
    type:               BibliographicResource
    creator:            Ranganathan, S.R. (Shiyali Ramamrita), 1892-1972
    is part of:         Ranganathan series in library science; no 12
    language:           eng
    publication event:  Asia Publishing House, 1964
    same as:            GB6417211
    subject:            020
Again, many of the properties have their own associated records, creating a huge graph
of related resources, joining previously disparate authority lists and classification
systems. Figure 1.1 shows the graph produced by just the author and instance records
mentioned above.
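The way such records join up into a graph can be sketched by writing a few of the statements above as subject, property and value triples. This is a simplified illustration only: the property labels are those displayed in the records, whereas the published data identifies entities and properties with URIs, and the helper functions `describe` and `neighbours` are hypothetical names introduced here.

```python
AUTHOR = "Ranganathan, S.R. (Shiyali Ramamrita), 1892-1972"
FIVE_LAWS = "The five laws of library science / S.R. Ranganathan"
COLON = "Colon classification / S.R. Ranganathan"

# A subset of the statements in the two records, as (subject, property, value).
TRIPLES = [
    (AUTHOR, "family name", "Ranganathan"),
    (AUTHOR, "given name", "S.R."),
    (AUTHOR, "type", "Person"),
    (AUTHOR, "has created", COLON),
    (AUTHOR, "has created", FIVE_LAWS),
    (FIVE_LAWS, "type", "BibliographicResource"),
    (FIVE_LAWS, "creator", AUTHOR),
    (FIVE_LAWS, "language", "eng"),
    (FIVE_LAWS, "subject", "020"),
]


def describe(subject, triples=TRIPLES):
    """Collect all the property values recorded for one subject."""
    record = {}
    for s, p, o in triples:
        if s == subject:
            record.setdefault(p, []).append(o)
    return record


def neighbours(subject, triples=TRIPLES):
    """Values of `subject` that have a record of their own in the data."""
    subjects = {s for s, _, _ in triples}
    return sorted({o for s, _, o in triples if s == subject and o in subjects})
```

Here `neighbours(AUTHOR)` leads to the record for The five laws of library science, and `neighbours(FIVE_LAWS)` leads back to the author, which is the kind of link followed when the graph is browsed or visualized.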
Explicit specifications of conceptualizations are important if computers are to
successfully communicate with one another without ambiguity, and there is less
ambiguity and more scope for drawing inferences if the explicit specifications build
upon one another in a more formal manner. ‘Formal’ rather than ‘explicit’ is used in
a number of definitions of ontologies: ‘An ontology is a formal specification of a shared
conceptualization’ (Borst, 1997, 11); ‘Ontologies are formalized vocabularies of terms,
often covering a specific domain and shared by a community of users. They specify
the definitions of terms by describing their relationships with other terms in the
ontology’ (W3C, 2012). Others, however, have preferred to combine the two terms:
‘An ontology is a formal and explicit specification of a shared conceptualization’ (Jakus
et al., 2013, 29). Whilst a formal ontology would seem to necessitate an ontology being
explicit, an explicit ontology does not necessarily need to be particularly formal. The
use of relationships in defining terms is a particularly important part of the semantic
web due to its distributed nature, with organizations likely to be adhering to different
vocabularies.
Figure 1.1 Section of the British National Bibliography graph visualized using RDF Gravity
As well as the richness of the relationships and their explicitness, there is another
distinctive feature of ontologies that is widely acknowledged: that they should be a
representation of the structure of knowledge, not just a set of indexing terms. Willer
and Dunshire (2013, 112) define an ontology as ‘a formal representation of the
structure of knowledge and information’, and Allemang and Hendler (2011, 1) point
out that semantic models are sometimes called ontologies.
Although Harpring (2013) acknowledges certain similarities between thesauri and
taxonomies and ontologies, she considers them to have fundamentally different goals:
…ontologies use strict semantic relationships among terms and attributes with the goal of
knowledge representation in machine-readable form, whereas thesauri provide tools for
cataloguing and retrieval.
Harpring, 2013, 26
The goals of knowledge representation and information retrieval do not have to be
mutually exclusive, however, and the same ontology may be used for both. In fact the
richness of the relationships may allow for far richer querying and information
retrieval.
Within this book a fairly broad definition of ontology, albeit not quite as broad as
that of Gruber (1993), is taken:
An ontology is a formal representation of knowledge with rich semantic relationships
between terms.
Such ontologies may be more or less formal, depending on the extent to which they
define terms with relation to one another and incorporate axioms, and no distinction
is made as to whether an ontology is designed either for information retrieval or as a
knowledge base. Such a simple definition, however, glosses over the parts that
comprise an ontology.
The parts of an ontology
The definition of an ontology provided above is designed to be inclusive, although
it is sometimes necessary to distinguish between different ontologies that fall within
this definition. As with Willer and Dunshire’s (2013) definition, the term is sometimes used
to distinguish the structure of the ontology from the instances. For example, a book
ontology might not be expected to include any information about particular books,
but rather provide the necessary structure for describing books and the relationships
between them and associated types of objects. In other situations an ontology might
refer to both the structure and the instances, in much the same way as a thesaurus
of place names includes the names of places, not just the possible relationships
between them.
Whether an ontology developer is interested primarily in classes or instances may
be expected to differ considerably depending on the discipline. For example, Arp,
Smith and Spear, who are primarily interested in the representation of scientific
research, believe an ontology is ‘concerned with representing universals’ (2015, 17).
However, within the arts and humanities it may be the particular facts that are
important rather than the general theories, and the general theories do not necessarily
have widespread agreement.
The W3C Library Linked Data Incubator Group (2011) makes a distinction between
metadata element sets and value vocabularies within data sets, with the metadata
element set providing the structure for holding the information (e.g., Dublin Core
element set) and the value vocabularies providing the values for these elements (e.g.,
an authority list of author names or place names). This book also distinguishes
between the structure and the values of ontologies, although it uses slightly different
terminology:
• ontology element set
• ontology instances.
The ontology element set and ontology instances combine to form an ontology data
set or knowledge base.
The term metadata is one that is already overburdened within the information
profession, and may cause confusion when distinguishing between more traditional
approaches to cataloguing and the rich semantic nature of ontologies. Metadata is also
strongly associated with a particular type of record within the information profession
(e.g., a bibliographic record describing a book), and it is important that ontologies are
more inclusive than this.
‘Instances’ is a more inclusive term than ‘value vocabulary’, which seems primarily
appropriate for existing controlled vocabularies, whereas an instance may be used to
refer to any concept or thing within an ontology. A concept is generally an abstract
idea that is then given a label, some of which are more concrete than others (e.g., ‘Paris’
may be considered a more concrete concept than ‘Love’), but which are nonetheless
abstract. Concepts form the basis of most traditional knowledge organization systems,
but ontologies can also deal with more concrete things. As well as the abstract idea of
Paris, the one that each of us holds in our minds, with associations of romantic
getaways, literary salons or fashion shows, there is the actual physical city with specific
boundaries, activities and population at any particular moment. Concepts and things
are often blurred within ontologies, but there is nonetheless a wide range of
information associated with any particular concept or thing that is not part of many
controlled vocabularies. Following Hedden’s (2010, 69) use of ‘term record’, the term
‘instance record’ is used here to describe all the pieces of information associated with a
particular concept or thing (a ‘resource record’ within the context of the semantic web).
For ease of reading, once an ontology element set or an ontology data set has been
introduced as such in this book, the subsequent text may simply refer to it as an
‘ontology’ or either an ‘element set’ or a ‘data set’. Ontologies differ greatly, but all
provide a formal representation of knowledge with rich semantic relationships
between terms.
Types of ontology
Just as we reach a point where the reader is likely to believe they have an understanding
of what an ontology consists of, it is necessary to introduce a range of additional
terminology that has been adopted to describe types of ontologies. Here we briefly
describe four of them: lightweight ontologies; upper ontologies; application profiles;
and ontology languages.
Usability is an important consideration when it comes to the creation of ontologies,
but as Murdock, Buckner and Allen (2012) ask: ‘. . . usability by whom or by what?’
Some have argued that ontologies are ‘unsuited to the rough-and-tumble of real-world
applications once they get beyond a certain level of complexity’ (Brewster and O’Hara,
2007, 565). Lightweight ontologies are ontologies that are designed for ease of use,
processable by machines but also accessible to humans, focusing on core classes (i.e.,
types of entities) and properties rather than constraints and axioms (Rocha da Silva
et al., 2014). ese may be particularly important in the humanities, where concepts
are far less concrete or widely agreed upon. It is lightweight ontologies that have the
widest use, especially on the semantic web, and are the type of many of the ontologies
within this book.
An upper ontology (also known as a foundation ontology) is a general all-inclusive
ontology that can theoretically connect all others. Such an ontology can aid ontology
interoperability and alignment, and provide a starting point for developing more
specific domain ontologies (Opalički and Lovrenčić, 2012). Examples of upper
ontologies include Suggested Upper Merged Ontology (SUMO) (www.adampease.org/OP),
OpenCyc (www.cyc.com/platform/opencyc) and the Basic Formal Ontology
(http://ifomis.uni-saarland.de/bfo). Whether a single, universal ontology is feasible or
desirable for representing the myriad of views and perspectives from different domains
is open to debate, and is often ignored in the linked data approach to a semantic web.
In this work the focus is less on upper ontologies, and more on what may be referred
to as middle-level ontologies, those that are not designed to be universal but are
nonetheless designed to accommodate data from a large number of domains. These
include Europeana Data Model and CIDOC-CRM, both of which are returned to in
Chapter 3, along with one upper ontology, the Basic Formal Ontology.
Application profiles have been defined as: ‘. . . schemas which consist of data
elements drawn from one or more namespaces, combined together by implementors,
and optimized for a particular local application’ (Heery and Patel, 2000). They reflect
the practical application of ontologies to meet real-world needs that may differ
considerably from strict standards described in the original documentation.
Increasingly, however, attempts have been made to accommodate the differences in
the requirements of the standard makers and the implementers. Dublin Core Terms
were developed with application profiles and the semantic web in mind (Baker, 2012),
whilst Resource Description and Access (RDA) has both constrained and
unconstrained properties, with the unconstrained properties being independent of
the overarching Functional Requirements for Bibliographic Records (FRBR) model
and having no explicit range or domain. Dublin Core Terms, RDA, and the FRBR
model are all returned to in Chapter 3.
There are also a range of ontology languages, or meta-ontologies (Stewart, 2011,
126), ‘formal languages used to construct ontologies’ (Kalibatiene and Vasilecas,
2011). Each of these languages may allow for different levels of expressiveness and
comprehensiveness, and there have been a number of comparisons of the different
languages over the years (e.g., Gómez-Pérez and Corcho, 2002; Kalibatiene and
Vasilecas, 2011). Whilst there are a number of traditional ontology languages and
web-based ontology languages, and there will undoubtedly be new entrants into the
market in the future, the ontology languages focused on in this book are primarily
the W3C recommendations for the semantic web: Resource Description Framework
(RDF), RDF Schema (RDFS), and Web Ontology Language (OWL). In Warren et
al.’s (2014) survey of ontology use, of the 65 respondents answering the question of
which language they used, 58 stated OWL, 56 RDF and 45 RDFS. There are well
known ontologies that have been published in other languages, e.g., SUMO was
written in SUO-KIF, itself a variation of the Knowledge Interchange Format (KIF)
(Niles and Pease, 2001), and OpenCyc makes use of CycL (Matuszek et al., 2006), but
the potential of the semantic web for bringing together distributed data means that
there is often a semantic web version of the ontologies too. The structuring of the
semantic web is returned to in more detail in Chapter 2.
Ontologies, metadata and linked data
The definition of an ontology provided above overlaps with both metadata and linked
data, and it is important to recognize the similarities and the differences between the
different concepts, and how they overlap.
Metadata is generally defined as ‘data about data’, and information professionals
within cultural heritage institutions have traditionally focused heavily on the creation
of metadata to describe the objects within their respective collections. Extensive
standards and methodologies have independently been created for cataloguing and
classifying objects within each type of institution, whether archive, museum, or library,
with the metadata elements reflecting those aspects considered most important within
the community’s culture. is may be the importance of the fonds to the archival
community, reflected in the ability of Encoded Archival Descriptions (EADs) to not
only describe an archive collection but also increasingly smaller parts of the collection
in a hierarchical fashion, or through the extensive history of a specific object that is
possible through the Categories for the Description of Works of Art (CDWA).
The traditional distinction between metadata and data breaks down, however, as we
move from real-world objects to digital objects, and many (often computer scientists)
will say there’s no point in distinguishing between the two, it’s all just data. As van
Hooland and Verborgh (2014, 3) put it: ‘Just as you can always add an extra Lego piece
on top of another, you can always add another layer of metadata to describe metadata.’
Within this work the term metadata is limited to its traditional sense, a set of
elements used to describe a distinct resource, not a part of the resource itself. Where
the resource that is being published is a dataset, and if the dataset has been published
as linked data and the metadata has been published as linked data, then it may be
meaningless to distinguish between the two.
Linked data is the best practice for publishing structured data on the web (van
Hooland and Verborgh, 2014), which is generally agreed to be in accordance with the
four linked data principles set out by Tim Berners-Lee:
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards
(RDF*, SPARQL)
4. Include links to other URIs so that they can discover more things
Berners-Lee, 2006
Linked data is an approach to data interoperability which offers an alternative to
having an upper ontology (Murdock, Buckner and Allen, 2012). It cuts through the
complexity of understanding the relationships between different terms for types of
object and attributes used within different data sets by allowing the direct linking
between the terms and instances rather than understanding the relationship via an
upper ontology. It is not necessary to know that ‘watercolourist’ and ‘oil painter’ are
linked via the concepts ‘painter’ or ‘artist’ – instead the person J. M. W. Turner in one
data set may be linked to the person J. M. W. Turner in the other directly.
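A minimal Python sketch of this direct linking follows. The two data sets, their identifiers (`a:turner`, `b:person42`) and the `SAME_AS` list are hypothetical and stand in for the URIs and equivalence links that published linked data would use; no shared upper ontology of occupations is consulted.

```python
# Two hypothetical data sets describing the same painter with different local
# identifiers and different occupation terms.
GALLERY_A = {"a:turner": {"label": "J. M. W. Turner", "occupation": "watercolourist"}}
GALLERY_B = {"b:person42": {"label": "J. M. W. Turner", "occupation": "oil painter"}}

# Direct links between instances in the two data sets (an assumed, hand-made list).
SAME_AS = [("a:turner", "b:person42")]


def linked_identifiers(identifier, links=SAME_AS):
    """Return every identifier reachable from `identifier` through same-as links."""
    found = {identifier}
    changed = True
    while changed:
        changed = False
        for left, right in links:
            # Extend the set when exactly one end of a link is already known.
            if (left in found) != (right in found):
                found.update((left, right))
                changed = True
    return found


def merged_occupations(identifier):
    """Gather the occupations recorded for one entity across both data sets."""
    records = {**GALLERY_A, **GALLERY_B}
    return sorted(
        records[i]["occupation"] for i in linked_identifiers(identifier) if i in records
    )
```

Starting from either identifier, `merged_occupations` returns both ‘oil painter’ and ‘watercolourist’ without any statement about how those two terms relate to ‘painter’ or ‘artist’.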
It is important to recognize, however, that not all ontologies are encoded as linked
data, and not all linked data is an ontology. An ontology does not have to be published
on the web or necessarily follow the graph data model of the semantic web’s Resource
Description Framework (RDF); instead it may only be used on a private network (or even
a single computer) and follow a proprietary format. Alternatively, a wide variety of
data may be published as linked data without being an ontology, although when linked
data may be considered an ontology and when it isn’t is open to debate.
Two factors that may be used in distinguishing between linked data that is an
encoded ontology and linked data that isn’t an ontology are dynamism and
exhaustiveness. An ontology is a formal representation of knowledge – it is not the
same as a dynamic database of information; whereas the library catalogue may be
considered an ontology data set or knowledge base, with rich relationships between
authors and their works, the circulation aspect of an integrated library system would
not be. ‘Formal’ also suggests that an ontology is not an ad hoc piece of data marked
up as linked data; marking up the relationships between all the members of the
Pre-Raphaelite Brotherhood in accordance with a particular element set might be
considered an ontology, whereas someone marking up the contact details on their
website would not be (although the element set used to make up the contact details
might be).
What can an ontology do?
Hedden (2010, 15) identifies three principal purposes for taxonomies, each of which
equally applies to ontologies: indexing support, retrieval support, and organization
and navigation support. In addition to which, an ontology can also act as a
knowledge base.
Indexing support
Despite advances in automatic indexing, human cataloguing and indexing continues
to be an important part of the information profession, and controlled vocabularies
can ensure consistency in the terms that are applied. An ontology enables an indexer
to think more broadly about the terms that are applied, with a wider range of
associated terms applicable.
Retrieval support
Information retrieval is the other side of indexing and cataloguing; the same terms
that are used to index a document can then be used to retrieve it. The ontology,
however, has a couple of advantages over other controlled vocabularies: less ambiguity
and the potential of complex queries and inference. All controlled vocabularies are
designed to be as unambiguous as possible, distinguishing between potentially
confusing terms through the use of subdivisions, attributes, and scope notes. They are,
nonetheless, subject to human error, both in their design and their implementation,
and a richer set of relationships with other terms offers less room for ambiguity.
The rich set of relationships within ontologies also allows for more complex queries
to be created for information retrieval. Whereas traditional search is built upon
Boolean operators and faceted search, ontologies allow for increasingly complex graph
matching.
Ontologies can be represented by a graph consisting of concepts and the
relationships between them; for example, Figure 1.2 shows the twelve apostles of Jesus
and the relationships between them as a graph.
Figure 1.2 A graph of Jesus and his twelve apostles
[Graph nodes: Jesus, Bartholomew, Matthew, James, Philip, Thaddaeus, Simon, Thomas, Judas Iscariot, Simon, Andrew, James, John. Relationships: has Apostle, has Brother.]
For the sake of ease, within Figure 1.2 each of the people is represented by his name
rather than a unique identifier which has a name as an attribute, and the fact that Simon
(brother of Andrew) was subsequently called Peter is overlooked. This simple graph only
includes two types of relationship, ‘has Apostle’ and ‘has Brother’, and yet already graph
matching enables the retrieval of results for more complex queries. If such an ontology
had been used in the cataloguing of a set of religious texts that have been variously
ascribed to Jesus and his apostles it would now be possible to retrieve results as well as
information from the ontology, by matching query graphs against the knowledge graph
and including variables for the unknown data that the query should retrieve.
Matching the following graph against the graph of Jesus and his twelve apostles
would retrieve all the apostles for the unknown VARIABLE:
[Query graph: Jesus, has Apostle, VARIABLE]
Simon, Andrew, James and John would be found to match the unknown VARIABLE
1 (and VARIABLE 2) in the following graph:
[Query graph: Jesus, has Apostle, VARIABLE 1, VARIABLE 2, has Brother]
The graph matching doesn’t have to be built on explicit relationships alone, but may
also be built on inferred relationships.
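Matching a query graph containing variables against a knowledge graph can be sketched in a few lines of Python. Several details are assumptions made for the sketch: the second James and second Simon are given a ‘(2)’ suffix so that node names are unique, ‘has Brother’ is recorded in both directions, variables are written with a leading ‘?’, and the second query is assumed to require that both variables are apostles of Jesus. It is a naive matcher for illustration, not the algorithm of any particular query engine.

```python
APOSTLES = [
    "Bartholomew", "Matthew", "James (2)", "Philip", "Thaddaeus", "Simon (2)",
    "Thomas", "Judas Iscariot", "Simon", "Andrew", "James", "John",
]

# The knowledge graph as (subject, relationship, object) triples.
GRAPH = [("Jesus", "has Apostle", name) for name in APOSTLES]
for a, b in [("Simon", "Andrew"), ("James", "John")]:
    GRAPH.append((a, "has Brother", b))
    GRAPH.append((b, "has Brother", a))  # assumption: stored in both directions


def is_variable(term):
    """Variables are written with a leading question mark, e.g. '?x'."""
    return term.startswith("?")


def match(patterns, triples, binding=None):
    """Yield each set of variable bindings under which every pattern is in the graph."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for p_term, t_term in zip(first, triple):
            if is_variable(p_term):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            yield from match(rest, triples, candidate)


APOSTLE_QUERY = [("Jesus", "has Apostle", "?x")]
BROTHER_QUERY = [
    ("Jesus", "has Apostle", "?a"),
    ("Jesus", "has Apostle", "?b"),
    ("?a", "has Brother", "?b"),
]
```

Running `match(APOSTLE_QUERY, GRAPH)` yields one binding per apostle, while `match(BROTHER_QUERY, GRAPH)` binds `?a` only to Simon, Andrew, James and John.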
‘Inference’ refers to the drawing of new relationships from a data set based on
existing relationships and a set of rules. At its simplest it may be an understanding of
what type of thing an entity is, based on its relationship with something else. For
example, if in a bibliographic ontology the information <Charles Dickens><has
written><Hard Times> is encoded, and the relationship ‘has written’ is only being
used to express the relationship between an author and a work, then the fact that
Charles Dickens is an author and Hard Times is a work can be inferred from the
information.
More extensive rules can allow for greater inference. For example, a genealogy
ontology may encode two facts, that <Adam><has Son><Cain>, and <Adam><has
Son><Abel>. If, as is normally the case, the <has Son> is only used where the target
is a male, it may be inferred that both Cain and Abel are male. An additional rule
stating that sons of the same parent are brothers would also allow this information to
be inferred.
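The two kinds of rule described above can be sketched in Python. The rule table `DOMAIN_RANGE` and the function `infer` are hypothetical names introduced for this illustration; ontology languages express such rules declaratively, whereas here they are simply hard-coded.

```python
FACTS = {
    ("Charles Dickens", "has written", "Hard Times"),
    ("Adam", "has Son", "Cain"),
    ("Adam", "has Son", "Abel"),
}

# For each relationship: (type implied for the subject, type implied for the object).
# None means that the relationship implies nothing about that position.
DOMAIN_RANGE = {
    "has written": ("Author", "Work"),
    "has Son": (None, "Male"),
}


def infer(facts):
    """Return statements that follow from the facts but are not stated in them."""
    inferred = set()
    # Rule 1: the relationship used tells us what type of thing each entity is.
    for s, p, o in facts:
        subject_type, object_type = DOMAIN_RANGE.get(p, (None, None))
        if subject_type:
            inferred.add((s, "type", subject_type))
        if object_type:
            inferred.add((o, "type", object_type))
    # Rule 2: two different sons of the same parent are brothers.
    for s1, p1, o1 in facts:
        for s2, p2, o2 in facts:
            if p1 == p2 == "has Son" and s1 == s2 and o1 != o2:
                inferred.add((o1, "has Brother", o2))
    return inferred - facts
```

From three stated facts, `infer(FACTS)` derives that Charles Dickens is an author, that Hard Times is a work, that Cain and Abel are male, and that each is the brother of the other.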
Organization and navigation support
‘Organization and navigation support’ is about the ability to find information through
browsing rather than searching, following the relationships between terms to find
related concepts. For example, the online store Amazon.com has an extensive
taxonomy through which a shopper may browse all the way from the general to the
highly specific:
Books
Politics & Social Sciences
Social Sciences
Library & Information Science
Library Management
Without much experience of a particular taxonomy it may be difficult to find the
desired subject in an extensive taxonomy. Different people will inevitably make
different decisions about the structure of a taxonomy for similar materials, and users
of the taxonomy will have to learn the taxonomists’ idiosyncrasies. For example, on
the Amazon.co.uk site ‘Library & Information Sciences’ is found under ‘Reference’
rather than ‘Social Sciences’ (as it is on Amazon.com) and has no further subdivisions:
Books
Reference
Library & Information Sciences
– whilst Amazon.ca contains 18 narrower terms under ‘Library & Information Science’
in comparison to Amazon.com’s five. Although a taxonomy may be wrong in many
different ways, there is no single correct taxonomy.
An ontology has a more complex set of relationships than a thesaurus, which creates
additional challenges for enabling the browsing of resources. Whereas a thesaurus
may be kept separate from the content, e.g., running down the side of the page, an
ontology may be incorporated throughout the structure of the page. For example, the
BBC has developed the Programmes Ontology element set (www.bbc.co.uk/ontologies/po)
to facilitate access to the vast data set about the corporation’s
programme output and associated individuals; this information is encapsulated within
a whole web page rather than one small part of it. Visiting the regular URI for the long
running radio soap opera The Archers (www.bbc.co.uk/programmes/b006qpgr) will
provide a typical HTML page of information about the series; adding .rdf to the end
(www.bbc.co.uk/programmes/b006qpgr.rdf) will provide the underlying information
in a machine-readable format.
The ontology as a knowledge base
An additional purpose, unique to ontologies amongst controlled vocabularies, is the
ontology as a knowledge base. The rich web of knowledge within an ontology and the
ability of inferences to be drawn on existing relationships mean that ontologies can
be a rich store of knowledge, not just a means to retrieve knowledge from resources
indexed with a particular ontology. Certain general ontologies, such as DBpedia, draw
together a wide range of information into one data set, and may be queried to produce
results in a form that has not been compiled previously.
Current approaches to information retrieval are limited in their ability to discover
new information (Stock et al., 2012). The use of an ontology as a knowledge base, as
well as increasingly sophisticated information retrieval, is also likely to help with the
discovery of undiscovered public knowledge. Undiscovered public knowledge is the
idea that the discovery of new knowledge does not have to be based on the
investigation of the real world of physical objects and events, but can also come
through the interrogation of objective knowledge. Swanson (1986) identifies three
forms this undiscovered public knowledge may take:
1) A hidden refutation: the hypothesis and its refutation may not both be known to any
one person.
2) A missing link in the logic of discovery: if no one person knows that A causes B, and B
causes C, then the inference that A causes C cannot be known.
3) Combination of multiple tests: a meta-analysis of multiple weak tests may nonetheless
provide a strong result.
Each of these is fundamentally an information retrieval problem: ensuring that both
hypothesis and refutation are found by a search; ensuring subsequent statements are
found; ensuring that all available tests of sufficient quality are identified. Ontologies
can undoubtedly improve information and knowledge retrieval, and help with the
mining of undiscovered public knowledge, in an increasingly automated fashion.
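The second of Swanson’s forms, the missing link, can be sketched in a few lines of Python. This is an illustration only: the two statements are hypothetical, and the assumption that ‘causes’ can always be chained from one statement to the next is a simplification made for the example rather than a claim from Swanson.

```python
# Hypothetical statements, each imagined as known to a different person.
causes = {("A", "B"), ("B", "C")}


def infer_transitive(pairs):
    """Return the pairs plus every link implied by chaining them together."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for x, y in list(closure):
            for y2, z in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure


# Links that nobody stated, but that follow once the statements are retrieved together.
inferred = infer_transitive(causes) - causes
print(inferred)
```

The inference only becomes possible once both statements are held in the same place, which is why the problem is fundamentally one of retrieval.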
Ontologies and information professionals
This book is being published at a pivotal point in the history of ontologies. On the one
hand the web and the development of semantic web technologies have provided the
opportunity for ontologies to be adopted by more people in more places than ever
before – bringing together data from around the world into one huge data set that can
be queried by anyone. On the other hand the ideals of a semantic web have had to
adapt to the practicalities of human abilities, recognizing the importance of publishing
data even if it is not accompanied by robust formal ontologies.
This book will not only emphasize the importance and potential of ontologies, but
also the importance of the community of information professionals contributing to
the development of new, and increasingly useful, ontologies. Murdock, Buckner and
Allen (2012) point out that one of the problems with ontology development is the
need for ‘double experts’, those with knowledge of ontology design and subject
domains. e community of information professionals have a long tradition of being
‘double experts’, oen coupling a postgraduate information science degree with a
subject specialism, and are ideally placed for a role in facilitating access to the web of
data and the development of ontologies. e role is particularly important if we are to
avoid the risk that an ontologist’s imposition of a domain ontology masks how
practitioners construct meaning (Pike and Gahegan, 2007).
Knowledge and experience of using knowledge organization systems is a
prerequisite for many jobs within the information profession, and the need for
knowledge of ontologies more specifically is only likely to increase in the future. As
well as taxonomists and ontologists, for whom the development and maintenance of
controlled vocabularies may be a full-time role, knowledge and experience of
ontologies is also necessary as part of a wider skill set in cataloguing, metadata and
curation roles. For those working as a taxonomist for a global information service, a
metadata librarian in a university library, a digital asset cataloguer in a commercial
company or a records manager in a non-profit organization, it is increasingly difficult
to overlook the importance of ontologies.
The focus of the ontologies in this book is on those that are being used on the
semantic web. There are, of course, many bespoke and proprietary ontologies used
within commercial organizations, attempting to bring together the disparate
information created by departments and units, but those that are of greatest interest
are those that provide the opportunity to share more data than ever before and develop
new insights from across the world.
Alternatives to ontologies
It is important to recognize that ontologies have limitations, and that there are
alternative ways of capturing and analysing data. Some of the limitations can be traced
to the fundamental assumptions that are made when encoding knowledge within
ontologies. Brewster and O’Hara (2007) note two such assumptions: first, the
monolithic nature of knowledge that is continually added to; and second, that concepts
are the fundamental units of ontologies, and these are manipulated with language.
Although there may be few, if any, Kuhnian paradigm shifts (Kuhn, 1970) that
invalidate the whole of an ontology, there will nonetheless be changing perspectives
on the meanings and relationships of individual concepts. This is especially true
outside the sciences, where the meaning of concepts and the relationships with
associated concepts can be open to vigorous debate. There is also much that is difficult
or impossible to put adequately into words – so-called tacit knowledge (Polanyi, 1966)
– although Shadbolt and Smart (2015) suggest that rather than tacit knowledge being
seen as something that is impossible to articulate, it should be seen as something that
is more easily articulated in some situations than others.
Approaches to knowledge representation can be broadly categorized as either top-down
or bottom-up (Pike and Gahegan, 2007). Whereas ontologies can often be
considered top-down models of the world, especially when considering the creation
of universal ontologies such as OpenCyc, the development of linked data and the
semantic web allow for a more bottom-up approach with competing ontologies and
potentially conflicting perspectives. However, even bottom-up approaches to capturing
knowledge from the data that is available have limitations.
The sheer quantity of information available on the web provides, and necessitates,
alternative ways of capturing data, through automatic reading and natural language
processing (NLP). NLP can be used both to extract terms for an ontology or thesaurus,
and apply terms from an ontology or thesaurus during indexing; the difference
between structured and unstructured data is becoming increasingly blurred (van
Hooland and Verborgh, 2014). NLP has its limitations, however, and depending on
the content and purpose of the NLP it is better categorized as a semi-automatic rather
than an automatic process. NLP is not the principal subject of this book, but it is likely
to play an increasing role in the development of ontologies in the future, and the
subject is returned to in Chapter 5.
Neither the limitations of ontologies nor the alternatives diminish the need for, or
importance of, ontologies. Rather, they help us understand where and when ontologies
are appropriate. It may be that in some situations a simpler form of controlled
vocabulary is more appropriate, either a thesaurus or an authority list. Data may be
better stored in a list, a spreadsheet or a relational database than as a graph, whilst
certain types of tacit knowledge may be better captured through video than by trying
to put it into words. Brewster and O’Hara (2007) note that criticisms have been made
that ontologies demand too much work and are too rigid, but such criticisms have
been made about many core information activities, such as cataloguing and
classification in the age of the web, and what we find is that most often new
technologies complement rather than replace existing technologies. Rather than search
engines replacing the library catalogue, the library catalogue is increasingly integrating
its own information services with the web and, increasingly, the semantic web. Rather
than ontologies replacing earlier forms of controlled vocabularies, they complement
them, providing an increasingly powerful tool for information retrieval and knowledge
representation.
The aims of this book
There are three main aims for this book. The first is to demonstrate to the information
professional the importance of ontologies for knowledge discovery. The second is to
demonstrate the important contribution information professionals can make to the
development of ontologies. Finally, the book aims to provide a practical introduction
to the development of ontologies for information professionals.
This introductory chapter will, hopefully, already have gone some way to
demonstrating the importance of the development of robust and widely used
ontologies in the fight against information overload, and the role of the information
professional in the process. These ideas will continue to be developed and reinforced
throughout the rest of the book.
In addition to demonstrating the importance of ontologies and the role of the
information professional, the book is also designed to be a practical introduction. It
will introduce some of the existing dominant ontologies that are likely to be of interest
to the information profession, as well as the methods and tools necessary for building
new ontologies and interrogating existing ontologies. LaPolla’s (2013) survey found
the implementation of semantic web compliant catalogues was hindered by a lack of
funding, best practice and awareness of the associated concepts. Whilst the book can
do little about the lack of funding, it will both contribute to discussion on best practice
and increase familiarity with many of the basic concepts. Although a majority of those
responding to LaPolla’s survey had some familiarity with semantic web concepts, this
finding is clouded by the fact that it was a self-selecting survey, and it seems likely that
those with little interest in the semantic web didn’t bother with the survey. Even
amongst those who completed the survey, whereas the vast majority were either very
familiar or somewhat familiar with the concept of the semantic web (90.16%) and
linked data (95.52%), familiarity with the more specific technologies necessary for
implementation was far lower: Web Ontology Language (OWL), 53.21%; Simple
Knowledge Organization System (SKOS), 43.59%.
No single book could provide an exhaustive introduction to the practicalities of
ontology use and development. Whole books have been written on technologies that
have been covered here in one or two pages; there is a huge variety of software available
for ontology development; new ontologies are being developed (as well as old ones
falling into disuse); and old standards are changing while new ones are introduced.
Nonetheless, the underlying methods of ontology development change more slowly
than the specifications, and by focusing on the underlying theory the skills related to
one set of technologies can be applied to others.
The structure of this book
The rest of this book consists of six chapters, from introducing the semantic web and
some existing ontologies, through adopting, building and interrogating ontologies, to
the future of ontologies:
Chapter 2 – Ontologies and the semantic web
Ontologies have gained added significance in recent years through the adoption of an
increasingly semantic web. Chapter 2 provides an introduction to the semantic web
and the role of ontologies, and how ontologies have been increasingly adopted in a
wide variety of libraries as well as other cultural heritage institutions and commercial
organizations.
Chapter 3 – Existing ontologies
There is a wide variety of ontologies that have been developed, and knowledge of the
dominant ontologies, their applications and their differences is increasingly essential
to the information professional. Chapter 3 considers some of the main ontologies,
including those ontologies used for representing ontologies, those widely adopted by
libraries and those widely used on the web.
Chapter 4 – Adopting ontologies
The reuse of existing ontologies is important for both the integration of data across
different systems and to avoid the repetition of work. Chapter 4 considers the tools
that are available for identifying existing ontologies, how the ontologies (or elements
thereof) can be combined in the creation of application profiles, and some of the
criteria that should be considered when selecting ontologies.
Chapter 5 – Building ontologies
It is increasingly important that information professionals are not only users of existing
ontologies, but that they build their own ontology for particular applications. Chapter
5 provides both a methodology for building an ontology and an overview of some of
the tools that are available, before leading the reader through the development of a
simple ontology with Protégé, the most popular (and free) software for ontology
development.
Chapter 6 – Interrogating ontologies
Ontologies are not only of interest for the structure they provide, but also for the data
that they contain. Chapter 6 provides an overview of tools available for interrogating
semantic web ontologies, both through the SPARQL Protocol and RDF Query Language
(SPARQL) and web crawlers, to gain new insights.
Chapter 7 – The future of ontologies and the information professional
The final chapter looks to the future of ontologies and the role of the information
professional in their development and use. The future of ontologies will undoubtedly
be a mixture of lightweight and more formal ontologies, and their development is
likely to be integrated with other technologies such as Natural Language Processing
and potentially crowdsourcing workflows. The contribution of the library and
information professional to ontology development also has the potential to change,
expanding from the bibliographic ontologies that will undoubtedly occupy them in
the short term to the development of niche subject-specific ontologies in the long term.
CHAPTER 2
Ontologies and the semantic web
Introduction
Interest in ontologies has grown rapidly in recent years due to the adoption of an
increasingly semantic web. e web is by no means the only place where ontologies
may be implemented, but it is the use of ontologies on the semantic web that is the
primary focus of this book, as they have the greatest potential, and as such are likely
to be of greatest interest to the modern library and information professional.
The chapter starts with an introduction to the semantic web and its most recent
incarnation as linked data, before considering more closely the standards that have
been adopted for structuring the semantic web. Finally, the last part of the chapter
looks at how ontologies have been increasingly adopted in a wide variety of libraries
as well as other cultural heritage institutions and commercial organizations.
The semantic web and linked data
The semantic web is about moving from a web of documents to a web of data, from
one that is primarily designed to be read by humans to one that can be read by
machines. It first started gaining widespread attention in 2001 with publications in
Nature (Berners-Lee and Hendler, 2001) and Scientific American (Berners-Lee,
Hendler and Lassila, 2001). e web has put vast quantities of information at our
fingertips, but much of this information is unstructured and it requires a lot of effort
to gather and analyse the information resources that we need. For a simple
informational query we pay little attention to the effort required. If we want to know
what time a show starts it is generally simple enough to enter the name of the theatre
and browse the pages for show times. But as queries require collecting data from
multiple sites, the task can quickly become arduous. Wanting to know which shows
are playing in a five-mile radius of where I am and which start after 8 p.m. would
require aggregating information from multiple sites, or at least visiting a site that had
aggregated that information on my behalf. Some types of information have many
aggregating sites (e.g., hotel and holiday information), but there is a vast amount of
information that may not be commercially viable for aggregation. Also the aggregators
are not necessarily aggregating the information that you want aggregated; you may
want to know the length of the show, the suitability for a particular age group, or the
accessibility of the venue.
If, however, each site makes its data available in an appropriately structured format,
it becomes simple for this data to be gathered and queried automatically by a wide range
of web agents and services, each of which can query for the information that they require.
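The theatre example can be sketched as follows. The two sites, their listings and the field names are all invented for illustration; the point is only that once each site publishes its data in the same structure, the aggregation and the query become a few lines of code.

```python
from datetime import time

# Hypothetical listings, as two different sites might publish them in a shared structure.
site_a = [{"show": "Hamlet", "starts": time(19, 30), "miles_away": 2.0}]
site_b = [
    {"show": "Cats", "starts": time(20, 30), "miles_away": 4.5},
    {"show": "Macbeth", "starts": time(21, 0), "miles_away": 12.0},
]


def late_nearby(*sources, after=time(20, 0), within_miles=5.0):
    """Gather listings from every source and keep the shows that start late and are close."""
    return sorted(
        row["show"]
        for source in sources
        for row in source
        if row["starts"] > after and row["miles_away"] <= within_miles
    )


result = late_nearby(site_a, site_b)
print(result)
```

A different agent could ask a different question of the same data, such as filtering on accessibility, without any change on the publishers’ side.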
The original vision of the semantic web promised a future where an increasing
number of online activities could be accomplished automatically as automated agents
carry out tasks on people’s behalf, not only retrieving information, but potentially
carrying out simple transactions, albeit non-financial ones in the short term.
Despite recognition of the potential of a semantic web its initial adoption may be
considered to have been quite slow: most of our online activities still require a
significant amount of human involvement. Nonetheless, progress on the semantic web
has been made, not only with the establishment of new specifications, but also with
the establishment of a new paradigm for the publishing of data: linked data. Linked
data prioritizes the publishing of data in a machine-readable format rather than the
underlying concepts (van Hooland and Verborgh, 2014), and the simplicity of the
approach has encouraged the publishing of data on the semantic web by a wide range
of individuals and organizations.
There is also a lot of interaction by people with semantic web technologies that is
hidden from view. Most people’s experience of a semantic web is not through
automatic agents but through the knowledge bases that have been created by the major
search engines and incorporated into their search results. For example, Google has
been incorporating its Knowledge Graph
(www.google.com/intl/bn/insidesearch/features/search/knowledge.html) knowledge
base into its search results since 2012. More recently it has been working on a
Knowledge Vault, where the facts are extracted from the web automatically, offering
an unprecedented collection of facts (Hodson, 2014). Nonetheless, even such a vast
collection of facts is organized in the same manner as the semantic web – in RDF triples.
Resource Description Framework (RDF)
The Resource Description Framework (RDF) is a conceptual model for making
statements about resources through RDF triples. RDF triples are a way of expressing
and relating information as three-part statements or ‘facts’ structured in a simple
subject-predicate-object format. For example, in plain English, a triple could be:
<David><hates><Apple>
David is the subject, hates is the predicate, and Apple is the object. The idea that such
simple facts may be used to encode all human knowledge may be hard to believe but,
as has been observed by Novak, ‘if the structure and function of all organisms that
live or have lived on earth can simply be coded by triplet sequences of four nitrogenous
base pairs A, G, T and C, there is no reason for a knowledge record to be more
complex’ (cited in Jakus et al., 2013).
Modelling the data in such a way makes it suitable for the distributed nature of the
semantic web, as ‘anyone can make statements about any resource’ (W3C, 2004). If
someone else disagrees with this statement believing <David><loves><Apple>, or
wants to add an additional associated statement such as <Steve><loves><Apple>,
then there is nothing to stop them – a process that would be much more difficult if
the initial data was structured in a table or a relational database on the web.
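The ease with which statements can be added may be illustrated with a minimal Python sketch. It treats each triple as a plain three-part tuple and the graph as a set of such tuples; this is a simplification for illustration rather than a real RDF store, and the match helper is a hypothetical function written for the example.

```python
# Each triple is a (subject, predicate, object) tuple; the names are plain text for now.
graph = {("David", "hates", "Apple")}

# Anyone can add further statements, even contradictory ones, without altering a schema.
graph.add(("David", "loves", "Apple"))
graph.add(("Steve", "loves", "Apple"))


def match(graph, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Who is said to love Apple?
lovers = [s for s, _, _ in match(graph, p="loves", o="Apple")]
print(lovers)
```

Adding a new kind of statement to a table would mean adding a column or a further table; here it is simply one more tuple in the set.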
Of course, having plain text can lead to ambiguity. After all, most people reading the
triple <David><hates><Apple> will know many people called David (both fictional
and real) and may associate multiple different objects and organizations with Apple.
There are, after all, apples the fruit, Apple the music label established by the Beatles and
Apple the technology company. People reading the triple may presume that the David
referred to is the author of the book, and based on the subject of the book may presume
David is more likely to have strong feelings on a technology company than on a type of
fruit or a music label. But none of this is explicit, and for computers to understand the
information, and for multiple graphs to be joined together, it needs to be made explicit.
To make the triple explicit, URIs may be used to represent each of the ‘resources’; on
the semantic web anything that can be represented is referred to as a resource.
Figure 2.1 shows a graph for <David><hates><Apple>. Resources are represented
by ovals, and literals are represented by rectangles. In this case the ‘hates’ relationship is
expressed between URIs representing David and Apple on a fictitious social network,
with label from the RDFS vocabulary used to provide human-readable labels to those
URIs. The relationship also takes the form of a URI, allowing a relationship to be used
from an existing ontology (albeit in this case also a fictitious one).
Figure 2.1 David hates Apple graph
[Graph: www.socialnetwork.com/David (rdfs:label ‘David’) is linked by www.relationships.com/hates to www.socialnetwork.com/AppleTech (rdfs:label ‘Apple’).]
Once subjects, predicates, and objects are unambiguous, multiple facts can be
combined into a single graph that can then be queried by a computer. For example,
two additional pieces of information have been added to the graph: David knows Bob;
and Bob loves Apple (Figure 2.2).
Luckily the use of URIs means that it is possible to distinguish between Apple the
music label and Apple the technology company, and we do not expect it to be a source
of friction between Bob and David. In this graph the ‘knows’ relationship has been
taken from the existing FOAF (Friend of a Friend) ontology. FOAF is one of the most
established ontologies on the web, with many of the properties regularly being reused
across the web. As popular ontologies are reused it is possible for additional tools and
services to make use of the data, as people will already know what the property means.
FOAF is returned to in Chapter 3.
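The combined graph of Figure 2.2 can be sketched in Python as a set of tuples and then queried. The social network and relationship URIs are the fictitious ones from the figures, with an http:// scheme added here as an assumption; the rdfs:label and foaf:knows URIs are those of the published vocabularies. The objects helper is a hypothetical function written for the example, not part of any RDF library.

```python
SN = "http://www.socialnetwork.com/"     # fictitious, as in the figures
REL = "http://www.relationships.com/"    # fictitious, as in the figures
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

graph = {
    (SN + "David", REL + "hates", SN + "AppleTech"),
    (SN + "David", FOAF_KNOWS, SN + "Bob"),
    (SN + "Bob", REL + "loves", SN + "AppleMusic"),
    (SN + "David", RDFS_LABEL, "David"),
    (SN + "Bob", RDFS_LABEL, "Bob"),
    (SN + "AppleTech", RDFS_LABEL, "Apple"),
    (SN + "AppleMusic", RDFS_LABEL, "Apple"),
}


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


# What do the people David knows love, and does David hate any of it?
loved_by_friends = {
    thing
    for friend in objects(graph, SN + "David", FOAF_KNOWS)
    for thing in objects(graph, friend, REL + "loves")
}
hated_by_david = objects(graph, SN + "David", REL + "hates")
clash = loved_by_friends & hated_by_david
print(clash)
```

Both resources carry the label ‘Apple’, but because the query compares URIs rather than labels it finds no clash between the two friends.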
Classes, subclasses and properties
Whilst the semantic web is designed to let anyone make statements about any resource,
ontologies are needed to constrain what can be said. is is achieved through classes,
subclasses and properties.
A class is a set of things with properties in common. For example, an ontology may
have a ‘person’ class, an ‘event’ class, or a ‘place’ class. Properties are the attributes
associated with particular classes. For example, a person is likely to have a name, and
depending on the type of ontology a person may also have a sex, date of birth, place
of birth, e-mail address, job title, or any other type of attribute that could be associated
with a person. Ontologies may also state the cardinality of a property (i.e., the number
of times it may be associated with a particular entity) and state the type of objects a
property can have as a target. For example, place of birth may either be restricted to a
literal (i.e., string of text) or a link to another resource (e.g., an entity of the place class).
Figure 2.2 David hates Apple, but knows Bob who loves Apple
[Graph: www.socialnetwork.com/David (rdfs:label ‘David’) is linked by www.relationships.com/hates to www.socialnetwork.com/AppleTech (rdfs:label ‘Apple’) and by foaf:knows to www.socialnetwork.com/Bob (rdfs:label ‘Bob’); Bob is linked by www.relationships.com/loves to www.socialnetwork.com/AppleMusic (rdfs:label ‘Apple’).]
Following typical semantic web style the property and class names used throughout
the rest of the book make use of CamelCase (spaces between words are replaced by
capital letters at the beginning of each word), class names are capitalized, and a CURIE
(compact URI) style is adopted to show where a resource comes from a common data
set or ontology. For example, foaf:Person refers to a class called Person from the
foaf ontology, whereas foaf:familyName and foaf:age refer to properties from
the same ontology. Where an example makes use of a single fictitious vocabulary or
data set, and there is no advantage in coining a fake prefix, it is simply omitted, e.g.,
:colourOfHair would refer to a property from such a fictitious ontology.
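The relationship between a CURIE and the full URI it abbreviates can be sketched in Python. The foaf and rdfs namespaces are those of the published vocabularies; the namespace bound to the empty prefix is a hypothetical one chosen for the example, and expand_curie is an illustrative helper rather than a function from any library.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "": "http://example.org/fictitious#",   # hypothetical namespace for the empty prefix
}


def expand_curie(curie, prefixes=PREFIXES):
    """Expand a compact URI such as 'foaf:Person' into the full URI it stands for."""
    prefix, _, local_name = curie.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + local_name


print(expand_curie("foaf:Person"))
print(expand_curie(":colourOfHair"))
```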
The variety of properties that could be associated with a broad class such as Person
is endless, even if many of the potential properties are highly unlikely to be of much
use for most situations or apply to most people: numberOfTeeth; hasCircusSkill;
hasPoliticalAffiliation; hasMurdered. Subclasses enable distinctions to be
made between the properties that can be associated with different subsets of a class.
For example, whilst every Person may have a name and a date of birth,
hasCircusSkill may be associated with the Person subclass, Clown. An entity of
the type Clown would inherit properties associated with both Clown and Person.
Similarly, properties may have subproperties. For example, there may be a ‘knows’
property associated with a Person class, enabling a relationship to be expressed
between one person and one or more other people. But there may also be associated
subproperties of knows, e.g., hasSon, hasEmployee, hasMentor. This allows
ontologies to be queried at different levels of granularity.
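Inheritance through subclasses and querying through subproperties can both be sketched with simple lookups. The class and property names follow the examples in the text; the individuals Alice and Carol, and the helper functions, are hypothetical. A real ontology language such as OWL expresses far more than this sketch attempts.

```python
SUBCLASS_OF = {"Clown": "Person"}
CLASS_PROPERTIES = {
    "Person": {"name", "dateOfBirth"},
    "Clown": {"hasCircusSkill"},
}
SUBPROPERTY_OF = {"hasSon": "knows", "hasEmployee": "knows", "hasMentor": "knows"}


def properties_for(cls):
    """Collect the properties of a class and of every class above it."""
    props = set()
    while cls is not None:
        props |= CLASS_PROPERTIES.get(cls, set())
        cls = SUBCLASS_OF.get(cls)
    return props


def related(triples, prop):
    """Return (subject, object) pairs linked by prop or by any of its subproperties."""
    def is_a(p):
        while p is not None:
            if p == prop:
                return True
            p = SUBPROPERTY_OF.get(p)
        return False
    return {(s, o) for s, p, o in triples if is_a(p)}


triples = {("Alice", "hasMentor", "Carol")}   # hypothetical individuals
print(properties_for("Clown"))
print(related(triples, "knows"))
```

A query for knows finds the hasMentor statement, whereas a query for hasSon does not, which is the difference in granularity described above.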
Decisions about what is, or is not, a class, subclass or property are not absolute and
different people may make different decisions for different ontologies. If an agent only
ever has one address in an ontology, then it may be that address properties can be
incorporated into the agent class. If, however, an agent has multiple addresses of
different types it makes more sense to group the associated properties together in a
class of their own.
A number of technologies are necessary in going from an abstract idea about
representing a fact as a triple, collecting these triples in classes, and encoding these
triples in such a manner that they are widely understood and services can be built
upon them. e necessary steps are generally illustrated with a semantic web stack.
The semantic web stack
The semantic web stack (also known as the semantic web layer cake) is used to
represent the architecture of the semantic web. It has been visualized in a number of
ways since it was first proposed, and specific information about the necessary
technologies for the different stages has been included as specifications have been
defined and recommended.
Figure 2.3 is based on the Wikipedia version of the semantic web stack as of April
2015.
Identifiers and character sets: URIs and Unicode
At the bottom of the semantic web stack are a number of already widely adopted
technologies for encoding the characters, identifiers and syntax of the semantic web:
Unicode, URIs.
Figure 2.3 The semantic web stack (based on http://en.wikipedia.org/wiki/Semantic_Web_Stack)
[Stack layers, from bottom to top: Character set: UNICODE and Identifiers: URI; Syntax: XML; Data interchange: RDF; Taxonomies: RDFS; Ontologies: OWL and Rules: RIF/SWRL; Unifying Logic; Proof; Trust; User Interfaces and Applications; with Querying: SPARQL and Cryptography alongside.]
For the information professional, Unicode simply means that RDF is encoded in text,
the sort that can be opened in Notepad on Windows or TextEdit on a Mac. URIs
(uniform resource identifiers) are globally unique identifiers, the most common of which
are the URLs (uniform resource locators) that web users type in the address bar of their
browser to get the web page they are interested in, e.g., http://www.bbc.co.uk. URIs
also include URNs (uniform resource names) that are location-independent identifiers.
For example, the URN urn:isbn:9781783300624 may refer to the book Practical
Ontologies for Information Professionals, without it providing a location for information
about that book. Increasingly we talk about IRIs (internationalized resource identifiers)
rather than URIs, as it is no longer necessary for resource identifiers to be restricted to
the characters of the Latin alphabet. Due to issues regarding the potential lack of support
of non-English character sets by semantic web tools, as well as a global recognition of
the Latin alphabet, URIs are nonetheless currently preferred to IRIs in the development
of ontologies and the term URI is used throughout.
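The difference between a URL and a URN can be seen by splitting each into its parts with Python’s standard urllib.parse module. The describe function is a hypothetical helper, and treating everything that is not a URN as a URL is a simplification made for the sake of the example.

```python
from urllib.parse import urlparse


def describe(uri):
    """Classify a URI by its scheme: a URN names a resource, a URL also locates it."""
    parts = urlparse(uri)
    if parts.scheme == "urn":
        # For a URN the remainder has the form '<namespace>:<name>'.
        namespace, _, name = parts.path.partition(":")
        return {"kind": "URN", "namespace": namespace, "name": name}
    return {"kind": "URL", "scheme": parts.scheme, "host": parts.netloc}


book = describe("urn:isbn:9781783300624")
site = describe("http://www.bbc.co.uk")
print(book)
print(site)
```

The URN carries an identifier but no host from which anything could be fetched, whereas the URL names both the protocol and the server to contact.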
Although any URI may be used as part of the semantic web, and it is not necessary
for anything to be returned for a particular URL, it is recommended that they are
nonetheless dereferenceable when used for linked data (Sauermann and Cyganiak,
2008). That means that when a URI is entered for a particular resource, information
associated with that resource is returned. For example, the URI
http://www.davidstuart.co.uk/resource/123 may be used to refer to a resource in
a data set, but trying to retrieve data from the page would return an HTTP 404 ‘file not
found’ error message. Obviously it is far more useful if the page returns the associated
resource record. There are, however, issues that arise from creating dereferenceable URIs
that need to be considered: most notably dealing with content negotiation, and
distinguishing between a web page and the resources that the page describes.
Content negotiation is the HTTP process by which an HTTP client retrieves its
preferred content. For example, a web browser will generally indicate that it prefers
HTML and entering a URI in a web browser retrieves an HTML version of the page
(if one is available). URIs are often also used to indicate to a server that a different
version of a resource is required. For example, the BBC Programmes Ontology
generally provides an HTML version of a resource (e.g.,
www.bbc.co.uk/programmes/b008ncn6) when viewed through a web browser; however,
adding .rdf to the end (www.bbc.co.uk/programmes/b006qpgr.rdf) provides the
underlying information in an RDF/XML serialization (discussed in more detail below).
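Content negotiation can be sketched with Python’s standard urllib.request module. The example builds a request that states a preference for RDF/XML through the HTTP Accept header, but deliberately does not send it; whether a given server honours the header, or still serves the .rdf variant, is an assumption that would need checking against the live service. The rdf_request and rdf_suffix_uri helpers are hypothetical names chosen for the example.

```python
from urllib.request import Request


def rdf_request(uri):
    """Build (but do not send) a request asking for RDF/XML rather than HTML."""
    return Request(uri, headers={"Accept": "application/rdf+xml"})


def rdf_suffix_uri(uri):
    """The alternative described in the text: ask for a different version via the URI itself."""
    return uri + ".rdf"


request = rdf_request("http://www.bbc.co.uk/programmes/b006qpgr")
print(request.get_header("Accept"))
print(rdf_suffix_uri("http://www.bbc.co.uk/programmes/b006qpgr"))
```

With the header approach the URI stays the same and the client’s preference travels with the request; with the suffix approach each format has a URI of its own.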
URIs can also have an important role in distinguishing between the page and the
resource that the page describes. For example, when we dereference a URI representing
Queen Elizabeth II, we do not expect to retrieve the Queen herself from the internet, but
rather a page of information about the Queen, and it is important to be able to distinguish
between the two. There are two approaches to doing this: hash URIs and 303 URIs (Heath
and Bizer, 2011). With 303 URIs (also known as slash URIs), when a resource URI is
requested the server responds with a 303 See Other status code, redirecting the client to the
URI of the page associated with the requested resource. This approach has been taken with
the publishing of the DBpedia data set. Entering http://dbpedia.org/resource/Elizabeth_II
in a browser will redirect to the page http://dbpedia.org/page/Elizabeth_II.
Hash URIs make use of the fragment identifier,
which may be used in URIs to distinguish between parts of a document, to distinguish
between the page and the resource. For example, if DBpedia had adopted hash URIs then
a resource identifier could be distinguished from a page identifier
(http://dbpedia.org/Elizabeth_II) through the use of a fragment identifier, typically
#this (http://dbpedia.org/Elizabeth_II#this). Although best practice
should enable the distinction between pages and resources, and the use of fragment
identifiers is simple to implement, such distinctions are often overlooked.
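The two approaches can be sketched in a few lines of Python. With a hash URI the client strips the fragment before making the request, so the page URI falls out automatically; with a 303 URI the server has to answer with a redirect. The function names are invented for this illustration, and the /resource/ to /page/ rewrite only mimics the DBpedia behaviour described above rather than reproducing DBpedia's actual server logic.

```python
from urllib.parse import urldefrag


def hash_uri_document(resource_uri: str) -> str:
    """Return the page URI that a client actually requests for a hash URI."""
    document, _fragment = urldefrag(resource_uri)
    return document


def see_other(resource_uri: str):
    """Simulate a server's 303 See Other answer for a DBpedia-style slash URI.

    Illustrative assumption: the page URI is obtained by swapping the first
    '/resource/' path segment for '/page/'.
    """
    location = resource_uri.replace("/resource/", "/page/", 1)
    return 303, location


print(hash_uri_document("http://dbpedia.org/Elizabeth_II#this"))
print(see_other("http://dbpedia.org/resource/Elizabeth_II"))
```

The hash approach needs no server configuration at all, which is why it is simple to implement; the 303 approach costs an extra round trip for every resource that is looked up.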
Whether URIs are dereferenceable or not, one additional decision needs to be made
in the coining of URIs: whether they should be descriptive or opaque. The difference
between a descriptive URI and an opaque URI is the difference between
http://dbpedia.org/resource/Elizabeth_II and http://dbpedia.org/resource/0128422.
Without retrieving the associated resource record the first URI is far more
self-explanatory to a person reading the URIs, which may aid in both comprehending and
creating RDF triples, although it should be remembered that for a computer the two URIs
are equally meaningful. Whilst the comprehensibility of descriptive URIs might make it
seem as though they are the natural choice, there are a number of reasons why opaque
URIs might be more appropriate, and they have been incorporated in a wide range of
ontologies, e.g., CHEBI (www.ebi.ac.uk/chebi) and RDA (www.rdaregistry.info). It may
be that opaque URIs are adopted to allow for the evolution of terms (van Hooland and
Verborgh, 2014), to prevent the promotion of any single language, or simply because it is
the simplest way to ingest the data in a system.
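A minimal sketch of the two coining strategies follows, assuming a hypothetical namespace http://example.org/resource/ and two invented helper functions. The descriptive URI is derived from a human-readable label, whereas the opaque URI is derived from a record number that carries no meaning of its own.

```python
from urllib.parse import quote

BASE = "http://example.org/resource/"  # hypothetical namespace, not a real service


def descriptive_uri(label: str) -> str:
    """Coin a URI from a label: readable, but tied to one language and wording."""
    return BASE + quote(label.strip().replace(" ", "_"))


def opaque_uri(record_number: int) -> str:
    """Coin a URI from a zero-padded record number: stable if the label changes."""
    return BASE + f"{record_number:07d}"


print(descriptive_uri("Elizabeth II"))
print(opaque_uri(128422))
```

If the preferred label of a term later changes, the descriptive URI either becomes misleading or has to be replaced, while the opaque URI is unaffected.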
Syntax: XML
XML (Extensible Markup Language) is a markup language that is both machine and
human readable. Although it is by no means the only format used for sharing semantic
web data, and is increasingly being challenged by alternative formats, RDF/XML is the
longest established of the semantic web serialization formats and most semantic web
data is available in an XML format.
A simple XML file to describe a book may be structured as below:
<book>
  <title>Practical Ontologies for Information Professionals</title>
  <author>David Stuart</author>
  <isbn>9781783300624</isbn>
</book>
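To show that such a file is machine readable as well as human readable, the short Python sketch below parses the same book record with the standard library's ElementTree module. The variable names are chosen for this illustration only.

```python
import xml.etree.ElementTree as ET

# The book record from the text, held as a string for the example.
BOOK_XML = """<book>
  <title>Practical Ontologies for Information Professionals</title>
  <author>David Stuart</author>
  <isbn>9781783300624</isbn>
</book>"""

book = ET.fromstring(BOOK_XML)

# Map each child element's name to its text content.
record = {child.tag: child.text for child in book}

for name, value in record.items():
    print(f"{name}: {value}")
```

The parser recovers the element names and values without difficulty, but the names themselves (title, author, isbn) are just local strings: nothing in the file says which definition of each term is intended.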
Structuring the file as above, however, does not provide unique references for the
element names. The term author is applied in widely different ways in academia;
whereas in the humanities author might be expected to imply the person has had an
active role in composing a journal article, in the sciences it can often be applied to
dozens of people who have had a role in carrying out an experiment, most of whom
will not have had a role in writing up the results. If the data is to be shared between
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 

dokumen.pub_practical-ontologies-for-information-professionals-9781783301522-9781783300624.pdf

  • 1. Practical Ontologies for Information Professionals
  • 2. Every purchase of a Facet book helps to fund CILIP’s advocacy, awareness and accreditation programmes for information professionals.
  • 3. Practical Ontologies for Information Professionals. David Stuart
  • 4. © David Stuart 2016 Published by Facet Publishing 7 Ridgmount Street, London WC1E 7AE www.facetpublishing.co.uk Facet Publishing is wholly owned by CILIP: the Chartered Institute of Library and Information Professionals. David Stuart has asserted his right under the Copyright, Designs and Patents Act 1988 to be identified as author of this work. Except as otherwise permitted under the Copyright, Designs and Patents Act 1988 this publication may only be reproduced, stored or transmitted in any form or by any means, with the prior permission of the publisher, or, in the case of reprographic reproduction, in accordance with the terms of a licence issued by The Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to Facet Publishing, 7 Ridgmount Street, London WC1E 7AE. Every effort has been made to contact the holders of copyright material reproduced in this text, and thanks are due to them for permission to reproduce the material indicated. If there are any queries please contact the publisher. British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library. ISBN 978-1-78330-062-4 (paperback) ISBN 978-1-78330-104-1 (hardback) ISBN 978-1-78330-152-2 (e-book) First published 2016 Text printed on FSC accredited material. Typeset from author’s files in 10/13 pt Minion Pro and Myriad Pro by Facet Publishing Production. Printed and made in Great Britain by CPI Group (UK) Ltd, Croydon, CR0 4YY.
  • 5. Contents
    List of figures and tables
    1 What is an ontology?
        Introduction
        The data deluge and information overload
        Defining terms
        Knowledge organization systems and ontologies
        Ontologies, metadata and linked data
        What can an ontology do?
        Ontologies and information professionals
        Alternatives to ontologies
        The aims of this book
        The structure of this book
    2 Ontologies and the semantic web
        Introduction
        The semantic web and linked data
        Resource Description Framework (RDF)
        Classes, subclasses and properties
        The semantic web stack
        Embedded RDF
        Alternative semantic visions
        Libraries and the semantic web
        Other cultural heritage institutions and the semantic web
        Other organizations and the semantic web
        Conclusion
    3 Existing ontologies
        Introduction
        Ontology documentation
  • 6. Contents (continued)
        Ontologies for representing ontologies
        Ontologies for libraries
        Upper ontologies
        Cultural heritage data models
        Ontologies for the web
        Conclusion
    4 Adopting ontologies
        Introduction
        Reusing ontologies: application profiles and data models
        Identifying ontologies
        The ideal ontology discovery tool
        Selection criteria
        Conclusion
    5 Building ontologies
        Introduction
        Approaches to building an ontology
        The twelve steps
        Ontology development example: Bibliometric Metrics Ontology element set
        Conclusion
    6 Interrogating ontologies
        Introduction
        Interrogating ontologies for reuse
        Interrogating a knowledge base
        Understanding ontology use
        Conclusion
    7 The future of ontologies and the information professional
        Introduction
        The future of ontologies for knowledge discovery
        The future role of library and information professionals
        The practical development of ontologies
        Conclusion
    Bibliography
    Index
  • 7. List of figures and tables
    Figures
        1.1 Section of the British National Bibliography graph visualized using RDF Gravity
        1.2 A graph of Jesus and his twelve apostles
        2.1 David hates Apple graph
        2.2 David hates Apple, but knows Bob who loves Apple
        2.3 The semantic web stack
        2.4 An example of an RDF graph
        3.1 A simple person and place ontology using RDF and RDFS
        3.2 Nature.com data categories as SKOS Play tree visualization
        3.3 FRBR entities and relationships representing the intellectual content
        3.4 Structuring intellectual content in FaBiO
        4.1 Linking between Schema.org and other vocabularies as shown on Linked Open Vocabularies
        4.2 Word cloud of subject headings of ontologies in BARTOC
        4.3 A search for ‘person’ within the Falcons Ontology Search
        5.1 WebVOWL visualization of FOAF
        5.2 First draft of the Bibliometric Metrics Ontology, with two classes and provisional relationships
        5.3 Second draft of the renamed Bibliometric Indicators Ontology
        5.4 Screenshot of Protégé 5.0 with the Entities tab selected
        5.5 Properties associated with the Bibliometric Indicators Ontology
        5.6 Bibliometric Indicators Ontology (BInO) – v. 0.1
        6.1 Number of reusing vocabularies in rank order
    Tables
        3.1 Dublin Core Terms properties
        3.2 Comparison of schema:Person with foaf:Person
        5.1 Overview of steps in different ontology development methodologies
  • 8. List of figures and tables (continued)
        5.2 Different entities and concepts identified with different spotter algorithms
        6.1 The most common properties associated with schema:Book
  • 9. CHAPTER 1
    What is an ontology?
    Introduction
    Today more data and information are being produced and shared than ever before; data is streaming forth from new online social behaviours as well as high-specification digital tools and instruments. If we are to extract the maximum value from this data then we need to make use of the most appropriate tools and technologies. Ontologies, formal representations of knowledge with rich semantic relationships, are one such tool, and the focus of this book.
    This chapter provides an introduction to ontologies, and considers their increasing importance to information professionals. Following a brief overview of the growing information overload and data deluge, the chapter considers the various definitions that have been applied to the term ‘ontology’ and how ontologies differ from associated and overlapping information concepts such as controlled vocabularies, taxonomies, metadata and knowledge bases. Finally, the chapter considers the potential of ontologies for information retrieval and discovering ‘undiscovered public knowledge’, and the role of the librarian in the development, maintenance and curation of ontologies.
    The data deluge and information overload
    It is important to start with an understanding of the changing information landscape, reminding ourselves of why we need new tools and technologies, and why it is no longer acceptable to continue with the way things have always been done. We are awash with a wide variety of information and data, but due to the tools that we are currently using the value of much of the data is going to waste. As John Naisbitt (1984, 17) put it, ‘We are drowning in information, but starved for knowledge’.
    Information is coming from a wide variety of sources. There has been an explosion in the publishing and sharing of text across the whole of the communication spectrum, from the informal to the formal. Traditional formal publications, such as books and journals, have been joined by e-books and e-journals, with new publishing models based on combinations of self-publishing and open access: the number of self-published
  • 10. titles published in the USA rose from 85,468 titles in 2008 to 458,564 titles in 2013 (Bowker, 2014); whilst Chen (2014) estimated that the proportion of articles published in the previous year available as open access had either passed or was very close to 50%. In the middle of the formal–informal spectrum of publishing is the grey literature: white papers, reports, technical papers and other, more informal, publications. Whereas once this grey literature could be costly to create and had limited circulation, desktop publishing software and electronic publishing on the web have put it within reach of a wide range of individuals and organizations. But the growth in these numbers has been dwarfed by the growth of social media and other informal publishing, where the associated numbers are often in the hundreds of millions if not billions: there are 1.49 billion active Facebook users each month (Facebook, 2015); and over 500 million updates are sent on Twitter on a typical day (Twitter Engineering Blog, 2013). No one can hope to read anything but the smallest fraction of this information, even within the smallest of fields. There is a need for new tools to help with information retrieval, increasing precision without excessively impacting recall.
    The narrative text has also been joined by increasing quantities of other text, such as computer code and data sets, as well as rich media (i.e., images and video). Although the lack of data sharing within the academic community has been labelled as the ‘dirty little secret’ of open science data promotion (Borgman, 2012, 1059), the potential of open data and open code to transform the rate of scientific progress (Hey, Tansley and Tolle, 2009) and to encourage more open and accountable governments and encourage citizens’ participation (Raman, 2012) has led to numerous open programs and policies. Governments have signed up to open data charters promising data to be open by default (Cabinet Office, 2013) and funding agencies and journals are increasingly stipulating the need for open data and open code (e.g., Nature, 2014). It is not enough, however, that data and code are open; they need to be findable and reusable by those who want to make use of them too.
    Whilst the growth of open data may have been slower than some would like, growth in the number of images and videos shared has exploded: since its launch in 2010, over 30 billion images have been shared on Instagram (Instagram, 2015); in May 2014 Snapchat reported 700 million photos sent per day (Techcrunch, 2014); and YouTube counts billions of views every day as people watch hundreds of millions of hours of video (YouTube, 2015).
    This media is also increasingly of higher quality, part of the trend towards increasingly high-specification digital tools and instruments. By 2007, 83% of mobile phones had digital cameras, and over the years the specification of these cameras has increased dramatically. By 2012 there were mobile phones with 41 megapixel cameras available, many times more powerful than the first camera phones with 0.1–1 megapixels. The rise of increasingly high-specification mobile phone cameras reflects an increase in digital data collection at increasingly high-level
  • 11. specifications across a wide range of disciplines and professions. Data per 360 degree scan in computed tomography has gone from 57.6 kB in 1972 to 0.1–1 GB by 2010 (Kalender, 2011), whilst the rise in quality and fall in price has increased the number of scans made and the areas outside medicine where computed tomography may be used (e.g., archaeology and paleontology). When the first human genome was declared complete in 2003 it had been a mammoth project taking over ten years and costing US$3 billion; now we have entered the US$1000 genome era, where the cost of sequencing the human genome has fallen to a price where it may play a role in predictive and personalized medicine (Hayden, 2014). Projects such as the 100,000 Genome Project are now sequencing thousands of genomes to identify genetic causes for a wide range of human diseases (www.genomicsengland.co.uk/the-100000-genomes-project). The content in any single human genome, however, is dwarfed by the amount of data produced by big science projects such as the Large Hadron Collider, where 19 gigabytes of data were created in the first minute and thirteen petabytes (1 petabyte = 10^15 bytes) in the first year (Brumfiel, 2011). With so much data available, and in increasingly large chunks, it becomes increasingly important that we are accessing and downloading only the most relevant data for analysis.
    As well as the data people are making a conscious decision to share, there are also the vast digital trails we all increasingly leave as an increasing proportion of our lives is lived online, and processes are digitized. Mobile phones can not only capture pictures, but have built-in GPS and accelerometers to track location and movement. Phone (or VOIP) calls can now simply be captured in their entirety, to index or play back in full at a later date if necessary. With the internet as the first port of call for our information needs we are leaving trails of information about the searches we are carrying out, the pages we are visiting and the links we are following. This information is not only restricted to the log files of a single site, but may be aggregated by advertising companies and content providers across multiple sites, enabling the building of increasingly complex profiles on individuals for the tailoring of increasingly personalized advertising and services.
    As data storage and processing prices have fallen it is no longer necessary to be selective in what we capture: increasingly we capture everything and then search the captured information for what we need later, a process that is epitomized by note-taking software designed for capturing ‘everything’ and ideas such as life streaming. Wearable technology, such as Google Glass, streamlines the process, as it is no longer necessary to even go to the trouble of taking a smartphone from a pocket.
    Data inevitably produces more data. The data that is captured is often indexed, analysed, or combined to spawn more data. A file may be indexed, the contents analysed according to different criteria (e.g., searching for patterns or antecedents), and be accompanied by an ever growing quantity of descriptive, access, and preservation
  • 12. metadata. As new questions are asked, and new methods of data analysis developed, the same data set can continue to produce ever increasing quantities of data. We have entered the era of Big Data. There are vast amounts of structured and unstructured data available, and there are new challenges to ensure that we make use of this data.
    Neither the exponential growth of science nor the problems of information overload are particularly new problems. The growth and communication of science began to be explored scientifically in the 1950s and 60s, and its exponential growth was one of the subjects of Derek J. de Solla Price’s (1963) seminal Little Science, Big Science. The history of scientific publishing can be seen as one of trying to help researchers overcome the problem of information overload, first with publication of specialist journals, then with specialist abstract and indexing services. However, the web has provided a step-change in the publishing of information. When Ziman (1969) wrote of the problem of having to wade through ‘tomes of irresponsible nonsense’ without peer review, he would have had no idea how large these tomes of irresponsible nonsense would become.
    The web requires new tools and methods to help users engage with the information that is available, and its brief history has already been one of rapid innovation: from directories to search engines, from information searching to information discovery. We no longer expect always to have to search for the information that we require, but are instead alerted to information we may require, either through the filter of social network sites or algorithmic suggestions (e.g., Google Scholar).
    Those who successfully find ways of managing the information overload, and of making use of the increasing quantities of data available, will have the competitive advantage, whether that is the company gathering competitive intelligence on its rivals, the researcher looking for new ways to encode and analyse data, or the international non-governmental organization looking for efficiencies in sharing information. Ontologies are one way of helping to tame some of the problems identified above, providing a structure for this information in such a manner that it can be read automatically and unambiguously, and shared more widely.
    Defining terms
    Whenever writing on a specialist subject it is generally advisable to start by defining your terms, as all too often we follow the example of Humpty Dumpty when he says in Lewis Carroll’s Through the Looking-Glass: ‘When I use a word, it means just what I choose it to mean – neither more nor less’. Even within the smallest of fields the same term may have multiple meanings, some of which may be conflicting, a feature that is true for both ‘ontology’ and concepts such as data, information and knowledge, which the ontology is trying to encode.
  • 13. Defining data, information, knowledge and wisdom
    Most topics in information science can’t be discussed for long without running into the terms data, information, or knowledge. Unfortunately the terms are notoriously hard to define, and attempts at capturing knowledge within the library and information science community (e.g., through knowledge management) have sometimes been controversial for seemingly being little more than rebranding exercises.
    Data, information, knowledge and wisdom are often conceptualized as a four-step pyramid, from data at the bottom, through information and knowledge, to wisdom at the top. This model was popularized by Ackoff (1989), but analysis of how the terms are used (Rowley, 2007; Zins, 2007) finds them to be the subject of wide-ranging and often overlapping definitions. Rather than thinking of them as distinct terms, it is more useful to think of them as overlapping areas on a continuum from highly structured and codified information at one end (data) to highly personal tacit understanding at the other (wisdom).
    Data is the ‘building blocks’ of information and knowledge (Kitchin, 2014), although much of the information and knowledge that we have can seem quite detached from the underlying data. Whereas the route from data to knowledge may seem quite direct in the hard sciences, within the arts and the humanities the relationships between abstract ideas and concepts that form information and knowledge are less readily structured. Ontologies emerged as a way of capturing knowledge, and codifying it in a highly structured manner as data, and this may be applied to knowledge in any discipline.
        . . . knowledge is inherently complex and the task of capturing it is correspondingly complex. Thus, we cannot afford to waste whatever knowledge we do succeed in acquiring.
        Neches et al., 1991, 54
    Knowledge organization systems and ontologies
    Ontologies are one of a number of different knowledge organization systems that have been developed within the information profession to improve information discovery. These knowledge organization systems are also variously known as ‘taxonomies’ or ‘controlled vocabularies’, depending on the sector within which they are used. Whereas cultural heritage institutions err more towards ‘controlled vocabularies’, the commercial sector tends to use the term ‘taxonomies’. Harpring (2013, 13) defines a controlled vocabulary as: ‘an organized arrangement of words and phrases used to index content and/or to retrieve content through browsing or searching’, very similar to Hedden’s broad definition of a taxonomy in her introduction to The Accidental Taxonomist:
  • 14.     . . . any knowledge organization system (controlled vocabulary, synonym ring, thesaurus, hierarchical term tree, or ontology) used to support information/content findability, discovery, and access.
        Hedden, 2010, xxii
    There is also a narrower use of the term taxonomy, in which it refers to a hierarchical set of terms (Hedden, 2010; Harpring, 2013), such as the Linnaean taxonomy of biological classification, most people’s first introduction to the term. Within this work the term controlled vocabulary is preferred rather than taxonomy, partly due to the potential for confusion caused by the dual meaning, but also due to the author’s own background within library and information science.
    Controlled vocabularies have both advantages and disadvantages. Advantages of a controlled vocabulary include improved recall and greater precision through reducing polysemy (van Hooland and Verborgh, 2014). Recall, the proportion of relevant documents that are retrieved out of all the relevant documents in a collection, is increased by the reduction of the number of terms associated with a particular concept. For example, the Dublin Core Metadata Initiative Type Vocabulary is a controlled vocabulary of 12 terms: collection, dataset, event, image (still image and moving image), interactive resource, physical object, service, software, sound, and text. Without a controlled vocabulary, a wide range of resources that adhere to each of these types could have been referred to differently. The ‘text’ resource type includes letters, books, theses, reports, newspapers, and poems, as well as a host of other texts primarily designed for reading. To ensure the recall of all the associated text resources would require entering all the possible terms.
    Polysemy refers to multiple meanings for the same term. A controlled vocabulary enables distinctions to be made between the different terms.
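    The effect on recall described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the five-document collection, the label-to-term mapping and the function names are all hypothetical, and the mapping covers just a handful of the labels that the ‘text’ type would absorb.

```python
# Hypothetical mini-collection: document id -> free-text type label
# that a cataloguer might have entered without a controlled vocabulary.
COLLECTION = {1: "letter", 2: "book", 3: "thesis", 4: "poem", 5: "photograph"}

# Hypothetical mapping from free-text labels to controlled type terms.
TO_CONTROLLED = {
    "letter": "text",
    "book": "text",
    "thesis": "text",
    "poem": "text",
    "photograph": "image",
}

# Documents a user looking for text resources would consider relevant.
RELEVANT = {1, 2, 3, 4}


def recall(retrieved, relevant):
    """Proportion of all relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Proportion of retrieved documents that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def search_free_text(label):
    """Match documents whose free-text label equals the query exactly."""
    return {doc for doc, lab in COLLECTION.items() if lab == label}


def search_controlled(term):
    """Match documents whose label maps to the controlled term."""
    return {doc for doc, lab in COLLECTION.items() if TO_CONTROLLED.get(lab) == term}


free_hits = search_free_text("book")
controlled_hits = search_controlled("text")

print("free text  :", recall(free_hits, RELEVANT), precision(free_hits, RELEVANT))
print("controlled :", recall(controlled_hits, RELEVANT), precision(controlled_hits, RELEVANT))
```

    In this toy collection a free-text search for ‘book’ finds one of the four relevant documents, whereas a search on the controlled term ‘text’ finds all four, with no loss of precision in either case. Polysemy is the complementary problem, where a single term carries several meanings.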
For example, ‘Apple’ may refer to the fruit, the technology company, a computer created by the technology company, or the record label founded by the Beatles. Within the Library of Congress Subject Headings the fruit has the term ‘Apples’ and the computer is ‘Apple computer’, whilst in the Library of Congress Name Authority File the technology company is ‘Apple Computer, Inc.’ and the record label is ‘Apple Records’.

There are also a number of disadvantages to controlled vocabularies: the cost, the complexity, the slow evolution, and their subjectivity (van Hooland and Verborgh, 2014). Controlled vocabularies are not only expensive to create in the first place, but also to maintain as new names and terminology enter a field. In some situations the slow speed of change may be simply due to limitations in resources; in other situations there may be conflict between the terminology of conservative and progressive perspectives. For example, a comparison of the style guides of left- and right-wing newspapers can be particularly enlightening regarding
their associated politics. Controlled vocabularies are inevitably subjective, and reflect the world view of the creators at a particular time, and different people in more enlightened times inevitably baulk at previous decisions, especially when there are prohibitively large legacy costs to rectifying previous decisions. For example, the Dewey Decimal Classification system is infamous for class 200 – religion, where seven out of the ten divisions relate to the Bible or Christianity:

• 200 Religion
• 210 Philosophy & theory of religion
• 220 The Bible
• 230 Christianity
• 240 Christian practice & observance
• 250 Christian pastoral practice & religious orders
• 260 Christian organization, social work, & worship
• 270 History of Christianity
• 280 Christian denominations
• 290 Other religions.

Although there have been attempts to extend many of the other religions in DDC in recent years, particularly Islam (Idrees, 2012), the Dewey legacy nonetheless supports the perception of it being Christian-centric.

Some of the most widely used forms of controlled vocabularies within the information profession are subject headings, authority files and thesauri. It is worth considering each of these types of controlled vocabulary, and their limited nature, for comparison with the more expressive nature of ontologies:

Subject headings are a controlled set of terms designed to describe the subject or topic of a resource, whether it is a book, article, or data set. Popular examples include the Library of Congress Subject Headings (http://id.loc.gov/authorities/subjects.html) and the Medical Subject Headings (MeSH) (www.nlm.nih.gov/mesh/meshhome.html). Subject heading lists ensure that the same term is used to describe a work, rather than multiple similar terms.

Authority files are sets of preferred headings. As well as preferred subject headings, there may be preferred organization names, person names, and place names.
History is replete with people, places, and organizations that have different names at different times, and successful information retrieval requires the consistent use of terms and relationships between the alternatives: those looking for information on Mark Twain may also want to retrieve information on Samuel Clemens, whilst those researching Constantinople may also wish to retrieve information on Istanbul. Well-known examples include the authority files of the major national libraries (e.g., Library of Congress, British
Library and Bibliothèque Nationale de France). VIAF (Virtual International Authority File) (http://viaf.org) is a project from several national libraries designed to link together the separate authority files of the libraries into one virtual authority file.

A thesaurus, like a taxonomy (in the narrower sense of the term), provides hierarchical relationships between concepts (i.e., broader and narrower terms), as well as equivalence and associative relationships. A typical entry in a thesaurus might include all three types of relationship, as in the example below for information science:

Information Science
Broader terms:
    Sciences
Narrower terms:
    Computer Science
    Library Science
Use instead of:
    Informatics
    Information Industry
Related terms:
    Information Processing
    Information Skills
    Knowledge Management
    Knowledge Representation
    Library Education

The above example is based on ‘Information Science’ in the ERIC (Education Resources Information Center) thesaurus (http://eric.ed.gov). The relationships within a thesaurus enable a reader to traverse from one concept to another more easily, helping to find related content. Other well-known examples of thesauri include the Getty Thesaurus of Geographic Names (www.getty.edu/research/tools/vocabularies/tgn), the Art & Architecture Thesaurus (www.getty.edu/research/tools/vocabularies/aat), and the Thesaurus for Graphic Materials (www.loc.gov/pictures/collection/tgm) from the Library of Congress.

Today controlled vocabularies should also be compared with tagging, which came to prominence with the rise of social media and social networking sites. The vast size and diversity of the web, and its users, drove the need for an approach to classification that was equally global and diverse in outlook, and could be applied by members of the public as well as information professionals.
Tagging, the application of uncontrolled terms to online resources, has been incorporated into a large number of services with varying degrees of success. Whilst many of the sites for bookmarking web resources (e.g., del.icio.us) have fallen out of favour, tagging nonetheless continues to have an important role within sites that are focused around user-generated content: for example, the tagging of images in Flickr and Instagram, and the use of hashtags in Twitter (so called because of the ‘#’ used to denote the tag). In comparison to a
controlled vocabulary, tagging is likely to have reduced recall and lack precision, but where the scale of the web is concerned there may be few alternative options.

An ontology is like a thesaurus, in that there are multiple types of relationship between terms, but it can be non-hierarchical, with a far richer set of relationships, and typically holds a far greater variety of information. The richness of the relationships and information means that it is not only suitable for indexing resources, but may be a knowledge base for knowledge discovery in its own right.

Defining an ontology

Ontologies first emerged in the Artificial Intelligence (AI) community, borrowing the term ‘ontology’ from philosophy, where ontology is concerned with the study of being or existence. The term was adopted by the AI community in the 1980s for computational models that can enable automated reasoning (Gruber, 2009), having recognized that ‘capturing knowledge is the key to building large and powerful AI systems’ (Neches et al., 1991, 37). Today the most widely used definition of ontology is Gruber’s (1993, 199) definition: ‘an explicit specification of a conceptualization’. This has been criticized for its broadness, incorporating both simple glossaries and ‘logical theories couched in predicate calculus’ (Gruber, 2009, 1964), and also for its focus on subjective concepts rather than entities as they exist in reality (Smith, 2004). Nevertheless, an ontology might be considered a near-synonym with knowledge organization system or taxonomy (in the broad sense). This continuum from informal vocabularies to formal ontologies has been reiterated by the World Wide Web Consortium (W3C) in their introduction to ontologies: ‘There is no clear division between what is referred to as “vocabularies” and “ontologies”’ (W3C, 2013). The broadness of the definition is an important part of the inclusiveness of ontologies for information professionals.
It is not just a subject for the AI community, but rather for all those involved in the codifying of knowledge, including librarians, archivists, museum workers and domain experts.

Nonetheless, a more specific definition is useful for distinguishing between those ontologies that are the primary focus of this book and other examples of controlled vocabularies. Within most definitions of ontologies the distinctive feature is the richness of the relationships between terms. For Hedden (2010, 12), an ontology ‘can be considered a type of taxonomy with even more complex relationships between terms than in a thesaurus . . . it aims to describe a domain of knowledge, a subject area, by both its terms . . . and their relationships’. Within an ontology a person does not have to just be related to an event: they may be present at an event, organize an event, take part in an event, be an authority on an event, or possibly instigate an event.
An example of the richness of the information associated with a particular entity in an ontology is provided below with an author record:

Ranganathan, S.R. (Shiyali Ramamrita), 1892-1972
event:
    1892
    1972
family name: Ranganathan
given name: S.R.
has created:
    Colon classification / S.R. Ranganathan
    The five laws of library science / S.R. Ranganathan
name: S.R. Ranganathan
type:
    Agent
    Person
has contributed to: An essay in personal bibliography / A.K. Das Gupta
same as: 49268668

The above record is based on the British National Bibliography record for S.R. Ranganathan. It expresses two types of relationship between the author and his associated works: has created, and has contributed to. With the exception of the name, family name, and given name values, each of the properties on this record links to another record for the particular instance, for example, The five laws of library science:

The five laws of library science / S.R. Ranganathan
bnb: GB6417211
description: 2nd ed originally published (B58-927) Madras Library Association; Blunt 1958.
edition statement: 2nd ed. reprinted (with minor amendments)
type: BibliographicResource
creator: Ranganathan, S.R. (Shiyali Ramamrita), 1892-1972
is part of: Ranganathan series in library science; no 12
language: eng
publication event: Asia Publishing House, 1964
same as: GB6417211
subject: 020

Again, many of the properties have their own associated records, creating a huge graph
of related resources, joining previously disparate authority lists and classification systems. Figure 1.1 shows the graph produced by just the author and instance records mentioned above.

Explicit specifications of conceptualizations are important if computers are to successfully communicate with one another without ambiguity, and there is less ambiguity and more scope for drawing inferences if the explicit specifications build upon one another in a more formal manner. ‘Formal’ rather than ‘explicit’ is used in a number of definitions of ontologies: ‘An ontology is a formal specification of a shared conceptualization’ (Borst, 1997, 11); ‘Ontologies are formalized vocabularies of terms, often covering a specific domain and shared by a community of users. They specify the definitions of terms by describing their relationships with other terms in the ontology’ (W3C, 2012). Others, however, have preferred to combine the two terms: ‘An ontology is a formal and explicit specification of a shared conceptualization’ (Jakus et al., 2013, 29). Whilst a formal ontology would seem to necessitate an ontology being explicit, an explicit ontology does not necessarily need to be particularly formal. The use of relationships in defining terms is a particularly important part of the semantic web due to its distributed nature, with organizations likely to be adhering to different vocabularies.

Figure 1.1 Section of the British National Bibliography graph visualized using RDF Gravity
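The author and work records quoted above can be read as a small set of subject, property and value statements, which is what a graph of this kind draws. The following is a minimal Python sketch under stated assumptions: the short identifiers (such as `person:ranganathan`) and the helper names `record` and `linked_records` are hypothetical and invented for illustration, they are not British National Bibliography URIs, and only a handful of the properties are carried over.

```python
from collections import defaultdict

# Hypothetical short identifiers standing in for the real record identifiers.
AUTHOR = "person:ranganathan"
FIVE_LAWS = "work:five_laws"
COLON = "work:colon_classification"
ESSAY = "work:essay_personal_bibliography"

# Each statement is a (subject, property, value) triple.
triples = [
    (AUTHOR, "name", "S.R. Ranganathan"),
    (AUTHOR, "family name", "Ranganathan"),
    (AUTHOR, "given name", "S.R."),
    (AUTHOR, "type", "Person"),
    (AUTHOR, "has created", COLON),
    (AUTHOR, "has created", FIVE_LAWS),
    (AUTHOR, "has contributed to", ESSAY),
    (FIVE_LAWS, "type", "BibliographicResource"),
    (FIVE_LAWS, "creator", AUTHOR),
    (FIVE_LAWS, "language", "eng"),
]


def record(subject):
    """Gather every property/value pair stated about one subject."""
    rec = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            rec[p].append(o)
    return dict(rec)


def linked_records(subject):
    """Return the values of a subject that have statements of their own."""
    subjects = {s for s, _, _ in triples}
    return sorted({o for s, _, o in triples if s == subject and o in subjects})


author_record = record(AUTHOR)
links = linked_records(AUTHOR)
print(author_record["has created"])
print(links)
```

Literal values such as names end the path, whereas values that are themselves subjects (here the work record) can be followed to further statements, which is how the separate records join into one graph.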
As well as the richness of the relationships and their explicitness, there is another distinctive feature of ontologies that is widely acknowledged: that they should be a representation of the structure of knowledge, not just a set of indexing terms. Willer and Dunsire (2013, 112) define an ontology as ‘a formal representation of the structure of knowledge and information’, and Allemang and Hendler (2011, 1) point out that semantic models are sometimes called ontologies. Although Harpring (2013) acknowledges certain similarities between thesauri and taxonomies and ontologies, she considers them to have fundamentally different goals:

…ontologies use strict semantic relationships among terms and attributes with the goal of knowledge representation in machine-readable form, whereas thesauri provide tools for cataloguing and retrieval.
Harpring, 2013, 26

The goals of knowledge representation and information retrieval do not have to be mutually exclusive, however, and the same ontology may be used for both. In fact the richness of the relationships may allow for far richer querying and information retrieval.

Within this book a fairly broad definition of ontology, albeit not quite as broad as that of Gruber (1993), is taken:

An ontology is a formal representation of knowledge with rich semantic relationships between terms.

Such ontologies may be more or less formal, depending on the extent to which they define terms with relation to one another and incorporate axioms, and no distinction is made as to whether an ontology is designed either for information retrieval or as a knowledge base. Such a simple definition, however, glosses over the parts that comprise an ontology.

The parts of an ontology

The definition of an ontology provided above is designed to be inclusive, although it is sometimes necessary to distinguish between different ontologies that fall within this definition.
As with Willer and Dunsire’s (2013) definition, the term is sometimes used to distinguish the structure of the ontology from the instances. For example, a book ontology might not be expected to include any information about particular books, but rather provide the necessary structure for describing books and the relationships between them and associated types of objects. In other situations an ontology might
refer to both the structure and the instances, in much the same way as a thesaurus of place names includes the names of places, not just the possible relationships between them.

Whether an ontology developer is interested primarily in classes or instances may be expected to differ considerably depending on the discipline. For example, Arp, Smith and Spear, who are primarily interested in the representation of scientific research, believe an ontology is ‘concerned with representing universals’ (2015, 17). However, within the arts and humanities it may be the particular facts that are important rather than the general theories, and the general theories do not necessarily have widespread agreement.

The W3C Library Linked Data Incubator Group (2011) makes a distinction between metadata element sets and value vocabularies within data sets, with the metadata element set providing the structure for holding the information (e.g., Dublin Core element set) and the value vocabularies providing the values for these elements (e.g., an authority list of author names or place names). This book also distinguishes between the structure and the values of ontologies, although it uses slightly different terminology:

• ontology element set
• ontology instances.

The ontology element set and ontology instances combine to form an ontology data set or knowledge base. The term metadata is one that is already overburdened within the information profession, and may cause confusion when distinguishing between more traditional approaches to cataloguing and the rich semantic nature of ontologies. Metadata is also strongly associated with a particular type of record within the information profession (e.g., a bibliographic record describing a book), and it is important that ontologies are more inclusive than this.
‘Instances’ is a more inclusive term than ‘value vocabulary’, which seems primarily appropriate for existing controlled vocabularies, whereas an instance may be used to refer to any concept or thing within an ontology. A concept is generally an abstract idea that is then given a label, some of which are more concrete than others (e.g., ‘Paris’ may be considered a more concrete concept than ‘Love’), but which are nonetheless abstract. Concepts form the basis of most traditional knowledge organization systems, but ontologies can also deal with more concrete things. As well as the abstract idea of Paris, the one that each of us holds in our minds, with associations of romantic getaways, literary salons or fashion shows, there is the actual physical city with specific boundaries, activities and population at any particular moment. Concepts and things
  • 22. are oen blurred within ontologies, but there is nonetheless a wide range of information associated with any particular concept or thing that is not part of many controlled vocabularies. Following Hedden’s (2010, 69) use of ‘term record’, instance record is used to describe all the pieces of information associated with a particular concept or thing, or resource record within the context of the semantic web. For ease of reading, once an ontology element set or an ontology data set has been introduced as such in this book, the subsequent text may simply refer to it as an ‘ontology’ or either an ‘element set’ or a ‘data set’. Ontologies differ greatly, but all represent a formal representation of knowledge with rich semantic relationships between terms. Types of ontology Just as we reach a point where the reader is likely to believe they have an understanding of what an ontology consists of, it is necessary to introduce a range of additional terminology that has been adopted to describe types of ontologies. Here we briefly describe four of them: lightweight ontologies; upper ontologies; application profiles; and ontology languages. Usability is an important consideration when it comes to the creation of ontologies, but as Murdock, Buckner and Allen (2012) ask: ‘. . . usability by whom or by what?’ Some have argued that ontologies are ‘unsuited to the rough-and-tumble of real-world applications once they get beyond a certain level of complexity’ (Brewster and O’Hara, 2007, 565). Lightweight ontologies are ontologies that are designed for ease of use, processable by machines but also accessible to humans, focusing on core classes (i.e., types of entities) and properties rather than constraints and axioms (Rocha da Silva et al., 2014). ese may be particularly important in the humanities, where concepts are far less concrete or widely agreed upon. 
It is lightweight ontologies that have the widest use, especially on the semantic web, and many of the ontologies within this book are of this type.

An upper ontology (also known as a foundation ontology) is a general all-inclusive ontology that can theoretically connect all others. Such an ontology can aid ontology interoperability and alignment, and provide a starting point for developing more specific domain ontologies (Opalički and Lovrenčić, 2012). Examples of upper ontologies include Suggested Upper Merged Ontology (SUMO) (www.adampease.org/OP), OpenCyc (www.cyc.com/platform/opencyc) and the Basic Formal Ontology (http://ifomis.uni-saarland.de/bfo). Whether a single, universal ontology is feasible or desirable for representing the myriad of views and perspectives from different domains is open to debate, and is often ignored in the linked data approach to a semantic web.

In this work the focus is less on upper ontologies, and more on what may be referred
to as middle-level ontologies, those that are not designed to be universal but are nonetheless designed to accommodate data from a large number of domains. These include Europeana Data Model and CIDOC-CRM, both of which are returned to in Chapter 3, along with one upper ontology, the Basic Formal Ontology.

Application profiles have been defined as: ‘. . . schemas which consist of data elements drawn from one or more namespaces, combined together by implementors, and optimized for a particular local application’ (Heery and Patel, 2000). They reflect the practical application of ontologies to meet real-world needs that may differ considerably from strict standards described in the original documentation. Increasingly, however, attempts have been made to accommodate the differences in the requirements of the standard makers and the implementers. Dublin Core Terms were developed with application profiles and the semantic web in mind (Baker, 2012), whilst Resource Description and Access (RDA) has both constrained and unconstrained properties, with the unconstrained properties being independent of the overarching Functional Requirements for Bibliographic Records (FRBR) model and having no explicit range or domain. Dublin Core Terms, RDA, and the FRBR model are all returned to in Chapter 3.

There is also a range of ontology languages, or meta-ontologies (Stewart, 2011, 126), ‘formal languages used to construct ontologies’ (Kalibatiene and Vasilecas, 2011). Each of these languages may allow for different levels of expressiveness and comprehensiveness, and there have been a number of comparisons of the different languages over the years (e.g., Gómez-Pérez and Corcho, 2002; Kalibatiene and Vasilecas, 2011).
Whilst there are a number of traditional ontology languages and web-based ontology languages, and there will undoubtedly be new entrants into the market in the future, the ontology languages focused on in this book are primarily the W3C recommendations for the semantic web: Resource Description Framework (RDF), RDF Schema (RDFS), and Web Ontology Language (OWL). In Warren et al.’s (2014) survey of ontology use, of the 65 respondents answering the question of which language they used, 58 stated OWL, 56 RDF and 45 RDFS. There are well-known ontologies that have been published in other languages, e.g., SUMO was written in SUO-KIF, itself a variation of the Knowledge Interchange Format (KIF) (Niles and Pease, 2001), and OpenCyc makes use of CycL (Matuszek et al., 2006), but the potential of the semantic web for bringing together distributed data means that there is often a semantic web version of the ontologies too. The structuring of the semantic web is returned to in more detail in Chapter 2.

Ontologies, metadata and linked data

The definition of an ontology provided above overlaps with both metadata and linked
data, and it is important to recognize the similarities and the differences between the different concepts, and how they overlap.

Metadata is generally defined as ‘data about data’, and information professionals within cultural heritage institutions have traditionally focused heavily on the creation of metadata to describe the objects within their respective collections. Extensive standards and methodologies have independently been created for cataloguing and classifying objects within each type of institution, whether archive, museum, or library, with the metadata elements reflecting those aspects considered most important within the community’s culture. This may be the importance of the fonds to the archival community, reflected in the ability of Encoded Archival Descriptions (EADs) to describe not only an archive collection but also increasingly smaller parts of the collection in a hierarchical fashion, or the extensive history of a specific object that is possible through the Categories for the Description of Works of Art (CDWA).

The traditional distinction between metadata and data breaks down, however, as we move from real-world objects to digital objects, and many (often computer scientists) will say there’s no point in distinguishing between the two: it’s all just data. As van Hooland and Verborgh (2014, 3) put it: ‘Just as you can always add an extra Lego piece on top of another, you can always add another layer of metadata to describe metadata.’ Within this work the term metadata is limited to its traditional sense, a set of elements used to describe a distinct resource, not a part of the resource itself. Where the resource that is being published is a dataset, and if the dataset has been published as linked data and the metadata has been published as linked data, then it may be meaningless to distinguish between the two.
Linked data is the best practice for publishing structured data on the web (van Hooland and Verborgh, 2014), which is generally agreed to be in accordance with the four linked data principles set out by Tim Berners-Lee:

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs so that they can discover more things
Berners-Lee, 2006

Linked data is an approach to data interoperability which offers an alternative to having an upper ontology (Murdock, Buckner and Allen, 2012). It cuts through the complexity of understanding the relationships between different terms for types of object and attributes used within different data sets by allowing the direct linking between the terms and instances rather than understanding the relationship via an
upper ontology. It is not necessary to know that ‘watercolourist’ and ‘oil painter’ are linked via the concepts ‘painter’ or ‘artist’ – instead the person J. M. W. Turner in one data set may be linked to the person J. M. W. Turner in the other directly.

It is important to recognize, however, that not all ontologies are encoded as linked data, and not all linked data is an ontology. An ontology does not have to be published on the web or necessarily follow the graph data model of the semantic web’s Resource Description Framework (RDF); instead it may only be used on a private network (or even a single computer) and follow a proprietary format. Alternatively, a wide variety of data may be published as linked data without being an ontology, although when linked data may be considered an ontology and when it isn’t is open to debate.

Two factors that may be used in distinguishing between linked data that is an encoded ontology and linked data that isn’t an ontology are dynamism and exhaustiveness. An ontology is a formal representation of knowledge – it is not the same as a dynamic database of information; whereas the library catalogue may be considered an ontology data set or knowledge base, with rich relationships between authors and their works, the circulation aspect of an integrated library system would not be. ‘Formal’ also suggests that an ontology is not an ad hoc piece of data marked up as linked data; marking up the relationships between all the members of the Pre-Raphaelite Brotherhood in accordance with a particular element set might be considered an ontology, whereas someone marking up the contact details on their website would not be (although the element set used to mark up the contact details might be).

What can an ontology do?

Hedden (2010, 15) identifies three principal purposes for taxonomies, each of which equally applies to ontologies: indexing support, retrieval support, and organization and navigation support.
In addition, an ontology can also act as a knowledge base.

Indexing support

Despite advances in automatic indexing, human cataloguing and indexing continue to be an important part of the information profession, and controlled vocabularies can ensure consistency in the terms that are applied. An ontology enables an indexer to think more broadly about the terms that are applied, with a wider range of associated terms applicable.
Retrieval support

Information retrieval is the other side of indexing and cataloguing; the same terms that are used to index a document can then be used to retrieve it. The ontology, however, has a couple of advantages over other controlled vocabularies: less ambiguity and the potential of complex queries and inference.

All controlled vocabularies are designed to be as unambiguous as possible, distinguishing between potentially confusing terms through the use of subdivisions, attributes, and scope notes. They are, nonetheless, subject to human error, both in their design and their implementation, and a richer set of relationships with other terms offers less room for ambiguity.

The rich set of relationships within ontologies also allows for more complex queries to be created for information retrieval. Whereas traditional search is built upon Boolean operators and faceted search, ontologies allow for increasingly complex graph matching. Ontologies can be represented by a graph consisting of concepts and the relationships between them; for example, Figure 1.2 shows the twelve apostles of Jesus and the relationships between them as a graph. For the sake of ease, within Figure 1.2 each of the people is represented by his name rather than a unique identifier which has a name as an attribute, and the fact that Simon (brother of Andrew) was subsequently called Peter is overlooked.

[Figure 1.2 A graph of Jesus and his twelve apostles: Jesus is linked by ‘has Apostle’ to Bartholomew, Matthew, James, Philip, Thaddaeus, Simon, Thomas, Judas Iscariot, Simon, Andrew, James and John, with ‘has Brother’ links between Simon and Andrew and between James and John]

This simple graph only includes two types of relationship, ‘has Apostle’ and ‘has Brother’, and yet already graph matching enables the retrieval of results for more complex queries. If such an ontology
  • 27. had been used in the cataloguing of a set of religious texts that have been variously ascribed to Jesus and his apostles it would now be possible to retrieve results as well as information from the ontology, by matching query graphs against the knowledge graph and including variables for the unknown data that the query should retrieve. Matching the following graph against the graph of Jesus and his twelve apostles would retrieve all the apostles for the unknown VARIABLE: Simon, Andrew, James and John would be found to match the unknown VARIABLE 1 (and VARIABLE 2) in the following graph: e graph matching doesn’t have to be built on explicit relationships alone, but may also be built on inferred relationships. ‘Inference’ refers to the drawing of new relationships from a data set based on existing relationships and a set of rules. At its simplest it may be an understanding of what type of thing an entity is, based on its relationship with something else. For example, if in a bibliographic ontology the information <Charles Dickens><has written><Hard Times> is encoded, and the relationship ‘has written’ is only being used to express the relationship between an author and a work, then the fact that Charles Dickens is an author and Hard Times is a work can be inferred from the information. More extensive rules can allow for greater inference. For example, a genealogy ontology may encode two facts, that <Adam><has Son><Cain>, and <Adam><has Son><Abel>. If, as is normally the case, the <has Son> is only used where the target is a male, it may be inferred that both Cain and Abel are male. An additional rule stating that sons of the same parent are brothers would also allow this information to be inferred. Organization and navigation support ‘Organization and navigation support’ is about the ability to find information through browsing rather than searching, following the relationships between terms to find related concepts. 
[Query graphs for the retrieval examples: (1) Jesus ‘has Apostle’ VARIABLE; (2) Jesus ‘has Apostle’ VARIABLE 1 and VARIABLE 2, with ‘has Brother’ linking VARIABLE 1 and VARIABLE 2.]
For example, the online store Amazon.com has an extensive
  • 28. taxonomy through which a shopper may browse all the way from the general to the highly specific:
Books > Politics & Social Sciences > Social Sciences > Library & Information Science > Library Management
Without much experience of a particular taxonomy it may be difficult to find the desired subject, especially when the taxonomy is extensive. Different people will inevitably make different decisions about the structure of a taxonomy for similar materials, and users of the taxonomy will have to learn the taxonomists’ idiosyncrasies. For example, on the Amazon.co.uk site ‘Library & Information Sciences’ is found under ‘Reference’ rather than ‘Social Sciences’ (as it is on Amazon.com) and has no further subdivisions:
Books > Reference > Library & Information Sciences
– whilst Amazon.ca contains 18 terms narrower than ‘Library & Information Science’, in comparison to Amazon.com’s five. Although a taxonomy may be wrong in many different ways, there is no single correct taxonomy.
An ontology has a more complex set of relationships than a thesaurus, which creates additional challenges for enabling the browsing of resources. Whereas a thesaurus may be kept separate from the content, e.g., running down the side of the page, an ontology may be incorporated throughout the structure of the page. For example, the BBC has developed the Programmes Ontology element set (www.bbc.co.uk/ontologies/po) to facilitate access to the vast data set about the corporation’s programme output and associated individuals; this information is encapsulated within a whole web page rather than one small part of it. Visiting the regular URI for the long-running radio soap opera The Archers (www.bbc.co.uk/programmes/b008ncn6) will provide a typical HTML page of information about the series; adding .rdf to the end (www.bbc.co.uk/programmes/b006qpgr.rdf) will provide the underlying information in a machine-readable format.
The ontology as a knowledge base
An additional purpose, unique to ontologies amongst controlled vocabularies, is the
  • 29. ontology as a knowledge base. The rich web of knowledge within an ontology and the ability of inferences to be drawn on existing relationships mean that ontologies can be a rich store of knowledge, not just a means to retrieve knowledge from resources indexed with a particular ontology. Certain general ontologies, such as DBpedia, draw together a wide range of information into one data set, and may be queried to produce results in a form that has not been compiled previously.
Current approaches to information retrieval are limited in their ability to discover new information (Stock et al., 2012). The use of an ontology as a knowledge base, as well as increasingly sophisticated information retrieval, is also likely to help with the discovery of undiscovered public knowledge. Undiscovered public knowledge is the idea that the discovery of new knowledge does not have to be based only on the investigation of the real world of physical objects and events, but can also come through the interrogation of objective knowledge. Swanson (1986) identifies three forms this undiscovered public knowledge may take:
1) A hidden refutation: the hypothesis and its refutation may not both be known to any one person.
2) A missing link in the logic of discovery: if no one person knows both that A causes B and that B causes C, then the inference that A causes C cannot be known.
3) Combination of multiple tests: a meta-analysis of multiple weak tests may nonetheless provide a strong result.
Each of these is fundamentally an information retrieval problem: ensuring that both hypothesis and refutation are found by a search; ensuring subsequent statements are found; ensuring that all available tests of sufficient quality are identified. Ontologies can undoubtedly improve information and knowledge retrieval, and help with the mining of undiscovered public knowledge, in an increasingly automated fashion.
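Swanson’s ‘missing link’ case can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a tool discussed in the book: the statements, the source labels and the helper name infer_missing_links are all hypothetical, and ‘causes’ is assumed to be a relationship that can be chained.

```python
# Hypothetical statements, each recorded by a different (invented) source.
statements = [
    ("A", "causes", "B", "paper-1"),
    ("B", "causes", "C", "paper-2"),
    ("D", "causes", "E", "paper-3"),
]


def infer_missing_links(facts):
    """Return (x, 'causes', z) for each two-step chain x -> y -> z
    that is not already stated directly. Only one chaining step is taken."""
    stated = {(s, o) for s, p, o, _source in facts if p == "causes"}
    inferred = set()
    for x, y in stated:
        for y2, z in stated:
            if y == y2 and x != z and (x, z) not in stated:
                inferred.add((x, "causes", z))
    return inferred


missing_links = infer_missing_links(statements)
print(missing_links)
```

Once the two statements sit in the same data set the link between A and C can be found mechanically, even though neither source states it; whether the inferred link is actually true still has to be judged by a person.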
Ontologies and information professionals
This book is being published at a pivotal point in the history of ontologies. On the one hand the web and the development of semantic web technologies have provided the opportunity for ontologies to be adopted by more people in more places than ever before – bringing together data from around the world into one huge data set that can be queried by anyone. On the other hand the ideals of a semantic web have had to adapt to the practicalities of human abilities, recognizing the importance of publishing data even if it is not accompanied by robust formal ontologies.
This book will not only emphasize the importance and potential of ontologies, but also the importance of the community of information professionals contributing to
  • 30. the development of new, and increasingly useful, ontologies. Murdock, Buckner and Allen (2012) point out that one of the problems with ontology development is the need for ‘double experts’, those with knowledge of ontology design and subject domains. The community of information professionals has a long tradition of being ‘double experts’, often coupling a postgraduate information science degree with a subject specialism, and is ideally placed for a role in facilitating access to the web of data and the development of ontologies. The role is particularly important if we are to avoid the risk that an ontologist’s imposition of a domain ontology masks how practitioners construct meaning (Pike and Gahegan, 2007).
Knowledge and experience of using knowledge organization systems is a prerequisite for many jobs within the information profession, and the need for knowledge of ontologies more specifically is only likely to increase in the future. As well as taxonomists and ontologists, for whom the development and maintenance of controlled vocabularies may be a full-time role, knowledge and experience of ontologies is also necessary as part of a wider skill set in cataloguing, metadata and curation roles. For those working as a taxonomist for a global information service, a metadata librarian in a university library, a digital asset cataloguer in a commercial company or a records manager in a non-profit organization, it is increasingly difficult to overlook the importance of ontologies.
The focus of the ontologies in this book is on those that are being used on the semantic web. There are, of course, many bespoke and proprietary ontologies used within commercial organizations, attempting to bring together the disparate information created by departments and units, but those that are of greatest interest are those that provide the opportunity to share more data than ever before and develop new insights from across the world.
Alternatives to ontologies
It is important to recognize that ontologies have limitations, and that there are alternative ways of capturing and analysing data. Some of the limitations can be traced to the fundamental assumptions that are made when encoding knowledge within ontologies. Brewster and O’Hara (2007) note two such assumptions: first, the monolithic nature of knowledge that is continually added to; and second, that concepts are the fundamental units of ontologies, and these are manipulated with language. Although there may be few, if any, Kuhnian paradigm shifts (Kuhn, 1970) that invalidate the whole of an ontology, there will nonetheless be changing perspectives on the meanings and relationships of individual concepts. This is especially true outside the sciences, where the meaning of concepts and the relationships with associated concepts can be open to vigorous debate. There is also much that is difficult
  • 31. or impossible to put adequately into words – so-called tacit knowledge (Polanyi, 1966) – although Shadbolt and Smart (2015) suggest that rather than tacit knowledge being seen as something that is impossible to articulate, it should be seen as something that is more easily articulated in some situations than others.
Approaches to knowledge representation can be broadly categorized as either top-down or bottom-up (Pike and Gahegan, 2007). Whereas ontologies can often be considered top-down models of the world, especially when considering the creation of universal ontologies such as OpenCyc, the development of linked data and the semantic web allow for a more bottom-up approach with competing ontologies and potentially conflicting perspectives. However, even bottom-up approaches to capturing knowledge from the data that is available have limitations.
The sheer quantity of information available on the web provides, and necessitates, alternative ways of capturing data, through automatic reading and natural language processing (NLP). NLP can be used both to extract terms for an ontology or thesaurus, and to apply terms from an ontology or thesaurus during indexing; the difference between structured and unstructured data is becoming increasingly blurred (van Hooland and Verborgh, 2014). NLP has its limitations, however, and depending on the content and purpose of the NLP it is better categorized as a semi-automatic rather than an automatic process. NLP is not the principal subject of this book, but it is likely to play an increasing role in the development of ontologies in the future, and the subject is returned to in Chapter 5.
Neither the limitations of ontologies, nor the alternatives, dismiss the need for, or importance of, ontologies. Rather, they help us understand where and when ontologies are appropriate. It may be that in some situations a simpler form of controlled vocabulary is more appropriate, either a thesaurus or an authority list.
Data may be better stored in a list, a spreadsheet or a relational database than as a graph, whilst certain types of tacit knowledge may be better captured through video than by trying to put it into words.
Brewster and O’Hara (2007) note that criticisms have been made that ontologies demand too much work and are too rigid, but such criticisms have been made about many core information activities, such as cataloguing and classification in the age of the web, and what we find is that most often new technologies complement rather than replace existing technologies. Rather than search engines replacing the library catalogue, the library catalogue is increasingly integrating its own information services with the web and, increasingly, the semantic web. Rather than ontologies replacing earlier forms of controlled vocabularies, they complement them, providing an increasingly powerful tool for information retrieval and knowledge representation.
  • 32. The aims of this book
There are three main aims for this book. The first is to demonstrate to the information professional the importance of ontologies for knowledge discovery. The second is to demonstrate the important contribution information professionals can make to the development of ontologies. Finally, the book aims to provide a practical introduction to the development of ontologies for information professionals.
This introductory chapter will, hopefully, already have gone some way to demonstrating the importance of the development of robust and widely used ontologies in the fight against information overload, and the role of the information professional in the process. These ideas will continue to be developed and reinforced throughout the rest of the book.
In addition to demonstrating the importance of ontologies and the role of the information professional, the book is also designed to be a practical introduction. It will introduce some of the existing dominant ontologies that are likely to be of interest to the information profession, as well as the methods and tools necessary for building new ontologies and interrogating existing ontologies. LaPolla’s (2013) survey found the implementation of semantic web compliant catalogues was hindered by a lack of funding, best practice and awareness of the associated concepts. Whilst the book can do little about the lack of funding, it will both contribute to discussion on best practice and increase familiarity with many of the basic concepts. Although a majority responding to LaPolla’s survey had some familiarity with semantic web concepts, this finding is clouded by the fact that it was a self-selecting survey and it seems likely that those with little interest in the semantic web didn’t bother with the survey.
Even amongst those who completed the survey, whereas the vast majority were either very familiar or somewhat familiar with the concept of the semantic web (90.16%) and linked data (95.52%), familiarity with the more specific technologies necessary for implementation was far lower: Web Ontology Language (OWL), 53.21%; Simple Knowledge Organization System (SKOS), 43.59%.
No single book could provide an exhaustive introduction to the practicalities of ontology use and development. Whole books have been written on technologies that are covered here in one or two pages; there is a huge variety of software available for ontology development; new ontologies are being developed (as well as old ones falling into disuse); and old standards are changing while new ones are introduced. Nonetheless, the underlying methods of ontology development change more slowly than the specifications, and by focusing on the underlying theory the skills related to one set of technologies can be applied to others.
  • 33. The structure of this book
The rest of this book consists of six chapters, from introducing the semantic web and some existing ontologies, through adopting, building and interrogating ontologies, to the future of ontologies:
Chapter 2 – Ontologies and the semantic web
Ontologies have gained added significance in recent years through the adoption of an increasingly semantic web. Chapter 2 provides an introduction to the semantic web and the role of ontologies, and how ontologies have been increasingly adopted in a wide variety of libraries as well as other cultural heritage institutions and commercial organizations.
Chapter 3 – Existing ontologies
There is a wide variety of ontologies that have been developed, and knowledge of the dominant ontologies, their applications and their differences is increasingly essential to the information professional. Chapter 3 considers some of the main ontologies, including those ontologies used for representing ontologies, those widely adopted by libraries and those widely used on the web.
Chapter 4 – Adopting ontologies
The reuse of existing ontologies is important for both the integration of data across different systems and to avoid the repetition of work. Chapter 4 considers the tools that are available for identifying existing ontologies, how the ontologies (or elements thereof) can be combined in the creation of application profiles, and some of the criteria that should be considered when selecting ontologies.
Chapter 5 – Building ontologies
It is increasingly important that information professionals are not only users of existing ontologies, but that they build their own ontology for particular applications. Chapter 5 provides both a methodology for building an ontology and an overview of some of the tools that are available, before leading the reader through the development of a simple ontology with Protégé, the most popular (and free) software for ontology development.
  • 34. Chapter 6 – Interrogating ontologies
Ontologies are not only of interest for the structure they provide, but also for the data that they contain. Chapter 6 provides an overview of tools available for interrogating semantic web ontologies, both through the SPARQL Protocol and RDF Query Language (SPARQL) and web crawlers, to gain new insights.
Chapter 7 – The future of ontologies and the information professional
The final chapter looks to the future of ontologies and the role of the information professional in their development and use. The future of ontologies will undoubtedly be a mixture of lightweight and more formal ontologies, and their development is likely to be integrated with other technologies such as Natural Language Processing and potentially crowdsourcing workflows. The contribution of the library and information professional to ontology development also has the potential to change, expanding from the bibliographic ontologies that will undoubtedly occupy them in the short term to the development of niche subject-specific ontologies in the long term.
  • 35. CHAPTER 2
Ontologies and the semantic web
Introduction
Interest in ontologies has grown rapidly in recent years due to the adoption of an increasingly semantic web. The web is by no means the only place where ontologies may be implemented, but it is the use of ontologies on the semantic web that is the primary focus of this book, as they have the greatest potential, and as such are likely to be of greatest interest to the modern library and information professional. The chapter starts with an introduction to the semantic web and its most recent incarnation as linked data, before considering more closely the standards that have been adopted for structuring the semantic web. Finally, the last part of the chapter looks at how ontologies have been increasingly adopted in a wide variety of libraries as well as other cultural heritage institutions and commercial organizations.
The semantic web and linked data
The semantic web is about moving from a web of documents to a web of data, from one that is primarily designed to be read by humans to one that can be read by machines. It first started gaining widespread attention in 2001 with publications in Nature (Berners-Lee and Hendler, 2001) and Scientific American (Berners-Lee, Hendler and Lassila, 2001).
The web has put vast quantities of information at our fingertips, but much of this information is unstructured and it requires a lot of effort to gather and analyse the information resources that we need. For a simple informational query we pay little attention to the effort required. If we want to know what time a show starts it is generally simple enough to enter the name of the theatre and browse the pages for show times. But as queries require collecting data from multiple sites, the task can quickly become arduous. Wanting to know which shows are playing in a five-mile radius of where I am and which start after 8 p.m.
would require aggregating information from multiple sites, or at least visiting a site that had aggregated that information on my behalf. Some types of information have many aggregating sites (e.g., hotel and holiday information), but there is a vast amount of
  • 36. information that may not be commercially viable for aggregation. Also the aggregators are not necessarily aggregating the information that you want aggregated; you may want to know the length of the show, the suitability for a particular age group, or the accessibility of the venue. If, however, each site makes its data available in an appropriately structured format, it becomes simple for this data to be gathered and queried automatically by a wide range of web agents and services, each of which can query for the information that they require.
The original vision of the semantic web promised a future where an increasing number of online activities could be accomplished automatically as automated agents carry out tasks on people’s behalf, not only retrieving information, but potentially carrying out simple transactions, albeit non-financial ones in the short term. Despite recognition of the potential of a semantic web its initial adoption may be considered to have been quite slow: most of our online activities still require a significant amount of human involvement. Nonetheless, progress on the semantic web has been made, not only with the establishment of new specifications, but also with the establishment of a new paradigm for the publishing of data: linked data. Linked data prioritizes the publishing of data in a machine-readable format rather than the underlying concepts (van Hooland and Verborgh, 2014), and the simplicity of the approach has encouraged the publishing of data on the semantic web by a wide range of individuals and organizations.
There is also a lot of interaction by people with semantic web technologies that is hidden from view. Most people’s experience of a semantic web is not through automatic agents but through the knowledge bases that have been created by the major search engines and incorporated into their search results.
For example, Google has been incorporating its Knowledge Graph (www.google.com/intl/bn/insidesearch/features/search/knowledge.html) knowledge base into its search results since 2012. More recently it has been working on a Knowledge Vault, where the facts are extracted from the web automatically and which offers an unprecedented collection of facts (Hodson, 2014). Nonetheless, even such a vast collection of facts is organized in the same manner as the semantic web – in RDF triples.
Resource Description Framework (RDF)
The Resource Description Framework (RDF) is a conceptual model for making statements about resources through RDF triples. RDF triples are a way of expressing and relating information as three-part statements or ‘facts’ structured in a simple subject-predicate-object format. For example, in plain English, a triple could be:
<David><hates><Apple>
  • 37. David is the subject, hates is the predicate, and Apple is the object. The idea that such simple facts may be used to encode all human knowledge may be hard to believe but, as has been observed by Novak, ‘if the structure and function of all organisms that live or have lived on earth can simply be coded by triplet sequences of four nitrogenous base pairs A, G, T and C, there is no reason for a knowledge record to be more complex’ (cited in Jakus et al., 2013).
Modelling the data in such a way makes it suitable for the distributed nature of the semantic web, as ‘anyone can make statements about any resource’ (W3C, 2004). If someone else disagrees with this statement, believing <David><loves><Apple>, or wants to add an additional associated statement such as <Steve><loves><Apple>, then there is nothing to stop them – a process that would be much more difficult if the initial data was structured in a table or a relational database on the web.
Of course, having plain text can lead to ambiguity. After all, most people reading the triple <David><hates><Apple> will know many people called David (both fictional and real) and may associate multiple different objects and organizations with Apple. There are, after all, apples the fruit, Apple the music label established by the Beatles and Apple the technology company. People reading the triple may presume that the David referred to is the author of the book, and based on the subject of the book may presume David is more likely to have strong feelings on a technology company than on a type of fruit or a music label. But none of this is explicit, and for computers to understand the information, and for multiple graphs to be joined together, it needs to be made explicit.
To make the triple explicit, URIs may be used to represent each of the ‘resources’; on the semantic web anything that can be represented is referred to as a resource.
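As a rough illustration of why explicit identifiers matter, the following Python sketch stores triples as tuples of URIs and matches simple patterns against them. It is a toy under stated assumptions, not an RDF library: the socialnetwork.com and relationships.com addresses are fictitious, the Steve and AppleFruit identifiers are invented for the example, and the match function is a hypothetical helper.

```python
# Each fact is a (subject, predicate, object) triple of URIs.
# All of these URIs are fictitious and used for illustration only.
SN = "http://www.socialnetwork.com/"
REL = "http://www.relationships.com/"

triples = {
    (SN + "David", REL + "hates", SN + "AppleTech"),
    (SN + "Steve", REL + "loves", SN + "AppleTech"),
    (SN + "Steve", REL + "loves", SN + "AppleFruit"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a variable."""
    return sorted(
        t for t in graph
        if subject in (None, t[0])
        and predicate in (None, t[1])
        and obj in (None, t[2])
    )


# Which statements have been made about Apple the technology company?
about_apple_tech = match(triples, obj=SN + "AppleTech")
for s, p, o in about_apple_tech:
    print(s, p, o)
```

Because the technology company and the fruit have different URIs, Steve’s statement about the fruit is not returned by the query, which a plain-text ‘Apple’ could not guarantee.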
Figure 2.1 shows a graph for <David><hates><Apple>. Resources are represented by ovals, and literals are represented by rectangles. In this case the ‘hates’ relationship is expressed between URIs representing David and Apple on a fictitious social network, with rdfs:label from the RDFS vocabulary used to provide human-readable labels for those URIs. The relationship also takes the form of a URI, allowing a relationship to be used from an existing ontology (albeit in this case also a fictitious one).
Once subjects, predicates, and objects are unambiguous, multiple facts can be combined into a single graph that can then be queried by a computer.
[Figure 2.1 David hates Apple graph: www.socialnetwork.com/David (rdfs:label ‘David’) is linked by www.relationships.com/hates to www.socialnetwork.com/AppleTech (rdfs:label ‘Apple’).]
For example,
  • 38. two additional pieces of information have been added to the graph: David knows Bob; and Bob loves Apple (Figure 2.2). Luckily the use of URIs means that it is possible to distinguish between Apple the music label and Apple the technology company, and we do not expect it to be a source of friction between Bob and David.
In this graph the ‘knows’ relationship has been taken from the existing FOAF (Friend of a Friend) ontology. FOAF is one of the most established ontologies on the web, with many of the properties regularly being reused across the web. As popular ontologies are reused it is possible for additional tools and services to make use of the data, as people will already know what the property means. FOAF is returned to in Chapter 3.
Classes, subclasses and properties
Whilst the semantic web is designed to let anyone make statements about any resource, ontologies are needed to constrain what can be said. This is achieved through classes, subclasses and properties.
A class is a set of things with properties in common. For example, an ontology may have a ‘person’ class, an ‘event’ class, or a ‘place’ class. Properties are the attributes associated with particular classes. For example, a person is likely to have a name, and depending on the type of ontology may also have a sex, date of birth, place of birth, e-mail address, job title, or any other type of attribute that could be associated with a person.
[Figure 2.2 David hates Apple, but knows Bob who loves Apple: www.socialnetwork.com/David (rdfs:label ‘David’) is linked by www.relationships.com/hates to www.socialnetwork.com/AppleTech (rdfs:label ‘Apple’) and by foaf:knows to www.socialnetwork.com/Bob (rdfs:label ‘Bob’), which is linked by www.relationships.com/loves to www.socialnetwork.com/AppleMusic (rdfs:label ‘Apple’).]
Ontologies may also state the cardinality of a property (i.e., the number of times it may be associated with a particular entity) and state the type of objects a
  • 39. property can have as a target. For example, place of birth may either be restricted to a literal (i.e., string of text) or a link to another resource (e.g., an entity of the place class).
Following typical semantic web style the property and class names used throughout the rest of the book make use of CamelCase (spaces between words are replaced by capital letters at the beginning of each word), class names are capitalized, and a CURIE (compact URI) style is adopted to show where a resource comes from a common data set or ontology. For example, foaf:Person refers to a class called Person from the foaf ontology, whereas foaf:familyName and foaf:age refer to properties from the same ontology. Where an example makes use of a single fictitious vocabulary or data set, and there is no advantage in coining a fake prefix, it is simply omitted, e.g., :colourOfHair would refer to a property from such a fictitious ontology.
The variety of properties that could be associated with a broad class such as Person is endless, even if many of the potential properties are highly unlikely to be of much use for most situations or apply to most people: numberOfTeeth; hasCircusSkill; hasPoliticalAffiliation; hasMurdered. Subclasses enable distinctions to be made between the properties that can be associated with different subsets of a class. For example, whilst every Person may have a name and a date of birth, hasCircusSkill may be associated with the Person subclass, Clown. An entity of the type Clown would inherit properties associated with both Clown and Person.
Similarly, properties may have subproperties. For example, there may be a ‘knows’ property associated with a Person class, enabling a relationship to be expressed between one person and one or more other people. But there may also be associated subproperties of knows, e.g., hasSon, hasEmployee, hasMentor. This allows ontologies to be queried at different levels of granularity.
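The inheritance of properties by subclasses, and the querying of subproperties at different levels of granularity, can be sketched in plain Python. This is a simplified illustration rather than how any ontology language implements it: the dictionaries, the helper names properties_for and is_a, and the people in the facts list are all hypothetical, while the class and property names follow the examples above.

```python
# A hypothetical mini-ontology using the class and property names from the text.
subclass_of = {"Clown": "Person"}
class_properties = {
    "Person": {"name", "dateOfBirth", "knows"},
    "Clown": {"hasCircusSkill"},
}
subproperty_of = {"hasSon": "knows", "hasEmployee": "knows", "hasMentor": "knows"}


def properties_for(cls):
    """Collect the properties of a class and of all its superclasses."""
    props = set()
    while cls is not None:
        props |= class_properties.get(cls, set())
        cls = subclass_of.get(cls)
    return props


def is_a(prop, target):
    """True if prop is target itself or a (transitive) subproperty of it."""
    while prop is not None:
        if prop == target:
            return True
        prop = subproperty_of.get(prop)
    return False


# Invented facts about invented people.
facts = [
    ("Alice", "hasSon", "Bob"),
    ("Alice", "hasMentor", "Carol"),
    ("Alice", "numberOfTeeth", "32"),
]

clown_properties = properties_for("Clown")
knows_facts = [f for f in facts if is_a(f[1], "knows")]    # coarse query
son_facts = [f for f in facts if is_a(f[1], "hasSon")]     # fine-grained query
print(sorted(clown_properties), knows_facts, son_facts)
```

A query for ‘knows’ returns both the hasSon and hasMentor statements, whereas a query for hasSon returns only the first; the same data answers both the broad and the narrow question.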
Decisions about what is, or is not, a class, subclass or property are not absolute and different people may make different decisions for different ontologies. If an agent only ever has one address in an ontology, then it may be that address properties can be incorporated into the agent class. If, however, an agent has multiple addresses of different types it makes more sense to group the associated properties together in a class of their own.
A number of technologies are necessary in going from an abstract idea about representing a fact as a triple, collecting these triples in classes, and encoding these triples in such a manner that they are widely understood and services can be built upon them. The necessary steps are generally illustrated with a semantic web stack.
The semantic web stack
The semantic web stack (also known as the semantic web layer cake) is used to represent the architecture of the semantic web. It has been visualized in a number of
  • 40. ways since it was first proposed, and specific information about the necessary technologies for the different stages has been included as specifications have been defined and recommended. Figure 2.3 is based on the Wikipedia version of the semantic web stack as of April 2015.
[Figure 2.3 The semantic web stack (based on http://en.wikipedia.org/wiki/Semantic_Web_Stack). Layers, from the bottom up: Character set: Unicode and Identifiers: URI; Syntax: XML; Data interchange: RDF; Querying: SPARQL, Taxonomies: RDFS, Ontologies: OWL and Rules: RIF/SWRL; Unifying Logic; Proof; Trust; User Interfaces and Applications; with Cryptography running alongside.]
Identifiers and character sets: URIs and Unicode
At the bottom of the semantic web stack are a number of already widely adopted technologies for encoding the characters and identifiers of the semantic web: Unicode and URIs. For the information professional, Unicode simply means that RDF is encoded in text, the sort that can be opened in Notepad on Windows or TextEdit on a Mac. The URIs (uniform resource identifiers) are globally unique identifiers, the most common of which are the URLs (uniform resource locators) that web users type in the address bar of their browser to get the web page they are interested in, e.g., http://www.bbc.co.uk. URIs also include URNs (uniform resource names) that are location-independent identifiers. For example, the URN urn:isbn:9781783300624 may refer to the book Practical Ontologies for Information Professionals, without it providing a location for information about that book.
Increasingly we talk about IRIs (internationalized resource identifiers) rather than URIs, as it is no longer necessary for resource identifiers to be restricted to
  • 41. the characters of the Latin alphabet. Due to issues regarding the potential lack of support of non-English character sets by semantic web tools, as well as a global recognition of the Latin alphabet, URIs are nonetheless currently preferred to IRIs in the development of ontologies and the term URI is used throughout.
Although any URI may be used as part of the semantic web, and it is not necessary for anything to be returned for a particular URL, it is recommended that they are nonetheless dereferenceable when used for linked data (Sauermann and Cyganiak, 2008). That means that when a URI is entered for a particular resource, information associated with that resource is returned. For example, the URI http://www.davidstuart.co.uk/resource/123 may be used to refer to a resource in a data set, but trying to retrieve data from the page would return an HTTP 404 ‘file not found’ error message. Obviously it is far more useful if the page returns the associated resource record. There are, however, issues that arise from creating dereferenceable URIs that need to be considered: most noticeably dealing with content negotiation, and distinguishing between a web page and the resources that the page describes.
Content negotiation is the HTTP process by which an HTTP client retrieves its preferred content. For example, a web browser will generally indicate that it prefers HTML and entering a URI in a web browser retrieves an HTML version of the page (if one is available). URIs are often also used to indicate to a server that a different version of a resource is required. For example, the BBC Programmes Ontology generally provides an HTML version of a resource (e.g., www.bbc.co.uk/programmes/b008ncn6) when viewed through a web browser; however, adding .rdf to the end (www.bbc.co.uk/programmes/b006qpgr.rdf) provides the underlying information in an RDF/XML serialization (discussed in more detail below).
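The two ways of asking for a machine-readable version described above, an Accept header or a modified URI, can be sketched with Python’s standard library. The sketch only builds the requests and does not send them; the helper names are hypothetical, application/rdf+xml is the registered media type for RDF/XML, and whether a particular server honours the header or the .rdf suffix is entirely up to that server.

```python
from urllib.request import Request

PROGRAMME_URI = "http://www.bbc.co.uk/programmes/b008ncn6"  # example URI from the text


def negotiated_request(uri, media_type="application/rdf+xml"):
    """Build (but do not send) a request that states the preferred format
    in the HTTP Accept header, leaving the URI unchanged."""
    return Request(uri, headers={"Accept": media_type})


def suffixed_uri(uri, suffix=".rdf"):
    """Build the alternative URI that names the format explicitly."""
    return uri + suffix


browser_like = negotiated_request(PROGRAMME_URI, "text/html")
rdf_client = negotiated_request(PROGRAMME_URI)

print(browser_like.get_header("Accept"))   # text/html
print(rdf_client.get_header("Accept"))     # application/rdf+xml
print(suffixed_uri(PROGRAMME_URI))

# To actually retrieve the data one would pass a request to
# urllib.request.urlopen(); that network call is left out of this sketch.
```

With content negotiation the same URI serves both people and machines, whereas the suffix approach gives each representation its own address.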
URIs can also have an important role in distinguishing between the page and the resource that the page describes. For example, when we dereference a URI representing Queen Elizabeth II, we do not expect to retrieve the Queen herself from the internet, but rather a page of information about the Queen, and it is important to be able to distinguish between the two. There are two approaches to doing this: hash URIs and 303 URIs (Heath and Bizer, 2011).

With 303 URIs (also known as slash URIs), when resource URIs are requested the server responds with a 303 See Other status code, redirecting the client to the URI of the page associated with the requested resource. This approach has been taken with the publishing of the DBpedia data set. Entering http://dbpedia.org/resource/Elizabeth_II in a browser will redirect to the page http://dbpedia.org/page/Elizabeth_II.

Hash URIs make use of the fragment identifier, which may be used in URIs to distinguish between parts of a document, to distinguish between the page and the resource. For example, if DBpedia had adopted hash URIs then a resource identifier could be distinguished from a page identifier (http://dbpedia.org/Elizabeth_II) through the use of a fragment identifier, typically
  • 42. #this (http://dbpedia.org/Elizabeth_II#this). Although best practice should enable the distinction between pages and resources, and the use of fragment identifiers is simple to implement, such distinctions are often overlooked.

Whether URIs are dereferenceable or not, one additional decision needs to be made in the coining of URIs: whether they should be descriptive or opaque. The difference between a descriptive URI and an opaque URI is the difference between http://dbpedia.org/resource/Elizabeth_II and http://dbpedia.org/resource/0128422. Without retrieving the associated resource record, the first URI is far more self-explanatory to a person reading the URIs, which may aid in both comprehending and creating RDF triples, although it should be remembered that for a computer the two URIs are equally meaningful.

Whilst the comprehensibility of descriptive URIs might make it seem as though they are the natural choice, there are a number of reasons why opaque URIs might be more appropriate, and they have been incorporated in a wide range of ontologies, e.g., CHEBI (www.ebi.ac.uk/chebi) and RDA (www.rdaregistry.info). It may be that opaque URIs are adopted to allow for the evolution of terms (van Hooland and Verborgh, 2014), to prevent the promotion of any single language, or simply because it is the simplest way to ingest the data in a system.

Syntax: XML

XML (Extensible Markup Language) is a markup language that is both machine and human readable. Although it is by no means the only format used for sharing semantic web data, and is increasingly being challenged by alternative formats, RDF/XML is the longest established of the semantic web serialization formats and most semantic web data is available in an XML format.
A simple XML file to describe a book may be structured as below:

<book>
  <title>Practical Ontologies for Information Professionals</title>
  <author>David Stuart</author>
  <isbn>9781783300624</isbn>
</book>

Structuring the file as above, however, does not provide unique references for the element names. The term author is applied in widely different ways in academia; whereas in the humanities author might be expected to imply the person has had an active role in composing a journal article, in the sciences it can often be applied to dozens of people who have had a role in carrying out an experiment, most of whom will not have had a role in writing up the results. If the data is to be shared between