The Semantic Web in Digital Libraries: A Literature Review
1. Stephen J. Stose 1
12/22/10
Introduction
The Web is a big place. Currently, it provides millions, if not billions, of documents, each of which a
human can read. Each of these documents has a set of terms meant to describe it, and hence
make it easier for users to locate. This meta-information allows search engines such as Google to map
the keywords users type to the terms describing each document. Humans, however, still have to read
through this information to determine whether each document is indeed relevant to one's search.
In this way, the Internet is structured syntactically. It is structured, but it does not think. In order to think,
terms need to be related somehow. "The book is written by Mark Twain" is a thought that can be
represented in propositional space. For those accustomed to searching the Web, typing the terms
"book" and "Mark Twain" is usually enough. These terms are usually connected by default with
operators such as AND, or, for more advanced users, with more specific operators (OR, NOT,
or the * wildcard). Given that each resource on the web is merely mapped by terms, the relationship of
Mark Twain to Book ("is a book written by") may be irrelevant to the Google search results we obtain.
If another person named Mark Twain had burned a book, however, a new relationship ensues, and
suddenly this opens up a completely new conceptual space not excluded by our original search terms.
In this way, relating the terms that describe resources by means of both a syntax and a semantics
facilitates narrowing the huge Web space into the context we specify. On the one hand, a semantics
will allow synonyms for "book" and "Mark Twain" (e.g., "Samuel Clemens") to be grouped under
one concept. If resources are also described with predicates that entail relationships between entities
("X burned Y" or "X wrote Y"), such relationships will also allow the Web to begin to think. In this
way, questions such as "who wrote Tom Sawyer" may locate only resources that provide answers to
that question, and exclude others.
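The difference between keyword matching and relationship-aware querying can be sketched in a few lines of Python. This is a toy illustration, not any real Semantic Web tool: the triples, the "John Doe" book-burner, and the `subjects_of` helper are all invented for this example.

```python
# A tiny "web of data": each statement is a (subject, predicate, object) triple.
triples = {
    ("Samuel Clemens", "wrote", "Tom Sawyer"),
    ("Samuel Clemens", "known as", "Mark Twain"),
    ("John Doe", "burned", "Tom Sawyer"),
}

def subjects_of(predicate, obj):
    """Return every subject that stands in `predicate` to `obj`."""
    return {s for (s, p, o) in triples if p == predicate and o == obj}

subjects_of("wrote", "Tom Sawyer")   # matches the author, not the book-burner
```

A plain keyword search for "Mark Twain" and "Tom Sawyer" would return both statements about the book; filtering on the predicate is what excludes the irrelevant one.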
For this reason, researchers must not only describe documents with descriptors, but also relate
these terms within higher-order representations called ontologies. In this review, I seek to explore these
ontologies in a way specifically relevant to digital libraries, not the Web in general. Digital libraries
are web resources that, compared to the Web in general, have been described with very detailed
metadata. As such, they are extremely well structured, and thus make excellent candidates for the
Semantic Web.
I'll begin by describing literature that explains the limitations of the Internet in its current form. Tim
Berners-Lee believed the Web should be made up not of interconnected resources, but of
interconnected data extracted from these resources. The bibliography I present here will seek to
develop explanations for the tools of this new trade, which are at times quite technical. These tools
require new language formats and new conceptual systems, which must be integrated with existing
systems of metadata description commonly employed in library systems. We'll review the goals of
these systems, look at how libraries might need to adapt in order to instantiate them, and consider
what might be required given such adaptation. I'll argue that libraries, especially digital collections, are
especially well suited to this project because they already have a strong syntactic system of
description in place. While current metadata standards are easier to adapt, they will still require unexpected and
difficult organizational changes within libraries.
World Wide Web Consortium (2010). Semantic Web. Retrieved April 11, 2010, from:
http://www.w3.org/standards/semanticweb/
This is the international standards organization for the World Wide Web, headed by Sir
Tim Berners-Lee. The organization ensures industry members agree on standardized versions of web
technology so that discrepancies and inconsistencies do not plague the web and web browser
technology. W3C maintains a web page that publishes standardization recommendations and rules, and
therefore also functions as a portal for learning about new technologies. Indeed, the Semantic Web is an
integral part of W3C's mission to develop the Web into a universal medium for knowledge exchange,
and the site covers future project possibilities still unrealized as well as formal specifications for others currently
underway. For instance, it is a principal source in the field for the specifications of RDF
(Resource Description Framework), many of the data exchange serializations (RDF/XML, N3, Turtle), as
well as notation schemas (RDF Schema) and ontology languages such as OWL. Several of
these technologies will be discussed below. W3C remains the best place on the web for
information regarding these issues, and hence I cite it here first.
Coyle, K. (2008). Meaning, Technology, and the Semantic Web. Journal of Academic
Librarianship 34(3), 263-264.
Coyle discusses the fundamentals of the Semantic Web as per Tim Berners-Lee's vision. This
article provides a basic framework for understanding how the Internet, as a repository of unstructured
and undifferentiated strings of text, can be harnessed with meaning that machines can process. It
transforms the "web of documents" into a "web of data" by creating "actionable information."
In beginner's language, Berners-Lee discusses how semantic triples combine fields (e.g., Dublin Core
fields, such that Creator = x and Title = y) with a predicate (x is the author of y), such that a human-language
question (e.g., "who wrote y?") might generate an answer. The author goes on to discuss how RDF is
written to infer meaning through vocabulary patterns (URIs), and what this might mean for libraries
and digital libraries in particular. The article set the stage for my research into how digital collections
can be properly planned to include RDF.
Miller, E. (2001). Digital libraries and the semantic web [Slides]. Retrieved from
http://www.w3.org/2001/09/06-ecdl/Overview.html
This is another introduction to the Semantic Web, published on the main consortium
website for the Semantic Web, the World Wide Web Consortium (W3C). The 39 slides are lessons that
focus on how the Semantic Web can create "actionable" searches within digital libraries if the digital
library has metadata amenable to an RDF schema of classes and subclasses, and properties and
subproperties. An example is made of this by explaining how the MIT digital repository uses the
DSpace digital management software to incorporate RDF, and how a formalized reference library such as
www.dmoz.org helps support the merging of distinct ontologies (i.e., vocabularies making up classes and
subclasses). This is a very basic introduction, but it does refer to many working examples.
Manola, F. & Miller, E. (2004, February 10). RDF Primer. Retrieved from
http://www.w3.org/TR/2004/REC-rdf-primer-20040210/
This is a comprehensive source for understanding RDF. Resource Description Framework is a
language that serves to represent information about resources spread throughout the World Wide Web.
RDF serves to make documents readable by machines, not just people, by describing resources using
simple properties and property values (i.e., triples). This document is a technical primer. It explains the
technical details of RDF and the RDF/XML that structures these values in terms of triples, or subject-predicate-object
propositions that become machine-readable. It explains how the Web is made up of
URLs and, more generally, URIs. We are familiar with URLs, but Uniform Resource Identifiers can be
created to allow these triples to identify the subjects, predicates, and objects in these propositions. These
URIs are often just fragments, preceded by #, trailing the URL. The triples should be expressed in a
common vocabulary, which is constructed into an ontology, something we'll be discussing more in this
bibliographic review. This resource really explains the technical mechanisms involved, and does not
just refer to using the Semantic Web for digital libraries, but for the Web in general.
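The primer's subject-predicate-object model can be sketched in miniature. In the sketch below, the Dublin Core element namespace is real, but the example.org resource and its fragment identifier are invented placeholders:

```python
# Namespaces: the Dublin Core URI is real; example.org is a placeholder domain.
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/books"

triple = (
    EX + "/tom-sawyer",      # subject: a URI naming the resource
    DC + "creator",          # predicate: a URI, not a bare keyword
    "Mark Twain",            # object: here a plain literal value
)

subject, predicate, obj = triple

# Fragment identifiers (the part after #) can also mint URIs off a base URL:
concept = EX + "#firstEdition"
```

Because the predicate is a globally unique URI rather than the bare word "creator", two repositories using the same URI are, by construction, talking about the same property.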
Macgregor, G. (2008). Introduction to a special issue on digital libraries and the semantic
web: context, applications and research. Library Review, 57(3), 173-177. doi:
10.1108/00242530810865457.
This article provides some context for our review regarding the integration of the Semantic Web
into digital libraries. It describes the Semantic Web as allowing machines to read the web, much as
humans can. Alternatively, and more specifically, the Semantic Web seeks to include metadata about
the semantics of a web resource (keyword relationships), and not just its syntax (keyword mapping).
Currently, digital libraries describe their digital objects with fields, but do these fields relate to one
another semantically? And do they relate to other digital libraries semantically, such that "author"
might also be understood as "creator" or "100 $a Melville, Herman" (in MARC21)? This article
introduces the Semantic Web to digital curators by providing a basic-level explanation of its
potential uses. The Semantic Web promotes better collaboration and exposure of digital objects;
enhances navigation and retrieval amongst heterogeneous environments; allows user profiling and
personalization; and improves interfaces and human-computer interaction. This article seeks to
introduce librarians to the Semantic Web, serves as an introduction to its use, seeks to allay fears about its
integration, and attempts to make the topic more accessible than it is perceived to be.
Fox, E. A., & Marchionini, G. (1998). Toward a worldwide digital library.
Communications of the ACM, 41(4), 29-32. doi: 10.1145/273035.273043.
The article outlines how digital libraries constitute very complex information systems in that
they involve collaboration, preservation, database management, instruction and learning, filtering and
retrieval, property rights, multimedia services, selection/collation, and reference and discovery. These
systems augment the physical objects they represent, and as such are available worldwide, thereby
opening up to a range of uses and users for global exchange and international understanding. A key to
harnessing their power and linking these nationalized resources (the authors use examples of national
federated digital libraries in many countries), however, is interoperability. Linking these resources
requires 1) technical interoperability (networks and protocols), 2) informational interoperability
(language, metadata, naming conventions, and software interfaces), and 3) social interoperability
(personal and organizational rights/responsibilities). These difficulties are discussed in a series of
articles that discuss Z39.50, multilingual support, community and government integration, and the
collaborative design of interfaces and query formulation. The article is useful insofar as it serves
as a reminder of the goals of designing a digital library, and that technology should not be
mistaken for more than a tool.
Lytras, M., Sicilia, M., Davies, J., & Kashyap, V. (2005). Digital libraries in the knowledge
era: Knowledge management and Semantic Web technologies. Library Management, 26(4/5), 170-
175. doi: 10.1108/01435120510596026.
These authors describe the Semantic Web as an extension of the current "metadata intensive"
fields of digital library and digitization programs. The commitment digital libraries keep to a formal
foundation of metadata provides a distinct advantage over traditional HTML markup (which has few
"meta" tags). While metadata production and maintenance is already time-consuming, integrating these
services with annotations that refer to shared ontologies may indeed create more complexity, though much
less than would be involved with the Internet in general. This is the first paper in a special issue of Library
Management, and it serves to summarize many of the more technical papers that follow. Many of those
papers are cited in the current annotated bibliography, including Sure & Studer (2005), as well as
Ferran, Mor, & Minguillón (2005).
Sure, Y., & Studer, R. (2005). Semantic Web technologies for digital libraries. Library
Management, 26(4/5), 190-195. doi: 10.1108/01435120510596044.
This article discusses the fact that computing standards specifying structure are falling into
place (e.g., the eXtensible Markup Language). While this alphabet (i.e., structure) has been successful at
allowing different forms of interoperability, it was invented before the invention of a common
language. That is, the Internet still lacks semantic standards. HTML may function well for human
consumption (humans can read the documents), but it lacks ontologies that provide meaning to the
structure and hence allow humans and computers to communicate. Ontologies are networks of classes
and subclasses interlinked to describe some domain of interest. As such, they allow inferences, in much
the same way as knowing something is a mammal (class) allows us to know that it produces milk
(property), or likely has hair (property), or itself belongs to the class of animals. The
article explains how Semantic Web technologies might help digital libraries. If the descriptions we use
for objects and repositories utilize standardized ontologies, then this would make it easier for
computerized language to map onto the vagaries of human language. Such queries would also enable
consistent and coherent access to classes and subclasses of digital objects distributed across many
different repositories (i.e., interoperability). The article goes on to discuss how librarians need to
become proficient in using ontology editors such as OntoEdit, KAON, or Protégé, and annotation tools
such as Annotea, OntoMat-Annotizer (CREAM), and KIM, as well as inference engines, which allow
machines to process these descriptions and annotations of objects logically (if mammal,
then gives milk and has hair). It ends with a description of the Semantic Web layer cake by Tim
Berners-Lee.
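The mammal example can be made concrete with a few lines of Python. This is a toy sketch of inference over a class hierarchy, not how any of the inference engines named above actually work; all class and property names are invented for illustration.

```python
# Toy ontology: each class points to its parent, and classes carry properties.
subclass_of = {"dog": "mammal", "mammal": "animal"}
class_properties = {
    "mammal": {"produces milk", "has hair"},
    "animal": {"is alive"},
}

def inferred_properties(cls):
    """Collect properties inherited by walking up the subclass chain."""
    props = set()
    while cls is not None:
        props |= class_properties.get(cls, set())
        cls = subclass_of.get(cls)
    return props

inferred_properties("dog")  # a dog inherits mammal and animal properties
```

Nothing in the data says directly that a dog produces milk; the statement is inferred from "dog is a mammal" plus "mammals produce milk", which is exactly the kind of step an inference engine automates at scale.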
Brewster, C., & O'Hara, K. (2007). Knowledge representation with ontologies: Present
challenges, future possibilities. International Journal of Human-Computer Studies, 65(7), 563-
568. doi: 10.1016/j.ijhcs.2007.04.003.
This is an excellent introduction to the world of ontology construction for representing human
concepts and knowledge structures in computers. It begins by laying out philosophical assumptions
regarding the nature of knowledge: whether it consists of facts layered like bricks over time (Rationalists
and Empiricists), or of paradigmatic shifts that at times require historical rewrites (Quine and
Kuhn). In either case, an ontology has no truth-value, and it represents concepts, not words, even if
humans have to use words to represent concepts. This latter concern is the very reason why we must
standardize our vocabularies, and it is also why no single ontological structure permits perfect
external fidelity. Ontologies are representations that by necessity imperfectly represent the external
world. The very fact that institutions utilize different ontologies (whether implicitly or explicitly) is the
reason why knowledge sharing (i.e., interoperability) is problematic. The article discusses how
explicitly representing an institution's ontology allows for efficient computation by providing a
way to communicate with machines, given that the ontology itself is one form of human expression.
The article also argues against the idea that user-generated folksonomies (i.e., social tagging) can
function as ontologies. It serves as an introduction to further articles in the same issue, which
focus on OWL and decidability, the augmentation of ontologies, whether ontologies can
represent common sense and ordinary language, and the difficulties of modeling different kinds of
knowledge in different forms (e.g., narratives, multimedia, and enterprises). This is a great
introduction to the current state of the art, even if much of the latter half of the article merely
summarizes difficult literature within the same volume.
Du, T. C., Li, F., & King, I. (2009). Managing knowledge on the Web – Extracting
ontology from HTML Web. Decision Support Systems, 47(4), 319-331. doi:
10.1016/j.dss.2009.02.011.
Ontologies need not be human-generated. In this article, an ontology extractor called
OntoSpider is unleashed. Given that the Web is multitudes larger than the collection of digital libraries,
this article provides interesting hope. If the Web itself can be crawled for abstract class-subclass
and property-subproperty relationships, and a machine can build ontologies that represent these
relationships, it seems all the more likely that the highly structured data librarians create might be
much easier to represent as ontologies. The HTML Web has little meta-description, whereas digital
librarians follow metadata standards that, even if only loosely related to one another, do describe objects in
similar ways (e.g., author vs. creator vs. artist, etc.). By describing how millions of unstructured
HTML pages can be converted into the Semantic Web by an extractor requiring very little human-initiated
knowledge engineering, the article has the effect of making the integration of digital libraries
seem much simpler. Digital libraries are already quite structured with metadata, even if only syntactic
metadata. HTML web pages, on the other hand, contain few syntactic or semantic cues. Thus, the
article discusses how the HTML page itself must first be dismantled, simplified, and annotated before
an ontology that seeks to represent its concepts semantically can even be constructed. Digital libraries
are, for this reason, much better candidates for the Semantic Web, as the syntactic work is nearly
complete. This is a fantastic introduction to the way the Internet functions, and can be understood with
little mathematics.
Pulido, J., Ruiz, M., Herrera, R., Cabello, E., Legrand, S., Elliman, D., et al. (2006).
Ontology languages for the semantic web: A never completely updated review. Knowledge-Based
Systems, 19(7), 489-497. doi: 10.1016/j.knosys.2006.04.013.
Pulido et al. describe in detail the different ways currently available to describe human
knowledge. Different Semantic Web languages have been created over the past few years that seek to
enable machines to think. This implies machines can make inferences, but for this to occur, frameworks
are required that support the creation of ontologies. This article describes these tools, from early ones
such as KIF (Knowledge Interchange Format), F-Logic, and Dublin Core, to more recent formats such as
XML, RDF, knowledge annotation methods, OIL (Ontology Interchange Language), DARPA
Agent Markup Language (DAML), and OWL (Web Ontology Language). These approaches overlap
and have different functions, and the authors describe the "semantic web ontology creation process"
and which tools are involved in each step. The steps are: gathering, extraction, organization,
refinement, merging, and retrieval, all occurring for a given domain of knowledge. This article is a
fantastic reference to the tools of the trade, describing which tools are used for which task.
Feng, L. (2005). Beyond information searching and browsing: acquiring knowledge from
digital libraries. Information Processing & Management, 41(1), 97-120. doi:
10.1016/j.ipm.2004.04.005.
A digital library system should empower its users to locate information to solve problems. Most
digital library architecture focuses on searching and browsing, a tactical strategy the authors contrast
with more strategic information-seeking behavior. The tactical strategy locates documents, whereas a
digital library system that takes the information-seeker's strategy into account helps him or her locate
intelligent answers to questions. It involves the user's hypotheses and pre-existing knowledge base in
that domain. The authors describe a dual-level digital library where tactical document-keyword
matching is augmented with hypothesis-question "mapping". In this way, higher-order cognitive
questions such as "tell me whether x will cause y" or "tell me what will possibly cause y, and give me
referent articles that talk about this" produce answers and justifications for those answers. In this case,
the justification is a series of articles, and perhaps even the parts of each article that are relevant, offered as
evidence for the hypothesized relationship. This is the very part of the Semantic Web that gets me, a
cognitive scientist, very excited. The article does go on to formally describe how this might work,
which involves a great deal of set theory and higher-order logic.
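The hypothesis-question idea can be caricatured in a few lines, leaving out the set theory entirely: an answer to "what will possibly cause y?" comes back paired with the articles that justify it. The cause-effect relations and article identifiers below are invented for illustration, not taken from Feng's system.

```python
# Invented evidence base: (cause, effect, supporting article) statements.
evidence = [
    ("smoking", "lung disease", "article-12"),
    ("asbestos", "lung disease", "article-34"),
    ("smoking", "lung disease", "article-56"),
]

def what_causes(effect):
    """Answer a 'what will possibly cause y?' question with justifications."""
    answers = {}
    for cause, eff, article in evidence:
        if eff == effect:
            answers.setdefault(cause, []).append(article)
    return answers

what_causes("lung disease")  # each candidate cause paired with its evidence
```

The point is the shape of the result: not a ranked list of documents, but hypothesized answers, each carrying the articles offered as evidence for it.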
Burke, M. (2009). The semantic web and the digital library. Aslib Proceedings, 61(3), 316-
322. doi: 10.1108/00012530910959844.
This article is an attempt by a librarian to awaken her field to the power of the Semantic Web
for digital libraries. She begins by discussing how ontologies enable the creation of equivalencies
between the metadata of two or more institutional collections. Web 2.0 projects such as FOAF
(Friend of a Friend, www.foaf-project.org), SIOC (Semantically-Interlinked Online Communities,
http://sioc-project.org), DBpedia (http://wiki.dbpedia.org), and MusicBrainz (http://musicbrainz.org) all
allow users to link data objects across many different kinds of databases, and thereby foster data
integration. Of course, these are limited, as their ontologies only relate to specific subject domains.
Libraries have only begun to consider the potential of harvesting the data (i.e., digital objects
undergoing digitization) under their control. As one example, the author discusses in some detail
JeromeDL (www.jeromdl.org) as a "social semantic digital library" that integrates many of these new
Semantic Web features into a library's existing library management system. The article is extremely
basic and does a poor job of explaining what the Semantic Web really is, but it does cite some rich case
studies such as the few above, as well as www.talis.com, a group specializing in semantically rich
metadata. After reading articles like that of Feng above, librarians really have their work cut out for
them.
Greenberg, J. (2007). Advancing the Semantic Web via library functions. doi:
10.1300/J104v43n03.
This oft-cited article in the librarian community is an excellent introduction to how the library
and librarians already encompass the skills and knowledge for Semantic Web technology, albeit in
different form. She argues that more attention to planning and policy in libraries will accelerate
development of the Semantic Web, instead of the community's recent tendency to focus primarily on
folksonomies and social tagging. In doing so, she lists many similarities between the
Semantic Web and library functions. For example, "collection development" might translate to
"semantic web selection" and "cataloging" to "semantic representation"; "reference" to "semantic web
services" and "circulation" to "web resource usage". She then refers to a gap between the Semantic Web
and library communities, and notes how the library community has been slow to adapt compared to computer
scientists, psychologists, scientists, linguists, and especially the biomedical community. She
recommends librarians get more involved and learn the necessary technologies, but also that the
Semantic Web community recognize librarian expertise, especially in regard to cataloging. This is a
well-structured and well-written paper, but I find its metaphors unconvincing, if not rather banal. It
does serve as an effective wake-up call for established librarians buried in their enclaves, falling behind
in opening their collections through the advances in information science.
Joint, N. (2008). The practitioner librarian and the semantic web: ANTAEUS. Library
Review, 57(3), 178-186. doi: 10.1108/00242530810865466.
This article attempts to show the enormous potential the Semantic Web has for library science, and in
doing so to demystify its apparent complexity. It was also quite helpful in understanding some of the
finer mechanics of RDF. Just as CSS stylesheets separated style from content, Tim Berners-Lee
describes the underlying "data structures" as having a similar modularity librarians are familiar with. If
"resource description" in RDF sounds like cataloging or bibliographic description, it is. RDF is indeed
a markup language into which the Dublin Core, a cataloging standard, fits. This standard both
describes the resource and allows meaningful relationships to other resources to be drawn. In this
sense, the author describes how RDF serves to distribute the library beyond its traditionally
centralized philosophy. Whereas the earlier Web (HTML and CSS) did not call this conception of a
"library as a closed application" into question, RDF as a description framework presumes opening
these resources to the wider and less-structured world of the "data web". This is a helpful article
by a librarian with a healthy attitude toward the current crisis librarianship faces.
Krause, J. (2008). Semantic heterogeneity: comparing new semantic web approaches with
those of digital libraries. Library Review, 57(3), 235-248. doi: 10.1108/00242530810865501.
This article provides a useful and cogent account of the basic differences implied in the
Joint (2008) article cited above. Krause compares the "Shell Model" of centralized digital libraries
and database catalogs (the "invisible web") to the Semantic Web. The Shell Model homogenizes search
terms within a centralized location, such that heterogeneous documents described using a standardized
thesaurus can be integrated. Thesauri are usually criticized for their shallowness and limited
relations, while being easier to index. The article attempts to describe how the Semantic Web
overcomes these limitations if the ontologies used to describe heterogeneous domains (e.g., psychology
vs. gerontology) utilize a standardized vocabulary. This article is unnecessarily difficult, not due to
formal language (e.g., mathematics or logic), but due to its rather unorganized and jargon-laden way of
describing these systems at a very abstract level. The computer science articles, even if I don't follow
the mathematics, were refreshing in comparison. Nevertheless, the article discusses very important
differences, and is very much worth a second look.
Ferran, N., Mor, E., & Minguillón, J. (2005). Towards personalization in digital libraries
through ontologies. Library Management, 26(4/5), 206-217. doi: 10.1108/01435120510596062.
This article describes the requirements for building a complete navigational profile of users in
their process of searching and browsing a digital library as part of a distance-learning component (e.g.,
a history class). As such, it provides a concrete case study of how the Semantic Web might serve a
digital library's educational goals by integrating its use into a specific e-learning environment
(the Universitat Oberta de Catalunya (UOC) Virtual Library). These personalization initiatives allow a
digital collection to adapt to user necessities and preferences by mapping parameterized factors
(user profiles, navigational profiles, and user actions) onto an ontology that describes in detail the
various possible scenarios. Thus, when users seek similar types of information, this behavior will
invoke stored processes (i.e., inferences) within the ontology and thereby make use easier. For instance,
exploratory navigation and goal-oriented navigation strategies can be detected, and different sets of
recommendations for the digital library's use can be formulated accordingly. The same system can
detect whether a user is a teacher, student, or researcher, and might tailor its recommendations as such.
While I question the usefulness of such a system (machines are generally quite bad at second-guessing
human intention), and would myself find it obnoxious and counter-productive, it is one way of
integrating the Semantic Web into digital collections and, more generally, education. It provides real-world
application to what often begins to appear a rather abstract field (the Semantic Web in digital
libraries).
Bygstad, B., Ghinea, G., & Klæboe, G. (2009). Organisational challenges of the semantic
web in digital libraries: a Norwegian case study. Online Information Review, 33(5), 973-985. doi:
10.1108/14684520911001945.
This is a comprehensive case study of the National Library of Norway. It analyzes two sources
of information in search of understanding the potential impact semantic web technology might have on
digital libraries: 1) the digitization project of the National Library, and 2) interviews with nine different
stakeholders of this project. The study focuses on the strategic, organizational, and technological
aspects of implementing the semantic web in digital libraries. It found few technological hurdles in
implementation, moderate strategic issues, and high organizational costs. At a strategic level, upper-
level management has to generate organization-wide support for the initiative, which will involve some
changes in the organizational structure in order to augment metadata teams with cross-organizational
ontology-engineering infrastructure. This implies large costs to the organization, as inter-organizational
and cross-organizational structures will need to be instituted to address issues in ontology engineering.
The main lesson the article teaches is that ontology engineering and metadata production will go hand
in hand, and shake up the institutional structure in unexpected ways.
Fuentes-Lorenzo, D., Morato, J., & Gómez, J. M. (2009). Knowledge management in
biomedical libraries: A semantic web approach. Information Systems Frontiers, 11(4), 471-480.
doi: 10.1007/s10796-009-9159-y.
Biomedicine has been pivotal in the development of the semantic web and digital library
technology. Technical programs in bio-informatics abound. The huge amount of data in the life
sciences, not limited to but certainly in part due to the Human Genome Project, has driven researchers in
and around this field toward more efficient data-sharing and data-processing solutions. Extending
these applications with metadata embedded in ontologies will give human users a
firmer understanding of the biomedical elements they seek to process, while allowing machines to use
their formal semantic properties to support this reasoning. This article seeks to present design
principles for a simple and "easy-to-use" biomedical Semantic Web. One penetrating example is how
drugs have many different names: most of us know Tylenol, but if the machine is to help, then higher-order
terms within this Semantic Web will connect this term within its ontology to the same concept
expressed in different words: "DCI", "acetaminophen", or "NO2 BE01" are other monikers. The article
states that there are on average 19 synonyms for every gene. This article is a wonderful case study for
explicating how the Semantic Web can come alive. Special attention is given to a faceted search
mechanism that allows filtering through the ontology (represented simply in a sidebar) to add
restrictions to result pages, which is especially useful given these synonyms that operate at different research
levels (physics to chemistry to biology to pharmacology).
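The drug-synonym problem reduces to mapping every surface term onto one canonical concept before matching. A minimal sketch, reusing the drug names the article mentions; the mapping table, document collection, and `search` helper are all invented for illustration:

```python
# Map each surface term to one canonical concept (table is illustrative only).
canonical_of = {
    "Tylenol": "acetaminophen",
    "DCI": "acetaminophen",
    "NO2 BE01": "acetaminophen",
}

def canonical(term):
    """Normalize a term to its canonical concept (unknown terms pass through)."""
    return canonical_of.get(term, term)

documents = {
    "doc-1": ["Tylenol", "dosage"],
    "doc-2": ["DCI", "toxicity"],
    "doc-3": ["aspirin", "dosage"],
}

def search(query):
    """Match documents whose terms share the query's canonical concept."""
    target = canonical(query)
    return {d for d, terms in documents.items()
            if any(canonical(t) == target for t in terms)}

search("acetaminophen")  # finds doc-1 and doc-2 despite different surface terms
```

In a real biomedical ontology this table would be a shared vocabulary with URIs rather than a hand-written dictionary, but the normalization step is the same in spirit.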
Prasad, A., & Madalli, D. P. (2008). Faceted infrastructure for semantic digital libraries.
Library Review, 57(3), 225-234. doi: 10.1108/00242530810865493.
Context-based retrieval is sadly missing from many interfaces. Top-down models that prioritize
the user's experience and allow customization require a bottom-up semantic infrastructure. The article
stresses that a "digital library is only as good as its retrieval efficiency". Given that most digital
libraries inherited the bibliographic database format, with search and browse features only, terms are
often pulled from their context. Metadata is really just knowledge, and knowledge, as Berners-Lee
describes it (see Coyle, 2008, above), is "actionable knowledge". In most libraries, catalogs and
classification schemes allow only "post-coordinate indexing", in which search terms connected by
Boolean operators strip resources from their context and place their referents in domain-general
indexes (i.e., indexes with no context). Pre-coordinate indexing (or context-dependent indexing), on the
other hand, includes a term's semantic context (it is domain-specific). This is not easy, as one topic has
many different perspectives, or facets, each of which orders a given set of concepts into a different
class-subclass ontology (think of Ranganathan's faceted cataloging). The order of these terms can
therefore become part of the interface, such that the upper and lower classes of each term
contextualize browsing. If I am interested in harvesting, and perhaps sugarcane harvesting, I don't
want to enter the sugarcane :: harvesting space, but instead the harvesting :: sugarcane space, which
opens my browsing up to many other kinds of harvested crops (instead of many different properties
of sugarcane), depending on my intentions (i.e., my hypotheses).
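The point about facet order can be sketched in a few lines. In this toy pre-coordinated index (the entries and document IDs are invented for illustration, not from the article), each index key is an ordered chain of facet terms, so which facet comes first determines what the browse view surfaces:

```python
# Illustrative pre-coordinated index: each key is an ORDERED chain of
# facet terms; order determines the browsing context.
INDEX = {
    ("harvesting", "sugarcane"): {"doc1", "doc4"},
    ("harvesting", "wheat"): {"doc2"},
    ("sugarcane", "cultivation"): {"doc3"},
    ("sugarcane", "harvesting"): {"doc1", "doc4"},
}

def browse(*prefix):
    """Return the sub-facets reachable under an ordered facet prefix."""
    n = len(prefix)
    return sorted({chain[n] for chain in INDEX
                   if chain[:n] == prefix and len(chain) > n})

# The "harvesting" space surfaces other harvested crops...
print(browse("harvesting"))   # ['sugarcane', 'wheat']
# ...while the "sugarcane" space surfaces other properties of sugarcane.
print(browse("sugarcane"))    # ['cultivation', 'harvesting']
```

The same documents sit under both chains; what changes with facet order is the conceptual neighborhood the browser opens up, which is exactly the harvesting :: sugarcane versus sugarcane :: harvesting distinction above.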
Krestel, R., Kappler, T., & Lockemann, P. C. (2010). Converting a historical architecture
encyclopedia into a semantic knowledge base. IEEE Intelligent Systems.
This ambitious project describes digitizing one of the most respected, yet poorly indexed,
sources on architecture (the Handbook of Architecture, 1880, containing over 25,000 pages) into a
fully indexed and semantically annotated digital library built on MediaWiki (www.mediawiki.org).
After digitization, each section was converted into a wiki page, nested within the wiki page for its
chapter, and so on down to subsections. The Beta version is
located here: http://durm.semanticsoftware.info/wiki/index.php/Hauptseite. Each wiki page is fully
searchable and indexed, and each is associated with the digitized original. Natural language
processing technologies, including logical and morphological analysis and information extraction,
allowed many features to be developed. Most notable among these is "automatic summarization",
which condenses large amounts of text into abstract summaries for readers. The system also allowed
for the eventual construction of ontologies from the extracted concepts themselves, each of which was
linked to the OWL vocabulary of the Semantic Web. The article is written for a general
audience, but still includes a detailed account of a very ambitious project.
Madalli, D. P., & Suman, A. (2008). UML for the conceptual web. Online Information
Review, 32(4), 511-515. doi: 10.1108/14684520810897386.
UML stands for Unified Modeling Language. The article represents an attempt to develop a faceted
model based on ontologies organized according to the way concepts are organized in the human mind (a rather
bold claim in this article...). That is, while an ontology is a method for structuring concepts into
classes and subclasses, a structure that already contains rule systems, this article attempts to place
even more constraints on the conceptual network through a developed system of axioms. The article
serves as a proposal and brief literature review of recent attempts to apply these ideas. It explores them
with the goal of developing faceted search/browse features in a user-friendly interface. It also seeks to
make the different domains of knowledge within this architecture interoperable with one another, such that
subject experts can continually modify and add to the existing digital library. In their proposal, the
authors suggest that Ranganathan's five fundamental categories might serve as the
basis for each facet: personality, matter, energy, space, and time.
Mayr, P., Mutschke, P., & Petras, V. (2008). Reducing semantic complexity in distributed
digital libraries: Treatment of term vagueness and document re-ranking. Library Review, 57(3),
213-224. doi: 10.1108/00242530810865484.
One-stop academic federated search portals are becoming increasingly common. Examples
include Elsevier's Scirus portal, the Online Computer Library Center's WorldCat catalog, and the Perseus
project at Tufts University. This article examines the Vascoda portal in Germany, a federated interface
to full-text article databases, library catalogs, and Internet resource collections. Searching across these
resources, however, only maps instances of the query terms. Large databases
nevertheless return many results, leaving the impression that few documents are missed (Type II
errors). These kinds of errors, due to the ambiguity and vagueness of human language, become more
problematic in more specialized document repositories, or in collections spanning different databases
with different metadata schemes. This study attempts to deal with this issue of language vagueness by
re-ranking documents according to two parameters. One method is to "bradfordize" results, applying
Bradford's Law of Scattering: articles from core journals are ranked ahead of those from journals that
deal with the topic less often. The other method is to rank authors depending on how
prominently they occur (i.e., are cited) within similar publications (i.e., co-authorship networks).
The article thus takes concepts from the Semantic Web in order to alter ranking algorithms for search
terms.
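A toy sketch of the Bradfordizing idea, not the authors' actual algorithm: take a federated result list, count how many results each journal contributes, and float articles from the most productive ("core") journals to the top. The titles and journal names are invented for illustration.

```python
from collections import Counter

# Hypothetical result list: (title, journal) pairs from a federated search.
results = [
    ("A", "J. Info. Retrieval"), ("B", "Misc. Quarterly"),
    ("C", "J. Info. Retrieval"), ("D", "J. Info. Retrieval"),
    ("E", "Misc. Quarterly"), ("F", "Rare Annals"),
]

def bradfordize(results):
    """Re-rank so that articles from journals contributing most to the
    result set (the 'core' in Bradford's sense) come first; the stable
    sort preserves the original order within each journal."""
    counts = Counter(journal for _, journal in results)
    return sorted(results, key=lambda r: -counts[r[1]])

for title, journal in bradfordize(results):
    print(title, journal)   # A, C, D (core journal), then B, E, then F
```

The real Vascoda study works over journal productivity statistics rather than a single result list, but the ordering principle, core journals before peripheral ones, is the same.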
Barbera, M., Nucci, M., Hahn, D., & Morbidoni, C. (2008). A Semantic Web Powered
Distributed Digital Library System. Electronic Publishing, (June), 130-139.
Given the ever-expanding set of resources available on-line, the need for tools that allow
intelligent search of these resources grows. This article presents Talia, a distributed semantic digital library with
an annotation system especially tailored to research in philosophy. In a sense, Talia is a digital archive
management system, functioning much like CONTENTdm or Omeka, but it contains within its suite tools
developed especially for the Semantic Web. In this sense it combines a digital archive management
system with an electronic publishing system (an on-line peer review system), with each resource utilizing a
distinct and stable URI. It is written in Ruby on Rails, and the knowledge base of unchanging URIs
is organized in RDF, using the RDFS/OWL vocabularies to describe its ontologies. Any digital
library published with Talia can be interconnected with others, which allows cohesive scholarly communities to
interact without giving up control over one's own content. This is a fantastic system, one I wish were
available in PHP, so that digital libraries developed in Drupal (my own recent work) might be ported
to it. I reviewed in detail one digital library that utilizes Talia, the Discovery Project, in our
class blog: http://jahurst.mysite.syr.edu/ist677s2010/?p=229.
Hull, D., Pettifer, S. R., & Kell, D. B. (2008). Defrosting the digital library: Bibliographic
tools for the next generation web (J. McEntyre, Ed.). PLoS Computational Biology, 4(10), e1000204.
doi: 10.1371/journal.pcbi.1000204.
This review discusses the digital libraries currently in use by computational biologists,
including PubMed, IEEE Xplore, the ACM Digital Library, ISI Web of Knowledge, Scopus, CiteSeer,
arXiv, DBLP, and Google Scholar. These tools are described as being "cold" to the user, and are
contrasted with newer, "warmer" tools such as Zotero, Mekentosj Papers, MyNCBI, CiteULike,
Connotea, HubMed, and Mendeley. These latter tools take advantage of the social web to
make digital libraries more accessible, friendly, and personal. I myself use CiteULike and
Mendeley. In fact, to get an idea of how Mendeley works, please visit my profile, in which I share with
you all of the articles used in this current bibliographic review.
http://www.mendeley.com/research-papers/collections/2216691/The-Semantic-Web-in-Digital-
Libraries/
This resource comes with a desktop application that essentially provides an
annotative interface for my personal library of PDF documents. These PDFs are automatically OCR'd
and hence searchable. This allows me to comment on and annotate parts of each paper within the
document, as well as link to OpenOffice to format bibliographies in various styles. The local
interface can be synchronized at the touch of a button with the web interface, so that I can share my
bibliographies with selected users, or even make them public, as I have with this collection at the URL
above. I am still learning how to use this resource, but using it to manage aspects of the current
annotated bibliography as part of the assignment was rather enlightening. Its being open source is
extremely valuable: I have used EndNote and RefWorks at various times and, due to proprietary
licensing, lost my organization and hence my bibliographies.