Stephen J. Stose                                                                                                1
12/22/10


The Semantic Web in Digital Libraries: A Literature Review

Introduction

The Web is a big place. Currently, it provides millions, if not billions, of documents, each of which a
human can read. Each of these documents has a set of terms that are meant to describe it, and hence
make it easier for users to locate. This meta-information allows search engines such as Google to map
the keywords users type to the terms describing each document. Humans, however, still have to read
through this information to determine whether each document is indeed relevant to their search.

In this way, the Internet is structured syntactically. It is structured, but it does not think. In order to think,
terms need to be related somehow. “The book is written by Mark Twain” is a thought that can be
represented in propositional space. For those accustomed to searching the Web, writing the terms
“book” and “Mark Twain” is usually enough. These terms are usually connected by default with
operators such as AND or, for more advanced users, with more specific operators (OR, NOT,
or the * wildcard). Given that each resource on the Web is merely mapped by terms, the relationship of
Mark Twain to the book (“is a book written by”) may be irrelevant to the Google search results we obtain.
If another person named Mark Twain had burned a book, however, a new relationship ensues, and
suddenly this opens up a completely new conceptual space not excluded by our original search terms.

In this way, relating the terms that describe resources by means of both a syntax and semantics
facilitates narrowing the huge Web space into the context we so specify. On the one hand, a semantics
will allow synonyms for “book” and “Mark Twain” (e.g., “Samuel Clemens”) to be grouped under
one concept. If resources are also described with predicates that entail relationships between entities
(“X burned Y” or “X wrote Y”), such relationships will also allow the Web to begin to think. In this
way, questions such as “who wrote Tom Sawyer” may locate only resources that provide answers to
that question, and exclude others.
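
The contrast between keyword matching and predicate-based retrieval can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy statements, the synonym table, and the function names are hypothetical and do not come from any real search engine or library system.

```python
# Toy statements stored as (subject, predicate, object). All data is illustrative.
TRIPLES = [
    ("Mark Twain", "wrote", "Tom Sawyer"),
    ("Mark Twain (a namesake)", "burned", "Tom Sawyer"),
]

# Synonyms grouped under one canonical concept label (hypothetical table).
SAME_AS = {"Samuel Clemens": "Mark Twain"}


def canonical(term):
    """Map a synonym to its canonical label; unknown terms map to themselves."""
    return SAME_AS.get(term, term)


def keyword_search(*keywords):
    """Syntactic search: keep statements whose text contains every keyword (AND)."""
    hits = []
    for triple in TRIPLES:
        text = " ".join(triple).lower()
        if all(keyword.lower() in text for keyword in keywords):
            hits.append(triple)
    return hits


def who(predicate, obj):
    """Semantic search: subjects linked to obj by exactly this predicate."""
    return [s for (s, p, o) in TRIPLES if p == predicate and o == canonical(obj)]


def objects_of(subject, predicate):
    """Objects linked to a subject (or any of its synonyms) by this predicate."""
    target = canonical(subject)
    return [o for (s, p, o) in TRIPLES if s == target and p == predicate]


# Keyword matching cannot tell the author from the book burner.
print(keyword_search("Mark Twain", "Tom Sawyer"))
# Asking with a predicate returns only the resource that answers the question.
print(who("wrote", "Tom Sawyer"))
# A synonym resolves to the same concept before the relationship is checked.
print(objects_of("Samuel Clemens", "wrote"))
```

The keyword search returns both statements, whereas the predicate query returns only the one that answers “who wrote Tom Sawyer”.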

For this reason, researchers must not only describe documents with descriptors, but also relate
these terms within higher-order representations called ontologies. In this review, I seek to explore these
ontologies in a way specifically relevant for digital libraries, not the Web in general. Digital libraries
are web resources that have, compared to the Web in general, been described with very detailed
metadata. As such, they are extremely well structured, and thus make excellent candidates for the
Semantic Web.

I’ll begin by describing literature that explains the limitations of the Internet in current form. Tim
Berners-Lee believed the Web should not be made up of interconnected resources, but of
interconnected data extracted from these resources. The bibliography I here present will seek to
develop explanations for the tools of this new trade, which at times are quite technical. These tools
require new language formats and new conceptual systems, which must be integrated with existing
systems of metadata description commonly employed in library systems. We’ll review the goals of
these systems, look at how libraries might need to adapt in order to instantiate these new systems, and
what might be required given such adaptation. I’ll argue that libraries, especially digital collections, are
especially well suited to this project due to their having in place a strong syntactic system of
description. While current metadata standards make such adaptation easier, it will still require unexpected and
difficult organizational changes within libraries.



       World Wide Web Consortium (2010). Semantic Web. Retrieved April 11, 2010, from:
http://www.w3.org/standards/semanticweb/

        This is the international standards organization for the World Wide Web, and is headed by Sir
Tim Berners-Lee. The organization ensures industry members agree on standardized versions of web
technology so that discrepancies and inconsistencies do not plague the Web and web browser
technology. W3C maintains a website that publishes standardization recommendations and rules, and
therefore also functions as a portal for learning about new technologies. Indeed, the Semantic Web is an
integral part of W3C’s mission to develop the Web into a universal medium for knowledge exchange,
and the site contains future project possibilities still unrealized and formal specifications for others currently
underway. For instance, it is a principal source in the field for the specifications of RDF
(Resource Description Framework), many of its serialization formats (RDF/XML, N3, Turtle), as
well as schema languages (RDF Schema) and ontology languages, such as OWL. Some of
these technologies, and more, will be discussed below. W3C remains the best place on the Web for
information regarding these issues, and hence I cite it here first.


       Coyle, K. (2008). Meaning, Technology, and the Semantic Web. Journal of Academic
Librarianship 34(3), 263-264.
         Coyle discusses the fundamentals of the Semantic Web as per Tim Berners-Lee’s vision. This
article provides a basic framework for understanding how the Internet, as a repository of unstructured
and undifferentiated strings of text, can be endowed with meaning that machines can process. It
transforms the “web of documents” into a “web of data” by creating “actionable information.” In
beginner’s language, Berners-Lee discusses how semantic triples combine fields (e.g., Dublin Core
fields, such that Creator = x and Title = y) with a predicate (x is the author of y), such that a human-
language question (e.g., “who wrote y?”) might generate an answer. The author goes on to discuss how RDF is
written to infer meaning by use of vocabulary patterns (URIs), and what this might mean for libraries
and digital libraries in particular. The article set the stage for my research into how digital collections
are properly planned to include RDF.


       Miller, E. (Designer). (2001). Digital libraries and the semantic web. [Web]. Retrieved from
http://www.w3.org/2001/09/06-ecdl/Overview.html

        This is another introduction to the Semantic Web and is published on the main consortium
website for the Semantic Web, World Wide Web Consortium (W3C). The 39 slides are lessons that
focus on how the Semantic Web can create “actionable” searches within digital libraries if the digital
library has metadata amenable to an RDF schema of classes and subclasses, and properties and
subproperties. As an example, it explains how the MIT digital repository uses the
DSpace digital management software to incorporate RDF, and how a formalized reference library such as
www.dmoz.org helps support the merging of distinct ontologies (i.e., the vocabularies making up classes and
subclasses). This is a very basic introduction, but it does refer to many working examples.


       Manola, F. & Miller, E. (2004, February 10). RDF Primer. Retrieved from
http://www.w3.org/TR/2004/REC-rdf-primer-20040210/

        This is a comprehensive source for understanding RDF. Resource Description Framework is a
language that serves to represent information about resources spread throughout the World Wide Web.
RDF serves to make documents readable by machines, not just people, by describing resources using
simple properties and property values (i.e., triples). This document is a technical primer. It explains the
technical details of RDF and the RDF/XML that structures these values in terms of triples, or subject-
predicate-object propositions that become machine-readable. It explains how the Web is made up of
URLs and more general URIs. We are familiar with URLs, but Uniform Resource Identifiers can be
created to allow these triples to identify the subjects, predicates and objects in these propositions. These
URIs are often just fragments, preceded by #, trailing the URL. The triples should be expressed in a
common vocabulary, which is constructed into an ontology, something we’ll be discussing more in this
bibliographic review. This resource really explains the technical mechanisms involved, and does not
just refer to using the Semantic Web for digital libraries, but for the Web in general.
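
To make the subject-predicate-object structure concrete, here is a minimal sketch in plain Python that builds two triples and prints them in an N-Triples-like line format. It is not taken from the primer: the base URL http://example.org/library and the fragment names are hypothetical, the Triple type and helper functions are my own, and the literal escaping is deliberately simplified. The Dublin Core element namespace is the only real identifier used.

```python
from typing import NamedTuple

BASE = "http://example.org/library"        # hypothetical base URL for this sketch
DC = "http://purl.org/dc/elements/1.1/"    # Dublin Core element namespace


class Triple(NamedTuple):
    subject: str               # URI of the thing being described
    predicate: str             # URI of the property
    obj: str                   # URI of another resource, or a literal value
    obj_is_literal: bool = False


def fragment(name):
    """Build a URI from the base URL plus a '#' fragment identifier."""
    return f"{BASE}#{name}"


def to_ntriples(triple):
    """Serialize one triple as an N-Triples-style line (simplified escaping)."""
    if triple.obj_is_literal:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{triple.obj}>"
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


triples = [
    # The title is a literal string value.
    Triple(fragment("tom-sawyer"), DC + "title", "The Adventures of Tom Sawyer", True),
    # The creator is itself a resource with its own URI.
    Triple(fragment("tom-sawyer"), DC + "creator", fragment("mark-twain")),
]

for triple in triples:
    print(to_ntriples(triple))
```

Because the creator is identified by a URI rather than a string, any other description that uses the same URI is automatically talking about the same person.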


       Macgregor, G. (2008). Introduction to a special issue on digital libraries and the semantic
web: context, applications and research. Library Review, 57(3), 173-177. doi:
10.1108/00242530810865457.
        This article provides some context to our review regarding the integration of the Semantic Web
into digital libraries. It describes the Semantic Web as allowing machines to read the web, much like
humans can. Alternatively, and more specifically, the Semantic Web seeks to include metadata about
the semantics of a web resource (keyword relationships), and not just its syntax (keyword mapping).
Currently, digital libraries describe their digital objects with fields, but do these fields relate to one
another semantically? And do they relate to other digital libraries semantically, such that “author”
might also be understood as “creator” or “100 $a Melville, Herman” (in MARC21)? This article
introduces the Semantic Web to digital curators by providing a basic-level explanation of its
potential uses. The Semantic Web promotes better collaboration and exposure of digital objects,
enhances navigation and retrieval amongst heterogeneous environments, allows user profiling and
personalization, and improves interfaces and human-computer interaction. The article also seeks to
allay librarians’ fears about its integration, and attempts to make the topic more accessible than it is
perceived to be.
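
The question of whether “author”, “creator”, and a MARC 21 100 $a field can be understood as the same thing can be sketched as a simple crosswalk. This is only an illustration under my own assumptions: the three sample records, the CROSSWALK table, and the function names are hypothetical, and a real ontology would express these equivalences far more richly than a lookup table.

```python
# Hypothetical crosswalk from local field names to one shared property.
CROSSWALK = {
    "author": "dc:creator",
    "creator": "dc:creator",
    "100$a": "dc:creator",   # MARC 21 main entry, personal name
    "title": "dc:title",
    "245$a": "dc:title",     # MARC 21 title statement
}

# Three made-up records, each from a library with its own field naming.
RECORDS = [
    {"author": "Melville, Herman", "title": "Moby Dick"},
    {"100$a": "Melville, Herman", "245$a": "Typee"},
    {"creator": "Twain, Mark", "title": "Tom Sawyer"},
]


def normalize(record):
    """Re-express a local record using the shared property names."""
    shared = {}
    for field, value in record.items():
        prop = CROSSWALK.get(field)
        if prop is not None:            # fields without a mapping are skipped
            shared.setdefault(prop, []).append(value)
    return shared


def find_by_creator(records, name):
    """Return titles of all records whose creator matches, whatever the local field."""
    titles = []
    for record in records:
        shared = normalize(record)
        if name in shared.get("dc:creator", []):
            titles.extend(shared.get("dc:title", []))
    return titles


print(find_by_creator(RECORDS, "Melville, Herman"))
```

A single query now reaches both Melville records, even though one library calls the field “author” and the other stores it in 100 $a.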


    Fox, E. A., & Marchionini, G. (1998). Toward a worldwide digital library.
Communications of the ACM, 41(4), 29-32. doi: 10.1145/273035.273043.
        The article outlines how digital libraries constitute very complex information systems in that
they involve collaboration, preservation, database management, instruction and learning, filtering and
retrieval, property rights, multimedia services, selection/collation, and reference and discovery. These
systems augment the physical objects they represent, and as such are available worldwide, thereby
opening up to a range of uses and users for global exchange and international understanding. A key to
harnessing their power and linking these nationalized resources (the author uses examples of national
federated digital libraries in many countries), however, is interoperability. Linking these resources
requires 1) technical interoperability (networks & protocols), 2) informational interoperability
(language, metadata, naming conventions and software interfaces) and 3) social interoperability
(personal and organizational rights/responsibilities). These difficulties are discussed in a series of
articles that discuss Z39.50, multilingual support, community and government integration, and the
collaborative design of interfaces and query formulation. The article serves
as a useful reminder of the goals of designing a digital library, and that technology is
no more than a tool.


       Lytras, M., Sicilia, M., Davies, J., & Kashyap, V. (2005). Digital libraries in the knowledge
era: Knowledge management and Semantic Web technologies. Library Management, 26(4/5), 170-
175. doi: 10.1108/01435120510596026.
        These authors describe the Semantic Web as an extension of the current “metadata intensive”
fields of digital library and digitization programs. The commitment digital libraries keep to a formal
foundation of metadata provides a distinct advantage over traditional HTML markup (which has few
‘meta’ tags). While metadata production and maintenance is already time-consuming, integrating these
services with annotations that refer to shared ontologies may indeed create more complexity, but much
less than involved with the Internet in general. This paper is the first paper in a special issue of Library
Management, and serves to summarize many of the more technical papers that follow. Many of these
papers we cite here in the current annotated bibliography, including Sure & Studer (2005), as well as
Ferran, Mor, & Minguillón (2005).


     Sure, Y., & Studer, R. (2005). Semantic Web technologies for digital libraries. Library
Management, 26(4/5), 190-195. doi: 10.1108/01435120510596044.
        This article discusses the fact that computing standards specifying structure are falling into
place (e.g., eXtensible Markup Language). While this alphabet (i.e., structure) has been successful at
allowing different forms of interoperability, it was invented before a common
language. That is, the Internet still lacks semantic standards. While HTML documents may function well for human
consumption (humans can read the documents), they lack ontologies that provide meaning to the
structure, and hence allow humans and computers to communicate. Ontologies are networks of classes
and subclasses interlinked to describe some domain of interest. As such, they allow inferences, in much
the same way as knowing something is a mammal (class) allows us to know that it produces milk
(property), that it likely has hair (property), and that mammals are themselves a subclass of animals. The
article explains how Semantic Web technologies might help digital libraries. If the descriptions we use
for objects and repositories utilize standardized ontologies, then this would make it easier for
computerized language to map onto the vagaries of human language. Queries built on such ontologies would also enable
consistent and coherent access to classes and subclasses of digital objects distributed across many
different repositories (i.e., interoperability). The article goes on to discuss how librarians need to
become proficient in using ontology editors such as OntoEdit, KAON, or Protégé; annotation tools
such as Annotea, OntoMat-Annotizer (CREAM), and KIM; and inference engines, which allow
machines to reason logically from these descriptions and annotations of objects (if mammal,
then gives milk and has hair). It ends with a description of the Semantic Web layer cake by Tim
Berners-Lee.
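
The mammal example can be turned into a very small inference sketch. It is not how OntoEdit, KAON, Protégé, or any real inference engine works internally; the class hierarchy, the property table, and the function names below are hypothetical and chosen only to show how facts attached to a class carry down to its subclasses.

```python
# Hypothetical class hierarchy: each class points to its direct superclass.
SUBCLASS_OF = {
    "dog": "mammal",
    "mammal": "animal",
}

# Properties asserted directly on a class (illustrative only).
CLASS_PROPERTIES = {
    "mammal": {"produces milk", "has hair"},
    "animal": {"is a living thing"},
}


def ancestors(cls):
    """Walk up the hierarchy and list every superclass, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def inferred_properties(cls):
    """Collect properties stated on the class itself and on all its superclasses."""
    props = set(CLASS_PROPERTIES.get(cls, set()))
    for parent in ancestors(cls):
        props |= CLASS_PROPERTIES.get(parent, set())
    return props


# Nothing was stated about dogs directly, yet the machine can conclude all of this.
print(ancestors("dog"))
print(sorted(inferred_properties("dog")))
```

The point of the sketch is that the description of “dog” contains no facts at all; everything printed is inferred from its place in the hierarchy.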

       Brewster, C., & Ohara, K. (2007). Knowledge representation with ontologies: Present
challenges—Future possibilities. International Journal of Human-Computer Studies, 65(7), 563-
568. doi: 10.1016/j.ijhcs.2007.04.003.
         This is an excellent introduction to the world of ontology construction for representing human
concepts and knowledge structures in computers. It begins by laying out philosophical assumptions
regarding the nature of knowledge, whether it be facts layered like bricks over time (Rationalists and
Empiricists), or paradigmatic shifts in knowledge that require at times historical re-writes (Quine and
Kuhn). In either case, an ontology has no truth-value, and it represents concepts, not words, even if
humans have to use words to represent concepts. The latter point is the very reason why we must
standardize our vocabularies, and the former is why no single ontological structure permits perfect
external fidelity. Ontologies are representations that by necessity imperfectly represent the external
world. The very fact that institutions utilize different ontologies (whether implicitly or explicitly) is the
reason why knowledge sharing (i.e., interoperability) is problematic. The article discusses how
explicitly representing an institution’s ontology allows for efficient computation by providing a
way to communicate with machines, given that the ontology itself is one form of human expression.
The article also argues against the idea that user-generated folksonomies (i.e., social tagging) can
function as ontologies. The article serves as an introduction to further articles in the same issue, those
being focused around OWL and decidability, the augmentation of ontologies, whether ontologies can
represent common sense and ordinary language, and the difficulties of modeling different kinds of
knowledge in different forms (e.g., narratives, multimedia, and enterprises). This is a great
introduction to the current state of the art, even if much of the latter half of the article merely
summarizes difficult literature within the same volume.


       Du, T. C., Li, F., & King, I. (2009). Managing knowledge on the Web – Extracting
ontology from HTML Web. Decision Support Systems, 47(4), 319-331. Elsevier B.V. doi:
10.1016/j.dss.2009.02.011.
         Ontologies need not be human generated. In this article, an ontology extractor called
OntoSpider is unleashed. Given that the Web is many times larger than the collection of digital libraries,
this article provides interesting hope. If the Web itself can be crawled for abstract class-subclass and
property-subproperty relationships, and a machine can build ontologies that represent these
relationships, it seems all the more likely that the highly structured data librarians create might be
much easier to represent as ontologies. The HTML Web has little meta-description, whereas digital
librarians follow metadata standards that, even if only loosely related to one another, do describe objects in
similar ways (e.g., author vs. creator vs. artist, etc.). By describing how millions of unstructured
HTML pages can be converted into the Semantic Web by an extractor requiring very little human-
initiated knowledge engineering, the article had the effect of making the integration of digital libraries
seem much simpler. Digital libraries are already quite structured with metadata, even if only syntactic
metadata. HTML web pages, on the other hand, contain few syntactic or semantic cues. Thus, the
article discusses how first the HTML page itself has to be dismantled, simplified and annotated before
an ontology that seeks to represent its concepts semantically can even be constructed. Digital libraries
are, for this reason, much better candidates for the Semantic Web, as the syntactic work is nearly
complete. This is a fantastic introduction to the way the Internet functions, and can be understood with
little mathematics.


      Pulido, J., Ruiz, M., Herrera, R., Cabello, E., Legrand, S., Elliman, D., et al. (2006).
Ontology languages for the semantic web: A never completely updated review. Knowledge-Based
Systems, 19(7), 489-497. doi: 10.1016/j.knosys.2006.04.013.
      Pulido et al. describe in detail the different ways currently available to describe human
knowledge. Different semantic web languages have been created over the past few years that seek to
enable machines to think. This implies machines can make inferences, but for this to occur, frameworks
are required that support the creation of ontologies. This article describes these tools, from early tools
such as KIF (Knowledge Interchange Format), F-Logic and Dublin Core to more recent formats such as
XML, RDF, Knowledge Annotation methods, OIL (Ontology Interchange Language), DARPA
Agent Markup Language (DAML), and OWL (Web Ontology Language). These approaches overlap
and have different functions, and the authors describe the “semantic web ontology creation process”
and which tools are involved in each step. The steps are: gathering, extraction, organization,
refinement, merging and retrieval, all occurring for a given domain of knowledge. This article is
a fantastic reference to the tools of the trade, describing which tools are used for which task.


        Feng, L. (2005). Beyond information searching and browsing: acquiring knowledge from
digital libraries. Information Processing & Management, 41(1), 97-120. doi:
10.1016/j.ipm.2004.04.005.
         A digital library system should empower its user to locate information to solve problems. Most
digital library architecture focuses on searching and browsing, a tactical strategy the author contrasts
with more strategic information-seeking behavior. The tactical strategy locates documents, whereas a
digital library system that takes the information seeker’s strategy into account helps him or her locate
intelligent answers to questions. It involves the user’s hypotheses and pre-existing knowledge base in
that domain. The author describes a dual-level digital library where tactical document-keyword
matching is augmented with hypothesis-question “mapping”. In this way, higher-order cognitive
questions such as “tell me whether x will cause y” or “tell me what will possibly cause y, give me
referent articles that talk about this” produce answers and justifications for these answers. In this case,
the justification is a series of articles (and perhaps even the relevant part of each article) offered as
evidence for the hypothesized relationship. This is the very part of the Semantic Web that gets me, a
cognitive scientist, very excited. The article does go on to formally describe how this might work,
which does involve a great deal of set theory and higher-order logic.
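
The two question types quoted above can be mimicked with a toy lookup, shown here as a rough Python sketch. It is emphatically not the formal model of the article, which relies on set theory and logic; the causal claims, the placeholder article names, and the two functions are all hypothetical and exist only to show answers being returned together with their supporting articles.

```python
# Each claim: (cause, effect, supporting articles). All entries are made-up placeholders.
CLAIMS = [
    ("x1", "y", ["Article A (hypothetical)", "Article B (hypothetical)"]),
    ("x2", "y", ["Article C (hypothetical)"]),
    ("x1", "z", ["Article D (hypothetical)"]),
]


def will_cause(x, y):
    """Answer 'tell me whether x will cause y', with articles as justification."""
    evidence = []
    for cause, effect, articles in CLAIMS:
        if cause == x and effect == y:
            evidence.extend(articles)
    return bool(evidence), evidence


def possible_causes(y):
    """Answer 'tell me what will possibly cause y', grouped by candidate cause."""
    result = {}
    for cause, effect, articles in CLAIMS:
        if effect == y:
            result.setdefault(cause, []).extend(articles)
    return result


print(will_cause("x1", "y"))
print(possible_causes("y"))
```

In both cases the answer arrives with the list of articles offered as evidence, which is the difference from a search that returns documents alone.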


       Burke, M. (2009). The semantic web and the digital library. Aslib Proceedings, 61(3), 316-
322. doi: 10.1108/00012530910959844.
         This article is an attempt by a librarian to awaken her field to the power of the Semantic Web
for digital libraries. She begins by discussing how ontologies enable the creation of equivalencies
between the metadata of two or more institutional collections. Web 2.0 projects such as FOAF
(friend of a friend, www.foaf-project.org), SIOC (semantically interlinked online communities,
http://sioc-project.org), DBpedia (http://wiki.dbpedia.org), and Musicbrainz (http://musicbrainz.org) all
allow users to link data objects across many different kinds of databases, and thereby foster data
integration. Of course, these are limited, as their ontologies only relate to specific subject domains.
Libraries have only begun to consider the potential of harvesting the data (i.e., digital objects
undergoing digitization) under their control. As one example, the author discusses in some detail
JeromeDL (www.jeromdl.org) as a “social semantic digital library” that integrates many of these new
Semantic Web features into a library’s existing library management system. The article is extremely
basic and does a poor job of explaining what the Semantic Web really is, but does cite some rich case
studies such as the few above, as well as www.talis.com, a group specializing in semantically rich
metadata. After reading articles like that of Feng above, librarians really have their work cut out for
them.


      Greenberg, J. (2007). Advancing the Semantic Web via Library Functions.
Imprint, (906118602). doi: 10.1300/J104v43n03.
        This oft-cited article in the librarian community is an excellent introduction to how the library
and librarians already encompass the skills and knowledge for semantic web technology, albeit in
different form. She argues that more attention to planning and policy in libraries will accelerate
development in the Semantic Web, in contrast to the recent tendency to focus primarily on
folksonomies and social tagging. In doing so, she lists many similarities between the
Semantic Web and library functions. For example, “collection development” might translate to
“semantic web selection” and “cataloging” to “semantic representation”; “reference” to “semantic web
services” and “circulation” to “web resource usage”. She then refers to a gap between the Semantic Web
and library communities, and how the library community has been slow to adapt compared to computer
scientists, psychologists, scientists, linguists, and especially the bio-medical community. She
recommends librarians get involved more and learn the necessary technologies, but also that the
Semantic Web community recognize librarian expertise, especially with regard to cataloging. This is a
well-structured and well-written paper, but I find its metaphors unconvincing, if not rather banal. It
does serve as an effective wake-up call for established librarians buried in their enclaves, falling behind
in opening their collections through the advances in information science.


      Joint, N. (2008). The practitioner librarian and the semantic web: ANTAEUS. Library
Review, 57(3), 178-186. doi: 10.1108/00242530810865466.
        This article attempts to show the enormous potential the Semantic Web has for library science, and in
doing so demystify its apparent complexity. It was also quite helpful in understanding some of the
finer mechanics of RDF. Just as CSS stylesheets separated style from content, Tim Berners-Lee
describes the underlying “data structures” as having a similar modularity, one librarians are familiar with. If
“resource description” in RDF sounds like cataloging or bibliographic description, it is. RDF is indeed
a mark-up language into which the Dublin Core, a cataloging standard, fits. This standard both
describes the resource and allows meaningful relationships to other resources to be drawn. In this
sense, the author describes how this fact serves to distribute resource description beyond the traditional library’s
centralized philosophy. Whereas the earlier Web (HTML and CSS) did not call this conception of a
“library as a closed application” into question, RDF as a description framework presumes opening
these resources to the wider and less-structured world of the “data web”. This is a nice, helpful article
by a librarian with a healthy attitude to the current crisis librarianship faces.


       Krause, J. (2008). Semantic heterogeneity: comparing new semantic web approaches with
those of digital libraries. Library Review, 57(3), 235-248. doi: 10.1108/00242530810865501.
        This article provides a useful and cogent account of the basic differences implied in the
previous Joint (2008) article cited above. Krause compares the “Shell Model” of centralized digital libraries
and database catalogs (the “invisible web”) to the Semantic Web. The Shell Model homogenizes search
terms within a centralized location, such that heterogeneous documents described using a standardized
thesaurus can be integrated. Thesauri usually are criticized for their shallowness and limitations on
relations, while being easier to index. The article attempts to describe how the Semantic Web
overcomes these limitations if the ontologies used to describe heterogeneous domains (e.g., psychology
vs. gerontology) utilize a standardized vocabulary. This article is unnecessarily difficult, not due to
formal language (e.g., mathematics or logic), but due to its rather unorganized and jargon-laden way of
describing these systems at a very abstract level. The computer science articles, even if I don’t follow
the mathematics, were refreshing in comparison. Nevertheless, the article discusses very important
differences, and is very much worth a second look.

      Ferran, N., Mor, E., & Minguillón, J. (2005). Towards personalization in digital libraries
through ontologies. Library Management, 26(4/5), 206-217. doi: 10.1108/01435120510596062.
        This article describes the requirements for building a complete navigational profile of users in
their process of searching and browsing a digital library as part of a distance learning component (e.g.,
a history class). As such, it provides a concrete case study of how the Semantic Web might serve to
link a digital library’s educational goals by integrating its use into a specific e-learning environment
(The Universitat Oberta de Catalunya (UOC) Virtual Library). These personalization initiatives allow a
digital collection to adapt to user necessities and preferences by mapping these parameterized factors
(user profiles, navigational profiles, and user actions) onto an ontology that describes in detail the
various possible scenarios. Thus, when users seek similar types of information, this behavior will
invoke stored processes (i.e., inferences) within the ontology and thereby make use easier. For instance,
exploratory navigation and goal-oriented navigation strategies can be detected and different sets of
recommendations for the digital library’s use can be formulated accordingly. The same system can
detect whether a user is a teacher, student or researcher, and might tailor its recommendations as such.
While I question the usefulness of such a system (machines are quite bad generally at second-guessing
human intention), and myself would find it obnoxious and counter-productive, it is one way of
integrating the Semantic Web into digital collections and more generally education. It provides real-
world application to what often begins to appear a rather abstract field (the Semantic Web in digital
libraries).


       Bygstad, B., Ghinea, G., & Klæboe, G. (2009). Organisational challenges of the semantic
web in digital libraries: a Norwegian case study. Online Information Review, 33(5), 973-985. doi:
10.1108/14684520911001945.
         This is a comprehensive case study of the National Library of Norway. It analyzes two sources
of information in search of understanding the potential impact semantic web technology might have on
digital libraries: 1) the digitization project of the National Library, and 2) interviews with nine different
stakeholders of this project. The study focuses on the strategic, organizational, and technological
aspects of implementing the semantic web in digital libraries. It found few technological hurdles in
implementation, moderate strategic issues, and high organizational costs. At a strategic level, upper-
level management has to generate organization-wide support for the initiative, which will involve some
changes in the organizational structure in order to augment metadata teams with cross-organizational
ontology-engineering infrastructure. This implies large costs to the organization, as inter-organizational
and cross-organizational structures will need to be instituted to address issues in ontology engineering.
The main lesson the article teaches is that ontology engineering and metadata production will go hand
in hand, and shake up the institutional structure in unexpected ways.


       Fuentes-Lorenzo, D., Morato, J., & Gómez, J. M. (2009). Knowledge management in
biomedical libraries: A semantic web approach. Information Systems Frontiers, 11(4), 471-480.
doi: 10.1007/s10796-009-9159-y.
       Biomedicine has been pivotal in the development of the semantic web and digital library
technology. Technical programs in bio-informatics abound. The huge amount of data in the life
sciences, not limited to but certainly in part due to the Human Genome Project, has pushed researchers in
and around this field towards more efficient data-sharing and data-processing solutions. Extending
these applications with metadata embedded in ontologies will give human users a
firmer understanding of the biomedical elements they seek to process, while allowing machines to use
the ontologies’ formal semantic properties to support this reasoning. This article seeks to present design
principles for a simple and “easy-to-use” biomedical Semantic Web. One penetrating example is how
drugs have many different names: most of us know Tylenol, but if the machine is to help, then
higher-order terms within this Semantic Web must connect this term within its ontology to the same concept
expressed in different words: “DCI”, “acetaminophen” or “N02BE01” are other monikers. The article
states that there are on average 19 synonyms for every gene. This article is a wonderful case study for
explicating how the Semantic Web can come alive. Special attention is given to a faceted search
mechanism that allows filtering through the ontology (represented simply in a sidebar) to add
restrictions to result pages, especially useful given these synonyms that operate at different research
levels (physics to chemistry to biology to pharmacology).
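
        To make the synonym idea concrete, here is a minimal sketch of my own (it is not taken from
the article). It assumes a tiny lookup table in which several names point to one concept, and a
single facet (the research level) that restricts the result list. The concept identifiers, the
record titles, and the function names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical concept identifier; only the synonym strings themselves
# come from the annotation above.
SYNONYMS = {
    "tylenol": "concept:paracetamol",
    "acetaminophen": "concept:paracetamol",
    "no2 be01": "concept:paracetamol",
}


@dataclass(frozen=True)
class Record:
    title: str
    concept: str  # concept the record is annotated with
    level: str    # facet value, e.g. "chemistry" or "pharmacology"


def resolve(term):
    """Map a user-supplied name to its concept, ignoring case and outer spaces."""
    return SYNONYMS.get(term.strip().lower())


def faceted_search(records, term, level=None):
    """Return records for the term's concept, optionally restricted by one facet."""
    concept = resolve(term)
    if concept is None:
        return []
    return [r for r in records
            if r.concept == concept and (level is None or r.level == level)]


# Hypothetical records, invented for illustration only.
RECORDS = [
    Record("Dosage guidelines", "concept:paracetamol", "pharmacology"),
    Record("Molecular structure", "concept:paracetamol", "chemistry"),
    Record("Unrelated gene study", "concept:other", "biology"),
]
```

        Searching for "Tylenol" or "acetaminophen" reaches the same two records, and passing
level="chemistry" narrows the list the way a sidebar facet would.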


       Prasad, A., & Madalli, D. P. (2008). Faceted infrastructure for semantic digital libraries.
Library Review, 57(3), 225-234. doi: 10.1108/00242530810865493.
        Context-based retrieval is sadly missing from many interfaces. Top-down models that prioritize
the user’s experience and allow customization require a bottom-up semantic infrastructure. The article
stresses that a “digital library is only as good as its retrieval efficiency”. Given that most digital
libraries inherited the bibliographic database format with search and browse features only, terms are
often pulled from their context. Metadata is really just knowledge, and knowledge is, as described by
Berners-Lee (see Coyle, 2008, above), just “actionable knowledge”. In most libraries, catalogs and
classification schemes allow only “post-coordinate indexing”, such that search terms connected by
Boolean operators strip resources from their context and place their referents in domain-general
indexes (i.e., indexes with no context). Pre-coordinate indexing (or context-dependent indexing), on the
other hand, includes a term’s semantic context (i.e., it is domain-specific). This is not easy, as one topic
has many different perspectives, or facets, each of which orders a given set of concepts into different
class-subclass ontologies (e.g., think about Ranganathan’s faceted cataloging). The order of these terms
might therefore become part of the interface, so that the upper and lower classes of each term
contextualize browsing. If I am interested in harvesting, and perhaps sugarcane harvesting, I don’t
want to enter the sugarcane :: harvesting space, but instead the harvesting :: sugarcane space, which will
open my browsing up to many other kinds of things harvested (instead of many different properties
of sugarcane), depending on my intentions (i.e., my hypotheses).
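
        The harvesting :: sugarcane example can be sketched in a few lines. This is my own
illustration, not the authors' implementation: it assumes each resource is filed under an ordered
chain of terms, and that browsing a chain lists the terms found directly beneath it. The entries
and function names are hypothetical.

```python
from collections import defaultdict


def build_precoordinate_index(entries):
    """Index each resource under every prefix of its ordered facet chain."""
    index = defaultdict(set)
    for chain, resource in entries:
        for depth in range(1, len(chain) + 1):
            index[tuple(chain[:depth])].add(resource)
    return index


def narrower_terms(index, chain):
    """List the terms that appear directly below the given chain."""
    prefix = tuple(chain)
    n = len(prefix)
    return sorted({key[n] for key in index
                   if len(key) == n + 1 and key[:n] == prefix})


# Hypothetical catalog entries: (ordered facet chain, resource identifier).
ENTRIES = [
    (("harvesting", "sugarcane"), "doc1"),
    (("harvesting", "wheat"), "doc2"),
    (("sugarcane", "harvesting"), "doc1"),
    (("sugarcane", "diseases"), "doc3"),
]
INDEX = build_precoordinate_index(ENTRIES)
```

        Entering at "harvesting" offers sugarcane and wheat as next steps, while entering at
"sugarcane" offers diseases and harvesting, so the order of the terms decides which
neighborhood I browse.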


       Krestel, R., Kappler, T., & Lockemann, P. C. (2010). Converting a historical architecture
encyclopedia into a semantic knowledge base. IEEE Intelligent Systems.
        This article describes an ambitious project to turn one of the most respected, yet poorly indexed,
sources on architecture (The Handbook of Architecture, 1880, containing over 25,000 pages) into a
fully indexed and semantically annotated digital library using MediaWiki (www.mediawiki.org).
After digitization, each section (within a chapter) was converted into a wiki page, which was contained
within another wiki page comprising the chapter, and so on down into subsections. The Beta version is
located here: http://durm.semanticsoftware.info/wiki/index.php/Hauptseite. Each wiki page is fully
searchable and indexed, and each is also associated with the digitized original. Natural language
extraction technologies, including logical and morphological analysis, and information extraction,
allowed for many features to be developed. Most notable among these is “automatic summarization”,
which condenses large amounts of text into abstract summaries for readers. The system, while very
detailed, allowed for the eventual construction of ontologies from the extracted concepts themselves, and
each of these was linked to the OWL vocabulary for the Semantic Web. The article was written for a
general audience, but still includes a detailed account of a very ambitious project.


      Madalli, D. P., & Suman, A. (2008). UML for the conceptual web. Online Information
Review, 32(4), 511-515. doi: 10.1108/14684520810897386.
        UML stands for Unified Modeling Language. Here it represents an attempt to develop a faceted
model based on ontologies that are organized the way concepts are organized in the human mind (a rather
bold statement in this article…). That is, while an ontology is a method for structuring concepts in
classes and subclasses, the structure of which does contain rule systems, this article attempts to place
even more constraints on this conceptual network through a developed system of axioms. The article
serves as a proposal and brief literature review of recent attempts to apply these ideas. It explores them
with the goal of developing faceted search/browse features in a user-friendly interface. It also seeks to
contain within this architecture different domains of knowledge, each interoperable with the next, such
that subject experts can continually modify and add to the existing digital library. In their proposal, the
authors suggest that the five fundamental categories of Ranganathan might be used as a fundamental
basis for each facet: personality, matter, energy, space and time.


        Mayr, P., Mutschke, P., & Petras, V. (2008). Reducing semantic complexity in distributed
digital libraries: Treatment of term vagueness and document re-ranking. Library Review, 57(3),
213-224. doi: 10.1108/00242530810865484.
         One-stop academic federated search portals are becoming increasingly common. Examples
include Elsevier’s Scirus portal, the Online Computer Library Center WorldCat catalog, or the Perseus
project of Tufts University. This article examines the Vascoda portal in Germany, a federated interface
of full-text article databases, library catalogs, and Internet resource collections. Searching these
resources, however, only matches literal instances of the search terms. Large databases
nevertheless return many results, leaving the impression that the search misses few documents (Type-II
errors). These kinds of errors, due to the ambiguity and vagueness in human language, become more
problematic in more specialized document repositories, or in collections spanning different databases
with different metadata schemes. This study attempts to deal with this issue of language vagueness by
re-ranking documents according to two parameters. One method is to “bradfordize” results, which
applies the Bradford Law of Scattering. This ranks articles from core journals ahead of articles from
journals that deal with the topic less often. The other method is to rank authors depending on how
prominently they occur (i.e., are cited) within other similar publications (i.e., co-authorship networks).
This article thus applies concepts of the Semantic Web in order to alter ranking algorithms for search
terms.
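
        A minimal sketch of the “bradfordizing” step follows. It is my own simplification, not the
authors' code: I assume each hit is a (title, journal) pair, and I approximate a “core” journal as one
that contributes many hits to the current result set. The result list is hypothetical.

```python
from collections import Counter


def bradfordize(results):
    """Re-rank (title, journal) hits so the most productive journals come first.

    Journal productivity is approximated by the number of hits a journal
    contributes to this result set; ties keep their original order because
    Python's sort is stable.
    """
    hits_per_journal = Counter(journal for _, journal in results)
    return sorted(results, key=lambda hit: -hits_per_journal[hit[1]])


# Hypothetical result list in its original retrieval order.
RESULTS = [
    ("Paper A", "Journal X"),
    ("Paper B", "Core Journal"),
    ("Paper C", "Journal Y"),
    ("Paper D", "Core Journal"),
    ("Paper E", "Core Journal"),
    ("Paper F", "Journal X"),
]
RERANKED = bradfordize(RESULTS)
```

        The three papers from the journal with the most hits move to the top, followed by the two
from Journal X, with the lone Journal Y paper last.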

       Barbera, M., Nucci, M., Hahn, D., & Morbidoni, C. (2008). A Semantic Web Powered
Distributed Digital Library System. Electronic Publishing, (June), 130-139.
       Given the ever-expanding set of resources available on-line, the need for tools that allow for the
intelligent search of these resources grows. This article presents Talia, a distributed semantic digital
library with an annotation system especially tailored to research in philosophy. In a sense, Talia is a digital archive
management system, much like ContentDM or Omeka, but contains within its suite tools
developed especially for the Semantic Web. In this sense it combines a digital archive management
system with an electronic publishing system (an on-line peer review system), each resource utilizing a
distinct and stable URI. It is written using Ruby on Rails, and the knowledge base of unchanging URIs
is organized in RDF, using the RDFS/OWL vocabularies to describe its ontologies. Any digital
library published with Talia can be interconnected with others, which allows cohesive scholarly
communities to interact without giving up control over their own content. This is a fantastic system, one
I wish were available in PHP, such that digital libraries developed in Drupal (my own recent work) might
be ported to it. I reviewed in detail one such digital library that utilizes Talia, the Discovery Project, in our
class blog: http://jahurst.mysite.syr.edu/ist677s2010/?p=229.
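
        To show what “stable URIs organized in RDF” might look like, here is a small sketch of my
own. It does not use Talia or any of its code (Talia is written in Ruby); it simply writes three
statements in the N-Triples format. The example.org URIs and the Manuscript class are hypothetical,
while the rdf:type and Dublin Core term URIs are the standard ones.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/terms/title"
DC_CREATOR = "http://purl.org/dc/terms/creator"

# Hypothetical base for one archive's stable URIs; it does not resolve anywhere.
BASE = "http://example.org/archive/"


def uri(term):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{term}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) strings, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


MANUSCRIPT = uri(BASE + "manuscript/42")  # hypothetical resource
TRIPLES = [
    (MANUSCRIPT, uri(RDF_TYPE), uri(BASE + "ontology#Manuscript")),
    (MANUSCRIPT, uri(DC_TITLE), literal("An example manuscript")),
    (MANUSCRIPT, uri(DC_CREATOR), uri(BASE + "person/7")),
]

print(to_ntriples(TRIPLES))
```

        Because every statement names its subject by a URI that never changes, a second library
could add its own statements about the same manuscript without touching the first library's data.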


        Hull, D., Pettifer, S. R., & Kell, D. B. (2008). Defrosting the digital library: bibliographic
tools for the next generation web. (J. McEntyre). PLoS Computational Biology, 4(10), e1000204.
doi: 10.1371/journal.pcbi.1000204.
        This review discusses the digital libraries currently in use by computational biologists. These
include PubMed, IEEE Xplore, the ACM digital library, ISI Web of Knowledge, Scopus, Citeseer,
arXiv, DBLP and Google Scholar. These tools are described as being “cold” to the user, and are
contrasted with newer “warmer” tools such as Zotero, Mekentosj Papers, MyNCBI, CiteULike,
Connotea, HubMed, and Mendeley. These latter tools take advantage of the social web in order to
make digital libraries more accessible, friendly, and personal. I myself use CiteULike and
Mendeley. In fact, to get an idea of how Mendeley works, please visit my profile, in which I share with
you all of the articles used in this bibliographic review.


        http://www.mendeley.com/research-papers/collections/2216691/The-Semantic-Web-in-Digital-Libraries/


        This resource comes with an application I use on my desktop that essentially provides an
annotative interface for my personal library of PDF documents. These PDFs are automatically OCR’d
and hence searchable. The application also allows me to comment on and annotate parts of each paper
within the document, as well as link to OpenOffice to format bibliographies in various styles. The local
interface can be synchronized at the touch of a button with the web interface, such that I can share my
bibliographies with selected users, or even make them public, as I have with this collection at the URL
above. I am still learning how to use this resource, but using it to manage aspects of the current
annotated bibliography as part of the assignment was rather enlightening. Its being OpenSource is
extremely valuable, as I have used EndNote and RefWorks at various times and lost my organization
due to proprietary licensing, and hence lost my bibliographies.

AI as an Interface for Commercial Buildings
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

The Semantic Web in Digital Libraries: A Literature Review

  • 1. Stephen J. Stose 1 12/22/10 The Semantic Web in Digital Libraries: A Literature Review Introduction The Web is a big place. Currently, it provides millions, if not billions, of documents, each of which a human can read. Each of these documents has a set of terms that are meant to describe it, and hence make it easier for users to locate. This meta-information allows search engines such as Google to map the keywords users type to the terms describing each document. Humans, however, still have to read through this information and hence determine whether each document is indeed relevant to their search. In this way, the Internet is structured syntactically. It is structured, but it does not think. In order to think, terms need to be related somehow. “The book is written by Mark Twain” is a thought that can be represented in propositional space. For those accustomed to searching the Web, writing the terms “book” and “Mark Twain” is usually enough. These terms are usually connected by default with operators such as AND or, for more advanced users, with more specific operators (OR, NOT, or the * wildcard). Given that each resource on the web is merely mapped by terms, the relationship of Mark Twain to Book (“is a book written by”) may be irrelevant to the Google search results we obtain. If another person named Mark Twain had burned a book, however, a new relationship ensues, and suddenly this opens up a completely new conceptual space not excluded by our original search terms. In this way, relating the terms that describe resources by means of both a syntax and a semantics facilitates narrowing the huge Web space into the context we specify. On the one hand, a semantics will allow synonyms for “book” and “Mark Twain” (e.g., “Samuel Clemens”) to be grouped under one concept.
If resources are also described with predicates that entail relationships between entities (“X burned Y” or “X wrote Y”), such relationships will also allow the Web to begin to think. In this way, questions such as “who wrote Tom Sawyer” may locate only resources that provide answers to that question, and exclude others. For this reason, researchers must not only describe documents with descriptors, but also relate these terms within higher-order representations called ontologies. In this review, I seek to explore these ontologies in a way specifically relevant for digital libraries, not the Web in general. Digital libraries are web resources that have, compared to the Web in general, been described with very detailed metadata. As such, they are extremely well structured, and thus make excellent candidates for the Semantic Web. I’ll begin by describing literature that explains the limitations of the Internet in its current form. Tim Berners-Lee believed the Web should not be made up of interconnected resources, but of interconnected data extracted from these resources. The bibliography I present here will seek to develop explanations for the tools of this new trade, which at times are quite technical. These tools require new language formats and new conceptual systems, which must be integrated with existing systems of metadata description commonly employed in library systems. We’ll review the goals of these systems, look at how libraries might need to adapt in order to instantiate these new systems, and consider what such adaptation might require. I’ll argue that libraries, especially digital collections, are especially well suited to this project because they already have in place a strong syntactic system of description. While current metadata standards are easier to adapt, they will still require unexpected and difficult organizational changes within libraries.
  • 2. World Wide Web Consortium (2010). Semantic Web. Retrieved April 11, 2010, from: http://www.w3.org/standards/semanticweb/ This is the international standards organization for the World Wide Web, and is headed by Sir Tim Berners-Lee. The organization ensures industry members agree on standardized versions of web technology such that discrepancies and inconsistencies do not plague the web and web browser technology. W3C maintains a web page that publishes standardization recommendations and rules, and therefore also functions as a portal to learning about new technologies. Indeed, the Semantic Web is an integral part of W3C’s mission to develop the Web into a universal medium for knowledge exchange, and the site contains future project possibilities still unrealized and formal specifications for others currently underway. For instance, it is a principal source in the field regarding the specifications for RDF (Resource Description Framework), many of the data exchange formats (RDF/XML, N3, Turtle), as well as notation schemas (RDF Schema) and ontology languages, such as OWL. Some of these technologies, and others, will be discussed below. W3C remains the best place on the web for information regarding these issues, and hence I cite it here first. Coyle, K. (2008). Meaning, Technology, and the Semantic Web. Journal of Academic Librarianship 34(3), 263-264. Coyle discusses the fundamentals of the Semantic Web as per Tim Berners-Lee’s vision. This article provides a basic framework for understanding how the Internet, as a repository of unstructured and undifferentiated strings of text, can be given meaning that machines can process.
It transforms the “web of documents” into a “web of data” by creating “actionable information.” In beginner’s language, Berners-Lee discusses how semantic triples combine fields (e.g., Dublin Core fields, such that Creator = x and Title = y) with a predicate (x is the author of y), such that a human-language question (e.g., “who wrote y?”) might generate an answer. The author goes on to discuss how RDF is written to infer meaning by use of vocabulary patterns (URIs), and what this might mean for libraries and digital libraries in particular. The article set the stage for my research into how digital collections are properly planned to include RDF. Miller, E. (Designer). (2001). Digital libraries and the semantic web. [Web]. Retrieved from http://www.w3.org/2001/09/06-ecdl/Overview.html This is another introduction to the Semantic Web and is published on the website of the World Wide Web Consortium (W3C). The 39 slides are lessons that focus on how the Semantic Web can create “actionable” searches within digital libraries if the digital library has metadata amenable to an RDF schema of classes and subclasses, and properties and subproperties. An example is made of this by explaining how the MIT digital repository uses the DSpace digital management software to combine RDF, and how a formalized reference library such as www.dmoz.org helps support the merging of distinct ontologies (i.e., vocabulary making up classes and subclasses). This is a very basic introduction, but does refer to many working examples.
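The triple idea described in the Coyle and Miller entries above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any RDF library or the cited sources implement it: the `ex:` identifiers, the `ex:name` predicate, the `query` and `who_wrote` helpers, and the tiny data set are all made up for the example. Only the Dublin Core element names `creator` and `title` come from a real vocabulary.

```python
# A toy triple store: each fact is a (subject, predicate, object) tuple.
# The "ex:" identifiers are hypothetical stand-ins for real URIs.
TRIPLES = {
    ("ex:TomSawyer", "dc:title", "The Adventures of Tom Sawyer"),
    ("ex:TomSawyer", "dc:creator", "ex:MarkTwain"),
    ("ex:MarkTwain", "ex:name", "Mark Twain"),
    ("ex:MarkTwain", "ex:name", "Samuel Clemens"),
}


def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in TRIPLES
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    )


def who_wrote(title):
    """Answer "who wrote <title>?" by following title -> work -> creator -> names."""
    names = set()
    for work, _, _ in query(predicate="dc:title", obj=title):
        for _, _, creator in query(subject=work, predicate="dc:creator"):
            names.update(o for _, _, o in query(subject=creator, predicate="ex:name"))
    return sorted(names)


print(who_wrote("The Adventures of Tom Sawyer"))
```

Because both names hang off one identifier, the question is answered with “Mark Twain” and “Samuel Clemens” alike, which is the grouping of synonyms under one concept that the introduction describes.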
  • 3. Manola, F. & Miller, E. (2004, February 10). RDF Primer. Retrieved from http://www.w3.org/TR/2004/REC-rdf-primer-20040210/ This is a comprehensive source for understanding RDF. The Resource Description Framework is a language that serves to represent information about resources spread throughout the World Wide Web. RDF serves to make documents readable by machines, not just people, by describing resources using simple properties and property values (i.e., triples). This document is a technical primer. It explains the technical details of RDF and the RDF/XML that structures these values in terms of triples, or subject-predicate-object propositions that become machine-readable. It explains how the Web is made up of URLs and more general URIs. We are familiar with URLs, but Uniform Resource Identifiers can be created to allow these triples to identify the subjects, predicates and objects in these propositions. These URIs are often just fragments, preceded by #, trailing the URL. The triples should be expressed in a common vocabulary, which is constructed into an ontology, something we’ll be discussing more in this bibliographic review. This resource really explains the technical mechanisms involved, and does not just refer to using the Semantic Web for digital libraries, but for the Web in general. Macgregor, G. (2008). Introduction to a special issue on digital libraries and the semantic web: context, applications and research. Library Review, 57(3), 173-177. doi: 10.1108/00242530810865457. This article provides some context to our review regarding the integration of the Semantic Web into digital libraries. It describes the Semantic Web as allowing machines to read the web, much like humans can. Alternatively, and more specifically, the Semantic Web seeks to include metadata about the semantics of a web resource (keyword relationships), and not just its syntax (keyword mapping).
Currently, digital libraries describe their digital objects with fields, but do these fields relate to one another semantically? And do they relate to other digital libraries semantically, such that “author” might also be understood as “creator” or “100 $a Melville, Herman” (in MARC21)? This article introduces the Semantic Web to digital curators by providing a basic-level explanation of its potential for use. The Semantic Web promotes better collaboration and exposure of digital objects, enhances navigation and retrieval amongst heterogeneous environments, allows user profiling and personalization, and improves interfaces and human-computer interaction. This article seeks to introduce librarians to the Semantic Web, serves as an introduction to its use, seeks to allay fears about its integration, and attempts to make the topic more accessible than it is perceived to be. Fox, E. A., & Marchionini, G. (1998). Toward a worldwide digital library. Communications of the ACM, 41(4), 29-32. doi: 10.1145/273035.273043. The article outlines how digital libraries constitute very complex information systems in that they involve collaboration, preservation, database management, instruction and learning, filtering and retrieval, property rights, multimedia services, selection/collation, and reference and discovery. These systems augment the physical objects they represent, and as such are available worldwide, thereby opening up to a range of uses and users for global exchange and international understanding. A key to harnessing their power and linking these nationalized resources (the authors use examples of national federated digital libraries in many countries), however, is interoperability. Linking these resources requires 1) technical interoperability (networks & protocols), 2) informational interoperability (language, metadata, naming conventions and software interfaces) and 3) social interoperability (personal and organizational rights/responsibilities).
These difficulties are discussed in a series of
  • 4. articles that discuss Z39.50, multilingual support, community and government integration, and the collaborative design of interfaces and query formulation. The article serves as a useful reminder of the goals of designing a digital library, and that technology should not be mistaken for more than a tool. Lytras, M., Sicilia, M., Davies, J., & Kashyap, V. (2005). Digital libraries in the knowledge era: Knowledge management and Semantic Web technologies. Library Management, 26(4/5), 170-175. doi: 10.1108/01435120510596026. These authors describe the Semantic Web as an extension of the current “metadata intensive” fields of digital library and digitization programs. The commitment digital libraries keep to a formal foundation of metadata provides a distinct advantage over traditional HTML markup (which has few ‘meta’ tags). While metadata production and maintenance is already time-consuming, integrating these services with annotations that refer to shared ontologies may indeed create more complexity, but much less than is involved with the Internet in general. This paper is the first paper in a special issue of Library Management, and serves to summarize many of the more technical papers that follow. Many of these papers we cite here in the current annotated bibliography, including Sure & Studer (2005), as well as Ferran, Mor, & Minguillón (2005). Sure, Y., & Studer, R. (2005). Semantic Web technologies for digital libraries. Library Management, 26(4/5), 190-195. doi: 10.1108/01435120510596044. This article discusses the fact that computing standards specifying structure are falling into place (e.g., eXtensible Markup Language). While this alphabet (i.e., structure) has been successful at allowing different forms of interoperability, it was invented before the invention of a common language. That is, the Internet still lacks semantic standards.
While HTML documents may function well for human consumption (humans can read them), they lack ontologies that provide meaning to the structure, and hence allow humans and computers to communicate. Ontologies are networks of classes and subclasses interlinked to describe some domain of interest. As such, they allow inferences, in much the same way as knowing something is a mammal (class) allows us to know that it produces milk (property), or likely has hair (property), or itself is a subclass belonging to the class of animals. The article explains how Semantic Web technologies might help digital libraries. If the descriptions we use for objects and repositories utilize standardized ontologies, then this would make it easier for computerized language to map onto the vagaries of human language. Such queries would also enable consistent and coherent access to classes and subclasses of digital objects distributed across many different repositories (i.e., interoperability). The article goes on to discuss how librarians need to become proficient in using ontology editors such as OntoEdit, KAON, or Protégé, and annotation tools such as Annotea, OntoMat-Annotizer (CREAM), and KIM; as well as inference engines, which allow machines, from these descriptions and annotations of objects, to process them logically (if mammal, then gives milk and has hair). It ends with a description of the Semantic Web layer cake by Tim Berners-Lee. Brewster, C., & O’Hara, K. (2007). Knowledge representation with ontologies: Present challenges—Future possibilities. International Journal of Human-Computer Studies, 65(7), 563-568. doi: 10.1016/j.ijhcs.2007.04.003. This is an excellent introduction into the world of ontology construction for representing human
  • 5. concepts and knowledge structures in computers. It begins by laying out philosophical assumptions regarding the nature of knowledge, whether it be facts layered like bricks over time (Rationalists and Empiricists), or paradigmatic shifts in knowledge that at times require historical re-writes (Quine and Kuhn). In either case, an ontology has no truth-value; and it represents concepts, not words, even if humans have to use words to represent concepts. The latter point is the very reason why we must standardize our vocabularies, and the former is why no single ontological structure permits perfect external fidelity. Ontologies are representations that by necessity imperfectly represent the external world. The very fact that institutions utilize different ontologies (whether implicitly or explicitly) is the reason why knowledge sharing (i.e., interoperability) is problematic. The article discusses how explicitly representing an institution’s ontology allows for efficient computation by providing a way to communicate with machines, given that the ontology itself is one form of human expression. The article also argues against the idea that user-generated folksonomies (i.e., social tagging) can function as ontologies. The article serves as an introduction to further articles in the same issue, which focus on OWL and decidability, the augmentation of ontologies, whether ontologies can represent common sense and ordinary language, and the difficulties of modeling different kinds of knowledge in different forms (e.g., narratives, multimedia, and enterprises). This is a great introduction to the current state of the art, even if much of the latter half of the article merely summarizes difficult literature within the same volume. Du, T. C., Li, F., & King, I. (2009). Managing knowledge on the Web – Extracting ontology from HTML Web. Decision Support Systems, 47(4), 319-331. Elsevier B.V.
doi: 10.1016/j.dss.2009.02.011. Ontologies need not be human generated. In this article, an ontology extractor called OntoSpider is presented. Given that the Web is many times larger than the collection of digital libraries, this article provides interesting hope. If the Web itself can be crawled for abstract class-subclass, property-subproperty relationships, and a machine can build ontologies that represent these relationships, it seems all the more likely that the highly structured data librarians create might be much easier to represent as ontologies. The HTML Web has little meta-description, whereas digital librarians follow metadata standards that, even if only loosely related to one another, do describe objects in similar ways (e.g., author vs. creator vs. artist, etc.). By describing how millions of unstructured HTML pages can be converted into the Semantic Web by an extractor requiring very little human-initiated knowledge engineering, the article had the effect of making the integration of digital libraries seem much simpler. Digital libraries are already quite structured with metadata, even if only syntactic metadata. HTML web pages, on the other hand, contain few syntactic or semantic cues. Thus, the article discusses how the HTML page itself first has to be dismantled, simplified and annotated before an ontology that seeks to represent its concepts semantically can even be constructed. Digital libraries are, for this reason, much better candidates for the Semantic Web, as the syntactic work is nearly complete. This is a fantastic introduction to the way the Internet functions, and can be understood with little mathematics. Pulido, J., Ruiz, M., Herrera, R., Cabello, E., Legrand, S., Elliman, D., et al. (2006). Ontology languages for the semantic web: A never completely updated review. Knowledge-Based Systems, 19(7), 489-497. doi: 10.1016/j.knosys.2006.04.013. Pulido et al.
describe in detail the different ways currently available to describe human knowledge. Different semantic web languages have been created over the past few years that seek to
  • 6. enable machines to think. This implies machines can make inferences, but for this to occur, frameworks are required that support the creation of ontologies. This article describes these tools, from early tools such as KIF (Knowledge Interchange Format), F-Logic and Dublin Core to more recent formats such as XML, RDF, Knowledge Annotation methods, OIL (Ontology Interchange Language) and DARPA Agent Markup Language (DAML), and OWL (Web Ontology Language). These approaches overlap and have different functions, and the authors describe the “semantic web ontology creation process” and which tools are involved in each step. The steps are: gathering, extraction, organization, refinement, merging and retrieval, all occurring for a given domain of knowledge. This article is a fantastic reference to the tools of the trade, describing which tools are used for which task. Feng, L. (2005). Beyond information searching and browsing: acquiring knowledge from digital libraries. Information Processing & Management, 41(1), 97-120. doi: 10.1016/j.ipm.2004.04.005. A digital library system should empower its user to locate information to solve problems. Most digital library architecture focuses on searching and browsing, a tactical strategy the author contrasts with more strategic information-seeking behavior. The tactical strategy locates documents, whereas a digital library system that takes the information seeker’s strategy into account helps him or her locate intelligent answers to questions. It involves the user’s hypotheses and pre-existing knowledge base in that domain. The author describes a dual-level digital library where tactical document-keyword matching is augmented with hypothesis-question “mapping”. In this way, higher-order cognitive questions such as “tell me whether x will cause y” or “tell me what will possibly cause y, give me referent articles that talk about this” produce answers and justifications for these answers.
In this case, the justification is a series of articles (and perhaps even the relevant part of each article) offered as evidence for the hypothesized relationship. This is the very part of the Semantic Web that gets me, a cognitive scientist, very excited. The article does go on to formally describe how this might work, which does involve a great deal of set theory and higher-order logic. Burke, M. (2009). The semantic web and the digital library. Aslib Proceedings, 61(3), 316-322. doi: 10.1108/00012530910959844. This article is an attempt by a librarian to awaken her field to the power of the Semantic Web for digital libraries. She begins by discussing how ontologies enable the creation of equivalencies between the metadata of two or more institutional collections. Web 2.0 projects such as FOAF (friend of a friend, www.foaf-project.org), SIOC (semantically interlinked online communities, http://sioc-project.org), DBpedia (http://wiki.dbpedia.org), and Musicbrainz (http://musicbrainz.org) all allow users to link data objects across many different kinds of databases, and thereby foster data integration. Of course, these are limited, as their ontologies only relate to specific subject domains. Libraries have only begun to consider the potential of harvesting the data (i.e., digital objects undergoing digitization) under their control. As one example, the author discusses in some detail JeromeDL (www.jeromdl.org) as a “social semantic digital library” that integrates many of these new Semantic Web features into a library’s existing library management system. The article is extremely basic and does a poor job of explaining what the Semantic Web really is, but does cite some rich case studies such as the few above, as well as www.talis.com, a group specializing in semantically rich metadata. After reading articles like that of Feng above, it is clear that librarians really have their work cut out for them.
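The “equivalencies between metadata” that Burke mentions, and the “author” versus “creator” versus MARC 100 $a question raised in the Macgregor entry, can be illustrated with a very small field crosswalk. The sketch below is a hedged illustration only: the `EQUIVALENT_FIELDS` table, the `normalize` and `search` helpers, the two collections, and their records are hypothetical, and a real system would express such equivalences in an ontology language rather than a Python dictionary.

```python
# Hypothetical crosswalk: each collection's local field name mapped to a shared concept.
EQUIVALENT_FIELDS = {
    "author": "creator",
    "creator": "creator",
    "100$a": "creator",  # MARC 21 main entry, personal name
    "title": "title",
    "245$a": "title",    # MARC 21 title statement
}


def normalize(record):
    """Rewrite a record's local field names into the shared vocabulary."""
    return {EQUIVALENT_FIELDS.get(field, field): value for field, value in record.items()}


def search(collections, field, value):
    """Return the names of collections holding a record whose shared field equals value."""
    hits = []
    for name, records in collections.items():
        for record in records:
            if normalize(record).get(field) == value:
                hits.append(name)
    return hits


# Two made-up collections that describe the same book with different field names.
collections = {
    "library_a": [{"author": "Melville, Herman", "title": "Moby Dick"}],
    "library_b": [{"100$a": "Melville, Herman", "245$a": "Moby Dick"}],
}

print(search(collections, "creator", "Melville, Herman"))
```

A single query on the shared concept reaches both collections even though neither uses the word “creator” locally, which is the kind of interoperability the reviewed articles have in mind.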
  • 7. Greenberg, J. (2007). Advancing the Semantic Web via Library Functions. Imprint, (906118602). doi: 10.1300/J104v43n03. This oft-cited article in the librarian community is an excellent introduction to how the library and librarians already encompass the skills and knowledge for semantic web technology, albeit in different form. She argues that more attention to planning and policy in libraries will accelerate development in the Semantic Web, more so than the focus on folksonomies and social tagging that seems to characterize its recent tendency. In doing so, she lists many similarities between the Semantic Web and library functions. For example, “collection development” might translate to “semantic web selection” and “cataloging” to “semantic representation”; “reference” to “semantic web services” and “circulation” to “web resource usage”. She then refers to a gap between the Semantic Web and library communities, and how the library community has been slow to adapt compared to computer scientists, psychologists, scientists, linguists, and especially the bio-medical community. She recommends librarians get more involved and learn the necessary technologies, but also that the Semantic Web community recognize librarian expertise, especially in regard to cataloging. This is a well-structured and well-written paper, but I find its metaphors unconvincing, if not rather banal. It does serve as an effective wake-up call for established librarians buried in their enclaves, falling behind in opening their collections through the advances in information science. Joint, N. (2008). The practitioner librarian and the semantic web: ANTAEUS. Library Review, 57(3), 178-186. doi: 10.1108/00242530810865466. This article attempts to show the enormous potential the Semantic Web has for library science, and in doing so demystify its apparent complexity.
It also was quite helpful in understanding some of the finer mechanics of RDF. Just as CSS stylesheets separated style from content, Tim Berners-Lee describes the underlying “data structures” as having a similar modularity librarians are familiar with. If “resource description” in RDF sounds like cataloging or bibliographic description, it is. RDF is indeed a mark-up language into which the Dublin Core, a cataloging standard, fits. This standard both describes the resource and allows meaningful relationships to other resources to be drawn. In this sense, the author describes how this fact serves to distribute it beyond the traditional library’s centralized philosophy. Whereas the earlier Web (HTML and CSS) did not call this conception of a “library as a closed application” into question, RDF as a description framework presumes opening these resources to the wider and less-structured world of the “data web”. This is a nice, helpful article by a librarian with a healthy attitude to the current crisis librarianship faces. Krause, J. (2008). Semantic heterogeneity: comparing new semantic web approaches with those of digital libraries. Library Review, 57(3), 235-248. doi: 10.1108/00242530810865501. This article provides a useful and cogent account of the basic differences implied in the previous Joint (2008) article cited above. He compares the “Shell Model” of centralized digital libraries and database catalogs (the “invisible web”) to the Semantic Web. The Shell Model homogenizes search terms within a centralized location, such that heterogeneous documents described using a standardized thesaurus can be integrated. Thesauri are usually criticized for their shallowness and limitations on relations, while being easier to index. The article attempts to describe how the Semantic Web overcomes these limitations if the ontologies used to describe heterogeneous domains (e.g., psychology vs. gerontology) utilize a standardized vocabulary.
This article is unnecessarily difficult, not due to formal language (e.g., mathematics or logic), but due to its rather unorganized and jargon-laden way of describing these systems at a very abstract level. The computer science articles, even if I don’t follow
  • 8. the mathematics, were refreshing in comparison. Nevertheless, the article discusses very important differences, and is very much worth a second look. Ferran, N., Mor, E., & Minguillón, J. (2005). Towards personalization in digital libraries through ontologies. Library Management, 26(4/5), 206-217. doi: 10.1108/01435120510596062. This article describes the requirements for building a complete navigational profile of users in their process of searching and browsing a digital library as part of a distance learning component (e.g., a history class). As such, it provides a concrete case study of how the Semantic Web might serve to link a digital library’s educational goals by integrating its use into a specific e-learning environment (the Universitat Oberta de Catalunya (UOC) Virtual Library). These personalization initiatives allow a digital collection to adapt to user necessities and preferences by mapping these parameterized factors (user profiles, navigational profiles, and user actions) onto an ontology that describes in detail the various possible scenarios. Thus, when users seek similar types of information, this behavior will invoke stored processes (i.e., inferences) within the ontology and thereby make use easier. For instance, exploratory navigation and goal-oriented navigation strategies can be detected and different sets of recommendations for the digital library’s use can be formulated accordingly. The same system can detect whether a user is a teacher, student or researcher, and might tailor its recommendations accordingly. While I question the usefulness of such a system (machines are generally quite bad at second-guessing human intention), and would myself find it obnoxious and counter-productive, it is one way of integrating the Semantic Web into digital collections and, more generally, education.
It provides a real-world application of what often begins to appear a rather abstract field (the Semantic Web in digital libraries).

Bygstad, B., Ghinea, G., & Klæboe, G. (2009). Organisational challenges of the semantic web in digital libraries: a Norwegian case study. Online Information Review, 33(5), 973-985. doi: 10.1108/14684520911001945.

This is a comprehensive case study of the National Library of Norway. It analyzes two sources of information to understand the potential impact semantic web technology might have on digital libraries: 1) the digitization project of the National Library, and 2) interviews with nine different stakeholders of this project. The study focuses on the strategic, organizational, and technological aspects of implementing the semantic web in digital libraries. It found few technological hurdles in implementation, moderate strategic issues, and high organizational costs. At a strategic level, upper-level management has to generate organization-wide support for the initiative, which will involve some changes in the organizational structure in order to augment metadata teams with cross-organizational ontology-engineering infrastructure. This implies large costs to the organization, as inter-organizational and cross-organizational structures will need to be instituted to address issues in ontology engineering. The main lesson the article teaches is that ontology engineering and metadata production will go hand in hand, and will shake up the institutional structure in unexpected ways.

Fuentes-Lorenzo, D., Morato, J., & Gómez, J. M. (2009). Knowledge management in biomedical libraries: A semantic web approach. Information Systems Frontiers, 11(4), 471-480. doi: 10.1007/s10796-009-9159-y.

Biomedicine has been pivotal in the development of the semantic web and digital library technology. Technical programs in bio-informatics abound. The huge amount of data in the life
sciences, not limited to but certainly in part due to the human genome project, has pushed researchers in and around this field towards more efficient data-sharing and data-processing solutions. Extending these applications with metadata information embedded in ontologies will give human users a firmer understanding of the biomedical elements they seek to process, while allowing machines to use the ontologies’ formal semantic properties to support this reasoning. This article seeks to present design principles for a simple and “easy-to-use” biomedical Semantic Web. One telling example is that drugs have many different names: most of us know Tylenol, but if the machine is to help, then higher-order terms within this Semantic Web must connect this term within its ontology to the same concept expressed in different words: “DCI”, “acetaminophen” or “N02 BE01” are other monikers. The article states that there are on average 19 synonyms for every gene. This article is a wonderful case study for explicating how the Semantic Web can come alive. Special attention is given to a faceted search mechanism that allows filtering through the ontology (represented simply in a sidebar) to add restrictions to result pages, which is especially useful given that these synonyms operate at different research levels (physics to chemistry to biology to pharmacology).

Prasad, A., & Madalli, D. P. (2008). Faceted infrastructure for semantic digital libraries. Library Review, 57(3), 225-234. doi: 10.1108/00242530810865493.

Context-based retrieval is sadly missing from many interfaces. Top-down models that prioritize the user’s experience and allow customization require a bottom-up semantic infrastructure. The article stresses that a “digital library is only as good as its retrieval efficiency”. Given that most digital libraries inherited the bibliographic database format with search and browse features only, terms are often pulled from their context.
Metadata is really just knowledge, and knowledge is, as described by Berners-Lee (see Coyle, 2008, above), just “actionable knowledge”. In most libraries, catalogs and classification schemes allow only “post-coordinate indexing”, such that search terms connected by Boolean operators strip resources from their context and place their referents in domain-general indexes (i.e., indexes with no context). Pre-coordinate indexing (or context-dependent indexing), on the other hand, includes a term’s semantic context (i.e., it is domain-specific). This is not easy, as one topic has many different perspectives, or facets, each of which orders a given set of concepts into different class-subclass ontologies (e.g., think about Ranganathan’s faceted cataloging). The order of these terms might therefore become part of the interface, and as such the upper and lower classes of each term contextualize browsing. If I am interested in harvesting, and perhaps sugarcane harvesting, I don’t want to enter the sugarcane :: harvesting space, but instead the harvesting :: sugarcane space, which will open my browsing up to many other kinds of things that are harvested (instead of many different properties of sugarcane), depending on my intentions (i.e., my hypotheses).

Krestel, R., Kappler, T., & Lockemann, P. C. (2010). Converting a historical architecture encyclopedia into a semantic knowledge base. IEEE Intelligent Systems.

This article describes an ambitious project to digitize one of the most respected, yet poorly indexed, sources on architecture (The Handbook of Architecture, 1880, containing over 25,000 pages) and convert it into a fully indexed and semantically annotated digital library using MediaWiki (www.mediawiki.org). After digitization, each section (within a chapter) was converted into a wiki page, which was contained within another wiki page comprising the chapter, and so on into subsections. The Beta version is located here: http://durm.semanticsoftware.info/wiki/index.php/Hauptseite.
Each wiki page is fully searchable and indexed, and each is linked to the digitized original. Natural language extraction technologies, including logical and morphological analysis, and information extraction,
allowed for many features to be developed. Most notable among these is “automatic summarization”, which condenses large amounts of text into abstract summaries for readers. While very detailed, this system allowed for the eventual construction of ontologies from the extracted concepts themselves, each of which was linked to the OWL vocabulary for the Semantic Web. The article was written for a general audience, but still includes a detailed account of a very ambitious project.

Madalli, D. P., & Suman, A. (2008). UML for the conceptual web. Online Information Review, 32(4), 511-515. doi: 10.1108/14684520810897386.

UML stands for Unified Modeling Language. It represents an attempt to develop a faceted model based on ontologies that are organized according to the way they are in the human mind (a rather bold statement in this article…). That is, while an ontology is a method for structuring concepts in classes and subclasses, the structure of which does contain rule systems, this article attempts to place even more constraints on this conceptual network through a developed system of axioms. The article serves as a proposal and brief literature review of recent attempts to apply these ideas. It explores them with the goal of developing faceted search/browse features in a user-friendly interface. It also seeks to contain within this architecture different domains of knowledge, each interoperable with the next, such that subject experts can continually modify and add to the existing digital library. In their proposal, the authors suggest that Ranganathan’s five fundamental categories might be used as the basis for each facet: personality, matter, energy, space and time.

Mayr, P., Mutschke, P., & Petras, V. (2008). Reducing semantic complexity in distributed digital libraries: Treatment of term vagueness and document re-ranking. Library Review, 57(3), 213-224. doi: 10.1108/00242530810865484.
One-stop academic federated search portals are becoming increasingly common. Examples include Elsevier’s Scirus portal, the Online Computer Library Center WorldCat catalog, and the Perseus project of Tufts University. This article examines the Vascoda portal in Germany, a federated interface of full-text article databases, library catalogs, and Internet resource collections. Searching these resources, however, only maps literal instances of the search terms. Large databases nevertheless return many results, leaving the impression that they miss few documents (i.e., commit few Type-II errors). These kinds of errors, due to the ambiguity and vagueness of human language, become more problematic in more specialized document repositories, or in collections spanning different databases with different metadata schemes. This study attempts to deal with this issue of language vagueness by re-ranking documents according to two parameters. One method is to “bradfordize” results, which applies the Bradford Law of Scattering. This ranks articles from core journals ahead of those from journals that do not usually deal with the same topic as much. The other method is to rank authors depending on how prominently they occur (i.e., are cited) within other similar publications (i.e., co-authorship networks). This article thus draws on concepts of the Semantic Web in order to alter ranking algorithms for search terms.

Barbera, M., Nucci, M., Hahn, D., & Morbidoni, C. (2008). A Semantic Web Powered Distributed Digital Library System. Electronic Publishing, (June), 130-139.

Given the ever-expanding set of resources available on-line, the need for tools that allow for the intelligent search of these resources grows. This article presents Talia, a distributed semantic digital library with an annotation system especially tailored to research in philosophy. In a sense, Talia is a digital archive
management system, much like CONTENTdm or Omeka, but contains within its suite tools developed especially for the Semantic Web. In this sense it combines a digital archive management system with an electronic publishing system (an on-line peer review system), each resource having a distinct and stable URI. It is written in Ruby on Rails, and its knowledge base of unchanging URIs is organized in RDF, using the RDFS/OWL vocabularies to describe its ontologies. Any digital library published with Talia can be interconnected with others, which allows cohesive scholarly communities to interact without giving up control over their own content. This is a fantastic system, one I wish were available in PHP, such that digital libraries developed in Drupal (my own recent work) might be ported as such. I reviewed in detail one such digital library that utilizes Talia, the Discovery Project, in our class blog: http://jahurst.mysite.syr.edu/ist677s2010/?p=229.

Hull, D., Pettifer, S. R., & Kell, D. B. (2008). Defrosting the digital library: bibliographic tools for the next generation web. (J. McEntyre, Ed.). PLoS Computational Biology, 4(10), e1000204. doi: 10.1371/journal.pcbi.1000204.

This review discusses the current digital libraries in use by computational biologists. These include PubMed, IEEE Xplore, the ACM Digital Library, ISI Web of Knowledge, Scopus, CiteSeer, arXiv, DBLP and Google Scholar. These tools are described as being “cold” to the user, and are contrasted with newer, “warmer” tools such as Zotero, Mekentosj Papers, MyNCBI, CiteULike, Connotea, HubMed, and Mendeley. These latter tools take advantage of the social web in order to make digital libraries more accessible, friendly, and personal. I myself use CiteULike and Mendeley. In fact, to get an idea of how Mendeley works, please visit my profile, in which I share all of the articles used in this bibliographic review:
http://www.mendeley.com/research-papers/collections/2216691/The-Semantic-Web-in-Digital-Libraries/

This resource comes with an application I use on my desktop that essentially provides an annotative interface for my personal library of PDF documents. These PDFs are automatically OCR’d and hence searchable. The application also allows me to comment on and annotate parts of each paper within the document itself, and it links to OpenOffice to format bibliographies in various styles. The local interface can be synchronized at the touch of a button with the web interface, such that I can share my bibliographies with selected users, or even make them public, as I have with this collection at the URL above. I am still learning how to use this resource, but using it to manage aspects of the current annotated bibliography as part of the assignment was rather enlightening. Its being open source is extremely valuable, as I have used EndNote and RefWorks at various times and lost my organization due to proprietary licensing, and hence lost my bibliographies.