1. Semantic Digital Libraries:
Improving Usability of Information Discovery
with Semantic and Social Services
Sebastian Ryszard Kruk
Copyright @ Sebastian Ryszard Kruk, http://www.sebastiankruk.com/
10. Problem Statement
Digital library users are
missing a librarian => problems with information discovery
and understanding complex metadata
missing peers => cannot share experience with other
users visiting the library
missing connection with other sources => library
resources cannot become a part of the information
processing workflow
Digital Library system
knowledge organization systems => islands of highly
organized information
poor information discovery => losing their position to
other sources
incompatible taxonomies and schemata => losing the potential
of rich metadata
11. Hypothesis
Semantic and social technologies in digital libraries
improve information discovery compared to classic
approaches:
Users find information more easily
Precision in searching is improved
Users’ satisfaction is increased
Users retain more information
[Figure: Semantic Digital Libraries as a combination of the Semantic Web, Web 2.0, and Digital Libraries. Labels: expressiveness, interoperability, tagging, communities, controlled vocabularies, knowledge organization systems.]
13. SemDL Architecture
Existing reference digital library architectures
Alexandria DL architecture (Frew et al, 1998)
DELOS reference model (actors) (Candela et al, 2007a)
Interaction Triptych Model (Fuhr et al, 2007)
Missing:
Object Model: integration of metadata, reuse of library resources
Digital Library Services: interoperability, user annotation, advanced search and browsing
[Figure: SemDL architecture. Users: UI, agents, communities of users. System: Data Presentation Layer; Data Access and Manipulation Layer (Basic Services, Interoperability Services, Information Access Services, Advanced Mgmt. Services); Data Abstraction Layer. Content: Data Sources. Surrounding actors: service developers, DL designers, external services, DL administrators.]
Published in: Kruk et al., 2005 (DEXA); Kruk and McDaniel, 2008 (Springer); Kruk et al., 2009 (accepted to TEL)
14. Ontologies for SemDL
Requirements:
Support for a complex and dynamic structure of
information objects; reuse, aggregation; scientific
publications workflow
Support for rich, interconnected and
interoperable bibliographic metadata; align existing
concepts, e.g., MARC21, BibTeX, Dublin Core, SKOS,
Address Ontology
Support for communities of library users: FOAF,
SIOC, Tom Gruber’s Tagging Ontology
Support for rights management; model based on
ODRL and XACML
Published in: Kruk et al., 2005 (DEXA); Kruk and Haslhofer, 2006 (NKOS, ECDL); Kruk and McDaniel, 2008 (Springer); Kruk et al.,
2009 (accepted to TEL)
15. SemDL Ontologies Example
[Figure: overview of SemDL ontology concepts in four panels: Community Ontologies, Structure Ontology, Bibliographic Ontologies, Rights Management Ontology.
Node labels: John Doe, Sebastian Kruk, DERI, SemDL, Book, Abstract, Digital Libraries (directory), Corrib, Collection, Introduction, SemDL Tutorial, (Tagging), intro.pdf, Book (term), DERIans, Pittsburgh, read (License).
Properties shown: foaf:knows, xfoaf:trustLevel (20%), sscf:issuedBy, sscf:isIn, sioc:related_to, tagging:hasTerm, rdfs:label, jdl:hasPart, jdl:hasRepresentation, dc:creator, marcont:hasAffiliation, marcont:hasCreator, marcont:hasAbstract, marcont:hasRelatedEvent, hasAddress, eac:hasLicense.]
Ontologies designed: JeromeDL structure ontology, MarcOnt
bibliographic ontology, FOAFRealm/SSCF ontology, Extensible
Access Control ontology, S3B Tagging Ontology
Ontologies used: FOAF, SKOS, SIOC, Address ontology
Published in: Kruk et al., 2005 (DEXA); Kruk and Haslhofer, 2006 (NKOS, ECDL); Kruk and McDaniel, 2008 (Springer); Kruk et al.,
2009 (accepted to TEL)
17. Social Semantic Collaborative Filtering
Motivation
support identifying and finding experts, and
propagating their expertise
allow users to express their interests and filter the
knowledge base using disambiguation mechanisms
feature security mechanisms for efficient and
secure information gathering and dissemination
Model
graph of quantified social relations
graph of inclusions of collections annotated with
KOS concepts
access control based on the position in the social
network
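The position-based access control in this model can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that every social relation carries a value between 0 and 1, that the value of a path is the product of the values along it, and that access is granted when the requester can be reached from the resource owner within a maximum number of hops and with a path value above a threshold. The function names and the sample network are hypothetical and are not taken from FOAFRealm or JeromeDL.

```python
from typing import Dict

# Directed, weighted social graph: graph[a][b] is how highly a rates b (0..1).
# All names and values below are made up for illustration.
SocialGraph = Dict[str, Dict[str, float]]


def best_path_value(graph: SocialGraph, owner: str, requester: str, max_hops: int) -> float:
    """Highest product of relation values over paths of at most max_hops edges."""
    best = {owner: 1.0}
    frontier = {owner: 1.0}  # best value for paths of exactly k edges
    for _ in range(max_hops):
        reached: Dict[str, float] = {}
        for user, value in frontier.items():
            for friend, weight in graph.get(user, {}).items():
                candidate = value * weight
                if candidate > reached.get(friend, 0.0):
                    reached[friend] = candidate
        for user, value in reached.items():
            if value > best.get(user, 0.0):
                best[user] = value
        frontier = reached
    return best.get(requester, 0.0)


def has_access(graph: SocialGraph, owner: str, requester: str,
               max_hops: int, min_value: float) -> bool:
    """Assumed rule: close enough in the network and rated highly enough."""
    return best_path_value(graph, owner, requester, max_hops) > min_value


network: SocialGraph = {
    "damian": {"alice": 0.9, "bob": 0.5},
    "alice": {"caroline": 0.95},
}

# Damian shares a collection with users fewer than 2 hops away and rated above 80%.
for person in ("alice", "bob", "caroline"):
    print(person, has_access(network, "damian", person, max_hops=1, min_value=0.8))
```

Raising `max_hops` lets trust propagate to friends of friends, at the cost of a lower path value for each extra hop.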
Published in: Grzonkowski, Gzella, Kruk, et al., 2009 (Journal of Web Based Communities); Choi, Kruk, et al., 2006 (IRW, WWW);
Kruk, et al, 2006 (ASWC); Kruk and Decker, 2005 (Semantic Desktop Workshop, ISWC)
18. Social Semantic Collaborative Filtering
[Figure: example SSCF network. Users: Alice, Bob, Caroline, Damian, Eric, Felix, Gerald. Relations are labeled with FQ values (80%, 50%, 30%, 10%). Collections: Bibliographic Ontologies, Ontology Mediation, Artificial Intelligence, Digital Libraries, Distributed Systems, Libraries, P2P Systems, Semantic Web, Legacy Ontologies Mediation. Access control example: ACL(PD, Damian) < 2, ACL(FQ, Damian) > 80%.]
Published in: Grzonkowski, Gzella, Kruk, et al., 2009 (Journal of Web Based Communities); Choi, Kruk, et al., 2006 (IRW, WWW);
Kruk, et al, 2006 (ASWC); Kruk and Decker, 2005 (Semantic Desktop Workshop, ISWC)
19. Evaluation of SSCF Model
Question for evaluation:
Is the social network better informed with SSCF?
Assumptions for evaluation model:
The quality of the information provided by a user on a certain
collection is proportional to the expertise level of the user on
the topic of the collection.
It is possible to find a user with a high expertise on the given
topic within the social network.
Evaluation setup:
a model of the social network - 1000 users
distribution of relationships: bell-curved (µ = 25, σ = 12.5) and
zipfian (θ = 1.9)
Measuring:
Average Maximal Expertise (R) - average value of the highest
expertise level found within given degree of separation (R)
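The measure above can be sketched in a few lines of Python. This is a simplified illustration, not the evaluation code behind the published results: it assumes that expertise levels are drawn uniformly at random, that contacts are wired at random, and it covers only the bell-curved case. All function names are hypothetical.

```python
import random
from collections import deque


def build_network(n_users, mean_degree, sd_degree, rng):
    """Random undirected network; each user aims for a bell-curved number of contacts."""
    contacts = [set() for _ in range(n_users)]
    for user in range(n_users):
        target = round(rng.gauss(mean_degree, sd_degree))
        target = max(1, min(n_users - 1, target))
        while len(contacts[user]) < target:
            other = rng.randrange(n_users)
            if other != user:
                contacts[user].add(other)
                contacts[other].add(user)
    return contacts


def max_expertise_within(contacts, expertise, start, radius):
    """Highest expertise among users at most `radius` steps from `start` (BFS)."""
    best = expertise[start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, distance = queue.popleft()
        if distance == radius:
            continue
        for other in contacts[user]:
            if other not in seen:
                seen.add(other)
                best = max(best, expertise[other])
                queue.append((other, distance + 1))
    return best


def average_maximal_expertise(contacts, expertise, radius):
    """Average, over all users, of the best expertise reachable within `radius`."""
    n_users = len(contacts)
    total = sum(max_expertise_within(contacts, expertise, u, radius) for u in range(n_users))
    return total / n_users


def simulate(n_users=1000, mean_degree=25, sd_degree=12.5, max_radius=6, seed=0):
    """Return the measure for R = 0 .. max_radius on one random network."""
    rng = random.Random(seed)
    contacts = build_network(n_users, mean_degree, sd_degree, rng)
    expertise = [rng.random() for _ in range(n_users)]
    return [average_maximal_expertise(contacts, expertise, r) for r in range(max_radius + 1)]
```

Calling `simulate()` yields one value per degree of separation; R = 0 corresponds to a user relying only on their own expertise.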
Published in: Grzonkowski, Gzella, Kruk, et al., 2009 (Journal of Web Based Communities); Choi, Kruk, et al., 2006 (IRW, WWW);
Kruk, et al, 2006 (ASWC); Kruk and Decker, 2005 (Semantic Desktop Workshop, ISWC)
20. Evaluation of SSCF Model
Q1: Can a user access information gathered by the
domain experts?
For the Zipfian distribution, the average maximal expertise for R = 6 is
91% - answer: very probable
For the bell-curved distribution, the average maximal expertise for R > 3
is above 96% - answer: even more probable
Q2: Is the average expertise level higher in the
social network?
For both types of distribution, the average expertise of a
single member (R = 0) is much lower than in the social network.
[Figure: average maximal expertise (0% to 100%) versus degree of separation R (0 to 6) for the Zipf (θ = 1.9) and Bell (σ = 12.5) distributions.]
Published in: Grzonkowski, Gzella, Kruk, et al., 2009 (Journal of Web Based Communities); Choi, Kruk, et al., 2006 (IRW, WWW);
Kruk, et al, 2006 (ASWC); Kruk and Decker, 2005 (Semantic Desktop Workshop, ISWC)
21. Shortcomings of faceted navigation
Shortcomings of faceted navigation
RDF is not a homogeneous information space
join operator is unintuitive to the end user (Oren, 2006)
no filtering by a given value alone
no union and difference operators
most solutions are monolithic (no MVC)
poor accessibility: information overload
Extended model
Extensions to inverted and existential operators
Browse and similarity operators
New combination operators: union, difference, binding
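[Figure: example browsing path starting from DERI and following affiliation, knows, and creator to an unknown result (?), composed of browse-(affiliation), browse-(knows), and browse-(creator).]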
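The operators of the extended model can be illustrated with a small Python sketch over a set of triples. This is a simplified illustration under assumed semantics, not the MultiBeeBrowse implementation; the function names, the toy data, and the reading of each operator are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, value)

# Toy data; every identifier here is made up for illustration.
TRIPLES: Set[Triple] = {
    ("paper1", "dc:creator", "kruk"),
    ("paper2", "dc:creator", "decker"),
    ("paper3", "dc:creator", "kruk"),
    ("paper3", "dc:subject", "digital libraries"),
    ("kruk", "affiliation", "DERI"),
    ("decker", "affiliation", "DERI"),
    ("kruk", "knows", "decker"),
}


def select(triples: Set[Triple], prop: str, value: str) -> Set[str]:
    """Basic selection: subjects whose `prop` equals `value`."""
    return {s for s, p, o in triples if p == prop and o == value}


def exists(triples: Set[Triple], prop: str) -> Set[str]:
    """Existential selection: subjects that have any value for `prop`."""
    return {s for s, p, _ in triples if p == prop}


def browse(triples: Set[Triple], subjects: Set[str], prop: str) -> Set[str]:
    """Follow `prop` forward from a set of subjects to their values."""
    return {o for s, p, o in triples if p == prop and s in subjects}


def browse_inverse(triples: Set[Triple], values: Set[str], prop: str) -> Set[str]:
    """Follow `prop` backward from a set of values to the subjects pointing at them."""
    return {s for s, p, o in triples if p == prop and o in values}


def union(left: Set[str], right: Set[str]) -> Set[str]:
    return left | right


def difference(left: Set[str], right: Set[str]) -> Set[str]:
    return left - right


# Who is affiliated with DERI, whom do they know, and what did those people create?
people = browse_inverse(TRIPLES, {"DERI"}, "affiliation")
known = browse(TRIPLES, people, "knows")
created = browse_inverse(TRIPLES, known, "dc:creator")
```

Because every operator takes and returns plain sets, union and difference combine the results of independent browsing paths without any special cases.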
Published in: Kruk et al., 2007 (ODBASE)
22. MultiBeeBrowse
Zoomable User Interface: basic, structured, browsing and
complete history view
Collaborative Browsing (using SSCF and RSS)
Adaptable Browsing Interface (incl. concepts suggestions,
facets labeling, results presentation)
Services for Accessible Faceted Navigation
Model of meta-operations:
[Figure, left: forest with faceted navigation decision trees, showing sum of two paths: filter (dc:creator, "Kruk") followed by browse (dc:creator), and search (name:"Decker") followed by similar- (dc:creator). Right: meta-operations decision tree.]
Published in: Kruk et al., 2007 (ODBASE)
24. Evaluation
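Comparing three solutions: MultiBeeBrowse, BrowseRDF, Longwell
[Figure: bar chart of user ratings (Friendly, Average, Hard to use) for the three solutions; vertical axis 0 to 20. Extracted bar values (assignment to bars not recoverable): 16, 12, 8, 7, 6, 4, 3, 2, 0.]
Features comparison:
Operator          MBB   BrowseRDF   Longwell   Other
search            +     ±           ±          -
selection         +     ±           ±          ±
exist. property   +     +           -          -
browse            +     -           -          -
combine           +     ±           ±          ±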
Published in: Kruk et al., 2007 (ODBASE)
26. JeromeDL
Semantic digital library project based on
cooperation of
Gdańsk University of Technology
DERI, National University of Ireland, Galway
Distributed under Open Source (BSD) license
10+ instances worldwide:
DERI, Ireland: Library, Books, EastWeb DL
GUT, Poland: WBSS, Kashebian, PMR Journal
INEGI, Mexico: internal digital library
dContentWare, Italy: core of the project
Bosco Inc., India: 1000+ resources
WKU, KY, USA: learning materials repository
Published in: Kruk, Decker and Zieborak, 2005 (DEXA); Kruk et al., 2007 (Semantic Web Challenge, ISWC), Kruk et al., 2008 (ECDL),
Kruk and McDaniel, 2008 (Springer)
27. Differentiators of JeromeDL
combining semantic bibliographic descriptions
and social media
advanced, personalized search solutions
social networking platform integrated with
user profiling component
extensible access control system based on
social network relations
collaborative filtering and browsing
dynamic collections
integration with other Web 2.0 services
Published in: Kruk, Decker and Zieborak, 2005 (DEXA); Kruk et al., 2007 (Semantic Web Challenge, ISWC), Kruk et al., 2008 (ECDL),
Kruk and McDaniel, 2008 (Springer)
28. 3-layered Architecture
[Figure: 3-layered architecture. Users: UI, agents, communities of users. System: Data Presentation Layer; Data Access and Manipulation Layer (Basic Services, Interoperability Services, Information Access Services, Advanced Mgmt. Services); Data Abstraction Layer. Content: Data Sources. Surrounding actors: service developers, DL designers, external services, DL administrators.]
Published in: Kruk, Decker and Zieborak, 2005 (DEXA); Kruk et al., 2007 (Semantic Web Challenge, ISWC), Kruk et al., 2008 (ECDL),
Kruk and McDaniel, 2008 (Springer)
29. 3-layered Architecture
[Figure: detailed view of the services and data layers. Labels (grouping partly reconstructed): Collaborative Filtering, Collaborative Browsing, Community Driven Taxonomies, Social Services (comments, Tagging, Blogging), Mediation Services, DMoz, Natural Language Query Template, Ontologized WordNet, Identity Management, Semantic Metadata Services, Filtering and Browsing, Distributed KOS Search, Digital Library Resources, Classic Services, Security & Access Control, Full-text Index & Search.]
Published in: Kruk, Decker and Zieborak, 2005 (DEXA); Kruk et al., 2007 (Semantic Web Challenge, ISWC), Kruk et al., 2008 (ECDL),
Kruk and McDaniel, 2008 (Springer)
30. Search and Browsing
TagsTreeMaps - filtering with hierarchical tags
MultiBeeBrowse - social browsing
Dynamic collections - defined based on triple
filtering and SPARQL queries
Recommendations of related resources based on
semantic resource description
Query templates in natural language
Semantic Query Expansion based on user’s context
and semantic annotations
Social Semantic Collaborative Filtering
flexible API for integration of external services,
e.g., Exhibit (SIMILE, MIT)
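The idea of a dynamic collection, whose members are defined by a filter rather than listed by hand, can be sketched as follows. The slide names triple filtering and SPARQL queries; this sketch swaps in plain Python pattern matching over an in-memory set of triples, and the class, its methods, and the sample data are hypothetical rather than JeromeDL's API.

```python
from typing import List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, value)
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None is a wildcard


class DynamicCollection:
    """A collection defined by triple patterns rather than a fixed member list."""

    def __init__(self, store: Set[Triple], patterns: List[Pattern]):
        self.store = store        # shared, mutable triple store
        self.patterns = patterns  # a resource must match every pattern

    def _matches(self, pattern: Pattern) -> Set[str]:
        want_s, want_p, want_o = pattern
        return {
            s for s, p, o in self.store
            if want_s in (None, s) and want_p in (None, p) and want_o in (None, o)
        }

    def members(self) -> Set[str]:
        """Re-evaluated on every call, so new resources show up automatically."""
        if not self.patterns:
            return set()
        result = self._matches(self.patterns[0])
        for pattern in self.patterns[1:]:
            result &= self._matches(pattern)
        return result


# Toy store; identifiers are made up for illustration.
store: Set[Triple] = {
    ("book1", "dc:creator", "kruk"),
    ("book1", "dc:subject", "semantic web"),
    ("book2", "dc:creator", "kruk"),
}
collection = DynamicCollection(
    store,
    [(None, "dc:creator", "kruk"), (None, "dc:subject", "semantic web")],
)
```

Adding a matching triple to `store` changes what `collection.members()` returns on the next call, which is the property that distinguishes a dynamic collection from a static one.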
Published in: Kruk, Decker and Zieborak, 2005 (DEXA); Kruk et al., 2007 (Semantic Web Challenge, ISWC), Kruk et al., 2008 (ECDL),
Kruk and McDaniel, 2008 (Springer)
32. Evaluation Procedure
Evaluating usability (system, user)
Two digital libraries in their basic (vanilla) setup
JeromeDL - semantic digital library
DSpace - classic digital library (control group)
Database:
noise: 529 articles from DERI JeromeDL instances
reference set: 35 articles on Internet psychology
Participants: 59 commenced evaluation, 26 completed
[Figure: evaluation timeline. Initial Tasks (registration, getting to know the library), then Question-Answering Tasks (Task 1, Task 2, Task 3), then, after a long time, the Memory Task (one of the QA tasks, no library access). Questionnaires: Initial, QA, Memory, Final.]
Published in: Kruk et al., 2008 (ECDL), Kruk and McDaniel, 2008 (Springer)
33. Questions for Evaluation (1)
Do semantic and social services improve the quality of answers?
slightly better results for the JeromeDL group, improving significantly over
time (statistical significance close to the acceptance threshold)
Do semantic and social services increase the quality of references
provided by the participants?
slightly better results for the JeromeDL group, improving significantly over
time (could not confirm statistical significance)
Do semantic and social services increase the satisfaction from
using a digital library? (results statistically significant)
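[Figure: bar chart comparing JeromeDL and DSpace for task 1, task 2, task 3, and the average; vertical axis 0 to 23.00. Extracted bar values (assignment to bars not recoverable): 21.99, 22.69, 19.84, 13.41, 14.86, 9.39, 8.23, 1.88.]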
Published in: Kruk et al., 2008 (ECDL), Kruk and McDaniel, 2008 (Springer)
36. Questions for Evaluation (2)
Which services are found to be most useful?
recommendations and social filtering (results statistically significant)
Do semantic and social services increase information
retention? (results statistically significant)
Quality of answers: JeromeDL - 2.78, DSpace - 2.44
Accuracy of references: JeromeDL - 6, DSpace - 1
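Satisfaction:
[Figure: bar chart comparing JeromeDL and DSpace on understanding, ease of execution, and intuitiveness. Extracted bar values (assignment to bars not recoverable): 29.11, 21.11, 10.89, 2.00, -17.22, -1.00.]
[Figure: answers to "Would you like to continue using this library?" for JeromeDL and DSpace. Extracted values: 84.62% and 46.15%.]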
Published in: Kruk et al., 2008 (ECDL), Kruk and McDaniel, 2008 (Springer)
38. I have presented
Architecture and ontologies for Semantic
Digital Libraries
Examples of search and browsing services:
Social Semantic Collaborative Filtering
MultiBeeBrowse
JeromeDL - the prototype
Evaluation of semantic and social services
43. What about the hypothesis?
Semantic and social technologies in digital
libraries improve information discovery
compared to classic approaches:
Users find information more easily ✓
Precision in searching is improved ✓
Users’ satisfaction is increased ✓
Users retain more information ✓
44. The Impact
1 Book: Kruk, McDaniel: Semantic Digital Libraries (Springer, 2008) [300+ copies sold]
30+ Papers (excluding 9 chapters in the book):
JeromeDL: IIS 2004, DEXA 2005, ECDL Demo Session 2005 Workshop, InfoBazy 2005,
ICIW 2006 (best paper), Semantic Web Challenge at ISWC 2007, SemTech 2007, MCAST
Workshop 2007, Dev. Track WWW 2008, ECDL 2008, InfoBazy 2008, The Electronic
Library Journal
FOAFRealm - FOAF Workshop 2004, TEHOSS 2005, MoSO @ MDM 2006, IRW2006 @
WWW2006, ASWC 2006, WBC 2007, International Journal of WBC, Semantic Web
Challenge at ISWC 2007, Media in Transition 2007
MarcOnt - DublinCore 2005, ECDL Poster Session 2005, International Artificial
Intelligence Research Society Conference 2007
MultiBeeBrowse - ODBASE 2007, Conference on Teaching and Learning 2007, CHI 2008
Social Semantic Collaborative Filtering - Semantic Desktop at ISWC, 2005
NLQ - IADIS International Conference WWW/Internet 2006
Didaskon/IKHarvester - EC-TEL 2007, LACLO 2006, IEEE ICSC 2006
HyperCuP - ESWC Demo Session 2006
5 Tutorials: JCDL2006, ESWC 2007, WWW 2007, JCDL 2008, ICSD 2009 (upcoming)
3 Invited talks: EPFL, UCD, Polish Information Processing Society
3 workshops: Irish DL Summit, Web Archiving, Special Session at NKOS 2006
10+ open source projects - corrib.org, opensource.knowledgehives.com
17 MSc Theses supervised at GUT
Startup company (Knowledge Hives) continuing R&D efforts initiated in SemDL domain
45. Semantic and Social Services
Improve Usability of Information Discovery
in
Semantic Digital Libraries
Sebastian Ryszard Kruk
http://www.sebastiankruk.com/
Editor's Notes
(page 6)
SemDL is a superposition of 3 technologies
Our hypothesis is that by introducing semantic and social technologies to digital libraries,
we are able to improve information discovery in digital libraries compared to classic approaches:
• Users are able to find information more easily.
• Precision in searching is improved.
• Users’ overall satisfaction with using the digital library to accomplish appointed tasks is
increased.
• Users are able to retain more information when using a semantic digital library.
(page 8)
Alexandria: UI+Agents, Middleware, Catalog+Resource+Data Engine, Librarian, Outside World
DELOS: (Designers, SysAdmins, AppDevels) DLMS -> DLS -> DL (End-Users)
Triptych: System (performance) Content (usefulness) User (usability -> System)
System: semantic & social services
Content: complex resources, dynamic objects, community annotations, semantic annotations
User: community, external services
Data Abstraction Layer: Access, Index, Registry, Preservation, Transactions, Replications, Reasoning & Inferencing
(page 62)
* Open Digital Rights Language (ODRL) [Iannella, 2002] - XML schemata for the expression language and data dictionary - Asset, Permission, Constraint, Requirements, Condition, Rights holder
* eXtensible Access Control Markup Language (XACML) [Moses, 2005] - Rule, Policy, Target, Conditions
(page 78)
This is not an RDF graph - just an overview of concepts
(page 92)
Disadvantages of Typical Collaborative Filtering
Although collaborative filtering techniques solve some problems of information seeking,
specific collaborative filtering implementations suffer from various shortcomings:
• A heterophilous diffusion (exchange of information across different socio-economic groups)
is neglected in favor of a homophilous diffusion (exchange of information within
socio-economic groups) (Canny, 2002).
• The security and privacy issues are weakly supported; the reputation and trust among
users is usually not developed (Procter and McKinlay, 1997).
• When the social network is created automatically by harvesting various databases with
advanced algorithms:
– A critical mass of registered users is required to provide a satisfactory level of
correlation to the user’s interests (Guo, 1998).
– Monopolies are supported (Polat and Du, 2003) because a service provider has to
gather a lot of information to become accurate (critical mass).
– It is impossible to create a digraph of social connections from most of the commonly
used sources; also, the privacy of individuals is often violated (Canny, 2002).
• When the user actively uses fora or mailing lists:
– There is no guarantee that there will be an answer to the posted question or that the
answer will be thorough.
– There might be no expert on the specific field of discourse in the direct social
neighborhood of the user.
• Some systems require that users answer long questionnaires (Shardanand and Maes,
1995; Procter and McKinlay, 1997) in order to find similarities in users’ interests.
(page 133)
* following Kautz et al, 1997a
* Small World Phenomenon (Barabasi, 2002) - Zipfian
* Bell curved (Groot, 2005) - special types of networks, e.g., academics
(page 143)
* Flamenco (Yee, 2003)
* Oren missing: join, filtering on value, union, difference, taxonomy of values, poor accessibility
(page 96)
* left - actual interactions in MBB: sum (filter-browse, search-similar);
* right - decision history tree
(page 117)
- we’ve got phenomenal participation from users from all over the world and fantastic feedback
- users create their own customizations and at the same time influence the main line of development
(page 150)
- here are the most important features of the system- combining semantics, EAC, SN, collaborative
(page 150)
3 layers - detailed scope on services and data layers
distinctive layers