1
Mending the Gap between Library’s Electronic
and Print Collections in ILS and Library’s Web
Site using Semantic Web - Progress Report
2007 EndUser
Annual Ex Libris User Group Meeting
April 28, 2006
Chicago, IL
Amanda Xu, Electronic Resources Cataloging Librarian,
and Andrew Sankowski, Director of Collection and
Information Management
St. John’s University Library
Jamaica, New York
2
Content
Technology
Distribution
Q: Where is a library’s value proposition as illustrated in the triangle
below? Is the library an aggregator of aggregators? If so, are we ready for it?
If not, what is our next social identity? How can we transfer our skills
from ‘info-land’ or ‘meta-land’ to ‘digital land’ or ‘semantic land’, or simply to a
hybrid ‘land of bits’ where print and electronic coexist?
Where is users’
behavior
context?
Aggregated contents
through technologies?
Capable of selecting, organizing,
accessing, guiding, enhancing, and distributing
content to the user through
technology. Still, we hear complaints like ‘why
what you buy is not what I need, and
what I need is not what you buy?’
3
Introduction
• At this Thursday’s keynote presentation, Oren Beit-Arie –
Chief Strategy Officer of Ex Libris defined the role of library as
‘connecting users to content’, and providing ‘unique services,
to tailor the needs of their users and integrate service into
users’ tasks and workflow.’
• This is exactly what we feel the game is about as well:
modeling users’ information-seeking behavior in the context of
their experience, and using that information to improve our
collections and services.
• Last year, we proposed a sample conceptual model to:
– Identify the information needs of our faculty by following the
footsteps of their teaching and research experience;
– Use aggregated information to measure how well our
collections meet faculty’s teaching, learning, and
research requirements by following the footsteps of collection
managers;
– The subject area that we chose was math and computer
science
4
5
Major Challenges and Opportunities
Challenges
1. Many of the data sources still
reside in isolated data islands in
closed systems or flat file systems
2. Integration requires enterprise
level
3. Lack of resources, time, money,
and understanding of the required
systems infrastructure – hardware,
software, database, network
messaging, data structure, etc. –
throughout the development
lifecycle of the application;
4. Lack of resources and required
skills to build the repository server,
handle the ETL process and
messaging among systems across
the enterprise;
5. Hard to get people’s buy-in
• Priority conflict
• No access to data sources
• Turf protection
• No agreement on business
model – SAAS
• No governance
Opportunities
1. Model the resource discovery
process across databases
2. Identify gaps in existing
systems infrastructure as far
as content selection is
concerned;
3. Re-examine collection
development and information
management process
4. Develop survey forms,
interview people in charge of
functional areas
5. Plan to do collection analysis
via inventory, gaps, usage,
cost
6. Tie the analysis to
researchers’ needs
7. Identify new data sources to
be managed by libraries from
institutional repository to ILS
operation;
8. Identify priority list - Mend the
gap between print and
electronic collections in ILS
and Library Site
6
Steps Taken
1. Environmental Scan of Web Content Technologies by IT Vendors, by
Library IT Vendors, and by Academic Libraries;
2. Current Web and Print Resources Integration Effort at St. John’s
University Library - Voyager, Gary Strawn’s Location Changer, Serials
Solutions, OCLC WorldCat, Google Scholar, NetLibrary, NYLINK,
WALDO, Content Aggregators, Institutional Portals, Courseware,
Faculty Pub, Student & Alumni Repositories;
3. Went through the whole iterative process of application development, from
back end to front end, for the project:
Gather requirements
Obtain buy-in from IT vendors and IT consulting services, and make
recommendations
Obtain support from upper- and middle-level managers
Get training at NYU SCPS, and sharpen the skill set needed for
communication with technical and non-technical people
Propose a technical infrastructure, from SOA framework to database
systems, data structure, etc.
Recommend a little touch of Semantic Web technologies for
Pervasive Library Resource Management on the Websites
7
Credit and Exceptions
Credit for this talk goes to many giants in the IT and library
IT industries, especially the former Endeavor and later Ex Libris
Information Systems, the EndUser 2007 Program Planning Committee,
St. John’s University Library, and NYU SCPS faculty.
What Will Not Be Discussed in This Talk:
Detailed files, data repositories, networks (physical and logistics),
distributed application programming and computing services,
security, jobs and events required for content aggregation and
deployment on the library’s websites;
Mathematical models for auto-text processing, patterns, and
business rules generated and deployed by any given Semantic
Web application;
Stacks of standards concerning key content technologies
8
Review Web Content Technologies
By IT Vendors(1)
Key Content Technology Vendors Investigated:
1. Project Management, Enterprise Architecture, Modeling
– IBM Rational, iRise, Telelogic, Agilense, CA Erwin,
MS Visio, MS Project Server, Embarcadero, Enterprise
Elements
2. Content Capturing Vendors – Captiva Software, Adobe,
FUJI, ZyLab, ABBYY FlexiCapture, Liquid Office;
3. ETL and Master Data Management: Sunopsis, DataFlux,
Kalido
4. Content Management Systems – Oracle Stellent, EMC
Documentum, ArborText, POET, XEnterprise,
MarkLogic;
5. Search Engine Services - Verity K2, Autonomy,
Teragram, FAST, Inxight Software, Endeca, iProspect,
IBM OmniFind, Siderean, Semantic Works, Google, Ask;
9
Review Web Content Technologies
By IT Vendors(2)
6. Portal – Vignette, Hummingbird, MS SharePoint and
IBM WebSphere;
7. BPM/SOA – Global 360, IBM FileNet, Lombardi,
Pegasystems, Hyperion, Pervasive, SUN/SeeBeyond,
BEA Systems, TIBCO, Sonic ESB;
8. CRM (Customer Relationship Management) Vendors –
Answers, Oracle Siebel;
9. Business Intelligence and Reporting – Business Objects,
Crystal Reports, Oracle, Informatica, Accenture, SAP, SAS,
SPSS;
10. Service Resolution Management – Knova and Kana;
11. Semantic Technologies: IBM, Oracle, Jena2, GATE,
Siderean;
12. Best Practice Sources: Delphi, KM, TDWI, AIIM, DCI,
OMG, SOA/BPM Institute, Insight Shared Network, SD
Times, Project 10X, NCOR, Getty, DocuLab, Ken Orr
Institute, Zachman Institute, Essential Strategies,
TopQuadrant, Semantic Arts, Forrester, etc.
10
Check Functional Components and Enterprise Level
Content Sources to Be Leveraged in the Framework of
Service Oriented Architecture (SOA) (1)
1. Project Management, Enterprise Architecture &
Modeling
2. Imaging and Document Capture
3. Web Content Management from 2.0 to 3.0, including
content created from portal, desktop application,
browser, e-form, and other web-based collaboration
environment, such as Wiki, Flickr, instant messaging,
Yahoo 360 & Food Site, Oracle OTN Site, MySpace,
blog, RSS, social tagging, recommend, etc.
4. Document Management
5. Record and Retention Management
6. Digital Asset Management
7. ECM (Electronic Content Management) - Taxonomy,
Thesauri, Topic Map, Meta-data
11
Check Functional Components and Enterprise Level Content
Sources to Be Leveraged in the Framework of Service
Oriented Architecture (SOA) (2)
8. Enterprise Search, Directory, Digital Signature, Auto-
Classification, Clustering, Categorization, Security, Risk
Management
9. Compliance to License, Auditing, Federal and Legal
Regulations
10. Information Reusability, Lifecycle and Retention Policy
11. Data Warehouse, Business Intelligence, Performance
Management and Monitoring
12. Business Process Management (BPM)
13. Semantic Web Technologies
14. Email Management
15. Portal
12
Preliminary Proposal for Required Data Architecture in SOA Framework (1)
[Diagram. Labels, in extraction order: Source Data; ODS Systems; Data Staging Areas; Data Warehouse; Data Marts (By Service Dept, By Media Type, By Profile Data, By Discipline, By Grain, By Contract); Analytical Data Mart; Ad Hoc Query; Modeling & Mining Tools; Visualization Tools; Rule Engine; ETL/BI/Ontology; Meta-data; Unstructured Data; Meta-data; Structured Data; Vocabulary/Lexicon/Concept; Presentation; Enterprise portals; Collaborate; Discover; Select; Annotate; Enhance; Search; Navigate; Syndicate; Interchange; Ontology-assisted Transformation; Reference Master Data; RDF/OWL Data Ontology; Profile; Protocol; Model; Spec; Standard; Schema; Contract; Constraint; Media type; Linkage]
13
Preliminary Proposal for Required Data
Architecture in SOA Framework (2)
[Diagram. Labels, in extraction order: Discovery agencies; Service Providers; Service Consumers; Find; Publish; Interact; Capable; Need; Satisfaction; Requirement; has; Service Description; Protocols; Standards; Specs; Policies; Limits; Governance; Contacts; Reuse; Interoperable; Visibility; Execution context; Effect; Strategies; Patterns; Models; Profiles; Domains; Refer; Hold; Contract & policies; Service; Distribution; Content; Technology; specify; Info, Process; Action; Behavior Model; has; feedback]
14
Compare What Is Offered in Content Technologies
by Library IT vendors (1)
Check Functional Components and Library-wide Content
Sources Supported By Library IT Industries
Integrated Library Systems (ILS)– Print and Electronic
Resources
Electronic Resources Management Systems (ERMS) for
Subscribed Titles in Electronic Databases – Full text and
A&I
Full-text A-Z List by Directory and Subject on Library Web
Federated Search, Google-like, etc. search on Library Web
Link Resolver and Knowledgebase containing logical links
and holdings for print and electronic materials
Digitized Documents and Images
Library Web Contents 1.0, 2.0, & 3.0, including streaming
videos, library Wiki, eForms, instant messaging, RSS,
eTutorials, eNews, eAlerts, etc.
15
Compare What Is Offered in Content Technologies
by Library IT vendors (2)
Interlibrary Loan Services (ILL)
eReserve
eReferences – Ask Librarian
Auto-citation integration – RefWorks, EndNote, etc.
Record Management for Institution and Archival Contents, e.g. EAD
and TEI
Library portals as library content and service distribution toolkit, e.g.
WorldCat, Google Scholar, etc.
Integrated support in context of service request:
uPortal
Learning Management Systems (LMS) and courseware, e.g. WebCT
uSearch, uMeta-data, uTaxonomy,
uEmail Management
uRepository, e.g. DSpace, Fedora, Sakai;
Domain specific repositories, e.g. PMC
Community-based repository, e.g. EI Village, Community of Science,
MySpace
Statistics for inventory, budget, cost, user behavior, usage, etc.
16
Current Status Check (1)
1. Current Web and Print Resources Integration Effort at
St. John’s University Library –
Ex Libris - Voyager as ASP Solutions for ILS
Serials Solutions – SaaS Solutions for E-J Management
Maintain a single version of the truth of E-J holdings and
subscriptions via SS Knowledgebase and Clients;
Output – E-J A-Z list at journal level:
on the library website in HTML format
in EZ Proxy server as monthly updates in XML format
in Voyager as MARC title list in MARC format
in article linking to OCLC WorldCat, Google Scholar,
NetLibrary, and content providers at article level
in central search at package level if connectors with content
providers are readily available (in progress)
Use Gary Strawn’s Location Changer and MarcEdit for monthly
updates of the MARC title list in Voyager and for data consistency
checking among the above lists of services
Separate workflow process and platform for E-Content
Packages listed as A-Z list by database name, and by subject ;
yet same content packages provided by the same vendor;
17
Current Status Check (2)
3. Still looking for:
Electronic Resources Management Systems (ERMS), e.g.
Serials Solutions, TDNet, Meridian, Verde;
Digital Resources Management Systems (DRMS)–
ContentDM, Greenstone, Encompass, etc.;
Institutional Repository Archives, e.g. DSpace, Sakai, Fedora,
etc.;
Library Portals to uPortal Courseware, e.g. Blackboard,
WebCT;
4. Implemented SaaS solutions for citation management –
RefWorks; eReserve – Docutek
5. Campus IT handles Institutional Portals, Courseware,
Faculty Pub, Student & Alumni Repositories in
collaboration with the Library;
18
Obtain Journal Title Holdings from OPAC and
Journal A-Z List to Content Providers
19
Obtain the Journal Issue
20
Obtain – Journal Article
21
Obtain by Subject – Two Terms in one Search
22
PubMed: Ear Wax Removal (1)
23
PubMed: Ear Wax removal(2)
24
EBSCO: Cerumen
25
Scopus: Earwax
26
Gale Group: Removal of Cerumen
27
Gale Group: Removal of Cerumen
28
Gale Group: Removal of Cerumen
29
Obtain by Subject - LCSH Search: Ear Wax
30
LCSH Authority File: Earwax
31
Obtain by Subject Default to Broader Term - LC Catalog: Earwax
32
Obtain by Subject Default to Broader Term - LC Catalog: Earwax
33
Obtain by Subject: Wikipedia: Earwax
34
Obtain by Subject: Wikipedia: Earwax
35
Obtain By Subject – Two Terms in Two Searches in Ask
36
Obtain By Subject – Two Terms in Two Searches in Ask
37
Obtain by Subject: Two Terms – Two Searches in Yahoo
38
Obtain By Subject – Two Terms in Two Searches in Yahoo
39
Obtain by Subject: Two Terms – Two Searches in Google
40
Obtain By Subject – Two Terms in Two Searches in Google
41
Computer Simulated Model to Draw This Chart – How Many Data Stores
Do We Need, and How Many Interfaces Do We Need to Create for the End
User?
Napoleon’s March to Moscow – The War of 1812
Edward Tufte – Poster from Envisioning Information
42
Current State of Web Content Packaging using Integrated Library Systems,
Electronic and Digital Resource Management Systems in Comparison with
What Is Offered by Aggregators, Google, Ask_Jeeves, etc. (2)
At the presentation layer, systems that support OpenURL allow users to traverse from
database to journal, and from journal to article, independent of the location of the
services. We have a better chance of ensuring ‘find, access, and obtain’, while search
engines may run into copyright and license barriers;
At the document processing level, PubMed, LC, EBSCO, Scopus, and Gale Group use
authority control for subject access, while Google, Ask_Jeeves, and Yahoo do not. We
can anticipate the user’s query by adding authority control for named entities and
controlled vocabulary for subject access, e.g. two search terms only need to be
entered once;
At the query level, there is a lot of room for us to improve – query expansion, e.g.
teaser, refinement, and optimization, etc.;
At the end user level, we still do not know our users as individuals;
At the process management and performance measurement levels, we are still at
ground zero.
At the content data model level - Data Store vs. Interface
How Many Interfaces Do We Need? – ERMS - eJournals, Federated Search -
Articles, Library Web Site - Databases, DCMS - Images?
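The traversal from journal to article described above depends on links that carry citation metadata rather than a fixed address. A minimal sketch of how such a link might be assembled in the OpenURL 1.0 (Z39.88-2004) key/encoded-value form follows. The resolver address, the function name build_openurl, and the sample article details are assumptions for illustration only; the ISSN is the one used in the RDF example later in this talk.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; a real library would use its own link resolver.
RESOLVER_BASE = "https://resolver.example.edu/openurl"


def build_openurl(issn, volume=None, issue=None, spage=None, year=None, atitle=None):
    """Build an OpenURL 1.0 key/encoded-value link for a journal or an article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        # Article-level link when an article title is given, journal-level otherwise.
        "rft.genre": "article" if atitle else "journal",
        "rft.issn": issn,
    }
    optional = {
        "rft.volume": volume,
        "rft.issue": issue,
        "rft.spage": spage,
        "rft.date": year,
        "rft.atitle": atitle,
    }
    # Only send the citation elements that are actually known.
    params.update({key: str(value) for key, value in optional.items() if value is not None})
    return RESOLVER_BASE + "?" + urlencode(params)


# Journal-level link for "Algebra and logic".
journal_link = build_openurl("0002-5232")

# Article-level link; volume, issue, page, year, and title are made-up placeholders.
article_link = build_openurl("0002-5232", volume=45, issue=2, spage=101,
                             year=2006, atitle="An example article")
```

Because the link describes the item instead of pointing at one provider, the resolver can decide at request time which licensed copy to serve.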
43
Desired Features for Managing Library Print and
Electronic Content on Library Website (1)
Need Another ECMS, a Wrapper, or a Data
Warehouse?
Function - merge data
The essential elements of a journal record in Serials
Solutions and in the Library Catalog have different
requirements. Yet they all need core elements for
identification, discovery, and disambiguation.
How many times do we have to create them, or export
and import them, into these repositories?
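One way to picture the "merge data" function is a small routine that stores the core identifying elements once and keeps system-specific elements beside them. The sketch below is only an illustration under stated assumptions: the field names, the two source labels ("catalog" and "erm"), and the function names are hypothetical, and the sample values are taken from the RDF examples later in this talk.

```python
def normalize_issn(issn):
    """Reduce an ISSN to 8 characters so '0002-5232' and '00025232' match."""
    return issn.replace("-", "").strip().upper()


def merge_by_issn(catalog_records, erm_records):
    """Merge per-source journal records into one record per ISSN.

    Core elements (ISSN, title) are stored once; everything else is kept
    under the name of the system it came from.
    """
    merged = {}
    for source, records in (("catalog", catalog_records), ("erm", erm_records)):
        for rec in records:
            key = normalize_issn(rec["issn"])
            entry = merged.setdefault(
                key, {"issn": key, "title": rec["title"], "sources": {}}
            )
            extras = {k: v for k, v in rec.items() if k not in ("issn", "title")}
            entry["sources"][source] = extras
    return merged


# Hypothetical sample records for the same journal in two systems.
catalog = [{"issn": "0002-5232", "title": "Algebra and logic",
            "holdings_note": "No print holdings available for the title"}]
erm = [{"issn": "00025232", "title": "Algebra and logic",
        "coverage": "from 05/01/2003 to 1 year ago"}]

merged = merge_by_issn(catalog, erm)
```

With a shared key, the core elements are created once and each repository contributes only what is specific to it, which is the point of the question on this slide.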
44
Desired Features for Managing Library Print and Electronic
Content on library website (2)
Library Content Packaging Process:
Data extraction, transformation, and
load (ETL) is still a largely manual
process, e.g. loading a MARC data file
into ACQ, ACQ into Meridian,
LinkFinderPlus into Federated
Search;
If we want to maintain one version
of the truth of our data for ILS, ERMS,
DCM, Federated Search, and
DSpace, shouldn’t the data be extracted,
loaded, transformed (ELT), and
designed so that it is
modular, reusable, and portable
everywhere?
Consistent tagging standard for Web
content at taxonomy level among ILS,
ERMS, DCMS, Federated Search, and
DSpace:
Taxonomy for DCMS – container
specific, or across ILS, ERMS,
Federated Search, and uPortal?
Type of Content Unwrapped:
Form processing: how do we
capture form data on our web, or
in print, Excel, or PDF format?
Digital Asset;
Web content from Web 2.0
Content Redundancy among ILS,
ERMS, DCMS, Federated Search,
and institutional repository:
If all we can get from ERMS is 1)
license compliance, and 2)
analytic reports from data
warehouse, shouldn’t we add the
license info to Voyager ACQ, and
build a data warehouse on top of
all repositories – ILS, DCM,
Federated Search, library
website, WorldCat, uPortal,
DSpace, etc.?
45
Review of Desired Features for Library Electronic Content
Management Systems (ECMS) (3)
Content Data Model
Support mission critical reports,
e.g. 360 degree view of workflow
process for journals?
Collection level record for
hierarchical invoice
processing of a subscription
package with hundreds of
titles in one bundle;
Price history for periodicals
and package should be
allowed to exist in ACQ and
enable price comparison at
journal title level;
Support sufficient business rules for
content validation, e.g. validation rule
against duplicated invoice, etc.
Consistent Content Retention Policy:
ILS – MFHD has a retention policy,
but it is not enforced; an item gets
withdrawn at item level;
What about content in ERMS, DCM,
and the library Website, and how should
out-of-date, inaccurate data be
systematically removed?
Can a content retention policy be
enforced so that record removal or
changes of location can be
set up systematically?
Content display model:
Faceted browsing and search
support;
Auto fix of broken URLs and
Web content change;
Horizontal content display
model, e.g. ledger info of various
fiscal years
Meet compliance requirement
46
Review of Desired Features for Library Electronic Content
Management Systems (ECMS) (4)
– Search, navigation, retrieval, and display by
description, classification, subjects from library
catalog to library website, from library website to
content providers, from journal to issue, from issue
to articles;
– What questions does it answer? - Vertical and
horizontal (views + processes + usage + ROI) from
the perspective of end-users, librarians and staff,
process owners, administrators, and partners
(contents, technologies, and services)
USE
47
Semantic Web Definitions
1. “A common framework that allows data to be shared and reused across
application, enterprise, and community boundaries.”
– Available: http://www.w3.org/2001/sw/
2. “An attempt to make Web resources more readily accessible to the
automated processes by adding information about the resources that
describe or provide Web content.”
– Available: http://www.w3.org/2004/OWL
3. “Binary relationships capture the meaning of the link” – Tim Berners-Lee,
Japan Prize 2002.
– Available: http://www.w3.org/2002/Talks/04-sweb/
4. SW is an “extension of the current web, providing an infrastructure for the
interchange and the integration of data on the Web.”
– Available: http://www.w3c.org/Consortium/Offices/Presentations/RDFTutorial/
48
Tim Berners-Lee, “W3C World Wide Web Consortium, Academic
Discussion, Japan Prize 2002.”
Available: http://www.w3.org/2002/Talks/04-sweb/slide12-0.html
49
Semantic Technologies and Standards
Semantic Web Road Map by Tim
Berners-Lee, Sept. 1998. Available:
http://www.w3.org/DesignIssues/Semantic.html
1. “A web of data, in some way like a
global database”
2. “Machine understandable information”
3. “Basic assertion model”
-meta-data: property of a resource
in RDF Syntax
4. “Semantic layer”
– RDF schema, FOAF, SKOS
– OWL Lite, OWL DL, OWL Full
5. “Conversion of language” –
‘semantically link two independent
databases, and allow the query of each
other via conversion of the query’
6. “Logic layer” – “deduction of one type of
document from a document of another
type, checking of a document against a set
of rules of self consistency, resolution of a
query by conversion from terms unknown
into the terms known”
• SWRL: a semantic Web Rule Language
combining OWL and RuleML
7. “Proof validation” – a language for proof
8. “Evolution rules language”
9. “Query language” – SPARQL query
language for RDF
10. “Digital signature” – “public key
cryptography”, or “adding logic of trust
as icing on the cake of a reasoning
system”
11. “Index terms” – RDF search engines
12. “Engine of the future” – combine a
reasoning engine with a search engine
50
Promises of the Semantic Web (1)
URIs make it possible for everything, including partial
information, to be identifiable;
If it is based on a knowledge representation framework,
SW will allow global consistency of data;
Allows aggregation of information;
Supports inference of information;
Extensible to multimedia data;
Digital/Electronic library collections, institution and
community collections are Web enabled;
Combines applications remotely for local knowledge
integration – calendar, address book, airline preferences;
Encapsulates all data stores and processes behind the
scenes, and addresses users’ concerns in graphic, chart, and
similar views
51
Promises of SW Layered Cake: Standards
A simple RDF Example in RDF/XML (2)
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:ss="http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution/elements/1.0/#">
  <rdf:Description
    rdf:about="http://YZ3RJ4VL2Y.search.serialssolutions.com/?V=1.0&amp;L=YZ3RJ4VL2Y&amp;S=JCs&amp;C=ALGEANDLOG&amp;T=marc">
    <ss:JournalTitle>Algebra and logic</ss:JournalTitle>
    <ss:JournalISSN>0002-5232</ss:JournalISSN>
    <ss:JournalCoverageDates>from 05/01/2003 to 1 year ago</ss:JournalCoverageDates>
    <ss:Category>Algebra</ss:Category>
    <ss:eJournalHome rdf:resource="http://yz3rj4vl2y.search.serialssolutions.com"/>
    <ss:contains rdf:parseType="Literal"><h1>St. John’s Univ. Libraries e-full text Journals</h1></ss:contains>
  </rdf:Description>
</rdf:RDF>
52
Promises of SW Layered Cake: Standards
A simple RDF Example in RDF/XML (3)
• A resource is anything that can have a URI, e.g.
'http://YZ3RJ4VL2Y.search.serialssolutions.com/?V=1.0&L=YZ3RJ4VL2Y&S=JCs&C=ALGEANDLOG&T=marc'.
Potentially every element of an RDF/XML file can be addressed by a
URI, which makes the data addressable across distributed systems.
• A property is a resource that has a name and can be used as a property, e.g.
<ss:JournalTitle>.
• A statement consists of a resource, a property, and a value. The three parts are
known as subject (s), predicate (p), and object (o), and together form an
RDF triple (s, p, o).
• An RDF graph defines methods to retrieve triples, e.g. the property and object pairs for a
specific subject, which is a resource.
• Core RDF vocabulary: rdf:ID – defines a fragment identifier within the RDF
portion, used in conjunction with xml:base; rdf:value; rdf:subject; rdf:object;
rdf:rest; rdf:first; rdf:nodeID (internal identifier for a resource).
• Blank nodes with identical nodeIDs in different graphs are different.
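The triple model above can be sketched with plain Python tuples. This is not an RDF library, only an illustration of the (s, p, o) idea: the names Triple, match, and properties_of are assumptions, and the three statements are taken from the journal record in the RDF/XML example.

```python
from collections import namedtuple

# One RDF statement: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

SS = "http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution/elements/1.0/#"
JOURNAL = ("http://YZ3RJ4VL2Y.search.serialssolutions.com/"
           "?V=1.0&L=YZ3RJ4VL2Y&S=JCs&C=ALGEANDLOG&T=marc")

# A graph is simply a set of triples.
graph = {
    Triple(JOURNAL, SS + "JournalTitle", "Algebra and logic"),
    Triple(JOURNAL, SS + "JournalISSN", "0002-5232"),
    Triple(JOURNAL, SS + "Category", "Algebra"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    )


def properties_of(graph, subject):
    """Return the property/object pairs for one subject as a dictionary."""
    return {t.predicate: t.object for t in match(graph, subject=subject)}


journal_properties = properties_of(graph, JOURNAL)
```

Asking for all triples with a given subject is the "property and object pair for a specific subject" retrieval mentioned in the list above.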
53
Promises of SW Layered Cake: Standards (4)
A simple RDF Container Example in RDF Graph
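[Graph: #eJournalHome is linked by consistsOf to an rdf:List. The list is a chain of nodes, each with an rdf:first arc to one member (#JournalTitle, #JournalIssn, #JournalCoverageDates, #Category) and an rdf:rest arc to the next node; the last rdf:rest arc points to rdf:nil. Three rdf:type arcs are also drawn.]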
54
Promises of SW Layered Cake: Standards (5)
A simple RDF Container Example in RDF/XML
RDF class: rdf:List
<rdf:Description rdf:about="#eJournalHome">
  <axsvg:consistsOf rdf:parseType="Collection">
    <rdf:Description rdf:about="#JournalTitle"/>
    <rdf:Description rdf:about="#JournalIssn"/>
    <rdf:Description rdf:about="#JournalCoverageDates"/>
    <rdf:Description rdf:about="#Category"/>
  </axsvg:consistsOf>
</rdf:Description>
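A parser that reads a collection like the one above produces a chain of rdf:first and rdf:rest statements ending in rdf:nil. The sketch below shows that expansion in plain Python. The function names and the blank node labels (_:b0, _:b1, ...) are assumptions for illustration; real parsers choose their own blank node identifiers.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def collection_to_triples(members):
    """Expand a Python list into the rdf:first / rdf:rest chain of an RDF collection.

    Returns (head, triples). An empty collection is just rdf:nil.
    """
    if not members:
        return RDF + "nil", []
    nodes = [f"_:b{i}" for i in range(len(members))]
    triples = []
    for i, (node, member) in enumerate(zip(nodes, members)):
        # The last node's rdf:rest points at rdf:nil to close the list.
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "first", member))
        triples.append((node, RDF + "rest", rest))
    return nodes[0], triples


def triples_to_list(head, triples):
    """Walk the chain from its head node back into a Python list."""
    first = {s: o for s, p, o in triples if p == RDF + "first"}
    rest = {s: o for s, p, o in triples if p == RDF + "rest"}
    items = []
    node = head
    while node != RDF + "nil":
        items.append(first[node])
        node = rest[node]
    return items


members = ["#JournalTitle", "#JournalIssn", "#JournalCoverageDates", "#Category"]
head, triples = collection_to_triples(members)
```

Four members yield eight statements, which is why the graph view of a collection looks much larger than its RDF/XML form.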
55
Promises of SW Layered Cake: Standards (6)
A simple RDF Container Example in RDF/XML
RDF type: rdf:Seq
RDF Properties rdf:_1, rdf:_2, etc.
<rdf:Description rdf:about="#eJournalHome">
  <axsvg:consistsOf>
    <rdf:Description>
      <rdf:type rdf:resource="http:// .. rdf-syntax-ns#Seq"/>
      <rdf:_1 rdf:resource="#JournalTitle"/>
      <rdf:_2 rdf:resource="#JournalIssn"/>
      <rdf:_3 rdf:resource="#JournalCoverageDates"/>
      <rdf:_4 rdf:resource="#Category"/>
    </rdf:Description>
  </axsvg:consistsOf>
</rdf:Description>
56
Promises of SW Layered Cake (7)
A simple example of RDF Attribute using FOAF
Vocabulary in XHTML
Existing Web
…
<p>If you have any question, please contact me:
<a href="mailto:xua@stjohns.edu">email</a> or call me 718-990-6716 </p>
…
Proposed Web
<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
<head><title>Amanda Xu’s Home Page</title></head>
<body>…
<p>If you have any question, please contact me: <a
rel="foaf:mbox" href="mailto:xua@stjohns.edu">email</a> or call <span
property="foaf:phone">718-990-6716</span></p>
</body>
</html>
Source: “RDF/A Primer 1.0: Embedding RDF in XHTML,” W3C Working Draft 10 March 2006. Available:
http://www.w3.org/TR/2006/WD-xhtml-rdfa-primer-20060310
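To show what the added rel and property attributes buy a machine reader, here is a minimal sketch that pulls the two FOAF statements out of the proposed page using only Python's standard html.parser module. It is not an RDFa processor: the class name RdfaSketchParser is an assumption, prefixes are not resolved to full URIs, and only the two attribute patterns used on this slide are handled.

```python
from html.parser import HTMLParser


class RdfaSketchParser(HTMLParser):
    """Collect (predicate, value) pairs from rel= and property= attributes."""

    def __init__(self):
        super().__init__()
        self.statements = []
        self._pending_property = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # rel + href: the link target is the value of the statement.
        if "rel" in attrs and "href" in attrs:
            self.statements.append((attrs["rel"], attrs["href"]))
        # property: the element's text content is the value.
        if "property" in attrs:
            self._pending_property = attrs["property"]

    def handle_data(self, data):
        if self._pending_property and data.strip():
            self.statements.append((self._pending_property, data.strip()))
            self._pending_property = None


PAGE = """<html xmlns:foaf="http://xmlns.com/foaf/0.1/">
<body><p>If you have any question, please contact me:
<a rel="foaf:mbox" href="mailto:xua@stjohns.edu">email</a> or call
<span property="foaf:phone">718-990-6716</span></p></body></html>"""

parser = RdfaSketchParser()
parser.feed(PAGE)
parser.close()
statements = parser.statements
```

The same paragraph on the existing Web gives a program nothing to hold on to, while the annotated version yields a mailbox and a phone statement without any text analysis.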
57
Promises of the Layered Cake: Standards (8)
A Simple RDF Vocabulary Description Language
/Schema in XML
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:ss="http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution/elements/1.0/#"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <rdfs:Class
    rdf:about="http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution">
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
  </rdfs:Class>
  <rdf:Property
    rdf:about="http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution/elements/1.0/JournalTitle">
    <rdfs:domain rdf:resource="http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution"/>
    <rdfs:comment>No print holdings available for the title</rdfs:comment>
    <rdfs:label xml:lang="en">JournalTitle</rdfs:label>
  </rdf:Property> …
</rdf:RDF>
58
Promises of the Layered Cake: Standards (9)
A Simple RDF Vocabulary Description Language
/Schema
• Core properties of RDF Schema:
– rdfs:subClassOf,
– rdfs:seeAlso (another document containing
additional information about the resource
being described, e.g. a t.o.c.),
– rdfs:member, rdfs:label, rdfs:subPropertyOf,
rdfs:isDefinedBy, rdfs:comment, rdfs:domain,
rdfs:range,
– and the related class rdfs:ContainerMembershipProperty
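The practical effect of rdfs:subClassOf is inference: an instance of a class is also an instance of every superclass. A minimal sketch of that rule in plain Python follows. The class names EJournal and Journal and both function names are hypothetical and chosen only for illustration; a real application would leave this to an RDFS-aware reasoner.

```python
def superclasses(cls, sub_class_of):
    """Return every class reachable from cls by following rdfs:subClassOf."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in sub_class_of.get(current, ()):
            if parent not in seen:  # also guards against cyclic hierarchies
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_types(instance_types, sub_class_of):
    """Add the rdf:type statements implied by the subclass hierarchy."""
    return {
        instance: set(types).union(*(superclasses(t, sub_class_of) for t in types))
        for instance, types in instance_types.items()
    }


# Hypothetical hierarchy: EJournal is a Journal, and Journal is an rdfs:Resource.
SUB_CLASS_OF = {"EJournal": ["Journal"], "Journal": ["rdfs:Resource"]}
ASSERTED = {"Algebra and logic": ["EJournal"]}

inferred = infer_types(ASSERTED, SUB_CLASS_OF)
```

Only one type was asserted for the journal, yet a query for all instances of Journal would now find it.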
59
Promises of SW Layered Cake: Standards (10)
A Simple RDF Vocabulary Description Language
/Schema Graph
[Graph: nodes #e-JournalHome, #JournalTitle, rdfs:Resource, and rdfs:Class, connected by rdf:type and rdfs:subClassOf arcs.]
Nodes – rdfs:Resource, rdfs:Class
Properties – rdfs:subClassOf, rdf:type
60
Promises of SW Layered Cake:
RDF/RDFS Standards and Technologies (11)
Binding RDF to an XML file
Use rdf:about as URI for external resources
Add RDF to XML directly in its own namespace
Technology
Editor – DC.DOT, OCLC, IsaViz;
Parser – ARP2, ICS-Forth
Scraper – GRDDL – microformat extraction out of XML files
SPARQL application –
SQL/SPARQL bridge – relational db
GRDDL for xml files
RDF files
RDFLib
HP Bristol lab
Jena – full SPARQL implementation
RDFstore(perl), RAP, SWI-Prolog
RDF/A extends HTML
Extends the link and meta elements
61
Promises of SW Layered Cake: Standards (12)
Web Ontology Language (OWL)
Dr. Leo Obrst, MITRE, 2006:
“Ontologies are usually expressed in a logic-based language,
enabling detailed, sound, meaningful distinctions to be made
among classes, properties, & relations”;
“More expressive meaning but maintain ‘computability.’”
SW expresses “ontological information about instances appearing
in multiple documents linking of data from diverse sources in a
principled way.” –W3C OWL Web Ontology Language Guide, 10
Feb. 2004
Expressive, aggregation, link, inference – capability of OWL
“Ontology Spectrum and Semantic Models”
Dr. Leo Obrst
MITRE
Information Semantics Group
Information Discovery & Understanding
Center for Innovative Computing & Informatics
January 12 & 19, 2006
http://ontolog.cim3.net/cgi-bin/wiki.pl?ConferenceCall_2006_01_12 in
http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage
62
63
64
Promises of SW Layered Cake: Standards (13)
A Sample Web Ontology Language (OWL) in Graph
A dolphin is a mammal living in the sea or in the Amazon
From W3C Tutorial – www.w3.org/Consortium/Offices/Presentations/RDFTutorial
65
Promises of SW Layered Cake: Standards (14)
A Sample Web Ontology Language (OWL) in XML
From: www.w3.org/Consortium/Offices/Presentations/RDFTutorial#118
66
Promises of SW Layered Cake: Standards (15)
Web Ontology Language (OWL)
Example of MARC 753 Serialized in RDF/OWL pt. 1
245 ##$a Decisions in economics and finance: A Journal of Applied
Mathematics
753 ##$a Applied mathematics
753 ##$a Mathematical models $b Social sciences
753 ##$a Mathematical models $b Economics
753 ##$d Social sciences $b Mathematical models
$s Mathematical models $t Social sciences
753 ##$d Economics $b Mathematical models
$s Mathematical models $t Economics
67
Promises of SW Layered Cake: Standards (16)
Web Ontology Language (OWL)
Example of MARC 753 Serialized in RDF/OWL pt.2
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:ss="http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution/elements/1.0/#"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
  xmlns:owl="http://www.w3.org/2002/07/owl#"
  xml:base="http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution#">
  <owl:Ontology rdf:about=""/>
  <owl:Class rdf:ID="AppliedMathematics">
    <rdfs:subClassOf rdf:resource="#Mathematics"/>
    <rdfs:comment>An Example of OWL Ontology</rdfs:comment>
    <rdfs:label>Applied Mathematics</rdfs:label>
  </owl:Class>
  <owl:ObjectProperty>, <rdfs:domain>, <rdfs:range>,
  <owl:DatatypeProperty>, <owl:FunctionalProperty>
  …
68
Promises of SW Layered Cake: Standards and Technologies (17)
A Sample Web Ontology Language (OWL) in XML
OWL Web Ontology Language: Semantics and Abstract Syntax
http://www.w3.org/TR/owl-semantics/
W3C OWL Web Site
http://www.w3.org/2004/OWL/
SWRL: A Semantic Web Rule Language Combining OWL and
RuleML
http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage
Tools
Protégé OWL – Ontology Editor for the Semantic Web
http://protege.stanford.edu/plugins/owl/swrl/
Protégé-Frames—User interface and knowledge server to
support users in constructing and storing frame-based
domain ontologies, customizing data entry forms, and
entering instance data:
http://protege.stanford.edu/overview/protege-frames.html
69
Protégé 3.0 beta – family.swrl
70
SWRL Editor: Protégé 3.0 beta – family.swrl
71
Promises of SW Layered Cake: Standards (18)
SKOS (Simple Knowledge Organization System)
www.w3.org/Consortium/Offices/Presentations/RDFTutorial/#146
72
Promises of SW Layered Cake: Standards (19)
SKOS (Simple Knowledge Organization System)
www.w3.org/Consortium/Offices/Presentations/RDFTutorial/#147
73
Promises of SW Layered Cake: Standards (20)
Topic Map from Mulberrytech
74
Semantic Web for Managing Library Resources on
the Websites
Mark up & apply accurate metadata/subject analysis
terms with manual and semi-automatic tools (JN title list);
Develop common semantic structures and data dictionaries
(e.g. Master Classification Scheme – LCC);
Taxonomy work results in machine-addressable schema
that enables cross-application transactions; Web services
infrastructure is needed to make content portable (e.g.
uPortal, library website, etc.);
Content tagging with topic (LCSH, MeSH, AAT, etc.)
and LC classification markers;
Aggregation of content through portal/data warehouse
channels using Simple Knowledge Organization System
(SKOS);
Add facets to a category, e.g. Location -> Type;
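The earlier search walkthrough needed separate searches for "ear wax", "earwax", and "cerumen". A SKOS-style concept with one preferred label and several alternative labels lets a single entry cover all of them. The sketch below imitates that idea with plain Python dictionaries. Treating "Earwax" as the preferred label and the other two terms as alternatives is an assumption made for illustration, as are the function names.

```python
# One concept with SKOS-like labels; the label assignment is an assumption.
CONCEPTS = [
    {"prefLabel": "Earwax", "altLabels": ["Cerumen", "Ear wax"]},
]


def build_label_index(concepts):
    """Map every label, preferred or alternative, to the concept's full label list."""
    index = {}
    for concept in concepts:
        labels = [concept["prefLabel"], *concept["altLabels"]]
        for label in labels:
            index[label.casefold()] = labels
    return index


def expand_query(term, index):
    """Rewrite a user term as an OR query over all labels of its concept."""
    labels = index.get(term.strip().casefold())
    if labels is None:
        return term  # unknown terms pass through unchanged
    return " OR ".join(f'"{label}"' for label in labels)


LABEL_INDEX = build_label_index(CONCEPTS)
expanded = expand_query("cerumen", LABEL_INDEX)
```

Whichever term the user types, the expanded query is the same, so two search terms only need to be entered once.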
75
A Sample Snapshot of LC Classification Scheme to Encompass
All Library Resources on the Website - Math
76
A Sample Snapshot of Categorized Course Titles
77
A Sample Snapshot of Categorized Faculty Specialty by LCC
78
A Sample Snapshot of Categorized Books Checked Out By Faculty by LCC
79
A Sample Snapshot of ‘To be Categorized’ JN Titles
80
A Case Study for St. John’s University Library with Sample Conceptual Model,
and No Live Applications Built Due to Time, Resource, and Tooling Constraints
1. Continue to maintain a single version of the truth for e-holdings and print
holdings, e.g. Serials Solutions and Voyager;
2. Add named entity for product name – MARC 730 field;
3. Add subject category browse – MARC 753 field;
4. Add facet terms from other thesauri;
5. Add authority control;
6. Output e-holdings to library website, WebVoyage, and WorldCat;
7. Develop a classification scheme for all resources on the library website in
conformance with other resources at the enterprise level;
8. Develop Web service infrastructure to dynamically insert, update, and
delete content residing in ECM/Portal/Data warehouse and
interchange data among content partners within and outside the
institution;
9. Perform ETL and data cleansing, and automate the process as much as
possible with SaaS solution providers
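Steps 1 to 3 and 9 above can be sketched as a small merge routine. This is only an illustration under stated assumptions: the input rows, the field names (issn, title, product, categories), the package name, and the helper functions are all hypothetical, and a record is modeled as a plain list of (tag, value) pairs rather than a real MARC structure.

```python
def normalize_issn(issn):
    """Strip hyphens and spaces and uppercase the check digit so keys match."""
    return issn.replace("-", "").replace(" ", "").upper()


def merge_holdings(e_holdings, print_holdings):
    """Merge electronic and print rows into one record per ISSN."""
    merged = {}
    for source, rows in (("electronic", e_holdings), ("print", print_holdings)):
        for row in rows:
            key = normalize_issn(row["issn"])
            rec = merged.setdefault(
                key, {"title": row["title"], "formats": [], "fields": []}
            )
            rec["formats"].append(source)
            # Step 2: product name as a 730 entry.
            if "product" in row:
                rec["fields"].append(("730", row["product"]))
            # Step 3: subject categories as 753 entries, without duplicates.
            for category in row.get("categories", []):
                if ("753", category) not in rec["fields"]:
                    rec["fields"].append(("753", category))
    return merged


# Made-up sample rows for illustration only.
e_holdings = [{"issn": "0002-5232", "title": "Algebra and logic",
               "product": "Hypothetical E-Journal Package",
               "categories": ["Algebra"]}]
print_holdings = [{"issn": "00025232", "title": "Algebra and logic",
                   "categories": ["Algebra"]}]

catalog = merge_holdings(e_holdings, print_holdings)
print(catalog)
```

Keying on a normalized identifier is the design choice that matters here: it is what lets one merged record feed the library website, WebVoyage, and WorldCat outputs instead of three separately maintained lists.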
81
82
The Palace Museum (Beijing) 《Qingming Shang He Tu Bu Quan Juan》
Authors: Zhang Zeduan, Luo Dongping
Website: http://www.qingmingtu.com/english/index.htm
83
References to Typical Set of Automatic Tools and Methodologies
Supporting Semantic Web Application Development (1)
Starting Point for Processes:
1. Project Management and Enterprise
Architecture
2. Content Capturing
3. Content Management Systems
4. Search Engine Services
5. Portal development
6. BPM/SOA
7. CRM (Customer Relationship Management)
8. Service Resolution Management;
Starting Point for Methodologies:
1. RUP (Rational Unified Process) and Agile
Software Development;
2. Develop project management, enterprise
architecture, SW development and deployment
platforms;
3. Model the data, processes, systems, and
people associated with the SW applications in
UML and entity diagrams;
4. Develop requirements, use cases, functional
and technical specifications, test cases,
deployment, release, and acceptance plans;
5. Develop applications with process specific set
of tools;
6 D l i t t ti i t di t
84
References to Typical Set of Automatic Tools and Methodologies
Supporting Semantic Web Application Development (2)
Starting point for tools
1. Check out all the tools that I mentioned in presentation slices 2 and 3;
2. Go to the companies’ websites, download and test their tools;
3. Identify and develop your own stack of tools
4. Try:
• Protégé OWL – Ontology Editor for the Semantic Web
http://protege.stanford.edu/plugins/owl/swrl/
• Protégé-Frames—User interface and knowledge server to support users in
constructing and storing frame-based domain ontologies, customizing data entry
forms, and entering instance data:
http://protege.stanford.edu/overview/protege-frames.html
5. If you are an Oracle user, try protégé_oracle_rdf_plugin, ntriple_converter,
and the Oracle RDF Batch Loader Package
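For readers unfamiliar with the N-Triples format that such converters and batch loaders work with, the sketch below shows what one line per triple looks like. It is a simplified illustration, not the Oracle tooling: nt_term and to_ntriples are hypothetical helpers, the example.org URI is made up, and any value starting with http:// or https:// is assumed to be a URI while everything else is written as a plain literal.

```python
def nt_term(value):
    """Format a URI in angle brackets, or anything else as a quoted literal."""
    if value.startswith(("http://", "https://")):
        return f"<{value}>"
    # Escape the characters that N-Triples requires inside string literals.
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples, one statement per line."""
    return "\n".join(
        f"{nt_term(s)} {nt_term(p)} {nt_term(o)} ." for s, p, o in triples
    )


triples = [
    ("http://example.org/journal/algebra-and-logic",  # hypothetical URI
     "http://purl.org/dc/elements/1.1/title",
     "Algebra and logic"),
]

ntriples = to_ntriples(triples)
print(ntriples)
```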

Mending the Gap between Library's Electronic and Print Collections in ILS and Library's Web Site using Semantic Web - Progress Report

  • 1.
    1 Mending the Gapbetween Library’s Electronic and Print Collections in ILS and Library’s Web Site using Semantic Web - Progress Report 2007 EndUser Annual Ex Libris User Group Meeting April 28, 2006 Chicago, ILL Amanda Xu, Electronic Resources Cataloging Librarian, and Andrew Sankowsi, Director of Collection and Information Management St. John’s University Library Jamaica, New York
  • 2.
    2 Content Technology Distribution Q: Where isa library’s value proposition as illustrated in the triangle below? Is library an aggregator of aggregators? If so, are we ready for it? If not, what is the our next social identity? How can we transfer our skills from ‘info-land’ or ‘meta-land’ to ‘digital land’ or ‘semantic land’ or simply hybrid ‘land of bits?’ where print and electronic co-exists? Where is users’ behavior context? Aggregated contents through technologies? Capable to select, organize, access, guide, enhance, and distribute contents to the user through technology. Still, complains like ‘why what you buy is not what I need, and what I need is not what you buy?’
  • 3.
    3 Introduction • At thisThursday’s keynote presentation, Oren Beit-Arie – Chief Strategy Officer of Ex Libris defined the role of library as ‘connecting users to content’, and providing ‘unique services, to tailor the needs of their users and integrate service into users’ tasks and workflow.’ • This is exactly where we feel the game is about as well – modeling user’s information seeking behavior in context of their experience, use the information to improve our collections and services. • Last year, we proposed a sample conceptual model to: – Identify the information need of our faculty through the footstep of their teaching and research experience; – Use aggregated information to measure how well our collections will meet faculty’s teaching, learning, and research requirement through the footstep of collection managers; – The subject area that we chose was math and computer science
  • 4.
  • 5.
    5 Major Challenges andOpportunities Challenges 1. Much of the data sources still residing in isolated data islands in closed systems or flat file systems 2. Integration requires enterprise level 3. Lack of resources, times, money, and understanding of required systems infrastructure – hardware, software, database, network messaging, data structure, etc. throughout the lifecycle development of the application; 4. Lack of resources and required skills to build the repository server, handle ETL process and messaging among systems across the enterprise; 5. Hard to get people buy-in • Priority conflict • No access to data sources • Turf protection • No agreement on business model – SAAS • No governance Opportunities 1. Model resource discovery process cross-databases 2. Identify gaps in existing systems infrastructure as far as content selection is concerned; 3. Re-examine collection development and information management process 4. Develop survey forms, interview people in charge of functional areas 5. Plan to do collection analysis via inventory, gaps, usage, cost 6. Tie the analysis to researcher’s need 7. Identify new data sources to be managed by libraries from institutional repository to ILS operation; 8. Identify priority list - Mend the gap between print and electronic collections in ILS and Library Site
  • 6.
    6 1. Environmental Scanof Web Content Technologies by IT Vendors, by Library IT Vendors and Academic Libraries; 2. Current Web and Print Resources Integration Effort at St. John’s University Library - Voyager, Gary Strawn’s Location Changer, Serials Solutions, OCLC World Cat, Google Scholar, Net Library, NYLINK, WALDO, Content Aggregators, Institutional Portals, Courseware, Faculty Pub, Student & Alumni Repositories; 3. Swam through the whole iterative processes of application development from backend to front end for the project: Gather requirement Obtain buy-in from IT vendors, IT consulting services, and make recommendation Obtain support from upper and middle level managers Get into training at NYU SCPS, and sharpen needed skills set for communication with technical and non-technical people Proposed technical infrastructure from SOA framework to database systems, data structure, etc. Recommend a little touch of Semantic Web technologies for Pervasive Library Resource Management on the Websites Steps Taken
  • 7.
    7 The credit ofthe talk shall go to many giants in IT industries, library IT industries, especially former Endeavor and later Ex Libris Information Systems, End User 2007 Program Planning Committee, St. John’s University Library, NYU SCPS faculty; What Will Not Be Discussed for This Talk: Detailed files, data repositories, networks (physical and logistics), distributed application programming and computing services, security, jobs and events required for content aggregation and deployment on library’s websites; Mathematical models for auto-text processing, patterns, and business rules generated and deployed by any given semantic Web application; Stacks of standards concerning key content technologies Credit and Exceptions
  • 8.
    8 Review Web ContentTechnologies By IT Vendors(1) Key Content Technology Vendors Investigated: 1. Project Management, Enterprise Architecture, Modeling – IBM Rational, iRise, Telelogic, Agilense, CA Erwin, MS Visio, MS Project Server, Embarcadero, Enterprise Elements 2. Content Capturing Vendors – Captiva Software, Adobe, FUJI, ZyLab, ABBYY FlexiCapture, Liquid Office; 3. ETL and Master Data Management: Sunosis, Data Flux, Kalido 4. Content Management Systems – Oracle Stellent, EMC Documentum, ArborText, POET, XEnterprise, Marklogic; 5. Search Engine Services - Verity K2, Autonomy, Teragram, FAST, Inxight Software, Endeca, iProspect, IBM OmniFind, Siderean, Semantic Works, Google, Ask;
  • 9.
    9 Review Web ContentTechnologies By IT Vendors(2) 6. Portal – Vignette, Hummingbird, MS Share Point and IBM Websphere; 7. BPM/SOA – Global 360, IBM FileNet, Lombardi, PegaSystems, Hyperion, Pervasive, SUN/SeeBeyond, BEA Systems, TIBCO, Sonic ESB; 8. CRM (Customer Relationship Management Vendors)– Answers, Oracle Seibel; 9. Business Intelligence and Reporting – Business Objects, Crystal Reports, Oracle, Informatica, Accenture, SAP, SAS, SPSS; 10.Service Resolution Management – Knova and Kana; 11.Semantic Technologies: IBM, Oracle, Jena2, Gate, Siderian; 12.Best Practice Sources: Delphi, KM, TDWI, AIIM, DCI, OMG, SOA/BPM Institute, Insight Shared Network, SD Time, Project 10X, NCOR, Getty, DocuLab, Ken Orr Institute, Zachman Institute, Essential Strategies, TOPQuadrant, Semantic Arts, Forrester, etc.
  • 10.
    10 Check Functional Componentsand Enterprise Level Content Sources to Be Leveraged in the Framework of Service Oriented Architecture (SOA) (1) 1. Project Management, Enterprise Architecture & Modeling 2. Imaging and Document Capture 3. Web Content Management from 2.0 to 3.0, including content created from portal, desktop application, browser, e-form, and other web-based collaboration environment, such as Wiki, Flickr, instant messaging, Yahoo 360 & Food Site, Oracle OTN Site, MySpace, blog, RSS, social tagging, recommend, etc. 4. Document Management 5. Record and Retention Management 6. Digital Asset Management 7. ECM (Electronic Content Management) - Taxonomy, Thesauri, Topic Map, Meta-data
  • 11.
    11 Check Functional Componentsand Enterprise Level Content Sources to Be Leveraged in the Framework of Service Oriented Architecture (SOA) (2) 8. Enterprise Search, Directory, Digital Signature, Auto- Classification, Clustering, Categorization, Security, Risk Management 9. Compliance to License, Auditing, Federal and Legal Regulations 10. Information Reusability, Lifecycle and Retention Policy 11. Data Warehouse, Business Intelligence, Performance Management and Monitoring 12. Business Process Management (BPM) 13. Semantic Web Technologies 14. Email Management 15. Portal
  • 12.
    12 Preliminary Proposal forRequired Data Architecture in SOA Framework (1) Source Data ODS Systems Data Staging Areas Data Warehouse Data Marts By Service Dept By Media Type By Profile Data By Discipline By Grain By Contract Analytical Data Mart Ad Hoc Query Modeling & Mining Tools Visualization Tools Rule Engine ETL/BI/Ontology Meta-data Unstructured Data Meta-data Structured Data Vocabulary /Lexicon/Concept Presentation Enterprise portals Collaborate Discover Select Annotate Enhance Search Navigate Syndicate Interchange Ontology-assisted Transformation Reference Master Data RDF/OWL Data Ontology Profile Protocol Model Spec Standard Schema Contract Constraint Media type Linkage
  • 13.
    13 Preliminary Proposal forRequired Data Architecture in SOA Framework (2) Discovery agencies Service Providers Service Consumers Find Publish Interact Capable Need Satisfaction Requirement has Service Description Protocols Standards Specs Policies Limits Governance Contacts Reuse Interoperable Visibility Execution context Effect Strategies Patterns Models Profiles Domains Refer Hold Contract & policies Service Distribution Content Technology specify Info, Process Action Behavior Model has feedback
  • 14.
    14 Compare What Offeredin Content Technologies by Library IT vendors (1) Check Functional Components and Library-wide Content Sources Supported By Library IT Industries Integrated Library Systems (ILS)– Print and Electronic Resources Electronic Resources Management Systems (ERMS) for Subscribed Titles in Electronic Databases – Full text and A&I Full-text A-Z List by Directory and Subject on Library Web Federated Search, Google-like, etc. search on Library Web Link Resolver and Knowledgebase containing logical links and holdings for print and electronic materials Digitized Documents and Images Library Web Contents 1.0, 2.0, & 3.0, including stream videos, library Wiki, eForms, instant messaging, RSS, eTutorials, eNews, eAlerts, etc.
  • 15.
    15 Compare What Offeredin Content Technologies by Library IT vendors (2) Interlibrary Loan Services (ILL) eReserve eReferences – Ask Librarian Auto-citation integration – RefWork, Endnote, etc. Record Management for Institution and Archival Contents, e.g. EAD and TEI Library portals as library content and service distribution toolkit, e.g. WorldCat, Google Scholar, etc. Integrated support in context of service request: uPortal Learning Management Systems (LMS) and courseware, e.g. WebCT uSearch, uMeta-data, uTaxonomy, uEmail Management uReporsitory, eg. DSpace, Fedora, Sakai; Domain specific repositories, e.g. PMC Community-based repository, e.g. EI Village, Community of Science, MySpace Statistics for inventory, budget, cost, user behavior, usage, etc.
  • 16.
    16 Current Status Check(1) 1. Current Web and Print Resources Integration Effort at St. John’s University Library – Ex Libris - Voyager as ASP Solutions for ILS Serials Solutions – SaaS Solutions for E-J Management Maintain singe version of the truth of E-J holdings and subscriptions via SS Knowledgebase and Clients; Output – E-J A-Z list at journal level: on the library website in HTML format in EZ Proxy server as monthly updates in XML format in Voyager as MARC title list in MARC format in article linking to OCLC World Cat, Google Scholar, NetLibrary, and content providers at article-level in central search at package level if connectors with content providers are readily available (in progress) Use Gary Strawn’s Location Changer, and MARCEdit for monthly updates MARC title list in Voyager and data consistence checking among the above lists of services Separate workflow process and platform for E-Content Packages listed as A-Z list by database name, and by subject ; yet same content packages provided by the same vendor;
  • 17.
    17 Current Status Check(2) 3. Still look for: Electronic Resources Management Systems (ERMS), e.g. Serials Solutions, TDNet, Meridian, Verde; Digital Resources Management Systems (DRMS)– ContentDM, Greenstone, Encompass, etc.; Institutional Repository Archives, e.g. DSpace, Sakai, Fedora, etc.; Library Portals to uPortal Courseware, e.g. Blackboard, WebCT; 4. Implemented SaaS solutions to citation management – RefWork; EReserve – Docutek 5. Campus IT handles Institutional Portals, Courseware, Faculty Pub, Student & Alumni Repositories in collaboration with the Library; Current Status Check (2)
  • 18.
    18 Obtain Journal TitleHoldings from OPAC and Journal A-Z List to Content Providers
  • 19.
  • 20.
  • 21.
    21 Obtain by Subject– Two Terms in one Search
  • 22.
    22 PubMed: Ear WaxRemoval (1)
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
    29 Obtain by Subject- LCSH Search: Ear Wax
  • 30.
  • 31.
    31 Obtain by SubjectDefault to Broader Term - LC Catalog: Earwax
  • 32.
    32 Obtain by SubjectDefault to Broader Term - LC Catalog: Earwax
  • 33.
    33 Obtain by Subject:Wikipedia: Earwax
  • 34.
    34 Obtain by Subject:Wikipedia: Earwax
  • 35.
    35 Obtain By Subject– Two Terms in Two Searches in Ask
  • 36.
    36 Obtain By Subject– Two Terms in Two Searches in Ask
  • 37.
    37 Obtain by Subject:Two Terms – Two Searches in Yahoo
  • 38.
    38 Obtain By Subject– Two Terms in Two Searches in Yahoo
  • 39.
    39 Obtain by Subject:Two Terms – Two Searches in Google
  • 40.
    40 Obtain By Subject– Two Terms in Two Searches in Google
  • 41.
    41 Computer Simulated Modelto Draw This Chart – How Many Data Store Do we Need, and How Many Interfaces Do we Need to Create for the End User? Napoleon’s March to Moscow – The War of 1812 Edward Tuffe – Poster from Envisioning Information
  • 42.
    42 Current State ofWeb Content Packaging using Integrated Library Systems, Electronic and Digital Resource Management Systems in Comparison with What Offered by Aggregators, Google, Ask_Jeeves, etc. (2) At presentation layer, systems that support open URL allows user to traverse from database to journal, and from journal to article independent of the location of the services. We have more chances to ensure ‘find, access, and obtain’; while search engines may provoke copyright and license barrier; At document processing level, PubMed, LC, EBSCO, SCORPUS, Gale Group use authority control for subject access, while Google, Ask_Jeeves, Yahoo do not. We anticipate user’s query by adding authority control for named entity and controlled vocabulary for subject access, e.g. two search terms only need to be entered once; At query level, there are lot of rooms for us to improve – query expansion, e.g. teaser, refinement, and optimization, etc.; At end user level, we still do not know them at individual level; At process management level and performance measure level, we are still in ground 0. At content data model level - Data Store vs. Interface How Many Interfaces Do we Need? – ERMS - eJournals, Federated Search - Articles, Library Web Site –Databases, DCMS –Images?
  • 43.
    43 Desired Features forManaging Library Print and Electronic Content on library website(1) Need Another ECMS Or Wrapper Or Data Warehouse? Function - merge data Essential elements for a journal record in Serials Solutions, and Library Catalog have different requirements. Yet, they all need core elements for identification, discovery, and dis-ambiguous purpose; How many times do we have to create them or export and import them into these repositories?
  • 44.
    44 Desired Features forManaging Library Print and Electronic Content on library website (2) Library Content Packaging Process: Data extraction, transformation, and load (ETL) is still manual-oriented process, e.g. loading MARC data file into ACQ, ACQ into Meridian, LinkFinderPlus into Federated Search; If we want to maintain one version of truth of our data for ILS, ERMS, DCM, Federated Search, and Dspace, shouldn’t it be - extracted, loaded, transformed (ELT), and designed in a way that they can be modularized, reusable, and portable everywhere; Constant tagging standard for Web content at taxonomy level among ILS, ERMS, DCMS, Federated Search, and Dspace: Taxonomy for DCMS – container specific, or across ILS, ERMS, Federated Search, and uPortal? Type of Content Unwrapped: Form processing, how do we capture form data on our web, or in print, excel, PDF format? Digital Asset; Web content from Web 2.0 Content Redundancy among ILS, ERMS, DCMS, Federated Search, and institution repository: If all we can get from ERMS is 1) license compliance, and 2) analytic reports from data warehouse, shouldn’t we add the license info to Voyage ACQ, and build data warehouse on top of all repositories – ILS, DCM, Federated Search, library website, WorldCat, uPortal, DSpace, etc.?
  • 45.
    45 Review of DesiredFeatures for Library Electronic Content Management Systems (ECMS) (3) Content Data Model Support mission critical reports, e.g. 360 degree view of workflow process for journals? Collection level record for hierarchical invoice processing of a subscription package with hundreds of titles in one bundle; Price history for periodicals and package should be allowed to exist in ACQ and enable price comparison at journal title level; Support sufficient business rules for content validation, e.g. validation rule against duplicated invoice, etc. Consistent Content Retention Policy: ILS – MFHD has retention policy but not enforced; An item gets withdrawn from item level; What about content in ERMS, DCM, library Website, and how should the out of date, inaccurate data be systematically removed? Can content retention policy be enforced so that record removal or changes of locations have options to setup systematically? Content display model: Facet browsing and search support; Auto fix of broken URLs and Web content change; Horizontal content display model, e.g. ledger info of various fiscal year Meet compliance requirement
  • 46.
    46 Review of DesiredFeatures for Library Electronic Content Management Systems (ECMS) (4) – Search, navigation, retrieval, and display by description, classification, subjects from library catalog to library website, from library website to content providers, from journal to issue, from issue to articles; – What questions does it answer?? - Vertical and horizontal (views + processes + usage + ROI) from the perspective of end-users, librarians and staff, process owners, administrators, and partners (contents, technologies, and services) USE
  • 47.
    47 Semantic Web Definitions 1.“A common framework that allows data to be shared and reused across application, enterprise, and community boundaries.” – Available: http://www.w3.org/2001/sw/ 2. “An attempt to make Web resources more readily accessible to the automated processes by adding information about the resources that describe or provide Web content.” – Available: http://www.w3.org/2004/OWL 3. “Binary relationships capture the meaning of the link” – Tim Berners Lee, Japan Prize 2002. – Available: http://www.w3.org/2002/Talks/04-sweb/ 4. SW is an “extension of the current web, providing an infrastructure for the interchange and the integration of data on the Web.” – Available: http://www.w3c.org/Consortium/Offices/Presentations/RDFTutorial/
  • 48.
    48 Tim Berners Lee,“ W3C World Wide Web Consortium, Academic Discussion, Japan Prize 2002.” Available: http://www.w3.org/2002/Talks/04-sweb/slide12-0.html
  • 49.
    49 Semantic Technologies andStandards Semantic Web Road Map by Tim Berbers-Lee, Sept. 1998. Available: http://www.w3.org/designIssues/Sem antic.html 1. “A web of data, in some way like a global database” 2. “Machine understandable information” 3. “Basic assertion model” -meta-data: property of a resource in RDF Syntax 4. “Semantic layer” – RDF schema, FOAF, SKOS – OWL Lite, OWL DL, OWL Full 5. “Conversion of language” – ‘semantically link two independent databases, and allow the query of each other via conversion of the query’ - 6. “Logic layer” – “deduction of one type of document from a document of another type, checking of a document against a set of rules of self consistency, resolution of a query by conversion from terms unknown into the terms known” • SWRL: a semantic Web Rule Language combining OWL and RuleML 7. “Proof validation” – a language for proof 8. “Evolution rules language” 9. “Query language” – SPARQL query language for RDF 10. “Digital signature” – “public key cryptoography”, or “adding logic of trust as icing on the cake of a reasoning systems” 11. “Index terms” – RDF search engines 12. “Engine of the future” – combine a reasoning engine with a search engine
  • 50.
    50 Promises of theSemantic Web (1) URI makes possible for everything, including partial information to be identifiable; If it is based on knowledge representation framework, SW will allow global consistency of data; Allows aggregation of information; Support inference of information; Extensible to multimedia data; Digital/Electronic library collections, institution and community collections are Web enabled; Combine applications remotely for local knowledge integration – calendar, address book, airline preferences; Encapsulate all data stores and processes behind the scene, and address users’ concerns in graphic, chart, etc. view
  • 51.
    51 <?xml version=“1.0”?> rdf:RDF xmlns:rdf=“http://www.w3c.org/1999/02/22-rdf-syntax-ns#” xmlns:ss=“http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution/elements/ 1.0/#”> <rdf:Description rdf:about=“http://YZ3RJ4VL2Y.search.serialssolutions.com/?V=1.0&L=YZ3RJ 4VL2Y&S=JCs&C=ALGEANDLOG&T=marc”> <ss:JournalTitle>Algebraand logic</ss:JournalTitle> <ss:JournalISSN rdf:parseType=“Resource”>0002-5232</ss:JournalISSN> <ss:JournalCoverageDates>from 05/01/2003 to 1 year ago</ss:JournalCoverageDates> <ss:Category>Algebra</ss:Category> <ss:eJournalHome rdf:resource=“http://yz3rj4vl2y.search.serialssolutions.com”/> <ss:contains rdf:parseType=“Literal”><h1>St. John’s Univ. Libraries e-full text Journals</h1></ss:contains></rdf:Desccription> </rdf:RDF> Promises of SW Layered Cake: Standards A simple RDF Example in RDF/XML (2)
  • 52.
    52 Promises of SWLayered Cake: Standards A simple RDF Example in RDF/XML (3) • A resource is anything that can have a URI: 'http://YZ3RJ4VL2Y.search.serialssolutions.com/?V=1.0&L=YZ3RJ4VL2Y&S=JCs&C=ALG EANDLOG&T=marc’. Potentially all the elements of RDF/XML file can be addressed as URI, and thereby a distributed computer. • A Property is a Resource that has a name and can be used as a property: e.g. <SS_JournalTitle> • A statement consists of – Resource, property, and value. The three parts known as subject (s), predicate (p), and object (o), which are also known as a RDF Triple (s, p, o). • RDF Graph defines methods to retrieve triples, property and object pair for a specific subject which is a resource, etc. • Core property of RDF: rdf:ID – define a fragment identifier within the RDF portion, used in conjunction with xml:base; rdf:value; rdf:subject, rdf:object, rdf:rest, rdf:first, rdf:nodeID (internal identifier for a resource). • Blank nodes with identical nodeID-s in different graphs are different.
  • 53.
    53 Promises of SWLayered Cake: Standards (4) A simple RDF Container Example in RDF Graph #JournalTitle #JournalIssn #JournalCoverageDates #eJournalHome consistsOf #Category rdf:nil rdf:List rdf:first rdf:first rdf:first rdf:first rdf:rest rdf:rest rdf:rest rdf:rest rdf:type rdf:type rdf:type
  • 54.
    54 Promises of SWLayered Cake: Standards (5) A simple RDF Container Example in RDF/XML RDF class: rdf:List <rdf:Description rdf:about=“#eJournalHome”> <axsvg:consistsOf rdf:parserType=“Collection”> <rdf:Description rdf:about=“#JournalTitle”/> <rdf:Description rdf:about=“#JournalIssn”/> <rdf:Description rdf:about=“#JournalCoverageDates”/> <rdf:Description rdf:about=“#Category”/> </axsvg:consistsOf> </rdf:Description>
  • 55.
    55 RDF type: rdf:Seq RDFProperties rdf:_1, rdf:_2, etc. <rdf:Description rdf:about=“#eJournalHome”> <axsvg:consistsOf><rdf:description> <rdf:type rdf:resource=“http:// .. rdf-syntax-ns#Seq”/> <rdf:_1 rdf:resource=“#JournalTitle”/> <rdf:_2 rdf:resource=“#JournalIssn”/> <rdf:_3 rdf:resource=“#JournalCoverageDates”/> <rdf:_4 rdf:resource=“#Category”/></rdf:description> </axsvg:consistsOf> </rdf:Description> Promises of SW Layered Cake: Standards(6) A simple RDF Container Examples in RDF/XML
  • 56.
    56 Promises of SWLayered Cake(7) A simple example of RDF Attribute using FOAF Vocabulary in XHTML <a href=mailto:xua@stjohns.edu>email</a> or call me 718-990-6716 </p> … Existing Web … <p>If you have any question, please contact me: Proposed Web <html xmlns:foaf=“http://xmlns.com/foaf/0.1> <head><title>Amanda Xu’s Home Page</title></head> <body>… <p>If you have any question, please contact me: <a rel=“foaf:mbox” href=mailto:xua@stjohns.edu</a> or call <span property=“foaf:phone”>718-990-6716</span></p> </body> </html> IT1
  • 57.
    Slide 56 IT1 "RDF/APrimer 1.0: Embedding RDF in XHTML," W3C Working Draft 10 March 2006. Available: <http://www.w3.org/TR/2006/WD-xhtml-rdfa-primer-20060310 Information Technology, 4/18/2006
  • 58.
    57 Promises of theLayered Cake: Standards (8) A Simple RDF Vocabulary Description Language /Schema in XML <?xml version=“1.0”?> <rdf:RDF xmlns:rdf=“http://www.w3.org/1999/02/22-ref-syntax-ns#” xmlns:rdfs=“http://www.w.org/2000/01/rdf-schema#” xmlns:ss=“http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution/elements/1.0/#” xmlns:xsd =“http://www.w3.org/2001/XMLSchema#”> <rdfs:Class rdf:about=“http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution”> <rdfs:subClassof rdf:resource=http://www.w3.org/200/01/rdf- schema#Resource/> </rdfs:Class> <rdf:Property rdf:about=“http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution/elements/1.0 /JournalTitle”> <rdfs:domain rdf:resource=“http:// yz3rj4vl2y.search.serialssolutions.com/serialsSolution”/> <rdfs:comment>No print holdings available for the title</rdfs:comment> <rdfs:label xml:lang=“en”>JournalTitle</rdfs:label> </rdf:Property> … </rdf:RDF>
  • 59.
    58 Promises of theLayered Cake: Standards (9) A Simple RDF Vocabulary Description Language /Schema • Core properties of RDF schema: – rdfs:subClassOf, – rdfs:seeAlso (another doc containing additional information about the resources being described (t.o.c.), – rdfs:member, rdfs:label, rdfs:subPropertyOf, rdfs:isDefinedBy, rdfs:Comment, rdfs:domain, rdfs:Range, rdfs:ContainerMembershipProperty
  • 60.
    59 #e-Journal Home #JournalTitle rdf:type rdfs:Resource Rdfs:Class rdfs:subSubClassOf rdf:type Nodes –rdfs:Resource, rdfs:Class Properties – rdfs:subClasssOf, rdf:type Promises of SW Layered Cake: Standards (10) A Simple RDF Vocabulary Description Language /Schema Graph
  • 61.
    60 Promises of SWLayered Cake: RDF/RDFS Standards and Technologies (11) Binding RDF to an XML file Use rdf:about asURI for external resources Add RDF to XML directly in its own namespace Technology Editor – DC.DOT, OCLC, IsaViz; Parser – ARP2, ICS-Forth Scraper – GRDDL – microformat extraction out of XML files SPARQLapplication – SQL/SPARQL bridge – relational db GRDDL for xml files RDF files RDFLib HP Bristol lab Jena – full SPARQL implementation RDFstore(perl), RAP, SWI-Prolog RDF/A extends HTML Extends the link and meta elements
  • 62.
    61 Promises of SWLayered Cake: Standards (12) Web Ontology Language (OWL) Dr. Leo Obrst, MITRE, 2006: “Ontologies are usually expressed in a logic-based language, enabling detailed, sound, meaningful distinctions to be made among classes, properties, & relations”; “More expressive meaning but maintain ‘computability.’” SW expresses “ontological information about instances appearing in multiple documents linking of data from diverse sources in a principled way.” –W3C OWL Web Ontology Language Guide, 10 Feb. 2004 Expressive, aggregation, link, inference – capability of OWL ““Ontology Spectrum and Semantic ModelsOntology Spectrum and Semantic Models”” Dr. LeoDr. Leo ObrstObrst MITREMITRE Information Semantics GroupInformation Semantics Group Information Discovery & UnderstandingInformation Discovery & Understanding Center for Innovative Computing & InformaticsCenter for Innovative Computing & Informatics January 12 & 19, 2006January 12 & 19, 2006 http://ontolog.cim3.net/cgihttp://ontolog.cim3.net/cgi--bin/wiki.pl?ConferenceCall_2006_01_12bin/wiki.pl?ConferenceCall_2006_01_12 inin http://ontolog.cim3.net/cgihttp://ontolog.cim3.net/cgi--bin/wiki.pl?WikiHomePagebin/wiki.pl?WikiHomePage
  • 63.
  • 64.
  • 65.
    64 Promises of SWLayered Cake: Standards (13) A Sample Web Ontology Language (OWL) in Graph A dolphin is a mammal living in the sea or in the Amazon From W3C Tutorial – www.w3.org/Consortium/Offices/Presentation/RDFTutoiral
  • 66.
    65 Promises of SWLayered Cake: Standards (14) A Sample Web Ontology Language (OWL) in XML From: www.w3.org/Consortium/Offices/Presentations/RDFTutoiral#118
  • 67.
    66 Promises of SWLayered Cake: Standards (15) Web Ontology Language (OWL) Example of MARC 753 Serialized in RDF/OWL pt. 1 245 ##$a Decisions in economics and finance: A Journal of Applied Mathematics 753 ##$a Applied mathematics 753 ##$a Mathematical models $b Social sciences 753 ##$a Mathematical models $b Economics 753 ##$d Social sciences $b Mathematical models $s Mathematical models $t Social sciences 753 ##$d Economics $b Mathematical models $s Mathematical models $t Economics
  • 68.
    67 Promises of SW Layered Cake: Standards (16) Web Ontology Language (OWL) Example of MARC 753 Serialized in RDF/OWL pt. 2
    <?xml version="1.0"?>
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:ss="http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution/elements/1.0/#"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
        xmlns:owl="http://www.w3.org/2002/07/owl#"
        xml:base="http://yz3rj4vl2y.search.serialssolutions.com/serialsSolution#">
      <owl:Ontology rdf:about=""/>
      <owl:Class rdf:ID="AppliedMathematics">
        <rdfs:subClassOf rdf:resource="#Mathematics"/>
        <rdfs:comment>An Example of OWL Ontology</rdfs:comment>
        <rdfs:label>Applied Mathematics</rdfs:label>
      </owl:Class>
      <owl:ObjectProperty>, <rdfs:domain>, <rdfs:range>, <owl:DatatypeProperty>, <owl:FunctionalProperty> …
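The two slides above go from MARC 753 fields to an OWL class. A minimal sketch of that step, using only the Python standard library, might look like the following. The helper names (`parse_subfields`, `class_id`, `owl_class`, `build_ontology`), the CamelCase identifier convention, and the choice to turn only `$a` values into classes are assumptions made for illustration; the slides do not specify how the remaining subfields should be mapped.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Ask ElementTree to serialize with the conventional prefixes.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def parse_subfields(field):
    """Split '753 ##$a Applied mathematics' into (tag, [(code, value), ...])."""
    tag, _, rest = field.partition(" ")
    pairs = re.findall(r"\$(\w)\s*([^$]*)", rest)
    return tag, [(code, value.strip()) for code, value in pairs]


def class_id(label):
    """Turn a heading into a CamelCase identifier (hypothetical convention)."""
    return "".join(word.capitalize() for word in re.split(r"\W+", label) if word)


def owl_class(label, parent_label=None):
    """Build an owl:Class element, optionally as a subclass of a parent."""
    cls = ET.Element(f"{{{OWL}}}Class", {f"{{{RDF}}}ID": class_id(label)})
    if parent_label:
        ET.SubElement(cls, f"{{{RDFS}}}subClassOf",
                      {f"{{{RDF}}}resource": "#" + class_id(parent_label)})
    ET.SubElement(cls, f"{{{RDFS}}}label").text = label
    return cls


def build_ontology(fields, parents):
    """Emit one owl:Class per 753 $a value; 'parents' is supplied by hand."""
    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": ""})
    for field in fields:
        tag, subfields = parse_subfields(field)
        if tag != "753":
            continue
        for code, value in subfields:
            if code == "a":
                root.append(owl_class(value, parents.get(value)))
    return root


fields = [
    "753 ##$a Applied mathematics",
    "753 ##$a Mathematical models $b Social sciences",
]
root = build_ontology(fields, {"Applied mathematics": "Mathematics"})
owl_xml = ET.tostring(root, encoding="unicode")
print(owl_xml)
```

The parent relation is passed in explicitly because the 753 fields on the slide do not carry it; in practice it would presumably come from the classification scheme.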
  • 69.
    68 Promises of SW Layered Cake: Standards and Technologies (17) A Sample Web Ontology Language (OWL) in XML. OWL Web Ontology Language: Semantics and Abstract Syntax http://www.w3.org/TR/owl-semantics/; W3C OWL Web Site http://www.w3.org/2004/OWL/; SWRL: A Semantic Web Rule Language Combining OWL and RuleML http://ontolog.cim3.net/cgi-bin/wiki.pl?WikiHomePage. Tools: Protégé OWL – Ontology Editor for the Semantic Web http://protege.stanford.edu/plugins/owl/swrl/; Protégé-Frames: user interface and knowledge server to support users in constructing and storing frame-based domain ontologies, customizing data entry forms, and entering instance data: http://protege.stanford.edu/overview/protege-frames.html
  • 70.
    69 Protégé 3.0 beta – family.swrl
  • 71.
    70 SWRL Editor: Protégé 3.0 beta – family.swrl
  • 72.
    71 Promises of SW Layered Cake: Standards (18) SKOS (Simple Knowledge Organization System) www.w3.org/Consortium/Offices/Presentations/RDFTutorial/#146
  • 73.
    72 Promises of SW Layered Cake: Standards (19) SKOS (Simple Knowledge Organization System) www.w3.org/Consortium/Offices/Presentations/RDFTutorial/#147
  • 74.
    73 Promises of SW Layered Cake: Standards (20) Topic Map from Mulberrytech
  • 75.
    74 Semantic Web for Managing Library Resources on the Websites: Mark up and apply accurate metadata/subject analysis terms with manual and semi-automatic tools (JN title list); Develop common semantic structures and data dictionaries (e.g. Master Classification Scheme – LCC); Taxonomy work results in a machine-addressable schema that enables cross-application transactions; Web services infrastructure is needed to make content portable (e.g. uPortal, library website, etc.); Content tagging with topic (LCSH, MeSH, AAT, etc.) and LC classification markers; Aggregation of content through portal/data warehouse channels using Simple Knowledge Organization System (SKOS); Add facets to a category, e.g. Location -> Type;
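The tagging, aggregation, and faceting steps on the slide above can be sketched in a few lines of Python. This is only an illustration: the `BROADER` table stands in for skos:broader relations, and the concept hierarchy, resource titles, and facet values are all invented sample data, not taken from the library's actual scheme.

```python
from collections import defaultdict

# narrower concept -> broader concept, in the spirit of skos:broader.
# The hierarchy itself is a hypothetical example.
BROADER = {
    "Mathematical models": "Applied mathematics",
    "Applied mathematics": "Mathematics",
    "Mathematics": "Science",
}

# Hypothetical resources, each tagged with one concept and one facet ("type").
RESOURCES = [
    {"title": "Sample journal A", "concept": "Mathematical models", "type": "e-journal"},
    {"title": "Sample book B", "concept": "Applied mathematics", "type": "print book"},
    {"title": "Sample journal C", "concept": "Mathematics", "type": "e-journal"},
]


def ancestors(concept):
    """Follow broader links upward, guarding against accidental cycles."""
    chain, seen = [], {concept}
    while concept in BROADER:
        concept = BROADER[concept]
        if concept in seen:
            break
        seen.add(concept)
        chain.append(concept)
    return chain


def aggregate(resources):
    """Index titles by concept (including every broader concept) and by facet."""
    index = defaultdict(lambda: defaultdict(list))
    for resource in resources:
        for concept in [resource["concept"], *ancestors(resource["concept"])]:
            index[concept][resource["type"]].append(resource["title"])
    return index


index = aggregate(RESOURCES)
for facet, titles in index["Mathematics"].items():
    print(facet, titles)
```

Because each resource is filed under its own concept and every broader one, browsing a top-level category such as Mathematics surfaces items tagged only with narrower terms, and the facet splits them by type.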
  • 76.
    75 A Sample Snapshot of LC Classification Scheme to Encompass All Library Resources on the Website - Math
  • 77.
    76 A Sample Snapshot of Categorized Course Titles
  • 78.
    77 A Sample Snapshot of Categorized Faculty Specialty by LCC
  • 79.
    78 A Sample Snapshot of Categorized Books Checked Out By Faculty by LCC
  • 80.
    79 A Sample Snapshot of ‘To be Categorized’ JN Titles
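The snapshots above all sort items (course titles, faculty specialties, checked-out books, journal titles) under the same LCC scheme. One hedged way to picture that categorization is the sketch below. The call numbers are invented, the `LCC_LABELS` table covers only two classes, and anything it cannot place falls into a "To be categorized" bucket like the one on the last snapshot.

```python
import re
from collections import Counter

# A deliberately tiny slice of LCC; a real table would be far larger.
LCC_LABELS = {"QA": "Mathematics", "Q": "Science (General)"}
UNCATEGORIZED = "To be categorized"


def lcc_class(call_number):
    """Return the leading class letters of an LC call number, or None."""
    match = re.match(r"\s*([A-Z]{1,3})\s*\d", call_number.upper())
    return match.group(1) if match else None


def categorize(call_numbers):
    """Count items per LCC label, with a fallback bucket for the rest."""
    counts = Counter()
    for number in call_numbers:
        counts[LCC_LABELS.get(lcc_class(number), UNCATEGORIZED)] += 1
    return counts


# Hypothetical call numbers for each kind of item shown in the snapshots.
sources = {
    "course titles": ["QA152 .S5", "QA76.9 .D3"],
    "faculty specialties": ["QA276 .M4"],
    "books checked out": ["QA300 .R8", "Q175 .K9", "no call number"],
}
report = {name: categorize(numbers) for name, numbers in sources.items()}
for name, counts in report.items():
    print(name, dict(counts))
```

Putting the counts side by side is what lets demand (courses, specialties, checkouts) be compared with supply under one classification.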
  • 81.
    80 A Case Study for St. John’s University Library with Sample Conceptual Model, and no Live Applications Built due to Time, Resources, and Tooling Constraints 1. Continue to maintain a single version of truth for e-holdings and print holdings, e.g. Serials Solutions and Voyager; 2. Add named entity for product name – MARC 730 field; 3. Add subject category browse – MARC 753 field; 4. Add facet terms from other thesauri; 5. Add authority control; 6. Output e-holdings to library website, WebVoyage, and WorldCat; 7. Develop a classification scheme for all resources on the library website in conformance with other resources at the enterprise level; 8. Develop Web service infrastructure to dynamically insert, update, and delete content residing in ECM/Portal/Data warehouse and interchange data among content partners within and outside the institution; 9. ETL and data cleansing, and automate the process as much as possible with SaaS solution providers
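Steps 1 and 9 above (a single version of truth for e-holdings and print holdings, plus data cleansing) could be approached by normalizing ISSNs and merging on them. The sketch below is one possible shape, not the workflow actually used: the record layout is hypothetical, the titles are placeholders, and the sample ISSNs were picked only because their check digits are valid. The check-digit rule itself is the standard ISSN one (weights 8 down to 2, modulo 11, with X standing for 10).

```python
import re


def normalize_issn(raw):
    """Return 'NNNN-NNNC' if raw holds a valid ISSN, otherwise None."""
    compact = re.sub(r"[^0-9Xx]", "", raw).upper()
    if len(compact) != 8 or not compact[:7].isdigit():
        return None
    total = sum(int(digit) * weight
                for digit, weight in zip(compact[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    if compact[7] != expected:
        return None
    return compact[:4] + "-" + compact[4:]


def merge_holdings(e_holdings, print_holdings):
    """Merge both lists into one record per ISSN, noting available formats."""
    merged = {}
    sources = (("electronic", e_holdings), ("print", print_holdings))
    for fmt, records in sources:
        for record in records:
            issn = normalize_issn(record["issn"])
            if issn is None:
                continue  # a fuller pipeline would queue these for cleanup
            entry = merged.setdefault(
                issn, {"title": record["title"], "formats": set()})
            entry["formats"].add(fmt)
    return merged


# Hypothetical exports, e.g. one from the e-resource system, one from the ILS.
e_holdings = [
    {"title": "Sample journal A", "issn": "0378-5955"},
    {"title": "Sample journal B", "issn": "12345679"},
]
print_holdings = [
    {"title": "Sample journal A", "issn": "ISSN 0378 5955"},
    {"title": "Bad record", "issn": "0378-5954"},
]
merged = merge_holdings(e_holdings, print_holdings)
for issn, entry in sorted(merged.items()):
    print(issn, entry["title"], sorted(entry["formats"]))
```

Records whose ISSN fails validation are skipped here for brevity; routing them to a review list would fit the data-cleansing step better than dropping them.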
  • 82.
  • 83.
    82 The Palace Museum (Beijing) 《Qingming Shang He Tu Bu Quan Juan》 Author: Zhang Zeduan, Luo Dongping Website: http://www.qingmingtu.com/english/index.htm
  • 84.
    83 References to Typical Set of Automatic Tools and Methodologies Supporting Semantic Web Application Development (1) Starting Point for Processes: 1. Project Management and Enterprise Architecture 2. Content Capturing 3. Content Management Systems 4. Search Engine Services 5. Portal Development 6. BPM/SOA 7. CRM (Customer Relationship Management) 8. Service Resolution Management; Starting Point for Methodologies: 1. RUP (Rational Unified Process) and Agile Software Dev.; 2. Develop project management, enterprise architecture, SW development and deployment platforms; 3. Model the data, processes, systems, and people associated with the SW applications in UML and Entity Diagram; 4. Develop requirements, use cases, functional and technical specifications, test cases, deployment, release, and acceptance plans; 5. Develop applications with a process-specific set of tools; 6. …
  • 85.
    84 References to Typical Set of Automatic Tools and Methodologies Supporting Semantic Web Application Development (2) Starting Point for Tools: 1. Check out all the tools that I mentioned in presentation slice 2 and 3; 2. Go to the companies’ websites, download and test their tools; 3. Identify and develop your own stack of tools; 4. Try: • Protégé OWL – Ontology Editor for the Semantic Web http://protege.stanford.edu/plugins/owl/swrl/ • Protégé-Frames: user interface and knowledge server to support users in constructing and storing frame-based domain ontologies, customizing data entry forms, and entering instance data: http://protege.stanford.edu/overview/protege-frames.html 5. If you are an Oracle user, try protégé_oracle_rdf_plugin, ntriple_converter, and the Oracle RDF Batch Loader Package