Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet Sebastian Ryszard Kruk, Mary Burke, Stefan Decker http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit
Looking into the Future of Irish Digital Libraries ? ?
Why do we care? John teaches biology over the Internet, using digital libraries and modern technologies (wikis, blogs). How to deliver the material just-in-time? How to pre-assess students? How to automate most of the process?
Goals Present current solutions that bring digital libraries to the Next Generation Internet
Goals Gather  opinions, requirements and future plans  of Irish libraries
Goals Build a basis for an application for funding of a national digital libraries initiative under the EU FP7 Digital Libraries theme
Schedule: Semantic Digital Libraries
10:00-10:30 Get together, Welcome (Sebastian Kruk, Mary Burke)
10:30-11:00 Ontologies for Digital Libraries (Maciej Dąbrowski)
11:00-11:30 Building a Semantic Digital Library (Tomasz Woroniecki)
11:30-11:50 Coffee break
Schedule: Future of Digital Libraries
11:50-12:00 Introduction to the session (Sebastian Kruk)
12:00-12:30 IBM Ontological Network Miner and its applications to semantic social networks (Alexander Troussov)
12:30-13:00 BRICKS Project (Predrag Knezevic)
13:00-14:00 Lunch break
Schedule: Digital Libraries in Ireland
14:00-14:30 The Irish Virtual Research Library and Archive Project – an infrastructure for humanities research (John McDonough)
14:30-15:00 OJAX: A Web 2.0 Search user Interface (Judith Wusteman)
15:00-15:30 With a Little Help from My Friends: Social Semantic Search and Browsing (Sebastian Kruk, Adam Gzella)
15:30-15:45 Coffee break
15:45-16:45 Discussion panel: Do we need Semantic Web and Web 2.0 technologies in Digital Libraries? (Mary Burke)
16:45-17:00 Wrap-up, Conclusions
Ontologies for Digital Libraries MarcOnt Initiative Maciej Dąbrowski Digital Enterprise Research Institute National University of Ireland, Galway maciej.dabrowski@deri.org
Outline Real-life and Semantic Web Semantic Web and Ontologies MarcOnt Ontology MarcOnt Tools Conclusions
Real-life problems Heterogeneous systems Identified Problems: Interoperability Format translation Multiple data formats in DL: How to support them? How to translate between them? Who should create mappings?
Real-life problems – user’s expectations Searching: Effective and Accurate We want correct and fast answers!! Intuitive and Simple Asking questions should be easy. Meaning Jaguar – a car or an animal? Reasoning Give me articles written by students of X in Galway? Identified problems: Intuitive interface for asking complex queries
Real-life problems - summary Digital Libraries should provide: Interoperability Support for many formats Complex search features Intuitive interfaces
Semantic Web
The Semantic Web – A Brief Introduction Current Web vs. Semantic Web? An extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.  [Tim Berners-Lee] Current Web was designed for humans, and there is little information usable for machines Was the Web meant to be more? Objects with well defined attributes as opposed to untyped hyperlinks between Internet resources A  network of relationships  amongst named objects, yielding unified information management tasks What do you mean by “Semantic”? the  semantics  of something is the  meaning  of something Semantic Web is able to describe things in a way that computers can understand
Semantic Web vs. Current Web (figure with two panels: Current Web, Semantic Web)
The Semantic Web – What is RDF? Describing things on the Semantic Web RDF (Resource Description Framework): a data format for describing information and resources, the fundamental data model for the Semantic Web. Using RDF, we can describe relationships between things like: A is a part of B or Y is a member of Z and their properties (size, weight, age, price …) in a machine-understandable format. The RDF graph-based model delivers straightforward machine processing. Putting information into RDF files makes it possible for “scutters” or RDF crawlers to search, discover, pick up, collect, analyse and process information from the Web
The Semantic Web – What is RDF? A simple RDF example Statement: “ Stefan Decker  is the  creator  of the resource (web page)   http://www.stefandecker.org ” Structure: Resource (subject) http://www.stefandecker.org Property (predicate)  http://purl.org/dc/elements/1.1/creator Value (object)  “ Stefan Decker ” Directed graph: http://www.stefandecker.org dc:creator Stefan Decker
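The statement on this slide can be written down as a subject, predicate, object triple. The sketch below is only an illustration in plain Python (the helper name to_ntriples is an assumption, not part of any RDF toolkit); it prints the triple as one line in the N-Triples syntax.

```python
# One RDF statement as a (subject, predicate, object) triple.
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

triple = ("http://www.stefandecker.org", DC_CREATOR, "Stefan Decker")


def to_ntriples(subject: str, predicate: str, literal: str) -> str:
    """Serialise a triple whose object is a plain literal as an N-Triples line.

    Hypothetical helper for illustration; it only escapes backslashes and quotes.
    """
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


line = to_ntriples(*triple)
print(line)
```

The subject and predicate are URIs, while the value is a literal string, which is why only the object is quoted.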
The Semantic Web – How can RDF help us? identify objects; establish relationships; express a new relationship: just add a new RDF statement; integrate information from different sources: copy all the RDF data together. RDF allows many points of view
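A minimal sketch of the two operations named on this slide, assuming RDF data is held as a plain Python set of triples. The two "library" sources and the title and language values are made up for the example.

```python
DC = "http://purl.org/dc/elements/1.1/"
page = "http://www.stefandecker.org"

# Two hypothetical sources describing the same resource.
library_a = {(page, DC + "creator", "Stefan Decker")}
library_b = {
    (page, DC + "creator", "Stefan Decker"),  # same statement as in library_a
    (page, DC + "language", "en"),            # illustrative value
}

# Integrating sources: copy all the RDF data together (set union).
merged = library_a | library_b

# Expressing a new relationship: just add one more statement.
merged.add((page, DC + "title", "Home page"))  # illustrative value

for statement in sorted(merged):
    print(statement)
```

Because a graph is a set of statements, the duplicate creator statement collapses into one and no schema change is needed for the new property.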
Ontologies What is an Ontology? “An ontology is a specification of a conceptualization.” Tom Gruber, 1993 Ontologies are social contracts Agreed, explicit semantics Understandable to outsiders (Often) derived in a community process Ontology markup and representation languages: RDF and RDF Schema, OWL Other: DAML+OIL, EER, UML, Topic Maps, MOF, XML Schemas
Components of ontologies Concepts Book Article Author Properties hasPages hasTitle Constraints Cardinality is at least 1 Maximum value is 200 Axioms Planes can fly People can’t fly Relationships Is a Part of
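The components listed above can be sketched in a few lines of Python. This is an illustration only: the Publication superclass and the pairing of each constraint with a particular property are assumptions, since the slide lists the constraints without saying which property they apply to.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Constraint:
    min_cardinality: int = 0          # e.g. "Cardinality is at least 1"
    max_value: Optional[int] = None   # e.g. "Maximum value is 200"


@dataclass
class Concept:
    name: str
    parent: Optional["Concept"] = None  # the "is a" relationship
    constraints: Dict[str, Constraint] = field(default_factory=dict)

    def all_constraints(self) -> Dict[str, Constraint]:
        """Constraints declared here plus those inherited via "is a"."""
        inherited = self.parent.all_constraints() if self.parent else {}
        return {**inherited, **self.constraints}


def violations(concept: Concept, values: Dict[str, List[int]]) -> List[str]:
    """Check property values of one instance against the concept's constraints."""
    problems = []
    for prop, rule in concept.all_constraints().items():
        found = values.get(prop, [])
        if len(found) < rule.min_cardinality:
            problems.append(f"{prop}: needs at least {rule.min_cardinality} value(s)")
        if rule.max_value is not None:
            problems += [f"{prop}: {v} exceeds {rule.max_value}"
                         for v in found if v > rule.max_value]
    return problems


# Hypothetical superclass; Article "is a" Publication.
publication = Concept("Publication", constraints={"hasTitle": Constraint(min_cardinality=1)})
article = Concept("Article", parent=publication,
                  constraints={"hasPages": Constraint(max_value=200)})

ok = violations(article, {"hasTitle": ["RDF primer"], "hasPages": [12]})
bad = violations(article, {"hasPages": [450]})
print(ok, bad)
```

The second instance fails twice: it has no title (a constraint inherited from the parent concept) and its page count is above the maximum.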
Ontologies - half-time conclusions Advantages: Data is not only human readable, it is now also machine readable. Machines can carry out much more complex tasks (e.g. reasoning). Capturing the meaning of concepts is possible. A new look at data storage systems (there are no data structures!!)
Use case scenario (Regular Systems). Structured resources: Author, Title. Data storage allows: Author, Title. Additional information (e.g. Date) cannot be stored!!
Ontology development process Many approaches Different life cycles Continuous process Involves a community of users Requires tools for collaboration Tools for ontology development are necessary
MarcOnt Initiative Motivation: Build a bibliographic ontology for  the Jerome Digital Library MarcOnt Initiative goals: Deliver a set of tools for  collaborative ontology development Collaboration Tools for domain experts Enable mediation between formats (MMS)
MarcOnt Ontology Central point of MarcOnt Initiative Translation and mediation format Continuous collaborative ontology improvement Knowledge from the  domain experts Community  influence and evaluation
MarcOnt Ontology Goals: Capture concepts from the legacy bibliographic formats: MARC21, BibTeX, Dublin Core, Lattes, ... Create a uniform bibliographic description format for digital libraries. Enable the use of Semantic Web technologies (e.g. reasoning) to improve capabilities of digital libraries. Improve interoperability
Format Translation Scenario (Dublin Core). Record 1: Author: John Smith; Date of Birth: 1956-10-15; Date of death: 2004-09-10. Records 2 to 4: Author: John Smith; Date of Birth: ??; Date of death: ??
Format Translation Scenario Author: John Smith Date of Birth: 1956-10-15 Date of death: 2004-09-10 Author: John Smith Date of Birth: ?? Date of death: ?? Author: John Smith Date of Birth: ?? Date of death: ?? Author: John Smith Date of Birth: 1956-10-15 Date of death: 2004-09-10 RDF Storage Dublin Core Author: John Smith Date of Birth: 1956-10-15 Date of death: 2004-09-10 Author: John Smith Date of Birth: 1956-10-15 Date of death: 2004-09-10
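The two scenario slides contrast a translation path that loses the dates with one that keeps them. The sketch below mirrors that contrast under stated assumptions: the field names, the one-field "simple" format and both function names are hypothetical, and the code makes no claim about what any real bibliographic format can hold.

```python
source = {
    "author": "John Smith",
    "date_of_birth": "1956-10-15",
    "date_of_death": "2004-09-10",
}

# Assumption for the example: the simple exchange format only has an author slot.
SIMPLE_FIELDS = {"author"}


def via_simple_format(record: dict) -> dict:
    """Translate through an intermediate format with no slot for the dates."""
    intermediate = {k: v for k, v in record.items() if k in SIMPLE_FIELDS}
    return {k: intermediate.get(k, "??") for k in record}


def via_rich_store(record: dict) -> dict:
    """Translate through a store that keeps every property of the record."""
    stored = dict(record)
    return {k: stored.get(k, "??") for k in record}


print(via_simple_format(source))
print(via_rich_store(source))
```

Once a value is dropped at the intermediate step, no later translation can recover it, which is the motivation for mediating through a richer description.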
MarcOnt Mediation Services
MarcOnt Mediation Services Format translation Interoperability MarcOnt Mediation Services RDF Translator
MarcOnt Ontology in JeromeDL Improvement of searching capabilities Natural Language Processing (NLP) Templates Show me all publications written by students of Decker.
MarcOnt Portal Collaborative ontology development. Portal provides: Suggestions Annotations Versioning Ontology editor
MarcOnt Portal On-line ontology editing Visualization of ontologies
MarcOnt Portal Comparing versions of ontologies
MarcOnt Initiative Roadmap Lattes – CV platform used in Brazil Release of MarcOnt draft ontology Digital Rights Management Sharing issues MarcOntX agent – automatic integration of concepts from Digital Libraries
MarcOnt Initiative summary MarcOnt Initiative goals: Create a framework for collaborative ontology development Provide domain experts with tools to share their knowledge Offer tools for data mediation between different data formats Develop MarcOnt bibliographic ontology Create a community of users (domain experts)
Conclusions Ontologies: can improve the most important function of digital libraries – searching for information; facilitate interoperability; capture much more information (metadata) than existing systems; are the agreement of people (domain experts); need tools for collaborative development and a community of users; are the future of Digital Libraries?
Tomasz Woroniecki [email_address] JeromeDL Building a Semantic Digital Library
Outline of the presentation Introduction to Semantic Digital Libraries Overview of JeromeDL Architecture of JeromeDL Working with JeromeDL Demo
Social Semantic Digital   Library A library stores and provides access to  resources (books) Qualified staff updates catalogues and helps users
Social Semantic  Digital   Library Machine-readable resources Full-text index improves searching Easy access Availability
Social  Semantic  Digital   Library Resources are accessible  by  machines, not  with  machines Metadata is rich and extensible Searching reflects meaning of terms RDF is a standard for representing information Not just resources but also knowledge is shared
Social Semantic Digital Library Involves the community in sharing knowledge Utilizes the social network in searching Allows for comments, blogs, shared bookmarks Easy tagging
Evolution of Libraries Social Semantic Digital Library: Involves the community in sharing knowledge. Semantic Digital Library: Accessible by machines, not only with machines. Digital Library: Online, easy searching with a full-text index. Library: Organized collection
Semantic Digital Library Semantic digital libraries integrate  information based on different metadata, e.g.: resources, user profiles, bookmarks, taxonomies provide  interoperability  with other systems (not only digital libraries)   deliver more robust,  user friendly and adaptable search and browsing  interfaces empowered by semantics
JeromeDL - Motivations Support for different kinds of bibliographic metadata, like: DublinCore, BibTeX and MARC21 at the same time. Making use of existing rich sources of bibliographic descriptions (like MARC21) created by humans. Supporting users and communities: users have control over their profile information; community-aware profiles are integrated with bibliographic descriptions; support for community generated knowledge. Delivering communication between instances: P2P mode for searching and user authentication; Hierarchical mode for browsing
JeromeDL – Social Semantic Digital Library JeromeDL fulfills requirements of:  Librarians precise annotations rich metadata Researchers easy publishing searching related topics Average users efficient search and browsing online collaboration
JeromeDL - Architecture
Ontologies in JeromeDL
Using JeromeDL Uploading a resource provide title, abstract, author etc. provide structure of the resource (e.g., chapters) choose domains of the subject choose keywords for the resource set additional properties upload digital parts of the resource
Using JeromeDL
Using JeromeDL An administrator either approves or rejects a published resource
Sharing bookmarks
JeromeDL for a regular user Browsing resources by type, author, keyword, domain Downloading the resource and its bibliographic description in various formats Subscribing to RSS feeds Searching simple, advanced, distributed, semantic
JeromeDL for a regular user
Summary An easy solution for putting resources online A community around your repository Support for many languages Integration with Bibster and OpenSearch protocols Visit  www.jeromedl.org
Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet Future of Digital Libraries http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit
Building the Future Future Internet, semantic or social, or both,  will not emerge on its own ,  we  need to build it
Building the Future Digital libraries are an important part of the Internet
Building the Future Libraries should  continue to  drive the changes, not only follow
Building the Future OnNeM   -  IBM Ontological Network Miner and its applications to semantic social networks  BRICKS Project  – Building Resources for Integrated Cultural Knowledge Services
IBM CAS Dublin / LanguageWare group  Ontological Network Miner and its applications  to  models of social networks and semantics Alexander Troussov, Mikhail Sogrin, John Judge
Agenda Ontological Network Miner tool (project Galaxy) As generic tool to perform elements of soft clustering and fuzzy inference on semantic networks Applications of Galaxy to ontology-based semantic analysis of texts Semantic tagging, term disambiguation based on the global context Galaxy applications to folksonomies Community detection/Expertise location, … Applications to unified models of semantic social networks Research cooperation
Ontological Network Miner (Galaxy) A generic tool to perform elements of soft clustering and fuzzy inference on semantic networks Ongoing project based on the work we have done for the EU 6th Framework integrated project Nepomuk
  Applications to metadata generation
Applications to metadata generation Currently the semantic web relies on semantic annotation mostly done manually by humans. Working in the EU 6th Framework project Nepomuk (which aims to build a social semantic desktop), we at IBM Dublin developed a tool for automating metadata creation: Automatic ontology-based conceptual tagging (central concepts of the text with respect to the given lexico-semantic resource). A text which mentions Mulhuddart, Lansdowne, Clontarf is probably about Dublin/Ireland/Europe/Earth; this fact can be inferred from geographical relations like Mulhuddart “is-part-of” Dublin. Disambiguation of terms based on the ontological knowledge from the corresponding resource (Jaguar – a car or an animal? Jaguar, car, animal, pet, …) …
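The Mulhuddart example can be mimicked with a toy scoring scheme. This is not the IBM tool's algorithm, only a minimal sketch under stated assumptions: the PART_OF table is a hand-made fragment, and the decay factor and the function names are hypothetical.

```python
from collections import Counter

# Hand-made fragment of a geographic "is-part-of" resource (assumption).
PART_OF = {
    "Mulhuddart": "Dublin",
    "Lansdowne": "Dublin",
    "Clontarf": "Dublin",
    "Galway": "Ireland",
    "Dublin": "Ireland",
    "Ireland": "Europe",
    "Europe": "Earth",
}


def ancestors(concept: str) -> list:
    """Follow is-part-of links upwards, nearest ancestor first."""
    chain = []
    while concept in PART_OF:
        concept = PART_OF[concept]
        chain.append(concept)
    return chain


def focus_concept(mentions, decay: float = 0.5):
    """Score each ancestor of the mentioned concepts; nearer ancestors score higher."""
    scores = Counter()
    for mention in mentions:
        weight = 1.0
        for ancestor in ancestors(mention):
            weight *= decay
            scores[ancestor] += weight
    return scores.most_common(1)[0][0] if scores else None


print(focus_concept(["Mulhuddart", "Lansdowne", "Clontarf"]))
print(focus_concept(["Clontarf", "Galway"]))
```

Three Dublin districts point most strongly at Dublin, while one Dublin district plus Galway points at Ireland, the closest concept that covers both mentions.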
Automatic tagging based on concept mentions (figure: mentions in the TEXT are mapped to nodes in the NETWORK OF CONCEPTS). Mapping of term mentions to concepts. Finding the “focus” concept
DEMO (Lotusphere 2007) Run eclipse.exe. Open lotusphere_demo.config.xml, located in subfolder data. Have a look at the underlying personal information management ontology (people, organisations, projects). Open text: email1.anno. Text is processed on the fly, terms are disambiguated, central concepts are shown in the upper-right window. Why US? Because most found concepts are people, and during disambiguation it was established that the most likely referents of (ambiguous) names are located in the US. Let us remove the first line with two names. The text now has fewer names. Instead of people, other (abstract) concepts now play a more prominent role. Because of this (after a small delay caused by Eclipse, not by the performance of our system) US disappears as the top concept
What is Ontological Network Miner?  Text analytics demo shown before has applications to: Context dependent smart tags Metadata generation Although text processing is a complex process involving mapping from text to concepts and usage of empirics specific to certain properties of the discourse at the heart of the processing is clustering on the graph of concepts Which was shown by the animation when wide orange area becomes smaller after “magical” shrinking This clustering is provided by IBM Dublin Ontological Network Miner Codenamed OnNeM in Nepomuk project
What exactly does Ontological Network Miner do? One algorithm (a blend of soft clustering & fuzzy inference). Depending on the parameters, this algorithm provides: “Generalisation” of the model: output has fewer nodes compared to the input. “Expansion” of the model, which might be used for query expansion: Query “nutrition”+“science” is expanded into a properly ranked list: nutritionist, dietologist, nutritional, scientific, .. Our customers and partners can tune the algorithm for specific tasks using intuitively clear parameters.
Tuning Galaxy Galaxy utilises a data-driven algorithm and, more importantly, tuning can be done by a domain specialist (not necessarily a researcher or software developer). Tuning means “telling” Galaxy what properties of the underlying semantic network are relevant to a particular task: For example, in application to geotagging the user might specify that Galaxy favour geographical locations with bigger populations and, in addition, favour popular resorts. Using WordNet, the user might specify that Galaxy must favour hypernymy-hyponymy relations and disfavour meronymy-holonymy relations. Researchers (IBMers and CAS scientists) also have the opportunity to work with us on “fine-tuning” the algorithm, for example to improve usage of graph metrics such as in-/out-degree of nodes
Applications to folksonomy systems (Del.icio.us, IBM’s Dogear, …)
Folksonomies as ontological networks   People Documents Tags Instances of tagging
Why a “generic” ontological network miner is needed: Objects of interest might be wired into one unified model of lexicon, semantic and social networks. For example, the network depicted on the previous slide can be augmented with new entities and new relations. One can add relations between participants, or add new people into consideration. Semantic relations between tags might be added manually, or generated automatically based on morphological similarity of words, proximity in WordNet, etc. Keywords and other metainformation about documents might be wired into the network. Tags in folksonomies are created by humans. Keywords (preexisting in documents or extracted by text processing) and their relations to documents and tags might be added to augment folksonomies. Dogear can recommend tags, in a style accepted in the community, for a new document which nobody has tagged yet
Why a “generic” engine like Galaxy is needed (cont.): A unified model of lexicon, semantic and social networks gives more context to make the right decisions in Community Detection, Community Structure Analysis, Metadata Sharing & Recommendations, etc. However, the data network becomes quite intricate and irregular, and only generic, scalable and high-performing ontological network miners (like Galaxy) are up to the job. Galaxy is a generic technique which can work efficiently on huge networks with complex topology. Most tasks on MeSH and WordNet are done in 200 ms. Galaxy has native potential for an explanatory module: “This person might help you to understand this document because he frequently used tags popular for this document”
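The explanatory sentence quoted on this slide suggests a simple way to locate expertise in a folksonomy. The sketch below is a toy illustration, not Galaxy or Dogear code: the tagging data, the scoring rule and the function names are all assumptions.

```python
from collections import Counter

# Hypothetical instances of tagging: (person, document, tag).
TAGGINGS = [
    ("anna", "doc1", "rdf"),
    ("anna", "doc2", "rdf"),
    ("anna", "doc2", "ontology"),
    ("ben", "doc1", "rdf"),
    ("ben", "doc3", "cooking"),
    ("cara", "doc3", "cooking"),
]


def popular_tags(document: str) -> Counter:
    """How often each tag was applied to the given document."""
    return Counter(tag for _, doc, tag in TAGGINGS if doc == document)


def likely_helpers(document: str, asker: str) -> list:
    """Rank other people by how often they used tags popular for this document."""
    doc_tags = popular_tags(document)
    scores = Counter()
    for person, _, tag in TAGGINGS:
        if person != asker and tag in doc_tags:
            scores[person] += doc_tags[tag]
    return scores.most_common()


helpers = likely_helpers("doc1", asker="ben")
print(helpers)
```

Each score can be traced back to the shared tags that produced it, which is what makes an explanation of the recommendation possible.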
OnNeM can handle  Networks like this:  People Documents Tags Instances of tagging New people and additional relations between them Relations between tags: semantic proximity, misspellings, translations, WordNet, … Relations between documents: … New objects: e.g. keywords from texts might be related with documents and tags
Applications to  Semantic Social Networks & Knowledge Exchange
What problems Galaxy can address Galaxy could be used to uniformly address many problems in Semantic Social Networks & Knowledge Exchange: Tag recommendation in folksonomies; Community detection; Centrality problem in social network analysis; Expertise location… How? Galaxy is a generic technique which takes as input a function on nodes of a semantic network and transforms this input into another function according to the parameters. To simplify explanations, instead of the input/output functions, we’ll talk about the input set of nodes and the ranked output set of nodes. To create a solution for a particular task: a set of input nodes must be chosen; parameters of the algorithm must be established; the output set must be interpreted according to the task
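The three steps above (choose input nodes, set parameters, interpret the ranked output) can be illustrated with a generic spreading-activation sketch. This is not Galaxy's algorithm, which the slides do not spell out; the tiny network, the relation weights and the threshold are all assumptions.

```python
# Hypothetical semantic network: node -> list of (neighbour, relation type).
EDGES = {
    "nutrition": [("nutritionist", "related"), ("food", "hypernym")],
    "science": [("scientific", "related"), ("nutritionist", "related")],
    "food": [("kitchen", "meronym")],
}


def spread(activation, relation_weights, steps=2, threshold=0.05):
    """Turn an input function on nodes into a ranked output set of nodes.

    Activation flows along edges, scaled by a per-relation weight (the tuning
    parameters); only newly received activation is passed on at each step.
    """
    total = dict(activation)
    frontier = dict(activation)
    for _ in range(steps):
        received = {}
        for node, value in frontier.items():
            for neighbour, relation in EDGES.get(node, []):
                gain = value * relation_weights.get(relation, 0.0)
                received[neighbour] = received.get(neighbour, 0.0) + gain
        for node, gain in received.items():
            total[node] = total.get(node, 0.0) + gain
        frontier = received
    ranked = [(n, v) for n, v in total.items()
              if n not in activation and v >= threshold]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


query = {"nutrition": 1.0, "science": 1.0}            # step 1: input nodes
favour_related = {"related": 0.5, "hypernym": 0.4, "meronym": 0.1}  # step 2
expanded = spread(query, favour_related)              # step 3: ranked output
print(expanded)
```

Raising the weight of one relation type changes which nodes survive the threshold, which is the kind of task-specific tuning a domain specialist could do without touching the algorithm itself.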
IBM social software “the company is serious about dominating social networking for the enterprise” Cooking Up a Social Networking Storm With IBM Labs, March 30, 2007 IBM Social Software Dogear: Dogear is a social-tagging service for resources such as public URLs, company-internal URLs, and other company internal documents (e.g., Wiki pages, Domino documents, etc.) Bluepages+1: Bluepages+1 is an enhanced version of the IBM online employee directory. Among its enhancements is the ability for one person to apply a tag directly to another person’s directory page. Blog Central: Blog Central is an internal blogging service, open to any employee. The Blog Central data structures provide for a separate list of tags for each blog and for each entry within each blog. Activities: Activities is a web-based version of ActivityExplorer, an activity-centric collaboration service in which teams may create a collection of diverse objects in a tree-like structure consisting of a root “activity” and its daughter components.
Our research plans to exploit Galaxy: We are investigating a wide range of applications in  Community Detection, Community Structure Analysis, Metadata Sharing & Recommendations Enhanced with Social Reputation Mechanisms Based on our understanding of potential IBM needs, our commitments for European research projects, and our vision of the potential of Galaxy, we are looking forward to the creation of the following functionalities: Community Support Given a peer: Search for its neighbors within a community Given the entire collection: Identify trends and threads (e.g., tags becoming popular, etc.) Metadata Sharing & Recommendations Given a file with some attached metadata: Recommend additional annotations Recommend similar files Given one or more tags and/or keywords: Locate peers with expertise in the described areas
Research collaboration Create semantic social networks of your interest … in a format which can be used by Galaxy (simple XML format). Design a scenario and work with us on tuning parameters of Galaxy for the tasks in your scenario … Contacts Alexander Troussov, CAS Chief Scientist, [email_address] Marie Wallace, LanguageWare manager, [email_address] Brian O’Donovan, CAS Program Director, [email_address] IBM CAS Dublin https://www.ibm.com/ibm/cas/sites/dublin/ LanguageWare http://www.ibm.com/software/globalization/topics/languageware/index.jsp NEPOMUK http://nepomuk.semanticdesktop.org/
Questions?
BRICKS Project Predrag Knežević Fraunhofer IPSI Institute Darmstadt, Germany [email_address]
What is BRICKS? A software infrastructure for building digital library networks Transparent access to distributed resources Multilinguality Easy installation & maintenance A set of end-user applications Network & content management Web 2.0 Tagging/Annotations Domain specific applications A business model Open Source, Platform Independent Low cost infrastructure User communities: sustainability
Sustainability User Communities Open Source Applications User App. Build on top of the foundation User Services can become Foundation services Foundation/Infrastructure Decentralized Storage Content&Metadata Mngt. Semantic Retrieval Security/DRM BRICKS
BRICKS Architecture A decentralized P2P network Avoid central coordination Highly Scalable, increased reliability Minimized maintenance costs Each P2P Node is a set of SOA components Web Service Interface Platform Independent Flexible Composition Components for Storing, accessing and protecting digital objects (Semantic) search & browsing P2P communication
Accessing Data
A Look into a BNode
Features Application development in any language with a good Web-service support Metadata Support for various schemas Indexed both locally and published in decentralized index as well Annotations Support for various media types (text, images, audio, video)  Various supported types (text, audio, video, spatial, temporal) Content Can be stored outside of BNode Internally content can be managed in various binary and structured (XML) formats Organized into collections Location transparent for applications Search Simple, advanced, ontology-based Cross-language support Addresses all available content
Collection Manager Single access point for all content and metadata related operations (local and remote) Physical Collection Similar to folder/directory hierarchy in a file system Bound to a single BNode Each digital content object belongs to exactly one collection Logical Collection Virtual folder for organizing content items independent of their physical location  Links to content items from various physical collections on different BNodes A content item might belong to many of them Stored Query similar to database views
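The difference between physical and logical collections can be sketched as a small data model. This is an illustration only, not the BRICKS API: every class, method and identifier below is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

ContentRef = Tuple[str, str]  # (BNode id, content id); hypothetical addressing scheme


@dataclass
class PhysicalCollection:
    """Folder-like collection bound to a single BNode."""
    bnode: str
    name: str
    items: Set[str] = field(default_factory=set)


@dataclass
class LogicalCollection:
    """Virtual folder holding links to items in any physical collection."""
    name: str
    links: List[ContentRef] = field(default_factory=list)


class CollectionManager:
    def __init__(self) -> None:
        # Each content object belongs to exactly one physical collection.
        self.owner: Dict[ContentRef, PhysicalCollection] = {}

    def store(self, collection: PhysicalCollection, content_id: str) -> ContentRef:
        ref = (collection.bnode, content_id)
        if ref in self.owner:
            raise ValueError(f"{content_id} already belongs to {self.owner[ref].name}")
        collection.items.add(content_id)
        self.owner[ref] = collection
        return ref

    def link(self, logical: LogicalCollection, ref: ContentRef) -> None:
        if ref not in self.owner:
            raise KeyError(f"unknown content item {ref}")
        logical.links.append(ref)


manager = CollectionManager()
coins = PhysicalCollection(bnode="bnode-a", name="coins")
maps = PhysicalCollection(bnode="bnode-b", name="maps")
coin_ref = manager.store(coins, "coin-17")
map_ref = manager.store(maps, "map-03")

exhibition = LogicalCollection("online exhibition")
favourites = LogicalCollection("favourites")
manager.link(exhibition, coin_ref)
manager.link(exhibition, map_ref)
manager.link(favourites, coin_ref)  # one item may sit in many logical collections
print(exhibition.links, favourites.links)
```

The logical collection only stores references, so it can span BNodes without moving any content.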
Content Manager Two ways to handle Content in BRICKS stored locally at site of a member party, accessed via URL stored within BRICKS Based on Java Content Repository (JCR) Provide a meta-content model Re-use of existing content models Use standard models
Metadata Manager Metadata descriptions: RDF Suitable for any application scenario Express Relationships between objects React to changes without changing the model Schema definitions: OWL No fixed schema Extensible (e.g. Application Profiles) Semantic concepts instead of schematic structures SPARQL Metadata queries over ontology concepts Queries for graph patterns
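"Queries for graph patterns" can be illustrated without a SPARQL engine. The sketch below is a toy matcher in plain Python, not BRICKS code and not SPARQL itself: the triples, the ex:studentOf property and the function names are assumptions. Terms starting with "?" act as variables, as in a SPARQL basic graph pattern.

```python
# Hypothetical metadata triples.
TRIPLES = {
    ("book1", "dc:creator", "alice"),
    ("book2", "dc:creator", "bob"),
    ("alice", "ex:studentOf", "decker"),
    ("bob", "ex:studentOf", "smith"),
}


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(patterns, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in TRIPLES:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, candidate)


# "All publications written by students of Decker" as a two-pattern query.
query = [
    ("?doc", "dc:creator", "?author"),
    ("?author", "ex:studentOf", "decker"),
]
results = sorted(b["?doc"] for b in match(query))
print(results)
```

The shared variable ?author is what joins the two patterns; a real engine would add indexing and query planning on top of the same idea.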
Annotation Management Rich model Supported fragment types: “Text fragment”, “Time fragment”, “Rectangle”, “Circle”, “Point”, “Polygon” and “Polyline” Supported annotation types: “Structured Annotation”, “Association”, “Text annotation” and “Symbol Annotation” Annotation type “association” supports n:m relations Support of versioning Annotation of complete objects and of fragment of objects Supports annotation of multiple objects 13/03/2007
Security Manager Transparently invoked by the Framework any service call is checked Context-aware policies based on RBAC (via XACML rules), supporting Roles, Groups, at DLObject level Permission declaration through Javadoc @tags Federated identity is managed through an adapted version of OpenSAML Reputation-based Trust calculation integrated Web-based GUI for Security configuration
Digital Rights Management DRM Component Support for licenses based on MPEG-21 REL license declaration standard Generic API for the integration of commercial DRM systems Watermarking Open-source watermarking tool for images other tools can be integrated BRICKS Store web application for commercial content Creative Commons support for other content in BRICKS
BRICKS APPLICATIONS
Application: BRICKS Workspace  What does it demonstrate? a web application (thin client) accessing BRICKS Foundation services Web 2.0 image annotations Reference application Primary customers? general end-users (citizens) application developers Technology Struts based interface to the BCH Live demo at http://saturn.researchstudio.at:8090/workspace
Application: BRICKS Desktop  What does it demonstrate? a rich client application accessing BRICKS Foundation services direct access to the BCHN Primary customers? expert end-users (researchers, educators) application developers Technology Eclipse based rich client interface Download at http://develop.bricksfactory.org/projects/desktop
Application: Annotation Tool What does it demonstrate? Tool which allows end-users to annotate images Creation of annotation threads Supervised Annotations Primary customers? end-users Institutions with large image collections Technology Web Application
Application: Online Exhibition Authoring Tool What does it demonstrate? Creating and publishing  online exhibitions using contents that  is available  in the BRICKS network Primary customers? expert end-users (curators) Technology Web Application Live demo at http://livingmemory.researchstudio.at/
Application: Archeological Finds Identifier What does it demonstrate? a web application for comparing  found objects (e.g. ancient coins)  with objects from reference  collections  Application of complex domain ontology (CIDOC-CRM) Map visualization of GIS-Metadata Primary customers? Museum curators, archaeologists, students, amateurs Technology Struts based interface Live Demo at http://finds.brickscommunity.org:8091/findsidentifier/index.do
BRICKS Demo Store What does it demonstrate? Purchasing digital goods License maintenance and proofing Primary customers? Content providers Technology Based on OFBiz Live demo at http://brstore.metaware.it:9080/ecommerce/control/main
References BRICKS Community Web Site ( http://www.brickscommunity.org ) BNode Release Downloads ( http://foundation.bricksfactory.org ) BRICKSforge ( http://develop.bricksfactory.org ) BRICKS Developer Community ( http://dev.brickscommunity.org )
Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet Digital Libraries in Ireland http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit
Building the Future IVRLA   -  The Irish Virtual Research Library and Archive Project - an infrastructure for humanities research.  OJAX  – A Web 2.0 Search user Interface  S 3 B  -  With a Little Help from My Friends: Social Semantic Search and Browsing
The Irish Virtual Research Library and Archive Project - an infrastructure for humanities research.
Outline Quick Overview Digitisation Processes Repository Development Content Models IVRLA Deployment Observations
IVRLA Positioning PRTLI funded project Component of UCD Humanities Institute of Ireland and based in UCD Library Supporting research through offering access to digitised content from participating primary source repositories Direct research into digitisation and digital repositories Developing and promoting added value tools and services
IVRLA Deliverables Body of digitised content Functioning repository prototype with scalable infrastructure Comprehensive report including regulatory and financial issues Body of corporate knowledge & expertise Centre of excellence Proof of concept
The vision thing
Digital content repositories should… Support the creation and publication of new forms of “information units”. Integrate with the processes (e.g., workflows) of research, collaboration, and scholarly communication. Enable knowledge integration: capture semantic and factual relationships among information entities. Promote information re-use and contextualization. Facilitate collaborative activity and capture information that is created as a byproduct of it. Capture and maintain the complex structural, semantic, provenance, and administrative relationships among digital resources.* * Sandy Payette, Sydney 2006.
Digitisation and Cataloguing Processes
Image based Digitisation Components Apple PowerMac G5 running Kodak oXYgen Scan  Kodak IQSmart 2 Adobe Photoshop CS2
Audio Digitisation Components Quadriga system Lake People ADC and DAC Revox 1/4 inch tape player
Files and Formats
Scanned Material (text and images): TIFF (PM), JPEG (CW), DjVu (CW), JPEG (TN)
Time Based Material, Audio: BWF (PM), MP3 / MP4 (CW)
Time Based Material, Video: Linear Digital (PM), mov, wmv? (CW)
Workflows TIFFs have metadata embedded TIFFs are backed up to LTO Photoshop macros used to watermark, create JPEG and TIFFs DVDs created and stored  Additional derivatives created for resource discovery and access
Data Storage 3 high quality ‘Preservation Master’ copies 2 DVD-ROM - working 1 LTO - deep archive Copies stored in geographically disparate locations Estimate that IVRLA will require 6-8TB for all preservation master storage. Scans ~ 80MB Audio ~ 800MB/hr Online requirement is significantly less
Metadata and Database 2 stage cataloguing database MODS - descriptive metadata METS - structural and transmission metadata EAD - archival context and structure MIX - technical metadata for images MADS - descriptive metadata authority files
 
MADS files
Collection Model Libraries use the OPAC for searching Archives use finding aids for browsing Hybrid model to enable searching and browsing of complex hierarchical digital collections Model facilitates top down and bottom up approaches EAD provides context and structure MODS provides precision and accuracy Create EAD template for each ‘collection’ Catalogue to the appropriate level
 
 
Repository Architecture Articulation
Open Source Repository Systems Growing area of development Several options available: DSpace, EPrints, Fedora IVRLA required a solution which offers: Suitability for wide range of data types Support for collection structures and complex objects Scalability - prototype into service Future-proof architecture Long term digital preservation
Fedora Service Framework (2005-07) © S.Payette
IVRLA Preservation Requirements Audit trails and datastream versioning  Persistent Identifiers  Checksum creation and validation Whole object versioning OAIS compliance TDR compliance
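The "checksum creation and validation" requirement can be illustrated with a minimal sketch. It assumes SHA-256 and the hypothetical helper names `create_checksum` and `validate_checksum`; the slides do not say which algorithm or tooling IVRLA or Fedora actually uses.

```python
import hashlib


def create_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat for large preservation
    masters (the slides mention scans of roughly 80MB each).
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_checksum(path, expected, algorithm="sha256"):
    """Recompute the digest and compare it with the stored value."""
    return create_checksum(path, algorithm) == expected
```

A repository would typically store the digest at ingest and re-run the validation on a schedule, recording each result in the audit trail.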
IVRLA interface requirements Evidence Provenance, authenticity, integrity, context, persistence, sustainability Granularity - directed to page, clip, part .. Security, authentication and authorisation infrastructure Conversation/Participation Informal, collaborative Personalisation and customisation Recommendation Services (S/CSI) Social searching and annotation (S/CSI - S/ILS) Add value, links, connections…
Content Models
Fedora Content Models A definition for a “type” of object (e.g., article, book, image, learning object) that describes the internal composition of a group of similar Fedora objects Data Type Structure Services Data Type defines payloads and metadata Structure defines relationships between objects  Services define actions or disseminators for the content
 
 
 
RoadMap Initial Research and Demo Develop utilities - sipMaker and mixMaker Articulate collection model Develop Virtual Library and Archive 1.0 Browse Search View Cite Tag Ingest Trial and deployment of subsets Develop Virtual Library and Archive 2.0 User management Personalisation, customisation Recommendation services Annotation and tagging Research space Virtual collections
IVRLA 1.0?
Usership Research based Context heavy - accuracy, integrity and authenticity Technically literate with Internet age expectations - the Google effect Accurate citation and source acknowledgement using persistent identifiers
Repository Challenges Architecture is not an ‘out of box’ solution Resources required to articulate and develop interface layer(s) Metadata management is complex Tension between popular delivery formats and archival preservation formats Challenge of anticipating all user environments in content modeling Improved automation is necessary for ingest and validation.  Digitisation is the main bottleneck Sustainability - prototype developed into a service Human resources are central to technology projects Developing and training data curators - multidisciplinary skill sets
Observations and Conclusion 5 year project timeline requires an iterative process  New advances in computing science will influence developments - eScience, eHumanities, Web 2.0 IVRLA positions the archival source with all context and structure as central to the digital deployment Define and build core sources which can be interrogated and integrated with dynamic services Standards based interoperability is key to ensure future accessibility and sustainability New repository models suggest and support user created metadata such as social bookmarking and annotating
Further Information www.ucd.ie/ivrla [email_address]
OJAX:  Web 2.0  Federated search   Judith Wusteman April 2007
Overview Introducing OJAX OJAX Demo  Related research
Web 2.0 Technologies and Standards used in OJAX AJAX REST JSON Atom OAI-PMH OpenSearch Open API StAX Apache Lucene
http://ojax.sourceforge.net/
OJAX
OJAX demo
Unifying the user interface
Auto-completion Auto-search Dynamic archive list
Dynamic scrolling
Auto-expansion of results
Sorting results
OpenSearch
OpenSearch Enables search engines to describe their search syntax to browsers Describes standards for search results syntax Based on RSS and Atom
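To make "describe their search syntax" concrete: an OpenSearch description document publishes a URL template with placeholders such as `{searchTerms}`, where a trailing `?` marks an optional parameter. The sketch below fills such a template. The `example.org` endpoint and the helper name `fill_template` are assumptions for illustration, not part of OJAX.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; the "?" suffix marks an optional parameter.
TEMPLATE_PARAM = re.compile(r"\{([A-Za-z:]+)(\?)?\}")


def fill_template(template, **values):
    """Substitute OpenSearch-style placeholders in a URL template."""
    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters may be left empty
        raise KeyError(name)  # required parameter was not supplied
    return TEMPLATE_PARAM.sub(substitute, template)


# Hypothetical template, as it might appear in a description document.
template = "http://example.org/search?q={searchTerms}&page={startPage?}&format=atom"
url = fill_template(template, searchTerms="digital libraries")
```

A client that understands the template can therefore query any compliant archive without knowing its native query syntax.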
Atom feed support
 
 
 
 
Accessibility
Science Foundation Ireland: OJAX++: a next generation  collaborative research tool To investigate how concepts from the Social Web can be applied to the research environment in order to facilitate dynamic collaboration and the sharing of ideas among researchers.
PhD starting September 2007  In collaboration with  UCD School of Computer Science and Informatics Requirements  Honours degree (preferably first class or 2.1)  in Computer Science or a related field or equivalent technical expertise Preferred Experience :  Web technology  JavaScript  AJAX  one of Java, Ruby or Python.  http://www.ucd.ie/wusteman [email_address] .
Advantages of OJAX Developed in Ireland. Can be adapted to suit. Already in Beta version. Available for download. Well received. Responds to new user expectations generated by Web 2.0. Rich, dynamic user experience. Intuitive interface. Integration, interoperability and reuse. Open source, standards-compliant, including OpenSearch, OAI-PMH, StAX and Apache Lucene.
http://ojax.sourceforge.net/
With a Little Help from My Friends Social Semantic Search and Browsing Sebastian Ryszard Kruk, Adam Gzella Digital Enterprise Research Institute National University of Ireland, Galway sebastian.kruk@deri.org, adam.gzella@deri.org http://s3b.corrib.org/
Take away message We search in different ways for different things Keyword search is not enough We create the knowledge by sharing our (search) experience
Outline Motivation How do people search Search and Browsing lifecycle Applying semantics and making use of social networks: Keyword-based search Collaborative Faceted Navigation Collaborative Filtering Conclusions - Putting it all together
How do people search? Different user goals: Resource Seeking  - the user wants to find a specific resource (e.g. lyrics of a song, a program to download, a map service etc.) Navigational  - the user is searching for a specific web site whose URL s/he forgot Informational  - the user is looking for information about a topic s/he is interested in Rose and Levinson:  Understanding user goals in web search (2004)
Search and browsing lifecycle Why? Information can be useful Information can be garbage How? (Search and browsing actions) [REUSE] keyword-based search (resource seeking) [REDUCE] faceted navigation (navigational) [RECYCLE] collaborative filtering (informational) Can this process be improved with Semantic Web and Social Networking technologies?
Query refinement in keyword-based search Why is simple full-text search not enough? Too many results (low precision) One needs to specify the exact keyword (low recall) How to distinguish between Python and python? (high fall-out) How? Disambiguation through a context Query context Short-term context: User’s goal Location Time Long-term context: User’s interest Search engine specific
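The three measures named on this slide have standard definitions. The sketch below computes them for one query, using made-up document identifiers as an assumption for illustration.

```python
def retrieval_metrics(retrieved, relevant, collection):
    """Return (precision, recall, fall_out) for one query.

    precision = relevant retrieved / all retrieved
    recall    = relevant retrieved / all relevant
    fall_out  = non-relevant retrieved / all non-relevant
    """
    retrieved, relevant, collection = set(retrieved), set(relevant), set(collection)
    true_positives = len(retrieved & relevant)
    false_positives = len(retrieved - relevant)
    non_relevant = collection - relevant
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    fall_out = false_positives / len(non_relevant) if non_relevant else 0.0
    return precision, recall, fall_out


# Hypothetical collection of ten documents, four of them relevant.
collection = [f"d{i}" for i in range(10)]
relevant = ["d0", "d1", "d2", "d3"]
retrieved = ["d0", "d1", "d4", "d5", "d6"]
precision, recall, fall_out = retrieval_metrics(retrieved, relevant, collection)
```

Here two of the five results are relevant (precision 0.4), two of the four relevant documents were found (recall 0.5), and three of the six non-relevant documents were returned (fall-out 0.5).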
Query refinement in keyword-based search How ?  Query  refinement Spread activation Types mapping Pruning Acquiring the  context information : Previous searches of the user Semantically annotated user’s bookmarks Community profile And ? (Manual query refinement) “ Tell me why ” button and  the transcript of refinement process Continue to  faceted navigation
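The "spread activation" and "pruning" steps can be sketched as follows. This is a generic spreading-activation loop under assumed parameters (`decay`, `steps`, `threshold`) and a hypothetical concept graph; the slides do not specify the algorithm actually used in the authors' system.

```python
def spread_activation(graph, seeds, decay=0.5, steps=2, threshold=0.1):
    """Propagate activation from seed concepts through a weighted graph.

    graph maps a node to {neighbour: edge_weight}. Energy passed along an
    edge is energy * weight * decay; contributions below threshold are
    pruned. Returns (node, activation) pairs, strongest first.
    """
    activation = {node: 1.0 for node in seeds}
    frontier = dict(activation)
    for _ in range(steps):
        next_frontier = {}
        for node, energy in frontier.items():
            for neighbour, weight in graph.get(node, {}).items():
                passed = energy * weight * decay
                if passed < threshold:
                    continue  # pruning: too weak to be worth following
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + passed
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    return sorted(activation.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical context graph: the edge weights stand in for a user profile
# in which "python" is far more likely to mean the programming language.
graph = {
    "python": {"programming language": 1.0, "snake": 0.3},
    "programming language": {"java": 0.8},
    "snake": {"reptile": 1.0},
}
ranked = spread_activation(graph, ["python"])
names = [name for name, _ in ranked]
```

With these weights the refinement ranks "programming language" and "java" above "snake", and "reptile" is pruned entirely, which is the kind of context-driven disambiguation the slide describes.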
Collaborative Faceted Navigation Why? The search does not end on a (long) list of results The results are not a list (!) but a graph We lose context with linear navigation A need for unified notion (UI, Services) of filter/narrow and browse/expand services Share browsing experience – navigate collaboratively How (Services)? Defines REST access to services and their composition Basic services: access, search, filter, similar, browse, combine Meta services: RDF serialization, subscription channels, service ID generation Context services: manage contexts, manage service calls/compositions in the context, list contexts Statistics services: properties, values, tokens
Collaborative Faceted Navigation How (User interface)? Hexagons to capture the notion of non-linear history of browsing Selecting values from list, tag cloud or TagsTreeMap™ Context zoomable interface: List (graph) of results Browse from current results Navigate between service calls Navigate between contexts (with given call)
Social Semantic Collaborative Filtering Why? The bottom line of acquiring knowledge: informal communication (“word of mouth”) How? Everyone classifies (filters) the information in bookmark folders (user-oriented taxonomy) Peers share (collaborate over) the information (community-driven taxonomy) Result? Knowledge “flows” from the expert through the social network to the user The system amasses a lot of information on user/community profile (context)
Social Semantic Collaborative Filtering Problems? The  horizon of a social network  (2-3 degrees of separation) How to handle  fine-grained information  (blogs, wikis, etc.) Solutions?  Inference engine to  suggest knowledge  from the outskirts of the social network Support for  SIOC metadata : Semantically Interlinked Online Communities: blogs, wikis, fora, … SIOC browser in SSCF Annotations and evaluations of “local” resources
Putting it all together user profile: recent actions refine search results filter, record, annotate, and share results and actions re-call shared actions user profile: user’s interests filter, record,  annotate,  and share results
Do we need Semantic Web and Web 2.0 technologies in Digital Libraries?   Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit
Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet Conclusions http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit

Irish Digital Libraries Summit

  • 11.
    Real-life problems Heterogeneous systems Identified problems: Interoperability Format translation Multiple data formats in DL: How to support them? How to translate between them? Who should create mappings?
  • 12.
    Real-life problems – user’s expectations Searching: Effective and Accurate We want correct and fast answers! Intuitive and Simple Asking questions should be easy. Meaning Jaguar – a car or an animal? Reasoning Give me articles written by students of X in Galway. Identified problems: Intuitive interface for asking complex queries
  • 13.
    Real-life problems - summary Digital Libraries should provide: Interoperability Support for many formats Complex search features Intuitive interfaces
  • 14.
  • 15.
    The Semantic Web – A Brief Introduction Current Web vs. Semantic Web? An extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. [Tim Berners-Lee] Current Web was designed for humans, and there is little information usable for machines Was the Web meant to be more? Objects with well defined attributes as opposed to untyped hyperlinks between Internet resources A network of relationships amongst named objects, yielding unified information management tasks What do you mean by “Semantic”? The semantics of something is the meaning of something The Semantic Web is able to describe things in a way that computers can understand
  • 16.
    Semantic Web vs. Current Web Current Web Semantic Web
  • 17.
    The Semantic Web – What is RDF? Describing things on the Semantic Web RDF (Resource Description Framework) a data format for describing information and resources, the fundamental data model for the Semantic Web Using RDF, we can describe relationships between things like: A is a part of B or Y is a member of Z and their properties (size, weight, age, price…) in a machine-understandable format RDF graph-based model delivers straightforward machine processing Putting information into RDF files makes it possible for “scutters” or RDF crawlers to search, discover, pick up, collect, analyse and process information from the Web
  • 18.
    The Semantic Web – What is RDF? A simple RDF example Statement: “Stefan Decker is the creator of the resource (web page) http://www.stefandecker.org” Structure: Resource (subject) http://www.stefandecker.org Property (predicate) http://purl.org/dc/elements/1.1/creator Value (object) “Stefan Decker” Directed graph: http://www.stefandecker.org dc:creator Stefan Decker
  • 19.
    The Semantic Web – How can RDF help us? identify objects establish relationships express a new relationship: just add a new RDF statement integrate information from different sources: copy all the RDF data together RDF allows many points of view
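The claims that a new relationship is "just a new statement" and that integration means "copying all the RDF data together" can be illustrated with plain Python sets of (subject, predicate, object) tuples. This is a sketch of the data model only, not an RDF library. The first statement is the Stefan Decker example from the previous slide; the `dc:title` statement and its value are hypothetical.

```python
DC = "http://purl.org/dc/elements/1.1/"
PAGE = "http://www.stefandecker.org"

# Source A: the statement from the slide's example.
source_a = {(PAGE, DC + "creator", "Stefan Decker")}

# Source B: a second, hypothetical statement about the same resource.
source_b = {(PAGE, DC + "title", "Home page")}

# Integrating two sources is a set union of their statements.
merged = source_a | source_b

# Expressing a new relationship is adding one more statement.
merged.add((PAGE, DC + "language", "en"))


def objects(graph, subject, predicate):
    """Return every object stated for (subject, predicate), sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


creators = objects(merged, PAGE, DC + "creator")
```

Because both sources use the same URI for the page, their statements line up without any schema mapping, which is the point the slide makes about identifying objects.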
  • 20.
    Ontologies What is an Ontology? “An ontology is a specification of a conceptualization.” Tom Gruber, 1993 Ontologies are social contracts Agreed, explicit semantics Understandable to outsiders (Often) derived in a community process Ontology markup and representation languages: RDF and RDF Schema OWL Other: DAML+OIL, EER, UML, Topic Maps, MOF, XML Schemas
  • 21.
    Components of ontologies Concepts Book Article Author Properties hasPages hasTitle Constraints Cardinality is at least 1 Maximum value is 200 Axioms Planes can fly People can’t fly Relationships Is a Part of
  • 22.
    Ontologies - half-time conclusions Data is not only human readable, it is now also machine readable Machines can realize much more complex tasks (e.g., reasoning) Capturing the meaning of concepts is possible A new look on data storage systems (there are no data structures!) Advantages
  • 23.
    Use case scenario Author Title Structured resources: Author Title Data storage allows: Author Title Additional information cannot be stored! Author Title Date Title Author Regular Systems Author Title Date
  • 24.
    Ontology development process Many approaches Different life cycles Continuous process Involves community of users Requires tools for collaboration Tools for ontology development are necessary Development
  • 25.
    MarcOnt Initiative Motivation: Build a bibliographic ontology for the Jerome Digital Library MarcOnt Initiative goals: Deliver a set of tools for collaborative ontology development Collaboration Tools for domain experts Enable mediation between formats (MMS)
  • 26.
    MarcOnt Ontology Central point of MarcOnt Initiative Translation and mediation format Continuous collaborative ontology improvement Knowledge from the domain experts Community influence and evaluation
  • 27.
    MarcOnt Ontology Goals: Capture concepts from the legacy bibliographic formats MARC21, BibTeX, Dublin Core, Lattes, ... Create a uniform bibliographic description format for digital libraries. Enable the use of Semantic Web technologies (e.g., reasoning) to improve capabilities of digital libraries Improve interoperability
  • 28.
    Format Translation Scenario Author: John Smith Date of Birth: 1956-10-15 Date of death: 2004-09-10 Author: John Smith Date of Birth: ?? Date of death: ?? Author: John Smith Date of Birth: ?? Date of death: ?? Author: John Smith Date of Birth: ?? Date of death: ?? Dublin Core
  • 29.
    Format Translation Scenario Author: John Smith Date of Birth: 1956-10-15 Date of death: 2004-09-10 Author: John Smith Date of Birth: ?? Date of death: ?? Author: John Smith Date of Birth: ?? Date of death: ?? Author: John Smith Date of Birth: 1956-10-15 Date of death: 2004-09-10 RDF Storage Dublin Core Author: John Smith Date of Birth: 1956-10-15 Date of death: 2004-09-10 Author: John Smith Date of Birth: 1956-10-15 Date of death: 2004-09-10
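The two translation scenarios can be contrasted in a small sketch. It assumes, as the slides depict, that the Dublin Core hop carries only the author, while the RDF storage hop carries every field. The field names and the `FORMAT_FIELDS` table are hypothetical simplifications, not the MarcOnt Mediation Services API.

```python
ALL_FIELDS = ("author", "date_of_birth", "date_of_death")

# Assumption: each format is modelled only by the set of fields it can hold.
FORMAT_FIELDS = {
    "dublin_core": {"author"},
    "rdf_storage": set(ALL_FIELDS),
}


def export_record(record, target_format):
    """Keep only the fields the target format is able to represent."""
    supported = FORMAT_FIELDS[target_format]
    return {key: value for key, value in record.items() if key in supported}


def import_record(exported):
    """Rebuild a full record; fields lost on the way come back as '??'."""
    return {field: exported.get(field, "??") for field in ALL_FIELDS}


def round_trip(record, via):
    return import_record(export_record(record, via))


original = {
    "author": "John Smith",
    "date_of_birth": "1956-10-15",
    "date_of_death": "2004-09-10",
}
via_dublin_core = round_trip(original, "dublin_core")
via_rdf_storage = round_trip(original, "rdf_storage")
```

Passing through the narrower format loses both dates, whereas mediating through a store that can hold every field returns the record intact.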
  • 30.
  • 31.
    MarcOnt Mediation Services Format translation Interoperability MarcOnt Mediation Services RDF Translator
  • 32.
    MarcOnt Ontology in JeromeDL Improvement of searching capabilities Natural Language Processing (NLP) Templates Show me all publications written by students of Decker.
  • 33.
    MarcOnt Portal Collaborative ontology development. Portal provides: Suggestions Annotations Versioning Ontology editor
  • 34.
    MarcOnt Portal On-line ontology editing Visualization of ontologies
  • 35.
    MarcOnt Portal Comparing versions of ontologies
  • 36.
    MarcOnt Initiative Roadmap Lattes – CV platform used in Brazil Release of MarcOnt draft ontology Digital Rights Management Sharing issues MarcOntX agent – automatic integration of concepts from Digital Libraries
  • 37.
    MarcOnt Initiative summary MarcOnt Initiative goals: Create a framework for collaborative ontology development Provide domain experts with tools to share their knowledge Offer tools for data mediation between different data formats Develop MarcOnt bibliographic ontology Create a community of users (domain experts)
  • 38.
    Conclusions Ontologies: can improve the most important goal of digital libraries – searching the information facilitate interoperability capture much more information (metadata) than existing systems are the agreement of people (domain experts) need tools for collaborative development and community of users are the future of Digital Libraries?
  • 39.
    Tomasz Woroniecki [email_address] JeromeDL Building a Semantic Digital Library
  • 40.
    Outline of the presentation Introduction to Semantic Digital Libraries Overview of JeromeDL Architecture of JeromeDL Working with JeromeDL Demo
  • 41.
    Social Semantic Digital Library A library stores and provides access to resources (books) Qualified staff updates catalogues and helps users
  • 42.
    Social Semantic Digital Library Machine-readable resources Full-text index improves searching Easy access Availability
  • 43.
    Social Semantic Digital Library Resources are accessible by machines, not only with machines Metadata is rich and extensible Searching reflects meaning of terms RDF is a standard for representing information Not just resources but also knowledge is shared
  • 44.
    Social Semantic Digital Library Involves the community in sharing knowledge Utilizes social network in searching Allows for comments, blogs, shared bookmarks Easy tagging
  • 45.
    Evolution of Libraries Social Semantic Digital Library Involves the community in sharing knowledge Semantic Digital Library Accessible by machines, not only with machines Digital Library Online, easy searching with a full-text index Library Organized collection
  • 46.
    Semantic Digital Library Semantic digital libraries integrate information based on different metadata, e.g.: resources, user profiles, bookmarks, taxonomies provide interoperability with other systems (not only digital libraries) deliver more robust, user friendly and adaptable search and browsing interfaces empowered by semantics
  • 47.
    JeromeDL - Motivations Support for different kinds of bibliographic metadata, like: Dublin Core, BibTeX and MARC21 at the same time. Making use of existing rich sources of bibliographic descriptions (like MARC21) created by humans. Supporting users and communities: users have control over their profile information; community-aware profiles are integrated with bibliographic descriptions; support for community generated knowledge Delivering communication between instances: P2P mode for searching and users authentication Hierarchical mode for browsing
  • 48.
    JeromeDL – Social Semantic Digital Library JeromeDL fulfills requirements of: Librarians precise annotations rich metadata Researchers easy publishing searching related topics Average users efficient search and browsing online collaboration
  • 49.
  • 50.
  • 51.
    Using JeromeDL Uploading a resource provide title, abstract, author etc. provide structure of the resource (e.g., chapters) choose domains of the subject choose keywords for the resource set additional properties upload digital parts of the resource
  • 52.
  • 53.
    Using JeromeDL An administrator either approves or rejects a published resource
  • 54.
  • 55.
    JeromeDL for a regular user Browsing resources by type, author, keyword, domain Downloading the resource and its bibliographic description in various formats Subscribing to RSS feeds Searching simple, advanced, distributed, semantic
  • 56.
    JeromeDL for a regular user
  • 57.
    Summary An easy solution for putting resources online A community around your repository Support for many languages Integration with Bibster and OpenSearch protocols Visit www.jeromedl.org
  • 58.
    Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet Future of Digital Libraries http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit
  • 59.
    Looking into the Future of Irish Digital Libraries ? ?
  • 60.
    Building the Future The future Internet, semantic or social, or both, will not emerge on its own; we need to build it
  • 61.
    Building the Future Digital libraries are an important part of the Internet
  • 62.
    Building the Future Libraries should continue to drive the changes, not only follow
  • 63.
    Building the Future OnNeM - IBM Ontological Network Miner and its applications to semantic social networks BRICKS Project – Building Resources for Integrated Cultural Knowledge Services
  • 64.
    IBM CAS Dublin/ LanguageWare group Ontological Network Miner and its applications to models of social networks and semantics Alexander Troussov, Mikhail Sogrin, John Judge
  • 65.
    Agenda Ontological Network Miner tool (project Galaxy) As generic tool to perform elements of soft clustering and fuzzy inference on semantic networks Applications of Galaxy to ontology-based semantic analysis of texts Semantic tagging, term disambiguation based on the global context Galaxy applications to folksonomies Community detection/Expertise location, … Applications to unified models of semantic social networks Research cooperation
  • 66.
    Ontological Network Miner (Galaxy) A generic tool to perform elements of soft clustering and fuzzy inference on semantic networks Ongoing project based on the work we have done for EU 6th Framework integrated project Nepomuk
  • 67.
    Applications to metadata generation
  • 68.
    Applications to metadata generation Currently the semantic web relies on semantic annotation mostly done manually by humans Working in EU 6th Framework project Nepomuk (which aims to build social semantic desktop) we at IBM Dublin developed a tool for automation of metadata creation: Automatic ontology-based conceptual tagging (central concepts of the text with respect to the given lexico-semantic resource) Text which mentions Mulhuddart, Lansdowne, Clontarf is probably about Dublin/Ireland/Europe/Earth, this fact can be inferred from geographical relations like Mulhuddart “is-part-of” Dublin Disambiguation of terms Based on the ontological knowledge from corresponding resource (Jaguar – a car or an animal? Jaguar, car, animal, pet, …) …
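The geographical inference in this example can be sketched with a plain lowest-common-ancestor lookup over "is-part-of" relations. This is a deliberately simple stand-in under an assumed, hand-written `PART_OF` table; the actual tool is described as soft clustering with fuzzy inference, which this sketch does not reproduce.

```python
# Assumption: a tiny hand-written fragment of a geographical ontology.
PART_OF = {
    "Mulhuddart": "Dublin",
    "Lansdowne": "Dublin",
    "Clontarf": "Dublin",
    "Dublin": "Ireland",
    "Ireland": "Europe",
    "Europe": "Earth",
}


def ancestors(place):
    """Follow is-part-of links upwards, nearest container first."""
    chain = []
    while place in PART_OF:
        place = PART_OF[place]
        chain.append(place)
    return chain


def focus_concept(mentions):
    """Return the most specific concept that contains every mention."""
    chains = [ancestors(mention) for mention in mentions]
    if not chains:
        return None
    shared = set(chains[0]).intersection(*chains[1:])
    for candidate in chains[0]:  # nearest-first, so the first hit is most specific
        if candidate in shared:
            return candidate
    return None


focus = focus_concept(["Mulhuddart", "Lansdowne", "Clontarf"])
```

The three mentions share Dublin, Ireland, Europe and Earth as containers, and the most specific of these, Dublin, is chosen as the focus concept.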
  • 69.
    Automatic tagging based on concept mentions NETWORK OF CONCEPTS TEXT Mapping of term mentions to concepts. Finding “focus” concept Mention Mention Mention Mention
  • 70.
    DEMO (Lotusphere 2007) Run eclipse.exe Open lotusphere_demo.config.xml located in subfolder data Have a look at the underlying personal information management ontology people, organisations, projects, Open text: email1.anno Text is processed on the fly, terms are disambiguated, central concepts are shown in the upper-right window Why US? Because most found concepts are people, and during disambiguation it was established that most likely referents of (ambiguous) names are located in US Let us remove first line with two names The text now has fewer names. Instead of people, other (abstract) concepts now play a more prominent role. Because of this (after a small delay caused by Eclipse, not by the performance of our system) US disappears as the top concept
  • 71.
    What is Ontological Network Miner? Text analytics demo shown before has applications to: Context dependent smart tags Metadata generation Although text processing is a complex process involving mapping from text to concepts and usage of empirics specific to certain properties of the discourse, at the heart of the processing is clustering on the graph of concepts Which was shown by the animation when wide orange area becomes smaller after “magical” shrinking This clustering is provided by IBM Dublin Ontological Network Miner Codenamed OnNeM in Nepomuk project
  • 72.
    What exactly does Ontological Network Miner do? One algorithm (a blend of soft clustering & fuzzy inference) Depending on the parameters, this algorithm provides “Generalisation” of the model Output has fewer nodes compared to the input “Expansion” of the model Which might be used for query expansion: Query “nutrition”+“science” is expanded into properly ranked list: nutritionist, dietologist, nutritional, scientific, … Our customers and partners can tune the algorithm for specific tasks using intuitively clear parameters.
  • 73.
    Tuning Galaxy Galaxy utilises a data-driven algorithm and, more importantly, tuning can be done by a domain specialist (not necessarily a researcher or software developer). Tuning means “telling” Galaxy what properties of the underlying semantic network are relevant to a particular task: For example, in application to geotagging the user might specify that Galaxy favour geographical locations with bigger populations, and, in addition, favour popular resorts Using WordNet – specify that Galaxy must favour hypernymy-hyponymy relations and disfavour meronymy-holonymy relations Researchers (IBMers and CAS scientists) also have the opportunity to work with us on “fine-tuning” the algorithm For example, to improve usage of graph-metrics such as in-/out- degree of nodes
  • 74.
    Applications to folksonomy systems (Del.icio.us, IBM’s Dogear, …)
  • 75.
    Folksonomies as ontological networks People Documents Tags Instances of tagging
  • 76.
    Why a “generic” ontological network miner is needed: Objects of interest might be wired into one unified model of lexicon, semantic and social networks For example, the network depicted on the previous slide can be augmented with new entities and new relations One can add relations between participants, or add new people into consideration Semantic relations between tags might be added manually, or generated automatically based on morphological similarity of words, proximity in WordNet, etc. Keywords and other metainformation about documents might be wired into the network Tags in folksonomies are created by humans. Keywords (preexisting in documents or extracted by text processing) and their relations to documents and tags might be added to augment folksonomies. Dogear can recommend tags, in a style accepted in the community, for a new document which nobody has tagged yet
  • 77.
    Why a “generic” engine like Galaxy is needed: (cont) Unified model of lexicon, semantic and social networks gives more context to make the right decisions in Community Detection, Community Structure Analysis, Metadata Sharing & Recommendations, etc. However, data network becomes quite intricate and irregular, and only generic, scalable and high-performing ontological network miners (like Galaxy) are up to the job Galaxy is a generic technique, which can efficiently work on huge networks with complex topology Most tasks on MeSH and WordNet are done in 200 ms Galaxy has native potential for explanatory module “This person might help you to understand this document because he frequently used tags popular for this document”
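The explanatory example on this slide (a person is suggested because they frequently use tags that are popular for a document) can be sketched over a list of tagging instances. The data, the names and the scoring rule are assumptions for illustration; this is simple counting, not the Galaxy algorithm.

```python
from collections import Counter

# Hypothetical instances of tagging: (person, document, tag).
TAGGINGS = [
    ("ann", "doc1", "rdf"),
    ("bob", "doc1", "rdf"),
    ("bob", "doc1", "ontology"),
    ("cat", "doc2", "rdf"),
    ("cat", "doc3", "rdf"),
    ("cat", "doc4", "ontology"),
    ("ann", "doc5", "python"),
]


def suggest_experts(document, taggings):
    """Rank people by how often they use the tags popular for a document.

    Each use of a tag counts in proportion to that tag's popularity on
    the target document, wherever the person applied it.
    """
    popular = Counter(tag for _, doc, tag in taggings if doc == document)
    scores = Counter()
    for person, _, tag in taggings:
        if tag in popular:
            scores[person] += popular[tag]
    return scores.most_common()


experts = suggest_experts("doc1", TAGGINGS)
```

In this toy data "cat" never tagged doc1 but ranks first, because "cat" uses the tags attached to doc1 more often than anyone else, which mirrors the explanation quoted on the slide.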
  • 78.
    OnNeM can handle Networks like this:  People Documents Tags Instances of tagging New people and additional relations between them Relations between tags: semantic proximity, misspellings, translations, WordNet, … Relations between documents: … New objects: e.g. keywords from texts might be related with documents and tags
  • 79.
    Applications to Semantic Social Networks & Knowledge Exchange
  • 80.
    What problems Galaxy can address Galaxy could be used to uniformly address many problems in Semantic Social Networks & Knowledge Exchange: Tag recommendation in folksonomies; Community detection; Centrality problem in social network analysis; Expertise location… How? Galaxy is a generic technique which takes as input a function on nodes of a semantic network and transforms this input into another function according to the parameters. To simplify explanations, instead of the input/output functions, we’ll talk about the input set of nodes and ranked output set of nodes To create solution for a particular task A set of input nodes must be chosen Parameters of the algorithm must be established Output set must be interpreted according to the task
  • 81.
    IBM social software “the company is serious about dominating social networking for the enterprise” Cooking Up a Social Networking Storm With IBM Labs, March 30, 2007 IBM Social Software Dogear Dogear is a social-tagging service for resources such as public URLs, company-internal URLs, and other company internal documents (e.g., Wiki pages, Domino documents, etc.) Bluepages+1 is an enhanced version of IBM online employee directory. Among its enhancements is the ability for one person to apply a tag directly to another person’s directory page. Blog Central Blog Central is an internal blogging service, open to any employee. The Blog Central data structures provide for a separate list of tags for each blog and for each entry within each blog. Activities Activities is a web-based version of ActivityExplorer, an activity-centric collaboration service in which teams may create a collection of diverse objects in a tree-like structure consisting of a root “activity” and its daughter components.
  • 82.
    Our research plans to exploit Galaxy: We are investigating a wide range of applications in Community Detection, Community Structure Analysis, Metadata Sharing & Recommendations Enhanced with Social Reputation Mechanisms Based on our understanding of potential IBM needs, our commitments for European research projects, and our vision of the potential of Galaxy, we are looking forward to the creation of the following functionalities: Community Support Given a peer: Search for its neighbors within a community Given the entire collection: Identify trends and threads (e.g., tags becoming popular, etc.) Metadata Sharing & Recommendations Given a file with some attached metadata: Recommend additional annotations Recommend similar files Given one or more tags and/or keywords: Locate peers with expertise in the described areas
  • 83.
    Research collaboration Create semantic social networks of your interest … in the format which can be used by Galaxy: simple XML format Design a scenario and work with us on tuning the parameters of Galaxy for the tasks in your scenario … Contacts Alexander Troussov, CAS Chief Scientist, [email_address] Marie Wallace, LanguageWare manager, [email_address] Brian O’Donovan, CAS Program Director, [email_address] IBM CAS Dublin https://www.ibm.com/ibm/cas/sites/dublin/ LanguageWare http://www.ibm.com/software/globalization/topics/languageware/index.jsp NEPOMUK http://nepomuk.semanticdesktop.org/
  • 84.
  • 85.
    BRICKS Project Predrag Knežević Fraunhofer IPSI Institute Darmstadt, Germany [email_address]
  • 86.
    What is BRICKS? A software infrastructure for building digital library networks Transparent access to distributed resources Multilinguality Easy installation & maintenance A set of end-user applications Network & content management Web 2.0 Tagging/Annotations Domain specific applications A business model Open Source, Platform Independent Low cost infrastructure User communities  sustainability
  • 87.
    Sustainability User Communities Open Source Applications User App. Built on top of the foundation User Services can become Foundation services Foundation/Infrastructure Decentralized Storage Content & Metadata Mngt. Semantic Retrieval Security/DRM BRICKS
  • 88.
    BRICKS Architecture A decentralized P2P network Avoid central coordination Highly Scalable, increased reliability Minimized maintenance costs Each P2P Node is a set of SOA components Web Service Interface Platform Independent Flexible Composition Components for Storing, accessing and protecting digital objects (Semantic) search & browsing P2P communication
  • 89.
  • 90.
    A Look into a BNode
  • 91.
    Features Application development in any language with good Web-service support Metadata Support for various schemas Indexed both locally and published in decentralized index as well Annotations Support for various media types (text, images, audio, video) Various supported types (text, audio, video, spatial, temporal) Content Can be stored outside of BNode Internally content can be managed in various binary and structured (XML) formats Organized into collections Location transparent for applications Search Simple, advanced, ontology-based Cross-language support Addresses all available content
  • 92.
    Collection Manager Single access point for all content and metadata related operations (local and remote) Physical Collection Similar to folder/directory hierarchy in a file system Bound to a single BNode Each digital content object belongs to exactly one collection Logical Collection Virtual folder for organizing content items independent of their physical location Links to content items from various physical collections on different BNodes A content item might belong to many of them Stored Query similar to database views
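    The two collection kinds can be pictured with a small data-structure sketch. This is not the BRICKS API: the class names, fields and the `add_item`/`link` methods are hypothetical, chosen only to show the two rules on the slide (an object lives in exactly one physical collection on one BNode, while logical collections merely link to objects that may sit on different BNodes).

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalCollection:
    bnode: str                      # the single BNode this collection is bound to
    name: str
    items: set = field(default_factory=set)


@dataclass
class LogicalCollection:
    name: str
    links: set = field(default_factory=set)   # (bnode, item_id) pairs


class CollectionManager:
    """Hypothetical single access point for collection operations."""

    def __init__(self):
        self._owner = {}            # item_id -> owning PhysicalCollection

    def add_item(self, collection, item_id):
        # Enforce "exactly one physical collection per content object".
        if item_id in self._owner:
            raise ValueError(f"{item_id} already belongs to a physical collection")
        collection.items.add(item_id)
        self._owner[item_id] = collection

    def link(self, logical, item_id):
        # A logical collection stores a reference, not the content itself.
        owner = self._owner[item_id]
        logical.links.add((owner.bnode, item_id))


manager = CollectionManager()
maps = PhysicalCollection("bnode-a", "maps")
photos = PhysicalCollection("bnode-b", "photos")
manager.add_item(maps, "map-1")
manager.add_item(photos, "photo-1")

exhibition = LogicalCollection("exhibition")
manager.link(exhibition, "map-1")
manager.link(exhibition, "photo-1")
```

    Here `exhibition` spans two BNodes without moving any content, which is the location independence the slide describes.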
  • 93.
    Content Manager Two ways to handle Content in BRICKS: stored locally at site of a member party, accessed via URL; stored within BRICKS Based on Java Content Repository (JCR) Provide a meta-content model Re-use of existing content models Use standard models
  • 94.
    Metadata Manager Metadata descriptions: RDF Suitable for any application scenario Express Relationships between objects React to changes without changing the model Schema definitions: OWL No fixed schema Extensible (e.g. Application Profiles) Semantic concepts instead of schematic structures SPARQL Metadata queries over ontology concepts Queries for graph patterns
  • 95.
    Annotation Management Rich model Supported fragment types: “Text fragment”, “Time fragment”, “Rectangle”, “Circle”, “Point”, “Polygon” and “Polyline” Supported annotation types: “Structured Annotation”, “Association”, “Text annotation” and “Symbol Annotation” Annotation type “association” supports n:m relations Support of versioning Annotation of complete objects and of fragments of objects Supports annotation of multiple objects 13/03/2007
  • 96.
    Security Manager Transparently invoked by the Framework: any service call is checked Context-aware policies based on RBAC (via XACML rules), supporting Roles, Groups, at DLObject level Permission declaration through Javadoc @tags Federated identity is managed through an adapted version of OpenSAML Reputation-based Trust calculation integrated Web-based GUI for Security configuration
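    The idea of checking every service call against role-based permissions can be illustrated as follows. BRICKS does this with XACML rules and Javadoc tags in Java; the Python decorator below is only an analogy under that assumption, and `ROLE_PERMISSIONS`, `requires`, `AccessDenied` and `update_object` are hypothetical names.

```python
import functools

# Hypothetical role-to-permission table standing in for XACML policy rules.
ROLE_PERMISSIONS = {
    "curator": {"read", "annotate", "write"},
    "visitor": {"read"},
}


class AccessDenied(Exception):
    """Raised when none of the caller's roles grants the needed permission."""


def requires(permission):
    """Declare the permission a service call needs and check it on every call."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(user_roles, *args, **kwargs):
            allowed = set()
            for role in user_roles:
                allowed |= ROLE_PERMISSIONS.get(role, set())
            if permission not in allowed:
                raise AccessDenied(f"{func.__name__} requires '{permission}'")
            return func(user_roles, *args, **kwargs)
        return wrapper
    return decorator


@requires("write")
def update_object(user_roles, object_id, new_title):
    return f"{object_id} renamed to {new_title}"
```

    Calling `update_object(["curator"], "obj-1", "Map")` succeeds, while the same call with `["visitor"]` raises `AccessDenied`; the service body itself never mentions security, which is what "transparently invoked" suggests.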
  • 97.
    Digital Rights Management DRM Component Support for licenses based on MPEG-21 REL license declaration standard Generic API for the integration of commercial DRM systems Watermarking Open-source watermarking tool for images; other tools can be integrated BRICKS Store web application for commercial content Creative Commons support for other content in BRICKS
  • 98.
  • 99.
    Application: BRICKS Workspace What does it demonstrate? a web application (thin client) accessing BRICKS Foundation services Web 2.0 image annotations Reference application Primary customers? general end-users (citizens) application developers Technology Struts based interface to the BCH Live demo at http://saturn.researchstudio.at:8090/workspace
  • 100.
    Application: BRICKS Desktop What does it demonstrate? a rich client application accessing BRICKS Foundation services direct access to the BCHN Primary customers? expert end-users (researchers, educators) application developers Technology Eclipse based rich client interface Download at http://develop.bricksfactory.org/projects/desktop
  • 101.
    Application: Annotation Tool What does it demonstrate? Tool which allows end-users to annotate images Creation of annotation threads Supervised Annotations Primary customers? end-users Institutions with large image collections Technology Web Application
  • 102.
    Application: Online Exhibition Authoring Tool What does it demonstrate? Creating and publishing online exhibitions using content that is available in the BRICKS network Primary customers? expert end-users (curators) Technology Web Application Live demo at http://livingmemory.researchstudio.at/
  • 103.
    Application: Archeological Finds Identifier What does it demonstrate? a web application for comparing found objects (e.g. ancient coins) with objects from reference collections Application of complex domain ontology (CIDOC-CRM) Map visualization of GIS-Metadata Primary customers? Museum curators, archaeologists, students, amateurs Technology Struts based interface Live Demo at http://finds.brickscommunity.org:8091/findsidentifier/index.do
  • 104.
    BRICKS Demo Store What does it demonstrate? Purchasing digital goods License maintenance and proofing Primary customers? Content providers Technology Based on OFBiz Live demo at http://brstore.metaware.it:9080/ecommerce/control/main
  • 105.
    References BRICKS Community Web Site ( http://www.brickscommunity.org ) BNode Release Downloads ( http://foundation.bricksfactory.org ) BRICKSforge ( http://develop.bricksfactory.org ) BRICKS Developer Community ( http://dev.brickscommunity.org )
  • 106.
    Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet Digital Libraries in Ireland http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit
  • 107.
    Building the Future IVRLA - The Irish Virtual Research Library and Archive Project - an infrastructure for humanities research. OJAX – A Web 2.0 Search User Interface S3B - With a Little Help from My Friends: Social Semantic Search and Browsing
  • 108.
    The Irish Virtual Research Library and Archive Project - an infrastructure for humanities research.
  • 109.
    Outline Quick Overview Digitisation Processes Repository Development Content Models IVRLA Deployment Observations
  • 110.
    IVRLA Positioning PRTLI funded project Component of UCD Humanities Institute of Ireland and based in UCD Library Supporting research through offering access to digitised content from participating primary source repositories Direct research into digitisation and digital repositories Developing and promoting added value tools and services
  • 111.
    IVRLA Deliverables Body of digitised content Functioning repository prototype with scalable infrastructure Comprehensive report including regulatory and financial issues Body of corporate knowledge & expertise Centre of excellence Proof of concept
  • 112.
  • 113.
    Digital content repositories should… Support the creation and publication of new forms of “information units” Integrate with the processes (e.g., workflows) of research, collaboration, and scholarly communication Enable knowledge integration: capture semantic and factual relationships among information entities Promote information re-use and contextualization Facilitate collaborative activity and capture information that is created as a byproduct of it Capture and maintain the complex structural, semantic, provenance, and administrative relationships among digital resources* * Sandy Payette, Sydney 2006.
  • 114.
  • 115.
    Image based Digitisation Components Apple PowerMac G5 running Kodak oXYgen Scan Kodak IQSmart 2 Adobe Photoshop CS2
  • 116.
    Audio Digitisation Components Quadriga system Lake People ADC and DAC Revox 1/4 inch tape player
  • 117.
    Files and Formats Scanned Material (text and images) TIFF (PM) JPEG (CW) DjVu (CW) JPEG (TN) Time Based Material Audio BWF (PM) MP3 / MP4 (CW) Video Linear Digital (PM) mov, wmv? (CW)
  • 118.
    Workflows TIFFs have metadata embedded TIFFs are backed up to LTO Photoshop macros used to watermark, create JPEG and TIFFs DVDs created and stored Additional derivatives created for resource discovery and access
  • 119.
    Data Storage 3 high quality ‘Preservation Master’ copies 2 DVD-ROM - working 1 LTO - deep archive Copies stored in geographically disparate locations Estimate that IVRLA will require 6-8TB for all preservation master storage. Scans ~ 80MB Audio ~ 800MB/hr Online requirement is significantly less
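    To get a feel for what the 6-8TB estimate means, the arithmetic can be sketched as below. The per-item sizes come from the slide; the decimal conversion (1 TB = 1,000,000 MB), the function name `capacity`, and the reading of the estimate as the size of a single copy are assumptions.

```python
MB_PER_TB = 1_000_000  # decimal terabytes, an assumption


def capacity(total_tb, scan_mb=80, audio_mb_per_hour=800):
    """Return how many scans, or alternatively how many hours of audio,
    fit in total_tb of preservation master storage."""
    total_mb = total_tb * MB_PER_TB
    return total_mb // scan_mb, total_mb // audio_mb_per_hour


low_scans, low_audio_hours = capacity(6)
high_scans, high_audio_hours = capacity(8)
```

    Under these assumptions 6TB holds about 75,000 scans or 7,500 hours of audio, and 8TB about 100,000 scans or 10,000 hours; a real collection mixes both, so the actual counts fall below either figure.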
  • 120.
    Metadata and Database 2 stage cataloguing database MODS - descriptive metadata METS - structural and transmission metadata EAD - archival context and structure MIX - technical metadata for images MADS - descriptive metadata authority files
  • 121.
  • 122.
  • 123.
    Collection Model Libraries use OPAC for searching Archives use Finding Aids for browsing Hybrid model to enable searching and browsing of complex hierarchical digital collections Model facilitates top down and bottom up approaches EAD provides context and structure MODS provides precision and accuracy Create EAD template for each ‘collection’ Catalogue to the appropriate level
  • 124.
  • 125.
  • 126.
  • 127.
    Open Source Repository Systems Growing area of development Several options available: DSpace, EPrints, Fedora IVRLA required a solution which offers: Suitability for wide range of data types Support for collection structures and complex objects Scalability - prototype into service Future-proof architecture Long term digital preservation
  • 128.
    Fedora Service Framework (2005-07) © S. Payette
  • 129.
    IVRLA Preservation Requirements Audit trails and datastream versioning Persistent Identifiers Checksum creation and validation Whole object versioning OAIS compliance TDR compliance
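    Checksum creation and validation, one of the requirements listed, can be sketched with Python's standard `hashlib`. The slide does not say which algorithm IVRLA or Fedora uses, so SHA-256 is an assumption here, and `make_checksum` and `validate` are hypothetical helper names.

```python
import hashlib


def make_checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Create a hex digest to be stored alongside a datastream at ingest."""
    return hashlib.new(algorithm, data).hexdigest()


def validate(data: bytes, expected: str, algorithm: str = "sha256") -> bool:
    """Recompute the digest later and compare it with the stored value."""
    return make_checksum(data, algorithm) == expected


master = b"preservation master bytes"
stored = make_checksum(master)          # recorded when the object is ingested
intact = validate(master, stored)       # True while the bytes are unchanged
corrupted = validate(master + b"!", stored)  # False after any modification
```

    Running `validate` on a schedule against every stored copy is one simple way to detect silent corruption on DVD or tape.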
  • 130.
    IVRLA interface requirements Evidence Provenance, authenticity, integrity, context, persistence, sustainability Granularity - directed to page, clip, part .. Security, authentication and authorisation infrastructure Conversation/Participation Informal, collaborative Personalisation and customisation Recommendation Services (S/CSI) Social searching and annotation (S/CSI - S/ILS) Add value, links, connections…
  • 131.
  • 132.
    Fedora Content Models A definition for a “type” of object (e.g., article, book, image, learning object) that describes the internal composition of a group of similar Fedora objects Data Type Structure Services Data Type defines payloads and metadata Structure defines relationships between objects Services define actions or disseminators for the content
  • 133.
  • 134.
  • 135.
  • 136.
    RoadMap Initial Research and Demo Develop utilities - sipMaker and mixMaker Articulate collection model Develop Virtual Library and Archive 1.0 Browse Search View Cite Tag Ingest Trial and deployment of subsets Develop Virtual Library and Archive 2.0 User management Personalisation, customisation Recommendation services Annotation and tagging Research space Virtual collections
  • 137.
  • 138.
    Usership Research based Context heavy - accuracy, integrity and authenticity Technically literate with Internet age expectations - the Google effect Accurate citation and source acknowledgement using persistent identifiers
  • 139.
    Repository Challenges Architecture is not an ‘out of box’ solution Resources required to articulate and develop interface layer(s) Metadata management is complex Tension between popular delivery formats and archival preservation formats Challenge of anticipating all user environments in content modeling Improved automation is necessary for ingest and validation. Digitisation is the main bottleneck Sustainability - prototype developed into a service Human resources are central to technology projects Developing and training data curators - multidisciplinary skill sets
  • 140.
    Observations and Conclusion 5 year project timeline requires an iterative process New advances in computing science will influence developments - eScience, eHumanities, Web 2.0 IVRLA positions the archival source with all context and structure as central to the digital deployment Define and build core sources which can be interrogated and integrated with dynamic services Standards based interoperability is key to ensure future accessibility and sustainability New repository models suggest and support user created metadata such as social bookmarking and annotating
  • 141.
  • 142.
    OJAX: Web 2.0 Federated search Judith Wusteman April 2007
  • 143.
    Overview Introducing OJAX OJAX Demo Related research
  • 144.
    Web 2.0 Technologies and Standards used in OJAX AJAX REST JSON Atom OAI-PMH OpenSearch Open API StAX Apache Lucene
  • 145.
  • 146.
  • 147.
  • 148.
  • 149.
  • 150.
  • 151.
  • 152.
  • 153.
  • 154.
    OpenSearch Enables search engines to describe their search syntax to browsers Describes standards for search results syntax Based on RSS and Atom
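    An OpenSearch description tells a client how to build a query URL by publishing a template with placeholders such as `{searchTerms}`; a trailing `?` marks a placeholder as optional. The sketch below shows how a client might fill such a template. The helper name `fill_template` and the `example.org` endpoint are hypothetical, and the regular expression covers only simple parameter names, not the full specification.

```python
import re
from urllib.parse import quote


def fill_template(template, **params):
    """Substitute OpenSearch-style {name} and {name?} placeholders."""
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in params:
            return quote(str(params[name]), safe="")  # percent-encode the value
        if optional:
            return ""             # optional parameters may be left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\??)\}", replace, template)


template = "http://example.org/search?q={searchTerms}&page={startPage?}"
url = fill_template(template, searchTerms="digital libraries")
```

    With the values above, `url` becomes `http://example.org/search?q=digital%20libraries&page=`; omitting `searchTerms` would raise `KeyError` because that placeholder is required.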
  • 155.
  • 156.
  • 157.
  • 158.
  • 159.
  • 160.
  • 161.
    Science Foundation Ireland: OJAX++: a next generation collaborative research tool To investigate how concepts from the Social Web can be applied to the research environment in order to facilitate dynamic collaboration and the sharing of ideas among researchers.
  • 162.
    PhD starting September 2007 In collaboration with UCD School of Computer Science and Informatics Requirements Honours degree (preferably first class or 2.1) in Computer Science or a related field or equivalent technical expertise Preferred Experience: Web technology, JavaScript, AJAX, one of Java, Ruby or Python. http://www.ucd.ie/wusteman [email_address]
  • 163.
    Advantages of OJAX Developed in Ireland. Can be adapted to suit. Already in Beta version. Available for download. Well received. Responds to new user expectations generated by Web 2.0: Rich, dynamic user experience. Intuitive interface. Integration, interoperability and reuse. Open source. Standards-compliance, including OpenSearch, OAI-PMH, StAX and Apache Lucene.
  • 164.
  • 165.
    With a Little Help from My Friends Social Semantic Search and Browsing Sebastian Ryszard Kruk, Adam Gzella Digital Enterprise Research Institute National University of Ireland, Galway sebastian.kruk@deri.org, adam.gzella@deri.org http://s3b.corrib.org/
  • 166.
    Take away message We search in different ways for different things Keyword search is not enough We create knowledge by sharing our (search) experience
  • 167.
    Outline Motivation How do people search Search and Browsing lifecycle Applying semantics and making use of social networks: Keyword-based search Collaborative Faceted Navigation Collaborative Filtering Conclusions - Putting it all together
  • 168.
    How do people search? Different user goals: Resource Seeking - the user wants to find a specific resource (e.g. lyrics of a song, a program to download, a map service etc.) Navigational - the user is searching for a specific web site whose URL s/he forgot Informational - the user is looking for information about a topic s/he is interested in Rose and Levinson: Understanding user goals in web search (2004)
  • 169.
    Search and browsing lifecycle Why? Information can be useful Information can be garbage How? (Search and browsing actions) [REUSE] keyword-based search (resource seeking) [REDUCE] faceted navigation (navigational) [RECYCLE] collaborative filtering (informational) Can this process be improved with Semantic Web and Social Networking technologies?
  • 170.
    Query refinement in keyword-based search Why is simple full-text search not enough? Too many results (low precision) One needs to specify the exact keyword (low recall) How to distinguish between: Python and python? (high fall-out) How? Disambiguation through a context Query context Short-term context: User’s goal Location Time Long-term context: User’s interest Search engine specific
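    The three measures named on this slide have standard definitions: precision is the share of retrieved items that are relevant, recall is the share of relevant items that were retrieved, and fall-out is the share of non-relevant items that were retrieved anyway. A minimal sketch follows; the function name and the ten-document toy collection are made up for illustration.

```python
def retrieval_metrics(retrieved, relevant, collection):
    """Return (precision, recall, fall_out) for one query."""
    retrieved, relevant, collection = set(retrieved), set(relevant), set(collection)
    tp = len(retrieved & relevant)                 # relevant and retrieved
    fp = len(retrieved - relevant)                 # retrieved but not relevant
    fn = len(relevant - retrieved)                 # relevant but missed
    tn = len(collection - retrieved - relevant)    # correctly left out

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return ratio(tp, tp + fp), ratio(tp, tp + fn), ratio(fp, fp + tn)


collection = [f"d{i}" for i in range(10)]
relevant = ["d0", "d1", "d2", "d3"]
retrieved = ["d0", "d1", "d4", "d5", "d6", "d7"]
precision, recall, fall_out = retrieval_metrics(retrieved, relevant, collection)
```

    In this toy query only two of six results are relevant (precision 1/3), half of the relevant documents are found (recall 0.5), and four of the six non-relevant documents are returned (fall-out 2/3), which is the kind of outcome query refinement aims to improve.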
  • 171.
    Query refinement in keyword-based search How? Query refinement Spreading activation Types mapping Pruning Acquiring the context information: Previous searches of the user Semantically annotated user’s bookmarks Community profile And? (Manual query refinement) “Tell me why” button and the transcript of the refinement process Continue to faceted navigation
  • 172.
    Collaborative Faceted Navigation Why? The search does not end on a (long) list of results The results are not a list (!) but a graph We lose context with linear navigation A need for a unified notion (UI, Services) of filter/narrow and browse/expand services Share browsing experience – navigate collaboratively How (Services)? Defines REST access to services and their composition Basic services: access, search, filter, similar, browse, combine Meta services: RDF serialization, subscription channels, service ID generation Context services: manage contexts, manage service calls/compositions in the context, list contexts Statistics services: properties, values, tokens
  • 173.
    Collaborative Faceted Navigation How (User interface)? Hexagons to capture the notion of non-linear history of browsing Selecting values from list, tag cloud or TagsTreeMap™ Context zoomable interface: List (graph) of results Browse from current results Navigate between service calls Navigate between contexts (with given call)
  • 174.
    Social Semantic Collaborative Filtering Why? The bottom line of acquiring knowledge: informal communication (“word of mouth”) How? Everyone classifies (filters) the information in bookmark folders (user-oriented taxonomy) Peers share (collaborate over) the information (community-driven taxonomy) Result? Knowledge “flows” from the expert through the social network to the user The system amasses a lot of information on the user/community profile (context)
  • 175.
    Social Semantic Collaborative Filtering Problems? The horizon of a social network (2-3 degrees of separation) How to handle fine-grained information (blogs, wikis, etc.) Solutions? Inference engine to suggest knowledge from the outskirts of the social network Support for SIOC metadata: Semantically Interlinked Online Communities: blogs, wikis, fora, … SIOC browser in SSCF Annotations and evaluations of “local” resources
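    The "horizon" problem can be made concrete with a small sketch: a breadth-first walk over the friendship graph that collects bookmarks only from people within a given number of degrees of separation. This is an illustration of the idea, not the SSCF implementation; the function name, the dictionary-based data layout and the sample names are assumptions.

```python
from collections import deque


def bookmarks_within_horizon(friends, bookmarks, user, horizon=2):
    """Map each reachable bookmark to the distance of its closest owner.

    friends: dict person -> list of direct friends.
    bookmarks: dict person -> list of bookmarked resources.
    Only people at most `horizon` degrees away from `user` are visited.
    """
    distance = {user: 0}
    queue = deque([user])
    while queue:
        person = queue.popleft()
        if distance[person] == horizon:
            continue                      # do not look past the horizon
        for friend in friends.get(person, []):
            if friend not in distance:
                distance[friend] = distance[person] + 1
                queue.append(friend)

    found = {}
    for person, degrees in distance.items():
        if degrees == 0:
            continue                      # skip the user's own bookmarks
        for resource in bookmarks.get(person, []):
            if resource not in found or degrees < found[resource]:
                found[resource] = degrees
    return found


friends = {"ann": ["bob"], "bob": ["cat"], "cat": ["dan"]}
bookmarks = {"bob": ["r1"], "cat": ["r1", "r2"], "dan": ["r3"]}
visible = bookmarks_within_horizon(friends, bookmarks, "ann", horizon=2)
```

    With a horizon of 2, Ann sees `r1` and `r2` but never `r3`, which sits three degrees away with Dan; that unreachable knowledge at the outskirts is what the inference engine on the slide is meant to suggest.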
  • 176.
    Putting it all together user profile: recent actions refine search results filter, record, annotate, and share results and actions re-call shared actions user profile: user’s interests filter, record, annotate, and share results
  • 177.
    Do we need Semantic Web and Web 2.0 technologies in Digital Libraries? Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit
  • 178.
    Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet Conclusions http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit