Irish Digital Libraries Summit

From skruk, 2 years ago

This is a full stack of slides from the first edition of the Irish more

Tags

corrib deri digital libraries semdl jeromedl semantic web web2.0 sscf s3b irvla

CC Attribution-NonCommercial License

Slideshow transcript

Slide 1: Irish Digital Libraries Summit. Digital Libraries at the eve of the Next Generation Internet. Sebastian Ryszard Kruk, Mary Burke, Stefan Decker. http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit Copyright 2006 Digital Enterprise Research Institute. All rights reserved. www.deri.ie

Slide 2: Looking into the Future of Irish Digital Libraries?

Slide 3: Why do we care? • John teaches biology, over the Internet, using digital libraries and modern technologies (wikis, blogs) • How to deliver the material just-in-time? • How to pre-assess students? • How to automate most of the process?

Slide 4: Goals • Present current solutions that bring digital libraries to the Next Generation Internet

Slide 5: Goals • Gather opinions, requirements and future plans of Irish libraries

Slide 6: Goals • Build a basis for a funding application for a national digital libraries initiative under the EU FP7 Digital Libraries theme

Slide 7: Schedule
10:00-10:30 Sebastian Kruk, Mary Burke: Get together, Welcome
Session: Semantic Digital Libraries
10:30-11:00 Maciej Dąbrowski: Ontologies for Digital Libraries
11:00-11:30 Tomasz Woroniecki: Building a Semantic Digital Library
11:30-11:50 Coffee break
Session: Future of Digital Libraries
11:50-12:00 Sebastian Kruk: Introduction to the session
12:00-12:30 Alexander Troussov: IBM Ontological Network Miner and its applications to semantic social networks
12:30-13:00 Predrag Knezevic: BRICKS Project
13:00-14:00 Lunch break

Slide 8: Schedule
Session: Digital Libraries in Ireland
14:00-14:30 John McDonough: The Irish Virtual Research Library and Archive Project – an infrastructure for humanities research
14:30-15:00 Judith Wusteman: OJAX: A Web 2.0 Search User Interface
15:00-15:30 Sebastian Kruk, Adam Gzella: With a Little Help from My Friends: Social Semantic Search and Browsing
15:30-15:45 Coffee break
15:45-16:45 Mary Burke: Discussion panel: Do we need Semantic Web and Web 2.0 technologies in Digital Libraries?
16:45-17:00 Wrap-up, Conclusions

Slide 9: Ontologies for Digital Libraries. MarcOnt Initiative. Maciej Dąbrowski, Digital Enterprise Research Institute, National University of Ireland, Galway. maciej.dabrowski@deri.org

Slide 10: Outline • Real-life and Semantic Web • Semantic Web and Ontologies • MarcOnt Ontology • MarcOnt Tools • Conclusions

Slide 11: Real-life problems. Heterogeneous systems: MARC21, Dublin Core, BibTeX. Multiple data formats in DL: • How to support them? • How to translate between them? • Who should create mappings? Identified problems: • Interoperability • Format translation

Slide 12: Real-life problems – user’s expectations. Searching: • Effective and Accurate: We want correct and fast answers!! • Intuitive and Simple: Asking questions should be easy. • Meaning: Jaguar – a car or an animal? • Reasoning: Give me articles written by students of X in Galway. Identified problems: • Intuitive interface for asking complex queries

Slide 13: Real-life problems - summary. Digital Libraries should provide: • Interoperability • Support for many formats • Complex search features • Intuitive interfaces

Slide 14: Semantic Web

Slide 15: The Semantic Web – A Brief Introduction • Current Web vs. Semantic Web? – An extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. [Tim Berners-Lee] – The current Web was designed for humans, and there is little information usable by machines • Was the Web meant to be more? – Objects with well-defined attributes as opposed to untyped hyperlinks between Internet resources – A network of relationships amongst named objects, yielding unified information management tasks • What do you mean by “Semantic”? – The semantics of something is the meaning of something – The Semantic Web is able to describe things in a way that computers can understand

Slide 16: Semantic Web vs. Current Web [figure with two panels: Current Web, Semantic Web]

Slide 17: The Semantic Web – What is RDF? Describing things on the Semantic Web – RDF (Resource Description Framework) • a data format for describing information and resources • the fundamental data model for the Semantic Web – Using RDF, we can describe relationships between things like: • A is a part of B or • Y is a member of Z • and their properties (size, weight, age, price…) in a machine-understandable format – The RDF graph-based model delivers straightforward machine processing – Putting information into RDF files makes it possible for “scutters” or RDF crawlers to search, discover, pick up, collect, analyse and process information from the Web

Slide 18: The Semantic Web – What is RDF? A simple RDF example – Statement: “Stefan Decker is the creator of the resource (web page) http://www.stefandecker.org” – Structure: Resource (subject): http://www.stefandecker.org; Property (predicate): http://purl.org/dc/elements/1.1/creator; Value (object): “Stefan Decker” – Directed graph: http://www.stefandecker.org --dc:creator--> “Stefan Decker”
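
The statement on this slide can be written down as a single subject/predicate/object triple. The following is a minimal sketch in plain Python, not taken from the slides: the `Triple` type and the `to_ntriples` helper are hypothetical names, and only the two URIs and the literal come from the slide. It prints the statement as one N-Triples line.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement (hypothetical helper type, not from the slides)."""
    subject: str    # the resource being described (a URI)
    predicate: str  # the property (a URI)
    obj: str        # the value, here a plain string literal


# Values taken from the slide.
statement = Triple(
    subject="http://www.stefandecker.org",
    predicate="http://purl.org/dc/elements/1.1/creator",
    obj="Stefan Decker",
)


def to_ntriples(t: Triple) -> str:
    """Serialise a triple whose object is a literal as one N-Triples line."""
    escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{t.subject}> <{t.predicate}> "{escaped}" .'


print(to_ntriples(statement))
```

In practice an RDF library would handle serialisation; the sketch only shows that the directed graph on the slide is one edge labelled with the property.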

Slide 19: The Semantic Web – How can RDF help us? • identify objects • establish relationships • express a new relationship: just add a new RDF statement • integrate information from different sources: copy all the RDF data together • RDF allows many points of view
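
Two of these points (adding a statement, and integrating sources by copying the data together) can be illustrated with sets of triples. This is a sketch under assumptions: the two "library" sources, the title value and the `describe` helper are made up for illustration; the creator statement is the one from the previous slide.

```python
PAGE = "http://www.stefandecker.org"
DC = "http://purl.org/dc/elements/1.1/"

# Two hypothetical sources describing the same resource.
library_a = {(PAGE, DC + "creator", "Stefan Decker")}
library_b = {
    (PAGE, DC + "creator", "Stefan Decker"),  # same statement as in library_a
    (PAGE, DC + "title", "Home page"),        # illustrative value
}

# "Copy all the RDF data together": a set union; duplicate statements collapse.
merged = library_a | library_b

# "Just add a new RDF statement": no schema change is needed.
merged.add((PAGE, DC + "language", "en"))


def describe(graph, subject):
    """Return all (predicate, object) pairs known about a subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)


for predicate, value in describe(merged, PAGE):
    print(predicate, "->", value)
```

The union works without any mapping step only because both sources already use the same identifiers for the resource and the properties.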

Slide 20: Ontologies • What is an Ontology? “An ontology is a specification of a conceptualization.” Tom Gruber, 1993 • Ontologies are social contracts – Agreed, explicit semantics – Understandable to outsiders – (Often) derived in a community process • Ontology markup and representation languages: – RDF and RDF Schema – OWL – Other: DAML+OIL, EER, UML, Topic Maps, MOF, XML Schemas

Slide 21: Components of ontologies. Concepts: • Book • Article • Author. Relationships: • Is a • Part of. Properties: • hasPages • hasTitle. Axioms: • Planes can fly • People can’t fly. Constraints: • Cardinality is at least 1 • Maximum value is 200
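
To make these components concrete, here is a small sketch of concepts, an "is a" relationship, properties and constraints in plain Python. It is an illustration only: the slide does not say which constraint belongs to which property, so attaching "cardinality is at least 1" to hasTitle and "maximum value is 200" to hasPages is an assumption, as are the `Publication` parent concept and all function names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PropertyRule:
    min_cardinality: int = 0         # "Cardinality is at least N"
    max_value: Optional[int] = None  # "Maximum value is N"


# "Is a" relationships between concepts; Publication is a hypothetical parent.
IS_A = {"Book": "Publication", "Article": "Publication"}

# Assumed pairing of the slide's constraints with its properties.
RULES = {
    "hasTitle": PropertyRule(min_cardinality=1),
    "hasPages": PropertyRule(max_value=200),
}


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """Follow "is a" links upwards until the ancestor is found or the chain ends."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def violations(record: Dict[str, list]) -> List[str]:
    """Check a record's property values against the constraints."""
    problems = []
    for prop, rule in RULES.items():
        values = record.get(prop, [])
        if len(values) < rule.min_cardinality:
            problems.append(f"{prop}: needs at least {rule.min_cardinality} value(s)")
        if rule.max_value is not None:
            problems.extend(
                f"{prop}: {v} exceeds {rule.max_value}"
                for v in values
                if v > rule.max_value
            )
    return problems


print(is_a("Book", "Publication"))
print(violations({"hasPages": [350]}))
```

Ontology languages such as OWL express the same ideas declaratively; the sketch only shows what each component contributes.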

Slide 22: Ontologies - half-time conclusions. Advantages: • Data is not only human readable, it is now also machine readable • Machines can carry out much more complex tasks (e.g. reasoning) • Capturing the meaning of concepts is possible • A new look at data storage systems (there are no data structures!!)

Slide 23: Use case scenario. Regular systems. Structured resources: • Author • Title. Data storage allows: • Author • Title. Additional information (e.g. Date) cannot be stored!!

Slide 24: Ontology development process. Development • Many approaches • Different life cycles • Continuous process • Involves community of users • Requires tools for collaboration • Tools for ontology development are necessary

Slide 25: MarcOnt Initiative. Motivation: • Build a bibliographic ontology for the Jerome Digital Library. MarcOnt Initiative goals: • Deliver a set of tools for collaborative ontology development • Collaboration • Tools for domain experts • Enable mediation between formats (MMS)

Slide 26: MarcOnt Ontology • Central point of MarcOnt Initiative • Translation and mediation format • Continuous collaborative ontology improvement • Knowledge from the domain experts • Community influence and evaluation

Slide 27: MarcOnt Ontology. Goals: • Capture concepts from the legacy bibliographic formats – MARC21, BibTeX, Dublin Core – Lattes, ... • Create a uniform bibliographic description format for digital libraries. • Enable the use of Semantic Web technologies (e.g. reasoning) to improve capabilities of digital libraries • Improve interoperability

Slide 28: Format Translation Scenario [figure: a record with Author: John Smith, Date of Birth: 1956-10-15, Date of death: 2004-09-10 is passed through Dublin Core; in the resulting records the Author is kept, but Date of Birth and Date of death appear as ??]

Slide 29: Format Translation Scenario [figure: the same scenario with RDF Storage added; records backed by the RDF storage keep Date of Birth 1956-10-15 and Date of death 2004-09-10, while the Dublin Core records still show ?? for both dates]
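
The two scenarios can be sketched in a few lines. This is an illustration under assumptions, not the MarcOnt implementation: the narrow target format is modelled as one that has only an author field, and the field names, the `record:1` identifier and all function names are hypothetical. The record values are the ones shown on the slides.

```python
FULL_RECORD = {
    "author": "John Smith",
    "date_of_birth": "1956-10-15",
    "date_of_death": "2004-09-10",
}

# Assumption: the narrow format can carry the author only.
NARROW_FORMAT_FIELDS = {"author"}


def export(record, fields):
    """Translate into the narrow format: unsupported fields are dropped."""
    return {k: v for k, v in record.items() if k in fields}


def import_back(exported, all_fields):
    """Translate back: whatever was dropped can only be shown as '??'."""
    return {k: exported.get(k, "??") for k in all_fields}


# Scenario 1: direct translation through the narrow format loses the dates.
round_trip = import_back(export(FULL_RECORD, NARROW_FORMAT_FIELDS), FULL_RECORD)

# Scenario 2: the full description is also kept as triples in a store.
store = {("record:1", field, value) for field, value in FULL_RECORD.items()}


def from_store(triples, subject):
    """Rebuild the complete record from the stored triples."""
    return {p: o for s, p, o in triples if s == subject}


print(round_trip)
print(from_store(store, "record:1"))
```

The point of the second scenario is that the store is not limited to the fields of any one exchange format, so nothing has to be discarded.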

Slide 30: MarcOnt Mediation Services

Slide 31: MarcOnt Mediation Services. Interoperability, format translation [figure labels: MarcOnt Mediation Services; RDF Translator; MarcOnt Ontology; MarcOnt RDF; MARC21 RDF, Dublin Core RDF, New format RDF; MARC21 XML, Dublin Core XML, New format XML; MARC21, Dublin Core, New format]

Slide 32: MarcOnt Ontology in JeromeDL • Improvement of searching capabilities • Natural Language Processing (NLP) • Templates, e.g. “Show me all publications written by students of Decker.”

Slide 33: MarcOnt Portal. Collaborative ontology development. Portal provides: • Suggestions • Annotations • Versioning • Ontology editor [figure labels: Initial Ontology, Suggested Proposals, Proposal discussion, Proposal annotations, Proposal voting, Proposal autopromoting, Versioning, Next Revision]

Slide 34: MarcOnt Portal • On-line ontology editing • Visualization of ontologies

Slide 35: MarcOnt Portal • Comparing versions of ontologies

Slide 36: MarcOnt Initiative Roadmap • MarcOntX agent – automatic integration of concepts from Digital Libraries • Lattes – CV platform used in Brazil • Release of MarcOnt draft ontology • Digital Rights Management • Sharing issues

Slide 37: MarcOnt Initiative summary. MarcOnt Initiative goals: • Create a framework for collaborative ontology development • Provide domain experts with tools to share their knowledge • Offer tools for data mediation between different data formats • Develop MarcOnt bibliographic ontology • Create a community of users (domain experts)

Slide 38: Conclusions. Ontologies: • can improve the most important goal of digital libraries – searching for information • facilitate interoperability • capture much more information (metadata) than existing systems • are the agreement of people (domain experts) • need tools for collaborative development and a community of users • are the future of Digital Libraries?

Slide 39: JeromeDL. Building a Semantic Digital Library. Tomasz Woroniecki, tomasz.woroniecki@deri.org

Slide 40: Outline of the presentation • Introduction to Semantic Digital Libraries • Overview of JeromeDL • Architecture of JeromeDL • Working with JeromeDL • Demo

Slide 41: Social Semantic Digital Library • A library stores and provides access to resources (books) • Qualified staff updates catalogues and helps users

Slide 42: Social Semantic Digital Library • Machine-readable resources • Full-text index improves searching • Easy access • Availability

Slide 43: Social Semantic Digital Library • Resources are accessible by machines, not only with machines • Metadata is rich and extensible • Searching reflects meaning of terms • RDF is a standard for representing information • Not just resources but also knowledge is shared

Slide 44: Social Semantic Digital Library • Involves the community in sharing knowledge • Uses the social network in searching • Allows for comments, blogs, shared bookmarks • Easy tagging

Slide 45: Evolution of Libraries • Social Semantic Digital Library: involves the community in sharing knowledge • Semantic Digital Library: accessible by machines, not only with machines • Digital Library: online, easy searching with a full-text index • Library: organized collection

Slide 46: Semantic Digital Library Semantic digital libraries – integrate information based on different metadata, e.g.: resources, user profiles, bookmarks, taxonomies – provide interoperability with other systems (not only digital libraries) – deliver more robust, user friendly and adaptable search and browsing interfaces empowered by semantics

Slide 47: JeromeDL - Motivations • Support for different kinds of bibliographic metadata, like Dublin Core, BibTeX and MARC21, at the same time. – Making use of existing rich sources of bibliographic descriptions (like MARC21) created by humans. • Supporting users and communities: – users have control over their profile information; – community-aware profiles are integrated with bibliographic descriptions; – support for community-generated knowledge • Delivering communication between instances: – P2P mode for searching and user authentication – Hierarchical mode for browsing

Slide 48: JeromeDL – Social Semantic Digital Library JeromeDL fulfills requirements of: • Librarians – precise annotations – rich metadata • Researchers – easy publishing – searching related topics • Average users – efficient search and browsing – online collaboration

Slide 49: JeromeDL - Architecture

Slide 50: Ontologies in JeromeDL

Slide 51: Using JeromeDL • Uploading a resource – provide title, abstract, author etc. – provide structure of the resource (e.g., chapters) – choose domains of the subject – choose keywords for the resource – set additional properties – upload digital parts of the resource

Slide 52: Using JeromeDL

Slide 53: Using JeromeDL • An administrator either approves or rejects a published resource

Slide 54: Sharing bookmarks

Slide 55: JeromeDL for a regular user • Browsing resources – by type, author, keyword, domain • Downloading the resource and its bibliographic description in various formats • Subscribing to RSS feeds • Searching – simple, advanced, distributed, semantic

Slide 56: JeromeDL for a regular user

Slide 57: Summary • An easy solution for putting resources online • A community around your repository • Support for many languages • Integration with Bibster and OpenSearch protocols Visit www.jeromedl.org

Slide 58: Irish Digital Libraries Summit. Digital Libraries at the eve of the Next Generation Internet. Future of Digital Libraries. http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit

Slide 59: Looking into the Future of Irish Digital Libraries?

Slide 60: Building the Future • The future Internet, semantic or social, or both, will not emerge on its own; we need to build it

Slide 61: Building the Future • Digital libraries are an important part of the Internet

Slide 62: Building the Future • Libraries should continue to drive the changes, not only follow them

Slide 63: Building the Future • OnNeM - IBM Ontological Network Miner and its applications to semantic social networks • BRICKS Project – Building Resources for Integrated Cultural Knowledge Services

Slide 64: IBM CAS Dublin / LanguageWare group Ontological Network Miner and its applications to models of social networks and semantics Alexander Troussov, Mikhail Sogrin, John Judge

Slide 65: Agenda • Ontological Network Miner tool (project Galaxy) – As a generic tool to perform elements of soft clustering and fuzzy inference on semantic networks • Applications of Galaxy to ontology-based semantic analysis of texts – Semantic tagging, term disambiguation based on the global context • Galaxy applications to folksonomies – Community detection/Expertise location, … • Applications to unified models of semantic social networks • Research cooperation

Slide 66: Ontological Network Miner (Galaxy) • A generic tool to perform elements of soft clustering and fuzzy inference on semantic networks • Ongoing project based on the work we have done for the EU 6th framework integrated project Nepomuk

Slide 67: Applications to metadata generation

Slide 68: Applications to metadata generation • Currently the semantic web relies on semantic annotation mostly done manually by humans • Working in the EU 6th framework project Nepomuk (which aims to build a social semantic desktop), we at IBM Dublin developed a tool for automation of metadata creation: • Automatic ontology-based conceptual tagging (central concepts of the text with respect to the given lexico-semantic resource) – A text which mentions Mulhuddart, Lansdowne, Clontarf is probably about Dublin/Ireland/Europe/Earth; this fact can be inferred from geographical relations like Mulhuddart “is-part-of” Dublin • Disambiguation of terms – Based on the ontological knowledge from the corresponding resource (Jaguar – a car or an animal? Jaguar, car, animal, pet, …) …
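
The Dublin example can be worked through with a crisp simplification. The slides describe the actual tool as a blend of soft clustering and fuzzy inference, which this sketch does not attempt; it only follows “is-part-of” links and picks the most specific concept that covers every mention. The `PART_OF` table is an assumed toy resource built from the place names on the slide, and the function names are hypothetical.

```python
from collections import Counter

# Toy "is-part-of" resource built from the names on the slide (assumed chain).
PART_OF = {
    "Mulhuddart": "Dublin",
    "Lansdowne": "Dublin",
    "Clontarf": "Dublin",
    "Dublin": "Ireland",
    "Ireland": "Europe",
    "Europe": "Earth",
}


def ancestors(concept):
    """All concepts the given concept is (transitively) part of."""
    chain = []
    while concept in PART_OF:
        concept = PART_OF[concept]
        chain.append(concept)
    return chain


def focus_concept(mentions):
    """Most specific concept that covers every mention, or None."""
    unique = set(mentions)
    counts = Counter()
    for mention in unique:
        # A mention covers itself and everything it is part of.
        counts.update({mention, *ancestors(mention)})
    shared = [c for c, n in counts.items() if n == len(unique)]
    # The most specific shared concept has the longest chain above it.
    return max(shared, key=lambda c: len(ancestors(c)), default=None)


print(focus_concept(["Mulhuddart", "Lansdowne", "Clontarf"]))
```

A real text would also contain ambiguous and unrelated mentions, which is where a soft, weighted approach is needed instead of this all-or-nothing rule.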

Slide 69: Automatic tagging based on concept mentions [figure: mentions in the TEXT are mapped to concepts in the NETWORK OF CONCEPTS (mapping of term mentions to concepts), where the “focus” concept is found]

Slide 70: DEMO (Lotusphere 2007) • Run eclipse.exe • Open lotusphere_demo.config.xml located in subfolder data • Have a look at the underlying personal information management ontology: people, organisations, projects • Open text: email1.anno • Text is processed on the fly, terms are disambiguated, central concepts are shown in the upper-right window – Why US? Because most found concepts are people, and during disambiguation it was established that the most likely referents of (ambiguous) names are located in the US • Let us remove the first line with two names – The text now has fewer names. Instead of people, other (abstract) concepts now play a more prominent role. Because of this (after a small delay caused by Eclipse, not by the performance of our system) US disappears as the top concept

Slide 71: What is Ontological Network Miner? • The text analytics demo shown before has applications to: – Context-dependent smart tags – Metadata generation • Although text processing is a complex process involving mapping from text to concepts and usage of empirics specific to certain properties of the discourse, at the heart of the processing is clustering on the graph of concepts – This was shown by the animation in which the wide orange area becomes smaller after “magical” shrinking • This clustering is provided by the IBM Dublin Ontological Network Miner – Codenamed OnNeM in the Nepomuk project

Slide 72: What exactly does Ontological Network Miner do? • One algorithm (a blend of soft clustering & fuzzy inference) • Depending on the parameters, this algorithm provides – “Generalisation” of the model: the output has fewer nodes than the input – “Expansion” of the model, which might be used for query expansion: the query “nutrition”+“science” is expanded into a properly ranked list: nutritionist, dietologist, nutritional, scientific, … • Our customers and partners can tune the algorithm for specific tasks using intuitively clear parameters.
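
The slides do not describe the algorithm itself, so the following is not Galaxy or OnNeM. It is a generic one-step spreading-activation sketch, swapped in to show what “expansion” of a query into a ranked list can look like. The graph, its edge weights and the `decay` parameter are all invented for illustration; only the query terms and the four expanded terms come from the slide.

```python
from collections import defaultdict

# Hypothetical weighted links between terms (weights chosen for illustration).
EDGES = {
    ("nutrition", "nutritionist"): 0.9,
    ("nutrition", "dietologist"): 0.8,
    ("nutrition", "nutritional"): 0.7,
    ("science", "scientific"): 0.6,
    ("science", "nutritionist"): 0.3,
}


def build_adjacency(edges):
    """Turn the undirected weighted edge list into an adjacency map."""
    adjacency = defaultdict(dict)
    for (a, b), weight in edges.items():
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return adjacency


def expand(query, steps=1, decay=0.5, top=4):
    """Spread activation from the query terms and rank the reached terms."""
    adjacency = build_adjacency(EDGES)
    activation = {term: 1.0 for term in query}
    frontier = dict(activation)
    for _ in range(steps):
        received = defaultdict(float)
        for node, energy in frontier.items():
            for neighbour, weight in adjacency[node].items():
                received[neighbour] += energy * weight * decay
        for node, energy in received.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = received
    candidates = [(n, a) for n, a in activation.items() if n not in query]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in candidates[:top]]


print(expand(["nutrition", "science"]))
```

Here the tunable parameters are `steps`, `decay` and the edge weights; changing them changes the ranking, which is the kind of task-specific tuning the slide refers to in general terms.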

Slide 73: Tuning Galaxy • Galaxy utilises a data-driven algorithm and, more importantly, tuning can be done by a domain specialist (not necessarily a researcher or software developer). Tuning is to “tell” Galaxy what properties of the underlying semantic network are relevant to a particular task: – For example, in application to geotagging the user might specify that Galaxy favour geographical locations with bigger populations and, in addition, favour popular resorts – Using WordNet: specify that Galaxy must favour hypernymy-hyponymy relations and disfavour meronymy-holonymy relations • Researchers (IBMers and CAS scientists) also have the opportunity to work with us on “fine-tuning” the algorithm – For example, to improve usage of graph metrics such as the in-/out-degree of nodes

Slide 74: Applications to folksonomy systems (Del.icio.us, IBM’s Dogear, …)

Slide 75: Folksonomies as ontological networks [figure: a network of People, Documents and Tags linked by instances of tagging]

Slide 76: Why a “generic” ontological network miner is needed: • Objects of interest might be wired into one unified model of lexicon, semantic and social networks • For example, the network depicted on the previous slide can be augmented with new entities and new relations – One can add relations between participants, or add new people into consideration – Semantic relations between tags might be added manually, or generated automatically based on morphological similarity of words, proximity in WordNet, etc. • Keywords and other metainformation about documents might be wired into the network – Tags in folksonomies are created by humans. Keywords (preexisting in documents or extracted by text processing) and their relations to documents and tags might be added to augment folksonomies. • Dogear can recommend tags for a new document which nobody has yet tagged, in a style accepted in the community

Slide 77: Why a “generic” engine like Galaxy is needed: (cont) • A unified model of lexicon, semantic and social networks gives more context to make the right decisions in Community Detection, Community Structure Analysis, Metadata Sharing & Recommendations, etc. • However, the data network becomes quite intricate and irregular, and only generic, scalable and high-performing ontological network miners (like Galaxy) are up to the job • Galaxy is a generic technique, which can efficiently work on huge networks with complex topology – Most tasks on MeSH and WordNet are done in 200 ms • Galaxy has native potential for an explanatory module – “This person might help you to understand this document because he frequently used tags popular for this document”
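
The explanation quoted on this slide suggests a simple counting version of expertise location over instances of tagging. The sketch below is only that counting version, not Galaxy: the tagging data, names and function names are invented, and each instance of tagging is modelled as a (person, document, tag) tuple.

```python
from collections import Counter

# Hypothetical instances of tagging: (person, document, tag).
TAGGINGS = [
    ("alice", "doc1", "rdf"),
    ("bob", "doc1", "rdf"),
    ("bob", "doc1", "ontology"),
    ("carol", "doc2", "rdf"),
    ("carol", "doc3", "rdf"),
    ("carol", "doc3", "ontology"),
    ("dave", "doc2", "cooking"),
]


def popular_tags(document, top=2):
    """Tags most often applied to the document (ties broken alphabetically)."""
    counts = Counter(tag for _, doc, tag in TAGGINGS if doc == document)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in ranked[:top]]


def suggest_experts(document, top=2):
    """People who most frequently used the document's popular tags anywhere."""
    tags = set(popular_tags(document))
    usage = Counter(person for person, _, tag in TAGGINGS if tag in tags)
    return sorted(usage.items(), key=lambda item: (-item[1], item[0]))[:top]


for person, uses in suggest_experts("doc1"):
    print(f"{person} might help with doc1: used its popular tags {uses} time(s)")
```

Note that the top suggestion in this toy data never tagged doc1 at all; the link runs through shared tags, which is what makes the people/documents/tags network useful for this task.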

Slide 78: OnNeM can handle networks like this: People, Documents, Tags, Instances of tagging • New objects: e.g. keywords from texts might be related with documents and tags • New people and additional relations between them • Relations between documents: … • Relations between tags: semantic proximity, misspellings, translations, WordNet, …

Slide 79: Applications to Semantic Social Networks & Knowledge Exchange

Slide 80: What problems can Galaxy address? • Galaxy could be used to uniformly address many problems in Semantic Social Networks & Knowledge Exchange: – Tag recommendation in folksonomies; Community detection; Centrality problem in social network analysis; Expertise location… • How? – Galaxy is a generic technique which takes as input a function on the nodes of a semantic network and transforms this input into another function according to the parameters. To simplify explanations, instead of the input/output functions, we’ll talk about the input set of nodes and the ranked output set of nodes • To create a solution for a particular task: – A set of input nodes must be chosen – Parameters of the algorithm must be established – The output set must be interpreted according to the task

Slide 81: IBM social software • “the company is serious about dominating social networking for the enterprise” (Cooking Up a Social Networking Storm With IBM Labs, March 30, 2007) • IBM Social Software: – Dogear: a social-tagging service for resources such as public URLs, company-internal URLs, and other company-internal documents (e.g., Wiki pages, Domino documents, etc.) – Bluepages+1: an enhanced version of the IBM online employee directory. Among its enhancements is the ability for one person to apply a tag directly to another person’s directory page. – Blog Central: an internal blogging service, open to any employee. The Blog Central data structures provide for a separate list of tags for each blog and for each entry within each blog. – Activities: a web-based version of ActivityExplorer, an activity-centric collaboration service in which teams may create collections of diverse objects in a tree-like structure consisting of a root “activity” and its daughter components.

Slide 82: Our research plans to exploit Galaxy: • We are investigating a wide range of applications in Community Detection, Community Structure Analysis, Metadata Sharing & Recommendations Enhanced with Social Reputation Mechanisms • Based on our understanding of potential IBM needs, our commitments for European research projects, and our vision of the potential of Galaxy, we are looking forward to the creation of the following functionalities: • Community Support – Given a peer: search for its neighbors within a community – Given the entire collection: identify trends and threads (e.g., tags becoming popular, etc.) • Metadata Sharing & Recommendations – Given a file with some attached metadata: recommend additional annotations; recommend similar files – Given one or more tags and/or keywords: locate peers with expertise in the described areas

Slide 83: Research collaboration • Create semantic social networks of your interest … – in the format which can be used by Galaxy (simple XML format) • Design a scenario and work with us on tuning parameters of Galaxy for the tasks in your scenario … • Contacts – Alexander Troussov, CAS Chief Scientist, atrousso@ie.ibm.com – Marie Wallace, LanguageWare manager, mwallace@ie.ibm.com – Brian O’Donovan, CAS Program Director, brian_odonovan@ie.ibm.com • IBM CAS Dublin https://www.ibm.com/ibm/cas/sites/dublin/ • LanguageWare http://www.ibm.com/software/globalization/topics/languageware/index.jsp • NEPOMUK http://nepomuk.semanticdesktop.org/

Slide 84: Questions?

Slide 85: BRICKS Project. Predrag Knežević, Fraunhofer IPSI Institute, Darmstadt, Germany. knezevic@ipsi.fhg.de 20/04/07 DERI, Galway, Ireland

Slide 86: What is BRICKS? • A software infrastructure for building digital library networks – Transparent access to distributed resources – Multilinguality – Easy installation & maintenance • A set of end-user applications – Network & content management – Web 2.0 Tagging/Annotations – Domain-specific applications • A business model – Open Source, Platform Independent – Low-cost infrastructure – User communities → sustainability

Slide 87: BRICKS • Sustainability – User Communities – Open Source • Applications – User applications build on top of the foundation – User services can become foundation services • Foundation/Infrastructure – Decentralized Storage – Content & Metadata Management – Semantic Retrieval – Security/DRM

Slide 88: BRICKS Architecture • A decentralized P2P network – Avoid central coordination – Highly scalable, increased reliability – Minimized maintenance costs • Each P2P Node is a set of SOA components – Web Service Interface – Platform Independent – Flexible Composition • Components for – Storing, accessing and protecting digital objects – (Semantic) search & browsing – P2P communication

Slide 89: Accessing Data [figure: users at workstations send requests to BNodes; labels include Austrian Library]