Irish Digital Libraries Summit

This is a full stack of slides from the first edition of the Irish Digital Libraries Summit organized by DERI Galway (Apr 20th, 2007)

Transcript

  • 1. Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet Sebastian Ryszard Kruk, Mary Burke, Stefan Decker http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit
  • 2. Looking into the Future of Irish Digital Libraries ? ?
  • 3. Why do we care?
    • John teaches biology, over the Internet, using digital libraries and modern technologies (wikis, blogs)
    • How to deliver the material just-in-time?
    • How to pre-assess students?
    • How to automate most of the process?
  • 4. Goals
    • Present current solutions that bring digital libraries to the Next Generation Internet
  • 5. Goals
    • Gather opinions, requirements and future plans of Irish libraries
  • 6. Goals
    • Build up bases for an application for funding of a national digital libraries initiative under the EU FP7 Digital Libraries theme
  • 7. Schedule
    • Semantic Digital Libraries
      • 10:00-10:30 Get together, Welcome (Sebastian Kruk, Mary Burke)
      • 10:30-11:00 Ontologies for Digital Libraries (Maciej Dąbrowski)
      • 11:00-11:30 Building a Semantic Digital Library (Tomasz Woroniecki)
      • 11:30-11:50 Coffee break
    • Future of Digital Libraries
      • 11:50-12:00 Introduction to the session (Sebastian Kruk)
      • 12:00-12:30 IBM Ontological Network Miner and its applications to semantic social networks (Alexander Troussov)
      • 12:30-13:00 BRICKS Project (Predrag Knezevic)
      • 13:00-14:00 Lunch break
  • 8. Schedule
    • Digital Libraries in Ireland
      • 14:00-14:30 The Irish Virtual Research Library and Archive Project – an infrastructure for humanities research (John McDonough)
      • 14:30-15:00 OJAX: A Web 2.0 Search User Interface (Judith Wusteman)
      • 15:00-15:30 With a Little Help from My Friends: Social Semantic Search and Browsing (Sebastian Kruk, Adam Gzella)
      • 15:30-15:45 Coffee break
      • 15:45-16:45 Discussion panel: Do we need Semantic Web and Web 2.0 technologies in Digital Libraries? (Mary Burke)
      • 16:45-17:00 Wrap-up, Conclusions
  • 9. Ontologies for Digital Libraries: MarcOnt Initiative
    Maciej Dąbrowski, Digital Enterprise Research Institute, National University of Ireland, Galway, maciej.dabrowski@deri.org
  • 10. Outline
    • Real-life and Semantic Web
    • Semantic Web and Ontologies
    • MarcOnt Ontology
    • MarcOnt Tools
    • Conclusions
  • 11. Real-life problems
    • Heterogeneous systems
    • Identified Problems:
    • Interoperability
    • Format translation
    • Multiple data formats in DL:
    • How to support them?
    • How to translate between them?
    • Who should create mappings?
  • 12. Real-life problems – user’s expectations
    • Searching:
    • Effective and Accurate
    • We want correct and fast answers!!
    • Intuitive and Simple
    • Asking questions should be easy.
    • Meaning
    • Jaguar – a car or an animal?
    • Reasoning
    • Give me articles written by students of X in Galway?
    • Identified problems:
    • Intuitive interface for asking complex queries
  • 13. Real-life problems - summary
    • Digital Libraries should provide:
    • Interoperability
    • Support for many formats
    • Complex search features
    • Intuitive interfaces
  • 14. Semantic Web
  • 15. The Semantic Web – A Brief Introduction
    • Current Web vs. Semantic Web?
      • An extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation. [Tim Berners-Lee]
      • Current Web was designed for humans, and there is little information usable for machines
    • Was the Web meant to be more?
      • Objects with well-defined attributes as opposed to untyped hyperlinks between Internet resources
      • A network of relationships amongst named objects, yielding unified information management tasks
    • What do you mean by “Semantic”?
      • the semantics of something is the meaning of something
      • Semantic Web is able to describe things in a way that computers can understand
  • 16. Semantic Web vs. Current Web
    • Current Web
    • Semantic Web
  • 17. The Semantic Web – What is RDF?
    • Describing things on the Semantic Web
      • RDF (Resource Description Framework)
        • a data format for describing information and resources,
        • the fundamental data model for the Semantic Web
      • Using RDF, we can describe relationships between things like:
        • A is a part of B or
        • Y is a member of Z
        • and their properties (size, weight, age, price, …) in a machine-understandable format
      • RDF's graph-based model makes machine processing straightforward
      • Putting information into RDF files makes it possible for “scutters” or RDF crawlers to search, discover, pick up, collect, analyse and process information from the Web
  • 18. The Semantic Web – What is RDF?
    • A simple RDF example
      • Statement:
      • “Stefan Decker is the creator of the resource (web page) http://www.stefandecker.org”
      • Structure:
        • Resource (subject) http://www.stefandecker.org
        • Property (predicate) http://purl.org/dc/elements/1.1/creator
        • Value (object) “Stefan Decker”
      • Directed graph:
    [Figure: http://www.stefandecker.org --dc:creator--> “Stefan Decker”]
  • 19. The Semantic Web – How RDF can help us?
    • How RDF can help us?
    • identify objects
    • establish relationships
    • express a new relationship → just add a new RDF statement
    • integrate information from different sources → copy all the RDF data together
    • RDF allows many points of view
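A minimal sketch of the ideas on slides 18 and 19, in plain Python rather than a real RDF library: a statement is a (subject, predicate, object) triple, and integrating two sources amounts to copying their triples together, i.e. a set union in which duplicate statements collapse. The second source and its dc:language value are hypothetical, added only for illustration.

```python
from typing import NamedTuple, Set

DC = "http://purl.org/dc/elements/1.1/"


class Triple(NamedTuple):
    subject: str    # the resource being described (a URI)
    predicate: str  # the property (a URI)
    obj: str        # the value: a URI or a literal


def merge(*sources: Set[Triple]) -> Set[Triple]:
    """Integrate sources by copying all their triples together (set union)."""
    merged: Set[Triple] = set()
    for source in sources:
        merged |= source
    return merged


# The statement from slide 18.
source_a = {Triple("http://www.stefandecker.org", DC + "creator", "Stefan Decker")}

# A hypothetical second source: it repeats the statement and adds a new one.
source_b = {
    Triple("http://www.stefandecker.org", DC + "creator", "Stefan Decker"),
    Triple("http://www.stefandecker.org", DC + "language", "en"),
}

graph = merge(source_a, source_b)
for t in sorted(graph):
    print(t.subject, t.predicate, t.obj)
```

Adding a new relationship never requires changing a schema here: it is just one more triple in the set.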
  • 20. Ontologies
    • What is an Ontology?
      • “An ontology is a specification of a conceptualization.” (Tom Gruber, 1993)
    • Ontologies are social contracts
      • Agreed, explicit semantics
      • Understandable to outsiders
      • (Often) derived in a community process
    • Ontology markup and representation languages:
      • RDF and RDF Schema
      • OWL
      • Other: DAML+OIL, EER, UML, Topic Maps, MOF, XML Schemas
  • 21. Components of ontologies
    • Concepts
    • Book
    • Article
    • Author
    • Properties
    • hasPages
    • hasTitle
    • Constraints
    • Cardinality is at least 1
    • Maximum value is 200
    • Axioms
    • Planes can fly
    • People can’t fly
    • Relationships
    • Is a
    • Part of
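The components listed above can be sketched as plain data structures. This is an illustrative assumption, not how RDF Schema or OWL encode them: the Publication superclass, attaching the "at least 1" cardinality to hasTitle and the "maximum value is 200" limit to hasPages are choices made for the example, and axioms are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Concept:
    name: str
    is_a: Optional["Concept"] = None  # "Is a" relationship to a broader concept


@dataclass(frozen=True)
class Property:
    name: str
    domain: Concept                   # concept the property applies to
    min_cardinality: int = 0          # constraint: "cardinality is at least N"
    max_value: Optional[int] = None   # constraint: "maximum value is N"


# Concepts (Publication is a hypothetical superclass added for the example).
Publication = Concept("Publication")
Book = Concept("Book", is_a=Publication)
Article = Concept("Article", is_a=Publication)

# Properties with the slide's example constraints attached (an assumption).
hasTitle = Property("hasTitle", Publication, min_cardinality=1)
hasPages = Property("hasPages", Publication, max_value=200)


def is_subconcept(concept: Optional[Concept], ancestor: Concept) -> bool:
    """Follow the is-a chain upwards from concept."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = concept.is_a
    return False


def violations(concept: Concept, values: Dict[str, list], props: List[Property]) -> List[str]:
    """Check an instance's property values against the constraints that apply to it."""
    problems = []
    for p in props:
        if not is_subconcept(concept, p.domain):
            continue
        vals = values.get(p.name, [])
        if len(vals) < p.min_cardinality:
            problems.append(f"{p.name}: expected at least {p.min_cardinality} value(s)")
        if p.max_value is not None and any(v > p.max_value for v in vals):
            problems.append(f"{p.name}: value exceeds {p.max_value}")
    return problems


print(violations(Book, {"hasTitle": ["Example title"], "hasPages": [250]}, [hasTitle, hasPages]))
# ['hasPages: value exceeds 200']
```

Because Book "is a" Publication, the constraints declared on Publication are checked for a Book instance too.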
  • 22. Ontologies - half-time conclusions
    • Data is not only human readable, it is now also machine readable
    • Machines can realize much more complex tasks (e.g., reasoning)
    • Capturing the meaning of concepts is possible
    • A new look on data storage systems (there are no data structures!!)
    • Advantages
  • 23. Use case scenario
    • Author
    • Title
    • Structured resources:
    • Author
    • Title
    • Data storage allows:
    • Author
    • Title
    • Additional information
    • cannot be stored!!
    [Figure: a resource with Author, Title and Date; Regular Systems keep only Author and Title]
  • 24. Ontology development process
    • Many approaches
    • Different life cycles
    • Continuous process
    • Involves community of users
    • Requires tools for collaboration
    • Tools for ontology development are necessary
    • Development
  • 25. MarcOnt Initiative
    • Motivation:
    • Build a bibliographic ontology for the Jerome Digital Library
    • MarcOnt Initiative goals:
    • Deliver a set of tools for collaborative ontology development
    • Collaboration
    • Tools for domain experts
    • Enable mediation between formats (MMS)
  • 26. MarcOnt Ontology
    • Central point of MarcOnt Initiative
    • Translation and mediation format
    • Continuous collaborative ontology improvement
    • Knowledge from the domain experts
    • Community influence and evaluation
  • 27. MarcOnt Ontology
    • Goals:
    • Capture concepts from the legacy bibliographic formats
      • MARC21, BibTeX, Dublin Core
      • Lattes, ...
    • Create a uniform bibliographic description format for digital libraries.
    • Enable the use of Semantic Web technologies (e.g., reasoning) to improve capabilities of digital libraries
    • Improve interoperability
  • 28. Format Translation Scenario
    • Author:
    • John Smith
    • Date of Birth:
    • 1956-10-15
    • Date of death:
    • 2004-09-10
    [Figure: after translation through Dublin Core, each record keeps “Author: John Smith”, but Date of Birth and Date of death are lost (“??”)]
  • 29. Format Translation Scenario
    • Author:
    • John Smith
    • Date of Birth:
    • 1956-10-15
    • Date of death:
    • 2004-09-10
    [Figure: records translated through Dublin Core still lose the dates (“??”), while records translated through RDF Storage keep “Author: John Smith”, “Date of Birth: 1956-10-15” and “Date of death: 2004-09-10”]
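The translation scenario can be reproduced in a few lines under two stated assumptions: the record field names and the ex: property prefix are hypothetical, and the Dublin Core target is assumed to have a creator element but no slot for the author's dates. Going through Dublin Core loses the dates; going through RDF storage, where every field becomes a triple, keeps them.

```python
from typing import Dict, Set, Tuple

# Hypothetical field names for the record on the slide.
record = {
    "author": "John Smith",
    "author_birth": "1956-10-15",
    "author_death": "2004-09-10",
}

# Assumption: Dublin Core offers dc:creator but no element for the author's dates.
TO_DC = {"author": "dc:creator"}
FROM_DC = {v: k for k, v in TO_DC.items()}


def via_dublin_core(rec: Dict[str, str]) -> Dict[str, str]:
    """Translate to Dublin Core and back; unmapped fields are dropped on the way."""
    dc = {TO_DC[k]: v for k, v in rec.items() if k in TO_DC}
    return {FROM_DC[k]: v for k, v in dc.items()}


def via_rdf_storage(subject: str, rec: Dict[str, str]) -> Dict[str, str]:
    """Store every field as a triple, then rebuild the record from the triples."""
    triples: Set[Tuple[str, str, str]] = {(subject, "ex:" + k, v) for k, v in rec.items()}
    return {p[len("ex:"):]: o for s, p, o in triples if s == subject}


print(via_dublin_core(record))                # only the author survives
print(via_rdf_storage("ex:record1", record))  # all three fields survive
```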
  • 30. MarcOnt Mediation Services
  • 31. MarcOnt Mediation Services
    • Format translation
    [Figure: MarcOnt Mediation Services and the RDF Translator providing interoperability]
  • 32. MarcOnt Ontology in JeromeDL
    • Improvement of searching capabilities
    • Natural Language Processing (NLP)
    • Templates
    • Show me all publications written by students of Decker.
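One way a template could turn the question on slide 32 into a graph query is sketched below. The regular expression, the ex: predicates and the reading of "students of Decker" as "people supervised by Decker" are all assumptions for illustration; the slides do not describe JeromeDL's actual NLP templates.

```python
import re
from typing import List, Optional, Set

# Hypothetical triples; the ex: predicates are illustrative, not JeromeDL identifiers.
TRIPLES = {
    ("ex:paper1", "ex:author", "ex:anna"),
    ("ex:paper2", "ex:author", "ex:bob"),
    ("ex:paper3", "ex:author", "ex:carol"),
    ("ex:anna", "ex:supervisor", "ex:decker"),
    ("ex:bob", "ex:supervisor", "ex:decker"),
    ("ex:decker", "ex:name", "Decker"),
}

# One template: "Show me all publications written by students of <name>."
TEMPLATE = re.compile(r"show me all publications written by students of (\w+)\.?", re.IGNORECASE)


def subjects(predicate: str, obj: str) -> Set[str]:
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for s, p, o in TRIPLES if p == predicate and o == obj}


def answer(question: str) -> Optional[List[str]]:
    match = TEMPLATE.fullmatch(question.strip())
    if match is None:
        return None  # the question does not fit this template
    teachers = subjects("ex:name", match.group(1))
    students = {s for t in teachers for s in subjects("ex:supervisor", t)}
    return sorted(p for st in students for p in subjects("ex:author", st))


print(answer("Show me all publications written by students of Decker."))
# ['ex:paper1', 'ex:paper2']
```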
  • 33. MarcOnt Portal
    • Collaborative ontology development.
    • Portal provides:
    • Suggestions
    • Annotations
    • Versioning
    • Ontology editor
  • 34. MarcOnt Portal
    • On-line ontology editing
    • Visualization of ontologies
  • 35. MarcOnt Portal
    • Comparing versions of ontologies
  • 36. MarcOnt Initiative Roadmap
    • Lattes – CV platform used in Brazil
    • Release of MarcOnt draft ontology
    • Digital Rights Management
    • Sharing issues
    • MarcOntX agent – automatic integration of concepts from Digital Libraries
  • 37. MarcOnt Initiative summary
    • MarcOnt Initiative goals:
    • Create a framework for collaborative ontology development
    • Provide domain experts with tools to share their knowledge
    • Offer tools for data mediation between different data formats
    • Develop MarcOnt bibliographic ontology
    • Create a community of users (domain experts)
  • 38. Conclusions
    • Ontologies:
    • can improve the most important goal of digital libraries – searching for information
    • facilitate interoperability
    • capture much more information (metadata) than existing systems
    • are the agreement of people (domain experts)
    • need tools for collaborative development and community of users
    • are the future of Digital Libraries?
  • 39. JeromeDL: Building a Semantic Digital Library
      • Tomasz Woroniecki
      • [email_address]
  • 40. Outline of the presentation
    • Introduction to Semantic Digital Libraries
    • Overview of JeromeDL
    • Architecture of JeromeDL
    • Working with JeromeDL
    • Demo
  • 41. Social Semantic Digital Library
    • A library stores and provides access to resources (books)
    • Qualified staff updates catalogues and helps users
  • 42. Social Semantic Digital Library
    • Machine-readable resources
    • Full-text index improves searching
    • Easy access
    • Availability
  • 43. Social Semantic Digital Library
    • Resources are accessible by machines, not only with machines
    • Metadata is rich and extensible
    • Searching reflects meaning of terms
    • RDF is a standard for representing information
    • Not just resources but also knowledge is shared
  • 44. Social Semantic Digital Library
    • Involves the community in sharing knowledge
    • Utilizes social network in searching
    • Allows for comments, blogs, shared bookmarks
    • Easy tagging
  • 45. Evolution of Libraries
    • Library: organized collection
    • Digital Library: online, easy searching with a full-text index
    • Semantic Digital Library: accessible by machines, not only with machines
    • Social Semantic Digital Library: involves the community in sharing knowledge
  • 46. Semantic Digital Library
    • Semantic digital libraries
      • integrate information based on different metadata, e.g.: resources, user profiles, bookmarks, taxonomies
      • provide interoperability with other systems (not only digital libraries)
      • deliver more robust, user-friendly and adaptable search and browsing interfaces empowered by semantics
  • 47. JeromeDL - Motivations
    • Support for different kinds of bibliographic metadata, like Dublin Core, BibTeX and MARC21, at the same time.
      • Making use of existing rich sources of bibliographic descriptions (like MARC21) created by humans.
    • Supporting users and communities:
      • users have control over their profile information;
      • community-aware profiles are integrated with bibliographic descriptions
      • support for community generated knowledge
    • Delivering communication between instances:
      • P2P mode for searching and user authentication
      • Hierarchical mode for browsing
  • 48. JeromeDL – Social Semantic Digital Library
    • JeromeDL fulfills requirements of:
    • Librarians
      • precise annotations
      • rich metadata
    • Researchers
      • easy publishing
      • searching related topics
    • Average users
      • efficient search and browsing
      • online collaboration
  • 49. JeromeDL - Architecture
  • 50. Ontologies in JeromeDL
  • 51. Using JeromeDL
    • Uploading a resource
      • provide title, abstract, author etc.
      • provide structure of the resource (e.g., chapters)
      • choose domains of the subject
      • choose keywords for the resource
      • set additional properties
      • upload digital parts of the resource
  • 52. Using JeromeDL
  • 53. Using JeromeDL
    • An administrator either approves or rejects a published resource
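The publishing workflow on slides 51 to 53 can be sketched as a small state machine. The state names (pending, approved, rejected), the check that a title and an author are present, and the Library class are hypothetical; JeromeDL's real data model is not described in these slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Resource:
    title: str
    authors: List[str]
    abstract: str = ""
    keywords: List[str] = field(default_factory=list)
    status: str = "pending"  # hypothetical states: pending -> approved | rejected


class Library:
    def __init__(self) -> None:
        self.resources: List[Resource] = []

    def publish(self, resource: Resource) -> None:
        """A user uploads a resource; it waits for an administrator's decision."""
        if not resource.title or not resource.authors:
            raise ValueError("a title and at least one author are required")
        resource.status = "pending"
        self.resources.append(resource)

    def review(self, resource: Resource, approve: bool) -> None:
        """An administrator either approves or rejects a pending resource."""
        if resource.status != "pending":
            raise ValueError("only pending resources can be reviewed")
        resource.status = "approved" if approve else "rejected"

    def visible_titles(self) -> List[str]:
        """Only approved resources are shown to readers (an assumption)."""
        return [r.title for r in self.resources if r.status == "approved"]


lib = Library()
thesis = Resource("Example thesis", ["A. Author"], keywords=["semantic web"])
draft = Resource("Example draft", ["B. Author"])
lib.publish(thesis)
lib.publish(draft)
lib.review(thesis, approve=True)
lib.review(draft, approve=False)
print(lib.visible_titles())  # ['Example thesis']
```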
  • 54. Sharing bookmarks
  • 55. JeromeDL for a regular user
    • Browsing resources
      • by type, author, keyword, domain
    • Downloading the resource and its bibliographic description in various formats
    • Subscribing to RSS feeds
    • Searching
      • simple, advanced, distributed, semantic
  • 56. JeromeDL for a regular user
  • 57. Summary
    • An easy solution for putting resources online
    • A community around your repository
    • Support for many languages
    • Integration with Bibster and OpenSearch protocols
    Visit www.jeromedl.org
  • 58. Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet Future of Digital Libraries http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit
  • 59. Looking into the Future of Irish Digital Libraries ? ?
  • 60. Building the Future
    • The future Internet, semantic or social or both, will not emerge on its own; we need to build it
  • 61. Building the Future
    • Digital libraries are an important part of the Internet
  • 62. Building the Future
    • Libraries should continue to drive the changes, not only follow
  • 63. Building the Future
    • OnNeM - IBM Ontological Network Miner and its applications to semantic social networks
    • BRICKS Project – Building Resources for Integrated Cultural Knowledge Services
  • 64. IBM CAS Dublin / LanguageWare group Ontological Network Miner and its applications to models of social networks and semantics
      • Alexander Troussov, Mikhail Sogrin, John Judge
  • 65. Agenda
    • Ontological Network Miner tool (project Galaxy)
      • As generic tool to perform elements of soft clustering and fuzzy inference on semantic networks
    • Applications of Galaxy
      • to ontology-based semantic analysis of texts
        • Semantic tagging, term disambiguation based on the global context
      • Galaxy applications to folksonomies
        • Community detection/Expertise location, …
      • Applications to unified models of semantic social networks
    • Research cooperation
  • 66. Ontological Network Miner (Galaxy)
    • A generic tool to perform elements of soft clustering and fuzzy inference on semantic networks
    • Ongoing project based on the work we have done for the EU 6th Framework integrated project Nepomuk
  • 67.
    • Applications to metadata generation
  • 68. Applications to metadata generation
    • Currently the semantic web relies on semantic annotation mostly done manually by humans
    • Working in the EU 6th Framework project Nepomuk (which aims to build a social semantic desktop), we at IBM Dublin developed a tool that automates metadata creation:
      • Automatic ontology-based conceptual tagging (central concepts of the text with respect to the given lexico-semantic resource)
        • Text which mentions Mulhuddart, Lansdowne and Clontarf is probably about Dublin/Ireland/Europe/Earth; this can be inferred from geographical relations such as Mulhuddart “is-part-of” Dublin
      • Disambiguation of terms
        • Based on the ontological knowledge from the corresponding resource (Jaguar – a car or an animal? Jaguar, car, animal, pet, …)
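Disambiguation based on the global context can be approximated by a simple overlap count, similar in spirit to the Lesk algorithm: choose the sense whose related concepts share the most terms with the rest of the text. This is a swapped-in textbook technique on a toy, hypothetical resource, not the IBM Dublin tool.

```python
from typing import Dict, Iterable, Set

# Toy lexico-semantic resource (hypothetical): senses of "jaguar" and related concepts.
SENSES: Dict[str, Dict[str, Set[str]]] = {
    "jaguar": {
        "Jaguar (car)": {"car", "vehicle", "engine", "manufacturer"},
        "jaguar (animal)": {"animal", "cat", "predator", "pet", "rainforest"},
    },
}


def disambiguate(term: str, context: Iterable[str]) -> str:
    """Return the sense whose related concepts overlap most with the other terms
    in the text; on a tie, the sense listed first wins."""
    senses = SENSES[term]
    context_set = set(context)
    return max(senses, key=lambda sense: len(senses[sense] & context_set))


print(disambiguate("jaguar", ["pet", "animal", "zoo"]))   # jaguar (animal)
print(disambiguate("jaguar", ["engine", "car", "road"]))  # Jaguar (car)
```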
  • 69. Automatic tagging based on concept mentions
    [Figure: term mentions in the TEXT are mapped to concepts in the NETWORK OF CONCEPTS, and a “focus” concept is found]
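Finding a "focus" concept can be illustrated with the Mulhuddart/Lansdowne/Clontarf example from slide 68. The sketch assumes a toy is-part-of hierarchy and a simple rule that is not IBM's algorithm: each mention gives its concept a score of 1 and passes a halved score up to each ancestor in turn, and the highest-scoring concept is the focus.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy is-part-of hierarchy (place names from slide 68; the edges are assumed).
PART_OF: Dict[str, str] = {
    "Mulhuddart": "Dublin",
    "Lansdowne": "Dublin",
    "Clontarf": "Dublin",
    "Dublin": "Ireland",
    "Ireland": "Europe",
    "Europe": "Earth",
}


def rank_concepts(mentions: List[str], decay: float = 0.5) -> List[Tuple[str, float]]:
    """Score each mentioned concept and its ancestors; the score shrinks by
    `decay` at every step up the is-part-of chain."""
    scores: Dict[str, float] = defaultdict(float)
    for concept in mentions:
        weight = 1.0
        while concept is not None:
            scores[concept] += weight
            concept = PART_OF.get(concept)
            weight *= decay
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = rank_concepts(["Mulhuddart", "Lansdowne", "Clontarf"])
print(ranking[0])  # ('Dublin', 1.5): the focus concept
```

No single district wins because each is mentioned once, while Dublin collects a share from all three mentions.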
  • 70. DEMO (Lotusphere 2007)
    • Run eclipse.exe
    • Open lotusphere_demo.config.xml located in subfolder data
    • Have a look at the underlying personal information management ontology
      • people, organisations, projects,
    • Open text: email1.anno
    • Text is processed on the fly, terms are disambiguated, central concepts are shown in the upper-right window
      • Why US? Because most found concepts are people, and during disambiguation it was established that most likely referents of (ambiguous) names are located in US
      • Let us remove first line with two names
        • The text now has fewer names. Instead of people, other (abstract) concepts now play a more prominent role. Because of this (after a small delay caused by Eclipse, not by the performance of our system) US disappears as the top concept
  • 71. What is Ontological Network Miner?
    • Text analytics demo shown before has applications to:
      • Context dependent smart tags
      • Metadata generation
    • Although text processing is a complex process
      • involving mapping from text to concepts and usage of empirics specific to certain properties of the discourse
    • at the heart of the processing is clustering on the graph of concepts
      • as shown by the animation, in which the wide orange area becomes smaller after “magical” shrinking
    • This clustering is provided by IBM Dublin Ontological Network Miner
      • Codenamed OnNeM in Nepomuk project
  • 72. What exactly does the Ontological Network Miner do?
    • One algorithm (a blend of soft clustering & fuzzy inference)
    • Depending on the parameters, this algorithm provides
      • “ Generalisation” of the model
        • Output has fewer nodes than the input
      • “ Expansion” of the model
        • Which might be used for query expansion:
          • Query “nutrition” + “science” is expanded into a properly ranked list:
            • nutritionist, dietologist, nutritional, scientific, ..
    • Our customers and partners can tune the algorithm for specific tasks using intuitively clear parameters.
  • 73. Tuning Galaxy
    • Galaxy uses a data-driven algorithm. More importantly, tuning can be done by a domain specialist (not necessarily a researcher or software developer): tuning means “telling” Galaxy which properties of the underlying semantic network are relevant to a particular task:
      • For example, in application to geotagging the user might specify that Galaxy favour geographical locations with bigger populations, and, in addition, favour popular resorts
      • Using WordNet – specify that Galaxy must favour hypernymy-hyponymy relations and disfavour meronymy-holonymy relations
    • Researchers (IBMers and CAS scientists) also have the opportunity to work with us on “fine-tuning” the algorithm
      • For example, to improve usage of graph-metrics such as in-/out- degree of nodes
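To make the idea of tuning concrete, the sketch below uses generic spreading activation over typed edges, where a per-relation weight plays the role of the tuning parameter described for WordNet (favour hypernymy-hyponymy, disfavour meronymy-holonymy). The slides do not publish Galaxy's algorithm; the edge list, the weights and the number of steps are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical typed edges (source, relation, target), loosely WordNet-like.
EDGES: List[Tuple[str, str, str]] = [
    ("jaguar", "hypernym", "big cat"),
    ("big cat", "hypernym", "feline"),
    ("feline", "hypernym", "mammal"),
    ("jaguar", "meronym", "paw"),
]


def spread(seeds: List[str], relation_weights: Dict[str, float], steps: int = 2) -> List[Tuple[str, float]]:
    """Spread activation from the seed nodes along typed edges, in both directions.
    relation_weights is the tuning knob: a higher weight favours a relation type,
    and a weight of 0 (or a missing entry) ignores it."""
    activation: Dict[str, float] = defaultdict(float, {s: 1.0 for s in seeds})
    frontier = dict(activation)
    for _ in range(steps):
        incoming: Dict[str, float] = defaultdict(float)
        for node, value in frontier.items():
            for src, rel, dst in EDGES:
                weight = relation_weights.get(rel, 0.0)
                if weight == 0.0:
                    continue
                if src == node:
                    incoming[dst] += value * weight
                elif dst == node:
                    incoming[src] += value * weight
        for node, value in incoming.items():
            activation[node] += value
        frontier = dict(incoming)
    return sorted(activation.items(), key=lambda kv: kv[1], reverse=True)


# Favour hypernymy, mildly allow meronymy.
print([n for n, _ in spread(["jaguar"], {"hypernym": 0.5, "meronym": 0.1})])
# Ignore meronymy entirely: "paw" drops out.
print([n for n, _ in spread(["jaguar"], {"hypernym": 0.5})])
```

Changing only the weights, not the code, changes which concepts surface, which is the kind of tuning a domain specialist could do.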
  • 74.
    • Applications to folksonomy systems (Del.icio.us, IBM’s Dogear, …)
  • 75. Folksonomies as ontological networks
    [Figure: People, Documents and Tags connected through instances of tagging]
  • 76. Why a “generic” ontological network miner is needed:
    • Objects of interest might be wired into one unified model of lexicon, semantic and social networks
    • For example, the network depicted on the previous slide can be augmented with new entities and new relations
      • One can add relations between participants, or add new people into consideration
      • Semantic relations between tags might be added manually, or generated automatically based on morphological similarity of words, proximity in WordNet, etc.
      • Keywords and other metainformation about documents might be wired into the network
        • Tags in folksonomies are created by humans. Keywords (preexisting in documents or extracted by text processing) and their relations to documents and tags might be added to augment folksonomies.
          • Dogear can recommend tags for a new document that nobody has tagged yet, in a style accepted in the community
  • 77. Why a “generic” engine like Galaxy is needed: (cont)
    • Unified model of lexicon, semantic and social networks gives more context to make the right decisions in Community Detection, Community Structure Analysis, Metadata Sharing & Recommendations, etc
    • However, data network becomes quite intricate and irregular, and only generic, scalable and high-performing ontological network miners (like Galaxy) are up to the job
    • Galaxy is a generic technique, which can efficiently work on huge networks with complex topology
      • Most tasks on MeSH and WordNet are done in 200 ms
    • Galaxy has native potential for explanatory module
      • “This person might help you to understand this document because he frequently used tags popular for this document”
  • 78. OnNeM can handle networks like this:
    • People, Documents, Tags and instances of tagging
    • New people and additional relations between them
    • Relations between tags: semantic proximity, misspellings, translations, WordNet, …
    • Relations between documents: …
    • New objects: e.g., keywords from texts might be related to documents and tags
  • 79.
    • Applications to Semantic Social Networks & Knowledge Exchange
  • 80. What problems Galaxy can address
    • Galaxy could be used to uniformly address many problems in Semantic Social Networks & Knowledge Exchange :
      • Tag recommendation in folksonomies; Community detection; Centrality problem in social network analysis; Expertise location…
    • How?
      • Galaxy is a generic technique that takes as input a function on the nodes of a semantic network and transforms it into another function according to the parameters. To simplify explanations, instead of input/output functions we talk about an input set of nodes and a ranked output set of nodes
      • To create solution for a particular task
        • A set of input nodes must be chosen
        • Parameters of the algorithm must be established
        • Output set must be interpreted according to the task
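The three-step recipe above (choose input nodes, set parameters, interpret the output) can be illustrated for expertise location on a folksonomy. In this hypothetical sketch the input is a set of tag nodes, the parameters are the number of propagation steps and a decay factor, and the output is interpreted by keeping only people; the tagging data and the propagation rule are assumptions, not Galaxy itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical tagging instances: (person, document, tag).
TAGGINGS: List[Tuple[str, str, str]] = [
    ("alice", "doc1", "ontology"),
    ("alice", "doc2", "rdf"),
    ("bob", "doc2", "rdf"),
    ("bob", "doc3", "java"),
    ("carol", "doc3", "java"),
]


def build_graph(taggings: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[str]]:
    """Undirected graph in which each tagging instance links person, document and tag."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for person, doc, tag in taggings:
        for a, b in ((person, doc), (person, tag), (doc, tag)):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def propagate(graph: Dict[str, Set[str]], inputs: Iterable[str],
              steps: int = 2, decay: float = 0.5) -> Dict[str, float]:
    """Each step, every node passes decay * its new score, split evenly, to its neighbours."""
    scores: Dict[str, float] = defaultdict(float, {n: 1.0 for n in inputs})
    frontier = dict(scores)
    for _ in range(steps):
        incoming: Dict[str, float] = defaultdict(float)
        for node, value in frontier.items():
            for neighbour in graph[node]:
                incoming[neighbour] += value * decay / len(graph[node])
        for node, value in incoming.items():
            scores[node] += value
        frontier = dict(incoming)
    return scores


def locate_experts(taggings: List[Tuple[str, str, str]], tags: List[str]) -> List[str]:
    graph = build_graph(taggings)
    scores = propagate(graph, tags)   # 1. input nodes = tags; 2. parameters: steps, decay
    people = {p for p, _, _ in taggings}
    ranked = [p for p in people if scores.get(p, 0.0) > 0.0]
    return sorted(ranked, key=lambda p: (-scores[p], p))  # 3. keep only people, best first


print(locate_experts(TAGGINGS, ["rdf"]))       # ['alice', 'bob']
print(locate_experts(TAGGINGS, ["ontology"]))  # ['alice']
```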
  • 81. IBM social software
    • “ the company is serious about dominating social networking for the enterprise”
      • Cooking Up a Social Networking Storm With IBM Labs, March 30, 2007
    • IBM Social Software
      • Dogear
        • Dogear is a social-tagging service for resources such as public URLs, company-internal URLs, and other company internal documents (e.g., Wiki pages, Domino documents, etc.)
      • Bluepages+1
        • is an enhanced version of IBM's online employee directory. Among its enhancements is the ability for one person to apply a tag directly to another person’s directory page.
      • Blog Central
        • Blog Central is an internal blogging service, open to any employee. The Blog Central data structures provide for a separate list of tags for each blog and for each entry within each blog.
      • Activities
        • Activities is a web-based version of ActivityExplorer, an activity-centric collaboration service in which teams may create a collection of diverse objects in a tree-like structure consisting of a root “activity” and its daughter components.
  • 82. Our research plans to exploit Galaxy:
    • We are investigating a wide range of applications in Community Detection, Community Structure Analysis, Metadata Sharing & Recommendations Enhanced with Social Reputation Mechanisms
    • Based on our understanding of potential IBM needs, our commitments for European research projects, and our vision of the potential of Galaxy, we are looking forward to the creation of the following functionalities:
      • Community Support
        • Given a peer: Search for its neighbors within a community
        • Given the entire collection: Identify trends and threads (e.g., tags becoming popular, etc.)
      • Metadata Sharing & Recommendations
        • Given a file with some attached metadata:
          • Recommend additional annotations
          • Recommend similar files
        • Given one or more tags and/or keywords:
          • Locate peers with expertise in the described areas
  • 83. Research collaboration
    • Create semantic social networks of your interest …
    • in the format which can be used by Galaxy
      • simple XML format
    • Design scenario and work with us on tuning parameters of Galaxy for the tasks in your scenario
    • Contacts
      • Alexander Troussov, CAS Chief Scientist, [email_address]
      • Marie Wallace, LanguageWare manager, [email_address]
      • Brian O’Donovan, CAS Program Director, [email_address]
    • IBM CAS Dublin https://www.ibm.com/ibm/cas/sites/dublin/
    • LanguageWare http://www.ibm.com/software/globalization/topics/languageware/index.jsp
    • NEPOMUK http://nepomuk.semanticdesktop.org/
  • 84.
    • Questions?
  • 85. BRICKS Project Predrag Knežević Fraunhofer IPSI Institute Darmstadt, Germany [email_address]
  • 86. What is BRICKS?
    • A software infrastructure for building digital library networks
      • Transparent access to distributed resources
      • Multilinguality
      • Easy installation & maintenance
    • A set of end-user applications
      • Network & content management
      • Web 2.0 Tagging/Annotations
      • Domain specific applications
    • A business model
      • Open Source, Platform Independent
      • Low cost infrastructure
      • User communities → sustainability
  • 87. BRICKS
    • Sustainability
      • User Communities
      • Open Source
    • Applications
      • User applications built on top of the foundation
      • User services can become foundation services
    • Foundation/Infrastructure
      • Decentralized Storage
      • Content & Metadata Management
      • Semantic Retrieval
      • Security/DRM
  • 88. BRICKS Architecture
    • A decentralized P2P network
      • Avoid central coordination
      • Highly Scalable, increased reliability
      • Minimized maintenance costs
    • Each P2P Node is a set of SOA components
      • Web Service Interface
      • Platform Independent
      • Flexible Composition
    • Components for
      • Storing, accessing and protecting digital objects
      • (Semantic) search & browsing
      • P2P communication
  • 89. Accessing Data
  • 90. A Look into a BNode
  • 91. Features
    • Application development in any language with a good Web-service support
    • Metadata
      • Support for various schemas
      • Indexed locally and also published in a decentralized index
    • Annotations
      • Support for various media types (text, images, audio, video)
      • Various supported types (text, audio, video, spatial, temporal)
    • Content
      • Can be stored outside of BNode
      • Internally content can be managed in various binary and structured (XML) formats
      • Organized into collections
      • Location transparent for applications
    • Search
      • Simple, advanced, ontology-based
      • Cross-language support
      • Addresses all available content
  • 92. Collection Manager
    • Single access point for all content and metadata related operations (local and remote)
    • Physical Collection
      • Similar to folder/directory hierarchy in a file system
      • Bound to a single BNode
      • Each digital content object belongs to exactly one collection
    • Logical Collection
      • Virtual folder for organizing content items independent of their physical location
      • Links to content items from various physical collections on different BNodes
      • A content item might belong to many of them
    • Stored queries, similar to database views
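A minimal sketch of the collection rules above: each item belongs to exactly one physical collection bound to one BNode, logical collections only hold links that may span BNodes, and a stored query is saved and re-evaluated on demand like a database view. Class and method names are hypothetical and do not reflect the BRICKS API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple


@dataclass
class PhysicalCollection:
    name: str
    bnode: str                                     # bound to a single BNode
    items: Set[str] = field(default_factory=set)


@dataclass
class LogicalCollection:
    name: str
    links: Set[Tuple[str, str]] = field(default_factory=set)  # (bnode, item id), any BNode


class CollectionManager:
    """Hypothetical single access point for collection operations."""

    def __init__(self) -> None:
        self.physical: Dict[str, PhysicalCollection] = {}
        self.logical: Dict[str, LogicalCollection] = {}
        self.metadata: Dict[str, dict] = {}
        self.stored_queries: Dict[str, Callable[[dict], bool]] = {}
        self._owner: Dict[str, PhysicalCollection] = {}

    def add_item(self, item_id: str, collection: str, metadata: dict) -> None:
        # Each content object belongs to exactly one physical collection.
        if item_id in self._owner:
            raise ValueError(f"{item_id} already belongs to {self._owner[item_id].name}")
        target = self.physical[collection]
        target.items.add(item_id)
        self._owner[item_id] = target
        self.metadata[item_id] = metadata

    def link(self, item_id: str, logical: str) -> None:
        # A content item may be linked from any number of logical collections.
        bnode = self._owner[item_id].bnode
        self.logical.setdefault(logical, LogicalCollection(logical)).links.add((bnode, item_id))

    def run_stored_query(self, name: str) -> List[str]:
        # Like a database view: the query is saved, its result is computed on demand.
        predicate = self.stored_queries[name]
        return sorted(i for i, md in self.metadata.items() if predicate(md))


cm = CollectionManager()
cm.physical["coins"] = PhysicalCollection("coins", "bnode-a")
cm.physical["maps"] = PhysicalCollection("maps", "bnode-b")
cm.add_item("c1", "coins", {"century": 2})
cm.add_item("m1", "maps", {"century": 17})
cm.link("c1", "exhibition")
cm.link("m1", "exhibition")
cm.stored_queries["ancient"] = lambda md: md["century"] < 5
print(sorted(cm.logical["exhibition"].links))  # [('bnode-a', 'c1'), ('bnode-b', 'm1')]
print(cm.run_stored_query("ancient"))          # ['c1']
```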
  • 93. Content Manager
    • Two ways to handle Content in BRICKS
      • stored locally at site of a member party, accessed via URL
      • stored within BRICKS
    • Based on Java Content Repository (JCR)
    • Provides a meta-content model
      • Re-use of existing content models
      • Use standard models
  • 94. Metadata Manager
    • Metadata descriptions → RDF
      • Suitable for any application scenario
      • Express Relationships between objects
      • React to changes without changing the model
    • Schema definitions → OWL
      • No fixed schema
      • Extensible (e.g. Application Profiles)
      • Semantic concepts instead of schematic structures
    • SPARQL
      • Metadata queries over ontology concepts
      • Queries for graph patterns
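SPARQL answers queries by matching graph patterns against RDF triples. The sketch below shows only that core idea, a basic graph pattern matcher in plain Python with "?"-prefixed variables; it is not a SPARQL engine or the BRICKS Metadata Manager, and the sample triples are hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical metadata triples.
TRIPLES: List[Triple] = [
    ("ex:item1", "dc:subject", "coins"),
    ("ex:item1", "dc:creator", "Museum A"),
    ("ex:item2", "dc:subject", "maps"),
    ("ex:item2", "dc:creator", "Museum B"),
    ("ex:item3", "dc:subject", "coins"),
]


def match(patterns: List[Triple], triples: List[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding that satisfies all triple patterns.
    Terms that start with '?' are variables; anything else must match exactly."""
    binding = binding if binding is not None else {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable either binds now or must agree with its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# Roughly: SELECT ?item ?who WHERE { ?item dc:subject "coins" . ?item dc:creator ?who }
query = [("?item", "dc:subject", "coins"), ("?item", "dc:creator", "?who")]
print(list(match(query, TRIPLES)))  # [{'?item': 'ex:item1', '?who': 'Museum A'}]
```

ex:item3 matches the first pattern but has no creator, so the shared ?item variable filters it out.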
  • 95. Annotation Management
    • Rich model
      • Supported fragment types: “Text fragment”, “Time fragment”, “Rectangle”, “Circle”, “Point”, “Polygon” and “Polyline”
      • Supported annotation types: “Structured Annotation”, “Association”, “Text annotation” and “Symbol Annotation”
      • Annotation type “association” supports n:m relations
    • Support of versioning
    • Annotation of complete objects and of fragments of objects
    • Supports annotation of multiple objects
    13/03/2007
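The annotation model can be sketched with a few data classes, assuming a subset of the fragment types (text, time and rectangle), an (object id, fragment) pair as the annotation target where no fragment means the complete object, and a simple version history. All names are hypothetical and do not reflect the BRICKS schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass(frozen=True)
class TextFragment:
    start: int  # character offsets
    end: int


@dataclass(frozen=True)
class TimeFragment:
    start_s: float  # seconds into an audio or video object
    end_s: float


@dataclass(frozen=True)
class Rectangle:
    x: int
    y: int
    width: int
    height: int


Fragment = Union[TextFragment, TimeFragment, Rectangle]
# (object id, fragment); a fragment of None means the complete object.
Target = Tuple[str, Optional[Fragment]]


@dataclass
class Annotation:
    kind: str              # e.g. "text", "structured", "symbol"
    body: str
    targets: List[Target]  # one annotation may cover several objects
    version: int = 1
    history: List[Tuple[int, str]] = field(default_factory=list)

    def revise(self, new_body: str) -> None:
        """Keep the previous version before changing the body."""
        self.history.append((self.version, self.body))
        self.version += 1
        self.body = new_body


@dataclass
class Association:
    """An n:m link between sets of (possibly fragmentary) objects."""
    label: str
    sources: List[Target]
    targets: List[Target]


note = Annotation("text", "Portrait on the obverse", [("coin-17", Rectangle(10, 10, 50, 50))])
note.revise("Portrait of an emperor on the obverse")
link = Association("found near", [("coin-17", None), ("coin-18", None)], [("map-3", TextFragment(0, 42))])
print(note.version, note.history)  # 2 [(1, 'Portrait on the obverse')]
```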
  • 96. Security Manager
    • Transparently invoked by the Framework
      • any service call is checked
    • Context-aware policies based on RBAC (via XACML rules), supporting Roles, Groups, at DLObject level
    • Permission declaration through Javadoc @tags
    • Federated identity is managed through an adapted version of OpenSAML
    • Reputation-based Trust calculation integrated
    • Web-based GUI for Security configuration
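The idea of checking every service call against declared permissions can be sketched with a Python decorator, which stands in here for the Javadoc @tags. This is a simplified role-based check with groups granting roles, not XACML or OpenSAML; the policy entries, roles and function names are hypothetical.

```python
import functools
from typing import Callable, Dict, Set, Tuple

# Hypothetical policy: (role, permission, object id); "*" matches any DLObject.
POLICY: Set[Tuple[str, str, str]] = {
    ("curator", "read", "*"),
    ("curator", "write", "*"),
    ("visitor", "read", "coin-17"),
}
GROUP_ROLES: Dict[str, Set[str]] = {"museum-staff": {"curator"}}  # groups grant roles


class AccessDenied(Exception):
    pass


def roles_of(user: dict) -> Set[str]:
    roles = set(user.get("roles", ()))
    for group in user.get("groups", ()):
        roles |= GROUP_ROLES.get(group, set())
    return roles


def requires(permission: str) -> Callable:
    """Declare the permission a service call needs; the check runs on every call."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def checked(user: dict, object_id: str, *args, **kwargs):
            allowed = any((role, permission, obj) in POLICY
                          for role in roles_of(user) for obj in (object_id, "*"))
            if not allowed:
                raise AccessDenied(f"{user['name']} may not {permission} {object_id}")
            return func(user, object_id, *args, **kwargs)
        return checked
    return decorator


@requires("read")
def get_object(user: dict, object_id: str) -> str:
    return f"contents of {object_id}"


@requires("write")
def update_object(user: dict, object_id: str, data: str) -> str:
    return f"{object_id} updated"


visitor = {"name": "visitor1", "roles": ["visitor"]}
staff = {"name": "staff1", "groups": ["museum-staff"]}
print(get_object(visitor, "coin-17"))          # contents of coin-17
print(update_object(staff, "coin-17", "..."))  # coin-17 updated
```

The service functions themselves contain no security code; the check is applied transparently by the decorator.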
  • 97. Digital Rights Management
    • DRM Component
      • Support for licenses based on MPEG-21 REL license declaration standard
      • Generic API for the integration of commercial DRM systems
    • Watermarking
      • Open-source watermarking tool for images
      • other tools can be integrated
    • BRICKS Store web application for commercial content
    • Creative Commons support for other content in BRICKS
  • 98. BRICKS APPLICATIONS
  • 99. Application: BRICKS Workspace
    • What does it demonstrate?
      • a web application (thin client) accessing BRICKS Foundation services
      • Web 2.0 image annotations
      • Reference application
    • Primary customers?
      • general end-users (citizens)
      • application developers
    • Technology
      • Struts based interface to the BCH
    • Live demo at http://saturn.researchstudio.at:8090/workspace
  • 100. Application: BRICKS Desktop
    • What does it demonstrate?
      • a rich client application accessing BRICKS Foundation services
      • direct access to the BCHN
    • Primary customers?
      • expert end-users (researchers, educators)
      • application developers
    • Technology
      • Eclipse-based rich-client interface
    • Download at http://develop.bricksfactory.org/projects/desktop
  • 101. Application: Annotation Tool
    • What does it demonstrate?
      • Tool which allows end-users to annotate images
      • Creation of annotation threads
      • Supervised Annotations
    • Primary customers?
      • end-users
      • Institutions with large image collections
    • Technology
      • Web Application
  • 102. Application: Online Exhibition Authoring Tool
    • What does it demonstrate?
      • Creating and publishing online exhibitions using content that is available in the BRICKS network
    • Primary customers?
      • expert end-users (curators)
    • Technology
      • Web Application
    • Live demo at http://livingmemory.researchstudio.at/
  • 103. Application: Archeological Finds Identifier
    • What does it demonstrate?
      • a web application for comparing found objects (e.g. ancient coins) with objects from reference collections
      • Application of complex domain ontology (CIDOC-CRM)
      • Map visualization of GIS-Metadata
    • Primary customers?
      • Museum curators, archaeologists, students, amateurs
    • Technology
      • Struts-based interface
    • Live Demo at http://finds.brickscommunity.org:8091/findsidentifier/index.do
  • 104. BRICKS Demo Store
    • What does it demonstrate?
      • Purchasing digital goods
      • License maintenance and proofing
    • Primary customers?
      • Content providers
    • Technology
      • Based on OFBiz
    • Live demo at http://brstore.metaware.it:9080/ecommerce/control/main
  • 105. References
    • BRICKS Community Web Site ( http://www.brickscommunity.org )
    • BNode Release Downloads ( http://foundation.bricksfactory.org )
    • BRICKSforge ( http://develop.bricksfactory.org )
    • BRICKS Developer Community ( http://dev.brickscommunity.org )
  • 106. Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet Digital Libraries in Ireland http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit
  • 107. Building the Future
    • IVRLA - The Irish Virtual Research Library and Archive Project - an infrastructure for humanities research.
    • OJAX – A Web 2.0 Search User Interface
    • S3B - With a Little Help from My Friends: Social Semantic Search and Browsing
  • 108. The Irish Virtual Research Library and Archive Project - an infrastructure for humanities research.
  • 109. Outline
    • Quick Overview
    • Digitisation Processes
    • Repository Development
    • Content Models
    • IVRLA Deployment
    • Observations
  • 110. IVRLA Positioning
    • PRTLI-funded project
    • Component of UCD Humanities Institute of Ireland and based in UCD Library
    • Supporting research through offering access to digitised content from participating primary source repositories
    • Direct research into digitisation and digital repositories
    • Developing and promoting added value tools and services
  • 111. IVRLA Deliverables
    • Body of digitised content
    • Functioning repository prototype with scalable infrastructure
    • Comprehensive report including regulatory and financial issues
    • Body of corporate knowledge & expertise
    • Centre of excellence
    • Proof of concept
  • 112. The vision thing
  • 113. Digital content repositories should…
    • Support the creation and publication of new forms of “information units”
    • Integrate with the processes (e.g., workflows) of research, collaboration, and scholarly communication
    • Enable knowledge integration: capture semantic and factual relationships among information entities
    • Promote information re-use and contextualization
    • Facilitate collaborative activity and capture information that is created as a byproduct of it
    • Capture and maintain the complex structural, semantic, provenance, and administrative relationships among digital resources*
    * Sandy Payette, Sydney 2006.
  • 114. Digitisation and Cataloguing Processes
  • 115. Image based Digitisation Components
    • Apple PowerMac G5 running Kodak oXYgen Scan
    • Kodak IQSmart 2
    • Adobe Photoshop CS2
  • 116. Audio Digitisation Components
    • Quadriga system
    • Lake People ADC and DAC
    • Revox 1/4 inch tape player
  • 117. Files and Formats
    • Scanned Material (text and images)
      • TIFF (PM)
      • JPEG (CW), DjVu (CW)
      • JPEG (TN)
    • Time-based Material
      • Audio
        • BWF (PM)
        • MP3 / MP4 (CW)
      • Video
        • Linear Digital (PM)
        • MOV, WMV? (CW)
  • 118. Workflows
    • TIFFs have metadata embedded
    • TIFFs are backed up to LTO
    • Photoshop macros used to watermark, create JPEG and TIFFs
    • DVDs created and stored
    • Additional derivatives created for resource discovery and access
  • 119. Data Storage
    • 3 high-quality ‘Preservation Master’ copies
      • 2 on DVD-ROM (working copies)
      • 1 on LTO tape (deep archive)
    • Copies stored in geographically separate locations
    • IVRLA is estimated to require 6-8 TB for all preservation-master storage
      • Scans: ~80 MB each
      • Audio: ~800 MB per hour
    • The online storage requirement is significantly smaller
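The per-item figures on this slide make a quick capacity check easy. The sketch below multiplies hypothetical item counts by the slide's approximate sizes (~80 MB per scan, ~800 MB per hour of audio) and optionally by the three preservation-master copies. The item counts are invented for illustration and are not IVRLA figures. The slide does not say whether the 6-8 TB estimate covers one copy or all three, so both totals are shown.

```python
SCAN_MB = 80             # approximate size of one preservation-master scan (from the slide)
AUDIO_MB_PER_HOUR = 800  # approximate size of one hour of audio (from the slide)
COPIES = 3               # 2 DVD-ROM working copies + 1 LTO deep-archive copy

def storage_tb(scans: int, audio_hours: int, copies: int = 1) -> float:
    """Total preservation storage in terabytes (decimal units: 1 TB = 1,000,000 MB)."""
    megabytes = scans * SCAN_MB + audio_hours * AUDIO_MB_PER_HOUR
    return megabytes * copies / 1_000_000

# Hypothetical collection: 60,000 scanned pages and 1,500 hours of audio
one_copy = storage_tb(60_000, 1_500)
all_copies = storage_tb(60_000, 1_500, COPIES)
print(f"one copy: {one_copy:.1f} TB, three copies: {all_copies:.1f} TB")
# one copy: 6.0 TB, three copies: 18.0 TB
```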
  • 120. Metadata and Database
    • Two-stage cataloguing database
    • MODS - descriptive metadata
    • METS - structural and transmission metadata
    • EAD - archival context and structure
    • MIX - technical metadata for images
    • MADS - descriptive metadata authority files
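As an illustration of the descriptive layer, the sketch below builds a minimal MODS record with Python's standard `xml.etree.ElementTree`. The namespace and the element names used (`titleInfo`/`title`, `name`/`namePart`, `typeOfResource`, `originInfo`/`dateCreated`) come from the MODS version 3 schema. The record content is invented, and a real catalogue record would carry far more detail and could be wrapped in METS alongside MIX technical metadata.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)  # serialise elements with the "mods:" prefix

def q(tag: str) -> str:
    """Qualify a tag name with the MODS namespace (ElementTree's {uri}tag form)."""
    return f"{{{MODS_NS}}}{tag}"

def make_mods(title: str, creator: str, date: str, resource_type: str) -> ET.Element:
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title
    name = ET.SubElement(mods, q("name"), {"type": "personal"})
    ET.SubElement(name, q("namePart")).text = creator
    ET.SubElement(mods, q("typeOfResource")).text = resource_type
    origin = ET.SubElement(mods, q("originInfo"))
    ET.SubElement(origin, q("dateCreated")).text = date
    return mods

# Hypothetical record for a digitised letter
record = make_mods("Letter to a colleague", "Unknown correspondent", "1916", "text")
print(ET.tostring(record, encoding="unicode"))
```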
  • 122. MADS files
  • 123. Collection Model
    • Libraries use the OPAC for searching
    • Archives use finding aids for browsing
    • Hybrid model to enable searching and browsing of complex hierarchical digital collections
    • Model facilitates top down and bottom up approaches
    • EAD provides context and structure
    • MODS provides precision and accuracy
    • Create an EAD template for each ‘collection’
    • Catalogue to the appropriate level
  • 126. Repository Architecture Articulation
  • 127. Open Source Repository Systems
    • Growing area of development
    • Several options available:
      • DSpace
      • EPrints
      • Fedora
    • IVRLA required a solution which offers:
      • Suitability for wide range of data types
      • Support for collection structures and complex objects
      • Scalability - prototype into service
      • Future-proof architecture
      • Long term digital preservation
  • 128. Fedora Service Framework (2005-07) © S.Payette
  • 129. IVRLA Preservation Requirements
    • Audit trails and datastream versioning
    • Persistent Identifiers
    • Checksum creation and validation
    • Whole object versioning
    • OAIS compliance
    • TDR compliance
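Checksum creation and validation (fixity checking) can be sketched with Python's standard `hashlib`: record a digest when a preservation master is ingested, then recompute and compare it later to detect silent corruption. The choice of SHA-256, the helper names and the file name are assumptions for illustration. This is not Fedora's own checksum mechanism.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Compute a hex digest, reading in chunks so large TIFF/BWF masters are not loaded whole."""
    h = hashlib.new(algorithm)
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(path: Path, expected: str, algorithm: str = "sha256") -> bool:
    """Fixity check: True if the file still matches the digest recorded at ingest."""
    return file_digest(path, algorithm) == expected

# Demo with a throwaway file standing in for a preservation master
with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "page_0001.tif"        # hypothetical file name
    master.write_bytes(b"pretend TIFF payload")
    recorded = file_digest(master)               # digest stored at ingest time
    ok_before = verify(master, recorded)
    master.write_bytes(b"pretend TIFF payloaD")  # simulate silent corruption
    ok_after = verify(master, recorded)

print(ok_before, ok_after)  # True False
```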
  • 130. IVRLA Interface Requirements
    • Evidence
      • Provenance, authenticity, integrity, context, persistence, sustainability
      • Granularity: directed to page, clip, part…
      • Security, authentication and authorisation infrastructure
    • Conversation/Participation
      • Informal, collaborative
      • Personalisation and customisation
      • Recommendation Services (S/CSI)
      • Social searching and annotation (S/CSI - S/ILS)
      • Add value, links, connections…
  • 131. Content Models
  • 132. Fedora Content Models
    • A definition for a “type” of object (e.g., article, book, image, learning object) that describes the internal composition of a group of similar Fedora objects
      • Data Type
      • Structure
      • Services
    • Data Type defines payloads and metadata
    • Structure defines relationships between objects
    • Services define actions or disseminators for the content
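The three facets of a content model (data type, structure, services) can be illustrated with a small validation sketch. The classes, datastream IDs, relationship names, PIDs and the `validate` helper below are hypothetical. They mimic the idea of a Fedora content model but are not Fedora's content-model format or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class ContentModel:
    name: str
    required_datastreams: Set[str]                      # data type: payloads and metadata
    allowed_relations: Set[str]                         # structure: links to other objects
    services: List[str] = field(default_factory=list)   # disseminators offered for the content

@dataclass
class DigitalObject:
    pid: str
    datastreams: Dict[str, bytes]
    relations: Dict[str, List[str]]                     # relation name -> target PIDs

def validate(obj: DigitalObject, model: ContentModel) -> List[str]:
    """Return a list of problems; an empty list means the object conforms to the model."""
    missing = model.required_datastreams - set(obj.datastreams)
    unexpected = set(obj.relations) - model.allowed_relations
    problems = [f"missing datastream {ds}" for ds in sorted(missing)]
    problems += [f"unexpected relation {rel}" for rel in sorted(unexpected)]
    return problems

page_model = ContentModel(
    name="ScannedPage",
    required_datastreams={"MODS", "MIX", "TIFF", "JPEG"},
    allowed_relations={"isMemberOf", "isPartOf"},
    services=["getThumbnail", "getZoomableImage"],
)
page = DigitalObject(
    pid="ivrla:1234",
    datastreams={"MODS": b"<mods/>", "TIFF": b"...", "JPEG": b"..."},
    relations={"isPartOf": ["ivrla:1000"]},
)
print(validate(page, page_model))  # ['missing datastream MIX']
```

Keeping the model separate from the objects is what lets a group of similar objects (pages, audio clips, letters) share one definition of what they must contain and which actions they support.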
  • 136. RoadMap
    • Initial Research and Demo
    • Develop utilities - sipMaker and mixMaker
    • Articulate collection model
    • Develop Virtual Library and Archive 1.0
      • Browse
      • Search
      • View
      • Cite
      • Tag
    • Ingest trial and deployment of subsets
    • Develop Virtual Library and Archive 2.0
      • User management
      • Personalisation, customisation
      • Recommendation services
      • Annotation and tagging
      • Research space
      • Virtual collections
  • 137. IVRLA 1.0?
  • 138. Usership
    • Research based
    • Context-heavy: accuracy, integrity and authenticity
    • Technically literate, with Internet-age expectations (the Google effect)
    • Accurate citation and source acknowledgement using persistent identifiers
  • 139. Repository Challenges
    • Architecture is not an ‘out of the box’ solution
    • Resources required to articulate and develop interface layer(s)
    • Metadata management is complex
    • Tension between popular delivery formats and archival preservation formats
    • Challenge of anticipating all user environments in content modeling
    • Improved automation is necessary for ingest and validation. Digitisation is the main bottleneck
    • Sustainability - prototype developed into a service
    • Human resources are central to technology projects
    • Developing and training data curators - multidisciplinary skill sets
  • 140. Observations and Conclusion
    • A five-year project timeline requires an iterative process
    • New advances in computing science will influence developments - eScience, eHumanities, Web 2.0
    • IVRLA positions the archival source with all context and structure as central to the digital deployment
    • Define and build core sources which can be interrogated and integrated with dynamic services
    • Standards based interoperability is key to ensure future accessibility and sustainability
    • New repository models suggest and support user created metadata such as social bookmarking and annotating
  • 141. Further Information
    • www.ucd.ie/ivrla
    • [email_address]
  • 142. OJAX: Web 2.0 Federated search Judith Wusteman April 2007
  • 143. Overview
    • Introducing OJAX
    • OJAX Demo
    • Related research
  • 144. Web 2.0 Technologies and Standards used in OJAX
    • AJAX
    • REST
    • JSON
    • Atom
    • OAI-PMH
    • OpenSearch
    • Open API
    • StAX
    • Apache Lucene
  • 145. http://ojax.sourceforge.net/
  • 146. OJAX
  • 147. OJAX demo
  • 148. Unifying the user interface
  • 149. Auto-completion, auto-search, dynamic archive list
  • 150. Dynamic scrolling
  • 151. Auto-expansion of results
  • 152. Sorting results
  • 153. OpenSearch
  • 154. OpenSearch
    • Enables search engines to describe their search syntax to browsers
    • Defines standard formats for search results
      • Based on RSS and Atom
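In OpenSearch 1.1, a description document advertises URL templates containing parameters such as `{searchTerms}` and `{startIndex?}`. A trailing `?` marks an optional parameter, and a client that has no value for an optional parameter substitutes an empty string. The sketch below fills such a template in Python. The template URL is hypothetical, and the helper handles only simple unprefixed parameter names, not namespaced extension parameters.

```python
import re
from typing import Mapping
from urllib.parse import quote

# Matches {name} or {name?}; namespaced parameters such as {geo:box?} are out of scope here
PARAM = re.compile(r"\{([A-Za-z]+)(\?)?\}")

def fill_template(template: str, values: Mapping[str, object]) -> str:
    """Expand an OpenSearch-style URL template.

    Required parameters must be supplied; optional ones ({name?}) without a
    value are replaced by an empty string.
    """
    def substitute(match: re.Match) -> str:
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")  # percent-encode the value
        if optional:
            return ""
        raise KeyError(f"required OpenSearch parameter missing: {name}")
    return PARAM.sub(substitute, template)

# Hypothetical template, as it might appear in a <Url template="..."/> element
template = "http://example.org/search?q={searchTerms}&start={startIndex?}&n={count?}"
print(fill_template(template, {"searchTerms": "Irish manuscripts", "count": 20}))
# http://example.org/search?q=Irish%20manuscripts&start=&n=20
```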
  • 155. Atom feed support
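Because OpenSearch results can be returned as an Atom feed, a client can consume them with a generic XML parser. The sketch below reads a hand-written, simplified sample response with `xml.etree.ElementTree`, extracting the OpenSearch `totalResults` element and each entry's title and link. The Atom and OpenSearch 1.1 namespace URIs are the published ones; the sample feed is invented and is not OJAX output.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "os": "http://a9.com/-/spec/opensearch/1.1/",
}

SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>Search results</title>
  <opensearch:totalResults>2</opensearch:totalResults>
  <entry>
    <title>Digitised manuscript</title>
    <link href="http://example.org/item/1"/>
  </entry>
  <entry>
    <title>Oral history recording</title>
    <link href="http://example.org/item/2"/>
  </entry>
</feed>"""

def parse_results(xml_text: str) -> Tuple[int, List[Tuple[str, str]]]:
    """Return (totalResults, [(title, link), ...]) from an Atom search response."""
    feed = ET.fromstring(xml_text)
    total = int(feed.findtext("os:totalResults", default="0", namespaces=NS))
    entries: List[Tuple[str, str]] = []
    for entry in feed.findall("atom:entry", NS):
        title = entry.findtext("atom:title", default="", namespaces=NS)
        link = entry.find("atom:link", NS)
        href = link.get("href", "") if link is not None else ""
        entries.append((title, href))
    return total, entries

total, entries = parse_results(SAMPLE)
print(total, entries)
```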
  • 160. Accessibility
  • 161. Science Foundation Ireland: OJAX++: a next generation collaborative research tool
    • To investigate how concepts from the Social Web can be applied to the research environment in order to facilitate dynamic collaboration and the sharing of ideas among researchers.
  • 162. PhD starting September 2007
    • In collaboration with the UCD School of Computer Science and Informatics
    • Requirements
      • Honours degree (preferably first class or 2.1)
        • in Computer Science or a related field
        • or equivalent technical expertise
    • Preferred experience:
      • Web technology
      • JavaScript
      • AJAX
      • one of Java, Ruby or Python.
    • http://www.ucd.ie/wusteman
    • [email_address] .
  • 163. Advantages of OJAX
    • Developed in Ireland. Can be adapted to suit.
    • Already in beta. Available for download.
    • Well received
    • Responds to new user expectations generated by Web 2.0
      • Rich, dynamic user experience.
      • Intuitive interface.
    • Integration, interoperability and reuse.
    • Open-source standards compliance
      • including OpenSearch, OAI-PMH, StAX and Apache Lucene
  • 164. http://ojax.sourceforge.net/
  • 165. With a Little Help from My Friends Social Semantic Search and Browsing Sebastian Ryszard Kruk, Adam Gzella Digital Enterprise Research Institute National University of Ireland, Galway sebastian.kruk@deri.org, adam.gzella@deri.org http://s3b.corrib.org/
  • 166. Take away message
    • We search in different ways for different things
    • Keyword search is not enough
    • We create the knowledge by sharing our (search) experience
  • 167. Outline
    • Motivation
    • How do people search
    • Search and Browsing lifecycle
    • Applying semantics and making use of social networks:
      • Keyword-based search
      • Collaborative Faceted Navigation
      • Collaborative Filtering
    • Conclusions - Putting it all together
  • 168. How do people search?
    • Different user goals:
      • Resource Seeking - the user wants to find a specific resource (e.g. lyrics of a song, a program to download, a map service etc.)
      • Navigational - the user is searching for a specific web site whose URL s/he forgot
      • Informational - the user is looking for information about a topic s/he is interested in
    • Rose and Levinson: Understanding user goals in web search (2004)
  • 169. Search and browsing lifecycle
    • Why?
      • Information can be useful
      • Information can be garbage
    • How? (Search and browsing actions)
      • [REUSE] keyword-based search (resource seeking)
      • [REDUCE] faceted navigation (navigational)
      • [RECYCLE] collaborative filtering (informational)
    • Can this process be improved with Semantic Web and Social Networking technologies?
  • 170. Query refinement in keyword-based search
    • Why is simple full-text search not enough?
      • Too many results (low precision)
      • One needs to specify the exact keyword (low recall)
      • How to distinguish between Python and python? (high fall-out)
    • How?
      • Disambiguation through a context
        • Query context
        • Short-term context:
          • User’s goal
          • Location
          • Time
        • Long-term context:
          • User’s interests
          • Search engine specific
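The three failure modes named on this slide correspond to standard retrieval measures: precision (share of retrieved documents that are relevant), recall (share of relevant documents that are retrieved) and fall-out (share of non-relevant documents that are retrieved). Assuming binary relevance judgements over a known collection, they can be computed as below. The document IDs and the "python" scenario are made up.

```python
from typing import Set

def precision(retrieved: Set[str], relevant: Set[str]) -> float:
    """Share of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved: Set[str], relevant: Set[str]) -> float:
    """Share of relevant documents that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

def fall_out(retrieved: Set[str], relevant: Set[str], collection: Set[str]) -> float:
    """Share of non-relevant documents that were (wrongly) retrieved."""
    non_relevant = collection - relevant
    return len(retrieved & non_relevant) / len(non_relevant) if non_relevant else 0.0

# Hypothetical query "python": relevant = documents about the programming language
collection = {f"d{i}" for i in range(1, 11)}   # d1 .. d10
relevant = {"d1", "d2", "d3"}
retrieved = {"d1", "d2", "d4", "d5"}           # d4, d5 are about snakes

print(precision(retrieved, relevant))                       # 0.5
print(round(recall(retrieved, relevant), 3))                # 0.667
print(round(fall_out(retrieved, relevant, collection), 3))  # 0.286
```

Disambiguating "python" through context aims to drop d4 and d5, which raises precision and lowers fall-out without hurting recall.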
  • 171. Query refinement in keyword-based search
    • How?
      • Query refinement
        • Spreading activation
        • Type mapping
        • Pruning
      • Acquiring the context information:
        • Previous searches of the user
        • User’s semantically annotated bookmarks
        • Community profile
    • And? (Manual query refinement)
      • “Tell me why” button and a transcript of the refinement process
      • Continue to faceted navigation
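The spreading-activation and pruning steps of query refinement can be sketched as propagating weight from the query terms across a concept graph, decaying at each hop, and then discarding weakly activated concepts. This is a generic sketch of the technique, not the S3B implementation. The concept graph, decay factor, step count and threshold are invented for illustration, and type mapping is not modelled.

```python
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # concept -> {neighbour: edge weight in [0, 1]}

def spread_activation(graph: Graph, seeds: Dict[str, float],
                      decay: float = 0.5, steps: int = 2,
                      threshold: float = 0.1) -> Dict[str, float]:
    """Propagate activation from seed concepts, then prune concepts below `threshold`."""
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(steps):
        next_frontier: Dict[str, float] = {}
        for node, energy in frontier.items():
            for neighbour, weight in graph.get(node, {}).items():
                pulse = energy * weight * decay  # activation weakens at every hop
                next_frontier[neighbour] = next_frontier.get(neighbour, 0.0) + pulse
        for node, energy in next_frontier.items():
            activation[node] = activation.get(node, 0.0) + energy
        frontier = next_frontier
    # Pruning: keep only concepts whose total activation reaches the threshold
    return {n: a for n, a in activation.items() if a >= threshold}

# Hypothetical concept graph, e.g. derived from a user's annotated bookmarks
graph: Graph = {
    "python": {"programming": 0.9, "snake": 0.2},
    "programming": {"java": 0.6, "software": 0.8},
    "snake": {"reptile": 0.9},
}
expanded = spread_activation(graph, {"python": 1.0}, threshold=0.12)
for concept, score in sorted(expanded.items(), key=lambda kv: -kv[1]):
    print(f"{concept}: {score:.3f}")
# python: 1.000, programming: 0.450, software: 0.180, java: 0.135 (one per line)
```

Here the user's context (a strong "python" to "programming" edge) keeps the programming sense and prunes "snake" and "reptile", which is the disambiguation effect the slide is after.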
  • 172. Collaborative Faceted Navigation
    • Why?
      • The search does not end with a (long) list of results
      • The results are not a list (!) but a graph
      • We lose context with linear navigation
      • A need for a unified notion (UI, services) of filter/narrow and browse/expand services
      • Share browsing experience: navigate collaboratively
    • How (Services)?
      • Defines REST access to services and their composition
      • Basic services: access, search, filter, similar, browse, combine
      • Meta services: RDF serialization, subscription channels, service ID generation
      • Context services: manage contexts, manage service calls/compositions in the context, list contexts
      • Statistics services: properties, values, tokens
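The basic and statistics services listed above can be read as composable operations over a result set. The sketch below models three of them (filter, browse, and value statistics) over in-memory records and keeps a simple log of calls as the browsing context. The records, function names and log format are hypothetical, and the real services are exposed over REST, which this sketch does not attempt.

```python
from collections import Counter
from typing import Dict, List, Tuple

Resource = Dict[str, str]
# Hypothetical library records; "author" values are IDs of other resources
resources: Dict[str, Resource] = {
    "b1": {"type": "book", "subject": "biology", "author": "p1"},
    "b2": {"type": "book", "subject": "physics", "author": "p2"},
    "a1": {"type": "article", "subject": "biology", "author": "p1"},
    "p1": {"type": "person", "name": "J. Smith"},
    "p2": {"type": "person", "name": "A. Byrne"},
}
context: List[Tuple[str, str, str]] = []  # log of service calls in this browsing context

def values(ids: List[str], prop: str) -> Counter:
    """Statistics service: how often each value of `prop` occurs in the results."""
    return Counter(resources[i][prop] for i in ids if prop in resources[i])

def filter_(ids: List[str], prop: str, value: str) -> List[str]:
    """Narrow: keep resources whose `prop` equals `value`."""
    context.append(("filter", prop, value))
    return [i for i in ids if resources[i].get(prop) == value]

def browse(ids: List[str], prop: str) -> List[str]:
    """Expand: follow `prop` links from the results to the resources they point at."""
    context.append(("browse", prop, ""))
    return sorted({resources[i][prop] for i in ids if prop in resources[i]})

results = list(resources)
print(values(results, "type"))          # facet counts a UI could show as a list or tag cloud
bio = filter_(results, "subject", "biology")
print(bio)                              # ['b1', 'a1']
authors = browse(bio, "author")
print(authors, context)
```

Because each call takes a result set and returns one, calls compose naturally, and the logged context is what a collaborative, non-linear browsing history could be built from.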
  • 173. Collaborative Faceted Navigation
    • How (User interface)?
      • Hexagons to capture the notion of non-linear history of browsing
      • Selecting values from a list, tag cloud or TagsTreeMap™
      • Context-zoomable interface:
        • List (graph) of results
        • Browse from current results
        • Navigate between service calls
        • Navigate between contexts (with given call)
  • 174. Social Semantic Collaborative Filtering
    • Why?
      • The bottom-line of acquiring knowledge: informal communication (“word of mouth”)
    • How?
      • Everyone classifies (filters) the information in bookmark folders (user-oriented taxonomy)
      • Peers share (collaborate over) the information (community-driven taxonomy)
    • Result?
      • Knowledge “flows” from the expert through the social network to the user
      • The system amasses a lot of information on user and community profiles (context)
  • 175. Social Semantic Collaborative Filtering
    • Problems?
      • The horizon of a social network (2-3 degrees of separation)
      • How to handle fine-grained information (blogs, wikis, etc.)
    • Solutions?
      • Inference engine to suggest knowledge from the outskirts of the social network
      • Support for SIOC metadata:
        • Semantically Interlinked Online Communities: blogs, wikis, fora, …
        • SIOC browser in SSCF
        • Annotations and evaluations of “local” resources
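The "horizon" problem can be made concrete with a breadth-first walk over the social network: collect resources that peers filed under a matching bookmark folder, stop at a fixed number of degrees of separation, and trust more distant peers less. This is a hypothetical sketch of the idea, not the SSCF algorithm. The network, folder names, URIs and the 1/distance weighting are assumptions.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

friends: Dict[str, Set[str]] = {
    "john": {"mary"},
    "mary": {"john", "sean"},
    "sean": {"mary", "aoife"},
    "aoife": {"sean"},
}
# user -> {bookmark folder name: [resource URIs]}
bookmarks: Dict[str, Dict[str, List[str]]] = {
    "mary": {"biology": ["http://example.org/cells"]},
    "sean": {"biology": ["http://example.org/genetics", "http://example.org/cells"]},
    "aoife": {"biology": ["http://example.org/ecology"]},
}

def recommend(user: str, topic: str, max_degree: int = 2) -> Dict[str, float]:
    """Score resources filed under `topic` by peers within `max_degree` hops of `user`."""
    scores: Dict[str, float] = {}
    seen = {user}
    queue: Deque[Tuple[str, int]] = deque([(user, 0)])
    while queue:
        person, distance = queue.popleft()
        if distance > 0:
            for uri in bookmarks.get(person, {}).get(topic, []):
                # Closer peers count more: weight 1/distance
                scores[uri] = scores.get(uri, 0.0) + 1.0 / distance
        if distance < max_degree:
            for friend in friends.get(person, set()):
                if friend not in seen:
                    seen.add(friend)
                    queue.append((friend, distance + 1))
    return scores

print(recommend("john", "biology"))
```

With a horizon of two degrees, Aoife's "ecology" bookmark is never reached; raising `max_degree` or adding inference over the outskirts of the network is what would bring such knowledge in.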
  • 176. Putting it all together [diagram]
    • User profile: recent actions
    • Refine search results
    • Filter, record, annotate, and share results and actions
    • Re-call shared actions
    • User profile: user’s interests
    • Filter, record, annotate, and share results
  • 177. Do we need Semantic Web and Web 2.0 technologies in Digital Libraries? Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit
  • 178. Irish Digital Libraries Summit Digital Libraries at the eve of the Next Generation Internet Conclusions http://wiki.corrib.deri.ie/index.php/SemDL/IrishDLSummit