1) The document discusses the problem of broken links in the Web of Data (also known as the Linked Data cloud). As resources on the web change over time, links between them can become broken when the target resource is removed, moved, or changed.
2) It defines two types of broken links: structurally and semantically broken. A structurally broken link occurs when the representations of the target resource can no longer be retrieved. A semantically broken link occurs when the target resource has changed meaning.
3) The analysis of changes between two versions of DBpedia data showed many resources were moved, removed, or created, demonstrating the broken links problem. Redirect links in DBpedia help trace moved resources.
The document discusses the development of the Semantic Web, which extends the current web to a web of data through the use of metadata, ontologies, and formal semantics. It describes key technologies like the Resource Description Framework (RDF) and Web Ontology Language (OWL) that add machine-readable meaning to web documents. The Semantic Web aims to enable machines to process and understand the semantics of information on the web.
This chapter introduces the semantic modeling procedure, detailing its technical characteristics, possibilities and limitations. First, we present the languages used for semantic description: RDF, RDFS and OWL. We describe their expressiveness in describing Web resources and the abilities they provide to describe, query, administer and manage resources at a semantic layer. Next, we present the vocabularies used to provide common ground for understanding and communicating ideas and concepts. Together, these technologies and vocabularies comprise the modern landscape of Semantic Web/Linked Data applications and serve as the basis for maintaining and analyzing datasets and for building applications on top of them.
Linked Data at the Open University: From Technical Challenges to Organization... (Mathieu d'Aquin)
The document discusses how the Knowledge Media Institute at the Open University in the UK has developed a linked data platform, called data.open.ac.uk, to provide open access to various types of data from across the university, including course information, research publications, podcasts, videos, and more. It describes some of the technical and organizational challenges in developing the platform, and highlights how it has enabled new uses of the university's data and inspired innovation both within the university and more broadly in open education.
The document describes Freenet, a distributed anonymous information storage and retrieval system. Freenet operates as a decentralized peer-to-peer network where nodes can store and retrieve data. It aims to protect anonymity of users and resist censorship of information. Data is stored on the network through a process where requests are routed across nodes based on the data key. This allows for popular data to be replicated across nodes.
This document provides an introduction to the Semantic Web, covering topics such as what the Semantic Web is, how semantic data is represented and stored, querying semantic data using SPARQL, and who is implementing Semantic Web technologies. The presentation includes definitions of key concepts, examples to illustrate technical aspects, and discussions of how the Semantic Web compares to other technologies. Major companies implementing aspects of the Semantic Web are highlighted.
The document summarizes a presentation on Named Data Networking (NDN) given by Mostafa Rezazad. It discusses the motivation for NDN, which is to make data and services rather than locations the primary objects on the network. This allows for benefits like redundancy elimination, easier mobility, and more inherent security. An overview is provided of NDN's packet types, node structure, name structure, and routing approach.
The Semantic Web is a vision of information that is understandable by computers. Although there is great exploitable potential, we are still in "Generation Zero" of the Semantic Web, since there are few compelling real-world applications. Heterogeneity, the volume of data, and the lack of standards are problems that could be addressed through nature-inspired methods. The paper presents the most important aspects of the Semantic Web as well as its biggest issues; it then describes some methods inspired by nature (genetic algorithms, artificial neural networks, swarm intelligence) and the way these techniques can be used to deal with Semantic Web problems.
This document summarizes Freenet, a distributed peer-to-peer network that allows for anonymous publication, replication, and retrieval of data. It operates as a network of identical nodes that pool storage space to store files and cooperate to route requests to likely locations of data, without using broadcasts or centralized indexes. Files are referred to location-independently and dynamically replicated near requestors and deleted where unneeded, making true origins and destinations difficult to determine.
The document discusses the history and significance of links in hypertext and hypermedia. It covers:
- The evolution of links from static embedded links to dynamic links stored separately in link databases.
- The distinction between navigation using links that don't require similarity, versus retrieval which relies on similarity between a query and document.
- The challenges of extending content-based retrieval and navigation to non-text media like images and video.
- The goal of building systems that can extract semantics from media and associate media with concepts to enable more versatile concept-based navigation and retrieval.
The document discusses analyzing the Web of Data (WoD) as a complex network at multiple scales. At the graph scale, the WoD contains over 100 nodes (datasets) connected by 350 edges. At the triple scale, a network of over 600,000 nodes and 800,000 edges was analyzed. Network analysis found the WoD exhibits properties like short average path lengths, power law degree distributions, and a few highly central nodes like DBpedia. Ongoing challenges include implicit links, multi-relations, and dynamics as data is continuously added.
What Are Links in Linked Open Data? A Characterization and Evaluation of Link... (Armin Haller)
Linked Open Data promises guiding principles for publishing interlinked knowledge graphs on the Web in the form of findable, accessible, interoperable, and reusable datasets. In this talk I argue that while Linked Data may, as such, be viewed as a basis for instantiating the FAIR principles, a number of open issues still cause significant data quality problems even when knowledge graphs are published as Linked Data. I will first define the boundaries of what constitutes a single coherent knowledge graph within Linked Data, i.e., present a principled notion of what a dataset is and what links within and between datasets are. I will also define different link types for data in Linked datasets and present the results of our empirical analysis of linkage among the datasets of the Linked Open Data cloud. Recent results from our analysis of Wikidata, which has not been part of the Linked Open Data Cloud, will also be presented.
This document provides an overview of database concepts and the history of data access APIs in Microsoft technologies. It defines what a database and DBMS are, lists some common DBMSs, and explains what data access is and why universal data access is important. It then summarizes the evolution of Microsoft's data access APIs from ODBC and DAO, which had limitations, to RDO, OLE DB, and ADO, which improved performance and universality.
This document provides an overview of storing Resource Description Framework (RDF) graphs in relational database management systems. Specifically:
- RDF represents data as subject-predicate-object triples that form a directed graph. This triples-based data model allows for easy data integration.
- RDF graphs are typically stored as a single subject-predicate-object table in a relational database for persistent storage.
- Queries to retrieve and manipulate data in the RDF graph can then be performed using SQL on this table.
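As a rough illustration of the pattern described above (my own sketch: the table layout and sample data are hypothetical, using Python's built-in sqlite3 module), a single subject-predicate-object table and an SQL query over it might look like this:

import sqlite3

# In-memory database with one subject-predicate-object table,
# the simplest relational layout for persisting an RDF graph.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("ex:OliverBlack", "rdf:type", "ex:Band"),
        ("ex:OliverBlack", "foaf:name", "Oliver Black"),
        ("ex:OliverBlack", "rdfs:seeAlso", "dbpedia:Oliver_Black"),
    ],
)

# A graph query expressed as SQL: find the names of all resources of type ex:Band.
rows = conn.execute(
    """SELECT t2.object
       FROM triples t1 JOIN triples t2 ON t1.subject = t2.subject
       WHERE t1.predicate = 'rdf:type' AND t1.object = 'ex:Band'
         AND t2.predicate = 'foaf:name'"""
).fetchall()
print(rows)  # [('Oliver Black',)]

Each graph pattern becomes a self-join on the triples table, which is why more selective layouts are often used in practice; the single-table form is simply the most direct mapping.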
Experience from 10 months of University Linked Data (Mathieu d'Aquin)
Experience from 10 months of University Linked Data at the Open University:
1. The Open University exposed its public data as linked open data to make the data more discoverable, reusable, and integrated with other datasets.
2. Exposing data as linked data provides benefits like increased transparency, data reuse internally and externally, and reduced costs of managing the university's public data.
3. Other UK universities have since followed the Open University's example in exposing their data as linked data.
Semantic Web, Linked Data and Education: A Perfect Fit? (Mathieu d'Aquin)
This document discusses how semantic web technologies like linked data are a perfect fit for education. It provides examples of how the Open University has applied linked data to connect educational resources and data from across the university. Linked data allows for flexibility, accessibility, and the ability to combine and interpret different sources of knowledge. However, challenges remain around representing rich metadata about educational purpose and interpreting resources in an educational context.
This document discusses distributed databases. It begins by introducing distributed database systems and their structure. Key points include that the database is split across multiple computers that communicate over a network. It then discusses the tradeoffs of distributing a database, such as increased availability but also higher complexity. The document outlines two approaches to distributing data - replication, where copies of data are stored at different sites, and fragmentation, where relations are split into pieces stored at different sites. It provides examples to illustrate these concepts.
This document summarizes the agenda and goals for the EuropeanaConnect All-Staff Meeting taking place from May 10-12, 2010 in Berlin. The meeting will include plenary sessions to update staff on achievements and demonstrations of project results. There will also be parallel sessions for specific work packages and inter-WP meetings. Key achievements in the first year of the project include adding over 40,000 audio items to Europeana from 16 countries, developing prototypes for a semantic layer, multilingual access, and new access channels like mobile and spatial interfaces. Challenges addressed include synchronizing with Europeana requirements and addressing licensing and IPR issues for audio content.
The document describes personas developed by Europeana to represent target users of their services. It outlines 4 primary personas - William, Maria, Peter and Jukka - focused on their interests, digital skills, and search behaviors. These personas are used by Europeana to guide development, selection, and evaluation of new services to better meet user needs. Additional information on the personas and how they are applied can be found on Europeana's website.
Europeana v1.0 and Interdependencies with EuropeanaConnect (EuropeanaConnect)
Europeana v1.0 is a digital library that provides single access to Europe's cultural heritage from libraries, archives and museums. Europeana Connect supports Europeana v1.0 through developing user-centered design, content acquisition and enrichment, repository infrastructure, a multilingual portal, linked open data, and knowledge outputs. The presentation outlines the key products and outcomes of Europeana Connect to realize the vision of a unified access point to European culture online.
This document discusses efforts to make European cultural heritage data more accessible online. It describes how Europeana provides access to over 5.9 million objects from libraries, museums and archives, with a target of 10 million by 2010. A key challenge is that the data is heterogeneous in format and language. The document proposes using semantic web technologies like controlled vocabularies and linked open data to better interconnect and enrich the metadata from different cultural heritage institutions. This would allow the creation of a more open and interconnected "web of cultural heritage data".
This document provides an overview of content management systems (CMS) and analyzes the capabilities of various open source CMS solutions. It discusses Smile, a company that specializes in open source solutions, and references some of Smile's clients. The document then covers the fundamentals of CMS, describes several popular open source CMS like Drupal, Joomla, and Typo3, and analyzes their strengths and evolution. It also mentions some promising new solutions. The goal is to help clients select the best CMS for their specific needs.
The document provides an overview and updates on Europeana and related projects. It discusses changes to the Europeana backend including improvements to the API, ingestion processes, and repository. It also outlines plans to improve search functionality on Europeana including refine search, alternative suggestions, social tagging integration, and visual browsing options. Upcoming projects and priorities for Europeana are mentioned including a focus on improving the end user experience, ensuring sustainability, and developing strong collection and partner programs.
This document discusses the Europeana Licensing Framework project, which aims to (1) develop a pragmatic licensing framework for content in Europeana by summer 2010, (2) validate the framework through consultation, and (3) describe the three public domains and tools for marking up content with the applicable licensing information. The framework will focus on licensing metadata and objects within Europeana to define rights and ensure compliance. It will also link to discussions on interoperability with external registries; a solution was found for asserting public domain status by authoritative organizations uploading content via OAI-PMH. Tools to be built include a license selection tool and a public domain helper tool.
DBpedia Spotlight is a system that automatically annotates text documents with DBpedia URIs. It identifies mentions of entities in text and links them to the appropriate DBpedia resources, addressing the challenge of ambiguity. The system is highly configurable, allowing users to specify which types of entities to annotate and the desired balance of coverage and accuracy. An evaluation found DBpedia Spotlight performed competitively compared to other annotation systems.
Linked data demystified: Practical efforts to transform CONTENTdm metadata int... (Cory Lampert)
This document outlines a presentation about transforming metadata from a CONTENTdm digital collection into linked data. It discusses the concepts of linked data, including defining linked data, linked data principles, technologies and standards. It then explains how these concepts can be applied to digital collection records, including anticipated challenges working with CONTENTdm. The document describes a linked data project at UNLV Libraries to transform collection records into linked data and publish it on the linked data cloud. It provides tips for creating metadata that is more suitable for linked data.
Linked Data Generation for the University Data From Legacy Database (dannyijwest)
The Web was developed to share information among users over the Internet as hyperlinked documents. Anyone who wants to collect data from the Web has to search and crawl through those documents. The concept of Linked Data creates a breakthrough here by enabling links within the data itself. So, besides the web of connected documents, a new web has developed for both humans and machines: the web of connected data, known simply as the Linked Data Web. Since this is a very new domain, little work has been done so far, especially on publishing legacy data from a university domain as Linked Data.
Nelson Piedra, Janneth Chicaiza and Jorge López (Universidad Técnica Particular de Loja), Edmundo Tovar (Universidad Politécnica de Madrid), and Oscar Martínez (Universitas Miguel Hernández)
Explore the advantages of using linked data with OERs.
The document discusses handling broken links in the Web of Data. It introduces DSNotify, a framework for detecting and correcting broken links. DSNotify uses a notification strategy where data sources notify clients of changes that could cause links to break. It also detects moves between resources based on their similarities. The core algorithm detects events like deletions, updates, moves and creations by comparing resource representations over time. DSNotify was evaluated on changes observed between snapshots of DBpedia.
Linked data for Libraries, Archives, Museums (ljsmart)
Linked data provides a method for publishing structured data on the semantic web so that it can be interlinked and made more useful. It builds upon standard web technologies like HTTP and URIs. The benefits of creating and using linked data include making data sharable, extensible, reusable, and improving discoverability. The process of creating linked data involves identifying data to expose, representing it in RDF/XML with URIs, and making that data available via HTTP URIs so others can discover and link to it.
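As a minimal sketch of that publishing process, the widely used rdflib library can build and serialize such data (the URIs and names here are illustrative assumptions, not taken from the talk):

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDFS

# Illustrative namespace; in practice these should be dereferenceable HTTP URIs
# that you control, so others can look the resources up and link to them.
EX = Namespace("http://example.com/bands/")

g = Graph()
band = EX["OliverBlack"]
g.add((band, FOAF.name, Literal("Oliver Black")))
g.add((band, RDFS.seeAlso, URIRef("http://dbpedia.org/resource/Oliver_Black")))

# Serialize as RDF/XML; the result would then be served at the resource's HTTP URI.
print(g.serialize(format="xml"))

The rdfs:seeAlso triple is what interlinks the local record with an external dataset, which is the step that turns isolated structured data into linked data.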
Charleston 2012 - The Future of Serials in a Linked Data World (ProQuest)
The educational objective of this session is to review today’s MARC-based environment in which the serial record predominates, and compare that with what might be possible in a future world of linked data. The session will inspire conversation and reflection on a number of questions. What will a world of statement-based rather than record-based metadata look like? What will a new environment mean for library systems, workflows, and information dissemination?
The document introduces the principles of Linked Data, which aims to share data rather than documents on the web. It describes the four rules of Linked Data and provides examples of existing Linked Data datasets as well as tools for publishing and using Linked Data. The document also discusses extending Linked Data to include geospatial and sensor data by linking web resources, structured geospatial databases, and unstructured geographic information.
The document provides an overview of how the LOCAH project is applying Linked Data concepts to expose archival and bibliographic data from the Archives Hub and Copac as Linked Open Data. It describes the process of (1) modeling the data as RDF triples, (2) transforming existing XML data to RDF, (3) enhancing the data by linking to external vocabularies and datasets, (4) loading the RDF into a triplestore, and (5) creating Linked Data views to expose the data on the web. The goal is to publish structured data that can be interconnected across domains to enable new uses by both humans and machines.
The document is an assignment for a Semantic Web course. It includes questions and answers about key concepts of the Semantic Web, such as the meaning of the term "Semantic Web", why data interoperability on the web is difficult, why DBpedia is important for linking data, and the four rules of linked data. It also lists and describes four datasets from linkeddata.org and the ontologies used by each.
Web Science Synergies: Exploring Web Knowledge through the Semantic Web (Stefan Dietze)
The document discusses exploring web data and knowledge through the semantic web. It describes how the semantic web adds meaning to data through shared vocabularies and schemas. It also discusses challenges with the large number and diversity of linked open datasets, including issues with accessibility, heterogeneity of schemas, and data quality. It proposes approaches to address these challenges, such as dataset profiling, metadata catalogs, and infrastructure for federated querying.
by Sotiris Batsakis & Grigoris Antoniou, presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
1) The document discusses different aspects of embedding institutional repositories at the global level, including visibility and access, development, and advocacy/publicity projects.
2) It describes the ORA institutional repository at the University of Oxford, which uses FEDORA as middleware between storage and delivery applications.
3) Examples of projects discussed include the Medieval Libraries of Great Britain, Cultures of Knowledge, and PIRUS, which is about publisher and institutional repository usage statistics.
The document provides an overview of linked data fundamentals, including key concepts like URIs, RDF, ontologies, and the semantic web. It discusses aspects of linked data such as using HTTP URIs to identify resources, representing data as subject-predicate-object triples, and connecting related resources through links. It also covers RDF serialization formats, ontologies like RDFS and OWL, and notable linked open data sources.
The document describes DBpedia, a project that extracts structured data from Wikipedia and makes it available on the Web. DBpedia has extracted over 2.6 million entities from Wikipedia and defined web-dereferenceable identifiers for each. As DBpedia covers many domains, other data sources on the Web have begun linking to DBpedia resources, making DBpedia a central hub. This has resulted in a Web of over 4.7 billion interlinked pieces of data across various domains.
Jana Parvanova, Vladimir Alexiev and Stanislav Kostadinov. In workshop Collaborative Annotations in Shared Environments: metadata, vocabularies and techniques in the Digital Humanities (DH-CASE 2013). Collocated with DocEng 2013. Florence, Italy, Sep 2013.
This document discusses building REST and hypermedia APIs with PHP. It begins with an introduction of the speaker and overview of REST. It then discusses REST as an architectural style, describing constraints like client-server, stateless, cache and layered system. It explains the uniform interface constraint and importance of hypermedia and hyperlinks. Examples are given of photo sharing API using HTTP, links and relations. The presentation concludes with recommendations like using link relations instead of hardcoded URIs and avoiding direct XML/JSON usage.
IRJET - Data Retrieval using Master Resource Description Framework (IRJET Journal)
This document discusses using a Master Resource Description Framework (MRDF) to improve data retrieval efficiency from databases. The MRDF combines multiple RDF files into a single framework to reduce the time needed for search engines to query each individual RDF file. It also describes using a user profile to track user interests and tailor query results accordingly for a personalized search experience. The MRDF approach is presented as improving search efficiency while retrieving data from databases.
Europeana - The Digital Library of Europe: A Window to the World for Local, Regiona... (EuropeanaConnect)
Mag. Gerda Koch, AIT Angewandte Informationstechnik Forschungsgesellschaft mbH
14th International Congress Cultural Heritage and New Technologies Vienna, 17 September 2009
Europeana: Europe's flagship web portal, making Europe's cultural heritage ac... (EuropeanaConnect)
Europeana is Europe’s flagship web portal that makes Europe’s cultural heritage accessible worldwide. It seeks to present digital content such as music, music-related materials, and non-audio artifacts from European cultural institutions. Content must be in the public domain, wholly owned, or licensed under Creative Commons to protect intellectual property rights while benefiting institutions through increased exposure and opportunities for funding and collaboration. Owners can proceed by testing uploads and working with Europeana to map metadata for inclusion in the portal.
Semantic Contextualization of Museum Holdings in Europeana (EuropeanaConnect)
Marlies Olensky, Prof. Dr. Stefan Gradmann, Humboldt-Universität zu Berlin / Berlin School of Library and Information Science
Wissensorganisation 2009, Bonn, 19-21 October 2009
EU-funded project Europeana - Europe's flagship web portal, making Europe's c... (EuropeanaConnect)
Europeana is the European Union's flagship digital library portal that makes Europe's cultural heritage accessible online. It aggregates existing digital cultural content from various institutions and presents it with semantic tagging in a new context. The project is co-funded by the European Commission to increase access to cultural works still in copyright through licensing or that are in the public domain. It aims to provide more exposure for cultural works while respecting the intellectual property rights of owners.
Promoting Austrian Cultural and Scientific Heritage via EUROPEANA (EuropeanaConnect)
Mag. Gerda Koch, AIT Angewandte Informationstechnik Forschungsgesellschaft mbH
14th International Congress Cultural Heritage and New Technologies Vienna, 17 November 2009
This document summarizes the EOD (Electronic On Demand) service, which allows individual users to request the digitization of books. The service is coordinated by the University of Innsbruck Library and has over 20 member libraries from 10 countries. It maintains a central database where libraries can view and fulfill digitization orders from their own collections. On average, it delivers digitized books within 7 days at a cost of 5-10 euros per order. User feedback indicates the EOD service provides fast access to otherwise inaccessible materials and that the digital files are of good quality. Future plans include expanding access for blind and visually impaired users and creating fully searchable eBooks.
Enhancing user access to European digital heritage (EuropeanaConnect)
EuropeanaConnect is a project co-funded by the European Commission to build the technical infrastructure for Europeana and enhance user access to digital cultural heritage collections. It involves 30 partners from 14 countries and will add new audio content, build semantic capabilities, and develop multilingual and mobile access channels for Europeana. The project aims to better understand user needs through personas, surveys and testing and involve users in contributing annotations and comments.
DSNotify: Handling Broken Links in the Web of Data
Niko P. Popitsch (niko.popitsch@univie.ac.at), Bernhard Haslhofer (bernhard.haslhofer@univie.ac.at)
University of Vienna, Department of Distributed and Multimedia Systems
Liebiggasse 4/3-4, 1010 Vienna, Austria
Copyright is held by the International World Wide Web Conference Committee (IW3C2). Distribution of these papers is limited to classroom use, and personal use by others. WWW 2010, April 26-30, 2010, Raleigh, North Carolina, USA. ACM 978-1-60558-799-8/10/04.
ABSTRACT
The Web of Data has emerged as a way of exposing structured linked data on the Web. It builds on the central building blocks of the Web (URIs, HTTP) and benefits from its simplicity and wide-spread adoption. It does, however, also inherit its unresolved issues, such as the broken link problem. Broken links constitute a major challenge for actors consuming Linked Data as they require them to deal with reduced accessibility of data. We believe that the broken link problem is a major threat to the whole Web of Data idea and that both Linked Data consumers and providers will require solutions that deal with this problem. Since no general solutions for fixing such links in the Web of Data have emerged, we make three contributions in this direction: first, we provide a concise definition of the broken link problem and a comprehensive analysis of existing approaches. Second, we present DSNotify, a generic framework able to assist human and machine actors in fixing broken links. It uses heuristic feature comparison and employs a time-interval-based blocking technique for the underlying instance matching problem. Third, we derived benchmark datasets from knowledge bases such as DBpedia and evaluated the effectiveness of our approach with respect to the broken link problem. Our results show the feasibility of a time-interval-based blocking approach for systems that aim at detecting and fixing broken links in the Web of Data.

Categories and Subject Descriptors
H.4.m [Information Systems]: Miscellaneous; H.3.3 [Information Systems]: Information Search and Retrieval

General Terms
Algorithms, Theory, Experimentation, Measurement

1. INTRODUCTION
Data integrity on the Web is not given because URI references of links between resources are not as "cool" (see http://www.w3.org/Provider/Style/URI) as they are supposed to be. Resources may be removed, moved, or updated over time, leading to broken links. These constitute a major problem for consumers of Web data, both human and machine actors, as they interrupt navigational paths in the network, leading to the practical unavailability of information [15, 2, 18, 19, 22].

Recently, Linked Data has been proposed as an approach for exposing structured data by means of common Web technologies such as dereferencable URIs, HTTP, and RDF. Links between resources play a central role in this Web of Data as they connect semantically related data. Meanwhile an estimated 4.7 billion RDF triples and 142 million RDF links (cf. [6]) are exposed on the Web by numerous data sources from different domains. DBpedia [7], the structured representation of Wikipedia, is the most prominent one. Web agents can easily retrieve these data by dereferencing URIs via HTTP and process the returned RDF data in their own application context. In the example shown in Figure 1, an institution has linked a resource representing a band in their local data set with the corresponding resource in DBpedia in order to publish a combination of these data on its Web portal.

Figure 1: Sample link to a DBpedia resource. The source resource http://example.com/bands/OliverBlack points via rdfs:seeAlso to the target resource http://dbpedia.org/resource/Oliver_Black, whose representation carries foaf:name "Oliver Black" and a dbpprop:abstract beginning "Oliver Black is a Canadian rock group ...".

The Linked Data approach builds on the Architecture of the World Wide Web [16] and inherits the technical benefits such as simplicity and wide-spread adoption, but also the unsolved problems such as broken links. In the course of time, resources and their representations can be removed completely or "moved" to another URI, meaning that they are published under a different HTTP URI. In the case of the above example, the band eventually changed its name and the title of their Wikipedia entry to "Townline", with the result that the corresponding DBpedia entry moved from its previous URI to http://dbpedia.org/resource/Townline.

In the Linked Data context, we informally speak of links pointing from one resource (source) to another resource (target). Such a link is broken when the representations of the target cannot be accessed anymore. However, we consider a link also as broken when the representations of the target resource were updated in such a way that they underwent a change in meaning the link creator had not in mind.
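To make this concrete: today, the only way a Linked Data client notices a structurally broken link is to dereference the target URI and inspect the HTTP response. The following is a minimal sketch of such a check using only the Python standard library (the URI is the one from Figure 1; treating any non-2xx outcome as broken is my simplification):

import urllib.error
import urllib.request

def is_dereferenceable(uri: str) -> bool:
    """Return True if a representation of the resource can still be retrieved."""
    request = urllib.request.Request(
        uri, method="HEAD", headers={"Accept": "application/rdf+xml"}
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return 200 <= response.status < 300  # redirects are followed automatically
    except urllib.error.URLError:
        return False  # e.g., 404 Not Found: the link is structurally broken

print(is_dereferenceable("http://dbpedia.org/resource/Oliver_Black"))

Note that a 404 only tells the client that the link is broken, not where the target went; that gap is what DSNotify aims to fill.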
When regarding the changes in recent DBpedia releases, the broken link problem becomes evident: analogous to [7], we analyzed the instances of common DBpedia classes in the snapshots 3.2 (October 2008) and 3.3 (May 2009), identified the events that occurred between these versions and categorized them into move, remove, and create events. The results in Table 1 show that within a period of about seven months the DBpedia data space has grown and was considerably reorganized. Different from other data sources, DBpedia has the great advantage that it records move events in so-called redirect links that are derived from redirection pages. These are automatically created in the Wikipedia when articles are renamed.

Table 1: Changes between two DBpedia releases. Ins. 3.2 and Ins. 3.3 denote the number of instances of a certain DBpedia class in the respective release data sets, MV the moved, RM the removed, and CR the number of created resources.

Class        | Ins. 3.2 | Ins. 3.3 | MV    | RM     | CR
Person       | 213,016  | 244,621  | 2,841 | 20,561 | 49,325
Place        | 247,508  | 318,017  | 2,209 | 2,430  | 70,730
Organisation | 76,343   | 105,827  | 2,020 | 1,242  | 28,706
Work         | 189,725  | 213,231  | 4,097 | 6,558  | 25,967
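Since these redirect links are published as ordinary data, a client can ask DBpedia directly where a renamed entry went. A hedged sketch using the SPARQLWrapper library follows; the endpoint URL and the dbo:wikiPageRedirects property are assumptions about the present-day public DBpedia deployment, which postdates the releases analyzed above:

from SPARQLWrapper import SPARQLWrapper, JSON

# Look up the redirect target recorded for a renamed Wikipedia article.
sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?target WHERE {
        <http://dbpedia.org/resource/Oliver_Black> dbo:wikiPageRedirects ?target .
    }
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["target"]["value"])  # ideally http://dbpedia.org/resource/Townline

Most data sources offer no such move record, which is why the general problem requires the detection machinery developed below.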
resentations that are not retrievable anymore2 . In the re-
mainder of this paper, we will refer to structurally broken
If humans encounter broken links caused by a move event, links simply as broken links if not stated otherwise.
they can search the data source or the Web for the new lo-
cation of the target resource. However, for machine agents
broken links can lead to serious processing errors or misin-
Semantically broken links. Apart from structurally bro-
ken links, we also consider a link as broken if the human
terpretation of resources when they do not implement ap-
interpretation (the meaning) of the representations of its
propriate fallback mechanisms. If, for instance, the link in
target resource differs from the one intended by the link au-
Figure 1 breaks and the target resource becomes unavailable
thor. In a quantitative analysis of Wikipedia articles that
due to a remove or move event, the referenced biography
changed their meaning over time [13], the authors found out
information cannot be provided anymore. If the resource
that only a small number of articles (6 out of a test set
representations are updated and undergo a change in mean-
of 100 articles) underwent minor or major changes in their
ing, the Web portal could encounter the problem of exposing
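Read operationally, Definition 1 compares two snapshots of the dereferencing function. A minimal sketch, assuming the snapshots are modeled as plain dictionaries mapping a URI to its set of representations (my own illustration, not DSNotify's API):

def is_structurally_broken(target_uri, delta_before, delta_after):
    """Definition 1: representations existed at t - Delta but none exist at t."""
    had = bool(delta_before.get(target_uri, set()))
    has = bool(delta_after.get(target_uri, set()))
    return had and not has

# The old URI of the renamed band entry no longer yields any representation.
delta_before = {"http://dbpedia.org/resource/Oliver_Black": {"rdf/xml representation"}}
delta_after = {}
print(is_structurally_broken("http://dbpedia.org/resource/Oliver_Black",
                             delta_before, delta_after))  # True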
meaning. However, we do not think that these results can
semantically invalid information.
be generalized for arbitrary data sources.
While the detection of broken links on the Web is sup-
In contrast to structurally broken links, semantically bro-
ported by a number of tools, there are only few approaches
ken links are much harder to detect and fix by machine ac-
for automatically fixing them [19]. Techniques for dealing
tors. But they are fixable by human actors that may, in a
with the broken link problem in the Web of Data do not
semi-automatic process, report such events to a system that
exist yet. The current approach is to rely on the HTTP
then forwards these events to subscribed actors. We have
404 Not Found response and assume that data-consuming
foreseen this in our system (see Section 3).
actors can deal with it. We consider this as insufficient and
propose DSNotify, which we informally introduced in [12],
as a possible solution. DSNotify is a generic change detec- Events. Having defined links and broken links we can now
tion framework for Linked Data sources that informs data- define events that occur when datasets are modified, possi-
consuming actors about the various types of events (create, bly leading to broken links:
remove, move, update) that can occur in data sources.
Our contributions are: (i) we formalize the broken-link Definition 2 (Event). Let E be the set of all events
problem in the context of the Web of Data and provide a and e ∈ E be a quadruple e = (r1 , r2 , τ, t), where r1 ∈ R and
comprehensive analysis of existing solution strategies (Sec- r2 ∈ R ∪ {∅} are resources affected by the event,
tion 2), (ii) we present DSNotify, focusing on its core algo- τ ∈ {created, removed, updated, moved} is the type of the
rithms for handling the underlying instance matching prob- event and t is the time when the event took place. Further
lem (Section 3), and (iii) we have derived benchmark data let L ⊆ E be a set of detected events.
sets from the ISLab Instance Matching Benchmark [11] and Then we can assert that, ∀r ∈ R :
from the DBpedia knowledge base and evaluate the effec- δt−∆ (r) = ∅ ∧ δt (r) = ∅ =⇒ L ←− L ∪ {(r, ∅, created, t)} .
tiveness of the DSNotify approach (Section 4). δt−∆ (r) = ∅ ∧ δt (r) = ∅ =⇒ L ←− L ∪ {(r, ∅, removed, t)} .
δt−∆ (r) = δt (r) =⇒ L ←− L ∪ {(r, ∅, updated, t)} .
2. THE BROKEN LINK PROBLEM Note the analogy between the definition of broken links
In the previous section, we informally described the bro- and removed events: whenever the representations of a link
ken link problem and its possible consequences. In this sec- 2
Note that our definitions do not consider a link as broken
tion we want to contribute a formal definition of a broken if only some of the representations of the target resource are
link in the context of Linked Data and discuss existing so- not retrievable anymore. We consider clarifications of this
lution strategies for dealing with this problem. issue as a topic for further research.
Note the analogy between the definitions of broken links and removed events: whenever the representations of a link target are removed, the corresponding links are broken. Now we have defined create, remove and update events, but what about the event type "moved"? In fact, it is not possible to speak about moved resources considering only the previous definitions. Although there is no concept of resource location in RDF, it does exist in the Web of Data, as it relies on dereferenceable HTTP URIs. For this reason, we define a weak equality relation between resources in the Web of Data, based on a similarity relation between their representations, and build on that to define move events:

Definition 3 (Move Event). Let σ : P(D) × P(D) → [0, 1] be a similarity function between two sets of resource representations. Further, let Θ ∈ [0, 1] be a threshold value. We define the maximum similarity of a resource rold ∈ {r ∈ R | δt(r) = ∅} with any other resource rnew ∈ R \ {rold} as

    simmax(rold) ≡ max_{rnew ∈ R\{rold}} σ(δt−∆(rold), δt(rnew)) .

Now we can assert that:

    ∃! rnew ∈ R : δt−∆(rnew) = ∅ ∧ Θ < σ(δt−∆(rold), δt(rnew)) = simmax(rold)
        ⟹  L ← L ∪ {(rnew, rold, moved, t)} .

(Note that in the case that there are multiple possible move target resources with equal (maximum) similarity to the removed resource rold, no event is issued; ∃! should be read as "there exists exactly one".)
Thus, we consider a resource as moved from one HTTP URI to another when the resource with the "old" URI was removed, the resource with the "new" URI was created, and the representations retrieved from the "old" URI are very similar to the representations retrieved from the "new" URI.
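Continuing the sketch above, a literal reading of Definition 3 could be implemented as follows; the similarity function sim and the threshold theta are parameters, and the uniqueness check mirrors the ∃! condition:

    def detect_moves(removed_before, created_now, sim, theta, t):
        """removed_before maps each removed resource r_old to its
        representations at time t-Delta; created_now maps each newly created
        resource r_new to its representations at time t; sim maps two
        representation sets to [0, 1]."""
        events = []
        for r_old, reps_old in removed_before.items():
            scored = sorted(((sim(reps_old, reps_new), r_new)
                             for r_new, reps_new in created_now.items()),
                            reverse=True)
            if not scored:
                continue
            best_sim, best = scored[0]
            # Definition 3: issue a move event only for a unique maximum
            # similarity that exceeds the threshold theta.
            unique = len(scored) < 2 or scored[1][0] < best_sim
            if best_sim > theta and unique:
                events.append(Event(best, r_old, "moved", t))
        return events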
2.2 Solution Strategies

In consequence of Definition 1, we further provide a more informal definition of link integrity:

Definition 4 (Link Integrity). Link integrity is a qualitative measure for the reliability that a link leads to the representations of a resource that were intended by the author of the link.

Methods to preserve link integrity have a long history in hypertext research. We have analyzed existing approaches in detail, building to a great part on Ashman's work [2] and extending it. In the following, we categorize broken links by the events that caused their breaking: type A: links broken because source resources were moved; type B: links broken because target resources were moved; and type C: links broken because source or target resources were removed. (Note that in our definitions we did not consider links of type A, as we assumed an embedded link model for Linked Data sources.) The identified solution strategies are summarized in Table 2 and discussed in the following:

                                                  Broken link type
    Solution Strategy                  Class     A     B     C
    Ignoring the Problem               -         -     -     -
    Embedded Links                     p         +     -     -
    Relative References                p         ±     ±     -
    Indirection                        p         +     +     ±
    Versioned and Static Collections   p         ±     ±     ±
    Regular Updates                    p         +     +     +
    Redundancy                         p         +     +     +
    Dynamic Links                      a         +     +     +
    Notification                       c         +     +     +
    Detect and Correct                 c         +     +     +
    Manual Edit/Search                 c         +     +     +

Table 2: Solution strategies for the broken link problem. The strategies are classified according to Ashman [2]: preventative methods (p) that try to avoid broken links in the first place, adaptive methods (a) that create links dynamically, thereby avoiding broken links, and corrective methods (c) that try to fix broken links. Symbols: + potentially fixes/avoids such broken links; - does not fix/avoid such broken links; ± partly fixes/avoids such broken links.
Ignoring the Problem. Although this can hardly be called a "solution strategy", it is the status quo to simply ignore the problem of broken links and shift it to higher-level applications that process the data. As mentioned before, this strategy is even less acceptable in the Web of Data.

Embedded Links. The embedded link model [8] is the most common way links on the Web are modeled. As in HTML, the link is embedded in the source representation and the target resource is referenced using, e.g., an HTTP URI reference. This method preserves link integrity when the source resource of a link is moved (type A).

Relative References. Relative references prevent broken links in some cases (e.g., when a whole resource collection is moved). But neither does this method avoid broken links due to removed resources (type C), nor does it hinder links with external sources/targets (i.e., absolute references) from breaking.

Indirection. (This category combines the "Dereferencing (Aliasing) of End Points" and "Forwarding Mechanisms and Gravestones" categories from [2].) Introducing a layer of indirection allows content providers to keep links to their Web resources up to date. Aliases refer to the location of a resource, and special services translate between an alias and its referred location. Moving or removing a resource requires an update in these services' translation tables. Uniform Resource Names were proposed for such an indirection strategy; PURLs and DOIs are two well-known examples [1]. Permalinks use a similar strategy, although the translation step is performed by the content repository itself and not by a special (possibly central) service. Another special case on the Web is the use of small ("gravestone") pages that reside at the locations of moved or removed resources and indicate what happened to the resource (e.g., by automatically redirecting the HTTP request to the new location).

The main disadvantage of the indirection strategy is that it depends on notifications (see below) for updating the translation tables. Furthermore, it ". . . requires the cooperation of link creators to refer to the alias instead of to the absolute address." [2]. Another disadvantage is that special "translation services" (PURL server, CMS, gravestone pages) are required, which introduce additional latency when accessing resources (e.g., two HTTP requests instead of one). Nevertheless, indirection is an increasingly popular method on the Web.
This strategy prevents broken links of types A and B and can also help with type C links: e.g., removed PURLs result in an HTTP 410 (Gone) status code, which allows an application to react accordingly (e.g., by removing the links to this resource).

Versioned and Static Collections. In this approach, no modifications/deletions of the considered resources are allowed. This prevents broken links of types A-C within a static collection (e.g., an archive); links with sources/targets outside this collection can still break. Examples are HTML links into the Internet Archive (for example, the URI http://web.archive.org/web/19971011064403/http://www.archive.org/index.html references the 1997 version of the Internet Archive main page). Furthermore, semantically broken links may be prevented when, e.g., linking to a certain (unmodifiable) revision of a Wikipedia article.
Regular Updates. This approach is based on predictable updates to resource collections, so applications can easily update their links to the new (predictable) resource locations, avoiding broken links of types A-C.

Redundancy. Redundant copies of resource representations are kept, and a service forwards referrers to one of these copies as long as at least one copy still exists. This approach is related to the versioning and indirection approaches. However, such services can reasonably be applied only to highly available, unmodifiable data (e.g., collections of scientific documents). This method may prevent broken links of types A-C; examples include LOCKSS [21] and RepWeb [23].
Dynamic Links. In this method, links are created dynamically when required and are not stored, avoiding broken links of types A-C. The links are created based on computations that may reflect the current state of the involved resources as well as other (external) parameters, i.e., such links may be context-dependent. However, the automatic generation of links is a non-trivial task, and this solution strategy is not applicable to many real-world problems.

Notification. Here, clients are informed about the events that may lead to broken links, and all required information (e.g., new resource locations) is communicated to them so they can fix affected links. This strategy was, for example, used in the Hyper-G system [17], where resource updates are propagated between document servers using a p-flood algorithm. It is also the strategy of Silk and Triplify, discussed in Section 5. This method resolves broken links caused by A-C but requires that the content provider can observe and communicate such events.

Detect and Correct. The solution for the broken link problem proposed in this work falls mainly into this category, which Ashman describes in her work as: ". . . the application using the link first checks the validity of the endpoint reference against the information, perhaps matching it with an expected value. If the validity test fails, an attempt to correct the link by relocating it may be made . . ." [2]. As all other solutions in this category (cf. Section 5), we use heuristic methods to semi-automatically fix broken links of the types A-C.

Manual Edit/Search. In this category we summarize manual strategies for fixing broken links or re-finding missing link targets. This "solution strategy" is currently arguably among the most popular ones on the Web. First, content providers may manually update links in their contents (perhaps assisted by automatic link checking software like W3C's link checker). Second, users may re-find the target resources of broken links, e.g., by exploiting search engines or by manual URI manipulations.

3. DSNOTIFY

After having investigated possible solution strategies to deal with the broken link problem, we now present our own solution strategy. It is called DSNotify and is a generic change detection framework that assists human and machine actors in fixing broken links. It can be used as an add-on to existing applications that want to preserve link integrity in the data sets that are under their control (detect and correct strategy, see above). It can also be used to notify subscribed applications of changes in a set of data sources (notification strategy). Further, it can be set up as a service that automatically forwards requests to the new resource locations of moved resources (indirection strategy).

DSNotify is not meant to be a service that monitors the whole Linked Data space, but rather a light-weight component that can be tailored to application-specific needs and detects modifications in selected Linked Data sources.

A typical usage scenario is illustrated in Figure 2: an application hosts a data set Dsrc that contains links to a target data set Dtgt. The application consumes and integrates data from both data sets and provides a view on this data (such as a Website) consumed by Web users. The application uses DSNotify to monitor Dtgt, as it has no control over this target data set. DSNotify notifies the application (and possibly other subscribed actors) about the events occurring in Dtgt, and the application can update and fix potentially broken links in the source data set.

Figure 2: DSNotify Usage Scenario.

3.1 General Approach

Our approach for fixing broken links is based on an indexing infrastructure. A monitor periodically accesses the considered data sources (e.g., a Linked Data source), creates an item for each resource it encounters, and extracts a feature vector from this item's representations. The feature vector is stored together with the item's URI in an item index (ii). The set of extracted features, their implementation, and their extractors are configurable.
Feature vectors are updated in the ii with every monitoring cycle, resulting in possible update events logged by DSNotify. Items corresponding to resources that are not found anymore are moved to another index, the removed item index (rii). After some timeout period, items are moved from the rii to a third index called the archived item index (aii), resulting in a remove event (cf. Definition 2).

Items in the ii and the rii are periodically considered by a housekeeper thread (a "move detector") that compares their feature vectors and tries to identify possible successors for removed (real-world) items (cf. Definition 3). A plug-in heuristic is used for this comparison; in the default configuration, a vector space model acting on the extracted and weighted feature vectors is used. The similarity measures for the features themselves are configurable; for character string features one could use, for instance, exact string matching or the Levenshtein distance. The similarity value calculated by the heuristic is compared against threshold values to determine whether an item is another item's predecessor (resulting in a detected move event) or not (possibly resulting in a detected create event); using threshold values for the determination of non-matches, possible matches and matches was already proposed by Fellegi and Sunter in 1969 [10]. The predecessors of the newly indexed items are moved to the aii and linked to the new corresponding items. This enables actors to query the DSNotify indices for the actual locations of items.

The events detected by DSNotify are stored in a central event log. This log as well as the indices can be accessed via various interfaces, such as a Java API, an XML-RPC interface, and an HTTP interface.
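The index bookkeeping described above can be pictured with the following simplified sketch; the actual DSNotify indices are richer (persistent and driven by configurable feature extractors), so the names and structures here are illustrative only:

    import time

    class Indices:
        """Simplified bookkeeping for DSNotify's item index (ii), removed
        item index (rii) and archived item index (aii)."""

        def __init__(self, timeout_s):
            self.ii = {}     # uri -> feature vector
            self.rii = {}    # uri -> (feature vector, removal time)
            self.aii = {}    # uri -> (feature vector, successor uri or None)
            self.timeout_s = timeout_s

        def monitor_cycle(self, snapshot, now=None):
            """snapshot maps each currently retrievable uri to a freshly
            extracted feature vector."""
            now = time.time() if now is None else now
            for uri in list(self.ii):
                if uri not in snapshot:           # not found anymore -> rii
                    self.rii[uri] = (self.ii.pop(uri), now)
            self.ii.update(snapshot)              # add new, refresh existing

        def archive_timed_out(self, now=None):
            """Items that stayed in the rii longer than the timeout go to
            the aii; this corresponds to a detected remove event."""
            now = time.time() if now is None else now
            for uri, (fv, removed_at) in list(self.rii.items()):
                if now - removed_at > self.timeout_s:
                    del self.rii[uri]
                    self.aii[uri] = (fv, None)    # no successor known yet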
3.2 Time-interval-based Blocking

The main task of DSNotify is the efficient and accurate matching of pairs of feature vectors representing the same real-world item at different points in time. As in record linkage and related problems (cf. Section 5), the number of such pairs grows quadratically with the number of considered items, resulting in unacceptable computational effort. The reduction of the number of comparisons is called blocking, and various approaches have been proposed in the past [25].

We have developed a time-interval-based blocking (TIBB) mechanism for an efficient and accurate reduction of the number of compared feature vector pairs. Our method includes only the feature vectors derived from items that were detected as being recently created or removed, blocking out the vast majority of the items in our indices. Reconsidering Definition 3, this means that we are allowing only small values for ∆. Thus, if x is the number of feature vectors stored in our system, n is the number of new items and r is the number of recently removed items with n + r ≤ x, then the number of comparisons in a single DSNotify housekeeping operation is n · r instead of x². It is intuitively clear that normally n and r are much smaller than x and therefore n · r ≪ x². The actual number of feature vector comparisons in a single housekeeper operation depends on the vitality of the monitored data source with respect to created, removed and moved items, and on the frequency of housekeeping operations. (As housekeeping and monitoring are separate operations in DSNotify, this number also depends on the monitoring frequency when it is lower than the housekeeping frequency.) We have analyzed and confirmed this behavior in the evaluation of our system (see Section 4).
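To give the blocking gain a concrete scale (with illustrative numbers of our own, not measurements from the paper): with x = 10,000 feature vectors in the indices, n = 50 recently created and r = 40 recently removed items, a single housekeeping operation performs n · r = 2,000 comparisons instead of the roughly x² = 100,000,000 pairs that naive pairwise matching would have to consider.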
3.3 Monitoring and Housekeeping

Figure 3: Example timeline illustrating the main workflow of DSNotify. Ci, Ri and Mi,j denote create, remove and move events of items i and j; mx and hx denote monitoring and housekeeping operations, respectively. The current index contents are shown below the respective operation; the overall process is explained in the text.

The cooperation of monitor, housekeeper, and the indices is depicted in Figure 3. To simplify matters, we assume an empty dataset at the beginning. Then four items (A, B, C, D) are created before the initial DSNotify monitoring process starts at m0. The four items are detected and added to the item index (ii). Then a new item E is created, item A is removed, and the items B and C are "moved" to a new location, becoming items F and G respectively. At m1, the three items that are not found anymore by the monitor are moved to the removed item index (rii) and the new item is added to the ii. When the housekeeper is started for the first time at h1, it acts on the current indices and compares the recently added items (E, F, G) with the recently removed items (B, C, A). It does not include the "old" item D in its feature vector comparisons. The housekeeper detects B as a predecessor of F and C as a predecessor of G, moves B and C to the archived item index (aii) and links them to their successors. Between m1 and m2, a new item is created (H), two items (F, D) are removed and the item E is "moved" to item I. The monitor updates the indices accordingly at m2, and the subsequent housekeeping operation at h2 tries to find predecessors of the items H and I. But before this operation, the housekeeper recognizes that the retention period of item A in the rii is longer than the timeout period and moves it to the aii. The housekeeper then detects E as a predecessor of I, moves it also to the aii and links it to I. Between m2 and m3 no events take place and the indices remain untouched by the monitor. At h3, the housekeeper recognizes the timeout of the items F and D and moves them to the aii, leaving an empty rii.
3.4 Event Choices

As mentioned before, a threshold value (upperThreshold) is used to decide whether two feature vectors are similar enough to assume their corresponding items form a predecessor/successor pair. Furthermore, DSNotify uses a second threshold value (lowerThreshold) to decide whether two feature vectors are similar enough to be considered as such a pair at all ("possible move candidates", pmc). When none of the feature vectors considered for a possible move operation are similar enough (i.e., > upperThreshold), DSNotify stores all considered pairs of feature vectors with similarity values > lowerThreshold in a so-called event choice object. Event choices are representations of decisions that have to be made outside of DSNotify, for example by human actors or by other machine actors that can resort to additional data/knowledge. These external actors may access the log of event choice objects and send back their decisions about which feature vector pair (if any) corresponds to a predecessor/successor pair. DSNotify will then update its indices accordingly and send notifications to all subscribed actors. A detailed description of the overall housekeeping algorithm, the core of DSNotify, is presented in Algorithm 1.

    Data: Item indices ii, rii, aii
    Result: List of detected events L
    begin
        Move timed-out items from rii to aii;
        L ← ∅; PMC ← ∅;
        foreach ni ∈ ii.getRecentItems() do
            pmc ← ∅;
            foreach oi ∈ rii.getItems() do
                sim ← calculateSimilarity(oi, ni);
                if sim > lowerThreshold then
                    pmc ← pmc + {(oi, ni, sim)};
                end
            end
            if pmc = ∅ then
                L ← L + {(ni, ∅, create)};
            else
                PMC ← PMC + {pmc};
            end
        end
        foreach pmc ∈ PMC do
            if pmc ≠ ∅ then
                (oimax, nimax, simmax) ← getElementWithMaxSim(pmc);
                if simmax > upperThreshold then
                    L ← L + {(oimax, nimax, move)};
                    move oimax to aii;
                    link oimax to nimax;
                    remove all elements from PMC where pmc.oi = oimax;
                else
                    Issue an eventChoice for pmc;
                end
            end
        end
        return L;
    end

Algorithm 1: Central DSNotify housekeeping algorithm.
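For readers who prefer executable form, the following Python sketch mirrors the control flow of Algorithm 1, reusing the Event type and the Indices sketch from above; calculate_similarity and the recent-item bookkeeping are assumed, and event-choice handling is reduced to collecting the ambiguous candidate sets:

    def housekeeping(indices, recent_new, calculate_similarity,
                     lower_threshold, upper_threshold, t):
        """Sketch of Algorithm 1. recent_new maps the URIs of recently
        indexed items to their feature vectors. Returns the detected events
        and the ambiguous candidate sets (the "event choices")."""
        indices.archive_timed_out()                  # timed-out items: rii -> aii
        events, all_pmc, event_choices = [], [], []
        for ni, ni_fv in recent_new.items():
            pmc = []                                 # possible move candidates
            for oi, (oi_fv, _) in indices.rii.items():
                sim = calculate_similarity(oi_fv, ni_fv)
                if sim > lower_threshold:
                    pmc.append((sim, oi, ni))
            if pmc:
                all_pmc.append(pmc)
            else:
                events.append(Event(ni, None, "created", t))
        for pmc in all_pmc:
            if not pmc:                              # emptied by an earlier move
                continue
            sim_max, oi_max, ni_max = max(pmc)
            if sim_max > upper_threshold:
                # (successor, predecessor) ordering as in Definition 3
                events.append(Event(ni_max, oi_max, "moved", t))
                fv, _ = indices.rii.pop(oi_max)
                indices.aii[oi_max] = (fv, ni_max)   # link predecessor to successor
                for other in all_pmc:                # oi_max is taken: drop it elsewhere
                    other[:] = [c for c in other if c[1] != oi_max]
            else:
                event_choices.append(pmc)            # defer to external actors
        return events, event_choices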
3.5 Item History and Event Log

As discussed above, DSNotify incrementally constructs three central data structures during its operation: (i) an event log containing all events detected by the system, (ii) a log containing all unresolved event choices, and (iii) a linked structure of feature vectors constituting a history of the respective items. This latter structure is stored in the indices maintained by DSNotify. All three data structures can be accessed in various ways by agents that make use of DSNotify for fixing broken links, as further described in [12].

As these data structures may grow indefinitely, a strategy for pruning them from time to time is required. Currently we rely on simple timeouts for removing old data items from these structures, but this method can still result in unacceptable memory consumption when monitoring highly dynamic data sources. More advanced strategies are under consideration. Note that we consider particularly the feature vector history a very valuable data structure, as it allows ex post analysis of the evolution of items w.r.t. their location in a data set and the values of the indexed features.

4. EVALUATION

In the evaluation of our system we concentrated on two issues: first, we wanted to evaluate the system for its applicability to real-world Linked Data sources, and second, we wanted to analyze the influence of the housekeeping frequency on the overall effectiveness of the system.

We evaluated our system with datasets that we call eventsets. An eventset is a time-ordered set of events (cf. Definitions 2 and 3) that transforms a source into a target dataset. Thus, an eventset can be seen as the event history of a dataset. We have developed a simulator that can re-play such eventsets, interpreting the event timestamps with regard to a configurable duration of the whole simulation. Figure 4 depicts an overview of our evaluation approach.

Figure 4: The DSNotify evaluation approach. A simulator takes two datasets (Dsrc and Dtgt) and an eventset (Etest) as input and continuously updates a newly created observed dataset (Dobs). DSNotify monitors this dataset and creates a log of detected events (Edet). This log is compared to the eventset Etest to evaluate the system's accuracy.

All experiments were carried out on a system using two Intel Xeon CPUs with 2.66 GHz each and 8 GB of RAM. The threshold values used were 0.8 (upperThreshold) and 0.3 (lowerThreshold). We have created two types of eventsets from existing datasets for our evaluation: the iimb-eventsets and the dbpedia-eventset. (All data sets are available at http://dsnotify.org/.)

4.1 The IIMB Eventsets

The iimb-eventsets are derived from the ISLab Instance Matching Benchmark [11], which contains one (source) dataset containing 222 instances and 37 target datasets that vary in the number and type of modifications introduced to the instance data. It is the goal of instance matching tools to match the resources in the source dataset with the resources in the respective target dataset by comparing their instance data. The benchmark contains an alignment file describing
what resources correspond to each other, which can be used to measure the effectiveness of such tools. We used this alignment information to derive 10 eventsets, corresponding to the first 10 iimb target datasets, each containing 222 move events. The first 10 iimb datasets introduce increasing numbers of value transformations, like typographical errors, to the instance data. We used random timestamps for the events (as this data is not available in this benchmark), resulting in an equal distribution of events over the eventset duration.

We have simulated these eventsets, monitored the changing dataset with DSNotify, and measured precision and recall of the reported events with respect to the eventset information. For a useful feature selection we first calculated the entropy of the properties with a coverage > 10%, i.e., only properties were considered where at least 10% of the resources had instance values. The results are summarized in Table 3. As the goal of the evaluation was not to optimize the resulting precision/recall values but to analyze our blocking approach, we consequently chose the properties tbox:cogito-tag and tbox:cogito-domain for the evaluation because they have good coverage but comparatively small entropy in this dataset. We calculated the entropy as shown in Equation 1 and normalized it by dividing by ln(n):

    H(p) = − Σ_{i=1}^{n} p_i ln(p_i)        (1)

    Name                         Coverage   H       Hnorm
    tbox:cogito-Name             0.995      5.378   0.995
    tbox:cogito-first sentence   0.991      5.354   0.991
    tbox:cogito-tag              0.986      1.084   0.201
    tbox:cogito-domain           0.982      3.129   0.579
    tbox:wikipedia-name          0.333      1.801   0.333
    tbox:wikipedia-birthdate     0.225      1.217   0.225
    tbox:wikipedia-location      0.185      0.992   0.184
    tbox:wikipedia-birthplace    0.104      0.553   0.102

    Namespace prefix tbox: <http://islab.dico.unimi.it/iimb/tbox.owl#>

Table 3: Coverage, entropy (H) and normalized entropy (Hnorm) of all properties in the iimb datasets with a coverage > 10%. The selected properties are tbox:cogito-tag and tbox:cogito-domain.
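A direct (unoptimized) implementation of Equation (1) and the normalization used for Tables 3 and 4 could look as follows; taking n to be the number of observed values is our reading of the text, though it is consistent with the reported numbers:

    import math
    from collections import Counter

    def entropy(values):
        """Equation (1): H(p) = -sum_i p_i ln(p_i), where p_i is the
        relative frequency of each distinct property value."""
        counts = Counter(values)
        total = sum(counts.values())
        return -sum((c / total) * math.log(c / total) for c in counts.values())

    def normalized_entropy(values):
        # Normalized by ln(n); n is taken as the number of observations.
        n = len(values)
        return entropy(values) / math.log(n) if n > 1 else 0.0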
DSNotify was configured to compare these properties using the Levenshtein distance, and both properties contributed equally (weight = 1.0) to the corresponding feature vector comparison. The simulation was configured to run for 60 seconds; thus the monitored datasets changed at an average rate of 222/60 ≈ 3.7 events/s.

As stated before, the goal of this evaluation was to demonstrate the influence of the housekeeping frequency on the overall effectiveness of the system. For this, we repeated the experiment with varying housekeeping intervals of 1s, 3s, 10s, 20s and 30s (corresponding to average rates of 3.7, 11.1, 37.0, 74.0 and 111.0 events/housekeeping cycle) and calculated the F1-measure (the harmonic mean of precision and recall, F1 = 2PR/(P + R)) for each dataset (Figure 5).

Figure 5: Influence of the housekeeping interval (hki) on the F1-measure in the iimb-eventsets evaluations.

Results. The results clearly demonstrate the expected decrease in accuracy when increasing the length of the housekeeping intervals, as this leads to more feature vector comparisons and therefore more possibilities to make wrong decisions. Furthermore, Figure 5 depicts the decreasing accuracy with increasing dataset number. This is also expected, as the benchmark introduces more value transformations with higher dataset numbers, although there are two outliers, for datasets 7 and 10.

4.2 The DBpedia Persondata Eventset

In order to evaluate our approach with real-world data, we have created a dbpedia-eventset that was derived from the person datasets of the DBpedia snapshots 3.2 and 3.3. (The snapshots contain a subset of all instances of type foaf:Person and can be downloaded from http://dbpedia.org/, filename: persondata_en.nt.) The raw persondata datasets contain 20,284 (version 3.2) and 29,498 (version 3.3) subjects typed as foaf:Person, each having the three properties foaf:name, foaf:surname and foaf:givenname. Naturally, these properties are very well suited to uniquely identify persons, as also confirmed by their high entropy values (cf. Table 4). For the same reasons as already discussed for the iimb datasets, an evaluation with only these properties would not clearly demonstrate our approach. Therefore we enriched both raw data sets with four properties (see Table 4) from the respective DBpedia Mapping-based Infobox Extraction datasets [7] with smaller coverage and entropy values.

We derived the dbpedia-eventset by comparing both datasets for created, removed or updated resources. We retrieved the creation and removal dates for the events from Wikipedia, as these data are not included in the DBpedia datasets. For the update events we used random dates. Furthermore, we used the DBpedia redirect dataset to identify and generate move events. This dataset contains redirection information derived from Wikipedia's redirect pages, which are automatically created when a Wikipedia article is renamed. The dates for these events were also retrieved from Wikipedia. The resulting eventset contained 3810 create, 230 remove, 4161 update and 179 move events, summing up to 8380 events. (Another 5666 events were excluded from the eventset as they resulted from inaccuracies in the DBpedia datasets. For example, there are some items in the 3.2 snapshot that are not part of the 3.3 snapshot but were not removed from Wikipedia; a prominent example is the resource http://dbpedia.org/resource/Tim_Berners-Lee. Furthermore, several items from version 3.3 were not included in version 3.2 although the creation date of the corresponding Wikipedia article is before the creation date of the 3.2 snapshot. We decided to generally exclude such items.) The histogram of the eventset depicted in Figure 6 shows a high peak in bin 14. About a quarter of all events
occurred within this time interval. We think that such event peaks are not unusual in real-world data, and we were interested in how our application deals with such situations.

Figure 6: Histogram of the distribution of events in the dbpedia-eventset. A bin corresponds to a time interval of about 11 days.

    Name                 Type   Coverage    H            Hnorm
    foaf:name            d      1.00/1.00   9.91/10.28   1.00/1.00
    foaf:surname         d      1.00/1.00   9.11/9.25    0.92/0.90
    foaf:givenname       d      1.00/1.00   8.23/8.52    0.83/0.83
    dbpedia:birthdate    d      0.60/0.60   5.84/5.96    0.59/0.58
    dbpedia:birthplace   o      0.48/0.47   4.24/4.32    0.43/0.42
    dbpedia:height       d      0.10/0.08   0.65/0.51    0.07/0.05
    dbpedia:draftyear    d      0.01/0.01   0.06/0.05    0.01/0.01

    Namespace prefix dbpedia: <http://dbpedia.org/ontology/>
    Namespace prefix foaf: <http://xmlns.com/foaf/0.1/>

Table 4: Coverage, type, entropy (H) and normalized entropy (Hnorm) of all properties in the enriched dbpedia 3.2/3.3 persondata sets; paired values are given as 3.2/3.3. The six selected properties are all of the above except foaf:surname. Symbols: object property (o), datatype property (d).

We re-played the eventsets, monitored the changing dataset with DSNotify, and measured precision and recall of the reported events with respect to the eventset information (cf. Figure 4). We repeated the simulation seven times, varying the number of average events per housekeeping interval, and calculated the F1-measure of the reported move events. (We fixed the housekeeping period for this experiment to 30s and varied the simulation length from 3600s to 56.25s. Thus the event rates varied between 2.3 and 149.0 events/second, or 35.2 and 2250.1 events/housekeeping interval, respectively. For these calculations we considered only move, remove and create events (i.e., 4219 events) from the eventset, as only these influence the accuracy of the algorithm.)
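The re-play mechanism can be pictured with this minimal sketch (our own simplification of the simulator in Figure 4, reusing the Event type from the earlier sketches); it linearly rescales the event timestamps onto a configurable simulation duration, which is what varies the average event rate:

    import time

    def replay(eventset, duration_s, apply_event):
        """Re-play a non-empty, time-ordered list of Event objects within
        duration_s seconds. apply_event applies one create/remove/update/
        move operation to the observed dataset (Dobs in Figure 4)."""
        t0, t1 = eventset[0].t, eventset[-1].t
        span = (t1 - t0) or 1.0
        start = time.time()
        for e in eventset:
            target = start + duration_s * (e.t - t0) / span
            time.sleep(max(0.0, target - time.time()))
            apply_event(e)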
For each simulation, DSNotify was configured to index only one of the six selected properties in Table 4. To calculate the similarity between datatype properties, we used the Levenshtein distance. For object properties we used a simple similarity function that counts the number of common property values (i.e., resources) in the two compared resources and divides it by the total number of values.
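The object-property similarity is described only verbally; one plausible reading (a Jaccard-style coefficient, with "total values" taken as the union of distinct values) is the following sketch:

    def object_property_similarity(values_a, values_b):
        """Count common property values and divide by the total number of
        distinct values; the paper's exact normalization may differ."""
        a, b = set(values_a), set(values_b)
        if not (a or b):
            return 0.0
        return len(a & b) / len(a | b)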
Furthermore, we ran the simulations indexing only one cumulative attribute, an rdf-hash. This hash function calculates an MD5 hashsum over all string-serialized properties of a resource, and the corresponding similarity function returns 1.0 if the hash-sums are equal and 0.0 otherwise. Thus this rdf-hash is sensitive to any modifications in a resource's instance data.
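The rdf-hash can be sketched as follows; the MD5 hashing and the all-or-nothing similarity are as described, while the canonical serialization (sorting the property/value pairs) is our assumption:

    import hashlib

    def rdf_hash(properties):
        """MD5 hashsum over all string-serialized property/value pairs of a
        resource; sorted to make the serialization order canonical."""
        serialized = "\n".join(sorted(f"{p}={v}" for p, v in properties))
        return hashlib.md5(serialized.encode("utf-8")).hexdigest()

    def rdf_hash_similarity(h1, h2):
        # Any modification of the instance data changes the hash,
        # so the similarity is all-or-nothing.
        return 1.0 if h1 == h2 else 0.0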
Additionally, we evaluated a combination of the dbpedia birthdate and birthplace properties, each contributing with equal weight to the weighted feature vector. The coverage of resources that had values for at least one of these attributes was 65% in the 3.2 snapshot and 62% in the 3.3 snapshot.

Figure 7: Influence of the number of events per housekeeping cycle on the F1-measure of detected move events in the dbpedia-eventset evaluation.

Results. The results, depicted in Figure 7, show a fast saturation of the F1-measure with a decreasing number of events per housekeeping cycle. This clearly confirms the findings from our iimb evaluation: the accuracy of DSNotify increases with increasing housekeeping frequencies or decreasing event rates. From a pragmatic viewpoint, this means a tradeoff between the costs of monitoring and housekeeping operations (computational effort, network transmission costs, etc.) and accuracy. The curve for the simple rdf-hash function is surprisingly good, stabilizing at about 80% for the F1-measure. This can be attributed mainly to the high precision rates that are expected from such a function. The curve for the combined properties shows maximum values for the F1-measure of about 60%.

The measured precision and recall rates are depicted in Figure 8. Both measures show a decrease with increasing numbers of events per housekeeping cycle. For the precision this can be observed mainly for low-entropy properties, whereas the recall measures for all properties are affected.

Figure 8: Influence of the number of events per housekeeping cycle on the measured precision and recall of detected move events in the dbpedia-eventset evaluation.
It is, again, important to state that our evaluation did not have the goal of maximizing the accuracy of the system for these particular eventsets, but rather of revealing the characteristics of our time-interval-based blocking approach. It shows that