This document discusses RDF2Rules, an approach to learning rules from RDF knowledge bases. It first mines frequent predicate cycles (FPCs), patterns that frequently appear in the RDF graph, and then generates rules from the mined FPCs. RDF2Rules learns rules quickly from RDF data and generates more rules than alternative approaches, with high-quality predictions and fast running times. The document provides background on Semantic Web issues, RDF, the predicate paths and cycles that FPCs are based on, and how RDF2Rules indexes RDF data to efficiently support its mining algorithm.
The International Federation of Library Associations and Institutions (IFLA) is responsible for the development and maintenance of International Standard Bibliographic Description (ISBD), UNIMARC, and the "Functional Requirements" family for bibliographic records (FRBR), authority data (FRAD), and subject authority data (FRSAD). ISBD underpins the MARC family of formats used by libraries world-wide for many millions of catalog records, while FRBR is a relatively new model optimized for users and the digital environment. These metadata models, schemas, and content rules are now being expressed in the Resource Description Framework language for use in the Semantic Web.
This webinar provides a general update on the work being undertaken. It describes the development of an Application Profile for ISBD to specify the sequence, repeatability, and mandatory status of its elements. It discusses issues involved in deriving linked data from legacy catalogue records based on monolithic and multi-part schemas following ISBD and FRBR, such as the duplication which arises from copy cataloging and FRBRization. The webinar provides practical examples of deriving high-quality linked data from the vast numbers of records created by libraries, and demonstrates how a shift of focus from records to linked-data triples can provide more efficient and effective user-centered resource discovery services.
Cross-language information retrieval (CLIR) is a technique to locate documents written in one natural language by queries expressed in another language. This project investigates the feasibility of CLIR based on domain-specific bilingual corpus databases.
Concept and example of a semantic solution implemented with SQL views, enabling users to query structured data without needing knowledge of the underlying database schema or technology.
Over the last years, the Semantic Web has been growing steadily. Today, we count more than 10,000 datasets made available online following Semantic Web standards. Nevertheless, many applications, such as data integration, search, and interlinking, may not take full advantage of the data without having a priori statistical information about its internal structure and coverage. In fact, there are already a number of tools which offer such statistics, providing basic information about RDF datasets and vocabularies. However, those usually show severe deficiencies in terms of performance once the dataset size grows beyond the capabilities of a single machine. In this paper, we introduce a software component for statistical calculations of large RDF datasets, which scales out to clusters of machines. More specifically, we describe the first distributed in-memory approach for computing 32 different statistical criteria for RDF datasets using Apache Spark. The preliminary results show that our distributed approach improves upon a previous centralized approach we compare against and provides approximately linear horizontal scale-up. The set of criteria is extensible beyond the 32 defaults; the component is integrated into the larger SANSA framework and employed in at least four major usage scenarios beyond the SANSA community.
Context, Perspective, and Generalities in a Knowledge Ontology (Mike Bergman)
This presentation to the Ontolog Forum in Dec 2016 presents the knowledge graph (ontology) design for KBpedia, a system of six major knowledge bases and 20 minor ones for conducting knowledge-based artificial intelligence (KBAI). The talk emphasizes the roots of the system in the triadic logic of Charles Sanders Peirce. It also discusses the use of KBpedia for the more-or-less automatic ways it can help create training corpora, training sets, and reference standards for supervised, unsupervised, and deep machine learning. Uses of the system include entity and relation extraction and tagging, classification, clustering, sentiment analysis, and other AI tasks.
Introducing FRSAD and Mapping it with Other Models (Marcia Zeng)
Report on the work of the IFLA FRSAR (Functional Requirements for Subject Authority Records) Working Group.
1. Introducing the FRSAD model.
2. Mapping to other models (BS 8723 and ISO 25964, SKOS, OWL, & DCMI-AM).
(Presented at IFLA 2009, Milan, August 2009, by Marcia Zeng and Maja Zumer)
Paper and FRSAD report available at: http://nkos.slis.kent.edu/FRSAR/index.html
FRSAD: Functional Requirements for Subject Authority Data model (Marcia Zeng)
Presentation on the modeling approach of the FRSAD (Functional Requirements for Subject Authority Data) model and the entities, attributes, and relationships it defines. Implications of the FRSAD model for interoperability and future R&D are also discussed. Presented for the ALCTS CCS Subject Analysis Committee, ALA 2010 Annual Conference, Washington, D.C., June 28, 2010.
These slides were presented as part of a W3C tutorial at the CSHALS 2010 conference (http://www.iscb.org/cshals2010). The slides are adapted from a longer introduction to the Semantic Web available at http://www.slideshare.net/LeeFeigenbaum/semantic-web-landscape-2009 .
A PDF version of the slides is available at http://thefigtrees.net/lee/sw/cshals/cshals-w3c-semantic-web-tutorial.pdf .
Explanations in Dialogue Systems through Uncertain RDF Knowledge Bases (Daniel Sonntag)
We implemented a generic dialogue shell that can be configured for and applied to domain-specific dialogue applications. The dialogue system works robustly for a new domain when the application backend can automatically infer previously unknown knowledge (facts) and provide explanations for the inference steps involved. For this purpose, we employ URDF, a query engine for uncertain and potentially inconsistent RDF knowledge bases. URDF supports rule-based, first-order predicate logic as used in OWL-Lite and OWL-DL, with simple and effective top-down reasoning capabilities. This mechanism also generates explanation graphs. These graphs can then be displayed in the GUI of the dialogue shell and help the user understand the underlying reasoning processes. We believe that proper explanations are a main factor for increasing the level of user trust in end-to-end human-computer interaction systems.
Resource Description Framework (RDF) has entered the metadata scene for libraries in a major way over the last few years. While the promise of its Linked Data capabilities is exciting, the realities of changing data models, encoding practices, and even ontologies can put a check on that excitement. This session will explore these issues and discuss when this is worth doing and how to go about doing it.
Presentation given* at the 13th International Semantic Web Conference (ISWC), in which we present a compressed format for representing RDF data streams. See the original article at: http://dataweb.infor.uva.es/wp-content/uploads/2014/07/iswc14.pdf
* Presented by Alejandro Llaves (http://www.slideshare.net/allaves)
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud (Ontotext)
This webinar breaks down the roadblocks that prevent many from reaping the benefits of heavyweight semantic technology in small-scale projects. We will show you how to build Semantic Search & Analytics proofs of concept by using managed services in the Cloud.
We now have larger Knowledge Bases than ever before. (10 billion facts is now a small number).
We now have the instruments to observe and analyse these very large Knowledge Bases.
We can use these insights for better tools for querying, inferencing, publishing, maintaining, visualising and explaining.
The Web is a universal medium for information, data, and knowledge exchange. The Semantic Web is an extension of the World Wide Web, "in which information is given well-defined meaning, better enabling computers and people to work in cooperation" \cite{semweb:lee}. RDF, together with SPARQL, provides a powerful mechanism for describing and interchanging metadata on the web. This paper briefly presents the two concepts, RDF and SPARQL, and three of the most popular frameworks (written in Java) that offer support for RDF: Jena, Sesame, and JRDF.
Integrating Heterogeneous Data Sources in the Web of Data (Franck Michel)
These are the slides of a 40-minute presentation I gave at the CNRS Software Development days (JDEV 2017) in Marseille, France, on July 5th, 2017.
Here is the Webcast, in French: https://webcast.in2p3.fr/videos-integrer_des_sources_de_donnees_heterogenes_dans_le_web_de_donnees
2. Semantic Web
The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation (Tim Berners-Lee, 2001).
Semantics is the study of meaning, focused on the relation between signifiers (words, phrases, signs, and symbols) and what they stand for.
3. Semantic Web problems
Too much web information
Around 1,000,000,000 (1×10⁹) resources.
Many different types of resources:
• Texts, images, graphics
• Audio, video, multimedia
• Databases, web applications
4. Semantic Web problems
Information not indexable
No common scheme for doing so.
Differing relationships between authors, publishers, information intermediaries, and users.
Each community uses its own approach.
Information not shareable
Difficult to share information about information.
No common catalog scheme.
6. Second issue
A language for expressing metadata must be:
–universal (so all can understand it)
–flexible (to incorporate different types)
–extensible (open to custom types)
–simple (to encourage adoption)
–modular (so that schemes can be mixed and extended)
8. RDF
RDF stands for Resource Description Framework.
It is machine-understandable metadata.
RDF is a graph formalism (+ XML syntax + semantics):
–for representing metadata
–for describing the semantics of information in a machine-accessible way
9. Resource Description Framework (RDF)
RDF is a language for representing resources:
A resource can be anything.
In the context of the Web, the focus is on web resources, i.e., anything that can be located via a URL (Uniform Resource Locator).
The basic building block is the statement (or triple).
One of the main applications: data integration.
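The statement/triple idea above can be sketched in a few lines of Python; the IRIs and the `objects` helper here are illustrative assumptions, not part of any standard API:

```python
# A minimal sketch of RDF statements as (subject, predicate, object) triples.
# An RDF graph is then simply a set of such triples.
triples = {
    ("http://example.org/Alice", "http://example.org/knows", "http://example.org/Bob"),
    ("http://example.org/Alice", "http://example.org/name", '"Alice"'),
}

def objects(graph, subject, predicate):
    """Return every object linked to `subject` via `predicate`."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

print(objects(triples, "http://example.org/Alice", "http://example.org/knows"))
# {'http://example.org/Bob'}
```

Real applications would use an RDF library and SPARQL for such lookups; the point here is only that a triple is the atomic unit of description.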
10. RDB2RDF
The adaptation of the relational model to the Web gave rise to RDF:
From tuples to triples.
Any relational data can be represented as triples:
Row key → subject
Column → property / relation
Value → value / object
[Diagram: an RDF statement links a subject to a value via a property]
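The tuples-to-triples mapping above can be sketched as follows; the table name, key column, and IRI scheme are hypothetical choices for illustration:

```python
# Sketch of the RDB2RDF idea: each non-key cell of a row becomes one triple.
# The row key names the subject; each column becomes a property.
def row_to_triples(table, key_column, row):
    subject = f"http://example.org/{table}/{row[key_column]}"
    return [
        (subject, f"http://example.org/{table}#{col}", value)
        for col, value in row.items()
        if col != key_column
    ]

row = {"id": "42", "name": "Alice", "city": "Lyon"}
for t in row_to_triples("person", "id", row):
    print(t)
```

Standards such as W3C's R2RML define this mapping rigorously; this sketch only shows the basic row-to-subject, column-to-property correspondence.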
11. RDF2Rules
A rule-learning approach: learning rules from RDF knowledge bases by mining frequent predicate cycles.
It first mines frequent predicate cycles (FPCs), a kind of interesting frequent pattern in knowledge bases, and then generates rules from the mined FPCs.
It uses entity type information when generating and evaluating rules.
12. Quality of RDF KB
To enrich the knowledge in an RDF KB, information extraction techniques are usually used to extract more entities and their relations (facts) from plain or semi-structured text.
Another way to expand a KB is to infer new facts from the existing ones by using inference rules, e.g.:
hasChild(A,B) ∧ hasSpouse(A,C) ⇒ hasChild(C,B)
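The inference rule above can be applied to a toy fact set like this; the names and the `infer_has_child` helper are made up for illustration:

```python
# Applying hasChild(A,B) ∧ hasSpouse(A,C) ⇒ hasChild(C,B) to a toy KB.
# Facts are stored as (predicate, subject, object) triples.
facts = {
    ("hasChild", "Ann", "Ben"),
    ("hasSpouse", "Ann", "Carl"),
}

def infer_has_child(facts):
    """Return new hasChild facts implied by the rule, excluding known ones."""
    new = set()
    for (p1, a, b) in facts:
        if p1 != "hasChild":
            continue
        for (p2, a2, c) in facts:
            if p2 == "hasSpouse" and a2 == a:
                new.add(("hasChild", c, b))
    return new - facts

print(infer_has_child(facts))  # {('hasChild', 'Carl', 'Ben')}
```

A real rule engine would iterate such steps to a fixpoint and handle many rules at once; this shows a single application of one rule.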
13. RDF graph
RDF is a graph-based data model; a set of RDF triples constitutes an RDF graph:
nodes represent resources;
directed edges represent predicates.
14. RDF graph (Nodes)
There are three kinds of nodes (resources) in an RDF graph:
IRIs: global identifiers for resources such as people, organizations, and places;
Literals: basic values including strings, dates, numbers, etc.;
Blank nodes: resources without global identifiers.
15. Path and Cycle
Path
A path in an RDF KB G = (E, P, T) is a sequence of consecutive entities and predicates.
Cycle
A cycle in an RDF graph is a special path that starts and ends at the same node.
16. Predicate Path and Cycle
(PREDICATE PATH). A predicate path is a sequence of entity variables and predicates.
(PREDICATE CYCLE). A predicate cycle is a special predicate path that starts and ends at the same entity variable.
17. FPC (Frequent Predicate Cycle)
The interesting patterns represented by predicate cycles can be used to infer new facts in KBs, e.g.:
dbo:spouse(x1,x2) ∧ dbo:children(x2,x3) ⇒ dbo:children(x1,x3)
dbo:children(x1,x3) ∧ dbo:children(x2,x3) ⇒ dbo:spouse(x1,x2)
18. FPC (Frequent Predicate Cycle)
The number of instances of a predicate path (cycle) in the given RDF KB is called its support. If the support of a predicate path (cycle) is not less than a specified threshold, it is called a frequent predicate path (cycle).
Frequent predicate cycles (FPCs) are patterns that frequently appear in the KB, so rules generated from FPCs tend to be reliable.
RDF2Rules therefore first mines frequent predicate cycles from RDF KBs, and then generates inference rules from the FPCs.
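The support computation can be sketched for the simplest case, a length-2 predicate cycle; the fact set and predicate names are illustrative, and this is a simplification of the general mining RDF2Rules performs:

```python
# Support of a length-2 predicate cycle (p1, p2): the number of
# instantiations (x1, x2) such that p1(x1, x2) and p2(x2, x1) both hold.
# Facts are (predicate, subject, object) triples in a toy KB.
facts = {
    ("spouse", "Ann", "Carl"), ("spouse", "Carl", "Ann"),
    ("spouse", "Eve", "Tom"),  ("spouse", "Tom", "Eve"),
}

def cycle_support(facts, p1, p2):
    pairs = {(s, o) for (p, s, o) in facts if p == p1}  # all p1 edges
    # count p2 edges that close a cycle back to where a p1 edge started
    return sum(1 for (p, s, o) in facts if p == p2 and (o, s) in pairs)

print(cycle_support(facts, "spouse", "spouse"))
# 4: each reciprocal spouse pair yields one instance per direction
```

A cycle is frequent when this count meets the chosen support threshold; longer cycles are counted analogously by chaining more predicates.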
19. RDF Indexing for the Mining Algorithm
RDF2Rules uses an in-memory indexing structure to support the mining algorithm, instead of using existing RDF storage systems. The index supports three operations:
Given a predicate, find all the entity pairs it connects;
Given an entity, find all its incident edges and its neighbor entities;
Given a predicate path, find all of its instance paths.
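A minimal version of such an index, covering the first two lookups, might look like this; the class and method names are assumptions for illustration, not RDF2Rules' actual implementation:

```python
from collections import defaultdict

# An in-memory index over a set of (subject, predicate, object) triples.
class TripleIndex:
    def __init__(self, triples):
        self.by_predicate = defaultdict(set)  # predicate -> {(subject, object)}
        self.by_entity = defaultdict(set)     # entity -> {(predicate, neighbor, outgoing?)}
        for (s, p, o) in triples:
            self.by_predicate[p].add((s, o))
            self.by_entity[s].add((p, o, True))    # outgoing edge
            self.by_entity[o].add((p, s, False))   # incoming edge

    def pairs(self, predicate):
        """All entity pairs connected by `predicate`."""
        return self.by_predicate[predicate]

    def neighbors(self, entity):
        """All incident edges (predicate, neighbor, direction) of `entity`."""
        return self.by_entity[entity]

idx = TripleIndex({("Ann", "spouse", "Carl"), ("Ann", "child", "Ben")})
print(idx.pairs("spouse"))   # {('Ann', 'Carl')}
print(idx.neighbors("Ann"))  # outgoing "spouse" and "child" edges
```

Instance paths for a predicate path (the third lookup) can then be enumerated by chaining `neighbors` calls edge by edge.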
20. RDF2Rules
RDF2Rules can learn rules very quickly.
RDF2Rules always finds more rules than alternative approaches.
It achieves high prediction quality with fast running times.
Mining FPCs avoids generating duplicate rules.
21. Conclusion
RDF is a simple graph-based data model.
RDF has an extensible URI-based vocabulary.
Anyone can make statements about any resource (open-world assumption).
Rules are learned by finding frequent predicate cycles in RDF graphs, yielding high-quality predictions with fast running times.
Editor's Notes
“You’ll be able to find and take pieces of data sets from different places, aggregate them without warehousing, and analyze them in a more straightforward, powerful way than you can now.” (PwC, May 2009)