Schema-agnostic queries over large-schema databases: a distributional semanti... - Andre Freitas
This document provides an overview and summary of André Freitas' PhD thesis defense presentation on schema-agnostic queries over large-schema databases using distributional semantics. The presentation motivates the need for schema-agnostic queries due to the rise of very large and dynamic database schemas. It proposes using distributional semantics to provide an accurate, comprehensive, and low-maintenance approach to coping with semantic heterogeneity in schema-agnostic queries. The key aspects of the approach include semantic pivoting to reduce semantic complexity, distributional semantic models to enable semantic matching, and a hybrid distributional-relational semantic model called τ-Space to support the development of a schema-agnostic query mechanism.
Different Semantic Perspectives for Question Answering Systems - Andre Freitas
Question Answering systems define one of the most complex tasks in computational semantics. The intrinsic complexity of the QA task allows researchers of QA systems to investigate and explore different perspectives of semantics. However, this complexity also induces a bias towards a systems perspective, where researchers are alienated from a deeper reasoning on the semantic principles that are in place within the different components of the system. In this talk we will explore the semantic challenges, principles and perspectives behind the components of QA systems, aiming at providing a principled map and overview on the contribution of each component within the QA semantic interpretation goal.
Language Models for Information Retrieval - Dustin Smith
The document provides background information on Christopher Manning, Prabhakar Raghavan, and Hinrich Schütze, authors of the book "Introduction to Information Retrieval" and its chapter "Language models for information retrieval". It then outlines the presentation, which discusses language models for information retrieval, including query likelihood models, estimating query generation probabilities, and experiments comparing language modeling approaches to other IR techniques.
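The query likelihood model covered in those slides can be sketched briefly: a document is scored by the probability that its language model generates the query, with smoothing from the collection. This is a minimal illustration assuming Dirichlet smoothing; the function name and toy corpus are my own, not from the presentation.

```python
import math
from collections import Counter

def query_likelihood(query_terms, doc_terms, collection_terms, mu=2000):
    """Query-likelihood scoring with Dirichlet smoothing, in log space:
    log P(q|d) = sum_t log( (tf(t,d) + mu * P(t|C)) / (|d| + mu) )."""
    doc_tf = Counter(doc_terms)
    coll_tf = Counter(collection_terms)
    doc_len, coll_len = len(doc_terms), len(collection_terms)
    score = 0.0
    for t in query_terms:
        p_tc = coll_tf[t] / coll_len  # collection (background) language model
        if p_tc == 0:
            continue  # term unseen in the whole collection: skip it
        score += math.log((doc_tf[t] + mu * p_tc) / (doc_len + mu))
    return score
```

A document that actually contains a query term receives a higher smoothed log-probability than one that relies on the background model alone.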
Word Tagging with Foundational Ontology Classes - Andre Freitas
Semantic annotation is fundamental to dealing with large-scale lexical information, mapping the information to an enumerable set of categories over which rules and algorithms can be applied, and foundational ontology classes can be used as a formal set of categories for such tasks. A previous alignment between WordNet noun synsets and DOLCE provided a starting point for ontology-based annotation, but in NLP tasks verbs are also of substantial importance. This work presents an extension to the WordNet-DOLCE noun mapping, aligning verbs according to their links to nouns denoting perdurants, transferring to the verb the DOLCE class assigned to the noun that best represents that verb's occurrence. To evaluate the usefulness of this resource, we implemented a foundational ontology-based semantic annotation framework that assigns a high-level foundational category to each word or phrase in a text, and compared it to a similar annotation tool, obtaining an increase of 9.05% in accuracy.
Schema-Agnostic Queries (SAQ-2015): Semantic Web Challenge - Andre Freitas
The document describes the Schema-Agnostic Queries (SAQ-2015) challenge which aims to support querying over complex databases without needing to understand the underlying schema. It presents a test collection for evaluating systems on their ability to answer schema-agnostic queries over linked data. The test collection contains schema-agnostic queries mapped to equivalent SPARQL queries along with categorized mappings. An analysis finds the test collection expresses a variety of mapping types, data model mappings, compositional mappings, and query operations.
Semantics at Scale: A Distributional Approach - Andre Freitas
1) The document discusses using distributional semantics to build robust semantic models that can handle large amounts of data and enable semantic computing at scale.
2) It describes how distributional semantic models can be used to represent word meanings based on their linguistic contexts, allowing semantic knowledge bases to be automatically constructed from large text corpora.
3) The author proposes a schema-agnostic approach using distributional semantics to enable querying databases without prior knowledge of schemas, addressing problems of vocabulary and structural differences between queries and data.
The review and development work described in this report focuses on the aspects of semantic linking and annotation particularly relevant to ARIADNE. Semantic linking within ARIADNE is considered within the spatial, temporal and subject dimensions. The subject dimension is considered in depth, starting with a review of linking tools considered relevant to ARIADNE followed by a discussion of the ARIADNE approach and the vocabulary mapping tools used within ARIADNE. The Getty AAT proved an appropriate vocabulary mapping hub that afforded a multilingual search capability in the ARIADNE Portal via the semantic enrichment of partner subject metadata with derived AAT concepts.
A case study conducted an exploratory investigation of the semantic integration of extracts from archaeological datasets with information extracted via NLP across different languages. The investigation followed a broad theme relating to wooden material, including shipwrecks, with a focus on types of wooden material, samples taken, wooden objects with dating from dendrochronological analysis, etc. The Demonstrator is available for general use. The user is shielded from the complexity of the underlying semantic framework (based on the CIDOC CRM and Getty AAT) by the Web application user interface. The Demonstrator highlights the potential for archaeological research that can interrogate grey literature reports in conjunction with datasets. Queries concern wooden objects (e.g. samples of beech wood keels), optionally from a given date range, with automatic expansion over hierarchies of wood types.
Authors:
Douglas Tudhope (USW)
Ceri Binding (USW)
D15.3. Ver:1 (Final)
Tutorial - Introduction to Rule Technologies and Systems - Adrian Paschke
Tutorial at Semantic Web Applications and Tools for the Life Sciences (SWAT4LS 2014), 9-11 Dec., Berlin, Germany
http://www.swat4ls.org/workshops/berlin2014/
We use metadata of various kinds to improve and enrich text document clustering using an extension of Latent Dirichlet Allocation (LDA). The methods are fully implemented and evaluated, and the software is available on GitHub.
These are the slides of an invited talk I gave September 8 at the Alexandria Workshop of TPDL-2016: http://alexandria-project.eu/events/3rd-workshop/
Linked Open Data to support content based Recommender Systems - Vito Ostuni
This document proposes using Linked Open Data to support content-based recommender systems by providing rich item descriptions. It describes the main drawback of traditional content-based recommender systems as their limited ability to analyze content. A vector space model is then adapted to represent RDF graphs from Linked Open Data as vectors, allowing item similarity to be computed based on semantic relationships. An evaluation of this approach using MovieLens data demonstrates improvements in recommendation precision and recall.
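The adapted vector space model can be sketched roughly as follows: each item is represented by the (predicate, object) pairs of its RDF triples, and item similarity is the cosine between those sparse feature vectors. This is a simplified illustration under my own feature scheme; the actual model in the talk may weight features differently (e.g. per property).

```python
import math
from collections import Counter

def rdf_features(triples, item):
    """Represent an item by the multiset of its outgoing (predicate, object)
    pairs taken from RDF triples of the form (subject, predicate, object)."""
    return Counter((p, o) for s, p, o in triples if s == item)

def cosine(a, b):
    """Cosine similarity between two sparse feature vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Items that share semantic relationships in the graph (same genre, same director, etc.) end up with overlapping feature vectors and hence higher cosine similarity.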
- What are clustering, honeypots, and density-based clustering?
- What is OPTICS clustering, how is it different from density-based clustering, and how can it be used for outlier detection?
- What is so-called soft clustering, how is it different from hard clustering, and how can it be used for outlier detection?
The document summarizes the Named Entity Extraction and Linking (NEEL) challenge held at WWW2015. The NEEL challenge aimed to explore new approaches for recognizing and linking named entities in microposts (short social media posts). 21 teams participated in the challenge involving recognizing named entities and linking them to entries in the DBpedia knowledge base. The winning team, Ousia, achieved an overall score of 0.8067 by accurately recognizing and linking named entities in tweets.
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ... - Daniel Valcarce
Slides of the presentation given at the Doctoral Symposium of ACM RecSys 2015. The paper is entitled:
Daniel Valcarce: Exploring Statistical Language Models for Recommender Systems. RecSys 2015: 375-378
http://doi.acm.org/10.1145/2792838.2796547
This report discusses three submissions based on the Duet architecture to the Deep Learning track at TREC 2019. For the document retrieval task, we adapt the Duet model to ingest a "multiple field" view of documents—we refer to the new architecture as Duet with Multiple Fields (DuetMF). A second submission combines the DuetMF model with other neural and traditional relevance estimators in a learning-to-rank framework and achieves improved performance over the DuetMF baseline. For the passage retrieval task, we submit a single run based on an ensemble of eight Duet models.
Graph-to-Text Generation and its Applications to Dialogue - Jinho Choi
The document discusses graph-to-text generation and its applications to dialogue systems. It provides an overview of current approaches to graph-to-text including rule-based, statistical, sequence-to-sequence and graph-to-sequence models. Recent advances use pretrained language models and graph neural networks. While current systems show promise, they still struggle with omissions, repetitions and unnatural language. The document proposes two threads of future work: exploring graph-to-text on dialogue data and implementing model improvements.
The World Wide Web is moving from a Web of hyperlinked documents to a Web of linked data. Thanks to the Semantic Web technological stack and to the more recent Linked Open Data (LOD) initiative, a vast amount of RDF data have been published in freely accessible datasets connected with each other to form the so-called LOD cloud. As of today, we have tons of RDF data available in the Web of Data, but only a few applications really exploit their potential power. The availability of such data is certainly an opportunity to feed personalized information access tools such as recommender systems. We will show how to plug Linked Open Data into a recommendation engine in order to build a new generation of LOD-enabled applications.
(Lecture given @ the 11th Reasoning Web Summer School - Berlin - August 1, 2015)
Data Tactics Analytics Brown Bag (Aug 22, 2013) - Rich Heimann
This document provides an overview and agenda for a brown bag presentation on analytics services. The presentation includes introductions of the analytics team, discussions of why analytics are important both for business and practical reasons, and case studies of identifying smugglers and analyzing text data. The presentation emphasizes a philosophy of not being "data agnostic" and using modes of inquiry like induction and abduction rather than deduction.
WP3 Further specification of Functionality and Interoperability - Gradmann - Europeana
The document discusses issues and recommendations for Work Group 3.2 on semantic and multilingual aspects of the Europeana digital library. Key points include:
- Europeana surrogates need rich semantic context in areas like place, time, people and concepts.
- The types of links between surrogates and semantic nodes, as well as the semantic technologies used, need to be determined.
- Support for multiple European languages in areas like search queries, results and functionality is important but requires further scope definition and identification of language resources.
We extend RDF with the ability to represent property values that exist, but are unknown or partially known, using constraints. Following ideas from the incomplete information literature, we develop a semantics for this extension of RDF, called RDFi, and study SPARQL query evaluation in this framework.
Tensor Networks and Their Applications on Machine Learning - Kwan-yuet Ho
This document provides a biography and background of Kwan-Yuet "Stephen" Ho, a data scientist at Leidos, and then summarizes his presentation on tensor networks and their applications in machine learning. Ho has a PhD in physics from the University of Maryland and has worked as a research scientist and machine learning engineer. The presentation defines tensor networks as a mathematical tool from quantum many-body theory that can efficiently represent many-body wavefunctions. It discusses how tensor networks are useful for constructing machine learning algorithms and provides examples of their applications in supervised and unsupervised learning.
Machine Learning Methods for Analysing and Linking RDF Data - Jens Lehmann
Invited Talk at the 8th International Conference on Scalable Uncertainty Management (SUM)
The talk outlines applications of supervised structured machine learning and presents a specific refinement-operator-based approach for RDF/OWL. It also outlines how similar ideas can be used in other (formal) languages, in particular link specifications.
This document provides an overview and summary of Stephen Ho's talk on machine learning. It discusses his background in theoretical physics and transition to machine learning engineering. The talk outlines different machine learning algorithms like supervised, unsupervised, and reinforcement learning. It covers feature analysis techniques including embeddings. Finally, it discusses the importance of production monitoring for machine learning systems and what aspects should be monitored like code, models, data, runtime metrics, errors and environment.
A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words without being relevant. We investigate neural word embeddings as a source of evidence in document ranking. We train a word2vec embedding model on a large unlabelled query corpus, but in contrast to how the model is commonly used, we retain both the input and the output projections, allowing us to leverage both the embedding spaces to derive richer distributional relationships. During ranking we map the query words into the input space and the document words into the output space, and compute a query-document relevance score by aggregating the cosine similarities across all the query-document word pairs.
We postulate that the proposed Dual Embedding Space Model (DESM) captures evidence on whether a document is about a query term in addition to what is modelled by traditional term-frequency based approaches. Our experiments show that the DESM can re-rank top documents returned by a commercial Web search engine, like Bing, better than a term-matching based signal like TF-IDF. However, when ranking a larger set of candidate documents, we find the embeddings-based approach is prone to false positives, retrieving documents that are only loosely related to the query. We demonstrate that this problem can be solved effectively by ranking based on a linear mixture of the DESM and the word counting features.
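The DESM scoring step described above can be sketched compactly: query words are looked up in the IN (input) embedding space, document words in the OUT (output) space, and the relevance score aggregates cosine similarities over all query-document word pairs. The function name, averaging as the aggregation, and the toy vectors below are my own illustrative choices.

```python
import numpy as np

def desm_score(query_terms, doc_terms, in_emb, out_emb):
    """Average cosine similarity over all query-document word pairs, with
    query words mapped into the IN space and document words into the OUT space."""
    def unit(v):
        return v / np.linalg.norm(v)
    q_vecs = [unit(in_emb[t]) for t in query_terms if t in in_emb]
    d_vecs = [unit(out_emb[t]) for t in doc_terms if t in out_emb]
    if not q_vecs or not d_vecs:
        return 0.0
    sims = [float(q @ d) for q in q_vecs for d in d_vecs]
    return sum(sims) / len(sims)
```

Because IN and OUT spaces come from different projections of the same word2vec model, an IN-OUT pair can score high for topically related words even when the exact term never appears in the document, which is also why, as noted above, mixing this signal with term-counting features guards against loosely related false positives.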
5 Lessons Learned from Designing Neural Models for Information Retrieval - Bhaskar Mitra
Slides from my keynote talk at the Recherche d'Information SEmantique (RISE) workshop at CORIA-TALN 2018 conference in Rennes, France.
(Abstract)
Neural Information Retrieval (or neural IR) is the application of shallow or deep neural networks to IR tasks. Unlike classical IR models, these machine learning (ML) based approaches are data-hungry, requiring large-scale training data before they can be deployed. Traditional learning-to-rank models employ supervised ML techniques, including neural networks, over hand-crafted IR features. By contrast, more recently proposed neural models learn representations of language from raw text that can bridge the gap between the query and the document vocabulary.
Neural IR is an emerging field, and research publications in the area have been increasing in recent years. While the community explores new architectures and training regimes, a new set of challenges, opportunities, and design principles is emerging in the context of these new IR models. In this talk, I will share five lessons learned from my personal research in the area of neural IR. I will present a framework for discussing different unsupervised approaches to learning latent representations of text. I will cover several challenges to learning effective text representations for IR and discuss how latent space models should be combined with observed feature spaces for better retrieval performance. Finally, I will conclude with a few case studies that demonstrate the application of neural approaches to IR that go beyond text matching.
This document summarizes a paper on keyword-driven SPARQL query generation using background knowledge. It presents an approach that maps keywords to entities, ranks them, identifies relevant graph patterns, and generates SPARQL queries. The approach was evaluated on accuracy metrics such as recall, fuzzy precision and F-score. Results showed higher accuracy for queries finding instances' characteristics or associations compared to those finding similar instances. Future work to improve the approach is also discussed.
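As a small illustration of the final generation step in such a pipeline, the sketch below assembles a SPARQL query for one matched entity/property pair. The function name and the example URIs are illustrative assumptions, not taken from the paper; real systems compose much richer graph patterns from ranked candidates.

```python
def build_sparql(entity_uri, property_uri):
    """Assemble a basic SPARQL SELECT query for one matched
    (entity, property) pair resolved from the input keywords."""
    return (
        "SELECT ?value WHERE {\n"
        f"  <{entity_uri}> <{property_uri}> ?value .\n"
        "}"
    )
```

For example, keywords like "Berlin population" might resolve to a DBpedia entity and property and yield a one-triple query over that pair.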
The document discusses a neural model called Duet for ranking documents based on their relevance to a query. Duet uses both a local model that operates on exact term matches between queries and documents, and a distributed model that learns embeddings to match queries and documents in the embedding space. The two models are combined using a linear combination and trained jointly on labeled query-document pairs. Experimental results show Duet performs significantly better at document ranking and other IR tasks compared to using the local and distributed models individually. The amount of training data is also important, with larger datasets needed to learn better representations.
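The two-pathway structure of Duet can be sketched in a much-simplified, non-neural form: one score from exact term matches, one from an embedding-space match, combined linearly. In the real model both pathways are learned networks trained jointly and the combination weights are learned; the scoring functions, weights, and toy vectors here are illustrative only.

```python
import numpy as np

def local_score(query, doc):
    """Local pathway: evidence from exact term matches (here, simple term
    overlap normalized by query length; the real model learns over a binary
    query-document interaction matrix)."""
    q, d = set(query), set(doc)
    return len(q & d) / len(q) if q else 0.0

def distributed_score(query, doc, emb):
    """Distributed pathway: match in an embedding space (here, cosine
    similarity between mean word vectors)."""
    q_vecs = [emb[t] for t in query if t in emb]
    d_vecs = [emb[t] for t in doc if t in emb]
    if not q_vecs or not d_vecs:
        return 0.0
    q_bar, d_bar = np.mean(q_vecs, axis=0), np.mean(d_vecs, axis=0)
    return float(q_bar @ d_bar / (np.linalg.norm(q_bar) * np.linalg.norm(d_bar)))

def duet_score(query, doc, emb, w_local=0.5, w_dist=0.5):
    """Duet-style combination: relevance as a weighted sum of both pathways."""
    return w_local * local_score(query, doc) + w_dist * distributed_score(query, doc, emb)
```

The design rationale mirrors the report: exact matching handles rare and precise terms, the distributed pathway handles vocabulary mismatch, and the linear combination lets each compensate for the other's blind spots.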
Slides related to the papers:
- Fionda V., Gutierrez C., Pirrò G. "Knowledge Maps of Web Graphs". In the proceedings of the 14th International Conference on Principles of Knowledge Representation and Reasoning (KR2014).
- Fionda V., Pirrò G., Gutierrez C. "The Map Generator Tool". Demo session at the “13th International Semantic Web Conference (ISWC2014)”.
We use metadata of various kind to improve and enrich text document clustering using an extension of Latent Dirichlet Allocation (LDA). The methods are fully implemented, evaluated and software is available on github.
These are the slides of an invited talk I gave September 8 at the Alexandria Workshop of TPDL-2016: http://alexandria-project.eu/events/3rd-workshop/
Linked Open Data to support content based Recommender SystemsVito Ostuni
This document proposes using Linked Open Data to support content-based recommender systems by providing rich item descriptions. It describes the main drawback of traditional content-based recommender systems as their limited ability to analyze content. A vector space model is then adapted to represent RDF graphs from Linked Open Data as vectors, allowing item similarity to be computed based on semantic relationships. An evaluation of this approach using MovieLens data demonstrates improvements in recommendation precision and recall.
- What is Clustering, Honeypots and Density Based Clustering?
- What is Optics Clustering and how is it different than DB Clustering? …and how
can it be used for outlier detection.
- What is so-called soft clustering and how is it different than clustering? …and how
can it be used for outlier detection.
The document summarizes the Named Entity Extraction and Linking (NEEL) challenge held at WWW2015. The NEEL challenge aimed to explore new approaches for recognizing and linking named entities in microposts (short social media posts). 21 teams participated in the challenge involving recognizing named entities and linking them to entries in the DBpedia knowledge base. The winning team, Ousia, achieved an overall score of 0.8067 by accurately recognizing and linking named entities in tweets.
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Daniel Valcarce
Slides of the presentation given at the Doctoral Symposium of ACM RecSys 2015. The paper is entitled:
Daniel Valcarce: Exploring Statistical Language Models for Recommender Systems. RecSys 2015: 375-378
http://doi.acm.org/10.1145/2792838.2796547
This report discusses three submissions based on the Duet architecture to the Deep Learning track at TREC 2019. For the document retrieval task, we adapt the Duet model to ingest a "multiple field" view of documents—we refer to the new architecture as Duet with Multiple Fields (DuetMF). A second submission combines the DuetMF model with other neural and traditional relevance estimators in a learning-to-rank framework and achieves improved performance over the DuetMF baseline. For the passage retrieval task, we submit a single run based on an ensemble of eight Duet models.
Graph-to-Text Generation and its Applications to DialogueJinho Choi
The document discusses graph-to-text generation and its applications to dialogue systems. It provides an overview of current approaches to graph-to-text including rule-based, statistical, sequence-to-sequence and graph-to-sequence models. Recent advances use pretrained language models and graph neural networks. While current systems show promise, they still struggle with omissions, repetitions and unnatural language. The document proposes two threads of future work: exploring graph-to-text on dialogue data and implementing model improvements.
The World Wide Web is moving from a Web of hyper-linked documents to a Web of linked data. Thanks to the Semantic Web technological stack and to the more recent Linked Open Data (LOD) initiative, a vast amount of RDF data have been published in freely accessible datasets connected with each other to form the so called LOD cloud. As of today, we have tons of RDF data available in the Web of Data, but only a few applications really exploit their potential power. The availability of such data is for sure an opportunity to feed personalized information access tools such as recommender systems. We will show how to plug Linked Open Data in a recommendation engine in order to build a new generation of LOD-enabled applications.
(Lecture given @ the 11th Reasoning Web Summer School - Berlin - August 1, 2015)
Data Tactics Analytics Brown Bag (Aug 22, 2013)Rich Heimann
This document provides an overview and agenda for a brown bag presentation on analytics services. The presentation includes introductions of the analytics team, discussions of why analytics are important both for business and practical reasons, and case studies of identifying smugglers and analyzing text data. The presentation emphasizes a philosophy of not being "data agnostic" and using modes of inquiry like induction and abduction rather than deduction.
WP3 Further specification of Functionality and Interoperability - GradmannEuropeana
The document discusses issues and recommendations for Work Group 3.2 on semantic and multilingual aspects of the Europeana digital library. Key points include:
- Europeana surrogates need rich semantic context in areas like place, time, people and concepts.
- The types of links between surrogates and semantic nodes, as well as the semantic technologies used, need to be determined.
- Support for multiple European languages in areas like search queries, results and functionality is important but requires further scope definition and identification of language resources.
We extend RDF with the ability to represent property values that exist, but are unknown or partially known, using constraints. Following ideas from the incomplete information literature, we develop a semantics for this extension of RDF, called RDFi, and study SPARQL query evaluation in this framework.
Tensor Networks and Their Applications on Machine LearningKwan-yuet Ho
This document provides a biography and background of Kwan-Yuet "Stephen" Ho, a data scientist at Leidos, and then summarizes his presentation on tensor networks and their applications in machine learning. Ho has a PhD in physics from the University of Maryland and has worked as a research scientist and machine learning engineer. The presentation defines tensor networks as a mathematical tool from quantum many-body theory that can efficiently represent many-body wavefunctions. It discusses how tensor networks are useful for constructing machine learning algorithms and provides examples of their applications in supervised and unsupervised learning.
Machine Learning Methods for Analysing and Linking RDF DataJens Lehmann
Invited Talk at the 8th International Conference on Scalable Uncertainty Management (SUM)
The talk outlines applications of supervised structured machine learning and presents a specific refinement operator based approach for RDF/OWL. It also outlines how similar ideas can be used in other (formal) languages, in particular link specifications.
This document provides an overview and summary of Stephen Ho's talk on machine learning. It discusses his background in theoretical physics and transition to machine learning engineering. The talk outlines different machine learning algorithms like supervised, unsupervised, and reinforcement learning. It covers feature analysis techniques including embeddings. Finally, it discusses the importance of production monitoring for machine learning systems and what aspects should be monitored like code, models, data, runtime metrics, errors and environment.
A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words without being relevant. We investigate neural word embeddings as a source of evidence in document ranking. We train a word2vec embedding model on a large unlabelled query corpus, but in contrast to how the model is commonly used, we retain both the input and the output projections, allowing us to leverage both the embedding spaces to derive richer distributional relationships. During ranking we map the query words into the input space and the document words into the output space, and compute a query-document relevance score by aggregating the cosine similarities across all the query-document word pairs.
We postulate that the proposed Dual Embedding Space Model (DESM) captures evidence on whether a document is about a query term in addition to what is modelled by traditional term-frequency based approaches. Our experiments show that the DESM can re-rank top documents returned by a commercial Web search engine, like Bing, better than a term-matching based signal like TF-IDF. However, when ranking a larger set of candidate documents, we find the embeddings-based approach is prone to false positives, retrieving documents that are only loosely related to the query. We demonstrate that this problem can be solved effectively by ranking based on a linear mixture of the DESM and the word counting features.
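A toy sketch of the dual-embedding scoring idea described above, assuming small random stand-in matrices for the trained word2vec input (IN) and output (OUT) projections; the vocabulary, dimensionality, and words are illustrative, not the paper's setup:

```python
import numpy as np

# DESM-style scoring sketch: query words live in the IN space, document
# words in the OUT space, and relevance is the average cosine similarity
# over all query-document word pairs, as the abstract describes.
rng = np.random.default_rng(0)
vocab = {w: i for i, w in enumerate(["cambridge", "university", "river", "oxford"])}
dim = 8
IN = rng.normal(size=(len(vocab), dim))   # stand-in input projection
OUT = rng.normal(size=(len(vocab), dim))  # stand-in output projection

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def desm_score(query_words, doc_words):
    """Average cosine over all query-document word pairs (IN vs OUT space)."""
    sims = [cos(IN[vocab[q]], OUT[vocab[d]])
            for q in query_words for d in doc_words]
    return sum(sims) / len(sims)

score = desm_score(["cambridge"], ["university", "river"])
```

In the paper's setting the projections come from a model trained on a large query corpus; here random matrices only demonstrate the mechanics of the pairing.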
5 Lessons Learned from Designing Neural Models for Information Retrieval - Bhaskar Mitra
Slides from my keynote talk at the Recherche d'Information SEmantique (RISE) workshop at CORIA-TALN 2018 conference in Rennes, France.
(Abstract)
Neural Information Retrieval (or neural IR) is the application of shallow or deep neural networks to IR tasks. Unlike classical IR models, these machine learning (ML) based approaches are data-hungry, requiring large scale training data before they can be deployed. Traditional learning to rank models employ supervised ML techniques—including neural networks—over hand-crafted IR features. By contrast, more recently proposed neural models learn representations of language from raw text that can bridge the gap between the query and the document vocabulary.
Neural IR is an emerging field, and research publications in the area have been increasing in recent years. While the community explores new architectures and training regimes, a new set of challenges, opportunities, and design principles is emerging in the context of these new IR models. In this talk, I will share five lessons learned from my personal research in the area of neural IR. I will present a framework for discussing different unsupervised approaches to learning latent representations of text. I will cover several challenges to learning effective text representations for IR and discuss how latent space models should be combined with observed feature spaces for better retrieval performance. Finally, I will conclude with a few case studies that demonstrate the application of neural approaches to IR that go beyond text matching.
This document summarizes a paper on keyword-driven SPARQL query generation using background knowledge. It presents an approach that maps keywords to entities, ranks them, identifies relevant graph patterns, and generates SPARQL queries. The approach was evaluated on accuracy metrics such as recall, fuzzy precision and F-score. Results showed higher accuracy for queries finding instances' characteristics or associations compared to those finding similar instances. Future work to improve the approach is also discussed.
The document discusses a neural model called Duet for ranking documents based on their relevance to a query. Duet uses both a local model that operates on exact term matches between queries and documents, and a distributed model that learns embeddings to match queries and documents in the embedding space. The two models are combined using a linear combination and trained jointly on labeled query-document pairs. Experimental results show Duet performs significantly better at document ranking and other IR tasks compared to using the local and distributed models individually. The amount of training data is also important, with larger datasets needed to learn better representations.
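The local/distributed mixture at the heart of Duet can be illustrated with a minimal sketch; the embeddings, the two scoring functions, and the mixing weight below are simplified stand-ins for the jointly trained sub-networks described in the document:

```python
import numpy as np

# Duet-style ranking sketch: a local score from exact term matches plus a
# distributed score from an embedding space, combined linearly. The word
# vectors are random stand-ins and alpha is an assumed mixing weight; the
# real model learns both sub-networks end-to-end from labelled pairs.
rng = np.random.default_rng(1)
emb = {w: rng.normal(size=4) for w in ["deep", "learning", "neural", "nets"]}

def local_score(query, doc):
    # Fraction of query terms appearing verbatim in the document.
    return sum(q in doc for q in query) / len(query)

def distributed_score(query, doc):
    # Cosine similarity between mean query and mean document embeddings.
    qv = np.mean([emb[w] for w in query], axis=0)
    dv = np.mean([emb[w] for w in doc], axis=0)
    return float(qv @ dv / (np.linalg.norm(qv) * np.linalg.norm(dv)))

def duet_score(query, doc, alpha=0.5):
    return alpha * local_score(query, doc) + (1 - alpha) * distributed_score(query, doc)
```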
Slides related to the papers:
- Fionda V., Gutierrez C., Pirrò G. "Knowledge Maps of Web Graphs". In the proceedings of the “14th International Conference on Principles of Knowledge Representation and Reasoning (KR2014)”.
- Fionda V., Pirrò G., Gutierrez C. "The Map Generator Tool". Demo session at the “13th International Semantic Web Conference (ISWC2014)”.
- Ancestral Causal Inference (ACI) is a new causal discovery method that formulates causal inference as an optimization problem over ancestral structures rather than direct causal relations.
- This reduces the search space drastically, making ACI much more scalable than existing methods while maintaining accuracy.
- ACI also introduces a method for scoring the confidence in predicted causal statements based on the optimization problem.
This document describes a new method called Ancestral Causal Inference (ACI) for learning causal relationships from data. ACI uses a more coarse-grained representation called an ancestral structure, which represents indirect rather than direct causal relations. This reduces the search space significantly compared to existing methods. ACI formulates causal inference as an optimization problem to find the ancestral structure that minimizes loss on statistical (in)dependencies provided as input. Rules are defined for reasoning about (in)dependencies under ancestral structures. Evaluation on simulated and real biological data shows ACI is as accurate as state-of-the-art methods but much more scalable to large problem sizes.
NIPS2010: optimization algorithms in machine learning - zukun
The document summarizes optimization algorithms for machine learning applications. It discusses first-order methods like gradient descent, accelerated methods like Nesterov's algorithm, and non-monotone methods like Barzilai-Borwein. Gradient descent converges at a rate of 1/k, while methods like heavy-ball, conjugate gradient, and Nesterov's algorithm can achieve faster linear or 1/k^2 convergence rates depending on the problem structure. The document provides convergence analysis and rate results for various first-order optimization algorithms applied to machine learning problems.
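The contrast between the 1/k rate of gradient descent and the 1/k^2 rate of Nesterov's method can be seen on a small ill-conditioned quadratic; the matrix, the standard 1/L step size, and the iteration budget below are illustrative choices:

```python
import numpy as np

# Compare plain gradient descent with Nesterov's accelerated method on
# f(x) = 0.5 * x^T A x, where A is diagonal with condition number 100.
# L is the largest eigenvalue of A, so 1/L is the usual safe step size.
A = np.diag([1.0, 100.0])
L = 100.0
grad = lambda x: A @ x
f = lambda x: 0.5 * x @ A @ x
x0 = np.array([1.0, 1.0])

def gradient_descent(steps):
    x = x0.copy()
    for _ in range(steps):
        x = x - (1 / L) * grad(x)
    return f(x)

def nesterov(steps):
    # Standard Nesterov/FISTA momentum with the t-sequence.
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(steps):
        x_new = y - (1 / L) * grad(y)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + ((t - 1) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return f(x)
```

With the same budget of iterations, the accelerated method reaches a lower objective value, matching the rate comparison in the summary.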
Alexis Ohanian talks about the early days of reddit at Mass Challenge - Alexis Ohanian
The document discusses the early days of Reddit and provides advice for starting a new business or project. It mentions that co-founders are important, to look for opportunities to do something awesome, to respond quickly to calls and emails as a human, and that the element of surprise is important. It also notes that early adopters are key, but that initially no one or just your mom wants to use the website, so you need evangelists. It advises to make something people want and love, give them a reason to talk about it, know competitors but don't care, and to launch, improve and repeat.
Talk: Joint causal inference on observational and experimental data - NIPS 20... - Sara Magliacane
This document discusses methods for performing joint causal inference using both observational and experimental datasets. It proposes modeling the datasets within a single causal graph framework using additional dummy variables to represent experimental conditions. This allows applying constraint-based causal discovery methods, but requires modifications to handle violations of the causal faithfulness assumption that can occur due to deterministic relationships between variables. A strategy is presented that rephrases the constraint-based approach in terms of d-separations and derives sound conditional independence relationships to partially address these violations. The goal is to reconstruct the causal graph representing all datasets from independence test results.
The document summarizes a study that surveyed 130 newly admitted undergraduate teacher education students about their views on parent involvement in education. The survey aimed to understand students' memories of their own families' school involvement and how they conceptualize the roles of parents and teachers. It found that students viewed parent knowledge as long-term and individual while teacher knowledge was seen as professional and unbiased. Students anticipated doing more school-based parent involvement like conferences rather than community activities. The authors advocate giving greater attention to families in teacher education programs.
HR / Talent Analytics orientation given as a guest lecture at Management Institute for Leadership and Excellence (MILE), Pune. This presentation covers aspects like:
1. Core concepts, terminologies & buzzwords
- Business Intelligence, Analytics
- Big Data, Cloud, SaaS
2. Analytics
- Types, Domains, Tools…
3. HR Analytics
- Why? What is measured?
- How? Predictive possibilities…
4. Case studies
5. HR Analytics org structure & delivery model
This document discusses clustering of RDF data across the Semantic Web. It begins by describing the Linking Open Data project and the growing amount of RDF data available. It then discusses the motivations for clustering RDF data, such as improving data access and query response times over distributed machines. Current approaches to RDF clustering are also summarized, including extracting instance subgraphs and computing distances between instances. The document outlines different techniques for instance extraction and distance computation in RDF clustering.
Over the last years, the Semantic Web has been growing steadily. Today, we count more than 10,000 datasets made available online following Semantic Web standards. Nevertheless, many applications, such as data integration, search, and interlinking, may not take full advantage of the data without a priori statistical information about its internal structure and coverage. In fact, there are already a number of tools that offer such statistics, providing basic information about RDF datasets and vocabularies. However, those usually show severe deficiencies in terms of performance once the dataset size grows beyond the capabilities of a single machine. In this paper, we introduce a software component for statistical calculations of large RDF datasets, which scales out to clusters of machines. More specifically, we describe the first distributed in-memory approach for computing 32 different statistical criteria for RDF datasets using Apache Spark. The preliminary results show that our distributed approach improves upon a previous centralized approach we compare against and provides approximately linear horizontal scale-up. The criteria are extensible beyond the 32 defaults, and the component is integrated into the larger SANSA framework and employed in at least four major usage scenarios beyond the SANSA community.
The advent of social networks has changed research in computer science. Massive volumes of data are now produced in the form of Twitter, Facebook, emails, and IoT (Internet of Things) streams, so storing and analysing these data has become a great challenge for researchers, and traditional frameworks have failed to process such large data. R is an open source programming framework developed for the analysis of large data that yields good accuracy. In this paper, we study the use of R for the classification of large social network data. The Naïve Bayes algorithm is used for the classification of large Twitter data. The experiments show that enormous amounts of data can be classified efficiently using the R framework, with promising results.
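As a rough illustration of the classification step (sketched here in Python rather than R, with made-up two-class training snippets standing in for the Twitter data), a from-scratch multinomial Naïve Bayes classifier looks like this:

```python
from collections import Counter
import math

# Multinomial Naive Bayes with Laplace smoothing. The tiny labelled
# "tweets" below are illustrative stand-ins for the paper's Twitter data.
train = [
    ("great game tonight", "sport"),
    ("the team won the match", "sport"),
    ("new phone release today", "tech"),
    ("software update for the phone", "tech"),
]

class_counts = Counter(label for _, label in train)
word_counts = {c: Counter() for c in class_counts}
for text, label in train:
    word_counts[label].update(text.split())
vocab = {w for text, _ in train for w in text.split()}

def predict(text):
    """argmax over classes of log P(c) + sum of log P(w|c), Laplace-smoothed."""
    best, best_lp = None, -math.inf
    for c in class_counts:
        lp = math.log(class_counts[c] / len(train))
        total = sum(word_counts[c].values())
        for w in text.split():
            lp += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = c, lp
    return best
```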
Developing Linked Data and Semantic Web-based Applications (Expotec 2015) - Ig Bittencourt
The document discusses developing Linked Data and Semantic Web applications. It begins with key concepts related to Linked Data, the Semantic Web, and applications. It then describes two key steps in developing such applications: publishing data as Linked Data and consuming Linked Data to build applications. Examples are provided of extracting, enriching, and linking different datasets to build a real estate recommendation application that performs semantic searches over the integrated data. Ontologies are created and reused to represent the domains and support interoperability. The document emphasizes integrating the data and software engineering perspectives in developing Semantic Web applications.
Information access over linked data requires determining the subgraph(s) of linked data's underlying graph that correspond to the required information need. Usually, an information access framework can retrieve richer information by checking a large number of possible subgraphs. However, checking a large number of possible subgraphs increases information access complexity, which makes information access frameworks less effective. Many contemporary linked data information access frameworks reduce the complexity by introducing different heuristics, but then suffer in retrieving richer information; others do not address the complexity at all. A practically usable framework, however, should retrieve richer information with lower complexity. We hypothesize that pre-processed statistics of linked data can be used to efficiently check a large number of possible subgraphs, helping to retrieve comparatively richer information with lower data access complexity. Preliminary evaluation of our proposed hypothesis shows promising performance.
An increasing amount of valuable semi-structured data has become available online. In this talk, we survey the state of the art in entity ranking over structured data ("linked data").
We present a solution to learn URI selection criteria in order to improve the crawling of Linked Open Data by predicting their RDF-relevance. The prediction component is able to predict whether a newly discovered URI contains RDF content or not by extracting features from several sources and building a prediction model based on FTRL-proximal online learning algorithm. The experimental results demonstrate that the coverage of the crawl is improved compared to baseline methods.
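A compact sketch of the per-coordinate FTRL-proximal learner that such a prediction component builds on; the bag-of-token URI features and the hyperparameters below are illustrative assumptions, not the paper's configuration:

```python
import math

# Per-coordinate FTRL-proximal logistic regression (McMahan et al. style):
# z accumulates adjusted gradients, n accumulates squared gradients, and
# the L1 term keeps weights of rarely useful features at exactly zero.
class FTRLProximal:
    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z, self.n = {}, {}

    def _weight(self, i):
        z = self.z.get(i, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # sparsity: small accumulated evidence -> zero weight
        n = self.n.get(i, 0.0)
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def predict(self, x):
        s = sum(self._weight(i) for i in x)
        return 1.0 / (1.0 + math.exp(-max(min(s, 35), -35)))

    def update(self, x, y):
        p = self.predict(x)
        g = p - y  # gradient of the log-loss w.r.t. the linear score
        for i in x:
            n, w = self.n.get(i, 0.0), self._weight(i)
            sigma = (math.sqrt(n + g * g) - math.sqrt(n)) / self.alpha
            self.z[i] = self.z.get(i, 0.0) + g - sigma * w
            self.n[i] = n + g * g

model = FTRLProximal()
for _ in range(200):
    model.update({"ext:rdf", "path:ontology"}, 1)  # toy RDF-bearing URI features
    model.update({"ext:jpg", "path:images"}, 0)    # toy non-RDF URI features
```

After a few hundred online updates the model separates the two toy feature patterns, mirroring how the crawler's component scores newly discovered URIs for RDF-relevance.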
Transient and persistent RDF views over relational databases in the context o... - Nikolaos Konstantinou
As far as digital repositories are concerned, numerous benefits emerge from the disposal of their contents as Linked Open Data (LOD). This leads more and more repositories towards this direction. However, several factors need to be taken into account in doing so, among which is whether the transition needs to be materialized in real-time or in asynchronous time intervals. In this paper we provide the problem framework in the context of digital repositories, we discuss the benefits and drawbacks of both approaches and draw our conclusions after evaluating a set of performance measurements. Overall, we argue that in contexts with infrequent data updates, as is the case with digital repositories, persistent RDF views are more efficient than real-time SPARQL-to-SQL rewriting systems in terms of query response times, especially when expensive SQL queries are involved.
Make our Scientific Datasets Accessible and Interoperable on the Web - Franck Michel
The presentation investigates the challenges that we must face to share scientific datasets on the Web following the Linked Open Data principles. We present the standards of the Semantic Web and investigate how they can help address those challenges. We give tips as to how to choose vocabularies to describe data and metadata, link datasets to other related datasets by making appropriate alignments, translate existing data sources to RDF and publish it on the Web as linked data.
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of... - FedorNikolaev
In this work, we propose a novel retrieval model that incorporates term dependencies into structured document retrieval and apply it to the task of ERWD. In the proposed model, the document field weights and the relative importance of unigrams and bigrams are optimized with respect to the target retrieval metric using a learning-to-rank method.
This document outlines the goals and tangible outcomes of the LinkedUp project. The project aims to demonstrate success stories of open web data applications, establish an evaluation framework for such applications, and facilitate technology transfer in the education sector. It will run a multi-stage challenge and establish a large-scale data testbed. The challenge aims to produce highly innovative and evaluated applications using web-scale data to address educational scenarios. The project expects to increase collaboration, disseminate best practices, and raise awareness of open web data through technology transfer activities.
Efficient Query Answering against Dynamic RDF Databases - Alexandra Roatiș
The document describes efficient query answering against dynamic RDF databases. It discusses RDF as a graph-based data model and standard, blank nodes, RDF Schema (RDFS) for semantic constraints, the open-world assumption and RDF entailment through implicit triples and saturation. It also covers basic graph pattern (BGP) queries in SPARQL and the need to decouple RDF entailment from query evaluation through data saturation or query reformulation to obtain complete query answers.
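The saturation strategy mentioned above can be sketched with two RDFS rules (subclass transitivity and type inheritance) applied to a fixpoint, so that BGP queries can then be evaluated directly on the saturated graph; the tiny schema and instance triples are made-up examples:

```python
# Toy RDF saturation: repeatedly add implicit triples entailed by two RDFS
# rules until nothing new is derived (a fixpoint).
RDF_TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

triples = {
    ("ex:Novel", SUBCLASS, "ex:Book"),
    ("ex:Book", SUBCLASS, "ex:Document"),
    ("ex:ATimeToKill", RDF_TYPE, "ex:Novel"),
}

def saturate(g):
    g = set(g)
    while True:
        new = set()
        for (s, p, o) in g:
            for (s2, p2, o2) in g:
                # rdfs11: subClassOf is transitive
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # rdfs9: instances inherit types along subClassOf
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))
        if new <= g:
            return g
        g |= new
```

Query reformulation, the alternative the document mentions, would instead leave the data untouched and rewrite the BGP query to account for the same entailments.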
On the Web, the amount of structured and Linked Data about entities is constantly growing. Descriptions of single entities often include thousands of statements and it becomes difficult to comprehend the data, unless a selection of the most relevant facts is provided. This doctoral thesis addresses the problem of Linked Data entity summarization. The contributions involve two entity summarization approaches, a common API for entity summarization, and an approach for entity data fusion.
What Factors Influence the Design of a Linked Data Generation Algorithm? - andimou
Generating Linked Data remains a complicated and intensive engineering process. While different factors determine how a Linked Data generation algorithm is designed, potential alternatives for each factor are currently not considered when designing the tools’ underlying algorithms. Certain design patterns are frequently applied across different tools, covering certain alternatives of a few of these factors, whereas other alternatives are never explored. Consequently, there are no adequate tools for Linked Data generation for certain occasions, or tools with inadequate and inefficient algorithms are chosen. In this position paper, we determine such factors, based on our experiences, and present a preliminary list. These factors could be considered when a Linked Data generation algorithm is designed or a tool is chosen. We investigated which factors are covered by widely known Linked Data generation tools and concluded that only certain design patterns are frequently encountered. By these means, we aim to point out that Linked Data generation is above and beyond bare implementations, and algorithms need to be thoroughly and systematically studied and exploited.
From Exploratory Search to Web Search and back - PIKM 2010 - Roku
The power of search is without doubt one of the main reasons for the success of the Web. Currently available Web search engines return results with high precision. Nevertheless, if we limit our attention to lookup search we miss another important search task: in exploratory search, the user wants not only to find documents relevant to her query but is also interested in learning, discovering, and understanding novel knowledge on complex and sometimes unknown topics.
In the paper we address this issue by presenting LED, a web based system that aims to improve (lookup) Web search by enabling users to properly explore knowledge associated with her query. We rely on DBpedia to explore the semantics of keywords within the query, thus suggesting potentially interesting related topics/keywords to the user.
This document discusses Timbuctoo, an application designed for academic research that allows for complex and heterogeneous data. It explores archiving RDF datasets from Timbuctoo instances, including handling RDF graphs and triples, versioning datasets, and verifying dataset integrity and resolving links. A potential pipeline is proposed to ingest datasets from Timbuctoo into the EASY archive, but current Timbuctoo instances and datasets have obscure URIs and insufficient metadata, and the prototype pipeline lacks specifications. Archiving linked data from Timbuctoo could change the nature of preservation for archives.
Framester: A Wide Coverage Linguistic Linked Data Hub - Mehwish Alam
Framester is a linguistic linked data hub that aims to improve coverage of FrameNet by extending mappings between FrameNet and other resources like WordNet and BabelNet. Framester represents over 40 million triples linking linguistic and factual resources and aligning frames, roles, and types to foundational ontologies. It provides a word frame disambiguation service and was evaluated on annotated corpora, showing improved performance over previous approaches.
Trust Models for RDF Data: Semantics and Complexity - AAAI2015
1. Trust Models for RDF Data: Semantics and Complexity
AAAI 2015
Valeria Fionda, Gianluigi Greco
Department of Mathematics and Computer Science, University of Calabria
Trust Models for RDF Data 1 / 28
2. Outline
1 Introduction
2 Trust Framework
3 Complexity
4 Conclusions
4. Introduction
The Resource Description Framework (RDF)
5. Introduction
The Resource Description Framework
Foundational aspects [Gutierrez et al. 2011]:
- Formalization of the model
- Systematic study of the complexity of entailment
- Formalization of the concept of minimal representation (core) of an RDF graph
- Study of the complexity of core computation
6. Introduction
Extensions of RDF
- Time [Gutierrez, Hurtado, and Vaisman 2007]
- Provenance [Dividino et al. 2009]
- Fuzzy [Straccia 2009]
- Trust [Hartig 2009; Tomaszuk, Pak, and Rybinski 2013]
- General annotations [Udrea, Recupero, and Subrahmanian 2010; Zimmermann et al. 2012]
8. Introduction
Why is trust important?
Due to the openness and decentralization of the Semantic Web, the presence of incorrect and unreliable RDF data can negatively affect decision processes and cause economic damage.
By associating trust values with RDF data, some of these issues can be mitigated.
Having trust values alone is not enough; making reasoning problems tractable when dealing with the large volume of data available on the Web is essential.
14. Trust Framework
Trustworthiness
The trustworthiness of a triple t is a value indicating to what extent t is believed or disbelieved to be true.
15. Trust Framework
A trust-enriched RDF graph (short: t-graph) is a pair ⟨G, w⟩ where G is an RDF graph and w is a real-valued trust function such that:
- dom(w) is a set of RDF triples;
- for each t ∈ dom(w), −1 ≤ w(t) ≤ 1 holds.
The symbol φ, associated with any triple t outside dom(w), denotes that the trust value of t is unknown.
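The t-graph definition can be encoded in a few lines; the triples reuse the running example from the slides, while the representation (tuples for triples, a dict for w, None for φ) is an illustrative assumption:

```python
# A trust-enriched RDF graph (t-graph): an RDF graph G plus a partial
# trust function w into [-1, 1]; PHI stands for the unknown trust value
# of triples outside dom(w).
PHI = None  # the symbol φ: trust value unknown

G = {
    ("A_Time_To_Kill", "author", "John_Grisham"),
    ("A_Time_To_Kill", "genre", "Legal_thriller"),
}

w = {("A_Time_To_Kill", "author", "John_Grisham"): 0.8}

def trust(t, w):
    """Trust of triple t: w(t) if t is in dom(w), otherwise φ."""
    v = w.get(t, PHI)
    assert v is PHI or -1 <= v <= 1, "trust values must lie in [-1, 1]"
    return v
```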
16. Trust Framework
Trust Function - example
w((A Time To Kill, author, John Grisham)) = 0.8
17. Trust Framework
dom(w) = G \ {(A Time To Kill, genre, Legal thriller)}
18. Trust Framework
Trust Aggregation Functions
(Slides 18-20 introduce trust aggregation functions in figures.)
21. Trust Framework
φ is a neutral element with respect to f, i.e., f(S) = f(S ∪ {φ})
22. Trust Framework
Trust Aggregation Function - example
[Figure: t-graph G1 over John_Grisham, A_Time_To_Kill, Untitled, and Legal_thriller, with author and genre edges carrying trust values 0.8, 0.7, and -0.2]
Minimum trust aggregation function: fmin({w1(t) | t ∈ G1}) = min{w1(t) | t ∈ G1} = -0.2
Maximum trust aggregation function: fmax({w1(t) | t ∈ G1}) = max{w1(t) | t ∈ G1} = 0.8
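The two aggregation functions of the example can be sketched directly, treating φ (None here, an assumed encoding) as a neutral element so that f(S) = f(S ∪ {φ}):

```python
# Minimum and maximum trust aggregation over the example's trust values.
# φ is ignored by both functions, making it neutral: f(S) = f(S ∪ {φ}).
PHI = None

def f_min(trust_values):
    known = [v for v in trust_values if v is not PHI]
    return min(known) if known else PHI

def f_max(trust_values):
    known = [v for v in trust_values if v is not PHI]
    return max(known) if known else PHI

values = [0.8, 0.7, -0.2, PHI]  # trust values from the example t-graph
```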
23. Trust Framework
RDF - Simple Interpretation
(Slides 23-25 present simple interpretations of RDF in figures.)
29. Trust Framework
f-models
Given a trust aggregation function f, a t-interpretation is an f-model if the value assigned via σ̄ to each interpreted triple is the result of the function f on the trust values of the RDF triples mapped into it.
33. Complexity
Trust Aggregation Operators
Model Checking Complexity:
         General         Acyclic
RDF      NP-complete     in P
t-RDF    NP-complete     NP-complete
34. Complexity
We focus on trust aggregation functions (f⊕) that can be built on top of binary trust aggregation operators (⊕):
A trust (aggregation) operator ⊕ is the binary operator of an Idempotent Commutative Ordered Monoid ([−1, 1] ∪ {φ}, ⊕, ⪯), where the partial order ⪯ is defined as v ⪯ v' if, and only if, v ⊕ v' = v.
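Taking ⊕ = min as one concrete operator, the monoid requirements can be sanity-checked in code (encoding φ as None is an assumption, as is the choice of min):

```python
# Check that min on [-1, 1] ∪ {φ} behaves as a trust operator ⊕ of an
# idempotent commutative monoid with φ as identity, and that the induced
# order is v ⪯ v' iff v ⊕ v' = v. Sample values are arbitrary.
PHI = None

def op(a, b):  # ⊕ = minimum, with φ as the neutral element
    if a is PHI:
        return b
    if b is PHI:
        return a
    return min(a, b)

def leq(a, b):  # induced partial order: v ⪯ v' iff v ⊕ v' = v
    return op(a, b) == a

sample = [-1.0, -0.2, 0.0, 0.7, 1.0, PHI]
assert all(op(v, v) == v for v in sample)                         # idempotent
assert all(op(a, b) == op(b, a) for a in sample for b in sample)  # commutative
assert all(op(v, PHI) == v for v in sample)                       # φ is neutral
```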
35. Complexity
Concept             General         Bounded treewidth
RDF
  Model Checking    NP-complete     in P
  Entailment        NP-complete     in P
  Core              coNP-complete   in P
t-RDF
  Model Checking    NP-complete     in P
  ⊕-Entailment      NP-complete     in P
  ⊕-Core            coNP-complete   in P
37. Conclusions
Associating trust values with RDF data can prevent the use of inaccurate data.
Having trust values alone is not enough. Indeed, since RDF is the backbone of the Semantic Web, making reasoning problems tractable when dealing with such a large volume of data is essential.
We defined a formal framework (and a prototype system) for reasoning about trust values, singling out islands of tractability for the most basic problems on classes of acyclic and nearly-acyclic graphs.
38. Download our prototype at http://trdfreasoner.wordpress.com
THANK YOU
39. Reasoning Problems
⊕-Entailment: Let ⟨G1, w1⟩ and ⟨G2, w2⟩ be two t-graphs. Then, ⟨G1, w1⟩ ⊕-entails ⟨G2, w2⟩ (denoted by ⟨G1, w1⟩ |= ⟨G2, w2⟩, if ⊕ is understood) if every f⊕-model of ⟨G1, w1⟩ is also an f⊕-model of ⟨G2, w2⟩.
⊕-Core: Let ⟨G, w⟩ be a t-graph. A ⊕-core of ⟨G, w⟩ is a t-graph ⟨G', w'⟩ with G' ⊆ G and such that: (i) ⟨G, w⟩ |= ⟨G', w'⟩; and (ii) ⟨G, w⟩ ⊭ ⟨G'', w''⟩ holds for every G'' ⊂ G'.