This document discusses algorithms for transforming queries between different query languages, focusing on transformations between Prolog, SPARQL, and λ-DCS queries. It provides background on these query languages and explains why query transformations are useful, for example for linking natural language to database queries. It then describes two algorithms: one for transforming Prolog queries to SPARQL, and one for transforming SPARQL queries to λ-DCS. The algorithms are tested on a small geography database, and the results are analyzed to evaluate their performance and limitations.
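As a toy illustration of the kind of mapping such a transformation performs, a single binary Prolog-style goal can be rewritten as a SPARQL triple pattern. This is a minimal sketch with hypothetical names; the document's actual algorithm handles far richer query structure.

```python
def prolog_goal_to_sparql(functor, args):
    """Map a binary Prolog goal, e.g. capital(S, C), to a SPARQL query
    whose triple pattern uses the functor as the predicate."""
    variables = ["?" + a for a in args]
    pattern = f"{variables[0]} :{functor} {variables[1]} ."
    return f"SELECT {' '.join(variables)} WHERE {{ {pattern} }}"

# capital(S, C) becomes a one-pattern SELECT query.
print(prolog_goal_to_sparql("capital", ["s", "c"]))
# → SELECT ?s ?c WHERE { ?s :capital ?c . }
```

Real transformations must also handle conjunctions of goals, constants, and aggregation, but the core idea is this goal-to-triple-pattern correspondence.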
Abstract:
An increasing number of applications rely on RDF, OWL 2, and SPARQL for storing and querying data. SPARQL, however, is not targeted towards end-users, and suitable query interfaces are needed. Faceted search is a prominent approach for end-user data access, and several RDF-based faceted search systems have been developed. There is, however, a lack of rigorous theoretical underpinning for faceted search in the context of RDF and OWL 2. In this paper, we provide such solid foundations. We formalise faceted interfaces for this context, identify a fragment of first-order logic capturing the underlying queries, and study the complexity of answering such queries for RDF and OWL 2 profiles. We then study interface generation and update, and devise efficiently implementable algorithms. Finally, we have implemented and tested our faceted search algorithms for scalability, with encouraging results.
Matching and merging anonymous terms from web sources - IJwest
This paper describes a workflow of simplifying and matching special language terms in RDF generated from web sources.
Expressive Querying of Semantic Databases with Incremental Query Rewriting - Alexandre Riazanov
This talk briefly introduces the Incremental Query Rewriting (IQR) method (see http://link.springer.com/chapter/10.1007%2F978-1-4419-7335-1_1 ) and presents an approach for extremely expressive querying of RDF triplestores, based on IQR.
Tutorial - Introduction to Rule Technologies and Systems - Adrian Paschke
Tutorial at Semantic Web Applications and Tools for the Life Sciences (SWAT4LS 2014), 9-11 Dec., Berlin, Germany
http://www.swat4ls.org/workshops/berlin2014/
Parallel Materialisation of Datalog Programs in Centralised, Main-Memory RDF ... - DBOnto
The document describes a novel approach to parallelizing materialization (fixpoint computation) of Datalog programs in centralized, main-memory RDF systems. The approach distributes the workload evenly across cores and uses an RDF indexing data structure that supports efficient, mostly lock-free parallel updates. Evaluation showed the approach parallelizes computation very well, with up to 13.9x speedup on 16 cores and 19.3x speedup using 32 virtual cores via hyperthreading.
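The fixpoint computation being parallelized can be sketched sequentially for a single transitive-closure rule. This is an illustrative semi-naive evaluation, not the paper's actual lock-free data structures: only the tuples derived in the previous round (the delta) are joined against the base relation in each iteration.

```python
def materialise(edges):
    """Semi-naive fixpoint for:
         path(X, Y) :- edge(X, Y).
         path(X, Z) :- path(X, Y), edge(Y, Z)."""
    path = set(edges)
    delta = set(edges)
    while delta:
        # Join only the newly derived facts against edge/2, then
        # keep the tuples not already known.
        new = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2} - path
        path |= new
        delta = new
    return path

print(sorted(materialise({(1, 2), (2, 3)})))
# → [(1, 2), (1, 3), (2, 3)]
```

The parallel version described above partitions this join work evenly across cores while keeping the shared index mostly lock-free.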
WebSpa is a tool for quick, intuitive (and even fun) querying of arbitrary SPARQL endpoints. WebSpa runs in the web browser and requires no additional software installation. The tool manages a large variety of pre-defined SPARQL endpoints and allows new ones to be added. A user account makes it possible to save both queries and their results on the local computer, as well as to edit the queries further. The application is written in both Java and Flex. It uses the Jena and ARQ application programming interfaces to perform the queries, and the results are processed and displayed using Flex.
Information access over linked data requires determining the subgraph(s) in linked data's underlying graph that correspond to the required information need. Usually, an information access framework can retrieve richer information by checking a large number of possible subgraphs. However, checking a large number of possible subgraphs increases information access complexity, which makes information access frameworks less effective. Many contemporary linked data information access frameworks reduce this complexity by introducing different heuristics, but at the cost of retrieving richer information; other frameworks disregard complexity altogether. A practically usable framework, however, should retrieve richer information at lower complexity. We hypothesize that pre-processed statistics of linked data can be used to efficiently check a large number of possible subgraphs, helping to retrieve comparatively richer information with lower data access complexity. Preliminary evaluation of our hypothesis shows promising performance.
This document summarizes and compares four prominent RDF query languages: RDQL, SPARQL, SeRQL, and XsRQL. It evaluates each language based on seven key features: support for data types, path expressions, closure, semantics, optional values, aggregate functions, and advanced set operations. The document finds that while no language is complete, SPARQL shows the most potential as the future W3C standard due to its iterative development process incorporating feedback from the W3C working group. RDQL provides basic functionality but was not intended for complex queries. SeRQL has strong open source support. XsRQL extends existing XML query approaches.
Extractive Document Summarization - An Unsupervised Approach - Findwise
1. This paper presents and evaluates an unsupervised extractive document summarization system that uses TextRank, K-means clustering, and one-class SVM algorithms for sentence ranking.
2. The system achieves state-of-the-art performance on the DUC 2002 English dataset with a ROUGE score of 0.4797 and can also summarize Swedish documents.
3. Domain knowledge is added through sentence boosting to improve summarization of news articles, and similarities between sentences are calculated to avoid redundancy for multi-document summarization.
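The redundancy check mentioned in point 3 can be sketched with a simple bag-of-words cosine similarity; this is a minimal illustration, and the actual system's similarity measure and threshold may differ.

```python
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity between two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

# A candidate sentence would be skipped when it is too similar
# to a sentence already selected for the summary.
print(cosine("rdf query languages", "rdf query engines"))
```

For multi-document summarization, each candidate is compared against every sentence already in the summary and dropped if any similarity exceeds a chosen threshold.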
A Mathematical Approach to Ontology Authoring and Documentation - Christoph Lange
This document proposes using OMDoc, a framework for representing formal knowledge, to improve ontology authoring and documentation. It describes how OMDoc can:
1) Provide better support for modularity, documentation at different granularities, and linking documentation to formal representations compared to languages like OWL.
2) Model existing ontologies and translate between OMDoc and OWL/RDF formats to leverage existing tools.
3) Allow comprehensive, integrated documentation of ontologies through features like literate programming. The approach is evaluated by reimplementing the FOAF ontology in OMDoc.
This document summarizes three popular Java frameworks for working with RDF and SPARQL: Jena, Sesame, and JRDF. It describes how each framework represents RDF data using a graph model with subjects, predicates, and objects. It also discusses how each framework supports querying RDF data using SPARQL or alternative query languages, and persisting RDF graphs to databases.
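The triple-store model these Java frameworks share can be sketched in a few lines: a graph is a set of (subject, predicate, object) triples, and a query is a pattern in which unbound positions act as variables. This is an illustrative data-structure sketch, not the API of Jena, Sesame, or JRDF.

```python
class TripleStore:
    """Minimal in-memory RDF-style graph: a set of (s, p, o) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a variable,
        much like ?x in a SPARQL triple pattern."""
        return [
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        ]

store = TripleStore()
store.add("ex:alice", "ex:knows", "ex:bob")
store.add("ex:bob", "ex:knows", "ex:carol")
print(sorted(store.match(p="ex:knows")))  # everyone who knows someone
```

The frameworks described above add persistence, indexing, and full SPARQL parsing on top of exactly this kind of pattern matching.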
Query Translation for Data Sources with Heterogeneous Content Semantics - Jie Bao
The document discusses query translation for data sources with heterogeneous content semantics. It proposes using ontology-extended data sources to make explicit the implicit ontologies associated with data. The key aspects covered include translating queries between different data content ontologies using conversion functions and interoperation constraints to ensure sound, complete, or exact translations.
The Semantic Web is a vision of information that is understandable by computers. Although there is great exploitable potential, we are still in "Generation Zero" of the Semantic Web, since there are few compelling real-world applications. Heterogeneity, the volume of data, and the lack of standards are problems that could be addressed through nature-inspired methods. The paper presents the most important aspects of the Semantic Web, as well as its biggest issues; it then describes some methods inspired by nature - genetic algorithms, artificial neural networks, swarm intelligence - and the ways these techniques can be used to deal with Semantic Web problems.
Automated building of taxonomies for search engines - Boris Galitsky
We build a taxonomy of entities intended to improve the relevance of a search engine in a vertical domain. The taxonomy construction process starts from seed entities and mines the web for new entities associated with them. To form these new entities, machine learning of syntactic parse trees (their generalization) is applied to the search results for existing entities to find commonalities between them. These commonality expressions then form parameters of existing entities and are turned into new entities at the next learning iteration.
Taxonomy and paragraph-level syntactic generalization are applied to relevance improvement in search and text similarity assessment. We evaluate the search relevance improvement in vertical and horizontal domains and observe a significant contribution of the learned taxonomy in the former and a noticeable contribution of a hybrid system in the latter. We also perform an industrial evaluation of taxonomy- and syntactic-generalization-based text relevance assessment and conclude that the proposed algorithm for automated taxonomy learning is suitable for integration into industrial systems. The proposed algorithm is implemented as part of the Apache OpenNLP.Similarity project.
Lemon-aid: using Lemon to aid quantitative historical linguistic analysis - mbruemmer
The document discusses converting dictionary and wordlist data from the Quantitative Historical Linguistics (QuantHistLing) research unit into the Lexicon Model for Ontologies (Lemon) format. This allows the data to be queried as Linked Data across over 50 lexical resources. The data is converted into RDF triples using Lemon to model lexicons and machine-readable dictionaries. This combined data set can then be used as input for computational historical linguistics tools to analyze word similarities and cognates across languages.
The document discusses an experiment in acquiring rich logical knowledge from natural language text using a technique called Textual Logic (TL). TL maps text to logical formulas and vice versa using an interactive disambiguation process. In an experiment, TL was used to represent over 2,500 sentences from a biology textbook as logical formulas using Rulelog, a new knowledge representation that is defeasible, tractable and rich. The resulting logical knowledge covered over 95% of the textbook material and took an average of less than 10 minutes per sentence to author. The study demonstrates progress on rapidly acquiring rich logical knowledge from text and reasoning with such knowledge.
This document discusses creating an OWL ontology from Dublin Core metadata to improve semantic interoperability. It describes a process to transform DC metadata from a digital repository into RDF, then refine the semantics through an OWL ontology. It demonstrates this with metadata from a university repository, adding properties and subclasses. Queries over the instantiated ontology using reasoning show the ability to infer new knowledge not possible from the original metadata alone. The approach aims to semantically enrich existing metadata in a centralized way to help address challenges of the semantic web.
The logic-based, machine-understandable framework of the Semantic Web often challenges naive users when they try to query ontology-based knowledge bases. Existing research efforts have approached this problem by introducing Natural Language (NL) interfaces to ontologies. These NL interfaces can construct SPARQL queries from NL user queries. However, most efforts were restricted to queries expressed in English, often benefiting from the advancement of English NLP tools; little research has been done to support querying Arabic content on the Semantic Web with NL queries. This paper presents a domain-independent approach to translating Arabic NL queries to SPARQL by leveraging linguistic analysis. With special consideration of Noun Phrases (NPs), our approach uses a language parser to extract NPs and relations from Arabic parse trees and match them to the underlying ontology. It then utilizes knowledge in the ontology to group NPs into triple-based representations. A SPARQL query is finally generated by extracting targets and modifiers and interpreting them into SPARQL. The interpretation of advanced semantic features, including negation and conjunctive and disjunctive modifiers, is also supported. The approach was evaluated using two datasets consisting of OWL test data and queries, and the results confirm its feasibility for translating Arabic NL queries to SPARQL.
Web search engines index documents and respond to keyword queries by returning a ranked list of relevant documents. Early search engines like Archie allowed searching by title across FTP sites. Modern search engines preprocess documents by removing tags and stopwords, stemming words, and building inverted indexes to map terms to documents for fast retrieval. They evaluate search results using metrics like precision and recall compared to human judgments of relevance.
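The preprocessing and indexing pipeline described above can be sketched as follows: tokenize, drop stopwords (stemming omitted here for brevity), build an inverted index from terms to document ids, and score results with precision and recall. Names and the tiny corpus are illustrative.

```python
import re

STOPWORDS = {"the", "a", "of", "and"}

def tokenize(text):
    """Lowercase the text, split into words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def build_index(docs):
    """Inverted index: term -> set of ids of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index

def precision(retrieved, relevant):
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0

def recall(retrieved, relevant):
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0

docs = {1: "the semantic web", 2: "web search engines", 3: "rdf query languages"}
index = build_index(docs)
print(sorted(index["web"]))  # ids of documents containing "web"
# → [1, 2]
```

A keyword query is answered by intersecting the posting sets of its terms; precision and recall then compare the retrieved set against human relevance judgments.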
Concept hierarchy is the backbone of an ontology, and concept hierarchy acquisition has been a hot topic in the field of ontology learning. This paper proposes a hyponymy extraction method for domain ontology concepts based on cascaded conditional random fields (CCRFs) and hierarchical clustering. It takes free text as the extraction object and adopts CCRFs to identify domain concepts. First, the low layer of CCRFs is used to identify simple domain concepts; the results are then sent to the high layer, in which nested concepts are recognized. Next, hierarchical clustering is used to identify hyponymy relations between domain ontology concepts. The experimental results demonstrate that the proposed method is efficient.
Framester: A Wide Coverage Linguistic Linked Data Hub - Mehwish Alam
Framester is a linguistic linked data hub that aims to improve coverage of FrameNet by extending mappings between FrameNet and other resources like WordNet and BabelNet. Framester represents over 40 million triples linking linguistic and factual resources and aligning frames, roles, and types to foundational ontologies. It provides a word frame disambiguation service and was evaluated on annotated corpora, showing improved performance over previous approaches.
This document discusses semantic technologies and digital data processing. It provides an overview of semantics and the semantic web, including XML, RDF, OWL, SPARQL, ontologies, and data models. It also discusses capturing semantics in XML documents, OWL, RDF schema, semantic web applications like cartographic searching, SKOS for knowledge organization systems, and the SKOS Play visualization tool.
This is a short presentation that explains the famous TextRank papers that used graphs to produce summaries and document indices (keywords).
Link to paper : https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
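The TextRank idea in those slides can be sketched as plain PageRank power iteration over a word co-occurrence graph. This is a toy illustration; the paper's exact window size, damping, and convergence criteria differ.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration over an undirected adjacency dict, as in TextRank."""
    nodes = list(graph)
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            # A node receives score from each neighbour, split evenly
            # among that neighbour's own edges.
            rank = sum(score[m] / len(graph[m]) for m in graph if n in graph[m])
            new[n] = (1 - damping) / len(nodes) + damping * rank
        score = new
    return score

# Toy co-occurrence graph: "keyword" co-occurs with every other word,
# so it should rank highest and be extracted as the keyword.
g = {
    "keyword": {"graph", "rank", "text"},
    "graph": {"keyword"},
    "rank": {"keyword"},
    "text": {"keyword"},
}
scores = pagerank(g)
print(max(scores, key=scores.get))
# → keyword
```

For summarization the same iteration runs over a sentence similarity graph instead of a word co-occurrence graph, and the top-ranked sentences form the summary.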
This document discusses topic extraction for domain ontology. It describes domain ontology as a collection of vocabularies and conceptualization of a given domain. The purpose of topic extraction is to identify relevant concepts in documents, obtain domain-specific terms, classify documents, and identify key concepts and relationships for an ontology. The project stages include obtaining domain knowledge, preprocessing documents, and applying either K-Means clustering or Latent Dirichlet Allocation to extract topics. K-Means partitions data into clusters while LDA represents documents as mixtures over topics characterized by word distributions.
This document discusses creating an OWL ontology from Dublin Core metadata to improve semantic interoperability. It describes a process to transform DC metadata from a digital repository into RDF, then refine the semantics through an OWL ontology. It demonstrates this with metadata from a university repository, adding properties and subclasses. Queries over the instantiated ontology using reasoning show the ability to infer new knowledge not possible from the original metadata alone. The approach aims to semantically enrich existing metadata in a centralized way to help address challenges of the semantic web.
The logic-based machine-understandable framework of the Semantic Web often challenges naive users when they try to query ontology-based knowledge bases. Existing research efforts have approached this problem by introducing Natural Language (NL) interfaces to ontologies. These NL interfaces have the ability to construct SPARQL queries based on NL user queries. However, most efforts were restricted to queries expressed in English, and they often benefited from the advancement of English NLP tools. However, little research has been done to support querying the Arabic content on the Semantic Web by using NL queries. This paper presents a domain-independent approach to translate Arabic NL queries to SPARQL by leveraging linguistic analysis. Based on a special consideration on Noun Phrases (NPs), our approach uses a language parser to extract NPs and the relations from Arabic parse trees and match them to the underlying ontology. It then utilizes knowledge in the ontology to group NPs into triple-based representations. A SPARQL query is finally generated by extracting targets and modifiers, and interpreting them into SPARQL. The interpretation of advanced semantic features including negation, conjunctive and disjunctive modifiers is also supported. The approach was evaluated by using two datasets consisting of OWL test data and queries, and the obtained results have confirmed its feasibility to translate Arabic NL queries to SPARQL.
Web search engines index documents and respond to keyword queries by returning a ranked list of relevant documents. Early search engines like Archie allowed searching by title across FTP sites. Modern search engines preprocess documents by removing tags and stopwords, stemming words, and building inverted indexes to map terms to documents for fast retrieval. They evaluate search results using metrics like precision and recall compared to human judgments of relevance.
Concept hierarchy is the backbone of ontology, and the concept hierarchy acquisition has been a hot topic in the field of ontology learning. this paper proposes a hyponymy extraction method of domain ontology concept based on cascaded conditional random field(CCRFs) and hierarchy clustering. It takes free text as extracting object, adopts CCRFs identifying the domain concepts. First the low layer of CCRFs is used to identify simple domain concept, then the results are sent to the high layer, in which the nesting concepts are recognized. Next we adopt hierarchy clustering to identify the hyponymy relation between domain ontology concepts. The experimental results demonstrate the proposed method is efficient.
Framester: A Wide Coverage Linguistic Linked Data HubMehwish Alam
Framester is a linguistic linked data hub that aims to improve coverage of FrameNet by extending mappings between FrameNet and other resources like WordNet and BabelNet. Framester represents over 40 million triples linking linguistic and factual resources and aligning frames, roles, and types to foundational ontologies. It provides a word frame disambiguation service and was evaluated on annotated corpora, showing improved performance over previous approaches.
This document discusses semantic technologies and digital data processing. It provides an overview of semantics and the semantic web, including XML, RDF, OWL, SPARQL, ontologies, and data models. It also discusses capturing semantics in XML documents, OWL, RDF schema, semantic web applications like cartographic searching, SKOS for knowledge organization systems, and the SKOS Play visualization tool.
This is a short presentation that explains the famous TextRank papers that used graphs to produce summaries and document indices (keywords).
Link to paper : https://web.eecs.umich.edu/~mihalcea/papers/mihalcea.emnlp04.pdf
This document discusses topic extraction for domain ontology. It describes domain ontology as a collection of vocabularies and conceptualization of a given domain. The purpose of topic extraction is to identify relevant concepts in documents, obtain domain-specific terms, classify documents, and identify key concepts and relationships for an ontology. The project stages include obtaining domain knowledge, preprocessing documents, and applying either K-Means clustering or Latent Dirichlet Allocation to extract topics. K-Means partitions data into clusters while LDA represents documents as mixtures over topics characterized by word distributions.
"Vouchers" para el desarrollo de mercados de servicios empresariales: ¿de qué...José Pedro Alberti
La Unión Europea ha propuesto un nuevo paquete de sanciones contra Rusia que incluye un embargo al petróleo ruso. El embargo se aplicaría gradualmente durante seis meses para el petróleo crudo y ocho meses para los productos refinados. Este paquete de sanciones requiere la aprobación unánime de los 27 estados miembros de la UE.
The document provides information about upcoming events and exhibits at the South Carolina State Museum in early 2017. It announces the opening of a new blockbuster exhibit called "Savage Ancient Seas" on February 4th that will showcase prehistoric sea creatures from 80 million years ago. It also notes that a new full-dome planetarium movie called "Sea Monsters: A Prehistoric Adventure" will debut on the same day to complement the exhibit. The director expresses excitement for the museum's plans and programs in 2017.
Qualtrics is a survey software company that has experienced strong financial success, doubling its revenue each year since 2011. It sells its products to universities and businesses by having new graduates tell their managers about Qualtrics. The company has a high customer retention rate of 90% or more. Key factors in its continued predicted success include its revenue-generating structure, positive cash flows, doubling of revenue, and ability to attract talent.
This document summarizes a presentation on filling the marketing funnel with social media. It discusses trends in social media usage, with platforms like Facebook seeing steady adult usage while attention shifts away from television. The presentation predicts that by 2020, social media will be fully integrated into daily life through connected devices and bots, and virtual reality experiences will be commonly used. It profiles companies like Citi and Panoptic Group that are innovating with technologies like Spectacles and virtual reality. The presentation encourages brands to invest in emerging technologies, treat social media as a lifestyle, take risks with new platforms, hire dedicated social media staff, and focus on technology promotions over cash promotions.
- Amazon was founded in 1995 by Jeff Bezos originally as an online bookstore and has since expanded into a massive online retailer selling a wide array of products.
- It has grown rapidly through expanding its product offerings, making acquisitions, and benefiting from strong internet growth. However, its focus on growth and innovation has resulted in high costs that offset much of its large sales revenues.
- Key issues Amazon faces include prioritizing sales growth over profits, intense workplace culture pressures, high shipping costs, and strong competition from other online retailers. Its continued innovation will be important to maintain its dominance.
Top Safety News for July 2016
• Free employer toolkit to promote seat belt use - and reduce costs
• How to improve skin safety in the workplace
• Seven safety program essentials
• OSHA fines top $5 million in June
• 2,600+ new safety signs and labels at ComplianceSigns.com
This newsletter from Epworth United Methodist Church provides information about upcoming worship services and events for February and March 2017. It includes the pastor's message about stewardship and giving from a biblical perspective. It also announces plans for a new church photo directory in March and events to celebrate the church's 175th anniversary in July.
Este documento presenta los resultados de una evaluación de estilos de aprendizaje realizada a Jessica Londoño. La evaluación consistió en una serie de preguntas donde se presentaban cuatro opciones de estilos de aprendizaje y se pedía calificarlas de 1 a 4 según su aproximación al estilo propio. Los resultados ubican a Jessica principalmente en los estilos de aprendizaje de conceptualización abstracta y experiencia concreta, lo que la clasifica como convergente.
Este documento describe diferentes tipos de subdrenajes, incluyendo sus características y usos. Menciona subdrenes interceptores, para pavimentos, horizontales, colchones de drenaje, de zanja y profundos. También describe trincheras estabilizadoras y cortinas impermeables subterráneas, destacando que los subdrenes tienen el objetivo de disminuir las presiones de poro o impedir que aumenten para estabilizar taludes.
Este documento describe diferentes tipos de organizaciones y estructuras dentro de una empresa. Explica conceptos como la organización, los organigramas, la departamentalización por producto, función o proyecto, y las ventajas e inconvenientes de las estructuras funcionales, verticales y amplias.
El documento describe los diferentes espacios y materiales de un aula de Educación Infantil. Incluye secciones para actividades lógico-matemáticas, experimentación sensorial, juego simbólico, comunicación lingüística, proyectos y áreas naturalistas con material apropiado para cada una.
Renata Gimenes tem experiência em comunicação, marketing e jornalismo. Formou-se em Jornalismo e fez especializações em Jornalismo Internacional e Impresso e MBA em Marketing. Atualmente trabalha como assessora de imprensa e comunicação para várias agências e clientes dos setores de turismo, hotelaria e entretenimento.
The document discusses querying ontologies using SPARQL-DL. It describes building a user-friendly application that allows loading an ontology and submitting SPARQL-DL queries. The application utilizes the Pellet reasoner via its Jena API implementation to process queries against OWL ontologies. Experiments show the application can correctly handle simple DL and mixed ABox/TBox queries.
The document discusses querying ontologies using SPARQL-DL. It describes building a user-friendly application that allows loading an ontology and submitting SPARQL-DL queries. The application utilizes the Pellet reasoner via its Jena API implementation to process queries against OWL ontologies. Experiments show the application can correctly handle simple DL and mixed ABox/TBox queries.
IRJET- An Efficient Way to Querying XML Database using Natural LanguageIRJET Journal
This document discusses an efficient way to query XML databases using natural language. It proposes a framework that can accept English language queries and translate them into XQuery or SQL expressions to retrieve data from an XML database. The system performs linguistic processing to map tokens in the natural language query to XQuery fragments, then executes the translated query against the database. Existing approaches are discussed that typically use semantic and syntactic analysis to represent the query logically before translation, but have limitations in handling ambiguity. The proposed system aims to improve query translation accuracy by leveraging token relationships and classifications determined from natural language parsing.
The document discusses faceted search over ontology-enhanced RDF data. It formalizes faceted interfaces for querying RDF graphs that capture ontological information. It studies the expressivity and complexity of queries represented by faceted interfaces, and algorithms for generating and updating interfaces based on the underlying RDF and ontology information. The goal is to provide rigorous theoretical foundations for faceted search in the context of RDF and OWL 2 ontologies.
Although animals do not use language, they are capable of many of the same kinds of cognition as us; much of our experience is at a non-verbal level.
Semantics is the bridge between surface forms used in language and what we do and experience.
Language understanding depends on world knowledge (i.e. “the pig is in the pen” vs. “the ink is in the pen”)
We might not be ready for executives to specify policies themselves, but we can make the process from specification to behavior more automated, linked to precise vocabulary, and more traceable.
Advances such as SVBR and an English serialization for ISO Common Logic means that executives and line workers can understand why the system does certain things, or verify that policies and regulations are implemented
Realization of natural language interfaces usingunyil96
The document discusses research on using lazy functional programming (LFP) to build natural language interfaces (NLIs). LFP involves delaying evaluation of function arguments until needed. Over 45 researchers have investigated using LFP for NLI design and implementation due to similarities between some linguistic theories and LFP theories. The research has resulted in over 60 papers on using LFP for natural language processing tasks like syntactic and semantic analysis. The paper provides a comprehensive survey of this research area at the intersection of computer science and computational linguistics.
Explanations in Dialogue Systems through Uncertain RDF Knowledge BasesDaniel Sonntag
We implemented a generic dialogue shell that can be configured for and applied to domain-specific dialogue applications. The dialogue system works robustly for a new domain when the application backend can automatically infer previously unknown knowledge (facts) and provide explanations for the inference steps involved. For this purpose, we employ URDF, a query engine for uncertain and potentially inconsistent RDF knowledge bases. URDF supports rule-based, first-order predicate logic as used in OWL-Lite and OWL-DL, with simple and effective top-down reasoning capabilities. This mechanism also generates explanation graphs. These graphs can then be displayed in the GUI of the dialogue shell and help the user understand the underlying reasoning processes. We believe that proper explanations are a main factor for increasing the level of user trust in end-to-end human-computer interaction systems.
This document discusses several inference engines that can be used for semantic web applications: Pellet, FaCT, FaCT++, RacerPro, Kaon2, and HermiT. It analyzes and compares these inference engines based on their expressivity, algorithms, interfaces, and other features. The key purpose of inference engines is to infer new knowledge and relationships from existing semantic data using rules and ontologies. The document concludes that a comparative analysis of inference engines can help select the most appropriate one for a given semantic web application or research.
Information residing in relational databases and delimited file systems are inadequate for reuse and sharing over the web. These file systems do not adhere to commonly set principles for maintaining data harmony. Due to these reasons, the resources have been suffering from lack of uniformity, heterogeneity as well as redundancy throughout the web. Ontologies have been widely used for solving such type of problems, as they help in extracting knowledge out of any information system. In this article, we focus on extracting concepts and their relations from a set of CSV files. These files are served as individual concepts and grouped into a particular domain, called the domain ontology. Furthermore, this domain ontology is used for capturing CSV data and represented in RDF format retaining links among files or concepts. Datatype and object properties are automatically detected from header fields. This reduces the task of user involvement in generating mapping files. The detail analysis has been performed on Baseball tabular data and the result shows a rich set of semantic information.
May 2024 - Top10 Cited Articles in Natural Language Computingkevig
Natural Language Processing is a programmed approach to analyze text that is based on both a set of theories and a set of technologies. This forum aims to bring together researchers who have designed and build software that will analyze, understand, and generate languages that humans use naturally to address computers.
As technology and needs evolve and the need for scalable and high availability solutions increase there is a need to evaluate new databases. The lack of clarity in the market makes in difficult for IT stakeholders to understand the differences between the solutions available and the choice to make. The key areas to consider while evaluating NoSql databases are data model, query model, consistency model, APIs, support and community strength.
INTELLIGENT-MULTIDIMENSIONAL-DATABASE-INTERFACEMohamed Reda
The document describes an intelligent multidimensional database interface system that allows users to query the database using natural language instead of SQL. The system works by parsing the user's natural language query, filling a semantic dictionary with words from the query and a lexical dictionary with terms from the database schema. It then maps words between the two dictionaries to generate a SQL query, which is executed on the database to return results to the user. The system aims to provide a more user-friendly search experience for non-expert users compared to traditional SQL queries.
This document compares three APIs for processing RDF in the .NET Framework: SemWeb, LinqToRdf, and Rowlex. SemWeb provides low-level RDF interaction and the others build on it. LinqToRdf allows LINQ querying of RDF graphs while Rowlex maps RDF triples to object-oriented classes. All three APIs lack documentation and support as they were last updated in 2008-2009. SemWeb has the best performance while LinqToRdf has the lowest due to additional processing of LINQ queries to SPARQL.
This document discusses various programming language trends including functional programming languages like Haskell, purely object-oriented languages, and Scala. It also covers database trends like NoSQL databases Cassandra and MongoDB. Programming tools like Docker and Vagrant are mentioned. The document discusses paradigms like static vs dynamic typing and strong vs weak typing. It provides examples and resources for languages including Haskell, Erlang, Scala, and databases like Cassandra.
This document discusses key concepts of object-oriented programming including inheritance, polymorphism, and dynamic binding. It describes how inheritance allows for reuse of data and functionality from parent classes while also allowing modification and addition of new entities in derived classes. Polymorphism is supported through dynamic binding which allows polymorphic variables to reference objects from parent and subclass types.
This document is a preprint that summarizes a paper on developing a spell checker model using automata. It discusses using fuzzy automata rather than finite automata for improved string comparison. The proposed spell checker would incorporate autosuggestion features into a Windows application. It would use techniques like edit distance and tries to store dictionaries and suggest corrections. The paper outlines the design of the spell checker, discussing functions like comparing words to a dictionary and considering morphology. Advantages like improved accuracy and speed are discussed along with potential disadvantages like inability to detect all errors.
Object relationship mapping and hibernateJoe Jacob
ORM stands for Object Relational Mapping and is a technique for mapping objects in an object-oriented programming language to tables in a relational database. The document discusses the various mismatches that exist between object-oriented programming and relational databases, and how ORM frameworks like Hibernate address these mismatches through object relational mapping. It also provides an overview of Hibernate - a popular Java ORM framework, and demonstrates how to configure Hibernate and perform basic CRUD operations using both XML configuration and Java annotations.
Class Diagram Extraction from Textual Requirements Using NLP Techniquesiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
This document presents a new method for extracting class diagrams from textual requirements using natural language processing (NLP) techniques. It proposes the Requirements Analysis and Class diagram Extraction (RACE) system, which uses tools like the OpenNLP parser, a stemming algorithm, and WordNet to extract concepts and identify classes, attributes and relationships. The RACE system applies heuristic rules and a domain ontology to the output of the NLP tools to refine and finalize the extracted class diagram. The paper concludes that the RACE system demonstrates the effective use of NLP techniques to automate the extraction of class diagrams from informal natural language requirements specifications.
Transformation between Query Languages
Miguel Cristian Greciano Raiskila
Tokyo National Institute of Informatics
Technische Universität Darmstadt
Universidad Politécnica de Madrid
mc.greciano@gmail.com
Yusuke Miyao
Tokyo National Institute of Informatics
Associate Professor
Master Thesis Tutor
yusuke@nii.ac.jp
Abstract

The Semantic Web, an extension of the Web that provides easier ways to retrieve data, has seen major growth in recent years. Users of the Internet do not only desire information; they desire better, quicker, easier and more efficient ways to access that information and find answers to their questions.

A clear example of the Semantic Web phenomenon is the popularity of the Freebase data set, a large collection of structured data harvested from many sources. Freebase runs on a database infrastructure created in-house by Metaweb that uses a graph model. Because its data structure is non-hierarchical, Freebase is open for users to enter new objects and relationships into the underlying graph, a great advantage. Since 2008, Freebase implements RDF (Resource Description Framework), allowing Freebase to be used as linked data and be queried by languages such as SPARQL, which is also quite popular. SPARQL users normally create their own SPARQL queries manually, or at least semi-manually. There are even hundreds of sample SPARQL queries on the web, paired with their associated natural language utterances, to hint to users how to create or adapt their own queries. It would, however, be ideal if this whole step from natural language to executing a query in the database were automated.

In this paper we address a key feature in the desired automation between natural language and query execution: transformation between query languages. Indeed, in the example of Freebase and SPARQL, natural language utterances are far in nature from RDF graphs, but closer in nature to semantic trees. If we could close the gap between trees and graphs by transforming between different query languages, we would draw near to our final goal of automation. In this paper we outline two algorithms that transform queries between different languages, the SPARQL → λ-DCS one being the more relevant. We also provide the necessary background as to the reasons and utility for said algorithms.
1 Introduction

This paper outlines the algorithms developed to transform queries from one query language into their equivalents in a different query language. It also provides a brief but comprehensive background on the query languages and tools chosen for this study, as well as the reasons why they were chosen.

An obvious initial question is what kind of benefits transformation between query languages would provide. It is true that equivalent queries produce equivalent results ("Where was Barack Obama born?" should return "Honolulu" no matter which query language one is using). However, different languages express different concepts for equivalent queries, and they also differ in efficiency and answer-retrieval speed. Just as one can express a database containing the books in a library and their associated data in, for example, both Java and C, the implementation of said database will obviously differ due to the inherently different nature of the two programming languages: Java will use objects to represent the data, while C will use other data structures and handle storage and memory differently.
Furthermore, transformation between query languages can be very useful when a specific query language is more suitable than another for certain natural language processing tasks. Some languages have a graph structure and others have a tree structure. Associating syntactic trees that arise from parsing a natural language sentence with a query could intuitively be easier if the query itself has a tree structure. For example, in "Semantic Parsing on Freebase from Question-Answer Pairs" (Berant et al., 2013), Jonathan Berant, Andrew Chou, Roy Frostig and Percy Liang already suggested a way to map natural language utterances to queries on the Freebase data set: through a tree-structured query language they name Lambda Dependency-Based Compositional Semantics (from now on, λ-DCS).
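As a rough illustration of how a tree-shaped λ-DCS expression lines up with a graph-shaped SPARQL pattern, consider a minimal join such as PlaceOfBirth.Honolulu ("entities whose place of birth is Honolulu"). The sketch below is our own, not SEMPRE's implementation, and the fb: identifiers and class names are illustrative rather than real Freebase IDs:

```python
# Minimal sketch (not SEMPRE's implementation) of rendering a
# lambda-DCS-style join as SPARQL. Identifiers are illustrative.

class Entity:
    def __init__(self, name):
        self.name = name

class Join:
    """Join(binary, unary): {x | binary(x, y) and y in unary}."""
    def __init__(self, binary, unary):
        self.binary, self.unary = binary, unary

def to_sparql(expr, var="?x"):
    """Render a one-level lambda-DCS join as a SPARQL SELECT query."""
    if isinstance(expr, Join) and isinstance(expr.unary, Entity):
        return f"SELECT {var} WHERE {{ {var} {expr.binary} {expr.unary.name} . }}"
    raise NotImplementedError("only Join(binary, Entity) is handled here")

# "people born in Honolulu" ~ PlaceOfBirth.Honolulu
query = Join("fb:people.person.place_of_birth", Entity("fb:en.honolulu"))
print(to_sparql(query))
```

The one-level join corresponds to exactly one triple pattern; nesting joins would correspond to chaining patterns through intermediate variables.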
An ambitious project that our team is currently working on, the Question-Answering Project (from now on, the QA project), intends to automatically create these Freebase queries from natural language input. As mentioned in the abstract, we already possess a good amount of utterance-query pairs; however, in order to train the interpreter we must be able to traverse from utterance to semantic tree to graph to query, and the reverse way too.

The chosen query languages for the transformations are Prolog, SPARQL and λ-DCS. Section 3 contains a brief exposition of the syntax and peculiarities of these languages. Section 4 outlines the transformation algorithms between these languages. In section 5 we analyze the performance of said algorithms. Section 6 explains the encountered problems and limitations of the algorithms and the transformations, as well as some possible solutions. Section 7 suggests future tasks to be carried out from this project, and section 8 provides the conclusion to our work. Section 9 is an Appendix containing some of the results of our work: queries equivalently expressed in different query languages.
As we will see, the λ-DCS → SPARQL transformation is already implicit in the toolkit that we have. However, the major contribution of this paper is the reverse SPARQL → λ-DCS transformation. It is not a trivial task: λ-DCS → SPARQL transforms trees into graphs, whereas SPARQL → λ-DCS does the opposite, transforming graphs into trees. Trees are by definition a particular case of graphs and thus less expressive, so in theory transforming from graphs to trees should pose a significant challenge. The SPARQL → λ-DCS transformation has not been proposed yet and is important for the QA project, which needs to traverse from semantic trees to graphs (λ-DCS → SPARQL) and also from graphs to semantic trees (SPARQL → λ-DCS) in order to execute training.
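To make the asymmetry concrete, the following sketch (our own illustration, not the algorithm developed in this paper) roots a set of SPARQL-like triple patterns at the projected variable and orients every edge away from it, marking reversed edges in the spirit of λ-DCS's reverse operator R[...]. It assumes the pattern graph is already tree-shaped; choosing a root and an orientation for an arbitrary graph is precisely where the difficulty lies:

```python
# Sketch of the graph-to-tree step behind a SPARQL -> lambda-DCS style
# transformation (illustrative only). Triples are (subject, predicate,
# object) strings; variables start with "?".

def to_tree(root, triples):
    """Root the triple graph at `root`, consuming each triple once.
    Edges pointing toward the root are reversed and tagged R[...]."""
    children = []
    for t in triples:
        s, p, o = t
        rest = [u for u in triples if u != t]
        if s == root:                      # edge already points away from root
            children.append((p, to_tree(o, rest)))
        elif o == root:                    # edge points toward root: reverse it
            children.append((f"R[{p}]", to_tree(s, rest)))
    return (root, children)

def render(tree):
    """Serialise the rooted tree in a lambda-DCS-like dotted notation."""
    node, children = tree
    if not children:
        return node
    parts = [f"{p}.{render(sub)}" for p, sub in children]
    return " AND ".join(parts)

triples = [("?x", "born_in", "?y"), ("?y", "located_in", "colorado")]
print(render(to_tree("?x", triples)))   # chains the two patterns into one path
```

On a tree-shaped pattern this produces a single dotted path such as born_in.located_in.colorado; patterns with cycles or with variables reachable by several paths have no unique tree form, which is the core obstacle discussed above.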
The chosen experimental database is GeoQuery¹. It is a small database containing geographical information about all of the states in the US (cities, rivers, roads, highest and lowest points, and so on). An extended geographical database with more data is included in Freebase; however, we chose to work with GeoQuery because it is simple, intuitive, small, comes in various formats, and is free and open source (although Freebase is also free and open source, there are of course data sets which are not, so this advantage should not be considered a given). It is also relatively well known. One of the formats GeoQuery comes in is the Prolog format, and this is the main reason why we chose Prolog as one of the query languages in this research. GeoQuery comes with more than 880 functional Prolog queries as samples, all of them associated with natural language questions such as "What is the longest river in Colorado?". The samples are first-order-logic queries, i.e. they deal with quantities and sets, not propositions. We thus have good starting source material for our objective of transforming functional queries in one language into their equivalents in another language. In this case, the algorithms are designed to transform first from Prolog to SPARQL, and then from SPARQL to λ-DCS. We want the algorithms to function properly on the GeoQuery database, with the hope that they generalize well when transforming queries in larger datasets like Freebase.
2 Related Work
Automated transformation between query lan-
guages is not a very common practice. Most of
the time, queries are written manually for the pur-
pose of retrieving desired information. Conse-
quently, related work is scarce. One can still
find similar attempts, though: in the book "Reason-
ing Web" there is a subsection addressing transfor-
mation between SPARQL and GReQL (Aßmann
et al., 2010).
As we will see in section 4.3, the transformation
λ-DCS → SPARQL is implicit in the SEMPRE
toolkit; however, the reverse transformation
SPARQL → λ-DCS is not. One of the algo-
rithms proposed in this paper attempts to exe-
cute this transformation. The other algorithm pro-
posed is for the Prolog → SPARQL transforma-
tion, which, as far as the authors are aware, no-
body else has attempted to develop. Thus, even
though transformation between query languages
is not a new concept, it is still rare, and this paper
pioneers the transformations Prolog → SPARQL
and SPARQL → λ-DCS.
3 Query Languages Overview
Here we do not intend to explain all the intrica-
cies of the languages used. However, we wish to
provide a simple background for each of them,
along with some basic definitions, so that the
reader can fully comprehend the algorithms de-
veloped in this work and the subsequent results of
the research. We also provide references to more
complete expositions and/or tutorials of these lan-
guages, should the reader wish to deepen their un-
derstanding of these query languages.
3.1 Prolog
Prolog is a general purpose logic programming
language, with roots in first-order logic. Prolog
is declarative: the program logic is expressed in
terms of relations or relationships, the query of
which initiates a computation. The relationships
are thus arbitrary, i.e., the author decides how to
define said relationships, and there is no set of uni-
versal relationships. These relationships connect
Prolog variables with each other, as well as with
constants. An easy tool for interpreting and exe-
cuting Prolog queries is SWI-Prolog
(http://www.swi-prolog.org/).
Here is an overview of the GeoQuery database
in Prolog format. The database entries have the
following pattern:
# state(name, abbreviation,
capital, population, area,
state number, city1, city2,
city3, city4)
# city(state, state abbreviation,
name, population)
# river(name, length, [states
through which it flows])
# border(state,
state abbreviation, [states that
border it])
# highlow(state,
state abbreviation, highest point,
highest elevation, lowest point,
lowest elevation)
# mountain(state,
state abbreviation, name, height)
# road(number, [states it passes
through])
# lake(name, area, [states it is
in])
and here we provide some instances of said pat-
terns as an example:
# state(’arkansas’, ’ar’, ’little
rock’, 2286.0e+3, 53.2e+3,25,
’little rock’, ’fort smith’,
’north little rock’, ’pine
bluff’).
# state(’california’, ’ca’,
’sacramento’, 23.67e+6,
158.0e+3,31, ’los angeles’, ’san
diego’, ’san francisco’, ’san
jose’).
# state(’colorado’, ’co’,
’denver’, 2889.0e+3, 104.0e+3,38,
’denver’, ’colorado springs’,
’aurora’, ’lakewood’).
...
# river(’mississippi’, 3778,
[’minnesota’, ’wisconsin’,
’iowa’, ’illinois’, ’missouri’,
’kentucky’, ’tennessee’,
’arkansas’, ’mississippi’,
’louisiana’, ’louisiana’]).
# river(’missouri’, 3968,
[’montana’, ’north dakota’,
’south dakota’, ’iowa’,
’nebraska’, ’missouri’,
’missouri’]).
# river(’colorado’, 2333,
[’colorado’, ’utah’, ’arizona’,
’nevada’, ’california’]).
Apart from the entries in the database following
a known pattern, relationships have to be defined
in order to be understood by the Prolog interpreter.
Here are two examples of such definitions in Pro-
log:
# loc(cityid(City,St),
stateid(State)):-
city(State,St,City, ).
# const(V,V).
The first example, the relation "loc", indicates
that when "loc" appears, if the input variables are
a city and a state, the interpreter has the informa-
tion available in the first, second and third proper-
ties of the corresponding city entry (the State, St
and City properties). The second example, the re-
lation "const", indicates that both inputs are to be
associated together. "const" is useful for defining
Prolog variables as constants.
And finally we present one of the Prolog query
samples contained in the GeoQuery database:
answer(A,(city(A),loc(A,B),
const(B,stateid(virginia)))).
This query will retrieve all cities in the state of
Virginia. As we can see, the query asks for A to
be returned, where A is a city, and A is located
in B, which corresponds to a state whose constant
ID equals Virginia. We will use this query as
the example input for the algorithms described in
section 4.
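To make the semantics of this query concrete, here is a minimal Python sketch of what it computes over a hypothetical three-city table (the population figures are illustrative, not GeoQuery data):

```python
# Hypothetical city(State, StateAbbrev, City, Population) entries,
# mirroring the GeoQuery city/4 pattern shown earlier.
CITIES = [
    ("virginia", "va", "richmond", 219e3),
    ("virginia", "va", "norfolk", 267e3),
    ("texas", "tx", "houston", 1595e3),
]

def answer():
    # city(A), loc(A, B), const(B, stateid(virginia)):
    # A ranges over cities; loc ties each city A to its state B;
    # const pins B to the constant 'virginia'.
    return [city for state, _, city, _ in CITIES if state == "virginia"]

print(answer())  # ['richmond', 'norfolk']
```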
3.2 SPARQL
SPARQL (”SPARQL Protocol and RDF Query
Language”) is an RDF query language. In
other words, it is a semantic query language for
databases, able to retrieve and manipulate data
stored in Resource Description Framework (RDF)
format. It is recognized as one of the key tech-
nologies of the semantic web, and it has become
an official W3C Recommendation. A very instruc-
tive SPARQL tutorial can be found on the Apache
Jena homepage
(http://jena.apache.org/tutorials/sparql.html).
We suggest executing SPARQL queries either
with the Apache Jena framework or with a
Virtuoso server
(http://kidehen.typepad.com/kingsley_idehens_typepad/).
In this subsection we shall explain only the very
basics of SPARQL; please refer to the aforemen-
tioned tutorial to actually learn the language.
SPARQL is a very popular language for querying
RDF graphs. Important datasets like Freebase are
stored in RDF format. RDF graphs basically con-
sist of a set of triples or statements: patterns
like Subject <Verb> Object or Entity1
<Relationship> Entity2. Here is an ex-
ample of such an RDF graph:
<state25> <type> ’state’ .
<state25> <name> ’mississippi’ .
...
...
<river1> <type> ’river’ .
<river1> <name> ’mississippi’ .
SPARQL matches triple patterns against the RDF
graph and returns the triples that fit the template.
The templates are expressed with SPARQL vari-
ables, which can be recognized because they start
with a "?" symbol. Here are two examples of
SPARQL queries:
SELECT ?x WHERE {
?x <name> ’mississippi’ .
}
SELECT ?x WHERE {
?x <type> ’state’ .
?x <name> ’mississippi’ .
}
As we can see, ?x is a SPARQL variable, and
in both cases it tries to match the elements that
appear on the left of the RDF triples. It is also the
variable selected with the SELECT operator; it
is thus the variable to be queried, and its values are
returned as the answer to the query. The WHERE
block indicates the patterns that the graph must
match. All patterns in the WHERE block must be
matched in order for the entity on the left to be as-
sociated with ?x. Thus, if we execute both queries
on the RDF example above, the first query will
return <state25> and <river1> as answers,
but the second query will only return <state25>
as its answer, because only <state25> matches
both ?x <type> 'state' and ?x <name>
'mississippi', whereas <river1> only
matches the latter.
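The matching behaviour just described can be sketched in a few lines of Python. This is an illustrative toy matcher over in-memory triples, not an actual SPARQL engine:

```python
# The RDF example from above, as (subject, verb, object) tuples.
TRIPLES = [
    ("<state25>", "<type>", "'state'"),
    ("<state25>", "<name>", "'mississippi'"),
    ("<river1>", "<type>", "'river'"),
    ("<river1>", "<name>", "'mississippi'"),
]

def matches(graph, subject, patterns):
    """True if `subject` satisfies every (verb, object) pattern in the WHERE block."""
    return all((subject, verb, obj) in graph for verb, obj in patterns)

def select(graph, patterns):
    """Return all subjects bound to ?x, i.e. those matching every pattern."""
    subjects = {s for s, _, _ in graph}
    return sorted(s for s in subjects if matches(graph, s, patterns))

# First example query: only the <name> pattern.
print(select(TRIPLES, [("<name>", "'mississippi'")]))
# ['<river1>', '<state25>']

# Second example query: <type> and <name> must both match.
print(select(TRIPLES, [("<type>", "'state'"), ("<name>", "'mississippi'")]))
# ['<state25>']
```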
3.3 λ-DCS
Lambda dependency-based compositional seman-
tics (λ-DCS) is a new formal language for repre-
senting logical forms in semantic parsing. It was
developed by Percy Liang (Liang, 2013). It at-
tempts to express logical forms in a simpler way
than Lambda Calculus. By eliminating variables
and making existential quantification implicit, λ-
DCS logical forms are generally more compact
than those in Lambda Calculus. Compared to the
graph structure of SPARQL, the tree structure of
λ-DCS, as well as the absence of variables, should
be very helpful when attempting to associate the
trees produced from parsing natural language
utterances with the database queries that will
retrieve the answer.
To provide insight into how λ-DCS is nota-
tionally simpler than lambda calculus, compare
the following expressions:
- Natural language utterance: “people who have
lived in Seattle”
– Logical form (lambda calculus):
λx.∃e.PlacesLived(x,e) ∧ Location(e,Seattle)
– Logical form (λ-DCS):
PlacesLived.Location.Seattle
– SEMPRE notation:
(!<name> (and (<type> ’people’)
(<livedin> (and (<type> ’state’)
(<name> ’seattle’)))))
All express the same concept; however, λ-DCS
lacks variables and thus has a much simpler
expression than Lambda Calculus. If the reader
is interested in a deeper understanding of Lambda
Calculus, we recommend Barendregt's "Introduc-
tion to lambda calculus" (Barendregt and
Barendsen, 1984). SEMPRE is a
toolkit for training semantic parsers, which map
natural language utterances to denotations (an-
swers) via intermediate logical forms. It is the
toolkit Percy Liang developed in order to execute
λ-DCS queries. We also used the SEMPRE
toolkit in this work, and when in this report we
refer to λ-DCS queries, we are actually referring
to their SEMPRE notation, not their logical form.
The SEMPRE query above can be read as: "re-
turn the name of all entities of type 'people' that
have lived in the entities of type 'state' and name
'seattle'". The logical-group nature of this lan-
guage is thus clearly manifest, with operations
such as intersection (and) and union (or) being
used.
When executing λ-DCS queries, the SEMPRE
toolkit (http://nlp.stanford.edu/software/sempre/)
automatically transforms them into an
equivalent SPARQL query which then executes in
a Virtuoso server. Thus the λ-DCS → SPARQL
transformation is implicit within the toolkit. In
this paper we attempt to perform the opposite
transformation, SPARQL → λ-DCS, which can
be extremely useful for the aforementioned QA
project.
4 Transformation algorithms
In this section we describe the transformation
algorithms step by step. Because the algorithms
are much easier to understand given a specific ex-
ample, we will use a simple sample query asso-
ciated with the question "What are the cities in Vir-
ginia?"
4.1 Prolog → SPARQL Algorithm
The first thing to note is that the GeoQuery
database is not provided in RDF format. Thus, we
first need to transform the database to RDF format
so that SPARQL and λ-DCS queries can be exe-
cuted on the GeoQuery database. There are many
different trivial ways to do this transformation. In
our case we chose to transform an entry into a ge-
ographical entity with associated properties, all of
the entries constituting different and independent
graphs (no linking between graphs). For example,
this entry:
# state(’arizona’,’az’,’phoenix’,
2718.0e+3,114.0e+3,48,’phoenix’,
’tucson’,’mesa’,’tempe’).
transforms to:
<state3> <type> ’state’ .
<state3> <name> ’arizona’ .
<state3> <abbreviation> ’az’ .
<state3> <capital> ’phoenix’ .
<state3> <population> 2718.0e+3 .
<state3> <area> 114.0e+3 .
<state3> <state number> 48 .
<state3> <city1> ’phoenix’ .
<state3> <city2> ’tucson’ .
<state3> <city3> ’mesa’ .
<state3> <city4> ’tempe’ .
Note that an extra property, <type>, is added
to make explicit that the geographical entity
<state3> is a state. Prolog directly identifies
that an entry in the database is a state; the RDF
format, however, does not.
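The entry-to-RDF conversion described above can be sketched as follows. The helper name state_to_rdf and the exact value formatting are our illustrative choices, not the paper's actual code:

```python
# Property order follows the state/10 pattern given in Section 3.1.
STATE_PROPERTIES = ["name", "abbreviation", "capital", "population",
                    "area", "state number", "city1", "city2", "city3", "city4"]

def state_to_rdf(index, fact):
    """Turn one Prolog state(...) argument tuple into an independent RDF graph."""
    entity = f"<state{index}>"
    # The extra <type> property makes explicit that this entity is a state.
    triples = [(entity, "<type>", "'state'")]
    triples += [(entity, f"<{prop}>", value)
                for prop, value in zip(STATE_PROPERTIES, fact)]
    return triples

arizona = ("'arizona'", "'az'", "'phoenix'", "2718.0e+3", "114.0e+3",
           "48", "'phoenix'", "'tucson'", "'mesa'", "'tempe'")
for s, p, o in state_to_rdf(3, arizona):
    print(s, p, o, ".")
```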
Now that the GeoQuery database is also in RDF
format, we can attempt to transform the sample
Prolog queries into SPARQL queries. As stated
before, the Prolog relationships are arbitrarily se-
lected by the GeoQuery database, and thus this
Prolog → SPARQL transformation will not be
universal, but specific to this particular case of
the GeoQuery database. Other Prolog relation-
ships would require other transformations. The
SPARQL → λ-DCS transformation that will be
proposed afterwards, however, is indeed intended
to be universal. We will first enumerate the steps
in abstract, and then explain how the algorithm ex-
ecutes on an example Prolog query.
1. Use the NLTK toolkit to create a tree from
the Prolog query
2. Identify the Prolog variables (single capital
letters in the leaves of the tree) and store them
in an empty variables dictionary
3. Identify the type of the Prolog variables and
store the type in the variables dictionary
3.1. Look for single-leaf nodes
3.2. Look for two-leaf "const"-labeled nodes
3.3. Leave the non-type-informing nodes for
the next step
4. With the type of the variables, indicate the
relationships between the Prolog variables in
SPARQL format. This step must interpret the
nodes of the tree that were not interpreted in
Step 3.
5. Organize all collected information in a cor-
rect RDF graph form. In this work we opted
to have a string that concatenated the infor-
mation progressively as the algorithm was
executed.
Now we will see the algorithm executed on an
example. Given the sample query in Prolog that
retrieves all the cities in the state of Virginia:
answer(A,(city(A),loc(A,B),
const(B,stateid(virginia)))).
the first step is to parse this query and create an
NLTK tree from it. This is what the tree looks like:
(Step 1)
Note that the label "goals" has been added at an
unnamed node in the original Prolog query. The
name is arbitrary and is simply there so that all
nodes in the NLTK tree are labeled. The next step
is to identify the Prolog variables (Step 2) and their
types (Step 3), information required to interpret
the other Prolog relationships. Variables are single
capital letters located in the leaves of the tree. We
can obtain the type of a variable either from the
label of single-leaf subtrees, e.g. "city(A)" tells
us A is a city, or from two-leaf subtrees with the
label "const", e.g. const(B,stateid(virginia)) tells
us B is a state, and its name is Virginia. We can
thus now create a variable dictionary containing
all the variables and their corresponding types:
varDict={’A’:’city’,’B’:’state’}
Finally, we are now able to interpret the remain-
ing subtrees, i.e. Prolog relationships like loc(A,B),
which tells us that city A is located in state B.
(Step 4) We would be unable to infer this without
the types of A and B ("loc" could equally refer to a
river A located in state B, a relationship with a dif-
ferent name), so these subtrees can only be inter-
preted at this stage. Now that all information about
the Prolog relationships has been retrieved, it can
be expressed as a SPARQL query: (Step 5)
SELECT ?city WHERE {
?xA <name> ?city .
?xA <type> "city" .
?xA <name> ?A .
?xB <type> "state" .
?xB <name> ?B .
?xB <name> "virginia" .
?xA <state> ?B .
}
This is the equivalent query in SPARQL that re-
trieves all the cities in the state of Virginia. Prolog
variables are expressed with an extra "x" in front
of them when they appear on the left in SPARQL,
because there they reference geographical entities;
on the right they reference string names, and the
two thus need to be differentiated. It also helps
for clarity purposes. A further post-processing
of this SPARQL query is possible, condensing for
example these four statements

?xB <type> "state" .
?xB <name> ?B .
?xB <name> "virginia" .
?xA <state> ?B .

into just this one:

?xA <state> "virginia" .

However, the SPARQL query that the algorithm
provides expresses all the information given in the
original Prolog query, and we thus chose to leave
it as it is.
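Steps 2 and 3 of the algorithm (collecting the Prolog variables and their types) can be sketched with simple regular expressions instead of a full NLTK tree walk. This is an illustrative reconstruction, not the actual implementation:

```python
import re

def build_var_dict(prolog_query):
    """Collect Prolog variables (single capital letters) and their types."""
    var_dict = {}
    # Step 3.1: single-leaf nodes such as city(A) give A the type 'city'.
    for pred, var in re.findall(r"(\w+)\(([A-Z])\)", prolog_query):
        var_dict[var] = pred
    # Step 3.2: const(B,stateid(virginia)) gives B the type 'state'.
    for var, typ in re.findall(r"const\(([A-Z]),(\w+?)id\(", prolog_query):
        var_dict[var] = typ
    return var_dict

query = "answer(A,(city(A),loc(A,B),const(B,stateid(virginia))))."
print(build_var_dict(query))  # {'A': 'city', 'B': 'state'}
```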
4.2 SPARQL → λ-DCS Algorithm
From the SPARQL query obtained with the
previous algorithm we will now attempt to create
an equivalent λ-DCS query. We first present
the steps of the algorithm in abstract, and then
show the algorithm applied to the sample query
and the result we eventually arrive at.
1. Parse and interpret every line in the SPARQL
query, and create a variable dictionary
1.1. Identify the variables (they start with
"?")
1.2. Assign relationships between variables
and/or constants
1.3. Add the reverse relationships (starting
with "!") to the target variables
2. Traverse the variable dictionary to eliminate
the SPARQL variables
2.1. Start with the selected variable
2.2. Transcribe its relationships
2.3. 0 relationships → [], more than 1 rela-
tionship → "and" operator
2.4. Select the next SPARQL variable to tra-
verse, and repeat this step until all vari-
ables have been traversed
3. Add special options ("Count", "Limit",
"Or"...) where appropriate
On the following page the reader can find this
algorithm applied iteration by iteration to the
sample query from the previous section.
EXECUTION OF THE ALGORITHM ON A
SAMPLE QUERY, STEP BY STEP
SELECT ?city WHERE {
?xA <name> ?city .
?xA <type> "city" .
?xA <name> ?A .
?xB <type> "state" .
?xB <name> ?B .
?xB <name> "virginia" .
?xA <state> ?B .
}
Step 0: original SPARQL query
Step 1: Create a variable dictionary with relation-
ships
Step 2: Start from the selected variable (shown in
green)
Step 3: Continue with the next variable (?xA).
Several relationships translate as an "and" operator
in SEMPRE. The reverse relationship with the pre-
vious variable is always ignored.
Step 4: Next variables: ?B and ?A. ?A has no
relationships left after ignoring its reverse rela-
tionship. Thus, it translates as a general variable,
[] in SEMPRE.
Step 5: Last variable: ?xB. No variables left:
the SEMPRE query is finished.
Now we add some clarifying comments on the
steps above. Once again we wish to identify all
the variables in the query, but this time the vari-
able dictionary will also contain all the relation-
ships of said variables with constants and other
variables. Note that, unlike in Prolog, ?xA and
?A are different variables in SPARQL. In our vari-
able dictionary we also include the reverse
relationships between variables, expressed in λ-
DCS with an exclamation mark "!" in front of
the relationship. For example, when we read the
line ?xA <state> ?B . we include both
?xA - <state> - ?B and ?B - !<state>
- ?xA in the variable dictionary. This will be
essential when traversing variables. The variable
dictionary created from our sample query is
shown in Step 1.
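Step 1 can be sketched as follows, assuming one triple per line as produced by the previous algorithm (an illustrative reconstruction, not the actual implementation):

```python
def build_relation_dict(sparql_body):
    """Map each SPARQL variable to its relationships, including reverse
    ("!"-prefixed) edges back from variable objects."""
    rel = {}
    for line in sparql_body.strip().splitlines():
        subj, verb, obj = line.strip().rstrip(" .").split()
        rel.setdefault(subj, []).append((verb, obj))
        if obj.startswith("?"):  # reverse edge only between variables
            rel.setdefault(obj, []).append(("!" + verb, subj))
    return rel

body = """\
?xA <name> ?city .
?xA <type> "city" .
?xB <name> "virginia" .
?xA <state> ?B .
"""
rel = build_relation_dict(body)
print(rel["?city"])  # [('!<name>', '?xA')]
```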
In order to eliminate the SPARQL variables we
will iterate the transcription of relationships as we
traverse the variable dictionary. We start at the se-
lected variable, retrieved from the query’s first line
SELECT ?city WHERE { and transcribe its
only property into λ-DCS format. (Step 2)
We now proceed to eliminate ?xA. We ignore
the reverse relationship that joins ?xA with ?city
(as this relationship has just been transcribed)
and focus on all the other relationships - ”type”,
”name” and ”state”. (Step 3) Because we have
more than one relationship to transcribe from ?xA,
we use the λ-DCS ”and” operator to express the
intersection of the groups expressed by these rela-
tionships.
We repeat this step until we have traversed all
variables in the dictionary. In the particular case
of variable ?A, (Step 4) its only relationship is
ignored because it was already expressed in the
previous iteration. This leaves no relationships to
transcribe and thus ?A is changed to [], the λ-DCS
operator that expresses an undefined variable. Af-
ter all the iterations we arrive at the final λ-DCS
expression: (Step 5)
This is the equivalent query in λ-DCS that re-
trieves all the cities in the state of Virginia. As can
be noted, Prolog and SPARQL variables have dis-
appeared and only a mathematical group expres-
sion remains.
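The traversal of Step 2 can be sketched as a recursive walk over such a variable dictionary. This reconstruction assumes a tree-shaped (cycle-free) query and is illustrative rather than the actual implementation:

```python
def to_sempre(rel, var, parent=None):
    """Transcribe a variable's relationships, ignoring the edge back to the
    variable we came from, with "and" for several edges and "[]" for none."""
    edges = [(verb, obj) for verb, obj in rel.get(var, []) if obj != parent]
    if not edges:
        return "[]"  # Step 2.3: no relationships left -> undefined variable
    parts = []
    for verb, obj in edges:
        # Recurse into SPARQL variables; constants are transcribed as-is.
        target = to_sempre(rel, obj, var) if obj.startswith("?") else obj
        parts.append(f"({verb} {target})")
    expr = parts[0]
    for part in parts[1:]:  # Step 2.3: several relationships -> "and"
        expr = f"(and {expr} {part})"
    return expr

# A fragment of the dictionary for the sample query.
rel = {
    "?city": [("!<name>", "?xA")],
    "?xA": [("<name>", "?city"), ("<type>", "'city'"), ("<name>", "?A")],
    "?A": [("!<name>", "?xA")],
}
print(to_sempre(rel, "?city"))
# (!<name> (and (<type> 'city') (<name> [])))
```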
As a final remark, recall that this SPARQL →
λ-DCS algorithm is the more relevant of the two,
as it is intended to be universal. SPARQL gram-
mar and relationships are not arbitrary like Prolog
relationships, so one would expect this algorithm
to perform well regardless of the database and
SPARQL queries provided as input.
5 Results
Apart from the sample query that corresponds to
”What are the cities in Virginia?”, the Appendix
in Section 9 provides more examples of queries
transformed by the developed algorithms. In the
Appendix the reader can understand the different
nature of the different query languages by com-
paring equivalent queries, and appreciate Prolog
as a variable propositional language, SPARQL as
a graph language and λ-DCS as a variable-less
mathematical group language.
Note that for one of the queries, an equivalent
λ-DCS query could not be obtained with the
described algorithms. Moreover, the Prolog query
and the SPARQL query there return different re-
sults. Flaws and limitations of the algorithms are
discussed in the next section.
Overall, the algorithms were able to success-
fully transform about 90% of the 880 sam-
ple queries provided in the GeoQuery database.
By ”successfully transform” we mean that these
queries have been correctly expressed in Prolog,
SPARQL and λ-DCS, providing equivalent results
when executed. The queries that were not suc-
cessfully transformed into some language, along
with those whose transformation did not provide
equivalent results, make up the remaining 10%.
The algorithms' coverage is thus quite satisfac-
tory, especially considering that all basic queries
can be successfully transformed with these algo-
rithms. Operators beyond the basic ones, e.g.
count, descending order, max, union/or, were
also successfully interpreted by the algorithms.
The algorithms nevertheless still admit refine-
ment, improvement and extension, since some
operators are not yet included or are problematic.
These problems and limitations are outlined in the
following section.
6 Encountered Problems and
Limitations
The development of the described algorithms en-
countered some difficulties, and in some cases
we were unable to successfully transform certain
queries. Here we mention these difficulties,
explain them where relevant, and describe how
they were overcome, where that was the case.
First we address some of the problems
encountered when dealing with Prolog.
Due to Prolog's arbitrary definition of rela-
tionships, in some cases one could obviously
define a better relationship to ease
the transformation to SPARQL. For example,
instead of the relationships capital(A)
+ loc(A,B) it would be much better to
define the relationship capital(A,B),
which combines both and avoids having to
create a statement ?xStateCapitalOf
<capital> ?A . that contains an undefined
variable. In addition, some of the sample Geo-
Query queries contain redundancy that then
spreads as identical statements in SPARQL. As
seen in Query 2 of the Appendix, the Prolog
state(B),const(B,stateid(oregon))
could simply be expressed as
const(B,stateid(oregon)), without re-
dundancy. Finally, inconsistencies within the for-
mat of the Prolog GeoQuery database obviously
lead to problems when trying to test equivalent
queries. The property <lowest elevation>,
for example, is only defined for those states
that do not border the sea. States that do
border the sea are assumed to have a lowest
elevation of zero, but the absence of
such a relationship leads to the inconsistencies
between the Prolog and SPARQL queries expressed
in query Y of Table X, apart from requiring
an OPTIONAL operator in SPARQL which, as
we will address shortly, cannot be expressed in
λ-DCS.
The main problem when dealing with SPARQL
was the absence of a good SPARQL parser, which
would greatly simplify interpreting nested blocks.
The algorithm thus far can only interpret a very
simple nested UNION block, but not, for example,
a query like this:
SELECT ?A WHERE {
SELECT ?B WHERE {
...
}
...
}
As a mathematical group and logical language,
λ-DCS has a wider coverage than SPARQL, but
only when one group or one variable is being
queried. A serious limitation of λ-DCS is that the
language is unable to query two variables or two
groups simultaneously. For example, a query to
retrieve the name and surname of all employees in
a company would look like this in SPARQL:
SELECT ?name ?surname WHERE {
?person <name> ?name .
?person <surname> ?surname .
}
This SPARQL query will return a table of two
columns, one column for the names and one col-
umn for the corresponding surnames. However,
due to having two variables to be retrieved, it
is impossible to express this in λ-DCS. λ-DCS
could retrieve a list of the names of the employ-
ees and a list of the surnames of the employees,
i.e. two separate lists, but not a single list with
name-surname pairs, which would be the equiv-
alent of the two-column answer from SPARQL.
This is obviously a big setback to transforming
any SPARQL query to an equivalent λ-DCS ex-
pression, as a huge strength of SPARQL is re-
trieving associated variables and properties in ta-
bles, and thus a large amount of SPARQL queries
will have more than one variable retrieved and
will be impossible to transform to λ-DCS. Fur-
thermore, the OPTIONAL operator in SPARQL
cannot be expressed as a logical mathematical
group, which means it cannot transform to λ-
DCS either. The SPARQL OPTIONAL operator
addresses the sparsity and irregularity of proper-
ties in RDF graph databases, allowing a query to
match a relationship whether it exists or not. For
example, a SPARQL query that retrieves the name
of the lowest point of a state and its correspond-
ing height, IF it exists in the database, would be
similar to this:
SELECT ?lowpoint ?height WHERE {
?xA <type> ’highlow’ .
?xA <lowest point> ?lowpoint .
OPTIONAL (?xA <lowest height>
?height)
}
The result would be a table with two columns:
one column for the name of the lowest point, and
one column for its corresponding height. If the
<lowest height> property is not found, the
name will still be retrieved and its correspond-
ing cell in the second column will be left blank.
This cannot be expressed in λ-DCS because of the
OPTIONAL operator and, as explained above, be-
cause of the existence of more than one variable to
be retrieved. It is true that the OPTIONAL oper-
ator’s main utility in SPARQL is most of the time
tied to retrieving more than one variable, so these
two limitations can generally be seen as one. It is
however, as explained above, a considerable limi-
tation to expressing SPARQL queries in λ-DCS, as
multivariable queries and OPTIONAL operators
are quite common in SPARQL queries. The only
way to tackle this problem would be to develop an
extension to λ-DCS that would effectively allow
for multiple logical groups to be retrieved simul-
taneously as well as allowing some properties of
said groups to be optional.
7 Future Work
As already stated throughout this paper, there is
still considerable room for improvement in refin-
ing the proposed algorithms, either through exten-
sions to cover more operators or by using tools to
better interpret the input queries. For example, if
a good SPARQL parser were available, interpret-
ing nested SPARQL blocks would become a much
more feasible task.
Another important step is to test the proposed
algorithms on other data sets and observe their
performance. The Prolog → SPARQL algorithm
obviously does not generalize well due to Prolog's
arbitrary declarative nature; however, the SPARQL
→ λ-DCS algorithm is designed to be universal.
The latter should therefore definitely be tested
using sample SPARQL queries from databases
like Freebase or QALD
(http://greententacle.techfak.uni-bielefeld.de/~cunger/qald/index.php?x=home&q=5)
as input.
Finally, we eagerly await the deployment of the
aforementioned QA project that would make full
use of the proposed algorithms for its purposes.
The success of said project would broaden the per-
spective of the utility behind transforming queries
between different query languages.
8 Conclusion
In this work we carried out transformations be-
tween the Prolog, SPARQL and λ-DCS query lan-
guages. We discovered that this is a feasible task
when dealing with queries that originate from nat-
ural language utterances or requests. We are sat-
isfied with how the proposed algorithms are able
to transform the large majority of basic queries
successfully, and we consider it worthwhile to
continue this work and refine the algorithms.
Furthermore, working on these transformations
has given us a deeper understanding of the sim-
ilarities and differences of the target query lan-
guages, and of how some adapt better to different
tasks. As one might intuitively expect from the be-
ginning, there are also concepts and queries that
cannot be expressed in all languages, and thus a
transformation with total coverage is impossible.
However, this should not be a setback to perform-
ing said transformations where they are viable. In-
deed, we hope to see the QA project reach the full
potential of the transformations presented in this
paper.
9 APPENDIX: Queries in different query languages
Utterance: What are the cities in Virginia?
Prolog:
answer(A,(city(A),loc(A,B), const(B,stateid(virginia)))).
SPARQL:
SELECT ?city WHERE {
?xA <name> ?city .
?xA <type> "city" .
?xA <name> ?A .
?xB <type> "state" .
?xB <name> ?B .
?xB <name> "virginia" .
?xA <state> ?B .
}
λ-DCS:
(!<name> (and (and (<type> ’city’) (<name> [])) (<state> (!<name>
(and (<type> ’state’) (<name> ’virginia’))))))
Query 1 (Used as example)
Utterance: What is the name of the highest point in Oregon?
Prolog:
answer(A,highest(A,(place(A),loc(A,B),state(B),const(B,stateid(oregon))))).
SPARQL:
SELECT ?A WHERE {
?xB <type> "state" .
?xB <name> ?B .
?xB <type> "state" .
?xB <name> ?B .
?xB <name> "oregon" .
?xA <highest point> ?A .
?xA <highest elevation> ?height0 .
?xA <state> ?B .
}
ORDER BY DESC(?height0) LIMIT 1
λ-DCS:
(!<highest point> (argmax 1 1 (<state> (!<name> (and (and
(<type> ’state’) (<type> ’state’)) (<name> (!<name> (and (and
(<type> ’state’) (<type> ’state’)) (<name> ’oregon’)))))))
<highest elevation>))
Query 2
Utterance: What is the capital of Texas?
Prolog:
answer(A,(capital(A),loc(A,B),const(B,stateid(texas)))).
SPARQL:
SELECT ?A WHERE {
?xB <type> "state" .
?xB <name> ?B .
?xB <name> "texas" .
?xStateCapitalOf <capital> ?A .
?xA <name> ?A .
?xA <state> ?B .
}
λ-DCS:
(and (!<capital> []) (!<name> (<state> (!<name> (and (<type> ’state’)
(<name> ’texas’))))))
Query 3
Utterance: How many states have a lower elevation than Arizona?
Prolog:
answer(A,count(B,(state(B),low point(B,C),lower(C,D),
low point(E,D),const(E,stateid(arizona))),A)).
SPARQL: (does not provide results equivalent to the Prolog query!)
SELECT (COUNT (?B) AS ?numberOFstate) WHERE {
?xB <type> "state" .
?xB <name> ?B .
?xC <state> ?B .
?xC <lowest point> ?C .
?xE <type> "state" .
?xE <name> ?E .
?xE <name> "arizona" .
OPTIONAL {?xC <lowest elevation> ?height0 . }
OPTIONAL {?xD <lowest elevation> ?height1 . }
FILTER ( IF ( BOUND(?height1), ?height0, 0 ) <
IF ( BOUND(?height1), ?height1, 0 ) )
?xD <state> ?E .
?xD <lowest point> ?D .
}
λ-DCS: Not possible! (See Section 6: Encountered problems and limitations)
Query 4
Utterance: What is the name of the lakes in Michigan?
Prolog:
answer(A,(lake(A),loc(A,B),const(B,stateid(michigan)))).
SPARQL:
SELECT ?lake WHERE {
?xA <name> ?lake .
?xA <type> "lake" .
?xA <name> ?A .
?xB <type> "state" .
?xB <name> ?B .
?xB <name> "michigan" .
?xA <isin> ?B .
}
λ-DCS:
(!<name> (and (and (<type> ’lake’) (<name> [])) (<isin> (!<name> (and
(<type> ’state’) (<name> ’michigan’))))))
Query 5
Utterance: How many rivers flow through Colorado?
Prolog:
answer(A,count(B,(river(B),loc(B,C),const(C,stateid(colorado))),A)).
SPARQL:
SELECT (COUNT (?B) AS ?numberOFriver) WHERE {
?xB <type> "river" .
?xB <name> ?B .
?xC <type> "state" .
?xC <name> ?C .
?xC <name> "colorado" .
?xB <flowsthru> ?C .
}
λ-DCS:
(count (!<name> (and (<type> ’river’) (<flowsthru> (!<name> (and
(<type> ’state’) (<name> ’colorado’)))))))
Query 6
Utterance: What are the names of the highest points of the states bordering Mississippi?
Prolog:
answer(A,(high point(B,A),state(B),next to(B,C),const(C,stateid(mississippi)))).
SPARQL:
SELECT ?A WHERE {
?xB <type> "state" .
?xB <name> ?B .
?xC <type> "state" .
?xC <name> ?C .
?xC <name> "mississippi" .
?x0 <state> ?B .
?x0 <highest point> ?A .
?x1 <state> ?C .
?x1 <borderingstate> ?B .
}
λ-DCS:
(!<highest point> (<state> (and (!<name> (<type> ’state’))
(!<borderingstate> (<state> (!<name> (and (<type> ’state’) (<name>
’mississippi’))))))))
Query 7
Utterance: Give me all the cities in the USA
Prolog:
answer(A,(city(A),loc(A,B),const(B,countryid(usa)))).
SPARQL:
SELECT ?A WHERE {
?xA <type> "city" .
?xA <name> ?A .
}
λ-DCS:
(!<name> (<type> ’city’))
Query 8
References
Uwe Aßmann, Andreas Bartho, and Christian Wende. 2010. Reasoning Web. Semantic Technologies for Soft-
ware Engineering: 6th International Summer School 2010, Dresden, Germany, August 30-September 3, 2010,
Tutorial Lectures, volume 6325. Springer Science & Business Media.
Henk P. Barendregt and Erik Barendsen. 1984. Introduction to lambda calculus. Nieuw archief voor wiskunde,
4(2):337–372.
Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on freebase from question-
answer pairs. In EMNLP, pages 1533–1544.
Percy Liang. 2013. Lambda dependency-based compositional semantics. arXiv preprint arXiv:1309.4408.