From TREC to Watson: is open domain question answering a solved problem? - Constantin Orasan
The document summarizes a presentation on question answering systems. It begins by providing context on information overload and defining question answering. It then discusses the evolution of QA systems from early databases to today's open-domain systems. The presentation focuses on IBM's Watson system, providing an overview of its unprecedented ability to answer open-domain questions as well as the massive resources required for its development. It concludes by arguing that open-domain QA remains unsolved and that closed-domain, interactive QA may be more practical for real-world applications.
Question Answering - Application and Challenges - Jens Lehmann
This document provides an overview of question answering applications and challenges. It defines question answering as receiving natural language questions and providing concise answers. Recent developments in question answering systems are discussed, including IBM Watson. Challenges for question answering over semantic data are explored, such as lexical gaps, ambiguity, granularity, and alternative resources. Large-scale linguistic resources and machine learning approaches for question answering are also covered. Applications of question answering technologies are examined.
The document discusses question answering over knowledge graphs. It introduces question answering and describes how knowledge graphs can be used to answer natural language questions. It summarizes three proposed papers on learning knowledge graphs for question answering through dialogs, automated template generation for question answering over knowledge graphs, and generating knowledge questions from knowledge graphs. The document also covers motivation for question answering, defining characteristics, different methods like template-based and dialog-based systems, evaluating knowledge quality, and examples of question answering systems.
Open domain Question Answering System - Research project in NLP - GVS Chaitanya
Using a computer to answer questions has been a human dream since the beginning of the digital era. A first step towards achieving such an ambitious goal is to deal with natural language, so that the computer can understand what its user asks. The discipline that studies the connection between natural language and the representation of its meaning via computational models is computational linguistics. According to this discipline, question answering can be defined as the task that, given a question formulated in natural language, aims at finding one or more concise answers. Improvements in technology and the explosive demand for better information access have reignited interest in QA systems. The wealth of information on the web makes it an attractive resource for seeking quick answers to factual questions such as “Who was the first American in space?” or “What is the second tallest mountain in the world?”, yet today’s most advanced web search systems (Bing, Google, Yahoo) make it surprisingly tedious to locate the answers. QA systems aim to develop techniques that go beyond the retrieval of relevant documents in order to return exact answers to natural language factoid questions.
The World Wide Web is moving from a Web of hyperlinked documents to a Web of linked data. Thanks to the Semantic Web technological stack and to the more recent Linked Open Data (LOD) initiative, a vast amount of RDF data has been published in freely accessible datasets connected with each other to form the so-called LOD cloud. As of today, a wealth of RDF data is available in the Web of Data, but only a few applications really exploit its potential. The availability of such data is certainly an opportunity to feed personalized information access tools such as recommender systems. We will show how to plug Linked Open Data into a recommendation engine in order to build a new generation of LOD-enabled applications.
(Lecture given @ the 11th Reasoning Web Summer School - Berlin - August 1, 2015)
Basic introduction to recommender systems + Implementing a content-based recommender system by leveraging knowledge encoded into Linked Open Data datasets
This document discusses recommender systems and linked open data. It begins with an introduction to linked open data, describing its key components like URIs, RDF, and popular vocabularies. It then provides an overview of recommender systems, explaining how they help with information overload by matching users to items. Different recommendation techniques are described like collaborative filtering, content-based, knowledge-based, and hybrid approaches. Evaluation methods for recommender systems like dataset splitting are also briefly covered. The document aims to lay the foundation for discussing how recommender systems can utilize linked open data.
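To make the content-based, LOD-driven approach concrete, here is a minimal sketch (not taken from the slides) of a recommender that describes items by feature sets, such as the dcterms:subject categories one might fetch from DBpedia, and ranks unseen items by cosine similarity to a profile built from the user's liked items. All item data below is invented for illustration.

```python
# Content-based recommendation sketch: items as feature sets, users as
# feature profiles, ranking by cosine similarity. The features stand in
# for LOD-derived descriptors (e.g., dcterms:subject categories).
from collections import Counter
import math

item_features = {
    "AmericanBeauty": {"1999_films", "drama", "DreamWorks_films"},
    "FightClub":      {"1999_films", "drama", "cult_films"},
    "ToyStory":       {"1995_films", "animation", "Pixar_films"},
}
liked = ["AmericanBeauty"]  # the user's positive feedback

# Build the user profile as a bag of features from liked items.
profile = Counter(f for item in liked for f in item_features[item])

def cosine(profile, features):
    dot = sum(profile[f] for f in features)
    norm_p = math.sqrt(sum(v * v for v in profile.values()))
    norm_f = math.sqrt(len(features))
    return dot / (norm_p * norm_f) if norm_p and norm_f else 0.0

candidates = [i for i in item_features if i not in liked]
for item in sorted(candidates,
                   key=lambda i: cosine(profile, item_features[i]),
                   reverse=True):
    print(item, round(cosine(profile, item_features[item]), 3))
```

As expected, FightClub ranks above ToyStory because it shares two features with the liked item.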
Profile-based Dataset Recommendation for RDF Data Linking - Mohamed BEN ELLEFI
This document summarizes Mohamed Ben Ellefi's PhD thesis defense on profile-based dataset recommendation for RDF data linking. The thesis proposes two approaches: a topic profile-based approach and an intensional profile-based approach. The topic profile-based approach models datasets as topics and recommends target datasets based on similarity between source and target topic profiles, achieving an average recall of 81% and reducing the search space by 86%. The approach shows better performance than baselines but needs improvement on precision.
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner - Francesco Osborne
The document summarizes research on automatically classifying Springer Nature proceedings using the Smart Topic Miner (STM). STM extracts topics from publications, maps them to a computer science ontology, selects relevant topics using a greedy algorithm, and infers tags. It was tested with eight Springer Nature editors, who found that STM accurately classified 75-90% of proceedings and improved their work. However, STM is currently limited to computer science, and occasional noisy results were found in books with few chapters. Future work aims to extend STM to characterize topic evolution over time and to directly support author tagging.
This document describes an approach for bridging the gap between natural language queries and linked data concepts using BabelNet. The approach uses BabelNet for word sense disambiguation, named entity recognition and disambiguation. It parses queries, matches terms to ontology concepts and properties, generates candidate triples, and integrates the triples to produce SPARQL queries. The approach was evaluated on test data from QALD-2, achieving a promising 76% of questions answered correctly.
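The final integration step lends itself to a small illustration. The sketch below is illustrative, not the paper's code: it assembles already-matched triple patterns into a SPARQL SELECT query; the DBpedia-style URIs and the example question are assumptions.

```python
# Assembling candidate triples (produced by term/concept matching) into a
# SPARQL query string, the last stage of the pipeline described above.
def build_sparql(triples):
    """Turn (subject, predicate, object) patterns into a SELECT query."""
    body = " .\n  ".join(f"{s} {p} {o}" for s, p, o in triples)
    return f"SELECT DISTINCT ?answer WHERE {{\n  {body}\n}}"

# Hypothetical candidate triples for "Who wrote The Neverending Story?"
triples = [
    ("?answer", "rdf:type", "dbo:Writer"),
    ("dbr:The_Neverending_Story", "dbo:author", "?answer"),
]
print(build_sparql(triples))
```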
This document provides an overview of spoken content retrieval. It discusses how spoken content retrieval works, involving speech recognition to convert spoken queries/documents into text that can then be used for information retrieval. However, recognition errors pose challenges. The document thus explores techniques beyond using only recognition 1-best outputs, such as using lattices, confusion networks, and subword units to handle out-of-vocabulary words and improve retrieval accuracy. It also presents examples of integrating different recognition clues, training retrieval models, and directly matching spoken queries to documents without recognition.
This document discusses practical aspects of natural language processing (NLP) work. It contrasts research work, which involves setting goals, devising algorithms, training models, and testing accuracy, with development work, which focuses on implementing algorithms as scalable APIs. The document emphasizes that obtaining data is crucial for NLP and describes sources for structured, semi-structured, and unstructured data. It recommends Lisp as a language that supports the interactivity, flexibility, and tree processing needed for NLP research and development work.
Apply Chinese radicals into neural machine translation: deeper than character... - Lifeng (Aaron) Han
The document proposes incorporating Chinese radicals into neural machine translation models. It discusses related work incorporating word and character level information into neural MT. The proposed model combines radical-level MT with an attention-based neural model, representing input text with word, character, and radical combinations. Experiments show the character+radical and word+radical models outperform baselines on standard MT evaluation metrics using a Chinese-English dataset. Future work includes improving model optimization and testing on additional data.
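As a rough illustration of the combined input representations, the sketch below expands a sentence into word, character, and radical streams that an attention-based encoder could consume. The radical lookup table is a tiny hypothetical stand-in for a real character decomposition resource.

```python
# Building word / character / radical views of the input, in the spirit of
# the word+radical and character+radical models described above.
RADICALS = {"好": "女", "妈": "女", "河": "氵", "湖": "氵"}  # toy table

def views(words):
    chars = [c for w in words for c in w]
    radicals = [RADICALS.get(c, c) for c in chars]  # fall back to the char
    return {
        "word": words,
        "char": chars,
        "char+radical": chars + radicals,
        "word+radical": words + radicals,
    }

print(views(["妈妈", "好"]))
```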
This document summarizes a presentation on the basics of Python programming. It introduces fundamental Python concepts like datatypes, functions, methods, and indentation-based code structuring. It also announces an exercise for the attendees to practice these basics and previews upcoming meetings that will involve working with structured datasets in Python.
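A few lines in the same spirit as the session, showing core datatypes, a function, a method call, and indentation-based structure:

```python
# Core Python basics: a list, a dict, a function, and indentation-delimited
# blocks instead of braces.
numbers = [3, 1, 4, 1, 5]            # list (mutable sequence)
person = {"name": "Ada", "age": 36}  # dict (key-value mapping)

def describe(items):
    """Indentation, not braces, delimits the function body."""
    for n in sorted(items):          # sorted() returns a new list
        if n % 2 == 0:
            print(n, "is even")
        else:
            print(n, "is odd")

describe(numbers)
print(person["name"].upper())        # str.upper() is a method on strings
```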
Practical Machine Learning - Part 1 contains:
- Basic notions of ML (what tasks there are, what a model is, how to measure performance)
- A couple of examples of problems and solutions (taken from previous work)
- A brief presentation of open-source software used for ML (R, scikit-learn, Weka); a small scikit-learn sketch follows this list
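A minimal scikit-learn example (one of the three toolkits listed) tying the bullets together: a classification task, a model, and a performance measure on held-out data. The dataset and model choice are illustrative.

```python
# Task: classify iris flowers. Model: logistic regression.
# Performance: accuracy on a held-out test split.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```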
Tools and Techniques for Analyzing Texts: Tweets to Intellectual Property - Dan Sullivan, Ph.D.
The document discusses various text mining techniques including sentiment analysis, topic modeling, classification, named entity recognition, and event extraction. It provides examples of applications and considerations for each technique. Performance factors like scalability, language support, and integration rules are also covered. Overall the document serves as an introduction to common text analytics methods.
This document discusses a lecture on data harvesting and storage. It covers APIs, RSS feeds, scraping and crawling as methods for collecting data from various sources. It also discusses storing data in formats like CSV, JSON, and XML. The document provides code examples for working with JSON data and discusses tools for long-term data collection like DMI-TCAT.
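In the spirit of the lecture's JSON examples, a minimal round trip between a Python dict, a JSON string, and a file:

```python
# Working with JSON data: serialize, parse, and write to disk.
import json

record = {"source": "api", "items": [{"id": 1, "text": "hello"}]}

text = json.dumps(record, indent=2)   # serialize to a JSON string
data = json.loads(text)               # parse it back into Python objects
print(data["items"][0]["text"])

with open("harvest.json", "w", encoding="utf-8") as f:
    json.dump(record, f)              # write straight to a file
```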
This document summarizes research analyzing metadata from over 630,000 learning objects using the Learning Object Metadata (LOM) standard. The analysis found that LOM instances take up around 5KB of storage space on average. Only 20 of LOM's 50 elements are used frequently, capturing similar information to the Dublin Core standard. Educational elements are underused and dependent on individual communities. Validation found loose implementation of LOM's XML structure results in good interoperability despite unclear value spaces. Metadata quality is varied, showing a need for quality assurance processes. The conclusion advocates for more studies of this kind to improve metadata standards and learning technologies.
The document discusses improvements made to the VIVO search engine between versions 1.2.1 and 1.3. Version 1.3 transitioned from Lucene to SOLR, allowing it to index additional data from semantic relationships and interconnectivity. This enriched the search index and provided more relevant results with better ranking compared to version 1.2.1. Indexing was also improved through multithreading, reducing indexing time. Experiments were discussed to further enhance search quality through techniques like query expansion, spelling correction, and using ontologies for fact-based questioning.
Latent semantic analysis (LSA) is a technique used in natural language processing to analyze relationships between documents and terms by producing concepts related to them. LSA assumes words with similar meanings will occur in similar texts, and uses a documents-terms matrix and singular value decomposition to discover hidden concepts and represent words and documents as vectors in a semantic vector space. Apache OpenNLP is a machine learning toolkit that can be used for various natural language processing tasks like part-of-speech tagging and parsing, and LSA can be seen as part of natural language processing.
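A small sketch of the LSA pipeline just described, using scikit-learn's TruncatedSVD (an assumed toolkit choice) on a toy documents-terms matrix:

```python
# LSA: build a documents-terms matrix, then use truncated SVD to project
# documents into a low-dimensional "concept" space.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the cat sat on the mat",
    "a cat and a kitten played",
    "stock markets fell sharply today",
    "investors sold stock amid market fears",
]

X = CountVectorizer().fit_transform(docs)   # documents-terms matrix
lsa = TruncatedSVD(n_components=2)          # keep 2 latent concepts
doc_vectors = lsa.fit_transform(X)          # documents as concept vectors

print(doc_vectors)  # similar documents end up close in this space
```

The two cat documents and the two stock documents land near each other along the latent dimensions, even though they share few exact words.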
The ontology engineering research community has focused for many years on supporting the creation, development and evolution of ontologies. Ontology forecasting, which aims at predicting semantic changes in an ontology, represents instead a new challenge. In this paper, we want to give a contribution to this novel endeavour by focusing on the task of forecasting semantic concepts in the research domain. Indeed, ontologies representing scientific disciplines contain only research topics that are already popular enough to be selected by human experts or automatic algorithms. They are thus unfit to support tasks which require the ability of describing and exploring the forefront of research, such as trend detection and horizon scanning. We address this issue by introducing the Semantic Innovation Forecast (SIF) model, which predicts new concepts of an ontology at time t+1, using only data available at time t. Our approach relies on lexical innovation and adoption information extracted from historical data. We evaluated the SIF model on a very large dataset consisting of over one million scientific papers belonging to the Computer Science domain: the outcomes show that the proposed approach offers a competitive boost in mean average precision-at-ten compared to the baselines when forecasting over 5 years.
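For readers unfamiliar with the evaluation metric, the sketch below shows one common way to compute mean average precision-at-ten; the ranked lists and gold sets are invented for illustration.

```python
# Average precision at cutoff 10 for one ranked list, then the mean over
# all queries (here, forecasted concept lists vs. gold concepts).
def average_precision_at_k(ranked, relevant, k=10):
    hits, score = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i          # precision at this rank
    return score / min(len(relevant), k) if relevant else 0.0

queries = [
    (["c1", "c2", "c3"], {"c1", "c3"}),   # predicted concepts vs. gold ones
    (["c4", "c5"], {"c5"}),
]
map10 = sum(average_precision_at_k(r, g) for r, g in queries) / len(queries)
print(round(map10, 3))
```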
What one needs to know to work in the Natural Language Processing field, and the aspects of developing an NLP project, using the example of a system that identifies text language
Supporting Springer Nature Editors by means of Semantic Technologies - Francesco Osborne
The Open University and Springer Nature have been collaborating since 2015 in the development of an array of semantically-enhanced solutions supporting editors in i) classifying proceedings and other editorial products with respect to the relevant research areas and ii) taking informed decisions about their marketing strategy. These solutions include i) the Smart Topic API, which automatically maps keywords associated with published papers to semantically characterized topics, which are drawn from a very large and automatically-generated ontology of Computer Science topics; ii) the Smart Topic Miner, which helps editors to associate scholarly metadata to books; and iii) the Smart Book Recommender, which assists editors in deciding which editorial products should be marketed in a specific venue.
WISS QA: Do-it-yourself Question Answering over Linked Data - Andre Freitas
This document describes a challenge to build a question answering system over linked data from DBpedia and Wikipedia. Participants will work in groups to develop components of the QA system, such as question analysis, entity search, query generation, graph extraction, evaluation, and a user interface. The goal is to have a working QA system by the end of the challenge that can answer natural language questions over linked data.
Application of Ontology in Semantic Information Retrieval by Prof Shahrul Azm... - Khirulnizam Abd Rahman
Application of Ontology in Semantic Information Retrieval
by Prof Shahrul Azman from FSTM, UKM
Presentation for MyREN Seminar 2014
Berjaya Hotel, Kuala Lumpur
27 November 2014
Semantic search uses language processing to analyze the meaning of content and search queries to return more relevant results. It involves classifying content using taxonomies, identifying named entities, extracting relationships between entities, and matching these based on meaning. Implementing semantic search requires preparing content through classification, metadata, and information architecture, as well as technologies for semantic tagging, entity extraction, triple stores, and integrating these capabilities with existing search and content management systems.
Semantic Search tutorial at SemTech 2012 - Peter Mika
This document provides an introduction to a semantic search tutorial given by Peter Mika and Tran Duc Thanh. The agenda covers semantic web data, including the RDF data model and publishing RDF data. It also covers query processing, ranking, result presentation, evaluation, and a question period. The document discusses why semantic search is needed to address poorly solved queries and enable novel search tasks using structured data and background knowledge.
An updated "what is happening on the Semantic Web" presentation for 2010 - includes business use, government use, and some speculation on the current areas of excitement and development. A very accessible talk, not aimed solely at a technical audience.
Ontology Web services for Semantic Applications - Trish Whetzel
The document describes ontology web services created by the National Center for Biomedical Ontology to facilitate the application of ontologies in biomedical science. The services provide access to ontologies and related functions like searching, term details, hierarchies and mappings. Additional services allow the creation of ontology-based annotations using tools like the annotator and ontology recommender. All services are accessible via RESTful web APIs.
Practical Semantic Web and Why You Should Care - DrupalCon DC 2009 - Boris Mann
Presented at Drupalcon DC 2009 - http://dc2009.drupalcon.org/session/practical-semantic-web-and-why-you-should-care
An overview of Semantic Web concepts and RDF. Exploration of RDFa. How open data fits. Examples of modules and functionality in Drupal today, and a plan for Drupal 7.
This document provides an overview of ontologies, semantic web technologies, and their applications. It discusses syntactic web limitations and the need to add semantics. Key concepts covered include ontology, RDF, RDFS, OWL, Protege, and how these technologies enable a global linked database by semantically connecting data on the web.
The Semantic Web (and what it can deliver for your business) - Knud Möller
3-hour talk I gave on behalf of Social Bits and the Irish Internet Association (IIA). Contains an introduction to the general idea of the Semantic Web and Linked Data, its relevance and opportunities for businesses, and a look under the hood - how does it all work?
We present Fresnel Forms, a plugin we developed for Protégé, an editor for Semantic Web ontologies. The Fresnel Forms plugin processes the currently active ontology in a Protégé session to export a semantic wiki for that ontology. This export uses Semantic MediaWiki’s XML-based export format for import into an existing wiki. Fresnel Forms also provides a GUI editor to let the user fine-tune the generated interface before exporting it to a wiki.
Fresnel Forms exports use features from Semantic MediaWiki and Semantic Forms to provide an annotate-and-browse data system interface. Each wiki Fresnel Forms generates provides forms for entering data for classes and fields that conform to the original ontology. Templates provide displays of pages created with these forms. Finally, the wiki's ExportRDF feature creates Semantic Web triples for the entered data that use URIs from the original ontology. Fresnel Forms thus provides an efficient way to create a wiki for populating a given Semantic Web ontology.
Fresnel Forms can be downloaded and installed on Protégé from http://is.cs.ou.nl/OWF/index.php5/Fresnel_Forms
The document introduces the Semantic Web and how it allows for the integration and merging of disparate datasets. It provides an example of merging two bookstore datasets that have similar information but are structured differently. By exporting the datasets as RDF triples, mapping identical resources, and adding a few statements to link equivalent terms, the datasets can be merged. This allows for new queries to be answered by combining information from both original datasets. The Semantic Web provides technologies to automate this kind of data integration and enable more powerful queries across multiple sources of data.
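A hedged sketch of that merge, using Python's rdflib (an assumed tooling choice; the document names none): two stores describe the same book with different vocabularies, one added statement links the equivalent terms, and a single query then spans data that originated in both sources.

```python
# Merging two bookstore datasets as RDF triples. Namespaces and data are
# invented for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL

A = Namespace("http://storeA.example/")
B = Namespace("http://storeB.example/")
book = URIRef("urn:isbn:0123456789")   # both stores use the same ISBN URI

g = Graph()
g.add((book, A["title"], Literal("The Glass Bead Game")))  # from store A
g.add((book, B["price"], Literal(12.50)))                  # from store B
g.add((A["title"], OWL.equivalentProperty, B["name"]))     # link equivalent
                                                           # terms (a reasoner
                                                           # could exploit this)

# One query now combines information from the two original datasets.
q = """
SELECT ?title ?price WHERE {
  ?book <http://storeA.example/title> ?title ;
        <http://storeB.example/price> ?price .
}
"""
for title, price in g.query(q):
    print(title, price)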
This document discusses adding semantic structure to real-time social data from Twitter through Twitter Annotations. It describes how Annotations can be mapped to existing Semantic Web vocabularies and linked to datasets to enable real-time semantic search over social and linked data. A system called TwitLogic is presented that captures Twitter data, converts it to RDF, and publishes it as linked streams to allow for continuous querying and integration with the live Semantic Web.
The document discusses the evolution of the World Wide Web towards a Semantic Web, where computers will be able to understand the meaning, context and relationships between data on web pages. It provides an example of how Semantic Web coding could link together different web pages about a professor by relating her faculty page, research, blog and staff listing. This creates a richer experience for users by making more information accessible in an interconnected way. The document then outlines some methods for implementing Semantic Web coding, such as using RDF triples or microformats, and provides examples of microformats being used on web pages.
A very basic introductory talk about the Semantic Web, given to undergraduate and postgraduate students of Universidad del Valle (Cali, Colombia) in September 2010
The GoodRelations Ontology: Making Semantic Web-based E-Commerce a Reality - Martin Hepp
A promising application domain for Semantic Web technology is the annotation of products and services offerings on the Web so that consumers and enterprises can search for suitable suppliers using products and services ontologies. While there has been substantial progress in developing ontologies for types of products and services, namely eClassOWL, this alone does not provide the representational means required for e-commerce on the Semantic Web. Particularly missing is an ontology that allows describing the relationships between (1) Web resources, (2) offerings made by means of those Web resources, (3) legal entities, (4) prices, (5) terms and conditions, and (6) the aforementioned ontologies for products and services.
In the talk, I will explain the need for and potential of the GoodRelations ontology, introduce its key conceptual elements, highlight several lessons learned, and summarize design decisions with respect to modeling approaches and the appropriate language fragment, which may be relevant for other ontology projects, too.
Finding knowledge, data and answers on the Semantic Web - ebiquity
Web search engines like Google have made us all smarter by providing ready access to the world's knowledge whenever we need to look up a fact, learn about a topic or evaluate opinions. The W3C's Semantic Web effort aims to make such knowledge more accessible to computer programs by publishing it in machine understandable form.
As the volume of Semantic Web data grows, software agents will need their own search engines to help them find the relevant and trustworthy knowledge they need to perform their tasks. We will discuss the general issues underlying the indexing and retrieval of RDF-based information and describe Swoogle, a crawler-based search engine whose index contains information on over a million RDF documents.
We will illustrate its use in several Semantic Web related research projects at UMBC, including a distributed platform for constructing end-to-end use cases that demonstrate the Semantic Web's utility for integrating scientific data. We describe ELVIS (the Ecosystem Location Visualization and Information System), a suite of tools for constructing food webs for a given location, and Triple Shop, a SPARQL query interface which searches the Semantic Web for data relevant to a given query. ELVIS functionality is exposed as a collection of web services, and all input and output data are expressed in OWL, thereby enabling its integration with Triple Shop and other Semantic Web resources.
The document discusses reasoning on the Semantic Web, including issues, vulnerabilities, and solutions. It covers work done so far on ontologies and rules, and points out vulnerabilities such as lack of referential integrity and inconsistent knowledge coming from multiple resources. It discusses the need for reasoners to be incomplete, and possibly unsound, in order to handle the scale of the web. Related work in distributed reasoning is presented, and the document concludes by looking forward to the need for web-scale reasoning that can deal with incomplete and inconsistent resources while being context-aware and allowing different representations of open- and closed-world assumptions.
This document summarizes research into discovering lost web pages using techniques from digital preservation and information retrieval. Key points include:
- Web pages are frequently lost due to broken links or content being moved/removed, but copies may still exist in search engine caches or archives.
- Techniques like lexical signatures (representing a page's content in a few keywords) and analyzing page titles, tags and link neighborhoods can help characterize lost pages and find similar replacement content; a toy lexical-signature sketch follows this list.
- Experiments showed that lexical signatures degrade over time but page titles are more stable, and combining techniques improves performance in locating replacement content. The goal is to develop a browser extension to help users find lost web pages.
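A toy version of a lexical signature (assumptions: whitespace tokenization and a three-document background corpus): score each term by TF-IDF and keep the top five terms as a compact query for re-finding the page.

```python
# Lexical signature sketch: TF-IDF scoring against a small background
# corpus, keeping the top-k terms (the experiments above suggest 5-7).
import math
from collections import Counter

corpus = [
    "digital preservation of web pages and archives",
    "search engine caches keep copies of lost pages",
    "old dominion university computer science research",
]
page = "lost web pages can be found through search engine caches"

def tfidf_signature(doc, corpus, k=5):
    tf = Counter(doc.split())
    def idf(term):
        df = sum(term in d.split() for d in corpus)
        return math.log((1 + len(corpus)) / (1 + df)) + 1
    scored = {w: tf[w] * idf(w) for w in tf}
    return sorted(scored, key=scored.get, reverse=True)[:k]

print(tfidf_signature(page, corpus))  # a 5-term query for re-finding the page
```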
LANL Research Library
March 12, 2009
Martin Klein & Michael L. Nelson
Department of Computer Science
Old Dominion University
Norfolk VA
www.cs.odu.edu/~{mklein,mln}
Synchronicity: Just-In-Time Discovery of Lost Web Pages - Michael Nelson
The document discusses techniques for discovering lost web pages using lexical signatures. It finds that lexical signatures generated from page titles and content evolve over time, with terms dropping out. Signatures perform best with 5-7 terms. Combining titles with signatures provides better discovery results than either alone. Future work includes predicting "good" titles and augmenting signatures with tags and link neighborhoods.
Integrating a Domain Ontology Development Environment and an Ontology Search ... - Takeshi Morita
In order to reduce the cost of building domain ontologies manually, this paper proposes a method and a tool, named DODDLE-OWL, for constructing domain ontologies by reusing texts and existing ontologies retrieved with an ontology search engine, Swoogle. In the experimental evaluation, we applied the method to a particular field of law and evaluated the acquired ontologies.
Tracing Networks: Ontology-based Software in a Nutshell - TracingNetworks
The document discusses using ontologies for semantic tagging and querying of data. It describes how ontologies can model relationships between concepts to address limitations of traditional keyword searches. An example application uses an ontology to semantically tag images, allowing queries over relationships like "animal and person" instead of just keywords. The system architecture integrates an ontology, triplestore and visualization tools like Google Earth and charts.
247th ACS Meeting: Experiment Markup Language (ExptML) - Stuart Chalk
To integrate science into the Semantic Web, it is important to capture the context of research as it is done. ExptML is designed to store information and workflows from the scientific process.
Current conceptual models and methodologies for Web applications concentrate on content, navigation, and service modeling. Although some of them are meant to address Semantic Web applications too, they do not fully exploit the potential deriving from interaction with ontological data sources and from semantic annotations. This paper proposes an extension of Web application conceptual models toward the Semantic Web. We devise an extension of the WebML modeling framework that fulfills most of the design requirements emerging in the new area of the Semantic Web. We generalize the development process to cover the Semantic Web and devise a set of new primitives for ontology importing and querying. Finally, an implementation prototype of the proposed concepts is presented within the commercial tool WebRatio.
The document discusses the BioSamples Database (BioSD) and its conversion to linked data. BioSD aims to provide information about biological samples used in experiments in a centralized reference system. It was converted to linked data to allow for integration with other datasets, exploitation of ontologies, and improved searching. The conversion included changes to the data model and several improvements to the software. SPARQL queries are demonstrated to retrieve sample data and attributes. Potential new areas discussed include integrating geo-located samples with Google Maps and search by feature similarity.
The document discusses faceted search over ontology-enhanced RDF data. It formalizes faceted interfaces for querying RDF graphs that capture ontological information. It studies the expressivity and complexity of queries represented by faceted interfaces, and algorithms for generating and updating interfaces based on the underlying RDF and ontology information. The goal is to provide rigorous theoretical foundations for faceted search in the context of RDF and OWL 2 ontologies.
Abstract:
An increasing number of applications rely on RDF, OWL 2, and SPARQL for storing and querying data. SPARQL, however, is not targeted towards end-users, and suitable query interfaces are needed. Faceted search is a prominent approach for end-user data access, and several RDF-based faceted search systems have been developed. There is, however, a lack of rigorous theoretical underpinning for faceted search in the context of RDF and OWL 2. In this paper, we provide such solid foundations. We formalise faceted interfaces for this context, identify a fragment of first-order logic capturing the underlying queries, and study the complexity of answering such queries for RDF and OWL 2 profiles. We then study interface generation and update, and devise efficiently implementable algorithms. Finally, we have implemented and tested our faceted search algorithms for scalability, with encouraging results.
The document discusses the Semantic Web and Linked Data. It provides an overview of RDF syntaxes, storage and querying technologies for the Semantic Web. It also discusses issues around scalability and reasoning over large amounts of semantic data. Examples are provided to illustrate SPARQL querying of RDF data, including graph patterns, conjunctions, optional patterns and value testing.
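A runnable illustration of those SPARQL constructs, using rdflib over a tiny in-memory graph (the graph, prefixes, and data are invented): a conjunction of graph patterns, an OPTIONAL pattern, and a FILTER for value testing.

```python
# SPARQL over a small RDF graph: conjunction, OPTIONAL, and FILTER.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix ex: <http://example.org/> .
ex:alice ex:name "Alice" ; ex:age 42 .
ex:bob   ex:name "Bob"   ; ex:age 35 ; ex:email "bob@example.org" .
ex:carol ex:name "Carol" ; ex:age 17 .
""", format="turtle")

q = """
PREFIX ex: <http://example.org/>
SELECT ?name ?email WHERE {
  ?p ex:name ?name ;              # conjunction of triple patterns
     ex:age  ?age .
  OPTIONAL { ?p ex:email ?email } # keep people without an email too
  FILTER (?age > 18)              # value testing excludes Carol
}
"""
for row in g.query(q):
    print(row.name, row.email)    # email is None where it was absent
```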
Semantic Web: From Representations to Applications - Guus Schreiber
This document discusses semantic web representations and applications. It provides an overview of the W3C Web Ontology Working Group and Semantic Web Best Practices and Deployment Working Group, including their goals and key issues addressed. Examples of semantic web applications are also described, such as using ontologies to integrate information from heterogeneous cultural heritage sources.
Santa Fe Complex
March 13, 2009
Martin Klein, Frank McCown,
Joan Smith, Michael L. Nelson
Department of Computer Science
Old Dominion University
Norfolk VA
NCBO BioPortal SPARQL Endpoint - The Quad Economy of a Semantic Web Ontology ... - Trish Whetzel
The document discusses the NCBO BioPortal SPARQL endpoint, which provides query access to ontologies stored in BioPortal. It analyzes storing ontologies in a quad store to improve scalability over materializing all import closures. An analysis of 149 OWL ontologies found materializing imports led to nearly double the number of ontologies and triples compared to using individual ontology graphs without import materialization. The SPARQL endpoint facilitates queries across ontologies by synchronizing content daily and providing access to ontology metadata and versions.
Tracing Networks: Ontology Software in a Nutshell - enoch1982
The document introduces ontology-based databases and their advantages over traditional relational databases. It discusses how ontologies use triples to represent knowledge as graphs rather than tables. This allows for more complex queries using semantic relationships rather than just keywords. The document provides an example of using an ontology to tag images from an archaeological site and then querying the data to find images matching semantic patterns. It describes using visualization tools to display query results on maps and charts.
Using the Semantic Web, and Contributing to it - Mathieu d'Aquin
The document discusses using and contributing to the Semantic Web. It describes how Semantic Web applications can exploit distributed knowledge online by dynamically retrieving and combining relevant ontologies and data. It presents Watson, a gateway that provides APIs allowing applications to search, explore, and query Semantic Web documents without having to download the data. The document also discusses analyzing relationships between ontologies and assessing agreement/disagreement between their statements.
Toward Semantic Representation of Science in Electronic Laboratory Notebooks ... - Stuart Chalk
An electronic laboratory notebook (ELN) can be characterized as a system that allows scientists to capture the data and resources used in performing scientific experiments. This allows users to easily organize and find their data; however, little information about the scientific process is recorded.
In this paper we highlight the current status of progress toward semantic representation of science in ELNs.
Towards ubiquitous OWL computing: Simplifying programmatic authoring of and q... - Hilmar Lapp
Presentation about two small tools addressing gaps commonly encountered when computing and programming with OWL (the Web Ontology Language) at scale. Given at the 2014 Bioinformatics Open Source Conference (BOSC).
The video of the talk is here: http://youtu.be/K0SlYwMyn-A
Similar to QALL-ME: Ontology and Semantic Web
Tutorial given at RANLP 2015 in Hissar, Bulgaria
Recent years have seen lots of changes in the field of computational linguistics, most of them due to the widespread use of the Internet and the benefits and problems it brings. The first part of this tutorial will discuss these changes and will focus on crowdsourcing and how it influenced the creation of annotated data.
Annotation of data employed to train and test NLP methods used to be the task of language experts who had a good understanding of the linguistic phenomena to be tackled. Given that a large number of people now have access to the Internet, crowdsourcing has become an alternative way of obtaining annotated data. The core idea of crowdsourcing is that it is possible to design tasks that can be completed by non-experts and that the outputs of these tasks can be combined to obtain high-quality linguistic annotation, which would normally be produced by experts. Examples of how crowdsourcing was employed in computational linguistics will be given.
Big data is another trend in computational linguistics as researchers rely on more and more data for improving the results of a method. The second part of the tutorial will introduce the MapReduce programming model and show how it was used in processing language. Combined with processing larger quantities of data, the field of computational linguistics has applied deep learning to various tasks successfully, improving their accuracy. An introduction to deep learning will be provided, followed by examples of how it was applied to tasks such as learning semantic representations, sentiment analysis and machine translation evaluation.
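A minimal, single-process sketch of the MapReduce model (word count, the canonical example): a map phase emits key-value pairs, a shuffle groups them by key, and a reduce phase aggregates each group.

```python
# MapReduce word count, simulated in one process to show the three phases.
from collections import defaultdict

documents = ["the cat sat", "the cat ran", "a dog ran"]

# Map: emit (word, 1) for every word in every document.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group the emitted values by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: aggregate each group (here, by summing the counts).
counts = {word: sum(values) for word, values in groups.items()}
print(counts)  # {'the': 2, 'cat': 2, 'sat': 1, 'ran': 2, 'a': 1, 'dog': 1}
```

In a real framework the map and reduce functions are distributed over a cluster, and the shuffle happens over the network; the programming model, however, is exactly this.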
The role of linguistic information for shallow language processing - Constantin Orasan
The document discusses shallow language processing and summarization. It argues that while deep language understanding is limited, shallow methods can be improved by adding linguistic information. As an example, it shows how term frequency, anaphora resolution, discourse cues and genetic algorithms can select extractive summaries that better match human abstracts, without requiring full text comprehension.
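A toy version of the term-frequency side of that approach (the anaphora, discourse, and genetic-algorithm components are omitted): score sentences by the frequency of their content words and extract the top-scoring ones.

```python
# Shallow extractive summarization: frequent content words mark
# important sentences; no text understanding required.
from collections import Counter

text = ("The probe reached orbit on Tuesday. Engineers cheered at mission "
        "control. The probe will map the planet's surface. Lunch was served.")
stop = {"the", "at", "on", "was", "will"}

sentences = [s.strip() for s in text.split(".") if s.strip()]
words = [w.lower() for s in sentences for w in s.split()
         if w.lower() not in stop]
freq = Counter(words)

def score(sentence):
    return sum(freq[w.lower()] for w in sentence.split()
               if w.lower() not in stop)

# Extract the two most informative sentences as the summary.
summary = sorted(sentences, key=score, reverse=True)[:2]
print(". ".join(summary) + ".")
```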
What is Computer-Aided Summarisation and does it really work? - Constantin Orasan
Computer-aided summarization (CAS) uses automatic methods to identify important information in documents, which humans can then edit to produce summaries. An evaluation of a CAS tool called CAST found that it reduced the time professional summarizers needed to produce summaries by 20% on average without significantly affecting summary quality. User feedback indicated the tool was most useful for identifying related sentences to include.
The document discusses automatic summarization and related disciplines. It defines summarization as the condensation of a source text into a shorter version by selecting key information. Automatic summarization involves producing summaries computationally. Related fields include automatic classification, keyword extraction, information retrieval, information extraction, and question answering, which all aim to organize and understand information from text.
The MESSAGE project aims to:
1) Develop tools to rapidly disseminate reliable emergency messages across Europe.
2) Ensure messages are comprehensible to facilitate response.
3) Propose making available a controlled language editing tool to allow quick and accurate editing of alerts.
Invited talk at Processing ROmanian in Multilingual, Interoperational and Scalable Environments (PROMISE 2010) on how to port the QALL-ME framework to a new language
Annotation of anaphora and coreference for automatic processing - Constantin Orasan
This document discusses annotation of anaphora and coreference in corpora for computational linguistics. It covers several annotation schemes including MUC, which aimed to achieve high inter-annotator agreement by focusing on coreference between noun phrases. The NP4E corpus aimed to develop guidelines for annotating both noun phrase and event coreference in newspaper articles. Annotation is a time-consuming process that requires concentration to identify mentions and relations accurately. Guidelines must be clear and consistent to help annotators agree on how to mark up texts.
How to Get CNIC Information System with Paksim Ga.pptx - danishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
TrustArc Webinar - 2024 Global Privacy Survey - TrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
UiPath Test Automation using UiPath Test Suite series, part 6 - DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI as a test automation solution with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI?
Test automation with generative AI and OpenAI
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curricula, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company’s observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra, and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and I will share these foundational concepts to build on.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides an introduction to UiPath Communications Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communications Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is a widely used ETL tool for processing, indexing, and ingesting data into the serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to the Milvus vector database for search serving.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
1. Co-funded by the European Union
QALL-ME: Ontology and Semantic Web
Constantin Orasan
University of Wolverhampton
http://clg.wlv.ac.uk
2. Structure of presentation
1. The QALL-ME ontology
2. The ontology for answer retrieval
3. The ontology for the bibliographical domain
4. The ontology for presentation
5. Where next?
3. Ontology in QALL-ME
The QALL-ME ontology provides a conceptualised description of the domain in which the system is used.
It is used to:
Provide a bridge between languages
Pass information between different components of the system
Encode the data
Retrieve the data
4. QALL-ME ontology
An ontology for the domain of tourism was developed and used in the prototype (Ou et al., 2008).
Experiments with existing ontologies for the bibliographical domain were carried out (Orasan et al., 2009).
5. Ontology for the domain of tourism
Developed to address the user needs
Inspired by existing ontologies such as Harmonise and eTourism ... but developed specially for the project
Aligned to WordNet and SUMO
Freely available from the QALL-ME website
7. Semantic annotation and database organization
The ontology was used to encode the data.
Annotated data from the content providers was converted to RDF triples.
The RDF documents can be stored in databases or plain text files.
The Jena RDF API was used for these operations.
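As an illustration, here is a minimal sketch of that encoding step using the (modern) Apache Jena API; the qmo: namespace, the resource URI, and the property names are hypothetical stand-ins, not the actual QALL-ME vocabulary:

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import java.io.FileWriter;

public class EncodeData {
    public static void main(String[] args) throws Exception {
        // Hypothetical namespace standing in for the QALL-ME ontology
        String QMO = "http://example.org/qallme/ontology#";

        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("qmo", QMO);

        // One annotated item from a content provider, encoded as RDF triples
        Resource restaurant = model.createResource(QMO + "restaurant42");
        restaurant.addProperty(model.createProperty(QMO, "name"), "Kinnaree Thai Restaurant");
        restaurant.addProperty(model.createProperty(QMO, "city"), "Walsall");

        // Serialise to a plain text file; a database-backed model works the same way
        try (FileWriter out = new FileWriter("data.rdf")) {
            model.write(out, "RDF/XML");
        }
    }
}

At the time of the project, Jena's packages lived under com.hp.hpl.jena rather than org.apache.jena, but the calls are essentially the same.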
8. Semantic annotation and database organization
[Slide diagram: an HTML parser downloads pages from the World Wide Web; the QALL-ME ontology determines an XML Schema, which defines the XML documents; the annotated XML documents are transformed into RDF documents, which are then converted and loaded into a database.]
12. Ontology for MRP
Minimal Relation Patterns (MRPs) represent relations in the ontology.
They can be used in textual entailment.
Already presented.
13. Ontology for generation of hypothesis
Starting from the ontology we can create hypotheses:
What is the name of the movie with [DIRECTOR]?
What is the director of the movie with the name [NAME]?
This can be done for any language and for any domain.
The corresponding SPARQL query can be generated at the same time (see the sketch below).
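A minimal sketch of this pattern generation, assuming a hypothetical qmo: namespace and property names; for each ontology property it emits the natural language pattern together with the SPARQL template generated alongside it:

public class PatternGenerator {
    // Hypothetical namespace standing in for the QALL-ME ontology
    static final String QMO = "http://example.org/qallme/ontology#";

    // For an ontology property such as "director", emit a question pattern
    // and the SPARQL template generated alongside it
    static void emitPattern(String property) {
        String slot = "[" + property.toUpperCase() + "]";
        System.out.println("What is the name of the movie with " + slot + "?");
        System.out.println("PREFIX qmo: <" + QMO + ">");
        System.out.println("SELECT ?name WHERE {");
        System.out.println("  ?movie qmo:name ?name .");
        System.out.println("  ?movie qmo:" + property + " " + slot + " .");  // slot filled at question time
        System.out.println("}");
    }

    public static void main(String[] args) {
        emitPattern("director");
    }
}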
14. Ontology-generated patterns
91% of the questions from the benchmark have one or two constraints.
Investigation of the benchmark indicated three types of questions:
T1 – Query the name of a site or event which has one or more non-name attributes;
e.g. Can you tell me the name of a Chinese restaurant in Walsall?
T2 – Query a non-name attribute of a site or event whose name is known; and
e.g. Can you give me the address for the Kinnaree Thai Restaurant?
T3 – Query a non-name attribute of a site or event whose name is unknown, but using its other non-name attribute(s) as the constraint(s).
15. A T3 question such as “Could you give me a contact number for an Italian restaurant in Solihull?” can be decomposed into the following two questions (see the sketch below):
T1: Could you give me the name of an Italian restaurant in Solihull?
T2: Could you give me a contact number for <the name of the restaurant in T1>?
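A minimal sketch of that two-step decomposition against the RDF data, again with hypothetical qmo: property names; the name resolved by the T1 query is bound into the T2 query:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class DecomposeT3 {
    static final String PREFIX = "PREFIX qmo: <http://example.org/qallme/ontology#> ";

    public static void main(String[] args) {
        Model data = ModelFactory.createDefaultModel();
        data.read("data.rdf");  // the RDF documents produced earlier

        // T1: find the name of an Italian restaurant in Solihull
        String t1 = PREFIX + "SELECT ?name WHERE { ?r qmo:name ?name ; "
                  + "qmo:cuisine \"Italian\" ; qmo:city \"Solihull\" . }";
        try (QueryExecution qe1 = QueryExecutionFactory.create(t1, data)) {
            ResultSet rs1 = qe1.execSelect();
            if (!rs1.hasNext()) return;  // no answer; slide 21 discusses what to do then
            String name = rs1.next().getLiteral("name").getString();

            // T2: ask for the contact number of the restaurant found by T1
            String t2 = PREFIX + "SELECT ?phone WHERE { ?r qmo:name \"" + name
                      + "\" ; qmo:phone ?phone . }";
            try (QueryExecution qe2 = QueryExecutionFactory.create(t2, data)) {
                ResultSet rs2 = qe2.execSelect();
                while (rs2.hasNext()) System.out.println(rs2.next().getLiteral("phone"));
            }
        }
    }
}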
16. Automatically generated patterns
The ontology can be used to generate patterns for T1 and T2 questions with one or two constraints.
2703 patterns were generated for English and German.
The corresponding SPARQL queries were also generated.
Evaluation on 200 questions:
Baseline = cosine similarity over bags of words
Semantic engine = similarity on concepts + expected answer type (EAT) + entity filtering
Language and domain independent

          Baseline    Semantic engine
English   42.46%      65.00%
German    34.96%      64.88%
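For reference, a minimal sketch of a cosine bag-of-words baseline of the kind named above (a generic formulation, not the project's exact implementation):

import java.util.HashMap;
import java.util.Map;

public class CosineBaseline {
    // Build a term-frequency bag of words from whitespace-tokenised text
    static Map<String, Integer> bag(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+"))
            counts.merge(token, 1, Integer::sum);
        return counts;
    }

    // Cosine similarity between two bags of words
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) normB += (double) v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Score an input question against a generated pattern
        System.out.printf("%.3f%n", cosine(
            bag("can you give me the address for the Kinnaree Thai Restaurant"),
            bag("can you give me the address for the [NAME] restaurant")));
    }
}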
18. Domain of scientific publications
Experiments for the bibliographic domain were carried out.
Example: What papers did C. Orasan publish in 2008?
Existing ontologies were combined:
Semantic Web for Research Communities (SWRC) models concepts from the research community
A subset of Dublin Core was used to describe the properties of a bibliographical entry
Simple Knowledge Organisation System (SKOS) was used to model relations between terms
19. The data in BibTeX format was converted to the domain ontology.
SPARQL patterns were generated.
The retrieval algorithm was not changed ... but some changes had to be introduced at the level of the framework.
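A minimal sketch of that BibTeX-to-RDF conversion with Jena, using the real SWRC and Dublin Core namespaces but a single hand-filled entry in place of an actual BibTeX parser; the entry URI and field values are illustrative only:

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.DC;
import org.apache.jena.vocabulary.RDF;

public class BibtexToRdf {
    static final String SWRC = "http://swrc.ontoware.org/ontology#";

    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.setNsPrefix("swrc", SWRC);
        model.setNsPrefix("dc", DC.getURI());

        // One BibTeX @inproceedings entry, already split into its fields
        Resource entry = model.createResource("http://example.org/bib/orasan2008")
                .addProperty(RDF.type, model.createResource(SWRC + "InProceedings"))
                .addProperty(DC.creator, "C. Orasan")
                .addProperty(DC.title, "(title taken from the BibTeX entry)")
                .addProperty(DC.date, "2008");

        model.write(System.out, "TURTLE");
    }
}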
21. User satisfaction is largely determined by aspects such as the ease of use, learning curve, feedback, and interface friendliness, and not just by accuracy.
Example: What movies can I see at Symphony Hall this week?
If there are no answers, the system can:
Look for a different location
Search for a different time period
Detect a wrong presupposition
Take user preferences into account
22. Most of the feedback desiderata can be met without changing the current pipeline.
'Understanding' occurs in the Entailment Engine (EE).
The QPlanner does not have direct access to this information, but it can be injected into the results via the generated SPARQL, exploiting the RDF data model.
Interactive Question Answering (IQA) ontology (Magnini et al., 2009)
23. A question is analysed in terms of:
Expected answer type
Constraints
Context
The answer will contain:
Core information
Justification
Complementary information
The situation can be handled using a rich SPARQL query.
Rewriting rules are applied to the SPARQL in case of an empty answer (see the sketch below).
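A minimal sketch of such empty-answer rewriting, under the simplifying assumption that relaxation just drops the date constraint from the WHERE clause (the real rewriting rules are richer); the qmo: property names are hypothetical:

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

public class RelaxOnEmpty {
    static final String PREFIX = "PREFIX qmo: <http://example.org/qallme/ontology#> ";

    public static void main(String[] args) {
        Model data = ModelFactory.createDefaultModel();
        data.read("data.rdf");

        // "In which cinema is [MOVIE] shown on [TIME]?"; the slot values
        // would be filled in at question time
        String query = PREFIX + "SELECT ?cinema WHERE { "
                     + "?show qmo:cinema ?cinema ; "
                     + "qmo:movie \"[MOVIE]\" ; "
                     + "qmo:startDate \"[TIMEX2]\" . }";

        if (!printResults(query, data)) {
            // Empty answer: report the failure reason and rewrite the SPARQL
            // by dropping the date constraint
            System.out.println("No film can be found for the given date; relaxing the query.");
            String relaxed = PREFIX + "SELECT ?cinema WHERE { "
                           + "?show qmo:cinema ?cinema ; "
                           + "qmo:movie \"[MOVIE]\" . }";
            printResults(relaxed, data);
        }
    }

    static boolean printResults(String sparql, Model data) {
        try (QueryExecution qe = QueryExecutionFactory.create(sparql, data)) {
            ResultSet rs = qe.execSelect();
            boolean any = rs.hasNext();
            while (rs.hasNext()) System.out.println(rs.next().get("cinema"));
            return any;
        }
    }
}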
25. qmq:qi rdf:type qmq:QuestionInterpretation ;
    qmq:hasInterpretation "In which cinema is [MOVIE] shown on [TIME]" ;
    qmq:hasConstraint qmq:c1 ;
    qmq:hasConstraint qmq:c2 ;
    qmq:hasFacet qmq:f1 .

qmq:c2 rdf:type qmq:Filter ;
    qmq:hasType qmo:DatePeriod ;
    qmq:hasProperty qmo:startDate ;
    qmq:hasValue '''[TIMEX2]''' ;
    qmq:failureReason "No film can be found for the given date" .
27. Where next?
We have the technology to “convert” a natural language question to SPARQL, via an ontology.
We can get access to a large number of resources using Linked Open Data.
We can expand the access to knowledge.