The document provides an overview of question answering systems, including their evolution from information retrieval, common evaluation benchmarks like TREC and CLEF, and examples of major QA projects like Watson. It also discusses the movement towards leveraging semantic technologies and linked open data to power next generation QA systems, as seen in projects like SINA which transform natural language queries into formal queries over structured knowledge bases.
1. A Tutorial on Question Answering Systems
Saeedeh Shekarpour
Postdoc researcher from EIS research group
31 August 2015, EIS research group - Bonn University
2. Outline
Introduction
Associations for evaluating QA systems
Preliminary Concepts
Data Web
Emerging Concepts
Deeper view on SINA Project
3. What is a Question Answering (QA) system?
Systems that automatically answer questions posed by humans in natural language.
Natural language is the common way of sharing knowledge.
(Diagram labels: Query, Answer)
4. Question answering is a multidisciplinary field
(Diagram labels: Semantic Web, Linked Data)
5. Difference to Information Retrieval
Information Retrieval is a query-driven approach to accessing information.
The system returns a list of documents.
It is the user's responsibility to navigate the retrieved documents and find the information they need.
Question Answering is an answer-driven approach to accessing information.
The user asks a question in natural language (i.e. a phrase-based, full-sentence or even keyword-based query).
The system returns a list of short answers.
More complex functionality.
6. Search engines are moving towards QA
7. Search engines still lack the ability to answer more complex queries
8. Natural language queries are classified into different categories
Factoid queries: WH questions such as when, who, where.
Yes/No queries: Is Berlin the capital of Germany?
Definition queries: What is leukemia?
Cause/consequence queries: How, Why, What. What are the consequences of the Iraq war?
9. Natural language queries are classified into different categories
Procedural queries: What are the steps for getting a Master's degree?
Comparative queries: What are the differences between model A and model B?
Queries with examples: List hard disks similar to hard disk X.
Queries about opinion: What is the opinion of the majority of Americans about the Iraq war?
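The categories on the two slides above could be approximated with a few keyword rules. The sketch below is only an illustration: the patterns, their order, and the name classify_query are assumptions made here, not something the tutorial specifies, and a real system would more likely use a trained classifier.

```python
import re

# Hypothetical keyword rules, checked in order; the first match wins.
RULES = [
    ("yes/no", r"^(is|are|was|were|do|does|did|can|could|will|has|have)\b"),
    ("definition", r"^what (is|are) (an? )?\w+\s*\?*$"),
    ("procedural", r"\bsteps\b|^how to\b"),
    ("comparative", r"\bdifferences?\b|\bcompare\b"),
    ("example-based", r"\bsimilar to\b"),
    ("opinion", r"\bopinion\b"),
    ("cause/consequence", r"^why\b|\bconsequences?\b|\bcauses?\b"),
    ("factoid", r"^(who|when|where|which|what)\b"),
]


def classify_query(query):
    """Return the first category whose pattern matches the lower-cased query."""
    q = query.strip().lower()
    for label, pattern in RULES:
        if re.search(pattern, q):
            return label
    return "unknown"


print(classify_query("Is Berlin capital of Germany?"))
print(classify_query("what is leukemia?"))
print(classify_query("Where was the hamburger invented?"))
```

Rule order matters in this sketch: the broad factoid rule comes last so that the more specific categories get a chance to match first.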
10. Corpus Type
Structured data (relational databases, RDF knowledge bases)
Semi-structured data (XML databases)
Free text
Multimodal data: image, voice, video
11. Types of QA systems
Open-domain: domain-independent QA systems can answer any query from any corpus
+ Covers a wide range of queries
- Low accuracy
Closed-domain: domain-specific QA systems are limited to specific domains
+ High accuracy
- Limited coverage of the possible queries
- Needs a domain expert
12. Is a QA system a need for users?
Search engine query log analysis shows the following distribution of query types:
Type of query    Query log analysis
Informational    48%
Navigational     20%
Transactional    30%
Real informational queries in Google:
Who first invented rock and roll music?
When was the mobile phone invented?
Where was the hamburger invented?
How to lose weight?
13. Outline
Introduction
Associations for evaluating QA systems
Preliminary Concepts
Data Web
Emerging Concepts
Deeper view on SINA Project
14. Text Retrieval Conference (TREC)
In 1999, TREC initiated a QA track.
From 1999 to 2002, participating systems were allowed to return ranked answer snippets. These snippets of text contain the actual answer.
From 2002, participating systems were allowed to return only the exact answer with a confidence rate.
The evaluation metric was mean reciprocal rank (MRR).
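As an illustration of the metric named above, mean reciprocal rank averages 1/rank of the first correct answer over all questions, counting 0 when no returned answer is correct. The sketch below assumes answers are compared by exact string match; the function name and the toy data are made up here and are not TREC's official scoring code.

```python
def mean_reciprocal_rank(ranked_answers, gold_answers):
    """ranked_answers: one ranked list of answers per question.
    gold_answers: one set of acceptable answers per question."""
    if not ranked_answers:
        return 0.0
    total = 0.0
    for ranking, gold in zip(ranked_answers, gold_answers):
        for rank, answer in enumerate(ranking, start=1):
            if answer in gold:
                total += 1.0 / rank  # only the first correct answer counts
                break
    return total / len(ranked_answers)


# Toy data (hypothetical): correct at rank 1, at rank 2, and never.
rankings = [["Hamburg", "Berlin"], ["1975", "1973"], ["Paris", "Rome"]]
gold = [{"Hamburg"}, {"1973"}, {"Bonn"}]
mrr = mean_reciprocal_rank(rankings, gold)
print(mrr)  # (1 + 1/2 + 0) / 3 = 0.5
```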
15. Typical pipeline for the participating systems in TREC
(Pipeline diagram; stage labels: Query, Predicting the type of answer, Query Expansion, IR Approach, Passage Retrieval, Answer Selection, Answer)
16. QA at CLEF
The Cross-Language Evaluation Forum (CLEF) initiative provides an infrastructure for the testing, tuning and evaluation of information retrieval systems operating on European languages in both monolingual and cross-language contexts.
Since 2003, CLEF has included a QA track.
17. Outline
Introduction
Associations for evaluating QA systems
Preliminary Concepts
Data Web
Emerging Concepts
Deeper view on SINA Project
18. Core of a QA system
(Diagram label: query)
19. Syntactic Parsing: Part-of-speech Tagging
I like sweet cakes
I/PRP like/VBP sweet/JJ cakes/NNS
(Parse tree: like/VBP at the root, with I/PRP and cakes/NNS below it, and sweet/JJ below cakes/NNS)
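The word/TAG notation on this slide can be reproduced with a toy lookup tagger. This is only a sketch: the lexicon is a hypothetical four-word dictionary of Penn Treebank tags covering the slide's sentence, whereas a real tagger is trained on annotated text and resolves ambiguous words from context.

```python
# Hypothetical mini-lexicon of Penn Treebank tags; covers only the slide's sentence.
LEXICON = {"i": "PRP", "like": "VBP", "sweet": "JJ", "cakes": "NNS"}


def tag(sentence, default="NN"):
    """Tag each whitespace-separated token by dictionary lookup."""
    return [(token, LEXICON.get(token.lower(), default)) for token in sentence.split()]


def format_tagged(pairs):
    """Render (token, tag) pairs in the word/TAG notation used on the slide."""
    return " ".join(f"{token}/{pos}" for token, pos in pairs)


tagged_sentence = format_tagged(tag("I like sweet cakes"))
print(tagged_sentence)  # I/PRP like/VBP sweet/JJ cakes/NNS
```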
20. Type of Answer
Where was the hamburger invented?
White Castle traces the origin of the hamburger to Hamburg, Germany, with its invention by Otto Kuase.
Answer type: Place
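One simple way to predict the expected answer type is to look at the question word. The mapping below is an assumption made for illustration (only "where" giving Place comes from the slide), and the function name is hypothetical.

```python
# Hypothetical mapping from question word to expected answer type.
ANSWER_TYPES = {"where": "Place", "who": "Person", "when": "Date"}


def expected_answer_type(question):
    """Guess the answer type from the first word of the question."""
    words = question.strip().split()
    first_word = words[0].lower() if words else ""
    return ANSWER_TYPES.get(first_word, "Unknown")


print(expected_answer_type("Where was the hamburger invented?"))  # Place
```

Knowing the type lets a later answer selection step discard candidates of the wrong kind, for example keeping "Hamburg, Germany" and dropping "Otto Kuase" for a "where" question.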
21. Named Entity Recognition on Query
Where was Franklin Roosevelt born?
Named Entity: Person (Franklin Roosevelt)
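A minimal way to illustrate entity recognition on a query is a gazetteer lookup: a fixed list of known names with their types. The gazetteer below is a hypothetical three-entry dictionary built from names that appear in these slides; practical recognizers are statistical and handle names they have never seen.

```python
# Hypothetical gazetteer: known entity names and their types.
GAZETTEER = {
    "Franklin Roosevelt": "Person",
    "Barack Hussein Obama": "Person",
    "Honolulu": "Place",
}


def find_entities(text):
    """Return (name, type, start offset) for each gazetteer entry found in text."""
    found = []
    for name, label in GAZETTEER.items():
        start = text.find(name)
        if start != -1:
            found.append((name, label, start))
    return sorted(found, key=lambda entity: entity[2])


print(find_entities("where was Franklin Roosevelt born?"))
```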
22. Relation Extraction
Barack Hussein Obama is the 44th and current President of
the United States. Born in Honolulu, Hawaii, Obama is a
graduate of Columbia University and Harvard Law School.
Named Entity: Person
Named Entity: Place
Relation: President of
23. Watson Project
Watson is a computer system capable of answering
questions posed in natural language.
The questions come from a quiz show called Jeopardy.
The software behind this project is called DeepQA.
In 2011, Watson defeated former winners of the quiz show
Jeopardy.
24. Watson Description
Hardware: The Watson system has 2,880 POWER processor
threads and 16 terabytes of RAM.
Data: encyclopedias, dictionaries, thesauri, newswire
articles, and literary works. Watson also used databases,
taxonomies, and ontologies such as DBpedia, WordNet, and
YAGO.
Software: DeepQA software and the Apache UIMA
framework. The system was written in various languages,
including Java, C++, and Prolog, and runs on the SUSE Linux
Enterprise Server 11 operating system, using the Apache Hadoop
framework to provide distributed computing.
25. The high-level architecture of DeepQA
26. Decomposition example
Query: Of the four countries in the world that the United
States does not have diplomatic relations with, the one
that’s farthest north.
Bhutan, Cuba, Iran, North Korea
North Korea
27. Outline
Introduction
Associations for evaluation QA systems
Preliminary Concepts
Data Web
Emerging Concepts
Deeper view on SINA Project
29. The growth of Linked Open Data
May 2007: 12 datasets
August 2014: 570 datasets, more than 74 billion triples
30. How to retrieve data from Linked Data?
31. RDF Model
RDF is a standard for describing Web resources.
The RDF data model expresses statements about Web
resources in the form of subject-predicate-object (triple).
The statement “Jack knows Alice” is represented as:
Jack (subject), knowing (predicate), Alice (object)
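The triple structure can be sketched as a plain data type. The example below is only an illustration of subject-predicate-object statements and wildcard matching; the `ex:` identifiers, the `Triple` type and the `match` helper are hypothetical and do not come from an RDF library.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical identifiers in an example namespace, for illustration only.
graph = [
    Triple("ex:Jack", "ex:knows", "ex:Alice"),
    Triple("ex:Alice", "ex:knows", "ex:Bob"),
]


def match(triples: List[Triple], subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)]


known_by_jack = [t.obj for t in match(graph, subject="ex:Jack",
                                      predicate="ex:knows")]
print(known_by_jack)
```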
32. Outline
Introduction
Associations for evaluation QA systems
Preliminary Concepts
Data Web
Emerging Concepts
Deeper view on SINA Project
34. Semantic Annotation
Named Entity Recognition
Where is the capital of Germany?
Named Entity: Place
Semantic Annotation
Where is the capital of Germany?
Entity: http://dbpedia.org/resource/Germany
35.
Berlin in the 1920s was the third largest municipality in the world.
After World War II, the city became divided into East Berlin -- the
capital of East Germany -- and West Berlin, a West German
exclave surrounded by the Berlin Wall from 1961–89.
39. Outline
Introduction
Associations for evaluation QA systems
Preliminary Concepts
Data Web
Emerging Concepts
Deeper view on SINA Project
40. SINA Architecture
[Figure: SINA architecture. A client sends the query to the server and receives the result. Server components: Query Preprocessing, Segment Validation, Query Expansion, Resource Retrieval, Disambiguation, Query Construction, Representation. Data passed between components: keywords, valid segments, reformulated query, mapped resources, tuple of resources, SPARQL queries. Supporting tools: OWL API, HTTP client, Stanford CoreNLP. Data source: underlying interlinked knowledge bases.]
41. Comparison of search approaches
[Figure labels: Data-semantic unaware; Data-semantic aware; Keyword-based query]
42. Objective: transformation from textual query to formal query
Which television shows were created by Walt Disney?
SELECT * WHERE {
  ?v0 a dbo:TelevisionShow .
  ?v0 dbo:creator dbr:Walt_Disney .
}
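Once the keywords have been mapped to resources, the formal query is essentially a list of triple patterns. The sketch below shows one possible way to serialize such patterns into a SPARQL SELECT string; the helper name is hypothetical, prefix declarations for dbo: and dbr: are omitted, and this is not SINA's own code.

```python
from typing import List, Tuple

TriplePattern = Tuple[str, str, str]


def build_select_query(patterns: List[TriplePattern]) -> str:
    """Join triple patterns into a SPARQL SELECT query string."""
    body = "\n".join(f"  {s} {p} {o} ." for s, p, o in patterns)
    return "SELECT * WHERE {\n" + body + "\n}"


# Resources assumed to be already mapped from the keywords of the question.
patterns = [
    ("?v0", "a", "dbo:TelevisionShow"),
    ("?v0", "dbo:creator", "dbr:Walt_Disney"),
]
query = build_select_query(patterns)
print(query)
```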
44. Query Segmentation
Definition: query segmentation is the process of identifying the right
segments of data items that occur in the keyword queries.
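To make the definition concrete, a naive segmentation can enumerate every contiguous span of keywords and keep those that match a known label. This is only a sketch under that assumption; the label set stands in for a knowledge-base lookup and is hypothetical, as are the function name and the length limit.

```python
from typing import List


def contiguous_segments(keywords: List[str], max_len: int = 3) -> List[str]:
    """Enumerate all contiguous keyword spans of up to max_len words."""
    segments = []
    for start in range(len(keywords)):
        last = min(start + max_len, len(keywords))
        for end in range(start + 1, last + 1):
            segments.append(" ".join(keywords[start:end]))
    return segments


# Hypothetical labels that stand in for a knowledge-base lookup.
KNOWN_LABELS = {"television show", "creator", "walt disney"}

keywords = ["television", "show", "creator", "walt", "disney"]
candidates = contiguous_segments(keywords)
valid_segments = [s for s in candidates if s in KNOWN_LABELS]
print(valid_segments)
```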
45. Resource Disambiguation
Definition: resource disambiguation is the process of
recognizing the suitable resources in the underlying
knowledge base.
• Who produced films starring Natalie Portman?
  • dbpedia/ontology/film
  • dbpedia/property/film
• What are the side effects of drugs used for Tuberculosis?
  • diseasome:Tuberculosis
  • sider:Tuberculosis
46. Concurrent Approach
[Figure: Query Segmentation and Resource Disambiguation shown as two tasks handled concurrently]
47. Modeling using hidden Markov model
[Figure: hidden Markov model with a Start state, states numbered 1 to 9 and an Unknown Entity state, over the observations Keyword 1, Keyword 2, Keyword 3 and Keyword 4]
48. Bootstrapping the model parameters
1. Emission probability is defined based on the similarity between the label of
each state and a segment. This similarity is computed from
string similarity and Jaccard similarity.
2. Semantic relatedness is the basis for the transition probability and the initial
probability. Intuitively, it is based on two values: distance and
connectivity degree. We transform these two values into hub and
authority values using a weighted HITS algorithm.
3. HITS is a link analysis algorithm that was originally
developed for ranking Web pages. It assigns a hub and an authority
value to each Web page.
4. Initial probability and transition probability are defined as a uniform
distribution over the hub and authority values.
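The similarity behind the emission probability can be illustrated with word-level Jaccard similarity. The sketch below covers only that part: it does not include the string-similarity component or the HITS step, and the normalization into a distribution is an assumption made for illustration, not the formula used by SINA. The labels are hypothetical.

```python
from typing import Dict


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative scores so that they sum to 1."""
    total = sum(scores.values())
    if total == 0:
        return {k: 1.0 / len(scores) for k in scores}
    return {k: v / total for k, v in scores.items()}


# Hypothetical state labels competing for the segment "television show".
segment = "television show"
labels = ["television show", "television", "game show host"]
emission = normalize({lab: jaccard_similarity(segment, lab) for lab in labels})
print(emission)
```

The raw similarities are 1.0, 0.5 and 0.25, so after normalization the exact label match receives the largest share.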
49.
Sequence of keywords: (television show creat Walt Disney)
Paths (output of the model):
0.0023     dbo:TelevisionShow dbo:creator dbr:Walt_Disney
0.0014     dbo:TelevisionShow dbo:creator dbr:Category:Walt_Disney
0.000589   dbr:TelevisionShow dbo:creator dbr:Walt_Disney
0.000353   dbr:TelevisionShow dbo:creator dbr:Category:Walt_Disney
0.0000376  dbp:television dbp:show dbo:creator dbr:Category:Walt_Disney
50. Query Expansion
Definition: query expansion is a way of reformulating the
input query in order to overcome the vocabulary
mismatch problem.
• Wife of Barack Obama
• Spouse of Barack Obama
52. Linguistic features
WordNet is a popular data source for expansion.
Linguistic features extracted from WordNet are:
1. Synonyms: words having a similar meaning to the input
keyword.
2. Hyponyms: words representing a specialization of the input
keyword.
3. Hypernyms: words representing a generalization of the input
keyword.
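The three linguistic features can be sketched as lookups in a lexicon. The example below uses a tiny hand-written dictionary as a stand-in for WordNet; its entries, the feature keys and the function name are hypothetical, and a real system would query WordNet itself.

```python
from typing import Dict, List, Sequence

# Hypothetical miniature lexicon standing in for WordNet (not real WordNet data).
TOY_LEXICON: Dict[str, Dict[str, List[str]]] = {
    "wife": {
        "synonyms": ["married woman"],
        "hyponyms": ["first lady"],
        "hypernyms": ["spouse", "woman"],
    },
}


def expand_keyword(keyword: str,
                   features: Sequence[str] = ("synonyms", "hyponyms", "hypernyms"),
                   ) -> List[str]:
    """Return the keyword followed by the words derived through the features."""
    entry = TOY_LEXICON.get(keyword.lower(), {})
    expansion = [keyword]
    for feature in features:
        for word in entry.get(feature, []):
            if word not in expansion:
                expansion.append(word)
    return expansion


expansion_set = expand_keyword("wife")
print(expansion_set)
```

Restricting `features` to a subset, for example only hypernyms, gives a narrower expansion set for the same keyword.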
53. Semantic features from Linked Data
1. SameAs: deriving resources using owl:sameAs.
2. SeeAlso: deriving resources using rdfs:seeAlso.
3. Equivalence class/property: deriving classes or properties
using owl:equivalentClass and owl:equivalentProperty.
4. Super class/property: deriving all super classes/properties of
the input resource by following the rdfs:subClassOf or rdfs:subPropertyOf
property.
5. Sub class/property: deriving resources by following the
rdfs:subClassOf or rdfs:subPropertyOf property paths ending
with the input resource.
54. Semantic features from Linked Data
6. Broader concepts: deriving concepts using the SKOS vocabulary
properties skos:broader and skos:broadMatch.
7. Narrower concepts: deriving concepts using skos:narrower
and skos:narrowMatch.
8. Related concepts: deriving concepts using skos:closeMatch,
skos:mappingRelation and skos:exactMatch.
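Deriving super and sub classes amounts to following rdfs:subClassOf edges transitively. The sketch below shows this on a handful of in-memory triples; the ex: class names and the helper functions are hypothetical, and a real system would run the traversal against a knowledge base rather than a Python list.

```python
from typing import List, Set, Tuple

# Hypothetical toy triples; the class names are illustrative only.
TRIPLES: List[Tuple[str, str, str]] = [
    ("ex:TelevisionShow", "rdfs:subClassOf", "ex:Work"),
    ("ex:Film", "rdfs:subClassOf", "ex:Work"),
    ("ex:Work", "rdfs:subClassOf", "ex:Thing"),
    ("ex:Film", "owl:equivalentClass", "ex:Movie"),
]


def super_classes(resource: str) -> Set[str]:
    """Follow rdfs:subClassOf edges transitively upward from the resource."""
    found: Set[str] = set()
    frontier = [resource]
    while frontier:
        current = frontier.pop()
        for s, p, o in TRIPLES:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def sub_classes(resource: str) -> Set[str]:
    """Collect classes whose rdfs:subClassOf paths end at the resource."""
    return {s for s, p, _ in TRIPLES
            if p == "rdfs:subClassOf" and resource in super_classes(s)}


print(sorted(super_classes("ex:Film")))
print(sorted(sub_classes("ex:Work")))
```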
55. Exemplary expansion graph of the word movie
[Figure: expansion graph around the word "movie" with the derived terms home movie, production, film, motion picture show, video, telefilm]
56. Statistics over the number of the derived words from WordNet and Linked Data
Feature                 #derived words
synonym                 503
hyponym                 2703
hypernym                657
sameAs                  2332
seeAlso                 49
equivalence             2
super class/property    267
sub class/property      2166
57. Automatic query expansion
[Figure: flow diagram with the elements Input query, External data source, Data extraction and preparation, Heuristic method, Reformulated query]
58. Expansion set for each segment
59. Reformulating query using hidden Markov model
Input query: wife of Barack Obama
[Figure: hidden Markov model with a Start state and states drawn from the segments and their expansion sets, including Barack, Obama, Barack Obama, wife, Obama wife, Barack Obama wife, spouse, first lady, woman]
60. Formal Query Construction
Definition: Once the resources are detected, a
connected subgraph of the knowledge base graph,
called the query graph, has to be determined which fully
covers the set of mapped resources.
Disambiguated resources:
sider:sideEffect
diseasome:possibleDrug
diseasome:1154
SPARQL query:
SELECT ?v3 WHERE {
  diseasome:1154 diseasome:possibleDrug ?v1 .
  ?v1 owl:sameAs ?v2 .
  ?v2 sider:sideEffect ?v3 .
}
61. Data Fusion on Linked Data
The answer to a question may be spread among different
datasets employing heterogeneous schemas.
Constructing a federated query needs to exploit
links between the different datasets on the schema and
instance levels.
62. Two different approaches
Formal query construction can be based on:
Template-based query construction
Forward-chaining-based query construction
63. Federated query construction using forward chaining
Query: What are the side effects of drugs used for Tuberculosis?
Resources:
diseasome:1154 (type instance)
diseasome:possibleDrug (type property)
sider:sideEffect (type property)
Graph 1: 1154 possibleDrug ?v0
Graph 2: ?v1 sideEffect ?v2
Template 1: 1154 possibleDrug ?v0
Template 2: ?v1 sideEffect ?v2
Combined graph: 1154 possibleDrug ?v0, ?v1 sideEffect ?v2
UIMA (Unstructured Information Management Architecture)
Decomposition: Some more complex clues contain multiple facts about the answer, all of which are required to arrive at the correct response but are unlikely to occur together in one place.