NATURAL LANGUAGE INTERFACE
What is NLIDB
Purpose of NLIDB
Architecture of NLIDB
Where NLIDB is used
WHAT IS NATURAL LANGUAGE INTERFACE TO DATABASE (NLIDB)
The idea of using natural language instead of SQL
has prompted the development of a new type of
processing method called Natural Language
Interface to DataBase Systems (NLIDB).
NLIDB is a step towards the development of
Intelligent DataBase Systems (IDBS) that help
users perform flexible querying in
PURPOSE OF NLIDB
The purpose of natural language interfaces is to
allow users to compose questions in natural
language and receive responses.
Asking a database questions in a natural language
such as English is a convenient and easy way to
access data, especially for casual users who do
not understand complicated database query
languages such as SQL.
This enables a user to simply enter queries in
English to the Natural Language Database
The lexicon supports the following two operations:
1. Given a word stem ws, retrieve the set of tokens
which contain ws.
2. Given a token t, retrieve the set of database
elements matching t.
The names of all database elements are extracted
and split into individual words. Each word is then
stemmed and a corresponding set of synonyms is
identified using a general purpose word ontology.
Each database element is thus associated with a set
of word stems, and each word stem is in turn
associated with a set of synonyms.
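The lexicon described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of any particular NLIDB: the toy `stem` function, the hand-written `SYNONYMS` table (standing in for a general-purpose word ontology), the `Lexicon` class and the example element names are all assumptions made for the sketch.

```python
from collections import defaultdict
from itertools import product

# Hypothetical stand-in for a general-purpose word ontology.
SYNONYMS = {"job": {"position"}, "platform": {"system"}, "company": {"firm"}}


def stem(word):
    """Toy stemmer (assumption): lowercase and drop a plural 's'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    return w


class Lexicon:
    def __init__(self):
        self.stem_to_tokens = defaultdict(set)     # word stem -> tokens containing it
        self.token_to_elements = defaultdict(set)  # token -> database elements

    def add_element(self, element, name):
        # Split the element name into words and stem each word.
        stems = [stem(w) for w in name.replace("_", " ").split()]
        # Each stem may be replaced by any of its (stemmed) synonyms.
        alternatives = [{s} | {stem(x) for x in SYNONYMS.get(s, ())} for s in stems]
        for token in product(*alternatives):  # a token is a tuple of word stems
            self.token_to_elements[token].add(element)
            for ws in token:
                self.stem_to_tokens[ws].add(token)

    def tokens_containing(self, ws):
        """Operation 1: the set of tokens that contain word stem ws."""
        return set(self.stem_to_tokens.get(ws, ()))

    def elements_matching(self, token):
        """Operation 2: the set of database elements matching token t."""
        return set(self.token_to_elements.get(tuple(token), ()))


# Illustrative schema elements (hypothetical names).
lexicon = Lexicon()
lexicon.add_element(("relation", "JOB"), "jobs")
lexicon.add_element(("attribute", "JOB.Description"), "job description")
lexicon.add_element(("attribute", "JOB.Platform"), "platform")
lexicon.add_element(("value", "JOB.Platform=Unix"), "Unix")
```

With this data, `lexicon.tokens_containing("job")` returns both the single-word token `("job",)` and the two-word token `("job", "description")`, and `lexicon.elements_matching(("system",))` returns the Platform attribute through its synonym.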
The tokenizer’s input is a natural language question
and its output is the set of all possible complete
tokenizations of the question.
The tokenizer proceeds by stemming each word in
the question, and then looking up in the lexicon the
set of tokens containing the word stem.
For each potential token, the tokenizer checks
whether the other words in the token are also
present in the question.
Finally, the tokenizer also assigns to each token the
types of database elements it could potentially
match to (e.g., value, attribute).
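The tokenizer steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the `TOKEN_TYPES` table is a hypothetical lexicon, `stem` is a toy stemmer, and a "complete tokenization" is taken to mean that every word stem that is not a syntactic marker is covered by exactly one token.

```python
SYNTACTIC_MARKERS = {"are", "the", "on", "a"}

# Hypothetical lexicon: token (tuple of word stems) -> element types it may match.
TOKEN_TYPES = {
    ("what",): {"value"},
    ("hp",): {"value"},
    ("unix",): {"value"},
    ("system",): {"attribute"},
    ("job",): {"relation"},
    ("job", "description"): {"attribute"},
}

# Index from word stem to the tokens containing it (lexicon operation 1).
STEM_INDEX = {}
for _token in TOKEN_TYPES:
    for _ws in _token:
        STEM_INDEX.setdefault(_ws, set()).add(_token)


def stem(word):
    """Toy stemmer (assumption): lowercase and drop a plural 's'."""
    w = word.lower()
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    return w


def candidate_tokens(stems):
    """Look up tokens per stem, keeping those whose other words are also present."""
    present = set(stems)
    found = set()
    for ws in stems:
        for token in STEM_INDEX.get(ws, ()):
            if present.issuperset(token):
                found.add(token)
    return found


def complete_tokenizations(stems, candidates):
    """Enumerate every way to cover all stems with disjoint candidate tokens."""
    def cover(remaining):
        if not remaining:
            yield frozenset()
            return
        first = min(remaining)  # deterministic choice of the next stem to cover
        for token in candidates:
            if first in token and remaining.issuperset(token):
                for rest in cover(remaining - set(token)):
                    yield rest | {token}
    return set(cover(frozenset(stems)))


def tokenize(question):
    words = [w.strip("?.,!").lower() for w in question.split()]
    stems = [stem(w) for w in words if w and w not in SYNTACTIC_MARKERS]
    candidates = candidate_tokens(stems)
    # Attach to each token the element types it could match.
    return [{t: TOKEN_TYPES[t] for t in tokenization}
            for tokenization in complete_tokenizations(stems, candidates)]


result = tokenize("What are the HP jobs on a Unix system?")
```

For this question the sketch yields a single complete tokenization with the tokens what, HP, job, Unix and system. A question containing a word that no token covers yields no complete tokenization at all.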
In Matcher we reduce the problem of finding a
semantic interpretation of ambiguous natural
language tokens as database elements to a graph
Parser Plug-in
o It extracts attachment relationships between
tokens from the parse tree.
o The attachment relationships are used by the
matcher in the generation of valid
mappings (only semantic interpretations which
satisfy the syntactic attachment constraints
represent valid mappings).
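The filtering role of attachment relationships can be illustrated with a small sketch. The exact form of the constraint is not given here, so the sketch assumes that two attached tokens must map to database elements belonging to the same attribute. The candidate lists, the attachment pair and the `Relation.Attribute=Value` naming are hypothetical.

```python
from itertools import product

# Hypothetical candidates: token -> database elements it could denote.
CANDIDATES = {
    "unix": ["JOB.Platform=Unix", "JOB.Company=Unix"],
    "system": ["JOB.Platform"],
}

# Hypothetical attachment extracted from the parse tree: "Unix" modifies "system".
ATTACHMENTS = [("unix", "system")]


def attribute_of(element):
    """'JOB.Platform=Unix' -> 'JOB.Platform'; an attribute maps to itself."""
    return element.split("=")[0]


def compatible(a, b):
    # Assumption: attached tokens must refer to the same attribute.
    return attribute_of(a) == attribute_of(b)


def valid_mappings(candidates, attachments):
    """Yield each token-to-element mapping that satisfies every attachment."""
    tokens = list(candidates)
    for combo in product(*(candidates[t] for t in tokens)):
        mapping = dict(zip(tokens, combo))
        if all(compatible(mapping[x], mapping[y]) for x, y in attachments):
            yield mapping


mappings = list(valid_mappings(CANDIDATES, ATTACHMENTS))
```

Of the two candidate interpretations of "Unix", only the Platform value survives, because it is the only one compatible with the attribute that "system" maps to.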
The query generator takes the database elements
selected by the matcher and weaves them into a
well-formed SQL query.
In the case of single-relation queries, this process is
In the case of multi-relation queries, the generator
adds join conditions to the WHERE clause, which
reflect a join path that contains all the relations
implicitly invoked by attributes in the query.
The generator generates a query for each possible
join path and submits the queries to the
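A minimal sketch of the query generation step, assuming the matcher hands over one attribute to select, a list of (attribute, value) conditions and, for multi-relation queries, a list of candidate join paths. The function name `generate_queries`, its parameters and the COMPANY relation in the usage note are hypothetical.

```python
def generate_queries(select_attr, conditions, join_paths=((),)):
    """Build one SQL string per join path.

    select_attr: qualified attribute to return, e.g. "JOB.Description"
    conditions:  list of (qualified attribute, value) pairs
    join_paths:  iterable of join paths; each path is a sequence of
                 (left attribute, right attribute) join conditions
    """
    queries = []
    for path in join_paths:
        attrs = [select_attr] + [attr for attr, _ in conditions]
        attrs += [side for pair in path for side in pair]
        # Every relation mentioned by an attribute goes into the FROM clause.
        relations = sorted({a.split(".")[0] for a in attrs})
        where = []
        for attr, value in conditions:
            escaped = str(value).replace("'", "''")  # escape quotes in literals
            where.append(f"{attr} = '{escaped}'")
        # Join conditions are added to the WHERE clause.
        where += [f"{left} = {right}" for left, right in path]
        sql = f"SELECT DISTINCT {select_attr} FROM {', '.join(relations)}"
        if where:
            sql += " WHERE " + " AND ".join(where)
        queries.append(sql + ";")
    return queries


single = generate_queries(
    "JOB.Description",
    [("JOB.Platform", "Unix"), ("JOB.Company", "HP")],
)
multi = generate_queries(
    "JOB.Description",
    [("COMPANY.Name", "HP")],
    join_paths=[(("JOB.CompanyId", "COMPANY.Id"),)],
)
```

The single-relation call produces one query with no join conditions. The multi-relation call produces one query per supplied join path, each with its join conditions appended to the WHERE clause. Production code would normally pass values as query parameters instead of building literals into the string.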
The equivalence checker tests whether there are
multiple distinct solutions to the maxflow problem
and whether these solutions translate into distinct
If this NLIDB finds two distinct SQL queries, it does
not output an answer.
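The "no answer when ambiguous" rule can be sketched as below. Only the final comparison of SQL strings is shown; the maxflow part is not modelled. The whitespace-only `normalize` helper is a crude, assumed notion of when two queries count as the same.

```python
def normalize(sql):
    """Crude normalization (assumption): collapse whitespace, drop the final ';'."""
    return " ".join(sql.split()).rstrip(";").strip()


def answer_or_none(candidate_queries):
    """Return the query if all candidates are the same query, else None."""
    distinct = {normalize(q) for q in candidate_queries}
    if len(distinct) != 1:
        return None  # ambiguous (or empty): do not output an answer
    return candidate_queries[0]
```

For example, two candidates that differ only in spacing are treated as one query, while candidates that filter on different attributes cause the function to return `None`.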
o Natural Query
What are the HP jobs on a Unix system?
Syntactic markers (are, the, on, a)
Tokens (job, system, HP, Unix, what)
o SQL Query
SELECT DISTINCT Description FROM JOB
WHERE Platform = 'Unix' AND Company = 'HP';
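To see the generated query at work, the sketch below runs it against an in-memory SQLite database, with HP treated as the company and Unix as the platform, as the question implies. The three sample rows are invented purely for illustration. The column names follow the example query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE JOB (Description TEXT, Platform TEXT, Company TEXT)")
# Hypothetical sample rows, for illustration only.
conn.executemany(
    "INSERT INTO JOB VALUES (?, ?, ?)",
    [
        ("System Administrator", "Unix", "HP"),
        ("Web Developer", "Windows", "HP"),
        ("Database Engineer", "Unix", "IBM"),
    ],
)

rows = conn.execute(
    "SELECT DISTINCT Description FROM JOB "
    "WHERE Platform = 'Unix' AND Company = 'HP'"
).fetchall()
conn.close()
```

Only the row that is both an HP job and a Unix job is returned, so `rows` holds the single description "System Administrator".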
WHERE NLIDB IS USED?
There are many applications that can take
advantage of NLIDB.
1. In PDA (Personal Digital Assistant) and cell phone
environments, the display screen is not as wide as
that of a desktop or laptop computer. Filling in a
form that has many fields can be tedious: one may
have to navigate through the screen, scroll, look up
the scroll box values, etc.
2. With NLIDB, the only work that needs to be
done is to type the question, much as one would
type an SMS (Short Message Service) message.
No Artificial Language
Simple, easy to use
Better for Some Questions
Easy to Use for Multiple Database Tables
Linguistic coverage is not obvious
Linguistic vs. conceptual failures
Although several NLIDB systems have been
developed for commercial use, NLIDB systems
are not in widespread use and are not a
standard option for interfacing with a database.
This lack of acceptance is mainly due to the many
deficiencies of NLIDB systems in understanding
natural language.