International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.6, December 2012
DOI : 10.5121/ijcseit.2012.2602
EFFICIENT SCHEMA BASED KEYWORD SEARCH IN
RELATIONAL DATABASES
Myint Myint Thein (1) and Mie Mie Su Thwin (2)
(1) University of Computer Studies, Mandalay, Myanmar
mmyintt@gmail.com
(2) University of Computer Studies, Mandalay, Myanmar
drmiemiesuthwin@mmcert.org.mm
ABSTRACT
Keyword search in relational databases allows users to search for information without knowing the
database schema or using Structured Query Language (SQL). In this paper, we address the problem of
generating and evaluating candidate networks. In candidate network generation, overhead arises because
the number of joining tuples grows with the size of the minimal candidate network. To reduce this
overhead, we propose candidate network generation algorithms that generate a minimum number of joining
tuples subject to the maximum number of tuple sets. We first generate a set of joining tuples, the
candidate networks (CNs). Because it is difficult to obtain an optimal query processing plan when a
large number of joins is generated, we also develop a dynamic CN evaluation algorithm (D_CNEval) that
generates connected tuple trees (CTTs) while reducing the size of intermediate join results. We evaluate
the proposed algorithms on the IMDB and DBLP datasets and compare them with existing algorithms.
KEYWORDS
Candidate Network, Connected Tuple Tree, Joining Tuples, Keyword Query, Keyword Search, Relational
Database
1. INTRODUCTION
The most critical and valuable data, such as business data, is stored in relational databases. A
relational database management system (RDBMS) is a DBMS in which both the data and the
relationships among the data are stored in tables. The data can be reassembled and accessed in
many different ways without changing the table structure. Most commercial relational database
management systems use SQL to access the database. With more and more data being stored in
relational databases, it has become crucial for users to be able to search and browse the
information stored in them. Keyword search in relational databases enables ordinary users, who
do not understand the database schema or SQL, to find the connected tuple sets among the
tuples stored in relations for a given set of keywords. Existing methods of keyword search in
relational databases can be broadly classified into two categories: schema-based methods and
graph-based methods.
Schema-based keyword search in relational databases commonly works by generating candidate
networks over a schema graph derived from the relations. In relational databases, data is stored
as columns and tables linked by primary key to foreign key relationships. We illustrate the
schema graph with two examples. Figure 1 shows the schema graph of the publication database
from the DBLP dataset. It consists of six relation schemas: Person, InProceeding,
RelationPersonInProceeding, Proceeding, Publisher and Series. Each relation has a primary key
except the RelationPersonInProceeding relation. The InProceeding relation has one foreign key
that refers to the primary key of the Proceeding relation. The Proceeding relation has two
foreign keys that refer to the primary keys of the Publisher and Series relations. Figure 2
shows the schema graph of the movies database from the IMDB dataset. It consists of six relation
schemas: Movies, Directors, Movies-Directors, Movies-Genres, Actors and Roles. Each relation has
a primary key except the Movies-Directors relation. The Roles relation has one foreign key that
refers to the primary key of the Actors relation.
The logical unit of an answer to a keyword query is not limited to an individual column value or
even an individual tuple; it may be multiple tuples joined together. For a keyword query over a
relational database, a candidate network is a minimal joining set of the tuple sets of relations
that contain the keywords, and it can be expressed as an SQL join. A candidate network must
satisfy two conditions: it must be total and minimal. Because an answer is meaningless if two
tuples in a candidate network are too far away from each other, the maximum number of tuples
allowed in a candidate network must be specified [18].
Suppose a user wants to get the papers written by “Jinlin Chen” from the DBLP database. The system
generates the relevant CNs, such as Person ⋈ RelationPersonInProceeding ⋈ InProceeding,
with multiple tuples from different relations joined by foreign keys. Evaluating all valid
candidate networks produces connected tuple trees by joining tuples from multiple relations.
DISCOVER [4], S-KWS [10], Liu et al. [7], and SPARK2 [9] are systems that support keyword
search on relational databases. They generate tuple trees as answers from the generated CNs. The
first two systems need to reduce the cost of generating minimal CNs, while the last two systems
do not address the growth in the number of CNs even for small CN sizes. In existing candidate
network generation, the CN size is unbounded and the number of CNs grows very large even for
small CN sizes. This brings a large overhead for CN generation. Because a large number of
generated CNs must be evaluated, CN evaluation becomes a multi-query optimization problem.
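As an illustration, the CN Person ⋈ RelationPersonInProceeding ⋈ InProceeding for the query “Jinlin Chen” can be read as an SQL join. The Python sketch below runs such a join on an in-memory SQLite database. The column names (person_id, inproceeding_id, name, title), the helper names and the toy rows are assumptions made for this example; they are not taken from the DBLP schema used in the paper.

```python
import sqlite3


def build_demo_db():
    """Create a toy publication database (hypothetical columns and rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Person (person_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE InProceeding (inproceeding_id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE RelationPersonInProceeding (
            person_id INTEGER REFERENCES Person(person_id),
            inproceeding_id INTEGER REFERENCES InProceeding(inproceeding_id));
        INSERT INTO Person VALUES (1, 'Jinlin Chen'), (2, 'Another Author');
        INSERT INTO InProceeding VALUES (10, 'Paper A'), (11, 'Paper B');
        INSERT INTO RelationPersonInProceeding VALUES (1, 10), (2, 10), (2, 11);
    """)
    return conn


def papers_by_author(conn, keyword):
    """Evaluate the CN Person ⋈ RelationPersonInProceeding ⋈ InProceeding.

    Only Person is filtered by the keyword; the other two relations are
    joined through their foreign keys.
    """
    sql = """
        SELECT p.name, i.title
        FROM Person AS p
        JOIN RelationPersonInProceeding AS r ON r.person_id = p.person_id
        JOIN InProceeding AS i ON i.inproceeding_id = r.inproceeding_id
        WHERE p.name LIKE ?
        ORDER BY i.title
    """
    return conn.execute(sql, (f"%{keyword}%",)).fetchall()
```

Each returned row corresponds to one connected tuple tree of size 3: a Person tuple, a linking tuple and an InProceeding tuple.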
In this paper, we focus on generating valid, data-bounded CNs and producing the minimal
connected tuple trees. We develop algorithms that generate all minimal connected trees of
tuples in the database with no more than the maximum number of tuple sets. We propose
candidate network generation algorithms that find relevant answers on-the-fly by joining tuples in
the database. We also propose a dynamic CN evaluation algorithm (D_CNEval) that evaluates
the generated CNs. We conduct experiments on the DBLP and IMDB databases and present a
worst-case time analysis of these algorithms.
The rest of the paper is organized as follows. Section 2 discusses related work. Section 3
presents the basic concepts of keyword queries and CNs. Section 4 gives an overview of the
proposed system. Section 5 presents candidate network generation. Section 6 illustrates
query execution and Section 7 shows the experimental results. Section 8 concludes the paper.
2. RELATED WORK
The main goal of a keyword search system is to find a set of closely inter-connected tuples that
collectively match the keywords. One type of method models the data as a graph and the
results as subtrees or sub-graphs. Another type of method is based on relational databases
where structured data are stored.
Several studies have been conducted on early keyword search systems for relational databases [2,
7, 8, 15]. Yu et al. [18] surveyed the developments on finding structural information among tuples
in an RDB using an l-keyword query. They discussed keyword search systems by comparing
schema-based keyword search and graph-based keyword search in RDBs. The former
evaluates the sets of answers by defining all minimal total joining networks of tuples between
CNs, and the latter shows how to answer keyword queries using graph algorithms focused on
weighted directed graphs. DBXplorer [1] used an undirected graph to construct an SQL statement
for each tuple tree. This system accessed a symbol table to get the tuples' information, and
then calculated the tuple trees according to the schema graph.
DISCOVER [4] proposed a CN generation algorithm based on a breadth-first traversal of the
search space. This algorithm expands the generated partial CNs into larger partial CNs
until all CNs are generated. As the number of partial CNs can be exponentially large, arbitrary
expansion makes the algorithm extremely inefficient. The problem with this algorithm is that
the cost of generating the set of CNs is high, and the set is kept in memory for further extension.
S-KWS [10] developed an algorithm that reduces the number of partial results generated by
expanding from only part of the nodes in a partial tree, and avoids isomorphism testing by assigning a
proper expansion order. Although it reduced the generated partial results, it still incurred overhead for
generating the minimal CNs for the query.
Liu et al. [7] described an answer graph generation algorithm to generate tuple trees. Although
they produced duplication-free CNs by assigning different aliases, they did not consider the
efficiency of answer generation. SPARK2 [9] developed a duplication-free algorithm using a
canonical form, but it did not solve the problem that the number of CNs grows very large even for small CN sizes.
3. PRELIMINARIES
3.1. Data Model
A relational database can be viewed as a graph that represents the relational model, such as the
schema graph Gs(V, E) [4, 8, 16, 18]. A relational database is a collection of relations. Each
relation in the database corresponds to a vertex in Gs, and the vertices are denoted as the set of
relation schemas {R1, R2, …}. Edges represent the foreign key to primary key relationships between
pairs of relation schemas Ri and Rj, denoted Ri→Rj. A relation on relation schema Rj is an instance
of the relation schema, that is, a set of tuples conforming to the relation schema. The graph can be
a directed or an undirected graph. It can capture every granularity level of the schema elements.
Figure 1. Publication Database Schema Graph
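The data model of Section 3.1 can be sketched as a small directed graph structure in Python. The edge list follows the DBLP description in Section 1; the two edges leaving RelationPersonInProceeding are an assumption inferred from the join Person ⋈ RelationPersonInProceeding ⋈ InProceeding, and the class and attribute names are hypothetical.

```python
from collections import defaultdict

# Directed edges Ri -> Rj: Ri holds a foreign key that refers to the primary key of Rj.
DBLP_EDGES = [
    ("RelationPersonInProceeding", "Person"),        # assumed from the example CN
    ("RelationPersonInProceeding", "InProceeding"),  # assumed from the example CN
    ("InProceeding", "Proceeding"),
    ("Proceeding", "Publisher"),
    ("Proceeding", "Series"),
]


class SchemaGraph:
    """Schema graph Gs(V, E): one vertex per relation, one edge per FK-PK link."""

    def __init__(self, edges):
        self.vertices = set()
        self.out_edges = defaultdict(set)   # directed view: FK side -> PK side
        self.neighbors = defaultdict(set)   # undirected view, used when joining
        for fk_rel, pk_rel in edges:
            self.vertices.update((fk_rel, pk_rel))
            self.out_edges[fk_rel].add(pk_rel)
            self.neighbors[fk_rel].add(pk_rel)
            self.neighbors[pk_rel].add(fk_rel)

    def adjacent(self, relation):
        """Relations joinable with `relation` through one FK-PK relationship."""
        return sorted(self.neighbors[relation])
```

Keeping both a directed and an undirected view reflects the remark that the schema graph can be treated either way: direction records which side holds the foreign key, while joins can follow an edge in both directions.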
Figure 2. Movies Database Schema Graph
We use the directed schema graphs shown in Figure 1 and Figure 2 as the schema graphs of the
publication database and the movies database. For simplicity, we assume that every primary
key and foreign key attribute has the same name as the corresponding attribute of the related
relation. There are no self loops and at most one primary-foreign key relationship between any
two relations.
3.2. Connected Tuple Tree
A keyword query Q consists of a list of keywords {k1, k2, …, kq} and searches for interconnected
tuples that contain the given keywords. For a given query Q, a result is the set of all possible
joining networks of tuples. A joining network of tuples is a connected tuple tree (CTT). Each node ti
is a tuple in the database, and each pair of adjacent tuples in a CTT is connected via a foreign key to
primary key relationship. Suppose (Ri, Rj) is an edge in the schema graph. Let ti ∈ Ri, tj ∈ Rj, and
(ti ⋈ tj) ∈ (Ri ⋈ Rj). Then (ti, tj) is an edge in the connected tuple tree. The size of a CTT is
the number of tuples involved. Note that a single tuple is a connected tuple tree of size 1. A CTT
can be arbitrarily large when there is a many-to-many relationship in the schema graph. Therefore,
the size of a connected tuple tree is bounded only by the data.
3.3. Candidate Network
Each connected tuple tree is produced by a relational algebra expression over sets of tuples
identified by relation names, where the tuples of a relation contain a keyword term. For a given
keyword query Q, the query tuple set R^N is the set of all tuples of relation R that
contain at least one keyword of the query Q. We denote by R^F the free tuple set, which is the set
of all tuples in relation R, and we use R^Q to denote a tuple set that can be either a non-free
tuple set or a free tuple set. A candidate network is a tree of tuple sets R^N or R^F with the
restriction that every node must be a query tuple set. Every edge (Ri^Q, Rj^Q) in a CN corresponds
to an edge (Ri, Rj) in the schema graph Gs. The size of a CN is the number of its tuple sets.
In the framework of an RDBMS, a keyword query is processed in two main steps: candidate network
generation and candidate network evaluation. The candidate network generation step generates a
set of CNs over the schema graph Gs. The set of CNs must be sound and complete, and
duplicate-free, within the maximum size. The candidate network evaluation step evaluates the
generated CNs while reducing the size of intermediate join results. We present how to generate
a minimal number of CNs and how to evaluate the generated CNs in Section 5 and Section 6.
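The definitions of Section 3.3 can be made concrete with a minimal Python sketch. It computes a non-free tuple set R^N by substring matching and checks the structural conditions stated for a CN: it is a tree, every edge corresponds to a schema graph edge, and its size does not exceed a maximum. The function names, the dictionary-based row format and the matching rule are assumptions for this example; the sketch does not test the total and minimal conditions.

```python
def non_free_tuple_set(rows, keywords):
    """R^N: rows of one relation that contain at least one query keyword."""
    lowered = [k.lower() for k in keywords]
    return [row for row in rows
            if any(k in str(value).lower() for value in row.values() for k in lowered)]


def is_candidate_network(nodes, edges, schema_edges, maxn):
    """Check tree shape, schema conformance and size of a candidate network.

    nodes: dict mapping a node id to the relation name of its tuple set.
    edges: list of (node id, node id) pairs between tuple sets.
    schema_edges: list of (Ri, Rj) pairs taken from the schema graph.
    """
    if not nodes or len(nodes) > maxn:
        return False                      # size is the number of tuple sets
    if len(edges) != len(nodes) - 1:
        return False                      # a tree has exactly n - 1 edges
    allowed = {frozenset(pair) for pair in schema_edges}
    adjacency = {node: [] for node in nodes}
    for a, b in edges:
        if frozenset((nodes[a], nodes[b])) not in allowed:
            return False                  # edge has no counterpart in Gs
        adjacency[a].append(b)
        adjacency[b].append(a)
    # With n - 1 edges, the network is a tree exactly when it is connected.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)
```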
4. PROPOSED SYSTEM OVERVIEW
In this section, we give an overview of keyword search on relational databases, which is
shown in Figure 3. The system supports free-style keyword search by generating connected tuple
trees from user-typed keywords. In this system, the final results are obtained by processing the
following four phases. The query cleaning phase removes stopwords from the query so that they are
filtered out as potential index terms. This process reduces the size of the indexing structure considerably.
The indexing unit in a relational document can be a field, attribute, tuple, table, or any
combination of these. After the system has built the inverted index files as a posting table for each
relation, the indexer produces the matched tuple sets using the filtered input query. The system then
generates a set of CNs by traversing the schema graph according to the tuple sets. The query
execution phase executes a query for each CN and generates the connected tuple trees as the results
of the executed queries. Finally, the system returns the minimal connected tuple trees to the user
for the given query.
Figure 3. Proposed System Overview
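The first two phases of the overview, query cleaning and index lookup, can be sketched as follows. This is a minimal illustration under stated assumptions: the stopword list, the whitespace tokenizer, the function names and the tuple-level indexing unit are all choices made for the example and are not specified in the paper.

```python
from collections import defaultdict

# Hypothetical stopword list; the paper does not state which list is used.
STOPWORDS = {"a", "an", "the", "of", "in", "by", "and", "or", "on", "for"}


def clean_query(query):
    """Query cleaning: lower-case the query and drop stopwords."""
    return [term for term in query.lower().split() if term not in STOPWORDS]


def build_inverted_index(relations):
    """Build a posting table: term -> set of (relation, tuple id).

    relations: dict mapping a relation name to a list of (tuple id, text).
    """
    index = defaultdict(set)
    for relation, rows in relations.items():
        for tuple_id, text in rows:
            for term in text.lower().split():
                index[term].add((relation, tuple_id))
    return index


def matched_tuple_sets(index, keywords):
    """Per relation, the ids of tuples containing at least one keyword."""
    result = defaultdict(set)
    for keyword in keywords:
        for relation, tuple_id in index.get(keyword, ()):
            result[relation].add(tuple_id)
    return dict(result)
```

The relations that appear in the output of matched_tuple_sets are the ones with a non-empty non-free tuple set, which is the input that CN generation works from.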
5. CANDIDATE NETWORK GENERATION
In schema-based keyword search in relational databases, the set of all candidate networks generated
for a keyword query Q must satisfy two properties, completeness and duplication-freeness, which are
listed below.
Property 1. The set contains all CNs with no more than MAXN tuple sets (completeness).
Property 2. Every two CNs are not isomorphic to each other (duplication-free).
In this paper, we propose CN generation algorithms for schema-based keyword search in
relational databases. Existing keyword search systems, such as DISCOVER and S-KWS, generate
CNs temporarily through a breadth-first traversal of the schema graph for any user query. The resulting
set of CNs should avoid redundant joining networks of tuple sets. As the number
of query keywords or the maximum size of a CN increases, or the database schema becomes
complicated, it takes much more time to generate CNs for a query Q. There are two ways
to reduce the time for generating CNs. One is to develop a more efficient CN generation
algorithm, and the other is to develop preprocessing techniques that generate CNs in advance. In
this section, we develop efficient CN generation algorithms to address the above problem.
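Property 2 requires that no two generated CNs are isomorphic. One common way to test this, shown here only as an illustrative stand-in and not as the method used in this paper, is to map each CN to a canonical string so that isomorphic trees receive the same string. The sketch assumes that node labels such as "Person^N" contain no parentheses; the function names are hypothetical.

```python
def canonical_form(labels, edges):
    """Canonical string of an unrooted labelled tree.

    labels: dict mapping a node id to its tuple-set label.
    edges: list of (node id, node id) pairs forming a tree.
    """
    adjacency = {node: [] for node in labels}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    def encode(node, parent):
        # Sorting the child encodings makes the result independent of child order.
        children = sorted(encode(child, node)
                          for child in adjacency[node] if child != parent)
        return "(" + labels[node] + "".join(children) + ")"

    # Trying every node as the root makes the result independent of the root.
    return min(encode(root, None) for root in labels)


def deduplicate(networks):
    """Keep one representative per isomorphism class of (labels, edges) pairs."""
    seen, unique = set(), []
    for labels, edges in networks:
        key = canonical_form(labels, edges)
        if key not in seen:
            seen.add(key)
            unique.append((labels, edges))
    return unique
```

Trying every root costs quadratic time in the number of nodes, which is acceptable here because a CN has at most MAXN tuple sets.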
5.1. Heuristic_CNGen Algorithm
In this section, we describe a new CN generation algorithm (Heuristic_CNGen) that generates valid
CNs [13]. Given a keyword query Q, the system first receives the query tuple sets R^Q of all
relations R as input. We use R^NorQ to denote a tuple set: if a CN is a result, each of its nodes
belongs to either the non-free query tuple set R^N or the free query tuple set R^F of a relation R
for the given query. Note that a free query tuple set in a CN does not contain the query keywords,
but it supports the non-free query tuple sets through primary-foreign key relationships.
Figure 4. Heuristic_CNGen Algorithm
Heuristic_CNGen(MAXN, f_limit, f_new)
Input: schema graph SG, query Q
Output: a set of candidate networks CN
1. E: queue generated all non-free tuple sets {R1^N, R2^N, …, Rn^N} and all
   free tuple sets {R1^F, R2^F, …, Rn^F} for Q.
2. While E is not empty {
3.   Pop head T from E
4.   If T > MAXN Then T is pruned
5.   Else
6.     If T is a valid network graph Then add T to CN
7.     For each Ri^NorQ in T do
8.       For each Rj^N that is adjacent to Ri^N in SG do
9.         f_new = h(Rj^N)
10.        g(Rj^N) = c(Rj^N, Ri^N)
11.        f(Rj^N) = g(Rj^N) + f_new
12.        If f(Rj^N) ≤ f_limit Then
13.          Add Rj^N in front of E
14.        Else
15.          Add Ri^N in front of E
16.          f_new = f(Rj^N)
17.          f_limit = MAX.VALUE
18.        End if
19.      End for
20.    End for
21.  End if
   }
22. For each network graph in CN, if there is more than one network
    graph in CN, then the same network graph is pruned.
23. Return CN.
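The control flow of Figure 4 can be approximated by a short Python sketch. It is a simplified illustration and not the authors' exact procedure: each relation appears at most once in a network, the schema graph is assumed to be acyclic so that a set of relations determines its joins, the cost c and heuristic h default to 1 and 0, f_limit is a fixed threshold that is never raised, and "valid" is read as total (all keyword relations covered) and minimal (no free relation as a leaf). All names are hypothetical.

```python
from collections import deque

# Undirected adjacency of the DBLP schema graph; edges of the linking relation are assumed.
DBLP_ADJ = {
    "Person": ["RelationPersonInProceeding"],
    "RelationPersonInProceeding": ["Person", "InProceeding"],
    "InProceeding": ["RelationPersonInProceeding", "Proceeding"],
    "Proceeding": ["InProceeding", "Publisher", "Series"],
    "Publisher": ["Proceeding"],
    "Series": ["Proceeding"],
}


def is_total_and_minimal(network, adjacency, keyword_relations):
    """Total: covers every keyword relation. Minimal: no free relation is a leaf."""
    if not keyword_relations <= network:
        return False
    for rel in network:
        degree = sum(1 for nb in adjacency[rel] if nb in network)
        if degree <= 1 and rel not in keyword_relations:
            return False
    return True


def generate_networks(adjacency, keyword_relations, maxn, f_limit,
                      h=lambda rel: 0, c=lambda new, old: 1):
    """Grow networks from the keyword relations, pruning by MAXN and f_limit."""
    keyword_relations = frozenset(keyword_relations)
    queue = deque(frozenset([r]) for r in sorted(keyword_relations))
    seen = set(queue)                     # plays the role of the duplicate pruning step
    results = []
    while queue:
        t = queue.popleft()               # pop head T from E
        if len(t) > maxn:
            continue                      # T is pruned
        if is_total_and_minimal(t, adjacency, keyword_relations):
            results.append(t)
        if len(t) == maxn:
            continue                      # any extension would exceed MAXN
        for ri in sorted(t):
            for rj in adjacency[ri]:
                if rj in t:
                    continue
                f = c(rj, ri) + h(rj)     # f = g + h for the new node
                if f > f_limit:
                    continue
                grown = t | {rj}
                if grown not in seen:
                    seen.add(grown)
                    queue.appendleft(grown)   # add in front of E
    return results
```

For example, with keyword relations Person and Publisher, MAXN = 5 and f_limit = 1, the sketch returns the single network that joins Person, RelationPersonInProceeding, InProceeding, Proceeding and Publisher; with MAXN = 4 it returns nothing, because the two keyword relations cannot be connected within four tuple sets.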
We identify a network graph as a joined expression of the query tuple sets that produces
candidate networks as result. We define the size of a network graph as the number of its nodes, which
is the same as the size of the generated CN. In Figure 4, we present the candidate network generation
algorithm, which is based on the IDA* algorithm [3, 6], to generate all network graphs for a given
query Q and schema graph SG.
We set up three parameters: MAXN, f_limit and f_new. First, MAXN denotes the maximum number of
tuple sets in a network graph; it reduces the generation of meaningless results. Second, the node
Rj^N is added to the front of queue E if the estimated cost of the cheapest solution through node Rj^N
is less than the given f_limit value. If the estimated cost of node Rj^N is more than the f_limit
value, the node Ri^N that is adjacent to node Rj^N in SG is added to the front of E. Third, f_new is
assigned the heuristic value of a new node that is adjacent to the existing node in the schema graph.
Finally, the number of CNs is bounded only by the query and the database. The results of the
algorithm satisfy Properties 1 and 2 (completeness and duplication-freeness), provided that we do
not violate any constraints.
5.2. AT_CNGen Algorithm
We present another CN generation algorithm (AT_CNGen) to improve the performance of
Heuristic_CNGen. In CN generation, we used the Heuristic_CNGen algorithm to reduce the
computation cost and memory cost. This algorithm generates valid CNs that are complete
and duplication-free. It computes a heuristic value to estimate the cost of a new node that is
adjacent to the existing node in SG at once. If two nodes in SG have the same heuristic value, the
algorithm expands these nodes iteratively. We observe that this algorithm is not efficient because
it does not reduce the large overhead for the above shortcomings. Therefore, we propose a new
CN generation algorithm (AT_CNGen) based on an adjacent tuple list to address the efficiency of
CN generation.
We can identify an adjacent tuple by following the primary and foreign keys in the schema graph,
because the relational database is designed as a schema graph SG. An adjacent tuple is defined as a
set of adjacent tuples connected by primary-foreign key relationships in SG. In this way, we
recognize an adjacent tuple as a joined expression of the query tuple sets that produces CNs as
result. We define the size of an adjacent tuple as the number of tuples, which is the same as the
size of the generated CN.
Figure 5. Adjacent Tuple List for DBLP
Consider the publication database schema graph in Figure 1 and the movies database schema graph in Figure 2, where we can iteratively visit each tuple in SG by following the primary-foreign key relationships and obtain the adjacent tuples. For example, the adjacent tuple list for query Q = “Chen Web Springer” in DBLP and the adjacent tuple list for query Q = “Black Jack David” in IMDB are shown in Figure 5 and Figure 6.
Figure 6. Adjacent Tuple List for IMDB
The adjacent tuple list based method has several features. First, adjacent tuple lists are effective for generating candidate networks because they capture structure: they can depict meaningful and non-isomorphic structures. Second, the relationships between adjacent tuples through primary-foreign keys can be identified, so we can efficiently generate the valid candidate networks. Third, the set of adjacent tuple lists is no larger than the total number of primary-foreign key relationships in the underlying database.
Figure 7. AT_CNGen Algorithm
AT_CNGen()
Input: A set of adjacent tuple list D
Output: A set of CN
1. CN ← Φ
2. For each T ∈ getTupleSet(ki) do
3.   For each D ∈ getAdjacentList(T) do
4.     For each d ∈ D do
5.       If (d is a valid CN and d ∉ CN) Then
6.         Add d into CN.
7.       Else
8.         d is pruned.
9.       End if
10.     End for
11.   End for
12. End for
13. Return CN.
The algorithm shown in Figure 7 generates all CNs for a given query Q and schema graph SG. In order to generate valid CNs, AT_CNGen first accepts the adjacent tuple lists as input. During each CN generation, AT_CNGen calls getTupleSet(K) to get the query tuple sets T for the given query Q. It then calls getAdjacentList(T,MAXN) to obtain the adjacent tuple lists for those query tuple sets. Each adjacent tuple list d is added to CN if d is a valid CN and does not duplicate one that is already in CN. If d is invalid or a duplicate, d is pruned.
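The outer loop of AT_CNGen can be sketched in Python as follows. This is only an illustration under stated assumptions: the names at_cn_gen and covers_query, the representation of a candidate as a tuple of (relation, keywords) pairs, and the validity test (every query keyword is covered and neither end of the path is a free tuple set) are hypothetical choices for this example and are not taken from the original implementation.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple

Node = Tuple[str, FrozenSet[str]]   # (relation, keywords it must contain); empty set = free
Candidate = Tuple[Node, ...]        # tuple sets joined along primary-foreign key edges


def at_cn_gen(adjacent_lists: Iterable[Iterable[Candidate]],
              is_valid: Callable[[Candidate], bool]) -> List[Candidate]:
    """Keep each valid candidate once; invalid or duplicated candidates are pruned."""
    cns: List[Candidate] = []
    seen = set()
    for adjacent_list in adjacent_lists:
        for d in adjacent_list:
            key = frozenset((d, d[::-1]))  # a path and its reverse describe the same join
            if is_valid(d) and key not in seen:
                seen.add(key)
                cns.append(d)
    return cns


# Hypothetical two-keyword query over part of the DBLP schema.
QUERY = frozenset({"chen", "web"})
P = ("Person", frozenset({"chen"}))
RPI = ("RelationPersonInProceeding", frozenset())
INP = ("InProceeding", frozenset({"web"}))


def covers_query(d: Candidate) -> bool:
    """Assumed validity test: all keywords covered and no free tuple set at either end."""
    covered = frozenset().union(*(keywords for _, keywords in d))
    return covered == QUERY and bool(d[0][1]) and bool(d[-1][1])


adjacent_lists = [[(P, RPI), (P, RPI, INP)], [(INP, RPI), (INP, RPI, P)]]
cns = at_cn_gen(adjacent_lists, covers_query)
print(cns)
```

In this example the two-node paths are pruned as invalid, and the path that starts from InProceeding is pruned as a duplicate of the path that starts from Person, so a single CN remains.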
Figure 8. getTupleSet Algorithm
In Figure 8, getTupleSet(K) returns a set of query tuple sets T for a keyword query Q. It takes one parameter, K, which is the number of keywords. We receive all the non-free query tuple sets $R^N$ and the free query tuple sets $R^F$ of each relation R for the given query. First, the algorithm checks the length of the input keyword query. If the length of the keyword query is one, it produces the non-free query tuple set $R_i^N$ by using the inverted index for that keyword, and it returns the query tuple sets. When the length of the keyword query is more than one, a non-free query tuple set $R_i^N$ is built from the index for each keyword of the query. The algorithm then returns T after adding every $R_i^N$ that is not identical to one already in T.
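As a rough illustration of getTupleSet, the following Python sketch builds the non-free tuple sets from an inverted index. The index layout (keyword, then relation, then tuple identifiers), the function name get_tuple_set and the sample data are assumptions made for this example only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical inverted index: keyword -> {relation name -> ids of matching tuples}.
InvertedIndex = Dict[str, Dict[str, Set[str]]]
# A non-free tuple set: (relation name, keyword, ids of the tuples that contain it).
TupleSet = Tuple[str, str, frozenset]


def get_tuple_set(keywords: List[str], index: InvertedIndex) -> List[TupleSet]:
    """Return the non-free tuple sets for the query, ignoring identical ones."""
    result: List[TupleSet] = []
    for keyword in keywords:
        matches = index.get(keyword.lower(), {})
        for relation, tuple_ids in sorted(matches.items()):
            candidate = (relation, keyword.lower(), frozenset(tuple_ids))
            if candidate not in result:  # mirrors the "not in T" test of Figure 8
                result.append(candidate)
    return result


index = {
    "chen": {"Person": {"P1", "P7"}},
    "web": {"InProceeding": {"I2"}},
    "springer": {"Publisher": {"U1"}},
}
tuple_sets = get_tuple_set(["Chen", "Web", "Springer", "chen"], index)
print([(relation, keyword) for relation, keyword, _ in tuple_sets])
```

The repeated keyword in the sample query yields a tuple set that is already in the result, so it is ignored.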
getAdjacentList(T,MAXN), shown in Figure 9, takes two parameters: T and MAXN. It first receives the non-free query tuple sets T and the schema graph SG as input. For each tuple set $R_i^N$, it takes each $R_j^{N \text{ or } F}$ that is adjacent to $R_i^N$ in the schema graph as an adjacent tuple d. Next, the algorithm checks that each adjacent tuple d is not a duplicate and contains no more than the maximum number of tuple sets, and it removes any invalid adjacent tuple. Finally, the algorithm returns the list of valid adjacent tuples.
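One possible reading of getAdjacentList is a bounded path enumeration over the schema graph, sketched below in Python. The adjacency map is inferred from the key attributes listed in Table 1 and should be treated as an assumption, as should the function name get_adjacent_list and the decision to treat a path and its reverse as the same adjacent tuple.

```python
from typing import Dict, List, Set, Tuple

# Assumed DBLP schema graph, inferred from the primary and foreign keys in Table 1.
SCHEMA_GRAPH: Dict[str, Set[str]] = {
    "Person": {"RelationPersonInProceeding"},
    "RelationPersonInProceeding": {"Person", "InProceeding"},
    "InProceeding": {"RelationPersonInProceeding", "Proceeding"},
    "Proceeding": {"InProceeding", "Publisher", "Series"},
    "Publisher": {"Proceeding"},
    "Series": {"Proceeding"},
}

Path = Tuple[str, ...]


def get_adjacent_list(non_free: List[str], graph: Dict[str, Set[str]],
                      max_n: int) -> List[Path]:
    """Enumerate simple paths that start at a non-free relation and
    contain at most max_n relations, keeping each path only once."""
    found: List[Path] = []
    seen: Set[Path] = set()
    stack: List[Path] = [(relation,) for relation in non_free]
    while stack:
        path = stack.pop()
        key = min(path, path[::-1])  # canonical form shared by a path and its reverse
        if len(path) > 1 and key not in seen:
            seen.add(key)
            found.append(path)
        if len(path) < max_n:
            for neighbour in sorted(graph[path[-1]]):
                if neighbour not in path:  # keep the path simple
                    stack.append(path + (neighbour,))
    return found


paths = get_adjacent_list(["Person", "InProceeding"], SCHEMA_GRAPH, max_n=3)
for path in paths:
    print(" - ".join(path))
```

Bounding the path length by max_n plays the role of MAXN: no path is extended beyond the maximum number of tuple sets.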
getTupleSet(K)
Input: query Q = {k1, k2, …, ki} ∈ K, all non-free tuple sets {R_1^N, R_2^N, …, R_i^N} and all free tuple sets {R_1^F, R_2^F, …, R_i^F} for Q
Output: A set of query tuple set T
1. T ← Φ
2. While (K.length is not empty) {
3.   If (K.length = 1) Then
4.     R_i^N ← Index(ki)
5.     Add R_i^N into T.
6.   Else
7.     R_i^N ← Index(ki)
8.     If (R_i^N ∉ T) Then
9.       Add R_i^N into T.
10.     Else
11.       Ignore R_i^N.
12.     End if
13.   End if
14. }
15. Return T.
Eventually, the AT_CNGen algorithm generates all candidate networks that contain no more than the maximal number of tuple sets for the user's input keywords. The number of generated CNs is bounded only by the data, following Properties 1 and 2.
Figure 9. getAdjacentList Algorithm
6. QUERY EXECUTION
In this section, we present the generation of CTTs by executing the generated CNs in order to get the results. For instance, consider the query Q = “Jinlin Content”. The corresponding result consists of the connected tuple trees P1→I1 and P1→I2. Each connected tuple tree corresponds to a tree at the schema level. For example, both of the above trees correspond to the schema-level tree $Person^{\{Jinlin\}}$ → $Relation\text{-}Person\text{-}InProceeding^{\{\}}$ ← $InProceeding^{\{Content\}}$, where each $R_i^K$ consists of the tuples of $R_i$ that contain all keywords of K and no other keyword of Q.
Given a query Q, all possible tuple sets $R_i^K$ are computed, where
$R_i^K = \{\, t \mid t \in R_i \wedge \forall w_k \in K,\ t \text{ contains } w_k \wedge \forall w_j \in Q - K,\ t \text{ does not contain } w_j \,\}$ [12].
After selecting a query keyword $w_l$, all tuple sets $R_i^K$ for which $w_l \in K$ are located. These are the initial connected tuple trees, each with only one node. Then, these trees are expanded either by adding a tuple set that contains at least one other query keyword or by adding a free tuple set. These trees can be further expanded. The connected tuple trees that contain all query keywords are returned.
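The definition of the tuple sets can be made concrete with a small Python sketch that partitions the tuples of one relation by the exact subset K of query keywords they contain. The function name partition_by_keywords, the whitespace tokenization and the sample titles are assumptions for illustration only.

```python
from typing import Dict, FrozenSet, Set


def partition_by_keywords(rows: Dict[str, str],
                          query: Set[str]) -> Dict[FrozenSet[str], Set[str]]:
    """Map each keyword subset K to the ids of the tuples that contain
    every keyword of K and no keyword of Q - K."""
    groups: Dict[FrozenSet[str], Set[str]] = {}
    for tuple_id, text in rows.items():
        contained = frozenset(query & set(text.lower().split()))
        groups.setdefault(contained, set()).add(tuple_id)
    return groups


# Hypothetical InProceeding titles.
in_proceeding = {
    "I1": "Web Content Mining",
    "I2": "Content Delivery Networks",
    "I3": "Query Processing",
}
groups = partition_by_keywords(in_proceeding, {"web", "content"})
for keywords, tuple_ids in groups.items():
    print(sorted(keywords), sorted(tuple_ids))
```

Tuple I1 contains both keywords, so it belongs to the tuple set for K = {web, content} and not to the one for K = {content}. The group with the empty keyword set corresponds to the free tuple set.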
getAdjacentList(T,MAXN)
Input: Tuple set T: {R_1^N, R_2^N, …, R_i^N} ∈ T, Schema graph SG
Output: Adjacent tuple list D
1. D ← Φ
2. For each R_i^N in T do
3.   For each R_j^{N or F} that is adjacent to R_i^N in SG do
4.     Add R_j^{N or F} with an edge into d.
5.     If (d ∉ D and |d| ≤ MAXN) Then
6.       Add d into D.
7.     Else
8.       Ignore d.
9.     End if
10.   End for
11. End for
12. Return D.
In an RDBMS, the problem of evaluating all CNs in order to get all connected tuple trees is a multi-query optimization problem. There are two main issues: (1) how to share common sub-expressions among the generated CNs in order to reduce the computational cost of evaluation, and (2) how to find a proper join order to evaluate all CNs quickly. For a keyword query, the number of CNs generated can be very large. Given a large number of joins, it is extremely difficult to obtain an optimal query processing plan, because the best plan for one CN may slow down others if its subtrees are shared by other CNs [11].
The idea of evaluating shared sub-expressions is a well-known topic in query optimization. DISCOVER [4] proposed an algorithm to evaluate all CNs together using a greedy algorithm. In this algorithm, sub-expressions that are shared by most CNs should be evaluated first, and sub-expressions that may generate the smallest number of results should be evaluated first. S-KWS [10] constructed an operator mesh in order to share the computational cost of evaluating all CNs. When evaluating all CNs in a mesh, a projected relation with the smallest number of tuples is selected to start and to join. Qin et al. [11] observed that evaluating all CNs using only joins may always generate a large number of temporary tuples. They proposed to use semijoin/join sequences to evaluate a CN.
In this paper, we present an algorithm for executing CNs as our CN evaluation strategy. We observe that there is substantial cost in evaluating the common join expressions among CNs. As a consequence, computational effort can be saved if multiple CNs are executed in a calculated way that minimizes the sizes of the intermediate join results.
6.1. Evaluating Candidate Network
We present the D_CNEval algorithm for CN evaluation, shown in Figure 12, which is based on the idea of the dynamic query optimization algorithm [14]. The D_CNEval algorithm is devised to perform this task. In general, it returns the result of query execution.
D_CNEval takes one parameter, a CN, and uses the non-free tuple sets that belong to that CN as input. The algorithm first evaluates the size of the CN. If the size of the CN is equal to one, the algorithm directly returns the executed result for the CN. If the size of the CN is more than one, the non-free tuple sets of each relation in the CN, which contain the query keywords, are projected to reduce the size of the intermediate results. These projected tuple sets are executed as SQL queries and all of their results are merged. If the algorithm executes a tuple set that yields an empty result, it stops processing that tuple set and starts to execute the next relation in the CN. As a consequence, the algorithm returns all executed results for each CN. Examples of evaluated CNs for Q = “Chen Web Springer” in DBLP and Q = “Black Jack David” in IMDB are shown in Figure 10 and Figure 11.
Figure 10. Processing of Evaluated CN on DBLP
Figure 11. Processing of Evaluated CN on IMDB
Figure 12. D_CNEval Algorithm
D_CNEval(CN)
Input: non-free tuple sets {R_1^N, R_2^N, …, R_i^N} which belong to CN
Output: result of query execution EQ
1. EQ = Φ
2. n = sizeof(CN)
3. If (CN.isEMPTY()) then
4.   Return Φ
5. Else {
6.   If (n = 1) then
7.     EQ = executeTo(R_i^N)
8.   Else {
9.     For each R_i^N ∈ CN {
10.       R_i^N′ = projectTo(R_i^N)
11.       EQ′ = executeTo(R_i^N′)
12.       If (EQ′.isEMPTY()) then continue
13.       Else
14.         EQ = EQ ∪ EQ′ } }
   }
15. Return EQ.
We perform a worst-case time analysis of the D_CNEval algorithm. If the size of the CN is equal to 1, we assume that the result is executed in time O(1). The for-loop is executed at most |CN| times, once for every relation in the CN, where |CN| is the size of the candidate network. Given a CN with n relations, the attributes of each relation are projected by the non-free tuple set R_i^N, which is then executed on the projected result. In each step, we assign the query results to the same array list. Evaluating the query results in R_i^N takes time T, where T is the number of tuples in the relation that contain a keyword. We check whether a query result in R_i^N is empty in O(1) time. The query result is merged at most |T| times. Hence the total execution time in the worst case is O(|CN|·|T|). The D_CNEval algorithm can thus output the results of query execution while reducing the size of the intermediate join results.
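The project, execute and merge loop of Figure 12 can be sketched in Python with the standard sqlite3 module. This is a minimal illustration under stated assumptions: the function name d_cn_eval, the (table, column, keyword) description of a non-free tuple set, the LIKE-based keyword match and the sample rows are all hypothetical, and the sketch does not perform the joins between relations.

```python
import sqlite3
from typing import List, Tuple

# A non-free tuple set is described here as (table, text column, keyword).
NonFree = Tuple[str, str, str]


def d_cn_eval(conn: sqlite3.Connection, cn: List[NonFree]) -> List[tuple]:
    """Project each non-free tuple set, execute it as SQL, skip empty
    results and merge the remaining ones."""
    results: List[tuple] = []
    for table, column, keyword in cn:
        # Identifiers come from a fixed, trusted schema; only the keyword is bound.
        sql = f'SELECT "{column}" FROM "{table}" WHERE "{column}" LIKE ?'
        rows = conn.execute(sql, (f"%{keyword}%",)).fetchall()
        if not rows:
            continue  # empty tuple set: move on to the next relation
        results.extend((table,) + row for row in rows)
    return results


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person(Pid TEXT, Name TEXT);
    CREATE TABLE InProceeding(Iid TEXT, Title TEXT);
    CREATE TABLE Publisher(Uid TEXT, Name TEXT);
    INSERT INTO Person VALUES ('P1', 'Jinlin Chen'), ('P2', 'Peter Smith');
    INSERT INTO InProceeding VALUES ('I1', 'Query Processing');
    INSERT INTO Publisher VALUES ('U1', 'Springer');
""")

cn = [("Person", "Name", "chen"),
      ("InProceeding", "Title", "web"),
      ("Publisher", "Name", "springer")]
results = d_cn_eval(conn, cn)
print(results)
```

Selecting only the searched column stands in for the projection step, so each intermediate result carries no more attributes than are needed for the keyword match.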
6.2. Generating Connected Tuple Trees
Figure 13 shows the processing of generated CTTs. For a given query Q, a connected tuple tree is generated from an evaluated CN and consists of tuples coming from different relations. For each pair of adjacent tuple sets Ri, Rj in a connected tuple tree, there is an edge (Ri,Rj) in SG. Each CTT must satisfy the following properties:
Property 3. If a node in a connected tuple tree is a tuple of a relation, it contains at least one keyword of query Q (completeness).
Property 4. No tuple appears more than once in the connected tuple tree (duplication-free).
In Figure 14, CTT1 and CTT2 for Query 1 and Query 2 in DBLP are presented as examples. In CTT1, node P2 contains the keyword “Peter”, I3 contains the keyword “XML” and U1 contains the keyword “Springer” of Query 1. In CTT2, the nodes P3 and I4 contain the keywords “David” and “Modelling” respectively, and U1 contains the keyword “Springer” of Query 2.
Figure 13. Processing of Generated CTT
For IMDB, CTT3 and CTT4 for Query 3 and Query 4 are illustrated as examples in Figure 15. In CTT3, node M1 contains the two keywords “Love” and “Story”, and D1 contains the keyword “Elley” of Query 3. In CTT4, the nodes A2 and M3 contain the keywords “Black” and “Jack” respectively, and D2 contains the keyword “David” of Query 4. Except for the primary-foreign key relation nodes, all remaining nodes contain keywords of the given query, and there are no duplicate nodes. In this paper, we consider a connected tuple tree to be a result as long as it fulfills these properties.
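Properties 3 and 4 can be checked mechanically, as the following Python sketch suggests. The function name satisfies_ctt_properties, the explicit set of linking nodes and the tuple texts are hypothetical; the node identifiers follow CTT1 of Figure 14.

```python
from typing import Dict, List, Set


def satisfies_ctt_properties(nodes: List[str], node_text: Dict[str, str],
                             linking_nodes: Set[str], query: Set[str]) -> bool:
    """Check duplication-freeness (Property 4) and that every node other
    than a linking node contains a query keyword (Property 3)."""
    if len(set(nodes)) != len(nodes):
        return False  # a tuple occurs twice
    for node in nodes:
        if node in linking_nodes:
            continue  # primary-foreign key relation nodes carry no keyword
        words = set(node_text[node].lower().split())
        if not words & query:
            return False
    return True


# CTT1 of Figure 14 with invented tuple texts.
ctt1 = ["P2", "RPI", "I3", "R2", "U1"]
texts = {"P2": "Peter Example", "I3": "XML Views", "U1": "Springer"}
query1 = {"peter", "xml", "springer"}
print(satisfies_ctt_properties(ctt1, texts, {"RPI", "R2"}, query1))
```

A tree that repeats a tuple, or that contains a keyword-free node outside the linking nodes, is rejected by this check.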
Figure 14. Queries, CTTs and CNs for DBLP
Query 1: “Peter XML Springer”
CTT1: P2 → RPI ← I3 → R2 → U1
CN1: P^N ⋈ RPI^F ⋈ I^N ⋈ R^F ⋈ U^N
Query 2: “David Modelling Springer”
CTT2: P3 → RPI ← I4 → R4 → U1
CN2: P^N ⋈ RPI^F ⋈ I^N ⋈ R^F ⋈ U^N
Figure 15. Queries, CTTs and CNs for IMDB
Query 3: “Elley love Story”
CTT3: M1 → MD ← D1
CN3: M^N ⋈ MD^F ⋈ D^N
Query 4: “Black Jack David”
CTT4: A2 → R ← M3 → MD → D2
CN4: A^N ⋈ R^F ⋈ M^N ⋈ MD^F ⋈ D^N
7. PERFORMANCE EVALUATION
7.1. Evaluation Setup
We evaluate the search efficiency of the proposed algorithms on the DBLP and IMDB datasets. All query generation algorithms were implemented in Java, and JDBC was used to connect to the database. We conducted all the experiments on a laptop with a Core(TM) 2 Duo CPU and 2GB of memory running Windows XP. We report the average execution time over 15 runs.
Dataset: We use two real datasets in our evaluation: the original Digital Bibliography and Library Project (DBLP) dataset [5] and the Internet Movie Database (IMDB) [17]. DBLP contains publication records. IMDB contains movie records. Table 1 and Table 2 show the schema and statistics of the two datasets.
Query Set: We manually picked a large number of queries for evaluation. We attempted to include a wide variety of keywords and keyword combinations in the query sets, varying factors such as the selectivity of the keywords, the size of the most relevant answers and the number of potentially relevant answers. We focus on a subset of the queries in this experiment. There are 20 queries with query length ranging from 2 to 6.
Table 1. Statistics of DBLP Dataset
Relation Schema | #Tuples
Person(Pid,Name) | 174,709
InProceeding(Iid,Title,Pages,Rid) | 212,273
Proceeding(Rid,Title,Uid,Sid,…) | 3,007
Publisher(Uid,Name) | 86
Series(Sid,Title) | 24
RelationPersonInProceeding(Pid,IPid) | 491,777
Table 2. Statistics of IMDB Dataset
Relation Schema | #Tuples
Actors(Aid,Name) | 817,718
Directors(Did,Name) | 86,880
Movies(Mid,Name,Year,Rank) | 388,269
Movies-Directors(Mid,Did) | 406,967
Movies-Genres(Mid,Genre) | 417,784
Roles(Aid,Mid,Role) | 3,432,630
7.2. Evaluation
We implement the Heuristic_CNGen and AT_CNGen algorithms for CN generation. In practice, we observe that the AT_CNGen algorithm has a smaller overhead than the Heuristic_CNGen algorithm. To compare the performance of these algorithms, we analyze the worst-case time of both.
In the Heuristic_CNGen algorithm, the while-loop is performed at most |E| times, once for every tuple set in queue E, where |E| is the size of the queue. When a network graph T is taken from the queue, we first check whether T is larger than the maximum number of tuple sets. If it is, T is discarded in O(1) time. If it is not, we add T, together with the CN generated for each tuple set, into hash table H, which also takes O(1) time. The first for-loop is executed at most |T| times, once for each tuple set in T, where |T| is the size of the network graph, which is less than or equal to the maximum number of tuple sets. The next for-loop is executed at most |T| times because each node in T traverses its adjacent nodes in the schema graph. Generating all valid candidate networks in the schema graph takes time |T|. For completeness, the CN generation time is O(|E|-|T|). Then, we filter out the duplicated CNs with a for-loop that takes at most time |T|. Hence the total execution time in the worst case is O((|E|-|T|)^2).
In the AT_CNGen algorithm, suppose that the number of keywords is K, the size of the tuple set is T and the maximal number of adjacent tuples in the schema graph is M. The for-loop in getTupleSet(K) is executed at most |K| times, once for each keyword in the keyword query, so the time complexity of constructing the tuple set is O(|K|). After the tuple set is generated, getAdjacentList(T,MAXN) is traversed at most |T| times, once for each tuple set that has an adjacent tuple in the schema graph, so the time complexity of traversing the adjacent tuples is O(|T|). Then, we check for duplicated CNs with a for-loop that takes at most time |M|. As a consequence, the total time complexity is O(|K|·|TM|).
In summary, the AT_CNGen algorithm reduces the computation time and search space, whereas the Heuristic_CNGen algorithm expands the nodes in the schema graph that have the same heuristic values. Moreover, the AT_CNGen algorithm generates valid CNs that are complete and duplication-free. We therefore select the AT_CNGen algorithm for CN generation in the relational keyword search system.
7.3. Experimental Results of the Candidate Network Generation
We compare the proposed AT_CNGen algorithm with existing algorithms on the same DBLP and IMDB datasets. As shown in Figure 16 and Figure 17, the proposed algorithm achieves better search performance on both datasets than the existing algorithms DISCOVER and SPARK.
The proposed AT_CNGen algorithm generates the valid CNs up to the maximal CN size, and it produces them duplication-free. The new CN generation algorithm is then compared with the existing CN generation algorithms with respect to eliminating redundancies. The proposed method addresses the problem that the number of CNs grows very large even for small CN sizes. The elapsed time of SPARK is shorter than that of DISCOVER, but SPARK cannot bound the number of CNs it produces for small CN sizes. We can see that the elapsed time of the proposed AT_CNGen algorithm is considerably smaller than that of DISCOVER and SPARK.
Figure 16. Comparison of AT_CNGen and other previous algorithms on DBLP
[Plot: Computation Time (ms), axis ticks 10 to 10000, against MAX CN Size 3 to 6, for DISCOVER, SPARK and AT_CNGen]
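Figure 17. Comparison of AT_CNGen and existing algorithms on IMDB
[Plot: Computation Time (ms), axis ticks 10 to 10000, against MAX CN Size 3 to 6, for DISCOVER, SPARK and AT_CNGen]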
7.4. Experimental Results of Query Execution
In this section, we measure the computation time of the D_CNEval algorithm on the sample DBLP queries listed in Figure 18. We also use the sample IMDB queries listed in Figure 20 to evaluate the efficiency of the D_CNEval algorithm. Given a keyword query, the proposed algorithm generates the valid CNs. The generated CNs are then evaluated, reducing the size of the intermediate results, to get the final results.
Query | Keywords
Q1 | chen
Q2 | chen web content
Q3 | chen web springer
Q4 | web content
Q5 | content springer by chen
Q6 | chen web springer 2000
Q7 | david compiler generator
Q8 | compiler springer by david
Q9 | compiler generator
Q10 | david compiler generator springer
Figure 18. Keyword Queries on DBLP
Figure 19. Execution Times for Queries of DBLP
[Plot: Computation Time (ms), axis ticks 10 to 100000, for queries Q1 to Q10]
Figure 19 shows the execution times of the DBLP queries. In this figure, Q1 is executed in the minimum time because the query is evaluated against a single relation. Q7, Q9 and Q10 are also evaluated in minimal time, although two or more relations are joined through primary-foreign key relationships.
The execution times for the IMDB queries are shown in Figure 21. We can see that Q14 and Q19 are executed in the minimum time because each of these queries is evaluated against a single relation. Q13, Q15 and Q18 are also evaluated in minimal time, although two or more relations are joined through primary-foreign key relationships. We therefore observe that the D_CNEval algorithm speeds up the computation of the final results.
Query | Keywords
Q11 | alexander
Q12 | hollywood
Q13 | elley love story
Q14 | monkey island
Q15 | blake death
Q16 | mile allen
Q17 | godfather
Q18 | come away 2005
Q19 | 2010:continues
Q20 | black jack david
Figure 20. Keyword Queries on IMDB
Figure 21. Execution Times for Queries of IMDB
[Plot: Computation Time (ms), axis ticks 10 to 100000, for queries Q11 to Q20]
8. CONCLUSIONS
Efficient keyword search in relational databases allows ordinary users to find text information in relational databases with much higher flexibility. A keyword query in the system is a list of keywords and does not need to specify any relation or attribute names. The result of such a keyword query consists of the minimal connected tuple trees, which potentially include tuples from multiple relations in the database. We first proposed a CN generation algorithm (Heuristic_CNGen) to produce all CNs. Although this algorithm produces valid CNs, it reduces system performance when traversing nodes that have the same heuristic value. In order to improve performance, we also proposed a new CN generation algorithm (AT_CNGen) to generate all CNs for a relational keyword search system. Compared with existing algorithms, the proposed candidate network algorithm addresses the growth in the number of CNs for small CN sizes. Moreover, we observe that the AT_CNGen algorithm achieves system performance at least as high as that of the Heuristic_CNGen algorithm. We then proposed the dynamic CN evaluation algorithm (D_CNEval) to produce the connected tuple trees while reducing the intermediate join results. The proposed CN evaluation algorithm can generate the minimal number of CTTs with minimal accesses to the database, and it is not bounded by the maximum number of tuple sets. The experimental results on DBLP and IMDB show that the proposed algorithms generate results that approximately match the user's desired query, and that these results are evaluated efficiently by the query execution strategy.
REFERENCES
[1] Agrawal,S., Chaudhuri,S., Das,G. (2002). DBXplorer: A System for Keyword-Based Search over
Relational Database. Proc. 18th
Int. Conf. on Data Engineering, pp. 5-16.
[2] Baid,A., Rae,I., Li,J., Doan,A., Naughton,J. (2010). Toward Scalable Keyword Search over Relational
Data. Proc. VLDB Endowment, Vol. 3.
[3] Felner,A., Korf,R.E., Hanan,S. (2004). Additive Pattern Database Heuristics. Journal of Artificial
Intelligence Research, (p.279-318).
[4] Hristidis,V., Papakonstaninou,Y. (2002). DISCOVER: Keyword Search in Relational Databases.
Proc. 28th
Int. Conf. on Very Large Data Bases, pp. 670-681.
[5] http://www.dblp.uni.trier.de.
[6] Korf,R.E. Depth-Firth Iterative-Deepening: An Optimal Admissible Tree Search.
10
100
1000
10000
100000
Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20
ComputationTime(ms)
Queries
International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.6, December 2012
32
[7] Liu,F., Yu,C., Meng,W. (2006). Effective Keyword Search in Relational Databases. Proc. 2006 ACM
SIGMOD Int. Conf. on Management of data, pp. 563-574.
[8] Li,P., Zhu,Q., Wang,S. (2008). The Research on the Algorithms of Keyword Search in Relational
Database. Springer, pp. 134-143.
[9] Luo,Y., Wang,W., Lin,X., Zhou,X. (2011). SPARK2: Top-k Keyword Query in Relational Databases.
TKDE Special Issue: Keyword Search on Structured Data.
[10] Markowetz,A., Yang,Y., Papadias,D. (2007). Keyword Search on Relational Data Streams. Proc.
2007 ACM SIGMOD Int. Conf. on Management of data, pp. 605-616.
[11] Qin,L., Yu,J.X., Chang,L. (2009). Keyword Search in Databases: The Power of RDBMs. Proc. 35th
SIGMOD Int. Conf. on Management of data, pp. 681-694.
[12] Stefanidis,K., Drosou,M., Pitoura,E. (2010). PerK: Personalized Keyword Search in Relational
Databases through Preferences. Proc. 13th Int. Conf. on Extending Database Technology, EDBT,
pp. 585-596.
[13] Thein,M.M. (2012). Querying Connected Tuple Trees for Relational Keyword Search. Proc. Int.
Conference on Information Retrieval and Knowledge Management, pp. 285-289.
[14] Özsu,M.T., Valduriez,P. (2011). Principles of Distributed Database Systems. Springer.
[15] Wang,S., Zhang,J., Peng,Z., Zhan,J., Wang,Q. (2007). Study on Efficiency and Effectiveness of
KSORD. APWeb/WAIM Int. Workshops, pp. 6-17.
[16] Xu,Y., Ishikawa,Y., Guan,J. (2009). Effective Top-k Keyword Search in Relational Databases Considering
Query Semantics. APWeb/WAIM Int. Workshops, pp. 172-184.
[17] http://www.imdb.com/interfaces.
[18] Yu,J.X., Qin,L., Chang,L. (2010). Keyword Search in Relational Databases: A Survey. IEEE Data Engineering
Bulletin, Vol. 33, pp. 67-78.

Past, Present and Future of Generative AI
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 

EFFICIENT SCHEMA BASED KEYWORD SEARCH IN RELATIONAL DATABASES

International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.6, December 2012
DOI: 10.5121/ijcseit.2012.2602

Myint Myint Thein (1) and Mie Mie Su Thwin (2)
(1) University of Computer Studies, Mandalay, Myanmar, mmyintt@gmail.com
(2) University of Computer Studies, Mandalay, Myanmar, drmiemiesuthwin@mmcert.org.mm

ABSTRACT
Keyword search in relational databases allows users to search for information without knowing the database schema or using Structured Query Language (SQL). In this paper, we address the problem of generating and evaluating candidate networks. In candidate network generation, overhead arises because the number of joining tuples grows with the size of the minimal candidate network. To reduce this overhead, we propose candidate network generation algorithms that generate a minimum number of joining tuples subject to the maximum number of tuple sets. We first generate a set of joining tuples, the candidate networks (CNs). Because it is difficult to obtain an optimal query processing plan when many joins are generated, we also develop a dynamic CN evaluation algorithm (D_CNEval) that generates connected tuple trees (CTTs) while reducing the size of intermediate join results. The proposed algorithms are evaluated on the IMDB and DBLP datasets and compared with existing algorithms.

KEYWORDS
Candidate Network, Connected Tuple Tree, Joining Tuples, Keyword Query, Keyword Search, Relational Database

1. INTRODUCTION
Much of the most critical and valuable data, such as business data, is stored in relational databases. A relational database management system (RDBMS) is a DBMS in which both the data and the relationships among the data are stored in tables. The data can be reassembled and accessed in many different ways without changing the table structure. Most commercial relational database management systems use SQL to access the database.
With more and more data being stored in relational databases, it has become crucial for users to be able to search and browse the information stored in them. Keyword search in relational databases enables ordinary users, who do not understand the database schema or SQL, to find connected sets of tuples stored in relations for a given set of keywords.

Existing methods of keyword search in relational databases can be broadly classified into two categories: schema-based methods and graph-based methods. Schema-based keyword search commonly generates candidate networks over a schema graph derived from the relations. In a relational database, data is stored in columns and tables that are linked by primary key to foreign key relationships. To show how a schema graph is built, we illustrate two schema graphs as examples. Figure 1 shows the schema graph of the publication database from the DBLP dataset. It consists of six relation schemas: Person, InProceeding, RelationPersonInProceeding, Proceeding, Publisher and Series. Each relation has a primary key except the RelationPersonInProceeding relation. The InProceeding relation has one foreign key that refers to the primary key of the Proceeding relation. The Proceeding relation has two foreign keys that refer to the primary keys of the Publisher and Series relations. The schema graph of the movies database from the IMDB dataset is shown in Figure 2. It consists of six relation schemas: Movies, Directors, Movies-Directors, Movies-Genres, Actors and Roles. Each relation has a primary key except the Movies-Directors relation. The Roles relation has one foreign key that refers to the primary key of the Actors relation.

The logical unit of an answer needed by users is not limited to an individual column value or even an individual tuple for a given keyword query; it may be multiple tuples joined together. Given a keyword query, a minimal joining set of tuple sets from the relations that contain the keywords is called a candidate network, which can be expressed as a join query, for example in SQL. A candidate network must satisfy two conditions: it must be total and minimal. Because it is meaningless for two tuples in a candidate network to be too far away from each other, the maximum number of tuples allowed in a candidate network needs to be specified [18]. Suppose a user wants to get the papers written by “Jinlin Chen” from the DBLP database. The system generates the relevant CNs, such as Person ⋈ RelationPersonInProceeding ⋈ InProceeding, with multiple tuples from different relations joined by foreign keys. The results of all valid candidate networks, obtained by joining tuples from multiple relations, are called connected tuple trees. DISCOVER [4], S-KWS [10], Liu et al. [7], and SPARK2 [9] are systems that support keyword search on relational databases.
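As a minimal illustration of the example above, the candidate network Person ⋈ RelationPersonInProceeding ⋈ InProceeding can be evaluated as one SQL join. The sketch below uses Python's built-in sqlite3 module; the column names (PersonID, Name, InProceedingID, Title), the sample rows and the helper names are assumptions made for illustration and are not taken from the DBLP dataset or from the paper.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with hypothetical columns and rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE InProceeding (InProceedingID INTEGER PRIMARY KEY, Title TEXT);
        CREATE TABLE RelationPersonInProceeding (
            PersonID INTEGER REFERENCES Person(PersonID),
            InProceedingID INTEGER REFERENCES InProceeding(InProceedingID));
        INSERT INTO Person VALUES (1, 'Jinlin Chen'), (2, 'Another Author');
        INSERT INTO InProceeding VALUES (10, 'Paper A'), (11, 'Paper B'), (12, 'Paper C');
        INSERT INTO RelationPersonInProceeding VALUES (1, 10), (1, 12), (2, 11);
    """)
    return conn


def papers_by_author(conn, author):
    """Evaluate Person JOIN RelationPersonInProceeding JOIN InProceeding."""
    sql = """
        SELECT p.Name, i.Title
        FROM Person AS p
        JOIN RelationPersonInProceeding AS r ON r.PersonID = p.PersonID
        JOIN InProceeding AS i ON i.InProceedingID = r.InProceedingID
        WHERE p.Name LIKE ?
        ORDER BY i.Title
    """
    # The keyword is bound as a parameter; '%' allows a substring match.
    return conn.execute(sql, (f"%{author}%",)).fetchall()


conn = build_demo_db()
rows = papers_by_author(conn, "Jinlin Chen")
```

Each returned row pairs a person tuple with a paper tuple through the link relation, which is the kind of joined answer a keyword search system produces without the user writing the join.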
These systems generate tuple trees as answers from the generated CNs. The first two systems need to reduce the cost of generating minimal CNs, while the last two cannot cope with the growing number of CNs even for a small CN size. In existing candidate network generation, the CN size is unbounded, and the number of CNs grows very large even for a small CN size. This brings a large overhead for CN generation. Because of the large number of generated CNs to be evaluated, CN evaluation becomes a multi-query optimization problem. In this paper, we focus on generating valid CNs within the data bound and on producing minimal connected tuple trees. We develop algorithms to generate all minimal connected trees of tuples in the database with no more than the maximum number of tuple sets. We propose candidate network generation algorithms that find relevant answers on the fly by joining tuples in the database. We also propose a dynamic CN evaluation algorithm (D_CNEval) for evaluating the generated CNs. We report experimental results on the DBLP and IMDB databases and present a worst-case time analysis of these algorithms.

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 presents the basic concepts of keyword queries and CNs. Section 4 gives an overview of the proposed system. Section 5 presents candidate network generation. Section 6 illustrates query execution, and Section 7 shows the experimental results. Section 8 concludes the paper.

2. RELATED WORK
The main goal of a keyword search system is to find a set of closely interconnected tuples that collectively match the keywords. One type of method models the data as a graph and the results as subtrees or subgraphs. Another type is based on relational databases, where structured data is stored. Several studies have addressed early keyword search systems for relational databases [2, 7, 8, 15]. Yu et al.
[18] surveyed the developments on finding structural information among tuples
in an RDB using an l-keyword query. They discussed keyword search systems by comparing schema-based keyword search and graph-based keyword search in RDBs. The former evaluated the sets of answers by defining all minimal total joining networks of tuples between CNs, and the latter showed how to answer keyword queries using graph algorithms focused on weighted directed graphs. DBXplorer [1] used an undirected graph to construct an SQL statement for each tuple tree. This system accessed a symbol table to get tuple information, and then calculated tuple trees according to the schema graph.

DISCOVER [4] proposed a CN generation algorithm based on a breadth-first traversal of the search space. This algorithm expands the partial CNs generated into larger partial CNs until all CNs are generated. As the number of partial CNs can be exponentially large, arbitrary expansion makes the algorithm extremely inefficient. The problem with this algorithm is that the cost of generating the set of CNs is high and the partial CNs must be kept in memory for further extension. S-KWS [10] developed an algorithm that reduces the number of partial results generated by expanding from only part of the nodes in a partial tree, and that avoids isomorphism testing by assigning a proper expansion order. Although it reduced the number of partial results generated, it still incurred overhead in generating minimal CNs for the query. Liu et al. [7] described an answer graph generation algorithm to generate tuple trees. Although they produced duplication-free CNs by assigning different aliases, they did not consider the efficiency of answer generation. SPARK2 [9] developed a duplication-free algorithm based on a canonical form, but it did not solve the problem that the number of CNs grows very large even for a small CN size.

3. PRELIMINARIES

3.1. Data Model
A relational database can be viewed as a graph that represents the relational model, namely a schema graph Gs(V, E) [4, 8, 16, 18]. A relational database is a collection of relations. Each relation in the database corresponds to a vertex in Gs, and V denotes the set of relation schemas {R1, R2, …}. Edges represent the foreign key to primary key relationships between pairs of relation schemas Ri and Rj, denoted Ri→Rj. A relation on relation schema Rj is an instance of the relation schema, that is, a set of tuples conforming to the relation schema. The graph can be directed or undirected. It can capture every granularity level of the schema elements.

Figure 1. Publication Database Schema Graph
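To make the schema graph Gs(V, E) concrete, the following sketch stores the publication schema as a list of directed edges Ri→Rj and derives both the directed and the undirected adjacency. The three edges starting from InProceeding and Proceeding follow the description of the DBLP schema in the introduction; the two edges starting from RelationPersonInProceeding are an assumption, since the text does not list the foreign keys of that relation. The function name build_schema_graph is hypothetical.

```python
from collections import defaultdict

# Directed edge (Ri, Rj): Ri holds a foreign key that refers to Rj's primary key.
DBLP_EDGES = [
    ("InProceeding", "Proceeding"),
    ("Proceeding", "Publisher"),
    ("Proceeding", "Series"),
    # Assumed edges: the link relation references both relations it connects.
    ("RelationPersonInProceeding", "Person"),
    ("RelationPersonInProceeding", "InProceeding"),
]


def build_schema_graph(edges):
    """Return (vertices, directed adjacency, undirected adjacency)."""
    vertices = set()
    directed = defaultdict(set)
    undirected = defaultdict(set)
    for src, dst in edges:
        vertices.update((src, dst))
        directed[src].add(dst)
        # The undirected view records the same edge in both directions.
        undirected[src].add(dst)
        undirected[dst].add(src)
    return vertices, dict(directed), dict(undirected)


vertices, directed, undirected = build_schema_graph(DBLP_EDGES)
```

The directed view keeps the foreign key direction, while the undirected view is convenient when a traversal needs every relation that can be joined with a given one.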
Figure 2. Movies Database Schema Graph

We use the directed schema graphs shown in Figure 1 and Figure 2 as the schema graphs of the publication database and the movies database. For simplicity, we assume that every primary key and foreign key attribute is the same attribute as the corresponding attribute of the related relation. There are no self loops, and there is at most one primary-foreign key relationship between any two relations.

3.2. Connected Tuple Tree
A keyword query Q consists of a list of keywords {k1, k2, …, kq} and searches for interconnected tuples that contain the given keywords. For a given query Q, a result is the set of all possible joining networks of tuples. A joining network of tuples is a connected tuple tree (CTT). Each node ti is a tuple in the database, and each pair of adjacent tuples in a CTT is connected via a foreign key to primary key relationship. Suppose (Ri, Rj) is an edge in the schema graph. Let ti ∈ Ri, tj ∈ Rj, and (ti ⋈ tj) ∈ (Ri ⋈ Rj). Then (ti, tj) is an edge in the connected tuple tree. The size of a CTT is the number of tuples involved. Note that a single tuple is a connected tuple tree of size 1. A CTT can be arbitrarily large when there is a many-to-many relationship in the schema graph. Therefore, the size of a connected tuple tree is only data bound.

3.3. Candidate Network
A set of connected tuple trees is produced by a relational algebra expression over a set of relation names, where each tuple taken from a relation contains a term of the keywords. For a given keyword query Q, the query tuple set R^N is the set of all tuples that belong to relation R and contain at least one keyword of the query Q. We denote by R^F the free tuple set, which is the set of all tuples in relation R, and we use R^Q to denote a tuple set, which can be either a non-free tuple set or a free tuple set. A candidate network is a tree of tuple sets R^N or R^F with the restriction that every node must be a query tuple set. Every edge (Ri^Q, Rj^Q) in a CN corresponds to an edge (Ri, Rj) in the schema graph Gs. The size of a CN is the number of its tuple sets.

In the framework of an RDBMS, a keyword query is processed in two main steps: candidate network generation and candidate network evaluation. The candidate network generation step generates a set of CNs over the schema graph Gs. The set of CNs must be sound, complete and duplicate-free with respect to the maximal size. The candidate network evaluation step evaluates the generated CNs while reducing the size of intermediate join results. We present how to generate a minimal number of CNs and how to evaluate the generated CNs in Section 5 and Section 6.
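The tuple-set definitions of Section 3.3 can be sketched for a single relation as follows. This is only an illustration: the representation of a relation as a list of dictionaries, the whitespace tokenization and the function names are assumptions, and a real system would look up matches in an index instead of scanning every tuple.

```python
def tokenize(text):
    """Lower-case a value and split it into a set of whitespace-separated terms."""
    return set(text.lower().split())


def tuple_sets(relation, keywords):
    """Return (R^N, R^F) for one relation and a keyword query.

    R^N: tuples that contain at least one query keyword (non-free tuple set).
    R^F: all tuples of the relation (free tuple set), as defined in Section 3.3.
    """
    wanted = {k.lower() for k in keywords}
    non_free = []
    for row in relation:
        terms = set()
        for value in row.values():
            terms |= tokenize(str(value))
        if terms & wanted:
            non_free.append(row)
    free = list(relation)
    return non_free, free


# Hypothetical sample tuples of the Person relation.
person = [
    {"PersonID": 1, "Name": "Jinlin Chen"},
    {"PersonID": 2, "Name": "Another Author"},
]
person_n, person_f = tuple_sets(person, ["Chen", "keyword"])
```

Here person_n holds only the tuple that mentions a query keyword, while person_f keeps every tuple so that the relation can still act as a connecting node in a candidate network.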
4. PROPOSED SYSTEM OVERVIEW
In this section, we give an overview of keyword search on relational databases, which is shown in Figure 3. The system supports free-style keyword search by generating connected tuple trees from user-typed keywords. In this system, the final results are obtained through the following four phases. The query cleaning phase filters the potential index terms by removing stopwords from the query. This process reduces the size of the indexing structure considerably. The indexing unit in a relational document can be a field, attribute, tuple, table, or any combination of these. After the system has built the inverted index files as a posting table for each relation, the indexer produces the matched tuple sets using the filtered input query. The system then generates a set of CNs by traversing the schema graph according to the tuple sets. The query execution phase executes a query for each CN and generates the connected tuple trees as the results of the executed queries. Finally, the system returns the minimal connected tuple trees to the user for the given query.

Figure 3. Proposed System Overview

5. CANDIDATE NETWORK GENERATION
In schema-based keyword search in relational databases, the set of all candidate networks generated for a keyword query Q must satisfy two properties, completeness and duplication-freeness, which are listed below.

Property 1. The set contains all CNs with size no more than MAXN (completeness).
Property 2. No two CNs are isomorphic to each other (duplication-free).

In this paper, we propose CN generation algorithms for schema-based keyword search in relational databases. Existing keyword search systems, such as DISCOVER and S-KWS, generate CNs temporarily through a breadth-first traversal of the schema graph for any user query. The goal for the resulting set of CNs is to avoid generating redundant joining networks of tuple sets. As the number of query keywords or the maximum size of a CN increases, or the database schema becomes
complicated, it takes much more time to generate CNs for a query Q. There are two ways to reduce the time needed to generate CNs. One is to develop a more efficient CN generation algorithm, and the other is to develop preprocessing techniques that generate CNs in advance. In this section, we develop efficient CN generation algorithms to address the above problem.

5.1. Heuristic_CNGen Algorithm
In this section, we describe a new CN generation algorithm (Heuristic_CNGen) to generate valid CNs [13]. Given a keyword query Q, the system first receives the query tuple sets R^Q of all relations R as input. We use R^NorQ to denote a tuple set: if a CN is a result, then each of its nodes belongs to either the non-free query tuple set R^N or the free query tuple set R^F of a relation R for the given query. Note that a free query tuple set in a CN does not contain the query keywords, but it supports the non-free query tuple sets through primary-foreign key relationships.

Figure 4. Heuristic_CNGen Algorithm

Heuristic_CNGen(MAXN, f_limit, f_new)
Input: schema graph SG, query Q
Output: a set of candidate networks CN
E: queue of all non-free tuple sets {R1^N, R2^N, …, Rn^N} and all free tuple sets {R1^F, R2^F, …, Rn^F} generated for Q
While E is not empty {
    Pop head T from E
    If size of T > MAXN Then T is pruned
    Else
        If T is a valid network graph Then add T to CN
        For each Ri^NorQ in T do
            For each Rj^N that is adjacent to Ri^N in SG do
                f_new = h(Rj^N)
                g(Rj^N) = c(Rj^N, Ri^N)
                f(Rj^N) = g(Rj^N) + f_new
                If f(Rj^N) ≤ f_limit Then
                    Add Rj^N in front of E
                Else
                    Add Ri^N in front of E
                    f_new = f(Rj^N)
                    f_limit = MAX.VALUE
                End if
            End for
        End for
    End if
}
For each network graph in CN, if the same network graph occurs more than once in CN, the duplicates are pruned.
Return CN.
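The listing above leaves the cost functions h and c unspecified, so the sketch below does not reproduce Heuristic_CNGen. It is a plain breadth-first enumeration, closer in spirit to DISCOVER, that only illustrates three ingredients of the listing: a queue of partial networks, pruning at MAXN, and removal of duplicate (isomorphic) networks. All names are hypothetical. It also makes two simplifying assumptions: each non-free tuple set covers exactly one keyword, and a network is accepted when it covers every keyword and has no free tuple set as a leaf.

```python
from collections import deque


def canonical(labels, edges):
    """Canonical string of a small labelled tree (minimum over all roots)."""
    adj = {i: [] for i in range(len(labels))}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    def encode(node, parent):
        kids = sorted(encode(c, node) for c in adj[node] if c != parent)
        return "(" + repr(labels[node]) + "".join(kids) + ")"

    return min(encode(root, None) for root in range(len(labels)))


def generate_cns(schema, keyword_relations, maxn):
    """Enumerate candidate networks with at most maxn tuple sets.

    schema: undirected adjacency, relation -> set of neighbouring relations.
    keyword_relations: keyword -> set of relations whose tuples contain it.
    A node label is (relation, keyword) for a non-free tuple set and
    (relation, None) for a free tuple set.
    """
    keywords = set(keyword_relations)
    non_free = sorted((rel, kw) for kw, rels in keyword_relations.items() for rel in rels)
    queue = deque(((label,), ()) for label in non_free)
    seen, results = set(), []
    while queue:
        labels, edges = queue.popleft()
        key = canonical(labels, edges)
        if key in seen:
            continue  # an isomorphic network was already processed
        seen.add(key)
        covered = {kw for _, kw in labels if kw is not None}
        if covered == keywords:
            degree = [0] * len(labels)
            for a, b in edges:
                degree[a] += 1
                degree[b] += 1
            # Accept only if every leaf is a non-free tuple set.
            if all(kw is not None for (_, kw), d in zip(labels, degree) if d <= 1):
                results.append((labels, edges))
            continue  # a total network is never expanded further
        if len(labels) >= maxn:
            continue  # size bound MAXN reached
        for i, (rel, _) in enumerate(labels):
            for nxt in sorted(schema.get(rel, ())):
                options = [(nxt, None)]
                options += [n for n in non_free if n[0] == nxt and n[1] not in covered]
                for label in options:
                    queue.append((labels + (label,), edges + ((i, len(labels)),)))
    return results


schema = {
    "Person": {"RelationPersonInProceeding"},
    "RelationPersonInProceeding": {"Person", "InProceeding"},
    "InProceeding": {"RelationPersonInProceeding"},
}
cns = generate_cns(schema, {"chen": {"Person"}, "search": {"InProceeding"}}, maxn=3)
```

With the three-relation schema and the two hypothetical keywords above, the enumeration finds the single network that joins Person and InProceeding through the link relation; lowering maxn to 2 leaves no valid network. A cost-guided variant in the style of the listing would replace the first-in, first-out queue order with an order driven by f = g + h.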
We regard a network graph as a join expression over the query tuple sets that produces candidate networks as its result. We define the size of a network graph as its number of nodes, which is the same as the size of the generated CN. Figure 4 presents the candidate network generation algorithm, based on the IDA* algorithm [3, 6], which generates all network graphs for a given query Q and schema graph SG. We set up three parameters: MAXN, f_limit and f_new. First, MAXN denotes the maximum number of tuple sets in a network graph; it limits the generation of meaningless results. Second, the node Rj^N is added to the front of queue E if the estimated cost of the cheapest solution through Rj^N does not exceed the given f_limit value. If the estimated cost of Rj^N exceeds f_limit, the node Ri^N, which is adjacent to Rj^N in SG, is added to the front of E instead. Third, f_new holds the heuristic value of a new node that is adjacent to an existing node in the schema graph. Finally, the number of CNs is bounded only by the data, that is, by the query and the database. Properties 1 and 2 establish the completeness and duplication-freeness of the results of the algorithm, provided that no constraint is violated.

5.2. AT_CNGen Algorithm

We present another CN generation algorithm (AT_CNGen) to improve on the performance of Heuristic_CNGen. We used the Heuristic_CNGen algorithm for CN generation in order to reduce the computation cost and memory cost. That algorithm generates valid CNs that are complete and duplication-free. It computes, one node at a time, a heuristic value that estimates the cost of a new node adjacent to an existing node in SG. If two nodes in SG have the same heuristic value, the algorithm expands these nodes iteratively. We observe that this makes the algorithm inefficient, because it does not remove the resulting overhead. Therefore, we propose a new CN generation algorithm (AT_CNGen), based on adjacent tuple lists, to make CN generation more efficient.

Because the relational database is modelled as a schema graph SG, we can identify an adjacent tuple by following the primary and foreign keys in the schema graph. An adjacent tuple is defined as a set of tuples connected by primary-foreign key relationships in SG. In this way, we regard an adjacent tuple as a join expression over the query tuple sets that produces CNs as its result. We define the size of an adjacent tuple as its number of tuples, which is the same as the size of the generated CN.

Figure 5. Adjacent Tuple List for DBLP
Consider the publication database schema graph in Figure 1 and the movies database schema graph in Figure 2. In each schema graph, we can iteratively visit each tuple in SG by following the primary-foreign key relationships and so obtain the adjacent tuples. For example, Figure 5 illustrates the adjacent tuple list for the query Q = “Chen Web Springer” in DBLP, and Figure 6 illustrates the adjacent tuple list for the query Q = “Black Jack David” in IMDB.

Figure 6. Adjacent Tuple List for IMDB

The adjacent tuple list based method has several features. First, adjacent tuple lists are effective for generating candidate networks because they capture structure: they depict meaningful and non-isomorphic networks. Second, the relationships between adjacent tuples through primary-foreign keys can be identified, so valid candidate networks can be generated efficiently. Third, a set of adjacent tuple lists is no larger than the total number of primary-foreign key relationships in the underlying database.

Figure 7. AT_CNGen Algorithm

AT_CNGen()
Input: A set of adjacent tuple lists D
Output: A set of CNs
CN ← Φ
For each T ∈ getTupleSet(ki) do
    For each D ∈ getAdjacentList(T) do
        For each d ∈ D do
            If (d is a valid CN and d ∉ CN) Then
                Add d into CN.
            Else
                d is pruned.
            End if
        End for
    End for
End for
Return CN.
Figure 7 shows how this algorithm generates all CNs for a given query Q and schema graph SG. In order to generate valid CNs, the AT_CNGen algorithm first accepts the adjacent tuple lists as input. During each CN generation, AT_CNGen calls getTupleSet(K) to obtain the query tuple sets T for the given query Q. It then calls getAdjacentList(T,MAXN) to obtain the adjacent tuple list for these query tuple sets. Each adjacent tuple list d is added to CN if d is a valid CN and is not already in CN. If d is invalid or a duplicate, d is pruned.

Figure 8. getTupleSet Algorithm

getTupleSet(K)
Input: query Q = {k1, k2, …, ki} ∈ K, all non-free tuple sets {R1^N, R2^N, …, Ri^N} and all free tuple sets {R1^F, R2^F, …, Ri^F} for Q
Output: A set of query tuple sets T
T ← Φ
While (K.length is not empty) {
    If (K.length = 1) Then
        Ri^N ← Index(ki)
        Add Ri^N into T.
    Else
        Ri^N ← Index(ki)
        If (Ri^N ∉ T) Then
            Add Ri^N into T.
        Else
            Ignore Ri^N.
        End if
    End if
}
Return T.

In Figure 8, getTupleSet(K) returns a set of query tuple sets T for a keyword query Q. Its single parameter K is the set of keywords. We receive the non-free query tuple set R^N and the free query tuple set R^F of each relation R for the given query. First, the algorithm checks the length of the input keyword query. If the query consists of one keyword, the algorithm produces the non-free query tuple set Ri^N by looking up that keyword in the inverted index and returns the query tuple sets. If the query has more than one keyword, a non-free query tuple set Ri^N is produced by an index lookup for each keyword, and the algorithm returns T after adding every Ri^N that is not already present.

getAdjacentList(T,MAXN), shown in Figure 9, takes two parameters: T and MAXN. It receives the non-free query tuple sets T and the schema graph SG as input. For each tuple set Ri^N, it takes every Rj^NorF that is adjacent to Ri^N in the schema graph as an adjacent tuple d. Next, the algorithm checks that each adjacent tuple d is not a duplicate and has no more than the maximum number of tuple sets, and it removes any invalid adjacent tuple. Finally, the algorithm returns the list of valid adjacent tuples.
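The three steps described above (keyword lookup, adjacent tuple enumeration, and validity and duplicate checks) can be sketched in Python as follows. This is a hedged sketch, not the published implementation: tuple sets are represented by relation names only, an adjacent tuple is assumed to be a simple path in the schema graph of at most MAXN nodes, and a path is assumed to be a valid CN when it covers all keywords and both of its ends are non-free. The names get_tuple_sets, get_adjacent_list, at_cngen, INDEX and SCHEMA are hypothetical.

```python
def get_tuple_sets(keywords, inverted_index):
    """Return the relations whose tuples contain some keyword, without duplicates."""
    tuple_sets = []
    for keyword in keywords:
        for relation in inverted_index.get(keyword, []):
            if relation not in tuple_sets:
                tuple_sets.append(relation)
    return tuple_sets


def get_adjacent_list(tuple_sets, schema, max_n):
    """Enumerate simple paths of at most max_n relations starting at each tuple set."""
    adjacent = []
    for start in tuple_sets:
        adjacent.append((start,))
        stack = [(start,)]
        while stack:
            path = stack.pop()
            if len(path) == max_n:
                continue  # one more node would exceed MAXN
            for neighbour in sorted(schema[path[-1]]):
                if neighbour not in path:
                    d = path + (neighbour,)
                    adjacent.append(d)
                    stack.append(d)
    return adjacent


def at_cngen(keywords, inverted_index, schema, max_n):
    """Keep every adjacent tuple that is a valid, not yet seen candidate network."""
    hits = set(get_tuple_sets(keywords, inverted_index))
    cns = []
    for d in get_adjacent_list(sorted(hits), schema, max_n):
        canonical = min(d, d[::-1])  # a path and its reverse are the same network
        covered = {k for k in keywords
                   if any(r in d for r in inverted_index.get(k, []))}
        valid = covered == set(keywords) and d[0] in hits and d[-1] in hits
        if valid and canonical not in cns:
            cns.append(canonical)
    return cns


# Toy schema graph; RPI abbreviates RelationPersonInProceeding.
SCHEMA = {
    "Person": {"RPI"},
    "RPI": {"Person", "InProceeding"},
    "InProceeding": {"RPI", "Proceeding"},
    "Proceeding": {"InProceeding", "Publisher", "Series"},
    "Publisher": {"Proceeding"},
    "Series": {"Proceeding"},
}
# Assumed inverted index: keyword -> relations whose tuples contain it.
INDEX = {"chen": ["Person"], "web": ["InProceeding"], "springer": ["Publisher"]}

print(at_cngen(["chen", "web", "springer"], INDEX, SCHEMA, max_n=5))
```

Restricting adjacent tuples to paths keeps the duplicate check trivial; tree-shaped networks would need a canonical form for trees instead of the path reversal used here.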
Eventually, the AT_CNGen algorithm generates all candidate networks that have no more than the maximal number of tuple sets for the keywords entered by the user. The number of generated CNs is bounded only by the data, following Properties 1 and 2.

Figure 9. getAdjacentList Algorithm

getAdjacentList(T, MAXN)
Input: Tuple set T: {R1^N, R2^N, …, Ri^N} ∈ T, schema graph SG
Output: Adjacent tuple list D
D ← Φ
For each Ri^N in T do
    For each Rj^NorF that is adjacent to Ri^N in SG do
        Add Rj^NorF with an edge into d.
        If (d ∉ D and d < MAXN) Then
            Add d into D.
        Else
            Ignore d.
        End if
    End for
End for
Return D.

6. QUERY EXECUTION

In this section, we describe how CTTs are generated by executing the generated CNs in order to obtain the results. For instance, consider the query Q = “Jinlin Content”. The corresponding result consists of the connected tuple trees P1→I1 and P1→I2. Each connected tuple tree corresponds to a tree at the schema level. For example, both of the above trees correspond to the schema-level tree Person{Jinlin} → Relation-Person-InProceeding{} ← InProceeding{Content}, where each Ri^K consists of the tuples of Ri that contain all keywords of K and no other keyword of Q. Given a query Q, all possible tuple sets Ri^K are computed, where Ri^K = {t | t ∈ Ri ∧ ∀wk ∈ K, t contains wk ∧ ∀wj ∈ Q - K, t does not contain wj} [12]. After a query keyword wl is selected, all tuple sets Ri^K for which wl ∈ K are located. These are the initial connected tuple trees, each with only one node. These trees are then expanded by adding either a tuple set that contains at least one other query keyword or a free tuple set, and they can be expanded further in the same way. The connected tuple trees that contain all query keywords are returned.

In an RDBMS, the problem of evaluating all CNs in order to obtain all connected tuple trees is a multi-query optimization problem. There are two main issues: (1) how to share common sub-expressions among the generated CNs in order to reduce the computational cost of evaluation, and (2) how to find a proper join order to evaluate all CNs quickly. For a keyword query, the number of CNs generated can be very large. Given a large number of joins, it is extremely difficult to obtain an optimal query processing plan, because the best plan for one CN may slow down others if its subtrees are shared by other CNs [11]. Evaluating shared sub-expressions is a well-known topic in query optimization. DISCOVER [4] proposed evaluating all CNs together using a greedy algorithm, in which sub-expressions that are shared by the most CNs, and that are likely to generate the smallest number of results, are evaluated first. S-KWS [10] constructed an operator mesh in order to share the computational cost of evaluating all CNs. When evaluating all CNs in a mesh, the projected relation with the smallest number of tuples is selected as the starting point of the join. Qin et al. [11] observed that evaluating all CNs using only joins may always generate a large number of temporary tuples, and proposed using semijoin/join sequences to evaluate a CN.

In this paper, we present an algorithm for executing CNs as our CN evaluation strategy. We observe that the common join expressions among CNs account for a substantial part of the evaluation. As a consequence, computational effort can be saved if multiple CNs are executed in a calculated way that minimizes the sizes of the intermediate join results.

6.1. Evaluating Candidate Network

We present the D_CNEval algorithm for CN evaluation, shown in Figure 12, which is based on the idea of the dynamic query optimization algorithm [14]. In general, it returns the result of query execution. It takes one parameter, CN, and uses the non-free tuple sets that belong to that CN as input. The algorithm first evaluates the size of the CN. If the size of the CN is equal to one, the algorithm directly returns the executed result for the CN. If the size of the CN is more than one, the non-free tuple sets of each relation in the CN, which contain the query keywords, are projected to reduce the size of the intermediate results. These projected results are executed as SQL queries and all of the results are merged. If the algorithm encounters an empty tuple set, it stops the current step and starts to execute the next relation in the CN. As a consequence, the algorithm returns all executed results for each CN. Examples of evaluated CNs for Q = “Chen Web Springer” in DBLP and Q = “Black Jack David” in IMDB are shown in Figure 10 and Figure 11.

Figure 10. Processing of Evaluated CN on DBLP
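The evaluation steps described above (project each non-free tuple set, execute it as SQL, skip empty results, merge the rest) can be sketched in Python with the standard sqlite3 module. This is only an illustrative sketch under assumptions: the emptiness test is applied to the CN itself and to each projected result, a CN is represented as a list of hypothetical (table, column, keyword) triples, the keyword filter is a simple LIKE predicate, and the toy rows are invented for the example. The helper names project_to and d_cneval mirror the pseudocode but are not the authors' code.

```python
import sqlite3


def project_to(table, column, keyword):
    """Build a query that keeps only the rows of `table` whose text column contains the keyword.

    Table and column names are interpolated directly, so they must come from trusted code.
    """
    return f"SELECT * FROM {table} WHERE {column} LIKE ?", (f"%{keyword}%",)


def d_cneval(conn, cn):
    """cn: list of (table, column, keyword) triples, one per non-free tuple set of a CN."""
    results = []
    # A CN of size one goes through the same loop exactly once.
    for table, column, keyword in cn:
        sql, params = project_to(table, column, keyword)
        rows = conn.execute(sql, params).fetchall()
        if not rows:
            continue  # empty tuple set: move on to the next relation
        results.extend((table,) + row for row in rows)  # merge into the overall result
    return results


# Toy in-memory database; column names follow the DBLP schema, rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Person(Pid INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Publisher(Uid INTEGER PRIMARY KEY, Name TEXT);
    INSERT INTO Person VALUES (1, 'Jinlin Chen'), (2, 'Peter Smith');
    INSERT INTO Publisher VALUES (1, 'Springer');
    """
)

cn = [("Person", "Name", "Chen"), ("Publisher", "Name", "Elsevier")]
print(d_cneval(conn, cn))
```

Filtering each relation by keyword before any join is what keeps the intermediate results small; the second triple above matches nothing and is skipped without affecting the rows already collected.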
Figure 11. Processing of Evaluated CN on IMDB

Figure 12. D_CNEval Algorithm

D_CNEval(CN)
Input: non-free tuple sets {R1^N, R2^N, …, Ri^N} that belong to CN
Output: result of query execution EQ
EQ = Φ
n = sizeof(CN)
If (EQ.isEMPTY()) then
    Return Φ
Else {
    If (n = 1) then
        EQ = executeTo(Ri^N)
    Else {
        For each Ri^N ∈ CN {
            Ri^N′ = projectTo(Ri^N)
            EQ′ = executeTo(Ri^N′)
            If (EQ′.isEMPTY()) then continue
            Else EQ = EQ ∪ EQ′
        }
    }
}
Return EQ.

We perform a worst-case time analysis of the D_CNEval algorithm. If the size of the CN equals 1, we assume that the result is executed in O(1) time. Otherwise, the for-loop is executed at most |CN| times, once for every relation in the CN, where |CN| is the size of the candidate network. For a CN with n relations, the attributes of each relation are projected through its non-free tuple set Ri^N, and the projected result is then executed. In each step, we collect the query results in the same array list. Evaluating the query results for Ri^N takes time T, where T is the number of tuples in the relation that contain a keyword. Checking whether the query result for Ri^N is empty takes O(1) time. The query results are merged at most |T| times. Hence the total execution time is O(|CN|·|T|) in the worst case. By reducing the size of the intermediate join results, the D_CNEval algorithm can output the results of query execution efficiently.

6.2. Generating Connected Tuple Trees

Figure 13 shows the processing of generated CTTs. For a given query Q, a connected tuple tree is generated from an evaluated CN and consists of tuples coming from different relations. For each pair of adjacent tuple sets Ri, Rj in a connected tuple tree, there is an edge (Ri, Rj) in SG. Each CTT satisfies the following properties:

Property 3. If a node in a connected tuple tree is a tuple of a relation, it contains at least one keyword of query Q (completeness).

Property 4. No tuple appears more than once in the connected tuple tree (duplication-free).

In Figure 14, CTT1 and CTT2 for Query 1 and Query 2 in DBLP are presented as examples. In CTT1, node P2 contains the keyword “Peter”, I3 contains the keyword “XML” and U1 contains the keyword “Springer” of Query 1. In CTT2, nodes P3 and I4 contain the keywords “David” and “Modelling” respectively, and U1 contains the keyword “Springer” of Query 2.

Figure 13. Processing of Generated CTT

For IMDB, CTT3 and CTT4 for Query 3 and Query 4 are illustrated as examples in Figure 15. In CTT3, node M1 contains the two keywords “Love” and “Story”, and D1 contains the keyword “Elley” of Query 3. In CTT4, nodes A2 and M3 contain the keywords “Black” and “Jack” respectively, and D2 contains the keyword “David” of Query 4. Except for the primary-foreign key relation nodes, all nodes contain keywords of the given query, and there are no duplicate nodes. In this paper, we consider a connected tuple tree to be a result as long as it fulfills these properties.
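Properties 3 and 4 can be checked mechanically. The Python sketch below is a minimal illustration under assumptions: a CTT is given as a list of hypothetical (tuple_id, text, is_free) triples, keyword matching is a case-insensitive whole-word test, and connecting nodes such as RPI are marked as free so that they are exempt from Property 3, as the text describes. The node texts in the example are placeholders that only carry the keyword each node is said to contain.

```python
def is_valid_ctt(nodes, query_keywords):
    """nodes: list of (tuple_id, text, is_free) triples describing one connected tuple tree."""
    keywords = {k.lower() for k in query_keywords}
    ids = [tuple_id for tuple_id, _, _ in nodes]
    if len(ids) != len(set(ids)):
        return False  # Property 4: no tuple may appear twice
    covered = set()
    for _, text, is_free in nodes:
        found = keywords & set(text.lower().split())
        if not is_free and not found:
            return False  # Property 3: a non-free node must hold a keyword
        covered |= found
    return covered == keywords  # the tree as a whole must contain all query keywords


# CTT1 for Query 1 ("Peter XML Springer"): P2 -> RPI <- I3 -> R2 -> U1
ctt1 = [
    ("P2", "Peter", False),
    ("RPI", "", True),
    ("I3", "XML", False),
    ("R2", "", True),
    ("U1", "Springer", False),
]
print(is_valid_ctt(ctt1, ["Peter", "XML", "Springer"]))
```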
Figure 14. Queries, CTTs and CNs for DBLP

Query 1: “Peter XML Springer”
CTT1: P2→RPI←I3→R2→U1
CN1: P^N ⋈ RPI^F ⋈ I^N ⋈ R^F ⋈ U^N

Query 2: “David Modelling Springer”
CTT2: P3→RPI←I4→R4→U1
CN2: P^N ⋈ RPI^F ⋈ I^N ⋈ R^F ⋈ U^N

Figure 15. Queries, CTTs and CNs for IMDB

Query 3: “Elley love Story”
CTT3: M1→MD←D1
CN3: M^N ⋈ MD^F ⋈ D^N

Query 4: “Black Jack David”
CTT4: A2→R←M3→MD→D2
CN4: A^N ⋈ R^F ⋈ M^N ⋈ MD^F ⋈ D^N

7. PERFORMANCE EVALUATION

7.1. Evaluation Setup

We evaluate the search efficiency of the proposed algorithms on the DBLP and IMDB datasets. All query generation algorithms were implemented in Java, and JDBC was used to connect to the database. We conducted all the experiments on a laptop with a Core(TM) 2 Duo CPU and 2GB of memory running Windows XP. We report the average execution time over 15 runs.

Dataset: We use two real datasets in our evaluation: the original Digital Bibliography and Library Project (DBLP) dataset [5] and the Internet Movie Database (IMDB) [17]. DBLP contains publication records and IMDB contains movie records. Table 1 and Table 2 show the schema and statistics of the two datasets.

Query Set: We manually picked a large number of queries for evaluation. We attempted to include a wide variety of keywords and keyword combinations in the query sets, varying, for example, the selectivity of the keywords, the size of the most relevant answers and the number of potentially relevant answers. We focus on a subset of the queries in this experiment. There are 20 queries with query length ranging from 2 to 6.
Table 1. Statistics of DBLP Dataset

Relation Schema                          #Tuples
Person(Pid,Name)                         174,709
InProceeding(Iid,Title,Pages,Rid)        212,273
Proceeding(Rid,Title,Uid,Sid,…)          3,007
Publisher(Uid,Name)                      86
Series(Sid,Title)                        24
RelationPersonInProceeding(Pid,IPid)     491,777

Table 2. Statistics of IMDB Dataset

Relation Schema                          #Tuples
Actors(Aid,Name)                         817,718
Directors(Did,Name)                      86,880
Movies(Mid,Name,Year,Rank)               388,269
Movies-Directors(Mid,Did)                406,967
Movies-Genres(Mid,Genre)                 417,784
Roles(Aid,Mid,Role)                      3,432,630

7.2. Evaluation

We implement the Heuristic_CNGen and AT_CNGen algorithms for CN generation. In practice, we observe that AT_CNGen has less overhead than Heuristic_CNGen. To compare the performance of these algorithms, we analyze the worst-case time of each.

In the Heuristic_CNGen algorithm, the while-loop is performed at most |E| times, once for every tuple set in queue E, where |E| is the size of the queue. When a network graph T is popped from the queue, we first check whether it exceeds the maximum number of tuple sets. If it does, T is pruned in O(1) time. Otherwise, we add T to a hash table H together with the generated CN for each tuple set, which also takes O(1) time. The first for-loop is executed at most |T| times, once for each tuple set in T, where |T| is the size of the network graph and is at most the maximum number of tuple sets. The next for-loop is also executed at most |T| times, because each node in T visits its adjacent nodes in the schema graph. Generating all valid candidate networks in the schema graph takes time |T|. Altogether, CN generation takes O(|E|-|T|) time. Then, we filter out the duplicated CNs with a for-loop that takes at most time |T|. Hence the total execution time is O((|E|-|T|)^2) in the worst case.

In the AT_CNGen algorithm, let K be the number of keywords, T the size of the tuple set and M the maximal adjacent tuple in the schema graph. The loop in getTupleSet(K) is executed at most |K| times, once for each keyword in the keyword query, so the time complexity of constructing the tuple set is O(|K|). After the tuple set is generated, getAdjacentList(T,MAXN) traverses, at most |T| times, the tuple sets that are adjacent in the schema graph, so the time complexity of traversing the adjacent tuples is O(|T|). Then, we check for duplicated CNs with a for-loop that takes at most time |M|. As a consequence, the total time complexity is O(|K|·|TM|).
In summary, the AT_CNGen algorithm reduces the computation time and the search space, whereas the Heuristic_CNGen algorithm repeatedly expands nodes in the schema graph that have the same heuristic value. Moreover, the CNs generated by AT_CNGen are valid, because they are complete and duplication-free. We therefore select the AT_CNGen algorithm for CN generation in the relational keyword search system.

7.3. Experimental Results of the Candidate Network Generation

We compare the proposed AT_CNGen algorithm with existing algorithms on the same DBLP and IMDB datasets. As shown in Figure 16 and Figure 17, the proposed algorithm achieves better search performance on both datasets than the existing algorithms DISCOVER and SPARK. The proposed AT_CNGen algorithm generates the valid CNs up to the maximal CN size, and the CNs it produces are duplication-free. The new CN generation algorithm is compared with the existing CN generation algorithms with respect to the elimination of redundancies. The proposed method limits the number of CNs, which otherwise grows very large even for small CN sizes. SPARK is faster than DISCOVER, but SPARK cannot bound the number of CNs produced for small CN sizes. The elapsed time of the proposed AT_CNGen algorithm is considerably smaller than that of DISCOVER and SPARK.

Figure 16. Comparison of AT_CNGen and previous algorithms on DBLP (computation time in ms against MAX CN size from 3 to 6, for DISCOVER, SPARK and AT_CNGen)
Figure 17. Comparison of AT_CNGen and existing algorithms on IMDB (computation time in ms against MAX CN size from 3 to 6, for DISCOVER, SPARK and AT_CNGen)

7.4. Experimental Results of Query Execution

In this section, we measure the computation time of the D_CNEval algorithm for the sample DBLP queries listed in Figure 18. We also list sample IMDB queries in Figure 20 to evaluate the efficiency of the D_CNEval algorithm. Given a keyword query, the proposed algorithm generates the valid CNs. The generated CNs are then evaluated, with the size of the intermediate results reduced, to obtain the final results.

Query   Keywords
Q1      chen
Q2      chen web content
Q3      chen web springer
Q4      web content
Q5      content springer by chen
Q6      chen web springer 2000
Q7      david compiler generator
Q8      compiler springer by david
Q9      compiler generator
Q10     david compiler generator springer

Figure 18. Keyword Queries on DBLP
Figure 19. Execution Times for Queries of DBLP (computation time in ms for queries Q1 to Q10)

Figure 19 shows the execution time of each DBLP query. Q1 returns its result in the shortest time because the query is executed over a single relation. Q7, Q9 and Q10 are also evaluated in the shortest time, even though two or more relations are joined through primary-foreign key relationships. Figure 21 shows the execution times of the IMDB queries. Q14 and Q19 return their results in the shortest time because these queries are executed over a single relation. Q13, Q15 and Q18 are also evaluated in the shortest time, even though two or more relations are joined through primary-foreign key relationships. We therefore observe that the D_CNEval algorithm speeds up the production of the final result.

Query   Keywords
Q11     alexander
Q12     hollywood
Q13     elley love story
Q14     monkey island
Q15     blake death
Q16     mile allen
Q17     godfather
Q18     come away 2005
Q19     2010:continues
Q20     black jack david

Figure 20. Keyword Queries on IMDB
  • 19. International Journal of Computer Science, Engineering and Information Technology (IJCSEIT), Vol.2, No.6, December 2012 31 Figure 21. Execution Times for Queries of IMDB 8. CONCLUSIONS Efficient keyword search in relational databases allows ordinary users to find text information in relational databases with much higher flexibility. A keyword query in the system is a list of keywords and does not need to specify any relation or attributes names. The result to such a keyword query consists of the minimal connected tuple trees, which potentially include tuples from multiple relations in database. We first proposed a CN generation algorithm (Heuristic_CNGen) to produce all CNs. Although this algorithm produces the valid CNs, it reduced the system performance when traversing the same heuristic values nodes. In order to improve the performance, we also propose a new CN generation algorithm (AT_CNGen) to generate all CNs for relational keyword search system. The proposed candidate network algorithm can solve the growing number of CNs for small CN size by comparing with existing algorithms. Moreover, we observe that AT_CNGen algorithm achieved the system performance as high as Heuristic_CNGen algorithm. Then, we propose the dynamic CN evaluation algorithm (D_CNEval) to produce the connected tuple trees by reducing the joining intermediate results. The proposed CN evaluation algorithm can generate the minimal number of CTTs by doing minimal accesses to the database, and does not have data bound with the maximum number of tuple set. We presented the experimental results on DBLP and IMDB show that the proposed algorithms generate the result approximately for the user desired query. And the experimental results are efficiently evaluated by using query execution strategy. REFERENCES [1] Agrawal,S., Chaudhuri,S., Das,G. (2002). DBXplorer: A System for Keyword-Based Search over Relational Database. Proc. 18th Int. Conf. on Data Engineering, pp. 5-16. 
[2] Baid, A., Rae, I., Li, J., Doan, A., Naughton, J. (2010). Toward Scalable Keyword Search over Relational Data. Proc. VLDB Endowment, Vol. 3.
[3] Felner, A., Korf, R.E., Hanan, S. (2004). Additive Pattern Database Heuristics. Journal of Artificial Intelligence Research, pp. 279-318.
[4] Hristidis, V., Papakonstantinou, Y. (2002). DISCOVER: Keyword Search in Relational Databases. Proc. 28th Int. Conf. on Very Large Data Bases, pp. 670-681.
[5] http://www.dblp.uni.trier.de.
[6] Korf, R.E. Depth-First Iterative-Deepening: An Optimal Admissible Tree Search.
  • 20. [7] Liu, F., Yu, C., Meng, W. (2006). Effective Keyword Search in Relational Databases. Proc. 2006 ACM SIGMOD Int. Conf. on Management of Data, pp. 563-574.
[8] Li, P., Zhu, Q., Wang, S. (2008). The Research on the Algorithms of Keyword Search in Relational Database. Springer, pp. 134-143.
[9] Luo, Y., Wang, W., Lin, X., Zhou, X. (2011). SPARK2: Top-k Keyword Query in Relational Databases. TKDE Special Issue: Keyword Search on Structured Data.
[10] Markowetz, A., Yang, Y., Papadias, D. (2007). Keyword Search on Relational Data Streams. Proc. 2007 ACM SIGMOD Int. Conf. on Management of Data, pp. 605-616.
[11] Qin, L., Yu, J.X., Chang, L. (2009). Keyword Search in Databases: The Power of RDBMS. Proc. 35th SIGMOD Int. Conf. on Management of Data, pp. 681-694.
[12] Stefanidis, K., Drosou, M., Pitoura, E. (2010). PerK: Personalized Keyword Search in Relational Databases through Preferences. Proc. 13th Int. Conf. on Extending Database Technology, EDBT, pp. 585-596.
[13] Thein, M.M. (2012). Querying Connected Tuple Trees for Relational Keyword Search. Proc. Int. Conference on Information Retrieval and Knowledge Management, pp. 285-289.
[14] Özsu, M.T., Valduriez, P. (2011). Principles of Distributed Database Systems. Springer.
[15] Wang, S., Zhang, J., Peng, Z., Zhan, J., Wang, Q. (2007). Study on Efficiency and Effectiveness of KSORD. APWeb/WAIM Int. Workshops, pp. 6-17.
[16] Xu, Y., Ishikawa, Y., Guan, J. (2009). Effective Top-k Keyword Search in Relational Databases Considering Query Semantics. APWeb/WAIM Int. Workshops, pp. 172-184.
[17] http://www.imdb.com/interfaces.
[18] Yu, J.X., Qin, L., Chang, L. (2010). Keyword Search in Relational Databases: A Survey. IEEE Data Engineering Bulletin, Vol. 33, pp. 67-78.