SlideShare a Scribd company logo
1 of 5
Download to read offline
International Journal of Modern Engineering Research (IJMER)
                 www.ijmer.com         Vol.2, Issue.6, Nov-Dec. 2012 pp-4497-4501       ISSN: 2249-6645

      A Heuristic Algorithm to Reduce Information Overhead Based On
                      Hierarchical Category Structure
                                  Kishore Kumar Reddy1, M M M K Varma2
                    1
                      (M.Tech, Department of CSE, Sri Sivani College of Engineering, Andhra Pradesh, India
            2
                (Assistant Professor, Department of CSE, Sri Sivani College of Engineering, Andhra Pradesh, India

Abstract : Database systems are being increasingly used for category is assigned a descriptive label examining which the
interactive and exploratory data retrieval. In such retrieval      user can determine whether the category is relevant or not.
queries often cause in too many answers, this phenomenon is        Then user can visit the relevant categories and ignore the
referred as “information overload”. In this paper, we              remaining ones. Second, they present the answers to the
proposed a solution which can automatically categorize the         queries in a ranked order. Thus, categorization and ranking
results of SQL queries to address this problem. The proposed       present two complementary techniques to manage
solution dynamically generate a labeled hierarchical               information overload. After browsing the categorization
category structure where users can determine whether a             hierarchy and/or examining the ranked results, users often
category is relevant or not by checking its label and can          reformulate the query into a more focused narrower query.
explore the relevant categories and ignore other categories                  Therefore, categorization and ranking are indirectly
to reduce information overload. In the proposed solution, an       useful even for subsequent reformulation of the queries.
analytical model is designed to estimate information                         In contrast to the internet text search, categorization
overload faced by a user for a given exploration. Based on         and ranking of query results have received much less
the model, we formulate the categorization problem as a cost       attention in the database field. Recently ranking of query
optimization problem and develop heuristic algorithms to           results has gained some attention. But all the existing
calculate the min-cost categorization.                             methods have not critically examined the use of
                                                                   categorization of query results in a relational database.
Keywords:      Data Mining, Information              Overload,     Hence in this paper, categorization of database query results
Categorization, Ranking, Discretization                            presents some unique challenges that are not addressed in
                                                                   various search engines/web directories. In all the existing
                        I. Introduction                            search engines, the category structures are created a priori
          Database systems are being increasingly used for         and the items are tagged in advance as well. At search time,
interactive and exploratory data retrieval [1, 2, 8, 14]. In       the search results are integrated with the pre-defined
such retrieval, queries often result in too many answers. Not      category structure by simply placing each search result under
all the retrieved items are relevant to the user, only a few       the category it was assigned during the tagging process.
result set is relevant to the user. Unfortunately, user needs to   Since such categorization is independent of the query, the
check all the retrieved items to find relevant information         distribution of items in the categories is susceptible to skew:
need to the user query. This phenomenon is commonly                some groups can have a very large number of items and
referred to as “information overload”. Consider a scenario,        some very few.
real-estate database that maintains information like the                     For example, a search on „databases‟ on
location, price, number of bedrooms etc. of each house             Amazon.com yields around 34,000 matches out of which
available for sale. Suppose that a potential buyer is looking      32,580 are under the “books” category. These 32,580 items
for homes in the Seattle/Bellevue Area of Washington, USA          are not categorized any further (can be sorted based on price
in the $200,000 to $300,000 price range. The above query,          or publication date or customer rating) and the user is forced
henceforth referred to as the “Homes” query, returns 6,045         to go through the long list to find the relevant items. This
homes when executed on the MSN House&Home home                     defeats the purpose of categorization as it retains the
listing database. Information overload makes it difficult for      problem of information overload. In this paper, we propose
the user to differentiate the interesting and uninteresting        techniques to automatically categorize the results of SQL
items, which leads to a huge wastage of user‟s time and            queries on a relational database in order to reduce
effort. Information overload can happen when the user is not       information overload. Unlike the “a priori” categorization
certain about the query. In such a situation, user can pose a      techniques described above, we generate a labelled
broad query in the beginning to avoid exclusion of                 hierarchical category structure automatically based on the
potentially interesting results. For example, a user shopping      contents of the tuples in the answer set. Since our category
for a home is often not sure of the exact neighbourhood she        structure is generated at query time and hence tailored to the
wants or the exact price range or the exact square footage at      result set of the query at hand, it does not suffer from the
the beginning of the query. Such broad queries may also            problems of a priori categorization discussed above.
occur when the user is naïve and refrains from using                         This paper discusses how such categorization
advanced search features [8]. Finally, information overload        structures can be generated on the fly to best reduce the
is inherent when users are interested in browsing through a        information overload. We begin by identifying the space of
set of items instead of searching among them.                      categorizations and develop an understanding of the
          In the context internet text search, there has been      exploration models that the user may follow in navigating
two canonical ways to avoid information overload. First,           the hierarchies. Such understanding helps us compare and
they group the search results into separate categories. Each       contrast the relative goodness of the alternatives for

                                                           www.ijmer.com                                               4497 | Page
International Journal of Modern Engineering Research (IJMER)
               www.ijmer.com         Vol.2, Issue.6, Nov-Dec. 2012 pp-4497-4501       ISSN: 2249-6645
categorization. This leads to an analytical cost model that                      III. Basics of Categorization
captures the goodness of a categorization. Such a cost model      Space of Categorizations
is driven by the aggregate knowledge of user behaviours that               Let R be a set of tuples. R can either be a base
can be gleaned from the workload experienced on the               relation or a materialized view or it can be the result of a
system. Finally, we show that we can efficiently search the       query Q. We assume that R does not contain any aggregated
space of categorizations to find a good categorization using      or derived attributes, i.e., Q does not contain any GROUP
the analytical cost models. Our solution is general and           BYs or attribute derivations (Q is a SPJ query). A
presents a domain-independent approach to addressing the          hierarchical categorization of R is a recursive partitioning of
information overload problem. We perform extensive                the tuples in R based on the data attributes and their values.
experiments to evaluate our cost models as well as our            We define a valid hierarchical categorization T of R
categorization algorithm.                                         inductively as follows.
         The rest of the paper is organized as section 2:
discuss about the related work, section 3: discuss about          Base Case: Given the root or “ALL” node (level 0) which
basics of categorization section 4: presents the Proposed         contains all the tuples in R, we partition the tuples in R into
Solution, section 5: discuss about cost model, section 6:         an ordered list of mutually disjoint categories (level 1
categorization algorithm, section 7: discuss about                nodes2) using a single attribute.
Experimental setup, section 8: concludes the paper.
                                                                  Inductive Step: Given a node C at level (l-1), we partition
                    II. Related Work                              the set of tuples tset(C) contained in C into an ordered list of
OLAP and Data Visualization: Our work on categorization           mutually disjoint subcategories (level l nodes) using a single
is related to OLAP as both involve presenting a hierarchical,     attribute which is same for all nodes at level (l-1). We
aggregated view of data to the user and allow to drilldown/       partition a node C only if C contains more than a certain
roll-up the categories [13]. However, in OLAP, the user           number of tuples. The attribute used is referred to as the
needs to manually specify the grouping attributes and             categorizing attribute of the level l nodes and the
grouping functions [13] in our work, those are determined         subcategorizing attribute of the level (l-1) nodes.
automatically. Information visualization deals with visual        Furthermore, once an attribute is used as a categorizing
ways to present information [6]. It can be thought as a step      attribute at any level, it is not repeated at a later level, i.e.,
after categorization to the further reduce information            there is a 1:1 association between each level of T and the
overload: given the category structure proposed in this paper,    corresponding categorizing attribute. We impose the above
we can use visualization techniques visually display the tree     constraints to ensure that the categorization is simple,
[6].                                                              intuitive and easily understandable to the user.
                                                                  Associated with each node C is a category label and a tuple-
Data Mining: The space in which the clusters are discovered       set as defined below:
is usually provided there whereas, in categorization, we need
to find that space. Second, existing clustering algorithms        Category Label: The predicate label(C) describing node C.
deal with either exclusively categorical [11] or exclusively      For example, the first child of root (rendered at the top) has
numeric spaces [17] in categorization, the space usually          label „Neighbourhood ∈ {Redmond, Bellevue}‟ while the
involves both categorical and numeric attributes. Third, the      first child of the above category has label „Price: 200K–
optimization criteria are different while it is minimizing        225K‟.
inter-cluster distance in clustering to decrease information
overload. Our work differs from classification where the          Tuple-Set: The set of tuples tset(C) (called the tuple-set of
categories are already given their along with a training          C) contained in C either appearing directly under C (if C is a
database of labelled tuples and we need predict the label of      leaf node) or under its subcategories (if C is a non-leaf
future, unlabeled tuples [12].                                    node). Formally, tset(C) is the set of tuples, among the ones
                                                                  contained in the parent of C, which satisfy the predicate
Discretization/Histograms: The discretization assumes that        label(C). In other words, tset(C) is the subset of tuples in R
there is a class assigned to each numeric value and uses the      that satisfies the conjunction of category labels of all nodes
entropy minimization heuristic [10]. On the other hand, the       on the path from the root to C
histogram bucket selection is based on minimization of                      The label of a category unambiguously describes to
errors in result size estimation [5, 15].                         the user which tuples, among those in the tuple set of the
                                                                  parent of C, appear under C. Hence, user can determine
Ranking: Ranked retrieval has traditionally been used in          whether C contains any item that is relevant or not by
Information Retrieval in the context of keyword searches          looking just at the label and hence decide whether to explore
over text/unstructured data [3] but has been proposed in the      or ignore C.
context of relational databases recently [2,4,14]. Ranking is a
powerful technique for reducing information overload and                              IV. Proposed Model
can be used effectively in complement with categorization.                 We present two models capturing two cases in data
Although categorization has been studied extensively in the       exploration. One scenario is that the user explores the result
text domain [9, 16] to the best of our knowledge, this is the     set R using the category tree T until finds relevant tuple t
first proposal for automatic categorization in the context of     ∈R. For example, the user may want to find every home
relational databases.                                             relevant to her in the “Homes” query. In order to ensure that
                                                                  user finds every relevant tuple and needs to examine every

                                                          www.ijmer.com                                                4498 | Page
International Journal of Modern Engineering Research (IJMER)
               www.ijmer.com         Vol.2, Issue.6, Nov-Dec. 2012 pp-4497-4501       ISSN: 2249-6645
tuple and every category label except the ones that appear         The user starts the exploration by exploring the root node.
under categories she deliberately decides to ignore. Another       Given that the user has decided to explore a node C, user
scenario is that the user is interested in just one (or two or a   non-deterministically choose one of the two options
few) tuple(s) in R so she explores R using T till she finds
that one (or few) tuple(s). For example, a user may be             Option ‘SHOWTUPLES’: Check the tuples in tset(C)
satisfied if she finds just one or two homes that are relevant     starting from the first tuple in tset(C) till user finds the first
to her. For the purpose of modeling, we assume that, in this       relevant tuple. In this paper, we do not assume any particular
case, the user is interested in just one tuple, i.e., the user     ordering/ranking when the tuples in tset(C) are presented to
explores the result set until finds the first relevant tuple. We   the user.
consider these two cases because they both occur commonly
and they differ in their analytical models.                        Option ‘SHOWCAT’: Examine the labels of the
                                                                   subcategories of C starting from the first subcategory till the
Exploration Model for ‘All’ case                                   first one she finds interesting. As in the „ALL‟ case, user
Figure 1 describes the exploration model for all case the user     checks the label of each subcategory Ci starting from the
starts the exploration by exploring the root node. Given user      first one and non-deterministically chooses to either explore
has decided to explore the node C, if C is a non-leaf node,        it or ignore it. If user chooses to ignore Ci, simply proceeds
user non-deterministically chooses one of the two options          and checks the next label. If user chooses to explore Ci, it
                                                                   does so recursively based on the same exploration model.
Option ‘SHOWTUPLES’: Browse through the tuples in                  We assume that when she drills down into Ci, user finds at
tset(C). Note that the user needs to examine all tuples in         least one relevant tuple in tset (Ci); so, unlike in the „ALL‟
tset(C) to make sure that she finds every tuple relevant to        case, the user does not need to examine the labels of the
her.                                                               remaining subcategories of C.

Option ‘SHOWCAT’: Examine the labels of all the n                                      V. Cost Estimation
subcategories of C, exploring the ones relevant to her and                  Since we want to generate the tree imposes the least
ignoring the rest. More specifically, she examines the label       possible information overload on the user, we need to
of each subcategory Ci starting from the first subcategory         estimate the information overload that a user will face during
and non-deterministically chooses to either explore it or          an exploration using a given category T.
ignore it. If user chooses to ignore Ci, simply proceeds and
examines the next label (of Ci+1). If user chooses to explore      Cost Model for ‘ALL’ case
Ci, it does so recursively based on the same exploration           Let us first consider the „ALL‟ case. Given a user
model, i.e., by choosing either „SHOWTUPLES‟ or                    exploration X using category tree T, we define information
„SHOWCAT‟ if it is an internal node or by choosing                 overload cost, or simply cost (denoted by Cost All(X, T)), as
„SHOWTUPLES‟ if it is a leafnode. After she finishes the           the total number of items examined by the user during X.
exploration of Ci, user goes ahead and examines the label of       This definition is based on the assumption that the time spent
the next subcategory of C (of Ci+1). Note that we assume           in finding the relevant tuples is proportional to the number of
that the user examines the subcategories in the order it           items the user needs to examine more the number of items
appears under C it can be from top to bottom or from left to       user needs to examine, more the time wasted in finding the
right depending on how the tree is rendered by the user            relevant tuples, higher the information overload.
interface.
If C is a leaf node, „SHOWTUPLES‟ is the only option               Cost Model for ‘ONE’ Scenario
(option „SHOWCAT‟ is not possible since a leaf node has no         The information overload cost CostOne (T) that a user will
subcategories).                                                    face, on average during an exploration using a given
                                                                   category tree T is the number of items that a user will need
                                                                   to examine till user finds the first relevant tuple. Let us
                                                                   consider the cost CostOne(C) of exploring the subtree rooted
                                                                   at C given that the user has chosen to explore C, CostOne
                                                                   (T) is simply CostOne (root). If the user goes for option
                                                                   „SHOWTUPLES‟ for C and frac (C) denotes the fraction of
                                                                   tuples in tset(C) that she needs to examine, on average,
                                                                   before she finds the first relevant tuple, the cost, on average,
                                                                   is frac(C)*|tset(C)|. If user goes for option „SHOWCAT‟, the
                                                                   total cost is (K*i + CostOne (Ci)) if Ci is the first
                                                                   subcategory of C explored by the user (since the user
                                                                   examines only i labels and explores only Ci).




 Figure 1: Flow chart model of exploration of node C in
                       ‘All’ case
Exploration Model for ‘One’ Scenario

                                                           www.ijmer.com                                                4499 | Page
International Journal of Modern Engineering Research (IJMER)
               www.ijmer.com         Vol.2, Issue.6, Nov-Dec. 2012 pp-4497-4501       ISSN: 2249-6645
             VI. Categorization Algorithm                          Accurate Cost Model: There is a strong positive correlation
         Since we know how to compute the information              between the estimated average exploration cost and actual
overload cost Cost All (T) of a given tree T, we can               exploration cost for various users. This indicates that our
enumerate all the permissible category trees on R, compute         workload-based cost models can accurately model
their costs and pick the tree Topt with the minimum cost.          information overload in real life.
This enumerative algorithm will produce the cost-optimal
tree but could be prohibitively expensive as the number of         Better Categorization Algorithm: Our cost-based
permissible categorizations may be extremely large.                categorization algorithm produces significantly better
          Hence we present our preliminary ideas to reduce         category trees compared to techniques that do not consider
the search space of enumeration. First we present our              such analytical models.
techniques in the context of 1-level categorization i.e., a root
node pointing to a set of mutually disjoint categories which        Table 1: Pearson’s Correlation between estimated cost
are not subcategorized any further.                                                    and actual cost
                                                                                     Subset Correlation
Reducing the Choices of Categorizing Attribute                                       1       0.39
          Since the presence of a selection condition on an                          2       0.98
attribute in a workload query reflects the user‟s interest in                        3       0.32
that attribute, attributes that occur infrequently in the
                                                                                     4       0.48
workload can be discarded right away while searching for
the min-cost tree. Let A be the categorizing attribute chosen                        5       0.16
for the 1-level categorization. If the occurrence count                              6       0.16
NAttr(A) of A in the workload is low, the SHOWTUPLES                                 7       0.19
probability Pw(root) of the root node will be high. Since the                        8       0.76
SHOWTUPLES cost of a tree is typically much higher than                              All     0.90
its SHOWCAT cost and the choice of partitioning affects
only the SHOWCAT cost, a high SHOWTUPLES                                     To further confirm the strength of the positive
probability implies that the cost of the resulting tree would      correlation between the estimated and actual costs, we
have a large first component (Pw(root)*|tset(root)|) which         compute the Pearson Correlation Coefficient for each subset
would contribute to a higher total cost. Therefore, it is          separately as well as together in Table 1. The overall
reasonable to consider eliminating such low occurring              correlation (0.9) indicates almost perfect linear relationship
attributes without considering any of their partitioning.          while the subset correlations show either weak (between 0.2
          Specifically, we eliminate the uninteresting             and 0.6) or strong (between 0.6 and 1.0) positive correlation.
attributes using the following simple heuristic: if an attribute   This shows that our cost models accurately model
A occurs in less than a fraction x of the queries in the           information overload faced by users in real-life.
workload, i.e., NAttr (A)/N < x, we eliminate A. The
threshold x will need to be specified by the system
designer/domain expert. For example, for the home
searching application, if we use x=0.4, only 6 attributes,
namely neighborhood, price, bedroomcount, bathcount,
property-type and square footage, are retained from among
53 attributes in the MSN House2Home dataset.

           VII.      Experimental Evaluation
          To evaluate the performance of the proposed model,
we present the results of an extensive empirical study we
have conducted to evaluate the accuracy of our cost models         Figure 2: Correlation between actual cost and estimated
in modeling information overload and evaluate our cost-                                      cost
based categorization algorithm and compare it with
categorization techniques that do not consider such cost
models. Our experiments consist of a real-life user study as
well as a novel, large-scale, simulated, cross-validated user-
study. For both studies, our dataset comprises of a single
table called ListProperty which contains information about
real homes that are available for sale in the whole of United
States. The ListProperty contains 1.7 million rows (each row
is a home) and, in addition to other attributes, has the
following non-null attributes: location (neighborhood, city,
state, zip code), price, bedroomcount, bathcount, year-built,
property-type (whether it is a single family home or condo
etc.) and square-footage.                                                     Figure 3: Cost of various techniques



                                                           www.ijmer.com                                             4500 | Page
International Journal of Modern Engineering Research (IJMER)
               www.ijmer.com         Vol.2, Issue.6, Nov-Dec. 2012 pp-4497-4501       ISSN: 2249-6645
          Figure 2 plots the estimated cost against the actual
cost for the 800 explorations. The plot along with the trend       [4]    N. Bruno, L. Gravano and S. Chaudhuri, Top-k Selection
line (best linear fit with intercept 0 is y = 1.1002x) shows              Queries over Relational Databases: Mapping Strategies and
that the actual cost incurred by users has strong positive                Performance Evaluation. In ACM TODS, VO, 27, No. 2,
                                                                          June 2002.
correlation with the estimated average cost.
                                                                   [5]    N. Bruno, S. Chaudhuri and L. Gravano. STHoles: A
          Figure 3 compares the proposed technique with the               Multidimensional Workload-Aware Histogram. Proc. of
Attr-cost and No cost techniques based on the fractional cost             SIGMOD, 2001.
averaged over the queries in a subset.                             [6]    S. Card, J. MacKinlay and B. Shneiderman. Readings in
                                                                          Information Visualization: Using Vision to Think, Morgan
                    VIII.    Conclusion                                   Kaufmann; 1st edition (1999).
          Database systems are being increasingly used for         [7]    J. Chu-Carroll, J. Prager, Y. Ravin and C. Cesar, A Hybrid
interactive and exploratory data retrieval. In such retrieval             Approach to Natural Language Web Search, In Proc. of
                                                                          Conference on Empirical Methods in Natural Language
queries often cause in too many answers, this phenomenon is               Processing, July 20
referred as “information overload”. In this paper, we              [8]    S. Dar, G. Entin, S. Geva and E. Palmon, DTL‟s DataSpot:
proposed a solution which can automatically categorize the                Database Exploration Using Plain Language, In Proceedings
results of SQL queries to address this problem. The proposed              of VLDB Conference, 1998.
solution dynamically generate a labeled hierarchical category      [9]    S. Dumais, J. Platt, D. Heckerman and M. Sahami, Inductive
structure where users can determine whether a category is                 learning algorithms and representations for text
relevant or not by checking its label and can explore the                 categorization, In Proc. Of CIKM Conference, 1998.
relevant categories and ignore other categories to reduce          [10]    U. Fayyad and K. Irani. Multi-Interval Discretization of
information overload. In the proposed solution, an analytical             Continuous-Valued Attributes for Classification Learning.
                                                                          Proc. of IJCAI, 1993.
model is designed to estimate information overload faced by        [11]   V. Ganti, J. Gehrke and R. Ramakrishnan. CACTUS -
a user for a given exploration. Based on the model, we                    Clustering Categorical Data Using Summaries. KDD, 1999.
formulate the categorization problem as a cost optimization        [12]    J. Gehrke, V. Ganti, R. Ramakrishnan, W. Loh.
problem and develop heuristic algorithms to calculate the                 BOATOptimistic Decision Tree Construction. Proc. of
min-cost categorization. Our proposed cost-based                          SIGMOD, 1999.
categorization algorithm produces significant better category      [13]   J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart,
trees compared to techniques that do not consider such cost-              M. Venkatrao, F. Pellow and H. Pirahesh. Data Cube: A
models.                                                                   Relational Aggregation Operator Generalizing Group-By,
                                                                          Cross-Tab, and Sub-Totals. Journal of Data Mining and
                                                                          Knowledge Discovery", Vol 1, No. 1, 1997.
                         References                                [14]   V. Hristidis and Y. Papakonstantinou, DISCOVER: Keyword
[1]   S. Agrawal, S. Chaudhuri and G. Das. DBExplorer: A                  Search in Relational Databases, In Proc. of VLDB
      System for Keyword Search over Relational Databases. In             Conference, 2002
      Proceedings of ICDE Conference, 2002.                        [15]   V. Poosala, Y. Ioannidis, P. Haas, E. Shekita. Improved
[2]   S. Agrawal, S. Chaudhuri, G. Das and A. Gionis. Automated           Histograms for Selectivity Estimation of Range Predicates.
      Ranking of Database Query Results. In Proceedings of First          Proc. of SIGMOD, 1996.
      Biennial Conference on Innovative Data Systems Research      [16]   F. Sebastiani, Machine learning in automated text
      (CIDR), 2003.                                                       categorization, ACM Computing Surveys, Vol. 34, No. 1,
[3]   R. Baeza_yates and B. Ribiero-Neto, Modern Information              2002.
      Retrieval, ACM Press, 1999.                                  [17]   T. Zhang, R. Ramakrishnan and M. Livny. BIRCH: an
                                                                          efficient data clustering method for very large databases.
                                                                          Proc. of ACM SIGMOD Conference, 1996.




                                                           www.ijmer.com                                                4501 | Page

More Related Content

What's hot

Data mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant AnalysisData mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant AnalysisIOSR Journals
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET Journal
 
Bca examination 2017 dbms
Bca examination 2017 dbmsBca examination 2017 dbms
Bca examination 2017 dbmsAnjaan Gajendra
 
Expression of Query in XML object-oriented database
Expression of Query in XML object-oriented databaseExpression of Query in XML object-oriented database
Expression of Query in XML object-oriented databaseEditor IJCATR
 
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...dannyijwest
 
ICS Part 2 Computer Science Short Notes
ICS Part 2 Computer Science Short NotesICS Part 2 Computer Science Short Notes
ICS Part 2 Computer Science Short NotesAbdul Haseeb
 
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...IRJET Journal
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics DomainDrjabez
 
Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...
Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...
Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...IJwest
 
Information Retrieval Models
Information Retrieval ModelsInformation Retrieval Models
Information Retrieval ModelsNisha Arankandath
 
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...ijdmtaiir
 
Incentive Compatible Privacy Preserving Data Analysis
Incentive Compatible Privacy Preserving Data AnalysisIncentive Compatible Privacy Preserving Data Analysis
Incentive Compatible Privacy Preserving Data Analysisrupasri mupparthi
 
View the Microsoft Word document.doc
View the Microsoft Word document.docView the Microsoft Word document.doc
View the Microsoft Word document.docbutest
 
Interpreting the Semantics of Anomalies Based on Mutual Information in Link M...
Interpreting the Semantics of Anomalies Based on Mutual Information in Link M...Interpreting the Semantics of Anomalies Based on Mutual Information in Link M...
Interpreting the Semantics of Anomalies Based on Mutual Information in Link M...ijdms
 
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONA SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONIJDKP
 

What's hot (15)

Data mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant AnalysisData mining Algorithm’s Variant Analysis
Data mining Algorithm’s Variant Analysis
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
Bca examination 2017 dbms
Bca examination 2017 dbmsBca examination 2017 dbms
Bca examination 2017 dbms
 
Expression of Query in XML object-oriented database
Expression of Query in XML object-oriented databaseExpression of Query in XML object-oriented database
Expression of Query in XML object-oriented database
 
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...
Semantic Annotation Framework For Intelligent Information Retrieval Using KIM...
 
ICS Part 2 Computer Science Short Notes
ICS Part 2 Computer Science Short NotesICS Part 2 Computer Science Short Notes
ICS Part 2 Computer Science Short Notes
 
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...
IRJET-Efficient Data Linkage Technique using one Class Clustering Tree for Da...
 
Profile Analysis of Users in Data Analytics Domain
Profile Analysis of   Users in Data Analytics DomainProfile Analysis of   Users in Data Analytics Domain
Profile Analysis of Users in Data Analytics Domain
 
Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...
Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...
Immune-Inspired Method for Selecting the Optimal Solution in Semantic Web Ser...
 
Information Retrieval Models
Information Retrieval ModelsInformation Retrieval Models
Information Retrieval Models
 
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
 
Incentive Compatible Privacy Preserving Data Analysis
Incentive Compatible Privacy Preserving Data AnalysisIncentive Compatible Privacy Preserving Data Analysis
Incentive Compatible Privacy Preserving Data Analysis
 
View the Microsoft Word document.doc
View the Microsoft Word document.docView the Microsoft Word document.doc
View the Microsoft Word document.doc
 
Interpreting the Semantics of Anomalies Based on Mutual Information in Link M...
Interpreting the Semantics of Anomalies Based on Mutual Information in Link M...Interpreting the Semantics of Anomalies Based on Mutual Information in Link M...
Interpreting the Semantics of Anomalies Based on Mutual Information in Link M...
 
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONA SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
 

Viewers also liked

Turning point of my life
Turning point of my lifeTurning point of my life
Turning point of my lifegyou2
 
Bd2641384143
Bd2641384143Bd2641384143
Bd2641384143IJMER
 
Ax2419441946
Ax2419441946Ax2419441946
Ax2419441946IJMER
 
Di3211291134
Di3211291134Di3211291134
Di3211291134IJMER
 
Inmotionhosting metroblog com_bluehost_reviews_is_their_host
Inmotionhosting metroblog com_bluehost_reviews_is_their_hostInmotionhosting metroblog com_bluehost_reviews_is_their_host
Inmotionhosting metroblog com_bluehost_reviews_is_their_hostRancyna James
 
Trough External Service Management Improve Quality & Productivity
Trough External Service Management Improve Quality & ProductivityTrough External Service Management Improve Quality & Productivity
Trough External Service Management Improve Quality & ProductivityIJMER
 
Performance & emission characteristics of Two Cylinder Diesel Engine Using D...
Performance & emission characteristics of Two Cylinder Diesel  Engine Using D...Performance & emission characteristics of Two Cylinder Diesel  Engine Using D...
Performance & emission characteristics of Two Cylinder Diesel Engine Using D...IJMER
 
Estimize Bull speed using Back propagation
Estimize Bull speed using Back propagationEstimize Bull speed using Back propagation
Estimize Bull speed using Back propagationIJMER
 
Energy Audit of a Food Industry
Energy Audit of a Food IndustryEnergy Audit of a Food Industry
Energy Audit of a Food IndustryIJMER
 
Personalization of the Web Search
Personalization of the Web SearchPersonalization of the Web Search
Personalization of the Web SearchIJMER
 
Estimation of Cost Analysis for 4 Kw Grids Connected Solar Photovoltiac Plant
Estimation of Cost Analysis for 4 Kw Grids Connected Solar Photovoltiac PlantEstimation of Cost Analysis for 4 Kw Grids Connected Solar Photovoltiac Plant
Estimation of Cost Analysis for 4 Kw Grids Connected Solar Photovoltiac PlantIJMER
 
Image Denoising Using Non Linear Filter
Image Denoising Using Non Linear FilterImage Denoising Using Non Linear Filter
Image Denoising Using Non Linear FilterIJMER
 
Novel mechanical Whistle Counter Device for Pressure Cooker
Novel mechanical Whistle Counter Device for Pressure CookerNovel mechanical Whistle Counter Device for Pressure Cooker
Novel mechanical Whistle Counter Device for Pressure CookerIJMER
 
Design and Implementation of Wireless Embedded Systems at 60 GHz Millimeter-W...
Design and Implementation of Wireless Embedded Systems at 60 GHz Millimeter-W...Design and Implementation of Wireless Embedded Systems at 60 GHz Millimeter-W...
Design and Implementation of Wireless Embedded Systems at 60 GHz Millimeter-W...IJMER
 
Zigbee Based Wireless Sensor Networks for Smart Campus
Zigbee Based Wireless Sensor Networks for Smart CampusZigbee Based Wireless Sensor Networks for Smart Campus
Zigbee Based Wireless Sensor Networks for Smart CampusIJMER
 
Noise Removal with Morphological Operations Opening and Closing Using Erosio...
Noise Removal with Morphological Operations Opening and  Closing Using Erosio...Noise Removal with Morphological Operations Opening and  Closing Using Erosio...
Noise Removal with Morphological Operations Opening and Closing Using Erosio...IJMER
 

Viewers also liked (20)

Turning point of my life
Turning point of my lifeTurning point of my life
Turning point of my life
 
Bd2641384143
Bd2641384143Bd2641384143
Bd2641384143
 
Ax2419441946
Ax2419441946Ax2419441946
Ax2419441946
 
Di3211291134
Di3211291134Di3211291134
Di3211291134
 
South africa
South africaSouth africa
South africa
 
Liverpool
LiverpoolLiverpool
Liverpool
 
Inmotionhosting metroblog com_bluehost_reviews_is_their_host
Inmotionhosting metroblog com_bluehost_reviews_is_their_hostInmotionhosting metroblog com_bluehost_reviews_is_their_host
Inmotionhosting metroblog com_bluehost_reviews_is_their_host
 
Trough External Service Management Improve Quality & Productivity
Trough External Service Management Improve Quality & ProductivityTrough External Service Management Improve Quality & Productivity
Trough External Service Management Improve Quality & Productivity
 
Performance & emission characteristics of Two Cylinder Diesel Engine Using D...
Performance & emission characteristics of Two Cylinder Diesel  Engine Using D...Performance & emission characteristics of Two Cylinder Diesel  Engine Using D...
Performance & emission characteristics of Two Cylinder Diesel Engine Using D...
 
Estimize Bull speed using Back propagation
Estimize Bull speed using Back propagationEstimize Bull speed using Back propagation
Estimize Bull speed using Back propagation
 
Kompania deily
Kompania deilyKompania deily
Kompania deily
 
Energy Audit of a Food Industry
Energy Audit of a Food IndustryEnergy Audit of a Food Industry
Energy Audit of a Food Industry
 
Personalization of the Web Search
Personalization of the Web SearchPersonalization of the Web Search
Personalization of the Web Search
 
Estimation of Cost Analysis for 4 Kw Grids Connected Solar Photovoltiac Plant
Estimation of Cost Analysis for 4 Kw Grids Connected Solar Photovoltiac PlantEstimation of Cost Analysis for 4 Kw Grids Connected Solar Photovoltiac Plant
Estimation of Cost Analysis for 4 Kw Grids Connected Solar Photovoltiac Plant
 
Image Denoising Using Non Linear Filter
Image Denoising Using Non Linear FilterImage Denoising Using Non Linear Filter
Image Denoising Using Non Linear Filter
 
Novel mechanical Whistle Counter Device for Pressure Cooker
Novel mechanical Whistle Counter Device for Pressure CookerNovel mechanical Whistle Counter Device for Pressure Cooker
Novel mechanical Whistle Counter Device for Pressure Cooker
 
Design and Implementation of Wireless Embedded Systems at 60 GHz Millimeter-W...
Design and Implementation of Wireless Embedded Systems at 60 GHz Millimeter-W...Design and Implementation of Wireless Embedded Systems at 60 GHz Millimeter-W...
Design and Implementation of Wireless Embedded Systems at 60 GHz Millimeter-W...
 
Zigbee Based Wireless Sensor Networks for Smart Campus
Zigbee Based Wireless Sensor Networks for Smart CampusZigbee Based Wireless Sensor Networks for Smart Campus
Zigbee Based Wireless Sensor Networks for Smart Campus
 
BlackBerry WebWorks
BlackBerry WebWorksBlackBerry WebWorks
BlackBerry WebWorks
 
Noise Removal with Morphological Operations Opening and Closing Using Erosio...
Noise Removal with Morphological Operations Opening and  Closing Using Erosio...Noise Removal with Morphological Operations Opening and  Closing Using Erosio...
Noise Removal with Morphological Operations Opening and Closing Using Erosio...
 

Similar to Dq2644974501

A rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationA rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationNinad Samel
 
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...IJTET Journal
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET Journal
 
Expression of Query in XML object-oriented database
Expression of Query in XML object-oriented databaseExpression of Query in XML object-oriented database
Expression of Query in XML object-oriented databaseEditor IJCATR
 
Expression of Query in XML object-oriented database
Expression of Query in XML object-oriented databaseExpression of Query in XML object-oriented database
Expression of Query in XML object-oriented databaseEditor IJCATR
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)theijes
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSeditorijettcs
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSeditorijettcs
 
Information retrival system and PageRank algorithm
Information retrival system and PageRank algorithmInformation retrival system and PageRank algorithm
Information retrival system and PageRank algorithmRupali Bhatnagar
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET Journal
 
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data SetsHortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data SetsIJMER
 
Overview of Indexing In Object Oriented Database
Overview of Indexing In Object Oriented DatabaseOverview of Indexing In Object Oriented Database
Overview of Indexing In Object Oriented DatabaseEditor IJMTER
 
Annotating Search Results from Web Databases
Annotating Search Results from Web Databases Annotating Search Results from Web Databases
Annotating Search Results from Web Databases Mohit Sngg
 
Bs31267274
Bs31267274Bs31267274
Bs31267274IJMER
 
27 ijcse-01238-5 sivaranjani
27 ijcse-01238-5 sivaranjani27 ijcse-01238-5 sivaranjani
27 ijcse-01238-5 sivaranjaniShivlal Mewada
 
11.0004www.iiste.org call for paper.on demand quality of web services using r...
11.0004www.iiste.org call for paper.on demand quality of web services using r...11.0004www.iiste.org call for paper.on demand quality of web services using r...
11.0004www.iiste.org call for paper.on demand quality of web services using r...Alexander Decker
 
4.on demand quality of web services using ranking by multi criteria 31-35
4.on demand quality of web services using ranking by multi criteria 31-354.on demand quality of web services using ranking by multi criteria 31-35
4.on demand quality of web services using ranking by multi criteria 31-35Alexander Decker
 

Similar to Dq2644974501 (20)

A rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationA rough set based hybrid method to text categorization
A rough set based hybrid method to text categorization
 
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...
An Improvised Fuzzy Preference Tree Of CRS For E-Services Using Incremental A...
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
International Journal of Engineering Inventions (IJEI),
International Journal of Engineering Inventions (IJEI), International Journal of Engineering Inventions (IJEI),
International Journal of Engineering Inventions (IJEI),
 
Expression of Query in XML object-oriented database
Expression of Query in XML object-oriented databaseExpression of Query in XML object-oriented database
Expression of Query in XML object-oriented database
 
Expression of Query in XML object-oriented database
Expression of Query in XML object-oriented databaseExpression of Query in XML object-oriented database
Expression of Query in XML object-oriented database
 
The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)The International Journal of Engineering and Science (The IJES)
The International Journal of Engineering and Science (The IJES)
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
 
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONSEXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
EXPLORING DATA MINING TECHNIQUES AND ITS APPLICATIONS
 
Information retrival system and PageRank algorithm
Information retrival system and PageRank algorithmInformation retrival system and PageRank algorithm
Information retrival system and PageRank algorithm
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
 
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data SetsHortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
Hortizontal Aggregation in SQL for Data Mining Analysis to Prepare Data Sets
 
CloWSer
CloWSerCloWSer
CloWSer
 
Overview of Indexing In Object Oriented Database
Overview of Indexing In Object Oriented DatabaseOverview of Indexing In Object Oriented Database
Overview of Indexing In Object Oriented Database
 
Annotating Search Results from Web Databases
Annotating Search Results from Web Databases Annotating Search Results from Web Databases
Annotating Search Results from Web Databases
 
Bs31267274
Bs31267274Bs31267274
Bs31267274
 
27 ijcse-01238-5 sivaranjani
27 ijcse-01238-5 sivaranjani27 ijcse-01238-5 sivaranjani
27 ijcse-01238-5 sivaranjani
 
11.0004www.iiste.org call for paper.on demand quality of web services using r...
11.0004www.iiste.org call for paper.on demand quality of web services using r...11.0004www.iiste.org call for paper.on demand quality of web services using r...
11.0004www.iiste.org call for paper.on demand quality of web services using r...
 
4.on demand quality of web services using ranking by multi criteria 31-35
4.on demand quality of web services using ranking by multi criteria 31-354.on demand quality of web services using ranking by multi criteria 31-35
4.on demand quality of web services using ranking by multi criteria 31-35
 

More from IJMER

A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...IJMER
 
Developing Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed DelintingDeveloping Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed DelintingIJMER
 
Study & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja FibreStudy & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja FibreIJMER
 
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)IJMER
 
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...IJMER
 
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...IJMER
 
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...IJMER
 
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...IJMER
 
Static Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationStatic Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationIJMER
 
High Speed Effortless Bicycle
High Speed Effortless BicycleHigh Speed Effortless Bicycle
High Speed Effortless BicycleIJMER
 
Integration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIntegration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIJMER
 
Microcontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation SystemMicrocontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation SystemIJMER
 
On some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological SpacesOn some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological SpacesIJMER
 
Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...IJMER
 
Natural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningNatural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningIJMER
 
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcessEvolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcessIJMER
 
Material Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersMaterial Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersIJMER
 
Studies On Energy Conservation And Audit
Studies On Energy Conservation And AuditStudies On Energy Conservation And Audit
Studies On Energy Conservation And AuditIJMER
 
An Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDLAn Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDLIJMER
 
Discrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One PreyDiscrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One PreyIJMER
 

More from IJMER (20)

A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...A Study on Translucent Concrete Product and Its Properties by Using Optical F...
A Study on Translucent Concrete Product and Its Properties by Using Optical F...
 
Developing Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed DelintingDeveloping Cost Effective Automation for Cotton Seed Delinting
Developing Cost Effective Automation for Cotton Seed Delinting
 
Study & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja FibreStudy & Testing Of Bio-Composite Material Based On Munja Fibre
Study & Testing Of Bio-Composite Material Based On Munja Fibre
 
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor)
 
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F...
 
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu...
 
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in...
 
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A...
 
Static Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationStatic Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
Static Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation
 
High Speed Effortless Bicycle
High Speed Effortless BicycleHigh Speed Effortless Bicycle
High Speed Effortless Bicycle
 
Integration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIntegration of Struts & Spring & Hibernate for Enterprise Applications
Integration of Struts & Spring & Hibernate for Enterprise Applications
 
Microcontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation SystemMicrocontroller Based Automatic Sprinkler Irrigation System
Microcontroller Based Automatic Sprinkler Irrigation System
 
On some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological SpacesOn some locally closed sets and spaces in Ideal Topological Spaces
On some locally closed sets and spaces in Ideal Topological Spaces
 
Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...Intrusion Detection and Forensics based on decision tree and Association rule...
Intrusion Detection and Forensics based on decision tree and Association rule...
 
Natural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine LearningNatural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine Learning
 
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcessEvolvea Frameworkfor SelectingPrime Software DevelopmentProcess
Evolvea Frameworkfor SelectingPrime Software DevelopmentProcess
 
Material Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersMaterial Parameter and Effect of Thermal Load on Functionally Graded Cylinders
Material Parameter and Effect of Thermal Load on Functionally Graded Cylinders
 
Studies On Energy Conservation And Audit
Studies On Energy Conservation And AuditStudies On Energy Conservation And Audit
Studies On Energy Conservation And Audit
 
An Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDLAn Implementation of I2C Slave Interface using Verilog HDL
An Implementation of I2C Slave Interface using Verilog HDL
 
Discrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One PreyDiscrete Model of Two Predators competing for One Prey
Discrete Model of Two Predators competing for One Prey
 

Dq2644974501

  • 1. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.2, Issue.6, Nov-Dec. 2012 pp-4497-4501 ISSN: 2249-6645 A Heuristic Algorithm to Reduce Information Overhead Based On Hierarchical Category Structure Kishore Kumar Reddy1, M M M K Varma2 1 (M.Tech, Department of CSE, Sri Sivani College of Engineering, Andhra Pradesh, India 2 (Assistant Professor, Department of CSE, Sri Sivani College of Engineering, Andhra Pradesh, India Abstract : Database systems are being increasingly used for category is assigned a descriptive label examining which the interactive and exploratory data retrieval. In such retrieval user can determine whether the category is relevant or not. queries often cause in too many answers, this phenomenon is Then user can visit the relevant categories and ignore the referred as “information overload”. In this paper, we remaining ones. Second, they present the answers to the proposed a solution which can automatically categorize the queries in a ranked order. Thus, categorization and ranking results of SQL queries to address this problem. The proposed present two complementary techniques to manage solution dynamically generate a labeled hierarchical information overload. After browsing the categorization category structure where users can determine whether a hierarchy and/or examining the ranked results, users often category is relevant or not by checking its label and can reformulate the query into a more focused narrower query. explore the relevant categories and ignore other categories Therefore, categorization and ranking are indirectly to reduce information overload. In the proposed solution, an useful even for subsequent reformulation of the queries. analytical model is designed to estimate information In contrast to the internet text search, categorization overload faced by a user for a given exploration. Based on and ranking of query results have received much less the model, we formulate the categorization problem as a cost attention in the database field. Recently ranking of query optimization problem and develop heuristic algorithms to results has gained some attention. But all the existing calculate the min-cost categorization. methods have not critically examined the use of categorization of query results in a relational database. Keywords: Data Mining, Information Overload, Hence in this paper, categorization of database query results Categorization, Ranking, Discretization presents some unique challenges that are not addressed in various search engines/web directories. In all the existing I. Introduction search engines, the category structures are created a priori Database systems are being increasingly used for and the items are tagged in advance as well. At search time, interactive and exploratory data retrieval [1, 2, 8, 14]. In the search results are integrated with the pre-defined such retrieval, queries often result in too many answers. Not category structure by simply placing each search result under all the retrieved items are relevant to the user, only a few the category it was assigned during the tagging process. result set is relevant to the user. Unfortunately, user needs to Since such categorization is independent of the query, the check all the retrieved items to find relevant information distribution of items in the categories is susceptible to skew: need to the user query. This phenomenon is commonly some groups can have a very large number of items and referred to as “information overload”. Consider a scenario, some very few. real-estate database that maintains information like the For example, a search on „databases‟ on location, price, number of bedrooms etc. of each house Amazon.com yields around 34,000 matches out of which available for sale. Suppose that a potential buyer is looking 32,580 are under the “books” category. These 32,580 items for homes in the Seattle/Bellevue Area of Washington, USA are not categorized any further (can be sorted based on price in the $200,000 to $300,000 price range. The above query, or publication date or customer rating) and the user is forced henceforth referred to as the “Homes” query, returns 6,045 to go through the long list to find the relevant items. This homes when executed on the MSN House&Home home defeats the purpose of categorization as it retains the listing database. Information overload makes it difficult for problem of information overload. In this paper, we propose the user to differentiate the interesting and uninteresting techniques to automatically categorize the results of SQL items, which leads to a huge wastage of user‟s time and queries on a relational database in order to reduce effort. Information overload can happen when the user is not information overload. Unlike the “a priori” categorization certain about the query. In such a situation, user can pose a techniques described above, we generate a labelled broad query in the beginning to avoid exclusion of hierarchical category structure automatically based on the potentially interesting results. For example, a user shopping contents of the tuples in the answer set. Since our category for a home is often not sure of the exact neighbourhood she structure is generated at query time and hence tailored to the wants or the exact price range or the exact square footage at result set of the query at hand, it does not suffer from the the beginning of the query. Such broad queries may also problems of a priori categorization discussed above. occur when the user is naïve and refrains from using This paper discusses how such categorization advanced search features [8]. Finally, information overload structures can be generated on the fly to best reduce the is inherent when users are interested in browsing through a information overload. We begin by identifying the space of set of items instead of searching among them. categorizations and develop an understanding of the In the context internet text search, there has been exploration models that the user may follow in navigating two canonical ways to avoid information overload. First, the hierarchies. Such understanding helps us compare and they group the search results into separate categories. Each contrast the relative goodness of the alternatives for www.ijmer.com 4497 | Page
  • 2. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.2, Issue.6, Nov-Dec. 2012 pp-4497-4501 ISSN: 2249-6645 categorization. This leads to an analytical cost model that III. Basics of Categorization captures the goodness of a categorization. Such a cost model Space of Categorizations is driven by the aggregate knowledge of user behaviours that Let R be a set of tuples. R can either be a base can be gleaned from the workload experienced on the relation or a materialized view or it can be the result of a system. Finally, we show that we can efficiently search the query Q. We assume that R does not contain any aggregated space of categorizations to find a good categorization using or derived attributes, i.e., Q does not contain any GROUP the analytical cost models. Our solution is general and BYs or attribute derivations (Q is a SPJ query). A presents a domain-independent approach to addressing the hierarchical categorization of R is a recursive partitioning of information overload problem. We perform extensive the tuples in R based on the data attributes and their values. experiments to evaluate our cost models as well as our We define a valid hierarchical categorization T of R categorization algorithm. inductively as follows. The rest of the paper is organized as section 2: discuss about the related work, section 3: discuss about Base Case: Given the root or “ALL” node (level 0) which basics of categorization section 4: presents the Proposed contains all the tuples in R, we partition the tuples in R into Solution, section 5: discuss about cost model, section 6: an ordered list of mutually disjoint categories (level 1 categorization algorithm, section 7: discuss about nodes2) using a single attribute. Experimental setup, section 8: concludes the paper. Inductive Step: Given a node C at level (l-1), we partition II. Related Work the set of tuples tset(C) contained in C into an ordered list of OLAP and Data Visualization: Our work on categorization mutually disjoint subcategories (level l nodes) using a single is related to OLAP as both involve presenting a hierarchical, attribute which is same for all nodes at level (l-1). We aggregated view of data to the user and allow to drilldown/ partition a node C only if C contains more than a certain roll-up the categories [13]. However, in OLAP, the user number of tuples. The attribute used is referred to as the needs to manually specify the grouping attributes and categorizing attribute of the level l nodes and the grouping functions [13] in our work, those are determined subcategorizing attribute of the level (l-1) nodes. automatically. Information visualization deals with visual Furthermore, once an attribute is used as a categorizing ways to present information [6]. It can be thought as a step attribute at any level, it is not repeated at a later level, i.e., after categorization to the further reduce information there is a 1:1 association between each level of T and the overload: given the category structure proposed in this paper, corresponding categorizing attribute. We impose the above we can use visualization techniques visually display the tree constraints to ensure that the categorization is simple, [6]. intuitive and easily understandable to the user. Associated with each node C is a category label and a tuple- Data Mining: The space in which the clusters are discovered set as defined below: is usually provided there whereas, in categorization, we need to find that space. Second, existing clustering algorithms Category Label: The predicate label(C) describing node C. deal with either exclusively categorical [11] or exclusively For example, the first child of root (rendered at the top) has numeric spaces [17] in categorization, the space usually label „Neighbourhood ∈ {Redmond, Bellevue}‟ while the involves both categorical and numeric attributes. Third, the first child of the above category has label „Price: 200K– optimization criteria are different while it is minimizing 225K‟. inter-cluster distance in clustering to decrease information overload. Our work differs from classification where the Tuple-Set: The set of tuples tset(C) (called the tuple-set of categories are already given their along with a training C) contained in C either appearing directly under C (if C is a database of labelled tuples and we need predict the label of leaf node) or under its subcategories (if C is a non-leaf future, unlabeled tuples [12]. node). Formally, tset(C) is the set of tuples, among the ones contained in the parent of C, which satisfy the predicate Discretization/Histograms: The discretization assumes that label(C). In other words, tset(C) is the subset of tuples in R there is a class assigned to each numeric value and uses the that satisfies the conjunction of category labels of all nodes entropy minimization heuristic [10]. On the other hand, the on the path from the root to C histogram bucket selection is based on minimization of The label of a category unambiguously describes to errors in result size estimation [5, 15]. the user which tuples, among those in the tuple set of the parent of C, appear under C. Hence, user can determine Ranking: Ranked retrieval has traditionally been used in whether C contains any item that is relevant or not by Information Retrieval in the context of keyword searches looking just at the label and hence decide whether to explore over text/unstructured data [3] but has been proposed in the or ignore C. context of relational databases recently [2,4,14]. Ranking is a powerful technique for reducing information overload and IV. Proposed Model can be used effectively in complement with categorization. We present two models capturing two cases in data Although categorization has been studied extensively in the exploration. One scenario is that the user explores the result text domain [9, 16] to the best of our knowledge, this is the set R using the category tree T until finds relevant tuple t first proposal for automatic categorization in the context of ∈R. For example, the user may want to find every home relational databases. relevant to her in the “Homes” query. In order to ensure that user finds every relevant tuple and needs to examine every www.ijmer.com 4498 | Page
  • 3. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.2, Issue.6, Nov-Dec. 2012 pp-4497-4501 ISSN: 2249-6645 tuple and every category label except the ones that appear The user starts the exploration by exploring the root node. under categories she deliberately decides to ignore. Another Given that the user has decided to explore a node C, user scenario is that the user is interested in just one (or two or a non-deterministically choose one of the two options few) tuple(s) in R so she explores R using T till she finds that one (or few) tuple(s). For example, a user may be Option ‘SHOWTUPLES’: Check the tuples in tset(C) satisfied if she finds just one or two homes that are relevant starting from the first tuple in tset(C) till user finds the first to her. For the purpose of modeling, we assume that, in this relevant tuple. In this paper, we do not assume any particular case, the user is interested in just one tuple, i.e., the user ordering/ranking when the tuples in tset(C) are presented to explores the result set until finds the first relevant tuple. We the user. consider these two cases because they both occur commonly and they differ in their analytical models. Option ‘SHOWCAT’: Examine the labels of the subcategories of C starting from the first subcategory till the Exploration Model for ‘All’ case first one she finds interesting. As in the „ALL‟ case, user Figure 1 describes the exploration model for all case the user checks the label of each subcategory Ci starting from the starts the exploration by exploring the root node. Given user first one and non-deterministically chooses to either explore has decided to explore the node C, if C is a non-leaf node, it or ignore it. If user chooses to ignore Ci, simply proceeds user non-deterministically chooses one of the two options and checks the next label. If user chooses to explore Ci, it does so recursively based on the same exploration model. Option ‘SHOWTUPLES’: Browse through the tuples in We assume that when she drills down into Ci, user finds at tset(C). Note that the user needs to examine all tuples in least one relevant tuple in tset (Ci); so, unlike in the „ALL‟ tset(C) to make sure that she finds every tuple relevant to case, the user does not need to examine the labels of the her. remaining subcategories of C. Option ‘SHOWCAT’: Examine the labels of all the n V. Cost Estimation subcategories of C, exploring the ones relevant to her and Since we want to generate the tree imposes the least ignoring the rest. More specifically, she examines the label possible information overload on the user, we need to of each subcategory Ci starting from the first subcategory estimate the information overload that a user will face during and non-deterministically chooses to either explore it or an exploration using a given category T. ignore it. If user chooses to ignore Ci, simply proceeds and examines the next label (of Ci+1). If user chooses to explore Cost Model for ‘ALL’ case Ci, it does so recursively based on the same exploration Let us first consider the „ALL‟ case. Given a user model, i.e., by choosing either „SHOWTUPLES‟ or exploration X using category tree T, we define information „SHOWCAT‟ if it is an internal node or by choosing overload cost, or simply cost (denoted by Cost All(X, T)), as „SHOWTUPLES‟ if it is a leafnode. After she finishes the the total number of items examined by the user during X. exploration of Ci, user goes ahead and examines the label of This definition is based on the assumption that the time spent the next subcategory of C (of Ci+1). Note that we assume in finding the relevant tuples is proportional to the number of that the user examines the subcategories in the order it items the user needs to examine more the number of items appears under C it can be from top to bottom or from left to user needs to examine, more the time wasted in finding the right depending on how the tree is rendered by the user relevant tuples, higher the information overload. interface. If C is a leaf node, „SHOWTUPLES‟ is the only option Cost Model for ‘ONE’ Scenario (option „SHOWCAT‟ is not possible since a leaf node has no The information overload cost CostOne (T) that a user will subcategories). face, on average during an exploration using a given category tree T is the number of items that a user will need to examine till user finds the first relevant tuple. Let us consider the cost CostOne(C) of exploring the subtree rooted at C given that the user has chosen to explore C, CostOne (T) is simply CostOne (root). If the user goes for option „SHOWTUPLES‟ for C and frac (C) denotes the fraction of tuples in tset(C) that she needs to examine, on average, before she finds the first relevant tuple, the cost, on average, is frac(C)*|tset(C)|. If user goes for option „SHOWCAT‟, the total cost is (K*i + CostOne (Ci)) if Ci is the first subcategory of C explored by the user (since the user examines only i labels and explores only Ci). Figure 1: Flow chart model of exploration of node C in ‘All’ case Exploration Model for ‘One’ Scenario www.ijmer.com 4499 | Page
  • 4. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.2, Issue.6, Nov-Dec. 2012 pp-4497-4501 ISSN: 2249-6645 VI. Categorization Algorithm Accurate Cost Model: There is a strong positive correlation Since we know how to compute the information between the estimated average exploration cost and actual overload cost Cost All (T) of a given tree T, we can exploration cost for various users. This indicates that our enumerate all the permissible category trees on R, compute workload-based cost models can accurately model their costs and pick the tree Topt with the minimum cost. information overload in real life. This enumerative algorithm will produce the cost-optimal tree but could be prohibitively expensive as the number of Better Categorization Algorithm: Our cost-based permissible categorizations may be extremely large. categorization algorithm produces significantly better Hence we present our preliminary ideas to reduce category trees compared to techniques that do not consider the search space of enumeration. First we present our such analytical models. techniques in the context of 1-level categorization i.e., a root node pointing to a set of mutually disjoint categories which Table 1: Pearson’s Correlation between estimated cost are not subcategorized any further. and actual cost Subset Correlation Reducing the Choices of Categorizing Attribute 1 0.39 Since the presence of a selection condition on an 2 0.98 attribute in a workload query reflects the user‟s interest in 3 0.32 that attribute, attributes that occur infrequently in the 4 0.48 workload can be discarded right away while searching for the min-cost tree. Let A be the categorizing attribute chosen 5 0.16 for the 1-level categorization. If the occurrence count 6 0.16 NAttr(A) of A in the workload is low, the SHOWTUPLES 7 0.19 probability Pw(root) of the root node will be high. Since the 8 0.76 SHOWTUPLES cost of a tree is typically much higher than All 0.90 its SHOWCAT cost and the choice of partitioning affects only the SHOWCAT cost, a high SHOWTUPLES To further confirm the strength of the positive probability implies that the cost of the resulting tree would correlation between the estimated and actual costs, we have a large first component (Pw(root)*|tset(root)|) which compute the Pearson Correlation Coefficient for each subset would contribute to a higher total cost. Therefore, it is separately as well as together in Table 1. The overall reasonable to consider eliminating such low occurring correlation (0.9) indicates almost perfect linear relationship attributes without considering any of their partitioning. while the subset correlations show either weak (between 0.2 Specifically, we eliminate the uninteresting and 0.6) or strong (between 0.6 and 1.0) positive correlation. attributes using the following simple heuristic: if an attribute This shows that our cost models accurately model A occurs in less than a fraction x of the queries in the information overload faced by users in real-life. workload, i.e., NAttr (A)/N < x, we eliminate A. The threshold x will need to be specified by the system designer/domain expert. For example, for the home searching application, if we use x=0.4, only 6 attributes, namely neighborhood, price, bedroomcount, bathcount, property-type and square footage, are retained from among 53 attributes in the MSN House2Home dataset. VII. Experimental Evaluation To evaluate the performance of the proposed model, we present the results of an extensive empirical study we have conducted to evaluate the accuracy of our cost models Figure 2: Correlation between actual cost and estimated in modeling information overload and evaluate our cost- cost based categorization algorithm and compare it with categorization techniques that do not consider such cost models. Our experiments consist of a real-life user study as well as a novel, large-scale, simulated, cross-validated user- study. For both studies, our dataset comprises of a single table called ListProperty which contains information about real homes that are available for sale in the whole of United States. The ListProperty contains 1.7 million rows (each row is a home) and, in addition to other attributes, has the following non-null attributes: location (neighborhood, city, state, zip code), price, bedroomcount, bathcount, year-built, property-type (whether it is a single family home or condo etc.) and square-footage. Figure 3: Cost of various techniques www.ijmer.com 4500 | Page
  • 5. International Journal of Modern Engineering Research (IJMER) www.ijmer.com Vol.2, Issue.6, Nov-Dec. 2012 pp-4497-4501 ISSN: 2249-6645 Figure 2 plots the estimated cost against the actual cost for the 800 explorations. The plot along with the trend [4] N. Bruno, L. Gravano and S. Chaudhuri, Top-k Selection line (best linear fit with intercept 0 is y = 1.1002x) shows Queries over Relational Databases: Mapping Strategies and that the actual cost incurred by users has strong positive Performance Evaluation. In ACM TODS, VO, 27, No. 2, June 2002. correlation with the estimated average cost. [5] N. Bruno, S. Chaudhuri and L. Gravano. STHoles: A Figure 3 compares the proposed technique with the Multidimensional Workload-Aware Histogram. Proc. of Attr-cost and No cost techniques based on the fractional cost SIGMOD, 2001. averaged over the queries in a subset. [6] S. Card, J. MacKinlay and B. Shneiderman. Readings in Information Visualization: Using Vision to Think, Morgan VIII. Conclusion Kaufmann; 1st edition (1999). Database systems are being increasingly used for [7] J. Chu-Carroll, J. Prager, Y. Ravin and C. Cesar, A Hybrid interactive and exploratory data retrieval. In such retrieval Approach to Natural Language Web Search, In Proc. of Conference on Empirical Methods in Natural Language queries often cause in too many answers, this phenomenon is Processing, July 20 referred as “information overload”. In this paper, we [8] S. Dar, G. Entin, S. Geva and E. Palmon, DTL‟s DataSpot: proposed a solution which can automatically categorize the Database Exploration Using Plain Language, In Proceedings results of SQL queries to address this problem. The proposed of VLDB Conference, 1998. solution dynamically generate a labeled hierarchical category [9] S. Dumais, J. Platt, D. Heckerman and M. Sahami, Inductive structure where users can determine whether a category is learning algorithms and representations for text relevant or not by checking its label and can explore the categorization, In Proc. Of CIKM Conference, 1998. relevant categories and ignore other categories to reduce [10] U. Fayyad and K. Irani. Multi-Interval Discretization of information overload. In the proposed solution, an analytical Continuous-Valued Attributes for Classification Learning. Proc. of IJCAI, 1993. model is designed to estimate information overload faced by [11] V. Ganti, J. Gehrke and R. Ramakrishnan. CACTUS - a user for a given exploration. Based on the model, we Clustering Categorical Data Using Summaries. KDD, 1999. formulate the categorization problem as a cost optimization [12] J. Gehrke, V. Ganti, R. Ramakrishnan, W. Loh. problem and develop heuristic algorithms to calculate the BOATOptimistic Decision Tree Construction. Proc. of min-cost categorization. Our proposed cost-based SIGMOD, 1999. categorization algorithm produces significant better category [13] J. Gray, S. Chaudhuri, A. Bosworth, A. Layman, D. Reichart, trees compared to techniques that do not consider such cost- M. Venkatrao, F. Pellow and H. Pirahesh. Data Cube: A models. Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals. Journal of Data Mining and Knowledge Discovery", Vol 1, No. 1, 1997. References [14] V. Hristidis and Y. Papakonstantinou, DISCOVER: Keyword [1] S. Agrawal, S. Chaudhuri and G. Das. DBExplorer: A Search in Relational Databases, In Proc. of VLDB System for Keyword Search over Relational Databases. In Conference, 2002 Proceedings of ICDE Conference, 2002. [15] V. Poosala, Y. Ioannidis, P. Haas, E. Shekita. Improved [2] S. Agrawal, S. Chaudhuri, G. Das and A. Gionis. Automated Histograms for Selectivity Estimation of Range Predicates. Ranking of Database Query Results. In Proceedings of First Proc. of SIGMOD, 1996. Biennial Conference on Innovative Data Systems Research [16] F. Sebastiani, Machine learning in automated text (CIDR), 2003. categorization, ACM Computing Surveys, Vol. 34, No. 1, [3] R. Baeza_yates and B. Ribiero-Neto, Modern Information 2002. Retrieval, ACM Press, 1999. [17] T. Zhang, R. Ramakrishnan and M. Livny. BIRCH: an efficient data clustering method for very large databases. Proc. of ACM SIGMOD Conference, 1996. www.ijmer.com 4501 | Page