SlideShare a Scribd company logo
1 of 12
QUERY AWARE DETERMINIZATION OF UNCERTAIN
OBJECTS
Abstract—This paper considers the problem of determinizing probabilistic data to
enable such data to be stored in legacy systems that accept only deterministic
input. Probabilistic data may be generated by automated data analysis/enrichment
techniques such as entity resolution, information extraction, and speech processing.
The legacy system may correspond to pre-existing web applications such as Flickr,
Picasa, etc. The goal is to generate a deterministic representation of probabilistic
data that optimizes the quality of the end-application built on deterministic data.
We explore such a determinization problem in the context of two different data
processing tasks—triggers and selection queries. We show that approaches such as
thresholding or top-1 selection traditionally used for determinization lead to
suboptimal performance for such applications. Instead, we develop a query-aware
strategy and show its advantages over existing solutions through a comprehensive
empirical evaluation over real and synthetic datasets.
EXISTING SYSTEM:
Determinizing Probabilistic Data. While we are not aware of any prior work that
directly addresses the issue of determinizing probabilistic data as studied in this
paper, the works that are very related to ours are . They explore how to determinize
answers to a query over a probabilistic database. In contrast, we are interested in
best deterministic representation of data (and not that of a answer to a query) so as
to continue to use existing end-applications that take only deterministic input. The
differences in the two problem settings lead to different challenges. Authors in
address a problem that chooses the set of uncertain objects to be cleaned, in order
to achieve the best improvement in the quality of query answers. However, their
goal is to improve quality of single query, while ours is to optimize quality of
overall query workload. Also, they focus on how to select the best set of objects
and each selected object is cleaned by human clarification, whereas we determinize
all objects automatically. These differences essentially lead to different
optimization challenges. Another related area is MAP inference in graphical
model, which aims to find the assignment to each variable that jointly maximizes
the probability defined by the model. The determinization problem for the cost-
based metric can be potentially viewed as an instance of MAP inference problem.
If we view the problem that way, the challenge becomes that of developing fast
and high-quality approximate algorithm for solving the corresponding NP-hard
problem. Section 3.3 exactly provides such algorithms, heavily optimized and
tuned to specifically our problem setting. Probabilistic Data Models. A variety of
advanced probabilistic data models have been proposed in the past. Our focus
however was determinizing probabilistic objects, such as image tags and speech
output, for which the probabilistic attribute model suffices. We note that
determining probabilistic data stored in more advanced probabilistic models such
as And/Xor tree might also be interesting. Extending our work to deal with data of
such complexity remains an interesting future direction of work.
PROPOSED SYSTEM:
Overall, the main contributions of this paper are:
• We introduce the problem of determinizing probabilistic data. Given a workload
of triggers/queries, the main challenge is to find the deterministic representation of
the data which would optimize certain quality metrics of the answer to these
triggers/queries.
• We propose a framework that solves the problem of determinization by
minimizing the expected cost of the answer to queries. We develop a branchand-
bound algorithm that finds an approximate near-optimal solution to the resulting
NP-hard problem.
• We address the problem of determinizing a collection of objects to optimize set-
based quality metrics, such as F-measure. We develop an efficient algorithm that
reaches near-optimal quality.
• We extend the solutions to handle a data model where mutual exclusion exists
among tags. We show that correlations among tags can be leveraged in our
solutions to get better results. We also demonstrate that our solutions are designed
to handle various types of queries.
• We empirically demonstrate that the proposed algorithms are very efficient and
reach high-quality results that are very close to those of the optimal solution. We
also demonstrate that they are robust to small changes in the original query
workload.
Module 1
Data Quality
Data quality refers to the level of quality of Data. There are many definitions of
data quality but data is generally considered high quality if, "they are fit for their
intended uses in operations, decision making and planning." (J. M. Juran).
Alternatively, data is deemed of high quality if it correctly represents the real-
world construct to which it refers. Furthermore, apart from these definitions, as
data volume increases, the question of internal consistency within data becomes
significant, regardless of fitness for use for any particular external purpose. The
people's views on data quality can often be in disagreement, even when discussing
the same set of data used for the same purpose. A considerable amount of data
quality research involves investigating and describing various categories of
desirable attributes (or dimensions) of data. These lists commonly
include accuracy, correctness, currency, completeness andrelevance. Nearly 200
such terms have been identified and there is little agreement in their nature (are
these concepts, goals or criteria?), their definitions or measures (Wang et al.,
1993). Software engineers may recognize this as a similar problem to "ilities".
Data quality control is the process of controlling the usage of data with known
quality measurements for an application or a process. This process is usually done
after a Data Quality Assurance (QA) process, which consists of discovery of data
inconsistency and correction.
Data QA processesprovides following information to Data Quality Control (QC):
 Severity of inconsistency
 Incompleteness
 Accuracy
 Precision
 Missing / Unknown
The Data QC process uses the information from the QA process to decide to use
the data for analysis or in an application or business process. For example, if a
Data QC process finds that the data contains too many errors or inconsistencies,
then it prevents that data from being used for its intended process which could
cause disruption. For example, providing invalid measurements from several
sensors to the automatic pilot feature on an aircraft could cause it to crash. Thus,
establishing data QC process provides the protection of usage of data control and
establishes safe information usage.
Module 2
Determinization Problem
Having defined the notation, we now can define the determinization problem:
Definition 1 (Determinization). Given a set of uncertain objects O = {O1,O2, . . .
,O|O|}, a query workload Q and a quality metric F, the goal of the deteriminization
problem is for each object O ∈ O to select from WO a set of tags AO ⊆ WO as the
deterministic representation of O, such that F(O,Q) is optimized.
The ground truth tags G are not associated with uncertain objects and thus not
known to any approach to the determinization problem. Therefore, such algorithms
cannot directly measure the quality F(O,Q) during their execution. Thus, in the
following section, we will present our solution that is based on maximizing the
expected quality measure E(F(O,Q)). Before we describe our solution to the
determinization problem, we note that a baseline algorithm for determinization is a
thresholding (threshold-cut) approach. The thresholding approach employs a pre-
defined threshold τ . For each object O it composes its answer set A by choosing wi
tags from W such that P(wi) ≥ τ . The advantage of this approach is that it is
computationally efficient and potentially can be tuned to a particular dataset O and
workload Q by changing τ . The main drawback of this approach is that it is
unaware of the query workload (“query-unaware") and thus does not necessarily
optimize the given quality metrics, which leads to lower quality.
Module 3
Branch and Bound Algorithm
Instead of performing a brute-force enumeration, we can employ a faster branch
and bound (BB) technique. The approach discovers answer sets in a greedy fashion
so that answer sets with lower costtend to be discovered first.
Outline of the BB algorithm
The algorithm uses a branch and bound strategy to explore a search tree. Each
node v in the tree corresponds to a partial tag selection where decisions of whether
to include certain tags in the answer set or not have been made. We capture the
partial selection of tags using the concept of sequence. The performance of the
algorithm depends upon the choice of the node (amongst all the non-leaf nodes) to
branch, the computation of upper and lower bounds of a node, and the choice of
the tag used to expand the node.We next describe these components of the
algorithm in detail.
Module 4
Node Selection
The algorithm maintains a priority queue H for picking a node v that contains the
most promising sequence Sv to continue with. Among the nodes in the priority
queue, which one should we choose to branch next? Let us consider sequence Sv
that corresponds to a node v ∈ H. For v we define Av as the set of answer
sequences that correspond to the leaf nodes derivable from node v via the
branching procedure described above. That is, Av corresponds to the leaf nodes of
the subtree rooted at node v. For example, or node v2 of the tree in Fig. 3, Av2 =
{Sv8 , Sv9 , Sv10 , Sv11 }. Then for node v let mv = min A∈ Av E(cost(A,G,Q)) be
the value of the minimum expected cost among these sequences. Notice that if we
knew the exact value of mv, it would have been an ideal key for the priority queue
H, since it would lead to the quickest way to find the best sequences. The problem
is that the exact value of mv is unknown when v is branched, since the subtree
rooted at v is not yet constructed at that moment. Even though mv is unknown it is
possible to quickly determine good lower and upper bounds on its value _v ≤ mv ≤
hv, without comparing scores of each sequence in Av. For any node v, if Sv is an
answer sequence then the bounds are equal to the cost of the sequence itself,
otherwise the bounds are computed as explained next.
Module 5
Query-Level Optimization
The performance of the BB algorithm can be significantly improved further by
employing query-level optimizations. For a given sequence Sv of node v, we might
be able to exactly determine the expected cost of certain queries (e.g., Cases 1 and
2 above), even if Sv is not an answer sequence. In other words, for these queries we
can get their cost without expanding Sv into the corresponding answer sequences A
∈ Av. We refer to such cost as fixed cost of v, and to such queries as fixed or
decided queries with respect to O. Thus, each node v is also associated with the set
of undecided queries Qv, its fixed cost ˆcv, as well as its lower and upper bounds
ˆlv and ˆhv on non-fixed cost. Note that lv = ˆcv +ˆlv and hv = ˆcv + ˆhv. Fixed cost
and fixed queries give us an opportunity to make the BB algorithm even more
efficient by reducing the number of queries involved in computing the expected
cost. When the algorithm performs a branching at a node, it creates two new nodes
vyes and vno, by making yes/no decision for one undecided tag w. Since one
additional tag is now decided for each new node v, it is possible that expected cost
for some query Q ∈ Qv can be computed (Case 1 and 2). If so, Q becomes a fixed
query for the new node and will not be considered in future computations derived
from that node. Based on the relation between Sv and Q, there are three different
cases for updating ˆcv, ˆlv and ˆhv
CONCLUSION:
In this paper we have considered the problem of determinizing uncertain objects to
enable such data to be stored in pre-existing systems, such as Flickr, that take only
deterministic input. The goal is to generate deterministic representations that
optimize the quality of answers to queries/triggers that execute over the
deterministic data representation. We have proposed efficient determinization
algorithms that are orders of magnitude faster than the enumeration based optimal
solution but achieves almost the same quality as the optimal solution. As future
work, we plan to explore determinization techniques in the context of applications,
wherein users are also interested in retrieving objects in a ranked order.
REFERENCES
[1] D. V. Kalashnikov, S. Mehrotra, J. Xu, and N. Venkatasubramanian, “A
semantics-based approach for speech annotation of images,” IEEE Trans. Knowl.
Data Eng., vol. 23, no. 9, pp. 1373–1387, Sept. 2011.
[2] J. Li and J. Wang, “Automatic linguistic indexing of pictures by a statistical
modeling approach,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 9, pp.
1075–1088, Sept. 2003.
[3] C. Wangand, F. Jing, L. Zhang, and H. Zhang, “Image annotation refinement
using random walk with restarts,” in Proc. 14th Annu. ACM Int. Conf. Multimedia,
New York, NY, USA, 2006.
[4] B. Minescu, G. Damnati, F. Bechet, and R. de Mori, “Conditional use of word
lattices, confusion networks and 1-best string hypotheses in a sequential
interpretation strategy,” in Proc. ICASSP, 2007.
[5] R. Nuray-Turan, D. V. Kalashnikov, S. Mehrotra, and Y. Yu, “Attribute and
object selection queries on objects with probabilistic attributes,” ACM Trans.
DatabaseSyst., vol. 37, no. 1, Article 3, Feb. 2012.
[6] J. Li and A. Deshpande, “Consensus answers for queries over probabilistic
databases,” in Proc. 28th ACM SIGMOD-SIGACTSIGART Symp. PODS, New
York, NY, USA, 2009.
[7] M. B. Ebarhimi and A. A. Ghorbani, “A novel approach for frequent phrase
mining in web search engine query streams,” in Proc. 5th Annu. Conf. CNSR,
Frederlcton, NB, Canada, 2007.
[8] S. Bhatia, D. Majumdar, and P. Mitra, “Query suggestions in the absence of
query logs,” in Proc. 34th Int. ACM SIGIR, Beijing, China, 2011.
[9] C. Manning and H. Schutze, Foundations of Statistical Natural Language
Processing, Cambridge, MA, USA: MIT Press, 1999.
[10] D. V. Kalashnikov and S. Mehrotra, “Domain-independent data cleaning via
analysis of entity-relationship graph,” ACM Trans. Database Syst., vol. 31, no. 2,
pp. 716–767, Jun. 2006.
[11] K. Schnaitter, S. Abiteboul, T. Milo, and N. Polyzotis, “On-line index
selection for shifting workloads,” in Proc. IEEE 23rd Int. Conf. Data Eng.
Workshop, Istanbul, Turkey, 2007.
[12] P. Unterbrunner, G. Giannikis, G. Alonso, D. Fauser, and D. Kossmann,
“Predictable performance for unpredictable workloads,” in Proc. VLDB, Lyon,
France, 2009.
[13] R. Cheng, J. Chen, and X. Xie, “Cleaning uncertain data with quality
guarantees,” in Proc. VLDB, Auckland, New Zealand, 2008.

More Related Content

What's hot

USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWSUSING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWScsandit
 
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...IJDKP
 
Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...Damian R. Mingle, MBA
 
Analysis of Textual Data Classification with a Reddit Comments Dataset
Analysis of Textual Data Classification with a Reddit Comments DatasetAnalysis of Textual Data Classification with a Reddit Comments Dataset
Analysis of Textual Data Classification with a Reddit Comments DatasetAdamBab
 
Selecting the correct Data Mining Method: Classification & InDaMiTe-R
Selecting the correct Data Mining Method: Classification & InDaMiTe-RSelecting the correct Data Mining Method: Classification & InDaMiTe-R
Selecting the correct Data Mining Method: Classification & InDaMiTe-RIOSR Journals
 
Explainable AI in Drug Hunting
Explainable AI in Drug HuntingExplainable AI in Drug Hunting
Explainable AI in Drug HuntingEd Griffen
 
Opinion mining framework using proposed RB-bayes model for text classication
Opinion mining framework using proposed RB-bayes model for text classicationOpinion mining framework using proposed RB-bayes model for text classication
Opinion mining framework using proposed RB-bayes model for text classicationIJECEIAES
 
Semantic Annotation of Documents
Semantic Annotation of DocumentsSemantic Annotation of Documents
Semantic Annotation of Documentssubash chandra
 
Introduction to feature subset selection method
Introduction to feature subset selection methodIntroduction to feature subset selection method
Introduction to feature subset selection methodIJSRD
 
Classification on multi label dataset using rule mining technique
Classification on multi label dataset using rule mining techniqueClassification on multi label dataset using rule mining technique
Classification on multi label dataset using rule mining techniqueeSAT Publishing House
 
Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...nalini manogaran
 
An integrated mechanism for feature selection
An integrated mechanism for feature selectionAn integrated mechanism for feature selection
An integrated mechanism for feature selectionsai kumar
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsDinusha Dilanka
 

What's hot (20)

Introductionedited
IntroductioneditedIntroductionedited
Introductionedited
 
FINAL REVIEW
FINAL REVIEWFINAL REVIEW
FINAL REVIEW
 
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWSUSING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
USING NLP APPROACH FOR ANALYZING CUSTOMER REVIEWS
 
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA...
 
Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...Controlling informative features for improved accuracy and faster predictions...
Controlling informative features for improved accuracy and faster predictions...
 
IJET-V2I6P32
IJET-V2I6P32IJET-V2I6P32
IJET-V2I6P32
 
Analysis of Textual Data Classification with a Reddit Comments Dataset
Analysis of Textual Data Classification with a Reddit Comments DatasetAnalysis of Textual Data Classification with a Reddit Comments Dataset
Analysis of Textual Data Classification with a Reddit Comments Dataset
 
KDD Cup Research Paper
KDD Cup Research PaperKDD Cup Research Paper
KDD Cup Research Paper
 
C017510717
C017510717C017510717
C017510717
 
Selecting the correct Data Mining Method: Classification & InDaMiTe-R
Selecting the correct Data Mining Method: Classification & InDaMiTe-RSelecting the correct Data Mining Method: Classification & InDaMiTe-R
Selecting the correct Data Mining Method: Classification & InDaMiTe-R
 
Explainable AI in Drug Hunting
Explainable AI in Drug HuntingExplainable AI in Drug Hunting
Explainable AI in Drug Hunting
 
Opinion mining framework using proposed RB-bayes model for text classication
Opinion mining framework using proposed RB-bayes model for text classicationOpinion mining framework using proposed RB-bayes model for text classication
Opinion mining framework using proposed RB-bayes model for text classication
 
Semantic Annotation of Documents
Semantic Annotation of DocumentsSemantic Annotation of Documents
Semantic Annotation of Documents
 
03 Data Mining Techniques
03 Data Mining Techniques03 Data Mining Techniques
03 Data Mining Techniques
 
Introduction to feature subset selection method
Introduction to feature subset selection methodIntroduction to feature subset selection method
Introduction to feature subset selection method
 
Classification on multi label dataset using rule mining technique
Classification on multi label dataset using rule mining techniqueClassification on multi label dataset using rule mining technique
Classification on multi label dataset using rule mining technique
 
Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...Anomaly detection via eliminating data redundancy and rectifying data error i...
Anomaly detection via eliminating data redundancy and rectifying data error i...
 
An integrated mechanism for feature selection
An integrated mechanism for feature selectionAn integrated mechanism for feature selection
An integrated mechanism for feature selection
 
Performance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning AlgorithmsPerformance Comparision of Machine Learning Algorithms
Performance Comparision of Machine Learning Algorithms
 
My experiment
My experimentMy experiment
My experiment
 

Similar to QUERY AWARE DETERMINIZATION OF UNCERTAIN OBJECTS

Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...Geetika Gautam
 
Query aware determinization of uncertain objects
Query aware determinization of uncertain objectsQuery aware determinization of uncertain objects
Query aware determinization of uncertain objectsSoftroniics india
 
Paper-Allstate-Claim-Severity
Paper-Allstate-Claim-SeverityPaper-Allstate-Claim-Severity
Paper-Allstate-Claim-SeverityGon-soo Moon
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruptionjagan477830
 
Summary_Classification_Algorithms_Student_Data
Summary_Classification_Algorithms_Student_DataSummary_Classification_Algorithms_Student_Data
Summary_Classification_Algorithms_Student_DataMadeleine Organ
 
IRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data MiningIRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data MiningIRJET Journal
 
Maximum likelihood estimation from uncertain
Maximum likelihood estimation from uncertainMaximum likelihood estimation from uncertain
Maximum likelihood estimation from uncertainIEEEFINALYEARPROJECTS
 
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...IAEME Publication
 
Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...IAEME Publication
 
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...cscpconf
 
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Mumbai Academisc
 
Predictive job scheduling in a connection limited system using parallel genet...
Predictive job scheduling in a connection limited system using parallel genet...Predictive job scheduling in a connection limited system using parallel genet...
Predictive job scheduling in a connection limited system using parallel genet...Mumbai Academisc
 
Neural networks, naïve bayes and decision tree machine learning
Neural networks, naïve bayes and decision tree machine learningNeural networks, naïve bayes and decision tree machine learning
Neural networks, naïve bayes and decision tree machine learningFrancisco E. Figueroa-Nigaglioni
 
SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional VerificationSai Kiran Kadam
 
Supervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting TechniqueSupervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting Techniqueiosrjce
 
Higgs Boson Machine Learning Challenge - Kaggle
Higgs Boson Machine Learning Challenge - KaggleHiggs Boson Machine Learning Challenge - Kaggle
Higgs Boson Machine Learning Challenge - KaggleSajith Edirisinghe
 
Higgs bosob machine learning challange
Higgs bosob machine learning challangeHiggs bosob machine learning challange
Higgs bosob machine learning challangeTharindu Ranasinghe
 
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERIJCSEA Journal
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdfBeyaNasr1
 

Similar to QUERY AWARE DETERMINIZATION OF UNCERTAIN OBJECTS (20)

Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
 
Query aware determinization of uncertain objects
Query aware determinization of uncertain objectsQuery aware determinization of uncertain objects
Query aware determinization of uncertain objects
 
Paper-Allstate-Claim-Severity
Paper-Allstate-Claim-SeverityPaper-Allstate-Claim-Severity
Paper-Allstate-Claim-Severity
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
Summary_Classification_Algorithms_Student_Data
Summary_Classification_Algorithms_Student_DataSummary_Classification_Algorithms_Student_Data
Summary_Classification_Algorithms_Student_Data
 
IRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data MiningIRJET- Study and Evaluation of Classification Algorithms in Data Mining
IRJET- Study and Evaluation of Classification Algorithms in Data Mining
 
Maximum likelihood estimation from uncertain
Maximum likelihood estimation from uncertainMaximum likelihood estimation from uncertain
Maximum likelihood estimation from uncertain
 
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
COMPARISON BETWEEN THE GENETIC ALGORITHMS OPTIMIZATION AND PARTICLE SWARM OPT...
 
Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...Comparison between the genetic algorithms optimization and particle swarm opt...
Comparison between the genetic algorithms optimization and particle swarm opt...
 
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
PROGRAM TEST DATA GENERATION FOR BRANCH COVERAGE WITH GENETIC ALGORITHM: COMP...
 
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
Bra a bidirectional routing abstraction for asymmetric mobile ad hoc networks...
 
Predictive job scheduling in a connection limited system using parallel genet...
Predictive job scheduling in a connection limited system using parallel genet...Predictive job scheduling in a connection limited system using parallel genet...
Predictive job scheduling in a connection limited system using parallel genet...
 
Neural networks, naïve bayes and decision tree machine learning
Neural networks, naïve bayes and decision tree machine learningNeural networks, naïve bayes and decision tree machine learning
Neural networks, naïve bayes and decision tree machine learning
 
SVM - Functional Verification
SVM - Functional VerificationSVM - Functional Verification
SVM - Functional Verification
 
J017256674
J017256674J017256674
J017256674
 
Supervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting TechniqueSupervised WSD Using Master- Slave Voting Technique
Supervised WSD Using Master- Slave Voting Technique
 
Higgs Boson Machine Learning Challenge - Kaggle
Higgs Boson Machine Learning Challenge - KaggleHiggs Boson Machine Learning Challenge - Kaggle
Higgs Boson Machine Learning Challenge - Kaggle
 
Higgs bosob machine learning challange
Higgs bosob machine learning challangeHiggs bosob machine learning challange
Higgs bosob machine learning challange
 
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINERANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
ANALYSIS AND COMPARISON STUDY OF DATA MINING ALGORITHMS USING RAPIDMINER
 
Machine Learning.pdf
Machine Learning.pdfMachine Learning.pdf
Machine Learning.pdf
 

More from Nexgen Technology

MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...
     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...
  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...Nexgen Technology
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...
 MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA... MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...Nexgen Technology
 
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennaiIeee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennaiNexgen Technology
 
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics Nexgen Technology
 
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...Nexgen Technology
 
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...Nexgen Technology
 
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...Nexgen Technology
 
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...Nexgen Technology
 
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...Nexgen Technology
 
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...Nexgen Technology
 
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...Nexgen Technology
 

More from Nexgen Technology (20)

MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...
     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...     MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CH...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CH...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...
  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...  MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENN...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENN...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...    MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHE...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHE...
 
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...
 MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA... MECHANICAL PROJECTS IN PONDICHERRY,   2020-21  MECHANICAL PROJECTS IN CHENNA...
MECHANICAL PROJECTS IN PONDICHERRY, 2020-21 MECHANICAL PROJECTS IN CHENNA...
 
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennaiIeee 2020 21 vlsi projects in pondicherry,ieee  vlsi projects  in chennai
Ieee 2020 21 vlsi projects in pondicherry,ieee vlsi projects in chennai
 
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
Ieee 2020 21 power electronics in pondicherry,Ieee 2020 21 power electronics
 
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
Ieee 2020 -21 ns2 in pondicherry, Ieee 2020 -21 ns2 projects,best project cen...
 
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
Ieee 2020 21 ns2 in pondicherry,best project center in pondicherry,final year...
 
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
Ieee 2020 21 java dotnet in pondicherry,final year projects in pondicherry,pr...
 
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
Ieee 2020 21 iot in pondicherry,final year projects in pondicherry,project ce...
 
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
Ieee 2020 21 blockchain in pondicherry,final year projects in pondicherry,bes...
 
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
Ieee 2020 -21 bigdata in pondicherry,project center in pondicherry,best proje...
 
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...Ieee 2020 21  embedded in pondicherry,final year projects in pondicherry,best...
Ieee 2020 21 embedded in pondicherry,final year projects in pondicherry,best...
 

Recently uploaded

Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfakmcokerachita
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonJericReyAuditor
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxAnaBeatriceAblay2
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfadityarao40181
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxUnboundStockton
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 

Recently uploaded (20)

Class 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdfClass 11 Legal Studies Ch-1 Concept of State .pdf
Class 11 Legal Studies Ch-1 Concept of State .pdf
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Science lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lessonScience lesson Moon for 4th quarter lesson
Science lesson Moon for 4th quarter lesson
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptxENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
ENGLISH5 QUARTER4 MODULE1 WEEK1-3 How Visual and Multimedia Elements.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Biting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdfBiting mechanism of poisonous snakes.pdf
Biting mechanism of poisonous snakes.pdf
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Blooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docxBlooming Together_ Growing a Community Garden Worksheet.docx
Blooming Together_ Growing a Community Garden Worksheet.docx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 

QUERY AWARE DETERMINIZATION OF UNCERTAIN OBJECTS

  • 1. QUERY AWARE DETERMINIZATION OF UNCERTAIN OBJECTS Abstract—This paper considers the problem of determinizing probabilistic data to enable such data to be stored in legacy systems that accept only deterministic input. Probabilistic data may be generated by automated data analysis/enrichment techniques such as entity resolution, information extraction, and speech processing. The legacy system may correspond to pre-existing web applications such as Flickr, Picasa, etc. The goal is to generate a deterministic representation of probabilistic data that optimizes the quality of the end-application built on deterministic data. We explore such a determinization problem in the context of two different data processing tasks—triggers and selection queries. We show that approaches such as thresholding or top-1 selection traditionally used for determinization lead to suboptimal performance for such applications. Instead, we develop a query-aware strategy and show its advantages over existing solutions through a comprehensive empirical evaluation over real and synthetic datasets. EXISTING SYSTEM: Determinizing Probabilistic Data. While we are not aware of any prior work that directly addresses the issue of determinizing probabilistic data as studied in this paper, the works that are very related to ours are . They explore how to determinize answers to a query over a probabilistic database. In contrast, we are interested in
  • 2. best deterministic representation of data (and not that of a answer to a query) so as to continue to use existing end-applications that take only deterministic input. The differences in the two problem settings lead to different challenges. Authors in address a problem that chooses the set of uncertain objects to be cleaned, in order to achieve the best improvement in the quality of query answers. However, their goal is to improve quality of single query, while ours is to optimize quality of overall query workload. Also, they focus on how to select the best set of objects and each selected object is cleaned by human clarification, whereas we determinize all objects automatically. These differences essentially lead to different optimization challenges. Another related area is MAP inference in graphical model, which aims to find the assignment to each variable that jointly maximizes the probability defined by the model. The determinization problem for the cost- based metric can be potentially viewed as an instance of MAP inference problem. If we view the problem that way, the challenge becomes that of developing fast and high-quality approximate algorithm for solving the corresponding NP-hard problem. Section 3.3 exactly provides such algorithms, heavily optimized and tuned to specifically our problem setting. Probabilistic Data Models. A variety of advanced probabilistic data models have been proposed in the past. Our focus however was determinizing probabilistic objects, such as image tags and speech output, for which the probabilistic attribute model suffices. We note that determining probabilistic data stored in more advanced probabilistic models such
  • 3. as And/Xor tree might also be interesting. Extending our work to deal with data of such complexity remains an interesting future direction of work. PROPOSED SYSTEM: Overall, the main contributions of this paper are: • We introduce the problem of determinizing probabilistic data. Given a workload of triggers/queries, the main challenge is to find the deterministic representation of the data which would optimize certain quality metrics of the answer to these triggers/queries. • We propose a framework that solves the problem of determinization by minimizing the expected cost of the answer to queries. We develop a branchand- bound algorithm that finds an approximate near-optimal solution to the resulting NP-hard problem. • We address the problem of determinizing a collection of objects to optimize set- based quality metrics, such as F-measure. We develop an efficient algorithm that reaches near-optimal quality. • We extend the solutions to handle a data model where mutual exclusion exists among tags. We show that correlations among tags can be leveraged in our solutions to get better results. We also demonstrate that our solutions are designed to handle various types of queries.
  • 4. • We empirically demonstrate that the proposed algorithms are very efficient and reach high-quality results that are very close to those of the optimal solution. We also demonstrate that they are robust to small changes in the original query workload. Module 1 Data Quality Data quality refers to the level of quality of Data. There are many definitions of data quality but data is generally considered high quality if, "they are fit for their intended uses in operations, decision making and planning." (J. M. Juran). Alternatively, data is deemed of high quality if it correctly represents the real- world construct to which it refers. Furthermore, apart from these definitions, as data volume increases, the question of internal consistency within data becomes significant, regardless of fitness for use for any particular external purpose. The people's views on data quality can often be in disagreement, even when discussing the same set of data used for the same purpose. A considerable amount of data quality research involves investigating and describing various categories of desirable attributes (or dimensions) of data. These lists commonly include accuracy, correctness, currency, completeness andrelevance. Nearly 200 such terms have been identified and there is little agreement in their nature (are these concepts, goals or criteria?), their definitions or measures (Wang et al., 1993). Software engineers may recognize this as a similar problem to "ilities".
  • 5. Data quality control is the process of controlling the usage of data with known quality measurements for an application or a process. This process is usually done after a Data Quality Assurance (QA) process, which consists of discovery of data inconsistency and correction. Data QA processesprovides following information to Data Quality Control (QC):  Severity of inconsistency  Incompleteness  Accuracy  Precision  Missing / Unknown The Data QC process uses the information from the QA process to decide to use the data for analysis or in an application or business process. For example, if a Data QC process finds that the data contains too many errors or inconsistencies, then it prevents that data from being used for its intended process which could cause disruption. For example, providing invalid measurements from several sensors to the automatic pilot feature on an aircraft could cause it to crash. Thus, establishing data QC process provides the protection of usage of data control and establishes safe information usage. Module 2 Determinization Problem
  • 6. Having defined the notation, we now can define the determinization problem: Definition 1 (Determinization). Given a set of uncertain objects O = {O1,O2, . . . ,O|O|}, a query workload Q and a quality metric F, the goal of the deteriminization problem is for each object O ∈ O to select from WO a set of tags AO ⊆ WO as the deterministic representation of O, such that F(O,Q) is optimized. The ground truth tags G are not associated with uncertain objects and thus not known to any approach to the determinization problem. Therefore, such algorithms cannot directly measure the quality F(O,Q) during their execution. Thus, in the following section, we will present our solution that is based on maximizing the expected quality measure E(F(O,Q)). Before we describe our solution to the determinization problem, we note that a baseline algorithm for determinization is a thresholding (threshold-cut) approach. The thresholding approach employs a pre- defined threshold τ . For each object O it composes its answer set A by choosing wi tags from W such that P(wi) ≥ τ . The advantage of this approach is that it is computationally efficient and potentially can be tuned to a particular dataset O and workload Q by changing τ . The main drawback of this approach is that it is unaware of the query workload (“query-unaware") and thus does not necessarily optimize the given quality metrics, which leads to lower quality.
  • 7. Module 3 Branch and Bound Algorithm Instead of performing a brute-force enumeration, we can employ a faster branch and bound (BB) technique. The approach discovers answer sets in a greedy fashion so that answer sets with lower costtend to be discovered first. Outline of the BB algorithm The algorithm uses a branch and bound strategy to explore a search tree. Each node v in the tree corresponds to a partial tag selection where decisions of whether to include certain tags in the answer set or not have been made. We capture the partial selection of tags using the concept of sequence. The performance of the algorithm depends upon the choice of the node (amongst all the non-leaf nodes) to branch, the computation of upper and lower bounds of a node, and the choice of the tag used to expand the node.We next describe these components of the algorithm in detail. Module 4 Node Selection
  • 8. The algorithm maintains a priority queue H for picking a node v that contains the most promising sequence Sv to continue with. Among the nodes in the priority queue, which one should we choose to branch next? Let us consider sequence Sv that corresponds to a node v ∈ H. For v we define Av as the set of answer sequences that correspond to the leaf nodes derivable from node v via the branching procedure described above. That is, Av corresponds to the leaf nodes of the subtree rooted at node v. For example, or node v2 of the tree in Fig. 3, Av2 = {Sv8 , Sv9 , Sv10 , Sv11 }. Then for node v let mv = min A∈ Av E(cost(A,G,Q)) be the value of the minimum expected cost among these sequences. Notice that if we knew the exact value of mv, it would have been an ideal key for the priority queue H, since it would lead to the quickest way to find the best sequences. The problem is that the exact value of mv is unknown when v is branched, since the subtree rooted at v is not yet constructed at that moment. Even though mv is unknown it is possible to quickly determine good lower and upper bounds on its value _v ≤ mv ≤ hv, without comparing scores of each sequence in Av. For any node v, if Sv is an answer sequence then the bounds are equal to the cost of the sequence itself, otherwise the bounds are computed as explained next. Module 5 Query-Level Optimization
  • 9. The performance of the BB algorithm can be significantly improved further by employing query-level optimizations. For a given sequence Sv of node v, we might be able to exactly determine the expected cost of certain queries (e.g., Cases 1 and 2 above), even if Sv is not an answer sequence. In other words, for these queries we can get their cost without expanding Sv into the corresponding answer sequences A ∈ Av. We refer to such cost as fixed cost of v, and to such queries as fixed or decided queries with respect to O. Thus, each node v is also associated with the set of undecided queries Qv, its fixed cost ˆcv, as well as its lower and upper bounds ˆlv and ˆhv on non-fixed cost. Note that lv = ˆcv +ˆlv and hv = ˆcv + ˆhv. Fixed cost and fixed queries give us an opportunity to make the BB algorithm even more efficient by reducing the number of queries involved in computing the expected cost. When the algorithm performs a branching at a node, it creates two new nodes vyes and vno, by making yes/no decision for one undecided tag w. Since one additional tag is now decided for each new node v, it is possible that expected cost for some query Q ∈ Qv can be computed (Case 1 and 2). If so, Q becomes a fixed query for the new node and will not be considered in future computations derived from that node. Based on the relation between Sv and Q, there are three different cases for updating ˆcv, ˆlv and ˆhv CONCLUSION:
  • 10. In this paper we have considered the problem of determinizing uncertain objects to enable such data to be stored in pre-existing systems, such as Flickr, that take only deterministic input. The goal is to generate deterministic representations that optimize the quality of answers to queries/triggers that execute over the deterministic data representation. We have proposed efficient determinization algorithms that are orders of magnitude faster than the enumeration based optimal solution but achieves almost the same quality as the optimal solution. As future work, we plan to explore determinization techniques in the context of applications, wherein users are also interested in retrieving objects in a ranked order. REFERENCES [1] D. V. Kalashnikov, S. Mehrotra, J. Xu, and N. Venkatasubramanian, “A semantics-based approach for speech annotation of images,” IEEE Trans. Knowl. Data Eng., vol. 23, no. 9, pp. 1373–1387, Sept. 2011. [2] J. Li and J. Wang, “Automatic linguistic indexing of pictures by a statistical modeling approach,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 9, pp. 1075–1088, Sept. 2003. [3] C. Wangand, F. Jing, L. Zhang, and H. Zhang, “Image annotation refinement using random walk with restarts,” in Proc. 14th Annu. ACM Int. Conf. Multimedia, New York, NY, USA, 2006.
  • 11. [4] B. Minescu, G. Damnati, F. Bechet, and R. de Mori, “Conditional use of word lattices, confusion networks and 1-best string hypotheses in a sequential interpretation strategy,” in Proc. ICASSP, 2007. [5] R. Nuray-Turan, D. V. Kalashnikov, S. Mehrotra, and Y. Yu, “Attribute and object selection queries on objects with probabilistic attributes,” ACM Trans. DatabaseSyst., vol. 37, no. 1, Article 3, Feb. 2012. [6] J. Li and A. Deshpande, “Consensus answers for queries over probabilistic databases,” in Proc. 28th ACM SIGMOD-SIGACTSIGART Symp. PODS, New York, NY, USA, 2009. [7] M. B. Ebarhimi and A. A. Ghorbani, “A novel approach for frequent phrase mining in web search engine query streams,” in Proc. 5th Annu. Conf. CNSR, Frederlcton, NB, Canada, 2007. [8] S. Bhatia, D. Majumdar, and P. Mitra, “Query suggestions in the absence of query logs,” in Proc. 34th Int. ACM SIGIR, Beijing, China, 2011. [9] C. Manning and H. Schutze, Foundations of Statistical Natural Language Processing, Cambridge, MA, USA: MIT Press, 1999. [10] D. V. Kalashnikov and S. Mehrotra, “Domain-independent data cleaning via analysis of entity-relationship graph,” ACM Trans. Database Syst., vol. 31, no. 2, pp. 716–767, Jun. 2006. [11] K. Schnaitter, S. Abiteboul, T. Milo, and N. Polyzotis, “On-line index selection for shifting workloads,” in Proc. IEEE 23rd Int. Conf. Data Eng. Workshop, Istanbul, Turkey, 2007.
  • 12. [12] P. Unterbrunner, G. Giannikis, G. Alonso, D. Fauser, and D. Kossmann, “Predictable performance for unpredictable workloads,” in Proc. VLDB, Lyon, France, 2009. [13] R. Cheng, J. Chen, and X. Xie, “Cleaning uncertain data with quality guarantees,” in Proc. VLDB, Auckland, New Zealand, 2008.