SlideShare a Scribd company logo
1 of 5
Download to read offline
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
__________________________________________________________________________________________
Volume: 02 Issue: 10 | Oct-2013, Available @ http://www.ijret.org 38
EVALUATING THE EFFICIENCY OF RULE TECHNIQUES FOR FILE
CLASSIFICATION
S. Vijayarani1
, M. Muthulakshmi2
1
Assistant Professor, 2
M. Phil Research Scholar, Department of Computer Science, School of Computer Science and
Engineering, Bharathiar University, Coimbatore, Tamilnadu, India.
vijimohan_2000@yahoo.com. abarajitha.uma@gmail.com
Abstract
Text mining refers to the process of deriving high quality information from text. It is also known as knowledge discovery from text
(KDT), deals with the machine supported analysis of text. It is used in various areas such as information retrieval, marketing,
information extraction, natural language processing, document similarity, and so on. Document Similarity is one of the important
techniques in text mining. In document similarity, the first and foremost step is to classify the files based on their category. In this
research work, various classification rule techniques are used to classify the computer files based on their extensions. For example,
the extension of computer files may be pdf, doc, ppt, xls, and so on. There are several algorithms for rule classifier such as decision
table, JRip, Ridor, DTNB, NNge, PART, OneR and ZeroR. In this research work, three classification algorithms namely decision
table, DTNB and OneR classifiers are used for performing classification of computer files based on their extension. The results
produced by these algorithms are analyzed by using the performance factors classification accuracy and error rate. From the
experimental results, DTNB proves to be more efficient than other two techniques.
Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR
----------------------------------------------------------------------***------------------------------------------------------------------------
1. INTRODUCTION
Data mining is the practice of searching through large amounts
of computerized data to find useful patterns or trends. Data
mining is the process of knowledge discovery where
knowledge is gained by analyzing the data store in very large
repositories which are analyzed from various perspectives and
the result is summarized it into useful information. Data
mining is also known as Knowledge Discovery in Data
(KDD). There are various research areas in data mining such
as web mining, text mining, image mining, statistics, machine
learning, data organization and databases, pattern detection,
artificial intelligence and other areas.
Text mining or knowledge discovery from text (KDT) deals
with the machine supported analysis of text. It uses methods
from information retrieval, natural language processing (NLP)
information extraction and also connects them with the
algorithms and methods of Knowledge discovery of data, data
mining, machine learning and statistics. Current research in
the area of text mining tackles problems of text representation,
classification, clustering, or the search and modeling of hidden
patterns. [5]
Text mining usually involves the process of structuring the
input text (usually parsing, along with the accumulation of
some derived linguistic features and the removal of others, and
consequent insertion into a database), deriving models within
the structured data, and to finish evaluation and interpretation
of the output. High quality in text mining typically refers to
some combination of relevance of relevance, innovation, and
interestingness. Various stages of a text-mining process can be
combined together into a single workgroup. [10]
Some of the important applications of text-mining include
Data Mining Competitive Intelligence, Records Management,
Enterprise Business Intelligence, National Security, E-
Discovery, Intelligence Scientific discovery especially Life
Sciences, Search or Information Access and Social media
monitoring. Some of the technologies that have been
developed and can be used in the text mining process are
information extraction, topic tracking, summarization,
classification, clustering, concept linkage, information
conception, and question answering [4]. In this new and
current era of technology, developments and techniques,
efficient and effective, document classification is becoming a
challenging and highly required area to capably categorize text
documents into mutually exclusive categories. Text
categorization is an upcoming and vital field in today’s
world which is most importantly required and demanded
to efficiently categorize various text documents into
different categories.
The rest of this paper is organized as follows. Section 2
illustrates the review of literature. Section 3 discusses the
classification rule classifier and the various algorithms used
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 02 Issue: 10 | Oct-2013, Available @ http://www.ijret.org 39
for classification. Experimental results are analyzed in Section
4 and Conclusions are given in Section 5.
2. LITERATURE REVIEW
C. Lakshmi Devasena et al [9] discussed the effectiveness of
Rule-Based classifiers for classification by taking a sample
data set from UCI machine learning repository using the open
source machine learning tool. An evaluation of different rule
based classifiers used in data mining and a practical
guideline for selecting the most suited algorithm for a
classification is presented and some empirical criteria for
describing and evaluating the classifiers are given. The
performances of the classifiers were measured and results are
compared using the Iris Data set. Among nine classifiers
(Conjunctive Rule Classifier, Decision Table Classifier,
DTNB Classifier, OneR Classifier, JRIP Classifier, NNGE
Classifier, PART Classifier, RIDOR Classifier and ZeroR
Classifier) NNGE Classifier performs well in the
classification problem. OneR classifier, RIDOR Classifier
and JRIP classifier are coming in the next category to classify
the data.
Biao Qin, Yuni Xia et al [4] proposed a new rule-based
classification and prediction algorithm called uRule for
classifying uncertain data. Uncertain data often occur in
modern applications, including sensor databases, spatial-
temporal databases, and medical or biology information
systems. This algorithm introduces new measures for
generating, pruning and optimizing rules. This new measures
are computed considering uncertain data interval and
probability distribution function. Based on the new values, the
optimal splitting attribute and splitting value can be identified
and used for classification and prediction. The uRule
algorithm can process uncertainty in both numerical and
categorical data. The experimental results show that uRule has
excellent performance even when data is highly uncertain.
Mohd Fauzi bin Othman et al [13] investigated the
performance of different classification or clustering methods
for a set of large data. The algorithm tested are Bayes
Network, Radial Basis Function, Rule based classifier, Pruned
Tree, Single Conjunctive Rule Learner and Nearest Neighbors
Algorithm. The dataset breast cancer with a total data of 6291
and a dimension of 699 rows and 9 columns will be used to
test and justify the differences between the classification
methods or algorithms. Consequently, the classification
technique that has the potential to significantly improve the
common or conventional methods will be suggested for use in
large scale data, bioinformatics or other common applications.
Among the machine learning algorithm tested, Bayes network
classifier has the potential to significantly improve the
conventional classification methods for use in medical or
bioinformatics field.
Dr. S. Vijayarani et al [19] discussed the classification rule
techniques in data mining are compared for predicting heart
disease. The classification rule algorithms are namely
Decision table, JRip, OneR and Part. By analyzed the
experimental results of accuracy measure, it is observed
that the decision table classification rule technique turned
out to be best classifier for heart disease prediction because it
contains more accuracy. By analyzed all error rates, the
Decision table and OneR classification rule algorithm
contains least error rate in possible two outcomes.
3. METHODOLOGY
Document similarity is one of the main tasks in the area of text
mining. The essential step of document similarity is to classify
the documents based on some criteria. [7] In this section,
various classification algorithms are used to classify the files
based on their extension which are stored in the computer hard
disk. (For example: pdf, doc, txt and so on). The methodology
of the research work is as follows.
1. Dataset – Computer Files can be collected from the
system hard disk.
2. Classification Rule Algorithms
 Decision Table
 DTNB
 OneR
3. Performance factors
 Classification accuracy
 Error rate
4. Best Technique among classification rule algorithms
 DTNB
3.1 Dataset
To compare these data mining classification techniques,
computer files can be collected from the system hard disk and
a synthetic data set is created. This dataset has 31994 instances
and four attributes namely file name, file size, file extension
and file path.
3.2 Classification Rule Techniques
Classification of documents involves assigning class labels
to documents indicating their category. Classification
algorithms are widely used in various applications. Data
classification is a two steps process in which first step is the
training phase where the classifier algorithm builds classifier
with the training set of tuples and the second phase is
classification phase where the model is used for classification
and its performance is analyzed with the testing set of tuples.
[12] There are various classification rule algorithms such as
Decision table, JRip, Ridor, DTNB, PART, OneR, and so on.
In this research, we have analyzed classification rule
algorithms namely Decision table, DTNB and OneR.
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 02 Issue: 10 | Oct-2013, Available @ http://www.ijret.org 40
3.2.1 Decision Table
The algorithm, decision table, is found in the Weka classifiers
under Rules. The simplest way of representing the output from
machine learning is to put it in the same form as the input. The
use of the classifier rules decision table is described as
building and using a simple decision table majority classifier.
The output will show a decision on a number of attributes for
each instance. The number and specific types of attributes can
vary to suit the needs of the task.
Two variants of decision table classifiers are available. The
first classifier, called DTMaj (Decision Table Majority)
returns the majority of the training set if the decision table cell
matching the new instance is empty, that is, it does not contain
any training instances. The second classifier, called DTLoc
(Decision Table Local), is a new variant that searches for a
decision table entry with fewer matching attributes (larger
cells) if the matching cell is empty. This variant therefore
returns an answer from the native region. [14]
3.2.2 DTNB
This is for building and using a decision table/naive bayes
hybrid classifier. Every point in the search, the algorithm
estimates the value of dividing the attributes into two disjoint
subsets: one for the decision table, and the other for naive
Bayes. A forward selection search is used at each step, then
the selected attributes are exhibited by naive Bayes and the
remainder by the decision table and all attributes are modeled
by the decision table initially. At each step, the algorithm
dropping an attribute entirely from the model. [16]
The algorithm for learning the combined model (DTNB)
proceeds in much the same way as the one for stand-alone
DTs. At each point in the exploration it estimates the merit
associated with splitting the attributes into two disjoint
subsets: one for the DT, the other for NB. The class
probability estimates of the DT and NB must be combined to
generate overall class probability estimates. Assuming X> is
the set of attributes in the DT and X⊥ the one in NB, the
overall class probability is computed as
Q (y | X) = α × QDT(y | X>) × QNB(y | X⊥)/Q(y),
Where QDT(y | X>) and QNB(y | X ⊥) are the class
probability estimates obtained from the DT and NB
respectively, α is a normalization constant, and Q(y) is the
prior probability of the class. All probabilities are predictable
using Laplace corrected observed counts.
3.2.3 OneR
The OneR algorithm creates a single rule for each attribute of
training data and then picks up the rule with the least error
rate. To generate a rule for an attribute, the most recurrent
class for each attribute value must be established. The most
recurrent class is the class that appears most frequently for that
attribute value. A rule is a set of attribute values destined to
their most recurrent class with which the attribute based on.
Pseudo-code for OneR algorithm is
The number of training data instances which does not agree
with the binding of attribute value in the rule produces the
error rate. OneR selects the rule with the least error rate. If two
or more rules have same error rate, then the rule is selected at
random. [2]
4. EXPERIMENTAL RESULTS
4.1 Accuracy Measure
The following table shows the accuracy measure of
classification techniques. They are the correctly classified
instances, incorrectly classified instances, True Positive rate, F
Measure, Precision, Receiver Operating Characteristics (ROC)
Area and Kappa Statistics. TP Rate is the ratio of play cases
predicted correctly cases to the total of positive cases. F
Measure is a way of combining recall and precision scores
into a single measure of performance. [18] Precision is the
proportion of relevant documents in the results returned. ROC
Area is a traditional to plot the same information in a
normalized form with 1-false negative rate plotted against the
false positive rate.
Table-1: Accuracy Measure for Classification Rule
Algorithms
Algorithm
Classifier
Decision
Table
DTNB OneR
Correctly Classified
Instances
88.21 95.09 71.72
Incorrectly Classified
Instances
11.79 4.91 28.28
TP Rate 88.20 95.10 71.70
Precision 91.70 95.20 99.60
F Measure 89.10 94.80 79.70
ROC Area 94.30 98.60 85.80
Kappa Statistics 81.09 92.08 58.95
For each attribute A,
For each value VA of the attribute, make a rule as
follows:
Add up how often each class appears
Locate the most frequent class Cf
Generate a rule when A=VA; class attribute value = Cf
End For-Each
Compute the error rate of all rules
End For-Each
Select the rule with the smallest
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 02 Issue: 10 | Oct-2013, Available @ http://www.ijret.org 41
Chart-1: Accuracy Measure for Classification Rule
Algorithms
From the graph, it is observed that DTNB algorithm performs
better than other algorithms. Therefore the DTNB
classification algorithm performs well because it contains
highest accuracy when compared to Decision table and OneR.
4.2 Error Rate
They are the Mean Absolute Error (M.A.E), Root Mean
Square Error (R.M.S.E), Relative Absolute Error (R.A.E) and
Root Relative Squared Error (R.R.S.R). They are the Mean
Absolute Error (M.A.E), Root Mean Square Error (R.M.S.E),
Relative Absolute Error (R.A.E) and Root Relative Squared
Error (R.R.S.R). The mean absolute error (MAE) is defined
as the quantity used to measure how close predictions or
forecasts are to the eventual outcomes.[19] The root mean
square error (RMSE) is defined as frequently used
measure of the differences between values predicted by a
model or an estimator and the values actually observed.
Relative Absolute Error is a measure of the uncertainty of
measurement compared to the size of the measurement.
The root relative squared error defined as a relative to what it
would have been if a simple predictor had been used. More
specifically, this predictor is just the average of the actual
values.
Table-2: Error Rate of Classification Rule Algorithms
Algorithm MAE RMSE RAE RRAE
Decision Table 3.45 11.25 74.46 73.92
DTNB 1.98 8.21 42.64 53.96
OneR 2.09 14.47 45.19 95.10
Chart-2: Error Rate for Classification Rule Algorithms
From the above graph, it is observed that Decision table and
OneR algorithms attains highest error rate. Therefore, the
DTNB classification algorithm performs well because it
contains least error rate when compared to other algorithms.
CONCLUSIONS
Data mining is the process of discovering interesting
knowledge from large amounts of data stored either in
databases, data warehouses or other information sources. Text
Mining is the automated or partially automated processing of
text. In this research work, the classification rule algorithms
namely Decision table, DTNB and OneR are used for
classifying computer files which are stored in the computer.
By analysing the experimental results it is observed that the
DTNB (Decision Tree Naïve Bayes) classification technique
has yields better result than other techniques. In future we tend
to improve efficiency of performance by applying other data
mining techniques and algorithms.
REFERENCES
[1]. Abdullah Wahbeh, Mohammed Al-Kabi, “Comparative
Assessment of the performance of three WEKA text classifiers
applied Arabic text.
[2]. Anshul Goyal, Rajni Mehta, “Performance Comparison of
Naïve Bayes and J48 Classification Algorithms”.
[3]. Aurangzeb Khan, Baharum Baharudin, Lam Hong Lee,
Khairullah khan, “A Review of Machine Learning Algorithms
for Text-Documents Classification”.
[4]. Biao Qin, Yuni Xia, Sunil Prabhakar, Yicheng Tu, “A
Rule-Based Classification Algorithm for Uncertain Data”.
[5]. BS Harish, DS Guru, S Manjunath, “Representation and
Classification of Text Documents: A Brief Review”.
[6]. S.Deepajothi, Dr.S.Selvarajan, “A Comparative Study of
Classification Techniques on Adult Data Set”
[7]. Ian H. Witten, Eibe Frank, “Data Mining Tools and
Techniques practical Machine Learning”.
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 02 Issue: 10 | Oct-2013, Available @ http://www.ijret.org 42
[8]. Kaushik Raviya, Biren Gajjar, “Performance Evaluation
of Different Data Mining Classification Algorithm Using
WEKA”.
[9]. C. Lakshmi Devasena, T. Sumathi, V.V. Gomathi and M.
Hemalatha, “Effectiveness Evaluation of Rule Based
Classifiers for the Classification of Iris Data Set.
[10]. Lan Huang, David Milne, Eibe Frank, Ian H. Witten,
“Learning a Concept-based Document Similarity Measure”.
[11]. Mahendra Tiwari, Manu Bhai Jha, Om Prakash Yadav,
“Performance analysis of Data Mining algorithms in Weka”.
[12]. Mirza Nazura Abdulkarim, “Classification and Retrieval
of Research Papers: A Semantic Hierarchical Approach”.
[13]. Mohd Fauzi bin Othman, Thomas Moh Shan Yau,
“Comparison of Different Classification Techniques using
WEKA for Breast Cancer”.
[14]. Petra Kralj Novak, “Classification in WEKA”.
[15]. Mrs. Sayantani Ghosh, Mr. Sudipta Roy, Prof. Samir K.
Bandyopadhyay, “A tutorial review on Text Mining
Algorithms”.
[16]. Sunila Godara, Ritu Yadav, “Performance analysis of
clustering algorithms for character recognition using Weka
tool”.
[17]. Trilok Chand Sharma, Manoj Jain, “WEKA Approach
for Comparative Study of Classification Algorithm”.
[18]. Dr. S.Vijayarani, S.Sudha, “Comparative Analysis of
Classification Function Techniques for Heart Disease
Prediction”.
[19]. Dr. S.Vijayarani, S. Sudha, “An Effective Classification
Rule Technique for Heart Disease Prediction”.
BIOGRAPHIES
Dr. S. Vijayarani has completed MCA,
M.Phil and PhD in Computer Science. She
is working as Assistant Professor in the
School of Computer Science and
Engineering, Bharathiar University,
Coimbatore. Her fields of research
interest are data mining, privacy,
security, bioinformatics and data streams. She has
published papers in the international journals and
presented research papers in international and national
conferences.
Mrs. M Muthulakshmi has completed
M.Sc in Computer Science and
Information Technology. She is currently
pursuing her M.Phil in Computer
Science in the School of Computer
Science and Engineering, Bharathiar
University, Coimbatore. Her fields of
interest are Data Mining, Text Mining and Semantic web
mining.

More Related Content

What's hot

Predicting students' performance using id3 and c4.5 classification algorithms
Predicting students' performance using id3 and c4.5 classification algorithmsPredicting students' performance using id3 and c4.5 classification algorithms
Predicting students' performance using id3 and c4.5 classification algorithmsIJDKP
 
Correlation of artificial neural network classification and nfrs attribute fi...
Correlation of artificial neural network classification and nfrs attribute fi...Correlation of artificial neural network classification and nfrs attribute fi...
Correlation of artificial neural network classification and nfrs attribute fi...eSAT Journals
 
Ontology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold PreferenceOntology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold PreferenceIJCERT
 
Using data mining methods knowledge discovery for text mining
Using data mining methods knowledge discovery for text miningUsing data mining methods knowledge discovery for text mining
Using data mining methods knowledge discovery for text miningeSAT Journals
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Reviewijdpsjournal
 
An efficient information retrieval ontology system based indexing for context
An efficient information retrieval ontology system based indexing for contextAn efficient information retrieval ontology system based indexing for context
An efficient information retrieval ontology system based indexing for contexteSAT Journals
 
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)Universitas Pembangunan Panca Budi
 
June 2020: Top Read Articles in Advanced Computational Intelligence
June 2020: Top Read Articles in Advanced Computational IntelligenceJune 2020: Top Read Articles in Advanced Computational Intelligence
June 2020: Top Read Articles in Advanced Computational Intelligenceaciijournal
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebEditor IJCATR
 
Using data mining methods knowledge discovery for text mining
Using data mining methods knowledge discovery for text miningUsing data mining methods knowledge discovery for text mining
Using data mining methods knowledge discovery for text miningeSAT Publishing House
 
Privacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data SetsPrivacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data SetsIJERA Editor
 
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : ...
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL  FOR HEALTHCARE INFORMATION SYSTEM :   ...ONTOLOGY-DRIVEN INFORMATION RETRIEVAL  FOR HEALTHCARE INFORMATION SYSTEM :   ...
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : ...IJNSA Journal
 
A Semantic Retrieval System for Extracting Relationships from Biological Corpus
A Semantic Retrieval System for Extracting Relationships from Biological CorpusA Semantic Retrieval System for Extracting Relationships from Biological Corpus
A Semantic Retrieval System for Extracting Relationships from Biological Corpusijcsit
 
An optimal unsupervised text data segmentation 3
An optimal unsupervised text data segmentation 3An optimal unsupervised text data segmentation 3
An optimal unsupervised text data segmentation 3prj_publication
 
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUESTUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUEIJDKP
 

What's hot (17)

Predicting students' performance using id3 and c4.5 classification algorithms
Predicting students' performance using id3 and c4.5 classification algorithmsPredicting students' performance using id3 and c4.5 classification algorithms
Predicting students' performance using id3 and c4.5 classification algorithms
 
Correlation of artificial neural network classification and nfrs attribute fi...
Correlation of artificial neural network classification and nfrs attribute fi...Correlation of artificial neural network classification and nfrs attribute fi...
Correlation of artificial neural network classification and nfrs attribute fi...
 
Ontology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold PreferenceOntology Based PMSE with Manifold Preference
Ontology Based PMSE with Manifold Preference
 
50120130406032
5012013040603250120130406032
50120130406032
 
Using data mining methods knowledge discovery for text mining
Using data mining methods knowledge discovery for text miningUsing data mining methods knowledge discovery for text mining
Using data mining methods knowledge discovery for text mining
 
Data Mining System and Applications: A Review
Data Mining System and Applications: A ReviewData Mining System and Applications: A Review
Data Mining System and Applications: A Review
 
An efficient information retrieval ontology system based indexing for context
An efficient information retrieval ontology system based indexing for contextAn efficient information retrieval ontology system based indexing for context
An efficient information retrieval ontology system based indexing for context
 
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
Data Mining Classification Comparison (Naïve Bayes and C4.5 Algorithms)
 
June 2020: Top Read Articles in Advanced Computational Intelligence
June 2020: Top Read Articles in Advanced Computational IntelligenceJune 2020: Top Read Articles in Advanced Computational Intelligence
June 2020: Top Read Articles in Advanced Computational Intelligence
 
Distributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic WebDistributed Digital Artifacts on the Semantic Web
Distributed Digital Artifacts on the Semantic Web
 
Using data mining methods knowledge discovery for text mining
Using data mining methods knowledge discovery for text miningUsing data mining methods knowledge discovery for text mining
Using data mining methods knowledge discovery for text mining
 
Privacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data SetsPrivacy Preservation and Restoration of Data Using Unrealized Data Sets
Privacy Preservation and Restoration of Data Using Unrealized Data Sets
 
50120140501018
5012014050101850120140501018
50120140501018
 
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : ...
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL  FOR HEALTHCARE INFORMATION SYSTEM :   ...ONTOLOGY-DRIVEN INFORMATION RETRIEVAL  FOR HEALTHCARE INFORMATION SYSTEM :   ...
ONTOLOGY-DRIVEN INFORMATION RETRIEVAL FOR HEALTHCARE INFORMATION SYSTEM : ...
 
A Semantic Retrieval System for Extracting Relationships from Biological Corpus
A Semantic Retrieval System for Extracting Relationships from Biological CorpusA Semantic Retrieval System for Extracting Relationships from Biological Corpus
A Semantic Retrieval System for Extracting Relationships from Biological Corpus
 
An optimal unsupervised text data segmentation 3
An optimal unsupervised text data segmentation 3An optimal unsupervised text data segmentation 3
An optimal unsupervised text data segmentation 3
 
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUESTUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE
STUDENTS’ PERFORMANCE PREDICTION SYSTEM USING MULTI AGENT DATA MINING TECHNIQUE
 

Viewers also liked

Seismic performance of r c buildings on sloping grounds with different types ...
Seismic performance of r c buildings on sloping grounds with different types ...Seismic performance of r c buildings on sloping grounds with different types ...
Seismic performance of r c buildings on sloping grounds with different types ...eSAT Journals
 
Chemical engineering and competences in ecology
Chemical engineering and competences in ecologyChemical engineering and competences in ecology
Chemical engineering and competences in ecologyeSAT Journals
 
Digital architecture manifesting an accurate virtual built environment
Digital architecture manifesting an accurate virtual built environmentDigital architecture manifesting an accurate virtual built environment
Digital architecture manifesting an accurate virtual built environmenteSAT Journals
 
Public encryption with two ack approach to mitigate wormhole attack in wsn
Public encryption with two ack approach to mitigate wormhole attack in wsnPublic encryption with two ack approach to mitigate wormhole attack in wsn
Public encryption with two ack approach to mitigate wormhole attack in wsneSAT Journals
 
Variation in linear density of combed yarn due to dyeing with reactive dye in...
Variation in linear density of combed yarn due to dyeing with reactive dye in...Variation in linear density of combed yarn due to dyeing with reactive dye in...
Variation in linear density of combed yarn due to dyeing with reactive dye in...eSAT Journals
 
Automatic signature verification with chain code using weighted distance and ...
Automatic signature verification with chain code using weighted distance and ...Automatic signature verification with chain code using weighted distance and ...
Automatic signature verification with chain code using weighted distance and ...eSAT Journals
 
Seismic evaluation of rc frame with brick masonry infill walls
Seismic evaluation of rc frame with brick masonry infill wallsSeismic evaluation of rc frame with brick masonry infill walls
Seismic evaluation of rc frame with brick masonry infill wallseSAT Journals
 
Result generation system for cbgs scheme in educational organization
Result generation system for cbgs scheme in educational organizationResult generation system for cbgs scheme in educational organization
Result generation system for cbgs scheme in educational organizationeSAT Journals
 
Control mouse and computer system using voice commands
Control mouse and computer system using voice commandsControl mouse and computer system using voice commands
Control mouse and computer system using voice commandseSAT Journals
 
Learning in content based image retrieval a review
Learning in content based image retrieval  a reviewLearning in content based image retrieval  a review
Learning in content based image retrieval a revieweSAT Journals
 
Meta documents and query extension to enhance information retrieval process
Meta documents and query extension to enhance information retrieval processMeta documents and query extension to enhance information retrieval process
Meta documents and query extension to enhance information retrieval processeSAT Journals
 
Thermal analysis of water cooled charge air cooler in turbo charged diesel en...
Thermal analysis of water cooled charge air cooler in turbo charged diesel en...Thermal analysis of water cooled charge air cooler in turbo charged diesel en...
Thermal analysis of water cooled charge air cooler in turbo charged diesel en...eSAT Journals
 
How to take dynamic data in life cycle inventory from brazilian sugarcane eth...
How to take dynamic data in life cycle inventory from brazilian sugarcane eth...How to take dynamic data in life cycle inventory from brazilian sugarcane eth...
How to take dynamic data in life cycle inventory from brazilian sugarcane eth...eSAT Journals
 
Real time tracking based on gsm and ir module for running train
Real time tracking based on gsm and ir module for running trainReal time tracking based on gsm and ir module for running train
Real time tracking based on gsm and ir module for running traineSAT Journals
 
Optimization techniques for harmonics minimization in cascaded hybrid multile...
Optimization techniques for harmonics minimization in cascaded hybrid multile...Optimization techniques for harmonics minimization in cascaded hybrid multile...
Optimization techniques for harmonics minimization in cascaded hybrid multile...eSAT Journals
 
Comparison of performance of lateral load resisting systems in multi storey f...
Comparison of performance of lateral load resisting systems in multi storey f...Comparison of performance of lateral load resisting systems in multi storey f...
Comparison of performance of lateral load resisting systems in multi storey f...eSAT Journals
 
Seismic response of multi storey irregular building with floating column
Seismic response of multi storey irregular building with floating columnSeismic response of multi storey irregular building with floating column
Seismic response of multi storey irregular building with floating columneSAT Journals
 

Viewers also liked (18)

Seismic performance of r c buildings on sloping grounds with different types ...
Seismic performance of r c buildings on sloping grounds with different types ...Seismic performance of r c buildings on sloping grounds with different types ...
Seismic performance of r c buildings on sloping grounds with different types ...
 
Survey mobile app
Survey mobile appSurvey mobile app
Survey mobile app
 
Chemical engineering and competences in ecology
Chemical engineering and competences in ecologyChemical engineering and competences in ecology
Chemical engineering and competences in ecology
 
Digital architecture manifesting an accurate virtual built environment
Digital architecture manifesting an accurate virtual built environmentDigital architecture manifesting an accurate virtual built environment
Digital architecture manifesting an accurate virtual built environment
 
Public encryption with two ack approach to mitigate wormhole attack in wsn
Public encryption with two ack approach to mitigate wormhole attack in wsnPublic encryption with two ack approach to mitigate wormhole attack in wsn
Public encryption with two ack approach to mitigate wormhole attack in wsn
 
Variation in linear density of combed yarn due to dyeing with reactive dye in...
Variation in linear density of combed yarn due to dyeing with reactive dye in...Variation in linear density of combed yarn due to dyeing with reactive dye in...
Variation in linear density of combed yarn due to dyeing with reactive dye in...
 
Automatic signature verification with chain code using weighted distance and ...
Automatic signature verification with chain code using weighted distance and ...Automatic signature verification with chain code using weighted distance and ...
Automatic signature verification with chain code using weighted distance and ...
 
Seismic evaluation of rc frame with brick masonry infill walls
Seismic evaluation of rc frame with brick masonry infill wallsSeismic evaluation of rc frame with brick masonry infill walls
Seismic evaluation of rc frame with brick masonry infill walls
 
Result generation system for cbgs scheme in educational organization
Result generation system for cbgs scheme in educational organizationResult generation system for cbgs scheme in educational organization
Result generation system for cbgs scheme in educational organization
 
Control mouse and computer system using voice commands
Control mouse and computer system using voice commandsControl mouse and computer system using voice commands
Control mouse and computer system using voice commands
 
Learning in content based image retrieval a review
Learning in content based image retrieval  a reviewLearning in content based image retrieval  a review
Learning in content based image retrieval a review
 
Meta documents and query extension to enhance information retrieval process
Meta documents and query extension to enhance information retrieval processMeta documents and query extension to enhance information retrieval process
Meta documents and query extension to enhance information retrieval process
 
Thermal analysis of water cooled charge air cooler in turbo charged diesel en...
Thermal analysis of water cooled charge air cooler in turbo charged diesel en...Thermal analysis of water cooled charge air cooler in turbo charged diesel en...
Thermal analysis of water cooled charge air cooler in turbo charged diesel en...
 
How to take dynamic data in life cycle inventory from brazilian sugarcane eth...
How to take dynamic data in life cycle inventory from brazilian sugarcane eth...How to take dynamic data in life cycle inventory from brazilian sugarcane eth...
How to take dynamic data in life cycle inventory from brazilian sugarcane eth...
 
Real time tracking based on gsm and ir module for running train
Real time tracking based on gsm and ir module for running trainReal time tracking based on gsm and ir module for running train
Real time tracking based on gsm and ir module for running train
 
Optimization techniques for harmonics minimization in cascaded hybrid multile...
Optimization techniques for harmonics minimization in cascaded hybrid multile...Optimization techniques for harmonics minimization in cascaded hybrid multile...
Optimization techniques for harmonics minimization in cascaded hybrid multile...
 
Comparison of performance of lateral load resisting systems in multi storey f...
Comparison of performance of lateral load resisting systems in multi storey f...Comparison of performance of lateral load resisting systems in multi storey f...
Comparison of performance of lateral load resisting systems in multi storey f...
 
Seismic response of multi storey irregular building with floating column
Seismic response of multi storey irregular building with floating columnSeismic response of multi storey irregular building with floating column
Seismic response of multi storey irregular building with floating column
 

Similar to Evaluating the efficiency of rule techniques for file classification

Data mining techniques
Data mining techniquesData mining techniques
Data mining techniqueseSAT Journals
 
Data mining techniques a survey paper
Data mining techniques a survey paperData mining techniques a survey paper
Data mining techniques a survey papereSAT Publishing House
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET Journal
 
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYCLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYEditor IJMTER
 
Variance rover system web analytics tool using data
Variance rover system web analytics tool using dataVariance rover system web analytics tool using data
Variance rover system web analytics tool using dataeSAT Publishing House
 
Variance rover system
Variance rover systemVariance rover system
Variance rover systemeSAT Journals
 
Introduction to feature subset selection method
Introduction to feature subset selection methodIntroduction to feature subset selection method
Introduction to feature subset selection methodIJSRD
 
The Survey of Data Mining Applications And Feature Scope
The Survey of Data Mining Applications  And Feature Scope The Survey of Data Mining Applications  And Feature Scope
The Survey of Data Mining Applications And Feature Scope IJCSEIT Journal
 
Applying Soft Computing Techniques in Information Retrieval
Applying Soft Computing Techniques in Information RetrievalApplying Soft Computing Techniques in Information Retrieval
Applying Soft Computing Techniques in Information RetrievalIJAEMSJORNAL
 
Document retrieval using clustering
Document retrieval using clusteringDocument retrieval using clustering
Document retrieval using clusteringeSAT Journals
 
A study and survey on various progressive duplicate detection mechanisms
A study and survey on various progressive duplicate detection mechanismsA study and survey on various progressive duplicate detection mechanisms
A study and survey on various progressive duplicate detection mechanismseSAT Journals
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET Journal
 
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILESAN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILEScscpconf
 
An efficient feature selection in
An efficient feature selection inAn efficient feature selection in
An efficient feature selection incsandit
 
A Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryA Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryIJERA Editor
 
Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...IAESIJAI
 
An Analysis of Outlier Detection through clustering method
An Analysis of Outlier Detection through clustering methodAn Analysis of Outlier Detection through clustering method
An Analysis of Outlier Detection through clustering methodIJAEMSJORNAL
 
A Survey of Modern Data Classification Techniques
A Survey of Modern Data Classification TechniquesA Survey of Modern Data Classification Techniques
A Survey of Modern Data Classification Techniquesijsrd.com
 

Similar to Evaluating the efficiency of rule techniques for file classification (20)

Data mining techniques
Data mining techniquesData mining techniques
Data mining techniques
 
Data mining techniques a survey paper
Data mining techniques a survey paperData mining techniques a survey paper
Data mining techniques a survey paper
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEYCLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
CLASSIFICATION ALGORITHM USING RANDOM CONCEPT ON A VERY LARGE DATA SET: A SURVEY
 
Variance rover system web analytics tool using data
Variance rover system web analytics tool using dataVariance rover system web analytics tool using data
Variance rover system web analytics tool using data
 
Variance rover system
Variance rover systemVariance rover system
Variance rover system
 
G045033841
G045033841G045033841
G045033841
 
Introduction to feature subset selection method
Introduction to feature subset selection methodIntroduction to feature subset selection method
Introduction to feature subset selection method
 
The Survey of Data Mining Applications And Feature Scope
The Survey of Data Mining Applications  And Feature Scope The Survey of Data Mining Applications  And Feature Scope
The Survey of Data Mining Applications And Feature Scope
 
Applying Soft Computing Techniques in Information Retrieval
Applying Soft Computing Techniques in Information RetrievalApplying Soft Computing Techniques in Information Retrieval
Applying Soft Computing Techniques in Information Retrieval
 
Document retrieval using clustering
Document retrieval using clusteringDocument retrieval using clustering
Document retrieval using clustering
 
A study and survey on various progressive duplicate detection mechanisms
A study and survey on various progressive duplicate detection mechanismsA study and survey on various progressive duplicate detection mechanisms
A study and survey on various progressive duplicate detection mechanisms
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILESAN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
AN EFFICIENT FEATURE SELECTION IN CLASSIFICATION OF AUDIO FILES
 
An efficient feature selection in
An efficient feature selection inAn efficient feature selection in
An efficient feature selection in
 
A Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryA Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
A Survey Ondecision Tree Learning Algorithms for Knowledge Discovery
 
Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...
 
Ez36937941
Ez36937941Ez36937941
Ez36937941
 
An Analysis of Outlier Detection through clustering method
An Analysis of Outlier Detection through clustering methodAn Analysis of Outlier Detection through clustering method
An Analysis of Outlier Detection through clustering method
 
A Survey of Modern Data Classification Techniques
A Survey of Modern Data Classification TechniquesA Survey of Modern Data Classification Techniques
A Survey of Modern Data Classification Techniques
 

More from eSAT Journals

Mechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavementsMechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavementseSAT Journals
 
Material management in construction – a case study
Material management in construction – a case studyMaterial management in construction – a case study
Material management in construction – a case studyeSAT Journals
 
Managing drought short term strategies in semi arid regions a case study
Managing drought    short term strategies in semi arid regions  a case studyManaging drought    short term strategies in semi arid regions  a case study
Managing drought short term strategies in semi arid regions a case studyeSAT Journals
 
Life cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangaloreLife cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangaloreeSAT Journals
 
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materialsLaboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materialseSAT Journals
 
Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...eSAT Journals
 
Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...eSAT Journals
 
Influence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizerInfluence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizereSAT Journals
 
Geographical information system (gis) for water resources management
Geographical information system (gis) for water resources managementGeographical information system (gis) for water resources management
Geographical information system (gis) for water resources managementeSAT Journals
 
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...eSAT Journals
 
Factors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concreteFactors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concreteeSAT Journals
 
Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...eSAT Journals
 
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...eSAT Journals
 
Evaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabsEvaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabseSAT Journals
 
Evaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in indiaEvaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in indiaeSAT Journals
 
Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...eSAT Journals
 
Estimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn methodEstimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn methodeSAT Journals
 
Estimation of morphometric parameters and runoff using rs & gis techniques
Estimation of morphometric parameters and runoff using rs & gis techniquesEstimation of morphometric parameters and runoff using rs & gis techniques
Estimation of morphometric parameters and runoff using rs & gis techniqueseSAT Journals
 
Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...eSAT Journals
 
Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...eSAT Journals
 

More from eSAT Journals (20)

Mechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavementsMechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavements
 
Material management in construction – a case study
Material management in construction – a case studyMaterial management in construction – a case study
Material management in construction – a case study
 
Managing drought short term strategies in semi arid regions a case study
Managing drought    short term strategies in semi arid regions  a case studyManaging drought    short term strategies in semi arid regions  a case study
Managing drought short term strategies in semi arid regions a case study
 
Life cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangaloreLife cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangalore
 
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materialsLaboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
 
Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...
 
Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...
 
Influence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizerInfluence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizer
 
Geographical information system (gis) for water resources management
Geographical information system (gis) for water resources managementGeographical information system (gis) for water resources management
Geographical information system (gis) for water resources management
 
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
 
Factors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concreteFactors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concrete
 
Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...
 
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
 
Evaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabsEvaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabs
 
Evaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in indiaEvaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in india
 
Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...
 
Estimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn methodEstimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn method
 
Estimation of morphometric parameters and runoff using rs & gis techniques
Estimation of morphometric parameters and runoff using rs & gis techniquesEstimation of morphometric parameters and runoff using rs & gis techniques
Estimation of morphometric parameters and runoff using rs & gis techniques
 
Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...
 
Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...
 

Recently uploaded

Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxDeepakSakkari2
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learningmisbanausheenparvam
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...VICTOR MAESTRE RAMIREZ
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacingjaychoudhary37
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxvipinkmenon1
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2RajaP95
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 

Recently uploaded (20)

Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Biology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptxBiology for Computer Engineers Course Handout.pptx
Biology for Computer Engineers Course Handout.pptx
 
chaitra-1.pptx fake news detection using machine learning
chaitra-1.pptx  fake news detection using machine learningchaitra-1.pptx  fake news detection using machine learning
chaitra-1.pptx fake news detection using machine learning
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...Software and Systems Engineering Standards: Verification and Validation of Sy...
Software and Systems Engineering Standards: Verification and Validation of Sy...
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacing
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Introduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptxIntroduction to Microprocesso programming and interfacing.pptx
Introduction to Microprocesso programming and interfacing.pptx
 
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2HARMONY IN THE HUMAN BEING - Unit-II UHV-2
HARMONY IN THE HUMAN BEING - Unit-II UHV-2
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 

Evaluating the efficiency of rule techniques for file classification

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 __________________________________________________________________________________________ Volume: 02 Issue: 10 | Oct-2013, Available @ http://www.ijret.org 38 EVALUATING THE EFFICIENCY OF RULE TECHNIQUES FOR FILE CLASSIFICATION S. Vijayarani1 , M. Muthulakshmi2 1 Assistant Professor, 2 M. Phil Research Scholar, Department of Computer Science, School of Computer Science and Engineering, Bharathiar University, Coimbatore, Tamilnadu, India. vijimohan_2000@yahoo.com. abarajitha.uma@gmail.com Abstract Text mining refers to the process of deriving high quality information from text. It is also known as knowledge discovery from text (KDT), deals with the machine supported analysis of text. It is used in various areas such as information retrieval, marketing, information extraction, natural language processing, document similarity, and so on. Document Similarity is one of the important techniques in text mining. In document similarity, the first and foremost step is to classify the files based on their category. In this research work, various classification rule techniques are used to classify the computer files based on their extensions. For example, the extension of computer files may be pdf, doc, ppt, xls, and so on. There are several algorithms for rule classifier such as decision table, JRip, Ridor, DTNB, NNge, PART, OneR and ZeroR. In this research work, three classification algorithms namely decision table, DTNB and OneR classifiers are used for performing classification of computer files based on their extension. The results produced by these algorithms are analyzed by using the performance factors classification accuracy and error rate. From the experimental results, DTNB proves to be more efficient than other two techniques. Index Terms: Data mining, Text mining, Classification, Decision table, DTNB, OneR ----------------------------------------------------------------------***------------------------------------------------------------------------ 1. INTRODUCTION Data mining is the practice of searching through large amounts of computerized data to find useful patterns or trends. Data mining is the process of knowledge discovery where knowledge is gained by analyzing the data store in very large repositories which are analyzed from various perspectives and the result is summarized it into useful information. Data mining is also known as Knowledge Discovery in Data (KDD). There are various research areas in data mining such as web mining, text mining, image mining, statistics, machine learning, data organization and databases, pattern detection, artificial intelligence and other areas. Text mining or knowledge discovery from text (KDT) deals with the machine supported analysis of text. It uses methods from information retrieval, natural language processing (NLP) information extraction and also connects them with the algorithms and methods of Knowledge discovery of data, data mining, machine learning and statistics. Current research in the area of text mining tackles problems of text representation, classification, clustering, or the search and modeling of hidden patterns. [5] Text mining usually involves the process of structuring the input text (usually parsing, along with the accumulation of some derived linguistic features and the removal of others, and consequent insertion into a database), deriving models within the structured data, and to finish evaluation and interpretation of the output. High quality in text mining typically refers to some combination of relevance of relevance, innovation, and interestingness. Various stages of a text-mining process can be combined together into a single workgroup. [10] Some of the important applications of text-mining include Data Mining Competitive Intelligence, Records Management, Enterprise Business Intelligence, National Security, E- Discovery, Intelligence Scientific discovery especially Life Sciences, Search or Information Access and Social media monitoring. Some of the technologies that have been developed and can be used in the text mining process are information extraction, topic tracking, summarization, classification, clustering, concept linkage, information conception, and question answering [4]. In this new and current era of technology, developments and techniques, efficient and effective, document classification is becoming a challenging and highly required area to capably categorize text documents into mutually exclusive categories. Text categorization is an upcoming and vital field in today’s world which is most importantly required and demanded to efficiently categorize various text documents into different categories. The rest of this paper is organized as follows. Section 2 illustrates the review of literature. Section 3 discusses the classification rule classifier and the various algorithms used
  • 2. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 02 Issue: 10 | Oct-2013, Available @ http://www.ijret.org 39 for classification. Experimental results are analyzed in Section 4 and Conclusions are given in Section 5. 2. LITERATURE REVIEW C. Lakshmi Devasena et al [9] discussed the effectiveness of Rule-Based classifiers for classification by taking a sample data set from UCI machine learning repository using the open source machine learning tool. An evaluation of different rule based classifiers used in data mining and a practical guideline for selecting the most suited algorithm for a classification is presented and some empirical criteria for describing and evaluating the classifiers are given. The performances of the classifiers were measured and results are compared using the Iris Data set. Among nine classifiers (Conjunctive Rule Classifier, Decision Table Classifier, DTNB Classifier, OneR Classifier, JRIP Classifier, NNGE Classifier, PART Classifier, RIDOR Classifier and ZeroR Classifier) NNGE Classifier performs well in the classification problem. OneR classifier, RIDOR Classifier and JRIP classifier are coming in the next category to classify the data. Biao Qin, Yuni Xia et al [4] proposed a new rule-based classification and prediction algorithm called uRule for classifying uncertain data. Uncertain data often occur in modern applications, including sensor databases, spatial- temporal databases, and medical or biology information systems. This algorithm introduces new measures for generating, pruning and optimizing rules. This new measures are computed considering uncertain data interval and probability distribution function. Based on the new values, the optimal splitting attribute and splitting value can be identified and used for classification and prediction. The uRule algorithm can process uncertainty in both numerical and categorical data. The experimental results show that uRule has excellent performance even when data is highly uncertain. Mohd Fauzi bin Othman et al [13] investigated the performance of different classification or clustering methods for a set of large data. The algorithm tested are Bayes Network, Radial Basis Function, Rule based classifier, Pruned Tree, Single Conjunctive Rule Learner and Nearest Neighbors Algorithm. The dataset breast cancer with a total data of 6291 and a dimension of 699 rows and 9 columns will be used to test and justify the differences between the classification methods or algorithms. Consequently, the classification technique that has the potential to significantly improve the common or conventional methods will be suggested for use in large scale data, bioinformatics or other common applications. Among the machine learning algorithm tested, Bayes network classifier has the potential to significantly improve the conventional classification methods for use in medical or bioinformatics field. Dr. S. Vijayarani et al [19] discussed the classification rule techniques in data mining are compared for predicting heart disease. The classification rule algorithms are namely Decision table, JRip, OneR and Part. By analyzed the experimental results of accuracy measure, it is observed that the decision table classification rule technique turned out to be best classifier for heart disease prediction because it contains more accuracy. By analyzed all error rates, the Decision table and OneR classification rule algorithm contains least error rate in possible two outcomes. 3. METHODOLOGY Document similarity is one of the main tasks in the area of text mining. The essential step of document similarity is to classify the documents based on some criteria. [7] In this section, various classification algorithms are used to classify the files based on their extension which are stored in the computer hard disk. (For example: pdf, doc, txt and so on). The methodology of the research work is as follows. 1. Dataset – Computer Files can be collected from the system hard disk. 2. Classification Rule Algorithms  Decision Table  DTNB  OneR 3. Performance factors  Classification accuracy  Error rate 4. Best Technique among classification rule algorithms  DTNB 3.1 Dataset To compare these data mining classification techniques, computer files can be collected from the system hard disk and a synthetic data set is created. This dataset has 31994 instances and four attributes namely file name, file size, file extension and file path. 3.2 Classification Rule Techniques Classification of documents involves assigning class labels to documents indicating their category. Classification algorithms are widely used in various applications. Data classification is a two steps process in which first step is the training phase where the classifier algorithm builds classifier with the training set of tuples and the second phase is classification phase where the model is used for classification and its performance is analyzed with the testing set of tuples. [12] There are various classification rule algorithms such as Decision table, JRip, Ridor, DTNB, PART, OneR, and so on. In this research, we have analyzed classification rule algorithms namely Decision table, DTNB and OneR.
  • 3. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 02 Issue: 10 | Oct-2013, Available @ http://www.ijret.org 40 3.2.1 Decision Table The algorithm, decision table, is found in the Weka classifiers under Rules. The simplest way of representing the output from machine learning is to put it in the same form as the input. The use of the classifier rules decision table is described as building and using a simple decision table majority classifier. The output will show a decision on a number of attributes for each instance. The number and specific types of attributes can vary to suit the needs of the task. Two variants of decision table classifiers are available. The first classifier, called DTMaj (Decision Table Majority) returns the majority of the training set if the decision table cell matching the new instance is empty, that is, it does not contain any training instances. The second classifier, called DTLoc (Decision Table Local), is a new variant that searches for a decision table entry with fewer matching attributes (larger cells) if the matching cell is empty. This variant therefore returns an answer from the native region. [14] 3.2.2 DTNB This is for building and using a decision table/naive bayes hybrid classifier. Every point in the search, the algorithm estimates the value of dividing the attributes into two disjoint subsets: one for the decision table, and the other for naive Bayes. A forward selection search is used at each step, then the selected attributes are exhibited by naive Bayes and the remainder by the decision table and all attributes are modeled by the decision table initially. At each step, the algorithm dropping an attribute entirely from the model. [16] The algorithm for learning the combined model (DTNB) proceeds in much the same way as the one for stand-alone DTs. At each point in the exploration it estimates the merit associated with splitting the attributes into two disjoint subsets: one for the DT, the other for NB. The class probability estimates of the DT and NB must be combined to generate overall class probability estimates. Assuming X> is the set of attributes in the DT and X⊥ the one in NB, the overall class probability is computed as Q (y | X) = α × QDT(y | X>) × QNB(y | X⊥)/Q(y), Where QDT(y | X>) and QNB(y | X ⊥) are the class probability estimates obtained from the DT and NB respectively, α is a normalization constant, and Q(y) is the prior probability of the class. All probabilities are predictable using Laplace corrected observed counts. 3.2.3 OneR The OneR algorithm creates a single rule for each attribute of training data and then picks up the rule with the least error rate. To generate a rule for an attribute, the most recurrent class for each attribute value must be established. The most recurrent class is the class that appears most frequently for that attribute value. A rule is a set of attribute values destined to their most recurrent class with which the attribute based on. Pseudo-code for OneR algorithm is The number of training data instances which does not agree with the binding of attribute value in the rule produces the error rate. OneR selects the rule with the least error rate. If two or more rules have same error rate, then the rule is selected at random. [2] 4. EXPERIMENTAL RESULTS 4.1 Accuracy Measure The following table shows the accuracy measure of classification techniques. They are the correctly classified instances, incorrectly classified instances, True Positive rate, F Measure, Precision, Receiver Operating Characteristics (ROC) Area and Kappa Statistics. TP Rate is the ratio of play cases predicted correctly cases to the total of positive cases. F Measure is a way of combining recall and precision scores into a single measure of performance. [18] Precision is the proportion of relevant documents in the results returned. ROC Area is a traditional to plot the same information in a normalized form with 1-false negative rate plotted against the false positive rate. Table-1: Accuracy Measure for Classification Rule Algorithms Algorithm Classifier Decision Table DTNB OneR Correctly Classified Instances 88.21 95.09 71.72 Incorrectly Classified Instances 11.79 4.91 28.28 TP Rate 88.20 95.10 71.70 Precision 91.70 95.20 99.60 F Measure 89.10 94.80 79.70 ROC Area 94.30 98.60 85.80 Kappa Statistics 81.09 92.08 58.95 For each attribute A, For each value VA of the attribute, make a rule as follows: Add up how often each class appears Locate the most frequent class Cf Generate a rule when A=VA; class attribute value = Cf End For-Each Compute the error rate of all rules End For-Each Select the rule with the smallest
  • 4. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 02 Issue: 10 | Oct-2013, Available @ http://www.ijret.org 41 Chart-1: Accuracy Measure for Classification Rule Algorithms From the graph, it is observed that DTNB algorithm performs better than other algorithms. Therefore the DTNB classification algorithm performs well because it contains highest accuracy when compared to Decision table and OneR. 4.2 Error Rate They are the Mean Absolute Error (M.A.E), Root Mean Square Error (R.M.S.E), Relative Absolute Error (R.A.E) and Root Relative Squared Error (R.R.S.R). They are the Mean Absolute Error (M.A.E), Root Mean Square Error (R.M.S.E), Relative Absolute Error (R.A.E) and Root Relative Squared Error (R.R.S.R). The mean absolute error (MAE) is defined as the quantity used to measure how close predictions or forecasts are to the eventual outcomes.[19] The root mean square error (RMSE) is defined as frequently used measure of the differences between values predicted by a model or an estimator and the values actually observed. Relative Absolute Error is a measure of the uncertainty of measurement compared to the size of the measurement. The root relative squared error defined as a relative to what it would have been if a simple predictor had been used. More specifically, this predictor is just the average of the actual values. Table-2: Error Rate of Classification Rule Algorithms Algorithm MAE RMSE RAE RRAE Decision Table 3.45 11.25 74.46 73.92 DTNB 1.98 8.21 42.64 53.96 OneR 2.09 14.47 45.19 95.10 Chart-2: Error Rate for Classification Rule Algorithms From the above graph, it is observed that Decision table and OneR algorithms attains highest error rate. Therefore, the DTNB classification algorithm performs well because it contains least error rate when compared to other algorithms. CONCLUSIONS Data mining is the process of discovering interesting knowledge from large amounts of data stored either in databases, data warehouses or other information sources. Text Mining is the automated or partially automated processing of text. In this research work, the classification rule algorithms namely Decision table, DTNB and OneR are used for classifying computer files which are stored in the computer. By analysing the experimental results it is observed that the DTNB (Decision Tree Naïve Bayes) classification technique has yields better result than other techniques. In future we tend to improve efficiency of performance by applying other data mining techniques and algorithms. REFERENCES [1]. Abdullah Wahbeh, Mohammed Al-Kabi, “Comparative Assessment of the performance of three WEKA text classifiers applied Arabic text. [2]. Anshul Goyal, Rajni Mehta, “Performance Comparison of Naïve Bayes and J48 Classification Algorithms”. [3]. Aurangzeb Khan, Baharum Baharudin, Lam Hong Lee, Khairullah khan, “A Review of Machine Learning Algorithms for Text-Documents Classification”. [4]. Biao Qin, Yuni Xia, Sunil Prabhakar, Yicheng Tu, “A Rule-Based Classification Algorithm for Uncertain Data”. [5]. BS Harish, DS Guru, S Manjunath, “Representation and Classification of Text Documents: A Brief Review”. [6]. S.Deepajothi, Dr.S.Selvarajan, “A Comparative Study of Classification Techniques on Adult Data Set” [7]. Ian H. Witten, Eibe Frank, “Data Mining Tools and Techniques practical Machine Learning”.
  • 5. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 02 Issue: 10 | Oct-2013, Available @ http://www.ijret.org 42 [8]. Kaushik Raviya, Biren Gajjar, “Performance Evaluation of Different Data Mining Classification Algorithm Using WEKA”. [9]. C. Lakshmi Devasena, T. Sumathi, V.V. Gomathi and M. Hemalatha, “Effectiveness Evaluation of Rule Based Classifiers for the Classification of Iris Data Set. [10]. Lan Huang, David Milne, Eibe Frank, Ian H. Witten, “Learning a Concept-based Document Similarity Measure”. [11]. Mahendra Tiwari, Manu Bhai Jha, Om Prakash Yadav, “Performance analysis of Data Mining algorithms in Weka”. [12]. Mirza Nazura Abdulkarim, “Classification and Retrieval of Research Papers: A Semantic Hierarchical Approach”. [13]. Mohd Fauzi bin Othman, Thomas Moh Shan Yau, “Comparison of Different Classification Techniques using WEKA for Breast Cancer”. [14]. Petra Kralj Novak, “Classification in WEKA”. [15]. Mrs. Sayantani Ghosh, Mr. Sudipta Roy, Prof. Samir K. Bandyopadhyay, “A tutorial review on Text Mining Algorithms”. [16]. Sunila Godara, Ritu Yadav, “Performance analysis of clustering algorithms for character recognition using Weka tool”. [17]. Trilok Chand Sharma, Manoj Jain, “WEKA Approach for Comparative Study of Classification Algorithm”. [18]. Dr. S.Vijayarani, S.Sudha, “Comparative Analysis of Classification Function Techniques for Heart Disease Prediction”. [19]. Dr. S.Vijayarani, S. Sudha, “An Effective Classification Rule Technique for Heart Disease Prediction”. BIOGRAPHIES Dr. S. Vijayarani has completed MCA, M.Phil and PhD in Computer Science. She is working as Assistant Professor in the School of Computer Science and Engineering, Bharathiar University, Coimbatore. Her fields of research interest are data mining, privacy, security, bioinformatics and data streams. She has published papers in the international journals and presented research papers in international and national conferences. Mrs. M Muthulakshmi has completed M.Sc in Computer Science and Information Technology. She is currently pursuing her M.Phil in Computer Science in the School of Computer Science and Engineering, Bharathiar University, Coimbatore. Her fields of interest are Data Mining, Text Mining and Semantic web mining.