SlideShare a Scribd company logo
1 of 4
Download to read offline
An Efficient Way of Classifying and
Clustering Documents Based on SMTP
U.Umamaheswari[1]
, Mr.G.Shivaji Rao M.E.[2]
,
M.E scholar, Assistant Professor,
[1]
Department of Computer Science and Engineering, [2]
Department of Computer Science and Engineering,
Sree Sowdambika College of Engineering, Sree Sowdambika College of Engineering,
Aruppukottai-626 134, Aruppukottai-626 134,
Tamil Nadu, Tamil Nadu,
India. India.
13umamaheswari@gmail.com shivajirao88@gmail.com
Abstract
In text processing, the similarity measurement is the important process. It measures the similarities between
the two documents. In this project we proposed the new similarity measurement. The computation of similarity
measurement is based on the feature of two documents. Our proposed system contains three case to compute the
similarity. The three cases are, both two documents contains features, only one document contains feature, there is no
feature into the two documents. In first case, the similarity is increased when the differences of feature value is
decreased between the two documents. Then the given differences are scaled. In second case, fixed value is given to
the similarity. In third cases there is no contribution to the similarity. Finally our proposed measure method achieves
the better performance compared than other measurement methods.
Index Terms—Document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms
I. INTRODUCTION
Text processing plays an important role
in information retrieval, data mining, and web
search. A document is any content drawn up or
received by the Foundation concerning a matter
relating to the policies, activities and decisions
falling within its competence and in the
framework of its official tasks, in whatever
medium (written on paper or stored in electronic
form , including e-mail, or as a sound, visual or
audio-visual recording). The term classification
means the allocation of an appropriate level of
security (as confidential or restricted) to a
document the unauthorised disclosure of which
might prejudice the interests of the Foundation,
the EU or third parties. Documents are
confidential when their unauthorised disclosure
could harm the essential interests of an
individual, the Foundation or the EU.
Documents are restricted when their
unauthorised disclosure could be
Disadvantageous to the Foundation, the EU or a
third party. Documents are restricted when their
unauthorised disclosure could be
disadvantageous to the Foundation, the EU or a
third party. The term originator means the duly
authorised author of a classified document. The
term downgrading means a reduction in the level
of classification. The term declassification
means the removal of any classification.
RULES FOR CLASSIFICATION
Foundation documents that are not
public shall be classified in one of the following
categories: confidential or restricted. Criteria and
guidance for classification are set out in Annex 1
to this decision. The classification of a
document shall be decided by the originator
based on these rules. All classified documents
shall be recorded in a register of classified
documents. Applications for access to classified
documents shall be examined by the Director. If
a classified document is to be made available in
response to a request from a member of the
public, it shall be first declassified by a decision
of the Director. Documents shall be classified
only when necessary. The classification shall be
clearly indicated and shall be maintained only as
long as the document requires protection. The
classification of a document shall be determined
by the level of sensitivity of its contents.
Classification of documents shall be
periodically reviewed. By request of the
Document Management Officer (DMO), the
originator of a document shall indicate if that
document or information may be downgraded
and declassified. Where a document or
information is declassified, details shall be
recorded in the register and the document shall
be archived appropriately. Where classification
is retained, details of the review shall be entered
Proceedings of International Conference on Developments in Engineering Research
ISBN NO : 378 - 26 - 13840 - 9
www.iaetsd.in
INTERNATIONAL ASSOCIATION OF ENGINEERING & TECHNOLOGY FOR SKILL DEVELOPMENT
50
in the register. All classified documents shall be
retained in a manner that ensures they are not
disclosed to unauthorised individuals. Security
and access levels to classified documents shall
be recorded in the register by the DMO. Either
the originator or the DMO may retain classified
documents. Where the originator retains
documents, they shall be physically safeguarded
as specified by the DMO.
Individual pages, paragraphs, sections,
annexes, appendices, attachments and enclosures
of a given document may require different
classifications and shall be classified
accordingly. The classification of the document
as a whole shall be that of its most highly
classified part. The originator shall indicate
clearly at which level a document should be
classified when detached from its enclosures.
The classification shall appear at the top and
bottom centre of each page, and each page shall
be numbered. Each classified document shall
bear a reference number and a date. All annexes
and enclosures shall be listed on the first page of
a document classified as confidential. The
classification shall be shown on restricted
documents by mechanical or electronic means.
Classification shall be shown on confidential
documents by mechanical means or by hand or
by printing on pre-stamped, registered paper.
DOCUMENT CLASSIFICATION
Document classification or document
categorization is a problem in library science,
information science and computer science. The
task is to assign a document to one or more
classes or categories. This may be done
"manually" (or "intellectually") or
algorithmically. The intellectual classification of
documents has mostly been the province of
library science, while the algorithmic
classification of documents is used mainly in
information science and computer science. The
problems are overlapping, however, and there is
therefore also interdisciplinary research on
document classification.
The documents to be classified may be
texts, images, music, etc. Each kind of document
possesses its special classification problems.
When not otherwise specified, text classification
is implied. Documents may be classified
according to their subjects or according to other
attributes (such as document type, author,
printing year etc.). In the rest of this article only
subject classification is considered. There are
two main philosophies of subject classification
of documents: The content based approach and
the request based approach.
II. RELATED WORK
The documents are represented as
vectors, each elements of vectors indicates the
corresponding feature values of the documents.
In existing system the feature values are termed
as frequency, relative term frequency and tf-idf
(term frequency and inverse document
frequency). The document size is large and most
of the vector value is zero in this system. In
existing system non symmetric measure is used
to measure the similarities. Canberra distance
metric is one of the existing methods which are
used to measure the similarity. This method is
applicable when the vector elements are always
non-zero. In existing system cosine similarity
measure is used to measure the similarities
between the two documents. It takes cosine of
angle between the two vectors. The phase based
similarity measure is used in one of the existing
system. So, we are our some problem in
following as the document is large in existing
system so it takes more memory to store the
documents. It contains zero vector values so it
makes severe challenges to measure the
similarity. It is not efficient and does not provide
accurate results. The performance is low. It takes
more time to produce the result.
In our proposed system new measure is used to
measure the similarities between the two
documents. This new measure is symmetric
measure. The similarities between the two
documents are measured with respect to the
features. Three feature cases are used to
measure the similarities. In first case the two
documents contains features value, in second
case, only one document contains the feature
value and in third case there is no feature value
in both documents. This measure is applied in
many applications such as single label and multi
label classification, clustering and so on. In our
proposed system we k-NN based single label
classification (SL-KNN) and k-NN base multi
label classification (ML-KNN) for classification
purpose. The documents are formed as cluster in
our concept for this purpose we used
Hierarchical Agglomerative Clustering (HAC).
It is the one type of k means clustering
algorithm. We three type of data sets such as
WebKB, Reuters 8 and RCV1. These data sets
are stored in the form of XML. So, In our project
improve following as It achieves better
Proceedings of International Conference on Developments in Engineering Research
ISBN NO : 378 - 26 - 13840 - 9
www.iaetsd.in
INTERNATIONAL ASSOCIATION OF ENGINEERING & TECHNOLOGY FOR SKILL DEVELOPMENT
51
performance compared than other measure, It
provides efficient results, It achieves accuracy in
similarity measurement, It take less time for
similarity measure.
III.PROPOSE SIMILARITY MEASURE
In our proposed system new
measure is used to measure the similarities
between the two documents. This new measure
is symmetric measure. The similarities between
the two documents are measured with respect to
the features. Three feature cases are used to
measure the similarities. In first case the two
documents contains features value, in second
case, only one document contains the feature
value and in third case there is no feature value
in both documents. This measure is applied in
many applications such as single label and multi
label classification, clustering and so on. In our
proposed system we k-NN based single label
classification (SL-KNN) and k-NN base multi
label classification (ML-KNN) for classification
purpose. The documents are formed as cluster in
our concept for this purpose we used
Hierarchical Agglomerative Clustering (HAC).
It is the one type of k means clustering
algorithm. We three type of data sets such as
WebKB, Reuters 8 and RCV1. These data sets
are stored in the form of XML.
Fig: System architecture
The following properties, among
other ones, are preferable for a similarity
measure between two documents:
1. The presence or absence of a feature is
more essential than the difference
between the two values associated with
a present feature.
Let as Consider two features wi
and wj and two documents d1 and d2.
wi = d2(some relationship)
wi ≠ d1(no relationship)
In this case, d1 and d2 are
dissimilar in terms of wi. If wj appears in both
d1 and d2. Then wj has some relationship with
d1 and d2 simultaneously. In this case, d1 and
d2 are similar to some degree in terms of wj.
For the above two cases, it is reasonable to say
that wi carries more weight than wj in
determining the similarity degree between d1
and d2.
2. The similarity degree should increase
when the difference between two non-
zero values of a specific feature
decreases.
3. The similarity degree should decrease
when the number of presence-absence
features increases.
4. Two documents are least similar to each
other if none of the features have non-
zero values in both documents. Let
d1 = < d11, d12, . . . , d1m > and
d2 = < d21, d22, . . . , d2m >.
If d1id2i = 0,
d1i + d2i > 0
for 1 ≤ i ≤ m, then d1 and d2 are
least similar to each other. As mentioned
earlier, d1 and d2 are dissimilar in terms
of a presence-absence feature.
5. The similarity measure should be
symmetric. That is, the similarity degree
between d1 and d2 should be the same
as that between d2 and d1.
6. The value distribution of a feature is
considered, i.e., the standard deviation
of the feature is taken into account, for
its contribution to the similarity between
two documents. A feature with a larger
spread offers more contribution to the
similarity between d1 and d2.
IV. METHODOLOGY
1. DATA SET SELECTION
Here we used three types of data
sets. They are WebKB, Reuters 8 and RCV1.
These data sets are stored in the form of HTML
and text or SGM. Each data set is divided into
two types. They are training and testing data set.
WebKB contains web pages as the document
which is collected by the world wide knowledge
base. It does not contain predefined training set
and testing set. So we randomly divide these
data sets as training and testing set. Reuter’s data
set contains predesigned training set and testing
sets. If the resultant data set contains 71% of
training set and 29% of testing set then it is
called as Reuters 8. RCV1 also contains
predesigned training and testing data sets.
2. CLUSTERING
After the data set collection then we
perform the cluster formation. Cluster contains
the more than one data set. For cluster formation
Proceedings of International Conference on Developments in Engineering Research
ISBN NO : 378 - 26 - 13840 - 9
www.iaetsd.in
INTERNATIONAL ASSOCIATION OF ENGINEERING & TECHNOLOGY FOR SKILL DEVELOPMENT
52
we used k-means algorithm. In our proposed
system, we used HAC (hierarchical
agglomerative clustering). It is the one type k
means clustering algorithm and it used bottom
up approach for clustering. The HAC is mostly
used to measure the performance.
3. PERFORM CLASSIFICATION
After form the cluster then we performs
the classifications. We used SL-kNN and ML-
kNN method is used to perform the
classifications. SL-KNN is used for only one
category and ML-KNN is used for more than
two categories. These are used to measure the
similarity between the documents. It gives best
result for similarity. Classification is used to
retrieve process. It increases the fast of retrieve
the data. It easily identifies the similarity
between the documents. It gives related result
efficiently.
4. FIND SIMILARITIES
After select the classification
methods then we finds the similarities between
the data sets. The similarity measurement is
based on the features of documents. It finds the
word weight and distance to measure the
similarities. Similarity measure has three
properties they are, the presence and absence
feature is more important than the difference of
two values that are presented in the feature, the
similarity degree is increased when the
difference between the non zero values of
feature is decreased, the similarity degree is
decreased when the presence and absence of
feature is increased.
5. PERFORMANCE COMPARISON
After perform all the process then
we compared the performance of process. In our
proposed system, the performance is high
compared than all existing system.
V.CONCLUSION
The main concept of our proposed system is
to find the similarity between the documents.
For similarity measure we used KNN based
single label classification and KNN based multi
label classification. Here we collect three data
sets such as WEBKB, reuters-8 and RCV1.
Classification is used for retrieve process and it
increases the fast of retrieve the data. It easily
identifies the similarity between the documents.
It gives related result efficiently. Here we used
HAC for clustering and it is the one type of k
means clustering algorithm. It also measures the
performance. In our proposed system, the data
sets are stored in the form the XML. Here we
used tf-idf (term frequency and inverse
document frequency) to retrieve the document
terms and It measures the similarity by
calculating the weight. Here we used
optimization and it identifies the no of cluster
that we formed in the document. It increases the
accuracy of similarity measurement.
VI.REFERENCES
1. Yung-Shen Lin, Jung-Yi Jiang, and Shie-Jue Lee,
Member, IEEE,” A Similarity Measure for Text
Classification and Clustering”, in IEEE Transactions
On Knowledge And Data Engineering, Vol. 26, No. 7,
July 2014 1575
2. D. Cai, X. He, and J. Han, “Document clustering using
locality preserving indexing,” IEEE Trans. Knowl. Data
Eng., vol. 17, no. 12, pp. 1624–1637, Dec. 2005.
3. H. Chim and X. Deng, “Efficient phrase-based
document similarity for clustering,” IEEE Trans.
Knowl. Data Eng., vol. 20, no. 9, pp. 1217–1229, Sept.
2008.
4. S. Clinchant and E. Gaussier, “Information-based
models for ad hoc IR,” in Proc. 33rd SIGIR, Geneva,
Switzerland, 2010, pp. 234–241.
5. K. M. Hammouda and M. S. Kamel, “Hierarchically
distributed peer-to-peer document clustering and cluster
summarization,” IEEE Trans. Knowl. Data Eng., vol.
21, no. 5, pp. 681–698, May 2009.
6. Y. Zhao and G. Karypis, “Comparison of agglomerative
and partitional document clustering algorithms,” in
Proc. Workshop Clustering High Dimensional Data Its
Appl. 2nd SIAM ICDM, 2002, pp. 83–93.
7. T. Zhang, Y. Y. Tang, B. Fang, and Y. Xiang,
“Document clustering in correlation similarity measure
space,” IEEE Trans. Knowl. Data Eng., vol. 24, no. 6,
pp. 1002–1013, Jun. 2012.
8. M. L. Zhang and Z. H. Zhou, “ML-kNN: A lazy
learning approach to multi-label learning,” Pattern
Recognit., vol. 40, no. 7, pp. 2038–2048, 2007.
9. Strehl and J. Ghosh, “Value-based customer grouping
from large retail data-sets,” in Proc. SPIE, vol. 4057.
Orlando, FL, USA, Apr. 2000, pp. 33–42.
10. F. Sebastiani, “Machine learning in automated text
categorization,” ACM CSUR, vol. 34, no. 1, pp. 1–47,
2002.
11. T. W. Schoenharl and G. Madey, “Evaluation of
measurement techniques for the validation of agent-
based simulations against streaming data,” in Proc.
ICCS, Kraków, Poland, 2008.
12. D. D. Lewis, Y. Yang, T. Rose, and F. Li, “RCV1: A
new benchmark collection for text categorization
research,” J. Mach. Learn. Res., vol. 5, pp. 361–397,
Apr. 2004.
13. V. Lertnattee and T. Theeramunkong,
“Multidimensional text classification for drug
information,” IEEE Trans. Inform. Technol. Biomed.,
vol. 8, no. 3 pp. 306–312, Sept. 2004.
14. S.-J. Lee and C.-S. Ouyang, “A neuro-fuzzy system
modeling with self-constructing rule generation and
hybrid SVD-based learning,” IEEE Trans. Fuzzy Syst.,
vol. 11, no. 3, pp. 341–353, Jun. 2003.
15. G. Amati and C. J. V. Rijsbergen, “Probabilistic
models of information retrieval based on
measuring the divergence from randomness,”
ACM Trans. Inform. Syst., vol. 20, no. 4, pp. 357–
389, 2002.
Proceedings of International Conference on Developments in Engineering Research
ISBN NO : 378 - 26 - 13840 - 9
www.iaetsd.in
INTERNATIONAL ASSOCIATION OF ENGINEERING & TECHNOLOGY FOR SKILL DEVELOPMENT
53

More Related Content

What's hot

Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrievalunyil96
 
Improving Annotations in Digital Documents using Document Features and Fuzzy ...
Improving Annotations in Digital Documents using Document Features and Fuzzy ...Improving Annotations in Digital Documents using Document Features and Fuzzy ...
Improving Annotations in Digital Documents using Document Features and Fuzzy ...IRJET Journal
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)9866825059
 
Paper id 252014107
Paper id 252014107Paper id 252014107
Paper id 252014107IJRAT
 
Indexing Automated Vs Automatic Galvan1
Indexing Automated Vs Automatic   Galvan1Indexing Automated Vs Automatic   Galvan1
Indexing Automated Vs Automatic Galvan1CorinaF
 
A03730108
A03730108A03730108
A03730108theijes
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notesBAIRAVI T
 
Asif nosql
Asif nosqlAsif nosql
Asif nosqlAsif Ali
 
Exploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining ApplicationsExploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining ApplicationsIRJET Journal
 
IRJET- A Survey on Link Prediction Techniques
IRJET-  	  A Survey on Link Prediction TechniquesIRJET-  	  A Survey on Link Prediction Techniques
IRJET- A Survey on Link Prediction TechniquesIRJET Journal
 
Sherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type deteSherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type detemayank272369
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantinimaxfalc
 
Tdm information retrieval
Tdm information retrievalTdm information retrieval
Tdm information retrievalKU Leuven
 
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONA SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONIJDKP
 
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENTSOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENTARAVINDRM2
 

What's hot (20)

Text databases and information retrieval
Text databases and information retrievalText databases and information retrieval
Text databases and information retrieval
 
P33077080
P33077080P33077080
P33077080
 
Improving Annotations in Digital Documents using Document Features and Fuzzy ...
Improving Annotations in Digital Documents using Document Features and Fuzzy ...Improving Annotations in Digital Documents using Document Features and Fuzzy ...
Improving Annotations in Digital Documents using Document Features and Fuzzy ...
 
Model of information retrieval (3)
Model  of information retrieval (3)Model  of information retrieval (3)
Model of information retrieval (3)
 
Automatic indexing
Automatic indexingAutomatic indexing
Automatic indexing
 
Paper id 252014107
Paper id 252014107Paper id 252014107
Paper id 252014107
 
Hy3414631468
Hy3414631468Hy3414631468
Hy3414631468
 
Indexing Automated Vs Automatic Galvan1
Indexing Automated Vs Automatic   Galvan1Indexing Automated Vs Automatic   Galvan1
Indexing Automated Vs Automatic Galvan1
 
A03730108
A03730108A03730108
A03730108
 
Information retrieval-systems notes
Information retrieval-systems notesInformation retrieval-systems notes
Information retrieval-systems notes
 
Asif nosql
Asif nosqlAsif nosql
Asif nosql
 
Exploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining ApplicationsExploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining Applications
 
Text mining
Text miningText mining
Text mining
 
IRJET- A Survey on Link Prediction Techniques
IRJET-  	  A Survey on Link Prediction TechniquesIRJET-  	  A Survey on Link Prediction Techniques
IRJET- A Survey on Link Prediction Techniques
 
Sherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type deteSherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type dete
 
S34119122
S34119122S34119122
S34119122
 
Tovek Presentation by Livio Costantini
Tovek Presentation by Livio CostantiniTovek Presentation by Livio Costantini
Tovek Presentation by Livio Costantini
 
Tdm information retrieval
Tdm information retrievalTdm information retrieval
Tdm information retrieval
 
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONA SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
 
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENTSOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
SOFTWARE ENGINEERING AND SOFTWARE PROJECT MANAGEMENT
 

Viewers also liked

Iaetsd a novel approach to assess the quality of tone mapped
Iaetsd a novel approach to assess the quality of tone mappedIaetsd a novel approach to assess the quality of tone mapped
Iaetsd a novel approach to assess the quality of tone mappedIaetsd Iaetsd
 
COMPARISION OF TONE-MAPPING ALGORITHM BASED ON STRUCTURAL FIDELITY AND STATIS...
COMPARISION OF TONE-MAPPING ALGORITHM BASED ON STRUCTURAL FIDELITY AND STATIS...COMPARISION OF TONE-MAPPING ALGORITHM BASED ON STRUCTURAL FIDELITY AND STATIS...
COMPARISION OF TONE-MAPPING ALGORITHM BASED ON STRUCTURAL FIDELITY AND STATIS...Sumadeep Juvvalapalem
 
Iaetsd early detection of breast cancer
Iaetsd early detection of breast cancerIaetsd early detection of breast cancer
Iaetsd early detection of breast cancerIaetsd Iaetsd
 
Iaetsd protecting privacy preserving for cost effective adaptive actions
Iaetsd protecting  privacy preserving for cost effective adaptive actionsIaetsd protecting  privacy preserving for cost effective adaptive actions
Iaetsd protecting privacy preserving for cost effective adaptive actionsIaetsd Iaetsd
 
Iaetsd enhanced cryptography algorithm for providing
Iaetsd enhanced cryptography algorithm for providingIaetsd enhanced cryptography algorithm for providing
Iaetsd enhanced cryptography algorithm for providingIaetsd Iaetsd
 
Iaetsd rtos based electronics industrial
Iaetsd rtos based electronics industrialIaetsd rtos based electronics industrial
Iaetsd rtos based electronics industrialIaetsd Iaetsd
 
Iaetsd fpga implementation of rf technology and biometric authentication
Iaetsd fpga implementation of rf technology and biometric authenticationIaetsd fpga implementation of rf technology and biometric authentication
Iaetsd fpga implementation of rf technology and biometric authenticationIaetsd Iaetsd
 
Iaetsd load area frequency control for multi area power
Iaetsd load area frequency control for multi area powerIaetsd load area frequency control for multi area power
Iaetsd load area frequency control for multi area powerIaetsd Iaetsd
 
Iaetsd zigbee for vehicular communication systems
Iaetsd zigbee for vehicular communication systemsIaetsd zigbee for vehicular communication systems
Iaetsd zigbee for vehicular communication systemsIaetsd Iaetsd
 
Iaetsd asynchronous data transactions on so c using fifo
Iaetsd asynchronous data transactions on so c using fifoIaetsd asynchronous data transactions on so c using fifo
Iaetsd asynchronous data transactions on so c using fifoIaetsd Iaetsd
 
Iaetsd power capture safe test pattern determination
Iaetsd power capture safe test pattern determinationIaetsd power capture safe test pattern determination
Iaetsd power capture safe test pattern determinationIaetsd Iaetsd
 
Iaetsd quick detection technique to reduce congestion in
Iaetsd quick detection technique to reduce congestion inIaetsd quick detection technique to reduce congestion in
Iaetsd quick detection technique to reduce congestion inIaetsd Iaetsd
 
Iaetsd experimental study on properties of ternary blended fibre
Iaetsd experimental study on properties of ternary blended fibreIaetsd experimental study on properties of ternary blended fibre
Iaetsd experimental study on properties of ternary blended fibreIaetsd Iaetsd
 
Iaetsd design and analysis of a novel low actuated voltage
Iaetsd design and analysis of a novel low actuated voltageIaetsd design and analysis of a novel low actuated voltage
Iaetsd design and analysis of a novel low actuated voltageIaetsd Iaetsd
 
Iaetsd an mtcmos technique for optimizing low
Iaetsd an mtcmos technique for optimizing lowIaetsd an mtcmos technique for optimizing low
Iaetsd an mtcmos technique for optimizing lowIaetsd Iaetsd
 
Iaetsd degraded document image enhancing in
Iaetsd degraded document image enhancing inIaetsd degraded document image enhancing in
Iaetsd degraded document image enhancing inIaetsd Iaetsd
 
Iaetsd survey on wireless sensor networks routing
Iaetsd survey on wireless sensor networks routingIaetsd survey on wireless sensor networks routing
Iaetsd survey on wireless sensor networks routingIaetsd Iaetsd
 
Iaetsd innovative brick material
Iaetsd innovative brick materialIaetsd innovative brick material
Iaetsd innovative brick materialIaetsd Iaetsd
 
Iaetsd automated toll collection system
Iaetsd automated toll collection systemIaetsd automated toll collection system
Iaetsd automated toll collection systemIaetsd Iaetsd
 
Iaetsd compressed air vehicle (cav)
Iaetsd compressed air vehicle (cav)Iaetsd compressed air vehicle (cav)
Iaetsd compressed air vehicle (cav)Iaetsd Iaetsd
 

Viewers also liked (20)

Iaetsd a novel approach to assess the quality of tone mapped
Iaetsd a novel approach to assess the quality of tone mappedIaetsd a novel approach to assess the quality of tone mapped
Iaetsd a novel approach to assess the quality of tone mapped
 
COMPARISION OF TONE-MAPPING ALGORITHM BASED ON STRUCTURAL FIDELITY AND STATIS...
COMPARISION OF TONE-MAPPING ALGORITHM BASED ON STRUCTURAL FIDELITY AND STATIS...COMPARISION OF TONE-MAPPING ALGORITHM BASED ON STRUCTURAL FIDELITY AND STATIS...
COMPARISION OF TONE-MAPPING ALGORITHM BASED ON STRUCTURAL FIDELITY AND STATIS...
 
Iaetsd early detection of breast cancer
Iaetsd early detection of breast cancerIaetsd early detection of breast cancer
Iaetsd early detection of breast cancer
 
Iaetsd protecting privacy preserving for cost effective adaptive actions
Iaetsd protecting  privacy preserving for cost effective adaptive actionsIaetsd protecting  privacy preserving for cost effective adaptive actions
Iaetsd protecting privacy preserving for cost effective adaptive actions
 
Iaetsd enhanced cryptography algorithm for providing
Iaetsd enhanced cryptography algorithm for providingIaetsd enhanced cryptography algorithm for providing
Iaetsd enhanced cryptography algorithm for providing
 
Iaetsd rtos based electronics industrial
Iaetsd rtos based electronics industrialIaetsd rtos based electronics industrial
Iaetsd rtos based electronics industrial
 
Iaetsd fpga implementation of rf technology and biometric authentication
Iaetsd fpga implementation of rf technology and biometric authenticationIaetsd fpga implementation of rf technology and biometric authentication
Iaetsd fpga implementation of rf technology and biometric authentication
 
Iaetsd load area frequency control for multi area power
Iaetsd load area frequency control for multi area powerIaetsd load area frequency control for multi area power
Iaetsd load area frequency control for multi area power
 
Iaetsd zigbee for vehicular communication systems
Iaetsd zigbee for vehicular communication systemsIaetsd zigbee for vehicular communication systems
Iaetsd zigbee for vehicular communication systems
 
Iaetsd asynchronous data transactions on so c using fifo
Iaetsd asynchronous data transactions on so c using fifoIaetsd asynchronous data transactions on so c using fifo
Iaetsd asynchronous data transactions on so c using fifo
 
Iaetsd power capture safe test pattern determination
Iaetsd power capture safe test pattern determinationIaetsd power capture safe test pattern determination
Iaetsd power capture safe test pattern determination
 
Iaetsd quick detection technique to reduce congestion in
Iaetsd quick detection technique to reduce congestion inIaetsd quick detection technique to reduce congestion in
Iaetsd quick detection technique to reduce congestion in
 
Iaetsd experimental study on properties of ternary blended fibre
Iaetsd experimental study on properties of ternary blended fibreIaetsd experimental study on properties of ternary blended fibre
Iaetsd experimental study on properties of ternary blended fibre
 
Iaetsd design and analysis of a novel low actuated voltage
Iaetsd design and analysis of a novel low actuated voltageIaetsd design and analysis of a novel low actuated voltage
Iaetsd design and analysis of a novel low actuated voltage
 
Iaetsd an mtcmos technique for optimizing low
Iaetsd an mtcmos technique for optimizing lowIaetsd an mtcmos technique for optimizing low
Iaetsd an mtcmos technique for optimizing low
 
Iaetsd degraded document image enhancing in
Iaetsd degraded document image enhancing inIaetsd degraded document image enhancing in
Iaetsd degraded document image enhancing in
 
Iaetsd survey on wireless sensor networks routing
Iaetsd survey on wireless sensor networks routingIaetsd survey on wireless sensor networks routing
Iaetsd survey on wireless sensor networks routing
 
Iaetsd innovative brick material
Iaetsd innovative brick materialIaetsd innovative brick material
Iaetsd innovative brick material
 
Iaetsd automated toll collection system
Iaetsd automated toll collection systemIaetsd automated toll collection system
Iaetsd automated toll collection system
 
Iaetsd compressed air vehicle (cav)
Iaetsd compressed air vehicle (cav)Iaetsd compressed air vehicle (cav)
Iaetsd compressed air vehicle (cav)
 

Similar to Classifying Documents Based on SMTP Using an Efficient Clustering Method

A Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed ClusteringA Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed ClusteringIRJET Journal
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET Journal
 
IRJET- Automated Document Summarization and Classification using Deep Lear...
IRJET- 	  Automated Document Summarization and Classification using Deep Lear...IRJET- 	  Automated Document Summarization and Classification using Deep Lear...
IRJET- Automated Document Summarization and Classification using Deep Lear...IRJET Journal
 
A rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationA rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationNinad Samel
 
Reviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringReviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringIRJET Journal
 
2016 BE Final year Projects in chennai - 1 Crore Projects
2016 BE Final year Projects in chennai - 1 Crore Projects 2016 BE Final year Projects in chennai - 1 Crore Projects
2016 BE Final year Projects in chennai - 1 Crore Projects 1crore projects
 
Design and Implementation of Meetings Document Management and Retrieval System
Design and Implementation of Meetings Document Management and Retrieval SystemDesign and Implementation of Meetings Document Management and Retrieval System
Design and Implementation of Meetings Document Management and Retrieval SystemCSCJournals
 
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMCANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMIRJET Journal
 
Candidate Link Generation Using Semantic Phermone Swarm
Candidate Link Generation Using Semantic Phermone SwarmCandidate Link Generation Using Semantic Phermone Swarm
Candidate Link Generation Using Semantic Phermone Swarmkevig
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET Journal
 
An efficeient privacy preserving ranked keyword search
An efficeient privacy preserving ranked keyword searchAn efficeient privacy preserving ranked keyword search
An efficeient privacy preserving ranked keyword searchredpel dot com
 
Data Analytics all units
Data Analytics all unitsData Analytics all units
Data Analytics all unitsjayaramb
 
IRJET - Document Comparison based on TF-IDF Metric
IRJET - Document Comparison based on TF-IDF MetricIRJET - Document Comparison based on TF-IDF Metric
IRJET - Document Comparison based on TF-IDF MetricIRJET Journal
 
Evaluating the efficiency of rule techniques for file classification
Evaluating the efficiency of rule techniques for file classificationEvaluating the efficiency of rule techniques for file classification
Evaluating the efficiency of rule techniques for file classificationeSAT Journals
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureIOSR Journals
 
Evaluating the efficiency of rule techniques for file
Evaluating the efficiency of rule techniques for fileEvaluating the efficiency of rule techniques for file
Evaluating the efficiency of rule techniques for fileeSAT Publishing House
 
IRJET- A Survey of Text Document Clustering by using Clustering Techniques
IRJET- A Survey of Text Document Clustering by using Clustering TechniquesIRJET- A Survey of Text Document Clustering by using Clustering Techniques
IRJET- A Survey of Text Document Clustering by using Clustering TechniquesIRJET Journal
 
Document retrieval using clustering
Document retrieval using clusteringDocument retrieval using clustering
Document retrieval using clusteringeSAT Journals
 
Correlation Analysis of Forensic Metadata for Digital Evidence
Correlation Analysis of Forensic Metadata for Digital EvidenceCorrelation Analysis of Forensic Metadata for Digital Evidence
Correlation Analysis of Forensic Metadata for Digital EvidenceIJCSIS Research Publications
 

Similar to Classifying Documents Based on SMTP Using an Efficient Clustering Method (20)

A Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed ClusteringA Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed Clustering
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
IRJET- Automated Document Summarization and Classification using Deep Lear...
IRJET- 	  Automated Document Summarization and Classification using Deep Lear...IRJET- 	  Automated Document Summarization and Classification using Deep Lear...
IRJET- Automated Document Summarization and Classification using Deep Lear...
 
A rough set based hybrid method to text categorization
A rough set based hybrid method to text categorizationA rough set based hybrid method to text categorization
A rough set based hybrid method to text categorization
 
Reviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringReviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clustering
 
2016 BE Final year Projects in chennai - 1 Crore Projects
2016 BE Final year Projects in chennai - 1 Crore Projects 2016 BE Final year Projects in chennai - 1 Crore Projects
2016 BE Final year Projects in chennai - 1 Crore Projects
 
Design and Implementation of Meetings Document Management and Retrieval System
Design and Implementation of Meetings Document Management and Retrieval SystemDesign and Implementation of Meetings Document Management and Retrieval System
Design and Implementation of Meetings Document Management and Retrieval System
 
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMCANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
 
Candidate Link Generation Using Semantic Phermone Swarm
Candidate Link Generation Using Semantic Phermone SwarmCandidate Link Generation Using Semantic Phermone Swarm
Candidate Link Generation Using Semantic Phermone Swarm
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
An efficeient privacy preserving ranked keyword search
An efficeient privacy preserving ranked keyword searchAn efficeient privacy preserving ranked keyword search
An efficeient privacy preserving ranked keyword search
 
IJET-V2I6P33
IJET-V2I6P33IJET-V2I6P33
IJET-V2I6P33
 
Data Analytics all units
Data Analytics all unitsData Analytics all units
Data Analytics all units
 
IRJET - Document Comparison based on TF-IDF Metric
IRJET - Document Comparison based on TF-IDF MetricIRJET - Document Comparison based on TF-IDF Metric
IRJET - Document Comparison based on TF-IDF Metric
 
Evaluating the efficiency of rule techniques for file classification
Evaluating the efficiency of rule techniques for file classificationEvaluating the efficiency of rule techniques for file classification
Evaluating the efficiency of rule techniques for file classification
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity Measure
 
Evaluating the efficiency of rule techniques for file
Evaluating the efficiency of rule techniques for fileEvaluating the efficiency of rule techniques for file
Evaluating the efficiency of rule techniques for file
 
IRJET- A Survey of Text Document Clustering by using Clustering Techniques
IRJET- A Survey of Text Document Clustering by using Clustering TechniquesIRJET- A Survey of Text Document Clustering by using Clustering Techniques
IRJET- A Survey of Text Document Clustering by using Clustering Techniques
 
Document retrieval using clustering
Document retrieval using clusteringDocument retrieval using clustering
Document retrieval using clustering
 
Correlation Analysis of Forensic Metadata for Digital Evidence
Correlation Analysis of Forensic Metadata for Digital EvidenceCorrelation Analysis of Forensic Metadata for Digital Evidence
Correlation Analysis of Forensic Metadata for Digital Evidence
 

More from Iaetsd Iaetsd

iaetsd Survey on cooperative relay based data transmission
iaetsd Survey on cooperative relay based data transmissioniaetsd Survey on cooperative relay based data transmission
iaetsd Survey on cooperative relay based data transmissionIaetsd Iaetsd
 
iaetsd Software defined am transmitter using vhdl
iaetsd Software defined am transmitter using vhdliaetsd Software defined am transmitter using vhdl
iaetsd Software defined am transmitter using vhdlIaetsd Iaetsd
 
iaetsd Health monitoring system with wireless alarm
iaetsd Health monitoring system with wireless alarmiaetsd Health monitoring system with wireless alarm
iaetsd Health monitoring system with wireless alarmIaetsd Iaetsd
 
iaetsd Equalizing channel and power based on cognitive radio system over mult...
iaetsd Equalizing channel and power based on cognitive radio system over mult...iaetsd Equalizing channel and power based on cognitive radio system over mult...
iaetsd Equalizing channel and power based on cognitive radio system over mult...Iaetsd Iaetsd
 
iaetsd Economic analysis and re design of driver’s car seat
iaetsd Economic analysis and re design of driver’s car seatiaetsd Economic analysis and re design of driver’s car seat
iaetsd Economic analysis and re design of driver’s car seatIaetsd Iaetsd
 
iaetsd Design of slotted microstrip patch antenna for wlan application
iaetsd Design of slotted microstrip patch antenna for wlan applicationiaetsd Design of slotted microstrip patch antenna for wlan application
iaetsd Design of slotted microstrip patch antenna for wlan applicationIaetsd Iaetsd
 
REVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBS
REVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBSREVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBS
REVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBSIaetsd Iaetsd
 
A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...
A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...
A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...Iaetsd Iaetsd
 
Fabrication of dual power bike
Fabrication of dual power bikeFabrication of dual power bike
Fabrication of dual power bikeIaetsd Iaetsd
 
Blue brain technology
Blue brain technologyBlue brain technology
Blue brain technologyIaetsd Iaetsd
 
iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...
iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...
iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...Iaetsd Iaetsd
 
iirdem Surveillance aided robotic bird
iirdem Surveillance aided robotic birdiirdem Surveillance aided robotic bird
iirdem Surveillance aided robotic birdIaetsd Iaetsd
 
iirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growth
iirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growthiirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growth
iirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid GrowthIaetsd Iaetsd
 
iirdem Design of Efficient Solar Energy Collector using MPPT Algorithm
iirdem Design of Efficient Solar Energy Collector using MPPT Algorithmiirdem Design of Efficient Solar Energy Collector using MPPT Algorithm
iirdem Design of Efficient Solar Energy Collector using MPPT AlgorithmIaetsd Iaetsd
 
iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...
iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...
iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...Iaetsd Iaetsd
 
iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...
iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...
iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...Iaetsd Iaetsd
 
iaetsd Shared authority based privacy preserving protocol
iaetsd Shared authority based privacy preserving protocoliaetsd Shared authority based privacy preserving protocol
iaetsd Shared authority based privacy preserving protocolIaetsd Iaetsd
 
iaetsd Secured multiple keyword ranked search over encrypted databases
iaetsd Secured multiple keyword ranked search over encrypted databasesiaetsd Secured multiple keyword ranked search over encrypted databases
iaetsd Secured multiple keyword ranked search over encrypted databasesIaetsd Iaetsd
 
iaetsd Robots in oil and gas refineries
iaetsd Robots in oil and gas refineriesiaetsd Robots in oil and gas refineries
iaetsd Robots in oil and gas refineriesIaetsd Iaetsd
 
iaetsd Modeling of solar steam engine system using parabolic
iaetsd Modeling of solar steam engine system using paraboliciaetsd Modeling of solar steam engine system using parabolic
iaetsd Modeling of solar steam engine system using parabolicIaetsd Iaetsd
 

More from Iaetsd Iaetsd (20)

iaetsd Survey on cooperative relay based data transmission
iaetsd Survey on cooperative relay based data transmissioniaetsd Survey on cooperative relay based data transmission
iaetsd Survey on cooperative relay based data transmission
 
iaetsd Software defined am transmitter using vhdl
iaetsd Software defined am transmitter using vhdliaetsd Software defined am transmitter using vhdl
iaetsd Software defined am transmitter using vhdl
 
iaetsd Health monitoring system with wireless alarm
iaetsd Health monitoring system with wireless alarmiaetsd Health monitoring system with wireless alarm
iaetsd Health monitoring system with wireless alarm
 
iaetsd Equalizing channel and power based on cognitive radio system over mult...
iaetsd Equalizing channel and power based on cognitive radio system over mult...iaetsd Equalizing channel and power based on cognitive radio system over mult...
iaetsd Equalizing channel and power based on cognitive radio system over mult...
 
iaetsd Economic analysis and re design of driver’s car seat
iaetsd Economic analysis and re design of driver’s car seatiaetsd Economic analysis and re design of driver’s car seat
iaetsd Economic analysis and re design of driver’s car seat
 
iaetsd Design of slotted microstrip patch antenna for wlan application
iaetsd Design of slotted microstrip patch antenna for wlan applicationiaetsd Design of slotted microstrip patch antenna for wlan application
iaetsd Design of slotted microstrip patch antenna for wlan application
 
REVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBS
REVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBSREVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBS
REVIEW PAPER- ON ENHANCEMENT OF HEAT TRANSFER USING RIBS
 
A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...
A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...
A HYBRID AC/DC SOLAR POWERED STANDALONE SYSTEM WITHOUT INVERTER BASED ON LOAD...
 
Fabrication of dual power bike
Fabrication of dual power bikeFabrication of dual power bike
Fabrication of dual power bike
 
Blue brain technology
Blue brain technologyBlue brain technology
Blue brain technology
 
iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...
iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...
iirdem The Livable Planet – A Revolutionary Concept through Innovative Street...
 
iirdem Surveillance aided robotic bird
iirdem Surveillance aided robotic birdiirdem Surveillance aided robotic bird
iirdem Surveillance aided robotic bird
 
iirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growth
iirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growthiirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growth
iirdem Growing India Time Monopoly – The Key to Initiate Long Term Rapid Growth
 
iirdem Design of Efficient Solar Energy Collector using MPPT Algorithm
iirdem Design of Efficient Solar Energy Collector using MPPT Algorithmiirdem Design of Efficient Solar Energy Collector using MPPT Algorithm
iirdem Design of Efficient Solar Energy Collector using MPPT Algorithm
 
iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...
iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...
iirdem CRASH IMPACT ATTENUATOR (CIA) FOR AUTOMOBILES WITH THE ADVOCATION OF M...
 
iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...
iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...
iirdem ADVANCING OF POWER MANAGEMENT IN HOME WITH SMART GRID TECHNOLOGY AND S...
 
iaetsd Shared authority based privacy preserving protocol
iaetsd Shared authority based privacy preserving protocoliaetsd Shared authority based privacy preserving protocol
iaetsd Shared authority based privacy preserving protocol
 
iaetsd Secured multiple keyword ranked search over encrypted databases
iaetsd Secured multiple keyword ranked search over encrypted databasesiaetsd Secured multiple keyword ranked search over encrypted databases
iaetsd Secured multiple keyword ranked search over encrypted databases
 
iaetsd Robots in oil and gas refineries
iaetsd Robots in oil and gas refineriesiaetsd Robots in oil and gas refineries
iaetsd Robots in oil and gas refineries
 
iaetsd Modeling of solar steam engine system using parabolic
iaetsd Modeling of solar steam engine system using paraboliciaetsd Modeling of solar steam engine system using parabolic
iaetsd Modeling of solar steam engine system using parabolic
 

Recently uploaded

HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝soniya singh
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacingjaychoudhary37
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 

Recently uploaded (20)

HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
Model Call Girl in Narela Delhi reach out to us at 🔝8264348440🔝
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacing
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
VICTOR MAESTRE RAMIREZ - Planetary Defender on NASA's Double Asteroid Redirec...
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 

Classifying Documents Based on SMTP Using an Efficient Clustering Method

  • 1. An Efficient Way of Classifying and Clustering Documents Based on SMTP U.Umamaheswari[1] , Mr.G.Shivaji Rao M.E.[2] , M.E scholar, Assistant Professor, [1] Department of Computer Science and Engineering, [2] Department of Computer Science and Engineering, Sree Sowdambika College of Engineering, Sree Sowdambika College of Engineering, Aruppukottai-626 134, Aruppukottai-626 134, Tamil Nadu, Tamil Nadu, India. India. 13umamaheswari@gmail.com shivajirao88@gmail.com Abstract In text processing, the similarity measurement is the important process. It measures the similarities between the two documents. In this project we proposed the new similarity measurement. The computation of similarity measurement is based on the feature of two documents. Our proposed system contains three case to compute the similarity. The three cases are, both two documents contains features, only one document contains feature, there is no feature into the two documents. In first case, the similarity is increased when the differences of feature value is decreased between the two documents. Then the given differences are scaled. In second case, fixed value is given to the similarity. In third cases there is no contribution to the similarity. Finally our proposed measure method achieves the better performance compared than other measurement methods. Index Terms—Document classification, document clustering, entropy, accuracy, classifiers, clustering algorithms I. INTRODUCTION Text processing plays an important role in information retrieval, data mining, and web search. A document is any content drawn up or received by the Foundation concerning a matter relating to the policies, activities and decisions falling within its competence and in the framework of its official tasks, in whatever medium (written on paper or stored in electronic form , including e-mail, or as a sound, visual or audio-visual recording). The term classification means the allocation of an appropriate level of security (as confidential or restricted) to a document the unauthorised disclosure of which might prejudice the interests of the Foundation, the EU or third parties. Documents are confidential when their unauthorised disclosure could harm the essential interests of an individual, the Foundation or the EU. Documents are restricted when their unauthorised disclosure could be Disadvantageous to the Foundation, the EU or a third party. Documents are restricted when their unauthorised disclosure could be disadvantageous to the Foundation, the EU or a third party. The term originator means the duly authorised author of a classified document. The term downgrading means a reduction in the level of classification. The term declassification means the removal of any classification. RULES FOR CLASSIFICATION Foundation documents that are not public shall be classified in one of the following categories: confidential or restricted. Criteria and guidance for classification are set out in Annex 1 to this decision. The classification of a document shall be decided by the originator based on these rules. All classified documents shall be recorded in a register of classified documents. Applications for access to classified documents shall be examined by the Director. If a classified document is to be made available in response to a request from a member of the public, it shall be first declassified by a decision of the Director. Documents shall be classified only when necessary. The classification shall be clearly indicated and shall be maintained only as long as the document requires protection. The classification of a document shall be determined by the level of sensitivity of its contents. Classification of documents shall be periodically reviewed. By request of the Document Management Officer (DMO), the originator of a document shall indicate if that document or information may be downgraded and declassified. Where a document or information is declassified, details shall be recorded in the register and the document shall be archived appropriately. Where classification is retained, details of the review shall be entered Proceedings of International Conference on Developments in Engineering Research ISBN NO : 378 - 26 - 13840 - 9 www.iaetsd.in INTERNATIONAL ASSOCIATION OF ENGINEERING & TECHNOLOGY FOR SKILL DEVELOPMENT 50
  • 2. in the register. All classified documents shall be retained in a manner that ensures they are not disclosed to unauthorised individuals. Security and access levels to classified documents shall be recorded in the register by the DMO. Either the originator or the DMO may retain classified documents. Where the originator retains documents, they shall be physically safeguarded as specified by the DMO. Individual pages, paragraphs, sections, annexes, appendices, attachments and enclosures of a given document may require different classifications and shall be classified accordingly. The classification of the document as a whole shall be that of its most highly classified part. The originator shall indicate clearly at which level a document should be classified when detached from its enclosures. The classification shall appear at the top and bottom centre of each page, and each page shall be numbered. Each classified document shall bear a reference number and a date. All annexes and enclosures shall be listed on the first page of a document classified as confidential. The classification shall be shown on restricted documents by mechanical or electronic means. Classification shall be shown on confidential documents by mechanical means or by hand or by printing on pre-stamped, registered paper. DOCUMENT CLASSIFICATION Document classification or document categorization is a problem in library science, information science and computer science. The task is to assign a document to one or more classes or categories. This may be done "manually" (or "intellectually") or algorithmically. The intellectual classification of documents has mostly been the province of library science, while the algorithmic classification of documents is used mainly in information science and computer science. The problems are overlapping, however, and there is therefore also interdisciplinary research on document classification. The documents to be classified may be texts, images, music, etc. Each kind of document possesses its special classification problems. When not otherwise specified, text classification is implied. Documents may be classified according to their subjects or according to other attributes (such as document type, author, printing year etc.). In the rest of this article only subject classification is considered. There are two main philosophies of subject classification of documents: The content based approach and the request based approach. II. RELATED WORK The documents are represented as vectors, each elements of vectors indicates the corresponding feature values of the documents. In existing system the feature values are termed as frequency, relative term frequency and tf-idf (term frequency and inverse document frequency). The document size is large and most of the vector value is zero in this system. In existing system non symmetric measure is used to measure the similarities. Canberra distance metric is one of the existing methods which are used to measure the similarity. This method is applicable when the vector elements are always non-zero. In existing system cosine similarity measure is used to measure the similarities between the two documents. It takes cosine of angle between the two vectors. The phase based similarity measure is used in one of the existing system. So, we are our some problem in following as the document is large in existing system so it takes more memory to store the documents. It contains zero vector values so it makes severe challenges to measure the similarity. It is not efficient and does not provide accurate results. The performance is low. It takes more time to produce the result. In our proposed system new measure is used to measure the similarities between the two documents. This new measure is symmetric measure. The similarities between the two documents are measured with respect to the features. Three feature cases are used to measure the similarities. In first case the two documents contains features value, in second case, only one document contains the feature value and in third case there is no feature value in both documents. This measure is applied in many applications such as single label and multi label classification, clustering and so on. In our proposed system we k-NN based single label classification (SL-KNN) and k-NN base multi label classification (ML-KNN) for classification purpose. The documents are formed as cluster in our concept for this purpose we used Hierarchical Agglomerative Clustering (HAC). It is the one type of k means clustering algorithm. We three type of data sets such as WebKB, Reuters 8 and RCV1. These data sets are stored in the form of XML. So, In our project improve following as It achieves better Proceedings of International Conference on Developments in Engineering Research ISBN NO : 378 - 26 - 13840 - 9 www.iaetsd.in INTERNATIONAL ASSOCIATION OF ENGINEERING & TECHNOLOGY FOR SKILL DEVELOPMENT 51
  • 3. performance compared than other measure, It provides efficient results, It achieves accuracy in similarity measurement, It take less time for similarity measure. III.PROPOSE SIMILARITY MEASURE In our proposed system new measure is used to measure the similarities between the two documents. This new measure is symmetric measure. The similarities between the two documents are measured with respect to the features. Three feature cases are used to measure the similarities. In first case the two documents contains features value, in second case, only one document contains the feature value and in third case there is no feature value in both documents. This measure is applied in many applications such as single label and multi label classification, clustering and so on. In our proposed system we k-NN based single label classification (SL-KNN) and k-NN base multi label classification (ML-KNN) for classification purpose. The documents are formed as cluster in our concept for this purpose we used Hierarchical Agglomerative Clustering (HAC). It is the one type of k means clustering algorithm. We three type of data sets such as WebKB, Reuters 8 and RCV1. These data sets are stored in the form of XML. Fig: System architecture The following properties, among other ones, are preferable for a similarity measure between two documents: 1. The presence or absence of a feature is more essential than the difference between the two values associated with a present feature. Let as Consider two features wi and wj and two documents d1 and d2. wi = d2(some relationship) wi ≠ d1(no relationship) In this case, d1 and d2 are dissimilar in terms of wi. If wj appears in both d1 and d2. Then wj has some relationship with d1 and d2 simultaneously. In this case, d1 and d2 are similar to some degree in terms of wj. For the above two cases, it is reasonable to say that wi carries more weight than wj in determining the similarity degree between d1 and d2. 2. The similarity degree should increase when the difference between two non- zero values of a specific feature decreases. 3. The similarity degree should decrease when the number of presence-absence features increases. 4. Two documents are least similar to each other if none of the features have non- zero values in both documents. Let d1 = < d11, d12, . . . , d1m > and d2 = < d21, d22, . . . , d2m >. If d1id2i = 0, d1i + d2i > 0 for 1 ≤ i ≤ m, then d1 and d2 are least similar to each other. As mentioned earlier, d1 and d2 are dissimilar in terms of a presence-absence feature. 5. The similarity measure should be symmetric. That is, the similarity degree between d1 and d2 should be the same as that between d2 and d1. 6. The value distribution of a feature is considered, i.e., the standard deviation of the feature is taken into account, for its contribution to the similarity between two documents. A feature with a larger spread offers more contribution to the similarity between d1 and d2. IV. METHODOLOGY 1. DATA SET SELECTION Here we used three types of data sets. They are WebKB, Reuters 8 and RCV1. These data sets are stored in the form of HTML and text or SGM. Each data set is divided into two types. They are training and testing data set. WebKB contains web pages as the document which is collected by the world wide knowledge base. It does not contain predefined training set and testing set. So we randomly divide these data sets as training and testing set. Reuter’s data set contains predesigned training set and testing sets. If the resultant data set contains 71% of training set and 29% of testing set then it is called as Reuters 8. RCV1 also contains predesigned training and testing data sets. 2. CLUSTERING After the data set collection then we perform the cluster formation. Cluster contains the more than one data set. For cluster formation Proceedings of International Conference on Developments in Engineering Research ISBN NO : 378 - 26 - 13840 - 9 www.iaetsd.in INTERNATIONAL ASSOCIATION OF ENGINEERING & TECHNOLOGY FOR SKILL DEVELOPMENT 52
  • 4. we used k-means algorithm. In our proposed system, we used HAC (hierarchical agglomerative clustering). It is the one type k means clustering algorithm and it used bottom up approach for clustering. The HAC is mostly used to measure the performance. 3. PERFORM CLASSIFICATION After form the cluster then we performs the classifications. We used SL-kNN and ML- kNN method is used to perform the classifications. SL-KNN is used for only one category and ML-KNN is used for more than two categories. These are used to measure the similarity between the documents. It gives best result for similarity. Classification is used to retrieve process. It increases the fast of retrieve the data. It easily identifies the similarity between the documents. It gives related result efficiently. 4. FIND SIMILARITIES After select the classification methods then we finds the similarities between the data sets. The similarity measurement is based on the features of documents. It finds the word weight and distance to measure the similarities. Similarity measure has three properties they are, the presence and absence feature is more important than the difference of two values that are presented in the feature, the similarity degree is increased when the difference between the non zero values of feature is decreased, the similarity degree is decreased when the presence and absence of feature is increased. 5. PERFORMANCE COMPARISON After perform all the process then we compared the performance of process. In our proposed system, the performance is high compared than all existing system. V.CONCLUSION The main concept of our proposed system is to find the similarity between the documents. For similarity measure we used KNN based single label classification and KNN based multi label classification. Here we collect three data sets such as WEBKB, reuters-8 and RCV1. Classification is used for retrieve process and it increases the fast of retrieve the data. It easily identifies the similarity between the documents. It gives related result efficiently. Here we used HAC for clustering and it is the one type of k means clustering algorithm. It also measures the performance. In our proposed system, the data sets are stored in the form the XML. Here we used tf-idf (term frequency and inverse document frequency) to retrieve the document terms and It measures the similarity by calculating the weight. Here we used optimization and it identifies the no of cluster that we formed in the document. It increases the accuracy of similarity measurement. VI.REFERENCES 1. Yung-Shen Lin, Jung-Yi Jiang, and Shie-Jue Lee, Member, IEEE,” A Similarity Measure for Text Classification and Clustering”, in IEEE Transactions On Knowledge And Data Engineering, Vol. 26, No. 7, July 2014 1575 2. D. Cai, X. He, and J. Han, “Document clustering using locality preserving indexing,” IEEE Trans. Knowl. Data Eng., vol. 17, no. 12, pp. 1624–1637, Dec. 2005. 3. H. Chim and X. Deng, “Efficient phrase-based document similarity for clustering,” IEEE Trans. Knowl. Data Eng., vol. 20, no. 9, pp. 1217–1229, Sept. 2008. 4. S. Clinchant and E. Gaussier, “Information-based models for ad hoc IR,” in Proc. 33rd SIGIR, Geneva, Switzerland, 2010, pp. 234–241. 5. K. M. Hammouda and M. S. Kamel, “Hierarchically distributed peer-to-peer document clustering and cluster summarization,” IEEE Trans. Knowl. Data Eng., vol. 21, no. 5, pp. 681–698, May 2009. 6. Y. Zhao and G. Karypis, “Comparison of agglomerative and partitional document clustering algorithms,” in Proc. Workshop Clustering High Dimensional Data Its Appl. 2nd SIAM ICDM, 2002, pp. 83–93. 7. T. Zhang, Y. Y. Tang, B. Fang, and Y. Xiang, “Document clustering in correlation similarity measure space,” IEEE Trans. Knowl. Data Eng., vol. 24, no. 6, pp. 1002–1013, Jun. 2012. 8. M. L. Zhang and Z. H. Zhou, “ML-kNN: A lazy learning approach to multi-label learning,” Pattern Recognit., vol. 40, no. 7, pp. 2038–2048, 2007. 9. Strehl and J. Ghosh, “Value-based customer grouping from large retail data-sets,” in Proc. SPIE, vol. 4057. Orlando, FL, USA, Apr. 2000, pp. 33–42. 10. F. Sebastiani, “Machine learning in automated text categorization,” ACM CSUR, vol. 34, no. 1, pp. 1–47, 2002. 11. T. W. Schoenharl and G. Madey, “Evaluation of measurement techniques for the validation of agent- based simulations against streaming data,” in Proc. ICCS, Kraków, Poland, 2008. 12. D. D. Lewis, Y. Yang, T. Rose, and F. Li, “RCV1: A new benchmark collection for text categorization research,” J. Mach. Learn. Res., vol. 5, pp. 361–397, Apr. 2004. 13. V. Lertnattee and T. Theeramunkong, “Multidimensional text classification for drug information,” IEEE Trans. Inform. Technol. Biomed., vol. 8, no. 3 pp. 306–312, Sept. 2004. 14. S.-J. Lee and C.-S. Ouyang, “A neuro-fuzzy system modeling with self-constructing rule generation and hybrid SVD-based learning,” IEEE Trans. Fuzzy Syst., vol. 11, no. 3, pp. 341–353, Jun. 2003. 15. G. Amati and C. J. V. Rijsbergen, “Probabilistic models of information retrieval based on measuring the divergence from randomness,” ACM Trans. Inform. Syst., vol. 20, no. 4, pp. 357– 389, 2002. Proceedings of International Conference on Developments in Engineering Research ISBN NO : 378 - 26 - 13840 - 9 www.iaetsd.in INTERNATIONAL ASSOCIATION OF ENGINEERING & TECHNOLOGY FOR SKILL DEVELOPMENT 53