SlideShare a Scribd company logo
1 of 5
Download to read offline
International Journal of Engineering Research and Development
e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com
Volume 8, Issue 8 (September 2013), PP. 01-05
www.ijerd.com 1 | Page
A Novel approach for clustering textual information in
various emails using text data mining technique
Prof. Anjana. R. Arakerimath
Associate Professor, MCA Department Pimpri Chinchwad College of Engineering, Akurdi, Nigdi, Pune. India.
Abstract:- Clustering is the technique used for data reduction. It divides the data into groups based on pattern
similarities such that each group is abstracted by one or more representatives. Recently, there is a growing
emphasis on exploratory analysis of very large datasets to discover useful patterns. This paper explains
extracting the useful knowledge represented by clusters from textual information contained in a large number of
emails for text and data mining techniques. E-mail data that are now becoming the dominant form of inter and
intra organizational written communication for many companies. The sample texts of two mails are verified for
data clustering. The cluster shows the similar emails exchanged between the users and finding the text
similarities to cluster the texts. In this paper the use of Pattern similarities i.e., the similar words exchanged
between the users by considering the different Threshold values are made for the purpose. The threshold value
shows the frequency of the words used. The representation of data is done using a vector space model.
I. INTRODUCTION
E-mail is an effective, fast and cheap communication way. Clustering is a data mining (machine
learning) technique used to place data elements into related groups without advance knowledge of the group
definitions. Popular clustering techniques include k-means clustering and expectation maximization (EM)
clustering.
Text clustering divides a set of texts into clusters (parts), so that texts within each cluster are similar in
content. A text clustering algorithm partitions a set of texts so that texts within the same group are as similar in
content as possible. It is done without using any predifined categories. Text clustering can be applied to the
documents retrieved by a search engine, so that they can be presented in groups according to content.
Clustering is the process of partitioning or dividing a set of patterns (data) into groups. Each cluster is
abstracted using one or more representatives. Representing the data by fewer clusters necessarily loses certain
fine details, but achieves simplification. It models data by its clusters. Clustering is a type of classification
imposed on finite set of objects [3]. The relationship between objects is represented in a proximity matrix in
which the rows represent „n‟ e-mails and columns correspond to the terms given as dimensions. If objects are
categorized as patterns, or points in a d-dimensional metric space, the proximity measure can be Euclidean
distance between a pair of points. Unless a meaningful measure of distance or proximity, between a pair of
objects is established, no meaningful cluster analysis is possible. Clustering is useful in many applications like
decision making, data mining, text mining, machine learning, grouping, and pattern classification and intrusion
detection. Clustering has to be done as it helps in detecting outliners and to examine small size clusters [13]. The
proximity matrix is used in this context and thus serves as a useful input to the clustering algorithm. It represents
a cluster of n patterns by m points. Typically, m < n leading to data compression, can use centriod. This would
help in prototype selection for efficient classification. The clustering algorithms are applied to the training set
belonging to two different classes separately to obtain their correspondent cluster representatives. There are
different stages in clustering. Typical pattern clustering activity involves the following steps, viz..
a) Pattern representation (optionally including feature extraction and/or selection),
b) Definition of a pattern proximity measure appropriate to the data domain,
c) Clustering or grouping,
d) Data abstraction (if needed), and
e) Assessment of output (if needed).
1.1 Applications of text clustering
The technology is now broadly applied for a wide variety of government, research, and business needs.
Applications can be sorted into a number of categories by analysis type or by business function. Using this
approach to classifying solutions, application categories include: Enterprise Business, Intelligence Data Mining,
Competitive Intelligence, E-Discovery, Records Management, National Security Intelligence, Scientific
discovery, especially Life Sciences, Natural Language/Semantic Toolkit or Service, Automated ad placement,
Search/Information Access and Social media monitoring
A Novel approach for clustering textual information in various emails using text…
www.ijerd.com 2 | Page
II. LITERATURE ON CLUSTERING
Any email can be represented in terms of features with discrete values based on some statistics of the
presence or absence of words based on a vector space model. Thus e-mail data can be represented in their vector
form using the vector space model. Before implementing the vector space model for representing the data, it is
important that the data is pre-processed [8]. Clustering are of two types: Unsupervised and Supervised.
Computer algorithms for clustering are typically cast as fully automated, unsupervised learning algorithms; that
is, the algorithm is given only the collection of instances and the surface features that describe each, without any
information about the nature of the clusters. Recently, however, a variety of researchers have studied ways of
allowing a user to provide limited information to improve clustering quality. One approach is to allow the user
to provide cluster labels for some of the instances, indicating which cluster that in- stance belongs to. For
example, [11] [2] [8] use labels of this type to form initial cluster descriptions, which are then re-fined using
both the unlabeled and labeled instances. A second type of input information consists of pair-wise constraints
among instances. These constraints may assert that two documents must belong to the same cluster without
indicating which one it is, or may assert that two documents must belong to different clusters. Various
constraint-based methods and distance-based methods have been proposed to use this type of information. The
short [1] survey on different approaches and also for an approach to integrating distance-based and constraint
based approaches into a probabilistic framework. A third type of additional input involves background
knowledge to enrich the set of features that describe each instance. For example, [6] enriches their document
representation by using an ontology (WorldNet) as background knowledge. A fourth type of extra information,
which we are primarily interested in, is information about the key surface features for a particular class, or
cluster. For example paper [9] uses a few user-supplied keywords per class and a class hierarchy to generate
preliminary labels to build an initial text classifier for the class. The reference [10] proposes an interesting
technique in which they ask a user to identify interesting words among automatically selected representative
words for each class of documents, and then use these user-identified words to re-train the classifier as in [9].
This paper focuses on the classification of textual E-mails using data mining techniques. Some paper [7]
explains filter messages into spam and not spam, but still to divide spam messages into thematically similar
groups and to analyze them, in order to define the social networks of spammers.
In some of the paper, researcher has applied unsupervised word sense discrimination technique based
on clustering similar contexts [12] to the problems of name discrimination and email clustering. The existing
Email System, Email servers accept, forward, deliver and store messages. Most business workers today spend
from one to two hours of their working day on email: reading, ordering, sorting and re-contextualizing'
fragmented information. The Spam: Emails when used to send unsolicited messages and unwanted
advertisements create nuisance and are termed as spam.
Outcome of the Literature
E-mail data are becoming the dominant form for inter and intra organizational written communication
in many companies. Clustering is a technique of creating group of similar objects. In this paper the cluster
shows the similar emails exchanged between the users and it finds the text similarities to cluster. We are using
the Pattern i.e., the similar words exchanged between the users by considering the different Threshold values,
here Threshold value shows the frequency of the words used. With the increase use of email communication, it
became necessary to classify and categorize emails. Email Clustering provides the user to process emails based
on their contents. This system implements Porter Stemmer and K-Means Algorithm. This System will split
mails into specific clusters and emails with content similarity will be stored in same cluster. Reduces the time
and cost factors as user will no more will be filtering the emails by reading one-by-one. The emails will be
stored in clusters based on content for user to access them on their priority.
III. NEED OF NEW SYSTEM
o Preprocessing of Emails before actual storage.
o Email classification can be applied to several different applications, including filtering messages based
on priority, assigning messages to user-created folders, or identifying SPAM.
o To ease the work and Reduce Time and Cost.
o Mail Overflow: The amount of unwanted incoming mail can easily rise so much that it becomes an
annoyance.
IV. METHODOLOGY AND STEPS USED IN TEXT CLUSTERING
1) Get data set from user
2) Perform preprocessing
3) Apply Porter stemmer algorithm
4) Get Stemmed Emails
A Novel approach for clustering textual information in various emails using text…
www.ijerd.com 3 | Page
5) Calculate term frequency
6) Calculate index document Frequency
7) Calculate term weight
8) Apply vector Space model
9) Calculate the Centriod
10) Form the clusters and analyze Emails based on their contents K-means Algorithm
The K-means algorithm takes the input parameter k, and partitions a set of n objects into k clusters so
that the resulting intra cluster similarity is high but the inter-cluster similarity is low. The Kmeans algorithm
proceeds as follows:
a) Arbitrarily choose k objects as the initial cluster centers.
b) Repeat
c) (Reassign each object to the cluster to which the object is the most similar, based on the mean value of the
objects in the cluster.
d) Update the cluster means, i.e., calculate the mean value of the objects for each clusters.
e) Until no change.
Vector space model
Due to the large number of features (terms) in the training set, memory requirements will be more.
Arrays cannot be used to store the features as this leads to memory problems so we use a linked list to
implement the storage of features and the TF – IDF calculation [5]. As the training set contains large number of
documents, the documents are also implemented in the linked list format.
VI. DATA FLOW DIAGRAM OF TEXT CLUSTERING
A Novel approach for clustering textual information in various emails using text…
www.ijerd.com 4 | Page
Graphical User Interface
Figurers shows developed GUI of Email clustering example
Proposed Enhancements
1. Email filtering should be done on attachments.
2. More enhancements can be extended on any image, datasheet, etc.
3. This concept of clustering can be applied on mobile messages also.
VII. CONCLUSION
With the increased use of email communication, it became necessary to classify and categorize emails.
Email Clustering provides the user to process emails based on their contents. This system implements Porter
Stemmer and K-Means Algorithm. This System will split mails into specific clusters and emails with content
similarity will be stored in same cluster. Reduces the time and cost factors as user will no more will be filtering
the emails by reading one-by-one. The emails will be stored in clusters based on content, so user will access
them on their priority.
A Novel approach for clustering textual information in various emails using text…
www.ijerd.com 5 | Page
REFERENCES
[1]. S. Basu, M. Bilenko, and R. J. Mooney. A probabilistic framework for semi-supervised clustering. In
KDD-04, 2004.
[2]. A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the
Conference on Computational Learning Theory, 1998
[3]. Jain A.K., M.N. Murthy and P.J. Flynn, “Data Clustering : A Review,”ACM Computing Surveys,
1999.
[4]. Anagha Kulkarni and Ted Pedersen, “Name Discrimination and Email Clustering using Unsupervised
Clustering and Labeling of Similar Contexts”, 2nd Indian International Conference on Artificial
Intelligence (IICAI-05), pp. 703-722, 2005.
[5]. Porter. M, “An algorithm for suffix stripping”, Proc. Automated library Information systems, pp130-
137, 1980.
[6]. A. Hotho, S. Staab, and G. Stumme. Text clustering based on background knowledge. Technical
Report 425, University of Karlsruhe, Institute AIFB, 2003.
[7]. S. Nazirova, “Mechanism of classification of text spam messages collected in spam pattern bases,” in
Proceedings of the 3rd International Conference on Problems of Cybernetics and Informatics, (PCI
‟10), vol. 2, pp. 206–209, 2010.
[8]. T. Joachims. Transductive inference for text classification using support vector machines in ICML-99,
1999
[9]. R. Jones, A. McCallum, K. Nigam, and E. Rilo. Boot strapping for text learning tasks in IJCAI-99,
Workshop on Text Mining: Foundations, Techniques and Applications, 1999.
[10]. B. Liu, X. Li, W. S. Lee, and P. S. Yu. Text classification by labeling words. In AAAI-04, 2004.
[11]. K. Nigam, A. K. McCallum, S. Thrun, and T. M. Mitchell. Learning to classify text from labeled and
unlabeled documents in AAAI-98, 1998
[12]. Purandare A.and penersen T. Word sense discrimination by clustering contexts in vector and similarity
spaces. The proceedings of the conference on Computational Natural language Learning,PP41-48
boston, MA 2004.
[13]. Bryan Klimt and Yiming Yang, “The Enron Corpus: A New Dataset for Email classification
Research”, European Conference on Machine Learning, Pisa, Italy, 2004.

More Related Content

What's hot

Classification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithmClassification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithmeSAT Publishing House
 
Different Similarity Measures for Text Classification Using Knn
Different Similarity Measures for Text Classification Using KnnDifferent Similarity Measures for Text Classification Using Knn
Different Similarity Measures for Text Classification Using KnnIOSR Journals
 
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALijaia
 
Multi label text classification
Multi label text classificationMulti label text classification
Multi label text classificationraghavr186
 
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...IJERA Editor
 
Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Editor IJARCET
 
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERWITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERijnlc
 
A Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed ClusteringA Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed ClusteringIRJET Journal
 
Text Classification using Support Vector Machine
Text Classification using Support Vector MachineText Classification using Support Vector Machine
Text Classification using Support Vector Machineinventionjournals
 
Blei ngjordan2003
Blei ngjordan2003Blei ngjordan2003
Blei ngjordan2003Ajay Ohri
 
Data Mining in Multi-Instance and Multi-Represented Objects
Data Mining in Multi-Instance and Multi-Represented ObjectsData Mining in Multi-Instance and Multi-Represented Objects
Data Mining in Multi-Instance and Multi-Represented Objectsijsrd.com
 
Textual Data Partitioning with Relationship and Discriminative Analysis
Textual Data Partitioning with Relationship and Discriminative AnalysisTextual Data Partitioning with Relationship and Discriminative Analysis
Textual Data Partitioning with Relationship and Discriminative AnalysisEditor IJMTER
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureIOSR Journals
 
Text clustering
Text clusteringText clustering
Text clusteringKU Leuven
 
A Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia ArticlesA Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia Articlesijma
 

What's hot (20)

Classification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithmClassification of text data using feature clustering algorithm
Classification of text data using feature clustering algorithm
 
Different Similarity Measures for Text Classification Using Knn
Different Similarity Measures for Text Classification Using KnnDifferent Similarity Measures for Text Classification Using Knn
Different Similarity Measures for Text Classification Using Knn
 
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
 
Multi label text classification
Multi label text classificationMulti label text classification
Multi label text classification
 
Hc3612711275
Hc3612711275Hc3612711275
Hc3612711275
 
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...
Dynamic & Attribute Weighted KNN for Document Classification Using Bootstrap ...
 
Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973
 
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSERWITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
WITH SEMANTICS AND HIDDEN MARKOV MODELS TO AN ADAPTIVE LOG FILE PARSER
 
A Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed ClusteringA Competent and Empirical Model of Distributed Clustering
A Competent and Empirical Model of Distributed Clustering
 
Text Classification using Support Vector Machine
Text Classification using Support Vector MachineText Classification using Support Vector Machine
Text Classification using Support Vector Machine
 
Blei ngjordan2003
Blei ngjordan2003Blei ngjordan2003
Blei ngjordan2003
 
Data Mining in Multi-Instance and Multi-Represented Objects
Data Mining in Multi-Instance and Multi-Represented ObjectsData Mining in Multi-Instance and Multi-Represented Objects
Data Mining in Multi-Instance and Multi-Represented Objects
 
Textual Data Partitioning with Relationship and Discriminative Analysis
Textual Data Partitioning with Relationship and Discriminative AnalysisTextual Data Partitioning with Relationship and Discriminative Analysis
Textual Data Partitioning with Relationship and Discriminative Analysis
 
Clustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity MeasureClustering Algorithm with a Novel Similarity Measure
Clustering Algorithm with a Novel Similarity Measure
 
Text Classification in the Field of Search Engines
Text Classification in the Field of Search EnginesText Classification in the Field of Search Engines
Text Classification in the Field of Search Engines
 
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERINGAN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
AN IMPROVED TECHNIQUE FOR DOCUMENT CLUSTERING
 
Text clustering
Text clusteringText clustering
Text clustering
 
A Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia ArticlesA Document Exploring System on LDA Topic Model for Wikipedia Articles
A Document Exploring System on LDA Topic Model for Wikipedia Articles
 
K04302082087
K04302082087K04302082087
K04302082087
 
Spe165 t
Spe165 tSpe165 t
Spe165 t
 

Viewers also liked

ACP, uma forma muito especial de estar junto
ACP, uma forma muito especial de estar juntoACP, uma forma muito especial de estar junto
ACP, uma forma muito especial de estar juntoRodrigo Rezende
 
Rodizio Grill Dinner Packages
Rodizio Grill Dinner PackagesRodizio Grill Dinner Packages
Rodizio Grill Dinner PackagesEric Underwood
 
Bio Lunch & Inspiratie Voor en door Vrouwen
Bio Lunch & Inspiratie Voor en door VrouwenBio Lunch & Inspiratie Voor en door Vrouwen
Bio Lunch & Inspiratie Voor en door VrouwenMarjolein van den Bos
 
Leeann Collins Resume Oct 2016
Leeann Collins Resume Oct 2016Leeann Collins Resume Oct 2016
Leeann Collins Resume Oct 2016Leeann Collins
 

Viewers also liked (8)

ACP, uma forma muito especial de estar junto
ACP, uma forma muito especial de estar juntoACP, uma forma muito especial de estar junto
ACP, uma forma muito especial de estar junto
 
Rodizio Grill Dinner Packages
Rodizio Grill Dinner PackagesRodizio Grill Dinner Packages
Rodizio Grill Dinner Packages
 
Nike
NikeNike
Nike
 
Bio Lunch & Inspiratie Voor en door Vrouwen
Bio Lunch & Inspiratie Voor en door VrouwenBio Lunch & Inspiratie Voor en door Vrouwen
Bio Lunch & Inspiratie Voor en door Vrouwen
 
Recommendation Letter Manish Mohil
Recommendation Letter Manish MohilRecommendation Letter Manish Mohil
Recommendation Letter Manish Mohil
 
owensresume2016
owensresume2016owensresume2016
owensresume2016
 
Leeann Collins Resume Oct 2016
Leeann Collins Resume Oct 2016Leeann Collins Resume Oct 2016
Leeann Collins Resume Oct 2016
 
Analytical Intrumentation Certificate
Analytical Intrumentation CertificateAnalytical Intrumentation Certificate
Analytical Intrumentation Certificate
 

Similar to International Journal of Engineering Research and Development (IJERD)

Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Editor IJARCET
 
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringA Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringIJMER
 
IRJET- Multi Label Document Classification Approach using Machine Learning Te...
IRJET- Multi Label Document Classification Approach using Machine Learning Te...IRJET- Multi Label Document Classification Approach using Machine Learning Te...
IRJET- Multi Label Document Classification Approach using Machine Learning Te...IRJET Journal
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET Journal
 
A Deep Analysis on Prevailing Spam Mail Filteration Machine Learning Approaches
A Deep Analysis on Prevailing Spam Mail Filteration Machine Learning ApproachesA Deep Analysis on Prevailing Spam Mail Filteration Machine Learning Approaches
A Deep Analysis on Prevailing Spam Mail Filteration Machine Learning Approachesijtsrd
 
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGA SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGijcsa
 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSijseajournal
 
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION cscpconf
 
Bs31267274
Bs31267274Bs31267274
Bs31267274IJMER
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...IJCSIS Research Publications
 
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...ijdmtaiir
 
A Novel Clustering Method for Similarity Measuring in Text Documents
A Novel Clustering Method for Similarity Measuring in Text DocumentsA Novel Clustering Method for Similarity Measuring in Text Documents
A Novel Clustering Method for Similarity Measuring in Text DocumentsIJMER
 
Paper id 37201536
Paper id 37201536Paper id 37201536
Paper id 37201536IJRAT
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET Journal
 
Iaetsd hierarchical fuzzy rule based classification
Iaetsd hierarchical fuzzy rule based classificationIaetsd hierarchical fuzzy rule based classification
Iaetsd hierarchical fuzzy rule based classificationIaetsd Iaetsd
 

Similar to International Journal of Engineering Research and Development (IJERD) (20)

Bl24409420
Bl24409420Bl24409420
Bl24409420
 
Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973Volume 2-issue-6-1969-1973
Volume 2-issue-6-1969-1973
 
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document ClusteringA Novel Multi- Viewpoint based Similarity Measure for Document Clustering
A Novel Multi- Viewpoint based Similarity Measure for Document Clustering
 
IRJET- Multi Label Document Classification Approach using Machine Learning Te...
IRJET- Multi Label Document Classification Approach using Machine Learning Te...IRJET- Multi Label Document Classification Approach using Machine Learning Te...
IRJET- Multi Label Document Classification Approach using Machine Learning Te...
 
E1062530
E1062530E1062530
E1062530
 
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
IRJET- Cluster Analysis for Effective Information Retrieval through Cohesive ...
 
A Deep Analysis on Prevailing Spam Mail Filteration Machine Learning Approaches
A Deep Analysis on Prevailing Spam Mail Filteration Machine Learning ApproachesA Deep Analysis on Prevailing Spam Mail Filteration Machine Learning Approaches
A Deep Analysis on Prevailing Spam Mail Filteration Machine Learning Approaches
 
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGA SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING
 
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANSCONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS
 
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION
 
Bs31267274
Bs31267274Bs31267274
Bs31267274
 
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
Improved K-mean Clustering Algorithm for Prediction Analysis using Classifica...
 
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
Scaling Down Dimensions and Feature Extraction in Document Repository Classif...
 
A Novel Clustering Method for Similarity Measuring in Text Documents
A Novel Clustering Method for Similarity Measuring in Text DocumentsA Novel Clustering Method for Similarity Measuring in Text Documents
A Novel Clustering Method for Similarity Measuring in Text Documents
 
Lx3520322036
Lx3520322036Lx3520322036
Lx3520322036
 
Paper id 37201536
Paper id 37201536Paper id 37201536
Paper id 37201536
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
 
Iaetsd hierarchical fuzzy rule based classification
Iaetsd hierarchical fuzzy rule based classificationIaetsd hierarchical fuzzy rule based classification
Iaetsd hierarchical fuzzy rule based classification
 
600 608
600 608600 608
600 608
 

More from IJERD Editor

A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksA Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksIJERD Editor
 
MEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACEMEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACEIJERD Editor
 
Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...IJERD Editor
 
Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’IJERD Editor
 
Reducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding DesignReducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding DesignIJERD Editor
 
Router 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and VerificationRouter 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and VerificationIJERD Editor
 
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...IJERD Editor
 
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRMitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRIJERD Editor
 
Study on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive ManufacturingStudy on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive ManufacturingIJERD Editor
 
Spyware triggering system by particular string value
Spyware triggering system by particular string valueSpyware triggering system by particular string value
Spyware triggering system by particular string valueIJERD Editor
 
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...IJERD Editor
 
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeSecure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeIJERD Editor
 
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...IJERD Editor
 
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraGesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraIJERD Editor
 
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...IJERD Editor
 
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...IJERD Editor
 
Moon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF DxingMoon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF DxingIJERD Editor
 
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...IJERD Editor
 
Importance of Measurements in Smart Grid
Importance of Measurements in Smart GridImportance of Measurements in Smart Grid
Importance of Measurements in Smart GridIJERD Editor
 
Study of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powderStudy of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powderIJERD Editor
 

More from IJERD Editor (20)

A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksA Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
 
MEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACEMEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACE
 
Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...
 
Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’
 
Reducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding DesignReducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding Design
 
Router 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and VerificationRouter 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and Verification
 
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
 
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRMitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
 
Study on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive ManufacturingStudy on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive Manufacturing
 
Spyware triggering system by particular string value
Spyware triggering system by particular string valueSpyware triggering system by particular string value
Spyware triggering system by particular string value
 
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
 
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeSecure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme
 
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
 
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraGesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera
 
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
 
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
 
Moon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF DxingMoon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF Dxing
 
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
 
Importance of Measurements in Smart Grid
Importance of Measurements in Smart GridImportance of Measurements in Smart Grid
Importance of Measurements in Smart Grid
 
Study of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powderStudy of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powder
 

Recently uploaded

AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarPrecisely
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaborationbruanjhuli
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfJamie (Taka) Wang
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationIES VE
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 

Recently uploaded (20)

AI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity WebinarAI You Can Trust - Ensuring Success with Data Integrity Webinar
AI You Can Trust - Ensuring Success with Data Integrity Webinar
 
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online CollaborationCOMPUTER 10: Lesson 7 - File Storage and Online Collaboration
COMPUTER 10: Lesson 7 - File Storage and Online Collaboration
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
activity_diagram_combine_v4_20190827.pdfactivity_diagram_combine_v4_20190827.pdf
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve DecarbonizationUsing IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
Using IESVE for Loads, Sizing and Heat Pump Modeling to Achieve Decarbonization
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
201610817 - edge part1
201610817 - edge part1201610817 - edge part1
201610817 - edge part1
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 

International Journal of Engineering Research and Development (IJERD)

  • 1. International Journal of Engineering Research and Development e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com Volume 8, Issue 8 (September 2013), PP. 01-05 www.ijerd.com 1 | Page A Novel approach for clustering textual information in various emails using text data mining technique Prof. Anjana. R. Arakerimath Associate Professor, MCA Department Pimpri Chinchwad College of Engineering, Akurdi, Nigdi, Pune. India. Abstract:- Clustering is the technique used for data reduction. It divides the data into groups based on pattern similarities such that each group is abstracted by one or more representatives. Recently, there is a growing emphasis on exploratory analysis of very large datasets to discover useful patterns. This paper explains extracting the useful knowledge represented by clusters from textual information contained in a large number of emails for text and data mining techniques. E-mail data that are now becoming the dominant form of inter and intra organizational written communication for many companies. The sample texts of two mails are verified for data clustering. The cluster shows the similar emails exchanged between the users and finding the text similarities to cluster the texts. In this paper the use of Pattern similarities i.e., the similar words exchanged between the users by considering the different Threshold values are made for the purpose. The threshold value shows the frequency of the words used. The representation of data is done using a vector space model. I. INTRODUCTION E-mail is an effective, fast and cheap communication way. Clustering is a data mining (machine learning) technique used to place data elements into related groups without advance knowledge of the group definitions. Popular clustering techniques include k-means clustering and expectation maximization (EM) clustering. Text clustering divides a set of texts into clusters (parts), so that texts within each cluster are similar in content. A text clustering algorithm partitions a set of texts so that texts within the same group are as similar in content as possible. It is done without using any predifined categories. Text clustering can be applied to the documents retrieved by a search engine, so that they can be presented in groups according to content. Clustering is the process of partitioning or dividing a set of patterns (data) into groups. Each cluster is abstracted using one or more representatives. Representing the data by fewer clusters necessarily loses certain fine details, but achieves simplification. It models data by its clusters. Clustering is a type of classification imposed on finite set of objects [3]. The relationship between objects is represented in a proximity matrix in which the rows represent „n‟ e-mails and columns correspond to the terms given as dimensions. If objects are categorized as patterns, or points in a d-dimensional metric space, the proximity measure can be Euclidean distance between a pair of points. Unless a meaningful measure of distance or proximity, between a pair of objects is established, no meaningful cluster analysis is possible. Clustering is useful in many applications like decision making, data mining, text mining, machine learning, grouping, and pattern classification and intrusion detection. Clustering has to be done as it helps in detecting outliners and to examine small size clusters [13]. The proximity matrix is used in this context and thus serves as a useful input to the clustering algorithm. It represents a cluster of n patterns by m points. Typically, m < n leading to data compression, can use centriod. This would help in prototype selection for efficient classification. The clustering algorithms are applied to the training set belonging to two different classes separately to obtain their correspondent cluster representatives. There are different stages in clustering. Typical pattern clustering activity involves the following steps, viz.. a) Pattern representation (optionally including feature extraction and/or selection), b) Definition of a pattern proximity measure appropriate to the data domain, c) Clustering or grouping, d) Data abstraction (if needed), and e) Assessment of output (if needed). 1.1 Applications of text clustering The technology is now broadly applied for a wide variety of government, research, and business needs. Applications can be sorted into a number of categories by analysis type or by business function. Using this approach to classifying solutions, application categories include: Enterprise Business, Intelligence Data Mining, Competitive Intelligence, E-Discovery, Records Management, National Security Intelligence, Scientific discovery, especially Life Sciences, Natural Language/Semantic Toolkit or Service, Automated ad placement, Search/Information Access and Social media monitoring
  • 2. A Novel approach for clustering textual information in various emails using text… www.ijerd.com 2 | Page II. LITERATURE ON CLUSTERING Any email can be represented in terms of features with discrete values based on some statistics of the presence or absence of words based on a vector space model. Thus e-mail data can be represented in their vector form using the vector space model. Before implementing the vector space model for representing the data, it is important that the data is pre-processed [8]. Clustering are of two types: Unsupervised and Supervised. Computer algorithms for clustering are typically cast as fully automated, unsupervised learning algorithms; that is, the algorithm is given only the collection of instances and the surface features that describe each, without any information about the nature of the clusters. Recently, however, a variety of researchers have studied ways of allowing a user to provide limited information to improve clustering quality. One approach is to allow the user to provide cluster labels for some of the instances, indicating which cluster that in- stance belongs to. For example, [11] [2] [8] use labels of this type to form initial cluster descriptions, which are then re-fined using both the unlabeled and labeled instances. A second type of input information consists of pair-wise constraints among instances. These constraints may assert that two documents must belong to the same cluster without indicating which one it is, or may assert that two documents must belong to different clusters. Various constraint-based methods and distance-based methods have been proposed to use this type of information. The short [1] survey on different approaches and also for an approach to integrating distance-based and constraint based approaches into a probabilistic framework. A third type of additional input involves background knowledge to enrich the set of features that describe each instance. For example, [6] enriches their document representation by using an ontology (WorldNet) as background knowledge. A fourth type of extra information, which we are primarily interested in, is information about the key surface features for a particular class, or cluster. For example paper [9] uses a few user-supplied keywords per class and a class hierarchy to generate preliminary labels to build an initial text classifier for the class. The reference [10] proposes an interesting technique in which they ask a user to identify interesting words among automatically selected representative words for each class of documents, and then use these user-identified words to re-train the classifier as in [9]. This paper focuses on the classification of textual E-mails using data mining techniques. Some paper [7] explains filter messages into spam and not spam, but still to divide spam messages into thematically similar groups and to analyze them, in order to define the social networks of spammers. In some of the paper, researcher has applied unsupervised word sense discrimination technique based on clustering similar contexts [12] to the problems of name discrimination and email clustering. The existing Email System, Email servers accept, forward, deliver and store messages. Most business workers today spend from one to two hours of their working day on email: reading, ordering, sorting and re-contextualizing' fragmented information. The Spam: Emails when used to send unsolicited messages and unwanted advertisements create nuisance and are termed as spam. Outcome of the Literature E-mail data are becoming the dominant form for inter and intra organizational written communication in many companies. Clustering is a technique of creating group of similar objects. In this paper the cluster shows the similar emails exchanged between the users and it finds the text similarities to cluster. We are using the Pattern i.e., the similar words exchanged between the users by considering the different Threshold values, here Threshold value shows the frequency of the words used. With the increase use of email communication, it became necessary to classify and categorize emails. Email Clustering provides the user to process emails based on their contents. This system implements Porter Stemmer and K-Means Algorithm. This System will split mails into specific clusters and emails with content similarity will be stored in same cluster. Reduces the time and cost factors as user will no more will be filtering the emails by reading one-by-one. The emails will be stored in clusters based on content for user to access them on their priority. III. NEED OF NEW SYSTEM o Preprocessing of Emails before actual storage. o Email classification can be applied to several different applications, including filtering messages based on priority, assigning messages to user-created folders, or identifying SPAM. o To ease the work and Reduce Time and Cost. o Mail Overflow: The amount of unwanted incoming mail can easily rise so much that it becomes an annoyance. IV. METHODOLOGY AND STEPS USED IN TEXT CLUSTERING 1) Get data set from user 2) Perform preprocessing 3) Apply Porter stemmer algorithm 4) Get Stemmed Emails
  • 3. A Novel approach for clustering textual information in various emails using text… www.ijerd.com 3 | Page 5) Calculate term frequency 6) Calculate index document Frequency 7) Calculate term weight 8) Apply vector Space model 9) Calculate the Centriod 10) Form the clusters and analyze Emails based on their contents K-means Algorithm The K-means algorithm takes the input parameter k, and partitions a set of n objects into k clusters so that the resulting intra cluster similarity is high but the inter-cluster similarity is low. The Kmeans algorithm proceeds as follows: a) Arbitrarily choose k objects as the initial cluster centers. b) Repeat c) (Reassign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster. d) Update the cluster means, i.e., calculate the mean value of the objects for each clusters. e) Until no change. Vector space model Due to the large number of features (terms) in the training set, memory requirements will be more. Arrays cannot be used to store the features as this leads to memory problems so we use a linked list to implement the storage of features and the TF – IDF calculation [5]. As the training set contains large number of documents, the documents are also implemented in the linked list format. VI. DATA FLOW DIAGRAM OF TEXT CLUSTERING
  • 4. A Novel approach for clustering textual information in various emails using text… www.ijerd.com 4 | Page Graphical User Interface Figurers shows developed GUI of Email clustering example Proposed Enhancements 1. Email filtering should be done on attachments. 2. More enhancements can be extended on any image, datasheet, etc. 3. This concept of clustering can be applied on mobile messages also. VII. CONCLUSION With the increased use of email communication, it became necessary to classify and categorize emails. Email Clustering provides the user to process emails based on their contents. This system implements Porter Stemmer and K-Means Algorithm. This System will split mails into specific clusters and emails with content similarity will be stored in same cluster. Reduces the time and cost factors as user will no more will be filtering the emails by reading one-by-one. The emails will be stored in clusters based on content, so user will access them on their priority.
  • 5. A Novel approach for clustering textual information in various emails using text… www.ijerd.com 5 | Page REFERENCES [1]. S. Basu, M. Bilenko, and R. J. Mooney. A probabilistic framework for semi-supervised clustering. In KDD-04, 2004. [2]. A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Conference on Computational Learning Theory, 1998 [3]. Jain A.K., M.N. Murthy and P.J. Flynn, “Data Clustering : A Review,”ACM Computing Surveys, 1999. [4]. Anagha Kulkarni and Ted Pedersen, “Name Discrimination and Email Clustering using Unsupervised Clustering and Labeling of Similar Contexts”, 2nd Indian International Conference on Artificial Intelligence (IICAI-05), pp. 703-722, 2005. [5]. Porter. M, “An algorithm for suffix stripping”, Proc. Automated library Information systems, pp130- 137, 1980. [6]. A. Hotho, S. Staab, and G. Stumme. Text clustering based on background knowledge. Technical Report 425, University of Karlsruhe, Institute AIFB, 2003. [7]. S. Nazirova, “Mechanism of classification of text spam messages collected in spam pattern bases,” in Proceedings of the 3rd International Conference on Problems of Cybernetics and Informatics, (PCI ‟10), vol. 2, pp. 206–209, 2010. [8]. T. Joachims. Transductive inference for text classification using support vector machines in ICML-99, 1999 [9]. R. Jones, A. McCallum, K. Nigam, and E. Rilo. Boot strapping for text learning tasks in IJCAI-99, Workshop on Text Mining: Foundations, Techniques and Applications, 1999. [10]. B. Liu, X. Li, W. S. Lee, and P. S. Yu. Text classification by labeling words. In AAAI-04, 2004. [11]. K. Nigam, A. K. McCallum, S. Thrun, and T. M. Mitchell. Learning to classify text from labeled and unlabeled documents in AAAI-98, 1998 [12]. Purandare A.and penersen T. Word sense discrimination by clustering contexts in vector and similarity spaces. The proceedings of the conference on Computational Natural language Learning,PP41-48 boston, MA 2004. [13]. Bryan Klimt and Yiming Yang, “The Enron Corpus: A New Dataset for Email classification Research”, European Conference on Machine Learning, Pisa, Italy, 2004.