This document proposes a new multi-viewpoint based similarity measure for clustering text documents that aims to overcome limitations of existing measures. Existing measures use a single viewpoint to measure similarity between documents, but the proposed measure uses multiple viewpoints to ensure clusters exhibit all relationships between documents. The empirical study found that using a multi-viewpoint similarity measure forms more meaningful clusters by capturing more informative relationships between documents.
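The contrast between a single viewpoint and multiple viewpoints can be illustrated with a small sketch. The summary above does not give the formula, so this assumes one common formulation: the similarity of two documents is the average, over a set of viewpoint documents, of the dot product of their difference vectors from each viewpoint. The function name `multi_viewpoint_similarity` and the plain-list vector representation are hypothetical choices made for illustration only.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def multi_viewpoint_similarity(d_i, d_j, viewpoints):
    """Average over each viewpoint d_h of (d_i - d_h) . (d_j - d_h).

    Assumption: this averaging form is an illustrative reading of
    "multiple viewpoints", not a formula quoted from the summary.
    """
    if not viewpoints:
        raise ValueError("at least one viewpoint is required")
    total = 0.0
    for d_h in viewpoints:
        u = [x - h for x, h in zip(d_i, d_h)]  # d_i seen from d_h
        v = [x - h for x, h in zip(d_j, d_h)]  # d_j seen from d_h
        total += dot(u, v)
    return total / len(viewpoints)
```

With the origin as the only viewpoint, the value reduces to the ordinary dot product, which is the single-viewpoint case; adding further viewpoints changes the score according to how the two documents relate to the rest of the collection.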
A Novel Multi-Viewpoint based Similarity Measure for Document Clustering (IJMER)
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
International Journal of Modern Engineering Research (IJMER) covers all the fields of engineering and science: Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, Assessment, and many more.
Textual Data Partitioning with Relationship and Discriminative Analysis (Editor IJMTER)
Data partitioning methods are used to partition data values by similarity. Similarity
measures are used to estimate transaction relationships. Hierarchical clustering models produce
tree-structured results. Partitioned clustering produces results in grid format. Text documents are
unstructured data values with high-dimensional attributes. Document clustering groups unlabeled text
documents into meaningful clusters. Traditional clustering methods require a cluster count (K) for the
document grouping process. Clustering accuracy degrades drastically when the cluster count is
unsuitable.
Textual data elements are divided into two types: discriminative words and nondiscriminative
words. Only discriminative words are useful for grouping documents. The involvement of
nondiscriminative words confuses the clustering process and leads to poor clustering solutions.
A variational inference algorithm is used to infer the document collection structure and the partition of
document words at the same time. The Dirichlet Process Mixture (DPM) model is used to partition
documents. The DPM clustering model uses both the data likelihood and the clustering property of the
Dirichlet Process (DP). The Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to
discover the latent cluster structure based on the DPM model. DPMFP clustering is performed without
requiring the number of clusters as input.
Document labels are used to support the discriminative word identification process. Concept
relationships are analyzed with ontology support. A semantic weight model is used for the document
similarity analysis. The system improves scalability by using labels and concept relations
for the dimensionality reduction process.
A Novel Clustering Method for Similarity Measuring in Text Documents (IJMER)
International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
A survey on Efficient Enhanced K-Means Clustering Algorithm (ijsrd.com)
Data mining is the process of using technology to identify patterns and prospects in large amounts of information. In data mining, clustering is an important research topic with a wide range of unsupervised classification applications. Clustering is a technique that divides data into meaningful groups. K-means clustering is a method of cluster analysis that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. In this paper, we present a comparison of different K-means clustering algorithms.
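The nearest-mean rule in the abstract above can be sketched as the standard iterative procedure (often called Lloyd's algorithm): assign each observation to its nearest mean, recompute the means, and repeat until the assignments stop changing. This is a minimal sketch, not one of the enhanced variants the survey compares; the function name `k_means`, the caller-supplied initial centroids, and the `max_iterations` parameter are assumptions made for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, centroids, max_iterations=100):
    """Basic k-means: returns (assignments, centroids).

    Assumption: initial centroids are supplied by the caller, so the
    result is deterministic for a given input.
    """
    centroids = [list(c) for c in centroids]
    assignments = None
    for _ in range(max_iterations):
        # Assignment step: each point goes to the cluster with the nearest mean.
        new_assignments = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: each mean becomes the average of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids
```

Because the result depends on the starting centroids, running the sketch with different initial values can yield different partitions of the same data.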
International Journal of Engineering and Science Invention (IJESI) (inventionjournals)
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field of Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Distribution Similarity based Data Partition and Nearest Neighbor Search on U... (Editor IJMTER)
Databases are built with a fixed number of fields and records. An uncertain database contains a
varying number of fields and records. Clustering techniques are used to group the relevant records
based on similarity values. The similarity measures are designed to estimate the relationship between
transactions with fixed attributes. The uncertain data similarity is estimated using similarity
measures with some modifications.
Clustering uncertain data is one of the essential tasks in mining uncertain data. The existing
methods extend traditional partitioning clustering methods like k-means and density-based clustering
methods like DBSCAN to uncertain data. Such methods cannot handle uncertain objects. Probability
distributions, which are essential characteristics of uncertain objects, have not been considered in measuring
similarity between uncertain objects.
Customer purchase transaction data is analyzed using an uncertain data clustering scheme. The
density-based clustering mechanism is used for the uncertain data clustering process. This model
produces results with minimum accuracy levels. The clustering technique is improved with a
distribution-based similarity model for uncertain data. The nearest neighbor search technique is applied to the
distribution-based data environment. The system is designed using Java as the front end and Oracle as the
back end.
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT... (IJDKP)
Many applications of automatic document classification require learning accurately with little training
data. The semi-supervised classification technique uses labeled and unlabeled data for training. This
technique has been shown to be effective in some cases; however, the use of unlabeled data is not always
beneficial.
On the other hand, the emergence of web technologies has given rise to the collaborative development of
ontologies. In this paper, we propose the use of ontologies in order to improve the accuracy and efficiency
of semi-supervised document classification.
We used support vector machines, which are among the most effective algorithms that have been studied for
text. Our algorithm enhances the performance of transductive support vector machines through the use of
ontologies. We report experimental results applying our algorithm to three different datasets. Our
experiments show an accuracy increase of 4% on average and up to 20%, in comparison with the
traditional semi-supervised model.
Ontology Based Document Clustering Using MapReduce (ijdms)
Nowadays, document clustering is considered as a data intensive task due to the dramatic, fast increase in the number of available documents. Nevertheless, the features that represent those documents are also too large. The most common method for representing documents is the vector space model, which represents document features as a bag of words and does not represent semantic relations between words. In this paper we introduce a distributed implementation for the bisecting k-means using MapReduce programming model. The aim behind our proposed implementation is to solve the problem of clustering intensive data documents. In addition, we propose integrating the WordNet ontology with bisecting k-means in order to utilize the semantic relations between words to enhance document clustering results. Our presented experimental results show that using lexical categories for nouns only enhances internal evaluation measures of document clustering; and decreases the documents features from thousands to tens features. Our experiments were conducted using Amazon ElasticMapReduce to deploy the Bisecting k-means algorithm.
Particle Swarm Optimization based K-Prototype Clustering Algorithm (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Characterizing Erythrophleum Suaveolens Charcoal as a Viable Alternative Fuel... (IOSR Journals)
An experimental study was conducted to characterize Erythrophleum suaveolens (Gwaska) charcoal. The tests comprised proximate analysis (determination of moisture content, ash, volatile matter and fixed carbon) and ultimate analysis (determination of carbon, hydrogen, oxygen, nitrogen, sulphur and calorific value) of the charcoal. The determined values of moisture, ash, volatile matter and fixed carbon were 0.94%, 6.13%, 6.77% and 86.16% respectively. The determined values of carbon, hydrogen, oxygen, nitrogen, sulphur and calorific value were 77.5%, 9%, 5.48%, 1.89%, 0.003% and 7158.6995 kcal/kg respectively. Therefore, the Gwaska charcoal satisfies the blast furnace requirements for moisture, ash and sulphur in Nigeria. However, its volatile matter exceeds the specified limit except for Indian standard practice. The thermal properties of Erythrophleum suaveolens charcoal showed that it could compete favourably with coke and can therefore be an excellent reducing fuel for the production of iron.
IOSR Journal of Applied Physics (IOSR-JAP) is an open access international journal that provides rapid publication (within a month) of articles in all areas of physics and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in applied physics. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Experimental Investigation on Use of Methyl Ester Kusum Oil and Its Blends In... (IOSR Journals)
Research on alternative fuels for compression ignition engines has become essential due to the depletion of petroleum products, higher oil prices and their major contribution to pollutants, and vegetable oil is a promising alternative fuel. Vegetable oils, due to their agricultural origin, are able to reduce net CO2 emissions to the atmosphere. In the present paper, research efforts are directed towards improving the performance of a C.I. engine using vegetable oil (methyl ester kusum oil) as a fuel. The paper deals with the performance results of a single cylinder, four stroke, C.I. engine using kusum oil methyl ester and its blends with diesel. The performance of the engine was studied at constant speed, with the engine operated at various loading conditions. The performance parameters considered for comparison are brake specific fuel consumption, thermal efficiency, brake power, exhaust gas temperature, smoke density, and part load and peak load performance of the engine. The engine offers an increase in thermal efficiency when it is powered by kusum oil and its blends at various loads. The power developed and the exhaust gas temperature increase with increasing load, and the specific fuel consumption is higher than with diesel fuel.
Design of a Dual-Band Microstrip Patch Antenna for GPS, WiMAX and WLAN (IOSR Journals)
A multi-band microstrip patch antenna has been designed for GPS, WiMAX and WLAN
applications. The proposed antenna is designed using an RT duroid substrate with a permittivity of about 2.2
and a loss tangent of 1. The substrate has a thickness of 6 mm, on which a trapezoidal patch antenna with a V slot
is introduced in this paper. Design results such as the S11 return loss, VSWR and field pattern
are plotted. The obtained result shows a two-band resonance with S11 less than -10 dB and VSWR
less than 2.
Thus a dual-band trapezoidal microstrip patch antenna has been designed and all results are plotted. The simulation
software used is IE3D.
A study of the chemical composition and the biological active components of N... (IOSR Journals)
Nigella sativa (N.S.) is an annual herbaceous plant from the Ranunculaceae family producing small black seeds with an aromatic odor and taste. Fenugreek (Trigonella foenum-graecum L.) belongs to the subfamily Papilionaceae of the family Leguminosae (bean family, Fabaceae). The plant is an aromatic herbaceous annual, widely cultivated in Mediterranean countries and Asia.
Aim: To extract and study the biologically active components of the fixed oils of N. sativa and fenugreek seeds.
Materials and methods: Fixed oils of the N.S. and F.S. seeds were extracted and characterized using infrared spectroscopic techniques (Tensor 27, Bruker). A biological activity test was applied to the bacteria (Bacillus pumilus, E. coli, and Pseudo M.).
Results: Both studied fixed oils showed identical antimicrobial activity.
Conclusion: This study showed an identical similarity between the active biological components of both studied materials (N. sativa and fenugreek seeds) in spite of their different botanical origin, leading to a matched biological activity. This finding may be useful for substituting one kind of seed for the other, according to availability, when applying these seeds for their known therapeutic uses.
Developing Design Cracker Anacardium occidentale (mente) for Home Industrial (IOSR Journals)
The purpose of this research was to design a cutting peeler for cashew (mente) nuts and to assess the
quality and quantity of the cracked product. The evaluation method consisted of designing and testing several
peeler models, testing cracking effectiveness, and testing the quality and quantity of the cracked nuts. By
developing the cashew cracker design, we obtained a cracker suitable for home industry. The cracker uses a
cutting knife whose elliptical shape follows the top of the cashew nut. The effective length giving the
maximum cracking quality is 8 mm, for an average cashew width of 25-27 mm. With the resulting knife design,
the cracker produces graded-quality cashew at 79.25% per 1 kg, with a peeling time of 26 minutes 55 seconds.
Keywords: Peeling mente.
Network Lifetime Analysis of Routing Protocols of Short Network in Qualnet (IOSR Journals)
A Mobile Ad-Hoc Network (MANET) is a collection of wireless mobile nodes that communicate with
each other without using any existing infrastructure, access point or centralized administration. Mobile ad-hoc
networks have attributes such as wireless connection, continuously changing topology, distributed operation
and ease of deployment. In this paper we have compared the energy consumption of the reactive, proactive and
hybrid routing protocols AODV, DSR, RIP and ZRP using different mobility models. We have analyzed the
network lifetime of the protocols by varying payload, mobility, pause time and type of traffic (CBR). A detailed
simulation has been carried out in QualNet. The metrics used for performance analysis are energy consumed &
battery consumption. It has been observed that RIP has better network lifetime than other
Spatial Correlation Based Medium Access Control Protocol Using DSR & AODV Rou... (IOSR Journals)
Abstract: In a wireless sensor network, sensor nodes have a limited battery life, and using it efficiently is
a very important task. Many approaches have been proposed for efficient utilization of energy, including
topologies and protocols that help maximize battery
life. In this paper we propose a method in which a correlation is made between all the sensor nodes, including
the ME (mobile element). A vector quantization method is used for distance calculation between all the sensor
nodes and the mobile element. After finding the correlation, we use the DSR and AODV routing protocols. The
performance of the proposed protocol has been examined and evaluated with the NS-2 simulator in terms of
packet drop ratio and energy consumption. The simulation results show that the proposed protocol with AODV
routing gives a better result than the same protocol with DSR routing.
Keywords: ME, DVT, DSR, AODV, Wireless Sensor Network, Efficient Energy Utilization
Congruence Lattices of Isoform Lattices (IOSR Journals)
A congruence of a lattice L is said to be isoform if any two of its congruence classes are isomorphic as lattices. The lattice L is said to be isoform if all congruences of L are isoform. We prove that every finite distributive lattice D can be represented as the congruence lattice of a finite isoform lattice.
Simple and Effective Method of the Synthesis of Nanosized Fe2O3 particles (IOSR Journals)
Abstract: Nanosized iron oxide is prepared by the precipitation method from iron nitrate and liquid ammonia. Thermal analysis shows that the synthesized iron oxide undergoes some weight loss, indicating decomposition, dehydration or another physical change. From the TGA curve we observe that the iron oxide shows stable weight loss above 400 °C. The DTA curve also contains exothermic and endothermic peaks, which indicate a phase transition, solid state reaction or other chemical reaction occurring during heat treatment. Morphology observed by scanning electron microscopy (SEM) shows that the particles are nanosized. Further morphology observation by transmission electron microscopy (TEM) reveals that the iron oxide has the corundum (Al2O3) structure. Magnetic measurements show that the iron oxide has five unpaired electrons and strongly paramagnetic character.
Evaluation of Uptake of Methylene blue dye by Sulphonated biomass of Cicer ar... - IOSR Journals
The uptake of methylene blue by sulphonated biomass of Cicer arietinum is studied in batch mode. The effects of parameters such as contact time, sorbent dose, pH and temperature have been studied. The value of Kp is found to be 0.1928 and 0.8727 for initial and final concentrations respectively. The kinetics of biosorption indicate that the sorption process follows a pseudo-second-order model, with determination coefficients greater than 0.988 for the sorbent under all experimental conditions. The thermodynamic parameters KD and ΔG are calculated; the rise in KD and the negative ΔG values indicate the spontaneity of the process and show that sorption is time, temperature and concentration dependent. The adsorption obeys the Langmuir isotherm; Hall separation factor values less than unity and the low value of activation energy indicate that sorption is an activated and favorable physical process. The sorption mechanism, which includes liquid-film mass transfer, is well described by the Weber and Morris intraparticle diffusion model. Thus sulphonated biomass of Cicer arietinum (S-III) is a low-cost and easily available sorbent for the removal of MB+ from wastewater.
Shrinkage of Polyester Fibre in Selected Chlorinated Solvents and Effects on ... - IOSR Journals
Polyester fibres were isothermally treated with four chlorinated solvents; perchloroethylene (PCE), trichloroethylene (TCE), 1,1-dichloroethylene (1,1-DCE) and tetrachloromethane (TCM). Measurement of the longitudinal shrinkage of the treated fibres was carried out at room temperature for 30, 60, 150, 300, 450, 600, 750, 900 and 1800 seconds that was found to be sufficient to establish dynamic equilibrium conditions for each of the solvents. From the results, a trend of 1,1-DCE > PCE > TCE > TCM was observed for the shrinkage values which showed that the solvents exhibited behaviour that cannot be explained in terms of the variations in their boiling points and molecular weights values. Solubility parameter values (δ) of the solvents were however, found to be the overriding factor as it followed the above trend. The treatment has been able to provide a means of improving polyester fibre structure to suit its use in commercial applications and also revealed that the best among the four solvents in term of effecting minimal change on the structure and quality of the fabric during laundry will be TCM.
FPGA Based Implementation of Electronic Safe Lock - IOSR Journals
This paper is based on the design of an “Automatic Security System Using VHDL” that provides an
understandable and adequate operating procedure to the user. The operation is carried out by six different
modules. If any module fails, it can be replaced without affecting the activity of the others.
Safety is ensured by setting a secret code that is a combination of three numbers, so that
only authorized users can unlock the safe. The paper finds application in big organizations and in the
military and banking sectors. Simulation through VHDL is economical due to the reduction in the
number of components. An important operational consideration is not to give any indication to the user that the
combination entered is incorrect until the user has entered all three numbers and pressed the OPEN
key. Otherwise, it is possible for a user to determine the combination in no more than 96 attempts, as opposed to
no more than 32,768 attempts.
Enhance the Throughput of Wireless Network Using Multicast Routing - IOSR Journals
Wireless mesh networks are designed for static or limited-mobility environments. Multicast routing for
wireless mesh networks has focused on metrics that estimate link quality to maximize throughput
and to provide secure communication. Nodes must collaborate in order to compute the path metric and
forward data. We identify novel attacks against high-throughput multicast protocols in wireless
mesh networks. The attacks exploit the local estimation and global aggregation of the metric to allow
attackers to attract a large amount of traffic. These attacks are very effective against protocols based on
high-throughput metrics. Aggressive path selection is a double-edged sword: while it maximizes throughput,
it also increases attack effectiveness. A Rate Guard mechanism is therefore used, which
combines measurement-based detection with accusation-based reaction techniques. The attacks
and the defense are studied using ODMRP, a representative multicast protocol for wireless mesh networks, and
SPP, an adaptation of the well-known ETX unicast metric to the multicast setting.
Effects of Psychological Training on Mental Skills with Female Basketball Pla... - IOSR Journals
Abstract: The purpose of this study was to examine the effect of a psychological skills training program on
the psychological skills of female basketball players. These skills consisted of imagery, relaxation,
focusing, refocusing, goal setting, competition planning, fear control, and stress reactions. The sample
consisted of 12 semi-elite female basketball players from the Nasr team in Tehran who were purposively
selected in 2014 (mean age 23.58 ± 1.67 years). All subjects completed the OMSAT-3
questionnaire, which has been validated in Iran by Sanaty Monfared et al. (2006). After the pre-test, the
subjects were divided into experimental and control groups, and a 12-week intervention (including
imagery, relaxation, goal setting, self-talk, and focus training) was carried out. After 12 weeks, the subjects of both
groups completed the OMSAT-3 questionnaire as a post-test. The data were then analyzed with descriptive and
inferential statistical methods. The dependent t-test comparing pre-test and post-test scores
showed a significant difference between the pre- and post-test scores of the experimental group (t =
4.98, p < 0.01). It is concluded that these interventions have positive effects on the subscales of
foundation skills, psycho-somatic skills, and cognitive skills from pre-test to post-test for the experimental
group versus the control group.
Key words: Psychological skills, Mental training, Imagery, Goal setting, OMSAT-3 Questionnaire, Basketball
Optimal Estimating Sequence for a Hilbert Space Valued Parameter - IOSR Journals
Some optimality criteria used in the estimation of parameters in finite dimensional space have been extended to a separable Hilbert space. Different optimality criteria and their equivalence are established for an estimating sequence rather than an estimator. An illustrative example is provided with the estimation of the mean of a Gaussian process.
Recent Trends in Incremental Clustering: A Review - IOSRjournaljce
This paper presents a review of recent trends in incremental clustering algorithms. It covers both clustering based on a similarity measure and clustering not based on a similarity measure. In this context, the paper is devoted to various typical incremental clustering algorithms; mainly the optimization, genetic and fuzzy approaches of these algorithms are covered. The paper is original in one respect: it provides a complete overview fully devoted to evolutionary algorithms for incremental clustering. A number of references are provided that describe applications of evolutionary algorithms for incremental clustering in different domains, such as human activity detection, online fault detection, information security, and tracking an object consistently throughout a network while solving the boundary problem.
Scaling Down Dimensions and Feature Extraction in Document Repository Classif... - ijdmtaiir
In this study a comprehensive evaluation of two
supervised feature selection methods for dimensionality
reduction is performed: Latent Semantic Indexing (LSI) and
Principal Component Analysis (PCA). These are gauged against
unsupervised techniques such as fuzzy feature clustering using
hard fuzzy C-means (FCM). The main objective of the study is
to estimate the relative efficiency of the two supervised techniques
against unsupervised fuzzy techniques while reducing the
feature space. It is found that clustering using FCM leads to
better accuracy in classifying documents than
algorithms like LSI and PCA. Results show that
the clustering of features improves the accuracy of document
classification.
Clustering, also known as data segmentation, aims to partition a data set into groups (clusters) according to their similarity. Cluster analysis has been studied extensively, and there are many algorithms for different types of clustering. These classical algorithms cannot be applied directly to big data because of its distinct features, and it is a challenge to apply the traditional techniques to large unstructured data. This study proposes a hybrid model to cluster big data using the traditional K-means clustering algorithm. The proposed model consists of three phases: a Mapper phase, a Clustering phase and a Reduce phase. The first phase uses the map-reduce algorithm to split big data into small data sets. The second phase applies the traditional K-means clustering algorithm to each of the small split data sets. The last phase is responsible for producing the overall cluster output for the complete data set. Two functions, Mode and Fuzzy Gaussian, have been implemented and compared in the last phase to determine the more suitable one. The experimental study used four benchmark big data sets: Covtype, Covtype-2, Poker, and Poker-2. The results indicate the efficiency of the proposed model in clustering big data using the traditional K-means algorithm. The experiments also show that the Fuzzy Gaussian function produces more accurate results than the traditional Mode function.
Ensemble based Distributed K-Modes Clustering - IJERD Editor
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches to distributed clustering. A distributed clustering algorithm clusters distributed datasets without gathering all the data at a single site. K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets, but it cannot directly handle datasets with categorical attributes, which commonly occur in real-life datasets. Huang proposed the K-Modes clustering algorithm by introducing a new dissimilarity measure to cluster categorical data. This algorithm replaces the means of clusters with a frequency based method that updates modes during the clustering process to minimize the cost function. Most of the distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited to handling categorical data sets as well as to performing distributed clustering in an asynchronous manner. The performance of the proposed algorithm is compared with existing distributed K-Means clustering algorithms and with a K-Modes based centralized clustering algorithm. The experiments are carried out on various datasets from the UCI machine learning data repository.
Hierarchal clustering and similarity measures along with multi representation - eSAT Journals
Abstract: All clustering methods have to assume some cluster relationship among the data objects that they are applied to. Graph-based document clustering works with frequent senses rather than the frequent keywords used in traditional text mining techniques. Similarity between a pair of objects can be defined either explicitly or implicitly. In this paper, we analyze an existing multi-viewpoint based similarity measure and two related clustering methods. The main difference between a traditional dissimilarity/similarity measure and the multi-viewpoint one is that the former uses only a single viewpoint, which is the origin, while the latter uses many viewpoints, which are objects assumed not to be in the same cluster as the two objects being measured. Using multiple viewpoints, a more informative assessment of similarity can be achieved. Theoretical analysis and an empirical study are conducted to support this claim. Two criterion functions for document clustering are proposed based on this measure. We compare them with several well-known clustering algorithms that use other popular similarity measures on various document collections to confirm the advantages of the proposal. Keywords: Multiview Cluster, Document id, ClusterDistance
Clustering Algorithm with a Novel Similarity Measure
IOSR Journal of Computer Engineering (IOSRJCE)
ISSN: 2278-0661 Volume 4, Issue 6 (Sep-Oct. 2012), PP 37-42
www.iosrjournals.org
Clustering Algorithm with a Novel Similarity Measure
Gaddam Saidi Reddy1, Dr. R. V. Krishnaiah2
1,2Department of CSE, DRK Institute of Science and Technology, Hyderabad, India.
Abstract- Clustering is a data mining and text mining technique used to analyze datasets by dividing them
into meaningful groups. The objects in a dataset can have certain relationships among them, and all clustering
algorithms assume this before they are applied. The existing algorithms for text mining use a
single viewpoint for measuring similarity between objects. Their drawback is that the clusters cannot exhibit the
complete set of relationships among objects. To overcome this drawback, we propose a new similarity measure,
the multi-viewpoint based similarity measure, so that the clusters reflect all relationships among objects.
We also propose two clustering methods. The empirical study supports the hypothesis that multi-viewpoint
similarity brings out more informative relationships among objects and thus forms more meaningful clusters.
The approach can be used in real-time applications where text documents are
searched or processed frequently.
Index Terms- Data mining, text mining, similarity measure, multi-viewpoint similarity measure, clustering
methods.
I. Introduction
Data mining is the process of analyzing data in order to bring out trends or patterns. It comprises many
techniques, and related fields such as text mining and web mining also exist. Clustering is
an important data mining and text mining technique used to group similar objects together. In other
words, it organizes the given objects into meaningful subgroups that make further analysis of the data
easier. Clustered groups simplify search mechanisms and reduce the number of operations and the computational
cost. Many clustering algorithms have been proposed since the inception of the data mining domain, and the choice
among them depends on the kind of application. One clustering algorithm widely used by the IT industry is k-means,
which remains among the most widely used clustering algorithms and has many variants.
Its basic functionality is the same in all of them: it takes two arguments and forms clusters. The first argument is
the data set, that is, the objects to be clustered, and the second is the number of clusters to be formed. It has a wide range of
applications. One such application is credit card fraud detection, where clusters are generated offline
to build a model. New transactions are then added to the model, whose clusters indicate high,
medium and low range transactions. When a new transaction takes place, it is compared with the general buying
patterns of the customer, and any abnormality is suspected to be a fraudulent transaction.
According to the literature, k-means is also one of the most favored clustering algorithms in the data mining domain. Nevertheless, it
has well-known drawbacks: it is sensitive to cluster size and to
initialization, and its clustering quality is lower than that of many other clustering techniques.
Despite these drawbacks, it remains popular due to its simplicity, scalability and understandability. As
it is simple and performs adequately, it is widely used in industry in spite of its known limitations.
Another important quality of the k-means algorithm is that it can easily be combined with other algorithms for better
results. In general, clustering can be viewed as an optimization process: the optimal clusters are obtained by
optimizing a criterion based on a similarity measure. The soundness of a
clustering algorithm therefore depends on the similarity measure it adopts. To meet various requirements, k-means has
many variants. For instance, spherical k-means (which uses cosine similarity) is used to cluster text documents, while
the original k-means clusters using Euclidean distance [3].
According to Leo Wanner, clustering methods are classified into hierarchical clustering, data
partitioning and data grouping. Hierarchical clustering is used to establish a cluster taxonomy. Data partitioning is
used to build a set of flat partitions, also known as non-overlapping clusters. Data grouping is used to build
a set of flat or overlapping clusters. The work proposed in this paper is motivated by an
investigation of the above, and of similarity measures in particular. Research findings show
that the nature of the similarity measure used in a clustering technique has a profound impact on the results. The
aim of the paper is to develop a new method for clustering text documents, which are sparse and high
dimensional data objects. We then formulate new clustering criterion functions and the corresponding
clustering algorithms. Like k-means, the proposed algorithms are fast and provide consistent,
high quality performance in clustering text documents. The proposed similarity measure is based
on multiple viewpoints and is elaborated in the later sections.
II. Related Work
Document clustering is one of the text mining techniques and has been around since the inception of the text
mining domain. It is the process of grouping objects into categories or groups in such a way that
intra-cluster similarity and inter-cluster dissimilarity are maximized. Here an object is a
document and a term is a word in a document. Each document considered for clustering is represented as
an m-dimensional vector d, where m is the total number of terms in the document collection.
Document vectors are the result of a weighting scheme such as TF-IDF (Term Frequency-Inverse
Document Frequency). Many approaches have been proposed for document clustering, including information
theoretic co-clustering [4], non-negative matrix factorization and probabilistic model based methods [2].
However, these approaches do not use a specific measure of document similarity. In this paper we
consider methods that do use such a measure. In the literature, one of the popular
measures is the Euclidean distance:
$$\mathrm{Dist}(d_i, d_j) = \lVert d_i - d_j \rVert \qquad (1)$$
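As an illustration of the representation described above, the following minimal sketch builds TF-IDF document vectors and evaluates the Euclidean distance of equation (1). The whitespace tokenizer, the plain `tf * log(N / df)` weighting and the helper names `tfidf_vectors` and `euclidean_distance` are assumptions made for this example; the paper does not specify which TF-IDF variant it uses.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors); each vector holds one TF-IDF weight per term."""
    # Assumption: lower-cased whitespace tokenization is enough for the illustration.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Assumption: weight = raw term frequency * log(N / df).
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors

def euclidean_distance(di, dj):
    """Equation (1): the norm of the difference of two document vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(di, dj)))

docs = [
    "data mining finds patterns",
    "text mining finds patterns",
    "credit card fraud",
]
vocab, vectors = tfidf_vectors(docs)
d01 = euclidean_distance(vectors[0], vectors[1])  # two documents sharing most terms
d02 = euclidean_distance(vectors[0], vectors[2])  # two documents sharing no terms
print(len(vocab), d01 < d02)
```

Every vector has one component per vocabulary term, so all documents live in the same m-dimensional space regardless of their individual length.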
K-means is one of the most popular clustering algorithms and is counted among the top 10 data mining algorithms. Due to its
simplicity and ease of use it is still widely used. The k-means algorithm uses the Euclidean distance measure: its
purpose is to minimize the Euclidean distance between the objects in each cluster and the centroid of that cluster.
With S_r denoting the set of documents in cluster r and C_r its centroid, the objective function is:
$$\min \sum_{r=1}^{k} \sum_{d_i \in S_r} \lVert d_i - C_r \rVert^{2} \qquad (2)$$
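The objective in equation (2) can be made concrete with a small sketch of the standard k-means iteration (assign each point to its nearest centroid, then recompute the centroids). The seeding rule, the fixed iteration count and the toy two-dimensional points are assumptions chosen to keep the example deterministic; they are not taken from the paper.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]

def objective(points, labels, centroids):
    """Equation (2): sum of squared distances from each point to its cluster centroid."""
    return sum(squared_distance(p, centroids[r]) for p, r in zip(points, labels))

def kmeans(points, k, iterations=20):
    # Assumption: evenly spaced points seed the centroids (illustrative, deterministic).
    centroids = [list(points[i * len(points) // k]) for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid under squared Euclidean distance.
        labels = [
            min(range(k), key=lambda r: squared_distance(p, centroids[r]))
            for p in points
        ]
        # Update step: recompute each centroid; an empty cluster keeps its old centroid.
        for r in range(k):
            members = [p for p, label in zip(points, labels) if label == r]
            if members:
                centroids[r] = centroid(members)
    return labels, centroids

points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
labels, centroids = kmeans(points, k=2)
print(labels, objective(points, labels, centroids))
```

Because the result depends on the seeds, practical implementations usually run several random initializations and keep the partition with the lowest objective value, which reflects the sensitivity to initialization noted in the introduction.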
In the text mining domain, cosine similarity is also a widely used measure of document
similarity, especially for high-dimensional and sparse document clustering. It is also
used in spherical k-means, a variant of k-means whose aim is to maximize the cosine
similarity between each cluster's centroid and the documents in that cluster. The difference between k-means with
Euclidean distance and k-means with cosine similarity is that the former takes vector
magnitudes into account while the latter focuses on vector directions. Another popular approach is graph
partitioning, in which the document corpus is treated as a graph. The min-max cut algorithm
follows this approach and minimizes the criterion function:
$$\min \sum_{r=1}^{k} \frac{D_r^t D}{\| D_r \|^2} \qquad (3)$$
Other graph partitioning methods, such as Normalized Cut and Average Weight, have also been used successfully
for document clustering. They use pairwise and cosine similarity scores. Criterion functions for document
clustering have also been analyzed.
Another graph partitioning method for document clustering is implemented in the CLUTO [1] software package.
It first builds a nearest-neighbor graph and then forms clusters. In this approach, for given non-unit
document vectors, the extended Jaccard coefficient is:
$$Sim_{eJacc}(u_i, u_j) = \frac{u_i^t u_j}{\| u_i \|^2 + \| u_j \|^2 - u_i^t u_j} \qquad (4)$$
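To make the measures discussed so far concrete, the sketch below computes Euclidean distance, cosine similarity and the extended Jaccard coefficient of equation (4) for dense vectors. It is an illustration under stated assumptions: vectors are non-zero lists of floats, and all function names are hypothetical.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product u^t v."""
    return sum(x * y for x, y in zip(u, v))


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Equation (1): length of the difference vector."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (both assumed non-zero)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def extended_jaccard(u: Sequence[float], v: Sequence[float]) -> float:
    """Equation (4): u.v / (|u|^2 + |v|^2 - u.v)."""
    uv = dot(u, v)
    return uv / (dot(u, u) + dot(v, v) - uv)


if __name__ == "__main__":
    # Same direction, different magnitude: cosine ignores the difference,
    # while the extended Jaccard coefficient and Euclidean distance do not.
    u, v = [1.0, 0.0], [2.0, 0.0]
    print(cosine_similarity(u, v), extended_jaccard(u, v), euclidean_distance(u, v))
```

For unit vectors with cosine c, equation (4) reduces to c / (2 - c), so in that case the extended Jaccard coefficient ranks document pairs in the same order as cosine similarity.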
Compared with cosine similarity and Euclidean distance, the extended Jaccard coefficient takes both direction
and magnitude into account. When the documents are represented as unit vectors, it behaves very much
like cosine similarity. Measures such as cosine, Euclidean, Jaccard, and Pearson correlation have been
compared, with the conclusion that Euclidean and Jaccard are best for web document clustering. In [1]
and related work, research has been carried out on categorical data. Both select attributes related to a given
subject and calculate the distance between two values. Document similarities can also be found using
concept-based and phrase-based approaches. In [1] a tree-similarity measure is used at the concept level, while
another work proposes a phrase-based approach. Both use Hierarchical Agglomerative Clustering to perform
the clustering, and their drawback is very high computational complexity. Measures of structural
similarity also exist for XML documents [5]. However, they differ from normal
text document clustering.
III. Multi-View Point Based Similarity
Our approach to measuring similarity between documents or objects during clustering is multi-viewpoint
based similarity. It uses more than one point of reference, as opposed to existing algorithms for
clustering text documents. In our approach the similarity between two documents in the same cluster is calculated as:
$$Sim(d_i, d_j)_{d_i, d_j \in S_r} = \frac{1}{n - n_r} \sum_{d_h \in S \setminus S_r} Sim(d_i - d_h,\, d_j - d_h) \qquad (5)$$
3. Clustering Algorithm with a Novel Similarity Measure
www.iosrjournals.org 39 | Page
This approach can be described as follows. Consider two points di and dj in cluster Sr. The similarity between
those two points is viewed from a point dh outside the cluster. It is equal to the cosine of the angle
between di and dj as seen from dh, multiplied by the Euclidean distances from dh to the two points. The
definition rests on the assumption that dh is not in the same cluster as di and dj. The smaller these distances
are, the higher the chance that dh is in fact in the same cluster. Although multiple viewpoints help increase the
accuracy of the similarity measure, some of them may give negative results. However, this
drawback can be ignored provided that plenty of documents are being clustered.
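Equation (5) can be sketched directly from its definition. The code below assumes that the relative similarity Sim(di - dh, dj - dh) is the dot product of the two difference vectors, which equals the cosine of the angle seen from dh multiplied by the two Euclidean distances. The function name `mvs_similarity` and the toy vectors are illustrative only.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product u^t v."""
    return sum(x * y for x, y in zip(u, v))


def mvs_similarity(di: Sequence[float],
                   dj: Sequence[float],
                   viewpoints: List[Sequence[float]]) -> float:
    """Average, over all viewpoints d_h outside the cluster,
    of (d_i - d_h) . (d_j - d_h)."""
    if not viewpoints:
        raise ValueError("at least one viewpoint outside the cluster is required")
    total = 0.0
    for dh in viewpoints:
        a = [x - h for x, h in zip(di, dh)]  # d_i seen from d_h
        b = [x - h for x, h in zip(dj, dh)]  # d_j seen from d_h
        total += dot(a, b)
    return total / len(viewpoints)


if __name__ == "__main__":
    # Hypothetical unit vectors: two documents in one cluster and
    # two viewpoints taken from outside that cluster.
    d_i, d_j = [1.0, 0.0], [0.8, 0.6]
    outside = [[0.0, 1.0], [-1.0, 0.0]]
    print(mvs_similarity(d_i, d_j, outside))
```

Averaging over many viewpoints is what limits the influence of any single viewpoint that happens to lie close to the two documents.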
IV. Algorithms Proposed
A series of algorithms is proposed to realize MVS (Multi-Viewpoint based Similarity). Listing 1 gives a
procedure for building the MVS similarity matrix.
procedure BUILDMVSMATRIX(A)
    for r ← 1 : c do
        D_{S\S_r} ← Σ_{d_i ∉ S_r} d_i
        n_{S\S_r} ← |S \ S_r|
    end for
    for i ← 1 : n do
        r ← class of d_i
        for j ← 1 : n do
            if d_j ∈ S_r then
                a_ij ← d_i^t d_j − d_i^t D_{S\S_r} / n_{S\S_r} − d_j^t D_{S\S_r} / n_{S\S_r} + 1
            else
                a_ij ← d_i^t d_j − d_i^t (D_{S\S_r} − d_j) / (n_{S\S_r} − 1) − d_j^t (D_{S\S_r} − d_j) / (n_{S\S_r} − 1) + 1
            end if
        end for
    end for
    return A = {a_ij}_{n×n}
end procedure
Listing 1 –Procedure for building MVS similarity matrix
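The procedure in Listing 1 might be implemented along the following lines. This is a sketch under several assumptions: documents are unit-length lists of floats, class labels are given as a parallel list of integers, and every class leaves at least two documents outside it so that the viewpoint count never reaches zero. The constant 1 added in both branches is the average of ||d_h||^2 over unit-length viewpoints, which is an assumption of this sketch. The name `build_mvs_matrix` is hypothetical.

```python
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product u^t v."""
    return sum(x * y for x, y in zip(u, v))


def build_mvs_matrix(docs: List[Sequence[float]],
                     labels: List[int]) -> List[List[float]]:
    """Build the n x n MVS matrix for unit-length documents with known classes."""
    n, m = len(docs), len(docs[0])
    outer_sum: Dict[int, List[float]] = {}  # composite vector of S \ S_r
    outer_n: Dict[int, int] = {}            # number of documents in S \ S_r
    for r in set(labels):
        outside = [d for d, c in zip(docs, labels) if c != r]
        outer_sum[r] = [sum(d[t] for d in outside) for t in range(m)]
        outer_n[r] = len(outside)

    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        r = labels[i]
        for j in range(n):
            if labels[j] == r:
                view_sum, view_n = outer_sum[r], outer_n[r]
            else:
                # d_j lies outside S_r, so it must not act as its own viewpoint.
                view_sum = [x - y for x, y in zip(outer_sum[r], docs[j])]
                view_n = outer_n[r] - 1
            if view_n <= 0:
                raise ValueError("not enough viewpoints outside class %r" % r)
            A[i][j] = (dot(docs[i], docs[j])
                       - dot(docs[i], view_sum) / view_n
                       - dot(docs[j], view_sum) / view_n
                       + 1.0)
    return A


if __name__ == "__main__":
    # Hypothetical toy corpus: four unit vectors in two classes.
    docs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
    for row in build_mvs_matrix(docs, [0, 0, 1, 1]):
        print(row)
```

Using the composite vector of the documents outside each class avoids looping over every viewpoint for every pair, so the matrix is built with one pass over all pairs.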
From the condition it is understood that when di is considered closer to dl, dl can still be considered
closer to di as per MVS. For validation purposes, Listing 2 is used.
Require: 0 < percentage ≤ 1
1: procedure GETVALIDITY(validity, A, percentage)
2:     for r ← 1 : c do
3:         q_r ← ⌊percentage × n_r⌋
4:         if q_r = 0 then    ▷ percentage too small
5:             q_r ← 1
6:         end if
7:     end for
8:     for i ← 1 : n do
9:         {a_iv[1], . . . , a_iv[n]} ← Sort {a_i1, . . . , a_in}
10:            s.t. a_iv[1] ≥ a_iv[2] ≥ . . . ≥ a_iv[n], {v[1], . . . , v[n]} ← permute {1, . . . , n}
11:        r ← class of d_i
12:        validity(d_i) ← |{d_v[1], . . . , d_v[q_r]} ∩ S_r| / q_r
13:    end for
14:    validity ← Σ_{i=1}^{n} validity(d_i) / n
15:    return validity
16: end procedure
Listing 2 –Procedure for get validity score
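The validity procedure of Listing 2 could be sketched as follows, assuming the similarity matrix is a list of rows and the true class of each document is given in a parallel list. As in the listing, each row is ranked in full, including the document's similarity to itself. The name `get_validity` mirrors the listing, while the parameter layout is an assumption.

```python
import math
from typing import Dict, List


def get_validity(A: List[List[float]], labels: List[int], percentage: float) -> float:
    """Average fraction of each document's top-q_r neighbours that share its class."""
    if not 0 < percentage <= 1:
        raise ValueError("percentage must be in (0, 1]")
    n = len(A)
    sizes: Dict[int, int] = {}
    for c in labels:
        sizes[c] = sizes.get(c, 0) + 1
    # q_r = floor(percentage * n_r), raised to 1 when the percentage is too small.
    q = {r: max(1, math.floor(percentage * n_r)) for r, n_r in sizes.items()}

    total = 0.0
    for i in range(n):
        r = labels[i]
        # Column indices of row i, sorted by decreasing similarity.
        order = sorted(range(n), key=lambda j: A[i][j], reverse=True)
        hits = sum(1 for j in order[:q[r]] if labels[j] == r)
        total += hits / q[r]
    return total / n


if __name__ == "__main__":
    # Hypothetical similarity matrix for four documents in two classes.
    A = [[1.0, 0.9, 0.1, 0.2],
         [0.9, 1.0, 0.2, 0.1],
         [0.1, 0.2, 1.0, 0.8],
         [0.2, 0.1, 0.8, 1.0]]
    print(get_validity(A, [0, 0, 1, 1], 1.0))
```

A score of 1 means that, for every document, the most similar documents all belong to its own class.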
The final validity is calculated by averaging over all the rows of A, as given in line 14. The higher the
validity score, the more suitable the measure is for clustering. The validity scores of Cosine Similarity (CS) and MVS are
presented in fig. 1.
Fig. 1 – Validity test of CS and MVS
Here series 1 corresponds to reuters-7 CS, series 2 to reuters-7 MVS, series 3
to klb-CS, and series 4 to klb-MVS. In the validity test, as the results in fig. 1 show,
MVS performs better than CS.
procedure INITIALIZATION
    Select k seeds s_1, . . . , s_k randomly
    cluster[d_i] ← p = argmax_r {s_r^t d_i}, ∀ i = 1, . . . , n
    D_r ← Σ_{d_i ∈ S_r} d_i, n_r ← |S_r|, ∀ r = 1, . . . , k
end procedure
procedure REFINEMENT
    repeat
        {v[1 : n]} ← random permutation of {1, . . . , n}
        for j ← 1 : n do
            i ← v[j]
            p ← cluster[d_i]
            ΔI_p ← I(n_p − 1, D_p − d_i) − I(n_p, D_p)
            q ← argmax_{r, r ≠ p} {I(n_r + 1, D_r + d_i) − I(n_r, D_r)}
            ΔI_q ← I(n_q + 1, D_q + d_i) − I(n_q, D_q)
            if ΔI_p + ΔI_q > 0 then
                Move d_i to cluster q: cluster[d_i] ← q
                Update D_p, n_p, D_q, n_q
            end if
        end for
    until No move for all n documents
end procedure
Listing 3 –Algorithm for incremental clustering
The algorithm in Listing 3 has two phases, initialization and refinement.
Initialization selects k documents as seeds and forms the initial partition, while refinement runs for a
number of iterations. In each iteration the n documents are visited in random order. For each document,
a check is made as to whether moving it to another cluster increases the objective function. If no improvement is
found, the document stays where it is; otherwise it is moved to the cluster that provides the highest improvement.
The process terminates when an iteration finds no document to move to a new cluster.
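The two phases described above can be sketched as follows. Because the criterion functions Ir and Iv are not defined in this section, the sketch plugs in a stand-in criterion I(n_r, D_r) = ||D_r||, the spherical k-means style objective, and this is explicitly an assumption rather than the paper's criterion. The names `incremental_clustering`, `norm_criterion`, `seed` and `max_iter` are hypothetical, as is the rule of never emptying a cluster.

```python
import math
import random
from typing import Callable, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product u^t v."""
    return sum(x * y for x, y in zip(u, v))


def norm_criterion(n_r: int, D_r: Sequence[float]) -> float:
    """Stand-in for I(n_r, D_r): length of the composite vector (assumption)."""
    return math.sqrt(dot(D_r, D_r))


def incremental_clustering(docs: List[Sequence[float]], k: int,
                           criterion: Callable[[int, Sequence[float]], float] = norm_criterion,
                           seed: int = 0, max_iter: int = 100) -> List[int]:
    rng = random.Random(seed)
    n, m = len(docs), len(docs[0])
    # Initialization: pick k random seeds, assign each document to its closest seed.
    seeds = rng.sample(docs, k)
    cluster = [max(range(k), key=lambda r: dot(seeds[r], d)) for d in docs]
    D = [[0.0] * m for _ in range(k)]  # composite vector per cluster
    size = [0] * k                     # number of documents per cluster
    for d, c in zip(docs, cluster):
        size[c] += 1
        D[c] = [a + b for a, b in zip(D[c], d)]

    # Refinement: visit documents in random order until no document moves.
    for _ in range(max_iter):
        moved = False
        order = list(range(n))
        rng.shuffle(order)
        for i in order:
            d, p = docs[i], cluster[i]
            if size[p] == 1:
                continue  # assumption of this sketch: never empty a cluster
            D_minus = [a - b for a, b in zip(D[p], d)]
            delta_p = criterion(size[p] - 1, D_minus) - criterion(size[p], D[p])
            best_q, delta_q = -1, -math.inf
            for r in range(k):
                if r == p:
                    continue
                D_plus = [a + b for a, b in zip(D[r], d)]
                gain = criterion(size[r] + 1, D_plus) - criterion(size[r], D[r])
                if gain > delta_q:
                    best_q, delta_q = r, gain
            # A small tolerance guards against moves caused by rounding noise.
            if best_q >= 0 and delta_p + delta_q > 1e-12:
                D[p] = D_minus
                D[best_q] = [a + b for a, b in zip(D[best_q], d)]
                size[p] -= 1
                size[best_q] += 1
                cluster[i] = best_q
                moved = True
        if not moved:
            break
    return cluster


if __name__ == "__main__":
    docs = [[1.0, 0.0], [0.96, 0.28], [0.0, 1.0], [0.28, 0.96]]
    print(incremental_clustering(docs, k=2))
```

Keeping the composite vectors and cluster sizes up to date means each candidate move only needs the criterion evaluated on two clusters, not a recomputation over the whole partition.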
V. Performance Evaluation Of MVS
As part of the performance evaluation, MVSC Ir and MVSC Iv are compared with existing
algorithms. The document corpora consist of benchmark datasets for clustering purposes. The details of
these benchmark datasets are given in table 1.
Table 1 –Benchmark documents datasets
VI. Experimental Setup And Evaluation
To demonstrate the MVSCs we compared them with 5 other clustering algorithms. All the clustering algorithms
used in the evaluation are:
MVSC Ir : MVSC with criterion function Ir
MVSC Iv : MVSC with criterion function Iv
K-means : conventional k-means with Euclidean distance
Spkmeans : Spherical k-means with CS
graphCS : CLUTO’s graph method with CS
graphEJ : CLUTO’s graph method with extended Jaccard
MMC : Min Max Cut algorithm
VII. Results
The experimental results are shown in fig. 2 (a) and fig. 2 (b) for all clustering algorithms using 20 benchmark
document datasets. As the results do not fit into one graph they are split into two graphs, and each graph shows
results for 10 datasets.
Fig. 2 (a) – Experimental results for first 10 datasets
Fig. 2 (b) – Experimental results for next 10 datasets
As can be seen in fig. 2 (a) and fig. 2 (b), MVSC performs better on many of the datasets.
Only in some cases do other algorithms such as graphEJ perform well. Both MVSC Ir and MVSC
Iv outperform many other existing algorithms in most of the cases. As part of the experiments we also present the
effect of the regulating factor α on the performance of MVSC Ir.
The Effect Of α On The Performance Of MVSC Ir
Cluster size and balance have an impact on partitional clustering methods that are based on criterion
functions. The assessment is based on the clustering results in terms of Accuracy, FScore and NMI. The results are
shown in fig. 3.
As can be seen in fig. 3, MVSC Ir performs worst at 0 and 1, while its performance improves
significantly in the middle. MVSC Ir performs within 5% of the best case with respect to any type of evaluation
metric.
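Of the evaluation metrics named above, NMI is the least obvious to compute by hand. The sketch below uses a common definition, in which the mutual information between true classes and predicted clusters is normalized by the geometric mean of their entropies. This definition is an assumption, since the metric is not spelled out in this section, and the function name `nmi` is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def nmi(true_labels: Sequence[Hashable], pred_labels: Sequence[Hashable]) -> float:
    """Normalized mutual information between classes and clusters."""
    n = len(true_labels)
    class_counts = Counter(true_labels)
    cluster_counts = Counter(pred_labels)
    joint = Counter(zip(true_labels, pred_labels))

    # Mutual information term, scaled by n (the scaling cancels in the ratio).
    mutual = sum(n_ij * math.log(n * n_ij / (class_counts[c] * cluster_counts[k]))
                 for (c, k), n_ij in joint.items())
    # Entropy terms, also scaled by n; both sums are less than or equal to zero.
    h_class = sum(n_c * math.log(n_c / n) for n_c in class_counts.values())
    h_cluster = sum(n_k * math.log(n_k / n) for n_k in cluster_counts.values())

    denom = math.sqrt(h_class * h_cluster)
    return mutual / denom if denom > 0 else 0.0


if __name__ == "__main__":
    # Cluster ids need not match class ids; only the grouping matters.
    print(nmi([0, 0, 1, 1], [1, 1, 0, 0]))
    print(nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```

The score depends only on how documents are grouped, so a clustering that reproduces the classes under different cluster ids still scores as a perfect match.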
VIII. Conclusion
In this paper we proposed a new similarity measure known as MVS (Multi-Viewpoint based Similarity).
Compared with cosine similarity, MVS is more useful for finding the similarity of text documents. The
empirical results and analysis show that the proposed similarity measure is efficient and can be
used in real-time applications in the text mining domain. IR and IV are the two criterion functions proposed
based on MVS, and their respective clustering algorithms are also introduced. The proposed scheme was tested on
large datasets with various evaluation metrics. The results show that the clustering algorithms
perform better than many state-of-the-art clustering algorithms. Measuring similarity from multiple
viewpoints is the main contribution of this paper. The paper also provides partitional clustering algorithms that
can be applied to documents. As future work, the proposed algorithms can be adapted and applied to hierarchical
clustering.
References
[1] A. Ahmad and L. Dey, “A method to compute distance between two categorical values of same attribute in unsupervised learning
for categorical data set,” Pattern Recognit. Lett., vol. 28, no. 1, pp. 110 – 118, 2007.
[2] A. Banerjee, I. Dhillon, J. Ghosh, and S. Sra, “Clustering on the unit hypersphere using von Mises-Fisher distributions,” J. Mach.
Learn. Res., vol. 6, pp. 1345–1382, Sep 2005.
[3] I. Dhillon and D. Modha, “Concept decompositions for large sparse text data using clustering,” Mach. Learn., vol. 42, no. 1-2, pp.
143–175, Jan 2001.
[4] I. S. Dhillon, S. Mallela, and D. S. Modha, “Information-theoretic co-clustering,” in KDD, 2003, pp. 89–98.
[5] S. Flesca, G. Manco, E. Masciari, L. Pontieri, and A. Pugliese, “Fast detection of xml structural similarity,” IEEE Trans. on
Knowl. And Data Eng., vol. 17, no. 2, pp. 160–175, 2005.
[6] I. Guyon, U. von Luxburg, and R. C. Williamson, “Clustering: Science or Art?” NIPS’09 Workshop on Clustering Theory, 2009.
[7] D. Ienco, R. G. Pensa, and R. Meo, “Context-based distance learning for categorical data clustering,” in Proc. of the 8th Int.
Symp. IDA, 2009, pp. 83–94.
[8] Leo Wanner (2004). “Introduction to Clustering Techniques”. Available online at:
http://www.iula.upf.edu/materials/040701wanner.pdf [viewed: 16 August 2012]
[9] C. D. Manning, P. Raghavan, and H. Schütze, An Introduction to Information Retrieval. Cambridge University Press, 2009.
[10] on web-page clustering,” in Proc. of the 17th National Conf. on Artif. Intell.: Workshop of Artif. Intell. for Web Search. AAAI,
Jul. 2000, pp. 58–64.
[11] J. Shi and J. Malik, “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, pp. 888–905,
2000.
[12] A. Strehl, J. Ghosh, and R. Mooney, “Impact of similarity measures.
[13] X. Wu, V. Kumar, J. Ross Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z.-H. Zhou, M.
Steinbach, D. J. Hand, and D. Steinberg, “Top 10 algorithms in data mining,” Knowl. Inf. Syst., vol. 14, no. 1, pp. 1–37, 2007.
[14] W. Xu, X. Liu, and Y. Gong, “Document clustering based on nonnegative matrix factorization,” in SIGIR, 2003, pp. 267–273.
[15] H. Zha, X. He, C. H. Q. Ding, M. Gu, and H. D. Simon, “Spectral relaxation for k-means clustering,” in NIPS, 2001, pp. 1057–
1064.
[16] Y. Zhao and G. Karypis, “Empirical and theoretical comparisons of selected criterion functions for document clustering,” Mach.
Learn., vol. 55, no. 3, pp. 311–331, Jun 2004.
[17] S. Zhong, “Efficient online spherical K-means clustering,” in IEEE IJCNN, 2005, pp. 3180–3185.
Gaddam Saidi Reddy (M.Tech) is a student of DRK Institute of Science and Technology, Hyderabad, AP,
India. He has received B.Tech and M.Tech degrees in computer science and engineering.
His main research interests include data mining and cloud computing.
Dr. R. V. Krishnaiah (Ph.D) is working as Principal at DRK Institute of Science &
Technology, Hyderabad, AP, India. He has received an M.Tech degree (EIE & CSE). His
main research interests include Data Mining and Software Engineering.