International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum for scholarly research related to engineering and science education.
IJERA (International Journal of Engineering Research and Applications) is an international, online, ... peer-reviewed journal. For more details or to submit your article, please visit www.ijera.com
International Journal of Engineering Research and Applications (IJERA) is an open-access, online, peer-reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nanotechnology & Science, Power Electronics, Electronics & Communication Engineering, Computational Mathematics, Image Processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design, etc.
A Novel Multi-Viewpoint Based Similarity Measure for Document Clustering (IJMER)
International Journal of Modern Engineering Research (IJMER) covers all fields of engineering and science, including Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment, among many others.
Data mining is used to manage the huge amounts of information stored in data warehouses and databases and to discover the information required. Numerous data mining techniques have been proposed, for example association rules, decision trees, neural networks, and clustering, and the field has been a focus of attention for many years. A well-known technique among the available data mining strategies is clustering of the dataset, widely regarded as one of the most effective data mining methods. It groups the dataset into a number of clusters based on predefined criteria and helps discover relationships between the different characteristics of the data.
In the k-means clustering algorithm, features are selected on the basis of their relevance for predicting the data, and the Euclidean distance between the centroid of each cluster and the data objects outside the cluster is computed in order to cluster the data points. In this work, the author enhanced the Euclidean distance formula to improve cluster quality.
The improved k-means still suffers from limited accuracy and from dissimilar points being redundantly included in clusters. A new enhanced approach is therefore proposed that uses a similarity function to check a point's similarity level before adding it to a cluster.
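The similarity check described above can be sketched as k-means with a gate on assignment. This is a minimal sketch, not the author's method: the similarity function `1 / (1 + distance)`, the `threshold` parameter, the optional `init` centroids, and the convention of leaving rejected points unassigned (label `-1`) are all assumptions chosen for illustration.

```python
import numpy as np

def gated_kmeans(X, k, threshold, init=None, n_iter=100, seed=0):
    """k-means in which a point joins its nearest cluster only if its
    similarity to that centroid reaches `threshold`; other points get -1."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    if init is None:
        init = X[rng.choice(len(X), size=k, replace=False)]
    centroids = np.array(init, dtype=float)
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Euclidean distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)
        # Hypothetical similarity function: 1 / (1 + distance), in (0, 1]
        sim = 1.0 / (1.0 + dists[np.arange(len(X)), nearest])
        new_labels = np.where(sim >= threshold, nearest, -1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the previous centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids

X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [50, 50]]
labels, centroids = gated_kmeans(X, k=2, threshold=0.2, init=[[0, 0], [10, 10]])
print(labels)  # the distant point (50, 50) is left unassigned as -1
```

Plain k-means would force the distant point into one of the two clusters and drag that centroid toward it; the gate keeps it out, which is the kind of redundancy the paragraph describes.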
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture Engineering,
Aerospace Engineering.
Textual Data Partitioning with Relationship and Discriminative Analysis (Editor IJMTER)
Data partitioning methods group data values by similarity, and similarity measures are used to estimate the relationships between transactions. Hierarchical clustering produces tree-structured results, whereas partitional clustering produces flat, non-hierarchical groupings. Text documents are unstructured data with high-dimensional attributes, and document clustering groups unlabeled text documents into meaningful clusters. Traditional clustering methods require the cluster count (K) as input to the document grouping process, and clustering accuracy degrades drastically when an unsuitable cluster count is chosen.
Textual data elements are divided into two types: discriminative words and nondiscriminative words. Only discriminative words are useful for grouping documents; including nondiscriminative words confuses the clustering process and leads to a poor clustering solution. A variational inference algorithm is used to infer the structure of the document collection and the partition of document words at the same time. A Dirichlet Process Mixture (DPM) model is used to partition documents; the DPM clustering model uses both the data likelihood and the clustering property of the Dirichlet Process (DP). The Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to discover the latent cluster structure based on the DPM model, and DPMFP clustering is performed without requiring the number of clusters as input.
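The DPM model can work without a preset K because of its Dirichlet Process prior. The sketch below is not DPMFP or its variational inference; it only illustrates, under a truncated stick-breaking assumption (the truncation level, the concentration `alpha`, and the helper names are arbitrary choices for illustration), how a DP prior lets the number of occupied clusters vary with `alpha` and the number of points instead of being fixed in advance.

```python
import numpy as np

def stick_breaking_weights(alpha, truncation, seed=0):
    """Truncated stick-breaking weights of a Dirichlet Process:
    beta_k ~ Beta(1, alpha), pi_k = beta_k * prod_{j<k} (1 - beta_j)."""
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=truncation)
    betas[-1] = 1.0  # give the last component the rest of the stick
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

def sample_partition(n_points, alpha, truncation=100, seed=0):
    """Assign points to components drawn from the DP prior; return the
    assignments and the number of distinct clusters actually used."""
    weights = stick_breaking_weights(alpha, truncation, seed)
    rng = np.random.default_rng(seed + 1)
    z = rng.choice(truncation, size=n_points, p=weights)
    return z, len(np.unique(z))

for alpha in (0.5, 5.0):
    _, k = sample_partition(200, alpha)
    print(f"alpha={alpha}: {k} clusters used")
```

Larger `alpha` breaks the stick into more, smaller pieces and so tends to occupy more clusters; in a full DPM model the likelihood of the documents then decides which of these clusters survive.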
Document labels are used to guide the identification of discriminative words. Concept relationships are analyzed with ontology support, and a semantic weight model is used for document similarity analysis. The system improves scalability by using labels and concept relations in the dimensionality reduction process.
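One simple way to use document labels to flag discriminative words is to keep words whose occurrences are concentrated in few labels. This is a hypothetical scoring rule for illustration, not the system's own method: the normalized label entropy, the `max_entropy` cut-off, and the `min_count` filter are all assumptions.

```python
import math
from collections import Counter, defaultdict

def discriminative_words(docs, labels, max_entropy=0.5, min_count=2):
    """Return {word: score} for words whose document occurrences are
    concentrated in few labels. Score = normalized entropy of the word's
    label distribution (0 means the word appears under a single label)."""
    counts = defaultdict(Counter)  # word -> Counter(label -> #docs containing it)
    for doc, label in zip(docs, labels):
        for word in set(doc.lower().split()):
            counts[word][label] += 1
    n_labels = len(set(labels))
    selected = {}
    for word, by_label in counts.items():
        total = sum(by_label.values())
        if total < min_count:  # too rare to judge
            continue
        entropy = -sum((c / total) * math.log(c / total) for c in by_label.values())
        score = entropy / math.log(n_labels) if n_labels > 1 else 0.0
        if score <= max_entropy:
            selected[word] = score
    return selected

docs = ["kernel cluster data", "kernel cluster graph",
        "heritage church goa", "heritage church data"]
labels = ["ml", "ml", "arch", "arch"]
print(sorted(discriminative_words(docs, labels)))  # ['church', 'cluster', 'heritage', 'kernel']
```

The word "data" appears equally under both labels, so its normalized entropy is 1 and it is treated as nondiscriminative.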
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS (ijdkp)
Subspace clustering discovers clusters embedded in multiple, overlapping subspaces of high-dimensional data. Many significant subspace clustering algorithms exist, each with different characteristics arising from the techniques, assumptions, and heuristics it uses. A comprehensive classification scheme that considers all such characteristics is needed to divide subspace clustering approaches into families, where algorithms belonging to the same family share common characteristics. Such a categorization will help future developers better understand which quality criteria to use and which similar algorithms to compare their proposed clustering algorithms against. In this paper, we first propose the concept of SCAF (Subspace Clustering Algorithms' Family). SCAF characteristics are based on classes such as cluster orientation and overlap of dimensions. As an illustration, we then provide a comprehensive, systematic description and comparison of a few significant algorithms belonging to the "Axis parallel, overlapping, density based" SCAF.
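A family-based classification like SCAF maps naturally onto a hashable record of characteristics, with algorithms grouped by that record. This sketch assumes only the three classes named in the abstract (orientation, overlap, technique); the enum values and the algorithm names are hypothetical placeholders, not the paper's catalog.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum

class Orientation(Enum):
    AXIS_PARALLEL = "axis parallel"
    ARBITRARY = "arbitrarily oriented"

class Overlap(Enum):
    OVERLAPPING = "overlapping"
    NON_OVERLAPPING = "non-overlapping"

@dataclass(frozen=True)  # frozen, so a Family can be used as a dict key
class Family:
    """One SCAF: algorithms sharing all of these characteristics."""
    orientation: Orientation
    overlap: Overlap
    technique: str  # e.g. "density based"

def group_by_family(algorithms):
    """Map each Family to the sorted names of the algorithms in it."""
    families = defaultdict(list)
    for name, family in algorithms.items():
        families[family].append(name)
    return {family: sorted(names) for family, names in families.items()}

target = Family(Orientation.AXIS_PARALLEL, Overlap.OVERLAPPING, "density based")
catalog = {  # hypothetical algorithm names, not taken from the paper
    "algo_a": target,
    "algo_b": target,
    "algo_c": Family(Orientation.ARBITRARY, Overlap.NON_OVERLAPPING, "partitioning"),
}
print(group_by_family(catalog)[target])  # ['algo_a', 'algo_b']
```

A developer proposing a new algorithm would look up its `Family` to find the peers it should be compared against.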
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING (ijcsa)
Text document clustering is one of the fastest-growing research areas because of the availability of huge amounts of information in electronic form. Several techniques have been introduced for clustering documents so that documents within a cluster have high intra-cluster similarity and low similarity to documents in other clusters. Many document clustering algorithms perform a localized search, which helps in effectively navigating, summarizing, and organizing information. A globally optimal solution can be sought by applying fast, high-quality optimization algorithms, which perform a global search over the entire solution space. This paper presents a brief survey of optimization approaches to text document clustering.
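The objective described above (high intra-cluster similarity, low inter-cluster similarity) can be written as a fitness function that an optimizer maximizes. In this sketch, cosine similarity and the "intra minus inter" fitness are assumptions, and exhaustive enumeration stands in for the surveyed optimization algorithms: it is a true global search but only feasible for tiny inputs. The toy vectors are hypothetical term counts.

```python
import itertools
import numpy as np

def cosine_matrix(X):
    """Pairwise cosine similarity between the rows (documents) of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)
    return Xn @ Xn.T

def fitness(S, labels):
    """Mean intra-cluster similarity minus mean inter-cluster similarity."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    intra_mask = same & ~np.eye(len(labels), dtype=bool)
    if not intra_mask.any() or same.all():
        return -np.inf  # degenerate: no intra pairs, or only one cluster
    return S[intra_mask].mean() - S[~same].mean()

def exhaustive_search(X, k):
    """Global search over every labelling (k ** n candidates)."""
    S = cosine_matrix(np.asarray(X, dtype=float))
    return max(itertools.product(range(k), repeat=len(X)),
               key=lambda labels: fitness(S, labels))

docs = [[1, 0], [2, 0], [0, 1], [0, 3]]  # toy term-count vectors
print(exhaustive_search(docs, 2))  # (0, 0, 1, 1)
```

Because the number of labellings grows as k to the power n, practical methods replace the enumeration with a heuristic search over the same fitness landscape.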
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ... (IJECEIAES)
A hard partition clustering algorithm assigns a point that is equidistant from several clusters to exactly one of them, even though such a datum could plausibly belong to several clusters simultaneously. Fuzzy cluster analysis instead assigns membership coefficients to data points that are equidistant between two clusters, so that these points can belong to more than one cluster at the same time. For a subset of the CiteScore dataset, the fuzzy clustering (fanny) and fuzzy c-means (fcm) algorithms were implemented to study data points that lie equally distant from each other. Before the analysis, the clusterability of the dataset was evaluated with the Hopkins statistic, which yielded 0.4371, a value below 0.5, indicating that the data is highly clusterable. The optimal number of clusters was determined using the NbClust package, in which 9 different indices proposed a 3-cluster solution as the best. Further, an appropriate value of the fuzziness parameter m was evaluated by studying the distribution of membership values as m varies from 1 to 2. The coefficient of variation (CV), also known as relative variability, was evaluated to study the spread of the data. The time complexity of the fuzzy clustering (fanny) and fuzzy c-means algorithms was evaluated by keeping the number of data points constant and varying the number of clusters.
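The standard fuzzy c-means updates alternate between membership-weighted centers and memberships u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)). The sketch below is a from-scratch NumPy version, not the R `fanny` or `fcm` implementations used in the study; the random initialization, tolerance, and toy data are assumptions. Note that the update is undefined at m = 1 (it approaches a hard partition as m approaches 1), so the code requires m > 1.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Fuzzy c-means: returns memberships U (n x c, rows sum to 1) and centers."""
    X = np.asarray(X, dtype=float)
    if m <= 1:
        raise ValueError("fuzziness m must be greater than 1")
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)  # each point's memberships sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # membership-weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # avoid division by zero at a center
        # ratio[i, j, k] = (d_ij / d_ik) ** (2 / (m - 1))
        ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=2)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

X = [[0, 0], [0, 1], [10, 10], [10, 11], [5, 5.5]]
U, centers = fuzzy_c_means(X, c=2, m=2.0)
print(U.round(2))  # the last point, midway between the groups, gets about 0.5 in each
```

This is exactly the equidistant-point situation the abstract studies: a hard partition would have to put (5, 5.5) in one cluster, while fuzzy memberships split it evenly.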
A Combined Approach for Feature Subset Selection and Size Reduction for High ... (IJERA Editor)
Selecting relevant features from a given feature set is an important issue in data mining and classification. A dataset may contain many features, but not all of them are necessarily important for a particular analysis or decision, because features may share common information or may be completely irrelevant to the processing at hand. This generally happens because of improper feature selection during dataset formation or because of incomplete information about the observed system. In either case, the data will contain features that only increase the processing burden and may ultimately lead to incorrect outcomes when used for analysis. For these reasons, methods are required to detect and remove such features. In this paper we present an efficient approach that not only removes unimportant features but also reduces the overall size of the dataset. The proposed algorithm uses information theory to measure the information gain of each feature and a minimum spanning tree to group similar features; fuzzy c-means clustering is then used to remove similar entries from the dataset. Finally, the algorithm is tested with an SVM classifier on 35 publicly available real-world high-dimensional datasets, and the results show that it not only reduces the feature set and data length but also improves classifier performance.
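The information-gain step can be illustrated for discrete features as IG(Y; F) = H(Y) - sum over values v of P(F = v) H(Y | F = v). This is a minimal sketch of that textbook quantity only; the minimum-spanning-tree grouping and fuzzy c-means stages of the proposed algorithm are not reproduced, and the toy features are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """IG(Y; F) = H(Y) - sum_v P(F=v) * H(Y | F=v) for a discrete feature F."""
    n = len(labels)
    groups = {}
    for value, y in zip(feature, labels):
        groups.setdefault(value, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

y = [0, 0, 1, 1]
features = {"informative": ["a", "a", "b", "b"], "constant": ["x"] * 4}
ranking = sorted(features, key=lambda f: information_gain(features[f], y), reverse=True)
print(ranking)  # ['informative', 'constant']
```

A feature with zero information gain, like the constant one here, only adds processing burden and is a candidate for removal.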
automatic classification in information retrieval (Basma Gamal)
automatic classification in information retrieval-automatic classification of documents
Chapter 3 from IR_VAN_Book
INFORMATION RETRIEVAL
C. J. van RIJSBERGEN B.Sc., Ph.D., M.B.C.S.
Semi-Supervised Discriminant Analysis Based On Data Structure (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
Clustering high-dimensional data, which is found in almost every field today, is becoming a very tedious process. The key difficulty with high-dimensional data is the curse of dimensionality: as the dimensionality of a dataset grows, the data points become sparse and regions become less dense, which makes the data difficult to cluster and further reduces the performance of traditional clustering algorithms. Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors [2]. In this paper, we unify vector-based and graph-based approaches. We first show that a recently proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the global kernel k-means objective [3]. A recent theoretical connection between global kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data. In particular, some methods have been proposed for semi-supervised clustering based on pairwise similarity or dissimilarity information. In this paper, we propose a kernel approach for semi-supervised clustering and present two special cases of this kernel approach in detail.
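The kernel view of constrained clustering can be sketched by adding the pairwise constraints directly to a kernel matrix (a reward +w for must-link pairs, a penalty -w for cannot-link pairs) and running ordinary kernel k-means on the result. This is a minimal sketch under stated assumptions: a linear kernel, a single uniform weight `w`, a diagonal shift to keep the matrix positive semidefinite, and caller-supplied initial labels. It is not a reproduction of the paper's two special cases.

```python
import numpy as np

def constrained_kernel(X, must_link, cannot_link, w=1.0, shift=None):
    """Linear kernel plus constraint terms (+w must-link, -w cannot-link),
    followed by a diagonal shift so the matrix stays positive semidefinite."""
    X = np.asarray(X, dtype=float)
    K = X @ X.T
    for i, j in must_link:
        K[i, j] += w
        K[j, i] += w
    for i, j in cannot_link:
        K[i, j] -= w
        K[j, i] -= w
    if shift is None:
        # smallest shift that makes every eigenvalue non-negative
        shift = max(0.0, -np.linalg.eigvalsh(K).min())
    return K + shift * np.eye(len(X))

def kernel_kmeans(K, k, labels, n_iter=100):
    """Kernel k-means on a precomputed kernel K, from initial labels."""
    n = len(K)
    labels = np.asarray(labels).copy()
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            idx = labels == c
            size = idx.sum()
            if size == 0:
                continue
            # ||phi(x_i) - m_c||^2 = K_ii - (2/|c|) sum_j K_ij + (1/|c|^2) sum_{j,l} K_jl
            dist[:, c] = (np.diag(K) - 2.0 * K[:, idx].sum(axis=1) / size
                          + K[np.ix_(idx, idx)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels

X = [[0, 0], [0, 1], [5, 5], [5, 6]]
K = constrained_kernel(X, must_link=[], cannot_link=[])
print(kernel_kmeans(K, 2, [0, 1, 1, 1]))  # with no constraints this is ordinary k-means
```

Only the kernel matrix is touched by the constraints, so the same clustering routine serves both vector data (via `X @ X.T`) and graph data (via an affinity matrix in place of the linear kernel).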
Principle Component Analysis Based on Optimal Centroid Selection Model for Su... (ijtsrd)
Clustering large, sparse, large-scale data is an open research problem in data mining. Discovering significant information through clustering algorithms alone is inadequate, because much of the data turns out to be non-actionable, and existing clustering techniques are not feasible for time-varying data in high-dimensional space. Subspace clustering can therefore address these problems by incorporating domain knowledge and parameter-sensitive prediction, and the sensitivity of the data is also predicted through a thresholding mechanism. Usability and usefulness are very important issues in 3D subspace clustering. The solution is highly beneficial for police departments and law enforcement organisations, helping them better understand stock issues and providing insights that will enable them to track activities and predict the likelihood. Determining the correct dimension is also an inconsistent and challenging issue in subspace clustering. In this thesis, we propose a Centroid-based Subspace Forecasting Framework with constraints, i.e. must-link and must-not-link, together with domain knowledge. In the unsupervised subspace clustering algorithm, inconsistent constraints correlating with dimensions are resolved through singular value decomposition. Principal component analysis is used, the conditions under which it can estimate how actionable particular attributes are have been explored, and domain knowledge is used to refine and validate the optimal centroids dynamically. Experimental results show that the proposed framework outperforms competing subspace clustering techniques in terms of efficiency, F-measure, parameter insensitivity, and accuracy.
G. Raj Kamal | A. Deepika | D. Pavithra | J. Mohammed Nadeem | V. Prasath Kumar "Principle Component Analysis Based on Optimal Centroid Selection Model for SubSpace Clustering Model" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-4, June 2020, URL: https://www.ijtsrd.com/papers/ijtsrd31374.pdf Paper Url: https://www.ijtsrd.com/computer-science/data-miining/31374/principle-component-analysis-based-on-optimal-centroid-selection-model-for-subspace-clustering-model/g-raj-kamal
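The abstract combines principal component analysis with singular value decomposition. The standard link is that PCA can be computed from the SVD of the mean-centred data matrix: the right singular vectors are the principal axes and the squared singular values give the explained variance. This sketch shows only that step; the framework's constraint handling and centroid refinement are not reproduced, and the toy data is hypothetical.

```python
import numpy as np

def pca_svd(X, n_components):
    """PCA via SVD of the mean-centred data matrix.
    Returns projected data, principal axes (rows), explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)
    components = Vt[:n_components]
    return Xc @ components.T, components, explained[:n_components]

X = [[0, 0], [1, 2], [2, 4], [3, 6]]  # points on the line y = 2x
Z, components, ratio = pca_svd(X, 1)
print(ratio)  # the first component explains essentially all of the variance
```

In a subspace-clustering setting, the explained-variance ratios offer one heuristic for choosing how many dimensions to keep, which bears on the "correct dimension" problem the abstract mentions.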
This is a webquest about Digital Citizenship for sixth to eighth graders. It covers topics such as plagiarism and "netiquette". From this webquest, the students will create a Digital Citizenship portfolio and analyze the information to conclude what Digital Citizenship means to them.
Significance Assessment of Architectural Heritage Monuments in Old-Goa (IJMER)
Abstract: Old Goa was declared a World Heritage Site in 1986 for its rich culture and built heritage, which includes many magnificent churches, monuments, and temples. Most of these churches are world famous, were constructed as far back as the 16th century, and are among the best examples of Manueline and Gothic architecture. They have very intricate detailing and ornamentation that reflect the past and play an important role in helping the community learn about the ancient culture, way of life, architecture, level of development, building techniques, use of materials, art, and other aspects of the society of a particular period. These rich heritage structures are on the verge of deterioration and call for effective management. The surrounding areas are being developed in a non-harmonious manner, without due respect for the fine existing architecture. Detracting and non-contributory buildings will deface the heritage area, which will lose its identity because of the non-harmonious approach of agencies and people. The significance of these heritage monuments and areas must be assessed before conservation and preservation are undertaken. This paper deals with the significance assessment of the heritage monuments in the heritage area of Old Goa.
Keywords: Architectural Significance, Heritage, Conservation, Renaissance, Baroque.
Virtualization Technology using Virtual Machines for Cloud Computing (IJMER)
Cloud computing is the delivery of computing and storage capacity as a service to a community of end users. The name "cloud computing" comes from the cloud-shaped symbol used in system diagrams as an abstraction for the complex infrastructure it contains. Cloud computing entrusts remote services with a user's software, data, and computation over a network. End users access cloud-based applications through a web browser, a mobile application, or a lightweight desktop client, while the business software and user data are stored on servers at a remote location. Proponents claim that a cloud computing environment allows enterprises to get their applications up and running faster, with improved manageability and less maintenance, and enables IT to adjust resources more rapidly to meet fluctuating and unpredictable business demand. In this paper, we present a system that uses virtualization technology to allocate data center resources dynamically based on application demands and supports green computing by optimizing the number of servers in use. The method multiplexes virtual to physical resources adaptively as demand changes. We use a skewness metric to combine virtual machines with different resource characteristics appropriately, so that server capacities are well utilized.
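The abstract does not define its skewness metric. The sketch below assumes one common formulation, the square root of the summed squared deviations of each resource's utilization from the server's mean utilization (normalized by that mean), and uses it in a hypothetical greedy placement rule: put the VM on the host whose skewness after placement is lowest. The host names, utilization figures, and the capacity limit of 1.0 are illustrative assumptions.

```python
import math

def skewness(utilizations):
    """Imbalance of one server's resource use, e.g. [cpu, memory]:
    sqrt(sum((r_i / mean_r - 1)^2)); 0 means perfectly balanced."""
    mean = sum(utilizations) / len(utilizations)
    if mean == 0:
        return 0.0
    return math.sqrt(sum((r / mean - 1.0) ** 2 for r in utilizations))

def best_host(vm_demand, hosts):
    """Return the host with the lowest skewness after placing the VM,
    skipping hosts where any resource would exceed full capacity (1.0)."""
    best, best_skew = None, math.inf
    for name, used in hosts.items():
        after = [u + d for u, d in zip(used, vm_demand)]
        if any(a > 1.0 for a in after):
            continue
        s = skewness(after)
        if s < best_skew:
            best, best_skew = name, s
    return best

hosts = {"h1": [0.6, 0.1], "h2": [0.1, 0.6]}  # hypothetical [cpu, memory] use
print(best_host([0.1, 0.3], hosts))  # h1: a memory-heavy VM balances a CPU-heavy host
```

Pairing VMs with complementary resource profiles keeps every dimension of a server similarly loaded, which is what lets fewer servers carry the same workload.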
Application of Parabolic Trough Collector for Reduction of Pressure Drop in Oi... (IJMER)
Pipelines are the least expensive and most effective method of oil transportation. Because of the high viscosity of crude oil, the pressure drop and pumping power requirements are very high, so it is necessary to reduce the viscosity of the crude oil. Heated pipelines are used to reduce oil viscosity by increasing the oil temperature; electrical heating and direct flame heating are the common methods used to heat oil pipelines. In this work, a new application of the Parabolic Trough Collector (PTC) in oil pipeline transport is introduced to reduce the pressure drop in oil pipelines. The oil pipeline is heated by applying concentrated solar radiation to the pipe surface using a PTC, in which the oil pipeline acts as the absorber pipe. A 3-D steady-state analysis of a heated oil pipeline is carried out using the commercial CFD software package ANSYS Fluent 14.5 to investigate the effect of concentrated solar radiation on the pressure drop. The results of the numerical analysis show that the pressure drop in the oil pipeline is reduced by heating the pipeline with concentrated solar radiation, which justifies the application of PTC in oil pipeline transportation.
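Why lowering viscosity lowers pressure drop can be seen from the Hagen-Poiseuille relation for laminar, fully developed flow of a Newtonian fluid, dp = 128 mu L Q / (pi D^4), in which pressure drop and pumping power (dp times Q) scale linearly with viscosity mu. This back-of-envelope sketch is not the paper's 3-D CFD model: it ignores temperature variation along the pipe and non-Newtonian behavior, and the pipe dimensions, flow rate, density, and the two viscosities are hypothetical values.

```python
import math

def poiseuille_pressure_drop(mu, length, flow_rate, diameter):
    """Hagen-Poiseuille pressure drop (Pa) for laminar, fully developed flow
    of a Newtonian fluid: dp = 128 * mu * L * Q / (pi * D**4). SI units."""
    return 128.0 * mu * length * flow_rate / (math.pi * diameter ** 4)

def reynolds_number(density, flow_rate, diameter, mu):
    """Re = rho * v * D / mu with mean velocity v = 4Q / (pi D^2);
    the laminar formula above is only valid below roughly Re = 2300."""
    velocity = 4.0 * flow_rate / (math.pi * diameter ** 2)
    return density * velocity * diameter / mu

# Hypothetical values: 1 km of 0.3 m pipe carrying 0.02 m^3/s of crude oil
L, D, Q, rho = 1000.0, 0.3, 0.02, 900.0
for label, mu in (("unheated", 0.5), ("heated", 0.1)):
    dp = poiseuille_pressure_drop(mu, L, Q, D)
    print(f"{label}: Re={reynolds_number(rho, Q, D, mu):.0f}, "
          f"dp={dp / 1e3:.1f} kPa, pumping power={dp * Q / 1e3:.2f} kW")
```

With these assumed numbers, a fivefold drop in viscosity gives a fivefold drop in pressure drop and pumping power, while the flow stays laminar in both cases.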
This paper applies finite element analysis of a rib cage model to identify stress distributions and to determine the rate of bone fractures (especially for pathologically changed bones). It also determines the load and stress that occur on the human rib cage in an accident and finds the maximum load the human rib cage can sustain. Based on this load-bearing capacity, obtained from the finite element analysis, a material similar to bone cement is sought, applied to a rib fracture, and the result is evaluated. The aim is to eliminate rib fractures rather than merely reduce them: present medical treatment provides an elastic belt, but because the ribs contract and relax during respiration, the fracture is only minimized, not eliminated. The human models considered are between 15 and 40 years of age. The simulation results show good agreement with cadaver test data.
Textual Data Partitioning with Relationship and Discriminative AnalysisEditor IJMTER
Data partitioning methods are used to partition the data values with similarity. Similarity
measures are used to estimate transaction relationships. Hierarchical clustering model produces tree
structured results. Partitioned clustering produces results in grid format. Text documents are
unstructured data values with high dimensional attributes. Document clustering group ups unlabeled text
documents into meaningful clusters. Traditional clustering methods require cluster count (K) for the
document grouping process. Clustering accuracy degrades drastically with reference to the unsuitable
cluster count.
Textual data elements are divided into two types’ discriminative words and nondiscriminative
words. Only discriminative words are useful for grouping documents. The involvement of
nondiscriminative words confuses the clustering process and leads to poor clustering solution in return.
A variation inference algorithm is used to infer the document collection structure and partition of
document words at the same time. Dirichlet Process Mixture (DPM) model is used to partition
documents. DPM clustering model uses both the data likelihood and the clustering property of the
Dirichlet Process (DP). Dirichlet Process Mixture Model for Feature Partition (DPMFP) is used to
discover the latent cluster structure based on the DPM model. DPMFP clustering is performed without
requiring the number of clusters as input.
Document labels are used to estimate the discriminative word identification process. Concept
relationships are analyzed with Ontology support. Semantic weight model is used for the document
similarity analysis. The system improves the scalability with the support of labels and concept relations
for dimensionality reduction process.
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSijdkp
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high
dimensional data. Many significant subspace clustering algorithms exist, each having different
characteristics caused by the use of different techniques, assumptions, heuristics used etc. A comprehensive
classification scheme is essential which will consider all such characteristics to divide subspace clustering
approaches in various families. The algorithms belonging to same family will satisfy common
characteristics. Such a categorization will help future developers to better understand the quality criteria to
be used and similar algorithms to be used to compare results with their proposed clustering algorithms. In
this paper, we first proposed the concept of SCAF (Subspace Clustering Algorithms’ Family).
Characteristics of SCAF will be based on the classes such as cluster orientation, overlap of dimensions etc.
As an illustration, we further provided a comprehensive, systematic description and comparison of few
significant algorithms belonging to “Axis parallel, overlapping, density based” SCAF.
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERINGijcsa
Text Document Clustering is one of the fastest growing research areas because of availability of huge amount of information in an electronic form. There are several number of techniques launched for clustering documents in such a way that documents within a cluster have high intra-similarity and low inter-similarity to other clusters. Many document clustering algorithms provide localized search in effectively navigating, summarizing, and organizing information. A global optimal solution can be obtained by applying high-speed and high-quality optimization algorithms. The optimization technique performs a globalized search in the entire solution space. In this paper, a brief survey on optimization approaches to text document clustering is turned out.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...IJECEIAES
A hard partition clustering algorithm assigns equally distant points to one of the clusters, where each datum has the probability to appear in simultaneous assignment to further clusters. The fuzzy cluster analysis assigns membership coefficients of data points which are equidistant between two clusters so the information directs have a place toward in excess of one cluster in the meantime. For a subset of CiteScore dataset, fuzzy clustering (fanny) and fuzzy c-means (fcm) algorithms were implemented to study the data points that lie equally distant from each other. Before analysis, clusterability of the dataset was evaluated with Hopkins statistic which resulted in 0.4371, a value < 0.5, indicating that the data is highly clusterable. The optimal clusters were determined using NbClust package, where it is evidenced that 9 various indices proposed 3 cluster solutions as best clusters. Further, appropriate value of fuzziness parameter m was evaluated to determine the distribution of membership values with variation in m from 1 to 2. Coefficient of variation (CV), also known as relative variability was evaluated to study the spread of data. The time complexity of fuzzy clustering (fanny) and fuzzy c-means algorithms were evaluated by keeping data points constant and varying number of clusters.
A Combined Approach for Feature Subset Selection and Size Reduction for High ... (IJERA Editor)
Selecting relevant features from a given feature set is one of the important issues in data mining and classification. In general, a dataset may contain many features, but not all of them are necessarily important for a particular analysis or decision. Features may share common information or may be completely irrelevant to the processing at hand. This generally happens because of improper selection of features when the dataset is formed, or because of insufficient information about the observed system. In both cases, the data will contain features that merely increase the processing burden and may ultimately lead to improper outcomes when used for analysis. For these reasons, methods are required to detect and remove such features. In this paper we present an efficient approach that not only removes the unimportant features but also reduces the size of the complete dataset. The proposed algorithm uses information theory to measure the information gain of each feature and a minimum spanning tree to group similar features. Fuzzy c-means clustering is then used to remove similar entries from the dataset. Finally, the algorithm is tested with an SVM classifier on 35 publicly available real-world high-dimensional datasets. The results show that the presented algorithm not only reduces the feature set and data length but also improves the performance of the classifier.
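The information-gain step can be illustrated with a short sketch for discrete features. It computes IG(C; F) = H(C) - H(C | F) in plain Python and ranks features by that score. The minimum spanning tree grouping and the fuzzy c-means row reduction described above are not shown, and the function names and toy data are illustrative.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def information_gain(feature, labels):
    """IG(C; F) = H(C) - H(C | F) for one discrete feature column."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (name, IG) pairs sorted from most to least informative."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


labels = ["a", "a", "b", "b"]
columns = {"f1": [0, 0, 1, 1], "f2": [0, 1, 0, 1]}
print(rank_features(columns, labels))  # f1 has IG 1.0, f2 has IG 0.0
```

Here f1 determines the class completely, while f2 carries no class information. A selection step would keep f1 and drop f2.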
Automatic classification in information retrieval (Basma Gamal)
Automatic classification in information retrieval: automatic classification of documents
Chapter 3 from IR_VAN_Book
INFORMATION RETRIEVAL
C. J. van RIJSBERGEN B.Sc., Ph.D., M.B.C.S.
Semi-Supervised Discriminant Analysis Based On Data Structure (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Clustering high-dimensional data, which can be found in almost all fields these days, is becoming a very tedious process. The key disadvantage of high-dimensional data is the curse of dimensionality. As the dimensionality of a dataset grows, the data points become sparse and the density of any region decreases. This makes the data difficult to cluster and reduces the performance of traditional clustering algorithms. Semi-supervised clustering algorithms aim to improve clustering results using limited supervision. The supervision is generally given as pairwise constraints; such constraints are natural for graphs, yet most semi-supervised clustering algorithms are designed for data represented as vectors [2]. In this paper, we unify vector-based and graph-based approaches. We first show that a recently proposed objective function for semi-supervised clustering based on Hidden Markov Random Fields, with squared Euclidean distance and a certain class of constraint penalty functions, can be expressed as a special case of the global kernel k-means objective [3]. A recent theoretical connection between global kernel k-means and several graph clustering objectives enables us to perform semi-supervised clustering of data. In particular, some methods have been proposed for semi-supervised clustering based on pairwise similarity or dissimilarity information. In this paper, we propose a kernel approach for semi-supervised clustering and present in detail two special cases of this kernel approach.
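To make the kernel view concrete, here is a minimal sketch of kernel k-means on a constraint-adjusted kernel. Must-link pairs add a reward w to the kernel entry, cannot-link pairs subtract it, and an optional diagonal shift is added. The linear base kernel, the constant w, the shift value and the function names are assumptions for illustration. A faithful implementation would also choose the shift large enough to make the kernel positive semidefinite.

```python
import math


def constrained_kernel(X, must_link, cannot_link, w=1.0, shift=0.0):
    """Linear kernel plus pairwise constraint rewards/penalties (illustrative)."""
    n = len(X)
    K = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)] for i in range(n)]
    for i, j in must_link:  # encourage i and j to share a cluster
        K[i][j] += w
        K[j][i] += w
    for i, j in cannot_link:  # discourage i and j from sharing a cluster
        K[i][j] -= w
        K[j][i] -= w
    for i in range(n):  # diagonal shift (value is an assumption)
        K[i][i] += shift
    return K


def kernel_kmeans(K, labels, k, iters=50):
    """Reassign points using feature-space distances computed only from K."""
    n = len(K)
    labels = list(labels)
    for _ in range(iters):
        clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Within-cluster kernel sums do not depend on the point being assigned.
        self_terms = [
            sum(K[a][b] for a in members for b in members) / len(members) ** 2
            if members else None
            for members in clusters
        ]
        new_labels = []
        for i in range(n):
            best, best_d = labels[i], math.inf
            for c, members in enumerate(clusters):
                if not members:
                    continue
                d = K[i][i] - 2.0 * sum(K[i][j] for j in members) / len(members) + self_terms[c]
                if d < best_d:
                    best, best_d = c, d
            new_labels.append(best)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


X = [(0.0,), (0.1,), (5.0,), (5.1,)]
K = constrained_kernel(X, must_link=[], cannot_link=[])
print(kernel_kmeans(K, [0, 1, 0, 1], k=2))  # [0, 0, 1, 1]
```

With no constraints and a linear kernel, this reduces to ordinary k-means. Adding must-link or cannot-link pairs changes only K, which is the point of the kernel formulation.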
Principle Component Analysis Based on Optimal Centroid Selection Model for Su... (ijtsrd)
Clustering large, sparse, large-scale data is an open research problem in data mining. Discovering significant information through clustering algorithms is inadequate, as most of the data turns out to be non-actionable. Existing clustering techniques are not feasible for time-varying data in high-dimensional space. Subspace clustering can therefore address these problems by incorporating domain knowledge and parameter-sensitive prediction. The sensitiveness of the data is also predicted through a thresholding mechanism. The usability and usefulness of 3D subspace clustering are very important issues in subspace clustering. The solution is highly beneficial for police departments and law enforcement organisations, helping them better understand stock issues and providing insights that will enable them to track activities, predict the likelihood. Determining the correct dimension is also an inconsistent and challenging issue in subspace clustering. In this thesis, we propose a Centroid-based Subspace Forecasting Framework with constraints, i.e. must-link and must-not-link constraints with domain knowledge. In this unsupervised subspace clustering algorithm, inbuilt issues such as inconsistent constraints correlating to dimensions are resolved through singular value decomposition. Principal component analysis is used to explore conditions for estimating how strongly particular attributes are actionable. Domain knowledge is then used to refine and validate the optimal centroids dynamically. Experimental results show that the proposed framework outperforms competing subspace clustering techniques in terms of efficiency, F-measure, parameter insensitiveness and accuracy.
G. Raj Kamal | A. Deepika | D. Pavithra | J. Mohammed Nadeem | V. Prasath Kumar, "Principle Component Analysis Based on Optimal Centroid Selection Model for SubSpace Clustering Model", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4 | Issue-4, June 2020, URL: https://www.ijtsrd.com/papers/ijtsrd31374.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/31374/principle-component-analysis-based-on-optimal-centroid-selection-model-for-subspace-clustering-model/g-raj-kamal
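The PCA step mentioned in the abstract above can be sketched in plain Python. Instead of a full singular value decomposition, this illustration uses power iteration to find the leading principal component of the centred data, then projects each record onto it. The function names, the all-ones starting vector and the iteration count are assumptions. The centroid-selection and constraint-handling parts of the framework are not shown.

```python
import math


def top_principal_component(X, iters=200):
    """Return (mean, unit vector) for the leading principal component of X."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)] for a in range(d)]
    v = [1.0] * d  # assumed starting vector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector orthogonal to the component, or zero variance")
        v = [x / norm for x in w]
    return mean, v


def project(X, mean, v):
    """Scores of each row on the component v."""
    return [sum((row[j] - mean[j]) * v[j] for j in range(len(v))) for row in X]


X = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
mean, v = top_principal_component(X)
print(v)  # roughly (0.447, 0.894), i.e. (1, 2) / sqrt(5)
print(project(X, mean, v))
```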
This is a webquest about Digital Citizenship for sixth to eighth graders. It covers topics such as plagiarism and "netiquette". From this webquest, the students will create a Digital Citizenship portfolio and analyze the information to conclude what Digital Citizenship means to them.
Significance Assessment of Architectural Heritage Monuments in Old-Goa (IJMER)
Abstract: Old-Goa was declared a World Heritage site in 1986 for its rich culture and built heritage, which includes many magnificent churches, monuments and temples. Most of these churches are world famous, were constructed back in the 16th century, and are the best examples of Manueline and Gothic architecture. These churches have very intricate detailing and ornamentation that reflect the past. They play an important role in helping the community learn about the ancient culture, way of life, architecture, level of development, building techniques, use of material, art and other aspects of the society of a particular period. These rich heritage structures are on the verge of deterioration and call for effective management. The surrounding areas are being developed in a non-harmonious manner, without due respect for the fine existing architecture. Because of the non-harmonious approach of agencies and people, detracting and non-contributory buildings will deface the heritage area, which will then lose its identity. The significance of these heritage monuments and areas must be assessed before conservation and preservation are undertaken. The paper deals with the significance assessment of the heritage monuments in the heritage area of Old Goa.
Keywords: Architectural Significance, Heritage, Conservation, Renaissance, Baroque.
Virtualization Technology using Virtual Machines for Cloud Computing (IJMER)
Cloud computing is the delivery of computing and storage capacity as a service to a community of end users. The name “cloud computing” comes from the use of a cloud-shaped symbol in system diagrams as an abstraction for the complex infrastructure it contains. Cloud computing entrusts remote services with a user's software, data and computation over a network. End users access cloud-based applications through a web browser, a mobile application or a lightweight desktop application, while the business software and user data are stored on servers at a remote location. Proponents claim that a cloud computing environment allows enterprises to get their applications up and running faster, with improved manageability and less maintenance. They also claim it enables IT to adjust resources more rapidly to meet fluctuating and unpredictable business demand. In this paper, we present a system that uses virtualization technology to allocate data center resources dynamically based on application demands. The system supports green computing by optimizing the number of servers in use. The method multiplexes virtual to physical resources adaptively based on changing demand. We use a skewness metric to combine virtual machines with different resource characteristics appropriately, so that server capacities are well utilized.
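The skewness idea can be sketched as follows. The abstract does not give the formula, so this assumes one common formulation: skewness = sqrt(sum_i (r_i / r_mean - 1)^2), where r_i are a server's per-resource utilisations and r_mean is their mean. It also assumes a hypothetical greedy rule that places a VM on the host whose post-placement skewness is lowest while staying within capacity. Utilisations are fractions of capacity, and the names and the capacity limit of 1.0 are assumptions. Server consolidation for green computing is not modelled.

```python
import math


def skewness(util):
    """Unevenness of a server's utilisation across resources (0 = perfectly even)."""
    mean = sum(util) / len(util)
    if mean == 0.0:
        return 0.0
    return math.sqrt(sum((u / mean - 1.0) ** 2 for u in util))


def best_host(vm_demand, hosts, capacity=1.0):
    """Pick the host with the lowest skewness after adding the VM, or None if none fits."""
    scores = {}
    for name, util in hosts.items():
        after = [u + d for u, d in zip(util, vm_demand)]
        if all(a <= capacity for a in after):
            scores[name] = skewness(after)
    return min(scores, key=scores.get) if scores else None


# Resources are (CPU, memory) as fractions of capacity.
hosts = {"h1": [0.6, 0.2], "h2": [0.2, 0.6]}
print(best_host([0.1, 0.3], hosts))  # h1: the memory-heavy VM evens out a CPU-heavy host
```

Pairing VMs with complementary resource profiles keeps each server's resources evenly used. This is the intuition behind combining VMs with different characteristics.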
Application of Parabolic Trough Collector for Reduction of Pressure Drop in Oi... (IJMER)
Pipelines are the least expensive and most effective method for oil transportation. Due to the high viscosity of crude oil, the pressure drop and pumping power requirements are very high, so it is necessary to bring down the viscosity of the crude oil. Heated pipelines are used to reduce oil viscosity by increasing the oil temperature; electrical heating and direct flame heating are the common methods used for heating an oil pipeline. In this work, a new application of the Parabolic Trough Collector (PTC) in the field of oil pipeline transport is introduced for reducing pressure drop in oil pipelines. The oil pipeline is heated by applying concentrated solar radiation to the pipe surface using a Parabolic Trough Collector in which the oil pipeline acts as the absorber pipe. A 3-D steady-state analysis is carried out on a heated oil pipeline using the commercial CFD software package ANSYS Fluent 14.5 to investigate the effect of concentrated solar radiation on the pressure drop in the oil pipeline. The results of the numerical analysis show that the pressure drop in the oil pipeline is reduced by heating the pipeline with concentrated solar radiation. This work thus justifies the application of PTC in oil pipeline transportation.
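A simple closed-form check shows why lowering viscosity lowers the pressure drop. This sketch assumes fully developed laminar flow of a Newtonian fluid in a straight pipe (the Hagen-Poiseuille relation). That is a strong simplification of heated crude-oil flow and not the 3-D CFD model used in the paper. The fluid properties and pipe dimensions below are hypothetical.

```python
import math


def reynolds_number(rho, flow_rate, diameter, mu):
    """Re = rho * V * D / mu, with mean velocity V from the volumetric flow rate."""
    velocity = flow_rate / (math.pi * diameter ** 2 / 4.0)
    return rho * velocity * diameter / mu


def laminar_pressure_drop(mu, length, flow_rate, diameter):
    """Hagen-Poiseuille pressure drop in Pa: 128 * mu * L * Q / (pi * D^4)."""
    return 128.0 * mu * length * flow_rate / (math.pi * diameter ** 4)


def pumping_power(pressure_drop, flow_rate):
    """Hydraulic power in W needed to overcome the pressure drop."""
    return pressure_drop * flow_rate


# Hypothetical oil: heating lowers viscosity from 0.5 Pa*s to 0.2 Pa*s.
rho, Q, D, L = 900.0, 0.01, 0.3, 1000.0
for mu in (0.5, 0.2):
    re = reynolds_number(rho, Q, D, mu)
    dp = laminar_pressure_drop(mu, L, Q, D)
    print(f"mu={mu} Pa*s  Re={re:.0f}  dP={dp:.0f} Pa  power={pumping_power(dp, Q):.0f} W")
```

In the laminar regime the pressure drop scales linearly with viscosity. The formula should therefore only be trusted while Re stays well below the usual transition value of about 2300.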
In this paper, finite element analysis of a rib cage model is used to identify stress distributions and to determine the rate of bone fractures (especially for pathologically changed bones). It also determines the load and stress that occur on the human rib cage in an accident and finds the maximum load the rib cage can sustain. Based on this load capacity, the finite element analysis is used to search for a material such as a bone cement that can be applied to a rib fracture, and the result is evaluated. The paper aims to nullify rib fractures rather than merely minimize them. Present medical treatment uses an elastic belt, but because the ribs contract and relax during respiration, the fracture is only minimized, not nullified. The human models considered are between 15 and 40 years of age. The simulation results show good agreement with cadaver test data.
Education set for collecting and visualizing data using sensor system based ... (IJMER)
This article presents issues in the design of wireless sensor measuring systems that might be used in the educational process of a computer science faculty. The work shows the integration of a simple measuring system, a data management system, a visualization system and the hardware. The education set is designed to consolidate knowledge in many fields of computer science and the interdependence between them, such as programming techniques, databases, Web servers, communication protocols, software and hardware. The presented measuring sensor system consists of a number of measurement nodes. Their role is to provide information about certain desirable characteristics and to warn against natural hazards or violations of physical safety. Important parts of the sensor system are the measuring subsystem and the measurement data collection subsystem. The article presents the concepts of a temperature measurement sensor system and methods for storing and visualizing the measurement data.
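The storage and alarm side of such a measuring system can be sketched with Python's built-in sqlite3 module. The table layout, function names and alarm threshold are hypothetical. They are chosen only to show how node readings might be stored, summarised for visualisation and checked against a warning limit.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create (if needed) a table of temperature readings per measurement node."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (node_id TEXT, ts INTEGER, temp_c REAL)"
    )
    return conn


def store_reading(conn, node_id, ts, temp_c):
    """Insert one measurement sent by a node."""
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)", (node_id, ts, temp_c))
    conn.commit()


def node_summary(conn):
    """Per-node count, mean and maximum temperature, e.g. for a dashboard."""
    return conn.execute(
        "SELECT node_id, COUNT(*), AVG(temp_c), MAX(temp_c) "
        "FROM readings GROUP BY node_id ORDER BY node_id"
    ).fetchall()


def alarms(conn, threshold_c):
    """Readings above a warning threshold (hypothetical hazard rule)."""
    return conn.execute(
        "SELECT node_id, ts, temp_c FROM readings WHERE temp_c > ? ORDER BY ts",
        (threshold_c,),
    ).fetchall()


conn = open_store()
for node, ts, t in [("A", 1, 20.0), ("A", 2, 22.0), ("B", 1, 30.0)]:
    store_reading(conn, node, ts, t)
print(node_summary(conn))  # [('A', 2, 21.0, 22.0), ('B', 1, 30.0, 30.0)]
print(alarms(conn, 25.0))  # [('B', 1, 30.0)]
```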
From system performance to application metrics, we continue to further our understanding of what to monitor, why, and how to present it appropriately to the various audiences who need to act on this information. Yet there are things across our environment that we agree we can’t measure because they are unquantifiable. That doesn’t mean that there is zero signal to be analyzed and monitored.
Consider open source software that is in wide use yet becomes stale and unusable after some years. This happens when maintainers stop keeping it up to date with security fixes and integrations with other software, or stop implementing the new features that keep it useful. How do you measure the health of your currently implemented software solutions, so that you know when to start planning a change or to commit intentional time to a project?
In this talk, I’ll tackle these questions in addition to sharing other observations about monitoring within our environments with the goal of inspiring others to examine available signals, their impact, and the value of monitoring.
Tracking of Maximum Power from Wind Using Fuzzy Logic Controller Based On PMSG (IJMER)
Wind energy has gained growing worldwide interest due to the continuous rise in fuel costs. The main aim of a wind-energy system is to extract the maximum power present in the wind stream, and a maximum power point tracking (MPPT) algorithm is used for this purpose. This paper proposes a fuzzy logic MPPT controller to track the maximum power from the wind generation system. The maximum power is achieved based on the rotor speed of the wind system, which consists of a wind turbine and a permanent magnet synchronous generator (PMSG). The error and change in error are given as inputs to the fuzzy logic controller, and its output is connected to the boost converter. The DC-link voltage is controlled by the Voltage Source Inverter (VSI) in the grid-side converter control. The proposed system is designed and evaluated in MATLAB/SIMULINK, and simulation results show good dynamic performance.
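The controller structure described above takes the error and change in error as inputs and produces a duty-cycle adjustment for the boost converter. It can be sketched as a small Mamdani-style fuzzy inference step. All of the following are assumptions for illustration, not the paper's rule base:
- three triangular membership functions;
- a 3 x 3 rule table;
- singleton output values;
- inputs normalised to [-1, 1] (for example a scaled slope of power versus rotor speed);
- the sign convention for the duty cycle.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def fuzzify(x):
    """Degrees of membership in Negative, Zero and Positive for x clipped to [-1, 1]."""
    x = max(-1.0, min(1.0, x))
    return {"N": tri(x, -2.0, -1.0, 0.0), "Z": tri(x, -1.0, 0.0, 1.0), "P": tri(x, 0.0, 1.0, 2.0)}


# Singleton outputs (normalised duty-cycle change) and a 3 x 3 rule table (assumed).
OUT = {"NB": -1.0, "NS": -0.5, "ZE": 0.0, "PS": 0.5, "PB": 1.0}
RULES = {
    ("N", "N"): "NB", ("N", "Z"): "NS", ("N", "P"): "ZE",
    ("Z", "N"): "NS", ("Z", "Z"): "ZE", ("Z", "P"): "PS",
    ("P", "N"): "ZE", ("P", "Z"): "PS", ("P", "P"): "PB",
}


def fuzzy_step(error, change):
    """Min for rule firing strength, weighted average of singletons to defuzzify."""
    me, mc = fuzzify(error), fuzzify(change)
    num = den = 0.0
    for (le, lc), label in RULES.items():
        w = min(me[le], mc[lc])
        num += w * OUT[label]
        den += w
    return num / den if den > 0.0 else 0.0


def mppt_step(duty, error, prev_error, gain=0.02, limits=(0.05, 0.95)):
    """One controller update: adjust the boost-converter duty cycle and clamp it."""
    duty += gain * fuzzy_step(error, error - prev_error)
    return min(max(duty, limits[0]), limits[1])


print(fuzzy_step(0.5, 0.0))  # 0.25 with this rule table
```

Tuning would normally adjust the input scaling, the membership breakpoints and the gain against simulation results.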
On Characterizations of NANO RGB-Closed Sets in NANO Topological Spaces (IJMER)
The purpose of this paper is to establish and derive theorems that characterize nano rgb-closed sets in nano topological spaces and to obtain some of their interesting properties. We also use this notion to consider new weak forms of continuity defined with these sets.
2010 AMS classification: 54A05, 54C10.
International Journal of Engineering and Science Invention (IJESI) (inventionjournals)
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field of Engineering Science and Technology, including new teaching methods, assessment, validation and the impact of new technologies. It will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in the journal can be accessed online.
International Journal of Engineering Research and Development (IJERD) (IJERD Editor)
Distribution Similarity based Data Partition and Nearest Neighbor Search on U... (Editor IJMTER)
Databases are built with a fixed number of fields and records, whereas an uncertain database contains a varying number of fields and records. Clustering techniques are used to group relevant records based on similarity values. Similarity measures are designed to estimate the relationship between transactions with fixed attributes; for uncertain data, similarity is estimated using these measures with some modifications.
Clustering uncertain data is one of the essential tasks in mining uncertain data. Existing methods extend traditional partitioning clustering methods such as k-means, and density-based clustering methods such as DBSCAN, to uncertain data. Such methods cannot handle uncertain objects adequately. Probability distributions are essential characteristics of uncertain objects, yet they have not been considered in measuring similarity between them.
Customer purchase transaction data is analyzed using an uncertain data clustering scheme, with a density-based clustering mechanism used for the clustering process. This model produces results with low accuracy. The clustering technique is improved with a distribution-based similarity model for uncertain data, and a nearest neighbor search technique is applied in the distribution-based data environment. The system is designed using Java as the front end and Oracle as the back end.
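A distribution-based similarity, and a nearest-neighbour search built on it, can be sketched as below. The abstract does not name the measure. This sketch assumes a symmetrised Kullback-Leibler divergence between discrete distributions, for example each customer's purchase probabilities over items, with a small epsilon to avoid division by zero. The object names and data are hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as {outcome: probability}."""
    total = 0.0
    for k, pk in p.items():
        if pk > 0.0:
            total += pk * math.log(pk / max(q.get(k, 0.0), eps))
    return total


def distribution_distance(p, q):
    """Symmetrised KL divergence (0 when the distributions are identical)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def nearest_neighbor(query, objects):
    """Name of the uncertain object whose distribution is closest to the query."""
    return min(objects, key=lambda name: distribution_distance(query, objects[name]))


# Hypothetical customers described by purchase probabilities over items.
customers = {
    "c1": {"milk": 0.6, "bread": 0.4},
    "c2": {"milk": 0.1, "bread": 0.9},
}
print(nearest_neighbor({"milk": 0.7, "bread": 0.3}, customers))  # c1
```

Unlike a distance between mean values, this compares whole distributions. That is what distinguishes distribution-based similarity from the extended k-means and DBSCAN approaches.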
Recent Trends in Incremental Clustering: A Review (IOSRjournaljce)
This paper presents a review of recent trends in incremental clustering algorithms. It focuses both on clustering based on a similarity measure and on clustering not based on a similarity measure. In this context, the paper is devoted to various typical incremental clustering algorithms, mainly covering their optimization, genetic and fuzzy approaches. The paper is original in one respect: it provides a complete overview that is fully devoted to evolutionary algorithms for incremental clustering. A number of references are provided that describe applications of evolutionary algorithms for incremental clustering in different domains. These include human activity detection, online fault detection, information security, tracking an object consistently throughout a network, and solving the boundary problem.
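The basic notion of incremental clustering can be illustrated with a minimal leader-style sketch. Points arrive one at a time, each joins the nearest cluster if it lies within a distance threshold, and otherwise it starts a new cluster. Centroids are updated as running means without revisiting earlier data. This is a simple baseline, not one of the optimization, genetic or fuzzy algorithms reviewed in the paper, and the class name and threshold are assumptions.

```python
import math


class IncrementalClusterer:
    """Leader-style incremental clustering with running-mean centroids."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed maximum distance to join a cluster
        self.centroids = []
        self.counts = []

    def add(self, x):
        """Assign one arriving point and return its cluster index."""
        if self.centroids:
            j = min(range(len(self.centroids)), key=lambda i: math.dist(x, self.centroids[i]))
            if math.dist(x, self.centroids[j]) <= self.threshold:
                self.counts[j] += 1
                n = self.counts[j]
                # Update the mean in O(d) without revisiting earlier points.
                self.centroids[j] = [c + (xi - c) / n for c, xi in zip(self.centroids[j], x)]
                return j
        self.centroids.append(list(x))
        self.counts.append(1)
        return len(self.centroids) - 1


stream = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (0.0, 1.0), (11.0, 10.0)]
clusterer = IncrementalClusterer(threshold=1.5)
print([clusterer.add(p) for p in stream])  # [0, 0, 1, 0, 1]
```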
Privacy preservation techniques in data mining (eSAT Journals)
Abstract: In this paper, different privacy preservation techniques are compared. Classification is the most commonly applied data mining technique; it employs a set of pre-classified examples to develop a model that can classify the population of records at large. Fraud detection and credit risk applications are particularly well suited to this type of analysis, which frequently employs decision tree or neural network-based classification algorithms. The data classification process involves learning and classification. In learning, the training data are analyzed by a classification algorithm. In classification, test data are used to estimate the accuracy of the classification rules; if the accuracy is acceptable, the rules can be applied to new data tuples. For a fraud detection application, the training data would include complete records of both fraudulent and valid activities, determined on a record-by-record basis. The classifier-training algorithm uses these pre-classified examples to determine the set of parameters required for proper discrimination. It then encodes these parameters into a model called a classifier.
Index Terms: Data Mining, Privacy Preservation, Clustering, Classification Techniques, Naive Bayes.
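The learning and classification phases described above can be illustrated with a small categorical Naive Bayes classifier, since Naive Bayes is listed in the index terms. This sketch uses Laplace smoothing and measures accuracy on a set of labelled records. The toy fraud-detection records and function names are hypothetical, and no privacy-preservation technique is applied here.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Learning phase: count classes and per-class feature values."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, feature index) -> value counts
    seen_values = defaultdict(set)       # feature index -> values seen in training
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(y, i)][v] += 1
            seen_values[i].add(v)
    return class_counts, value_counts, seen_values


def predict_nb(model, row):
    """Classification phase: return the class with the highest log-posterior."""
    class_counts, value_counts, seen_values = model
    n = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for y, cy in class_counts.items():
        lp = math.log(cy / n)
        for i, v in enumerate(row):
            # Laplace (add-one) smoothing over the values seen for feature i.
            lp += math.log((value_counts[(y, i)][v] + 1) / (cy + len(seen_values[i])))
        if lp > best_lp:
            best, best_lp = y, lp
    return best


def accuracy(model, rows, labels):
    """Fraction of records whose predicted class matches the true class."""
    hits = sum(predict_nb(model, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


# Hypothetical transaction records: (amount band, merchant location).
train_rows = [("high", "foreign"), ("high", "foreign"), ("low", "local"),
              ("low", "local"), ("low", "foreign")]
train_labels = ["fraud", "fraud", "valid", "valid", "valid"]
model = train_nb(train_rows, train_labels)
print(predict_nb(model, ("high", "foreign")))  # fraud
print(accuracy(model, train_rows, train_labels))
```

In practice the accuracy would be estimated on held-out test records rather than the training set. The rules would be applied to new tuples only if that accuracy is acceptable.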
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars and Students of related fields of Engineering and Technology.
Hierarchal clustering and similarity measures along with multi representation (eSAT Journals)
Abstract: All clustering methods have to assume some cluster relationship among the data objects to which they are applied. Graph-based document clustering works with frequent senses rather than the frequent keywords used in traditional text mining techniques. Similarity between a pair of objects can be defined either explicitly or implicitly. In this paper, we analyze an existing multi-viewpoint based similarity measure and two related clustering methods. The main difference between a traditional dissimilarity/similarity measure and ours is that the former uses only a single viewpoint, the origin. The latter uses many viewpoints, namely objects assumed not to be in the same cluster as the two objects being measured. Using multiple viewpoints, a more informative assessment of similarity can be achieved. Theoretical analysis and an empirical study are conducted to support this claim. Two criterion functions for document clustering are proposed based on this measure. We compare them with several well-known clustering algorithms that use other popular similarity measures on various document collections, confirming the advantages of our proposal.
Keywords: Multiview Cluster, Document id, ClusterDistance
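The multi-viewpoint idea can be sketched in a few lines. This assumes one common formulation: the similarity of d_i and d_j is the average, over viewpoint documents d_h assumed to lie outside their cluster, of the dot product (d_i - d_h) . (d_j - d_h). With the origin as the only viewpoint, it reduces to the plain dot product, which equals cosine similarity for unit-length vectors. The vectors, names and data are illustrative.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mvs_similarity(di, dj, viewpoints):
    """Average of (di - dh) . (dj - dh) over the viewpoint vectors dh."""
    total = 0.0
    for dh in viewpoints:
        u = [x - h for x, h in zip(di, dh)]
        v = [y - h for y, h in zip(dj, dh)]
        total += dot(u, v)
    return total / len(viewpoints)


di, dj = (1.0, 0.0), (0.0, 1.0)
print(mvs_similarity(di, dj, [(0.0, 0.0)]))    # 0.0: single viewpoint at the origin
print(mvs_similarity(di, dj, [(-1.0, -1.0)]))  # 4.0: seen from a distant viewpoint
```

Two documents that look unrelated from the origin can appear similar when viewed from objects in other clusters. This is the extra information the multi-viewpoint measure is meant to capture.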
Clustering heterogeneous categorical data using enhanced mini batch K-means ... (IJECEIAES)
Clustering methods in data mining aim to group a set of patterns based on their similarity. In survey data, heterogeneous information is recorded on various data scales, such as nominal, ordinal, binary, and Likert scales. A lack of treatment of heterogeneous data and information leads to loss of information and poor decision-making. Although many similarity measures have been established, solutions for clustering heterogeneous data are still lacking. The recent entropy distance measure seems to provide good results for heterogeneous categorical data; however, it requires many experiments and evaluations. This article presents a framework for heterogeneous categorical data using mini batch k-means with an entropy measure (MBKEM). The framework investigates the effectiveness of the similarity measure in a clustering method for heterogeneous categorical data. Secondary data from a public survey was used. The findings demonstrate that the proposed framework improves clustering quality. MBKEM outperformed other clustering algorithms, with an accuracy of 0.88, a v-measure (VM) of 0.82, an adjusted Rand index (ARI) of 0.87, and a Fowlkes-Mallows index (FMI) of 0.94. The average minimum elapsed time for cluster generation, with varying k, was observed to be 0.26 s. In the future, the proposed solution could help improve the quality of clustering for heterogeneous categorical data problems in many domains.
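The mini-batch k-means component can be sketched as follows, using Sculley-style per-centre learning rates (1 divided by the number of points assigned to that centre so far). This illustration one-hot encodes the categorical survey answers and uses squared Euclidean distance. It does not reproduce the entropy-based distance of MBKEM. The deterministic initialisation (first k distinct rows), batch size, toy data and function names are assumptions.

```python
import random


def one_hot(rows):
    """Encode categorical rows as 0/1 vectors, one column per (feature, value)."""
    columns = sorted({(i, v) for row in rows for i, v in enumerate(row)})
    index = {col: j for j, col in enumerate(columns)}
    vectors = []
    for row in rows:
        vec = [0.0] * len(columns)
        for i, v in enumerate(row):
            vec[index[(i, v)]] = 1.0
        vectors.append(vec)
    return vectors, columns


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x, centers):
    return min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))


def mini_batch_kmeans(X, k, batch_size=4, iters=50, seed=0):
    """Mini-batch k-means with per-centre learning rate 1 / (points seen so far)."""
    rng = random.Random(seed)
    centers = []
    for x in X:  # assumed deterministic initialisation: first k distinct rows
        if x not in centers:
            centers.append(list(x))
        if len(centers) == k:
            break
    counts = [0] * len(centers)
    for _ in range(iters):
        batch = rng.sample(X, min(batch_size, len(X)))
        assigned = [nearest(x, centers) for x in batch]  # assign before updating
        for x, c in zip(batch, assigned):
            counts[c] += 1
            eta = 1.0 / counts[c]
            centers[c] = [(1.0 - eta) * cj + eta * xj for cj, xj in zip(centers[c], x)]
    return centers, [nearest(x, centers) for x in X]


rows = [("red", "small"), ("blue", "large")] * 3
X, columns = one_hot(rows)
centers, labels = mini_batch_kmeans(X, k=2)
print(columns)
print(labels)  # alternating labels: the two answer patterns fall in separate clusters
```

Each update touches only a small batch, which is why mini-batch variants keep cluster generation times low. A different distance, such as an entropy-based one, would replace sq_dist.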
Scaling Down Dimensions and Feature Extraction in Document Repository Classif... (ijdmtaiir)
In this study, a comprehensive evaluation is performed of two supervised feature selection methods for dimensionality reduction: Latent Semantic Indexing (LSI) and Principal Component Analysis (PCA). These are gauged against unsupervised techniques such as fuzzy feature clustering using hard fuzzy C-means (FCM). The main objective of the study is to estimate the relative efficiency of the two supervised techniques against unsupervised fuzzy techniques in reducing the feature space. It is found that clustering using FCM leads to better accuracy in classifying documents than techniques such as LSI and PCA. Results show that clustering the features improves the accuracy of document classification.
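One way fuzzy feature clustering can shrink the feature space is sketched below. Each term is represented by its frequency vector across documents and receives fuzzy c-means memberships in a few feature clusters. Each document is then re-expressed by membership-weighted term frequencies per cluster. The term-cluster centres are given here rather than learned, and the function names and toy matrix are hypothetical. This illustrates the general idea, not the study's exact pipeline.

```python
import math


def memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of vector x in each centre (m > 1)."""
    d = [math.dist(x, c) for c in centers]
    if 0.0 in d:  # exactly on a centre: full membership there
        j = d.index(0.0)
        return [1.0 if k == j else 0.0 for k in range(len(centers))]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** p for dk in d) for dj in d]


def reduce_documents(doc_term, term_centers, m=2.0):
    """Map doc_term[d][t] (term frequencies) to doc_feature[d][c] (cluster features)."""
    n_terms = len(doc_term[0])
    # Each term is described by its column of frequencies across documents.
    term_vectors = [[row[t] for row in doc_term] for t in range(n_terms)]
    U = [memberships(v, term_centers, m) for v in term_vectors]
    return [
        [sum(row[t] * U[t][c] for t in range(n_terms)) for c in range(len(term_centers))]
        for row in doc_term
    ]


doc_term = [[2, 2, 0, 0], [0, 0, 3, 3]]  # 2 documents x 4 terms
print(reduce_documents(doc_term, [(2.0, 0.0), (0.0, 3.0)]))  # [[4.0, 0.0], [0.0, 6.0]]
```

Four term features collapse into two cluster features, and a classifier would then be trained on the reduced vectors.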
Similar to A Novel Clustering Method for Similarity Measuring in Text Documents (20)
A Study on Translucent Concrete Product and Its Properties by Using Optical F... (IJMER)
Translucent concrete is a concrete-based material with light-transmitting properties, obtained by embedding light-conducting optical elements such as optical fibers in the concrete. Light is conducted through the concrete from one end to the other. This produces a light pattern on the other surface, depending on the fiber structure. Optical fibers transmit light so effectively that there is virtually no loss of the light conducted through them. This paper deals with the modeling of such translucent or transparent concrete blocks and panels, their usage, and the advantages they bring to the field. The main purpose is to use sunlight as a light source to reduce the power consumed for illumination. A further aim is to use the optical fibers to sense stress in structures and to use this concrete for architectural purposes in buildings.
Developing Cost Effective Automation for Cotton Seed Delinting (IJMER)
A low-cost automation system for removing lint from cottonseed is to be designed and developed. The setup consists of a stainless steel drum with a stirrer, in which cottonseed with lint is mixed with concentrated sulphuric acid, which burns off the lint. The lint-free cottonseed is then treated with lime water to neutralize its acidity. After washing with water, the cottonseeds are used for agricultural purposes.
Study & Testing Of Bio-Composite Material Based On Munja Fibre (IJMER)
Natural fibre composites such as munja fibre composites have found increasing application in many areas of engineering and technology. This is mainly because they are lightweight and low-cost compared with synthetic fibre composites. The aim of this study is to evaluate mechanical properties, such as the flexural and tensile properties, of reinforced epoxy composites. Munja fibres have recently been used as a substitute material in many weight-critical applications in areas such as aerospace, automotive and other highly demanding industrial sectors. In this study, natural munja fibre composites and munja/fibreglass hybrid composites were fabricated by a combination of hand lay-up and cold-press methods. The present work uses a new variety of munja fibre; its main aim is to extract the neat fibre and characterize its flexural behaviour. The composites are fabricated by reinforcing untreated and treated fibre and are tested for their mechanical properties strictly according to ASTM procedures.
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor) (IJMER)
A hybrid engine is a combination of a Stirling engine, an IC engine and an electric motor, all three connected to a single shaft. The power source of the Stirling engine will be a solar panel. The aim is to run an automobile using this hybrid engine.
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F... (IJMER)
Present-day technology demands eco-friendly developments. In this era, composite materials play a vital role in different fields of engineering and are used as principal materials. Nowadays they are utilized as an important component in engineering. While the importance of composite applications is well known, priority has been given for some time to the use of natural fibres as reinforcement. However, changing from synthetic fibres to natural fibres provides only half-green composites; a fully green composite will be achieved only if the matrix component is also eco-friendly. Keeping this in view, a detailed literature survey has been carried out through various issues of journals related to this field. The material system used is sunnhemp fibre, with epoxy and hardener added for stability and drying of the bio-composites. Various graphs and bar charts are superimposed on each other for comparison, and graphs are plotted in MATLAB and ORIGIN 6.0 software. To determine tensile strengths, various properties of the different bio-composites have been compared among themselves. The behaviour of the bio-composites of this work has also been compared with other works. The bio-composites developed in this work are likely to find applications in false ceilings, partitions, biodegradable packaging, automotive interiors, sports goods (e.g. rackets, nets, etc.), toys, etc.
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu... (IJMER)
The greenstone belts of Karnataka in the Dharwar craton are enriched in banded iron formations (BIFs). Here the iron formations are confined to the basin shelf, clearly separated from the deeper-water iron formations that accumulated at the basin margin and flanking the marine basin. Geochemical data for major, trace and rare earth elements (REE) are plotted in various diagrams to interpret the genesis of the BIFs. Al2O3, Fe2O3 (T), TiO2, CaO and SiO2 abundances and ratios show a wide variation. Ni, Co, Zr, Sc, V, Rb, Sr, U, Th, ΣREE, La, Ce and Eu anomalies and their binary relationships indicate that wherever the terrigenous component increased, the concentration of felsic elements such as Zr and Hf went up. Elevated concentrations of Ni, Co and Sc are contributed by chlorite and other components characteristic of basic volcanic debris. The data suggest that these formations were generated by chemical and clastic sedimentary processes on a shallow shelf. During transgression, chemical precipitation took place at the sediment-water interface. At the time of regression, by contrast, iron ore formed with sedimentary structures and textures in the Kammatturu area, in a setting where the water column was oxygenated.
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in... (IJMER)
In this paper, the mechanical characteristics of C45 medium carbon steel are investigated under various working conditions. The main characteristic studied in this paper is the impact toughness of the material in different configurations, and the experiments were carried out on Charpy impact testing equipment. This study reveals the ability of the material to absorb energy up to failure for various specimen configurations under different heat-treated conditions. The corresponding results were compared with the analysis outcome.
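The energy absorbed in a Charpy test follows from the pendulum's loss of potential energy. Assume the release angle alpha and the rise angle beta after fracture are measured from the pendulum's vertical rest position, and ignore friction and windage losses. The absorbed energy is then E = m g L (cos beta - cos alpha), where g is taken as approximately 9.81 m/s^2. The machine parameters below are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2 (approximate)


def charpy_energy(mass_kg, arm_m, release_deg, rise_deg):
    """Energy (J) absorbed by the specimen from the pendulum's height loss."""
    h_release = arm_m * (1.0 - math.cos(math.radians(release_deg)))
    h_rise = arm_m * (1.0 - math.cos(math.radians(rise_deg)))
    return mass_kg * G * (h_release - h_rise)


# Hypothetical machine: 20 kg striker on a 0.8 m arm, released at 140 degrees.
for rise in (120.0, 90.0, 60.0):
    print(rise, round(charpy_energy(20.0, 0.8, 140.0, rise), 1))
```

A lower rise angle after fracture means more energy was absorbed, i.e. a tougher specimen condition.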
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A... (IJMER)
Robot guns are increasingly employed in automotive manufacturing to take over risky jobs and to increase productivity. Using a single robot for a single operation proves to be expensive, so for cost optimization multiple guns are mounted on a single robot and multiple operations are performed. A robot gun structure is an efficient way to perform multiple welds simultaneously. However, mounting several weld guns on a single structure induces a variety of dynamic loads, especially as the robot arm maneuvers to reach the weld locations. The primary idea employed in this paper is to model those dynamic loads as equivalent G-force loads in FEA. This approach is on the conservative side, saves time and is consequently cost-efficient. The paper aims to create a standard operating procedure for the analysis of such structures. It emphasizes deploying various technical aspects of FEA such as nonlinear geometry, the multipoint constraint contact algorithm and multizone meshing.
Static Analysis of Go-Kart Chassis by Analytical and Solid Works SimulationIJMER
This paper aims to do modelling, simulation and performing the static analysis of a go
kart chassis consisting of Circular beams. Modelling, simulations and analysis are performed using 3-D
modelling software i.e. Solid Works and ANSYS according to the rulebook provided by Indian Society of
New Era Engineers (ISNEE) for National Go Kart Championship (NGKC-14).The maximum deflection is
determined by performing static analysis. Computed results are then compared to analytical calculation,
where it is found that the location of maximum deflection agrees well with theoretical approximation but
varies on magnitude aspect.
In récent year various vehicle introduced in market but due to limitation in
carbon émission and BS Séries limitd speed availability vehicle in the market and causing of
environnent pollution over few year There is need to decrease dependancy on fuel vehicle.
bicycle is to be modified for optional in the future To implement new technique using change in
pedal assembly and variable speed gearbox such as planetary gear optimise speed of vehicle
with variable speed ratio.To increase the efficiency of bicycle for confortable drive and to
reduce torque appli éd on bicycle. we introduced epicyclic gear box in which transmission done
throgh Chain Drive (i.e. Sprocket )to rear wheel with help of Epicyclical gear Box to give
number of différent Speed during driving.To reduce torque requirent in the cycle with change in
the pedal mechanism
Integration of Struts & Spring & Hibernate for Enterprise ApplicationsIJMER
The proposal of this paper is to present Spring Framework which is widely used in
developing enterprise applications. Considering the current state where applications are developed using
the EJB model, Spring Framework assert that ordinary java beans(POJO) can be utilize with minimal
modifications. This modular framework can be used to develop the application faster and can reduce
complexity. This paper will highlight the design overview of Spring Framework along with its features that
have made the framework useful. The integration of multiple frameworks for an E-commerce system has
also been addressed in this paper. This paper also proposes structure for a website based on integration of
Spring, Hibernate and Struts Framework.
Microcontroller Based Automatic Sprinkler Irrigation SystemIJMER
Microcontroller based Automatic Sprinkler System is a new concept of using
intelligence power of embedded technology in the sprinkler irrigation work. Designed system replaces
the conventional manual work involved in sprinkler irrigation to automatic process. Using this system a
farmer is protected against adverse inhuman weather conditions, tedious work of changing over of
sprinkler water pipe lines & risk of accident due to high pressure in the water pipe line. Overall
sprinkler irrigation work is transformed in to a comfortableautomatic work. This system provides
flexibility & accuracy in respect of time set for the operation of a sprinkler water pipe lines. In present
work the author has designed and developed an automatic sprinkler irrigation system which is
controlled and monitored by a microcontroller interfaced with solenoid valves.
On some locally closed sets and spaces in Ideal Topological SpacesIJMER
In this paper we introduce and characterize some new generalized locally closed sets
known as
δ
ˆ
s-locally closed sets and spaces are known as
δ
ˆ
s-normal space and
δ
ˆ
s-connected space and
discussed some of their properties
Intrusion Detection and Forensics based on decision tree and Association rule...IJMER
This paper present an approach based on the combination of, two techniques using
decision tree and Association rule mining for Probe attack detection. This approach proves to be
better than the traditional approach of generating rules for fuzzy expert system by clustering methods.
Association rule mining for selecting the best attributes together and decision tree for identifying the
best parameters together to create the rules for fuzzy expert system. After that rules for fuzzy expert
system are generated using association rule mining and decision trees. Decision trees is generated for
dataset and to find the basic parameters for creating the membership functions of fuzzy inference
system. Membership functions are generated for the probe attack. Based on these rules we have
created the fuzzy inference system that is used as an input to neuro-fuzzy system. Fuzzy inference
system is loaded to neuro-fuzzy toolbox as an input and the final ANFIS structure is generated for
outcome of neuro-fuzzy approach. The experiments and evaluations of the proposed method were
done with NSL-KDD intrusion detection dataset. As the experimental results, the proposed approach
based on the combination of, two techniques using decision tree and Association rule mining
efficiently detected probe attacks. Experimental results shows better results for detecting intrusions as
compared to others existing methods
Natural Language Ambiguity and its Effect on Machine LearningIJMER
"Natural language processing" here refers to the use and ability of systems to process
sentences in a natural language such as English, rather than in a specialized artificial computer
language such as C++. The systems of real interest here are digital computers of the type we think of as
personal computers and mainframes. Of course humans can process natural languages, but for us the
question is whether digital computers can or ever will process natural languages. We have tried to
explore in depth and break down the types of ambiguities persistent throughout the natural languages
and provide an answer to the question “How it affects the machine translation process and thereby
machine learning as whole?” .
Today in era of software industry there is no perfect software framework available for
analysis and software development. Currently there are enormous number of software development
process exists which can be implemented to stabilize the process of developing a software system. But no
perfect system is recognized till yet which can help software developers for opting of best software
development process. This paper present the framework of skillful system combined with Likert scale. With
the help of Likert scale we define a rule based model and delegate some mass score to every process and
develop one tool name as MuxSet which will help the software developers to select an appropriate
development process that may enhance the probability of system success.
Material Parameter and Effect of Thermal Load on Functionally Graded CylindersIJMER
The present study investigates the creep in a thick-walled composite cylinders made
up of aluminum/aluminum alloy matrix and reinforced with silicon carbide particles. The distribution
of SiCp is assumed to be either uniform or decreasing linearly from the inner to the outer radius of
the cylinder. The creep behavior of the cylinder has been described by threshold stress based creep
law with a stress exponent of 5. The composite cylinders are subjected to internal pressure which is
applied gradually and steady state condition of stress is assumed. The creep parameters required to
be used in creep law, are extracted by conducting regression analysis on the available experimental
results. The mathematical models have been developed to describe steady state creep in the composite
cylinder by using von-Mises criterion. Regression analysis is used to obtain the creep parameters
required in the study. The basic equilibrium equation of the cylinder and other constitutive equations
have been solved to obtain creep stresses in the cylinder. The effect of varying particle size, particle
content and temperature on the stresses in the composite cylinder has been analyzed. The study
revealed that the stress distributions in the cylinder do not vary significantly for various combinations
of particle size, particle content and operating temperature except for slight variation observed for
varying particle content. Functionally Graded Materials (FGMs) emerged and led to the development
of superior heat resistant materials.
Energy Audit is the systematic process for finding out the energy conservation
opportunities in industrial processes. The project carried out studies on various energy conservation
measures application in areas like lighting, motors, compressors, transformer, ventilation system etc.
In this investigation, studied the technical aspects of the various measures along with its cost benefit
analysis.
Investigation found that major areas of energy conservation are-
1. Energy efficient lighting schemes.
2. Use of electronic ballast instead of copper ballast.
3. Use of wind ventilators for ventilation.
4. Use of VFD for compressor.
5. Transparent roofing sheets to reduce energy consumption.
So Energy Audit is the only perfect & analyzed way of meeting the Industrial Energy Conservation.
An Implementation of I2C Slave Interface using Verilog HDLIJMER
The focus of this paper is on implementation of Inter Integrated Circuit (I2C) protocol
following slave module for no data loss. In this paper, the principle and the operation of I2C bus protocol
will be introduced. It follows the I2C specification to provide device addressing, read/write operation and
an acknowledgement. The programmable nature of device provide users with the flexibility of configuring
the I2C slave device to any legal slave address to avoid the slave address collision on an I2C bus with
multiple slave devices. This paper demonstrates how I2C Master controller transmits and receives data to
and from the Slave with proper synchronization.
The module is designed in Verilog and simulated in ModelSim. The design is also synthesized in Xilinx
XST 14.1. This module acts as a slave for the microprocessor which can be customized for no data loss.
Discrete Model of Two Predators competing for One PreyIJMER
This paper investigates the dynamical behavior of a discrete model of one prey two
predator systems. The equilibrium points and their stability are analyzed. Time series plots are obtained
for different sets of parameter values. Also bifurcation diagrams are plotted to show dynamical behavior
of the system in selected range of growth parameter
www.ijmer.com
International Journal of Modern Engineering Research (IJMER)
Vol. 3, Issue. 5, Sep - Oct. 2013 pp-2823-2826
ISSN: 2249-6645

A Novel Clustering Method for Similarity Measuring in Text Documents

Preethi Priyanka Thella (1), G. Sridevi (2)
(1) M.Tech, Nimra College of Engineering & Technology, Vijayawada, A.P., India.
(2) Assoc. Professor, Dept. of CSE, Nimra College of Engineering & Technology, Vijayawada, A.P., India.
ABSTRACT: Clustering is the process of grouping data into subsets in such a manner that similar instances are collected together, while dissimilar instances belong to different groups. The instances are thereby organized into an efficient representation that characterizes the population being sampled. A common approach is to treat clustering as an optimization process: a best partition is found by optimizing a particular function of similarity, or distance, among the data. Underlying this approach is the implicit assumption that the true inherent structure of the data can be correctly described by the similarity formula defined and fixed in the clustering criterion. In this paper, we introduce clustering with multiple viewpoints based on different similarity measures. The multi-viewpoint approach to learning is one in which we have ‘views’ of the data (sometimes in a rather abstract sense), and the goal is to use the relationship between these views to alleviate the difficulty of a learning problem of interest.
Keywords: Clustering, Text mining, Similarity measure, Viewpoint.
I. INTRODUCTION
Clustering [1], or cluster analysis, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups (clusters). It is a main task of exploratory data mining and a common technique for statistical data analysis used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics. Cluster analysis itself is not one specific algorithm or procedure, but the general task to be solved. It can be achieved by various algorithms that differ significantly in their notion of what constitutes a cluster and how to find clusters efficiently. Popular notions of clusters include groups with small distances among the cluster members, dense areas of the data space, intervals, or particular statistical distributions. Clustering can therefore be formulated as a multi-objective optimization problem.
The appropriate clustering algorithm and parameter settings, including values such as the distance function to use, a density threshold, or the number of expected clusters, depend on the individual data set and the intended use of the results. Clustering as such is not an automatic task, but an iterative process of knowledge discovery or interactive multi-objective optimization that involves trial and error. It will often be necessary to modify parameters and preprocessing until the result achieves the desired properties. Cluster analysis can be considered the most important unsupervised learning problem; like every other problem of this kind, it deals with finding a structure in a collection of unlabeled data. A loose definition of clustering could be “the process of organizing objects into groups whose members are similar in some way”. A cluster is therefore a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters. Figure 1 shows the clustering process.
Figure 1: Clustering Process
In this case, we can easily identify the four clusters into which the data can be divided; the similarity criterion is distance: two or more objects belong to the same cluster if they are “close” according to a given distance (in this case, geometrical distance). This is called distance-based clustering. Another kind of clustering is conceptual clustering: two or more objects belong to the same cluster if the cluster defines a concept common to all of those objects. In other words, objects are grouped according to their fit to descriptive concepts, not according to simple similarity measures. The multi-viewpoint approach to learning is one in which we have ‘views’ of the data (sometimes in a rather abstract sense), and the goal is to use the relationship between these views to alleviate the difficulty of a learning problem of interest.
II. RELATED WORK
Text clustering is required in real-world applications such as web search engines. It is part of the text mining process and is meant for grouping text documents into clusters, which are then used by applications such as search engines. A text document is treated as an object, and a word in the document is referred to as a term. A vector is built to represent each text document; m denotes the total number of terms, which is the dimension of these vectors. A weighting scheme such as Term Frequency – Inverse Document Frequency (TF-IDF) is commonly used to build the document vectors. There are many approaches to text document clustering, including probabilistic methods [2], nonnegative matrix factorization [3], and information-theoretic co-clustering [4]. These approaches do not rely on a particular measure of similarity between text documents. In this paper, we use a multi-viewpoint similarity measure for finding the similarity. In the literature, a widely used measure in text document clustering is the Euclidean distance (ED).
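As a rough illustration of the vector representation described above, the sketch below builds TF-IDF vectors for a toy corpus. It assumes documents are already tokenised into lists of terms and uses raw term frequency with idf = log(n / df), normalised to unit length; this is one common TF-IDF variant, not necessarily the exact weighting used in this paper, and the function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, unit-length TF-IDF vectors) for tokenised documents.

    Assumption: tf is the raw count and idf = log(n / df); other variants exist.
    """
    n = len(docs)
    vocab = sorted({term for doc in docs for term in doc})  # m = len(vocab) terms
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    idf = {term: math.log(n / df[term]) for term in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[term] * idf[term] for term in vocab]
        norm = math.sqrt(sum(x * x for x in vec))
        # Normalise to unit length so that a dot product equals cosine similarity.
        vectors.append([x / norm for x in vec] if norm > 0 else vec)
    return vocab, vectors


docs = [["cluster", "text", "text"], ["cluster", "graph"], ["graph", "cut"]]
vocab, vectors = tfidf_vectors(docs)
print(vocab)  # ['cluster', 'cut', 'graph', 'text']
```

Each vector here has m = 4 components, one per distinct term; under this idf variant, a term that occurs in every document would receive zero weight.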
The K-Means algorithm is the most widely used clustering algorithm because of its simplicity and ease of use. It uses the Euclidean distance to measure the distance between objects when grouping them into clusters. In this case, the centroid $C_r$ of a cluster $S_r$ containing $n_r$ documents is computed as follows:

$$C_r = \frac{1}{n_r} \sum_{d_i \in S_r} d_i$$
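The following is a minimal pure-Python sketch of K-Means with Euclidean distance, in which each centroid is recomputed as the mean of the documents assigned to it. Seeding by random sampling, the iteration cap, and the function names are assumptions for illustration rather than details taken from this paper.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance (ED) between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Mean of the points in a cluster: (1 / n_r) * sum of its members."""
    n_r = len(points)
    return [sum(coords) / n_r for coords in zip(*points)]


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means; returns (labels, centroids). Random seeding is an assumption."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda r: euclidean(p, centers[r]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: recompute centroids; an emptied cluster keeps its old centroid.
        for r in range(k):
            members = [p for p, lab in zip(points, labels) if lab == r]
            if members:
                centers[r] = centroid(members)
    return labels, centers


labels, centers = kmeans([[0, 0], [0, 1], [10, 10], [10, 11]], k=2)
```

The two well-separated pairs end up in different clusters; which cluster receives label 0 depends on the random seeds.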
Another similarity measure used for text document mining is the cosine similarity measure, which is particularly useful for high-dimensional documents [5]. This measure is also used in Spherical K-Means, a variant of the K-Means algorithm.
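To illustrate the contrast between direction and magnitude that this section draws, the sketch below compares cosine similarity with Euclidean distance on two vectors that point in the same direction but differ in length. The `unit_centroid` helper reflects the common Spherical K-Means practice of rescaling each cluster's mean vector to unit length; all helper names are hypothetical.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine(a, b):
    """Cosine of the angle between a and b: depends on direction only."""
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def euclidean(a, b):
    """Euclidean distance: sensitive to vector magnitudes."""
    return norm([x - y for x, y in zip(a, b)])


def unit_centroid(vectors):
    """Mean vector rescaled to unit length (Spherical K-Means style centroid)."""
    mean = [sum(col) / len(vectors) for col in zip(*vectors)]
    length = norm(mean)
    return [x / length for x in mean]


short_doc, long_doc = [1.0, 2.0], [10.0, 20.0]  # same direction, 10x the length
print(round(cosine(short_doc, long_doc), 6))     # 1.0
print(round(euclidean(short_doc, long_doc), 3))  # 20.125
```

Cosine similarity treats the two vectors as identical, while ED reports them as far apart, which is why the cosine-based variant is usually preferred when document length should not matter.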
The difference between the two flavors of the K-Means algorithm, using the cosine similarity measure and the ED measure respectively, is that the former focuses on vector directions while the latter focuses on vector magnitudes. Graph partitioning is another popular approach. It considers the text document corpus as a graph and uses the min-max cut algorithm, which represents the centroid as follows:
CLUTO [6] is a software package for document clustering that uses the graph partitioning approach: it builds a nearest-neighbor graph and clusters the text documents based on that graph. It is based on the (extended) Jaccard coefficient, which is computed as follows:

$$\mathrm{EJ}(d_i, d_j) = \frac{d_i^{t} d_j}{\|d_i\|^2 + \|d_j\|^2 - d_i^{t} d_j}$$
The Jaccard coefficient uses both magnitude and direction, which is not the case with the Euclidean distance and cosine similarity. However, it is similar to cosine similarity when the documents are represented as unit vectors. In [7], the Jaccard coefficient and Pearson correlation are compared, and the study concludes that both are well suited to clustering web documents. For text document clustering, other approaches, namely phrase-based and concept-based approaches, can also be used. A phrase-based approach has been proposed in the literature, while [8] presents a tree-similarity-based approach. The common procedure used by both is Hierarchical Agglomerative Clustering. The drawback of these approaches is that their computational cost is too high. There are also measures for clustering XML documents; one such measure, called “Structural Similarity”, differs from those used in text document clustering. This paper focuses on a new multi-viewpoint based similarity measure for text clustering.
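For real-valued document vectors, the Jaccard coefficient discussed above is usually taken to be the extended Jaccard coefficient $EJ(a, b) = a^t b / (\|a\|^2 + \|b\|^2 - a^t b)$. The sketch below assumes that form (the function names are hypothetical) and checks the two properties claimed in the text: it reacts to magnitude, and for unit vectors it becomes a monotone function of cosine similarity, cos / (2 − cos).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def extended_jaccard(a, b):
    """EJ(a, b) = a.b / (|a|^2 + |b|^2 - a.b); undefined if both vectors are zero."""
    ab = dot(a, b)
    return ab / (dot(a, a) + dot(b, b) - ab)


# Magnitude matters: parallel vectors of different lengths score below 1.
print(round(extended_jaccard([1.0, 2.0], [2.0, 4.0]), 4))  # 0.6667

# For unit vectors, EJ = cos / (2 - cos), so it ranks pairs exactly like cosine.
u, v = [1.0, 0.0], [0.6, 0.8]
cos_uv = dot(u, v)  # both already unit length, so the dot product is the cosine
print(math.isclose(extended_jaccard(u, v), cos_uv / (2 - cos_uv)))  # True
```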
III. PROPOSED WORK
In the proposed work, our approach to finding the similarity between documents or objects while performing clustering is multi-viewpoint based similarity. Unlike existing algorithms for text document clustering, it uses more than one point of reference. In our approach, the similarity between two documents $d_i$ and $d_j$ in the same cluster $S_r$ is calculated as follows:
$$\mathrm{sim}(d_i, d_j)\Big|_{d_i, d_j \in S_r} = \frac{1}{n - n_r} \sum_{d_h \in S \setminus S_r} \mathrm{sim}(d_i - d_h,\; d_j - d_h)$$
Consider two points $d_i$ and $d_j$ in cluster $S_r$. The similarity between these two points is viewed from a point $d_h$ outside the cluster. This similarity, $\mathrm{sim}(d_i - d_h, d_j - d_h) = (d_i - d_h)^{t}(d_j - d_h)$, equals the product of the cosine of the angle between $d_i$ and $d_j$ as seen from $d_h$ and the Euclidean distances from $d_h$ to each of the two points. The definition is based on the assumption that $d_h$ is not in the same cluster as $d_i$ and $d_j$; the smaller the distances between $d_h$ and these points, the higher the chance that $d_h$ is actually in the same cluster. Although using various viewpoints helps increase the accuracy of the similarity measure, some viewpoints may give a misleading (negative) result. However, this drawback can be neglected provided there are plenty of documents to be clustered.
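A direct, unoptimised sketch of the multi-viewpoint similarity defined above follows. It takes sim(x, y) to be the dot product $x^t y$, consistent with the cosine-times-distances description, and averages it over every viewpoint $d_h$ outside the cluster; the function names and toy vectors are assumptions for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mvs_similarity(d_i, d_j, outside):
    """Average of (d_i - d_h).(d_j - d_h) over viewpoints d_h outside the cluster S_r."""
    if not outside:
        raise ValueError("MVS needs at least one viewpoint outside the cluster")
    total = 0.0
    for d_h in outside:
        u = [x - h for x, h in zip(d_i, d_h)]  # d_i as seen from d_h
        v = [y - h for y, h in zip(d_j, d_h)]  # d_j as seen from d_h
        total += dot(u, v)  # equals cos(angle at d_h) * |u| * |v|
    return total / len(outside)


# d_i and d_j share a cluster; the two other unit vectors act as viewpoints.
d_i, d_j = [1.0, 0.0], [0.8, 0.6]
outside = [[0.0, 1.0], [-1.0, 0.0]]
score = mvs_similarity(d_i, d_j, outside)  # approximately 2.4
```

This costs O(n) per pair, which is why a matrix-building procedure that reuses one composite vector per class is preferable for a whole corpus.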
Next, we carry out a validity test for the cosine similarity (CS) and multi-viewpoint based similarity (MVS) measures as follows. For each type of similarity measure, a similarity matrix $A = \{a_{ij}\}_{n \times n}$ is created. For CS this is simple, as $a_{ij} = d_i^{t} d_j$. The algorithm for building the MVS matrix is described in Algorithm 1.
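A short derivation helps connect the MVS definition to the matrix entries. Assuming, as the CS formula $a_{ij} = d_i^t d_j$ suggests, that every document vector has unit length, and writing $D_{S \setminus S_r} = \sum_{d_h \in S \setminus S_r} d_h$ for the composite of the $n - n_r$ documents outside class $r$, the average over viewpoints expands as:

```latex
$$
\begin{aligned}
\frac{1}{n-n_r}\sum_{d_h \in S \setminus S_r} (d_i - d_h)^t (d_j - d_h)
&= d_i^t d_j - \frac{d_i^t D_{S \setminus S_r}}{n-n_r} - \frac{d_j^t D_{S \setminus S_r}}{n-n_r}
   + \frac{1}{n-n_r}\sum_{d_h \in S \setminus S_r} d_h^t d_h \\
&= d_i^t d_j - \frac{d_i^t D_{S \setminus S_r}}{n-n_r} - \frac{d_j^t D_{S \setminus S_r}}{n-n_r} + 1
\end{aligned}
$$
```

The last step uses $d_h^t d_h = 1$ for every unit vector, so only one composite vector per class is needed to fill the whole matrix.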
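ALGORITHM 1: BUILDMVSMATRIX(A)
Step 1: for r ← 1 : c do
Step 2:     $D_{S \setminus S_r} \leftarrow \sum_{d_i \notin S_r} d_i$
Step 3:     $n_{S \setminus S_r} \leftarrow |S \setminus S_r|$
Step 4: end for
Step 5: for i ← 1 : n do
Step 6:     r ← class of $d_i$
Step 7:     for j ← 1 : n do
Step 8:         if $d_j \in S_r$ then
Step 9:             $a_{ij} \leftarrow d_i^{t} d_j - d_i^{t} \frac{D_{S \setminus S_r}}{n_{S \setminus S_r}} - d_j^{t} \frac{D_{S \setminus S_r}}{n_{S \setminus S_r}} + 1$
Step 10:        else
Step 11:            $a_{ij} \leftarrow d_i^{t} d_j - d_i^{t} \frac{D_{S \setminus S_r} - d_j}{n_{S \setminus S_r} - 1} - d_j^{t} \frac{D_{S \setminus S_r} - d_j}{n_{S \setminus S_r} - 1} + 1$
Step 12:        end if
Step 13:    end for
Step 14: end for
Step 15: return $A = \{a_{ij}\}_{n \times n}$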
First, the outer composite vector with respect to each class, i.e. the sum of all documents outside that class, is determined. Then, for each row $a_i$ of $A$, $i = 1, \ldots, n$, if the pair of text documents $d_i$ and $d_j$, $j = 1, \ldots, n$, are in the same class, $a_{ij}$ is calculated as in line 9. Otherwise, $d_j$ is assumed to be in $d_i$'s class, and $a_{ij}$ is calculated as shown in line 11.
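Algorithm 1 can be sketched in Python as follows, assuming unit-length document vectors and integer class labels; `build_mvs_matrix` is a hypothetical name. The same-class branch uses the outer composite of the class directly, while the cross-class branch removes $d_j$ from the viewpoints because $d_j$ is treated as if it belonged to $d_i$'s class.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def build_mvs_matrix(docs, labels):
    """Sketch of BUILDMVSMATRIX for unit-length vectors; returns A as a list of rows."""
    n = len(docs)
    # Outer composite D_{S\S_r} (sum of documents outside class r) and its size.
    outer_sum, outer_n = {}, {}
    for r in set(labels):
        outside = [d for d, lab in zip(docs, labels) if lab != r]
        if len(outside) < 2:
            raise ValueError("each class needs at least two documents outside it")
        outer_sum[r] = [sum(col) for col in zip(*outside)]
        outer_n[r] = len(outside)

    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        r = labels[i]
        D, m = outer_sum[r], outer_n[r]
        for j in range(n):
            if labels[j] == r:  # same class: every outside document is a viewpoint
                A[i][j] = (dot(docs[i], docs[j]) - dot(docs[i], D) / m
                           - dot(docs[j], D) / m + 1)
            else:  # d_j treated as a member of d_i's class: drop it from the viewpoints
                D_j = [x - y for x, y in zip(D, docs[j])]
                A[i][j] = (dot(docs[i], docs[j]) - dot(docs[i], D_j) / (m - 1)
                           - dot(docs[j], D_j) / (m - 1) + 1)
    return A


docs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0], [-1.0, 0.0]]
labels = [0, 0, 1, 1]
A = build_mvs_matrix(docs, labels)
```

For this toy input, `A[0][1]` matches the direct viewpoint-by-viewpoint average (2.4), while `A[0][2]` (2.0) is computed with $d_2$ removed from the viewpoints.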
After matrix “A” is formed, the code in Algorithm 2 is used to get its validity score:
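ALGORITHM 2: GETVALIDITY(validity, A, percentage)
Step 1: for r ← 1 : c do
Step 2:     $q_r \leftarrow \lfloor \text{percentage} \times n_r \rfloor$
Step 3:     if $q_r = 0$ then
Step 4:         $q_r \leftarrow 1$
Step 5:     end if
Step 6: end for
Step 7: for i ← 1 : n do
Step 8:     $\{a_{iv[1]}, \ldots, a_{iv[n]}\} \leftarrow \text{Sort}\{a_{i1}, \ldots, a_{in}\}$
Step 9:         s.t. $a_{iv[1]} \ge a_{iv[2]} \ge \ldots \ge a_{iv[n]}$, $\{v[1], \ldots, v[n]\} \leftarrow \text{permute}\{1, \ldots, n\}$
Step 10:    r ← class of $d_i$
Step 11:    $\text{validity}(d_i) \leftarrow \frac{|\{d_{v[1]}, \ldots, d_{v[q_r]}\} \cap S_r|}{q_r}$
Step 12: end for
Step 13: $\text{validity} \leftarrow \frac{1}{n} \sum_{i=1}^{n} \text{validity}(d_i)$
Step 14: return validity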
For each document $d_i$ corresponding to row $a_i$ of matrix $A$, we select the $q_r$ documents closest to $d_i$. The value of $q_r$ is chosen as a percentage of the size of the class $r$ that contains $d_i$, where percentage $\in (0, 1]$. Then, the validity with respect to $d_i$ is calculated as the fraction of these $q_r$ documents that have the same class label as $d_i$, as shown in line 11. The final validity is determined by averaging over all the rows of matrix $A$, as shown in line 13. The validity score is therefore bounded between 0 and 1. The higher the validity score of a similarity measure, the more suitable it should be for the clustering process.
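A compact Python sketch of Algorithm 2 follows; `get_validity` is a hypothetical name. As in the pseudocode, the sorted row includes $a_{ii}$ itself, so a document can count among its own nearest neighbours. Ties are broken by column order, which is an assumption the pseudocode does not specify.

```python
import math
from collections import Counter


def get_validity(A, labels, percentage):
    """Average fraction of each document's q_r most similar documents sharing its class."""
    if not 0 < percentage <= 1:
        raise ValueError("percentage must lie in (0, 1]")
    n = len(A)
    # q_r = floor(percentage * n_r), but at least 1, for every class r.
    q = {r: max(1, math.floor(percentage * n_r)) for r, n_r in Counter(labels).items()}
    total = 0.0
    for i in range(n):
        r = labels[i]
        # Column indices of row i, most similar first (stable sort keeps ties in order).
        order = sorted(range(n), key=lambda j: A[i][j], reverse=True)
        nearest = order[:q[r]]
        total += sum(1 for j in nearest if labels[j] == r) / q[r]
    return total / n


labels = [0, 0, 1, 1]
good = [[1.0, 0.9, 0.1, 0.0], [0.9, 1.0, 0.2, 0.1],
        [0.1, 0.2, 1.0, 0.8], [0.0, 0.1, 0.8, 1.0]]
mixed = [[1.0, 0.1, 0.9, 0.0], [0.1, 1.0, 0.0, 0.9],
         [0.9, 0.0, 1.0, 0.1], [0.0, 0.9, 0.1, 1.0]]
print(get_validity(good, labels, 1.0), get_validity(mixed, labels, 1.0))  # 1.0 0.5
```

A matrix that places same-class documents closest scores 1.0, while one that mixes classes among the nearest neighbours scores lower.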
IV. INCREMENTAL CLUSTERING ALGORITHM
The main goal of this algorithm is to perform text document clustering by optimizing the criterion functions $I_R$ and $I_V$, as shown below:
With this general form, the incremental optimization algorithm, which has two major steps Initialization and Refinement, is
shown in Algorithm 3 and Algorithm 4.
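ALGORITHM 3: INITIALIZATION
Step 1: Select k seeds $s_1, \ldots, s_k$ randomly
Step 2: $\text{cluster}[d_i] \leftarrow p = \arg\max_r \{ s_r^{t} d_i \}, \; \forall i = 1, \ldots, n$
Step 3: $D_r \leftarrow \sum_{d_i \in S_r} d_i, \; n_r \leftarrow |S_r|, \; \forall r = 1, \ldots, k$
Step 4: end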
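ALGORITHM 4: REFINEMENT
Step 1: repeat
Step 2:     $\{v[1:n]\} \leftarrow$ random permutation of $\{1, \ldots, n\}$
Step 3:     for j ← 1 : n do
Step 4:         i ← v[j]
Step 5:         p ← cluster[$d_i$]
Step 6:         $\Delta I_p \leftarrow I(n_p - 1, D_p - d_i) - I(n_p, D_p)$
Step 7:         $q \leftarrow \arg\max_{r \ne p} \{ I(n_r + 1, D_r + d_i) - I(n_r, D_r) \}$
Step 8:         $\Delta I_q \leftarrow I(n_q + 1, D_q + d_i) - I(n_q, D_q)$
Step 9:         if $\Delta I_p + \Delta I_q > 0$ then
Step 10:            Move $d_i$ to cluster q: cluster[$d_i$] ← q
Step 11:            Update $D_p, n_p, D_q, n_q$
Step 12:        end if
Step 13:    end for
Step 14: until no move for all n documents
Step 15: end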
At Initialization, k arbitrary documents are selected as seeds from which the initial partitions are formed. Refinement consists of a number of iterations. During each iteration, the n text documents are visited one by one in a totally random order. Each text document is checked to see whether moving it to another cluster improves the objective function. If so, the document is moved to the cluster that leads to the highest improvement. If no cluster is better than the current one, the document is not moved. The clustering process terminates when an iteration completes without any text document being moved to a new cluster.
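Because the criterion functions $I_R$ and $I_V$ are not reproduced in this copy, the sketch below substitutes a stand-in per-cluster objective, $I(n_r, D_r) = \|D_r\|$ (the spherical K-Means criterion), purely to show the Initialization and Refinement mechanics: random seeds, assignment to the most similar seed, then randomly ordered single-document moves that are accepted only when they increase the total objective. The objective, the function names, and the rule that a cluster is never emptied are all assumptions, not the paper's method.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def objective(n_r, D_r):
    """Stand-in per-cluster criterion I(n_r, D_r) = ||D_r||; NOT the paper's I_R or I_V."""
    return math.sqrt(dot(D_r, D_r))


def shifted(D, d, sign):
    """Return D + sign * d (used to add or remove a document from a composite)."""
    return [x + sign * y for x, y in zip(D, d)]


def incremental_clustering(docs, k, seed=0, max_passes=100):
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    n, dim = len(docs), len(docs[0])
    # Initialization: k random seeds; each document joins the most similar seed.
    seeds = rng.sample(docs, k)
    cluster = [max(range(k), key=lambda r: dot(seeds[r], d)) for d in docs]
    D = [[0.0] * dim for _ in range(k)]  # composite vector D_r of each cluster
    size = [0] * k                       # n_r of each cluster
    for d, c in zip(docs, cluster):
        D[c] = shifted(D[c], d, 1)
        size[c] += 1
    # Refinement: visit documents in random order until a full pass makes no move.
    for _ in range(max_passes):
        moved = False
        for i in rng.sample(range(n), n):
            d, p = docs[i], cluster[i]
            if size[p] == 1:
                continue  # assumption: never empty a cluster
            delta_p = objective(size[p] - 1, shifted(D[p], d, -1)) - objective(size[p], D[p])
            gains = {r: objective(size[r] + 1, shifted(D[r], d, 1)) - objective(size[r], D[r])
                     for r in range(k) if r != p}
            q = max(gains, key=gains.get)  # cluster with the highest improvement
            if delta_p + gains[q] > 1e-12:
                cluster[i] = q
                D[p], size[p] = shifted(D[p], d, -1), size[p] - 1
                D[q], size[q] = shifted(D[q], d, 1), size[q] + 1
                moved = True
        if not moved:
            break
    return cluster


docs = [[1.0, 0.0], [0.96, 0.28], [0.0, 1.0], [0.28, 0.96]]
labels = incremental_clustering(docs, k=2)
```

Every accepted move strictly increases the total objective, so the refinement loop terminates; `max_passes` is only a safety cap. Plugging in a different per-cluster criterion requires changing only `objective`.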
V. CONCLUSION
From a data engineering point of view, a cluster is a group of objects of a similar nature, and the grouping mechanism is called the clustering process. Similar text documents are grouped together in a cluster if their cosine similarity exceeds a specified threshold. In this paper, we mainly focus on viewpoints and introduce a novel multi-viewpoint based similarity measure for text mining. The nature of the similarity measure plays a very important role in the success or failure of a clustering method. From the proposed similarity measure, we then formulate new clustering criterion functions and introduce their respective clustering algorithms, which are fast and scalable like the k-means algorithm, but are also capable of providing high-quality and consistent performance.
REFERENCES
[1] I. Guyon, U. von Luxburg, and R. C. Williamson, “Clustering: Science or Art?” NIPS’09 Workshop on Clustering Theory, 2009.
[2] L. Wanner, “Introduction to Clustering Techniques,” 2004. Available online at: http://www.iula.upf.edu/materials/040701wanner.pdf [viewed: 16 August 2012].
[3] D. Ienco, R. G. Pensa, and R. Meo, “Context-based distance learning for categorical data clustering,” in Proc. of the 8th Int. Symp. IDA, 2009, pp. 83–94.
[4] I. Guyon, U. von Luxburg, and R. C. Williamson, “Clustering: Science or Art?” NIPS’09 Workshop on Clustering Theory, 2009.
[5] C. D. Manning, P. Raghavan, and H. Schütze, An Introduction to Information Retrieval. Cambridge University Press, 2009.
[6] X. Wu, V. Kumar, J. Ross Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. Ng, B. Liu, P. S. Yu, Z.-H. Zhou, M. Steinbach, D. J. Hand, and D. Steinberg, “Top 10 algorithms in data mining,” Knowl. Inf. Syst., vol. 14, no. 1, pp. 1–37, 2007.
[7] W. Xu, X. Liu, and Y. Gong, “Document clustering based on nonnegative matrix factorization,” in SIGIR, 2003, pp. 267–273.
[8] S. Zhong, “Efficient online spherical K-means clustering,” in IEEE IJCNN, 2005, pp. 3180–3185.