Integrated Intelligent Research(IIR) International Journal of Business Intelligent
Volume: 04 Issue: 02 December 2015,Pages No.112- 114
ISSN: 2278-2400
Curse of Dimensionality in Paradoxical High
Dimensional Clinical Datasets – A Survey
S. Rajeswari, Dr. M.S. Josephine, V. Jeyabalaraja
Research Scholar, Computer Applications, Bharathiyar University, Coimbatore, India
Professor, Computer Applications, Dr.MGR University, Chennai, India
Professor, Computer Science & Engineering, Velammal Engineering College, Chennai, India
Email: Vrajee2008@gmail.com, josejbr@yahoo.com, jeyabalaraja@gmail.com
Abstract-Data storage and retrieval is one of the challenging
processes in the field of computation. Storing and retrieving
multi-dimensional, unstructured, conflicting data requires
converting the data into a structured form, and the storage
scheme has a major impact on access and computation time.
Analysis of very large data sets involves different types of
datasets, including paradoxical high dimensional data. The
ideal-case assumptions, that data are collected at equal-length
intervals and are of comparable length, are not valid for many
real data sets, especially clinical data sets. In addition, the
datasets differ from each other: the data are paradoxical and
vary from one medical dataset to another. In this paper, the
concept of hierarchical clustering with a dendrogram structure
is used to represent paradoxical high dimensional clinical
datasets. The large clusters of high dimensional datasets have
different dimensions, and they may produce much noise that
masks the real, diverse data. This paper surveys the clustering
techniques used on paradoxical high dimensional clinical
datasets, highlights them through the dendrogram
representation, and considers how to reduce the dimensions of
the different clusters.
Keywords: Hierarchical Clustering, High Dimensional Data,
Dendrogram.
I. INTRODUCTION
Data mining refers to extracting or mining knowledge from large
amounts of data. It is the process of digging through data to discover
latent patterns that can be translated into valuable information.
Clinical data mining can reveal the hidden patterns present in
voluminous data. Data mining techniques applied to clinical datasets
include clustering, classification, prediction, frequent pattern
mining and attribute selection. This paper summarizes the
techniques used to represent paradoxical, or heterogeneous,
high dimensional data.
II. EASE OF USE
A. Heterogeneous High Dimensional Data
A heterogeneous database consists of a set of interconnected,
autonomous component databases. Objects in one component
database may differ greatly from objects in other component
databases, making it difficult to assimilate their semantics into
the overall heterogeneous database. A legacy database is a group
of heterogeneous databases that combines different kinds of data
systems, such as relational or object-oriented databases,
hierarchical databases, network databases, spreadsheets,
multimedia databases or file systems.
B. Paradoxical Clinical Datasets
One of the most significant challenges of data mining in the
medical domain is obtaining relevant, high-quality clinical trial
data. Medical data is complex and heterogeneous in nature
because it is collected from various sources, such as laboratory
reports, discussions with the patient, or the physician's review.
Medical information is characterized by redundancy, multiple
attributes, incompleteness and a close relationship with time.
In this paper, we discuss clustering techniques, especially
hierarchical clustering, with the dendrogram structure of
clusters and the top-down method of representing clinical data
clusters to reduce the curse of dimensionality.
C. Cluster Analysis
The process of grouping a set of physical or abstract objects
into classes of similar objects is called clustering. A cluster is a
collection of data objects that are similar to one another within
the same cluster and dissimilar to the objects in other
clusters. Cluster analysis is a popular data discretization
method. A clustering algorithm can be applied to discretize a
numerical attribute, C, by partitioning the values of C into
clusters or groups. Clustering takes the distribution of C into
consideration, as well as the closeness of data points, and
is therefore able to produce high-quality discretization results.
Clustering can be used to generate a concept hierarchy for C by
following either a top-down splitting strategy or a bottom-up
merging strategy, where each cluster forms a node of the
concept hierarchy. In the former, each initial cluster or
partition may be further decomposed into several sub-clusters,
forming a lower level of the hierarchy. In the latter, clusters are
formed by repeatedly grouping neighbouring clusters in order to
form higher-level concepts. Clustering is also called data
segmentation in some applications because clustering
partitions large datasets into groups according to their
similarity. Clustering is also used for outlier detection. From a
machine learning perspective, clusters correspond to hidden
patterns, and the search for clusters is unsupervised learning.
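The bottom-up merging strategy for discretizing a numerical attribute can be illustrated with a minimal Python sketch. The function name `discretize_by_merging`, the rule of merging the two adjacent clusters with the closest means, and the sample readings are all illustrative assumptions, not taken from the surveyed work.

```python
def discretize_by_merging(values, n_bins):
    """Merge neighbouring 1-D clusters bottom-up until n_bins remain."""
    # Start with every distinct value as its own cluster, in sorted order.
    clusters = [[v] for v in sorted(set(values))]
    while len(clusters) > n_bins:
        # Assumed merge rule: join the adjacent pair whose means are closest.
        means = [sum(c) / len(c) for c in clusters]
        gaps = [means[i + 1] - means[i] for i in range(len(means) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    # Report each cluster as a (low, high) interval of the attribute.
    return [(c[0], c[-1]) for c in clusters]


# Hypothetical systolic blood pressure readings (illustrative values only).
readings = [118, 121, 119, 140, 142, 139, 165, 170]
bins = discretize_by_merging(readings, 3)
print(bins)
```

Each merge step corresponds to one higher level of the concept hierarchy; stopping at a chosen number of clusters yields the discretization intervals.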
D. Hierarchical Clustering
A hierarchical clustering method works by grouping data
objects into a tree of clusters. Hierarchical methods are further
classified as agglomerative or divisive, depending on whether the
hierarchical decomposition is formed in a bottom-up (merging)
or top-down (splitting) fashion. A tree structure called a
dendrogram is commonly used to represent the process of
hierarchical clustering. It shows how objects are grouped
together step by step. Divisive clustering starts with all objects in
one cluster and repeats the following iteration until all clusters
are singletons: choose a cluster to split (based on some criterion)
and replace the chosen cluster with its sub-clusters. Here,
cut-based optimization is used to weaken the connections
between objects in different clusters rather than to strengthen
the connections between objects within a cluster.
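The divisive procedure described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the split criterion (seed each split with the two farthest points and assign every point to the nearer seed) and the names `diameter` and `divisive` are hypothetical choices, and the recursion simply splits every cluster rather than selecting one cluster per iteration.

```python
import math


def diameter(cluster):
    """Return the largest pairwise distance and the two points realising it."""
    best = (0.0, cluster[0], cluster[0])
    for i, p in enumerate(cluster):
        for q in cluster[i + 1:]:
            d = math.dist(p, q)
            if d > best[0]:
                best = (d, p, q)
    return best


def divisive(cluster):
    """Split a cluster top-down; return the dendrogram as nested lists."""
    d, a, b = diameter(cluster)
    if d == 0.0:
        # Singleton (or coincident points): a leaf of the dendrogram.
        return cluster[0]
    # Assumed split criterion: the two farthest points act as seeds and
    # each point joins the sub-cluster of its nearer seed.
    left = [p for p in cluster if math.dist(p, a) <= math.dist(p, b)]
    right = [p for p in cluster if math.dist(p, a) > math.dist(p, b)]
    return [divisive(left), divisive(right)]


# Hypothetical two-attribute records (illustrative values only).
points = [(1.0, 1.0), (1.5, 1.2), (8.0, 8.0), (8.2, 7.9)]
tree = divisive(points)
print(tree)
```

Each inner list is one split of the dendrogram, and each tuple is a singleton leaf.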
Figure 1. Dendrogram Structure of Different Clusters
III. RELATED WORK
Here we survey different articles that relate to our study.
Michael Sedlmair et al. [14] apply data mining techniques to
reduce the dimension of high dimensional data using scatter
plots. Chunxia Xiao et al. [15] use a hierarchical data structure
for representing high-resolution images and videos, which gives
excellent data quality. With respect to filter feature selection
methods, the application of cluster analysis has been
demonstrated to be more effective than traditional feature
selection algorithms. Pereira et al. [1], Baker and McCallum [2]
and Dhillon et al. [3] employed distributional clustering of
words to reduce the dimensionality of text data. Qinbao Song
et al. [4] divide the features into clusters using graph-theoretic
clustering methods for representing high dimensional data.
Shaurya Jauhari et al. [5] represent gene expression data
using agglomerative hierarchical clustering. Yifeng Li et al. [6]
treat microarray data as high dimensional data and perform
classification using the non-negative least squares method.
Cagatay Turkay et al. [7] introduce the construction and
utilization of representative factors for the interactive visual
analysis of structures in high dimensional datasets in order to
reduce the dimensions. Nenad Tomašev et al. [8] state that
exploiting the role of hubness in clustering high dimensional
data can reduce the curse of dimensionality in datasets. Jenny
Hyunjung Lee et al. [9] present a structure-based distance
metric that uses multidimensional scaling to calculate the
distance in the clusters of high dimensional data.
IV. PROPOSED WORK
A. Clustering Paradoxical High Dimensional Clinical Data
Clustering such data is a challenging process due to the curse of
dimensionality. Many dimensions may not be relevant. As the
number of dimensions increases, the data become increasingly
sparse, so that distance measurements between pairs of points
become less meaningful. Here we represent paradoxical high
dimensional clinical data using hierarchical clustering with the
divisive, top-down decomposition method. An interesting
strategy that often yields good results is to first apply a
hierarchical divisive method, which determines the number of
clusters and finds an initial clustering, and then use iterative
relocation to improve the clustering. For clustering purposes,
the most relevant aspect of the curse of dimensionality concerns
the effect of increasing dimensionality on distance or similarity.
For example, consider four different measures of the distance
between clusters, defined as follows, where |x-y| is the distance
between two objects or points x and y, Mi is the mean of cluster
Ci, and |Ci| is the number of objects in Ci.
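B. Equations
Minimum Distance
$$D_{min}(C_i, C_j) = \min_{x \in C_i,\; y \in C_j} |x - y| \qquad (1)$$
Maximum Distance
$$D_{max}(C_i, C_j) = \max_{x \in C_i,\; y \in C_j} |x - y| \qquad (2)$$
Mean Distance
$$D_{mean}(C_i, C_j) = |M_i - M_j| \qquad (3)$$
Average Distance
$$D_{avg}(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{x \in C_i} \sum_{y \in C_j} |x - y| \qquad (4)$$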
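The four inter-cluster distances (1) to (4) can be computed directly, as in the following minimal Python sketch. It assumes that |x-y| is the Euclidean distance and that each cluster is a list of coordinate tuples; the function names and sample clusters are illustrative.

```python
import math
from itertools import product


def d_min(ci, cj):
    """Equation (1): distance between the closest pair of points."""
    return min(math.dist(x, y) for x, y in product(ci, cj))


def d_max(ci, cj):
    """Equation (2): distance between the farthest pair of points."""
    return max(math.dist(x, y) for x, y in product(ci, cj))


def centroid(c):
    """Mean point of a cluster, taken coordinate by coordinate."""
    return tuple(sum(coord) / len(c) for coord in zip(*c))


def d_mean(ci, cj):
    """Equation (3): distance between the two cluster means."""
    return math.dist(centroid(ci), centroid(cj))


def d_avg(ci, cj):
    """Equation (4): average over all cross-cluster pairs."""
    total = sum(math.dist(x, y) for x, y in product(ci, cj))
    return total / (len(ci) * len(cj))


# Two small hypothetical clusters (illustrative values only).
ci = [(0.0, 0.0), (0.0, 2.0)]
cj = [(3.0, 0.0), (3.0, 2.0)]
print(d_min(ci, cj), d_max(ci, cj), d_mean(ci, cj), d_avg(ci, cj))
```

For any pair of clusters the minimum distance never exceeds the average distance, which in turn never exceeds the maximum distance.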
When the algorithm uses the minimum distance, Dmin(Ci,Cj), to
measure the distance between clusters, it is called a
nearest-neighbour clustering algorithm. Because the edges linking
clusters always go between distinct clusters, the resulting graph
forms a tree. Thus a hierarchical clustering algorithm that uses
the minimum distance measure is also called a minimal spanning
tree algorithm. When the algorithm uses the maximum distance,
Dmax(Ci,Cj), to measure the distance between clusters, it is
called a farthest-neighbour clustering algorithm. The distance
between two clusters is determined by the most distant nodes in
the two clusters. The farthest-neighbour algorithm tends to keep
the increase in the diameter of the clusters at each iteration as
small as possible. If the true clusters are rather compact and
approximately equal in size, the method will produce
high-quality clusters; otherwise, the clusters produced can be
meaningless. A theoretical analysis of several different types
of clusters of paradoxical clinical data sets is relevant here. That
work was oriented towards the problem of finding the nearest
neighbours of points, but the results also indicate potential
problems for clustering high dimensional data. The distance
measures of the clusters are found as the maximum and minimum
distances. The absolute difference Dmax-Dmin between the
distances to the closest and farthest neighbours of an
independently selected point depends on
the distance measure. Depending on the distance measure, this
difference may increase or remain constant as the dimensionality
increases, but some information may be lost. To reduce the
dimensionality while limiting this information loss, Principal
Component Analysis (PCA) or Singular Value Decomposition
(SVD) can be applied to the resulting split clusters.
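The behaviour of Dmax-Dmin as dimensionality grows can be explored with a small experiment. The sketch below is an illustration under loud assumptions: it uses synthetic points drawn uniformly from the unit hypercube rather than clinical data, Euclidean distance, and a hypothetical helper named `relative_contrast` that reports (Dmax-Dmin)/Dmin for the distances from one randomly chosen query point.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (Dmax - Dmin) / Dmin for distances from one random query point."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    query = [rng.random() for _ in range(dim)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(dists) - min(dists)) / min(dists)


# Synthetic uniform data only; real clinical data may behave differently.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(dim), 3))
```

On data of this kind the farthest neighbour becomes only slightly more distant than the nearest one as the number of dimensions grows, which is one reason to reduce dimensionality before comparing distances between clusters.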
V. CONCLUSION
In this paper we survey various articles on the clustering
techniques that have been employed for medical data mining.
Data mining techniques have high utility in medical data
mining because the data in this industry is voluminous. Due to
the enormous growth of clinical data, it has become
indispensable to use data mining techniques to support decision
making and prediction systems in the field of healthcare. In this
paper, we address the main issues of handling noise and
reducing dimensions without loss of information. In real-world
systems that produce large amounts of heterogeneous medical
data, processing becomes a tedious, compute-intensive task. We
suggest that hierarchical clustering combined with Principal
Component Analysis may provide good results against the curse
of dimensionality. Medical data mining yields the business
intelligence required to support well-informed diagnoses and
decisions.
REFERENCES
[1] F. Pereira, N. Tishby, and L. Lee, “Distributional Clustering of English
Words,” Proc. 31st Ann. Meeting on Assoc. for Computational
Linguistics, pp. 183-190, 1993.
[2] L.D. Baker and A.K. McCallum, “Distributional Clustering of Words for
Text Classification,” Proc. 21st Ann. Int’l ACM SIGIR Conf. Research
and Development in Information Retrieval, pp. 96-103, 1998.
[3] I.S. Dhillon, S. Mallela, and R. Kumar, “A Divisive
Information-Theoretic Feature Clustering Algorithm for Text
Classification,” J. Machine Learning Research, vol. 3, pp. 1265-1287,
2003.
[4] Qinbao Song, Jingjie Ni, and Guangtao Wang, “A Fast Clustering-Based
Feature Subset Selection Algorithm for High-Dimensional Data,”
IEEE Trans. Knowledge and Data Engineering, vol. 25, no. 1, Jan. 2013.
[5] Shaurya Jauhari and S.A.M. Rizvi, “Mining Gene Expression Data
Focusing Cancer Therapeutics: A Digest,” IEEE/ACM Trans.
Computational Biology and Bioinformatics, vol. 11, no. 3, May/June 2014.
[6] Yifeng Li and Alioune Ngom, “Nonnegative Least-Squares Methods for
the Classification of High-Dimensional Biological Data,” IEEE/ACM
Trans. Computational Biology and Bioinformatics, vol. 10, no. 2,
Mar./Apr. 2013.
[7] Cagatay Turkay, Arvid Lundervold, Astri Johansen Lundervold, and
Helwig Hauser, “Representative Factor Generation for the Interactive
Visual Analysis of High-Dimensional Data,” IEEE Trans. Visualization
and Computer Graphics, vol. 18, no. 12, Dec. 2012.
[8] Nenad Tomašev, Miloš Radovanović, Dunja Mladenić, and Mirjana
Ivanović, “The Role of Hubness in Clustering High-Dimensional Data,”
IEEE Trans. Knowledge and Data Engineering, vol. 26, no. 3, Mar. 2014.
[9] Jenny Hyunjung Lee, Kevin T. McDonnell, Alla Zelenyuk, Dan Imre,
and Klaus Mueller, “A Structure-Based Distance Metric for
High-Dimensional Space Exploration with Multidimensional Scaling,”
IEEE Trans. Visualization and Computer Graphics, vol. 20, no. 3,
Mar. 2014.
[10] Ibrahim M. El-Hasnony, Hazem M. El Bakry, and Ahmed A. Saleh,
“Data Mining Techniques for Medical Applications: A Survey,”
Mathematical Methods in Science and Mechanics.
[11] Mohammed Abdul Khaleel, Sateesh Kumar Pradham, and G.N. Dash,
“A Survey of Data Mining Techniques on Medical Data for Finding
Locally Frequent Diseases,” vol. 3, no. 8, Aug. 2013.
[12] Pavel Berkhin, “Survey of Clustering Data Mining Techniques,” Accrue
Software, Inc.
[13] Divya Tomar and Sonali Agarwal, “A Survey on Data Mining
Approaches for Healthcare,” Int’l J. Bio-Science and Bio-Technology,
vol. 5, no. 5, pp. 241-266, 2013.
[14] Michael Sedlmair, Tamara Munzner, and Melanie Tory, “Empirical
Guidance on Scatterplot and Dimension Reduction Technique Choices,”
IEEE Trans. Visualization and Computer Graphics, vol. 19, no. 12,
Dec. 2013.
[15] Chunxia Xiao, Meng Liu, Donglin Xiao, Zhao Dong, and Kwan-Liu Ma,
“Fast Closed-Form Matting Using a Hierarchical Data Structure,” IEEE
Trans. Circuits and Systems for Video Technology, vol. 24, no. 1,
Jan. 2014.

Detection of Node Activity and Selfish & Malicious Behavioral Patterns using ...Detection of Node Activity and Selfish & Malicious Behavioral Patterns using ...
Detection of Node Activity and Selfish & Malicious Behavioral Patterns using ...
 
Optimal Channel and Relay Assignment in Ofdmbased Multi-Relay Multi-Pair Two-...
Optimal Channel and Relay Assignment in Ofdmbased Multi-Relay Multi-Pair Two-...Optimal Channel and Relay Assignment in Ofdmbased Multi-Relay Multi-Pair Two-...
Optimal Channel and Relay Assignment in Ofdmbased Multi-Relay Multi-Pair Two-...
 
An Effective and Scalable AODV for Wireless Ad hoc Sensor Networks
An Effective and Scalable AODV for Wireless Ad hoc Sensor NetworksAn Effective and Scalable AODV for Wireless Ad hoc Sensor Networks
An Effective and Scalable AODV for Wireless Ad hoc Sensor Networks
 
Secured Seamless Wi-Fi Enhancement in Dynamic Vehicles
Secured Seamless Wi-Fi Enhancement in Dynamic VehiclesSecured Seamless Wi-Fi Enhancement in Dynamic Vehicles
Secured Seamless Wi-Fi Enhancement in Dynamic Vehicles
 
Virtual Position based Olsr Protocol for Wireless Sensor Networks
Virtual Position based Olsr Protocol for Wireless Sensor NetworksVirtual Position based Olsr Protocol for Wireless Sensor Networks
Virtual Position based Olsr Protocol for Wireless Sensor Networks
 
Mitigation and control of Defeating Jammers using P-1 Factorization
Mitigation and control of Defeating Jammers using P-1 FactorizationMitigation and control of Defeating Jammers using P-1 Factorization
Mitigation and control of Defeating Jammers using P-1 Factorization
 
An analysis and impact factors on Agriculture field using Data Mining Techniques
An analysis and impact factors on Agriculture field using Data Mining TechniquesAn analysis and impact factors on Agriculture field using Data Mining Techniques
An analysis and impact factors on Agriculture field using Data Mining Techniques
 
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...
A Study on Code Smell Detection with Refactoring Tools in Object Oriented Lan...
 
Priority Based Multi Sen Car Technique in WSN
Priority Based Multi Sen Car Technique in WSNPriority Based Multi Sen Car Technique in WSN
Priority Based Multi Sen Car Technique in WSN
 
Semantic Search of E-Learning Documents Using Ontology Based System
Semantic Search of E-Learning Documents Using Ontology Based SystemSemantic Search of E-Learning Documents Using Ontology Based System
Semantic Search of E-Learning Documents Using Ontology Based System
 

Recently uploaded

APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and usesDevarapalliHaritha
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girlsssuser7cb4ff
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 

Recently uploaded (20)

APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
power system scada applications and uses
power system scada applications and usespower system scada applications and uses
power system scada applications and uses
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Call Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call GirlsCall Girls Narol 7397865700 Independent Call Girls
Call Girls Narol 7397865700 Independent Call Girls
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCRCall Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
Call Us -/9953056974- Call Girls In Vikaspuri-/- Delhi NCR
 

Representing Paradoxical Clinical Data

Integrated Intelligent Research (IIR), International Journal of Business Intelligent
Volume: 04, Issue: 02, December 2015, Pages 112-114
ISSN: 2278-2400

Curse of Dimensionality in Paradoxical High Dimensional Clinical Datasets – A Survey

S. Rajeswari, Dr. M. S. Josephine, V. Jeyabalaraja
Research Scholar, Computer Applications, Bharathiyar University, Coimbatore, India
Professor, Computer Applications, Dr. MGR University, Chennai, India
Professor, Computer Science & Engineering, Velammal Engineering College, Chennai, India
Email: Vrajee2008@gmail.com, josejbr@yahoo.com, jeyabalaraja@gmail.com

Abstract- Data storage and retrieval is one of the challenging processes in computation. Storing and retrieving multi-dimensional, unstructured, conflicting data requires converting it into a structured form. Storage has a major impact on access and computation time. Meaningful analysis of very large data sets involves different types of datasets, including paradoxical high dimensional data. The ideal-case assumptions, that data are collected at equal-length intervals and are of comparable length, are not valid for many real data sets, especially clinical data sets. In addition, the datasets differ from each other: the data are paradoxical and vary from one medical dataset to another. In this paper, hierarchical clustering with a dendrogram structure is used to represent paradoxical high dimensional clinical datasets. These large clusters of high dimensional data have different dimensions, and they may produce considerable noise that masks the real data. The paper surveys clustering techniques used on paradoxical high dimensional clinical datasets, highlights them through the dendrogram representation, and considers how to reduce the dimensions of the different clusters.

Keywords: Hierarchical Clustering, High dimensional data, Dendrogram.

I. INTRODUCTION
Data mining refers to extracting, or mining, knowledge from large amounts of data. It is the process of digging through data to discover latent patterns that can be translated into valuable information. Clinical data mining can reveal the hidden patterns present in voluminous data. Data mining techniques applied to clinical datasets include clustering, classification, prediction, frequent pattern mining and attribute selection. This paper summarizes the techniques used to represent paradoxical, or heterogeneous, high dimensional data.

II. EASE OF USE

A. Heterogeneous High Dimensional Data
A heterogeneous database consists of a set of interconnected, autonomous component databases. Objects in one component database may differ greatly from objects in other component databases, making it difficult to assimilate their semantics into the overall heterogeneous database. A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as relational or object-oriented databases, hierarchical databases, network databases, spreadsheets, multimedia databases or file systems.

B. Paradoxical Clinical Datasets
One of the most significant challenges of data mining in medicine is obtaining high-quality, relevant clinical trial data. Medical data is complex and heterogeneous in nature because it is collected from various sources, such as laboratory reports, discussions with the patient and the physician's review. Medical information is characterized by redundancy, multiple attributes, incompleteness and a close relationship with time. In this paper, we discuss clustering techniques, especially hierarchical clustering with a dendrogram structure and a top-down representation of clinical data clusters, as a way to reduce the curse of dimensionality.

C. Cluster Analysis
The process of grouping a set of physical or abstract objects into classes of similar objects is called clustering.
A cluster is a collection of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Cluster analysis is a popular data discretization method. A clustering algorithm can be applied to discretize a numerical attribute, C, by partitioning the values of C into clusters or groups. Clustering takes the distribution of C into consideration, as well as the closeness of data points, and is therefore able to produce high-quality discretization results. Clustering can be used to generate a concept hierarchy for C by following either a top-down splitting strategy or a bottom-up merging strategy, where each cluster forms a node of the concept hierarchy. In the former, each initial cluster or partition may be further decomposed into several subclusters, forming a lower level of the hierarchy. In the latter, clusters are formed by repeatedly grouping neighboring clusters in order to form higher-level concepts. Clustering is also called data segmentation in some applications because it partitions large datasets into groups according to their similarity. Clustering is also used for outlier detection. From a machine learning perspective, clusters correspond to hidden patterns, and the search for clusters is a form of unsupervised learning.

D. Hierarchical Clustering
A hierarchical clustering method works by grouping data objects into a tree of clusters. Such methods are further divided into agglomerative and divisive, depending on whether the hierarchical decomposition is formed in a bottom-up (merging) or top-down (splitting) fashion. A tree structure called a dendrogram is commonly used to represent the process of hierarchical clustering. It shows how objects are grouped together step by step. Divisive clustering puts all objects in one cluster and then iterates until every cluster is a singleton: at each step it chooses a cluster to split, based on some criterion, and replaces the chosen cluster with its subclusters. Here, cut-based optimization is used to weaken the connections between objects in different clusters rather than to strengthen the connections between objects within a cluster.

Figure 1. Dendrogram Structure of Different Clusters

III. RELATED WORK
Here we survey articles related to our study. Michael Sedlmair et al. [14] apply data mining techniques to reduce the dimension of high dimensional data using scatterplots. Chunxia Xiao et al. [15] use a hierarchical data structure for representing high-resolution images and videos, which gives excellent quality. With respect to filter feature selection methods, the application of cluster analysis has been demonstrated to be more effective than traditional feature selection algorithms. Pereira et al. [1], Baker and McCallum [2] and Dhillon et al. [3] employed distributional clustering of words to reduce the dimensionality of text data. Qinbao Song et al. [4] divide features into clusters using graph-theoretic clustering methods for representing high dimensional data. Shaurya Jauhari et al. [5] represent gene expression data using agglomerative hierarchical clustering. Yifeng Li et al. [6] treat microarray data as high dimensional data and perform classification using the non-negative least squares method.
Cagatay et al. [7] introduce the construction and utilization of representative factors for the interactive visual analysis of structures in high dimensional datasets, in order to reduce the dimensions. Nenad Tomašev et al. [8] state that taking hubness into account when clustering high dimensional data can reduce the curse of dimensionality in datasets. Jenny Hyunjung Lee et al. [9] present a structure-based distance that uses multidimensional scaling to calculate distances in the clusters of high dimensional data.

IV. PROPOSED WORK

A. Clustering Paradoxical High Dimensional Clinical Data
Clustering such data is challenging because of the curse of dimensionality. Many dimensions may not be relevant. As the number of dimensions increases, the data become increasingly sparse, so that distance measurements between pairs of points become tedious and less meaningful. Here we represent paradoxical high dimensional clinical data using hierarchical clustering with divisive, top-down decomposition. An interesting strategy that often yields good results is to first apply a hierarchical divisive method, which determines the number of clusters and finds an initial clustering, and then use iterative relocation to improve the clustering. For clustering purposes, the most relevant aspect of the curse of dimensionality concerns the effect of increasing dimensionality on distance or similarity. For example, consider the following four distance measures between clusters, where $|x - y|$ is the distance between two objects or points $x$ and $y$, $M_i$ is the mean of cluster $C_i$, and $|C_i|$ is the number of objects in $C_i$.

B. Equations
Minimum distance: $D_{min}(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} |x - y|$ (1)
Maximum distance: $D_{max}(C_i, C_j) = \max_{x \in C_i,\, y \in C_j} |x - y|$ (2)
Mean distance: $D_{mean}(C_i, C_j) = |M_i - M_j|$ (3)
Average distance: $D_{avg}(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{x \in C_i} \sum_{y \in C_j} |x - y|$ (4)

When the algorithm uses the minimum distance, $D_{min}(C_i, C_j)$, to measure the distance between clusters, it is called a nearest-neighbour clustering algorithm. Because the edges linking clusters always go between distinct clusters, the resulting graph is a tree. A hierarchical clustering algorithm that uses the minimum distance measure is therefore also known as a minimal spanning tree algorithm. When the algorithm uses the maximum distance, $D_{max}(C_i, C_j)$, to measure the distance between clusters, it is called a farthest-neighbour clustering algorithm. The distance between two clusters is then determined by the most distant nodes in the two clusters. The farthest-neighbour algorithm tends to keep the increase in cluster diameter at each iteration as small as possible. If the true clusters are rather compact and approximately equal in size, the method will produce high-quality clusters; otherwise the clusters produced can be meaningless.

Consider a theoretical analysis of several different types of clusters of paradoxical clinical datasets. That work was oriented towards the problem of finding the nearest neighbours of points, but the results also indicate potential problems for clustering high dimensional data. The cluster distances are measured as the maximum and minimum distance. The absolute difference $D_{max} - D_{min}$ between the closest and farthest neighbours of independently related points depends on the distance measure. Depending on the distance measure, the effect of the curse of dimensionality may increase or remain constant, but some information may be lost. To achieve dimensionality reduction without such information loss, Principal Component Analysis (PCA) or Singular Value Decomposition (SVD) can be applied to the split clusters.

V. CONCLUSION
In this paper we survey various articles that use clustering techniques for medical data mining. Data mining techniques have high utility in medicine because of the voluminous data in this industry. Due to the enormous growth of clinical data, it has become indispensable to use data mining techniques to aid decision support and prediction systems in healthcare. In this paper, we cover the main issues of handling noise and reducing dimensions without loss of information. In real-world systems that produce large amounts of heterogeneous medical data, processing becomes a computationally tedious task. We suggest that hierarchical clustering combined with Principal Component Analysis may provide good results against the curse of dimensionality. Medical data mining yields the business intelligence required to support well-informed diagnoses and decisions.

REFERENCES
[1] F. Pereira, N. Tishby, and L. Lee, "Distributional Clustering of English Words," Proc. 31st Ann. Meeting on Assoc. for Computational Linguistics, pp. 183-190, 1993.
[2] L.D. Baker and A.K. McCallum, "Distributional Clustering of Words for Text Classification," Proc. 21st Ann. Int'l ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 96-103, 1998.
[3] I.S. Dhillon, S. Mallela, and R. Kumar, "A Divisive Information Theoretic Feature Clustering Algorithm for Text Classification," J. Machine Learning Research, vol. 3, pp. 1265-1287, 2003.
[4] Qinbao Song, Jingjie Ni, and Guangtao Wang, "A Fast Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 1, January 2013.
[5] Shaurya Jauhari and S.A.M. Rizvi, "Mining Gene Expression Data Focusing Cancer Therapeutics: A Digest," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 11, no. 3, May/June 2014.
[6] Yifeng Li and Alioune Ngom, "Nonnegative Least-Squares Methods for the Classification of High-Dimensional Biological Data," IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 10, no. 2, March/April 2013.
[7] Cagatay Turkay, Arvid Lundervold, Astri Johansen Lundervold, and Helwig Hauser, "Representative Factor Generation for the Interactive Visual Analysis of High-Dimensional Data," IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 12, December 2012.
[8] Nenad Tomašev, Miloš Radovanović, Dunja Mladenić, and Mirjana Ivanović, "The Role of Hubness in Clustering High-Dimensional Data," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 3, March 2014.
[9] Jenny Hyunjung Lee, Kevin T. McDonnell, Alla Zelenyuk, Dan Imre, and Klaus Mueller, "A Structure-Based Distance Metric for High-Dimensional Space Exploration with Multidimensional Scaling," IEEE Transactions on Visualization and Computer Graphics, vol. 20, no. 3, March 2014.
[10] Ibrahim M. El-Hasnony, Hazem M. El Bakry, and Ahmed A. Saleh (Faculty of Computer Science & Information Systems, Mansoura University, Mansoura, Egypt), "Data Mining Techniques for Medical Applications: A Survey," Mathematical Methods in Science and Mechanics.
[11] Mohammed Abdul Khaleel (Research Scholar, Sambalpur University, India), Sateesh Kumar Pradham (P.G. Department of Computer Science, Utkal University, India), and G.N. Dash (P.G. Department of Physics, Sambalpur University, India), "A Survey of Data Mining Techniques on Medical Data for Finding Locally Frequent Diseases," vol. 3, no. 8, August 2013.
[12] Pavel Berkhin, "Survey of Clustering Data Mining Techniques," Accrue Software, Inc.
[13] Divya Tomar and Sonali Agarwal (Indian Institute of Information Technology, Allahabad, India), "A Survey on Data Mining Approaches for Healthcare," International Journal of Bio-Science and Bio-Technology, vol. 5, no. 5, pp. 241-266, 2013.
[14] Michael Sedlmair, Tamara Munzner, and Melanie Tory, "Empirical Guidance on Scatterplot and Dimension Reduction Technique Choices," IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 12, December 2013.
[15] Chunxia Xiao, Meng Liu, Donglin Xiao, Zhao Dong, and Kwan-Liu Ma, "Fast Closed-Form Matting Using a Hierarchical Data Structure," IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 1, January 2014.
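
As an illustration of the four inter-cluster distances in Section IV.B, Eqs. (1) to (4), the following Python sketch computes them for two small clusters. It is only a sketch under stated assumptions: the paper does not say which point-to-point distance |x - y| denotes, so Euclidean distance (`math.dist`) is assumed here, and the function names and toy clusters are hypothetical rather than taken from the paper.

```python
from itertools import product
from math import dist  # Euclidean distance between two points (Python 3.8+)


def centroid(cluster):
    """Component-wise mean M of a cluster of equal-length points."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def d_min(ci, cj):
    """Eq. (1): smallest pairwise distance (nearest-neighbour measure)."""
    return min(dist(x, y) for x, y in product(ci, cj))


def d_max(ci, cj):
    """Eq. (2): largest pairwise distance (farthest-neighbour measure)."""
    return max(dist(x, y) for x, y in product(ci, cj))


def d_mean(ci, cj):
    """Eq. (3): distance between the two cluster means."""
    return dist(centroid(ci), centroid(cj))


def d_avg(ci, cj):
    """Eq. (4): average of all pairwise distances between the clusters."""
    total = sum(dist(x, y) for x, y in product(ci, cj))
    return total / (len(ci) * len(cj))


# Hypothetical toy clusters with two features per record (not data from the paper).
c1 = [(0.0, 0.0), (0.0, 2.0)]
c2 = [(3.0, 0.0), (3.0, 2.0)]

for name, fn in [("min", d_min), ("max", d_max), ("mean", d_mean), ("avg", d_avg)]:
    print(f"D_{name}(c1, c2) = {fn(c1, c2):.4f}")
```

On these toy clusters the minimum and mean distances coincide while the maximum and average are larger, which shows that the choice of measure alone changes how far apart two clusters appear.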