SlideShare a Scribd company logo
International Journal of Engineering Research and Development
e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com
Volume 4, Issue 7 (November 2012), PP. 14-16

                         Framework for Missing Value Imputation
                       Ms.R.Malarvizhi1, Dr.Antony Selvadoss Thanamani2,
 1
     Scholar, Department of CS, NGM College of Arts and Science ,Pollachi ,Bharathiyar University,Coimbatore .
          2
           HOD, Department of CS, NGM College of Arts and Science, Pollachi, Bharathiyar University, Coimbatore.


Abstract— Missing values may occur due to several reasons. In this paper, data is imputed by comparing the two most
popular techniques. Mean Substitution the traditional method replaces mean value in K-means Clustering and in groups of
kNN classifier. When compared in terms of accuracy of imputing missing data, the proposed kNN classifier is evaluated to
demonstrate that the approach is better than the existing K-means clustering.
Keywords— K-Means Clustering, kNN Classifier, Missing Value, Mean Substitution, Imputation

                                             I.        INTRODUCTION
          Researchers are often faced with the issues of missing in Survey. Errors may occur due to human or machine
when processed a sample and stored data values into their respective records. Traditional methods are there for handling
missing data. List-wise deletion and pair-wise deletion exclude the data from analyses. These methods are unaccepted by the
researchers as they produce biased result. K-means clustering is a method of cluster analysis which aims to partition n
observations into k clusters in which each observation belongs to the cluster with the nearest mean. kNN classifier operate
on the premises that classification of unknown instances can be done by relating the unknown to the known according to
some distance function. Mean substitution replaces missing data with the average of the valid data. This paper proposed a
framework which compares two techniques.

                               II.      TYPES OF VARIABLES IN MISSING DATA
         Data that are Missing Completely at Random (MCAR) can be considered as a simple random sample of observed
data. Data are Missing at Random (MAR) when the probability of the value being missing is dependent on some
measurement characteristics of the individual but not on the missing value itself.(1). Data are missing not at Random
(MNAR), variable value being missing is directly related value of the variable itself.

                       III.        CONSTRAINTS FOR MISSING VALUE REPLACEMENT
          Any method which is used for replacing the missing value should follow certain constraints. They are (i) they
should produce unbiased result (ii) They should not change any values of other variables (iii) Estimation should minimize
the cost.

                                       IV         IMPUTATION METHODS
A.            List-Wise Deletion
          The List-wise Deletion leads to the loss of large amount of data, as the whole record gets deleted even if one
single variable was missing. So finally the database gets reduced.

B.            Pair-Wise Deletion
           Pair-wise deletion is an appropriate method when missing data is completely at random. It is easy to implement but
the resultant data are difficult to interpret

C.            Regression Imputation
          Regression imputation is also known as conditional mean imputation in which linear regression equation is used to
replace the missing values.

D.            Hot-Deck Imputation
          Hot-Deck imputation is a type of imputation in which missing data value is replaced with actual data by estimating
similar data set which is currently in use.

E.            Cold-Deck Imputation
          Cold-Deck imputation follows the same procedure of Hot-Deck Imputation but it compares with the similar data
set which is not currently in use.

F.            Expectation Maximization (EM)
          The EM strategy is based on a recursive process: The missing data have information that is useful in estimating
various parameters, and the estimated parameter has information that is useful in finding the most likely value of the missing

                                                             14
Framework for Missing Value Imputation

data (Bennett, 2001). Thus, the EM method is an iterative procedure with two steps in each iteration. The disadvantage of
EM is that the standard errors and confidence intervals are not provided, so obtaining those statistics requires an additional
step. For inferential analyses for which those are essential, EM may not suffice.

G.         Mean Imputation
           In Mean Imputation, missing values are imputed with the mean value of that variable on the basis of the non
missing values for that variable. This method assumes that data are MCAR and results in biased means when this assumption
is false. Furthermore, imputing the mean value into cases tends to reduce the variance, which also attenuates covariance that
the variable has with other variables. This method produces biased means with data that are MAR or NMAR and
underestimates variance and covariance’s in all cases. Experts strongly advise against this method (Allison 2001; Bennett
2001; Graham et al., 2003; Pallant, 2007).

                                         IV.        K-MEANS CLUSTERING
          K-Means (Mac Queen, 1967) is one the simplest unsupervised learning algorithms that solve the well known
clustering problem. The intra-cluster dissimilarity is measured in K-Means clustering by the summation of distances between
the objects and the centroid of the clusters they are assigned. In K-Means clustering method, the data set X is divide into K-
Clusters. Each cluster is represented by the centroid of the set of objects in the cluster.
          The objective function




Where                   is a chosen distance measure between a data point        and the cluster centre , is an indicator of
the distance of the n data points from their respective cluster center.
The algorithm is composed of the following steps:
Procedure 1:
1. Place K points into the space represented by the objects that are being clustered. These points represent initial group
     centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from
     which the metric to be minimized can be calculated.

                                                VI. KNN CLASSIFIER
           The k-nearest neighbor algorithm is amongst the simplest of all machine learning algorithms: an object is classified
by a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors
(k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of its nearest neighbor. The
neighbors are taken from a set of objects for which the correct classification is known. This can be thought of as the training
set for the algorithm, though no explicit training step is required. The k-nearest neighbor algorithm is sensitive to the local
structure of the data

Procedure 2:
1. Compute Euclidean or Mahalanobis distance from target plot to those that were sampled.
2. Order samples taking for account calculated distances.
3. Choose heuristically optimal k nearest neighbor based on RMSE done by cross validation technique.
4. Calculate an inverse distance weighted average with the k-nearest multivariate neighbors.

                                       VII.       PROPOSED FRAMEWORK
          The above two techniques are implemented separately in a dataset which has missing value. When K-Means
Clustering is implemented, the missing values in each cluster are replaced by finding mean of the remaining values.
Similarly mean value is imputed for each group from kNN classifier. When both the datasets are compared for accuracy,
kNN imputation seems to perform better than the K-Means Clustering.




                                                               15
Framework for Missing Value Imputation

                                                     Database with
                                                    Missing Values




                                       K-Means                       Knn
                                       Clustering                    Classifier




                                C1       C2         Cn         G1        G2        G3
                                 Cn                             Cn        Cn        Cn




                                MI       MI         MI         MI        MI         MI




                          VIII.         CONCLUSION AND FUTURE ENHANCEMENT
         This paper discussed different methods to impute the missing values. It also explains about the framework to be
implemented with dataset with missing data. The presented two techniques deal with grouping of dataset so that each group
can be dealt with mean substitution or any other traditional method. The results are compared for accuracy.

                                                      REFERENCES
[1]    Allison, P.D-“Missing Data”, Thousand Oaks, CA: Sage -2001.
[2]    Bennett, D.A. “How can I deal with missing data in my study? Australian and New Zealand Journal of Public
       Health”, 25, pp.464 – 469, 2001.
[3]    Graham, J.W. “Adding missing-data-relevant variables to FIML- based structural equation models. Structural
       Equation Modeling”, 10, pp.80 – 100, 2003.
[4]    Graham, J.W, “Missing Data Analysis: Making it work in the real world. Annual Review of Psychology”, 60, 549 –
       576 , 2009.
[5]    Gabriel L.Schlomer, Sheri Bauman, and Noel A. Card : “ Best Practices for Missing Data Management in Counseling
       Psychology” , Journal of Counseling Psychology 2010, Vol.57.No 1,1 – 10.
[6]    Jeffrey C.Wayman , “Multiple Imputation For Missing Data : What Is It And How Can I Use It?” , Paper presented
       at the 2003 Annual Meeting of the American Educational Research Association, Chicago, IL ,pp . 2 -16, 2003.
[7]    A.Rogier T.Donders, Geert J.M.G Vander Heljden, Theo Stijnen, Kernel G.M Moons, “Review: A gentle
       introduction to imputation of missing values” , Journal of Clinical Epidemiology 59 , pp.1087 – 1091, 2006.
[8]    Kin Wagstaff ,”Clustering with Missing Values : No Imputation Required” -NSF grant IIS-0325329,pp.1-10.
[9]    S.Hichao Zhang , Jilian Zhang, Xiaofeng Zhu, Yongsong Qin,chengqi Zhang , “Missing Value Imputation Based on
       Data Clustering”, Springer-Verlag Berlin, Heidelberg ,2008.
[10]   Richard J.Hathuway , James C.Bezex, Jacalyn M.Huband , “Scalable Visual Assessment of Cluster Tendency for
       Large Data Sets”, Pattern Recognition ,Volume 39, Issue 7,pp,1315-1324- Feb 2006.
[11]   Qinbao Song, Martin Shepperd ,”A New Imputation Method for Small Software Project Data set”, The Journal of
       Systems and Software 80 ,pp,51–62, 2007.
[12]   Gabriel L.Scholmer, Sheri Bauman and Noel A.card “Best practices for Missing Data Management in Counseling
       Psychology”, Journal of Counseling Psychology, Vol. 57, No. 1,pp. 1–10,2010.
[13]   R.Kavitha Kumar, Dr.R.M Chandrasekar,“Missing Data Imputation in Cardiac Data Set” ,International Journal on
       Computer Science and Engineering , Vol.02 , No.05,pp-1836 – 1840 , 2010.
[14]   Jinhai Ma, Noori Aichar –Danesh , Lisa Dolovich, Lahana Thabane , “Imputation Strategies for Missing Binary
       Outcomes in Cluster Randomized Trials”- BMC Med Res Methodol. 2011; pp- 11: 18. – 2011.
[15]   R.S.Somasundaram , R.Nedunchezhian , “Evaluation of Three Simple Imputation Methods for Enhancing
       Preprocessing of Data with Missing Values”, International Journal of Computer Applications (0975 – 8887) Volume
       21 – No.10 ,pp.14-19 ,May 2011.
[16]   K.Raja , G.Tholkappia Arasu , Chitra. S.Nair , “Imputation Framework for Missing Value” , International Journal of
       Computer Trends and Technology – volume3Issue2 – 2012.
[17]   BOB L.Wall , Jeff K.Elser – “Imputation of Missing Data for Input to Support Vector Machines” ,



                                                             16

More Related Content

What's hot

Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
Dr Nisha Arora
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm
Laura Petrosanu
 
Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?
Kazi Toufiq Wadud
 
Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster Analysis
Derek Kane
 
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
csandit
 
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
csandit
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
inventionjournals
 
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
IJECEIAES
 
A Novel Algorithm for Design Tree Classification with PCA
A Novel Algorithm for Design Tree Classification with PCAA Novel Algorithm for Design Tree Classification with PCA
A Novel Algorithm for Design Tree Classification with PCA
Editor Jacotech
 
Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2
IAEME Publication
 
50120140505013
5012014050501350120140505013
50120140505013
IAEME Publication
 
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
IRJET Journal
 
Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...
IJECEIAES
 
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
IJMER
 
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSSCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
ijdkp
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
IJDKP
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
s v
 
EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...
EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...
EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...
Zac Darcy
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithm
ijsrd.com
 
A046010107
A046010107A046010107
A046010107
IJERA Editor
 

What's hot (20)

Unsupervised learning clustering
Unsupervised learning clusteringUnsupervised learning clustering
Unsupervised learning clustering
 
8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm8.clustering algorithm.k means.em algorithm
8.clustering algorithm.k means.em algorithm
 
Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?Dimension Reduction: What? Why? and How?
Dimension Reduction: What? Why? and How?
 
Data Science - Part VII - Cluster Analysis
Data Science - Part VII -  Cluster AnalysisData Science - Part VII -  Cluster Analysis
Data Science - Part VII - Cluster Analysis
 
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
CORRELATION OF EIGENVECTOR CENTRALITY TO OTHER CENTRALITY MEASURES: RANDOM, S...
 
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
A FLOATING POINT DIVISION UNIT BASED ON TAYLOR-SERIES EXPANSION ALGORITHM AND...
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ...
 
A Novel Algorithm for Design Tree Classification with PCA
A Novel Algorithm for Design Tree Classification with PCAA Novel Algorithm for Design Tree Classification with PCA
A Novel Algorithm for Design Tree Classification with PCA
 
Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2Dynamic approach to k means clustering algorithm-2
Dynamic approach to k means clustering algorithm-2
 
50120140505013
5012014050501350120140505013
50120140505013
 
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
Enhancing Classification Accuracy of K-Nearest Neighbors Algorithm using Gain...
 
Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...Clustering using kernel entropy principal component analysis and variable ker...
Clustering using kernel entropy principal component analysis and variable ker...
 
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data MK-Prototypes: A Novel Algorithm for Clustering Mixed Type  Data
MK-Prototypes: A Novel Algorithm for Clustering Mixed Type Data
 
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSSCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS
 
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
EXPERIMENTS ON HYPOTHESIS "FUZZY K-MEANS IS BETTER THAN K-MEANS FOR CLUSTERING"
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...
EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...
EVALUATING SYMMETRIC INFORMATION GAP BETWEEN DYNAMICAL SYSTEMS USING PARTICLE...
 
A survey on Efficient Enhanced K-Means Clustering Algorithm
 A survey on Efficient Enhanced K-Means Clustering Algorithm A survey on Efficient Enhanced K-Means Clustering Algorithm
A survey on Efficient Enhanced K-Means Clustering Algorithm
 
A046010107
A046010107A046010107
A046010107
 

Viewers also liked

Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
IJERD Editor
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
IJERD Editor
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 

Viewers also liked (9)

Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
 

Similar to Welcome to International Journal of Engineering Research and Development (IJERD)

K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
IOSR Journals
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
Editor IJCATR
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes Clustering
IJERD Editor
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
IRJET Journal
 
Data mining Part 1
Data mining Part 1Data mining Part 1
Data mining Part 1
Gautam Kumar
 
Data clustering using kernel based
Data clustering using kernel basedData clustering using kernel based
Data clustering using kernel based
IJITCA Journal
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data Mining
Natasha Grant
 
K means report
K means reportK means report
K means report
Gaurav Handa
 
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
ijiert bestjournal
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
Alexander Decker
 
Performance Analysis of Different Clustering Algorithm
Performance Analysis of Different Clustering AlgorithmPerformance Analysis of Different Clustering Algorithm
Performance Analysis of Different Clustering Algorithm
IOSR Journals
 
F017132529
F017132529F017132529
F017132529
IOSR Journals
 
Clustering heterogeneous categorical data using enhanced mini batch K-means ...
Clustering heterogeneous categorical data using enhanced mini  batch K-means ...Clustering heterogeneous categorical data using enhanced mini  batch K-means ...
Clustering heterogeneous categorical data using enhanced mini batch K-means ...
IJECEIAES
 
Unsupervised Machine Learning Algorithm K-means-Clustering.pptx
Unsupervised Machine Learning Algorithm K-means-Clustering.pptxUnsupervised Machine Learning Algorithm K-means-Clustering.pptx
Unsupervised Machine Learning Algorithm K-means-Clustering.pptx
Anupama Kate
 
Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm
iosrjce
 
I017235662
I017235662I017235662
I017235662
IOSR Journals
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdf
SowmyaJyothi3
 
1376846406 14447221
1376846406  144472211376846406  14447221
1376846406 14447221
Editor Jacotech
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3
Nandhini S
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
Vikash Kumar
 

Similar to Welcome to International Journal of Engineering Research and Development (IJERD) (20)

K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
 
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETSA HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
A HYBRID MODEL FOR MINING MULTI DIMENSIONAL DATA SETS
 
Ensemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes ClusteringEnsemble based Distributed K-Modes Clustering
Ensemble based Distributed K-Modes Clustering
 
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
Machine Learning Algorithms for Image Classification of Hand Digits and Face ...
 
Data mining Part 1
Data mining Part 1Data mining Part 1
Data mining Part 1
 
Data clustering using kernel based
Data clustering using kernel basedData clustering using kernel based
Data clustering using kernel based
 
A Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data MiningA Comparative Study Of Various Clustering Algorithms In Data Mining
A Comparative Study Of Various Clustering Algorithms In Data Mining
 
K means report
K means reportK means report
K means report
 
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
 
Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)Survey on classification algorithms for data mining (comparison and evaluation)
Survey on classification algorithms for data mining (comparison and evaluation)
 
Performance Analysis of Different Clustering Algorithm
Performance Analysis of Different Clustering AlgorithmPerformance Analysis of Different Clustering Algorithm
Performance Analysis of Different Clustering Algorithm
 
F017132529
F017132529F017132529
F017132529
 
Clustering heterogeneous categorical data using enhanced mini batch K-means ...
Clustering heterogeneous categorical data using enhanced mini  batch K-means ...Clustering heterogeneous categorical data using enhanced mini  batch K-means ...
Clustering heterogeneous categorical data using enhanced mini batch K-means ...
 
Unsupervised Machine Learning Algorithm K-means-Clustering.pptx
Unsupervised Machine Learning Algorithm K-means-Clustering.pptxUnsupervised Machine Learning Algorithm K-means-Clustering.pptx
Unsupervised Machine Learning Algorithm K-means-Clustering.pptx
 
Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm Particle Swarm Optimization based K-Prototype Clustering Algorithm
Particle Swarm Optimization based K-Prototype Clustering Algorithm
 
I017235662
I017235662I017235662
I017235662
 
CLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdfCLUSTERING IN DATA MINING.pdf
CLUSTERING IN DATA MINING.pdf
 
1376846406 14447221
1376846406  144472211376846406  14447221
1376846406 14447221
 
CSA 3702 machine learning module 3
CSA 3702 machine learning module 3CSA 3702 machine learning module 3
CSA 3702 machine learning module 3
 
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHESIMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
IMAGE CLASSIFICATION USING DIFFERENT CLASSICAL APPROACHES
 

More from IJERD Editor

A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksA Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
IJERD Editor
 
MEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACEMEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACE
IJERD Editor
 
Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...
IJERD Editor
 
Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’
IJERD Editor
 
Reducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding DesignReducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding Design
IJERD Editor
 
Router 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and VerificationRouter 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and Verification
IJERD Editor
 
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
IJERD Editor
 
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRMitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
IJERD Editor
 
Study on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive ManufacturingStudy on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive Manufacturing
IJERD Editor
 
Spyware triggering system by particular string value
Spyware triggering system by particular string valueSpyware triggering system by particular string value
Spyware triggering system by particular string value
IJERD Editor
 
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
IJERD Editor
 
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeSecure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme
IJERD Editor
 
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
IJERD Editor
 
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraGesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera
IJERD Editor
 
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
IJERD Editor
 
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
IJERD Editor
 
Moon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF DxingMoon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF Dxing
IJERD Editor
 
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
IJERD Editor
 
Importance of Measurements in Smart Grid
Importance of Measurements in Smart GridImportance of Measurements in Smart Grid
Importance of Measurements in Smart Grid
IJERD Editor
 
Study of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powderStudy of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powder
IJERD Editor
 

More from IJERD Editor (20)

A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksA Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks
 
MEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACEMEMS MICROPHONE INTERFACE
MEMS MICROPHONE INTERFACE
 
Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...Influence of tensile behaviour of slab on the structural Behaviour of shear c...
Influence of tensile behaviour of slab on the structural Behaviour of shear c...
 
Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’Gold prospecting using Remote Sensing ‘A case study of Sudan’
Gold prospecting using Remote Sensing ‘A case study of Sudan’
 
Reducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding DesignReducing Corrosion Rate by Welding Design
Reducing Corrosion Rate by Welding Design
 
Router 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and VerificationRouter 1X3 – RTL Design and Verification
Router 1X3 – RTL Design and Verification
 
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...
 
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRMitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR
 
Study on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive ManufacturingStudy on the Fused Deposition Modelling In Additive Manufacturing
Study on the Fused Deposition Modelling In Additive Manufacturing
 
Spyware triggering system by particular string value
Spyware triggering system by particular string valueSpyware triggering system by particular string value
Spyware triggering system by particular string value
 
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...
 
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeSecure Image Transmission for Cloud Storage System Using Hybrid Scheme
Secure Image Transmission for Cloud Storage System Using Hybrid Scheme
 
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...
 
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraGesture Gaming on the World Wide Web Using an Ordinary Web Camera
Gesture Gaming on the World Wide Web Using an Ordinary Web Camera
 
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...
 
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...
 
Moon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF DxingMoon-bounce: A Boon for VHF Dxing
Moon-bounce: A Boon for VHF Dxing
 
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...
 
Importance of Measurements in Smart Grid
Importance of Measurements in Smart GridImportance of Measurements in Smart Grid
Importance of Measurements in Smart Grid
 
Study of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powderStudy of Macro level Properties of SCC using GGBS and Lime stone powder
Study of Macro level Properties of SCC using GGBS and Lime stone powder
 

Welcome to International Journal of Engineering Research and Development (IJERD)

  • 1. International Journal of Engineering Research and Development e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com Volume 4, Issue 7 (November 2012), PP. 14-16 Framework for Missing Value Imputation Ms.R.Malarvizhi1, Dr.Antony Selvadoss Thanamani2, 1 Scholar, Department of CS, NGM College of Arts and Science ,Pollachi ,Bharathiyar University,Coimbatore . 2 HOD, Department of CS, NGM College of Arts and Science, Pollachi, Bharathiyar University, Coimbatore. Abstract— Missing values may occur due to several reasons. In this paper, data is imputed by comparing the two most popular techniques. Mean Substitution the traditional method replaces mean value in K-means Clustering and in groups of kNN classifier. When compared in terms of accuracy of imputing missing data, the proposed kNN classifier is evaluated to demonstrate that the approach is better than the existing K-means clustering. Keywords— K-Means Clustering, kNN Classifier, Missing Value, Mean Substitution, Imputation I. INTRODUCTION Researchers are often faced with the issues of missing in Survey. Errors may occur due to human or machine when processed a sample and stored data values into their respective records. Traditional methods are there for handling missing data. List-wise deletion and pair-wise deletion exclude the data from analyses. These methods are unaccepted by the researchers as they produce biased result. K-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. kNN classifier operate on the premises that classification of unknown instances can be done by relating the unknown to the known according to some distance function. Mean substitution replaces missing data with the average of the valid data. This paper proposed a framework which compares two techniques. II. TYPES OF VARIABLES IN MISSING DATA Data that are Missing Completely at Random (MCAR) can be considered as a simple random sample of observed data. Data are Missing at Random (MAR) when the probability of the value being missing is dependent on some measurement characteristics of the individual but not on the missing value itself.(1). Data are missing not at Random (MNAR), variable value being missing is directly related value of the variable itself. III. CONSTRAINTS FOR MISSING VALUE REPLACEMENT Any method which is used for replacing the missing value should follow certain constraints. They are (i) they should produce unbiased result (ii) They should not change any values of other variables (iii) Estimation should minimize the cost. IV IMPUTATION METHODS A. List-Wise Deletion The List-wise Deletion leads to the loss of large amount of data, as the whole record gets deleted even if one single variable was missing. So finally the database gets reduced. B. Pair-Wise Deletion Pair-wise deletion is an appropriate method when missing data is completely at random. It is easy to implement but the resultant data are difficult to interpret C. Regression Imputation Regression imputation is also known as conditional mean imputation in which linear regression equation is used to replace the missing values. D. Hot-Deck Imputation Hot-Deck imputation is a type of imputation in which missing data value is replaced with actual data by estimating similar data set which is currently in use. E. Cold-Deck Imputation Cold-Deck imputation follows the same procedure of Hot-Deck Imputation but it compares with the similar data set which is not currently in use. F. Expectation Maximization (EM) The EM strategy is based on a recursive process: The missing data have information that is useful in estimating various parameters, and the estimated parameter has information that is useful in finding the most likely value of the missing 14
  • 2. Framework for Missing Value Imputation data (Bennett, 2001). Thus, the EM method is an iterative procedure with two steps in each iteration. The disadvantage of EM is that the standard errors and confidence intervals are not provided, so obtaining those statistics requires an additional step. For inferential analyses for which those are essential, EM may not suffice. G. Mean Imputation In Mean Imputation, missing values are imputed with the mean value of that variable on the basis of the non missing values for that variable. This method assumes that data are MCAR and results in biased means when this assumption is false. Furthermore, imputing the mean value into cases tends to reduce the variance, which also attenuates covariance that the variable has with other variables. This method produces biased means with data that are MAR or NMAR and underestimates variance and covariance’s in all cases. Experts strongly advise against this method (Allison 2001; Bennett 2001; Graham et al., 2003; Pallant, 2007). IV. K-MEANS CLUSTERING K-Means (Mac Queen, 1967) is one the simplest unsupervised learning algorithms that solve the well known clustering problem. The intra-cluster dissimilarity is measured in K-Means clustering by the summation of distances between the objects and the centroid of the clusters they are assigned. In K-Means clustering method, the data set X is divide into K- Clusters. Each cluster is represented by the centroid of the set of objects in the cluster. The objective function Where is a chosen distance measure between a data point and the cluster centre , is an indicator of the distance of the n data points from their respective cluster center. The algorithm is composed of the following steps: Procedure 1: 1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids. 2. Assign each object to the group that has the closest centroid. 3. When all objects have been assigned, recalculate the positions of the K centroids. 4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated. VI. KNN CLASSIFIER The k-nearest neighbor algorithm is amongst the simplest of all machine learning algorithms: an object is classified by a majority vote of its neighbors, with the object being assigned to the class most common amongst its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of its nearest neighbor. The neighbors are taken from a set of objects for which the correct classification is known. This can be thought of as the training set for the algorithm, though no explicit training step is required. The k-nearest neighbor algorithm is sensitive to the local structure of the data Procedure 2: 1. Compute Euclidean or Mahalanobis distance from target plot to those that were sampled. 2. Order samples taking for account calculated distances. 3. Choose heuristically optimal k nearest neighbor based on RMSE done by cross validation technique. 4. Calculate an inverse distance weighted average with the k-nearest multivariate neighbors. VII. PROPOSED FRAMEWORK The above two techniques are implemented separately in a dataset which has missing value. When K-Means Clustering is implemented, the missing values in each cluster are replaced by finding mean of the remaining values. Similarly mean value is imputed for each group from kNN classifier. When both the datasets are compared for accuracy, kNN imputation seems to perform better than the K-Means Clustering. 15
  • 3. Framework for Missing Value Imputation Database with Missing Values K-Means Knn Clustering Classifier C1 C2 Cn G1 G2 G3 Cn Cn Cn Cn MI MI MI MI MI MI VIII. CONCLUSION AND FUTURE ENHANCEMENT This paper discussed different methods to impute the missing values. It also explains about the framework to be implemented with dataset with missing data. The presented two techniques deal with grouping of dataset so that each group can be dealt with mean substitution or any other traditional method. The results are compared for accuracy. REFERENCES [1] Allison, P.D-“Missing Data”, Thousand Oaks, CA: Sage -2001. [2] Bennett, D.A. “How can I deal with missing data in my study? Australian and New Zealand Journal of Public Health”, 25, pp.464 – 469, 2001. [3] Graham, J.W. “Adding missing-data-relevant variables to FIML- based structural equation models. Structural Equation Modeling”, 10, pp.80 – 100, 2003. [4] Graham, J.W, “Missing Data Analysis: Making it work in the real world. Annual Review of Psychology”, 60, 549 – 576 , 2009. [5] Gabriel L.Schlomer, Sheri Bauman, and Noel A. Card : “ Best Practices for Missing Data Management in Counseling Psychology” , Journal of Counseling Psychology 2010, Vol.57.No 1,1 – 10. [6] Jeffrey C.Wayman , “Multiple Imputation For Missing Data : What Is It And How Can I Use It?” , Paper presented at the 2003 Annual Meeting of the American Educational Research Association, Chicago, IL ,pp . 2 -16, 2003. [7] A.Rogier T.Donders, Geert J.M.G Vander Heljden, Theo Stijnen, Kernel G.M Moons, “Review: A gentle introduction to imputation of missing values” , Journal of Clinical Epidemiology 59 , pp.1087 – 1091, 2006. [8] Kin Wagstaff ,”Clustering with Missing Values : No Imputation Required” -NSF grant IIS-0325329,pp.1-10. [9] S.Hichao Zhang , Jilian Zhang, Xiaofeng Zhu, Yongsong Qin,chengqi Zhang , “Missing Value Imputation Based on Data Clustering”, Springer-Verlag Berlin, Heidelberg ,2008. [10] Richard J.Hathuway , James C.Bezex, Jacalyn M.Huband , “Scalable Visual Assessment of Cluster Tendency for Large Data Sets”, Pattern Recognition ,Volume 39, Issue 7,pp,1315-1324- Feb 2006. [11] Qinbao Song, Martin Shepperd ,”A New Imputation Method for Small Software Project Data set”, The Journal of Systems and Software 80 ,pp,51–62, 2007. [12] Gabriel L.Scholmer, Sheri Bauman and Noel A.card “Best practices for Missing Data Management in Counseling Psychology”, Journal of Counseling Psychology, Vol. 57, No. 1,pp. 1–10,2010. [13] R.Kavitha Kumar, Dr.R.M Chandrasekar,“Missing Data Imputation in Cardiac Data Set” ,International Journal on Computer Science and Engineering , Vol.02 , No.05,pp-1836 – 1840 , 2010. [14] Jinhai Ma, Noori Aichar –Danesh , Lisa Dolovich, Lahana Thabane , “Imputation Strategies for Missing Binary Outcomes in Cluster Randomized Trials”- BMC Med Res Methodol. 2011; pp- 11: 18. – 2011. [15] R.S.Somasundaram , R.Nedunchezhian , “Evaluation of Three Simple Imputation Methods for Enhancing Preprocessing of Data with Missing Values”, International Journal of Computer Applications (0975 – 8887) Volume 21 – No.10 ,pp.14-19 ,May 2011. [16] K.Raja , G.Tholkappia Arasu , Chitra. S.Nair , “Imputation Framework for Missing Value” , International Journal of Computer Trends and Technology – volume3Issue2 – 2012. [17] BOB L.Wall , Jeff K.Elser – “Imputation of Missing Data for Input to Support Vector Machines” , 16