SlideShare a Scribd company logo
1 of 18
PROBLEM DATA: MISSING
DATA AND CAUSES
BEING A PRESENTATION BY GROUP
TWO (2)
FOR BIOL 802
WHAT ARE MISSING DATA?
• In statistics, missing data, or missing values,
occur when no data value is stored for the
variable in an observation.
• This situation mostly occur as a result of
manual data entry procedures, equipment
errors and incorrect measurements.
1
2
PROBLEMS OF MISSING DATA
-loss of efficiency
- complications in handling and analyzing
the data
- bias resulting from differences between
missing and complete data.
• MISSING COMPLETELY AT RANDOM (MCAR)
TYPES OF MISSING DATA
Values in a data set can miss completely at
random (MCAR) if the events that lead to any
particular data-item missing are independent
both of observable variables and of
unobservable parameters of interest, and occur
entirely at random.
3
• MISSING AT RANDOM (MAR)
TYPES OF MISSING DATA (Cont’d)
Missing at random (MAR) is an alternative, and
occurs when the missing-ness is related to a
particular variable, but it is not related to the value
of the variable that has missing data.
An example of this is accidentally omitting an
answer on a questionnaire.
4
TYPES OF MISSING DATA (Cont’d)
• MISSING NOT AT RANDOM (MNAR)
This is data that is missing for a specific reason
(i.e. the value of the variable that is missing is
related to the reason it is missing).
An example of this is if certain question on a
questionnaire tend to be skipped deliberately by
participants with certain characteristics.
5
HOW TO DEAL WITH MISSING DATA
Missing data reduce the representativeness of the
sample and can therefore distort inferences about
the population.
• There is no need to use a special method for
dealing missing values if method that is used
for data analysis has its own policy for
handling missing values.
6
8
HOW TO DEAL WITH MISSING DATA
• Decision rules extraction methods may consider
attributes with missing data as irrelevant
•Association rules extraction methods may
ignore rows with missing values (conservative
approach)
•handle missing values as they are supporting the
rule (optimistic approach)
HOW TO DEAL WITH MISSING DATA
•REDUCING THE DATA SET
The simplest solution for the missing data
imputation problem is the reduction of the data
set and elimination of all missing values. This can
be done by elimination of samples (rows) with
missing values or elimination of attributes
(columns) with missing values
10
HOW TO DEAL WITH MISSING DATA
TREATING MISSING ATTRIBUTE VALUES AS
SPECIAL VALUES.
This method deals with the unknown attribute
values using a totally different approach.
Rather than trying to find some known
attribute value as its value, we treat missing
value itself as a new value for the attributes
that contain missing values and treat it in the
same way as other values.
11
HOW TO DEAL WITH MISSING DATA
REPLACE MISSING VALUE WITH MEAN
This method replaces each missing value with
mean of the attribute.
The mean is calculated based on all known values
of the attribute. This method is usable only for
numeric attributes and is usually combined with
replacing missing values with most common
attribute value for symbolic attributes.
12
HOW TO DEAL WITH MISSING DATA
REPLACE MISSING VALUE WITH MEAN FOR
THE GIVEN CLASS
This method is similar to the previous one. The
difference is in that the mean is not calculated
from all known values of the attribute, but only
attributes values belonging to given class are used.
13
HOW TO DEAL WITH MISSING DATA
REPLACE MISSING VALUE WITH MOST COMMON
ATTRIBUTE VALUE
Since the mean is affected by the presence of
outliers it seems natural to use the median instead
just to assure robustness. In this case the missing
data for a given attribute is replaced by the median
of all known values of that attribute in the class
where the instance with the missing value belongs
14
HOW TO DEAL WITH MISSING DATA
CLOSEST FIT
The closest fit algorithm for missing attribute
values is based on replacing a missing attribute
value with an existing value of the same attribute
from another case that resembles as much as
possible the case with missing attribute values
15
HOW TO DEAL WITH MISSING DATA
E.g. if 5 attributes of given case is known and 6th
is being searched, the 6th attribute value is taken
from the case which has other 5 attributes values
most similar to the given case. The proximity
measure between two cases x and y is the
Manhattan distance between x and y, i.e.,
16
HOW TO DEAL WITH MISSING DATA
where r is the difference between the maximum
and minimum of the known values.
Problem with using only one closest case may
lied in replacing missing value by an outlier.
17
HOW TO DEAL WITH MISSING DATA
The problem is illustrated by following example
THANK YOU ALL FOR LISTENING
8

More Related Content

Similar to missingpdf

K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...IOSR Journals
 
INFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTION
INFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTIONINFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTION
INFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTIONIJDKP
 
Machine Learning Approaches and its Challenges
Machine Learning Approaches and its ChallengesMachine Learning Approaches and its Challenges
Machine Learning Approaches and its Challengesijcnes
 
Data Reduction
Data ReductionData Reduction
Data ReductionRajan Shah
 
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Scores...
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating  Scores...SPSS GuideAssessing Normality, Handling Missing Data, and Calculating  Scores...
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Scores...ahmedragab433449
 
Multiple Linear Regression Models in Outlier Detection
Multiple Linear Regression Models in Outlier Detection Multiple Linear Regression Models in Outlier Detection
Multiple Linear Regression Models in Outlier Detection IJORCS
 
Imputation Techniques For Market Research Datasets With Missing Values
Imputation Techniques For Market Research Datasets With Missing Values Imputation Techniques For Market Research Datasets With Missing Values
Imputation Techniques For Market Research Datasets With Missing Values Salford Systems
 
Data Analytics on Solar Energy Using Hadoop
Data Analytics on Solar Energy Using HadoopData Analytics on Solar Energy Using Hadoop
Data Analytics on Solar Energy Using HadoopIJMERJOURNAL
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2Gokulks007
 
Statistical Approaches to Missing Data
Statistical Approaches to Missing DataStatistical Approaches to Missing Data
Statistical Approaches to Missing DataDataCards
 
Poor man's missing value imputation
Poor man's missing value imputationPoor man's missing value imputation
Poor man's missing value imputationLeonardo Auslender
 
Exploratory Data Analysis Unit 1 ppt presentation.pptx
Exploratory Data Analysis Unit 1 ppt presentation.pptxExploratory Data Analysis Unit 1 ppt presentation.pptx
Exploratory Data Analysis Unit 1 ppt presentation.pptxMayura shelke
 
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Total S...
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Total S...SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Total S...
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Total S...ahmedragab433449
 

Similar to missingpdf (20)

K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
 
INFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTION
INFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTIONINFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTION
INFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTION
 
Machine Learning Approaches and its Challenges
Machine Learning Approaches and its ChallengesMachine Learning Approaches and its Challenges
Machine Learning Approaches and its Challenges
 
Multiple imputation of missing data
Multiple imputation of missing dataMultiple imputation of missing data
Multiple imputation of missing data
 
ML-Unit-4.pdf
ML-Unit-4.pdfML-Unit-4.pdf
ML-Unit-4.pdf
 
Data Reduction
Data ReductionData Reduction
Data Reduction
 
Data pre processing
Data pre processingData pre processing
Data pre processing
 
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Scores...
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating  Scores...SPSS GuideAssessing Normality, Handling Missing Data, and Calculating  Scores...
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Scores...
 
Multiple Linear Regression Models in Outlier Detection
Multiple Linear Regression Models in Outlier Detection Multiple Linear Regression Models in Outlier Detection
Multiple Linear Regression Models in Outlier Detection
 
Imputation Techniques For Market Research Datasets With Missing Values
Imputation Techniques For Market Research Datasets With Missing Values Imputation Techniques For Market Research Datasets With Missing Values
Imputation Techniques For Market Research Datasets With Missing Values
 
Data Analytics on Solar Energy Using Hadoop
Data Analytics on Solar Energy Using HadoopData Analytics on Solar Energy Using Hadoop
Data Analytics on Solar Energy Using Hadoop
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2
 
EDA by Sastry.pptx
EDA by Sastry.pptxEDA by Sastry.pptx
EDA by Sastry.pptx
 
Statistical Approaches to Missing Data
Statistical Approaches to Missing DataStatistical Approaches to Missing Data
Statistical Approaches to Missing Data
 
Poor man's missing value imputation
Poor man's missing value imputationPoor man's missing value imputation
Poor man's missing value imputation
 
Data Mining: Data Preprocessing
Data Mining: Data PreprocessingData Mining: Data Preprocessing
Data Mining: Data Preprocessing
 
Exploratory Data Analysis Unit 1 ppt presentation.pptx
Exploratory Data Analysis Unit 1 ppt presentation.pptxExploratory Data Analysis Unit 1 ppt presentation.pptx
Exploratory Data Analysis Unit 1 ppt presentation.pptx
 
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Total S...
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Total S...SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Total S...
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Total S...
 
Unit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdfUnit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdf
 
Unit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdfUnit-3 Data Analytics.pdf
Unit-3 Data Analytics.pdf
 

Recently uploaded

How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and ModificationsMJDuyan
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxDr. Sarita Anand
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...Poonam Aher Patil
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdfssuserdda66b
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 

Recently uploaded (20)

How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdfVishram Singh - Textbook of Anatomy  Upper Limb and Thorax.. Volume 1 (1).pdf
Vishram Singh - Textbook of Anatomy Upper Limb and Thorax.. Volume 1 (1).pdf
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 

missingpdf

  • 1. PROBLEM DATA: MISSING DATA AND CAUSES BEING A PRESENTATION BY GROUP TWO (2) FOR BIOL 802
  • 2. WHAT ARE MISSING DATA? • In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. • This situation mostly occur as a result of manual data entry procedures, equipment errors and incorrect measurements. 1
  • 3. 2 PROBLEMS OF MISSING DATA -loss of efficiency - complications in handling and analyzing the data - bias resulting from differences between missing and complete data.
  • 4. • MISSING COMPLETELY AT RANDOM (MCAR) TYPES OF MISSING DATA Values in a data set can miss completely at random (MCAR) if the events that lead to any particular data-item missing are independent both of observable variables and of unobservable parameters of interest, and occur entirely at random. 3
  • 5. • MISSING AT RANDOM (MAR) TYPES OF MISSING DATA (Cont’d) Missing at random (MAR) is an alternative, and occurs when the missing-ness is related to a particular variable, but it is not related to the value of the variable that has missing data. An example of this is accidentally omitting an answer on a questionnaire. 4
  • 6. TYPES OF MISSING DATA (Cont’d) • MISSING NOT AT RANDOM (MNAR) This is data that is missing for a specific reason (i.e. the value of the variable that is missing is related to the reason it is missing). An example of this is if certain question on a questionnaire tend to be skipped deliberately by participants with certain characteristics. 5
  • 7. HOW TO DEAL WITH MISSING DATA Missing data reduce the representativeness of the sample and can therefore distort inferences about the population. • There is no need to use a special method for dealing missing values if method that is used for data analysis has its own policy for handling missing values. 6
  • 8. 8 HOW TO DEAL WITH MISSING DATA • Decision rules extraction methods may consider attributes with missing data as irrelevant •Association rules extraction methods may ignore rows with missing values (conservative approach) •handle missing values as they are supporting the rule (optimistic approach)
  • 9. HOW TO DEAL WITH MISSING DATA •REDUCING THE DATA SET The simplest solution for the missing data imputation problem is the reduction of the data set and elimination of all missing values. This can be done by elimination of samples (rows) with missing values or elimination of attributes (columns) with missing values
  • 10. 10 HOW TO DEAL WITH MISSING DATA TREATING MISSING ATTRIBUTE VALUES AS SPECIAL VALUES. This method deals with the unknown attribute values using a totally different approach. Rather than trying to find some known attribute value as its value, we treat missing value itself as a new value for the attributes that contain missing values and treat it in the same way as other values.
  • 11. 11 HOW TO DEAL WITH MISSING DATA REPLACE MISSING VALUE WITH MEAN This method replaces each missing value with mean of the attribute. The mean is calculated based on all known values of the attribute. This method is usable only for numeric attributes and is usually combined with replacing missing values with most common attribute value for symbolic attributes.
  • 12. 12 HOW TO DEAL WITH MISSING DATA REPLACE MISSING VALUE WITH MEAN FOR THE GIVEN CLASS This method is similar to the previous one. The difference is in that the mean is not calculated from all known values of the attribute, but only attributes values belonging to given class are used.
  • 13. 13 HOW TO DEAL WITH MISSING DATA REPLACE MISSING VALUE WITH MOST COMMON ATTRIBUTE VALUE Since the mean is affected by the presence of outliers it seems natural to use the median instead just to assure robustness. In this case the missing data for a given attribute is replaced by the median of all known values of that attribute in the class where the instance with the missing value belongs
  • 14. 14 HOW TO DEAL WITH MISSING DATA CLOSEST FIT The closest fit algorithm for missing attribute values is based on replacing a missing attribute value with an existing value of the same attribute from another case that resembles as much as possible the case with missing attribute values
  • 15. 15 HOW TO DEAL WITH MISSING DATA E.g. if 5 attributes of given case is known and 6th is being searched, the 6th attribute value is taken from the case which has other 5 attributes values most similar to the given case. The proximity measure between two cases x and y is the Manhattan distance between x and y, i.e.,
  • 16. 16 HOW TO DEAL WITH MISSING DATA where r is the difference between the maximum and minimum of the known values. Problem with using only one closest case may lied in replacing missing value by an outlier.
  • 17. 17 HOW TO DEAL WITH MISSING DATA The problem is illustrated by following example
  • 18. THANK YOU ALL FOR LISTENING 8