SlideShare a Scribd company logo
1 of 18
PROBLEM DATA: MISSING
DATA AND CAUSES
BEING A PRESENTATION BY GROUP
TWO (2)
FOR BIOL 802
WHAT ARE MISSING DATA?
• In statistics, missing data, or missing values,
occur when no data value is stored for the
variable in an observation.
• This situation mostly occur as a result of
manual data entry procedures, equipment
errors and incorrect measurements.
1
2
PROBLEMS OF MISSING DATA
-loss of efficiency
- complications in handling and analyzing
the data
- bias resulting from differences between
missing and complete data.
• MISSING COMPLETELY AT RANDOM (MCAR)
TYPES OF MISSING DATA
Values in a data set can miss completely at
random (MCAR) if the events that lead to any
particular data-item missing are independent
both of observable variables and of
unobservable parameters of interest, and occur
entirely at random.
3
• MISSING AT RANDOM (MAR)
TYPES OF MISSING DATA (Cont’d)
Missing at random (MAR) is an alternative, and
occurs when the missing-ness is related to a
particular variable, but it is not related to the value
of the variable that has missing data.
An example of this is accidentally omitting an
answer on a questionnaire.
4
TYPES OF MISSING DATA (Cont’d)
• MISSING NOT AT RANDOM (MNAR)
This is data that is missing for a specific reason
(i.e. the value of the variable that is missing is
related to the reason it is missing).
An example of this is if certain question on a
questionnaire tend to be skipped deliberately by
participants with certain characteristics.
5
HOW TO DEAL WITH MISSING DATA
Missing data reduce the representativeness of the
sample and can therefore distort inferences about
the population.
• There is no need to use a special method for
dealing missing values if method that is used
for data analysis has its own policy for
handling missing values.
6
8
HOW TO DEAL WITH MISSING DATA
• Decision rules extraction methods may consider
attributes with missing data as irrelevant
•Association rules extraction methods may
ignore rows with missing values (conservative
approach)
•handle missing values as they are supporting the
rule (optimistic approach)
HOW TO DEAL WITH MISSING DATA
•REDUCING THE DATA SET
The simplest solution for the missing data
imputation problem is the reduction of the data
set and elimination of all missing values. This can
be done by elimination of samples (rows) with
missing values or elimination of attributes
(columns) with missing values
10
HOW TO DEAL WITH MISSING DATA
TREATING MISSING ATTRIBUTE VALUES AS
SPECIAL VALUES.
This method deals with the unknown attribute
values using a totally different approach.
Rather than trying to find some known
attribute value as its value, we treat missing
value itself as a new value for the attributes
that contain missing values and treat it in the
same way as other values.
11
HOW TO DEAL WITH MISSING DATA
REPLACE MISSING VALUE WITH MEAN
This method replaces each missing value with
mean of the attribute.
The mean is calculated based on all known values
of the attribute. This method is usable only for
numeric attributes and is usually combined with
replacing missing values with most common
attribute value for symbolic attributes.
12
HOW TO DEAL WITH MISSING DATA
REPLACE MISSING VALUE WITH MEAN FOR
THE GIVEN CLASS
This method is similar to the previous one. The
difference is in that the mean is not calculated
from all known values of the attribute, but only
attributes values belonging to given class are used.
13
HOW TO DEAL WITH MISSING DATA
REPLACE MISSING VALUE WITH MOST COMMON
ATTRIBUTE VALUE
Since the mean is affected by the presence of
outliers it seems natural to use the median instead
just to assure robustness. In this case the missing
data for a given attribute is replaced by the median
of all known values of that attribute in the class
where the instance with the missing value belongs
14
HOW TO DEAL WITH MISSING DATA
CLOSEST FIT
The closest fit algorithm for missing attribute
values is based on replacing a missing attribute
value with an existing value of the same attribute
from another case that resembles as much as
possible the case with missing attribute values
15
HOW TO DEAL WITH MISSING DATA
E.g. if 5 attributes of given case is known and 6th
is being searched, the 6th attribute value is taken
from the case which has other 5 attributes values
most similar to the given case. The proximity
measure between two cases x and y is the
Manhattan distance between x and y, i.e.,
16
HOW TO DEAL WITH MISSING DATA
where r is the difference between the maximum
and minimum of the known values.
Problem with using only one closest case may
lied in replacing missing value by an outlier.
17
HOW TO DEAL WITH MISSING DATA
The problem is illustrated by following example
THANK YOU ALL FOR LISTENING
8

More Related Content

What's hot

Statistical Data Analysis | Data Analysis | Statistics Services | Data Collec...
Statistical Data Analysis | Data Analysis | Statistics Services | Data Collec...Statistical Data Analysis | Data Analysis | Statistics Services | Data Collec...
Statistical Data Analysis | Data Analysis | Statistics Services | Data Collec...Stats Statswork
 
Decision Trees
Decision TreesDecision Trees
Decision TreesStudent
 
Introduction to principal component analysis (pca)
Introduction to principal component analysis (pca)Introduction to principal component analysis (pca)
Introduction to principal component analysis (pca)Mohammed Musah
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsSarfraz Ahmad
 
Inferential statistics.ppt
Inferential statistics.pptInferential statistics.ppt
Inferential statistics.pptNursing Path
 
Ordinal logistic regression
Ordinal logistic regression Ordinal logistic regression
Ordinal logistic regression Dr Athar Khan
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysisKrish_ver2
 
Introduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood EstimatorIntroduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood EstimatorAmir Al-Ansary
 
Data mining Part 1
Data mining Part 1Data mining Part 1
Data mining Part 1Gautam Kumar
 
Logistic regression
Logistic regressionLogistic regression
Logistic regressionDrZahid Khan
 
data analysis techniques and statistical softwares
data analysis techniques and statistical softwaresdata analysis techniques and statistical softwares
data analysis techniques and statistical softwaresDr.ammara khakwani
 
Introduction to unsupervised learning: outlier detection
Introduction to unsupervised learning: outlier detectionIntroduction to unsupervised learning: outlier detection
Introduction to unsupervised learning: outlier detectionJoseph Itopa Abubakar
 
discriminant analysis
discriminant analysisdiscriminant analysis
discriminant analysiskrishnadk
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysisgokulprasath06
 

What's hot (20)

Statistical Data Analysis | Data Analysis | Statistics Services | Data Collec...
Statistical Data Analysis | Data Analysis | Statistics Services | Data Collec...Statistical Data Analysis | Data Analysis | Statistics Services | Data Collec...
Statistical Data Analysis | Data Analysis | Statistics Services | Data Collec...
 
Data cleaning-outlier-detection
Data cleaning-outlier-detectionData cleaning-outlier-detection
Data cleaning-outlier-detection
 
Type of data
Type of dataType of data
Type of data
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Introduction to principal component analysis (pca)
Introduction to principal component analysis (pca)Introduction to principal component analysis (pca)
Introduction to principal component analysis (pca)
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Inferential statistics.ppt
Inferential statistics.pptInferential statistics.ppt
Inferential statistics.ppt
 
Ordinal logistic regression
Ordinal logistic regression Ordinal logistic regression
Ordinal logistic regression
 
Pca ppt
Pca pptPca ppt
Pca ppt
 
Chapter 12 outlier
Chapter 12 outlierChapter 12 outlier
Chapter 12 outlier
 
3.7 outlier analysis
3.7 outlier analysis3.7 outlier analysis
3.7 outlier analysis
 
Introduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood EstimatorIntroduction to Maximum Likelihood Estimator
Introduction to Maximum Likelihood Estimator
 
Data mining Part 1
Data mining Part 1Data mining Part 1
Data mining Part 1
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
data analysis techniques and statistical softwares
data analysis techniques and statistical softwaresdata analysis techniques and statistical softwares
data analysis techniques and statistical softwares
 
Introduction to unsupervised learning: outlier detection
Introduction to unsupervised learning: outlier detectionIntroduction to unsupervised learning: outlier detection
Introduction to unsupervised learning: outlier detection
 
discriminant analysis
discriminant analysisdiscriminant analysis
discriminant analysis
 
Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 

Viewers also liked

Statistical Approaches to Missing Data
Statistical Approaches to Missing DataStatistical Approaches to Missing Data
Statistical Approaches to Missing DataDataCards
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessingextraganesh
 
The Joys of Clean Data with Matt Dowle
The Joys of Clean Data with  Matt DowleThe Joys of Clean Data with  Matt Dowle
The Joys of Clean Data with Matt DowleSri Ambati
 
PROMISE 2011: "Handling missing data in software effort prediction with naive...
PROMISE 2011: "Handling missing data in software effort prediction with naive...PROMISE 2011: "Handling missing data in software effort prediction with naive...
PROMISE 2011: "Handling missing data in software effort prediction with naive...CS, NcState
 
Seminar presentation on AKI and CKD in pediatrics
Seminar presentation on AKI and CKD in pediatricsSeminar presentation on AKI and CKD in pediatrics
Seminar presentation on AKI and CKD in pediatricsfasil wagnew
 
A handbook-of-statistical-analyses-using-stata-3rd-edition
A handbook-of-statistical-analyses-using-stata-3rd-editionA handbook-of-statistical-analyses-using-stata-3rd-edition
A handbook-of-statistical-analyses-using-stata-3rd-editionTriều Dương
 
Data management in Stata
Data management in StataData management in Stata
Data management in Stataizahn
 
Quantitative Data Analysis
Quantitative Data AnalysisQuantitative Data Analysis
Quantitative Data AnalysisAsma Muhamad
 

Viewers also liked (14)

Statistical Approaches to Missing Data
Statistical Approaches to Missing DataStatistical Approaches to Missing Data
Statistical Approaches to Missing Data
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
STATA 13
STATA 13STATA 13
STATA 13
 
The Joys of Clean Data with Matt Dowle
The Joys of Clean Data with  Matt DowleThe Joys of Clean Data with  Matt Dowle
The Joys of Clean Data with Matt Dowle
 
PROMISE 2011: "Handling missing data in software effort prediction with naive...
PROMISE 2011: "Handling missing data in software effort prediction with naive...PROMISE 2011: "Handling missing data in software effort prediction with naive...
PROMISE 2011: "Handling missing data in software effort prediction with naive...
 
Stata tutorial
Stata tutorialStata tutorial
Stata tutorial
 
Seminar presentation on AKI and CKD in pediatrics
Seminar presentation on AKI and CKD in pediatricsSeminar presentation on AKI and CKD in pediatrics
Seminar presentation on AKI and CKD in pediatrics
 
A handbook-of-statistical-analyses-using-stata-3rd-edition
A handbook-of-statistical-analyses-using-stata-3rd-editionA handbook-of-statistical-analyses-using-stata-3rd-edition
A handbook-of-statistical-analyses-using-stata-3rd-edition
 
Survival analysis
Survival analysisSurvival analysis
Survival analysis
 
Data cleansing
Data cleansingData cleansing
Data cleansing
 
Acute Kidney Injury
Acute Kidney InjuryAcute Kidney Injury
Acute Kidney Injury
 
Data management in Stata
Data management in StataData management in Stata
Data management in Stata
 
Variables de investigación
Variables de investigaciónVariables de investigación
Variables de investigación
 
Quantitative Data Analysis
Quantitative Data AnalysisQuantitative Data Analysis
Quantitative Data Analysis
 

Similar to Missing Data and Causes

Database ppt.pptx
Database ppt.pptxDatabase ppt.pptx
Database ppt.pptxAASTHAJAJOO
 
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...ijiert bestjournal
 
data Sreening.doc
data Sreening.docdata Sreening.doc
data Sreening.docmurtaza5500
 
SELECTED DATA PREPARATION METHODS
SELECTED DATA PREPARATION METHODSSELECTED DATA PREPARATION METHODS
SELECTED DATA PREPARATION METHODSKAMIL MAJEED
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data PreprocessingT Kavitha
 
3 Missing data12256429.ppt
3 Missing data12256429.ppt3 Missing data12256429.ppt
3 Missing data12256429.pptAravind Reddy
 
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...CSCJournals
 
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...IOSR Journals
 
INFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTION
INFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTIONINFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTION
INFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTIONIJDKP
 
Machine Learning Approaches and its Challenges
Machine Learning Approaches and its ChallengesMachine Learning Approaches and its Challenges
Machine Learning Approaches and its Challengesijcnes
 
Data Reduction
Data ReductionData Reduction
Data ReductionRajan Shah
 
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Scores...
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating  Scores...SPSS GuideAssessing Normality, Handling Missing Data, and Calculating  Scores...
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Scores...ahmedragab433449
 
Multiple Linear Regression Models in Outlier Detection
Multiple Linear Regression Models in Outlier Detection Multiple Linear Regression Models in Outlier Detection
Multiple Linear Regression Models in Outlier Detection IJORCS
 
Imputation Techniques For Market Research Datasets With Missing Values
Imputation Techniques For Market Research Datasets With Missing Values Imputation Techniques For Market Research Datasets With Missing Values
Imputation Techniques For Market Research Datasets With Missing Values Salford Systems
 
Data Analytics on Solar Energy Using Hadoop
Data Analytics on Solar Energy Using HadoopData Analytics on Solar Energy Using Hadoop
Data Analytics on Solar Energy Using HadoopIJMERJOURNAL
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2Gokulks007
 

Similar to Missing Data and Causes (20)

Database ppt.pptx
Database ppt.pptxDatabase ppt.pptx
Database ppt.pptx
 
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
COMPARISION OF PERCENTAGE ERROR BY USING IMPUTATION METHOD ON MID TERM EXAMIN...
 
data Sreening.doc
data Sreening.docdata Sreening.doc
data Sreening.doc
 
SELECTED DATA PREPARATION METHODS
SELECTED DATA PREPARATION METHODSSELECTED DATA PREPARATION METHODS
SELECTED DATA PREPARATION METHODS
 
B0930610
B0930610B0930610
B0930610
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
3 Missing data12256429.ppt
3 Missing data12256429.ppt3 Missing data12256429.ppt
3 Missing data12256429.ppt
 
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...
A Two-Step Self-Evaluation Algorithm On Imputation Approaches For Missing Cat...
 
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...K-NN Classifier Performs Better Than K-Means Clustering in  Missing Value Imp...
K-NN Classifier Performs Better Than K-Means Clustering in Missing Value Imp...
 
INFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTION
INFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTIONINFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTION
INFLUENCE OF DATA GEOMETRY IN RANDOM SUBSET FEATURE SELECTION
 
Machine Learning Approaches and its Challenges
Machine Learning Approaches and its ChallengesMachine Learning Approaches and its Challenges
Machine Learning Approaches and its Challenges
 
Multiple imputation of missing data
Multiple imputation of missing dataMultiple imputation of missing data
Multiple imputation of missing data
 
ML-Unit-4.pdf
ML-Unit-4.pdfML-Unit-4.pdf
ML-Unit-4.pdf
 
Data Reduction
Data ReductionData Reduction
Data Reduction
 
Data pre processing
Data pre processingData pre processing
Data pre processing
 
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Scores...
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating  Scores...SPSS GuideAssessing Normality, Handling Missing Data, and Calculating  Scores...
SPSS GuideAssessing Normality, Handling Missing Data, and Calculating Scores...
 
Multiple Linear Regression Models in Outlier Detection
Multiple Linear Regression Models in Outlier Detection Multiple Linear Regression Models in Outlier Detection
Multiple Linear Regression Models in Outlier Detection
 
Imputation Techniques For Market Research Datasets With Missing Values
Imputation Techniques For Market Research Datasets With Missing Values Imputation Techniques For Market Research Datasets With Missing Values
Imputation Techniques For Market Research Datasets With Missing Values
 
Data Analytics on Solar Energy Using Hadoop
Data Analytics on Solar Energy Using HadoopData Analytics on Solar Energy Using Hadoop
Data Analytics on Solar Energy Using Hadoop
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2
 

Recently uploaded

Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPirithiRaju
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxNandakishor Bhaurao Deshmukh
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfSELF-EXPLANATORY
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPirithiRaju
 

Recently uploaded (20)

Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdfPests of jatropha_Bionomics_identification_Dr.UPR.pdf
Pests of jatropha_Bionomics_identification_Dr.UPR.pdf
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptxTHE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
THE ROLE OF PHARMACOGNOSY IN TRADITIONAL AND MODERN SYSTEM OF MEDICINE.pptx
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdfPests of safflower_Binomics_Identification_Dr.UPR.pdf
Pests of safflower_Binomics_Identification_Dr.UPR.pdf
 

Missing Data and Causes

  • 1. PROBLEM DATA: MISSING DATA AND CAUSES BEING A PRESENTATION BY GROUP TWO (2) FOR BIOL 802
  • 2. WHAT ARE MISSING DATA? • In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. • This situation mostly occur as a result of manual data entry procedures, equipment errors and incorrect measurements. 1
  • 3. 2 PROBLEMS OF MISSING DATA -loss of efficiency - complications in handling and analyzing the data - bias resulting from differences between missing and complete data.
  • 4. • MISSING COMPLETELY AT RANDOM (MCAR) TYPES OF MISSING DATA Values in a data set can miss completely at random (MCAR) if the events that lead to any particular data-item missing are independent both of observable variables and of unobservable parameters of interest, and occur entirely at random. 3
  • 5. • MISSING AT RANDOM (MAR) TYPES OF MISSING DATA (Cont’d) Missing at random (MAR) is an alternative, and occurs when the missing-ness is related to a particular variable, but it is not related to the value of the variable that has missing data. An example of this is accidentally omitting an answer on a questionnaire. 4
  • 6. TYPES OF MISSING DATA (Cont’d) • MISSING NOT AT RANDOM (MNAR) This is data that is missing for a specific reason (i.e. the value of the variable that is missing is related to the reason it is missing). An example of this is if certain question on a questionnaire tend to be skipped deliberately by participants with certain characteristics. 5
  • 7. HOW TO DEAL WITH MISSING DATA Missing data reduce the representativeness of the sample and can therefore distort inferences about the population. • There is no need to use a special method for dealing missing values if method that is used for data analysis has its own policy for handling missing values. 6
  • 8. 8 HOW TO DEAL WITH MISSING DATA • Decision rules extraction methods may consider attributes with missing data as irrelevant •Association rules extraction methods may ignore rows with missing values (conservative approach) •handle missing values as they are supporting the rule (optimistic approach)
  • 9. HOW TO DEAL WITH MISSING DATA •REDUCING THE DATA SET The simplest solution for the missing data imputation problem is the reduction of the data set and elimination of all missing values. This can be done by elimination of samples (rows) with missing values or elimination of attributes (columns) with missing values
  • 10. 10 HOW TO DEAL WITH MISSING DATA TREATING MISSING ATTRIBUTE VALUES AS SPECIAL VALUES. This method deals with the unknown attribute values using a totally different approach. Rather than trying to find some known attribute value as its value, we treat missing value itself as a new value for the attributes that contain missing values and treat it in the same way as other values.
  • 11. 11 HOW TO DEAL WITH MISSING DATA REPLACE MISSING VALUE WITH MEAN This method replaces each missing value with mean of the attribute. The mean is calculated based on all known values of the attribute. This method is usable only for numeric attributes and is usually combined with replacing missing values with most common attribute value for symbolic attributes.
  • 12. 12 HOW TO DEAL WITH MISSING DATA REPLACE MISSING VALUE WITH MEAN FOR THE GIVEN CLASS This method is similar to the previous one. The difference is in that the mean is not calculated from all known values of the attribute, but only attributes values belonging to given class are used.
  • 13. 13 HOW TO DEAL WITH MISSING DATA REPLACE MISSING VALUE WITH MOST COMMON ATTRIBUTE VALUE Since the mean is affected by the presence of outliers it seems natural to use the median instead just to assure robustness. In this case the missing data for a given attribute is replaced by the median of all known values of that attribute in the class where the instance with the missing value belongs
  • 14. 14 HOW TO DEAL WITH MISSING DATA CLOSEST FIT The closest fit algorithm for missing attribute values is based on replacing a missing attribute value with an existing value of the same attribute from another case that resembles as much as possible the case with missing attribute values
  • 15. 15 HOW TO DEAL WITH MISSING DATA E.g. if 5 attributes of given case is known and 6th is being searched, the 6th attribute value is taken from the case which has other 5 attributes values most similar to the given case. The proximity measure between two cases x and y is the Manhattan distance between x and y, i.e.,
  • 16. 16 HOW TO DEAL WITH MISSING DATA where r is the difference between the maximum and minimum of the known values. Problem with using only one closest case may lied in replacing missing value by an outlier.
  • 17. 17 HOW TO DEAL WITH MISSING DATA The problem is illustrated by following example
  • 18. THANK YOU ALL FOR LISTENING 8