Imputation Methods for Handling Missing Data in Longitudinal Studies
Complete data on everything one set out to record is rare, especially in
study settings where repeated measures are taken from subjects. A major
shortcoming of longitudinal or follow-up studies, largely due to their design,
is loss to follow-up, which can lead to attrition bias if the subjects who
withdraw from the study are systematically different from those who
complete it. Reasons for attrition include migration from the study area,
death, subject fatigue and treatment side effects. In other cases subjects
miss scheduled observation times but attend subsequent ones, again
resulting in missing data.
There are a number of ways in which data analysts deal with missing data
in longitudinal studies, and indeed in other study designs:
Complete case analysis: In this option, subjects/cases without complete
information are dropped from the analysis sample. This approach results in
a loss of information, because the partly complete information of some
subjects is discarded, and it may introduce bias into the estimated model
coefficients if the data are not missing completely at random.
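As a minimal sketch of complete case analysis in pandas (the dataset and column names are hypothetical), dropping all rows with any missing value is a single call:

```python
import numpy as np
import pandas as pd

# Hypothetical longitudinal data: two visits per subject, some outcomes missing.
df = pd.DataFrame({
    "subject": [1, 1, 2, 2, 3, 3],
    "visit":   [1, 2, 1, 2, 1, 2],
    "outcome": [5.0, 6.0, 4.0, np.nan, np.nan, 7.0],
})

# Complete case analysis: keep only rows with no missing values.
complete_cases = df.dropna()
print(len(df), len(complete_cases))  # 6 rows before, 4 after
```

Note how the partly observed subjects 2 and 3 each lose a row, which is exactly the information loss described above.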
Last Observation Carried Forward (LOCF): This method only applies to
longitudinal studies. For each individual/case, missing values are replaced
by that individual's last observed value of the variable. This way of dealing
with missing values has been discouraged in the recent literature: the
means and precision measures such as the variance can be biased, leading
to wrong inferences. We advise against using this approach.
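To make the mechanics concrete (with hypothetical data), LOCF amounts to a per-subject forward fill; in pandas this is a groupby followed by ffill:

```python
import numpy as np
import pandas as pd

# Hypothetical data: three visits per subject, later visits sometimes missed.
df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2],
    "visit":   [1, 2, 3, 1, 2, 3],
    "outcome": [5.0, np.nan, np.nan, 4.0, 4.5, np.nan],
})

# LOCF: within each subject (in visit order), carry the last
# observed value forward into later missing visits.
df = df.sort_values(["subject", "visit"])
df["outcome_locf"] = df.groupby("subject")["outcome"].ffill()
print(df["outcome_locf"].tolist())  # [5.0, 5.0, 5.0, 4.0, 4.5, 4.5]
```

Subject 1's value 5.0 is frozen across two later visits, which illustrates why LOCF understates within-subject change and can bias means and variances.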
Mean imputation: Under mean imputation, the missing values in a variable
are replaced by the mean of the non-missing observations of that variable.
It preserves the mean (the mean in the data won't be biased) but does not
preserve the relationships between variables; it can shrink or inflate the
correlation between the variables being studied. This approach also does
not account for the uncertainty in the imputed values by adding an
imputation variance component, and is therefore less preferred than
techniques such as multiple imputation that do account for that uncertainty.
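A small sketch (hypothetical values) shows both properties at once: the mean is preserved exactly, but the variance shrinks because the imputed values sit at the centre of the distribution:

```python
import numpy as np
import pandas as pd

s = pd.Series([5.0, np.nan, 4.0, 7.0, np.nan])

# Mean imputation: fill each missing value with the observed mean.
imputed = s.fillna(s.mean())

print(imputed.mean() == s.mean())   # mean preserved
print(imputed.var() < s.var())      # variance artificially reduced
```

The understated variance is the same mechanism that distorts correlations and standard errors downstream.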
Hot-deck imputation: In this method, each missing value is replaced with an
observed response from a similar unit in the same sample dataset. There
are several ways of implementing hot-deck imputation: for example,
randomly picking an observed response from the set of cases similar to the
case needing imputation, or taking the mean of the variable among the set
of similar cases. Detailed reviews of the various hot-deck imputation
techniques are available in the literature. How well this approach preserves
relationships between variables depends on the specific technique chosen.
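The random-donor variant can be sketched as follows (the grouping variable and data are hypothetical; "similar units" are taken to mean units in the same group):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: "group" defines which units count as similar.
df = pd.DataFrame({
    "group":   ["a", "a", "a", "b", "b", "b"],
    "outcome": [5.0, 6.0, np.nan, 2.0, np.nan, 3.0],
})

def hot_deck(col):
    # Donors are the observed responses within the same group.
    donors = col.dropna().to_numpy()
    out = col.copy()
    # Replace each missing value with a randomly drawn donor value.
    out[col.isna()] = rng.choice(donors, size=col.isna().sum())
    return out

df["outcome_hd"] = df.groupby("group")["outcome"].transform(hot_deck)
```

Because every imputed value is a real observed response from a similar case, hot-deck imputation never produces implausible values, unlike a fitted mean can.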
Expectation-maximisation (EM): This iterative procedure uses the other
variables to impute an expected value for each missing entry (expectation
step), then re-estimates the model parameters as those most likely given
the completed data (maximisation step), repeating until convergence. The
EM algorithm preserves the relationships with other variables, a feature
that is important in regression analysis. However, it understates standard
errors, and should only be used when the extent of missingness is small,
for instance when the proportion of missing values is no more than 5%.
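The alternation between the two steps can be sketched for a simple case, a regression of y on x with some y values missing (simulated data, and a deliberately stripped-down EM loop rather than a full implementation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate y = 2 + 1.5*x + noise, then delete ~5% of the y values.
n = 200
x = rng.normal(0.0, 1.0, n)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.5, n)
miss = rng.random(n) < 0.05
y_obs = np.where(miss, np.nan, y)

# Crude starting fill, then alternate E- and M-steps.
y_fill = np.where(miss, np.nanmean(y_obs), y_obs)
for _ in range(20):
    # M-step: re-estimate the regression on the completed data.
    b, a = np.polyfit(x, y_fill, 1)
    # E-step: replace missing y with their expected value given x.
    y_fill = np.where(miss, a + b * x, y_obs)
```

The recovered slope and intercept sit close to the true 1.5 and 2.0, but the filled-in values carry no residual noise, which is exactly why EM-style single imputation understates standard errors.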
Multiple imputation: This approach has three stages. First, multiple copies
of the dataset are generated, with the missing values replaced by imputed
values sampled from their predictive distribution given the observed data.
Next, standard statistical methods are used to fit the model of interest to
each imputed dataset. Lastly, the parameter estimates from the imputed
datasets are pooled to provide a single estimate for each parameter of
interest. The standard errors of these pooled estimates are calculated
using rules that take account of the variability between the imputed
datasets. Valid inferences are obtained because results are averaged over
the distribution of the missing data given the observed data. Nonetheless,
there are pitfalls in multiple imputation that analysts should be aware of
when they contemplate using this approach.
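The three stages can be sketched end to end for a simple regression with missing y values (simulated data; the imputation model, Rubin-style pooling formula, and all variable names are illustrative, not a production implementation):

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulate y = 1 + 2*x + noise, then delete ~30% of the y values.
n, m = 300, 10
x = rng.normal(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
miss = rng.random(n) < 0.3
y_obs = np.where(miss, np.nan, y)

# Stage 1: build m completed datasets, drawing missing y from an
# approximate predictive distribution (regression fit on observed
# cases plus residual noise, so each copy differs).
ok = ~miss
b, a = np.polyfit(x[ok], y_obs[ok], 1)
resid_sd = np.std(y_obs[ok] - (a + b * x[ok]), ddof=2)

slopes, variances = [], []
for _ in range(m):
    y_imp = np.where(miss, a + b * x + rng.normal(0.0, resid_sd, n), y_obs)
    # Stage 2: fit the analysis model to each completed dataset.
    coef, cov = np.polyfit(x, y_imp, 1, cov=True)
    slopes.append(coef[0])
    variances.append(cov[0, 0])

# Stage 3: pool across imputations -- the pooled variance combines
# within-imputation and between-imputation components.
q_bar = np.mean(slopes)                 # pooled slope estimate
w = np.mean(variances)                  # within-imputation variance
b_var = np.var(slopes, ddof=1)          # between-imputation variance
total_var = w + (1.0 + 1.0 / m) * b_var
```

The between-imputation term makes `total_var` strictly larger than the average single-dataset variance `w`; that inflation is precisely how multiple imputation propagates the uncertainty in the imputed values into the final standard errors.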