Session II Estimation Methods and Accuracy - D. Filipponi, Integration of administrative sources and survey data through Hidden Markov Models for the production of labour statistics
Integration of administrative sources and survey data through HMM for the production of labour statistics
Danila Filipponi, Ugo Guarnera, Roberta Varriale
ISTAT
Istat - 19 November 2018
outline
1 the problem
2 non supervised approach
3 Application to employment statistics
4 results and conclusions
the problem
motivating example
Build a micro-data file to support the estimates on
employment (including the Permanent Census)
Available information:
weekly administrative data on employment status and other
individual characteristics
quarterly labour force survey (LFS) data
yearly Master Sample (MS) data (from 2018)
the problem
motivating example
Table 1: Cross-classification of the employment status measured by LFS data and AD. Year 2015, Italy (percentages)

LFS \ AD         Out      In    Total
Not Employed    59.9     2.9     62.8
Employed         3.2    34.0     37.2
Total           63.1    36.9    100.0
The two measures disagree on about 6% of the surveyed units (2.9% + 3.2% = 6.1%).
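The disagreement rate can be checked directly from the off-diagonal cells of Table 1. The snippet below is a minimal sketch; it assumes that "In" means the unit appears as employed in the administrative data and "Out" that it does not, and the dictionary keys are illustrative names.

```python
# Percentages from Table 1 (first key: LFS status, second key: AD status).
table = {
    ("not_employed", "out"): 59.9,
    ("not_employed", "in"): 2.9,
    ("employed", "out"): 3.2,
    ("employed", "in"): 34.0,
}

# Agreement: LFS "not employed" with AD "out", LFS "employed" with AD "in".
agree = table[("not_employed", "out")] + table[("employed", "in")]
# Disagreement: the two off-diagonal cells.
disagree = table[("not_employed", "in")] + table[("employed", "out")]

print(f"agreement: {agree:.1f}%")
print(f"disagreement: {disagree:.1f}%")
```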
the problem
Overcoverage of AD according to LFS data, by type of source. Year 2015, Italy [figure not reproduced]
non supervised approach
the approach
Given the informative context and the expected output, a suitable
approach should:
model data at unit level
be unsupervised, since none of the sources can reasonably be
considered error free (all the sources are treated as
imperfect measures of the true phenomenon)
take into account the longitudinal structure of the data
non supervised approach
Non supervised approach: the variables
L variables, representing the “true” (latent) target phenomenon
L are the variables that we would observe if the data were error
free
L are considered latent variables because they are not directly
observed
Yg variables (g = 1, ..., G), representing imperfect (observed)
measures of the target phenomenon
Q1, Q2 variables, covariates associated with the latent variables L
and with the measures Yg, respectively
non supervised approach
Non supervised approach: the model
The statistical model is composed of two components specified via
the conditional probability distributions:
P(L | Q1) : latent model
P(Y1, . . . , YG | L, Q2) : measurement model
non supervised approach
Non supervised approach: estimates
Estimates of the model parameters can be obtained via maximum
likelihood methods based on the observed-data likelihood
Using the parameter estimates, one can obtain via Bayes'
formula the posterior distribution of the target variable
conditional on all the available information:
P(L | Y1, ..., YG, Q1, Q2)
Predictions of the latent variable for each unit can be obtained
by taking expectations with respect to this posterior distribution
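The Bayes step can be illustrated for a single time point with two sources. This is only a sketch under stated assumptions: the covariates Q1 and Q2 are omitted, the measures are taken as conditionally independent given L, all parameter values are made up, and the names `prior`, `psi` and `posterior` are illustrative. States are 0-indexed here (0 = employed, 1 = not employed).

```python
import numpy as np

# Illustrative (made-up) parameters, not estimates from the presentation.
# Latent classes: index 0 = employed, index 1 = not employed.
prior = np.array([0.40, 0.60])               # P(L)
# psi[g][i, j] = P(Y_g = j | L = i) for source g.
psi = [
    np.array([[0.95, 0.05], [0.03, 0.97]]),  # source 1 (e.g. survey)
    np.array([[0.90, 0.10], [0.02, 0.98]]),  # source 2 (e.g. admin data)
]

def posterior(y):
    """P(L | Y_1 = y[0], ..., Y_G = y[G-1]), assuming the Y_g are
    conditionally independent given L."""
    joint = prior.copy()
    for g, y_g in enumerate(y):
        joint *= psi[g][:, y_g]
    return joint / joint.sum()

# The sources disagree: source 1 says employed (0), source 2 says not employed (1).
post = posterior([0, 1])
print(post)
```

The expectation of the employed indicator under this posterior is simply `post[0]`, which is the unit-level prediction referred to above.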
Application to employment statistics
modelling employment data
Goal: estimation of the monthly employment status
two categories: 1 = employed, 2 = not employed
Lt: true employment status (latent)
Lt ∈ {1, 2}, t ∈ {1, ..., 12}
Y1: employment status according to the LFS
Y1t ∈ {1, 2}
Y2: employment status according to the AD
Y2t ∈ {1, 2}
covariates
Q: retirement status, student, income, age, sex
SOURCE: type of administrative source
Application to employment statistics
Modeling employment data
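Following the comments and suggestions from the Advisory Board, we
define the following model:
[Path diagram, not reproduced. Nodes: latent states L1, L2, L3, L4; LFS indicators Y11, Y14; AD indicators Y21, Y22, Y23, Y24; covariates SOURCE and Q; mixture variable X]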
Application to employment statistics
Modeling employment data
Latent model
initial probability: $p_{l_1} \doteq P(L_1 = j)$
transition matrix: $p^{(t)}_{i|j} \doteq P(L_t = i \mid L_{t-1} = j)$
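As a small illustration of the latent model, the marginal distribution of L_t can be propagated month by month from the initial probabilities and the transition matrix. This is a sketch with made-up numbers; it assumes a transition matrix that does not change with t (the notation above allows it to vary), ignores covariates, and uses 0-based state indices (0 = employed, 1 = not employed). The function name `marginals` is illustrative.

```python
import numpy as np

# Illustrative values only. Index 0 = employed, 1 = not employed.
init = np.array([0.40, 0.60])        # P(L_1 = j)
# trans[i, j] = P(L_t = i | L_{t-1} = j); each column sums to 1.
# Assumed constant over t here, although the model allows it to vary with t.
trans = np.array([[0.97, 0.02],
                  [0.03, 0.98]])

def marginals(init, trans, n_months=12):
    """Return an (n_months, 2) array whose row t-1 is P(L_t)."""
    out = [init]
    for _ in range(n_months - 1):
        out.append(trans @ out[-1])
    return np.vstack(out)

m = marginals(init, trans)
print(m[:3])
```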
Application to employment statistics
Modeling employment data
Measurement model
Y1: employment status according to the LFS (indicators Y11 and Y14 in the diagram)
Application to employment statistics
Modeling employment data
Y2: employment status according to AD (indicators Y21, Y22, Y23, Y24 in the diagram)
local independence
serial independence
measurement errors: $\psi^{g}_{j|i} \doteq P(Y_{gt} = y_{gj} \mid L_t = i)$
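Under local and serial independence, the posterior of each L_t given all observations can be computed with the standard forward-backward recursions for hidden Markov models. The sketch below is not the authors' implementation: it ignores the covariates and the mixture variable, uses made-up parameters, and treats months without an LFS interview as missing (here the LFS is observed only at t = 1 and t = 4, as in the diagram). The function name `smooth` and all variable names are illustrative; states and categories are 0-indexed.

```python
import numpy as np

def smooth(init, trans, emis, obs):
    """Posterior P(L_t = i | all observations) for one unit.

    init[i]        = P(L_1 = i)
    trans[i, j]    = P(L_t = i | L_{t-1} = j)
    emis[g][i, y]  = P(Y_gt = y | L_t = i)
    obs[g][t]      = observed category of source g at time t, or None if missing
    """
    T, S = len(obs[0]), len(init)
    # Likelihood of the observations at each t given L_t (local independence:
    # sources multiply; a missing value contributes a factor of 1).
    lik = np.ones((T, S))
    for g, series in enumerate(obs):
        for t, y in enumerate(series):
            if y is not None:
                lik[t] *= emis[g][:, y]
    # Forward pass (normalised at each step for numerical stability).
    alpha = np.zeros((T, S))
    alpha[0] = init * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = lik[t] * (trans @ alpha[t - 1])
        alpha[t] /= alpha[t].sum()
    # Backward pass.
    beta = np.ones((T, S))
    for t in range(T - 2, -1, -1):
        beta[t] = trans.T @ (lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)

# Made-up parameters; index 0 = employed, 1 = not employed.
init = np.array([0.4, 0.6])
trans = np.array([[0.97, 0.02], [0.03, 0.98]])
emis = [np.array([[0.95, 0.05], [0.03, 0.97]]),   # LFS
        np.array([[0.90, 0.10], [0.02, 0.98]])]   # AD
# LFS observed at t = 1 and t = 4 only; AD observed at every t.
obs = [[0, None, None, 0], [0, 0, 1, 0]]
post = smooth(init, trans, emis, obs)
print(post)
```

Each row of `post` is the posterior distribution of the latent status in one month, so `post[:, 0]` gives the predicted probability of being employed at each t.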
Application to employment statistics
Modeling employment data
X: mixture, 3 classes
Captures heterogeneity in the population:
1 = Never employed
2 = Stayers
3 = Movers
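The slides do not spell out how X enters the model. One common reading, assumed here purely for illustration, is a mover-stayer formulation in which each class of X has its own initial distribution and transition matrix: "never employed" and "stayers" keep their initial state, while "movers" can change state. All numbers and names below are made up; states are 0-indexed (0 = employed, 1 = not employed).

```python
import numpy as np

# State index 0 = employed, 1 = not employed. All numbers are illustrative.
# trans[i, j] = P(L_t = i | L_{t-1} = j, X = class).
classes = {
    "never_employed": {"init": np.array([0.0, 1.0]), "trans": np.eye(2)},
    "stayers":        {"init": np.array([0.9, 0.1]), "trans": np.eye(2)},
    "movers":         {"init": np.array([0.5, 0.5]),
                       "trans": np.array([[0.8, 0.3], [0.2, 0.7]])},
}
weights = {"never_employed": 0.3, "stayers": 0.5, "movers": 0.2}  # P(X)

def employed_share(month):
    """Marginal P(L_month = employed), mixing over the classes of X."""
    total = 0.0
    for name, w in weights.items():
        p = classes[name]["init"]
        for _ in range(month - 1):
            p = classes[name]["trans"] @ p
        total += w * p[0]
    return total

print(employed_share(1), employed_share(12))
```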
Application to employment statistics
Modeling employment data
Covariates
Q: retirement status, student, income, age, sex
SOURCE: source type
1 = No source
2 = Employees
3 = Self-employed (time)
4 = Self-employed (no time)
Application to employment statistics
Modeling employment data
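Complete model
[Path diagram, not reproduced. Nodes: latent states L1, L2, L3, L4; LFS indicators Y11, Y14; AD indicators Y21, Y22, Y23, Y24; covariates SOURCE and Q; mixture variable X]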
results and conclusions
some results
We focus on the following outcomes from the model:
estimates of the measurement error parameters $\psi^{g}_{j|i}$
estimates of employment (by domain D), using the
expectation of the posterior distribution:
$\frac{1}{12} \sum_{t=1}^{12} \sum_{k \in D} \hat{l}_{k,t}$
where
$\hat{l}_{k,t} \doteq P(L_t = 1 \mid Y_1 = y_{1,k}, Y_2 = y_{2,k}, Q = q_k)$,
index k refers to the kth unit, t to the month, and D is some
domain of interest.
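The domain estimate is the average over the 12 months of the monthly sums of posterior employment probabilities for the units in D. A minimal sketch with made-up posterior probabilities follows; it is unweighted (any sampling or coverage weights are outside what the slide states), and the names `domain_estimate`, `post_emp` and `in_domain` are illustrative.

```python
import numpy as np

def domain_estimate(post_emp, in_domain):
    """Average over months of the sum over units in D of l_hat[k, t].

    post_emp[k, t] = posterior probability that unit k is employed in month t
    in_domain[k]   = True if unit k belongs to domain D
    """
    monthly_totals = post_emp[in_domain].sum(axis=0)   # one total per month
    return monthly_totals.mean()

# Made-up posterior probabilities for 3 units over 12 months.
post_emp = np.array([
    [0.9] * 12,
    [0.1] * 12,
    [0.5] * 6 + [1.0] * 6,
])
in_domain = np.array([True, False, True])
est = domain_estimate(post_emp, in_domain)
print(est)
```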
results and conclusions
Monthly and quarterly estimates of employment in Italy. Year 2015 [figure not reproduced]
results and conclusions
Employment by region. Year 2015 [figure not reproduced]
results and conclusions
Bootstrap 95% prediction interval for monthly employment in Umbria. Year 2015 [figure not reproduced]
results and conclusions
conclusions and future work
Non-supervised models seem to be a promising approach in a
multi-source context.
Combining administrative and survey data, we can obtain a
prediction of the employment status that takes into account
misclassification errors from both sources.
Evaluate how to use an additional measure of employment
(Continuous Census)
Evaluate the model in small area prediction
results and conclusions
Thanks