The data fusion of EU-SILC and HBS at ISTAT
G. Donatiello, M. D’Orazio, D. Frattarola, M. Scanu, M. Spaziani
Workshop Comitato Consultivo per le Metodologie Statistiche - Roma, 19 November 2018
Overview
• Statement of the data fusion problem and non-identifiability of the model for the data
• How to deal with non-identifiability: matching error and uncertainty
• Conditional independence assumption
• Results from real data
• Does data fusion have a future?
• Lessons learnt from the committee's advice
Mauro Scanu – workshop CCMS, 19 November 2018
Data fusion
• Two independent samples drawn from the same population
• The only common information is in the X variables (glue, according to Reiter)
• Example: (Y1) expenditure variables, (Y2) income variables (e.g. EU Statistics on Income and Living Conditions (EU-SILC) and Household Budget Survey (HBS))
[Figure: sample A contains (Y1, X) for units 1, …, nA; sample B contains (X, Y2) for units 1, …, nB. Y2 is not observed in A and Y1 is not observed in B (the red parts of the original figure).]
How can we estimate the joint (Y1, Y2) distribution?
Data fusion
Two main methodological problems in the statistical matching context:
• The model for (X, Y1, Y2) is not identifiable given the data sets A and B (unless specific models are imposed)
• The two samples could be drawn according to complex survey designs, and it is not obvious how to use survey weights in the statistical matching context
Let’s focus just on the first issue
Data fusion: identifiability
Even if, instead of the samples A and B, we had complete knowledge of the distributions of (Y1,X) and (Y2,X), the joint (Y1,Y2,X) distribution would still not be determined: the dependence between Y1 and Y2 given X remains unknown. Generally speaking, all we can say is that
max{0, F(y1|x) + F(y2|x) − 1} ≤ F(y1,y2|x) ≤ min{F(y1|x), F(y2|x)}
These are the traditional Fréchet bounds for cumulative distribution functions. The set of distributions compatible with them is called the uncertainty set.
The bounds can be complemented with additional information, so that this set of distributions becomes narrower.
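As an illustration of how wide the uncertainty set can be, the sketch below computes the Fréchet bounds on the joint cumulative distribution function of two ordered categorical variables within a single X cell. The function name and the marginal probabilities are hypothetical and chosen only for the example; they are not taken from EU-SILC or HBS.

```python
import numpy as np

def frechet_cdf_bounds(p_y1, p_y2):
    """Return (lower, upper) Fréchet bounds on the joint CDF F(y1, y2 | x).

    p_y1, p_y2: marginal probabilities of the ordered classes of Y1 and Y2
    within one X cell (each sums to 1).
    """
    F1 = np.cumsum(p_y1)  # marginal CDF of Y1 given x
    F2 = np.cumsum(p_y2)  # marginal CDF of Y2 given x
    lower = np.maximum(0.0, F1[:, None] + F2[None, :] - 1.0)
    upper = np.minimum(F1[:, None], F2[None, :])
    return lower, upper

# Made-up marginals, for illustration only.
p_y1 = np.array([0.2, 0.5, 0.3])
p_y2 = np.array([0.4, 0.6])
lower, upper = frechet_cdf_bounds(p_y1, p_y2)
print(upper - lower)  # width of the uncertainty interval for each (y1, y2)
```

Any joint distribution with these two marginals has a CDF lying between `lower` and `upper`; extra information can only shrink the gap.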
Data fusion on real data
Let’s consider HBS and EU-SILC, again.
[Figure: SILC contains (X, Y2); HBS contains (X, Y1).]
Y1 = expenditures
Y2 = income
Without any other information, the joint distribution of Y1 and Y2 can be reconstructed only up to the Fréchet bounds.
Trick: use the «income variable» observed in HBS
Assumption: the income value is unreliable, but the ordering of households from the lowest to the highest income is reliable
Otherwise, let’s specify a model that is estimable given the data at hand
F(y1,y2|x)=F(y1|x)F(y2|x)
Data fusion on real data
[Figure: SILC contains (X, Y2) and HBS contains (X, Y1); in both surveys X is made up of Y*2 and the other common variables.]
Y1 = classes of expenditures
Y2 = classes of income
Y*2 = classes of ordered income
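A minimal sketch of how classes of ordered income could be built, assuming only the rank of the reported income is trusted and not its value. The function name, the number of classes and the income figures are hypothetical, and survey weights are ignored here for simplicity; the slides do not describe the actual ISTAT procedure.

```python
import numpy as np

def ordered_income_classes(reported_income, n_classes):
    """Assign each household to a class using only the rank of its income.

    Classes are numbered 1..n_classes and hold (nearly) equal numbers of
    households. Survey weights are ignored (an assumption of this sketch).
    """
    income = np.asarray(reported_income, dtype=float)
    order = income.argsort(kind="stable")
    ranks = order.argsort(kind="stable")  # 0 = lowest reported income
    return ranks * n_classes // len(income) + 1

# Made-up reported incomes, for illustration only.
reported = [900, 2500, 1200, 4000, 1800, 3100]
classes = ordered_income_classes(reported, n_classes=3)
print(classes)  # → [1 2 1 3 2 3]
```

Because only ranks are used, any increasing transformation of the reported values (for example systematic under-reporting by a constant factor) leaves the classes unchanged.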
Y*2 is highly associated with Y2.
According to Zhang (2015), Y2 and Y*2 are proxies: they have the same support and similar definitions. Hence, conditional independence can be assumed:
F(y1,y2|x)=F(y1|x)F(y2|x)
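Under conditional independence, the joint distribution of two categorical variables follows from the two conditional distributions by summing P(x) P(y1|x) P(y2|x) over x. The sketch below shows that computation for integer-coded classes. All names and the toy data are hypothetical; it assumes unweighted samples and that every X class is observed in both files, which a real application with complex survey designs would have to handle.

```python
import numpy as np

def conditional_table(y, x, n_y, n_x):
    """Estimate P(y | x) as an (n_x, n_y) array of row-normalised counts."""
    counts = np.zeros((n_x, n_y))
    np.add.at(counts, (np.asarray(x), np.asarray(y)), 1.0)
    return counts / counts.sum(axis=1, keepdims=True)

def cia_joint(y1_a, x_a, y2_b, x_b, n_y1, n_y2, n_x):
    """Estimate P(y1, y2) under the conditional independence assumption."""
    p_y1_x = conditional_table(y1_a, x_a, n_y1, n_x)  # from sample A
    p_y2_x = conditional_table(y2_b, x_b, n_y2, n_x)  # from sample B
    x_all = np.concatenate([x_a, x_b])                # X is observed in both
    p_x = np.bincount(x_all, minlength=n_x) / len(x_all)
    # Sum over x of P(x) * P(y1 | x) * P(y2 | x)
    return np.einsum("x,xi,xj->ij", p_x, p_y1_x, p_y2_x)

# Toy data, for illustration only: A holds (Y1, X), B holds (X, Y2).
x_a, y1_a = np.array([0, 0, 1, 1]), np.array([0, 1, 1, 1])
x_b, y2_b = np.array([0, 1, 1, 0]), np.array([0, 1, 1, 0])
joint = cia_joint(y1_a, x_a, y2_b, x_b, n_y1=2, n_y2=2, n_x=2)
print(joint)
```

The result is one point of the uncertainty set: it reproduces the observed (Y1, X) and (Y2, X) information, but so do many other joint tables.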
Data fusion: uncertainty and CIA estimate
• Uncertainty: we did not impose any constraint on (Y1, Y2) or (Y1, Y2|X)
• Model: conditional independence of Y1 and Y2 given X is just one of the possible distributions, given knowledge of (Y1|X) and (Y2|X)
The estimate under the Conditional Independence Assumption (CIA) is just one of the possible and equally plausible estimates we can get from the two sample surveys
Is this assumption correct?
An update: what we’ll have in the near future
As part of the revision of EU-SILC within the new Framework Regulation on Social Statistics (IESS), Italy implemented the ESS Agreement by testing the rolling module on Consumption & Wealth (C&W) in EU-SILC 2017
The module collected five consumption target variables:
• Food at home
• Food outside home
• Public Transport
• Private Transport
• Regular Savings
Italy decided to keep collecting the most relevant variables of the C&W module in 2018 and 2019 as well, to have consolidated and useful proxy variables
An update: The CIA is a good model!
The variables of the C&W module, plus the housing costs available annually, should represent a significant part of total consumption, making it possible to estimate a total consumption variable in SILC as well
These are some of the partial correlations between income and module consumption (really observed joint data!) given some common variables, including Y*2 as observed in EU-SILC
Geo. area   N. goods   Ordinal inc. class Y*2   Partial correlation   Number of obs.
1           6          1                          0.01                193
1           6          2                         -0.01                244
1           6          3                          0.06                287
1           6          4                         -0.06                206
1           6          5                          0.09                173
1           6          6                          0.07                142
1           6          7                         -0.07                 51
1           4          7                          1.00                  2
1           5          1                         -0.18                186
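A partial correlation close to zero is what conditional independence would predict. One standard way to compute it is to regress both variables on the conditioning variables and correlate the residuals, as sketched below. The slides do not say which procedure was used for the table, so this is only an assumption; the function name and the simulated data are hypothetical and are not EU-SILC values.

```python
import numpy as np

def partial_correlation(a, b, controls):
    """Correlation between a and b after removing the linear effect of controls."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Design matrix: intercept plus the conditioning variables.
    Z = np.column_stack([np.ones(len(a)), controls])
    res_a = a - Z @ np.linalg.lstsq(Z, a, rcond=None)[0]
    res_b = b - Z @ np.linalg.lstsq(Z, b, rcond=None)[0]
    return float(np.corrcoef(res_a, res_b)[0, 1])

# Simulated data, for illustration only: both variables depend on z and are
# independent of each other once z is given.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
income = z + rng.normal(size=n)
consumption = z + rng.normal(size=n)

raw = float(np.corrcoef(income, consumption)[0, 1])
partial = partial_correlation(income, consumption, z)
print(round(raw, 2), round(partial, 2))
```

In this simulation the raw correlation is sizeable while the partial correlation is near zero. Estimates from very small cells deserve caution: with only two observations a sample correlation is always ±1 whenever it is defined, so the 1.00 based on 2 observations in the table carries no information.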
What can we learn
• Duplicate this exercise? Why not! E.g. whenever microdata on the joint (income, Z) are not observed and cannot be recreated by record linkage (e.g. Z from multipurpose, labour force, time use surveys, …), but…
• The important thing is: plan in advance for the presence of the correct glue (in our case Y*2) in the data sets to be fused. For instance: a question on income whose answers will never be analysed in their own right, but used as glue to attach income as measured in EU-SILC. For social surveys, the glue can even be in the population census.
• A word of caution: given income from EU-SILC and its proxy/glue Y*2, data fusion is able to recreate information on (income, Z) whatever Z is in the other survey, even multivariate. It does not work the other way round!
Lessons learnt from the committee
1. Connect data fusion with ecological inference
2. Pay attention to estimates based on calibration estimators, and to the dependence relationship between the variables
3. We are extremely lucky if we are in a position to fuse data that cannot be linked: don’t be too “multivariate”
4. Disseminating data fusion results is possible with aggregate data. Microdata are less appropriate to disseminate, because users will treat them as “real data” without any caution
5. Are uncertainty sets always intervals?
Session III - Census and Registers - M. Scanu, G. Donatiello, D. Frattarola, M. Spaziani, Data fusion of EU-SILC and HBS at Istat (updates 2018)

More Related Content

Similar to Session III - Census and Registers - M. Scanu, G.Donariello, D. Frattarola, M. Spaziani, Data fusion of EU-SILC and HBS at Istat | (updates 2018)

Session 4 a chen et al discussion
Session 4 a chen et al   discussionSession 4 a chen et al   discussion
Session 4 a chen et al discussion
IARIW 2014
 
02_european report_Anne Van Lancker_EN_vf
02_european report_Anne Van Lancker_EN_vf02_european report_Anne Van Lancker_EN_vf
02_european report_Anne Van Lancker_EN_vfAnne Van Lancker
 
HLEG thematic workshop on measuring economic, social and environmental resili...
HLEG thematic workshop on measuring economic, social and environmental resili...HLEG thematic workshop on measuring economic, social and environmental resili...
HLEG thematic workshop on measuring economic, social and environmental resili...
StatsCommunications
 
2014.05.20_OECD-ECLAC-PSE Forum_altenburg
2014.05.20_OECD-ECLAC-PSE Forum_altenburg2014.05.20_OECD-ECLAC-PSE Forum_altenburg
2014.05.20_OECD-ECLAC-PSE Forum_altenburg
OECD_Inclusivegrowth
 
Employer Employee linked data in Italy availability and usage by institusions
Employer Employee linked data in Italy availability and usage by institusionsEmployer Employee linked data in Italy availability and usage by institusions
Employer Employee linked data in Italy availability and usage by institusions
Structuralpolicyanalysis
 
PUBLIC CAPITAL. Measurement Issues
PUBLIC CAPITAL. Measurement IssuesPUBLIC CAPITAL. Measurement Issues
PUBLIC CAPITAL. Measurement Issues
SPINTAN
 
Talk_boE_end_proyect2013
Talk_boE_end_proyect2013Talk_boE_end_proyect2013
Talk_boE_end_proyect2013
arsanmar
 
Advanced Econometrics L1-2.pptx
Advanced Econometrics L1-2.pptxAdvanced Econometrics L1-2.pptx
Advanced Econometrics L1-2.pptx
akashayosha
 
Rebalancing the €A: Insights from #BdFeco research, Marc-O. Strauss-Kahn
Rebalancing the €A: Insights from #BdFeco research, Marc-O. Strauss-KahnRebalancing the €A: Insights from #BdFeco research, Marc-O. Strauss-Kahn
Rebalancing the €A: Insights from #BdFeco research, Marc-O. Strauss-Kahn
Soledad Zignago
 
Discussion paper: The welfare and distributional effects of fiscal volatility...
Discussion paper: The welfare and distributional effects of fiscal volatility...Discussion paper: The welfare and distributional effects of fiscal volatility...
Discussion paper: The welfare and distributional effects of fiscal volatility...
ADEMU_Project
 
Michele Postigliola, Dinamiche della politica fiscale e del debito pubblico i...
Michele Postigliola, Dinamiche della politica fiscale e del debito pubblico i...Michele Postigliola, Dinamiche della politica fiscale e del debito pubblico i...
Michele Postigliola, Dinamiche della politica fiscale e del debito pubblico i...
Istituto nazionale di statistica
 
Predicting the economic public opinions in Europe
Predicting the economic public opinions in EuropePredicting the economic public opinions in Europe
Predicting the economic public opinions in Europe
SYRTO Project
 
Javier Ordóñez. Real unit labour costs in Eurozone countries: Drivers and clu...
Javier Ordóñez. Real unit labour costs in Eurozone countries: Drivers and clu...Javier Ordóñez. Real unit labour costs in Eurozone countries: Drivers and clu...
Javier Ordóñez. Real unit labour costs in Eurozone countries: Drivers and clu...
Eesti Pank
 
Constructing A Knowledge Economy Composite Indicator With Imprecise Data
Constructing A Knowledge Economy Composite Indicator With Imprecise DataConstructing A Knowledge Economy Composite Indicator With Imprecise Data
Constructing A Knowledge Economy Composite Indicator With Imprecise Dataronicky
 
The EU Productivity Gap - Open Session
The EU Productivity Gap - Open SessionThe EU Productivity Gap - Open Session
The EU Productivity Gap - Open Session
SPINTAN
 
WPIA Meeting - OECD. Paris Oct2016
WPIA Meeting - OECD. Paris Oct2016WPIA Meeting - OECD. Paris Oct2016
WPIA Meeting - OECD. Paris Oct2016
SPINTAN
 
Working Party on Industry Analysis (WPIA)
Working Party on Industry Analysis (WPIA)Working Party on Industry Analysis (WPIA)
Working Party on Industry Analysis (WPIA)
SPINTAN
 
Subsidies Reforms and Social Justice
Subsidies Reforms and Social JusticeSubsidies Reforms and Social Justice
Subsidies Reforms and Social Justice
Economic Research Forum
 
Stephen Aldridge -Public sector efficiency in the UK
Stephen Aldridge -Public sector efficiency in the UKStephen Aldridge -Public sector efficiency in the UK
Stephen Aldridge -Public sector efficiency in the UK
OECD CFE
 
Plenary_Talk_1_Meyer
Plenary_Talk_1_MeyerPlenary_Talk_1_Meyer
Plenary_Talk_1_Meyer
CSS-Institute
 

Similar to Session III - Census and Registers - M. Scanu, G.Donariello, D. Frattarola, M. Spaziani, Data fusion of EU-SILC and HBS at Istat | (updates 2018) (20)

Session 4 a chen et al discussion
Session 4 a chen et al   discussionSession 4 a chen et al   discussion
Session 4 a chen et al discussion
 
02_european report_Anne Van Lancker_EN_vf
02_european report_Anne Van Lancker_EN_vf02_european report_Anne Van Lancker_EN_vf
02_european report_Anne Van Lancker_EN_vf
 
HLEG thematic workshop on measuring economic, social and environmental resili...
HLEG thematic workshop on measuring economic, social and environmental resili...HLEG thematic workshop on measuring economic, social and environmental resili...
HLEG thematic workshop on measuring economic, social and environmental resili...
 
2014.05.20_OECD-ECLAC-PSE Forum_altenburg
2014.05.20_OECD-ECLAC-PSE Forum_altenburg2014.05.20_OECD-ECLAC-PSE Forum_altenburg
2014.05.20_OECD-ECLAC-PSE Forum_altenburg
 
Employer Employee linked data in Italy availability and usage by institusions
Employer Employee linked data in Italy availability and usage by institusionsEmployer Employee linked data in Italy availability and usage by institusions
Employer Employee linked data in Italy availability and usage by institusions
 
PUBLIC CAPITAL. Measurement Issues
PUBLIC CAPITAL. Measurement IssuesPUBLIC CAPITAL. Measurement Issues
PUBLIC CAPITAL. Measurement Issues
 
Talk_boE_end_proyect2013
Talk_boE_end_proyect2013Talk_boE_end_proyect2013
Talk_boE_end_proyect2013
 
Advanced Econometrics L1-2.pptx
Advanced Econometrics L1-2.pptxAdvanced Econometrics L1-2.pptx
Advanced Econometrics L1-2.pptx
 
Rebalancing the €A: Insights from #BdFeco research, Marc-O. Strauss-Kahn
Rebalancing the €A: Insights from #BdFeco research, Marc-O. Strauss-KahnRebalancing the €A: Insights from #BdFeco research, Marc-O. Strauss-Kahn
Rebalancing the €A: Insights from #BdFeco research, Marc-O. Strauss-Kahn
 
Discussion paper: The welfare and distributional effects of fiscal volatility...
Discussion paper: The welfare and distributional effects of fiscal volatility...Discussion paper: The welfare and distributional effects of fiscal volatility...
Discussion paper: The welfare and distributional effects of fiscal volatility...
 
Michele Postigliola, Dinamiche della politica fiscale e del debito pubblico i...
Michele Postigliola, Dinamiche della politica fiscale e del debito pubblico i...Michele Postigliola, Dinamiche della politica fiscale e del debito pubblico i...
Michele Postigliola, Dinamiche della politica fiscale e del debito pubblico i...
 
Predicting the economic public opinions in Europe
Predicting the economic public opinions in EuropePredicting the economic public opinions in Europe
Predicting the economic public opinions in Europe
 
Javier Ordóñez. Real unit labour costs in Eurozone countries: Drivers and clu...
Javier Ordóñez. Real unit labour costs in Eurozone countries: Drivers and clu...Javier Ordóñez. Real unit labour costs in Eurozone countries: Drivers and clu...
Javier Ordóñez. Real unit labour costs in Eurozone countries: Drivers and clu...
 
Constructing A Knowledge Economy Composite Indicator With Imprecise Data
Constructing A Knowledge Economy Composite Indicator With Imprecise DataConstructing A Knowledge Economy Composite Indicator With Imprecise Data
Constructing A Knowledge Economy Composite Indicator With Imprecise Data
 
The EU Productivity Gap - Open Session
The EU Productivity Gap - Open SessionThe EU Productivity Gap - Open Session
The EU Productivity Gap - Open Session
 
WPIA Meeting - OECD. Paris Oct2016
WPIA Meeting - OECD. Paris Oct2016WPIA Meeting - OECD. Paris Oct2016
WPIA Meeting - OECD. Paris Oct2016
 
Working Party on Industry Analysis (WPIA)
Working Party on Industry Analysis (WPIA)Working Party on Industry Analysis (WPIA)
Working Party on Industry Analysis (WPIA)
 
Subsidies Reforms and Social Justice
Subsidies Reforms and Social JusticeSubsidies Reforms and Social Justice
Subsidies Reforms and Social Justice
 
Stephen Aldridge -Public sector efficiency in the UK
Stephen Aldridge -Public sector efficiency in the UKStephen Aldridge -Public sector efficiency in the UK
Stephen Aldridge -Public sector efficiency in the UK
 
Plenary_Talk_1_Meyer
Plenary_Talk_1_MeyerPlenary_Talk_1_Meyer
Plenary_Talk_1_Meyer
 

More from Istituto nazionale di statistica

Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
Istituto nazionale di statistica
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
Istituto nazionale di statistica
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
Istituto nazionale di statistica
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
Istituto nazionale di statistica
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
Istituto nazionale di statistica
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
Istituto nazionale di statistica
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
Istituto nazionale di statistica
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
Istituto nazionale di statistica
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
Istituto nazionale di statistica
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statisticacnstatistica14
14a Conferenza Nazionale di Statisticacnstatistica1414a Conferenza Nazionale di Statisticacnstatistica14
14a Conferenza Nazionale di Statisticacnstatistica14
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
Istituto nazionale di statistica
 

More from Istituto nazionale di statistica (20)

Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profitCensimenti Permanenti Istituzioni non profit
Censimenti Permanenti Istituzioni non profit
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
 
Censimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni PubblicheCensimento Permanente Istituzioni Pubbliche
Censimento Permanente Istituzioni Pubbliche
 
14a Conferenza Nazionale di Statisticacnstatistica14
14a Conferenza Nazionale di Statisticacnstatistica1414a Conferenza Nazionale di Statisticacnstatistica14
14a Conferenza Nazionale di Statisticacnstatistica14
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 
14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica14a Conferenza Nazionale di Statistica
14a Conferenza Nazionale di Statistica
 

Recently uploaded

A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
thanhdowork
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
chanes7
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
tarandeep35
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
Krisztián Száraz
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Dr. Vinod Kumar Kanvaria
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
Jisc
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
Wasim Ak
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
JosvitaDsouza2
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
David Douglas School District
 
special B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdfspecial B.ed 2nd year old paper_20240531.pdf
special B.ed 2nd year old paper_20240531.pdf
Special education needs
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
Thesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.pptThesis Statement for students diagnonsed withADHD.ppt
Thesis Statement for students diagnonsed withADHD.ppt
EverAndrsGuerraGuerr
 
Acetabularia Information For Class 9 .docx
Acetabularia Information For Class 9  .docxAcetabularia Information For Class 9  .docx
Acetabularia Information For Class 9 .docx
vaibhavrinwa19
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
BÀI TẬP BỔ TRỢ TIẾNG ANH GLOBAL SUCCESS LỚP 3 - CẢ NĂM (CÓ FILE NGHE VÀ ĐÁP Á...
Nguyen Thanh Tu Collection
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 

Recently uploaded (20)

A Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptxA Survey of Techniques for Maximizing LLM Performance.pptx
A Survey of Techniques for Maximizing LLM Performance.pptx
 
Digital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion DesignsDigital Artifact 2 - Investigating Pavilion Designs
Digital Artifact 2 - Investigating Pavilion Designs
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
Advantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO PerspectiveAdvantages and Disadvantages of CMS from an SEO Perspective
Advantages and Disadvantages of CMS from an SEO Perspective
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 
The approach at University of Liverpool.pptx
The approach at University of Liverpool.pptxThe approach at University of Liverpool.pptx
The approach at University of Liverpool.pptx
 
Normal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of LabourNormal Labour/ Stages of Labour/ Mechanism of Labour
Normal Labour/ Stages of Labour/ Mechanism of Labour
 
1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx1.4 modern child centered education - mahatma gandhi-2.pptx
1.4 modern child centered education - mahatma gandhi-2.pptx
 
Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.Biological Screening of Herbal Drugs in detailed.
Biological Screening of Herbal Drugs in detailed.
 

Session III - Census and Registers - M. Scanu, G. Donatiello, D. Frattarola, M. Spaziani, Data fusion of EU-SILC and HBS at Istat | (updates 2018)

  • 1. The data fusion of EU-SILC and HBS at ISTAT G. Donatiello, M. D’Orazio, D. Frattarola, M. Scanu, M. Spaziani Workshop Comitato Consultivo per le Metodologie Statistiche - Roma, 19 November 2018
  • 2. Overview
      - Statement of the data fusion problem and non-identifiability of the model for the data
      - How to deal with non-identifiability: matching error and uncertainty
      - Conditional independence assumption
      - Results from real data
      - Does data fusion have a future?
      - Lessons learnt from the committee's advice
      Mauro Scanu – workshop CCMS, 19 November 2018
  • 3. Data fusion
      - Two independent samples drawn from the same population
      - The only common information is in the X variables (the "glue", according to Reiter)
      - Example: (Y1) expenditure variables, (Y2) income variables (e.g. EU Statistics on Income and Living Conditions (EU-SILC) and Household Budget Survey (HBS))
      Sample A (size nA) observes (Y1, X): rows (y1,1, x1), (y1,2, x2), …, (y1,nA, xnA). Y2 is not observed in A.
      Sample B (size nB) observes (X, Y2): rows (x1, y2,1), (x2, y2,2), …, (xnB, y2,nB). Y1 is not observed in B.
      How can we estimate the joint (Y1, Y2) distribution?
  • 4. Data fusion
      Two main methodological problems arise in the statistical matching context:
      - The model for (X, Y1, Y2) is not identifiable given the data sets A and B (unless specific models are imposed)
      - The two samples could be drawn according to complex survey designs, and it is not straightforward how to use survey weights in the statistical matching context
      Let's focus just on the first issue.
  • 5. Data fusion: identifiability
      Even if, instead of the samples A and B, we had complete knowledge of the distributions of (Y1, X) and (Y2, X), the joint (Y1, Y2, X) distribution would still be undetermined as far as the relationship between Y1 and Y2 given X is concerned. Generally speaking, it is possible to say that
          max{0, F(y1|x) + F(y2|x) - 1} ≤ F(y1,y2|x) ≤ min{F(y1|x), F(y2|x)}
      These are the traditional Fréchet bounds for cumulative distribution functions. This set of distributions is called the uncertainty set.
      The bounds can be complemented with additional information, so that this space of distributions becomes narrower.
  • 6. Data fusion on real data
      Let's consider HBS and EU-SILC again. SILC observes (X, Y2) and HBS observes (X, Y1), where Y1 = expenditures and Y2 = income.
      Without any other information, the joint distribution of Y1 and Y2 can be reconstructed only up to the Fréchet bounds.
      Otherwise, let's specify a model that is estimable given the data at hand:
          F(y1,y2|x) = F(y1|x) F(y2|x)
      Trick: use the «income variable» observed in HBS.
      Assumption: the HBS income variable is unreliable in its values, but reliable in the ordering of households from the lowest to the highest income.
  • 7. Data fusion on real data
      The common variables X now consist of the other common variables plus Y*2: SILC observes (X, Y2) and HBS observes (X, Y1), where
      - Y1 = classes of expenditures
      - Y2 = classes of income
      - Y*2 = classes of ordered income
      Y*2 is highly associated with Y2. According to Zhang (2015), Y2 and Y*2 are proxy variables: same support and similar definition. Hence, conditional independence can be assumed:
          F(y1,y2|x) = F(y1|x) F(y2|x)
  • 8. Data fusion: uncertainty and CIA estimate
      - Uncertainty: we did not impose any constraint on (Y1, Y2) or (Y1, Y2|X)
      - Model: conditional independence of Y1 and Y2 given X is just one of the possible distributions, given knowledge of (Y1|X) and (Y2|X)
      The estimate under the Conditional Independence Assumption (CIA) is just one of the possible and equally plausible estimates we can get from the two sample surveys.
      Is this assumption correct?
  • 9. An update: what we'll have in the near future
      As part of the revision of EU-SILC within the new Framework Regulation on Social Statistics (IESS), Italy implemented the ESS Agreement by testing the rolling module on Consumption & Wealth (C&W) in EU-SILC 2017.
      The module collected five consumption target variables:
      - Food at home
      - Food outside home
      - Public Transport
      - Private Transport
      - Regular Savings
      Italy decided to continue collecting the most relevant variables of the C&W module in 2018 and 2019 as well, in order to have consolidated and useful proxy variables.
  • 10. An update: the CIA is a good model!
      The variables of the C&W module, plus the housing costs that are available annually, should represent a significant part of total consumption, which makes it possible to estimate a total consumption variable in SILC as well.
      These are some of the partial correlations between income and module consumption (really observed joint data!) given some common variables, including Y*2, as observed in EU-SILC:
      Geo. area  N. goods  Ordinal income class Y*2  Partial correlation  Number of obs.
      1          6         1                         0.01                 193
      1          6         2                         -0.01                244
      1          6         3                         0.06                 287
      1          6         4                         -0.06                206
      1          6         5                         0.09                 173
      1          6         6                         0.07                 142
      1          6         7                         -0.07                51
      1          4         7                         1.00                 2
      1          5         1                         -0.18                186
  • 11. What can we learn
      - Duplicate this exercise? Why not! E.g. any time microdata sets on the joint (income, Z) are not observed and cannot be recreated by record linkage (e.g. Z from multipurpose, labour force, time use surveys, …), but…
      - The important thing is: plan in advance the presence of the correct glue (in our case Y*2) in the data sets to fuse. For instance: a question on income whose answers will never be analyzed as such, but that serves as glue for attaching income as detected in EU-SILC. For social surveys, the glue can even be in the population census.
      - A word of caution: given income from EU-SILC and its proxy/glue Y*2, data fusion is able to recreate information on (income, Z) whatever Z is in the other survey, even if multivariate. It does not work the other way round!
  • 12. Lessons learnt from the committee
      1. Connect data fusion with ecological inference
      2. Pay attention to estimates based on calibration estimators, and to the dependence relationship between the variables
      3. We are extremely lucky if we are in a position to fuse data that cannot be linked: don't be too "multivariate"
      4. Dissemination of data fusion results is possible with aggregate data. Microdata are less appropriate to disseminate, because users will use them as "real data" without any caution
      5. Are uncertainty sets always intervals?
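
The Fréchet bounds and the CIA estimate discussed in slides 5 to 8 can be illustrated with a minimal sketch for categorical variables. It assumes that P(x), P(y1|x) and P(y2|x) have already been estimated (in the slides' setting, P(y1|x) would come from HBS and P(y2|x) from EU-SILC). For cell probabilities the bounds read max{0, P(y1|x) + P(y2|x) - 1} ≤ P(y1,y2|x) ≤ min{P(y1|x), P(y2|x)}, and they are averaged over x with weights P(x). The function names `cia_joint` and `frechet_bounds`, the array names and all numbers below are hypothetical toy choices, not ISTAT code or results.

```python
import numpy as np


def cia_joint(p_x, p_y1_x, p_y2_x):
    """Joint P(y1, y2) under conditional independence of Y1 and Y2 given X.

    p_x:    shape (K,),   P(X = k)
    p_y1_x: shape (K, I), P(Y1 = i | X = k)
    p_y2_x: shape (K, J), P(Y2 = j | X = k)
    """
    # sum over k of P(x_k) * P(y1_i | x_k) * P(y2_j | x_k)
    return np.einsum("k,ki,kj->ij", p_x, p_y1_x, p_y2_x)


def frechet_bounds(p_x, p_y1_x, p_y2_x):
    """Lower and upper Frechet bounds for each cell P(y1, y2)."""
    a = p_y1_x[:, :, None]  # shape (K, I, 1)
    b = p_y2_x[:, None, :]  # shape (K, 1, J)
    low_given_x = np.maximum(0.0, a + b - 1.0)
    up_given_x = np.minimum(a, b)
    w = p_x[:, None, None]  # weights P(x_k)
    return (w * low_given_x).sum(axis=0), (w * up_given_x).sum(axis=0)


# Hypothetical toy inputs: two X classes, two expenditure classes (Y1),
# two income classes (Y2). These are made-up numbers for illustration.
p_x = np.array([0.6, 0.4])
p_y1_x = np.array([[0.7, 0.3],
                   [0.2, 0.8]])
p_y2_x = np.array([[0.8, 0.2],
                   [0.3, 0.7]])

cia = cia_joint(p_x, p_y1_x, p_y2_x)
lower, upper = frechet_bounds(p_x, p_y1_x, p_y2_x)

print("CIA estimate:\n", cia)
print("Lower bounds:\n", lower)
print("Upper bounds:\n", upper)
```

In this sketch the CIA estimate always falls inside the cell-wise bounds, which mirrors the point of slide 8: it is one admissible point of the uncertainty set, not the only one. The width `upper - lower` gives a rough idea of how much the data leave undetermined, and a more informative X (such as the glue Y*2) would be expected to narrow it.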