Can big data help in the production of reliable
local area statistics?
Partha Lahiri
Joint Program in Survey Methodology
University of Maryland, College Park, USA
SDAL, Virginia Tech.
January 28, 2015
SDAL January 28, 2015 1 / 27
Ref: http://farmdocdaily.illinois.edu/2013/07/concentration-corn-soybean-production.html
Based on average yearly production of corn during 2010-12 using NASS/USDA data
Remote Sensing for Crop Acreage
The NASS-USDA has been publishing county estimates of crop
acreage, crop production, crop yield and livestock inventories since
1917.
Uses: local agricultural decision making, payments to farmers if crop
yields are below certain levels.
Can earth-resources satellite data provide a useful ancillary data source
for county estimates of crop acreage?
Satellite information is recorded for pixels (short for picture
elements). A pixel is about 0.45 hectares.
Based on satellite readings in early fall, it is possible to classify the
crop cover of all pixels. This generates big data.
Ref: http://www.nass.usda.gov/Statistics-by-State/Iowa/Publications/Cropland-Data-Layer/2011/index.asp
2011 Hardin County, Iowa
[Map: Cropland Data Layer land cover; scale bar 0 to 4.24 miles]
Land cover categories, by decreasing acreage*
Agriculture: Corn, Soybeans, Grassland/Herbaceous, Alfalfa, Other Hay/Non-Alfalfa, Oats, Winter Wheat, Rye, Fallow/Idle Cropland, Sod/Grass Seed
Non-agriculture: Developed/Open Space, Deciduous Forest, Developed/Low Intensity, Woody Wetlands, Open Water, Developed/Medium Intensity
Produced by CropScape - http://nassgeodata.gmu.edu/CropScape
* Only the top 6 non-agriculture categories are listed.
Remote Sensing for Crop Acreage
Bellow et al.
NASS has been a user of remote sensing products since the
1950’s when it began using midaltitude aerial photography to
construct area sampling frames (ASF’s) for the 48 states of the
continental United States. A new era in remote sensing began in
1972 with the launch of the Landsat I earth-resource monitoring
satellite. Four additional Landsats have been launched since
1972, with Landsat IV and V still in operation in 1993. The
polar-orbiting Landsat satellites contain a multi-spectral scanner
(MSS) that measures reflected energy in four bands of the
electromagnetic spectrum for an area of just under one acre. The
spectral bands were selected to be responsive to vegetation
characteristics.
Remote Sensing for Crop Acreage
In addition to the MSS sensor, Landsats IV and V have a
Thematic Mapper (TM) sensor which measures seven energy
bands and has increased spatial resolution. The large area (185
by 170 km) and repeat (16 day per satellite) coverage of these
satellites opened new areas of remote sensing research: large area
crop inventories, crop yields, land cover mapping, area frame
stratification, and small area crop cover estimation.
Ref: Battese, Harter and Fuller (1988 JASA)
Unit Level Model
$y_{ij}$: value of the study variable for the $j$th unit of the $i$th small area
population ($i = 1, \dots, m$; $j = 1, \dots, N_i$)
We are interested in estimating the finite population means
$\bar{Y}_i = N_i^{-1} \sum_{j=1}^{N_i} y_{ij}$.
Nested Error Regression Model
$y_{ij} = x_{ij}^\top \beta + v_i + e_{ij}$,
where $x_{ij}$ is a $p \times 1$ column vector of known auxiliary variables; $\{v_i\}$ and
$\{e_{ij}\}$ are all independent with $v_i \overset{iid}{\sim} N(0, \sigma_v^2)$ and $e_{ij} \overset{iid}{\sim} N(0, \sigma_e^2)$.
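A minimal Python sketch of the nested error regression model, assuming the auxiliary vector of every population unit is available (as it is with satellite pixel counts). The function names and the toy two-area population are hypothetical; the code simulates $y_{ij}$ and then computes the finite population means $\bar{Y}_i$ that the model is used to estimate.

```python
import random


def simulate_nested_error(x, beta, sigma_v, sigma_e, seed=0):
    """Draw y_ij = x_ij' beta + v_i + e_ij for every unit j of every area i.

    x[i][j] is the p-vector of auxiliary variables for unit j of area i.
    Returns a list (one entry per area) of lists of simulated y values.
    """
    rng = random.Random(seed)
    y = []
    for area in x:
        v_i = rng.gauss(0.0, sigma_v)  # area effect, shared by all units in area i
        y.append([
            sum(b * xk for b, xk in zip(beta, x_ij)) + v_i + rng.gauss(0.0, sigma_e)
            for x_ij in area
        ])
    return y


def finite_population_means(y):
    """Ybar_i = N_i^{-1} * sum_j y_ij for each area i."""
    return [sum(area) / len(area) for area in y]


# Hypothetical population: N_1 = 3 and N_2 = 2 units, x_ij = (1, covariate)
x = [[(1, 2.0), (1, 3.0), (1, 4.0)], [(1, 1.0), (1, 5.0)]]
y = simulate_nested_error(x, beta=(1.0, 0.5), sigma_v=0.3, sigma_e=0.2)
print(finite_population_means(y))
```

In practice only a sample of the $y_{ij}$ is observed, which is why $\bar{Y}_i$ has to be predicted rather than computed directly.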
An Example
Estimation of the number of hectares of corn for 12 Iowa counties
based on the 1978 June Enumerative Survey and satellite data.
$y_{ij}$: the number of hectares of corn in the $j$th segment of the $i$th
county as reported in the June Enumerative Survey.
$x_{ij} = (1, x_{1ij}, x_{2ij})^\top$, where $x_{1ij}$ ($x_{2ij}$) is the number of pixels classified as
corn (soybeans) in the $j$th segment of the $i$th county.
$\bar{X}_i = (1, \bar{X}_{1i}, \bar{X}_{2i})^\top$, where $\bar{X}_{1i}$ ($\bar{X}_{2i}$) is the mean number of pixels per
segment classified as corn (soybeans) for county $i$.
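The covariates in this example are simple summaries of pixel counts. The sketch below, with hypothetical segment counts and function names, builds $x_{ij}$ for each segment and averages the vectors: applied to the sampled segments it gives the sample mean $\bar{x}_i$, and applied to all segments in the county it gives $\bar{X}_i$.

```python
def covariate_vector(corn_pixels, soy_pixels):
    """x_ij = (1, x_1ij, x_2ij): intercept, corn-pixel count, soybean-pixel count."""
    return (1.0, float(corn_pixels), float(soy_pixels))


def mean_vector(vectors):
    """Component-wise mean of a list of equal-length covariate vectors."""
    n = len(vectors)
    return tuple(sum(v[k] for v in vectors) / n for k in range(len(vectors[0])))


# Hypothetical (corn, soybean) pixel counts for every segment of one county
counts = [(300, 150), (250, 200), (310, 120), (280, 180)]
population = [covariate_vector(c, s) for c, s in counts]
X_bar = mean_vector(population)      # county mean over all segments
x_bar = mean_vector(population[:2])  # mean over the (hypothetically) sampled segments
print(X_bar, x_bar)
```

Because the satellite classifies every pixel, $\bar{X}_i$ is known exactly even though $y_{ij}$ is observed only for the sampled segments.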
EBLUP
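EBLUP (EB) estimator of $\bar{Y}_i$:
$\bar{y}_i^{EB} = f_i \hat{\bar{Y}}_i^{Reg} + (1 - f_i)\{(1 - \hat{B}_i)\hat{\bar{Y}}_i^{Reg} + \hat{B}_i \hat{\bar{Y}}_i^{Syn}\}$,
where $f_i = n_i/N_i$ is the sampling fraction ($n_i$ sampled units out of $N_i$ in area $i$),
$\hat{B}_i = \dfrac{\hat\sigma_e^2/n_i}{\hat\sigma_v^2 + \hat\sigma_e^2/n_i}$,
$\hat{\bar{Y}}_i^{Reg} = \bar{y}_i + (\bar{X}_i - \bar{x}_i)^\top \hat\beta$,
$\hat{\bar{Y}}_i^{Syn} = \bar{X}_i^\top \hat\beta$,
and $\bar{y}_i$, $\bar{x}_i$ are the sample means for area $i$.
Any standard variance component estimation method (e.g., REML)
can be used to obtain $\hat\sigma_v^2$ and $\hat\sigma_e^2$.
$\hat\beta$: the weighted least squares estimator with estimated variance
components.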
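A sketch of the EBLUP formula above for a single area, assuming $\hat\beta$, $\hat\sigma_v^2$ and $\hat\sigma_e^2$ have already been estimated (for example by REML) and are passed in as plain numbers. The function names and the numeric inputs are hypothetical, chosen only for illustration.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def eblup_mean(ybar, xbar, Xbar, beta, sigma2_v, sigma2_e, n, N):
    """EBLUP of the finite population mean Ybar_i in one small area.

    ybar, xbar : sample means of y and x in area i (n sampled units)
    Xbar       : population mean of x in area i (N units)
    beta       : estimated regression coefficients (beta-hat)
    sigma2_v, sigma2_e : estimated variance components
    """
    f = n / N                                       # sampling fraction f_i
    B = (sigma2_e / n) / (sigma2_v + sigma2_e / n)  # shrinkage factor B_i
    y_reg = ybar + dot([X - x for X, x in zip(Xbar, xbar)], beta)  # regression estimator
    y_syn = dot(Xbar, beta)                         # synthetic estimator
    return f * y_reg + (1 - f) * ((1 - B) * y_reg + B * y_syn)


print(eblup_mean(ybar=10.0, xbar=[1.0, 2.0], Xbar=[1.0, 3.0], beta=[1.0, 2.0],
                 sigma2_v=1.0, sigma2_e=4.0, n=4, N=100))
```

As $\hat\sigma_v^2$ grows relative to $\hat\sigma_e^2/n_i$, $\hat{B}_i$ moves toward 0 and the estimator leans on the survey-based regression estimator; when area effects are small, it leans on the synthetic estimator.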
Plots of Survey-Weighted Poverty Rates and SAE for a Small County
(drawn by Sam Hawala)
Plots of Estimated SEs of Survey-Weighted Poverty Rates and SAE for a
Small County (drawn by Sam Hawala)
A Cross-Sectional Model
Ref: Fay and Herriot (JASA 1979)
For $i = 1, \dots, m$,
Level 1 (Sampling Distribution): $y_i = \theta_i + e_i$;
Level 2 (Linking Distribution): $\theta_i = x_i^\top \beta + v_i$,
where
$y_i$: direct survey estimate of the true small area mean $\theta_i$ for area $i$;
$x_i$: $p \times 1$ vector of known auxiliary variables coming from big data;
$\{e_i\}$ and $\{v_i\}$ are independent with $e_i \sim N(0, \psi_i)$ and $v_i \sim N(0, \sigma_v^2)$; the $\psi_i$'s
are assumed to be known.
The $p \times 1$ vector of regression coefficients $\beta$ and the model variance $\sigma_v^2$
are unknown.
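The slide does not spell out the predictor, but the standard empirical Bayes predictor under this model is $\hat\theta_i = \hat\gamma_i y_i + (1 - \hat\gamma_i) x_i^\top \hat\beta$, with $\hat\gamma_i = \hat\sigma_v^2/(\hat\sigma_v^2 + \psi_i)$ and $\hat\beta$ the weighted least squares estimator. The sketch below treats $\sigma_v^2$ as given to keep it short; in practice it would be estimated (e.g., by REML or a method of moments). The function names are hypothetical.

```python
def solve(A, b):
    """Solve the small linear system A z = b by Gaussian elimination with partial pivoting."""
    p = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * p
    for r in reversed(range(p)):
        z[r] = (M[r][p] - sum(M[r][c] * z[c] for c in range(r + 1, p))) / M[r][r]
    return z


def fay_herriot_eb(y, X, psi, sigma2_v):
    """Return (beta_hat, theta_hat) with theta_i = gamma_i*y_i + (1 - gamma_i)*x_i' beta_hat.

    y        : direct survey estimates y_i
    X        : auxiliary vectors x_i (each of length p)
    psi      : known sampling variances psi_i
    sigma2_v : model variance (taken as given here; normally estimated)
    """
    p = len(X[0])
    w = [1.0 / (sigma2_v + s) for s in psi]  # precision weights for the WLS fit
    XtWX = [[sum(wi * xi[a] * xi[c] for wi, xi in zip(w, X)) for c in range(p)]
            for a in range(p)]
    XtWy = [sum(wi * xi[a] * yi for wi, xi, yi in zip(w, X, y)) for a in range(p)]
    beta = solve(XtWX, XtWy)
    theta = []
    for yi, xi, s in zip(y, X, psi):
        gamma = sigma2_v / (sigma2_v + s)    # weight on the direct estimate
        theta.append(gamma * yi + (1 - gamma) * sum(bk * xk for bk, xk in zip(beta, xi)))
    return beta, theta
```

Areas with a large sampling variance $\psi_i$ get a small $\hat\gamma_i$, so their estimates borrow more strength from the regression on big-data covariates.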
Auxiliary Variables from big data
The proportion of child exemptions reported by families in poverty on
their tax returns.
The proportion of people under 65 who did not file income tax
returns.
The proportion of people receiving food stamps.
A Time Series Cross-Sectional Model
Ref: Datta, Lahiri, Maiti and Lu (1999); Datta, Lahiri and Maiti (2002)
For $i = 1, \dots, m$; $t = 1, \dots, T$,
Level 1: $y_{it} = \theta_{it} + e_{it}$;
Level 2: $\theta_{it} = x_{it}^\top \beta + v_i + u_{it}$;
Level 3: $u_{it} = u_{i,t-1} + \epsilon_{it}$,
where
$y_{it}$: direct survey estimate of the median income of four-person families for
state $i$, year $t$
$e_{it}$: sampling error
$x_{it}$: auxiliary variables coming from big data (previous census and
administrative records)
$v_i$: state-specific random effects
$u_{it}$: state- and year-specific random effects
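To make the three levels concrete, here is a hedged simulation sketch for a single state. The normal distributions for $v_i$, $\epsilon_{it}$ and $e_{it}$, the starting value $u_{i0} = 0$, and all names are assumptions of this illustration, not details taken from the cited papers.

```python
import random


def simulate_state_series(x, beta, sigma_v, sigma_eps, sampling_sd, seed=0):
    """Simulate (theta_it, y_it), t = 1..T, for one state i.

    x           : list of auxiliary vectors x_it, one per year
    sampling_sd : list of sampling standard deviations of e_it, one per year
    The random walk starts at u_i0 = 0 (an assumption of this sketch).
    """
    rng = random.Random(seed)
    v_i = rng.gauss(0.0, sigma_v)       # state-specific effect, constant over time
    u = 0.0
    theta, y = [], []
    for x_t, sd_t in zip(x, sampling_sd):
        u = u + rng.gauss(0.0, sigma_eps)                        # Level 3: random walk
        th = sum(b * xk for b, xk in zip(beta, x_t)) + v_i + u   # Level 2: true value
        theta.append(th)
        y.append(th + rng.gauss(0.0, sd_t))                      # Level 1: add sampling error
    return theta, y


theta, y = simulate_state_series(x=[(1, 1.0), (1, 2.0), (1, 3.0)], beta=(10.0, 1.0),
                                 sigma_v=0.5, sigma_eps=0.2, sampling_sd=[1.0, 1.0, 1.0])
print(theta, y)
```

The random walk in Level 3 lets each state's deviation from the regression drift over time, which is how the model borrows strength across years as well as across states.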
Estimates of Coefficients of Variation of CPS Direct Estimates of
Median Income of 4-Person Families in the US States: Year 1989
[Map: U.S. state-level CV of CPS direct estimates; legend 2.5 to 12.5]
Estimates of Coefficients of Variation of EB Estimates of
Median Income of 4-Person Families in the US States: Year 1989
[Map: U.S. state-level CV of EB estimates; legend 2.5 to 12.5]
A Plot of Absolute Residuals from a Simple Linear Regression
Dependent variable: 1989 median income estimates from the 1990 Census
Independent variable: CPS or EB estimates for 1989
[Plot: absolute residual (0 to 10000) versus state (0 to 50), CPS and EB series]
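The comparison on this slide can be outlined with ordinary least squares: regress the census figures on the CPS (or EB) estimates and inspect the absolute residuals state by state. The sketch below is a minimal version with a hypothetical function name; smaller residuals would indicate closer agreement with the census.

```python
def abs_residuals(x, y):
    """Fit y = a + b*x by ordinary least squares and return |y_i - (a + b*x_i)|."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return [abs(yi - (a + b * xi)) for xi, yi in zip(x, y)]


# Usage (hypothetical lists, one value per state):
# res_cps = abs_residuals(cps_1989, census_1989)
# res_eb = abs_residuals(eb_1989, census_1989)
```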
Poverty mapping: the Chilean Case
High poverty rates can work in a Chilean municipality's favor when it comes
to securing funds from the Chilean central government.
Consider the following situation. For a given small municipality, the
poverty rate for the current year turns out to be high under the standard
design-based method.
How do we convince the mayor of that municipality to adopt a
statistically efficient SAE method that yields a lower poverty rate?
Plots of Survey-Weighted Poverty Rates and SAE for
Selected Comunas (drawn by Carolina Casas-Cordero)
[Plot: poverty rate (0 to 0.4) by year, 2000 to 2012, direct vs. SAE estimates; panels: Concón, Hualpén, Lolol, Santiago]
Source: Casen Survey 2000 to 2011
Estimates of poverty rates for comunas, Chile
Initial set of auxiliary variables
Auxiliary variable; institution responsible for data collection; frequency of publication of the data
#1. Subsidio Familiar (family subsidy); Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social; monthly and yearly
#2. Subsidio al Pago del Consumo de Agua Potable y Servicio de Alcantarillado de Aguas Servidas (subsidy for drinking water and sewerage payments); Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social; monthly and yearly
#3. Bono Chile Solidario (Chile Solidario bonus); Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social; monthly and yearly
#4. Subsidio de Discapacidad Mental (mental disability subsidy); Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social; monthly and yearly
#5. Pensión Básica Solidaria, vejez e invalidez (basic solidarity pension, old age and disability); Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social; December
#6. Aporte Previsional Solidario, vejez e invalidez (solidarity pension supplement, old age and disability); Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social; December
#7. Bonificación al Ingreso Ético Familiar (Ethical Family Income bonus); Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social; monthly and yearly
#8. Beca de Apoyo a la Retención Escolar, BARE (school retention support grant); Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social; monthly and yearly
#9. Afiliados Sistema de Capitalización Individual (members of the individual-account pension system); Superintendencia de Pensiones; monthly and yearly
#10. Matrícula (school enrollment); Ministerio de Educación; yearly
#11. Rendimiento (academic performance); Ministerio de Educación; yearly
#12. SIMCE (national standardized test results); Ministerio de Educación; yearly or every two years
#13. Titulados Educación Superior (higher education graduates); Ministerio de Educación; yearly
#14. Índice de Vulnerabilidad del Establecimiento, IVE-SINAE (school vulnerability index); Junta Nacional de Auxilio Escolar y Becas (Junaeb); yearly
#15. Situación Nutricional estudiantes básica y media (nutritional status of primary and secondary school students); Junta Nacional de Auxilio Escolar y Becas (Junaeb); yearly
#16. Población beneficiaria Fonasa (Fonasa beneficiary population); Ministerio de Salud; yearly
#17. Atenciones sector privado (private-sector health care services); Ministerio de Salud; yearly
#18. Razón de analfabetos respecto a la población de 10 y más años en la comuna (ratio of illiterate persons to the population aged 10 and over in the comuna); CENSO, INE; every 10 years
#19. Porcentaje de Población Rural (percentage of rural population); CENSO, INE; every 10 years
#20. Porcentaje de Asistencia Escolar Comunal (school attendance rate in the comuna); SINIM; monthly
#21. Tamaño promedio del hogar (average household size); CENSO, INE; every 10 years
#22. Tasa de pobreza histórica (historical poverty rate); CASEN; every 2 or 3 years
#23. Contribuciones de Vivienda (residential property taxes); SII (http://www.sii.cl/avaluaciones/estadisticas/estadisticas_bbrr.htm#2); yearly
#24. Remuneraciones promedio de los trabajadores dependientes (average wage of dependent workers); yearly
Source: Ministerio de Desarrollo Social (2013a).
Regression Analysis
Independent variable: regression coefficient estimate (t-statistic), original comuna weights
Average wage of dependent workers (log): -0.09575646 (3.52**)
Average of the poverty rate from Casen 2000, 2003 and 2006 (arcsin): 0.49548266 (7.92**)
% of population in rural areas (arcsin): -0.13409847 (4.96**)
% of illiterate population (arcsin): 0.40349163 (2.57*)
% of population attending school (arcsin): -0.21883535 (2.23*)
Dummy for region 7 (=1): 0.03442978 (2.11*)
Dummy for region 8 (=1): 0.03882056 (2.67**)
Dummy for region 9 (=1): 0.105632 (6.04**)
Constant: 1.61477028 (4.24**)
Number of observations: 235
Adjusted R²: 0.67
Length of the direct and parametric bootstrap confidence intervals of the comuna-level
poverty rates for comunas sorted by the limited translation empirical Bayes estimates of
the poverty rate.
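One simple form of limited translation, in the spirit of Fay and Herriot (1979), keeps each EB estimate within a fixed number of standard errors of the corresponding direct estimate. The sketch assumes a bound of $c = 1$ standard error, $\sqrt{\psi_i}$; the exact rule behind the plotted estimates is not stated here, and the function name is hypothetical.

```python
import math


def limited_translation(eb, direct, psi, c=1.0):
    """Clamp each EB estimate to within c standard errors sqrt(psi_i) of its direct estimate."""
    out = []
    for t, y, s in zip(eb, direct, psi):
        half_width = c * math.sqrt(s)
        out.append(min(max(t, y - half_width), y + half_width))
    return out
```

The cap limits how far any single comuna's estimate can move away from its survey figure, which can make the small area estimates easier to defend to local officials.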
"...D.J. Finney once wrote about the statistician whose
client comes in and says, 'Here is my mountain of trash.
Find the gems that lie therein.' Finney's advice was to
not throw him out of the office but to attempt to find out
what he considers 'gems'. After all, if the trained
statistician does not help, he will find some one who
will...." David Salsburg, ASA Connect Discussion
First Latin American ISI Satellite Meeting on Small Area Estimation
August 3-5, 2015, Santiago, Chile
International Statistical Institute (ISI) Satellite Meeting
At Pontificia Universidad Católica de Chile
Invited Talks:
- Malay Ghosh: “Small Area Estimation with Health Applications”
- Wayne Fuller: “Bootstrap Methods for Small Area Predictions”
- Partha Lahiri: “Recent Advances in Poverty Mapping Methodology”
- Angela Luna, Nikos Tzavidis and LiChun Zhang: “From start to finish: Specify – Adapt – Evaluate (SAE)”
- Danny Pfeffermann and Richard Tiller: “Small Area Labor Force Statistics using Time Series Models”
- J.N.K. Rao: “Measuring Uncertainty of Small Area Estimators”
Special Topics, Contributed & Poster Sessions:
Submit abstracts by April 15, 2015 at sae2015@uc.cl
Abstracts accepted on a first-come, first-served basis.
Language of the conference: English
Website: http://www.encuestas.uc.cl/sae2015/
- Main Organizer: Centro de Encuestas y Estudios Longitudinales, Universidad Católica de Chile.
- Co-organizers: International Statistical Institute (ISI), International Association of Survey Statisticians
(IASS), Sociedad Chilena de Estadística (SOCHE), Instituto Nacional de Estadísticas (INE), Ministerio de
Desarrollo Social (MDS), and the Departamento de Estadística, Departamento de Salud Pública and Instituto de
Sociología of the Universidad Católica de Chile.
Purpose:
We hope that this meeting will serve as a bridge between mathematical
statisticians and practitioners working on small area estimation in
academia and in private and government agencies.
This meeting in Santiago will give researchers an opportunity to learn
about state-of-the-art small area estimation techniques from the
experts in the field.
Journal of the Royal Statistical Society (JRSS) Series A Special Issue on SAE!
THANK YOU!
SDAL January 28, 2015 27 / 27

More Related Content

Viewers also liked

Sdal molfino, emily mapping conflict onto insfrastructure
Sdal molfino, emily mapping conflict onto insfrastructureSdal molfino, emily mapping conflict onto insfrastructure
Sdal molfino, emily mapping conflict onto insfrastructure
kimlyman
 
Presentacion Final.pptx
Presentacion Final.pptxPresentacion Final.pptx
Presentacion Final.pptx
UNISON
 
Exploring percussive gesture on i pads with ensemble
Exploring percussive gesture on i pads with ensembleExploring percussive gesture on i pads with ensemble
Exploring percussive gesture on i pads with ensemble
又瑋 賴
 
Psychological disorder
Psychological disorder Psychological disorder
Psychological disorder
UNISON
 
Rainbowfish &skin button
Rainbowfish &skin buttonRainbowfish &skin button
Rainbowfish &skin button
又瑋 賴
 

Viewers also liked (14)

Sdal molfino, emily mapping conflict onto insfrastructure
Sdal molfino, emily mapping conflict onto insfrastructureSdal molfino, emily mapping conflict onto insfrastructure
Sdal molfino, emily mapping conflict onto insfrastructure
 
Graffiti fur
Graffiti furGraffiti fur
Graffiti fur
 
Tugas 4 persoalan khusus
Tugas 4 persoalan khususTugas 4 persoalan khusus
Tugas 4 persoalan khusus
 
Presentacion Final.pptx
Presentacion Final.pptxPresentacion Final.pptx
Presentacion Final.pptx
 
Kuliah sli 2015 rev 127
Kuliah sli 2015 rev 127Kuliah sli 2015 rev 127
Kuliah sli 2015 rev 127
 
Kitchen Cabinet Doors
Kitchen Cabinet DoorsKitchen Cabinet Doors
Kitchen Cabinet Doors
 
Exploring percussive gesture on i pads with ensemble
Exploring percussive gesture on i pads with ensembleExploring percussive gesture on i pads with ensemble
Exploring percussive gesture on i pads with ensemble
 
乘車安全
乘車安全乘車安全
乘車安全
 
Domain Specific Language for Specify Operations of a Central Counterparty(CCP)
Domain Specific Language for Specify Operations of a Central Counterparty(CCP)Domain Specific Language for Specify Operations of a Central Counterparty(CCP)
Domain Specific Language for Specify Operations of a Central Counterparty(CCP)
 
Psychological disorder
Psychological disorder Psychological disorder
Psychological disorder
 
Reclaim the fat
Reclaim the fatReclaim the fat
Reclaim the fat
 
Rainbowfish &skin button
Rainbowfish &skin buttonRainbowfish &skin button
Rainbowfish &skin button
 
Healey sdal social dynamics in living systems from microbe to metropolis
Healey sdal social dynamics in living systems from microbe to metropolis Healey sdal social dynamics in living systems from microbe to metropolis
Healey sdal social dynamics in living systems from microbe to metropolis
 
5 velmente råd til gründere
5 velmente råd til gründere5 velmente råd til gründere
5 velmente råd til gründere
 

Similar to Can big data help in the production of reliable local area statistics?

Cross secssa presentation_ecama
Cross secssa presentation_ecamaCross secssa presentation_ecama
Cross secssa presentation_ecama
IFPRIMaSSP
 
Community Health Assessment Delaware County 1998 .docx
Community Health Assessment Delaware County 1998  .docxCommunity Health Assessment Delaware County 1998  .docx
Community Health Assessment Delaware County 1998 .docx
monicafrancis71118
 
Berlin presentation
Berlin presentationBerlin presentation
Berlin presentation
markbrough
 
InstructionsAll the tasks calculations on this assignment must .docx
InstructionsAll the tasks calculations on this assignment must .docxInstructionsAll the tasks calculations on this assignment must .docx
InstructionsAll the tasks calculations on this assignment must .docx
dirkrplav
 
DashboardTemplateTargetWards
DashboardTemplateTargetWardsDashboardTemplateTargetWards
DashboardTemplateTargetWards
Max Akister
 

Similar to Can big data help in the production of reliable local area statistics? (20)

Safety nets, asset growth and poverty transitions: Any roles for safety nets ...
Safety nets, asset growth and poverty transitions: Any roles for safety nets ...Safety nets, asset growth and poverty transitions: Any roles for safety nets ...
Safety nets, asset growth and poverty transitions: Any roles for safety nets ...
 
A measurement error model approach to survey data integration: combining info...
A measurement error model approach to survey data integration: combining info...A measurement error model approach to survey data integration: combining info...
A measurement error model approach to survey data integration: combining info...
 
How much has wealth concentration grown in the United States? A re-examinatio...
How much has wealth concentration grown in the United States? A re-examinatio...How much has wealth concentration grown in the United States? A re-examinatio...
How much has wealth concentration grown in the United States? A re-examinatio...
 
Cross secssa presentation_ecama
Cross secssa presentation_ecamaCross secssa presentation_ecama
Cross secssa presentation_ecama
 
Taxes, transfers, inequality and the poor in the developing world
Taxes, transfers, inequality and the poor in the developing worldTaxes, transfers, inequality and the poor in the developing world
Taxes, transfers, inequality and the poor in the developing world
 
Povertymappingmalawitoddbenson 091211011812-phpapp01
Povertymappingmalawitoddbenson 091211011812-phpapp01Povertymappingmalawitoddbenson 091211011812-phpapp01
Povertymappingmalawitoddbenson 091211011812-phpapp01
 
Session 6 b garner ruser of ag growth iairw 2014
Session 6 b garner ruser of ag growth iairw 2014Session 6 b garner ruser of ag growth iairw 2014
Session 6 b garner ruser of ag growth iairw 2014
 
Community Health Assessment Delaware County 1998 .docx
Community Health Assessment Delaware County 1998  .docxCommunity Health Assessment Delaware County 1998  .docx
Community Health Assessment Delaware County 1998 .docx
 
Westfield, MA 01085 Real Estate Market Report March 2018
Westfield, MA 01085 Real Estate Market Report March 2018Westfield, MA 01085 Real Estate Market Report March 2018
Westfield, MA 01085 Real Estate Market Report March 2018
 
Short background data
Short background dataShort background data
Short background data
 
Productivity growth and fiscal adjustment
Productivity growth and fiscal adjustmentProductivity growth and fiscal adjustment
Productivity growth and fiscal adjustment
 
Making Use of Big Data October 2015
Making Use of Big Data October 2015Making Use of Big Data October 2015
Making Use of Big Data October 2015
 
OECD and Progress - Beyond GDP
OECD and Progress - Beyond GDPOECD and Progress - Beyond GDP
OECD and Progress - Beyond GDP
 
Berlin presentation
Berlin presentationBerlin presentation
Berlin presentation
 
InstructionsAll the tasks calculations on this assignment must .docx
InstructionsAll the tasks calculations on this assignment must .docxInstructionsAll the tasks calculations on this assignment must .docx
InstructionsAll the tasks calculations on this assignment must .docx
 
Westfield, MA Real Estate Market Report March 2021-Sept 2021
Westfield, MA Real Estate Market Report March 2021-Sept 2021Westfield, MA Real Estate Market Report March 2021-Sept 2021
Westfield, MA Real Estate Market Report March 2021-Sept 2021
 
Community-level data
Community-level dataCommunity-level data
Community-level data
 
Southwick, MA 01077 Real Estate Market Report by Lesley Lambert, Southwick RE...
Southwick, MA 01077 Real Estate Market Report by Lesley Lambert, Southwick RE...Southwick, MA 01077 Real Estate Market Report by Lesley Lambert, Southwick RE...
Southwick, MA 01077 Real Estate Market Report by Lesley Lambert, Southwick RE...
 
DashboardTemplateTargetWards
DashboardTemplateTargetWardsDashboardTemplateTargetWards
DashboardTemplateTargetWards
 
Bangladesh - Identification
Bangladesh - IdentificationBangladesh - Identification
Bangladesh - Identification
 

Recently uploaded

Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
ssuser79fe74
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
Lokesh Kothari
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
Sérgio Sacani
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 

Recently uploaded (20)

Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 

Can big data help in the production of reliable local area statistics?

  • 1. Can big data help in the production of reliable local area statistics? Partha Lahiri Joint Program in Survey Methodology University of Maryland, College Park, USA SDAL, Virginia Tech. January 28, 2015 SDAL January 28, 2015 1 / 27
  • 2. Ref: http://farmdocdaily.illinois.edu/2013/07/concentration-corn-soybean-production.html Based on average yearly production of corn during 2010-12 using NASS/USDA data SDAL January 28, 2015 2 / 27
  • 3. Remote Sensing for Crop Acreage The NASS-USDA has been publishing county estimates of crop acreage, crop production, crop yield and livestock inventories since 1917. Uses: local agricultural decision making, payments to farmers if crop yields are below certain levels. Can earth resources satellite data provide useful ancillary data source for county estimates of crop acreage? Satellite information is recorded for pixels (a term for picture elements). A pixel is about .45 hectares; Based on satellite readings in early Fall, it is possible to classify the crop cover all pixels. This generates big data. SDAL January 28, 2015 3 / 27
  • 4. Ref: http://www.nass.usda.gov/Statistics-by-State/Iowa/Publications/Cropland-Data-Layer/2011/index.asp 2011 Hardin County, Iowa 0 1.41 2.83 4.24 miles LandOCoverOCategories -byOdecreasingOacreage*O AGRICULTURE Corn Soybeans GrasslandOHerbaceous Alfalfa OtherOHay/NonOAlfalfa Oats WinterOWheat Rye Fallow/IdleOCropland Sod/GrassOSeed NON-AGRICULTURED Developed/OpenOSpace DeciduousOForest Developed/LowOIntensity WoodyOWetlands OpenOWater Developed/MediumOIntensity Produced by CropScape - http://nassgeodata.gmu.edu/CropScape * Only top 6 non-agriculturecategroies are listed. SDAL January 28, 2015 4 / 27
  • 5. Remote Sensing for Crop Acreage Bellow et al. NASS has been a user of remote sensing products since the 1950’s when it began using midaltitude aerial photography to construct area sampling frames (ASF’s) for the 48 states of the continental United States. A new era in remote sensing began in 1972 with the launch of the Landsat I earth-resource monitoring satellite. Four additional Landsats have been launched since 1972, with Landsat IV and V still in operation in 1993. The polar-orbiting Landsat satellites contain a multi-spectral scanner (MSS) that measures reflected energy in four bands of the electromagnetic spectrum for an area of just under one acre. The spectral bands were selected to be responsive to vegetation characteristics. SDAL January 28, 2015 5 / 27
  • 6. Remote Sensing for Crop Acreage In addition to the MSS sensor, Landsats IV and V have a Thematic Mapper (TM) sensor which measures seven energy bands and has increased spatial resolution. The large area (185 by 170 km) and repeat (16 day per satellite) coverage of these satellites opened new areas of remote sensing research: large area crop inventories, crop yields, land cover mapping, area frame stratification, and small area crop cover estimation. SDAL January 28, 2015 6 / 27
  • 7. SDAL January 28, 2015 7 / 27
  • 8. SDAL January 28, 2015 8 / 27
  • 9. Ref: Battese, Harter and Fuller (1988 JASA) SDAL January 28, 2015 9 / 27
  • 10. Unit Level Model yij : value of the study variable for the jth unit of the i small area population (i = 1, · · · , m; j = 1, · · · , Ni ) We are interested in estimating the finite population means: ¯Yi = N−1 i Ni j=1 yij . Nested Error Regression Model yij = xij β + vi + eij , where xij is a p × 1 column vector of known auxiliary variables; {vi } and {eij } are all independent with vi iid ∼ N(0, σ2 v ) and eij iid ∼ N(0, σ2 e ) SDAL January 28, 2015 10 / 27
  • 11. An Example Estimation of the number of hectares of corn for 12 Iowa counties based on the 1978 June Enumerative Survey and satellite data. yij : the number of hectares of corn in the jth segment of the ith county as reported in the June Enumerative Survey. xij = (1, x1ij , x2ij ), where x1ij (x2ij ) is the number of pixels classified as corn (soybean) in the jth segment of the ith county. ¯X = (1, ¯X1i , ¯X2i ), where ¯X1i ( ¯X2i ) is the mean number of pixels per segment classified as corn (soybean) for county i. SDAL January 28, 2015 11 / 27
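• 12. EBLUP: The EBLUP (EB) estimator of $\bar{Y}_i$ is $\bar{y}_i^{EB} = f_i \hat{\bar{Y}}_i^{Reg} + (1 - f_i)\{(1 - \hat{B}_i)\hat{\bar{Y}}_i^{Reg} + \hat{B}_i \hat{\bar{Y}}_i^{Syn}\}$, where $\hat{B}_i = \dfrac{\hat{\sigma}_e^2/n_i}{\hat{\sigma}_v^2 + \hat{\sigma}_e^2/n_i}$, $\hat{\bar{Y}}_i^{Reg} = \bar{y}_i + (\bar{X}_i - \bar{x}_i)'\hat{\beta}$ and $\hat{\bar{Y}}_i^{Syn} = \bar{X}_i'\hat{\beta}$; here $n_i$ is the sample size in area $i$, $f_i = n_i/N_i$ is the sampling fraction, and $\bar{y}_i$ and $\bar{x}_i$ are the sample means of $y_{ij}$ and $x_{ij}$. Any standard variance component estimation method (e.g., REML) can be used to obtain $\hat{\sigma}_v^2$ and $\hat{\sigma}_e^2$. $\hat{\beta}$ is the weighted least squares estimator evaluated at the estimated variance components.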
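The EBLUP is a convex combination of the regression and synthetic estimators, so it is simple to evaluate once $\hat{\beta}$, $\hat{\sigma}_v^2$ and $\hat{\sigma}_e^2$ are available. The sketch below assumes those estimates are supplied (fitting them, e.g. by REML, is not attempted) and takes $f_i = n_i/N_i$. The function names and the toy numbers, loosely patterned on the corn/soybean pixel covariates $x_{ij} = (1, x_{1ij}, x_{2ij})'$, are hypothetical.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def eblup_mean(
    y_sample: Sequence[float],            # sampled y_ij in area i (n_i values)
    x_sample: Sequence[Sequence[float]],  # matching sampled x_ij vectors
    X_bar: Sequence[float],               # known population mean vector Xbar_i
    N: int,                               # population size N_i
    beta_hat: Sequence[float],
    sigma2_v_hat: float,
    sigma2_e_hat: float,
) -> float:
    n = len(y_sample)
    f = n / N                                                   # sampling fraction f_i
    y_bar = sum(y_sample) / n
    x_bar = [sum(col) / n for col in zip(*x_sample)]            # sample mean of x_ij
    reg = y_bar + dot([X - xs for X, xs in zip(X_bar, x_bar)], beta_hat)  # regression estimator
    syn = dot(X_bar, beta_hat)                                  # synthetic estimator
    B = (sigma2_e_hat / n) / (sigma2_v_hat + sigma2_e_hat / n)  # shrinkage factor B_i
    return f * reg + (1 - f) * ((1 - B) * reg + B * syn)


# Hypothetical area: two sampled segments, x_ij = (1, corn pixels, soybean pixels).
est = eblup_mean(
    y_sample=[120.0, 150.0],
    x_sample=[(1.0, 300.0, 100.0), (1.0, 360.0, 80.0)],
    X_bar=(1.0, 320.0, 95.0),
    N=500,
    beta_hat=(5.0, 0.4, -0.1),
    sigma2_v_hat=100.0,
    sigma2_e_hat=200.0,
)  # about 127.014
```

Setting $\hat{\sigma}_v^2 = 0$ forces $\hat{B}_i = 1$, so the non-sampled part is predicted purely synthetically; as $\hat{\sigma}_v^2$ grows relative to $\hat{\sigma}_e^2/n_i$, the estimator moves toward the regression estimator.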
• 13. Plots of Survey-Weighted Poverty Rates and SAE for a Small County (drawn by Sam Hawala)
• 14. Plots of Estimated SEs of Survey-Weighted Poverty Rates and SAE for a Small County (drawn by Sam Hawala)
• 15. A Cross-Sectional Model (Ref: Fay and Herriot, JASA 1979): For $i = 1, \dots, m$,
Level 1 (sampling distribution): $y_i = \theta_i + e_i$;
Level 2 (linking distribution): $\theta_i = x_i'\beta + v_i$,
where $y_i$ is the direct survey estimate of the true small area mean $\theta_i$ for area $i$; $x_i$ is a $p \times 1$ vector of known auxiliary variables coming from big data; $\{e_i\}$ and $\{v_i\}$ are independent with $e_i \sim N(0, \psi_i)$ and $v_i \sim N(0, \sigma_v^2)$; the $\psi_i$'s are assumed to be known. The $p \times 1$ vector of regression coefficients $\beta$ and the model variance $\sigma_v^2$ are unknown.
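A minimal sketch of the empirical Bayes estimator under the Fay-Herriot model: $\hat{\theta}_i = (1 - B_i) y_i + B_i x_i'\hat{\beta}$ with $B_i = \psi_i/(\sigma_v^2 + \psi_i)$, where $\hat{\beta}$ is the weighted least squares estimator with weights $1/(\sigma_v^2 + \psi_i)$. It assumes a value of $\sigma_v^2$ is already in hand (in practice it is estimated, e.g. by a moment method or REML, which this sketch does not do); the helper names are hypothetical.

```python
from typing import List, Sequence, Tuple


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting (small p x p system)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def fay_herriot_eb(
    y: Sequence[float],            # direct estimates y_i
    x: Sequence[Sequence[float]],  # auxiliary vectors x_i
    psi: Sequence[float],          # known sampling variances psi_i
    sigma2_v: float,               # model variance, assumed given
) -> Tuple[List[float], List[float]]:
    p = len(x[0])
    w = [1.0 / (sigma2_v + ps) for ps in psi]  # inverse marginal variances of y_i
    A = [[sum(wi * xi[j] * xi[k] for wi, xi in zip(w, x)) for k in range(p)] for j in range(p)]
    b = [sum(wi * xi[j] * yi for wi, xi, yi in zip(w, x, y)) for j in range(p)]
    beta = solve(A, b)                         # weighted least squares estimate of beta
    est = []
    for yi, xi, ps in zip(y, x, psi):
        B = ps / (sigma2_v + ps)               # shrinkage toward the synthetic part x_i' beta
        est.append((1 - B) * yi + B * sum(bk * xk for bk, xk in zip(beta, xi)))
    return beta, est


# Hypothetical intercept-only example with equal sampling variances.
print(fay_herriot_eb([1.0, 2.0, 6.0], [(1.0,), (1.0,), (1.0,)], [1.0, 1.0, 1.0], 1.0))
# ([3.0], [2.0, 2.5, 4.5])
```

Areas with large $\psi_i$ (noisy direct estimates) get $B_i$ close to 1 and lean on the big-data covariates; precise direct estimates are left nearly unchanged.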
• 16. Auxiliary Variables from Big Data: (1) the proportion of child exemptions reported by families in poverty on their tax returns; (2) the proportion of people under 65 who did not file income tax returns; (3) the proportion of people receiving food stamps.
• 17. A Time Series Cross-Sectional Model (Ref: Datta, Lahiri, Maiti and Lu, 1999; Datta, Lahiri and Maiti, 2002): For $i = 1, \dots, m$; $t = 1, \dots, T$,
Level 1: $y_{it} = \theta_{it} + e_{it}$;
Level 2: $\theta_{it} = x_{it}'\beta + v_i + u_{it}$;
Level 3: $u_{it} = u_{i,t-1} + \epsilon_{it}$,
where $y_{it}$ is the direct survey estimate of the median income of four-person families for state $i$ in year $t$; $e_{it}$ is the sampling error; $x_{it}$ are auxiliary variables coming from big data (previous census and administrative records); $v_i$ are state-specific random effects; $u_{it}$ are state- and year-specific random effects that follow a random walk over $t$ with innovations $\epsilon_{it}$.
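The three levels can be read as a generative recipe. The sketch below simulates $(\theta_{it}, y_{it})$ from them, assuming (these are choices of the sketch, not taken from the slide) $\epsilon_{it} \overset{iid}{\sim} N(0, \sigma_\epsilon^2)$, $e_{it} \sim N(0, \psi_{it})$ with known $\psi_{it}$, $v_i \sim N(0, \sigma_v^2)$, and a starting value $u_{i0} = 0$. The function and parameter names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def simulate_ts_cross_section(
    x: List[List[Sequence[float]]],  # x[i][t]: auxiliary vector x_it for state i, year t
    beta: Sequence[float],
    psi: List[List[float]],          # psi[i][t]: sampling variance of e_it (treated as known)
    sigma2_v: float,                 # variance of the state effects v_i
    sigma2_eps: float,               # variance of the random-walk innovations eps_it
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (theta, y) simulated from the three-level model."""
    rng = random.Random(seed)
    theta, y = [], []
    for x_i, psi_i in zip(x, psi):
        v_i = rng.gauss(0.0, sigma2_v ** 0.5)  # state-specific effect
        u = 0.0                                # assumed starting value u_{i0} = 0
        theta_i, y_i = [], []
        for x_it, psi_it in zip(x_i, psi_i):
            u += rng.gauss(0.0, sigma2_eps ** 0.5)                   # Level 3: random walk
            th = sum(b * xk for b, xk in zip(beta, x_it)) + v_i + u  # Level 2: linking model
            theta_i.append(th)
            y_i.append(th + rng.gauss(0.0, psi_it ** 0.5))          # Level 1: sampling error
        theta.append(theta_i)
        y.append(y_i)
    return theta, y


# Hypothetical single state observed for three years.
theta, y = simulate_ts_cross_section(
    [[(1.0, 2.0), (1.0, 3.0), (1.0, 5.0)]], (1.0, 2.0), [[0.5, 0.5, 0.5]], 1.0, 0.2, seed=7
)
```

Because $u_{it}$ accumulates its innovations, a state's deviation from $x_{it}'\beta$ persists from year to year, which is what lets past years' data borrow strength for the current year.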
• 18. Estimates of Coefficients of Variation of CPS Direct Estimates of Median Income of Four-Person Families in the US States, 1989. [Map: U.S. state-level CV, CPS; legend from 2.5 to 12.5]
• 19. Estimates of Coefficients of Variation of EB Estimates of Median Income of Four-Person Families in the US States, 1989. [Map: U.S. state-level CV, EB; legend from 2.5 to 12.5]
• 20. A Plot of Absolute Residuals from a Simple Linear Regression. Dependent variable: 1989 median income estimates from the 1990 Census; independent variable: CPS or EB estimates for 1989. [Plot: absolute residual versus state, for CPS and for EB]
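The residual comparison can be sketched as follows: regress the census-based estimates on one set of survey-based estimates by ordinary least squares and inspect the absolute residuals state by state; smaller residuals mean that series tracks the census more closely. The function name and the toy values are hypothetical, and no actual CPS, EB or census figures are reproduced.

```python
from typing import List, Sequence, Tuple


def abs_residuals(
    dep: Sequence[float],    # e.g. census-based estimates, one per state
    indep: Sequence[float],  # e.g. CPS direct or EB estimates for the same states
) -> Tuple[float, float, List[float]]:
    """Fit dep = a + b * indep by OLS; return (a, b, |residuals|)."""
    n = len(dep)
    mx = sum(indep) / n
    my = sum(dep) / n
    sxx = sum((x - mx) ** 2 for x in indep)
    sxy = sum((x - mx) * (y - my) for x, y in zip(indep, dep))
    slope = sxy / sxx
    intercept = my - slope * mx
    res = [abs(y - (intercept + slope * x)) for x, y in zip(indep, dep)]
    return intercept, slope, res


# Toy data: intercept 0.3, slope 0.8, absolute residuals 0.3, 0.9, 0.9, 0.3 (up to rounding).
a, b, r = abs_residuals([0.0, 2.0, 1.0, 3.0], [0.0, 1.0, 2.0, 3.0])
```

Running it once with the CPS series and once with the EB series as `indep` gives the two sets of residuals that the plot overlays.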
• 21. Poverty Mapping: the Chilean Case. High poverty rates can work in a Chilean municipality's favor in securing more funds from the Chilean central government. Consider the following situation: for a given small municipality, the poverty rate for the current year turns out to be high under the standard design-based method. How do we convince the mayor of that municipality to adopt a statistically efficient SAE method that yields a lower poverty rate?
• 22. Plots of Survey-Weighted Poverty Rates and SAE for Selected Comunas (drawn by Carolina Casas-Cordero). [Figure: estimates of poverty rates for comunas, Chile; panels for Concón, Hualpén, Lolol and Santiago; poverty rate (0 to 0.4) versus year (2000 to 2012), direct versus SAE. Source: Casen Survey 2000 to 2011]
• 23. Initial Set of Auxiliary Variables
No. | Auxiliary variable | Institution responsible for data collection | Frequency of publication
#1 | Subsidio Familiar (family allowance) | Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social | Monthly and yearly
#2 | Subsidio al Pago del Consumo de Agua Potable y Servicio de Alcantarillado de Aguas Servidas (subsidy for drinking water and sewerage charges) | Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social | Monthly and yearly
#3 | Bono Chile Solidario (Chile Solidario bonus) | Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social | Monthly and yearly
#4 | Subsidio de Discapacidad Mental (mental disability subsidy) | Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social | Monthly and yearly
#5 | Pensión Básica Solidaria, vejez e invalidez (basic solidarity pension, old age and disability) | Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social | December
#6 | Aporte Previsional Solidario, vejez e invalidez (solidarity pension contribution, old age and disability) | Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social | December
#7 | Bonificación al Ingreso Ético Familiar (Ingreso Ético Familiar bonus) | Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social | Monthly and yearly
#8 | Beca de Apoyo a la Retención Escolar, BARE (school retention grant) | Unidad de Prestaciones Monetarias, Ministerio de Desarrollo Social | Monthly and yearly
#9 | Afiliados Sistema de Capitalización Individual (members of the individual-account pension system) | Superintendencia de Pensiones | Monthly and yearly
#10 | Matrícula (school enrollment) | Ministerio de Educación | Yearly
#11 | Rendimiento (school performance) | Ministerio de Educación | Yearly
#12 | SIMCE (national standardized achievement test) | Ministerio de Educación | Yearly or every two years
#13 | Titulados Educación Superior (higher-education graduates) | Ministerio de Educación | Yearly
#14 | Índice de Vulnerabilidad del Establecimiento, IVE-SINAE (school vulnerability index) | Junta Nacional Escolar y Becas (Junaeb) | Yearly
#15 | Nutritional status of primary and secondary school students | Junta Nacional Escolar y Becas (Junaeb) | Yearly
#16 | Fonasa beneficiary population | Ministerio de Salud | Yearly
#17 | Private-sector health services provided | Ministerio de Salud | Yearly
#18 | Ratio of illiterate persons to the population aged 10 and older in the comuna | CENSO, INE | Every 10 years
#19 | Percentage of rural population | CENSO, INE | Every 10 years
#20 | School attendance rate in the comuna | SINIM | Monthly
#21 | Average household size | CENSO, INE | Every 10 years
#22 | Historical poverty rate | CASEN | Every 2 or 3 years
#23 | Contribuciones de Vivienda (housing property taxes) | SII (http://www.sii.cl/avaluaciones/estadisticas/estadisticas_bbrr.htm#2) | Yearly
#24 | Average wage of dependent workers | (not stated) | Yearly
Source: Ministerio de Desarrollo Social (2013a).
• 24. Regression Analysis
Independent variable | Regression coefficient estimate (t-statistic), original comuna weights
Average wage of dependent workers (log) | -0.09575646 (3.52**)
Average of the poverty rate from Casen 2000, 2003 and 2006 (arcsin) | 0.49548266 (7.92**)
% of population in rural areas (arcsin) | -0.13409847 (4.96**)
% of illiterate population (arcsin) | 0.40349163 (2.57*)
% of population attending school (arcsin) | -0.21883535 (2.23*)
Dummy for region 7 (=1) | 0.03442978 (2.11*)
Dummy for region 8 (=1) | 0.03882056 (2.67**)
Dummy for region 9 (=1) | 0.105632 (6.04**)
Constant | 1.61477028 (4.24**)
Number of observations: 235; adjusted R²: 0.67
• 25. Length of the direct and parametric bootstrap confidence intervals of the comuna-level poverty rates for comunas sorted by the limited translation empirical Bayes estimates of the poverty rate.
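The slide does not spell out its limited translation rule. One common form, used by Fay and Herriot (1979), keeps the empirical Bayes estimate within one standard error of the direct estimate. A minimal sketch of that clipping step, generalized to $c$ standard errors and assuming the sampling variances $\psi_i$ are known; the function name and the default $c = 1$ are assumptions of this sketch, and the parametric bootstrap intervals themselves are not reproduced here.

```python
from typing import List, Sequence


def limited_translation(
    direct: Sequence[float],  # direct estimates y_i
    eb: Sequence[float],      # unrestricted EB estimates
    psi: Sequence[float],     # sampling variances psi_i (assumed known)
    c: float = 1.0,           # allowed departure from y_i, in standard errors
) -> List[float]:
    out = []
    for y, t, ps in zip(direct, eb, psi):
        se = ps ** 0.5
        # clip the EB estimate to the interval [y - c*se, y + c*se]
        out.append(min(max(t, y - c * se), y + c * se))
    return out


print(limited_translation([2.0, 1.0], [0.5, 1.2], [1.0, 0.25]))  # [1.0, 1.2]
```

The first EB value is pulled back to one standard error below its direct estimate; the second already lies within the band and is left unchanged. The rule limits how far shrinkage can move any single comuna, which matters when, as in the Chilean case, a lower estimate has funding consequences.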
• 26. "...D.J. Finney once wrote about the statistician whose client comes in and says, 'Here is my mountain of trash. Find the gems that lie therein.' Finney's advice was to not throw him out of the office but to attempt to find out what he considers 'gems'. After all, if the trained statistician does not help, he will find some one who will...." (David Salsburg, ASA Connect Discussion)
• 27. First Latin American ISI Satellite Meeting on Small Area Estimation, August 3-5, 2015, Santiago, Chile. International Statistical Institute (ISI) Satellite Meeting at Pontificia Universidad Católica de Chile.
Invited talks: Malay Ghosh, "Small Area Estimation with Health Applications"; Wayne Fuller, "Bootstrap Methods for Small Area Predictions"; Partha Lahiri, "Recent Advances in Poverty Mapping Methodology"; Angela Luna, Nikos Tzavidis and LiChun Zhang, "From start to finish: Specify – Adapt – Evaluate (SAE)"; Danny Pfeffermann and Richard Tiller, "Small Area Labor Force Statistics using Time Series Models"; J.N.K. Rao, "Measuring Uncertainty of Small Area Estimators".
Special topics, contributed and poster sessions: submit abstracts by April 15, 2015 to sae2015@uc.cl. Abstracts are accepted on a first-come basis. Language of the conference: English. Website: http://www.encuestas.uc.cl/sae2015/
Main organizer: Centro de Encuestas y Estudios Longitudinales, Universidad Católica de Chile. Co-organizers: International Statistical Institute (ISI), International Association of Survey Statisticians (IASS), Sociedad Chilena de Estadística (SOCHE), Instituto Nacional de Estadísticas (INE), Ministerio de Desarrollo Social (MDS), and the Departamento de Estadística, Departamento de Salud Pública and Instituto de Sociología of the Universidad Católica de Chile.
Purpose: We hope that this meeting will serve as a bridge between mathematical statisticians and practitioners working on small area estimation in academia and in private and government agencies. This meeting in Santiago will give researchers an opportunity to learn about state-of-the-art small area estimation techniques from experts in the field.
Journal of the Royal Statistical Society (JRSS) Series A Special Issue on SAE!
• 28. THANK YOU!