1
Outlier Identification in National
Resources Inventory and Theoretical
Extensions to Nondifferentiable Survey
Estimators
Jianqiang Wang
Major Professor: Jean Opsomer
Committee: Wayne A. Fuller
Song X. Chen
Dan Nettleton
Dimitris Margaritis
2
Outline
• Introduction
• Notation and assumptions
• Mean and median-based inference
• Variance estimation
• Simulation study
• Application in National Resources Inventory
• Theoretical extensions
3
National Resources Inventory (1)
• The National Resources Inventory (NRI) is a longitudinal survey of natural resources on non-Federal land in the U.S.
• Conducted by the USDA NRCS, in cooperation with CSSM at Iowa State University.
• Produces a longitudinal database containing numerous agro-environmental variables for scientific investigation and policy-making.
• Information was updated every 5 years before 1997, and annually afterwards through a partially overlapping subsampling design.
4
National Resources Inventory (2)
• Covers various aspects of land use and farming practice, and environmentally important variables like wetland status and soil erosion.
• Measures both level and change over time in these variables.
• The primary mode of data collection is a combination of aerial photography and field collection.
• Outliers arise from errors in data collection or processing, or from real points that themselves behave abnormally.
5
Outlier identification for a
longitudinal survey
• Identify outliers for periodically updated data.
• Build outlier identification rules on previous years' data and use the rules to flag current observations.
[Diagram: observed years 2001-2005, split into a training set (2001, 2002, 2003) and a test set (2003, 2004, 2005).]
6
Target variables
• Non-pseudo core points with soil erosion in years 2001-2005.
• Training set variables: broad use, land use, C factor, support practice factor, slope, slope length and USLE loss in years 2001, 2002 and 2003.
• USLE loss represents the potential long-term soil loss in tons/acre: USLELOSS = R * K * LS * C * P.
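The product form of the USLE loss can be illustrated with a short sketch. The function name and the factor values below are hypothetical, chosen only to show the arithmetic; they are not NRI data.

```python
def usle_loss(r, k, ls, c, p):
    """Potential long-term soil loss (tons/acre) as the product R * K * LS * C * P."""
    return r * k * ls * c * p


# Hypothetical factor values (assumption, not taken from the NRI).
loss = usle_loss(r=150.0, k=0.3, ls=1.2, c=0.2, p=1.0)
print(round(loss, 2))

# Because the loss is a plain product, a tenfold jump in one factor
# (for example C going from 0.02 to 0.2) yields a tenfold jump in the loss.
ratio = usle_loss(150.0, 0.3, 1.2, 0.2, 1.0) / usle_loss(150.0, 0.3, 1.2, 0.02, 1.0)
```

Since every factor enters multiplicatively, an erroneous entry in any single factor propagates proportionally into USLE loss.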
7
Point classification
b.u.  Point type              b.u.  Point type
1     Cultivated cropland     7     Urban and built-up land
2     Noncultivated cropland  8     Rural transportation
3     Pastureland             9     Small water areas
4     Rangeland               10    Large water areas
5     Forest land             11    Federal land
6     Minor land              12    CRP
8
Initial partitioning
• Initial partitioning uses geographical association and broad use category.
  - Partition national data into state-wise categories.
  - Collapse northeastern states.
  - Partition each region based on broad use sequence into (1,1,1), (2,2,2), (3,3,3), (12,12,12) and points with broad use change.
  - Merge points with the same broad use change pattern, say (2,2,3) or (1,1,12).
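The grouping step can be sketched as below. This is only an illustration under assumptions: the region labels, point identifiers and the `partition_points` helper are hypothetical, and the sketch groups within a region; whether change patterns are also pooled across regions is not stated on the slide.

```python
from collections import defaultdict


def partition_points(points):
    """Group point ids by (region, broad-use sequence over the three years)."""
    groups = defaultdict(list)
    for point_id, region, seq in points:
        groups[(region, tuple(seq))].append(point_id)
    return dict(groups)


# Hypothetical points: (id, region, broad use in each of the three years).
points = [
    ("p1", "IA", (1, 1, 1)),
    ("p2", "IA", (1, 1, 1)),
    ("p3", "IA", (2, 2, 3)),         # broad use change: merged with p4
    ("p4", "IA", (2, 2, 3)),
    ("p5", "Northeast", (12, 12, 12)),  # collapsed northeastern states
]
groups = partition_points(points)
for key, ids in groups.items():
    print(key, ids)
```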
9
Source of outlyingness
• Flag 1% of points on the training set, and compare test distances with the 99% quantile of the training distances.
• Source of outlyingness:
$$\hat{e}_{\nu,i} = \frac{\hat{\Sigma}_{\nu}^{-1/2}(\hat{\mu}_{\nu} - y_i)}{\|\hat{\Sigma}_{\nu}^{-1/2}(\hat{\mu}_{\nu} - y_i)\|}$$
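A minimal sketch of the flagging rule and of the unit direction that points at the source of outlyingness. It assumes the plain (unweighted) sample mean and covariance as stand-ins for the survey-based estimates of location and scatter; the function names and the simulated data are hypothetical.

```python
import numpy as np


def inv_sqrt(cov):
    """Symmetric inverse square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def flag_and_direct(train, test, level=0.99):
    center = train.mean(axis=0)        # assumption: unweighted location estimate
    cov = np.cov(train, rowvar=False)  # assumption: unweighted scatter estimate
    root = inv_sqrt(cov)
    r_train = (center - train) @ root  # rows are Sigma^{-1/2} (mu - y_i)
    r_test = (center - test) @ root
    d_train = np.linalg.norm(r_train, axis=1)
    d_test = np.linalg.norm(r_test, axis=1)
    cutoff = np.quantile(d_train, level)  # 99% quantile of training distances
    flagged = d_test > cutoff
    # Unit vectors: which standardized coordinates drive the distance.
    directions = r_test[flagged] / d_test[flagged, None]
    return flagged, directions, cutoff


rng = np.random.default_rng(0)
train = rng.normal(size=(500, 3))
# Five ordinary test points plus one that is extreme in the second variable.
test = np.vstack([rng.normal(size=(5, 3)), [[0.0, 10.0, 0.0]]])
flagged, directions, cutoff = flag_and_direct(train, test)
print(flagged, cutoff)
```

Each row of `directions` has unit length; the coordinate with the largest absolute value suggests which variable to review first.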
10
Analysis of flagged points
• Agricultural specialists analyzed identified points by suspicious variables.
• C factor: almost all points were considered suspicious.
• Data entry errors
• Invalid entries
  - C factor = 1 for hayland, pastureland or CRP
• Unusual levels or trends in relation to land use
(0.013, 0.13, 0.013, 0.013, 0.013)
(0.011, 0.06, 0.11, 0.003, 0.003)
11
Analysis of flagged points
• P factor: all points are candidates for review because of the change over time.
• Slope length: all points were flagged because of the level, not change over time.
(1.0, 1.0, 1.0, 0.6, 1.0)
12
Nondifferentiable survey
estimators
• The sample distance distribution $\hat{D}_{\nu,d}(\hat{\mu}_{\nu})$ is a nondifferentiable function of the estimated location parameter.
• A general class of survey estimators:
$$\hat{T}(\hat{\lambda}) = \frac{1}{N} \sum_{i \in S_{\nu}} \frac{1}{\pi_i} h(y_i; \hat{\lambda})$$
with corresponding population quantity
$$T_N(\lambda_N) = \frac{1}{N} \sum_{i=1}^{N} h(y_i; \lambda_N)$$
(Slide annotation: not necessarily differentiable.)
• A direct Taylor linearization may not be applicable; again use a differentiable limiting function $T(\gamma) = \lim_{N \to \infty} T_N(\gamma)$, with derivative $\zeta(\gamma)$.
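The general class above can be sketched with an indicator-type h, which makes the lack of differentiability visible: the estimate is a step function of the plugged-in parameter. The choice of h (a one-dimensional distance indicator with radius `d`), the toy data and the function names are assumptions made for illustration.

```python
import numpy as np


def t_hat(y, pi, h, lam_hat, N):
    """Inverse-probability-weighted estimator (1/N) * sum_i h(y_i; lam_hat) / pi_i."""
    return np.sum(h(y, lam_hat) / pi) / N


def within_distance(y, lam, d=2.5):
    """Hypothetical h: indicator that y lies within distance d of lam."""
    return (np.abs(y - lam) <= d).astype(float)


y = np.array([1.0, 2.0, 3.0, 10.0])   # toy sample
pi = np.full(4, 0.5)                  # equal inclusion probabilities
N = 8                                 # population size

lam_hat = np.sum(y / pi) / np.sum(1.0 / pi)   # weighted mean, equals 4.0 here
t_at_mean = t_hat(y, pi, within_distance, lam_hat, N)
t_shifted = t_hat(y, pi, within_distance, 4.6, N)
print(t_at_mean, t_shifted)
```

Moving the plugged-in value from 4.0 to 4.6 pushes one observation outside the radius, so the estimate drops in a single jump rather than smoothly.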
13
Asymptotics
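• Under certain regularity conditions,
$$n^{*1/2} \left[ V(\hat{T}(\hat{\lambda})) \right]^{-1/2} \left( \hat{T}(\hat{\lambda}) - T_N(\lambda_N) \right) \Big| \mathcal{F} \overset{d}{\to} N(0, 1)$$
where
$$V(\hat{T}(\hat{\lambda})) = \left( 1, [\zeta(\lambda_N)]^T \right) V(\bar{z}_{\pi}) \begin{pmatrix} 1 \\ \zeta(\lambda_N) \end{pmatrix}.$$
• The extra variance due to estimating the unknown parameter may or may not be negligible.
• Propose a kernel estimator to estimate the unknown derivative.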
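For a scalar parameter, the quadratic form in the variance can be written out term by term. The block notation $V_{11}, V_{12}, V_{22}$ is introduced here only for illustration, under the assumption that the first component of $\bar{z}_{\pi}$ corresponds to the estimator with the parameter known and the second to the estimated parameter.

```latex
V(\bar{z}_{\pi}) =
\begin{pmatrix} V_{11} & V_{12} \\ V_{12} & V_{22} \end{pmatrix}
\quad\Longrightarrow\quad
\left( 1, \zeta(\lambda_N) \right) V(\bar{z}_{\pi})
\begin{pmatrix} 1 \\ \zeta(\lambda_N) \end{pmatrix}
= V_{11} + 2\,\zeta(\lambda_N)\,V_{12} + \zeta(\lambda_N)^2\,V_{22}
```

Under this reading, the last two terms are the extra variance from estimating the parameter; they vanish when $\zeta(\lambda_N) = 0$.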
14
Estimating distribution function
using auxiliary information
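• Ratio model:
$$y_i = R x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, x_i \sigma^2)$$
• Use $\hat{R} x_i$ as a substitute of $y_i$, where
$$\hat{R} = \frac{\sum_{S_{\nu}} y_i / \pi_i}{\sum_{S_{\nu}} x_i / \pi_i}.$$
• Difference estimator:
$$\hat{T}(\hat{R}) = \frac{1}{N} \left\{ \sum_{S_{\nu}} \frac{1}{\pi_i} I(y_i \le t) + \left[ \sum_{U} I(\hat{R} x_i \le t) - \sum_{S_{\nu}} \frac{1}{\pi_i} I(\hat{R} x_i \le t) \right] \right\}$$
• The extra variance due to estimating the ratio is negligible (RKM, 1990).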
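A minimal sketch of the ratio and difference estimators on this slide, assuming the auxiliary variable is known for every population unit. The function names and the toy numbers are hypothetical.

```python
import numpy as np


def ratio_hat(y_s, x_s, pi_s):
    """Estimated ratio: weighted sample total of y over weighted sample total of x."""
    return np.sum(y_s / pi_s) / np.sum(x_s / pi_s)


def difference_cdf(t, y_s, x_s, pi_s, x_pop):
    """Difference estimator of the distribution function of y at t."""
    N = len(x_pop)
    r = ratio_hat(y_s, x_s, pi_s)
    ht_y = np.sum((y_s <= t) / pi_s)          # weighted sample count of y_i <= t
    pop_proxy = np.sum(r * x_pop <= t)        # population count of R_hat * x_i <= t
    ht_proxy = np.sum((r * x_s <= t) / pi_s)  # weighted sample count of the proxy
    return (ht_y + pop_proxy - ht_proxy) / N


# Toy population of auxiliary values and a sample of three units (assumption).
x_pop = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x_s = np.array([2.0, 4.0, 6.0])
y_s = np.array([4.0, 8.5, 11.5])
pi_s = np.full(3, 0.5)

r_hat = ratio_hat(y_s, x_s, pi_s)
f_hat = difference_cdf(9.0, y_s, x_s, pi_s, x_pop)
print(r_hat, f_hat)
```

The bracketed correction term uses the proxy values on the whole population and subtracts their weighted sample counterpart, so only the discrepancy between the study variable and its proxy is left to be estimated from the sample.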
15
Estimating a fraction below an
estimated quantity
• Estimate the fraction of households in poverty when the poverty line is drawn at 60% of the median income:
$$\hat{T}(\hat{q}) = \frac{1}{N} \sum_{S_{\nu}} \frac{1}{\pi_i} I(y_i \le 0.6\,\hat{q})$$
with population quantity
$$T_N(q_N) = \frac{1}{N} \sum_{i=1}^{N} I(y_i \le 0.6\,q_N)$$
• Assume that $\lim_{N \to \infty} T_N(\gamma) = F_Y(0.6\gamma)$; the extra variance depends on $\partial F_Y(0.6\gamma) / \partial \gamma$.
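The poverty-rate estimator can be sketched as follows. The weighted median used here (smallest income whose weighted cumulative share reaches one half) is one common convention and an assumption of this sketch, as are the function names and the toy incomes.

```python
import numpy as np


def weighted_median(y, w):
    """Smallest y whose weighted cumulative share is at least 0.5."""
    order = np.argsort(y)
    y_sorted, w_sorted = y[order], w[order]
    share = np.cumsum(w_sorted) / np.sum(w_sorted)
    return y_sorted[np.searchsorted(share, 0.5)]


def poverty_fraction(y, pi, N, cut=0.6):
    """Weighted fraction of units with income below cut * estimated median."""
    w = 1.0 / pi
    q_hat = weighted_median(y, w)
    frac = np.sum(w * (y <= cut * q_hat)) / N
    return frac, q_hat


# Toy incomes for five sampled households, each with inclusion probability 0.5.
y = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
pi = np.full(5, 0.5)
frac, q_hat = poverty_fraction(y, pi, N=10)
print(q_hat, frac)
```

Because the threshold itself depends on the estimated median, the sampling error in the median feeds into the estimated fraction, which is the extra variance discussed on this slide.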
16
Concluding remarks
• Proposed an estimator for subpopulation distance distribution and demonstrated its statistical properties.
• Application in a large-scale longitudinal survey.
• Theoretical extensions to nondifferentiable survey estimators.
17
Thank you

Multivariate outlier detection
