1
Outlier Identification in National
Resources Inventory and Theoretical
Extensions to Nondifferentiable Survey
Estimators
Jianqiang Wang
Major Professor: Jean Opsomer
Committee: Wayne A. Fuller
Song X. Chen
Dan Nettleton
Dimitris Margaritis
2
Outline
• Introduction
• Notation and assumptions
• Mean and median-based inference
• Variance estimation
• Simulation study
• Application in National Resources Inventory
• Theoretical extensions
3
National Resources Inventory (1)
• The National Resources Inventory (NRI) is a longitudinal survey of
natural resources on non-Federal land in the U.S.
• Conducted by the USDA NRCS, in cooperation with CSSM
at Iowa State University.
• Produces a longitudinal database containing numerous
agro-environmental variables for scientific investigation and
policy-making.
• Information was updated every 5 years before 1997 and
annually thereafter, through a partially overlapping subsampling
design.
4
National Resources Inventory (2)
• Covers various aspects of land use, farming practice, and
environmentally important variables such as wetland status and
soil erosion.
• Measures both level and change over time in these
variables.
• The primary mode of data collection is a combination of aerial
photography and field collection.
• Outliers arise from errors in data collection or processing, or
from real points that genuinely behave abnormally.
5
Outlier identification for a
longitudinal survey
• Identify outliers for periodically updated data.
• Build outlier identification rules on previous years' data and
use the rules to flag current observations.
Observed years: 2001-2005
Training set: (2001, 2002, 2003)
Test set: (2003, 2004, 2005)
6
Target variables
• Non-pseudo core points with soil erosion in years 2001-2005.
• Training set variables: broad use, land use, C factor, support
practice factor (P factor), slope, slope length and USLE loss in years
2001, 2002 and 2003.
• USLE loss represents the potential long-term soil loss in
tons/acre:
  USLELOSS = R * K * LS * C * P
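The USLE product on this slide can be sketched in a few lines of Python. This is only an illustration: in the standard Universal Soil Loss Equation, R is rainfall erosivity, K is soil erodibility, LS is the slope length and steepness factor, C is the cover-management factor and P is the support practice factor. The numeric values below are hypothetical and are not NRI data.

```python
def usle_loss(r, k, ls, c, p):
    """Potential long-term soil loss in tons/acre: R * K * LS * C * P."""
    return r * k * ls * c * p


# Hypothetical factors for one point; only the C factor changes across years.
c_by_year = {2001: 0.013, 2002: 0.13, 2003: 0.013}
losses = {
    year: usle_loss(r=150.0, k=0.3, ls=1.2, c=c, p=1.0)
    for year, c in c_by_year.items()
}

for year, loss in sorted(losses.items()):
    print(year, round(loss, 3))
```

Because the loss is a plain product, a tenfold jump in a single factor (here C in 2002) produces a tenfold jump in USLE loss, which is why an error in one factor can surface as an outlier in the loss variable.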
7
Point classification
(b.u. = broad use code)
b.u.  Point type              b.u.  Point type
1     Cultivated cropland     7     Urban and built-up land
2     Noncultivated cropland  8     Rural transportation
3     Pastureland             9     Small water areas
4     Rangeland               10    Large water areas
5     Forest land             11    Federal land
6     Minor land              12    CRP
8
Initial partitioning
• Initial partitioning uses geographical association and broad
use category.
  - Partition national data into state-wise categories.
  - Collapse northeastern states.
  - Partition each region based on broad use sequence
    into (1,1,1), (2,2,2), (3,3,3), (12,12,12) and points
    with broad use change.
  - Merge points with the same broad use change pattern,
    say (2,2,3), (1,1,12).
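The partitioning steps above can be sketched as a grouping by region and broad use sequence. This is a minimal sketch under stated assumptions: the slides do not say which northeastern states were collapsed, so the `NORTHEAST` set is hypothetical, as are the point identifiers and the `partition` helper.

```python
from collections import defaultdict

# Hypothetical set of states collapsed into one region (not from the slides).
NORTHEAST = {"CT", "MA", "RI"}


def region_of(state):
    """Map a state to its partitioning region, collapsing northeastern states."""
    return "Northeast" if state in NORTHEAST else state


def partition(points):
    """Group point ids by (region, broad use sequence).

    Each point is (point_id, state, (bu_2001, bu_2002, bu_2003)).
    Keying on the full sequence keeps constant sequences such as (1,1,1)
    apart and merges points that share a change pattern such as (2,2,3).
    """
    cells = defaultdict(list)
    for point_id, state, sequence in points:
        cells[(region_of(state), tuple(sequence))].append(point_id)
    return dict(cells)


points = [
    ("a", "IA", (1, 1, 1)),
    ("b", "IA", (2, 2, 3)),
    ("c", "IA", (2, 2, 3)),
    ("d", "CT", (1, 1, 1)),
    ("e", "RI", (1, 1, 1)),
]
cells = partition(points)
```

Here points `b` and `c` land in the same cell because they share the change pattern (2,2,3), and `d` and `e` are pooled into the collapsed region.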
9
Source of outlyingness
• Flag 1% of the points in the training set, and compare test
distances with the 99% quantile of the training distances.
• Source of outlyingness:
$$\hat{e}_{\nu,i} = \frac{\hat{\Sigma}_\nu^{-1/2} (\hat{\mu}_\nu - y_i)}{\| \hat{\Sigma}_\nu^{-1/2} (\hat{\mu}_\nu - y_i) \|}$$
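The flagging rule and the direction of outlyingness can be sketched as follows. This is a simplified illustration, not the author's implementation: distances are taken as already computed, the quantile is a plain empirical quantile without survey weights, and the scale matrix is assumed diagonal so that the inverse square root reduces to dividing each variable by its scale. All function names are hypothetical.

```python
import math


def empirical_quantile(values, prob):
    """Smallest observed value with at least `prob` of the data at or below it."""
    ordered = sorted(values)
    rank = max(math.ceil(prob * len(ordered)), 1)
    return ordered[rank - 1]


def flag_points(train_distances, test_distances, prob=0.99):
    """Return the training cutoff and a flag for each test distance above it."""
    cutoff = empirical_quantile(train_distances, prob)
    return cutoff, [d > cutoff for d in test_distances]


def outlyingness_direction(y, center, scale):
    """Unit vector of standardized deviations (center - y) / scale.

    Assumption: diagonal covariance, so Sigma^{-1/2} is 1/scale per variable.
    """
    z = [(m - v) / s for v, m, s in zip(y, center, scale)]
    norm = math.sqrt(sum(c * c for c in z))
    if norm == 0.0:
        return [0.0] * len(z)  # the point sits exactly at the center
    return [c / norm for c in z]


train = [float(d) for d in range(1, 101)]  # hypothetical training distances
cutoff, flags = flag_points(train, [50.0, 99.0, 99.5, 120.0])
direction = outlyingness_direction(y=(3.0, 0.0), center=(0.0, 0.0), scale=(1.0, 1.0))
```

The component of the direction vector with the largest absolute value indicates which variable contributes most to a flagged point's distance, which is how a flagged point can be routed to a specialist by suspicious variable.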
10
Analysis of flagged points
• Agricultural specialists analyzed the identified points by
suspicious variable.
• C factor: almost all points were considered suspicious.
  - Data entry errors
  - Invalid entries: C factor = 1 for hayland, pastureland or CRP
  - Unusual levels or trends in relation to land use:
    (0.013, 0.13, 0.013, 0.013, 0.013)
    (0.011, 0.06, 0.11, 0.003, 0.003)
11
Analysis of flagged points
• P factor: all points are candidates for review because of the
change over time.
• Slope length: all points were flagged because of the level,
not change over time.
(1.0, 1.0, 1.0, 0.6, 1.0)
12
Nondifferentiable survey
estimators
• The sample distance distribution $\hat{D}_{\nu,d}(\hat{\mu}_\nu)$ is a
nondifferentiable function of the estimated location parameter.
• A general class of survey estimators:
$$\hat{T}(\hat{\lambda}) = \frac{1}{N} \sum_{i \in S_\nu} \frac{1}{\pi_i} h(y_i; \hat{\lambda})$$
(where $h$ is not necessarily differentiable), with corresponding population quantity
$$T_N(\lambda_N) = \frac{1}{N} \sum_{i=1}^{N} h(y_i; \lambda_N)$$
• A direct Taylor linearization may not be applicable; again use
a differentiable limiting function $T(\gamma) = \lim_{N \to \infty} T_N(\gamma)$, with
derivative $\zeta(\gamma)$.
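To see why such an estimator need not be differentiable in the plugged-in parameter, the weighted form (1/N) Σ h(y_i; λ) / π_i can be evaluated for an indicator-type h. The sketch below is illustrative only: the indicator `within_distance`, the sample values and the inclusion probabilities are hypothetical choices, not taken from the slides.

```python
def weighted_estimator(sample, h, lam, population_size):
    """(1/N) * sum over sampled units of h(y_i; lam) / pi_i.

    `sample` is a list of (y_i, pi_i) pairs, pi_i being the inclusion probability.
    """
    return sum(h(y, lam) / pi for y, pi in sample) / population_size


def within_distance(y, lam, d=1.0):
    """Indicator h(y; lam) = 1 if |y - lam| <= d, else 0: a step function of lam."""
    return 1.0 if abs(y - lam) <= d else 0.0


sample = [(0.0, 0.5), (1.0, 0.5), (2.5, 0.5), (4.0, 0.5)]
t_at_1 = weighted_estimator(sample, within_distance, lam=1.0, population_size=8)
t_near_1 = weighted_estimator(sample, within_distance, lam=1.01, population_size=8)
print(t_at_1, t_near_1)
```

Moving the parameter from 1.0 to 1.01 drops one unit out of the indicator and the estimate jumps, so the estimator is piecewise constant in λ and a Taylor expansion around the estimated parameter is not directly available.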
13
Asymptotics
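• Under certain regularity conditions,
$$n^{*1/2} \left[ V(\hat{T}(\hat{\lambda})) \right]^{-1/2} \left( \hat{T}(\hat{\lambda}) - T_N(\lambda_N) \right) \Big| \mathcal{F} \xrightarrow{d} N(0, 1),$$
where
$$V(\hat{T}(\hat{\lambda})) = \left( 1, [\zeta(\lambda_N)]^{T} \right) V(\bar{z}_\pi) \begin{pmatrix} 1 \\ \zeta(\lambda_N) \end{pmatrix}.$$
• The extra variance due to estimating the unknown parameter
may or may not be negligible.
• A kernel estimator is proposed to estimate the unknown derivative.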
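As a worked illustration of the variance expression, consider a scalar parameter and write the entries of the 2 x 2 matrix $V(\bar{z}_\pi)$ as $v_{11}$, $v_{12}$, $v_{22}$. These labels are introduced here only for illustration and do not appear in the slides. Expanding the quadratic form gives:

```latex
\begin{align*}
V(\hat{T}(\hat{\lambda}))
  &= \begin{pmatrix} 1 & \zeta(\lambda_N) \end{pmatrix}
     \begin{pmatrix} v_{11} & v_{12} \\ v_{12} & v_{22} \end{pmatrix}
     \begin{pmatrix} 1 \\ \zeta(\lambda_N) \end{pmatrix} \\
  &= v_{11} + 2\,\zeta(\lambda_N)\, v_{12} + \zeta(\lambda_N)^{2}\, v_{22}.
\end{align*}
```

The first term is the variance that would apply if the parameter were known; the last two terms are the extra contribution from estimating it, and they vanish when $\zeta(\lambda_N) = 0$.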
14
Estimating distribution function
using auxiliary information
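• Ratio model:
$$y_i = R x_i + \epsilon_i, \quad \epsilon_i \sim N(0, x_i \sigma^2)$$
• Use $\hat{R} x_i$ as a substitute of $y_i$, where
$$\hat{R} = \frac{\sum_{S_\nu} y_i / \pi_i}{\sum_{S_\nu} x_i / \pi_i}.$$
• Difference estimator:
$$\hat{T}(\hat{R}) = \frac{1}{N} \left\{ \sum_{S_\nu} \frac{1}{\pi_i} I(y_i \le t) + \left[ \sum_{U} I(\hat{R} x_i \le t) - \sum_{S_\nu} \frac{1}{\pi_i} I(\hat{R} x_i \le t) \right] \right\}$$
• The extra variance due to estimating the ratio is negligible
(RKM, 1990).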
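The ratio estimate and the difference estimator of the distribution function can be sketched directly from the formulas on this slide. The function names, the toy population and the inclusion probabilities below are hypothetical; the sketch assumes the auxiliary variable x is known for every unit in the population U.

```python
def ratio_estimate(sample):
    """R_hat = sum(y_i / pi_i) / sum(x_i / pi_i); sample rows are (y, x, pi)."""
    numerator = sum(y / pi for y, x, pi in sample)
    denominator = sum(x / pi for y, x, pi in sample)
    return numerator / denominator


def difference_estimator(sample, population_x, t):
    """Difference estimator of the distribution function at t.

    Weighted sample term for I(y <= t), plus the population count of
    I(R_hat * x <= t), minus the weighted sample term for the same indicator.
    """
    n_pop = len(population_x)
    r_hat = ratio_estimate(sample)
    ht_y = sum((y <= t) / pi for y, x, pi in sample)
    pop_fitted = sum(r_hat * x <= t for x in population_x)
    ht_fitted = sum((r_hat * x <= t) / pi for y, x, pi in sample)
    return (ht_y + pop_fitted - ht_fitted) / n_pop


population_x = [1.0, 2.0, 3.0, 4.0]          # hypothetical auxiliary values
sample = [(2.0, 1.0, 0.5), (8.0, 4.0, 0.5)]  # (y, x, pi) for sampled units
r_hat = ratio_estimate(sample)
f_hat = difference_estimator(sample, population_x, t=5.0)
```

The fitted values enter only through indicators, so the estimator is a step function of the estimated ratio, which places it in the nondifferentiable class discussed earlier.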
15
Estimating a fraction below an
estimated quantity
• Estimate the fraction of households in poverty when the
poverty line is drawn at 60% of the median income:
$$\hat{T}(\hat{q}) = \frac{1}{N} \sum_{S_\nu} \frac{1}{\pi_i} I(y_i \le 0.6 \hat{q})$$
with population quantity
$$T_N(q_N) = \frac{1}{N} \sum_{i=1}^{N} I(y_i \le 0.6 q_N)$$
• Assume that $\lim_{N \to \infty} T_N(\gamma) = F_Y(0.6 \gamma)$; the extra variance
depends on $\partial F_Y(0.6 \gamma) / \partial \gamma$.
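The poverty-fraction estimator can be sketched by first estimating the median with survey weights and then plugging it into the indicator. This is a minimal illustration: the slides do not specify how the median is estimated, so the weighted median below is one assumed choice, and the incomes and inclusion probabilities are hypothetical.

```python
def weighted_median(sample):
    """Smallest y whose cumulative weight 1/pi reaches half the total weight.

    `sample` is a list of (y_i, pi_i) pairs. This particular definition of the
    estimated median is an assumption made for the sketch.
    """
    ordered = sorted(sample)
    total = sum(1.0 / pi for _, pi in ordered)
    running = 0.0
    for y, pi in ordered:
        running += 1.0 / pi
        if running >= total / 2.0:
            return y
    return None  # only reached for an empty sample


def poverty_fraction(sample, population_size, share=0.6):
    """(1/N) * sum of I(y_i <= share * q_hat) / pi_i over the sample."""
    q_hat = weighted_median(sample)
    below = sum((y <= share * q_hat) / pi for y, pi in sample)
    return below / population_size


incomes = [(10.0, 0.5), (20.0, 0.5), (30.0, 0.5), (40.0, 0.5), (100.0, 0.5)]
q_hat = weighted_median(incomes)
fraction = poverty_fraction(incomes, population_size=10)
```

With these values the estimated median is 30, the poverty line is 18, and only the household with income 10 falls below it.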
16
Concluding remarks
• Proposed an estimator for the subpopulation distance
distribution and demonstrated its statistical properties.
• Application in a large-scale longitudinal survey.
• Theoretical extensions to nondifferentiable survey
estimators.
17
Thank you
