Review
             Measuring Quality
           Bandwidth Selection
Multivariate Density Estimation




      Nonparametric Econometrics
     Kernel Methods for Density Estimation


                     James Nordlund


                       April 21, 2011




                      Nordlund    Nonparametric Econometrics



Example Problem








How useful are kernel density estimates?
    How many sample observations should we have?
    Are kernel functions always reliable or did I just provide
    one lucky example?







Modes of Convergence




      Convergence in rth Mean

      Big O notation







Definitions



  Definition (Convergence in rth Mean)
  We say that $x_n$ converges to $X$ in the $r$th mean if, for some
  $r > 0$,
      \[ \lim_{n \to \infty} E\left[ \| x_n - X \|^r \right] = 0 . \]
  We write this as $x_n \xrightarrow{r\text{th}} X$.
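
As a concrete illustration (not taken from the slides): the sample mean of i.i.d. draws with variance $\sigma^2$ converges to $\mu$ in the 2nd mean, because $E|\bar{x}_n - \mu|^2 = \sigma^2 / n \to 0$. The sketch below checks this by simulation; the function name, sample sizes, replication count, and seed are arbitrary choices made for the example.

```python
import numpy as np

def mean_square_error_of_sample_mean(n, reps=5000, mu=0.0, sigma=1.0, seed=0):
    """Monte Carlo estimate of E|xbar_n - mu|^2 for N(mu, sigma^2) draws."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(mu, sigma, size=(reps, n))   # one sample of size n per row
    xbar = samples.mean(axis=1)
    return float(np.mean((xbar - mu) ** 2))

sizes = [10, 100, 1000]
estimates = [mean_square_error_of_sample_mean(n) for n in sizes]
for n, est in zip(sizes, estimates):
    print(f"n={n:5d}  simulated E|xbar-mu|^2 = {est:.5f}   theory sigma^2/n = {1.0 / n:.5f}")
```

The simulated values should shrink roughly like $1/n$, which is what convergence in the 2nd mean requires here.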







Definitions



  Definition (Order: Big O)
  For a positive integer $n$, we write $a_n = O(1)$ if, as $n \to \infty$, $a_n$
  remains bounded, i.e., $|a_n| \le C$ for some constant $C$ and for all
  large values of $n$ ($a_n$ is a bounded sequence). Similarly, we write
  $a_n = O(b_n)$ if $a_n / b_n = O(1)$, or equivalently $|a_n| \le C |b_n|$, for some
  constant $C$ and for all $n$ sufficiently large.







Main Theorem
  Theorem
  Let $X_1, X_2, \ldots, X_n$ denote independent, identically distributed
  observations with a twice differentiable p.d.f. $f(x)$, and let
  $f^{(s)}(x)$ denote the $s$th order derivative of $f(x)$ ($s = 1, 2$). Let $x$
  be an interior point in the support of $X$, and let
      \[ \hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} k\left( \frac{X_i - x}{h} \right) . \]
  Assume that the kernel function $k(\cdot)$ is bounded and has $\mu_2 < \infty$.
  Assume that $\sup_{\xi \in S(X)} |f^{(l)}(\xi)| < \infty$ for $l = 0, 1, 2$, where
  $S(X)$ denotes the support of $X$. Assume that $\int |u^3 k(u)| \, du < \infty$.
  If, as $n \to \infty$, $h \to 0$ and $nh \to \infty$, then
      \[ \mathrm{MSE}(\hat{f}(x)) = O\left( h^4 + \frac{1}{nh} \right) . \]
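
The estimator in the theorem can be written in a few lines. The following is a minimal sketch, assuming a Gaussian kernel for $k$ (the slides do not fix a particular kernel); the function names, the simulated N(0, 1) data, and the bandwidth 0.3 are illustrative choices only.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel k(u) (an assumption)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde(x, data, h):
    """Evaluate f_hat(x) = (1/(n h)) * sum_i k((X_i - x) / h) at each point in x."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    data = np.asarray(data, dtype=float)
    u = (data[None, :] - x[:, None]) / h          # shape (len(x), n)
    return gaussian_kernel(u).sum(axis=1) / (data.size * h)

rng = np.random.default_rng(0)
sample = rng.normal(size=500)                     # illustrative data, true f is N(0, 1)
grid = np.linspace(-6.0, 6.0, 1201)
f_hat = kde(grid, sample, h=0.3)
area = float(np.sum(f_hat) * (grid[1] - grid[0])) # Riemann sum over the grid
print(f"f_hat(0) = {kde(0.0, sample, 0.3)[0]:.3f}, area under f_hat = {area:.4f}")
```

Because the kernel is itself a density, the estimate is nonnegative and integrates to one (up to the numerical error of the Riemann sum).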





Inside the Proof
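   Recall that
      \[ \mathrm{MSE}(\hat{f}(x)) = E\left[ (\hat{f}(x) - f(x))^2 \right] = \mathrm{Var}(\hat{f}(x)) + \mathrm{Bias}(\hat{f}(x))^2 . \]
   Along the proof, we obtain
      \[ \mathrm{Bias}(\hat{f}(x)) = \frac{h^2}{2} f^{(2)}(x) \int u^2 k(u) \, du + O(h^3) \]
   and
      \[ \mathrm{Var}(\hat{f}(x)) = \frac{1}{nh} \left[ f(x) \int k(u)^2 \, du + O(h) \right] . \]
   Notice the trade-off between minimizing variance and bias: a smaller $h$
   shrinks the bias but inflates the variance.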
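
The trade-off can be seen numerically. The sketch below is a small Monte Carlo experiment under assumptions that are not in the slides: N(0, 1) data, a Gaussian kernel, evaluation at $x = 0$, and three arbitrary bandwidths. It estimates the bias, variance, and MSE of $\hat{f}(0)$ for each bandwidth.

```python
import numpy as np

def kde_at_point(x0, data, h):
    """Gaussian-kernel estimate f_hat(x0) for each row of `data` (one sample per row)."""
    u = (data - x0) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (np.sqrt(2.0 * np.pi) * data.shape[1] * h)

rng = np.random.default_rng(1)
n, reps, x0 = 200, 2000, 0.0
samples = rng.normal(size=(reps, n))          # reps independent samples from N(0, 1)
true_f = 1.0 / np.sqrt(2.0 * np.pi)           # f(0) for the standard normal

results = {}
for h in (0.1, 0.4, 1.0):
    est = kde_at_point(x0, samples, h)
    bias, var = est.mean() - true_f, est.var()
    results[h] = (bias, var, bias ** 2 + var)
    print(f"h={h:.1f}  bias={bias:+.4f}  var={var:.5f}  mse={bias ** 2 + var:.5f}")
```

In this setup the smallest bandwidth gives the largest variance, the largest bandwidth gives the largest bias, and the intermediate bandwidth has the lowest MSE of the three.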





How do we balance variance and bias?







Important Tools



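      \[ \mathrm{ISE}(h) = \int \left[ \hat{f}(x) - f(x) \right]^2 dx \]

      \[ \mathrm{MISE}(h) = E \int \left[ \hat{f}(x) - f(x) \right]^2 dx \]

      \[ \mathrm{AMISE}(h) = \frac{1}{nh} \int k(x)^2 \, dx + \frac{h^4}{4} \int f^{(2)}(x)^2 \, dx \left( \int x^2 k(x) \, dx \right)^2 \]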
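
To make these three criteria concrete, here is a rough sketch that computes ISE by numerical integration, approximates MISE by averaging ISE over simulated samples, and compares it with AMISE. It assumes a Gaussian kernel and a known N(0, 1) density, for which $\int k^2 = 1/(2\sqrt{\pi})$, $\int (f^{(2)})^2 = 3/(8\sqrt{\pi})$, and $\int x^2 k = 1$; the sample size, bandwidths, and function names are illustrative.

```python
import numpy as np

def kde(x, data, h):
    """Gaussian-kernel density estimate at the points x."""
    u = (data[None, :] - x[:, None]) / h
    return np.exp(-0.5 * u ** 2).sum(axis=1) / (np.sqrt(2.0 * np.pi) * data.size * h)

def ise(data, h, grid):
    """ISE(h) by a Riemann sum, with the true f taken to be N(0, 1)."""
    f_true = np.exp(-0.5 * grid ** 2) / np.sqrt(2.0 * np.pi)
    return float(np.sum((kde(grid, data, h) - f_true) ** 2) * (grid[1] - grid[0]))

def amise(h, n):
    """AMISE(h) for a Gaussian kernel and an N(0, 1) density."""
    r_k = 1.0 / (2.0 * np.sqrt(np.pi))     # integral of k(x)^2
    r_f2 = 3.0 / (8.0 * np.sqrt(np.pi))    # integral of f''(x)^2
    mu2 = 1.0                              # integral of x^2 k(x)
    return r_k / (n * h) + 0.25 * h ** 4 * r_f2 * mu2 ** 2

rng = np.random.default_rng(4)
n, reps = 200, 100
grid = np.linspace(-5.0, 5.0, 501)
samples = rng.normal(size=(reps, n))

mise_hat = {}
for h in (0.1, 0.37, 1.2):
    # The average ISE over independent samples approximates MISE(h).
    mise_hat[h] = float(np.mean([ise(s, h, grid) for s in samples]))
    print(f"h={h:.2f}  simulated MISE={mise_hat[h]:.5f}  AMISE={amise(h, n):.5f}")
```

AMISE is an asymptotic approximation, so at a sample size of 200 it should be expected to match the simulated MISE only in order of magnitude.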







Optimal h




      \[ h_{opt,AMISE} = \left[ \frac{\int k(x)^2 \, dx}{n \int f^{(2)}(x)^2 \, dx \left( \int x^2 k(x) \, dx \right)^2} \right]^{1/5} \]
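
The optimal bandwidth comes from the first-order condition of AMISE in $h$. The worked step below uses the shorthand $R(k) = \int k(x)^2 dx$, $R(f^{(2)}) = \int f^{(2)}(x)^2 dx$, and $\mu_2 = \int x^2 k(x) dx$; this shorthand is introduced here for brevity and is not notation from the slides.

```latex
\begin{aligned}
\mathrm{AMISE}(h) &= \frac{R(k)}{nh} + \frac{h^4}{4}\, R(f^{(2)})\, \mu_2^2, \\
\frac{d\,\mathrm{AMISE}(h)}{dh} &= -\frac{R(k)}{n h^2} + h^3 R(f^{(2)})\, \mu_2^2 = 0
  \;\Longrightarrow\; h^5 = \frac{R(k)}{n\, R(f^{(2)})\, \mu_2^2}, \\
h_{opt,AMISE} &= \left[ \frac{R(k)}{n\, R(f^{(2)})\, \mu_2^2} \right]^{1/5} \propto n^{-1/5}, \\
\mathrm{AMISE}(h_{opt,AMISE}) &= \frac{5}{4}\, \mu_2^{2/5}\, R(k)^{4/5}\, R(f^{(2)})^{1/5}\, n^{-4/5}.
\end{aligned}
```

The last line shows that, at the optimal bandwidth, the integrated error shrinks at the rate $n^{-4/5}$, slower than the parametric rate $n^{-1}$.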







Rule of Thumb Methods

  A popular method is to assume that the unknown density $f$ is
  normal. Then we know what $\hat{S}(\alpha)$ should
  look like. This gives

      \[ h_{ROT} \approx 1.06\, \hat{\sigma}\, n^{-1/5} \]

  Of course, if we knew what $f$ looked like, we’d stick to
  parametric estimation techniques.

  Importantly, $h_{ROT}$ is close to optimal for symmetric, unimodal
  densities.

  In this case, we call $h_{ROT}$ the normal reference rule of thumb.




Plug-In Methods


  The plug-in method is a two-step process

      Find $h_{ROT}$ (usually just by taking the normal reference
      rule of thumb)

      Use $h_{ROT}$ to estimate $\int f^{(2)}(x)^2 \, dx$ in
      \[ \left[ \frac{\int k(x)^2 \, dx}{n \int f^{(2)}(x)^2 \, dx \left( \int x^2 k(x) \, dx \right)^2} \right]^{1/5} \]
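
The two steps above can be sketched as follows. This is an illustration under stated assumptions, not the exact procedure from the talk: it uses a Gaussian kernel, estimates $f^{(2)}$ by differentiating the kernel estimate twice at the pilot bandwidth $h_{ROT}$, and integrates its square on a grid. The function names and the grid size are hypothetical.

```python
import numpy as np

def phi(u):
    """Standard normal density (the assumed kernel)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def plug_in_bandwidth(data, grid_size=512):
    """Two-step plug-in bandwidth for a Gaussian kernel (illustrative sketch)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    # Step 1: pilot bandwidth from the normal reference rule of thumb.
    h_rot = 1.06 * data.std(ddof=1) * n ** (-0.2)
    # Step 2: estimate the integral of f''(x)^2 from a kernel estimate of f''.
    grid = np.linspace(data.min() - 5 * h_rot, data.max() + 5 * h_rot, grid_size)
    u = (grid[:, None] - data[None, :]) / h_rot
    # Second derivative of the Gaussian kernel is (u^2 - 1) * phi(u).
    f2_hat = ((u ** 2 - 1.0) * phi(u)).sum(axis=1) / (n * h_rot ** 3)
    r_f2_hat = float(np.sum(f2_hat ** 2) * (grid[1] - grid[0]))
    # Plug the estimate into the AMISE-optimal formula (R(k) = 1/(2 sqrt(pi)), mu2 = 1).
    r_k = 1.0 / (2.0 * np.sqrt(np.pi))
    h_pi = (r_k / (n * r_f2_hat)) ** 0.2
    return h_rot, h_pi

rng = np.random.default_rng(5)
sample = rng.normal(size=1000)                # illustrative data
h_rot, h_pi = plug_in_bandwidth(sample)
print(f"h_ROT = {h_rot:.4f}, plug-in h = {h_pi:.4f}")
```

Reusing $h_{ROT}$ as the pilot for the second derivative is a simplification that follows the slide; refined plug-in rules typically choose that pilot bandwidth separately.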







Plug-In Methods




  This improves the asymptotic rate of convergence of the kernel
  estimator

  Further iterations add no additional benefit








To be fair, we still make some assumption on the form of f
using the rule of thumb method

However, the assumption has less influence on our estimation,
and in applied settings the plug-in method does fairly well

More data-driven methods exist (e.g. Least Squares
Cross-Validation), but these can have a very slow rate of
convergence








What about multivariate density?







Multivariate Kernel


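   Univariate:
      \[ \hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} k\left( \frac{X_i - x}{h} \right) \]

   Multivariate:
      \[ \hat{f}(x) = \frac{1}{n h_1 h_2 \cdots h_q} \sum_{i=1}^{n} K\left( \frac{X_i - x}{h} \right) \]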
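
A minimal sketch of the multivariate estimator follows. It assumes that $K$ is a product kernel, $K((X_i - x)/h) = \prod_{s=1}^{q} k((X_{is} - x_s)/h_s)$ with Gaussian $k$; the slide does not spell out the form of $K$, so this is one common choice rather than the definition used in the talk. The function name, data, and bandwidths are illustrative.

```python
import numpy as np

def product_kernel_kde(x, data, h):
    """f_hat(x) = (1/(n h_1...h_q)) * sum_i prod_s k((X_is - x_s) / h_s), Gaussian k."""
    x = np.atleast_2d(np.asarray(x, dtype=float))        # (m, q) evaluation points
    data = np.asarray(data, dtype=float)                 # (n, q) observations
    h = np.asarray(h, dtype=float)                       # (q,) bandwidths
    u = (data[None, :, :] - x[:, None, :]) / h           # (m, n, q)
    k = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return k.prod(axis=2).sum(axis=1) / (data.shape[0] * h.prod())

rng = np.random.default_rng(3)
sample = rng.normal(size=(2000, 2))                      # illustrative bivariate N(0, I) data
f_hat_origin = product_kernel_kde([0.0, 0.0], sample, h=[0.35, 0.35])[0]
print(f"f_hat(0, 0) = {f_hat_origin:.4f}   true value = {1.0 / (2.0 * np.pi):.4f}")
```

With $q = 1$ the function reduces to the univariate estimator.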







Multivariate Properties

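   Univariate:
      \[ \mathrm{MSE}(\hat{f}(x)) = O\left( h^4 + \frac{1}{nh} \right) \]
   Multivariate:
      \[ \mathrm{MSE}(\hat{f}(x)) = O\left( \left( \sum_{s=1}^{q} h_s^2 \right)^2 + (n h_1 \cdots h_q)^{-1} \right) \]

   Same trade-off between minimizing bias and variance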
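
To see what the trade-off implies for the rate of convergence, consider the simplifying case (an assumption made here for illustration) in which all bandwidths are of the same order, $h_s \propto h$. The multivariate bound, with squared bias of order $(\sum_{s=1}^{q} h_s^2)^2$ and variance of order $(n h_1 \cdots h_q)^{-1}$, then becomes:

```latex
\begin{aligned}
\mathrm{MSE}(\hat{f}(x)) &= O\left( h^4 + \frac{1}{n h^q} \right), \\
h^4 \asymp \frac{1}{n h^q} &\;\Longrightarrow\; h \propto n^{-1/(4+q)}, \\
\mathrm{MSE}(\hat{f}(x)) &= O\left( n^{-4/(4+q)} \right).
\end{aligned}
```

For $q = 1$ this recovers the univariate rate $n^{-4/5}$; the rate slows as $q$ grows, so more observations are needed for the same accuracy in higher dimensions.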







Real Example




  DiNardo and Tobias (2001) - growth in female wage inequality

  Parametric methods missed sharp lower bound from minimum
  wage in 1979








Dr. Olofsson and Trinity Mathematics
Härdle, W. and Linton, O. (1994). Applied Nonparametric
Methods. Handbook of Econometrics. 2297-2339.
Jones, M., Marron, J., Sheather, S. (1996). A Brief Survey
of Bandwidth Selection for Density Estimation. Journal of
the American Statistical Association. Vol 91. No. 433.
401-407.
Li, Q., Racine, J. (2007). Nonparametric Econometrics.



