Estimation of the Box-Cox Transformation Parameter and Application to Hydrologic Data
Melanie Wong, Friday, August 29, 2008

Culminating PowerPoint presentation for my summer research with Dr. Richard McCuen.


  1. Estimation of the Box-Cox Transformation Parameter and Application to Hydrologic Data
     Melanie Wong, Friday, August 29, 2008
  2. Introduction
     • Many statistical tests assume normality
     • Most hydrologic data are highly skewed
  3. Detecting Normality
     • β1 (third moment) = skewness
     • β2 (fourth moment) = kurtosis
     • Characteristics of a normal distribution:
       • β1 = 0
       • β2 = 3
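As a minimal sketch of how the two moment ratios on slide 3 could be computed from a sample: the function name `moment_ratios` is hypothetical, and the code assumes the plain (biased) central-moment estimators, which the slides do not specify.

```python
def moment_ratios(data):
    """Return (beta1, beta2): sample skewness and kurtosis.

    Assumption: plain central-moment estimators (divide by n),
    with no small-sample bias correction.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    beta1 = m3 / m2 ** 1.5  # 0 for a normal distribution
    beta2 = m4 / m2 ** 2    # 3 for a normal distribution
    return beta1, beta2


if __name__ == "__main__":
    # A small right-skewed sample gives a positive beta1.
    print(moment_ratios([1.0, 1.0, 2.0, 3.0, 10.0]))
```

Values of β1 far from 0 or β2 far from 3 suggest that the sample departs from normality.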
  4. Moment Ratio Diagram
  5. Transformation to Normality
     • Hydrologic data are not normal
     • Various transformations available
       • Logarithmic
       • Box-Cox
  6. The Box-Cox Transformation
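For reference, the standard one-parameter form of the Box-Cox transformation for positive data is given below. It is stated from the general literature rather than recovered from the slide; the symbols x (original value), y (transformed value), and λ follow the usage on the later slides.

```latex
y =
\begin{cases}
  \dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
  \ln x, & \lambda = 0,
\end{cases}
\qquad x > 0 .
```

With λ = 1 this reduces to y = x − 1, a pure shift, and the λ = 0 case is the limit of the first branch as λ → 0.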
  7. Transformation Procedure
     • Decide whether the sample data are normal
     • Obtain value of λ
     • Transform the data
     • Apply confidence intervals, statistical tests, or tolerance limits
     • Perform inverse transformation
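The "transform" and "inverse transform" steps of this procedure can be sketched as follows, assuming the standard one-parameter Box-Cox form for positive data. The function names `boxcox` and `inv_boxcox` are illustrative and not taken from the presentation.

```python
import math


def boxcox(x, lam):
    """Box-Cox transform of a single positive value x."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inv_boxcox(y, lam):
    """Inverse Box-Cox transform, mapping y back to the original scale."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    if base <= 0:
        raise ValueError("y is outside the range of the transform")
    return base ** (1.0 / lam)


if __name__ == "__main__":
    y = boxcox(4.0, 0.5)       # (sqrt(4) - 1) / 0.5
    x = inv_boxcox(y, 0.5)     # back on the original scale
    print(y, x)
```

Any interval or limit computed on the transformed values y is mapped back to the original units with `inv_boxcox` using the same λ.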
  8. Determining the Optimal λ
     • “Snap-to-the-grid” method
       • Box and Cox (1964): “... fix one, or possibly a small number, of λ's and go ahead with the detailed estimation...”
     • Distribution-based method
       • λ = -1.0: reciprocal transform
       • λ = -0.5: reciprocal square root transform
       • λ = 0: natural log transform
       • λ = +0.5: square root transform
       • λ = 1.0: no transformation needed
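One possible reading of the snap-to-the-grid idea is to round an estimated λ to the nearest of the five named values listed on this slide. The sketch below assumes that reading; the names `NAMED_TRANSFORMS` and `snap_to_grid` are hypothetical.

```python
# The five named lambda values from slide 8.
NAMED_TRANSFORMS = {
    -1.0: "reciprocal",
    -0.5: "reciprocal square root",
    0.0: "natural log",
    0.5: "square root",
    1.0: "no transformation",
}


def snap_to_grid(lam_hat):
    """Return the nearest named lambda and the transform it corresponds to."""
    nearest = min(NAMED_TRANSFORMS, key=lambda g: abs(g - lam_hat))
    return nearest, NAMED_TRANSFORMS[nearest]


if __name__ == "__main__":
    print(snap_to_grid(0.55))  # the lambda reported for the rainfall example
    print(snap_to_grid(0.04))  # the lambda reported for the pipe cost example
```

Under this reading, the rainfall λ of 0.55 would snap to the square root transform and the pipe cost λ of 0.04 to the natural log.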
  9. Research Goal and Objectives
     • Goal:
       • To develop a better understanding of the Box-Cox transformation so that it can be applied with greater confidence
     • Objectives:
       • To characterize the sampling variation
       • To provide a method for estimating the Box-Cox transformation parameter for any set of data
  10. Sampling Variation
     • 10,000 simulations
  11. Second Objective
     • To provide a method for estimating the Box-Cox transformation parameter for any set of data
  12. The Importance of λ
     • Small changes in λ → large changes in sampling variation
     • Need a more precise method to obtain optimum λ
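To illustrate what a finer-than-grid estimate of λ can look like, the sketch below searches a fine grid for the λ whose transformed sample has skewness closest to zero. This is a stand-in technique chosen for illustration, not the method developed in the presentation; the function names, search range, and step size are all assumptions.

```python
import math


def boxcox(x, lam):
    """Standard one-parameter Box-Cox transform for x > 0."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def skewness(data):
    """Sample skewness from plain (divide-by-n) central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5


def min_skew_lambda(data, lo=-2.0, hi=2.0, step=0.01):
    """Grid search for the lambda that minimizes |skewness| after transforming."""
    n_steps = int(round((hi - lo) / step))
    best_lam, best_abs = None, math.inf
    for i in range(n_steps + 1):
        lam = round(lo + i * step, 10)  # avoid drift such as 0.30000000000000004
        skew = skewness([boxcox(x, lam) for x in data])
        if abs(skew) < best_abs:
            best_lam, best_abs = lam, abs(skew)
    return best_lam


if __name__ == "__main__":
    # Data whose logarithms are symmetric, so the search should land near 0.
    sample = [math.exp(v) for v in (-2, -1, 0, 1, 2)]
    print(min_skew_lambda(sample))
```

A criterion based on skewness alone ignores kurtosis, so it should be read only as one simple way of obtaining λ to two decimal places.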
  13. Optimizing λ
     • Procedure: Monte Carlo simulation to identify sampling distributions of β1 and β2 values for different λ values
     • Distribution types: Gamma, exponential, uniform
     • Population sizes: 1000, 500, 200, 100, 50, 10
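A minimal sketch of such a Monte Carlo experiment is shown below, for the gamma case only. The gamma shape and scale, the number of simulations, the seed, and the function names are assumptions made for illustration; the presentation's actual settings are not given beyond the distribution types and sizes listed above.

```python
import math
import random
import statistics


def boxcox(x, lam):
    """Standard one-parameter Box-Cox transform for x > 0."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def moment_ratios(data):
    """Return (beta1, beta2) from plain central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    m4 = sum((v - mean) ** 4 for v in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def simulate(lam, n, n_sims=1000, shape=2.0, scale=1.0, seed=0):
    """Sampling distribution of beta1 and beta2 after transforming gamma samples.

    Returns (mean beta1, sd beta1, mean beta2, sd beta2) over n_sims samples
    of size n. Shape, scale, and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    b1s, b2s = [], []
    for _ in range(n_sims):
        sample = [boxcox(rng.gammavariate(shape, scale), lam) for _ in range(n)]
        b1, b2 = moment_ratios(sample)
        b1s.append(b1)
        b2s.append(b2)
    return (statistics.mean(b1s), statistics.stdev(b1s),
            statistics.mean(b2s), statistics.stdev(b2s))


if __name__ == "__main__":
    for lam in (0.0, 0.25, 0.5, 1.0):
        print(lam, simulate(lam, n=50))
```

Repeating the loop over the other distribution types and the listed sizes would give one table of β1 and β2 sampling distributions per λ.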
  14. Simulation Results
     • β1 values ↔
     • β2 values ↕
  15. Variation of λ
  16. Confidence Interval Procedure: Logarithmic
     1) Make logarithmic transformation of x to y
     2) Use normal theory for confidence intervals on y
     3) Inverse transform the confidence intervals to values of x
  17. Confidence Interval Procedure: Box-Cox
     1) Make BCT of x to y using optimum λ
     2) Use normal theory for confidence intervals on y
     3) Inverse transform the confidence intervals to values of x
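The three steps above can be sketched as follows. The slides do not say which normal-theory interval is used, so this example assumes a two-sided interval on the mean of y built from the standard normal quantile; the function names are hypothetical. Setting λ = 0 gives the logarithmic procedure of slide 16.

```python
import math
from statistics import NormalDist, mean, stdev


def boxcox(x, lam):
    """Standard one-parameter Box-Cox transform for x > 0."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def inv_boxcox(y, lam):
    """Inverse of boxcox for the same lambda."""
    if lam == 0:
        return math.exp(y)
    base = lam * y + 1.0
    if base <= 0:
        raise ValueError("y is outside the range of the transform")
    return base ** (1.0 / lam)


def boxcox_interval(data, lam, level=0.90):
    """Back-transformed normal-theory interval (assumed: interval on the mean of y)."""
    y = [boxcox(x, lam) for x in data]                 # step 1: transform x to y
    z = NormalDist().inv_cdf(0.5 + level / 2.0)        # two-sided normal quantile
    half_width = z * stdev(y) / math.sqrt(len(y))      # step 2: interval on y
    centre = mean(y)
    return (inv_boxcox(centre - half_width, lam),      # step 3: back to x units
            inv_boxcox(centre + half_width, lam))


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # made-up values, not the slide data
    print(boxcox_interval(data, lam=0.5))
    print(boxcox_interval(data, lam=0))     # logarithmic procedure
```

Because the inverse transform is nonlinear, the back-transformed interval is generally not symmetric about the sample mean of x.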
  18. Example 1: Monthly Rainfall Measurements
     • 36 monthly rainfall measurements (mm/month) from Lawrenceville, GA
     • β1: 0.56, β2: 2.91
  19. Example 1: After Logarithmic Transformation
     • β1: -1.12, β2: 4.40
  20. Example 1: After Box-Cox Transformation
     • λ: 0.55
     • β1: -0.09, β2: 2.82
  21. Histogram of Rainfall Data
  22. 90% Confidence Intervals on Rainfall
     • Box-Cox transformed:
     • Log transformed:
  23. Example 2: Drainage Pipe Costs
     • The costs of 70 drainage systems
     • β1: 4.40, β2: 27.36
  24. Example 2: After Logarithmic Transformation
     • β1: -0.167, β2: 4.04
  25. Example 2: After Box-Cox Transformation
     • λ: 0.04
     • β1: 0.00379, β2: 3.96
  26. Histogram of Pipe Cost Data
  27. 90% Confidence Intervals on Pipe Cost
     • Box-Cox transformed:
     • Log transformed:
  28. Conclusions
     • Box-Cox transform is more suitable than logarithmic transform for:
       • Normalizing data
       • Determining confidence intervals, tolerance limits, outliers, and other tests
     • Sampling distributions of λ were determined
     • Optimum values of λ vs. β1 and β2 were developed
  29. QUESTIONS?
