Atmospheric Power Law Behavior
James Duncan

Philip Sura
The Study of Extreme Events

•   Extreme climatic events are weather phenomena that
    occupy the tails of a dataset’s probability density
    function (PDF).

•   While it is understood that the PDFs of atmospheric
    phenomena are non-Gaussian, the exact shape of their
    tails is not fully understood.

•   Recent observational studies have shown that many
    atmospheric variables follow a power law distribution
    in the tails of their PDFs.
What is a Power Law Distribution?
Mathematically, a power law probability
distribution of quantity x may be written
as:

           $p(x) = C x^{-\alpha}, \quad x \ge x_{\min}$

Where α is the exponent or scaling
parameter and C is the normalization
constant.

Stochastic theory asserts that power law
distributions should exist in the tails of
distributions.
                                             [Newmans et al. 1986]
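
As a minimal sketch of the definition above, the density and its cumulative form can be written out for the continuous case. The closed form used for the normalization constant, C = (α − 1) xmin^(α − 1), is the standard result for a continuous power law with α > 1 (see Clauset et al. 2009); the function names and the numerical values are illustrative assumptions, not taken from the slides.

```python
def power_law_pdf(x, alpha, x_min):
    """Density of a continuous power law on x >= x_min (requires alpha > 1)."""
    if x < x_min:
        return 0.0
    c = (alpha - 1.0) * x_min ** (alpha - 1.0)  # normalization constant C
    return c * x ** (-alpha)


def power_law_cdf(x, alpha, x_min):
    """P(X <= x) for the same distribution."""
    if x < x_min:
        return 0.0
    return 1.0 - (x / x_min) ** (1.0 - alpha)


# Illustrative values only (hypothetical, not from the slides).
alpha, x_min = 3.0, 2.0
print(power_law_pdf(2.0, alpha, x_min))  # density at the lower bound
print(power_law_cdf(4.0, alpha, x_min))  # probability mass between x_min and 4
```

The CDF is the quantity needed later for the KS statistic, which compares it with the empirical CDF of the observed tail.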
Construction of the Power Law
Algorithm

•   Calculate a lower bound xmin and some scaling
    parameter α of our power law distribution.

•   Calculate the goodness of fit between the empirical
    data and the power law. Make a preliminary conclusion
    based upon the resulting p-value.

•   Perform a likelihood ratio test comparing competing
    hypotheses (alternative distribution fits).
Estimating Lower Bound on
    Power Law Behavior
•   Empirical data that follow a power-law
    distribution do so only above some
    lower bound xmin.

•   To find the lower bound xmin, we
    employ the Kolmogorov-Smirnov
    (KS) statistic, which is the
    maximum difference between
    the CDF of the observed data, F(x),
    and the CDF of the estimated
    power law distribution, P(x):

    $D = \max_{x \ge x_{\min}} | F(x) - P(x) |$

                                         [Press et al. 1986]
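
The lower-bound search can be sketched as a scan over candidate values of xmin, keeping the candidate with the smallest KS distance D. This is an illustrative sketch in the spirit of Clauset et al. (2009), not the authors' code: the function names, the `min_tail` cutoff, and the restriction to positive values (negative anomalies would be handled by passing their absolute values) are all assumptions.

```python
import math


def fit_alpha(tail, x_min):
    """Maximum likelihood exponent for a continuous power law on x >= x_min."""
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def ks_distance(tail, alpha, x_min):
    """Largest gap between the empirical CDF of the tail and the power-law CDF."""
    tail = sorted(tail)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(model - i / n), abs((i + 1) / n - model))
    return d


def choose_x_min(data, min_tail=10):
    """Scan candidate lower bounds; return (D, x_min, alpha) with the smallest D.

    min_tail is a hypothetical safeguard: candidates leaving fewer points
    than this in the tail are not considered.
    """
    values = sorted(v for v in data if v > 0)
    best = None
    for x_min in sorted(set(values)):
        tail = [x for x in values if x >= x_min]
        if len(tail) < min_tail:
            break
        if tail[-1] == x_min:  # degenerate tail, the log sum would be zero
            continue
        alpha = fit_alpha(tail, x_min)
        d = ks_distance(tail, alpha, x_min)
        if best is None or d < best[0]:
            best = (d, x_min, alpha)
    return best
```

Scanning every distinct data value is quadratic in the sample size; for long daily records a coarser grid of candidates would be a reasonable design choice.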
Estimating the Scaling Parameter

•   An accurate estimate of α is dependent upon an
    accurate estimate of our lower bound, xmin.

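•   To estimate α, we employ the “method of maximum likelihood”,
    given by:

    $\hat{\alpha} = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1}$

    where the x_i are the n observed values with x_i ≥ xmin.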
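
A short worked sketch of the estimator above. The standard error (α̂ − 1)/√n is the leading-order expression given by Clauset et al. (2009); the function name and the toy data are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math


def estimate_alpha(data, x_min):
    """Return (alpha_hat, standard_error, n) for the tail x >= x_min."""
    tail = [x for x in data if x >= x_min]
    n = len(tail)
    log_sum = sum(math.log(x / x_min) for x in tail)
    if n == 0 or log_sum <= 0.0:
        raise ValueError("need at least one tail value strictly above x_min")
    alpha_hat = 1.0 + n / log_sum
    std_err = (alpha_hat - 1.0) / math.sqrt(n)  # leading-order approximation
    return alpha_hat, std_err, n


# Toy data (hypothetical): the value 0.5 lies below x_min and is ignored.
# The remaining tail is 1, 2, 4, 8, so the log sum is 6 ln 2 and
# alpha_hat = 1 + 4 / (6 ln 2).
alpha_hat, std_err, n = estimate_alpha([0.5, 1.0, 2.0, 4.0, 8.0], x_min=1.0)
print(alpha_hat, std_err, n)
```

Because only values at or above xmin enter the sum, any error in xmin changes both n and the log sum, which is why the estimate of α depends so strongly on the lower bound.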
Significance Testing
•   Employ a goodness-of-fit test that compares the KS
    distance of our power law distribution with the KS
    distances of many synthetically generated power law
    data sets.

•   From the goodness-of-fit test, we derive a “p-value”,
    which quantifies how plausible the estimated power law
    distribution is as a fit to the observed data.
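
The test can be sketched as a Monte Carlo loop. Two assumptions are made loudly here: the sketch follows the convention of Clauset et al. (2009), where the p-value is the fraction of synthetic data sets whose KS distance is at least as large as the observed one (so larger values favor the power law), and it is simplified to the tail only, with xmin held fixed. Neither choice is confirmed as the authors' exact procedure, and all function names are hypothetical.

```python
import math
import random


def fit_alpha(tail, x_min):
    """Maximum likelihood exponent for a continuous power law on x >= x_min."""
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


def ks_distance(tail, alpha, x_min):
    """Largest gap between the empirical CDF of the tail and the power-law CDF."""
    tail = sorted(tail)
    n = len(tail)
    d = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / x_min) ** (1.0 - alpha)
        d = max(d, abs(model - i / n), abs((i + 1) / n - model))
    return d


def sample_power_law(n, alpha, x_min, rng):
    """Inverse-transform sampling: map uniform draws through the inverse CDF."""
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def goodness_of_fit_p_value(tail, x_min, n_synthetic=200, seed=0):
    """Fraction of synthetic data sets with D_syn >= D_obs (assumed convention)."""
    rng = random.Random(seed)  # CPython's generator is a Mersenne Twister
    alpha = fit_alpha(tail, x_min)
    d_obs = ks_distance(tail, alpha, x_min)
    exceed = 0
    for _ in range(n_synthetic):
        synth = sample_power_law(len(tail), alpha, x_min, rng)
        # Refit each synthetic set so it is treated like the observed data.
        d_syn = ks_distance(synth, fit_alpha(synth, x_min), x_min)
        if d_syn >= d_obs:
            exceed += 1
    return exceed / n_synthetic
```

Refitting α on every synthetic set matters: comparing against the true generating exponent would make the synthetic distances systematically larger and inflate the p-value.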
About the Data

•   Daily weather observations from the southeastern United
    States (AL, FL, GA, NC, SC) spanning 1948-2009.

•   Data includes minimum and maximum temperatures,
    and daily precipitation amounts.

•   Mean annual cycle has been removed from the data.
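
Removing the mean annual cycle can be sketched as subtracting, from each daily value, the multi-year mean for that calendar day. The sketch below also divides by a single overall standard deviation to obtain normalized anomalies; whether the authors normalized with one overall value or a per-day value is not stated, so that step is an assumption, as are the function and argument names.

```python
from collections import defaultdict
import statistics


def normalized_anomalies(days_of_year, values):
    """Subtract the mean annual cycle, then scale to unit standard deviation.

    days_of_year[i] is the calendar day (1..366) of observation values[i].
    """
    by_day = defaultdict(list)
    for day, value in zip(days_of_year, values):
        by_day[day].append(value)
    # Mean annual cycle: the average over all years for each calendar day.
    climatology = {day: statistics.fmean(vs) for day, vs in by_day.items()}
    anomalies = [v - climatology[d] for d, v in zip(days_of_year, values)]
    sigma = statistics.pstdev(anomalies)
    if sigma == 0.0:
        raise ValueError("anomalies have zero variance")
    return [a / sigma for a in anomalies]


# Two "years" of two calendar days each (hypothetical numbers).
print(normalized_anomalies([1, 2, 1, 2], [10.0, 20.0, 12.0, 26.0]))
```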
Motivation for Power Law

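    Skewness                          Kurtosis

    $\gamma = \frac{\mu_3}{\sigma^3}$          $\kappa = \frac{\mu_4}{\sigma^4} - 3$

    where µ_n is the nth central moment and σ is the standard deviation.

                                         [Press et al. 1986]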
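
A minimal sketch of the two moment ratios, computed directly from central moments with population (1/n) normalization. The function name is hypothetical; the "minus 3" makes the kurtosis of a Gaussian equal to zero.

```python
def skewness_and_excess_kurtosis(values):
    """Return (gamma, kappa) = (mu3 / sigma^3, mu4 / sigma^4 - 3)."""
    n = len(values)
    mean = sum(values) / n
    mu2 = sum((v - mean) ** 2 for v in values) / n  # variance
    mu3 = sum((v - mean) ** 3 for v in values) / n
    mu4 = sum((v - mean) ** 4 for v in values) / n
    if mu2 == 0.0:
        raise ValueError("values have zero variance")
    sigma = mu2 ** 0.5
    return mu3 / sigma ** 3, mu4 / sigma ** 4 - 3.0


# A symmetric two-valued sample: no skew, and much thinner tails than a Gaussian.
print(skewness_and_excess_kurtosis([-1.0, 1.0, -1.0, 1.0]))
```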
Skewness/Kurtosis




        [Figure: plots of skewness and kurtosis. Courtesy of Dr. Stefanova]
Ft Lauderdale




Negative Kurtosis, Negative Skew

            α       xmin    ppower   pgauss
Positive    23.53   2.44    .118     0.00
Negative    7.64    2.98    .22      .037

Positive Kurtosis, Negative Skew

            α       xmin    ppower   pgauss
Positive    10.98   2.62    .498     .602
Negative    5.76    3.16    .364     .007
Pensacola




Negative Kurtosis, Positive Skew

            α       xmin    ppower   pgauss
Positive    14.7    2.44    .0204    0.00
Negative    7.68    3.16    .928     .076

Negative Kurtosis, Negative Skew

            α       xmin    ppower   pgauss
Positive    14.7    2.44    .0204    0.00
Negative    7.68    3.16    .928     .076
Future Work

•   Further examine power law distributions in the physical
    world.

•   Analyze these distributions during years of distinction:
    •   El Niño and La Niña years
    •   Seasonal trends
    •   Historically active or tranquil hurricane seasons
    •   Years of intense drought or flooding events
Questions?
References:
Clauset, A., C. R. Shalizi, and M. E. J. Newman, Power-law distributions in empirical
data, SIAM Review, 51, 661-703, 2009.
Newman, M. E. J., Power laws, Pareto distributions and Zipf’s law, Contemporary
Physics, 46(5), 323-351, 2005.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes: The
Art of Scientific Computing, 1st ed., 818 pp., Cambridge University Press, 1986.

COAPS Short Seminar Series


Editor's Notes

  1. Where the magnitude of the event is large, but the probability of the occurrence is small
  2. Transition to Power Law (not arbitrary to look at a power law).
     Mathematically:
     - A quantity x obeys a power law if the frequency of an event varies as a power of some attribute of that event.
     - More often the power law applies only for values greater than some minimum xmin; in such cases we say that the tail of the distribution follows a power law.
     - The distribution must deviate from the power-law form below some minimum value xmin.
     (Describe physical distribution of power law.)
     - The ubiquity of power-law behavior in the natural world has led many scientists to wonder whether there is a single, simple, underlying mechanism linking all these different systems together.
     - It has been shown from observations that many atmospheric variables follow a power law distribution in the tails.
     - Power-law distributions occur in an extraordinarily diverse range of phenomena: word frequency, web hits, copies of books sold, wealth of the richest Americans, intensity of solar flares, populations of cities.
     Properties of Power
     Important with respect to climate change: if we get a small shift in the mean of a dataset, then the extreme values become more important.
  3. Must first obtain a PDF of the data itself:
     - Create a time series of normalized anomalies that are ready to be placed in histograms.
     - Place the normalized anomalies into “bins” ranging from -10 to 10 standard deviations.
     - Normalize the histogram, creating a PDF.
     In order to utilize the K-S statistic, the CDFs of both the observed data and the estimated power law distribution must be calculated.
     We would like to generate a sequence of independent random variables that are uniformly distributed throughout the domain. We utilize the Mersenne Twister pseudorandom number generator. We want a dataset that attempts to follow the same power law fit as estimated by the power law algorithm.
     Because the observed atmospheric and oceanic data values exhibit an autocorrelation bias, we must approximate a separate “de-correlation timescale”, the length of time (time lag) that it takes a sequence of data to become uncorrelated. Dividing the length of our observed time series by the value of the de-correlation timescale.
  4. When attempting to fit a probability distribution to empirical data, it is nearly impossible to find only one distribution that describes the behavior of the data. One typically cannot say with absolute certainty that an empirical data set is described by a specific probability distribution. Rather, it can only be stated that the observed data are in agreement with the proposed PDF. (Test various values of xmin; choose the one with the smallest K-S statistic.)
     Our method attempts to minimize the difference between the distribution of the observed data and the best estimate of the power law distribution assigned to the data by using the Kolmogorov-Smirnov statistic (K-S statistic):
     - D is the maximum distance between the cumulative distribution function of the observed data, F(x), and the cumulative distribution function of the estimated power law distribution, P(x), in the domain x ≥ xmin.
     - By testing different values of xmin and calculating the respective K-S distance, one obtains many different values of D that serve as a comparison between the CDF of the estimated power law distribution and the CDF of the observed data.
     - The value of xmin where the smallest value of D was obtained becomes the permanent lower bound of the estimated power law fit.
     Approximation of the lower bound xmin:
     - Note: there must be some lower bound to the power-law behavior, the point at which the power law distribution appears.
     - It allows one to consolidate the domain of x where the power law is located.
     - If we choose too low a value for xmin, we will get a biased estimate of the scaling parameter, since we will be attempting to fit a power-law model to non-power-law data.
     - If we choose too high a value for xmin, we are effectively throwing away legitimate data points x < xmin.
     - It is better to err a little on the high side, but estimates that are too low could have severe consequences.
     - Estimating a value of xmin is crucial for determining the power law exponent, as the slope of the power law distribution is determined by which data points are within the domain of the power law distribution.
  5. Once we have an estimate of the lower bound of the power law distribution, the value of xmin may be used in estimating the scaling parameter of the power law distribution. Talk about the straight line on a log-log plot; note that alpha is the slope. To obtain this parameter, we utilize the “method of maximum likelihood” (MLE):
     - It obtains a value of alpha by summing over each empirical data point xi (the xi are observed values) that is greater than or equal to the previously estimated value of xmin.
     - MLEs will give us no warning that our fits are wrong: they tell us only the best fit to the power-law form, not whether the power law is in fact a good model for the data.
  6. The most we can say is that our observations are consistent with the hypothesis that x is drawn from a distribution. In some cases we may also be able to rule out some other competing hypothesis.
     To quantitatively measure the significance of our estimated power law distribution, we employ a test that calculates the K-S distance between the power law distribution and many idealized, synthetically produced data sets. One is not enough: it is plausible that by chance the synthetic dataset will have a more precise fit to the empirical data than that of a power law distribution with small variations or sampling errors.
     - In other words, in instances where Dsyn < D, the estimated power law distribution is not able to represent the data more closely than random chance.
     - Compare the K-S distances of a large number of synthetic datasets.
     - As the number of datasets increases, the fraction with Dsyn < D converges toward an expected value. To obtain an estimate of the expected value, we take the number of datasets where Dsyn < D and divide it by the total number of synthetic datasets. The result is a “p-value”, which expresses the probability that the estimated power law distribution is a good fit to the observed data.
     - Use the threshold of .10; thus less than 10% of the time our synthetic data set was a better fit to the distribution.
     The calculation of p-values for multiple distributions is a way to test or compare different probability distribution fits to empirical data. pgauss is a quantitative measure of how appropriate the Gaussian fit is to the data.
  7. Discuss what type of data we are using:
     - Direct weather station observations from 1948 through 2009.
     - Have maximum and minimum temperatures, and daily rainfall.
     - Note the adjustments we made to the data.
     - Calculated the mean annual cycle (daily cycle from the years); from this, determined the daily anomaly.
     - Mean annual cycle: the part of a measured quantity’s fluctuation that is attributed to Earth’s changing position in orbit over the course of the year.
     - The data we are left with: what does that describe?
  8. Plots of Skewness and Kurtosis (not trivial).
     - Skewness (third moment) is a measure of the asymmetry of the probability distribution of a real-valued random variable (right vs. left skew).
     - Kurtosis (fourth moment) is any measure of the “peakedness” of the probability distribution of a real-valued random variable.
     - Kurtosis is a descriptor of the shape of a probability distribution.
     - The “minus 3” in the formula serves as a correction to make the kurtosis of the normal distribution equal to 0.
     - A higher-kurtosis distribution has a sharper peak and longer, fatter tails, while a low-kurtosis distribution has a more rounded peak and shorter, thinner tails.
     - Positive kurtosis corresponds to a “taller” peak of the PDF around the mean as well as a larger amount of data in the tails of the PDF.
     - Negative kurtosis is seen in PDFs that have less data in the tails and a “broader” cluster of the probability distribution located about the mean.
     - Kurtosis gauges the level of fluctuation within a distribution.
     The nth central moment is the expectation of the difference between the random variable X and its mean, raised to the nth power.
     - Central moments are taken about the mean.
     - Mean: first moment. Variance: second-order moment. Skewness: third-order moment. Kurtosis: fourth-order moment.
     - Moments are a quantitative measure of the shape of a set of points.
     All of this points to one thing: we are interested in the tails of our distributions, the location of extreme events. Launching point for the power law.
  9. Are we interested in positive or negative regions of kurtosis, with positive kurtosis expected to give greater tails? Kurtosis measures the "fatness" of the tails of a distribution. Positive excess kurtosis means that the distribution has fatter tails than a normal distribution. Fat tails mean there is a higher than normal probability of large positive and negative values. Negative numbers indicate a platykurtic distribution; positive numbers indicate a leptokurtic distribution. When compared to a normal distribution, a platykurtic data set has a flatter peak around its mean, which causes thin tails within the distribution. The flatness results from the data being less concentrated around its mean, due to large variations within observations. Leptokurtic distributions have higher peaks around the mean compared to normal distributions, which leads to thick tails on both sides. These peaks result from the data being highly concentrated around the mean, due to lower variations within observations.
  10. Tmax: positive kurtosis, negative skew. Tmin: negative kurtosis, negative skew.
  11. Negative kurtosis for both. Min has a small positive skew; Max has a negative skew.
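
The de-correlation timescale mentioned in note 3 can be sketched in several ways; the notes do not say which definition was used. The version below assumes one common choice, the first lag at which the sample autocorrelation falls below 1/e, and assumes that dividing the series length by that lag is meant to give an effective number of independent samples for the synthetic data sets. All names and the `max_lag` default are hypothetical.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def decorrelation_timescale(series, max_lag=365):
    """First lag where the autocorrelation drops below 1/e (assumed definition)."""
    threshold = 1.0 / math.e
    for lag in range(1, min(max_lag, len(series) - 1) + 1):
        if autocorrelation(series, lag) < threshold:
            return lag
    raise ValueError("series does not decorrelate within max_lag")


def effective_sample_size(series, max_lag=365):
    """Series length divided by the de-correlation timescale."""
    return len(series) // decorrelation_timescale(series, max_lag)
```

For daily temperature anomalies the lag is measured in days, so a timescale of a few days would reduce roughly six decades of daily data to a correspondingly smaller number of effectively independent values.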