2. The Study of Extreme Events
• Extreme climatic events are weather phenomena that
occupy the tails of a dataset’s probability density
function (PDF).
• While it is understood that the PDFs of atmospheric
phenomena are non-Gaussian, the exact
shape/distribution of these tails is not fully understood.
• Recent observational studies have shown that
many atmospheric variables follow a power law
in the tails of their distributions.
3. What is a Power Law Distribution?
Mathematically, a power law probability
distribution of a quantity x may be written as:

p(x) = C x^(-α),  x ≥ xmin

where α is the exponent or scaling
parameter and C is the normalization
constant.
Stochastic theory asserts that power law
distributions should exist in the tails of
distributions.
[Newman 2005]
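As a concrete illustration, here is a minimal Python sketch of this density, assuming the continuous form with normalization constant C = (α − 1) xmin^(α − 1), the standard result for a continuous power law; the function name and parameter values are illustrative:

```python
import numpy as np

def power_law_pdf(x, alpha, xmin):
    """Continuous power-law PDF p(x) = C * x**(-alpha) for x >= xmin,
    with normalization constant C = (alpha - 1) * xmin**(alpha - 1)."""
    x = np.asarray(x, dtype=float)
    C = (alpha - 1.0) * xmin ** (alpha - 1.0)
    return np.where(x >= xmin, C * x ** (-alpha), 0.0)

# The density should integrate to ~1 over [xmin, infinity); check numerically
# with a trapezoidal sum on a fine grid (alpha = 2.5, xmin = 1 are examples).
xs = np.linspace(1.0, 1.0e4, 500_000)
p = power_law_pdf(xs, alpha=2.5, xmin=1.0)
area = np.sum(0.5 * (p[:-1] + p[1:]) * np.diff(xs))
```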
4. Construction of the Power Law
Algorithm
• Calculate a lower bound xmin and some scaling
parameter α of our power law distribution.
• Calculate the goodness of fit between the empirical
data and the power law, and make a preliminary
conclusion based upon the resulting p-value.
• Perform a likelihood ratio test comparing competing
hypotheses/distribution fits.
5. Estimating Lower Bound on
Power Law Behavior
• For the case of empirical data, if
the data is to follow a power-
law distribution, it does so only
above some lower bound xmin.
• To find our lower bound xmin, we
employ the Kolmogorov-Smirnov
(KS) statistic, which is the
maximum difference between
the CDF of the observed data
and the CDF of the estimated
power law distribution.
[Press et al. 1986]
D = max |F(x) - P(x)|,  x ≥ xmin
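The search described above can be sketched in Python: scan candidate values of xmin, re-fit α by maximum likelihood for each candidate tail, and keep the xmin that minimizes the KS distance D. This is a minimal illustration; the function names, candidate list, and minimum tail size are assumptions, not the authors' actual code:

```python
import numpy as np

def ks_distance(data, alpha, xmin):
    """KS distance D: maximum difference between the empirical CDF F(x)
    of the tail (x >= xmin) and the fitted power-law CDF
    P(x) = 1 - (x / xmin)**-(alpha - 1)."""
    tail = np.sort(data[data >= xmin])
    emp_cdf = np.arange(1, tail.size + 1) / tail.size
    fit_cdf = 1.0 - (tail / xmin) ** (-(alpha - 1.0))
    return np.max(np.abs(emp_cdf - fit_cdf))

def estimate_xmin(data):
    """Try each observed value as a candidate xmin (keeping at least ten
    points in the tail), fit alpha by maximum likelihood for that tail,
    and keep the candidate with the smallest KS distance."""
    best_D, best_xmin, best_alpha = np.inf, None, None
    for xmin in np.unique(data)[:-10]:
        tail = data[data >= xmin]
        alpha = 1.0 + tail.size / np.sum(np.log(tail / xmin))  # MLE fit
        D = ks_distance(data, alpha, xmin)
        if D < best_D:
            best_D, best_xmin, best_alpha = D, xmin, alpha
    return best_D, best_xmin, best_alpha
```

On data drawn from a true power law, the recovered α should sit close to the generating exponent.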
6. Estimating the Scaling Parameter
• An accurate estimate of α depends upon an
accurate estimate of our lower bound, xmin.
• To estimate α, we employ the “method of maximum
likelihood,” given by:
α = 1 + n [ Σ_{i=1}^{n} ln(x_i / xmin) ]^(-1)
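A minimal Python sketch of this estimator, checked against synthetic samples drawn from a known power law by inverse-transform sampling (the function name and the values α = 2.5, xmin = 1 are illustrative):

```python
import numpy as np

def mle_alpha(data, xmin):
    """Maximum-likelihood estimate of the scaling parameter:
    alpha = 1 + n * [ sum_{i=1..n} ln(x_i / xmin) ]**-1,
    summing over the n data points with x_i >= xmin."""
    tail = data[data >= xmin]
    return 1.0 + tail.size / np.sum(np.log(tail / xmin))

# Sanity check on synthetic samples drawn from a known power law by
# inverse-transform sampling: x = xmin * (1 - u)**(-1 / (alpha - 1)).
rng = np.random.default_rng(42)
u = rng.random(100_000)
samples = 1.0 * (1.0 - u) ** (-1.0 / (2.5 - 1.0))
```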
7. Significance Testing
• Employ a goodness-of-fit test that measures the
KS distance between our estimated power law
distribution and many synthetically derived power
law distributions.
• From the goodness-of-fit test, we derive a “p-value”
which expresses the probability that the estimated
power law distribution is a good fit to the observed data.
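A simplified Python sketch of this test. It is illustrative only: it keeps α and xmin fixed when scoring the synthetic datasets, whereas a full implementation in the style of Clauset et al. (2009) would re-fit both for each synthetic set. Here the p-value counts synthetic KS distances at least as large as the empirical one, so a small p-value flags a poor fit:

```python
import numpy as np

def sample_power_law(n, alpha, xmin, rng):
    """Draw n samples from p(x) = C * x**-alpha, x >= xmin,
    by inverse-transform sampling."""
    return xmin * (1.0 - rng.random(n)) ** (-1.0 / (alpha - 1.0))

def goodness_of_fit_p(data, alpha, xmin, n_synthetic=200, seed=0):
    """Fraction of synthetic power-law datasets whose KS distance from
    the fitted model is at least as large as the empirical distance D.
    A small p-value (below 0.10) suggests the power law is a poor fit."""
    # Mersenne Twister generator, as mentioned in the notes.
    rng = np.random.Generator(np.random.MT19937(seed))
    tail = np.sort(data[data >= xmin])
    n = tail.size
    grid = np.arange(1, n + 1) / n  # empirical CDF levels
    D = np.max(np.abs(grid - (1.0 - (tail / xmin) ** (-(alpha - 1.0)))))
    count = 0
    for _ in range(n_synthetic):
        syn = np.sort(sample_power_law(n, alpha, xmin, rng))
        D_syn = np.max(np.abs(grid - (1.0 - (syn / xmin) ** (-(alpha - 1.0)))))
        if D_syn >= D:
            count += 1
    return count / n_synthetic
```

Data that genuinely follows the fitted power law should yield moderate-to-large p-values, while clearly mismatched data drives the p-value toward zero.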
8. About the Data
• Daily weather observations from the southeastern United
States (AL, FL, GA, NC, SC) spanning 1948-2009.
• Data includes minimum and maximum temperatures,
and daily precipitation amounts.
• Mean annual cycle has been removed from the data.
9. Motivation for Power Law
Skewness: γ = μ₃ / σ³
Kurtosis: κ = μ₄ / σ⁴ − 3
[Press et al. 1986]
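These two moments can be computed directly from a data series; a minimal Python sketch (function names are illustrative, and the population standard deviation is assumed):

```python
import numpy as np

def skewness(x):
    """Third standardized moment: gamma = mu_3 / sigma**3."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d ** 3) / x.std() ** 3

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (kappa = mu_4 / sigma**4 - 3),
    so a Gaussian scores 0; positive values indicate heavier tails."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return np.mean(d ** 4) / x.std() ** 4 - 3.0
```

For Gaussian data both statistics should be near zero, which is exactly why nonzero values motivate looking at the tails.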
13. Future Work
• Further examine power law distributions in the physical
world.
• Analyze these distributions during notable years and periods:
• El Niño and La Niña years
• Seasonal trends
• Historically active or tranquil hurricane seasons
• Years of intense drought or flooding events
14. Questions?
References:
Clauset, A., C. R. Shalizi, and M. E. J. Newman, Power-law distributions in empirical
data, SIAM Review, 51, 661-703, 2009.
Newman, M. E. J., Power laws, Pareto distributions and Zipf’s law, Contemporary
Physics, 46(5), 323-351, 2005.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes: The
Art of Scientific Computing, 1st ed., 818 pp., Cambridge University Press, 1986.
Editor's Notes
Where the magnitude of the event is large, but the probability of the occurrence is small
Transition to Power Law (Not Arbitrary to Look at Power Law)
Mathematically, a quantity x obeys a power law:
-When the frequency of an event varies as a power of some attribute of that event.
-More often the power law applies only for values greater than some minimum xmin; in such cases we say that the tail of the distribution follows a power law.
-The distribution must deviate from the power-law form below some minimum value xmin.
(Describe Physical Distribution of Power Law)
-The ubiquity of power-law behavior in the natural world has led many scientists to wonder whether there is a single, simple, underlying mechanism linking all these different systems together.
-It has been shown from observations that many atmospheric variables follow a power law distribution in the tails.
-Power-law distributions occur in an extraordinarily diverse range of phenomena: word frequency, web hits, copies of books sold, wealth of richest Americans, intensity of solar flares, populations of cities.
Properties of Power Law: IMPORTANT WITH RESPECT TO CLIMATE CHANGE BECAUSE IF WE GET A SMALL SHIFT IN THE MEAN OF A DATASET, THEN THE EXTREME VALUES BECOME MORE IMPORTANT.
Must first obtain a PDF of the data itself.
-Create a timeseries of normalized anomalies that are ready to be placed in histograms.
-Place normalized anomalies into “bins” ranging from -10 to 10 standard deviations.
-The histogram is normalized, creating a PDF.
In order to utilize the K-S statistic, the CDFs of both the observed data and the estimated power law distribution must be calculated.
We would like to generate a sequence of independent random variables that are uniformly distributed throughout the domain. We utilize the Mersenne Twister pseudorandom number generator. We want a dataset that attempts to follow the same power law fit as estimated by the power law algorithm.
Because the observed atmospheric and oceanic data values exhibit an autocorrelation bias, we must approximate a separate “de-correlation timescale,” or length of time (time lag) that it takes a sequence of data to become un-correlated. Divide the length of our observed timeseries by the value of the de-correlation timescale.
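The histogram-to-PDF step described in these notes can be sketched in Python. The bin width and function name are assumptions; bins span −10 to +10 standard deviations as stated, and `np.histogram(density=True)` provides the normalization:

```python
import numpy as np

def anomalies_to_pdf(series, bin_width=0.1):
    """Standardize a daily-anomaly series and bin it between -10 and +10
    standard deviations; density=True normalizes the histogram so it
    integrates to 1, yielding an empirical PDF."""
    z = np.asarray(series, dtype=float)
    z = (z - z.mean()) / z.std()              # normalized anomalies
    edges = np.arange(-10.0, 10.0 + bin_width, bin_width)
    pdf, edges = np.histogram(z, bins=edges, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])  # bin midpoints for plotting
    return centers, pdf
```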
When attempting to fit a probability distribution to empirical data, it is nearly impossible to find only one distribution that describes the behavior of the data. One typically cannot say with absolute certainty that an empirical data set is described by a specific probability distribution; rather, it can only be stated that the observed data is in agreement with the proposed PDF. (Test various values of xmin, choose the one with the smallest K-S statistic.)
Our method attempts to minimize the difference between the distribution of the observed data and the best estimation of the power law distribution assigned to the data by using the Kolmogorov-Smirnov statistic (K-S statistic).
-D is the maximum distance between the cumulative distribution function of the observed data F(x) and the cumulative distribution function of the estimated power law distribution P(x), in the domain x ≥ xmin.
-By testing different values of xmin and calculating the respective K-S distance, one obtains many different values of D that serve as a comparison between the CDF of the estimated power law distribution and the CDF of the observed data.
-The value of xmin where the smallest value of D was obtained becomes the permanent lower bound of the estimated power law fit.
Approximation of Lower Bound xmin
-Note: there must be some lower bound to the power-law behavior, the point at which the power law distribution appears.
-Allows one to consolidate the domain of x where the power law is located.
-If we choose too low a value for xmin, we will get a biased estimate of the scaling parameter, since we will be attempting to fit a power-law model to non-power-law data.
-If we choose too high a value for xmin, we are effectively throwing away legitimate data points x < xmin.
-Better to err a little on the high side, but estimates that are too low could have severe consequences.
-estimating a value of xmin is crucial for determining the power law exponent, as the slope of the power law distribution is determined by which data points are within the domain of the power law distribution.
Once we have an estimation of the lower bound of the power law distribution, the value of xmin may be used in estimating the scaling parameter of the power law distribution. Talk about the straight line on a log-log plot; note α is the slope. To obtain this parameter, we utilize the method of maximum likelihood (MLE).
-Obtains a value of alpha by summing over each empirical data point xi (the observed values) that is greater than or equal to the previously estimated value of xmin.
-MLEs will give us no warning that our fits are wrong: they tell us only the best fit to the power-law form, not whether the power law is in fact a good model for the data.
The most we can say is that our observations are consistent with the hypothesis that x is drawn from a distribution. In some cases we may also be able to rule out some other competing hypothesis.
To quantitatively measure the significance of our estimated power law distribution, we employ a test that calculates the K-S distance between the power law distribution and many idealized, synthetically produced data sets. One is not enough: it is plausible that by chance the synthetic dataset will have a more precise fit to the empirical data than that of a power law distribution, owing to small variations or sampling errors.
-In other words, in instances where Dsyn < D the estimated power law distribution is not able to represent the data more closely than random chance.
-Compare the K-S distance of a large number of synthetic datasets.
-As the number of datasets increases, Dsyn < D will converge closer to an expected value. To obtain an estimate of the expected value, we take the number of datasets where Dsyn < D and divide it by the total number of synthetic datasets. The result is a “p-value” which expresses the probability that the estimated power law distribution is a good fit to the observed data.
-Use the threshold of 0.10; thus less than 10% of the time our synthetic data set was a better fit to the distribution.
The calculation of p-values for multiple distributions is a way to test or compare different probability distribution fits to empirical data. Pgauss is a quantitative measure of how appropriate the Gaussian fit is to the data.
Discuss What Type of Data We Are Using -Direct weather station observations from 1948 through 2009 -Have maximum and minimum temperatures, and daily rainfall -Note adjustments we made to the data -Calculated the mean annual cycle (daily cycle from the years), from this determine daily anomaly -Mean Annual Cycle: part of a measure quantity’s fluctuation that is attributed to Earth’s changing position in orbit over the course of the year. -The data we are left with, what does that describe
Plots of Skewness and Kurtosis (Not Trivial)
-Skewness (third moment): a measure of the asymmetry of the probability distribution of a real-valued random variable (right vs. left skew).
-Kurtosis (fourth moment): a measure of the ‘peakedness’ of the probability distribution of a real-valued random variable.
-Kurtosis is a descriptor of the shape of a probability distribution.
-The ‘minus 3’ in the formula serves as a correction to make the kurtosis of the normal distribution equal to 0.
-A higher-kurtosis distribution has a sharper peak and longer, fatter tails, while a low-kurtosis distribution has a more rounded peak and shorter, thinner tails.
-Positive kurtosis corresponds to a “taller” peak of the PDF around the mean as well as a larger amount of data in the tails of the PDF.
-Negative kurtosis is seen in PDFs that have less data in the tails and a “broader” cluster of the probability distribution located about the mean.
-Kurtosis gauges the level of fluctuation within a distribution.
The nth central moment is the expectation of the difference between the random variable X and its mean, raised to the nth power.
-Central moments are taken about the mean.
-Mean: first moment; Variance: second moment; Skewness: third moment; Kurtosis: fourth moment.
-Moments are a quantitative measure of the shape of a set of points.
All of this points to one thing: we are interested in the tails of our distributions, the location of extreme events. Launching point for the power law.
Are we interested in positive or negative regions of kurtosis, with positive expected for greater tails?
Kurtosis measures the "fatness" of the tails of a distribution. Positive excess kurtosis means that the distribution has fatter tails than a normal distribution. Fat tails mean there is a higher than normal probability of big positive and negative realizations.
Negative numbers indicate a platykurtic distribution; positive numbers indicate a leptokurtic distribution. When compared to a normal distribution, a platykurtic data set has a flatter peak around its mean, which causes thin tails within the distribution. The flatness results from the data being less concentrated around its mean, due to large variations within observations. Leptokurtic distributions have higher peaks around the mean compared to normal distributions, which leads to thick tails on both sides. These peaks result from the data being highly concentrated around the mean, due to lower variations within observations.