2. Motivation & Introduction
Extreme climatic events are weather phenomena
that occupy the tails of a dataset's probability density
function (PDF).
Advanced stochastic theory asserts that power law
distributions should exist in the tail ends of our data.
Questions to Answer:
Show That Power Law Distributions Are Evident within
Temperature Data.
Analyze how power law distributions change with varying
weather and climatic patterns (seasons, ENSO, etc.).
3. What is a Power Law Distribution?
Mathematically, a power law probability distribution of
quantity x may be written as:
$$p(x) = C x^{-\alpha}, \qquad p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}$$
Where α is the exponent or scaling parameter and C is
the normalization constant.
[Neelin et al. 2011]
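As a minimal sketch of the normalized form above, the density and its tail probability can be evaluated directly. The function names are illustrative and not taken from the original analysis; the sketch assumes a continuous variable with alpha > 1 and xmin > 0.

```python
def power_law_pdf(x, alpha, xmin):
    """Normalized power-law density: ((alpha - 1) / xmin) * (x / xmin)^(-alpha)."""
    if alpha <= 1.0 or xmin <= 0.0:
        raise ValueError("requires alpha > 1 and xmin > 0")
    if x < xmin:
        return 0.0  # the power law is only defined above the lower bound
    return (alpha - 1.0) / xmin * (x / xmin) ** (-alpha)


def power_law_tail_probability(x, alpha, xmin):
    """P(X >= x) for the same distribution: (x / xmin)^(1 - alpha)."""
    if alpha <= 1.0 or xmin <= 0.0:
        raise ValueError("requires alpha > 1 and xmin > 0")
    if x < xmin:
        return 1.0
    return (x / xmin) ** (1.0 - alpha)
```

The tail probability follows from integrating the density from x to infinity, which is also why alpha must exceed 1 for the normalization constant C = (alpha - 1) * xmin^(alpha - 1) to exist.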
4. Data & Methods
Data
Daily observed maximum and minimum temperatures across the
southeastern United States (AL, FL, GA, NC, SC) spanning 1960-
2009.
Measures of quality control have been put in place resulting in the
omission of 20 stations.
Methods
Trends have been removed from the data.
If data is to follow a power law distribution, it does so above some
lower bound xmin.
To find our lower bound, we employ the Kolmogorov-Smirnov (KS)
statistic, which is the maximum difference between the CDF of the
observed data and that of the estimated power law distribution; the candidate
xmin with the smallest KS statistic is selected.
To calculate our scaling parameter α, we employ the “method of
maximum likelihood”.
$$D = \max_{x \ge x_{\min}} \left| F(x) - P(x) \right|, \qquad \hat{\alpha} = 1 + n \left[ \sum_{i=1}^{n} \ln \frac{x_i}{x_{\min}} \right]^{-1}$$
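The two formulas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in the study: the function names and the `min_tail` cutoff are hypothetical, and the input is assumed to be positive magnitudes (for example, detrended anomalies from one side of the distribution).

```python
import math


def fit_alpha(tail, xmin):
    """Maximum-likelihood estimate: alpha = 1 + n / sum(ln(x_i / xmin))."""
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("tail must contain values greater than xmin")
    return 1.0 + len(tail) / log_sum


def ks_distance(tail, alpha, xmin):
    """D = max |F(x) - P(x)| over the tail, F empirical and P the fitted CDF."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return d


def fit_power_law(data, min_tail=10):
    """Try each observed value as xmin; keep the one with the smallest D.

    min_tail is an assumed safeguard so that alpha is never estimated
    from only a handful of points. Returns (xmin, alpha, D) or None.
    """
    values = sorted(x for x in data if x > 0)
    best = None
    for xmin in sorted(set(values)):
        tail = [x for x in values if x >= xmin]
        if len(tail) < min_tail or tail[-1] == xmin:
            break
        alpha = fit_alpha(tail, xmin)
        d = ks_distance(tail, alpha, xmin)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    return best
```

The scan is quadratic in the number of observations, which is acceptable for a sketch but would likely need optimizing before being run over several hundred stations.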
5. Significance Testing
A goodness-of-fit test compares the KS distance of our
estimated power law distribution with the KS distances
of many synthetically generated power law
data sets.
From this goodness-of-fit test, we derive
a 'p-value', which measures how plausible the
estimated power law distribution is as a fit to the
observed data.
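A minimal sketch of such a test is given below. It follows the convention of Clauset et al. (2009), where the p-value is the fraction of synthetic data sets whose KS distance is at least as large as the observed one, so a large p-value means the power law remains plausible. As a simplification, xmin is held fixed and only the tail is resampled. All function names and the default of 200 synthetic sets are assumptions made for illustration.

```python
import math
import random


def fit_alpha(tail, xmin):
    """Maximum-likelihood estimate of the scaling parameter."""
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum <= 0.0:
        raise ValueError("tail must contain values greater than xmin")
    return 1.0 + len(tail) / log_sum


def ks_distance(tail, alpha, xmin):
    """Maximum distance between the empirical and fitted power-law CDFs."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        d = max(d, abs(model_cdf - i / n), abs(model_cdf - (i + 1) / n))
    return d


def sample_power_law(n, alpha, xmin, rng):
    """Draw n values by inverting the power-law CDF."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]


def power_law_p_value(tail, xmin, n_synthetic=200, seed=0):
    """Fraction of synthetic power-law samples that fit worse than the data."""
    rng = random.Random(seed)
    alpha = fit_alpha(tail, xmin)
    d_observed = ks_distance(tail, alpha, xmin)
    worse = 0
    for _ in range(n_synthetic):
        synthetic = sample_power_law(len(tail), alpha, xmin, rng)
        # Refit each synthetic sample so it is treated like the real data.
        d_synthetic = ks_distance(synthetic, fit_alpha(synthetic, xmin), xmin)
        if d_synthetic >= d_observed:
            worse += 1
    return worse / n_synthetic
```

More synthetic data sets give a more stable p-value at the cost of run time; the resolution of the estimate is roughly one over the number of synthetic sets.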
7. Power Law Fit & Significance
[Figure panels: P-Value Tests, Skewness, Kurtosis]
Criteria                                              | Is Power Law Fit Significant?
Ppower > 0.10 and Pgauss < 0.10                       | YES
Ppower < 0.10 and Pgauss > 0.10                       | NO
Ppower > 0.10 and Pgauss > 0.10, with Ppower > Pgauss | YES (both fits are significant, but the power law is the better fit)
Ppower < 0.10 and Pgauss < 0.10                       | NO
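The decision table can be written as a small function, sketched here with a hypothetical name. Two cases are not covered by the table and are handled by assumption: when both p-values exceed the threshold but Ppower does not exceed Pgauss, the sketch returns False, and a p-value exactly equal to the threshold is treated as not significant.

```python
def power_law_fit_is_significant(p_power, p_gauss, threshold=0.10):
    """Apply the slide's criteria to one station's pair of p-values."""
    power_ok = p_power > threshold
    gauss_ok = p_gauss > threshold
    if power_ok and not gauss_ok:
        return True  # only the power law fit is significant
    if power_ok and gauss_ok:
        # Both fits are significant; count it only if the power law is better.
        return p_power > p_gauss
    return False  # power law fit is not significant
```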
8. Xmin & Alpha
More analysis is needed to determine whether patterns exist in the spatial
distributions of Xmin and Alpha.
9. Distinguished Power Laws
[Figure panels]
Maximum Temperatures: Tamiami, FL; Asheville, NC
Minimum Temperatures: Hialeah, FL; Henderson, NC
Distinguished Criteria (Ppower > 0.90 and Pgauss ≈ 0)
10. Seasonal Shifts in Power Law Distributions
Now that we have established that power law
distributions exist, how are they
modulated by changes in the seasonal cycle?
11. Fall Power Law Fit & Significance
[Figure panels: P-Value Tests, Skewness, Kurtosis]
Criteria                                              | Is Power Law Fit Significant?
Ppower > 0.10 and Pgauss < 0.10                       | YES
Ppower < 0.10 and Pgauss > 0.10                       | NO
Ppower > 0.10 and Pgauss > 0.10, with Ppower > Pgauss | YES (both fits are significant, but the power law is the better fit)
Ppower < 0.10 and Pgauss < 0.10                       | NO
12. Future Work & Conclusions
There does appear to be a dynamic link between
areas of significant power law fit and areas of distinct
skewness and kurtosis.
Further examine how power law distributions change
with respect to season, ENSO, and other climatic
cycles.
Look to see if these modulations in the power law
distribution may be explained by any specific physical
processes.
Look into more ways to objectively characterize
changes in the power law parameters (Xmin and
Alpha) and distribution.
13. References
Clauset, A., C. R. Shalizi, and M. E. J. Newman, 2009: Power-law distributions in empirical data. SIAM Rev., 51, 661-703.
Neelin, D., and T. W. Ruff, 2011: Long tails in regional surface temperature probability distributions with implications for extremes under global warming. Geophys. Res. Lett., 39, L04704, doi:10.1029/2011GL05061.
Newman, M. E. J., 2005: Power laws, Pareto distributions and Zipf's law. Contemp. Phys., 46, 323-351.
Sura, P., 2011: A general perspective of extreme events in weather and climate. Atmos. Res., 101, 1-21.
Stefanova, L., P. Sura, and M. Griffin, 2012: Quantifying the non-Gaussianity of wintertime daily maximum and minimum temperatures in the Southeast United States. J. Climate, in press.
14. Winter Power Law Fit & Significance
[Figure panels: P-Value Tests, Skewness, Kurtosis]
Criteria                                              | Is Power Law Fit Significant?
Ppower > 0.10 and Pgauss < 0.10                       | YES
Ppower < 0.10 and Pgauss > 0.10                       | NO
Ppower > 0.10 and Pgauss > 0.10, with Ppower > Pgauss | YES (both fits are significant, but the power law is the better fit)
Ppower < 0.10 and Pgauss < 0.10                       | NO
15. Values of Xmin
There appear to be several more distinct regions of behavior than in the annual analysis;
however, more analysis and comparison are needed to adequately depict the
spatial patterns that may be developing.
Editor's Notes
This research is motivated by the study of extreme events, that is, events whose magnitude is large but whose probability of occurrence is relatively small. These extreme events are high-impact, hard-to-predict phenomena that lie beyond our normal (Gaussian) expectations. Thus, we are interested in the tails (maxima/minima) of the data. Here, an extreme event is defined in terms of the non-Gaussian tail of the data's probability density function, as opposed to the definition used in extreme value theory.
While it is understood that the PDFs of atmospheric phenomena are non-Gaussian, the exact shape of these tails is not fully understood. From a purely stochastic perspective (intrinsically non-deterministic, sporadic, and categorically not intermittent, i.e., random), a power law distribution should exist in the tails, so we want to investigate this question and look further into the behavior of power laws in nature (in this case, temperature).
An example of a stochastic process in the natural world is pressure in a gas: even though each molecule moves along a deterministic path, the motion of a collection of them is computationally and practically unpredictable.
Purpose of research: understanding the statistical distribution of daily temperature extremes is of practical interest in ecology, agriculture, and utilities planning. Weather and climate risk assessment depends on knowing the tails of the PDFs. State the purpose of modeling our extreme events: to get a better idea of their distribution and, in turn, do a better job of forecasting the occurrence and magnitude of these events. This is also important with regard to climate change, because if we are witnessing a shift in the mean of our data, we can also expect a shift (sometimes of multiple magnitudes) in the tails of our datasets.
- A way to model: temperature department building (model energy use); how many times will the maximum occur?
- Industries affected by this: insurance and modeling industry.
That is, with respect to climate change, if we get a small shift in the mean of a dataset, then the extreme values become more important. Present and discuss observational examples and applications of our non-Gaussian stochastic framework.
It is not arbitrary to look for a power law distribution (as suggested by stochastic theory and the existence of power laws throughout the physical world). [The equation on the right is the normalized expression.]
Properties of power laws
Mathematically:
- A quantity x obeys a power law when the frequency of an event varies as a power of some attribute of that event.
- More often the power law applies only for values greater than some minimum xmin; in such cases we say that the tail of the distribution follows a power law.
- The distribution must deviate from the power-law form below some minimum value xmin.
Physically:
- It has been shown from observations that many atmospheric variables follow a power law distribution in the tails.
- Power-law distributions occur in an extraordinarily diverse range of phenomena.
Note: power laws with alpha less than one rarely occur in nature, as they would diverge (the distribution could not be normalized).
Quality control measures have been put into place (resulting in the omission of 20+ stations).
- Quality-controlled digital data come from the Summary of the Day data set supplied by the National Climatic Data Center (NCDC).
- Daily measurements of maximum and minimum temperature are provided by the National Weather Service's Cooperative Observer Program (COOP).
- For this study, only stations reporting since at least 1960 were selected; stations with more than 5 consecutive years of missing data were discarded.
- In case of missing data at a given station, correlations between the existing time series at this reference station and surrounding stations within a 50-mile radius are computed, and stations with correlations greater than 0.6 are retained for use in reconstructing the reference station's missing data.
- We are not too worried about missing data beyond the QC put in place: we are interested in the tails of our distribution, and we expect anomalously large events to always be recorded; values close to the mean are more likely to be omitted.
In order to use the K-S statistic, the CDFs of both the observed data and the estimated power law distribution must be calculated.
One typically cannot say with absolute certainty that an empirical data set is described by a specific probability distribution. Rather, it can only be stated that the observed data are in agreement with the proposed PDF. (Test various values of xmin; choose the one with the smallest K-S statistic.)
Our method attempts to minimize the difference between the distribution of the observed data and the best estimate of the power law distribution assigned to the data by using the Kolmogorov-Smirnov statistic (K-S statistic).
- D is the maximum distance between the cumulative distribution function of the observed data, F(x), and the cumulative distribution function of the estimated power law distribution, P(x), in the domain x ≥ xmin.
- By testing different values of xmin and calculating the respective K-S distance, one obtains many values of D that compare the CDF of the estimated power law distribution with the CDF of the observed data.
- The value of xmin at which the smallest D was obtained becomes the lower bound of the estimated power law fit.
- Note: there must be some lower bound to the power-law behavior, the point at which the power law distribution appears. This allows one to narrow down the domain of x where the power law is located.
- If we choose too low a value for xmin, we will get a biased estimate of the scaling parameter, since we will be attempting to fit a power-law model to non-power-law data.
- If we choose too high a value for xmin, we are effectively throwing away legitimate data points x < xmin.
- It is better to err a little on the high side; estimates that are too low could have severe consequences.
- Estimating xmin is crucial for determining the power law exponent, as the slope of the power law distribution is determined by which data points fall within the domain of the power law.
Once we have an estimate of the lower bound of the power law distribution, the value of xmin may be used in estimating the scaling parameter.
Talk about the straight line on a log-log plot; note that alpha sets the slope (the slope is -alpha).
To obtain this parameter, we use the "method of maximum likelihood" (MLE).
- It obtains a value of alpha by summing over each empirical data point xi (the observed values) that is greater than or equal to the previously estimated value of xmin.
- MLEs give us no warning that our fits are wrong: they tell us only the best fit to the power-law form, not whether the power law is in fact a good model for the data.
To quantitatively measure the significance of our estimated power law distribution, we employ a test that compares its K-S distance with those of many idealized, synthetically produced power law data sets. One synthetic data set is not enough: by chance, a single synthetic data set, with its small variations or sampling errors, may fit the power law more or less closely than the empirical data do.
- Compare the K-S distances of a large number of synthetic data sets.
- As the number of data sets increases, the fraction with Dsyn > D converges to an expected value. To estimate it, following Clauset et al. (2009), we take the number of data sets where Dsyn > D and divide it by the total number of synthetic data sets. The result is a "p-value" that measures how plausible the estimated power law distribution is as a fit to the observed data.
- In instances where Dsyn < D for nearly all synthetic data sets, the observed data sit farther from the power law than sampling error alone would explain.
- Use the threshold of 0.10: the power law fit is treated as significant when more than 10% of the synthetic data sets fit worse than the observed data.
The calculation of p-values for multiple distributions is a way to compare different probability distribution fits to empirical data. Pgauss is a quantitative measure of how appropriate the Gaussian fit is to the data.
So I started running this program on the distributions one by one, for maximum and minimum temperatures at the 272 stations. I slowly realized that this would be neither an effective nor an efficient way to identify power law behavior in the atmosphere, nor would it help us determine any patterns or obvious fluctuations with changing weather patterns.
So I wanted to quantify the strength of the power law distributions: in which places can we say with significance whether or not a power law distribution is present? When attempting to fit a probability distribution to empirical data, it is nearly impossible to find only one distribution that describes the behavior of the data. One typically cannot say with absolute certainty that an empirical data set is described by a specific probability distribution; rather, it can only be stated that the observed data are in agreement with the proposed PDF.
When p is greater than 0.10 for a distribution, we can say that the fit is significant. However, when the p-values for both p(gauss) and p(power) are above 0.10 (cite this threshold), a problem arises: both distributions are then significant in that either could be a possible fit for the data. In that case it is helpful to discern which distribution returns the larger p-value; we can say that the power law fit is 'better' when its p-value is higher. Without this step, our criteria may miss cases where the power law is a 'better' fit to the data than the Gaussian distribution.
Where should we expect power law distributions? In regions where we have heavy tails, i.e., a certain combination of skewness and kurtosis.
- Skewness: a measure of the asymmetry of the probability distribution of a real-valued random variable (right vs. left skew). With positive skewness, we seem to see the power law on the negative side of the PDF; with negative skewness, we seem to see it on the positive side.
- Kurtosis: any measure of the 'peakedness' of the probability distribution of a real-valued random variable; a descriptor of the shape of a probability distribution.
- A higher-kurtosis distribution has a sharper peak and longer, fatter tails, while a low-kurtosis distribution has a more ...
- I.e., positive kurtosis corresponds to a high peak with more data contained in the tails of the distribution.
- With stronger kurtosis we potentially have 'heavier' tails (so we are interested in areas of potentially larger kurtosis).
Results:
- Because of negative skewness, we expect the positive side of the PDF to have a 'heavier tail', and this does seem to mirror where we find our 'most significant' power law distributions.
- The negative side seems to be dominated by the kurtosis pattern, whereas skewness dominates the positive side of the PDF.
- In non-Gaussian areas, we see greater power law fit.
Analyze impacts of
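For reference, the two shape measures discussed in these notes can be computed from sample moments as in the sketch below. It uses the simple moment-based definitions without small-sample bias corrections, which is an assumption; the notes do not say which estimator was used for the skewness and kurtosis maps.

```python
def central_moment(values, order):
    """Mean of (x - mean)^order over the sample."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    """Third standardized moment: negative means a longer left tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a Gaussian scores zero."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0
```

Both functions require at least two distinct values, since a constant series has zero variance.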
Remember, Nick thinks it isn't bad to have one or two statements underneath the plots.
Negative skewness would mean the tail extends out to the left, and we may see a relation between higher x-min values and the region the tail extends into, i.e., trivially, because it is fewer standard deviations from the mean (meaning we see higher x-min values on the tail side of the respective region of skewness).
We want to look at x-min and alpha behavior in locations where our p-values were strongest/most significant. Note the few patterns that do show up. It is also not known whether these have any physical meaning or purpose.
Note:
Maximum Temperatures
- Had significant power law fits on the positive side of the distribution, given the large span of negative skewness.
- Significant power law fit up through North Carolina.
- Had almost a cold tongue in North Carolina in kurtosis, which appeared to show up on the negative p-value side.
- Think of a way to better analyze this (possibly normalize x-min values and find the departure, or instead visualize anomalous x-min values), and potentially note patterns with changing season.
- Further analysis is needed.
Minimum Temperatures
- Skewness is mostly zero throughout much of the southeastern United States.
- A small pocket of negative skewness exists in the middle portion of Florida. This was matched by the presence of significant power law fits in the main portions of Florida. Most of Georgia, Alabama, South Carolina, and North Carolina demonstrated non-significant power law fits.
- Kurtosis was mostly positive through the southeastern US; southeastern Florida shows, on average, above-normal positive values of kurtosis compared to other areas.
- This means more peakedness and heavier tails, and is thus of more interest to us with regard to an annual analysis.
Selected by magnitude plot; possibly show here, and look into individual plots to note characteristics. Decided based upon comparative values of p. There seemed to be interesting things going on in South Florida and North Carolina.
Now that we have established that power laws exist, how do they change with season or ENSO? We must note the change in order to understand the true significance to the original. How do seasons alter (a) the strongest distributions from before and (b) areas of weak confidence?
Note the impact of kurtosis on the northern portions of the SE United States. With negative skewness, we appear to have higher confidence in the significance of the p-value on the positive tail of the PDF. The skewness pattern in the minimum temperatures is similar to the annual one, but the power law pattern may not be identical because of negative kurtosis (flatness); this may make for a more symmetrical appearance of power laws. Zero kurtosis seems to mirror zero p-significance on that side of the tail. In the annual analysis we saw many more regions of positive kurtosis, or peakedness, resulting in heavier tails.
We discovered the patterns; what is the next step? Power law distributions seem to be determined by the non-Gaussianity of the dataset.
Previously, kurtosis seemed to mirror the negative side, but skewness seemed to dictate much of the positive p-value pattern. We must ask why the pattern does not seem consistent; however, it is interesting to note that the percentage of significant power law fits has increased since the last plot (very few white dots). Negative kurtosis may weaken the relation of skewness to p-value significance. Either way, from these we can still see that power laws are significant throughout nature. Patterns of power law significance do change with varying seasons.
Values of x-min appear to be smaller in general, with a more diverse range of x-min values, which is interesting.