Time series and forecasting from Wikipedia


A compilation of Wikipedia articles on Time Series and Stochastic Processes with a view towards quantitative finance


  1. 1. Time Series and Forecasting Compiled by M.Barros, D.Sc. December 12th, 2012 Source: Wikipedia PDF generated using the open source mwlib toolkit. See http://code.pediapress.com/ for more information. PDF generated at: Tue, 11 Dec 2012 03:49:39 UTC
  2. 2. Contents
     Articles: Time series; Forecasting; Stationary process; Stochastic process; Covariance; Autocovariance; Autocorrelation; Cross-correlation; White noise; Random walk; Brownian motion; Wiener process; Autoregressive model; Moving average; Autoregressive–moving-average model; Fourier transform; Spectral density; Signal processing; Autoregressive conditional heteroskedasticity; Autoregressive integrated moving average; Volatility (finance); Stable distribution; Mathematical finance; Stochastic differential equation; Brownian model of financial markets; Stochastic volatility; Black–Scholes; Black model; Black–Derman–Toy model; Cox–Ingersoll–Ross model; Monte Carlo method
     References: Article Sources and Contributors
  3. 3. Image Sources, Licenses and Contributors
     Article Licenses: License
     AVAILABLE FREE OF CHARGE AT: www.mbarros.com, http://mbarrosconsultoria.blogspot.com, http://mbarrosconsultoria2.blogspot.com
  4. 4. Time series
     In statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering and communications engineering, a time series is a sequence of data points, typically measured at successive time instants spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones index or the annual flow volume of the Nile River at Aswan. Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values. Time series are very frequently plotted via line charts.
     [Figure: Time series: random data plus trend, with best-fit line and different smoothings]
     Time series data have a natural temporal ordering. This makes time series analysis distinct from other common data analysis problems, in which there is no natural ordering of the observations (e.g. explaining people's wages by reference to their respective education levels, where the individuals' data could be entered in any order). Time series analysis is also distinct from spatial data analysis, where the observations typically relate to geographical locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses). A stochastic model for a time series will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time, so that values for a given period will be expressed as deriving in some way from past values, rather than from future values (see time reversibility).
     Methods for time series analyses may be divided into two classes: frequency-domain methods and time-domain methods.
     The former include spectral analysis and, more recently, wavelet analysis; the latter include auto-correlation and cross-correlation analysis. Additionally, time series analysis techniques may be divided into parametric and non-parametric methods. The parametric approaches assume that the underlying stationary stochastic process has a certain structure which can be described using a small number of parameters (for example, using an autoregressive or moving average model). In these approaches, the task is to estimate the parameters of the model that describes the stochastic process. By contrast, non-parametric approaches explicitly estimate the covariance or the spectrum of the process without assuming that the process has any particular structure. Methods of time series analysis may also be divided into linear and non-linear, and into univariate and multivariate.
     Time series analysis can be applied to:
     • real-valued, continuous data
     • discrete numeric data
     • discrete symbolic data (i.e. sequences of characters, such as letters and words in the English language[1]).
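The contrast between non-parametric and parametric estimation can be illustrated with a minimal Python sketch. It is not taken from the source: the function names are hypothetical, the sample autocovariance uses the common 1/n normalisation, and the parametric step assumes an AR(1) structure fitted with the lag-1 Yule-Walker ratio.

```python
def sample_autocovariance(x, lag):
    """Non-parametric estimate: average product of deviations `lag` steps apart.

    Uses the 1/n normalisation (an assumption; 1/(n - lag) is also common).
    """
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def fit_ar1_coefficient(x):
    """Parametric estimate: coefficient of an assumed AR(1) model.

    For AR(1), the Yule-Walker equation reduces to
    autocovariance(1) / autocovariance(0).
    """
    return sample_autocovariance(x, 1) / sample_autocovariance(x, 0)


if __name__ == "__main__":
    series = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up illustrative data
    print(sample_autocovariance(series, 0))
    print(sample_autocovariance(series, 1))
    print(fit_ar1_coefficient(series))
```

The first function makes no structural assumption about the process, while the second summarises the whole dependence structure in a single parameter, which is only meaningful if the AR(1) assumption is reasonable for the data.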
  5. 5. Analysis
     There are several types of data analysis available for time series, which are appropriate for different purposes. In the context of statistics, econometrics, quantitative finance, seismology, meteorology and geophysics, the primary goal of time series analysis is forecasting. In the context of signal processing, control engineering and communication engineering, it is used for signal detection and estimation. In the context of data mining, pattern recognition and machine learning, time series analysis can be used for clustering, classification, query by content and anomaly detection, as well as forecasting.
     Exploratory analysis
     The clearest way to examine a regular time series manually is with a line chart such as the one shown for tuberculosis in the United States, made with a spreadsheet program. The number of cases was standardized to a rate per 100,000, and the percent change per year in this rate was calculated. The nearly steadily dropping line shows that the TB incidence was decreasing in most years, but the percent change in this rate varied by as much as +/- 10%, with surges in 1975 and around the early 1990s. The use of both vertical axes allows the comparison of two time series in one graphic.
     [Figure: Tuberculosis incidence US 1953-2009]
     Other techniques include:
     • Autocorrelation analysis to examine serial dependence
     • Spectral analysis to examine cyclic behaviour which need not be related to seasonality. For example, sunspot activity varies over 11-year cycles.[2][3] Other common examples include celestial phenomena, weather patterns, neural activity, commodity prices, and economic activity.
     • Separation into components representing trend, seasonality, slow and fast variation, and cyclical irregularity: see decomposition of time series
     • Simple properties of marginal distributions
     Prediction and forecasting
     • Fully formed statistical models for stochastic simulation purposes, so as to generate alternative versions of the time series, representing what might happen over non-specific time-periods in the future
     • Simple or fully formed statistical models to describe the likely outcome of the time series in the immediate future, given knowledge of the most recent outcomes (forecasting).
     • Forecasting on time series is usually done using automated statistical software packages and programming languages, such as R (programming language), S (programming language), SAS (software), SPSS, Minitab and many others.
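Two of the exploratory steps mentioned in this section, the percent change from one period to the next and the extraction of a slow-moving trend component, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure used for the chart: the function names and the sample values are made up, and the trend is estimated with a centred moving average over an odd window.

```python
def percent_change(values):
    """Percent change of each observation relative to the previous one."""
    return [100.0 * (b - a) / a for a, b in zip(values, values[1:])]


def centered_moving_average(values, window):
    """Trend estimate: mean of `window` consecutive values (window must be odd).

    Only positions with a full window are returned, so the result is
    shorter than the input by window - 1 elements.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd for a centred average")
    half = window // 2
    return [
        sum(values[i - half:i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


if __name__ == "__main__":
    rate = [100.0, 90.0, 99.0]              # hypothetical yearly rates
    print(percent_change(rate))             # year-on-year percent changes
    print(centered_moving_average([1, 2, 3, 4, 5], 3))
```

Subtracting the moving-average trend from the original series leaves the faster variation, which is one simple way to begin a decomposition into components.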
  6. 6. Classification
     • Assigning a time series pattern to a specific category, for example identifying a word based on a series of hand movements in sign language
     See main article: Statistical classification
     Regression analysis
     • Estimating the future value of a signal based on its previous behavior, e.g. predicting the price of AAPL stock based on its previous price movements for that hour, day or month, or predicting the position of the Apollo 11 spacecraft at a certain future moment based on its current trajectory (i.e. the time series of its previous locations).[4]
     • Regression analysis is usually based on statistical interpretation of time series properties in the time domain, pioneered by statisticians George Box and Gwilym Jenkins in the 50s: see Box–Jenkins
     See main article: Regression analysis
     Signal estimation
     • This approach is based on harmonic analysis and filtering of signals in the frequency domain using the Fourier transform and spectral density estimation. Its development was significantly accelerated during World War II by mathematician Norbert Wiener, electrical engineers Rudolf E. Kálmán, Dennis Gabor and others, for filtering signal from noise and predicting the signal value at a certain point in time. See Kalman filter, Estimation theory and Digital signal processing.
     Models
     Models for time series data can have many forms and represent different stochastic processes. When modeling variations in the level of a process, three broad classes of practical importance are the autoregressive (AR) models, the integrated (I) models, and the moving average (MA) models. These three classes depend linearly[5] on previous data points. Combinations of these ideas produce autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models. The autoregressive fractionally integrated moving average (ARFIMA) model generalizes the former three.
     Extensions of these classes to deal with vector-valued data are available under the heading of multivariate time-series models, and sometimes the preceding acronyms are extended by including an initial "V" for "vector". An additional set of extensions of these models is available for use where the observed time-series is driven by some "forcing" time-series (which may not have a causal effect on the observed series): the distinction from the multivariate case is that the forcing series may be deterministic or under the experimenter's control. For these models, the acronyms are extended with a final "X" for "exogenous".
     Non-linear dependence of the level of a series on previous data points is of interest, partly because of the possibility of producing a chaotic time series. More importantly, however, empirical investigations can indicate the advantage of using predictions derived from non-linear models over those from linear models, as for example in nonlinear autoregressive exogenous models.
     Among other types of non-linear time series models, there are models to represent the changes of variance over time (heteroskedasticity). These models represent autoregressive conditional heteroskedasticity (ARCH), and the collection comprises a wide variety of representations (GARCH, TARCH, EGARCH, FIGARCH, CGARCH, etc.). Here changes in variability are related to, or predicted by, recent past values of the observed series. This is in contrast to other possible representations of locally varying variability, where the variability might be modelled as being driven by a separate time-varying process, as in a doubly stochastic model.
     In recent work on model-free analyses, wavelet transform based methods (for example locally stationary wavelets and wavelet decomposed neural networks) have gained favor. Multiscale (often referred to as multiresolution) techniques decompose a given time series, attempting to illustrate time dependence at multiple scales. See also Markov switching multifractal (MSMF) techniques for modeling volatility evolution.
  7. 7. Notation
     A number of different notations are in use for time-series analysis. A common notation specifying a time series X that is indexed by the natural numbers is written X = {X1, X2, ...}. Another common notation is Y = {Yt: t ∈ T}, where T is the index set.
     Conditions
     There are two sets of conditions under which much of the theory is built:
     • Stationary process
     • Ergodic process
     However, ideas of stationarity must be expanded to consider two important ideas: strict stationarity and second-order stationarity. Both models and applications can be developed under each of these conditions, although the models in the latter case might be considered as only partly specified. In addition, time-series analysis can be applied where the series are seasonally stationary or non-stationary. Situations where the amplitudes of frequency components change with time can be dealt with in time-frequency analysis, which makes use of a time–frequency representation of a time-series or signal.[6]
     Models
     The general representation of an autoregressive model, well known as AR(p), is
     $Y_t = \alpha_0 + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \cdots + \alpha_p Y_{t-p} + \varepsilon_t$
     where the term $\varepsilon_t$ is the source of randomness and is called white noise. It is assumed to have the following characteristics:
     • $E[\varepsilon_t] = 0$
     • $E[\varepsilon_t^2] = \sigma^2$
     • $E[\varepsilon_t \varepsilon_s] = 0$ for all $t \neq s$
     With these assumptions, the process is specified up to second-order moments and, subject to conditions on the coefficients, may be second-order stationary. If the noise also has a normal distribution, it is called normal or Gaussian white noise. In this case, the AR process may be strictly stationary, again subject to conditions on the coefficients.
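As a rough illustration of the AR(p) model with Gaussian white noise, the following Python sketch simulates the standard recursion in which each value equals a constant plus a weighted sum of the p previous values plus a noise term. The function name, the parameter names (`alpha0`, `coeffs`, `sigma`, `seed`) and the choice of zero starting values are assumptions made for this example, not part of the source.

```python
import random


def simulate_ar(coeffs, n, alpha0=0.0, sigma=1.0, seed=0):
    """Simulate n values of an AR(p) process driven by Gaussian white noise.

    coeffs = [alpha_1, ..., alpha_p]; the recursion is
    Y_t = alpha0 + alpha_1 * Y_{t-1} + ... + alpha_p * Y_{t-p} + eps_t,
    with eps_t drawn independently from N(0, sigma^2).
    The p starting values are set to zero (an arbitrary assumption).
    """
    rng = random.Random(seed)
    p = len(coeffs)
    y = [0.0] * p
    for _ in range(n):
        noise = rng.gauss(0.0, sigma)
        past = y[-1:-p - 1:-1]  # Y_{t-1}, Y_{t-2}, ..., Y_{t-p}
        y.append(alpha0 + sum(a * v for a, v in zip(coeffs, past)) + noise)
    return y[p:]


if __name__ == "__main__":
    # For AR(1), second-order stationarity requires |alpha_1| < 1.
    path = simulate_ar([0.5], n=10, sigma=1.0, seed=42)
    print(path)
```

Setting `sigma=0.0` removes the randomness and shows the deterministic part of the recursion converging towards `alpha0 / (1 - alpha_1)` when the AR(1) coefficient is below one in absolute value.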
     Tools for investigating time-series data include:
     • Consideration of the autocorrelation function and the spectral density function (also cross-correlation functions and cross-spectral density functions)
     • Scaled cross- and auto-correlation functions[7]
     • Performing a Fourier transform to investigate the series in the frequency domain
     • Use of a filter to remove unwanted noise
     • Principal components analysis (or empirical orthogonal function analysis)
     • Singular spectrum analysis
     • "Structural" models:
  8. 8.   • General State Space Models
       • Unobserved Components Models
     • Machine Learning
       • Artificial neural networks
       • Support Vector Machine
       • Fuzzy Logic
     • Hidden Markov model
     • Control chart
       • Shewhart individuals control chart
       • CUSUM chart
       • EWMA chart
     • Detrended fluctuation analysis
     • Dynamic time warping
     • Dynamic Bayesian network
     • Time-frequency analysis techniques:
       • Fast Fourier Transform
       • Continuous wavelet transform
       • Short-time Fourier transform
       • Chirplet transform
       • Fractional Fourier transform
     • Chaotic analysis
       • Correlation dimension
       • Recurrence plots
       • Recurrence quantification analysis
       • Lyapunov exponents
       • Entropy encoding
     Measures
     Time series metrics or features that can be used for time series classification or regression analysis[8]:
     • Univariate linear measures
       • Moment (mathematics)
       • Spectral band power
       • Spectral edge frequency
       • Accumulated Energy (signal processing)
       • Characteristics of the autocorrelation function
       • Hjorth parameters
       • FFT parameters
       • Autoregressive model parameters
     • Univariate non-linear measures
       • Measures based on the correlation sum
       • Correlation dimension
       • Correlation integral
       • Correlation density
       • Correlation entropy
  9. 9.   • Approximate Entropy[9]
       • Sample Entropy
       • Fourier entropy
       • Wavelet entropy
       • Rényi entropy
       • Higher-order methods
       • Marginal predictability
       • Dynamical similarity index
       • State space dissimilarity measures
       • Lyapunov exponent
       • Permutation methods
       • Local flow
     • Other univariate measures
       • Algorithmic complexity
       • Kolmogorov complexity estimates
       • Hidden Markov Model states
       • Surrogate time series and surrogate correction
       • Loss of recurrence (degree of non-stationarity)
     • Bivariate linear measures
       • Maximum linear cross-correlation
       • Linear Coherence (signal processing)
     • Bivariate non-linear measures
       • Non-linear interdependence
       • Dynamical Entrainment (physics)
       • Measures for Phase synchronization
     • Similarity measures[10]:
       • Dynamic Time Warping
       • Hidden Markov Models
       • Edit distance
       • Total correlation
       • Newey–West estimator
       • Prais-Winsten transformation
       • Data as Vectors in a Metrizable Space
         • Minkowski distance
         • Mahalanobis distance
       • Data as Time Series with Envelopes
         • Global Standard Deviation
         • Local Standard Deviation
         • Windowed Standard Deviation
       • Data Interpreted as Stochastic Series
         • Pearson product-moment correlation coefficient
         • Spearman's rank correlation coefficient
       • Data Interpreted as a Probability Distribution Function
         • Kolmogorov-Smirnov test
         • Cramér-von Mises criterion
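Two of the similarity measures listed above can be sketched compactly in Python. The example is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the Minkowski distance treats two equal-length series as vectors, and the dynamic time warping distance uses the textbook dynamic-programming recursion with absolute difference as the local cost and no window constraint.

```python
def minkowski_distance(x, y, p=2):
    """Minkowski distance of order p between two equal-length series.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance.
    """
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def dtw_distance(x, y):
    """Dynamic time warping distance between two series of any lengths.

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i points of x with the first j points of y.
    """
    inf = float("inf")
    n, m = len(x), len(y)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y repeats
                cost[i][j - 1],      # y advances, x repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    print(minkowski_distance([0, 0], [3, 4]))      # Euclidean distance
    print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))   # same shape, stretched
```

Unlike the Minkowski distance, the warping distance tolerates series that follow the same shape at different speeds, which is why the stretched series in the usage example is at distance zero from the original.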
  10. 10. References
     [1] Lin, Jessica and Keogh, Eamonn and Lonardi, Stefano and Chiu, Bill. A symbolic representation of time series, with implications for streaming algorithms. Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery, 2003. URL: http://doi.acm.org/10.1145/882082.882086
     [2] Bloomfield, P. (1976). Fourier analysis of time series: An introduction. New York: Wiley.
     [3] Shumway, R. H. (1988). Applied statistical time series analysis. Englewood Cliffs, NJ: Prentice Hall.
     [4] Lawson, Charles L., Hanson, Richard, J. (1987). Solving Least Squares Problems. Society for Industrial and Applied Mathematics, 1987.
     [5] Gershenfeld, N. (1999). The nature of mathematical modeling. pp. 205-08
     [6] Boashash, B. (ed.) (2003). Time-Frequency Signal Analysis and Processing: A Comprehensive Reference, Elsevier Science, Oxford, 2003. ISBN 0-08-044335-4
     [7] Nikolić D, Muresan RC, Feng W, Singer W (2012). Scaled correlation analysis: a better way to compute a cross-correlogram. European Journal of Neuroscience, pp. 1–21, doi:10.1111/j.1460-9568.2011.07987.x http://www.danko-nikolic.com/wp-content/uploads/2012/03/Scaled-correlation-analysis.pdf
     [8] Mormann, Florian and Andrzejak, Ralph G. and Elger, Christian E. and Lehnertz, Klaus. Seizure prediction: the long and winding road. Brain, 2007, 130 (2): 314-33. URL: http://brain.oxfordjournals.org/content/130/2/314.abstract
     [9] Land, Bruce and Elias, Damian. Measuring the "Complexity" of a time series. URL: http://www.nbb.cornell.edu/neurobio/land/PROJECTS/Complexity/
     [10] Ropella, G.E.P.; Nag, D.A.; Hunt, C.A., "Similarity measures for automated comparison of in silico and in vitro experimental results," Engineering in Medicine and Biology Society, 2003. Proceedings of the 25th Annual International Conference of the IEEE, vol. 3, pp. 2933-2936, 17-21 Sept. 2003. doi: 10.1109/IEMBS.2003.1280532 URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=1280532&isnumber=28615
     Further reading
     • Bloomfield, P. (1976). Fourier analysis of time series: An introduction. New York: Wiley.
     • Box, George; Jenkins, Gwilym (1976), Time series analysis: forecasting and control, rev. ed., Oakland, California: Holden-Day
     • Brillinger, D. R. (1975). Time series: Data analysis and theory. New York: Holt, Rinehart & Winston.
     • Brigham, E. O. (1974). The fast Fourier transform. Englewood Cliffs, NJ: Prentice-Hall.
     • Elliott, D. F., & Rao, K. R. (1982). Fast transforms: Algorithms, analyses, applications. New York: Academic Press.
     • Gershenfeld, Neil (2000), The nature of mathematical modeling, Cambridge: Cambridge Univ. Press, ISBN 978-0-521-57095-4, OCLC 174825352
     • Hamilton, James (1994), Time Series Analysis, Princeton: Princeton Univ. Press, ISBN 0-691-04289-6
     • Jenkins, G. M., & Watts, D. G. (1968). Spectral analysis and its applications. San Francisco: Holden-Day.
     • Priestley, M. B. (1981). Spectral Analysis and Time Series. London: Academic Press. ISBN 978-0-12-564901-8
     • Shasha, D. (2004), High Performance Discovery in Time Series, Berlin: Springer, ISBN 0-387-00857-8
     • Shumway, R. H. (1988). Applied statistical time series analysis. Englewood Cliffs, NJ: Prentice Hall.
     • Wiener, N. (1964). Extrapolation, Interpolation, and Smoothing of Stationary Time Series. The MIT Press.
     • Wei, W. W. (1989). Time series analysis: Univariate and multivariate methods. New York: Addison-Wesley.
     • Weigend, A. S., and N. A. Gershenfeld (Eds.) (1994) Time Series Prediction: Forecasting the Future and Understanding the Past. Proceedings of the NATO Advanced Research Workshop on Comparative Time Series Analysis (Santa Fe, May 1992) MA: Addison-Wesley.
     • Durbin J., and Koopman S.J. (2001) Time Series Analysis by State Space Methods. Oxford University Press.
  11. 11. External links
     • A First Course on Time Series Analysis (http://statistik.mathematik.uni-wuerzburg.de/timeseries/) - an open source book on time series analysis with SAS
     • Introduction to Time series Analysis (Engineering Statistics Handbook) (http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm) - a practical guide to time series analysis
     • MATLAB Toolkit for Computation of Multiple Measures on Time Series Data Bases (http://www.jstatsoft.org/v33/i05/paper)
     Forecasting
     Forecasting is the process of making statements about events whose actual outcomes (typically) have not yet been observed. A commonplace example might be estimation of some variable of interest at some specified future date. Prediction is a similar, but more general term. Both might refer to formal statistical methods employing time series, cross-sectional or longitudinal data, or alternatively to less formal judgemental methods. Usage can differ between areas of application: for example, in hydrology, the terms "forecast" and "forecasting" are sometimes reserved for estimates of values at certain specific future times, while the term "prediction" is used for more general estimates, such as the number of times floods will occur over a long period.
     Risk and uncertainty are central to forecasting and prediction; it is generally considered good practice to indicate the degree of uncertainty attaching to forecasts. In any case, the data must be up to date in order for the forecast to be as accurate as possible.[1] Although quantitative analysis can be very precise, it is not always appropriate. Some experts in the field of forecasting have advised against the use of mean square error to compare forecasting methods.[2]
     Categories of forecasting methods
     Qualitative vs. quantitative methods
     Qualitative forecasting techniques are subjective, based on the opinion and judgment of consumers and experts; they are appropriate when past data are not available.
     They are usually applied to intermediate- to long-range decisions. Examples of qualitative forecasting methods are: informed opinion and judgment, the Delphi method, market research, and historical life-cycle analogy. Quantitative forecasting models are used to estimate future demands as a function of past data; they are appropriate when past data are available. These methods are usually applied to short- to intermediate-range decisions. Examples of quantitative forecasting methods are: last period demand, simple and weighted moving averages (N-Period), simple exponential smoothing, and multiplicative seasonal indexes.
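Several of the quantitative methods just named reduce to one-line computations on past demand. The Python sketch below shows one possible reading of them; the function names, the convention that weights are listed from oldest to newest, and the choice to start exponential smoothing from the first observation are assumptions made for illustration.

```python
def last_period_forecast(history):
    """Last period demand: the forecast is the most recent observation."""
    return history[-1]


def moving_average_forecast(history, n):
    """Simple N-period moving average of the n most recent observations."""
    return sum(history[-n:]) / n


def weighted_moving_average_forecast(history, weights):
    """Weighted moving average; weights are ordered oldest to newest
    and are normalised by their sum."""
    recent = history[-len(weights):]
    return sum(w * v for w, v in zip(weights, recent)) / sum(weights)


def exponential_smoothing_forecast(history, alpha):
    """Simple exponential smoothing with smoothing constant 0 < alpha <= 1.

    The level is initialised with the first observation (an assumption).
    """
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


if __name__ == "__main__":
    demand = [10, 12, 14, 16]  # made-up demand history
    print(last_period_forecast(demand))
    print(moving_average_forecast(demand, 2))
    print(weighted_moving_average_forecast(demand, [1, 3]))
    print(exponential_smoothing_forecast(demand, 0.5))
```

A larger `alpha` or heavier weights on recent periods make the forecast react faster to change, at the cost of passing more noise through.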
  12. 12. Naïve approach
     Naïve forecasts are the most cost-effective and efficient objective forecasting models, and provide a benchmark against which more sophisticated models can be compared. For stable time series data, this approach says that the forecast for any period equals the previous period's actual value.
     Reference class forecasting
     Reference class forecasting was developed by Oxford professor Bent Flyvbjerg to eliminate or reduce bias in forecasting by focusing on distributional information about past outcomes similar to the one being forecast.[3] Daniel Kahneman, Nobel Prize winner in economics, calls Flyvbjerg's counsel to use reference class forecasting to de-bias forecasts "the single most important piece of advice regarding how to increase accuracy in forecasting."[4]
     Time series methods
     Time series methods use historical data as the basis of estimating future outcomes.
     • Moving average
     • Weighted moving average
     • Kalman filtering
     • Exponential smoothing
     • Autoregressive moving average (ARMA)
     • Autoregressive integrated moving average (ARIMA), e.g. Box-Jenkins
     • Extrapolation
     • Linear prediction
     • Trend estimation
     • Growth curve
     Causal / econometric forecasting methods
     Some forecasting methods use the assumption that it is possible to identify the underlying factors that might influence the variable that is being forecast. For example, including information about weather conditions might improve the ability of a model to predict umbrella sales. This is a model of seasonality, which shows a regular pattern of up and down fluctuations. In addition to weather, seasonality can also be due to holidays and customs, such as predicting that sales of college football apparel will be higher during football season than in the off season.[5] Causal forecasting methods are also subject to the discretion of the forecaster. There are several informal methods which do not have strict algorithms, but rather modest and unstructured guidance.
     One can forecast based on, for example, linear relationships. If one variable is linearly related to another for a long enough period of time, it may be beneficial to extrapolate that relationship into the future. This is quite different from the aforementioned model of seasonality, whose graph would more closely resemble a sine or cosine wave. The most important factor when performing this operation is using concrete and substantiated data. Forecasting based on another forecast produces inconclusive and possibly erroneous results. Such methods include:
     • Regression analysis includes a large group of methods that can be used to predict future values of a variable using information about other variables. These methods include both parametric (linear or non-linear) and non-parametric techniques.
     • Autoregressive moving average with exogenous inputs (ARMAX)[6]
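Forecasting from a linear relationship between two variables can be sketched with ordinary least squares on a single explanatory variable. This is a minimal illustration, not a method prescribed by the source: the function names and the sample numbers are hypothetical, and a real causal model would also need checks that the relationship is stable.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Value of the fitted line at a new value of the explanatory variable."""
    return intercept + slope * x


if __name__ == "__main__":
    # Hypothetical data: an explanatory variable and the quantity to forecast.
    explanatory = [1, 2, 3, 4]
    observed = [3, 5, 7, 9]
    slope, intercept = fit_line(explanatory, observed)
    print(predict(slope, intercept, 5))
```

The prediction is only as good as the observed inputs, which is the practical content of the warning against building one forecast on top of another.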
  13. 13. Judgmental methods
     Judgmental forecasting methods incorporate intuitive judgements, opinions and subjective probability estimates.
     • Composite forecasts
     • Delphi method
     • Forecast by analogy
     • Scenario building
     • Statistical surveys
     • Technology forecasting
     Artificial intelligence methods
     • Artificial neural networks
     • Group method of data handling
     • Support vector machines
     Often these are done today by specialized programs loosely labeled:
     • Data mining
     • Machine Learning
     • Pattern Recognition
     Other methods
     • Simulation
     • Prediction market
     • Probabilistic forecasting and Ensemble forecasting
     Forecasting accuracy
     The forecast error is the difference between the actual value and the forecast value for the corresponding period:
     $E_t = Y_t - F_t$
     where $E_t$ is the forecast error at period t, $Y_t$ is the actual value at period t, and $F_t$ is the forecast for period t.
     Measures of aggregate error, over N periods:
     • Mean absolute error (MAE): $\frac{1}{N} \sum_{t=1}^{N} |E_t|$
     • Mean Absolute Percentage Error (MAPE): $\frac{1}{N} \sum_{t=1}^{N} \left| \frac{E_t}{Y_t} \right|$
     • Mean Absolute Deviation (MAD): $\frac{1}{N} \sum_{t=1}^{N} |E_t|$
     • Percent Mean Absolute Deviation (PMAD): $\frac{\sum_{t=1}^{N} |E_t|}{\sum_{t=1}^{N} |Y_t|}$
     • Mean squared error (MSE): $\frac{1}{N} \sum_{t=1}^{N} E_t^2$
     • Root Mean squared error (RMSE): $\sqrt{\frac{1}{N} \sum_{t=1}^{N} E_t^2}$
     • Forecast skill (SS): $1 - \frac{MSE_{forecast}}{MSE_{ref}}$
     • Average of Errors ($\bar{E}$): $\frac{1}{N} \sum_{t=1}^{N} E_t$
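The aggregate error measures can be computed directly from paired actual and forecast values, and the naïve forecast gives a natural benchmark to score. The Python sketch below assumes the usual textbook definitions, with the forecast error taken as actual minus forecast and MAPE reported as a fraction rather than a percentage; the function names and the numbers are made up for illustration.

```python
import math


def naive_forecast(actual):
    """Naive benchmark: the forecast for each period is the previous actual.

    Element i is the forecast for period i + 1, so compare the result
    with actual[1:].
    """
    return actual[:-1]


def error_measures(actual, forecast):
    """Aggregate error measures for equal-length actual/forecast lists."""
    errors = [y - f for y, f in zip(actual, forecast)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "mean_error": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        # MAPE as a fraction; undefined when an actual value is zero.
        "mape": sum(abs(e / y) for e, y in zip(errors, actual)) / n,
        "pmad": sum(abs(e) for e in errors) / sum(abs(y) for y in actual),
        "mse": mse,
        "rmse": math.sqrt(mse),
    }


if __name__ == "__main__":
    actual = [10, 20, 30, 40]
    forecast = [12, 18, 33, 37]
    print(error_measures(actual, forecast))

    history = [100, 110, 99, 110]
    print(error_measures(history[1:], naive_forecast(history)))
```

Scoring a candidate method and the naïve benchmark on the same periods makes it easy to see whether the extra complexity buys any accuracy.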
  14. 14. Business forecasters and practitioners sometimes use different terminology in the industry. They refer to the PMAD as the MAPE, although they compute this as a volume-weighted MAPE. For more information see Calculating demand forecast accuracy.
     Reference class forecasting was developed to increase forecasting accuracy by framing the forecasting problem so as to take into account available distributional information.[7] Daniel Kahneman, winner of the Nobel Prize in economics, calls the use of reference class forecasting "the single most important piece of advice regarding how to increase accuracy in forecasting."[8] Contrary to common belief, forecasting accuracy cannot be increased by the addition of experts in the subject area relevant to the phenomenon to be forecast.[9]
     See also
     • Calculating demand forecast accuracy
     • Consensus forecasts
     • Forecast error
     • Predictability
     • Prediction intervals, similar to confidence intervals
     • Reference class forecasting
     Applications of forecasting
     Climate change and increasing energy prices have led to the use of Egain Forecasting for buildings. The method uses forecasting to reduce the energy needed to heat the building, thus reducing the emission of greenhouse gases. Forecasting is used in the practice of Customer Demand Planning in everyday business forecasting for manufacturing companies. Forecasting has also been used to predict the development of conflict situations.
     Experts in forecasting perform research that uses empirical results to gauge the effectiveness of certain forecasting models.[10] Research has shown that there is little difference between the accuracy of forecasts made by experts knowledgeable about the conflict situation of interest and those made by individuals who knew much less.[11] Similarly, experts in some studies argue that role thinking does not contribute to the accuracy of the forecast.[12] The discipline of demand planning, also sometimes referred to as supply chain forecasting, embraces both statistical forecasting and a consensus process. An important, albeit often ignored, aspect of forecasting is the relationship it holds with planning. Forecasting can be described as predicting what the future will look like, whereas planning predicts what the future should look like.[13][14] There is no single right forecasting method to use. Selection of a method should be based on the forecaster's objectives and conditions (data etc.).[15] A good way to find a method is to consult a selection tree; an example is given in [16].
     Forecasting has application in many situations:
     • Supply chain management - Forecasting can be used in Supply Chain Management to make sure that the right product is at the right place at the right time. Accurate forecasting will help retailers reduce excess inventory and therefore increase profit margin. Studies have shown that extrapolations are the least accurate, while company earnings forecasts are the most reliable.[17] Accurate forecasting will also help them meet consumer demand.
     • Economic forecasting
     • Earthquake prediction
     • Egain Forecasting
     • Land use forecasting
     • Player and team performance in sports
     • Political Forecasting
     • Product forecasting
     • Sales Forecasting
     • Technology forecasting
  15. 15. • Telecommunications forecasting
     • Transport planning and Transportation forecasting
     • Weather forecasting, Flood forecasting and Meteorology
     Limitations
     As proposed by Edward Lorenz in 1963, long-range weather forecasts, those made at a range of two weeks or more, cannot definitively predict the state of the atmosphere, owing to the chaotic nature of the fluid dynamics equations involved. Within numerical models, extremely small errors in the initial input, such as temperatures and winds, double every five days.[18]
     References
     [1] Scott Armstrong, Fred Collopy, Andreas Graefe and Kesten C. Green (2010 (last updated)). "Answers to Frequently Asked Questions" (http://qbox.wharton.upenn.edu/documents/mktg/research/FAQ.pdf).
     [2] J. Scott Armstrong and Fred Collopy (1992). "Error Measures For Generalizing About Forecasting Methods: Empirical Comparisons" (http://marketing.wharton.upenn.edu/ideas/pdf/armstrong2/armstrong-errormeasures-empirical.pdf). International Journal of Forecasting 8: 69–80.
     [3] Flyvbjerg, B. (2008). "Curbing Optimism Bias and Strategic Misrepresentation in Planning: Reference Class Forecasting in Practice" (http://www.sbs.ox.ac.uk/centres/bt/Documents/Curbing Optimism Bias and Strategic Misrepresentation.pdf). European Planning Studies 16 (1): 3–21.
     [4] Daniel Kahneman, 2011, Thinking, Fast and Slow (New York: Farrar, Straus and Giroux), p. 251
     [5] Nahmias, Steven (2009). Production and Operations Analysis.
     [6] Ellis, Kimberly (2008). Production Planning and Inventory Control Virginia Tech. McGraw Hill. ISBN 978-0-390-87106-0.
     [7] Flyvbjerg, B. (2008) "Curbing Optimism Bias and Strategic Misrepresentation in Planning: Reference Class Forecasting in Practice." (http://www.sbs.ox.ac.uk/centres/bt/Documents/Curbing Optimism Bias and Strategic Misrepresentation.pdf) European Planning Studies, 16 (1), 3-21.
     [8] Daniel Kahneman (2011) Thinking, Fast and Slow (New York: Farrar, Straus and Giroux) (p. 251)
     [9] J. Scott Armstrong (1980). "The Seer-Sucker Theory: The Value of Experts in Forecasting" (http://www.forecastingprinciples.com/paperpdf/seersucker.pdf). Technology Review: 16–24.
     [10] J. Scott Armstrong, Kesten C. Green and Andreas Graefe (2010). "Answers to Frequently Asked Questions" (http://qbox.wharton.upenn.edu/documents/mktg/research/FAQ.pdf).
     [11] Kesten C. Green and J. Scott Armstrong (2007). "The Ombudsman: Value of Expertise for Forecasting Decisions in Conflicts" (http://marketing.wharton.upenn.edu/documents/research/Value of expertise.pdf). Interfaces (INFORMS) 0: 1–12.
     [12] Kesten C. Green and J. Scott Armstrong (1975). "Role thinking: Standing in other people's shoes to forecast decisions in conflicts" (http://www.forecastingprinciples.com/paperpdf/Escalation Bias.pdf). Role thinking: Standing in other people's shoes to forecast decisions in conflicts 39: 111–116.
     [13] "FAQ" (http://www.forecastingprinciples.com/index.php?option=com_content&task=view&id=3&Itemid=3). Forecastingprinciples.com. 1998-02-14. Retrieved 2012-08-28.
     [14] Kesten C. Green and J. Scott Armstrong. 2015.pdf "Structured analogies for forecasting" (http://www.qbox.wharton.upenn.edu/documents/mktg/research/INTFOR3581 - Publication%) (PDF). qbox.wharton.upenn.edu. 2015.pdf.
     [15] "FAQ" (http://www.forecastingprinciples.com/index.php?option=com_content&task=view&id=3&Itemid=3#D._Choosing_the_best_method). Forecastingprinciples.com. 1998-02-14. Retrieved 2012-08-28.
     [16] "Selection Tree" (http://www.forecastingprinciples.com/index.php?option=com_content&task=view&id=17&Itemid=17). Forecastingprinciples.com. 1998-02-14. Retrieved 2012-08-28.
     [17] J. Scott Armstrong (1983). "Relative Accuracy of Judgmental and Extrapolative Methods in Forecasting Annual Earnings" (http://www.forecastingprinciples.com/paperpdf/Monetary Incentives.pdf). Journal of Forecasting 2: 437–447.
     [18] Cox, John D. (2002). Storm Watchers. John Wiley & Sons, Inc. pp. 222–224. ISBN 0-471-38108-X.
     • Armstrong, J. Scott (ed.) (2001) (in English). Principles of forecasting: a handbook for researchers and practitioners. Norwell, Massachusetts: Kluwer Academic Publishers. ISBN 0-7923-7930-6.
     • Flyvbjerg, Bent, 2008, "Curbing Optimism Bias and Strategic Misrepresentation in Planning: Reference Class Forecasting in Practice," European Planning Studies, vol. 16, no. 1, January, pp. 3-21. (http://www.sbs.ox.ac.uk/centres/bt/Documents/Curbing Optimism Bias and Strategic Misrepresentation.pdf)
     • Ellis, Kimberly (2010) (in English). Production Planning and Inventory Control. McGraw-Hill. ISBN 0-412-03471-9.
  16. 16. Forecasting 13 • Geisser, Seymour (1 June 1993) (in English). Predictive Inference: An Introduction. Chapman & Hall, CRC Press. ISBN 0-390-87106-0. • Gilchrist, Warren (1976) (in English). Statistical Forecasting. London: John Wiley & Sons. ISBN 0-471-99403-0. • Hyndman, R.J., Koehler, A.B (2005) "Another look at measures of forecast accuracy" (http://www. robjhyndman.com/papers/mase.pdf), Monash University note. • Makridakis, Spyros; Wheelwright, Steven; Hyndman, Rob J. (1998) (in English). Forecasting: methods and applications (http://www.robjhyndman.com/forecasting/). New York: John Wiley & Sons. ISBN 0-471-53233-9. • Kress, George J.; Snyder, John (30 May 1994) (in English). Forecasting and market analysis techniques: a practical approach. Westport, Connecticut, London: Quorum Books. ISBN 0-89930-835-X. • Rescher, Nicholas (1998) (in English). Predicting the future: An introduction to the theory of forecasting. State University of New York Press. ISBN 0-7914-3553-9. • Taesler, R. (1990/91) Climate and Building Energy Management. Energy and Buildings, Vol. 15-16, pp 599 – 608. • Turchin, P. (2007) "Scientific Prediction in Historical Sociology: Ibn Khaldun meets Al Saud". In: History & Mathematics: Historical Dynamics and Development of Complex Societies. (http://edurss.ru/cgi-bin/db. pl?cp=&page=Book&id=53185&lang=en&blang=en&list=Found) Moscow: KomKniga. ISBN 978-5-484-01002-8 • Sasic Kaligasidis, A et al. (2006) Upgraded weather forecast control of building heating systems. p. 
951 ff in Research in Building Physics and Building Engineering Paul Fazio (Editorial Staff), ISBN 0-415-41675-2 • United States Patent 6098893 Comfort control system incorporating weather forecast data and a method for operating such a system (Inventor Stefan Berglund) External links • Forecasting Principles: "Evidence-based forecasting" (http://www.forecastingprinciples.com) • International Institute of Forecasters (http://www.forecasters.org) • Introduction to Time series Analysis (Engineering Statistics Handbook) (http://www.itl.nist.gov/div898/ handbook/pmc/section4/pmc4.htm) - A practical guide to Time series analysis and forecasting • Time Series Analysis (http://www.statsoft.com/textbook/sttimser.html) • Global Forecasting with IFs (http://www.ifs.du.edu) • Earthquake Electromagnetic Precursor Research (http://www.quakefinder.com)
  17. 17. Stationary process 14 Stationary process In mathematics, a stationary process (or strict(ly) stationary process or strong(ly) stationary process) is a stochastic process whose joint probability distribution does not change when shifted in time or space. Consequently, parameters such as the mean and variance, if they exist, also do not change over time or position. Stationarity is used as a tool in time series analysis, where the raw data are often transformed to become stationary; for example, economic data are often seasonal and/or dependent on a non-stationary price level. An important type of non-stationary process that does not include a trend-like behavior is the cyclostationary process. Note that a "stationary process" is not the same thing as a "process with a stationary distribution". Indeed there are further possibilities for confusion with the use of "stationary" in the context of stochastic processes; for example a "time-homogeneous" Markov chain is sometimes said to have "stationary transition probabilities". On the other hand, all stationary Markov random processes are time-homogeneous. Definition Formally, let $\{X_t\}$ be a stochastic process and let $F_X(x_{t_1+\tau}, \ldots, x_{t_k+\tau})$ represent the cumulative distribution function of the joint distribution of $\{X_t\}$ at times $t_1+\tau, \ldots, t_k+\tau$. Then $\{X_t\}$ is said to be stationary if, for all $k$, for all $\tau$, and for all $t_1, \ldots, t_k$, $F_X(x_{t_1+\tau}, \ldots, x_{t_k+\tau}) = F_X(x_{t_1}, \ldots, x_{t_k})$. Since $\tau$ does not affect $F_X(\cdot)$, $F_X$ is not a function of time. Examples As an example, white noise is stationary. The sound of a cymbal clashing, if hit only once, is not stationary because the acoustic power of the clash (and hence its variance) diminishes with time. However, it would be possible to invent a stochastic process describing when the cymbal is hit, such that the overall response would form a stationary process. An example of a discrete-time stationary process where the sample space is also discrete (so that the random variable may take one of N possible values) is a Bernoulli scheme. 
Other examples of a discrete-time stationary process with continuous sample space include some autoregressive and moving average processes which are both subsets of the autoregressive moving average model. Models with a non-trivial autoregressive component may be either stationary or non-stationary, depending on the parameter values, and important non-stationary special cases are where unit roots exist in the model. [Figure: Two simulated time series processes, one stationary, the other non-stationary. The Augmented Dickey–Fuller test is reported for each process, and non-stationarity cannot be rejected for the second process.] Let Y be any scalar random variable, and define a time series $\{X_t\}$ by $X_t = Y$ for all $t$. Then $\{X_t\}$ is a stationary time series, for which realisations consist of a series of constant values, with a different constant value for each realisation. A law of large numbers does not apply in this case, as the limiting value of an average from a single realisation takes the random value determined by Y, rather than taking the expected value of Y. As a further example of a stationary process for which any single realisation has an apparently noise-free structure, let Y have a uniform distribution on (0,2π] and define the time series $\{X_t\}$ by $X_t = \cos(t + Y)$ for $t \in \mathbb{R}$.
  18. 18. Stationary process 15 Then $\{X_t\}$ is strictly stationary. Weaker forms of stationarity Weak or wide-sense stationarity A weaker form of stationarity commonly employed in signal processing is known as weak-sense stationarity, wide-sense stationarity (WSS) or covariance stationarity. WSS random processes only require that the first moment and the covariance do not vary with respect to time. Any strictly stationary process which has a mean and a covariance is also WSS. So, a continuous-time random process x(t) which is WSS has the following restrictions on its mean function, $\mathbb{E}[x(t)] = m_x(t) = m_x(t+\tau)$ for all $\tau \in \mathbb{R}$, and on its autocovariance function, $\mathbb{E}[(x(t_1) - m_x(t_1))(x(t_2) - m_x(t_2))] = C_x(t_1, t_2) = C_x(t_1+\tau, t_2+\tau) = C_x(t_1 - t_2, 0)$ for all $\tau \in \mathbb{R}$. The first property implies that the mean function $m_x(t)$ must be constant. The second property implies that the covariance function depends only on the difference between $t_1$ and $t_2$ and only needs to be indexed by one variable rather than two variables. Thus, instead of writing $C_x(t_1 - t_2, 0)$, the notation is often abbreviated and written as $C_x(\tau)$, where $\tau = t_1 - t_2$. This also implies that the autocorrelation depends only on $\tau$, since $R_x(t_1, t_2) = \mathbb{E}[x(t_1)\,x(t_2)] = C_x(\tau) + m_x^2$. When processing WSS random signals with linear, time-invariant (LTI) filters, it is helpful to think of the correlation function as a linear operator. Since it is a circulant operator (depends only on the difference between the two arguments), its eigenfunctions are the Fourier complex exponentials. Additionally, since the eigenfunctions of LTI operators are also complex exponentials, LTI processing of WSS random signals is highly tractable: all computations can be performed in the frequency domain. Thus, the WSS assumption is widely employed in signal processing algorithms. Second-order stationarity The case of second-order stationarity arises when the requirements of strict stationarity are only applied to pairs of random variables from the time series. The definition of second-order stationarity can be generalized to Nth order (for finite N), and strict stationarity means stationarity of all orders. 
A process is second order stationary if the first and second order density functions satisfy $f_X(x_1; t_1) = f_X(x_1; t_1+\Delta)$ and $f_X(x_1, x_2; t_1, t_2) = f_X(x_1, x_2; t_1+\Delta, t_2+\Delta)$ for all $t_1$, $t_2$, and $\Delta$. Such a process will be wide sense stationary if the mean and correlation functions are finite. A process can be wide sense stationary without being second order stationary.
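The wide-sense conditions can be checked numerically on a simple case. The sketch below assumes the random-phase cosine $X_t = \cos(t + Y)$ with $Y$ uniform on $(0, 2\pi]$; the sample size, the lag, the time points and the helper names `X` and `ensemble_cov` are illustrative choices, not part of the source. For this process the mean is 0 and the covariance at lag $\tau$ is $\tfrac{1}{2}\cos\tau$, so the estimates should not depend on the starting time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed example process: X_t = cos(t + Y), with Y uniform on one full period.
# (rng.uniform samples [0, 2*pi), which differs from (0, 2*pi] only on a null set.)
n_paths = 200_000
Y = rng.uniform(0.0, 2.0 * np.pi, size=n_paths)


def X(t):
    """Value of every simulated realisation at time t."""
    return np.cos(t + Y)


def ensemble_cov(t1, t2):
    """Covariance of X(t1) and X(t2), estimated across realisations."""
    a, b = X(t1), X(t2)
    return np.mean((a - a.mean()) * (b - b.mean()))


lag = 0.7
start_times = (0.0, 1.0, 5.0)
means = [X(t).mean() for t in start_times]               # should all be near 0
covs = [ensemble_cov(t, t + lag) for t in start_times]   # should all agree
theory = 0.5 * np.cos(lag)                               # C_x(lag) for this process

print(means)
print(covs, theory)
```

The averages here are taken across realisations at fixed times, not along a single path, which is what the definition of wide-sense stationarity refers to.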
  19. 19. Stationary process 16 Other terminology The terminology used for types of stationarity other than strict stationarity can be rather mixed. Some examples follow. • Priestley[1][2] uses stationary up to order m if conditions similar to those given here for wide sense stationarity apply relating to moments up to order m. Thus wide sense stationarity would be equivalent to "stationary to order 2", which is different from the definition of second-order stationarity given here. • Honarkhah[3] also uses the assumption of stationarity in the context of multiple-point geostatistics, where higher n-point statistics are assumed to be stationary in the spatial domain. References [1] Priestley, M.B. (1981) Spectral Analysis and Time Series, Academic Press. ISBN 0-12-564922-3 [2] Priestley, M.B. (1988) Non-linear and Non-stationary Time Series Analysis, Academic Press. ISBN 0-12-564911-8 [3] Honarkhah, M and Caers, J, 2010, Stochastic Simulation of Patterns Using Distance-Based Pattern Modeling (http:/ / dx. doi. org/ 10. 1007/ s11004-010-9276-7), Mathematical Geosciences, 42: 487 - 517 External links • Spectral decomposition of a random function (Springer) (http://eom.springer.de/s/s086360.htm) Stochastic process In probability theory, a stochastic process (pronunciation: /stəʊˈkæstɪk/), or sometimes random process (widely used) is a collection of random variables; this is often used to represent the evolution of some random value, or system, over time. This is the probabilistic counterpart to a deterministic process (or deterministic system). Instead of describing a process which can only evolve in one way (as in the case, for example, of solutions of an ordinary differential equation), in a stochastic or random process there is some indeterminacy: even if the initial condition (or starting point) is known, there are several (often infinitely many) directions in which the process may evolve. 
In the simple case of discrete time, a stochastic process amounts to a sequence of random variables known as a time series (for example, see Markov chain). Another basic type of a stochastic process is a random field, whose domain is a region of space, in other words, a random function whose arguments are drawn from a range of continuously changing values. One approach to stochastic processes treats them as functions of one or several deterministic arguments (inputs, in most cases regarded as time) whose values (outputs) are random variables: non-deterministic (single) quantities which have certain probability distributions. Random variables corresponding to various times (or points, in the case of random fields) may be completely different. The main requirement is that these different random quantities all have the same type. Type refers to the codomain of the function. Although the random values of a stochastic process at different times may be independent random variables, in most commonly considered situations they exhibit complicated statistical correlations. Familiar examples of processes modeled as stochastic time series include stock market and exchange rate fluctuations, signals such as speech, audio and video, medical data such as a patient's EKG, EEG, blood pressure or temperature, and random movement such as Brownian motion or random walks. Examples of random fields include static images, random terrain (landscapes), wind waves or composition variations of a heterogeneous material.
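The indeterminacy of a stochastic process, where a known starting point still allows many possible evolutions, can be illustrated with a simple symmetric random walk. This is a minimal sketch; the number of paths, the number of steps and the seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)

n_paths, n_steps = 5, 1000

# Each step is +1 or -1 with equal probability.
steps = rng.choice([-1, 1], size=(n_paths, n_steps))

# Prepend the common starting value 0, then accumulate the steps.
start = np.zeros((n_paths, 1), dtype=steps.dtype)
paths = np.concatenate([start, np.cumsum(steps, axis=1)], axis=1)

print(paths[:, 0])    # every path starts at 0
print(paths[:, -1])   # end points of the individual realisations
```

Each row of `paths` is one realisation of the same discrete-time process: the initial condition is identical, but the trajectories are not.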
  20. 20. Stochastic process 17 Formal definition and basic properties Definition Given a probability space $(\Omega, \mathcal{F}, P)$ and a measurable space $(S, \Sigma)$, an S-valued stochastic process is a collection of S-valued random variables on $\Omega$, indexed by a totally ordered set T ("time"). That is, a stochastic process X is a collection $\{X_t : t \in T\}$ where each $X_t$ is an S-valued random variable on $\Omega$. The space S is then called the state space of the process. Finite-dimensional distributions Let X be an S-valued stochastic process. For every finite subset $T' = \{t_1, \ldots, t_k\} \subseteq T$, the k-tuple $X_{T'} = (X_{t_1}, X_{t_2}, \ldots, X_{t_k})$ is a random variable taking values in $S^k$. The distribution $P_{T'}(\cdot) = P(X_{T'}^{-1}(\cdot))$ of this random variable is a probability measure on $S^k$. This is called a finite-dimensional distribution of X. Under suitable topological restrictions, a suitably "consistent" collection of finite-dimensional distributions can be used to define a stochastic process (see Kolmogorov extension in the next section). Construction In the ordinary axiomatization of probability theory by means of measure theory, the problem is to construct a sigma-algebra of measurable subsets of the space of all functions, and then put a finite measure on it. For this purpose one traditionally uses a method called Kolmogorov extension.[1] There is at least one alternative axiomatization of probability theory by means of expectations on C-star algebras of random variables. In this case the method goes by the name of Gelfand–Naimark–Segal construction. This is analogous to the two approaches to measure and integration, where one has the choice to construct measures of sets first and define integrals later, or construct integrals first and define set measures as integrals of characteristic functions. Kolmogorov extension The Kolmogorov extension proceeds along the following lines: assuming that a probability measure on the space of all functions exists, then it can be used to specify the joint probability distribution of finite-dimensional random variables $f(x_1), \ldots, f(x_n)$. 
Now, from this n-dimensional probability distribution we can deduce an (n − 1)-dimensional marginal probability distribution for $f(x_1), \ldots, f(x_{n-1})$. Note that the obvious compatibility condition, namely, that this marginal probability distribution be in the same class as the one derived from the full-blown stochastic process, is not a requirement. Such a condition only holds, for example, if the stochastic process is a Wiener process (in which case the marginals are all Gaussian distributions of the exponential class) but not in general for all stochastic processes. When this condition is expressed in terms of probability densities, the result is called the Chapman–Kolmogorov equation. The Kolmogorov extension theorem guarantees the existence of a stochastic process with a given family of finite-dimensional probability distributions satisfying the Chapman–Kolmogorov compatibility condition.
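As a small illustration of passing from an n-dimensional distribution to an (n − 1)-dimensional marginal, consider the Wiener process mentioned above. The sketch relies on two standard facts that are not stated in this section and are therefore assumptions here: the values of a standard Wiener process at times $t_1, \ldots, t_n$ are jointly Gaussian with mean zero and covariance $\min(t_i, t_j)$, and the marginal of a zero-mean Gaussian vector over a subset of coordinates has the corresponding sub-block of the covariance matrix. The function name `wiener_cov` and the chosen times are hypothetical.

```python
import numpy as np


def wiener_cov(times):
    """Covariance matrix min(t_i, t_j) of a standard Wiener process (assumed fact)."""
    t = np.asarray(times, dtype=float)
    return np.minimum.outer(t, t)


times = [0.5, 1.0, 2.0]
full = wiener_cov(times)         # covariance of the 3-dimensional distribution
marginal = full[:2, :2]          # marginalise out the last coordinate
direct = wiener_cov(times[:2])   # 2-dimensional distribution built directly

consistent = np.allclose(marginal, direct)
print(full)
print(consistent)
```

Here the marginal obtained by dropping a coordinate coincides with the two-dimensional distribution specified directly, which is the kind of agreement between finite-dimensional distributions that the extension theorem works with.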
  21. 21. Stochastic process 18 Separability, or what the Kolmogorov extension does not provide Recall that in the Kolmogorov axiomatization, measurable sets are the sets which have a probability or, in other words, the sets corresponding to yes/no questions that have a probabilistic answer. The Kolmogorov extension starts by declaring to be measurable all sets of functions where finitely many coordinates $[f(x_1), \ldots, f(x_n)]$ are restricted to lie in measurable subsets of $S^n$. In other words, if a yes/no question about f can be answered by looking at the values of at most finitely many coordinates, then it has a probabilistic answer. In measure theory, if we have a countably infinite collection of measurable sets, then the union and intersection of all of them are measurable sets. For our purposes, this means that yes/no questions that depend on countably many coordinates have a probabilistic answer. The good news is that the Kolmogorov extension makes it possible to construct stochastic processes with fairly arbitrary finite-dimensional distributions. Also, every question that one could ask about a sequence has a probabilistic answer when asked of a random sequence. The bad news is that certain questions about functions on a continuous domain don't have a probabilistic answer. One might hope that the questions that depend on uncountably many values of a function would be of little interest, but the really bad news is that virtually all concepts of calculus are of this sort. For example: 1. boundedness 2. continuity 3. differentiability all require knowledge of uncountably many values of the function. One solution to this problem is to require that the stochastic process be separable. In other words, that there be some countable set of coordinates whose values determine the whole random function f. The Kolmogorov continuity theorem guarantees that processes that satisfy certain constraints on the moments of their increments have continuous modifications and are therefore separable. 
Filtrations Given a probability space $(\Omega, \mathcal{F}, P)$, a filtration is a weakly increasing collection of sigma-algebras on $\Omega$, $\{\mathcal{F}_t, t \in T\}$, indexed by some totally ordered set T, and bounded above by $\mathcal{F}$. I.e. for $s, t \in T$ with s < t, $\mathcal{F}_s \subseteq \mathcal{F}_t \subseteq \mathcal{F}$. A stochastic process X on the same time set T is said to be adapted to the filtration if, for every $t \in T$, $X_t$ is $\mathcal{F}_t$-measurable.[2] The natural filtration Given a stochastic process $X = \{X_t : t \in T\}$, the natural filtration for (or induced by) this process is the filtration where $\mathcal{F}_t$ is generated by all values of $X_s$ up to time s = t. I.e. $\mathcal{F}_t = \sigma(\{X_s^{-1}(A) : s \le t, A \in \Sigma\})$. A stochastic process is always adapted to its natural filtration. Classification Stochastic processes can be classified according to the cardinality of their index set (usually interpreted as time) and state space.
  22. 22. Stochastic process 19 Discrete time and discrete states If both $t$ and $X_t$ belong to $\mathbb{N}$, the set of natural numbers, then we have models which lead to Markov chains. For example: (a) If $X_t$ means the bit (0 or 1) in position $t$ of a sequence of transmitted bits, then $X_t$ can be modelled as a Markov chain with 2 states. This leads to the error-correcting Viterbi algorithm in data transmission. (b) If $X_t$ means the combined genotype of a breeding couple in the $t$-th generation in an inbreeding model, it can be shown that the proportion of heterozygous individuals in the population approaches zero as $t$ goes to ∞.[3] Continuous time and continuous state space The paradigm of continuous stochastic process is that of the Wiener process. In its original form the problem was concerned with a particle floating on a liquid surface, receiving "kicks" from the molecules of the liquid. The particle is then viewed as being subject to a random force which, since the molecules are very small and very close together, is treated as being continuous and, since the particle is constrained to the surface of the liquid by surface tension, is at each point in time a vector parallel to the surface. Thus the random force is described by a two-component stochastic process; two real-valued random variables are associated to each point in the index set, time (note that since the liquid is viewed as being homogeneous the force is independent of the spatial coordinates), with the domain of the two random variables being $\mathbb{R}$, giving the x and y components of the force. A treatment of Brownian motion generally also includes the effect of viscosity, resulting in an equation of motion known as the Langevin equation.[4] Discrete time and continuous state space If the index set of the process is $\mathbb{N}$ (the natural numbers), and the range is $\mathbb{R}$ (the real numbers), there are some natural questions to ask about the sample sequences of a process $\{X_i\}_{i \in \mathbb{N}}$, where a sample sequence is $\{X_i(\omega)\}_{i \in \mathbb{N}}$. 1. 
What is the probability that each sample sequence is bounded? 2. What is the probability that each sample sequence is monotonic? 3. What is the probability that each sample sequence has a limit as the index approaches ∞? 4. What is the probability that the series obtained from a sample sequence converges? 5. What is the probability distribution of the sum? Main applications of discrete time continuous state stochastic models include Markov chain Monte Carlo (MCMC) and the analysis of Time Series. Continuous time and discrete state space Similarly, if the index space I is a finite or infinite interval, we can ask about the sample paths $\{X_t(\omega)\}_{t \in I}$: 1. What is the probability that it is bounded/integrable/continuous/differentiable...? 2. What is the probability that it has a limit at ∞? 3. What is the probability distribution of the integral? References [1] Karlin, Samuel & Taylor, Howard M. (1998). An Introduction to Stochastic Modeling, Academic Press. ISBN 0-12-684887-4. [2] Durrett, Rick. Probability: Theory and Examples. Fourth Edition. Cambridge: Cambridge University Press, 2010. [3] Allen, Linda J. S., An Introduction to Stochastic Processes with Applications to Biology, 2nd Edition, Chapman and Hall, 2010, ISBN 1-4398-1882-7 [4] Gardiner, C. Handbook of Stochastic Methods: for Physics, Chemistry and the Natural Sciences, 3rd Edition, Springer, 2004, ISBN 3540208828
  23. 23. Stochastic process 20 Further reading • Wio, S. Horacio, Deza, R. Roberto & Lopez, M. Juan (2012). An Introduction to Stochastic Processes and Nonequilibrium Statistical Physics. World Scientific Publishing. ISBN 978-981-4374-78-1. • Papoulis, Athanasios & Pillai, S. Unnikrishna (2001). Probability, Random Variables and Stochastic Processes. McGraw-Hill Science/Engineering/Math. ISBN 0-07-281725-9. • Boris Tsirelson. "Lecture notes in Advanced probability theory" (http://www.webcitation.org/5cfvVZ4Kd). • Doob, J. L. (1953). Stochastic Processes. Wiley. • Klebaner, Fima C. (2011). Introduction to Stochastic Calculus With Applications. Imperial College Press. ISBN 1-84816-831-4. • Bruce Hajek (July 2006). "An Exploration of Random Processes for Engineers" (http://www.ifp.uiuc.edu/ ~hajek/Papers/randomprocesses.html). • "An 8 foot tall Probability Machine (named Sir Francis) comparing stock market returns to the randomness of the beans dropping through the quincunx pattern" (http://www.youtube.com/watch?v=AUSKTk9ENzg). Index Funds Advisors IFA.com (http://www.ifa.com). • "Popular Stochastic Processes used in Quantitative Finance" (http://www.sitmo.com/article/ popular-stochastic-processes-in-finance/). sitmo.com. • "Addressing Risk and Uncertainty" (http://www.goldsim.com/Content.asp?PageID=455). Covariance In probability theory and statistics, covariance is a measure of how much two random variables change together. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the smaller values, i.e., the variables tend to show similar behavior, the covariance is positive.[1] In the opposite case, when the greater values of one variable mainly correspond to the smaller values of the other, i.e., the variables tend to show opposite behavior, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. 
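The magnitude of the covariance is not that easy to interpret. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation. A distinction must be made between (1) the covariance of two random variables, which is a population parameter that can be seen as a property of the joint probability distribution, and (2) the sample covariance, which serves as an estimated value of the parameter. Definition The covariance between two jointly distributed real-valued random variables x and y with finite second moments is defined[2] as $\operatorname{Cov}(x, y) = \operatorname{E}[(x - \operatorname{E}[x])(y - \operatorname{E}[y])]$, where E[x] is the expected value of x, also known as the mean of x. By using the linearity property of expectations, this can be simplified to $\operatorname{Cov}(x, y) = \operatorname{E}[xy] - \operatorname{E}[x]\operatorname{E}[y]$. For random vectors $\mathbf{x}$ and $\mathbf{y}$ (of dimension m and n respectively) the m×n covariance matrix is equal to $\operatorname{Cov}(\mathbf{x}, \mathbf{y}) = \operatorname{E}[(\mathbf{x} - \operatorname{E}[\mathbf{x}])(\mathbf{y} - \operatorname{E}[\mathbf{y}])^T] = \operatorname{E}[\mathbf{x}\mathbf{y}^T] - \operatorname{E}[\mathbf{x}]\operatorname{E}[\mathbf{y}]^T$,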
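  24. 24. Covariance 21 where $m^T$ is the transpose of the vector (or matrix) m. The (i,j)-th element of this matrix is equal to the covariance $\operatorname{Cov}(x_i, y_j)$ between the i-th scalar component of x and the j-th scalar component of y. In particular, Cov(y, x) is the transpose of Cov(x, y). For a vector $\mathbf{x} = (x_1, \ldots, x_m)^T$ of m jointly distributed random variables with finite second moments, its covariance matrix is defined as $\Sigma(\mathbf{x}) = \operatorname{Cov}(\mathbf{x}, \mathbf{x})$. Random variables whose covariance is zero are called uncorrelated. The units of measurement of the covariance Cov(x, y) are those of x times those of y. By contrast, correlation coefficients, which depend on the covariance, are a dimensionless measure of linear dependence. (In fact, correlation coefficients can simply be understood as a normalized version of covariance.) Properties • Variance is a special case of the covariance when the two variables are identical: $\operatorname{Cov}(x, x) = \operatorname{Var}(x) \equiv \sigma^2(x)$. • If x, y, w, and v are real-valued random variables and a, b, c, d are constant ("constant" in this context means non-random), then the following facts are a consequence of the definition of covariance: $\sigma(x, a) = 0$; $\sigma(x, x) = \sigma^2(x)$; $\sigma(x, y) = \sigma(y, x)$; $\sigma(ax, by) = ab\,\sigma(x, y)$; $\sigma(x + a, y + b) = \sigma(x, y)$; $\sigma(ax + by, cw + dv) = ac\,\sigma(x, w) + ad\,\sigma(x, v) + bc\,\sigma(y, w) + bd\,\sigma(y, v)$. For sequences $x_1, \ldots, x_n$ and $y_1, \ldots, y_m$ of random variables, we have $\sigma\left(\sum_{i=1}^{n} x_i, \sum_{j=1}^{m} y_j\right) = \sum_{i=1}^{n} \sum_{j=1}^{m} \sigma(x_i, y_j)$. For a sequence $x_1, \ldots, x_n$ of random variables, and constants $a_1, \ldots, a_n$, we have $\sigma^2\left(\sum_{i=1}^{n} a_i x_i\right) = \sum_{i=1}^{n} a_i^2 \sigma^2(x_i) + 2 \sum_{i<j} a_i a_j \sigma(x_i, x_j)$. A more general identity for covariance matrices Let $\mathbf{x}$ be a random vector, let $\Sigma(\mathbf{x})$ denote its covariance matrix, and let $A$ be a matrix that can act on $\mathbf{x}$. The result of applying this matrix to $\mathbf{x}$ is a new vector with covariance matrix $\Sigma(A\mathbf{x}) = A\,\Sigma(\mathbf{x})\,A^T$. This is a direct result of the linearity of expectation and is useful when applying a linear transformation, such as a whitening transformation, to a vector.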
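The transformation rule for covariance matrices, $\Sigma(A\mathbf{x}) = A\,\Sigma(\mathbf{x})\,A^T$, can be checked numerically. The sketch below is illustrative only: the data, the mixing matrix and the transformation `A` are made-up values. Because the sample covariance is itself bilinear, the identity holds for the sample estimates up to floating-point rounding, not just approximately.

```python
import numpy as np

rng = np.random.default_rng(1)

# 10_000 observations (rows) of a correlated 3-dimensional random vector x.
mixing = np.array([[1.0, 0.5, 0.0],
                   [0.0, 1.0, 0.3],
                   [0.0, 0.0, 1.0]])
x = rng.normal(size=(10_000, 3)) @ mixing

# Hypothetical 2x3 linear transformation applied to each observation.
A = np.array([[2.0, -1.0, 0.0],
              [0.5,  0.0, 1.0]])

sigma_x = np.cov(x, rowvar=False)          # 3x3 sample covariance of x
sigma_Ax = np.cov(x @ A.T, rowvar=False)   # 2x2 sample covariance of A x
predicted = A @ sigma_x @ A.T              # right-hand side of the identity

print(np.allclose(sigma_Ax, predicted))
```

Observations are stored as rows, so applying `A` to every observation is written `x @ A.T`, and `rowvar=False` tells `np.cov` that columns are the variables.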
  25. 25. Covariance 22 Uncorrelatedness and independence If x and y are independent, then their covariance is zero. This follows because under independence, $\operatorname{E}[xy] = \operatorname{E}[x]\operatorname{E}[y]$. The converse, however, is not generally true. For example, let x be uniformly distributed in [-1, 1] and let $y = x^2$. Clearly, x and y are dependent, but $\sigma(x, y) = \sigma(x, x^2) = \operatorname{E}[x \cdot x^2] - \operatorname{E}[x]\operatorname{E}[x^2] = \operatorname{E}[x^3] - \operatorname{E}[x]\operatorname{E}[x^2] = 0 - 0 \cdot \operatorname{E}[x^2] = 0$. In this case, the relationship between y and x is non-linear, while correlation and covariance are measures of linear dependence between two variables. Still, as in the example, if two variables are uncorrelated, that does not imply that they are independent. Relationship to inner products Many of the properties of covariance can be extracted elegantly by observing that it satisfies similar properties to those of an inner product: 1. bilinear: for constants a and b and random variables x, y, z, $\sigma(ax + by, z) = a\,\sigma(x, z) + b\,\sigma(y, z)$; 2. symmetric: $\sigma(x, y) = \sigma(y, x)$; 3. positive semi-definite: $\sigma^2(x) = \sigma(x, x) \ge 0$, and $\sigma(x, x) = 0$ implies that x is a constant random variable (K). In fact these properties imply that the covariance defines an inner product over the quotient vector space obtained by taking the subspace of random variables with finite second moment and identifying any two that differ by a constant. (This identification turns the positive semi-definiteness above into positive definiteness.) That quotient vector space is isomorphic to the subspace of random variables with finite second moment and mean zero; on that subspace, the covariance is exactly the $L^2$ inner product of real-valued functions on the sample space. As a result, for random variables with finite variance, the inequality $|\sigma(x, y)| \le \sqrt{\sigma^2(x)\,\sigma^2(y)}$ holds via the Cauchy–Schwarz inequality. Proof: If $\sigma^2(y) = 0$, then it holds trivially. Otherwise, let random variable $z = x - \frac{\sigma(x, y)}{\sigma^2(y)}\,y$. Then we have $0 \le \sigma^2(z) = \sigma\left(x - \frac{\sigma(x, y)}{\sigma^2(y)}\,y,\; x - \frac{\sigma(x, y)}{\sigma^2(y)}\,y\right) = \sigma^2(x) - \frac{(\sigma(x, y))^2}{\sigma^2(y)}$. QED.
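The example of x uniform on [-1, 1] with y equal to x squared can be reproduced by simulation. In this minimal sketch the sample size, the seed and the 0.5 threshold used to split the data are arbitrary illustrative choices. The sample covariance comes out close to zero, while the conditional averages of y differ sharply, which exposes the dependence.

```python
import numpy as np

rng = np.random.default_rng(7)

x = rng.uniform(-1.0, 1.0, size=1_000_000)
y = x ** 2   # y is a deterministic function of x, so the two are dependent

# Off-diagonal entry of the 2x2 sample covariance matrix of (x, y).
cov_xy = np.cov(x, y)[0, 1]

# Dependence shows up once we condition on x: y is larger when |x| is larger.
mean_y_small = y[np.abs(x) < 0.5].mean()    # theoretical value 1/12
mean_y_large = y[np.abs(x) >= 0.5].mean()   # theoretical value 7/12

print(cov_xy)
print(mean_y_small, mean_y_large)
```

A covariance near zero therefore only rules out a linear relationship, not a relationship in general.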
  26. 26. Covariance 23 Calculating the sample covariance The sample covariance of N observations of K variables is the K-by-K matrix $\mathbf{Q} = [q_{jk}]$ with the entries $q_{jk} = \frac{1}{N-1} \sum_{i=1}^{N} (x_{ij} - \bar{x}_j)(x_{ik} - \bar{x}_k)$, which is an estimate of the covariance between variable j and variable k. The sample mean and the sample covariance matrix are unbiased estimates of the mean and the covariance matrix of the random vector $\mathbf{X}$, a row vector whose jth element (j = 1, ..., K) is one of the random variables. The reason the sample covariance matrix has $N-1$ in the denominator rather than $N$ is essentially that the population mean $\operatorname{E}(\mathbf{X})$ is not known and is replaced by the sample mean $\bar{\mathbf{x}}$. If the population mean $\operatorname{E}(\mathbf{X})$ is known, the analogous unbiased estimate is given by $q_{jk} = \frac{1}{N} \sum_{i=1}^{N} (x_{ij} - \operatorname{E}(X_j))(x_{ik} - \operatorname{E}(X_k))$. Comments The covariance is sometimes called a measure of "linear dependence" between the two random variables. That does not mean the same thing as in the context of linear algebra (see linear dependence). When the covariance is normalized, one obtains the correlation coefficient. From it, one can obtain the Pearson coefficient, which gives us the goodness of the fit for the best possible linear function describing the relation between the variables. In this sense covariance is a linear gauge of dependence. References [1] http://mathworld.wolfram.com/Covariance.html [2] Oxford Dictionary of Statistics, Oxford University Press, 2002, p. 104. External links • Hazewinkel, Michiel, ed. (2001), "Covariance" (http://www.encyclopediaofmath.org/index.php?title=p/c026800), Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4 • MathWorld page on calculating the sample covariance (http://mathworld.wolfram.com/Covariance.html) • Covariance Tutorial using R (http://www.r-tutor.com/elementary-statistics/numerical-measures/covariance)
  27. 27. Autocovariance 24 Autocovariance In statistics, given a real stochastic process X(t), the autocovariance is the covariance of the variable against a time-shifted version of itself. If the process has the mean $\operatorname{E}[X_t] = \mu_t$, then the autocovariance is given by $C_{XX}(t, s) = \operatorname{E}[(X_t - \mu_t)(X_s - \mu_s)] = \operatorname{E}[X_t X_s] - \mu_t \mu_s$, where E is the expectation operator. Stationarity If X(t) is a stationary process, then the following are true: $\mu_t = \mu_s = \mu$ for all t, s and $C_{XX}(t, s) = C_{XX}(s - t) = C_{XX}(\tau)$, where $\tau = s - t$ is the lag time, or the amount of time by which the signal has been shifted. As a result, the autocovariance becomes $C_{XX}(\tau) = \operatorname{E}[(X(t) - \mu)(X(t+\tau) - \mu)] = \operatorname{E}[X(t)\,X(t+\tau)] - \mu^2 = R_{XX}(\tau) - \mu^2$, where $R_{XX}$ represents the autocorrelation in the signal processing sense. Normalization When normalized by dividing by the variance $\sigma^2$, the autocovariance C becomes the autocorrelation coefficient function c,[1] $c_{XX}(\tau) = \frac{C_{XX}(\tau)}{\sigma^2}$. The autocovariance function is itself a version of the autocorrelation function with the mean level removed. If the signal has a mean of 0, the autocovariance and autocorrelation functions are identical.[1] However, often the autocovariance is called autocorrelation even if this normalization has not been performed. The autocovariance can be thought of as a measure of how similar a signal is to a time-shifted version of itself, with an autocovariance of $\sigma^2$ indicating perfect correlation at that lag. The normalisation with the variance will put this into the range [−1, 1]. Properties The autocovariance of a linearly filtered process $Y_t = \sum_{k=-\infty}^{\infty} a_k X_{t+k}$ with real coefficients $a_k$ is $C_{YY}(\tau) = \sum_{k=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} a_k a_l\, C_{XX}(\tau + k - l)$.
  28. 28. Autocovariance 25 References • P. G. Hoel, Mathematical Statistics, Wiley, New York, 1984. • Lecture notes on autocovariance from WHOI [2] [1] Westwick, David T. (2003). Identification of Nonlinear Physiological Systems. IEEE Press. pp. 17–18. ISBN 0-471-27456-9. [2] http://w3eos.whoi.edu/12.747/notes/lect06/l06s02.html Autocorrelation Autocorrelation is the cross-correlation of a signal with itself. Informally, it is the similarity between observations as a function of the time separation between them. It is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal which has been buried under noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals. Definitions [Figure: A plot showing 100 random numbers with a "hidden" sine function, and an autocorrelation (correlogram) of the series on the bottom.] Different fields of study define autocorrelation differently, and not all of these definitions are equivalent. In some fields, the term is used interchangeably with autocovariance. Statistics In statistics, the autocorrelation of a random process describes the correlation between values of the process at different times, as a function of the two times or of the time difference. Let X be some repeatable process, and i be some point in time after the start of that process. (i may be an integer for a discrete-time process or a real number for a continuous-time process.) [Figure: Visual comparison of convolution, cross-correlation and autocorrelation.] Then $X_i$ is the value (or realization) produced by a given run of the process at time i. Suppose that the process is further known to have defined values for mean $\mu_i$ and variance $\sigma_i^2$ for all times i. Then the definition of the autocorrelation between times s and t is $R(s, t) = \frac{\operatorname{E}[(X_t - \mu_t)(X_s - \mu_s)]}{\sigma_t \sigma_s}$, where "E" is the expected value operator. 
Note that this expression is not well-defined for all time series or processes, because the variance may be zero (for a constant process) or infinite. If the function R is well-defined, its value must lie in the range [−1, 1], with 1 indicating perfect correlation and −1 indicating perfect anti-correlation.

If $X_t$ is a second-order stationary process then the mean $\mu$ and the variance $\sigma^2$ are time-independent, and further the autocorrelation depends only on the difference between t and s: the correlation depends only on the time-distance between the pair of values but not on their position in time. This further implies that the autocorrelation can be expressed as a function of the time-lag, and that this would be an even function of the lag $\tau = s - t$. This gives the more familiar form

$$R(\tau) = \frac{E[(X_t - \mu)(X_{t+\tau} - \mu)]}{\sigma^2},$$

and the fact that this is an even function can be stated as

$$R(\tau) = R(-\tau).$$

It is common practice in some disciplines, other than statistics and time series analysis, to drop the normalization by $\sigma^2$ and use the term "autocorrelation" interchangeably with "autocovariance". However, the normalization is important both because the interpretation of the autocorrelation as a correlation provides a scale-free measure of the strength of statistical dependence, and because the normalization has an effect on the statistical properties of the estimated autocorrelations.

Signal processing

In signal processing, the above definition is often used without the normalization, that is, without subtracting the mean and dividing by the variance. When the autocorrelation function is normalized by mean and variance, it is sometimes referred to as the autocorrelation coefficient.[1]

Given a signal $f(t)$, the continuous autocorrelation $R_{ff}(\tau)$ is most often defined as the continuous cross-correlation integral of $f(t)$ with itself, at lag $\tau$:

$$R_{ff}(\tau) = (f(t) * \overline{f}(-t))(\tau) = \int_{-\infty}^{\infty} f(t+\tau)\,\overline{f}(t)\,dt = \int_{-\infty}^{\infty} f(t)\,\overline{f}(t-\tau)\,dt,$$

where $\overline{f}$ represents the complex conjugate and $*$ represents convolution. For a real function, $\overline{f} = f$.

The discrete autocorrelation $R_{xx}$ at lag $j$ for a discrete signal $x_n$ is

$$R_{xx}(j) = \sum_n x_n \, \overline{x}_{n-j}.$$

The above definitions work for signals that are square integrable, or square summable, that is, of finite energy.
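The discrete definition for a finite-energy signal can be sketched directly as a double loop. This is only an illustration under stated assumptions: the signal is taken to be zero outside its recorded samples, and the function name `autocorr_direct` and the example sequence are chosen for this example.

```python
import numpy as np

def autocorr_direct(x):
    """R_xx(j) = sum_n x[n] * conj(x[n - j]) for lags j = -(N-1)..(N-1).

    The signal is assumed to be zero outside the indices 0..N-1.
    """
    x = np.asarray(x)
    n = x.size
    lags = np.arange(-(n - 1), n)
    values = []
    for j in lags:
        total = 0
        for i in range(n):
            if 0 <= i - j < n:  # skip terms that fall outside the signal
                total += x[i] * np.conj(x[i - j])
        values.append(total)
    return lags, np.array(values)

lags, r = autocorr_direct([2, 3, 1])
print(lags)  # [-2 -1  0  1  2]
print(r)     # [ 2  9 14  9  2]
```

For real input the result is symmetric about lag 0, and NumPy's `np.correlate(x, x, mode="full")` returns the same values in the same lag order.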
Signals that "last forever" are treated instead as random processes, in which case different definitions are needed, based on expected values. For wide-sense-stationary random processes, the autocorrelations are defined as

$$R_{ff}(\tau) = E\left[f(t)\,\overline{f}(t-\tau)\right], \qquad R_{xx}(j) = E\left[x_n \, \overline{x}_{n-j}\right].$$

For processes that are not stationary, these will also be functions of $t$, or $n$.

For processes that are also ergodic, the expectation can be replaced by the limit of a time average. The autocorrelation of an ergodic process is sometimes defined as or equated to[1]

$$R_{ff}(\tau) = \lim_{T \to \infty} \frac{1}{T} \int_0^T f(t+\tau)\,\overline{f}(t)\,dt, \qquad R_{xx}(j) = \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} x_n \, \overline{x}_{n-j}.$$

These definitions have the advantage that they give sensible well-defined single-parameter results for periodic functions, even when those functions are not the output of stationary ergodic processes.

Alternatively, signals that last forever can be treated by a short-time autocorrelation function analysis, using finite time integrals. (See short-time Fourier transform for a related process.)
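As a worked illustration of the time-average definition for a periodic function, consider the real sinusoid $f(t) = A\cos(\omega t + \phi)$ with $\omega \neq 0$. The symbols $A$, $\omega$ and $\phi$ are introduced here only for this example.

```latex
\begin{aligned}
R_{ff}(\tau)
  &= \lim_{T\to\infty} \frac{1}{T}\int_0^T
     A\cos\bigl(\omega (t+\tau) + \phi\bigr)\, A\cos(\omega t + \phi)\, dt \\
  &= \lim_{T\to\infty} \frac{A^2}{2T}\int_0^T
     \bigl[\cos(\omega\tau) + \cos(2\omega t + \omega\tau + 2\phi)\bigr]\, dt \\
  &= \frac{A^2}{2}\cos(\omega\tau).
\end{aligned}
```

The second term integrates to a bounded quantity, so it vanishes after division by $T$. The result depends only on the lag $\tau$, not on the phase $\phi$, and is periodic with the same period as $f$.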
Multi-dimensional autocorrelation is defined similarly. For example, in three dimensions the autocorrelation of a square-summable discrete signal would be

$$R(j,k,\ell) = \sum_{n,q,r} x_{n,q,r} \, \overline{x}_{n-j,\,q-k,\,r-\ell}.$$

When mean values are subtracted from signals before computing an autocorrelation function, the resulting function is usually called an auto-covariance function.

Properties

In the following, we will describe properties of one-dimensional autocorrelations only, since most properties are easily transferred from the one-dimensional case to the multi-dimensional cases.

• A fundamental property of the autocorrelation is symmetry, $R(i) = R(-i)$, which is easy to prove from the definition. In the continuous case, the autocorrelation is an even function, $R_f(-\tau) = R_f(\tau)$, when $f$ is a real function, and the autocorrelation is a Hermitian function, $R_f(-\tau) = R_f^*(\tau)$, when $f$ is a complex function.
• The continuous autocorrelation function reaches its peak at the origin, where it takes a real value, i.e. for any delay $\tau$, $|R_f(\tau)| \le R_f(0)$. This is a consequence of the Cauchy–Schwarz inequality. The same result holds in the discrete case.
• The autocorrelation of a periodic function is, itself, periodic with the same period.
• The autocorrelation of the sum of two completely uncorrelated functions (the cross-correlation is zero for all $\tau$) is the sum of the autocorrelations of each function separately.
• Since autocorrelation is a specific type of cross-correlation, it maintains all the properties of cross-correlation.
• The autocorrelation of a continuous-time white noise signal will have a strong peak (represented by a Dirac delta function) at $\tau = 0$ and will be absolutely 0 for all other $\tau$.
• The Wiener–Khinchin theorem relates the autocorrelation function to the power spectral density via the Fourier transform:

$$R(\tau) = \int_{-\infty}^{\infty} S(f)\, e^{j 2\pi f \tau}\, df, \qquad S(f) = \int_{-\infty}^{\infty} R(\tau)\, e^{-j 2\pi f \tau}\, d\tau.$$

• For real-valued functions, the symmetric autocorrelation function has a real symmetric transform, so the Wiener–Khinchin theorem can be re-expressed in terms of real cosines only:

$$R(\tau) = \int_{-\infty}^{\infty} S(f) \cos(2\pi f \tau)\, df, \qquad S(f) = \int_{-\infty}^{\infty} R(\tau) \cos(2\pi f \tau)\, d\tau.$$
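The Wiener–Khinchin relation in the last two properties suggests a numerical route: transform the data, form the squared magnitude of the spectrum, and transform back. The sketch below is one possible way to do this with NumPy. The zero-padding to length 2n − 1 is an assumption added here so that the FFT's circular wrap-around does not mix positive and negative lags, and the name `autocorr_fft` is hypothetical.

```python
import numpy as np

def autocorr_fft(x):
    """Linear autocorrelation of a real sequence via FFT, lags -(n-1)..(n-1)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    nfft = 2 * n - 1                  # zero-pad: circular result equals linear result
    spectrum = np.fft.rfft(x, nfft)   # FFT of the (padded) data
    power = spectrum * np.conj(spectrum)   # squared magnitude of the spectrum
    r = np.fft.irfft(power, nfft)     # lags 0..n-1, then negative lags wrapped around
    return np.roll(r, n - 1)          # reorder so the lags run from -(n-1) to n-1

x = np.array([2.0, 3.0, 1.0])
print(np.round(autocorr_fft(x), 6))   # symmetric, with its peak at the centre (lag 0)
```

The output is symmetric and peaks at lag 0, in line with the first two properties listed above, and it agrees with a direct evaluation such as `np.correlate(x, x, mode="full")` up to rounding error.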
Efficient computation

For data expressed as a discrete sequence, it is frequently necessary to compute the autocorrelation with high computational efficiency. The brute force method based on the definition can be used. For example, to calculate the autocorrelation of the real sequence $x = (2, 3, 1)$, we employ the usual multiplication method with right shifts:

      2  3  1
    × 2  3  1
    ----------------
      2  3  1
         6  9  3
            4  6  2
    ----------------
      2  9 14  9  2

Thus the required autocorrelation is (2,9,14,9,2). In this calculation we do not perform the carry-over operation during addition because the vector has been defined over a field of real numbers. Note that we can halve the number of operations required by exploiting the inherent symmetry of the autocorrelation.

While the brute force algorithm is of order $n^2$, several efficient algorithms exist which can compute the autocorrelation in order $n \log(n)$. For example, the Wiener–Khinchin theorem allows computing the autocorrelation from the raw data X(t) with two Fast Fourier transforms (FFT)[2]:

    F_R(f) = FFT[X(t)]
    S(f) = F_R(f) F_R*(f)
    R(τ) = IFFT[S(f)]

where IFFT denotes the inverse Fast Fourier transform. The asterisk denotes complex conjugate.

Alternatively, a multiple τ correlation can be performed by using brute force calculation for low τ values, and then progressively binning the X(t) data with a logarithmic density to compute higher values, resulting in the same $n \log(n)$ efficiency, but with lower memory requirements.

Estimation

For a discrete process of length $n$ defined as $\{X_1, X_2, \ldots, X_n\}$ with known mean and variance, an estimate of the autocorrelation may be obtained as

$$\hat{R}(k) = \frac{1}{(n-k)\,\sigma^2} \sum_{t=1}^{n-k} (X_t - \mu)(X_{t+k} - \mu)$$

for any positive integer $k < n$. When the true mean $\mu$ and variance $\sigma^2$ are known, this estimate is unbiased. If the true mean and variance of the process are not known there are several possibilities:

• If $\mu$ and $\sigma^2$ are replaced by the standard formulae for sample mean and sample variance, then this is a biased estimate.
• A periodogram-based estimate replaces $n-k$ in the above formula with $n$.
This estimate is always biased;[3][4] however, it usually has a smaller mean square error.
• Other possibilities derive from treating the two portions of data $\{X_1, X_2, \ldots, X_{n-k}\}$ and $\{X_{k+1}, X_{k+2}, \ldots, X_n\}$ separately and calculating separate sample means and/or sample variances for use in defining the estimate.

The advantage of estimates of the last type is that the set of estimated autocorrelations, as a function of $k$, then form a function which is a valid autocorrelation in the sense that it is possible to define a theoretical process having exactly that autocorrelation. Other estimates can suffer from the problem that, if they are used to calculate the variance of a linear combination of the $X$'s, the variance calculated may turn out to be negative.

Regression analysis

In regression analysis using time series data, autocorrelation of the errors is a problem. Autocorrelation of the errors, which themselves are unobserved, can generally be detected because it produces autocorrelation in the observable residuals. (Errors are also known as "error terms" in econometrics.)

Autocorrelation violates the ordinary least squares (OLS) assumption that the error terms are uncorrelated. While it does not bias the OLS coefficient estimates, the standard errors tend to be underestimated (and the t-scores overestimated) when the autocorrelations of the errors at low lags are positive.

The traditional test for the presence of first-order autocorrelation is the Durbin–Watson statistic or, if the explanatory variables include a lagged dependent variable, Durbin's h statistic. A more flexible test, covering autocorrelation of higher orders and applicable whether or not the regressors include lags of the dependent variable, is the Breusch–Godfrey test. This involves an auxiliary regression, wherein the residuals obtained from estimating the model of interest are regressed on (a) the original regressors and (b) k lags of the residuals, where k is the order of the test. The simplest version of the test statistic from this auxiliary regression is $TR^2$, where $T$ is the sample size and $R^2$ is the coefficient of determination. Under the null hypothesis of no autocorrelation, this statistic is asymptotically distributed as $\chi^2$ with k degrees of freedom.
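The auxiliary-regression procedure described above can be sketched with plain NumPy least squares. This is a minimal illustration, not a reference implementation: the function names, the simulated data, and the choice to fill pre-sample lagged residuals with zeros are all assumptions made for the example.

```python
import numpy as np

def ols_residuals(X, y):
    """Residuals from an ordinary least squares fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def breusch_godfrey_stat(X, y, k):
    """T * R^2 from regressing the residuals on X and k lags of the residuals."""
    e = ols_residuals(X, y)
    T = e.size
    # Lagged residuals; missing pre-sample values are set to zero (assumed convention).
    lagged = [np.concatenate((np.zeros(j), e[:-j])) for j in range(1, k + 1)]
    Z = np.column_stack([X] + lagged)
    u = ols_residuals(Z, e)
    centered = e - e.mean()
    r2 = 1.0 - (u @ u) / (centered @ centered)
    return T * r2

# Hypothetical data: one regressor plus a constant, with two kinds of errors.
rng = np.random.default_rng(1)
T = 400
x = rng.standard_normal(T)
X = np.column_stack((np.ones(T), x))

white = rng.standard_normal(T)      # uncorrelated errors
ar1 = np.zeros(T)                   # AR(1) errors with coefficient 0.7
shocks = rng.standard_normal(T)
for t in range(1, T):
    ar1[t] = 0.7 * ar1[t - 1] + shocks[t]

stat_white = breusch_godfrey_stat(X, 1.0 + 2.0 * x + white, k=1)
stat_ar1 = breusch_godfrey_stat(X, 1.0 + 2.0 * x + ar1, k=1)
print(stat_white, stat_ar1)
```

For k = 1 the statistic is compared with a chi-squared distribution with one degree of freedom, whose 5% critical value is about 3.84. The strongly autocorrelated errors produce a statistic far above that threshold.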
Responses to nonzero autocorrelation include generalized least squares and the Newey–West HAC estimator (Heteroskedasticity and Autocorrelation Consistent).[5]

Applications

• One application of autocorrelation is the measurement of optical spectra and the measurement of very-short-duration light pulses produced by lasers, both using optical autocorrelators.
• Autocorrelation is used to analyze dynamic light scattering data, which notably makes it possible to determine the particle size distributions of nanometer-sized particles or micelles suspended in a fluid. A laser shining into the mixture produces a speckle pattern that results from the motion of the particles. Autocorrelation of the signal can be analyzed in terms of the diffusion of the particles. From this, knowing the viscosity of the fluid, the sizes of the particles can be calculated.
• The small-angle X-ray scattering intensity of a nanostructured system is the Fourier transform of the spatial autocorrelation function of the electron density.
• In optics, normalized autocorrelations and cross-correlations give the degree of coherence of an electromagnetic field.
• In signal processing, autocorrelation can give information about repeating events like musical beats (for example, to determine tempo) or pulsar frequencies, though it cannot tell the position in time of the beat. It can also be used to estimate the pitch of a musical tone.
• In music recording, autocorrelation is used as a pitch detection algorithm prior to vocal processing, as a distortion effect or to eliminate undesired mistakes and inaccuracies.[6]
• Autocorrelation in space rather than time, via the Patterson function, is used by X-ray diffractionists to help recover the "Fourier phase information" on atom positions not available through diffraction alone.
• In statistics, spatial autocorrelation between sample locations also helps one estimate mean value uncertainties when sampling a heterogeneous population.
• The SEQUEST algorithm for analyzing mass spectra makes use of autocorrelation in conjunction with cross-correlation to score the similarity of an observed spectrum to an idealized spectrum representing a peptide.
• In astrophysics, auto-correlation is used to study and characterize the spatial distribution of galaxies in the Universe and in multi-wavelength observations of low-mass X-ray binaries.
• In panel data, spatial autocorrelation refers to correlation of a variable with itself through space.
• In analysis of Markov chain Monte Carlo data, autocorrelation must be taken into account for correct error determination.

References

[1] Patrick F. Dunn, Measurement and Data Analysis for Engineering and Science, New York: McGraw–Hill, 2005. ISBN 0-07-282538-3.
[2] Box, G. E. P., G. M. Jenkins, and G. C. Reinsel. Time Series Analysis: Forecasting and Control. 3rd ed. Upper Saddle River, NJ: Prentice–Hall, 1994.
[3] M. B. Priestley, Spectral Analysis and Time Series (London, New York: Academic Press, 1982).
[4] Percival, Donald B.; Andrew T. Walden (1993). Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques. Cambridge University Press. pp. 190–195. ISBN 0-521-43541-2.
[5] Christopher F. Baum (2006). An Introduction to Modern Econometrics Using Stata (http://books.google.com/?id=acxtAylXvGMC&pg=PA141&dq=newey-west-standard-errors+generalized-least-squares). Stata Press. ISBN 1-59718-013-0.
[6] Tyrangiel, Josh (2009-02-05). "Auto-Tune: Why Pop Music Sounds Perfect" (http://www.time.com/time/magazine/article/0,9171,1877372,00.html). Time Magazine.

External links

• Weisstein, Eric W., "Autocorrelation" (http://mathworld.wolfram.com/Autocorrelation.html) from MathWorld.
• Autocorrelation articles in Comp.DSP (DSP usenet group). (http://www.dsprelated.com/comp.dsp/keyword/Autocorrelation.php)
• GPU accelerated calculation of autocorrelation function. (http://www.iop.org/EJ/abstract/1367-2630/11/9/093024/)
Cross-correlation

In signal processing, cross-correlation is a measure of similarity of two waveforms as a function of a time-lag applied to one of them. This is also known as a sliding dot product or sliding inner-product. It is commonly used for searching a long signal for a shorter, known feature. It also has applications in pattern recognition, single particle analysis, electron tomographic averaging, cryptanalysis, and neurophysiology.

For continuous functions, f and g, the cross-correlation is defined as:

$$(f \star g)(t) \stackrel{\mathrm{def}}{=} \int_{-\infty}^{\infty} f^*(\tau)\, g(t+\tau)\, d\tau,$$

where $f^*$ denotes the complex conjugate of f. Similarly, for discrete functions, the cross-correlation is defined as:

$$(f \star g)[n] \stackrel{\mathrm{def}}{=} \sum_{m=-\infty}^{\infty} f^*[m]\, g[n+m].$$

The cross-correlation is similar in nature to the convolution of two functions. In an autocorrelation, which is the cross-correlation of a signal with itself, there will always be a peak at a lag of zero unless the signal is a trivial zero signal.

[Figure: visual comparison of convolution, cross-correlation and autocorrelation.]

In probability theory and statistics, correlation is always used to include a standardising factor in such a way that correlations have values between −1 and +1, and the term cross-correlation is used for referring to the correlation corr(X, Y) between two random variables X and Y, while the "correlation" of a random vector X is considered to be the correlation matrix (matrix of correlations) between the scalar elements of X.

If $X$ and $Y$ are two independent random variables with probability density functions f and g, respectively, then the probability density of the difference $Y - X$ is formally given by the cross-correlation (in the signal-processing sense) $f \star g$; however, this terminology is not used in probability and statistics. In contrast, the convolution $f * g$ (equivalent to the cross-correlation of f(t) and g(−t)) gives the probability density function of the sum $X + Y$.

Explanation

As an example, consider two real-valued functions $f$ and $g$ differing only by an unknown shift along the x-axis.
One can use the cross-correlation to find how much $g$ must be shifted along the x-axis to make it identical to $f$. The formula essentially slides the $g$ function along the x-axis, calculating the integral of their product at each position. When the functions match, the value of $(f \star g)$ is maximized. This is because when peaks (positive areas) are aligned, they make a large contribution to the integral. Similarly, when troughs (negative areas) align, they also make a positive contribution to the integral because the product of two negative numbers is positive.

With complex-valued functions $f$ and $g$, taking the conjugate of $f$ ensures that aligned peaks (or aligned troughs) with imaginary components will contribute positively to the integral.

In econometrics, lagged cross-correlation is sometimes referred to as cross-autocorrelation.[1]
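The shift-finding idea in this explanation can be sketched for sampled signals as follows. The function name `estimate_shift`, the Gaussian pulse and the 7-sample delay are assumptions chosen for the example. With NumPy's convention, `np.correlate(g, f, "full")` evaluates the sum of g[n + lag] · f[n] over n for each lag.

```python
import numpy as np

def estimate_shift(f, g):
    """Lag (in samples) at which the cross-correlation of f and g is largest."""
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    xc = np.correlate(g, f, mode="full")       # one value per candidate lag
    lags = np.arange(-(f.size - 1), g.size)    # lag belonging to each entry of xc
    return lags[np.argmax(xc)]

# Hypothetical signals: a Gaussian pulse and a copy delayed by 7 samples.
n = np.arange(64)
f = np.exp(-0.5 * ((n - 20) / 3.0) ** 2)
g = np.roll(f, 7)      # g[n] = f[n - 7]; the pulse is far from the array edges

shift = estimate_shift(f, g)
print(shift)           # 7
```

A positive result means that g lags behind f by that many samples. Shifting g back by the estimated amount aligns the two pulses.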
Properties

• The correlation of functions f(t) and g(t) is equivalent to the convolution of f*(−t) and g(t). I.e.:

$$f \star g = f^*(-t) * g.$$

• If f is Hermitian, then $f \star g = f * g$.
• $(f \star g) \star (f \star g) = (f \star f) \star (g \star g)$.
• Analogous to the convolution theorem, the cross-correlation satisfies:

$$\mathcal{F}\{f \star g\} = (\mathcal{F}\{f\})^* \cdot \mathcal{F}\{g\},$$

where $\mathcal{F}$ denotes the Fourier transform, and an asterisk again indicates the complex conjugate. Coupled with fast Fourier transform algorithms, this property is often exploited for the efficient numerical computation of cross-correlations. (See circular cross-correlation.)
• The cross-correlation is related to the spectral density. (See Wiener–Khinchin theorem.)
• The cross-correlation of a convolution of f and h with a function g is the convolution of the correlation of f and g with the time-reversed (and, for complex h, conjugated) kernel h:

$$(f * h) \star g = h^*(-t) * (f \star g).$$

Normalized cross-correlation

For image-processing applications in which the brightness of the image and template can vary due to lighting and exposure conditions, the images can be first normalized. This is typically done at every step by subtracting the mean and dividing by the standard deviation. That is, the cross-correlation of a template $t(x,y)$ with a subimage $f(x,y)$ is

$$\frac{1}{n} \sum_{x,y} \frac{(f(x,y) - \bar{f})(t(x,y) - \bar{t})}{\sigma_f \sigma_t},$$

where $n$ is the number of pixels in $t(x,y)$ and $f(x,y)$, $\bar{f}$ is the average of f and $\sigma_f$ is the standard deviation of f (and likewise $\bar{t}$ and $\sigma_t$ for t). In functional analysis terms, this can be thought of as the dot product of two normalized vectors. That is, if

$$F(x,y) = f(x,y) - \bar{f}$$

and

$$T(x,y) = t(x,y) - \bar{t},$$

then the above sum is equal to

$$\left\langle \frac{F}{\|F\|}, \frac{T}{\|T\|} \right\rangle,$$

where $\langle \cdot, \cdot \rangle$ is the inner product and $\|\cdot\|$ is the L² norm. Thus, if f and t are real matrices, their normalized cross-correlation equals the cosine of the angle between the unit vectors F and T, being thus 1 if and only if F equals T multiplied by a positive scalar.

Normalized correlation is one of the methods used for template matching, a process used for finding incidences of a pattern or object within an image. It is also the 2-dimensional version of the Pearson product-moment correlation coefficient.
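The inner-product form of the normalized cross-correlation can be sketched for a single template position as below. The function name and the random test patch are assumptions for illustration. A full template-matching routine would repeat this computation at every candidate position in the image.

```python
import numpy as np

def normalized_cross_correlation(f, t):
    """Cosine of the angle between the mean-removed patch f and template t."""
    f = np.asarray(f, dtype=float)
    t = np.asarray(t, dtype=float)
    F = f - f.mean()   # subtract the mean brightness of the subimage
    T = t - t.mean()   # subtract the mean brightness of the template
    # For 2-D arrays np.linalg.norm defaults to the Frobenius (L2) norm.
    return float(np.sum(F * T) / (np.linalg.norm(F) * np.linalg.norm(T)))

# Hypothetical 8x8 patch and a brighter, higher-contrast copy of it.
rng = np.random.default_rng(2)
patch = rng.random((8, 8))
brighter = 2.0 * patch + 5.0

same = normalized_cross_correlation(patch, brighter)
inverted = normalized_cross_correlation(patch, -patch)
print(same, inverted)
```

Because the mean is removed and the result is scaled by both norms, a change of brightness offset or positive contrast gain leaves the score at 1, while an inverted patch scores −1. The value coincides with the Pearson correlation coefficient of the flattened pixel values.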