Discussiones Mathematicae
Probability and Statistics 32 (2012) 47–58
doi:10.7151/dmps.1143
NORMALITY ASSUMPTION FOR THE LOG-RETURN OF
THE STOCK PRICES
Pedro P. Mota 1
Department of Mathematics
CMA of FCT/UNL
e-mail: pjpm@fct.unl.pt

1 This work was partially supported by the Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) through PEst-OE/MAT/UI0297/2011 (CMA).
Abstract
The normality of the log-returns of stock prices is one of the most important assumptions in mathematical finance. Usually it is assumed that the price dynamics of the stocks are driven by geometric Brownian motion and, in that case, the log-returns of the prices are independent and normally distributed. For instance, this is one of the main assumptions of the Black-Scholes model and of the Black-Scholes pricing formula [4]. In this paper we will investigate whether this assumption is verified in the real world, that is, for a large number of company stock prices we will test the normality assumption for the log-returns of their prices. We will apply the Kolmogorov-Smirnov [10, 5], the Shapiro-Wilk [17, 16] and the Anderson-Darling [1, 2] tests for normality to the prices of a wide number of companies quoted in the Nasdaq composite index.
Keywords: Anderson-Darling, Black-Scholes, Geometric Brownian motion, Kolmogorov-Smirnov, Log-return, Normality test, Shapiro-Wilk.
2010 Mathematics Subject Classification: 62F03, 62F10, 62G30, 62Q05.
1. Introduction
The assumption that the observed data from a certain phenomenon belong to a certain distribution is not new, and a large number of goodness-of-fit tests can be applied under different conditions. There is one area where the assumption of normality of the data is of extreme importance, namely mathematical finance, where in many models the prices of the stocks are assumed to follow a geometric Brownian motion process and the log-returns of those prices will have a normal distribution. This assumption is fundamental, for instance, in the definition of the Black-Scholes model and in the deduction of the Black-Scholes pricing formula for European options. It is easy to understand this importance if we note that the Black-Scholes pricing formula is widely used in the pricing of derivative products in financial (derivative) markets and the value of those markets is of the order of trillions of dollars. We will test the assumption of normality of the log-returns of the historical stock prices for more than 1000 companies chosen from the Nasdaq composite index; these data are available from the site finance.yahoo.com. From the roughly 2500 companies that compose the index, we chose the ones with a mean daily transaction volume bigger than 50,000 units in the considered year. We will test the normality for prices with different time intervals between observations: daily prices, weekly prices and monthly prices. We will work with the data from individual years and from grouped years in a range from January 2000 to December 2011. We will test the data using the Kolmogorov-Smirnov, Shapiro-Wilk/Shapiro-Francia and the Anderson-Darling goodness-of-fit tests for normality.
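As an illustration of the data handling just described, the following Python sketch computes log-returns at the three observation frequencies. It is not part of the paper: the prices are simulated here as a stand-in for the closing prices downloaded from finance.yahoo.com, and the pandas resampling rules are assumptions made only for this example.

```python
import numpy as np
import pandas as pd

# Simulated daily closing prices standing in for one company's history
# (in the study the prices come from finance.yahoo.com, January 2000 to December 2011).
dates = pd.bdate_range("2000-01-03", "2011-12-30")
rng = np.random.default_rng(0)
prices = pd.Series(100.0 * np.exp(np.cumsum(rng.normal(0.0003, 0.02, len(dates)))),
                   index=dates)

def log_returns(p: pd.Series) -> pd.Series:
    """Log-returns Ln(X_t / X_s) between consecutive observations."""
    return np.log(p / p.shift(1)).dropna()

daily_lr = log_returns(prices)                         # daily log-returns
weekly_lr = log_returns(prices.resample("W").last())   # weekly log-returns
monthly_lr = log_returns(prices.resample("M").last())  # monthly log-returns

print(len(daily_lr), len(weekly_lr), len(monthly_lr))
```

Samples of this kind, for each company, year grouping and observation frequency, are the inputs to the goodness-of-fit tests described in Section 3.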
2. Geometric Brownian motion and Black-Scholes model
In mathematical finance it is common to use models in which the stock price dynamics are the solution of the stochastic differential equation

\[
dX_t = \mu X_t\,dt + \sigma X_t\,dB_t, \tag{1}
\]

where Bt is the Brownian motion process. For an overview of Brownian motion and stochastic differential equations, see [8] and [12]. The solution of the previous equation is the so-called geometric Brownian motion

\[
X_t = X_s\, e^{\left(\mu - \frac{1}{2}\sigma^2\right)(t-s) + \sigma(B_t - B_s)}, \quad s < t, \tag{2}
\]
and from this representation we can write

\[
\operatorname{Ln}\frac{X_t}{X_s} = \left(\mu - \frac{1}{2}\sigma^2\right)(t-s) + \sigma(B_t - B_s) \sim N\!\left(\left(\mu - \frac{1}{2}\sigma^2\right)(t-s);\; \sigma^2(t-s)\right), \tag{3}
\]

because Bt − Bs ∼ N(0, t − s).
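Relation (3) can be checked numerically. The short sketch below is not part of the paper and uses arbitrary parameter values: it simulates a geometric Brownian motion path through representation (2) and compares the sample mean and variance of its daily log-returns with the theoretical values (µ − σ²/2)∆t and σ²∆t.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma, x0 = 0.10, 0.25, 100.0     # arbitrary drift, volatility and initial price
dt, n = 1.0 / 252.0, 100_000          # daily time step, number of increments

# Exact simulation of the geometric Brownian motion (2):
# X_{t+dt} = X_t * exp((mu - sigma^2 / 2) * dt + sigma * (B_{t+dt} - B_t)).
brownian_increments = np.sqrt(dt) * rng.standard_normal(n)
log_increments = (mu - 0.5 * sigma**2) * dt + sigma * brownian_increments
prices = x0 * np.exp(np.cumsum(log_increments))

# By (3), the log-returns Ln(X_{t+dt} / X_t) should be N((mu - sigma^2/2) dt, sigma^2 dt).
log_returns = np.diff(np.log(prices), prepend=np.log(x0))
print("sample mean    :", log_returns.mean(), "theory:", (mu - 0.5 * sigma**2) * dt)
print("sample variance:", log_returns.var(ddof=1), "theory:", sigma**2 * dt)
```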
One of the reasons for using this process when modeling stock prices is that with this process one can introduce the Black-Scholes model, where a financial market, see [4] or [12], is composed of two assets, a risk free asset with price dynamics

\[
dR_t = r R_t\,dt, \tag{4}
\]

where r is the risk free rate of return, and a risky asset with price dynamics

\[
dX_t = \mu X_t\,dt + \sigma X_t\,dB_t, \tag{5}
\]

where µ and σ > 0 are real numbers. In this context a contingent claim with exercise date T is any random variable χ = Φ(XT), measurable with respect to the σ-field FT = σ({Xt : 0 ≤ t ≤ T}).
An example of a contingent claim is the European call option with exercise price K and exercise date T, on the underlying asset with price process X. This claim is a contract that gives the holder of the option the right, but not the obligation, to buy one share of the asset X at price K from the underwriter of the option, at time T (and only at the precise time T).
For a contingent claim Φ(XT), the arbitrage free price, at time t, is given by the formula

\[
F(t, x) = e^{-r(T-t)}\, E^{Q}_{t,x}\!\left[\Phi(X_T)\right], \tag{6}
\]

with X verifying the equation

\[
dX_u = r X_u\,du + \sigma X_u\,dB_u, \quad X_t = x, \tag{7}
\]

under the (martingale) measure Q. For a European call option with strike price K and maturity T this formula gives the well known Black-Scholes formula for the option price Π(t) = F(t, Xt), where

\[
F(t, x) = x\, N[d_1(t, x)] - e^{-r(T-t)} K\, N[d_2(t, x)]. \tag{8}
\]
With N[.] the N(0, 1) distribution function and with

\[
d_1(t, x) = \frac{\ln(x/K) + \left(r + \frac{\sigma^2}{2}\right)(T-t)}{\sigma\sqrt{T-t}}, \tag{9}
\]

\[
d_2(t, x) = d_1(t, x) - \sigma\sqrt{T-t}. \tag{10}
\]

This formula is widely used in the pricing of European options.
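As a sketch of how formulas (8)-(10) are used in practice (the function below and its parameter values are illustrative and not part of the paper), the Black-Scholes price of a European call can be computed directly:

```python
from math import exp, log, sqrt
from statistics import NormalDist

def black_scholes_call(x: float, K: float, r: float, sigma: float, tau: float) -> float:
    """European call price F(t, x) from (8)-(10); tau = T - t is the time to maturity."""
    N = NormalDist().cdf                      # N(0, 1) distribution function
    d1 = (log(x / K) + (r + 0.5 * sigma**2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return x * N(d1) - exp(-r * tau) * K * N(d2)

# Example: spot 100, strike 100, r = 2%, sigma = 25%, half a year to maturity.
print(black_scholes_call(x=100.0, K=100.0, r=0.02, sigma=0.25, tau=0.5))
```

A Monte Carlo average of the discounted payoffs e^{-r(T-t)} max(X_T - K, 0), with X_T simulated under (7), converges to the same value, which is a useful sanity check of formula (6).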
3. Goodness-of-fit tests
We will apply several goodness-of-fit tests. Both the Kolmogorov-Smirnov and the
Anderson-Darling tests use the cumulative distribution function and the empirical
distribution function; they are based on a measure of the discrepancy between those
two functions and therefore belong to the class of “distance tests”. Among the
advantages of this kind of test are that they are easy to compute, that they are
more powerful than the χ² (Chi-Square) test over a wide range of alternatives, and that
they provide consistent tests. The Shapiro-Wilk (or Shapiro-Francia) test is
based on the regression of the order statistics of the sample on the expected
values of the order statistics from the tested distribution. This test for
normality has higher power than the previous ones.
3.1. Kolmogorov-Smirnov test
The Kolmogorov-Smirnov statistic provides a means of testing whether a set of
observations comes from some completely specified continuous distribution F0(x).
The test statistic is given by
D = max_{1≤k≤n} max {|F0(Xk:n) − Sn(Xk:n)| , |F0(Xk:n) − Sn(Xk−1:n)|} ,
(11)
where F0 is the distribution function of the distribution being tested, Sn is the
empirical distribution function and X1:n, X2:n, . . . , Xn:n are the order statistics.
Tables with the critical values for this statistic can be found in many texts, for
instance in [3] or [11]. However, when certain parameters of the distribution,
such as the mean or the standard deviation, need to be estimated from the sample,
the commonly tabulated critical points are of no use: if the test is used with
these critical values the results will be conservative, that is, the probability
of rejecting the null hypothesis when the hypothesis is true will be smaller than
it should be. One way to overcome this problem is to use the critical values
presented by Lilliefors in [10]; alternatively, for those who prefer not to use
extensive tables, a modification of the test statistic, see [18], dispenses with
the usual tables of percentage points. In our case, only when considering one
year of monthly data do we work with samples of small dimension; for larger
samples the table values are obtained from simple formulas depending on the
sample dimension n, so we do not need extensive tables. In our problem, when
testing the normality of the log-returns of the prices, we need to estimate the
mean and the standard deviation from the data, and because of that we use
Lilliefors's critical values: if the observed value of the D statistic exceeds
the critical value in the table, we reject the hypothesis that the observations
come from a normal population.
Table 1. Lilliefors's critical values for samples of dimension n = 12 and n > 30
Percentage points
n α = .20 α = .15 α = .10 α = .05 α = .01
n = 12 .199 .212 .223 .242 .275
n > 30 .736/√n .768/√n .805/√n .886/√n 1.031/√n
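For illustration, a minimal implementation of the D statistic (11) with estimated mean and standard deviation, compared against the large-sample Lilliefors critical value from Table 1, might look as follows (the function name lilliefors_ks and the simulated sample are ours, not from the paper):

import numpy as np
from scipy.stats import norm

def lilliefors_ks(sample):
    # D statistic (11) for normality with mean and standard deviation
    # estimated from the sample, compared with the large-sample Lilliefors
    # critical value 1.031/sqrt(n) at the 1% level (Table 1, n > 30).
    x = np.sort(sample)
    n = len(x)
    f0 = norm.cdf(x, loc=x.mean(), scale=x.std(ddof=1))
    s_upper = np.arange(1, n + 1) / n      # S_n evaluated at X_{k:n}
    s_lower = np.arange(0, n) / n          # S_n evaluated at X_{k-1:n}
    d = max(np.max(np.abs(f0 - s_upper)), np.max(np.abs(f0 - s_lower)))
    return d, d > 1.031 / np.sqrt(n)

rng = np.random.default_rng(2)
d_stat, reject = lilliefors_ks(rng.standard_normal(252))
print(d_stat, reject)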
3.2. Shapiro-Wilk test
In the Shapiro-Wilk test, presented in the paper [17], the test statistic W is
constructed through the regression of the sample order statistics on the expected
normal order statistics. The test statistic W is defined by
W = (Σ_{i=1}^{n} ai Xi:n)² / Σ_{i=1}^{n} (Xi:n − X̄)²,
(12)
where
a^T = (a1, a2, . . . , an) = m^T V^{−1} / (m^T V^{−1} V^{−1} m)^{1/2},
(13)
where
m^T = (m1, m2, . . . , mn), V = [vij]_{n×n}
(14)
represent the vector of expected values of standard normal order statistics and
the corresponding covariance matrix, respectively. Tables of the expected values
of the order statistics mi, i = 1, . . . , n, for sample sizes n = 2(1)100(25)300(50)400
can be found in [7]. The values of a and the percentage points of W are known
up to sample sizes of n = 50 and can be found in the original paper [17]. For
samples of larger dimension an extension of the Shapiro and Wilk test can be
found in [13]; alternatively, the Shapiro-Francia statistic W′ (with simpler
coefficients), introduced in [16] and defined by
coefficients) introduced in [16] and defined by
W0
=
(
Pn
i=1 biXi:n)2
Pn
i=1(Xi:n − X̄)2
,
(15)
where
b^T = (b1, b2, . . . , bn) = m^T / (m^T m)^{1/2},
(16)
can be used. This test is proved to be consistent, see [15]. Values for b can be
computed from the expected values of the order statistics tabulated in [7], and
percentage points for W′ can be found in [16] for sample sizes n = 35, 50, 51(2)99.
For samples of larger dimension, percentage points can be found in the paper [14],
where an extension algorithm for W′ is presented. Just for illustrative purposes
we present a few percentage points for the W statistic.
Table 2. Shapiro-Wilk’s critical values for samples of dimension n = 12 and n = 30
Percentage points
n α = .01 α = .02 α = .05 α = .10
12 .805 .828 .859 .883
30 .900 .912 .927 .939
For this test, small values of W are significant, i.e. indicate non-normality.
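In practice one rarely computes the coefficients a by hand; for instance, scipy.stats.shapiro implements Royston's large-sample extension [13] of the W test. A sketch of its use on one year of daily log-returns (with a heavy-tailed toy sample standing in for real data) could be:

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Heavy-tailed toy sample standing in for one year of daily log-returns.
log_returns = 0.02 * rng.standard_t(df=3, size=252)

w_stat, p_value = stats.shapiro(log_returns)
print(f"W = {w_stat:.4f}, p-value = {p_value:.2e}")
print("reject normality at the 1% level:", p_value < 0.01)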
3.3. Anderson-Darling test
The Anderson-Darling test is the third and last goodness-of-fit test that
we will apply to test for normality. This test, presented in the papers [1] and
[2], compares the observed cumulative distribution function to the expected cu-
mulative distribution function, as the Kolmogorov-Smirnov test does. It gives
more weight to the tails of the distribution than the Kolmogorov-Smirnov test,
that is, it is more sensitive to deviations between the empirical and the
theoretical distributions in the tails.
The test statistic A² is defined by
A² = −n − (1/n) Σ_{i=1}^{n} (2i − 1) [ln(F0(Xi:n)) + ln(1 − F0(Xn−i+1:n))],
(17)
where, as before, F0 represents the distribution function of the distribution to be
tested. In the paper [2], asymptotic critical points for significance levels of 1%, 5%
and 10% are presented; extensive tables of critical points, obtained from Monte Carlo
simulation, can be found in [9], and critical points obtained from a saddlepoint
approximation to the distribution function can be found in [6]. For the situation
where the distribution to be tested is normal or exponential and the parameters of
the distribution must be estimated, critical points can be found in the papers
[19, 20] or [21].
Table 3. Anderson-Darling’s asymptotic critical values
Percentage points
α = .01 α = .05 α = .10
3.857 2.492 1.933
For this test, it is the large values of A² that are significant, that is, that
indicate non-normality.
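A sketch of the corresponding computation with scipy, which evaluates A² as in (17) with estimated parameters and reports critical values in the spirit of [19, 20], might be (toy data, illustrative only):

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Heavy-tailed toy sample standing in for one year of daily log-returns.
log_returns = 0.02 * rng.standard_t(df=3, size=252)

result = stats.anderson(log_returns, dist="norm")
critical_1pct = result.critical_values[-1]   # 1% significance level
print(f"A^2 = {result.statistic:.3f}, 1% critical value = {critical_1pct:.3f}")
print("reject normality at the 1% level:", result.statistic > critical_1pct)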
4. Data testing
In this section we present the results of the normality tests. As mentioned above,
we tested the assumption of normality of the log-returns of the historical stock
prices of a large number of companies chosen from the Nasdaq composite index.
The data are available from the site finance.yahoo.com; from the roughly 2500
companies that compose the index, we chose the ones whose daily transaction
volume exceeds, on average, 50,000 units in the considered data interval. We
test normality for prices with different time intervals between observations:
daily, weekly and monthly prices. We work with the data from individual years
and from grouped years in a range from the beginning of 2000 to the end of 2011,
and we test the data using the three goodness-of-fit tests for normality presented
in the previous section.
First, we considered the daily prices for each of the years 2005 through 2011;
as mentioned, in each year we only test the data from the companies with a mean
daily transaction volume above 50,000 units in that year, and the number of
selected companies is denoted by N. After finding that a large majority of the
samples were rejected, we decided to repeat the normality tests with weekly
data. The results obtained are presented in the following tables.
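A schematic version of the counting procedure behind Tables 4-6 could be the sketch below; the dictionary returns_by_company, mapping tickers to arrays of log-returns, is a hypothetical data structure, and the code is illustrative rather than the author's own.

import numpy as np
from scipy import stats
from statsmodels.stats.diagnostic import lilliefors

ALPHA = 0.01

def rejection_counts(returns_by_company):
    # returns_by_company: dict mapping a ticker to the array of log-returns
    # of that company for the period under study (hypothetical structure).
    counts = {"K.S.": 0, "S.W.": 0, "A.D.": 0}
    for r in returns_by_company.values():
        if lilliefors(r, dist="norm")[1] < ALPHA:
            counts["K.S."] += 1
        if stats.shapiro(r)[1] < ALPHA:
            counts["S.W."] += 1
        ad = stats.anderson(r, dist="norm")
        if ad.statistic > ad.critical_values[-1]:   # 1% critical value
            counts["A.D."] += 1
    n = len(returns_by_company)
    return {test: (c, 100.0 * c / n) for test, c in counts.items()}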
Using daily and weekly prices, the Kolmogorov-Smirnov test and a level of
significance of 1% we get the results presented in Table 4.
Table 4. Total (and percentage) of samples rejected by the K.S. test for normality at 1%
level of significance (daily and weekly prices)
Number of rejected companies % of rejected companies
Year N Daily Prices Weekly Prices Daily Prices Weekly Prices
2005 1029 730 149 70.94% 14.48%
2006 1162 835 150 71.86% 12.91%
2007 1322 1038 216 78.52% 16.34%
2008 1366 1163 288 85.14% 21.08%
2009 1385 1067 201 77.04% 14.51%
2010 1471 927 171 63.02% 11.62%
2011 1568 1128 211 71.94% 13.46%
2007-2011 1322 1321 1037 99.92% 78.44%
Using daily and weekly prices, the Shapiro-Wilk test and a level of significance
of 1%, we get the results presented in Table 5; for the Anderson-Darling test
the results are presented in Table 6.
Table 5. Total (and percentage) of samples rejected by the S.W. test for normality at 1%
level of significance (daily and weekly prices)
Number of rejected companies % of rejected companies
Year N Daily Prices Weekly Prices Daily Prices Weekly Prices
2005 1029 917 310 89.12% 30.13%
2006 1162 1035 301 89.01% 25.90%
2007 1322 1240 422 93.80% 31.92%
2008 1366 1324 499 96.93% 36.53%
2009 1385 1278 387 92.27% 27.94%
2010 1471 1205 332 81.92% 22.57%
2011 1568 1458 389 92.98% 24.81%
2007-2011 1322 1322 1241 100% 93.87%
Table 6. Total (and percentage) of samples rejected by the A.D. test for normality at 1%
level of significance (daily and weekly prices)
Number of rejected companies % of rejected companies
Year N Daily Prices Weekly Prices Daily Prices Weekly Prices
2005 1029 863 244 83.87% 23.71%
2006 1162 972 229 83.65% 19.71%
2007 1322 1196 354 90.47% 26.78%
2008 1366 1292 462 94.58% 33.82%
2009 1385 1261 300 91.05% 21.66%
2010 1471 1166 276 79.27% 18.76%
2011 1568 1387 315 88.46% 20.09%
2007-2011 1322 1322 1196 100% 90.47%
Looking at the results from all the normality tests, it seems reasonable to conclude
that the daily data in all the considered years do not follow normal distributions.
However, when we consider weekly prices in the same years, the percentage of
companies for which the normality assumption is rejected is much smaller. A
question then arises: is the high/low percentage of rejections due to considering
daily/weekly prices, or is the difference due to the difference in sample dimension?
When we consider daily data we have approximately 252 observations, but when we
consider weekly prices the sample dimension is 52. To answer this question we
repeat the tests using weekly prices from 2007 to 2011, that is,
for samples of dimension 5 × 52 = 260; in that situation high rejection
percentages are obtained again, as can be verified in the last line of the previous
Tables (4, 5 and 6). So it seems that weekly data can be assumed to be normally
distributed only when we have small samples. Finally, turning the question around,
what happens with small samples of daily data? For samples of smaller dimension,
corresponding to daily prices for each quarter of the years 2010 and 2011
(samples of dimension approximately 63), we obtain the results presented in Table 7.
Table 7. Samples rejected by the three tests for normality, at 1% level of significance, for
daily prices for each quarter of the years of 2010 and 2011
Number of rejected companies % of rejected companies
Quarter N K.S. S.W. A.D. K.S. S.W. A.D.
First 2010 1471 357 612 503 24.27% 41.60% 34.19%
Second 2010 1471 1456 1550 1539 92.86% 98.85% 98.15%
Third 2010 1471 1340 1455 1435 91.09% 98.91% 97.55%
Fourth 2010 1471 418 669 570 28.42% 45.48% 38.75%
First 2011 1568 384 648 532 24.49% 41.33% 33.93%
Second 2011 1568 1065 1407 1334 67.92% 89.73% 85.08%
Third 2011 1568 702 1081 969 44.77% 68.94% 61.80%
Fourth 2011 1568 228 444 329 14.54% 28.32% 20.98%
For small samples of daily prices we observe radical changes from one quarter
to the next. In some quarters we observe the same behavior as for the weekly
prices in small samples, that is, small percentages of rejection, but in others we
continue to observe high percentages of rejection of normality.
For a final confirmation of the failure of normality, we also tested monthly
prices. For monthly prices we start with small samples, considering the data from
each year separately (12 observations), and then we group several years in order
to build larger samples (2007−2011 ⇒ 60 observations; 2005−2011 ⇒ 84 observations;
2000−2011 ⇒ 144 observations). The results are presented in Table 8.
Once again, as the sample dimension increases, the percentage of normality
rejections also increases, just as happened with the daily and weekly price
data.
Observation 1. The decreasing number of selected companies as we go from the
year 2011 back to the year 2000 is explained by the fact that we start by
considering the companies that belong to the Nasdaq composite index in 2011,
and not all of those companies belonged to the index in the previous years.
Table 8. Samples rejected by the three tests for normality, at 1% level of significance, for
monthly prices in different years
Number of rejected companies % of rejected companies
Year(s) N K.S. S.W. A.D. K.S. S.W. A.D.
2005 1029 24 40 20 2.33% 3.89% 1.94%
2006 1162 19 38 28 1.64% 3.27% 2.41%
2007 1322 22 57 35 1.66% 4.31% 2.65%
2008 1366 38 69 47 2.78% 5.05% 3.44%
2009 1385 38 76 50 2.74% 5.49% 3.61%
2010 1471 18 41 31 1.22% 2.79% 2.11%
2011 1568 25 64 42 1.59% 4.08% 2.68%
2007-2011 1322 188 416 296 14.22% 31.47% 22.39%
2005-2011 1029 224 428 337 21.77% 41.59% 32.75%
2000-2011 752 347 551 478 46.14% 73.27% 63.56%
5. Final remarks
The normality of the logarithm of stock prices is widely accepted in financial
models. However, our study suggests that the assumption of normality fails for a
high percentage of the company prices from the Nasdaq composite index. This
seems to be true even when we consider different observation intervals, that
is, when we consider weekly or even monthly prices. In fact, the number of
observations, that is, the sample dimension, matters more when testing
normality than whether we consider daily, weekly or monthly prices. One
can argue that, to use the Black-Scholes pricing formula, we can use the implied
volatility (the σ parameter), an approach that does not require historical
prices to estimate it. Even so, the Black-Scholes formula is derived under the
assumption that the stock price follows the geometric Brownian motion dynamics,
and therefore the normality assumption should still be verified.
References
[1] T.W. Anderson and D.A. Darling, Asymptotic theory of certain goodness-of-fit cri-
teria based on stochastic processes, Ann. Math. Statist. 23 (1952) 193–212.
[2] T.W. Anderson and D.A. Darling, A test of goodness of fit, Journal of the American
Statistical Association 49 (268) (1954) 765–769.
[3] Z.W. Birnbaum, Numerical tabulation of the distribution of Kolmogorov’s statistic
for finite sample size, Journal of the American Statistical Association 47 (259)
(1952) 425–441.
[4] T. Bjork, Arbitrage Theory in Continuous Time (Oxford University Press, 1998).
[5] D.A. Darling, The Kolmogorov-Smirnov, Cramer-von Mises tests, Ann. Math.
Statist. 28 (4) (1957) 823–838.
[6] D.E.A. Giles, A saddlepoint approximation to the distribution function of the
Anderson-Darling test statistic, Communications in Statistics - Simulation and Com-
putation 30 (4) (2001) 899–905.
[7] H.L. Harter, Expected values of normal order statistics, Biometrika 48 (1–2) (1961)
151–165.
[8] I. Karatzas and S.E. Shreve, Brownian Motion and Stochastic Calculus (Springer-Verlag,
2000).
[9] P.A. Lewis, Distribution of the Anderson-Darling statistic, Ann. Math. Statist. 32
(4) (1961) 1118–1124.
[10] H.W. Lilliefors, On the Kolmogorov-Smirnov test for normality with mean and vari-
ance unknown, Journal of the American Statistical Association 62 (318) (1967)
399–402.
[11] F.J. Massey, The Kolmogorov-Smirnov test for goodness of fit, Journal of the Amer-
ican Statistical Association 46 (253) (1951) 68–78.
[12] B. Øksendal, Stochastic Differential Equations (Springer-Verlag, 1998).
[13] J.P. Royston, An extension of Shapiro and Wilk’s W test for normality to large
samples, Journal of the Royal Statistical Society. Series C (Applied Statistics) 31
(1982) 115–124.
[14] J.P. Royston, A simple method for evaluating the Shapiro-Francia W’ test of non-
normality, Journal of the Royal Statistical Society. Series D (The Statistician) 32
(1983) 287–300.
[15] K. Sarkadi, The consistency of the Shapiro-Francia test, Biometrika 62 (2) (1975)
445–450.
[16] S.S. Shapiro and R.S. Francia, An approximate analysis of variance test for normal-
ity, Journal of the American Statistical Association 67 (1972) 215–216.
[17] S.S. Shapiro and M. Wilk, An analysis of variance test for normality (Complete
Samples), Biometrika 52 (1965) 591–611.
[18] M.A. Stephens, Use of the Kolmogorov-Smirnov, Cramer-Von Mises and related
statistics without extensive tables, Journal of the Royal Statistical Society. Series B
(Methodological) 32 (1) (1970) 115–122.
[19] M.A. Stephens, EDF statistics for goodness of fit and some comparisons, Journal of
the American Statistical Association 69 (347) (1974) 730–737.
[20] M.A. Stephens, Asymptotic results for goodness-of-fit statistics with unknown pa-
rameters, Annals of Statistics 4 (2) (1976) 357–369.
[21] M.A. Stephens, Goodness of Fit with Special Reference to Tests for Exponentiality,
Technical Report No. 262 (Department of Statistics, Stanford University, Stanford,
CA, 1977).
Received 18 May 2012