Chapter 1
STATISTICAL CONCEPTS AND MARKET RETURNS

Statistical methods provide a set of tools for analyzing data and drawing conclusions on asset returns, earnings, growth rates, commodity prices, or any other financial data. In this chapter, we will study four properties of return distributions: a) where the returns are centered (central tendency); b) how far returns are dispersed (dispersion); c) whether the distribution of returns is symmetric or lopsided (skewness); and d) whether extreme outcomes are likely (kurtosis).

Terminologies:
Population – all elements of a specific group
Parameter – quantity used to describe a population, such as the mean, range, or variance
Sample – a subset of a population
Statistic – quantity used to describe a sample

Measurement Scales
1. Nominal scale - categorizes each member of the population or sample, using an integer as a label for each category. The categories are not ranked.
2. Ordinal scale - each member of the population or sample is placed into a category, and these categories are ranked (ordered) with respect to some characteristic.
3. Interval scale - each member is assigned a number from a scale. It provides not only ranking but also assurance that the differences between scale values are equal.
4. Ratio scale - has all the characteristics of the interval scale as well as a true zero point as the origin.

Frequency Distribution
A frequency distribution is a tabular display of data summarized into intervals. It helps in the analysis of large amounts of statistical data and can be used with all types of measurement scales. We use frequency distributions to summarize rates of return, the fundamental units that analysts and portfolio managers use for making investment decisions.
To analyze rates of return, we first compute the total return, or holding period return, for time period t, $R_t$:

$$R_t = \frac{P_t - P_{t-1} + D_t}{P_{t-1}}$$

where:
$P_t$ = price per share at the end of time period t
$P_{t-1}$ = price per share at the end of time period t – 1, the time period immediately preceding time period t
$D_t$ = cash distributions received during time period t
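The holding period return formula above can be sketched in a few lines of Python. The function name `holding_period_return` and the prices used are hypothetical and chosen only for illustration; they do not come from the text.

```python
def holding_period_return(p_end: float, p_begin: float, dist: float = 0.0) -> float:
    """Total return for one period: (P_t - P_{t-1} + D_t) / P_{t-1}."""
    if p_begin <= 0:
        raise ValueError("beginning price must be positive")
    return (p_end - p_begin + dist) / p_begin


# Hypothetical figures: a share bought at 50 ends the period at 54
# and pays a cash distribution of 1 during the period.
r = holding_period_return(p_end=54.0, p_begin=50.0, dist=1.0)
print(f"{r:.2%}")  # 10.00%
```

The return has two parts: the price change (4/50 = 8%) and the cash distribution (1/50 = 2%).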
Construction of a Frequency Distribution
1. Calculate the range of the data.
2. Decide on the number of classes in the frequency distribution, k, usually between 5 and 15.
3. Determine the interval width as Range/k.
4. Determine class boundaries.
5. Count the number of observations falling in each interval.
6. Compute the class midpoints.
7. Construct a table of the intervals, listed from smallest to largest, that shows the number of observations falling in each interval.

Relative Frequency
Relative frequency is the absolute frequency of each interval divided by the total number of observations.

Cumulative Relative Frequency
Cumulative relative frequency adds up the relative frequencies from the first class to the last class. It tells us the fraction of observations that are less than the upper limit of each class interval.

Graphical Presentation of Data
Histogram – a bar chart of data from a frequency distribution
Frequency Polygon – a line graph of the class midpoints (x-axis) against the absolute frequency of each class interval (y-axis)
Cumulative Frequency graph – a line graph of the cumulative relative frequency (y-axis) against the upper class limit (x-axis)

Measures of Position/Location
1. Measures of Central Tendency – summarize the location on which the data are centered.
a. Mean
i. Arithmetic mean – the sum of the observations divided by the number of observations
Population mean is the arithmetic mean value of a population. For N observations, where $X_i$ is the ith observation, the population mean, $\mu$, is computed as

$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

Sample mean is the arithmetic mean computed for a sample. For n observations in the sample, the sample mean, $\bar{X}$, is computed as
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

ii. Weighted mean – weights each value in the distribution according to its importance
The weighted mean for a set of observations $X_1, X_2, \ldots, X_n$ with corresponding weights $w_1, w_2, \ldots, w_n$ (which sum to 1) is computed as

$$\bar{X}_w = \sum_{i=1}^{n} w_i X_i$$

Example 1.1: An investment manager with P10,000,000 to invest allocates P6,500,000 to equities and P3,500,000 to bonds. If the portfolio has a weight of 0.7 on equities and 0.3 on bonds, what is the return on this portfolio?
Solution:
$$\bar{X}_w = 0.7(\text{P}6{,}500{,}000) + 0.3(\text{P}3{,}500{,}000) = \text{P}5{,}600{,}000$$
This value is the portfolio's return on the stock and bond investments.
This example illustrates the general principle that a portfolio return (past data) is a weighted sum. The weighted mean can also be used for forward-looking data, in which case it is called the expected value or expected return, and the weights are probabilities taken from forecasts.

iii. Geometric mean – most frequently used to average rates of change over time or to compute the growth rate of a variable, such as a time series of rates of return on an asset or a portfolio, or the growth rate of a financial variable such as earnings or sales.
The geometric mean, G, of a set of observations $X_1, X_2, \ldots, X_n$, where each $X_i$ is greater than or equal to 0, is computed as

$$G = \sqrt[n]{X_1 X_2 \cdots X_n}$$

When the data are returns in a time series, we compute the geometric mean return, $R_G$, or compound return over the time period spanned by returns $R_1$ through $R_T$, as

$$1 + R_G = \sqrt[T]{(1 + R_1)(1 + R_2) \cdots (1 + R_T)}$$

or

$$R_G = \left[(1 + R_1)(1 + R_2) \cdots (1 + R_T)\right]^{1/T} - 1$$

Example 1.2: Calculate the geometric mean return of the total returns of the ABC company from 2000 to 2005: 16.2%, 20.3%, 9.8%, -11%, 1.6%, -13.5%.
$$R_G = \left[(1.162)(1.203)(1.098)(0.89)(1.016)(0.865)\right]^{1/6} - 1 = 0.030929545 = 3.093\%$$
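The weighted mean and geometric mean return formulas above can be checked with a short Python sketch. The function names are hypothetical, and the requirement that weights sum to 1 is an assumption of this sketch; the input figures are those of Examples 1.1 and 1.2.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * X_i, assuming the weights sum to 1."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights are assumed to sum to 1")
    return sum(w * x for w, x in zip(weights, values))


def geometric_mean_return(returns):
    """Compound (geometric mean) return over T periods, returns as decimals."""
    if not returns:
        raise ValueError("at least one return is required")
    if any(r < -1.0 for r in returns):
        raise ValueError("a return below -100% is not meaningful here")
    growth = math.prod(1.0 + r for r in returns)  # (1+R_1)(1+R_2)...(1+R_T)
    return growth ** (1.0 / len(returns)) - 1.0


# Example 1.1: weights 0.7 and 0.3 applied to the two amounts.
print(weighted_mean([6_500_000, 3_500_000], [0.7, 0.3]))

# Example 1.2: ABC company total returns, 2000 to 2005.
abc_returns = [0.162, 0.203, 0.098, -0.11, 0.016, -0.135]
print(f"{geometric_mean_return(abc_returns):.3%}")  # 3.093%
```

The arithmetic mean of the same six returns is 3.9%. The geometric mean return is never larger than the arithmetic mean, and the gap widens as returns become more variable.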
iv. Harmonic mean – a special type of weighted mean in which an observation's weight is inversely proportional to its magnitude. It is appropriate when averaging ratios (amount per unit). It is used in cost averaging, which involves the periodic investment of a fixed amount of money.
The harmonic mean of a set of observations $X_1, X_2, \ldots, X_n$, where each $X_i > 0$, is computed as

$$\bar{X}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$

Example 1.3: Suppose an investor purchases $5000 of a security each month for 3 months. The share prices are $15, $10, and $12 at the three purchase dates. What is the average price paid for the security?
Solution:
$$\bar{X}_H = \frac{3}{\frac{1}{15} + \frac{1}{10} + \frac{1}{12}} = \frac{3}{\frac{15}{60}} = \$12$$
The average price paid for the security is $12.

b. Median – the value of the middle item of a set of items that has been sorted into ascending or descending order. In an odd-numbered sample of n items, the median is in the (n+1)/2 position. In an even-numbered sample, the median is the mean of the values in the n/2 and (n+2)/2 positions.
c. Mode – the most frequently occurring value in a distribution, or the value or values with the highest frequency.

2. Quantiles – describe the location of data by identifying values at or below which specified proportions of the data lie. They are used in portfolio performance evaluation as well as in investment strategy development and research.
a. Percentiles – divide the n observations of the distribution into hundredths
b. Quartiles – divide the n observations of the distribution into quarters, where the divisions are Q1 (P25), the median Md (P50), and Q3 (P75)
c. Quintiles – divide the n observations of the distribution into fifths, where the divisions are P20, P40, P60, and P80
d. Deciles – divide the n observations of the distribution into tenths, where the divisions are P10, P20, P30, P40, P50, P60, P70, P80, and P90

Measures of Dispersion
1. Range (R) – the difference between the maximum and minimum values in a set of observations
2. Mean Absolute Deviation (MAD) – average of the absolute deviations around the mean.
$$MAD = \frac{\sum_{i=1}^{n} \left| X_i - \bar{X} \right|}{n}$$

3. Variance – average of the squared deviations around the mean
a. Population variance

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

where:
$X_i$ = ith observation
$\mu$ = population mean

b. Sample variance

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

where:
$X_i$ = ith observation
$\bar{X}$ = sample mean

4. Standard Deviation – positive square root of the variance

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

5. Semivariance – average squared deviation below the mean

$$s_b^2 = \frac{\sum_{X_i < \bar{X}} (X_i - \bar{X})^2}{n^* - 1}$$

where $n^*$ is the number of observations below the mean.

6. Semideviation or semistandard deviation – positive square root of the semivariance

$$s_b = \sqrt{\frac{\sum_{X_i < \bar{X}} (X_i - \bar{X})^2}{n^* - 1}}$$

Chebyshev's Inequality
The proportion of the observations within k standard deviations of the arithmetic mean is at least $1 - 1/k^2$ for all k > 1.
The table below illustrates the proportion of the observations that must lie within a certain number of standard deviations around the sample mean.
Proportions from Chebyshev's Inequality

k       Interval around the sample mean    Proportion
1.25    $\bar{X} \pm 1.25s$                36%
1.50    $\bar{X} \pm 1.50s$                56%
2.00    $\bar{X} \pm 2.00s$                75%
2.50    $\bar{X} \pm 2.50s$                84%
3.00    $\bar{X} \pm 3.00s$                89%
4.00    $\bar{X} \pm 4.00s$                94%

Example 1.4: The arithmetic mean and the standard deviation of monthly returns of ABC investments were 0.95% and 6.5%, respectively, from 1950 to 2009, a total of 720 monthly observations.
a) Determine the interval that must contain at least 75% of monthly returns.
b) What are the minimum and maximum number of observations that must lie in the interval in (a)?
Solution:
a) At least 75% of the observations must lie within 2 standard deviations of the mean. Thus, the interval that must contain at least 75% of the observations for the monthly return series is 0.95% ± 2(6.5%) = 0.95% ± 13%, or -12.05% to 13.95%.
b) For a sample size of 720, at least 0.75(720) = 540 observations must lie in the interval from -12.05% to 13.95%. Chebyshev's inequality gives only a lower bound, so the maximum is all 720 observations.

Coefficient of Variation (CV) – the ratio of the standard deviation of a set of observations to their mean value. It is a measure of relative dispersion, that is, the amount of dispersion relative to a reference value or benchmark. When the observations are returns, the coefficient of variation measures the amount of risk (standard deviation) per unit of mean return. Thus, in financial analysis, the greater the CV of returns, the riskier the investment.

$$CV = \frac{s}{\bar{X}}$$

Example 1.5: The table summarizes the annual mean returns and standard deviations for several major US asset classes from 1926 to 2002.

Asset Class                Arithmetic Mean Return (%)    Standard Deviation of Return (%)
S&P 500                    12.3                          21.9
US small stock             16.9                          35.1
US long-term corporate     6.1                           7.2
US long-term government    5.8                           8.2
US 30-day T-bill           3.8                           0.9

a) Determine the coefficient of variation for each asset class.
b) Which asset class is most risky? Least risky?
c) Determine whether there is more difference between the absolute risk (standard deviation) or the relative risk (CV) of the S&P 500 and US small stocks.
Solution:
a) S&P 500: CV = 21.9/12.3 = 1.78
US small stock: CV = 35.1/16.9 = 2.077
US long-term corporate: CV = 7.2/6.1 = 1.18
US long-term government: CV = 8.2/5.8 = 1.414
US 30-day T-bill: CV = 0.9/3.8 = 0.237
b) US small stock is the most risky, while the US 30-day T-bill is the least risky.
c) The standard deviation of US small stock returns is (35.1 – 21.9)/21.9 = 0.603 = 60.3% larger than that of S&P 500 returns, whereas its CV is only (2.077 – 1.78)/1.78 = 0.167 = 16.7% larger. The difference in absolute risk is therefore greater than the difference in relative risk.

Sharpe Ratio or Reward-to-Variability Ratio
The Sharpe ratio is widely used in investment performance measurement to measure excess return per unit of risk. The Sharpe ratio for a portfolio is calculated as

$$S_h = \frac{\bar{R}_p - \bar{R}_F}{s_p}$$

where:
$\bar{R}_p$ = mean return to the portfolio, p
$\bar{R}_F$ = mean return to a risk-free asset
$s_p$ = standard deviation of return on the portfolio

The numerator of the Sharpe ratio (called the mean excess return on portfolio p) measures the extra reward that investors receive for the added risk taken. Moreover, a portfolio's Sharpe ratio decreases if we increase risk, all else equal. Risk-averse investors who make decisions based on mean return and standard deviation of return prefer portfolios with larger Sharpe ratios to those with smaller Sharpe ratios.

Example 1.6: Using the table in Example 1.5, consider the performance of the S&P 500 and US small stocks. Taking the mean US T-bill return to represent the risk-free rate (least risky), we find the Sharpe ratios as
S&P 500: $S_h$ = (12.3 – 3.8)/21.9 = 0.39
US small stocks: $S_h$ = (16.9 – 3.8)/35.1 = 0.37
US small stocks earned higher mean returns but performed slightly less well than the S&P 500 per unit of risk.

Measures of Shape – measure the degree of symmetry and peakedness in return distributions.
1. Normal Distribution – a symmetrical, bell-shaped distribution in which the mean, median, and mode are equal, and which is completely described by two parameters, its mean and variance.
If a return distribution is symmetric about the mean, equal loss and gain intervals exhibit the same frequencies; that is, losses from -4% to -2% occur with about the same frequency as gains from 2% to 4%.
[Figure: bell-shaped normal distribution curve, horizontal axis marked from -3 to 3]

2. Skewed distribution – a distribution that is not symmetric about its mean. Sample skewness, $S_K$, is computed by the following formula:

$$S_K = \left[\frac{n}{(n-1)(n-2)}\right] \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{s^3}$$

where:
n = the number of observations in the sample
s = the sample standard deviation

a. Positively skewed distribution
A return distribution with positive skew has frequent small losses and a few extreme gains. The mode is less than the median, which is less than the mean. $S_K > 0$
b. Negatively skewed distribution
A return distribution with negative skew has frequent small gains and a few extreme losses. The mean is less than the median, which is less than the mode. $S_K < 0$

3. Kurtosis – measures the peakedness of a distribution and provides information about the probability of extreme outcomes. A return distribution with excess kurtosis differs from a normal distribution by having more returns clustered closely around the mean and more returns with large deviations from the mean (fatter tails). Investors would perceive a greater chance of extremely large deviations from the mean as increasing risk. Excess kurtosis, $K_E$, is calculated by the following formula:

$$K_E = \left[\frac{n(n+1)}{(n-1)(n-2)(n-3)}\right] \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4} - \frac{3(n-1)^2}{(n-2)(n-3)}, \quad n < 100$$
$$K_E \approx \frac{1}{n} \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4} - 3, \quad n \geq 100$$

a. Leptokurtic (B) – a distribution that is more peaked than the normal distribution, K > 3 or $K_E$ > 0.
b. Mesokurtic (A) – a distribution identical to a normal distribution, K = 3 or $K_E$ = 0.
c. Platykurtic (C) – a distribution that is less peaked than the normal distribution, K < 3 or $K_E$ < 0.
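The sample skewness formula and the small-sample (n < 100) excess kurtosis formula given above can be sketched in Python as follows. The function names and the two data sets are hypothetical and serve only to illustrate the signs of the results; they are not taken from the text or the exercises.

```python
import math


def _mean_and_std(xs):
    """Sample mean and sample standard deviation (n - 1 in the denominator)."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
    return mean, s


def sample_skewness(xs):
    """S_K = [n / ((n-1)(n-2))] * sum((X_i - mean)^3) / s^3."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean, s = _mean_and_std(xs)
    cubed = sum((x - mean) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * cubed / s ** 3


def sample_excess_kurtosis(xs):
    """Small-sample K_E, the formula the text gives for n < 100."""
    n = len(xs)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean, s = _mean_and_std(xs)
    fourth = sum((x - mean) ** 4 for x in xs)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * fourth / s ** 4 - correction


symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]       # evenly spread, no skew
right_tail = [1.0, 1.0, 2.0, 2.0, 3.0, 10.0]  # one extreme gain

print(sample_skewness(symmetric))         # 0.0
print(sample_excess_kurtosis(symmetric))  # about -1.2 (platykurtic)
print(sample_skewness(right_tail) > 0)    # True (positive skew)
```

A flat, evenly spread sample gives negative excess kurtosis, while a single extreme gain is enough to make the skewness positive, in line with the definitions above.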
Exercise 1.1
Name: __________________________ Year & Sec: _____________________ Score: ____________

State the type of scale used to measure the following sets of data:
1. sales
2. investment style of mutual funds
3. an analyst's rating of a stock in a portfolio as underweight, market weight, or overweight
4. a measure of the risk of portfolios on a scale of 1 (very conservative) to 5 (very risky)
5. credit ratings for bond issues
6. cash dividends per share
7. hedge fund classification
8. bond maturity in years
Exercise 1.2

The table below gives the deviations of a hypothetical portfolio's annual total returns (gross of fees) from its benchmark's annual returns, for a 12-year period.

Portfolio's Deviations from Benchmark Return
Year    Deviation from benchmark (%)
1992    -7.14
1993    1.62
1994    2.48
1995    -2.59
1996    9.37
1997    -0.55
1998    -0.89
1999    -9.19
2000    -5.11
2001    -0.49
2002    6.84
2003    3.04

a. Make a frequency distribution for the portfolio's deviations from benchmark return using k = 6.
b. Calculate the frequency, cumulative frequency, relative frequency, and cumulative relative frequency for the portfolio's deviations from benchmark return.
c. Construct a histogram using the data.
d. Identify the modal interval of the grouped data.
Exercise 1.3

The table below gives the deviations of a hypothetical portfolio's annual total returns (gross of fees) from its benchmark's annual returns, for a 12-year period.

Portfolio's Deviations from Benchmark Return
Year    Deviation from benchmark (%)
1992    -7.14
1993    1.62
1994    2.48
1995    -2.59
1996    9.37
1997    -0.55
1998    -0.89
1999    -9.19
2000    -5.11
2001    -0.49
2002    6.84
2003    3.04

a. Calculate the sample mean return.
b. Calculate the median return.
c. Calculate the geometric mean.
d. Calculate P25, P40, and P80.
e. Determine the range, MAD, variance, and standard deviation.
f. Determine the semivariance and semideviation.
Exercise 1.4

The table below gives the deviations of a hypothetical portfolio's annual total returns (gross of fees) from its benchmark's annual returns, for a 12-year period.

Portfolio's Deviations from Benchmark Return
Year    Deviation from benchmark (%)
1992    -7.14
1993    1.62
1994    2.48
1995    -2.59
1996    9.37
1997    -0.55
1998    -0.89
1999    -9.19
2000    -5.11
2001    -0.49
2002    6.84
2003    3.04

a. Calculate the skewness.
b. Calculate the excess kurtosis.
Exercise 1.5

An analyst has estimated the following parameters for the annual returns distributions for four portfolios:

Portfolio    Mean Return    Variance of Returns    Skewness    Kurtosis
A            10%            625                    1.8         0
B            14%            900                    0.0         3
C            16%            1250                   -0.85       5
D            19%            2000                   1.4         2

The analyst has been asked to evaluate the portfolios' risk and return characteristics. Assume that a risk-free investment will earn 5%.
a. Which portfolio would be preferred based on the Sharpe performance measure?
b. Which portfolio would be the most preferred based on the coefficient of variation?
c. Which portfolio/s is/are symmetric?
d. Which portfolio/s has/have fatter tails than a normal distribution?
e. Which portfolio is the riskiest based on its skewness?
f. Which portfolio is the riskiest based on its kurtosis?
g. Which portfolio will likely be considered more risky when judged by its semivariance rather than by its variance?