- 1. Not Waving But Drowning Understanding Data Andrew Hingston Switch Solutions quant training solutions ahingston@switchsolutions.com.au
- 2. Find out from someone: Name and what they do; Hobby or interest that others don’t know; Waving or drowning in data? “Not Waving But Drowning” by Stevie Smith: Nobody heard him, the dead man, / But still he lay moaning: / I was much further out than you thought / And not waving but drowning. / Poor chap, he always loved larking / And now he's dead / It must have been too cold for him his heart gave way, / They said. / Oh, no no no, it was too cold always / (Still the dead one lay moaning) / I was much too far out all my life / And not waving but drowning.
- 3. Course schedule: Descriptive statistics; Normal distribution; Monitoring processes 1; Monitoring processes 2; Hypothesis testing; Simple regression 1; Simple regression 2; Multiple regression 1; Multiple regression 2; Time series models 1; Time series models 2; Review
- 4. Learning objectives: Understand and calculate descriptive statistics; Plot and interpret a histogram; Plot and interpret a box-plot; Interpret descriptive statistics to solve basic business problems
- 5. Why understand data? Fact-based decisions; Avoiding bias; Power and persuasion
- 6. Common biases: Memorability; Anchoring and adjustment; Status quo; Self-serving; Negative comparisons; Framing (Why understand data?)
- 7. Sources of power: Legitimate power (position); Referent power (loyalty); Expert power (skills and expertise); Reward power (material rewards); Coercive power (withhold rewards). Source: French and Raven (1959), “The bases of social power”. See Wikipedia: “Power (philosophy)” (Why understand data?)
- 8. Principles of persuasion: Reciprocity (favors); Consistency (commitment); Social proof (herd); Authority; Liking; Scarcity. Source: Robert Cialdini (2001), “Influence: Science and practice”. See Wikipedia: “Robert Cialdini” (Why understand data?)
- 9. Some data jargon (Why understand data?)
- 10. Steps for using data: Specify problem; Propose answers; Identify the right tools; Obtain your data; Visualise it; Crunch numbers; Interpret, persuade, apply (Why understand data?)
- 11. Discussion question: One occasion when charts or stats were used well. One occasion when they were abused.
- 12. 1. Visualising Data
- 13. Why visualise your data? For you: Fast understanding; Build solid ‘foundation’; Flags problems. For others: Easier to follow; Memorable; Less info overload; More convincing
- 14. When to use each of these charts? (1. Visualising data with charts)
- 15. Bar charts (1. Visualising data with charts)
- 16. Column charts (1. Visualising data with charts)
- 17. Line charts. Source: StatCounter Global Stats (1. Visualising data with charts)
- 18. Scatterplot (1. Visualising data with charts)
- 19. Discussion: Rate these Mobile OS GUIs out of 10 based on overall attractiveness and functionality for the consumer market aged 16 to 30 (1. Visualising data with charts)
- 20. Bubble chart (1. Visualising data with charts)
- 21. Pie chart (1. Visualising data with charts)
- 22. Stacked column (1. Visualising data with charts)
- 23. Radar charts (1. Visualising data with charts)
- 24. Compound charts (e.g. Stock Chart) (1. Visualising data with charts)
- 25. Chart presentation tips: Must tell story in <10s; Avoid complexity; Think about colours; Think about font sizes; Use handouts creatively; Avoid jargon (1. Visualising data with charts)
- 26. Exercise: Charting exercise using software and data.
- 27. 2. Measuring the middle: Mean; Median; Mode; Weighted average; Trimmed mean
- 28. Mean: Arithmetic average of a set of data points. Sum values then divide by number of data points. Example: mean return of ASX200 = 12% p.a. Advantages: Easy to calculate; Easy to interpret for symmetric distributions; Based on all the data. Disadvantages: Less useful with skewed data; Affected by outliers (3. Numerical statistics)
- 29. Median: Middle data point when the data is ordered. Example: Sydney median house price = $600k. Advantages: Easy to obtain from a list of sorted data; Easy to interpret; Unaffected by outliers. Disadvantages: Only based on the ‘middle’ data point(s) and so can be more variable than the sample mean (3. Numerical statistics)
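The measures of the middle listed above can be sketched in a few lines of Python. This is only an illustration: the `prices` data set is made up, and `weighted_average` and `trimmed_mean` are hypothetical helper names, not something taken from the slides.

```python
from statistics import mean, median, mode


def weighted_average(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the points from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # points to drop at each end
    return mean(ordered[k:len(ordered) - k])


# Hypothetical house prices in $k, with one outlier (2500)
prices = [450, 520, 580, 600, 600, 640, 700, 2500]

print(mean(prices))                  # pulled upward by the outlier
print(median(prices))                # middle of the sorted list
print(mode(prices))                  # most frequent value
print(trimmed_mean(prices, 0.125))   # drops one point from each end
print(weighted_average([10, 20], [3, 1]))
```

With this data the mean lands well above the median, which is the "affected by outliers" disadvantage in action; the trimmed mean sits close to the median again.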
- 30. 3. Measuring the spread: Max, min, range; Inter-quartile range; Standard deviation; Coefficient of variation
- 31. Standard deviation and variance: Measures ‘variability’ or ‘spread’ of data. Based on how far each score varies from mean. Common notation: s² or σ² = variance; s or σ = standard deviation (3. Numerical statistics)
- 32. Spread and normal distribution [Figure: bell curve marked at the mean and at ±1SD, ±2SD, ±3SD] ASX200: Mean = 10%, SD (Std dev) = 10%. 68.2% chance within ±1SD; 95.4% chance within ±2SD; 99.7% chance within ±3SD ... more on this in next unit! (3. Numerical statistics)
- 33. Inter-quartile range (IQR): Spread of middle 50% of data. IQR = Q3 − Q1, where: Order data from small to large; Q1 is the data point at the end of the first quarter; Q3 is the data point at the end of the third quarter (3. Numerical statistics)
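A minimal sketch of the spread measures, using Python's `statistics` module on a made-up list of returns. Quartile conventions differ between packages; the `"inclusive"` method used here is one common choice and is an assumption, not necessarily the one the course uses.

```python
from statistics import mean, quantiles, stdev

returns = [4.0, 8.0, 10.0, 12.0, 16.0]  # hypothetical annual returns in %

data_range = max(returns) - min(returns)

# Quartiles: Q1, median, Q3 (the "inclusive" method is an assumption)
q1, q2, q3 = quantiles(returns, n=4, method="inclusive")
iqr = q3 - q1                 # spread of the middle 50% of the data

s = stdev(returns)            # sample standard deviation (divides by n - 1)
variance = s ** 2
cv = s / mean(returns)        # coefficient of variation: spread relative to mean

print(data_range, iqr, round(s, 3), round(cv, 3))
```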
- 34. Exercise: Calculate descriptive statistics for a data set. Express what each one means in your own words.
- 35. 4. Visualising with histograms
- 36. Histogram advantages: Easy to construct; Easy to interpret; Indicates symmetry or skewness; Indicates multimodality (>1 peak); Can be used if data comes in grouped form (2. Graphical representation)
- 37. Histogram disadvantages: Original data points?; Width of class intervals affects appearance; Class intervals often not well chosen; Small samples can be misleading (2. Graphical representation)
- 38. Histogram stories [Figure: example histogram shapes: Double-Peaked, Bell-Shaped, Comb, Plateau, Skewed, Truncated, Edge-Peaked, Isolated-Peaked] (2. Graphical representation)
- 39. Histogram interval width. Determined by: Number of data points (as number increases, width decreases); Spread of the data (as spread increases, width increases). Best calculated using: IQR = inter-quartile range (spread of middle 50% of data); n = number of data points (2. Graphical representation)
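The interval-width slide names IQR and n as inputs, but the formula itself is not in this transcript. One well-known rule that uses exactly those two inputs is the Freedman-Diaconis rule, width = 2 × IQR / n^(1/3). The sketch below assumes that rule; it may not be the one shown on the original slide, and `freedman_diaconis_width` is a hypothetical name.

```python
from statistics import quantiles


def freedman_diaconis_width(data):
    """Histogram interval width = 2 * IQR / n^(1/3) (Freedman-Diaconis rule)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return 2 * (q3 - q1) / len(data) ** (1 / 3)


# Hypothetical data: the integers 1..27 (n = 27, so n^(1/3) = 3)
data = list(range(1, 28))
width = freedman_diaconis_width(data)
print(round(width, 2))
```

The rule behaves as the slide describes: a larger IQR widens the intervals, and more data points narrow them.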
- 40. 5. Visualising with box plots
- 41. Box plots: A rich graphical representation that shows: Location of data (mean and median); Spread (inter-quartile range plus visual); Symmetry; Extreme data points (outliers). Very useful but underutilised since: Most managers have weak data skills; Excel can’t do them ... need a stats package! (4. Box plots)
- 42. Box plot example
- 43. Interpreting shape [Figure: three box plots labelled Right-Skewed, Left-Skewed and Symmetric, each marking Q1, Median and Q3, with the mean shown as *] (4. Box plots)
- 44. Box plot comparisons: Box plots useful for comparing samples. If boxes do not overlap: Strong evidence that two samples are different; Preferably more than 10 data points! If boxes do overlap: Samples may or may not be different; Need to use more advanced techniques (later) (4. Box plots)
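The numbers behind a box plot can be computed without a plotting package. The sketch below uses the common Tukey convention (mild outliers lie more than 1.5 × IQR beyond the box, extreme outliers more than 3 × IQR); the slides do not state which thresholds the course uses, so treat those cut-offs, the `box_plot_summary` name and the receipt data as assumptions.

```python
from statistics import quantiles


def box_plot_summary(data):
    """Quartiles, IQR and outliers under the 1.5 / 3 x IQR fence convention."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_lo, outer_hi = q1 - 3 * iqr, q3 + 3 * iqr
    mild = [x for x in data
            if outer_lo <= x < inner_lo or inner_hi < x <= outer_hi]
    extreme = [x for x in data if x < outer_lo or x > outer_hi]
    return {"q1": q1, "median": med, "q3": q3, "iqr": iqr,
            "mild": mild, "extreme": extreme}


# Hypothetical register receipts in dollars
receipts = [12, 15, 18, 20, 22, 25, 28, 30, 55, 120]
summary = box_plot_summary(receipts)
print(summary)
```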
- 45. End of unit exercise 1.1: Bishop’s supermarket. Describe the shape of register receipts. Too many bins in the histogram? Mild or extreme outliers? What should we do with them? How much do customers typically spend?
- 46. End of unit exercise 1.2: Chan’s laundry. Does discount size affect profitability? If so, how much? Recommendations?
- 47. Class discussion question: If the long-run average return and standard deviation of the ASX200 have both been 10% per year, what is the likelihood of the -40% returns experienced in 2008? (Data analysis)
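As a starting point for the discussion, here is what the numbers say if annual returns are assumed to be normally distributed with mean 10% and standard deviation 10%. Normality is the assumption under test here, not an established fact about the ASX200.

```python
from statistics import NormalDist

# Assumed model: annual return ~ Normal(mean = 10%, std dev = 10%)
asx200 = NormalDist(mu=10, sigma=10)

z = (-40 - asx200.mean) / asx200.stdev   # std deviations from the mean
p = asx200.cdf(-40)                      # P(return <= -40%)

print(z)   # -5.0
print(p)   # a probability of roughly 3 in 10 million
```

A return five standard deviations below the mean is close to impossible under this model, which raises the question of whether the normal distribution describes market returns well in the tails.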
- 48. Normal distribution [Figure: bell curve showing 68.2% chance within ±1SD, 95.4% chance within ±2SD and 99.7% chance within ±3SD of the mean] (2. Normal distribution)
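The three percentages on this slide can be checked directly with Python's `statistics.NormalDist`; the dictionary name `within` is just illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, std dev 1

# Probability of falling within k standard deviations of the mean
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in within.items():
    print(f"within {k} SD: {p:.1%}")
```

The computed values round to 68.3%, 95.4% and 99.7%; the slide's 68.2% appears to be the same number truncated rather than rounded.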
- 49. Features: Bell-shaped with single peak (unimodal); Mean = median = mode = μ; Symmetrical around mean (μ); Skewness = 0 (if -ve then tail on left, bulge on right); Kurtosis = 3 (if > 3 then pinched with fat tails); There are values from −∞ to +∞; Total area under curve = 1 (probability of all events); Linear combination of 2+ independent normal variables is normal (2. Normal distribution)
- 50. Different means [Figure: two normal curves with different means, both with σ = 1] The means of two variables can be different but they can both still be normal (2. Normal distribution)
- 51. Different standard deviations [Figure: two normal curves, one with σ = 1 and one with σ = 2] The standard deviation of two variables can be different but they can both still be normal (2. Normal distribution)
- 52. Using Excel for probabilities [Figure: normal curve with μ = 2, σ = 1, shaded up to X = 3] Value of X = 3; Mean (μ) = 2; Std dev (σ) = 1. Calculates probability between −∞ and X. In Excel: =NORMDIST(X, μ, σ, 1). In this case: =NORMDIST(3, 2, 1, 1) = 0.84 (2. Normal distribution)
- 53. Using Excel for value of X [Figure: normal curve with μ = 2, σ = 1, shaded area p = 0.84] Probability = 0.84; Mean (μ) = 2; Std dev (σ) = 1. Calculates X for a probability between −∞ and X. In Excel: =NORMINV(p, μ, σ). In this case: =NORMINV(0.84, 2, 1) = 3 (2. Normal distribution)
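For readers without Excel, the same two lookups can be sketched with Python's standard library. `NormalDist.cdf` plays the role of the cumulative NORMDIST and `NormalDist.inv_cdf` the role of NORMINV; the variable names are illustrative.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=2, sigma=1)

p = x_dist.cdf(3)          # like =NORMDIST(3, 2, 1, 1)
x = x_dist.inv_cdf(0.84)   # like =NORMINV(0.84, 2, 1)

print(round(p, 4))   # 0.8413
print(round(x, 4))   # just under 3, because 0.84 is itself a rounded value
```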
- 54. Standard normal distribution [Figure: normal curve with μ = 0, σ = 1] Z = (X − μ) / σ. Special normal distribution with μ = 0, σ = 1. Used to generalise for all normal distributions. Z-score = number of std deviations from mean (2. Normal distribution)
- 55. Using Excel for std normal distribution [Figure: normal curve with μ = 0, σ = 1, shaded up to Z = 1] Value of Z = 1; Mean (μ) = 0; Std dev (σ) = 1. Calculates probability between −∞ and Z. In Excel: =NORMSDIST(Z). In this case: =NORMSDIST(1) = 0.84. And also: =NORMSINV(0.84) = 1 (2. Normal distribution)
- 56. Negative Z-scores [Figure: normal curve with μ = 0, σ = 1, shaded to the left of a negative Z] Negative Z-scores happen when X < μ (mean). Number of std deviations to left of mean. In Excel: =NORMSDIST(-1) = 0.16 (2. Normal distribution)
- 57. Fun with Z-scores [Figure: standard normal curve, μ = 0, σ = 1] Given P(−∞ < Z < 1) = 0.84, find: P(−∞ < Z < +∞); P(Z < 0); P(Z > 0); P(Z < 1); P(Z > −1); P(Z > 1); P(Z < −1); P(0 < Z < 1); P(−1 < Z < 1). You can use this result for lots of other regions! (2. Normal distribution)
- 58. Fun with Z-scores (answers) [Figure: standard normal curve, μ = 0, σ = 1] Given P(−∞ < Z < 1) = 0.84: P(−∞ < Z < +∞) = 1.00; P(Z < 0) = 0.50; P(Z > 0) = 0.50; P(Z < 1) = 0.84; P(Z > −1) = 0.84; P(Z > 1) = 0.16; P(Z < −1) = 0.16; P(0 < Z < 1) = 0.34; P(−1 < Z < 1) = 0.68. You can use this result for lots of other regions! (2. Normal distribution)
- 59. Fun with probability [Figure: normal curve, μ = 2, σ = 1] Given P(−∞ < X < 3) = 0.84: P(−∞ < X < +∞) = 1.00; P(X < 2) = 0.50; P(X > 2) = 0.50; P(X < 3) = 0.84; P(X > 1) = 0.84; P(X > 3) = 0.16; P(X < 1) = 0.16; P(2 < X < 3) = 0.34; P(1 < X < 3) = 0.68. Use same logic as before or convert X to Z-score (Z = 1) (2. Normal distribution)
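Every region in these three slides follows from the single cumulative value Φ(1) ≈ 0.84 plus symmetry and the total area of 1. A small sketch, in which `phi`, `regions` and `to_z` are illustrative names:

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal cumulative probability


def to_z(x, mu, sigma):
    """Number of standard deviations that x lies from the mean."""
    return (x - mu) / sigma


regions = {
    "P(Z < 0)": phi(0),
    "P(Z > 0)": 1 - phi(0),
    "P(Z < 1)": phi(1),
    "P(Z > -1)": 1 - phi(-1),
    "P(Z > 1)": 1 - phi(1),
    "P(Z < -1)": phi(-1),
    "P(0 < Z < 1)": phi(1) - phi(0),
    "P(-1 < Z < 1)": phi(1) - phi(-1),
}

for label, p in regions.items():
    print(f"{label} = {p:.2f}")

# The X version (mean 2, std dev 1) reduces to the same Z regions
print(to_z(3, mu=2, sigma=1))  # 1.0
```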
- 60. Chebyshev’s rule: ‘Rough as guts’ estimation of probability. Use when variable is not normally distributed. The rule: At least 3/4 will fall within 2 std deviations of mean; At least 8/9 will fall within 3 std deviations of mean. Beware! Even if you don’t know distribution of variable ... you might know distribution of the sample mean or the average! See Central Limit Theorem later!
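Both fractions on the slide are instances of the general Chebyshev bound: at least 1 − 1/k² of any distribution lies within k standard deviations of its mean, for k > 1. A short sketch comparing that bound with the exact normal figure (`chebyshev_lower_bound` is an illustrative name):

```python
from statistics import NormalDist


def chebyshev_lower_bound(k):
    """Minimum share of any distribution within k std deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


std_normal = NormalDist()
for k in (2, 3):
    bound = chebyshev_lower_bound(k)
    normal = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"k={k}: Chebyshev >= {bound:.3f}, normal = {normal:.3f}")
```

The gap between the two columns shows why the rule is only a rough estimate: it has to hold for every distribution, so it is conservative for well-behaved ones.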
- 61. Problem solving tips (Exam tips): Write down variables from question, e.g. μ = 5, σ = 2, P(X > 6) = ???; Draw a quick bell shaped diagram; Mark in middle of bell and position of X; Shade the region that you are trying to find; Look for provided NORMDIST or NORMSDIST; If provided NORMSDIST then calculate Z-score; If provided NORMDIST then usually don’t have to; Identify probability of correct region(s)
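The example variables on this slide (μ = 5, σ = 2, P(X > 6)) can be worked through both ways the tips describe: directly, as with NORMDIST, or via a Z-score, as with NORMSDIST. A minimal sketch with illustrative variable names:

```python
from statistics import NormalDist

mu, sigma, x = 5, 2, 6

# Route 1: work with X directly (the NORMDIST route)
p_direct = 1 - NormalDist(mu, sigma).cdf(x)

# Route 2: convert to a Z-score first (the NORMSDIST route)
z = (x - mu) / sigma                 # 0.5 std deviations above the mean
p_via_z = 1 - NormalDist().cdf(z)

print(z, round(p_direct, 4), round(p_via_z, 4))
```

The shaded region is the right-hand tail, so in both routes the cumulative probability is subtracted from 1.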
- 62. End of unit exercises. Rework: NORMSDIST(2.31) = 0.9896; NORMSINV(0.99) = 2.326; NORMSINV(0.15) = -1.036. Quality: NORMSDIST(-2) = 0.0228; NORMSDIST(2.5) = 0.9938. Coal yield: NORMSINV(0.2) = -0.842; NORMSINV(0.01) = -2.326. Answers: 1a) 0.0104; 1b) 0.05976; 1c) 0.00942; 2. $9.66; 3a) 0.1587; 3b) 792.1; 3c) 42.99
- 63. Distribution of sample means: Take lots of random samples from a population. Features of the mean of each sample? Mean of each sample will be a bit different; They should be quite close to the population mean; For big samples ... mean should be close; For small samples ... mean could be very different [Figure: a population with Sample 1, Sample 2, Sample 3 and Sample 4 drawn from it] (3. Central Limit Theorem)
- 64. Populations have lots of distributions! [Figure: example distribution shapes: Double-Peaked, Bell-Shaped, Comb, Plateau, Skewed, Truncated, Edge-Peaked, Isolated-Peaked] (3. Central Limit Theorem)
- 65. ... but means of samples are normal! Central Limit Theorem: It doesn’t matter how population is distributed ... if you take a sufficiently large sample (n > 30) ... the probability distribution of sample means ... will be approximately normally distributed ... around the population mean. What is ‘sufficiently large’? The bigger the sample the more ‘normal’ the mean; For this course, if sample size n > 30 (3. Central Limit Theorem)
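A small simulation can make the theorem concrete. The sketch below draws samples from a strongly skewed population (an exponential distribution with mean 1 and standard deviation 1, chosen here purely for illustration) and looks at how the sample means behave for a small and a large sample size. The function name, sample sizes and seed are all assumptions.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable


def sample_means(n, draws=2000):
    """Means of `draws` random samples of size n from a skewed population."""
    return [mean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(draws)]


small = sample_means(5)    # small samples: means vary a lot
large = sample_means(40)   # n > 30: means cluster around the population mean

print(round(mean(small), 2), round(stdev(small), 2))
print(round(mean(large), 2), round(stdev(large), 2))
```

Both sets of sample means centre near the population mean of 1, but the means of the larger samples are far less spread out, and a histogram of `large` would look close to a bell shape even though the population itself is skewed.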
- 66. Demonstration: http://onlinestatbook.com/stat_sim/sampling_dist/index.html (3. Central Limit Theorem)
- 67. Spread of sample means: Sample means will almost never be the same as the true population mean! Sample mean will be most accurate when: Sample size (n) is big; Spread of the population values (σ) is low. Standard error: Measures the spread of sample mean; Sometimes called SE or sd(X̄) (3. Central Limit Theorem)
- 68. Probability of sample mean: Sample mean (X̄) is normal (if sample size > 30). Z-score = number of std errors from population mean. Probability mean of sample < 3 = 0.84 = NORMSDIST(1) [Figure: normal curve for X̄ centred on μ = 2] (3. Central Limit Theorem)
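The standard error of the sample mean is usually computed as SE = σ / √n. The slide gives μ = 2 and a Z-score of 1 for a sample mean of 3, which implies SE = 1; the σ and n below are hypothetical values picked only so that σ / √n comes out to 1.

```python
from math import sqrt
from statistics import NormalDist

mu = 2        # population mean, from the slide
sigma = 6     # hypothetical population std dev (not given on the slide)
n = 36        # hypothetical sample size (> 30, so the CLT applies)

se = sigma / sqrt(n)               # standard error of the sample mean
z = (3 - mu) / se                  # std errors between 3 and the population mean
p = NormalDist().cdf(z)            # like =NORMSDIST(z)

print(se, z, round(p, 2))          # 1.0 1.0 0.84
```

The only change from the single-value case is that the standard error replaces the standard deviation when converting to a Z-score.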
- 69. When to use ... Taking a sample or average of a variable (which doesn’t need to be normal): Probability that sample mean or average of the variable takes on certain values (n > 30 if it is a sample). Variable is normally distributed: Probability the variable takes on certain values (3. Central Limit Theorem)
