
# Session 8 stats in business research

## by Liam Greenslade on Dec 02, 2011

This lecture introduces students to basic statistics and their use in business research. It explains the difference between inferential and descriptive statistics and introduces concepts such as correlation, distribution and standard deviation.

## Session 8 stats in business research: Presentation Transcript

• International Pre-Masters Diploma in Business Studies. International Pre-Masters Diploma in Legal Studies. Project & Report, Session 8: Statistics in Business Research. Liam Greenslade. Professional Education: developing your career.
• Lesson objectives: At the end of today's class you will: understand why we use statistics in business research; recognise the difference between descriptive and inferential statistics; identify 4 measures of central tendency; identify 3 measures of variation; understand the concept of statistical significance.
• Why do we use statistics? The primary role of statistics is to provide decision makers with the information they need to make decisions. Statistics are used to answer short- and long-range planning questions and to inform investment and sales/marketing decisions. Decision makers make better decisions when they have strategic summary information based on systematic statistical analysis.
• Two types of statistics: Descriptive statistics are concerned with summary calculations, graphs, charts and tables. Inferential statistics are used to generalise from a sample to a population. For example, the average income of all families in the UK (the population) can be estimated from figures obtained from a few hundred families (the sample).
• Levels of measurement (recap): Nominal – observations can be assigned to a category (a distinct group). There is no logical order for the categories; numbers act like names. Ordinal – observations can be assigned values that are rank ordered (arranged in a logical order), where one rank is further along a dimension than another (e.g. age groupings). Interval – measures can be taken on a continuous scale where equal intervals represent equal differences, but there is no true zero point (e.g. temperature on a thermometer). Ratio – measures are taken on a continuous scale with interval properties and a true zero point (e.g. age, length, height, weight).
• Nominal, Ordinal, Interval, and Ratio Scales Provide Different Information
• Why do levels of measurement matter? Numbers represent different things and have different functions at each level of measurement. Nominal: we can classify or categorise observations; numbers represent frequency counts. Ordinal: we can rank or order observations; numbers represent rank or relative position in a sequence. Interval/ratio: we can measure characteristics or properties and assign them scores; scores represent an exact amount of some property.
• Descriptive Statistics Collect data  e.g. Survey Present data  e.g. Tables and graphs Characterize data  e.g. Sample mean = Xi n
• Descriptive statistics: summary measures for describing data numerically. Central tendency: arithmetic mean, median, mode, geometric mean. Quartiles. Variation: range, interquartile range, variance, standard deviation, coefficient of variation. Shape: skewness. (Source: Basic Business Statistics, 10e © 2006 Prentice-Hall.)
• Measures of central tendency (overview): Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$. Median: midpoint of the ranked values. Mode: most frequently observed value. Geometric mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$.
• Arithmetic mean: The most common measure of central tendency. Mean = sum of values divided by the number of values. It is affected by extreme values (outliers). Example: for 1, 2, 3, 4, 5 the mean is 15/5 = 3; for 1, 2, 3, 4, 10 the mean is 20/5 = 4.
• Median: In an ordered array, the median is the “middle” number (50% above, 50% below). Example: for 1, 2, 3, 4, 5 the median is 3; for 1, 2, 3, 4, 10 the median is still 3. It is not affected by extreme values.
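A quick way to check the mean and median examples is to recompute them. This is a minimal sketch using Python's standard `statistics` module (Python is our choice for illustration, not the lecture's); the data are the two five-value sets from the slides.

```python
from statistics import mean, median

# The two five-value data sets from the mean and median slides.
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # the 5 replaced by an extreme value

for label, data in [("no outlier", without_outlier), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")
```

The single extreme value moves the mean from 3 to 4, while the median stays at 3.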
• Finding the median: The median is located at position $\frac{n+1}{2}$ in the ordered data. If the number of values is odd, the median is the middle number. If the number of values is even, the median is the average of the two middle numbers. Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data.
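The (n + 1)/2 position rule can be sketched as a small function. `median_by_position` is a hypothetical helper written for illustration; in practice Python's `statistics.median` does the same job.

```python
def median_by_position(values):
    """Median found with the (n + 1) / 2 position rule.

    Positions are counted from 1, as on the slide; Python lists are indexed
    from 0, so 1 is subtracted when indexing.
    """
    if not values:
        raise ValueError("the median of empty data is undefined")
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # e.g. n = 5 -> 3.0, n = 6 -> 3.5
    if n % 2 == 1:
        # Odd n: the position is a whole number, so take that value.
        return ordered[int(position) - 1]
    # Even n: the position falls halfway between two values; average them.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position([10, 1, 4, 3, 2]))    # odd n: middle value of 1, 2, 3, 4, 10
print(median_by_position([1, 2, 3, 4, 5, 6]))  # even n: average of 3 and 4
```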
• Mode: The value that occurs most often. It is not affected by extreme values and can be used for either numerical or categorical (nominal) data. There may be no mode, or there may be several modes. (Slide examples: one data set with mode = 9; another with no mode.)
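A sketch of finding the mode(s), assuming the convention that data in which no value repeats have no mode. The helper `modes` and the example values are hypothetical; `collections.Counter` does the counting. Python's `statistics.multimode` is not used here because it returns every value when none repeats, rather than reporting "no mode".

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), sorted.

    Assumed convention: if no value occurs more than once, there is no
    mode and an empty list is returned. Two or more entries mean the
    data have several modes.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats: no mode
    return sorted(value for value, count in counts.items() if count == top)


print(modes([1, 3, 5, 9, 9, 9, 12]))                 # one mode
print(modes([0, 1, 2, 3, 4, 5, 6]))                  # no mode
print(modes(["buy", "sell", "buy", "hold", "sell"]))  # two modes, categorical data
```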
• Geometric Mean The geometric mean is more appropriate than the arithmetic mean for describing proportional growth, both exponential growth (constant proportional growth) and varying growth In business the geometric mean of growth rates is known as the compound annual growth rate(CAGR). The geometric mean of growth over periods yields the equivalent constant growth rate that would yield the same final amount.
• Measuring central tendency: Five houses on a hill by the beach. House prices: \$2,000,000; \$500,000; \$300,000; \$100,000; \$100,000.
• Review example: summary statistics. House prices: \$2,000,000, \$500,000, \$300,000, \$100,000, \$100,000 (sum \$3,000,000). Mean: \$3,000,000/5 = \$600,000. Median: middle value of the ranked data = \$300,000. Mode: most frequent value = \$100,000.
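The house-price summary can be reproduced with Python's standard `statistics` module. This is a sketch for checking the arithmetic; `multimode` is used so that ties would show up as a list.

```python
from statistics import mean, median, multimode

# House prices from the "five houses on a hill" example.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

print("sum:   ", sum(prices))
print("mean:  ", mean(prices))
print("median:", median(prices))
print("mode:  ", multimode(prices))  # a list, in case several values tie
```

The mean (600,000) is double the median (300,000) because of the single \$2,000,000 house, which is why the median is preferred for data like these.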
• Which measure of location is the “best”? The mean is generally used, unless extreme values (outliers) exist. In that case the median is often used, since it is not sensitive to extreme values. The mode might be used when the data are nominal or categorical.
• Which measure of location is the “best”? Best measure of central tendency by type of variable: nominal: mode; ordinal: median; interval/ratio (not skewed): mean; interval/ratio (skewed): median.
• Distributions: the normal distribution. We often test whether our data are normally distributed, as this is a common assumption underlying many statistical tests. When you have a normally distributed sample, you can use either the mean or the median as your measure of central tendency. In fact, in any symmetrical distribution the mean and median are equal, and in a symmetrical distribution with a single peak (such as the normal distribution) the mode equals them too.
• Skewed distributions: In skewed distributions, the median is generally considered to be the best representative of the central location of the data. The more skewed the distribution, the greater the difference between the median and the mean, and the more emphasis should be placed on using the median rather than the mean. A classic example of a right-skewed distribution is income (salary), where a relatively small number of high earners pull the mean above the typical income.
• Measures of variation: range, interquartile range, variance, standard deviation and coefficient of variation. Measures of variation give information on the spread or variability of the data values. (Slide figure: two distributions with the same centre but different variation.)
• Range: The simplest measure of variation. It is the difference between the largest and the smallest values in a set of data: Range = $X_{\text{largest}} - X_{\text{smallest}}$. Example: for data running from 1 to 14, Range = 14 - 1 = 13.
• Disadvantages of the range: It ignores the way in which the data are distributed (the slide shows two data sets, both running from 7 to 12 but spread differently, that both have Range = 12 - 7 = 5). It is sensitive to outliers: 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 has Range = 5 - 1 = 4, but 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 has Range = 120 - 1 = 119.
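The range, and its sensitivity to a single outlier, can be reproduced with the two 25-value data sets from the slide. `data_range` is a hypothetical helper name used for illustration.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


# Eleven 1s, eight 2s, four 3s and a 4, shared by both data sets.
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
typical = base + [5]
with_outlier = base + [120]

print(data_range(typical))       # 4
print(data_range(with_outlier))  # 119
```

Changing one value out of 25 multiplies the range by roughly 30, even though 24 of the values are identical in both sets.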
• Variance: (Approximately) the average of the squared deviations of values from the mean. Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$, where $\bar{X}$ = mean, $n$ = sample size and $X_i$ = $i$th value of the variable $X$.
• Standard deviation: The most commonly used measure of variation. It shows variation about the mean, is the square root of the variance and has the same units as the original data. Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$.
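A sketch of the sample variance and standard deviation formulas written out directly, checked against the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n − 1. The eight-value sample is illustrative, and the helper names are hypothetical.

```python
from math import sqrt
from statistics import stdev, variance


def sample_variance(values):
    """S^2 = sum((X_i - X_bar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_std_dev(values):
    """S is the square root of the sample variance (same units as the data)."""
    return sqrt(sample_variance(values))


data = [10, 12, 14, 15, 17, 18, 18, 24]  # illustrative sample, mean = 16
print(sample_variance(data), variance(data))
print(sample_std_dev(data), stdev(data))
```

Both pairs agree: the squared deviations sum to 130, so S² = 130/7 ≈ 18.57 and S ≈ 4.31.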
• Standard Deviation The standard deviation help you to know how a set of data clusters or distributes around its mean. For almost all sets of data, the majority of the observed values lie within an interval of plus and minus one standard deviation above and below the mean. Neither the variance nor the standard deviation can ever be negative.Basic BusinessStatistics, 10e ©2006 Prentice-Hall,
• Inferential Statistics Estimation  e.g.: Estimate the population mean weight using the sample mean weight Hypothesis testing  e.g. Test the claim that the population mean weight is 120 pounds Drawing conclusions and/or making decisions concerning a population based on sample results.
• Testing hypotheses: Inferential statistics enable us to identify differences between groups. They also enable us to identify meaningful patterns of covariation or correlation in populations. We can use statistical tests to establish whether our results are statistically significant (i.e. unlikely to have occurred by chance alone).
• Statistical testing & the p value: When we test hypotheses, we need to establish whether our grounds for rejecting the null hypothesis are valid. To do this we use statistical tests of significance. These enable us to calculate how likely it is that results like ours would occur by chance if the null hypothesis were true. We usually judge significance at the 5% level, i.e. a result at least as extreme as the one we obtained would be expected to occur by chance fewer than 5 times in 100. Obtaining a 'p' value of < .05 means we can reject the null hypothesis.
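The slide does not name a particular test, so as a swapped-in illustration here is an exact two-sided binomial test: is a coin fair if it lands heads 9 times in 10 tosses? The function name and the coin scenario are hypothetical; the null hypothesis is P(heads) = 0.5.

```python
from math import comb


def binomial_p_value_two_sided(k, n):
    """Exact two-sided p value for k heads in n tosses of a fair coin.

    The null distribution is symmetric, so the two-sided p value is twice
    the smaller tail probability, capped at 1.
    """
    def prob(i):
        return comb(n, i) / 2 ** n  # P(exactly i heads | fair coin)

    upper_tail = sum(prob(i) for i in range(k, n + 1))  # P(X >= k)
    lower_tail = sum(prob(i) for i in range(0, k + 1))  # P(X <= k)
    return min(1.0, 2 * min(upper_tail, lower_tail))


for heads in (9, 6):
    p = binomial_p_value_two_sided(heads, 10)
    decision = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"{heads}/10 heads: p = {p:.4f} -> {decision}")
```

Nine heads gives p = 22/1024 ≈ 0.021, below .05, so the null hypothesis of a fair coin is rejected at the 5% level; six heads gives p ≈ 0.75, which is entirely consistent with chance.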
• Statistical significance: The statistical significance of a result is expressed by its p value: the probability of obtaining a relationship (e.g. between variables) or a difference (e.g. between means) at least as large as the one observed in the sample if it arose by pure chance ("luck of the draw"), i.e. if no such relationship or difference existed in the population. In less technical terms, statistical significance tells us something about the degree to which the result is likely to be "representative of the population" rather than an accident of sampling.
• Parametric & non-parametric data: Statistical tests fall into 2 categories. Parametric tests can be used where we can assume that the data are normally distributed; non-parametric tests are used where we cannot make that assumption. When their assumptions hold, parametric tests are more powerful tests of significance.
• Correlation Correlation is a measure of the relation between two or more variables. Correlation coefficients can range from -1.00 to +1.00. The value of -1.00 represents a perfect negative correlation A value of +1.00 represents a perfect positive correlation. A value of 0.00 represents a lack of correlation.