Basic Statistical Concepts

Population
The specific group of individuals, objects, or events to be studied; the total group to which we make projections and inferences (hence the term “inferential statistics”).

Sample
A subset of individual elements from a population, which is examined and from which we draw conclusions about the population as a whole.

Random Sample
A sample selected in such a way that every individual member of the population has an equal chance of being selected.
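A minimal sketch of drawing a simple random sample, assuming the population can be listed as ID numbers. The population of 1,000 IDs, the sample size of 50, and the seed are illustrative assumptions, not values from the slides.

```python
import random

# Hypothetical population: ID numbers 1..1000 (illustrative only).
population = list(range(1, 1001))

# A fixed seed is used only so the example is repeatable.
rng = random.Random(42)

# random.sample draws without replacement, and every member of the
# population is equally likely to end up in the sample.
sample = rng.sample(population, k=50)

print(len(sample), "members drawn;", len(set(sample)), "distinct")
```

Because the draw is without replacement, no member can appear twice in the sample.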

Bias
In statistics, bias is bad, nasty, evil. In research, it is simply an obstacle. Bias is systematic favoritism in the data collection process that may produce skewed or misleading results.

Sources of Bias
Sample selection: Non-random samples may be biased.
Data collection: The way questions are asked, as well as the processing and handling of data, may create bias.

Bias is often referred to as error; more precisely, it is systematic error.

Data
The actual measurements obtained through a study or procedure (“data” is plural; the singular is “datum”).

Types of Data
Numerical: Measurements for which the numbers have value, such as height and weight; something that has quantity (hence “quantitative data”).
Categorical: Observations of categories, such as gender or race. Numbers may be used to label categories, but the numbers carry no quantitative meaning (e.g., 1 = male and 2 = female).

Statistic
A number that summarizes the data collected from a sample. Examples include frequencies, percentages, percentiles, and averages.

Parameter
Statistics are based on sample data. If the summary number is computed from the entire population, it is a parameter. A study that obtains data from an entire population, and is therefore summarized with parameters, is a census.

Mean
The average of a data set, obtained by summing all the values in the data set and dividing by the total number of values. Also called the arithmetic mean. Different calculations produce the geometric mean and the harmonic mean; these are not generally used in social and market research.

Median
When the data values are lined up in order from smallest to largest, the median is the middle value: half of the values are above it and half are below.

Mode
When data are grouped into categories, the mode is the largest category, based on the number of individuals in the category.

Mean vs. Median (vs. Mode)
The mean is calculated with the formula \( \bar{x} = \frac{\sum x}{n} \). The median is the middle value in an ordered distribution. Consider these data:

Car Prices: What’s “average”?

Price    N of Models
$15K     38
$20K     16
$25K     11
$30K      6
$35K      3
$40K      2
$45K      1
$65K      9
$70K     14

[Chart: bar chart of N of Models by price, $15K to $70K in $5K steps; no models at $50K, $55K, or $60K.]

The mean is $31,400 (n = 100) and the median is $20,000. The mode is the most frequent category, $15,000.
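The three figures quoted for the car price data can be checked with a short sketch. The counts come from the exhibit; the variable names and the choice to expand the frequency table into one value per model are illustrative assumptions.

```python
from statistics import median

# Frequency table from the car price exhibit: price in dollars -> number of models.
price_counts = {
    15_000: 38, 20_000: 16, 25_000: 11, 30_000: 6, 35_000: 3,
    40_000: 2, 45_000: 1, 65_000: 9, 70_000: 14,
}

# Expand the table into one price per model so the definitions apply directly.
prices = [price for price, count in price_counts.items() for _ in range(count)]

n = len(prices)
mean_price = sum(prices) / n            # sum of all values divided by n
median_price = median(prices)           # middle value of the ordered prices
mode_price = max(price_counts, key=price_counts.get)  # most frequent category

print(n, mean_price, median_price, mode_price)
```

The cluster of expensive models at $65K and $70K pulls the mean well above the median, which is why no single measure describes this distribution.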

Variation
Not every score is the same. There are different prices for different cars. Some people pay different prices for the same car. Prices change over time. As the previous exhibit shows, measures of central tendency are not sufficient to describe the distribution (and variability) of scores.

Standard Deviation
The formula for standard deviation is \( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \). The standard deviation tells you whether the scores are tightly grouped or widely distributed. Two data sets can have the same mean but have very different distributions. In the previous bimodal example, the standard deviation is $21,119, indicating a high degree of variation. Note that \( s^2 \) is the variance.
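As a sketch, the standard deviation formula can be applied step by step to the car price data from the previous exhibit. The variable names are illustrative, and statistics.stdev from the Python standard library is used only as a cross-check, since it also divides by n - 1.

```python
import math
from statistics import stdev

# Frequency table from the car price exhibit: price in dollars -> number of models.
price_counts = {
    15_000: 38, 20_000: 16, 25_000: 11, 30_000: 6, 35_000: 3,
    40_000: 2, 45_000: 1, 65_000: 9, 70_000: 14,
}
prices = [price for price, count in price_counts.items() for _ in range(count)]

n = len(prices)
x_bar = sum(prices) / n                                   # the mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in prices)        # sum of squared deviations
variance = sum_sq_dev / (n - 1)                           # s squared
s = math.sqrt(variance)                                   # standard deviation

s_library = stdev(prices)  # cross-check against the standard library

print(round(s), round(s_library))
```

Rounded to the nearest dollar, both values match the $21,119 quoted for this example.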

Normal Distribution
The normal distribution is described graphically by the bell-shaped curve. As the number of values in a distribution grows large, there is a tendency in many situations for the largest group of individuals to cluster in the middle of the distribution, with successively fewer individuals as the values move out to the tails, or ends, of the distribution. Due to symmetry, the mean and median are equal and in the middle of the distribution.

Normal Distribution (continued)
The normal distribution is the starting point for understanding variability. With a normal distribution, standard deviation has special significance. It is the distance from the mean to the inflection point, the point where the curvature changes from concave up to concave down. About 68% of the values lie within one standard deviation of the mean, about 95% fall within two standard deviations, and about 99.7% fall within three standard deviations (this is known as the empirical rule).
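The 68%, 95%, and 99.7% figures can be reproduced from the normal distribution itself. This sketch relies on the standard identity that the share of a normal distribution within k standard deviations of the mean equals erf(k / sqrt(2)); the function name share_within is a hypothetical helper, not something from the slides.

```python
import math

def share_within(k: float) -> float:
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The percentages in the empirical rule are rounded versions of these values; the two-standard-deviation share, for example, is slightly above 95%.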

Normal Distribution (continued)
The difference in variability is clear when two normal distributions with the same mean are shown on the same scale.

[Charts: two histograms with the same mean. The first has x̄ = 50 and s = 1.6, with values from 45 to 55; the second has x̄ = 50 and s = 16, with values from 0 to 100. Each is also redrawn on a common 0 to 100 axis.]

Standard Scores
The formula for a standard score is \( \frac{x - \bar{x}}{s} \), where x is the original score, \( \bar{x} \) is the mean, and s is the standard deviation. Among other things, a standard score allows for comparisons when means and distributions may be different for the scores being compared. The standard score gives the relative standing of the original score, taking into account the mean and the variation in the distribution. Standard scores are used in statements like, “Sales at the Troy store are +2 standard deviations (above the mean).” Knowing whether a score is above or below the mean, and that it is 2, 3, or more standard deviations away, identifies the score's position relative to all other scores, both in direction (from the mean) and in how extreme the score is given how the other scores are distributed.
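A minimal sketch of the standard score calculation, reusing the mean ($31,400) and standard deviation ($21,119) reported for the car price example. The function name standard_score and the choice of the $70K and $15K models as inputs are illustrative assumptions.

```python
def standard_score(x: float, mean: float, s: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / s

# Summary values quoted for the car price example.
mean_price = 31_400
s_price = 21_119

z_expensive = standard_score(70_000, mean_price, s_price)
z_cheap = standard_score(15_000, mean_price, s_price)

print(f"$70K model: {z_expensive:+.2f} standard deviations")
print(f"$15K model: {z_cheap:+.2f} standard deviations")
```

The sign gives the direction from the mean and the magnitude gives how extreme the score is, which is exactly the information carried by a statement such as "+2 standard deviations".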

Standard Error
Standard error is the same basic concept as standard deviation: both represent a typical distance from the mean. The difference is that the original population values deviate from each other due to natural phenomena (different height, different ideas, different characteristics), whereas standard error is the deviation of the sample means (from multiple samples of the population). Sample means vary due to the error that occurs from not doing a census (hence, “standard error”). According to the central limit theorem, if the samples are large enough, the distribution of all possible sample means will have a bell-shaped, or normal, distribution. Error above and below the mean cancels out and the distribution is symmetrical. \( \frac{\sigma}{\sqrt{n}} \) is the standard error, where σ is the population standard deviation and n is the sample size.
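The idea that sample means vary with a spread of σ divided by the square root of n can be illustrated with a small simulation. Everything here is an assumption made for illustration: the simulated population of 10,000 uniform values, the sample size of 25, the 2,000 repeated samples, and the fixed seed. Samples are drawn with replacement so that the formula applies without a finite-population correction.

```python
import math
import random
from statistics import mean, pstdev, stdev

rng = random.Random(0)  # fixed seed only so the example is repeatable

# Hypothetical population of 10,000 values (illustrative, not from the slides).
population = [rng.uniform(0, 100) for _ in range(10_000)]
sigma = pstdev(population)          # population standard deviation

n = 25                              # size of each sample
theoretical_se = sigma / math.sqrt(n)

# Draw many samples and record the mean of each one.
sample_means = [mean(rng.choices(population, k=n)) for _ in range(2_000)]
empirical_se = stdev(sample_means)  # spread of the sample means

print(f"sigma / sqrt(n):         {theoretical_se:.2f}")
print(f"spread of sample means:  {empirical_se:.2f}")
```

The two numbers should be close but not identical, since the simulation uses a finite number of samples. The population here is uniform rather than bell-shaped, yet the sample means still cluster symmetrically around the population mean, in line with the central limit theorem.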