Studocu is not sponsored or endorsed by any college or university
Introduction to Statistics Chapter 1 Notes
Introduction to Statistics (Athabasca University)
Downloaded by Ahmed Elmi (ahmedatoshe@gmail.com)
Introduction to Statistics Chapter 1 Notes
1.1.1
What is Statistics?
Statistics has two meanings. The first meaning refers to numerical facts like a family's income, a
person's age, the starting salary of a college graduate or the percentage of completed passes by a
quarterback. The second meaning refers to the field or discipline of study. In this sense of the word,
statistics can be defined as follows:
STATISTICS - the science of collecting, analyzing, presenting and interpreting data, as well as making
decisions based on such analyses
Decisions made by using statistical methods are called educated guesses, and decisions made without
using statistical methods are called pure guesses.
Statistics has two aspects: theoretical and applied. Theoretical statistics deals with the development,
derivation and proof of statistical theorems, formulas, rules and laws. Applied statistics involves the
application of those theorems, formulas, rules and laws.
1.1.2
Types of Statistics
Applied statistics can be divided into two areas: descriptive statistics and inferential statistics.
Descriptive Statistics
Large data sets are difficult to interpret in their original form. To make it easier to draw
conclusions, we construct tables and graphs to organize the data.
DESCRIPTIVE STATISTICS - methods of organizing, displaying and describing data by using graphs, tables
and summary measures
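The definition above can be illustrated with a short sketch. The data values below are invented for illustration only (they are not from the notes); the sketch builds a frequency table and a few summary measures using Python's standard library.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical data: number of cars sold on each of 10 days.
cars_sold = [3, 1, 4, 1, 5, 2, 2, 3, 1, 3]

# Table: how often each value occurs (a frequency table).
frequency = Counter(cars_sold)

# Summary measures: single numbers that describe the whole data set.
summary = {
    "mean": mean(cars_sold),
    "median": median(cars_sold),
    "min": min(cars_sold),
    "max": max(cars_sold),
}

for value in sorted(frequency):
    print(f"{value} cars: {frequency[value]} day(s)")
print(summary)
```

A graph (for example a bar chart of the frequency table) would be the third descriptive tool named in the definition; it is left out here to keep the sketch dependency-free.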
Inferential Statistics
A major portion of statistics deals with decision making and predictions. The data used in this type of
statistics come from samples, such as surveys of groups of people.
INFERENTIAL STATISTICS - consists of methods that use sample results to help make decisions or
predictions about a population
Probability, which measures the likelihood that a certain outcome will occur, acts as a link between
descriptive and inferential statistics. It is used to make statements about the occurrence or
nonoccurrence of an event under certain circumstances
1.2
Basic Terms
ELEMENT OR MEMBER - an element or member of a sample or population is a specific subject or object
about which the information is being collected
VARIABLE - a characteristic under study that assumes different values for different elements
OBSERVATION OR MEASUREMENT - the value of a variable for an element
DATA SET - collection of observations on one or more variables
1.3
Types of Variables
A variable is a characteristic under study that assumes different values for different elements. A
variable may be classified as either qualitative or quantitative.
1.3.1
Quantitative Variables
Income, height, gross sales and price of a home are all examples of quantitative variables because they
can be measured numerically.
QUANTITATIVE VARIABLE - a variable that can be measured numerically; the data collected on
quantitative variables is called quantitative data
Discrete Variables
The number of cars sold on a given day at a car dealership could be 0, 1, 2, 3, … and can be
counted; this is an example of a discrete variable.
DISCRETE VARIABLE - a variable whose values are countable; can only assume certain values with no
intermediate values
Continuous Variables
The time taken to complete an examination could be 0.5, 1.5, 1.75 hours, … and in theory it can be
measured to any degree of precision; this is an example of a continuous variable.
CONTINUOUS VARIABLE - a variable that can assume any numerical value over a certain interval or
intervals
1.3.2
Qualitative or Categorical Variables
The status of an undergraduate college student cannot be measured numerically but can be put into
four different groups: freshman, sophomore, junior and senior. This is an example of a qualitative
variable.
QUALITATIVE VARIABLE - a variable that cannot assume a numerical value but can be classified into two
or more nonnumerical categories; the data collected on qualitative variables is called qualitative data
[Figure: Types of variables. Variables are either quantitative (discrete or continuous) or qualitative.]
1.4
Cross-Section Versus Time-Series Data
Based on the time over which they are collected, data can be classified as either cross-section or
time-series data.
1.4.1
Cross-Section Data
Cross-section data contain information on different elements of a population or sample for the same
period of time. Example: the information on 100 different families for 2015.
CROSS-SECTION DATA - data collected on different elements at the same point in time or for the same
period of time
1.4.2
Time-Series Data
Time-series data contain information on the same element at different points in time. Example:
information on US exports for the years 2001 to 2015.
TIME-SERIES DATA - data collected on the same element for the same variable at different points in
time or for different periods of time
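As a rough illustration of the two definitions, the sketch below stores each observation as an (element, period, value) record and checks what varies across the records. The function name `classify` and all data values are hypothetical, chosen only for this example.

```python
def classify(records):
    """Classify (element, period, value) records by what varies."""
    elements = {element for element, _, _ in records}
    periods = {period for _, period, _ in records}
    if len(periods) == 1 and len(elements) > 1:
        return "cross-section"   # different elements, same period
    if len(elements) == 1 and len(periods) > 1:
        return "time-series"     # same element, different periods
    return "neither (mixed or a single observation)"

# Invented values, used only to show the shape of each data set.
family_incomes_2015 = [
    ("Family A", 2015, 52_000),
    ("Family B", 2015, 61_500),
    ("Family C", 2015, 47_250),
]
exports_by_year = [
    ("US exports", 2013, 1.8),
    ("US exports", 2014, 1.9),
    ("US exports", 2015, 2.1),
]

print(classify(family_incomes_2015))  # cross-section
print(classify(exports_by_year))      # time-series
```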
1.5
Population Versus Sample
All the voters in a city, all companies in New York City, and all homes in California are examples of
populations. In statistics, a population does not only mean a group of people; it could also mean a
group of books, cars or houses. The population of interest is called the target population.
POPULATION OR TARGET POPULATION - a population consists of all elements — individuals, items, or
objects — whose characteristics are being studied. The population being studied is called the target
population
Most of the time, decisions are made based on a portion of a population. For example, election polls in
the United States that estimate the percentage of voters who favour various candidates are based on
only a few hundred or a few thousand people across the country. Such a portion is called a sample.
SAMPLE - a portion of the population selected for study
The collection of information from the elements of a sample is called a sample survey, but when we
collect information on all elements of a population, it is called a census. Taking a census is usually
very difficult because a population could be very large, so we would rather do a sample survey
instead.
CENSUS - a survey that includes every member of the population
SAMPLE SURVEY - a survey that includes only a portion of the population
The purpose of conducting a sample survey is to make decisions about the corresponding population. It
is important that the results from the survey closely match the results a census would give; otherwise
they will not apply to the corresponding population.
REPRESENTATIVE SAMPLE - a sample that represents the characteristics of a population as closely as
possible
Sampling can be done with replacement or without replacement. In sampling with replacement, each time
an element is selected it is put back into the population before the next element is selected, which
gives that element another chance of being selected. In sampling without replacement, once an element
has been chosen, it cannot be chosen again.
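The difference between the two schemes can be sketched with Python's standard `random` module. The five-element population and the seed are arbitrary choices made for illustration: `choices` draws with replacement, while `sample` draws without replacement.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable
population = ["A", "B", "C", "D", "E"]

# With replacement: each draw is made from the full population,
# so the same element can appear more than once.
with_replacement = rng.choices(population, k=5)

# Without replacement: a selected element is not returned,
# so every element appears at most once.
without_replacement = rng.sample(population, k=3)

print("with replacement:   ", with_replacement)
print("without replacement:", without_replacement)
```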
1.5.1
Why Sample?
Time
The size of a population could be very large, so conducting a census could take a large amount of time.
With sampling, only a portion of the population is surveyed, so the survey can be conducted much
faster.
Cost
Collecting information from every member of a population could cost much more than the available
budget allows. Sampling would cost much less.
Impossibility of Conducting a Census
A census must be conducted by surveying every member of the population. But what if you can’t find
or get hold of every member? With a sample, only a portion of the population has to be contacted, so
you don’t need to find every member.
1.5.2
Random and Nonrandom Samples
RANDOM SAMPLE - a sample drawn in such a way that each member of the population has some chance
of being selected in the sample
NONRANDOM SAMPLE - some members of the population may not have a chance of being in the sample
Two types of nonrandom samples are convenience samples and judgement samples. In a convenience
sample, the most accessible members of the population are selected to obtain the results quickly. In a
judgement sample, the members of the population are selected based on the judgement and prior
knowledge of an expert.
Pseudo polls are examples of nonrepresentative samples. For example, polls in newspapers or on radio
shows reach only the readers or listeners of that outlet.
In a quota sample, the population is divided into subpopulations based on certain characteristics, and
a portion of each subpopulation is selected so that the sample reflects the makeup of the population.
1.5.3
Sampling and Non-sampling Errors
Sampling or Chance Error
Different samples selected from the same population usually give different results because they
contain different elements of the population. The result obtained from a sample could also differ from
the result of a census. The difference between the two is called the sampling error.
SAMPLING ERROR - the difference between the result obtained from a sample survey and the result
obtained if the whole population had been included in the survey
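A small simulation can make this definition concrete. The sketch below is illustrative only: the population of incomes is randomly generated (not real data), the mean is used as the "result", and non-sampling errors are assumed to be absent.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: incomes of 1,000 families.
population = [rng.randint(20_000, 120_000) for _ in range(1_000)]
population_mean = mean(population)  # the result a census would give

# A sample survey of 50 families, selected without replacement.
sample = rng.sample(population, k=50)
sample_mean = mean(sample)

# Sampling error = sample result - population (census) result.
sampling_error = sample_mean - population_mean

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"sampling error:  {sampling_error:.2f}")
```

Drawing a second sample with `rng.sample` would usually give a different sample mean, and therefore a different sampling error, which is why this error is attributed to chance.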
Non-sampling Errors or Biases
Non-sampling errors can occur in a sample survey or in a census. These errors occur because of human
mistakes and not chance. Non-sampling errors can be minimized if questions are prepared carefully and
data is recorded correctly.
NON-SAMPLING ERROR - the errors that occur in the collection, recording and tabulation of data
Selection Error or Bias
The list of members of the population from which a sample is chosen is called the sampling frame. The
sampling frame used for the sample may not be representative of the entire population. This could
cause the results from the sample to differ from those of the population.
SELECTION ERROR - the error that occurs because the sampling frame is not representative of the
population
Non-Response Error
This error usually occurs when a survey is done by mail, because many people don’t return the
questionnaire. One suggested reason is that families don’t want to pay the return postage, so they
simply do not respond.
NON-RESPONSE ERROR - the error that occurs because many of the people included do not participate
Response Error
This error occurs when the person answering the survey does not give the correct information. They
may do so on purpose, or they may not actually know the correct answer while assuming they do.
RESPONSE ERROR - occurs when people who do the survey do not provide the correct answers
[Figure: Types of errors. Errors are either sampling (chance) errors or non-sampling errors (biases);
the non-sampling errors are selection, non-response, response and voluntary response errors.]
Voluntary Response Error
These errors arise from questionnaires published in newspapers and magazines. They occur because the
people who fill them out usually have very strong opinions on the issues.
VOLUNTARY RESPONSE ERROR - occurs when a survey is not conducted on a randomly selected sample
but on a questionnaire published in a magazine or newspaper and people are invited to respond
1.5.4
Random Sampling Techniques
Simple Random Sampling
From a given population, we can select a large number of samples of the same size. If each of these
samples has the same probability of being selected, it is a simple random sample. Example: a lottery
draw.
SIMPLE RANDOM SAMPLING - each sample of the same size has the same probability of being selected
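To see what "each sample of the same size has the same probability" means, the sketch below lists every possible sample of size 2 from a tiny, made-up population of four elements and assigns each the same probability. It assumes sampling without replacement and ignores the order of selection.

```python
from fractions import Fraction
from itertools import combinations
import random

population = ["A", "B", "C", "D"]  # hypothetical four-element population
sample_size = 2

# Every possible sample of size 2 (order ignored, no repeats).
all_samples = list(combinations(population, sample_size))

# Under simple random sampling each of them is equally likely.
probability = Fraction(1, len(all_samples))

# random.sample selects one such sample, each with equal probability.
chosen = random.sample(population, sample_size)

print(all_samples)   # the 6 possible samples
print(probability)   # 1/6
print(chosen)
```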
Systematic Random Sampling
Systematic random sampling is useful for large populations. An example procedure is as follows: a
population of 45000 households is arranged alphabetically (or based on a different characteristic).
If the sample size is to be 150, the ratio of population size to sample size is 45000/150 = 300.
Using this ratio, we randomly select one household from the first 300 households in the arranged list
and then select every 300th household after it.
SYSTEMATIC RANDOM SAMPLING - randomly select one member from the first k units of the list of
elements arranged based on a given characteristic where k is the number obtained by dividing the
population size by the intended sample size. Then every kth member, starting with the first selected
member, is included in the sample
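The procedure in this definition can be sketched as follows, using the 45000-household example. The function name `systematic_sample` and the use of household numbers as a stand-in for the arranged list are assumptions made for illustration.

```python
import random

def systematic_sample(elements, sample_size, rng=random):
    """Pick a random start among the first k elements, then every kth one."""
    k = len(elements) // sample_size   # e.g. 45000 // 150 = 300
    start = rng.randrange(k)           # random index in 0 .. k-1
    return [elements[start + i * k] for i in range(sample_size)]

# Stand-in for the arranged list: households numbered 1 to 45000.
households = list(range(1, 45_001))
sample = systematic_sample(households, 150, random.Random(1))

print(len(sample))   # 150
print(sample[:3])    # first selected household, then every 300th after it
```

Only the starting point is random here; once it is chosen, the rest of the sample is fully determined by k.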
Stratified Random Sampling
In stratified random sampling, we first divide the whole population into subpopulations, which are
called strata; for example, the population could be divided into groups based on income levels. We
then select a sample from each subpopulation, or stratum. The collection of all samples selected from
the different strata gives the required sample, called the stratified random sample.
STRATIFIED RANDOM SAMPLING - divide the population into subpopulations (strata). Then select one
sample from each subpopulation. Together, these samples form the stratified random sample
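A minimal sketch of this idea is shown below. The strata, their members and the function name `stratified_sample` are invented for illustration, and the sketch assumes the same number of elements is drawn from every stratum; drawing in proportion to stratum size is another common choice.

```python
import random

def stratified_sample(strata, per_stratum, rng=random):
    """Draw a simple random sample from every stratum and combine them."""
    combined = []
    for members in strata.values():
        combined.extend(rng.sample(members, per_stratum))
    return combined

# Hypothetical strata based on income level.
strata = {
    "low": [f"L{i}" for i in range(100)],
    "medium": [f"M{i}" for i in range(100)],
    "high": [f"H{i}" for i in range(100)],
}
sample = stratified_sample(strata, per_stratum=5, rng=random.Random(3))

print(sample)  # 15 elements: 5 from each income stratum
```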
Cluster Sampling
In cluster sampling, the population is first divided into geographical groups, or clusters. Then a
random sample of clusters is selected, and a simple random sample is taken from each selected cluster.
CLUSTER SAMPLING - the whole population is first divided into (geographical) groups called clusters.
Each cluster is representative of the population. Then a random sample of clusters is selected. Finally,
a random sample of elements from each of the selected clusters is selected.
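The two stages of cluster sampling can be sketched as follows. The regions, household labels and the function name `cluster_sample` are hypothetical and serve only to show the two random selections.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster, rng=random):
    """Randomly pick clusters, then randomly sample elements within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)   # stage 1: clusters
    return {
        name: rng.sample(clusters[name], per_cluster)   # stage 2: elements
        for name in chosen
    }

# Hypothetical geographical clusters, each holding 50 households.
clusters = {
    f"Region {r}": [f"R{r}-H{h}" for h in range(50)]
    for r in range(1, 11)
}
sample = cluster_sample(clusters, n_clusters=3, per_cluster=10,
                        rng=random.Random(5))

for region, households in sample.items():
    print(region, households)
```

Unlike stratified sampling, which draws from every subpopulation, only the selected clusters contribute elements to the sample.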