INTRODUCTION TO STATISTICS
LO1: Explain the meaning of data and statistics
LO2: Describe the role of uncertainty in decision making
LO3: Distinguish between various terms and concepts used in statistical analysis
LO4: Distinguish between descriptive and inferential statistics
LO5: Differentiate between different types of data
LO6: Distinguish between probability sampling and non-probability sampling
LO1: Explain the meaning of data and statistics
Data consists of raw facts and figures, typically recorded for many variables. Data
often comes from research conducted as part of a study or survey, in which case it is a
primary source. It is commonly stored as a digital data set and can be analyzed using
statistical procedures to create new information or knowledge. For example, census
data records the number of people within a particular area, along with variables such
as gender, age, and income.
Statistics are the interpretation of raw data, often to show relationships among
variables. Statistics answer “why” or “how” questions. Statistics allow you to use just a
few numbers, rather than a whole data set, to support an argument or statement.
Statistics are usually presented in tables, graphs, or charts. For example, statistics can
answer the question of whether there is a correlation between income level and
education.
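As a rough illustration of how a statistic condenses a data set into a single number,
the sketch below computes a Pearson correlation coefficient between years of education
and income. The figures are made up for illustration, and the function name pearson_r
is an assumption of this sketch, not something taken from the course material.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: years of education and annual income (in thousands)
education_years = [10, 12, 12, 14, 16, 18]
income_thousands = [22, 28, 30, 35, 48, 60]

r = pearson_r(education_years, income_thousands)
print(f"Correlation between education and income: r = {r:.2f}")
```

A value of r close to +1 indicates that the two variables tend to rise together in
this sample. It does not by itself show that one causes the other.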
LO2: Describe the role of uncertainty in decision making
Uncertainty plays a significant role in decision-making, as it can affect the outcome of a
decision and the confidence with which it is made. Here are some ways uncertainty can
impact decision-making:
Effects of Uncertainty on Decision-Making
1. Risk aversion: Uncertainty can lead to risk aversion, as decision-makers may prefer to
avoid making a decision or choose a safer option.
2. Delayed decision-making: Uncertainty can cause decision-makers to delay making a
decision, hoping that more information will become available.
3. Increased anxiety: Uncertainty can increase anxiety and stress levels, leading to
impulsive or irrational decision-making.
4. Reduced confidence: Uncertainty can reduce confidence in the decision-making
process, leading to second-guessing and self-doubt.
LO3: Distinguish between various terms and concepts used in statistical analysis
LO4: Distinguish between descriptive and inferential statistics
Descriptive Statistics
Descriptive statistics refers to the process of summarizing and analyzing data to
describe its main features in a clear and meaningful way. It is used to present raw data
in a form that makes it easier to understand and interpret. Descriptive statistics involves
both graphical representations (such as charts and plots) and numerical measures to
summarize data effectively. Unlike inferential statistics, which makes predictions about
a population based on a sample, descriptive statistics is applied to data that is already
known.
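The numerical measures mentioned above can be sketched with Python's standard
statistics module. The exam scores are hypothetical and the choice of measures is only
one possible summary.

```python
import statistics

# Hypothetical exam scores for a class of ten students
scores = [55, 62, 62, 70, 71, 75, 78, 80, 84, 93]

summary = {
    "count": len(scores),
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "range": max(scores) - min(scores),
    "stdev": statistics.stdev(scores),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {round(value, 2)}")
```

Every figure here describes only the ten scores that were actually observed. Nothing
is claimed about students outside this class.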
Inferential Statistics
Inferential statistics involves using data from a sample to make predictions,
generalizations, or conclusions about a larger population. Unlike descriptive statistics,
which simply summarizes known data, inferential statistics makes inferences or draws
conclusions that go beyond the available data. It uses probability theory to estimate
population parameters and test hypotheses. By working with a sample, inferential
statistics allows researchers to make informed decisions without having to gather data
from an entire population.
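A minimal sketch of estimating a population parameter from a sample is shown below.
It builds an approximate 95% interval for a population mean. The heights are
hypothetical, and the use of a normal approximation with z = 1.96 is an assumption of
this sketch. For a sample this small, a t-distribution critical value would be more
usual.

```python
import statistics
from math import sqrt

# Hypothetical sample: heights (cm) of 12 people drawn from a larger population
sample = [162.0, 165.5, 158.2, 171.3, 168.0, 175.4,
          160.1, 169.9, 166.7, 172.5, 163.3, 167.8]

n = len(sample)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / sqrt(n)

# Assumption: normal approximation, z = 1.96 for roughly 95% confidence
z = 1.96
lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error

print(f"Sample mean: {sample_mean:.1f} cm")
print(f"Approximate 95% interval for the population mean: "
      f"{lower:.1f} to {upper:.1f} cm")
```

The sample mean is a descriptive statistic. The interval around it is the inferential
step, because it makes a statement about people who were never measured.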
LO5: Differentiate between different types of data
Types of data include:
a) Nominal data
b) Ordinal data
c) Discrete data
d) Continuous data
e) Interval data
f) Ratio data
a) Nominal data: Nominal data labels variables without any order or quantitative
value. It consists of categories or names that cannot be ordered or ranked, and it
is often used to sort observations into groups. Hair colour is an example: one
colour cannot be ranked above or below another. The name “nominal” comes
from the Latin word “nomen,” which means “name.” Nominal data cannot be
used for arithmetic or sorted into a meaningful order; its values simply fall into
distinct categories.
Examples of nominal data:
i) Colour of hair (Blonde, Red, Brown, Black, etc.)
ii) Marital status (Single, Married, Widowed)
iii) Nationality (Indian, German, American)
iv) Gender (Male, Female, Others)
v) Eye colour (Black, Brown, etc.)
vi) Race (White, Black, Asian)
vii) Religion (Hinduism, Christianity, Islam, Judaism)
viii) Blood type (A, B, AB, O)
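Because nominal categories cannot be ordered or averaged, the usual summaries are
counts, proportions, and the mode. The sketch below shows this for blood type, using
hypothetical survey responses.

```python
from collections import Counter

# Hypothetical survey responses for a nominal variable (blood type)
blood_types = ["O", "A", "B", "O", "AB", "A", "O", "A", "O", "B"]

counts = Counter(blood_types)
most_common_type, most_common_count = counts.most_common(1)[0]

# Alphabetical sorting is only for display; it carries no statistical meaning
for blood_type, count in sorted(counts.items()):
    print(f"{blood_type}: {count} ({count / len(blood_types):.0%})")
print(f"Mode: {most_common_type}")
```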
b) Ordinal data: Ordinal data can be categorized and has a natural ordering, but
the categories have no quantitative (numeric) value. For example, someone's
final place in a race would be considered ordinal data: first, second, third, etc.
This data type can be described statistically in terms of frequency. Other
examples of ordinal data include academic grades (A, B, C, D, F), education
degree level (Bachelor's, Master's, Doctoral), and satisfaction rating (extremely
dissatisfied, dissatisfied, neutral, satisfied, extremely satisfied).
Ordinal data is a type of data that consists of categories that can be ordered or ranked.
However, the distance between categories is not necessarily equal. Ordinal data is
often used to measure subjective attributes or opinions, where there is a natural order
to the responses.
Examples of ordinal data include:
• Education level (Elementary, Middle, High School, College)
• Job position (Manager, Supervisor, Employee)
Ordinal data can be represented using bar charts or line charts. These displays show
the order or ranking of the categories, but they do not imply that the distances between
categories are equal.
Ordinal data is usually analyzed using non-parametric tests, which do not assume that
the data follow a particular distribution such as the normal distribution. Common
non-parametric tests for ordinal data include the Wilcoxon signed-rank test and the
Mann-Whitney U test.
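To show how a rank-based test uses only the order of the categories, the sketch below
computes the Mann-Whitney U statistic for two groups of satisfaction ratings by
counting pairs. The responses, the group names, and the helper mann_whitney_u are all
hypothetical. The sketch stops at the U statistic and does not compute a p-value.

```python
# Ordered categories for a satisfaction rating, lowest to highest
LEVELS = ["extremely dissatisfied", "dissatisfied", "neutral",
          "satisfied", "extremely satisfied"]
RANK = {level: position for position, level in enumerate(LEVELS)}

def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: pairs where a ranks above b; ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if RANK[a] > RANK[b]:
                u += 1.0
            elif RANK[a] == RANK[b]:
                u += 0.5
    return u

# Hypothetical responses from two independent groups of customers
store_a = ["satisfied", "neutral", "extremely satisfied", "satisfied"]
store_b = ["dissatisfied", "neutral", "satisfied", "extremely dissatisfied"]

u_a = mann_whitney_u(store_a, store_b)
u_b = mann_whitney_u(store_b, store_a)
print(f"U for store A: {u_a}, U for store B: {u_b}")
```

Only comparisons such as "higher than" or "equal to" are used, so the result does not
depend on the distances between categories being equal.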
c) Discrete data: Discrete data has a quantitative (numeric) value and is counted, not
measured. Its values are distinct, separate values that can be counted as whole
numbers. For example, the total number of cars parked on campus would be
considered discrete data. This data type can be described statistically in aggregate, as
a single number: for example, there may be 50 cars parked on campus. This aggregate
number can be subdivided based on a categorical feature: there are 10 blue cars, 30
red cars, and 10 green cars. You can run more advanced statistical tests on discrete
data if you observe the data over time. With this longitudinal view, you may ask for the
average number of cars parked on campus over a given period (day, week, etc.). Other
examples of discrete data include the number of students in a classroom, the number
of computers in a given space, and the number of light fixtures in a commercial
property.
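The car park example can be sketched as follows. The colour counts come from the text
above; the daily totals for the longitudinal view are hypothetical.

```python
from collections import Counter

# One entry per car parked on campus today (counts taken from the example)
cars_today = ["blue"] * 10 + ["red"] * 30 + ["green"] * 10

total_cars = len(cars_today)
by_colour = Counter(cars_today)
print(f"Total cars: {total_cars}")
print(f"By colour: {dict(by_colour)}")

# Hypothetical daily totals over one working week (longitudinal view)
daily_totals = [50, 47, 53, 49, 51]
average_per_day = sum(daily_totals) / len(daily_totals)
print(f"Average cars per day: {average_per_day}")
```

Each individual count is a whole number, although an average of counts need not be.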
d) Continuous data: Continuous data has a quantitative (numeric) value but is
measured instead of counted. For example, someone's height would be
considered continuous data. When continuous data is observed and collected in
a dataset, statistical procedures can be used to determine the mean, mode,
range, standard deviation, and other statistical characteristics of the data. Other
examples of continuous data include temperature in a given space, time taken to
complete a task, and length of a film. Continuous data can be converted between
units without changing the quantity itself: 3 feet is the same as 1 yard.
Continuous data is quantitative data that falls on a continuous range: the
variable can take any value within that range.
Examples of continuous data:
• Temperature range
• Salary range of workers in a factory, etc.
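A short sketch of summarizing continuous measurements and converting their units is
given below. The heights are hypothetical; the conversion factors (2.54 cm per inch,
3 feet per yard) are standard.

```python
import statistics

FEET_PER_YARD = 3
CM_PER_INCH = 2.54

# Hypothetical heights measured in inches
heights_in = [64.5, 66.0, 67.25, 70.0, 71.5]

# Converting units changes the numbers but not the quantities they describe
heights_cm = [h * CM_PER_INCH for h in heights_in]

print(f"Mean height: {statistics.mean(heights_in):.2f} in "
      f"= {statistics.mean(heights_cm):.2f} cm")
print(f"Range: {max(heights_in) - min(heights_in):.2f} in")
print(f"Standard deviation: {statistics.stdev(heights_in):.2f} in")
print(f"3 feet = {3 / FEET_PER_YARD:.0f} yard")
```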
e) Interval data: Interval data is measured along a numerical scale that has equal
distances between adjacent values. These distances are called “intervals.”
Interval data has no true zero, so differences between values are meaningful
but ratios are not.
Examples of interval data include:
i) Test scores. SAT scores that range from 200 to 800 do not use an absolute
zero because there are no scores below 200. Teachers use interval data when
grading tests or calculating the grade point average. Even credit scores use
interval data.
ii) Time on a twelve-hour clock. The numbers on the clock are equally spaced,
so the distance between four and five o’clock is the same as the distance
between five and six o’clock.
iii) Temperature in Celsius or Fahrenheit. Zero on these scales does not mean
the absence of temperature, so there is no absolute zero. Interval data is
commonly used as a tool in statistical research, particularly when working with
probability or studying a specific population.
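The temperature example can be made concrete with the standard Celsius to Fahrenheit
conversion. The sketch below, using two illustrative temperatures, suggests why
differences are meaningful on an interval scale while ratios are not.

```python
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32

cool_c, warm_c = 10.0, 20.0
cool_f = celsius_to_fahrenheit(cool_c)
warm_f = celsius_to_fahrenheit(warm_c)

# Differences are meaningful: a 10 degree Celsius gap is an 18 degree Fahrenheit gap
print(f"Difference: {warm_c - cool_c} C = {warm_f - cool_f} F")

# Ratios are not: "twice as hot" depends on which scale is chosen
print(f"Ratio in Celsius: {warm_c / cool_c}")
print(f"Ratio in Fahrenheit: {warm_f / cool_f:.2f}")
```

The ratio changes from 2.0 to about 1.36 when only the unit changes, which is the
practical consequence of the scale having no true zero.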
f) Ratio data: Ratio data is quantitative (numeric) data that has a meaningful
interval between data points. Unlike interval data, ratio data has a meaningful
zero, known as a true zero. It measures variables on a continuous scale, with an
equal distance between adjacent values.
Examples of ratio data are:
• Length (in cm, where 0 cm means no length)
• Weight (in kg, where 0 kg means no weight)
• Temperature in Kelvin (0, +10, +20, +30, +40, etc.)
• Height (5ft. 8in., 5ft. 9in., 5ft. 10in., 5ft. 11in., 6ft. 0in. etc.)
• Price of goods ($0, $5, $10, $15, $20, $30, etc.)
• Age in years (from zero to 100+)
• Distance (from zero miles/km upwards)
• Time intervals (might include race times or the number of hours spent watching
Netflix!)
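With a true zero, statements such as "twice as long" hold regardless of the unit. The
sketch below checks this for two hypothetical lengths and for two illustrative Kelvin
temperatures.

```python
CM_PER_INCH = 2.54

# Hypothetical lengths of two boards, in centimetres
short_cm, long_cm = 50.0, 100.0
short_in, long_in = short_cm / CM_PER_INCH, long_cm / CM_PER_INCH

# With a true zero, "twice as long" holds in any unit
ratio_cm = long_cm / short_cm
ratio_in = long_in / short_in
print(f"Ratio in cm: {ratio_cm}, ratio in inches: {ratio_in:.1f}")

# Kelvin has a true zero, so temperature ratios are meaningful on that scale
cool_k, warm_k = 150.0, 300.0
print(f"{warm_k} K is {warm_k / cool_k} times {cool_k} K")
```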
LO6: Distinguish between probability sampling and non-probability sampling
Probability sampling is a sampling strategy in which each member of a
population has a known, non-zero chance of being chosen; in simple random
sampling that chance is also equal for every member. The goal of this
approach is to get a representative sample that correctly represents the makeup
and features of the full population. Simple random sampling, systematic
sampling, stratified sampling and cluster sampling are the four main methods of
probability sampling. Non-probability sampling, by contrast, does not give every
individual a known chance of selection. Instead, elements are selected based
on their accessibility or the researcher's judgment. Convenience sampling,
purposive sampling, quota sampling, and snowball sampling are examples of
non-probability sampling methods.
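Three of the probability methods and one non-probability method can be sketched with
Python's standard random module. The population of 100 ID numbers, the two strata, and
the sample size of 10 are all hypothetical choices made for this illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100 student ID numbers, split into two strata
population = list(range(1, 101))
strata = {"first_year": population[:60], "final_year": population[60:]}
sample_size = 10

# Simple random sampling: every member has the same chance of selection
simple_random = rng.sample(population, sample_size)

# Systematic sampling: random start, then every k-th member
k = len(population) // sample_size
start = rng.randrange(k)
systematic = population[start::k]

# Stratified sampling: draw from each stratum in proportion to its size
stratified = []
for name, members in strata.items():
    share = round(sample_size * len(members) / len(population))
    stratified.extend(rng.sample(members, share))

# Convenience sampling (non-probability): take whoever is easiest to reach,
# represented here by the first ten IDs on the list
convenience = population[:sample_size]

print("Simple random:", sorted(simple_random))
print("Systematic:   ", systematic)
print("Stratified:   ", sorted(stratified))
print("Convenience:  ", convenience)
```

In the first three methods the chance of selection can be stated in advance for every
member. In the convenience sample, IDs 11 to 100 have no chance of selection at all.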