Prof. P.K.Suri
School of Management,
Delhi Technological University
Quantitative Techniques
(Statistics)
Fundamentals
• Statistics is a science. It is a way to get information from data to
facilitate decision making or interpretation.
• Examples of Data:
− Daily wholesale prices and arrivals of a particular agricultural
produce (say wheat) in particular markets for the last 3 months;
− Marks of students in QT for the last three years;
− Weather data of a city for the last 30 days;
− House-wise, model-wise ownership of cars in a particular locality,
etc.
(Explain associated interpretations/ decisions).
Fundamentals
• Variable: A characteristic of an item or individual
Variable Types:
Categorical variables (Qualitative variables):
(have values that can be placed only in categories e.g.
Are you married: Yes/No)
Numerical variables (Quantitative variables):
(have values that represent quantities)
- Are of two types : Discrete and Continuous
• Data: The set of individual values associated with a variable
Fundamentals (contd.)
Data can be:
• quantitative or qualitative
• in grouped or ungrouped form.
Quantitative data can be subjected to arithmetic operations unlike
qualitative data.
The field of statistics deals with measurements (Quantitative or
Qualitative).
Four generally used scales of measurement (from weakest to
strongest):
To describe values of a categorical variable, we use: Nominal scale and Ordinal
scale
To describe values of a numerical variable, we use: Interval scale and Ratio scale
Fundamentals (contd.)
Nominal Scale: Here numbers are used simply as labels for
categories. For example, an employee may be (M) Male/ (F)
Female (even if numbers are assigned to categories, these
are arbitrary); Weakest scale because you cannot specify
any ranks across categories
Ordinal Scale: Here, data elements are ordered according
to their relative merit. Ex. A product may be ranked as 1, 2, 3
or 4 where 1 denotes worst quality and 4 the best quality.
The ordinal scale does not tell us how much better a product is
than others. It only tells us that it is better.
Thus, ordinal scale is weaker in the sense that it is silent
about the amount of difference between categories.
Fundamentals (contd.)
• Interval Scale: An ordered scale in which the difference between
measurements is a meaningful quantity but which does not involve a true
zero point.
The value of 0 is assigned arbitrarily, and thus we cannot take the ratio of
two measurements. But we can take the ratio of intervals.
Ex.: 7°C is 2 degrees warmer than 5°C, and so is 72°C compared with
70°C, but the environmental conditions are totally different.
Ex: Time of day is on an interval scale. We cannot say that 10 AM is
twice as late as 5 AM. But we can say that the interval between 0 AM
and 10 AM (10 hrs) is twice as long as the interval between 0 AM and 5
AM (5 hrs). This is because 0 AM does not mean the absence of any
time.
Fundamentals (contd.)
• Ratio Scale: An ordered scale in which the difference between
measurements is a meaningful quantity and which involves a true 0 point
(the 0 in a ratio scale is an absolute 0). Strongest scale.
• If two measurements are in ratio scale, then we can take ratios of
those measurements.
Ex. Money is measured in ratio scale. A sum of Rs. 0 means no
money and is thus an absolute zero. A sum of Rs. 100 is twice as
large as Rs. 50. Other examples are height, weight, volume, area,
length.
[Note that in interval scale, the interval between two interval scale
measurements is in ratio scale (not the individual observations). ]
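The note above can be checked numerically. The following is a minimal Python sketch, assuming the standard Celsius-to-Fahrenheit conversion; the temperatures and the helper name c_to_f are chosen only for illustration and are not part of the lecture. It suggests that the ratio of two interval-scale measurements changes with the unit, while the ratio of two intervals does not.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit (both are interval scales)."""
    return celsius * 9 / 5 + 32

# Hypothetical temperatures in Celsius, chosen only for illustration.
t_low, t_mid, t_high = 5.0, 10.0, 20.0

# Ratio of two measurements: depends on the unit, so it is not meaningful.
ratio_c = t_mid / t_low
ratio_f = c_to_f(t_mid) / c_to_f(t_low)

# Ratio of two intervals: identical in both units, so it is meaningful.
interval_ratio_c = (t_high - t_mid) / (t_mid - t_low)
interval_ratio_f = (c_to_f(t_high) - c_to_f(t_mid)) / (c_to_f(t_mid) - c_to_f(t_low))

print("ratio of measurements:", ratio_c, ratio_f)
print("ratio of intervals:   ", interval_ratio_c, interval_ratio_f)
```

The two printed measurement ratios differ, whereas both interval ratios equal 2, which is why only intervals can be compared by ratio on an interval scale.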
Fundamentals (contd.)
Collecting Data
Data Sources
Primary Data (Data which you collect yourself for doing analysis)
Secondary Data (Data which is collected by someone else and you use
for doing analysis)
Sources could be:
Data distributed by an organization or individual (e.g. Centre for
Monitoring Indian Economy: www.cmie.com; CRISIL: www.crisil.com;
Nielsen: provides consumer research data to telecom and mobile media
companies)
The outcomes of a designed experiment
The responses from a survey
The results of an observational study
Data collected by ongoing business activities
Samples and Population
• The distinction between sample and population is very important in
statistics.
• A population is the group of all items of interest to an investigator
(not necessarily a group of people). Also called the universe. On the DTU
campus, it may be the population of B.Tech. students, the population of
MBA students, the population of faculty members, etc. Other examples:
the population of weights of cricket bats produced in a factory, the
population of cows in a village, etc.
• A descriptive measure of a population is called a parameter, e.g.
average weight of bats produced, average milk given by cows in a
village.
• A sample is a subset of units selected from a population (sampling
units vs sampled units).
• A descriptive measure of a sample is called a statistic, e.g. average
weight of a sample of bats, average milk given by sampled cows.
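The parameter/statistic distinction can be illustrated with a small simulation. This is only a sketch under stated assumptions: the bat weights are invented (normally distributed around 1200 g), and the population size, sample size and seed are arbitrary choices, not figures from the lecture.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Simulated population: weights (in grams) of 5000 bats. Invented data.
population = [rng.gauss(1200, 50) for _ in range(5000)]

# Parameter: a descriptive measure of the whole population.
population_mean = statistics.mean(population)

# Statistic: the same measure computed on a sample of 50 bats.
sample = rng.sample(population, 50)
sample_mean = statistics.mean(sample)

print("population mean (parameter):", round(population_mean, 1))
print("sample mean (statistic):    ", round(sample_mean, 1))
```

The sample mean will usually be close to, but not exactly equal to, the population mean; a different sample would give a different statistic, while the parameter stays fixed.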
Fundamentals (contd.)
Sampling
• A sample is drawn from a population using a sampling
procedure.
Non Probability Samples
Judgment Samples
Convenience Samples
Probability Samples
Simple Random Sampling (SRS) (With or Without replacement)
Stratified Sampling
Systematic Sampling
Cluster Sampling, etc.
• The aim is to get a representative sample of the population so
that it leads to near accurate inferences about the population
parameters.
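Some of the probability sampling procedures listed above can be sketched in a few lines of Python. The population of 1500 numbered units, the sample size of 15 and the seed are assumptions made only for illustration; the systematic-sampling rule shown (random start, then every k-th unit) is one common variant.

```python
import random

rng = random.Random(7)  # fixed seed so the run is repeatable

population = list(range(1, 1501))  # hypothetical unit IDs 1..1500
n = 15                             # desired sample size

# Simple random sampling without replacement: no unit can repeat.
srs_without = rng.sample(population, n)

# Simple random sampling with replacement: a unit may be drawn twice.
srs_with = rng.choices(population, k=n)

# Systematic sampling: random start, then every step-th unit.
step = len(population) // n        # sampling interval (100 here)
start = rng.randrange(step)        # random start within the first interval
systematic = population[start::step][:n]

print(sorted(srs_without))
print(sorted(srs_with))
print(systematic)
```

Stratified and cluster sampling would additionally need a grouping variable (strata or clusters), which is why they are left out of this sketch.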
Fundamentals (contd.)
Sampling Frame
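A sampling frame is the list of all items in the population from which the
sample is drawn. It is to be prepared before sampling.
An incomplete sampling frame may lead to misleading results (e.g. when you
exclude a particular group of people).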
Fundamentals (contd.)
When do we prefer sampling over census approach of data
collection?
• When selecting a sample is less time consuming than selecting
every item of the population
• When selecting a sample is less costly than selecting every item of
the population
• When analyzing a sample is less cumbersome than analyzing the entire
population
• Data Cleaning: Removing outliers
Fundamentals (contd.)
Statistical Inference
• A conclusion drawn about a population based on the information
in a sample from the population is called a statistical inference.
• We use sample statistics to make inference about population
parameters.
• Conclusion about a population based on the sample statistics may
not always be correct. Therefore, we use measures of reliability
while undertaking statistical inference. Two such measures are:
– Confidence level and
– Significance level.
Fundamentals (contd.)
Statistical Inference
• Confidence level is the proportion of times an estimation
procedure will be correct. For example, if we use an estimation
procedure and produce an estimate that has a confidence level of
95% that would mean – In the long run, estimates based on this
estimation procedure will be correct 95% of the time.
• Significance level measures how frequently a conclusion drawn
about the population will be wrong in the long run. A 5%
significance level means that, in the long run, a conclusion drawn
would be wrong 5% of the time.
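The long-run reading of a 95% confidence level can be illustrated by simulation. The sketch below is not from the lecture: it assumes a normal population with known mean 50 and known standard deviation 10, samples of size 30, and the usual z value of 1.96 for a 95% interval around the sample mean. It counts how often the interval contains the true mean.

```python
import math
import random

def coverage(trials=2000, n=30, mu=50.0, sigma=10.0, z=1.96, seed=1):
    """Fraction of simulated 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / math.sqrt(n)  # known-sigma interval half-width
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / trials

observed = coverage()
print("share of intervals containing the true mean:", observed)
```

The printed share should land near 0.95, and the remaining share (near 0.05) corresponds to the intervals that miss, which mirrors the 5% significance level described above.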
Fundamentals (contd.)
Sampling
• e.g. a farmer ‘X’ has 1500 sheep. These constitute the
entire population of sheep for farmer ‘X’. If 15 sheep are
selected from this population, it will form a sample of 15
sheep from the population of 1500 sheep. Further, if these
15 sheep are selected at random, the sample would be a
simple random sample.
• Note that Sample and Population are relative to each other.
If we consider the entire district with 20,000 sheep, the 1500
sheep with farmer ‘X’ could be one sample of the district
population of sheep (though not a random sample of 1500
sheep from the district).
Fundamentals (contd.)
Types of Survey Errors
• Validity of survey results must be examined. We must evaluate the
purpose of survey and for whom it is conducted.
• Inferences based on non probability samples could be seriously
misleading
• The only way to make valid statistical inferences about a population is by
using a probability sample.
• Even surveys based on probabilistic samples are subject to four types of
errors:
- Coverage error
- Nonresponse error
- Sampling error
- Measurement error
Fundamentals (contd.)
Types of Survey Errors (contd.)
• Our aim should be to minimize these four errors.
Ex.
Non-response bias, i.e. bias introduced when we ignore the fact that
certain people may not respond to a few questions. The bias gets
introduced when such people belong more to one segment. E.g.
consider the question “Have you ever been arrested?” There may be a
poor response to this question from people who have indeed been
arrested.
Fundamentals (contd.)
Examples: Use of Statistical Inference in Business Situations
• A pharmaceutical manufacturer interested in marketing a new drug may be
required to prove that the drug does not cause any side effects. The drug
may be tested on a random sample of people and the technique of
statistical inference may be used to draw conclusions about the entire
population.
• To assess the popularity of its ATMs, a bank may seek the opinion of a
randomly selected sample of customers. Statistical inference can be used
to generalize the conclusions to the entire population of the bank’s customers.
• A quality control engineer at a plant making bulbs needs to ensure that not
more than 3% of the bulbs produced are defective. The engineer may
periodically collect random samples of bulbs and check their quality. Based
on the random samples, the engineer can draw conclusions about the
proportion of defective items in the entire population of bulbs.
Fundamentals (contd.)
Descriptive Statistics
Percentiles and Quartiles
Percentiles
• The Pth percentile of a group of numbers is that value below which
lie P% of the numbers in the group. The position of Pth percentile is
given by (n+1)P/100, where n is the number of data points.
• Ex: sales made by each of the 20 sales persons of a departmental
store are as follows:
• (arranged in ascending order – to be done in case data is not ordered)
6,9,10,12,13,14,14,15,16,16,16,17,17,18,18,19,20,21,22,24.
50th percentile: position 10.5, i.e. value 16
80th percentile: position 16.8, i.e. value 19.8
90th percentile: position 18.9, i.e. value 21.9
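The percentile values for the sales data can be reproduced with a short Python sketch. It uses the position rule (n+1)P/100 from the slide and assumes linear interpolation between the two neighbouring ordered values when the position is fractional, which is consistent with the answers 19.8 and 21.9. The function name percentile and the handling of positions outside 1..n are illustrative choices.

```python
def percentile(values, p):
    """P-th percentile using position (n + 1) * p / 100 with interpolation."""
    data = sorted(values)              # arrange in ascending order first
    n = len(data)
    position = (n + 1) * p / 100       # 1-based position in the ordered data
    k = int(position)                  # whole part of the position
    frac = position - k                # fractional part of the position
    if k < 1:                          # position before the first value
        return data[0]
    if k >= n:                         # position at or beyond the last value
        return data[-1]
    # Move a fraction of the way from the k-th to the (k+1)-th value.
    return data[k - 1] + frac * (data[k] - data[k - 1])

sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
         16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

for p in (50, 80, 90):
    print(p, round(percentile(sales, p), 2))
```

For the 80th percentile, for instance, the position is 16.8, so the result lies 0.8 of the way from the 16th value (19) to the 17th value (20).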
Fundamentals (contd.)
Descriptive Statistics
Quartiles
• Quartiles are special percentiles which break the distribution of data
into four groups.
• The first quartile is the 25th percentile. It is the point below which
lie one fourth of the data. Also called the lower quartile.
• The second quartile is the 50th percentile. It is the point below
which lie one half of the data (also called the median). Also called the
middle quartile.
• The third quartile is the 75th percentile. It is the point below which
lie 75% of the data. Also called the upper quartile.
• The difference between the third and first quartiles is called the
interquartile range. It is a measure of the spread of the data.
Exercise: Interquartile range for above example is 18.75 – 13.25 =
5.5.
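The exercise can be checked with Python's standard library. This sketch relies on statistics.quantiles with method="exclusive", which positions quartiles at i(n+1)/4 and interpolates, matching the (n+1)P/100 rule used for percentiles; the variable names are illustrative.

```python
import statistics

sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
         16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(sales, n=4, method="exclusive")

iqr = q3 - q1  # interquartile range: spread of the middle half of the data

print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3)
print("IQR =", iqr)
```

Other software may use a different quartile convention and return slightly different values for the same data.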
Fundamentals (contd.)
Descriptive Statistics
Measures of Central Tendency
Common measures of central tendency (centre of data) : mean,
median, mode.
• Mean or Arithmetic Mean or Average:
– Strengths
– Limitations
(Sample Mean, Population Mean)
• Median
– Strengths
– Limitations
• Mode
– Strengths
– Limitations
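As a small illustration of the three measures, the sketch below computes them for the 20 sales figures used in the percentile example, then adds one invented extreme value (240) to show how each measure reacts. The outlier and the variable names are assumptions for illustration only.

```python
import statistics

sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
         16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

mean_sales = statistics.mean(sales)      # sum of values / number of values
median_sales = statistics.median(sales)  # middle value of the ordered data
mode_sales = statistics.mode(sales)      # most frequently occurring value

# Add one extreme (hypothetical) value and recompute.
with_outlier = sales + [240]
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print("mean:", mean_sales, "median:", median_sales, "mode:", mode_sales)
print("with outlier -> mean:", round(mean_out, 2), "median:", median_out)
```

The mean moves substantially when the extreme value is added, while the median does not, which is one commonly cited limitation of the mean and strength of the median.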

Qt business statistics-lesson1-2013
