BUSINESS STATISTICS
DATA ORGANIZATION, VISUALIZATION & DESCRIPTION
By
Prof Alok Kumar Singh
Introduction
• Statistics is a way of thinking that can lead to better decisions.
• It is the science of gathering, presenting, analyzing, and interpreting data.
• It uses mathematics and probability.
• Statistics requires analytical skills and is an important part of business education.
• The DCOVA framework (Define, Collect, Organize, Visualize, Analyze) guides your application of statistics.
• Modern-day information technology enables businesses to apply statistics in new ways, solving business problems with large volumes of data and analytical tools.
A.K.Singh, IIM Nagpur
Introduction (Contd..)
• Business statistics provides a formal basis to:
• Summarize and visualize business data.
• Reach conclusions from business data.
• Make reliable predictions about business activities.
• Improve business processes.
Statistics in Business (But not limited to…)
• Operations: Supply chain performance and benchmarking
• Accounting: Auditing, cost estimation, and financing
• Economics: Local, regional, national, and international economic performance
• Finance: Investments and portfolio management
• Human Resource Management: Compensation and performance measurement
• Management Information Systems: Performance of systems
• Marketing: Market analysis and consumer research
• International Business: International market and demographic analysis
Some Basic Definitions
Variable
• A characteristic of an item or individual.
Data
• The set of individual values associated with one or more variables.
Statistic
• A value that summarizes the data of a particular variable.
Descriptive Statistics
• The methods that primarily help summarize and present data.
Some Basic Definitions (Contd…)
Inferential Statistics
• Methods that use data collected from a small group to reach conclusions about a larger group.
Population
• The whole collection of all persons, objects, or items under study
Census
• Gathering data from the entire population
Sample
• Gathering data on a subset of the population
• Use information about the sample to infer about the population
Population vs Sample
[Figure: Population and Sample]
Parameter vs. Statistic
• Parameter — descriptive measure of the population
• Usually represented by Greek letters
• Statistic — descriptive measure of a sample
• Usually represented by Roman letters
• $\mu$ denotes the population mean
• $\sigma^2$ denotes the population variance
• $\sigma$ denotes the population standard deviation
• $\bar{x}$ denotes the sample mean
• $s^2$ denotes the sample variance
• $s$ denotes the sample standard deviation
Process of Inferential Statistics
1. Population ($\mu$ is the parameter)
2. Select a random sample
3. Sample ($\bar{x}$ is the statistic)
4. Use $\bar{x}$ to estimate $\mu$
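The four steps of inferential statistics can be simulated in a few lines. This is a minimal sketch, assuming a hypothetical population of 1,000 order values generated with a fixed seed (the data are illustrative, not from the course). Parameters use the population formulas (divide by N, `pstdev`); statistics use the sample formulas (divide by n - 1, `stdev`).

```python
import random
import statistics

# Hypothetical population of 1,000 order values (illustrative data only)
rng = random.Random(42)
population = [rng.gauss(500, 80) for _ in range(1000)]

# Step 1: population parameters (Greek letters), computed with denominator N
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

# Step 2: select a simple random sample of n = 50 without replacement
sample = rng.sample(population, k=50)

# Step 3: sample statistics (Roman letters), computed with denominator n - 1
x_bar = statistics.fmean(sample)
s = statistics.stdev(sample)

# Step 4: use x_bar as the estimate of mu
print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}, estimation error = {x_bar - mu:.2f}")
print(f"sigma = {sigma:.2f}, s = {s:.2f}")
```

Rerunning with a different seed gives a different sample and therefore a different $\bar{x}$; that sampling variability is one source of the uncertainty discussed next.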
Uncertainty in Business
• Inferences about parameters are made under conditions of uncertainty, which is always present in statistics.
• Uncertainty can be caused by:
• Randomness in the selection of a sample
• Lack of knowledge about the source of the inferences
• Changes in conditions
Statistics in Business
• Probability is used in statistics (discussed in detail later in the course):
• To estimate the level of confidence in a confidence interval
• To calculate the p-value in hypothesis testing
Classifying Variables By Type
• Categorical (qualitative) variables take categories as their values (data), such as “yes, no”, “blue, brown, green”, or “Easy, Normal, Tough”.
• Numerical (quantitative) variables have values (data) that represent a counted or measured quantity.
• Discrete variables arise from a counting process.
• Continuous variables arise from a measuring process.
Examples of Types of Variables
Columns: Question | Responses | Variable Type
• Do you have a Facebook profile?
• How many WhatsApp messages have you sent in the past hour?
• How long did the mobile app update take to download?
• What is the colour of your eyes?
• What is your weight?
• In which class do you study?
• In which section are you?
• How do you rate the new Netflix series?
Levels of Data Measurement: Nominal
• In nominal measurement, the values just "name" the attribute uniquely. Numbers, when used, only classify or categorize.
• No ordering of the cases is implied.
• Gender:
• boys vs. girls, or
• males vs. females
• Religion:
• Hindu
• Muslim
• Sikh
• Christian
• Jain, etc.
• Employment classification:
• 1 for Educator
• 2 for Construction Worker
• 3 for Manufacturing Worker
Levels of Data Measurement: Ordinal
• A variable is measured on an ordinal scale if its values can be ranked. However, the differences between the numbers are not comparable.
• For example:
• A gold medal reflects superior performance to a silver or bronze medal in the Olympics.
• You can’t say that a gold and a bronze medal average out to a silver medal, though.
• Position within an organization:
• 1 for President
• 2 for Vice President
• 3 for Plant Manager
• 4 for Department Supervisor
• 5 for Employee
• Preference scales are typically ordinal. How much do you like this cereal?
• Like it a lot, somewhat like it, neutral, somewhat dislike it, dislike it a lot.
Levels of Data Measurement: Interval
• In interval measurement, the distance between attributes does have meaning.
• Numerical data typically fall into this category.
• There is no absolute (true) zero, so ratios of values are not meaningful.
• For example:
• Measuring temperature
• Scales for measurement
Levels of Data Measurement: Ratio
• In ratio measurement, there is always a meaningful reference point (either 0 for rates or 1 for ratios).
• This means that you can construct a meaningful fraction (or ratio) with a ratio variable.
• In applied social research, most "count" variables are ratio, for example, the number of clients in the past six months.
• Height, weight, and volume
• Profit and loss
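The practical difference between the interval and ratio levels is whether a ratio survives a change of units. A small sketch with illustrative values: temperature (interval) gives a different "ratio" in Celsius and Fahrenheit, while weight (ratio) gives the same ratio in kilograms and pounds. The helper `celsius_to_fahrenheit` is defined here for illustration.

```python
# Interval scale: Celsius has an arbitrary zero, so ratios change with the unit.
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

c_low, c_high = 10.0, 20.0
f_low, f_high = celsius_to_fahrenheit(c_low), celsius_to_fahrenheit(c_high)

print(c_high / c_low)  # 2.0 in Celsius
print(f_high / f_low)  # 68 / 50 = 1.36 in Fahrenheit: "twice as hot" is not meaningful

# Differences remain comparable: they only rescale by the conversion factor 9/5.
print((f_high - f_low) / (c_high - c_low))  # 1.8

# Ratio scale: weight has a true zero, so the ratio does not depend on the unit.
LB_PER_KG = 2.20462
kg_light, kg_heavy = 40.0, 80.0
print(kg_heavy / kg_light)                              # 2.0
print((kg_heavy * LB_PER_KG) / (kg_light * LB_PER_KG))  # about 2.0 (floating-point rounding aside)
```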
Types of Variables (Summary)
Variables
• Categorical
  • Nominal (defined categories). Examples: Marital Status, Political Party, Eye Color
  • Ordinal (ordered categories). Examples: Ratings such as Good, Better, Best or Low, Med, High
• Numerical
  • Discrete (counted items). Examples: Number of Children, Defects per hour
  • Continuous (measured characteristics). Examples: Weight, Voltage
Sources of Data
• Primary sources: The data collector is the one using the data for analysis:
• Data from a political survey
• Data collected from an experiment
• Observed data
• Secondary sources: The person performing the data analysis is not the data collector:
• Analyzing census data
• Examining data from print journals or data published on the internet
Organizing and Visualizing Variables
Organization of Categorical Data
• One categorical variable: Summary Table
• Two or more categorical variables: Contingency Table
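A summary table and a contingency table are both just counts. The sketch below builds both with `collections.Counter` from a hypothetical list of survey responses (preferred purchase channel and gender, invented for illustration).

```python
from collections import Counter

# Hypothetical survey responses (illustrative only): (preferred channel, gender)
responses = [
    ("Online", "F"), ("Store", "M"), ("Online", "M"), ("Online", "F"),
    ("Phone", "F"), ("Store", "F"), ("Online", "M"), ("Store", "M"),
]
n = len(responses)

# Summary table for ONE categorical variable: count and percentage per category
channel_counts = Counter(channel for channel, _ in responses)
print("Channel   Count  Percent")
for channel, count in channel_counts.most_common():
    print(f"{channel:<9} {count:>5}  {100 * count / n:6.1f}%")

# Contingency table for TWO categorical variables: joint counts with row totals
joint = Counter(responses)
genders = sorted({g for _, g in responses})
print("\nChannel   " + "  ".join(f"{g:>3}" for g in genders) + "  Total")
for channel in sorted(channel_counts):
    row = [joint[(channel, g)] for g in genders]
    print(f"{channel:<9} " + "  ".join(f"{c:>3}" for c in row) + f"  {sum(row):>5}")
```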
Organization of Numerical Data
• Numerical data: Ordered Array, Frequency Distribution, Cumulative Distribution
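The three ways of organizing numerical data can be sketched as follows, assuming 20 hypothetical monthly electricity bills and four class intervals of width 40 starting at 80 (both the data and the class boundaries are illustrative choices).

```python
# Hypothetical monthly electricity bills (illustrative only)
bills = [96, 171, 202, 178, 147, 102, 153, 197, 127, 82,
         157, 185, 90, 116, 172, 111, 148, 213, 130, 165]
n = len(bills)

# Ordered array: the values sorted from smallest to largest
ordered = sorted(bills)
print(ordered)

# Frequency distribution over classes [80,120), [120,160), [160,200), [200,240)
# (assumes every value falls inside these classes)
lower, width, n_classes = 80, 40, 4
freq = [0] * n_classes
for x in bills:
    freq[(x - lower) // width] += 1

# Percentage and cumulative percentage distributions
cumulative = 0
print("Class         Freq  Percent  Cumulative %")
for i, f in enumerate(freq):
    cumulative += f
    lo, hi = lower + i * width, lower + (i + 1) * width
    print(f"{lo:>3} to <{hi:<3}  {f:>4}  {100 * f / n:6.1f}%  {100 * cumulative / n:11.1f}%")
```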
Visualization of Categorical Data
• Summary Table for One Variable: Bar Chart, Pie Chart, Pareto Chart
• Contingency Table for Two Variables: Side-by-Side Bar Chart
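A Pareto chart is a bar chart with categories sorted in descending order of frequency plus a cumulative percentage line. The sketch below computes the table behind such a chart for hypothetical defect counts and draws crude text bars in place of a real chart.

```python
# Hypothetical defect counts by type (illustrative only)
defects = {"Dent": 25, "Other": 5, "Scratch": 45, "Discoloration": 10, "Misalignment": 15}
total = sum(defects.values())

# Sort categories from most to least frequent, then accumulate percentages
ranked = sorted(defects.items(), key=lambda kv: kv[1], reverse=True)
cum = 0.0
pareto_rows = []
for category, count in ranked:
    pct = 100 * count / total
    cum += pct
    pareto_rows.append((category, count, pct, cum))
    bar = "#" * round(pct / 2)  # text bar: one '#' per 2 percentage points
    print(f"{category:<14} {count:>4} {pct:6.1f}% {cum:7.1f}%  {bar}")
```

The cumulative column shows how few categories account for most of the defects, which is the point of a Pareto chart.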
Visualization of Numerical Data
Numerical Data: 1 Variable
• Ordered Array: Stem-and-Leaf Display
• Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive
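A stem-and-leaf display splits each value into a stem (here the tens digit) and a leaf (the units digit). A minimal sketch for non-negative integers, assuming that tens/units split; the function name `stem_and_leaf` and the scores are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem-and-leaf lines for non-negative integers (stem = tens, leaf = units)."""
    stems = defaultdict(list)
    for v in sorted(values):            # start from the ordered array
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # keep empty stems visible
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

scores = [62, 67, 71, 74, 74, 78, 80, 83, 85, 85, 91, 96]  # illustrative data
for line in stem_and_leaf(scores):
    print(line)
```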
Visualization of Numerical Data (Contd..)
Numerical Data: 2 Variables
• Scatter Plot
• Time Series Plot
Organizing Many Variables
• Use a pivot table (and pivot chart).
• It summarizes variables as a multidimensional summary table.
• It allows you to interactively change the level of summarization and the formatting of the variables.
• It allows you to interactively “slice” the data to summarize subsets that meet specified criteria.
• It can be used to discover possible patterns and relationships in multidimensional data that simpler tables and charts would fail to make apparent.
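The core of a pivot table is "sum a value for every (row, column) pair, optionally after filtering." The sketch below shows that idea in plain Python; it is not a spreadsheet API, and the `pivot` helper and sales records are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records (illustrative only): (region, product, units)
sales = [("North", "Laptop", 10), ("North", "Phone", 25), ("South", "Laptop", 7),
         ("South", "Phone", 30), ("North", "Laptop", 5), ("South", "Tablet", 12)]

def pivot(records, row_key, col_key, value_key, keep=lambda r: True):
    """Sum value_key for each (row, column) pair; `keep` slices the data first."""
    table = defaultdict(lambda: defaultdict(int))
    for r in records:
        if keep(r):
            table[row_key(r)][col_key(r)] += value_key(r)
    return table

full = pivot(sales, lambda r: r[0], lambda r: r[1], lambda r: r[2])
# "Slice": summarize only the subset that meets a criterion (exclude tablets)
sliced = pivot(sales, lambda r: r[0], lambda r: r[1], lambda r: r[2],
               keep=lambda r: r[1] != "Tablet")

products = sorted({p for _, p, _ in sales})
print("Region  " + " ".join(f"{p:>7}" for p in products))
for region in sorted(full):
    print(f"{region:<7} " + " ".join(f"{full[region].get(p, 0):>7}" for p in products))
```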
Best Practices for Constructing Visualizations
• Use the simplest possible visualization.
• Include a title and label all axes.
• Include a scale for each axis if the chart contains axes.
• Begin the scale for a vertical axis at zero and use a constant scale.
• Avoid 3D or “exploded” effects.
• Use consistent coloring in charts meant to be compared.
• Avoid uncommon chart types such as radar, surface, bubble, cone, and pyramid charts.
DATA DESCRIPTION
Introduction
• The central tendency is the extent to which the values of a numerical variable group around a typical or central value.
• The variation is the amount of dispersion or scattering away from a central value that the values of a numerical variable show.
• The shape is the pattern of the distribution of values from the lowest value to the highest value.
Measures of Central Tendency
• Mean
• Average of all the values
• Affected by extreme values (Also called Outliers)
• Median
• In an ordered array, the median is the “middle” number (50% above, 50% below).
• The median position is (n + 1)/2, where n is the number of values in the data set. The value at that position is the median.
• For a data set with an even number of values, the median is the average of the two middle values.
• Less sensitive than the mean to extreme values.
• Mode
• Value that occurs most often.
• Not affected by extreme values.
• Used for either numerical or categorical data.
• There may be no mode.
• There may be several modes.
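The effect of an extreme value on each measure can be checked with Python's standard `statistics` module. The home prices below are hypothetical; `multimode` (Python 3.8+) returns every most-frequent value, which covers the "several modes" case.

```python
import statistics

# Hypothetical home prices in $ thousands (illustrative values only)
prices = [300, 320, 320, 350, 360]
with_outlier = prices + [2000]  # add one extreme value

for label, data in (("without outlier", prices), ("with outlier", with_outlier)):
    n = len(data)
    print(label,
          "| mean:", round(statistics.mean(data), 2),
          "| median position:", (n + 1) / 2,
          "| median:", statistics.median(data),
          "| mode(s):", statistics.multimode(data))

# When every value occurs once, multimode returns all of them (no single mode)
print(statistics.multimode([4, 7, 9]))        # [4, 7, 9]
print(statistics.multimode([1, 1, 2, 2, 3]))  # [1, 2]: two modes
```

The mean jumps from 330 to about 608 when the outlier is added, while the median only moves from 320 to 335 and the mode stays at 320.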
Measures of Central Tendency (Contd..)
• Geometric mean: used to measure the rate of change of a variable over time.
• The mean is generally used, unless extreme values (outliers) exist.
• The median is often used because it is not sensitive to extreme values. For example, median home prices may be reported for a region because the median is less affected by a few very expensive homes.
• In many situations it makes sense to report both the mean and the median.
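For measuring a rate of change over time, business statistics texts commonly use the geometric mean rate of return, $R_G = [(1 + R_1)(1 + R_2)\cdots(1 + R_n)]^{1/n} - 1$. A minimal sketch with illustrative returns (the helper name is hypothetical) shows why the arithmetic mean can mislead here: a +100% year followed by a -50% year leaves an investment where it started.

```python
import math

def geometric_mean_rate(rates):
    """R_G = [(1 + R1)(1 + R2)...(1 + Rn)]^(1/n) - 1, with rates given as decimals."""
    growth = math.prod(1 + r for r in rates)
    return growth ** (1 / len(rates)) - 1

returns = [1.00, -0.50]  # +100% in year 1, -50% in year 2 (illustrative)
print(sum(returns) / len(returns))   # arithmetic mean: 0.25
print(geometric_mean_rate(returns))  # geometric mean rate: 0.0
```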
Measures of Central Tendency: Summary
Central Tendency
• Arithmetic Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
• Median: Middle value in the ordered array
• Mode: Most frequently observed value
Measures of Variation
• Measures of variation give information on the spread, variability, or dispersion of the data values.
• For a better understanding of the data, it is important to look at the dispersion as well, not only at the central value.
Variation: Range, Variance, Standard Deviation, Coefficient of Variation
Measures of Variation (Contd..)
• Range = $X_{\text{largest}} - X_{\text{smallest}}$
• Does not account for how the data are distributed.
• Sensitive to outliers.
• Sample variance: approximately the average of the squared deviations of the values from the mean (the sum is divided by n - 1 rather than n).
• Sample standard deviation: the square root of the variance.
• Has the same units as the original data.
• Most commonly used measure of variation.
• Shows variation about the mean.
• For a population, the denominator is N instead of n - 1. Using n - 1 in the sample formula makes the sample variance an unbiased estimator of the population variance. (Unbiased estimators are discussed in advanced courses.)
Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
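The range, sample variance, sample standard deviation, and population variance can be computed directly from these formulas. The sketch below uses a small illustrative data set and cross-checks the hand-written formulas against Python's `statistics.variance` (denominator n - 1) and `statistics.pvariance` (denominator N).

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values
n = len(data)
x_bar = sum(data) / n
ss = sum((x - x_bar) ** 2 for x in data)  # sum of squared deviations from the mean

data_range = max(data) - min(data)  # 9 - 2 = 7
s2 = ss / (n - 1)                   # sample variance: 32 / 7
s = math.sqrt(s2)                   # sample standard deviation
sigma2 = ss / n                     # population variance: 32 / 8 = 4.0

# The standard library uses the same denominators
assert math.isclose(s2, statistics.variance(data))
assert math.isclose(sigma2, statistics.pvariance(data))
print(data_range, round(s2, 3), round(s, 3), sigma2)  # 7 4.571 2.138 4.0
```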
Measures of Variation (Contd..)
• Coefficient of Variation
• Measures relative variation.
• Expressed as a percentage (%).
• Shows variation relative to mean.
• Can be used to compare the variability of two or more sets of data measured
in different units.
$CV = \left(\frac{S}{\bar{X}}\right) \times 100\%$
Measures of Variation: Comparing Coefficients of Variation
• Stock A:
• Mean price last year = $50.
• Standard deviation = $5.
$CV_A = \frac{S_A}{\bar{X}_A} \times 100 = \frac{5}{50} \times 100 = 10\%$
• Stock B:
• Mean price last year = $60.
• Standard deviation = $10.
$CV_B = \frac{S_B}{\bar{X}_B} \times 100 = \frac{10}{60} \times 100 = 16.67\%$
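The stock comparison above reduces to one division per stock. A minimal sketch reproducing those figures (the function name is illustrative):

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (S / X-bar) * 100: variation relative to the mean, as a percentage."""
    return std_dev / mean * 100

cv_a = coefficient_of_variation(5, 50)   # Stock A
cv_b = coefficient_of_variation(10, 60)  # Stock B
print(f"CV_A = {cv_a:.2f}%, CV_B = {cv_b:.2f}%")  # CV_A = 10.00%, CV_B = 16.67%
```

Because CV_B is larger, stock B's price was more variable relative to its average price, a comparison the raw standard deviations alone would not make on a common footing.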
Shape of a Distribution
• Describes how data are distributed.
• Two useful shape-related statistics are:
• Skewness:
• Measures the extent to which data values are not symmetrical.
• Kurtosis:
• Kurtosis measures the peakedness of the curve of the distribution—that
is, how sharply the curve rises approaching the center of the distribution.
Shape of a Distribution (Skewness)
• Measures the extent to which data values are not symmetrical. One commonly used formula is Pearson's coefficient of skewness: 3(Mean - Median)/SD.
• Left-skewed: Mean < Median < Mode; skewness statistic < 0
• Symmetric: Mean = Median = Mode; skewness statistic = 0
• Right-skewed: Mode < Median < Mean; skewness statistic > 0
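Pearson's coefficient, 3(Mean - Median)/SD, is easy to compute with the standard library. The sketch uses the sample standard deviation (`statistics.stdev`) and three small illustrative data sets; statistical software often reports a different, moment-based skewness, so values will not match such tools exactly, though the sign usually agrees.

```python
import statistics

def pearson_skewness(data):
    """Pearson's coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (statistics.mean(data) - statistics.median(data)) / statistics.stdev(data)

right_skewed = [1, 2, 2, 3, 3, 3, 4, 10]  # long right tail: mean 3.5 > median 3
left_skewed = [1, 7, 8, 8, 8, 9, 9, 10]   # long left tail: mean 7.5 < median 8
symmetric = [1, 2, 3, 4, 5]               # mean = median = 3

for name, data in (("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)):
    print(name, round(pearson_skewness(data), 3))
```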
Shape of a Distribution -- Kurtosis
• It measures how sharply the curve rises approaching the center of the distribution.
• Sharper peak than bell-shaped: kurtosis > 3
• Bell-shaped: kurtosis = 3
• Flatter than bell-shaped: kurtosis < 3
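The "3 for bell-shaped" reference corresponds to the moment-based kurtosis $m_4 / m_2^2$, where $m_k$ is the average of $(X_i - \bar{X})^k$. A minimal sketch of that population form follows (the helper name is illustrative). Note that some software, for example Excel's `KURT`, reports a sample-adjusted excess kurtosis, on which a bell-shaped distribution scores about 0 instead of 3.

```python
def kurtosis(data):
    """Moment-based kurtosis m4 / m2**2 (population form); about 3 for bell-shaped data."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2

flat = [1, 2, 3, 4, 5]                  # evenly spread values: flatter than bell-shaped
peaked = [0, 5, 5, 5, 5, 5, 5, 5, 10]   # values piled at the centre, a few far out
print(round(kurtosis(flat), 3))    # 1.7 (< 3)
print(round(kurtosis(peaked), 3))  # 4.5 (> 3)
```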
Exploring Numerical Data Using Quartiles
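• The five-number summary: $X_{\text{smallest}}$, $Q_1$, Median, $Q_3$, $X_{\text{largest}}$.
• Constructing a boxplot.
• The position of the Pth percentile is $(P/100)(n + 1)$, where n is the number of values in the data set. This agrees with the median position (n + 1)/2.
• If the result is a whole number, it is the ranked position to use.
• If the result is a fractional half, average the two corresponding data values.
• Otherwise, round to the nearest integer and use that ranked position.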
• The IQR is Q3 – Q1 and measures the spread in the middle 50% of the
data.
• The IQR is also called the midspread because it covers the middle 50%
of the data.
• The IQR is a measure of variability that is not influenced by outliers or
extreme values.
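The positional rules for quartiles translate directly into code. This sketch assumes the (P/100)(n + 1) position, treats positions that are neither whole nor a half by rounding to the nearest integer, and clamps positions to the range 1..n; these are stated assumptions, and spreadsheet functions such as quartile or percentile functions may use different interpolation rules and return slightly different values.

```python
def ranked_value(ordered, position):
    """Value at a 1-based ranked position in a sorted list, using the positional rules."""
    if position == int(position):                 # whole number: use that position
        return ordered[int(position) - 1]
    if position * 2 == int(position * 2):         # fractional half, e.g. 2.5
        k = int(position)
        return (ordered[k - 1] + ordered[k]) / 2  # average the two neighbouring values
    return ordered[round(position) - 1]           # assumed: otherwise round

def percentile(data, p):
    """Pth percentile (0 < p < 100) located at position (P/100)(n + 1)."""
    if not data:
        raise ValueError("data must not be empty")
    ordered = sorted(data)
    n = len(ordered)
    position = p * (n + 1) / 100
    position = min(max(position, 1), n)           # assumed: clamp to the data range
    return ranked_value(ordered, position)

data = [22, 16, 11, 21, 13, 18, 16, 12, 17]  # illustrative, n = 9
five_number = (min(data), percentile(data, 25), percentile(data, 50),
               percentile(data, 75), max(data))
iqr = five_number[3] - five_number[1]        # Q3 - Q1: spread of the middle 50%
print(five_number, iqr)  # (11, 12.5, 16, 19.5, 22) 7.0
```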
The Empirical Rule for the Normal Distribution vs. Chebyshev’s Rule for Any Distribution
• Chebyshev’s Rule: Regardless of how the data are distributed, at least $(1 - 1/k^2) \times 100\%$ of the values will fall within k standard deviations of the mean (for k > 1).
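[Figure: normal curve centred at $\mu$; about 68% of values lie within $\mu \pm \sigma$, a further 13.5% on each side lie between $\sigma$ and $2\sigma$ from the mean, and a further 2.35% on each side lie between $2\sigma$ and $3\sigma$]
Range | Empirical Rule (Normal Curve) | Chebyshev’s Rule (Any Distribution)
$\mu \pm \sigma$ | about 68% | Not applicable (requires k > 1)
$\mu \pm 2\sigma$ | about 95% | At least 75%
$\mu \pm 3\sigma$ | about 99.7% | At least 88.89%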
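Both columns of the table can be reproduced numerically. For a normal distribution, the exact share within k standard deviations is $\text{erf}(k/\sqrt{2})$ (available as `math.erf`); Chebyshev's rule gives only a guaranteed lower bound, $1 - 1/k^2$. The helper names below are illustrative.

```python
import math

def chebyshev_min_share(k):
    """Chebyshev: at least 1 - 1/k**2 of values lie within k SDs (any distribution, k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is informative only for k > 1")
    return 1 - 1 / k ** 2

def normal_share(k):
    """Exact share within k SDs of the mean for a normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    cheb = f"{chebyshev_min_share(k):.2%}" if k > 1 else "n/a"
    print(f"k = {k}: normal {normal_share(k):.2%}, Chebyshev at least {cheb}")
```

The normal shares (about 68.27%, 95.45%, 99.73%) always exceed the Chebyshev bounds (75.00%, 88.89%), because Chebyshev must hold even for the least favourable distribution.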
Measures of the Relationship Between Two Numerical Variables
• Scatter plots allow you to visually examine the relationship.
• The covariance, Cov(x, y):
• Measures the direction of the linear relationship between two numerical variables.
• Is concerned only with the nature (direction) of the relationship; no causal effect is implied.
• Cov(x, y) > 0: the variables tend to move in the same direction; Cov(x, y) < 0: they tend to move in opposite directions; Cov(x, y) = 0: there is no linear relationship (this does not necessarily mean the variables are independent).
• Its value depends on the units of measurement, so it does not show the relative strength of the relationship.
• The coefficient of correlation (r):
• Measures the relative strength of the linear relationship between two numerical variables.
• Varies from -1 (perfect negative linear relationship) to +1 (perfect positive linear relationship).
• The coefficient of determination ($r^2$):
• Shows the proportion (percentage) of the variation in y that is explained by the x variable(s) in a regression model.
• Varies between 0 and 1; higher values mean the model explains more of the variation in y, although this does not by itself establish causation.
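Covariance, correlation, and the coefficient of determination can be computed from their definitions. A minimal sketch with illustrative advertising and sales figures; the helper names are hypothetical, and $r^2$ equals the coefficient of determination only for a simple (one-x) linear regression.

```python
import math

def covariance(x, y):
    """Sample covariance: sum((x - x_bar) * (y - y_bar)) / (n - 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Coefficient of correlation r = Cov(x, y) / (s_x * s_y), always between -1 and +1."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

ad_spend = [1, 2, 3, 4, 5]  # illustrative x values
sales = [2, 4, 5, 4, 5]     # illustrative y values

cov = covariance(ad_spend, sales)
r = correlation(ad_spend, sales)
# For simple linear regression, r**2 is the share of variation in sales explained by ad_spend
print(cov, round(r, 4), round(r ** 2, 4))  # 1.5 0.7746 0.6
```

Rescaling `sales` (for example, reporting it in thousands) changes `cov` but leaves `r` unchanged, which is why correlation, not covariance, is used to judge relative strength.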
A.K.Singh, IIM Nagpur
A.K.Singh, IIM Nagpur
A.K.Singh, IIM Nagpur

More Related Content

Similar to BS 1 and 2 30th Oct.pptx

data analysis in Statistics-2023 guide 2023
data analysis in Statistics-2023 guide 2023data analysis in Statistics-2023 guide 2023
data analysis in Statistics-2023 guide 2023ayesha455941
 
Basic stat analysis using excel
Basic stat analysis using excelBasic stat analysis using excel
Basic stat analysis using excelParag Shah
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptxIndhuGreen
 
Basic Statistics in 1 hour.pptx
Basic Statistics in 1 hour.pptxBasic Statistics in 1 hour.pptx
Basic Statistics in 1 hour.pptxParag Shah
 
Quantitative Research Design.pptx
Quantitative Research Design.pptxQuantitative Research Design.pptx
Quantitative Research Design.pptxAlok Kumar Gaurav
 
Statistical analysis training course
Statistical analysis training courseStatistical analysis training course
Statistical analysis training courseMarwa Abo-Amra
 
presentaion-ni-owel.pptx
presentaion-ni-owel.pptxpresentaion-ni-owel.pptx
presentaion-ni-owel.pptxJareezRobios
 
Sampling fundamentals
Sampling fundamentalsSampling fundamentals
Sampling fundamentalsSreeraj S R
 
Introduction to Data Analysis for Nurse Researchers
Introduction to Data Analysis for Nurse ResearchersIntroduction to Data Analysis for Nurse Researchers
Introduction to Data Analysis for Nurse ResearchersRupa Verma
 
Introduction to data analysis using excel
Introduction to data analysis using excelIntroduction to data analysis using excel
Introduction to data analysis using excelAhmed Essam
 
Data Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityData Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityIkbal Ahmed
 

Similar to BS 1 and 2 30th Oct.pptx (20)

data analysis in Statistics-2023 guide 2023
data analysis in Statistics-2023 guide 2023data analysis in Statistics-2023 guide 2023
data analysis in Statistics-2023 guide 2023
 
Basic stat analysis using excel
Basic stat analysis using excelBasic stat analysis using excel
Basic stat analysis using excel
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
 
Quantitative research
Quantitative researchQuantitative research
Quantitative research
 
Basic Statistics in 1 hour.pptx
Basic Statistics in 1 hour.pptxBasic Statistics in 1 hour.pptx
Basic Statistics in 1 hour.pptx
 
Quantitative Research Design.pptx
Quantitative Research Design.pptxQuantitative Research Design.pptx
Quantitative Research Design.pptx
 
ABOUT STATISTICS
ABOUT STATISTICSABOUT STATISTICS
ABOUT STATISTICS
 
Statistics.pptx
Statistics.pptxStatistics.pptx
Statistics.pptx
 
Statistical analysis training course
Statistical analysis training courseStatistical analysis training course
Statistical analysis training course
 
Statistics with R
Statistics with R Statistics with R
Statistics with R
 
presentaion-ni-owel.pptx
presentaion-ni-owel.pptxpresentaion-ni-owel.pptx
presentaion-ni-owel.pptx
 
Sampling fundamentals
Sampling fundamentalsSampling fundamentals
Sampling fundamentals
 
Introduction to Data Analysis for Nurse Researchers
Introduction to Data Analysis for Nurse ResearchersIntroduction to Data Analysis for Nurse Researchers
Introduction to Data Analysis for Nurse Researchers
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Introduction.pdf
Introduction.pdfIntroduction.pdf
Introduction.pdf
 
presentaion ni owel iwiw.pptx
presentaion ni owel iwiw.pptxpresentaion ni owel iwiw.pptx
presentaion ni owel iwiw.pptx
 
Introduction to data analysis using excel
Introduction to data analysis using excelIntroduction to data analysis using excel
Introduction to data analysis using excel
 
Biostatistics
BiostatisticsBiostatistics
Biostatistics
 
STATISTICS-E.pdf
STATISTICS-E.pdfSTATISTICS-E.pdf
STATISTICS-E.pdf
 
Data Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & NormalityData Analysis in Research: Descriptive Statistics & Normality
Data Analysis in Research: Descriptive Statistics & Normality
 

Recently uploaded

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAroojKhan71
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 

Recently uploaded (20)

Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 

BS 1 and 2 30th Oct.pptx

  • 1. BUSINESS STATISTICS DATA ORGANIZATION, VISUALIZATION & Description By Prof Alok Kumar Singh
  • 2. Introduction • Statistics is a way of thinking that can lead to better decisions. • It is science of gathering, presenting, analyzing, and interpreting data • It uses mathematics and probability • Statistics requires analytics skills and is an important part of business education. • The DCOVA framework guides your application of statistics. • Modern-day information technology enables businesses to apply statistics in new ways to solve business problems utilizing lots of data and analytical tools. A.K.Singh, IIM Nagpur
  • 3. Introduction (Contd..) • Business statistics provides a formal basis to: • Summarize and visualize business data. • Reach conclusions from business data. • Make reliable predictions about business activities. • Improve business processes. A.K.Singh, IIM Nagpur
  • 4. Statistics in Business (But not limited to…) • Operations : Supply chain Performance and Benchmarking • Accounting: Auditing and cost estimation, Financing • Economics : Local, regional, national, and international economic performance • Finance : Investments and portfolio management • Human Resource Management: Compensation and Performance Measurement • Management Information Systems :Performance of systems • Marketing :Market analysis and Consumer research • International Business : International Market and demographic analysis A.K.Singh, IIM Nagpur
  • 5. Some Basic Definitions Variable • A characteristic of an item or individual. Data • The set of individual values associated with one or more variables. Statistic • A value that summarizes the data of a particular variable. Descriptive Statistics • The methods that primarily help summarize and present data. A.K.Singh, IIM Nagpur
  • 6. Some Basic Definitions (Contd…) Inferential Statistics • Methods that use data collected from a small group to reach conclusions about larger group. Population • The whole collection of all persons, objects, or items under study Census • Gathering data from the entire population Sample • gathering data on a subset of the population • Use information about the sample to infer about the population A.K.Singh, IIM Nagpur
  • 7. Population vs Sample A.K.Singh, IIM Nagpur Population Sample
  • 8. Parameter vs. Statistic • Parameter — descriptive measure of the population • Usually represented by Greek letters • Statistic — descriptive measure of a sample • Usually represented by Roman letters parameter population denotes  variance population denotes 2   denotes populationstandard deviation mean sample denotes x variance sample denotes s2 deviation standard sample denotes s A.K.Singh, IIM Nagpur
  • 9. Process of Inferential Statistics ) (parameter Population 1.  ) (statistic x Sample 3.  estimate to x Use 4. sample random a Select 2. A.K.Singh, IIM Nagpur
  • 10. Uncertainty in Business • Inferences about parameters made under conditions of uncertainty (which are always present in statistics) • Uncertainty can be caused by • Randomness in selection of a sample • lack of knowledge about the source of the inferences • change in conditions A.K.Singh, IIM Nagpur
  • 11. Statistics in Business • Probability is used in statistics (will be discussed in details later in the course) • To estimate the level of confidence in a confidence interval • To calculate the p-value in hypothesis testing A.K.Singh, IIM Nagpur
  • 12. Classifying Variables By Type  Categorical (qualitative) variables take categories as their values (data) such as “yes,no” or “blue, brown, green” or “Easy, Normal, Tough” etc..  Numerical (quantitative) variables have values (data) that represent a counted or measured quantity.  Discrete variables arise from a counting process.  Continuous variables arise from a measuring process. A.K.Singh, IIM Nagpur
  • 13. Examples of Types of Variables A.K.Singh, IIM Nagpur Question Responses Variable Type Do you have a Facebook profile? How many whatsapp messages have you sent in the past 1 hour? How long did the mobile app update take to download? What is the colour of your eyes ? What is your weight ? In which class do you study ? In which section you are ? How do you rate New Netflix Series?
  • 14. Levels of Data Measurement: Nominal • In nominal measurement the values just "name" the attribute uniquely; numbers are used only to classify or categorize. • No ordering of the cases is implied. • Gender • boys vs. girls, or • males vs. females • Religion • Hindu • Muslim • Sikh • Christian • Jain, etc. • Employment classification • 1 for Educator • 2 for Construction Worker • 3 for Manufacturing Worker
  • 15. Levels of Data Measurement: Ordinal • A variable is measured at the ordinal level if its values can be ranked. However, the differences between the numbers are not comparable. • For example: • A gold medal reflects superior performance to a silver or bronze medal in the Olympics. • You can’t say a gold and a bronze medal average out to a silver medal, though. • Position within an organization • 1 for President • 2 for Vice President • 3 for Plant Manager • 4 for Department Supervisor • 5 for Employee • Preference scales are typically ordinal: How much do you like this cereal? • Like it a lot, somewhat like it, neutral, somewhat dislike it, dislike it a lot.
  • 16. Levels of Data Measurement: Interval • In interval measurement the distance between attributes does have meaning. • Numerical data typically fall into this category. • There is no absolute (true) zero, so ratios of values are not meaningful. • For example: • Temperature measured in °C or °F • Scales for measurement
  • 17. Levels of Data Measurement: Ratio • In ratio measurement there is always a meaningful absolute zero. • This means that you can construct a meaningful fraction (or ratio) with a ratio variable. • In applied social research most "count" variables are ratio, for example, the number of clients in the past six months. • Height, weight, and volume • Profit and loss
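A small sketch of which operations make sense at each level of measurement. The job codes, position ranks, temperatures, and weights are hypothetical values, and the kilogram-to-pound factor is an approximation used only for illustration.

```python
import statistics

# Nominal: codes only name categories (1 = Educator, 2 = Construction Worker,
# 3 = Manufacturing Worker), so the mode is meaningful but an average is not.
job_codes = [1, 3, 3, 2, 3, 1]
print("most common job code:", statistics.mode(job_codes))  # 3

# Ordinal: ranks can be ordered (1 = President ... 5 = Employee), so the
# median rank is meaningful, but the gaps between ranks are not equal units.
position_ranks = [5, 5, 4, 3, 5, 2, 1]
print("median position rank:", statistics.median(position_ranks))  # 4

# Interval: Celsius has no true zero. Differences are meaningful, ratios are
# not: 20 C is not "twice as hot" as 10 C (compare on the absolute kelvin scale).
def celsius_to_kelvin(c):
    return c + 273.15

t_low, t_high = 10.0, 20.0
print("difference in C:", t_high - t_low)
print("ratio in kelvin:", round(celsius_to_kelvin(t_high) / celsius_to_kelvin(t_low), 3))

# Ratio: weight has a true zero, so "80 kg is twice 40 kg" holds in any unit.
kg_heavy, kg_light = 80.0, 40.0
KG_TO_LB = 2.20462  # approximate conversion factor
print("ratio in kg:", kg_heavy / kg_light)
print("ratio in lb:", (kg_heavy * KG_TO_LB) / (kg_light * KG_TO_LB))
```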
  • 18. Types of Variables (Summary) • Categorical (defined categories) • Nominal, e.g., marital status, political party, eye color • Ordinal (ordered categories), e.g., ratings such as Good, Better, Best or Low, Medium, High • Numerical • Discrete (counted items), e.g., number of children, defects per hour • Continuous (measured characteristics), e.g., weight, voltage
  • 19. Sources of Data • Primary sources: the data collector is the one using the data for analysis • Data from a political survey • Data collected from an experiment • Observed data • Secondary sources: the person performing the data analysis is not the data collector • Analyzing census data • Examining data from print journals or data published on the internet
  • 21. Organization of Categorical Data • One categorical variable: summary table • Two or more categorical variables: contingency table
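A minimal sketch of both tables in plain Python, assuming a hypothetical list of survey responses recorded as (payment method, customer segment) pairs; `collections.Counter` does the counting.

```python
from collections import Counter

# Hypothetical survey responses: (preferred payment method, customer segment)
responses = [
    ("Card", "Retail"), ("Cash", "Retail"), ("Card", "Business"),
    ("UPI", "Retail"), ("Card", "Retail"), ("UPI", "Business"),
    ("Cash", "Retail"), ("Card", "Business"),
]
n = len(responses)

# Summary table for ONE categorical variable: counts and percentages
payment_counts = Counter(method for method, _ in responses)
print("Method  Count  Percent")
for method, count in payment_counts.most_common():
    print(f"{method:<7}{count:>5}  {100 * count / n:6.1f}%")

# Contingency table for TWO categorical variables: one cell per combination
methods = sorted(payment_counts)
segments = sorted({segment for _, segment in responses})
cell_counts = Counter(responses)  # missing combinations count as 0
print("\n" + " " * 8 + "".join(f"{s:>10}" for s in segments))
for m in methods:
    print(f"{m:<8}" + "".join(f"{cell_counts[(m, s)]:>10}" for s in segments))
```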
  • 22. Organization of Numerical Data • Ordered array • Frequency distribution • Cumulative distribution
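A sketch of the three organizing steps, assuming hypothetical monthly electricity bills and class boundaries (width 250, starting at 900) chosen only for this example; each class includes its lower limit and excludes its upper limit.

```python
# Hypothetical data: monthly electricity bills (in rupees) for 12 households
bills = [1240, 980, 1510, 1120, 1890, 1330, 1010, 1460, 1720, 1180, 1390, 1260]
n = len(bills)

# Ordered array: the values sorted from smallest to largest
ordered = sorted(bills)

# Frequency distribution: 4 classes of width 250 starting at 900 (assumed)
lower, width, k = 900, 250, 4
classes = [(lower + i * width, lower + (i + 1) * width) for i in range(k)]
freq = [sum(lo <= x < hi for x in bills) for lo, hi in classes]

# Cumulative distribution: running total expressed as a percentage of n
cum_pct = []
running = 0
for f in freq:
    running += f
    cum_pct.append(100 * running / n)

print("Ordered array:", ordered)
for (lo, hi), f, c in zip(classes, freq, cum_pct):
    print(f"{lo} to under {hi}: frequency {f}, cumulative {c:.1f}%")
```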
  • 23. Visualization of Categorical Data • Summary table (one variable): bar chart, pie chart, Pareto chart • Contingency table (two variables): side-by-side bar chart
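The table behind a Pareto chart can be computed before anything is drawn. This sketch assumes hypothetical complaint counts; the bars would show the counts in descending order and the line would show the cumulative percentage.

```python
# Hypothetical summary table: customer complaints by cause
complaints = {"Late delivery": 42, "Damaged item": 18, "Wrong item": 25,
              "Billing error": 9, "Other": 6}

# Sort categories by frequency (descending) and accumulate percentages
total = sum(complaints.values())
pareto_rows = []
cumulative = 0
for cause, count in sorted(complaints.items(), key=lambda kv: kv[1], reverse=True):
    cumulative += count
    pareto_rows.append((cause, count, 100 * count / total, 100 * cumulative / total))

print(f"{'Cause':<14}{'Count':>6}{'Pct':>8}{'Cum':>8}")
for cause, count, pct, cum in pareto_rows:
    print(f"{cause:<14}{count:>6}{pct:>7.1f}%{cum:>7.1f}%")
```

Reading down the cumulative column shows how few causes account for most complaints, which is the point of the Pareto ordering.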
  • 24. Visualization of Numerical Data (One Variable) • Ordered array: stem-and-leaf display • Frequency distributions and cumulative distributions: histogram, polygon, ogive
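A stem-and-leaf display can be built directly from the ordered array. This sketch assumes non-negative integers, with the tens digit(s) as the stem and the units digit as the leaf; `stem_and_leaf` is a hypothetical helper and the ages are invented.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display for non-negative integers
    (stem = tens digit(s), leaf = units digit)."""
    stems = defaultdict(list)
    for v in sorted(values):            # start from the ordered array
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):  # include empty stems
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}")
    return lines

# Hypothetical data: ages of 12 customers
ages = [23, 27, 31, 35, 35, 38, 41, 44, 52, 29, 33, 47]
for line in stem_and_leaf(ages):
    print(line)
```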
  • 25. Visualization of Numerical Data (Contd..) • Two variables: scatter plot, time-series plot
  • 26. Organizing Many Variables • Use a PivotTable or PivotChart • It summarizes variables as a multidimensional summary table. • It allows you to interactively change the level of summarization and the formatting of the variables. • It allows you to interactively “slice” the data to summarize subsets that meet specified criteria. • It can be used to discover possible patterns and relationships in multidimensional data that simpler tables and charts would fail to make apparent.
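PivotTables are a spreadsheet feature, so the following is not an Excel API but a plain-Python illustration of the same idea: summing a value across two dimensions, then slicing to a subset. The sales records and the `pivot` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: (region, product, quarter, revenue)
sales = [
    ("North", "Laptop", "Q1", 12.0), ("North", "Phone", "Q1", 8.5),
    ("South", "Laptop", "Q1", 9.0),  ("South", "Phone", "Q2", 11.0),
    ("North", "Laptop", "Q2", 14.0), ("South", "Laptop", "Q2", 7.5),
]

def pivot(records, row_key, col_key, keep=lambda r: True):
    """Sum revenue into (row, column) cells, after an optional slicing filter."""
    table = defaultdict(float)
    for r in records:
        if keep(r):
            table[(row_key(r), col_key(r))] += r[3]
    return dict(table)

# Rows = region, columns = quarter, all products combined
by_region_quarter = pivot(sales, lambda r: r[0], lambda r: r[2])
# "Slice": the same summary restricted to laptops
laptops_only = pivot(sales, lambda r: r[0], lambda r: r[2],
                     keep=lambda r: r[1] == "Laptop")
print(by_region_quarter)
print(laptops_only)
```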
  • 27. Best Practices for Constructing Visualizations • Use the simplest possible visualization. • Include a title and label all axes. • Include a scale for each axis if the chart contains axes. • Begin the scale for a vertical axis at zero and use a constant scale. • Avoid 3D or “exploded” effects. • Use consistent colorings in charts meant to be compared. • Avoid using uncommon chart types, including radar, surface, bubble, cone, and pyramid charts.
  • 29. Introduction • Central tendency is the extent to which the values of a numerical variable group around a typical or central value. • Variation is the amount of dispersion, or scattering away from a central value, that the values of a numerical variable show. • Shape is the pattern of the distribution of values from the lowest value to the highest value.
  • 30. Measures of Central Tendency • Mean • Average of all the values • Affected by extreme values (also called outliers) • Median • In an ordered array, the median is the “middle” number (50% above, 50% below). • The median position is given by (n + 1)/2, where n is the number of values in the data set; the value at that position is the median. • For a data set with an even number of values, the median is the average of the two middle values. • Less sensitive than the mean to extreme values. • Mode • Value that occurs most often • Not affected by extreme values • Used for either numerical or categorical data • There may be no mode. • There may be several modes.
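A short check of the three measures, assuming a hypothetical data set of invoice settlement times. `median_by_position` is a hypothetical helper that applies the (n + 1)/2 rule directly; Python's `statistics.median` gives the same result.

```python
import statistics

# Hypothetical data: days taken to settle 9 invoices
days = [12, 15, 15, 17, 18, 20, 21, 22, 25]

def median_by_position(values):
    """Median via the (n + 1)/2 position rule on the ordered array."""
    ordered = sorted(values)
    pos = (len(ordered) + 1) / 2           # 1-based position
    if pos.is_integer():
        return ordered[int(pos) - 1]
    # fractional half (e.g. 5.5): average the 5th and 6th values
    return (ordered[int(pos) - 1] + ordered[int(pos)]) / 2

print("mean:", statistics.mean(days))
print("median:", median_by_position(days))      # 18
print("mode:", statistics.multimode(days))      # [15]

# Add one extreme value (outlier) and compare the mean with the median
with_outlier = days + [120]
print("mean with outlier:", statistics.mean(with_outlier))        # 28.5
print("median with outlier:", median_by_position(with_outlier))   # 19.0
```

The single value of 120 moves the mean from about 18.3 to 28.5, while the median moves only from 18 to 19.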
  • 31. Measures of Central Tendency (Contd..) • The geometric mean is used to measure the rate of change of a variable over time. • Which measure to choose: • The mean is generally used, unless extreme values (outliers) exist. • The median is then often used, since it is not sensitive to extreme values. For example, median home prices may be reported for a region because they are less sensitive to outliers. • In many situations it makes sense to report both the mean and the median.
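For the rate of change of a variable over time, a common measure is the geometric mean rate of return, \(\bar{R}_G = \left[\prod_{i=1}^{n}(1 + R_i)\right]^{1/n} - 1\). The sketch below assumes a hypothetical investment that loses 50% in one year and gains 100% the next, so it ends exactly where it started; `geometric_mean_rate` is a hypothetical helper name.

```python
import math

def geometric_mean_rate(returns):
    """Geometric mean rate of return: (prod(1 + R_i)) ** (1/n) - 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

# Hypothetical investment: 100,000 -> 50,000 -> 100,000
returns = [-0.50, 1.00]
arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_rate(returns)
print(f"arithmetic mean rate: {arithmetic:.0%}")   # 25%
print(f"geometric mean rate:  {geometric:.0%}")    # 0%
```

The arithmetic mean (25%) suggests growth, but the investment only returned to its starting value; the geometric mean (0%) reflects that.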
  • 32. Measures of Central Tendency: Summary • Arithmetic mean: \(\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}\) • Median: middle value in the ordered array • Mode: most frequently observed value
  • 33. Measures of Variation • Measures of variation give information on the spread, variability, or dispersion of the data values. • It is important to look at the dispersion, and not only at the central value, to understand the data. • Measures of variation: range, variance, standard deviation, coefficient of variation
  • 34. Measures of Variation (Contd..) • Range \(= X_{\text{largest}} - X_{\text{smallest}}\) • Does not account for how the data are distributed. • Sensitive to outliers. • Sample variance: (approximately) the average of the squared deviations of the values from the mean, \(S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}\) • Sample standard deviation: the square root of the variance, \(S = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}}\) • Has the same units as the original data. • Most commonly used measure of variation. • Shows variation about the mean. • For a population, the denominator is \(N\) instead of \(n - 1\) (with deviations taken from \(\mu\)); dividing by \(n - 1\) is what makes the sample variance an unbiased estimator of \(\sigma^2\). (Unbiased estimators are discussed in advanced courses.)
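A sketch of the range, variance, and standard deviation, assuming a hypothetical sample of six daily visit counts. The results can be cross-checked against Python's `statistics.variance` (sample, n − 1), `statistics.pvariance` (population, N), and `statistics.stdev`.

```python
import math

# Hypothetical sample: daily number of customer visits over 6 days
x = [10, 12, 14, 15, 17, 22]
n = len(x)
x_bar = sum(x) / n

data_range = max(x) - min(x)
ss = sum((xi - x_bar) ** 2 for xi in x)   # sum of squared deviations
s2 = ss / (n - 1)                         # sample variance (divide by n - 1)
s = math.sqrt(s2)                         # sample standard deviation
sigma2 = ss / n                           # population formula (divide by N)

print(f"range = {data_range}")
print(f"sample variance = {s2:.2f}, sample SD = {s:.2f}")
print(f"population-formula variance = {sigma2:.2f}")
```

Here the squared deviations sum to 88, so \(S^2 = 88/5 = 17.6\), while dividing by N gives \(88/6 \approx 14.67\).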
  • 35. Measures of Variation (Contd..) • Coefficient of variation • Measures relative variation. • Always expressed as a percentage (%). • Shows variation relative to the mean. • Can be used to compare the variability of two or more sets of data measured in different units. • \(CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%\)
  • 36. Measures of Variation: Comparing Coefficients of Variation • Stock A: • Mean price last year = $50 • Standard deviation = $5 • \(CV_A = \frac{S_A}{\bar{X}_A} \cdot 100\% = \frac{5}{50} \cdot 100\% = 10\%\) • Stock B: • Mean price last year = $60 • Standard deviation = $10 • \(CV_B = \frac{S_B}{\bar{X}_B} \cdot 100\% = \frac{10}{60} \cdot 100\% = 16.67\%\) • Stock B's price varied more relative to its mean than Stock A's.
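The stock comparison as a small function; the figures are the ones given for Stock A and Stock B, and the function name `coefficient_of_variation` is a hypothetical helper.

```python
def coefficient_of_variation(sd, mean):
    """CV = (S / X-bar) * 100, expressed as a percentage."""
    return 100 * sd / mean

cv_a = coefficient_of_variation(sd=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(sd=10, mean=60)   # Stock B
print(f"CV_A = {cv_a:.2f}%, CV_B = {cv_b:.2f}%")  # CV_A = 10.00%, CV_B = 16.67%
print("Relatively more variable:", "Stock B" if cv_b > cv_a else "Stock A")
```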
  • 37. Shape of a Distribution • Describes how data are distributed. • Two useful shape-related statistics are: • Skewness: measures the extent to which data values are not symmetrical. • Kurtosis: measures the peakedness of the curve of the distribution, that is, how sharply the curve rises approaching the center of the distribution.
  • 38. Shape of a Distribution (Skewness) • Measures the extent to which data are not symmetrical. • A widely used measure is Pearson's coefficient of skewness, \(3(\text{Mean} - \text{Median})/SD\). • Left-skewed: Mean < Median < Mode; skewness statistic < 0 • Symmetric: Mean = Median = Mode; skewness statistic = 0 • Right-skewed: Mode < Median < Mean; skewness statistic > 0
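A sketch of Pearson's coefficient, \(3(\bar{X} - \text{Median})/S\), on small hypothetical data sets; `pearson_skewness` is a hypothetical helper. Software such as Excel's SKEW function uses a moment-based formula instead, so its values will differ from these, although the sign usually agrees.

```python
import statistics

def pearson_skewness(values):
    """Pearson's coefficient of skewness: 3 * (mean - median) / sample SD."""
    return 3 * (statistics.mean(values) - statistics.median(values)) / statistics.stdev(values)

right_skewed = [2, 3, 3, 4, 4, 5, 6, 9, 15]   # long tail to the right
left_skewed = [-v for v in right_skewed]       # mirror image: tail to the left
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(f"{name:>9}: {pearson_skewness(data):+.3f}")
```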
  • 39. Shape of a Distribution (Kurtosis) • It measures how sharply the curve rises approaching the center of the distribution. • Bell-shaped: kurtosis = 3 • Sharper peak than bell-shaped: kurtosis > 3 • Flatter than bell-shaped: kurtosis < 3
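A sketch of the moment coefficient of kurtosis, \(m_4/m_2^2\), which equals 3 for a normal distribution and so matches the benchmark of 3 used here. Some software, including Excel's KURT function, reports excess kurtosis (kurtosis minus 3, with small-sample adjustments) instead, so a bell-shaped result there is near 0. The two data sets are hypothetical.

```python
def kurtosis(values):
    """Moment coefficient of kurtosis m4 / m2**2 (equals 3 for a normal curve)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    return m4 / m2 ** 2

flat = list(range(1, 101))          # evenly spread: flatter than bell-shaped
peaked = [0] * 90 + [-10, 10] * 5   # mostly at the center, a few extremes
print(f"flat:   {kurtosis(flat):.2f}")    # below 3
print(f"peaked: {kurtosis(peaked):.2f}")  # above 3
```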
  • 40. Exploring Numerical Data Using Quartiles • The five-number summary: \(X_{\text{smallest}}, Q_1, \text{Median}, Q_3, X_{\text{largest}}\). • Constructing a boxplot. • The general formula for the ranked position of the Pth percentile is \((P/100)(n + 1)\), where n is the number of values in the data set (for P = 50 this is the \((n + 1)/2\) median rule). • If the result is a whole number, it is the ranked position to use. • If the result is a fractional half, average the two corresponding data values. • Otherwise, round to the nearest integer and use that ranked position. • The IQR is \(Q_3 - Q_1\) and measures the spread in the middle 50% of the data. • The IQR is also called the midspread because it covers the middle 50% of the data. • The IQR is a measure of variability that is not influenced by outliers or extreme values.
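A sketch of the five-number summary and IQR, assuming the \((P/100)(n + 1)\) positioning convention with the whole-number, fractional-half, and round-to-nearest cases. Spreadsheet and library quartile functions use other interpolation conventions and can return slightly different quartiles. The data sets and helper names are hypothetical.

```python
def ranked_value(ordered, position):
    """Value at a 1-based ranked position in an ordered array.

    Whole number -> that position; fractional half -> average of the two
    adjacent values; anything else -> round to the nearest position.
    """
    whole = int(position)
    if position == whole:
        return ordered[whole - 1]
    if position - whole == 0.5:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[round(position) - 1]

def five_number_summary(values):
    """(smallest, Q1, median, Q3, largest) using (P/100)(n + 1) positions."""
    x = sorted(values)
    n = len(x)
    q1 = ranked_value(x, 0.25 * (n + 1))
    median = ranked_value(x, 0.50 * (n + 1))
    q3 = ranked_value(x, 0.75 * (n + 1))
    return x[0], q1, median, q3, x[-1]

# Hypothetical data: minutes spent by nine shoppers on a store's website
minutes = [11, 12, 13, 16, 16, 17, 18, 21, 22]
smallest, q1, median, q3, largest = five_number_summary(minutes)
print("five-number summary:", (smallest, q1, median, q3, largest))
print("IQR =", q3 - q1)
```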
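  • 41. The Empirical Rule for the Normal Distribution vs Chebyshev’s Rule for Any Other Distribution • Chebyshev’s rule: regardless of how the data are distributed, at least \((1 - 1/k^2) \times 100\%\) of the values fall within k standard deviations of the mean (for k > 1). • (Figure: normal curve centered at \(\mu\); 68% of the values lie within \(\mu \pm \sigma\), another 13.5% on each side between one and two standard deviations, and another 2.35% on each side between two and three.) • Range \(\mu \pm \sigma\): empirical rule (normal curve) 68%; Chebyshev’s rule (any distribution) not applicable, since k must exceed 1 • Range \(\mu \pm 2\sigma\): 95% vs at least 75% • Range \(\mu \pm 3\sigma\): 99.7% vs at least 88.89%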
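The percentages for both rules can be reproduced directly. This sketch computes Chebyshev's lower bound \(1 - 1/k^2\) and, for comparison, the exact share of a normal distribution within k standard deviations, \(\operatorname{erf}(k/\sqrt{2})\), using `math.erf` from the standard library; the helper names are hypothetical.

```python
import math

def chebyshev_min_share(k):
    """Chebyshev: at least 1 - 1/k**2 of values lie within k SDs (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is informative only for k > 1")
    return 1 - 1 / k**2

def normal_share(k):
    """Exact share of a normal distribution within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    cheb = f"at least {chebyshev_min_share(k):.2%}" if k > 1 else "not applicable"
    print(f"mu +/- {k} sigma: normal {normal_share(k):.1%}, Chebyshev {cheb}")
```

The normal shares come out as about 68.3%, 95.4%, and 99.7%, which the empirical rule rounds to 68%, 95%, and 99.7%.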
  • 42. Measures of the Relationship Between Two Numerical Variables • Scatter plots let you visually examine the relationship. • The covariance, \(\text{Cov}(X, Y)\) • Measures the direction of the linear relationship between two numerical variables. • Concerned only with the nature of the relationship; no causal effect is implied. • Cov > 0: the variables tend to move in the same direction; Cov < 0: they tend to move in opposite directions; Cov = 0: there is no linear relationship (which does not by itself imply independence). • The relative strength of the relationship cannot be read from the covariance, because its size depends on the units of measurement. • The coefficient of correlation (r) • Measures the relative strength of the linear relationship between two numerical variables. • Varies from −1 (perfect negative linear relationship) to +1 (perfect positive linear relationship). • The coefficient of determination (\(r^2\)) • Shows the proportion of the variation in y that is explained by the linear relationship with x (in multiple regression, by all the x variables together). • Varies between 0 and 1; the higher the value, the more of the variation in y is explained, although this does not establish a causal relationship.
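A sketch of the three measures on hypothetical data (advertising spend x and sales y, in arbitrary units); `covariance` and `correlation` are hypothetical helpers, and \(r^2\) here is for a single x variable. Rescaling x by 1,000 multiplies the covariance by 1,000 but leaves r unchanged, which is why r, not the covariance, is used to judge relative strength.

```python
import math

def covariance(x, y):
    """Sample covariance: sum((x_i - x_bar) * (y_i - y_bar)) / (n - 1)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

def correlation(x, y):
    """Coefficient of correlation r = Cov(X, Y) / (S_X * S_Y)."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

# Hypothetical data: advertising spend (x) and sales (y)
x = [1, 2, 3, 4, 5]
y = [3, 5, 4, 7, 8]

r = correlation(x, y)
print(f"Cov(X, Y) = {covariance(x, y):.2f}")
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")

# Change the units of x: the covariance scales, r does not
x_scaled = [1000 * v for v in x]
print(f"Cov after rescaling = {covariance(x_scaled, y):.2f}")
print(f"r after rescaling   = {correlation(x_scaled, y):.3f}")
```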
