3. • PhD (IIT BHU Varanasi)
• M Tech (IIT BHU Varanasi)
• B Tech (GBTU)
• 3 research papers in SCOPUS/ABDC-indexed
journals
• 8 papers reviewed as a reviewer
• Six sigma green belt
5. Univariate Descriptive Analysis
• Measures of Central Tendency- Mean, Median,
Mode
• Measures of Variability- Range, Variance, Standard
Deviation, Coefficient of Variation
• Measures of Shape- Skewness and Kurtosis
• Measures of Stability- Standard Error
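The shape and stability measures listed above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: it assumes the moment-based definitions of skewness and kurtosis (population central moments, no bias correction, kurtosis not reduced by 3) and the standard error of the mean computed from the sample standard deviation. The function name `shape_and_stability` is hypothetical, and other textbooks may use slightly different conventions.

```python
import math

def shape_and_stability(values):
    """Return (skewness, kurtosis, standard_error) for a list of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(values) / n
    # Central moments about the mean (population form, divisor n).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; shape is undefined")
    skewness = m3 / m2 ** 1.5        # 0 for perfectly symmetric data
    kurtosis = m4 / m2 ** 2          # about 3 for normally distributed data
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sample_sd = math.sqrt(m2 * n / (n - 1))
    standard_error = sample_sd / math.sqrt(n)
    return skewness, kurtosis, standard_error

skew, kurt, se = shape_and_stability([8, 9, 10, 11, 12])
print(skew, kurt, se)
```

For the symmetric data set used here the skewness comes out as zero, which is what the definition predicts.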
7. Data Analysis
• Data analysis is the process of cleaning, transforming,
analyzing, visualizing and interpreting data
to extract useful information and gain
insights that support more effective business
decisions.
8. Variables
• Variable: Any characteristic or quality
that varies is termed a variable.
• E.g.: When collecting basic clinical and
demographic information on patients with a
particular illness, the variables of interest may
include gender (M/F), age and height of the
patients.
9. Variable
• Categorical (no notion of magnitude)
– Nominal: Categories are mutually exclusive and
unordered. E.g. gender (M/F), blood group (A/B/AB/O)
– Ordinal: Categories are mutually exclusive and
ordered. E.g. disease severity (mild, moderate and severe)
• Numerical (values have a magnitude)
– Discrete: Integer values, typically counts.
E.g. no. of children vaccinated, days sick per year
– Continuous: Takes any value in a range of values.
E.g. weight in kg and height in cm
11. Three types of analysis
• Univariate analysis: the examination of cases on only
one variable at a time (e.g., weight of college
students).
• Bivariate analysis: the examination of two variables
simultaneously (e.g., the relation between gender
and weight of college students).
• Multivariate analysis: the examination of more than two
variables simultaneously (e.g., the relationship between
gender, race, and weight of college students).
12. Purpose of different types of analysis
• Univariate analysis: mainly description
• Bivariate analysis: Determining the empirical
relationship between two variables.
• Multivariate analysis: Determining the empirical
relationship among multiple variables.
13. Univariate
• The objective of univariate analysis is to describe the
data, summarize it and analyze the
patterns present in it.
• Univariate techniques are appropriate when there is
a single measurement of each element in the sample
or when there are several measurements of each
element but each variable is analyzed in isolation.
14. Univariate
Descriptive
• Measures of Central Tendency- Mean,
Median, Mode
• Measures of Variability- Range,
Variance, Standard Deviation, Coefficient
of Variation
• Measures of Shape- Skewness and
Kurtosis
• Measures of Stability- Standard Error
Inferential
• z test
• t test
• Chi square test
15. Numerical Methods
• Mean
– Let X1, X2, X3, …, Xn be the n data points; then the mean
of the data is defined as
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
– The mean provides the central value about which the
data is spread out.
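As a small illustration of the definition above, the mean is the sum of the data points divided by their count. The function name `mean` and the sample data are chosen here for illustration only.

```python
def mean(values):
    """Arithmetic mean: sum of the data points divided by their count."""
    if not values:
        raise ValueError("mean requires at least one data point")
    return sum(values) / len(values)

data = [8, 9, 10, 11, 12]
print(mean(data))  # 10.0
```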
16. Numerical Methods
• Median
– The median is the value that divides the ordered data into two
halves
– Let X1, X2, X3, …, Xn be the n data points
– Order the n data values
– If the number of data points is odd, the sample
median is the value in position (n+1)/2
– If the number of data points is even, the sample
median is the average of the values in positions n/2
and (n/2)+1
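The steps above (order the data, then pick the middle position or average the two middle positions) can be sketched as follows. The function name `median` is illustrative. Note that the slide's positions are 1-based while Python lists are 0-based, so position (n+1)/2 corresponds to index n // 2 when n is odd.

```python
def median(values):
    """Sample median following the odd/even rule on the ordered data."""
    ordered = sorted(values)          # step 1: order the n data values
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one data point")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n+1)/2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: average the values in 1-based positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([11, 8, 12, 9, 10]))  # 10
print(median([8, 9, 10, 11]))      # 9.5
```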
17. Mean or Median?
• Both the measures provide the “middle” value
of data, so how do they compare?
– The median is robust against extreme values in the data
– The mean, by contrast, is affected by extreme values
• Example: let 8, 9, 10, 11, 12 be the five data
points
– Mean = 10 and Median = 10
– Replace 12 by 18
• Mean = 11.2 but Median =10
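The example above can be reproduced with Python's standard `statistics` module. The variable names are illustrative; the data are the five points from the slide, before and after 12 is replaced by 18.

```python
import statistics

original = [8, 9, 10, 11, 12]
with_outlier = [8, 9, 10, 11, 18]   # 12 replaced by the extreme value 18

mean_before = statistics.mean(original)
median_before = statistics.median(original)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean shifts with the extreme value; the median stays where it was.
print(mean_before, median_before)
print(mean_after, median_after)
```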
18. Numerical Methods
• Mode
– The mode is the value in the data that occurs with the
highest frequency
– It is the most probable value of the data
– It is possible to have data that has more than one
Mode value. Such data is called multimodal.
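A minimal sketch of finding the mode, including the multimodal case, by counting how often each value occurs. The helper name `modes` is hypothetical; it returns every value that shares the highest frequency.

```python
from collections import Counter

def modes(values):
    """Return all values that occur with the highest frequency, sorted."""
    if not values:
        raise ValueError("modes requires at least one data point")
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5, 5, 1]))            # [5]
print(modes([1, 2, 2, 3, 3, 4]))   # [2, 3], a multimodal data set
```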
19. Measures of Variability
• Percentile
– Order the data in ascending order
• Then p1 is called the first percentile if 1% of the points lie
below this value
• Similarly, pk is called the kth percentile if k% of the data points lie below this
value, where 0≤k≤100
• Quartile
– P25 is called the 1st quartile Q1
– P75 is called the 3rd quartile Q3
– P50 is Median
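The quartiles can be computed with `statistics.quantiles` from the Python standard library (Python 3.8 or later). Several interpolation conventions for percentiles exist, and they can give slightly different answers on small data sets; this sketch assumes the "inclusive" method, which treats the data as containing the minimum and maximum of the population.

```python
import statistics

data = [8, 9, 10, 11, 12]

# n=4 splits the ordered data into four parts and returns the
# three cut points Q1 (P25), Q2 (P50, the median) and Q3 (P75).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

print(q1, q2, q3)  # 9.0 10.0 11.0
```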
20. Measure of Dispersion
• Measures the spread of data
– Range
– Variance or standard deviation
• Measures the spread about mean/average value of
data
– Interquartile range
• Measures the spread about median value of the data
21. Measure of Dispersion
• Range = M-m, where,
– M = Max (x1, x2, ….xn)
– m = Min (x1, x2, ….xn)
• Variance
– S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2, where \bar{x} is the mean of x1, x2, …, xn
– Standard deviation = S
• Interquartile range: Q3 - Q1
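The three dispersion measures on this slide can be sketched as below. This assumes the sample variance with divisor n-1 (some texts divide by n instead) and the "inclusive" quartile convention of `statistics.quantiles`; the function name `sample_variance` is illustrative.

```python
import math
import statistics

def sample_variance(values):
    """Sample variance: squared deviations from the mean, divided by n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("variance requires at least two data points")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [8, 9, 10, 11, 12]

data_range = max(data) - min(data)        # M - m
variance = sample_variance(data)          # S^2
std_dev = math.sqrt(variance)             # S
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                             # Q3 - Q1

print(data_range, variance, std_dev, iqr)
```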
22. Standard Deviation
• Standard deviation is the most commonly used
measure of dispersion.
– Under the assumption of normality, the range
mean ± S covers about 68% of the data.
• Hence, it is commonly used to show the possible error in
the observed value of data
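The coverage figure for one standard deviation either side of the mean can be checked with `statistics.NormalDist` (Python 3.8 or later). The sketch uses a standard normal distribution; the proportion is the same for any normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability that a normal value falls within one standard
# deviation of the mean: P(-1 <= Z <= 1).
coverage = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(round(coverage, 4))  # 0.6827
```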
23. Graphical Method
• Histogram or Bar chart
– Frequency Plot
• Pie Chart
• Cumulative frequency plot
• Box and Whisker plot
26. Bivariate
• "Bi" means two and "variate" means variable, so here
there are two variables. The analysis concerns the
cause of, and the relationship between, the two
variables.
• Correlation
• Covariance
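A minimal sketch of the two bivariate measures named above, assuming the sample covariance (divisor n-1) and the Pearson correlation coefficient. The function names and the height/weight figures are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equally long lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Hypothetical data: heights in cm and weights in kg of five people.
heights = [150, 160, 170, 180, 190]
weights = [50, 58, 66, 74, 82]

cov = covariance(heights, weights)
corr = correlation(heights, weights)
print(cov, corr)
```

Covariance depends on the units of the two variables, whereas correlation is unit-free and always lies between -1 and 1, which makes it easier to compare across pairs of variables.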