Data Science - Normal Distribution
Dr.M.Pyingkodi
Associate Professor
Dept. of MCA
Kongu Engineering College
Erode, Tamil Nadu, India
• A Normal Distribution
• A Normal Distribution is a bell-shaped curve that is symmetric about the
mean, where most of the data points are clustered around the center and
fewer are found as you move away from the mean.
• Characteristics
• 1. Symmetry
• The distribution is symmetric about the mean.
• This means that the left side of the distribution is a mirror image of the
right side.
• 2. Bell-Shaped Curve
• The graph of a normal distribution is shaped like a bell, with a single
peak at the mean.
• It rises gradually on both sides and tails off symmetrically.
• 3. Mean, Median, and Mode Equality
• In a normal distribution, the mean, median, and mode are all equal and
located at the center of the distribution.
• This central point is also known as the peak of the curve.
Dr.M.Pyingkodi, Associate Professor, Dept of MCA , Kongu Engineering College, Tamil Nadu, India 2
• 4. 68-95-99.7 Rule
• Approximately 68% of the data falls within one standard deviation of the
mean, about 95% falls within two standard deviations, and about 99.7%
falls within three standard deviations.
• This is often referred to as the empirical rule.
• 5. Asymptotic Nature
• The tails of the normal distribution approach, but never actually touch,
the horizontal axis.
• This means that there is a theoretical possibility of observing values far
from the mean, though these occurrences become increasingly rare.
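The 68-95-99.7 figures can be checked numerically. The following is a minimal sketch using `NormalDist` from Python's standard library; the helper name `share_within` is introduced here for illustration and is not part of the original slides.

```python
from statistics import NormalDist


def share_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    # Area between -k and +k under the standard normal curve.
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.4f}")
```

The printed fractions are close to 0.68, 0.95, and 0.997, which is where the rule gets its name. The result does not depend on the particular mean or standard deviation, so the standard normal (mean 0, standard deviation 1) is enough for the check.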
z-score
• A z-score is a statistical measure that indicates how many standard
deviations a data point is from the mean of a dataset.
• It is a way to standardize scores from different distributions, allowing for
comparison.
What a Z-Score Indicates
• 1. Position Relative to the Mean
• A z-score tells you how far away a specific value (data point) is from the
mean.
  ◦ A positive z-score indicates that the data point is above the mean.
  ◦ A negative z-score indicates that the data point is below the mean.
  ◦ A z-score of 0 means the data point is exactly at the mean.
• 2. Standard Deviations
• The value of the z-score represents the number of standard deviations
the data point is from the mean.
• For example, a z-score of 2 means the data point is two standard
deviations above the mean, while a z-score of -1.5 means it is one and a
half standard deviations below the mean.
• Example
• Consider a dataset of test scores for a class, with the following
characteristics:
• Mean (μ) = 70
• Standard Deviation (σ) = 10
• Suppose a student scores 85 on the test.
• To calculate the z-score for this student's score, we use the formula
  z = (X - μ) / σ
• Where
• X = the data point (student's score)
• μ = the mean
• σ = the standard deviation
• Substituting in the values: z = (85 - 70) / 10 = 1.5
• In this example, the student's z-score is 1.5.
• This means that the student scored 1.5 standard deviations above the
mean score of the class.
• This indicates a relatively high performance compared to the average
student in the class.
• Using z-scores allows us to understand the position of a data point
within the context of the overall dataset, making it easier to compare
scores across different distributions.
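The test-score example can be reproduced in a few lines of Python. This is a minimal sketch; the function name `z_score` and its parameter names are chosen here for illustration and do not come from the slides.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev


# Values from the example: mean 70, standard deviation 10, score 85.
print(z_score(85, mean=70, std_dev=10))  # 1.5
```

A score below the mean produces a negative result; for instance, a score of 55 on the same test gives -1.5.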
• Why Z-Score?
• Comparison Across Different Datasets
• Z-scores allow us to compare values from different datasets, even if they
have different units or scales.
• For instance, if you're comparing the performance of two students on
different tests, the z-scores normalize their scores, making it easier to
compare their relative performance.
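As a sketch of such a comparison, suppose two students sat tests that are scored on different scales. All numbers for the second test below are hypothetical and chosen only for illustration; the first test reuses the mean of 70 and standard deviation of 10 from the earlier example.

```python
def z_score(x: float, mean: float, std_dev: float) -> float:
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


# Student A: test scored out of 100 (mean 70, standard deviation 10).
z_a = z_score(85, mean=70, std_dev=10)

# Student B: hypothetical test on a different scale (mean 500, standard deviation 100).
z_b = z_score(620, mean=500, std_dev=100)

print(f"Student A z-score: {z_a}")  # 1.5
print(f"Student B z-score: {z_b}")  # 1.2
better = "A" if z_a > z_b else "B"
print(f"Student {better} performed better relative to their own class")
```

The raw scores 85 and 620 cannot be compared directly, but the z-scores 1.5 and 1.2 are on the same scale, so Student A stands further above their class average than Student B does.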
• Probability and Percentiles
• In a normal distribution, z-scores are tied to probabilities.
• For example, a z-score of +1 corresponds to about the 84th percentile of
the data, meaning that about 84% of the data points are below this score.
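The link between a z-score and a percentile is the cumulative distribution function of the standard normal distribution. A minimal sketch using the standard library's `NormalDist` follows; the helper name `percentile_from_z` is an assumption made for illustration, and the result only holds when the data is approximately normal.

```python
from statistics import NormalDist


def percentile_from_z(z: float) -> float:
    """Percentage of a normal distribution lying below the given z-score."""
    return NormalDist().cdf(z) * 100


print(f"{percentile_from_z(1.0):.1f}")  # 84.1
print(f"{percentile_from_z(0.0):.1f}")  # 50.0
```

A z-score of 0 sits at the 50th percentile because the mean and the median coincide in a normal distribution.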
• Identifying Outliers
• Z-scores can highlight extreme values. If a z-score is very high or low
(e.g., beyond ±3), the value may be considered an outlier in the dataset.
• Imagine the average height of adult men in a population is 175 cm with a
standard deviation of 10 cm.
• A man who is 190 cm tall has a z-score of (190 - 175) / 10 = 1.5.
• This means he is 1.5 standard deviations taller than the average.
• Z-scores make it easier to understand how unusual or typical a value is
relative to the overall data distribution.
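The ±3 threshold for outliers can be sketched as a simple filter. The population mean (175 cm) and standard deviation (10 cm) come from the height example; the list of heights and the function name `find_outliers` are hypothetical and used only for illustration.

```python
POP_MEAN_CM = 175.0  # population mean from the height example
POP_SD_CM = 10.0     # population standard deviation from the height example


def find_outliers(values, mean, std_dev, threshold=3.0):
    """Return the values whose z-score lies beyond +/- threshold."""
    return [v for v in values if abs((v - mean) / std_dev) > threshold]


# Hypothetical heights in centimetres.
heights_cm = [160, 172, 175, 181, 190, 210]
print(find_outliers(heights_cm, POP_MEAN_CM, POP_SD_CM))  # [210]
```

Here 190 cm has a z-score of 1.5 and is not flagged, while 210 cm has a z-score of 3.5 and is. The cutoff of 3 is a common convention rather than a fixed rule, so the `threshold` parameter is left adjustable.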