Department of Computer Science Engineering
CS3352-Foundations of Data Science
Unit - II: Describing Data
Data:
● A collection of actual observations or scores in a survey or an experiment
Types of Data and Variables:
● Data can be descriptive (characteristics) or numerical (numbers). Let us take
a look at some of the most prevalent types of data.
● There are two types of data:
1. Qualitative Data
2. Quantitative Data
[Figure: Types of Data]
(A) Qualitative Data:
● Qualitative data is not expressed as numeric amounts, so it cannot be measured on a numerical scale. It is
also called Categorical Data because the data can be sorted by category rather than by number.
● Qualitative data deals with characteristics and descriptions that are difficult to measure but
may be subjectively observed, e.g., smells, tastes, textures, attractiveness, color, etc. They may
include favorite foods, favorite holiday destinations, religions, pictures, symbols, colors, and
so on.
● These data are described by some characteristics, for example, gender, blood group, etc. This data
can provide answers to questions such as: “How did it occur?” or “Why did this occur?”
● In general, qualitative data can be divided into two types:
1. Nominal data
2. Ordinal data
1. Nominal Data:
● This type of data is used for naming variables and has no numerical value
● Nominal data is a collection of values (non-numeric) that do not have a natural order.
● For example, it is not possible to state that 'Green' is greater than 'Blue', so we cannot compare
one color to another, and so the color of a thing is a nominal data type
● Examples of Nominal Data:
○ Colors: (Brown, Red, etc.)
○ Taste: (Sour, Sweet, Salty, etc.)
○ Languages: (Hindi, English, Marathi, Gujarati, Tamil, Telugu, etc.)
2. Ordinal Data:
● Ordinal data is defined as qualitative data whose values are ordered.
● In this type of data, the categories have a natural order. In other words, ordinal data is data
that is sorted by its position on a scale. Ordinal values cannot be used for arithmetic because
they only indicate sequence, not the size of the gap between positions.
● For example, we can easily sort the clothing brands' sizes according to their name tags in the
order of small < medium < large.
● Examples of Ordinal Data:
○ Economic status: (low, medium, high)
○ Letter grades: (A, B, C, D, E, etc.)
○ Rank in a competition: (First, Second, Third)
(B) Quantitative Data:
● Quantitative data is data represented by numbers.
● Quantitative data is made up of numbers and things that can be measured objectively, e.g.,
area, volume, height, width, length, weight, speed, humidity, temperature, prices, year, etc.
● Quantitative data is always represented by numbers that indicate either how much or how many.
● In general, quantitative data can be divided into two types:
1. Discrete data
2. Continuous data
1. Discrete Data:
● Discrete data is counted, and it can take only certain separate values.
● Discrete values are countable and are usually non-negative integers.
● The number of pupils, the number of children, the shoe size, and so on are all examples of
discrete data.
● Examples of Discrete Data:
○ When we roll one die, we obtain 1, 2, 3, 4, 5, or 6 as discrete data.
○ The total number of students enrolled in a class is discrete data
○ The number of children in your household is discrete data
2. Continuous Data:
● Continuous data is measured, and its value can be anything within a range.
● Continuous data is a set of numbers that can have any decimal or fractional value. Height,
weight, length, time, temperature are all instances of continuous data.
● For example, the height of a person may be precisely 5.78 feet. We can measure someone's
height in meters, centimetres, millimetres, and so on, so height is continuous data.
● Examples of continuous data:
○ Newborn babies' body weight
○ A freezer temperature
○ Wind speed
● Continuous data can be further classified as measured on an interval scale or a ratio scale.
(i) Interval Scale
● An interval scale is a scale whose values do not have a natural zero.
● An interval scale has order, and the difference between two values is meaningful. However, ratios
of these values are not meaningful: a room at 20 degrees Celsius is not twice as warm as a room at
10 degrees Celsius.
● Temperature (in Celsius or Fahrenheit), pH, and credit score are examples of interval variables.
(ii) Ratio Scale:
● A ratio scale is a set of values that have a natural zero.
● Something measured on a ratio scale has the same properties as something measured on an
interval scale, with the exception that there is an absolute zero point with ratio scaling. In
other words, a ratio variable contains all of the attributes of an interval variable, plus a distinct
definition of 0.0: when the variable equals 0.0, none of the quantity is present.
● An example is temperature measured in kelvins. No value below 0 K is possible; 0 K is
absolute zero.
● Another example is weight; 0 kg indicates a complete absence of weight.
Question : Indicate whether each of the following terms is qualitative (because it’s a word, letter, or
numerical code representing a class or category); or quantitative (because it’s a number representing an
amount or a count).
1) age
2) family size
3) academic major
4) IQ score
5) net worth (dollars)
6) third-place finish
7) gender
8) temperature
Answer :
1) age (quantitative) (Discrete if counted in whole units such as years; continuous/ratio if measured as the
exact amount of time that has passed since birth.)
2) family size (quantitative/Discrete)
3) academic major (qualitative/Nominal)
4) IQ score (quantitative/Continuous/Interval)
5) net worth (dollars) (quantitative/Continuous/Interval)
6) third-place finish (qualitative/Ordinal)
7) gender (qualitative/Nominal)
8) temperature (quantitative/Continuous) (temperature in Celsius or Fahrenheit is at an interval scale because
zero is not the lowest possible temperature. In the Kelvin scale, a ratio scale, zero represents a total lack of
thermal energy.)
Frequency Distributions for Quantitative Data:
● A frequency distribution is a collection of observations produced by sorting observations into
classes and showing their frequency (f ) of occurrence in each class.
● Frequency distribution is used to organize the collected data in table form.
● It is a way to summarize the data, and it allows quick visual interpretation of the data.
● For example: The following are the scores of 10 students in the G.K. quiz released by Mr. Chris:
15, 17, 20, 15, 20, 17, 17, 14, 14, 20. Let's represent this data in a frequency distribution and find
out the number of students who got the same marks.
● The frequency distribution makes the given information easy to understand: from it we can see
the number of students who obtained the same marks.
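The tally for the quiz scores above can be reproduced with a short Python sketch. The variable names are illustrative only; `collections.Counter` is just one convenient way to count occurrences.

```python
from collections import Counter

# Quiz scores of the 10 students, taken from the example above.
scores = [15, 17, 20, 15, 20, 17, 17, 14, 14, 20]

# Count how often each score occurs, then list the scores in ascending order.
frequency = Counter(scores)
frequency_table = sorted(frequency.items())

print("Score  Frequency")
for score, f in frequency_table:
    print(f"{score:>5}  {f:>9}")
```

Two students scored 14, two scored 15, three scored 17, and three scored 20, which accounts for all 10 students.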
Types of Frequency Distributions:
1) Grouped Frequency Distribution:
● To arrange a large number of observations, we use a grouped frequency distribution
table. In this table, we form class intervals and tally the frequency of the data
that belong to each class interval.
● For example: Marks obtained by 20 students in the test are as follows: 5, 10, 20,
15, 5, 20, 20, 15, 15, 15, 10, 10, 10, 20, 15, 5, 18, 18, 18, 18. To arrange the data in a
grouped table, we have to make class intervals.
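A minimal sketch of the grouping step for these 20 marks follows. The class intervals 1-5, 6-10, 11-15, and 16-20 are an assumption made for illustration (the original table is not reproduced here), and `grouped_frequencies` is a hypothetical helper name.

```python
# Marks of the 20 students, taken from the example above.
marks = [5, 10, 20, 15, 5, 20, 20, 15, 15, 15,
         10, 10, 10, 20, 15, 5, 18, 18, 18, 18]

# Assumed class intervals, given as inclusive (lower limit, upper limit) pairs.
class_intervals = [(1, 5), (6, 10), (11, 15), (16, 20)]


def grouped_frequencies(values, intervals):
    """Tally how many values fall inside each inclusive class interval."""
    return {
        (low, high): sum(1 for v in values if low <= v <= high)
        for low, high in intervals
    }


grouped = grouped_frequencies(marks, class_intervals)
for (low, high), f in grouped.items():
    print(f"{low:>2}-{high:<2}  {f}")
```

With a different choice of class width the tallies change, but the frequencies always add up to the 20 observations.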
2) Ungrouped Frequency Distribution:
● In the ungrouped frequency distribution, we don't make class intervals; we write the
exact frequency of each individual value.
● For example: Marks obtained by 20 students in the test are as follows: 5, 10, 20, 15, 5,
20, 20, 15, 15, 15, 10, 10, 10, 20, 15, 5, 18, 18, 18, 18. To arrange the data in an ungrouped
frequency distribution table, we have to write the frequency of each individual value.
3) Relative Frequency Distribution:
● Relative frequency distributions show the frequency of each class as a part or fraction of the
total frequency for the entire distribution.
● To convert a frequency distribution into a relative frequency distribution, divide the
frequency for each class by the total frequency for the entire distribution.
● For instance, to obtain the proportion of .06 for the class 130–139, divide the frequency of 3
for that class by the total frequency of 53.
● Repeat this process until a proportion has been calculated for each class.
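The division described above can be sketched as follows. Only the figures 3 and 53 come from the example; the `example` frequencies and the helper name `relative_frequencies` are hypothetical and exist only to exercise the function.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total frequency."""
    total = sum(frequencies.values())
    return {cls: f / total for cls, f in frequencies.items()}


# The one proportion quoted above: a frequency of 3 out of a total of 53.
proportion_130_139 = round(3 / 53, 2)
print(proportion_130_139)

# Hypothetical class frequencies, used only to demonstrate the helper.
example = {"0-9": 2, "10-19": 5, "20-29": 3}
proportions = relative_frequencies(example)
for cls, p in proportions.items():
    print(f"{cls:<6} {p:.2f}")
```

Apart from rounding error, the proportions of a relative frequency distribution always sum to 1.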
4) Cumulative Frequency Distributions:
● Cumulative frequency distributions show the total number of observations in each class and in all
lower-ranked classes.
● To convert a frequency distribution into a cumulative frequency distribution, add the frequency of each
class to the sum of the frequencies of all classes ranked below it. This gives the cumulative frequency
for that class. Begin with the lowest-ranked class in the frequency distribution and work upward,
finding the cumulative frequencies in ascending order.
● Cumulative percentages are often referred to as percentile ranks.
● The percentile rank of a score indicates the percentage of scores in the entire distribution with
similar or smaller values than that score.
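As a rough illustration, cumulative frequencies and cumulative percentages can be computed with a running sum. The classes and frequencies below are hypothetical, listed from the lowest-ranked class upward.

```python
from itertools import accumulate

# Hypothetical frequency distribution, lowest-ranked class first.
classes = ["0-9", "10-19", "20-29", "30-39"]
frequencies = [2, 5, 10, 3]

# Running sum: each class plus all classes ranked below it.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Cumulative percentage (percentile rank) for each class.
cumulative_percent = [round(100 * c / total, 1) for c in cumulative]

for cls, f, cf, cp in zip(classes, frequencies, cumulative, cumulative_percent):
    print(f"{cls:<6} f={f:<3} cum f={cf:<3} cum %={cp}")
```

The last cumulative frequency equals the total number of observations, so the last cumulative percentage is always 100.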
Frequency Distributions for Qualitative Data:
Nominal Qualitative Data
● Frequency distributions for qualitative data are easy to construct. Simply determine the
frequency with which observations occupy each class.
● For example:
○ In this Facebook profile survey, the frequency distribution reveals that Yes
replies are approximately twice as prevalent as No replies.
Ordinal Qualitative Data
● When qualitative data have an ordinal level of measurement because observations can be
ordered from least to most, that order should be preserved in the frequency table.
● For example:
○ Here, military ranks are listed in descending order from general to lieutenant.
○ If measurement is ordinal because observations can be ordered from least to
most, cumulative frequencies (and cumulative percentages) can be used.
Question : Construct a frequency distribution for ungrouped data.
Students in a theater arts appreciation class rated a classic film on a 10-point scale, ranging from 1 (poor)
to 10 (excellent), as follows:
Answer :
Question : Construct a frequency distribution for grouped data. The IQ scores for a group of 35 high
school dropouts are as follows:
Answer : Calculating the class width (assuming 10 desired classes)
Question : GRE scores for a group of graduate school applicants are distributed as follows:
1) Convert to a relative frequency distribution. When calculating proportions,
round numbers to two digits to the right of the decimal point.
2) Convert to a cumulative frequency distribution.
3) Convert to a cumulative percent frequency distribution.
Answer 1) :
Answer 2) & 3) :
Question : Movie ratings reflect ordinal measurement because they can be ordered from most to least
restrictive: NC-17, R, PG-13, PG, and G. The ratings of some films shown recently in San Francisco are
as follows:
(a) Construct a frequency distribution.
(b) Convert to relative frequencies, expressed as percentages.
(c) Construct a cumulative frequency distribution.
(d) Find the approximate percentile rank for those films with a PG rating.
Answer :
Graphs for Quantitative Data:
● Histograms
○ Equal units along the horizontal axis (the X axis, or abscissa) reflect the various
class intervals of the frequency distribution.
○ Equal units along the vertical axis (the Y axis, or ordinate) reflect increases in
frequency. (The units along the vertical axis do not have to be the same width as
those along the horizontal axis.)
○ The body of the histogram consists of a series of bars whose heights reflect the
frequencies for the various classes.
● For example:
● Frequency Polygon
● An important variation on a histogram is
the frequency polygon, or line graph.
● Frequency polygons may be constructed
directly from frequency distributions.
● For example:
Question : The following frequency distribution shows the annual incomes in dollars for a group of college
graduates.
(a) Construct a histogram.
(b) Construct a frequency polygon.
Answer :
● Stem and Leaf Displays
● Another technique for summarizing quantitative data is a stem and leaf display.
● Stem and Leaf Display is a way for presenting quantitative data in a graphical format,
similar to a histogram, to assist in visualizing the shape of a distribution.
● For example:
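A stem and leaf display can be sketched in a few lines of Python. The scores below are hypothetical two-digit values, and the sketch assumes non-negative integers whose last digit is the leaf and whose remaining digits form the stem.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    display = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(display)


# Hypothetical two-digit scores, used only for illustration.
scores = [62, 75, 68, 81, 79, 75, 90, 84, 66, 73]
display = stem_and_leaf(scores)

for stem in sorted(display):
    leaves = " ".join(str(leaf) for leaf in display[stem])
    print(f"{stem} | {leaves}")
```

Read sideways, the rows of leaves resemble the bars of a histogram, while every original score remains recoverable from its stem and leaf.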
Question : Construct a stem and leaf display for the following IQ scores obtained from a group of
four-year-old children.
Answer :
Graphs for Qualitative Data:
● Bar graph
○ Generally used for qualitative data.
○ Gaps are placed between adjacent bars of bar graphs to emphasize the discontinuous
nature of qualitative data.
○ A bar graph also can be used with quantitative data to emphasize the discontinuous
nature of a discrete variable, such as the number of children in a family.
● For example:
Typical Distribution Curve Shapes:
● Whether expressed as a histogram, a frequency polygon, or a stem and leaf display, an important
characteristic of a frequency distribution is its shape.
● Normal: Any distribution that approximates the symmetric, bell-shaped normal curve
● Bimodal: Any distribution that approximates the bimodal shape, with two distinct peaks
● Positively Skewed Distribution: A distribution that includes a few extreme observations in the
positive direction (to the right of the majority of observations).
● Negatively Skewed Distribution: A distribution that includes a few extreme observations in the
negative direction (to the left of the majority of observations).
Describing Data with Averages:
● Mode:
○ The mode reflects the value of the most frequently occurring score.
○ For example: Four years is the modal term, since the greatest number of presidents, 7,
served this term.
Question : Determine the mode for the following retirement ages: 60, 63, 45, 63, 65,
70, 55, 63, 60, 65, 63.
mode = 63
Question : The owner of a new car conducts six gas mileage tests and obtains the
following results, expressed in miles per gallon: 26.3, 28.7, 27.4, 26.6, 27.4, 26.9.
Find the mode for these data.
mode = 27.4
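Both answers can be checked with Python's standard library. `statistics.multimode` returns every most-frequent value, so it also copes with bimodal data; the variable names are illustrative.

```python
from statistics import multimode

# Data from the two questions above.
retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
gas_mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

# multimode returns a list holding all values that occur most often.
age_modes = multimode(retirement_ages)
mileage_modes = multimode(gas_mileage)

print(age_modes)
print(mileage_modes)
```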
Median:
● The median reflects the middle value when observations are ordered from least to
most.
○ The value of the median always reflects the value of middle-ranked scores, not
the position of these scores among the set of ordered scores.
● When you have an odd number of data points, the median is the value in the middle
of your data set.
● With an even number of data points, there are two values in the middle, so the
median is their mean.
→ Odd-numbered data set:
Step 1: Order your values from low to high.
Step 2: Locate the median
Middle Position = (n+1)/2 = (11+1)/2 = 6
So, Median = 6th element = 72
→ Even-numbered data set:
Step 1: Order your values from low to high.
Step 2: Locate the median.
Middle position = (n+1)/2 = (10+1)/2 = 5.5
So, Median = (5th element + 6th element)/2 = (72+76)/2 = 74
Question : Find the median for the following retirement ages: 60, 63, 45, 63, 65, 70,
55, 63, 60, 65, 63.
median = 63
Question : Find the median for the following gas mileage tests: 26.3, 28.7, 27.4,
26.6, 27.4, 26.9.
median = 27.15 (halfway between 26.9 and 27.4)
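The odd and even cases of the median can be sketched as one small function. `median_of` is a hypothetical helper name; the data come from the two questions above.

```python
def median_of(values):
    """Return the middle value, or the mean of the two middle values."""
    ordered = sorted(values)          # Step 1: order from low to high
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                    # odd n: a single middle value
        return ordered[middle]
    # even n: average the two values on either side of the middle
    return (ordered[middle - 1] + ordered[middle]) / 2


retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
gas_mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

age_median = median_of(retirement_ages)
mileage_median = median_of(gas_mileage)
print(age_median, round(mileage_median, 2))
```

For the 11 retirement ages the middle position is (11+1)/2 = 6, and for the 6 mileage tests it is (6+1)/2 = 3.5, so the 3rd and 4th ordered values are averaged.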
● Mean
○ The mean is found by adding all scores and then dividing by the number of scores.
○ Statisticians distinguish between two types of means—the population mean and the
sample mean—depending on whether the data are viewed as a population (a complete
set of scores) or as a sample (a subset of scores).
○ The mean reflects the values of all scores, not just those that are middle ranked (as with
the median), or those that occur most frequently (as with the mode).
“Sample mean (X-bar) equals the sum of the values
of all scores in the sample (the sum of the variable
X) divided by the sample size n.”
“Population mean (μ) equals the sum of all
scores in the population (sum of the variable
X) divided by the population size N.”
Question : Find the mean for the following retirement ages: 60, 63, 45, 63, 65, 70,
55, 63, 60, 65, 63.
mean = 61.09
Question : Find the mean for the following gas mileage tests: 26.3, 28.7, 27.4, 26.6,
27.4, 26.9.
mean = 27.22
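The same calculation in Python, as a minimal sketch: `mean_of` is a hypothetical helper that adds all scores and divides by the number of scores, exactly as the definition states.

```python
def mean_of(values):
    """Sum of all scores divided by the number of scores."""
    return sum(values) / len(values)


retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
gas_mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

# Sum of ages is 672 over 11 scores; sum of mileages is 163.3 over 6 scores.
age_mean = round(mean_of(retirement_ages), 2)
mileage_mean = round(mean_of(gas_mileage), 2)
print(age_mean, mileage_mean)
```

The arithmetic is the same whether the scores are treated as a sample or as a population; only the symbol (X-bar or μ) differs.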
Interpretation of the differences between Mean and Median
● Differences between the values of the mean and median signal the presence of a skewed
distribution.
● If the mean exceeds the median, as it does for the infant death rates, the underlying
distribution is positively skewed because of one or more scores with relatively large
values, such as the very high infant death rates for a number of countries, especially Sierra
Leone.
● On the other hand, if the median exceeds the mean, the underlying distribution is
negatively skewed because of one or more scores with relatively small values.
● In the given example, the median infant death rate of 7 describes the middle-ranked rate,
while the mean infant death rate of 30.00 describes the balance point for all rates.
*Rates per 1000 live births.
Averages with Qualitative Data:
● The mode can always be used with qualitative data.
● If qualitative data can be ordered from least to most because the level of measurement is
ordinal, the median also can be used.
○ It’s easiest to determine the median class for ordered qualitative data by using relative
frequencies. Cumulate the relative frequencies, working up from the bottom of the
distribution, until the cumulative percentage first equals or exceeds 50 percent.
● In this example, captain is the median rank of officers in the U.S. Army, since its class
includes a cumulative percent of 50.
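The search for the median class can be sketched as a running total that stops at the first class reaching 50 percent. The rating labels and counts below are hypothetical (they are not the Army figures), and `median_class` is an illustrative helper name.

```python
def median_class(ordered_classes, frequencies):
    """Return the first class whose cumulative percentage reaches 50.

    Classes must be listed from least to most (bottom of the distribution first).
    """
    total = sum(frequencies)
    if total == 0:
        raise ValueError("at least one observation is required")
    running = 0
    for cls, f in zip(ordered_classes, frequencies):
        running += f
        if 100 * running / total >= 50:
            return cls


# Hypothetical ordinal ratings, ordered from least to most, with made-up counts.
ratings = ["poor", "fair", "good", "excellent"]
counts = [4, 6, 7, 3]

result = median_class(ratings, counts)
print(result)
```

Here the cumulative percentages are 20, 50, 85, and 100, so the second class is the first to equal or exceed 50 percent.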
Question : College students were surveyed about where they would most like to spend their spring break:
Daytona Beach (DB), Cancun, Mexico (C), South Padre Island (SP), Lake Havasu (LH), or other (O). The
results were as follows:
● Find the mode and, if possible, the median.
Answer :
● mode = DB (Daytona Beach)
● Impossible to find the median when qualitative data are unordered, with only nominal
measurement.
Measures of variability:
● Measures of variability describe the amount by which scores are dispersed or scattered in a distribution.
● They define how far away the data points tend to fall from the center.
● Low variability is ideal because it means that you can better predict information about the
population based on sample data.
● High variability means that the values are less consistent, so it’s harder to make predictions.
● There are several measures of variability, including
○ the range,
○ the interquartile range,
○ the variance, and, most important,
○ the standard deviation.
● For distribution A with the least (zero) variability, all seven scores have the same value (10).
● For distribution B with intermediate variability, the values of scores vary slightly (one 9 and one 11), and
● For distribution C with most variability, they vary even more (one 7, two 9s, two 11s, and one 13).
Range:
● The range is the difference between the largest and smallest scores.
● For example: Suppose we have 8 data points from Sample A.
Data (minutes): 72 110 134 190 238 287 305 324
The highest value (H) is 324 and the lowest (L) is 72.
R = H – L
R = 324 – 72 = 252
The range of your data is 252 minutes.
● Because only 2 numbers are used in finding the range, it is strongly influenced by outliers
and gives no information about the distribution of the values in between. It’s best used in
combination with other measures.
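A short sketch of the range calculation for Sample A, followed by the effect of one extreme value. The added value of 900 minutes is hypothetical and serves only to show how sensitive the range is to outliers.

```python
# Sample A, taken from the example above (minutes).
data_minutes = [72, 110, 134, 190, 238, 287, 305, 324]

# R = H - L
data_range = max(data_minutes) - min(data_minutes)
print(data_range)

# Hypothetical outlier appended to the same data.
with_outlier = data_minutes + [900]
range_with_outlier = max(with_outlier) - min(with_outlier)
print(range_with_outlier)
```

A single extra observation more than triples the range, even though the other eight values are unchanged.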
● Distribution A, the least variable, has the smallest range of 0 (from 10 to 10);
● distribution B, the moderately variable, has an intermediate range of 2 (from 11 to 9);
● distribution C, the most variable, has the largest range of 6 (from 13 to 7).
Interquartile Range (IQR):
● The interquartile range gives the spread of the middle of the distribution.
● The interquartile range is the difference between the third quartile (Q3) and the first quartile (Q1).
● The interquartile range (IQR) is simply the range for the middle 50 percent of the scores.
● The interquartile range is an especially useful measure of variability for skewed distributions.
● The IQR is also useful for datasets with outliers. Because it’s based on the middle half of the
distribution, it’s less influenced by extreme values.
[Figure: interquartile range in a boxplot]
→ Odd-numbered data set:
Step 1: Order your values from low to high.
Step 2: Locate the median
Middle Position = (n+1)/2 = (11+1)/2 = 6
So, Median = 6th element = 72
Step 3: Find Q1 and Q3.
Q1 is the median of the first half, so here Q1 = 57, and
Q3 is the median of the second half, so here Q3 = 81.
Step 4: Calculate the interquartile range.
IQR = Q3 – Q1 = 81 – 57 = 24
→ Even-numbered data set:
Step 1: Order your values from low to high.
Step 2: Locate the median.
Middle position = (n+1)/2 = (10+1)/2 = 5.5
So, Median = (5th element + 6th element)/2 = (72+76)/2 = 74
Step 3: Find Q1 and Q3.
Q1 is the median of the first half, so here Q1 = 57, and
Q3 is the median of the second half, so here Q3 = 81.
Step 4: Calculate the interquartile range.
IQR = Q3 – Q1 = 81 – 57 = 24
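The median-of-halves steps can be sketched as follows. The two data sets are assumptions: they are illustrative values chosen to be consistent with the medians and quartiles quoted above, not the original slide data. `quartiles` is a hypothetical helper, and library routines may use other quartile conventions that give slightly different results.

```python
from statistics import median


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves.

    For an odd number of values the overall median is left out of both halves.
    Assumes at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)


# Assumed data sets (n = 10 and n = 11), consistent with the quoted results.
even_data = [48, 52, 57, 64, 72, 76, 77, 81, 85, 88]
odd_data = [48, 52, 57, 61, 64, 72, 76, 77, 81, 85, 88]

for data in (even_data, odd_data):
    q1, q3 = quartiles(data)
    print(f"n={len(data)} median={median(data)} Q1={q1} Q3={q3} IQR={q3 - q1}")

q1_even, q3_even = quartiles(even_data)
q1_odd, q3_odd = quartiles(odd_data)
```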
Outliers:
● Outliers are one or more very extreme scores in a dataset.
● An outlier is a data point that lies abnormally far away from other values in a dataset.
● For example:
○ Someone like Elon Musk, who has a net worth in the billions of dollars, would be
considered an outlier in terms of annual income.
○ Any freedivers who can hold their breath for 10 minutes or longer would be
considered outliers because they can hold their breath much longer than 165
seconds.
Formula to find outliers:
[Q1 – 1.5 * IQR, Q3 + 1.5 * IQR]
A value that does not fall within the above range is considered an outlier.
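The fence rule can be sketched as below. The quartiles 57 and 81 are the ones quoted in the interquartile range example; the data list (with its extreme value of 150) and the helper names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the interval [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Values that fall outside the fences are flagged as outliers."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Hypothetical data whose quartiles are Q1 = 57 and Q3 = 81.
data = [48, 52, 57, 64, 72, 76, 77, 81, 85, 150]

fences = outlier_fences(57, 81)
outliers = find_outliers(data, 57, 81)
print(fences, outliers)
```

With IQR = 24, the fences sit 36 units below Q1 and 36 units above Q3, so only the value 150 is flagged.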
Variance:
● The variance is the average of squared deviations from the mean. A deviation
from the mean is how far a score lies from the mean, so the variance measures how far
the numbers in the dataset are spread out from the mean.
● Variance is the square of the standard deviation.
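As a rough illustration, the definition can be applied to the three seven-score distributions A, B, and C described earlier. The scores not listed explicitly in that description are assumed to equal 10, and `population_variance` is a hypothetical helper that treats each list as a complete population.

```python
def population_variance(values):
    """Average of the squared deviations from the mean."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


# Distributions A, B, and C; unlisted scores are assumed to be 10.
dist_a = [10, 10, 10, 10, 10, 10, 10]
dist_b = [9, 10, 10, 10, 10, 10, 11]
dist_c = [7, 9, 9, 10, 11, 11, 13]

var_a = population_variance(dist_a)
var_b = population_variance(dist_b)
var_c = population_variance(dist_c)
print(var_a, round(var_b, 2), round(var_c, 2))
```

The variances rise from 0 to 2/7 to 22/7, matching the ordering from least to most variability.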
Standard Deviation:
● Standard deviation is the square root of the variance.
● A low standard deviation indicates that the data points lie close to the mean.
Example: You grow 20 crystals from a solution and measure the length of each crystal in millimeters. Here is your
data: 9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4. Calculate the sample standard deviation of the length of
the crystals.
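The crystal example can be worked step by step in Python. This sketch assumes the usual sample formula, which divides the sum of squared deviations by n – 1, and cross-checks the result against `statistics.stdev` from the standard library.

```python
import math
import statistics

# Crystal lengths in millimeters, taken from the example above.
lengths_mm = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3,
              7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

n = len(lengths_mm)
mean = sum(lengths_mm) / n                                 # 140 / 20
sum_squares = sum((x - mean) ** 2 for x in lengths_mm)     # squared deviations
sample_variance = sum_squares / (n - 1)                    # divide by n - 1
sample_sd = math.sqrt(sample_variance)

print(f"mean = {mean}, sum of squares = {sum_squares}")
print(f"sample variance = {sample_variance:.3f}, sample SD = {sample_sd:.3f}")
```

The mean length is 7 mm, the squared deviations sum to 178, and the sample standard deviation comes to roughly 3.06 mm.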
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...ssuserf63bd7
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证pwgnohujw
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024patrickdtherriault
 

Recently uploaded (20)

Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
Data Visualization Exploring and Explaining with Data 1st Edition by Camm sol...
 
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
NO1 Best Kala Jadu Expert Specialist In Germany Kala Jadu Expert Specialist I...
 
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(WashU毕业证书)圣路易斯华盛顿大学毕业证成绩单本科硕士学位证留信学历认证
 
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...Genuine love spell caster )! ,+27834335081)   Ex lover back permanently in At...
Genuine love spell caster )! ,+27834335081) Ex lover back permanently in At...
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
Data Analysis Project Presentation : NYC Shooting Cluster Analysis
Data Analysis Project Presentation : NYC Shooting Cluster AnalysisData Analysis Project Presentation : NYC Shooting Cluster Analysis
Data Analysis Project Presentation : NYC Shooting Cluster Analysis
 
Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"Aggregations - The Elasticsearch "GROUP BY"
Aggregations - The Elasticsearch "GROUP BY"
 
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
如何办理英国卡迪夫大学毕业证(Cardiff毕业证书)成绩单留信学历认证
 
What is Insertion Sort. Its basic information
What is Insertion Sort. Its basic informationWhat is Insertion Sort. Its basic information
What is Insertion Sort. Its basic information
 
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae CoolbethDigital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
Digital Marketing Demystified: Expert Tips from Samantha Rae Coolbeth
 
Audience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptxAudience Researchndfhcvnfgvgbhujhgfv.pptx
Audience Researchndfhcvnfgvgbhujhgfv.pptx
 
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarjSCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
SCI8-Q4-MOD11.pdfwrwujrrjfaajerjrajrrarj
 
edited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdfedited gordis ebook sixth edition david d.pdf
edited gordis ebook sixth edition david d.pdf
 
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
1:1原版定制利物浦大学毕业证(Liverpool毕业证)成绩单学位证书留信学历认证
 
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UPenn毕业证书)宾夕法尼亚大学毕业证成绩单本科硕士学位证留信学历认证
 
Formulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdfFormulas dax para power bI de microsoft.pdf
Formulas dax para power bI de microsoft.pdf
 
How to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data AnalyticsHow to Transform Clinical Trial Management with Advanced Data Analytics
How to Transform Clinical Trial Management with Advanced Data Analytics
 
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
Statistics Informed Decisions Using Data 5th edition by Michael Sullivan solu...
 
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
原件一样(UWO毕业证书)西安大略大学毕业证成绩单留信学历认证
 
Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024Northern New England Tableau User Group (TUG) May 2024
Northern New England Tableau User Group (TUG) May 2024
 

Unit - II FDS.pdf

  • 1. Department of Computer Science Engineering CS3352-Foundations of Data Science Unit - II: Describing Data Data: ● A collection of actual observations or scores in a survey or an experiment Types of Data and Variable: ● Data can be descriptive (characteristics) or numerical (numbers). Let us take a look at some of the most prevalent types of data. ● There are two types of data: 1. Qualitative Data 2. Quantitative Data
  • 3. (A) Qualitative Data: ● Qualitative data are not numerical, so they cannot be measured. They are also called Categorical Data because the data can be sorted by category rather than by number. ● Qualitative data deal with characteristics and descriptions that are difficult to measure but may be subjectively observed, e.g., smells, tastes, textures, attractiveness, color, etc. They may include favourite foods, favourite holiday destinations, religions, pictures, symbols, colours, and so on. ● These data are described by some characteristic, for example, gender, blood group, etc. Such data can provide answers to questions such as: “How did it occur?” or “Why did this occur?” ● In general, qualitative data can be divided into two types: 1. Nominal data 2. Ordinal data
  • 4. 1. Nominal Data: ● This type of data is used for naming variables and has no numerical value. ● Nominal data is a collection of non-numeric values that do not have a natural order. ● For example, it is not possible to state that 'Green' is greater than 'Blue'; we cannot rank one color against another, so the color of a thing is a nominal data type. ● Examples of Nominal Data: ○ Colors: (Brown, Red, etc.) ○ Taste: (Sour, Sweet, Salty, etc.) ○ Languages: (Hindi, English, Marathi, Gujarati, Tamil, Telugu, etc.)
  • 5. 2. Ordinal Data: ● Ordinal data is defined as qualitative data whose values are ordered. ● In this type of data, the class values have a natural ordering. In other words, ordinal data is data that is sorted by its position on a scale. Ordinal values cannot be used for arithmetic because they only indicate sequence. ● For example, we can easily sort clothing sizes according to their name tags in the order small < medium < large. ● Examples of Ordinal Data: ○ Economic status: (low, medium, high) ○ Letter grades: (A, B, C, D, E, etc.) ○ Rank in a competition: (First, Second, Third)
  • 6. (B) Quantitative Data: ● Quantitative data are data represented by numbers. ● Quantitative data consist of things that can be measured objectively, e.g., area, volume, height, width, length, weight, speed, humidity, temperature, prices, year, etc. ● Quantitative data is always represented by numbers that indicate either how much or how many. ● In general, quantitative data can be divided into two types: 1. Discrete data 2. Continuous data
  • 7. 1. Discrete Data: ● Discrete data is counted, and it can only take certain values. ● Discrete data consists of finite, countable numeric values, generally non-negative integers. ● The number of pupils, the number of children, the shoe size, and so on are all examples of discrete data. ● Examples of Discrete Data: ○ When we roll one die, we obtain 1, 2, 3, 4, 5, or 6 as discrete data. ○ The total number of students enrolled in a class is discrete data. ○ The number of children in your household is discrete data.
  • 8. 2. Continuous Data: ● Continuous data is measured, and its value can be anything within a range. ● Continuous data is a set of numbers that can have any decimal or fractional value. Height, weight, length, time, and temperature are all instances of continuous data. ● For example, the height of a person may be 5.78 feet. We can measure someone's height in meters, centimetres, millimetres, and so on, so height is continuous data. ● Examples of continuous data: ○ Newborn babies' body weight ○ A freezer temperature ○ Wind speed ● Continuous data can be further classified as measured on an interval scale or a ratio scale.
  • 9. (i) Interval Scale: ● An interval scale is a scale whose values do not have a natural zero. ● An interval scale has order, and the difference between two values is meaningful. However, ratios of these values are not meaningful: for the temperature of a room in Celsius, 20 °C is not twice as warm as 10 °C. ● Temperature (in Celsius or Fahrenheit), pH, and credit score are examples of interval variables.
  • 10. (ii) Ratio Scale: ● A ratio scale is a set of values that have a natural zero. ● Something measured on a ratio scale has the same properties as something measured on an interval scale, except that a ratio scale has an absolute zero point. In other words, a ratio variable has all of the attributes of an interval variable, plus a clear definition of 0.0: when the variable equals 0.0, none of the measured quantity is present. ● An example is temperature measured in kelvin. No value below 0 K is possible; it is absolute zero. ● Another example is weight; 0 kg indicates a complete absence of weight.
  • 11. Question : Indicate whether each of the following terms is qualitative (because it’s a word, letter, or numerical code representing a class or category); or quantitative (because it’s a number representing an amount or a count). 1) age 2) family size 3) academic major 4) IQ score 5) net worth (dollars) 6) third-place finish 7) gender 8) temperature
  • 12. Question : Indicate whether each of the following terms is qualitative (because it’s a word, letter, or numerical code representing a class or category); or quantitative (because it’s a number representing an amount or a count). Answer : 1) age (quantitative) (Discrete if measured in whole numbers of years, minutes, or seconds; Continuous/Ratio if measured as the exact amount of time passed since the start of something.) 2) family size (quantitative/Discrete) 3) academic major (qualitative/Nominal) 4) IQ score (quantitative/Continuous/Interval) 5) net worth (dollars) (quantitative/Continuous/Interval) 6) third-place finish (qualitative/Ordinal) 7) gender (qualitative/Nominal) 8) temperature (quantitative/Continuous) (Temperature in Celsius or Fahrenheit is on an interval scale because zero is not the lowest possible temperature. The Kelvin scale is a ratio scale, in which zero represents a total lack of thermal energy.)
  • 13. Frequency Distributions for Quantitative Data: ● A frequency distribution is a collection of observations produced by sorting observations into classes and showing their frequency (f) of occurrence in each class. ● A frequency distribution is used to organize the collected data in table form. ● It is a way to summarize the data, and it allows quick visual interpretation of the data. ● For Example: The following are the scores of 10 students in the G.K. quiz released by Mr. Chris: 15, 17, 20, 15, 20, 17, 17, 14, 14, 20. Let's represent this data in a frequency distribution and find out the number of students who got the same marks. ● The frequency distribution makes the given information easy to understand: from it we can see the number of students who obtained the same marks.
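As a minimal sketch of the tally described on this slide (Python is an assumption here, since the slides do not prescribe a language, and the variable names are illustrative), the ten quiz scores can be counted with `collections.Counter`:

```python
from collections import Counter

# Quiz scores of the 10 students, as listed on the slide
scores = [15, 17, 20, 15, 20, 17, 17, 14, 14, 20]

# Count how often each score occurs, then order the scores from low to high
frequency = dict(sorted(Counter(scores).items()))

for score, f in frequency.items():
    print(f"score {score}: f = {f}")
```

Each printed row corresponds to one row of the frequency table: a score and the number of students who obtained it.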
  • 14. Types of Frequency Distributions: 1) Grouped Frequency Distribution: ● To arrange a large number of observations, we use a grouped frequency distribution table. In it, we form class intervals and tally the frequency of the data that belong to each class interval. ● For Example: Marks obtained by 20 students in the test are as follows: 5, 10, 20, 15, 5, 20, 20, 15, 15, 15, 10, 10, 10, 20, 15, 5, 18, 18, 18, 18. To arrange the data in a grouped table, we have to make class intervals.
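The grouping step can be sketched as follows. The class intervals used here (1-5, 6-10, 11-15, 16-20) are an assumption made for illustration, because the table from the original slide is not available in this transcript; any other non-overlapping intervals would work the same way.

```python
# Marks of the 20 students, as listed on the slide
marks = [5, 10, 20, 15, 5, 20, 20, 15, 15, 15,
         10, 10, 10, 20, 15, 5, 18, 18, 18, 18]

# Assumed class intervals with inclusive bounds (not taken from the slide)
class_intervals = [(1, 5), (6, 10), (11, 15), (16, 20)]

# Tally how many marks fall inside each class interval
grouped = {
    f"{low}-{high}": sum(low <= m <= high for m in marks)
    for low, high in class_intervals
}

for interval, f in grouped.items():
    print(f"{interval}: f = {f}")
```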
  • 15. 2) Ungrouped Frequency Distribution: ● In the ungrouped frequency distribution, we don't make class intervals; we write the exact frequency of each individual value. ● For Example: Marks obtained by 20 students in the test are as follows: 5, 10, 20, 15, 5, 20, 20, 15, 15, 15, 10, 10, 10, 20, 15, 5, 18, 18, 18, 18. To arrange the data in an ungrouped frequency distribution table, we have to write the frequency of each individual value.
  • 16. 3) Relative Frequency Distribution: ● Relative frequency distributions show the frequency of each class as a part or fraction of the total frequency for the entire distribution. ● To convert a frequency distribution into a relative frequency distribution, divide the frequency for each class by the total frequency for the entire distribution. ● For instance, to obtain the proportion of .06 for the class 130–139, divide the frequency of 3 for that class by the total frequency of 53. ● Repeat this process until a proportion has been calculated for each class.
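A short sketch of the conversion, assuming Python and reusing the 20 test marks from the grouped/ungrouped slides as input (the function name `relative_frequencies` is illustrative). The last line reproduces the slide's single-class calculation of 3 out of 53.

```python
from collections import Counter

def relative_frequencies(values):
    """Return {value: frequency / total}, rounded to two decimal places."""
    counts = Counter(values)
    total = len(values)
    return {v: round(f / total, 2) for v, f in sorted(counts.items())}

# Marks of the 20 students from the earlier slides
marks = [5, 10, 20, 15, 5, 20, 20, 15, 15, 15,
         10, 10, 10, 20, 15, 5, 18, 18, 18, 18]

rel = relative_frequencies(marks)
print(rel)

# The slide's example: class 130-139 has frequency 3 out of a total of 53
print(round(3 / 53, 2))  # 0.06
```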
  • 17. 4) Cumulative Frequency Distributions: ● Cumulative frequency distributions show the total number of observations in each class and in all lower-ranked classes. ● To convert a frequency distribution into a cumulative frequency distribution, add the frequency of each class to the sum of the frequencies of all classes ranked below it. This gives the cumulative frequency for that class. Begin with the lowest-ranked class in the frequency distribution and work upward, finding the cumulative frequencies in ascending order. ● Cumulative percentages are often referred to as percentile ranks. ● The percentile rank of a score indicates the percentage of scores in the entire distribution with equal or smaller values than that score.
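The running-total procedure can be sketched in Python as below. The input is again the 20 test marks from the earlier slides, used here only as convenient example data; the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

# Marks of the 20 students from the earlier slides
marks = [5, 10, 20, 15, 5, 20, 20, 15, 15, 15,
         10, 10, 10, 20, 15, 5, 18, 18, 18, 18]

# Frequencies ordered from the lowest-ranked value upward
counts = sorted(Counter(marks).items())
values = [v for v, _ in counts]

# Cumulative frequency: each class plus all classes ranked below it
cumulative = list(accumulate(f for _, f in counts))

# Cumulative percentage (percentile rank) of each value
cumulative_percent = [100 * c / len(marks) for c in cumulative]

for v, c, p in zip(values, cumulative, cumulative_percent):
    print(f"mark {v}: cumulative f = {c}, cumulative % = {p}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the arithmetic.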
  • 18. Frequency Distributions for Qualitative Data: Nominal Qualitative Data ● Frequency distributions for qualitative data are easy to construct. Simply determine the frequency with which observations occupy each class. ● For example: ○ In this Facebook profile survey, the frequency distribution reveals that Yes replies are approximately twice as prevalent as No replies.
  • 19. Ordinal Qualitative Data ● When qualitative data have an ordinal level of measurement because observations can be ordered from least to most, that order should be preserved in the frequency table. ● For example: ○ Here, military ranks are listed in descending order from general to lieutenant. ○ If measurement is ordinal because observations can be ordered from least to most, cumulative frequencies (and cumulative percentages) can be used.
  • 20. Question : Construct a frequency distribution for ungrouped data. Students in a theater arts appreciation class rated a classic film on a 10-point scale, ranging from 1 (poor) to 10 (excellent), as follows: Answer :
  • 21. Question : Construct a frequency distribution for grouped data. The IQ scores for a group of 35 high school dropouts are as follows: Answer : Calculating the class width (assuming 10 desired classes)
  • 22. Question : GRE scores for a group of graduate school applicants are distributed as follows: 1) Convert to a relative frequency distribution. When calculating proportions, round numbers to two digits to the right of the decimal point. 2) Convert to a cumulative frequency distribution. 3) Convert to a cumulative percent frequency distribution. Answer 1) :
  • 24. Question : Movie ratings reflect ordinal measurement because they can be ordered from most to least restrictive: NC-17, R, PG-13, PG, and G. The ratings of some films shown recently in San Francisco are as follows: (a) Construct a frequency distribution. (b) Convert to relative frequencies, expressed as percentages. (c) Construct a cumulative frequency distribution. (d) Find the approximate percentile rank for those films with a PG rating. Answer :
  • 25. Graphs for Quantitative Data: ● Histograms ○ Equal units along the horizontal axis (the X axis, or abscissa) reflect the various class intervals of the frequency distribution. ○ Equal units along the vertical axis (the Y axis, or ordinate) reflect increases in frequency. (The units along the vertical axis do not have to be the same width as those along the horizontal axis.) ○ The body of the histogram consists of a series of bars whose heights reflect the frequencies for the various classes.
  • 27. ● Frequency Polygon ● An important variation on a histogram is the frequency polygon, or line graph. ● Frequency polygons may be constructed directly from frequency distributions. ● For example:
  • 28. Question : The following frequency distribution shows the annual incomes in dollars for a group of college graduates. (a) Construct a histogram. (b) Construct a frequency polygon. Answer :
  • 29. ● Stem and Leaf Displays ● Another technique for summarizing quantitative data is a stem and leaf display. ● A stem and leaf display is a way of presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution. ● For example:
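Because the slide's own example figure is not available in this transcript, here is a hedged sketch of how a stem and leaf display can be built. It assumes Python and two-digit integer data, and uses the retirement ages from the mode/median/mean questions later in these slides; `stem_and_leaf` is an illustrative helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit integers by tens digit (stem); units digits are the leaves."""
    display = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        display[stem].append(leaf)
    return dict(display)

# Retirement ages used in the averages questions of these slides
ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]

display = stem_and_leaf(ages)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Reading the rows like a sideways histogram, the long row for stem 6 shows that most retirement ages fall in the sixties.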
  • 30. Question : Construct a stem and leaf display for the following IQ scores obtained from a group of four-year-old children. Answer :
  • 31. Graphs for Qualitative Data: ● Bar graph ○ Generally used for qualitative data. ○ Gaps are placed between adjacent bars of bar graphs to emphasize the discontinuous nature of qualitative data. ○ A bar graph also can be used with quantitative data to emphasize the discontinuous nature of a discrete variable, such as the number of children in a family. ● For example:
  • 32. Typical Distribution Curve Shapes: ● Whether expressed as a histogram, a frequency polygon, or a stem and leaf display, an important characteristic of a frequency distribution is its shape.
  • 33. ● Normal: Any distribution that approximates the normal shape ● Bimodal: Any distribution that approximates the bimodal shape ● Positively Skewed Distribution: A distribution that includes a few extreme observations in the positive direction (to the right of the majority of observations). ● Negatively Skewed Distribution: A distribution that includes a few extreme observations in the negative direction (to the left of the majority of observations).
  • 34. Describing Data with Averages: ● Mode: ○ The mode reflects the value of the most frequently occurring score. ○ For example: Four years is the modal term, since the greatest number of presidents, 7, served this term.
  • 35. Question : Determine the mode for the following retirement ages: 60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63. mode = 63 Question : The owner of a new car conducts six gas mileage tests and obtains the following results, expressed in miles per gallon: 26.3, 28.7, 27.4, 26.6, 27.4, 26.9. Find the mode for these data. mode = 27.4
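The two answers above can be checked with a small sketch, assuming Python and its standard `statistics` module (the variable names are illustrative):

```python
from statistics import mode, multimode

# Data from the two questions on the slide
retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

print(mode(retirement_ages))  # 63
print(mode(mileage))          # 27.4

# multimode returns every most-frequent value, which matters for bimodal data
print(multimode(retirement_ages))  # [63]
```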
  • 36. Median: ● The median reflects the middle value when observations are ordered from least to most. ○ The value of the median always reflects the value of middle-ranked scores, not the position of these scores among the set of ordered scores. ● When you have an odd number of data points, the median is the value in the middle of your data set. ● With an even number of data points, there are two values in the middle, so the median is their mean.
  • 37. → Odd-numbered data set: Step 1: Order your values from low to high. Step 2: Locate the median. Middle Position = (n+1)/2 = (11+1)/2 = 6 So, Median = 6th element = 72
  • 38. → Even-numbered data set: Step 1: Order your values from low to high. Step 2: Locate the median. Middle position = (n+1)/2 = (10+1)/2 = 5.5 So, Median = (5th element + 6th element)/2 = (72+76)/2 = 74
  • 39. Question : Find the median for the following retirement ages: 60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63. median = 63 Question : Find the median for the following gas mileage tests: 26.3, 28.7, 27.4, 26.6, 27.4, 26.9. median = 27.15 (halfway between 26.9 and 27.4)
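The odd/even procedure from the preceding slides can be sketched as a small Python function (the language and the function name `median` are assumptions for illustration; the standard library's `statistics.median` does the same job):

```python
def median(values):
    """Middle value of the ordered data; mean of the two middle values if n is even."""
    ordered = sorted(values)          # Step 1: order from low to high
    n = len(ordered)
    mid = n // 2                      # Step 2: locate the middle position
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the two middle values

retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

print(median(retirement_ages))        # 63
print(round(median(mileage), 2))      # 27.15
```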
  • 40. ● Mean ○ The mean is found by adding all scores and then dividing by the number of scores. ○ Statisticians distinguish between two types of means (the population mean and the sample mean), depending on whether the data are viewed as a population (a complete set of scores) or as a sample (a subset of scores). ○ The mean reflects the values of all scores, not just those that are middle ranked (as with the median), or those that occur most frequently (as with the mode).
  • 41. “Sample mean (X-bar) equals the sum of the values of all scores in the sample (the sum of the variable X) divided by the sample size n.” “Population mean (μ) equals the sum of all scores in the population (sum of the variable X) divided by the population size N.” Question : Find the mean for the following retirement ages: 60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63. mean = 61.09 Question : Find the mean for the following gas mileage tests: 26.3, 28.7, 27.4, 26.6, 27.4, 26.9. mean = 27.22
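A minimal sketch of the verbal definition above, assuming Python (the function name `mean` is illustrative). The same arithmetic applies to the sample mean and the population mean; only the interpretation of the data differs.

```python
def mean(scores):
    """Sum of all scores divided by the number of scores."""
    return sum(scores) / len(scores)

retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

print(round(mean(retirement_ages), 2))  # 61.09  (672 / 11)
print(round(mean(mileage), 2))          # 27.22  (163.3 / 6)
```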
  • 42. Interpretation of the differences between Mean and Median ● Differences between the values of the mean and median signal the presence of a skewed distribution. ● If the mean exceeds the median, as it does for the infant death rates, the underlying distribution is positively skewed because of one or more scores with relatively large values, such as the very high infant death rates for a number of countries, especially Sierra Leone. ● On the other hand, if the median exceeds the mean, the underlying distribution is negatively skewed because of one or more scores with relatively small values. ● In the given example, the median infant death rate of 7 describes the middle-ranked rate, while the mean infant death rate of 30.00 describes the balance point for all rates. *Rates per 1000 live births.
  • 43. Averages with Qualitative Data: ● The mode can always be used with qualitative data. ● If qualitative data can be ordered from least to most because the level of measurement is ordinal, the median also can be used. ○ It’s easiest to determine the median class for ordered qualitative data by using relative frequencies. Cumulate the relative frequencies, working up from the bottom of the distribution, until the cumulative percentage first equals or exceeds 50 percent. ● In this example, since it includes a cumulative percent of 50, captain is the median rank of officers in the U.S. Army.
  • 44. Question : College students were surveyed about where they would most like to spend their spring break: Daytona Beach (DB), Cancun, Mexico (C), South Padre Island (SP), Lake Havasu (LH), or other (O). The results were as follows: ● Find the mode and, if possible, the median. Answer : ● mode = DB (Daytona Beach) ● Impossible to find the median when qualitative data are unordered, with only nominal measurement.
  • 45. Measures of variability: ● Measures of the amount by which scores are dispersed or scattered in a distribution. ● Measures of variability describe how far away the data points tend to fall from the center. ● Low variability is ideal because it means that you can better predict information about the population based on sample data. ● High variability means that the values are less consistent, so it’s harder to make predictions. ● There are several measures of variability, including ○ the range, ○ the interquartile range, ○ the variance, and, most important, ○ the standard deviation
  • 46. ● For distribution A with the least (zero) variability, all seven scores have the same value (10). ● For distribution B with intermediate variability, the values of scores vary slightly (one 9 and one 11), and ● for distribution C with the most variability, they vary even more (one 7, two 9s, two 11s, and one 13).
  • 47. Range: ● The range is the difference between the largest and smallest scores. ● For Example: Suppose we have 8 data points from Sample A. Data (minutes): 72, 110, 134, 190, 238, 287, 305, 324. The highest value (H) is 324 and the lowest (L) is 72. R = H – L R = 324 – 72 = 252 The range of your data is 252 minutes. ● Because only 2 numbers are used in finding the range, it is influenced by outliers and doesn’t give you any information about the distribution of values. It’s best used in combination with other measures.
  • 48. ● Distribution A, the least variable, has the smallest range of 0 (from 10 to 10); ● distribution B, the moderately variable, has an intermediate range of 2 (from 11 to 9); ● distribution C, the most variable, has the largest range of 6 (from 13 to 7).
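The range calculations can be sketched in Python as follows. Sample A is taken directly from the slide. Distributions A, B, and C are reconstructed from their verbal descriptions (seven scores each), with the scores not mentioned explicitly assumed to equal 10.

```python
def value_range(scores):
    """Range R = H - L: largest score minus smallest score."""
    return max(scores) - min(scores)

# Sample A from the slide (minutes)
sample_a = [72, 110, 134, 190, 238, 287, 305, 324]

# Reconstructed from the slide text; unspecified scores are assumed to be 10
dist_a = [10, 10, 10, 10, 10, 10, 10]
dist_b = [9, 10, 10, 10, 10, 10, 11]
dist_c = [7, 9, 9, 10, 11, 11, 13]

print(value_range(sample_a))  # 252
print(value_range(dist_a), value_range(dist_b), value_range(dist_c))  # 0 2 6
```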
  • 49. Interquartile Range (IQR): ● The interquartile range gives the spread of the middle of the distribution. ● The interquartile range is the difference between the third quartile (Q3) and the first quartile (Q1). ● The interquartile range (IQR) is simply the range for the middle 50 percent of the scores. ● The interquartile range is an especially useful measure of variability for skewed distributions. ● The IQR is also useful for datasets with outliers. Because it’s based on the middle half of the distribution, it’s less influenced by extreme values. Figure: interquartile range in a boxplot
  • 50. → Odd-numbered data set: Step 1: Order your values from low to high. Step 2: Locate the median. Middle Position = (n+1)/2 = (11+1)/2 = 6 So, Median = 6th element = 72
  • 51. Step 3: Find Q1 and Q3. Q1 is the median of the first half, so here 57, and Q3 is the median of the second half, so here 81. Step 4: Calculate the interquartile range. IQR = Q3 – Q1 = 81 – 57 = 24
  • 52. → Even-numbered data set: Step 1: Order your values from low to high. Step 2: Locate the median. Middle position = (n+1)/2 = (10+1)/2 = 5.5 So, Median = (5th element + 6th element)/2 = (72+76)/2 = 74
  • 53. Step 3: Find Q1 and Q3. Q1 is the median of the first half, so here 57, and Q3 is the median of the second half, so here 81. Step 4: Calculate the interquartile range. IQR = Q3 – Q1 = 81 – 57 = 24
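The four steps can be sketched in Python as below. The data sets on the original slides are not available in this transcript, so `odd_set` and `even_set` are hypothetical values chosen only to be consistent with the quoted results (medians 72 and 74, Q1 = 57, Q3 = 81). The sketch uses the "median of each half" method described in the steps; other quartile conventions (for example, the interpolation methods used by some libraries) can give slightly different values.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)                  # Step 1: order low to high
    n = len(ordered)
    half = n // 2                             # Step 2: locate the middle
    lower = ordered[:half]                    # first half (median excluded if n is odd)
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(upper)       # Step 3: Q1 and Q3

def iqr(values):
    q1, q3 = quartiles(values)
    return q3 - q1                            # Step 4: IQR = Q3 - Q1

# Hypothetical data, consistent with the values quoted on the slides
odd_set = [48, 52, 57, 61, 64, 72, 76, 77, 81, 85, 88]
even_set = [48, 52, 57, 64, 72, 76, 77, 81, 85, 88]

print(quartiles(odd_set), iqr(odd_set))
print(quartiles(even_set), iqr(even_set))
```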
  • 54. Outliers: ● Very extreme scores in a dataset are called outliers. ● An outlier is a data point that lies abnormally far away from other values in a dataset. ● For Example: ○ Someone like Elon Musk who has a net worth in the billions of dollars would be considered an outlier in terms of annual income. ○ Any freedivers who can hold their breath for 10 minutes or longer would be considered outliers because they can hold their breath much longer than 165 seconds. Formula to find outliers: [Q1 – 1.5 * IQR, Q3 + 1.5 * IQR]. If a value does not fall within this range, it is considered an outlier.
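The fence rule can be sketched in Python as follows. The quartiles Q1 = 57 and Q3 = 81 are the values quoted in the IQR slides; the `sample` list is hypothetical, with an extreme value of 150 added purely to illustrate the rule. The function names are illustrative.

```python
def outlier_fences(q1, q3):
    """Return the interval [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR]."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values that fall outside the fences are flagged as outliers."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Quartiles quoted on the IQR slides
low, high = outlier_fences(57, 81)
print(low, high)  # 21.0 117.0

# Hypothetical data; 150 is an extreme value added for illustration
sample = [48, 52, 57, 64, 72, 76, 77, 81, 85, 88, 150]
print(find_outliers(sample, 57, 81))  # [150]
```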
  • 55. Variance: ● The variance is the average of squared deviations from the mean. A deviation from the mean is how far a score lies from the mean. Variance measures how far each number in the dataset is from the mean. ● Variance is the square of the standard deviation.
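A minimal sketch of the definition, assuming Python. The input is distribution C from the variability slides, reconstructed from its verbal description (one 7, two 9s, two 11s, one 13, with the remaining score assumed to be 10). The population form divides by N, as in "average of squared deviations"; the sample form, which divides by n - 1, is the usual convention for sample data and is included for comparison.

```python
def population_variance(scores):
    """Average of squared deviations from the mean (divide by N)."""
    mu = sum(scores) / len(scores)
    return sum((x - mu) ** 2 for x in scores) / len(scores)

def sample_variance(scores):
    """Sum of squared deviations divided by n - 1 (usual convention for samples)."""
    x_bar = sum(scores) / len(scores)
    return sum((x - x_bar) ** 2 for x in scores) / (len(scores) - 1)

# Distribution C, reconstructed from the slide text (seventh score assumed to be 10)
dist_c = [7, 9, 9, 10, 11, 11, 13]

# Mean is 10; squared deviations are 9, 1, 1, 0, 1, 1, 9, which sum to 22
print(population_variance(dist_c))  # 22 / 7
print(sample_variance(dist_c))      # 22 / 6
```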
  • 56. ● For Example:
  • 57. Standard Deviation: ● The standard deviation is the square root of the variance. ● A low standard deviation indicates that the data points lie close to the mean.
  • 58. Example: You grow 20 crystals from a solution and measure the length of each crystal in millimeters. Here is your data: 9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4. Calculate the sample standard deviation of the length of the crystals.
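One way to work through this example is sketched below, assuming Python and the usual sample formula (sum of squared deviations divided by n - 1, then the square root). The variable names are illustrative.

```python
from math import sqrt

# Crystal lengths in millimeters, as listed on the slide
lengths = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3,
           7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

n = len(lengths)
x_bar = sum(lengths) / n                                   # sample mean: 140 / 20 = 7.0
sum_sq_dev = sum((x - x_bar) ** 2 for x in lengths)        # sum of squared deviations: 178.0
sample_var = sum_sq_dev / (n - 1)                          # sample variance: 178 / 19
sample_sd = sqrt(sample_var)                               # sample standard deviation

print(x_bar, sum_sq_dev)
print(round(sample_var, 2), round(sample_sd, 2))  # 9.37 3.06
```

So, under these assumptions, the sample standard deviation of the crystal lengths is about 3.06 mm.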