Working with Categorical Data 
Slides edited by Valerio Di Fonzo for www.globalpolis.org 
Based on the work of Mine Çetinkaya-Rundel of OpenIntro 
The slides may be copied, edited, and/or shared via the CC BY-SA license 
Some images may be included under fair use guidelines (educational purposes)
Contingency Tables 
A table that summarizes data for two categorical variables is called a 
contingency table. 
The contingency table below shows the distribution of students' genders 
and whether or not they are looking for a spouse while in college.
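The table itself is not reproduced here, so the following is only a minimal sketch of how such a table could be tallied from raw records. The counts of students looking for a spouse (51 of 137 females, 18 of 70 males) come from the row proportions quoted later in this deck; the "no" counts assume that "yes" and "no" were the only two answers, and the function name `contingency_table` is made up for illustration.

```python
from collections import Counter

# One (gender, answer) pair per student. The "yes" counts and group totals
# come from the deck; the "no" counts assume only two possible answers.
records = (
    [("female", "yes")] * 51 + [("female", "no")] * 86
    + [("male", "yes")] * 18 + [("male", "no")] * 52
)

def contingency_table(pairs):
    """Count each (row, column) combination and add row and column totals."""
    cells = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = {r: {c: cells[(r, c)] for c in cols} for r in rows}
    for r in rows:
        table[r]["total"] = sum(table[r][c] for c in cols)
    # Column totals, including the grand total in the bottom-right cell.
    table["total"] = {c: sum(table[r][c] for r in rows) for c in cols + ["total"]}
    return table

table = contingency_table(records)
for name, row in table.items():
    print(name, row)
```

Each cell holds the number of students with that combination of the two variables, and the margins hold the totals for each variable on its own.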
Bar Plots 
A bar plot is a common way to display a single categorical variable. A 
bar plot where proportions instead of frequencies are shown is called a 
relative frequency bar plot. 
How are bar plots different from histograms? 
Bar plots are used for displaying distributions of categorical variables, while 
histograms are used for numerical variables. The x-axis in a histogram is a 
number line, hence the order of the bars cannot be changed, while in a bar 
plot the categories can be listed in any order (though some orderings make 
more sense than others, especially for ordinal variables).
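As a rough illustration of the numbers behind a relative frequency bar plot, the sketch below converts counts for a single categorical variable into proportions and prints a crude text bar for each category. The data are an assumption: the 51 + 18 = 69 students looking for a spouse out of 137 + 70 = 207 students in the example used elsewhere in this deck, treating "yes" and "no" as the only answers. The helper name `relative_frequencies` is hypothetical.

```python
from collections import Counter

# Assumed single-variable data: 69 "yes" answers among 207 students.
answers = ["yes"] * 69 + ["no"] * 138

def relative_frequencies(values):
    """Return the proportion of observations falling in each category."""
    counts = Counter(values)
    n = len(values)
    return {category: count / n for category, count in counts.items()}

rel_freq = relative_frequencies(answers)

# Categories have no inherent position on a number line, so any order works;
# alphabetical order is used here only for convenience.
for category, share in sorted(rel_freq.items()):
    print(f"{category:>3} {'#' * round(share * 30)} {share:.2f}")
```

The bar heights of a frequency bar plot would be the raw counts; dividing by the number of observations changes only the scale of the vertical axis, not the shape of the plot.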
Choosing the Appropriate Proportion 
Does there appear to be a relationship between gender and whether 
the student is looking for a spouse in college? 
To answer this question we examine the row proportions: 
● Proportion of females looking for a spouse: 51 / 137 ≈ 0.37 
● Proportion of males looking for a spouse: 18 / 70 ≈ 0.26
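The arithmetic above can be checked with a short sketch. It also computes a column proportion for contrast, to suggest why the row proportions are the appropriate choice here: the share of females among students looking for a spouse is high partly because the sample contains more females. The dictionary layout and function name are assumptions made for illustration.

```python
# Counts taken from the two bullet points above.
counts = {
    "female": {"looking": 51, "total": 137},
    "male": {"looking": 18, "total": 70},
}

def row_proportions(table):
    """Proportion of each gender (row) that is looking for a spouse."""
    return {group: row["looking"] / row["total"] for group, row in table.items()}

props = row_proportions(counts)
for group, p in props.items():
    print(f"{group}: {p:.2f}")  # female: 0.37, male: 0.26

# Column proportion for contrast: share of females among those looking.
all_looking = sum(row["looking"] for row in counts.values())
female_share_of_looking = counts["female"]["looking"] / all_looking
print(f"females among those looking: {female_share_of_looking:.2f}")  # 0.74
```

Row proportions compare like with like (each gender against its own total), so a difference between 0.37 and 0.26 speaks to the question asked, whereas the 0.74 figure mostly reflects the unequal group sizes.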
Segmented Bar and Mosaic Plots 
What are the differences between the three visualizations 
shown below?
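The three plots are not reproduced here. Assuming they are a segmented bar plot of counts, a segmented bar plot of relative frequencies, and a mosaic plot, the sketch below computes the quantities each one would draw for the gender and spouse example. The counts are reconstructed from the row proportions quoted in this deck (the "no" counts assume only two answers), and `mosaic_geometry` is a hypothetical helper, not part of any plotting library.

```python
# Assumed counts: 51 of 137 females and 18 of 70 males answered "yes".
counts = {"female": {"yes": 51, "no": 86}, "male": {"yes": 18, "no": 52}}

def mosaic_geometry(table):
    """Bar width and segment heights for each group in a mosaic plot."""
    grand_total = sum(sum(row.values()) for row in table.values())
    geometry = {}
    for group, row in table.items():
        group_total = sum(row.values())
        geometry[group] = {
            # Width encodes the group's share of all observations.
            "width": group_total / grand_total,
            # Heights are within-group proportions, as in a relative
            # frequency segmented bar plot.
            "heights": {answer: n / group_total for answer, n in row.items()},
        }
    return geometry

geometry = mosaic_geometry(counts)
for group, g in geometry.items():
    print(group, round(g["width"], 2), {a: round(h, 2) for a, h in g["heights"].items()})
```

Under these assumptions, a segmented bar plot of counts would use the raw values in `counts` as segment heights, the relative frequency version would use `heights` with equal bar widths, and the mosaic plot would use `heights` together with `width`, so it shows both the within-group proportions and the relative group sizes.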
Pie Charts 
Can you tell which order encompasses the lowest percentage of 
mammal species? NO! 
http://www.bucknell.edu/msw3
Comparing Numerical Data Across Groups: 
side-by-side box plot
