2. MOTIVATION FOR THIS TOPIC: DATA IN THE WORKPLACE
In a typical workplace (outside of research institutions), the lion’s share of time is spent on:
1. Collecting the right data
2. ‘Cleaning’ the data so it can be analyzed
3. Presenting data
   I. Making tables
   II. Making charts
   III. Interpreting
5. A PRODUCTIVE EMPLOYEE
• Knows what kind of data is needed
• Stores and organizes datasets efficiently
• Can summarize and present complex information
• Draws the correct interpretation
• Makes their work replicable
6. PRESENTING DATA
• Raw data is hard to interpret at a glance
• Tables and descriptive statistics help summarize data
• But they can be dull
8. PRESENTING DATA
• Graphs often drive the point home
• They are visually appealing
• They save the audience time
• Think about how you can tell the most interesting story
9. A BASIC FRAMEWORK FOR GRAPHS
• Categorical variables
   • Bar charts
   • Pie charts
• Quantitative variables
   • Two variables: scatterplots, line plots
   • Single variable: box plots, histograms
This is not a strict classification: we can also draw bar graphs for quantitative variables.
10. PIE CHARTS
• Used for categorical data
• Show proportions
• Show the percentage of observations (individuals, etc.) in each category
• Each slice's size is its share of the full 360-degree circle
• A suitable color scheme can make the chart more appealing
[Pie chart: share of respondents supporting Biden, Trump, and Jorgensen; slice labels 48%, 40%, and 12%]
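The arithmetic behind a pie chart (each category's share of observations turned into degrees of a circle) can be sketched in Python. This is a minimal illustration, not how Excel draws the chart; the function name `pie_slices` and the respondent counts are hypothetical.

```python
def pie_slices(counts):
    """Map each category to (percent of observations, slice angle in degrees)."""
    total = sum(counts.values())
    slices = {}
    for category, n in counts.items():
        share = n / total  # fraction of all observations in this category
        slices[category] = (round(100 * share, 1), round(360 * share, 1))
    return slices


# Hypothetical respondent counts, chosen only to illustrate the arithmetic
counts = {"Biden": 120, "Trump": 100, "Jorgensen": 30}
for category, (pct, deg) in pie_slices(counts).items():
    print(f"{category}: {pct}% of respondents -> {deg} degrees")
```

With 250 respondents, a 48% share becomes 0.48 × 360 = 172.8 degrees, and the slice angles always add up to 360.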
13. BAR CHART
• Also used for categorical data
• Shows the magnitude of an indicator for each category
• Bar height denotes size
• The chart summarizes the average income of each candidate's supporters
• Keep chart styles consistent throughout a report
[Bar chart: Avg. voter income for Biden, Trump, and Jorgensen supporters (y-axis, $0 to $80,000)]
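The idea that relative height denotes size can be sketched as a text bar chart in Python, where each bar's length is proportional to its value. The function `text_bar_chart` and the income figures are hypothetical, and the sketch assumes non-negative values with a positive maximum.

```python
def text_bar_chart(values, width=40):
    """Return a text bar chart: one bar per category, length proportional to its value."""
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        # Bars start at zero, so each length is scaled against the largest value
        length = round(width * value / largest)
        lines.append(f"{label:<{label_width}} | {'#' * length} ${value:,.0f}")
    return "\n".join(lines)


# Hypothetical average incomes, for illustration only
avg_income = {"Biden": 60000, "Trump": 75000, "Jorgensen": 63000}
print(text_bar_chart(avg_income))
```

Because every bar is measured from zero, a bar twice as long means twice the value; this is why bar chart axes normally start at zero.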
16. TWO-WAY BAR CHARTS
• Bar charts can easily summarize two-way tables
• A chart can show visually:
   • The relative income of each candidate's supporters
   • Segmented by gender
Avg. voter income by gender
              Male       Female
Biden         $56,417    $66,384
Trump         $95,200    $54,020
Jorgensen     $63,900    $62,000
[Grouped bar chart: Avg. income by gender for Biden, Trump, and Jorgensen supporters, with Male and Female bars (y-axis, $0 to $100,000)]
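Behind a two-way bar chart sits a two-way table: one average per (candidate, gender) cell. A minimal Python sketch of building that table from raw survey rows follows; the function `two_way_average` and the records are hypothetical, not the survey's data.

```python
from collections import defaultdict


def two_way_average(records):
    """Average income for each (candidate, gender) cell of a two-way table."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for candidate, gender, income in records:
        totals[(candidate, gender)] += income
        counts[(candidate, gender)] += 1
    return {cell: totals[cell] / counts[cell] for cell in totals}


# Hypothetical survey rows: (candidate, gender, income)
records = [
    ("Biden", "Male", 50000), ("Biden", "Male", 62000), ("Biden", "Female", 66000),
    ("Trump", "Male", 95000), ("Trump", "Female", 54000), ("Trump", "Female", 56000),
]
table = two_way_average(records)
genders = ["Male", "Female"]
print(f"{'':<10}" + "".join(f"{g:>12}" for g in genders))
for candidate in ["Biden", "Trump"]:
    row = "".join(f"{table[(candidate, g)]:>12,.0f}" for g in genders)
    print(f"{candidate:<10}{row}")
```

Each printed row becomes one cluster of bars, and each gender becomes one bar color.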
18. TIPS ON MAKING BAR AND PIE CHARTS
In Excel, you always need a table first
• Use conditional functions (COUNTIF, AVERAGEIF, SUMIF)
• Pivot tables are a blessing (upcoming)
Picking the right chart
• Generally, use bar charts for magnitude
• Generally, use pie charts for proportion
• Let Excel suggest the chart
• Alternate chart types so as not to bore your audience
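For readers who work outside Excel, rough Python analogues of COUNTIF, SUMIF, and AVERAGEIF are sketched below. They are hypothetical helpers that only test exact equality; Excel's versions also accept conditions such as ">50" and wildcards, match text case-insensitively, and AVERAGEIF returns #DIV/0! where this sketch returns None.

```python
def countif(criteria_range, criterion):
    """Count cells equal to criterion (exact match only)."""
    return sum(1 for cell in criteria_range if cell == criterion)


def sumif(criteria_range, criterion, sum_range):
    """Sum sum_range over rows whose criteria cell equals criterion."""
    return sum(value for cell, value in zip(criteria_range, sum_range) if cell == criterion)


def averageif(criteria_range, criterion, average_range):
    """Average average_range over matching rows; None when nothing matches."""
    matched = [value for cell, value in zip(criteria_range, average_range) if cell == criterion]
    return sum(matched) / len(matched) if matched else None


# Hypothetical survey columns
candidate = ["Biden", "Trump", "Biden", "Jorgensen", "Trump"]
income = [50000, 90000, 70000, 60000, 80000]
print(countif(candidate, "Biden"))            # 2
print(sumif(candidate, "Biden", income))      # 120000
print(averageif(candidate, "Trump", income))  # 85000.0
```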
19. SCATTER PLOTS
• Show the relationship between two quantitative variables
• One variable on the x-axis and the other on the y-axis
• Each point is one observation
• We will revisit these when we look at correlations and regressions
• Consider the relationship between age and income from our survey
[Scatter plot: Income (y-axis, $0 to $300,000) against Age (x-axis, 0 to 80) for survey respondents]
20. LINE CHARTS
• Like scatterplots, they show the relationship between two quantitative variables
• But the points are connected
• Most suitable when time is on the x-axis (time charts)
• Each point in time should have only one reading
[Line chart: New COVID-19 cases in Pakistan, 2020-02-25 to 2021-01-14 (y-axis, 0 to 14,000)]
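When raw data holds several readings for the same date (for example, separate partial reports), they have to be combined so each point in time has one reading before drawing the line. A minimal Python sketch, assuming the readings are counts that should be summed; the function name and numbers are hypothetical.

```python
from collections import defaultdict
from datetime import date


def one_reading_per_day(readings):
    """Collapse (date, count) readings so each date has one total, sorted by date."""
    totals = defaultdict(int)
    for day, count in readings:
        totals[day] += count  # summing suits counts such as new cases
    return sorted(totals.items())


# Hypothetical raw reports: two partial reports per day, out of order
readings = [
    (date(2020, 6, 2), 1200), (date(2020, 6, 1), 900),
    (date(2020, 6, 1), 300), (date(2020, 6, 2), 500),
]
for day, cases in one_reading_per_day(readings):
    print(day.isoformat(), cases)
```

For measures that are not counts (a price or a temperature, say), averaging or keeping the latest reading would be the more natural choice than summing.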
21. PICKING THE RIGHT CHART
• Pick a scatter plot when you have two quantitative variables
• Pick a line chart when time is on the x-axis
• If there are only a few points in time, just make a bar chart
• When presenting, try to alternate chart types when possible
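The rules of thumb above, together with the basic framework for graphs, can be written down as a small decision function. This is a hypothetical sketch: the kind labels and the cutoff `few` for "only a few points in time" are assumptions, not fixed rules.

```python
def suggest_chart(x_kind, y_kind=None, n_time_points=None, few=6):
    """Suggest a chart type from the slides' rules of thumb.

    x_kind, y_kind: "categorical", "quantitative", or "time"; y_kind=None means one variable.
    few: hypothetical cutoff for "only a few points in time".
    """
    if y_kind is None:  # single variable
        return "bar or pie chart" if x_kind == "categorical" else "histogram or box plot"
    if x_kind == "time":
        if n_time_points is not None and n_time_points <= few:
            return "bar chart"
        return "line chart"
    if x_kind == "quantitative" and y_kind == "quantitative":
        return "scatter plot"
    if x_kind == "categorical":
        return "bar chart"
    return "no single rule; try a few options"


print(suggest_chart("quantitative", "quantitative"))            # e.g. age vs. income
print(suggest_chart("time", "quantitative", n_time_points=300))  # e.g. daily cases
print(suggest_chart("time", "quantitative", n_time_points=4))    # e.g. four quarters
```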
22. GRAPHING A SINGLE QUANTITATIVE VARIABLE
Ironically, graphs of a single variable are trickier to interpret than graphs of two variables
Both box plots and histograms represent the distribution of the variable
23. BOX PLOTS / BOX-AND-WHISKER PLOTS
• Box plots depict the following information about a variable:
   • Its median (line in the middle of the box)
   • 25th percentile (lower edge of the box)
   • 75th percentile (upper edge of the box)
   • Minimum value (lower end of the whisker)
   • Maximum value (upper end of the whisker)
   • When using Excel: outliers (typically points more than 1.5 times the interquartile range beyond the box)
• Can help identify where the bulk of the observations lie
• The box-and-whisker plot of the age of survey respondents is shown
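The numbers a box plot draws can be computed directly. A minimal Python sketch using the standard `statistics` module, assuming the common 1.5 × IQR rule for outliers; the function name and ages are hypothetical, and other tools may use a slightly different quartile convention, so their quartiles can differ a little.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, whisker ends, and 1.5 x IQR outliers for a list of numbers (needs 2+ values)."""
    data = sorted(values)
    # "inclusive" interpolates between data points; other tools may use another quartile rule
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1],
        # with outliers shown, whiskers stop at the most extreme non-outlier values
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


ages = [19, 21, 22, 23, 25, 27, 30, 34, 41, 75]  # hypothetical respondent ages
print(box_plot_summary(ages))
```

Here the box runs from 22.25 to 33 with the median at 26; the age 75 lies beyond the upper fence of 49.125, so it is flagged as an outlier and the upper whisker stops at 41.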
25. VARIATIONS TO WHISKER PLOTS
• Graphing min/max/quartiles is standard
• In practice, whiskers can be used to graph many other measures:
   • Just the min and max (shown)
   • Confidence intervals (upcoming)
• The idea is the same: to show the spread of the variable around some center
Source: Ministry of Economic Affairs, Bhutan
26. HISTOGRAM
• Similar to a bar graph
• Ranges of values of the variable on the x-axis (called bins)
   • No gaps or overlaps
   • Ordered
• Frequency in each bin on the y-axis
• Graph of choice for a single numerical variable
• The y-axis can show the frequency or the fraction of observations; dividing the fraction by the bin width gives the density
• Crucial to understanding probability distributions
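The binning behind a histogram can be sketched in Python with equal-width bins that are ordered, do not overlap, and leave no gaps. This is a hypothetical helper, not a library function; it assumes the values are not all identical.

```python
def histogram(values, n_bins):
    """Equal-width bins: (left edge, right edge, count, fraction, density) per bin.

    Bins are [left, right) except the last, which also includes the maximum,
    so every value lands in exactly one bin (no gaps, no overlaps).
    Assumes the values are not all identical.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[i] += 1
    n = len(values)
    return [
        (lo + i * width, lo + (i + 1) * width, c, c / n, c / (n * width))
        for i, c in enumerate(counts)
    ]


ages = [18, 19, 19, 20, 21, 22, 22, 23, 25, 30]  # hypothetical ages
for left, right, count, fraction, density in histogram(ages, 4):
    print(f"{left:g} to {right:g}: count={count}, fraction={fraction:.2f}, density={density:.3f}")
```

The fractions add up to 1, and so do the bar areas when the y-axis shows density (density × bin width), which is what links histograms to probability distributions.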
27. INTERPRETING DISTRIBUTION: SKEWNESS
• Symmetric
   • Left-hand side mirrors right-hand side
   • Mode = Median = Mean
   • Example: Male heights
• Right / positively skewed
   • One tail stretching off to the right
   • Mode < Median < Mean
   • Example: Household income
• Left / negatively skewed
   • One tail stretching off to the left
   • Mode > Median > Mean
   • Example: Age at death
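The mode, median, and mean ordering can be checked on small datasets with Python's `statistics` module. The data below is hypothetical and built to have a long tail on one side; note that this ordering is a rule of thumb for single-peaked distributions and can fail for some data.

```python
import statistics


def centers(values):
    """Return (mode, median, mean) for a list of numbers."""
    return statistics.mode(values), statistics.median(values), statistics.mean(values)


# Hypothetical data, chosen to have a long tail on one side
incomes = [20, 25, 25, 25, 30, 30, 35, 40, 60, 150]   # thousands; long right tail
death_ages = [45, 60, 70, 75, 78, 80, 82, 82, 85, 88]  # long left tail

print(centers(incomes))     # mode < median < mean
print(centers(death_ages))  # mode > median > mean
```

The single large income of 150 pulls the mean well above the median, while the early death at 45 pulls the mean below it: the mean moves toward the tail.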
29. INTERPRETING DISTRIBUTION: VARIABILITY
We can also comment on the spread (variance or standard deviation) of a variable by looking at its histogram
• High variability
   • Spread out
   • Example: Results of many dice rolls
• Low variability
   • Closely clustered around the center
   • Example: Age brackets of sophomores at LUMS
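Variance and standard deviation put a number on the spread seen in a histogram. A minimal Python sketch with hypothetical data: the dice results cover their whole 1 to 6 range, while the sophomore ages sit within a year of 20.

```python
import statistics

# Hypothetical data
dice_rolls = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]  # spread evenly over 1 to 6
sophomore_ages = [19, 19, 20, 20, 20, 20, 21, 21]   # clustered around 20

for name, data in [("dice rolls", dice_rolls), ("sophomore ages", sophomore_ages)]:
    # pvariance/pstdev treat the list as the whole population;
    # statistics.variance/stdev would give the sample (n - 1) versions instead
    print(f"{name}: mean={statistics.mean(data):.2f}, "
          f"variance={statistics.pvariance(data):.3f}, std dev={statistics.pstdev(data):.3f}")
```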