Northern New England TUG - January 2024
1. Northern New England TUG
The Roux Institute at Northeastern University & Virtual
January 2024
Meeting Link
NU WIFI:
NU-wave-guest
Enter one-day conference code: conf014664
2. January 2024 Agenda
9:00 – 9:30 Networking & Breakfast
9:30 – 10:05 Presentation by Olivia Hoover
10:05 – 10:15 Quick Break
10:15 – 10:50 Presentation by Michael Correll
10:50 – 11:00 Wrap-up
Hybrid attendees please be patient during breaks!
7. People
Connect with others
• Data enthusiasts within your org
• Join a Tableau User Group (TUG)
• START a TUG?!
Seek inspiration
• Viz of the Day, Iron Viz, Chartr
• Tableau Conference
• Tableau Visionaries
9. Portfolio
• Tableau Resume
• Tableau Public Portfolio
• Post everything - Show learning progress!
• Demonstrate new techniques
Link to Kimly Scott Visual
10. FREE Resources
Challenges
Makeover Monday
• Each week we post a link to a chart, and its data, and then you rework the chart.
• makeovermonday.co.uk
Workout Wednesday
• A weekly challenge to re-create a data-driven visualization.
• workout-wednesday.com
Blogs and Visionaries
Andy Kriebel
• Watch Me Viz - YouTube
Flerlage Twins blogs
• flerlagetwins.com
Ghafar Shah
• gshah300.medium.com
Tableau Visionaries
• tableau.com/community/community-leaders/visionaries
Free Data Sets
Real World Fake Data
• Business-ready dashboards to act as examples to the community.
• sonsofhierarchies.com/real-world-fake-data/
Data.World
Kaggle
Data.gov
Data is Plural
26. The bar chart: visualization of categorical data that uses the length of a bar to encode an aggregate value.
What do you do when the length and value aren’t proportional?
32. Tufte’s “Lie Factor”
Lie Factor = (size of the effect shown in the graphic) / (size of the effect in the data)
“Dishonest” if LF ≠ 1
33. Tufte’s “Lie Factor”
Lie Factor = (size of the effect shown in the graphic) / (size of the effect in the data)
LF = 2.8!
34. Y Axis Truncation
Lie Factor = (size of the effect shown in the graphic) / (size of the effect in the data)
Egraphic = (75px – 15px) / 15px = 400%
Edata = (39.6 – 35) / 35 = ~13%
LF = ~30
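The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch: the function names `effect_size` and `lie_factor` are made up for illustration, and the inputs are the pixel heights and tax rates quoted on the slide.

```python
def effect_size(first, second):
    """Relative change from first to second (0.13 means a 13% increase)."""
    return (second - first) / first


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Tufte's Lie Factor: effect shown in the graphic / effect in the data."""
    shown = effect_size(graphic_first, graphic_second)
    actual = effect_size(data_first, data_second)
    return shown / actual


# Numbers from the slide: bars 15px and 75px tall encode 35 and 39.6.
e_graphic = effect_size(15, 75)
e_data = effect_size(35, 39.6)
lf = lie_factor(15, 75, 35, 39.6)
print(f"E_graphic = {e_graphic:.0%}, E_data = {e_data:.1%}, LF = {lf:.1f}")

# With a zero baseline, bar heights are proportional to the values,
# so the graphic and the data show the same effect.
honest = lie_factor(35, 39.6, 35, 39.6)
print(f"Zero-baseline LF = {honest:.1f}")
```

When the bars are drawn from zero the two effects match and the Lie Factor is 1, which is the "honest" case on the earlier slide.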
43. Should We Truncate the Y-Axis?
🚫 The Anathemists
No, never!
📈The Line Chart Exceptionists
Only if it’s a line chart!
44. Should We Truncate the Y-Axis?
🚫 The Anathemists
No, never!
📈The Line Chart Exceptionists
Only if it’s a line chart!
🚦The Signalers
Only if you show me that you did!
51. The bar chart: visualization of categorical data that uses the length of a bar to encode an aggregate value.
What do you do when your aggregates disagree?
61. The bar chart: visualization of categorical data that uses the length of a bar to encode an aggregate value.
What do you do when your categories matter?
74. Algebraic Visualization
Data: D
Data Representations: R
Visualizations: V
⍺: data operation
⍵: design operation
We want stuff to commute.
[Diagram: commutative diagram linking D, R, and V, with arrows labeled ⍺, ⍵, r1, r2, and v]
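One way to make "we want stuff to commute" concrete is to test it on a tiny bar chart. The sketch below is an interpretation, not the speaker's code: it reads ⍺ as a change to the data and ⍵ as the matching change to the finished chart, and the names `bar_heights` and `commutes` are hypothetical. The chart is assumed to be one bar per category showing the mean.

```python
from statistics import mean


def bar_heights(rows):
    """Data to chart: one bar per category, height = mean of its values."""
    groups = {}
    for category, value in rows:
        groups.setdefault(category, []).append(value)
    return {category: mean(values) for category, values in groups.items()}


def commutes(rows, alpha, omega):
    """True if changing the data then drawing matches drawing then changing the chart."""
    return bar_heights(alpha(rows)) == omega(bar_heights(rows))


rows = [("A", 2), ("A", 4), ("B", 10)]


def double_data(rs):
    """Data operation: double every value."""
    return [(category, 2 * value) for category, value in rs]


def double_bars(bars):
    """Chart operation: double every bar height."""
    return {category: 2 * height for category, height in bars.items()}


def duplicate_rows(rs):
    """Data operation: repeat every row once."""
    return rs + rs


def identity(bars):
    """Chart operation: leave the chart unchanged."""
    return bars


scaling_ok = commutes(rows, double_data, double_bars)
duplication_hidden = commutes(rows, duplicate_rows, identity)
print(scaling_ok, duplication_hidden)
```

The second check is the worrying one: duplicating every row pairs with "no change at all" in a chart of means, so that data change would be invisible to the viewer.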
103. Wrap Up
There’s ambiguity with bar charts:
Are they stacks of stuff or aggregated stuff?
They can fail “silently”:
They can hide important internal structures or insights
But if you think that’s bad, imagine what’s going on in charts that
aren’t as ubiquitous.
[Figure: stacks of apple icons for Avery, Blake, and Charley]
105. Wrap Up
There’s ambiguity with bar charts:
Are they stacks of stuff or aggregated stuff?
They can fail “silently”:
They can hide important internal structures or insights
But if you think that’s bad, imagine what’s going on in charts that
are more complicated
107. Upcoming Events in the Tableau Community
January 25: Chart Chat January 2024
Monthly series, replays on YouTube (Jeffrey Shaffer)
February 1: Where Does Data Viz Fit in an AI-Driven World?
Northeastern University & LinkedIn
February 13: Higher Education TUG
~February: Portland Tech Meetup
follow @Doug Castoldi for info
A bar chart by Philippe Buache and Guillaume de L’Isle showing both the low and high water marks of the Seine for the thirty-five years between 1732 until 1766.
Foldout chart from Chronologische Geschichte der grossen Wasserfluthen des Elbstroms seit tausend und mehr Jahren (“Chronological history of the major floods of river Elbe, since a thousand and more years”), 1784.
Source: William Playfair, The Commercial and Political Atlas, London, 1786.
“The Struggle for Five Years in Four”
Jason Forrest
Here’s yet another chart from Fox News. Sorry to pick on them twice in a row, but they often put out such interesting deceptive charts that it’s hard to pass them up as examples. This bar chart here appears to show that tax rates will massively go up if the Bush tax cuts are allowed to expire in 2013. It looks like a big deal! In fact the bar on the right is about six times higher than the bar on the left. I don’t want my taxes to go up by a factor of 6!
Here’s what happens when we plot that same data in Tableau. Our original chart is on the left, with that big scary-looking increase. On the right, which is the default in Tableau, I’ve started the y-axis from 0, so the ratio of the heights of the bars is exactly the same as the ratio of the values in the data. That increase of 5 percentage points or so in taxes looks much less threatening now.
Not so fast! Here’s a line chart that was tweeted by the National Review (they later deleted the tweet out of shame, alas) that they titled “The Only Climate Change Graph You’ll Ever Need.” It’s a chart of average global temperature over the past couple of centuries, plotted in Fahrenheit. I bet you all are breathing a sigh of relief. Maybe this global warming thing isn’t so bad after all?
This chart reminds me a little of when two kids get on each other’s nerves in a long car ride or something and go “I’m not touching you! I’m not touching you!”
The y-axis of this chart starts at 0, and I’ve just spent all of this time showing you how non-zero axes can be misleading, so I’m sure the designer would say: “see, it’s not misleading! I didn’t exaggerate the y-axis or anything!”
So here we don’t really want a y-axis that starts at zero. We want people to be able to see the changes. If you plot the same data on the right, and start at 55 degrees, you can see a clear and unmistakable and somewhat alarming warming globe.
We looked through the literature to see the other ways that people have tried to solve this problem, especially with bar charts. And it’s sort of a mixed bag. Usually you have to do a little bit of signaling to show to the viewer, “hey, we messed around with the bars here, pay attention.” My favorite is that 3D bar chart in the bottom right. It’s supposed to bend the bars up to make it clear where the truncation happened, but to me it looks like my data is meeting me at high noon for a showdown or something.
But it’s important to note that these were just proposed solutions. Very few of them had actually been tested. So that’s what we did. We gave people data sort of like this, and just asked them to qualitatively say how big they thought the effect was. I won’t go into a ton of details here, but it’s largely bad news.
Here is a snapshot of some of our results. The y-axis is people’s qualitative judgments. The higher up, the bigger a deal they thought the differences we were showing them were. Each column is where we started the y-axis: 0, 25, or 50%.
The first thing that jumps out is that, yes, where we start the y-axis seems to make a big difference in how large the effect is perceived as being. Higher y-axis, bigger deal. So that’s bad news. This kind of truncation trick really does seem to bias people.
The other thing is that each of those dots within a column is a different design. We tried all sorts. Bar charts, line charts, charts with broken axis marks, the works. And nothing really seemed to help. Everything was about the same. To me that’s evidence that where you start that y-axis is super important, even if you try your best to signal to your viewer that you’ve done something odd. But as the climate change example showed, just defaulting to starting from 0 isn’t necessarily going to make our problems go away. We have to think about what sort of differences we want to care about.
To illustrate this paradox, we’re going to be tackling the preeminent question of our age: who was a better batter in the mid-90s, the Yankees’ Derek Jeter or the Braves’ David Justice? I will admit some personal bias here in that I was raised from a young age to hate the Yankees with every ounce of my strength. But we don’t have to go on our gut, here. Let’s look at the data.
Here are their batting averages for the 1995 and 1996 seasons, averaged together. David Justice, in blue, sadly seems to be underperforming Jeter, in orange.
But now I’ve dragged the “year” pill out from my shelf and the picture changes. For both of the years when I have data, Justice actually slightly over-performed Jeter. He beat Jeter every year!
This is Simpson’s paradox. We seem to have two contradictory conclusions: every single year, David Justice had a higher batting average than Derek Jeter. But when we add all of the seasons together, Jeter does better. So what gives? What’s the truth, here?
The solution to this paradox is that we’re dealing with averages. Here I’ve separated out the batting averages from the total number of at bats.
Justice just didn’t hit very well in 1995.
And neither did Jeter.
So 1995 was a bad year for them both. But Justice had way more at bats that year! So when you pool his many weaker 1995 at bats with his much smaller number of at bats in 1996, a good year for both, you resolve the paradox.
So yes, Justice outperformed Jeter every year. They both had a bad year in 1995, but that bad season mattered much more for Justice’s overall average than for Jeter’s, so when you add everything together, you get the pattern you see here. How you decide to aggregate the data can determine the message that you get out of your chart. And it’s totally invisible unless you decide to explore different levels of aggregation! Yikes!
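The reversal is easy to reproduce. The hit and at-bat counts below are the figures commonly cited for this example; they are not taken from the slides, so treat them as an assumption. The helper names are made up for illustration.

```python
# (hits, at_bats) per season. Commonly cited figures, not from the slides.
seasons = {
    "Jeter": {1995: (12, 48), 1996: (183, 582)},
    "Justice": {1995: (104, 411), 1996: (45, 140)},
}


def season_average(player, year):
    """Batting average for one player in one season."""
    hits, at_bats = seasons[player][year]
    return hits / at_bats


def overall_average(player):
    """Batting average over all seasons: total hits / total at bats."""
    total_hits = sum(hits for hits, _ in seasons[player].values())
    total_at_bats = sum(at_bats for _, at_bats in seasons[player].values())
    return total_hits / total_at_bats


for year in (1995, 1996):
    print(year, {p: round(season_average(p, year), 3) for p in seasons})
print("overall", {p: round(overall_average(p), 3) for p in seasons})
```

Justice's overall average is dominated by his 411 at bats in the weak 1995 season, while Jeter's is dominated by his 582 at bats in the strong 1996 season. That difference in weights is what flips the comparison.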
What I’m most interested in here is what we use histograms for. For instance, in Tableau Prep we often use them for “sanity checking”: hey, is my data fine, did whatever data prep steps I took screw anything up, things like that. But what this means is that we’re using histograms almost the same way a doctor would use the results of, say, a lab test. Which means that we can get things wrong.
I didn’t go on that detour just to talk about algebra for a bit. I bring it up because these issues happen in histograms all the time! Here I’ve got two histograms that look very very similar. But trust me when I say the one on the right is hiding a dark secret.
When I increase the number of bins in the histogram, that secret is revealed. There’s a bunch of repeated data in the visualization which shows up as a spike on the right, which might indicate a problem. But those repeats are invisible when I don’t have enough bins. Big confuser!
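Here is a rough stand-in for that demo, using synthetic data rather than the data on the slide. The assumptions are loud ones: values are uniform on 0 to 100, the "dirty" copy has 50 repeats of the value 90, and `spike_ratio` (tallest bin over median bin) is an invented score for how spiky a histogram looks.

```python
import random
from statistics import median


def histogram(values, n_bins, lo, hi):
    """Count values into n_bins equal-width bins spanning [lo, hi]."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so that a value equal to hi lands in the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        counts[index] += 1
    return counts


def spike_ratio(counts):
    """Tallest bin divided by the median bin: a crude spikiness score."""
    return max(counts) / median(counts)


rng = random.Random(0)
clean = [rng.uniform(0, 100) for _ in range(1000)]
dirty = clean + [90.0] * 50  # 50 repeated values hidden in the data

for n_bins in (5, 100):
    clean_score = spike_ratio(histogram(clean, n_bins, 0, 100))
    dirty_score = spike_ratio(histogram(dirty, n_bins, 0, 100))
    print(f"{n_bins:>3} bins: clean {clean_score:.2f}, dirty {dirty_score:.2f}")
```

With 5 wide bins the repeats are absorbed into a bar that already holds about 200 values, so the two histograms score nearly the same. With 100 narrow bins the repeats tower over their neighbors.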
On the other hand, I can have hallucinators too. Here I’ve got two very visually different histograms. The one on the right looks spikier, and the one on the left seems to have some outliers. But these are both samples from the same distribution; I just got a little lucky or unlucky with my samples.
When I reduce the number of bins, those extraneous visual differences all but disappear.
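The same effect shows up with synthetic samples. In this sketch, two samples of 100 values are drawn from the same uniform distribution (an assumption, the slide's distribution is not given), and `visual_difference` is an invented measure: the summed absolute difference in bar heights after normalizing each histogram.

```python
import random


def normalized_histogram(values, n_bins, lo, hi):
    """Fraction of values falling in each of n_bins equal-width bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so that a value equal to hi lands in the last bin.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return [count / len(values) for count in counts]


def visual_difference(a, b, n_bins):
    """Summed bar-height difference: 0 means identical, 2 means disjoint."""
    bars_a = normalized_histogram(a, n_bins, 0, 100)
    bars_b = normalized_histogram(b, n_bins, 0, 100)
    return sum(abs(x - y) for x, y in zip(bars_a, bars_b))


rng = random.Random(1)
sample_a = [rng.uniform(0, 100) for _ in range(100)]
sample_b = [rng.uniform(0, 100) for _ in range(100)]  # same distribution

many_bins = visual_difference(sample_a, sample_b, 50)
few_bins = visual_difference(sample_a, sample_b, 2)
print(f"50 bins: {many_bins:.2f}, 2 bins: {few_bins:.2f}")
```

With 50 bins each bar holds only a couple of values, so sampling noise dominates and the two charts look different. With 2 bins the noise averages out.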
If I have too few bins, then I can miss out on features that I might care about: pretty much everything looks the same. If I have too many bins, then everything will look dramatically different, and I lose the big picture. So you often want to adjust these things to get it just right.
To do this adjustment, Tableau makes you right click on a field in a shelf and mess with a context menu.
And then the viz will change when you’re done. And it turns out that pretty much nobody changes from the defaults. I looked at over 12 million workbooks on Tableau Public, and of the ones that use bins, fewer than 2% ever change them off of the default.
This turns out to be a big problem, because if I’m in a particularly nefarious mood, and you let me choose the parameters of your histograms, I can hide all kinds of nastiness. Here’s a toy example where I have a perfectly fine data set on the left, and I’ve added some noise to it on the right. That’s that big spike.
And now I’ve searched through the design space to find the histogram or dot plot or what have you that will visually look the closest. I’ve created an Algebraic Confuser. If you were in a data prep scenario, you’d be in big trouble. Because the only way to find out that I’ve done this is to dig into the parameters and mess with them until you figure out what I’ve done. And that’s often arduous or unintuitive or just plain not something people think about when they look at their data.
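A toy version of that search fits in a few lines. This is not the method used in the talk: it searches a single parameter (the number of histogram bins), uses synthetic uniform data with 50 repeated values added, and scores "looks the closest" with an invented `visual_difference` measure on normalized bar heights.

```python
import random


def normalized_histogram(values, n_bins, lo, hi):
    """Fraction of values falling in each of n_bins equal-width bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so that a value equal to hi lands in the last bin.
        counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return [count / len(values) for count in counts]


def visual_difference(a, b, n_bins):
    """Summed bar-height difference between the two normalized histograms."""
    bars_a = normalized_histogram(a, n_bins, 0, 100)
    bars_b = normalized_histogram(b, n_bins, 0, 100)
    return sum(abs(x - y) for x, y in zip(bars_a, bars_b))


rng = random.Random(0)
clean = [rng.uniform(0, 100) for _ in range(1000)]
dirty = clean + [90.0] * 50  # the flaw the adversary wants to hide

# Adversary: try every bin count and keep the one where the charts match best.
candidates = range(2, 101)
scores = {k: visual_difference(clean, dirty, k) for k in candidates}
most_hiding = min(scores, key=scores.get)
most_revealing = max(scores, key=scores.get)
print(f"hide the spike with {most_hiding} bins (difference {scores[most_hiding]:.3f})")
print(f"reveal it with {most_revealing} bins (difference {scores[most_revealing]:.3f})")
```

A viewer who only ever sees the histogram at the adversary's chosen setting has no visual cue that anything was added, which is why changing the parameters is the only way to catch it.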