7. • Know the row-level granularity of your data
• Know what makes each row unique
The dimension fields represent the LoD of the data source. You cannot drill down further than this.
8. Dimensions determine the viz LoD. The viz LoD becomes less aggregated/more granular as more dimensions are added.
[Diagram: a spectrum running from totally aggregated to totally disaggregated (the granularity of the data source, below which you cannot go); adding dimensions moves the viz toward more granularity and less aggregation, while measures are aggregated.]
11. • Do you have all the dimensions you need? If so, there is no need to derive new dimensions.
• Does the new dimension need to contain only a subset of the data? If the subset needs to update automatically, create a computed (conditional) set; otherwise, create a manual set.
• Does the dimension need to be derived at row level? If so, create a row-level calculation; otherwise, create a FIXED LoD expression (see the sketch below).
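As a sketch of those last two branches, using Superstore-style fields ([Discount], [Customer ID], [Order Date]) purely for illustration:

  // Row-level dimension: each row can be classified on its own
  IF [Discount] > 0 THEN "Discounted" ELSE "Full Price" END

  // FIXED LoD dimension: derived at a level other than the row –
  // here, the date of each customer's first order
  { FIXED [Customer ID] : MIN([Order Date]) }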
12. • Does the measure need mixed levels of aggregation? Break it into sub-measures, each with a fixed level of granularity, then go back to Step 1 and work through the process for each sub-measure.
• Is the measure's level of aggregation the same as the dataset granularity? Create a row-level calculated field.
• Is the measure's aggregation at the same LoD as the viz LoD? Create an aggregated calculated field.
• Is the measure's aggregation less granular than the viz LoD? Create a FIXED or EXCLUDE LoD expression.
• Is the measure's aggregation more granular than the viz LoD? Create a table calculation or an INCLUDE LoD expression, depending on whether the result needs just one mark/value (see the sketches below).
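A minimal sketch of the four outcomes, again assuming Superstore-style fields for illustration:

  // Row-level calculated field (same LoD as the data source)
  [Profit] / [Sales]

  // Aggregated calculated field (same LoD as the viz)
  SUM([Profit]) / SUM([Sales])

  // Less granular than the viz: FIXED or EXCLUDE
  { EXCLUDE [Category] : SUM([Sales]) }

  // More granular than the viz: INCLUDE, re-aggregated per mark
  AVG({ INCLUDE [Product Name] : SUM([Sales]) })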
13.
14.
15. Worked example: discount by category. The flowchart repeats from 12, with this case resolving to a FIXED or INCLUDE LoD expression.
16. Worked example: discount by product category. The flowchart repeats from 12, with this case resolving to an INCLUDE LoD expression.
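The exact calcs were shown live; as an illustration only, assuming Superstore-style fields, the two examples might resolve to something like:

  // Discount by category, pinned to Category regardless of the viz
  { FIXED [Category] : AVG([Discount]) }

  // Discount computed per product, then re-aggregated to the viz LoD
  AVG({ INCLUDE [Product Name] : AVG([Discount]) })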
29. • It is easy to create a ton of LoD expressions
• It is very easy to overwhelm your data model
• It is all too easy to use the wrong LoD expression in your viz
33. Tableau's order of operations (evaluated top to bottom; the first steps run in the database, the last steps locally):
1. Extract Filters
2. Data Source Filters
3. Context Filters
4. FIXED Expressions Evaluated
5. Dimension Filters
6. INCLUDE/EXCLUDE Expressions Evaluated
7. Measure Filters
8. Local Filters (ATTR, geocoding)
9. Table Calc Filters
10. Hide
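Because FIXED expressions are evaluated before dimension filters, a filter can silently disagree with a FIXED calc. A sketch of the classic gotcha, with field names assumed for illustration:

  // Ignores an ordinary Region dimension filter;
  // add Region to context so the filter applies first
  { FIXED [State] : SUM([Sales]) }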
35. The pipeline from query to pixels:
• Query – query database, cache results
• Data – local data joins, local calculations, local filters, totals, forecasting, table calculations, 2nd-pass filters, sort
• Layout – lay out views, compute legends, encode marks
• Render – marks, selection, highlighting, labels
52. What can go wrong?
• Join – some rows might not match. What will happen?
• Join – there might be more than one match. Will you count things twice?
• Left join – it's asymmetric; do you have the right primary?
• Post-aggregate (blend) – it's already aggregated; is the combination of aggregates valid?
Demo from the workbook what these tools are and how they can help
No demo
Now that you know which dimensions are needed to answer your analytical question, the first step is to ensure that you have those dimensions. Use this framework to check that you do.
Options include:
• Sets – manual or computed
• Row-level calculation
• FIXED LoD expression
The reason we used a set rather than a filter is that filtering excludes those values entirely, so they can no longer be used in a calculation.
Show a couple of examples of us working through the framework
Demo building table calculations in v9.3 using INDEX(), and then the new improvements in v10
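A minimal sketch of the v9.3-era pattern, assuming a simple top-N use case:

  // Table calculation: position of each mark within its partition
  INDEX()

  // Used as a filter to keep, say, the top 10
  // (computed using SUM([Sales]) descending)
  INDEX() <= 10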
Show a quick example for a dimension or a measure, and for filtering on a FIXED LoD
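A quick sketch of filtering on a FIXED LoD, with field names assumed for illustration:

  // Keep only customers whose lifetime sales reach 1,000; because FIXED
  // is evaluated before dimension filters, the cohort stays stable
  { FIXED [Customer ID] : SUM([Sales]) } >= 1000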
Add comments to calculations
Add comments to fields by copying the description
Demo % of Total
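The % of total demo is typically a one-line table calculation, for example:

  // Each mark's share of the table total
  SUM([Sales]) / TOTAL(SUM([Sales]))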
Examples to discuss:
• Font – layout (lay out views)
• Filter – query, but could be data if it is a table calculation or a 2nd-pass filter
• Marks – selecting a mark is simply a render operation, but with action filters you could trigger the whole pipeline
Show an example of interactive vs. non-interactive interaction
One of the big ideas I want you to understand is that you can use the techniques of visualization to make your job of debugging easier and faster.
To do this I’m going to take an example from Mrunal’s LOD expressions.
A few minutes ago, Mrunal was making sure that his calc for profit ratio on categories matched the total calculation for profit ratio.
I’m going to build a visualization that shows both the new calc and the total.
If they align I know the calc is correct.
Do they align?
Yes. The calc is what I was expecting.
In this case, I was only trying to see if three numbers match, but the technique of using a visualization will scale up.
If there were 5 or 15 or 500 categories, I could tell in a glance if there was a difference between the bar and the reference line.
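A sketch of the two calcs being compared, assuming Superstore-style fields:

  // Per-category profit ratio, aggregated at the viz LoD
  SUM([Profit]) / SUM([Sales])

  // Overall profit ratio, one value for the whole data source,
  // usable as a reference line
  MIN({ FIXED : SUM([Profit]) / SUM([Sales]) })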
The next section is about joins.
In a normal join operation, you start with the columns on the left and augment them with columns from the right where you have a match on the join key.
In this example, the data set on the left has Sales for each State and the data set on the right has Population for each state.
When I join them by state, I’m going to end up with State, Region, Category, Sales and Profit, as well as Time Zone and Population.
For the first two rows, California, the data I get in Time Zone and Population is West and 38 million something.
Similar things happen for Colorado and Illinois.
For Missouri, Montana and Nevada, there’s just one row each.
For Texas and Washington it’s two rows each.
I can build a map of the sales per region.
(BTW, do you know how to create custom territories from existing geographic fields? Right-click the field → Geographic Role → Create From → select the appropriate field.)
I can easily build a map of sales per time zone.
Cool, my join added some useful data.
Let’s build another viz.
How many potential customers are there in each state?
Wait! How can that be?
There are only 38M people in California.
What’s going wrong?
Remember, as part of the join, Population gets put into two rows for California.
When I drag out Population, the default aggregation is SUM.
So my viz, which aggregates by state, sums all the Population values for each state.
That's wrong for 5 of the 8 states in this data set.
How would I detect this might be a problem?
Do a sanity check, and make it easy for yourself.
You can add reference lines.
You can build a dashboard with the original viz.
You might be tempted to use a different aggregation, but you will need to sanity check it repeatedly.
LODs usually work nicely.
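For example, a sketch of the LoD fix for the duplicated population rows (field names assumed):

  // One Population value per state, no matter how many joined rows;
  // aggregate it with MIN or AVG in the state-level viz
  { FIXED [State] : MIN([Population]) }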
Blending is almost magical in how it works. It very often does exactly what you want.
Another concern with joins is NULLs.
When you do a left or a right join there can be Nulls.
In this case, if I do a left join, the row for Oregon will get NULL for Region, Category, Sales and Profit.
If I do an inner join, I don’t even get a row for Oregon, but let’s stick to left joins for now.
Let’s look at Sales for the time zone.
If I didn’t already know there were NULLs for Oregon, how might I find out?
I can build a bar chart to look at how many records there are.
I can build a heat map to see where the NULLs are.
This works well even when there are lots and lots of records.
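A sketch of a calc that makes the NULLs easy to see in such a chart, using the [Sales] field from the example:

  // Flag rows that failed to match in the join
  IF ISNULL([Sales]) THEN "Missing" ELSE "Present" END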
As you are looking, there are questions you can ask yourself.
Blends are a post-aggregate left join.
An aggregate of aggregates is sometimes fine.
A sum of sums is valid.
A min of mins is valid.
An average of sums is not the same as the average of the underlying items.
An average of averages is not either, but it's often close and can fool you easily.
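A tiny worked example of the trap: if group A holds the values 1, 1, and 1 (average 1) and group B holds the single value 10 (average 10), the average of averages is (1 + 10) / 2 = 5.5, while the true average of the four underlying values is (1 + 1 + 1 + 10) / 4 = 3.25.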
Here’s an example that I was involved with.
You may ask why different data sources? Because I didn’t have control of all of them. They were from different departments.
You may ask why I didn’t use a cross data base join? Because when I started this, Tableau didn’t have the features.
These are my motivations.
How can I check for dirty data?
How can I check my assumptions?
I’ve been talking about blends for a while, but there are similar concepts that apply to joins.
Most of the time you don't expect many-to-many joins, and usually they are not the right thing.
However, sometimes the data surprises you and you get some unexpected joins.
In the Intern example, I did not expect Stephanie to have to mentor two interns.
Here's an example of how to detect that in the join case.
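One way to surface the surprise, sketched with hypothetical [Mentor] and [Intern] fields from the example:

  // TRUE for any mentor matched to more than one intern
  { FIXED [Mentor] : COUNTD([Intern]) } > 1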