2. 2 January 2024 KPR Institute of Engineering and Technology, Coimbatore, Tamil Nadu, India 2
UNIT IV : REPORTS FORMATTING AND DATA REDUCTION
Using Color and Size in Visualization – Encoding Data using Color – Encoding Data using Size – Stacked
and Grouped Bar Chart – Stacked Area Chart and Streamgraph – Line Chart with Multiple Lines –
Histograms – Aggregating Data with Group-By – Hexbin Mapping – Cross filtering – Building a Migrant
Deaths Dashboard – Reports Vs Dashboards
CO4: Experiment to build interactive / animated dashboards, construct data stories, and
communicate important trends / patterns in the datasets
Use of Color in Data Visualization:
Color is used to enhance the understanding of data, highlight patterns, and convey insights to the audience.
Remember that while color is a powerful tool, it should be used thoughtfully and intentionally.
Several ways color can be used effectively in data visualization:
1. Categorical Data:
Use distinct colors to represent different categories or groups within our data. This makes it easy to
differentiate between various elements.
For example, in a bar chart comparing sales across different regions, we can assign a unique color to each
region.
2. Sequential Data:
Employ a gradient color palette to show the progression of data from low to high values.
This is suitable for representing ordered data like temperature or time.
A map illustrating population density using different shades of a color (light to dark) is an example of
using sequential color.
3. Diverging Data:
Utilize a divergent color scheme to represent data that has a central point of reference, such as positive
and negative values relative to a mean or midpoint.
This can be effective in visualizing changes in sentiment or comparing performance against a baseline.
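A minimal sketch of a diverging color scheme, assuming made-up monthly deviations from a baseline of 0. The month labels and values are hypothetical; `TwoSlopeNorm` and the `RdBu_r` colormap are standard Matplotlib features, chosen here only as one reasonable option.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

# Hypothetical data: deviation from a baseline (values are made up).
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
deviation = np.array([-4.0, -1.5, 0.5, 2.0, 6.0, -0.5])

# Center the color scale on 0 so the neutral color marks the baseline.
norm = TwoSlopeNorm(vmin=deviation.min(), vcenter=0.0, vmax=deviation.max())
cmap = plt.get_cmap("RdBu_r")  # blue = below baseline, red = above
bar_colors = cmap(norm(deviation))

fig, ax = plt.subplots()
ax.bar(months, deviation, color=bar_colors)
ax.axhline(0, color="black", linewidth=0.8)  # the central reference point
ax.set_ylabel("Deviation from baseline")
ax.set_title("Diverging Color Scheme")
# Call plt.show() to display the figure in an interactive session.
```

A red-to-blue scheme is used instead of red-to-green so that the two directions stay distinguishable for most viewers with color vision deficiency.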
4. Heatmaps:
Employ a color scale to represent data density in heatmaps.
Darker shades indicate higher density or intensity, while lighter shades indicate lower density.
Heatmaps are often used in biology, finance, and geospatial analysis to visualize patterns in large
datasets.
5. Highlighting Data:
Use contrasting or bold colors to highlight specific data points, trends, or outliers within a visualization.
This technique draws the viewer's attention to crucial information.
6. Color Coding:
Color-code elements consistently throughout our visualization to maintain clarity and consistency.
For example, to represent different types of products, use the same color for each type across different
charts.
7. Creating Visual Hierarchies:
Vary color intensity or saturation to create visual hierarchies.
More intense colors can indicate primary elements, while less intense colors can represent secondary
or supporting elements.
8. Combining Color and Texture:
In cases where color alone might not be sufficient (e.g., for viewers with color blindness), use textures
or patterns in conjunction with color to convey information.
This enhances accessibility and ensures that more people can interpret the visualization accurately.
9. Multi-Dimensional Data:
When dealing with data that has multiple dimensions, we can use color to represent one dimension
while size or shape represents another.
This allows for complex data visualization without overwhelming the viewer.
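As a rough illustration of the multi-dimensional case, the sketch below maps a categorical dimension to color and a quantitative dimension to marker size in one scatter plot. All data is randomly generated, and the names `group` and `magnitude` are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 40
x = rng.random(n)
y = rng.random(n)
group = rng.integers(0, 3, n)        # categorical dimension, shown as color
magnitude = rng.uniform(1, 10, n)    # quantitative dimension, shown as size

palette = np.array(["tab:blue", "tab:orange", "tab:green"])

fig, ax = plt.subplots()
# s is the marker area in points^2, so it is scaled linearly with the value.
sc = ax.scatter(x, y, c=palette[group], s=magnitude * 20, alpha=0.6)
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.set_title("Color for Category, Size for Magnitude")
# Call plt.show() to display the figure in an interactive session.
```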
10. Time Series Analysis:
Use a consistent color scheme for data points over time to make it easier to follow trends and
changes.
We can also create animated visualizations where color changes as time progresses.
11. Storytelling and Narration:
Use color changes strategically to guide the viewer's attention through a narrative or sequence of
events in the data.
12. Brand Consistency:
If the data visualization is part of a larger presentation or report, consider using colors consistent
with the brand to maintain a unified look.
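Consistent color coding (item 6 in the list above) can be sketched by keeping one shared mapping from category to color and reusing it in every chart. The product names and numbers below are hypothetical, and `colors_for` is an illustrative helper, not a library function.

```python
import matplotlib.pyplot as plt

# One shared mapping, reused by every chart (hypothetical product types).
PRODUCT_COLORS = {
    "Laptops": "tab:blue",
    "Phones": "tab:orange",
    "Tablets": "tab:green",
}

# Made-up figures; note the categories arrive in a different order per chart.
sales = {"Phones": 120, "Laptops": 80, "Tablets": 45}
returns = {"Tablets": 5, "Phones": 14, "Laptops": 9}

def colors_for(labels):
    """Look up the agreed color for each label, whatever the order."""
    return [PRODUCT_COLORS[label] for label in labels]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.bar(list(sales), list(sales.values()), color=colors_for(sales))
ax1.set_title("Sales")
ax2.bar(list(returns), list(returns.values()), color=colors_for(returns))
ax2.set_title("Returns")
# Call plt.show() to display the figure in an interactive session.
```

Because the color is looked up by name rather than by position, each product keeps its color even when the bar order differs between charts.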
Rules for Optimal use of Color in Data Visualization:
1. Use color when you should, not when you can
2. Utilize color to group related data points
3. Use Categorical colors for unrelated data
4. Categorical colors have few easily discernible bins
5. Change in chart type can often reduce the need for colors
6. When not to use sequential color scheme
7. Choose appropriate background
8. Not everyone can see all colors
1. Use color when you should, not when you can:
The use of color should be carefully planned to communicate key findings, so this decision cannot be
left to automated algorithms. Most data should be in neutral colors like grey, with
bright colors reserved for directing attention to significant or atypical data points.
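A small sketch of this rule, assuming hypothetical regional sales figures: every bar is drawn in neutral grey and only the largest value gets a bright color.

```python
import matplotlib.pyplot as plt

# Hypothetical data for illustration only.
regions = ["North", "South", "East", "West", "Central"]
sales = [42, 38, 91, 40, 44]

# Start with neutral grey everywhere, then highlight the one bar of interest.
highlight = sales.index(max(sales))
colors = ["lightgrey"] * len(sales)
colors[highlight] = "crimson"

fig, ax = plt.subplots()
ax.bar(regions, sales, color=colors)
ax.set_ylabel("Sales")
ax.set_title("Neutral Colors with One Highlight")
# Call plt.show() to display the figure in an interactive session.
```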
2. Utilize color to group related data points:
Color can be used to group data points of similar value and to show the extent of this similarity using the
following two color palettes:
a. A sequential color palette is composed of varying intensities of a single hue at uniform saturation.
The difference in luminance between adjacent colors corresponds to the difference in the data values that
they represent.
b. A divergent color palette is made of two sequential color palettes (each of a different hue) placed next to
each other with an inflection point in the middle. It is helpful when visualizing data that varies
in two different directions.
3. Use categorical colors for unrelated data:
Categorical color palettes are built from colors of different hues but uniform saturation and
intensity. They can be used to visualize unrelated data points of completely dissimilar origin or
unrelated values.
4. Categorical colors have few easily discernible bins:
While different colors help distinguish between data points, a chart should contain at most 6–8 distinct
color categories for each of them to be readily distinguishable.
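One common way to stay within that limit is to keep the largest categories and lump the remainder into a single "Other" group. The helper below is a hypothetical sketch of that idea (the function name, the default of 7, and the sample counts are all assumptions), and the loss of detail it causes may not suit every chart.

```python
def lump_small_categories(counts, max_categories=7, other_label="Other"):
    """Keep the largest categories and merge the rest into one bin."""
    if len(counts) <= max_categories:
        return dict(counts)
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    keep = ranked[:max_categories - 1]   # leave one slot for "Other"
    rest = ranked[max_categories - 1:]
    lumped = dict(keep)
    lumped[other_label] = sum(value for _, value in rest)
    return lumped

# Hypothetical counts for ten categories.
counts = {"A": 50, "B": 40, "C": 30, "D": 20, "E": 10,
          "F": 5, "G": 4, "H": 3, "I": 2, "J": 1}
lumped = lump_small_categories(counts, max_categories=7)
print(lumped)
```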
5. Change in chart type can often reduce the need for colors:
A pie chart is probably not the best option in the previous example. The resulting loss of categories
may not always be acceptable.
6. When not to use a sequential color scheme:
For the subtle differences in color of a sequential palette to be readily apparent, these colors must be
placed right next to each other, as in the chart on the left below.
7. Choose an appropriate background:
The moving-square illustration shows how our perception of the square's color changes as its
background changes. Human perception of color is not absolute; it is relative to the surroundings.
The perceived color of an object depends not only on the color of the object itself but also on that
of its background.
8. Not everyone can see all colors:
Roughly 1 in 12 men (about 8%) and 1 in 200 women (about 0.5%) have some form of color vision
deficiency. To make colored infographics accessible to everyone, avoid combinations of red and green.
The same map looks different to people with three different kinds of color blindness.
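A hedged sketch of one accessible setup: the Okabe-Ito palette, a widely used set of colors chosen to remain distinguishable under common forms of color blindness, combined with hatch patterns so that the categories do not rely on color alone. The category names and values are hypothetical.

```python
import matplotlib.pyplot as plt

# Okabe-Ito palette (hex codes).
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]
HATCHES = ["//", "..", "xx", "--"]  # texture as a second, color-free cue

# Hypothetical data for illustration only.
categories = ["Category A", "Category B", "Category C", "Category D"]
values = [20, 45, 30, 15]

fig, ax = plt.subplots()
bars = ax.bar(categories, values, color=OKABE_ITO[:4], edgecolor="black")
for bar, hatch in zip(bars, HATCHES):
    bar.set_hatch(hatch)
ax.set_ylabel("Values")
ax.set_title("Colorblind-Friendly Colors with Textures")
# Call plt.show() to display the figure in an interactive session.
```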
Use of size in Data Visualization:
Using size effectively in data visualization involves conveying information through variations in the size of visual
elements.
Emphasizing Importance
Quantitative Representation
Comparisons
Hierarchical Information
Grouping and Clustering
Visualizing Relationships
Magnitude and Proportions
Temporal Representation
Layering and Complexity
Avoid Misrepresentation
Legend and Annotations
Accessibility
Aesthetic Balance
Contextual Usage
Testing and Iteration
import matplotlib.pyplot as plt
import numpy as np
x = np.random.rand(50)
y = np.random.rand(50)
sizes = np.random.randint(10, 100, 50)
plt.scatter(x, y, s=sizes, alpha=0.5)
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot with Size Encoding')
plt.show()
ENCODING DATA USING COLOR ENCODING
Color encoding is a data visualization technique that involves using different colors to represent distinct
categories, groups, or values within a dataset. It is an effective way to convey information and highlight
patterns, relationships, or differences in data.
Categorical Data
Nominal and Ordinal Data
Highlighting Data
Legend
Contrast and Accessibility
Color Consistency
Color Scales
Common Mistakes
import matplotlib.pyplot as plt
import numpy as np
categories = ['Category A', 'Category B',
'Category C', 'Category D']
values = [20, 45, 30, 15]
colors = ['blue', 'green', 'orange', 'red']
plt.bar(categories, values, color=colors)
plt.xlabel('Categories')
plt.ylabel('Values')
plt.title('Bar Chart with Color Encoding')
plt.show()
Data types
Quantitative
• Anything that has exact numbers.
• For example, Effort in points: 0, 1, 2, 3, 5, 8, 13.
Duration in days: 1, 4, 666.
Ordered / Qualitative
• Anything that can be compared and ordered.
• User Story Priority: Must Have, Great, Good, Not Sure.
Bug Severity: Blocking, Average, Who Cares.
Categorical
• Everything else.
• Entity types: Bugs, Stories, Features, Test Cases.
Fruits: Apples, Oranges, Plums.
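The data type suggests the kind of palette: ordered values fit shades of one hue (a sequential palette), while categorical values fit distinct hues. The sketch below assumes Matplotlib's built-in `Blues` and `tab10` colormaps as stand-ins for those two palette types; the ordering of the priorities from low to high is also an assumption.

```python
import numpy as np
import matplotlib.pyplot as plt

# Ordered data: light-to-dark shades of a single hue.
priorities = ["Not Sure", "Good", "Great", "Must Have"]  # assumed low to high
shades = plt.get_cmap("Blues")(np.linspace(0.3, 0.9, len(priorities)))

# Categorical data: clearly different hues, with no implied order.
entity_types = ["Bugs", "Stories", "Features", "Test Cases"]
hues = plt.get_cmap("tab10").colors[:len(entity_types)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))
ax1.bar(priorities, [1] * len(priorities), color=shades)
ax1.set_title("Ordered: sequential shades")
ax2.bar(entity_types, [1] * len(entity_types), color=hues)
ax2.set_title("Categorical: distinct hues")
for ax in (ax1, ax2):
    ax.set_yticks([])  # bar height carries no meaning here
# Call plt.show() to display the figure in an interactive session.
```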
Size Encoding:
Size encoding is a data visualization technique that involves using the dimensions or proportions of
graphical elements, such as shapes or bars, to represent quantitative values in a dataset. This technique
can help convey information about the magnitude or scale of data points.
Quantitative Data
Magnitude Representation
Scatter Plots
Bubble Charts
Proportional Symbols
Hierarchical Data
Attention and Emphasis
Limitations
Legend and Context
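One limitation worth illustrating is how symbols are scaled. In Matplotlib's `scatter`, the `s` argument is the marker area in points squared, so scaling `s` linearly with the value keeps area proportional to the data. The values and the `max_area` constant below are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical values to encode as bubble size.
values = np.array([10, 40, 90, 160])
x = np.arange(len(values))
y = np.zeros(len(values))

max_area = 1200  # area (points^2) given to the largest value; an arbitrary choice
sizes = values / values.max() * max_area  # area grows linearly with the value

fig, ax = plt.subplots()
ax.scatter(x, y, s=sizes, alpha=0.5)
for xi, value in zip(x, values):
    ax.annotate(str(value), (xi, 0), ha="center", va="center")
ax.set_yticks([])
ax.set_title("Bubble Area Proportional to Value")
# Call plt.show() to display the figure in an interactive session.
```

Scaling the radius with the value instead would make a value that is 4 times larger look 16 times larger in area, which overstates differences.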
Stacked Bar Chart:
A stacked bar chart is a type of data visualization that represents data using a series of bars stacked
on top of each other. Each bar is divided into segments or sections, and the height of each segment
corresponds to a specific value or category. Stacked bar charts are useful for visualizing the
composition of a whole and comparing the contributions of different subcategories.
Composition Representation
Categorical Data
Percentage or Proportion
Multiple Variables
Insight into Trends
Legend
Labeling
Limitations
Comparisons
import matplotlib.pyplot as plt
x = ['A', 'B', 'C', 'D']
y1 = [10, 20, 10, 30]
y2 = [20, 25, 15, 25]
plt.bar(x, y1, color='r')
plt.bar(x, y2, bottom=y1,
color='b')
plt.show()
import matplotlib.pyplot as plt
import numpy as np
x = ['A', 'B', 'C', 'D']
y1 = np.array([10, 20, 10, 30])
y2 = np.array([20, 25, 15, 25])
y3 = np.array([12, 15, 19, 6])
y4 = np.array([10, 29, 13, 19])
plt.bar(x, y1, color='r')
plt.bar(x, y2, bottom=y1, color='b')
plt.bar(x, y3, bottom=y1+y2, color='y')
plt.bar(x, y4, bottom=y1+y2+y3, color='g')
plt.xlabel("Teams")
plt.ylabel("Score")
plt.legend(["Round 1", "Round 2", "Round 3", "Round 4"])
plt.title("Scores by Teams in 4 Rounds")
plt.show()
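The "Percentage or Proportion" use of a stacked bar chart can be sketched by normalizing each bar to 100% before stacking. This reuses the team scores from the example above; the normalization step itself is an illustrative addition.

```python
import numpy as np
import matplotlib.pyplot as plt

x = ['A', 'B', 'C', 'D']
# One row per round, one column per team (same scores as the example above).
rounds = np.array([[10, 20, 10, 30],
                   [20, 25, 15, 25],
                   [12, 15, 19, 6],
                   [10, 29, 13, 19]], dtype=float)

# Convert each team's scores to a percentage of that team's total.
share = rounds / rounds.sum(axis=0) * 100

fig, ax = plt.subplots()
bottom = np.zeros(len(x))
for i, row in enumerate(share):
    ax.bar(x, row, bottom=bottom, label=f"Round {i + 1}")
    bottom += row  # the next segment starts where this one ends
ax.set_xlabel("Teams")
ax.set_ylabel("Share of total score (%)")
ax.set_title("100% Stacked Bar Chart")
ax.legend()
# Call plt.show() to display the figure in an interactive session.
```

Every bar now has the same height, so the chart compares composition rather than totals.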
Grouped Bar Chart:
A grouped bar chart is a type of data visualization that displays multiple bars side by side within a category.
Each group of bars represents a distinct category, and the individual bars within each group represent different
subcategories or variables.
Grouped bar charts are particularly useful for comparing values between different categories and subcategories.
Comparative Analysis
Categorical Data
Multiple Variables
Color Coding
Legend
Labeling
Spacing
Limitations
Orientation
Trends and Patterns
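import matplotlib.pyplot as plt
import numpy as np
x = np.arange(5)  # positions of the five groups
y1 = [34, 56, 12, 89, 67]
y2 = [12, 56, 78, 45, 90]
width = 0.40
# Shift each series by half a bar width so the bars sit side by side
plt.bar(x - width/2, y1, width, label='Series 1')
plt.bar(x + width/2, y2, width, label='Series 2')
# Placeholder group names on the x-axis
plt.xticks(x, ['G1', 'G2', 'G3', 'G4', 'G5'])
plt.legend()
plt.show()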
Stacked Area Chart:
A stacked area chart is a type of data visualization that displays quantitative data as a series of stacked areas,
with each area representing a different category or subcategory.
This chart is particularly useful for showing how different components contribute to a whole over time or
across other dimensions.
Composition Representation
Time-Series Data
Categorical Data
Multiple Variables
Color Coding
Legend
Labeling
Trends and Patterns
Limitations
import matplotlib.pyplot as plt
x = range(1, 6)
y1 = [1, 4, 6, 8, 9]
y2 = [2, 2, 7, 10, 12]
y3 = [2, 8, 5, 10, 6]
# Basic stacked area chart.
plt.stackplot(x, y1, y2, y3,
              labels=['A', 'B', 'C'])
plt.legend(loc='upper left')
plt.show()
Streamgraph:
A streamgraph is a type of data visualization that is used to display how the composition of different categories
changes over time. It is similar to a stacked area chart but with a more fluid and wavy appearance. In a
streamgraph, each category is represented by a flowing stream of color, and the height of the stream at any point
in time indicates the relative proportion of that category.
Streamgraphs are useful for showing patterns and shifts in the distribution of data over time.
Designed for showing temporal changes in the composition of categories.
Fluid, wavy appearance with colors representing different categories.
Useful for highlighting trends and shifts over time.
Can become visually complex with too many categories or data points.
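import matplotlib.pyplot as plt
import numpy as np
time = np.arange(0, 10, 0.1)
# Each layer is kept positive so that its thickness in the stack is meaningful
layer1 = np.sin(time) + 2
layer2 = np.sin(time + 1) + 2
layer3 = np.sin(time + 2) + 2
layer4 = np.sin(time + 3) + 2
# baseline='wiggle' lets the stack flow around a shifting baseline,
# which is what distinguishes a streamgraph from a stacked area chart
plt.stackplot(time, layer1, layer2, layer3, layer4,
              labels=['Layer 1', 'Layer 2', 'Layer 3', 'Layer 4'],
              baseline='wiggle', alpha=0.5)
plt.xlabel('Time')
plt.ylabel('Value')
plt.title('Streamgraph')
plt.legend(loc='upper right')
plt.show()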
Grouped Histograms:
Grouped histograms show multiple histograms on the same chart, each representing a distinct category or
group. They are suitable for comparing distributions across groups or categories.
Data Preparation: Organize data and define meaningful groups.
Select Bins: Determine bin ranges for each group. Bin size affects granularity.
Frequency Calculation: Count data points falling within each bin for each group.
Plotting: Create bars for each bin in each group on the same chart.
Color Coding: Use colors to differentiate bars from different groups.
Axes and Labels: Label bins on x-axis and show frequency/count on y-axis.
Legend: Include a legend to identify each group's bars.
Title and Context: Add a title and context to explain the visualization's purpose.
Visualization Tools:
Software like Python (Matplotlib, Seaborn), R (ggplot2), and data visualization platforms offer tools for creating
grouped histograms.
import matplotlib.pyplot as plt
import numpy as np
# Sample data for two groups
group1_data = np.random.normal(0, 1, 1000)
group2_data = np.random.normal(2, 1, 1000)
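# Overlaying semi-transparent histograms for both groups on the same axes
# (each call picks its own 20 bins, so the bin edges differ between groups)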
plt.hist(group1_data, bins=20, alpha=0.5, label='Group 1')
plt.hist(group2_data, bins=20, alpha=0.5, label='Group 2')
# Adding labels and title
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Grouped Histograms')
plt.legend()
plt.show()
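The steps listed for grouped histograms (select bins, count frequencies per group, plot bars per bin) can also be sketched with bins shared by both groups, which aggregates each group's values by bin and draws the bars side by side. This is one possible approach, using randomly generated sample data.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
group1 = rng.normal(0, 1, 1000)
group2 = rng.normal(2, 1, 1000)

# Select bins: one set of 20 bins covering both groups.
low = min(group1.min(), group2.min())
high = max(group1.max(), group2.max())
bins = np.linspace(low, high, 21)

# Frequency calculation: count the points in each bin, per group.
counts1, _ = np.histogram(group1, bins=bins)
counts2, _ = np.histogram(group2, bins=bins)

# Plotting: two bars per bin, placed side by side.
centers = (bins[:-1] + bins[1:]) / 2
width = (bins[1] - bins[0]) * 0.4
fig, ax = plt.subplots()
ax.bar(centers - width / 2, counts1, width, label='Group 1')
ax.bar(centers + width / 2, counts2, width, label='Group 2')
ax.set_xlabel('Value')
ax.set_ylabel('Frequency')
ax.set_title('Grouped Histograms with Shared Bins')
ax.legend()
# Call plt.show() to display the figure in an interactive session.
```

With shared bin edges, bars at the same position describe the same value range, so the two groups can be compared bin by bin.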
HEXBIN MAPPING:
Hexbin mapping is a data visualization technique used to handle dense data points in a scatter plot.
Instead of individual points, data is grouped into hexagonal bins, creating a heatmap-like representation.
Hexagons allow better visualization of density variations in a 2D space.
import matplotlib.pyplot as plt
import numpy as np
x = np.random.normal(0, 1, 1000)
y = np.random.normal(0, 1, 1000)
plt.hexbin(x, y, gridsize=20, cmap='viridis')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Hexbin Mapping Example')
plt.colorbar(label='Density')
plt.show()
Cross Filtering:
Cross filtering is an interactive data exploration technique.
It involves selecting data points in one visualization and seeing the corresponding changes in other linked
visualizations.
Enables users to explore data relationships and correlations.
Hexbin Mapping with Cross Filtering:
Hexbin mapping can be combined with cross filtering to enhance data exploration.
Users can select a hexagonal bin in one visualization and see the effects on other linked visualizations.
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
x = np.random.normal(0, 1, 1000)
y = np.random.normal(0, 1, 1000)
# Creating a hexbin plot with Seaborn
sns.set(style="whitegrid")
sns.jointplot(x=x, y=y, kind="hex", color="blue")
plt.show()
Steps for Implementation:
Data Preparation: Organize and preprocess data for visualization.
Hexbin Mapping: Plot the hexbin map, aggregating data points into hexagonal bins.
Cross Filtering Setup: Link other visualizations (e.g., line chart, bar chart) to the hexbin map.
Interactive Selection: Enable user interaction to select hexagons.
Cross Filtering Effect: Reflect selected hexagons' data in linked visualizations.
Visualization Framework: Tools like D3.js, Plotly, or custom web applications can facilitate hexbin
mapping with cross filtering.
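The data logic behind these steps can be sketched without any interactive framework: a selection made in one view (here, a rectangular range over x and y) becomes a filter, and the linked view is recomputed from the filtered rows. The function `cross_filter`, the `category` column, and the data are hypothetical; a real dashboard would wire this logic to selection events in a tool such as D3.js or Plotly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(0, 1, n)
y = rng.normal(0, 1, n)
category = rng.choice(["A", "B", "C"], size=n)  # shown in the linked bar chart

def cross_filter(x_range, y_range):
    """Return category counts for the points inside the selected region."""
    mask = ((x >= x_range[0]) & (x <= x_range[1]) &
            (y >= y_range[0]) & (y <= y_range[1]))
    labels, counts = np.unique(category[mask], return_counts=True)
    return dict(zip(labels.tolist(), counts.tolist()))

# No selection: the linked chart shows every point.
all_counts = cross_filter((-np.inf, np.inf), (-np.inf, np.inf))

# The user selects the region 0 <= x <= 1, 0 <= y <= 1 in the first view;
# the linked bar chart would be redrawn from these filtered counts.
selected_counts = cross_filter((0, 1), (0, 1))
print(all_counts)
print(selected_counts)
```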
BUILDING A MIGRANT DEATHS DASHBOARD IN DATA VISUALIZATION
Data Collection and Preparation
Choose a Visualization Tool
Design the Dashboard
Interactive Features
Ethical Considerations
Accessibility and Usability
Deployment
Promotion and Outreach
Updates and Maintenance
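The data preparation step usually involves aggregating incident records with a group-by before charting them. The sketch below uses pandas on a tiny table of placeholder rows; the column names, regions, and numbers are all made up for illustration and are not real figures.

```python
import pandas as pd

# Placeholder incident records (NOT real data; values are invented).
records = pd.DataFrame({
    "year": [2021, 2021, 2021, 2022, 2022, 2022],
    "region": ["Region A", "Region B", "Region A",
               "Region A", "Region B", "Region B"],
    "dead_and_missing": [3, 5, 2, 4, 1, 6],
})

# Aggregate by region and year, e.g. for a grouped or stacked bar chart.
by_region_year = (records
                  .groupby(["region", "year"], as_index=False)["dead_and_missing"]
                  .sum())

# Aggregate by region only, e.g. for a headline total per region.
totals_by_region = records.groupby("region")["dead_and_missing"].sum()

print(by_region_year)
print(totals_by_region)
```

Each aggregated table can then feed one dashboard visual, with cross filtering applied to `records` before the group-by is recomputed.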
REPORTS VS DASHBOARDS
Report
A report is any informational work. The information can be in
any format: table, chart, text, number, or anything else.
Reports are documents that show a snapshot of findings pertaining
to a specific topic.
Dashboard
A dashboard is a visual display of the most important information
needed to achieve one or more objectives, consolidated and arranged
on a single screen so the information can be monitored at a glance.
A dashboard is a great way to customize and tailor the display of
chosen data, such as specific metrics or Key Performance
Indicators (KPIs).
Power BI Report
A Power BI report is a combination of multiple visual elements
(charts, texts, values…) on a page that can be inter-related with
each other.
Data visualized in the report can be sliced and diced with slicers.
A Power BI report is fully interactive for the user, and it can be
filtered based on some criteria.
Power BI Dashboard
A Power BI dashboard is a high-level view of some of the key KPIs of one
or more reports.
A dashboard is a day-to-day view of KPIs and provides the navigation
point to the detailed reports.
A Power BI dashboard isn't built for slicing and dicing.
Dashboard vs Report, item by item:

Data Source
Dashboard: Dashboards are built from multiple tables that are connected to each other in one or more ways.
Report: Reports are generally created from a single table or data set with no relationships to other tables.

No. of Pages
Dashboard: Dashboards cannot span more than one page; they always show the important reports on a single page.
Report: Reports are generally built across multiple pages.

Visualizations
Dashboard: Dashboards concentrate on building insights into the data by using attractive visuals, graphs, charts, etc.
Report: Reports do not concentrate on the visualization of the data; rather, they aim to create summary pages.

Template
Dashboard: Dashboards don't have any set template; it is up to the creator to visualize the data to fit the needs of the business.
Report: Reports generally have a set template. As data is added or deleted, the template regenerates the report, provided the formulas are applied from the data table.

Slicers and Filters
Dashboard: Since dashboards are limited to a single page, it is not possible to use filters and slicers.
Report: In reports, we can filter and slice the data by using slicers and many filtering options such as cross-filtering, visual-level filtering, and page-level filtering.

Kind of Information
Dashboard: Dashboards include only the limited information that is important to the end users.
Report: Reports are not limited to a single page, so they can give a detailed breakdown of each category across multiple pages.

Reader Interactivity
Dashboard: Dashboards are pinned to the page, so the reader can only read through the data.
Report: Reports are created with filters and slicers, so the user can interact with the data set.

Changes to Visuals
Dashboard: Dashboard visuals are pinned to the page; even if the report owner changes a visual, the change is not reflected on the dashboard.
Report: Reports usually come along with the data set, so if the reader wishes to change the visual type, they can do so at any point in time.

Alerts
Dashboard: Dashboards can send email alerts when a specific condition or criterion is met or a limit is crossed.
Report: Reports cannot send email alerts when a specific condition or criterion is met or a limit is crossed.

Data Set View
Dashboard: With dashboards, we cannot see the source data because the reader only gets the single-page information.
Report: With reports, the reader can see the tables, data sets, and fields of the data in detail, i.e. the raw data.