Data Visualization

2 January 2024, KPR Institute of Engineering and Technology, Coimbatore, Tamil Nadu, India
U21CS303 – Data Visualization

UNIT II : SHAPES OF DATA AND VISUALIZATION IDIOMS
Input for Visualization – Data and Tasks – Loading and Parsing Data – Data Cleansing and its importance
– Reusable Dynamic Components using the General Update Pattern – Reusable Scatter Plot – Common
Visualization Idioms – Bar Chart – Vertical Pie Chart – Horizontal Pie Chart – Coxcomb Plot – Line Chart
– Area Chart – Criteria to choose charts
CO2: Apply visualization techniques for various data analysis tasks

INPUT FOR VISUALIZATION (DATA AND TASKS)
 Data is the core of any visualization. Understanding the nature of the data is vital before attempting to
visualize it.
 Tasks refer to the goals or objectives of the visualization. Understanding the task at hand helps in
choosing the right visualization technique.
Nature of Data:
1. Numerical Data
a. Continuous Data
b. Discrete Data
2. Categorical Data
3. Ordinal Data
4. Interval Data

Nature of Data:
1. Numerical Data
Numerical data represents quantities or measurements.
a. Continuous Data
Data that can take any value within a given range.
Examples:
Height of individuals (someone can be 5.7 feet tall or 5.71 feet tall), Temperature measurements.
Visualization Techniques: Histograms, line graphs, scatter plots.
b. Discrete Data
Data that can take only specific, separate values.
Examples:
Number of cars in a household (can't have 2.5 cars), Number of students in a class.
Visualization Techniques: Bar charts, pie charts.

2. Categorical Data
Categorical data represents categories or labels.
Examples:
Colors (red, blue, green), Genders (male, female).
Visualization Techniques:
Bar charts, pie charts, stacked bars.
3. Ordinal Data
Ordinal data represents categories with a specific order or rank.
Examples:
Education level (high school, bachelor's, master's, PhD).
Survey responses (strongly disagree, disagree, neutral, agree, strongly agree).
Visualization Techniques: Bar charts (ordered), line graphs.

4. Interval Data
Interval data is ordered like ordinal data, but the differences between values are consistent and
meaningful. It lacks a true zero point.
Examples:
 Temperature in Celsius (0°C doesn't mean absence of temperature).
 IQ scores.
Visualization Techniques: Line graphs, scatter plots.

Volume of Data:
The amount of data impacts the choice of visualization. Small datasets can be visualized in detail, while
large datasets often require techniques to simplify, aggregate, or sample the data to make it understandable.
1. Small Datasets
2. Large Datasets
3. Techniques for Handling Large Datasets
A. Aggregation
B. Sampling
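Aggregation and sampling, the two techniques for handling large datasets, can be sketched as follows. This is a minimal illustration using only the Python standard library; the records and the function names `aggregate_mean` and `sample_rows` are hypothetical.

```python
import random
from collections import defaultdict
from statistics import mean

# Hypothetical raw records: (region, sale_amount)
records = [("North", 120.0), ("South", 80.0), ("North", 100.0),
           ("East", 50.0), ("South", 120.0), ("East", 70.0)]

def aggregate_mean(rows):
    """Aggregation: collapse many rows into one mean value per group."""
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}

def sample_rows(rows, k, seed=0):
    """Sampling: keep k randomly chosen rows (seeded so the result is repeatable)."""
    rng = random.Random(seed)
    return rng.sample(rows, k)

region_means = aggregate_mean(records)
subset = sample_rows(records, 3)
print(region_means)
print(subset)
```

Aggregation reduces the number of marks to one per group, while sampling keeps individual records but fewer of them.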

Sources of Data:
 In the modern digital era, data is abundant and can be sourced from a variety of channels.
 Data can come from
 Databases
 Spreadsheets
 APIs
 Web scraping
 Manual input
1. Databases
A structured collection of data, often managed by a database management system (DBMS).
Databases are typically used by organizations to store vast amounts of transactional or operational data.
Characteristics:
 Structured, often relational in nature.
 Large volumes.
 Can be queried using SQL or other query languages.

2. Spreadsheets
Files that allow users to organize, format, and calculate data using a grid of cells.
Examples: Microsoft Excel and Google Sheets.
Characteristics:
 Tabular format.
 Easily accessible and modifiable.
 Contains formulas and calculations.
3. APIs (Application Programming Interfaces)
A set of rules and protocols that allow different software entities to communicate with each other. APIs
often serve as gateways to retrieve data from web servers or platforms.
 Characteristics:
 Automated data retrieval.
 Real-time or periodic updates.
 Structured, often in XML or JSON format.

4. Web Scraping
The process of extracting data from websites. This is done using scripts or bots that fetch web pages and
extract necessary information.
Characteristics:
 Data can be unstructured or semi-structured.
 Prone to changes based on website layout updates.
5. Manual Input
Data entry done by individuals manually, often into systems, applications, or forms.
Characteristics:
 Time-consuming.
 Highly susceptible to human error.

Data Quality:
 In any data-driven decision-making process, the quality of the data used is paramount.
 Poor data quality can lead to incorrect conclusions, flawed business decisions, and resource wastage.
1. Accuracy
2. Consistency
3. Completeness
4. Reliability
5. Timeliness of data

1. Accuracy
Definition: Accuracy refers to the closeness of a data value to its true or actual value.
Key Points:
Errors: Result from incorrect data entry, faulty sensors, or misinformation.
Validation: Use validation rules or checks to ensure data accuracy.
Verification: Cross-reference with trusted data sources.
Real-world Scenario:
Imagine an e-commerce platform recording product prices. If a product's price is incorrectly listed as $10
instead of $100, it could lead to significant revenue loss.
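The validation and verification ideas above can be sketched as a simple check against a trusted source. The price tables, the tolerance, and the function name `find_mismatches` are assumptions made for illustration.

```python
# Hypothetical data: prices as listed on the site vs. a trusted reference catalogue.
listed = {"P1": 10.0, "P2": 25.0, "P3": 7.5}
reference = {"P1": 100.0, "P2": 25.0, "P3": 7.5}

def find_mismatches(listed, reference, tolerance=0.01):
    """Return product IDs that fail validation or verification.

    Assumes every reference price is positive. `tolerance` is the
    largest accepted relative difference from the reference price.
    """
    mismatches = []
    for product_id, ref_price in reference.items():
        price = listed.get(product_id)
        if price is None or price <= 0:
            mismatches.append(product_id)   # validation: missing or non-positive
        elif abs(price - ref_price) / ref_price > tolerance:
            mismatches.append(product_id)   # verification: disagrees with reference
    return mismatches

suspect = find_mismatches(listed, reference)
print(suspect)
```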

2. Consistency
Definition: Consistency means that data is uniform across all data sources and does not
contradict other recorded data.
Key Points:
Standardization: Use uniform units, formats, and conventions.
Deduplication: Ensure no repetitive or contradicting records exist.
Harmonization: Make sure data from different sources aligns well when integrated.
Real-world Scenario:
Consider a multinational company's employee database. If one branch records dates in "DD/MM/YYYY"
and another in "MM/DD/YYYY", inconsistencies can arise, causing confusion.

3. Completeness
Definition: Completeness ensures that no necessary data is missing.
Key Points:
Null Values: Identify and address missing data points.
Default Values: Ensure they don't misrepresent actual data.
Data Imputation: Techniques can be used to fill in missing data based on existing data patterns.
Real-world Scenario:
In a medical trial, if patient outcomes are missing, the results of the trial could be skewed or inconclusive.

4. Reliability
Definition: Reliability pertains to the trustworthiness and consistency of the data over time.
Key Points:
Source: Data from reputable sources is often more reliable.
Duplication: Repeated or duplicate entries can compromise reliability.
Audit Trails: Maintain logs to trace back data changes or updates.
Real-world Scenario:
If a weather sensor occasionally malfunctions and records extreme values, the overall climate data
becomes less reliable.

5. Timeliness
Definition: Timeliness relates to the age of the data and its relevance to the current scenario.
Key Points:
Update Frequency: Ensure data is updated at appropriate intervals.
Real-time Data: Necessary for certain applications, like stock trading.
Historical Data: While not "timely", it's crucial for trend analysis.
Real-world Scenario:
In stock market analytics, using outdated stock prices can result in financial misjudgements.

Tasks
 Tasks refer to the goals or objectives of the visualization.
 Understanding the task at hand helps in choosing the right visualization technique.
 Some common tasks include:
 Comparison: Comparing two or more variables to see how they relate.
 Trends: Observing data over a period of time.
 Distribution: Understanding the spread and distribution of data.
 Relationship: Identifying correlation between variables.
 Geospatial: Mapping data to geographical locations.

1. Comparison
Definition: Comparison involves assessing two or more variables to understand their similarities,
differences, or relationships.
Key Points:
 Helps in benchmarking and relative judgment.
 Can be used for both categorical and numerical data.
Visualization Techniques:
 Bar Charts: Useful for comparing quantities across categories.
 Grouped or Stacked Bar Charts: Compare multiple variables across categories.
 Dot Plots: Good for comparing individual data points.
Real-world Scenario:
Imagine a company wants to compare sales of different products across regions. A grouped bar chart can
show each product's sales in each region, allowing for a clear comparison.

2. Trends
Definition: Trends involve observing data changes over a continuous interval, usually time.
Key Points:
 Reveals patterns, fluctuations, and cycles.
 Helps in forecasting or predicting future values.
Visualization Techniques:
 Line Charts: Ideal for showing trends over time.
 Area Charts: Similar to line charts but with the area under the curve shaded.
Real-world Scenario:
Consider stock market data. Investors often use line charts to track the price of a stock over time and
identify its upward or downward trend.

3. Distribution
Definition: Distribution refers to understanding how data is spread across different categories or ranges.
Key Points:
 Helps identify central tendencies, spread, and outliers.
 Reveals the overall shape of data distribution.
Visualization Techniques:
 Histograms: Show the distribution of a single variable.
 Box Plots: Highlight median, quartiles, and potential outliers.
 Violin Plots: Combine aspects of box plots and density plots.
Real-world Scenario:
In a manufacturing unit, understanding the distribution of product sizes can help identify anomalies or
deviations from standards.
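The quantities a box plot draws (minimum, quartiles, median, maximum) and a common outlier rule can be computed directly. The sketch below uses hypothetical product sizes and the conventional 1.5 × IQR rule, which is one convention among several; the function names are illustrative.

```python
from statistics import quantiles

# Hypothetical product sizes in millimetres; one part is clearly off-spec.
sizes_mm = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 12.5, 10.1, 9.9, 10.0]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the values a box plot displays."""
    q1, med, q3 = quantiles(values, n=4)
    return min(values), q1, med, q3, max(values)

def iqr_outliers(values, k=1.5):
    """Flag values more than k * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

summary = five_number_summary(sizes_mm)
outliers = iqr_outliers(sizes_mm)
print(summary)
print(outliers)
```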

4. Relationship
Definition: This task focuses on determining if, how, and to what extent two or more variables are
related.
Key Points:
 Helps in identifying correlations or potential causations.
 Reveals how variables interact with each other.
Visualization Techniques:
 Scatter Plots: Plot two variables to see their relationship.
 Heat Maps: Great for showing correlations in larger datasets.
 Pair Plots: Visualize pairwise relationships in a dataset.
Real-world Scenario:
In healthcare, researchers might use scatter plots to see if there's a relationship between calorie intake and
weight gain among participants.
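The strength of a linear relationship seen in a scatter plot is often summarized with the Pearson correlation coefficient. The sketch below computes it by hand; the calorie and weight figures are invented for illustration, and a high value indicates correlation, not causation.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical participants: daily calorie intake and monthly weight gain (kg).
calories = [1800, 2000, 2200, 2500, 2800]
weight_gain_kg = [0.1, 0.3, 0.4, 0.8, 1.0]

r = pearson_r(calories, weight_gain_kg)
print(round(r, 3))
```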

5. Geospatial
Definition: Geospatial visualization pertains to data that has a geographical or spatial component.
Key Points:
 Maps real-world data to geographical locations.
 Helps in spatial analysis and location-based insights.
Visualization Techniques:
 Choropleth Maps: Regions colored based on a variable.
 Point Maps: Individual data points plotted on a map.
 Heat Maps: Show density or intensity of data points on a map.
Real-world Scenario:
During an epidemic, health agencies might use choropleth maps to show the spread and intensity of the
disease across regions.

Shapes of Data
 In the world of data analysis and visualization, the format or "shape" in which data is presented plays a
crucial role in determining how it can be processed, analyzed, and visualized.
 Data can be represented in various shapes or structures.
The most common ones include:
 Tabular Data: Data in rows and columns, like a spreadsheet.
 Hierarchical Data: Data in tree structures, like JSON.
 Network Data: Data representing connections, like social networks.
 Geospatial Data: Data with geographic information, like maps.

1. Tabular Data
Definition: Tabular data is structured data that's organized in rows and columns, much like a table or
spreadsheet.
Characteristics:
Each row typically represents a single record or entity.
Each column represents a specific attribute or field of the entity.
Contains headers to label each column.
Visualization Techniques:
Bar charts, line charts, and scatter plots for various columns.
Heatmaps for visualizing patterns within the table.
Real-world Example:
A company's sales record where each row represents a transaction, and columns may include date,
product ID, quantity sold, and total price.

2. Hierarchical Data
Definition:
Hierarchical data represents relationships in a tree-like structure, where entities can have parent-child
relationships.
Characteristics:
Data is nested.
Each entity (except the root) has one parent and can have multiple children.
Commonly represented in formats like JSON or XML.
Visualization Techniques:
Tree diagrams.
Dendrograms in hierarchical clustering.
Real-world Example:
A company's organizational chart where the CEO is at the top, followed by VPs, managers, and individual
contributors in a tree structure.
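A nested structure such as the organizational chart can be held as nested dictionaries and walked recursively. The names in `org_chart` and the helper functions `walk` and `height` are hypothetical.

```python
# Hypothetical organizational chart as nested dictionaries (JSON-like).
org_chart = {
    "name": "CEO",
    "children": [
        {"name": "VP Engineering", "children": [
            {"name": "Engineering Manager", "children": [
                {"name": "Engineer", "children": []},
            ]},
        ]},
        {"name": "VP Sales", "children": [
            {"name": "Sales Manager", "children": []},
        ]},
    ],
}

def walk(node, depth=0):
    """Depth-first traversal yielding (name, depth) for every node."""
    yield node["name"], depth
    for child in node["children"]:
        yield from walk(child, depth + 1)

def height(node):
    """Number of levels in the tree rooted at `node`."""
    return 1 + max((height(child) for child in node["children"]), default=0)

flat = list(walk(org_chart))
for name, depth in flat:
    print("  " * depth + name)   # indentation mirrors the hierarchy
```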

3. Network Data
Definition:
Network data, also known as graph data, represents entities (nodes) and the relationships or connections
(edges) between them.
Characteristics:
Nodes represent entities.
Edges represent relationships.
Can be directed (with a starting and ending point) or undirected.
Visualization Techniques:
Network graphs.
Force-directed layouts.
Matrix representation.
Real-world Example:
A social media platform where individuals are nodes and their friendships are edges connecting them.
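An undirected network like this can be stored as an edge list and converted to an adjacency list or an adjacency matrix (the basis of the matrix representation). The names and friendships below are invented for illustration.

```python
# Hypothetical undirected friendships: each pair is one edge.
friendships = [("Asha", "Ben"), ("Ben", "Chen"), ("Asha", "Chen"), ("Chen", "Dee")]

def adjacency_list(edges):
    """Map each node to the set of its neighbours."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)   # undirected: record both directions
    return adj

def adjacency_matrix(edges):
    """Return (sorted node names, 0/1 matrix) for an undirected graph."""
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return nodes, matrix

neighbours = adjacency_list(friendships)
nodes, matrix = adjacency_matrix(friendships)
print(neighbours["Chen"])
for row in matrix:
    print(row)
```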

4. Geospatial Data
Definition:
Geospatial data pertains to information associated with a specific geographic location.
Characteristics:
Contains spatial coordinates (like latitude and longitude).
Can also include altitude, distance, or other geographical metrics.
Often used in conjunction with maps.
Visualization Techniques:
Choropleth maps (regions colored based on data).
Point maps (points placed based on coordinates).
Heatmaps (density visualization).
Real-world Example:
A city's public health department mapping reported cases of a disease to identify hotspots.

LOADING AND PARSING DATA
 Loading data involves importing data from its source into the environment where the visualization
will be created.
 Once data is loaded, it might not be in a ready-to-use state.
 Parsing involves converting or arranging data into a usable format.
Loading Data
• Loading data is a foundational step in the process of data visualization.
• It essentially entails the importation of data from its original source into the specific environment
where the visualization will be constructed.
1. File Formats
2. Databases
3. Application Programming Interfaces

1. File formats:
One of the primary aspects to consider is the format of the data.
 Comma-Separated Values (CSV)
 Microsoft Excel
 JavaScript Object Notation (JSON), a hierarchical and key-value pair format
 XML, another hierarchical format, is also used in various applications, from web services to
document storage.
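Loading the two most common text formats can be sketched with Python's standard `csv` and `json` modules. The inline strings stand in for files, and their contents are hypothetical.

```python
import csv
import io
import json

# Inline text stands in for files on disk.
csv_text = "product,quantity,price\nA,2,9.50\nB,1,20.00\n"
json_text = '{"store": "Main", "items": [{"product": "A", "quantity": 2}]}'

# CSV: tabular rows; every field is read as a string.
rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    row["quantity"] = int(row["quantity"])   # parse numeric columns explicitly
    row["price"] = float(row["price"])

# JSON: hierarchical key-value data; numbers keep their type.
doc = json.loads(json_text)

print(rows)
print(doc["items"][0]["quantity"])
```

Note that CSV carries no type information, which is why the numeric columns are converted after loading.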

2. Databases:
Beyond file formats, databases play a crucial role in the world of data.
 Relational databases organize data into tables, allowing for complex queries that can join, filter,
and aggregate data.
 NoSQL databases, like MongoDB or Cassandra, offer more flexible storage solutions, especially
for unstructured or semi-structured data.
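A relational query that joins, filters, and aggregates can be tried with Python's built-in `sqlite3` module. The tables, columns, and values are hypothetical, and an in-memory database is used so nothing is written to disk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE sales (product_id INTEGER, quantity INTEGER)")
conn.executemany("INSERT INTO products VALUES (?, ?)", [(1, "Pen"), (2, "Book")])
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 3), (1, 2), (2, 1), (2, 4), (2, 5)])

# Join the two tables, filter out single-unit sales, aggregate per product.
query = """
SELECT p.name, SUM(s.quantity) AS total
FROM sales AS s
JOIN products AS p ON p.id = s.product_id
WHERE s.quantity > 1
GROUP BY p.name
ORDER BY p.name
"""
totals = conn.execute(query).fetchall()
conn.close()
print(totals)
```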
3. Application Programming Interfaces
 APIs allow applications to communicate with each other, often fetching real-time data.
 For instance, if one were creating a visualization about global weather patterns, they might fetch
data in real-time from a weather API.

Parsing Data:
 Data parsing is the process of converting or arranging data into a format or structure that's more
usable and insightful for our specific needs.
1. Data Cleaning
2. Data Transformation
3. Feature Engineering
4. Date and Time Parsing

Data Cleaning: The first and most fundamental step in data parsing is data cleaning.
Common Issues in Raw Data
 Missing values.
 Duplicate entries.
 Incorrect data types.
 Outliers or anomalies.
Importance of Data Cleaning:
 Eliminate errors and inconsistencies.
 Enhance accuracy of data analysis.
 Save time and resources in the long run.
Real-world Example
Scenario: Imagine an e-commerce company collecting data on customer purchases. Incorrect data
might show a product being purchased before it was even launched. Data cleaning will identify and
rectify such anomalies.
Outcome: Clean data ensures accurate customer insights and better business decisions.

Data Transformation: Once our data is clean, it often needs to be transformed into a more suitable format
or scale.
Data types might need conversion, like turning a number stored as a string into an actual numeric value.
Such transformations are essential for mathematical operations and comparisons.
Examples of Data Transformation
 Numeric scaling: Normalizing or standardizing values.
 Encoding: converting categorical data into numeric format.
 Date formatting: ensuring a consistent date format across data.
 Aggregation: summarizing data into broader categories or metrics.
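Three of these transformations (type conversion, numeric scaling, and encoding) are sketched below. The input values are made up, and `min_max_scale` and `label_encode` are illustrative helper names, not library functions.

```python
# Type conversion: numbers stored as strings become real numeric values.
raw_prices = ["10", "25.5", "40"]
prices = [float(p) for p in raw_prices]

def min_max_scale(values):
    """Numeric scaling: map values linearly onto the range 0 to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]   # all values equal: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]

def label_encode(categories):
    """Encoding: replace each category with an integer code."""
    codes = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [codes[c] for c in categories], codes

scaled = min_max_scale(prices)
encoded, mapping = label_encode(["red", "blue", "red", "green"])
print(scaled)
print(encoded, mapping)
```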
Real-world Example
Scenario: Consider a global e-commerce platform collecting data on customer purchases. Prices are stored
in various currencies. Before analyzing global sales, data needs to be transformed to a standard currency, like
USD. Clean data ensures correct exchange rates are applied consistently.
Outcome: Accurate insights on global sales and profit margins.

Feature Engineering: Feature engineering is the process of transforming raw data into features that better
represent the underlying patterns in the data, making them more suitable for modeling.
Parsing: Breaking down data into understandable and usable components.
Feature Engineering: Optimizing these components (features) for analysis or modelling.
Common Feature Engineering Techniques
 Normalization: Scaling features to a standard scale (e.g., 0 to 1) so they contribute equally to model
performance.
 One-hot encoding: Converting a categorical variable into binary indicator columns, one per category, so that it
can be provided to ML algorithms.
 Binning: Converting continuous variables into discrete bins to handle outliers or make data more
interpretable.
 Feature extraction: Transforming high-dimensional data into a lower-dimensional form, retaining the most
important information.
 Feature interaction: Creating new features by combining two or more features, capturing interactions
between them.
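One-hot encoding and binning from the list above can be sketched in plain Python. The category values, bin edges, and labels are assumptions chosen for illustration.

```python
def one_hot(values):
    """Return (category order, one 0/1 indicator row per input value)."""
    categories = sorted(set(values))
    rows = [[1 if value == c else 0 for c in categories] for value in values]
    return categories, rows

def bin_value(value, edges, labels):
    """Assign `value` to a bin. `edges` must be ascending and
    `labels` must have exactly len(edges) + 1 entries."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]   # at or above the last edge

categories, encoded = one_hot(["red", "blue", "red"])
ages = [5, 17, 18, 40, 70]
age_groups = [bin_value(a, edges=[18, 65], labels=["child", "adult", "senior"])
              for a in ages]
print(categories, encoded)
print(age_groups)
```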

Date and Time Parsing: The process of converting textual data representing dates and times into a
standardized and usable format.
 Time-based data is common in various domains (e.g., finance, healthcare, logistics). Proper parsing ensures
it's utilized effectively for analysis and modeling.
 Proper date and time parsing is foundational for many data analysis tasks.
Common Formats & Challenges
Standard Date Formats: YYYY-MM-DD, DD/MM/YYYY, MM/DD/YYYY, etc.
Time Formats: HH:MM:SS, HH:MM AM/PM, etc.
Time Zones: UTC, GMT, PST, etc.
Daylight Saving Time: Adjustments can shift times.
Ambiguities: 03/04/2020 (Is it March 4th or April 3rd?)
Real-world Scenarios & Case Studies
Healthcare: Parsing appointment and treatment dates for patient data analysis.
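The ambiguity of 03/04/2020 shows why the expected format must be stated explicitly when parsing. The sketch below uses Python's standard `datetime` module; treating the sample timestamp as UTC is an assumption made for the example.

```python
from datetime import datetime, timezone

def parse_date(text, fmt):
    """Parse a date string using an explicit format instead of guessing."""
    return datetime.strptime(text, fmt).date()

ambiguous = "03/04/2020"
as_day_first = parse_date(ambiguous, "%d/%m/%Y")     # read as DD/MM/YYYY
as_month_first = parse_date(ambiguous, "%m/%d/%Y")   # read as MM/DD/YYYY

# Store dates in one unambiguous format (ISO 8601: YYYY-MM-DD).
iso = as_day_first.isoformat()

# 12-hour clock with AM/PM, then tagged with a time zone (assumed UTC here).
stamp = datetime.strptime("2020-04-03 02:30 PM", "%Y-%m-%d %I:%M %p")
stamp_utc = stamp.replace(tzinfo=timezone.utc)

print(as_day_first, as_month_first, iso, stamp_utc.isoformat())
```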

DATA CLEANSING AND ITS IMPORTANCE IN DATA VISUALIZATION
Data Cleansing is a systematic approach to ensuring the accuracy, consistency, and reliability of data by
identifying and rectifying errors and inconsistencies.
Every step, from data collection to visualization, can introduce errors. Without cleansing, these errors can
propagate and mislead.
Why is Data Cleansing Essential for Data Visualization?
 Ensuring Accuracy
 Maintaining Consistency
 Building Trust
 Enhancing Efficiency
Common Data Quality Issues
 Missing Data
 Duplicate Data
 Inaccurate Data
 Outliers
 Inconsistent Formats

Steps in Data Cleansing
Data cleansing can be explained as a five-step process.
 Step 1. Data Auditing
 Step 2. Data Cleaning
 Step 3. Data Verification
 Step 4. Data Reporting
 Step 5. Data Maintenance

Step 1. Data Auditing:
 Data auditing is the initial phase where data is meticulously examined to detect any anomalies,
inconsistencies, or inaccuracies.
 This process typically employs both descriptive statistics and visual methods.
 Descriptive statistics, such as means, medians, modes, standard deviations, and variance, can provide
insights into the central tendencies and dispersions within the data.
 Visual methods involve representing the data graphically to detect outliers or patterns that might not be
evident in tabulated data.
Tools:
 Histograms
 Scatter Plots
 Frequency Distributions
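The descriptive statistics and the frequency distribution used in auditing can be computed with Python's standard library. The sensor readings are hypothetical and include one suspicious value.

```python
from collections import Counter
from statistics import mean, median, mode, stdev, variance

# Hypothetical temperature readings; 95.0 looks suspicious.
readings = [21.5, 22.0, 21.5, 23.0, 22.5, 21.5, 95.0]

summary = {
    "mean": mean(readings),
    "median": median(readings),
    "mode": mode(readings),
    "stdev": stdev(readings),        # sample standard deviation
    "variance": variance(readings),  # sample variance
}
frequency = Counter(readings)        # frequency distribution of exact values

print(summary)
print(frequency.most_common())
```

A mean far above the median, as here, is a hint that a few extreme values are pulling the average and deserve a closer look.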

Step 2. Data Cleaning:
Once data auditing has identified problems, data cleaning rectifies these issues. It is the actual process of
amending or removing data in a dataset that is incorrect, incomplete, improperly formatted, or duplicated.
Methods:
Imputation: Missing data can distort analysis. One way to address this is by filling in the gaps with imputed
values.
Outlier Treatment: Not all outliers are errors; some might be genuine extreme values. The key is to ascertain
whether to keep them (if they're valid), adjust them (if they've been recorded incorrectly), or remove them (if
they're anomalies without relevance).
Deduplication: Duplicate data can arise due to various reasons, such as data merge errors or multiple data
entries. This step involves identifying and removing these redundant records to ensure each data point is
unique and accurate.
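Imputation and deduplication can be sketched as follows. Median imputation is only one of several possible strategies, and the sample values and function names are hypothetical.

```python
from statistics import median

def impute_median(values):
    """Imputation: replace missing entries (None) with the median of known values."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]

def deduplicate(records):
    """Deduplication: drop exact duplicate records, keeping the first occurrence."""
    seen = set()
    unique = []
    for record in records:
        key = tuple(sorted(record.items()))   # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

filled = impute_median([10, None, 30, 20, None])
orders = [{"id": 1, "qty": 2}, {"id": 2, "qty": 1}, {"id": 1, "qty": 2}]
clean_orders = deduplicate(orders)
print(filled)
print(clean_orders)
```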

Step 3. Data Verification:
After cleaning, it's vital to revisit the data to ensure no new errors were introduced during the cleaning
process. Additionally, data verification ensures that no essential information was inadvertently removed.
 Methods: This might involve cross-referencing with other trusted datasets, re-running the initial auditing
techniques, or even sampling and manually reviewing records.
Step 4. Data Reporting:
Documenting the cleansing process is crucial for several reasons. It maintains transparency, ensures
replicability, and builds trustworthiness.
 Methods: This involves creating a detailed report outlining what anomalies were identified, the
techniques used to address them, and any potential implications of the cleaning process. This
documentation can be invaluable for future reference, for stakeholders to understand the data's quality, or
for other data professionals working on the dataset.

Step 5. Data Maintenance:
Data doesn't remain static. As new data is added or as existing data evolves, it's crucial to ensure that it
maintains its quality.
 Methods: Implementing routine checks, validations, and periodic audits can help in identifying and
rectifying new issues promptly. Automated scripts, data quality software, or even manual reviews (for
smaller datasets) can be employed to maintain data cleanliness over time.

Data Cleansing Tools and Techniques
Manual Review:
 Pros: Full control, detailed understanding.
 Cons: Time-consuming, not feasible for large datasets.
Automated Cleaning Tools:
Examples: DataWrangler, OpenRefine, and Trifacta.
 Pros: Faster, can handle large datasets.
 Cons: Involves a learning curve, can miss domain-specific nuances.
Programming Libraries:
Examples: Pandas in Python, dplyr in R.
 Pros: Highly customizable, can integrate with other data processing steps.
 Cons: Requires programming knowledge.
Data Quality Software:
Tools like Talend and Informatica offer end-to-end data management solutions.
 Pros: Comprehensive, often have built-in workflows.
 Cons: Can be expensive, might have a steep learning curve.

Reusable Dynamic Components using the General Update Pattern
Interactive visualizations must respond when their underlying data changes. The General Update Pattern in D3.js
(a popular data visualization library for JavaScript) provides a mechanism for this, facilitating the creation of
dynamic and efficient visualizations.
What is the General Update Pattern?
The General Update Pattern is a methodology used in D3.js to handle data changes in a visualization. It breaks the
data-binding process into three distinct phases:
 Enter: Data points that have no corresponding element yet; new elements must be added to the visualization for them.
 Update: Existing elements that are still matched to data and may need their attributes refreshed to reflect changes.
 Exit: Elements that no longer have associated data and need to be removed.

The Importance of Reusability
 Efficiency: Avoids re-writing code, saving time and reducing errors.
 Consistency: Ensures visualizations maintain a consistent look and feel across different parts of an application or
different applications.
 Modularity: Allows for easier debugging, testing, and understanding, as each component can be developed and
maintained separately.

Implementing the General Update Pattern
 Binding Data
Start by binding a dataset to a selection. This binding process pairs data points from the dataset with DOM
elements in the visualization.
 Handling the Enter Selection
 Create new elements for each of these data points.
 Define attributes and styles for these new elements based on their data.
 Handling the Update Selection
• Adjust attributes and styles of these elements to reflect any changes in their bound data.
 Handling the Exit Selection
• Remove these elements or adjust them as necessary, e.g., fading them out or shrinking them.
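The logic of the three selections can be illustrated outside D3.js. The sketch below is plain Python and is not D3's API: on-screen elements are modeled as dictionaries, and `data_join`, `render`, and `by_name` are hypothetical names.

```python
def data_join(rendered, data, key):
    """Split a new dataset into enter, update, and exit groups.

    rendered: dict mapping key -> element currently on screen
    data:     list of new data points
    key:      function returning the identity of a data point
    """
    new_by_key = {key(d): d for d in data}
    enter = [d for k, d in new_by_key.items() if k not in rendered]
    update = [(rendered[k], d) for k, d in new_by_key.items() if k in rendered]
    exit_ = [el for k, el in rendered.items() if k not in new_by_key]
    return enter, update, exit_

def render(rendered, data, key):
    """Apply one update cycle and return (entered, updated, exited) counts."""
    enter, update, exit_ = data_join(rendered, data, key)
    for d in enter:                          # enter: create new elements
        rendered[key(d)] = {"datum": d, "height": d["value"]}
    for element, d in update:                # update: refresh existing elements
        element["datum"] = d
        element["height"] = d["value"]
    for element in exit_:                    # exit: remove stale elements
        del rendered[key(element["datum"])]
    return len(enter), len(update), len(exit_)

def by_name(d):
    return d["name"]

screen = {}
first = render(screen, [{"name": "A", "value": 3}, {"name": "B", "value": 5}], by_name)
second = render(screen, [{"name": "B", "value": 8}, {"name": "C", "value": 1}], by_name)
print(first, second, sorted(screen))
```

On the second call, "C" enters, "B" is updated in place, and "A" exits, so only the elements that changed are touched.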

Making Components Reusable
 Encapsulation
Wrap your visualization code inside a function. This function can then be called whenever you want to create a
new instance of the visualization.
 Customizable Properties
Allow users of your component to customize its properties. This could be achieved using getter-setter functions
for each property.
 Event Handling
For interactivity, allow users to define custom event handlers. This ensures that the component can respond to
user interactions in a way that suits the specific application it's being used in.
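The three ideas above (encapsulation, getter-setter properties, and custom event handlers) can be sketched in Python. The class `Chart` and its methods are illustrative assumptions that mimic the chainable style often used for D3 components; they are not part of any library.

```python
_UNSET = object()   # sentinel that distinguishes "no argument" from None

class Chart:
    """Hypothetical reusable chart component with chainable getter-setters."""

    def __init__(self):
        self._width = 400
        self._height = 300
        self._handlers = {}

    def width(self, value=_UNSET):
        if value is _UNSET:
            return self._width    # called without argument: act as getter
        self._width = value
        return self               # called with argument: set and allow chaining

    def height(self, value=_UNSET):
        if value is _UNSET:
            return self._height
        self._height = value
        return self

    def on(self, event, handler):
        """Register a user-supplied handler for an event name."""
        self._handlers[event] = handler
        return self

    def emit(self, event, payload):
        """Invoke the handler for `event`, if one was registered."""
        handler = self._handlers.get(event)
        return handler(payload) if handler else None

    def render(self, data):
        return f"chart {self._width}x{self._height} with {len(data)} points"

clicked = []
chart = Chart().width(640).height(480).on("click", clicked.append)
description = chart.render([(1, 2), (3, 4)])
chart.emit("click", (1, 2))
print(description, clicked)
```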

REUSABLE SCATTER PLOT
A scatter plot, one of the most versatile visualization tools, displays individual data points on a two-dimensional
plane.
Components of a Scatter Plot
1. Axes and Scale
 X and Y Axes: Represent the domains of the two variables being compared.
 Scale: The mechanism that translates data values into pixel values. Common scales include linear,
logarithmic, and time-based.
2. Data Points
 Typically represented as circles, though other symbols can be used.
 Position determined by the data value for each variable.
3. Optional Components
 Gridlines: Enhance readability.
 Labels: Provide context or additional data about specific points.
 Trend Line: Indicates a general direction of data points.
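A scale is simply a function from data values to pixel positions. The sketch below builds a linear and a logarithmic scale in Python; the domains and pixel ranges are arbitrary examples, and the function names are hypothetical.

```python
import math

def linear_scale(domain, pixel_range):
    """Return a function mapping values in `domain` linearly onto `pixel_range`."""
    d0, d1 = domain
    r0, r1 = pixel_range
    def scale(value):
        t = (value - d0) / (d1 - d0)   # position within the domain, 0 to 1
        return r0 + t * (r1 - r0)
    return scale

def log_scale(domain, pixel_range):
    """Same idea, but equal ratios (not equal differences) get equal pixel steps."""
    d0, d1 = domain
    inner = linear_scale((math.log10(d0), math.log10(d1)), pixel_range)
    def scale(value):
        return inner(math.log10(value))
    return scale

x = linear_scale((0, 100), (0, 500))
y = linear_scale((0, 50), (300, 0))   # inverted range: pixel y grows downward
population = log_scale((1, 1000), (0, 300))

print(x(50), y(25), population(10))
```

Inverting the pixel range for the y axis is a common choice because screen coordinates usually start at the top.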

 A reusable scatter plot is a type of data visualization that displays individual data points as markers on a
graph.
 The term "reusable" implies that the scatter plot is designed in a way that allows it to be easily used and
applied to different datasets or situations without requiring extensive modification.
 In a scatter plot, each data point is represented by a marker, typically a dot, and the position of the dot is
determined by the values of two variables (usually one on the x-axis and one on the y-axis).
 This allows for a visual representation of the relationship, if any, between the two variables.

COMMON VISUALIZATION IDIOMS
Bar Chart
 A bar chart represents data using vertical or horizontal rectangular bars.
 The length (or height) of each bar is proportional to the value it represents.
 Bar charts are also known as bar graphs. They have three major characteristics:
 They compare data across different groups.
 They show the relationship between two axes: one axis represents the categories and the other
represents the discrete values.
 They can show major changes in the data over a period of time.

Types of Bar Chart
The grouped bar chart is also referred to as the clustered bar chart (graph).
The stacked bar chart is also known as the composite bar chart.

Pie Chart
 A pie chart is a pictorial representation of data in the form of a circular chart or pie where the slices of
the pie show the size of the data.
 A list of numerical variables along with categorical variables is needed to represent data in the form of
a pie chart.
Example: The whole pie represents a value of 100 and is divided into 10 slices or sectors. The colors
represent the ingredients used to prepare a cake. What is the exact quantity of each ingredient
represented by each color in the pie chart?
Quantity of Flour 30
Quantity of Sugar 20
Quantity of Egg 40
Quantity of Butter 10
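The arithmetic behind this example can be worked through in code. Assuming the 10 slices are equal, each slice is worth 100 / 10 = 10 units, and each ingredient's angle is its share of the total times 360 degrees.

```python
ingredients = {"Flour": 30, "Sugar": 20, "Egg": 40, "Butter": 10}

total = sum(ingredients.values())   # the whole pie
slice_value = total / 10            # assumption: 10 equal slices

# For each ingredient: (number of slices, sector angle in degrees).
sectors = {
    name: (quantity / slice_value, quantity * 360 / total)
    for name, quantity in ingredients.items()
}

for name, (slices, angle) in sectors.items():
    print(f"{name}: {slices:.0f} slices, {angle:.0f} degrees")
```

The four angles add up to 360 degrees, which is a quick check that the shares cover the whole pie.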

Line Graph
 Line graphs, also called line charts, are used to represent quantitative data collected for a specific subject
over a specific time interval.
 All the data points are connected by a line.
 Data points represent the observations collected in a survey or study.
 The line graph has an x-axis and a y-axis.

Line Graph:
Shows continuous data over a period, set against a common scale, with individual data points connected;
ideal for showing growth rates or trends at even intervals.
Scatter Plot
It works best when comparing large numbers of data points without regard to time. This tool is very
powerful when we are trying to show the relationship between two variables (x and y-axis), for example, a
person's weight and height.

AREA CHART
 An area chart, also known as a mountain chart, is a data visualization type that combines the
appearance of a line chart and a bar chart.
 It is commonly used to show how numerical values change based on a second variable, usually a time
period.

TYPES OF AREA CHART
 Basic area chart: shows a single data series, e.g., tracking the number of weekly profile views on a
business's LinkedIn page.
 Overlapping area chart: compares two or more data groups based on a specific variable.
 Stacked area chart (cumulative area chart): goes beyond comparing values to show how each group
contributes to a total.
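In a stacked area chart, each series is drawn on top of the running total of the series before it. The sketch below computes those lower and upper boundaries; the traffic figures and the function name `stack` are invented for illustration.

```python
# Hypothetical weekly visits by channel (three weeks).
series = {
    "Organic": [10, 12, 15],
    "Paid": [5, 8, 6],
    "Referral": [2, 3, 4],
}

def stack(series):
    """Return {name: [(lower, upper), ...]} with each band stacked on the previous ones."""
    length = len(next(iter(series.values())))
    baseline = [0] * length
    bands = {}
    for name, values in series.items():
        top = [b + v for b, v in zip(baseline, values)]
        bands[name] = list(zip(baseline, top))
        baseline = top   # the next series starts where this one ended
    return bands

bands = stack(series)
for name, band in bands.items():
    print(name, band)
```

The upper boundary of the last band is the total for each week, which is what lets the reader see each group's contribution to the whole.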

DIFFERENT CRITERIA TO CHOOSE CHARTS
 Choosing the right chart type is crucial in data visualization.
 It ensures that data is presented clearly, accurately, and effectively.
 Selecting an inappropriate chart can lead to misinterpretation of data or obscure critical insights.
Nature of Data
1. Qualitative vs. Quantitative
Qualitative Data (Categorical): Data that can be divided into categories.
Suitable Charts: Pie charts, bar charts, treemaps.
Quantitative Data (Numerical): Data that can be measured.
Suitable Charts: Line charts, scatter plots, histograms.
2. Time Series vs. Non-Time Series
Time Series Data: Data points indexed in time order.
Suitable Charts: Line charts, area charts, stacked area charts.
Non-Time Series Data: Data without a chronological order.
Suitable Charts: Bar charts, pie charts, scatter plots.


Number of Variables
1. Univariate Data
Data focusing on a single variable.
Suitable Charts: Histograms, pie charts, line charts (for time series).
2. Bivariate Data
Data exploring the relationship between two variables.
Suitable Charts: Scatter plots, line charts, bar charts.
3. Multivariate Data
Data involving more than two variables.
Suitable Charts: Scatter plot matrices, parallel coordinate plots, radar charts.

Relationships and Comparisons
1. Comparing Categories : Viewing the difference in values across different categories.
Suitable Charts: Bar charts, column charts, dot plots.
2. Distribution of Data : Observing the spread and shape of a dataset.
Suitable Charts: Histograms, box plots, density plots.
3. Trend Over Time : Analyzing patterns or changes over a period.
Suitable Charts: Line charts, area charts, waterfall charts.
4. Correlation : Examining the relationship between two or more variables.
Suitable Charts: Scatter plots, bubble charts.

Data Volume and Complexity
1. Small Data Sets : Emphasize individual data points or fine details.
Suitable Charts: Dot plots, bar charts.
2. Large Data Sets : Offer a general overview or highlight trends.
Suitable Charts: Line charts, heatmaps, histograms.
3. Hierarchical Data : Data with inherent structure or nested categories.
Suitable Charts: Treemaps, dendrograms, sunburst charts.

Audience and Context
1. Level of Expertise
General Audience: Opt for simpler, more intuitive charts.
Expert Audience: Can utilize more complex visualizations.
2. Presentation Medium
Digital Displays: Interactive visualizations like dynamic dashboards.
Print: Static visualizations with high clarity.
3. Storytelling vs. Exploration
Storytelling: Use charts that drive home a specific point or narrative.
Exploration: Use charts that allow users to delve into the data and discover insights.

Aesthetics and Design
1. Clarity and Simplicity
Avoid clutter, prioritize readability.
2. Color Considerations
Ensure color choices are meaningful and accessible.
3. Consistency
If presenting multiple charts, maintain a consistent design language.
