A brief introduction to data visualisation using R. It covers both basic and advanced visualisation techniques with sample code. The datasets used are mostly built into R and its packages.
2. About Me:-
I am Baijayanti Chakraborty, a postgraduate student at the Great Lakes Institute of Management, pursuing a PG in Business Analytics and Business Intelligence.
You can find me on:
1. LinkedIn: https://www.linkedin.com/in/baijayanti-chakraborty/
2. Twitter: twitter.com/baijayantic
3. Mail: baijayantichakraborty96@gmail.com
4. Github: https://github.com/baijayantichakraborty
5. Kaggle: https://www.kaggle.com/baijayanti94
3. Today’s Spot Of Interest
❖ What is visualization and why do we need it?
❖ Basic Visualizations
❖ Advanced Visualizations
4. What is visualization and why do we need it?
Data visualization is the art of turning numbers into useful knowledge. We all know that an image is easier to understand than a large amount of written information.
Let's consider an example: a snip from the iris dataset, which is already present in R. It is quite difficult to comprehend anything from such a large block of raw data, so to make it easier to understand we use visualization techniques.
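The table snip itself is an image on the original slide and is not reproduced here. As a small illustrative sketch (not from the original deck), the base-R code below prints the raw table and then draws one plot that makes the group structure immediately visible; iris ships with R:

# Peek at the raw numbers: structure is hard to see in a table
head(iris)

# One scatter plot makes the species separation obvious
plot(iris$Petal.Length, iris$Petal.Width,
     col = iris$Species,        # color points by species
     pch = 19,
     xlab = "Petal Length", ylab = "Petal Width",
     main = "iris: petal dimensions by species")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)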
8. Selecting the right kind of chart
There are four basic presentation types:
1. Comparison
2. Composition
3. Distribution
4. Relationship
To determine which of these types is best suited for the data at hand, we should be able to answer the questions below:
● How many variables do you want to show in a single chart?
● How many data points will you display for each variable?
● Will you display values over a period of time, or among items or groups?
11. Histogram
A histogram is a plot that breaks the data into bins (or breaks) and shows the frequency distribution of those bins.
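A minimal sketch with base R's hist(), using the built-in iris data (the dataset choice is an assumption here); the breaks argument controls the number of bins:

# Histogram of sepal length; 'breaks' sets the number of bins
hist(iris$Sepal.Length,
     breaks = 10,               # try other values to see the effect
     col = "steelblue",
     xlab = "Sepal Length",
     main = "Distribution of Sepal Length")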
12. Bar/Line Chart
● Bar charts are recommended when you want to plot a categorical variable, or a combination of a continuous and a categorical variable.
● Line charts are commonly preferred when analysing a trend spread over a period of time.
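A small sketch of both chart types in base R; the built-in mtcars and AirPassengers datasets are my choices, not the slides':

# Bar chart: counts of a categorical variable (cylinders in mtcars)
barplot(table(mtcars$cyl),
        xlab = "Cylinders", ylab = "Count",
        main = "Cars by cylinder count")

# Line chart: a trend over time (monthly airline passengers)
plot(AirPassengers, type = "l",
     ylab = "Passengers (thousands)",
     main = "Air passengers over time")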
14. Boxplots
Box plots are used to plot a combination of categorical and continuous variables. This plot is useful for visualizing the spread of the data and detecting outliers. It shows five statistically significant numbers: the minimum, the 25th percentile, the median, the 75th percentile and the maximum. A boxplot can be created with code along the lines of the example below.
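The slide's original code is an image and is not preserved in the text; this is a plausible reconstruction using iris, since the speaker notes below mention the ~ formula with Sepal Length and Species:

# Spread of Sepal Length within each Species; boxes also reveal outliers
boxplot(Sepal.Length ~ Species, data = iris,
        col = c("lightblue", "lightgreen", "lightpink"),
        xlab = "Species", ylab = "Sepal Length",
        main = "Sepal Length by Species")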
20. Some advanced packages for visualisation in R are:-
● Lattice Graphs:- The lattice package is essentially an improvement upon the R Graphics package and is used to visualize multivariate data. Some kinds of visualisations possible with the lattice package are:-
1. Kernel Density Plots
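A hedged sketch of a kernel density plot with lattice, drawing one curve per group (the iris data and grouping are my assumptions):

library(lattice)

# Kernel density of Sepal Length, one curve per Species
densityplot(~ Sepal.Length, data = iris,
            groups = Species,      # separate curve for each species
            plot.points = FALSE,
            auto.key = TRUE,
            main = "Kernel density by species")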
22. ● ggplot2:- This package is one of the most widely used visualisation packages in R. It enables users to create sophisticated visualisations with little code, using the Grammar of Graphics.
● Plotly is an R package that creates interactive web-based graphs via the open-source JavaScript graphing library plotly.js. It can also easily translate ggplot2 graphs into web-based versions.
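A small sketch showing both ideas together: a ggplot2 graph built from layered grammar components, then handed to plotly's ggplotly() for an interactive version (the iris example is mine):

library(ggplot2)
library(plotly)

# Grammar of Graphics: data + aesthetics + geom layers
p <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width,
                      colour = Species)) +
  geom_point(size = 2) +
  labs(title = "Sepal dimensions by species")

p            # static ggplot2 version
ggplotly(p)  # interactive web-based version via plotly.js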
23. Advanced Scatter Plots
Besides the basic version of scatterplots, we can also create them using the ggplot2 library. The code below gives a taste of the same.
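The slide's own code is not preserved in the text; a plausible ggplot2 version might look like this, layering a fitted trend line on top of the points:

library(ggplot2)

# Scatter plot with a linear trend line per species
ggplot(iris, aes(x = Petal.Length, y = Petal.Width,
                 colour = Species)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +  # fitted line, no ribbon
  labs(title = "Petal dimensions with linear fits")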
25. HeatMaps
A heat map uses the intensity (density) of colors to display the relationship between two, three or many variables in a two-dimensional image. It allows us to explore two dimensions as the axes and a third dimension through the intensity of color.
In the example shown on the slide, the color of the bars in the heat map depends on the cyl column of the dataset. The dataset used here is mtcars, an inbuilt dataset.
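The exact code from the slide is not in the text; one reconstruction with base R's heatmap() adds a side bar of colors keyed to cyl, which matches the description above (the color choices are assumptions):

# Scale the numeric columns so variables are comparable
m <- as.matrix(scale(mtcars))

# Side bar of colors keyed to the cyl column (4, 6 or 8 cylinders)
cyl_col <- c("4" = "gold", "6" = "orange", "8" = "red")

heatmap(m,
        RowSideColors = cyl_col[as.character(mtcars$cyl)],
        main = "mtcars heatmap (row bar colored by cyl)")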
26. HeatMaps contd….
Using the plotly library, heatmaps can be made interactive. The code below shows one way we can use plotly.
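Again the slide's code is an image and not preserved; this is a minimal sketch of one way to do it, reusing the scaled mtcars matrix:

library(plotly)

# Interactive heatmap of the scaled mtcars matrix
m <- as.matrix(scale(mtcars))
plot_ly(x = colnames(m), y = rownames(m), z = m,
        type = "heatmap")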
27. Correlogram
A correlogram is used to test the level of correlation among the variables available in the dataset. The cells of the matrix can be shaded or colored to show the correlation value.
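The slides do not name the package used here; corrplot is one common choice and its defaults match the blue-positive, red-negative scheme described in the speaker notes below:

library(corrplot)

# Correlation matrix of the mtcars variables
M <- cor(mtcars)

# Shaded correlogram: blue = positive, red = negative
corrplot(M, method = "color")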
28. Correlogram contd...
It is possible to use ggplot2 aesthetics on the chart, for instance to color each category. We can use a new library, GGally, and see how different variations are made to the simple correlogram.
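A sketch following that idea with GGally's ggpairs(), coloring each category through a ggplot2 aesthetic (the iris example is an assumption):

library(ggplot2)
library(GGally)

# Pairwise plot matrix of iris, colored by Species
ggpairs(iris,
        mapping = aes(colour = Species, alpha = 0.5),
        columns = 1:4)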
29. Correlogram contd….
Change the type of plot used on each part of the correlogram. This is done with the upper and lower argument.
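For example (a sketch; the exact panel choices on the original slide are unknown):

library(GGally)

# Different panel types above and below the diagonal
ggpairs(iris, columns = 1:4,
        upper = list(continuous = "cor"),     # correlation values
        lower = list(continuous = "smooth"))  # scatter + fitted line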
30. Area Chart
An area chart is used to show continuity across a variable or dataset. It is very similar to a line chart and is commonly used for time series plots. Alternatively, it is also used to plot continuous variables and analyze their underlying trends.
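A hedged ggplot2 sketch of an area chart over time, using the economics data frame that ships with ggplot2 (my choice of dataset):

library(ggplot2)

# Area chart of US unemployment over time
ggplot(economics, aes(x = date, y = unemploy)) +
  geom_area(fill = "steelblue", alpha = 0.6) +
  labs(title = "Unemployment over time",
       y = "Unemployed (thousands)")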
31. 3D Plots
● A 3D plot in R can be created with the help of the scatterplot3d package.
● scatterplot3d is very simple to use, and it can easily be extended by adding supplementary points or regression planes to already-generated graphics.
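A minimal scatterplot3d sketch, again using iris as an assumed dataset:

library(scatterplot3d)

# 3D scatter of three iris measurements, colored by species
scatterplot3d(iris$Sepal.Length, iris$Sepal.Width, iris$Petal.Length,
              color = as.numeric(iris$Species),
              pch = 19,
              xlab = "Sepal Length", ylab = "Sepal Width",
              zlab = "Petal Length")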
33. Quick Information
For quick reference you can check the cheatsheets page of RStudio:
https://rstudio.com/resources/cheatsheets/
References:-
1. https://rstudio.com/resources/cheatsheets/
2. https://www.slant.co/topics/2354/~best-data-visualization-tools-for-massive-datasets
3. https://policyviz.com/product/core-principles-of-data-visualization-cheatsheet/
4. https://eazybi.com/blog/data_visualization_and_chart_types/
5. https://www.r-graph-gallery.com/199-correlation-matrix-with-ggally.html
6. https://towardsdatascience.com/a-guide-to-data-visualisation-in-r-for-beginners-ef6d41a34174?#0689
This is like a million-dollar question, because before we start any kind of analysis we need to know the insights hidden in the data. The relations among the various variables in the data need to be understood, and what better way to understand them than with visual aids. An outlier is an observation that lies an abnormal distance from other values in a random sample from a population.
For a proper understanding of datasets, we need to know which type of chart should be used when….
1. Used for continuous variables.
2. It breaks the data into bins and shows the frequency distribution of these bins.
3. We can always change the bin size and see the effect it has on the visualization.
brewer.pal makes the color palettes from ColorBrewer available as R palettes.
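For instance (a small illustration, not from the slides):

library(RColorBrewer)

# Pull 3 colors from the "Set2" palette and use them in a boxplot
pal <- brewer.pal(3, "Set2")
boxplot(Sepal.Length ~ Species, data = iris, col = pal)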
Boxplots are also used to detect the outliers present in the dataset.
Outlier detection and removal is an essential step of successful data exploration.
We can find the median, and also treat the outliers.
By using the ~ sign, we can visualize how the spread (of Sepal Length) varies across the various categories (of Species). In the last two graphs we have seen examples of color palettes. A color palette is a group of colors that is used to make the graph more appealing and to help create visual distinctions in the data.
Lattice enables the use of trellis graphs. Trellis graphs exhibit the relationship between variables that depend on one or more other variables.
The Grammar of Graphics is a general scheme for data visualization which breaks up graphs into semantic components such as scales and layers. The popularity of ggplot2 has increased tremendously in recent years, since it makes it possible to create graphs that contain both univariate and multivariate data in a very simple manner.
Advanced visualisations include graphs like heat charts, geographical maps, 3D charts etc., which can be easily made using visualisation tools like Tableau.
The darker the color, the higher the correlation between variables. Positive correlations are displayed in blue and negative correlations in red. Color intensity is proportional to the correlation value.
GGally extends ggplot2 by adding several functions to reduce the complexity of combining geoms with transformed data. Some of these functions include a pairwise plot matrix, a scatterplot matrix, a parallel coordinates plot, a survival plot, and several functions to plot networks.