2. Topics Covered in this chapter
● Overview of Descriptive Statistics (Central Tendency, Variability),
● Data Visualization – Definition,
● Visualization Techniques – Tables, Cross Tabulations, Charts,
● Data Dashboards Using MS-Excel & SPSS.
3. Introduction to Descriptive Statistics
Descriptive statistics describe, present, summarize, and organize data through numerical
calculations, graphs, or tables. Two of the most common kinds of measurement in descriptive
statistics are the central tendency and the variability of the dataset.
Descriptive statistical analysis helps us understand our data and is an important part of machine
learning. Many people skip this step and therefore lose valuable insight into their data, which often
leads to wrong conclusions.
Descriptive analytics, or business intelligence, uses historical information to answer the question “What
Happened?” Think of it as a rear-view mirror into business performance or a summary view of facts and figures in
an understandable format to either inform or prepare data for further analysis. Observations, case studies, and
surveys form the basis of descriptive analytics.
5. Measure of Central Tendency
It describes a whole set of data with a single value that represents the centre of its
distribution. There are three main measures of central tendency:
1. Mean: The sum of the observations divided by the sample size. It is not a robust statistic because it
is affected by extreme values: very large or very small values (i.e. outliers) can distort the result.
2. Median: The middle value of the sorted data. It splits the data in half and is also called the 50th
percentile. It is much less affected by outliers and skewed data than the mean. If the number of elements
in the dataset is odd, the middle element is the median. If the number of elements is even, the median
is the average of the two central elements.
3. Mode: The value that occurs most frequently in a dataset. A dataset has no mode if no value is
repeated, and it can have more than one mode if several values share the highest frequency. It is the
only measure of central tendency that can be used for categorical variables.
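The three measures above can be sketched with Python's standard `statistics` module. The numbers and colour labels are hypothetical values chosen only to show how a single outlier pulls the mean but not the median or the mode.

```python
import statistics

# Hypothetical sample: nine typical values plus one extreme outlier (100).
values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 100]

mean = statistics.mean(values)        # sum of observations / sample size
median = statistics.median(values)    # even count: average of the two central elements
modes = statistics.multimode(values)  # list of all most-frequent values

print("mean:", mean)
print("median:", median)
print("modes:", modes)

# The mode also works on categorical data, where mean and median are undefined.
colours = ["red", "blue", "red", "green", "blue", "red"]  # hypothetical labels
colour_mode = statistics.mode(colours)
print("most common colour:", colour_mode)
```

Here the outlier drags the mean well above most of the observations, while the median and the mode stay at a typical value.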
7. Measures of Variability
Measures of variability, also known as the spread of the data, describe how similar or varied the
set of observations is. The most popular variability measures are the range, interquartile range
(IQR), variance, and standard deviation.
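As a minimal sketch, the four measures can be computed with the standard `statistics` module. The data are hypothetical, and the quartiles use the module's default "exclusive" method, which is an assumption here; other tools (Excel and SPSS included) may use slightly different quartile conventions.

```python
import statistics

data = [4, 8, 15, 16, 23, 42]  # hypothetical observations

# Range: distance between the largest and smallest observation.
data_range = max(data) - min(data)

# Quartiles: three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: spread of the middle half of the data

# Sample variance and standard deviation (both divide by n - 1).
variance = statistics.variance(data)
std_dev = statistics.stdev(data)

print("range:", data_range)
print("IQR:", iqr)
print("variance:", variance)
print("standard deviation:", std_dev)
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the data.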
11. Normal Distribution
The normal distribution describes how large samples of data often look when they are plotted. It is
sometimes called the “bell curve” or the “Gaussian curve”.
Many inferential statistical tests and probability calculations assume that the data follow a normal
distribution. If your data are not normally distributed, you need to be careful about which
statistical tests you apply to them, since they could lead to wrong conclusions.
In a perfect normal distribution, each side is an exact mirror image of the other.
13. A normal distribution with a mean of 0 and a standard deviation of 1 is called a standard normal
distribution. The total area under the standard normal distribution curve is 1.
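The area and symmetry properties can be checked numerically. The sketch below uses `statistics.NormalDist` from the standard library; the integration limits of ±8 and the step count are arbitrary choices for the approximation, not part of the definition.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Approximate the total area under the curve with the midpoint rule on [-8, 8].
# Beyond 8 standard deviations the remaining area is negligible.
steps = 16_000
width = 16 / steps
area = sum(
    standard_normal.pdf(-8 + (i + 0.5) * width) for i in range(steps)
) * width

# Symmetry: the tail below -1.5 matches the tail above +1.5.
left_tail = standard_normal.cdf(-1.5)
right_tail = 1 - standard_normal.cdf(1.5)

# Share of the distribution within one standard deviation of the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print("approximate total area:", round(area, 6))
print("left tail:", round(left_tail, 6), "right tail:", round(right_tail, 6))
print("within one standard deviation:", round(within_one_sd, 4))
```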
14. Central Limit Theorem
If we take the means of random samples from a distribution and plot those means, the graph
approaches a normal distribution when the samples are sufficiently large and enough of them are taken.
The theorem also says that the mean of the sample means will be approximately equal to the
population mean.
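A small simulation can illustrate the theorem. Everything specific here is an assumption made for the sketch: the skewed (exponential) population, the sample size of 50, the 2,000 samples, and the fixed random seed.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical, clearly non-normal population: exponential with mean 1.
population = [random.expovariate(1.0) for _ in range(100_000)]
population_mean = statistics.fmean(population)

sample_size = 50    # observations per sample
num_samples = 2000  # number of samples drawn

# Draw many random samples and record the mean of each one.
sample_means = [
    statistics.fmean(random.sample(population, sample_size))
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)

print("population mean:", round(population_mean, 3))
print("mean of sample means:", round(mean_of_means, 3))
print("spread of population:", round(statistics.stdev(population), 3))
print("spread of sample means:", round(statistics.stdev(sample_means), 3))
```

Plotting a histogram of `sample_means` should show a roughly bell-shaped curve centred near the population mean, even though the population itself is strongly skewed.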
15. Normal distributions with higher standard deviations are flatter, i.e. more spread out, than
those with lower standard deviations.
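The flattening effect can be seen by comparing the height of the curve at the mean with the width of the interval holding the central 95% of values. The standard deviations 1, 2, and 4 are arbitrary illustrative choices.

```python
from statistics import NormalDist

peaks = {}   # height of the curve at the mean, per standard deviation
widths = {}  # width of the interval containing the central 95% of values

for sigma in (1, 2, 4):
    dist = NormalDist(mu=0, sigma=sigma)
    peaks[sigma] = dist.pdf(0)
    widths[sigma] = dist.inv_cdf(0.975) - dist.inv_cdf(0.025)
    print(f"sigma={sigma}: peak={peaks[sigma]:.4f}, 95% width={widths[sigma]:.2f}")
```

As the standard deviation grows, the peak gets lower and the central interval gets wider, which is what "flatter" means here.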
17. Introduction:
In addition to descriptive analytics, data visualization is important for predictive and
prescriptive analytics. Visual content is often much easier to understand than a verbal
description or a mathematical model.
Visualizing a pattern also helps analysts select the most appropriate mathematical
function to model the phenomenon. Visualizing the results often helps in understanding
and gaining insight into model output and solutions.
Data visualization is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible way
to see and understand trends, outliers, and patterns in data.
In the world of Big Data, data visualization tools and technologies are essential to
analyze massive amounts of information and make data-driven decisions.
20. Different Types of Analysis for Data Visualization
Mainly, there are three different types of analysis for Data Visualization:
1. Univariate Analysis: We analyze a single feature on its own to study its properties.
2. Bivariate Analysis: We compare the data between exactly 2 features.
3. Multivariate Analysis: We compare more than 2 variables.
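The difference between the three types can be sketched on a tiny dataset before any chart is drawn. The feature names and values are hypothetical, and Pearson correlation is only one possible way to compare features; it is used here as an illustrative choice.

```python
import statistics

# Hypothetical dataset: three features, one value per student.
dataset = {
    "hours_studied": [1, 2, 3, 4, 5, 6],
    "exam_score": [52, 55, 61, 68, 74, 80],
    "absences": [9, 7, 6, 4, 3, 1],
}

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Univariate: summarize one feature on its own.
scores = dataset["exam_score"]
univariate = {
    "mean": statistics.fmean(scores),
    "median": statistics.median(scores),
    "min": min(scores),
    "max": max(scores),
}

# Bivariate: relate exactly two features.
r_hours_score = pearson(dataset["hours_studied"], dataset["exam_score"])

# Multivariate: compare more than two features, here as pairwise correlations.
names = list(dataset)
matrix = {(a, b): pearson(dataset[a], dataset[b]) for a in names for b in names}

print("univariate summary:", univariate)
print("hours vs score correlation:", round(r_hours_score, 3))
for a in names:
    print(a, [round(matrix[(a, b)], 2) for b in names])
```

In a visual workflow these would typically correspond to a histogram (univariate), a scatter plot (bivariate), and a matrix or heat map of pairwise relationships (multivariate).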
21. The different types of visualizations
When you think of data visualization, your first thought probably immediately goes to simple bar graphs or pie
charts. While these may be an integral part of visualizing data and a common baseline for many data graphics,
the right visualization must be paired with the right set of information. Simple graphs are only the tip of the
iceberg. There’s a whole selection of visualization methods to present data in effective and interesting ways.
Common general types of data visualization:
● Charts
● Tables
● Graphs
● Maps
● Infographics
● Dashboards
22. Contd…
More specific examples of methods to visualize data:
● Cartogram
● Circle View
● Heat Map
● Network
● Polar Area
● Radial Tree
● Scatter Plot (2D or 3D)
● Streamgraph
● Text Tables
● Timeline
● Treemap
● Wedge Stack Graph
● Area Chart
● Bar Chart
● Box-and-whisker Plots
● Matrix
● Dot Distribution Map
● Gantt Chart
● Highlight Table
● Word Cloud
● Bubble Cloud
● Bullet Graph
● Histogram
● And any mix-and-match combination in a dashboard!
24. DASHBOARDS
Definition of dashboard
A dashboard is an information management tool that
receives data from a linked database to provide data
visualizations. It typically offers high-level information in
one view that end users can use to answer a single question.
In many cases, dashboards can be configured to control
which information is shown to the end user and how it is
visualized, e.g. as numbers, charts, or graphs.
25. The importance of a dashboard
Dashboards provide users from all different businesses the ability to
monitor performance, create reports, and set estimates and targets for
the future.
Benefits of data dashboards:
● A visual representation of performance
● The ability to identify trends
● An easy way of measuring efficiency
● The means to generate detailed reports with a single click
● The capacity to make more informed decisions
● Total visibility of all systems, campaigns, and actions
● Quick identification of data outliers and correlations
27. Data dashboards
Making the data visible and accessible to employees at all levels is a hallmark of
effective modern organizations. A dashboard is a visual representation of a set of
key business measures. It is derived from the analogy of an automobile’s control
panel, which displays speed, gasoline level, temperature and so on. Dashboards
provide important summaries of key business information to help manage a business
process or function.
Dashboards might include tabular as well as visual data to allow managers to locate
key data quickly.
An effective dashboard should capture all the key information that the users need for
making good decisions. Important business metrics are often called key
performance indicators (KPIs).
28. People tend to look at data at the top left first, so the most important charts should
be positioned there.
An important principle in dashboard design is to keep it simple: do not clutter the
dashboard with too much information or use formats such as 3-D charts that
do not clearly convey the information.
https://transformingindia.mygov.in/budget-2023/
30. Creating charts in Microsoft Excel
Microsoft Excel provides a comprehensive charting capability with many features.
With a little experimentation, you can create professional charts for business
analyses and presentations.
These include vertical and horizontal bar charts, line charts, pie charts, area
charts, scatter plots, and many other special types of charts.
Certain charts work better for certain types of data, and using the wrong chart
can make data difficult for the user to interpret and understand. The chart's creator
should concentrate on displaying the information effectively rather than on
attention-grabbing aspects.