Unit III covers data visualization. It discusses why data visualization tools are needed to analyze and understand large amounts of data. Effective data visualization presents conclusions, chooses appropriate graph types, and ensures visuals accurately reflect the numbers to prevent misinterpretation. The history of data visualization is discussed using Minard's map of Napoleon's 1812 march as an example. Advantages of data visualization include easily sharing information and exploring opportunities, while disadvantages can include biased information and losing core messages.
Visualization idioms help make our work more presentable by adding graphs and charts to it. They help in expressing our views and also help viewers understand the text more easily.
Data and information visualization (data viz/vis or info viz/vis)[2] is the practice of designing and creating easy-to-communicate and easy-to-understand graphic or visual representations of a large amount[3] of complex quantitative and qualitative data and information with the help of static, dynamic or interactive visual items. Typically based on data and information collected from a certain domain of expertise, these visualizations are intended for a broader audience to help them visually explore and discover, quickly understand, interpret and gain important insights into otherwise difficult-to-identify structures, relationships, correlations, local and global patterns, trends, variations, constancy, clusters, outliers and unusual groupings within data (exploratory visualization).[4][5][6] When intended for the general public (mass communication) to convey a concise version of known, specific information in a clear and engaging manner (presentational or explanatory visualization),[4] it is typically called information graphics.
Data visualization is concerned with visually presenting sets of primarily quantitative raw data in a schematic form. The visual formats used in data visualization include tables, charts and graphs (e.g. pie charts, bar charts, line charts, area charts, cone charts, pyramid charts, donut charts, histograms, spectrograms, cohort charts, waterfall charts, funnel charts, bullet graphs, etc.), diagrams, plots (e.g. scatter plots, distribution plots, box-and-whisker plots), geospatial maps (such as proportional symbol maps, choropleth maps, isopleth maps and heat maps), figures, correlation matrices, percentage gauges, etc., which sometimes can be combined in a dashboard.
Data visualization in data science: exploratory (EDA) and explanatory visualization, Anscombe's quartet, design principles, visual encoding, design engineering and journalism, choosing the right graph, narrative structures, technology and tools.
This is a reading note I made after reading the book.
Now You See It: Simple Visualization Techniques for Quantitative Analysis teaches simple, practical means to explore and analyze quantitative data--techniques that rely primarily on using your eyes. This book features graphical techniques that can be applied to a broad range of software tools, including Microsoft Excel, because so many people have nothing else, but also more powerful visual analysis tools that can dramatically extend your analytical reach. You'll learn to make sense of quantitative data by discerning the meaningful patterns, trends, relationships, and exceptions that measure your organization's performance, identify potential problems and opportunities, and reveal what will likely happen in the future. Now You See It is not just for those with "analyst" in their titles, but for everyone who's interested in discovering the stories in their data that reveal their organization's performance and how it can be improved.
2. Data Visualization
• As data and insights grow in number, a new requirement is the ability
of the executives and decision makers to absorb this information in
real time.
• There is a limit to human comprehension and visualization capacity.
• That is a good reason to prioritize and manage with fewer but key
variables that relate directly to the Key Result Areas (KRAs) of a role.
3. Data Visualization
• Data visualization is the graphical representation of information and
data. By using visual elements like charts, graphs, and maps, data
visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data.
• Additionally, it provides an excellent way for employees or
business owners to present data to non-technical audiences
without confusion.
• In the world of Big Data, data visualization tools and
technologies are essential to analyze massive amounts of
information and make data-driven decisions.
4. Considerations
• Here are a few considerations when presenting data:
1. Present the conclusions and not just report the data.
2. Choose wisely from a palette of graphs to suit the data.
3. Organize the results to make the central point stand out.
4. Ensure that the visuals accurately reflect the numbers. Inappropriate visuals
can create misinterpretations and misunderstandings.
5. Make the presentation unique, imaginative and memorable.
6. History
• The classic presentation of the story of Napoleon's march to Russia in
1812 is by the French cartographer Charles Joseph Minard.
• It covers about six dimensions.
• Time is on the horizontal axis. The geographical coordinates and rivers are
mapped in. The thickness of the band shows the number of troops at
each point in time. One color is used for the onward
march and another for the retreat. The temperature at each
point in time is shown in the line graph at the bottom.
9. Advantages of data visualization
• Easily sharing information.
• Interactively explore opportunities.
• Visualize patterns and relationships.
10. Disadvantages
• Biased or inaccurate information.
• Correlation doesn’t always mean causation.
• Core messages can get lost in translation.
11. Why is data visualization important?
• It helps people see, interact with, and better understand data.
Whether simple or complex, the right visualization can bring everyone
on the same page, regardless of their level of expertise.
• Every STEM field benefits from understanding data—and so do
fields in government, finance, marketing, history, consumer
goods, service industries, education, sports, and so on.
• Data visualization is one of the steps of the data science
process, which states that after data has been collected,
processed and modeled, it must be visualized for conclusions to
be made.
12. Data Science
• While both fields involve working with data to gain insights, data
science often involves using data to build models that can predict
future outcomes, while data analytics tends to focus more on
analyzing past data to inform decisions in the present.
• Data science makes use of machine learning algorithms to derive
insights, whereas data analytics typically does not rely on machine
learning to do so.
20. General Types of Visualizations
• Chart: Information presented in a tabular, graphical form with data
displayed along two axes. Can be in the form of a graph, diagram, or map.
• Table: A set of figures displayed in rows and columns.
• Graph: A diagram of points, lines, segments, curves, or areas that
represents certain variables in comparison to each other, usually along two
axes at a right angle.
• Geospatial: A visualization that shows data in map form using different
shapes and colors to show the relationship between pieces of data and
specific locations.
• Infographic: A combination of visuals and words that represent data.
Usually uses charts or diagrams.
• Dashboards: A collection of visualizations and data displayed in one
place to help with analyzing and presenting data.
22. Numerical Data
• Numerical data is also known as quantitative data. Numerical data is
any data that represents an amount, such as the height,
weight, or age of a person. Numerical data is categorized into two
categories:
• Continuous Data –
• It can take any value within a range (Example: Height measurements).
• Discrete Data –
• It takes only separate, countable values (Example: Number of cars or children a household
has).
• The visualization techniques used to represent numerical data are
charts and numerical values. Examples are pie charts, bar charts,
averages, scorecards, etc.
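The continuous vs. discrete distinction above maps directly to how values are prepared for charting: continuous measurements are usually binned first (as in a histogram), while discrete values can be counted directly. A minimal stdlib Python sketch, with illustrative sample values that are not from the slides:

```python
from collections import Counter

def histogram_bins(values, bin_width):
    """Group continuous measurements into fixed-width bins (a histogram)."""
    return Counter((v // bin_width) * bin_width for v in values)

# Continuous data: heights must be binned before charting.
heights = [150, 152, 158, 161, 163, 167, 171, 174, 178, 181]
bins = histogram_bins(heights, bin_width=10)

# Discrete data: counts can be tallied directly, no binning needed.
cars_per_household = [1, 2, 0, 1, 1, 3, 2, 1]
car_counts = Counter(cars_per_household)
```

Each bin key is the lower edge of a fixed-width interval, ready to feed into a bar or histogram renderer.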
23. Categorical Data
• Categorical data is also known as qualitative data. Categorical data is any data
that generally represents groups. It consists of categorical variables
used to represent characteristics such as a person's ranking, a person's
gender, etc. Categorical data visualization is all about depicting key themes,
establishing connections, and lending context. Categorical data is classified into
three categories:
• Binary Data –
• In this, classification is based on positioning (Example: Agrees or Disagrees).
• Nominal Data –
• In this, classification is based on attributes (Example: Male or Female).
• Ordinal Data –
• In this, classification is based on ordering of information (Example: Timeline or processes).
• The visualization techniques used to represent categorical data are
graphics, diagrams, and flowcharts. Examples are word clouds, sentiment
mapping, Venn diagrams, etc.
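The counts behind a categorical chart (e.g. a bar chart of survey responses) can be sketched with the standard library; the labels and the `#`-bar rendering below are illustrative assumptions, not data from the slides:

```python
from collections import Counter

# Binary/nominal responses: each value is a group label, not an amount.
responses = ["Agree", "Disagree", "Agree", "Agree", "Neutral", "Disagree"]
counts = Counter(responses)

# A minimal text "bar chart": one '#' character per observation.
bars = {label: "#" * n for label, n in counts.most_common()}
for label, bar in bars.items():
    print(f"{label:<10} {bar}")
```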
24. Top Data Visualization Tools
• The following are the 10 best Data Visualization Tools
• Tableau
• Looker
• Zoho Analytics
• Sisense
• IBM Cognos Analytics
• Qlik Sense
• Domo
• Microsoft Power BI
• Klipfolio
• SAP Analytics Cloud
25. Spatial Visualization Techniques
• Univariate data --1 dimension data
• A single value can be displayed
• as the number itself -- a string of digits
• as a dial (such as an altimeter, speedometer, or gauge)
• as a slider or thermometer
26. Spatial Visualization Techniques
• Maximization:
use the least amount of "ink" or non-background pixels and leverage our
pre-attentive vision to fill in the area. A Tukey plot as typically
presented on the left and a revised minimized plot on the right (or below):
27. Spatial Visualization Techniques
• Information in the axes
Histogram with the y axis removed; axis values are aligned with the
pre-attentive "white" line through the data.
28. Spatial Visualization Techniques
• Sparklines
• Sparklines are examples of high data-ink ratios. They are typically a time series
and can be used to represent visually the sequence in a very dense and compact
manner. They may be small enough to just be included in the flow of the text
rather than having to refer to a separate figure.
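One common way to approximate a sparkline in plain text is to map each value onto a ramp of Unicode block characters, so a whole series fits inline with the surrounding text. A minimal sketch (the eight-character ramp is an assumption, not from the book):

```python
BLOCKS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Render a numeric series as a compact inline sparkline string."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for a flat series
    return "".join(
        BLOCKS[int((v - lo) / span * (len(BLOCKS) - 1))] for v in values
    )

print(sparkline([1, 3, 2, 7, 5, 8, 4]))
```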
30. Spatial Visualization Techniques
• One Dimensional Data as
Spatial Data
• Time is now displayed as the
x axis and the data values are
the y axis
31. Spatial Visualization Techniques
• Two Dimensional Data as Spatial Data
• Mapping spatial attributes of the data to the screen.
• We really are working in three dimensions now.
• Two dimensions specify the location
• A third dimension is then plotted, maybe with several other dimensions (see
height and color on the map below).
• Scatterplot -- discrete data values are mapped to a location (pixel or dot) and marked by
color, shape or size; result is 2D
• Image -- each point is mapped to a pixel location and intermediate pixels that are
unmapped are interpolated for color or brightness according to neighboring mapped
pixels; result is 2D; often referred to as a "heat map"
• Rubber sheet -- each point is mapped to an image pixel and it has a third value that
controls a height. Missing points are also interpolated to make a smooth surface. Result is
3D
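The "image"/heat-map idiom above starts by binning point locations into a regular grid of counts; interpolation and color mapping then operate on that grid. A minimal binning sketch in Python, where the function name and sample points are assumptions for illustration:

```python
def heat_grid(points, width, height, xmax, ymax):
    """Bin 2D points into a grid of counts, the basis of a density 'heat map'."""
    grid = [[0] * width for _ in range(height)]
    for x, y in points:
        # Map each coordinate to a cell index, clamping the upper edge.
        col = min(int(x / xmax * width), width - 1)
        row = min(int(y / ymax * height), height - 1)
        grid[row][col] += 1
    return grid

pts = [(0.1, 0.1), (0.15, 0.12), (0.9, 0.9), (0.2, 0.1)]
grid = heat_grid(pts, width=4, height=4, xmax=1.0, ymax=1.0)
```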
35. Visualizing Geospatial Data on a Map
• 1. Point map
A point map is one of the simplest ways to visualize geospatial data.
Basically, you place a point at any location on the map that corresponds
to the variable you're trying to measure (such as a building, e.g. a hospital).
36. Visualizing Geospatial Data on a Map
• Proportional symbol map
This is a variation of the point map. It uses a circle or other shape to
represent data at a particular location. However, based on the point's size
and/or color, it can be used to represent multiple other variables at once
(such as population and/or average age).
37. Visualizing Geospatial Data on a Map
• Cluster map
This is a proportional symbol map with a twist. It features a similar concept
of using points of varying sizes and colors to represent multiple types of
data at a location at once. However, these larger points serve as stand-ins
for smaller points, which become visible if you increase the map's scale.
This gets around the main issue of overcrowding in point maps, but requires
special geospatial data visualization tools such as GIS software.
38. Visualizing Geospatial Data on a Map
• Choropleth map
It's made by separating the area being mapped, such as by geographic or
political boundaries, and then filling each resulting section with a
different color or shade.
39. Visualizing Geospatial Data on a Map
• Cartogram map
This variation of the choropleth map is a hybrid of a map and a chart. It
involves taking a land area map of a geographic region and dividing it into
segments in such a way that sizes and/or distances are proportional to the
values of the variable being measured.
43. Visualizing Geospatial Data on a Map
• Flow map
Flow maps, also known as 'path' maps, are more specialized versions of line
maps. Instead of focusing on physical features of the earth, they are used
to represent the movement of things across the earth over time.
44. Visualizing Geospatial Data on a Map
• Spider map
The spider map is a variation of the flow map. Instead of focusing on
discrete pairs of origin and destination data points, the spider map looks
at the relationships between origin points and multiple destination points,
some of which may be held in common.
45. Visualizing Geospatial Data on a Map
• Time-space distribution map
This is an advanced form of geospatial data mapping that combines the
precision of a point map with the dynamism of a flow map. It seeks to
accurately determine the locations of objects at any point in time as
they move.
46. Visualizing Geospatial Data on a Map
• Data space distribution map
This is another variant of the flow map that aims to not only represent
the movement of things over time, but also how variables dependent on
that movement change over time.
49. Time Oriented Visualizations
1. Scale
• How is time measured? When are the data measurements/samples
taken?
• Ordinal -- before, during, after
• Discrete -- clear intervals (seconds, minutes, hours.....)
• Continuous -- mapping to the real numbers. Discrete values can be
interpolated
50. Time Oriented Visualizations
2. Scope
• The range of time associated with a measurement/sample
• point -- the sample is from a point in time that has no duration
• interval-based -- there is a duration; a start and end
These time primitives can be anchored (absolute) or unanchored (relative)
• We can also recognize determinacy:
• determinate -- all aspects of time are known and fixed
• indeterminate -- there may be some uncertainty. Intervals are sometimes used here
to compensate.
52. Time Oriented Visualizations
3. Arrangement
Time often has a cyclical nature, compared to the linear nature described
above:
• hourly cycle
• 24 hours in a daily cycle
• 7 days in a weekly cycle (Mon->Tues....Sun->Mon)
• ~30 days in a monthly cycle
• lunar cycle
• quarterly/seasonal cycle (financial, astronomical, meteorological)
• 365 days, 52 weeks, 12 months in a yearly cycle
• decades
The different units suggest granularity. How you might represent a
visualization may vary (interactively) by granularity (zoom in, zoom out)
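Folding a linear time series onto one of these cycles is just a matter of grouping by the cyclic unit. A minimal sketch for the 24-hour daily cycle, with illustrative sample timestamps:

```python
from collections import Counter
from datetime import datetime

def hour_of_day_counts(timestamps):
    """Fold a linear time series onto the 24-hour cycle: count events per hour."""
    return Counter(ts.hour for ts in timestamps)

events = [
    datetime(2023, 1, 1, 9, 30),
    datetime(2023, 1, 2, 9, 5),
    datetime(2023, 1, 2, 17, 45),
]
by_hour = hour_of_day_counts(events)
```

Swapping `ts.hour` for `ts.weekday()` or `ts.month` changes the granularity, which is exactly the zoom-in/zoom-out choice described above.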
56. Multivariate Data
• Univariate statistics summarize only one variable at a time. Bivariate
statistics compare two variables. Multivariate statistics compare more
than two variables.
• Multivariate visualizations can be done by adding more than
one visual variable to a simple renderer. Common combinations
include:
1. Color and size
2. Size and rotation
3. Size, rotation, and color
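A combination like "color and size" means mapping two extra data variables onto those visual channels before rendering. A sketch of such an encoding step; the size range and the blue-to-red color ramp are arbitrary assumptions, not from the slides:

```python
def normalize(v, lo, hi):
    """Scale a value into [0, 1] over the given range."""
    return (v - lo) / (hi - lo) if hi > lo else 0.0

def encode_point(x, y, value_a, value_b, a_range, b_range):
    """Encode two extra variables as marker size (pixels) and an RGB color."""
    size = 4 + 20 * normalize(value_a, *a_range)   # size channel: 4-24 px
    t = normalize(value_b, *b_range)               # color channel
    color = (int(255 * t), 0, int(255 * (1 - t)))  # blue -> red ramp
    return {"x": x, "y": y, "size": size, "color": color}

p = encode_point(1.0, 2.0, value_a=50, value_b=0.0,
                 a_range=(0, 100), b_range=(0, 1))
```

Any renderer that accepts per-point size and color can then draw the encoded points.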
62. Graphs, Trees, and How to Visualize Them
• Let’s instead talk about graphs, networks, & trees in the mathematical
sense: a model for representing items and the relationships between
those items
• Social / friendship networks
• Computer networks
• Energy or transportation grids
• Organizational structures
• Etc.
66. Node-link tree diagrams
• Nodes are distributed in space, connected by straight or curved lines
• Typical approach is to use 2D space to break apart breadth and depth
• Often, space is used to communicate hierarchical orientation
74. Text and Document Visualization
• Here we consider visualizing the text within a document, and collections of
documents which are likely related (corpus).
• Difficulty in analysis includes the loose structure, varied vocabulary, and
optional metadata such as author(s), date, modification dates, comments,
keywords, catalog codes, citations.
• Levels of text to be represented:
• Lexical level -- Simple grouping of characters into "tokens" which are typically words,
but word stems, phrases, word n-grams and character n-grams may be beneficial
• Syntactic level --Parsing purpose of token, grammatical category, tense, plurality, in
the context of the phrase, sentence and paragraph
• Semantic level -- Extract meaning of the syntactic structure with the tokens using
fuller analysis of the context.
75. Vector Space Model
• Analyze the words in a document and determine their value in
contribution and significance to the document.
• Removal of noise words ("a", "an", "the", "that") and punctuation,
and stemming (collecting roots of words) are typical of preprocessing.
• Simple frequency counts of significant words ordered by decreasing
frequency is a simple vector.
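The preprocessing described above can be sketched in a few lines of stdlib Python; the stopword list is abbreviated and stemming is omitted for brevity:

```python
import re
from collections import Counter

# Abbreviated noise-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "that", "and", "of", "to", "in"}

def term_vector(text):
    """Tokenize, drop noise words and punctuation, and count term frequencies."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

doc = "The analysis of the words in a document and the value of the words."
vec = term_vector(doc)
top = vec.most_common(2)  # terms ordered by decreasing frequency
```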
76. Vector Space Model
• https://wordcounter.net/
• Here we consider visualizing the text within a document, and collections of documents which
are likely related (corpus).
• Difficulty in analysis includes the loose structure, varied vocabulary, and optional metadata
such as author(s), date, modification dates, comments, keywords, catalog codes, citations.
• Levels of text to be represented:
• Lexical level -- Simple grouping of characters into "tokens" which are typically words, but word
stems, phrases, word n-grams and character n-grams may be beneficial
• Syntactic level --Parsing purpose of token, grammatical category, tense, plurality, in the
context of the phrase, sentence and paragraph
• Semantic level -- Extract meaning of the syntactic structure with the tokens using fuller
analysis of the context.
80. Single Document Visualization
• Tag clouds visualize the words by size based on frequency. Again, this
is the opening Intro section.
• tagcrowd.com
• Here we consider visualizing the text within a document, and collections of documents which
are likely related (corpus).
• Difficulty in analysis includes the loose structure, varied vocabulary, and optional metadata such
as author(s), date, modification dates, comments, keywords, catalog codes, citations.
• Levels of text to be represented:
• Lexical level -- Simple grouping of characters into "tokens" which are typically words, but word
stems, phrases, word n-grams and character n-grams may be beneficial
• Syntactic level --Parsing purpose of token, grammatical category, tense, plurality, in the context
of the phrase, sentence and paragraph
• Semantic level -- Extract meaning of the syntactic structure with the tokens using fuller analysis
of the context.
81. Wordle
• Creates a visualization with size based on frequency.
• http://wordle.net
86. Literature fingerprinting
• Here we look at n-word-grams to match patterns of the author.
• An n-gram is a sequence of N words; it is probably the easiest concept
in the whole machine learning space. For example, "Medium blog" is a
2-gram (a bigram), "A Medium blog post" is a 4-gram, and "Write on
Medium" is a 3-gram (a trigram). On its own that is not very
interesting; what makes n-grams useful is the probability associated
with them.
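Generating word n-grams is a short pass over a token list; a minimal sketch reusing the examples from the text:

```python
def ngrams(text, n):
    """Return the sequence of word n-grams in a text, as tuples."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

bigrams = ngrams("Write on Medium", 2)
# -> [("Write", "on"), ("on", "Medium")]
```

Fingerprinting an author would then compare the frequency profiles of these n-grams across documents.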
88. Document Collection Visualizations
• Goal is to place similar documents close together.
• graph spring layouts
• multi-dimensional scaling
• clustering (K-means, hierarchical)
• self-organizing maps
• Self-organizing maps -- use the vectors from each document to calculate distances from
each other. Higher weights draw the documents closer together. Randomly start with
one document.
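The pairwise document distances these layouts rely on are typically computed from the term-frequency vectors of the previous slides. A sketch using cosine similarity, one common choice (the slides do not specify which measure the self-organizing map uses):

```python
import math
from collections import Counter

def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (1.0 = same direction)."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

d1 = Counter({"data": 2, "visualization": 1})
d2 = Counter({"data": 1, "visualization": 1})
d3 = Counter({"finance": 3})
sim_close = cosine_similarity(d1, d2)  # similar vocabularies -> near 1
sim_far = cosine_similarity(d1, d3)    # disjoint vocabularies -> 0
```

Documents with high similarity would then be placed close together by the layout.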
93. Power Query & M Language
• Power Query is built on what was then a new query language called
M. It is a mashup language (hence the letter M) designed to create
queries that mix together data.
94. • 12 Methods for Visualizing Geospatial Data on a Map | SafeGraph
• Time Oriented Visualizations (juniata.edu)