2. THE GROWING YEARS
► Y1: Introduce data.
► Y2: Ensure data increases in depth, from bytes to petabytes.
► Y3: Focus on information; other data is simply discarded.
3. Meaning
► Data Visualization is the graphical representation of data in the form of charts, diagrams, pictures, etc.
► Data visualization refers to the set of techniques used to communicate insights from data through visual representation.
► The main goal is to distill large datasets into visual graphics that make complex relationships within the data easy to understand.
Data → Insights → Visualize
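As a minimal sketch of distilling data into a visual graphic, the snippet below uses Plotly (one of the tools covered later in this deck); the dataset and column names are made up for illustration.

```python
import plotly.express as px

# Hypothetical dataset: monthly sales figures (illustrative values only).
data = {
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "sales": [120, 135, 128, 160, 172, 190],
}

# Distill the raw numbers into a chart: the upward trend becomes
# visible at a glance, which is the point of visualization.
fig = px.bar(data, x="month", y="sales", title="Monthly Sales")
fig.write_html("sales.html")  # renders an interactive chart
```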
6. Significance
► In an age of rapid technological development and growing big data, it is necessary to make data more understandable through better visualization techniques. Visualization provides:
► the ability to absorb information quickly, improve insights and make
faster decisions;
► an increased understanding of the next steps that must be taken to
improve the organization;
► an improved ability to maintain the audience's interest with
information they can understand.
7. Ways of Data Visualization
► Charts
► Tables
► Graphs
► Maps
► Infographics
► Dashboards
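As a hedged sketch of two of these forms, the snippet below builds a chart and a table with Plotly; the labels and figures are illustrative assumptions, not taken from the slides.

```python
import plotly.graph_objects as go

# Illustrative data (hypothetical values).
regions = ["North", "South", "East", "West"]
revenue = [240, 180, 310, 205]

# A chart: bar graph of revenue per region.
chart = go.Figure(go.Bar(x=regions, y=revenue))
chart.write_html("chart.html")

# A table: the same data in tabular form.
table = go.Figure(go.Table(
    header=dict(values=["Region", "Revenue"]),
    cells=dict(values=[regions, revenue]),
))
table.write_html("table.html")
```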
8. Traits of Meaningful data
1. It is visually appealing
2. It is scalable
3. It must provide the audience with the right information
4. It must be accessible
5. It enables rapid development and deployment
6. High volume
❖ More data means more chances of getting something interesting.
7. Historical
❖ Helps in understanding present and future by predicting patterns from the
past.
❖ Helps in getting better insights of data.
9. 8. Multivariate
❖ Data should be multi-valued so that every aspect of the data is considered.
9. Detailed
❖ It should support a complete and detailed analysis of the data.
10. Clean
❖ Keeps data consistent despite changes over time.
❖ Keeps data segregated in different forms.
11. Clear
❖ Data should be expressed in familiar terms for better understanding.
10. Data traits should be:
► Meaningful – People use it on a regular basis and can make relevant decisions based on a comprehensive view.
► Desirable – It is not only easy to use but also pleasant to use.
► Usable – People can use it to achieve their goals easily and quickly.
11. Brief History: Data growth from the 1990s onwards…
► Over the past few years, the size of data has been growing at an enormous, almost unpredictable rate.
► Data volumes have grown from petabytes to exabytes.
► In the 1990s data grew by only about 3%; with each passing decade the rate of growth increased, from 3% to 31% and eventually to 85%.
► The growth is so rapid that human effort alone cannot extract insights from the accumulating data.
► As a result, data is sometimes discarded from memory without ever being analyzed.
13. ► To understand data better, we need proper tools and techniques for processing it and getting better outcomes from it.
► Some tools used for analyzing big data are:
1. Apache Cassandra
2. Hadoop
3. Plotly
4. MongoDB
5. Drill
6. Storm
7. Splunk
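As one hedged example of putting such a tool to work, the sketch below stores and queries a few records in MongoDB via its Python driver; it assumes a local MongoDB server on the default port, and the database and collection names are made up for illustration.

```python
from pymongo import MongoClient

# Assumes a MongoDB server running locally on the default port 27017.
client = MongoClient("mongodb://localhost:27017")
events = client["analytics_demo"]["events"]  # hypothetical db/collection

# Store a few document-oriented records (MongoDB is a NoSQL,
# document-oriented database, as the comparison below notes).
events.insert_many([
    {"user": "a", "action": "click", "ms": 120},
    {"user": "b", "action": "view", "ms": 430},
    {"user": "a", "action": "view", "ms": 210},
])

# Query without SQL: find all events by user "a".
for doc in events.find({"user": "a"}):
    print(doc)
```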
14. Feature comparison of big data tools

Mode of software
❖ Apache Cassandra: free, open-source, distributed.
❖ Hadoop: open source and free.
❖ Plotly: open source.
❖ MongoDB: open source and free software.
❖ Drill: open-source, distributed framework.
❖ Storm: open source and distributed.
❖ Splunk: proprietary tool and real-time platform.

Data processing
❖ Apache Cassandra: SQL queries, streaming data, machine learning.
❖ Hadoop: batch processing system.
❖ Plotly: interactive graphs and user-defined images.
❖ MongoDB: NoSQL, document-oriented database.
❖ Drill: distributed execution environment for large-scale data.
❖ Storm: supports stream processing.
❖ Splunk: machine data processing.

Language support
❖ Apache Cassandra: Java, Python, Node.js.
❖ Hadoop: Java, C, C++, Ruby, Perl, Python.
❖ Plotly: JavaScript, Python, R.
❖ MongoDB: Python, JavaScript, C, C++, PHP, Ruby, Perl.
❖ Drill: ANSI SQL (the industry-standard query language).
❖ Storm: supports non-JVM languages.
❖ Splunk: Python, Java, JavaScript.

Data flow
❖ Apache Cassandra: data is written to a commit log as well as an in-memory store.
❖ Hadoop: MapReduce computation; the data flow has no loops, it is a chain of stages.
❖ Plotly: tracks the flow of individual items through a Plotly Sankey diagram (a minimal sketch follows this comparison).
❖ MongoDB: data streams focused on massive flows of data from multiple firehoses, then routed to systems.
❖ Drill: makes it possible to use SQL to query non-relational databases and file systems.
❖ Storm: designed as a directed acyclic graph, with streams used to process the data.
❖ Splunk: a pipeline executes a series of processors that operate on the data.
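The Plotly "data flow" entry above refers to Sankey diagrams; a minimal hedged sketch, with a made-up flow of records through a pipeline, looks like this:

```python
import plotly.graph_objects as go

# Hypothetical flow: raw records are either cleaned or discarded,
# and cleaned records become insights.
fig = go.Figure(go.Sankey(
    node=dict(label=["Raw data", "Cleaned", "Discarded", "Insights"]),
    link=dict(
        source=[0, 0, 1],    # each edge starts at these node indices
        target=[1, 2, 3],    # ...and ends at these
        value=[80, 20, 60],  # width of each flow (illustrative counts)
    ),
))
fig.write_html("sankey.html")  # interactive diagram in the browser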
15. Security
❖ Apache Cassandra: ensures data is secured.
❖ Hadoop: Active Directory, Kerberos and LDAP for authentication.
❖ Plotly: encrypts data in transmission between browser and server with SSL.
❖ MongoDB: makes it easy to control access with role-based user management.
❖ Splunk: a paid license; it quickly detects and responds to internal and external attacks.

Latency
❖ Apache Cassandra: lower latency by replicating across multiple data centers.
❖ Hadoop: offers higher latency.
❖ Plotly: offers reduced latency due to caching.
❖ MongoDB: offers low-latency analytics, as it can analyse semi-structured or unstructured data.
❖ Drill: offers low latency.
❖ Storm: extremely low latency.
❖ Splunk: offers moderately high latency, as it works with a real-time architecture.

Fault tolerance
❖ Apache Cassandra: tolerant of both network partitions and nodes dying.
❖ Hadoop: MapReduce is highly fault-tolerant; there is no need to restart the application on failure.
❖ Plotly: has a trusted-advisor fault-tolerance review.
❖ MongoDB: the replica set deployment architecture supports fault tolerance.
❖ Drill: JDBC connections improve fault tolerance.
❖ Storm: if any node dies, Storm will automatically restart it for fault tolerance.
❖ Splunk: offers appreciably high fault tolerance.
16. Power of Visual Perception
► Visual perception is the ability to understand the surrounding environment through what the eyes see.
► Visual perception is the process of absorbing what one sees, organizing it in the
brain, and making sense of it.
► With poor visual perception, we would have significant deficits in many cognitive
processes or cognitive activities that affect our mental content.
► These deficits could significantly impact learning.
19. Making abstract data visible
► We are collecting data, but we lag in what we can do with it.
► When we have too much information, it becomes nearly impossible to understand the insights it contains, so we use visualization as a way of interpreting the raw data.
► This makes it easy to understand the raw data and extract information from it.
21. Information Visualization
► The visualization process is a transformation of data from one representation to another, mostly to a representation more easily observed by humans.
► The following steps of a visualization process can be found in any
problem area:
1. Data preparation
2. Encoding
3. Presentation
4. Interaction
22. Visualization Process
1. Identification – The first step of the visualization process identifies the relevant entities and events that the visualization will deal with.
2. Encoding – The second step, encoding, deals with how the data will be displayed. The questions to be considered concern efficiency, aesthetics, understanding, similarity, etc. These aspects play a major role for the humans involved in the visualization process, because when a visualization is not understandable or uses nonstandard visuals, its benefits may be lost.
3. Presentation & Interaction – The final phase is presentation and interaction; it should answer questions about how the visualization objects are displayed and which interaction possibilities are offered to the user. For each specific visualization the vocabulary used is also important.
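A hedged sketch of these steps as a small Python pipeline (the dataset, column names, and encoding choices are all illustrative assumptions):

```python
import pandas as pd
import plotly.express as px

# 1. Identification / data preparation: choose the entities of
#    interest and clean the raw records (hypothetical data).
raw = pd.DataFrame({
    "city":       ["A", "B", "C"],
    "population": [1.2, 3.4, None],  # millions; one bad record
    "growth":     [0.8, 1.5, 2.1],   # percent per year
})
prepared = raw.dropna(subset=["population"])

# 2. Encoding: decide how data fields map to visual channels
#    (population -> x position, growth -> y position, city -> color).
fig = px.scatter(prepared, x="population", y="growth", color="city",
                 title="Growth vs. population (illustrative)")

# 3 & 4. Presentation and interaction: Plotly's HTML output is
#    interactive by default (hover, zoom, pan).
fig.write_html("process.html")
```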
23. ► Visual exploration techniques can be classified according to three orthogonal criteria:
1. The data to be visualized
2. The visualization technique
3. The interaction and distortion technique used
24. Building blocks of information
visualization
► Position
► Size
► Color
► Shape
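As a hedged sketch, the snippet below encodes one made-up dataset onto all four of these channels with Plotly, using marker symbol for shape; every column name and value is an illustrative assumption.

```python
import pandas as pd
import plotly.express as px

# Hypothetical measurements; the column names are made up.
df = pd.DataFrame({
    "x":        [1, 2, 3, 4],          # position (horizontal)
    "y":        [3, 1, 4, 2],          # position (vertical)
    "weight":   [10, 30, 20, 40],      # encoded as marker size
    "category": ["a", "b", "a", "b"],  # encoded as color and shape
})

# Position (x/y), size, color, and shape in one figure.
fig = px.scatter(df, x="x", y="y", size="weight",
                 color="category", symbol="category")
fig.write_html("channels.html")
```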