2. What is Data Visualization
• Data visualization is the graphical representation of data points and
information so that users can understand them quickly and easily. A good
visualization has a clear meaning and purpose and is easy to interpret
without additional context. Data visualization tools provide an accessible
way to see and understand trends, outliers, and patterns in data by using
visual elements such as charts, graphs, and maps.
• Data visualization is the process of translating large data sets and metrics into
charts, graphs and other visuals. The resulting visual representation of data makes it
easier to identify and share real-time trends, outliers, and new insights about the
information represented in the data.
5. Challenges of big data visualization
• Visual noise: Most of the objects in the dataset are too closely related to
each other, so users cannot distinguish them as separate objects on the screen.
• Information loss: Reducing the visible data set can help, but it leads to
information loss.
• Large image perception: Data visualization methods are limited not only by
the aspect ratio and resolution of the device, but also by the limits of
physical perception.
• High rate of image change: Users observing the data cannot react to the
number of data changes or their intensity on the display.
• High performance requirements: These are hardly noticeable in static
visualization, where speed requirements are lower, but they become
significant in dynamic visualization.
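The information-loss trade-off above can be illustrated with a minimal sketch. It assumes a simple reduction strategy (averaging fixed-size buckets) chosen only for illustration; the helper name `downsampleByMean` and the readings are made up and do not come from the slides.

```javascript
// Hypothetical helper: reduce a series by averaging fixed-size buckets.
function downsampleByMean(values, bucketSize) {
  const result = [];
  for (let i = 0; i < values.length; i += bucketSize) {
    const bucket = values.slice(i, i + bucketSize);
    const sum = bucket.reduce((acc, v) => acc + v, 0);
    result.push(sum / bucket.length);
  }
  return result;
}

// Made-up readings containing a single spike.
const readings = [1, 1, 1, 1, 1, 1, 1, 41, 1, 1, 1, 1];
const reduced = downsampleByMean(readings, 4);

console.log(reduced);                // [ 1, 11, 1 ]
console.log(Math.max(...readings));  // 41
console.log(Math.max(...reduced));   // 11
```

The reduced series has a quarter of the points and is easier to draw, but the spike of 41 now appears as 11: fewer visible objects, at the cost of detail.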
6. The 3Vs
Let's take a moment to further examine the Vs.
Volume
Volume involves determining or calculating how much of something there is,
or in the case of big data, how much of something there will be.
Velocity
Velocity is the rate or pace at which something is occurring. The measured
velocity can, and usually does, change over time. Velocity directly affects
outcomes.
Variety
Thinking back to our previous mention of relational databases, it is generally
accepted that relational databases are highly structured, although they may
contain text in VARCHAR, CLOB, or BLOB fields. Big data, by contrast, comes from
many kinds of sources, and the degree to which it is structured varies greatly.
7. Solutions to these challenges
1. Meeting the need for speed: One possible solution is hardware: increased memory and
powerful parallel processing. Another is to hold the data in memory while using a grid
computing approach, where the work is spread across many machines.
2. Understanding the data: One solution is to have the proper domain expertise in place.
3. Addressing data quality: It is necessary to ensure the data is clean through the process of
data governance or information management.
4. Displaying meaningful results: One way is to cluster data into a higher-level view where
smaller groups of data are visible and the data can be effectively visualized.
5. Dealing with outliers: Possible solutions are to remove the outliers from the data or create
a separate chart for the outliers.
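Items 4 and 5 above can be sketched together. This is a minimal illustration under stated assumptions, not a prescribed method: fixed-width binning stands in for "clustering into a higher-level view", the 1.5 × IQR fences are one common convention for flagging outliers, and the helper names and sample values are hypothetical.

```javascript
// Linearly interpolated quantile of an already sorted array (q in [0, 1]).
function quantile(sorted, q) {
  const pos = (sorted.length - 1) * q;
  const lo = Math.floor(pos);
  const hi = Math.ceil(pos);
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo);
}

// Split values into inliers and outliers using the 1.5 * IQR fences
// (an assumed convention, not taken from the slides).
function splitOutliers(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const low = q1 - 1.5 * iqr;
  const high = q3 + 1.5 * iqr;
  return {
    inliers: values.filter((v) => v >= low && v <= high),
    outliers: values.filter((v) => v < low || v > high),
  };
}

// Higher-level view: count values per fixed-width bin, keyed by bin start.
function binCounts(values, binWidth) {
  const counts = new Map();
  for (const v of values) {
    const start = Math.floor(v / binWidth) * binWidth;
    counts.set(start, (counts.get(start) || 0) + 1);
  }
  return counts;
}

// Made-up sample with one extreme value.
const sample = [12, 15, 11, 14, 13, 18, 16, 250, 17, 19];
const { inliers, outliers } = splitOutliers(sample);
const bins = binCounts(inliers, 5);

console.log(outliers);    // [ 250 ]
console.log([...bins]);   // [ [ 10, 4 ], [ 15, 5 ] ]
```

The binned inliers can go on the main chart, while the outliers are kept for a separate chart instead of stretching the main chart's scale.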
8. Approaches to Big Data Visualization
• When it comes to the topic of big data, simple data visualization tools with their
basic features become somewhat inadequate. The concepts and models necessary to
efficiently and effectively visualize big data can be daunting, but are not
unobtainable.
• Using workable approaches (studied in the following chapters of this book), the
reader will review some of the most popular (or currently trending) tools, such as:
• Hadoop
• R
• Data Manager
• D3
• Tableau
• Python
• Splunk
• This is done in an effort to meet the challenges of big data visualization and support
better decision making.
10. • D3 stands for Data-Driven Documents. It is an open-source JavaScript
library developed by Mike Bostock to create custom interactive data
visualizations in the web browser using SVG, HTML and CSS.
• With the massive amount of data being generated today, communicating
this information is getting difficult. Visual representations of data are
among the most effective means of conveying meaningful information, and
D3 provides a great deal of ease and flexibility for creating these
visualizations. It is dynamic and intuitive, and it needs a minimal
amount of effort.
• It is similar to Protovis in concept but while Protovis is used for static
visualizations, D3 focuses more on interactions, transitions and
transformations.
12. D3 Features
• Uses Web Standards: D3 is an extremely powerful tool for creating interactive data
visualizations. It builds on modern web standards (SVG, HTML and CSS) to create data
visualizations.
• Data Driven: D3 is data driven. It can use static data or fetch data from a remote server
in different formats such as arrays, objects, CSV, JSON, XML etc. to create different types
of charts.
• DOM Manipulation: D3 allows you to manipulate the Document Object Model (DOM)
based on your data.
• Data Driven Elements: It empowers your data to dynamically generate elements and apply
styles to the elements, be it a table, a graph or any other HTML element and/or group of
elements.
• Dynamic Properties: D3 gives the flexibility to provide dynamic properties to most of its
functions. Properties can be specified as functions of data. That means your data can drive
your styles and attributes.
• Types of visualization: With D3, there are no standard visualization formats. But it enables
you to create anything from an HTML table to a Pie chart, from graphs and bar charts to
geospatial maps.
• Custom Visualizations: Since D3 works with web standards, it gives you complete control
over your visualization features.
• Transitions: D3 provides the transition() function. This is quite powerful because internally,
D3 works out the logic to interpolate between your values and find the intermediate states.
• Interaction and animation: D3 provides great support for animation with functions like
duration(), delay() and ease(). Animations from one state to another are fast and responsive to
user interactions.
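Two of the ideas above, properties specified as functions of data and interpolation between values, can be sketched in plain JavaScript. This is deliberately not D3's API: `renderBars` and `lerp` are hypothetical stand-ins that only mimic the pattern, and the data values are made up.

```javascript
// Hypothetical stand-in for data-driven attributes: each attribute is either
// a constant or a function of (datum, index), and one <rect> is produced per datum.
function renderBars(data, attrs) {
  return data.map((d, i) => {
    const parts = Object.entries(attrs).map(([name, value]) => {
      const resolved = typeof value === "function" ? value(d, i) : value;
      return `${name}="${resolved}"`;
    });
    return `<rect ${parts.join(" ")}/>`;
  });
}

// Linear interpolation between two numbers; t runs from 0 (start) to 1 (end).
function lerp(a, b) {
  return (t) => a + (b - a) * t;
}

const data = [4, 8, 15];
const bars = renderBars(data, {
  x: (d, i) => i * 25,                              // position driven by index
  width: 20,                                        // constant property
  height: (d) => d * 10,                            // size driven by the datum
  fill: (d) => (d > 10 ? "crimson" : "steelblue"),  // style driven by the datum
});
console.log(bars.join("\n"));
// <rect x="0" width="20" height="40" fill="steelblue"/>
// <rect x="25" width="20" height="80" fill="steelblue"/>
// <rect x="50" width="20" height="150" fill="crimson"/>

// Intermediate states when a bar's height changes from 40 to 80.
const grow = lerp(40, 80);
const frames = [0, 0.25, 0.5, 0.75, 1].map((t) => grow(t));
console.log(frames);   // [ 40, 50, 60, 70, 80 ]
```

In D3 itself the same pattern is written against a selection, for example by passing a function of the datum to `attr()`, and `transition()` computes the intermediate values over the chosen `duration()`.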
14. • D3.js is a JavaScript library. So, it can be used with any JS framework
of your choice like Angular.js, React.js or Ember.js.
• D3 focuses on data, so it is the most appropriate and specialized tool
for data visualizations.
• D3 is open-source. So you can work with the source code and add
your own features.
• It works with web standards so you don't need any other technology or
plug-in other than a browser to make use of D3.
• Because D3 works with web standards like HTML, CSS and SVG, no new
learning or debugging tools are required to work with D3.
• D3 does not impose any predefined chart types, so it gives you complete
control over your visualization to customize it the way you want. This
gives it an edge over other popular tools like Tableau or QlikView.
• Since D3 is lightweight, and works directly with web standards, it is
extremely fast and works well with large datasets.
16. • HTML = HyperText Markup Language
• HTML is used to structure the content of the web page. The current version is
HTML5. It is stored in a text file with the extension ".html".
CSS
• CSS = Cascading Style Sheets
• HTML gives a structure to the web page, while CSS styles your web page making it
more pleasant to look at. It is a stylesheet language used to describe the presentation
of a document written in HTML or XML (including XML dialects like SVG or
XHTML). CSS describes how elements should be rendered on a web page.