"Big data" is a field that treats ways to analyze, systematically extract information from, or
otherwise deal with data sets that are too large or complex to be dealt with by traditional data-
processing application software. Data with many cases (rows) offer greater statistical power,
while data with higher complexity (more attributes or columns) may lead to a higher false
discovery rate.
Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data source.
Data Challenges
While dealing with large amounts of information we face such challenges as volume, variety, velocity and veracity, also known as the 4Vs of Big Data.
Volume: refers to the large amount of data, especially machine-generated data.
Variety: is related to the different types and forms of data sources:
- structured (e.g. financial data) and
- unstructured (social media conversations, photos, videos, voice recordings and others).
Velocity: refers to the speed of new data generation and distribution.
Veracity: refers to the complexity of data, which may lead to a lack of quality and accuracy.
Why is Big Data a problem?
A problem with big data is that it grows constantly, and organizations often fail to capture the opportunities and extract actionable data. Companies often fail to recognize where they need to allocate their resources, and this failure to allocate resources results in not making the most of the information.
What is Data Visualization in big data?
Data visualization is a general term that describes any effort to help people understand the significance of data by placing it in a visual context. Patterns, trends and correlations that might go undetected in text-based data can be exposed and recognized more easily with data visualization software.
What are the benefits of data visualization?
o Faster Action. The human brain tends to process visual information far more easily than written information.
o Communicate Findings in Constructive Ways.
o Understand Connections Between Operations and Results.
o Embrace Emerging Trends.
o Interact With Data.
o Create New Discussion.
Why is data visualization important?
Meeting the need for speed.
Understanding the data.
Addressing data quality.
Dealing with outliers.
Displaying meaningful results.
A visualized representation of data is abstract and strongly limited by the viewer's perception capabilities and needs (see Fig. 4). Human perception capabilities alone are therefore not sufficient to embrace large amounts of data.
Types of data to be visualized:
• Univariate data: one-dimensional arrays, time series, etc.
• Two-dimensional data: point two-dimensional graphs, geographical coordinates, etc.
• Multidimensional data: financial indicators, results of experiments, etc.
• Texts and hypertexts: newspaper articles, web documents, etc.
• Hierarchies and links: the subordination structure of an organization, e-mails, documents and hyperlinks, etc.
• Algorithms and programs: information flows, debug operations, etc.
Types of visualization techniques:
1) 2D/3D standard figures:
May be implemented as bars, line graphs, various charts, etc. (see Fig. 5). The main drawback of this type is the difficulty of producing an acceptable visualization for complicated data structures.
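As a minimal sketch of this first type, the snippet below draws a standard bar chart next to a line graph with matplotlib; the monthly sales figures are invented purely for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data: monthly sales figures for two products.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
product_a = [120, 135, 150, 160, 158, 172]
product_b = [90, 110, 105, 130, 140, 151]

fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(10, 4))

# Standard 2D bar chart.
ax_bar.bar(months, product_a, color="steelblue")
ax_bar.set_title("Bar chart (product A sales)")
ax_bar.set_ylabel("Units sold")

# Standard 2D line graph comparing both series.
ax_line.plot(months, product_a, marker="o", label="Product A")
ax_line.plot(months, product_b, marker="s", label="Product B")
ax_line.set_title("Line graph (both products)")
ax_line.legend()

plt.tight_layout()
plt.show()
```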
2) Geometric transformations:
This technique represents information as scatter diagrams. It is geared towards transforming a multidimensional data set so that it can be displayed in Cartesian and non-Cartesian geometric spaces.
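A minimal sketch of such a geometric transformation, using only numpy and matplotlib: a synthetic five-dimensional data set is projected onto its first two principal components and shown as a scatter diagram in a Cartesian plane.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical multidimensional data: 200 samples with 5 correlated attributes.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))

# Geometric transformation: project the 5-D points onto the first two
# principal components so they can be shown as a 2-D scatter diagram.
centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
projected = centered @ vt[:2].T

plt.scatter(projected[:, 0], projected[:, 1], s=15, alpha=0.7)
plt.xlabel("First principal component")
plt.ylabel("Second principal component")
plt.title("Multidimensional data projected to a Cartesian plane")
plt.show()
```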
(.3 Display icons:
this type displays the values of elements of multidimensional data in properties of images. Such images may include
human faces, arrows, stars, etc. Images can be grouped together for holistic analysis. The result of the visualization is
a texture pattern, which varies according to the specific characteristics of the data
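The sketch below is a rough approximation of icon displays using star (radar-style) glyphs drawn with matplotlib; the records and attribute names are invented. Each glyph's shape encodes one record's attribute values, and placing the glyphs side by side lets them be scanned holistically.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical records: each row is one object described by 6 attributes in [0, 1].
records = np.array([
    [0.9, 0.2, 0.7, 0.4, 0.8, 0.3],
    [0.3, 0.8, 0.5, 0.9, 0.2, 0.6],
    [0.6, 0.6, 0.9, 0.1, 0.5, 0.7],
])

angles = np.linspace(0, 2 * np.pi, records.shape[1], endpoint=False)

fig, axes = plt.subplots(1, len(records), subplot_kw={"projection": "polar"},
                         figsize=(9, 3))
for ax, row in zip(axes, records):
    # Close the polygon by repeating the first value.
    values = np.append(row, row[0])
    theta = np.append(angles, angles[0])
    ax.plot(theta, values)
    ax.fill(theta, values, alpha=0.3)
    ax.set_xticks(angles)
    ax.set_xticklabels([f"attr {i+1}" for i in range(records.shape[1])], fontsize=7)
    ax.set_yticklabels([])

plt.tight_layout()
plt.show()
```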
4) Methods focused on pixels:
The main idea is to map the value in each dimension to a colored pixel and to group the pixels according to specific measurements. Since one pixel is used to display a single value, visualization of large amounts of data becomes reachable with this methodology.
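A small sketch of the pixel-oriented idea, assuming only numpy and matplotlib and using randomly generated values: every value becomes one colored pixel, each dimension gets its own sub-window, and records keep the same position across windows so corresponding pixels describe the same record.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data set: 10,000 records with 4 dimensions, values in [0, 1].
rng = np.random.default_rng(1)
data = rng.random((10_000, 4))

# One sub-window per dimension; each value is rendered as a single colored pixel.
side = 100                       # 100 x 100 pixels per dimension
fig, axes = plt.subplots(1, data.shape[1], figsize=(12, 3))
for dim, ax in enumerate(axes):
    ax.imshow(data[:, dim].reshape(side, side), cmap="viridis")
    ax.set_title(f"dimension {dim + 1}")
    ax.axis("off")

plt.tight_layout()
plt.show()
```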
5) Hierarchical images:
These methods are used with hierarchically structured data.
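One common hierarchical image is the dendrogram produced by agglomerative clustering. The sketch below, assuming scipy and matplotlib and using synthetic observations, builds such a tree and draws it.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical observations with an implicit group structure (three clumps).
rng = np.random.default_rng(2)
observations = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(10, 2)),
    rng.normal(loc=3.0, scale=0.3, size=(10, 2)),
    rng.normal(loc=6.0, scale=0.3, size=(10, 2)),
])

# Agglomerative clustering produces a tree; the dendrogram is its hierarchical image.
tree = linkage(observations, method="ward")
dendrogram(tree)
plt.title("Dendrogram of hierarchically structured data")
plt.xlabel("Observation index")
plt.ylabel("Merge distance")
plt.show()
```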
6) Tag cloud:
A tag cloud is used in text analysis, with a weighting value that depends on the frequency of use (citation) of a particular word or phrase. It consists of an accumulation of lexical items (words, symbols or combinations of the two). This technique is commonly integrated with web sources to quickly familiarize visitors with the content via key words.
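Dedicated tag-cloud libraries exist, but the core idea fits in a few lines: count word frequencies and let the font size carry the weight. The sketch below uses only the Python standard library and matplotlib; the sample text and the simple grid placement are invented for illustration.

```python
import re
from collections import Counter
import matplotlib.pyplot as plt

# Hypothetical text to summarise (e.g. scraped page content).
text = """big data visualization turns large data sets into pictures;
visualization of big data needs scalable tools, and big data tools evolve fast"""

# Weight each word by its frequency of use.
words = re.findall(r"[a-z]+", text.lower())
freq = Counter(w for w in words if len(w) > 3)

# Very simple tag-cloud rendering: font size proportional to frequency.
fig, ax = plt.subplots(figsize=(8, 3))
ax.axis("off")
for i, (word, count) in enumerate(freq.most_common(10)):
    ax.text((i % 5) * 0.2 + 0.05, 0.7 - (i // 5) * 0.4, word,
            fontsize=10 + 8 * count, transform=ax.transAxes)
plt.show()
```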
7) Clustergram:
A clustergram is a visualization technique used in cluster analysis; it represents how the individual elements of a data set are assigned to clusters as the number of clusters changes. Choosing the optimal number of clusters is an important component of cluster analysis.
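Below is a simplified take on the clustergram idea, assuming scikit-learn is available and using synthetic data: for each candidate number of clusters k, every observation is placed at the mean summary value of the cluster it falls into, and its positions across the different k values are connected. Where the lines split or merge suggests how many clusters the data actually supports.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Hypothetical data: 150 observations, 4 features, three underlying groups.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 4)) for c in (0, 3, 6)])
summary = X.mean(axis=1)           # one value per observation for the y-axis

# For each k, place every observation at its cluster's mean summary value.
ks = range(1, 7)
positions = np.empty((len(X), len(ks)))
for j, k in enumerate(ks):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    cluster_means = {lab: summary[labels == lab].mean() for lab in np.unique(labels)}
    positions[:, j] = [cluster_means[lab] for lab in labels]

# One faint line per observation, traced across the candidate values of k.
plt.plot(list(ks), positions.T, color="steelblue", alpha=0.1)
plt.xlabel("Number of clusters k")
plt.ylabel("Cluster mean of observation summary")
plt.title("Clustergram sketch")
plt.show()
```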
8) Motion charts:
Motion charts allow effective exploration of large and multivariate data sets and interaction with them through dynamic 2D bubble charts. The bubbles (the central objects of this technique) are controlled via the variable mapping for which the chart is designed. Motion chart tools are provided, for instance, by Google, amCharts and IBM Many Eyes.
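The services named above provide rich interactivity; the sketch below only illustrates the underlying idea with matplotlib's animation module. The panel data are random walks invented for the demo, with position and bubble size driven by the mapped variables at each time step.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Hypothetical panel data: 20 entities tracked over 50 time steps, with
# x, y and size variables mapped onto the bubbles.
rng = np.random.default_rng(4)
n_entities, n_steps = 20, 50
x = np.cumsum(rng.normal(size=(n_steps, n_entities)), axis=0)
y = np.cumsum(rng.normal(size=(n_steps, n_entities)), axis=0)
size = 50 + 200 * rng.random((n_steps, n_entities))

fig, ax = plt.subplots()
ax.set_xlim(x.min(), x.max())
ax.set_ylim(y.min(), y.max())
scat = ax.scatter(x[0], y[0], s=size[0], alpha=0.6)

def update(frame):
    # Move the bubbles and resize them for the current time step.
    scat.set_offsets(np.c_[x[frame], y[frame]])
    scat.set_sizes(size[frame])
    ax.set_title(f"Motion chart sketch, time step {frame}")
    return scat,

anim = FuncAnimation(fig, update, frames=n_steps, interval=100)
plt.show()
```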
9) Dashboard:
A dashboard enables the display of log files of various formats and the filtering of data based on chosen data ranges. Traditionally, a dashboard consists of three layers: data (raw data), analysis (formulas and data imported from the data layer into tables) and presentation (graphical representation based on the analysis layer).
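A minimal sketch of the three-layer structure, with a hypothetical in-memory log sample standing in for parsed log files and matplotlib standing in for the presentation layer:

```python
import matplotlib.pyplot as plt

# Data layer: raw log records (in practice parsed from log files of various formats).
raw_logs = [
    {"level": "INFO",  "service": "api",  "latency_ms": 42},
    {"level": "ERROR", "service": "api",  "latency_ms": 350},
    {"level": "INFO",  "service": "auth", "latency_ms": 15},
    {"level": "WARN",  "service": "auth", "latency_ms": 120},
    {"level": "ERROR", "service": "api",  "latency_ms": 410},
]

# Analysis layer: filtering to a chosen subset and aggregating it.
api_logs = [r for r in raw_logs if r["service"] == "api"]
counts_by_level = {}
for record in api_logs:
    counts_by_level[record["level"]] = counts_by_level.get(record["level"], 0) + 1
avg_latency = sum(r["latency_ms"] for r in api_logs) / len(api_logs)

# Presentation layer: graphical representation built on the analysis layer.
fig, ax = plt.subplots()
ax.bar(counts_by_level.keys(), counts_by_level.values())
ax.set_title(f"api log levels (avg latency {avg_latency:.0f} ms)")
ax.set_ylabel("Count")
plt.show()
```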
10) Color, size, connection, similarity:
Nowadays there are many publicly available tools for creating meaningful and attractive visualizations through the manipulation of size, color and connections between visual objects (see Fig. 14). People tend to perceive the world as a holistic, ordered configuration rather than as constituent fragments. On the other hand, too many colors, shapes and interconnections may make the data harder to comprehend, and some visual elements may become too complex to recognize.
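A small sketch of encoding data in size, color and connections, assuming the networkx package is installed; the collaboration network and its edges are invented for illustration. Node size and node color both carry the degree of the node, while the edges carry the connection information itself.

```python
import networkx as nx
import matplotlib.pyplot as plt

# Hypothetical collaboration network: nodes are teams, edges are shared projects.
G = nx.Graph()
edges = [("data", "infra"), ("data", "ml"), ("ml", "product"),
         ("product", "design"), ("infra", "product"), ("data", "product")]
G.add_edges_from(edges)

# Encode information in size (degree), color (degree via a colormap)
# and connection (the edges themselves).
degrees = [G.degree(n) for n in G.nodes]
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos,
        with_labels=True,
        node_size=[300 * d for d in degrees],
        node_color=degrees,
        cmap=plt.cm.viridis,
        edge_color="gray")
plt.title("Size, color and connections carrying the data")
plt.show()
```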
11) Thematic maps:
Thematic maps display data that has a geographic component, for example by shading the regions of a map according to the value of a variable (as in choropleth maps).
12) Text analytics: visualizing natural language
Text annotation and markup: named entity recognition (NER) seeks to locate and classify atomic elements in text into predefined categories such as the names of persons, organizations and locations. The list of categories can be extended, for example to include disease names in the biomedical domain. Quantities and dates are often included in NER systems as well.
Research indicates that NER systems developed for one domain typically do not perform well on other domains. Early work on NER systems in the 1990s was aimed primarily at extraction from journalistic articles; attention then turned to the processing of military dispatches and reports. Since about 1998, there has been a great deal of interest in entity identification in the molecular biology, bioinformatics and medical natural language processing communities; the most common entities of interest in those domains have been the names of genes and gene products.
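To make the NER step above concrete, here is a minimal sketch using the spaCy library (chosen here only as an example tool; it assumes spaCy and its small English model are installed, and the sample sentence is invented):

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = ("Apple opened a new office in Berlin in March 2023, "
        "and Tim Cook announced a $2 billion investment.")

doc = nlp(text)
for ent in doc.ents:
    # ent.label_ is a predefined category such as ORG, GPE, DATE, PERSON, MONEY.
    print(f"{ent.text:25s} -> {ent.label_}")
```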
13.2 Descriptive Analytics of Graphs
Big data visualization state of the art
