Seok Hee Hong - Visual analytics of big data

Recent technological advances have led to the production of big data and, consequently, to many massive, complex network models in domains including science and engineering. Examples include biological networks such as phylogenetic networks, gene regulatory networks, metabolic pathways, biochemical networks and protein-protein interaction networks. Other examples are social networks such as the Facebook, Twitter and LinkedIn networks, telephone call networks, patent networks, citation networks and collaboration networks. Visualization is an effective analysis tool for such networks. Good visualization reveals the hidden structure of the networks and amplifies human understanding, leading to new insights, new findings and predictions. However, constructing good visualizations of big data can be very challenging.
In this talk, I will present a framework for visual analytics of big data. Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces. Our framework is based on the tight integration of network analysis methods with visualization methods to address the issues of scalability and complexity. I will also present a number of case studies using various networks derived from big data, in particular social networks and biological networks.

First presented at the 2014 Winter School in Mathematical and Computational Biology



Usage Rights

© All Rights Reserved

Presentation Transcript

  • Visual Analytics of Big Data, Seok-Hee Hong, University of Sydney, Bioinformatics Winter School 2014
  • Big Data and The Scale Problem
  • Social networks: Facebook users, 2004 to 2007 (chart; y-axis from 0 to 50M users)
  • Biological networks: KEGG database, 1982 to 2006 (chart; log-scale y-axis from 10^2 to 10^8)
  • Internet Movie Database: year 1937
  • Internet Movie Database: year 1995
  • The scale problem: data sets are growing much faster than the computing systems and tools available to analyse them, and existing algorithms and methods do not scale well enough to be efficient or effective on big data sets.
  • Big Graph/Network
  • Erdős networks (Lincoln Lu)
  • Visual Analytics
  • Visual Analytics: good visualisation can enable users to
    - understand the structure
    - discover new knowledge/insight
    - find regular/abnormal patterns/behavior
    - generate/confirm/reject hypotheses
    - confirm the expected and discover the unexpected
    - reveal the hidden truth
    - predict the future
  • Visual Data Mining
  • Key Scientific Challenges: (1) scalability, (2) visual complexity, (3) domain complexity
  • Visual Analysis Framework for Big Graphs (diagram: Big Data, Graph and Picture, connected by analysis, visualisation and interaction)
  • GEOMI (GEOmetry for Maximum Insight): a visual analytics tool for large and complex networks, developed by NICTA and the University of Sydney
  • GEOMI (GEOmetry for Maximum Insight): network analysis, interaction, graph layout
  • GEOMI Features
    - Network/graph generator: scale-free networks, clustered graphs, hierarchical graphs
    - Network analysis
      - Centrality: degree, betweenness, closeness, eccentricity, eigenvector, random-walk betweenness, uniqueness
      - Group analysis: blockmodelling, clustering, k-core, structural equivalence
      - Graph algorithms: filtering, shortest path, giant component
    - Interaction/Navigation: zoom, panning, rotation, selection, graph layout interaction/navigation, animation, head gesture interaction
  • Graph/Network Layout
    - Node-link representation: trees, planar graphs, general undirected graphs, directed graphs, clustered graphs, hierarchical graphs, scale-free networks, dynamic/temporal networks, multi-relational networks, multi-variate networks, overlapping networks
    - Map representation: tree/radial tree map, Voronoi map, temporal map
    - Hybrid representation
  • Interaction with Cool Toys
  • IMDB (Internet Movie Database) Network Analysis: Kevin Bacon Network
  • Hollywood Movie Actor Collaboration Network (Kevin Bacon network, IMDB): Days of Thunder (1990), Far and Away (1992), A Few Good Men
  • Kevin Bacon network: Tom Cruise, Bacon #1; Nicole Kidman, Bacon #2
  • Evolution of Kevin Bacon Network
  • GD05: Evolution of IMDB, Kevin Bacon #1 network, 2000
  • WOS (Web of Science) Analysis: social network, co-citation network
  • Evolution of Co-citation Network in WOS
  • Co-citation network of year 2003
  • Co-citation network of year 2006
  • Information Visualisation Network Analysis
  • Evolution of research area
  • Info Vis Collaboration Network
  • Email Network Virus Detection
  • History of World Cup
  • World Cup 2002
  • Edge Bundling with centrality analysis & k-core analysis
  • US Airline Network Analysis
  • Integration with Clustering: Clustered Graph Layout
  • Metabolic Pathway Visualisation
  • GO-defined Protein Interaction Network
  • 2.5D Scale-free Network Visualisation
  • Scale-free Network [Barabási and Albert 1999]
    - Exponential growth
    - Preferential attachment
    - Properties: power-law degree distribution; sparse, but locally dense; small-world property with O(log log n) average path length; high clustering coefficient; resilient to random attack, but vulnerable to designed attack
    - Examples: Web graph, social networks, biological networks
  • Parallel Plane/Concentric Sphere Layout (diagrams of subgraphs G1, G2 and G3)
  • PPI networks (Hawoong Jeong)
  • Visualisation of Patterns: Motifs
  • Overlapping Network Visualisation for Integrated Analysis
  • (Diagram labels: genome, proteome, metabolism, citrate cycle, protein-gene interactions, protein-protein interactions, biochemical reactions)
  • Two Overlapping Networks
  • Glycolysis Pathway [KEGG] and PPI [DIP], E. coli: 9 overlaps; 1-neighborhood network
  • Gene Regulatory Network [RegulonDB] and PPI, E. coli: periphery proteins; 6 hubs, no overlap
  • bottleneck proteins
  • Three Overlapping Networks
  • GRN [RegulonDB], PPI [DIP] and MN [KEGG] (E. coli)
  • 6 hubs in the GRN: crp, arcA, fis, hns, ihfAB, lrp; no overlap
  • aceE
  • aceE and aceF
  • GRN [RegulonDB], PPI [DIP] and MN [KEGG] (E. coli)
  • ptsG: overlap between 3 layers
  • Propagation Animation in Diffusion Network
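The GEOMI features slide lists centrality measures and graph algorithms without showing how they are computed. The following is a minimal Python sketch of three of them (degree centrality, closeness centrality and the giant component); it is not GEOMI's code. It assumes an unweighted, undirected graph stored as a dict mapping each node to the set of its neighbours, the function names are illustrative only, and closeness is computed within each node's own connected component.

```python
from collections import deque

# Assumed representation: undirected graph as {node: set of neighbours}.
Graph = dict[str, set[str]]


def bfs_distances(graph: Graph, source: str) -> dict[str, int]:
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree_centrality(graph: Graph) -> dict[str, float]:
    """Degree divided by the largest possible degree, n - 1."""
    n = len(graph)
    return {u: len(nbrs) / (n - 1) if n > 1 else 0.0 for u, nbrs in graph.items()}


def closeness_centrality(graph: Graph) -> dict[str, float]:
    """(number of other reachable nodes) / (sum of their distances), per component."""
    scores = {}
    for u in graph:
        dist = bfs_distances(graph, u)
        total = sum(dist.values())
        scores[u] = (len(dist) - 1) / total if total > 0 else 0.0
    return scores


def giant_component(graph: Graph) -> set[str]:
    """Node set of the largest connected component."""
    seen: set[str] = set()
    largest: set[str] = set()
    for u in graph:
        if u not in seen:
            component = set(bfs_distances(graph, u))
            seen |= component
            if len(component) > len(largest):
                largest = component
    return largest
```

For a path a-b-c plus a separate edge d-e, `giant_component` returns {"a", "b", "c"}, and the middle node b gets a closeness of 1.0 because both other nodes in its component are one hop away.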
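The Kevin Bacon slides report Tom Cruise as Bacon #1 and Nicole Kidman as Bacon #2. A Bacon number is the shortest-path distance from Kevin Bacon in the actor collaboration graph, so a breadth-first search from Kevin Bacon computes it for every connected actor. A minimal sketch, assuming the movie data arrives as a hypothetical dict from movie title to cast list (the function name `bacon_numbers` is illustrative, not from IMDB or GEOMI):

```python
from collections import deque


def bacon_numbers(cast_lists: dict[str, list[str]], centre: str = "Kevin Bacon") -> dict[str, int]:
    """Shortest-path distance from `centre` in the actor collaboration graph."""
    # Two actors are adjacent if they appear in at least one movie together.
    adjacency: dict[str, set[str]] = {}
    for cast in cast_lists.values():
        for actor in cast:
            adjacency.setdefault(actor, set()).update(a for a in cast if a != actor)
    if centre not in adjacency:
        return {}
    # The BFS level at which an actor is first reached is their Bacon number.
    numbers = {centre: 0}
    queue = deque([centre])
    while queue:
        actor = queue.popleft()
        for co_star in adjacency[actor]:
            if co_star not in numbers:
                numbers[co_star] = numbers[actor] + 1
                queue.append(co_star)
    return numbers


if __name__ == "__main__":
    # Toy cast lists: only the two relevant actors per film are included.
    films = {
        "A Few Good Men (1992)": ["Tom Cruise", "Kevin Bacon"],
        "Days of Thunder (1990)": ["Tom Cruise", "Nicole Kidman"],
        "Far and Away (1992)": ["Tom Cruise", "Nicole Kidman"],
    }
    print(bacon_numbers(films))  # {'Kevin Bacon': 0, 'Tom Cruise': 1, 'Nicole Kidman': 2}
```

Because BFS visits actors in order of distance, each actor's number is fixed the first time they are reached, which is what makes the single pass sufficient.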
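The Web of Science slides show co-citation networks for 2003 and 2006. Two papers are co-cited when a third paper cites both, and the number of such citing papers is the usual co-citation weight. A minimal sketch, assuming the citation data is available as a hypothetical dict from each citing paper to the list of papers it references; a per-year network could be built by filtering the citing papers by year before calling it. The helper `strong_pairs` is a hypothetical thresholding step of the kind often used to thin a dense co-citation graph before drawing it.

```python
from collections import Counter
from itertools import combinations


def co_citation_weights(references: dict[str, list[str]]) -> Counter:
    """For each unordered pair of cited papers, count how many papers cite both."""
    weights: Counter = Counter()
    for cited in references.values():
        # Deduplicate and sort so each pair is counted once per citing paper,
        # always under the same (alphabetically ordered) key.
        for a, b in combinations(sorted(set(cited)), 2):
            weights[(a, b)] += 1
    return weights


def strong_pairs(weights: Counter, min_weight: int) -> list[tuple[str, str]]:
    """Co-citation edges whose weight reaches the threshold, in sorted order."""
    return sorted(pair for pair, w in weights.items() if w >= min_weight)


if __name__ == "__main__":
    # Hypothetical citing papers P1..P3 and cited papers A..C.
    refs = {"P1": ["A", "B", "C"], "P2": ["A", "B"], "P3": ["B", "C"]}
    print(strong_pairs(co_citation_weights(refs), 2))  # [('A', 'B'), ('B', 'C')]
```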
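The edge bundling slide combines centrality analysis with k-core analysis. The k-core of a graph is the maximal subgraph in which every node has degree at least k, and each node's core number is the largest k for which it survives. A minimal sketch of the core decomposition by repeated minimum-degree peeling, assuming an undirected graph stored as a dict from node to neighbour set; the edge bundling itself is not shown, and the function names are illustrative.

```python
Graph = dict[str, set[str]]  # assumed: undirected graph as {node: set of neighbours}


def core_numbers(graph: Graph) -> dict[str, int]:
    """Core number of every node, by repeatedly peeling a minimum-degree node."""
    degree = {u: len(nbrs) for u, nbrs in graph.items()}
    remaining = set(graph)
    core: dict[str, int] = {}
    k = 0
    while remaining:
        # O(n^2) selection keeps the sketch short; bucket queues give O(n + m).
        u = min(remaining, key=degree.__getitem__)
        k = max(k, degree[u])  # the core level never decreases during peeling
        core[u] = k
        remaining.remove(u)
        for v in graph[u]:
            if v in remaining:
                degree[v] -= 1
    return core


def k_core(graph: Graph, k: int) -> Graph:
    """Subgraph induced by the nodes whose core number is at least k."""
    keep = {u for u, c in core_numbers(graph).items() if c >= k}
    return {u: graph[u] & keep for u in keep}
```

For a triangle a-b-c with a pendant node d attached to a, d has core number 1 and the triangle nodes have core number 2, so `k_core(graph, 2)` keeps only the triangle.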
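The scale-free slide cites the Barabási and Albert model, whose two ingredients are growth and preferential attachment. A minimal generator sketch, not the implementation behind GEOMI's network generator: nodes are added one at a time, and each new node attaches m edges to distinct existing nodes chosen with probability proportional to their current degree. Starting from a clique on m + 1 nodes is an assumption made here so that every early node has non-zero degree.

```python
import random
from typing import Optional


def barabasi_albert(n: int, m: int, seed: Optional[int] = None) -> list[tuple[int, int]]:
    """Edge list of a preferential-attachment graph on nodes 0..n-1."""
    if not 1 <= m < n:
        raise ValueError("need 1 <= m < n")
    rng = random.Random(seed)
    # Seed graph (an assumption of this sketch): a clique on nodes 0..m.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears here once per incident edge, so a uniform draw from
    # this list picks a node with probability proportional to its degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n):
        targets: set[int] = set()
        while len(targets) < m:  # m distinct targets, no multi-edges
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges
```

For example, `barabasi_albert(1000, 2, seed=42)` returns 3 + 2 * 997 = 1997 edges; plotting the resulting degree histogram on log-log axes is one way to look for the heavy-tailed degree distribution listed on the slide.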
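The 2.5D slides place nodes on parallel planes or on concentric spheres, but the transcript does not say how nodes are assigned to levels. As an assumption for illustration only, the sketch below bins nodes by floor(log2(degree)), which spreads the heavy-tailed degrees of a scale-free network evenly across levels and puts hubs on the top plane (or the innermost sphere). The function names and the binning rule are hypothetical, not taken from the talk.

```python
Graph = dict[str, set[str]]  # assumed: undirected graph as {node: set of neighbours}


def degree_planes(graph: Graph, num_planes: int) -> dict[str, int]:
    """Plane index per node from floor(log2(degree)), capped at the top plane."""
    planes = {}
    for u, nbrs in graph.items():
        degree = max(len(nbrs), 1)       # isolated nodes share the bottom plane
        level = degree.bit_length() - 1  # exact floor(log2(degree)) for degree >= 1
        planes[u] = min(level, num_planes - 1)
    return planes


def lift_to_planes(xy: dict[str, tuple[float, float]], planes: dict[str, int],
                   gap: float = 1.0) -> dict[str, tuple[float, float, float]]:
    """Keep each node's 2D position and use its plane index (times gap) as height z."""
    return {u: (x, y, planes[u] * gap) for u, (x, y) in xy.items()}


def sphere_radii(planes: dict[str, int], num_planes: int, gap: float = 1.0) -> dict[str, float]:
    """Concentric-sphere variant: the top plane maps to the innermost sphere."""
    return {u: (num_planes - p) * gap for u, p in planes.items()}
```

The 2D positions passed to `lift_to_planes` could come from any planar layout of each level; choosing that layout is outside this sketch.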
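The motif slide does not specify which motifs are visualised. As one simple example, assumed here rather than taken from the slide, the sketch enumerates triangles (3-node cliques) in an undirected graph and counts how many each node belongs to, which a drawing could use to highlight or size nodes. The graph is assumed to be a dict from node to neighbour set, and the function names are illustrative.

```python
Graph = dict[str, set[str]]  # assumed: undirected graph as {node: set of neighbours}


def find_triangles(graph: Graph) -> list[tuple[str, str, str]]:
    """All triangles, each reported once as (u, v, w) in the graph's node order."""
    order = {u: i for i, u in enumerate(graph)}  # fixed total order on the nodes
    triangles = []
    for u in graph:
        later_u = {v for v in graph[u] if order[v] > order[u]}
        for v in later_u:
            # w must come after v and be adjacent to both u and v.
            for w in later_u & graph[v]:
                if order[w] > order[v]:
                    triangles.append((u, v, w))
    return triangles


def triangle_participation(graph: Graph) -> dict[str, int]:
    """Number of triangles each node belongs to."""
    counts = dict.fromkeys(graph, 0)
    for triangle in find_triangles(graph):
        for node in triangle:
            counts[node] += 1
    return counts
```

Restricting each search to nodes later in a fixed order is what guarantees that every triangle is reported exactly once.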
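The overlapping-network slides report the nodes shared by two or three networks (for example the glycolysis pathway and the PPI network in E. coli) and the 1-neighborhood network around them. A minimal sketch of those two steps, assuming every network has already been mapped to one common set of node identifiers (in practice, names from KEGG, DIP and RegulonDB would need such a mapping first) and is stored as an undirected dict of neighbour sets. The identifiers in the example are hypothetical.

```python
Graph = dict[str, set[str]]  # assumed: undirected graph as {node: set of neighbours}


def overlap_nodes(*networks: Graph) -> set[str]:
    """Identifiers present in every given network (works for two or three layers)."""
    common = set(networks[0])
    for network in networks[1:]:
        common &= set(network)
    return common


def one_neighbourhood(network: Graph, seeds: set[str]) -> Graph:
    """Subnetwork induced by the seeds and their direct neighbours in `network`."""
    keep = set(seeds) & set(network)
    for s in list(keep):  # iterate over a copy because `keep` grows in the loop
        keep |= network[s]
    return {u: network[u] & keep for u in keep}


if __name__ == "__main__":
    # Hypothetical toy layers; g1..g3, p7 and p9 are made-up identifiers.
    pathway = {"g1": {"g2"}, "g2": {"g1", "g3"}, "g3": {"g2"}}
    ppi = {"g2": {"p9"}, "g3": {"p9"}, "p9": {"g2", "g3"}, "p7": set()}
    shared = overlap_nodes(pathway, ppi)
    print(sorted(shared))                          # ['g2', 'g3']
    print(sorted(one_neighbourhood(ppi, shared)))  # ['g2', 'g3', 'p9']
```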
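The "bottleneck proteins" slide does not define the term. A common reading in PPI analysis, assumed here, is a protein with high betweenness centrality, that is, one that many shortest paths pass through. The sketch implements Brandes' algorithm for unweighted, undirected graphs stored as a dict of neighbour sets; `top_bottlenecks` is a hypothetical ranking helper, and choosing how many proteins count as bottlenecks is left to the analyst.

```python
from collections import deque

Graph = dict[str, set[str]]  # assumed: undirected graph as {node: set of neighbours}


def betweenness(graph: Graph) -> dict[str, float]:
    """Unnormalised betweenness centrality via Brandes' algorithm."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        # Forward BFS: count shortest paths (sigma) and record predecessors.
        stack: list[str] = []
        preds: dict[str, list[str]] = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward pass: accumulate dependencies in reverse BFS order.
        delta = dict.fromkeys(graph, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def top_bottlenecks(graph: Graph, k: int) -> list[str]:
    """The k nodes with the highest betweenness (ties broken by name)."""
    bc = betweenness(graph)
    return sorted(bc, key=lambda v: (-bc[v], v))[:k]
```

In a "bow tie" of two triangles sharing node c, every shortest path between the two sides passes through c, so c scores 4.0 (one for each of the four cross pairs) and is ranked first.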
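The final slide animates propagation in a diffusion network but does not name the diffusion model. As an assumption, the sketch uses a simple susceptible-infected (SI) process: at each step, every infected node independently infects each susceptible neighbour with probability p. Each returned set is one animation frame, so a renderer could colour the nodes of frame t as infected. The function name and parameters are hypothetical.

```python
import random
from typing import Optional

Graph = dict[str, set[str]]  # assumed: undirected graph as {node: set of neighbours}


def si_frames(graph: Graph, seeds: set[str], p: float, steps: int,
              seed: Optional[int] = None) -> list[set[str]]:
    """Infected set after each SI step; frame 0 is the seed set."""
    rng = random.Random(seed)
    infected = set(seeds)
    frames = [set(infected)]
    for _ in range(steps):
        newly: set[str] = set()
        for u in infected:
            for v in graph[u]:
                # Each infected-susceptible edge transmits independently with probability p.
                if v not in infected and rng.random() < p:
                    newly.add(v)
        infected |= newly  # nodes infected this step start spreading next step
        frames.append(set(infected))
    return frames
```

With p = 1.0 the process becomes deterministic and each frame adds one more breadth-first layer around the seeds; smaller p produces the slower, irregular spread that an animation makes visible.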