Seok Hee Hong - Visual analytics of big data

1,794 views

Published on

Recent technological advances have led to the production of a big data, and consequently have led to many massive complex network models in many domains including science and engineering. Examples include biological networks such as phylogenetic network, gene regulatory network, metabolic pathways, biochemical network and protein‐protein interaction networks. Other examples are social networks such as facebook network, twitter network, linked‐in network, telephone call network, patent network, citation network and collaboration network. Visualization is an effective analysis tool for such networks. Good visualization reveals the hidden structure of the
networks and amplifies human understanding, thus leading to new insights, new findings and predictions. However, constructing good visualization of big data can be very challenging.
In this talk, I will present a framework for visual analytics of big data. Visual Analytics is the science of analytical reasoning facilitated by interactive visual interfaces. Our framework is based on the tight integration of network analysis methods with visualization methods to address the scalability and complexity issues. I will present a
number of case studies using various networks derived from big data, in particular social networks and biological networks.

First presented at the 2014 Winter School in Mathematical and Computational Biology http://bioinformatics.org.au/ws14/program/

Published in: Science
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
1,794
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
21
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Seok Hee Hong - Visual analytics of big data

  1. 1. Visual Analytics of Big Data Seok-Hee Hong University of Sydney Bioinformatics Winter School 2014
  2. 2. Big Data and The Scale Problem
  3. 3. Social networks: Facebook users 2004 2005 2006 2007 50M 40M 30M 20M 10M 5M 0
  4. 4. Biological networks: KEGG database 1982 1988 1994 2000 2006 108 107 106 105 104 103 102
  5. 5. Internet Movie Data Base Year 1937
  6. 6. 1995
  7. 7. The scale problem Data sets are growing much faster than computing systems/tools to analyse them. Existing algorithms/methods do not scale well enough to be efficient/effective on the big data sets.
  8. 8. Big Graph/Network
  9. 9. Erdos networks Lincoln Lu
  10. 10. Visual Analytics
  11. 11.  Good visualisation can enable users:  to understand the structure  to discover new knowledge/insight  to find regular/abnormal patterns/behavior  to generate/confirm/reject hypothesis  to confirm expected and discover unexpected  to reveal the hidden truth  to predict the future Visual Analytics
  12. 12. Visual Data Mining
  13. 13. Key Scientific Challenge 1. Scalability 2. Visual Complexity 3. Domain Complexity
  14. 14. Visual Analysis Framework for Big Graph Big Data Graph Picture interaction visualisationanalysis
  15. 15. GEOMI (Geometry for Maximum Insight) Visual analytic tool for large and complex networks Developed by NICTA and University of Sydney
  16. 16. GEOMI (GEOmetry for Maximum Insight) Network Analysis Interaction Graph Layout
  17. 17. GEOMI Features Network/graph generator  Scale-free networks  Clustered graph  Hierarchical graph Network analysis  Centrality: degree, betweenness, closeness, eccentricity, eigenvector, randomwalk betweenness, uniqueness  Group analysis: blockmodelling, clustering, k-core, structural equivalence  Graph algorithms: filtering, shortest path, giant component Interaction/Navigation  Zoom, panning, rotation  Selection  Graph layout interaction/navigation  Animation  Head gesture interaction
  18. 18. Graph/Network Layout Node-link representation  Trees  Planar graphs  General undirected graphs  Directed graphs  Clustered graphs  Hierarchical graphs  Scale-free networks  Dynamic/Temporal networks  Multi-relational networks  Multi-variate networks  Overlapping networks Map representation  Tree/Radial tree map  Voronoi map  Temporal map Hybrid representation
  19. 19. Interaction with Cool Toys
  20. 20. IMDB (Internet Movie Data Base) Network Analysis Kevin Bacon Network
  21. 21. Days of Thunder (1990) Far and Away (1992) A Few Good Man Hollywood Movie Actor Collaboration Network Kevin Bacon Network IMDB (Internet Movie DataBase)
  22. 22. Kevin Bacon Tom Cruise: Bacon #1 Nicole Kidman: Bacon#2
  23. 23. Evolution of Kevin Bacon Network
  24. 24. GD05: Evolution of IMDB Kevin Bacon #1: 2000
  25. 25. WOS (Web of Science) Analysis Social Network Co-citation Network
  26. 26. Evolution of Co-citation Network in WOS
  27. 27. co-citation network of year 2003
  28. 28. co-citation network of Year 2006
  29. 29. Information Visualisation Network Analysis
  30. 30. Evolution of research area
  31. 31. Info Vis Collaboration Network
  32. 32. Email Network Virus Detection
  33. 33. History of World Cup
  34. 34. World Cup 2002
  35. 35. Edge Bundling with centrality analysis & k-core analysis
  36. 36. US Airline Network Analysis
  37. 37. Integration with Clustering Clustered Graph Layout
  38. 38. Metabolic Pathway Visualisation
  39. 39. GO-defined Protein Interaction Network
  40. 40. 2.5D Scale-free Network Visualisation
  41. 41. Scale-free Network  [Barabasi and Albert 99]  Exponential Growth  Preferential attachment  Properties  Power-law degree distribution  Sparse, but locally dense  Small-world property: O(loglogn) average path length  High clustering coefficient  Resilient to random attack, but vulnerable to designed attack  Examples:  Webgraph  Social networks  Biological networks
  42. 42. Parallel Plane/Concentric Sphere Layout G1 G3 G2G1 G3 G2
  43. 43. PPI networks Hawoong Jeong
  44. 44. Visualisation of Patterns Motif
  45. 45. Overlapping Network Visualisation for Integrated Analysis
  46. 46. protein-gene interactions protein-protein interactions PROTEOME GENOME Citrate Cycle METABOLISM Bio-chemical reactions
  47. 47. Two Overlapping Networks
  48. 48. Glycolysis Pathway [KEGG] and PPI [DIP]: E. Coli 9 overlap 1-neighborhood network
  49. 49. Gene Regulatory Network [RegulonDB] and PPI: E. Coli periphery proteins 6 hubs: no overlap
  50. 50. bottleneck proteins
  51. 51. Three Overlapping Networks
  52. 52. GRN [RegulonDB]: PPI [DIP]: MN [KEGG] (E. Coli)
  53. 53. 6 hubs in GR: crp, arcA, fis, hns, ihfAB, lrp No overlap
  54. 54. 3 aceE
  55. 55. 3 aceE aceF
  56. 56. 3 GRN [RegulonDB]: PPI [DIP]: MN [KEGG] (E. Coli)
  57. 57. 3 ptsG: overlap between 3 layers
  58. 58. Propagation Animation in Diffusion Network

×