The document discusses big data visualization and visual analysis, focusing on the challenges and opportunities. It begins with an overview of visualization and then discusses several challenges in big data visualization, including integrating heterogeneous data from different sources and scales, dealing with data and task complexity, limited interaction capabilities for large data, scalability for both data and users, and the need for domain and development libraries/tools. It then provides examples of visualizing taxi GPS data and traffic patterns in Beijing to identify traffic jams.
詹剑锋: BigDataBench—Benchmarking Big Data Systems (hdhappy001)
This document discusses BigDataBench, an open source project for big data benchmarking. BigDataBench includes six real-world data sets and 19 workloads that cover common big data applications and preserve the four V's of big data. The workloads were chosen to represent typical application domains like search engines, social networks, and e-commerce. BigDataBench aims to provide a standardized benchmark for evaluating big data systems, architectures, and software stacks. It has been used in several case studies for workload characterization and evaluating the performance and energy efficiency of different hardware platforms for big data workloads.
Best Practices at BGI for the Challenges in the Era of Big Genomics Data (Xing Xu)
BGI is the world's largest genome sequencing center, with over 150 sequencers and a sequencing throughput of 6 TB per day. It also has the largest computing and storage center for genomics in China, with over 20,000 CPU cores, 19 GPUs, 220+ teraflops of peak performance, and 17 petabytes of data storage. BGI faces challenges from the exponential growth of genomic data, complex data analysis processes, and widely distributed data. It addresses these challenges through solutions like high-speed data transfer, cloud computing platforms like EasyGenomics, and distributed algorithms and infrastructure using Hadoop and GPU acceleration.
This tutorial was held at IEEE BigData '14 on October 29, 2014 in Bethesda, MD, USA.
Presenters: Chaitan Baru and Tilmann Rabl
More information available at:
http://msrg.org/papers/BigData14-Rabl
Summary:
This tutorial will introduce the audience to the broad set of issues involved in defining big data benchmarks and creating auditable industry-standard benchmarks that consider performance as well as price/performance. Big data benchmarks must capture the essential characteristics of big data applications and systems, including heterogeneous data, e.g. structured, semi-structured, unstructured, graphs, and streams; large-scale and evolving system configurations; varying system loads; processing pipelines that progressively transform data; and workloads that include queries as well as data mining and machine learning operations and algorithms. Different benchmarking approaches will be introduced, from micro-benchmarks to application-level benchmarking.
Since May 2012, five workshops have been held on Big Data Benchmarking, with participation from industry and academia. One outcome of these meetings has been the creation of the industry's first big data benchmark, viz., TPCx-HS, the Transaction Processing Performance Council's benchmark for Hadoop Systems. During these workshops, a number of other proposals have been put forward for more comprehensive big data benchmarking. The tutorial will present and discuss salient points and essential features of such benchmarks that have been identified in these meetings by experts in big data as well as benchmarking. Two key approaches are now being pursued: one, called BigBench, is based on extending the TPC Decision Support (TPC-DS) benchmark with big data application characteristics; the other, called Deep Analytics Pipeline, is based on modeling processing that is routinely encountered in real-life big data applications. Both will be discussed.
We conclude with a discussion of a number of future directions for big data benchmarking.
Lessons Learned on Benchmarking Big Data Platforms (t_ivanov)
The document discusses benchmarking different big data platforms and SQL-on-Hadoop engines. It evaluates the performance of Hadoop using the TPCx-HS benchmark with different network configurations. It also compares the performance of SQL query engines like Hive, Spark SQL, Impala, and file formats like ORC and Parquet using the TPC-H benchmark on a 1TB dataset. The results show that a dedicated 1Gb network is 5 times faster than a shared network. For SQL query engines, Hive with ORC format is on average 1.44 times faster than with Parquet. Spark SQL could only run 12 queries and was faster on 5 queries compared to Hive.
Big Data Ecosystem at LinkedIn. Keynote talk at Big Data Innovators Gathering... (Mitul Tiwari)
LinkedIn has a large professional network with 360M members. They build data-driven products using members' rich profile data. To do this, they ingest online data into offline systems using Apache Kafka. The data is then processed using Hadoop, Spark, Samza and Cubert to compute features and train models. Results are moved back online using Voldemort and Kafka. For example, People You May Know recommendations are generated by triangle closing in Hadoop and Cubert to count common connections faster. Site speed is monitored in real-time using Samza to join logs from different services.
WBDB 2015 Performance Evaluation of Spark SQL using BigBench (t_ivanov)
In this paper we present the initial results of our work to run BigBench on Spark. First, we evaluated the data scalability behavior of the existing MapReduce implementation of BigBench. Next, we executed the group of 14 pure HiveQL queries on Spark SQL and compared the results with the respective Hive results. Our experiments show that: (1) for both MapReduce and Spark SQL, BigBench query times grow better than linearly as the data size increases, and (2) pure HiveQL queries run faster on Spark SQL than on Hive.
http://clds.sdsc.edu/wbdb2015.ca/program
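To make the setup being compared concrete, here is a minimal, hypothetical sketch of running a pure HiveQL query through Spark SQL with Hive support enabled. It is not code from the paper; the table and column names follow the BigBench-style clickstream schema but should be treated as placeholders.

```python
from pyspark.sql import SparkSession

# Spark SQL executes HiveQL directly when Hive support is enabled, which is
# how pure HiveQL queries can be replayed on Spark and compared against Hive.
spark = (
    SparkSession.builder
    .appName("hiveql-on-spark-sql")
    .enableHiveSupport()  # read tables registered in the Hive metastore
    .getOrCreate()
)

# 'web_clickstreams' stands in for a BigBench-style table.
top_items = spark.sql("""
    SELECT wcs_item_sk, COUNT(*) AS clicks
    FROM web_clickstreams
    GROUP BY wcs_item_sk
    ORDER BY clicks DESC
    LIMIT 10
""")
top_items.show()
```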
This document provides an overview of streaming analytics, including definitions, common use cases, and key concepts like streaming engines, processing models, and guarantees. It also provides examples of analyzing data streams using Apache Spark Structured Streaming, Apache Flink, and Kafka Streams APIs. Code snippets demonstrate windowing, triggers, and working with event-time.
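To illustrate the windowing, trigger, and event-time concepts, here is a small self-contained Spark Structured Streaming sketch. It demonstrates the ideas from the deck rather than reproducing its snippets, and uses Spark's built-in rate source in place of a real stream such as a Kafka topic.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, window

spark = SparkSession.builder.appName("streaming-windows-demo").getOrCreate()

# The built-in "rate" source emits (timestamp, value) rows, standing in for
# a real event stream.
events = spark.readStream.format("rate").option("rowsPerSecond", 10).load()

# Event-time windowing with a watermark: count events per 30-second window,
# sliding every 10 seconds, tolerating events up to 1 minute late.
counts = (
    events.withWatermark("timestamp", "1 minute")
    .groupBy(window(col("timestamp"), "30 seconds", "10 seconds"))
    .count()
)

# A processing-time trigger fires a micro-batch every 10 seconds.
query = (
    counts.writeStream
    .outputMode("update")
    .format("console")
    .trigger(processingTime="10 seconds")
    .start()
)
query.awaitTermination()
```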
BDSE 2015 Evaluation of Big Data Platforms with HiBench (t_ivanov)
The document evaluates and compares the performance of DataStax Enterprise (DSE) and Cloudera Hadoop Distribution (CDH) using the HiBench benchmark suite. It finds that CDH outperforms DSE for CPU-intensive, read-intensive, and mixed workloads, while DSE has better performance for write-intensive workloads. The evaluation was conducted on an 8-node cluster using data sizes from 240GB to 440GB. Ongoing work includes analyzing availability, evaluating different file formats, and comparing graph processing engines.
TPCx-HS is the first vendor-neutral benchmark focused on big data systems – which have become a critical part of the enterprise IT ecosystem.
Watch the video presentation: http://wp.me/p3RLHQ-cLY
Learn more: http://www.tpc.org/tpcx-hs
The document describes scientific workflows for big data and the challenges they present. It discusses Prof. Shiyong Lu's work on developing the VIEW system for designing, executing, and analyzing scientific workflows. The VIEW system provides a runtime environment for workflows, supports their execution on servers or clouds, and enables efficient storage, querying and visualization of workflow provenance data.
This document discusses HiBench, a benchmark suite for Hadoop. It provides an overview of HiBench and how it can be used to characterize and evaluate Hadoop deployments. Evaluation results using HiBench show that a newer Intel Xeon server platform provides up to 86% more throughput and is up to 56% faster than an older platform. Evaluations between Hadoop versions 0.19.1 and 0.20.0 show that improvements in the newer version help reduce job completion times. The document concludes by providing suggestions for optimizing Hadoop deployments through hardware and software configurations.
Covers different types of big data benchmarking and different benchmark suites, with a deep dive into TeraSort and a demo of TPCx-HS.
Meetup Details of presentation:
http://www.meetup.com/lspe-in/events/203918952/
Big Graph Analytics on Neo4j with Apache Spark (Kenny Bastani)
In this talk I will introduce you to a Docker container that provides an easy way to do distributed graph processing using Apache Spark GraphX and a Neo4j graph database. You'll learn how to analyze big data graphs that are exported from Neo4j and subsequently updated with the results of a Spark GraphX analysis. The types of analysis I will be talking about are PageRank, connected components, triangle counting, and community detection.
Database technologies have evolved to be able to store big data, but are largely inflexible. For complex graph data models stored in a relational database there may be tedious transformations and shuffling around of data to perform large scale analysis.
Fast and scalable analysis of big data has become a critical competitive advantage for companies. There are open source tools like Apache Hadoop and Apache Spark that are providing opportunities for companies to solve these big data problems in a scalable way. Platforms like these have become the foundation of the big data analysis movement.
Graph analytics can be used to analyze a social graph constructed from email messages on the Spark user mailing list. Key metrics like PageRank, in-degrees, and strongly connected components can be computed using the GraphX API in Spark. For example, PageRank was computed on the 4Q2014 email graph, identifying the top contributors to the mailing list.
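GraphX itself exposes a Scala API; as a rough Python stand-in (an assumption for illustration, not the speaker's code), the same PageRank and in-degree computations can be expressed with the GraphFrames package on a toy mailing-list graph:

```python
from pyspark.sql import SparkSession
from graphframes import GraphFrame  # requires the graphframes Spark package

spark = SparkSession.builder.appName("mailing-list-graph").getOrCreate()

# Hypothetical sender -> replier edges extracted from mailing-list threads.
vertices = spark.createDataFrame([("a",), ("b",), ("c",)], ["id"])
edges = spark.createDataFrame(
    [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")], ["src", "dst"]
)
g = GraphFrame(vertices, edges)

g.inDegrees.show()  # in-degree per contributor

# PageRank over the reply graph; high scores flag top contributors.
ranks = g.pageRank(resetProbability=0.15, maxIter=10)
ranks.vertices.orderBy("pagerank", ascending=False).show()
```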
In 2001, as early high-speed networks were deployed, George Gilder observed that “when the network is as fast as the computer's internal links, the machine disintegrates across the net into a set of special purpose appliances.” Two decades later, our networks are 1,000 times faster, our appliances are increasingly specialized, and our computer systems are indeed disintegrating. As hardware acceleration overcomes speed-of-light delays, time and space merge into a computing continuum. Familiar questions like “where should I compute,” “for what workloads should I design computers,” and "where should I place my computers” seem to allow for a myriad of new answers that are exhilarating but also daunting. Are there concepts that can help guide us as we design applications and computer systems in a world that is untethered from familiar landmarks like center, cloud, edge? I propose some ideas and report on experiments in coding the continuum.
Social Media World News Impact on Stock Index Values - Investment Fund Analyt... (Bernardo Najlis)
Presentation for a project on Social Media World News Impact on Stock Index Values (DJIA) for Investment Fund Analytics. Group project done in the course DS8004 - Data Mining at Ryerson University for the Master's in Data Science and Analytics.
Graph Data: a New Data Management Frontier (Demai Ni)
Graph Data: a New Data Management Frontier -- Huawei’s view and Call for Collaboration by Demai Ni:
Huawei provides enterprise databases and is actively exploring the latest technology to provide an end-to-end data management solution on the cloud. We are looking to bridge classic RDBMS to graph databases on a distributed platform.
Big Data is changing abruptly, and where it is likely heading (Paco Nathan)
Big Data technologies are changing rapidly due to shifts in hardware, data types, and software frameworks. Incumbent Big Data technologies do not fully leverage newer hardware like multicore processors and large memory spaces, while newer open source projects like Spark have emerged to better utilize these resources. Containers, clouds, functional programming, databases, approximations, and notebooks represent significant trends in how Big Data is managed and analyzed at large scale.
Big data appliance ecosystem: in-memory DB, Hadoop, analytics, data mining, and business intelligence, with multiple data source charts and Twitter support and analysis.
Gao Cong: Geospatial social media data management and context-aware recommenda... (jins0618)
The document discusses geospatial social media data management and context-aware recommendation. It introduces technologies for geo-positioning users and content, and how user generated content from social media is increasingly associated with geo-locations. The document then outlines queries for static geo-textual data, publish/subscribe queries on geo-textual data streams, and personalized, context-aware point-of-interest recommendation based on modeling user behavior from geo-textual data.
Fully Automated QA System For Large Scale Search And Recommendation Engines U... (Spark Summit)
1) The document describes a fully automated QA system for large scale search and recommendation engines using Spark.
2) It discusses key concepts in information retrieval, like precision, recall, and learning to rank, as well as challenges in building machine learning models for ranking, such as obtaining labeled training data.
3) The system architecture involves extracting features from query logs, calculating relevance scores from user click signals, and training machine learning models to improve ranking; a toy sketch of the click-signal step follows below.
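As a toy illustration of that click-signal step (hypothetical data and logic, not the system's actual code), a crude relevance label can be derived from click-through rate per query-document pair:

```python
from collections import defaultdict

# Hypothetical search log rows: (query, doc_id, clicked 0/1).
log = [
    ("big data", "doc1", 1), ("big data", "doc1", 0),
    ("big data", "doc2", 1), ("big data", "doc2", 1),
    ("spark", "doc3", 0),
]

impressions = defaultdict(int)
clicks = defaultdict(int)
for query, doc, clicked in log:
    impressions[(query, doc)] += 1
    clicks[(query, doc)] += clicked

# Click-through rate as a crude relevance score; a real system would debias
# clicks (e.g. for result position) before training a learning-to-rank model.
for key, n in impressions.items():
    print(key, round(clicks[key] / n, 2))
```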
This document provides an overview of Amundsen, an open source data discovery and metadata platform developed by Lyft. It begins with an introduction to the challenges of data discovery and outlines Amundsen's architecture, which uses a graph database and search engine to provide metadata about data resources. The document discusses how Amundsen impacts users at Lyft by reducing time spent searching for data and discusses the project's community and future roadmap.
Exploring Neo4j Graph Database as a Fast Data Access Layer (Sambit Banerjee)
This article describes the findings of an extensive investigation conducted to explore the feasibility of using a Neo4j graph database to build a fast data access layer with near-real-time data ingestion from the underlying source systems.
Data analysis using HiveQL & Tableau (pkale1708)
This document describes a project analyzing crime data from Chicago to determine safe and unsafe areas of the city. The analysis uses big data tools like HiveQL on a Hadoop cluster to query a 1.3GB crime dataset. Queries find the most common crime types, crimes by location and month, and rank areas by crime counts. The results are visualized in graphs and maps. The goal is to help users identify safe residences using large-scale public crime data.
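A minimal sketch of the kind of HiveQL query described, submitted from Python via the PyHive package; the host, database, and 'crimes' table schema are assumptions for illustration, not details from the project:

```python
from pyhive import hive  # assumes PyHive is installed and HiveServer2 is reachable

conn = hive.Connection(host="localhost", port=10000, database="default")
cursor = conn.cursor()

# Count crimes by type -- the 'crimes' table and its columns are hypothetical.
cursor.execute("""
    SELECT primary_type, COUNT(*) AS cnt
    FROM crimes
    GROUP BY primary_type
    ORDER BY cnt DESC
    LIMIT 10
""")
for crime_type, cnt in cursor.fetchall():
    print(crime_type, cnt)
```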
At Data-centric Architecture Forum 2020 Thomas Cook, our Sales Director of AnzoGraph DB, gave his presentation "Knowledge Graph for Machine Learning and Data Science". These are his slides.
This document discusses benchmarking Hadoop and big data systems. It provides an overview of common Hadoop benchmarks including microbenchmarks like TestDFSIO, TeraSort, and NNBench which test individual Hadoop components. It also describes BigBench, a benchmark modeled after TPC-DS that aims to test a more complete big data analytics workload using techniques like MapReduce, Hive, and Mahout across structured, semi-structured, and unstructured data. The document emphasizes using Hadoop distributions for administration and both microbenchmarks and full benchmarks like BigBench for evaluation.
The document provides an overview of new features in HDFS in Hadoop 2, including:
- A new appendable write pipeline that allows files to be reopened for append and provides primitives like hflush and hsync (a small append sketch follows this list).
- Support for multiple namenode federation to improve scalability and isolate namespaces.
- Namenode high availability using techniques like ZooKeeper and a quorum journal manager to avoid single points of failure.
- A new file system snapshots feature that allows point-in-time recovery through copy-on-write snapshots without data copying.
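hflush and hsync are Java-level primitives, but the append behavior itself can be exercised from Python over WebHDFS. A hedged sketch using the HdfsCLI package, where the NameNode URL, user, and path are assumptions about your deployment:

```python
from hdfs import InsecureClient  # HdfsCLI, talking to the WebHDFS REST API

# Hypothetical WebHDFS endpoint; 9870 is the usual Hadoop 3 NameNode HTTP port.
client = InsecureClient("http://namenode:9870", user="hdfs")

# Reopen an existing file for append, exercising the appendable write pipeline.
client.write("/logs/events.log", data=b"new record\n", append=True)
```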
Cloud computing, big data, and mobile are three major trends that will change the world. Cloud computing provides scalable and elastic IT resources as services over the internet. Big data involves large amounts of both structured and unstructured data that can generate business insights when analyzed. The Hadoop ecosystem, including components like HDFS, MapReduce, Pig, and Hive, provides an architecture for distributed storage and processing of big data across commodity hardware.
This document provides an overview of Capital One's plans to introduce Hadoop and discusses several proofs of concept (POCs) that could be developed. It summarizes the history and practices of using Hadoop at other companies like LinkedIn, Netflix, and Yahoo. It then outlines possible POCs for Hadoop distributions, ETL/analytics frameworks, performance testing, and developing a scaling layer. The goal is to contribute open source code and help with Capital One's transition to using Hadoop in production.
Cloud computing, big data, and mobile technologies are driving major changes in the IT world. Cloud computing provides scalable computing resources over the internet. Big data involves extremely large data sets that are analyzed to reveal business insights. Hadoop is an open-source software framework that allows distributed processing of big data across commodity hardware. It includes tools like HDFS for storage and MapReduce for distributed computing. The Hadoop ecosystem also includes additional tools for tasks like data integration, analytics, workflow management, and more. These emerging technologies are changing how businesses use and analyze data.
Hadoop is an open-source software framework for distributed storage and processing of large datasets across clusters of computers. It consists of Hadoop Distributed File System (HDFS) for storage, and MapReduce for distributed processing. HDFS stores large files across multiple machines, with automatic replication of data for fault tolerance. It has a master/slave architecture with a NameNode managing the file system namespace and DataNodes storing file data blocks.
This document summarizes a seminar presentation on big data analytics. It reviews 25 research papers published between 2011 and 2014 on issues related to big data analysis, real-time big data analysis using Hadoop in cloud computing, and classification of big data using tools and frameworks. The review process involved a 5-stage analysis of the papers. Key issues identified include big data analysis, real-time analysis using Hadoop in clouds, and classification using tools like Hadoop, MapReduce, and HDFS. Promising solutions discussed are the MapReduce Agent Mobility framework, PuntStore with the pLSM index, the IOT-StatisticDB statistical database mechanism, and visual clustering analysis.
Data fusion for city live event detection (Alket Cecaj)
Event detection in an urban context using aggregated mobile activity, for example CDR data, together with social network data, in this case geo-referenced Twitter data. The experiments show that the two datasets used, CDR and social data, complement each other, providing better event detection results and event description.
PL-4089, Accelerating and Evaluating OpenCL Graph Applications, by Shuai Che,... (AMD Developer Central)
PL-4089, Accelerating and Evaluating OpenCL Graph Applications, by Shuai Che, Bradford Bechmann, Steve Reinhardt and Kevin Skadron at the AMD Developer Summit (APU13), November 11-13, 2013.
COBWEB: A quality assurance workflow authoring tool for citizen science and cr... (COBWEB Project)
This document describes a quality assurance workflow authoring tool for citizen science and crowd-sourced data. The tool aims to integrate authoritative and crowd-sourced data by bringing together a structured, standards-based institutional approach with a citizen-focused, timely crowd-sourced approach. The tool uses a BPMN-based workflow to chain OGC Web Processing Services for quality control processes. This allows stakeholders to design customizable QA workflows by selecting from a repository of generic quality control processes.
The “Local Ranking Problem” (LRP) is related to the computation of a centrality-like rank on a local graph, where the scores of the nodes could significantly differ from the ones computed on the global graph. Previous work has studied LRP on the hyperlink graph but never on the BrowseGraph, namely a graph where nodes are webpages and edges are browsing transitions. Recently, this graph has received more and more attention in many different tasks such as ranking, prediction and recommendation. However, a webserver has only the browsing traffic performed on its pages (local BrowseGraph) and, as a consequence, the local computation can lead to estimation errors, which hinders the increasing number of applications in the state of the art. Also, although the divergence between the local and global ranks has been measured, the possibility of estimating such divergence using only local knowledge has been mainly overlooked. These aspects are of great interest for online service providers who want to: (i) gauge their ability to correctly assess the importance of their resources only based on their local knowledge, and (ii) take into account real user browsing fluxes that better capture the actual user interest than the static hyperlink network. We study the LRP problem on a BrowseGraph from a large news provider, considering as subgraphs the aggregations of browsing traces of users coming from different domains. We show that the distance between rankings can be accurately predicted based only on structural information of the local graph, being able to achieve an average rank correlation as high as 0.8.
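A toy sketch of the paper's core measurement, using synthetic data rather than the news provider's BrowseGraph: compute PageRank on a "global" graph and on a local subgraph, then compare the two rankings with a rank correlation. Everything here is illustrative, not the paper's pipeline.

```python
import networkx as nx
from scipy.stats import kendalltau

# Synthetic stand-in for a global BrowseGraph.
G = nx.DiGraph(nx.scale_free_graph(200, seed=1))
global_rank = nx.pagerank(G)

# A "local" view: the subgraph a single webserver would observe (hypothetical).
local_nodes = list(G.nodes)[:50]
local_rank = nx.pagerank(G.subgraph(local_nodes))

# Rank divergence between local and global scores on the shared nodes.
shared = sorted(local_rank)
tau, _ = kendalltau([global_rank[n] for n in shared],
                    [local_rank[n] for n in shared])
print(f"Kendall tau between local and global ranks: {tau:.2f}")
```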
Drill can query JSON data stored in various data sources like HDFS, HBase, and Hive. It allows running SQL queries over JSON data without requiring a fixed schema. The document describes how Drill enables ad-hoc querying of JSON-formatted Yelp business review data using SQL, providing insights faster than traditional approaches.
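For illustration, a Drill query over raw JSON can be submitted through Drill's REST API. This is a hedged sketch: the endpoint and port are Drill defaults, and the file path and columns follow the Yelp example but are assumptions about your deployment.

```python
import requests

# Drill needs no schema up front: it infers one from the JSON at query time.
sql = """
    SELECT name, stars
    FROM dfs.`/data/yelp/business.json`
    WHERE stars >= 4.5
    LIMIT 5
"""

# 8047 is Drill's default web/REST port; adjust for your cluster.
resp = requests.post(
    "http://localhost:8047/query.json",
    json={"queryType": "SQL", "query": sql},
)
print(resp.json()["rows"])
```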
A Literature Review on Vehicle Detection and Tracking in Aerial Image Sequenc... (IRJET Journal)
This document provides a literature review of 12 research papers related to vehicle detection and tracking in aerial image sequences using deep learning techniques. The papers cover a range of topics including performance metrics for multi-object tracking, fully-convolutional Siamese networks for object tracking, simple online and real-time tracking approaches, high-speed tracking without using image information, cascade R-CNN for high quality object detection, actor-critic tracking frameworks, hybrid task cascades for instance segmentation, open source detection toolboxes and benchmarks, low-cost tracking systems for small UAVs, adaptive combination kernel approaches for visual object tracking, dual-channel CNNs for image super-resolution, and enhanced hierarchical principal component analysis for saliency detection.
The document discusses implementing deep learning algorithms for object detection and scene perception in self-driving cars. It compares the YOLO and Faster R-CNN models, finding that Faster R-CNN has higher accuracy (mAP of 41.8) but lower speed (17.1 FPS), while YOLO has lower accuracy (mAP of 18.6) but higher speed (212.4 FPS). The authors conclude that achieving both high accuracy and high speed remains a goal for future work, which could explore using newer versions of YOLO or other models.
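For a feel of the Faster R-CNN side of the comparison, here is a minimal inference sketch using torchvision's pre-trained model; this is an assumption for illustration, not the implementation the document evaluates.

```python
import torch
import torchvision

# Pre-trained Faster R-CNN (torchvision >= 0.13 accepts weights="DEFAULT").
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# A dummy 3-channel image; a real self-driving pipeline would feed camera frames.
image = torch.rand(3, 480, 640)
with torch.no_grad():
    predictions = model([image])

# Each prediction holds bounding boxes, class labels, and confidence scores.
print(predictions[0]["boxes"].shape, predictions[0]["scores"][:5])
```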
With the increasing need for intelligent and autonomous systems to sense, move, and react to their surroundings, it is clearly necessary to train such systems with as much relevant data as can be obtained. However, there are many challenges in obtaining real-world data, particularly in a 3D environment. In this talk, I will cover some of the recent advances in graphics and computing techniques in 3D processing and their possible application in dynamic settings for autonomous systems. A vision of how synthetic data could be relevant to the future of intelligent systems is presented, along with the challenges. Backup material covers the latest papers on the subject.
This document provides an introduction to H2O, an open source machine learning platform, and discusses potential Internet of Things (IoT) use cases for predictive maintenance and outlier detection. The document outlines Joe Chow's background and experience, provides an overview of H2O's capabilities including algorithms, interfaces, and exporting models for production. It then demonstrates how to use H2O for predictive maintenance on a dataset of sensor readings to predict equipment failures, and for outlier detection on the MNIST handwritten digits dataset to identify anomalous images.
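A hedged sketch of the outlier-detection idea with H2O's deep learning autoencoder; the data and column names are invented, and this illustrates the approach rather than reproducing the demo from the document.

```python
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()

# Hypothetical sensor readings; the last row is an obvious anomaly.
frame = h2o.H2OFrame({
    "temp": [70, 71, 69, 120],
    "vibration": [0.1, 0.2, 0.1, 2.5],
})

# Train an autoencoder on the (mostly normal) data.
model = H2ODeepLearningEstimator(autoencoder=True, hidden=[4], epochs=20)
model.train(x=["temp", "vibration"], training_frame=frame)

# Per-row reconstruction error; large values flag likely outliers.
print(model.anomaly(frame))
```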
Graphalytics: A big data benchmark for graph-processing platforms (Graph-TA)
Graphalytics is a benchmark for evaluating graph processing platforms. It includes a diverse set of algorithms and synthetic and real-world datasets. The benchmark harness collects performance metrics across platforms and enables in-depth bottleneck analysis through Granula. Graphalytics aims to enable fair comparison of different graph systems and help identify areas for improvement through a modern software development process.
The document discusses several visualization tools created by Claudio Squarcella to help understand and analyze Internet data. Caidagram uses geographic maps to visualize the locations of Internet measurement data collected from nodes like RIPE Atlas probes. VisualK monitors the performance of the K-root anycast network in real-time, showing traffic patterns between its instances. BGPlay animates routing graphs to visualize interdomain routing activity for a given prefix over time based on data from sources like RIPE RIS. The tools use technologies like JavaScript, SVG, and Google Web Toolkit to create interactive web applications for exploring the data. Future work may include integrating Atlas data into the visualizations and adding new features to BGPlay.
Real-Time Anomaly Detection with Spark MLlib, Akka and Cassandra (Natalino Busa)
We present a solution for streaming anomaly detection, named “Coral”, based on Spark, Akka and Cassandra. In the system presented, we use Spark to run the data analytics pipeline for anomaly detection. By running Spark on the latest events and data, we make sure that the model is always up-to-date and that the amount of false positives is kept low, even under changing trends and conditions. Our machine learning pipeline uses Spark decision tree ensembles and k-means clustering. Once the model is trained by Spark, the model’s parameters are pushed to the Streaming Event Processing Layer, implemented in Akka. The Akka layer then scores thousands of events per second against the last model provided by Spark. Spark and Akka communicate with each other using Cassandra as a low-latency data store. By doing so, we make sure that every element of this solution is resilient and distributed. Spark performs micro-batches to keep the model up-to-date while Akka detects new anomalies by using the latest Spark-generated data model. The project is currently hosted on GitHub. Have a look at: http://coral-streaming.github.io
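A minimal sketch of the model-training half of such a pipeline with Spark's k-means (invented toy features; the Akka scoring layer and the Cassandra hand-off are out of scope here):

```python
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kmeans-anomaly-sketch").getOrCreate()

# Hypothetical event features; the last row is an obvious outlier.
df = spark.createDataFrame(
    [(1.0, 2.0), (1.1, 1.9), (0.9, 2.1), (8.0, 8.0)], ["f1", "f2"]
)
features = VectorAssembler(
    inputCols=["f1", "f2"], outputCol="features"
).transform(df)

# Train k-means; the fitted centers would be pushed to the scoring layer,
# which flags events far from their nearest center as anomalies.
model = KMeans(k=2, seed=42).fit(features)
print(model.clusterCenters())
```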
Bayesian Network Modeling using Python and R (PyData)
This document discusses Bayesian network modeling using Python and R. It begins with an introduction to Bayesian networks and their applications. It then outlines the main Bayesian network packages available in Python like scikit-learn, BayesPy, Bayes Blocks, and PyMC, and in R like bnlearn and RStan. It covers the basics of Bayes' theorem and how Bayesian networks represent probabilistic relationships between variables as a directed acyclic graph. The talk concludes with discussing algorithms for learning Bayesian networks from data and evaluating model performance.
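Since all of these packages build on the same rule, a tiny worked example of Bayes' theorem in plain Python may help; the numbers are made up for illustration.

```python
# P(D): prior probability of a condition; P(+|D): test sensitivity;
# P(+|not D): false-positive rate. All values are invented.
p_d = 0.01
p_pos_d = 0.95
p_pos_not_d = 0.05

# Total probability of a positive test, then Bayes' rule.
p_pos = p_pos_d * p_d + p_pos_not_d * (1 - p_d)
p_d_pos = p_pos_d * p_d / p_pos
print(f"P(D | +) = {p_d_pos:.3f}")  # ~0.161: a positive test is far from certain
```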
Technology Circle 11: a research paper titled A Systematic Mapping Study for Big Data Str... (Adel Sabour)
The document summarizes the results of a systematic mapping study on big data stream processing frameworks. It examines 91 studies published between 2010-2015. The study addressed 9 research questions, including the types of contributions made by the papers, research methods used, experimentation types for different frameworks, most used data ingestion tools, and preferred number of nodes in experiments. The results provided breakdowns of findings for various frameworks like Spark, Storm, Flink, and InfoSphere across the different research questions.
Enhancing Traffic Prediction with Historical Data and Estimated Time of Arrival (IRJET Journal)
This document proposes a methodology to enhance traffic prediction accuracy by combining historical traffic data, real-time traffic updates, and estimated time of arrival (ETA) information. The methodology utilizes machine learning techniques, ARIMA modeling, nonparametric methods, and deep neural networks to analyze the data. While the methodology lays out a framework for collecting raw traffic congestion data from online maps and transportation departments, the research focuses on establishing a theoretical model rather than conducting empirical experiments. The goal is to develop a comprehensive solution for traffic prediction by leveraging different data sources and analytical techniques.
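As a concrete illustration of the ARIMA piece of that framework (synthetic data standing in for the congestion feeds the paper describes):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic hourly congestion index; a real pipeline would ingest map/DOT feeds.
rng = np.random.default_rng(0)
series = np.cumsum(rng.normal(size=200)) + 50.0

# Fit ARIMA(2,1,1) and forecast the next 24 hours of congestion.
model = ARIMA(series, order=(2, 1, 1)).fit()
forecast = model.forecast(steps=24)
print(forecast[:5])
```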
Spark is an open source cluster computing framework originally developed at UC Berkeley. Intel has made many contributions to Spark's development through code commits, patches, and collaborating with the Spark community. Spark is widely used by companies like Alibaba, Baidu, and Youku for large-scale data analytics and machine learning tasks. It allows for faster iterative jobs than Hadoop through its in-memory computing model and supports multiple workloads including streaming, SQL, and graph processing.
This document describes an interactive batch query system for game analytics based on Apache Drill. It addresses the problem of answering common ad-hoc queries over large volumes of log data by using a columnar data model and optimizing query plans. The system utilizes Drill's schema-free data model and vectorized query processing. It further improves performance by merging similar queries, reusing intermediate results, and pushing execution downwards to utilize multi-core CPUs. This provides a unified solution for both ad-hoc and scheduled batch analytics workloads at large scale.
刘诚忠: Running Cloudera Impala on PostgreSQL (hdhappy001)
This document summarizes a presentation about running Cloudera Impala on PostgreSQL to enable SQL queries on large datasets. Key points:
- The company processes 3 billion daily ad impressions and 20TB of daily report data, requiring a scalable SQL solution.
- Impala was chosen for its fast performance from in-memory processing and code generation. The architecture runs Impala coordinators and executors across clusters.
- The author hacked Impala to also scan data from PostgreSQL for mixed workloads. This involved adding new scan node types and metadata.
- Tests on a 150 million row dataset showed Impala with PostgreSQL achieving 20 million rows scanned per second per core.
This document discusses big data in the cloud and provides an overview of YARN. It begins with introducing the speaker and their experience with VMware and Apache Hadoop. The rest of the document covers: 1) trends in big data like the rise of YARN, faster query engines, and focus on enterprise capabilities, 2) how YARN addresses limitations of MapReduce by splitting responsibilities, 3) how YARN serves as a hub for various big data applications, and 4) how YARN can integrate with cloud infrastructure for elastic resource management between the two frameworks. The document advocates for open source contribution to help advance big data technologies.
Raghu Nambiar: Industry standard benchmarks (hdhappy001)
Industry standard benchmarks have played a crucial role in advancing the computing industry by enabling healthy competition that drives product improvements and new technologies. Major benchmarking organizations like TPC, SPEC, and SPC have developed numerous benchmarks over time to keep up with industry needs. Looking ahead, new benchmarks are needed to address emerging technologies like cloud, big data, and the internet of things. International conferences and workshops bring together experts to collaborate on developing these new, relevant benchmarks.
Fueling AI with Great Data with Airbyte Webinar (Zilliz)
This talk will focus on collecting data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to production.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
How to Get CNIC Information System with Paksim Ga.pptx (danishmna97)
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days, 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
CAKE: Sharing Slices of Confidential Data on Blockchain (Claudio Di Ciccio)
Presented at the CAiSE 2024 Forum, Intelligent Information Systems, June 6th, Limassol, Cyprus.
Synopsis: Cooperative information systems typically involve various entities in a collaborative process within a distributed environment. Blockchain technology offers a mechanism for automating such processes, even when only partial trust exists among participants. The data stored on the blockchain is replicated across all nodes in the network, ensuring accessibility to all participants. While this aspect facilitates traceability, integrity, and persistence, it poses challenges for adopting public blockchains in enterprise settings due to confidentiality issues. In this paper, we present a software tool named Control Access via Key Encryption (CAKE), designed to ensure data confidentiality in scenarios involving public blockchains. After outlining its core components and functionalities, we showcase the application of CAKE in the context of a real-world cyber-security project within the logistics domain.
Paper: https://doi.org/10.1007/978-3-031-61000-4_16
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf (Techgropse Pvt. Ltd.)
In this blog post, we'll delve into the intersection of AI and app development in Saudi Arabia, focusing on the food delivery sector. We'll explore how AI is revolutionizing the way Saudi consumers order food, how restaurants manage their operations, and how delivery partners navigate the bustling streets of cities like Riyadh, Jeddah, and Dammam. Through real-world case studies, we'll showcase how leading Saudi food delivery apps are leveraging AI to redefine convenience, personalization, and efficiency.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar, with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered:
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
Taking AI to the Next Level in Manufacturing.pdf (ssuserfac0301)
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
9. BDTC - Beijing, 2013-12-6
Large Volume Visualization
! Level of Detail
! Out-of-Core
! Parallel Visualization
10. BDTC - Beijing, 2013-12-6
Top 10 Challenges in Extreme-Scale Data Visual Analytics
Pak Chung Wong (PNNL)
Han-Wei Shen (OSU)
Chris Johnson (Utah)
Chaomei Chen (Drexel)
Robert Ross (Argonne)
11. BDTC - Beijing, 2013-12-6
Top 10 Challenges in Extreme-Scale Data Visual Analytics
! In Situ Analysis
! Perform as much analysis as possible while the data are still in memory
! Interaction and User Interfaces
! Machine-based automated systems vs. human cognition
! Large-Data Visualization
! Data projection and dimension reduction, display technology
! Databases and Storage
! A cloud-based solution might not meet the needs
! Algorithms
! Address both data-size and visual-efficiency issues
12. BDTC - Beijing, 2013-12-6
Top 10 Challenges in Extreme-Scale Data Visual Analytics
! Data Movement/Transport and Network Infrastructure
! Efficiently use networking resources and provide convenient abstractions
! Uncertainty Quantification
! Cope with incomplete data
! Parallelism
! Domain and Development Libraries, Frameworks, and Tools
! Affordable resource libraries, frameworks, and tools
! Social, Community, and Government Engagements
13. BDTC - Beijing, 2013-12-6
Challenges in Big Data Visualization/Visual Analytics - 1
! Integrating heterogeneous data from different sources and scales
20. BDTC - Beijing, 2013-12-6
Preprocessing: Map Matching
[Figure: pipeline in which raw taxi GPS data and the raw road network are cleaned into processed GPS data and a processed road network; map matching then yields GPS trajectories matched to the road network]
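Map matching is the step that snaps noisy GPS fixes onto the road network. As a rough, hypothetical illustration (not the preprocessing actually used for the Beijing taxi data), a nearest-segment snap might look like this; real systems also use heading, speed, and trajectory topology:

```python
import numpy as np

# Hypothetical inputs: GPS fixes and road-segment midpoints as (lon, lat).
rng = np.random.default_rng(0)
gps_points = rng.uniform(0.0, 1.0, size=(1000, 2))
road_midpoints = rng.uniform(0.0, 1.0, size=(500, 2))

# Naive map matching: snap every GPS fix to the nearest road segment.
# Production matchers also exploit heading, speed, and road topology
# (e.g., HMM-based matching); this only shows the basic idea.
dists = np.linalg.norm(
    gps_points[:, None, :] - road_midpoints[None, :, :], axis=2
)
matched_segment = dists.argmin(axis=1)  # nearest-segment index per fix
```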
22. BDTC - Beijing, 2013-12-6
Visual Interface: Single Road Level
! Pixel-based visualization (see the sketch below)
! Time of day: 144 columns (one per 10-minute bin)
! Days: 24 rows (one per day)
! Each cell represents one time bin
! Color encodes speed
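A minimal sketch of such a pixel matrix, assuming a hypothetical speeds array of shape (days, 144); in the real interface each cell is the 10-minute average speed on one road:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: average speed (km/h) per cell, 24 days x 144 bins.
rng = np.random.default_rng(1)
speeds = rng.uniform(10.0, 60.0, size=(24, 144))

fig, ax = plt.subplots(figsize=(10, 3))
im = ax.imshow(speeds, aspect="auto", cmap="RdYlGn")  # red = slow, green = fast
ax.set_xlabel("Time of day (10-minute bins)")
ax.set_ylabel("Day")
fig.colorbar(im, ax=ax, label="Speed (km/h)")
plt.show()
```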
23. BDTC - Beijing, 2013-12-6
Case Study: Road-Level Exploration and Analysis
! Different road congestion patterns
24. BDTC - Beijing, 2013-12-6
Case Study: Road-Level Exploration and Analysis
25. BDTC - Beijing, 2013-12-6
Propagation Graph Analysis
! Spatio-temporal information of one propagation: the spatial path and the temporal delay between congested roads, with large delays standing out (see the sketch below)
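A propagation graph can be sketched as a directed graph whose nodes are road segments and whose edges carry the measured propagation delay; the segment names and delays below are hypothetical:

```python
import networkx as nx

# Hypothetical propagation graph: an edge u -> v means congestion on u
# propagated to v after `delay` minutes.
G = nx.DiGraph()
G.add_edge("road_A", "road_B", delay=5)
G.add_edge("road_B", "road_C", delay=12)
G.add_edge("road_B", "road_D", delay=30)  # a large delay worth flagging

# Spatial path of one propagation and its accumulated temporal delay.
path = nx.shortest_path(G, "road_A", "road_C")
total_delay = sum(G[u][v]["delay"] for u, v in zip(path, path[1:]))
print(path, total_delay)  # ['road_A', 'road_B', 'road_C'] 17
```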
26. BDTC - Beijing, 2013-12-6
Propagation Pattern Exploration
! Propagation graphs for one region in the morning of different days
40. BDTC - Beijing, 2013-12-6
Challenges in Big Data Visualization/Visual Analytics - 2
! Integrating heterogeneous data from different sources and scales
! Scalability in Data/Task complexity
! Inherent data properties impose more computational challenges on methods for visualization and visual analysis of big data
43. BDTC - Beijing, 2013-12-6
Multivariate to Multi-Run Visual Analysis
[Figure: the same variables (QVAPOR, QCLOUD, Pressure, Speed) shown first for a single run (multivariate analysis) and then across Runs 1-3 (ensemble runs)]
44. BDTC - Beijing, 2013-12-6
Eulerian and Lagrangian Specifications
! Eulerian: the flow as a velocity field sampled on the grid
! Lagrangian: the flow as particle trajectories (pathlines)
! Relationship between the two specifications: the flow map (standard forms below)
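The slide's formulas did not survive extraction; the standard forms they presumably correspond to are:

```latex
% Standard Eulerian/Lagrangian flow specifications (assumed; the slide's
% own equations were images that did not survive extraction).
\[
\text{Eulerian: } \; v : \mathbb{R}^d \times \mathbb{R} \to \mathbb{R}^d,
\qquad v(x, t)
\]
\[
\text{Lagrangian: } \; \phi(t; t_0, x_0), \qquad \phi(t_0; t_0, x_0) = x_0
\]
\[
\text{Flow map relation: } \;
\frac{\partial}{\partial t}\,\phi(t; t_0, x_0) = v\big(\phi(t; t_0, x_0),\, t\big)
\]
```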
45. BDTC - Beijing, 2013-12-6
Eulerian-based Attribute Space Projection (EASP)
! Samples on the data grid → samples in attribute space → Eulerian-based Attribute Space Projection (EASP)
46. BDTC - Beijing, 2013-12-6
Lagrangian-based Attribute Space Projection (LASP)
! Pathlines on the data grid → pathlines in attribute space → Lagrangian-based Attribute Space Projection (LASP)
! Both the multivariate scalar fields and the vector field are considered (see the sketch below)
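A minimal sketch of an attribute-space projection, using PCA as an assumed stand-in for the projection method; the sample matrix and its sizes are hypothetical:

```python
import numpy as np

# Hypothetical input: one row per pathline (or grid sample), one column per
# attribute (e.g., QVAPOR, QCLOUD, Pressure, Speed) after aggregation.
rng = np.random.default_rng(2)
attributes = rng.normal(size=(5000, 4))

# Project the 4-D attribute samples to 2-D with PCA via the SVD.
# (PCA is an assumed choice; the talk does not fix the projection method.)
centered = attributes - attributes.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
projected = centered @ vt[:2].T  # shape (5000, 2), ready for a scatter plot
```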
48. BDTC - Beijing, 2013-12-6
Coupled Ensemble Flow Line Advection and Analysis (eFLAA) - Concept
! Ensemble data (large)
! Field line data (much larger than the ensemble data)
! Variation field (small; see the toy sketch below)
! Filtered lines (even smaller)
[Guo, Yuan, Huang and Zhu, TVCG 2013 (SciVis '13)]
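A toy sketch of the size cascade under assumed array shapes and an arbitrary filtering threshold; the paper's actual variation measure and filter differ:

```python
import numpy as np

# Hypothetical ensemble pathlines: (runs, lines, steps, xyz) - toy sizes.
rng = np.random.default_rng(3)
pathlines = rng.normal(size=(3, 200, 50, 3))

# Variation field: how much the ensemble runs disagree, per field line.
variation = pathlines.std(axis=0).mean(axis=(1, 2))  # one score per line

# Filtered lines: keep only the most variable lines (an even smaller set),
# mirroring the size cascade on the slide. The 90th percentile is arbitrary.
keep = np.where(variation > np.percentile(variation, 90))[0]
```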
53. BDTC - Beijing, 2013-12-6
GEOS-5 Simulation: CO2-based Metric
! The metric: the differences in location / CO2 concentration along the pathline
! Findings
! The variation of the wind field is high in the northern hemisphere
! However, the CO2 difference is higher in the southern hemisphere and some places in the north
! CO2 concentration is not sensitive to wind in the above regions
54. BDTC - Beijing, 2013-12-6
Challenges in Big Data Visualization/Visual Analytics - 3
! Integrating heterogeneous data from different sources and scales
! Scalability in Data/Task complexity
! Inherent data properties impose more computational challenges on methods for visualization and visual analysis of big data
! Limited interaction capabilities for large data
57. BDTC - Beijing, 2013-12-6
Real-time Visual Querying of Big Data
! imMens: keeps interaction real-time by precomputing multivariate data tiles and aggregating/rendering them on the GPU
58. BDTC - Beijing, 2013-12-6
Real-time Visual Querying of Big Data
59. BDTC - Beijing, 2013-12-6
Nanocubes for Real-Time Exploration of Spatiotemporal Datasets
! Nanocubes: an in-memory structure storing spatiotemporal aggregations at multiple levels of detail, so exploratory queries return at interactive rates (see the sketch below)
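The common idea behind both systems is to precompute binned aggregates so that interaction touches bins rather than raw records. A minimal sketch of that idea with hypothetical event data (not either system's actual data structure):

```python
import numpy as np

# Hypothetical event records: (longitude, latitude, hour-of-day).
rng = np.random.default_rng(4)
lon = rng.uniform(116.2, 116.6, size=1_000_000)
lat = rng.uniform(39.8, 40.1, size=1_000_000)
hour = rng.integers(0, 24, size=1_000_000).astype(float)

# Precompute a 3-D count cube once; interactive queries then become cheap
# slices and sums over bins instead of rescans of the raw records.
cube, _ = np.histogramdd(
    np.column_stack([lon, lat, hour]),
    bins=(64, 64, 24),
    range=((116.2, 116.6), (39.8, 40.1), (0, 24)),
)

# Example "query": the spatial heatmap for the 8-9 am bin.
morning_heatmap = cube[:, :, 8]
```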
60. BDTC - Beijing, 2013-12-6
Challenges in Big Data Visualization/Visual Analytics - 4
! Integrating heterogeneous data from different sources and scales
! Scalability in Data/Task complexity
! Inherent data properties impose more computational challenges on methods for visualization and visual analysis of big data
! Limited interaction capabilities for large data
! Scalability in Users
! Collaborative visualization and analysis on large data
! Can scientists create novel visualizations without programming?
61. BDTC - Beijing, 2013-12-6
Double Gulf
[Figure: the "double gulf" diagram connecting the Visualization Designer and the Visualization User; labels include Data, Conceptual Model, Visualization, Representation, Manipulation, Execution, and Evaluation]
63. BDTC - Beijing, 2013-12-6
From Data to User
[Figure: variant of the diagram showing the path from data to user via the designer's Representation and Manipulation and the user's Execution and Evaluation]
64. BDTC - Beijing, 2013-12-6
Scalability In Users
[Figure: the same "double gulf" diagram, here framing the user-scalability challenge]
74. BDTC - Beijing, 2013-12-6
Challenges in Big Data Visualization/Visual Analytics - 5
! Integrating heterogeneous data from different sources and scales
! Scalability in Data/Task complexity
! Limited interaction capabilities for large data
! Scalability in Users
! System Development
! Domain and Development Libraries, Frameworks, and Tools
! Social, Community, and Government Engagements
75. BDTC - Beijing, 2013-12-6
SCIVIS Visualization Systems
! VisIt - LLNL
https://wci.llnl.gov/codes/visit
! ParaView - Kitware/SNL/LANL
http://www.paraview.org
! IceT (Image Composition Engine for Tiles) - Sandia
http://icet.sandia.gov
! Dax Toolkit - Data Analysis at Extreme Scale
http://www.daxtoolkit.org
! PISTON - Portable Data-Parallel Visualization and Analysis Library - LANL
http://viz.lanl.gov/projects/PISTON.html
76. BDTC - Beijing, 2013-12-6
VisIt
! Production end-user tool supporting scientific and engineering applications
! Parallel post-processing that scales from desktops to massive HPC clusters
77. BDTC - Beijing, 2013-12-6
Development of VisIt
! The VisIt project started in 2000 to support LLNL's large-scale ASC physics codes
! Supported by multiple organizations: LLNL, LBNL, ORNL, UC Davis, Univ. of Utah, …
! Over 75 person-years of effort
! 1.5+ million lines of code
Based on SC'11 Tutorial
79. BDTC - Beijing, 2013-12-6
VTK
W.J. Schroeder, K. Martin, and W. Lorensen, The Visualization Toolkit: An Object-Oriented Approach to 3D Graphics, Third Edition, Kitware, Inc., ISBN 1-930934-12-2 (2004).
S.E. Rogers, D. Kwak, and U.K. Kaul, A numerical study of three-dimensional incompressible flow around multiple posts. In Proceedings of AIAA Aerospace Sciences Conference, AIAA Paper 86-0353, Reno, Nevada, 1986.
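To make VTK's object-oriented pipeline model concrete, here is a minimal, self-contained Python example (not from the slides) that renders a cone through the classic source → mapper → actor → renderer chain:

```python
import vtk

# Classic VTK pipeline: source -> mapper -> actor -> renderer -> window.
cone = vtk.vtkConeSource()

mapper = vtk.vtkPolyDataMapper()
mapper.SetInputConnection(cone.GetOutputPort())

actor = vtk.vtkActor()
actor.SetMapper(mapper)

renderer = vtk.vtkRenderer()
renderer.AddActor(actor)

window = vtk.vtkRenderWindow()
window.AddRenderer(renderer)

interactor = vtk.vtkRenderWindowInteractor()
interactor.SetRenderWindow(window)

window.Render()
interactor.Start()
```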
80. BDTC - Beijing, 2013-12-6
ParaView
! 2000: Los Alamos National Laboratory and Kitware Inc.
! 2005: Sandia National Laboratories and Kitware Inc.
! Used by academic, government, and commercial institutions worldwide
! Downloaded ~100K times per year
83. BDTC - Beijing, 2013-12-6
Starlight Information Visualization System
84. BDTC - Beijing, 2013-12-6
Build a successful vis system
! System Design
! Domain User – Visualization Scientist “Co-design”
! Stable Development Team
! Funding Mechanism
87. BDTC - Beijing, 2013-12-6
Challenges in Big Data Visualization/Visual Analytics - 6
! Integrating heterogeneous data from different sources and scales
! Scalability in Data/Task complexity
! Limited interaction capabilities for large data
! Scalability in Users
! System Development
! Visualization Experts
90. BDTC - Beijing, 2013-12-6
Social, Community, and Government Engagements
! Universities
! University of Tennessee in Knoxville
! Ohio State University
! SCI Institute, University of Utah
! University of California, Davis
! University of California, San Diego
! University of Nebraska-Lincoln
! Michigan Technological University
! Drexel University
! Supercomputer centers
! San Diego Supercomputer Center (SDSC)
! Texas Advanced Computing Center (TACC)
! National Center for Supercomputing Applications at the University of Illinois (NCSA)
! DoE Labs
! Argonne National Laboratory (ANL)
! Lawrence Berkeley National Laboratory (LBNL)
! Lawrence Livermore National Laboratory (LLNL)
! Los Alamos National Laboratory (LANL)
! Pacific Northwest National Laboratory (PNNL)
! Oak Ridge National Laboratory (ORNL)
! Sandia National Laboratories (SNL)
! National Renewable Energy Laboratory (NREL)
! Companies
! Kitware
91. BDTC - Beijing, 2013-12-6
Good News
! More and more universities have started visualization research programs
! Many companies are aware of the importance of visualization
! Still, there is a lack of national infrastructure