Xiaoru Yuan: Opportunities and Challenges of Visualization and Visual Analytics in the Big Data Era
BDTC 2013 Beijing China

Presentation Transcript

  • BDTC - Beijing, 2013-12-6
  • Big Data Visualization and Visual Analysis: The Challenges and Opportunities
  • Visualization
  • Visualization (mental image, mental model). Diagram labels: Data, Visualization, Image, Mental Model, Insights
  • From Data to Visualization
  • Visualization != Infographics
  • Large Volume Visualization
    • Level of Detail
    • Out of Core
    • Parallel Visualization
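As a rough illustration of the level-of-detail idea named on this slide, the sketch below builds a resolution pyramid over a 1D array by averaging neighboring samples, so a viewer can pick a coarse level for overviews and the full level for close-ups. The function name `build_lod_pyramid` and the pairwise-averaging scheme are assumptions chosen for brevity; the talk does not say which scheme it has in mind.

```python
def build_lod_pyramid(samples):
    """Return a list of levels, from full resolution down to a single value.

    Each coarser level averages adjacent pairs of the level before it
    (a trailing unpaired sample is kept as its own average).
    This is an illustrative scheme, not the one used in the talk.
    """
    levels = [list(samples)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        coarser = []
        for i in range(0, len(prev), 2):
            pair = prev[i:i + 2]
            coarser.append(sum(pair) / len(pair))
        levels.append(coarser)
    return levels
```

A renderer would then choose the level whose sample count best matches the number of pixels available, instead of touching every raw sample.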
  • Top 10 Challenges in Extreme-Scale Data Visual Analytics: Pak Chung Wong (PNNL), Han-Wei Shen (OSU), Chris Johnson (Utah), Chaomei Chen (Drexel), Robert Ross (Argonne)
  • Top 10 Challenges in Extreme-Scale Data Visual Analytics
    • In Situ Analysis
      • Perform as much analysis as possible while the data are still in memory
    • Interaction and User Interfaces
      • Machine-based automated systems vs. human cognition
    • Large-Data Visualization
      • Data projection and dimension reduction, display technology
    • Databases and Storage
      • A cloud-based solution might not meet the needs
    • Algorithms
      • Address both data-size and visual-efficiency issues
  • Top 10 Challenges in Extreme-Scale Data Visual Analytics (continued)
    • Data Movement/Transport, and Network Infrastructure
      • Efficiently use networking resources and provide convenient abstractions
    • Uncertainty Quantification
      • Cope with incomplete data
    • Parallelism
    • Domain and Development Libraries, Frameworks, and Tools
      • Affordable resource libraries, frameworks, and tools
    • Social, Community, and Government Engagements
  • Challenges in Big Data Visualization/Visual Analytics - 1
    • Integrating heterogeneous data from different resources and scales
  • Beijing Taxi GPS Data
  • Data
    • Beijing taxi GPS data
      • Size: 34.5 GB
      • Number of taxis: 28,519
      • Number of sampling points: 379,107,927
      • Time range: 2009/03/02 to 2009/03/25 (24 days, but the 03/18 data is missing)
      • Sampling rate: one point every 30 seconds (but 60% of the data is missing)
    • Beijing road network (from OpenStreetMap)
      • Size: 40.9 MB
      • 169,171 nodes and 35,422 ways
  • Traffic Jam Detection
    • Pipeline: raw taxi GPS data and raw road network → cleaned GPS data and processed road network → GPS trajectories matched to the road network → road speed data → traffic jam detection → traffic jam event data
    • Defining propagation based on spatial/temporal relationship: e0 happens before e1, and on a dWay following e1
    • Example road speed data: readings at 9:10 am, 9:20 am, 9:30 am, and 9:40 am, with speeds between 10 km/h and 55 km/h
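The detection and propagation steps on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 20 km/h threshold, the `max_gap_min` tolerance, the `JamEvent` type, and the `successors` road-adjacency dictionary are all hypothetical. Only the rule "e0 happens before e1, and on a way following e1" comes from the slide.

```python
from dataclasses import dataclass

JAM_SPEED_KMH = 20.0  # hypothetical threshold; the slide does not state one


@dataclass(frozen=True)
class JamEvent:
    road: str
    start: int  # minute of day of the first slow time bin
    end: int    # minute of day of the last slow time bin


def detect_jams(road, speeds, threshold=JAM_SPEED_KMH):
    """Merge consecutive slow time bins on one road into jam events.

    speeds: list of (minute_of_day, speed_kmh), sorted by time.
    """
    events = []
    start = last_slow = None
    for minute, speed in speeds:
        if speed < threshold:
            if start is None:
                start = minute
            last_slow = minute
        elif start is not None:
            events.append(JamEvent(road, start, last_slow))
            start = None
    if start is not None:
        events.append(JamEvent(road, start, last_slow))
    return events


def propagates(e0, e1, successors, max_gap_min=10):
    """True if jam e0 may have propagated to jam e1.

    Slide rule: e0 happens before e1, and e0 lies on a way following e1's way.
    successors maps a road to the set of roads that directly follow it in
    driving direction. The max_gap_min closeness test is an added assumption.
    """
    happens_before = e0.start < e1.start <= e0.end + max_gap_min
    on_following_way = e0.road in successors.get(e1.road, set())
    return happens_before and on_following_way
```

With this rule a jam spreads upstream: congestion on the road ahead (e0) appears first, then the road feeding into it (e1) slows down.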
  • Visual Interface
    • Data: road speed data, traffic jam event data, traffic jam propagation graphs
    • Road segment level exploration and analysis: road of interest, one propagation graph
    • Propagation graph level exploration: propagation graph list, spatial density, time and size distribution, propagation graphs of interest
    • Dynamic query: spatial filter, temporal & size filter, topological filter
    • Topological clustering
  • Preprocessing: Map Matching
    • Raw taxi GPS data → cleaned GPS data
    • Raw road network → processed road network
    • Map matching → GPS trajectories matched to the road network
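The slide does not say which map-matching method was used. As a minimal sketch of the general idea, the code below snaps a GPS point to the nearest road segment within a distance limit, using planar coordinates. The function names, the `segments` dictionary layout, and the `max_dist` cutoff are assumptions; production map matching typically also uses heading, road topology, and the trajectory's history.

```python
import math


def project_to_segment(p, a, b):
    """Return (distance, closest_point) from point p to segment a-b (planar)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        t = 0.0  # degenerate segment: a and b coincide
    else:
        # Clamp the projection parameter so the result stays on the segment.
        t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy), (cx, cy)


def match_point(p, segments, max_dist):
    """Snap p to the nearest segment within max_dist.

    segments: dict mapping a segment id to its two endpoints (a, b).
    Returns (segment_id, snapped_point), or None if no segment is close enough.
    """
    best = None
    for seg_id, (a, b) in segments.items():
        dist, snapped = project_to_segment(p, a, b)
        if dist <= max_dist and (best is None or dist < best[0]):
            best = (dist, seg_id, snapped)
    return None if best is None else (best[1], best[2])
```

Points that match no segment (returning `None`) would be candidates for removal in the cleaning step.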
  • Visual Interface: Single Road Level
    • Pixel-based visualization
      • Time of day: 144 columns (one per 10-minute bin)
      • Days: 24 rows (one per day)
      • Each cell represents one time bin
      • Color encodes speed
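The 24 × 144 layout described here can be sketched as a binning step that turns one road's speed samples into a matrix of mean speeds, which a color map would then render cell by cell. The function name `speed_matrix` and the input tuple layout are assumptions for illustration; empty bins are returned as `None` so that missing data can be drawn differently from slow traffic.

```python
def speed_matrix(samples, days=24, bins_per_day=144):
    """Bin speed samples for one road into a days x bins_per_day matrix.

    samples: iterable of (day_index, minute_of_day, speed_kmh).
    Returns a list of rows (one per day); each cell holds the mean speed of
    its 10-minute bin (with the defaults), or None if the bin has no samples.
    """
    bin_minutes = 1440 // bins_per_day  # 10 minutes for 144 bins
    sums = [[0.0] * bins_per_day for _ in range(days)]
    counts = [[0] * bins_per_day for _ in range(days)]
    for day, minute, speed in samples:
        col = minute // bin_minutes
        sums[day][col] += speed
        counts[day][col] += 1
    return [
        [sums[d][c] / counts[d][c] if counts[d][c] else None
         for c in range(bins_per_day)]
        for d in range(days)
    ]
```

Reading across a row shows one day's congestion profile; reading down a column compares the same time of day across days.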
  • Case Study: Road Level Exploration and Analysis
    • Different road congestion patterns
  • Propagation Graph Analysis
    • Spatial-temporal information of one propagation (figure labels: spatial path, temporal delay, large delay)
  • Propagation Pattern Exploration
    • Propagation graphs for one region in the morning of different days
  • Weibo ThemeMap
  • Xiamen Traffic
  • Challenges in Big Data Visualization/Visual Analytics - 2
    • Integrating heterogeneous data from different resources and scales
    • Scalability in data/task complexity
      • Inherent data properties impose further computational challenges on methods for visualization and visual analysis of big data
  • Pollution from China
  • Pollution from USA
  • Multivariate to Multi-Run Visual Analysis
    • Multivariate: QVAPOR, QCLOUD, Pressure, Speed
    • Ensemble runs: Run 1, Run 2, Run 3, each with QVAPOR, QCLOUD, Pressure, and Speed
  • Eulerian and Lagrangian Specifications
    • Eulerian
    • Lagrangian
    • Relationship between the two specifications (flow map)
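For reference, the two specifications are commonly written as below in standard fluid-mechanics notation. This is an assumed textbook form, not necessarily the notation used on the slide: $\mathbf{v}$ is the velocity field, $\mathbf{x}_0$ is a particle's seed position at time $t_0$, and $\boldsymbol{\phi}$ is the flow map.

```latex
\begin{align}
\text{Eulerian:}\quad
  & \mathbf{v} = \mathbf{v}(\mathbf{x}, t) \\
\text{Lagrangian:}\quad
  & \mathbf{x}(t) = \boldsymbol{\phi}_{t_0}^{t}(\mathbf{x}_0) \\
\text{Flow map:}\quad
  & \frac{\partial}{\partial t}\,\boldsymbol{\phi}_{t_0}^{t}(\mathbf{x}_0)
    = \mathbf{v}\bigl(\boldsymbol{\phi}_{t_0}^{t}(\mathbf{x}_0),\, t\bigr),
  \qquad \boldsymbol{\phi}_{t_0}^{t_0}(\mathbf{x}_0) = \mathbf{x}_0
\end{align}
```

The Eulerian view records the field at fixed locations, while the Lagrangian view follows individual particles; integrating the Eulerian velocity along a particle's path yields the flow map that connects the two.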
  • Eulerian-based Attribute Space Projection
    • Samples on the data grid
    • Samples in attribute space
    • Eulerian-based Attribute Space Projection (EASP)
  • Lagrangian-based Attribute Space Projection
    • Pathlines on the data grid
    • Pathlines in attribute space
    • Lagrangian-based Attribute Space Projection (LASP)
    • Both multivariate scalar fields and the vector field are considered
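The step from "pathlines on the data grid" to "pathlines in attribute space" can be sketched as: trace a pathline through the vector field, then sample the scalar attributes along it. The code below uses fourth-order Runge-Kutta integration on an analytic 2D field as a stand-in for gridded simulation data. The function names, the rotational test field, and the single attribute are assumptions, and the final projection of attribute vectors to 2D (the part that gives LASP its name) is not shown.

```python
import math


def trace_pathline(velocity, seed, t0, dt, steps):
    """Integrate a pathline with classical RK4.

    velocity(x, y, t) -> (u, v). Returns a list of (x, y, t) samples.
    """
    x, y, t = seed[0], seed[1], t0
    path = [(x, y, t)]
    for _ in range(steps):
        k1 = velocity(x, y, t)
        k2 = velocity(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1], t + 0.5 * dt)
        k3 = velocity(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1], t + 0.5 * dt)
        k4 = velocity(x + dt * k3[0], y + dt * k3[1], t + dt)
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        t += dt
        path.append((x, y, t))
    return path


def to_attribute_space(path, attributes):
    """Sample each attribute function f(x, y, t) along the pathline."""
    return [tuple(f(x, y, t) for f in attributes) for x, y, t in path]


# Hypothetical test field: rigid rotation about the origin.
def rotation(x, y, t):
    return (-y, x)


# A quarter turn starting at (1, 0) should end close to (0, 1).
path = trace_pathline(rotation, (1.0, 0.0), 0.0, math.pi / 200, 100)
# One illustrative attribute: squared distance from the origin.
attrs = to_attribute_space(path, [lambda x, y, t: x * x + y * y])
```

Each pathline becomes a curve in attribute space, so particles that experience similar attribute histories end up with similar curves even when they are far apart spatially.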
  • Case: GEOS-5 Simulation
  • Coupled Ensemble Flow Line Advection and Analysis (eFLAA): Concept
    • Ensemble data (large)
    • Field line data (much larger than the ensemble data)
    • Variation field (small)
    • Filtered lines (even smaller)
    • [Guo, Yuan, Huang and Zhu, TVCG 2013 (SciVis '13)]
  • Benchmark Platform: NCSSJN
    • ShenWei-based supercomputer
      • SW1600 processor, 1.0~1.1 GHz
      • 1 GB memory for each core
      • 40 Gbps high-speed interconnection
    • x86-based supercomputer
      • Intel Xeon X5675 hexa-core processor, 3.06 GHz
      • 4 GB memory for each core
      • QDR InfiniBand interconnection
    • Shared global filesystem: SWGFS
  • Scalability
    • Strong scalability test at the National Supercomputer Center in Jinan (ShenWei and x86 architectures)
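A strong scalability test keeps the problem size fixed and increases the core count. The sketch below shows how speedup and parallel efficiency are usually derived from such timings, relative to the smallest core count measured. The function name `strong_scaling` is an assumption, and any timings passed to it are the caller's own; the transcript contains no measured values.

```python
def strong_scaling(timings):
    """Compute speedup and efficiency for a fixed-size problem.

    timings: dict mapping core count -> wall-clock time in seconds.
    Returns dict mapping core count -> (speedup, efficiency), both relative
    to the run with the fewest cores.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    result = {}
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # Ideal speedup is cores / base_cores; efficiency is the fraction achieved.
        efficiency = speedup * base_cores / cores
        result[cores] = (speedup, efficiency)
    return result
```

An efficiency near 1.0 means the extra cores are fully used; a falling efficiency indicates growing communication or I/O overhead.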
  • GEOS-5 Simulation
  • GEOS-5 Simulation: CO2-based Metric
    • The metric: the differences of locations / CO2 concentration along the pathline
    • Findings
      • The variation of the wind field is high in the northern hemisphere
      • However, the CO2 difference is higher in the southern hemisphere and in some places in the north
      • CO2 concentration is not sensitive to wind in the above regions
  • Challenges in Big Data Visualization/Visual Analytics - 3
    • Integrating heterogeneous data from different resources and scales
    • Scalability in data/task complexity
      • Inherent data properties impose further computational challenges on methods for visualization and visual analysis of big data
    • Limited access in interaction for large data
  • Query
  • Dynamic Query
  • Real-time Visual Querying of Big Data
    • imMens
  • Nanocubes for Real-Time Exploration of Spatiotemporal Datasets
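Systems such as imMens and Nanocubes answer interactive queries from pre-aggregated summaries rather than from raw records. The sketch below illustrates only that general idea with a flat dictionary of bin counts over space and time. It is not the data structure of either system, and the names `build_cube` and `query`, the grid cell size, and the time bin width are all assumptions.

```python
from collections import Counter


def build_cube(points, cell_size, bin_minutes):
    """Pre-aggregate points into (x_bin, y_bin, t_bin) counts.

    points: iterable of (x, y, minute) in arbitrary planar units.
    """
    cube = Counter()
    for x, y, minute in points:
        key = (int(x // cell_size), int(y // cell_size), int(minute // bin_minutes))
        cube[key] += 1
    return cube


def query(cube, x_range, y_range, t_range):
    """Count points whose bins fall inside the inclusive bin ranges.

    Cost depends on the number of occupied bins, not on the number of
    raw points that were aggregated.
    """
    (x0, x1), (y0, y1), (t0, t1) = x_range, y_range, t_range
    return sum(
        n for (x, y, t), n in cube.items()
        if x0 <= x <= x1 and y0 <= y <= y1 and t0 <= t <= t1
    )
```

Because brushing a map region or a time window only touches bins, response time stays roughly constant as the raw dataset grows.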
  • Challenges in Big Data Visualization/Visual Analytics - 4
    • Integrating heterogeneous data from different resources and scales
    • Scalability in data/task complexity
      • Inherent data properties impose further computational challenges on methods for visualization and visual analysis of big data
    • Limited access in interaction for large data
    • Scalability in users
      • Collaborative visualization and analysis on large data
      • Can scientists create novel visualizations without programming?
  • Double Gulf
    • Diagram labels: Visualization Designer, Visualization User, Data, Visualization, Conceptual Model, Representation, Evaluation, Execution, Manipulation
  • From Data to User
    • Diagram labels: Visualization Designer, Visualization User, Representation, Evaluation, Execution, Manipulation
  • Scalability in Users
    • Diagram labels: Visualization Designer, Visualization User, Data, Visualization, Conceptual Model, Representation, Evaluation, Execution, Manipulation
  • Scalability in Users: Collaborative Visualization
  • ThemeMap: Crowd Sourcing
  • Large Security Data Vis [Chen et al., IEEE VAST 2013 Situation Awareness Award]
  • Crowd Sourcing based Vis.
  • Scalability in Users: User - Visualization Expert
  • Visualization Assembly Line: http://vis.pku.edu.cn/mddv/val/
  • Challenges in Big Data Visualization/Visual Analytics - 5
    • Integrating heterogeneous data from different resources and scales
    • Scalability in data/task complexity
    • Limited access in interaction for large data
    • Scalability in users
    • System development
      • Domain and development libraries, frameworks, and tools
      • Social, community, and government engagements
  • SciVis Visualization Systems
    • VisIt (LLNL): https://wci.llnl.gov/codes/visit
    • ParaView (Kitware/SNL/LANL): http://www.paraview.org
    • IceT (Image Composition Engine for Tiles, Sandia): http://icet.sandia.gov
    • Daxtoolkit (Data Analysis at Extreme): http://www.daxtoolkit.org
    • PISTON (Portable Data-Parallel Visualization and Analysis Library, LANL): http://viz.lanl.gov/projects/PISTON.html
  • VisIt
    • Production end-user tool supporting scientific and engineering applications
    • Parallel post-processing that scales from desktops to massive HPC clusters
  • Development of VisIt (based on the SC'11 tutorial)
    • The VisIt project started in 2000 to support LLNL's large-scale ASC physics codes
    • Supported by multiple organizations: LLNL, LBNL, ORNL, UC Davis, Univ. of Utah, …
    • Over 75 person-years of effort
    • 1.5+ million lines of code
  • VTK
    • W.J. Schroeder, K. Martin, and W. Lorensen, The Visualization Toolkit: An Object Oriented Approach to Computer Graphics, Third Edition, Kitware, Inc., ISBN 1-930934-12-2 (2004).
    • S. E. Rogers, D. Kwak, and U. K. Kaul, A numerical study of three-dimensional incompressible flow around multiple posts. In Proceedings of AIAA Aerospace Sciences Conference. AIAA Paper 86-0353. Reno, Nevada, 1986.
  • ParaView
    • 2000: Los Alamos National Laboratory and Kitware Inc.
    • 2005: Sandia National Laboratories and Kitware Inc.
    • Used by academic, government, and commercial institutions worldwide
    • Downloaded ~100K times per year
  • UV-CDAT Project
  • IN-SPIRE
  • Starlight Information Visualization System
  • Build a Successful Vis System
    • System design
    • Domain user and visualization scientist "co-design"
    • Stable development team
    • Funding mechanism
  • Challenges in Big Data Visualization/Visual Analytics - 6
    • Integrating heterogeneous data from different resources and scales
    • Scalability in data/task complexity
    • Limited access in interaction for large data
    • Scalability in users
    • System development
    • Visualization experts
  • VIS 2013 in Atlanta
  • Social, Community, and Government Engagements
    • 2013 IEEE VIS (figures listed on the slide: 533, 87, 31, 24, 17, 895)
    • Universities
      • University of Tennessee in Knoxville
      • Ohio State University
      • SCI Institute, University of Utah
      • University of California, Davis
      • University of California, San Diego
      • University of Nebraska-Lincoln
      • Michigan Technological University
      • Drexel University
    • Supercomputer centers
      • San Diego Supercomputer Center (SDSC)
      • Texas Advanced Computing Center (TACC)
      • National Center for Supercomputing Applications at the University of Illinois (NCSA)
    • DoE labs
      • Argonne National Laboratory (ANL)
      • Lawrence Berkeley National Laboratory (LBNL)
      • Lawrence Livermore National Laboratory (LLNL)
      • Los Alamos National Laboratory (LANL)
      • Pacific Northwest National Laboratory (PNNL)
      • Oak Ridge National Laboratory (ORNL)
      • Sandia National Laboratories (SNL)
      • National Renewable Energy Laboratory (NREL)
    • Companies
      • Kitware
  • Good News
    • More and more universities have started visualization research programs
    • Many companies are aware of the importance of visualization
    • Still, national infrastructure is lacking
  • Vis Workshop 2013 @ PKU
    • 2013.7.12-13
  • MOOC Course on Visualization at PKU
    • Starts Spring 2014
    • Covers major topics in visualization
  • Acknowledgement
    • Students
    • Funds: NSFC, 863, PKU, Beijing NSF
    • Collaborators
      • Jian Huang, University of Tennessee
      • Zhu Xiaoming, SDSCC
      • Yongxian Zhang, China Earthquake Network Center
      • Xiaoguang Ma, CAS IAP
      • More …
    • Contact: http://vis.pku.edu.cn/wiki, Xiaoru.yuan@pku.edu.cn