
Transcript

  • 1. BDTC - Beijing, 2013-12-6
  • 2. Big Data Visualization and Visual Analysis - the Challenges and Opportunities
  • 3. Visualization
  • 5. Visualization, mental image and mental model. Diagram labels: Data, Visualization, Image, Mental Model, Insights
  • 6. From Data to Visualization
  • 7. Visualization != Infographics
  • 9. Large Volume Visualization
      • Level of Detail
      • Out of Core
      • Parallel Visualization
  • 10. Top 10 Challenges in Extreme-Scale Data Visual Analytics. Pak Chung Wong (PNNL), Han-Wei Shen (OSU), Chris Johnson (Utah), Chaomei Chen (Drexel), Robert Ross (Argonne)
  • 11. Top 10 Challenges in Extreme-Scale Data Visual Analytics
      • In Situ Analysis: perform as much analysis as possible while the data are still in memory
      • Interaction and User Interfaces: machine-based automated systems vs. human cognition
      • Large-Data Visualization: data projection and dimension reduction, display technology
      • Databases and Storage: a cloud-based solution might not meet the needs
      • Algorithms: address both data-size and visual-efficiency issues
  • 12. Top 10 Challenges in Extreme-Scale Data Visual Analytics
      • Data Movement/Transport, & Network Infrastructure: efficiently use networking resources and provide convenient abstractions
      • Uncertainty Quantification: cope with incomplete data
      • Parallelism
      • Domain and Development Libraries, Frameworks, and Tools: affordable resource libraries, frameworks, and tools
      • Social, Community, and Government Engagements
  • 13. Challenges in Big Data Visualization/Visual Analytics - 1
      • Integrating heterogeneous data from different resources and scales
  • 14. Beijing Taxi GPS data
  • 15. Data
      • Beijing taxi GPS data
          • Size: 34.5 GB
          • Number of taxis: 28,519
          • Number of sampling points: 379,107,927
          • Time range: 2009/03/02 to 2009/03/25 (24 days, but the 03/18 data is missing)
          • Sampling rate: one point every 30 seconds (but 60% of the data is missing)
      • Beijing road network (from OpenStreetMap)
          • Size: 40.9 MB
          • 169,171 nodes and 35,422 ways
  • 18. Traffic Jam Detection
      • Pipeline: Raw taxi GPS Data and Raw Road Network → Cleaned GPS Data and Processed Road Network → GPS Trajectories Matched to the Road Network → Road Speed Data → Traffic Jam Detection → Traffic Jam Event Data
      • Defining propagation based on spatial/temporal relationship: e0 happens before e1, and on a dWay following e1
      • Example road speed data: readings at 9:10 am, 9:20 am, 9:30 am and 9:40 am, with speeds between 10 km/h and 55 km/h
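
The detection and propagation steps on slide 18 can be sketched roughly as follows. This is only an illustration, not the authors' implementation: it assumes a jam is a run of consecutive time bins whose speed is below a threshold, and that one event propagates to another when it starts first on a way that follows the other's way. The 20 km/h threshold, the 30-minute linking window, the road names and the `following` adjacency map are all hypothetical, since the slides do not give them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JamEvent:
    road: str
    start: int  # minutes since midnight, start of the first slow bin
    end: int    # minutes since midnight, end of the last slow bin


def detect_jams(road, speeds, threshold_kmh=20.0, bin_minutes=10):
    """Turn a road's speed series into jam events.

    speeds: list of (minute_of_day, km/h), sorted by time, one entry per bin.
    threshold_kmh and bin_minutes are assumed values.
    """
    events, run_start, last_slow = [], None, None
    for minute, speed in speeds:
        if speed < threshold_kmh:
            if run_start is None:
                run_start = minute
            last_slow = minute
        elif run_start is not None:
            events.append(JamEvent(road, run_start, last_slow + bin_minutes))
            run_start = None
    if run_start is not None:
        events.append(JamEvent(road, run_start, last_slow + bin_minutes))
    return events


def link_propagation(events, following, max_delay=30):
    """Return (e0, e1) pairs where e0 starts no later than e1 and e0's way
    follows e1's way. following maps a way to the ways directly after it
    (hypothetical adjacency); max_delay in minutes is an assumed window."""
    edges = []
    for e0 in events:
        for e1 in events:
            if e0 is e1:
                continue
            delay = e1.start - e0.start
            if e0.road in following.get(e1.road, ()) and 0 <= delay <= max_delay:
                edges.append((e0, e1))
    return edges


# Hypothetical readings for two ways; 550 is 9:10 am in minutes since midnight.
events = (
    detect_jams("a", [(550, 50), (560, 45), (570, 12), (580, 15)])
    + detect_jams("b", [(550, 55), (560, 10), (570, 12), (580, 45)])
)
edges = link_propagation(events, following={"a": ["b"]})
```

Chaining the resulting edges gives a propagation graph per jam, which is the unit the visual interface explores.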
  • 19. Visual Interface. Diagram labels: Road Segment Level Exploration and Analysis, Road of Interest, One Propagation Graph, Road Speed Data, Propagation Graphs of Interest, Propagation Graph Level Exploration, Propagation Graph List, Spatial Density, Time and Size Distribution, Spatial Filter, Temporal & Size Filter, Topological Clustering, Traffic Jam Event Data, Traffic Jam Propagation Graphs, Dynamic Query, Topological Filter
  • 20. Preprocessing: Map Matching
      • Raw taxi GPS Data → Cleaned GPS Data
      • Raw Road Network → Processed Road Network
      • Map Matching → GPS Trajectories Matched to the Road Network
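
As a rough illustration of the map matching step, the sketch below snaps one GPS point to the nearest road segment by perpendicular projection. The slides do not say which map matching algorithm was used, so this nearest-segment approach is a simple stand-in. The planar coordinates in meters, the segment ids and the 50 m cutoff are assumptions; a production matcher would also use trajectory continuity and road direction.

```python
import math


def project_to_segment(p, a, b):
    """Project point p onto segment a-b; return (closest_point, distance)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    # Parameter t of the projection along the segment, clamped to [0, 1].
    t = 0.0 if seg_len2 == 0 else ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    qx, qy = ax + t * dx, ay + t * dy
    return (qx, qy), math.hypot(px - qx, py - qy)


def match_point(p, segments, max_dist=50.0):
    """Return (segment_id, snapped_point, distance) for the nearest segment,
    or None if no segment lies within max_dist (an assumed cutoff)."""
    best = None
    for seg_id, (a, b) in segments.items():
        q, d = project_to_segment(p, a, b)
        if d <= max_dist and (best is None or d < best[2]):
            best = (seg_id, q, d)
    return best


# Two hypothetical parallel ways, 30 m apart.
segments = {
    "way_1": ((0.0, 0.0), (100.0, 0.0)),
    "way_2": ((0.0, 30.0), (100.0, 30.0)),
}
match = match_point((50.0, 10.0), segments)
```

Points farther than the cutoff from every segment are dropped, which is one simple way to clean outliers before speeds are computed.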
  • 22. Visual Interface: Single Road Level
      • Pixel-based visualization
          • Time of day: 144 columns (one per 10-minute bin)
          • Days: 24 rows (one per day)
          • Each cell represents one time bin
          • Color encodes speed
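
The 24 x 144 layout described above amounts to binning speed samples by day and by 10-minute slot. A minimal sketch, assuming each sample is a `(day_index, minute_of_day, speed_kmh)` tuple and that a cell shows the mean speed of its bin (the slide only says color encodes speed, so the mean is an assumption):

```python
def speed_matrix(samples, n_days=24, bin_minutes=10):
    """Build the day x time-of-day grid behind the pixel view.

    samples: iterable of (day_index, minute_of_day, speed_kmh).
    Returns n_days rows of 144 cells; empty bins are None.
    """
    n_bins = 24 * 60 // bin_minutes  # 144 columns for 10-minute bins
    total = [[0.0] * n_bins for _ in range(n_days)]
    count = [[0] * n_bins for _ in range(n_days)]
    for day, minute, speed in samples:
        col = minute // bin_minutes
        total[day][col] += speed
        count[day][col] += 1
    return [
        [total[r][c] / count[r][c] if count[r][c] else None for c in range(n_bins)]
        for r in range(n_days)
    ]


# Hypothetical samples: two on day 0 around 9:10 am, one on day 1 at 9:20 am.
grid = speed_matrix([(0, 550, 50.0), (0, 555, 40.0), (1, 560, 10.0)])
```

Keeping empty bins as `None` lets the renderer draw missing data in a neutral color instead of treating it as zero speed.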
  • 23. Case Study: Road Level Exploration and Analysis
      • Different road congestion patterns
  • 24. Case Study: Road Level Exploration and Analysis
  • 25. Propagation Graph Analysis
      • Spatial-temporal information of one propagation: spatial path, temporal delay, large delay
  • 26. Propagation Pattern Exploration
      • Propagation graphs for one region in the morning of different days
  • 36. Weibo ThemeMap
  • 37. Weibo ThemeMap
  • 38. Xiamen Traffic
  • 40. Challenges in Big Data Visualization/Visual Analytics - 2
      • Integrating heterogeneous data from different resources and scales
      • Scalability in Data/Task complexity
          • Inherent properties of the data impose more computational challenges on methods for visualization and visual analysis of big data
  • 41. Pollution From China
  • 42. Pollution from USA
  • 43. Multivariate to Multi-Run Visual Analysis
      • Multivariate: QVAPOR, QCLOUD, Pressure, Speed
      • Ensemble runs: Run 1, Run 2, Run 3, each with QVAPOR, QCLOUD, Pressure, Speed
  • 44. Eulerian and Lagrangian Specifications
      • Eulerian:
      • Lagrangian:
      • Relationship between the two specifications (flow map):
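
The formulas for this slide are not in the transcript. For orientation, the standard textbook definitions of the two specifications and of the flow map are given below; the notation ($\mathbf{v}$ for the velocity field, $\Phi_{t_0}^{t}$ for the flow map, $\mathbf{x}_0$ for the seed position) is an assumption and may differ from what the slide used.

```latex
% Eulerian: velocity sampled at a fixed position x and time t
\mathbf{v} = \mathbf{v}(\mathbf{x}, t)

% Lagrangian: position at time t of the particle released from x_0 at time t_0
\mathbf{x} = \Phi_{t_0}^{t}(\mathbf{x}_0), \qquad \Phi_{t_0}^{t_0}(\mathbf{x}_0) = \mathbf{x}_0

% Relationship (flow map): a pathline is an integral curve of the Eulerian field
\frac{\partial}{\partial t}\,\Phi_{t_0}^{t}(\mathbf{x}_0)
  = \mathbf{v}\!\left(\Phi_{t_0}^{t}(\mathbf{x}_0),\, t\right)
\quad\Longleftrightarrow\quad
\Phi_{t_0}^{t}(\mathbf{x}_0)
  = \mathbf{x}_0 + \int_{t_0}^{t} \mathbf{v}\!\left(\Phi_{t_0}^{\tau}(\mathbf{x}_0),\, \tau\right) d\tau
```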
  • 45. Eulerian-based Attribute Space Projection
      • Samples on data grid
      • Samples in attribute space
      • Eulerian-based Attribute Space Projection (EASP)
  • 46. Lagrangian-based Attribute Space Projection
      • Pathlines on data grid
      • Pathlines in attribute space
      • Lagrangian-based Attribute Space Projection (LASP)
      • Both the multivariate scalar fields and the vector field are considered
  • 47. Case: GEOS-5 Simulation
  • 48. Coupled Ensemble Flow Line Advection and Analysis (eFLAA) - Concept
      • Ensemble data (large)
      • Field line data (much larger than ensemble data)
      • Variation field (small)
      • Filtered lines (even smaller)
      • [Guo, Yuan, Huang and Zhu, TVCG 2013 (SciVis '13)]
  • 49. Benchmark Platform: NCSSJN
      • ShenWei-based supercomputer
          • SW1600 processor, 1.0~1.1 GHz
          • 1 GB memory for each core
          • 40 Gbps high-speed interconnection
      • x86-based supercomputer
          • Intel Xeon E5675 hexa-core processor, 3.06 GHz
          • 4 GB memory for each core
          • QDR InfiniBand interconnection
      • Shared global filesystem: SWGFS
  • 50. Scalability
      • Strong scalability test at the National Supercomputer Center in Jinan (ShenWei and x86 architectures)
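
A strong scalability test keeps the problem size fixed and increases the core count. The sketch below shows how speedup and parallel efficiency are usually derived from such timings. The timing values are hypothetical and are not the results shown on the slide.

```python
def strong_scaling(timings):
    """Compute speedup and efficiency relative to the smallest core count.

    timings: dict mapping core count -> wall time in seconds,
             all measured on the same fixed-size problem.
    Returns a list of (cores, speedup, efficiency) sorted by core count.
    """
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # Ideal speedup equals the growth in core count; efficiency is the ratio.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


# Hypothetical wall times for one fixed problem size.
for cores, speedup, efficiency in strong_scaling({64: 800.0, 128: 400.0, 256: 250.0}):
    print(f"{cores:5d} cores  speedup {speedup:.2f}  efficiency {efficiency:.0%}")
```

An efficiency that stays close to 100% as cores are added indicates good strong scaling; a falling curve usually points to communication or I/O overhead.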
  • 51. GEOS-5 Simulation
  • 52. GEOS-5 Simulation
  • 53. GEOS-5 Simulation: CO2-based Metric
      • The metric: the differences of locations / CO2 concentration along the pathline
      • Findings
          • The variation of the wind field is high in the northern hemisphere
          • However, the CO2 difference is higher in the southern hemisphere and in some places in the north
          • CO2 concentration is not sensitive to wind in the above regions
  • 54. Challenges in Big Data Visualization/Visual Analytics - 3
      • Integrating heterogeneous data from different resources and scales
      • Scalability in Data/Task complexity
          • Inherent properties of the data impose more computational challenges on methods for visualization and visual analysis of big data
      • Limited access in Interaction for Large Data
  • 55. Query
  • 56. Dynamic Query
  • 57. Real-time Visual Querying of Big Data
      • imMens
  • 58. Real-time Visual Querying of Big Data
  • 59. Nanocubes for Real-Time Exploration of Spatiotemporal Datasets
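
Systems such as imMens and Nanocubes get interactive response times by answering queries from pre-aggregated bins rather than from raw records. The sketch below shows only that general idea with a flat dictionary of counts. It is not the data structure of either system, and the 500 m cell size, the 60-minute bin and the record layout `(x_m, y_m, minute_of_day)` are assumptions.

```python
from collections import Counter


def build_cube(records, cell_m=500, bin_minutes=60):
    """Pre-aggregate records into (x_cell, y_cell, time_bin) counts.

    records: iterable of (x_m, y_m, minute_of_day) with integer coordinates.
    """
    cube = Counter()
    for x_m, y_m, minute in records:
        cube[(x_m // cell_m, y_m // cell_m, minute // bin_minutes)] += 1
    return cube


def query(cube, x_range, y_range, t_range):
    """Count records in an inclusive range of cells and time bins.

    Only the aggregated bins are scanned, never the raw records.
    """
    return sum(
        n
        for (x, y, t), n in cube.items()
        if x_range[0] <= x <= x_range[1]
        and y_range[0] <= y <= y_range[1]
        and t_range[0] <= t <= t_range[1]
    )


# Hypothetical GPS samples in a local metric grid.
cube = build_cube([(120, 80, 545), (400, 300, 550), (900, 100, 555), (130, 90, 700)])
morning_in_first_cell = query(cube, (0, 0), (0, 0), (9, 9))
```

The query cost depends on the number of occupied bins instead of the number of records, which is what makes brushing and linked filtering feasible on very large point sets.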
  • 60. Challenges in Big Data Visualization/Visual Analytics - 4
      • Integrating heterogeneous data from different resources and scales
      • Scalability in Data/Task complexity
          • Inherent properties of the data impose more computational challenges on methods for visualization and visual analysis of big data
      • Limited access in Interaction for Large Data
      • Scalability in Users
          • Collaborative visualization and analysis on large data
          • Can scientists create novel visualizations without programming?
  • 61. Double Gulf. Diagram labels: Visualization Designer, Visualization User, Representation, Evaluation, Data, Visualization, Conceptual Model, Execution, Manipulation
  • 63. From Data to User. Diagram labels: Visualization User, Evaluation, Execution, Visualization Designer, Representation, Manipulation
  • 64. Scalability In Users. Diagram labels: Visualization Designer, Visualization User, Representation, Evaluation, Data, Visualization, Conceptual Model, Execution, Manipulation
  • 65. Scalability In Users – Collaborative Visualization
  • 66. ThemeMap – Crowd Sourcing
  • 67. Large Security Data Vis [Chen et al. IEEE VAST 2013 Situation Awareness Award]
  • 68. Large Security Data Vis
  • 69. Large Security Data Vis
  • 70. Crowd Sourcing based Vis.
  • 71. Scalability In Users – User - Visualization Expert
  • 72. Visualization Assembly Line http://vis.pku.edu.cn/mddv/val/
  • 73. Visualization Assembly Line
  • 74. Challenges in Big Data Visualization/Visual Analytics - 5
      • Integrating heterogeneous data from different resources and scales
      • Scalability in Data/Task complexity
      • Limited access in Interaction for Large Data
      • Scalability in Users
      • System Development
          • Domain and Development Libraries, Frameworks, and Tools
          • Social, Community, and Government Engagements
  • 75. SCIVIS Visualization Systems
      • VisIt - LLNL https://wci.llnl.gov/codes/visit
      • ParaView - Kitware/SNL/LANL http://www.paraview.org
      • IceT (Image Composition Engine for Tiles) - Sandia http://icet.sandia.gov
      • Daxtoolkit - Data Analysis at Extreme http://www.daxtoolkit.org
      • PISTON - Portable Data-Parallel Visualization and Analysis Library, LANL http://viz.lanl.gov/projects/PISTON.html
  • 76. VisIt
      • Production end-user tool supporting scientific and engineering applications.
      • Parallel post-processing that scales from desktops to massive HPC clusters.
  • 77. Development of VisIt
      • The VisIt project started in 2000 to support LLNL’s large scale ASC physics codes.
      • Supported by multiple organizations: LLNL, LBNL, ORNL, UC Davis, Univ. of Utah, …
      • Over 75 person-years of effort.
      • 1.5+ million lines of code.
      • Based on SC’11 Tutorial
  • 79. VTK
      • W.J. Schroeder, K. Martin, and W. Lorensen, The Visualization Toolkit: An Object Oriented Approach to Computer Graphics, Third Edition, Kitware, Inc., ISBN 1-930934-12-2 (2004).
      • S. E. Rogers, D. Kwak, and U. K. Kaul, A numerical study of three-dimensional incompressible flow around multiple post. In Proceedings of AIAA Aerospace Sciences Conference. AIAA Paper 86-0353. Reno, Nevada, 1986.
  • 80. ParaView
      • 2000: Los Alamos National Laboratory and Kitware Inc.
      • 2005: Sandia National Laboratories and Kitware Inc.
      • Used by academic, government, and commercial institutions worldwide.
      • Downloaded ~100K times per year.
  • 81. UV-CDAT Project
  • 82. IN-SPIRE
  • 83. Starlight Information Visualization System
  • 84. Build a successful vis system
      • System Design
      • Domain User – Visualization Scientist “Co-design”
      • Stable Development Team
      • Funding Mechanism
  • 87. Challenges in Big Data Visualization/Visual Analytics - 6
      • Integrating heterogeneous data from different resources and scales
      • Scalability in Data/Task complexity
      • Limited access in Interaction for Large Data
      • Scalability in Users
      • System Development
      • Visualization Experts
  • 88. VIS 2013 in Atlanta
  • 89. Social, Community, and Government Engagements
      • 2013 IEEE VIS (figures on the slide, with their labels missing from the transcript: 533, 87, 31, 24, 17, 895)
  • 90. Social, Community, and Government Engagements
      • Universities
          • University of Tennessee in Knoxville
          • Ohio State University
          • SCI Institute, University of Utah
          • University of California, Davis
          • University of California, San Diego
          • University of Nebraska-Lincoln
          • Michigan Technological University
          • Drexel University
      • Supercomputer centers
          • San Diego Supercomputer Center (SDSC)
          • Texas Advanced Computing Center (TACC)
          • National Center for Supercomputing Applications at the University of Illinois (NCSA)
      • DoE Labs
          • Argonne National Laboratory (ANL)
          • Lawrence Berkeley National Laboratory (LBNL)
          • Lawrence Livermore National Laboratory (LLNL)
          • Los Alamos National Laboratory (LANL)
          • Pacific Northwest National Laboratory (PNNL)
          • Oak Ridge National Laboratory (ORNL)
          • Sandia National Laboratories (SNL)
          • National Renewable Energy Laboratory (NREL)
      • Companies
          • Kitware
  • 91. Good News
      • More and more universities have started visualization research programs
      • Many companies are aware of the importance of visualization
      • Still, national infrastructure is lacking
  • 92. Vis Workshop 2013 @ PKU
      • 2013.7.12-13
  • 93. MOOC Course on Visualization at PKU
      • Starts Spring 2014
      • Covers major topics in visualization
  • 94. Acknowledgement
      • Students
      • Funds: NSFC, 863, PKU, Beijing NSF
      • Collaborators
          • Jian Huang, University of Tennessee
          • Zhu Xiaoming, SDSCC
          • Yongxian Zhang, China Earthquake Network Center
          • Xiaoguang Ma, CAS IAP
          • More …
      • http://vis.pku.edu.cn/wiki
      • Xiaoru.yuan@pku.edu.cn