Designing Progressive and Interactive Analytics Processes for High-Dimensional Data Analysis

Slides for my talk at the VAST 2016 conference within IEEE VIS 2016. The details of the presented paper can be found on this page: http://www.gicentre.net/featuredpapers/#/turkaydesigning2016/

Published in: Data & Analytics

  1. 1. Cagatay Turkay Erdem Kaya Selim Balcisoy Helwig Hauser www.gicentre.net/vis2016 Designing Progressive and Interactive Analytics Processes for High-Dimensional Data Analysis
  2. 2. Visual analytics (VA) can be considered as an interactive and iterative dialogue between the human and the computer where the interactive analysis process is a sequence of actions by the user and responses by the computer motivated by an analytical question … Based on several works, e.g., [Keim et al. 2008], [Green et al. 2008]
  3. 3. …. this iterative discourse serves as the fundamental mechanism through which analysts make observations … we need to ensure that this dialogue takes place at a pace that respects the temporal cognitive capabilities of users …
  4. 4. Please wait, while I construct my next sentence ...
  5. 5. Card, S.K., Robertson, G.G. and Mackinlay, J.D., 1991, The information visualizer, an information workspace. In Proceedings of the ACM SIGCHI
  6. 6. THURSDAY, 4:15 InfoVis: Scalable Algorithms
  7. 7. This paper … … visual data analysis processes where a computational tool is integrated to support high-dimensional data analysis
  8. 8. … instead of forcing the user to wait for an interactive computation to finish, we present a best possible result within an acceptable time frame. In essence ..
  9. 9. …. techniques and design considerations to incorporate progressive methods within interactive analysis processes that involve high-dimensional data …. Online algorithms Visual Representations Levels of Operation Interactions
  10. 10. Human time constants to govern the pace of interaction….
  11. 11. Levels of Operation: a framework to implement human time constants. Level 1 (0.1 sec.): What: (animated) transitions between (computation) results. Why: ensures perceptually smooth transitions. Level 2 (1 sec.): What: guaranteed response time for intermediate results. Why: maintains dialog nature. Level 3 (10 – 30 sec.): What: analytical unit task completion. Why: answer a specific question, e.g. finding groups, locating outliers.
  12. 12. Integrating online algorithms: can operate on small batches of data (on random sample subsets); produce approximate results; updates can be done efficiently. Online PCA (Ross et al., 2008); online clustering (Sculley et al., 2010)
  13. 13. online PCA - Incremental SVD computation - Intermediate results at each 1 sec. - Immediate response - Compute on subset but project all - Colouring & improved transitions
  14. 14. Adaptive random sampling: guarantee response in a fixed period of time (i.e. 1 sec.); faster convergence. 1st batch size: 8%, time taken: 0.3 sec. 2nd batch size: 16%, time taken: 0.7 sec. 3rd batch size: 33%, time taken: 1.3 sec. 4th batch size: 25%, time taken: 0.9 sec. 5th batch size: 25%, time taken: 0.9 sec. (Figure caption: Let’s say these are your data items)
  15. 15. online clustering - Cluster only the subset - Incrementally grow clusters
  16. 16. progress & certainty Increasing sample size
  17. 17. Interaction methods to moderate the process Key-framed brushing [Turkay, 2014] Well-defined sequences that can be represented in 30 sec. Help define analytical unit task
  18. 18. Evaluation Workshops. Problem: Credit card transactions segmentation -- groups of expenditures with similar characteristics. Data: 300K+ CC transactions, 5K customers (demog., location, financial metrics, etc.). Methodology: - 2-month-long case study, 4 analysis sessions (1 for training) with 4 CRM analysts - Fly-on-the-wall observations - Semi-structured interviews - Video and sound recorded, renounce times noted, 32 hours of video processed for the extraction of inference moments, and quotes transcribed - Insights, questions, hypotheses identified
  19. 19. Observed/reported positive aspects - Generation and verification of hypotheses in short time ..... ..... .....
  20. 20. Observed/reported positive aspects - Generation and verification of hypotheses in short time - Continuous engagement “We could generate so many new hypotheses in a very short time without waiting for the whole calculation to end.” “..., [Visualization] is quite engaging as we don’t have to wait for even a moment to get some initial results.”
  21. 21. Observed/reported positive aspects - Generation and verification of hypotheses in short time - Continuous engagement - Stability is key in decisions “… It seems like the clustering will not change. ... let’s switch to some other set …”
  22. 22. Observed/reported issues - Continuous update of the visualization can be distracting “... it can be distracting to look at an ever-changing visualization. [If we were] able to set the step size, … then we can have some time to talk about intermediate results.” - Uncertainty and instability are an issue “… I’ve just seen a high response score for the selected cluster, but it has just gone away.” - Early decisions might be wrong - Multiple views operating concurrently can be problematic: unaligned convergence
  23. 23. Ten Design Recommendations DR1: Employ human time constants as the underlying theoretical framework that governs the pace of interaction in analytical processes DR2: Employ online learning algorithms that are capable of handling data in sub-batches to perform computational tasks. DR3: Employ an adaptive sampling mechanism that estimates suitable sample sizes for computations to ensure efficiency in convergence while still respecting the temporal constraints. DR4: Facilitate the immediate initiation of computations in response to user interactions that limit the domain of the algorithms. DR5: Provide users with interaction mechanisms enabling management (pause, step size, re- run) of the progression. DR6: During the interaction design of visual analytic solutions, consider the effects of possible fluctuations due to unaligned progression in multiple progressive views. DR7: Provide interaction mechanisms to define structured investigation sequences for systematic generation and comparisons of computational results. DR8: Support the interpretation of the evolution of the results through suitable visualization techniques. DR9: Inform analysts on the progress of computations and indications of time-to-completion. DR10: Inform analysts on the uncertainty in the computations and the way the computations develop.
  24. 24. Future challenges & opportunities Better heuristics/quality metrics Reproducibility? - different samples in each run Provenance
  25. 25. …. instead of forcing the user to adjust to the temporal and cognitive capabilities of visual analysis solutions, we orient the technical solutions at the communication characteristics of the users. To conclude …
  26. 26. Cagatay Turkay Erdem Kaya Selim Balcisoy Helwig Hauser Designing Progressive and Interactive Analytics Processes for High-Dimensional Data Analysis Full list of giCentre VIS 2016 contributions www.gicentre.net/vis2016
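
Slide 14 (adaptive random sampling) describes growing or shrinking the sample so that each intermediate result arrives within a fixed period of about 1 sec. The slides do not state the exact update rule, so the following is only a minimal sketch under assumptions: the function name `next_batch_fraction`, the `slack` threshold, and the "double when well under budget, shrink proportionally when over" rule are all hypothetical.

```python
def next_batch_fraction(fraction, elapsed, budget=1.0, slack=0.8, max_fraction=1.0):
    """Pick the sample fraction for the next progressive step.

    Hypothetical rule (an assumption, not taken from the slides):
    - over budget: shrink the sample in proportion to the overshoot
    - well under budget (elapsed < slack * budget): double the sample
    - otherwise: keep the current size
    """
    if elapsed > budget:
        return fraction * budget / elapsed
    if elapsed < slack * budget:
        return min(max_fraction, fraction * 2)
    return fraction


# Replay the per-step timings quoted on slide 14 (in seconds),
# starting from the 8% sample used for the first batch.
fraction = 0.08
history = [fraction]
for elapsed in [0.3, 0.7, 1.3, 0.9]:
    fraction = next_batch_fraction(fraction, elapsed)
    history.append(fraction)

print([f"{f:.0%}" for f in history])
```

With these assumed parameters the replay yields fractions of roughly 8%, 16%, 32%, 25% and 25%, which is close to the sequence shown on the slide. A real system would measure `elapsed` around the actual computation instead of replaying fixed timings.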

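Slides 12 and 15 mention online clustering that works on random subsets and grows clusters incrementally. As an illustration only, here is a small pure-Python sketch of a mini-batch k-means update in the style of the cited Sculley (2010) method, using per-center learning rates. The helper names (`nearest`, `minibatch_kmeans_step`, `progressive_clusters`) and the toy data are assumptions made for this example, not the authors' implementation.

```python
import random


def nearest(centers, point):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(
        range(len(centers)),
        key=lambda i: sum((c - p) ** 2 for c, p in zip(centers[i], point)),
    )


def minibatch_kmeans_step(centers, counts, batch):
    """Apply one mini-batch update in place and return (centers, counts)."""
    # Assign every point against the centers as they were at batch start.
    assignments = [nearest(centers, p) for p in batch]
    for point, idx in zip(batch, assignments):
        counts[idx] += 1
        eta = 1.0 / counts[idx]  # per-center learning rate shrinks over time
        centers[idx] = [(1 - eta) * c + eta * p for c, p in zip(centers[idx], point)]
    return centers, counts


def progressive_clusters(data, k, batch_size, n_batches, seed=0):
    """Yield a snapshot of the centers after each random mini-batch."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(data, k)]
    counts = [0] * k
    for _ in range(n_batches):
        batch = rng.sample(data, batch_size)
        minibatch_kmeans_step(centers, counts, batch)
        yield [list(c) for c in centers]


# Toy usage: each snapshot could drive one intermediate view update.
toy = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
for step, snapshot in enumerate(progressive_clusters(toy, k=2, batch_size=2, n_batches=3)):
    print(step, snapshot)
```

Yielding a snapshot per batch is one way to expose intermediate results to a view; how the batch size is chosen per step is left open here.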