Visual analytics
Upcoming SlideShare
Loading in...5
×

Like this? Share it with your network

Share
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads

Views

Total Views
401
On Slideshare
388
From Embeds
13
Number of Embeds
2

Actions

Shares
Downloads
18
Comments
0
Likes
0

Embeds 13

http://pointcarre.vub.ac.be 12
http://www.slideee.com 1

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
    No notes for slide

Transcript

  • 1. 09/05/14 pag. 1 Information visualization lecture 8 visual analytics Katrien Verbert Department of Computer Science Faculty of Science Vrije Universiteit Brussel katrien.verbert@vub.ac.be
  • 2. 09/05/14 pag. 2 Motivation h"p://bigdatablog.emc.com/2013/08/08/mlb-­‐a-­‐big-­‐fan-­‐of-­‐big-­‐data/  
  • 3. 09/05/14 pag. 3 Motivation •  Volume:Gigabyte(109), Terabyte(1012), Petabyte(1015), Exabyte(1018), Zettabytes(1021) •  Variety: –  structured, semi-structured, unstructured; –  text, image, audio, video, record •  Velocity: dynamic, sometimes time-varying Big  Data  refers  to  datasets  that  grow  so  large  that  it  is  difficult  to  capture,  store,   manage,  share,  analyze  and  visualize  with  the  typical  database  soFware  tools.    
  • 4. 09/05/14 pag. 4 The social layer in an interconnected world 2+   billion   people  on   the  Web   by  end   2011     30  billion  RFID   tags  today    (1.3B  in  2005)   4.6   billion   camera   phones   world  wide   100s  of   millions   of  GPS   enabled   devices  sold   annually   76  million  smart   meters  in  2009…    200M  by  2014     12+ TBs of tweet data every day 25+ TBs of log data every day ?  TBs  of   data  every  day  
  • 5. 09/05/14 pag. 5 Motivation Raw  data  has  no  value  in  itself,  only  the  extracted  informaIon  has  value.       Src  :   h"p://www.sas.com/knowledge-­‐exchange/business-­‐analyIcs/featured/big-­‐data-­‐drives-­‐performance-­‐%E2%80%93-­‐if-­‐ you-­‐can-­‐exploit-­‐the-­‐informaIon-­‐overload/index.html  
  • 6. 09/05/14 pag. 6 Information overload •  Refers to the danger of getting lost in data, which may be: –  irrelevant  to  the  current  task  at  hand,   –  processed  in  an  inappropriate  way,  or   –  presented  in  an  inappropriate  way.   Src:  h"p://supermarketpeople.co.uk/archives/713  
  • 7. 09/05/14 pag. 7 Visual Analytics - Mastering the Information Age h"ps://www.youtube.com/watch?v=5i3xbitEVfs    
  • 8. 09/05/14 pag. 8 Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces. Definition
  • 9. 09/05/14 pag. 9 Whom does it matter •  Research Community J •  Business Community - New tools, new capabilities, new infrastructure, new business models etc. Financial  Services...  
  • 10. 09/05/14 pag. 10 Visual analytics is a multidisciplinary field
  • 11. 09/05/14 pag. 11 History
  • 12. 09/05/14 pag. 12 History discovery tools •  Shneiderman (IV ‘02) suggests combining computational analysis approaches such as data mining with information visualization –  Too often viewed as competitors in past –  Instead, can complement each other –  Each has something valuable to contribute •  Alternatives –  Issues influencing the design of discovery tools: •  Statistical algorithms vs. visual data presentation •  Hypothesis testing vs. exploratory data analysis –  Each has Pro’s and Con’s
  • 13. 09/05/14 pag. 13 Hypothesis testing & exploratory data analysis •  Hypothesis testing –  Advocates: •  By stating hypotheses up front, limit variables and sharpens thinking, more precise measurement –  Critics: •  Too far from reality, initial hypotheses bias toward finding evidence to support it •  Exploratory Data Analysis –  Advocates: •  Find the interesting things this way, we now have computational capabilities to do them –  Skeptics: •  Not generalizable, everything is a special case, detecting statistical relationships does not infer cause and effect
  • 14. 09/05/14 pag. 14 Recommendations •  Integrate data mining and information visualization •  Allow users to specify what they are seeking •  Recognize that users are situated in a social context •  Respect human responsibility
  • 15. 09/05/14 pag. 15 History •  Information visualization systems inadequately supported decision making (Amar & Stasko IV ‘04): –  Limited affordances –  Predetermined representations –  Decline of determinism in decision-Making •  “Representational primacy” versus “Analytic primacy” –  Telling truth about data vs. providing analytically useful visualizations
  • 16. 09/05/14 pag. 16 Representational primacy •  Pursuit of faithful data replication and comprehension •  Above all else, we must show the data •  Can be a limiting notion •  May focus on low level tasks not relevant to users •  Some data is hard to represent faithfully (think high dimensional data)
  • 17. 09/05/14 pag. 17 Gaps between representation and analysis •  When explanatory or correlative models are the desired outcome, model visualization may be more important than data visualization •  One end is to build black boxes where data goes in and the answer comes out •  Not a good idea to trust black boxes –  What  can  go  wrong  with  this?     –  White  box  approach  be"er.  
  • 18. 09/05/14 pag. 18 Application of Precepts Worldview-based tasks (“Did we show the right thing to the user?”) –  Determine Domain Parameters –  Expose Multivariate Explanation –  Facilitate Hypothesis Testing Rationale-based tasks (“Will the user believe what she sees?”) -  Expose  Uncertainty     -  ConcreIze  RelaIonships     -  Expose  Cause  and  Effect     Amar  &  Stasko  2005  
  • 19. 09/05/14 pag. 19 Task level emphases •  Don’t just help “low-level” tasks –  Find, filter, correlate, etc. •  Facilitate analytical thinking –  Complex decision-making, especially under uncertainty –  Learning a domain –  Identifying the nature of trends –  Predicting the future
  • 20. 09/05/14 pag. 20 History:  coining  the  term •  2003-04 Jim Thomas of PNNL, together with colleagues, develops notion of visual analytics –  Holds workshops at PNNL and at InfoVis ‘04 to help define a research agenda •  Agenda is formalized in book, Illuminating the Path –  Available online, http://nvac.pnl.gov/ •  People use visual analytics tools and techniques to –  Synthesize information and derive insight from massive, dynamic, ambiguous, and often conflicting data –  Detect the expected and discover the unexpected –  Provide timely, defensible, and understandable assessments –  Communicate assessment effectively for action. Src: Stasko
  • 21. 09/05/14 pag. 21 Visual  AnalyMcs   •  Not really an “area” per se –  More of an “umbrella” notion –  Combines multiple areas or disciplines –  Ultimately about using data to improve our knowledge and help make decisions •  Main Components: Src: Stasko
  • 22. 09/05/14 pag. 22 Visual Analytics: alternate Definition Visual analytics combines automated analysis techniques with interactive visualizations for an effective understanding, reasoning and decision making on the basis of very large and complex data sets Keim et al, chapter in Information Visualization: Human-Centered Issues and Perspectives, 2008
  • 23. 09/05/14 pag. 23 In-Spire Video h"ps://www.youtube.com/watch?v=YONTBZaxz8g    
  • 24. 09/05/14 pag. 24 Synergy •  Combine strengths of both human and electronic data processing –  Gives a semi-automated analytical process –  Use strengths from each –  Below from Keim, 2008
  • 25. 09/05/14 pag. 25 InfoVis Comparison •  Clearly much overlap •  Perhaps fair to say that infovis hasn’t always focused on analysis tasks so much and that it doesn’t always include advanced data analysis algorithms –  Not a criticism, just not focus –  InfoVis has a more narrow scope Src: Stasko
  • 26. 09/05/14 pag. 26 Visual Analytics •  Encompassing, integrated approach to data analysis –  Use computational algorithms where helpful –  Use human-directed visual exploration where helpful –  Not just “Apply A, then apply B” though –  Integrate the two tightly Stasko, 2013 Src: Stasko
  • 27. 09/05/14 pag. 27 Visual analytics aims at making data and information processing transparent just as information visualization has changed our view on databases, the goal of visual analytics is to make our way of processing data and information transparent for an analytic discourse
  • 28. 09/05/14 pag. 28 The visual analytics process The  visual  analyIcs  process  is  characterized  through  interacIon  between  data,  visualizaIons,   models  about  the  data,  and  the  users  in  order  to  discover  knowledge.    
  • 29. 09/05/14 pag. 29 Information seeking mantra for visual analytics applications Overview first, zoom/filter, details on demand Analyze first, show the important, zoom/filter, analyze further, details on demand. Keim  et  al.  2006  
  • 30. 09/05/14 pag. 30 Applications
  • 31. 09/05/14 pag. 31
  • 32. 09/05/14 pag. 32 UTOPIAN: user-driven topic modeling     h"ps://www.youtube.com/watch?v=du6_s6hcaRA&feature=youtu.be  
  • 33. 09/05/14 pag. 33 SketchPadN-D: WYDIWYG Sculpting and Editing in High-Dimensional Space     h"ps://www.youtube.com/watch?v=ar8kAdAfx6w  
  • 34. 09/05/14 pag. 34 Some features
  • 35. 09/05/14 pag. 35 Relationship discovery Scale independent representations, whole and parts at same time at multiple levels of abstraction, often linked
  • 36. 09/05/14 pag. 36 Relationship discovery Explore high dimensional relationships, theme groupings, outlier detection, searching by proximity at multiple scales
  • 37. 09/05/14 pag. 37 Combined exploratory and confirmatory analytics •  Develop and refine hypothesis •  Evidence collection, management, and matching to hypothesis •  Tailor views/displays for thematic/hypothesis focus of interest •  Often suggestive of predictions enabling proactive thinking
  • 38. 09/05/14 pag. 38 Multiple data types •  Supports multiple data types: structured/unstructured text •  Imagery/video, cyber •  Systems of either data type or application specific
  • 39. 09/05/14 pag. 39 Temporal views and interactions •  Most analytics situations involve time, pace, velocity •  Group segments of thoughts by time •  Compare time segments •  Often combined with geospatial
  • 40. 09/05/14 pag. 40 Reasoning workspace Stu Card, PARC
  • 41. 09/05/14 pag. 41 Grouping and outlier detection •  Form groups of thought/data •  Labels and annotation •  Compare groupings •  Find small groups or outliers 41
  • 42. 09/05/14 pag. 42 Labeling •  Critically important, •  Dynamic in scope, number labels, size, color •  Positioning •  Almost everything has labels •  Labels tell semantic meaning
  • 43. 09/05/14 pag. 43 Multiple linked views Temporal, geospatial, theme, cluster, list views with association linkages between views
  • 44. 09/05/14 pag. 44 Challenges
  • 45. 09/05/14 pag. 45 Challenge 1: scalability •  Challenge with regard to both: –  visual  representaIons     –  automaIc  analysis   •  Requirements/needs: –  SoluIon  needs  to  scale  in  size,  dimensionality,  data  types,  levels  of  quality.   –  Methods  are  needed  to  deal  with  input  data  that  is  noisy  and  conInuous.   –  Pa"erns  and  relaIonships  need  to  be  visualized  on  different  levels  of   details,  and  with  appropriate  levels  of  data  and  visual  abstracIon.  
  • 46. 09/05/14 pag. 46 Challenge 2: uncertainty •  Challenge: –  large  amount  of  noise  and  missing  values  from  heterogeneous  data  sources     –  bias  introduced  by  automaIc  analysis  methods  as  well  as  human  percepIon   •  Requirements/needs: –  representaIon  of  the  noIon  of  data  quality  and  the  confidence     –  analysts  need  to  be  aware  of  the  uncertainty  and  be  able  to  analyze  quality  
  • 47. 09/05/14 pag. 47 Challenge 3: text data stream •  Analysis and visualization of text steams is still a relatively new field. •  Text stream data often –  have  li"le  structure,  grammar  and  context,   –  are  provided  as  high-­‐frequency  mulIlingual  stream,     –  contain  a  high  percentage  of  non-­‐meaningful  and  irrelevant  messages.     •  The challenge is to handle the dynamic nature of the stream data and the low-tolerance of monitoring delay in emergency cases.
  • 48. 09/05/14 pag. 48 Challenge 4: interaction •  Novel interaction techniques and tangible user-interfaces for seamless intuitive visual communication with the system. •  The analyst should be able to –  fully  focus  on  the  task  at  hand  and     –  not  be  distracted  by  overly  technical  or  complex  user  interfaces  and  interacIons.     •  User feedback should be taken as intelligently as possible, requiring as little user input as possible.
  • 49. 09/05/14 pag. 49 Challenge 5: evaluation •  Challenge: –  hard  to  assess  the  quality  of  visual  analyIcs  soluIons   –  due  to  the  interdisciplinary  nature  and  complex  process   •  Needs: –  evaluaIon  framework  to  assess  effecIveness,  efficiency  and  acceptance   –  of  new  visual  analyIcs  techniques,  methods,  and  models  
  • 50. 09/05/14 pag. 50 Challenge 6: infrastructure •  Challenge: –  most  soluIons  develop  their  own  infrastructures     –  mismatch  between  the  level  of  service  provided  and  real  needs:   •  fast  and  precise  answers  with  progressive  refinement,     •  incremental  re-­‐computaIon,     •  steering  the  computaIon  towards  data  regions  that  are  of  interest.     •  Requirements/needs: –  infrastructure  to  bind  together  all  the  processes,  funcIons,  and  services     –  repositories  of  available  visual  analyIcs  soluIons  
  • 51. 09/05/14 pag. 51 h"p://www.visual-­‐analyIcs.eu/    
  • 52. 09/05/14 pag. 52 Questions?
  • 53. 09/05/14 pag. 53 Readings •  Ch. 1-2 h"p://www.vismaster.eu/wp-­‐content/uploads/2010/11/VisMaster-­‐book-­‐lowres.pdf    
  • 54. 09/05/14 pag. 54 References •  Amar, R. A., & Stasko, J. T. (2005). Knowledge precepts for design and evaluation of information visualizations. Visualization and Computer Graphics, IEEE Transactions on, 11(4), 432-442. •  D. A. Keim, F. Mansmann, J. Schneidewind, and H. Ziegler. Challenges in visual data analysis. In Information Visualization (IV 2006), Invited Paper, July 5-7, London, United Kingdom. IEEE Press, 2006. •  Daniel Keim, Gennady Andrienko, Jean-Daniel Fekete, Carsten Görg, Jörn Kohlhammer, and Guy Melancon. 2008. Visual Analytics: Definition, Process, and Challenges. In Information Visualization, Andreas Kerren, John T. Stasko, Jean-Daniel Fekete, and Chris North (Eds.). Lecture Notes In Computer Science, Vol. 4950. Springer-Verlag, Berlin, Heidelberg 154-175 •  Shneiderman, B. (2002) Inventing discovery tools: combining information visualization with data mining1. Information visualization, 1(1), 5-12.