Info vis 12-2012-v17-shneiderman

Information visualization review: Visual Analytics for Association of University Centers on Disabilities (AUCD) conference

  • "The IN Cell Analyzer automated microscope was used to identify proteins influencing the division of human cells. After the images were analyzed, quantitative results were transferred to Spotfire DecisionSite. This screen revealed the previously unknown involvement of the retinol binding protein RBP1 in cell cycle control. (Stubbs S, & Thomas N. 2006 Methods in Enzymology; 414:1-21.) Retinol, a form of vitamin A, plays a crucial role in vision and during embryonic development."
  • Contrast and Creatinine dataset: In some diagnostic radiology procedures, patients are injected with contrast material. However, some patients develop adverse side effects to the contrast material. One serious side effect is renal failure, which is detected by high creatinine levels in a patient's blood. This adverse effect usually occurs within two weeks after the radiology contrast. WHC is interested in finding the proportion of patients who exhibit this condition in historical records.
    Screenshots:
    1-aligned-ranked.png: We align by the first occurrence of radiology contrast and rank by the number of high-creatinine (CREAT-H) events to bring the most severe patients to the top. We notice two things: (1) some patients have more than one "Radiology Contrast" event, and (2) some patients have consistently high creatinine readings (chronic kidney failure).
    2-aligned(all)-distribution-selected.png: We align by all occurrences of radiology contrast, and then show the temporal summary of CREAT-H events. The patients are presented in four mutually exclusive sets in the summary: those who have CREAT-H only before alignment, only after alignment, both before and after, and neither. We then select from the "only after" summary the patients who have at least one CREAT-H event within two weeks of any "Radiology Contrast" event. There are 421 patients.
  • Using LifeFlow, 7,041 patients are aggregated into this visualization, and LifeFlow immediately reveals the most common pattern, which you could not easily do in SQL. You can easily notice this huge pattern, “Arrival -> ER -> Exit”, meaning patients who visited with minor injuries or simple conditions and left the hospital immediately after receiving their treatment. When you hover the mouse over a pattern, LifeFlow displays a tooltip that gives more information, such as the number of patients and other statistics, and also shows the distribution of the patients. As the horizontal gap represents time, you can see from the distribution that some patients left the hospital very quickly after visiting the emergency room while others stayed longer. *optional The second most common pattern is “Arrival (Blue) -> ER (Pink) -> Floor (Green) -> Exit (Cyan)”, meaning patients who were admitted for observation and, once everything went well, left the hospital. You can also use the horizontal gap to compare these patients with the patients who exited from the emergency room. Comparing the gap from pink to cyan with the gap from pink to green, you can see that the gap from pink to green is smaller, so on average patients were transferred to Floor faster than they exited the hospital. You have seen the two most common cases; now I will remove the common patterns so we can analyze the less frequent patterns.
  • After removing all the common cases, we have 344 patients left. These are mostly the patients who were admitted. There is a lot of information that I could explain from this visualization, but I will go straight to the case that our physician partners are most interested in. The mouse is pointing at this sequence, which represents the “bounce-back” patients, meaning patients who were transferred from ICU to Floor because they seemed to be getting better but were then transferred back to the ICU. The physicians are interested in finding these patients to analyze what led to the wrong decision. *optional Another case is the step-ups, meaning patients whose level of care was escalated to a higher level. You can see from the visualization that there were patients who were transferred from ER to Floor (green) to ICU (red) and IMC (orange). The number of these patients and the average transfer time could be compared to the hospital standards to measure the quality of care.
  • Ben: This slide is optional. You can use it to show that when you click on the bounce-back patients, you can get the details of each patient in the LifeLines2 view.
  • Another interesting feature is that you can align by a particular event. For example, if you want to know what happened before and after the patients went to the ICU, you can align by ICU. The dashed line separates what happened before from what happened after. You can see that the ICU patients mostly came from the ER (pink), and most of them were transferred to Floor (green) after that. Unfortunately, some of them died after they were transferred to the ICU (black). From this visualization, you may notice a small pattern at the bottom. Let me zoom in.
  • So this patient died before being transferred to the ICU, which is impossible. Of course, this must be a data entry problem, but we might never notice it if the data stayed hidden in the database. LifeFlow supports this kind of analysis by giving an overview, showing common trends, and providing a summary of every sequence. You could write SQL and calculate the average for every transfer if you like, but in LifeFlow it is right there; you just need to move your mouse over it. By showing every possible transfer pattern, LifeFlow may lead you to the discovery of a surprising pattern.
  • Live Demonstration
  • Aligning sales and marketing is essential for success. The graph on the left shows sales people linked to opportunities, including industry. The thicker the line, the higher the probability of closing the deal. The larger the dollar sign, the bigger the deal. Sullivan, Vazquez and Distefano are performing the best. The upper right shows the number of deals by stage in the sales cycle. The blue bubble chart shows potential revenue by marketing program and stage in the sales cycle. Search engine optimization and inbound links from Web sites have the biggest impact. Armed with this information, marketing managers can advertise to the financial services and manufacturing sectors through specific tactics, and sales managers can see the performance of the reps and the industries where they are successful.
  • Chapter 3, Figure 1 (page 6). A NodeXL social media network diagram of relationships among Twitter users mentioning the hashtag “#WIN09” used by attendees of a conference on Network Science at NYU in September 2009. Each user’s node is sized proportional to the number of tweets they have ever made to that date.
  • Figure 1. (a) Harel-Koren (HK) fast multi-scale layout of a clustered network of Twitter users, using color to differentiate among the vertices in different clusters. The layout produces a visualization with overlapping cluster positions. (b) Group-in-a-Box (GIB) layout of the same Twitter network: clusters are distributed in a treemap structure that partitions the drawing canvas based on the size of the clusters and the properties of the rendered layout. Inside each box, clusters are rendered with the HK layout.
  • Figure 3. The 2007 U.S. Senate co-voting network graph, visualized with the GIB layout. The group in each box represents senators from a given U.S. region (1: South; 2: Midwest; 3: Northeast; 4: Mountain; 5: Pacific) and individual groups are displayed using the FR layout. Vertex colors represent the senators’ party affiliations (blue: Democrats; red: Republicans; orange: Independent) and vertex size is proportional to betweenness centrality. Edges represent percentage of agreement between senators: (a) above 50%; (b) above 90%.
  • Figure 13.20. NodeXL cluster visualization showing three Flickr tag clusters, each representing a different context for “mouse”. Figure 13.21. NodeXL display of isolated clusters for three different contexts for the “mouse” tag in Flickr: mouse animal, computer mouse, and Mickey Mouse Disney character.
  • Chapter 3, Figure 1 (page 6). A NodeXL social media network diagram of relationships among Twitter users mentioning the hashtag “#WIN09” used by attendees of a conference on Network Science at NYU in September 2009. Each user’s node is sized proportional to the number of tweets they have ever made to that date.
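
The align, rank, and filter steps described in the Contrast and Creatinine note above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the LifeLines2 implementation: the patient records, the function names, and the rule that "before" and "after" are judged against the first contrast event are all hypothetical choices made for the example.

```python
from datetime import date, timedelta

CONTRAST = "Radiology Contrast"
CREAT_H = "CREAT-H"

# Hypothetical records (not WHC data): patient id -> [(date, event type)].
# Assumption: every patient has at least one contrast event.
records = {
    "p1": [(date(2008, 1, 1), CONTRAST), (date(2008, 1, 9), CREAT_H)],
    "p2": [(date(2008, 1, 2), CREAT_H), (date(2008, 1, 5), CONTRAST),
           (date(2008, 1, 7), CREAT_H), (date(2008, 1, 20), CREAT_H)],
    "p3": [(date(2008, 2, 1), CONTRAST), (date(2008, 3, 15), CREAT_H)],
    "p4": [(date(2008, 2, 3), CONTRAST)],
}


def first_contrast(events):
    """Date of the patient's first contrast event (the alignment point)."""
    return min(d for d, kind in events if kind == CONTRAST)


def align(events):
    """Re-express each event as a day offset from the first contrast event."""
    anchor = first_contrast(events)
    return [((d - anchor).days, kind) for d, kind in sorted(events)]


def rank(recs):
    """Order patient ids by number of CREAT-H events, most events first."""
    def n_high(pid):
        return sum(kind == CREAT_H for _, kind in recs[pid])
    return sorted(recs, key=n_high, reverse=True)


def summary_set(events):
    """Place a patient in one of the four exclusive temporal-summary sets."""
    anchor = first_contrast(events)
    before = any(kind == CREAT_H and d < anchor for d, kind in events)
    after = any(kind == CREAT_H and d > anchor for d, kind in events)
    if before and after:
        return "both"
    if before:
        return "only before"
    if after:
        return "only after"
    return "neither"


def high_within(events, days=14):
    """True if any CREAT-H event follows any contrast event within `days`."""
    contrasts = [d for d, kind in events if kind == CONTRAST]
    highs = [d for d, kind in events if kind == CREAT_H]
    window = timedelta(days=days)
    return any(timedelta(0) < h - c <= window for c in contrasts for h in highs)


ranked = rank(records)
selected = [pid for pid in ranked
            if summary_set(records[pid]) == "only after"
            and high_within(records[pid])]
print(ranked)
print(selected)
```

Dividing the number of selected patients by the number of patients who received contrast would give the proportion the note says WHC is interested in.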

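The LifeFlow notes above describe aggregating many patient records into a single view that shows each sequence's patient count and the time between events. One plausible way to sketch that aggregation is a prefix tree of event sequences, shown below. The `Node` class, the sample records, and the use of hours as the time unit are assumptions made for illustration; this is not LifeFlow's actual code.

```python
class Node:
    """One step in a shared event sequence, e.g. Arrival -> ER -> Exit."""

    def __init__(self, event):
        self.event = event
        self.count = 0        # number of records passing through this node
        self.total_gap = 0.0  # summed time since the previous event
        self.children = {}

    @property
    def mean_gap(self):
        return self.total_gap / self.count if self.count else 0.0


def aggregate(records):
    """Merge records that share a prefix into one tree of event sequences."""
    root = Node("root")
    for events in records:
        root.count += 1
        node, prev_time = root, None
        for time, event in sorted(events):
            child = node.children.setdefault(event, Node(event))
            child.count += 1
            if prev_time is not None:
                child.total_gap += time - prev_time
            node, prev_time = child, time
    return root


def walk(node, path=()):
    """Yield (sequence, count, mean gap), most common branches first."""
    for child in sorted(node.children.values(), key=lambda n: -n.count):
        here = path + (child.event,)
        yield here, child.count, child.mean_gap
        yield from walk(child, here)


# Hypothetical records: lists of (hours since arrival, event).
records = [
    [(0, "Arrival"), (1, "ER"), (3, "Exit")],
    [(0, "Arrival"), (2, "ER"), (6, "Exit")],
    [(0, "Arrival"), (1, "ER"), (4, "Floor"), (30, "Exit")],
    [(0, "Arrival"), (1, "ER"), (5, "ICU"), (20, "Floor"), (26, "ICU")],
]

root = aggregate(records)
for sequence, count, gap in walk(root):
    print(" -> ".join(sequence), count, gap)
```

In this sketch the last record is a "bounce-back" (ICU to Floor and back to ICU); it appears as its own branch of the tree, which is how a rare pattern stays visible next to the common ones.
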
Transcript

  • 1. Visual Analytics: New Tools for Gaining Insight from Your Data Ben Shneiderman ben@cs.umd.edu Founding Director (1983-2000), Human-Computer Interaction Lab Professor, Department of Computer Science Member, Institute for Advanced Computer Studies University of Maryland College Park, MD 20742
  • 2. Visual Analytics: New Tools for Gaining Insight from Your Data Ben Shneiderman ben@cs.umd.edu Twitter: @benbendc University of Maryland College Park, MD 20742
  • 3. Interdisciplinary research community - Computer Science & Info Studies - Psych, Socio, Poli Sci & MITH (www.cs.umd.edu/hcil)
  • 4. Design Issues • Input devices & strategies • Keyboards, pointing devices, voice • Direct manipulation • Menus, forms, commands • Output devices & formats • Screens, windows, color, sound • Text, tables, graphics • Instructions, messages, help • Collaboration & Social Media • Help, tutorials, training • Search • Visualization • www.awl.com/DTUI Fifth Edition: 2010
  • 5. HCI Pride: Serving 5B Users • Mobile, desktop, web, cloud • Diverse users: novice/expert, young/old, literate/illiterate, abled/disabled, cultural, ethnic & linguistic diversity, gender, personality, skills, motivation, ... • Diverse applications: E-commerce, law, health/wellness, education, creative arts, community relationships, politics, IT4ID, policy negotiation, mediation, peace studies, ... • Diverse interfaces: Ubiquitous, pervasive, embedded, tangible, invisible, multimodal, immersive/augmented/virtual, ambient, social, affective, empathic, persuasive, ...
  • 6. Workshop Overview Wordle.net
  • 7. Information Visualization • Visual bandwidth is enormous • Human perceptual skills are remarkable • Trend, cluster, gap, outlier... • Color, size, shape, proximity... • Three challenges • Meaningful visual displays of massive data • Interaction: widgets & window coordination • Process models for discovery
  • 8. Information Visualization & Visual Analytics • Visual bands • Human percle • Trend, clus.. • Color, size,.. • Three challe • Meaningful vi • Interaction: w • Process mo 1999
  • 9. Information Visualization & Visual Analytics • Visual bandwidth is enormous • Human perceptual skills are remarkable • Trend, cluster, gap, outlier... • Color, size, shape, proximity... • Three challenges • Meaningful visual displays of massive da • Interaction: widgets & window coordinati • Process models for discovery 1999 2004
  • 10. Information Visualization & Visual Analytics • Visual bandwidth is enormous • Human perceptual skills are remarkable • Trend, cluster, gap, outlier... • Color, size, shape, proximity... • Three challenges • Meaningful visual displays of massive data • Interaction: widgets & window coordination • Process models for discovery 1999 2004 2010
  • 11. Business takes action • General Dynamics buys MayaViz • Agilent buys GeneSpring • Google buys Gapminder • Oracle buys Hyperion • Microsoft buys Proclarity • InfoBuilders buys Advizor Solutions • SAP buys (Business Objects buys Xcelsius & Inxight & Crystal Reports) • IBM buys (Cognos buys Celequest) & ILOG • TIBCO buys Spotfire
  • 12. Spotfire: Retinol’s role in embryos & vision
  • 13. Spotfire: DC natality data
  • 14. http://registration.spotfire.com/eval/default_edu.asp
  • 15. 10M - 100M pixels: Large displays
  • 16. 100M-pixels & more
  • 17. 1M-pixels & less Small mobile devices
  • 18. Information Visualization: Mantra • Overview, zoom & filter, details-on-demand • Overview, zoom & filter, details-on-demand • Overview, zoom & filter, details-on-demand • Overview, zoom & filter, details-on-demand • Overview, zoom & filter, details-on-demand • Overview, zoom & filter, details-on-demand • Overview, zoom & filter, details-on-demand • Overview, zoom & filter, details-on-demand • Overview, zoom & filter, details-on-demand • Overview, zoom & filter, details-on-demand
  • 19. Information Visualization: Data Types • (SciViz) • 1-D Linear: Document Lens, SeeSoft, Info Mural • 2-D Map: GIS, ArcView, PageMaker, Medical imagery • 3-D World: CAD, Medical, Molecules, Architecture • (InfoViz) • Multi-Var: Spotfire, Tableau, GGobi, TableLens, ParCoords • Temporal: LifeLines, TimeSearcher, Palantir, DataMontage • Tree: Cone/Cam/Hyperbolic, SpaceTree, Treemap • Network: Pajek, JUNG, UCINet, SocialAction, NodeXL • infosthetics.com flowingdata.com infovis.org www.infovis.net/index.php?lang=2
  • 20. ManyEyes: A web sharing platform www-958.ibm.com/
  • 21. Anscombe’s Quartet
            Set 1          Set 2          Set 3          Set 4
         x      y       x      y       x      y       x      y
      10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
       8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
      13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
       9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
      11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
      14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
       6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
       4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
      12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
       7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
       5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89
  • 22. Anscombe’s Quartet
            Set 1          Set 2          Set 3          Set 4
         x      y       x      y       x      y       x      y
      10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
       8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
      13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
       9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
      11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
      14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
       6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
       4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
      12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
       7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
       5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89

    Property             Value
    Mean of x            9.0
    Variance of x        11.0
    Mean of y            7.5
    Variance of y        4.12
    Correlation          0.816
    Linear regression    y = 3 + 0.5x
  • 23. Anscombe’s Quartet
  • 24. Temporal Data: TimeSearcher 1.3 • Time series • Stocks • Weather • Genes • User-specified patterns • Rapid search
  • 25. Temporal Data: TimeSearcher 2.0 • Long time series (>10,000 time points) • Multiple variables • Controlled precision in match (linear, offset, noise, amplitude)
  • 26. LifeLines: Patient Histories www.cs.umd.edu/hcil/lifelines
  • 27. LifeLines2: Contrast+Creatinine
  • 28. LifeLines2: Align-Rank-Filter & Summarize
  • 29. LifeFlow: Aggregation Strategy Temporal Categorical Data (4 records) LifeLines2 format Tree of Event Sequences LifeFlow Aggregation www.cs.umd.edu/hcil/lifeflow
  • 30. LifeFlow: Interface with User Controls
  • 31. Treemap: Gene Ontology • + Space filling • + Space limited • + Color coding • + Size coding • - Requires learning • (Shneiderman, ACM Trans. on Graphics, 1992 & 2003) www.cs.umd.edu/hcil/treemap/
  • 32. Treemap: Smartmoney MarketMap www.smartmoney.com/marketmap
  • 33. Market falls steeply Feb 27, 2007, with one exception
  • 34. Market falls steeply Sept 22, 2011, some exceptions
  • 35. Market mixed, February 8, 2008: Energy & Technology up, Financial & Health Care down
  • 36. Market rises, September 1, 2010, Gold contrarians
  • 37. Market rises, March 21, 2011, Sprint declines
  • 38. Treemap: Newsmap (Marcos Weskamp) newsmap.jp
  • 39. Treemap: WHC Emergency Room (6304 patients in Jan 2006) • Group by Admissions/MF, size by service time, color by age
  • 40. Treemap: WHC Emergency Room (6304 patients in Jan 2006) (only those with service time > 12 hours) • Group by Admissions/MF, size by service time, color by age
  • 41. Treemap: Supply Chain www.hivegroup.com
  • 42. Treemap: Nutritional Analysis www.hivegroup.com
  • 43. Treemap: Spotfire Bond Portfolio Analysis www.spotfire.com
  • 44. Treemap: NY Times – Car&Truck Sales www.cs.umd.edu/hcil/treemap/
  • 45. Treemap (Voronoi): NY Times - Inflation www.nytimes.com/interactive/2008/05/03/business/20080403_SPENDING_GRAPHIC.html
  • 46. VisualComplexity.com : Manuel Lima
  • 47. Discovery Process: Systematic Yet Flexible Preparation • Own the problem & define the schedule • Data cleaning & conditioning • Handle missing & uncertain data • Extract subsets & link to related information
  • 48. SocialAction • Integrates statistics & visualization • 4 case studies, 4-8 weeks (journalist, bibliometrician, terrorist analyst, organizational analyst) • Identified desired features, gave strong positive feedback about benefits of integration • www.cs.umd.edu/hcil/socialaction • Perer & Shneiderman, CHI2008, IEEE CG&A 2009
  • 49. Network from Database Tables www.centrifugesystems.com
  • 50. NodeXL: Network Overview for Discovery & Exploration in Excel www.codeplex.com/nodexl
  • 51. NodeXL: Network Overview for Discovery & Exploration in Excel www.codeplex.com/nodexl
  • 52. NodeXL: Import Dialogs www.codeplex.com/nodexl
  • 53. Tweets at #WIN09 Conference: 2 groups
  • 54. ‘GOP’ tweets, clustered (red-Republicans)
  • 55. Twitter networks: #SOTU
  • 56. WWW2010 Twitter Community
  • 57. Twitter Network for “msrtf11 OR techfest ”
  • 58. Twitter Network for “msrtf11 OR techfest ”
  • 59. Twitter Network for “SpaceX ”
  • 60. Twitter Network for “TTW”
  • 61. Twitter Network for #CI2012
  • 62. Innovation Clusters: People, Locations, Companies • 11,000 nodes, 26,000 links • Cluster labels: No Location, Philadelphia, Pharmaceutical/Medical, Pittsburgh Metro, Westinghouse Electric
  • 63. Innovation Clusters: People, Locations, Companies • Cluster labels: No Location, Philadelphia, Pharmaceutical/Medical, Pittsburgh Metro, Westinghouse Electric
  • 64. Innovation Clusters: People, Locations, Companies • Cluster labels: No Location, Philadelphia, Pharmaceutical/Medical, Pittsburgh Metro, Westinghouse Electric • Legend: Patent Tech Navy SBIR (federal) PA DCED (state) Related patent • 2: Federal agency • 3: Enterprise • 5: Inventors • 9: Universities • 10: PA DCED • 11/12: Phil/Pitt metro cnty • 13-15: Semi-rural/rural cnty • 17: Foreign countries • 19: Other states
  • 65. CHI2010 Twitter Community www.codeplex.com/nodexl/
  • 66. Flickr clusters for “mouse” Computer Mickey Animal
  • 67. Flickr networks
  • 68. Analyzing Social Media Networks with NodeXL • I. Getting Started with Analyzing Social Media Networks: 1. Introduction to Social Media and Social Networks; 2. Social Media: New Technologies of Collaboration; 3. Social Network Analysis • II. NodeXL Tutorial: Learning by Doing: 4. Layout, Visual Design & Labeling; 5. Calculating & Visualizing Network Metrics; 6. Preparing Data & Filtering; 7. Clustering & Grouping • III. Social Media Network Analysis Case Studies: 8. Email; 9. Threaded Networks; 10. Twitter; 11. Facebook; 12. WWW; 13. Flickr; 14. YouTube; 15. Wiki Networks • www.elsevier.com/wps/find/bookdescription.cws_home/723354/description
  • 69. Social Media Research Foundation • Researchers who want to: create open tools, generate & host open data, support open scholarship • Map, measure & understand social media • Support tool projects to collect, analyze & visualize social media data • smrfoundation.org
  • 70. Sense-Making Loop Thomas & Cook: Illuminating the Path (2004)
  • 71. Sense-Making Loop: Expanded Thomas & Cook: Illuminating the Path (2004)
  • 72. Discovery Process: Systematic Yet Flexible Preparation • Own the problem & define the schedule • Data cleaning & conditioning • Handle missing & uncertain data • Extract subsets & link to related information
  • 73. Discovery Process: Systematic Yet Flexible Preparation • Own the problem & define the schedule • Data cleaning & conditioning • Handle missing & uncertain data • Extract subsets & link to related information Purposeful exploration – Hypothesis testing • Range & distribution • Relationships & correlations • Clusters & gaps • Outliers & anomalies • Aggregation & summary • Split & trellis • Temporal comparisons & multiple views • Statistics & forecasts
  • 74. Discovery Process: Systematic Yet Flexible Preparation • Own the problem & define the schedule • Data cleaning & conditioning • Handle missing & uncertain data • Extract subsets & link to related information Purposeful exploration – Hypothesis testing • Range & distribution • Relationships & correlations • Clusters & gaps • Outliers & anomalies • Aggregation & summary • Split & trellis • Temporal comparisons & multiple views • Statistics & forecasts Situated decision making - Social context • Annotation & marking • Collaboration & coordination • Decisions & presentations
  • 75. UN Millennium Development Goals To be achieved by 2015 • Eradicate extreme poverty and hunger • Achieve universal primary education • Promote gender equality and empower women • Reduce child mortality • Improve maternal health • Combat HIV/AIDS, malaria and other diseases • Ensure environmental sustainability • Develop a global partnership for development
  • 76. 30th Anniversary Symposium May 22-23, 2013 www.cs.umd.edu/hcil
  • 77. For More Information • Visit the HCIL website for 650 papers & info on videos: www.cs.umd.edu/hcil • Conferences & resources: www.infovis.org • See Chapter 14 on Info Visualization: Shneiderman, B. and Plaisant, C., Designing the User Interface: Strategies for Effective Human-Computer Interaction: Fifth Edition (2010) www.awl.com/DTUI • Edited Collections: Card, S., Mackinlay, J., and Shneiderman, B. (1999) Readings in Information Visualization: Using Vision to Think; Bederson, B. and Shneiderman, B. (2003) The Craft of Information Visualization: Readings and Reflections
  • 78. For More Information • Treemaps: HiveGroup: www.hivegroup.com; Smartmoney: www.smartmoney.com/marketmap; HCIL Treemap 4.0: www.cs.umd.edu/hcil/treemap • Spotfire: www.spotfire.com • TimeSearcher: www.cs.umd.edu/hcil/timesearcher • NodeXL: nodexl.codeplex.com • Hierarchical Clustering Explorer: www.cs.umd.edu/hcil/hce • LifeLines2: www.cs.umd.edu/hcil/lifelines2 • Similan: www.cs.umd.edu/hcil/similan
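
The Anscombe's Quartet slides (21-23) list four data sets whose summary statistics nearly match even though their plots look very different. The short Python sketch below recomputes those statistics from the values on the slides. The function and variable names are illustrative choices, and the variance is assumed to be the sample variance (dividing by n - 1), which is what reproduces the 11.0 shown for x.

```python
from math import sqrt

# x values shared by sets 1-3, as listed on the slide.
X123 = [10.0, 8.0, 13.0, 9.0, 11.0, 14.0, 6.0, 4.0, 12.0, 7.0, 5.0]

QUARTET = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8.0] * 7 + [19.0] + [8.0] * 3,
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Mean, sample variance, Pearson correlation, and least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "corr": sxy / sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


summaries = {k: summarize(xs, ys) for k, (xs, ys) in QUARTET.items()}
for k, s in summaries.items():
    print(k, {name: round(value, 3) for name, value in s.items()})
```

The four rows of output agree to about two decimal places, which is the point of the slides: the numbers alone do not distinguish the sets, so the data has to be plotted.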