Krist Wongsuphasawat's Dissertation Defense: Interactive Exploration of Temporal Event Sequences

Krist Wongsuphasawat's Dissertation Defense at the University of Maryland, College Park on April 10, 2012

Published in: Technology, Business

  1. 1. LIFE: event, event, event, …
  2. 2. LIFE: event, event, event, … Each event = ( Time, Event type ), e.g. ( 7:00 am, Wake up )
  3. 3. LIFE: event, event, event, … = “Event Sequence”
  4. 4. Daily Activities: 7:00 Wake up, 7:15 Shower, 8:00 Breakfast
  5. 5. Student Progress: Aug ’07 Enter, May ’09 Master, Apr ’12 Defense
  6. 6. Event Sequences: Medical, Transportation, Sports, Education, Web logs, Logistics, and more…
  7. 7. Two interesting problems
  8. 8. 1. Lack of overview. 60,041 patients; 203,214 traffic incidents; 7,022 web sessions; … and more. Where should I start? Is the dataset cleaned? Show overview or summary.
  9. 9. 2. Approximate search. QUERY: ICU, Floor, ICU within 2 days. RESULTS: Found 0 records. Frustrated! Find something useful and display.
  10. 10. Research Questions. Overview: How to provide an overview of multiple event sequences? (LifeFlow) Search: How to support users when they are uncertain about what they are looking for? (Similan, Flexible Temporal Search)
  11. 11. Outline: Introduction, LifeFlow Overview, Approximate Search, Case Studies, Conclusions. How to provide an overview of multiple event sequences?
  12. 12. From one event sequence... • Single record [Cousins91], [Harrison94], [Plaisant98], … Patient ID: 45851737: 12/02/2008 14:26 Arrival; 12/02/2008 14:26 Emergency; 12/02/2008 22:44 ICU; 12/05/2008 05:07 Floor; 12/08/2008 10:02 Floor; 12/14/2008 06:19 Discharge. Compact timeline for Patient #45851737: Arrival, Emergency Room, ICU, Floor, Discharge
  13. 13. To multiple event sequences... • Search [Fails06], [Wang08], [Vrotsou09], …
  14. 14. To multiple event sequences... • Search [Fails06], [Wang08], [Vrotsou09], … • Group [Phan07], [Burch08], [Wang09], …
  15. 15. but…
  16. 16. Summarize, e.g. 1) What happened to the patients after they arrived? (Arrival ? ?) 2) What happened to the patients before & after ICU? (? ? ICU ? ?)
  17. 17. Overview / Summary Millions of records!
  18. 18. Challenges: Squeeze millions of records into one screen (AGGREGATE). Preserve information!
  19. 19. #1 LifeFlow: a scalable & novel overview that summarizes all possible sequences & gaps between events!
  20. 20. Demo: LifeFlow Design
  21. 21. #1 Event Sequences: n records (#1, #2, #3, …, e.g. 1,000,000). Aggregate, O(n): Tree of Sequences, whose size is ∝ No. of patterns (e.g. 9 nodes). Represent: LifeFlow Visual Representation, a space-filling technique (axes: time, records; elements: Event Bar, Average time, End Node)
  22. 22. Demo: LifeFlow
  23. 23. User Study: 10 participants, 12-minute training, 15 tasks. Participants could perform the tasks accurately and rapidly.
  24. 24. Quotes: “Oh! This is very cool!” “The tool is easy to understand and easy to use.” “LifeFlow provides a great summary of the big picture.” “Very easy to find common and uncommon sequences!” “Can I use it with my dataset?”
  25. 25. wait for the case studies :)
  26. 26. Outline: Introduction, LifeFlow Overview, Approximate Search (Similarity Search, Hybrid Search), Case Studies, Conclusions. How to support users when they are uncertain about what they are looking for?
  27. 27. Related Work: Exact Match (a record MUST have A, B, C) • Event Sequence – TimeSearcher [Hochheiser04] – PatternFinder [Fails06] – LifeLines2 [Wang08] – ActiviTree [Vrotsou09] – QueryMarvel [Jin09] (figure: Query compared with Record #1, Record #2, Record #3)
  28. 28. Related Work: Similarity Search (a record SHOULD have A, B, C) • Image Similarity Search [Kato92] • Stock Price [Wattenberg01] • Web page [Watai07] • Bank account [Chang07] • Event Sequence? (figure: Query with results ranked from more similar: Record #2 0.91, Record #1 0.83, Record #3 0.70)
  29. 29. Challenges: What is similar? It depends on users/tasks. Compared with the Query, a record may match (Record #1), have a missing event (Record #2), have an extra event (Record #3), have a time difference (Record #4), or have swapped events (Record #5)
  30. 30. Match & Mismatch (M&M) Measure: events of the Query and a record are laid out along Time and split into Matched events, Missing events, and Extra events. Total Score (0.00-1.00) combines: Time difference, Number of swaps, Number of missing events, Number of extra events
  31. 31. #2 Similarity Search = Similarity Measure (Match & Mismatch: What is similar?) + User Interface (Similan: Specify query / Display results). Version 1, Version 2
  32. 32. Screenshot: Similan
  33. 33. Controlled Experiment: Exact Match (LifeLines2) vs. Similarity Search (Similan), 18 participants
  34. 34. Lessons. Exact Match: Counting, Confidence (accept / reject). Similarity Search: Similar, Flexible, Uncertainty
  35. 35. Combination: Exact Match + Similarity Search = Hybrid (accept / reject)
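  36. 36. #3 Flexible Temporal Search (FTS), “mandatory” constraints. Query: Constraint #1, Constraint #2, Constraint #3, each marked mandatory or optional. Each record gets PASS or FAIL per constraint; a FAIL on a mandatory constraint: Reject. Results: accept / reject
  37. 37. #3 Flexible Temporal Search (FTS), “optional” constraints. Query: Constraint #1, Constraint #2, Constraint #3, each marked mandatory or optional. Each record gets PASS or FAIL per constraint. Results: accept / reject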
  38. 38. Mandatory constraints • Event (A, B, C) • Timing (A at Aug 14, 2000) • Negation (A, C, with no B) • Gap (A to C: 1-2 days)
  39. 39. Optional constraints • Event (A, B, C) • Timing (A at Aug 14, 2000) • Negation (A, C, with no B) • Gap (A to C: 1-2 days)
  40. 40. FTS Matching (along Time). Query: A, B, C, D, E. Record #2: A, B, D, C
  41. 41. FTS Matching (2): Dynamic programming over a table s(i, j), starting from s(0,0), where i indexes the Query (A, B, C, D, E) and j indexes the events of Record #2 (A, B, D, C): s(i, j) = max { s(i-1, j) + skip( query[i] ), s(i, j-1) + skip( events[j] ), s(i-1, j-1) + match( query[i], events[j] ) }
  42. 42. Similarity Vector s(i,j) • No. of matched events (mandatory) • No. of matched events (optional) • No. of negations violated (optional) • No. of negations violated (mandatory) • No. of time constraints violated • Time difference • No. of extra events – Extra before the first match – Extra between the first and last match – Extra after the last match
  43. 43. Flexible Temporal Search (FTS): Query + Record #1 give a Grade (Pass/Fail) and a Similarity Score (0-100), based on: 1. Missing events 2. Extra events 3. Negation violations 4. Time difference
  44. 44. Demo: Flexible Temporal Search (FTS)
  45. 45. Outline: Introduction, LifeFlow Overview, Approximate Search, Case Studies, Conclusions. Multi-dimensional In-depth Long-term Case Studies (MILCs)
  46. 46. “to the wild”
  47. 47. MILCs
        #  Domain          Data Size  Duration
        1  Medical         7,041      7 months
        2  Transportation  203,214    3 months
        3  Medical         20,000     6 months
        4  Medical         60,041     1 year
        5  Web logs        7,022      6 weeks
        6  Activity logs   60         5 months
        7  Logistics       821        6 weeks
        8  Sports          61         5 weeks
        8 case studies / 6 domains
  48. 48. Case #1: Medical. User: Dr. A. Zach Hettinger, MedStar Institute for Innovation (mi2.org). Data: 60,041 patients. Task: Hospital readmissions
  49. 49. Current Report
        Patient  Diagnosis   Visit Date #1  Physician #1  Visit Date #2  Physician #2
        Mr. X    Back pain   Jun 10, 2010   Dr. Jones     Jun 29, 2010   Dr. Brown
        Mr. Y    Chest pain  Jun 11, 2010   Dr. Jones     Jun 20, 2010   Dr. Jones
        …        …           …              …             …              …
        An example of current report used in a hospital (fake data). How many patients came back? Did they come back for the 3rd, 4th, … time? How many came back and died? …
  50. 50. 60,041 patients. How many patients came back? Did they come back for the 3rd, 4th, … time? Event types: Registration
  51. 51. 60,041 patients. How many came back and died? Event types: Registration, Death
  52. 52. 60,041 patients. Location. Event types: Registration, Admission, Death
  53. 53. 60,041 patients. Find a pattern: Registration > Discharge > Registration > Death. Event types: Registration, Discharge, Death
  54. 54. 60,041 patients. Find a pattern: Registration > Discharge > Registration > Death. Event types: Registration, Discharge, Death
  55. 55. Analyzing data in a new way. Personal exploration. Long-term monitoring. Save more lives!
  56. 56. Case #2: Transportation. User: CATT Lab at the University of Maryland (www.cattlab.umd.edu). Data: 203,214 traffic incidents. Task: Comparing traffic agencies’ performance
  57. 57. 100 Years!
  58. 58. Clean the data!
  59. 59. Video
  60. 60. Suspicious distribution!
  61. 61. Detect anomalies. Clean data. Large dataset.
  62. 62. Case #3: Web logs. User: Anne Rose, International Children’s Digital Library (www.childrenslibrary.org). Data: 7,022 sessions. Task: How do people read children’s books online? PAGE 1, PAGE 2, PAGE 3, …
  63. 63. ~5 MINUTES
  64. 64. 24 SECONDS
  65. 65. Understand data. Surprising pattern. New hypotheses.
  66. 66. Case #4: Sports. User: Daniel Lertpratchya, Manchester United soccer fan (www.manutd.com). Data: 61 soccer matches. Task: Find interesting matches to watch replay videos. Explore data to find fun facts. Event types: Begin, Score, Opponent Score, End
  67. 67. Find interesting matches. Event types: Begin, Score, Opponent Score, End
  68. 68. Demolish another team.
  69. 69. Came back after conceding two goals.
  70. 70. Performance: home vs. away. Event types: Begin, Score, Opponent Score, Missed Penalty, End
  71. 71. Finding specific situations. Event types: Begin, Score, Opponent Score, Missed Penalty, End
  72. 72. #4 Design Guidelines: Align-Rank-Filter; Handle event types (e.g. Breakfast and Lunch grouped as Meal); Incorporate attributes; Multiple levels of information (Overview, Record, Event); Multiple overviews; Coordinated views; Search; Data preprocessing; History / Provenance
  73. 73. Outline: Introduction, LifeFlow Overview, Approximate Search, Case Studies, Conclusions
  74. 74. Contributions: 1. How to provide an overview of multiple event sequences? #1 LifeFlow Visualization: Aggregation, Visual encodings & Interactions. 2. How to support users when they are uncertain about what they are looking for? #2 Similarity Search: Similan + Match & Mismatch. #3 Hybrid Search: Flexible Temporal Search. #4 Case Studies + Design Guidelines
  75. 75. Future Directions: Outflow. Improve the visualization & UI: colors, gaps, … New tasks: comparison, attributes in query, … More complex data: stream, interval, concurrency, … Scalability: database, cloud computing, …
  76. 76. Outline: Introduction, LifeFlow Overview, Approximate Search, Case Studies, Conclusions
  77. 77. Outline: Introduction, LifeFlow Overview, Approximate Search, Case Studies, Conclusions. This is an event sequence!
  78. 78. refresh
  79. 79. fruitful
  80. 80. Acknowledgement: Washington Hospital Center (Dr. A. Zach Hettinger, Dr. Phuong Ho and Dr. Mark Smith); National Institutes of Health Grant RC1CA147489-02; Center for Integrated Transportation Systems Management, a Tier 1 Transportation Center at the University of Maryland; Study Participants; Advisors, Committees, HCIL Colleagues
  81. 81. Contributions: 1. How to provide an overview of multiple event sequences? LifeFlow Visualization: Aggregation, Visual encodings & Interactions. 2. How to support users when they are uncertain about what they are looking for? Similarity Search: Similan + Match & Mismatch. Hybrid Search: Flexible Temporal Search. Case Studies + Design Guidelines. http://www.cs.umd.edu/hcil/lifeflow kristw@cs.umd.edu / @kristwongz
  82. 82. Thank you ขอบคุณครับ
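
Slide 21 describes aggregating n event sequences into a "tree of sequences" in O(n), with node sizes tied to record counts and average times between events. The following is a minimal sketch of that idea as a prefix tree, not the LifeFlow implementation: the `Node` class, the `build_tree` function, the field names, and the sample records (times in hours) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the tree of sequences (hypothetical structure)."""
    event: str
    count: int = 0          # number of records passing through this node
    total_gap: float = 0.0  # summed time elapsed since the previous event
    children: dict = field(default_factory=dict)

    @property
    def avg_gap(self) -> float:
        """Average time from the previous event to this one."""
        return self.total_gap / self.count if self.count else 0.0


def build_tree(records):
    """Aggregate records into a prefix tree.

    records: iterable of lists of (time, event_type), each sorted by time.
    Every event is visited once, so the work is linear in the input size.
    """
    root = Node("ROOT")
    for record in records:
        root.count += 1
        node, prev_time = root, None
        for time, event in record:
            child = node.children.setdefault(event, Node(event))
            child.count += 1
            if prev_time is not None:
                child.total_gap += time - prev_time
            node, prev_time = child, time
    return root


# Made-up sample data: three patients, times in hours since arrival.
records = [
    [(0, "Arrival"), (2, "ICU"), (10, "Floor")],
    [(0, "Arrival"), (4, "ICU"), (6, "Discharge")],
    [(0, "Arrival"), (1, "Floor")],
]
tree = build_tree(records)
arrival = tree.children["Arrival"]
icu = arrival.children["ICU"]
print(arrival.count, icu.count, icu.avg_gap)
```

In a space-filling view of such a tree, `count` could drive the height of an event bar and `avg_gap` the horizontal distance from its parent; that mapping is an interpretation of the slide's labels, not something the transcript spells out.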

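The dynamic programming recurrence on slide 41 can be sketched as a small alignment routine. This is a simplified illustration under stated assumptions, not the FTS algorithm itself: the slides track a similarity vector s(i, j), while this sketch collapses it into a single number, and the function name `align` and the weights `match_score`, `skip_query`, and `skip_event` are hypothetical. It also assumes that match( query[i], events[j] ) is only allowed when the two event types are equal.

```python
def align(query, events, match_score=1.0, skip_query=-1.0, skip_event=-0.5):
    """Best alignment score between a query and a record's events.

    Implements s(i, j) = max(
        s(i-1, j)   + skip(query[i]),
        s(i, j-1)   + skip(events[j]),
        s(i-1, j-1) + match(query[i], events[j]),
    )
    with scalar, illustrative weights (assumed, not from the slides).
    """
    n, m = len(query), len(events)
    s = [[0.0] * (m + 1) for _ in range(n + 1)]

    # First column: every query event so far is skipped (missing events).
    for i in range(1, n + 1):
        s[i][0] = s[i - 1][0] + skip_query
    # First row: every record event so far is skipped (extra events).
    for j in range(1, m + 1):
        s[0][j] = s[0][j - 1] + skip_event

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = max(s[i - 1][j] + skip_query, s[i][j - 1] + skip_event)
            # Assumption: a match requires identical event types.
            if query[i - 1] == events[j - 1]:
                best = max(best, s[i - 1][j - 1] + match_score)
            s[i][j] = best
    return s[n][m]


# The example from slides 40-41: Query A, B, C, D, E vs. Record #2 A, B, D, C.
score = align(list("ABCDE"), list("ABDC"))
print(score)
```

With these assumed weights, the best alignment for the slide's example matches three events, skips two query events, and skips one record event. A fuller version would carry the whole similarity vector from slide 42 through the table and compare vectors instead of single numbers.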