9. 1. Lack of overview
60,041 patients, 203,214 traffic incidents, 7,022 web sessions, … and more
Where should I start? Is the dataset cleaned?
Show an overview or summary.
10. 2. Approximate search
Find something useful and display it.
Query: ICU > Floor > ICU within 2 days
Results: Found 0 records. Frustrated!
11. Research Questions
Overview: How to provide an overview of multiple event sequences? (LifeFlow)
Search: How to support users when they are uncertain about what they are looking for? (Similan, Flexible Temporal Search)
12. Outline
Introduction > LifeFlow (Overview) > Approximate Search > Case Studies > Conclusions
Overview: How to provide an overview of multiple event sequences?
13. From one event sequence...
• Single record [Cousins91], [Harrison94], [Plaisant98], …
Patient ID: 45851737
12/02/2008 14:26 Arrival
12/02/2008 14:26 Emergency
12/02/2008 22:44 ICU
12/05/2008 05:07 Floor
12/08/2008 10:02 Floor
12/14/2008 06:19 Discharge
[Figure: the same record drawn as a compact timeline along a Time axis: Patient #45851737, Arrival, Emergency Room, ICU, Floor, Discharge]
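A single record like the one above can be held as a time-ordered list of (timestamp, event) pairs. The sketch below is an illustration only, not the LifeFlow implementation; the one-event-per-line text format and the names `parse_record` and `gaps` are assumptions.

```python
from datetime import datetime, timedelta

# Patient 45851737 from the slide, one event per line: date, time, event type.
RAW = """\
12/02/2008 14:26 Arrival
12/02/2008 14:26 Emergency
12/02/2008 22:44 ICU
12/05/2008 05:07 Floor
12/08/2008 10:02 Floor
12/14/2008 06:19 Discharge"""


def parse_record(raw: str) -> list[tuple[datetime, str]]:
    """Parse 'MM/DD/YYYY HH:MM Event' lines into time-ordered (timestamp, event) pairs."""
    events = []
    for line in raw.splitlines():
        date_part, time_part, event = line.split(maxsplit=2)
        stamp = datetime.strptime(f"{date_part} {time_part}", "%m/%d/%Y %H:%M")
        events.append((stamp, event))
    # sorted() is stable, so simultaneous events keep their original order.
    return sorted(events, key=lambda pair: pair[0])


def gaps(events: list[tuple[datetime, str]]) -> list[tuple[str, str, timedelta]]:
    """Time elapsed between each pair of consecutive events."""
    return [(a, b, tb - ta) for (ta, a), (tb, b) in zip(events, events[1:])]


record = parse_record(RAW)
for src, dst, delta in gaps(record):
    print(f"{src} -> {dst}: {delta}")
print("Length of stay:", record[-1][0] - record[0][0])
```

Keeping real timestamps (rather than only the event order) is what later allows average times between events to be computed.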
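22. Contribution #1: LifeFlow
Event sequences: n records (#1, #2, #3, …), e.g., 1,000,000
Aggregate in O(n) into a tree of sequences, whose size is proportional to the number of patterns (9 nodes in the example)
Represent the tree with the LifeFlow visual representation, a space-filling technique: the horizontal axis shows time and the vertical extent shows the number of records
Visual elements: Event Bar, Average time, End Node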
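The aggregation step can be sketched as building a prefix tree: records that share the same first k events share the same path, so the tree grows with the number of distinct patterns rather than the number of records. This is a minimal sketch under assumptions: timestamps are plain numbers (e.g., hours), each node keeps a record count and the mean gap from the previous event, and the names `Node`, `build_tree`, `count_nodes`, and `show` are hypothetical. Each event is visited once, so building the tree is linear in the total number of events.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a tree of sequences (hypothetical structure)."""
    event: str
    count: int = 0            # number of records passing through this node
    total_gap: float = 0.0    # sum of gaps from the previous event in each record
    children: dict[str, "Node"] = field(default_factory=dict)

    @property
    def mean_gap(self) -> float:
        return self.total_gap / self.count if self.count else 0.0


def build_tree(records: list[list[tuple[float, str]]]) -> Node:
    """Insert every record into a prefix tree in one pass."""
    root = Node("ROOT")
    for record in records:
        root.count += 1
        node, prev_time = root, None
        for time, event in record:
            child = node.children.get(event)
            if child is None:
                child = node.children[event] = Node(event)
            child.count += 1
            # The first event of each record has no predecessor, so its gap is 0.
            child.total_gap += 0.0 if prev_time is None else time - prev_time
            node, prev_time = child, time
    return root


def count_nodes(node: Node) -> int:
    """Number of nodes below `node` (the node itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def show(node: Node, depth: int = 0) -> None:
    """Print the tree as indented text: event, record count, mean gap."""
    for child in node.children.values():
        print("  " * depth + f"{child.event}  n={child.count}  avg gap={child.mean_gap:.1f}")
        show(child, depth + 1)


# Three hypothetical records, timestamps in hours.
records = [
    [(0, "Arrival"), (1, "Emergency"), (5, "ICU"), (10, "Discharge")],
    [(0, "Arrival"), (2, "Emergency"), (6, "Discharge")],
    [(0, "Arrival"), (3, "Emergency"), (7, "ICU"), (20, "Discharge")],
]
tree = build_tree(records)
show(tree)
```

In a LifeFlow-style drawing, `count` would map to the height of an event bar and `mean_gap` to the horizontal distance from the parent node.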
24. User Study
10 participants, 12-minute training, 15 tasks
Participants could perform the tasks accurately and rapidly.
25. Quotes
“Oh! This is very cool!”
“The tool is easy to understand and easy to use!”
“LifeFlow provides a great summary of the big picture!”
“Very easy to find common and uncommon sequences!”
“Can I use it with my dataset?”
27. Outline
Introduction > LifeFlow (Overview) > Approximate Search (Similarity Search, Hybrid Search) > Case Studies > Conclusions
Approximate search: How to support users when they are uncertain about what they are looking for?
28. Related Work: Exact Match
Exact match: records MUST have A, B, C (the query returns Record#1, Record#2, Record#3)
• Event Sequence
– TimeSearcher [Hochheiser04]
– PatternFinder [Fails06]
– LifeLines2 [Wang08]
– ActiviTree [Vrotsou09]
– QueryMarvel [Jin09]
29. Related Work: Similarity Search
Similarity search: records SHOULD have A, B, C; results are ranked, more similar first (Record#2 0.91, Record#1 0.83, Record#3 0.70)
• Image Similarity Search [Kato92]
• Stock Price [Wattenberg01]
• Web page [Watai07]
• Bank account [Chang07]
• Event Sequence?
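The contrast between the two related-work slides can be shown on toy data. In the sketch below, an exact-match query keeps only records that contain A, B, C in that order, while a similarity query keeps every record and ranks it. The ranking uses longest-common-subsequence length divided by query length purely as a simple stand-in score; it is not the Match & Mismatch measure, and the record contents and function names are hypothetical.

```python
def contains_in_order(events: list[str], pattern: list[str]) -> bool:
    """Exact match: True if `pattern` occurs in `events` as an in-order subsequence."""
    remaining = iter(events)
    # `p in remaining` consumes the iterator up to and including the first p.
    return all(p in remaining for p in pattern)


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence (classic dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def exact_search(records: dict[str, list[str]], query: list[str]) -> list[str]:
    """MUST have: return only the records that contain the query."""
    return [rid for rid, events in records.items() if contains_in_order(events, query)]


def similarity_search(records: dict[str, list[str]], query: list[str]) -> list[tuple[str, float]]:
    """SHOULD have: score every record in [0, 1] and sort, most similar first."""
    if not query:
        raise ValueError("query must contain at least one event")
    scored = [(rid, lcs_length(events, query) / len(query)) for rid, events in records.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


records = {"Record#1": ["A", "B", "C"], "Record#2": ["A", "B"], "Record#3": ["A", "C", "B"]}
print(exact_search(records, ["A", "B", "C"]))       # only Record#1
print(similarity_search(records, ["A", "B", "C"]))  # all three, ranked
```

The exact query silently drops Record#2 and Record#3, which is the "Found 0 record" problem in miniature; the ranked query still surfaces them as near misses.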
30. Challenges
What is similar? It depends on users and tasks.
Query: A, B, C
Record #1: A, B, C
Record #2 (missing): one of A, B, C is missing
Record #3 (extra): A, B, C, D
Record #4 (time difference): A, B, C with a longer gap before C
Record #5 (swap): A, C, B
31. Match & Mismatch (M&M) Measure
Query (Record #1): A, C, B, D (in time order)
Record #2: A, B, C, E
Matched events: A, B, C (B and C are swapped); Missing: D; Extra: E
The total score (0.00–1.00) combines:
• Time difference
• Number of swaps
• Number of missing events
• Number of extra events
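The slide lists the ingredients of the total score but not how they are combined. The sketch below is a simplified illustration, not the published M&M algorithm (which may match events differently): it pairs the k-th occurrence of each event type in the query with the k-th occurrence in the record, normalizes each mismatch count to [0, 1], and combines them with equal weights. The weights, the 24-hour time normalization, and all names are assumptions.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str
    time: float  # assumed unit: hours from the start of the record


def match_events(query: list[Event], record: list[Event]) -> list[tuple[int, int]]:
    """Pair the k-th query event of each type with the k-th record event of that type."""
    unused = defaultdict(deque)
    for j, event in enumerate(record):
        unused[event.kind].append(j)
    pairs = []
    for i, event in enumerate(query):
        if unused[event.kind]:
            pairs.append((i, unused[event.kind].popleft()))
    return pairs  # sorted by query index


def _ratio(part: float, whole: float) -> float:
    return part / whole if whole else 0.0


def mm_score(query: list[Event], record: list[Event], max_hours: float = 24.0,
             weights: tuple[float, float, float, float] = (0.25, 0.25, 0.25, 0.25)) -> float:
    """Similarity in [0, 1]; 1.0 means same events, same order, same timing."""
    pairs = match_events(query, record)
    n = len(pairs)
    # Time difference: mean absolute gap between matched events, capped at max_hours.
    mean_diff = _ratio(sum(abs(query[i].time - record[j].time) for i, j in pairs), n)
    time_term = min(1.0, mean_diff / max_hours)
    # Swaps: matched pairs whose order in the record is reversed (inversions).
    swaps = sum(1 for a in range(n) for b in range(a + 1, n) if pairs[a][1] > pairs[b][1])
    swap_term = _ratio(swaps, n * (n - 1) / 2)
    missing_term = _ratio(len(query) - n, len(query))  # in the query, not in the record
    extra_term = _ratio(len(record) - n, len(record))  # in the record, not in the query
    terms = (time_term, swap_term, missing_term, extra_term)
    return 1.0 - sum(w * t for w, t in zip(weights, terms))


def seq(*kinds: str) -> list[Event]:
    """Helper: one event per hour, in the given order."""
    return [Event(kind, float(t)) for t, kind in enumerate(kinds)]


query = seq("A", "C", "B", "D")
record2 = seq("A", "B", "C", "E")
print(round(mm_score(query, record2), 3))  # 0.785
```

Because the weights sum to 1 and every term lies in [0, 1], the score stays in [0, 1]; changing the weights is one way to express that "what is similar" depends on the user and task.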
32. Contribution #2: Similarity Search
Similan = Similarity Measure (Match & Mismatch) + User Interface
• Similarity measure: What is similar?
• User interface: Specify query / Display results
Version 1, Version 2
43. Similarity Vector s(i,j)
• No. of matched events (mandatory)
• No. of matched events (optional)
• No. of negations violated (optional)
• No. of negations violated (mandatory)
• No. of time constraints violated
• Time difference
• No. of extra events
– Extra before the first match
– Extra between the first and last match
– Extra after the last match
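The slide lists the components of s(i,j) but not how each one is computed. The sketch below fills in only the parts that follow directly from the definitions, under assumptions: a matching step has already produced the record indices of matched events, each query event is flagged mandatory or optional, and the negation and time-constraint counts are supplied by the caller. `SimilarityVector`, `split_extra`, and `similarity_vector` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class SimilarityVector:
    """Components of s(i, j) as listed on the slide (hypothetical field names)."""
    matched_mandatory: int
    matched_optional: int
    negations_violated_optional: int
    negations_violated_mandatory: int
    time_constraints_violated: int
    time_difference: float
    extra_before_first: int
    extra_between: int
    extra_after_last: int

    @property
    def extra_events(self) -> int:
        return self.extra_before_first + self.extra_between + self.extra_after_last


def split_extra(record_length: int, matched: set[int]) -> tuple[int, int, int]:
    """Count unmatched record events before, between, and after the matched span."""
    if not matched:
        # Assumption: with no matches, every event is counted as "before".
        return record_length, 0, 0
    first, last = min(matched), max(matched)
    before = first
    between = (last - first + 1) - len(matched)
    after = record_length - 1 - last
    return before, between, after


def similarity_vector(query_mandatory: list[bool], matches: dict[int, int], record_length: int,
                      time_difference: float = 0.0, negations_optional: int = 0,
                      negations_mandatory: int = 0, time_violations: int = 0) -> SimilarityVector:
    """`matches` maps a query-event index to the record-event index it matched."""
    mandatory = sum(1 for q in matches if query_mandatory[q])
    before, between, after = split_extra(record_length, set(matches.values()))
    return SimilarityVector(mandatory, len(matches) - mandatory, negations_optional,
                            negations_mandatory, time_violations, time_difference,
                            before, between, after)


# Hypothetical query with 3 events (first two mandatory) matched in a 7-event record.
s = similarity_vector([True, True, False], {0: 1, 1: 3, 2: 5}, record_length=7)
print(s.extra_before_first, s.extra_between, s.extra_after_last)
```

Keeping the components separate, rather than collapsing them into one number right away, leaves room to decide later how to order records (for example, mandatory matches before optional ones).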
48. MILCs (Multi-dimensional In-depth Long-term Case studies)
# | Domain | Data Size (records) | Duration
1 | Medical | 7,041 | 7 months
2 | Transportation | 203,214 | 3 months
3 | Medical | 20,000 | 6 months
4 | Medical | 60,041 | 1 year
5 | Web logs | 7,022 | 6 weeks
6 | Activity logs | 60 | 5 months
7 | Logistics | 821 | 6 weeks
8 | Sports | 61 | 5 weeks
8 case studies / 6 domains
49. Case #1: Medical
User: Dr. A. Zach Hettinger
MedStar Institute for Innovation
mi2.org
Data: 60,041 patients
Task: Hospital readmissions
50. Current Report
Patient | Diagnosis | Visit Date #1 | Physician #1 | Visit Date #2 | Physician #2
Mr. X | Back pain | Jun 10, 2010 | Dr. Jones | Jun 29, 2010 | Dr. Brown
Mr. Y | Chest pain | Jun 11, 2010 | Dr. Jones | Jun 20, 2010 | Dr. Jones
… | … | … | … | … | …
An example of a current report used in a hospital (fake data)
How many patients came back?
Did they come back for the 3rd, 4th, … time?
How many came back and died?
…
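These questions are hard to answer from a table with a fixed number of visit columns. Turning each row into a per-patient event sequence removes that limit. A minimal sketch: the two rows come from the fake report on the slide, the third patient (Mr. Z) is a hypothetical row added for contrast, and the function names are assumptions.

```python
from collections import Counter
from datetime import date

# (patient, diagnosis, [(visit date, physician), ...]); Mr. Z is hypothetical.
REPORT = [
    ("Mr. X", "Back pain", [(date(2010, 6, 10), "Dr. Jones"), (date(2010, 6, 29), "Dr. Brown")]),
    ("Mr. Y", "Chest pain", [(date(2010, 6, 11), "Dr. Jones"), (date(2010, 6, 20), "Dr. Jones")]),
    ("Mr. Z", "Back pain", [(date(2010, 6, 12), "Dr. Brown")]),
]


def to_sequences(report: list) -> dict[str, list[date]]:
    """One time-ordered list of visit dates per patient, of any length."""
    return {patient: sorted(day for day, _ in visits) for patient, _, visits in report}


def came_back(sequences: dict[str, list[date]], visit_number: int = 2) -> int:
    """Number of patients with at least `visit_number` visits."""
    return sum(1 for visits in sequences.values() if len(visits) >= visit_number)


sequences = to_sequences(REPORT)
print(came_back(sequences))                         # came back at least once
print(came_back(sequences, 3))                      # came back a 3rd time
print(Counter(len(v) for v in sequences.values()))  # visits per patient
```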
51. 60,041 patients
How many patients came back? Did they come back for the 3rd, 4th, … time?
[Figure; event label: Registration]
52. 60,041 patients
How many came back and died?
[Figure; event labels: Registration, Death]
54. 60,041 patients
Find a pattern: Registration > Discharge > Registration > Death
[Figure; event labels: Registration, Discharge, Death]
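A pattern such as Registration > Discharge > Registration > Death can be checked record by record as an in-order subsequence, allowing other events in between. A minimal sketch, assuming records are lists of (day, event) pairs; the three patients are hypothetical, and this is not the LifeFlow query mechanism.

```python
from typing import Optional

PATTERN = ["Registration", "Discharge", "Registration", "Death"]


def find_pattern(events: list[tuple[int, str]], pattern: list[str]) -> Optional[list[int]]:
    """Return the days of the earliest in-order match of `pattern`, or None."""
    days, k = [], 0
    for day, event in events:
        # Greedily take the next pattern event as soon as it appears.
        if k < len(pattern) and event == pattern[k]:
            days.append(day)
            k += 1
    return days if k == len(pattern) else None


# Hypothetical patients, days since first registration.
patients = {
    "p1": [(0, "Registration"), (3, "Discharge"), (20, "Registration"), (25, "Death")],
    "p2": [(0, "Registration"), (4, "Discharge")],
    "p3": [(0, "Registration"), (2, "Discharge"), (10, "Registration"), (14, "Discharge")],
}
matches = {pid: m for pid, ev in patients.items()
           if (m := find_pattern(ev, PATTERN)) is not None}
print(len(matches), "of", len(patients), "patients match")
for pid, days in matches.items():
    print(pid, "returned", days[2] - days[1], "days after discharge")
```

Greedy earliest matching finds a match whenever one exists, and the matched days make follow-up questions (such as time to readmission) a simple subtraction.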
56. Analyzing data in a new way
Personal exploration
Long-term monitoring
Save more lives!
57. Case #2: Transportation
User: CATT Lab at the University of Maryland
www.cattlab.umd.edu
Data: 203,214 traffic incidents
Task: Comparing traffic agencies’ performance
66. Case #3: Web logs
User: Anne Rose
International Children’s Digital Library
www.childrenslibrary.org
Data: 7,022 sessions
Task: How do people read children's books online?
Events: PAGE 1, PAGE 2, PAGE 3, …
72. Case #4: Sports
User: Daniel Lertpratchya
Manchester United soccer fan
www.manutd.com
Data: 61 soccer matches
Task: Find interesting matches to watch replay videos.
Explore data to find fun facts.
Events: Begin, Score, Opponent Score, End
78. Contribution #4: Design Guidelines
• Align-Rank-Filter
• Handle event types (e.g., group Breakfast and Lunch into Meal)
• Incorporate attributes
• Multiple levels of information (Overview, Record, Event)
• Multiple overviews
• Coordinated views
• Search
• Data preprocessing
• History / Provenance
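Two of the guidelines, Align-Rank-Filter and handling event types, translate directly into small data transformations. The sketch below regroups event types through a hierarchy (Breakfast and Lunch become Meal, as on the slide), aligns records on the first occurrence of a chosen event while filtering out records that lack it, and ranks records by event count. The hierarchy mapping, the record data, the ranking criterion, and the function names are assumptions.

```python
Record = list[tuple[float, str]]  # (hour, event type)

# Hypothetical hierarchy: child type -> parent type, following the slide's example.
HIERARCHY = {"Breakfast": "Meal", "Lunch": "Meal"}


def regroup(record: Record, hierarchy: dict[str, str]) -> Record:
    """Replace each event type by its parent type when one is defined."""
    return [(t, hierarchy.get(kind, kind)) for t, kind in record]


def align(records: dict[str, Record], anchor: str) -> dict[str, Record]:
    """Shift times so the first `anchor` event is at 0; filter out records without it."""
    aligned = {}
    for rid, record in records.items():
        zero = next((t for t, kind in record if kind == anchor), None)
        if zero is not None:
            aligned[rid] = [(t - zero, kind) for t, kind in record]
    return aligned


def rank(records: dict[str, Record]) -> list[str]:
    """Rank record IDs by number of events, most first (one possible ranking)."""
    return sorted(records, key=lambda rid: len(records[rid]), reverse=True)


days = {
    "day1": [(8, "Breakfast"), (12, "Lunch"), (18, "Run")],
    "day2": [(9, "Run")],
}
grouped = {rid: regroup(r, HIERARCHY) for rid, r in days.items()}
print(align(grouped, "Meal"))
print(rank(grouped))
```

Regrouping before aligning means the alignment point can be a category (any meal) rather than one specific event type.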
79. Outline
Introduction > LifeFlow (Overview) > Approximate Search > Case Studies > Conclusions
80. Contributions
1. How to provide an overview of multiple event sequences?
#1 LifeFlow Visualization: Aggregation, Visual encodings & Interactions
2. How to support users when they are uncertain about what they are looking for?
#2 Similarity Search: Similan + Match & Mismatch
#3 Hybrid Search: Flexible Temporal Search
#4 Case Studies + Design Guidelines
81. Future Directions
Outflow
• Improve the visualization & UI: colors, gaps, …
• New tasks: comparison, attributes in query, …
• More complex data: stream, interval, concurrency, …
• Scalability: database, cloud computing, …
83. Outline
Introduction > LifeFlow (Overview) > Approximate Search > Case Studies > Conclusions
This is an event sequence!
89. Acknowledgement
Washington Hospital Center
Dr. A. Zach Hettinger, Dr. Phuong Ho and Dr. Mark Smith
National Institutes of Health
Grant RC1CA147489-02
Center for Integrated Transportation Systems Management
a Tier 1 Transportation Center at the University of Maryland
Study Participants
Advisors, Committees, HCIL Colleagues
http://www.cs.umd.edu/hcil/lifeflow kristw@cs.umd.edu / @kristwongz